big-data-spain
www.bsc.es
Automating Big Data Benchmarking
and Performance Analysis with ALOJA
October 2015
Nicolas Poggi, Senior Researcher
Barcelona Supercomputing Center (BSC)
Spanish national supercomputing center
- 22-year history in computer architecture, networking, and distributed systems research
- Based at the Technical University of Catalonia (UPC)
Led by Mateo Valero
- ACM Fellow, Eckert-Mauchly Award 2007, Goode Award 2009
- Active research staff with 1000+ publications
Large ongoing life science computational projects
- Computational genomics, molecular modeling & bioinformatics, protein interactions & docking
In-place computational capabilities
- MareNostrum supercomputer
Prominent body of research activity around Hadoop since 2008
- Previous to ALOJA
  • SLA-driven scheduling (Adaptive Scheduler), in-memory caching, etc.
BSC-MSRS Centre
- Long-term relationship between BSC, Microsoft Research, and product teams
- ALOJA is the latest phase of the engagement, exploring cost-efficient upcoming Big Data architectures
- Open model
  • No patents, public IP, publications, and open source as the main focus
The MareNostrum 3 Supercomputer
Over 10^15 floating point operations per second
Nearly 50,000 cores
100.8 TB of main memory
2 PB of disk storage
70% distributed through PRACE
24% distributed through RES
6% for BSC-CNS use
Agenda
1. Intro on Hadoop performance
   1. Current scenario and problems
2. ALOJA project
   1. Background
   2. Open source tools
3. Benchmarking
   1. Benchmarking workflow
   2. DEMO
4. Results
   1. HW and SW speedups
   2. Cost/Performance
   3. Online results DEMO
5. Predictive Analytics and learning
6. Future lines and conclusions
Intro: Hadoop performance and ecosystem
Hadoop design
Hadoop was designed to process complex data
- Structured and unstructured
- With [close to] linear scalability
- And application reliability
Simplifying the programming model
- From MPI, OpenMP, CUDA, …
Operating as a black box for data analysts, but…
- Complex runtime for admins
- YARN abstracts even more
Image source: Hadoop: The Definitive Guide
Hadoop: highly scalable, but…
Not a high-performance solution
Requires
- Design
  • Cluster topology
- Setup
  • OS, Hadoop config
- Fine tuning
  • Iterative approach
  • Time consuming
and extensive benchmarking
Setting up your Big Data system
Hadoop
- > 100 tunable parameters
- Obscure and interrelated
  • mapred.map/reduce.tasks.speculative.execution
  • io.sort.mb 100 (300)
  • io.sort.record.percent 5% (15%)
  • io.sort.spill.percent 80% (95 - 100%)
- Similar for Hive, Spark, HBase
Dominated by rules of thumb
- Number of containers in parallel
  • 0.5 - 2 per CPU core
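The containers rule of thumb above can be turned into a starting range for a given node. This is only a sketch: the function name `container_range`, rounding down, and the floor of one container are assumptions made for illustration, not part of ALOJA or Hadoop.

```shell
# container_range CORES -> prints "MIN MAX" containers to try in parallel,
# following the 0.5 - 2 containers per CPU core rule of thumb.
container_range() {
  local cores="$1"
  local min=$(( cores / 2 ))   # 0.5 per core, rounded down (assumption)
  local max=$(( cores * 2 ))   # 2 per core
  if [ "$min" -lt 1 ]; then
    min=1                      # assumption: never go below one container
  fi
  echo "$min $max"
}

# Use the local core count when nproc is available, otherwise fall back to 4.
cores="$(nproc 2>/dev/null || echo 4)"
echo "cores=$cores containers(min max)=$(container_range "$cores")"
```

The range is only where the iterative tuning starts; the benchmark runs decide which value inside it works best for a given job.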
Large stack for tuning
Image source: Intel® Distribution for Apache Hadoop
Product claims on performance and TCO
Ecosystem is not transparent
- Needs auditing
How do I set up my system? Too many options!
Default values in Apache source not ideal
Large and spread-out ecosystem
- Different distributions
- Product claims
Each job is different
- No one-size-fits-all solution
Cloud vs. on-premise
- IaaS
  • Tens of different VMs to choose from
- PaaS
  • HDInsight, CloudBigData, EMR
New economical HW
- SSDs, InfiniBand networking
The ALOJA project: research lines and challenges
BSC's project ALOJA: towards cost-effective Big Data
Open research project for improving the cost-effectiveness of Big Data deployments
Benchmarking and analysis tools
Online repository and largest Big Data repo
- 50,000+ runs of HiBench, TPC-H, and [some] BigBench
- Over 100 HW configurations tested
  • Of different node/VM types, disks, and networks
  • Cloud: multi-cloud provider, including both IaaS and PaaS
  • On-premise: high-end, HPC, commodity, low-power
Community
- Collaborations with industry and academia
- Presented at different conferences and workshops
- Visibility: 47 different countries
http://aloja.bsc.es
Big Data Benchmarking
Online Repository
Web Analytics
ALOJA research lines: broad coverage
Techniques for obtaining Cost/Performance insights
Profiling
• HPC, low-level
• High accuracy
• Manual analysis
Benchmarking
• Iterate configs
• HW and SW
• Real executions
• Log parsing and data sanitization
Analysis tools
• Summarize large number of results
• By criteria
• Filter noise
• Fast processing
Predictive Analytics
• Automated modeling
• Estimations
• Virtual executions
• Automated KD
Evaluation of: Big Data apps, frameworks, systems / clusters, cloud providers / datacenters
Challenges (circa end 2013)
Test different clusters and architectures
- On-premise
  • Commodity, high-end, appliance, low-power
- Cloud IaaS
  • 32 different VMs in Azure, similar in other providers
- Cloud PaaS
  • HDInsight, EMR, CloudBigData
Different access levels
- Full admin, user-only, request-to-install, everything ready, queuing systems (SGE)
Different versions
- Hadoop, JVM, Spark, Hive, etc.
- Other benchmarks
Problems
- All existing systems were designed for PROD
  • Not for comparison
- No Azure support
- Many different packages
- No one-size-fits-all solution
Dev environments and testing
- Big Data usually requires a cluster to develop and test
Solution
- Custom implementation
- Based on simple components
- Wrapping commands
Benchmarking with ALOJA's open source tools
ALOJA Platform: main components
1. Big Data Benchmarking
• Deploy & provision
• Conf management
• Parameter selection & queuing
• Perf counters
• Low-level instrumentation
• App logs
2. Online Repository
• Explore results
• Execution details
• Cluster details
• Costs
• Data sharing
3. Web Analytics
• Data views and evaluations
• Aggregates
• Abstracted metrics
• Job characterization
• Machine learning
• Predictions and clustering
NGINX, PHP, MySQL
BASH, Unix tools, CLIs, R, SQL, JS
Extending and collaborating in ALOJA
Setting up a DEV environment
1. Install prerequisites: git, vagrant, VirtualBox
2. git clone https://github.com/Aloja/aloja.git
3. cd aloja
4. vagrant up
5. Open your browser at http://localhost:8080
6. Optional: start the benchmarking cluster
   vagrant up
Installs a web server with sample data
Sets up a local cluster to test benchmarking
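Before step 2 it can help to confirm that the prerequisites are on the PATH. The check below is a sketch, not part of the ALOJA repository: `missing_tools` is a hypothetical helper, and `VBoxManage` is used as the marker for VirtualBox because it is the command-line front end that VirtualBox installs.

```shell
# missing_tools TOOL... -> prints every tool that is not found on PATH,
# one per line; prints nothing when all of them are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing="$(missing_tools git vagrant VBoxManage)"
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names print on a single line.
  echo "Install these before running 'vagrant up':" $missing
else
  echo "All prerequisites found."
fi
```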
Workflow in ALOJA
Cluster(s) definition
• VM sizes
• Number of nodes
• OS, disks
• Capabilities
Execution plan
• Start cluster
• Setup
• Exec benchmarks
• Cleanup
Import data
• Convert perf metrics
• Parse logs
• Import into DB
Evaluate data
• Data views in Vagrant VM
• Or http://aloja.bsc.es
PA and KD
• Predictive Analytics
• Knowledge Discovery
Historic repo
Commands and providers
Provisioning commands
Connect
- Node and cluster
- Builds SSH cmd line
  • SSH proxies
Deploy
- Creates a cluster
- Sets SSH credentials
- If created, updates config as needed
- If stopped, starts nodes
Start, Stop
Delete
Queue jobs to clusters
Providers
On-premise
- Custom settings for clusters
  • Multiple disk types
  • Different architectures
Cloud IaaS
- Azure, OpenStack, Rackspace, AWS (testing)
Cloud PaaS
- HDInsight, CloudBigData, EMR soon
Code at https://github.com/Aloja/aloja/tree/master/aloja-deploy
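To illustrate what "builds SSH cmd line" with proxies can mean, here is a minimal sketch. It is not ALOJA's implementation: `build_ssh_cmd`, the user name, and the host names are hypothetical, and the proxy is expressed with OpenSSH's `-J` (ProxyJump) option, which requires OpenSSH 7.3 or later.

```shell
# build_ssh_cmd USER HOST [PROXY] -> prints the ssh command line to run.
# When PROXY is given, the connection hops through it with -J (ProxyJump).
build_ssh_cmd() {
  local user="$1"
  local host="$2"
  local proxy="${3:-}"
  if [ -n "$proxy" ]; then
    echo "ssh -J $proxy $user@$host"
  else
    echo "ssh $user@$host"
  fi
}

# Direct connection to a node, then the same node reached through a gateway.
build_ssh_cmd admin node-01.example.com
build_ssh_cmd admin node-01.example.com gateway.example.com
```

Printing the command instead of running it keeps the wrapper easy to test and lets the same function serve both the node and the cluster case.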
Cluster and node definitions: multi-provider abstraction
Steps to define a cluster
Import defaults (if any)
- Sets OS, version
Select provider
- Azure, Rackspace, AWS, on-premise, vagrant…
Name the cluster and size
Optional
- Select VM type
- Attached disks
- Define metadata
- And costs
Nodes can also be defined
- For web, shared folders, etc.
You can logically split clusters
Azure 8-datanode sample
# load AZURE defaults
source "$CONF_DIR/cluster_defaults.conf"
clusterName="azure-large-8"
numberOfNodes="8"
vmSize="Large"
attachedVolumes="3"
diskSize="1024" # in GB
# details
vmCores="4"
vmRAM="7" # in GB
# costs
clusterCostHour="1.584" # in USD
clusterType="IaaS"
Source sample: https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf
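The cost field is what lets execution time be turned into money. As a sketch of that arithmetic (the helper name `exec_cost` and the 400-second run length are illustrative assumptions, not ALOJA code), the cost of one execution is the hourly cluster cost multiplied by the run time in hours:

```shell
# exec_cost COST_PER_HOUR SECONDS -> prints the cost of one execution in USD.
# cost = cost_per_hour * seconds / 3600
exec_cost() {
  awk -v rate="$1" -v secs="$2" 'BEGIN { printf "%.4f\n", rate * secs / 3600 }'
}

clusterCostHour=1.584                 # value from the sample cluster definition
exec_cost "$clusterCostHour" 400      # a hypothetical 400-second benchmark run
```

With this definition a faster but more expensive cluster can still be the cheaper option per execution, which is the comparison the cost-effectiveness views are built around.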
Running benchmarks in ALOJA
Benchmarking with defaults
repo_location/aloja-bench/run_benchs.sh
To queue jobs
repo_location/shell/exeq.sh
Code at https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh
Testing different configurations
Approaches
1. Config folders
2. Override variables
   1. In benchmark_defaults.conf
   2. In cluster config
3. Cmd line
   1. Via parameters
      run_benchs.sh -r 2 -m 10
   2. Via shell globals
      HADOOP_VERSION=hadoop-2.7.1
      BENCH_DATA_SIZE=1TB
Things to look for
HW, OS
- Versions
- Disk config and mounts
SW
- Replication
- Block sizes
- Compression
- IO buffers
Build your exec plan in a script and queue
Or follow ML recommendations
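An exec plan built in a script can be as simple as a loop that emits one command line per configuration. The sketch below only prints the commands. It reuses the `-r 2 -m 10` parameters and the two shell globals exactly as they appear above; `build_plan` is a hypothetical helper and `100GB` is an illustrative second data size. How the lines are handed to the queue is left out because the interface of `exeq.sh` is not shown here.

```shell
# build_plan -> prints one run_benchs.sh command line per configuration.
build_plan() {
  for size in 1TB 100GB; do   # 100GB is an illustrative value (assumption)
    echo "BENCH_DATA_SIZE=$size HADOOP_VERSION=hadoop-2.7.1 run_benchs.sh -r 2 -m 10"
  done
}

build_plan
```

Keeping the plan as plain text makes it easy to review, diff, and re-run the same set of executions on another cluster.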
ALOJA-WEB
Entry point to explore the results collected from the executions
- Provides insights on the obtained results through continuously evolving data views
Online DEMO at http://aloja.bsc.es
Online benchmarking results
2) ALOJA-WEB Online Repository
Entry point to explore the results collected from the executions
- Index of executions
  • Quick glance of executions
  • Searchable, sortable
- Execution details
  • Performance charts and histograms
  • Hadoop counters
  • Job and task details
Data management of benchmark executions
- Data importing from different clusters
- Execution validation
- Data management and backup
Cluster definitions
- Cluster capabilities (resources)
- Cluster costs
Sharing results
- Download executions
- Add external executions
Documentation and references
- Papers, links, and feature documentation
Available at http://hadoop.bsc.es
Comparing 3 runs on the same cluster, different configs
Mappers and reducers, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
400s, 2 containers, local disk
800s, 3 containers, local disk
600s, 2 containers, remote disk
Comparing 3 runs on the same cluster, different configs
CPU utilization, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
Moderate iowait
Higher iowait
Very high iowait
Comparing 3 runs on the same cluster, different configs
CPU queues, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
1 blocked process
4 blocked processes
4 blocked processes (map phase)
Comparing 3 runs on the same cluster, different configs
CPU context switches, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
High context switches with 3 containers on a 2-core VM
Impact of SW configurations on speedup (4-node clusters)
Number of mappers: 4m, 6m, 8m, 10m
Compression algorithm: no comp., ZLIB, BZIP2, Snappy
Speedup (higher is better)
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
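Speedup in these charts is read here as baseline execution time divided by the execution time of the configuration under test, so values above 1 are improvements. The sketch below applies that ratio to the three run times from the run comparison (400s, 600s, 800s). Taking the slowest run as the baseline is an assumption made for the example, and `speedup` is a hypothetical helper.

```shell
# speedup BASELINE_SECONDS SECONDS -> prints baseline / time with 2 decimals.
speedup() {
  awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", base / t }'
}

baseline=800   # assumption: the slowest of the three runs is the baseline
for t in 400 600 800; do
  echo "${t}s -> $(speedup "$baseline" "$t")x"
done
```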
Impact of HW configurations on speedup
Disks and network: HDD-ETH, HDD-IB, SSD-ETH, SSD-IB
Cloud remote volumes: local only; 1 remote; 2 remotes; 3 remotes; 3 remotes, tmp local; 2 remotes, tmp local; 1 remote, tmp local
Speedup (higher is better)
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
VM Size comparison (Azure)
Lower is better
Clusters by cost-effectiveness (1/2)
URL: http://aloja.bsc.es/clustercosteffectiveness
• Cluster ID reference
  • RL-06 = 8 performance1-8 VMs
  • RL-16 = 8 general1-8 VMs
  • RL-19 = 8 io1-15 VMs
  • RL-33 = 8 performance2-30 VMs
  • RL-30 = 8 io1-30 VMs
Chart labels: Performance2-30, Io1-30, Io1-15, General1-8, Performance1-8
Clusters by cost-effectiveness (2/2)
URL: http://aloja.bsc.es/clustercosteffectiveness
Chart labels: fastest exec, cheapest exec (same cluster ID reference as in 1/2)
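The fastest cluster and the cheapest cluster are often not the same one. A minimal sketch of how both can be picked from a list of executions follows. The input format, the helper name `rank_clusters`, and every number in the example are made up for illustration; they are not results from the repository.

```shell
# rank_clusters reads lines of: CLUSTER_ID EXEC_SECONDS COST_PER_HOUR_USD
# and prints which cluster ran fastest and which execution was cheapest,
# where cost = seconds * cost_per_hour / 3600.
rank_clusters() {
  awk '{
    cost = $2 * $3 / 3600
    if (NR == 1 || $2 < best_time)  { best_time = $2;   fastest = $1 }
    if (NR == 1 || cost < best_cost) { best_cost = cost; cheapest = $1 }
  }
  END { printf "fastest=%s cheapest=%s\n", fastest, cheapest }'
}

# Illustrative, made-up inputs (not measured data).
printf '%s\n' \
  "cluster-a 900 4.0" \
  "cluster-b 1500 1.2" \
  "cluster-c 1200 2.0" | rank_clusters
```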
Cost/Performance: scalability of cluster size
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size
- X axis: number of datanodes (cluster size)
- Left Y: execution time (lower is better)
- Right Y: execution cost
Chart labels: execution time, execution cost, recommended size
Predictive Analytics and automated learning
Modeling Hadoop: Methodology
Methodology
- 3-step learning process
- Different split sizes tested (10% ≤ training ≤ 50%)
- Different learning algorithms: regression trees, nearest-neighbor methods, linear/multinomial regressions, neural networks
Learning results
- Mean Absolute Errors ~250s (ranges in [100s, 6000s])
- Relative Absolute Errors between [0.10, 0.25]
  • Depend on benchmark and number of examples per benchmark
  • Some executions are/may be anomalies
Diagram: the ALOJA data-set is split into training, validation, and testing sets; a model is trained and tested; if not acceptable (NO), tune the algorithm and re-train; if acceptable (YES), select this model as the final model and test it.
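For reference, the two error measures can be computed as follows. This sketch uses the standard textbook definitions, which are assumed here rather than taken from the ALOJA code: MAE is the mean of |predicted - actual|, and RAE is the sum of |predicted - actual| divided by the sum of |actual - mean(actual)|. The helper name and the sample values are illustrative.

```shell
# error_metrics reads lines of: ACTUAL_SECONDS PREDICTED_SECONDS
# and prints the Mean Absolute Error and the Relative Absolute Error.
# It expects at least two lines with differing actual values.
error_metrics() {
  awk '{
    actual[NR] = $1
    err = $2 - $1; if (err < 0) err = -err
    sum_err += err
    sum_actual += $1
  }
  END {
    mean = sum_actual / NR
    for (i = 1; i <= NR; i++) {
      dev = actual[i] - mean; if (dev < 0) dev = -dev
      sum_dev += dev
    }
    printf "MAE=%.2f RAE=%.2f\n", sum_err / NR, sum_err / sum_dev
  }'
}

# Illustrative, made-up values (not measured data).
printf '%s\n' "100 110" "200 190" "300 330" | error_metrics
```

An RAE below 1 means the model beats the naive predictor that always answers with the mean execution time.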
Use case 1: Anomaly Detection
Anomaly detection
- Model-based detection procedure
- Pass executions through the model
- Executions not fitting the model are considered "out of the system"
Figures: anomaly detection procedure; sample view from site
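One simple way to decide that an execution does not fit the model is to compare its measured time with the predicted time and flag it when the relative error passes a threshold. The sketch below does only that. The threshold of 0.25, the input format, and the helper name `flag_anomalies` are assumptions for illustration, and the actual ALOJA procedure may differ.

```shell
# flag_anomalies MAX_REL_ERROR reads lines of:
#   EXEC_ID ACTUAL_SECONDS PREDICTED_SECONDS
# and prints the IDs whose |actual - predicted| / predicted exceeds the limit.
flag_anomalies() {
  awk -v max_rel="$1" '{
    err = $2 - $3; if (err < 0) err = -err
    if (err / $3 > max_rel) print $1
  }'
}

# Illustrative, made-up values (not measured data).
printf '%s\n' \
  "exec-1 400 410" \
  "exec-2 800 420" \
  "exec-3 600 590" | flag_anomalies 0.25
```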
Use case 2: Guided Benchmarking - Method
Guided benchmarking
- Best subset of configurations for modeling a Hadoop deployment
- Clustering to get the "representative execution" for each similar subset of executions
Diagram: the ALOJA data-set is clustered and a model is built from the centers; it is tested against a reference model built from the data-set; if the error is not OK (NO), increase the number of centers; if it is OK (YES), the centers are the configs to execute.
Use case 3: Knowledge Discovery
Make analyzing results easier
- Multi-variable visualization
- Trees separating relevant attributes
- Other interesting tools
Figure: pred_time for HDD and SSD
Tree descriptor
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 124s1
Concluding remarks and reference
Concluding remarks
Benchmarking: it's fun, or at least…
- It will save you €€€ and allow you to scale
But it is also tough
- The industry needs more transparency
- We still have a lot to do…
In ALOJA we provide the benchmarking scripts
- And also the results, which should be your first entry point
We are constantly adding new features
- Benchmarks, systems, providers
It is an open initiative; you're invited to participate
- Beta testers
- Contributors
With predictive analytics we can automate and find trends faster
- Especially to save on benchmarking costs and time
Find us around the conference for more details on the tools…
Fork our repo at https://github.com/Aloja/aloja
More info
ALOJA benchmarking platform and online repository
- http://aloja.bsc.es, http://aloja.bsc.es/publications
Benchmarking Big Data
- http://www.slideshare.net/ni_po/benchmarking-hadoop
BDOOP meetup group in Barcelona
Big Data Benchmarking Community (BDBC) mailing list
- ~200 members from ~80 organizations
- http://clds.sdsc.edu/bdbc/community
Workshop on Big Data Benchmarking (WBDB)
- Next: http://clds.sdsc.edu/wbdb2015.ca
SPEC Research Big Data working group
- http://research.spec.org/working-groups/big-data-working-group.html
Slides and video
- Michael Frank on Big Data benchmarking
  • http://www.tele-task.de/archive/podcast/20430/
- Tilmann Rabl, Big Data Benchmarking Tutorial
  • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
Q&A
Thanks
Contact: nicolas.poggi@bsc.es
Barcelona Supercomputing Center (BSC)
Spanish national supercomputing center ndash 22 year history in Computer Architecture networking and distributed
systems research ndash Based at the Technical University of Catalonia (UPC)
Led by Mateo Valero ndash ACM fellow Eckert-Mauchly award 2007 Goode award 2009 ndash Active research staff with 1000+ publications
Large ongoing life science computational projects ndash Computational Genomics Molecular modeling amp Bioinformatics Protein
Interactions amp Docking
In place computational capabilities ndash Mare Nostrum Super Computer
Prominent body of research activity around Hadoop since 2008 ndash Previous to ALOJA
bull SLA-driven scheduling (Adaptive Scheduler) in memory caching etc
BSC-MSRS Centre ndash Long-term relationship between BSC Microsoft Research product teams
ndash ALOJA is the latest phase of the engagement to explore cost-efficient upcoming Big Data architectures
ndash Open model bull No patents public IP publications and open source main focus
The MareNostrum 3 Supercomputer
Over 1015 Floating Point Operations per
second
Nearly 50000 cores
1008 TB of main memory 2 PB of disk storage
70 distributed through PRACE
24 distributed through RES
6 for BSC-CNS use
Over 1015 Floating Point Operations per second
Nearly 50000 cores
1008 TB of main memory
2 PB of disk storage
Agenda
1 Intro on Hadoop
performance
1 Current scenario and
problematic
2 ALOJA project
1 Background
2 Open source tools
3 Benchmarking
1 Benchmarking workflow
2 DEMO
4 Results
1 HW and SW speedups
2 CostPerformance
3 Online results DEMO
5 Predictive Analytics and
learning
6 Future lines and conclusions
Intro Hadoop performance and ecosystem
Hadoop design
Hadoop was designed to solve complex data ndash Structured and non structured
ndash with [close to] linear scalability
ndash and application reliability
Simplifying the programming model ndash From MPI OpenMP CUDA hellip
Operating as a blackbox for data analysts buthellip ndash Complex runtime for admins
ndash YARN abstracts even more
Image source Hadoop the definitive guide
Hadoop highly-scalable buthellip
Not a high-performance solution
Requires
ndash Design
bull Clusters topology clusters
ndash Setup
bull OS Hadoop config
ndash Fine tuning required
bull Iterative approach
bull Time consuming
and extensive benchmarking
Setting up your Big Data system
Hadoop
ndash gt 100+ tunable parameters
ndash obscure and interrelated
bull mapredmapreducetasksspeculativeexecution
bull iosortmb 100 (300)
bull iosortrecordpercent 5 (15)
bull iosortspillpercent 80 (95 ndash 100)
ndash Similar for Hive Spark HBase
Dominated by rules-of-thumb
ndash Number of containers in parallel
bull 05 - 2 per CPU core
Large stack for tuning
Image source Intelreg Distribution for Apache Hadoop
Product claims on performance and TCO
Eco-system is not transparent
ndash Needs auditing
How do I set my system too many options
Default values in Apache source not ideal
Large and spread eco system
ndash Different distributions
ndash Product claims
Each job is different
ndash No one-fits-all solution
Cloud vs On-premise
ndash IaaS
bull Tens of different VMs to choose
ndash PaaS
bull HDInsight CloudBigData EMR
New economic HW
ndash SSDs InfiniBand Networking
The ALOJA project research lines and challenges
BSCrsquos project ALOJA towards cost-effective Big Data
Open research project for improving the cost-effectiveness
of Big Data deployments
Benchmarking and Analysis tools
Online repository and largest Big Data repo
ndash 50000+ runs of HiBench TPC-H and [some] BigBench
ndash Over 100 HW configurations tested bull Of dif ferent NodeVM disks and networks
bull Cloud Multi-cloud provider including both IaaS and PaaS
bull On-premise High-end HPC commodity low-power
Community ndash Collaborations with industry and Academia
ndash Presented in different conferences and workshops
ndash Visibility 47 different countries
httpalojabsces
Big Data Benchmarking
Online Repository
Web
Analytics
ALOJA research lines broad coverage
Techniques for obtaining CostPerformance Insights
Profiling
bull HPC Low-level
bull High Accuracy
bull Manual Analysis
Benchmarking
bull Iterate configs
bull HW and SW
bull Real executions
bull Log parsing and data sanitization
Analysis tools
bull Summarize large number of results
bull By criteria
bull Filter noise
bull Fast processing
Predictive Analytics
bull Automated modeling
bull Estimations
bull Virtual executions
bull Automated KD
Big Data Apps
Frameworks
Systems Clusters
Cloud ProvidersDatacenters
Evaluation of
Test different clusters and architectures ndash On-premise
bull Commodity high-end appliance low-power
ndash Cloud IaaS bull 32 different VMs in Azure
similar in other providers
ndash Cloud PaaS bull HDInsight EMR CloudBigData
Different access level ndash Full admin user-only request-
to-install everything ready queuing systems (SGE)
Different versions ndash Hadoop JVM Spark Hive
etchellip
ndash Other benchmarks
Problems ndash All systems though for PROD
bull Not for comparison
ndash No Azure support
ndash Many different packages
ndash No one-fits-all solution
Dev environments and testing
ndash Big Data usually requires a cluster to develop and test
Solution ndash Custom implementation
ndash Based in simple components
ndash Wrapping commands
Challenges (circa end 2013)
Benchmarking with ALOJArsquos open source tools
ALOJA Platform main components
2 Online Repository
bullExplore results
bullExecution details
bullCluster details
bullCosts
bullData sharing
3 Web Analytics
bullData views and evaluations
bullAggregates
bullAbstracted Metrics
bullJob characterization
bullMachine Learning
bullPredictions and clustering
1 Big Data Benchmarking
bullDeploy amp Provision
bullConf Management
bullParameter selection amp Queuing
bullPerf counters
bullLow-level instrumentation
bullApp logs
19
NGINX PHP MySQL
BASH Unix tools CLIs R SQL JS
Extending and collaborating in ALOJA
1 Install prerequisites ndash git vagrant VirtualBox
2 git clone httpsgithubcomAlojaalojagit
3 cd aloja
4 vagrant up
5 Open your browser at httplocalhost8080
6 Optional start the benchmarking cluster
vagrant up
Setting up a DEV environment
Installs a Web Server with sample data
Sets a local cluster to test benchmarking
Workflow in ALOJA
Cluster(s) definition
bull VM sizes
bull nodes
bull OS disks bull Capabilities
Execution plan
bull Start cluster
bull Setup
bull Exec Benchmarks
bull Cleanup
Import data
bull Convert perf metric
bull Parse logs
bull Import into DB
Evaluate data
bull Data views in Vagrant VM
bull Or httpalojabsces
PA and KD
bullPredictive Analytics
bullKnowledge Discovery
Historic
Repo
Commands and providers
Provisioning commands Providers
Connect
ndash Node and Cluster
ndash Builds SSH cmd line
bull SSH proxies
Deploy ndash Creates a cluster
ndash Sets SSH credentials
ndash If created updates config as needed
ndash If stopped starts nodes
Start Stop
Delete
Queue jobs to clusters
On-premise
ndash Custom settings for
clusters
bull Multiple disk types
bull Different architectures
Cloud IaaS
ndash Azure OpenStack
Rackspace AWS (testing)
Cloud PaaS
ndash HDInsight CloudBigData
EMR soon
Code at httpsgithubcomAlojaalojatreemasteraloja-deploy
Cluster and nodes definitions multi-provider abstraction
Steps to define a cluster Import defaults (if any) ndash Sets OS version
Select provider ndash Azure RackSpace AWS On-
premise vagranthellip
Name the cluster and size
Optional ndash Select VM type
ndash Attached disks
ndash Define metadata
ndash And costs
Nodes can also be defined ndash For Web share folders etc
You can logically split clusters
Azure 8-datanode sample load AZURE defaults
source $CONF_DIRcluster_defaultsconf
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 in GB
details
vmCores=4
vmRAM=7 in GB
costs
clusterCostHour=1584 in USD
clusterType=IaaS
Source sample httpsgithubcomAlojaalojablobmastershellconfcluster_al-08conf
Running benchmarks in ALOJA
Benchmarking with defaults
repo_locationaloja-benchrun_benchssh
To queue jobs
repo_locationshellexeqsh
Code at httpsgithubcomAlojaalojablobmasteraloja-benchrun_benchssh
Testing different configurations
Approaches
1 Config folders
2 Override variables
1 In benchmark_defaultsconf
2 In cluster config
3 Cmd line
1 Via parameters
run_benchssh -r 2 -m 10
1 Via shell globals HADOOP_VERSION=hadoop-271
BENCH_DATA_SIZE=1TB
Things to look for HW OS ndash Versions
ndash Disk config and mounts
SW ndash Replication
ndash Block sizes
ndash Compression
ndash IO buffers
Build your exec plan in a script and queue
Or follow ML recommendations
ALOJA-WEB
Entry point for explore the results collected from the executions ndash Provides insights on the obtained results through continuously evolving data views
Online DEMO at httpalojabsces
Online benchmarking results
28
2) ALOJA-WEB Online Repository
Entry point for explore the results collected from the executions
ndash Index of executions bull Quick glance of executions
bull Searchable Sortable
ndash Execution details bull Performance charts and histograms
bull Hadoop counters
bull Jobs and task details
Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup
Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs
Sharing results ndash Download executions ndash Add external executions
Documentation and References ndash Papers links and feature documentation
Available at httphadoopbsces
Comparing 3 runs on same cluster different configs
Mappers and reducers 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
400s 2 containers Local disk
800s 3 containers Local disk
600s 2 containers Remote disk
Comparing 3 runs on same cluster different configs
CPU utilization 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
Moderate iowait
Higher iowait
Very high iowait
Comparing 3 runs on same cluster different configs
CPU queues 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
1 blocked process
4 blocked processes
4 blocked processes (map phase)
Comparing 3 runs on same cluster different configs
CPU context switches 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
High context switches with 3
containers on a 2-core VM
Impact of SW configurations in Speedup (4 node clusters)
Number of mappers Compression algorithm
No comp
ZLIB
BZIP2
snappy
4m
6m
8m
10m
Speedup (higher is better)
Results using httphadoopbscesconfigimprovement
Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf
Impact of HW configurations in Speedup
Disks and Network Cloud remote volumes
Local only
1 Remote
2 Remotes
3 Remotes
3 Remotes tmp local
2 Remotes tmp local
1 Remotes tmp local
HDD-ETH
HDD-IB
SSD-ETH
SDD-IB
Speedup (higher is better)
Results using httphadoopbscesconfigimprovement
Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf
VM Size comparison (Azure)
Lower is better
Clusters by cost-effectiveness 12
URL httpalojabscesclustercosteffectiveness
bull Cluster ID reference
bull RL-06 = 8 performance1-8 VMs
bull RL-16 = 8 general1-8 VMs
bull RL-19 = 8 io1-15 VMs
bull RL-33 = 8 performance2-30 VMs
bull RL-30 = 8 io1-30 VMs
Performance2-30
Io1-30
Io1-15
General1-8
Performance1-8
Io1-30
Clusters by cost-effectiveness 22
URL httpalojabscesclustercosteffectiveness
Fastest Exec Cheapest exec
bull Cluster ID reference
bull RL-06 = 8 performance1-8 VMs
bull RL-16 = 8 general1-8 VMs
bull RL-19 = 8 io1-15 VMs
bull RL-33 = 8 performance2-30 VMs
bull RL-30 = 8 io1-30 VMs
CostPerformance Scalability of cluster size
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size
ndash Left Y Execution time (lower is better)
ndash Right Y Execution cost
Execution time Execution cost
Recommended size
Predictive Analytics and automated learning
Modeling Hadoop ndash Methodology
Methodology ndash 3-step learning process
ndash Different split sizes tested (10 le training le 50)
ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks
Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])
ndash Relative Absolute Errors between [010 025]
bull Depend on benchmark and of examples per benchmark
bull Some executions aremay be anomalies
40
ALOJA
Data-Set
Training
Validation
Testing
Model Select this
model
Final
Model Train
Test the model
Test the model
Tune algorithm re-train
NO
YES
Use case 1 Anomaly Detection
Anomaly Detection
ndash Model-based detection procedure
ndash Pass executions through the model
ndash Executions not fitting the model are considered ldquoout of the systemrdquo
Anomaly detection procedure Sample view from site
Use case 2 Guided Benchmarking ndash Method
Guided Benchmarking
ndash Best subset of configurations for modeling a Hadoop deployment
ndash Clustering to get the ldquorepresentative executionrdquo for each similar subset
of executions
ALOJA
Data-Set
Increase number of centers
NO
YES Clustering
Data-set
(centers) Model Is error
OK
Configs to
execute
Model Build
Build
Test
Reference
42
Use case 3 Knowledge Discovery
Make analyzing results easier
ndash Multi-variable visualization
ndash Trees separating relevant attributes
ndash Other interesting tools
43
pred_time
HDD SSD
Tree Descriptor
Disk=HDD
Net=ETH
IOFBuf=128KB rArr 2935s IOFBuf=64KB rArr 2942s Net=IB
IOFBuf=128KB rArr 3118s IOFBuf=64KB rArr 3125s Disk=SSD
Net=ETH
IOFBuf=128KB rArr 1248s IOFBuf=64KB rArr 1256s Net=IB
IOFBuf=128KB rArr 1233s IOFBuf=64KB rArr 124s1
Concluding remarks and reference
Concluding remarks
Benchmarking its fun or at leasthellip
ndash It will save you euroeuroeuro and allow you to scale
But it is also tough ndash The industry needs more transparency
ndash We still have a lot to dohellip
In ALOJA we provide the benchmarking scripts ndash And also de results that should be your first entry point
We are adding constantly new features ndash Benchmarks systems providers
It is an open initiate your invited to participate ndash Beta testers ndash Contributors
With predictive analytics we can automate and find tendencies faster ndash Especially to save in benchmarking costs and time
Find us around the conference for more details on the toolshellip
Fork our repo at httpsgithubcomAlojaaloja
ne
More info
ALOJA Benchmarking platform and online repository ndash httpalojabsces httpalojabscespublications
Benchmarking Big Data ndash httpwwwslidesharenetni_pobenchmarking-hadoop
BDOOP meetup group in Barcelona
Big Data Benchmarking Community (BDBC) mailing list ndash (~200 members from ~80organizations)
ndash httpcldssdscedubdbccommunity
Workshop Big Data Benchmarking (WBDB) ndash Next httpcldssdsceduwbdb2015ca
SPEC Research Big Data working group ndash httpresearchspecorgworking-groupsbig-data-working-
grouphtml
Slides and video ndash Michael Frank on Big Data benchmarking
bull httpwwwtele-taskdearchivepodcast20430
ndash Tilmann Rabl Big Data Benchmarking Tutorial bull httpwwwslidesharenettilmann_rablieee2014-tutorialbarurabl
wwwbsces
QampA
Thanks
Contact nicolaspoggibsces
The MareNostrum 3 Supercomputer
Over 1015 Floating Point Operations per
second
Nearly 50000 cores
1008 TB of main memory 2 PB of disk storage
70 distributed through PRACE
24 distributed through RES
6 for BSC-CNS use
Over 1015 Floating Point Operations per second
Nearly 50000 cores
1008 TB of main memory
2 PB of disk storage
Agenda
1 Intro on Hadoop
performance
1 Current scenario and
problematic
2 ALOJA project
1 Background
2 Open source tools
3 Benchmarking
1 Benchmarking workflow
2 DEMO
4 Results
1 HW and SW speedups
2 CostPerformance
3 Online results DEMO
5 Predictive Analytics and
learning
6 Future lines and conclusions
Intro Hadoop performance and ecosystem
Hadoop design
Hadoop was designed to solve complex data ndash Structured and non structured
ndash with [close to] linear scalability
ndash and application reliability
Simplifying the programming model ndash From MPI OpenMP CUDA hellip
Operating as a blackbox for data analysts buthellip ndash Complex runtime for admins
ndash YARN abstracts even more
Image source Hadoop the definitive guide
Hadoop: highly scalable, but…
Not a high-performance solution
Requires
– Design
• Cluster topology
– Setup
• OS, Hadoop config
– Fine tuning
• Iterative approach
• Time consuming
and extensive benchmarking
Setting up your Big Data system
Hadoop
– 100+ tunable parameters
– obscure and interrelated
• mapred.map/reduce.tasks.speculative.execution
• io.sort.mb: 100 (300)
• io.sort.record.percent: 5% (15%)
• io.sort.spill.percent: 80% (95 – 100%)
– Similar for Hive, Spark, HBase
Dominated by rules of thumb
– Number of containers in parallel
• 0.5 - 2 per CPU core
Large stack for tuning
Image source: Intel® Distribution for Apache Hadoop
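The containers rule of thumb can be turned into a quick calculation. This is a minimal sketch, reading the slide's range as 0.5 to 2 containers per CPU core; `container_range` is a hypothetical helper for illustration and not part of ALOJA or Hadoop.

```shell
#!/usr/bin/env bash
# Rule-of-thumb range of parallel containers per node: 0.5 to 2 per CPU core.
# container_range is a hypothetical helper, not an ALOJA or Hadoop command.
container_range() {
  local cores=$1
  # Integer arithmetic: half the cores (but at least 1) up to twice the cores.
  local min=$(( cores / 2 ))
  (( min < 1 )) && min=1
  local max=$(( cores * 2 ))
  echo "$min $max"
}

container_range 4   # prints "2 8"
```

The range only bounds the search; which value inside it works best still has to come from benchmarking.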
Product claims on performance and TCO
Ecosystem is not transparent
– Needs auditing
How do I set up my system? Too many options!
Default values in Apache source not ideal
Large and spread-out ecosystem
– Different distributions
– Product claims
Each job is different
– No one-size-fits-all solution
Cloud vs. on-premise
– IaaS
• Tens of different VMs to choose from
– PaaS
• HDInsight, CloudBigData, EMR
New economical HW
– SSDs, InfiniBand networking
The ALOJA project: research lines and challenges
BSC's project ALOJA: towards cost-effective Big Data
Open research project for improving the cost-effectiveness of Big Data deployments
Benchmarking and analysis tools
Online repository and largest Big Data benchmarking repo
– 50,000+ runs of HiBench, TPC-H, and [some] BigBench
– Over 100 HW configurations tested
• Of different node/VM types, disks, and networks
• Cloud: multi-cloud provider, including both IaaS and PaaS
• On-premise: high-end, HPC, commodity, low-power
Community
– Collaborations with industry and academia
– Presented at different conferences and workshops
– Visibility: 47 different countries
http://aloja.bsc.es
Big Data Benchmarking
Online Repository
Web Analytics
ALOJA research lines: broad coverage
Techniques for obtaining Cost/Performance insights
Profiling
• HPC, low-level
• High accuracy
• Manual analysis
Benchmarking
• Iterate configs
• HW and SW
• Real executions
• Log parsing and data sanitization
Analysis tools
• Summarize large number of results
• By criteria
• Filter noise
• Fast processing
Predictive Analytics
• Automated modeling
• Estimations
• Virtual executions
• Automated KD
Evaluation of
• Big Data Apps
• Frameworks
• Systems / Clusters
• Cloud Providers / Datacenters
Challenges (circa end 2013)
Test different clusters and architectures
– On-premise
• Commodity, high-end, appliance, low-power
– Cloud IaaS
• 32 different VMs in Azure, similar in other providers
– Cloud PaaS
• HDInsight, EMR, CloudBigData
Different access levels
– Full admin, user-only, request-to-install, everything ready, queuing systems (SGE)
Different versions
– Hadoop, JVM, Spark, Hive, etc.
– Other benchmarks
Problems
– All systems designed for PROD
• Not for comparison
– No Azure support
– Many different packages
– No one-size-fits-all solution
Dev environments and testing
– Big Data usually requires a cluster to develop and test
Solution
– Custom implementation
– Based on simple components
– Wrapping commands
Benchmarking with ALOJA's open source tools
ALOJA Platform: main components
1 Big Data Benchmarking
• Deploy & Provision
• Conf Management
• Parameter selection & Queuing
• Perf counters
• Low-level instrumentation
• App logs
2 Online Repository
• Explore results
• Execution details
• Cluster details
• Costs
• Data sharing
3 Web Analytics
• Data views and evaluations
• Aggregates
• Abstracted metrics
• Job characterization
• Machine Learning
• Predictions and clustering
NGINX, PHP, MySQL
BASH, Unix tools, CLIs, R, SQL, JS
Extending and collaborating in ALOJA
Setting up a DEV environment
1 Install prerequisites
– git, vagrant, VirtualBox
2 git clone https://github.com/Aloja/aloja.git
3 cd aloja
4 vagrant up
5 Open your browser at http://localhost:8080
6 Optional: start the benchmarking cluster
vagrant up
Installs a Web Server with sample data
Sets up a local cluster to test benchmarking
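Before running the steps above, it can help to check that the prerequisites are installed. The following is a rough sketch under the assumption that each tool is reachable on the PATH (VirtualBox through its `VBoxManage` CLI); `missing_tools` is a hypothetical helper, not an ALOJA script.

```shell
#!/usr/bin/env bash
# Report which of the given commands are not on PATH.
# missing_tools is a hypothetical helper for illustration, not an ALOJA script.
missing_tools() {
  local tool missing=()
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  echo "${missing[*]}"
}

# Prerequisites named on the slide (VirtualBox ships the VBoxManage CLI).
missing=$(missing_tools git vagrant VBoxManage)
if [ -n "$missing" ]; then
  echo "Install first: $missing"
else
  echo "All prerequisites found; continue with git clone, cd aloja, vagrant up"
fi
```

The check only confirms that the commands exist; it does not verify versions or that virtualization is enabled on the host.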
Workflow in ALOJA
Cluster(s) definition
• VM sizes
• Number of nodes
• OS, disks
• Capabilities
Execution plan
• Start cluster
• Setup
• Exec benchmarks
• Cleanup
Import data
• Convert perf metrics
• Parse logs
• Import into DB
Evaluate data
• Data views in Vagrant VM
• Or http://aloja.bsc.es
PA and KD
• Predictive Analytics
• Knowledge Discovery
Historic Repo
Commands and providers
Provisioning commands
Connect
– Node and Cluster
– Builds SSH cmd line
• SSH proxies
Deploy
– Creates a cluster
– Sets SSH credentials
– If created, updates config as needed
– If stopped, starts nodes
Start, Stop
Delete
Queue jobs to clusters
Providers
On-premise
– Custom settings for clusters
• Multiple disk types
• Different architectures
Cloud IaaS
– Azure, OpenStack, Rackspace, AWS (testing)
Cloud PaaS
– HDInsight, CloudBigData, EMR soon
Code at https://github.com/Aloja/aloja/tree/master/aloja-deploy
Cluster and nodes definitions: multi-provider abstraction
Steps to define a cluster
Import defaults (if any)
– Sets OS version
Select provider
– Azure, RackSpace, AWS, On-premise, vagrant…
Name the cluster and size
Optional
– Select VM type
– Attached disks
– Define metadata
– And costs
Nodes can also be defined
– For Web, shared folders, etc.
You can logically split clusters
Azure 8-datanode sample
#load AZURE defaults
source "$CONF_DIR/cluster_defaults.conf"
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 #in GB
#details
vmCores=4
vmRAM=7 #in GB
#costs
clusterCostHour=1584 #in USD
clusterType=IaaS
Source sample: https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf
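Because a cluster definition is a plain file of shell variables, it can be sourced and summarized by any script. The sketch below writes a throwaway definition with variable names taken from the sample and derives cluster-wide totals. It assumes that `attachedVolumes` and `diskSize` are per node and per volume respectively, which the slide does not state; `cluster_totals` is a hypothetical helper, not part of ALOJA.

```shell
#!/usr/bin/env bash
# Write a throwaway cluster definition using the variable names from the sample.
conf_file=$(mktemp)
cat > "$conf_file" <<'EOF'
clusterName=azure-large-8
numberOfNodes=8
vmCores=4
vmRAM=7   # in GB
attachedVolumes=3
diskSize=1024   # in GB
EOF

# cluster_totals is a hypothetical helper, not part of ALOJA.
# Assumption: attachedVolumes is per node and diskSize is per volume.
cluster_totals() {
  # Source in a subshell so the definition does not leak into the caller.
  (
    source "$1"
    echo "$clusterName cores=$(( numberOfNodes * vmCores ))" \
         "ram_gb=$(( numberOfNodes * vmRAM ))" \
         "disk_gb=$(( numberOfNodes * attachedVolumes * diskSize ))"
  )
}

cluster_totals "$conf_file"   # prints "azure-large-8 cores=32 ram_gb=56 disk_gb=24576"
rm -f "$conf_file"
```

Sourcing in a subshell keeps several cluster definitions from overwriting each other when they are compared in one script.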
Running benchmarks in ALOJA
Benchmarking with defaults
repo_location/aloja-bench/run_benchs.sh
To queue jobs
repo_location/shell/exeq.sh
Code at https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh
Testing different configurations
Approaches
1 Config folders
2 Override variables
– In benchmark_defaults.conf
– In cluster config
3 Cmd line
– Via parameters
• run_benchs.sh -r 2 -m 10
– Via shell globals
• HADOOP_VERSION=hadoop-2.7.1
• BENCH_DATA_SIZE=1TB
Things to look for
HW, OS
– Versions
– Disk config and mounts
SW
– Replication
– Block sizes
– Compression
– IO buffers
Build your exec plan in a script and queue
Or follow ML recommendations
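An execution plan can be as simple as a loop that emits one command line per configuration. This is only a sketch: the `-r` and `-m` flags are copied from the slide without assuming their meaning, the value lists are made up, and `build_exec_plan` is a hypothetical helper. It prints the commands instead of running them, so the plan can be reviewed before queuing.

```shell
#!/usr/bin/env bash
# Print one run_benchs.sh command line per combination of two parameters.
# The -r and -m flags are copied from the slide; the value lists are made up.
build_exec_plan() {
  local r m
  for r in 1 2 3; do
    for m in 4 6 8 10; do
      echo "run_benchs.sh -r $r -m $m"
    done
  done
}

build_exec_plan   # 12 lines, one per combination
```

Each added parameter multiplies the number of runs, which is why the deck suggests either scripting the plan or letting the ML recommendations narrow it down.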
ALOJA-WEB
Entry point to explore the results collected from the executions
– Provides insights on the obtained results through continuously evolving data views
Online DEMO at http://aloja.bsc.es
Online benchmarking results
2) ALOJA-WEB Online Repository
Entry point to explore the results collected from the executions
– Index of executions
• Quick glance of executions
• Searchable, sortable
– Execution details
• Performance charts and histograms
• Hadoop counters
• Jobs and task details
Data management of benchmark executions
– Data importing from different clusters
– Execution validation
– Data management and backup
Cluster definitions
– Cluster capabilities (resources)
– Cluster costs
Sharing results
– Download executions
– Add external executions
Documentation and references
– Papers, links, and feature documentation
Available at http://hadoop.bsc.es
Comparing 3 runs on the same cluster with different configs
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
Mappers and reducers, 48-node cluster
• 400s: 2 containers, local disk
• 800s: 3 containers, local disk
• 600s: 2 containers, remote disk
Comparing 3 runs on the same cluster with different configs
CPU utilization, 48-node cluster
• Moderate iowait
• Higher iowait
• Very high iowait
Comparing 3 runs on the same cluster with different configs
CPU queues, 48-node cluster
• 1 blocked process
• 4 blocked processes
• 4 blocked processes (map phase)
Comparing 3 runs on the same cluster with different configs
CPU context switches, 48-node cluster
• High context switches with 3 containers on a 2-core VM
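The three execution times above (400s, 800s, 600s) can be compared as speedups over the slowest run. A minimal sketch, with `speedup` as a hypothetical helper; choosing the slowest run as the baseline is an assumption made here for illustration.

```shell
#!/usr/bin/env bash
# speedup BASELINE_SECONDS RUN_SECONDS: how many times faster the run is
# than the baseline (values above 1 mean faster). Hypothetical helper.
speedup() {
  LC_ALL=C awk -v base="$1" -v run="$2" 'BEGIN { printf "%.2f\n", base / run }'
}

# Execution times from the slide, using the slowest run (800s) as baseline.
speedup 800 400   # 2 containers, local disk  -> 2.00
speedup 800 600   # 2 containers, remote disk -> 1.33
speedup 800 800   # 3 containers, local disk  -> 1.00
```

On these numbers, dropping from 3 to 2 containers on local disk halves the execution time, a larger effect than the choice between local and remote disk.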
Impact of SW configurations on speedup (4-node clusters)
[Charts: speedup (higher is better) by number of mappers (4m, 6m, 8m, 10m) and by compression algorithm (no compression, ZLIB, BZIP2, Snappy)]
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
Impact of HW configurations on speedup
[Charts: speedup (higher is better) by disks and network (HDD-ETH, HDD-IB, SSD-ETH, SSD-IB) and by cloud remote volumes (local only; 1, 2, or 3 remotes; 1, 2, or 3 remotes with tmp local)]
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
VM Size comparison (Azure)
[Chart: lower is better]
Clusters by cost-effectiveness (1/2)
URL: http://aloja.bsc.es/clustercosteffectiveness
Cluster ID reference
• RL-06 = 8 performance1-8 VMs
• RL-16 = 8 general1-8 VMs
• RL-19 = 8 io1-15 VMs
• RL-33 = 8 performance2-30 VMs
• RL-30 = 8 io1-30 VMs
[Chart labels: Performance2-30, Io1-30, Io1-15, General1-8, Performance1-8]
Clusters by cost-effectiveness (2/2)
URL: http://aloja.bsc.es/clustercosteffectiveness
[Chart highlights: fastest exec, cheapest exec]
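The distinction between the fastest and the cheapest execution comes down to multiplying execution time by the cluster's hourly cost. The sketch below uses made-up cluster IDs, times, and prices (not ALOJA results), and `exec_cost` is a hypothetical helper.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # keep "." as the decimal separator in awk and sort

# exec_cost SECONDS COST_PER_HOUR: cost of one execution. Hypothetical helper.
exec_cost() {
  awk -v s="$1" -v c="$2" 'BEGIN { printf "%.2f\n", s / 3600 * c }'
}

# Made-up sample: cluster id, execution time (s), cluster cost per hour.
runs="RL-A 1800 4.00
RL-B 1200 9.00
RL-C 3600 1.50"

# Fastest execution: smallest time. Cheapest: smallest time * hourly cost.
fastest=$(echo "$runs" | sort -t' ' -k2,2n | head -n 1 | cut -d' ' -f1)
cheapest=$(echo "$runs" | awk '{ printf "%s %.4f\n", $1, $2 / 3600 * $3 }' \
  | sort -t' ' -k2,2n | head -n 1 | cut -d' ' -f1)
echo "fastest=$fastest cheapest=$cheapest"   # prints "fastest=RL-B cheapest=RL-C"
```

In this sample the fastest cluster is also the most expensive per execution, which is the kind of trade-off the cost-effectiveness view is meant to expose.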
Cost/Performance: scalability of cluster size
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size
– X axis: number of datanodes (cluster size)
– Left Y: execution time (lower is better)
– Right Y: execution cost
[Chart: execution time and execution cost by cluster size, with the recommended size marked]
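One way to arrive at a recommended size is to pick the cluster size with the lowest execution cost. This is an assumption about the criterion, not a description of how the ALOJA screen computes it; the sample data and per-node price below are made up, and `recommended_size` is a hypothetical helper.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # keep "." as the decimal separator in awk

# Made-up sample data: number of datanodes and execution time in seconds.
samples="4 6000
8 2200
16 1500
32 1300"

# recommended_size PRICE_PER_NODE_HOUR: cluster size with the lowest run cost,
# where cost = nodes * price per node-hour * execution time in hours.
recommended_size() {
  echo "$samples" | awk -v p="$1" '
    { cost = $1 * p * $2 / 3600
      if (NR == 1 || cost < best) { best = cost; size = $1 } }
    END { print size }'
}

recommended_size 0.50   # prints "8"
```

With these numbers, going from 4 to 8 nodes cuts time enough to lower the cost, while 16 and 32 nodes keep reducing time but cost more per execution.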
Predictive Analytics and automated learning
Modeling Hadoop – Methodology
Methodology
– 3-step learning process
– Different split sizes tested (10% ≤ training ≤ 50%)
– Different learning algorithms
• Regression trees, nearest-neighbor methods, linear/multinomial regressions, neural networks
Learning results
– Mean Absolute Errors ~250s (ranges in [100s, 6000s])
– Relative Absolute Errors between [0.10, 0.25]
• Depend on benchmark and number of examples per benchmark
• Some executions are/may be anomalies
[Diagram: 3-step learning process. The ALOJA data-set is split into training, validation, and testing sets; a model is trained and tested on the validation set; if it is not good enough (NO) the algorithm is tuned and re-trained; otherwise (YES) it is selected as the final model and tested on the testing set.]
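The two error measures reported above can be computed from pairs of observed and predicted execution times. A minimal sketch, assuming the usual definition of Relative Absolute Error (total absolute error divided by the total absolute deviation of the observations from their mean); the input values are made up and `error_metrics` is a hypothetical helper.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # keep "." as the decimal separator in awk

# error_metrics reads "observed predicted" pairs (seconds) from stdin and
# prints the Mean Absolute Error and the Relative Absolute Error, where
# RAE = sum|pred - obs| / sum|obs - mean(obs)|. Hypothetical helper.
error_metrics() {
  awk '
    function abs(x) { return x < 0 ? -x : x }
    { obs[NR] = $1; pred[NR] = $2; total += $1 }
    END {
      mean = total / NR
      for (i = 1; i <= NR; i++) {
        err += abs(pred[i] - obs[i])
        dev += abs(obs[i] - mean)
      }
      printf "MAE=%.1f RAE=%.2f\n", err / NR, err / dev
    }'
}

# Made-up execution times, not ALOJA results.
printf '%s\n' "1000 1100" "2000 1800" "3000 3300" | error_metrics
# prints "MAE=200.0 RAE=0.30"
```

An RAE below 1 means the model predicts better than always answering with the mean execution time.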
Use case 1: Anomaly Detection
Anomaly Detection
– Model-based detection procedure
– Pass executions through the model
– Executions not fitting the model are considered "out of the system"
[Figures: anomaly detection procedure; sample view from site]
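Model-based detection can be sketched as comparing each observed execution time against the model's prediction and flagging large deviations. The relative-error criterion and the 0.5 threshold are arbitrary choices for illustration, not the procedure ALOJA uses; the data is made up and `flag_anomalies` is a hypothetical helper.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # keep "." as the decimal separator in awk

# flag_anomalies THRESHOLD: reads "id observed predicted" lines from stdin and
# prints the ids whose relative error |obs - pred| / pred exceeds THRESHOLD.
flag_anomalies() {
  awk -v t="$1" '
    { diff = $2 - $3; if (diff < 0) diff = -diff
      if (diff / $3 > t) print $1 }'
}

# Made-up executions; the 0.5 threshold is an arbitrary illustration.
printf '%s\n' "exec1 410 400" "exec2 1900 600" "exec3 780 800" | flag_anomalies 0.5
# prints "exec2"
```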
Use case 2: Guided Benchmarking – Method
Guided Benchmarking
– Best subset of configurations for modeling a Hadoop deployment
– Clustering to get the "representative execution" for each similar subset of executions
[Diagram: the ALOJA data-set is clustered into centers and a model is built from them; the model is tested against a reference model built from the full data-set. If the error is not OK (NO), the number of centers is increased; if it is OK (YES), the centers are the configs to execute.]
Use case 3: Knowledge Discovery
Make analyzing results easier
– Multi-variable visualization
– Trees separating relevant attributes
– Other interesting tools
[Chart: pred_time for HDD vs. SSD]
Tree Descriptor
Disk=HDD
– Net=ETH
• IOFBuf=128KB ⇒ 2935s
• IOFBuf=64KB ⇒ 2942s
– Net=IB
• IOFBuf=128KB ⇒ 3118s
• IOFBuf=64KB ⇒ 3125s
Disk=SSD
– Net=ETH
• IOFBuf=128KB ⇒ 1248s
• IOFBuf=64KB ⇒ 1256s
– Net=IB
• IOFBuf=128KB ⇒ 1233s
• IOFBuf=64KB ⇒ 1241s
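A tree descriptor like this one reads naturally as a lookup from attribute values to a predicted time. The sketch below encodes only the four IOFBuf=128KB leaves, with values copied from the slide; `predict_time` is a hypothetical helper, not an ALOJA function.

```shell
#!/usr/bin/env bash
# predict_time DISK NET: predicted time in seconds for the IOFBuf=128KB leaves.
# Values are copied from the slide; the function is a hypothetical helper.
predict_time() {
  case "$1/$2" in
    HDD/ETH) echo 2935 ;;
    HDD/IB)  echo 3118 ;;
    SSD/ETH) echo 1248 ;;
    SSD/IB)  echo 1233 ;;
    *) echo "unknown configuration: $1/$2" >&2; return 1 ;;
  esac
}

# Predicted gain from moving HDD to SSD over Ethernet.
LC_ALL=C awk -v hdd="$(predict_time HDD ETH)" -v ssd="$(predict_time SSD ETH)" \
  'BEGIN { printf "%.2f\n", hdd / ssd }'   # prints "2.35"
```

The top split is on disk type because it separates the predictions the most; network and buffer size move the result by a much smaller amount.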
Concluding remarks and references
Concluding remarks
Benchmarking: it's fun, or at least…
– It will save you €€€ and allow you to scale
But it is also tough
– The industry needs more transparency
– We still have a lot to do…
In ALOJA we provide the benchmarking scripts
– And also the results, which should be your first entry point
We are constantly adding new features
– Benchmarks, systems, providers
It is an open initiative; you're invited to participate
– Beta testers
– Contributors
With predictive analytics we can automate and find trends faster
– Especially to save on benchmarking costs and time
Agenda
1 Intro on Hadoop
performance
1 Current scenario and
problematic
2 ALOJA project
1 Background
2 Open source tools
3 Benchmarking
1 Benchmarking workflow
2 DEMO
4 Results
1 HW and SW speedups
2 CostPerformance
3 Online results DEMO
5 Predictive Analytics and
learning
6 Future lines and conclusions
Intro Hadoop performance and ecosystem
Hadoop design
Hadoop was designed to solve complex data ndash Structured and non structured
ndash with [close to] linear scalability
ndash and application reliability
Simplifying the programming model ndash From MPI OpenMP CUDA hellip
Operating as a blackbox for data analysts buthellip ndash Complex runtime for admins
ndash YARN abstracts even more
Image source Hadoop the definitive guide
Hadoop highly-scalable buthellip
Not a high-performance solution
Requires
ndash Design
bull Clusters topology clusters
ndash Setup
bull OS Hadoop config
ndash Fine tuning required
bull Iterative approach
bull Time consuming
and extensive benchmarking
Setting up your Big Data system
Hadoop
ndash gt 100+ tunable parameters
ndash obscure and interrelated
bull mapredmapreducetasksspeculativeexecution
bull iosortmb 100 (300)
bull iosortrecordpercent 5 (15)
bull iosortspillpercent 80 (95 ndash 100)
ndash Similar for Hive Spark HBase
Dominated by rules-of-thumb
ndash Number of containers in parallel
bull 05 - 2 per CPU core
Large stack for tuning
Image source Intelreg Distribution for Apache Hadoop
Product claims on performance and TCO
Eco-system is not transparent
ndash Needs auditing
How do I set my system too many options
Default values in Apache source not ideal
Large and spread eco system
ndash Different distributions
ndash Product claims
Each job is different
ndash No one-fits-all solution
Cloud vs On-premise
ndash IaaS
bull Tens of different VMs to choose
ndash PaaS
bull HDInsight CloudBigData EMR
New economic HW
ndash SSDs InfiniBand Networking
The ALOJA project research lines and challenges
BSCrsquos project ALOJA towards cost-effective Big Data
Open research project for improving the cost-effectiveness
of Big Data deployments
Benchmarking and Analysis tools
Online repository and largest Big Data repo
ndash 50000+ runs of HiBench TPC-H and [some] BigBench
ndash Over 100 HW configurations tested bull Of dif ferent NodeVM disks and networks
bull Cloud Multi-cloud provider including both IaaS and PaaS
bull On-premise High-end HPC commodity low-power
Community ndash Collaborations with industry and Academia
ndash Presented in different conferences and workshops
ndash Visibility 47 different countries
httpalojabsces
Big Data Benchmarking
Online Repository
Web
Analytics
ALOJA research lines broad coverage
Techniques for obtaining CostPerformance Insights
Profiling
bull HPC Low-level
bull High Accuracy
bull Manual Analysis
Benchmarking
bull Iterate configs
bull HW and SW
bull Real executions
bull Log parsing and data sanitization
Analysis tools
bull Summarize large number of results
bull By criteria
bull Filter noise
bull Fast processing
Predictive Analytics
bull Automated modeling
bull Estimations
bull Virtual executions
bull Automated KD
Big Data Apps
Frameworks
Systems Clusters
Cloud ProvidersDatacenters
Evaluation of
Test different clusters and architectures ndash On-premise
bull Commodity high-end appliance low-power
ndash Cloud IaaS bull 32 different VMs in Azure
similar in other providers
ndash Cloud PaaS bull HDInsight EMR CloudBigData
Different access level ndash Full admin user-only request-
to-install everything ready queuing systems (SGE)
Different versions ndash Hadoop JVM Spark Hive
etchellip
ndash Other benchmarks
Problems ndash All systems though for PROD
bull Not for comparison
ndash No Azure support
ndash Many different packages
ndash No one-fits-all solution
Dev environments and testing
ndash Big Data usually requires a cluster to develop and test
Solution ndash Custom implementation
ndash Based in simple components
ndash Wrapping commands
Challenges (circa end 2013)
Benchmarking with ALOJArsquos open source tools
ALOJA Platform main components
2 Online Repository
bullExplore results
bullExecution details
bullCluster details
bullCosts
bullData sharing
3 Web Analytics
bullData views and evaluations
bullAggregates
bullAbstracted Metrics
bullJob characterization
bullMachine Learning
bullPredictions and clustering
1 Big Data Benchmarking
bullDeploy amp Provision
bullConf Management
bullParameter selection amp Queuing
bullPerf counters
bullLow-level instrumentation
bullApp logs
19
NGINX PHP MySQL
BASH Unix tools CLIs R SQL JS
Extending and collaborating in ALOJA
1 Install prerequisites ndash git vagrant VirtualBox
2 git clone httpsgithubcomAlojaalojagit
3 cd aloja
4 vagrant up
5 Open your browser at httplocalhost8080
6 Optional start the benchmarking cluster
vagrant up
Setting up a DEV environment
Installs a Web Server with sample data
Sets a local cluster to test benchmarking
Workflow in ALOJA
Cluster(s) definition
bull VM sizes
bull nodes
bull OS disks bull Capabilities
Execution plan
bull Start cluster
bull Setup
bull Exec Benchmarks
bull Cleanup
Import data
bull Convert perf metric
bull Parse logs
bull Import into DB
Evaluate data
bull Data views in Vagrant VM
bull Or httpalojabsces
PA and KD
bullPredictive Analytics
bullKnowledge Discovery
Historic
Repo
Commands and providers
Provisioning commands Providers
Connect
ndash Node and Cluster
ndash Builds SSH cmd line
bull SSH proxies
Deploy ndash Creates a cluster
ndash Sets SSH credentials
ndash If created updates config as needed
ndash If stopped starts nodes
Start Stop
Delete
Queue jobs to clusters
On-premise
ndash Custom settings for
clusters
bull Multiple disk types
bull Different architectures
Cloud IaaS
ndash Azure OpenStack
Rackspace AWS (testing)
Cloud PaaS
ndash HDInsight CloudBigData
EMR soon
Code at httpsgithubcomAlojaalojatreemasteraloja-deploy
Cluster and nodes definitions multi-provider abstraction
Steps to define a cluster Import defaults (if any) ndash Sets OS version
Select provider ndash Azure RackSpace AWS On-
premise vagranthellip
Name the cluster and size
Optional ndash Select VM type
ndash Attached disks
ndash Define metadata
ndash And costs
Nodes can also be defined ndash For Web share folders etc
You can logically split clusters
Azure 8-datanode sample load AZURE defaults
source $CONF_DIRcluster_defaultsconf
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 in GB
details
vmCores=4
vmRAM=7 in GB
costs
clusterCostHour=1584 in USD
clusterType=IaaS
Source sample httpsgithubcomAlojaalojablobmastershellconfcluster_al-08conf
Running benchmarks in ALOJA
Benchmarking with defaults
repo_locationaloja-benchrun_benchssh
To queue jobs
repo_locationshellexeqsh
Code at httpsgithubcomAlojaalojablobmasteraloja-benchrun_benchssh
Testing different configurations
Approaches
1 Config folders
2 Override variables
1 In benchmark_defaultsconf
2 In cluster config
3 Cmd line
1 Via parameters
run_benchssh -r 2 -m 10
1 Via shell globals HADOOP_VERSION=hadoop-271
BENCH_DATA_SIZE=1TB
Things to look for HW OS ndash Versions
ndash Disk config and mounts
SW ndash Replication
ndash Block sizes
ndash Compression
ndash IO buffers
Build your exec plan in a script and queue
Or follow ML recommendations
ALOJA-WEB
Entry point for explore the results collected from the executions ndash Provides insights on the obtained results through continuously evolving data views
Online DEMO at httpalojabsces
Online benchmarking results
28
2) ALOJA-WEB Online Repository
Entry point for explore the results collected from the executions
ndash Index of executions bull Quick glance of executions
bull Searchable Sortable
ndash Execution details bull Performance charts and histograms
bull Hadoop counters
bull Jobs and task details
Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup
Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs
Sharing results ndash Download executions ndash Add external executions
Documentation and References ndash Papers links and feature documentation
Available at httphadoopbsces
Comparing 3 runs on same cluster different configs
Mappers and reducers 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
400s 2 containers Local disk
800s 3 containers Local disk
600s 2 containers Remote disk
Comparing 3 runs on same cluster different configs
CPU utilization 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
Moderate iowait
Higher iowait
Very high iowait
Comparing 3 runs on same cluster different configs
CPU queues 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
1 blocked process
4 blocked processes
4 blocked processes (map phase)
Comparing 3 runs on same cluster different configs
CPU context switches 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
High context switches with 3
containers on a 2-core VM
Impact of SW configurations in Speedup (4 node clusters)
Number of mappers Compression algorithm
No comp
ZLIB
BZIP2
snappy
4m
6m
8m
10m
Speedup (higher is better)
Results using httphadoopbscesconfigimprovement
Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf
Impact of HW configurations in Speedup
Disks and Network Cloud remote volumes
Local only
1 Remote
2 Remotes
3 Remotes
3 Remotes tmp local
2 Remotes tmp local
1 Remotes tmp local
HDD-ETH
HDD-IB
SSD-ETH
SDD-IB
Speedup (higher is better)
Results using httphadoopbscesconfigimprovement
Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf
VM Size comparison (Azure)
Lower is better
Clusters by cost-effectiveness 12
URL httpalojabscesclustercosteffectiveness
bull Cluster ID reference
bull RL-06 = 8 performance1-8 VMs
bull RL-16 = 8 general1-8 VMs
bull RL-19 = 8 io1-15 VMs
bull RL-33 = 8 performance2-30 VMs
bull RL-30 = 8 io1-30 VMs
Performance2-30
Io1-30
Io1-15
General1-8
Performance1-8
Io1-30
Clusters by cost-effectiveness 22
URL httpalojabscesclustercosteffectiveness
Fastest Exec Cheapest exec
bull Cluster ID reference
bull RL-06 = 8 performance1-8 VMs
bull RL-16 = 8 general1-8 VMs
bull RL-19 = 8 io1-15 VMs
bull RL-33 = 8 performance2-30 VMs
bull RL-30 = 8 io1-30 VMs
CostPerformance Scalability of cluster size
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size
ndash Left Y Execution time (lower is better)
ndash Right Y Execution cost
Execution time Execution cost
Recommended size
Predictive Analytics and automated learning
Modeling Hadoop ndash Methodology
Methodology ndash 3-step learning process
ndash Different split sizes tested (10 le training le 50)
ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks
Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])
ndash Relative Absolute Errors between [010 025]
bull Depend on benchmark and of examples per benchmark
bull Some executions aremay be anomalies
40
ALOJA
Data-Set
Training
Validation
Testing
Model Select this
model
Final
Model Train
Test the model
Test the model
Tune algorithm re-train
NO
YES
Use case 1 Anomaly Detection
Anomaly Detection
ndash Model-based detection procedure
ndash Pass executions through the model
ndash Executions not fitting the model are considered ldquoout of the systemrdquo
Anomaly detection procedure Sample view from site
Use case 2 Guided Benchmarking ndash Method
Guided Benchmarking
ndash Best subset of configurations for modeling a Hadoop deployment
ndash Clustering to get the ldquorepresentative executionrdquo for each similar subset
of executions
ALOJA
Data-Set
Increase number of centers
NO
YES Clustering
Data-set
(centers) Model Is error
OK
Configs to
execute
Model Build
Build
Test
Reference
42
Use case 3 Knowledge Discovery
Make analyzing results easier
ndash Multi-variable visualization
ndash Trees separating relevant attributes
ndash Other interesting tools
43
pred_time
HDD SSD
Tree Descriptor
Disk=HDD
Net=ETH
IOFBuf=128KB rArr 2935s IOFBuf=64KB rArr 2942s Net=IB
IOFBuf=128KB rArr 3118s IOFBuf=64KB rArr 3125s Disk=SSD
Net=ETH
IOFBuf=128KB rArr 1248s IOFBuf=64KB rArr 1256s Net=IB
IOFBuf=128KB rArr 1233s IOFBuf=64KB rArr 124s1
Concluding remarks and reference
Concluding remarks
Benchmarking its fun or at leasthellip
ndash It will save you euroeuroeuro and allow you to scale
But it is also tough ndash The industry needs more transparency
ndash We still have a lot to dohellip
In ALOJA we provide the benchmarking scripts ndash And also de results that should be your first entry point
We are adding constantly new features ndash Benchmarks systems providers
It is an open initiate your invited to participate ndash Beta testers ndash Contributors
With predictive analytics we can automate and find tendencies faster ndash Especially to save in benchmarking costs and time
Find us around the conference for more details on the toolshellip
Fork our repo at httpsgithubcomAlojaaloja
ne
More info
ALOJA Benchmarking platform and online repository ndash httpalojabsces httpalojabscespublications
Benchmarking Big Data ndash httpwwwslidesharenetni_pobenchmarking-hadoop
BDOOP meetup group in Barcelona
Big Data Benchmarking Community (BDBC) mailing list ndash (~200 members from ~80organizations)
ndash httpcldssdscedubdbccommunity
Workshop Big Data Benchmarking (WBDB) ndash Next httpcldssdsceduwbdb2015ca
SPEC Research Big Data working group ndash httpresearchspecorgworking-groupsbig-data-working-
grouphtml
Slides and video ndash Michael Frank on Big Data benchmarking
bull httpwwwtele-taskdearchivepodcast20430
ndash Tilmann Rabl Big Data Benchmarking Tutorial bull httpwwwslidesharenettilmann_rablieee2014-tutorialbarurabl
wwwbsces
QampA
Thanks
Contact nicolaspoggibsces
Intro Hadoop performance and ecosystem
Hadoop design
Hadoop was designed to solve complex data ndash Structured and non structured
ndash with [close to] linear scalability
ndash and application reliability
Simplifying the programming model ndash From MPI OpenMP CUDA hellip
Operating as a blackbox for data analysts buthellip ndash Complex runtime for admins
ndash YARN abstracts even more
Image source Hadoop the definitive guide
Hadoop highly-scalable buthellip
Not a high-performance solution
Requires
ndash Design
bull Clusters topology clusters
ndash Setup
bull OS Hadoop config
ndash Fine tuning required
bull Iterative approach
bull Time consuming
and extensive benchmarking
Setting up your Big Data system
Hadoop
ndash gt 100+ tunable parameters
ndash obscure and interrelated
bull mapredmapreducetasksspeculativeexecution
bull iosortmb 100 (300)
bull iosortrecordpercent 5 (15)
bull iosortspillpercent 80 (95 ndash 100)
ndash Similar for Hive Spark HBase
Dominated by rules-of-thumb
ndash Number of containers in parallel
bull 05 - 2 per CPU core
Large stack for tuning
Image source Intelreg Distribution for Apache Hadoop
Product claims on performance and TCO
Eco-system is not transparent
ndash Needs auditing
How do I set my system too many options
Default values in Apache source not ideal
Large and spread eco system
ndash Different distributions
ndash Product claims
Each job is different
ndash No one-fits-all solution
Cloud vs On-premise
ndash IaaS
bull Tens of different VMs to choose
ndash PaaS
bull HDInsight CloudBigData EMR
New economic HW
ndash SSDs InfiniBand Networking
The ALOJA project research lines and challenges
BSCrsquos project ALOJA towards cost-effective Big Data
Open research project for improving the cost-effectiveness
of Big Data deployments
Benchmarking and Analysis tools
Online repository and largest Big Data repo
ndash 50000+ runs of HiBench TPC-H and [some] BigBench
ndash Over 100 HW configurations tested bull Of dif ferent NodeVM disks and networks
bull Cloud Multi-cloud provider including both IaaS and PaaS
bull On-premise High-end HPC commodity low-power
Community ndash Collaborations with industry and Academia
ndash Presented in different conferences and workshops
ndash Visibility 47 different countries
httpalojabsces
Big Data Benchmarking
Online Repository
Web
Analytics
ALOJA research lines broad coverage
Techniques for obtaining CostPerformance Insights
Profiling
bull HPC Low-level
bull High Accuracy
bull Manual Analysis
Benchmarking
bull Iterate configs
bull HW and SW
bull Real executions
bull Log parsing and data sanitization
Analysis tools
bull Summarize large number of results
bull By criteria
bull Filter noise
bull Fast processing
Predictive Analytics
bull Automated modeling
bull Estimations
bull Virtual executions
bull Automated KD
Big Data Apps
Frameworks
Systems Clusters
Cloud ProvidersDatacenters
Evaluation of
Test different clusters and architectures ndash On-premise
bull Commodity high-end appliance low-power
ndash Cloud IaaS bull 32 different VMs in Azure
similar in other providers
ndash Cloud PaaS bull HDInsight EMR CloudBigData
Different access level ndash Full admin user-only request-
to-install everything ready queuing systems (SGE)
Different versions ndash Hadoop JVM Spark Hive
etchellip
ndash Other benchmarks
Problems ndash All systems though for PROD
bull Not for comparison
ndash No Azure support
ndash Many different packages
ndash No one-fits-all solution
Dev environments and testing
ndash Big Data usually requires a cluster to develop and test
Solution ndash Custom implementation
ndash Based in simple components
ndash Wrapping commands
Challenges (circa end 2013)
Benchmarking with ALOJArsquos open source tools
ALOJA Platform main components
2 Online Repository
bullExplore results
bullExecution details
bullCluster details
bullCosts
bullData sharing
3 Web Analytics
bullData views and evaluations
bullAggregates
bullAbstracted Metrics
bullJob characterization
bullMachine Learning
bullPredictions and clustering
1 Big Data Benchmarking
bullDeploy amp Provision
bullConf Management
bullParameter selection amp Queuing
bullPerf counters
bullLow-level instrumentation
bullApp logs
19
NGINX PHP MySQL
BASH Unix tools CLIs R SQL JS
Extending and collaborating in ALOJA
1 Install prerequisites ndash git vagrant VirtualBox
2 git clone httpsgithubcomAlojaalojagit
3 cd aloja
4 vagrant up
5 Open your browser at httplocalhost8080
6 Optional start the benchmarking cluster
vagrant up
Setting up a DEV environment
Installs a Web Server with sample data
Sets a local cluster to test benchmarking
Workflow in ALOJA
Cluster(s) definition
• VM sizes
• Number of nodes
• OS, disks
• Capabilities
Execution plan
• Start cluster
• Setup
• Exec Benchmarks
• Cleanup
Import data
• Convert perf metrics
• Parse logs
• Import into DB
Evaluate data
• Data views in Vagrant VM
• Or http://aloja.bsc.es
PA and KD
• Predictive Analytics
• Knowledge Discovery
Historic Repo
Commands and providers
Provisioning commands
Connect
- Node and Cluster
- Builds SSH cmd line
  • SSH proxies
Deploy
- Creates a cluster
- Sets SSH credentials
- If created, updates config as needed
- If stopped, starts nodes
Start, Stop
Delete
Queue jobs to clusters
Providers
On-premise
- Custom settings for clusters
  • Multiple disk types
  • Different architectures
Cloud IaaS
- Azure, OpenStack, Rackspace, AWS (testing)
Cloud PaaS
- HDInsight, CloudBigData, EMR soon
Code at https://github.com/Aloja/aloja/tree/master/aloja-deploy
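The Connect command is described as building an SSH command line, optionally through SSH proxies. A minimal sketch of that idea follows, assuming OpenSSH's `-J` (ProxyJump) option. The function name `build_ssh_cmd` and the user, host and port values are hypothetical and are not taken from aloja-deploy.

```shell
# Hypothetical helper: print an ssh command line for a node,
# optionally hopping through a proxy (jump) host.
# Usage: build_ssh_cmd USER HOST [PORT] [PROXY]
build_ssh_cmd() {
  local cmd
  cmd="ssh -p ${3:-22}"
  if [ -n "${4:-}" ]; then
    cmd="$cmd -J $4"   # -J is OpenSSH's ProxyJump shortcut
  fi
  echo "$cmd $1@$2"
}

# Direct connection, then the same node reached through a gateway
build_ssh_cmd admin node-01.example.org
build_ssh_cmd admin 10.0.0.4 22 admin@gateway.example.org
```

Printing the command instead of running it keeps the helper easy to test and lets the caller decide whether to execute or log it.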
Cluster and nodes definitions: multi-provider abstraction
Steps to define a cluster
Import defaults (if any)
- Sets OS, version
Select provider
- Azure, RackSpace, AWS, On-premise, vagrant...
Name the cluster and size
Optional
- Select VM type
- Attached disks
- Define metadata
- And costs
Nodes can also be defined
- For Web, shared folders, etc.
You can logically split clusters
# Azure 8-datanode sample
# load AZURE defaults
source "$CONF_DIR/cluster_defaults.conf"
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 # in GB
# details
vmCores=4
vmRAM=7 # in GB
# costs
clusterCostHour=1.584 # in USD
clusterType=IaaS
Source sample: https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf
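Because the definition is plain shell variable assignments, derived figures are easy to compute once it is loaded. The sketch below assumes the values of the sample definition above, with the hourly cost read as 1.584 USD (an assumption). `total_cores` and `run_cost` are hypothetical helpers, not part of ALOJA.

```shell
# Values copied from the sample definition; 1.584 USD/hour is an assumed reading
numberOfNodes=8
vmCores=4
clusterCostHour=1.584

# Hypothetical helper: total cores across the datanodes
total_cores() {
  echo $((numberOfNodes * vmCores))
}

# Hypothetical helper: cost in USD of a run lasting $1 seconds
run_cost() {
  awk -v secs="$1" -v rate="$clusterCostHour" \
    'BEGIN { printf "%.3f\n", secs / 3600 * rate }'
}

total_cores      # 32
run_cost 3600    # 1.584
run_cost 400     # 0.176
```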
Running benchmarks in ALOJA
Benchmarking with defaults
repo_location/aloja-bench/run_benchs.sh
To queue jobs
repo_location/shell/exeq.sh
Code at https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh
Testing different configurations
Approaches
1. Config folders
2. Override variables
  1. In benchmark_defaults.conf
  2. In cluster config
3. Cmd line
  1. Via parameters
    run_benchs.sh -r 2 -m 10
  2. Via shell globals
    HADOOP_VERSION=hadoop-2.7.1
    BENCH_DATA_SIZE=1TB
Things to look for
HW, OS
- Versions
- Disk config and mounts
SW
- Replication
- Block sizes
- Compression
- IO buffers
Build your exec plan in a script and queue
Or follow ML recommendations
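An execution plan built in a script can be as simple as nested loops that emit one benchmark command per configuration. The sketch below assumes the `-r` and `-m` flags accept the kind of values shown on the slide; their meaning is not spelled out here. `build_exec_plan` is a hypothetical name, and the commands are only printed, not queued.

```shell
# Hypothetical helper: print one benchmark invocation per combination
# of -r and -m values. Nothing is executed.
# $1 = space-separated values for -r, $2 = space-separated values for -m
build_exec_plan() {
  local r m
  for r in $1; do
    for m in $2; do
      echo "run_benchs.sh -r $r -m $m"
    done
  done
}

# 3 values for -r times 4 values for -m gives 12 commands
build_exec_plan "1 2 3" "4 6 8 10"
```

How each printed line is handed to the queue depends on the queueing script's interface, which is not shown on the slide.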
Online benchmarking results
ALOJA-WEB
Entry point to explore the results collected from the executions
- Provides insights on the obtained results through continuously evolving data views
Online DEMO at http://aloja.bsc.es
2) ALOJA-WEB Online Repository
Entry point to explore the results collected from the executions
- Index of executions
  • Quick glance of executions
  • Searchable, sortable
- Execution details
  • Performance charts and histograms
  • Hadoop counters
  • Jobs and task details
Data management of benchmark executions
- Data importing from different clusters
- Execution validation
- Data management and backup
Cluster definitions
- Cluster capabilities (resources)
- Cluster costs
Sharing results
- Download executions
- Add external executions
Documentation and References
- Papers, links and feature documentation
Available at http://hadoop.bsc.es
Comparing 3 runs on the same cluster, different configs
Mappers and reducers, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
• 400s: 2 containers, local disk
• 800s: 3 containers, local disk
• 600s: 2 containers, remote disk
Comparing 3 runs on the same cluster, different configs
CPU utilization, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
• Moderate iowait
• Higher iowait
• Very high iowait
Comparing 3 runs on the same cluster, different configs
CPU queues, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
• 1 blocked process
• 4 blocked processes
• 4 blocked processes (map phase)
Comparing 3 runs on the same cluster, different configs
CPU context switches, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
• High context switches with 3 containers on a 2-core VM
Impact of SW configurations in Speedup (4-node clusters)
[Charts: speedup (higher is better) by number of mappers (4m, 6m, 8m, 10m) and by compression algorithm (no compression, ZLIB, BZIP2, snappy)]
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
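Speedup in these charts can be read as the ratio between a baseline execution time and the time of the configuration being compared, so values above 1 are improvements. The sketch below shows that arithmetic under the assumption that the reader picks the baseline. `speedup` is a hypothetical helper, and the times are the three run times quoted earlier for the same cluster.

```shell
# Hypothetical helper: speedup of a run against a baseline (both in seconds)
# $1 = baseline time, $2 = time of the run being compared
speedup() {
  awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", base / t }'
}

# Using the slowest run (800s) as the baseline
speedup 800 400   # 2.00
speedup 800 600   # 1.33
speedup 800 800   # 1.00
```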
Impact of HW configurations in Speedup
[Charts: speedup (higher is better) by disks and network (HDD-ETH, HDD-IB, SSD-ETH, SSD-IB) and by cloud remote volumes (local only; 1, 2 or 3 remotes; 1, 2 or 3 remotes with tmp local)]
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
VM Size comparison (Azure)
Lower is better
Clusters by cost-effectiveness (1/2)
URL: http://aloja.bsc.es/clustercosteffectiveness
• Cluster ID reference
  • RL-06 = 8 performance1-8 VMs
  • RL-16 = 8 general1-8 VMs
  • RL-19 = 8 io1-15 VMs
  • RL-33 = 8 performance2-30 VMs
  • RL-30 = 8 io1-30 VMs
[Chart: clusters labeled Performance2-30, Io1-30, Io1-15, General1-8, Performance1-8]
Clusters by cost-effectiveness (2/2)
URL: http://aloja.bsc.es/clustercosteffectiveness
[Chart: fastest exec and cheapest exec marked]
• Cluster ID reference
  • RL-06 = 8 performance1-8 VMs
  • RL-16 = 8 general1-8 VMs
  • RL-19 = 8 io1-15 VMs
  • RL-33 = 8 performance2-30 VMs
  • RL-30 = 8 io1-30 VMs
Cost/Performance: Scalability of cluster size
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size
- X axis: number of datanodes (cluster size)
- Left Y: execution time (lower is better)
- Right Y: execution cost
[Chart: execution time and execution cost by cluster size, with the recommended size marked]
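The recommended size on such a chart is the point where execution cost is lowest. The sketch below shows that selection, assuming cost is execution time multiplied by an hourly cluster price that grows linearly with the number of datanodes. Every number is made up for illustration, and `recommend_size` is a hypothetical name.

```shell
# Hypothetical helper: read "datanodes exec_seconds" pairs on stdin and print
# the cluster size with the lowest execution cost.
# $1 = assumed price per datanode per hour, in USD
recommend_size() {
  awk -v price="$1" '
    {
      cost = $1 * price * $2 / 3600      # nodes * USD per hour * hours
      if (NR == 1 || cost < best) { best = cost; size = $1 }
    }
    END { printf "%d datanodes, %.2f USD per execution\n", size, best }'
}

# Made-up sample data: execution time stops improving much after 8 nodes
printf '%s\n' "4 2400" "8 1000" "16 700" "32 600" | recommend_size 0.20
# prints: 8 datanodes, 0.44 USD per execution
```

With these sample numbers the largest cluster is the fastest but also the most expensive per execution, which is the trade-off the two Y axes are meant to expose.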
Predictive Analytics and automated learning
Modeling Hadoop - Methodology
Methodology
- 3-step learning process
- Different split sizes tested (10% ≤ training ≤ 50%)
- Different learning algorithms: regression trees, nearest-neighbors methods, linear/multinomial regressions, neural networks
Learning results
- Mean Absolute Errors ~250s (ranges in [100s, 6000s])
- Relative Absolute Errors between [0.10, 0.25]
  • Depend on benchmark and number of examples per benchmark
  • Some executions are/may be anomalies
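[Diagram: the ALOJA data-set is split into training, validation and testing sets; a model is trained and tested; if the result is not acceptable (NO) the algorithm is tuned and re-trained; if it is (YES) the model is selected as the final model and tested again]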
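The two error measures quoted above can be computed from pairs of observed and predicted execution times. The sketch below uses the usual definitions: MAE is the mean of the absolute errors, and RAE is the total absolute error divided by the total absolute deviation of the observations from their mean. The sample numbers are invented and `model_errors` is a hypothetical name.

```shell
# Hypothetical helper: read "observed predicted" pairs (seconds) on stdin
# and print the mean absolute error and the relative absolute error.
model_errors() {
  awk '
    { obs[NR] = $1; abs_err += ($2 > $1 ? $2 - $1 : $1 - $2); sum += $1 }
    END {
      mean = sum / NR
      for (i = 1; i <= NR; i++) dev += (obs[i] > mean ? obs[i] - mean : mean - obs[i])
      printf "MAE=%.1fs RAE=%.2f\n", abs_err / NR, abs_err / dev
    }'
}

# Invented sample: four executions with their predicted times
printf '%s\n' "1000 1100" "2000 1800" "3000 3300" "4000 3800" | model_errors
# prints: MAE=200.0s RAE=0.20
```

An RAE below 1 means the model does better than always predicting the mean execution time.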
Use case 1: Anomaly Detection
Anomaly Detection
- Model-based detection procedure
- Pass executions through the model
- Executions not fitting the model are considered "out of the system"
[Figures: anomaly detection procedure; sample view from site]
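One simple way to read "not fitting the model" is a threshold on the relative gap between observed and predicted time. The sketch below assumes that rule, which is a simplification and not necessarily the procedure ALOJA uses. `flag_anomalies`, the 50% threshold and the sample rows are all hypothetical.

```shell
# Hypothetical helper: read "exec_id observed predicted" rows on stdin and
# print the ids whose relative prediction error exceeds the threshold in $1.
flag_anomalies() {
  awk -v limit="$1" '
    {
      err = ($2 > $3 ? $2 - $3 : $3 - $2) / $3   # relative to the prediction
      if (err > limit) print $1
    }'
}

# Invented sample: run-b took about three times longer than predicted
printf '%s\n' "run-a 410 400" "run-b 1900 600" "run-c 780 800" | flag_anomalies 0.5
# prints: run-b
```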
Use case 2: Guided Benchmarking - Method
Guided Benchmarking
- Best subset of configurations for modeling a Hadoop deployment
- Clustering to get the "representative execution" for each similar subset of executions
[Diagram: the ALOJA data-set is clustered; a model is built from the cluster centers and tested against a reference model built from the full data-set; if the error is not OK (NO) the number of centers is increased; if it is OK (YES) the centers are the configs to execute]
Use case 3: Knowledge Discovery
Make analyzing results easier
- Multi-variable visualization
- Trees separating relevant attributes
- Other interesting tools
[Chart: pred_time, HDD vs SSD]
Tree Descriptor
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 124s1
Concluding remarks and reference
Concluding remarks
Benchmarking: it's fun, or at least...
- It will save you €€€ and allow you to scale
But it is also tough
- The industry needs more transparency
- We still have a lot to do...
In ALOJA we provide the benchmarking scripts
- And also the results, which should be your first entry point
We are constantly adding new features
- Benchmarks, systems, providers
It is an open initiative, and you're invited to participate
- Beta testers
- Contributors
With predictive analytics we can automate and find tendencies faster
- Especially to save on benchmarking costs and time
Find us around the conference for more details on the tools...
Fork our repo at https://github.com/Aloja/aloja
More info
ALOJA Benchmarking platform and online repository
- http://aloja.bsc.es, http://aloja.bsc.es/publications
Benchmarking Big Data
- http://www.slideshare.net/ni_po/benchmarking-hadoop
BDOOP meetup group in Barcelona
Big Data Benchmarking Community (BDBC) mailing list
- (~200 members from ~80 organizations)
- http://clds.sdsc.edu/bdbc/community
Workshop Big Data Benchmarking (WBDB)
- Next: http://clds.sdsc.edu/wbdb2015.ca
SPEC Research Big Data working group
- http://research.spec.org/working-groups/big-data-working-group.html
Slides and video
- Michael Frank on Big Data benchmarking
  • http://www.tele-task.de/archive/podcast/20430
- Tilmann Rabl, Big Data Benchmarking Tutorial
  • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
www.bsc.es
Q&A
Thanks
Contact: nicolas.poggi@bsc.es
Hadoop highly-scalable buthellip
Not a high-performance solution
Requires
ndash Design
bull Clusters topology clusters
ndash Setup
bull OS Hadoop config
ndash Fine tuning required
bull Iterative approach
bull Time consuming
and extensive benchmarking
Setting up your Big Data system
Hadoop
ndash gt 100+ tunable parameters
ndash obscure and interrelated
bull mapredmapreducetasksspeculativeexecution
bull iosortmb 100 (300)
bull iosortrecordpercent 5 (15)
bull iosortspillpercent 80 (95 ndash 100)
ndash Similar for Hive Spark HBase
Dominated by rules-of-thumb
ndash Number of containers in parallel
bull 05 - 2 per CPU core
Large stack for tuning
Image source Intelreg Distribution for Apache Hadoop
Product claims on performance and TCO
Eco-system is not transparent
ndash Needs auditing
How do I set my system too many options
Default values in Apache source not ideal
Large and spread eco system
ndash Different distributions
ndash Product claims
Each job is different
ndash No one-fits-all solution
Cloud vs On-premise
ndash IaaS
bull Tens of different VMs to choose
ndash PaaS
bull HDInsight CloudBigData EMR
New economic HW
ndash SSDs InfiniBand Networking
The ALOJA project research lines and challenges
BSCrsquos project ALOJA towards cost-effective Big Data
Open research project for improving the cost-effectiveness
of Big Data deployments
Benchmarking and Analysis tools
Online repository and largest Big Data repo
ndash 50000+ runs of HiBench TPC-H and [some] BigBench
ndash Over 100 HW configurations tested bull Of dif ferent NodeVM disks and networks
bull Cloud Multi-cloud provider including both IaaS and PaaS
bull On-premise High-end HPC commodity low-power
Community ndash Collaborations with industry and Academia
ndash Presented in different conferences and workshops
ndash Visibility 47 different countries
httpalojabsces
Big Data Benchmarking
Online Repository
Web
Analytics
ALOJA research lines broad coverage
Techniques for obtaining CostPerformance Insights
Profiling
bull HPC Low-level
bull High Accuracy
bull Manual Analysis
Benchmarking
bull Iterate configs
bull HW and SW
bull Real executions
bull Log parsing and data sanitization
Analysis tools
bull Summarize large number of results
bull By criteria
bull Filter noise
bull Fast processing
Predictive Analytics
bull Automated modeling
bull Estimations
bull Virtual executions
bull Automated KD
Big Data Apps
Frameworks
Systems Clusters
Cloud ProvidersDatacenters
Evaluation of
Test different clusters and architectures ndash On-premise
bull Commodity high-end appliance low-power
ndash Cloud IaaS bull 32 different VMs in Azure
similar in other providers
ndash Cloud PaaS bull HDInsight EMR CloudBigData
Different access level ndash Full admin user-only request-
to-install everything ready queuing systems (SGE)
Different versions ndash Hadoop JVM Spark Hive
etchellip
ndash Other benchmarks
Problems ndash All systems though for PROD
bull Not for comparison
ndash No Azure support
ndash Many different packages
ndash No one-fits-all solution
Dev environments and testing
ndash Big Data usually requires a cluster to develop and test
Solution ndash Custom implementation
ndash Based in simple components
ndash Wrapping commands
Challenges (circa end 2013)
Benchmarking with ALOJArsquos open source tools
ALOJA Platform main components
2 Online Repository
bullExplore results
bullExecution details
bullCluster details
bullCosts
bullData sharing
3 Web Analytics
bullData views and evaluations
bullAggregates
bullAbstracted Metrics
bullJob characterization
bullMachine Learning
bullPredictions and clustering
1 Big Data Benchmarking
bullDeploy amp Provision
bullConf Management
bullParameter selection amp Queuing
bullPerf counters
bullLow-level instrumentation
bullApp logs
19
NGINX PHP MySQL
BASH Unix tools CLIs R SQL JS
Extending and collaborating in ALOJA
1 Install prerequisites ndash git vagrant VirtualBox
2 git clone httpsgithubcomAlojaalojagit
3 cd aloja
4 vagrant up
5 Open your browser at httplocalhost8080
6 Optional start the benchmarking cluster
vagrant up
Setting up a DEV environment
Installs a Web Server with sample data
Sets a local cluster to test benchmarking
Workflow in ALOJA
Cluster(s) definition
bull VM sizes
bull nodes
bull OS disks bull Capabilities
Execution plan
bull Start cluster
bull Setup
bull Exec Benchmarks
bull Cleanup
Import data
bull Convert perf metric
bull Parse logs
bull Import into DB
Evaluate data
bull Data views in Vagrant VM
bull Or httpalojabsces
PA and KD
bullPredictive Analytics
bullKnowledge Discovery
Historic
Repo
Commands and providers
Provisioning commands Providers
Connect
ndash Node and Cluster
ndash Builds SSH cmd line
bull SSH proxies
Deploy ndash Creates a cluster
ndash Sets SSH credentials
ndash If created updates config as needed
ndash If stopped starts nodes
Start Stop
Delete
Queue jobs to clusters
On-premise
ndash Custom settings for
clusters
bull Multiple disk types
bull Different architectures
Cloud IaaS
ndash Azure OpenStack
Rackspace AWS (testing)
Cloud PaaS
ndash HDInsight CloudBigData
EMR soon
Code at httpsgithubcomAlojaalojatreemasteraloja-deploy
Cluster and nodes definitions multi-provider abstraction
Steps to define a cluster Import defaults (if any) ndash Sets OS version
Select provider ndash Azure RackSpace AWS On-
premise vagranthellip
Name the cluster and size
Optional ndash Select VM type
ndash Attached disks
ndash Define metadata
ndash And costs
Nodes can also be defined ndash For Web share folders etc
You can logically split clusters
Azure 8-datanode sample load AZURE defaults
source $CONF_DIRcluster_defaultsconf
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 in GB
details
vmCores=4
vmRAM=7 in GB
costs
clusterCostHour=1584 in USD
clusterType=IaaS
Source sample httpsgithubcomAlojaalojablobmastershellconfcluster_al-08conf
Running benchmarks in ALOJA
Benchmarking with defaults
repo_locationaloja-benchrun_benchssh
To queue jobs
repo_locationshellexeqsh
Code at httpsgithubcomAlojaalojablobmasteraloja-benchrun_benchssh
Testing different configurations
Approaches
1 Config folders
2 Override variables
1 In benchmark_defaultsconf
2 In cluster config
3 Cmd line
1 Via parameters
run_benchssh -r 2 -m 10
1 Via shell globals HADOOP_VERSION=hadoop-271
BENCH_DATA_SIZE=1TB
Things to look for HW OS ndash Versions
ndash Disk config and mounts
SW ndash Replication
ndash Block sizes
ndash Compression
ndash IO buffers
Build your exec plan in a script and queue
Or follow ML recommendations
ALOJA-WEB
Entry point for explore the results collected from the executions ndash Provides insights on the obtained results through continuously evolving data views
Online DEMO at httpalojabsces
Online benchmarking results
28
2) ALOJA-WEB Online Repository
Entry point for explore the results collected from the executions
ndash Index of executions bull Quick glance of executions
bull Searchable Sortable
ndash Execution details bull Performance charts and histograms
bull Hadoop counters
bull Jobs and task details
Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup
Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs
Sharing results ndash Download executions ndash Add external executions
Documentation and References ndash Papers links and feature documentation
Available at httphadoopbsces
Comparing 3 runs on same cluster different configs
Mappers and reducers 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
400s 2 containers Local disk
800s 3 containers Local disk
600s 2 containers Remote disk
Comparing 3 runs on same cluster different configs
CPU utilization 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
Moderate iowait
Higher iowait
Very high iowait
Comparing 3 runs on same cluster different configs
CPU queues 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
1 blocked process
4 blocked processes
4 blocked processes (map phase)
Comparing 3 runs on same cluster different configs
CPU context switches 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
High context switches with 3
containers on a 2-core VM
Impact of SW configurations in Speedup (4 node clusters)
Number of mappers Compression algorithm
No comp
ZLIB
BZIP2
snappy
4m
6m
8m
10m
Speedup (higher is better)
Results using httphadoopbscesconfigimprovement
Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf
Impact of HW configurations in Speedup
Disks and Network Cloud remote volumes
Local only
1 Remote
2 Remotes
3 Remotes
3 Remotes tmp local
2 Remotes tmp local
1 Remotes tmp local
HDD-ETH
HDD-IB
SSD-ETH
SDD-IB
Speedup (higher is better)
Results using httphadoopbscesconfigimprovement
Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf
VM Size comparison (Azure)
Lower is better
Clusters by cost-effectiveness 12
URL httpalojabscesclustercosteffectiveness
bull Cluster ID reference
bull RL-06 = 8 performance1-8 VMs
bull RL-16 = 8 general1-8 VMs
bull RL-19 = 8 io1-15 VMs
bull RL-33 = 8 performance2-30 VMs
bull RL-30 = 8 io1-30 VMs
Performance2-30
Io1-30
Io1-15
General1-8
Performance1-8
Io1-30
Clusters by cost-effectiveness (2/2)
URL: http://aloja.bsc.es/clustercosteffectiveness
Chart markers: fastest exec, cheapest exec
Cluster ID reference:
• RL-06 = 8 performance1-8 VMs
• RL-16 = 8 general1-8 VMs
• RL-19 = 8 io1-15 VMs
• RL-33 = 8 performance2-30 VMs
• RL-30 = 8 io1-30 VMs
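The fastest and the cheapest execution are often not on the same cluster. A minimal sketch of how the two can be picked from a list of runs, assuming the cost of one execution is its duration in hours times the hourly cluster price; the cluster names, times and prices are invented for illustration and `rank_clusters` is a hypothetical helper.

```shell
#!/bin/sh
# Input lines: cluster_id exec_time_seconds cluster_cost_per_hour_usd
# All numbers used below are made up; they are not ALOJA results.
rank_clusters() {
  awk '
    {
      cost = $2 / 3600 * $3                      # USD for one execution
      if (NR == 1 || $2 < best_time)   { best_time = $2;   fastest = $1 }
      if (NR == 1 || cost < best_cost) { best_cost = cost; cheapest = $1 }
    }
    END { if (NR > 0) printf "fastest=%s cheapest=%s\n", fastest, cheapest }'
}

rank_clusters <<'EOF'
cluster-a 1200 2.0
cluster-b 1800 1.0
cluster-c 900 4.0
EOF
```

With these sample numbers the quickest cluster is also the most expensive per run, which is the kind of trade-off the chart is meant to expose.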
Cost/Performance: scalability of cluster size
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size
- X axis: number of datanodes (cluster size)
- Left Y: execution time (lower is better)
- Right Y: execution cost
Chart labels: execution time, execution cost, recommended size
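One way to arrive at a recommended size is to price each cluster size and keep the cheapest. The sketch below assumes a flat price per node-hour and ignores any deadline on execution time; `recommend_size`, the price and the timings are all hypothetical sample data.

```shell
#!/bin/sh
# recommend_size PRICE_PER_NODE_HOUR
# Input lines: number_of_datanodes exec_time_seconds
# Prints "nodes seconds cost" per size, then the size with the lowest cost.
recommend_size() {
  awk -v price="$1" '
    {
      cost = $1 * $2 / 3600 * price      # nodes x hours x price per node-hour
      printf "%d %d %.2f\n", $1, $2, cost
      if (NR == 1 || cost < best_cost) { best_cost = cost; best = $1 }
    }
    END { if (NR > 0) printf "recommended=%d\n", best }'
}

# Sample data only: time drops as nodes are added, but not linearly.
recommend_size 0.5 <<'EOF'
4 4000
8 1800
16 1000
32 700
EOF
```

A real selection would probably also weigh execution time, since the cheapest size may be too slow for the job at hand.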
Predictive Analytics and automated learning
Modeling Hadoop: methodology
Methodology
- 3-step learning process
- Different split sizes tested (10% ≤ training ≤ 50%)
- Different learning algorithms: regression trees, nearest-neighbors methods, linear/multinomial regressions, neural networks
Learning results
- Mean Absolute Errors ~250s (ranges in [100s, 6000s])
- Relative Absolute Errors between [0.10, 0.25]
  • Depend on the benchmark and the number of examples per benchmark
  • Some executions are/may be anomalies
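Diagram: the ALOJA data-set is split into training, validation and testing sets. A model is trained and tested on the validation set; if it is not good enough (NO), the algorithm is tuned and re-trained; if it is (YES), this model is selected as the final model and tested on the testing set.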
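The two error figures quoted in the learning results can be computed from pairs of observed and predicted execution times. The sketch below assumes the usual definitions (MAE as the mean of absolute errors, RAE as the total absolute error divided by the total absolute deviation of the observations from their mean); the slides do not spell them out, `error_metrics` is a hypothetical helper, and the sample times are invented.

```shell
#!/bin/sh
# Input lines: observed_seconds predicted_seconds
# Prints the Mean Absolute Error and the Relative Absolute Error.
error_metrics() {
  awk '
    function abs(x) { return x < 0 ? -x : x }
    { obs[NR] = $1; pred[NR] = $2; sum += $1 }
    END {
      if (NR == 0) exit 1
      mean = sum / NR
      for (i = 1; i <= NR; i++) {
        err  += abs(pred[i] - obs[i])    # error of the model
        base += abs(obs[i] - mean)       # error of always predicting the mean
      }
      if (base == 0) exit 1
      printf "MAE=%.1f RAE=%.2f\n", err / NR, err / base
    }'
}

# Made-up validation set of four executions.
error_metrics <<'EOF'
400 440
800 700
600 650
1000 1090
EOF
```

An RAE below 1 means the model does better than always predicting the mean execution time.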
Use case 1: Anomaly Detection
Anomaly Detection
- Model-based detection procedure
- Pass executions through the model
- Executions not fitting the model are considered "out of the system"
Figures: anomaly detection procedure; sample view from site
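A minimal sketch of the model-based check: an execution is flagged when its observed time deviates from the model's prediction by more than a relative limit. The limit, the `flag_anomalies` helper and the sample executions are assumptions for illustration; the slide does not say which criterion the site applies.

```shell
#!/bin/sh
# flag_anomalies RELATIVE_LIMIT
# Input lines: exec_id observed_seconds predicted_seconds
# Prints the ids whose relative error |observed - predicted| / predicted
# exceeds the limit.
flag_anomalies() {
  awk -v limit="$1" '
    {
      diff = $2 - $3
      if (diff < 0) diff = -diff
      if ($3 > 0 && diff / $3 > limit) print $1
    }'
}

# Hypothetical executions: only e2 is far from what the model expects.
flag_anomalies 0.5 <<'EOF'
e1 410 400
e2 2000 800
e3 590 600
EOF
```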
Use case 2: Guided Benchmarking - Method
Guided Benchmarking
- Best subset of configurations for modeling a Hadoop deployment
- Clustering to get the "representative execution" for each similar subset of executions
Diagram: the ALOJA data-set is clustered; a model is built from the resulting centers and tested against a reference model built from the full data-set. If the error is not OK (NO), the number of centers is increased and the loop repeats; if it is OK (YES), the centers are the configs to execute.
Use case 3: Knowledge Discovery
Make analyzing results easier
- Multi-variable visualization
- Trees separating relevant attributes
- Other interesting tools
Figure: pred_time, HDD vs SSD
Tree Descriptor
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 1241s
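The tree descriptor reads as a chain of nested decisions: first the disk, then the network, then the I/O file buffer size. A sketch that walks the same branches, with `predict_time` as a hypothetical helper name and the leaf values copied from the slide (the last leaf is read as 1241s, which is an assumption):

```shell
#!/bin/sh
# predict_time DISK NET IOFBUF_KB -> predicted execution time in seconds
# Returns non-zero for a combination the tree does not cover.
predict_time() {
  disk=$1 net=$2 buf_kb=$3
  case "$disk" in
    HDD)
      case "$net" in
        ETH) case "$buf_kb" in 128) echo 2935 ;; 64) echo 2942 ;; *) return 1 ;; esac ;;
        IB)  case "$buf_kb" in 128) echo 3118 ;; 64) echo 3125 ;; *) return 1 ;; esac ;;
        *)   return 1 ;;
      esac ;;
    SSD)
      case "$net" in
        ETH) case "$buf_kb" in 128) echo 1248 ;; 64) echo 1256 ;; *) return 1 ;; esac ;;
        # 1241 is an assumed reading of the last leaf
        IB)  case "$buf_kb" in 128) echo 1233 ;; 64) echo 1241 ;; *) return 1 ;; esac ;;
        *)   return 1 ;;
      esac ;;
    *) return 1 ;;
  esac
}

# Predicted saving, in seconds, of moving from HDD to SSD on Ethernet with 128KB buffers.
echo $(( $(predict_time HDD ETH 128) - $(predict_time SSD ETH 128) ))
```

In this tree the disk type dominates the prediction, while the buffer size only moves it by a few seconds.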
Concluding remarks and reference
Concluding remarks
Benchmarking: it's fun, or at least…
- It will save you €€€ and allow you to scale
But it is also tough
- The industry needs more transparency
- We still have a lot to do…
In ALOJA we provide the benchmarking scripts
- And also the results, which should be your first entry point
We are constantly adding new features
- Benchmarks, systems, providers
It is an open initiative and you're invited to participate
- Beta testers
- Contributors
With predictive analytics we can automate and find tendencies faster
- Especially to save on benchmarking costs and time
Find us around the conference for more details on the tools…
Fork our repo at https://github.com/Aloja/aloja
More info
ALOJA Benchmarking platform and online repository
- http://aloja.bsc.es
- http://aloja.bsc.es/publications
Benchmarking Big Data
- http://www.slideshare.net/ni_po/benchmarking-hadoop
BDOOP meetup group in Barcelona
Big Data Benchmarking Community (BDBC) mailing list
- (~200 members from ~80 organizations)
- http://clds.sdsc.edu/bdbc/community
Workshop Big Data Benchmarking (WBDB)
- Next: http://clds.sdsc.edu/wbdb2015.ca
SPEC Research Big Data working group
- http://research.spec.org/working-groups/big-data-working-group.html
Slides and video
- Michael Frank on Big Data benchmarking
  • http://www.tele-task.de/archive/podcast/20430
- Tilmann Rabl, Big Data Benchmarking Tutorial
  • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
www.bsc.es
Q&A
Thanks
Contact: nicolas.poggi@bsc.es
Setting up your Big Data system
Hadoop
- 100+ tunable parameters
- Obscure and interrelated
  • mapred.map/reduce.tasks.speculative.execution
  • io.sort.mb: 100 (300)
  • io.sort.record.percent: 5% (15%)
  • io.sort.spill.percent: 80% (95 - 100%)
- Similar for Hive, Spark, HBase
Dominated by rules-of-thumb
- Number of containers in parallel
  • 0.5 - 2 per CPU core
Large stack for tuning
Image source: Intel® Distribution for Apache Hadoop
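The containers rule of thumb translates into a range of values to benchmark rather than a single setting. A small sketch, assuming integer arithmetic and at least one container per node; `container_range` is a hypothetical helper, not part of ALOJA.

```shell
#!/bin/sh
# container_range CORES -> "min max" parallel containers to try on a node,
# following the 0.5 - 2 containers per CPU core rule of thumb.
container_range() {
  cores=$1
  min=$(( cores / 2 ))
  if [ "$min" -lt 1 ]; then
    min=1                      # never go below one container
  fi
  max=$(( cores * 2 ))
  echo "$min $max"
}

# A 2-core VM gives a range of 1 to 4 containers.
container_range 2
```

The range only bounds the search; which value inside it works best still has to be measured for each job and cluster.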
Product claims on performance and TCO
Eco-system is not transparent
- Needs auditing
How do I set my system? Too many options
Default values in Apache source not ideal
Large and spread-out ecosystem
- Different distributions
- Product claims
Each job is different
- No one-fits-all solution
Cloud vs on-premise
- IaaS
  • Tens of different VMs to choose from
- PaaS
  • HDInsight, CloudBigData, EMR
New, economical HW
- SSDs, InfiniBand networking
The ALOJA project: research lines and challenges
BSC's project ALOJA: towards cost-effective Big Data
Open research project for improving the cost-effectiveness of Big Data deployments
Benchmarking and Analysis tools
Online repository and largest Big Data repo
- 50,000+ runs of HiBench, TPC-H and [some] BigBench
- Over 100 HW configurations tested
  • Of different node/VM, disks and networks
  • Cloud: multi-cloud provider, including both IaaS and PaaS
  • On-premise: high-end, HPC, commodity, low-power
Community
- Collaborations with industry and academia
- Presented in different conferences and workshops
- Visibility: 47 different countries
http://aloja.bsc.es
Diagram labels: Big Data Benchmarking, Online Repository, Web Analytics
ALOJA research lines: broad coverage
Techniques for obtaining Cost/Performance insights
Profiling
• HPC, low-level
• High accuracy
• Manual analysis
Benchmarking
• Iterate configs
• HW and SW
• Real executions
• Log parsing and data sanitization
Analysis tools
• Summarize large number of results
• By criteria
• Filter noise
• Fast processing
Predictive Analytics
• Automated modeling
• Estimations
• Virtual executions
• Automated KD
Evaluation of: Big Data apps, frameworks, systems/clusters, cloud providers/datacenters
Challenges (circa end 2013)
Test different clusters and architectures
- On-premise
  • Commodity, high-end, appliance, low-power
- Cloud IaaS
  • 32 different VMs in Azure, similar in other providers
- Cloud PaaS
  • HDInsight, EMR, CloudBigData
Different access levels
- Full admin, user-only, request-to-install, everything ready, queuing systems (SGE)
Different versions
- Hadoop, JVM, Spark, Hive, etc…
- Other benchmarks
Problems
- All systems designed for PROD
  • Not for comparison
- No Azure support
- Many different packages
- No one-fits-all solution
Dev environments and testing
- Big Data usually requires a cluster to develop and test
Solution
- Custom implementation
- Based on simple components
- Wrapping commands
Benchmarking with ALOJA's open source tools
ALOJA Platform main components
1. Big Data Benchmarking
• Deploy & Provision
• Conf Management
• Parameter selection & Queuing
• Perf counters
• Low-level instrumentation
• App logs
2. Online Repository
• Explore results
• Execution details
• Cluster details
• Costs
• Data sharing
3. Web Analytics
• Data views and evaluations
• Aggregates
• Abstracted Metrics
• Job characterization
• Machine Learning
• Predictions and clustering
Technologies: NGINX, PHP, MySQL; BASH, Unix tools, CLIs; R, SQL, JS
Extending and collaborating in ALOJA
Setting up a DEV environment
1. Install prerequisites: git, vagrant, VirtualBox
2. git clone https://github.com/Aloja/aloja.git
3. cd aloja
4. vagrant up
5. Open your browser at http://localhost:8080
6. Optional: start the benchmarking cluster
   vagrant up
Installs a Web Server with sample data
Sets a local cluster to test benchmarking
Workflow in ALOJA
Cluster(s) definition
• VM sizes
• # nodes
• OS, disks
• Capabilities
Execution plan
• Start cluster
• Setup
• Exec Benchmarks
• Cleanup
Import data
• Convert perf metrics
• Parse logs
• Import into DB
Evaluate data
• Data views in Vagrant VM
• Or http://aloja.bsc.es
PA and KD
• Predictive Analytics
• Knowledge Discovery
Diagram label: Historic Repo
Commands and providers
Provisioning commands
Connect
- Node and cluster
- Builds SSH cmd line
  • SSH proxies
Deploy
- Creates a cluster
- Sets SSH credentials
- If created, updates config as needed
- If stopped, starts nodes
Start, Stop
Delete
Queue jobs to clusters
Providers
On-premise
- Custom settings for clusters
  • Multiple disk types
  • Different architectures
Cloud IaaS
- Azure, OpenStack, Rackspace, AWS (testing)
Cloud PaaS
- HDInsight, CloudBigData, EMR soon
Code at https://github.com/Aloja/aloja/tree/master/aloja-deploy
Cluster and nodes definitions: multi-provider abstraction
Steps to define a cluster
Import defaults (if any)
- Sets OS, version
Select provider
- Azure, RackSpace, AWS, on-premise, vagrant…
Name the cluster and size
Optional
- Select VM type
- Attached disks
- Define metadata
- And costs
Nodes can also be defined
- For Web, share folders, etc.
You can logically split clusters
Azure 8-datanode sample:
#load AZURE defaults
source $CONF_DIR/cluster_defaults.conf
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 #in GB
#details
vmCores=4
vmRAM=7 #in GB
#costs
clusterCostHour=1584 #in USD
clusterType=IaaS
Source sample: https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf
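Because a cluster definition is plain shell, any script can source it and reuse its values. A sketch of pricing a single execution from such a file: the variable names follow the sample above, while the `exec_cost` helper, the cluster name and the hourly price are hypothetical and chosen only to make the arithmetic easy to follow.

```shell
#!/bin/sh
# Write a tiny cluster definition to a temporary file, load it, price a run.
conf=$(mktemp)
cat > "$conf" <<'EOF'
clusterName=example-8
numberOfNodes=8
clusterCostHour=2.5   # hypothetical price in USD, not taken from the slide
EOF

. "$conf"             # same effect as "source" in bash
rm -f "$conf"

# exec_cost SECONDS -> cost in USD of one execution on the loaded cluster
exec_cost() {
  awk -v secs="$1" -v rate="$clusterCostHour" \
    'BEGIN { printf "%.2f\n", secs / 3600 * rate }'
}

# A 720-second (12-minute) run on this cluster.
echo "${clusterName}: $(exec_cost 720) USD"
```

Keeping the cost next to the hardware description is what lets the repository compare executions by price as well as by time.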
Running benchmarks in ALOJA
Benchmarking with defaults
repo_location/aloja-bench/run_benchs.sh
To queue jobs
repo_location/shell/exeq.sh
Code at https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh
Testing different configurations
Approaches
1. Config folders
2. Override variables
   1. In benchmark_defaults.conf
   2. In cluster config
3. Cmd line
   1. Via parameters
      run_benchs.sh -r 2 -m 10
   2. Via shell globals
      HADOOP_VERSION=hadoop-2.7.1
      BENCH_DATA_SIZE=1TB
Things to look for
HW, OS
- Versions
- Disk config and mounts
SW
- Replication
- Block sizes
- Compression
- IO buffers
Build your exec plan in a script and queue
Or follow ML recommendations
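An execution plan can be as simple as a script that prints one command line per configuration, ready to be queued. The sketch below assumes that values already set in the environment take precedence over defaults; the variable names come from the slide, while `build_plan`, the 100GB size and the precedence rule are assumptions. It only prints commands and does not run anything.

```shell
#!/bin/sh
# Defaults apply only when the caller has not set the variable already,
# so shell globals given on the command line win (assumed precedence).
: "${HADOOP_VERSION:=hadoop-2.7.1}"
: "${BENCH_DATA_SIZE:=1TB}"

# build_plan SIZE... -> one command line per data size; nothing is executed.
build_plan() {
  for size in "$@"; do
    echo "BENCH_DATA_SIZE=$size HADOOP_VERSION=$HADOOP_VERSION run_benchs.sh"
  done
}

# Two-step plan: a smaller (hypothetical) size first, then the default one.
build_plan 100GB "$BENCH_DATA_SIZE"
```

Reviewing the printed plan before queuing it is a cheap way to catch a wrong version or data size.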
ALOJA-WEB
Online benchmarking results
Entry point to explore the results collected from the executions
- Provides insights on the obtained results through continuously evolving data views
Online DEMO at http://aloja.bsc.es
2) ALOJA-WEB Online Repository
Entry point to explore the results collected from the executions
- Index of executions
  • Quick glance of executions
  • Searchable, sortable
- Execution details
  • Performance charts and histograms
  • Hadoop counters
  • Jobs and task details
Data management of benchmark executions
- Data importing from different clusters
- Execution validation
- Data management and backup
Cluster definitions
- Cluster capabilities (resources)
- Cluster costs
Sharing results
- Download executions
- Add external executions
Documentation and references
- Papers, links and feature documentation
Available at http://hadoop.bsc.es
Comparing 3 runs on same cluster, different configs
Mappers and reducers, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
• 400s, 2 containers, local disk
• 800s, 3 containers, local disk
• 600s, 2 containers, remote disk
How do I set my system too many options
Default values in Apache source not ideal
Large and spread eco system
ndash Different distributions
ndash Product claims
Each job is different
ndash No one-fits-all solution
Cloud vs On-premise
ndash IaaS
bull Tens of different VMs to choose
ndash PaaS
bull HDInsight CloudBigData EMR
New economic HW
ndash SSDs InfiniBand Networking
The ALOJA project research lines and challenges
BSCrsquos project ALOJA towards cost-effective Big Data
Open research project for improving the cost-effectiveness
of Big Data deployments
Benchmarking and Analysis tools
Online repository and largest Big Data repo
ndash 50000+ runs of HiBench TPC-H and [some] BigBench
ndash Over 100 HW configurations tested bull Of dif ferent NodeVM disks and networks
bull Cloud Multi-cloud provider including both IaaS and PaaS
bull On-premise High-end HPC commodity low-power
Community ndash Collaborations with industry and Academia
ndash Presented in different conferences and workshops
ndash Visibility 47 different countries
httpalojabsces
Big Data Benchmarking
Online Repository
Web
Analytics
ALOJA research lines broad coverage
Techniques for obtaining CostPerformance Insights
Profiling
bull HPC Low-level
bull High Accuracy
bull Manual Analysis
Benchmarking
bull Iterate configs
bull HW and SW
bull Real executions
bull Log parsing and data sanitization
Analysis tools
bull Summarize large number of results
bull By criteria
bull Filter noise
bull Fast processing
Predictive Analytics
bull Automated modeling
bull Estimations
bull Virtual executions
bull Automated KD
Big Data Apps
Frameworks
Systems Clusters
Cloud ProvidersDatacenters
Evaluation of
Test different clusters and architectures ndash On-premise
bull Commodity high-end appliance low-power
ndash Cloud IaaS bull 32 different VMs in Azure
similar in other providers
ndash Cloud PaaS bull HDInsight EMR CloudBigData
Different access level ndash Full admin user-only request-
to-install everything ready queuing systems (SGE)
Different versions ndash Hadoop JVM Spark Hive
etchellip
ndash Other benchmarks
Problems ndash All systems though for PROD
bull Not for comparison
ndash No Azure support
ndash Many different packages
ndash No one-fits-all solution
Dev environments and testing
ndash Big Data usually requires a cluster to develop and test
Solution ndash Custom implementation
ndash Based in simple components
ndash Wrapping commands
Challenges (circa end 2013)
Benchmarking with ALOJArsquos open source tools
ALOJA Platform: main components
1. Big Data Benchmarking
• Deploy & Provision
• Conf Management
• Parameter selection & Queuing
• Perf counters
• Low-level instrumentation
• App logs
2. Online Repository
• Explore results
• Execution details
• Cluster details
• Costs
• Data sharing
3. Web Analytics
• Data views and evaluations
• Aggregates
• Abstracted Metrics
• Job characterization
• Machine Learning
• Predictions and clustering
Technologies: NGINX, PHP, MySQL; BASH, Unix tools, CLIs, R, SQL, JS
Extending and collaborating in ALOJA
Setting up a DEV environment
1. Install prerequisites: git, vagrant, VirtualBox
2. git clone https://github.com/Aloja/aloja.git
3. cd aloja
4. vagrant up
5. Open your browser at http://localhost:8080
6. Optional: start the benchmarking cluster
   vagrant up
Installs a web server with sample data
Sets up a local cluster to test benchmarking
Workflow in ALOJA
Cluster(s) definition
• VM sizes
• Nodes
• OS, disks
• Capabilities
Execution plan
• Start cluster
• Setup
• Exec benchmarks
• Cleanup
Import data
• Convert perf metrics
• Parse logs
• Import into DB
Evaluate data
• Data views in Vagrant VM
• Or http://aloja.bsc.es
PA and KD
• Predictive Analytics
• Knowledge Discovery
Historic repo
Commands and providers
Provisioning commands
Connect
- Node and cluster
- Builds SSH cmd line
  • SSH proxies
Deploy
- Creates a cluster
- Sets SSH credentials
- If created, updates config as needed
- If stopped, starts nodes
Start, Stop
Delete
Queue jobs to clusters
Providers
On-premise
- Custom settings for clusters
  • Multiple disk types
  • Different architectures
Cloud IaaS
- Azure, OpenStack, Rackspace, AWS (testing)
Cloud PaaS
- HDInsight, CloudBigData, EMR soon
Code at https://github.com/Aloja/aloja/tree/master/aloja-deploy
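The Deploy behavior listed above (create the cluster if needed, update the config if it already exists, start the nodes if it is stopped) can be sketched as a small decision function. This is only a Python illustration of the idea; aloja-deploy itself is written in BASH, and the state names and action strings below are hypothetical.

```python
def deploy_actions(cluster_state):
    """Return the ordered actions an idempotent deploy would take.

    cluster_state is one of "missing", "stopped", "running".
    These names are hypothetical, not taken from aloja-deploy.
    """
    if cluster_state == "missing":
        return ["create cluster", "set SSH credentials", "apply config"]
    if cluster_state == "stopped":
        return ["start nodes", "apply config"]
    if cluster_state == "running":
        return ["apply config"]
    raise ValueError(f"unknown cluster state: {cluster_state!r}")


for state in ("missing", "stopped", "running"):
    print(state, "->", deploy_actions(state))
```

Running the same command twice is then safe: the second call finds the cluster running and only re-applies the config.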
Cluster and nodes definitions: multi-provider abstraction
Steps to define a cluster
Import defaults (if any)
- Sets OS, version
Select provider
- Azure, RackSpace, AWS, On-premise, vagrant...
Name the cluster and size
Optional
- Select VM type
- Attached disks
- Define metadata
- And costs
Nodes can also be defined
- For Web, share folders, etc.
You can logically split clusters
Azure 8-datanode sample
# load AZURE defaults
source $CONF_DIR/cluster_defaults.conf
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 # in GB
# details
vmCores=4
vmRAM=7 # in GB
# costs
clusterCostHour=1.584 # in USD
clusterType=IaaS
Source sample: https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf
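To make the cluster definition concrete, here is a minimal Python sketch that reads such key=value lines and derives aggregate capacity. It is not part of ALOJA (which sources these files from BASH). It assumes diskSize applies to each attached volume, that numberOfNodes counts datanodes only, and that the hourly cost reads 1.584 USD.

```python
def parse_cluster_conf(text):
    """Parse simple key=value lines, ignoring comments and `source` lines."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if "=" not in line:
            continue  # skip blank lines and `source ...`
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip().strip('"')
    return conf


SAMPLE = """
# load AZURE defaults
source $CONF_DIR/cluster_defaults.conf
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 # in GB
vmCores=4
vmRAM=7 # in GB
clusterCostHour=1.584 # in USD
clusterType=IaaS
"""

conf = parse_cluster_conf(SAMPLE)
nodes = int(conf["numberOfNodes"])
total_cores = nodes * int(conf["vmCores"])
total_ram_gb = nodes * int(conf["vmRAM"])
# Assumption: diskSize is the size of each attached volume.
total_disk_gb = nodes * int(conf["attachedVolumes"]) * int(conf["diskSize"])
cost_per_node_hour = float(conf["clusterCostHour"]) / nodes

print(total_cores, total_ram_gb, total_disk_gb, round(cost_per_node_hour, 3))
```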
Running benchmarks in ALOJA
Benchmarking with defaults
repo_location/aloja-bench/run_benchs.sh
To queue jobs
repo_location/shell/exeq.sh
Code at https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh
Testing different configurations
Approaches
1. Config folders
2. Override variables
   1. In benchmark_defaults.conf
   2. In cluster config
3. Cmd line
   1. Via parameters
      run_benchs.sh -r 2 -m 10
   2. Via shell globals
      HADOOP_VERSION=hadoop-2.7.1
      BENCH_DATA_SIZE=1TB
Things to look for
HW, OS
- Versions
- Disk config and mounts
SW
- Replication
- Block sizes
- Compression
- IO buffers
Build your exec plan in a script and queue
Or follow ML recommendations
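"Build your exec plan in a script" can be as simple as enumerating the combinations to test and emitting one command line per combination. The Python sketch below uses only the two shell globals named on the slide. The extra values (hadoop-2.6.0, 100GB) are sample inputs and the script path is the slide's placeholder, so treat the output as illustrative.

```python
from itertools import product

# Placeholder path as shown on the slide; replace repo_location with your checkout.
RUN_SCRIPT = "repo_location/aloja-bench/run_benchs.sh"


def build_exec_plan(hadoop_versions, data_sizes):
    """Return one command line per (Hadoop version, data size) combination."""
    return [
        f"HADOOP_VERSION={version} BENCH_DATA_SIZE={size} {RUN_SCRIPT}"
        for version, size in product(hadoop_versions, data_sizes)
    ]


# Sample values; only hadoop-2.7.1 and 1TB appear on the slide.
plan = build_exec_plan(["hadoop-2.7.1", "hadoop-2.6.0"], ["100GB", "1TB"])
for command in plan:
    print(command)
```

The resulting lines can then be handed to the queue one by one.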
Online benchmarking results
ALOJA-WEB
Entry point to explore the results collected from the executions
- Provides insights on the obtained results through continuously evolving data views
Online DEMO at http://aloja.bsc.es
2) ALOJA-WEB Online Repository
Entry point to explore the results collected from the executions
- Index of executions
  • Quick glance of executions
  • Searchable, sortable
- Execution details
  • Performance charts and histograms
  • Hadoop counters
  • Jobs and task details
Data management of benchmark executions
- Data importing from different clusters
- Execution validation
- Data management and backup
Cluster definitions
- Cluster capabilities (resources)
- Cluster costs
Sharing results
- Download executions
- Add external executions
Documentation and references
- Papers, links, and feature documentation
Available at http://hadoop.bsc.es
Comparing 3 runs on same cluster, different configs
Mappers and reducers, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
• 400s: 2 containers, local disk
• 800s: 3 containers, local disk
• 600s: 2 containers, remote disk
Comparing 3 runs on same cluster, different configs
CPU utilization, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
• Moderate iowait
• Higher iowait
• Very high iowait
Comparing 3 runs on same cluster, different configs
CPU queues, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
• 1 blocked process
• 4 blocked processes
• 4 blocked processes (map phase)
Comparing 3 runs on same cluster, different configs
CPU context switches, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
• High context switches with 3 containers on a 2-core VM
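The comparison URL above is a repeated execs[] query parameter, one per execution ID. The Python sketch below rebuilds it with the standard library. The base path http://aloja.bsc.es/perfcharts is an assumed reading of the slide's URL, whose punctuation was lost in extraction.

```python
from urllib.parse import urlencode

# Assumption: base URL reconstructed from the slide text.
BASE_URL = "http://aloja.bsc.es/perfcharts"


def perfcharts_url(exec_ids):
    """Build a comparison URL with one execs[] parameter per execution ID."""
    query = urlencode([("execs[]", exec_id) for exec_id in exec_ids])
    return f"{BASE_URL}?{query}"


url = perfcharts_url([90086, 90088, 90104])
print(url)
```

`urlencode` percent-encodes the brackets, which is where the `%5B%5D` sequences come from.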
Impact of SW configurations in Speedup (4-node clusters)
[Charts: speedup (higher is better) by number of mappers (4m, 6m, 8m, 10m) and by compression algorithm (no compression, ZLIB, BZIP2, Snappy)]
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
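The slides do not spell out how speedup is computed. A common definition, assumed here, is baseline time divided by execution time, with the slowest configuration as the baseline. The Python sketch below applies it to the three run times quoted in the earlier comparison (400s, 800s, 600s).

```python
def speedups(exec_times, baseline=None):
    """Map each config to baseline_time / time (higher is better).

    Assumption: when no baseline is given, the slowest run is used,
    so the worst configuration gets a speedup of 1.0.
    """
    if baseline is None:
        baseline = max(exec_times.values())
    return {name: baseline / seconds for name, seconds in exec_times.items()}


runs = {
    "2 containers, local disk": 400,
    "3 containers, local disk": 800,
    "2 containers, remote disk": 600,
}
result = speedups(runs)
for name, value in sorted(result.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.2f}x")
```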
Impact of HW configurations in Speedup
[Charts: speedup (higher is better) by disks and network (HDD-ETH, HDD-IB, SSD-ETH, SSD-IB) and by cloud remote volumes (local only; 1 remote; 2 remotes; 3 remotes; 3 remotes, tmp local; 2 remotes, tmp local; 1 remote, tmp local)]
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
VM size comparison (Azure)
[Chart: lower is better]
Clusters by cost-effectiveness (1/2)
URL: http://aloja.bsc.es/clustercosteffectiveness
Cluster ID reference
• RL-06 = 8 performance1-8 VMs
• RL-16 = 8 general1-8 VMs
• RL-19 = 8 io1-15 VMs
• RL-33 = 8 performance2-30 VMs
• RL-30 = 8 io1-30 VMs
[Chart labels: Performance2-30, Io1-30, Io1-15, General1-8, Performance1-8]
Clusters by cost-effectiveness (2/2)
URL: http://aloja.bsc.es/clustercosteffectiveness
[Chart labels: Fastest exec, Cheapest exec]
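The fastest execution and the cheapest execution are often not on the same cluster, because cost depends on both the hourly price and the run time. The Python sketch below shows the arithmetic. Only the cluster IDs come from the slide; the prices and times are made-up sample values.

```python
def execution_cost(cost_per_hour, exec_seconds):
    """Cost of one run: hourly cluster price times run time in hours."""
    return cost_per_hour * exec_seconds / 3600


# Hypothetical (USD per hour, seconds) pairs; not measured ALOJA results.
clusters = {
    "RL-06": (2.0, 1800),
    "RL-16": (1.0, 2700),
    "RL-33": (8.0, 900),
}

costs = {
    cluster_id: execution_cost(price, seconds)
    for cluster_id, (price, seconds) in clusters.items()
}
fastest = min(clusters, key=lambda cluster_id: clusters[cluster_id][1])
cheapest = min(costs, key=costs.get)

print("fastest:", fastest, "cheapest:", cheapest, costs)
```

With these sample numbers the most expensive cluster finishes first, while the slowest one gives the lowest cost per run.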
Cost/Performance: scalability of cluster size
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size
- X axis: number of datanodes (cluster size)
- Left Y: execution time (lower is better)
- Right Y: execution cost
[Chart labels: Execution time, Execution cost, Recommended size]
Predictive Analytics and automated learning
Modeling Hadoop: Methodology
Methodology
- 3-step learning process
- Different split sizes tested (10% ≤ training ≤ 50%)
- Different learning algorithms: regression trees, nearest-neighbors methods, linear/multinomial regressions, neural networks
Learning results
- Mean Absolute Errors ~250s (ranges in [100s, 6000s])
- Relative Absolute Errors between [0.10, 0.25]
  • Depend on benchmark and number of examples per benchmark
  • Some executions are/may be anomalies
[Diagram: the ALOJA data-set is split into training, validation, and testing sets. Train a model on the training set and test it on the validation set; if it is not good enough (NO), tune the algorithm and re-train; if it is (YES), select this model as the final model and test it on the testing set.]
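The 3-step process (train, validate and tune, test) can be sketched in a few lines of Python. This is not the ALOJA-ML code: it uses a plain nearest-neighbors predictor, a synthetic data-set, and the textbook definitions of Mean Absolute Error and Relative Absolute Error, all of which are assumptions made for illustration.

```python
import random
from statistics import mean


def split_dataset(rows, train_frac=0.5, valid_frac=0.25, seed=0):
    """Shuffle rows and split them into training, validation and testing sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_valid = int(len(rows) * valid_frac)
    return rows[:n_train], rows[n_train:n_train + n_valid], rows[n_train + n_valid:]


def knn_predict(train, features, k):
    """Predict execution time as the mean of the k nearest training rows."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )[:k]
    return mean(seconds for _, seconds in nearest)


def evaluate(train, rows, k):
    """Return (MAE, RAE) of the k-NN model on rows."""
    actual = [seconds for _, seconds in rows]
    predicted = [knn_predict(train, features, k) for features, _ in rows]
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    baseline = mean(actual)  # RAE compares against always predicting the mean
    rae = sum(abs_errors) / sum(abs(a - baseline) for a in actual)
    return mean(abs_errors), rae


# Synthetic executions: (mappers, uses_ssd) -> seconds. Not real ALOJA data.
rng = random.Random(42)
dataset = []
for _ in range(200):
    mappers = rng.choice([4, 6, 8, 10])
    ssd = rng.choice([0, 1])
    seconds = 3000 - 1500 * ssd - 50 * mappers + rng.uniform(-30, 30)
    dataset.append(((mappers, ssd), seconds))

# Step 1: split. Step 2: tune k on the validation set. Step 3: test once.
train, valid, test = split_dataset(dataset)
best_k = min([1, 3, 5, 9], key=lambda k: evaluate(train, valid, k)[0])
mae, rae = evaluate(train, test, best_k)
print(f"k={best_k} MAE={mae:.1f}s RAE={rae:.3f}")
```

The testing set is touched only once, after the model has been selected, so the reported error is not biased by the tuning step.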
Use case 1: Anomaly Detection
Anomaly Detection
- Model-based detection procedure
- Pass executions through the model
- Executions not fitting the model are considered "out of the system"
[Figures: anomaly detection procedure; sample view from site]
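A minimal Python sketch of model-based detection: pass each execution through the model and flag those whose residual is far larger than the typical one. The threshold rule (a multiple of the median residual) and the toy model are assumptions; the slides do not say which criterion ALOJA uses.

```python
from statistics import median


def find_anomalies(executions, predict, tolerance=3.0):
    """Return executions whose error exceeds tolerance x the median error.

    executions: list of (config, observed_seconds)
    predict:    function mapping a config to predicted seconds
    """
    residuals = [abs(seconds - predict(config)) for config, seconds in executions]
    typical = median(residuals)
    return [
        execution
        for execution, residual in zip(executions, residuals)
        if residual > tolerance * typical
    ]


# Toy model and data, for illustration only.
model = {"hdd": 1000, "ssd": 500}
executions = [
    ("hdd", 1010),
    ("hdd", 990),
    ("ssd", 505),
    ("ssd", 495),
    ("ssd", 1500),  # does not fit the model
]
anomalies = find_anomalies(executions, lambda config: model[config])
print(anomalies)
```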
Use case 2: Guided Benchmarking - Method
Guided Benchmarking
- Best subset of configurations for modeling a Hadoop deployment
- Clustering to get the "representative execution" for each similar subset of executions
[Diagram: cluster the ALOJA data-set, build a model from the cluster centers, and test it against a reference model built from the full data-set. If the error is not OK (NO), increase the number of centers and repeat; if it is OK (YES), the centers are the configs to execute.]
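The guided benchmarking loop can be sketched as follows in Python: cluster the executions, keep one representative per cluster, model the deployment from the representatives alone, and add centers until the error is acceptable. The plain k-means, the nearest-representative model, the synthetic data, and the error target are all assumptions for illustration, not the ALOJA implementation.

```python
import random
from statistics import mean


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10, seed=0):
    """Plain k-means over numeric tuples; returns k centers."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: sq_dist(point, centers[i]))
            groups[nearest].append(point)
        # Keep the old center when a group ends up empty.
        centers = [
            tuple(mean(dim) for dim in zip(*group)) if group else center
            for group, center in zip(groups, centers)
        ]
    return centers


def representatives(executions, k):
    """Pick the real execution closest to each k-means center."""
    return [
        min(executions, key=lambda e: sq_dist(e, center))
        for center in kmeans(executions, k)
    ]


def predict(reps, config):
    """Predict time from the representative with the closest config."""
    return min(reps, key=lambda rep: sq_dist(rep[:-1], config))[-1]


def guided_benchmarking(executions, max_error):
    """Increase the number of centers until the mean absolute error is OK."""
    for k in range(1, len(executions) + 1):
        reps = representatives(executions, k)
        error = mean(abs(predict(reps, e[:-1]) - e[-1]) for e in executions)
        if error <= max_error:
            return k, reps, error
    raise RuntimeError("no subset met the error target")


# Synthetic executions: (uses_ssd, uses_infiniband, seconds). Not real data.
rng = random.Random(1)
executions = [
    (ssd, ib, 3000 - 1700 * ssd + 150 * ib * (1 - ssd) + rng.uniform(-10, 10))
    for ssd in (0, 1)
    for ib in (0, 1)
    for _ in range(5)
]
k, reps, error = guided_benchmarking(executions, max_error=25)
print(f"{k} representative configs, error {error:.1f}s")
```

Note that the clustering distance here is dominated by the time column, since it is three orders of magnitude larger than the 0/1 flags; a real implementation would normalize the features first.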
Use case 3: Knowledge Discovery
Make analyzing results easier
- Multi-variable visualization
- Trees separating relevant attributes
- Other interesting tools
[Chart: pred_time for HDD and SSD]
Tree descriptor
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 1241s
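The tree descriptor reads naturally as a nested lookup: disk, then network, then IO file buffer size, ending in a predicted time. The Python sketch below encodes the values from the slide. The last leaf is garbled in the source and is read here as 1241s, which is an assumption, as is the expansion of IOFBuf to "IO file buffer".

```python
# Predicted execution time in seconds: disk -> network -> IO file buffer size.
# The SSD/IB/64KB leaf is garbled in the source; 1241 is an assumed reading.
PRED_TIME_TREE = {
    "HDD": {
        "ETH": {"128KB": 2935, "64KB": 2942},
        "IB": {"128KB": 3118, "64KB": 3125},
    },
    "SSD": {
        "ETH": {"128KB": 1248, "64KB": 1256},
        "IB": {"128KB": 1233, "64KB": 1241},
    },
}


def pred_time(disk, net, io_file_buffer):
    """Walk the tree and return the predicted time for one configuration."""
    return PRED_TIME_TREE[disk][net][io_file_buffer]


def best_config(tree):
    """Return (seconds, disk, net, buffer) for the fastest leaf."""
    leaves = [
        (seconds, disk, net, buffer_size)
        for disk, nets in tree.items()
        for net, buffers in nets.items()
        for buffer_size, seconds in buffers.items()
    ]
    return min(leaves)


print(pred_time("HDD", "ETH", "128KB"))
print(best_config(PRED_TIME_TREE))
```

The top-level split on disk type accounts for most of the difference in predicted time, which is the kind of insight the tree is meant to surface.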
Concluding remarks and references
Concluding remarks
Benchmarking: it's fun, or at least...
- It will save you €€€ and allow you to scale
But it is also tough
- The industry needs more transparency
- We still have a lot to do...
In ALOJA we provide the benchmarking scripts
- And also the results, which should be your first entry point
We are constantly adding new features
- Benchmarks, systems, providers
It is an open initiative; you are invited to participate
- Beta testers
- Contributors
With predictive analytics we can automate and find trends faster
- Especially to save on benchmarking costs and time
Find us around the conference for more details on the tools...
Fork our repo at https://github.com/Aloja/aloja
More info
ALOJA benchmarking platform and online repository
- http://aloja.bsc.es, http://aloja.bsc.es/publications
Benchmarking Big Data
- http://www.slideshare.net/ni_po/benchmarking-hadoop
BDOOP meetup group in Barcelona
Big Data Benchmarking Community (BDBC) mailing list
- (~200 members from ~80 organizations)
- http://clds.sdsc.edu/bdbc/community
Workshop on Big Data Benchmarking (WBDB)
- Next: http://clds.sdsc.edu/wbdb2015.ca
SPEC Research Big Data working group
- http://research.spec.org/working-groups/big-data-working-group.html
Slides and video
- Michael Frank on Big Data benchmarking
  • http://www.tele-task.de/archive/podcast/20430/
- Tilmann Rabl, Big Data Benchmarking Tutorial
  • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
www.bsc.es
Q&A
Thanks
Contact: nicolas.poggi@bsc.es
BSCrsquos project ALOJA towards cost-effective Big Data
Open research project for improving the cost-effectiveness
of Big Data deployments
Benchmarking and Analysis tools
Online repository and largest Big Data repo
ndash 50000+ runs of HiBench TPC-H and [some] BigBench
ndash Over 100 HW configurations tested bull Of dif ferent NodeVM disks and networks
bull Cloud Multi-cloud provider including both IaaS and PaaS
bull On-premise High-end HPC commodity low-power
Community ndash Collaborations with industry and Academia
ndash Presented in different conferences and workshops
ndash Visibility 47 different countries
httpalojabsces
Big Data Benchmarking
Online Repository
Web
Analytics
ALOJA research lines broad coverage
Techniques for obtaining CostPerformance Insights
Profiling
bull HPC Low-level
bull High Accuracy
bull Manual Analysis
Benchmarking
bull Iterate configs
bull HW and SW
bull Real executions
bull Log parsing and data sanitization
Analysis tools
bull Summarize large number of results
bull By criteria
bull Filter noise
bull Fast processing
Predictive Analytics
bull Automated modeling
bull Estimations
bull Virtual executions
bull Automated KD
Big Data Apps
Frameworks
Systems Clusters
Cloud ProvidersDatacenters
Evaluation of
Test different clusters and architectures ndash On-premise
bull Commodity high-end appliance low-power
ndash Cloud IaaS bull 32 different VMs in Azure
similar in other providers
ndash Cloud PaaS bull HDInsight EMR CloudBigData
Different access level ndash Full admin user-only request-
to-install everything ready queuing systems (SGE)
Different versions ndash Hadoop JVM Spark Hive
etchellip
ndash Other benchmarks
Problems ndash All systems though for PROD
bull Not for comparison
ndash No Azure support
ndash Many different packages
ndash No one-fits-all solution
Dev environments and testing
ndash Big Data usually requires a cluster to develop and test
Solution ndash Custom implementation
ndash Based in simple components
ndash Wrapping commands
Challenges (circa end 2013)
Benchmarking with ALOJArsquos open source tools
ALOJA Platform main components
2 Online Repository
bullExplore results
bullExecution details
bullCluster details
bullCosts
bullData sharing
3 Web Analytics
bullData views and evaluations
bullAggregates
bullAbstracted Metrics
bullJob characterization
bullMachine Learning
bullPredictions and clustering
1 Big Data Benchmarking
bullDeploy amp Provision
bullConf Management
bullParameter selection amp Queuing
bullPerf counters
bullLow-level instrumentation
bullApp logs
19
NGINX PHP MySQL
BASH Unix tools CLIs R SQL JS
Extending and collaborating in ALOJA
1 Install prerequisites ndash git vagrant VirtualBox
2 git clone httpsgithubcomAlojaalojagit
3 cd aloja
4 vagrant up
5 Open your browser at httplocalhost8080
6 Optional start the benchmarking cluster
vagrant up
Setting up a DEV environment
Installs a Web Server with sample data
Sets a local cluster to test benchmarking
Workflow in ALOJA
Cluster(s) definition
bull VM sizes
bull nodes
bull OS disks bull Capabilities
Execution plan
bull Start cluster
bull Setup
bull Exec Benchmarks
bull Cleanup
Import data
bull Convert perf metric
bull Parse logs
bull Import into DB
Evaluate data
bull Data views in Vagrant VM
bull Or httpalojabsces
PA and KD
bullPredictive Analytics
bullKnowledge Discovery
Historic
Repo
Commands and providers
Provisioning commands Providers
Connect
ndash Node and Cluster
ndash Builds SSH cmd line
bull SSH proxies
Deploy ndash Creates a cluster
ndash Sets SSH credentials
ndash If created updates config as needed
ndash If stopped starts nodes
Start Stop
Delete
Queue jobs to clusters
On-premise
ndash Custom settings for
clusters
bull Multiple disk types
bull Different architectures
Cloud IaaS
ndash Azure OpenStack
Rackspace AWS (testing)
Cloud PaaS
ndash HDInsight CloudBigData
EMR soon
Code at httpsgithubcomAlojaalojatreemasteraloja-deploy
Cluster and nodes definitions multi-provider abstraction
Steps to define a cluster Import defaults (if any) ndash Sets OS version
Select provider ndash Azure RackSpace AWS On-
premise vagranthellip
Name the cluster and size
Optional ndash Select VM type
ndash Attached disks
ndash Define metadata
ndash And costs
Nodes can also be defined ndash For Web share folders etc
You can logically split clusters
Azure 8-datanode sample load AZURE defaults
source $CONF_DIRcluster_defaultsconf
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 in GB
details
vmCores=4
vmRAM=7 in GB
costs
clusterCostHour=1584 in USD
clusterType=IaaS
Source sample httpsgithubcomAlojaalojablobmastershellconfcluster_al-08conf
Running benchmarks in ALOJA
Benchmarking with defaults
repo_locationaloja-benchrun_benchssh
To queue jobs
repo_locationshellexeqsh
Code at httpsgithubcomAlojaalojablobmasteraloja-benchrun_benchssh
Testing different configurations
Approaches
1 Config folders
2 Override variables
1 In benchmark_defaultsconf
2 In cluster config
3 Cmd line
1 Via parameters
run_benchssh -r 2 -m 10
1 Via shell globals HADOOP_VERSION=hadoop-271
BENCH_DATA_SIZE=1TB
Things to look for HW OS ndash Versions
ndash Disk config and mounts
SW ndash Replication
ndash Block sizes
ndash Compression
ndash IO buffers
Build your exec plan in a script and queue
Or follow ML recommendations
ALOJA-WEB
Entry point for explore the results collected from the executions ndash Provides insights on the obtained results through continuously evolving data views
Online DEMO at httpalojabsces
Online benchmarking results
28
2) ALOJA-WEB Online Repository
Entry point for explore the results collected from the executions
ndash Index of executions bull Quick glance of executions
bull Searchable Sortable
ndash Execution details bull Performance charts and histograms
bull Hadoop counters
bull Jobs and task details
Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup
Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs
Sharing results ndash Download executions ndash Add external executions
Documentation and References ndash Papers links and feature documentation
Available at httphadoopbsces
Comparing 3 runs on same cluster different configs
Mappers and reducers 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
400s 2 containers Local disk
800s 3 containers Local disk
600s 2 containers Remote disk
Comparing 3 runs on same cluster different configs
CPU utilization 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
Moderate iowait
Higher iowait
Very high iowait
Comparing 3 runs on same cluster different configs
CPU queues 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
1 blocked process
4 blocked processes
4 blocked processes (map phase)
Comparing 3 runs on same cluster different configs
CPU context switches 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
High context switches with 3
containers on a 2-core VM
Impact of SW configurations in Speedup (4 node clusters)
Number of mappers Compression algorithm
No comp
ZLIB
BZIP2
snappy
4m
6m
8m
10m
Speedup (higher is better)
Results using httphadoopbscesconfigimprovement
Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf
Impact of HW configurations in Speedup
Disks and Network Cloud remote volumes
Local only
1 Remote
2 Remotes
3 Remotes
3 Remotes tmp local
2 Remotes tmp local
1 Remotes tmp local
HDD-ETH
HDD-IB
SSD-ETH
SDD-IB
Speedup (higher is better)
Results using httphadoopbscesconfigimprovement
Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf
VM Size comparison (Azure)
Lower is better
Clusters by cost-effectiveness 12
URL httpalojabscesclustercosteffectiveness
bull Cluster ID reference
bull RL-06 = 8 performance1-8 VMs
bull RL-16 = 8 general1-8 VMs
bull RL-19 = 8 io1-15 VMs
bull RL-33 = 8 performance2-30 VMs
bull RL-30 = 8 io1-30 VMs
Performance2-30
Io1-30
Io1-15
General1-8
Performance1-8
Io1-30
Clusters by cost-effectiveness 22
URL httpalojabscesclustercosteffectiveness
Fastest Exec Cheapest exec
bull Cluster ID reference
bull RL-06 = 8 performance1-8 VMs
bull RL-16 = 8 general1-8 VMs
bull RL-19 = 8 io1-15 VMs
bull RL-33 = 8 performance2-30 VMs
bull RL-30 = 8 io1-30 VMs
CostPerformance Scalability of cluster size
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size
ndash Left Y Execution time (lower is better)
ndash Right Y Execution cost
Execution time Execution cost
Recommended size
Predictive Analytics and automated learning
Modeling Hadoop ndash Methodology
Methodology ndash 3-step learning process
ndash Different split sizes tested (10 le training le 50)
ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks
Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])
ndash Relative Absolute Errors between [010 025]
bull Depend on benchmark and of examples per benchmark
bull Some executions aremay be anomalies
40
ALOJA
Data-Set
Training
Validation
Testing
Model Select this
model
Final
Model Train
Test the model
Test the model
Tune algorithm re-train
NO
YES
Use case 1 Anomaly Detection
Anomaly Detection
ndash Model-based detection procedure
ndash Pass executions through the model
ndash Executions not fitting the model are considered ldquoout of the systemrdquo
Anomaly detection procedure Sample view from site
Use case 2 Guided Benchmarking ndash Method
Guided Benchmarking
ndash Best subset of configurations for modeling a Hadoop deployment
ndash Clustering to get the ldquorepresentative executionrdquo for each similar subset
of executions
ALOJA
Data-Set
Increase number of centers
NO
YES Clustering
Data-set
(centers) Model Is error
OK
Configs to
execute
Model Build
Build
Test
Reference
42
Use case 3: Knowledge Discovery
Make analyzing results easier
– Multi-variable visualization
– Trees separating relevant attributes
– Other interesting tools
[Figure: pred_time for HDD and SSD]
Tree Descriptor
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 1241s
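One way to read the tree descriptor is as a lookup from a configuration to a predicted time. The sketch below encodes each root-to-leaf path as one case pattern; the function name and argument order are hypothetical, and the value for the last leaf (SSD, IB, 64KB) is an assumed reading of the slide.

```shell
# pred_time <disk> <net> <iofbuf_kb>: predicted execution time in seconds.
# Each pattern is one root-to-leaf path of the tree: Disk / Net / IOFBuf.
pred_time() {
  case "$1/$2/$3" in
    HDD/ETH/128) echo 2935 ;;
    HDD/ETH/64)  echo 2942 ;;
    HDD/IB/128)  echo 3118 ;;
    HDD/IB/64)   echo 3125 ;;
    SSD/ETH/128) echo 1248 ;;
    SSD/ETH/64)  echo 1256 ;;
    SSD/IB/128)  echo 1233 ;;
    SSD/IB/64)   echo 1241 ;;   # assumed reading of the last leaf
    *) echo "unknown configuration: $1 $2 $3" >&2; return 1 ;;
  esac
}

pred_time HDD ETH 128   # prints 2935
pred_time SSD ETH 128   # prints 1248
```

Comparing leaves that differ in a single attribute shows why the tree splits on disk first: switching HDD to SSD changes the prediction far more than changing the network or the buffer size.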
Concluding remarks and reference
Concluding remarks
Benchmarking: it's fun, or at least…
– It will save you €€€ and allow you to scale
But it is also tough
– The industry needs more transparency
– We still have a lot to do…
In ALOJA we provide the benchmarking scripts
– And also the results, which should be your first entry point
We are constantly adding new features
– Benchmarks, systems, providers
It is an open initiative; you're invited to participate
– Beta testers
– Contributors
With predictive analytics we can automate and find trends faster
– Especially to save on benchmarking costs and time
Find us around the conference for more details on the tools…
Fork our repo at https://github.com/Aloja/aloja
More info
ALOJA Benchmarking platform and online repository
– http://aloja.bsc.es
– http://aloja.bsc.es/publications
Benchmarking Big Data
– http://www.slideshare.net/ni_po/benchmarking-hadoop
BDOOP meetup group in Barcelona
Big Data Benchmarking Community (BDBC) mailing list
– (~200 members from ~80 organizations)
– http://clds.sdsc.edu/bdbc/community
Workshop Big Data Benchmarking (WBDB)
– Next: http://clds.sdsc.edu/wbdb2015.ca
SPEC Research Big Data working group
– http://research.spec.org/working-groups/big-data-working-group.html
Slides and video
– Michael Frank on Big Data benchmarking
  • http://www.tele-task.de/archive/podcast/20430
– Tilmann Rabl, Big Data Benchmarking Tutorial
  • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
Q&A
Thanks
Contact: nicolas.poggi@bsc.es
ALOJA research lines: broad coverage
Techniques for obtaining Cost/Performance insights
Profiling
• HPC, low-level
• High accuracy
• Manual analysis
Benchmarking
• Iterate configs
• HW and SW
• Real executions
• Log parsing and data sanitization
Analysis tools
• Summarize large number of results
• By criteria
• Filter noise
• Fast processing
Predictive Analytics
• Automated modeling
• Estimations
• Virtual executions
• Automated KD
Evaluation of
• Big Data apps
• Frameworks
• Systems, clusters
• Cloud providers, datacenters
Challenges (circa end 2013)
Test different clusters and architectures
– On-premise
  • Commodity, high-end, appliance, low-power
– Cloud IaaS
  • 32 different VMs in Azure, similar in other providers
– Cloud PaaS
  • HDInsight, EMR, CloudBigData
Different access levels
– Full admin, user-only, request-to-install, everything ready, queuing systems (SGE)
Different versions
– Hadoop, JVM, Spark, Hive, etc…
– Other benchmarks
Problems
– Existing systems are all designed for production (PROD)
  • Not for comparison
– No Azure support
– Many different packages
– No one-fits-all solution
Dev environments and testing
– Big Data usually requires a cluster to develop and test
Solution
– Custom implementation
– Based on simple components
– Wrapping commands
Benchmarking with ALOJA’s open source tools
ALOJA Platform main components
1. Big Data Benchmarking
• Deploy & Provision
• Conf Management
• Parameter selection & Queuing
• Perf counters
• Low-level instrumentation
• App logs
2. Online Repository
• Explore results
• Execution details
• Cluster details
• Costs
• Data sharing
3. Web Analytics
• Data views and evaluations
• Aggregates
• Abstracted Metrics
• Job characterization
• Machine Learning
• Predictions and clustering
Technologies: NGINX, PHP, MySQL
Technologies: BASH, Unix tools, CLIs, R, SQL, JS
Extending and collaborating in ALOJA
Setting up a DEV environment
1. Install prerequisites – git, vagrant, VirtualBox
2. git clone https://github.com/Aloja/aloja.git
3. cd aloja
4. vagrant up
   Installs a Web Server with sample data
5. Open your browser at http://localhost:8080
6. Optional: start the benchmarking cluster
   vagrant up
   Sets up a local cluster to test benchmarking
Workflow in ALOJA
Cluster(s) definition
• VM sizes
• # nodes
• OS, disks
• Capabilities
Execution plan
• Start cluster
• Setup
• Exec Benchmarks
• Cleanup
Import data
• Convert perf metrics
• Parse logs
• Import into DB
Evaluate data
• Data views in Vagrant VM
• Or http://aloja.bsc.es
PA and KD
• Predictive Analytics
• Knowledge Discovery
Historic Repo
Commands and providers
Provisioning commands
Connect
– Node and Cluster
– Builds SSH cmd line
  • SSH proxies
Deploy
– Creates a cluster
– Sets SSH credentials
– If created, updates config as needed
– If stopped, starts nodes
Start, Stop
Delete
Queue jobs to clusters
Providers
On-premise
– Custom settings for clusters
  • Multiple disk types
  • Different architectures
Cloud IaaS
– Azure, OpenStack, Rackspace, AWS (testing)
Cloud PaaS
– HDInsight, CloudBigData, EMR soon
Code at https://github.com/Aloja/aloja/tree/master/aloja-deploy
Cluster and nodes definitions: multi-provider abstraction
Steps to define a cluster
Import defaults (if any)
– Sets OS, version
Select provider
– Azure, RackSpace, AWS, On-premise, vagrant…
Name the cluster and size
Optional
– Select VM type
– Attached disks
– Define metadata
– And costs
Nodes can also be defined
– For Web, share folders, etc.
You can logically split clusters
Azure 8-datanode sample
# load AZURE defaults
source $CONF_DIR/cluster_defaults.conf
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 # in GB
# details
vmCores=4
vmRAM=7 # in GB
# costs
clusterCostHour=1.584 # in USD
clusterType=IaaS
Source sample: https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf
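Because a cluster definition is a plain shell file of variable assignments, other scripts can load it and derive figures from it. The sketch below writes a copy of the Azure sample values to a temporary file and prints aggregate datanode resources; the cluster_summary helper and the temporary file are hypothetical and are not part of ALOJA.

```shell
# Write a sample cluster definition to a temporary file (values from the Azure sample).
conf_file="$(mktemp)"
trap 'rm -f "$conf_file"' EXIT
cat > "$conf_file" <<'EOF'
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024   # in GB
vmCores=4
vmRAM=7         # in GB
EOF

# Load the definition in a subshell (so its variables do not leak) and summarize it.
cluster_summary() {
  (
    . "$1"
    echo "${clusterName}: $(( numberOfNodes * vmCores )) cores, $(( numberOfNodes * vmRAM )) GB RAM, $(( numberOfNodes * attachedVolumes * diskSize )) GB attached storage"
  )
}

cluster_summary "$conf_file"
# prints: azure-large-8: 32 cores, 56 GB RAM, 24576 GB attached storage
```

Keeping the definition as sourceable assignments is what lets the same file drive deployment, benchmarking and cost reporting.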
Running benchmarks in ALOJA
Benchmarking with defaults
repo_location/aloja-bench/run_benchs.sh
To queue jobs
repo_location/shell/exeq.sh
Code at https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh
Testing different configurations
Approaches
1. Config folders
2. Override variables
   1. In benchmark_defaults.conf
   2. In cluster config
3. Cmd line
   1. Via parameters
      run_benchs.sh -r 2 -m 10
   2. Via shell globals
      HADOOP_VERSION=hadoop-2.7.1 BENCH_DATA_SIZE=1TB
Things to look for
HW, OS
– Versions
– Disk config and mounts
SW
– Replication
– Block sizes
– Compression
– IO buffers
Build your exec plan in a script and queue
Or follow ML recommendations
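An execution plan built in a script can be as simple as a nested loop that prints one command line per combination. This is a minimal sketch: it only uses the two shell globals named on the slide, the 100GB size is a made-up extra value, the build_exec_plan helper is hypothetical, and the commands are printed rather than run so the plan can be reviewed or handed to a queue.

```shell
# Path to the benchmark runner; repo_location is the placeholder used on the slide.
RUN_BENCHS="${RUN_BENCHS:-repo_location/aloja-bench/run_benchs.sh}"

# build_exec_plan "<versions>" "<sizes>": print one command line per combination.
build_exec_plan() {
  for version in $1; do
    for size in $2; do
      echo "HADOOP_VERSION=$version BENCH_DATA_SIZE=$size $RUN_BENCHS"
    done
  done
}

build_exec_plan "hadoop-2.7.1" "100GB 1TB"
```

Each printed line overrides the defaults through shell globals, so adding another dimension to the plan means adding another loop.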
ALOJA-WEB
Online benchmarking results
Entry point to explore the results collected from the executions
– Provides insights on the obtained results through continuously evolving data views
Online DEMO at http://aloja.bsc.es
2) ALOJA-WEB Online Repository
Entry point to explore the results collected from the executions
– Index of executions
  • Quick glance of executions
  • Searchable, sortable
– Execution details
  • Performance charts and histograms
  • Hadoop counters
  • Jobs and task details
Data management of benchmark executions
– Data importing from different clusters
– Execution validation
– Data management and backup
Cluster definitions
– Cluster capabilities (resources)
– Cluster costs
Sharing results
– Download executions
– Add external executions
Documentation and References
– Papers, links, and feature documentation
Available at http://hadoop.bsc.es
Comparing 3 runs on same cluster, different configs
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
Mappers and reducers, 48-node cluster
• 400s: 2 containers, local disk
• 800s: 3 containers, local disk
• 600s: 2 containers, remote disk
CPU utilization, 48-node cluster
• Moderate iowait
• Higher iowait
• Very high iowait
CPU queues, 48-node cluster
• 1 blocked process
• 4 blocked processes
• 4 blocked processes (map phase)
CPU context switches, 48-node cluster
• High context switches with 3 containers on a 2-core VM
Impact of SW configurations in Speedup (4-node clusters)
[Figure: two panels showing speedup (higher is better)]
– Number of mappers: 4m, 6m, 8m, 10m
– Compression algorithm: No comp., ZLIB, BZIP2, snappy
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
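Speedup charts like these compare each configuration against a baseline. The sketch below assumes the usual definition, baseline time divided by the configuration's time, with the first input line taken as the baseline; the labels and timings are made up for illustration and are not ALOJA measurements.

```shell
# Input lines: "label seconds"; the first line is the baseline.
# Output: "label speedup", where values above 1 are faster than the baseline.
speedups() {
  awk 'NR == 1 { base = $2 } { printf "%s %.2f\n", $1, base / $2 }'
}

printf '%s\n' "no-comp 1000" "zlib 800" "snappy 625" | speedups
# prints:
# no-comp 1.00
# zlib 1.25
# snappy 1.60
```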
Impact of HW configurations in Speedup
[Figure: two panels showing speedup (higher is better)]
– Disks and Network: HDD-ETH, HDD-IB, SSD-ETH, SSD-IB
– Cloud remote volumes: Local only; 1 Remote; 2 Remotes; 3 Remotes; 3 Remotes, tmp local; 2 Remotes, tmp local; 1 Remote, tmp local
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
VM Size comparison (Azure)
[Figure: VM size comparison; lower is better]
Clusters by cost-effectiveness (1/2)
URL: http://aloja.bsc.es/clustercosteffectiveness
• Cluster ID reference
  • RL-06 = 8 performance1-8 VMs
  • RL-16 = 8 general1-8 VMs
  • RL-19 = 8 io1-15 VMs
  • RL-33 = 8 performance2-30 VMs
  • RL-30 = 8 io1-30 VMs
[Figure: clusters plotted by cost-effectiveness; point labels: Performance2-30, Io1-30, Io1-15, General1-8, Performance1-8, Io1-30]
ALOJA Platform main components
1. Big Data Benchmarking
• Deploy & Provision
• Conf Management
• Parameter selection & Queuing
• Perf counters
• Low-level instrumentation
• App logs
2. Online Repository
• Explore results
• Execution details
• Cluster details
• Costs
• Data sharing
3. Web Analytics
• Data views and evaluations
• Aggregates
• Abstracted Metrics
• Job characterization
• Machine Learning
• Predictions and clustering
Technologies: NGINX, PHP, MySQL; BASH, Unix tools, CLIs, R, SQL, JS
Extending and collaborating in ALOJA
Setting up a DEV environment
1. Install prerequisites – git, vagrant, VirtualBox
2. git clone https://github.com/Aloja/aloja.git
3. cd aloja
4. vagrant up
5. Open your browser at http://localhost:8080
6. Optional: start the benchmarking cluster
   vagrant up
Installs a Web Server with sample data
Sets a local cluster to test benchmarking
Workflow in ALOJA
Cluster(s) definition
• VM sizes
• Number of nodes
• OS, disks
• Capabilities
Execution plan
• Start cluster
• Setup
• Exec Benchmarks
• Cleanup
Import data
• Convert perf metric
• Parse logs
• Import into DB
Evaluate data
• Data views in Vagrant VM
• Or http://aloja.bsc.es
PA and KD
• Predictive Analytics
• Knowledge Discovery
Historic Repo
Commands and providers
Provisioning commands
Connect
– Node and Cluster
– Builds SSH cmd line
• SSH proxies
Deploy
– Creates a cluster
– Sets SSH credentials
– If created, updates config as needed
– If stopped, starts nodes
Start, Stop
Delete
Queue jobs to clusters
Providers
On-premise
– Custom settings for clusters
• Multiple disk types
• Different architectures
Cloud IaaS
– Azure, OpenStack, Rackspace, AWS (testing)
Cloud PaaS
– HDInsight, CloudBigData, EMR soon
Code at https://github.com/Aloja/aloja/tree/master/aloja-deploy
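The Deploy behaviour listed above (create the cluster, update its config if it already exists, start its nodes if it is stopped) amounts to a decision on the cluster's current state. A minimal sketch in Python, assuming three hypothetical state names and hypothetical action labels; none of these identifiers come from the aloja-deploy code:

```python
from typing import List

# Hypothetical cluster states; aloja-deploy may track state differently.
ABSENT, STOPPED, RUNNING = "absent", "stopped", "running"


def deploy_actions(state: str) -> List[str]:
    """Return the ordered actions a deploy command would take for a state."""
    if state == ABSENT:
        # Nothing exists yet: create the cluster and set SSH credentials.
        return ["create_cluster", "set_ssh_credentials"]
    if state == STOPPED:
        # The cluster exists but is powered off: start its nodes.
        return ["start_nodes"]
    if state == RUNNING:
        # The cluster is already created and up: update config as needed.
        return ["update_config"]
    raise ValueError(f"unknown cluster state: {state!r}")


for state in (ABSENT, STOPPED, RUNNING):
    print(state, "->", deploy_actions(state))
```

Written this way, running deploy twice on the same cluster is harmless: the second call only refreshes the configuration.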
Cluster and nodes definitions: multi-provider abstraction
Steps to define a cluster
Import defaults (if any)
– Sets OS version
Select provider
– Azure, RackSpace, AWS, On-premise, vagrant…
Name the cluster and size
Optional
– Select VM type
– Attached disks
– Define metadata
– And costs
Nodes can also be defined
– For Web, share folders, etc.
You can logically split clusters
Azure 8-datanode sample
#load AZURE defaults
source "$CONF_DIR/cluster_defaults.conf"
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 #in GB
#details
vmCores=4
vmRAM=7 #in GB
#costs
clusterCostHour=1.584 #in USD
clusterType=IaaS
Source sample https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf
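The fields of the Azure sample above are enough to derive cluster totals and the cost of a single run. A minimal sketch in Python; the class and method names are hypothetical, the hourly cost is read as 1.584 USD (the decimal point is an assumption), and linear per-second billing is also an assumption:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterDef:
    """Subset of the fields in the Azure 8-datanode sample."""
    name: str
    number_of_nodes: int    # datanodes
    vm_cores: int           # cores per VM
    vm_ram_gb: int          # RAM per VM, in GB
    attached_volumes: int   # volumes attached to each VM
    disk_size_gb: int       # size of each attached volume, in GB
    cost_hour_usd: float    # cost of the whole cluster per hour, in USD

    def total_cores(self) -> int:
        return self.number_of_nodes * self.vm_cores

    def total_disk_gb(self) -> int:
        return self.number_of_nodes * self.attached_volumes * self.disk_size_gb

    def execution_cost_usd(self, exec_time_s: float) -> float:
        """Cost of one run, assuming linear per-second billing."""
        return self.cost_hour_usd * exec_time_s / 3600.0


azure_large_8 = ClusterDef(
    name="azure-large-8", number_of_nodes=8, vm_cores=4, vm_ram_gb=7,
    attached_volumes=3, disk_size_gb=1024,
    cost_hour_usd=1.584,  # decimal point assumed; the extracted slide shows 1584
)
print(azure_large_8.total_cores())               # 32
print(azure_large_8.execution_cost_usd(400.0))   # cost of a 400-second run
```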
Running benchmarks in ALOJA
Benchmarking with defaults
repo_location/aloja-bench/run_benchs.sh
To queue jobs
repo_location/shell/exeq.sh
Code at https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh
Testing different configurations
Approaches
1. Config folders
2. Override variables
   1. In benchmark_defaults.conf
   2. In cluster config
3. Cmd line
   1. Via parameters
      run_benchs.sh -r 2 -m 10
   2. Via shell globals
      HADOOP_VERSION=hadoop-2.7.1
      BENCH_DATA_SIZE=1TB
Things to look for
HW, OS
– Versions
– Disk config and mounts
SW
– Replication
– Block sizes
– Compression
– IO buffers
Build your exec plan in a script and queue
Or follow ML recommendations
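Building an exec plan in a script can be as simple as enumerating every combination of the settings under test and emitting one command line per combination. A minimal sketch in Python: the -r and -m flags and the HADOOP_VERSION global are the ones shown in the example above (the script name is written here as run_benchs.sh), their exact meaning is not asserted, and the sweep values and the way the global is passed on the command line are assumptions:

```python
from itertools import product

# Hypothetical sweep values, chosen only for illustration.
R_VALUES = [1, 2, 3]
M_VALUES = [4, 6, 8, 10]
HADOOP_VERSIONS = ["hadoop-2.7.1"]


def build_exec_plan(r_values, m_values, hadoop_versions):
    """Return one shell command line per combination of settings."""
    plan = []
    for version, r, m in product(hadoop_versions, r_values, m_values):
        # Assumption: shell globals can be given as environment assignments.
        plan.append(f"HADOOP_VERSION={version} run_benchs.sh -r {r} -m {m}")
    return plan


plan = build_exec_plan(R_VALUES, M_VALUES, HADOOP_VERSIONS)
print(len(plan))  # 12 command lines (1 version x 3 x 4)
for line in plan[:3]:
    print(line)
```

The resulting lines could then be written to a file and handed to the queue; the number of runs grows multiplicatively with each added setting, which is the cost that guided benchmarking tries to reduce.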
ALOJA-WEB
Entry point to explore the results collected from the executions
– Provides insights on the obtained results through continuously evolving data views
Online DEMO at http://aloja.bsc.es
Online benchmarking results
2) ALOJA-WEB Online Repository
Entry point to explore the results collected from the executions
– Index of executions
• Quick glance of executions
• Searchable, Sortable
– Execution details
• Performance charts and histograms
• Hadoop counters
• Jobs and task details
Data management of benchmark executions
– Data importing from different clusters
– Execution validation
– Data management and backup
Cluster definitions
– Cluster capabilities (resources)
– Cluster costs
Sharing results
– Download executions
– Add external executions
Documentation and References
– Papers, links, and feature documentation
Available at http://hadoop.bsc.es
Comparing 3 runs on same cluster, different configs (48-node cluster)
URL http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
Mappers and reducers
• 400s, 2 containers, Local disk
• 800s, 3 containers, Local disk
• 600s, 2 containers, Remote disk
CPU utilization
• Moderate iowait
• Higher iowait
• Very high iowait
CPU queues
• 1 blocked process
• 4 blocked processes
• 4 blocked processes (map phase)
CPU context switches
• High context switches with 3 containers on a 2-core VM
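The comparison URL used on these slides carries one execs[] parameter per execution, with the brackets percent-encoded as %5B%5D. A minimal sketch in Python of building such a URL for any set of execution IDs; the function name is hypothetical and the /perfcharts path with a ? separator is an assumption read from the slide's URL:

```python
from urllib.parse import urlencode

# Assumed base path, reconstructed from the URL shown on the slides.
BASE = "http://aloja.bsc.es/perfcharts"


def compare_url(exec_ids):
    """Build a comparison URL with one execs[] parameter per execution ID."""
    # urlencode percent-encodes the brackets in the key: execs[] -> execs%5B%5D
    query = urlencode([("execs[]", exec_id) for exec_id in exec_ids])
    return f"{BASE}?{query}"


print(compare_url([90086, 90088, 90104]))
```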
Impact of SW configurations in Speedup (4 node clusters)
[Figure: two charts. Number of mappers (4m, 6m, 8m, 10m) and Compression algorithm (No comp, ZLIB, BZIP2, snappy). Y axis: Speedup (higher is better)]
Results using http://hadoop.bsc.es/configimprovement
Details https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
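Speedup in these charts is a relative measure, so it needs a baseline configuration. A minimal sketch in Python, assuming the usual definition (baseline execution time divided by the configuration's execution time); the function name is hypothetical and the times are made-up illustration values, not results from the talk:

```python
def speedups(exec_times_s, baseline):
    """Map each configuration to baseline_time / its_time (higher is better)."""
    base = exec_times_s[baseline]
    return {config: base / t for config, t in exec_times_s.items()}


# Hypothetical execution times in seconds, for illustration only.
times = {"No comp": 1000.0, "ZLIB": 800.0, "BZIP2": 1250.0, "snappy": 500.0}
result = speedups(times, baseline="No comp")
for config, value in result.items():
    print(f"{config}: {value:.2f}x")
```

A value above 1 means the configuration ran faster than the baseline; a value below 1 means it ran slower.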
Impact of HW configurations in Speedup
[Figure: two charts. Disks and Network (HDD-ETH, HDD-IB, SSD-ETH, SSD-IB) and Cloud remote volumes (Local only; 1 Remote; 2 Remotes; 3 Remotes; 3 Remotes tmp local; 2 Remotes tmp local; 1 Remotes tmp local). Y axis: Speedup (higher is better)]
Results using http://hadoop.bsc.es/configimprovement
Details https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
VM Size comparison (Azure)
Lower is better
Clusters by cost-effectiveness 1/2
URL http://aloja.bsc.es/clustercosteffectiveness
• Cluster ID reference
• RL-06 = 8 performance1-8 VMs
• RL-16 = 8 general1-8 VMs
• RL-19 = 8 io1-15 VMs
• RL-33 = 8 performance2-30 VMs
• RL-30 = 8 io1-30 VMs
[Figure labels: Performance2-30, Io1-30, Io1-15, General1-8, Performance1-8, Io1-30]
Clusters by cost-effectiveness 2/2
URL http://aloja.bsc.es/clustercosteffectiveness
[Figure annotations: Fastest exec, Cheapest exec; same cluster ID reference as in 1/2]
Cost/Performance Scalability of cluster size
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size
– X axis: number of datanodes (cluster size)
– Left Y: Execution time (lower is better)
– Right Y: Execution cost
[Figure annotations: Execution time, Execution cost, Recommended size]
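The trade-off on this screen can be reproduced with a few lines: adding datanodes shortens the run but multiplies the hourly price. A minimal sketch in Python, assuming that cost scales linearly with nodes and time and that the recommended size is the one with the lowest execution cost; both are assumptions, since the slide does not state the rule, and the sample times and node price are made up:

```python
def execution_cost(exec_time_s, datanodes, node_cost_hour):
    """Cost of one run, assuming cost is linear in nodes and time."""
    return exec_time_s / 3600.0 * datanodes * node_cost_hour


def recommended_size(times_by_size, node_cost_hour):
    """Return the size with the lowest execution cost (ties: smaller size)."""
    costs = {
        n: execution_cost(t, n, node_cost_hour)
        for n, t in times_by_size.items()
    }
    best = min(sorted(costs), key=lambda n: costs[n])
    return best, costs


# Hypothetical sample data: datanodes -> execution time in seconds.
sample_times = {2: 7200.0, 4: 3000.0, 8: 1800.0, 16: 1500.0}
best, costs = recommended_size(sample_times, node_cost_hour=0.25)
print("recommended size:", best)
for n in sorted(costs):
    print(n, "datanodes:", sample_times[n], "s,", round(costs[n], 2), "USD")
```

In this sample the time keeps dropping as nodes are added, but past 4 datanodes each extra node costs more than the time it saves.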
Predictive Analytics and automated learning
Modeling Hadoop – Methodology
Methodology
– 3-step learning process
– Different split sizes tested (10% ≤ training ≤ 50%)
– Different learning algorithms: Regression trees, Nearest-neighbors methods, Linear/Multinomial regressions, Neural networks
Learning results
– Mean Absolute Errors ~250s (ranges in [100s, 6000s])
– Relative Absolute Errors between [0.10, 0.25]
• Depend on benchmark and number of examples per benchmark
• Some executions are/may be anomalies
[Figure: learning flow. The ALOJA data-set is split into Training, Validation, and Testing sets. Train a model, test the model; NO: tune algorithm and re-train; YES: select this model; test the final model.]
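The learning flow above can be sketched end to end on synthetic data. The following Python example reads the 3-step process as train, validate, test, and uses a small nearest-neighbors regressor as a stand-in for the algorithms listed; that reading, the helper names, the candidate values of k, and the data are all assumptions. Relative Absolute Error is computed with its usual definition: total absolute error divided by the total absolute error of always predicting the mean.

```python
import random


def split(rows, train_frac, val_frac, seed=0):
    """Shuffle and split rows into training, validation, and testing sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def knn_predict(train, x, k):
    """Predict y as the mean of the k training rows nearest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mae(train, rows, k):
    """Mean absolute error of the k-NN predictions on rows."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in rows) / len(rows)


def rae(train, rows, k):
    """Relative absolute error against a predict-the-mean baseline."""
    mean_y = sum(y for _, y in rows) / len(rows)
    num = sum(abs(knn_predict(train, x, k) - y) for x, y in rows)
    den = sum(abs(mean_y - y) for _, y in rows)
    return num / den


# Synthetic data, not ALOJA results: x = a numeric setting, y = time in s.
rng = random.Random(42)
data = [(x, 3000.0 / x + rng.uniform(-20, 20))
        for x in range(1, 41) for _ in range(5)]

train, val, test = split(data, train_frac=0.5, val_frac=0.25)

# Steps 1 and 2: fit candidates and keep the lowest validation error.
best_k = min([1, 3, 5, 9], key=lambda k: mae(train, val, k))

# Step 3: report the selected model's error on the untouched testing set.
test_mae = mae(train, test, best_k)
test_rae = rae(train, test, best_k)
print(best_k, test_mae, test_rae)
```

Keeping the testing set out of model selection is what makes the final error an honest estimate of how the model behaves on executions it has not seen.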
Use case 1: Anomaly Detection
Anomaly Detection
– Model-based detection procedure
– Pass executions through the model
– Executions not fitting the model are considered “out of the system”
[Figures: Anomaly detection procedure; Sample view from site]
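Passing executions through the model means comparing each observed time with the model's prediction and flagging the ones that disagree too much. A minimal sketch in Python, assuming a relative-error threshold as the fitting criterion; the threshold value, the function names, and the toy model are hypothetical and stand in for whatever the trained model and procedure actually use:

```python
def find_anomalies(executions, predict, tolerance=0.5):
    """Return executions whose observed time deviates from the prediction
    by more than `tolerance`, expressed as a fraction of the prediction."""
    anomalies = []
    for config, observed_s in executions:
        predicted_s = predict(config)
        relative_error = abs(observed_s - predicted_s) / predicted_s
        if relative_error > tolerance:
            anomalies.append((config, observed_s, predicted_s))
    return anomalies


def toy_model(config):
    """Hypothetical stand-in for a trained model: seconds per disk type."""
    return {"HDD": 3000.0, "SSD": 1250.0}[config["disk"]]


runs = [
    ({"disk": "HDD"}, 2950.0),
    ({"disk": "SSD"}, 1300.0),
    ({"disk": "SSD"}, 4100.0),  # far from the prediction, so it is flagged
]
flagged = find_anomalies(runs, toy_model)
print(flagged)
```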
Use case 2: Guided Benchmarking – Method
Guided Benchmarking
– Best subset of configurations for modeling a Hadoop deployment
– Clustering to get the “representative execution” for each similar subset of executions
[Figure: flow diagram with the labels ALOJA Data-Set, Clustering, Data-set (centers), Build Model, Test, Reference Model, Is error OK?, NO: Increase number of centers, YES: Configs to execute]
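The guided benchmarking loop can be illustrated with a small clustering example: cluster the executions, keep one representative per cluster, build a model from the representatives only, and add centers until the error against the full data-set is acceptable. A minimal sketch in Python using plain k-means on a single numeric setting; the one-dimensional data, the nearest-representative model, the error threshold, and every function name are assumptions made for illustration, not the method implemented in ALOJA:

```python
def kmeans_1d(xs, k, iterations=20):
    """Plain Lloyd's k-means on 1-D values, evenly spaced initial centers."""
    lo, hi = min(xs), max(xs)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # Keep the old center when a group ends up empty.
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers


def representatives(executions, centers):
    """Pick, for each center, the real execution closest to it."""
    reps = {min(executions, key=lambda e: abs(e[0] - c)) for c in centers}
    return sorted(reps)


def model_error(reps, executions):
    """MAE of a nearest-representative predictor over all executions."""
    total = 0.0
    for x, y in executions:
        _, predicted = min(reps, key=lambda r: abs(r[0] - x))
        total += abs(predicted - y)
    return total / len(executions)


def guided_benchmarking(executions, max_error, k=1):
    """Increase the number of centers until the model error is acceptable."""
    xs = [x for x, _ in executions]
    while True:
        reps = representatives(executions, kmeans_1d(xs, k))
        error = model_error(reps, executions)
        if error <= max_error or k >= len(executions):
            return reps, error
        k += 1


# Synthetic executions: (numeric setting, execution time in seconds).
executions = [(m, 3000.0 / m) for m in range(1, 21)]
reps, error = guided_benchmarking(executions, max_error=50.0)
print(len(reps), "representative configs; error", round(error, 1), "s")
```

The representatives returned are the configurations worth executing on a new deployment; the remaining ones are expected to behave like their nearest representative.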
Use case 3: Knowledge Discovery
Make analyzing results easier
– Multi-variable visualization
– Trees separating relevant attributes
– Other interesting tools
[Figure: pred_time for HDD and SSD]
Tree Descriptor
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 1241s
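The tree descriptor above is a three-level lookup (Disk, then Net, then IOFBuf), and averaging its leaves shows why Disk sits at the root. A minimal sketch in Python; the dictionary layout and function names are hypothetical, the values are copied from the tree, and the last leaf is read as 1241 s, which is an assumption because it is garbled in the extracted slide:

```python
# Predicted execution times in seconds, copied from the tree descriptor.
TREE = {
    "HDD": {
        "ETH": {"128KB": 2935, "64KB": 2942},
        "IB": {"128KB": 3118, "64KB": 3125},
    },
    "SSD": {
        "ETH": {"128KB": 1248, "64KB": 1256},
        # Last leaf is garbled in the extracted text; 1241 is assumed.
        "IB": {"128KB": 1233, "64KB": 1241},
    },
}


def predict_time(disk, net, io_file_buffer):
    """Walk the tree: Disk, then Net, then IOFBuf."""
    return TREE[disk][net][io_file_buffer]


def mean_by_disk(disk):
    """Average predicted time over all leaves under one disk type."""
    leaves = [t for net in TREE[disk].values() for t in net.values()]
    return sum(leaves) / len(leaves)


print(predict_time("HDD", "ETH", "128KB"))
print(mean_by_disk("HDD"), mean_by_disk("SSD"))
```

In this tree the disk type moves the prediction by well over a thousand seconds, the network by one or two hundred at most, and the IO file buffer size by under ten.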
Concluding remarks and reference
Concluding remarks
Benchmarking: it's fun, or at least…
– It will save you €€€ and allow you to scale
But it is also tough
– The industry needs more transparency
– We still have a lot to do…
In ALOJA we provide the benchmarking scripts
– And also the results, which should be your first entry point
We are constantly adding new features
– Benchmarks, systems, providers
It is an open initiative; you're invited to participate
– Beta testers
– Contributors
With predictive analytics we can automate and find tendencies faster
– Especially to save on benchmarking costs and time
Find us around the conference for more details on the tools…
Fork our repo at https://github.com/Aloja/aloja
More info
ALOJA Benchmarking platform and online repository
– http://aloja.bsc.es http://aloja.bsc.es/publications
Benchmarking Big Data
– http://www.slideshare.net/ni_po/benchmarking-hadoop
BDOOP meetup group in Barcelona
Big Data Benchmarking Community (BDBC) mailing list
– (~200 members from ~80 organizations)
– http://clds.sdsc.edu/bdbc/community
Workshop Big Data Benchmarking (WBDB)
– Next: http://clds.sdsc.edu/wbdb2015.ca
SPEC Research Big Data working group
– http://research.spec.org/working-groups/big-data-working-group.html
Slides and video
– Michael Frank on Big Data benchmarking
• http://www.tele-task.de/archive/podcast/20430
– Tilmann Rabl Big Data Benchmarking Tutorial
• http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
www.bsc.es
Q&A
Thanks
Contact: nicolas.poggi@bsc.es
Commands and providers
Provisioning commands Providers
Connect
ndash Node and Cluster
ndash Builds SSH cmd line
bull SSH proxies
Deploy ndash Creates a cluster
ndash Sets SSH credentials
ndash If created updates config as needed
ndash If stopped starts nodes
Start Stop
Delete
Queue jobs to clusters
On-premise
ndash Custom settings for
clusters
bull Multiple disk types
bull Different architectures
Cloud IaaS
ndash Azure OpenStack
Rackspace AWS (testing)
Cloud PaaS
ndash HDInsight CloudBigData
EMR soon
Code at httpsgithubcomAlojaalojatreemasteraloja-deploy
Cluster and nodes definitions multi-provider abstraction
Steps to define a cluster Import defaults (if any) ndash Sets OS version
Select provider ndash Azure RackSpace AWS On-
premise vagranthellip
Name the cluster and size
Optional ndash Select VM type
ndash Attached disks
ndash Define metadata
ndash And costs
Nodes can also be defined ndash For Web share folders etc
You can logically split clusters
Azure 8-datanode sample load AZURE defaults
source $CONF_DIRcluster_defaultsconf
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 in GB
details
vmCores=4
vmRAM=7 in GB
costs
clusterCostHour=1584 in USD
clusterType=IaaS
Source sample httpsgithubcomAlojaalojablobmastershellconfcluster_al-08conf
Running benchmarks in ALOJA
Benchmarking with defaults
repo_locationaloja-benchrun_benchssh
To queue jobs
repo_locationshellexeqsh
Code at httpsgithubcomAlojaalojablobmasteraloja-benchrun_benchssh
Testing different configurations
Approaches
1. Config folders
2. Override variables
   1. In benchmark_defaults.conf
   2. In cluster config
3. Cmd line
   1. Via parameters
      run_benchs.sh -r 2 -m 10
   2. Via shell globals
      HADOOP_VERSION=hadoop-2.7.1 BENCH_DATA_SIZE=1TB
Things to look for
HW, OS
  - Versions
  - Disk config and mounts
SW
  - Replication
  - Block sizes
  - Compression
  - IO buffers
Build your exec plan in a script and queue
Or follow ML recommendations
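One way to build an exec plan in a script is to generate a command line for each combination of settings and hand the list to the queue. The Python sketch below only prints the commands. It assumes the two shell globals named above are enough to select a configuration. The 100GB data size is a hypothetical value added for illustration, and nothing here calls ALOJA itself.

```python
from itertools import product

# Values to sweep. Only hadoop-2.7.1 and 1TB appear in the slides;
# 100GB is a hypothetical extra value.
HADOOP_VERSIONS = ["hadoop-2.7.1"]
DATA_SIZES = ["100GB", "1TB"]
RUNNER = "repo_location/aloja-bench/run_benchs.sh"


def build_exec_plan(versions, sizes, runner=RUNNER):
    """Return one shell command line per (version, data size) combination."""
    return [
        f"HADOOP_VERSION={version} BENCH_DATA_SIZE={size} {runner}"
        for version, size in product(versions, sizes)
    ]


for command in build_exec_plan(HADOOP_VERSIONS, DATA_SIZES):
    print(command)
```

Generating the plan as plain text keeps it reviewable before anything is queued on a paid cluster.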
ALOJA-WEB
Entry point to explore the results collected from the executions
  - Provides insights on the obtained results through continuously evolving data views
Online DEMO at http://aloja.bsc.es
Online benchmarking results
2) ALOJA-WEB Online Repository
Entry point to explore the results collected from the executions
  - Index of executions
    • Quick glance of executions
    • Searchable, sortable
  - Execution details
    • Performance charts and histograms
    • Hadoop counters
    • Jobs and task details
Data management of benchmark executions
  - Data importing from different clusters
  - Execution validation
  - Data management and backup
Cluster definitions
  - Cluster capabilities (resources)
  - Cluster costs
Sharing results
  - Download executions
  - Add external executions
Documentation and References
  - Papers, links and feature documentation
Available at http://hadoop.bsc.es
Comparing 3 runs on the same cluster, different configs
Mappers and reducers, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
  - 400s: 2 containers, local disk
  - 800s: 3 containers, local disk
  - 600s: 2 containers, remote disk
Comparing 3 runs on the same cluster, different configs
CPU utilization, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
  - Moderate iowait
  - Higher iowait
  - Very high iowait
Comparing 3 runs on the same cluster, different configs
CPU queues, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
  - 1 blocked process
  - 4 blocked processes
  - 4 blocked processes (map phase)
Comparing 3 runs on the same cluster, different configs
CPU context switches, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
  - High context switches with 3 containers on a 2-core VM
Impact of SW configurations in Speedup (4-node clusters)
[Charts: speedup (higher is better) by number of mappers (4m, 6m, 8m, 10m) and by compression algorithm (no compression, ZLIB, BZIP2, snappy)]
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
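A minimal Python sketch of how a speedup figure could be computed, assuming speedup is the reference run's execution time divided by each run's time. The talk does not spell out its formula, and taking the slowest run as the reference is an illustrative choice. The three times are the ones quoted on the run-comparison slide.

```python
def speedups(exec_times: dict, reference: str) -> dict:
    """Speedup of every run relative to `reference` (higher is better)."""
    reference_time = exec_times[reference]
    return {name: reference_time / seconds for name, seconds in exec_times.items()}


# Execution times in seconds, as annotated on the run-comparison slide.
RUNS = {
    "2 containers, local disk": 400.0,
    "3 containers, local disk": 800.0,
    "2 containers, remote disk": 600.0,
}

result = speedups(RUNS, reference="3 containers, local disk")
for name, value in result.items():
    print(f"{name}: {value:.2f}x")
```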
Impact of HW configurations in Speedup
[Charts: speedup (higher is better) by disks and network (HDD-ETH, HDD-IB, SSD-ETH, SSD-IB) and by cloud remote volumes (local only; 1, 2 and 3 remotes; 1, 2 and 3 remotes with tmp local)]
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
VM Size comparison (Azure)
Lower is better
Clusters by cost-effectiveness 1/2
URL: http://aloja.bsc.es/clustercosteffectiveness
Cluster ID reference
  • RL-06 = 8 performance1-8 VMs
  • RL-16 = 8 general1-8 VMs
  • RL-19 = 8 io1-15 VMs
  • RL-33 = 8 performance2-30 VMs
  • RL-30 = 8 io1-30 VMs
[Chart: clusters by cost-effectiveness, labelled Performance2-30, Io1-30, Io1-15, General1-8 and Performance1-8]
Clusters by cost-effectiveness 2/2
URL: http://aloja.bsc.es/clustercosteffectiveness
[Chart: fastest exec versus cheapest exec]
Cluster ID reference
  • RL-06 = 8 performance1-8 VMs
  • RL-16 = 8 general1-8 VMs
  • RL-19 = 8 io1-15 VMs
  • RL-33 = 8 performance2-30 VMs
  • RL-30 = 8 io1-30 VMs
Cost/Performance: Scalability of cluster size
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size
  - X axis: number of datanodes (cluster size)
  - Left Y: execution time (lower is better)
  - Right Y: execution cost
[Chart: execution time and execution cost versus cluster size, with the recommended size marked]
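The recommended size on such a screen can be thought of as the cluster size with the lowest execution cost. The Python sketch below works under stated assumptions: cost is datanodes times a per-node hourly price times elapsed hours, and both the timings and the 0.20 USD price are made-up sample data. The actual ALOJA-WEB calculation may differ.

```python
def run_cost(datanodes: int, exec_seconds: float, node_cost_hour: float) -> float:
    """Cost in USD of one execution on a cluster of `datanodes` nodes."""
    return datanodes * node_cost_hour * exec_seconds / 3600.0


def recommend_size(samples: dict, node_cost_hour: float) -> int:
    """samples maps datanodes -> execution seconds; return the cheapest size."""
    return min(samples, key=lambda n: run_cost(n, samples[n], node_cost_hour))


# Hypothetical sample data: execution time shrinks as the cluster grows,
# but with diminishing returns.
SAMPLES = {4: 3600.0, 8: 1500.0, 16: 1000.0, 32: 800.0}
NODE_COST_HOUR = 0.20  # assumed price per node-hour, in USD

for nodes, seconds in SAMPLES.items():
    print(nodes, seconds, round(run_cost(nodes, seconds, NODE_COST_HOUR), 3))
print("recommended size:", recommend_size(SAMPLES, NODE_COST_HOUR))
```

In this sample, going from 4 to 8 datanodes cuts the time enough to lower the cost, while larger clusters finish faster but cost more per run.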
Predictive Analytics and automated learning
Modeling Hadoop: Methodology
Methodology
  - 3-step learning process
  - Different split sizes tested (10% ≤ training ≤ 50%)
  - Different learning algorithms: regression trees, nearest-neighbors methods, linear/multinomial regressions, neural networks
Learning results
  - Mean Absolute Errors ~250s (ranges in [100s, 6000s])
  - Relative Absolute Errors between [0.10, 0.25]
    • Depend on benchmark and # of examples per benchmark
    • Some executions are/may be anomalies
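For reference, the two error measures can be computed as below. The sketch assumes the usual textbook definitions, with relative absolute error taken as the total absolute error divided by the total absolute error of always predicting the mean. The slides do not state which variant was used, and the numbers are made up.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted times."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def relative_absolute_error(observed, predicted):
    """Total absolute error relative to a predictor that always returns the mean."""
    mean_observed = sum(observed) / len(observed)
    model_error = sum(abs(o - p) for o, p in zip(observed, predicted))
    baseline_error = sum(abs(o - mean_observed) for o in observed)
    return model_error / baseline_error


# Hypothetical execution times in seconds.
OBSERVED = [400.0, 800.0, 600.0, 1000.0]
PREDICTED = [450.0, 700.0, 650.0, 950.0]

print(mean_absolute_error(OBSERVED, PREDICTED))
print(relative_absolute_error(OBSERVED, PREDICTED))
```

A relative absolute error below 1 means the model beats the predict-the-mean baseline.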
[Diagram: the ALOJA data-set is split into training, validation and testing sets. A model is trained and then tested; on NO, tune the algorithm and re-train; on YES, select this model and test the final model.]
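The train, validate and test loop can be sketched in a few lines of Python. This is an illustration only: it uses a hand-written nearest-neighbours regressor on one synthetic feature, a 50% training split, and candidate k values picked arbitrarily. None of these choices is taken from ALOJA.

```python
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mae(train, data, k):
    """Mean absolute error of the k-NN model on `data`."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in data) / len(data)


def three_step_learning(dataset, train_frac=0.5, ks=(1, 3, 5), seed=0):
    rows = list(dataset)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_val = (len(rows) - n_train) // 2
    train = rows[:n_train]                      # step 1: train
    validation = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    # Step 2: tune the algorithm by picking the k with the lowest validation error.
    best_k = min(ks, key=lambda k: mae(train, validation, k))
    # Step 3: report the error of the selected model on unseen test data.
    return best_k, mae(train, test, best_k)


# Synthetic data: execution time falls as a single config value grows.
DATA = [(m, 1000.0 / m) for m in range(1, 41)]
best_k, test_mae = three_step_learning(DATA)
print("selected k:", best_k, "test MAE:", round(test_mae, 1))
```

Keeping the test set out of the tuning step is what makes the final error an honest estimate.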
Use case 1: Anomaly Detection
Anomaly Detection
  - Model-based detection procedure
  - Pass executions through the model
  - Executions not fitting the model are considered "out of the system"
[Figures: anomaly detection procedure; sample view from site]
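The model-based procedure can be sketched as follows in Python: each execution is passed through a model, and those whose observed time is too far from the prediction are flagged. The 25% relative-error threshold and the lookup-table model are assumptions for illustration. The real system uses a learned model and its own criterion.

```python
def find_anomalies(executions, predict, max_rel_error=0.25):
    """Return executions whose observed time is far from the model's prediction."""
    flagged = []
    for config, observed in executions:
        expected = predict(config)
        if abs(observed - expected) / expected > max_rel_error:
            flagged.append((config, observed, expected))
    return flagged


# Stand-in model: predicted seconds per disk type (hypothetical values).
MODEL = {"HDD": 3000.0, "SSD": 1250.0}

# (config, observed seconds); the last run is deliberately off.
EXECUTIONS = [("HDD", 2935.0), ("SSD", 1248.0), ("SSD", 2600.0)]

anomalies = find_anomalies(EXECUTIONS, MODEL.get)
print(anomalies)
```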
Use case 2: Guided Benchmarking (Method)
Guided Benchmarking
  - Best subset of configurations for modeling a Hadoop deployment
  - Clustering to get the "representative execution" for each similar subset of executions
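[Diagram: the ALOJA data-set is clustered, a model is built from the cluster centers and tested against a reference model built from the full data-set. If the error is not OK (NO), increase the number of centers; if it is OK (YES), the centers are the configs to execute.]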
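The guided-benchmarking loop can be sketched in Python as: cluster the executions, keep one representative per cluster, build a model from the representatives, and add centers until the error is acceptable. Everything concrete here is an assumption for illustration: the tiny 1-D k-means, the nearest-representative "model", the error threshold and the sample data.

```python
def kmeans_1d(values, k, iters=20):
    """Tiny deterministic 1-D k-means; centers start at evenly spaced sorted values."""
    data = sorted(values)
    centers = [data[i * (len(data) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in data:
            idx = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # Keep the old center when a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def guided_benchmarking(executions, max_mae):
    """executions: list of (feature, exec_seconds). Return the representatives to run."""
    reps = []
    for k in range(1, len(executions) + 1):
        centers = kmeans_1d([x for x, _ in executions], k)
        # Representative execution: the real execution closest to each center.
        reps = [min(executions, key=lambda e: abs(e[0] - c)) for c in centers]
        # Model built from representatives only: predict the nearest one's time.
        errors = [
            abs(min(reps, key=lambda r: abs(r[0] - x))[1] - y)
            for x, y in executions
        ]
        if sum(errors) / len(errors) <= max_mae:  # is the error OK?
            break                                 # YES: stop adding centers
    return reps


# Hypothetical executions: (config value, execution seconds).
EXECS = [(1, 100.0), (2, 110.0), (10, 500.0), (11, 510.0)]
print(guided_benchmarking(EXECS, max_mae=10.0))
```

In this sample two representatives are enough, so only two of the four configurations would need to be benchmarked.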
Use case 3: Knowledge Discovery
Make analyzing results easier
  - Multi-variable visualization
  - Trees separating relevant attributes
  - Other interesting tools
[Chart: pred_time for HDD and SSD]
Tree Descriptor
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 1241s
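A tree descriptor like this one is a nested lookup: pick the disk, then the network, then the IO file buffer size, and read the predicted time at the leaf. The Python sketch below encodes the leaves listed on the slide. The last leaf is garbled in the extracted text, so 1241 is an assumed reading, and the function names are hypothetical.

```python
# Predicted execution time in seconds per (disk, network, IO file buffer) leaf.
TREE = {
    "HDD": {
        "ETH": {"128KB": 2935, "64KB": 2942},
        "IB": {"128KB": 3118, "64KB": 3125},
    },
    "SSD": {
        "ETH": {"128KB": 1248, "64KB": 1256},
        # The 64KB leaf below is garbled in the source; 1241 is an assumed reading.
        "IB": {"128KB": 1233, "64KB": 1241},
    },
}


def predict_time(disk: str, net: str, io_file_buffer: str) -> int:
    """Walk the tree from the root to a leaf and return the predicted seconds."""
    return TREE[disk][net][io_file_buffer]


def best_config():
    """Return (seconds, disk, net, buffer) for the leaf with the lowest time."""
    leaves = [
        (seconds, disk, net, buf)
        for disk, nets in TREE.items()
        for net, bufs in nets.items()
        for buf, seconds in bufs.items()
    ]
    return min(leaves)


print(predict_time("HDD", "ETH", "64KB"))
print(best_config())
```

Read top-down, the tree also shows which attribute matters most: the disk split changes the prediction by well over a thousand seconds, the buffer size by only a few.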
Concluding remarks and reference
Concluding remarks
Benchmarking: it's fun, or at least…
  - It will save you €€€ and allow you to scale
But it is also tough
  - The industry needs more transparency
  - We still have a lot to do…
In ALOJA we provide the benchmarking scripts
  - And also the results, which should be your first entry point
We are constantly adding new features
  - Benchmarks, systems, providers
It is an open initiative, and you're invited to participate
  - Beta testers
  - Contributors
With predictive analytics we can automate and find tendencies faster
  - Especially to save on benchmarking costs and time
Find us around the conference for more details on the tools…
Fork our repo at https://github.com/Aloja/aloja
More info
ALOJA Benchmarking platform and online repository
  - http://aloja.bsc.es
  - http://aloja.bsc.es/publications
Benchmarking Big Data
  - http://www.slideshare.net/ni_po/benchmarking-hadoop
BDOOP meetup group in Barcelona
Big Data Benchmarking Community (BDBC) mailing list (~200 members from ~80 organizations)
  - http://clds.sdsc.edu/bdbc/community
Workshop on Big Data Benchmarking (WBDB)
  - Next: http://clds.sdsc.edu/wbdb2015.ca
SPEC Research Big Data working group
  - http://research.spec.org/working-groups/big-data-working-group.html
Slides and video
  - Michael Frank on Big Data benchmarking
    • http://www.tele-task.de/archive/podcast/20430/
  - Tilmann Rabl, Big Data Benchmarking Tutorial
    • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
www.bsc.es
Q&A
Thanks
Contact: nicolas.poggi@bsc.es
ALOJA-WEB
Entry point for explore the results collected from the executions ndash Provides insights on the obtained results through continuously evolving data views
Online DEMO at httpalojabsces
Online benchmarking results
28
2) ALOJA-WEB Online Repository
Entry point for explore the results collected from the executions
ndash Index of executions bull Quick glance of executions
bull Searchable Sortable
ndash Execution details bull Performance charts and histograms
bull Hadoop counters
bull Jobs and task details
Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup
Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs
Sharing results ndash Download executions ndash Add external executions
Documentation and References ndash Papers links and feature documentation
Available at httphadoopbsces
Comparing 3 runs on same cluster different configs
Mappers and reducers 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
400s 2 containers Local disk
800s 3 containers Local disk
600s 2 containers Remote disk
Comparing 3 runs on same cluster different configs
CPU utilization 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
Moderate iowait
Higher iowait
Very high iowait
Comparing 3 runs on same cluster different configs
CPU queues 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
1 blocked process
4 blocked processes
4 blocked processes (map phase)
Comparing 3 runs on same cluster different configs
CPU context switches 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
High context switches with 3
containers on a 2-core VM
Impact of SW configurations on speedup (4-node clusters)
  - Number of mappers: 4m, 6m, 8m, 10m
  - Compression algorithm: no compression, ZLIB, BZIP2, Snappy
  - Y axis: speedup (higher is better)
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
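Speedup in charts like these is usually the execution time of a baseline configuration divided by the time of the configuration under test. The slide does not state which baseline ALOJA uses, so the sketch below takes the baseline as a parameter; the execution times are hypothetical and are not results from the talk.

```python
def speedups(exec_times, baseline):
    """Speedup of each configuration relative to the baseline (higher is better)."""
    base_time = exec_times[baseline]
    return {config: base_time / seconds for config, seconds in exec_times.items()}


# Hypothetical execution times in seconds, keyed by number of mappers.
exec_times = {"4m": 800.0, "6m": 640.0, "8m": 500.0, "10m": 560.0}

result = speedups(exec_times, baseline="4m")
for config, value in result.items():
    print(f"{config}: {value:.2f}x")
```

A value above 1.0 means the configuration ran faster than the baseline, and a value below 1.0 means it ran slower.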
Impact of HW configurations on speedup
  - Disks and network: HDD-ETH, HDD-IB, SSD-ETH, SSD-IB
  - Cloud remote volumes: local only, 1 remote, 2 remotes, 3 remotes, 3 remotes with tmp local, 2 remotes with tmp local, 1 remote with tmp local
  - Y axis: speedup (higher is better)
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
VM Size comparison (Azure)
Lower is better
Clusters by cost-effectiveness (1/2)
URL: http://aloja.bsc.es/clustercosteffectiveness
• Cluster ID reference
  • RL-06 = 8 performance1-8 VMs
  • RL-16 = 8 general1-8 VMs
  • RL-19 = 8 io1-15 VMs
  • RL-33 = 8 performance2-30 VMs
  • RL-30 = 8 io1-30 VMs
[Chart labels: Performance2-30, Io1-30, Io1-15, General1-8, Performance1-8]
Clusters by cost-effectiveness (2/2)
URL: http://aloja.bsc.es/clustercosteffectiveness
[Chart highlights: fastest execution, cheapest execution]
• Cluster ID reference
  • RL-06 = 8 performance1-8 VMs
  • RL-16 = 8 general1-8 VMs
  • RL-19 = 8 io1-15 VMs
  • RL-33 = 8 performance2-30 VMs
  • RL-30 = 8 io1-30 VMs
Cost/Performance: scalability of cluster size
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size
  - X axis: number of datanodes (cluster size)
  - Left Y axis: execution time (lower is better)
  - Right Y axis: execution cost
[Chart series: execution time and execution cost, with the recommended size marked]
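The trade-off on this screen can be sketched as follows: execution cost is the execution time multiplied by the number of nodes and a per-node hourly price, and the recommended size is the cheapest one that still meets an optional time limit. The selection rule, the function name `recommend_size`, the price and the timings are all assumptions for illustration; the slides do not describe how the site computes its recommendation.

```python
def recommend_size(exec_seconds_by_nodes, node_cost_per_hour, max_seconds=None):
    """Return (nodes, cost) for the cheapest cluster size within the time limit.

    Returns None when no size meets the limit.
    """
    candidates = []
    for nodes, seconds in exec_seconds_by_nodes.items():
        if max_seconds is not None and seconds > max_seconds:
            continue  # too slow for the requested deadline
        cost = seconds / 3600.0 * nodes * node_cost_per_hour
        candidates.append((cost, nodes))
    if not candidates:
        return None
    cost, nodes = min(candidates)
    return nodes, cost


# Hypothetical execution times in seconds, keyed by number of datanodes.
sample = {2: 7200.0, 4: 3240.0, 8: 1980.0, 16: 1440.0}

cheapest = recommend_size(sample, node_cost_per_hour=0.5)
within_deadline = recommend_size(sample, node_cost_per_hour=0.5, max_seconds=2000)
print(cheapest, within_deadline)
```

With this sample data, adding nodes keeps reducing execution time, but beyond 4 nodes each run costs more, so the cheapest size and the fastest size differ.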
Predictive Analytics and automated learning
Modeling Hadoop: methodology
Methodology
  - 3-step learning process
  - Different split sizes tested (10% ≤ training ≤ 50%)
  - Different learning algorithms: regression trees, nearest-neighbor methods, linear/multinomial regressions, neural networks
Learning results
  - Mean Absolute Errors ~250s (ranges in [100s, 6000s])
  - Relative Absolute Errors between [0.10, 0.25]
    • Depend on the benchmark and the number of examples per benchmark
    • Some executions are/may be anomalies
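[Figure: the ALOJA data set is split into training, validation and testing sets. A model is trained on the training set and tested on the validation set; if the result is not acceptable (NO), the algorithm is tuned and re-trained; if it is (YES), the model is selected as the final model and tested on the testing set.]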
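The split and the two error measures named above can be sketched in a few lines. The Relative Absolute Error below uses the common definition (total absolute error divided by the total absolute error of always predicting the mean); the slides do not spell out the exact formula the authors used, and the split fractions and sample values are illustrative only.

```python
import random


def split_dataset(rows, train_frac, valid_frac, seed=0):
    """Shuffle and split rows into training, validation and testing sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted times."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_absolute_error(actual, predicted):
    """Total absolute error relative to always predicting the mean of `actual`."""
    mean = sum(actual) / len(actual)
    baseline = sum(abs(a - mean) for a in actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / baseline


# Illustrative data: 100 execution IDs and four made-up execution times (seconds).
rows = list(range(100))
train, valid, test = split_dataset(rows, train_frac=0.5, valid_frac=0.25)

actual = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1100.0, 1800.0, 3300.0, 3600.0]
mae = mean_absolute_error(actual, predicted)
rae = relative_absolute_error(actual, predicted)
print(len(train), len(valid), len(test), mae, rae)
```

A Relative Absolute Error below 1.0 means the model beats the trivial predictor that always returns the mean execution time.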
Use case 1: Anomaly Detection
Anomaly Detection
  - Model-based detection procedure
  - Pass executions through the model
  - Executions not fitting the model are considered "out of the system"
[Figure: anomaly detection procedure, sample view from the site]
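A model-based check of this kind can be sketched as: predict the execution time for each run, then flag runs whose observed time is too far from the prediction. The relative tolerance, the field names (`id`, `disk`, `exe_time`) and the stand-in model are hypothetical; the slides do not specify the criterion ALOJA applies.

```python
def find_anomalies(executions, predict, tolerance):
    """Return IDs of executions whose time deviates from the model's prediction
    by more than `tolerance`, expressed as a fraction of the predicted time."""
    anomalies = []
    for execution in executions:
        expected = predict(execution)
        relative_error = abs(execution["exe_time"] - expected) / expected
        if relative_error > tolerance:
            anomalies.append(execution["id"])
    return anomalies


def toy_model(execution):
    """Stand-in for a trained model: predicts from the disk type only."""
    return 3000.0 if execution["disk"] == "HDD" else 1250.0


# Made-up executions; the third one is far slower than the model expects.
executions = [
    {"id": 1, "disk": "HDD", "exe_time": 2950.0},
    {"id": 2, "disk": "SSD", "exe_time": 1240.0},
    {"id": 3, "disk": "SSD", "exe_time": 4100.0},
]

flagged = find_anomalies(executions, toy_model, tolerance=0.5)
print(flagged)
```

In practice `toy_model` would be replaced by the model selected in the learning step, and flagged runs would be reviewed or re-executed rather than silently dropped.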
Use case 2: Guided Benchmarking, method
Guided Benchmarking
  - Best subset of configurations for modeling a Hadoop deployment
  - Clustering to get the "representative execution" for each similar subset of executions
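[Figure: the ALOJA data set is clustered; the cluster centers form a reduced data set from which a model is built and tested against a reference model built on the full data set. If the error is not OK (NO), the number of centers is increased; if it is (YES), the centers are the configs to execute.]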
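The loop in the diagram can be sketched with a plain k-means and a nearest-neighbor predictor standing in for the real model. Everything here is an assumption made for illustration: the feature encoding, the error threshold, the choice of k-means and the sample times are not taken from ALOJA.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic start (evenly spaced points)."""
    step = len(points) / k
    centers = [points[int(i * step)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Keep the old center when a group ends up empty.
        centers = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def guided_benchmark(executions, max_error):
    """Grow the number of centers until a model built only on the
    representative executions predicts the full set well enough."""
    features = [f for f, _ in executions]
    for k in range(1, len(executions) + 1):
        centers = kmeans(features, k)
        # Representative execution: the real execution closest to each center.
        reps = [min(executions, key=lambda e: sq_dist(e[0], c)) for c in centers]

        def predict(f):
            # Stand-in model: time of the nearest representative.
            return min(reps, key=lambda r: sq_dist(r[0], f))[1]

        error = sum(abs(t - predict(f)) for f, t in executions) / len(executions)
        if error <= max_error:
            return [r[0] for r in reps], error
    return features, 0.0


# Hypothetical executions: (disk_is_ssd, net_is_ib) -> execution time in seconds.
executions = [
    ((0, 0), 2900.0), ((0, 0), 2950.0), ((0, 1), 3100.0), ((0, 1), 3150.0),
    ((1, 0), 1250.0), ((1, 0), 1260.0), ((1, 1), 1230.0), ((1, 1), 1240.0),
]

configs, error = guided_benchmark(executions, max_error=100.0)
print(configs, error)
```

With this sample data, two representative configurations (one per disk type) already bring the mean absolute error under the threshold, so only those two would need to be benchmarked.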
Use case 3: Knowledge Discovery
Make analyzing results easier
  - Multi-variable visualization
  - Trees separating relevant attributes
  - Other interesting tools
[Figure: pred_time by disk type (HDD, SSD)]
Tree descriptor
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 124s1
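One way to read such a tree is to ask how much the predicted time changes when a single attribute changes and the others stay fixed. The sketch below does that for the seven leaves whose values are legible in the slide text (the last SSD/IB leaf is left out because its value is garbled). The digits are copied as they appear, so the units and any lost decimal point are unverified, and the measure itself is an illustration rather than the method used to build the tree.

```python
# Leaves copied from the tree descriptor; the garbled SSD/IB/64KB leaf is omitted.
LEAVES = [
    ({"Disk": "HDD", "Net": "ETH", "IOFBuf": "128KB"}, 2935),
    ({"Disk": "HDD", "Net": "ETH", "IOFBuf": "64KB"}, 2942),
    ({"Disk": "HDD", "Net": "IB", "IOFBuf": "128KB"}, 3118),
    ({"Disk": "HDD", "Net": "IB", "IOFBuf": "64KB"}, 3125),
    ({"Disk": "SSD", "Net": "ETH", "IOFBuf": "128KB"}, 1248),
    ({"Disk": "SSD", "Net": "ETH", "IOFBuf": "64KB"}, 1256),
    ({"Disk": "SSD", "Net": "IB", "IOFBuf": "128KB"}, 1233),
]


def mean_effect(leaves, attribute):
    """Mean absolute change in predicted time across pairs of leaves
    that differ only in `attribute`."""
    diffs = []
    for i, (cfg_a, time_a) in enumerate(leaves):
        for cfg_b, time_b in leaves[i + 1:]:
            differing = [key for key in cfg_a if cfg_a[key] != cfg_b[key]]
            if differing == [attribute]:
                diffs.append(abs(time_a - time_b))
    return sum(diffs) / len(diffs)


ranking = sorted(
    ["Disk", "Net", "IOFBuf"],
    key=lambda attribute: mean_effect(LEAVES, attribute),
    reverse=True,
)
for attribute in ranking:
    print(attribute, round(mean_effect(LEAVES, attribute), 2))
```

On these values the disk type dominates, the network comes second, and the I/O buffer size barely matters, which matches the order in which the tree splits.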
Concluding remarks and reference
Concluding remarks
Benchmarking: it's fun, or at least…
  - It will save you €€€ and allow you to scale
But it is also tough
  - The industry needs more transparency
  - We still have a lot to do…
In ALOJA we provide the benchmarking scripts
  - And also the results, which should be your first entry point
We are constantly adding new features
  - Benchmarks, systems, providers
It is an open initiative and you're invited to participate
  - Beta testers
  - Contributors
With predictive analytics we can automate and find trends faster
  - Especially to save on benchmarking costs and time
Find us around the conference for more details on the tools…
Fork our repo at https://github.com/Aloja/aloja
More info
ALOJA Benchmarking platform and online repository
  - http://aloja.bsc.es
  - http://aloja.bsc.es/publications
Benchmarking Big Data
  - http://www.slideshare.net/ni_po/benchmarking-hadoop
BDOOP meetup group in Barcelona
Big Data Benchmarking Community (BDBC) mailing list
  - ~200 members from ~80 organizations
  - http://clds.sdsc.edu/bdbc/community
Workshop on Big Data Benchmarking (WBDB)
  - Next: http://clds.sdsc.edu/wbdb2015.ca
SPEC Research Big Data working group
  - http://research.spec.org/working-groups/big-data-working-group.html
Slides and video
  - Michael Frank on Big Data benchmarking
    • http://www.tele-task.de/archive/podcast/20430
  - Tilmann Rabl, Big Data Benchmarking Tutorial
    • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
www.bsc.es
Q&A
Thanks
Contact: nicolas.poggi@bsc.es
Comparing 3 runs on same cluster different configs
CPU queues 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
1 blocked process
4 blocked processes
4 blocked processes (map phase)
Comparing 3 runs on same cluster different configs
CPU context switches 48-node cluster
URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104
High context switches with 3
containers on a 2-core VM
Impact of SW configurations in Speedup (4 node clusters)
Number of mappers Compression algorithm
No comp
ZLIB
BZIP2
snappy
4m
6m
8m
10m
Speedup (higher is better)
Results using httphadoopbscesconfigimprovement
Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf
Impact of HW configurations in Speedup
Disks and Network Cloud remote volumes
Local only
1 Remote
2 Remotes
3 Remotes
3 Remotes tmp local
2 Remotes tmp local
1 Remotes tmp local
HDD-ETH
HDD-IB
SSD-ETH
SDD-IB
Speedup (higher is better)
Results using httphadoopbscesconfigimprovement
Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf
VM Size comparison (Azure)
Lower is better
Clusters by cost-effectiveness 12
URL httpalojabscesclustercosteffectiveness
bull Cluster ID reference
bull RL-06 = 8 performance1-8 VMs
bull RL-16 = 8 general1-8 VMs
bull RL-19 = 8 io1-15 VMs
bull RL-33 = 8 performance2-30 VMs
bull RL-30 = 8 io1-30 VMs
Performance2-30
Io1-30
Io1-15
General1-8
Performance1-8
Io1-30
Clusters by cost-effectiveness 22
URL httpalojabscesclustercosteffectiveness
Fastest Exec Cheapest exec
bull Cluster ID reference
bull RL-06 = 8 performance1-8 VMs
bull RL-16 = 8 general1-8 VMs
bull RL-19 = 8 io1-15 VMs
bull RL-33 = 8 performance2-30 VMs
bull RL-30 = 8 io1-30 VMs
CostPerformance Scalability of cluster size
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size
ndash Left Y Execution time (lower is better)
ndash Right Y Execution cost
Execution time Execution cost
Recommended size
Predictive Analytics and automated learning
Modeling Hadoop ndash Methodology
Methodology ndash 3-step learning process
ndash Different split sizes tested (10 le training le 50)
ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks
Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])
ndash Relative Absolute Errors between [010 025]
bull Depend on benchmark and of examples per benchmark
bull Some executions aremay be anomalies
40
ALOJA
Data-Set
Training
Validation
Testing
Model Select this
model
Final
Model Train
Test the model
Test the model
Tune algorithm re-train
NO
YES
Use case 1: Anomaly Detection
Anomaly Detection
– Model-based detection procedure
– Pass executions through the model
– Executions not fitting the model are considered “out of the system”
[Screenshot: anomaly detection procedure, sample view from the site]
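The procedure can be sketched as follows. This is an illustration under stated assumptions: `predict` stands in for whatever trained model is used, and the rule "flag when the observed time deviates from the prediction by more than a relative `tolerance`" is a hypothetical threshold, not the criterion used on the ALOJA site.

```python
def find_anomalies(executions, predict, tolerance=0.25):
    """Return executions whose observed time does not fit the model.

    executions: iterable of (config, observed_seconds) pairs.
    predict:    callable mapping a config to a predicted time in seconds.
    tolerance:  allowed relative deviation (assumed value, tune per data set).
    """
    flagged = []
    for config, observed_s in executions:
        expected_s = predict(config)
        if abs(observed_s - expected_s) > tolerance * expected_s:
            flagged.append((config, observed_s, expected_s))
    return flagged


def toy_model(config):
    """Stand-in model: one made-up prediction per disk type."""
    return 2900.0 if config["disk"] == "HDD" else 1250.0


if __name__ == "__main__":
    runs = [
        ({"disk": "HDD"}, 3000.0),   # close to the prediction
        ({"disk": "SSD"}, 2500.0),   # twice the prediction
    ]
    for config, observed_s, expected_s in find_anomalies(runs, toy_model):
        print(config, observed_s, expected_s)
```

Flagged executions are candidates for inspection or re-running, not automatically errors; a poor model flags good runs too.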
Use case 2: Guided Benchmarking – Method
Guided Benchmarking
– Best subset of configurations for modeling a Hadoop deployment
– Clustering to get the “representative execution” for each similar subset of executions
[Diagram: the ALOJA data set is clustered into centers; a model is built from the centers and tested against a reference model built from the data set. If the error is not OK (NO), increase the number of centers and repeat; if it is OK (YES), the centers are the configs to execute.]
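The loop in the diagram can be sketched as below. Everything here is an assumption made for illustration: plain k-means as the clustering step, a nearest-neighbor lookup as the model, mean absolute error against all known executions as the test, and the toy feature encoding (is_ssd, is_infiniband) with made-up times.

```python
from statistics import mean


def _dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means, initialized with the first k points."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: _dist2(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center when a group ends up empty.
        centers = [
            tuple(mean(dim) for dim in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def representatives(executions, k):
    """One real execution per cluster: the one closest to each center."""
    centers = kmeans([features for features, _ in executions], k)
    reps = []
    for center in centers:
        rep = min(executions, key=lambda e: _dist2(e[0], center))
        if rep not in reps:
            reps.append(rep)
    return reps


def nn_predict(reps, features):
    """Stand-in model: time of the nearest representative execution."""
    return min(reps, key=lambda e: _dist2(e[0], features))[1]


def guided_subset(executions, max_error_s):
    """Increase the number of centers until the model error is acceptable."""
    for k in range(1, len(executions) + 1):
        reps = representatives(executions, k)
        error = mean(abs(nn_predict(reps, f) - t) for f, t in executions)
        if error <= max_error_s:
            return reps
    return list(executions)


# Toy data: (is_ssd, is_infiniband) -> made-up execution time in seconds.
EXECUTIONS = [((0, 0), 2900.0), ((0, 1), 3100.0), ((1, 0), 1250.0), ((1, 1), 1230.0)]

if __name__ == "__main__":
    for limit in (1000.0, 500.0, 100.0):
        print(limit, len(guided_subset(EXECUTIONS, limit)))
```

A looser error limit yields fewer configurations to execute, which is where the benchmarking cost saving comes from.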
Use case 3: Knowledge Discovery
Make analyzing results easier
– Multi-variable visualization
– Trees separating relevant attributes
– Other interesting tools
[Chart: pred_time for HDD and SSD]
Tree Descriptor
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 1241s
Concluding remarks and reference
Concluding remarks
Benchmarking: it's fun, or at least…
– It will save you €€€ and allow you to scale
But it is also tough
– The industry needs more transparency
– We still have a lot to do…
In ALOJA we provide the benchmarking scripts
– And also the results, which should be your first entry point
We are constantly adding new features
– Benchmarks, systems, providers
It is an open initiative; you're invited to participate
– Beta testers
– Contributors
With predictive analytics we can automate and find tendencies faster
– Especially to save on benchmarking costs and time
Find us around the conference for more details on the tools…
Fork our repo at https://github.com/Aloja/aloja
More info
ALOJA benchmarking platform and online repository
– http://aloja.bsc.es
– http://aloja.bsc.es/publications
Benchmarking Big Data
– http://www.slideshare.net/ni_po/benchmarking-hadoop
BDOOP meetup group in Barcelona
Big Data Benchmarking Community (BDBC) mailing list
– (~200 members from ~80 organizations)
– http://clds.sdsc.edu/bdbc/community
Workshop Big Data Benchmarking (WBDB)
– Next: http://clds.sdsc.edu/wbdb2015.ca
SPEC Research Big Data working group
– http://research.spec.org/working-groups/big-data-working-group.html
Slides and video
– Michael Frank on Big Data benchmarking
  • http://www.tele-task.de/archive/podcast/20430/
– Tilmann Rabl, Big Data Benchmarking Tutorial
  • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
www.bsc.es
Q&A
Thanks
Contact: nicolas.poggi@bsc.es
Comparing 3 runs on the same cluster, different configs
CPU context switches, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
High context switches with 3 containers on a 2-core VM
Impact of SW configurations in Speedup (4-node clusters)
[Charts: number of mappers (4m, 6m, 8m, 10m) and compression algorithm (no compression, ZLIB, BZIP2, Snappy). Y axis: speedup (higher is better).]
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
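For readers new to the metric, a small sketch of how a speedup figure of this kind is typically computed. The slides do not state which configuration serves as the baseline, so the baseline choice, the function name, and the sample times below are assumptions.

```python
def speedups(exec_times_s, baseline):
    """Speedup of each configuration relative to a chosen baseline.

    exec_times_s: mapping of configuration name to execution time in seconds.
    baseline:     name of the configuration every other one is compared to.
    A value above 1.0 means faster than the baseline (higher is better).
    """
    base = exec_times_s[baseline]
    return {name: base / t for name, t in exec_times_s.items()}


if __name__ == "__main__":
    # Made-up execution times, for illustration only.
    times = {"baseline": 1000.0, "config_a": 800.0, "config_b": 1250.0}
    for name, value in speedups(times, "baseline").items():
        print(name, round(value, 2))
```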
Impact of HW configurations in Speedup
[Charts: disks and network (HDD-ETH, HDD-IB, SSD-ETH, SSD-IB) and cloud remote volumes (local only; 1, 2, or 3 remotes; 3, 2, or 1 remotes with tmp local). Y axis: speedup (higher is better).]
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
CostPerformance Scalability of cluster size
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size
ndash Left Y Execution time (lower is better)
ndash Right Y Execution cost
Execution time Execution cost
Recommended size
Predictive Analytics and automated learning
Modeling Hadoop ndash Methodology
Methodology ndash 3-step learning process
ndash Different split sizes tested (10 le training le 50)
ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks
Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])
ndash Relative Absolute Errors between [010 025]
bull Depend on benchmark and of examples per benchmark
bull Some executions aremay be anomalies
40
ALOJA
Data-Set
Training
Validation
Testing
Model Select this
model
Final
Model Train
Test the model
Test the model
Tune algorithm re-train
NO
YES
Use case 1 Anomaly Detection
Anomaly Detection
ndash Model-based detection procedure
ndash Pass executions through the model
ndash Executions not fitting the model are considered ldquoout of the systemrdquo
Anomaly detection procedure Sample view from site
Use case 2 Guided Benchmarking ndash Method
Guided Benchmarking
ndash Best subset of configurations for modeling a Hadoop deployment
ndash Clustering to get the ldquorepresentative executionrdquo for each similar subset
of executions
ALOJA
Data-Set
Increase number of centers
NO
YES Clustering
Data-set
(centers) Model Is error
OK
Configs to
execute
Model Build
Build
Test
Reference
42
Use case 3 Knowledge Discovery
Make analyzing results easier
ndash Multi-variable visualization
ndash Trees separating relevant attributes
ndash Other interesting tools
43
pred_time
HDD SSD
Tree Descriptor
Disk=HDD
Net=ETH
IOFBuf=128KB rArr 2935s IOFBuf=64KB rArr 2942s Net=IB
IOFBuf=128KB rArr 3118s IOFBuf=64KB rArr 3125s Disk=SSD
Net=ETH
IOFBuf=128KB rArr 1248s IOFBuf=64KB rArr 1256s Net=IB
IOFBuf=128KB rArr 1233s IOFBuf=64KB rArr 124s1
Concluding remarks and references
Concluding remarks
Benchmarking: it's fun, or at least...
- It will save you €€€ and allow you to scale
But it is also tough
- The industry needs more transparency
- We still have a lot to do...
In ALOJA we provide the benchmarking scripts
- And also the results, which should be your first entry point
We are constantly adding new features
- Benchmarks, systems, providers
It is an open initiative; you're invited to participate
- Beta testers
- Contributors
With predictive analytics we can automate and find trends faster
- Especially to save on benchmarking costs and time
Find us around the conference for more details on the tools...
Fork our repo at https://github.com/Aloja/aloja
More info
ALOJA benchmarking platform and online repository
- http://aloja.bsc.es
- http://aloja.bsc.es/publications
Benchmarking Big Data
- http://www.slideshare.net/ni_po/benchmarking-hadoop
BDOOP meetup group in Barcelona
Big Data Benchmarking Community (BDBC) mailing list
- ~200 members from ~80 organizations
- http://clds.sdsc.edu/bdbc/community
Workshop on Big Data Benchmarking (WBDB)
- Next: http://clds.sdsc.edu/wbdb2015.ca
SPEC Research Big Data working group
- http://research.spec.org/working-groups/big-data-working-group.html
Slides and video
- Michael Frank on Big Data benchmarking
  • http://www.tele-task.de/archive/podcast/20430
- Tilmann Rabl, Big Data Benchmarking Tutorial
  • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl
www.bsc.es
Q&A
Thanks
Contact: nicolas.poggi@bsc.es