Herodotos Herodotou, Harold Lim, Fei Dong,
Shivnath Babu
Duke University
http://www.cs.duke.edu/starfish
Practitioners of Big Data Analytics
6/29/2011 Starfish 2
[Figure: practitioners placed by Data Size and Workload Complexity]
Practitioners: Yahoo!, eBay, journalists, systems researchers, biologists, economists, physicists
Workload complexity: counts & aggregates; rollups & drilldowns; statistical analysis, linear algebra; machine learning; text / images / video / graphs
MapReduce/Hadoop Ecosystem
Hadoop: MapReduce Execution Engine, Distributed File System
Ecosystem components: Oozie, Hive, Pig, Elastic MapReduce, Jaql, HBase, Java / Ruby / Python Client
MADDER Principles of Big Data Analytics
Magnetic
Agile
Deep
Data-lifecycle-aware
Elastic
Robust
Magnetic: Easy to get data into the system
Agile: Make change (data/requirements) easy
Deep: Support the full spectrum of analytics; write MapReduce programs in Java / Python / R or use interfaces like Pig / Jaql
Data-lifecycle-aware (example: data cycle at LinkedIn)
Elastic: Adapt resources/costs to actual workloads
Robust: Graceful degradation under undesirable events
MADDER Principles of Big Data Analytics
Magnetic
Agile
Deep
Data-lifecycle-aware
Elastic
Robust
Data can be opaque until run-time
Ease of use
Hard to get good performance out of the box
Programs are a different beast from SQL
Policies are nontrivial
Ease-of-Use Vs. Out-of-the-box Perf.
Starfish: MADDER + Self-Tuning
Dynamic instrumentation
Profile
Relative what-if calls
Mix of models & simulation
Recursive random search
Ease of use
Get good performance automatically
Magnetic
Agile
Deep
Data-lifecycle-aware
Elastic
Robust
Starfish: MADDER + Self-Tuning
Goal: Provide good performance automatically
MapReduce Execution Engine
Distributed File System
Hadoop
Oozie Hive Pig Elastic MR Java Client …
Starfish
Analytics System
What are the Tuning Problems?
Job-level MapReduce configuration
Workflow optimization
Workload management
Data layout tuning
Cluster sizing
Starfish Architecture
Workload-level tuning: Workload Optimizer, Elastisizer
Workflow-level tuning: Workflow Optimizer
Job-level tuning: Job Optimizer, Profiler, Sampler
What-if Engine
Data Manager: Metadata Mgr., Intermediate Data Mgr., Data Layout & Storage Mgr.
Map function
Reduce function
Run this program as a
MapReduce job
MapReduce Job Execution
job j = < program p, data d, resources r, configuration c >
[Figure: input splits processed in Map Wave 1 and Map Wave 2, with Reduce Wave 1 and Reduce Wave 2]
How are the number of splits, number of map and reduce tasks, memory allocation to tasks, etc., determined?
Optimizing MapReduce Job Execution
The space of configuration choices includes settings for:
Number of map tasks
Number of reduce tasks
Partitioning of map outputs to reduce tasks
Memory allocation to task-level buffers
Multiphase external sorting in the tasks
Whether output data from tasks should be compressed
Whether combine function should be used
job j = < program p, data d, resources r, configuration c >
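The job tuple and a few of the configuration choices listed above can be written down as a small data structure. This is only a sketch: the `Job` class is a hypothetical illustration, the parameter names are the ones used by Hadoop 0.20-era releases, and the values are placeholders rather than recommendations.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """job j = <program p, data d, resources r, configuration c>."""
    program: str
    data: str
    resources: str
    configuration: dict = field(default_factory=dict)


# Hadoop 0.20-era parameter names; values are illustrative placeholders.
example_configuration = {
    "mapred.reduce.tasks": 20,            # number of reduce tasks
    "io.sort.mb": 100,                    # map-side sort buffer size (MB)
    "io.sort.spill.percent": 0.8,         # buffer fill level that triggers a spill
    "io.sort.factor": 10,                 # streams merged at once while sorting
    "mapred.compress.map.output": False,  # compress map output?
    "mapred.output.compress": False,      # compress job output?
}

job = Job(
    program="WordCount",
    data="30GB Wikipedia",
    resources="16 x c1.medium",
    configuration=example_configuration,
)
print(len(job.configuration), "parameters set explicitly")
```

Keeping the configuration as a plain mapping makes it easy to enumerate alternative settings for the same program, data, and resources.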
Optimizing MapReduce Job Execution
190+ parameters in Hadoop
[Figure: 2-dim projection of 13-dim surface]
Case for Automated Hadoop Tuning (from wiki.apache.org/pig/PigJournal)
Hadoop has many configuration parameters that can affect the latency and scalability of a job. For different types of jobs, different configurations will yield optimal results. For example, a job with no memory-intensive operations in the map phase but with a combine phase will want to set Hadoop's io.sort.mb quite high, to minimize the number of spills from the map.
…
Adding this feature will greatly increase the utility of Pig for Hadoop users, as it will free them from needing to understand Hadoop well enough to tune it themselves for their particular jobs.
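The io.sort.mb remark can be illustrated with rough arithmetic. The sketch below assumes a deliberately simplified model in which a map task spills each time the sort buffer reaches its spill threshold; it ignores the share of the buffer reserved for record metadata, and the function name and input sizes are hypothetical.

```python
import math


def estimated_spills(map_output_mb, io_sort_mb, spill_percent=0.8):
    """Rough spill count: map output divided by usable buffer space.

    Simplified model (an assumption of this sketch): every spill writes
    io_sort_mb * spill_percent megabytes of map output to local disk.
    """
    usable_mb = io_sort_mb * spill_percent
    return max(1, math.ceil(map_output_mb / usable_mb))


# A map task producing 256 MB of output under two buffer sizes.
small_buffer = estimated_spills(256, io_sort_mb=100)
large_buffer = estimated_spills(256, io_sort_mb=400)
print(small_buffer, large_buffer)
```

Under this model the larger buffer needs fewer spills, which is the effect the quoted passage describes.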
Starfish’s Core Approach to Tuning
Challenges: p is an arbitrary MapReduce program; c is
high-dimensional; …
$\mathit{perf} = F(p, d, r, c)$, for program p, data d, resources r, configuration c
$c_{opt} = \arg\min_{c \in S} F(p, d, r, c)$
Profiler
What-if Engine
Optimizer(s)
Runs MapReduce jobs to collect job
profiles (concise execution summary)
Given profile of j = <p,d,r,c>, estimates
virtual profile for job j' = <p,d’,r’,c’>
Enumerates and searches through the
optimization space efficiently
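The division of labour between the What-if Engine and the Optimizer can be sketched as a search for the configuration that minimizes an estimated cost. Everything below is an assumption made for illustration: `what_if` is a toy stand-in for F(p, d, r, c), not Starfish's model, and the search space S is a tiny grid that is enumerated exhaustively.

```python
import math
from itertools import product


def what_if(profile, data_gb, reduce_slots, config):
    """Toy stand-in for F(p, d, r, c); returns an estimated running time."""
    waves = math.ceil(config["reducers"] / reduce_slots)
    reduce_time = waves * (data_gb * 60 / config["reducers"])
    startup = 0.5 * config["reducers"]        # per-task startup overhead
    spill_penalty = 2000 / config["sort_mb"]  # smaller buffers spill more
    return reduce_time + startup + spill_penalty


def optimize(profile, data_gb, reduce_slots, space):
    """c_opt = argmin over c in S of F(p, d, r, c), by exhaustive search."""
    return min(space, key=lambda c: what_if(profile, data_gb, reduce_slots, c))


S = [
    {"reducers": n, "sort_mb": mb}
    for n, mb in product([8, 16, 32, 64], [100, 200])
]
best = optimize(profile=None, data_gb=30, reduce_slots=32, space=S)
print(best, what_if(None, 30, 32, best))
```

Exhaustive enumeration only works because this grid has eight points; a high-dimensional configuration space needs a smarter search strategy.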
Profile of a MapReduce Job
[Figure: job execution with two map waves and one reduce wave; map tasks process splits 0 to 3 and reduce tasks produce out 0 and out 1]
Concise representation of program execution
Records information at the level of “task phases”
Map Task Phases: Read (from DFS), Map (map func), Collect (serialize, partition into the memory buffer), Spill (sort, [combine], [compress]), Merge
Profile
Dataflow
Amount of data flowing through tasks & task phases
Dataflow Statistics
Statistical info about the dataflow
Cost
Execution time at level of tasks & task phases
Cost Statistics
Statistical info about the costs
Fields in a Job Profile
Dataflow
Map output bytes
Number of map-side spills
Number of merge rounds
Number of records in buffer per spill
Cost
Read phase time in the map task
Map phase time in the map task
Collect phase time in the map task
Spill phase time in the map task
Dataflow Statistics
Map func’s selectivity (output / input)
Map output compression ratio
Combiner’s selectivity
Size of records (keys and values)
Cost Statistics
I/O cost for reading from local disk per byte
I/O cost for writing to HDFS per byte
CPU cost for executing Map func per record
CPU cost for uncompressing the input per byte
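A few of these profile fields can be grouped into a record to show how the derived statistics relate to the raw dataflow and cost fields. This is a hypothetical sketch: the class and field names are invented for illustration, `map_input_bytes` is an assumed extra field needed to compute selectivity, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class MapTaskProfile:
    # Dataflow fields
    map_input_bytes: int    # assumption: needed for selectivity
    map_output_bytes: int
    num_spills: int
    # Cost fields (phase times in seconds)
    read_seconds: float
    map_seconds: float
    collect_seconds: float
    spill_seconds: float

    @property
    def map_selectivity(self):
        """Dataflow statistic: map output size / map input size."""
        return self.map_output_bytes / self.map_input_bytes

    def phase_shares(self):
        """Fraction of the measured task time spent in each phase."""
        phases = {
            "read": self.read_seconds,
            "map": self.map_seconds,
            "collect": self.collect_seconds,
            "spill": self.spill_seconds,
        }
        total = sum(phases.values())
        return {name: seconds / total for name, seconds in phases.items()}


profile = MapTaskProfile(
    map_input_bytes=64 * 2**20,
    map_output_bytes=32 * 2**20,
    num_spills=2,
    read_seconds=2.0,
    map_seconds=6.0,
    collect_seconds=1.0,
    spill_seconds=3.0,
)
print(profile.map_selectivity, profile.phase_shares())
```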
Generating Profiles
Concise representation of program execution
Records information at the level of “task phases”
Generated by Profiler through measurement or by the
What-if Engine through estimation
Generating Profiles by Measurement
Goals
Have zero overhead when off, low overhead when on
Require no modifications to Hadoop
Support unmodified MapReduce programs written in
Java/Python/Ruby/C++
Approach: Dynamic (on-demand) instrumentation
Event-condition-action rules are specified (in Java)
Leads to run-time instrumentation of Hadoop internals
Monitors task phases of MapReduce job execution
We currently use BTrace (Hadoop internals are in Java)
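The idea of on-demand instrumentation can be shown with a Python analogy. This is not BTrace and not how Starfish instruments Hadoop; it is a sketch in which `attach_rule` (a hypothetical helper) wraps a method at run time, applies a condition, runs an action, and can be detached so that the original method runs with no added overhead. `MapTask` is a made-up stand-in class.

```python
import time
from functools import wraps


def attach_rule(cls, method_name, condition, action):
    """Attach an event-condition-action rule to cls.method_name.

    Event: the method returns. Condition and action receive the method
    name and the elapsed time. Returns a function that removes the rule.
    """
    original = getattr(cls, method_name)

    @wraps(original)
    def probe(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            if condition(method_name, elapsed):
                action(method_name, elapsed)

    setattr(cls, method_name, probe)

    def detach():
        # Restore the unmodified method: nothing extra runs when off.
        setattr(cls, method_name, original)

    return detach


class MapTask:
    """Hypothetical stand-in for an instrumented task class."""

    def spill(self, records):
        return sorted(records)


timings = []
detach = attach_rule(
    MapTask,
    "spill",
    condition=lambda name, elapsed: True,
    action=lambda name, elapsed: timings.append((name, elapsed)),
)
MapTask().spill([3, 1, 2])  # recorded
detach()
MapTask().spill([3, 1, 2])  # not recorded
print(len(timings))
```

The target class is never edited, which mirrors the stated goals of leaving Hadoop and user programs unmodified.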
[Figure: enabling profiling attaches ECA rules to the task JVMs; raw data collected from the map and reduce tasks is turned into map profiles and reduce profiles, which make up the job profile]
JVM = Java Virtual Machine, ECA = Event-Condition-Action
Use of Sampling
• Profile fewer tasks
• Execute fewer tasks
Overhead of Profile Measurement
Word Co-occurrence job running on a 16-node cluster of c1.medium EC2 nodes on the Amazon Cloud
What-if Engine
Components: Job Oracle, Task Scheduler Simulator
Inputs: Job Profile <p, d1, r1, c1>; properties of the hypothetical job (possibly hypothetical): Input Data Properties <d2>, Cluster Resources <r2>, Configuration Settings <c2>
Output: Virtual Job Profile for <p, d2, r2, c2>
What-if Questions Starfish can Answer
How will job j’s execution time change if the number of
reduce tasks is changed from 20 to 40?
What will the change in I/O be if map output compression
is turned on, but the input data size increases by 40%?
What will job j’s new execution time be if 5 more nodes
are added to the cluster, bringing the total to 20?
How will workload execution time & dollar cost change
if we move the production cluster from m1.xlarge nodes
to c1.medium nodes on Amazon EC2?
Virtual Profile Estimation
Given profile for job j = <p, d1, r1, c1>
Estimate profile for job j' = <p, d2, r2, c2>
Inputs: profile for j (Dataflow Statistics, Dataflow, Cost Statistics, Cost), Input Data d2, Resources r2, Configuration c2
(Virtual) profile for j', estimated per category:
Dataflow Statistics: Cardinality Models
Dataflow: White-box Models
Cost Statistics: Relative Black-box Models
Cost: White-box Models
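A small worked example shows how dataflow for a hypothetical job j' might be derived from the dataflow statistics in the profile of j. The sketch assumes that the map selectivity and compression ratio measured for j carry over unchanged to j'; the function name and all numbers are hypothetical.

```python
import math


def estimate_map_output_gb(input_gb, map_selectivity, compression_ratio=1.0):
    """White-box style estimate of map output volume.

    Assumes the dataflow statistics (selectivity, compression ratio)
    measured for job j still hold for the hypothetical job j'.
    """
    return input_gb * map_selectivity * compression_ratio


# Statistics taken from the (hypothetical) profile of job j.
selectivity = 1.5   # map output bytes per map input byte
compression = 0.3   # compressed size / uncompressed size

# j: 10 GB input, map output compression off.
measured = estimate_map_output_gb(10, selectivity)
# j': input grows by 40% and map output compression is turned on.
virtual = estimate_map_output_gb(10 * 1.4, selectivity, compression)
print(measured, virtual)
```

In this made-up case the map output written shrinks even though the input grows, because the compression ratio outweighs the 40% increase.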
Job Optimizer
Inputs: Job Profile <p, d1, r1, c1>, Input Data Properties <d2>, Cluster Resources <r2>
Components: Subspace Enumeration, Recursive Random Search (issues what-if calls)
Output: Best Configuration Settings <copt> for <p, d2, r2>
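The search step can be sketched as a simplified random search that repeatedly samples a region and then shrinks the region around the best point found. This is a reduced illustration in the spirit of Recursive Random Search, not the exact algorithm used by the Job Optimizer; `toy_what_if` is a hypothetical cost function standing in for what-if calls, and the two dimensions are arbitrary.

```python
import random


def recursive_random_search(cost, bounds, samples=50, rounds=8,
                            shrink=0.5, seed=0):
    """Sample uniformly, keep the best point, shrink the box around it."""
    rng = random.Random(seed)
    best = [(lo + hi) / 2 for lo, hi in bounds]  # start from the centre
    best_cost = cost(best)
    box = list(bounds)
    for _ in range(rounds):
        for _ in range(samples):
            point = [rng.uniform(lo, hi) for lo, hi in box]
            point_cost = cost(point)
            if point_cost < best_cost:
                best, best_cost = point, point_cost
        # New box: a smaller region centred on the best point so far,
        # clipped to the original bounds.
        box = [
            (max(lo0, x - (hi - lo) * shrink / 2),
             min(hi0, x + (hi - lo) * shrink / 2))
            for x, (lo, hi), (lo0, hi0) in zip(best, box, bounds)
        ]
    return best, best_cost


def toy_what_if(config):
    """Hypothetical estimated running time with its minimum at (30, 70)."""
    return (config[0] - 30) ** 2 + (config[1] - 70) ** 2


bounds = [(0, 100), (0, 100)]
best_config, best_time = recursive_random_search(toy_what_if, bounds)
print(best_config, best_time)
```

Each cost evaluation here corresponds to one what-if call, so the sample and round counts bound how many estimates the search requests.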
Experimental Setup
Hadoop cluster on 16 Amazon EC2 nodes, c1.medium type
2 map slots & 2 reduce slots
300MB max memory per task
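The slot counts in this setup determine how many tasks run at once and how many waves a job needs. The arithmetic below is a sketch under stated assumptions: all 16 nodes run tasks, a 30 GB input is divided into 64 MB splits (the split size is an assumption, not given in the setup), and the reducer counts are examples.

```python
import math


def total_slots(nodes, slots_per_node):
    """Number of tasks of one kind that can run concurrently."""
    return nodes * slots_per_node


def waves(num_tasks, slots):
    """Number of rounds needed to run num_tasks on the given slots."""
    return math.ceil(num_tasks / slots)


NODES = 16
MAP_SLOTS_PER_NODE = 2
REDUCE_SLOTS_PER_NODE = 2
SPLIT_MB = 64          # assumption: split size is not stated in the setup
DATA_MB = 30 * 1024    # a 30 GB input dataset

map_slots = total_slots(NODES, MAP_SLOTS_PER_NODE)
reduce_slots = total_slots(NODES, REDUCE_SLOTS_PER_NODE)
map_tasks = math.ceil(DATA_MB / SPLIT_MB)

print(map_slots, map_tasks, waves(map_tasks, map_slots))
print(waves(20, reduce_slots), waves(40, reduce_slots))
```

With these assumptions, going from 20 to 40 reduce tasks moves the job from one reduce wave to two, which is one reason the reducer count interacts with cluster size.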
Cost-Based Job Optimizer Vs. Rule-Based Optimizer
Abbr.  MapReduce Program   Domain              Dataset
CO     Word Co-occurrence  NLP                 30GB, Wikipedia
WC     WordCount           Text Analytics      30GB, Wikipedia
TS     TeraSort            Business Analytics  30GB, Teragen
LG     LinkGraph           Graph Processing    10GB, Wikipedia (compressed)
JO     Join                Business Analytics  30GB, TPC-H
Job Optimizer Evaluation
[Figure: bar chart of Speedup for each MapReduce Job (CO, WC, TS, LG, JO); legend: Default Settings, Rule-Based Optimizer, Cost-Based Job Optimizer (Just-in-Time Optimizer)]
Insights from Profiles for WordCount
A: Rule-Based
Few, large spills
Combiner gave high data reduction
Combiner made Mappers CPU bound
B: Cost-Based (2x faster)
Many, small spills
Combiner gave smaller data reduction
Better resource utilization in Mappers
Estimates from the What-if Engine
[Figure: true surface compared with estimated surface]
Workflow Optimization Space
Job-level Configuration
Dataset-level Configuration
Vertical Packing
Intra-job Inter-job
Partition Function Selection
Optimization Space
Physical Logical
Optimizations on TF-IDF Workflow
[Figure: TF-IDF workflow over datasets D0, D1, D2, D4, before and after optimization. Logical optimization: jobs J1 and J2 are grouped into one step (Partition: {D}, Sort: {D, W}), alongside J3, J4. Physical optimization: per-job settings such as Reducers = 50, Compress = off, Memory = 400 … and Reducers = 20, Compress = on, Memory = 300 …]
Dataset schemas shown: D0 <{D},{W}>; <{D, W},{f}>; <{D},{W, f, c}>; <{W},{D, t}>
Legend: D = docname, W = word, f = frequency, c = count, t = TF-IDF
Cloud enables users to provision Hadoop clusters in minutes
Can avoid the man (system administrator) in the middle
Pay only for the resources used
[Figure: Running Time (min) and Cost ($) by EC2 Instance Type (m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge)]
Multi-objective Cluster Provisioning
[Figure: Actual vs. Predicted Running Time (min) and Cost ($) by EC2 Instance Type for Target Cluster (m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge); Instance Type for Source Cluster: m1.large]
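Once running time and cost have been predicted for each instance type, choosing a cluster becomes a small constrained selection. The sketch below picks the cheapest instance type that meets a deadline. The predictions are invented for illustration and are not read from the charts; the function name is hypothetical.

```python
def cheapest_within_deadline(predictions, deadline_min):
    """Return the instance type with the lowest cost among those whose
    predicted running time meets the deadline, or None if none does.

    predictions maps instance type -> (running time in minutes, cost in cents).
    """
    feasible = {
        name: cost
        for name, (minutes, cost) in predictions.items()
        if minutes <= deadline_min
    }
    if not feasible:
        return None
    return min(feasible, key=feasible.get)


# Made-up predictions: (running time in minutes, cost in cents).
predictions = {
    "m1.small": (900, 520),
    "m1.large": (400, 610),
    "m1.xlarge": (250, 780),
    "c1.medium": (350, 430),
    "c1.xlarge": (150, 690),
}

print(cheapest_within_deadline(predictions, deadline_min=300))
print(cheapest_within_deadline(predictions, deadline_min=1000))
print(cheapest_within_deadline(predictions, deadline_min=100))
```

Swapping the roles of time and cost (fastest type within a budget) gives the other common form of the same trade-off.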
More Info: www.cs.duke.edu/starfish
Job-level MapReduce configuration
Workflow optimization
Workload management
Data layout tuning
Cluster sizing