Scheduling and Guided Search for Cloud Applications Cristiana Amza
- Slide 1
- Scheduling and Guided Search for Cloud Applications Cristiana
Amza
- Slide 2
- Big Data is Here. Data growth (by 2015) = 100x in ten years [IDC 2012]; population growth = 10% in ten years. Monetizing data for commerce, health, science, services, etc. [source: Economist] (courtesy of Babak Falsafi)
- Slide 3
- Data Growing Faster than Technology (WinterCorp Survey, www.wintercorp.com): a growing technology gap. (courtesy of Babak Falsafi)
- Slide 4
- Challenge 1: Costs of a Datacenter. Estimated costs of a datacenter: 46,000 servers, $3,500,000 per month to run (data courtesy of James Hamilton [SIGMOD'11 keynote]; 3-year server and 10-year infrastructure amortization). Servers & power are 88% of total cost.
- Slide 5
- Datacenter Energy Not Sustainable. Modern datacenters draw ~20 MW each; datacenters consume 6% of all electricity in the modern world, growing at >20% per year. [Chart: billion kilowatt-hours/year, 2001-2017.] A modern datacenter: 17x a football stadium, $3 billion, 50 million homes. (courtesy of Babak Falsafi)
- Slide 6
- Challenge 2: Data Management (Anomalies). "Cloudy with a chance of failure" (courtesy of Haryadi S. Gunawi). Headlines: "Amazon Can't Recover All Its Cloud Data From Outage" (Max Eddy, 27 April 2011, www.geekosystem.com); "When the Cloud Fails: T-Mobile, Microsoft Lose Sidekick Customer Data" (Om Malik, 10 October 2009, gigaom.com); "Whoops: Facebook loses 1 billion photos" (Chris Keall, 10 March 2009, The National Business Review); "Cloud Storage Often Results in Data Loss" (Chad Brooks, 10 October 2011, www.businessnewsdaily.com).
- Slide 7
- Problems Are Entrenched. I have been working in this area since 2001; problems have only grown more complex/intractable. Same old distributed-systems problems, plus new levels of indirection (remote processing, deep software stacks, VMs, etc.). E.g., cloud monitoring and logging data (terabytes per day), but no notable success stories with analyzing such data.
- Slide 8
- Challenge 3: Paradigm Limitations. MapReduce parallelism is embarrassing/simplistic: it works for aggregate ops and uses simple scheduling.
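A minimal sketch of the map/reduce pattern the slide calls simplistic (illustrative only, not the Hadoop API): the reduce side must be a simple aggregate operator for the paradigm to apply.

```python
from functools import reduce
from collections import defaultdict

# Map phase: each record is independently turned into (key, value) pairs.
def map_phase(records, mapper):
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

# Shuffle: group values by key.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: an aggregate operator collapses each group.
def reduce_phase(groups, reducer):
    return {key: reduce(reducer, values) for key, values in groups.items()}

# Classic word count: the aggregate operator is just addition.
records = ["big data", "big cloud"]
pairs = map_phase(records, lambda line: [(w, 1) for w in line.split()])
counts = reduce_phase(shuffle(pairs), lambda a, b: a + b)
print(counts)  # {'big': 2, 'data': 1, 'cloud': 1}
```

Anything that is not a per-key aggregate (e.g., the iterative, guided searches later in this deck) does not fit this mold, which is the paradigm limitation being criticized.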
- Slide 9
- Hadoop/Enterprise: Separate Storage Silos. Extra hardware cost ($$$), periodic data ingest, and costly cross-silo data management ($$$).
- Slide 10
- What can we do? Find Meaningful Apps. We can produce/find tons of data, but we need to analyze something of vital importance to justify draining vital resources; otherwise the simplest solution is to stop creating the problem(s).
- Slide 11
- What can we do? Consolidate Research Agendas. Find overarching, mission-critical paradigms (the state of the art, MapReduce, is too simplistic); develop standards, common tools, and benchmarks; integrate solutions and think holistically; enforce accountability for the data center/cloud provider.
- Slide 12
- Opportunity 1: The Brain Challenge. Started to explore neuroscience workloads in 2010, at a Brain Summit/Workshop held at IBM TJ Watson; started a collaboration with Stephen Strother at Baycrest a year later. The application is both data- and compute-intensive, and boils down to an optimization problem in a highly parametrized search space.
- Slide 13
- Opportunity 2: Guided Modeling. Performance modeling, energy modeling, anomaly modeling, biophysical modeling: all tend to be interpolations/searches/optimizations in highly parametrized spaces. Key idea: develop a common framework that works for all, extending the way MapReduce standardized aggregation ops. Guidance: operator reduction, linear interpolation, etc.
- Slide 14
- Building Models Takes Time. [Figure: avg. latency surface over DB memory and storage memory, high to low latency.] 32 data points per dimension; 32x32 = 1024 sampling points. Actuate a live system and take experimental samples, in 512MB chunks up to 16GB, 15 minutes for each point: exhaustive sampling takes 11 days!
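A back-of-the-envelope check of the sampling cost on this slide:

```python
# 32 x 32 grid of (DB memory, storage memory) configurations,
# each point sampled on the live system for 15 minutes.
grid_points = 32 * 32                      # 1024 sampling points
minutes_per_point = 15
total_minutes = grid_points * minutes_per_point
total_days = total_minutes / (60 * 24)
print(grid_points, round(total_days, 1))   # 1024 points, ~10.7 days (~11 days)
```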
- Slide 15
- Goal: Reduce Time by Model Reuse. Provide a resource-to-performance mapping (avg. latency over DB resources and storage resources, from more to less). Applications: Dynamic Resource Allocation [FAST'09], Capacity Planning [SIGMOD'13], What-if Queries [SIGMETRICS'10], Anomaly Detection [SIGMETRICS'10], Towards a Virtual Brain Model [HPCS'14].
- Slide 16
- Service Provider / Customer DBA Management Interactions. Service provider: Use fewer resources: a customer wants 1000 TPS; what is the most efficient configuration (e.g., CPU/memory) to deliver it? Share resources: can I place customer A's DB alongside customer B's DB, and will their service levels be met? Customer DBA: Use the right amount of resources: what will the performance (e.g., query latency) be if I use 8GB of RAM instead of 16GB? Solve performance problems: I'm only getting 500 TPS; what's wrong? Is the cloud to blame? We need to build performance models to understand.
- Slide 17
- Libraries/Archive of Models. Black-box models (data-driven): minimal assumptions, but need lots of samples and could over-fit. Analytical models (knowledge-driven): no samples required, but difficult to derive and fragile to maintain. Gray-box models: few samples needed and can be adapted, but still need to be derived. Use an ensemble of models.
- Slide 18
- Model Ensemble Approach. 1. Guidance as trends and patterns; 2. automatically tune the models using data; 3. test, rank, and blend the models; repeat if needed.
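The tune-and-rank steps above can be sketched as follows (a toy example with synthetic data, not the paper's actual framework): fit each candidate model from a small catalog, then rank the candidates by cross-validation error.

```python
import numpy as np

# Synthetic samples of one performance metric y over a single knob x.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 40)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.5, x.size)  # truly linear + noise

# Candidate models from the "catalog": polynomials of increasing degree.
candidates = {"linear": 1, "quadratic": 2, "cubic": 3}

def cv_error(x, y, degree, folds=5):
    """Mean squared test error under simple k-fold cross-validation."""
    idx = np.arange(x.size)
    errors = []
    for k in range(folds):
        test = idx[k::folds]                    # every folds-th point held out
        train = np.setdiff1d(idx, test)
        coeffs = np.polyfit(x[train], y[train], degree)  # step 2: tune
        pred = np.polyval(coeffs, x[test])
        errors.append(float(np.mean((pred - y[test]) ** 2)))
    return sum(errors) / folds

# Step 3: rank the tuned candidates; the lowest CV error wins (a blend
# could instead weight models by inverse error).
ranking = sorted(candidates, key=lambda name: cv_error(x, y, candidates[name]))
print(ranking)
```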
- Slide 19
- How to Specify Guidance. SelfTalk: a language to describe relationships, with a catalog of common functions (details in the SIGMETRICS'10 paper). A hint specifies model inputs and parameters, plus curve-fitting and validation algorithms.
- Slide 20
- Refine Models Using Data: use hints to link relations to metrics. The hint below says that CPU is linearly correlated to QPS, but the working set should be in RAM; the system learns the parameters using data (or requests more data):
  HINT myHint RELATION LINEAR(x,y)
  METRIC (x,y) { x.name=MySQL.CPU  y.name=MySQL.QPS }
  CONTEXT (a) { a.name=MySQL.BufPoolAlloc  a.value >= 512MB }
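One illustrative reading of such a hint (the data and helper below are made up, not the SelfTalk runtime): keep only samples that satisfy the CONTEXT clause, then least-squares fit the LINEAR relation between MySQL.CPU and MySQL.QPS.

```python
import numpy as np

# Hypothetical samples: (cpu_utilization, qps, buffer_pool_mb).
samples = [
    (0.2, 2000, 1024), (0.4, 4100, 1024), (0.6, 5900, 1024),
    (0.8, 8100, 1024),
    (0.5, 1200, 256),  # violates the CONTEXT clause: working set not in RAM
]

# Apply the CONTEXT filter: BufPoolAlloc >= 512 MB.
in_context = [(c, q) for c, q, buf in samples if buf >= 512]
cpu = np.array([c for c, _ in in_context])
qps = np.array([q for _, q in in_context])

# "Learn parameters using data": least-squares fit of y = a*x + b.
a, b = np.polyfit(cpu, qps, 1)
print(round(float(a)), round(float(b)))  # slope and intercept of the fit
```

Without the context filter, the out-of-context point would drag the fit away from the true linear trend; this is exactly what the hint's CONTEXT clause guards against.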
- Slide 21
- Rank Models and Blend. 1. Divide the search space into regions (e.g., the avg.-latency surface over DB memory and storage memory, up to 16GB); 2. use n-fold cross-validation to rank; 3. associate the best model to each region.
- Slide 22
- Prototype. SelfTalk, a catalog of models, and data on MySQL & a storage server. Example question: how should I partition resources between two applications A and B?
- Slide 23
- Runtime Engine. Model matching against a model repository: model a new workload by reusing similar data/models, in a selective, iterative ensemble-learning process; expand samples, then validate and refine the model if necessary.
- Slide 24
- Ex 1: Predicting Buffer Pool Latencies
- Slide 25
- Ex 2: Model Transformation. Core i7 cache sizes: L1 256 kB, L2 1024 kB, L3 8192 kB.
- Slide 26
- Guidance: Step Function (3D-to-2D reduction). Xeon: L1 256 kB, L2 2048 kB, L3 20480 kB; Core i7: L1 256 kB, L2 1024 kB, L3 8192 kB.
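The step-function guidance can be sketched as follows (latency numbers are invented for illustration): memory latency as a function of working-set size is roughly flat within each cache level and jumps at the L1/L2/L3 capacities, so the guidance collapses the search space to a few breakpoints per machine.

```python
import bisect

def step_latency(working_set_kb, capacities_kb, latencies):
    """Piecewise-constant model: latency of the smallest level that fits."""
    level = bisect.bisect_left(capacities_kb, working_set_kb)
    return latencies[level]

core_i7 = [256, 1024, 8192]   # L1, L2, L3 capacities in kB (from the slide)
lat = [4, 12, 40, 200]        # hypothetical cycles for L1, L2, L3, DRAM

print(step_latency(100, core_i7, lat))     # fits in L1 -> 4
print(step_latency(4096, core_i7, lat))    # fits in L3 -> 40
print(step_latency(20000, core_i7, lat))   # spills to DRAM -> 200
```

Transforming the model to the Xeon then only requires moving the breakpoints to its cache capacities and re-estimating the per-level latencies with a few new samples.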
- Slide 27
- With minimum new samples. Xeon cache sizes: L1 256 kB, L2 2048 kB, L3 20480 kB.
- Slide 28
- Ex 3: Modeling and Job Scheduling for Brain. Data centers usually have a heterogeneous structure: a variety of multicores, GPUs, etc. Different stages of the application have different resource demands (CPU- versus data-intensive), so scheduling jobs to the available resources becomes non-trivial; guided modeling helps.
- Slide 29
- Functional MRI. Goal: studying brain functionality. Procedure: ask patients (subjects) to do a task and capture brain slices measuring blood-oxygen level; correlate images to identify brain activity.
- Slide 30
- Functional MRI: Overall Pipeline (Data Acquisition)
- Slide 31
- Functional MRI
- Slide 32
- NPAIRS as Our Application. NPAIRS goal: processing images to find image correlations. Feature extraction, a common technique in image-processing applications (e.g., face recognition): use Principal Component Analysis to extract eigenvectors, and find a set of eigenvectors that is a good representative of the whole set of subjects (machine-learning methods, heuristic search, etc.).
- Slide 33
- NPAIRS Split-Half Resampling. [Diagram: for each split J, the full data (scans + design data) is divided into split-half 1 and split-half 2; each half yields a statistical parametric map (SPM SJ1, SPM SJ2), and the two maps give a reproducibility estimate (r).]
- Slide 34
- Output of NPAIRS
- Slide 35
- NPAIRS Flowchart
- Slide 36
- NPAIRS Profiling Results
- Slide 37
- GPU Execution Profile
- Slide 38
- NPAIRS Execution on Different Nodes
- Slide 39
- Job Modeling: Exhaustive Sampling. Sample set: 1 to 99. Fitness score R²: 0.995. Total run time: 64933.
- Slide 40
- Uniform Sampling. Sample set: 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, using 5-fold cross-validation. Fitness score R²: 0.990. Total run time: 6368.
- Slide 41
- Guidance: Step Function + Fast Sampling. Sample set: 2, 4, 8, 12, 16, 20, 24, 32, 48, 96. Fitness score R²: 0.993. Total run time: 5313, a 16.6% time saving!
- Slide 42
- Heterogeneous CPU Only (3 Fat + 5 Light Nodes)
- Slide 43
- 3 Fat, 3 Light, 3 GPU Nodes
- Slide 44
- Resource Utilization
- Slide 45
- Overall Execution Time Comparison
- Slide 46
- Conclusions. Big Data processing is driving a quantum leap in IT, but is hampered by slow progress in data center management. We propose to investigate guided modeling; promising preliminary results with neuroscience workloads include a 7x speedup of NPAIRS on a small CPU+GPU cluster.
- Slide 47
- Backup Slides
- Slide 48
- Modeling Procedure. Get a sample set and split it using 5-fold cross-validation: fit the model using 4 folds of the sample data and test it using the remaining fold. Try all 5 splits and sum the model error; if the error is less than the threshold, stop: we have found the model.
- Slide 49
- Notes. Total run time is the sum of the total sampling time; modeling time is negligible. Use the exhaustive data set as the true values and the fitted model to predict values, then compute the coefficient of determination R².
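The R² computation described above can be sketched as follows (toy numbers, for illustration only): treat the exhaustive data set as ground truth and the fitted model's predictions as estimates.

```python
def r_squared(true_vals, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean = sum(true_vals) / len(true_vals)
    ss_tot = sum((t - mean) ** 2 for t in true_vals)             # total variance
    ss_res = sum((t - p) ** 2 for t, p in zip(true_vals, predicted))
    return 1.0 - ss_res / ss_tot

true_vals = [10.0, 20.0, 30.0, 40.0]   # "exhaustive" measurements
predicted = [11.0, 19.0, 31.0, 39.0]   # model predictions
print(round(r_squared(true_vals, predicted), 3))  # 0.992
```

A value near 1 (like the slides' 0.990-0.995) means the fitted model explains almost all of the variance in the exhaustively sampled data.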