Upload
frederic-desprez
View
334
Download
3
Tags:
Embed Size (px)
Citation preview
Workflow Allocations and Scheduling on IaaSPlatforms, from Theory to Practice
Eddy Caron1, Frederic Desprez2, Adrian Mures, an1, FredericSuter3, Kate Keahey4
1Ecole Normale Superieure de Lyon, France
2 INRIA
3IN2P3 Computing Center, CNRS, IN2P3
4UChicago, Argonne National Laboratory
Joint-lab workshop
Outline
Context
TheoryModelsProposed solutionSimulations
PracticeArchitectureApplicationExperimentation
Conclusions and perspectives
Workflows are a common pattern in scientific applications
I applications built on legacy code
I applications built as an aggregate
I use inherent task parallelism
I phenomenons having inherent workflow structure
Workflows are omnipresent!
(a) Ramses (b) Montage
(c) Ocean current modeling (d) Epigenomics
Figure Workflow application examples
Classic model of resource provisioning
I static allocations in a grid environment
I researchers compete for resources
I researchers tend to over-provision and under-use
I workflow applications have a non-constant resource demand
This is inefficient, but can it be improved?
Yes!How?
I on-demand resources
I automate resource provisioning
I smarter scheduling strategies
Classic model of resource provisioning
I static allocations in a grid environment
I researchers compete for resources
I researchers tend to over-provision and under-use
I workflow applications have a non-constant resource demand
This is inefficient, but can it be improved?Yes!How?
I on-demand resources
I automate resource provisioning
I smarter scheduling strategies
Classic model of resource provisioning
I static allocations in a grid environment
I researchers compete for resources
I researchers tend to over-provision and under-use
I workflow applications have a non-constant resource demand
This is inefficient, but can it be improved?Yes!How?
I on-demand resources
I automate resource provisioning
I smarter scheduling strategies
Why on-demand resources?
I more efficient resource usage
I eliminate overbooking of resources
I can be easily automated
I unlimited resources ∗
=⇒ budget limit
Our goal
I consider a more general model of workflow apps
I consider on-demand resources and a budget limit
I find a good allocation strategy
Why on-demand resources?
I more efficient resource usage
I eliminate overbooking of resources
I can be easily automated
I unlimited resources ∗ =⇒ budget limit
Our goal
I consider a more general model of workflow apps
I consider on-demand resources and a budget limit
I find a good allocation strategy
Why on-demand resources?
I more efficient resource usage
I eliminate overbooking of resources
I can be easily automated
I unlimited resources ∗ =⇒ budget limit
Our goal
I consider a more general model of workflow apps
I consider on-demand resources and a budget limit
I find a good allocation strategy
Related work
Functional workflowsBahsi, E.M., Ceyhan, E., Kosar, T.: Conditional Workflow Management: A Survey and Analysis. Scientific
Programming 15(4), 283–297 (2007)
biCPADesprez, F., Suter, F.: A Bi-Criteria Algorithm for Scheduling Parallel Task Graphs on Clusters. In: Proc.
of the 10th IEEE/ACM Intl. Symposium on Cluster, Cloud and Grid Computing. pp. 243–252 (2010)
Chemical programming for workflow applicationsFernandez, H., Tedeschi, C., Priol, T.: A Chemistry Inspired Workflow Management System for Scientific
Applications in Clouds. In: Proc. of the 7th Intl. Conference on E-Science. pp. 39–46 (2011)
PegasusTMalawski, M., Juve, G., Deelman, E. and Nabrzyski, J..: Cost- and Deadline-Constrained Provisioning
for Scientific Workflow Ensembles in IaaS Clouds. 24th IEEE/ACM International Conference onSupercomputing (SC12) (2012)
Outline
Context
TheoryModelsProposed solutionSimulations
PracticeArchitectureApplicationExperimentation
Conclusions and perspectives
1
3
4
2
(a) Classic
1
3
4
2
(b) PTG
1
3
4
2
Loop
Or
(c) Functional
Figure Workflow types
Application model
Non-deterministic (functional) workflows
An application is a graph G = (V, E), where
V = {vi |i = 1, . . . , |V |} is a set of vertices
E = {ei ,j |(i , j) ∈ {1, . . . , |V |} × {1, . . . , |V |}} is a set of edgesrepresenting precedence and flow constraints
Vertices
I a computational task [parallel, moldable]
I an OR-split [transitions described by random variables]
I an OR-join
Example workflow
Figure Example workflow
Platform modelA provider of on-demand resources from a catalog:C = {vmi = (nCPUi , costi )|i ≥ 1}
nCPU represents the number of equivalent virtual CPUs
cost represents a monetary cost per running hour(Amazon-like)
communication bounded multi-port model
Makespan
C = maxi C (vi ) is the global makespan where
C (vi ) is the finish time of task vi ∈ V
Cost of a schedule SCost =
∑∀vmi∈SdTendi
− Tstarti e × costi
Tstarti ,Tendirepresent the start and end times of vmi
costi is the catalog cost of virtual resource vmi
Problem statementGiven
G a workflow application
C a provider of resources from the catalog
B a budget
find a schedule S such that
Cost ≤ B budget limit is not passed
C (makespan) is minimized
with a predefined confidence.
Proposed approach
1. Decompose the non-DAG workflow into DAG sub-workflows
2. Distribute the budget to the sub-workflows
3. Determine allocations by adapting an existing allocationapproach
Step 1: Decomposing the workflow
Figure Decomposing a nontrivial workflow
Step 2: Allocating budget
1. Compute the number of executions of each sub-worflow
I # of transitions of the edge connecting its parent OR-split toits start node
I Described by a random variable according to a distinct normaldistribution + confidence parameter
2. Give each sub-workflow a ratio of the budget proportional to itswork contribution.
Work contribution of a sub-workflow G i
I as the sum of the average execution times of its tasks
I average execution time computed over the catalog CI task speedup model is taken into consideration
I multiple executions of a sub-workflow also considered
Step 3: Determining allocations
Two algorithms based on the bi-CPA algorithm.
Eager algorithm
I one allocation for each task
I good trade-off between makespan and average time-cost area
I fast algorithm
I considers allocation-time cost estimations only
Deferred algorithm
I outputs multiple allocations for each task
I good trade-off between makespan and average time-cost area
I slower algorithm
I one allocation is chosen at scheduling time
Step 3: Determining allocations
Two algorithms based on the bi-CPA algorithm.
Eager algorithm
I one allocation for each task
I good trade-off between makespan and average time-cost area
I fast algorithm
I considers allocation-time cost estimations only
Deferred algorithm
I outputs multiple allocations for each task
I good trade-off between makespan and average time-cost area
I slower algorithm
I one allocation is chosen at scheduling time
Algorithm parameters
T overA , T under
A average work allocated to tasks
TCP duration of the critical path
B ′ estimation of the used budget when TA and TCP
meet
I TA keeps increasing as we increase theallocation of tasks and TCP keeps decreasing sothey will eventually meet.
I When they do meet we have a trade-off betweenthe average work in tasks and the makespan.
p(vi ) number of processing units allocated to task vi
The eager allocation algorithm
1: for all v ∈ V i do2: Alloc(v)← {minvmi∈C CPUi}3: end for4: Compute B ′
5: while TCP > T overA ∩
∑|V i |j=1 cost(vj ) ≤ B i do
6: for all vi ∈ Critical Path do7: Determine Alloc ′(vi ) such that p′(vi ) = p(vi ) + 1
8: Gain(vi )← T (vi ,Alloc(vi ))p(vi )
− T (vi ,Alloc ′(vi ))p′(vi )
9: end for10: Select v such that Gain(v) is maximal11: Alloc(v)← Alloc ′(v)12: Update T over
A and TCP
13: end whileAlgorithm 1: Eager-allocate(G i = (V i , E i ),B i )
Methodology
I Simulation using SimGridI Used 864 synthetic workflows for three types of applications
I Fast Fourier TransformI Strassen matrix multiplicationI Random workloads
I Used a virtual resource catalog inspired by Amazon EC2
I Used a classic list-scheduler for task mappingI Measured
I Cost and makespan after task mapping
Name #VCPUs Network performance Cost / hourm1.small 1 moderate 0.09m1.med 2 moderate 0.18m1.large 4 high 0.36m1.xlarge 8 high 0.72m2.xlarge 6.5 moderate 0.506
m2.2xlarge 13 high 1.012m2.4xlarge 26 high 2.024
c1.med 5 moderate 0.186c1.xlarge 20 high 0.744
cc1.4xlarge 33.5 10 Gigabit Ethernet 0.186cc2.8xlarge 88 10 Gigabit Ethernet 0.744
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1 5 10 15 20 25 30 35 40 45 50
Re
lative
ma
ke
sp
an
Budget limit
Equality
Figure Relative makespan ( EagerDeferred ) for all workflow applications
0
1
2
3
4
5
1 5 10 15 20 25 30 35 40 45
Re
lative
co
st
Budget limit
Equality
Figure Relative cost ( EagerDeferred ) for all workflow applications
First conclusions
I Eager is fast but cannot guarantee budget constraint aftermapping
I Deferred is slower, but guarantees budget constraint
I After a certain budget they yield to identical allocations
I for small applications and small budgets Deferred should bepreferred.
I When the size of the applications increases or the budget limitapproaches task parallelism saturation, using Eager ispreferable.
Outline
Context
TheoryModelsProposed solutionSimulations
PracticeArchitectureApplicationExperimentation
Conclusions and perspectives
Architecture
DIET (MADAG) Workflow engine
Nimbus
VM1
VM2
VM3
VMn
Acquire/release resources
Send workflow
Receive result
Deploy workflow tasks
Scientist
SeD Ramses
Phantom frontend
SeD Grafic1
Phantom frontend
Phantom
service
. . .
Manage Autoscaling Groups
Figure System architecture
Main components
Nimbus
I open-source IaaS provider
I provides low-level resources (VMs)
I compatible with the Amazon EC2 interface
I used a FutureGrid install
Main components
Phantom
I auto-scaling and high availability provider
I high-level resource provider
I subset of the Amazon auto-scale service
I part of the Nimbus platform
I used a FutureGrid install
I still under development
Main components
MADag
I workflow engine
I part of the DIET (Distributed Interactive Engineering Toolkit)software
I one service implementation per task
I each service launches its afferent task
I supports DAG, PTG and functional workflows
Main components
Client
I describes his workflow in xml
I implements the services
I calls the workflow engine
I no explicit resource management
I selects the IaaS provider to deploy on
How does it work?
DIET (MADAG) Workflow engine
Nimbus
VM1
VM2
VM3
VMn
Acquire/release resources
Send workflow
Receive result
Deploy workflow tasks
Scientist
SeD Ramses
Phantom frontend
SeD Grafic1
Phantom frontend
Phantom
service
. . .
Manage Autoscaling Groups
Figure System architecture
RAMSES
I n-body simulations of dark matter interactions
I backbone of galaxy formations
I AMR workflow application
I parallel (MPI) application
I can refine at different zoom levels
RAMSES
Figure The RAMSES workflow
Methodology
I used a FutureGrid Nimbus installation as a testbed
I measured running time for static and dynamic allocations
I estimated cost for each allocation
I varied maximum number of used resources
Figure Slice through a 28 × 28 × 28 box simulation
Results
5:00
10:00
15:00
20:00
25:00
0 2 4 6 8 10 12 14 16
Ru
nn
ing
tim
e (
MM
:SS
)
Number of VMs
Runtime (static)VM alloc (dynamic)Runtime (dynamic)
Figure Running times for a 26 × 26 × 26 box simulation
Results
0
5
10
15
20
25
0 2 4 6 8 10 12 14 16
Co
st
(un
its)
Number of VMs
Cost (static)Cost (dynamic)
Figure Estimated costs for a 26 × 26 × 26 box simulation
Outline
Context
TheoryModelsProposed solutionSimulations
PracticeArchitectureApplicationExperimentation
Conclusions and perspectives
Conclusions
I proposed two algorithms – Eager and Deferred with each theirpro and cons
I on-demand resources can better model workflow usage
I on-demand resources have a VM allocation overhead
I allocation overhead decreases with number of VMs
I for RAMSES, cost is greatly reduced
Perspectives
I preallocate VMs
I spot instances
I smarter scheduling strategy
I determine per application type which is the tipping point
I Compare our algorithms with others
Collaborations
I Continue the collaboration with the Nimbus/FutureGrid teams
I On the algorithms themselves (currently too complicated foran actual implementation)
I Understanding (obtaining models) clouds and virtualizedplatforms
I going from theoretical algorithms to (accurate) simulationsand actual implementation
Perspectives
I preallocate VMs
I spot instances
I smarter scheduling strategy
I determine per application type which is the tipping point
I Compare our algorithms with others
Collaborations
I Continue the collaboration with the Nimbus/FutureGrid teams
I On the algorithms themselves (currently too complicated foran actual implementation)
I Understanding (obtaining models) clouds and virtualizedplatforms
I going from theoretical algorithms to (accurate) simulationsand actual implementation
References
Eddy Caron, Frederic Desprez, Adrian Muresan and Frederic Suter.Budget Constrained Resource Allocation for Non-DeterministicWorkflows on a IaaS Cloud. 12th International Conference onAlgorithms and Architectures for Parallel Processing. (ICA3PP-12),Fukuoka, Japan, September 04 - 07, 2012
Adrian Muresan, Kate Keahey. Outsourcing computations for galaxysimulations. In eXtreme Science and Engineering Discovery Environment2012 - XSEDE12, Chicago, Illinois, USA, June 15 - 19 2012. Postersession.
Adrian Muresan. Scheduling and deployment of large-scale applicationson Cloud platforms. Laboratoire de l’Informatique du Parallelisme (LIP),ENS Lyon, France, Dec. 10th, 2012 PhD thesis.
Algorithm parameters
T overA - Eager
T overA =
1
B ′×|V i |∑j=1
(T (vj ,Alloc(vj ))× cost(vj )) ,
T underA - Deferred
T underA =
1
B ′×|V i |∑j=1
(T (vj ,Alloc(vj ))× costunder (vj ))
Budget distribution algorithm
1: ω∗ ← 02: for all G i = (V i , E i ) ⊆ G do3: nExec i ← CDF−1(D i ,Confidence)
4: ωi ←∑
vj∈V i
1
|C|∑
vmk∈CT (vj , vmk )
× nExec i
5: ω∗ ← ω∗ + ωi
6: end for7: for all G i ⊆ G do8: B i ← B × ωi
ω∗ ×1
nExec i
9: end forAlgorithm 2: Share Budget(B,G,Confidence)
Absolute makespans
00:00:00
00:10:00
00:20:00
00:30:00
00:40:00
00:50:00
01:00:00
01:10:00
01:20:00
1 5 10 15 20 25 30 35 40 45
Ma
ke
sp
an
(H
H:M
M:S
S)
Budget limit
(a) Eager for all workflow applications
Absolute makespans
00:00:00
00:10:00
00:20:00
00:30:00
00:40:00
00:50:00
01:00:00
01:10:00
01:20:00
1 5 10 15 20 25 30 35 40 45
Ma
ke
sp
an
(H
H:M
M:S
S)
Budget limit
(b) Deferred for all workflow applications
Absolute cost
0
10
20
30
40
50
1 5 10 15 20 25 30 35 40 45 50
Co
st
Budget limit
Budget limit
(c) Eager for all workflow applications
Absolute cost
0
10
20
30
40
50
1 5 10 15 20 25 30 35 40 45 50
Co
st
Budget limit
Budget limit
(d) Deferred for all workflow applications