Fabien Hermenier, Adrien Lèbre, and Jean-Marc Menaud. Proceedings of the International Workshop Virtualization Techniques for Distributed Computing (VTDC'10), held with the 19th ACM International Symposium on High Performance Distributed Computing (HPDC'10). ACM, New York, pages 658-666.
Cluster-Wide Context Switch of Virtualized Jobs
Fabien Hermenier, Adrien Lèbre, Jean-Marc Menaud
VTDC’10, 22 June 2010
ASCOLA Team
Hermenier et al. (ASCOLA) Cluster-Wide Context Switch of Virtualized Jobs 1 / 23
Agenda
- Motivation
- Global Design
  - Architecture
  - Implementation
- Proof of concept
  - A sample scheduler
  - Experiment on a cluster
- Conclusion
Motivation
Clusters
- large infrastructures to execute various jobs
Resource Management System (RMS)
- manages the execution of jobs
- resources are allocated to jobs according to their description
- scheduling: which jobs to execute, and where?
Job schedulers
Usually
A coarse-grained exploitation of resources:
- static allocation of resources
- execution to completion
Dynamic schedulers exist
Based on mechanisms that manipulate the jobs dynamically (migration, preemption, dynamic allocation of resources, ...). BUT
- mechanisms are complex to implement
- mechanisms are complex to use efficiently
Virtual Machines (VMs) as a backend for dynamic schedulers
- each component is embedded into its VM
- VMMs provide migration, preemption
- still complex to use efficiently
A cutting-edge building block
dynamic consolidation, best-effort jobs, ...
- various policies, but common concepts to perform the changes
- each provides an ad-hoc solution to handle several common issues:
  - dependencies between actions
  - correctness
  - reactivity
Proposition
Performing the changes should not be a primary concern for developers
- a generic cluster-wide context switch based on VMs
- developers only focus on the algorithm to select the jobs to run
- the cluster-wide context switch takes care of the rest
  - detects the changes to perform
  - ensures the correctness of the transition
  - computes the fastest possible transition
The implementation leverages the consolidation manager Entropy
Global Design: Architecture
From jobs to virtualized Jobs
Figure: The life cycle of a vjob
- a vjob encapsulates one or several VMs
- to change the state of a vjob, actions (except migrate) are executed on each of its VMs
Configuration
- describes the assignment of the running VMs to working nodes
- nodes provide CPU and memory resources
- running VMs require CPU and memory resources to run at peak level
Figure: (a) Non-viable configuration (b) Viable configuration
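The viability test described above can be sketched in a few lines of Python. This is a minimal illustration, not Entropy's code: the `Node` and `VM` classes, the `is_viable` function, and the resource units are all hypothetical names chosen for the example. It assumes a configuration is viable when, on every node, the CPU and memory required by the running VMs fit within what the node provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cpu: int     # CPU capacity, in arbitrary units (assumption)
    memory: int  # memory capacity, in MB (assumption)


@dataclass(frozen=True)
class VM:
    name: str
    cpu: int     # CPU needed to run at peak level
    memory: int  # memory needed, in MB


def is_viable(nodes, vms, placement):
    """Return True if every node can satisfy the VMs assigned to it.

    `placement` maps the name of each running VM to the name of its node.
    """
    by_name = {vm.name: vm for vm in vms}
    used = {node.name: [0, 0] for node in nodes}
    for vm_name, node_name in placement.items():
        vm = by_name[vm_name]
        used[node_name][0] += vm.cpu
        used[node_name][1] += vm.memory
    return all(
        used[node.name][0] <= node.cpu and used[node.name][1] <= node.memory
        for node in nodes
    )


nodes = [Node("N1", cpu=2, memory=2048), Node("N2", cpu=2, memory=2048)]
vms = [VM("VM1", 1, 1024), VM("VM2", 2, 1024), VM("VM3", 1, 512)]

# All three VMs on N1 ask for 4 CPU units on a 2-unit node: non-viable.
crowded = {"VM1": "N1", "VM2": "N1", "VM3": "N1"}
# Moving VM2 to N2 leaves every node within its capacity: viable.
spread = {"VM1": "N1", "VM3": "N1", "VM2": "N2"}
```

With these sample values, `is_viable(nodes, vms, crowded)` is false and `is_viable(nodes, vms, spread)` is true, which mirrors the two cases of the figure.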
The control loop of Entropy
Monitor
- extract the current configuration: VM position, CPU/memory consumption
- adaptable to a specific monitoring system (currently Ganglia)
Scheduling policy
- an algorithm to select the vjobs to run with respect to the current configuration
- provided by a developer
The cluster-wide context switch module
- selects a position for each VM to run
- infers the actions that make the transition from the current configuration
- computes the fastest plan that ensures the correctness of the process
Execution
- associate each action of the plan with a driver that performs the action
- adaptable to specific environments: currently supports the Xen VMM (XML-RPC) or shell commands
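One iteration of the loop described above (monitor, scheduling policy, cluster-wide context switch, execution) could look like the following Python sketch. It is an illustration under assumptions, not Entropy's API: `run_iteration`, the four callables, and the dictionary-based action format are hypothetical. The plan is assumed to be a list of steps, each step being a list of actions that may run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def run_iteration(monitor, policy, context_switch, drivers):
    """Run one iteration of the control loop and return the driver results.

    monitor()                  -> current configuration
    policy(current)            -> set of vjobs that should run (developer-provided)
    context_switch(cur, wanted) -> plan: a list of steps, each a list of actions
    drivers                    -> maps an action kind to the callable performing it
    """
    current = monitor()
    wanted = policy(current)
    plan = context_switch(current, wanted)

    log = []
    for step in plan:  # steps are executed one after the other
        # Actions of a same step are handed to a thread pool so they can overlap.
        with ThreadPoolExecutor(max_workers=max(1, len(step))) as pool:
            results = list(pool.map(lambda a: drivers[a["kind"]](a), step))
        log.append(results)
    return log


# Stub components standing in for Ganglia, a scheduler, and a Xen or shell driver.
log = run_iteration(
    monitor=lambda: {"VM1": "N1"},
    policy=lambda current: {"VM1": "N2"},
    context_switch=lambda cur, wanted: [
        [{"kind": "migrate", "vm": "VM1", "src": "N1", "dst": "N2"}]
    ],
    drivers={"migrate": lambda a: f"migrate {a['vm']} {a['src']}->{a['dst']}"},
)
```

Keeping the drivers in a plain mapping reflects the idea that the execution module is adaptable: supporting another environment only means registering another callable per action kind.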
Global Design: Implementation
Role of the cluster-wide (CW) context switch
- detects the actions to perform
- selects a position for each VM to run
- plans the actions to guarantee the correctness of the process
- computes the fastest possible plan
Plan the actions
The reconfiguration plan
- a protocol to execute actions
- actions feasible in parallel are grouped into a same step
- steps are executed sequentially
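The grouping of feasible actions into sequential steps can be sketched as below. This is a simplified illustration, not Entropy's planner: `build_plan` and the tuple format `(kind, vm, src, dst)` are hypothetical, and only memory is considered. An action is assumed feasible when its destination node has enough free memory, and the memory released by an action only becomes available in the following step.

```python
def build_plan(free_mem, vm_mem, actions):
    """Group actions into steps; each step only holds actions feasible together.

    free_mem: node name -> free memory (MB) in the current configuration
    vm_mem:   VM name   -> memory (MB) used by the VM
    actions:  tuples (kind, vm, src, dst); src or dst is None when the action
              releases (suspend, stop) or only consumes (run, resume) resources
    """
    free = dict(free_mem)
    pending = list(actions)
    plan = []
    while pending:
        step, postponed, released = [], [], {}
        for action in pending:
            _kind, vm, src, dst = action
            need = vm_mem[vm]
            if dst is None or free[dst] >= need:
                step.append(action)
                if dst is not None:
                    free[dst] -= need  # reserve the destination right away
                if src is not None:
                    released[src] = released.get(src, 0) + need
            else:
                postponed.append(action)
        if not step:
            # For instance two VMs that must swap nodes with no spare memory.
            raise ValueError("no feasible action left: the plan is blocked")
        for node, amount in released.items():
            free[node] += amount  # freed memory is usable from the next step
        plan.append(step)
        pending = postponed
    return plan


# VM1 must move to N2, which is full until VM2 is suspended.
plan = build_plan(
    free_mem={"N1": 0, "N2": 0},
    vm_mem={"VM1": 1024, "VM2": 1024},
    actions=[("migrate", "VM1", "N1", "N2"), ("suspend", "VM2", "N2", None)],
)
```

Here the suspend ends up alone in the first step and the migration in the second, because the migration depends on the memory the suspend releases.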
Suspending/Resuming a vjob
- inter-connected VMs should be continuously in the same state
- coordination to ensure that distributed applications will not fail
  - actions are grouped into the same step
  - synchronization between the pause/unpause actions
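One way to picture the synchronization between pause actions is a barrier shared by the workers that suspend the VMs of a vjob. The Python sketch below is an assumption about how such coordination could be written, not a description of Entropy's mechanism: `suspend_vjob` and the `pause` and `save` callbacks are hypothetical. Every VM is paused before any VM state is written, so inter-connected VMs never sit in different states for long.

```python
import threading


def suspend_vjob(vm_names, pause, save):
    """Suspend all the VMs of a vjob in a coordinated way.

    pause(vm) freezes a VM; save(vm) writes its state to disk.
    Returns the ordered list of (event, vm) that occurred.
    """
    barrier = threading.Barrier(len(vm_names))
    events = []
    lock = threading.Lock()

    def worker(vm):
        barrier.wait()  # every worker is ready: pause at about the same time
        pause(vm)
        with lock:
            events.append(("paused", vm))
        barrier.wait()  # nobody starts saving until every VM is paused
        save(vm)
        with lock:
            events.append(("saved", vm))

    threads = [threading.Thread(target=worker, args=(vm,)) for vm in vm_names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return events


events = suspend_vjob(["VM1", "VM2", "VM3"], pause=lambda vm: None, save=lambda vm: None)
kinds = [kind for kind, _vm in events]
```

Whatever the thread interleaving, the three "paused" events precede the three "saved" events. Resuming a vjob could mirror this, with the unpause actions released together once every VM state is loaded.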
Reducing the duration of a cluster-wide context switch
Figure: Completion time (in sec) against VM size (in MB; 128, 256, 512, 1024, 2048) for the start/run, migrate, and stop/shutdown actions
Figure: Completion time (in sec) against VM size (in MB; 128, 256, 512, 1024, 2048) for the local, nfs, local scp, and local rsync cases
- the duration of an action depends on its context
- a function estimates the cost of a whole CW context switch
Reducing the duration of a CW context switch
An approach based on constraint programming
Entropy computes a new configuration that
- is viable
- respects the scheduling policy
- implies the minimal cost
In practice
- actions are performed as soon as possible
- prefer moving VMs with small memory requirements
- avoid migrations and remote resumes
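A cost function consistent with the three practical rules above can be sketched as follows. The weights and the formula are assumptions made for illustration, not the values used by Entropy: `action_cost` and `plan_cost` are hypothetical names, each action is charged proportionally to the memory of its VM, a remote resume is charged twice as much as a local one, and each action also pays for the steps that precede it.

```python
# Illustrative weights per action kind (assumption, not measured values).
WEIGHTS = {
    "run": 0,
    "stop": 0,
    "suspend": 1,
    "resume": 1,
    "migrate": 1,
    "remote_resume": 2,
}


def action_cost(kind, memory_mb):
    """Cost of a single action, proportional to the VM memory."""
    return WEIGHTS[kind] * memory_mb


def plan_cost(plan):
    """Cost of a plan given as a list of steps of (kind, memory_mb) actions.

    Each action pays its own cost plus the cost of the steps before it,
    a step costing as much as its most expensive action.
    """
    total = 0
    elapsed = 0  # accumulated cost of the previous steps
    for step in plan:
        costs = [action_cost(kind, memory) for kind, memory in step]
        total += sum(elapsed + cost for cost in costs)
        elapsed += max(costs, default=0)
    return total


sequential = [[("suspend", 1024)], [("migrate", 512)]]
parallel = [[("suspend", 1024), ("migrate", 512)]]
```

Under these assumptions `plan_cost(sequential)` is 2560 and `plan_cost(parallel)` is 1536: delaying an action makes the plan more expensive, large VMs cost more to move than small ones, and a remote resume is penalized compared to a local one.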
Proof of concept: A sample scheduler
Principle
- a FIFO queue
- VMs are assigned to nodes using a First Fit Decreasing heuristic
- priority between jobs to prevent starvation
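The First Fit Decreasing placement named above can be illustrated with a short Python sketch. It is a simplification under assumptions: `first_fit_decreasing` is a hypothetical name, only memory is taken into account although nodes also provide CPU, and the sample scheduler's queue and priority handling are left out.

```python
def first_fit_decreasing(vm_mem, node_free):
    """Assign each VM to the first node with enough free memory.

    VMs are considered from the largest to the smallest.
    Returns a mapping VM name -> node name, or None if a VM does not fit.
    """
    free = dict(node_free)
    placement = {}
    for vm in sorted(vm_mem, key=lambda name: vm_mem[name], reverse=True):
        for node in free:  # nodes are tried in their declaration order
            if free[node] >= vm_mem[vm]:
                placement[vm] = node
                free[node] -= vm_mem[vm]
                break
        else:
            return None  # no node can host this VM
    return placement


placement = first_fit_decreasing(
    vm_mem={"VM1": 512, "VM2": 1024, "VM3": 256},
    node_free={"N1": 1024, "N2": 1024},
)
```

With these values VM2 fills N1, then VM1 and VM3 both go to N2. A scheduler built on this could accept a vjob only when the function returns a placement for all of its VMs.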
Benefits of using the CW context switch
- dynamic allocation of resources
- preemption
- migration of VMs
Proof of concept: Experiment on a cluster
Environment
Hardware
- 11 working nodes
- 3 storage nodes share VM images
- 1 service node is running Entropy
Protocol
- a queue of 8 vjobs (NASGrid benchmarks)
- each vjob uses 9 VMs
- comparison with regard to FCFS
  - resource usage
  - completion time
Experiment on a cluster
Benefits
- improve resource usage
- suspend/resume transparent for the developer
- reduce the completion time
Figure: Resource usage
Cumulated execution time
- FCFS: 250 minutes
- Entropy: 150 minutes
Conclusion
RMSs start to manage VMs instead of processes
- VMMs provide mechanisms to implement dynamic schedulers
- manipulating VMs is tedious and may not be cost-effective
- various scheduling policies but common concepts to perform the context switch
A generic cluster-wide context switch
- make the implementation of dynamic schedulers easier
- the context switch is outside the scheduling algorithm
- an implementation in Entropy with a sample algorithm
http://entropy.gforge.inria.fr
version 1.2 (LGPL)
I’m looking for a postdoc position
- fond of: virtualization, distributed systems, autonomic computing, ...
- dislike: tomatoes