Project Report
Cloud Computing Report
December 22, 2010
Marcus Ljungblad
Navaneeth Rameshan
Wasif Malik
This report is a part of the cloud computing project.
Contents
1 Introduction
2 Proposed Method
2.1 Attempted approaches
2.1.1 Web Mode
2.1.2 Single Task Mode
2.2 Proposed method
2.2.1 Web Mode
2.2.2 Single Task Mode
3 Implementation
4 Results
4.1 Single-Task mode
4.1.1 Simulation summary
4.2 Web-Task Mode
4.2.1 Simulation Summary
4.3 Round-Robin
4.3.1 Simulation Summary
5 Conclusion
5.1 Scope for Improvement
5.1.1 Shifting jobs between workers
5.1.2 Load parameters
5.1.3 Timeline
5.2 Conclusion
List of Figures
2.1 High level flow of Scheduling algorithm
3.1 UML diagram
4.1 Response and Queued jobs over time in single-task mode
4.2 Number of active, idle, and computing workers over time for single-task mode
4.3 Response time and queued jobs over time in web-task mode
4.4 Active, idle and computing workers over time in web-task mode
4.5 Response time and queued jobs using round-robin scheduling
Chapter 1
Introduction
Distributing jobs of unknown size efficiently across a large number of machines is one of the greatest challenges in cloud computing today. The goals range from minimizing cost to minimizing the time to complete a set of jobs; two often contradictory requirements. While the fastest way to complete all jobs may be to schedule one job per machine, it is far from the most cost-efficient. In effect, one must always make a trade-off between the two.
In this report we present three algorithms: one focused on minimizing the response time of a job, one on minimizing cost, and one reference algorithm implemented using round-robin. To test the algorithms, a cloud simulator was implemented in C++. Results from simulations with varying inputs are compared against the round-robin algorithm. Finally, a set of improvements to the evaluated algorithms is proposed.
Chapter 2
Proposed Method
2.1 Attempted approaches
The following subsections describe the initial attempted approaches for scheduling.
2.1.1 Web Mode
For the web mode, we intended to distribute jobs efficiently across worker nodes, minimizing swapping costs and fitting jobs into memory in the best possible way so as to ensure good load distribution. However, since the scheduler has no information about a job's memory requirements, the only way to distribute jobs efficiently is to schedule them round-robin or randomly at first and obtain memory information from the workers. The cost of transferring a job from one worker node to another is then calculated to see whether a transfer is worthwhile for load distribution. It may not be worthwhile if the job has already executed most of its instructions. Although the worker nodes do not report the number of instructions completed, the worst-case time taken by completed jobs is used to estimate the time remaining. We discarded this method because the cost estimate deviated significantly from the actual cost. The time a job takes to complete depends largely on the number of jobs already present on the worker node, the swapping costs, and the number of instructions in the job; we believe these factors made the completion time difficult to predict.
2.1.2 Single Task Mode
For the single task mode, we intended to submit jobs in a round-robin manner and to compute the job away time (the time elapsed since a job was submitted) for all jobs that have been submitted. The job away time is computed every scheduler cycle once at least one job from a task has completed. The average completion time of the finished jobs is used to estimate the completion time of the jobs still running. If the job away time is less than the average completion time for the jobs in the task, the estimated completion time is the average itself; if the job away time is greater, the average completion time is recomputed as a weighted average. However, here again the estimated time to complete deviated significantly from the actual values, and as a result either more workers were started than necessary or, in some cases, fewer.
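The weighted-average recomputation described above can be sketched as follows. This is our reading of the discarded approach; the function name and the weight of 0.5 are illustrative assumptions, not values from the implementation:

```cpp
// Sketch of the discarded estimate: once a job's away time exceeds the
// current average completion time, fold the away time into the estimate
// as a weighted average. The weight of 0.5 is an assumed, illustrative value.
double updateEstimate(double avgCompletion, double jobAwayTime, double weight = 0.5) {
    if (jobAwayTime <= avgCompletion)
        return avgCompletion;            // the average alone remains the estimate
    return weight * avgCompletion + (1.0 - weight) * jobAwayTime;
}
```

As the text notes, estimates of this form drifted from the actual values because completion time also depends on swapping and worker load.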
2.2 Proposed method
Figure 2.1 shows the high level flow of the scheduling algorithm.
Figure 2.1: High level flow of Scheduling algorithm
2.2.1 Web Mode
In the web mode, the goal of the scheduling algorithm is to minimize the average response time. Ideally, a practical solution would distribute load evenly and start enough worker nodes to keep response time low. Jobs are submitted in a round-robin manner until at least one job completes. This is a modified version of the normal round-robin algorithm: it sends only one job per active worker in each scheduling cycle. As a result the scheduler holds back jobs, which increases the chances of jobs finishing quickly due to reduced swapping time. As jobs complete, we keep track of the worst-case completion time. Based on the current time, we also keep track of the time remaining until the next charging tick. As soon as at least one job has completed, future jobs are scheduled to worker nodes based on their current load. In the implementation, load is the number of jobs a worker node is currently working on: the lower the load, the more jobs can be sent to the worker, and vice versa. For each worker node, the worst-case execution times of the jobs to be sent are estimated from the worst-case completion time seen so far. If the worst-case execution time of the jobs at hand exceeds the time until the next charging tick, then only as many jobs are sent as are estimated to complete within the charging tick. These jobs are chosen randomly from the queue for a good distribution. Jobs that are not sent are considered spilled, and the spilled jobs for each worker are accumulated in the same cycle. Depending on the estimate of how long the spilled jobs will take, new worker nodes are started. The scheduler cannot send the spilled jobs to the newly started workers immediately, because the workers take some time to boot and cannot accept jobs during that time. The spilled jobs are therefore saved in a hash map, and the scheduler tries to send them to the designated workers at the start of each scheduling cycle. As soon as the new worker node(s) boot up, the jobs are submitted to them.
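The charging-tick check described above can be sketched as below. This is a simplified model that pessimistically assumes each job takes the worst-case time seen so far; the function names are ours, not taken from the implementation:

```cpp
#include <algorithm>

// How many of the jobs at hand can be sent to one worker such that all of
// them are estimated (at worst-case time per job) to complete before the
// next charging tick. The remainder are "spilled".
long jobsToSend(long jobsAtHand, double worstCaseJobTime, double timeToNextTick) {
    if (worstCaseJobTime <= 0.0)
        return jobsAtHand;               // no completed job yet: no estimate to apply
    long fit = static_cast<long>(timeToNextTick / worstCaseJobTime);
    return std::min(jobsAtHand, std::max(fit, 0L));
}

long spilledJobs(long jobsAtHand, double worstCaseJobTime, double timeToNextTick) {
    return jobsAtHand - jobsToSend(jobsAtHand, worstCaseJobTime, timeToNextTick);
}
```

Spilled counts accumulated over a cycle then drive the decision to start new workers.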
2.2.2 Single Task Mode
In single task mode, a scheduling algorithm similar to the one above is used, with one major difference: the decision to start new nodes depends on the allowed percentage of waste. The more money the user is willing to waste, the more nodes the scheduler will start. This results in a very quick response time, but at a significantly higher cost. Conversely, the lower the allowed waste, the stricter the scheduler is about starting new nodes: the cost is lower, but the average response time increases. To estimate the time it would take to complete all jobs at hand, the scheduler uses the same approach as the web task scheduler, i.e., it calculates the completion time by multiplying the number of jobs in the queue by the worst-case time to complete one job. Theoretically, it would have been better to consider the average completion time for each task before deciding whether new nodes should be started, but due to time constraints and complexity this approach was not implemented.
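One possible reading of this decision rule is sketched below. The report only specifies that the completion-time estimate is the queue length times the worst-case job time; the charge period and the exact waste formula are our assumptions:

```cpp
#include <algorithm>

// Pessimistic completion-time estimate: every queued job is assumed to
// take as long as the slowest job seen so far.
double estimateCompletionTime(long queuedJobs, double worstJobTime) {
    return static_cast<double>(queuedJobs) * worstJobTime;
}

// Start an extra worker only if the idle (wasted) fraction of its first
// charge period would stay within the user's allowed-waste budget.
bool shouldStartWorker(double estCompletion, double chargePeriod, double allowedWastePct) {
    double usedFraction = std::min(estCompletion, chargePeriod) / chargePeriod;
    return (1.0 - usedFraction) * 100.0 <= allowedWastePct;
}
```

Under this sketch, a 30% allowed waste admits a worker that would be busy for at least 70% of its charge period, while a stricter budget rejects it.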
Chapter 3
Implementation
The implementation of all modules was done in C++. The UML diagram in figure 3.1
shows the relationship between modules and their key attributes.
Worker: workerState (enum); execute(), startWorker(), stopWorker(), submitJobs(), getState(), getAvailableMemory(), isAcceptingJobs(), getTotalMemory(), getCostPerHour(), getInstructionsPerTime(), getTotalExecutionTime(), getTotalCPUTime(), getAverageResponseTime(), getTotalCost(), getQueuedJobs(), getJobsCompleted()
Scheduler: queuedJobs, runningJobs, completedJobs (list<Job>); workers (list<Worker>); workerStats (list<WorkerStats>); runScheduler(), submitJobs(list<Job>), notifyJobCompletion(), getSlowestJobTime(), fetchJobsFromQueueRandomly(), startWorkerNode(), runRoundRobinScheduler(), runWebScheduler(), runSingleTaskScheduler()
Job: jobid, taskid, num_instructions, mem_size (long); getJobID(), getTaskID()
TaskGen: tasks (list<Task>), jobs (list<Job>), scheduler (Scheduler); sendTask(), createTask()
Simulator: currentTime (long)
Task: taskid (long), jobrate, num_of_jobs, jobs
Relationships: the Simulator has one Scheduler and 0..n Workers; the TaskGen submits jobs/tasks to the Scheduler; each Task has 1..n Jobs.
Figure 3.1: UML diagram
The clock functionality was implemented in the Simulator class as a while loop that calls the task generator, scheduler, and worker objects in every iteration. One iteration is considered to be one millisecond by each module. However, since the scheduler only works after a configurable interval, it ignores intermediate iterations and only does work once the specified scheduling interval has elapsed. Jobs and tasks are fed to the scheduler by the task generator at a rate specified in the input file (input.conf). The number of workers to start automatically at the beginning of the simulation can be configured in workers.conf, but they still take time to start up once the simulation begins; until then, jobs are queued at the scheduler's end. The initial worker objects are created by the simulator and passed to the scheduler at the start of the simulation. A limitation of this implementation is that the scheduler has to wait for workers to start, even though an initial number of started workers is specified.
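The tick loop can be sketched as below, with only the scheduler-interval logic shown and a stub scheduler that counts its own invocations (all names here are ours, not from the codebase):

```cpp
// Minimal sketch of the simulator clock: one loop iteration is one
// millisecond, and the scheduler only acts every intervalMs ticks.
struct CountingScheduler {
    long runs = 0;
    void runScheduler() { ++runs; }   // the real scheduler would dispatch jobs here
};

long simulate(long lengthMs, long intervalMs) {
    CountingScheduler scheduler;
    for (long tick = 0; tick < lengthMs; ++tick) {
        // The task generator and each worker would also be stepped once per tick.
        if (tick % intervalMs == 0)
            scheduler.runScheduler();
    }
    return scheduler.runs;
}
```

With the configured 0.1 s scheduling interval, the scheduler acts once every 100 iterations.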
The worker node is implemented as a state machine with the following states: Initialising, Idle, Computing, Swapping, and Offline. It can accept jobs in all states except Initialising and Offline. It maintains two queues: jobs in memory and jobs on the hard drive. When swapping occurs at a node, it is conducted in a round-robin fashion. A job is started only if it fits in memory; if not, an existing job is swapped out (i.e., moved to the hard-drive queue) and the next job is retried. Moreover, the worker exposes a public API for statistical use by the scheduler.
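The admission-and-swap behaviour can be sketched as follows. `WorkerSketch` and its members are illustrative names of ours; the real worker additionally models swap costs and the full state machine:

```cpp
#include <deque>

enum class WorkerState { Initialising, Idle, Computing, Swapping, Offline };

// Jobs are accepted in every state except Initialising and Offline.
bool acceptsJobs(WorkerState s) {
    return s != WorkerState::Initialising && s != WorkerState::Offline;
}

struct Job { long id; long memSize; };

// Sketch of the two-queue memory model: a job is admitted to memory if it
// fits; otherwise resident jobs are swapped out round-robin until it does.
struct WorkerSketch {
    long totalMemory;
    long usedMemory = 0;
    std::deque<Job> inMemory;   // jobs currently resident in memory
    std::deque<Job> onDisk;     // jobs swapped out to the hard drive

    explicit WorkerSketch(long mem) : totalMemory(mem) {}

    void admit(Job j) {
        if (j.memSize > totalMemory) {       // can never fit: straight to disk
            onDisk.push_back(j);
            return;
        }
        while (usedMemory + j.memSize > totalMemory) {
            Job victim = inMemory.front();   // round-robin victim
            inMemory.pop_front();
            usedMemory -= victim.memSize;
            onDisk.push_back(victim);        // swapped out
        }
        inMemory.push_back(j);
        usedMemory += j.memSize;
    }
};
```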
Chapter 4
Results
In this chapter, three test runs are presented and evaluated. The following configuration was used for all test runs.
Table 4.1: Simulator Configuration
Scheduling interval: 0.1 s
Worker node speed: 300 instr/s
Worker node memory: 8 GB
Worker swapping cost: 5 instr/GB
Worker quantum: 0.1 s
Worker node start-up time: 120 s
Worker node notification time: 2 instr
Worker node cost: 1 Euro/hour
Allowed waste: 30%
Workers started: 2
4.1 Single-Task mode
Table 4.2 shows the input configuration for the jobs.
Figure 4.1 shows that no jobs are scheduled in the first 120 seconds, as the workers are being started during this time. Jobs are then scheduled in a round-robin fashion until a sufficiently accurate estimate of when the jobs will complete is attained. At this point one more worker is started and the spilled jobs are sent to the designated worker. However, as we will see in figure 4.2, this new worker is not well used
Table 4.2: Input Configuration
S 1 100 1 0 500 1000 1024 2048
S 2 400 10 0 5000 7500 1024 2048
S 3 100 10 0 1000 2000 1024 2048
S 4 200 1 0 7500 10000 1024 2048
Figure 4.1: Response and Queued jobs over time in single-task mode.
and only a small number of jobs are sent to it. Response time starts to increase around 6000 s, since only larger jobs remain and more swapping is carried out between them. Even though the average response time increases, no more worker nodes are started: the scheduler has already submitted all the jobs in its queue, so no further estimation is made. Starting more workers at this point would only increase the cost, and the unused time, compared to letting the jobs run on the existing machines.
Based on the first jobs completed, the scheduler estimates that one more worker is required to optimize cost. As seen in figure 4.2, however, this worker is not used very heavily while the other two are, because cost takes priority. Note that the machine is switched off before the next charging instant.
Figure 4.2: Number of active, idle, and computing workers over time for single-taskmode.
Table 4.3: Simulation summary
Number of jobs: 800
Total: 25138 s
Active: 21723 s
Unused: 3415 s
Waste: 15.7207%
Cost: 6 Euro
Job avg response time: 7430.67 s
Standard deviation: 3299.69 s
4.1.1 Simulation summary
Cost is calculated incorrectly when a node has been turned off, due to a bug in the implementation; it should be 7 Euro, since we still have to pay for the hour the third worker was online. Moreover, the average response time is considerable because the long-running jobs are swapped often.
Re-evaluating the need for more workers when response time increases, and restarting a portion of the jobs on a new worker, would be one way of minimizing the response time in this case.
4.2 Web-Task Mode
Table 4.4: Input Configuration
W 1 100 1 0 5 100 1024 2048
W 2 400 10 0 50 1000 1024 2048
W 3 100 10 0 500 10000 1024 2048
W 4 200 1 0 1000 20000 1024 2048
Figure 4.3: Response time and queued jobs over time in web-task mode
As figure 4.3 shows, initially no jobs are scheduled; as soon as the workers are up and running, all jobs that have arrived at the scheduler so far are sent to workers. Response time starts to increase due to swapping. It levels off a little between 1000 and 2000 seconds, while the remaining early jobs are small. However, there are 200 jobs with potentially long execution times (up to 20000 instructions), which causes the response time to increase. Since no new workers are started later in the execution, and jobs are not moved between workers, the response time continues to increase.
Even though response time increases towards the end of the simulation, and jobs are still being completed, worker usage is maximised for most of the execution. Note that the simulation runs for almost two hours but finishes before the next charging time unit.
Figure 4.4: Active, idle and computing workers over time in web-task mode.
4.2.1 Simulation Summary
Table 4.5: Summary of simulation
Number of jobs: 800
Total: 16138 s
Active: 13977 s
Unused: 2161 s
Waste: 15.4611%
Cost: 6 Euro
Job avg response time: 1442.15 s
Standard deviation: 1535.77 s
The standard deviation is relatively high because some jobs finish very quickly while others take a long time to complete due to increased swapping.
We should note here that minimizing swapping is one factor that should be taken into account when designing the scheduling algorithm. It is also the first jobs to complete that determine how many additional workers to start, causing the response time towards the end to increase significantly as the available workers become heavily loaded and the estimate of the remaining work starts to drift.
4.3 Round-Robin
As a comparison to the examples provided above, this example uses a simple round-robin
algorithm. It starts with two workers and does not start any additional workers. This
example was made using the same configuration as for Web-Task mode.
Figure 4.5: Response time and queued jobs using round-robin scheduling
4.3.1 Simulation Summary
Table 4.6: Simulation summary
Number of jobs: 800
Total: 23755 s
Active: 13882 s
Unused: 9873 s
Waste: 71.1209%
Cost: 8 Euro
Job avg response time: 2871.86 s
Standard deviation: 3753.55 s
As seen in graph 4.5, the response time is almost doubled, and the time to complete all jobs is also doubled. Comparing the summaries, we see that unused time is much higher with round-robin, primarily because this algorithm neither takes the allowed waste into account when distributing jobs nor starts new worker nodes.
Chapter 5
Conclusion
5.1 Scope for Improvement
5.1.1 Shifting jobs between workers
As the results show, a drawback of the proposed algorithms is that the number of workers to start is only estimated while jobs are in the scheduler queue. If, during the simulation, the response time starts to increase dramatically due to large jobs that were not part of the scheduler's initial estimate, more nodes could be started and jobs shifted to the new nodes. For example, the scheduler could be improved to take increasing response time into account. Since jobs have already been passed on to workers by the scheduler, such a change would require jobs to be cancelled at one worker and restarted at another, something that was not fully implemented in this simulator.
The increase in response time depends largely on the number of instructions needed to complete a job, and to a lesser extent on its memory footprint. Although the latter also affects response time, a worker that has been assigned too many jobs will swap even if its jobs have few instructions. A worker node with many assigned jobs conducts swapping between all its jobs in a round-robin fashion. While this avoids starvation (i.e., small jobs never being computed), it significantly increases the average response time of a small job when it competes for computing cycles with many other jobs.
5.1.2 Load parameters
Another possible improvement concerns the variables used to estimate load. In the proposed method, only the number of jobs per worker is used as a measure of load. However, if, for example, the remaining time and the memory consumption of a job (which is only known once the job has started) were added to the load estimate, a better distribution of jobs could be attained. The scheduler would then try to aggregate smaller jobs onto the same worker and place fewer large jobs per worker. This way, swapping between jobs would be minimized and the response time of small jobs would not suffer as much.
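As a sketch, such a composite load estimate might look like the following. The weights are purely illustrative assumptions on our part, not values derived from the report:

```cpp
#include <vector>
#include <cstddef>

// Hypothetical composite load score combining job count, estimated time
// remaining, and memory pressure, with assumed illustrative weights.
struct WorkerLoad {
    int jobs;                 // jobs currently assigned to the worker
    double remainingTime;     // estimated seconds of work left
    double memUsedFraction;   // 0.0 .. 1.0
};

double loadScore(const WorkerLoad& w) {
    return 1.0 * w.jobs + 0.01 * w.remainingTime + 2.0 * w.memUsedFraction;
}

// Pick the worker with the lowest composite load to receive the next job.
std::size_t leastLoaded(const std::vector<WorkerLoad>& workers) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < workers.size(); ++i)
        if (loadScore(workers[i]) < loadScore(workers[best]))
            best = i;
    return best;
}
```

In practice the weights would have to be tuned against simulation results, since each term is measured in different units.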
5.1.3 Timeline
It would be interesting to evaluate the scheduler if jobs were not only sent at the beginning of the simulation. By, for example, adding a delay between tasks in the input configuration file, jobs arriving at different times would have to be accounted for. While the rate at which jobs are sent from the Task Generator to the Scheduler provides some of this behaviour, it primarily affects the beginning of the simulation. In a real-world scenario jobs must be able to arrive at the scheduler at any time, something a delay parameter between tasks could provide.
5.2 Conclusion
In this report, two cloud scheduling algorithms have been evaluated and compared to a round-robin algorithm; the results show an improvement over round-robin scheduling. A simulator with a number of constraints was implemented to test these algorithms. Beyond the proposed method, two other approaches were considered but discarded due to difficulties in estimating cost and time to complete.