A scheduling component for e-Science Central Anirudh Agarwal Jacek Cała


Introduction

• e-Science Central
  – Cloud-based workflow management system for data analytics.
  – Workflows are composed of blocks, which can be written in Java, R, Octave, JavaScript, Gnuplot and, more recently, bash.
  – Portable system: workflows can run on a laptop, a cluster, or private or public clouds.

• EUBrazil Cloud Connect
  – Aims to create an intercontinental, federated infrastructure for scientific use.
  – A joint effort between Brazil and several EU countries.
  – Three user applications demonstrate the potential of the EUBCC infrastructure:
    • Leishmania Virtual Laboratory,
    • Heart Simulation,
    • Biodiversity and climate change.


[Figure: EUBrazil Cloud Connect architecture, with layers for Users, Programming Frameworks & Services, Execution & Provisioning Services, Infrastructure Providers and Data Providers, plus AAI. Labelled components: mc2, COMPSs, e-SC API, PMES, CSGRID, e-SC, PDAS, IM, VMRC, fogbow, Opportunistic Cloud, Private Cloud, HPC. Labelled interfaces and standards: OCCI, CDMI, OVF, BES, LSF, OGE, x509, oAuth2, VOMS.]


e-Science Central workflow execution model

• Workflows are constructed from a number of interacting blocks.
• Each workflow invocation is deployed onto one engine as a single job.
• Each engine can process one or more workflows at a time.
• Workflows can be composite: they can submit sub-workflow invocations, allowing for parallelism.
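The execution model above can be sketched as a toy simulation, assuming engines take turns pulling whole invocations from a single shared queue (in e-SC this is a JMS queue whose messages the engines compete for). `Invocation` and `run_pool` are hypothetical names, and the sketch ignores timing and the fact that an engine can run several workflows at once.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Invocation:
    """A workflow invocation; a composite workflow lists its sub-workflow invocations."""
    name: str
    children: list = field(default_factory=list)


def run_pool(engines, submitted):
    """Simulate a single shared pool: each invocation runs whole on one engine.

    Engines take turns pulling the next invocation from one shared FIFO queue,
    a deterministic stand-in for engines competing for messages. A composite
    invocation puts its sub-workflow invocations back on the same queue, so
    they can be picked up by other engines in parallel.
    Returns (engine, invocation name) pairs in dispatch order.
    """
    queue = deque(submitted)
    trace = []
    turn = 0
    while queue:
        engine = engines[turn % len(engines)]
        invocation = queue.popleft()
        trace.append((engine, invocation.name))
        queue.extend(invocation.children)  # sub-workflows re-enter the shared pool
        turn += 1
    return trace


parent = Invocation("parent", [Invocation("sub-1"), Invocation("sub-2")])
print(run_pool(["engine-A", "engine-B"], [parent]))
# [('engine-A', 'parent'), ('engine-B', 'sub-1'), ('engine-A', 'sub-2')]
```

The single queue is what makes management simple: adding an engine only means adding another consumer.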


Advantages of the current model

• Simple management:
  – single pool of engines,
  – the pool can grow and shrink as needed,
  – engines can have different speeds.

• Good scalability:
  – very little overhead.

[Figure: relative speed-up vs. number of nodes (both axes 0 to 250), ideal vs. actual. Annotated values: 20.0; 99.0% (49); 95.5% (95); 92.5% (139); 181.2.]
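The plot's annotations appear to pair a parallel efficiency with a speed-up, e.g. 99.0% (49). Assuming the usual definitions, relative speed-up is S(N) = T(1)/T(N) and efficiency is E(N) = S(N)/N. The helper below makes that arithmetic explicit; the timings in the example are hypothetical, not the measurements behind the plot.

```python
def speedup(t_one_node: float, t_n_nodes: float) -> float:
    """Relative speed-up S(N) = T(1) / T(N)."""
    return t_one_node / t_n_nodes


def efficiency(s: float, nodes: int) -> float:
    """Parallel efficiency E(N) = S(N) / N; 1.0 is ideal linear scaling."""
    return s / nodes


# Hypothetical run: 1000 s on one node, 10.5 s on 100 nodes.
s = speedup(1000.0, 10.5)
print(f"speed-up {s:.1f}, efficiency {efficiency(s, 100):.1%}")
# speed-up 95.2, efficiency 95.2%
```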


Limitations of the current model

• Too simple for more sophisticated needs:
  – heterogeneous workflows/blocks,
  – heterogeneous hardware infrastructure.

• No control over the invocation dispatch policy:
  – no priorities, e.g. admin == user,
  – no fairness: a single user can block the system by submitting 1000s of invocations,
  – invocation messages may be consumed in an unfavourable manner.

• Once invocation messages are moved to the JMS queue, they cannot be re-allocated.
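One way to address the missing priorities and fairness is to keep invocations in the scheduler and dispatch them by priority, serving users round-robin within each priority level, so a user who submits thousands of invocations cannot starve everyone else. This is a minimal sketch, assuming each invocation carries a user name and an integer priority (hypothetical fields); `FairPriorityQueue` is not an e-SC class. Because invocations wait here rather than in a JMS queue, they remain free to be re-allocated until dispatched.

```python
from collections import OrderedDict, deque


class FairPriorityQueue:
    """Dispatch by priority; within a priority level, serve users round-robin."""

    def __init__(self):
        # priority -> OrderedDict mapping user -> deque of pending invocations
        self._levels = {}

    def submit(self, user, invocation, priority=0):
        users = self._levels.setdefault(priority, OrderedDict())
        users.setdefault(user, deque()).append(invocation)

    def pop(self):
        """Return the next (user, invocation) to dispatch, or None if empty."""
        if not self._levels:
            return None
        top = max(self._levels)                    # highest priority level first
        users = self._levels[top]
        user, pending = next(iter(users.items()))  # user at the head of the rotation
        invocation = pending.popleft()
        users.move_to_end(user)                    # this user goes to the back
        if not pending:
            del users[user]
        if not users:
            del self._levels[top]
        return user, invocation


q = FairPriorityQueue()
for i in range(3):
    q.submit("alice", f"a{i}")          # alice floods the queue
q.submit("bob", "b0")
q.submit("admin", "urgent", priority=10)
print([q.pop() for _ in range(5)])
# [('admin', 'urgent'), ('alice', 'a0'), ('bob', 'b0'), ('alice', 'a1'), ('alice', 'a2')]
```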


Selected scheduler requirements

• To run workflows based on their hard and soft requirements and on static and dynamic infrastructure capabilities:
  – support for heterogeneous workflows and resources (implied by federated resources),
  – data-aware scheduling,
  – user-defined scheduling policies.

• To allow the system to adapt its size dynamically (cloud bursting, opportunistic resources).

• To allow users to specify workflow priorities.

• To improve the use of available resources:
  – offer users/administrators some optimisation strategies.
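Hard and soft requirements can be read as a filter followed by a ranking: hard requirements rule out engines that cannot run a workflow at all, and soft requirements order the rest. A minimal sketch, assuming both requirements and engine capabilities are flat key/value dictionaries; the keys (`os`, `octave`, `site`) and the one-point-per-match scoring are hypothetical. Dynamic capabilities, such as current load, would have to be refreshed into the same dictionaries from a monitoring source.

```python
from typing import Optional


def choose_engine(engines: dict, hard: dict, soft: dict) -> Optional[str]:
    """Filter engines on hard requirements, then rank them by soft requirements.

    engines maps an engine name to its capabilities, e.g. {"os": "linux"}.
    Returns None when no engine satisfies every hard requirement.
    """
    eligible = {
        name: caps
        for name, caps in engines.items()
        if all(caps.get(key) == value for key, value in hard.items())
    }
    if not eligible:
        return None

    def soft_score(name: str) -> int:
        # One point per satisfied soft requirement.
        return sum(1 for key, value in soft.items() if eligible[name].get(key) == value)

    # max() keeps the first of equally scored engines, so ties go to the
    # alphabetically first name.
    return max(sorted(eligible), key=soft_score)


engines = {
    "eu-1": {"os": "linux", "octave": True, "site": "eu"},
    "br-1": {"os": "linux", "octave": True, "site": "br"},
    "br-2": {"os": "windows", "octave": False, "site": "br"},
}
print(choose_engine(engines, hard={"octave": True}, soft={"site": "br"}))  # br-1
```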


Proposed solution

• Add a scheduling component (as a pluggable module) between the e-SC server and the engines.

• Make use of the Performance Monitor, which gathers information about the system.

• Have a one-to-one JMS queue for each engine (or engine pool?).

• Based on a scheduling policy, choose the best engine to send the workflow to.

• Make sure pending workflows can be rescheduled when all the execution threads are busy.
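The proposed flow can be sketched as below, assuming one queue per engine, a fixed number of execution threads per engine, and a policy that is any function choosing among engines with a free thread. Plain Python deques stand in for the JMS queues (engines would consume from them; here they only record dispatches), and all class and method names are hypothetical. Invocations that cannot be placed stay in the scheduler's pending list, so they can still be rescheduled, possibly to a different engine, once a thread frees up.

```python
from collections import deque


class Scheduler:
    """One queue per engine, a pluggable policy, and a pending list for overflow."""

    def __init__(self, policy):
        self.policy = policy      # callable: list of engines with a free thread -> engine
        self.queues = {}          # engine -> deque standing in for its JMS queue
        self.free_threads = {}    # engine -> number of idle execution threads
        self.pending = deque()    # invocations kept in the scheduler, still re-allocatable

    def attach_engine(self, engine, threads=1):
        """Attach a new engine (and its queue) at run time."""
        self.queues[engine] = deque()
        self.free_threads[engine] = threads
        self._dispatch()

    def submit(self, invocation):
        self.pending.append(invocation)
        self._dispatch()

    def finished(self, engine):
        """An engine reports that one of its invocations has completed."""
        self.free_threads[engine] += 1
        self._dispatch()

    def _dispatch(self):
        # Only commit an invocation to an engine queue when a thread is free;
        # otherwise it stays pending and may later go to a different engine.
        while self.pending:
            free = [e for e, n in self.free_threads.items() if n > 0]
            if not free:
                return
            engine = self.policy(free)
            self.queues[engine].append(self.pending.popleft())
            self.free_threads[engine] -= 1


s = Scheduler(policy=lambda free: sorted(free)[0])  # trivial example policy
s.attach_engine("A")
s.submit("wf-1")          # goes to A
s.submit("wf-2")          # A's only thread is busy: wf-2 stays pending
s.attach_engine("B")      # wf-2 is rescheduled onto the new engine B
print(dict(s.queues), list(s.pending))
```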


Proposed solution (cont.)

[Figure: proposed architecture. Components: JBoss AS, the e-SC server, a JMS queue, the scheduling component, and one JMS queue per engine pool (pools A, B and C; all engines within a pool are equivalent). Invocations flow from the e-SC server through the JMS queue to the scheduling component, which dispatches them to the per-pool queues. Arrows distinguish workflow invocations started by users, dispatched by the server, and started by engines.]


Progress so far…

[Figure: current implementation. Components: JBoss AS, the e-SC server, a JMS queue, the scheduling component, per-engine-pool JMS queues, an engine pool, and the Performance Monitor, which receives performance and provenance information. Arrows distinguish workflow invocations started by users, dispatched by the server, and started by engines.]


DEMO 1


Progress so far (cont.)

• Current scheduling policy based on CPU load:
  – not effective, just a proof of concept (PoC).

• More advanced queue management:
  – able to dynamically attach a new engine to a “scheduler” queue,
  – able to grow the queue pool if needed.

• Able to store workflow invocations in the scheduler when all the engine execution threads are exhausted:
  – currently assuming there is one execution thread per engine.
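The proof-of-concept policy amounts to choosing the engine with the lowest CPU load. A minimal sketch, assuming the Performance Monitor's readings arrive as a dictionary of engine name to load in [0, 1] (a hypothetical format) and, as on the slide, one execution thread per engine, so only idle engines are candidates:

```python
def least_cpu_load(loads, busy):
    """Pick the idle engine with the lowest reported CPU load.

    loads: engine -> CPU load in [0, 1], as last reported by the monitor.
    busy:  engines whose single execution thread is occupied.
    Returns None when every engine is busy (the invocation then waits
    in the scheduler).
    """
    idle = {engine: load for engine, load in loads.items() if engine not in busy}
    if not idle:
        return None
    return min(idle, key=idle.get)


print(least_cpu_load({"A": 0.9, "B": 0.2, "C": 0.4}, busy={"B"}))  # C
```

A single load reading says nothing about what a workflow needs or how long it will run, which is consistent with the slide's verdict that this policy is not effective.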


Current problems and issues

• Simple CPU load policy.

• One queue per engine vs. one queue per engine pool.

• Impact of the delay along the path engine → PM → scheduler.

• Missing event-based communication between the engine and the server.


DEMO 2


Delay problem

[Figure: sequence diagram between the e-SC server, JMS queue, scheduler, Performance Monitor (PM) and engine. Steps: Start Workflow; Check for Jobs; Get Information from PM; Send job to correct engine; Update PM; Update Server about job status. A 5 s delay is marked on the PM information.]

Because of the 5-second delay, the scheduler gets out-of-date engine information from the PM, so the wrong engine may be selected or the wrong task may be assigned.
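The effect of the delay can be reproduced with a toy model in which the scheduler only sees Performance Monitor data refreshed every 5 seconds: two invocations dispatched within the same window both see the same engine as idle. The sketch also tries one possible mitigation, which is an assumption of this sketch and not something the slides propose: the scheduler remembers its own dispatches since the last snapshot. The function name, the first-idle-engine policy and the single-thread engines are all illustrative.

```python
PM_REFRESH_S = 5.0  # Performance Monitor refresh interval (the 5 s delay on the slide)


def dispatch(submit_times, engines, track_own_dispatches=False):
    """Assign each job to the first engine the scheduler believes is idle.

    Engines are single-threaded and jobs never finish in this toy model.
    The scheduler's view comes from a PM snapshot refreshed every
    PM_REFRESH_S seconds, so engines that became busy inside the current
    window look idle unless the scheduler also remembers its own dispatches.
    Returns the chosen engine per job (None if all appear busy).
    """
    actually_busy = set()
    snapshot, window = set(), None
    since_snapshot = set()        # scheduler's own dispatches in this window
    chosen = []
    for t in submit_times:
        if int(t // PM_REFRESH_S) != window:  # a fresh PM snapshot arrives
            window = int(t // PM_REFRESH_S)
            snapshot = set(actually_busy)
            since_snapshot = set()
        believed_busy = (snapshot | since_snapshot) if track_own_dispatches else snapshot
        candidates = [e for e in engines if e not in believed_busy]
        engine = candidates[0] if candidates else None
        if engine is not None:
            actually_busy.add(engine)
            since_snapshot.add(engine)
        chosen.append(engine)
    return chosen


print(dispatch([0.0, 1.0], ["A", "B"]))                             # ['A', 'A']
print(dispatch([0.0, 1.0], ["A", "B"], track_own_dispatches=True))  # ['A', 'B']
```

In the first call, engine A receives a second job while its only thread is already busy, which is the "wrong engine selected" case on the slide.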


Expected issues and problems

• For more sophisticated policies:
  – lack of information about the task and its inputs and outputs:
    • hardware/software requirements and capabilities,
    • the absence of task completion-time estimates rules out many scheduling policies,
    • data locality can play an important role.

• Support for cloud bursting:
  – e.g. interaction with an Infrastructure Manager.

• Support for simulation:
  – e.g. integration with WorkflowSim.
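To show why completion-time estimates and data locality matter, the sketch below uses the Minimum Completion Time (MCT) heuristic, a standard list-scheduling rule that is not part of the current component: each task goes to the engine where it is estimated to finish earliest, counting the wait for the engine, the transfer of inputs that are not already local, and the run time scaled by engine speed. Run-time estimates, data sizes and bandwidth are exactly the inputs the slide notes are missing, so all values, and the `Engine` class itself, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    name: str
    ready_at: float = 0.0   # time (s) when the engine's thread becomes free
    speed: float = 1.0      # relative speed; 2.0 runs a task twice as fast
    local_data: set = field(default_factory=set)  # datasets already near this engine


def completion_time(engine, work_s, inputs_mb, bandwidth_mb_s):
    """Estimated finish: wait for the engine + move non-local inputs + run.

    work_s is the task's run time on a speed-1.0 engine.
    """
    to_move = sum(size for name, size in inputs_mb.items() if name not in engine.local_data)
    return engine.ready_at + to_move / bandwidth_mb_s + work_s / engine.speed


def mct(engines, work_s, inputs_mb, bandwidth_mb_s):
    """Minimum Completion Time: pick the engine with the earliest estimated finish."""
    best = min(engines, key=lambda e: completion_time(e, work_s, inputs_mb, bandwidth_mb_s))
    best.ready_at = completion_time(best, work_s, inputs_mb, bandwidth_mb_s)
    return best.name


engines = [Engine("fast", speed=2.0), Engine("near-data", local_data={"genome"})]
# 100 s of work reading a 500 MB dataset over 5 MB/s: moving the data costs 100 s,
# so the slower engine that already holds it finishes first (100 s vs 150 s).
print(mct(engines, 100.0, {"genome": 500.0}, 5.0))  # near-data
# With no input data, the faster engine wins (50 s vs 200 s, since
# near-data is now busy until t = 100 s).
print(mct(engines, 100.0, {}, 5.0))                 # fast
```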


DISCUSSION