Workflows and Scheduling in Grids

Ramin Yahyapour

University Dortmund
Leader, CoreGRID Institute on Resource Management and Scheduling

CoreGRID – Summer School, Budapest, 05 September 2007

European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and Peer-to-Peer Technologies

24.07.06


CoreGRID RMS Institute Objective

Objectives:

– Development of a common and generic solution for Grid resource management/scheduling in Next Generation Grids.

– Development of new algorithms for coordinated scheduling for all resource types, including data, network etc.

– Support of Grid business models in the scheduling process

Goal: linking theoretical foundation and practical implementation on the different levels of Resource Management (Architecture, Algorithms, Implementations)



Current Institute Roadmap (Inst. RMS)

• Common Scheduling/Brokerage Architecture Model

• Algorithms for coordinated scheduling/negotiation

• Support for SLA Management and Negotiation

• Domain-specific solutions for Computational Grids

• Solutions for Evaluation, Testing, Prediction



Participants

CETIC, Belgium

IPP-BAS, Bulgaria

CNR-ISTI, Italy

CNRS, France

Delft University, Netherlands

EPFL, Switzerland

Fraunhofer Gesellschaft, Germany

Research Center Jülich, Germany

PSNC, Poland

MTA SZTAKI, Hungary

University of Münster, Germany

University of Calabria, Italy

University of Cyprus

University of Dortmund, Germany

University of Manchester, UK

EAI-FR, Switzerland

University of Westminster, UK

Technical University of Catalonia, Spain

Zuse Institute Berlin, Germany

University of Innsbruck, Austria

20 participating institutes; 89 researchers

Grid Scheduling



Key Question

“Which services/resources to use for an activity, when, where, how?”

Typically: a particular user, business application, or component application needs one or several services/resources for an activity, under given constraints:

• Trust & Security
• Timing & Economics
• Functionality & Service level
• Application-specifics & Inter-dependencies
• Scheduling and Access Policies

This question has to be answered in an automatic, efficient, and reliable way.

Part of the invisible and smart infrastructure!



Motivation

Resource Management for Future/Next Generation Grids!

But what are Future Generation Grids?

HPC Computing
– Parallel Computing
– Cluster Computing
– Desktop Computing

Enterprise Grids
– Business Services
– Application Server
– SOA/Webservices

Ambient Intelligence / Ubiquitous Computing
– PDA, Mobile Devices

It depends on who you ask!



Resource Definition

Concluding from the different interpretations of “Grid”: for broad acceptance, Grid RMS should probably cover the whole scope.

Resources:

Compute

Network

Storage

Data

Software

– components, licenses

Services

– functionality, ability

Management of some resources is less complex, while other resources require coordination and orchestration to be effective (e.g. HW and SW).



Resource Management Layer

A Grid Resource Management System consists of:

Local resource management system (Resource Layer)
– Basic resource management unit
– Provides a standard interface for using remote resources
– e.g. GRAM, etc.

Global resource management system (Collective Layer)
– Coordinates all local resource management systems within multiple or distributed Virtual Organizations (VOs)
– Provides high-level functionalities to efficiently use all of the resources
• Job Submission
• Resource Discovery and Selection
• Scheduling
• Co-allocation
• Job Monitoring, etc.
– e.g. Meta-scheduler, Resource Broker, etc.



Grid Middleware

[Figure: layered Grid middleware architecture. Layers and components shown: User/Application; Higher-Level Services; Grid RMS (Resource Broker, Grid Resource Manager); Core Grid Infrastructure Services (Information Services, Monitoring Services, Security Services); Local Resource Management (PBS, LSF, …); Resources.]



Grid Scheduling

[Figure: a Grid user submits jobs to a Grid scheduler, which sits above Machine 1, Machine 2, and Machine 3; each machine has its own local scheduler, job queue, and schedule over time.]



Select a Resource for Execution

Most systems do not provide advance information about future job execution
– user information not accurate as mentioned before
– new jobs arrive that may surpass current queue entries due to higher priority

Grid scheduler might consider current queue situation, however this does not give reliable information for future executions:
– A job may wait long in a short queue while it would have been executed earlier on another system.

Available information:
– Grid information service gives the state of the resources and possibly authorization information
– Prediction heuristics: estimate job’s wait time for a given resource, based on the current state and the job’s requirements.
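
A prediction heuristic of this kind could look roughly like the sketch below. It is an illustration only, not the method of any particular Grid scheduler: the `ResourceState` fields, the estimate (backlog divided by machine size), and all numbers are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ResourceState:
    """Snapshot as a Grid information service might report it (hypothetical fields)."""
    name: str
    total_nodes: int
    queued_node_hours: float   # estimated work waiting in the queue
    running_node_hours: float  # estimated remaining work of running jobs


def estimate_wait_hours(state: ResourceState, job_nodes: int) -> float:
    """Crude heuristic: backlog divided by machine size; infinite if the job cannot fit."""
    if job_nodes > state.total_nodes:
        return float("inf")
    backlog = state.queued_node_hours + state.running_node_hours
    return backlog / state.total_nodes


def select_resource(states, job_nodes):
    """Pick the resource with the smallest estimated wait."""
    return min(states, key=lambda s: estimate_wait_hours(s, job_nodes))


states = [
    ResourceState("machine1", total_nodes=64, queued_node_hours=640.0, running_node_hours=128.0),
    ResourceState("machine2", total_nodes=16, queued_node_hours=48.0, running_node_hours=16.0),
    ResourceState("machine3", total_nodes=128, queued_node_hours=256.0, running_node_hours=128.0),
]
best = select_resource(states, job_nodes=32)
print(best.name)
```

In this made-up data the machine with the smallest backlog (machine2) is not selected, because the job does not fit on it; queue length alone is a poor indicator.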



Co-allocation

It is often requested that several resources are used for a single job.
– that is, a scheduler has to assure that all resources are available when needed.
• in parallel (e.g. visualization and processing)
• with time dependencies (e.g. a workflow)

The task is especially difficult if the resources belong to different administrative domains.
– The actual allocation time must be known for co-allocation
– or the different local resource management systems must synchronize with each other (wait for availability of all resources)
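
If the free time windows of each resource were known, finding a co-allocation slot would reduce to an interval intersection. The sketch below shows this under that assumption; the function name, the window format, and the sample data are hypothetical, and real local schedulers often do not expose such information.

```python
def earliest_common_start(free_windows, duration):
    """Return the earliest start time at which every resource is free for `duration`.

    free_windows maps a resource name to a list of (start, end) free intervals.
    Returns None if no common slot exists.
    """
    # A feasible common slot can always be shifted left until it hits the
    # start of some free window, so only window starts need to be checked.
    candidates = sorted(start for windows in free_windows.values() for start, _ in windows)
    for t in candidates:
        if all(
            any(start <= t and t + duration <= end for start, end in windows)
            for windows in free_windows.values()
        ):
            return t
    return None


free = {
    "compute_site_a": [(0, 4), (6, 12)],
    "visualization_site_b": [(2, 5), (7, 10)],
}
print(earliest_common_start(free, duration=3))
```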



Example Multi-Site Job Execution

A job uses several resources at different sites in parallel. Network communication is an issue.

[Figure: a Grid scheduler places a multi-site job across Machine 1, Machine 2, and Machine 3; each machine has its own local scheduler, job queue, and schedule over time.]



Advance Reservation

Co-allocation and other applications require a priori information about the precise resource availability

With the concept of advance reservation, the resource provider guarantees a specified resource allocation.
– includes a two- or three-phase commit for agreeing on the reservation

Implementations:
– GARA/DUROC/SNAP provide interfaces for Globus to create advance reservations
– implementations for network QoS available.
• setup of a dedicated bandwidth between endpoints
– “WS-Agreement” defines a protocol for agreement management
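
The two-phase commit idea can be sketched as follows. This is a toy model with a hypothetical `LocalScheduler` class and string time slots; it does not use or imitate the GARA, DUROC, SNAP, or WS-Agreement interfaces.

```python
class LocalScheduler:
    """Toy resource provider with a set of already reserved time slots (hypothetical)."""

    def __init__(self, name, busy_slots=()):
        self.name = name
        self.reserved = set(busy_slots)
        self.tentative = set()

    def prepare(self, slot):
        """Phase 1: hold the slot tentatively if it is still free."""
        if slot in self.reserved or slot in self.tentative:
            return False
        self.tentative.add(slot)
        return True

    def commit(self, slot):
        """Phase 2: turn the tentative hold into a guaranteed reservation."""
        self.tentative.discard(slot)
        self.reserved.add(slot)

    def abort(self, slot):
        """Release a tentative hold."""
        self.tentative.discard(slot)


def reserve_on_all(providers, slot):
    """Reserve `slot` on every provider, or on none of them."""
    prepared = []
    for provider in providers:
        if provider.prepare(slot):
            prepared.append(provider)
        else:
            # One provider refused: release all holds obtained so far.
            for p in prepared:
                p.abort(slot)
            return False
    for provider in prepared:
        provider.commit(slot)
    return True


site_a = LocalScheduler("site_a")
site_b = LocalScheduler("site_b", busy_slots={"10:00"})
print(reserve_on_all([site_a, site_b], "10:00"))  # slot already taken on site_b
print(reserve_on_all([site_a, site_b], "11:00"))
```

The all-or-nothing behaviour is what co-allocation needs: a partial reservation on only some of the sites would block resources without allowing the job to run.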



Using Service Level Agreements

The mapping of jobs to resources can be abstracted using the concept of Service Level Agreements (SLAs)

SLA: Contract negotiated between
– resource provider, e.g. local scheduler
– resource consumer, e.g. grid scheduler, application

SLAs provide a uniform approach for the client to
– specify resource and QoS requirements, while
– hiding from the client details about the resources,
– such as queue names and current workload



Execution Alternatives

Time sharing:

The local scheduler starts multiple processes per physical CPU with the goal of increasing resource utilization.

– multi-tasking

The scheduler may also suspend jobs to keep the system load under control

– preemption

Space sharing:

The job uses the requested resources exclusively; no other job is allocated to the same set of CPUs.

– The job has to be queued until sufficient resources are free.



Job Classifications

Batch jobs vs. interactive jobs

– batch jobs are queued until execution

– interactive jobs need immediate resource allocation

Parallel vs. sequential jobs

– a job requires several processing nodes in parallel

The majority of HPC installations are used to run batch jobs in space-sharing mode!

– a job is not influenced by other co-allocated jobs

– the assigned processors, node memory, caches etc. are exclusively available for a single job.

– overhead for context switches is minimized

– important aspects for parallel applications



Parallel Application Types

Rigid
– Requires a fixed number of processors

Moldable
– The number of processors can be adapted only at the start of the execution

Malleable
– Number of assigned processors can be changed during runtime (i.e., grow/shrink)

[Figure: number of processors over time for a rigid, a moldable, and a malleable job.]

D. G. Feitelson and L. Rudolph, “Toward convergence in job schedulers for parallel supercomputers,” in JSSPP’96



Preemption

A job is preempted by interrupting its current execution
– the job might be put on hold on a CPU set and later resumed; the job stays resident on those nodes (consumption of memory)
– alternatively, a checkpoint is written and the job is migrated to another resource where it is restarted later

Preemption can be useful to reallocate resources due to new job submissions (e.g. with higher priority) or if a job is running longer than expected.



Job Scheduling

A job is assigned to resources through a scheduling process
– responsible for identifying available resources
– matching job requirements to resources
– making decisions about job ordering and priorities

HPC resources are typically subject to high utilization

Therefore, resources are not immediately available and jobs are queued for future execution
– time until execution is often quite long (many production systems have an average delay until execution of >1h)
– jobs may run for a long time (several hours, days or weeks)



Typical Scheduling Objectives

Minimizing the Average Weighted Response Time (AWRT)

Maximize machine utilization/minimize idle time
– conflicting objective
– the criterion is usually static for an installation and implicitly given by the scheduling algorithm

$$\mathrm{AWRT} = \frac{\sum_{j \in \mathrm{Jobs}} w_j \,(t_j - r_j)}{\sum_{j \in \mathrm{Jobs}} w_j}$$

r_j : submission time of job j

t_j : completion time of job j

w_j : weight/priority of job j
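
As a worked example, the AWRT is the weighted sum of the response times (completion time minus submission time) divided by the sum of the weights. The sketch below computes it for three made-up jobs; the job data and the tuple layout are assumptions for illustration.

```python
def awrt(jobs):
    """Average weighted response time for (submission r, completion t, weight w) tuples."""
    total_weight = sum(w for _, _, w in jobs)
    if total_weight == 0:
        raise ValueError("at least one job must have a non-zero weight")
    return sum(w * (t - r) for r, t, w in jobs) / total_weight


# Hypothetical jobs: (submission time, completion time, weight), times in minutes.
jobs = [(0, 60, 1), (10, 40, 2), (20, 120, 1)]
print(awrt(jobs))  # (1*60 + 2*30 + 1*100) / 4 = 55.0
```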



Job Steps

A user job enters a job queue; the scheduler (its strategy) decides on the start time and resource allocation of the job.

[Figure: a Grid user submits a job description to the local job queue of an HPC machine; the scheduler places the job in the schedule over time, and job execution management starts it on the nodes, each with its own node job management.]



Example of Grid Scheduling Decision Making

[Figure: a Grid user submits a job to the Grid scheduler, which must choose among Machine 1, Machine 2, and Machine 3, each with its own local scheduler, job queue, and schedule over time; an example queue state is annotated as “15 jobs running, 20 jobs queued”.]

Where to put the Grid job?



Available Information from the Local Schedulers

Decision making is difficult for the Grid scheduler

– limited information about local schedulers is available

– available information may not be reliable

Possible information:

– queue length, running jobs

– detailed information about the queued jobs
• execution length, process requirements, …

– tentative schedule about future job executions

This information is often technically not provided by the local scheduler

In addition, this information may be subject to privacy concerns!



Grid-Level Scheduler

Discovers & selects the appropriate resource(s) for a job

If selected resources are under the control of several local schedulers, a meta-scheduling action is performed

Architecture:
– Centralized: all lower level schedulers are under the control of a single Grid scheduler
• not realistic in global Grids
– Distributed: lower level schedulers are under the control of several grid scheduler components; a local scheduler may receive jobs from several components of the grid scheduler



Towards Grid Scheduling

Grid Scheduling Methods:

– Support for individual scheduling objectives and policies

– Multi-criteria scheduling models

– Economic scheduling methods for Grids

Architectural requirements:

– Generic job description

– Negotiation interface between higher- and lower-level scheduler

– Economic management services

– Workflow management

– Integration of data and network management



Scheduling Objectives in the Grid

In contrast to local computing, there is no general scheduling objective anymore
– minimizing response time, minimizing cost
– tradeoff between quality, cost, response time etc.

Cost and different service quality come into play
– the user will introduce individual objectives
– the Grid can be seen as a market where resources are competing alternatives

Similarly, the resource provider has individual scheduling policies
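
One simple way to express an individual user objective is a weighted sum over normalized criteria. The sketch below is only one of many possible multi-criteria models; the offer format, the weights, and the providers are hypothetical, and costs and response times are assumed to be positive.

```python
def pick_offer(offers, cost_weight, time_weight):
    """Choose the offer that minimizes a weighted sum of normalized cost and response time."""
    max_cost = max(o["cost"] for o in offers)
    max_time = max(o["response_time"] for o in offers)

    def score(offer):
        return (cost_weight * offer["cost"] / max_cost
                + time_weight * offer["response_time"] / max_time)

    return min(offers, key=score)


# Hypothetical offers from three resource providers.
offers = [
    {"provider": "A", "cost": 10.0, "response_time": 8.0},
    {"provider": "B", "cost": 40.0, "response_time": 2.0},
    {"provider": "C", "cost": 20.0, "response_time": 4.0},
]
print(pick_offer(offers, cost_weight=0.9, time_weight=0.1)["provider"])  # cost-sensitive user
print(pick_offer(offers, cost_weight=0.1, time_weight=0.9)["provider"])  # time-sensitive user
```

Two users with different weights pick different providers from the same set of offers, which is why a single system-wide objective no longer suffices.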

Workflow Scheduling



Workflows

What is a workflow?

Example: A simple job chain

Task 1 → Task 2 → Task 3 → Task 4

Dependencies between tasks/job steps: control and/or data dependencies



Example of a Workflow

A simple workflow from climate research with data dependencies

[Figure: Climate Archive → Task 1 (Select Interesting Data) → Data Subset → Task 2 (Simulate) → New Results → Task 3 (Visualize).]



Communication/Data Dependencies

• Workflows can cover different communication models
– synchronous (e.g. streaming of multiple active jobs) or
– asynchronous (e.g. via files)

• Synchronous communication requires co-allocation of jobs and data streaming management

• Asynchronous communication requires file/data management in distributed Grid environments



Impact of Coordinated Scheduling (1)

Consider an application example with a simple workflow consisting of 4 consecutive tasks/steps, each running 5 minutes.

Consider also a Grid resource with a batch queuing system (e.g. Torque) that has an average queue waiting time of 60 minutes.

We apply just-in-time scheduling: each task is submitted only when its predecessor has completed.

How long will it take to execute the whole workflow?

Task 1 waits for 1h and runs for 5 minutes. Task 2 waits for Task 1 to complete and then waits in the queue itself; all other tasks analogous:
= 4*1h + 4*5min = 4h 20 min



Impact of Coordinated Scheduling (2)

How to improve?

• put several steps in the queue and keep them on hold if the preceding step is not finished (might produce idle times on resources)

• or use advance reservations (= Planning)

How long will it ideally take to execute the whole workflow?

Task 1 waits for 1h and runs for 5 minutes. Task 2 starts immediately after Task 1; all other tasks analogous:
= 1h + 4*5min = 1h 20 min
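
The two makespan calculations for this example (4 tasks, 60 minutes average queue wait, 5 minutes run time per task as used in the sums) can be reproduced with a few lines. The function names are illustrative, and the model ignores queue-time variance and data transfers.

```python
def just_in_time_makespan(num_tasks, queue_wait_min, run_min):
    """Each task is submitted only after its predecessor ends, so each one waits in the queue."""
    return num_tasks * (queue_wait_min + run_min)


def coordinated_makespan(num_tasks, queue_wait_min, run_min):
    """Ideal case: only the first task waits; later tasks start right after their predecessor."""
    return queue_wait_min + num_tasks * run_min


def fmt(minutes):
    """Format a number of minutes as hours and minutes."""
    hours, rest = divmod(minutes, 60)
    return f"{hours}h {rest:02d}min"


print(fmt(just_in_time_makespan(4, 60, 5)))   # 4h 20min
print(fmt(coordinated_makespan(4, 60, 5)))    # 1h 20min
```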



More complex workflow (1)

Concurrent activities

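[Figure: Task 1 fans out to the concurrent tasks T 2.1, T 2.2, and T 2.3, which join in Task 3; Task 3 fans out to T 4.1, T 4.2, and T 4.3, which join in Task 5.]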



More complex workflow (2)

Using loops




Example: DAGMan

Directed Acyclic Graph Manager

DAGMan allows you to specify the dependencies between your Condor-G jobs, so it can manage them automatically for you (e.g., “Don’t run job B until job A has completed successfully.”).

A DAG is defined by a .dag file, listing each of its nodes and their dependencies:

    # diamond.dag
    Job A a.sub
    Job B b.sub
    Job C c.sub
    Job D d.sub
    Parent A Child B C
    Parent B C Child D

[Figure: diamond-shaped DAG in which Job A precedes Job B and Job C, which both precede Job D.]

Source: Miron Livny
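
To make the dependency semantics concrete, the following sketch parses the diamond example and derives a valid execution order with Python's standard `graphlib` module. It is not part of DAGMan and does not show how DAGMan works internally; the `parse_dag` helper is hypothetical and only understands the `Job` and `Parent … Child …` lines shown above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

DAG_TEXT = """\
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
"""


def parse_dag(text):
    """Return {node: set of parents} from Job and Parent/Child lines."""
    parents_of = {}
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        keyword = words[0].lower()
        if keyword == "job":
            parents_of.setdefault(words[1], set())
        elif keyword == "parent":
            split = [w.lower() for w in words].index("child")
            parents, children = words[1:split], words[split + 1:]
            for child in children:
                parents_of.setdefault(child, set()).update(parents)
    return parents_of


graph = parse_dag(DAG_TEXT)
# TopologicalSorter expects a mapping from each node to its predecessors.
order = list(TopologicalSorter(graph).static_order())
print(order)  # A first, D last; B and C may appear in either order
```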



Dynamic Workflows vs Static Workflows

• Some workflows are not known in advance; their structure might only be determined at run time

= Dynamic Workflows

• Static workflows are known in advance

• This has a major impact on planning and scheduling of workflows



Promoter Identification Workflow

Source: Matt Coleman (LLNL)



Source: NIH BIRN (Jeffrey Grethe, UCSD)



Ecology: GARP Analysis Pipeline for Invasive Species Prediction

[Figure: GARP analysis pipeline. Processing steps shown: EcoGrid Query, Layer Integration, Sample Data, Data Calculation, Map Generation, Validation, Generate Metadata, Archive To EcoGrid. Data items shown: (a) species presence & absence points (native range; invasion area); (b) environmental layers (native range; invasion area); (c) integrated layers (native range; invasion area); (d) training sample and test sample; (e) GARP rule set; (f) native range prediction map and invasion area prediction map; (g) model quality parameter; (h) selected prediction maps. Data sources are registered EcoGrid databases; a user is involved in validation.]

Source: NSF SEEK (Deana Pennington et al., UNM)



http://www.gridlab.org/



Triana Prototype

GEO 600 Coalescing Binary Search



Workflow Taxonomy

[Figure: taxonomy of a Workflow System with the branches Workflow design and specification (structure, model/spec, composition), Component/Service Discovery, Scheduling and Enactment, Data Management, and Operational Attributes.]

Source: Omer Rana



Workflow Composition

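[Figure: taxonomy of workflow composition. Composition is either user-directed or automated. Labels shown include language-based approaches (markup, functional, logic, process calculi), graph-based approaches (DAG, UML, Petri net, process calculi), user-defined scripting, planner, templates, design patterns, sub-workflows, and factory.]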

Source: Omer Rana



Taxonomy of Workflow Scheduling

• Scheduling Criteria

– Single vs. multiple

• Number of workflows considered during scheduling step

– Single (optimizing a single workflow) vs.

– multiple (optimizing several or all workflows at the same time)

• Dynamicity

– Full-ahead vs.

– Just-in-time vs.

– Hybrid

Source: CoreGRID Report by U. Innsbruck, FhG FIRST Berlin



Taxonomy of Workflow Scheduling (2)

• Optimization Model

– Workflow-oriented (considering the benefit of a single workflow/user) vs.

– Grid-wide (overall optimization goal)

• Advance Reservation

– With AR (using reservations/SLAs)

– or without

Source: CoreGRID Report by U. Innsbruck, FhG FIRST Berlin



Taxonomy of Workflow Scheduling Systems

Source: Jia Yu, Rajkumar Buyya



Workflow Languages

Plenty of them, see Grid Workflow Forum:

Workflow languages (scientific and industrial)

* AGWL * BPEL4WS * BPML * DGL * DPML * GJobDL * GSFL * GFDL * GWorkflowDL * MoML * SWFL * WSCL * WSCI * WSFL * XLANG * YAWL * SCUFL/XScufl * WPDL * PIF * PSL * OWL-S * xWFL

Source: Grid Workflow Forum (www.gridworkflow.org)



Excerpt of Workflow Scheduling Systems

• DAGMan
• Pegasus
• Triana
• ICENI
• Taverna
• GridAnt
• GrADS
• GridFlow
• Unicore
• Gridbus workflow
• Askalon
• Karajan
• Kepler

Source: Grid Workflow Forum (www.gridworkflow.org)



Scheduling of a Workflow (1)

Schedules without advance reservation

- All times depend on the local queues

- The probability of an accidental schedule that reflects the logical flow of the workflow tasks is rather low

- In many cases the workflow will be broken




Schedules without advance reservation - more “intelligent”

- All times depend on the state of the local queues

- A subsequent task is submitted when the previous one terminates

- The logical flow of the workflow’s tasks is maintained

- Overall time depends on the local queues, and the probability of a longer makespan is quite high




Optimal schedules with advance reservation

- t3 = t2 and t5 = t4

- In case of data transfers of lengths td12 and td23 between tasks: t3 = t2 + td12, t5 = t4 + td23
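
The relations above can be turned into a small planning routine. The sketch assumes that t1/t2, t3/t4, and t5/t6 are the start/end times of three consecutive tasks (the slide's figure is not available, so this reading is an assumption); the function name and the sample durations are hypothetical.

```python
def plan_reservations(first_start, durations, transfer_times):
    """Return (start, end) reservations for a chain of tasks.

    Each task starts when its predecessor has ended and the data transfer
    between the two has finished; without transfers the start equals the
    predecessor's end.
    """
    if len(transfer_times) != len(durations) - 1:
        raise ValueError("need one transfer time per pair of consecutive tasks")
    plan = []
    start = first_start
    for i, duration in enumerate(durations):
        end = start + duration
        plan.append((start, end))
        if i < len(transfer_times):
            start = end + transfer_times[i]
    return plan


# Hypothetical values in minutes: three tasks, transfers td12 = 2 and td23 = 3.
print(plan_reservations(0, [5, 5, 5], [2, 3]))  # [(0, 5), (7, 12), (15, 20)]
print(plan_reservations(0, [5, 5, 5], [0, 0]))  # [(0, 5), (5, 10), (10, 15)]
```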



Thanks!



Background Information

Surveys on Grid Workflow Scheduling:

J. Yu and R. Buyya, “A Taxonomy of Workflow Management Systems for Grid Computing,” Journal of Grid Computing, vol. 3, no. 3-4, pp. 171-200, 2005.

M. Wieczorek, A. Hoheisel, and R. Prodan, “Taxonomy of the Multi-criteria Grid Workflow Scheduling Problem,” CoreGRID Technical Report.
