8/13/2019 Europar Tutorial GRMSr
1/153
University Dortmund
Tutorial
Grid Resource
Management and Scheduling
EuroPar 2004
Pisa, Italy
Ramin Yahyapour
CEI, University of Dortmund
Agenda
Background on Resource Management and Scheduling
Job scheduling and resource management on HPC resources
Transition to Grid RM and Grid Scheduling
Current state-of-the-art
Future of GRMS
Requirements, outlook for upcoming GRMS
Introduction
We all know what "the Grid" is
one of the many definitions:
"Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations" (Ian Foster)
however, the actual scope of "the Grid" is still quite controversial
Many people consider High Performance Computing (HPC) as the main Grid application.
today's Grids are mostly Computational Grids or Data Grids with HPC resources as building blocks
thus, Grid resource management is closely related to resource management on HPC resources (our starting point).
we will return to a broader Grid scope and its implications later
Resource Management
on HPC Resources
HPC resources are usually parallel computers or large scale clusters
The local resource management system (RMS) for such resources includes:
configuration management
monitoring of machine state
job management
There is no standard for this resource management.
Several different proprietary solutions are in use.
Examples of job management systems: PBS, LSF, NQS, LoadLeveler, Condor
HPC Management Architecture
in General
[Figure: a master server (control service, job master) and the compute resources/processing nodes; each compute node has a resource/job monitor. The figure is labelled "Resource and Job Monitoring and Management Services".]
Computational Job
A job is a computational task that requires processing capabilities (e.g. 64 nodes) and is subject to constraints (e.g. a specific other job must finish before the start of this job)
The job information is provided by the user:
resource requirements
CPU architecture, number of nodes, speed
memory size per CPU
software libraries, licenses
I/O capabilities
job description
additional constraints and preferences
The format of the job description is not standardized, but usually very similar
Example: PBS Job Description
Simple job script:
#!/bin/csh
# resource limits: allocate needed nodes
#PBS -l nodes=1
#
# resource limits: amount of memory and CPU time ([[h:]m:]s).
#PBS -l mem=256mb
#PBS -l cput=2:00:00
# path/filename for standard output
#PBS -o master:/mypath/myjob.out
./my-task
the whole job file is a shell script
information for the RMS is given as comments
the actual job is started in the script
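The structure of such a job file can be illustrated with a small generator. This is only a sketch: the function name render_pbs_script and its parameters are made up for illustration; the only PBS directives it emits are the ones shown in the script above.

```python
def render_pbs_script(command, nodes=1, mem="256mb", cput="2:00:00",
                      stdout_path=None, shell="/bin/csh"):
    """Build the text of a PBS job script (hypothetical helper)."""
    lines = [f"#!{shell}"]
    # Directives are shell comments, so the shell ignores them
    # while the RMS reads them as resource requests.
    lines.append(f"#PBS -l nodes={nodes}")
    lines.append(f"#PBS -l mem={mem}")
    lines.append(f"#PBS -l cput={cput}")
    if stdout_path is not None:
        lines.append(f"#PBS -o {stdout_path}")
    # The actual job is an ordinary command in the script body.
    lines.append(command)
    return "\n".join(lines) + "\n"


script = render_pbs_script("./my-task", stdout_path="master:/mypath/myjob.out")
print(script)
```

The generated text could be written to a file and handed to the RMS in the same way as a hand-written script.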
Job Submission
The user submits the job to the RMS, e.g. by issuing "qsub jobscript.pbs"
The user can control the job
qsub: submit
qstat: poll status information
qdel: cancel job
It is the task of the resource management system to start a job on the required resources
The current system state is taken into account
PBS Structure
[Figure: job submission ("qsub jobscript"), the management server with the scheduler, and three processing nodes, each with job execution and a job & resource monitor.]
Execution Alternatives
Time sharing:
The local scheduler starts multiple processes per physical CPU with the goal of increasing resource utilization.
multi-tasking
The scheduler may also suspend jobs to keep the system load under control
preemption
Space sharing:
The job uses the requested resources exclusively; no other job is allocated to the same set of CPUs.
The job has to be queued until sufficient resources are free.
Job Classifications
Batch jobs vs. interactive jobs
batch jobs are queued until execution
interactive jobs need immediate resource allocation
Parallel vs. sequential jobs
a parallel job requires several processing nodes at the same time
the majority of HPC installations are used to run batch jobs in space-sharing mode!
a job is not influenced by other co-allocated jobs
the assigned processors, node memory, caches etc. are exclusively available for a single job.
overhead for context switches is minimized
important aspects for parallel applications
Preemption
A job is preempted by interrupting its current execution
the job might be put on hold on a CPU set and resumed later; the job stays resident on those nodes (consumption of memory)
alternatively, a checkpoint is written and the job is migrated to another resource where it is restarted later
Preemption can be useful to reallocate resources due to new job submissions (e.g. with higher priority)
or if a job is running longer than expected.
Job Scheduling
A job is assigned to resources through a scheduling process, which is responsible for
identifying available resources
matching job requirements to resources
making decisions about job ordering and priorities
HPC resources are typically subject to high utilization
therefore, resources are not immediately available and jobs are queued for future execution
time until execution is often quite long (many production systems have an average delay until execution of >1h)
jobs may run for a long time (several hours, days or weeks)
Typical Scheduling Objectives
Minimize the Average Weighted Response Time (AWRT)
Maximize machine utilization/minimize idle time
these are conflicting objectives
the criterion is usually static for an installation and implicitly given by the scheduling algorithm
$$\mathrm{AWRT} = \frac{\sum_{j \in \mathrm{Jobs}} w_j \,(t_j - r_j)}{\sum_{j \in \mathrm{Jobs}} w_j}$$
r_j : submission time of job j
t_j : completion time of job j
w_j : weight/priority of job j
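As a worked example of the AWRT definition, the following sketch computes the metric for a small, made-up set of jobs. The tuple layout (submission time, completion time, weight) and the sample numbers are assumptions chosen for illustration.

```python
def awrt(jobs):
    """Average weighted response time.

    jobs: iterable of (submission_time, completion_time, weight) tuples.
    """
    jobs = list(jobs)
    total_weight = sum(weight for _, _, weight in jobs)
    if total_weight == 0:
        raise ValueError("the sum of the job weights must not be zero")
    weighted_response = sum(weight * (done - submitted)
                            for submitted, done, weight in jobs)
    return weighted_response / total_weight


# Job 1: response time 10 with weight 1; job 2: response time 4 with weight 3.
sample_jobs = [(0.0, 10.0, 1.0), (2.0, 6.0, 3.0)]
print(awrt(sample_jobs))  # (1*10 + 3*4) / (1 + 3) = 5.5
```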
Job Steps
[Figure: a Grid user submits a job description to the local job queue of an HPC machine; shown are the scheduler with its schedule over time, the job execution management, and the node job management on each node.]
A user job enters a job queue; the scheduler (its strategy) decides on the start time and resource allocation of the job.
Scheduling Algorithms:
FCFS
Well known and very simple: First-Come First-Serve
Jobs are started in order of submission
Ad-hoc scheduling when resources become free again
no advance scheduling
Advantage: simple to implement
easy to understand and fair for the users
(job queue represents execution order)
does not require a priori knowledge about job lengths
Problems:
performance can degrade severely; overall utilization of a machine can suffer if highly parallel jobs occur, that is, if a significant share of the nodes is requested for a single job.
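The utilization problem can be made concrete with a small simulation. This is a minimal sketch under simplifying assumptions that are not part of the tutorial: a space-sharing machine, all jobs already queued at time 0, and known runtimes. The job tuples and numbers are invented.

```python
import heapq


def fcfs(jobs, total_nodes):
    """Return {job name: start time} for strict first-come first-serve.

    jobs: list of (name, nodes, runtime) in submission order.
    """
    running = []            # min-heap of (end_time, nodes) for started jobs
    free = total_nodes
    now = 0
    starts = {}
    for name, nodes, runtime in jobs:
        if nodes > total_nodes:
            raise ValueError(f"job {name} needs more nodes than the machine has")
        # Wait for running jobs to finish until the head of the queue fits.
        while free < nodes:
            end, released = heapq.heappop(running)
            now = max(now, end)
            free += released
        starts[name] = now
        free -= nodes
        heapq.heappush(running, (now + runtime, nodes))
    return starts


# A 4-node machine: the wide job B blocks C and D although nodes are idle.
queue = [("A", 2, 4), ("B", 4, 2), ("C", 2, 3), ("D", 1, 1)]
print(fcfs(queue, total_nodes=4))  # {'A': 0, 'B': 4, 'C': 6, 'D': 6}
```

In this example two nodes stay idle from time 0 to 4 because job B, which needs the whole machine, is next in the queue.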
FCFS Schedule
[Figure: FCFS schedule for a compute resource; the queued jobs 1 to 4 are placed in submission order on a chart of processing nodes (resources) over time.]
Scheduling Algorithms:
Backfilling
Improvement over FCFS
A job can be started before an earlier submitted job if it does not delay the first job in the queue
may still cause delay of other jobs further down the queue
Some fairness is still maintained
Advantage:
utilization is improved
Information about the job execution length is needed
sometimes difficult to provide
user estimation not necessarily accurate
Jobs are usually terminated after exceeding their allocated execution time;
otherwise users may deliberately underestimate the job length to get an earlier job start time
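The backfilling rule can be sketched as follows. The code assumes a simplified setting that is not taken from the tutorial: all jobs are queued at time 0 and run exactly as long as estimated. Only the first job in the queue is protected by a reservation, as described above; the function name and sample jobs are invented.

```python
def backfill(jobs, total_nodes):
    """Return {job name: start time}; jobs are (name, nodes, estimated_runtime)."""
    if any(nodes > total_nodes for _, nodes, _ in jobs):
        raise ValueError("a job requests more nodes than the machine has")
    queue = list(jobs)
    running = []                      # (end_time, nodes) of started jobs
    now, starts = 0, {}
    while queue:
        free = total_nodes - sum(nodes for _, nodes in running)
        # FCFS step: start jobs from the head of the queue while they fit.
        while queue and queue[0][1] <= free:
            name, nodes, estimate = queue.pop(0)
            starts[name] = now
            running.append((now + estimate, nodes))
            free -= nodes
        if not queue:
            break
        # Reservation for the blocked head job: the earliest time at which
        # enough nodes will have been released ("shadow time").
        head_nodes = queue[0][1]
        available, shadow = free, now
        for end, nodes in sorted(running):
            available, shadow = available + nodes, end
            if available >= head_nodes:
                break
        spare = available - head_nodes    # nodes the head job will not need
        # Backfill step: a later job may start now if it finishes before the
        # shadow time or only uses nodes the head job does not need.
        for job in queue[1:]:
            name, nodes, estimate = job
            ends_in_time = now + estimate <= shadow
            if nodes <= free and (ends_in_time or nodes <= spare):
                starts[name] = now
                running.append((now + estimate, nodes))
                free -= nodes
                if not ends_in_time:
                    spare -= nodes
                queue.remove(job)
        # Advance the clock to the next job completion.
        now = min(end for end, _ in running)
        running = [(end, nodes) for end, nodes in running if end > now]
    return starts


# A 4-node machine: job B needs all nodes and has to wait for job A.
queue = [("A", 2, 4), ("B", 4, 2), ("C", 2, 3), ("D", 1, 1)]
print(backfill(queue, total_nodes=4))  # {'A': 0, 'C': 0, 'D': 3, 'B': 4}
```

In this example jobs C and D use nodes that would otherwise be idle while job B waits, and B still starts at time 4.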
Backfill Scheduling
[Figure: backfill schedule for a compute resource; the queued jobs 1 to 4 are placed on a chart of processing nodes (resources) over time.]
Job 3 is started before Job 2 as it does not delay it
Backfill Scheduling
[Figure: the same backfill schedule of jobs 1 to 4 on a chart of processing nodes (resources) over time, with one job marked "Job finishes earlier!".]
However, if a job finishes earlier than expected, the backfilling causes delays that otherwise would not occur
need for accurate job length information (difficult to obtain)
Job Execution Manager
After the scheduling process, the RMS is responsible for the job execution:
sets up the execution environment for a job,
starts a job,
monitors job state, and
cleans up after execution (copying output files etc.)
notifies the user (e.g. sending email)
Scheduling Options
Parallel job scheduling algorithms are well studied; performance is usually acceptable
Real implementations tend to have additional requirements rather than a need for more complex theoretical algorithms:
Prioritization of jobs, users, or groups while maintaining fairness
Partitioning of machines
e.g.: interactive and development partition vs. production batch partitions
Combination of different queue characteristics
For instance, the Maui Scheduler is often deployed as it is quite flexible in terms of prioritization, backfilling, fairness etc.
Transition to Grid Resource
Management and Scheduling
Current state of the art
Transition to the Grid
More resource types come into play:
Resources are any kind of entity, service or capability to perform a specific task
processing nodes, memory, storage, networks, experimental devices, instruments
data, software, licenses
people
The task/job/activity can also have a broader meaning
a job may involve different resources and consist of several activities in a workflow with corresponding dependencies
The resources are distributed and may belong to different administrative domains
HPC is still the key application for Grids. Consequently, the main resources in a Grid are the previously considered HPC machines with their local RMS
Implications to Grid Resource
Management
Several security-related issues have to be considered: authentication, authorization, accounting
who has access to a certain resource?
what information can be exposed to whom?
There is a lack of global information:
which resources are available at what time for an activity?
The resources are quite heterogeneous:
different RMS in use
individual access and usage paradigms
administrative policies have to be considered
Scope of Grids
[Figure: three scopes of Grids: Cluster Grid, Enterprise Grid, Global Grid.]
Source: Ian Foster
Resource Management Layer
A Grid Resource Management System consists of:
Local resource management system (Resource Layer)
Basic resource management unit
Provides a standard interface for using remote resources
e.g. GRAM, etc.
Global resource management system (Collective Layer)
Coordinates all local resource management systems within multiple or distributed Virtual Organizations (VOs)
Provides high-level functionalities to efficiently use all resources
Job Submission
Resource Discovery and Selection
Scheduling
Co-allocation
Job Monitoring, etc.
e.g. Meta-scheduler, Resource Broker, etc.
Grid Middleware
Grid layers (from bottom to top):
Fabric: control local execution
Connectivity: communication with internal resource functions and services
Resource: add resource; negotiate access, control access and utilization
Collective: coordination of several resources; infrastructure services, application services
Application
Corresponding layers of the Internet Protocol Architecture: Link, Internet, Transport, Application
Source: Ian Foster
Grid Middleware (2)
[Figure: the User/Application uses higher-level services (Resource Broker); the Grid middleware comprises the core Grid infrastructure services (Information Services, Monitoring Services, Security Services) and a Grid Resource Manager per resource; below it sits the local resource management (e.g. PBS, LSF) of each resource.]
Globus Grid Middleware
Globus Toolkit: common source for Grid middleware
GT2
GT3: Web/Grid Service-based
GT4: WSRF-based
GRAM is responsible for providing a service for a given job specification that can:
Create an environment for a job
Stage files to/from the environment
Submit a job to a local scheduler
Monitor a job
Send job state change notifications
Stream a job's stdout/err during execution
Globus GT2 Execution
[Figure: User/Application, Resource Broker, resource allocation, MDS, and one GRAM per resource in front of the local systems (e.g. PBS, LSF); RSL and specialized RSL are passed between the components.]
RSL
Grid jobs are described in the resource specification language (RSL)
RSL Version 1 is used in GT2
It has an LDAP filter-like syntax that supports boolean expressions:
Example:
& (executable = a.out)(directory = /home/nobody)
(arguments = arg1 "arg 2")(count = 1)
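The shape of such an expression can be reproduced with a few lines of code. This is a sketch only: to_rsl and rsl_value are hypothetical helpers, not part of Globus, and they cover just the conjunction form shown in the example (values containing a double quote are rejected rather than escaped).

```python
def rsl_value(value):
    """Render one value; quote it if it is empty or contains whitespace."""
    text = str(value)
    if '"' in text:
        raise ValueError("values containing double quotes are not handled here")
    if text == "" or any(ch.isspace() for ch in text):
        return f'"{text}"'
    return text


def to_rsl(relations):
    """Join (attribute, value) pairs into a conjunction: & (a = v)(b = w)..."""
    parts = []
    for name, value in relations:
        values = value if isinstance(value, (list, tuple)) else [value]
        rendered = " ".join(rsl_value(v) for v in values)
        parts.append(f"({name} = {rendered})")
    return "& " + "".join(parts)


rsl = to_rsl([
    ("executable", "a.out"),
    ("directory", "/home/nobody"),
    ("arguments", ["arg1", "arg 2"]),
    ("count", 1),
])
print(rsl)
```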
Globus Job States
[Figure: state diagram with the job states pending, stage-in, active, suspended, stage-out, done and failed.]
Globus GT3
With the transition to Web/Grid Services, the job management becomes
the Master Managed Job Factory Service (MMJFS)
the Managed Job Factory Service (MJFS)
the Managed Job Service (MJS)
The client contacts the MMJFS
The MMJFS asks the MJFS to create an MJS for the job
The MJS takes care of managing the job actions.
interact with local scheduler
file staging
store job status
Globus GT3 Job Execution
[Figure: components shown are the User/Application, the Master Managed Job Factory Service, the Managed Job Factory Service, the Managed Job Service, the File Streaming Factory Service, the File Streaming Service, the Resource Information Provider Service and the local scheduler.]
Globus as a toolkit does not perform scheduling and automatic resource selection
Example: Extending the Globus
Architecture at KAIST
[Figure: components shown are the Client, Job Submission Service, Job Monitoring Service, Scheduling Service, Resource Selection Service, Resource Information Service, Resource Reservation Service, Resource Preference Provider, Local Resource Monitoring Service (RIPS) with Information Providers, Job Manager Service (MJS) and Local Resource Manager (PBS). Arrows numbered 1 to 9 mark the workflow; further arrows mark monitoring and information flow.]
Source: Jin-Soo Kim
Job Description with RSL2
Version 2 of RSL is XML-based
Two namespaces are used:
rsl: for basic types such as int, string, path, url
gram: for the elements of a job
*GNS = http://www.globus.org/namespaces
RSL2 Attributes
(type = rsl:integerType)
Number of processes to run (default is 1)
(type = rsl:integerType)
On SMP multi-computers, number of nodes to distribute the "count" processes across
count/hostCount = number of processes per host
(type = rsl:stringType)
Queue into which to submit job
(type = rsl:longType)
Maximum wall clock runtime in minutes
(type = rsl:longType)
Maximum CPU runtime in minutes
(type = rsl:longType)
Only applies if the above are not used
Maximum wall clock or CPU runtime (scheduler's choice) in minutes
Job Submission Tools
GT 3 provides the Java class GramClient
GT 2.x: command line programs for job submission
globus-job-run: interactive jobs
globus-job-submit: batch jobs
globusrun: takes RSL as input
Globus 2 Job Client Interface
A multirequest specifies multiple resources for a job
globus-job-run -dumprsl -: host1 /bin/uname -a \
-: host2 /bin/uname -a
+
( &(resourceManagerContact="host1")
(subjobStartType=strict-barrier)
(label="subjob 0")
(executable="/bin/uname")
(arguments= "-a"))
( &(resourceManagerContact="host2")
(subjobStartType=strict-barrier)
(label="subjob 1")
(executable="/bin/uname")
(arguments= "-a"))
A simple job submission requiring 2 nodes:
globus-job-run -np 2 -s myprog arg1 arg2
Globus 2 Job Client Interface
The full flexibility of RSL is available through the command line tool globusrun
Support for file staging: executable and stdin/stdout
Example:
globusrun -o -r hpc1.acme.com/jobmanager-pbs
'&(executable=$(HOME)/a.out) (jobtype=single)
(queue=time-shared)'
Problem: Job Submission
Descriptions differ
The deliverables of the GGF Working Group JSDL:
A specification for an abstract standard Job Submission Description Language (JSDL) that is independent of language bindings, including:
the JSDL feature set and attribute semantics,
the definition of the relationship between attributes,
and the range of attribute values.
A normative XML Schema corresponding to the JSDL specification.
A document of translation tables to and from the scheduling languages of a set of popular batch systems for both the job requirements and resource description attributes of those languages, which are relevant to the JSDL.
JSDL Attribute Categories
The job attribute categories will include:
Job Identity Attributes
ID, owner, group, project, type, etc.
Job Resource Attributes
hardware, software, including applications, Web and Grid Services, etc.
Job Environment Attributes
environment variables, argument lists, etc.
Job Data Attributes
databases, files, data formats, and staging, replication, caching, and disk
requirements, etc.
Job Scheduling Attributes
start and end times, duration, immediate dependencies, etc.
Job Security Attributes
authentication, authorisation, data encryption, etc.
Problem: Resource Management Systems
Differ Across Each Component
LSF
Interface format: has API plus batch utilities via LSF scripts
Execution environment: User: local disk exported; System: remote initialized (option)
Platform mix: Unix, Windows
Grid Engine
Interface format: GDI API interface plus command line interface
Execution environment: System: remote initialized, with SGE local variables exported
Platform mix: Unix only
PBS
Interface format: API (script option); batch utilities via PBS scripts
Execution environment: System: remote initialized, with PBS local variables exported
Platform mix: Unix only
[Figure: Application, Scheduler and Executing Host, with the steps task definition, submit, task analysis, staging and task.]
Source: Hrabri Rajic
GGF-WG DRMAA
GGF Working Group "Distributed Resource Management Application API"
From the charter:
Develop an API specification for the submission and control of jobs to one or more Distributed Resource Management (DRM) systems.
The scope of this specification is all the high level functionality which is necessary for an application to consign a job to a DRM system, including common operations on jobs like termination or suspension.
The objective is to facilitate the direct interfacing of applications to today's DRM systems by application builders, portal builders, and Independent Software Vendors (ISVs).
DRMAA State Diagram
The remote job could be in the following states:
system hold
user hold
system and user hold simultaneously
queued active
system suspended
user suspended
system and user suspended simultaneously
running
finished (un)successfully
Source: Hrabri Rajic
Example: Condor-G
Condor-G is a Condor system enhanced to manage Globus jobs.
It provides two main features
Globus Universe: interface for submitting, queuing and monitoring jobs that use Globus resources
GlideIn: system for efficient execution of jobs on remote Globus resources
Condor-G runs as a personal Condor system
daemons run as non-privileged user processes
each user runs her/his own Condor-G
Condor-G: GlideIn
Globus is used to run the Condor daemons on Grid resources
Condor daemons run as a Globus-managed job
the GRAM service starts the daemons rather than the Condor jobs
When the resources run these GlideIn jobs, they will join the personal Condor pool
These daemons can be used to launch a job from Condor-G to a Globus resource
Jobs are submitted as Condor jobs and they will be matched and run on the Grid resources
the daemons receive jobs from the user's Condor queue
combines the benefits of Globus and Condor
Using GlideIn
[Figure: Condor-G and a Grid resource; components shown are the Schedd, GridManager, Collector, JobManager, LSF and Startd; labels: 600 Condor jobs, glide-ins.]
Source: ANL/USC ISI
Example: DAGMan
Directed Acyclic Graph Manager
DAGMan allows you to specify the dependencies between your Condor-G jobs, so it can manage them automatically for you.
(e.g., "Don't run job B until job A has completed successfully.")
A DAG is defined by a .dag file, listing each of its nodes and their dependencies:
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
each node will run the Condor-G job specified by its accompanying Condor submit file
[Figure: diamond-shaped DAG: Job A at the top, Job B and Job C in the middle, Job D at the bottom.]
Source: Miron Livny
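To illustrate what the dependency lines of the diamond example imply for execution order, the following sketch parses the two keywords used in it (Job and Parent ... Child) and derives a valid order with Python's standard graphlib module (Python 3.9 or later). It is not how DAGMan itself is implemented; the parser is a hypothetical helper that ignores every other .dag keyword.

```python
from graphlib import TopologicalSorter

DAG_TEXT = """\
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
"""


def parse_dag(text):
    """Return (submit file per job, set of parents per job)."""
    submit_files, parents_of = {}, {}
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue                      # skip blank lines and comments
        keyword = words[0].lower()
        if keyword == "job":
            submit_files[words[1]] = words[2]
            parents_of.setdefault(words[1], set())
        elif keyword == "parent":
            split = [w.lower() for w in words].index("child")
            for child in words[split + 1:]:
                parents_of.setdefault(child, set()).update(words[1:split])
    return submit_files, parents_of


submit_files, parents_of = parse_dag(DAG_TEXT)
# TopologicalSorter expects a mapping from each node to its predecessors.
order = list(TopologicalSorter(parents_of).static_order())
print(order)  # A first, D last; B and C may appear in either order
```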
Grid Scheduling
How to select resources in the Grid?
Different Levels of Scheduling
Resource-level scheduler
also known as low-level scheduler, local scheduler, local resource manager
scheduler close to the resource, controlling a supercomputer, cluster, or network of workstations, on the same local area network
Examples: Open PBS, PBS Pro, LSF, SGE
Enterprise-level scheduler
Scheduling across multiple local schedulers belonging to the same organization
Examples: PBS Pro peer scheduling, LSF Multicluster
Grid-level scheduler
also known as super-scheduler, broker, community scheduler
Discovers resources that can meet a job's requirements
Schedules across lower level schedulers
Grid-Level Scheduler
Discovers & selects the appropriate resource(s) for a job
If selected resources are under the control of several local schedulers, a meta-scheduling action is performed
Architecture:
Centralized: all lower level schedulers are under the control of a single Grid scheduler
not realistic in global Grids
Distributed: lower level schedulers are under the control of several Grid scheduler components; a local scheduler may receive jobs from several components of the Grid scheduler
Grid Scheduling
[Figure: a Grid user submits to a Grid scheduler, which sits above the local schedulers of Machine 1, Machine 2 and Machine 3; each machine has its own job queue and schedule over time.]
Activities of a Grid Scheduler
GGF Document: "10 Actions of Super Scheduling" (GFD-I.4)
Phase One - Resource Discovery
1. Authorization Filtering
2. Application Definition
3. Min. Requirement Filtering
Phase Two - System Selection
4. Information Gathering
5. System Selection
Phase Three - Job Execution
6. Advance Reservation
7. Job Submission
8. Preparation Tasks
9. Monitoring Progress
10. Job Completion
11. Clean-up Tasks
Source: Jennifer Schopf
Grid Scheduling
A Grid scheduler allows the user to specify the required resources and environment of the job without having to indicate the exact location of the resources
A Grid scheduler answers the question: to which local resource manager(s) should this job be submitted?
Answering this question is hard:
resources may dynamically join and leave a computational grid
not all currently unused resources are available to grid jobs:
resource owner policies such as "maximum number of grid jobs allowed"
it is hard to predict how long jobs will wait in a queue
Select a Resource for Execution
Most systems do not provide advance information about future job execution
user information is not accurate, as mentioned before
new jobs may arrive that overtake current queue entries due to higher priority
A Grid scheduler might consider the current queue situation; however, this does not give reliable information for future executions:
A job may wait long in a short queue while it would have been executed earlier on another system.
Available information:
Grid information service gives the state of the resources and possibly authorization information
Prediction heuristics: estimate a job's wait time for a given resource, based on the current state and the job's requirements.
Selection Criteria
Distribute jobs in order to balance load across resources
not suitable for large scale grids with different providers
Data affinity: run job on the resource where data is located
Use heuristics to estimate job execution time.
Best-fit: select the set of resources with the smallest capabilities and capacities that can meet the job's requirements
Quality of Service of
a resource or
its local resource management system
what features does the local RMS have?
can they be controlled from the Grid scheduler?
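The best-fit criterion can be illustrated with a small sketch. The resource records, the two attributes compared (number of nodes and memory per node) and the tie-breaking order are assumptions made for this example; a real Grid scheduler would obtain such data from an information service.

```python
def best_fit(resources, required_nodes, required_mem_gb):
    """Pick the smallest resource that still meets the job's requirements.

    resources: list of dicts with the keys "name", "nodes" and "mem_gb".
    Returns the chosen resource, or None if no resource is large enough.
    """
    candidates = [r for r in resources
                  if r["nodes"] >= required_nodes
                  and r["mem_gb"] >= required_mem_gb]
    if not candidates:
        return None
    # "Smallest" is taken here as fewest nodes first, then least memory.
    return min(candidates, key=lambda r: (r["nodes"], r["mem_gb"]))


grid = [
    {"name": "small", "nodes": 16, "mem_gb": 2},
    {"name": "medium", "nodes": 64, "mem_gb": 4},
    {"name": "large", "nodes": 512, "mem_gb": 8},
]
choice = best_fit(grid, required_nodes=32, required_mem_gb=2)
print(choice["name"])  # medium
```

Choosing the medium system keeps the large one free for jobs that cannot run anywhere else.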
Scheduling Attributes
Working Group in the Global Grid Forum to
define the attributes of a lower-level scheduling instance that can be exploited by a higher-level scheduling instance.
The following attributes have been defined:
Attributes of allocation properties
Revocation of an allocation
The local scheduler reserves the right to withdraw a given allocation.
Guaranteed completion time of allocation
A deadline for job completion is provided by the local scheduler.
Guaranteed Number of Attempts to Complete a Job
A local scheduler will retry a given job task, e.g. useful for data transfer actions.
Allocations run-to-completion
A job is not preempted if it has been started
Exclusive Allocations
A job has exclusive access to the given resources; e.g. no time-sharing is performed
Malleable Allocations
The given resource set may change during runtime; e.g. a computational job will gain (moldable job) or lose processors
Scheduling Attributes (2)
Attributes of available information
Access to tentative schedule
The local scheduler exposes its schedule of future allocations
Option: Only the projected start time of a specified allocation is available
Option: Only partial information on the current schedule is available
Exclusive control
The local scheduler is exclusively in charge of the resources; no other jobs can appear on the resources
Event Notification
The local scheduler provides an event subscription service
Attributes for manipulating allocation execution
Preemption
Checkpointing
Migration
Restart
Scheduling Attributes (3)
Attributes for requesting resources
Allocation Offers
The local system can provide an interface to request offers for an allocation
Allocation Cost or Objective Information
The local scheduler can provide cost or objective information
Advance Reservation
Allocations can be reserved in advance
Requirement for Providing Maximum Allocation Length in Advance
The higher-level scheduler must provide a maximum job execution length
Deallocation Policy
A policy applies for the allocation that must be met to stay valid
Remote Co-Scheduling
A schedule can be generated by a higher-level instance and imposed on the local scheduler
Consideration of Job Dependencies
The local scheduler can deal with dependency information of jobs; e.g. for workflows
CSF: Community Scheduler Framework
An open source implementation of an OGSA-based metascheduler for VOs.
Supports emerging WS-Agreement spec
Supports GT GRAM
Fills in gaps in existing resource management picture
Contributed by Platform to the Globus Toolkit
Extensible, Open Source framework for implementing meta-schedulers
Provides basic protocols and interfaces to help resources work together in heterogeneous environments
CSF Architecture
[Figure: Platform LSF users and Globus Toolkit users access the meta-scheduler, which runs in a Grid service hosting environment and comprises the Job Service, Reservation Service, Queuing Service and Global Information Service, plus a meta-scheduler plugin on the Platform LSF side. Resource managers are attached through an RM Adapter (LSF) or through GRAM (SGE, PBS), each with a RIPS.]
RIPS = Resource Information Provider Service
Source: Chris Smith
Global Information Service
[Figure: internal structure of the Global Information Service: Index Service, Registry, SD Aggregator, SD Provider Manager, Data Storage and RIPS DB. Labelled requests: resource info, cluster info, data store, data load. Clusters register with the service; the RIPS stores and loads resource, job and reservation information.]
Source: Chris Smith
Support for Virtual Organizations
[Figure: Organizations 1, 2 and 3 contribute to Virtual Organization A and Virtual Organization B; each VO has its own community scheduler, used by the users of VO A and VO B respectively.]
Source: Chris Smith
CSF Grid Services
Job Service:
creates, monitors and controls compute jobs
Reservation Service:
guarantees resources are available for running a job
Queuing Service:
provides a service where administrators can customize and define scheduling policies at the VO level and/or at the different resource manager level
Defines an API for plug-in schedulers
RM Adapter Service:
provides a Grid service interface that bridges the Grid service protocol and resource managers (LSF, PBS, SGE, Condor and other RMs)
GT3 Job Submission / Architecture
[Figure: Sites A, B and C each run an MMJFS (on node1, node2 and node3) with a RIPS and an MJS for the local system (SGE, PBS, LSF); jobs are submitted to each site with managed-job-globusrun; an Index Service is also shown.]
MMJFS = Master Managed Job Factory Service
MJS = Managed Job Service
Blue indicates a Grid Service hosted in a GT3 container
Source: Chris Smith
GT3 + CSF Architecture
[Figure: the Virtual Organization level hosts the Queuing Service, Job Service, Reservation Service and Index Service. Site A runs LSF behind an RM Adapter for LSF, Site B runs PBS behind an RM Adapter for PBS, and Site C runs SGE behind MMJFS/MJS; each site has a RIPS.]
Source: Chris Smith
Queue Service
In CSF, Job Service instances are submitted to the Queue Service for dispatch to a resource manager.
The Queue Service provides a plug-in API for extending the scheduling algorithms provided by default with CSF.
The Queue Service:
loads and validates configuration information
loads all configured scheduler plugins
calls the plugin API functions
schedInit() after loading the plugin successfully
schedOrder() when a new job is submitted
schedMatch() during the scheduling cycle
schedPost() before the scheduling cycle ends, and after scheduling decisions are sent to the job service instances
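The call sequence of the four plugin functions can be sketched as follows. Only the function names come from the list above; the Python class, the argument lists, the job fields and the "shortest estimate first" policy are assumptions for illustration and do not reflect the real CSF plugin interface.

```python
class ShortestFirstPlugin:
    """Hypothetical scheduler plugin; signatures are illustrative only."""

    def schedInit(self):
        # Called once after the plugin has been loaded successfully.
        self.history = []

    def schedOrder(self, queue, job):
        # Called when a new job is submitted: insert it and reorder the queue.
        queue.append(job)
        queue.sort(key=lambda j: j["estimate"])

    def schedMatch(self, queue, free_nodes):
        # Called during the scheduling cycle: pick jobs that fit right now.
        decisions = []
        for job in list(queue):
            if job["nodes"] <= free_nodes:
                decisions.append(job["name"])
                free_nodes -= job["nodes"]
                queue.remove(job)
        return decisions

    def schedPost(self, decisions):
        # Called before the scheduling cycle ends.
        self.history.append(list(decisions))


# A stand-in for the Queue Service driving one scheduling cycle.
plugin = ShortestFirstPlugin()
plugin.schedInit()
queue = []
for job in [{"name": "long", "nodes": 2, "estimate": 10},
            {"name": "short", "nodes": 2, "estimate": 1},
            {"name": "wide", "nodes": 4, "estimate": 5}]:
    plugin.schedOrder(queue, job)
decisions = plugin.schedMatch(queue, free_nodes=4)
plugin.schedPost(decisions)
print(decisions)  # ['short', 'long']
```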
Co-allocation
It is often requested that several resources are used for a single job.
that is, a scheduler has to assure that all resources are available when needed.
in parallel (e.g. visualization and processing)
with time dependencies (e.g. a workflow)
The task is especially difficult if the resources belong to different administrative domains.
The actual allocation time must be known for co-allocation
or the different local resource management systems must synchronize with each other (wait for availability of all resources)
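If every local system could publish when its resources are free, finding a common allocation time becomes a small search problem. The sketch below assumes exactly that: each resource reports a list of free time windows, which most systems do not provide in practice. The resource names and window values are invented.

```python
def earliest_common_start(free_windows, duration):
    """Earliest time at which every resource is free for `duration`.

    free_windows: {resource name: [(start, end), ...]} of free time windows.
    Returns the start time, or None if no common slot exists.
    """
    # The earliest feasible start always coincides with the start of some
    # window, so only window starts need to be tested.
    candidates = sorted({start
                         for windows in free_windows.values()
                         for start, _ in windows})
    for t in candidates:
        if all(any(start <= t and t + duration <= end for start, end in windows)
               for windows in free_windows.values()):
            return t
    return None


windows = {
    "compute cluster": [(0, 4), (6, 20)],
    "visualization": [(2, 9), (10, 30)],
}
print(earliest_common_start(windows, duration=3))  # 6
print(earliest_common_start(windows, duration=5))  # 10
```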
Example Multi-Site Job Execution
[Figure: a Grid scheduler places one multi-site job on Machine 1, Machine 2 and Machine 3 at the same time; each machine has its own local scheduler, job queue and schedule over time.]
A job uses several resources at different sites in parallel.
Network communication is an issue.
Advance Reservation
Co-allocation and other applications require a priori information about the precise resource availability.
With the concept of advance reservation, the resource provider
guarantees a specified resource allocation.
This includes a two- or three-phase commit for agreeing on the reservation.
Implementations:
GARA/DUROC/SNAP provide interfaces for Globus to create advance
reservations.
Implementations for network QoS are available:
setup of a dedicated bandwidth between endpoints
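The two-phase commit mentioned above can be sketched as follows: every provider first holds the requested slot, and the reservation is only confirmed if all providers agreed. The Provider class and reserve_all function are hypothetical and do not reflect the GARA, DUROC, or SNAP interfaces.

```python
class Provider:
    """Hypothetical resource provider that tracks time slots."""

    def __init__(self, name, free_slots):
        self.name = name
        self.free = set(free_slots)
        self.held = set()
        self.reserved = set()

    def prepare(self, slot):
        # Phase 1: tentatively hold the slot if it is still free.
        if slot in self.free:
            self.free.remove(slot)
            self.held.add(slot)
            return True
        return False

    def commit(self, slot):
        # Phase 2: turn the held slot into a firm reservation.
        self.held.remove(slot)
        self.reserved.add(slot)

    def abort(self, slot):
        # Release a held slot when another provider refused.
        if slot in self.held:
            self.held.remove(slot)
            self.free.add(slot)


def reserve_all(providers, slot):
    """Reserve the same slot at all providers, or at none of them."""
    prepared = []
    for provider in providers:
        if provider.prepare(slot):
            prepared.append(provider)
        else:
            for p in prepared:
                p.abort(slot)
            return False
    for provider in prepared:
        provider.commit(slot)
    return True


site_a = Provider("A", {1, 2})
site_b = Provider("B", {2})
first_try = reserve_all([site_a, site_b], 1)   # B cannot offer slot 1
second_try = reserve_all([site_a, site_b], 2)  # both can offer slot 2
```

The failed first attempt leaves site A unchanged, which is the point of the separate prepare phase.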
Limitations of current Grid RMS
The interaction between local scheduling and higher-level Grid scheduling is currently a one-way communication:
current local schedulers are not optimized for Grid use
limited information is available about future job execution
a site is usually selected by a Grid scheduler and the job enters the
remote queue.
The decision about job placement is inefficient.
The actual job execution time is usually not known.
Co-allocation is a problem as many systems do not provide advance
reservation.
Example of Grid Scheduling Decision Making
[Figure: a Grid user submits a job to the Grid scheduler, which must choose among Machine 1, Machine 2, and Machine 3, each with its own local scheduler, job queue, and schedule over time. Queue states shown: 15 jobs running and 20 jobs queued; 5 jobs running and 2 jobs queued; 40 jobs running and 80 jobs queued.]
Where to put the Grid job?
Available Information from the Local Schedulers
Decision making is difficult for the Grid scheduler:
limited information about local schedulers is available
available information may not be reliable
Possible information:
queue length, running jobs
detailed information about the queued jobs:
execution length, process requirements
tentative schedule of future job executions
This information is often technically not provided by the local scheduler.
In addition, this information may be subject to privacy concerns!
Consequence
Consider a workflow with 3 short steps (e.g. 1 minute each) that depend on each other.
Assume available machines with an average queue waiting time of 1 hour.
The Grid scheduler can only submit the subsequent step once the previous job step has finished.
Result:
The completion time of the workflow may be larger than 3 hours (compared to 3 minutes of execution time).
Current Grids are suitable for simple jobs, but still quite inefficient in
handling more complex applications.
Need for better coordination of higher- and lower-level scheduling!
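The arithmetic behind this result can be written out. The sketch assumes that every step waits the full average queue time before it runs and that steps never overlap; the function name is illustrative.

```python
def workflow_completion_minutes(steps, run_minutes, queue_wait_minutes):
    """Completion time when each step is submitted only after the previous
    one has finished and then waits in a remote queue."""
    return steps * (queue_wait_minutes + run_minutes)


total = workflow_completion_minutes(steps=3, run_minutes=1, queue_wait_minutes=60)
pure_execution = 3 * 1
```

Under these assumptions the workflow takes 183 minutes, of which only 3 minutes are execution time.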
GRMS in Next Generation Grids
Outlook on future Grid Resource Management and Scheduling
Example Grid Scenario
[Figure: a remote center reads and generates TB of data; the data reach the compute resources by LAN/WAN transfer and the visualization by WAN transfer.]
Assume a data-intensive simulation
that should be visualized and
steered during runtime!
Resource Request of a Simple Grid Job
A specified architecture with
48 processing nodes,
1 GB of available memory, and
a specified licensed software package
for 1 hour between 8am and 6pm of the following day
Time must be known in advance.
A specific visualization device during program execution
Minimum bandwidth between the VR device and the main computer during program
execution
Input: a specified data set from a data repository
at most 4
preference of cheaper job execution over an earlier execution.
actually a pretty simple example (no complex workflows)
Use-Case Example: Coordinated Simulation and Visualization
Expected output of a Grid scheduler:
[Figure: Gantt chart over time with one row per resource. Resources: Network 1, Computer 1, Computer 2, Network 3, VR-Cave, Data, Network 2, Software License, Storage. Activities shown: data transfer, loading data, parallel computation, providing data, communication for computation, visualization, data access, storing data, communication for visualization, software usage, data storage.]
Reservations are
necessary!
Need for a Grid Scheduling Architecture
[Figure: a Grid scheduler serves the Grid user/application and is supported by information services, monitoring services, security services, accounting/billing, and other Grid services. Below it, Grid middleware sits on top of local RMSs, each keeping its own schedule over time, for compute resources, data resources, network resources, and other resources/services.]
Service Oriented Architectures
Services are used to abstract all resources and functionalities.
Concept of OGSI and WSRF:
using Web Services, SOAP, XML to implement the services
OGSI idea of Grid Services is implemented in GT3
transition to WSRF with GT4
Core services for building a Grid are discussed in the Open Grid
Services Architecture (OGSA)
Open Grid Services Architecture
[Figure: OGSA layer diagram. Labels shown: Compute, Data & Storage Resources; Web Services: Basic Functionality; OGSI: Interface to Grid Infrastructure; OGSA, with services such as Registry, Job Submission, Data Transport, Resource Usage, Banking, Brokering, Workflow, Authorisation, Transformation, Structured Data Access, and Structured Data Integration (structured data: relational, XML, semi-structured); Generic Virtual Service Access and Integration Layer; Virtual Integration Architecture; Distributed Application & Integration Technology for Problem Domain X; Applications in Problem Domain X; Users in Problem Domain X.]
OGSA Outlook
[Figure: overview of OGSA services. Service groups shown: Context Services, Status + Monitoring Services, Infrastructure Services, Security Services, Resource Management Services, Job & Workflow Management, Data Services. Individual services named: Policy + Agreement, Virtual Organization; Data Access, Data Integration, Data Provision, Data Catalog; Firewall Transition, Delegation, Authorization, Authentication; WS-RF (OGSI), Notification, WS Distributed Management; Event, Problem Determination, Information Service, Logging Service; Job Manager, Job Service, Broker, Execution Planning Service, Workflow Manager, Workload Manager; Provisioning, Application Content Manager, Deploy + Configuration Service, Service Container, Reservation Service.]
OGSA Execution Planning
[Figure: OGSA execution planning. A Workload Mgmt. Framework (user/job proxies, environment mgmt., policies) represents demand and a Resource Mgmt. Framework represents supply. Between them: an Optimizing Framework (resource selection, resource workload, optimal mapping, workload optimization, workload post balancing, resource provisioning), a Workload Optimizing Framework (workload optimization, workload orchestration, workload models (history/prediction), dependency management, scheduling), and a Resource Optimizing Framework (capacity management, resource placement). Further boxes: Reservation, CMM, Resource Allocation (or Binding), Job Factory, Resource Factory, Admission Control (Resources), Admission Control (Workload), SLA Management (Workload), Quality of Service (Resources), Queuing Services, Information Provider, Selection Context (e.g. VO). Legend: primary interaction, meta-interaction, "represents one or more OGSA services".]
Functional Requirements for Grid Scheduling
Functional Requirements:
Cooperation between different resource providers
Interaction with local resource management systems
Support for reservations and service level agreements
Orchestration of coordinated resource allocations
Automatic handling of accounting and billing
Distributed Monitoring
Failure Transparency
What are Basic Blocks for a Grid Scheduling Architecture?
[Figure: a Scheduling Service queries an Information Service for resources and interacts with a Data Management Service, a Network Management Service, Reservation, an Accounting and Billing Service, and a Job Supervisor Service. The Information Service maintains static and scheduled/forecasted information from the underlying management systems: Compute Manager (compute, storage, visualization etc.), Data Manager (data resources), and Network Manager (network resources).]
Scheduling-relevant interfaces
of the basic blocks are still to be
defined!
Information Service
Resource Discovery
Relevant for Grid Scheduling:
Access to static and dynamic information
Dynamic information includes data about planned or forecasted future
events
e.g.: existing reservations, scheduled tasks, future availabilities
need for anonymous and limited information (privacy concerns)
Information about all resource types
including e.g. data and network
future reservations, data transfers etc.
Job/Workflow Description
Requirement Description
Information about the job specifics (what is the job)
and job requirements (what is required for the job)
including data access and creation
Need for a common workflow description
e.g. a DAG formulation
include static and dynamic dependencies
need for the ability to extract workflow information to schedule a whole
workflow in advance
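A DAG formulation lets a scheduler see the whole workflow before any step is submitted. The sketch below uses Python's standard graphlib module; the step names and dependencies are invented for illustration and cover only static dependencies.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (hypothetical workflow).
workflow = {
    "transfer_input": set(),
    "simulate": {"transfer_input"},
    "visualize": {"simulate"},
    "store_output": {"simulate"},
}

# A valid execution order; a scheduler could plan reservations along it.
order = list(TopologicalSorter(workflow).static_order())
```

Any order produced this way places transfer_input before simulate, and simulate before both visualize and store_output, so all steps could be planned in advance instead of being submitted one by one.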
Reservation Management
Agreement and Negotiation
Interaction between scheduling instances, between resource/agreement providers and agreement initiators (higher-level
scheduler)
access to tentative information necessary
negotiations might take very long
individual scheduling objectives to be considered
probably market-oriented and economic scheduling needed
Need for combining agreements from different providers
coordinate complex resource requests or workflows
Maintain different negotiations at the same time
probably several levels of negotiation, agreement commitment and reservation
Accounting and Billing
Interaction with budget information
Charging for allocations, reservations;
preliminary allocation of budgets
Concepts for reliable authorization of Grid schedulers to spend
money on behalf of the user
Refunding in case of resource/SLA failure, re-scheduling etc.
Reliable monitoring and accounting
required for tracing whether a party fulfilled an agreement
Monitoring Services
Monitoring of:
resource conditions
agreements
schedules
program execution
SLA conformance
workflow
Monitoring must be reliable as it is part of accountability
failure or fulfillment by a service/resource provider must be clearly identifiable
Conclusions for Grid Scheduling
Grids ultimately require coordinated scheduling services.
Support for different scheduling instances
different local management systems
different scheduling algorithms/strategies
For arbitrary resources
not only computing resources, also
data, storage, network, software etc.
Support for co-allocation and reservation
necessary for coordinated grid usage
(see data, network, software, storage)
Different scheduling objectives
cost, quality, other
Scheduling Model
Using a brokerage/trading strategy:
[Figure: two interacting levels. Higher-level scheduling: submit Grid job description, discover resources, query for allocation offers, collect offers, select offers, coordinate allocations; considers individual user policies. Lower-level scheduling: analyze query, generate allocation offer; considers individual owner policies.]
Properties of Multi-Level Scheduling Model
Multi-level scheduling must support different RM systems and strategies.
Providers can enforce individual policies in generating resource
offers.
Users receive resource allocations optimized for their individual objectives.
Different higher-level scheduling strategies can be applied.
Multiple levels of scheduling instances are possible.
Support for fault-tolerant and load-balanced services.
Negotiation in Grids
Multilevel Grid scheduling architecture
Lower level: local scheduling instance
Implementation of owner policies
Higher level: Grid scheduling instance
Resource selection and coordination
(Static) Interface definition between both instances
Different types of resources
Different local scheduling systems with different properties
Different owner policies
(Dynamic) Communication between both instances
Resource discovery
Job monitoring
Using Service Level Agreements
The mapping of jobs to resources can be abstracted using the concept of Service Level Agreements (SLAs) (Czajkowski, Foster, Kesselman & Tuecke)
SLA: Contract negotiated between
resource provider, e.g. local scheduler
resource consumer, e.g. grid scheduler, application
SLAs provide a uniform approach for the client to
specify resource and QoS requirements, while
hiding from the client details about the resources,
such as queue names and current workload
Service Level Agreement Types
Resource SLA (RSLA)
A promise of resource availability
Client must utilize promise in subsequent SLAs
Advance Reservation is an RSLA
Task SLA (TSLA)
A promise to perform a task
Complex task requirements
May reference an RSLA
Binding SLA (BSLA)
Binds a resource capability to a TSLA
May reference an RSLA (i.e. a reservation)
May be created lazily to provision the task
Allows complex resource arrangements
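The relationships between the three SLA types can be captured in a small data model. The field names below are assumptions chosen for illustration; only the three types and the "may reference" links come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RSLA:
    """Promise of resource availability, e.g. an advance reservation."""
    resource: str
    start: int
    end: int


@dataclass(frozen=True)
class TSLA:
    """Promise to perform a task; may reference an RSLA."""
    task: str
    rsla: Optional[RSLA] = None


@dataclass(frozen=True)
class BSLA:
    """Binds a resource capability to a TSLA; may reference an RSLA."""
    tsla: TSLA
    resource: str
    rsla: Optional[RSLA] = None


def start_time_known(tsla: TSLA) -> bool:
    # Only a TSLA backed by a reservation says when the task will run.
    return tsla.rsla is not None


reservation = RSLA(resource="cluster-1", start=8, end=9)
planned = TSLA(task="simulation", rsla=reservation)
unplanned = TSLA(task="post-processing")
binding = BSLA(tsla=planned, resource="cluster-1", rsla=reservation)
```

Here planned carries a start time through its reservation, while unplanned is only a promise that the task will be performed at some point.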
Agreement-Based Negotiation
A client (application) submits a task to a Grid scheduler.
The client negotiates a TSLA for the task with the Grid scheduler.
In order to provision the TSLA, the Grid scheduler may obtain an RSLA with the Grid resource or may use a pre-existing RSLA that the Grid scheduler has negotiated speculatively.
A TSLA that refers to an RSLA assures that the job gets the reserved resources at a specified time.
A TSLA without an RSLA tells little about when the resources will be
available to the job.
GGF: GRAAP-WG
Goal: Defining Web Service-based protocols for negotiation and
agreement management
WS-Agreement Protocol:
Towards Grid Scheduling
Grid Scheduling Methods:
Support for individual scheduling objectives and policies
Multi-criteria scheduling models
Economic scheduling methods for Grids
Architectural requirements:
Generic job description
Negotiation interface between higher- and lower-level scheduler
Economic management services
Workflow management
Integration of data and network management
Grid Scheduling Strategies
Current approach:
Extension of job scheduling for parallel computers.
Resource discovery and load distribution to a remote resource
Usually batch job scheduling model on remote machine
But what is actually required for Grid scheduling is:
Co-allocation and coordination of different resource allocations for a Grid job
Instantaneous ad-hoc allocation is not always suitable
This complex task involves:
Cooperation between different resource providers
Interaction with local resource management systems
Support for reservations and service level agreements
Orchestration of coordinated resource allocations
User Objective
Local computing typically has:
A given scheduling objective such as minimization of response time
Use of batch queuing strategies
Simple scheduling algorithms: FCFS, Backfilling
Grid computing requires:
Individual scheduling objectives
better resources
faster execution
cheaper execution
More complex objective functions apply for individual Grid jobs!
Provider/Owner Objective
Local computing typically has:
Single scheduling objective for the whole system:
e.g. minimization of average weighted response time
or high utilization/job throughput
In Grid computing:
Individual policies must be considered:
access policy,
priority policy,
accounting policy, and others
More complex objective functions apply for individual resource
allocations!
User and owner policies/objectives may be subject to privacy
considerations!
Grid Economics: Different Business Models
Cost model
Use of a resource
Reservation of a resource
Individual scheduling objective functions
User and owner objective functions
Formulation of an objective function
Integration of the function in a scheduling algorithm
Resource selection
The scheduling instances act as brokers
Collection and evaluation of resource offers
Scheduling Objectives in the Grid
In contrast to local computing, there is no general scheduling
objective anymore
minimizing response time
minimizing cost
tradeoff between quality, cost, response time etc.
Cost and different service quality come into play
the user will introduce individual objectives
the Grid can be seen as a market where resources are competing
alternatives
Similarly, the resource provider has individual scheduling policies
Problem:
the different policies and objectives must be integrated in the scheduling
process
different objectives require different scheduling strategies
part of the policies may not be suitable for public exposure
(e.g. different pricing or quality for certain user groups)
Grid Scheduling Algorithms
Due to the mentioned requirements in Grids, it is not to be expected
that a single scheduling algorithm or strategy is suitable for all
problems.
Therefore, there is a need for an infrastructure in which
different scheduling algorithms can be integrated
the individual objectives and policies can be included
resource control stays with the participating service providers
Transition into a market-oriented Grid scheduling model
Economic Scheduling
Market-oriented approaches are a suitable way to implement the interaction of different scheduling layers
agents in the Grid market can implement different policies and strategies
negotiations and agreements link the different strategies together
participating sites stay autonomous
Need for suitable scheduling algorithms and strategies for creating
and selecting offers
need for creating the Pareto-optimal scheduling solutions
Performance relies highly on the available information
negotiation can be a hard task if many potential providers are available.
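Selecting Pareto-optimal offers means discarding every offer that another offer beats on all criteria. A minimal sketch with two criteria, latency and price, where lower is better; the offer tuples and names are invented for illustration.

```python
def pareto_front(offers):
    """Keep the offers that no other offer dominates.

    Each offer is (name, latency, price). An offer is dominated if another
    offer is at least as good in both criteria and strictly better in one.
    """
    front = []
    for name, latency, price in offers:
        dominated = any(
            other_latency <= latency
            and other_price <= price
            and (other_latency < latency or other_price < price)
            for _, other_latency, other_price in offers
        )
        if not dominated:
            front.append((name, latency, price))
    return front


offers = [("A", 10, 5), ("B", 5, 10), ("C", 12, 6)]
front = pareto_front(offers)
```

Offer C is dropped because A is both faster and cheaper; A and B remain, and choosing between them depends on the individual objective.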
Economic Scheduling (2)
Several possibilities for market models:
auctions of resources/services
auctions of jobs
Offer-request mechanisms support:
inclusion of different cost models, price determination
individual objective/utility functions for optimization goals
Market-oriented algorithms are considered:
robust
flexible in case of errors
simple to adapt
markets can have unforeseeable dynamics
Offer Creation (2)
[Figure: schedule of resources R1 to R8 over time t, showing Offer 1 relative to the end of the current schedule.]
Offer Creation (4)
[Figure: schedule of resources R1 to R8 over time t, showing Offer 3 relative to the end of the current schedule and to the new end of the current schedule.]
Evaluate Offers
Evaluation with utility functions
A utility function is a mathematical representation of a user's
preference.
The utility function may be complex and
contain several different criteria.
Example using response time (or delay time) and price:
$$U_{\text{util}} = U_{\max} - (a_1 \cdot \text{latency} + a_2 \cdot \text{price})$$
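A utility function over latency and price can be used to rank offers. The sketch assumes a linear form, a maximum utility minus weighted latency and weighted price, which is one reading of the slide's example; the weights, the maximum, and the offer values are invented.

```python
def utility(latency, price, a1=1.0, a2=1.0, u_max=100.0):
    """Assumed linear utility: higher is better, penalised by latency and price."""
    return u_max - (a1 * latency + a2 * price)


# Hypothetical offers as (latency, price) pairs.
offers = {
    "offer1": (30.0, 10.0),
    "offer2": (10.0, 25.0),
    "offer3": (20.0, 12.0),
}

# A user weighting both criteria equally.
best_balanced = max(offers, key=lambda name: utility(*offers[name]))

# A user who cares three times as much about latency.
best_fast = max(offers, key=lambda name: utility(*offers[name], a1=3.0))
```

With equal weights offer3 wins (utility 68), while the latency-sensitive user prefers offer2 (utility 45), which shows how the same offers rank differently for different users.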
Optimization Space
[Figure: optimization space with the axes latency and price, showing the alternatives A1, A2, and A3.]
The utility depends on the user and provider objectives
Example: NWIRE
The following slides illustrate the scheduling process in the
NWIRE project.
NWIRE models a Grid infrastructure without central services.
A P2P scheduling model is employed where Grid schedulers/managers that reside on each local site are queried for offers.
A Grid scheduler asks other schedulers it knows.
Similar to P2P systems, no Grid manager knows the whole Grid, but
each keeps a list of known other Grid managers.
Those schedulers can forward the request to other schedulers.
The scope and depth of the query can be parameterized.
Individual objective functions are used to evaluate the utility for a user and the resource provider.
The scheduler selects the allocation offer with the highest utility value.
The whole model represents a Grid market and auction system.
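The hop-limited query described above can be sketched as a breadth-first search over the known-manager lists. This is not NWIRE code: the function, the manager names, and the (manager, utility) offer format are assumptions for illustration.

```python
def collect_offers(start, neighbours, local_offers, max_hops):
    """Gather offers from all managers reachable within max_hops forwards.

    neighbours maps a manager to the managers it knows;
    local_offers maps a manager to its list of (manager, utility) offers.
    """
    visited = {start}
    frontier = [start]
    offers = list(local_offers.get(start, []))
    for _ in range(max_hops):
        next_frontier = []
        for manager in frontier:
            for known in neighbours.get(manager, []):
                if known not in visited:
                    visited.add(known)
                    next_frontier.append(known)
                    offers.extend(local_offers.get(known, []))
        frontier = next_frontier
    return offers


neighbours = {"gm1": ["gm2"], "gm2": ["gm3"], "gm3": ["gm4"]}
local_offers = {
    "gm1": [("gm1", 50)],
    "gm2": [("gm2", 70)],
    "gm3": [("gm3", 90)],
    "gm4": [("gm4", 99)],
}

offers = collect_offers("gm1", neighbours, local_offers, max_hops=2)
best = max(offers, key=lambda offer: offer[1])
```

With a limit of two hops the query reaches gm2 and gm3 but never gm4, so the best offer seen is the one from gm3 even though gm4 would have offered more. This is the trade-off of limiting query depth.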
Evaluation of Economic Approach
Real workload data of Grids is not yet available.
However, workloads of single HPC installations are available.
Example of evaluation results:
considered: traces from Cornell Theory Center (430-node IBM RS/6000 SP)
4 workloads adapted from the real traces, each with 10000 jobs
Assumed synthetic Grid machine configurations:
Name    Configuration           Largest Machine
m64     4*64 + 6*32 + 8*8       64
m128    4*128                   128
m256    2*256                   256
m384    1*384 + 1*64 + 4*16     384
m512    1*512                   512
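The configurations in the table can be checked with a few lines. The sketch encodes each configuration as (machine count, nodes per machine) pairs taken from the table and computes the total node count and the largest machine; the variable names are illustrative.

```python
# (number of machines, nodes per machine) for each synthetic configuration.
configs = {
    "m64": [(4, 64), (6, 32), (8, 8)],
    "m128": [(4, 128)],
    "m256": [(2, 256)],
    "m384": [(1, 384), (1, 64), (4, 16)],
    "m512": [(1, 512)],
}

totals = {name: sum(count * nodes for count, nodes in parts)
          for name, parts in configs.items()}
largest = {name: max(nodes for _, nodes in parts)
           for name, parts in configs.items()}
```

Every configuration adds up to 512 nodes, so the configurations differ only in how the same capacity is partitioned into machines.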
Scheduling Process
[Figure: an application sends a request (requirements, job attributes, objective function) to a Grid Manager that sits above three Resource Managers.]
1) Client sends a
request
Example Request
KEY Hops {VALUE HOPS {2}}
KEY MaxOfferNumber {VALUE MaxOfferNumber {10}}
KEY MinMulti-Site {VALUE MinMulti-Site {10}}
...
KEY Utility {
  ELEMENT 1 {
    CONDITION {((OperatingSystem EQ Linux) && (NumberOfProcessors >= 8))}
    VALUE UtilityValue {-EndTime}
    VALUE RunTime {5}
  }
  ...
  ELEMENT 2 {
    CONDITION { ... }
    VALUE UtilityValue {-JobCost}
  }
  ...
}
Scheduling Process
[Figure: the application's request (requirements, job attributes, objective function) reaches its Grid Manager, which forwards requests to other Grid Managers and to the Resource Managers.]
The scheduler requests offers from other Grid Managers/Schedulers.
Search depth and scope can be parameterized.
2) Query for
other offers
3) Limitations of
direction and depth
Scheduling Process
[Figure: offers (attributes, objective function) flow back from the Resource Managers and the other Grid Managers to the Grid Manager that received the application's request.]
Returned offers are collected.
Selection of allocation according to the utility value
4) Allocation;
Maximization of the
objectives
Scheduling Process
[Figure: the Grid Manager returns a schedule (Allocation_1, Allocation_2, Allocation_3) to the application and informs the Resource Managers and the other Grid Managers.]
The user can directly be informed about the schedule for the execution.
The Grid Manager may re-schedule its allocations to optimize the result further.
5) User and Resource
Managers are informed
about allocations
Offer Creation
[Figure: step-by-step animation of offer creation in a schedule; resources 1 to 7 on one axis, time on the other, with jobs A, B, C, and D.]
AWRT for Economic Scheduling Compared to Conservative Backfilling
[Figure: bar chart "Changes in average completion time of market-economic vs conventional scheduler". Axis: changes in percent, from -15 to 10. Categories: resource configurations M128, M256, M384. One direction is marked "conventional algorithms better", the other "market-oriented better".]
Utilization
[Figure: bar chart "Changes in utilization of market-economic vs conventional schedulers". Axis: changes in percent, from -3 to 3. Categories: resource configurations M128, M256, M384. One direction is marked "conventional better", the other "market-oriented better".]
Results on Economic Models
Economic models provide results in the range of conventional
algorithms in terms of AWRT
Economic methods leave much more flexibility in defining the
desired resources
Problems of site autonomy, heterogeneous resources and individual owner and user preferences are solved
Advantage of reservation of resources ahead of time
Further analysis needed:
Usage of other utility functions
Data Management
Access to information about the location of data sets
Information about transfer costs
Scheduling of data transfers and data availability
optimize data transfers with regard to available network bandwidth and
storage space
Coordination with network or other resources
Similarities with general grid scheduling:
access to similar services
similar tasks to execute
interaction necessary
Functional View of Grid Data Management
[Figure: functional view of Grid data management. The application uses a Planner (data location, replica selection, selection of compute and storage nodes) and an Executor (initiates data transfers and computations). The Planner draws on the Metadata Service (location based on data attributes), the Replica Location Service (location of one or more physical replicas), and Information Services (state of grid resources, performance measurements and predictions). Security and policy apply throughout; data movement and data access connect to the compute resources and storage resources.]
Source: Carl Kesselman
Replica Location Service in Context
[Figure: layered data management stack. Labels shown: Replica Consistency Management Services, Reliable Replication Service, Replica Location Service, Reliable Data Transfer Service, Metadata Service, GridFTP.]
The Replica Location Service is one component in a layered data management architecture
Provides a simple, distributed registry of mappings
Consistency management provided by higher-level services
Source: Carl Kesselman
Example of a Scheduling Process
Scheduling Service:
1. receives job description
2. queries Information Service for static resource information
3. prioritizes and pre-selects resources
4. queries for dynamic information about resource availability
5. queries Data and Network Management Services
6. generates schedule for job
7. reserves allocation if possible, otherwise selects another allocation
8. delegates job monitoring to Job Supervisor
Job Supervisor / Network and Data Management: service, monitor and initiate allocation
Example:
40 resources of requested type are found.
12 resources are selected.
8 resources are available.
Network and data dependencies are detected.
Utility function is evaluated.
6th tried allocation is confirmed.
Data/network provided and job is started.
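The scheduling steps above can be sketched as a filtering and ranking pipeline. Everything here is hypothetical: the resource records, the callbacks, and the synthetic data are chosen only so that the counts match the example (40 found, 12 pre-selected, 8 available, 6th allocation confirmed). Steps 5 and 8 are omitted.

```python
def schedule(job, resources, is_available, utility, try_reserve, preselect=12):
    """Return (chosen resource, number of reservation attempts)."""
    # Step 2: static match against the requested resource type.
    matching = [r for r in resources if r["type"] == job["type"]]
    # Step 3: prioritize and pre-select.
    shortlist = sorted(matching, key=lambda r: r["priority"], reverse=True)[:preselect]
    # Step 4: keep only resources that are currently available.
    available = [r for r in shortlist if is_available(r)]
    # Step 6: rank candidate allocations by utility.
    ranked = sorted(available, key=utility, reverse=True)
    # Step 7: try to reserve, falling back to the next allocation.
    for attempt, resource in enumerate(ranked, start=1):
        if try_reserve(resource):
            return resource, attempt
    return None, len(ranked)


# Synthetic data: 40 clusters plus 5 resources of another type.
resources = [{"id": i, "type": "cluster", "priority": i} for i in range(40)]
resources += [{"id": 100 + i, "type": "storage", "priority": i} for i in range(5)]

chosen, attempts = schedule(
    job={"type": "cluster"},
    resources=resources,
    is_available=lambda r: r["id"] % 3 != 0,   # assumed availability pattern
    utility=lambda r: r["id"],                 # assumed utility function
    try_reserve=lambda r: r["id"] <= 31,       # assumed: better offers are taken
)
```

In this synthetic run the first five reservation attempts fail and the sixth is confirmed, mirroring the example.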
Use-Case Example: Coordinated Simulation and Visualization
Expected output of a Grid scheduler:
[Figure: Gantt chart over time with one row per resource. Resources: Network 1, Computer 1, Computer 2, Network 3, VR-Cave, Data, Network 2, Software License, Storage. Activities shown: data transfer, loading data, parallel computation, providing data, communication for computation, visualization, data access, storing data, communication for visualization, software usage, data storage.]
Re-Scheduling
Reconsidering a schedule with already made agreements may be a
good idea from time to time because
the resource situation may have changed, or
the workload situation has changed
Optimization of the schedule can only work within the bounds of made
agreements and reservations
given guarantees must be observed
The schedulers can try to maximize the utility values of the overall schedule
a Grid scheduler may negotiate with other resource providers in order to get better agreements; it may cancel previous agreements
a local scheduler may optimize the local allocations to improve the schedule.
Conclusion
Resource management and scheduling is a key service in a Next
Generation Grid.
In a large Grid the user cannot handle this task.
Nor is the orchestration of resources a provider task.
System integration is complex but vital.
The local systems must be enabled to interact with the Grid.
Providing sufficient information, exposing services for negotiation
Basic research is still required in this area.
No ready-to-implement solution is available.
New concepts are necessary.
Current efforts provide the basic Grid infrastructure. Higher-level services such as Grid scheduling are still lacking.
Future RMS systems will provide extensible negotiation interfaces
Grid scheduling will include coordination of different resources
Further Outlook
Commercial interest in the Grid may have a stronger focus on non-HPC-related aspects
eCommerce solutions
Dynamically migrating application server load
Inter-operation between companies
Supply-chain management
request distributed resources under certain constraints
Enterprise Application Integration (EAI)
dynamically discover and use available services
The problems tackled by Grid RMS are common to many application
fields.
Grid solutions may become a key technology infrastructure,
or solutions from other areas (e.g. the Web Service community) may surpass Grid efforts?
References
Book: Grid Resource Management: State of the Art and Future Trends, co-editors Jarek Nabrzyski, Jennifer M. Schopf, and Jan Weglarz, Kluwer Publishing, 2004
PBS, PBS pro: www.openpbs.org and www.pbspro.com
LSF, CSF: www.platform.com
Globus: www.globus.org
Global Grid Forum: www.ggf.org, see SRM area
Contact:
Computer Engineering Institute
University of Dortmund