
Local Resource Management System & State Estimation

Local resource management systems: Condor, Maui, LSF, PBS

Prediction techniques (example: NWS) to improve resource selection

Condor - Introduction

Batch job system that allows usage of both dedicated and non-dedicated systems.

Provides users with extra computing power

Introduces complexities: jobs may be removed before they are finished (preemption), and jobs run on a wide array of machines (matchmaking)

Condor – Preemptive Resume Scheduling

Advantages:

Use of resources that are only occasionally available, through checkpoints, preemption and allocation

No backfilling (taking advantage of holes in the schedule to run more jobs and thereby increase efficiency)

Fair sharing among jobs and towards users

Compute on demand (low vs. high priority)

Condor – Scheduling

Submit jobs to the local computer's queue

Interact with the matchmaker to run a job (1 CPU per job)

Run the appropriate (ClassAd) job by claiming it

Triumvirate

User agent – make sure job finishes, on failure resubmit, etc.

Owner agent – ensure owner's policy of how computer is used, responsible for running submitted jobs

Matchmaker – find matches between user and owner agent and implement system-wide policies

Triumvirate (2)

Condor – Matchmaking & Claiming

User submits a job to the queue, where it receives a unique identification

User agent sends a ClassAd (every 5 min) for as long as there are jobs that are not running

Owner agent sends a ClassAd (every 5 min) to describe the computer it is responsible for

Matchmaker accepts ClassAds and attempts to find matches (negotiation)

On a match, user agent and owner agent work out the details independently of the matchmaker (using up-to-date information)

User agent sends the job to the owner agent, and it runs
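
The matching step above can be sketched as a two-sided check: a job and a machine match only if each side's requirements accept the other's attributes, and the job's rank picks among the acceptable machines. This is a minimal illustration, not Condor's ClassAd language; the `Advert` class, the attribute names and the `match` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Advert:
    """Stand-in for a ClassAd (hypothetical, not Condor's real syntax)."""
    attrs: dict
    requirements: Callable[[dict], bool]  # must hold for the other party's attributes
    rank: Callable[[dict], float]         # preference among acceptable partners


def match(job: Advert, machines: list) -> Optional[Advert]:
    """Return the mutually acceptable machine that the job ranks highest."""
    candidates = [
        m for m in machines
        if job.requirements(m.attrs) and m.requirements(job.attrs)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job.rank(m.attrs))


job = Advert(
    attrs={"owner": "alice", "memory_mb": 512},
    requirements=lambda m: m["arch"] == "x86_64" and m["memory_mb"] >= 512,
    rank=lambda m: m["mips"],
)
machines = [
    Advert({"name": "a", "arch": "x86_64", "memory_mb": 1024, "mips": 900},
           requirements=lambda j: j["owner"] != "mallory", rank=lambda j: 0.0),
    Advert({"name": "b", "arch": "x86_64", "memory_mb": 2048, "mips": 1500},
           requirements=lambda j: j["memory_mb"] <= 256, rank=lambda j: 0.0),
    Advert({"name": "c", "arch": "sparc", "memory_mb": 4096, "mips": 2000},
           requirements=lambda j: True, rank=lambda j: 0.0),
]
chosen = match(job, machines)  # machine "a": "b" rejects the job, "c" has the wrong arch
```

Machine "b" is faster, but its owner policy rejects the job, which shows why both sides have to agree before a claim is made.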

Condor – Matchmaking & Claiming (2)

On problems outside the process, redo matchmaking; on a program error, record the problem and inform the user

When the program starts, another process (the shadow) is started on the user agent's side; it is responsible for Condor’s remote I/O capabilities

Running jobs continue even if matchmaker fails

Condor - preemption

Preemption is necessary to respect interests of all parties

Key to success is checkpoint creation: when a job is preempted from a machine; manual checkpoint creation; periodic checkpoint creation to safeguard against failures

Crashes and disruptions happen frequently in grids

Checkpointing and reacting to preemptions are an essential part of Condor’s approach to reliability.
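
The idea of periodic checkpoints plus a checkpoint on preemption can be sketched at application level as follows. This is only an illustration under stated assumptions: Condor checkpoints whole processes, whereas this hypothetical `run_with_checkpoints` function saves its own loop state to a JSON file, and the `stop_after` parameter merely simulates a preemption.

```python
import json
import os
import tempfile


def save(state, path):
    # Write to a temporary file and rename it, so a crash never leaves
    # a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_with_checkpoints(total_steps, ckpt_path, every=100, stop_after=None):
    """Sum 0..total_steps-1, resuming from a checkpoint if one exists.

    Returns the final sum, or None if the run was (simulated as) preempted.
    """
    state = {"step": 0, "acc": 0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)

    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            save(state, ckpt_path)      # checkpoint when preempted
            return None
        state["acc"] += state["step"]
        state["step"] += 1
        done_this_run += 1
        if state["step"] % every == 0:
            save(state, ckpt_path)      # periodic checkpoint against failures
    return state["acc"]


ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
first = run_with_checkpoints(1000, ckpt, stop_after=250)  # preempted after 250 steps
second = run_with_checkpoints(1000, ckpt)                 # resumes at step 250
```

The second call does not repeat the first 250 steps; it picks up from the saved state, as a job would after being rescheduled on another machine.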

Condor – user preemption

Manual preemption

Automation of the above process (e.g. based on running time)

Preemption on behalf of Condor: e.g. check whether a job can run on a better machine; not supported in the current version of Condor; needs consideration of issues such as ‘thrashing’ (always looking for a better computer without getting any job done)

Condor – owner / matchmaker preemption

Owner removes a job running on his machine: automated by Condor (e.g. check keyboard inactivity), or manually by running a command

Matchmaker can enforce administrator policies to increase efficiency: e.g. run a better job on a machine already running one; however, Condor strongly prefers not to preempt jobs if they can be run on an idle machine.

Condor - conclusion

Condor can balance the desires of all stakeholders

Condor can both take advantage of sporadically available resources and react to problems such as failures

This flexibility and robustness are the key to its success

Maui Scheduler - Introduction

High-performance scheduler for local clusters

Includes resource reservation, availability estimation and allocation management

External manager: extends and enhances the capabilities and performance of an existing scheduler

Maui – Allocation properties

Concept of reservation to maintain resource allocations: the most important feature is future allocations; sets aside a block of resources for purposes such as cluster maintenance or a guaranteed job start time

Resource expression: resource quantity and type conditions that must be met for resources to be included

Access control list (ACL): which consumers may utilize the reserved resources

Timeframe: time period over which the reservation actually blocks resources
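
The three parts of a reservation listed above (resource expression, ACL, timeframe) can be pictured as a small data structure with a blocking test. This is a hedged sketch, not Maui's configuration format or API: the `Reservation` class, the simplification of the resource expression to a set of node names, and the `can_run` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    name: str
    nodes: frozenset   # resource expression, simplified to a set of node names
    acl: frozenset     # consumers allowed to use the reserved resources
    start: int         # timeframe start, in arbitrary time units
    end: int           # timeframe end (exclusive)

    def blocks(self, user, node, t0, t1):
        """True if this reservation keeps `user` off `node` during [t0, t1)."""
        overlaps = t0 < self.end and self.start < t1
        return node in self.nodes and overlaps and user not in self.acl


def can_run(user, nodes, t0, t1, reservations):
    """A job may run if no reservation blocks any of the nodes it asks for."""
    return not any(r.blocks(user, n, t0, t1) for r in reservations for n in nodes)


maintenance = Reservation(
    "maintenance", frozenset({"n1", "n2"}), frozenset({"admin"}), start=100, end=200
)
reservations = [maintenance]

blocked = can_run("alice", {"n1"}, 150, 160, reservations)      # inside the window
other_node = can_run("alice", {"n3"}, 150, 160, reservations)   # node not reserved
afterwards = can_run("alice", {"n1"}, 200, 250, reservations)   # window has expired
on_acl = can_run("admin", {"n1"}, 150, 160, reservations)       # admin is on the ACL
```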

Maui – Allocation properties (2)

Revocation of allocation: support for revocable and irrevocable reservations (e.g. for strict time constraints on data availability or job completion); default is irrevocable, i.e. reservations are maintained until the timeframe has expired or they are explicitly removed

Guaranteed completion time of allocations: locked to an exact time, guaranteed to complete before a certain time, or guaranteed to start after a given time; the scheduler regularly tries to optimize

Guaranteed number of attempts to complete a job: don’t attempt to start a job until all prerequisites are met; using the defer mechanism, Maui can specify how many times to try to locate resources for a job before giving up or putting it on hold

Allocation run-to-completion: can be configured to disable all or a subset of preemptions, thus guaranteeing that a job completes without interference

Exclusive allocations: request dedicated resources to guarantee exclusive access

Malleable allocations: all aspects can be dynamically modified; if a job consumes excessive resources, Maui can preempt or even cancel the job, depending on the resource utilization policy

Maui - Access to available scheduling info

Access to the tentative schedule: provides information on all possible availability times; a scheduler can request a single estimated start time for a job

Exclusive control: Maui maintains exclusive control over the execution

Event notification: generalized event management interface; responds immediately to changes in the environment

Maui – Requesting resources

Allocation offers: full contextual information regarding the request, and whether and how Maui can satisfy it

Allocation cost or objective information: interfaces with allocation management systems that help assign costs to resource consumption

Advance reservation: allows peers full control over the scheduling of jobs through time

Requirement for providing maximum allocation time in advance: credential-based walltime limits can be configured based on various criteria

Maui – Requesting resources (2)

Deallocation policy: support for single-step resource allocation requests, which create a resource allocation valid until job completion; two-phase courtesy reservation, where after the courtesy reservation is sent a reservation commit must be received, otherwise the job is removed

Remote co-scheduling: stage remote jobs to a local cluster

Consideration of job dependencies: basic job dependency support to block certain job steps until specific prerequisites are met
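
The two-phase courtesy reservation mentioned above can be sketched as a table of tentative reservations that are dropped unless a commit arrives. The class name, the `commit_window` parameter and the use of explicit timestamps are assumptions for illustration; the slides do not say how long Maui waits for the commit.

```python
class CourtesyReservations:
    """Tentative reservations that expire unless committed in time (hypothetical)."""

    def __init__(self, commit_window):
        self.commit_window = commit_window
        self.pending = {}       # job id -> time by which a commit must arrive
        self.committed = set()

    def reserve(self, job_id, now):
        """Phase 1: hand out a courtesy reservation."""
        self.pending[job_id] = now + self.commit_window

    def commit(self, job_id, now):
        """Phase 2: confirm the reservation; fails if unknown or too late."""
        deadline = self.pending.pop(job_id, None)
        if deadline is None or now > deadline:
            return False
        self.committed.add(job_id)
        return True

    def expire(self, now):
        """Remove jobs whose commit never arrived and return their ids."""
        late = [j for j, d in self.pending.items() if now > d]
        for j in late:
            del self.pending[j]
        return late


table = CourtesyReservations(commit_window=10)
table.reserve("a", now=0)
table.reserve("b", now=0)
a_ok = table.commit("a", now=5)      # commit arrives within the window
dropped = table.expire(now=11)       # "b" was never committed
b_ok = table.commit("b", now=12)     # too late, the job is already gone
```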

Maui – Manipulating the allocation execution

Preemption: suspend operations are supported as far as that capability is available in the underlying manager

Checkpointing: ‘checkpoint and terminate’ and ‘checkpoint and continue’ are supported

Migration: support for intra-domain job migration, but no support for QoS, load balancing, or other optimization

Restart: checkpoints are used if available

LSF - Introduction

Load Sharing Facility (LSF), examined as a low-level scheduler

LSF – Available-information attributes

Access to the tentative schedule: often impractical in real-world applications; no support

Exclusive control: LSF executes in user space, so its control is not exclusive and it can only provide the necessary measures

Event notification: supplies an event-notification service for high-level schedulers

LSF – Requesting resources

Allocation offers: doesn’t expose potential resource allocations

Allocation cost or objective information: unsupported

Advance reservation: provides built-in and Maui-integrated capabilities

Requirement for providing maximum allocation time in advance: high regard

LSF – Requesting resources (2)

Deallocation policy: automatic

Remote co-scheduling: supported by higher-order scheduling instances

Consideration of job dependencies: built-in support for job dependencies through logical expressions based on 15 dependency conditions
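
A dependency given as a logical expression over conditions can be pictured as a small set of composable predicates. This sketch does not reproduce LSF's own dependency syntax or its 15 conditions; the state strings and the names `done`, `exited`, `ended`, `all_of`, `any_of` and `negate` are illustrative only.

```python
# Job states are plain strings here; names and values are illustrative.
states = {"prep": "DONE", "fetch": "EXIT", "sim": "RUN"}


def done(job):
    return lambda s: s[job] == "DONE"


def exited(job):
    return lambda s: s[job] == "EXIT"


def ended(job):
    return lambda s: s[job] in ("DONE", "EXIT")


def all_of(*conds):
    return lambda s: all(c(s) for c in conds)


def any_of(*conds):
    return lambda s: any(c(s) for c in conds)


def negate(cond):
    return lambda s: not cond(s)


# "prep finished successfully, fetch finished either way, sim still going"
ready = all_of(
    done("prep"),
    any_of(done("fetch"), exited("fetch")),
    negate(ended("sim")),
)

can_dispatch = ready(states)
```

Each condition is a function of the current job states, so the scheduler can re-evaluate the same expression whenever a job changes state.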

LSF – Allocation properties

Revocation of allocation: not needed because of resource shortage, etc.

Guaranteed completion time of allocations

LSF – Allocation properties (2)

Guaranteed number of attempts to complete a job: distinguishes between attempts related to an execution pre-condition and those related to an execution condition, with complete flexibility

Allocation run-to-completion: with the implicit assumption that allocations don’t exceed resource limits, for example

Exclusive allocations: can dispatch jobs to hosts where no other LSF job is running

LSF – Allocation properties (3)

Malleable allocations: built-in mechanisms allow allocations to decay consumption over time on a per-resource basis

LSF – Manipulating the allocation execution

Preemption: supported since 1995; preempted workloads retain resources

Checkpointing: assuming the application supports it, LSF provides an interface

Migration: provides a mechanism, to be driven by a high-level scheduler

Restart: provides an interface

LSF - Conclusion

Supports most attributes of a low-level scheduler that can be exploited by a high-level scheduler

PBS – Introduction

Portable Batch System

Flexible workload management and batch job scheduling system

Covers the entire Grid computing space: security, information, compute and data

Middleware technology that sits between compute-intensive or data-intensive applications and the network, hardware and OS

All jobs go to a single virtual pool, which is scheduled and distributed on the grid

PBS – Security

Fundamental capabilities are secure authentication and authorization

Internally it makes use of user-name based authentication

Support for X.509 Grid standard identification: certificate lifetime (expire/renew)

Identity mapping between sites is handled by a mapping function

PBS - Information

Information management with access to the state of the infrastructure

Collect real-time data on state with job executor daemon process (MOMs)

Easy integration with larger Grid information databases

PBS - Compute

Advance reservation support: checks for conflicts; e.g. reserve resources for a car-crash test, including computer cycles, network, database, facility

Cycle harvesting: expand available computing resources by using idle workstations

Peer scheduling: enables a site or sites with different PBS installations to automatically run jobs from each other; no job will be moved if it cannot run immediately

PBS - Data

Most basic capability of a data Grid: file staging

Automatic handling of copying files onto execution nodes (stage-in) prior to running the job

Copying files off execution nodes (stage-out) after the job completes

PBS will not run jobs until stage-in is fully done

Support for Globus Toolkit, scp, GridFTP, etc.

PBS – Available-information attributes

Access basic information by typing qstat

Email notification

PBS – Requesting resources

Single resource solution to a job request

Estimated completion time is configurable; absence of this information, however, hampers performance (it is needed by backfilling, for example)

Job dependencies

Co-scheduling by simply configuring the queues of the system

PBS – Allocation properties

Revoke any allocation, both while the job is queued and while it is running

Preemption by the scheduler is also possible, with a choice of suspension, checkpointing, requeuing or termination

Configurable job completion attempts

Configurable exclusive allocation, etc.

No support for malleable allocation (e.g. addition or revocation of resources during runtime)

PBS - Manipulating the allocation execution

Support for requeue and restart

On preemption: checkpoint generation and migration

Prediction techniques

The problems of scheduling and resource allocation are central to Grid performance

Applications must balance performance against the communication overhead that parallelism produces

Grid resources differ widely in performance

A resource allocator must choose the right combination of resources from a pool that is constantly changing

Prediction techniques (2)

Categorization into static and dynamic performance characteristics, based on speed of change

Static: CPU clock speed, for example

Dynamic: CPU load, network throughput

Grid resource performance prediction

For a grid scheduler, two characteristics can be exploited to overcome the complexities introduced by the dynamics of Grid performance response

Observable Forecast Accuracy: predictions of future performance measurements can be evaluated by recording their accuracy once the measurements are actually gathered

Near-term Forecasting Epochs: the scheduler can make decisions dynamically, just before execution begins; since accuracy usually degrades further into the future, decisions are made at the last possible moment

Prediction – an example (NWS)

Provides 3 fundamental functionalities: monitoring, forecasting, reporting

NWS (Network Weather Service): grid monitoring and forecasting tool designed to support dynamic resource allocation and scheduling; sensor control subsystem; historical data for future performance prediction; multiple reporting interfaces; convenient methodology for replication and caching

Prediction – an example (NWS) (2)

Performance monitoring and forecasting system must be able to execute on all platforms available to the user: written in C, for highest portability with standard libraries

Two types of monitors (CPU probe)

Passive: reads a measurement gathered through some other means (e.g. the local OS), such as the UNIX load average; non-intrusive, but possibly inaccurate

Active: loads its own resource and observes the performance response; knows the exact performance, but is intrusive

Intrusiveness vs. scalability (network probe): probes the network by timing packet travel duration; with more hosts, probe collisions will occur, resulting in loss of bandwidth; NWS uses a token-passing method to prevent such problems
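
The token-passing idea can be sketched as a ring in which only the current token holder is allowed to probe, so that probes from different hosts never overlap. This illustrates the general principle only, not the actual NWS protocol; `probe_schedule` and the host names are hypothetical.

```python
from collections import deque


def probe_schedule(hosts, rounds):
    """Yield (holder, target) probes; only the token holder may probe."""
    ring = deque(hosts)
    for _ in range(rounds * len(hosts)):
        holder = ring[0]
        for target in hosts:
            if target != holder:
                yield holder, target   # the holder's probes run one after another
        ring.rotate(-1)                # pass the token to the next host in the ring


schedule = list(probe_schedule(["a", "b", "c"], rounds=1))
```

Because the schedule is a single sequence, no two hosts measure at the same time, at the cost of each host probing less often as the number of hosts grows.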

Forecasting is an inherent problem of prediction: assumptions are made about how resources will perform when the job runs; in Grid settings, available resource performance can fluctuate dynamically

NWS uses statistical methods to attempt to mechanize and automate forecasting based on historical data
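
One way to automate forecasting from historical data is to run several simple predictors over the measurement history, score each by its accumulated error, and forecast with whichever has done best so far. The sketch below assumes three illustrative predictors (last value, running mean, median of the last five measurements); it is not the set of methods NWS actually ships.

```python
from statistics import mean, median

# Candidate predictors: each maps a non-empty history to a next-value guess.
PREDICTORS = {
    "last": lambda h: h[-1],
    "mean": lambda h: mean(h),
    "median5": lambda h: median(h[-5:]),
}


def forecast(history):
    """Return (name, prediction) from the predictor with the lowest
    cumulative absolute error over `history`."""
    errors = {name: 0.0 for name in PREDICTORS}
    for i in range(1, len(history)):
        past, actual = history[:i], history[i]
        for name, predictor in PREDICTORS.items():
            errors[name] += abs(predictor(past) - actual)
    best = min(errors, key=errors.get)
    return best, PREDICTORS[best](history)


# CPU availability samples with one outlier (made-up numbers).
history = [10, 10, 10, 50, 10, 10, 10]
best_name, prediction = forecast(history)
```

With this history the median-based predictor wins, since it is the least disturbed by the single spike.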

Prediction - Conclusions

Effective resource allocation and scheduling are critical to performance

Immediate performance history data is used to make implicit prediction

To be truly effective the performance gathering system must be robust, portable and non-intrusive

Overhead introduced by the performance-gathering system must be carefully controlled

Using fast, robust techniques it is possible to improve accuracy of performance predictions

Improve resource selection with prediction

Run time predictions: statistical analysis of applications that have already run; automatic code analysis or instrumentation

Explanation of two techniques, both using statistical data, with information provided to the scheduler when a job is run

Categorization prediction technique

Derive run time predictions from historical information based on previous similar runs

Many ways to look at similar applications: application name, user, arguments, submission time, etc.

Use of a genetic algorithm to identify good templates (e.g. user+time) for a given workload

Uses a mean prediction type

Results show an average error of 39%
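
Given a template, the categorization technique reduces to grouping past jobs by the template's attributes and predicting the mean run time of the matching group. The sketch below fixes the template by hand and leaves out the genetic-algorithm search; `build_predictor`, the attribute names and the sample jobs are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def build_predictor(history, template):
    """history: list of job dicts with a 'runtime' key.
    template: tuple of attribute names that define 'similar' jobs."""
    groups = defaultdict(list)
    for job in history:
        groups[tuple(job[a] for a in template)].append(job["runtime"])

    def predict(job):
        similar = groups.get(tuple(job[a] for a in template))
        return mean(similar) if similar else None   # None: no similar job seen

    return predict


history = [
    {"user": "alice", "app": "sim", "runtime": 100},
    {"user": "alice", "app": "sim", "runtime": 140},
    {"user": "bob", "app": "sim", "runtime": 300},
]

by_user_app = build_predictor(history, ("user", "app"))
by_app = build_predictor(history, ("app",))

narrow = by_user_app({"user": "alice", "app": "sim"})   # mean of alice's two runs
unknown = by_user_app({"user": "carol", "app": "sim"})  # no matching category
broad = by_app({"user": "carol", "app": "sim"})         # mean over all "sim" runs
```

A narrow template gives tighter categories but more jobs without any match, which is the trade-off a template search has to balance.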

Instance-based learning approach

Also called locally-weighted learning techniques

A database of experiences is maintained and used for predictions: each entry consists of input and output features; the input is the condition under which the experience was observed; the output describes what happened under those conditions

Use genetic algorithm to find values that minimize prediction error

Error rate of 49%
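
A minimal form of locally weighted prediction averages the outputs of stored experiences, giving more weight to those whose inputs are close to the query. The Gaussian kernel, the Euclidean distance and the `bandwidth` parameter below are assumptions; the slides do not specify them, and the genetic-algorithm tuning of such values is not shown.

```python
import math


def predict(experiences, query, bandwidth=1.0):
    """experiences: list of (input_vector, output) pairs.
    Returns a kernel-weighted average of the stored outputs."""
    num = den = 0.0
    for x, y in experiences:
        d2 = sum((a - b) ** 2 for a, b in zip(x, query))   # squared distance
        w = math.exp(-d2 / (2 * bandwidth ** 2))            # nearer means heavier
        num += w * y
        den += w
    return num / den if den > 0 else None


# Inputs: (requested nodes,); outputs: observed run time (made-up numbers).
experiences = [((1.0,), 10.0), ((3.0,), 30.0)]

midway = predict(experiences, (2.0,))                     # equally far from both
near_first = predict(experiences, (1.0,), bandwidth=0.1)  # dominated by first entry
```

A small bandwidth makes the prediction follow the nearest experience; a large one approaches the plain mean of all outputs.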

Queue wait time predictors

Request to execute a job is not serviced immediately but put on a queue

Predictions of wait times are useful for such systems: guide the user to select an appropriate queue; submit multiple requests so that they receive resources simultaneously; plan other activities in supercomputer environments

Scheduling algorithms

Two methods are examined

Predict the execution time of each application in the system and use this to drive a simulation of the scheduling algorithm: has the potential to provide very accurate predictions; if queue items depend on items not yet submitted to the queue, accuracy drops; requires detailed knowledge of the scheduling system used

Predict the wait time based on the wait times of applications that were in a similar scheduler state, e.g. how long will it take if I have 3 jobs before me and 4 after?
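
The first method can be sketched by simulating the scheduler with predicted run times. The example assumes strict first-come-first-served order on a machine with a fixed number of nodes, and it ignores jobs that arrive later, which is the source of inaccuracy noted above; `fcfs_wait_times` and its input format are hypothetical.

```python
import heapq


def fcfs_wait_times(total_nodes, running, queued):
    """running: list of (nodes, remaining_time) for jobs already executing.
    queued: list of (nodes, predicted_runtime) in arrival order.
    Returns the predicted wait time of each queued job under strict FCFS."""
    free = total_nodes - sum(n for n, _ in running)
    finish = [(t, n) for n, t in running]   # min-heap of (finish_time, nodes)
    heapq.heapify(finish)
    now, waits = 0.0, []
    for nodes, runtime in queued:
        if nodes > total_nodes:
            raise ValueError("job requests more nodes than the machine has")
        while free < nodes:                 # advance time until enough nodes free up
            now, released = heapq.heappop(finish)
            free += released
        waits.append(now)
        free -= nodes
        heapq.heappush(finish, (now + runtime, nodes))
    return waits


waits = fcfs_wait_times(
    total_nodes=4,
    running=[(2, 10.0), (2, 5.0)],
    queued=[(2, 20.0), (4, 1.0), (1, 3.0)],
)
```

Here the third job waits behind the 4-node job even though a node is idle earlier; a scheduler that fills such holes would start it sooner, so the simulation has to mirror the real scheduling algorithm to be useful.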

Scheduling Algorithms (2)

FCFS, LWF and conservative backfill

First Come First Serve (FCFS): jobs are started in order of arrival

Least Work First (LWF): tries jobs in turn like FCFS, but ordered by estimated amount of work

Conservative backfill (CF): a variant of FCFS that allows a job to run earlier than it otherwise would, provided it does not delay jobs waiting ahead of it in the queue

Results show that FCFS is most accurately predicted, followed by backfill and LWF.

Both methods are affected by not knowing what applications will be submitted in the near future

Scheduling

Scheduling using run time predictions: use application execution times for scheduling; measure utilization and wait time; improves backfill and LWF, with minimal impact on utilization but a 25% decrease in mean wait time

Scheduling with advance reservations: some applications want resources from multiple parallel computers to execute; non-restartable applications are forced to use maximum wait times as predictions when scheduling; even without reservations, performance can be increased with more accurate run time predictions
