CSE 160/Berman Grid Computing 2 (http://www.globus.org, http://www.cs.virginia.edu/~legion/, http://www.cs.wisc.edu/condor/) (thanks to shava and holly)


  • Slide 1
  • CSE 160/Berman Grid Computing 2 http://www.globus.org http://www.cs.virginia.edu/~legion/ http://www.cs.wisc.edu/condor/ (thanks to shava and holly [see notes for CSE 225])
  • Slide 2
  • CSE 160/Berman Outline. Today: Condor, Globus, Legion. Next class: talk by Marc Snir, architect of IBM's Blue Gene, Tuesday June 6, AP&M 4301, 1:00-2:00.
  • Slide 3
  • CSE 160/Berman Condor. Condor is a high-throughput scheduler. The main idea is to leverage free cycles on very large collections of privately owned, non-dedicated desktop workstations. The performance measure is throughput of jobs: rather than how fast a particular job can run, how many jobs can complete over a long period of time. Developed by Miron Livny et al. at U. of Wisconsin.
  • Slide 4
  • CSE 160/Berman Condor Basics. Condor = hunter of idle workstations. A Condor pool consists of a large number of privately controlled UNIX workstations (Condor now being ported to NT). WS owners define the conditions under which the WS can be allocated by Condor to an external user. External Condor jobs run while machines are idle. The user does not need a login on participating machines; the job uses remote system calls back to the submitting WS (a minimal submit example is sketched below).
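    A minimal submit description file, as a hedged sketch (the file and executable names are hypothetical): the job is handed to the local Schedd with condor_submit and runs on whichever pool machine Condor allocates, with no login needed there.

        # my_sim.submit -- illustrative submit description file
        universe   = vanilla        # plain job; the standard universe adds checkpointing
        executable = my_sim         # program to run on the allocated workstation
        arguments  = -n 1000
        output     = my_sim.out     # stdout is shipped back to the submitting WS
        error      = my_sim.err
        log        = my_sim.log     # Condor's record of matches, evictions, completions
        queue                       # place one instance of the job in the queue

        condor_submit my_sim.submit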
  • Slide 5
  • CSE 160/Berman Condor Architecture (all machines in same Condor Pool). Each WS runs Schedd and Startd daemons. Startd monitors and terminates jobs assigned by the CM; Schedd queues jobs submitted to Condor at that WS and seeks resources for them. The Central Manager (CM) WS controls allocation and execution for all jobs. (A hedged command-line illustration follows.) [Diagram: Submission Machine (Schedd, Shadow), Central Manager, Execution Machine (Startd, Starter, User Process)]
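    A hedged illustration of how these daemons appear to users: condor_status queries the Central Manager for the Startds currently advertising in the pool, while condor_q lists the jobs queued by the Schedd on the local submit machine.

        condor_status     # machines (Startds) known to the Central Manager, with state and load
        condor_q          # jobs queued by the Schedd on this submission machine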
  • Slide 6
  • CSE 160/Berman Standard Condor Protocol (all machines in same Condor Pool). (1) The Schedd on the submitting machine sends the job context to the CM, and the execution machine sends its machine context to the CM. (2) The CM identifies a match between the job requirements and the execution machine's resources. (3) The CM sends the execution machine ID to the Schedd. (4) The Schedd forks a Shadow process on the submission machine. (5) The Shadow passes the job requirements to the Startd on the execution machine and gets an acknowledgement that the execution machine is still idle. (6) The Shadow sends the executable to the execution machine, where it executes until completion or migration. [Diagram: Submission Machine (Schedd, Shadow), Central Manager, Execution Machine (Startd, Starter, User Process)]
  • Slide 7
  • CSE 160/Berman More Condor Basics. Participating Condor machines are not required to share file systems. No source code changes to the user's code are required to use Condor, but users must re-link their program in order to use checkpointing and migration (vanilla jobs vs. condor jobs; a relinking sketch follows below). Condor jobs are allocated to a good target resource using a matchmaker. Single condor jobs are automatically checkpointed, migrated between WSs, and restarted as needed.
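    A sketch of the relink step mentioned above (file names are hypothetical): no source changes are needed, but linking through condor_compile produces a standard-universe executable that Condor can checkpoint and migrate; a normally linked binary can still be submitted, but only as a vanilla job.

        # relink object files against the Condor library to enable checkpoint/migration
        condor_compile gcc -o my_sim my_sim.o utils.o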
  • Slide 8
  • CSE 160/Berman Condor Remote System Call Strategy. The job must be able to read and write files on its submit workstation. [Diagram: before allocation, the submitted process and its files reside on the Submission WS; after allocation, the process runs on the Execution WS while a Shadow process on the Submission WS handles its file access via remote system calls.]
  • Slide 9
  • CSE 160/Berman Condor Matchmaking. The matchmaking mechanism matches job specs to machine characteristics, and is done using classads. Resources produce resource offer ads, which include information such as available RAM, CPU type and speed, virtual memory size, physical location, current load average, etc. Jobs provide a resource request ad, which defines the required and desired set of resources to run on. Condor acts as a broker which matches and ranks resource offer ads against resource request ads, making sure that all requirements in both ads are satisfied. Priorities of users and certain types of ads are also taken into consideration. (A hedged classad sketch follows below.)
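    A hedged sketch of the two ads being matched (attribute values are made up). The machine's resource offer ad advertises its characteristics and the owner's policy; the job's resource request ad states Requirements that must hold and a Rank used to order acceptable machines.

        # resource offer ad (published for the machine by its Startd)
        Arch         = "INTEL"
        OpSys        = "LINUX"
        Memory       = 128                 # MB of RAM
        LoadAvg      = 0.05
        Requirements = LoadAvg < 0.3       # owner policy: accept jobs only when nearly idle
        Rank         = 0

        # resource request ad (from the job)
        Requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (Memory >= 64)
        Rank         = Memory              # among acceptable machines, prefer more RAM

    The matchmaker pairs an offer with a request only when both Requirements expressions are satisfied against the other ad, then uses Rank (and user priorities) to choose among the candidates.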
  • Slide 10
  • CSE 160/Berman Condor Checkpointing. When the WS owner returns, the job can be checkpointed and restarted on another WS. A periodic checkpoint feature can checkpoint the job at intervals so that work is not lost should the job be migrated. Condor jobs vs. vanilla jobs: Condor job executables must be relinked and can be checkpointed, migrated, and restarted; vanilla jobs are not relinked and cannot be checkpointed or migrated.
  • Slide 11
  • CSE 160/Berman Condor Checkpointing Limitations. Only single-process jobs are supported. Inter-process communication is not supported (socket, send, recv, etc. not implemented). All file operations must be idempotent (read-only and write-only access work correctly; reading and writing the same file may not). Disk space must be available to store the checkpoint file on the submitting machine; each checkpointed job has an associated checkpoint file which is approximately the size of the address space of the process.
  • Slide 12
  • CSE 160/Berman Condor-PVM and Parallel Jobs. PVM master/slave jobs can be submitted to a Condor pool (special condor-pvm universe). The master runs on the machine where the job was submitted; slaves are pulled from the Condor pool as they become available. Condor acts as the resource manager for the pvm daemon: whenever the pvm program asks for nodes, the request is remapped to Condor, which finds a machine in the pool and adds it to the pvm virtual machine. (A hedged master-side sketch follows below.)
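    A hedged C sketch of the master side of a condor-pvm job (the slave binary name and task count are hypothetical). The master simply asks PVM to spawn tasks; under the condor-pvm universe that request is remapped to Condor, which adds idle pool machines to the virtual machine as they become available.

        #include <stdio.h>
        #include "pvm3.h"

        int main(void)
        {
            int tids[8];
            int mytid = pvm_mytid();   /* enroll the master in the PVM virtual machine */

            /* request 8 slave tasks; under condor-pvm the resource request is
               forwarded to Condor, which allocates machines from the pool */
            int started = pvm_spawn("my_slave", NULL, PvmTaskDefault, "", 8, tids);
            printf("master t%x started %d slaves\n", mytid, started);

            /* ... send work to the slaves and gather results here ... */

            pvm_exit();                /* leave the virtual machine */
            return 0;
        }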
  • Slide 13
  • CSE 160/Berman Condor and the Grid. Condor and the Alliance: Condor is one of the Grid technologies deployed by the Alliance and is used for production high-throughput computing by partners. Condor and Globus: Globus can use Condor as a local resource manager; Globus RSL specs are translated into matchmaker classads.
  • Slide 14
  • CSE 160/Berman Condor and the Grid: Flock of Condors. Aggregating Condor pools into a flock enables Condor pools to cross load-sharing and protection boundaries; a Condor flock may include Condor pools connected by wide-area networks. Infrastructure: the idea is to add a Gateway (GW) machine for every pool. Gateway machines act as resource brokers for machines external to a pool; in the published description, the GW machine presents randomly chosen external pools/machines. The CM does not need to know about flocking. Each GW machine runs GW-startd and GW-schedd, analogous to the daemons within a single Condor pool.
  • Slide 15
  • CSE 160/Berman Flocking Protocol (machines in different pools). [Diagram: Submission Pool with Submission Machine (Schedd, Shadow), Central Manager, and Gateway Machine (GW-Startd, GW-Startd child); Execution Pool with Gateway Machine (GW-Schedd, GW-Simulate Shadow), Central Manager, and Execution Machine (Startd, Starter, User Process).]
  • Slide 16
  • CSE 160/Berman Globus. Globus is an integrated toolkit of Grid services, developed by Ian Foster (ANL/UC) and Carl Kesselman (USC/ISI). "Bag of services" model: applications can use Grid services without having to adopt a particular programming model.
  • Slide 17
  • CSE 160/Berman Core Globus Services. Resource allocation and process management (GRAM, DUROC, RSL); information infrastructure (MDS); security (GSI); communication (Nexus); remote access (GASS, GEM); fault detection (HBM); QoS (GARA, Gloperf).
  • Slide 18
  • CSE 160/Berman Globus Layered Architecture. [Diagram, top to bottom: Applications; High-level Services and Tools (DUROC, globusrun, MPI, Nimrod/G, MPI-IO, CC++, GlobusView, Testbed Status); Core Services (Metacomputing Directory Service, GRAM, Globus Security Interface, Heartbeat Monitor, Nexus, Gloperf, GASS); Local Services (LSF, Condor, MPI, NQE, Easy, TCP, UDP, Solaris, Irix, AIX).]
  • Slide 19
  • CSE 160/Berman Globus Resource Management Services. Resource Management services provide the mechanism for remote job submission and management. Three low-level services: GRAM (Globus Resource Allocation Manager) provides remote job submission and management; DUROC (Dynamically Updated Request Online Co-allocator) provides simultaneous job submission and layers on top of GRAM; RSL (Resource Specification Language) is the language used to communicate resource requests (a hedged RSL example follows below).
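    A hedged RSL example (hostnames and values are made up). A simple request names the executable and resource needs and can be handed to a single GRAM with globusrun; the '+' multirequest form is what DUROC co-allocates across several resource managers simultaneously.

        # submit a simple RSL request to one GRAM
        globusrun -r gram.site-a.edu "&(executable=/bin/myapp)(count=4)(maxTime=30)"

        # multirequest: simultaneous allocation on two resource managers (DUROC)
        +( &(resourceManagerContact="gram.site-a.edu")(count=8)(executable=/bin/myapp) )
         ( &(resourceManagerContact="gram.site-b.edu")(count=16)(executable=/bin/myapp) )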
  • Slide 20
  • CSE 160/Berman Globus Resource Management Architecture. [Diagram: an Application issues RSL; a Broker specializes the RSL, using queries & info from the Information Service; a Co-allocator passes ground RSL (or the application passes simple ground RSL directly) to GRAMs, which drive local resource managers such as LSF, EASY-LL, and NQE.]
  • Slide 21
  • CSE 160/Berman Globus Information Infrastructure: MDS (Metacomputing Directory Service). MDS stores information about entries, where an entry is some type of object (organization, person, network, computer, etc.). An object class associated with each entry describes the set of entry attributes. LDAP (Lightweight Directory Access Protocol) is used to store information about resources; LDAP provides a hierarchical, tree-structured information model defining the form and character of the information. (An illustrative entry is sketched below.)
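    An illustrative, hypothetical entry of the kind MDS stores for a compute resource; the DN components, object class, and attribute names below are made up for illustration (actual MDS schemas differ), and the query simply uses standard LDAP tooling.

        # hypothetical directory entry for a host
        dn: hn=compute1.site.edu, ou=Computers, o=ExampleSite, o=Grid
        objectclass: GridComputeResource
        hn: compute1.site.edu
        cpuType: sparc
        cpuCount: 16
        freeMemory: 512

        # query the directory server holding the entry
        ldapsearch -h mds.site.edu -p 389 -b "o=Grid" "(hn=compute1.site.edu)"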
  • Slide 22
  • CSE 160/Berman Globus Security Service: GSI (Grid Security Infrastructure). Provides a public key-based security system that layers on top of local site security. The user is identified to the system using an X.509 certificate containing info about the duration of permissions, the public key, and the signature of the certificate authority; the user also holds a private key. Provides users with single sign-on access to the various sites to which they are authorized.
  • Slide 23
  • CSE 160/Berman More GSI. The resource management system uses GSI to establish which machines a user may access. GSI allows for proxies, so the user only needs to log on once rather than logging on to every machine involved in a distributed computation; proxies are used for short-term authentication rather than long-term use. (A hedged command-line sketch follows below.)
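    A hedged command-line sketch of single sign-on with GSI proxies (the lifetime and contact string are illustrative): the user authenticates once with the long-term X.509 credential to create a short-lived proxy, and later Globus requests authenticate with the proxy instead of prompting again.

        grid-proxy-init -hours 12    # sign a short-lived proxy certificate with the user's private key
        grid-proxy-info              # inspect the proxy's subject, issuer, and remaining lifetime
        globusrun -r gram.site-a.edu "&(executable=/bin/hostname)"   # authenticates via the proxy
        grid-proxy-destroy           # remove the proxy when finished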
  • Slide 24
  • CSE 160/Berman Globus Communication Services. Nexus: a communication library which provides asynchronous RPC, multi-method communication, data conversion, and multi-threading facilities. I/O: a low-level communication library which provides a thin wrapper around TCP,
