Introduction to Grid Computing Alina Bejan - University of Chicago SC08 Education Program, OU - Oklahoma Aug 14 2008 1

Introduction to Grid Computing

Alina Bejan - University of Chicago

SC08 Education Program, OU - Oklahoma

Aug 14 2008

1


2

Computing “Clusters” are today’s Supercomputers

Cluster Management“frontend”

Tape Backup robots

I/O Servers typically RAID fileserver

Disk Arrays Lots of Worker Nodes

A few Headnodes, gatekeepers and

other service nodes

2


3

Cluster ArchitectureCluster

User

3

Head Node(s)

Login access (ssh)

Cluster Scheduler(PBS, Condor, SGE)

Web Service (http)

Remote File Access(scp, FTP etc)

Node 0

…

…

…

Node N

SharedCluster

FilesystemStorage

(applicationsand data)

Job execution requests & status

ComputeNodes

(10 … 10,000PC’swithlocal

disks)

…Cluster

User

Internet

Protocols


4

Scaling up Science:Citation Network Analysis in Sociology

2002

1975

1990

1985

1980

2000

1995

Work of James Evans, University of Chicago,

Department of Sociology

4


5

Scaling up the analysis

Query and analysis of 25+ million citations Work started on desktop workstations Queries grew to month-long duration With data distributed across

U of Chicago TeraPort cluster: 50 (faster) CPUs gave 100 X speedup Many more methods and hypotheses can be tested!

Higher throughput and capacity enables deeper analysis and broader community access.

5


6

Grids consist of distributed clusters

Grid Client

Application& User Interface

Grid ClientMiddleware

Resource,Workflow

& Data Catalogs

6

Grid Site 2:Sao Paolo

GridService

Middleware

ComputeCluster

GridStorage

Grid

Protocols

Grid Site 1:Fermilab

GridService

Middleware

ComputeCluster

GridStorage

…Grid Site N:UWisconsin

GridService

Middleware

ComputeCluster

GridStorage


7

HPC vs. HTCHTC: many computing resources over long periods of time to

accomplish a computational task.

HPC tasks are characterized as needing large of amounts of computing power for short periods of time, whereas HTC tasks also require large amounts of computing, but for much longer times (months and years, rather than hours and days).

Fine-grained calculations are better suited to HPC: big, monolithic supercomputers, or very tightly coupled computer clusters with lots of identical processors and an extremely fast, reliable network between the processors.

Embarrassingly parallel calculations are ideal for HTC: more loosely-coupled networks of computers where delays in getting results from one processor will not affect the work of the others. (course grained)


8

Grids represent a different approach

– Building HTC resource by joining clusters together in a grid

– Example:

Current Compute Resources:– 70 Open Science Grid sites, but all

very different– Connected via Inet2, NLR.... from

10 Gbps – 622 Mbps– Compute Elements & Storage

Elements– Most are Linux clusters– Most are shared

• Campus grids• Local non-grid users

– More than 20,000 CPUs• A lot of opportunistic usage • Total computing capacity

difficult to estimate• Same with Storage

Current Compute Resources:– 70 Open Science Grid sites, but all

very different– Connected via Inet2, NLR.... from

10 Gbps – 622 Mbps– Compute Elements & Storage

Elements– Most are Linux clusters– Most are shared

• Campus grids• Local non-grid users

– More than 20,000 CPUs• A lot of opportunistic usage • Total computing capacity

difficult to estimate• Same with Storage

The OSG


9

Initial Grid driver: High Energy Physics

Tier2 Centre ~1 TIPS

Online System

Offline Processor Farm

~20 TIPS

CERN Computer Centre

FermiLab ~4 TIPSFrance Regional Centre

Italy Regional Centre

Germany Regional Centre

InstituteInstituteInstituteInstitute ~0.25TIPS

Physicist workstations

~100 MBytes/sec

~100 MBytes/sec

~622 Mbits/sec

~1 MBytes/sec

There is a “bunch crossing” every 25 nsecs.

There are 100 “triggers” per second

Each triggered event is ~1 MByte in size

Physicists work on analysis “channels”.

Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server

Physics data cache

~PBytes/sec

~622 Mbits/sec or Air Freight (deprecated)




Caltech ~1 TIPS

~622 Mbits/sec

Tier 0Tier 0

Tier 1Tier 1

Tier 2Tier 2

Tier 3Tier 3

1 TIPS is approximately 25,000

SpecInt95 equivalents

Image courtesy Harvey Newman, Caltech 9


10

Mining Seismic data for hazard analysis (Southern Calif. Earthquake Center).

InSAR Image of theHector Mine Earthquake

• A satellitegeneratedInterferometricSynthetic Radar(InSAR) image ofthe 1999 HectorMine earthquake.

• Shows thedisplacement fieldin the direction ofradar imaging

• Each fringe (e.g.,from red to red)corresponds to afew centimeters ofdisplacement.

SeismicHazardModel

Seismicity Paleoseismology Local site effects Geologic structure

Faults

Stresstransfer

Crustal motion Crustal deformation Seismic velocity structure

Rupturedynamics 1010


11

Grids can process vast datasets. Many HEP and Astronomy experiments consist of:

Large datasets as inputs (find datasets) “Transformations” which work on the input datasets (process) The output datasets (store and publish)

The emphasis is on the sharing of these large datasets Workflows of independent program can be parallelized.

Montage Workflow: ~1200 jobs, 7 levelsNVO, NASA, ISI/Pegasus - Deelman et al.

Mosaic of M42 created on TeraGrid

= Data Transfer

= Compute Job 11


12

Ian Foster’s Grid Checklist

A Grid is a system that:

Coordinates resources that are not subject to centralized control

Uses standard, open, general-purpose protocols and interfaces

Delivers non-trivial qualities of service

12


13

The Grid Middleware Stack (and course modules)

Grid Security Infrastructure

JobManagement

DataManagement

Grid InformationServices

Core Globus Services

Standard Network Protocols and Web Services

Workflow system (explicit or ad-hoc)

Grid Application (often includes a Portal)

13

Job and resource management

14


15

How do we access the grid ?

Command line with tools that you'll use Specialised applications

Ex: Write a program to process images that sends data to run on the grid as an inbuilt feature.

Web portals I2U2 SIDGrid


16

Grid Middleware glues the grid together A short, intuitive definition:

the software that glues together different clusters into a grid

taking into consideration the socio-political side of things (such as common policies on who can use what, how much, and what for)


17

Grid middleware components

Job management Storage management Information Services Security


18

Globus and Condor play key roles Globus Toolkit provides the base middleware

Client tools which you can use from a command line APIs (scripting languages, C, C++, Java, …) to build

your own tools, or use direct from applications Web service interfaces Higher level tools built from these basic components,

e.g. Reliable File Transfer (RFT) Condor provides both client & server scheduling

In grids, Condor provides an agent to queue, schedule and manage work submission

18


19

Globus Toolkit Developed at UChicago/ANL and USC/ISI (Globus

Alliance) - globus.org Open source Adopted by different scientific communities and

industries Conceived as an open set of architectures, services and

software libraries that support grids and grid applications Provides services in major areas of distributed systems:

Core services Data management Security


20

Globus - core services

A toolkit that can serve as a foundation to build a grid Authorization Message level security System level services (e.g., monitoring) Associated data management provides file services

GridFTP RFT (Reliable File Transfer) RLS (Replica Location Service)


21

Local Resource Managers (LRM) Compute resources have a local resource manager

(LRM) that controls: Who is allowed to run jobs How jobs run on a specific resource Specifies the order and location of jobs

Example policy: Each cluster node can run one job. If there are more jobs, then they must wait in a queue

Examples: PBS, LSF, Condor


22

Local Resource Manager: a batch schedulerfor running jobs on a computing cluster Popular LRMs include:

PBS – Portable Batch System LSF – Load Sharing Facility SGE – Sun Grid Engine Condor – Originally for cycle scavenging, Condor has evolved

into a comprehensive system for managing computing LRMs execute on the cluster’s head node Simplest LRM allows you to “fork” jobs quickly

Runs on the head node (gatekeeper) for fast utility functions No queuing (but this is emerging to “throttle” heavy loads)

In GRAM, each LRM is handled with a “job manager”

22


23

GRAM Globus Resource Allocation Manager

GRAM = provides a standardised interface to submit jobs to LRMs.

Clients submit a job request to GRAM GRAM translates into something a(ny) LRM can

understand …. Same job request can be used for many different

kinds of LRM


24

Job Management on a Grid

User

The Grid

Condor

PBS

LSF

fork

GRAM

Site A

Site B

Site C

Site D


25

GRAM’s abilities Given a job specification:

Creates an environment for the job Stages files to and from the environment Submits a job to a local resource manager Monitors a job Sends notifications of the job state change Streams a job’s stdout/err during execution


26

GRAM components

Worker nodes / CPUsWorker node / CPU

Worker node / CPU

Worker node / CPU

Worker node / CPU

Worker node / CPU

LRM eg Condor, PBS, LSF

Gatekeeper

Internet

JobmanagerJobmanager

globus-job-run

Submitting machine(e.g. User's workstation)


27

Submitting a job with GRAM globus-job-run command

$ globus-job-run workshop1.ci.uchicago.edu /bin/hostname

Run '/bin/hostname' on the resource workshop1.ci.uchicago.edu

We don't care what LRM is used on ‘workshop1’ This command works with any LRM.


28

Condor

Condor is a specialized workload management system for compute-intensive jobs (aka batch system)

is a software system that creates an HTC environment Created at UW-Madison

Detects machine availability Harnesses available resources Uses remote system calls to send R/W operations over the

network Provides powerful resource management by matching resource

owners with consumers (broker)


29

Condor manages your cluster Given a set of computers…

Dedicated Opportunistic (Desktop computers)

And given a set of jobs… Can be a very large set of jobs

Condor will run the jobs on the computers Fault-tolerance (restart the jobs) Priorities Can be in a specific order With more features than we can mention here…


30

Condor - features

Checkpoint & migration Remote system calls

Able to transfer data files and executables across machines

Job ordering Job requirements and preferences can be

specified via powerful expressions


31

Condor-G Condor-G is a marketing term

It’s a feature of Condor It can be used without using Condor on your cluster

Feature: instead of submitting just to your local cluster, you can submit to an external grid. The “G” in Condor-G

Many grid types are supported: Globus (old or new) Nordugrid CREAM Amazon EC2 … others


32

Globus GRAM Protocol Globus GRAM

Submit to LRM

Organization A Organization B

Condor-GCondor-G

myjob1myjob2myjob3myjob4myjob5…

Remote Resource Access: Condor-G + Globus + Condor


33

Why Condor-G?

Globus has command-line tools, right?

Condor-G provides extra features:

Access to multiple types of grid sites

A reliable queue for your jobs

Job throttling


34

Four Steps to Run a Job with Condor

These choices tell Condor how when where to run the job, and describe exactly what you want to run.

Choose a Universe for your job Make your job batch-ready Create a submit description file Run condor_submit


35

1. Choose a Universe There are many choices

Vanilla: any old job Grid: run jobs on the grid Standard: checkpointing & remote I/O Java: better for Java jobs MPI: Run parallel MPI jobs Virtual Machine: Run a virtual machine as job …

For now, we’ll just consider grid


36

2. Make your job batch-ready

Must be able to run in the background: no interactive input, windows, GUI, etc.

Condor is designed to run jobs as a batch system, with pre-defined inputs for jobs

Can still use STDIN, STDOUT, and STDERR (the keyboard and the screen), but files are used for these instead of the actual devices

Organize data files


37

3. Create a Submit Description File A plain ASCII text file Condor does not care about file extensions

Tells Condor about your job:

Which executable to run and where to find it Which universe Location of input, output and error files Command-line arguments, if any Environment variables Any special requirements or preferences


38

Simple Submit Description File# myjob.submit file

# Simple condor_submit input file

# (Lines beginning with # are comments)

# NOTE: the words on the left side are not

# case sensitive, but filenames are!executable=/sw/national_grids/primetest arguments=143output=results.outputerror=results.errorlog=results.lognotification=neveruniverse=gridgrid_resource=gt2 workshop1.ci.uchicago.edu/jobmanager-

forkqueue


39

4. Run condor_submit

You give condor_submit the name of the submit file you have created:

condor_submit my_job.submit

condor_submit parses the submit file


40

Other Condor commands

condor_q – show status of job queue

condor_status – show status of compute nodes condor_rm – remove a job condor_hold – hold a job temporarily condor_release – release a job from hold


41

Submitting more complex jobs

express dependencies between jobs WORKFLOWS

And also, we would like the workflow to be managed even in the face of failures


42

DAGMan Directed Acyclic Graph Manager DAGMan allows you to specify the dependencies between

your Condor jobs, so it can manage them automatically for you.

(e.g., “Don’t run job “B” until job “A” has completed successfully.”)


43

What is a DAG? A DAG is the data structure used by

DAGMan to represent these dependencies.

Each job is a “node” in the DAG.

Each node can have any number of “parent” or “children” nodes –

as long as there are no loops!

Job A

Job B Job C

Job D


44

A DAG is defined by a .dag file, listing each of its nodes and their dependencies:

# diamond.dagJob A a.subJob B b.subJob C c.subJob D d.subParent A Child B CParent B C Child D

each node will run the Condor job specified by its accompanying Condor submit file

Defining a DAG

Job A

Job B Job C

Job D


45

Submitting a DAG To start your DAG, just run condor_submit_dag with your .dag file,

and Condor will start a personal DAGMan daemon which to begin running your jobs:

% condor_submit_dag diamond.dag

condor_submit_dag submits a Scheduler Universe Job with DAGMan as the executable.

Thus the DAGMan daemon itself runs as a Condor job, so you don’t have to baby-sit it.


46

Why DAGMan?

Submit and forget You can submit a large workflow and let DAGMan run

it Fault tolerance

DAGMan can help recover from many common failures

Example: You can retry jobs automatically if they fail

Grid Workflow

47


48

A typical workflow pattern in image analysis runs many filtering apps (fMRI - functional magnetic resonance imaging)

3a.h

align_warp/1

3a.i

3a.s.h

softmean/9

3a.s.i

3a.w

reslice/2

4a.h

align_warp/3

4a.i

4a.s.h 4a.s.i

4a.w

reslice/4

5a.h

align_warp/5

5a.i

5a.s.h 5a.s.i

5a.w

reslice/6

6a.h

align_warp/7

6a.i

6a.s.h 6a.s.i

6a.w

reslice/8

ref.h ref.i

atlas.h atlas.i

slicer/10 slicer/12 slicer/14

atlas_x.jpg

atlas_x.ppm

convert/11

atlas_y.jpg

atlas_y.ppm

convert/13

atlas_z.jpg

atlas_z.ppm

convert/15

Workflow courtesy James Dobson, Dartmouth Brain Imaging Center 48


49

Workflow management systems Orchestration of many resources over long time

periods Very complex to do manually - workflow automates

this effort Enables restart of long running scripts Write scripts in a manner that’s location-

independent: run anywhere Higher level of abstraction gives increased portability

of the workflow script (over ad-hoc scripting)


50

OSG & job submissions OSG sites present interfaces allowing remotely submitted

jobs to be accepted, queued and executed locally.

OSG supports the Condor-G job submission client which interfaces to either the pre-web service or web services GRAM Globus interface at the executing site.

Job managers at the backend of the GRAM gatekeeper support job execution by local Condor, LSF, PBS, or SGE batch systems.

Data Management

\

51


52

High-performance tools to solve data problems. The huge volume of data:

Storing it Moving it

[ Measured in terabytes, petabytes, and further …]

The huge number of filenames: 1012 filenames is expected soon Collection of 1012 of anything is a lot to handle efficiently

How to find the data


53

Data Questions on the Grid

Where are the files I want?

How to move data/files to/from where I want?


54

Data management services provide the mechanisms to find, move and share data GridFTP

Fast, Flexible, Secure, Ubiquitous data transport Often embedded in higher level services

RFT Reliable file transfer service using GridFTP

Replica Location Service (RLS) Tracks multiple copies of data for speed and reliability

Storage Resource Manager (SRM) Manages storage space allocation, aggregation, and transfer

54


55

GridFTP high performance, secure, and reliable data transfer

protocol based on the standard FTP

Extensions include Strong authentication, encryption via Globus GSI Multiple data channels for parallel transfers Third-party transfers


56

Data channel

A file transfer with GridFTP Control channel can go either way

Depends on which end is client, which end is server Data channel is still in same direction

Site ASite B

Control channel

Server


57

Site BSite A

Data channel

Third party transfer File transfer without data flowing through the client. Controller can be separate from src/dest Useful for moving data from storage to compute

Control channels

Client

ServerServer


58

Site B

Going fast – parallel streams Use several data channels

Site A

Control channel

Data channelsServer


59

GridFTP usage globus-url-copy

Convensions on URL formats:

file:///home/YOURLOGIN/dataex/largefile a file called largefile on the local file system, in

directory /home/YOURLOGIN/dataex/

gsiftp://osg-edu.cs.wisc.edu/scratch/YOURLOGIN/ a directory accessible via gsiftp on the host called

osg-edu.cs.wisc.edu in directory /scratch/YOURLOGIN.


60

GridFTP examples

globus-url-copy

file:///home/YOURLOGIN/dataex/myfile

gsiftp://osg-edu.cs.wisc.edu/nfs/osgedu/YOURLOGIN/ex1

globus-url-copy

gsiftp://osg-edu.cs.wisc.edu/nfs/osgedu/YOURLOGIN/ex2

gsiftp://tp-osg.ci.uchicago.edu/YOURLOGIN/ex3


61

RFT = Reliable file transfer • protocol that provides reliability and fault tolerance

for file transfers

• Part of the Globus Toolkit

• RFT acts as a client to GridFTP, providing management of a large number of transfer jobs (same as Condor to GRAM)

• RFT can• keep track of the state of each job• run several transfers at once• deal with connection failure, network failure, failure of any

of the servers involved.


62

OSG & Data management OSG relies on GridFTP protocol for the raw transport of the data using

Globus GridFTP in all cases except where interfaces to storage management systems (rather than file systems) dictate individual implementations.

OSG supports the SRM interface to storage resources to enable management of space and data transfers to prevent unexpected errors due to running out of space, to prevent overload of the GridFTP services, and to provide capabilities for pre-staging, pinning and retention of the data files.

OSG currently provides reference implementations of two storage systems the BeStMan and dCache

Grid Security

63


64

Grid security is a crucial component Need for secure communication

Privacy - only invited to understand conversation (use encryption)

Integrity - message unchanged (use signed messages)

Authentication - verify entities are who they claim to be (use

certificates and CAs)

Authorization - allowing or denying access to services based on

policies.

Need to support “single sign-on” for users of grid Delegation of credentials for computations that involve multiple

resources and/or sites

64


65

Identity & Authentication

Each entity should have an identity Authenticate: Establish identity

Is the entity who he claims he is ? Examples:

Driving License Username/password

Stops masquerading imposters


66

Authorization

Establishing rights What can a certain identity do ?

Examples: Are you allowed to be on this flight ?

Passenger ? Pilot ?

Unix read/write/execute permissions Must authenticate first


67

Single Sign-on Important for complex applications that need to

use Grid resources Enables easy coordination of varied resources Enables automation of processes Allows remote processes and resources to act on user’s

behalf Authentication and Delegation


68

Certificates

• Central concept in GSI authentication• Every user, resource and service on Grid is

identified via a certificate• Contains:

– Subject name (identifies entity)– Corresponding public key– Identity of the CA that has signed the cert (to certify

that the public key and the identity both belong to the subject)

– The digital signature of the CA

• GSI certs are encoded in a X509 certificate format


69

Grid Security - in practice - steps: Get certificate from relevant CA

DOEGrids in our case

Request to be authorized for resources Meaning you will be added to the OSGEDU VOMS

Generate proxy as needed Using grid-proxy-init

Run clients Authenticate Authorize Delegate as required

Numerous resources, different CAs, numerous credentials

National GridCyberinfrastructure

70


71

Grid Resources in the US

Origins: National Grid (iVDGL, GriPhyN,

PPDG) and LHC Software & Computing Projects

Current Compute Resources: 70 Open Science Grid sites Connected via Inet2, NLR.... from

10 Gbps – 622 Mbps Compute & Storage Elements All are Linux clusters Most are shared

Campus grids Local non-grid users

More than 20,000 CPUs A lot of opportunistic usage Total computing capacity

difficult to estimate Same with Storage

Origins: National Grid (iVDGL, GriPhyN,

PPDG) and LHC Software & Computing Projects

Current Compute Resources: 70 Open Science Grid sites Connected via Inet2, NLR.... from

10 Gbps – 622 Mbps Compute & Storage Elements All are Linux clusters Most are shared

Campus grids Local non-grid users

More than 20,000 CPUs A lot of opportunistic usage Total computing capacity

difficult to estimate Same with Storage

Origins: National Super Computing Centers,

funded by the National Science Foundation

Current Compute Resources: 14 TeraGrid sites Connected via dedicated multi-

Gbps links Mix of Architectures

ia64, ia32 LINUX Cray XT3 Alpha: True 64 SGI SMPs

Resources are dedicated but Grid users share with local

users 1000s of CPUs, > 40 TeraFlops

100s of TeraBytes

Origins: National Super Computing Centers,

funded by the National Science Foundation

Current Compute Resources: 14 TeraGrid sites Connected via dedicated multi-

Gbps links Mix of Architectures

ia64, ia32 LINUX Cray XT3 Alpha: True 64 SGI SMPs

Resources are dedicated but Grid users share with local

users 1000s of CPUs, > 40 TeraFlops

100s of TeraBytes

The TeraGridThe OSG


72

Open Science Grid (OSG) provides shared computing resources, benefiting a broad set of disciplines

OSG incorporates advanced networking and focuses on general services, operations, end-to-end performance

Composed of a large number (>70 and growing) of shared computing facilities, or “sites”

http://www.opensciencegrid.org/

A consortium of universities and national laboratories, building a sustainable grid infrastructure for science.

72


73

96 Resources across

production & integration infrastructures

20 Virtual Organizations +6 operations

Includes 25% non-physics.

~20,000 CPUs (from 30 to 4000)

~6 PB Tapes

~4 PB Shared Disk

Snapshot of Jobs on OSGs

Sustaining through OSG submissions:

3,000-4,000 simultaneous jobs .

~10K jobs/day

~50K CPUhours/day.

Peak test jobs of 15K a day.

Using production & research networks

OSG Snapshot


74

To efficiently use a Grid, you must locate and monitor its resources. Check the availability of different grid sites Discover different grid services Check the status of “jobs” Make better scheduling decisions with

information maintained on the “health” of sites

74


75

OSG - VORS (VO Resource Locator)


76

OSG Middleware

Infr

ast

ruct

ure

Ap

pli

cati

on

s VO Middleware

Core grid technology distributions: Condor, Globus, myproxy: shared with TeraGrid and

others

Virtual Data Toolkit (VDT) core technologies + software needed by

stakeholders: many components shared with EGEE

OSG Release Cache: OSG specific configurations, utilities etc.

HEP

Data and workflow management etc

Biology

Portals, databases etc

User Science Codes and Interfaces

Existing Operating, Batch systems and Utilities

Astrophysics

Data replication etc


77

TeraGrid provides vast resources via a number of huge computing facilities.world's largest, most comprehensive distributed cyberinfrastructure for open scientific research (750 teraflops of computing capability and more than 30 petabytes of storage)

77


78

Contact us [email protected]

opensciencegrid.org/Education Online training course Materials, resources


79

Acknowledgments:

Presentation based on different OSG speaker notes (globus, condor, swift, GSI teams).


80

Lab

URL: opensciencegrid.org/GridSchool

Machines we’ll use for login : workshop1.ci.uchicago.edu workshop2.ci.uchicago.edu

Username: trainXX, where XX in [01 .. 40]

Password: 0808OKLA

Documents

Introduction to Grid Computing Alina Bejan - University of Chicago SC08 Education Program, OU - Oklahoma Aug 14 2008 1