27
INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org Workload Management System and Job Description Language

INFSO-RI-508833 Enabling Grids for E-sciencE Workload Management System and Job Description Language

Embed Size (px)

Citation preview

Page 1: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

Workload Management System and Job Description Language

Page 2: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

EGEE ResourceBroker

2

Enabling Grids for E-sciencE

INFSO-RI-508833

Contents

• Reminder of the main grid services

• A closer look at Workload Management System (WMS) and its Resource Broker (RB)

Page 3: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

EGEE ResourceBroker

3

Enabling Grids for E-sciencE

INFSO-RI-508833

User Interface node

• The user’s interface to the Grid

• Command-line interface to– Create proxy with VOMS

extensions– Job operations – (non-

blocking) To submit a job Monitor its status Retrieve output

– Data operations on files– Other grid services

• Also C++ and Java APIs

• To run a job user creates a JDL (Job Description Language) file

UIJDL

Page 4: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

EGEE ResourceBroker

4

Enabling Grids for E-sciencE

INFSO-RI-508833

Current production middleware

Logging &Logging &Book-keepingBook-keeping

ResourceResourceBrokerBroker

StorageStorageElementElement

ComputingComputingElementElement

Information Information ServiceService

Job Status

DataSets info

Author.&Authen.

Job S

ub

mit

Even

t

Job

Qu

ery

Job

Stat

us

Input “sandbox”

Input “sandbox” + Broker Info

Output “sandbox”

Output “sandbox”

Pu

blis

h

SE & CE info

““User User interface”interface”

LCG LCG FileCatalogue FileCatalogue (LFC)(LFC)

Page 5: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

EGEE ResourceBroker

5

Enabling Grids for E-sciencE

INFSO-RI-508833

Example JDL fileExecutable = “gridTest”;

StdError = “stderr.log”;

StdOutput = “stdout.log”;

InputSandbox = {“/home/joda/test/gridTest”};

OutputSandbox = {“stderr.log”, “stdout.log”};

InputData = “lfn:/grid/gilda/training/testbed0-00019”;

DataAccessProtocol = “gridftp”;

Requirements = other.Architecture==“INTEL” && \ other.OpSys==“LINUX”;

Rank = “other.GlueHostBenchmarkSF00”;

Building on basic tools and Information Service

•Submit job to grid via the “resource broker (RB)”,

•edg-job-submit my.jdlReturns a “job-id” used to monitor job, retrieve output

Page 6: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

EGEE ResourceBroker

6

Enabling Grids for E-sciencE

INFSO-RI-508833

Example JDL fileExecutable = “gridTest”;

StdError = “stderr.log”;

StdOutput = “stdout.log”;

InputSandbox = {“/home/joda/test/gridTest”};

OutputSandbox = {“stderr.log”, “stdout.log”};

InputData = “lfn:/grid/VOname/mydir/testbed0-00019”;

DataAccessProtocol = “gridftp”;

Requirements = other.Architecture==“INTEL” && \ other.OpSys==“LINUX” && other.FreeCpus >=4;

Rank = “other.GlueHostBenchmarkSF00”;

Building on basic tools and Information Service

•Submit job to grid via the “resource broker”,

•edg-job-submit my.jdlReturns a “job-id” used to monitor job, retrieve output

lfn: logical file name

RB uses Catalog to find replica locations

Page 7: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

EGEE ResourceBroker

7

Enabling Grids for E-sciencE

INFSO-RI-508833

Example JDL fileExecutable = “gridTest”;

StdError = “stderr.log”;

StdOutput = “stdout.log”;

InputSandbox = {“/home/joda/test/gridTest”};

OutputSandbox = {“stderr.log”, “stdout.log”};

InputData = “lfn:/grid/VOname/mydir/testbed0.00019”;

DataAccessProtocol = “gridftp”;

Requirements = other.Architecture==“INTEL” && \ other.OpSys==“LINUX” && other.FreeCpus >=4;

Rank = “other.GlueHostBenchmarkSF00”;

Building on basic tools and Information Service

•Submit job to grid via the “resource broker”,

•edg_job_submit my.jdlReturns a “job-id” used to monitor job, retrieve output

Uses Information System

Page 8: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

8

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

LFC

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

Job submission

Page 9: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

9

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

LFC

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

Job Status

UI: allows users to access the functionalitiesof the WMS(via command line, GUI, C++ and Java APIs)WMS: Workload Management System

Page 10: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

10

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

edg-job-submit myjob.jdlMyjob.jdl

JobType = “Normal”;Executable = "$(CMS)/exe/sum.exe";InputSandbox = {"/home/user/WP1testC","/home/file*”, "/home/user/DATA/*"};OutputSandbox = {“sim.err”, “test.out”, “sim.log"};Requirements = other. GlueHostOperatingSystemName == “linux" && other. GlueHostOperatingSystemRelease == "Red Hat 7.3“ && other.GlueCEPolicyMaxCPUTime > 10000;Rank = other.GlueCEStateFreeCPUs;

submitted

Job Status

Job Description Language(JDL) to specify job characteristics and requirements

Page 11: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

11

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

LFC

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

Input Sandboxfiles

Job

waiting

submitted

Job Status

NS: network daemon responsible for acceptingincoming requests

Page 12: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

12

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

LFC

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

waiting

submitted

Job Status

WM: responsible to takethe appropriate actions to satisfy the request

Job

Page 13: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

13

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

LFC

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

waiting

submitted

Job Status

Match-Maker/Broker

Where must thisjob be executed ?

Page 14: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

14

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

LFC

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

waiting

submitted

Job Status

Match-Maker/ Broker

Matchmaker: responsible to find the “best” CE where to submit a job

Page 15: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

15

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

LFC

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

waiting

submitted

Job Status

Match-Maker/ Broker

Where are (which SEs) the needed data ?

What is thestatus of the

Grid ?

Page 16: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

16

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

LFC

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

waiting

submitted

Job Status

Match-Maker/Broker

CE choice

Page 17: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

17

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

LFC

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

waiting

submitted

Job Status

JobAdapter

JA: responsible for the final “touches” to the job before performing submission(e.g. creation of wrapper script, etc.)

Page 18: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

18

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

LFC

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

Job Status

JC: responsible for theactual job managementoperations (done via CondorG)

Job

submitted

waiting

ready

Page 19: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

19

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

LFC

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

Job Status

Job

InputSandboxfiles

submitted

waiting

ready

scheduled

Page 20: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

20

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

LFC

Inform.Service

ComputingElement

StorageElement

RB node

RBstorage

Job Status

InputSandbox

submitted

waiting

ready

scheduled

running

“Grid enabled”data transfers/

accesses

Job

Page 21: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

21

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

LFC

Inform.Service

ComputingElement

StorageElement

RB node

RBstorage

Job Status

OutputSandboxfiles

submitted

waiting

ready

scheduled

running

done

Page 22: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

22

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

LFC

Inform.Service

ComputingElement

StorageElement

RB node

RBstorage

Job Status

OutputSandbox

submitted

waiting

ready

scheduled

running

done

edg-job-get-output <dg-job-id>

Page 23: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

23

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

LFC

Inform.Service

ComputingElement

StorageElement

RB node

RBstorage

Job Status

OutputSandboxfiles

submitted

waiting

ready

scheduled

running

done

cleared

Page 24: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

24

Job monitoring

UI

Log Monitor

Logging &Bookkeeping

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ComputingElement

RB node

LM: parses CondorG logfile (where CondorG logsinfo about jobs) and notifies LB

LB: receives and stores job events; processes corresponding job status

Log ofjob events

edg-job-status <dg-job-id>edg-job-get-logging-info <dg-job-id>

Job status

Page 25: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

EGEE ResourceBroker

25

Enabling Grids for E-sciencE

INFSO-RI-508833

Possible job states

Flag Meaning

SUBMITTED submission logged in the LB

WAIT job match making for resources

READY job being sent to executing CE

SCHEDULED job scheduled in the CE queue manager

RUNNING job executing on a WN of the selected CE queue

DONE job terminated without grid errors

CLEARED job output retrieved

ABORT job aborted by middleware, check reason

Page 26: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

EGEE ResourceBroker

26

Enabling Grids for E-sciencE

INFSO-RI-508833

Summary

• Create JDL file• Check some CEs match your requirements:

– edg-job-list-match

• Submit job– edg-job-submit

• Do something else for a while! – gLite is not written for short jobs!

• Check job status - occasionally– edg-job-status

• When job is “done”, get output– edg-job-get-output

Page 27: INFSO-RI-508833 Enabling Grids for E-sciencE  Workload Management System and Job Description Language

EGEE ResourceBroker

27

Enabling Grids for E-sciencE

INFSO-RI-508833

NOTES about the practical

• “Write a simple JDL file like the following (jlvptest.jdl)”

– You already have hostname.jdl – use that!

• Follow “Practical_1” link in the agenda page

– Also try the command edg-job-get-logging-info

– And follow Practical_2 to explore different JDL options.