6 July 2009
Member of the Helmholtz Association
ISSGC’09
UNICORE Day at ISSGC’09
Presenters: Rebecca Breu, Bastian Demuth, Mathilde Romberg
Jülich Supercomputing Centre (JSC)
7 July 2009
7th International Summer School on Grid Computing
07/07/2009 Slide 2
ISSGC’09
Agenda
9:00 – 10:30 Principles of Job Submission and Execution Management
Set the scene
11:00 – 12:30 UNICORE – Architecture and Components
Technical overview on how UNICORE works and how it is used
14:00 – 15:30 UNICORE Basic Practical
Practical: submitting jobs with the command line client
16:00 – 17:30 UNICORE Workflow Practical
Practical: submitting workflows with the graphical client
18:00 – 19:00 UNICORE: An Application
Example applications using UNICORE
Session 9: Principles of Job Submission and Execution Management
Author/Presenter: Achim Streit, Mathilde Romberg
Jülich Supercomputing Centre (JSC)
Job Submission
Jobs
Job
- Some work to be executed
- Requires CPU and memory
- Possibly accesses additional resources, e.g., storage, devices, services

Job scheduling
- Policy for assigning jobs to resources
Courtesy of Prof. Felix Wolf, RWTH Aachen
Resources
Compute
- Memory
- Central Processing Units
- Nodes
- Threads/Tasks

Data
- Size
- Transfer rate

Network
- Bandwidths
...
How to Differentiate Compute Resources?
By number of CPUs
- Single processor
- Multi processor

Multiprocessor systems can be grouped into
- Shared memory: equal access time to memory from each processor
- Distributed memory: each CPU has its own memory and I/O; different address spaces
- Distributed shared memory: shared address space; access time depends on the location of the data in memory
Multiprocessor Systems – Examples
- SMP (Symmetric (shared-memory) MultiProcessors): IBM Power 4/5/6 node, multi-core chips
- MPP (Massively Parallel Processor): IBM Blue Gene/P, Cray XT4
- NUMA (Non-Uniform Memory Access): SGI Altix
- Cluster: MareNostrum (IBM Power4/5/6 system), Tera-10 (self-built cluster)

(Map of example system locations: Jülich, Helsinki, Munich, Barcelona)
Job Scheduling
Policy for assigning jobs to resources; inputs are
- Set of jobs with requirements
- Set of resources

Criteria for assignment
- Fairness
- Efficiency
- Minimize response time (interactive users) and turnaround time (batch jobs)
- Maximize throughput
Courtesy of Prof. Felix Wolf, RWTH Aachen
Usage of Multiprocessor Systems
- Typically the resource demands of users/jobs exceed the available resources → users/jobs compete
- Typically resource requirements differ from one user (or application) to the other
  - Large/small (in terms of number of processors)
  - Large/small (in terms of amount of memory)
  - Long/short (in terms of duration of resource usage)

A form of resource management and job scheduling is required!
- How to share the available resources among the competing jobs?
- When does a job start and which resources are assigned?
Resource Management & Job Scheduling – 1
Time-sharing (or time-slicing)
- Several jobs share the same resource; jobs are executed quasi-simultaneously
- Resources are not exclusively assigned to jobs
- Resource usage of jobs is reduced to short time slices (some clock ticks of the processor)
- Jobs need more than a single time slice to complete
- Each job gets the resource assigned in a round-robin fashion
- New jobs start immediately
- Execution takes longer than on a dedicated resource
- Typically handled by the operating system
- Examples: SMP machines, your own Linux PC
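The round-robin behaviour described above can be sketched in a few lines (an illustrative toy model, not scheduler code; the one-tick slice length and the job durations are arbitrary choices):

```python
# Toy model: round-robin time-slicing of several jobs on one CPU.
# Every job starts immediately, but each finishes later than it would
# on a dedicated resource.

def time_share(durations, slice_len=1):
    """Return per-job completion times under round-robin time-slicing."""
    remaining = dict(enumerate(durations))
    finish, clock = {}, 0
    while remaining:
        for job in list(remaining):          # one slice per job, round-robin
            step = min(slice_len, remaining[job])
            clock += step
            remaining[job] -= step
            if remaining[job] == 0:          # job done: record completion time
                finish[job] = clock
                del remaining[job]
    return [finish[i] for i in range(len(durations))]

print(time_share([3, 1, 2]))   # → [6, 2, 5]; on dedicated CPUs: [3, 1, 2]
```

Note that the short job (duration 1) still finishes at time 2, not 1: it pays for sharing the processor with its neighbours.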
Resource Management & Job Scheduling – 2
Space-sharing (or space-slicing)
- Resources are exclusively assigned to a job until it completes
- Jobs may have to wait until enough resources are free before they start
- Needs a separate resource management system (also known as a batch system) and a job scheduler
- Examples: MPP systems, clusters, etc.; LoadLeveler, Torque + Maui, PBSPro, OpenCCS, SLURM, …

Space-sharing based resource management and job scheduling is commonly used on clusters and other multiprocessor systems.
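For contrast, a minimal sketch of space-sharing with first-come-first-served scheduling (again a toy model, not batch-system code; the node counts and durations are made up):

```python
# Toy model: FCFS space-sharing on a machine with 4 nodes.
# Nodes are assigned exclusively; a job waits until enough free nodes
# exist, then runs to completion undisturbed.
import heapq

def space_share(jobs, total_nodes=4):
    """jobs: list of (nodes_needed, duration). Returns (start, end) per job."""
    free = total_nodes
    running = []                            # min-heap of (end_time, nodes)
    clock, result = 0, []
    for nodes, duration in jobs:            # FCFS: strict submission order
        while free < nodes:                 # wait for enough free nodes
            end, n = heapq.heappop(running)
            clock, free = max(clock, end), free + n
        heapq.heappush(running, (clock + duration, nodes))
        free -= nodes
        result.append((clock, clock + duration))
    return result

print(space_share([(2, 5), (2, 3), (4, 2)]))   # → [(0, 5), (0, 3), (5, 7)]
```

Here the third job needs the whole machine, so it waits until both earlier jobs have released their nodes: the opposite trade-off to time-sharing, where every job starts at once.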
Job Submission on Multiprocessor Systems: Example – LoadLeveler

IBM Tivoli Workload Scheduler LoadLeveler, available for AIX and Linux.
Job submission via a job command file.

Basic LoadLeveler commands:
- llsubmit <cmdfile>   Submit a job
- llq                  Show queued and running jobs
- llcancel <job_id>    Delete a queued or running job
- llstatus             Display status information
LoadLeveler cmd_file examples – 1
IBM Blue Gene/P system @ Jülich – JUGENE

# @ job_name = BGP-LoadL-Sample-1
# @ comment = "BGP Job by Size"
# @ error = $(job_name).$(jobid).out
# @ output = $(job_name).$(jobid).out
# @ environment = COPY_ALL;
# @ wall_clock_limit = 00:20:00          <- runtime/duration
# @ notification = error
# @ notify_user = [email protected]
# @ job_type = bluegene
# @ bg_size = 32                         <- size of partition
# @ queue
/usr/local/bin/mpirun -exe `/bin/pwd`/wait_bgp.rts \
    -mode VN -np 48 -verbose 1 -args "-t 1"        <- executable: only mpirun!
LoadLeveler cmd_file examples – 2
IBM p690 eServer Cluster 1600 @ Jülich – JUMP

#@ job_type = parallel
#@ output = out.$(jobid).$(stepid)
#@ error = err.$(jobid).$(stepid)
#@ wall_clock_limit = 00:15:00           <- runtime/duration
#@ notify_user = [email protected]
#@ node = 2                              <- resource requirements
#@ total_tasks = 64
#@ data_limit = 1.5GB
#@ queue
myprogram                                <- executable

(#@ node: number of nodes for the job; #@ total_tasks: total number of tasks in the job)
Job Submission on Multiprocessor Systems: Example – Torque + Maui

Torque is the resource manager; Maui is the cluster scheduler.

Basic Maui commands:
- msub        Submit a new job
- showq       Display a detailed, prioritized list of active and idle jobs
- canceljob   Cancel an existing job
- showstart   Show the estimated start time of idle jobs
- showstats   Show detailed usage statistics for the users, groups, and accounts the user has access to
Job submission in Maui
Via command line:

msub -l nodes=32:ppn=2,pmem=1800mb,walltime=3600 myscript

Resource list: 32 nodes with 2 processors each, 1800 MB per task, 3600 seconds duration; myscript is the script file.
Lessons Learned

- Each job submission system is different: different commands for submission, status query, and cancellation; different options, scheduling policies, …
- Even different configurations of the same job submission system exist for different multiprocessor systems
- Job requirements are specified differently: command-line parameters of the job submission command, or a separate job command file
- Different job requirements exist: nodes and tasks per node, total tasks, …
Job submission and the Grid
- A higher, meta level with more abstraction is needed to describe the requirements of jobs in a Grid of heterogeneous systems
- Many proprietary solutions exist; each Grid middleware uses its own language, e.g. AJO in UNICORE 5, ClassAds/JDL in Condor, JDL in gLite, RSL in Globus Toolkit, xRSL in ARC/NorduGrid, etc.
- And there is JSDL 1.0, an Open Grid Forum (OGF) standard: http://www.gridforum.org/documents/GFD.56.pdf
JSDL – Introduction
- JSDL stands for Job Submission Description Language
- A language for describing the requirements of computational jobs for submission to Grids and other systems
- Can also be used between middleware systems or for submitting to any Grid middleware (→ interoperability)
- JSDL does not define a submission interface, what the results of a submission look like, or how resources are selected
JSDL Document
A JSDL document is an XML document, which may contain
- Generic (job) identification information
- Application description
- Resource requirements (the main focus is computational jobs)
- Description of required data files

A JSDL document is a template: it can be submitted multiple times and used to create multiple job instances.
No job-instance-specific attributes can be defined, e.g. start or end time.
JSDL – A Hello World Example
<?xml version="1.0" encoding="UTF-8"?>
<jsdl:JobDefinition
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
    xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
        <jsdl-posix:Input>/dev/null</jsdl-posix:Input>
        <jsdl-posix:Output>std.out</jsdl-posix:Output>
        <jsdl-posix:Argument>hello</jsdl-posix:Argument>
        <jsdl-posix:Argument>world</jsdl-posix:Argument>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
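Since a JSDL document is plain XML, it can be generated or inspected with any XML tooling. A sketch using Python's standard library to pull the executable and arguments out of a hello-world document like the one above (abbreviated here to the application part):

```python
# Sketch: inspecting a hello-world JSDL document with Python's stdlib.
import xml.etree.ElementTree as ET

JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"

doc = f"""
<jsdl:JobDefinition xmlns:jsdl="{JSDL}" xmlns:jsdl-posix="{POSIX}">
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
        <jsdl-posix:Argument>hello</jsdl-posix:Argument>
        <jsdl-posix:Argument>world</jsdl-posix:Argument>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
"""

root = ET.fromstring(doc)
app = root.find(f".//{{{POSIX}}}POSIXApplication")
executable = app.find(f"{{{POSIX}}}Executable").text
args = [a.text for a in app.findall(f"{{{POSIX}}}Argument")]
print(executable, args)   # /bin/echo ['hello', 'world']
```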
JSDL – Resource Requirements Description
- Supports simple descriptions of resource requirements; NOT a comprehensive resource requirements language
- Can be extended with other elements for richer or more abstract descriptions
- Main target is compute jobs: CPU, memory, file system/disk, operating system requirements
- Allows some flexibility for aggregate (total) requirements: “I want 10 CPUs in total and each resource should have 2 or more”
- Very basic support for network requirements
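The quoted aggregate requirement could be written roughly like this (element names per the JSDL 1.0 specification, GFD.56; this fragment is an unvalidated sketch):

```xml
<jsdl:Resources>
  <!-- 10 CPUs in total ... -->
  <jsdl:TotalCPUCount>
    <jsdl:Exact>10.0</jsdl:Exact>
  </jsdl:TotalCPUCount>
  <!-- ... with at least 2 CPUs on each individual resource -->
  <jsdl:IndividualCPUCount>
    <jsdl:LowerBoundedRange>2.0</jsdl:LowerBoundedRange>
  </jsdl:IndividualCPUCount>
</jsdl:Resources>
```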
JSDL application extensions
SPMD (single-program-multiple-data): http://www.ogf.org/documents/GFD.115.pdf
- Extends JSDL 1.0 for parallel applications (MPI, PVM, etc.)
- Allows specifying the number of processes, processes per host, and threads per process along with the application

HPC (high performance computing): http://www.ogf.org/documents/GFD.111.pdf
- Extends JSDL 1.0 for HPC applications running as operating system processes (e.g. username, environment, working directory can be specified)
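As a rough illustration of the SPMD extension (element names per GFD.115; the namespace prefix and the executable path are assumptions, and the fragment is unvalidated), an MPI job asking for 64 processes at 2 per host might look like:

```xml
<jsdl:Application>
  <jsdl-spmd:SPMDApplication
      xmlns:jsdl-spmd="http://schemas.ggf.org/jsdl/2007/02/jsdl-spmd">
    <jsdl-posix:Executable>/usr/local/bin/mysim</jsdl-posix:Executable>
    <jsdl-spmd:NumberOfProcesses>64</jsdl-spmd:NumberOfProcesses>
    <jsdl-spmd:ProcessesPerHost>2</jsdl-spmd:ProcessesPerHost>
    <jsdl-spmd:ThreadsPerProcess>1</jsdl-spmd:ThreadsPerProcess>
    <jsdl-spmd:SPMDVariation>http://www.ogf.org/jsdl/2007/02/jsdl-spmd/MPI</jsdl-spmd:SPMDVariation>
  </jsdl-spmd:SPMDApplication>
</jsdl:Application>
```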
Lessons Learned
- JSDL is a standardized language for describing jobs to be submitted to Grid resources
- It covers not only the job itself (application, arguments, input, output, etc.) but also resource requirements (CPU, memory, etc.)
- Extensions for specific application domains (parallel programs, HPC applications) exist
- BUT: JSDL cannot be submitted directly to Grid resources, i.e. to the resource management and job scheduling system of a cluster or multiprocessor system in a Grid
Execution and Job Management – 1
One of the essential functionalities and components in a Grid middleware

Deals with
- Initiating/submitting, monitoring and managing jobs
- Handling and staging of all job data
- Coordinating and scheduling multi-step jobs

Examples:
- XNJS in UNICORE 6 (→ sessions 10-12 today)
- WS-GRAM in GT4 (→ sessions 19-21 on Thursday)
- WMS in gLite (→ sessions 24-26 on Friday)
- ARC Client in NorduGrid/ARC (→ sessions 29-30 on Saturday)
Execution and Job Management – 2
Initiating/submitting, monitoring and managing jobs
- Translates the Grid job into a specific job (application details, resources, etc.) for the target system
- Submits the job to the resource management system using its proprietary way of job submission
- Frequently polls the job status (waiting/queued, running/executing, failed, aborted, paused, finished, etc.) from the resource management system
- Provides “access” to the job, its status and data during its runtime and after its (successful or unsuccessful) completion
- If the resource management system is not available/reachable at job submission time, the job is cached for a later hand-over
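The polling step above follows a common pattern: repeatedly query the batch system and map its proprietary states onto generic Grid job states. A sketch of that pattern (not XNJS or GRAM code; the state names and the `query_batch_system` callback are hypothetical stand-ins for e.g. parsing `llq` or `showq` output):

```python
# Sketch: polling a batch system and mapping proprietary job states
# onto generic Grid states until the job reaches a terminal state.

GRID_STATE = {                 # proprietary state -> generic Grid state
    "PEND": "QUEUED", "RUN": "RUNNING",
    "DONE": "SUCCESSFUL", "EXIT": "FAILED",
}

def poll_until_done(job_id, query_batch_system):
    """Poll the batch system; return the history of observed Grid states."""
    history = []
    while True:
        state = GRID_STATE.get(query_batch_system(job_id), "UNKNOWN")
        history.append(state)
        if state in ("SUCCESSFUL", "FAILED", "UNKNOWN"):   # terminal
            return history

# Simulated batch system answering a fixed sequence of states:
answers = iter(["PEND", "PEND", "RUN", "DONE"])
print(poll_until_done("job-42", lambda _jid: next(answers)))
# → ['QUEUED', 'QUEUED', 'RUNNING', 'SUCCESSFUL']
```

A real execution manager would of course sleep between polls and persist the job record, as described in the data-handling bullet points that follow.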
Execution and Job Management – 3
Handling and staging of all job data, incl. job directory and persistency
- Creates, manages, and destroys the job directory
- All data submitted with the job as input is stored in the job directory
- Data is staged in from remote data resources/archives
- All data generated by the job is preserved and/or staged out after the successful completion of the job

Coordinating and scheduling multi-step jobs
- If a job consists of more than one step (a workflow), the required resources are orchestrated
- Manages the proper initiation of the workflow execution
- The execution of the workflow is controlled and monitored
Lessons Learned

- Execution and job management is needed
- A meta-layer on top of the Grid resources is needed to provide a uniform way of accessing the Grid and an intuitive, secure and easy-to-use interface for the user
Introduction to UNICORE (from 30,000 ft)
more in sessions 10-12, today
(Short) History Lesson

UNiform Interface to COmputing REsources: seamless, secure, and intuitive

Initial development started in two German projects funded by the German ministry of education and research (BMBF):
- 08/1997 – 12/1999: UNICORE project. Results: well-defined security architecture with X.509 certificates, intuitive GUI, central job supervisor based on Codine (predecessor of SGE) from Genias
- 1/2000 – 12/2002: UNICORE Plus project. Results: implementation enhancements (e.g. replacement of Codine by a custom NJS), extended job control (workflows), application-specific interfaces (plugins)

Continuous development since 2002 in several EU projects; open source community development since summer 2004
Projects

More than a decade of German and European research & development and infrastructure projects (timeline 1999–2011), e.g.: UNICORE, UNICORE Plus, EUROGRID, GRIP, GRIDSTART, OpenMolGRID, UniGrids, VIOLA, DEISA, NextGRID, CoreGRID, D-Grid IP, EGEE-II, OMII-Europe, A-WARE, Chemomentum, eDEISA, PHOSPHORUS, D-Grid IP 2, SmartLM, PRACE, D-MON, DEISA2, ETICS2, WisNetGrid, and many others.
UNICORE – Grid driving HPC

Used in
- DEISA (European Distributed Supercomputing Infrastructure)
- National German Supercomputing Center NIC
- Gauss Center for Supercomputing (alliance of the three German HPC centers)
- PRACE (European PetaFlop HPC Infrastructure) – starting up
- But also in non-HPC-focused infrastructures (e.g. D-Grid)

Taking up major requirements from, e.g.,
- HPC users
- HPC user support teams
- HPC operations teams
UNICORE – www.unicore.eu

- Open source (BSD license)
- Open developer community on SourceForge; contributing your own developments is easily possible

Design principles
- Standards: OGSA-conform, WS-RF compliant
- Open, extensible, interoperable
- End-to-end, seamless, secure and intuitive
- Security: X.509, proxy and VO support
- Workflow and application support directly integrated
- Variety of clients: graphical, command line, portal, API, etc.
- Quick and simple installation and configuration
- Support for many operating and batch systems
- 100% Java 5
UNICORE in use: some examples
Usage in D-Grid
Core D-Grid sites committing parts of their existing resources to D-Grid:
- Approx. 700 CPUs
- Approx. 1 PByte of storage
- UNICORE is installed and used

Additional sites received extra money from the BMBF for buying compute clusters and data storage:
- Approx. 2000 CPUs
- Approx. 2 PByte of storage
- UNICORE (as well as Globus and gLite) is installed as soon as the systems are in place

(Map of sites, including LRZ and DLR-DFD)
DEISA – Distributed European Infrastructure for Supercomputing Applications (www.deisa.eu)

- Consortium of leading national HPC centers in Europe
- Deploys and operates a persistent, production-quality, distributed, heterogeneous HPC environment
- UNICORE as Grid middleware, on top of DEISA’s core services:
  - Dedicated network
  - Shared file system
  - Common production environment at all sites
- Used e.g. for workflow applications

Members: IDRIS – CNRS (Paris, France), FZJ (Jülich, Germany), RZG (Garching, Germany), CINECA (Bologna, Italy), EPCC (Edinburgh, UK), CSC (Helsinki, Finland), SARA (Amsterdam, NL), HLRS (Stuttgart, Germany), BSC (Barcelona, Spain), LRZ (Munich, Germany), ECMWF (Reading, UK)

More in session 33, Monday 9:00 – 10:30
Interoperability and Usability of Grid Infrastructures

- Focus on providing common interfaces and integration of major Grid software infrastructures
- OGSA-DAI, VOMS, GridSphere, OGSA-BES, OGSA-RUS; UNICORE, gLite, Globus Toolkit, CROWN
- Apply interoperability components in application use cases
Chemomentum – Grid Services based Environment to enable Innovative Research

- Provides an integrated Grid solution for workflow-centric, complex applications with a focus on data, semantics and knowledge
- Provides decision support services for risk assessment, toxicity prediction, and drug design
- End-user focus: ease of use, domain-specific tools, “hidden Grid”
- Based on UNICORE 6

More in sessions 12-13, this afternoon
Commercial usage at T-Systems
Slide courtesy of Alfred Geiger, T-Systems SfR
Lessons Learned

- UNICORE has a strong HPC background, but is not limited to HPC use cases; it can be used in any Grid
- UNICORE is OGSA-conform and WS-RF compliant
- UNICORE is open, extensible and interoperable
- UNICORE is open source and coded in Java
- UNICORE is used in EU and national projects, European e-infrastructures, National Grid Initiatives (NGI), commercial environments, etc.
- Documentation, tutorials, mailing lists, community links, software, source code, and more: http://www.unicore.eu (recommended)