Distributed Analysis using Ganga

I. Ideas behind Ganga
II. Getting started
III. Running ATLAS applications

Distributed Analysis Tutorial
ATLAS Computing & Software Workshop, München, 26-30 March 2007

http://cern.ch/ganga

Karl Harrison / University of Cambridge



29 March 2007

Ganga basics

• Depending on context, Ganga can be any of:

(A) a Hindu goddess

(B) a hallucinogenic drug

(C) a job-management framework (Gaudi/Athena and Grid Alliance), implemented in Python, that simplifies running jobs on the Grid

• Anyone expecting a presentation on (A) or (B) is going to be disappointed

• Some have suggested: A + B = C

Sculpture of Ganga in cave temple, Elephanta Island, Mumbai harbour

Ganga, or ganja, is prepared from the plant Cannabis sativa


Ganga as a job-management framework (1)

• Ganga is developed as an ATLAS-LHCb common project

• Ganga 4.2.12 (current release) has built-in support for applications based on the Athena framework, for JobTransforms and for the DQ2 data-management system

- Ganga 4.3 (release early in April) will additionally be interfaced with AMI and TNT

• Component model allows customisations for other types of application, e.g. ROOT

• Ganga provides a uniform interface for accessing different types of processing system

- Allow trivial switching between testing on local batch system and running full-scale analysis on the Grid
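
The idea of a uniform interface can be illustrated with a small stand-alone Python sketch. It does not use Ganga: the names `SketchJob`, `LocalBackend` and `DryRunGridBackend` are hypothetical, and the "Grid" backend only records what it would have been asked to run. The point is that the job definition stays the same while the backend object is swapped.

```python
import subprocess
from dataclasses import dataclass, field


class LocalBackend:
    """Hypothetical backend: runs the command on the local machine."""

    def run(self, argv):
        done = subprocess.run(argv, capture_output=True, text=True, check=True)
        return done.stdout


class DryRunGridBackend:
    """Hypothetical stand-in for a Grid backend: only records requests."""

    def __init__(self):
        self.queued = []

    def run(self, argv):
        self.queued.append(list(argv))
        return ""


@dataclass
class SketchJob:
    argv: list                                           # what to run
    backend: object = field(default_factory=LocalBackend)  # where to run

    def submit(self):
        # The job never needs to know which kind of backend it holds.
        return self.backend.run(self.argv)
```

A job built as `SketchJob([sys.executable, "-c", "print('Hello World')"])` can be tested locally with `submit()`, then pointed at another backend by assigning `job.backend` and calling `submit()` again.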

[Diagram labels: Job definition, Job submission]


Ganga as a job-management framework (2)

• Whenever started, Ganga runs a monitoring loop in the background

- Track progress of submitted jobs
- Retrieve outputs of completed jobs
- Check validity of user credentials: Grid proxy and/or AFS token

• Ganga stores job information locally or (Ganga 4.3) on a remote server with certificate-based authentication

• Job inputs and outputs are kept in Ganga workspace until moved or deleted by user

• User can modify code without affecting a submitted job
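
A background monitoring loop of this kind can be sketched with the Python standard library. This is an illustration of the pattern only, not Ganga's implementation: `MonitoringLoop`, `poll_status` and `on_change` are hypothetical names, and the status source is whatever callable the caller supplies.

```python
import threading


class MonitoringLoop:
    """Polls a status function in a background thread until stopped."""

    def __init__(self, poll_status, on_change, interval=0.05):
        self._poll_status = poll_status  # callable returning {job_id: status}
        self._on_change = on_change      # callable taking (job_id, old, new)
        self._interval = interval        # seconds between polls
        self._known = {}                 # last status seen for each job
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            for job_id, status in self._poll_status().items():
                old = self._known.get(job_id)
                if status != old:
                    self._known[job_id] = status
                    self._on_change(job_id, old, status)
            # Sleep, but wake up immediately if stop() is called.
            self._stop.wait(self._interval)
```

Using an `Event` for both the stop flag and the sleep means the loop shuts down promptly rather than waiting out a full polling interval.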

[Diagram labels: Monitoring, Archival]


Ganga job abstraction

• A job in Ganga is constructed from a set of building blocks, not all required for every job

[Diagram: building blocks of a Job]

Application: what to run
Backend: where to run
Input Dataset: data read by application
Output Dataset: data written by application
Splitter: rule for dividing into subjobs
Merger: rule for combining outputs
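
As a rough illustration of "not all required for every job", the building blocks can be modelled as optional fields of a Python dataclass. This is a sketch with hypothetical names (`JobSketch`, `blocks_in_use`) and plain strings standing in for the real component objects; it is not how Ganga defines its Job class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobSketch:
    application: str                   # what to run
    backend: str = "Local"             # where to run
    inputdata: Optional[str] = None    # data read by application
    outputdata: Optional[str] = None   # data written by application
    splitter: Optional[int] = None     # rule for dividing into subjobs
    merger: Optional[str] = None       # rule for combining outputs

    def blocks_in_use(self):
        """Names of the building blocks that have been set for this job."""
        return [name for name, value in vars(self).items() if value is not None]
```

A minimal job such as `JobSketch("Executable")` uses only an application and a backend; the remaining blocks stay unset until needed.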


Plugin classes

[Diagram: plugin class hierarchy and example schemas]

GangaObject
Plugin interfaces: IApplication, IBackend, IDataset, ISplitter, IMerger

Example plugins and schemas:
Athena: atlas_release, max_events, options, option_file, user_setupfile, user_area
LCG: CE, requirements, jobtype, middleware, id, status, reason, actualCE, exitcode

Schema items are marked as User or System in the diagram

• Ganga handles many types of Application, Backend, Dataset, Splitter and Merger, implemented as plugin classes

• Each plugin class has its own schema

• New plugin classes can readily be added: the system is extensible
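
The combination of plugin classes and per-class schemas can be sketched in plain Python as follows. Everything here is hypothetical (`PLUGINS`, `plugin`, `SketchObject`) and is only meant to show the pattern of registering a class under a category and validating its properties against a schema. The `Executable` defaults are the ones quoted in the "Hello World" slide.

```python
import copy

PLUGINS = {}  # category -> {plugin name: class}


def plugin(category):
    """Class decorator that registers a plugin under the given category."""
    def register(cls):
        PLUGINS.setdefault(category, {})[cls.__name__] = cls
        return cls
    return register


class SketchObject:
    schema = {}  # property name -> default value

    def __init__(self, **values):
        unknown = set(values) - set(self.schema)
        if unknown:
            raise AttributeError(f"not in schema: {sorted(unknown)}")
        for name, default in self.schema.items():
            # Deep-copy defaults so instances never share mutable values.
            setattr(self, name, values.get(name, copy.deepcopy(default)))


@plugin("applications")
class Executable(SketchObject):
    schema = {"exe": "/bin/echo", "args": ["Hello World"], "env": {}}


@plugin("backends")
class Local(SketchObject):
    schema = {}


def plugins(category):
    """List the names of registered plugins of one category."""
    return sorted(PLUGINS.get(category, {}))
```

Adding a new plugin then amounts to writing one decorated class with its own `schema`, which is the sense in which such a system is extensible.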


Applications and Backends

• Running of a particular Application on a given Backend is enabled by implementing an appropriate adapter component or Runtime Handler
– Can often use same Runtime Handler for more than one Backend: less coding
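
One way to picture the Runtime Handler idea is a lookup table keyed on (application, backend) pairs, where one handler may be registered for several backends. The sketch below is an assumption-laden illustration: `HANDLERS`, `handles`, `prepare` and the shape of the returned dictionaries are all hypothetical, not Ganga's adapter interface.

```python
HANDLERS = {}  # (application name, backend name) -> handler function


def handles(application, *backends):
    """Register one handler for an application on one or more backends."""
    def register(func):
        for backend in backends:
            HANDLERS[(application, backend)] = func
        return func
    return register


@handles("Executable", "Local", "LSF", "PBS")
def batch_style_handler(app_config):
    # Batch-like systems can all take a plain command line.
    return {"command": [app_config["exe"], *app_config["args"]]}


@handles("Executable", "LCG")
def grid_style_handler(app_config):
    # A Grid submission might want executable and arguments separately.
    return {"executable": app_config["exe"],
            "arguments": " ".join(app_config["args"])}


def prepare(application, backend, app_config):
    """Translate an application configuration for the chosen backend."""
    try:
        handler = HANDLERS[(application, backend)]
    except KeyError:
        raise LookupError(
            f"no runtime handler for {application} on {backend}") from None
    return handler(app_config)
```

Here three backends share `batch_style_handler`, which is the "less coding" benefit: supporting a new batch system only needs another name in the decorator.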

[Diagram: matrix of Applications against Backends]

Backends: Local, LSF, PBS, OSG, NorduGrid, PANDA (US-ATLAS WMS), LHCb WMS

Applications:
Experiment neutral: Executable
ATLAS: Athena (Simulation/Digitisation/Reconstruction/Analysis), AthenaMC (Production)
LHCb: Gauss/Boole/Brunel/DaVinci (Simulation/Digitisation/Reconstruction/Analysis)

Legend: Available in Ganga 4.2; Work in progress; New in Ganga 4.3


Help with Ganga

• Ganga documentation can be found in the User Guides section of the Ganga web site: http://cern.ch/ganga/
– Most relevant items are:
  • Installation
  • Working with Ganga - general introduction to functionality
  • GUI manual - introduction to graphical interface
  • Link to ATLAS Wiki page for distributed analysis using Ganga
    – https://twiki.cern.ch/twiki/bin/view/Atlas/GangaTutorial427
    – Tomorrow’s hands-on sessions will use this

• For problems or feature requests, do any of the following:
– Use hypernews forum for Ganga users and developers:
  https://hypernews.cern.ch/HyperNews/Atlas/get/GANGAUserDeveloper.html
– Send e-mail to [email protected]
– Submit a report via Ganga’s bug-submission page in Savannah:
  https://savannah.cern.ch/bugs/?func=additem&group=ganga
  • Should either log in to Savannah first, or give e-mail address


Installation for distributed analysis with Ganga

• Software for distributed analysis with Ganga is already installed at CERN and a number of other sites

• If needed, you can perform your own installation
– Install the ATLAS software
  • See: https://twiki.cern.ch/twiki/bin/view/Atlas/InstallingAtlasSoftware
– To be able to access LCG resources, install LCG user interface
  • See: https://twiki.cern.ch/twiki/bin/view/LCG/TarUIInstall
– Install DQ2 client
  • See: https://twiki.cern.ch/twiki/bin/view/Atlas/DDMClientDQ2
– Install Ganga
  • Download installation script: http://cern.ch/ganga/download/ganga-install
  • Perform installation of latest release using:

    python ganga-install --extern=GangaAtlas,GangaGUI,GangaPlotter last

  • With Ganga 4.3, will be able to add GangaNorduGrid to package list
    – Automatically install NorduGrid client software


Setting up for distributed analysis with Ganga

• Setup sequence is as follows:
– Ensure that you have a Grid certificate installed, and that you are registered with the ATLAS Virtual Organisation
– Set up environment for Athena, then check out and build UserAnalysis package (or equivalent)
– Set up the environment for using LCG client tools
– Set up the environment for using DQ2
– Set up the environment for using Ganga

• On an lxplus account at CERN, Ganga setup is performed using:

    source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh

• Ganga setup at other sites should ensure the following:
– Directory containing ganga executable is added to PATH
– GANGA_CONFIG_PATH is set to GangaAtlas/Atlas.ini

• Detailed setup instructions given as part of hands-on exercises

[Slide callout: Optional, but sometimes useful]
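
The two conditions for Ganga setup at other sites can be checked with a few lines of standard-library Python. This is a hypothetical helper (`check_ganga_setup` is not part of Ganga), and it assumes that GANGA_CONFIG_PATH is a colon-separated list, which the slides do not state explicitly.

```python
import os
import shutil


def check_ganga_setup(env=None):
    """Return a list of problems with the Ganga environment (empty if fine)."""
    env = os.environ if env is None else env
    problems = []
    # Condition 1: the ganga executable must be reachable through PATH.
    if shutil.which("ganga", path=env.get("PATH", "")) is None:
        problems.append("ganga executable not found on PATH")
    # Condition 2 (assumes a colon-separated list of configuration files).
    config_files = env.get("GANGA_CONFIG_PATH", "").split(":")
    if "GangaAtlas/Atlas.ini" not in config_files:
        problems.append("GANGA_CONFIG_PATH does not include GangaAtlas/Atlas.ini")
    return problems
```

Calling `check_ganga_setup()` with no argument inspects the current environment; passing a dictionary makes it easy to test a candidate configuration without changing the shell.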


Using Ganga

• Ganga allows users to work in a variety of ways

• Command Line Interface in Python (CLIP) provides interactive job definition and submission from an enhanced Python shell (IPython)
– Especially good for trying things out, and seeing how the system works

• Scripts, which may contain any Python/IPython or CLIP commands, allow automation of repetitive tasks

• Scripts included in the distribution enable the kind of approach traditionally used when submitting jobs to a local batch system

• Graphical User Interface (GUI) allows job management based on mouse selections and field completion
– Lots of configuration possibilities


Ganga startup and configuration files

• Ganga can be invoked in any of the following ways:

    ganga              start CLIP session
    ganga --help       print Ganga help information
    ganga --gui &      start GUI session
    ganga <script>     run specified script in Ganga environment

– If user doesn’t have a valid proxy then his/her Grid passphrase is requested

• When Ganga is first run, a configuration file .gangarc is created in the user’s home directory
– The file includes comments on the configuration possibilities
– The latest default configuration file can always be obtained with:

    ganga -g

• Before processing .gangarc, Ganga processes, in the order they are specified, any configuration files pointed to by the environment variable GANGA_CONFIG_PATH
– This makes possible the use of group configuration files, but allows settings to be overridden on a user-by-user basis
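
The layering of group configuration files under a user's own file can be imitated with Python's `configparser`, which reads files in order and lets later files override earlier ones. This is a sketch of the principle, not Ganga's configuration code: `load_config` is a hypothetical helper, the section and option names in the usage note are made up, and the colon-separated form of GANGA_CONFIG_PATH is an assumption.

```python
import configparser
import os


def load_config(user_file, env=os.environ):
    """Read group files from GANGA_CONFIG_PATH, then the user's own file."""
    # Assumption: the variable holds a colon-separated list of file paths.
    group_files = [p for p in env.get("GANGA_CONFIG_PATH", "").split(":") if p]
    parser = configparser.ConfigParser()
    # Files are read in order; values from later files replace earlier ones,
    # and files that do not exist are silently skipped.
    parser.read(group_files + [user_file])
    return parser
```

If a group file sets `retries = 1` in a section and the user's file sets `retries = 3` in the same section, the loaded value is `3`, while options that only the group file defines are still visible.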


Ganga workspace

• Ganga creates a directory gangadir in your home directory and uses this for storing job-related files and information
– You can’t move this directory but, before running Ganga, you can create ~/gangadir as a link to another location
– Should delete jobs when they are no longer needed, so that Ganga input/output files don’t exhaust disk quota
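
Creating ~/gangadir as a link before the first Ganga run can be done from the shell or, as sketched here, from Python. The helper name `link_gangadir` and its arguments are hypothetical; the function refuses to act if a gangadir entry already exists, since the slides say the directory cannot simply be moved afterwards.

```python
import os


def link_gangadir(target, home=None):
    """Create <home>/gangadir as a symbolic link to another location."""
    home = os.path.expanduser("~") if home is None else home
    link = os.path.join(home, "gangadir")
    if os.path.lexists(link):
        # Do not touch an existing directory or link.
        raise FileExistsError(link)
    os.makedirs(target, exist_ok=True)
    os.symlink(target, link)
    return link
```

For example, `link_gangadir("/some/large/disk/ganga-data")` would place the job files on a disk with a larger quota; the path is only an illustration.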

[Diagram: directory tree under gangadir, with nodes repository, workspace, templates, gui, <username>, Local, Remote, jobs, input and output; example job directories 66 and 67]


Python commands

• Ganga is developed in Python, making use of IPython extensions

• All Python/IPython commands can be used at the prompt in a Ganga CLIP session, and the syntax for CLIP and Python commands is the same

• Information about Python can be found at: http://www.python.org/– If you’re new to Python, the on-line tutorial is very helpful

• The following are often useful

    # A hash (#) marks the start of a comment
    # A backslash (\) at the end of a line indicates that
    # the following line is a continuation
    dir()           # List currently available objects
    help()          # Give help
    help( item )    # Give help on specified item
    x = 5           # Assign value to variable
    print x         # Print value of variable
    ctrl-D          # Exit from session


IPython commands

• Information about IPython extensions can be found at: http://ipython.scipy.org/

• One useful extension is the possibility to use shell commands from Python, together with both shell variables and Python variables

    # Use ! before shell commands
    # Use $ before Python variables
    # Use $$ before shell variables

    here = 'where the heart is'
    !echo $$HOME is $here

    !ls $$HOME/mySubdir

    !emacs    # Start emacs session
    !zsh      # Give shell prompt

    Exit      # Exit from session


Ganga CLIP commands (1)

• Ganga commands are explained in the guide Working with Ganga: http://cern.ch/ganga/user/html/GangaIntroduction

• From a CLIP session, available classes, objects and functions may be listed, and help can be requested for each

• Useful commands include the following

    plugins( 'type' )              # List plugins of specified type:
                                   # 'applications', 'backends', etc
    j1 = Job( backend = LSF() )    # Create a new job for LSF
    a1 = Executable()              # Create Executable application
    j1.application = a1            # Set value for job's application
    j1.backend = LCG()             # Change job's backend to LCG
    export( j1, 'myJob.py' )       # Write job to specified file
    load( 'myJob.py' )             # Load job(s) from specified file
    j2 = j1.copy()                 # Create j2 as a copy of job j1
    jobs                           # List jobs
    jobs[ i ].subjobs              # List subjobs for split job i


Ganga CLIP commands (2)

• When a job j has been defined, the following methods can be used

    j.submit()    # Submit the job
    j.kill()      # Kill the job (if running)
    j.remove()    # Kill the job and delete associated files
    j.peek()      # List files in job's output directory

• Once a job has been submitted, it can no longer be modified, and it cannot be resubmitted, but the job can be copied and the copy can be modified/submitted

• Ganga supports use of templates, which can be used as the basis of a job definition

    t = JobTemplate()             # Create template
    templates                     # List templates
    j3 = Job( templates[ i ] )    # Create job from template i
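
The rule that a submitted job is frozen, while a copy of it is free to be changed and submitted, can be modelled in a few lines of plain Python. The class below is a hypothetical illustration of that behaviour (it is not Ganga's Job class), with a simple `status` attribute standing in for the real job states.

```python
import copy


class SketchJob:
    """Toy job: editable while new, read-only once submitted."""

    def __init__(self, **properties):
        object.__setattr__(self, "status", "new")
        for name, value in properties.items():
            setattr(self, name, value)

    def __setattr__(self, name, value):
        if self.status != "new":
            raise AttributeError(
                "job already submitted; copy it and modify the copy")
        object.__setattr__(self, name, value)

    def submit(self):
        if self.status != "new":
            raise RuntimeError("a submitted job cannot be resubmitted")
        object.__setattr__(self, "status", "submitted")

    def copy(self):
        # The copy gets the same properties but starts again as a new job.
        props = {k: copy.deepcopy(v)
                 for k, v in vars(self).items() if k != "status"}
        return SketchJob(**props)
```

After `j.submit()`, an assignment such as `j.backend = "LCG"` raises an error, whereas `j2 = j.copy()` gives a fresh job that accepts the change and can be submitted in its own right.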


CLIP: “Hello World” example

• From a Ganga CLIP session, a job that writes “Hello World” can be created, and then submitted to LCG, as follows

    app = Executable()
    app.exe = '/bin/echo'
    app.env = {}
    app.args = [ 'Hello World' ]
    # Property values set above are in fact the defaults
    # for Executable application
    j = Job( application = app, backend = LCG() )
    j.submit()
    # Check on job progress
    jobs
    # When job has completed, check the output
    j.peek( 'stdout' )


Using Ganga commands from a Linux shell

• Ganga includes scripts that can be used from a Linux shell (i.e. outside of CLIP)

    # Create a job for submitting Executable to LCG
    ganga make_job Executable LCG test.py
    [ Edit test.py to set Executable and/or LCG properties ]
    # Submit job
    ganga submit test.py
    # Query status, triggering output retrieval if job is completed,
    # but not recommended because of risk of time-outs for status queries
    ganga query

• Given job name or id as returned by query, also have possibilities such as

    # Kill job
    ganga kill id
    # Remove job from Ganga repository and workspace
    ganga remove id

• Same syntax can be used from inside CLIP, with no overheads for startup


Ganga plugins for ATLAS jobs

[Diagram: Ganga plugin classes for ATLAS jobs, derived from GangaObject]

IApplication:
Athena (Analysis)
AthenaMC, AthenaMCpyJT (Production)

IBackend:
LCG, LSF, Other

IDataset, input data:
DQ2Dataset: dataset in DQ2/DDM
ATLASLocalDataset: files on local storage
ATLASDataset: old mc10 data in old LFC
ATLASCastorDataset: older data on CASTOR at CERN

IDataset, output data:
DQ2OutputDataset: dataset in DQ2/DDM
ATLASOutputDataset: files on local storage

ISplitter:
AthenaSplitterJob, AthenaMCSplitterJob, AthenaMCpyJTSplitterJob

IMerger:
AthenaOutputMerger


Starting point for using Ganga to run ATLAS applications

• Need usual setup for running Athena

• For analysis:
– Need steering package that defines the physics analysis
  • This is any package where cmt/requirements defines all dependencies
  • In the hands-on exercises, and for anyone who’s followed the analysis examples in the ATLAS Workbook, the steering package is UserAnalysis
– Work from /run subdirectory of steering package

• For user-level production:
– Should download JobTransform archive to directory where Ganga is run
– Archive used in hands-on exercises is:
  http://cern.ch/atlas-computing/links/kitsDirectory/Production/kits/AtlasProduction_12_0_4_1_noarch.tar.gz


Using Ganga’s athena script to submit analysis job to LCG

• From the Linux shell, job can be submitted to LCG using the syntax:

    ganga athena \
    --inDS misalg_csc11.005300.PythiaH130zz4l.recon.AOD.v12003104 \
    --outputdata AnalysisSkeleton.aan.root \
    --split 3 \
    --maxevt 100 \
    --lcg \
    --ce ce102.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas \
    AnalysisSkeleton_topOptions.py

  ganga athena: use Ganga’s athena script
  --inDS: input dataset
  --outputdata: output data
  --split 3: split job into 3 subjobs
  --maxevt 100: limit analysis to 100 events per subjob
  --lcg: submit to LCG
  --ce: force use of particular compute element
  AnalysisSkeleton_topOptions.py: job options

• Replace --lcg with --lsf, and omit --ce, to submit to LSF
– Trivial switching between running locally and running on Grid

• Help available on options accepted by Ganga’s athena script:

    ganga athena --help
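
When many similar submissions are needed, the command line can be assembled in Python rather than typed by hand. The helper below is a hypothetical convenience (`athena_command` is not part of Ganga) and uses only the options that appear on this slide; `ganga athena --help` remains the authoritative list.

```python
import shlex


def athena_command(job_options, in_ds, output_data,
                   split=None, max_events=None, backend="lcg", ce=None):
    """Build the argument list for Ganga's athena script."""
    if backend not in ("lcg", "lsf"):
        raise ValueError("this sketch only knows --lcg and --lsf")
    if ce is not None and backend != "lcg":
        raise ValueError("--ce is only used together with --lcg")
    argv = ["ganga", "athena", "--inDS", in_ds, "--outputdata", output_data]
    if split is not None:
        argv += ["--split", str(split)]      # number of subjobs
    if max_events is not None:
        argv += ["--maxevt", str(max_events)]  # events per subjob
    argv.append("--" + backend)
    if ce is not None:
        argv += ["--ce", ce]                 # force a particular compute element
    argv.append(job_options)
    return argv


def as_shell(argv):
    """Render the argument list as a single, safely quoted shell line."""
    return shlex.join(argv)
```

The returned list can be passed directly to `subprocess.run`, or shown with `as_shell` for checking before submission. Switching from Grid to local batch is then a matter of calling the function with `backend="lsf"` and no `ce`.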


Monitoring job progress and retrieving output

• To monitor job progress, you should start a Ganga CLIP or GUI session

• In CLIP, changes in the status of jobs/subjobs are buffered, and are printed when you hit return

• At any time, you can also explicitly request status information

    # Print status information for all jobs
    jobs
    # Print status information for particular subjob
    print jobs[5].subjobs[27].status

• When a job completes, the Ganga monitoring loop takes care of storing the output, and registers it with DQ2 with a dataset name of the form user.username.ganga.jobid

• Output can be listed and retrieved using DQ2 client tools

    dq2_ls -f user.username.ganga.jobid
    dq2_get -r user.username.ganga.jobid
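
The dataset naming pattern and the two DQ2 commands can be generated for a given user and job id with a short helper. The function names are hypothetical; the name pattern and the `dq2_ls -f` and `dq2_get -r` invocations are taken from the slide.

```python
def output_dataset_name(username, job_id):
    """Dataset name of the form user.username.ganga.jobid."""
    return f"user.{username}.ganga.{job_id}"


def dq2_commands(username, job_id):
    """Argument lists for listing and retrieving a job's output dataset."""
    name = output_dataset_name(username, job_id)
    return [["dq2_ls", "-f", name], ["dq2_get", "-r", name]]
```

Each returned list is suitable for `subprocess.run` on a machine where the DQ2 client tools have been set up.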


Running an analysis job from CLIP (1)

• Create application object, set job options and prepare tar file of user area
– Other properties filled automatically, based on user setup

    app = Athena()
    app.option_file = 'myOpts.py'
    app.prepare( athena_compile = False )

• Define the input dataset

    inData = DQ2Dataset()
    inData.dataset = 'interestingDataset.AOD.v12003104'
    inData.type = 'DQ2_Local'

• Define the output dataset

    outData = AthenaOutputDataset()
    outData.outputdata = 'myOutput.root'


Running an analysis job from CLIP (2)

• Define splitter, merger and backend

    splitter = AthenaSplitterJob( numsubjobs = 2 )
    merger = AthenaOutputMerger()
    backend = LCG( CE = 'reliableCE' )

• Create job template from defined objects

    t = JobTemplate( name = 'TestAnalysis' )
    t.application = app
    t.backend = backend
    t.inputdata = inData
    t.outputdata = outData
    t.splitter = splitter
    t.merger = merger


Running an analysis job from CLIP (3)

• Create job from the template and submit the job

    j = Job( t )
    j.submit()

• Check job status

    jobs

• When job has completed, check standard outputs of subjobs, then retrieve and merge ROOT output files

    j.subjobs[0].peek( "stdout" )
    j.subjobs[1].peek( "stdout" )
    j.outputdata.retrieve()
    j.merge()
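
To give a feel for what a splitter does, the sketch below divides a list of input files among a requested number of subjobs. The slides do not describe how AthenaSplitterJob actually assigns data to subjobs, so the round-robin rule and the name `split_files` are assumptions made purely for illustration.

```python
def split_files(files, numsubjobs):
    """Distribute input files over subjobs in round-robin order."""
    if numsubjobs < 1:
        raise ValueError("numsubjobs must be at least 1")
    # Never create more subjobs than there are files to process.
    n = max(1, min(numsubjobs, len(files)))
    return [files[i::n] for i in range(n)]
```

With five files and `numsubjobs = 2`, the first subjob receives the first, third and fifth files and the second subjob receives the other two. A merger then has one output per subjob to combine.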


User-level production

• Event production is broken down into three steps:
– evgen: generate particle kinematics
– simul+digit: simulate particles passing through detector - RDO output
– recon: event reconstruction - AOD, ESD, CBNT output

• With Ganga 4.3, submission of production jobs from Linux shell will be possible using Ganga’s athena script

• As CLIP example, consider generation of 30 events containing single electron with ET > 40 GeV
– Same example used in hands-on exercises


Running user-level production from CLIP (1)

• Create application object, and set properties

    app = AthenaMC()
    app.atlas_release = '12.0.4'
    app.transform_archive = 'AtlasProduction_12_0_4_1_noarch.tar.gz'
    app.production_name = 'tutorial'
    app.mode = 'evgen'
    app.evgen_job_option = 'DC3.007004.singlepart_e_Et40.py'
    app.process_name = 'single_e_Et40'
    app.run_number = '000001'
    app.firstevent = '1'
    app.random_seed = '1102362401'
    app.number_events_job = '30'
    app.se_name = 'NIKHEF'


Running user-level production from CLIP (2)

• Define the output dataset

    outData = DQ2OutputDataset()

– The output is stored at the site specified by app.se_name
– Naming convention explained in hands-on exercises

• Define LCG backend, with execution forced at a particular site

    backend = LCG()
    backend.CE = 'tbn20.nikhef.nl:2119/jobmanager-pbs-atlas'

• Create job template from defined objects

    t = JobTemplate( name = 'TestGeneration' )
    t.application = app
    t.backend = backend
    t.outputdata = outData

• Create job from template and submit

    Job( t ).submit()


Ganga Graphical User Interface (GUI)

• GUI consists of central monitoring panel and dockable windows

• Essentially everything that can be done in CLIP can be done with the GUI
– More details in presentation tomorrow

[Screenshot labels: Job details, Logical Folders, Scriptor, Job Monitoring, Log window, Job builder]


Conclusions

• Have given an overview of:
– the ideas behind Ganga
– getting started with Ganga, running a “Hello World” job
– using Ganga to run ATLAS applications

• Have probably made it seem more complicated than it is in practice

• To see that Ganga is quite easy to use, you just have to try it
– Chance for this, and more detailed explanations of the functionality, in the Ganga hands-on sessions tomorrow