The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How)

CE+WN Installation and configuration

Riccardo Rotondo ([email protected])
National Institute of Nuclear Physics

Asia 2 2011 - Joint CHAIN/EU-IndiaGrid2/EPIKH School for Grid Site Administrators
Kolkata, 03.02.2011
www.epikh.eu


Outline

• Computing Element overview
• Worker Node overview
• CE CREAM overview
• gLite stack overview
• gLite CE CREAM and site BDII
  – Installation on CE and WN (wiki)
  – Configuration on CE and WN (wiki)


gLite stack overview


gLite overview

[Diagram of the gLite components; the only label that survives is "worker node".]


gLite overview

• User Interface (UI): the users' point of access to gLite grid services
• WMS: the component that optimizes resource usage
• CE: the machine that manages the worker nodes
• WN: the machines that actually execute applications
• SE: the machines where files are stored
• LFC: used to "find" files on the grid
• BDII: the service responsible for publishing all information about your site
• Logging and Bookkeeping: as its name says, it logs job events and alerts the user when a job is finished


Computing Element Overview

• The Computing Element provides some of the main services of a site.

• Main functionalities:
  – job management (job submission, job control)
  – job status updates for the WMS
  – usually installed together with the site BDII service, which publishes all information regarding the Computing Element

• It can run on top of several kinds of batch system:
  – Torque + MAUI
  – LSF
  – SGE
  – Condor


Torque + MAUI

• Torque server service:
  – pbs_server provides basic batch services such as receiving and creating batch jobs.

• Torque client service:
  – pbs_mom places jobs into execution. It is also responsible for returning the job's output to the user.

• MAUI system service:
  – job_scheduler holds the site's policy and decides which job is executed and when.


Site BDII*

• By default it used to be installed on the CE, but it is now better to install it on a dedicated server, physical or virtual.

• It collects information from all site GRISes** (for example SE, RB, LFC, etc.)

• The service is named bdii

• Log file: /opt/bdii/var/bdii.log

• *BDII = Berkeley Database Information Index
• **GRIS = Grid Resource Information Service


Worker Node Element Overview

• Worker Nodes are the machines that actually execute your jobs.

• Users can access their services only through a Computing Element.

• Their characteristics are collected by the Computing Element, which publishes all the information through the BDII service.


CE CREAM overview

• CREAM stands for Computing Resource Execution And Management.

• It accepts job submission requests coming from a WMS, as well as other job management requests.

• It exposes a web services interface.


Requirements

• Three or more machines:
  – one will be used for the CE installation;
  – the others will be used for the WN installation.

• Architecture: 64 bit
• Operating System: Scientific Linux 5
• The CE machine needs a public IP address, forward and reverse name resolution in DNS, and an X.509 certificate.


CE CREAM and WN Installation & Configuration

(on Torque/PBS)


Wiki

• Follow the steps here for CE CREAM:
  – https://grid.ct.infn.it/twiki/bin/view/EPIKH/CECreamEpikh

• Follow the steps here for WN:
  – https://grid.ct.infn.it/twiki/bin/view/EPIKH/WNEpikh


A few words on benchmarks

• How to set CE_SI00, CE_SF00, CE_CAPABILITY, CE_OTHERDESCR?

• Look up the value for your processor at these links:
  – http://www.italiangrid.org/grid_operations/site_manager/HEP-SPEC06
  – https://hepix.caspur.it/benchmarks/doku.php?id=bench:results_sl5_x86_64_gcc_412
  – https://hepix.caspur.it/processors/dokuwiki/doku.php?id=benchmarks:results

• For example, if you have an Intel XEON 5520 2.23 GHz with no Hyper-Threading, the table at the previous link gives a value of 95; with a conversion factor of 1 HS06 = 40 this yields:

  – CE_SI00 = 3800
  – CE_SF00 = 3800
  – CE_CAPABILITY="CPUScalingReferenceSI00=3800"
  – CE_OTHERDESCR="Cores=4,Benchmark=23.75-HEP-SPEC06"

• Where (3800/40)/4 = 23.75
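The arithmetic on this slide can be reproduced with a small helper. This is only a sketch: the function name and its parameters are hypothetical, and it simply mirrors the slide's own numbers (table value 95, conversion factor 40, 4 cores) rather than any official formula.

```python
def benchmark_settings(hs06_total, si00_per_hs06, cores):
    """Derive the four benchmark variables the way the slide does.

    hs06_total    -- value read from the results table (95 on the slide)
    si00_per_hs06 -- conversion factor used on the slide (40)
    cores         -- value published as Cores= in CE_OTHERDESCR (4)
    """
    si00 = round(hs06_total * si00_per_hs06)   # 95 * 40 = 3800
    per_core = hs06_total / cores              # (3800 / 40) / 4 = 23.75
    return {
        "CE_SI00": str(si00),
        "CE_SF00": str(si00),
        "CE_CAPABILITY": f"CPUScalingReferenceSI00={si00}",
        "CE_OTHERDESCR": f"Cores={cores},Benchmark={per_core:.2f}-HEP-SPEC06",
    }


if __name__ == "__main__":
    # Print the variables in the NAME="value" form used by the site-info file.
    for name, value in benchmark_settings(95, 40, 4).items():
        print(f'{name}="{value}"')
```

Replace the three arguments with the figures for your own hardware before copying the result into the site configuration.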


Adding a VO

# vim my-ig-site-info.def

VOS="euindia infngrid ops dteam"
QUEUES="cert grid"
CERT_GROUP_ENABLE="euindia ops dteam"
GRID_GROUP_ENABLE="infngrid"


Adding a VO/2

[Diagram: three queues labelled q1, q2 and q3]


Adding a VO/3

[Diagram: each queue has its own variable]

• Q1_GROUP_ENABLE
• Q2_GROUP_ENABLE
• Q3_GROUP_ENABLE
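The naming pattern shown on these slides (queue q1 is controlled by Q1_GROUP_ENABLE, queue cert by CERT_GROUP_ENABLE) lends itself to a quick consistency check before running the configuration. The sketch below is an illustration only: parse_site_info and check_queues are hypothetical helpers, the upper-casing rule is taken from the slide examples, and entries that start with "/" (VOMS FQANs) are deliberately not checked.

```python
import re


def parse_site_info(text):
    """Collect simple NAME="value" assignments; other lines are ignored."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r'^\s*([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"\s*$', line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def check_queues(settings):
    """Return a list of problems in the queue-to-VO mapping (empty if none)."""
    problems = []
    vos = set(settings.get("VOS", "").split())
    for queue in settings.get("QUEUES", "").split():
        variable = queue.upper() + "_GROUP_ENABLE"  # assumption: plain upper-casing
        if variable not in settings:
            problems.append(f"{variable} is not defined")
            continue
        for group in settings[variable].split():
            # Entries starting with "/" are FQANs; only plain VO names are checked.
            if not group.startswith("/") and group not in vos:
                problems.append(f"{variable} lists {group}, which is not in VOS")
    return problems


EXAMPLE = '''
VOS="euindia infngrid ops dteam"
QUEUES="cert grid"
CERT_GROUP_ENABLE="euindia ops dteam"
GRID_GROUP_ENABLE="infngrid"
'''

if __name__ == "__main__":
    print(check_queues(parse_site_info(EXAMPLE)) or "queue mapping looks consistent")
```

With the values from the "Adding a VO" slide the check reports no problems; removing one of the *_GROUP_ENABLE lines makes it report the missing variable.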


Adding a VO/4

• Here are some settings to support the euindia VO:

# vim vo.d/euindia

SW_DIR=$VO_SW_DIR/euindia
DEFAULT_SE=$SE_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/euindia
VOMS_SERVERS="'vomss://voms.ct.infn.it:8443/voms/euindia?/euindia'"
VOMSES="'euindia voms.ct.infn.it 15004 /C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it euindia'"
VOMS_CA_DN="'/C=IT/O=INFN/CN=INFN CA'"

• Then install the VO VOMS certificates with:

wget http://grid018.ct.infn.it/mrepo/cometa_sl4-i386/RPMS.app/cometa-vomscert-1.0-3.noarch.rpm
rpm -ivh cometa-vomscert-1.0-3.noarch.rpm


Adding a VO/5

• Now you have to provide a group and some users for the euindia VO by modifying these two files:

  – ig-groups.conf
  – ig-users.conf

# vim ig-groups.conf
# Append the following lines to the end of the file
"/euindia/ROLE=SoftwareManager":::sgm:
"/euindia"::::


Adding a VO/6

# vim ig-users.conf
# Append these lines at the end of the file
39001:euindia001:3900:euindia:euindia::
39002:euindia002:3900:euindia:euindia::
39003:euindia003:3900:euindia:euindia::
39004:euindia004:3900:euindia:euindia::
39005:euindia005:3900:euindia:euindia::
39006:euindia006:3900:euindia:euindia::
39007:euindia007:3900:euindia:euindia::
39008:euindia008:3900:euindia:euindia::
39009:euindia009:3900:euindia:euindia::
39010:euindia010:3900:euindia:euindia::
39011:euindia011:3900:euindia:euindia::
39012:euindia012:3900:euindia:euindia::
39013:euindia013:3900:euindia:euindia::
39014:euindia014:3900:euindia:euindia::
39015:euindia015:3900:euindia:euindia::
39016:euindia016:3900:euindia:euindia::
39017:euindia017:3900:euindia:euindia::
39018:euindia018:3900:euindia:euindia::
39019:euindia019:3900:euindia:euindia::
39020:euindia020:3900:euindia:euindia::
39101:sgmeuindia001:3910,3900:sgmeuindia,euindia:euindia:sgm:
39102:sgmeuindia002:3910,3900:sgmeuindia,euindia:euindia:sgm:
39103:sgmeuindia003:3910,3900:sgmeuindia,euindia:euindia:sgm:
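Typing the pool accounts by hand is error-prone, so the entries can also be generated. The following is a minimal sketch that reproduces the pattern visible in the lines above (UID, login, GIDs, groups, VO, flag); the function names and default numbers are assumptions chosen to match this slide, not part of YAIM.

```python
def pool_accounts(vo, count, base_uid=39000, gid=3900):
    """Ordinary pool accounts: UID:LOGIN:GID:GROUP:VO:: (empty flag field)."""
    return [
        f"{base_uid + i}:{vo}{i:03d}:{gid}:{vo}:{vo}::"
        for i in range(1, count + 1)
    ]


def sgm_accounts(vo, count, base_uid=39100, sgm_gid=3910, gid=3900):
    """Software manager accounts: two GIDs, two groups and the sgm flag."""
    return [
        f"{base_uid + i}:sgm{vo}{i:03d}:{sgm_gid},{gid}:sgm{vo},{vo}:{vo}:sgm:"
        for i in range(1, count + 1)
    ]


if __name__ == "__main__":
    # 20 ordinary accounts and 3 software managers, as on the slide.
    for line in pool_accounts("euindia", 20) + sgm_accounts("euindia", 3):
        print(line)
```

Check that the chosen UID and GID ranges are free on both the CE and the WNs before appending the output to ig-users.conf.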


Testing installation


Tests on CE

• SSH to the CE to test whether the CE can see the WNs and whether all main services are up and running

# pbsnodes
Your-ip-hostname
     state = free
     np = 2
     properties = lcgpro
     ntype = cluster
     status = opsys=linux,uname=Linux grid-test-63.trigrid.it 2.6.18-164.6.1.el5 #1 [cut]

# /etc/init.d/gLite status
*** tomcat5:
/opt/glite/etc/init.d/tomcat5 is already running (1514)
*** glite-lb-locallogger:
glite-lb-logd running
glite-lb-interlogd running
# /etc/init.d/globus-gridftp status
globus-gridftp-server (pid 25452) is running...
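On a site with many worker nodes it is easier to scan the pbsnodes listing with a script than by eye. The sketch below assumes the layout shown on this slide (a node name at the start of a line, followed by indented "key = value" attributes); the helper names and the example.org node names are hypothetical.

```python
BAD_STATES = {"down", "offline", "state-unknown"}


def parse_pbsnodes(text):
    """Turn pbsnodes-style text into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue                      # blank line between nodes
        if not line[0].isspace():
            current = line.strip()        # unindented line: a new node name
            nodes[current] = {}
        elif current is not None and " = " in line:
            key, value = line.strip().split(" = ", 1)
            nodes[current][key] = value
    return nodes


def unavailable_nodes(nodes):
    """Names of nodes whose state includes down, offline or state-unknown."""
    return sorted(
        name for name, attrs in nodes.items()
        if BAD_STATES & set(attrs.get("state", "").split(","))
    )


SAMPLE = """\
wn1.example.org
     state = free
     np = 2
     properties = lcgpro
     ntype = cluster
     status = opsys=linux,uname=Linux wn1.example.org 2.6.18-164.6.1.el5 #1

wn2.example.org
     state = down,offline
     np = 2
     properties = lcgpro
     ntype = cluster
"""

if __name__ == "__main__":
    print(unavailable_nodes(parse_pbsnodes(SAMPLE)))
```

In practice the text would come from the command itself, for example by piping the pbsnodes output into the script instead of using the embedded sample.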


Tests on CE

• SSH to the CE and then become an unprivileged pool account user:

# su - euindia001

• Create a file and add the following:

$ vi test.sh
#!/bin/sh
sleep 20    # useful to see the job status
hostname

• Set the right permissions to make it executable:

$ chmod 700 test.sh


Tests on CE

• Launch the job locally on the CE

$ qsub -q euindia test.sh

• Then check the list of jobs in execution on the CE

$ qstat -a

ce.localdomain:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - ----
3.wn.localdo    gilda001 short    test.sh      5839  --  --     -- 00:15 R   --

• In case you want to abort a job:

$ qdel 3    # 3 is the job ID

• In case you want more info:

$ qstat -f 3


Tests on CE

• If the "qstat -a" command returns no output, no jobs are running on the CE. This means your previous job has terminated, so you can now list its output.

$ ls
test.sh.e3 test.sh.o3
$ cat test.sh.e3    # error file
$
$ cat test.sh.o3    # output file
wn.localdomain


JDL example

$ vim hostname-cream.jdl

Type = "Job";
JobType = "Normal";
Executable = "/bin/hostname";
StdOutput = "hostname.out";
StdError = "hostname.err";
OutputSandbox = {"hostname.err","hostname.out"};
Arguments = "-f";
OutputSandboxBaseDestUri = "gsiftp://localhost";
ShallowRetryCount = 3;
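When many similar test jobs are needed, the JDL text can be produced from a dictionary instead of being edited by hand. This is a minimal sketch covering only the three value shapes used in the example above (strings, integers and lists of strings); jdl_value and render_jdl are hypothetical helpers, not part of the gLite tools.

```python
def jdl_value(value):
    """Format one Python value as a JDL literal."""
    if isinstance(value, str):
        return '"' + value + '"'
    if isinstance(value, (list, tuple)):
        return "{" + ",".join(jdl_value(item) for item in value) + "}"
    return str(value)                     # integers are written bare


def render_jdl(attributes):
    """Render attributes as one 'Name = value;' line each, in insertion order."""
    return "\n".join(
        f"{name} = {jdl_value(value)};" for name, value in attributes.items()
    )


HOSTNAME_JOB = {
    "Type": "Job",
    "JobType": "Normal",
    "Executable": "/bin/hostname",
    "StdOutput": "hostname.out",
    "StdError": "hostname.err",
    "OutputSandbox": ["hostname.err", "hostname.out"],
    "Arguments": "-f",
    "OutputSandboxBaseDestUri": "gsiftp://localhost",
    "ShallowRetryCount": 3,
}

if __name__ == "__main__":
    print(render_jdl(HOSTNAME_JOB))
```

Redirecting the output to hostname-cream.jdl gives a file with the same attributes as the one on this slide.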


Working test

• SSH to the UI to test whether the CE can receive and execute a simple job

$ ssh [email protected]    # password: XXXXXXX
$ voms-proxy-init --voms euindia
[cut]
[rotondo@genius ~]$ glite-ce-delegate-proxy -e grid-test-33.trigrid.it riccardo
2010-06-29 02:36:21,683 WARN - No configuration file suitable for loading. Using built-in configuration
2010-06-29 02:36:26,389 NOTICE - Proxy with delegation id [riccardo] succesfully delegated to endpoint [https://grid-test-33.trigrid.it:8443//ce-cream/services/gridsite-delegation]
[rotondo@genius ~]$ glite-ce-job-submit -r grid-test-33.trigrid.it:8443/cream-pbs-cert -D riccardo hostname-cream.jdl
2010-06-29 02:39:06,444 WARN - No configuration file suitable for loading. Using built-in configuration
https://grid-test-33.trigrid.it:8443/CREAM501920532
$ glite-ce-job-status https://ceristXX.grid.arn.dz:8443/CREAM888739522
****** JobID=[https://ceristXX.grid.arn.dz:8443/CREAM888739522]
        Status = [DONE-OK]
        ExitCode = [0]
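To script this check, the status text can be reduced to its three fields. The sketch below assumes the "Name = [value]" layout shown in the output on this slide; parse_job_status and job_succeeded are hypothetical helper names.

```python
import re


def parse_job_status(output):
    """Extract JobID, Status and ExitCode from glite-ce-job-status style text."""
    pattern = r"(JobID|Status|ExitCode)\s*=\s*\[([^\]]*)\]"
    return {name: value for name, value in re.findall(pattern, output)}


def job_succeeded(fields):
    """True only for the DONE-OK state with exit code 0."""
    return fields.get("Status") == "DONE-OK" and fields.get("ExitCode") == "0"


SAMPLE = """\
****** JobID=[https://ceristXX.grid.arn.dz:8443/CREAM888739522]
        Status = [DONE-OK]
        ExitCode = [0]
"""

if __name__ == "__main__":
    fields = parse_job_status(SAMPLE)
    print(fields["JobID"], "OK" if job_succeeded(fields) else "not finished or failed")
```

Any other status value is treated as "not yet successful", so a polling loop built on this helper should also set a time limit.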


Troubleshooting

• Which logs should you look at if something goes wrong?
  – /var/log/messages, for general errors
  – /opt/glite/var/log (especially glite-ce-cream.log)
  – /var/spool/pbs/server_priv/accounting/<date>, if even local submission to the batch system doesn't work


References

• INFNGRID generic installation guide:

– http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:install-3_2

• YAIM configuration variables

– https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables

• CE Cream installation guide:

– GLITE Cream CE 3.2 SL5 Installation Guide [INFNGRID Release Wiki]

• YAIM system administrator guide:

– https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400

• How To Check And Test Your CREAMCE

– http://grid.pd.infn.it/cream/field.php?n=Main.HowToCheckAndTestYourCREAMCE


Thank you for your kind attention!

Any questions?