Enabling Grids for E-sciencE The INFN GRID Marco Verlato (INFN-Padova) EELA WP2 E-infrastructure Workshop Rio de Janeiro, 20-23 August 2007

The INFN GRID


Page 1: The INFN GRID


The INFN GRID

Marco Verlato (INFN-Padova)

EELA WP2 E-infrastructure Workshop

Rio de Janeiro, 20-23 August 2007

Page 2: The INFN GRID


Outline

• A bit of history

• INFNGRID Overview

• INFNGRID Release

• INFNGRID Services

• From developers to production…

• Monitoring and Accounting

• Users and Sites Support

• Managing procedures

Page 3: The INFN GRID


The INFN GRID project

• The first national project (Feb. 2000) aiming to develop grid technology and the new e-infrastructure needed to meet LHC (and e-Science) computing requirements

• e-Infrastructure = Internet + new Web and Grid services on top of a physical layer composed of network, computing, supercomputing and storage resources, made properly available in a shared fashion by the new Grid services

• Since then, many Italian and EU projects have made this a reality

• Many scientific sectors in Italy, Europe and the entire world now base their research activities on the Grid

• INFN Grid continues to be the national container used by INFN to reach its goals, coordinating all the activities:
  – in national, European and international Grid projects
  – in the standardization processes of the Open Grid Forum (OGF)
  – in the definition of EU policies in the ICT sector of Research Infrastructures
  – through its managerial structure: Executive Board, Technical Board, …

Page 4: The INFN GRID


The INFN GRID portal

http://grid.infn.it

Page 5: The INFN GRID


The strategy

• Clear and stable objectives: development of the technology and of the infrastructure needed for LHC computing, but of general value

• Variable instruments: use of projects and external funds (from the EU, MIUR, …) to reach the goal

• Coordination among all the projects (Executive Board)
  – Grid middleware & infrastructure needed by INFN and LHC within a number of core European and international projects, often coordinated by CERN: DataGrid, DataTAG, EGEE, EGEE-II, WLCG
  – Often fostered by INFN itself

• International collaboration with US Globus and Condor for the middleware, and with Grid projects like Open Science Grid and the Open Grid Forum, in order to reach global interoperability among the developed services and the adoption of international standards

• National pioneering development of the middleware and of the national infrastructure in the areas not covered by EU projects, via national projects like Grid.it, LIBI, EGG, …

• Strong contribution to political committees: the e-Infrastructure Reflection Group (eIRG -> ESFRI), EU concertation meetings, and the involved units of the Commission (F2 and F3), to establish activity programmes (calls)

Page 6: The INFN GRID


Some history … LHC EGEE Grid

• 1999 – MONARC project
  – early discussions on how to organise distributed computing for LHC
• 2000 – growing interest in grid technology
  – the HEP community was the driver in launching the DataGrid project
• 2001-2004 – EU DataGrid project / EU DataTAG project
  – middleware & testbed for an operational grid
• 2002-2005 – LHC Computing Grid (LCG)
  – deploying the results of DataGrid to provide a production facility for the LHC experiments
• 2004-2006 – EU EGEE project, phase 1
  – starts from the LCG grid
  – shared production infrastructure
  – expanding to other communities and sciences
• 2006-2008 – EU EGEE-II
  – building on phase 1
  – expanding applications and communities …
• … and in the future
  – a worldwide grid infrastructure?
  – interoperating and co-operating infrastructures?


Page 7: The INFN GRID


Other FP6 activities of INFN GRID in Europe/1

• To guarantee Open Source Grid middleware evolution towards international standards
  – OMII-Europe

• … and its availability through an effective repository
  – ETICS

• To contribute to informatics R&D activities
  – CoreGRID

• To coordinate the EGEE extension in the world
  – EUMedGrid
  – EU-IndiaGrid
  – EUChinaGrid
  – EELA

Page 8: The INFN GRID


Other FP6 activities of INFN GRID in Europe/2

• To promote EGEE to new scientific communities
  – GRIDCC (real-time applications and instrument control)
  – BioinfoGRID (bioinformatics; coordinated by CNR)
  – LIBI (MIUR, bioinformatics in Italy)
  – Cyclops (civil protection)

• To contribute to e-IRG, the e-Infrastructure Reflection Group, born in Rome in December 2003
  – Initiative of the Italian Presidency on "eInfrastructures (Internet and Grids) – The new foundation for knowledge-based Societies", an event organised by MIUR, INFN and the EU Commission
  – Representatives in e-IRG appointed by the EU science ministers
  – Policies and roadmap for e-Infrastructure development in the EU

• To coordinate the participation in the Open Grid Forum (formerly GGF)

Page 9: The INFN GRID


INFN GRID / FP6 active projects

Page 10: The INFN GRID


FP7: guarantee sustainability

• The future of Grids in FP7 after 2008
  – EGEE proposed to the European Parliament to set up a European Grid Initiative (EGI) in order to:
    · guarantee long-term support & development of the European e-Infrastructure based on EGEE, DEISA and the national Grid projects funded by the National Grid Initiatives (NGIs)
    · provide a coordination framework at EU level, as done for the research networks by Geant, DANTE and the national networks like GARR

• The Commission asked that a plan for a long-term sustainable Grid infrastructure (EGI + EGEE-III, …) be included among the goals of EGEE-II (as with DANTE + Geant 1-2)

• Building EGI at EU level and a National Grid Initiative at national level is among the main goals of FP7

Page 11: The INFN GRID


The future of INFNGRID: IGI

• Grid.it, the 3+1-year national project funded by MIUR with 12 M€ (2002-05), ended in 2006

• The future: the Italian Grid Infrastructure (IGI) association
  – The EU (eIRG, ESFRI) requires the fusion of the different pieces of national Grids into a single national organisation (NGI) acting as the unique interface to the EU --> IGI for Italy
  – Substantial consensus for the creation of IGI, for a common governance of the Italian e-Infrastructure, from all the public bodies involved: INFN Grid, S-PACI, ENEA Grid, CNR, INAF, the national supercomputing centres (CINECA, CILEA, CASPUR) and the new "nuovi PON" consortia
  – Under evaluation with MIUR: the evolution of GARR towards a more general body managing all the components of the infrastructure: network, Grid, digital libraries, …

• Crucial for INFN in 2007-2008 will be managing the transition from INFN Grid to IGI in such a way as to preserve, and if possible enhance, the organisational level that allowed Italy to reach world leadership and become a leading partner of EGI

Page 12: The INFN GRID


INFNGRID Overview

Page 13: The INFN GRID


Supported Sites

40 Sites supported:

• 31 INFN Sites

• 9 non-INFN Sites

Total Resources:

• About 4600 CPUs

• About 1000 TB Disk Storage

(+ About 700 TB Tape)

Page 14: The INFN GRID


Supported VOs

40 VOs supported:

•4 LHC (ALICE, ATLAS, CMS, LHCB)

•3 cert (DTEAM, OPS, INFNGRID)

•8 Regional (BIO, COMPCHEM, ENEA, INAF, INGV, THEOPHYS, VIRGO)

•1 catch all VO: GRIDIT

•23 Other VOs

Recently a new regional VO has been enabled: COMPASSIT

Page 15: The INFN GRID


Components of the production Grid

Grid is not only CPUs and storage.

Other elements are just as fundamental for running, managing and monitoring the grid:

• Middleware

• Grid Services

• Monitoring tools

• Accounting tools

• Management and control infrastructure

• Users

Page 16: The INFN GRID


GRID Management

Grid management is performed by the Italian Regional Operation Centre (ROC). Its main activities are:

• production and testing of the INFNGRID release

• deployment of the release to the sites, support to local administrators, and site certification

• deployment of the release onto the central grid services

• maintenance of the grid services

• periodic checks of resource and service status

• accounting of resource usage

• support to site managers and users at the Italian level

• support to site managers and users at the European level

• introduction of new Italian sites

• introduction of new regional VOs

The IT-ROC is involved in many other activities not directly related to the production infrastructure, e.g. the PreProduction, Preview and Certification testbeds.

Page 17: The INFN GRID


The Italian Regional Operation Center (ROC)

One of the 10 existing ROCs in EGEE:

• Operations Coordination Centre (OCC)
  – management and oversight of all operational and support activities

• Regional Operations Centres (ROCs)
  – provide the core of the support infrastructure, each supporting a number of resource centres within its region

• Grid Operator on Duty

• Grid User Support (GGUS)
  – at FZK; coordination and management of user support, single point of contact for users

Page 18: The INFN GRID


Middleware

INFNGRID RELEASE

Page 19: The INFN GRID


INFNGRID Release

The middleware installed on INFNGRID nodes is a customization of the gLite middleware used in the LCG/EGEE community. The customized INFNGRID release is packaged by the INFN release team (grid-release<at>infn.it). The ROC is responsible for the deployment of the release. At the moment INFNGRID 3.0 Update 28 (based on gLite 3.0 Update 28) is deployed.

[Timeline figure, 2003-2008: LCG 1.0 / INFN-GRID 1.0, then LCG 2.0 / INFN-GRID 2.0 during EGEE, then gLite 3.0 / INFN-GRID 3.0 during EGEE-II.]

Page 20: The INFN GRID


INFNGRID customizations: why?

• VOs not supported by EGEE: define configuration parameters once (e.g. VO servers, pool accounts, VOMS certificates, …) to reduce the risk of misconfiguration

• MPI (requested by non-HEP sciences), additional GridICE configuration (to monitor the WNs), AFS read-only (a CDF requirement), …

• Deploy additional middleware in a non-intrusive way: since Nov. 2004 VOMS (now in EGEE), DGAS (Distributed Grid Accounting System), NetworkMonitor (monitors network-connection metrics)

Page 21: The INFN GRID


INFNGRID customizations

• Additional VOs (~20)
• GridICE on almost all profiles (including the WN)
• Preconfigured support for MPI:
  – WNs without a shared home; home directories are synchronized using scp with host-based authentication
• DGAS accounting:
  – new profile (HLR server) + additional packages on the CE
• NME (Network Monitor Element)
• Collaboration with CNAF-T1 on Quattor
• "Plug-and-play" UI
  – UI installable without administrator privileges
• NTP
• AFS (read-only) on the WN (needed by the CDF VO)
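The scp-based home synchronization mentioned above can be sketched as a small script. The node names and job directory below are hypothetical, and a real site would rely on host-based SSH authentication already being configured between the nodes; to keep the sketch inspectable it only prints the commands it would run:

```shell
#!/bin/sh
# Sketch: replicate a job directory from the first MPI node to the other
# allocated worker nodes when no shared home is available. NODES and
# JOBDIR are placeholders; with host-based authentication in place,
# scp needs no password or per-user key.
NODES="wn-02.example.infn.it wn-03.example.infn.it"
JOBDIR="/home/atlas001/mpi-job"

for node in $NODES; do
    # echo the command instead of executing it; drop the echo
    # to perform the actual copies
    echo scp -r -o HostbasedAuthentication=yes "$JOBDIR" "$node:$JOBDIR"
done
```

Dropping the `echo` would perform one recursive copy per worker node listed in `NODES`.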

Page 22: The INFN GRID


Packages and metapackages

• The packages are distributed in repositories available via HTTP

• For each EGEE release there are 2 repositories collecting different types of packages:
  – middleware: http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/rhel30/
  – security: http://linuxsoft.cern.ch/LCG-CAs/current/

• The INFNGRID customizations add a 3rd repository:
  – http://grid-it.cnaf.infn.it/apt/ig_sl3-i386
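On an apt(-rpm) based node, the three repositories above would be wired in through sources.list entries. The URLs come from the slides, but the distribution and component fields after each URL are illustrative assumptions, not copied from the release notes:

```shell
#!/bin/sh
# Sketch: write an apt-rpm sources fragment for the three repositories.
# The fields after each base URL (e.g. "rhel30 externals Release3.0")
# are assumed, not taken from the actual INFNGRID documentation.
cat > ./ig-sources.list <<'EOF'
rpm http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0 rhel30 externals Release3.0
rpm http://linuxsoft.cern.ch/LCG-CAs current production
rpm http://grid-it.cnaf.infn.it/apt ig_sl3-i386 3_0_0
EOF

# one "rpm" line per repository
grep -c '^rpm ' ./ig-sources.list
```

In a real deployment this fragment would live under the apt configuration directory and be followed by an update/install of the desired metapackage.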

Page 23: The INFN GRID


Metapackages management process

• 1: starting from the EGEE lists, update the INFNGRID lists (maintained in an SVN repository)

• 2: once the lists are OK, generate a first version of the INFNGRID metapackages to test them

• 3: install and/or upgrade the metapackages on the release testbed

• 4: if there are errors, correct them and go back to step 2

• 5: publish the new metapackages on the official repositories so they are available to everybody
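The five steps above amount to a build–test–publish loop. The sketch below uses stub functions in place of the real SVN- and bmpl-based tooling (all function names are placeholders); the testbed install deliberately "fails" once to show the correct-and-retry of step 4:

```shell
#!/bin/sh
# Sketch of the metapackage update cycle (steps 1-5). The functions are
# stubs standing in for the real tooling; here they only log what would
# happen. install_on_testbed fails on the first attempt to exercise the
# "correct and go back to step 2" branch.
attempts=0

update_lists()       { echo "1: svn update of the INFNGRID lists"; }
build_metapackages() { echo "2: generate candidate metapackages"; }
install_on_testbed() { attempts=$((attempts + 1)); [ "$attempts" -ge 2 ]; }
fix_lists()          { echo "4: correct the lists, retry from step 2"; }
publish()            { echo "5: publish metapackages to the repository"; }

update_lists
until build_metapackages && install_on_testbed; do
    fix_lists                       # step 4: fix, then loop back
done
publish                             # step 5: reached only after a clean install
```

Running the sketch prints steps 1 and 2, the step-4 retry, step 2 again, and finally step 5, mirroring the numbered procedure.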

Page 24: The INFN GRID


Metapackages management

• Our metapackages are supersets of the EGEE ones:
  – INFNGRID metapackage = EGEE metapackage + additional INFNGRID rpms

• EGEE-distributed metapackages:
  – http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/rhel30

• Flat rpm lists are available:
  – http://glite.web.cern.ch/glite/packages/R3.0/deployment

• We maintain a customized copy of the lists and resync them easily:
  – https://forge.cnaf.infn.it/plugins/scmsvn/viewcvs.php/trunk/ig-metapackages/tools/getglists?rev=1888&root=igrelease&view=log

• Using another tool (bmpl) we can generate all the artifacts starting from the lists:
  – "our" (INFNGRID) customized metapackages: http://grid-it.cnaf.infn.it/apt/ig_sl3-i386
  – HTML files with the lists of the packages (one list per profile): http://grid-it.cnaf.infn.it/?packages
  – Quattor template lists: http://grid-it.cnaf.infn.it/?quattor

Page 25: The INFN GRID


ig-yaim

• The ig-yaim package is an extension of glite-yaim. It provides:
  – additional functions, or functions that override existing ones; both are stored in functions/local instead of functions/
    · e.g. to configure NTP, AFS, the LCMAPS gridmapfile/groupmapfile, …
  – more pool accounts => ig-users.def instead of users.def
  – more configuration parameters => ig-site-info.def instead of site-info.def

• Both packages (glite-yaim, ig-yaim) are needed!
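As a rough illustration of the override mechanism, an ig-site-info.def carries the standard site-info.def variables plus INFNGRID-specific additions, in yaim's plain shell key=value syntax. Every name and value below is a hypothetical example, not copied from the real release:

```shell
# Hypothetical excerpt of an ig-site-info.def. yaim configuration files
# are plain shell key=value assignments; all names and values here are
# illustrative assumptions.
SITE_NAME="INFN-EXAMPLE"
CE_HOST="ce.example.infn.it"
VOS="atlas cms gridit infngrid"                # regional VOs would be added here
NTP_HOSTS_IP="192.135.0.1"                     # an INFNGRID-specific addition
USERS_CONF="/opt/glite/yaim/etc/ig-users.def"  # replaces the standard users.def

echo "configuring site $SITE_NAME for VOs: $VOS"
```

Because the file is just shell, yaim can source it and ig-yaim's extra variables coexist with the glite-yaim ones, which is why both packages are required together.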

Page 26: The INFN GRID


Documentation

• Documentation is published with each release
  – release notes and upgrade/installation guides:
    http://grid-it.cnaf.infn.it/?siteinstall
    http://grid-it.cnaf.infn.it/?siteupgrade
    http://grid-it.cnaf.infn.it/?releasenotes
  – written in LaTeX and published in HTML, PDF and plain text

• Additional information about updates and various notes is also published on wiki pages:
  – https://grid-it.cnaf.infn.it/checklist/modules/dokuwiki/doku.php?id=rel:updates
  – https://grid-it.cnaf.infn.it/checklist/modules/dokuwiki/doku.php?id=rel:hlr_server_installation_and_configuration

• Everything is available for site managers on a central repository

Page 27: The INFN GRID


Updates deployment
– Since the introduction of gLite 3.0, EGEE has shipped no more big release changes, but a series of smaller, frequent updates (about weekly)
– The INFNGRID release was updated accordingly

gLite updates:
17/10/2006 - gLite Update 06
20/10/2006 - gLite Update 07
24/10/2006 - gLite Update 08
14/11/2006 - gLite Update 09
11/12/2006 - gLite Update 10
19/12/2006 - gLite Update 11
22/01/2007 - gLite Update 12
05/02/2007 - gLite Update 13
19/02/2007 - gLite Update 14
26/02/2007 - gLite Update 15
…

INFNGRID updates:
27/10/2006 - INFNGRID Update 06/07/08 (+ new dgas, gridice packages)
15/11/2006 - INFNGRID Update 09
19/12/2006 - INFNGRID Update 10/11
29/01/2007 - INFNGRID Update 12
14/02/2007 - INFNGRID Update 13
20/02/2007 - INFNGRID Update 14
27/02/2007 - INFNGRID Update 15
…

Steps:
– gLite update announcement
– INFNGRID release alignment to the announced update (ig-metapackages, ig-yaim)
– local testing
– IT-ROC deployment

Page 28: The INFN GRID


INFNGRID Services Overview

Page 29: The INFN GRID


The general web portal

Page 30: The INFN GRID


The technical web portal

Page 31: The INFN GRID


General Purpose Services

Page 32: The INFN GRID


General purpose services – VOMS servers

Page 33: The INFN GRID


VOMSes stats – number of users per VO:

VO          Users
argo           17
bio            44
compchem       31
enea            8
eumed          56
euchina        35
gridit         89
inaf           25
infngrid      178
ingv           12
libi           10
pamela         16
planck         16
theophys       20
virgo           9
cdf          1133
egrid          28

TOP USERS (about 85% of total proxies):

CDF (~50k proxies/month)

EUMED (~500 proxies/month)

PAMELA (~500 proxies/month)

EUCHINA (~400 proxies/month)

INFNGRID (Test purposes ~ 200 proxies/month)

Page 34: The INFN GRID


General purpose services – HLRs

Accounting: Home Location Register

• DGAS (Distributed Grid Accounting System) is used to account for the jobs running on the farms (both grid and non-grid jobs)

• 12 distributed first-level HLRs

• 1 experimental second-level HLR to aggregate data from the first level

• DGAS2Apel is used to send the job records to the GOC for all sites

Page 35: The INFN GRID


VO-Dedicated Services

New DEVEL-INFNGRID-3.1 WMS and LB instances are coming soon into production as VO-dedicated services (ATLAS, CMS, CDF, LHCb).

VO-specific services previously run by the INFNGRID certification testbed have now been moved to the production DEVEL release.

In total there are 18 VO-dedicated services, which will become 25 with the introduction of the 3.1 WMS and LB.

Page 36: The INFN GRID


FTS channels and VOs

• Installed and fully managed via Quattor/yaim
• 3 hosts as front-ends, 1 Oracle cluster as back-end
• Not only LHC VOs:
  – PAMELA
  – VIRGO
• Full standard T1-T1 + T1-T2 + STAR channels:
  – 51 channel agents
  – 7 VO agents
• A prototype monitoring tool is available:
  – agent and Tomcat log files are parsed and saved into a MySQL DB
  – web interface: http://argus.cnaf.infn.it/fts/index-FTS.php
• Support:
  – dedicated department team for tickets
  – mailing list: fts-support<at>cnaf.infn.it

Page 37: The INFN GRID


FTS transfer overview

Page 38: The INFN GRID


Testbeds

Middleware flow from developers to production in EGEE and INFNGRID

Page 39: The INFN GRID


Testbeds:
• Preview
• Certification CERN
• Certification INFN
• Pre-Production Service (PPS)

[Flow diagram: JRA1 developers feed SA3 (CERN certification), which feeds the SA1 PPS (pre-production) and then the SA1 EGEE PS (production); the JRA1/SA1 Preview TB and the INFN certification TB feed the INFNGRID release team, which in turn feeds the SA1 INFNGRID PS (production) and the SA1 INFNGRID PS (DEVEL production); VOs interact with the testbeds at each stage.]

Page 40: The INFN GRID


Pre-Production Service (PPS) in EGEE

• AIM: the last step of middleware testing before deployment at production scale

• INPUT: CERN certification (SA3)

• SCOPE: EGEE SA1; about 30 sites spread all over Europe (plus 1 in Taiwan)

• COORDINATION: CERN

• USERS ALLOWED: all the LHC VOs, diligent, switch, and 2 PPS fake VOs

• CONTACTS: project-eu-egee-pre-production-service<at>cern.ch
  http://egee-pre-production-service.web.cern.ch/egee-pre-production-service/

• ACTIVITIES: the main activity is the testing of the installation procedures and of the basic functionality of releases/patches, done by site managers.
  There is limited middleware testing done by users: this is the main PPS issue!

Page 41: The INFN GRID


Pre-Production Service (PPS) in EGEE

• The PPS is run like the production service:
  – SAM tests
  – tickets from the COD
  – GOCDB registration
  – etc.

Page 42: The INFN GRID


Italian participation in the PPS

• 3 INFN sites: CNAF, PADOVA, BARI
• 2 Diligent sites: CNR, ESRIN

[Deployment diagram: hosts such as cert-ce-01/03, cert-se-01, cert-rb-01, cert-bdii-03, prep-ce-01/02, prep-se-01, pccms2, vgridba5, cert-voms-01, pps-fts, pps-lfc, cert-ui-01, pps-apt-repo, cert-mon-01, cert-mon; production farms of 150, 68 and 150 slots at CNAF, BARI and PADOVA; all other PPS sites are outside INFN.]

CNAF: 2 CEs with access to the production farm, 1 SE, 1 mon box + central services (VOMS, UI, BDII, WMS, FTS, LFC, APT repo); people: D. Cesini, M. Selmi, D. Dongiovanni

PADOVA: 2 CEs with access to the production farm, 1 SE, 1 mon box; people: M. Verlato, S. Bertocco

BARI: 1 CE with access to the production farm, 1 SE; people: G. Donvito

Page 43: The INFN GRID


Preview Testbed

• It is now an official EGEE activity, requested by JRA1, to expose to users those components not yet covered by the CERN (SA3) certification. The aim is to get feedback from end users and site managers.

• It is a distributed testbed deployed at a few European sites.

• A joint SA1-JRA1 effort is needed in order not to dedicate people full-time to this activity, as acknowledged by the TCG and PMB.

• COORDINATOR: JRA1 (Claudio Grandi)

• USERS ALLOWED: JRA1/Preview people and all interested users

• CURRENT ACTIVITIES: CREAM, gLexec, gPBox

• CONTACTS: project-eu-egee-middleware-preview<at>cern.ch
  https://twiki.cern.ch/twiki/bin/view/EGEE/EGEEgLitePreviewNowTesting

Page 44: The INFN GRID


Italian participation in the Preview testbed

• 3 INFN sites:
  – CNAF (D. Cesini, D. Dongiovanni)
  – PADOVA (M. Sgaravatto, M. Verlato, S. Bertocco)
  – ROMA1 (A. Barchiesi)

Hardware resources are taken partly from the INFN certification testbed and partly from the JRA1 testbed.

[Deployment diagram: hosts such as cert-ce-04/05/06, cert-wn-03/04/05/06, cert-se-01, cert-bdii-02, cert-04, egee-rb-05/08, pre-ce-01, pre-ui-01, cert-pbox-01/02, cream-01 … cream-06, pad-wn-02, rm1-ce, rm1-wn; physical nodes run virtualized services; all other Preview sites are outside INFN.]

Preview services deployed in Italy:
PADOVA: 1 CREAM CE + 5 WNs
CNAF: 1 WMS 3.1, 1 BDII, 1 gLite CE + 1 WN, 1 UI, 1 DPM SE; (for gPBox) 1 WMS 3.1 + 2 gLite CEs + 1 LCG CE + 3 WNs + 2 gPBox servers
ROMA1: 1 CE + 1 WN for gPBox tests (to be installed)

Virtual machines are used at CNAF to optimize hardware resources.

Page 45: The INFN GRID


CERN Certification (SA3)

• EGEE activity run by SA3 – it is the official EGEE certification testbed that releases gLite middleware to the PPS and to production.

• ACTIVITY: test and certify all the gLite components; release packaging.

• COORDINATION: CERN

• INFN sites involved: CNAF (A. Italiano), MILANO (E. Molinari), PADOVA (A. Gianelle)

• Italian activities: testing of information providers, DGAS, WMS

SA3 CERN certification testbed, INFN participation:
Services provided: 1 LSF CE + 1 batch-system server on a dedicated machine + 1 DGAS HLR + 1 site BDII + 2 WNs. All services are located at CNAF.

[Diagram: hosts wmstest-ce-02 … wmstest-ce-08 at CNAF.]

Recently the responsibility for WMS testing passed from CERN to INFN – the main focus of SA3-Italia.

Page 46: The INFN GRID


INFNGRID Certification Testbed

A distributed testbed deployed at a few Italian sites, where the EGEE middleware with the INFNGRID customizations and the INFNGRID grid products are installed for testing by a selected number of end users and grid managers before being released.

It is NOT an official EGEE activity, and it should not be confused with the CERN certification testbed run by the SA3 EGEE activity.

Most of the servers have migrated to the Preview testbed.

SITES and PEOPLE:
CNAF (D. Cesini, D. Dongiovanni)
PADOVA (S. Dalla Fina, C. Aifitimiei, M. Verlato)
TORINO (R. Brunetti, G. Patania, F. Nebiolo)
ROMA1 (A. Barchiesi)

• CONTACTS: cert-release<at>infn.it
  http://grid-it.cnaf.infn.it/certification

Page 47: The INFN GRID


INFNGRID Certification Testbed – ACTIVITIES / 1

WMS (CNAF): there is no more time to perform detailed tests as in the first phase of the certification testbed
(https://grid-it.cnaf.infn.it/certification/?INFN_Grid_Certification_Testbed:WMS%2BLB_TEST).

Provide resources to VOs or developers, and maintain patched and experimental WMSes.

Experimental WMS 3.0:
- 1 ATLAS WMS
- 1 ATLAS LB
- 1 CMS WMS + LB
- 1 CDF WMS + LB
- 1 LHCb WMS + LB

WMSes for developers:
- 2 WMS + LB

The experimental WMSes were heavily used in the last period because they were more stable than the officially released ones, due to the long time needed for patches to reach the production service:
- poor support from certification
- production usage statistics altered

They were recently tagged as INFNGRID DEVEL production services (see next slide).

Support to JRA1 for the installation of WMS 3.1 in the development testbed.

Page 48: The INFN GRID


INFNGRID Certification Testbed – ACTIVITIES / 2

DGAS certification (TORINO):
- 4 physical servers, virtualized in a very dynamic way

DEVEL release (PADOVA/CNAF):
- To speed up the flow of patches into the services used by the VOs, it does not follow the normal middleware certification process
- Based on the official INFNGRID release (3.0)
- A wiki page describes how to transform a normal INFNGRID release into a DEVEL one:
  http://agenda.cnaf.infn.it/materialDisplay.py?contribId=4&materialId=0&confId=18
- An apt repository is used to keep control of what goes into the DEVEL release
- 1 WMS server at CNAF
- Announced via mail after testing at CNAF
- Cannot come with all the guarantees of normally certified middleware

Page 49: The INFN GRID


INFNGRID Certification Testbed – ACTIVITIES / 3

INFNGRID release certification (PADOVA):
- 20 virtual machines on 5 physical servers
- http://igrelease.forge.cnaf.infn.it

StoRM – some resources provided:
- 3 physical servers

Server virtualization (all sites)

Page 50: The INFN GRID


INFNGRID Certification Testbed – testbed snapshot

[Diagram: at CNAF, the experimental/patched WMSes (cert-rb-02 … cert-rb-07, egee-rb-04, egee-rb-06, cert-bdii-01), passed to DEVEL production or used by JRA1, plus the INFN certification services (cert-wn-01/02/03, cert-ce-02, ibm139) used for virtualization tests and as resources for the StoRM tests; at PADOVA, the INFNGRID release testbed (5 physical servers × 4 VMs = 20 VMs, hosting Release1 … Release5 and Release DEVEL); at TORINO, a CE for the DGAS tests; at ROMA1, one server.]

Page 51: The INFN GRID


INFNGRID Certification Testbed – "NEW" VIRTUAL GRID

Create a self-contained grid, using old T1 hardware resources, to be dedicated to WMS tests:
- total control of what is installed
- no interference with the production grid (no altered statistics, no site managers complaining about stuck jobs, no production CPU wasted)

[Diagram: physical services (WMS, LB, BDII) under the developers' control, plus virtual sites; the exact deployment is under study, probably 1 LCG CE and 1 WN per physical box; 37 physical boxes available per rack (2 racks available); dual PIII 1.4 GHz, 2 GB RAM boxes dedicated to the virtual sites, while the services can be installed on more powerful machines.]

A virtual-site prototype is already installed on a couple of boxes.

We are investigating the performance that can be reached with this kind of hardware/deployment.

Page 52: The INFN GRID


Monitoring and Accounting

Monitoring and accounting tools used by the ROC

Page 53: The INFN GRID


Monitoring

GridICE: http://gridice4.cnaf.infn.it:50080/gridice/site

Developed by INFN. Several servers with different scopes are installed and maintained by the IT-ROC.

Page 54: The INFN GRID


Monitoring

GSTAT: http://goc.grid.sinica.edu.tw/gstat//Italy.html

Developed outside INFN. A GSTAT server is maintained by the IT-ROC.

• GSTAT queries the information system every 5 minutes

• The sites and nodes checked are those registered in the GOC DB

• Inconsistencies in the published information, and the absence of a service that a site should publish, are reported as errors

Page 55: The INFN GRID


Monitoring

SAM: https://lcg-sam.cern.ch:8443/sam/sam.py

SAM-ADMIN: https://cic2.gridops.org/samadmin/

SAM is the official CERN-EGEE testing tool; tests are performed by jobs submitted to the sites. Submission is triggered via an admin web interface, a mirror of which is hosted at CNAF and maintained by the IT-ROC.

Page 56: The INFN GRID


Accounting

ROCRep & HLRMON:
http://grid-it.cnaf.infn.it/rocrep/index.php
http://grid-it.cnaf.infn.it/hlrmon/index.php
(data for all VOs and all sites, T1 excluded)

Web interfaces providing aggregated grid-usage data. Two versions exist:

1) data taken from the GridICE DB

2) data taken from the DGAS HLR DB – a new interface is being released

Page 57: The INFN GRID


Accounting

GOC accounting system:
http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php

Data from the HLR servers are pushed into the GOC system through the dgas2apel tool.

Page 58: The INFN GRID


Users and Sites Support

Support

Page 59: The INFN GRID


Support

• The IT-ROC offers a number of grid services and controls their correct operation. But not only…

• The IT-ROC also continuously monitors the status of the sites inside the ROC itself and, in case of problems, helps site managers or users find a solution.

• As a parallel activity, the IT-ROC is also involved in the monitoring and support of the entire EGEE infrastructure (TPM and COD): the same support given to INFNGRID users and sites is given to the LCG/EGEE grid, in a round-robin manner among the ROCs.

Page 60: The INFN GRID


Users and sites support

The main tools for supporting users are the ticketing systems:

• EGEE makes use of the GGUS (Global Grid User Support) ticketing system (www.ggus.org)

• Each ROC uses its own tools, interfaced to GGUS in a bidirectional way. By means of web services it is possible to:
  – transfer tickets from the global to the regional system
  – transfer tickets from the regional to the global system

• Once tickets are logged, they are assigned to the proper support unit, either in GGUS or in the regional systems

• The IT-ROC ticketing system is based on XOOPS/xHelp

Page 61: The INFN GRID


Interface to GGUS

[Flow diagram: a ticket enters the GGUS system through the web portal; GGUS/TPM assigns it to a ROC; each ROC interface passes it to the ROC helpdesk, which dispatches it to one of its support units (SU-1 … SU-N); the ticket is then either solved or re-assigned back to GGUS, possibly towards another ROC.]

Page 62: The INFN GRID


Interface to GGUS

A new ticket arrives from GGUS

We assign the ticket to the site it concerns

Page 63: The INFN GRID


Interface to GGUS

The site reassigns the ticket to GGUS…

…and adds a response

Page 64: The INFN GRID


IT-ROC Control Shifts

About 20 supporters carry out the monitoring activity, organized in 2 shifts per day from Monday to Friday, with 2 people per shift. At the end of each shift a report is produced.

During the shift the supporters:

• check the grid status and try to discover problems before the users do; in case of problems, they open tickets to the department concerned in order to find a solution and, if possible, suggest one

• perform site certification during the deployment phases

• check the status of the tickets and urge experts or site managers to provide answers and solutions

Page 65: The INFN GRID


IT-ROC Shifts ISSUES

• The ROC monitoring is oriented to the infrastructure and not to the VOs

• The active monitoring done via test jobs (i.e. the SAM tool) uses 3 VOs dedicated to infrastructure testing: dteam, ops and infngrid that in general have greater priority on sites the side effect of this is that VO specific problems are not observed. Passive controls (i.e. gstat and gridice) are not affected by this problem.

• The infrastructure test can be ok, but users can experience problems as well.

The current control-shift organisation seems insufficient for the VOs' needs, and the LHC VOs are already performing their own tests (VO dashboards) to cope with this situation.

Page 66: The INFN GRID


IT-ROC Shifts ISSUES

Both the Italian and the European experience in Grid monitoring show that the infrastructure-oriented monitoring must be integrated with a more VO-specific monitoring. But INFNGRID alone hosts about 40 VOs!

Collaboration between the ROC and the people involved in the VO dashboards is desirable, at least to define a set of controls that are important for the VOs but not yet performed by the ROC
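The monitoring gap can be made concrete with a one-line set difference: given the VOs enabled at a site, everything outside the three test VOs is invisible to the ROC's active tests. The site VO list below is an example, not real site configuration:

```python
def unmonitored_vos(site_vos, test_vos=("dteam", "ops", "infngrid")):
    """Return the VOs enabled at a site that the ROC's active test VOs
    do not cover (and whose problems the ROC shifts may therefore miss)."""
    return sorted(set(site_vos) - set(test_vos))

site_vos = ["dteam", "ops", "infngrid", "atlas", "cms", "theophys"]
# unmonitored_vos(site_vos) -> ['atlas', 'cms', 'theophys']
```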

Page 67: The INFN GRID


TPM and COD

TPM (Ticket Process Manager): responsible for the correct assignment of tickets in the central GGUS system. When a ticket is logged it is automatically assigned to the TPM group, which routes it to the proper support unit or, if able, proposes a solution. The whole ticket lifetime is under the control of the TPM, who can modify the ticket at any time, urging an answer or a solution. Each ROC performs a one-week shift in a round-robin cycle.
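The round-robin TPM rota could be sketched as a lookup by ISO week number. The ROC list and the week-to-ROC mapping below are illustrative assumptions, not the actual GGUS schedule:

```python
import datetime

def roc_on_duty(date, rocs):
    """Return the ROC taking the TPM shift in the week containing `date`,
    assuming a simple rotation by ISO week number (the real GGUS rota
    may be arranged differently)."""
    week = date.isocalendar()[1]
    return rocs[week % len(rocs)]

# Illustrative ROC list; the real EGEE federation list is longer.
rocs = ["CERN", "Italy", "France", "UK/Ireland", "SouthEast"]
```

With 5 ROCs in the list, each ROC is on duty one week out of five.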

COD (CIC On Duty): the same monitoring done for the INFNGRID infrastructure is done for the EGEE infrastructure, using the same tools (i.e. GSTAT, SAM, GridICE, GGUS) plus some COD-specific tools (i.e. the COD dashboard)

The Italian ROC is also involved in the monitoring and support of the entire LCG/EGEE infrastructure, participating in the TPM and COD activities.

Page 68: The INFN GRID


Procedures

Managing procedures

Page 69: The INFN GRID


Introducing a new site

• Before joining INFNGRID, a site has to accept several rules, described in a Memorandum of Understanding (MoU). The COLG (Grid Local Coordinator) reads and signs it, then faxes the document to INFN-CNAF.

• Moreover, all sites must provide the email alias grid-prod@<domain>. This alias will be used to report problems and will be added to the site managers' mailing list; it should of course include all site managers of the grid site.

• The IT-ROC registers the site and the site-managers in the GOC-DB and creates a supporter-operative group in the XOOPS ticketing system.

• Site-managers have to register themselves in XOOPS, so they can be assigned to their supporter-operative group; each site-manager also has to register in the test VOs infngrid and dteam

• Site-managers install the middleware, following the instructions distributed by the Release Team (http://grid-it.cnaf.infn.it/, Installation section). When finished, they run some preliminary tests (http://grid-it.cnaf.infn.it/ --> Test&Cert --> Fry) and then request the ROC certification (http://grid-it.cnaf.infn.it/index.php?id=cmtreport&type=1).

• The IT-ROC logs a ticket to communicate with site-managers during the certification.

Page 70: The INFN GRID


MoU for sites

Every site has to:

• Provide computing and storage resources. Farm dimensions (at least 10 CPUs) and storage capacity will be agreed with each site

• Guarantee sufficient manpower to manage the site: at least 2 persons

• Manage the site resources efficiently: middleware installation and upgrades, patch application, and configuration changes as requested by CMT, all within the maximum time stated for each operation

• Answer tickets within 24 hours (T2 sites) or 48 hours (other sites), Monday to Friday

• Check its own status from time to time

• Guarantee continuity of site management and support, also during holiday periods

• Participate in SA1/Production-Grid phone conferences and meetings and compile the weekly pre-report

• Keep the information in the GOC-DB up to date

• Enable the test VOs (ops, dteam and infngrid), with a higher priority than the other VOs

• Any non-fulfilment noticed by the ROC will be reported to the biweekly INFNGRID phone conference, then to the COLG, and eventually to the EB
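The MoU response times (24 hours for T2 sites, 48 for others, Monday to Friday) can be turned into a deadline calculator. This is a minimal sketch that counts only Monday-to-Friday hours; how weekend-opened tickets are really counted is an assumption here, not stated in the MoU:

```python
import datetime

def response_deadline(opened, is_t2):
    """Deadline for the first answer to a ticket: 24 working hours for a
    Tier-2 site, 48 for other sites, counting Monday to Friday only.
    Simplified sketch: hours falling on Saturday/Sunday are skipped."""
    remaining = datetime.timedelta(hours=24 if is_t2 else 48)
    t = opened
    while remaining > datetime.timedelta(0):
        step = min(remaining, datetime.timedelta(hours=1))
        t += step
        if t.weekday() < 5:   # Monday..Friday count as working time
            remaining -= step
    return t

# A ticket opened Monday 09:00 at a T2 site is due Tuesday 09:00.
opened = datetime.datetime(2007, 8, 20, 9, 0)  # a Monday
```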

Page 71: The INFN GRID


Introducing a new VO

When an experiment asks to join the Grid as a new VO, a formal request is needed, followed by some technical steps.

Formal part:

• Needed resources and economic contribution, to be agreed between the experiment and the INFN GRID Executive Board (EB)

• Identify the experiment software and verify that it will work in the Grid environment

• Verify the support it will receive in the various INFN GRID production sites

• Communicate to the IT-ROC the names of the VO-managers, the Software-managers, and the persons responsible for the resources and for the experiment software support

• Specify the software requirements, the kind of jobs, and the final storage destination (CASTOR, SE, experiment disk server)

Page 72: The INFN GRID


Introducing a new VO

Once the Executive Board (EB) has approved the experiment request, the technical part begins:

• IT-ROC creates the VO on the VOMS server

• IT-ROC creates the VO support group on the ticketing system

• VO-managers fill in the VO identity card on the CIC portal

• IT-ROC announces the new VO to the sites

Page 73: The INFN GRID


Useful links…

• INFN GRID project: http://grid.infn.it/

• Italian Production grid: http://grid-it.cnaf.infn.it/

• SAM: https://lcg-sam.cern.ch:8443/sam/sam.py

• CIC Portal: http://cic.gridops.org/

• GSTAT: http://goc.grid.sinica.edu.tw/goc/

• GridICE: http://gridice4.cnaf.infn.it:50080/gridice/site/site.php

• GOC Accounting: http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php

THANK YOU