Operations structure of the INFN-GRID/Grid.it
Production Grid Infrastructure
Presenter (on behalf of the authors): Cristina Vistoli
cristina.vistoli@cnaf.infn.it
Italian grid operation manager
INFN CNAF – Bologna – Italy
Production Quality Grid Infrastructure
• Status of the infrastructure
• Operations structure and organization
• Grid monitoring and management
• Usage report and accounting
• User and operation support
The Italian Grid Production Infrastructure
about 40 Resource Centers
The grid resources can be accessed through central or VO-specific services (e.g. Resource Brokers)
28 sites are also part of the EGEE/LCG Grid infrastructure (and are registered in the central database of the Grid Operation Center)
the other 12 sites can be accessed through the Italian grid services only
http://grid-it.cnaf.infn.it
Production Infrastructure: Resources
InfnGrid-2_7_0
• InfnGrid-2_7_0 customization of LCG-2_7_0:
– Support for the following VOs:
• egrid, babar, zeus, biomed, magic, esr, cms, atlas, lhcb, alice (managed via LDAP VO server);
• pamela, infngrid, cdf, gridit, compchem, planck, bio, enea, theophys, ingv, inaf, virgo, argo (managed via VOMS server);
• euchina, eumed (optional and managed via VOMS server).
– DGAS (DataGrid Accounting System):
• Patched WMS lcg2.1.73 on the Resource Broker to support DGAS
• DGAS HLR (Home Location Register) server: it is responsible for keeping the accounting information for both users and grid resources.
– Network Monitor Element, interfaced with GridIce for data presentation.
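The VO list above amounts to a small lookup from VO name to the kind of server that manages it. A minimal sketch of that lookup follows; the function name `vo_backend` and the return labels are hypothetical, only the VO names and their grouping come from the slide.

```python
# Hypothetical lookup built from the VO list on this slide.
# Only the VO names and their grouping are taken from the source.
LDAP_VOS = {"egrid", "babar", "zeus", "biomed", "magic", "esr",
            "cms", "atlas", "lhcb", "alice"}
VOMS_VOS = {"pamela", "infngrid", "cdf", "gridit", "compchem", "planck",
            "bio", "enea", "theophys", "ingv", "inaf", "virgo", "argo"}
OPTIONAL_VOMS_VOS = {"euchina", "eumed"}


def vo_backend(vo: str) -> str:
    """Return which kind of server manages the given VO (labels are illustrative)."""
    name = vo.lower()
    if name in LDAP_VOS:
        return "ldap"
    if name in VOMS_VOS:
        return "voms"
    if name in OPTIONAL_VOMS_VOS:
        return "voms-optional"
    raise KeyError(f"VO not listed for InfnGrid-2_7_0: {vo}")
```

For example, `vo_backend("cms")` falls in the LDAP group, while `vo_backend("virgo")` falls in the VOMS group.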
InfnGrid-2_7_0
– Support for MPI jobs via home synchronization with scp, using host-based authentication
– Customized tools to install and use the grid:
• installation by a customized version of LCG yaim (ig-yaim)
• support to interface ig-yaim with a Quattor installation
• UIPnP: a PlugAndPlay User Interface to access the grid as a user of any Linux system, without RPMs.
InfnGrid-2_7_0 : deployed services
FTS
LFC
MyProxy
RB (DGAS)
VOMS
Gridice
BDII
HLR
INFNGRID-2_7_0
Operations Structure and Organization
The National Grid Central Management Team (CMT):
– Activities:
• Integration and testing of the InfnGrid middleware release (based on the LCG m/w release)
• Deployment procedures and configuration tools
• Monitoring and control of the status of the grid services and resources
– Responsibilities:
• Site registration procedure
• Middleware deployment
• Certification procedure for all InfnGrid sites
• Operation of the grid services
Grid Central Management Team
• Deployment Plan
– The team coordinates the installation and deployment of the grid services.
– A plan is provided to:
• ensure that the user support and service level provided to the grid users during the upgrade period is acceptable
• simplify the certification activities (all resources are thoroughly tested before joining the infrastructure).
• Site registration procedure
• Site certification procedure
Operations Support
• The Italian ROC provides local front-line support to Virtual Organizations, Users and Resource Centres
• The Italian ROC team is organized in daily shifts:
– 2 people per shift, 2 shifts per day, from Monday to Friday.
• Activities planned during the shift
– Log trouble tickets created, updated and closed, log problems on grid services and sites, and monitor successful site certification
– Check the actions of the previous shift and the downtime page
– Check the status of production grid services and the GRIS status of production CEs and SEs
– Check the status of the production sites using the Site Functional Tests report
• Periodic (every 15 days) phone conferences
– ROC/CIC teams and site managers
• Provide and write the ROC report for the weekly EGEE operation meeting
Grid Monitoring
• The status of the Italian grid infrastructure is monitored using GridIce
– It is one of the monitoring tools used by EGEE
– It is used to control
• the status of the submitting queues
• process/daemon status in the services (RB, BDII)
• VO view: list of CEs and SEs available for the VOs, with their status and capacity
• Job monitoring
Monitoring
Accounting
• The DataGrid Accounting System (DGAS) has been developed within the EDG and EGEE projects.
– It implements resource usage metering and economic accounting in a fully distributed grid environment
– It is part of the InfnGrid middleware release and has been deployed on the Italian Grid Infrastructure
– Grid computing resources and grid users are registered in appropriate servers, known as HLRs (Home Location Registers), which keep track of every submitted job. An arbitrary number of HLR servers can be used
DGAS HLR flow
Accounting
• Accounting data can be retrieved from the HLRs with different aggregation levels:
– single-user
– group of users
– VO
– resource
• A functional test has been developed and it is used to monitor the stability of the service. It checks the functionality of the sensors and services running on the CE and the communication between CEs and HLRs
• DGAS data for the Italian Grid are aggregated/anonymized and provided to EGEE through an appropriate interface to APEL.
• More information on http://www.to.infn.it/grid/accounting/
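The aggregation levels listed above (single user, group of users, VO, resource) can be illustrated with a small sketch. This is not the DGAS query interface: the `JobRecord` fields and the `aggregate` function are hypothetical names chosen for illustration, assuming each accounted job carries a user, a group, a VO, a resource, and its wall-clock and CPU time in seconds.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical record layout; the real HLR schema is not described in the slides.
@dataclass(frozen=True)
class JobRecord:
    user: str
    group: str
    vo: str
    resource: str
    wall_seconds: int
    cpu_seconds: int

LEVELS = ("user", "group", "vo", "resource")


def aggregate(records, level):
    """Sum job counts and times per key at the chosen aggregation level."""
    if level not in LEVELS:
        raise ValueError(f"unknown aggregation level: {level}")
    totals = defaultdict(lambda: {"jobs": 0, "wall_seconds": 0, "cpu_seconds": 0})
    for record in records:
        entry = totals[getattr(record, level)]
        entry["jobs"] += 1
        entry["wall_seconds"] += record.wall_seconds
        entry["cpu_seconds"] += record.cpu_seconds
    return dict(totals)
```

Calling `aggregate(records, "vo")` on the same records that feed `aggregate(records, "user")` gives the per-VO view without touching the per-job data, which is the point of keeping one record per submitted job.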
Jobs per week: CMS (chart)
WallTime @ CPUTime (chart; x-axis: Week, y-axis: Hour; series: SUM(wallTime/3600), SUM(cpuTime/3600))
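The two plotted series, SUM(wallTime/3600) and SUM(cpuTime/3600) per week, are sums of per-job times converted from seconds to hours. A minimal sketch of that computation follows, assuming each job is given as a (date, wall seconds, CPU seconds) tuple and that weeks are ISO weeks; both the input layout and the week convention are assumptions, not taken from the slides.

```python
from collections import defaultdict
from datetime import date


def weekly_hours(jobs):
    """Return {(iso_year, iso_week): (wall_hours, cpu_hours)} for the given jobs.

    jobs: iterable of (day, wall_seconds, cpu_seconds), where day is a datetime.date.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for day, wall_seconds, cpu_seconds in jobs:
        iso = day.isocalendar()          # (ISO year, ISO week, ISO weekday)
        key = (iso[0], iso[1])
        sums[key][0] += wall_seconds / 3600   # seconds -> hours
        sums[key][1] += cpu_seconds / 3600
    return {key: tuple(value) for key, value in sorted(sums.items())}
```

The ratio of the two sums for a given week gives a rough CPU efficiency figure for that week.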
Jobs per site (January, 15 – 31)
Total jobs = 179,310
Jobs per VO (January, 15 – 31)
Jobs report (January, 15 – 31)
User, Operation and VO support
• The user support system provides ticket exchange between:
– ROC on Duty and site managers
– Site managers and Central Management Team, and vice versa
– Site manager and certification team during installation/upgrade
– GGUS to ROC and ROC to GGUS
The support system
• The Italian ROC ticketing system is built upon XHelp, a suite of web-based tools written in PHP
• The support system components are accessible from the main interface of the deployment portal (grid-it.cnaf.infn.it), which provides a certificate-based SSO point of registration/identification.
• The end user can open a request, and view and follow his own tickets and related replies;
• A supporter can view tickets assigned to his own groups, add responses and solutions, and change status/priority
• While working on tickets, side content is always available to all classes of users (according to their access level):
– Site Functional Tests
– site downtimes calendaring system
– file archive
– net query tools
– IRC applet, contextual questions and answers
– reports from daily shifts
Interface with GGUS
• The Italian ROC support system is interfaced to the GGUS helpdesk application using web-services technologies
– Secure methods to create and update trouble tickets in the GGUS database are provided by the GGUS application.
– These methods are called by APIs that wrap the ticket information stored in the XHelp database into SOAP messages and send them to the WSDL contact URL.
• A trouble ticket submitted by a local user to the XHelp helpdesk that cannot be addressed locally can be escalated by the local supporter across the ROC boundaries.
• The system allows ticket assignment to any other GGUS support unit, as well as to all other ROC helpdesks connected to GGUS via the interface.
• The ticket is shared among all the helpdesk databases involved in the workflow, can be updated from every source, and any update propagates to all the other systems.
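Wrapping ticket information into a SOAP message, as described above, can be sketched with the Python standard library. The operation name and the ticket field names below are hypothetical; the real ones are defined by the GGUS WSDL, which the slides do not show. Only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (standard).
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_ticket_envelope(operation, ticket):
    """Wrap ticket fields in a SOAP 1.1 envelope and return it as a string.

    operation: name of the remote method (hypothetical here).
    ticket: dict of field name -> value; names must be valid XML element names.
    """
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, operation)
    for field, value in ticket.items():
        ET.SubElement(call, field).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Illustrative call; "createTicket", "subject" and "priority" are made-up names.
message = build_ticket_envelope("createTicket",
                                {"subject": "example problem", "priority": "high"})
```

In a real interface the resulting message would be sent over HTTPS to the service endpoint with client authentication; that step is omitted here because the endpoint and security setup are not given in the source.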
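GGUS/ROC Basic Workflow
(diagram: Web Portal, GGUS System and GGUS/TPM; ROC-1 Helpdesk and ROC-1 Interface with support units SU-1, SU-2, … SU-N; ROC-X Helpdesk and ROC-X Interface with support units SU-1, SU-2, … SU-N; flows labelled "Ticket assignment to ROC-1", "Ticket re-assigned" and "Ticket solved")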
A new ticket comes from GGUS
We assign the ticket to the site
GGUS!
The site's support group reassigns the ticket to GGUS
…and adds a response!
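The walkthrough above (a ticket arrives from GGUS, is assigned to the site, and is reassigned to GGUS with a response) relies on every update being propagated to all connected helpdesk databases. A minimal in-memory sketch of that behaviour follows; the class names, field names and ticket id are hypothetical and do not reflect the XHelp or GGUS data models.

```python
class Helpdesk:
    """One helpdesk database (for example GGUS or a ROC system) holding ticket copies."""

    def __init__(self, name):
        self.name = name
        self.tickets = {}

    def apply(self, ticket_id, fields):
        ticket = self.tickets.setdefault(ticket_id, {"responses": []})
        for key, value in fields.items():
            if key == "response":
                ticket["responses"].append(value)   # responses accumulate
            else:
                ticket[key] = value                 # other fields are overwritten


class TicketInterface:
    """Propagates every update to all connected helpdesks, whatever its source."""

    def __init__(self, *helpdesks):
        self.helpdesks = list(helpdesks)

    def update(self, source, ticket_id, **fields):
        change = dict(fields, updated_by=source.name)
        for helpdesk in self.helpdesks:
            helpdesk.apply(ticket_id, change)


ggus = Helpdesk("GGUS")
roc = Helpdesk("ROC-1")
link = TicketInterface(ggus, roc)

link.update(ggus, 1, status="assigned", assigned_to="ROC-1")   # new ticket from GGUS
link.update(roc, 1, assigned_to="site")                        # ROC assigns it to the site
link.update(roc, 1, assigned_to="GGUS",
            response="hypothetical reply text")                # reassigned with a response
```

After these three steps both helpdesks hold the same copy of the ticket, which is the property the shared-ticket design is meant to guarantee.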
Trouble tickets statistics
Authors
VISTOLI, Maria Cristina, INFN-CNAF
GAIDO, Luciano, INFN-Torino
SELMI, Matteo, INFN-CNAF
PAGANO, Alfredo, INFN-CNAF
AIFTIMIEI, Cristina, INFN-Padova
CUSCELA, Guido, INFN-Bari
CAVALLI, Alessandro, INFN-CNAF
FERRO, Enrico, INFN-Padova
FANZAGO, Federica, INFN-Padova
FANTINEL, Sergio, INFN-LNL
VACCAROSSA, Luca, INFN-Milano
CESINI, Daniele, INFN-CNAF
PAOLINI, Alessandro, INFN-CNAF
VERONESI, Paolo, INFN-CNAF
CAROTA, Luciana, INFN-CNAF
NEBIOLO, Federico, INFN-Torino
CALTRONI, Andrea, INFN-Padova
DONVITO, Giacinto, INFN-Bari
VERLATO, Marco, INFN-Padova
BAGNASCO, Stefano, INFN-Torino
BRUNETTI, Riccardo, INFN-Torino
DACRUZ, Marcio, INFN-Milano
BARCHIESI, Alex, INFN-Roma
FIORE, Sandro, Univ. Lecce
ARGENTATI, Sabrina, INFN-LNF
DALLA FINA, Simone, INFN-Padova
DELLE FRATTE, Cesare, INFN-Roma2
TURRISI, Rosario, INFN-Padova
GREGORETTI, Francesco, CNR-ICAR Napoli