Status of Globus activities within INFN
Massimo Sgaravatto, INFN Padova
for the INFN Globus [email protected]
Globus @ INFN
WP “Installation and Evaluation of the Globus Toolkit” of the INFN-GRID Project (WP 1)
Goal: evaluate the Globus toolkit as a GRID framework providing basic services
  Which services can be useful?
  What is necessary to integrate/modify?
  What is missing?
Duration: 6 months
Results of this first evaluation used to plan future activities
Globus
Project led by Ian Foster and Carl Kesselman
Basic research on GRID (resource management, security, QoS, ...)
Development of Globus Toolkit
  Core services for GRID tools and applications
Globus Architecture
[Figure: layered architecture diagram, top to bottom]
Applications
High-level Services and Tools: DUROC, globusrun, MPI, Nimrod/G, MPI-IO, CC++, GlobusView, Testbed Status
Core Services: Metacomputing Directory Service, GRAM, Globus Security Interface, Heartbeat Monitor, Nexus, Gloperf, GASS
Local Services: LSF, Condor, MPI, NQE, Easy, TCP, UDP, Solaris, Irix, AIX
Tasks
Security
  To access GRID resources, mechanisms for user authentication are needed
  Evaluation of GSI service
Information Service
  To discover the GRID resources (CPU, storage, network, …), mechanisms to “publish” them must be defined
  Analysis of GIS service to “publish” information using a uniform and standard interface
Resource Management
  A uniform interface is necessary to submit jobs on GRID resources
    Uniform standard interface to different resource management systems
    Uniform standard language for task management
  Assessment of Globus GRAM service for resource allocation and process management
Tasks
Data Access and Migration
  High performance and reliable tools needed to “manage” data (data transfers, wide area replica, …)
  Assessment of Globus tools for data management (GASS, GlobusFTP, Replica Management tools)
Fault Monitoring
  Faults in a GRID environment must be promptly detected and recovery mechanisms must be implemented
  Evaluation of HBM service for fault detection
Execution Environment Management
  Code migration (moving the application to where the job will actually be executed) as a possible implementation strategy
  Evaluation of GEM service to support code migration
Globus deployment
  Reduce complexity and manpower for Globus installation and maintenance
Status
Globus installed on ~30 machines in 11 sites
[Figure: map of INFN sites, labelled TORINO, PADOVA, BARI, PALERMO, FIRENZE, PAVIA, MILANO, GENOVA, NAPOLI, CAGLIARI, TRIESTE, ROMA, PISA, L’AQUILA, CATANIA, BOLOGNA, UDINE, TRENTO, PERUGIA, LNF, LNGS, SASSARI, LECCE, LNS, LNL, SALERNO, COSENZA, S.Piero, FERRARA, PARMA, CNAF, ROMA2]
Security (GSI)
Already done:
  Evaluation of the Globus security architecture
    We like the “one time login” paradigm, but some improvements are needed
  Globus certificates (for hosts and users) signed by INFN certification authority
On-going activities:
  Definition and implementation of architecture of CAs
    Up to task force of the European DataGrid project
  Periodic update of CRL
  “Management” of grid-mapfile updates (the grid-mapfile is where the mappings between local users and GRID users are defined)
    E.g.: a certain Globus resource must be available to all members of a specific physics group
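The grid-mapfile "management" task above can be illustrated with a minimal sketch. It assumes the usual one-entry-per-line layout (a quoted certificate subject followed by a local account name) and assumes that several accounts may be given as a comma-separated list. The subject names and the `atlas` account are hypothetical, and the helper names are not part of any Globus tool.

```python
import shlex


def parse_grid_mapfile(text):
    """Return {certificate subject: [local accounts]} from grid-mapfile text."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the subject
        if len(parts) != 2:
            raise ValueError(f"malformed grid-mapfile line: {raw!r}")
        subject, accounts = parts
        mapping[subject] = accounts.split(",")
    return mapping


def add_group(mapping, member_subjects, local_account):
    """Return a new mapping in which every group member maps to local_account."""
    updated = dict(mapping)
    for subject in member_subjects:
        accounts = list(updated.get(subject, []))  # copy, do not mutate input
        if local_account not in accounts:
            accounts.append(local_account)
        updated[subject] = accounts
    return updated


def render_grid_mapfile(mapping):
    """Serialise the mapping back to grid-mapfile text, sorted by subject."""
    lines = [f'"{subject}" {",".join(accounts)}'
             for subject, accounts in sorted(mapping.items())]
    return "\n".join(lines) + "\n"
```

A site could regenerate the file whenever the membership list of a physics group changes, for example `render_grid_mapfile(add_group(parse_grid_mapfile(old_text), members, "atlas"))`, where `members` would come from whatever group registry the site maintains.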
Information Service (GIS)
Already done:
  INFN MDS server serving Globus 1.1.1 and 1.1.2 installations (single LDAP server)
    Lots of problems using the “default” American MDS server
  Definition and implementation of test architecture of GIS for Globus 1.1.3 installations (distributed model)
  Web interface for browsing
On-going activities:
  Improvement of performance (Netscape LDAP server as top level GIIS)
  Tests on performance and scalability
    Results used to define and implement the GIS architecture
  Review the information gathered from the various machines and published in the GIS
GIS Architecture (test phase)
[Figure: hierarchy of GIIS servers with GRIS below them]
Top Level INFN GIIS: dc=infn,dc=it,o=grid
Bologna GIIS: dc=bo,dc=infn,dc=it,o=grid
Milano GIIS: dc=mi,dc=infn,dc=it,o=grid
INFN ATLAS GIIS: exp=atlas,o=grid
GRIS
Legend: Implemented; Implemented using INFNGRID distribution; To be implemented
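The naming hierarchy in the figure can be sketched as a suffix match on distinguished names: an entry belongs to a GIIS if its DN ends with that GIIS's base DN. The sketch below is only an illustration. The pairing of each GIIS with its base DN is read off the figure labels, the host entries in the usage note are hypothetical, and the parser splits on plain commas without handling escaped commas.

```python
def parse_dn(dn):
    """Split a DN into normalised (attribute, value) pairs, most specific first."""
    pairs = []
    for rdn in dn.split(","):
        attr, sep, value = rdn.partition("=")
        if not sep:
            raise ValueError(f"malformed RDN: {rdn!r}")
        pairs.append((attr.strip().lower(), value.strip().lower()))
    return pairs


def is_under(entry_dn, base_dn):
    """True if entry_dn equals base_dn or lies below it in the tree."""
    entry, base = parse_dn(entry_dn), parse_dn(base_dn)
    return len(entry) >= len(base) and entry[len(entry) - len(base):] == base


# Base DNs as labelled in the figure (pairing with server names is assumed).
GIIS_BASES = {
    "Top Level INFN GIIS": "dc=infn,dc=it,o=grid",
    "Bologna GIIS": "dc=bo,dc=infn,dc=it,o=grid",
    "Milano GIIS": "dc=mi,dc=infn,dc=it,o=grid",
    "INFN ATLAS GIIS": "exp=atlas,o=grid",
}


def closest_giis(entry_dn):
    """Return the name of the most specific GIIS whose base covers entry_dn."""
    matches = [(len(parse_dn(base)), name)
               for name, base in GIIS_BASES.items()
               if is_under(entry_dn, base)]
    return max(matches)[1] if matches else None
```

With these assumptions, a hypothetical entry such as `hn=host1.bo.infn.it,dc=bo,dc=infn,dc=it,o=grid` resolves to the Bologna GIIS, while an entry under a site without its own GIIS falls back to the top level INFN GIIS.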
Resource Management (GRAM)
Already done:
  Job submission tests using Globus tools with real applications and in real production environments (GRAM as uniform interface to different underlying resource management systems [LSF, Condor, PBS])
    Some bugs found and fixed
    Many, many memory leaks!!!
    …
    Some bugs can be solved without major re-design and/or re-implementation
    Two major problems:
      Scalability
      Fault tolerance
  Submission of Condor jobs to Globus resources (Condor-G and GlideIn)
  Evaluation of RSL as uniform language to specify resources
    More flexibility is required
    Resource administrators should be allowed to define new attributes and users should be allowed to use them in resource specification expressions (Condor ClassAds model)
  “Cooperation” between GRAM and GIS
    The information on characteristics and status of local resources and on jobs is not enough (as local resources we must consider farms)
    The default schema must be integrated with other info provided by the underlying resource management systems or by specific agents
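To make the RSL discussion concrete, here is a minimal sketch that renders a set of attributes as an RSL conjunction of the form `&(name=value)(name=value)`. The helper `to_rsl` is hypothetical and not part of the Globus toolkit. Values containing characters outside a conservative safe set are double-quoted, and values with embedded quotes are rejected rather than escaped.

```python
import re

# Characters assumed safe in an unquoted RSL literal (a conservative choice).
_SAFE = re.compile(r"^[A-Za-z0-9_./:-]+$")


def to_rsl(attributes):
    """Render a dict as an RSL conjunction: &(name=value)(name=value)..."""
    relations = []
    for name, value in attributes.items():
        text = str(value)
        if '"' in text:
            raise ValueError("embedded quotes are not handled in this sketch")
        if not _SAFE.match(text):
            text = f'"{text}"'  # quote anything outside the safe set
        relations.append(f"({name}={text})")
    return "&" + "".join(relations)


# Example request: run /bin/hostname on two processors.
request = to_rsl({"executable": "/bin/hostname", "count": 2})
```

A string like `request` is the kind of specification a user would hand to `globusrun`. The flexibility asked for above would amount to letting a site add its own attribute names to such an expression, in the way Condor ClassAds allow.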
Resource Management (GRAM)
On-going activities:
  Tests with GRAM API
  Identify a set of useful attributes of a Condor pool, LSF cluster, PBS cluster that should be reported to the GIS, and integrate the default schema
  Tests with MPICH-G2
Globus deployment
Already done:
  INFN-GRID 1.0
    Non-precompiled Globus 1.1.3 + bug fixes
    Installation instructions (in particular for INFN customizations)
  INFN-GRID 1.1
    Precompiled Globus 1.1.3 for Linux Red Hat 6.x
    gsiwuftpd
    Support for LSF and Condor as underlying resource management systems
    Possibility to implement INFN customizations
      Certificates signed by INFN CA
      Preliminary architecture for GIS
    Installation instructions
  INFN-GRID 1.2
    Besides INFN-GRID 1.1’s functionalities:
    Support for Solaris 2.6
    Support for PBS as resource management system
    Support for GDMP (for Linux)
    Tool to upgrade INFN-GRID 1.1 to INFN-GRID 1.2
    Installation instructions
Globus deployment
On-going activities:
  Web software repository
  INFN-GRID 1.3
    Fixes for Globus jobmanager memory leaks
    Support for Solaris 7
    Full support for GDMP
  Distribution of various Globus compilations (Kerberos, MPICH-G2)
  INFN-GRID toolkit available to DataGrid partners
  Globus team interested in this toolkit
Data Management
Already done:
  Preliminary tests with GASS and gsiftp
To do:
  Tests with GlobusFTP and Replica Catalog Software (Globus Data Grid Alpha Release 2)
Other tasks
Fault Monitoring (HBM)
  Evaluation of HBM for fault detection (for “system” and “user” processes)
  Data collectors (implementing automatic recovery mechanisms)
  … but the HBM package is not seeing active development
Execution Environment Management (GEM)
  Evaluation of GEM as service for code migration
  … but the GEM service now provides only limited capabilities (executable staging)
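The fault detection idea behind a heartbeat monitor can be sketched generically: each monitored process reports periodically, and a collector flags any process whose last report is older than a timeout. The class below is an illustration only and does not reproduce the HBM API. All names and the timeout value are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Collector that flags processes whose heartbeats have stopped."""
    timeout: float                       # seconds of silence tolerated
    last_seen: dict = field(default_factory=dict)

    def beat(self, process_id, now):
        """Record a heartbeat from process_id at time now (in seconds)."""
        self.last_seen[process_id] = now

    def failed(self, now):
        """Return the sorted ids of processes silent for longer than timeout."""
        return sorted(pid for pid, seen in self.last_seen.items()
                      if now - seen > self.timeout)


# Hypothetical "system" and "user" processes reporting to one collector.
monitor = HeartbeatMonitor(timeout=10.0)
monitor.beat("jobmanager", now=0.0)
monitor.beat("user-job-1", now=5.0)
suspects = monitor.failed(now=12.0)
```

A data collector with automatic recovery would act on the list returned by `failed`, for instance by restarting the process or resubmitting the job.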
Other info
http://www.infn.it/globus