DataGrid is a project funded by the European Commission under contract IST-2000-25182
Status and Prospective of EU Data Grid Project
Alessandra Fanfani (University of Bologna)
On behalf of EU DataGrid project
Outline: EU DataGrid project; HEP application experience; future perspective
http://www.eu-datagrid.org
The European DataGrid Project, The 2nd Workshop on HEP GRID, Daegu, 22 August 2003
The EU DataGrid Project
9.8 M Euros EU funding over 3 years
90% for middleware and applications (HEP, Earth Observation, Biomedical)
3-year phased developments & demos
Total of 21 partners: research and academic institutes as well as industrial companies
Extensions (time and funds) on the basis of first successful results:
DataTAG (2002-2003) www.datatag.org
CrossGrid (2002-2004) www.crossgrid.org
GridStart (2002-2004) www.gridstart.org
Project started in Jan. 2001
Testbed 0 (early 2001): international testbed infrastructure deployed, Globus 1 only, no EDG middleware
Testbed 1 (early 2002): first release of EU DataGrid software to defined users within the project
Testbed 2 (end 2002): builds on Testbed 1 to extend the facilities of DataGrid, with a focus on stability
Passed 2nd annual EU review, Feb. 2003
Testbed 3 (2003): advanced functionality & scalability, currently being deployed
Project ends in Dec. 2003
Related Grid Projects
Through links with sister projects, there is the potential for a truly global scientific applications grid
Main components of the EDG 2.0 release form the basis for the LCG (LHC Computing Grid) middleware: www.cern.ch/lcg
EDG Middleware Architecture
[Figure: layered architecture diagram, listed here from top to bottom]
Local Computing: Local Application, Local Database
Grid Application Layer: Data Management, Job Management, Metadata Management
Collective Services: Information & Monitoring, Replica Manager, Grid Scheduler
Underlying Grid Services: SQL Database Services, Computing Element Services, Storage Element Services, Replica Catalog, Authorization Authentication and Accounting, Service Index
Fabric services: Configuration Management, Node Installation & Management, Monitoring and Fault Tolerance, Resource Management, Fabric Storage Management
Side labels: APPLICATIONS; M/W; GLOBUS, Condor-G (via VDT); Grid; Fabric
Workload Management System
The user interacts with the Grid via a Workload Management System (WMS)
The goal of the WMS is distributed scheduling and resource management in a Grid environment
The Resource Broker tries to match user requirements with available resources:
  Software installed at potential sites
  Ensure data locality
  Efficient usage of resources
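The matchmaking idea on this slide can be illustrated with a minimal sketch. It is not the EDG Resource Broker: the `Site` and `Job` structures, their field names and the ranking rule are all assumptions made for illustration. It filters sites on installed software and then prefers sites that already hold the input data.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A candidate computing site (all fields are illustrative assumptions)."""
    name: str
    free_cpus: int
    installed_software: set = field(default_factory=set)
    local_datasets: set = field(default_factory=set)


@dataclass
class Job:
    """A user job and its requirements (hypothetical structure)."""
    required_software: str
    input_dataset: str


def match(job, sites):
    """Return the best site for the job, or None if no site qualifies."""
    # Requirement: the software must be installed and a CPU must be free.
    candidates = [
        s for s in sites
        if job.required_software in s.installed_software and s.free_cpus > 0
    ]
    if not candidates:
        return None
    # Rank: data locality first, then the number of free CPUs.
    return max(
        candidates,
        key=lambda s: (job.input_dataset in s.local_datasets, s.free_cpus),
    )


sites = [
    Site("site-a", free_cpus=40, installed_software={"cmsim"}),
    Site("site-b", free_cpus=10, installed_software={"cmsim"},
         local_datasets={"eg02_BigJets"}),
    Site("site-c", free_cpus=100),
]
job = Job(required_software="cmsim", input_dataset="eg02_BigJets")
print(match(job, sites).name)  # site-b: has the software and the data
```

Ranking data locality above free CPUs is one possible policy; a real broker would weigh these criteria against live information from the Grid information system.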
Data Management
High-level data management on the Grid:
  Location of data
  Replication of data
  Efficient access to data
Provide a basic, consistent interface to disk and mass storage systems (hides the Storage Resource Manager)
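As a rough illustration of locating and replicating data, the sketch below models a replica catalogue as a mapping from a logical file name to its physical copies. The class, method names, host names and paths are hypothetical and do not reproduce the EDG Replica Manager or Replica Catalog interfaces.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalogue: logical file name (LFN) -> set of physical replicas."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn, storage_element, path):
        """Record that a copy of `lfn` exists at `path` on `storage_element`."""
        self._replicas[lfn].add((storage_element, path))

    def locate(self, lfn):
        """Return all known replicas of `lfn`, sorted for reproducibility."""
        return sorted(self._replicas.get(lfn, ()))

    def best_replica(self, lfn, preferred_se):
        """Prefer a replica on `preferred_se`; otherwise return any replica."""
        replicas = self.locate(lfn)
        if not replicas:
            return None
        for se, path in replicas:
            if se == preferred_se:
                return (se, path)
        return replicas[0]


rc = ReplicaCatalog()
rc.register("lfn:run42.fz", "se.site-a.example", "/data/run42.fz")
rc.register("lfn:run42.fz", "se.site-b.example", "/store/run42.fz")
print(rc.best_replica("lfn:run42.fz", "se.site-b.example"))
```

A job scheduled near `se.site-b.example` gets the local copy, which is the "efficient access" case; a job elsewhere falls back to any registered replica.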
Information & Monitoring
R-GMA: relational implementation of the GMA (Grid Monitoring Architecture) from GGF
Makes use of the GLUE schema (inter-operability with US grids)
Interoperable with MDS
Deals with information on:
  The Grid itself: resources and services
  Job status information
  Grid applications
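The relational view of monitoring data can be sketched with an in-memory SQLite table: producers insert rows and consumers query them with SQL. This only illustrates the relational model and is not R-GMA code. The table `ce_status` and its columns are invented for the example and are not taken from the GLUE schema.

```python
import sqlite3

# One in-memory table stands in for the virtual database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ce_status ("
    "ce_id TEXT, vo TEXT, free_cpus INTEGER, running_jobs INTEGER)"
)


def publish(ce_id, vo, free_cpus, running_jobs):
    """Producer side: publish one status tuple."""
    conn.execute(
        "INSERT INTO ce_status VALUES (?, ?, ?, ?)",
        (ce_id, vo, free_cpus, running_jobs),
    )


def query_free(vo, min_free):
    """Consumer side: find CEs with at least `min_free` free CPUs for a VO."""
    cursor = conn.execute(
        "SELECT ce_id, free_cpus FROM ce_status "
        "WHERE vo = ? AND free_cpus >= ? ORDER BY free_cpus DESC",
        (vo, min_free),
    )
    return cursor.fetchall()


publish("ce.site-a.example", "cms", 12, 30)
publish("ce.site-b.example", "cms", 2, 48)
publish("ce.site-c.example", "atlas", 20, 5)
print(query_free("cms", 5))
```

The point of the relational model is that a consumer such as a resource broker can express its question as an ordinary query instead of walking a hierarchical directory.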
Grid aspects covered by EDG
VOMS (VO Membership Service): provides certificate with VOs, groups and roles
R-GMA (Information & Monitoring): provides info on resource utilization & performance
User Interface: submit & monitor jobs, retrieve output
Grid Fabric Management: configures, installs & maintains grid software packages and environment
Workload Management System: manages submission of jobs to the Resource Broker, obtains information and retrieves output
Network performance: provides efficient network transport, bandwidth monitoring
Computing Element: gatekeeper to a grid computing resource
Testbed administration: certificate authorities, user registration, usage policy, etc.
Storage Resource Manager: grid-aware storage area
Applications: HEP, EO, Biology
Replica Manager: replicates and locates data
Detailed Interplay of EDG Components
DataGrid in Numbers
People
  >350 registered users
  12 Virtual Organisations
  16 Certificate Authorities
  >300 people trained
  278 man-years of effort
  100 years funded
Scientific applications
  5 Earth Observation institutes
  9 bio-informatics applications
  6 HEP experiments
Software
  50 use cases
  18 software releases
  Current release 1.4
  Release 2.0 being tested
  >300K lines of code
Testbeds
  >15 regular sites
  40 sites using EDG software (e.g. Taiwan, Korea)
  >10,000s of jobs submitted
  >1000 CPUs
  >15 TeraBytes disk
  3 Mass Storage Systems
DataGrid Scientific Applications
Earth Observation
  •about 100 Gbytes of data per day (ERS 1/2)
  •500 Gbytes for the ENVISAT mission
Bio-informatics
  Data mining on genomic databases (exponential growth)
  Indexing of medical databases (Tb/hospital/year)
Particle Physics
  Simulate and reconstruct complex physics phenomena millions of times
  LHC experiments will generate 6-8 PetaBytes/year
Developing grid middleware to enable large-scale usage by scientific applications
Development on the computing side, but also a focus on real use by the applications!
Application Usage of Release 1.4
Positive Signs:
Large increase in users.
Many sites interested in joining.
Pushing real jobs through system.
EDG 1.4 evaluated for review in Feb. 2003
[Figure: bar chart on a log scale (1 to 10000) of CPU hours and jobs per VO: ALICE, ATLAS, BaBar, Bio., CMS, E.O., LHCb, ITeam]
[Figure: HEP simulation plots of disk usage, CPU usage and number of events across CEs and SEs]
[Figure: disk usage at CERN per VO (ALICE, ATLAS, Bio., CMS, E.O., ITeam, LHCb, Tutor, WP6) on a log scale from 1 MB to 1 TB; labelled values include 200 GB, 100 GB and 19 GB; TOTAL: >1.5 TB]
Successful 2nd annual EU review: funding agencies were happy about the real use by the applications
HEP Applications
Intense usage of the application testbed in 2002 and early 2003, in particular by HEP experiments:
ATLAS, CMS, ALICE, LHCb, BaBar and D0 activities within DataGrid are documented in detail in deliverable D8.3: https://edms.cern.ch/document/375586/1.2
ATLAS and CMS task forces very active and successful:
  Several hundred ATLAS simulation jobs of length 4-24 hours were executed & data was replicated using grid tools
  CMS generated ~250K events for physics studies with ~10,000 jobs in a 3-week period
Since the project review: ALICE and LHCb have been generating physics events
BaBar and D0 performed more basic tests with analysis and Monte-Carlo production jobs
Joint evaluation from ATLAS/CMS work on Release 1.4
Results were obtained from focused task forces of experiment and EDG people
Good interaction with EDG middleware providers
Fast turnaround in bug fixing and installing new software
Tests were labour intensive since the software was still developing and the overall system was fragile
Essential developments are needed in:
  Data Management (robustness and functionality)
  Information Systems (robustness and scalability)
  Workload Management (scalability for high rates, batch submissions, stability)
  Mass Storage Support (gridified support due in EDG 2.0)
Release 2.0 should fix the major problems
Release 2.0
Major new developments in all middleware areas
Addressing the key shortcomings identified:
  WMS stability and scalability: WMS re-factored
  Replica catalog stability and scalability: Replica Location Service
  Data management usability: DM re-factored
  Information system stability and scalability: R-GMA
  Unified access to MSS: new SE service
Fabric monitoring infrastructure
Providing new functionalities
Upgrade underlying software
HEP experience: the CMS example
Joint effort involving CMS, EDG, EDT and LCG people
CMS/EDG Stress Test goals:
  Verification of the portability of the CMS Production environment into a grid environment
  Verification of the robustness of the European DataGrid middleware in a production environment
  Production of data for the physics studies of CMS
Use as much as possible the high-level Grid functionalities provided by EDG:
  Workload Management System (Resource Broker)
  Data Management (Replica Manager and Replica Catalog)
  MDS (Information Indexes)
  Virtual Organization Management, etc.
Interface (modify) the CMS Production tools to the access methods provided by the Grid
Measure performance, efficiency and reasons for job failures, to provide feedback to both CMS and EDG
CMS/EDG Middleware and Software
Middleware was EDG, from version 1.3.4 to version 1.4.3:
  Resource Broker server
  Replica Manager and Replica Catalog servers
  MDS and Information Indexes servers
  Computing Elements (CEs) and Storage Elements (SEs)
  User Interfaces (UIs)
  Virtual Organization Management servers (VO) and clients
  EDG Monitoring, etc.
CMS software distributed as rpms and installed on the CE
CMS Production tools (IMPALA, BOSS) installed on the User Interface
Monitoring was done through:
  Online
    Job monitoring and bookkeeping: BOSS database, EDG Logging & Bookkeeping service
    Resource monitoring: Nagios, web-based tool developed by the DataTAG project
  Offline
    EDG monitoring system (MDS based): collected regularly by scripts running as cron jobs and stored for offline analysis
    BOSS database: permanently stored in the MySQL database
    Both sources are processed by a tool (boss2root) that puts the information in a ROOT tree for analysis
CMS jobs description
Dataset eg02_BigJets: CMS official jobs for “Production” of results used in physics studies (real-life testing)
Production in 2 steps:
1. CMKIN: MC generation of the proton-proton interaction for a physics channel (dataset)
   “Short” jobs: 125 events ~ 1 minute ~ 6 MB ntuples
2. CMSIM: detailed simulation of the CMS detector
   “Long” jobs: 125 events ~ 12 hours ~ 230 MB FZ files
[Figure: job flow. The CMKIN job writes its output data (ntuples) to a Grid Storage Element; the CMSIM job reads from a Grid Storage Element and writes its output data (FZ files) to a Grid Storage Element]

         size/event   time*/event
CMKIN    ~ 0.05 MB    ~ 0.4-0.5 sec
CMSIM    ~ 1.8 MB     ~ 6 min
* PIII 1 GHz, 512 MB, 46.8 SI95
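The per-job figures follow from the per-event figures quoted on this slide. The short check below multiplies them out for 125 events per job. Taking 0.45 s as the CMKIN time per event is an assumption: it is the midpoint of the quoted 0.4-0.5 s range.

```python
EVENTS_PER_JOB = 125

# Per-event figures from the slide (PIII 1 GHz reference machine).
CMKIN_SEC_PER_EVENT = 0.45     # assumed midpoint of the quoted 0.4-0.5 s
CMKIN_MB_PER_EVENT = 0.05
CMSIM_SEC_PER_EVENT = 6 * 60   # 6 minutes
CMSIM_MB_PER_EVENT = 1.8


def job_cost(sec_per_event, mb_per_event, events=EVENTS_PER_JOB):
    """Return (wall time in seconds, output size in MB) for one job."""
    return sec_per_event * events, mb_per_event * events


cmkin_sec, cmkin_mb = job_cost(CMKIN_SEC_PER_EVENT, CMKIN_MB_PER_EVENT)
cmsim_sec, cmsim_mb = job_cost(CMSIM_SEC_PER_EVENT, CMSIM_MB_PER_EVENT)

print(f"CMKIN job: {cmkin_sec / 60:.1f} min, {cmkin_mb:.2f} MB")
print(f"CMSIM job: {cmsim_sec / 3600:.1f} h, {cmsim_mb:.0f} MB")
```

The results (about 1 minute and 6 MB for CMKIN, 12.5 hours and 225 MB for CMSIM) agree with the rounded values on the slide.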
CMS production components interfaced to EDG
•Four submitting UIs: Bologna/CNAF (IT), Ecole Polytechnique (FR), Imperial College (UK), Padova/INFN (IT)
•Several Resource Brokers (WMS), CMS-dedicated and shared with other Applications: one RB for each CMS UI + “backup”
•Replica Catalog at CNAF, MDS (and II) at CERN and CNAF, VO server at NIKHEF
[Figure: CMS production components interfaced to EDG. CMS side: UI with IMPALA/BOSS, BOSS DB, RefDB (parameters). EDG side: Workload Management System (JDL), Replica Manager (input data location, data registration), CEs with CMS software, WN, SEs (read, write). Also shown: job output filtering and runtime monitoring. Arrow legend: push data or info; pull info]
EDG hardware resources
Site                        Number of CPUs     Disk space (GB)   MSS available
CERN (CH)                   122                1000* (+100)      yes
CNAF (IT)                   20 + 20*           1000*
RAL (UK)                    16                 360
Lyon (FR)                   120 shared (400)   200               yes
NIKHEF (NL)                 22                 35
Legnaro (IT)*               50                 1000*
Ecole Polytechnique (FR)*   4                  220
Imperial College (UK)*      16                 450
Padova (IT)*                12                 680
Totals                      402 (400)          3000* + (2245)
* Dedicated to CMS Stress Test
[Figure: map of the sites: CNAF Bologna, Legnaro & Padova, CERN, Ecole Polytechnique, RAL, Imperial College, NIKHEF, Lyon]
Add new (CMS) sites to provide extra resources
Statistics of CMS/EDG Stress Test
Total EDG Stress Test jobs = 10676, successful = 7196, failed = 3480
Total number of events: CMKIN 592750, CMSIM 268375
Total size of data produced: 500 GB
[Figure: distribution of jobs by executing CE (x-axis: executing Computing Element; y-axis: number of jobs)]
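A few derived quantities can be checked from these totals. The sketch below computes the job success rate, the number of 125-event jobs the event counts correspond to, and a data-volume estimate. It assumes 125 events per job and the per-event sizes from the job description slide (0.05 MB for CMKIN, 1.8 MB for CMSIM), and uses 1 GB = 1000 MB.

```python
total_jobs, successful, failed = 10676, 7196, 3480
cmkin_events, cmsim_events = 592750, 268375

EVENTS_PER_JOB = 125          # from the CMS jobs description slide
CMKIN_MB_PER_EVENT = 0.05     # from the CMS jobs description slide
CMSIM_MB_PER_EVENT = 1.8      # from the CMS jobs description slide

# Fraction of submitted jobs that completed successfully.
efficiency = successful / total_jobs

# Number of full 125-event jobs the produced events correspond to.
cmkin_jobs = cmkin_events // EVENTS_PER_JOB
cmsim_jobs = cmsim_events // EVENTS_PER_JOB

# Estimated output volume, using decimal gigabytes (assumption).
total_gb = (cmkin_events * CMKIN_MB_PER_EVENT
            + cmsim_events * CMSIM_MB_PER_EVENT) / 1000

print(f"success rate: {efficiency:.1%}")
print(f"job equivalents: CMKIN {cmkin_jobs}, CMSIM {cmsim_jobs}")
print(f"estimated data volume: {total_gb:.0f} GB")
```

About two thirds of the submitted jobs succeeded, and the estimated volume of roughly 513 GB is in line with the quoted 500 GB.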
CMS/EDG Production
~260K events produced
~7 sec/event average
~2.5 sec/event peak (12-14 Dec)
[Figure: number of events vs. time, 30 Nov to 20 Dec, for CMSIM “long” jobs, broken down by the UI from which jobs were submitted; annotations: CMS Week, upgrade of MW, hit some limit of implementation (RC, MDS)]
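The average rate can be cross-checked from the event count and the length of the production. The sketch below assumes a 21-day period (30 Nov to 20 Dec inclusive, matching the "3 week period" quoted elsewhere in the talk) and treats the rate as wall-clock time per event over the whole testbed.

```python
SECONDS_PER_DAY = 86_400


def seconds_per_event(events, days):
    """Average wall-clock seconds per produced event over the whole testbed."""
    return days * SECONDS_PER_DAY / events


def events_per_day(sec_per_event):
    """Daily event throughput corresponding to a given seconds-per-event rate."""
    return SECONDS_PER_DAY / sec_per_event


average = seconds_per_event(events=260_000, days=21)  # 21 days is an assumption
peak_daily = events_per_day(2.5)

print(f"average: {average:.1f} s/event")
print(f"peak: {peak_daily:.0f} events/day")
```

The average comes out close to the quoted ~7 sec/event. Since a single CMSIM event takes about 6 minutes on the reference CPU, a testbed-wide rate of 7 s/event corresponds to roughly 50 CPUs busy on average, ignoring the much shorter CMKIN step.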
Main results and observations
RESULTS
  Could distribute and run CMS software in the EDG environment
  Generated ~250K events for physics with ~10,000 jobs in a 3-week period
OBSERVATIONS
  Were able to quickly add new sites to provide extra resources
  Fast turnaround in bug fixing and installing new software
  Test was labour intensive (since the software was still developing and the overall system was fragile)
  WMS: at the start there were serious problems with long jobs; recently improved
  Data Management: replication tools were difficult to use and not reliable, and the performance of the Replica Catalogue was unsatisfactory
  Information System: the MDS-based Information System performed poorly with increasing query rate
  The system is sensitive to hardware faults and site/system misconfiguration
  The user tools for fault diagnosis are limited
EDG 2.0 should fix the major problems, providing a system suitable for full integration in distributed production
EU DataGrid Summary and Outlook
The focussing of the project on stability has improved the manner in which the software is built and supported
The application testbed has reached the highest level of maturity that can be achieved using the available grid middleware and supporting manpower
  Steady increase in the size of the testbed up to a peak of approx. 1000 CPUs at 15 sites
Intense usage of the application testbed (releases 1.3 and 1.4) in the past year
  Significant achievements in the use of EDG middleware by the experiments: real use is possible but labour intensive
  Results were obtained by task forces, which pointed to areas in the middleware that required development and reconfiguration
The performance problems encountered by the experiments are addressed in release EDG 2.0
There is a strong connection with the LHC Computing Grid: LCG has a new grid service modeled on the EDG testbed that includes EDG 2.0 components
Outlook: a production-quality infrastructure is needed (EGEE)
  Continuous, stable Grid operation represents the most ambitious objective of EGEE and requires the largest effort
EGEE vision: Enabling Grids for E-science in Europe
Goal: create a wide European Grid production-quality infrastructure on top of present and future EU RN infrastructure
Build on EU and EU member states' major investments in Grid technology:
  Exploit international connections (US and AP)
  Several pioneering prototype results
  Large Grid development team (>60 people)
  Requires major EU funding effort
Approach:
  Leverage current and planned national and regional Grid programmes (e.g. LCG)
  Work closely with relevant industrial Grid developers, NRENs and US-AP projects
[Figure labels: Applications, EGEE, Geant network]
http://www.cern.ch/egee
EGEE Proposal
Proposal submitted to EU IST 6th framework call on 6th May 2003
Executive summary (exec summary: 10 pages; full proposal: 276 pages)
http://agenda.cern.ch/askArchive.php?base=agenda&categ=a03816&id=a03816s5%2Fdocuments%2FEGEE-executive-summary.pdf
Two-year project conceived as part of a four year programme
9 regional federations covering 70 partners in 26 countries
EGEE Activities
Service Activities: deliver production-level Grid infrastructure (52% of funding)
  Integration of national and international Grid infrastructures
  Essential elements: manageability, robustness, resilience to failure, consistent security model, scalability to rapidly absorb new resources
Joint Research Activity: engineering development (24% of funding)
  Re-engineering of grid middleware (OGSA environment) to improve the services provided by the Grid infrastructure
Networking Activities: management, dissemination, training and applications (24% of funding)
  The Applications Interface Activity will start with two pilot applications, in high energy physics and bio/medical
[Figure: EGEE operation management: Core Infrastructure Centre (managing the overall Grid infrastructure) and Regional Operations Centre (regional deployment and support of services)]
EGEE Status
EGEE proposal passed thresholds at first EU review (June 2003)
Follow-up hearing held in Brussels on 1st July 2003 to answer written questions from the EU reviewers on details of the project
Evaluation Summary Report received from Brussels (17th July 2003)
  A number of detailed recommendations made
  EU budget estimated at 31.5M€
  Negotiate budget details during summer and produce Technical Annex (details of negotiated tasks and budgets)
Informal EGEE/EU meeting held in Brussels on 24th July 2003
Foreseen project start date: 1st April 2004
  Good match with the expected completion of the existing EU DataGrid and related projects
All partners are requested to assign resources already during summer 2003 to start engineering investigations and architecture design work, so that the project can start on time
EGEE Summary
EGEE is a project to develop and establish a reliable infrastructure that provides high quality grid service to a wide range of users
HEP is one of the two pilot application areas selected to guide the implementation and certify the performance and functionality of this evolving European Grid infrastructure
International connections: participation of and collaboration with non-EU countries (Russia, US, AP) are desirable and will be pursued