6-Mar-2003
Grids NOW
1
Grids Now: The USCMS Integration Grid Testbed
and the European Data Grid
Michael Ernst
Why Grids?
Efficient sharing of distributed heterogeneous compute and storage resources
- Virtual Organizations and institutional resource sharing
- Dynamic reallocation of resources to target specific problems
- Collaboration-wide data access and analysis environments
Grid solutions need to be scalable and robust
- Must handle petabytes per year
- Tens of thousands of CPUs
- Tens of thousands of jobs
Grid solutions presented here are supported in part by the EDG, DataTag, PPDG, GriPhyN, and iVDGL.
We are learning a lot from these current efforts for LCG-1!
Introduction
US CMS is positioning itself to learn, prototype and develop while providing a production environment that caters to CMS, US CMS and LCG demands.
Phases: R&D, Integration, Production
The three-phase approach is not a new idea, but it is mission critical!
- Development Grid Testbed (DGT), Integration Grid Testbed (IGT), and Production Grid
Commissioning with CMS Monte Carlo Production
USCMS IGT is producing:
- 1M Egamma “BigJets” events from generation stage all the way through analysis Ntuples
  - Time dominated by GEANT simulation, but no intermediate data is kept
  - Must run on RedHat Linux 6.1 resources for Objectivity license
- 500K Egamma “BigJets” events at CMSIM stage only
EDG is producing:
- 1M Egamma “BigJets” events at CMSIM stage only
Comparison of requests:
- CMSIM-only involves large intermediate data (~1.5 TB) transfers
- Full simulation through to Ntuples involves more complex workflow
Special IGT Software: MOP
MOP is a system for packaging production processing jobs into DAGMan job descriptions
- DAGMan jobs are Directed Acyclic Graphs (DAGs)
MOP uses the following DAG nodes for each job:
- Stage-in: Stages in needed application files, scripts, data from the submit host
- Run: The application(s) run on the remote host
- Stage-out: The produced data is staged out from the execution site back to the submit host
- Clean-up: Temporary areas on the remote site are cleansed
- Publish: Data is published to a GDMP replica catalogue after it is returned
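The five nodes above map naturally onto a DAGMan input file. The following is a minimal sketch in Python, not MOP's actual code: it emits a DAG using DAGMan's JOB and PARENT ... CHILD keywords. The node names, the .sub submit-file names, the mop_dag function, and the strictly linear ordering (in particular, clean-up before publish) are illustrative assumptions.

```python
# Hypothetical sketch: one DAGMan description per production job.
# Node and submit-file names are made up for illustration.
NODES = ["stage_in", "run", "stage_out", "clean_up", "publish"]


def mop_dag(job_id: str) -> str:
    """Return DAGMan input text chaining the five MOP nodes for one job."""
    names = [f"{job_id}_{node}" for node in NODES]
    # One JOB line per node: JOB <node name> <submit description file>
    lines = [f"JOB {name} {name}.sub" for name in names]
    # Each node depends on the one before it (assumed linear workflow).
    lines += [
        f"PARENT {parent} CHILD {child}"
        for parent, child in zip(names, names[1:])
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(mop_dag("bigjets_0001"), end="")
```

Each job yields five JOB lines and four dependency lines; DAGMan then starts a node only after its parent has finished successfully.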
Figure: Master Site (IMPALA, mop_submitter, DAGMan, Condor-G, GridFTP) and Remote Sites 1 … N (Batch Queue and GridFTP at each).
Grid Resources
The USCMS Integration Grid Testbed (IGT) comprises:
- About 230 CPU (750 MHz equivalent, RedHat Linux 6.1)
  - Additional 80 CPU at 2.4 GHz running RedHat Linux 7.X
- About 5 TB local disk space plus Enstore mass storage at FNAL using dCache
- These are the combined USCMS Tier-1/Tier-2 resources:
  - Caltech, Fermilab, U Florida, UC San Diego, (UW Madison)
The European Data Grid (EDG) comprises:
- About 350 CPU (750-1000 MHz equivalent, mostly RHL 6.2)
  - Up to 400 additional CPU at Lyon that are shared
- CERN, CNAF, RAL, (NIKHEF), Legnaro, Lyon
- About 3.7 TB local disk space plus CASTOR mass storage at CERN and HPSS at Lyon
Current Grid Software
Both grids use Globus and Condor core middleware
USCMS Integration Grid Testbed (IGT)
- Using Virtual Data Toolkit (VDT) 1.1.3
- “Bottom-up” approach
  - Advantage: Bugs have been shaken out of the core middleware products
European Data Grid
- Using EDG release 1.3
- Enhanced functionality
  - E.g. using multiple Resource Brokers relying on real-time monitoring information to schedule jobs
- “Top-down” approach
  - Advantage: More functionality has been added
The CMS IGT “Stack”
The CMS IGT “stack” comprises nine layers. The Application layer contains only CMS executables. The Job Creation layer comprises the CMS-provided tools MCRunJob and Impala; neither is specifically “grid aware.” Next come a DAG Creation layer and a Job Submission layer, both provided by MOP. Jobs are submitted to DAGMan, which, through Condor-G, manages jobs run on remote Globus Job Managers. Finally, a local farm or batch system is used by Globus GRAM to manage jobs; on the IGT, the local batch manager was always FBSNG or Condor. Scheduling and integrated monitoring are not present.
Figure: the CMS IGT stack.
Layers: Application, Job Creation, DAG Creation, Job Submission, DAGMAN/Condor_G, Globus, Network, Job Manager, Farm/Batch System.
Components: CMS Applications, MCRunJob, MOP, VDT, Globus/GRAM, FBS/PBS/Condor, Mass Storage System (dCache + Enstore).
Monitoring
MonaLisa is used as the primary IGT Grid-wide monitor.
- Physical parameters: CPU load, network usage, disk space, etc.
- Dynamic discovery of monitoring targets and schema
- Implemented with Java/Jini with SNMP local monitors
- Interfaces to/from other monitoring packages
EDG uses two sources of monitoring:
- EDG Monitoring System (based on Globus Metadata Directory Service)
  - Physical parameters: CPU load, network usage, disk space, etc.
- BOSS (Batch Object Submission Service)
  - Application level: running time, size of output, job progress, etc.
  - Stores information in MySQL DB in real time
ML Design Considerations
Act as a true dynamic service and provide the necessary functionality to be used by any other services that require such information (Jini, UDDI - WSDL / SOAP)
- mechanism to dynamically discover all the "Farm Units" used by a community
- remote event notification for changes in any system
- lease mechanism for each registered unit
Allow dynamic configuration and the list of monitor parameters.
Integrate existing monitoring tools (SNMP, Ganglia, Hawkeye …)
To provide:
- single-farm values and details for each node
- network aspect
- real time information
- historical data and extracted trend information
- listener subscription / notification
- (mobile) agent filters (algorithms for prediction and decision-support)
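The lease mechanism listed above follows the Jini pattern: a registered unit stays visible only while it keeps renewing its lease, so a farm that dies silently drops out of the registry on its own. A minimal sketch of that idea, assuming a hypothetical LeaseRegistry class and an injectable clock; this is neither MonaLisa code nor the Jini API.

```python
import time
from typing import Callable, Dict, List


class LeaseRegistry:
    """Toy registry: a farm unit stays listed only while its lease is current."""

    def __init__(
        self,
        lease_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.lease_seconds = lease_seconds
        self.clock = clock
        self._expiry: Dict[str, float] = {}

    def register(self, unit: str) -> None:
        """Register a unit, or renew its lease if it is already registered."""
        self._expiry[unit] = self.clock() + self.lease_seconds

    renew = register  # renewing is the same operation as registering

    def active_units(self) -> List[str]:
        """Drop expired leases and return the units still registered."""
        now = self.clock()
        self._expiry = {u: t for u, t in self._expiry.items() if t > now}
        return sorted(self._expiry)
```

A unit that stops calling renew disappears from active_units once lease_seconds have passed, without any explicit deregistration.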
IGT Results (so far)
Time to process 1 event: 500 sec @ 750 MHz
Speedup: Avg factor of 100 speedup during current run
Resources: Approximately 230 CPU @750 MHz equiv.
Sustained efficiency: about 43.5%
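The quoted efficiency is consistent with dividing the average speedup by the number of CPUs. The short calculation below reproduces it and also derives an aggregate event rate; the event rate is a derived figure, not one quoted on the slide, and the variable names are illustrative.

```python
# Figures quoted on the slide.
SECONDS_PER_EVENT = 500.0  # on one 750 MHz-equivalent CPU
SPEEDUP = 100.0            # average during the current run
CPUS = 230                 # approximate, 750 MHz equivalent

# Assumed definition: efficiency = achieved speedup / ideal speedup.
efficiency = SPEEDUP / CPUS

# Derived, not on the slide: events completed per wall-clock second.
events_per_second = SPEEDUP / SECONDS_PER_EVENT

print(f"sustained efficiency: {efficiency:.1%}")
print(f"aggregate throughput: {events_per_second:.2f} events/s")
```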
EDG CMSIM events vs. time
Avg. CPU utilization was “about the same”
Analysis of Problems on IGT
- The usual servers die and need to be restarted
  - But nothing that seems to be related to the load...
- Failure semantics currently lean heavily towards automatic resubmission of failed jobs
  - Sometimes failures are not recognized right away
  - Need better system for spotting chronic problems
    - BOSS does this already, we aren’t using it because it was still under development when we were planning the IGT
- Problems must be better routed to the “right” people
  - At one point, an application problem was misdiagnosed early as a middleware issue
    - Partly because IGT is currently run by middleware developers!
  - Once the application expert looked into it, the problem was solved in 90 minutes
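Automatic resubmission can hide a chronic problem unless every failure is also counted somewhere. The sketch below illustrates that point only; it is not IGT or BOSS code. The names run_with_resubmission and chronic_sites, the threshold, and the submit callable (standing in for whatever actually launches a job) are all hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List


def run_with_resubmission(
    submit: Callable[[str], bool],  # hypothetical: returns True on success
    sites: Iterable[str],           # sites to try, in order
    failures: Counter,              # shared tally of failures per site
) -> bool:
    """Try each site in turn, recording every failure instead of discarding it."""
    for site in sites:
        if submit(site):
            return True
        failures[site] += 1
    return False


def chronic_sites(failures: Counter, threshold: int) -> List[str]:
    """Return the sites whose failure count has reached the threshold."""
    return sorted(site for site, n in failures.items() if n >= threshold)
```

Every job may still succeed on resubmission while the tally shows that one site fails every time, which is the kind of problem that needs routing to a person.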
Analysis of Problems on EDG
The biggest problems are related to the Information System:
- Symptom: no resources are found
  - Cause: instability of the MDS when it is overloaded
  - Solution: submitting jobs at a lower rate improves the chances of success
- Symptom: the RB gets stuck (no job ever starts)
  - Cause: investigating...
- Symptom: grid elements disappear from the II
  - Cause: services on some machines stopped working
  - Solution: restart the services
- Symptom: timeouts when copying the input sandbox
- Symptom: log file lost (“Stdout does not contain useful data”)
  - Cause: several (no free files/inodes, broken connection between CE & RB, …)
Problems related to the replica manager:
- Symptom: file registration in the RC fails from time to time
None of these problems is a show-stopper and they happen just in a fraction of the jobs!
- Fixes are already there for some of them (but not yet deployed)
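The workaround for the overloaded MDS, submitting at a lower rate, amounts to pacing the submit loop. A minimal sketch, assuming a hypothetical submit callable and an illustrative interval; it does not call any EDG command or API.

```python
import time
from typing import Callable, Iterable, List


def submit_throttled(
    jobs: Iterable[str],
    submit: Callable[[str], None],  # hypothetical submission function
    min_interval: float,            # seconds to wait between submissions
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Submit jobs one at a time, pausing between consecutive submissions."""
    submitted: List[str] = []
    for index, job in enumerate(jobs):
        if index > 0:
            sleep(min_interval)  # pause before every job except the first
        submit(job)
        submitted.append(job)
    return submitted
```

The sleep parameter is injectable so the pacing can be exercised without actually waiting.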
The Integration Grid Testbed (IGT)
Site     2002(IGT)  2002(PG)  2003(New)  2003(IGT)  2003(PG)
FNAL            60         0        260         10       310
Florida         80         0        175          5       250
Caltech        120         0         88          5       203
UCSD           128         0         88          5       211
Total          388         0        611         25       974
Resource Allocations (1 GHz equiv. CPU) in 2002/2003 for IGT and Production Grid. (R&D Grid not included.)
New resources for Tier-2 are from iVDGL.
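The allocation table can be checked mechanically. The sketch below recomputes the Total row and tests one relation that the numbers happen to satisfy for every site: 2003(PG) = 2002(IGT) + 2003(New) - 2003(IGT). That relation is an observation from the figures, not something the slide states.

```python
# Columns: 2002(IGT), 2002(PG), 2003(New), 2003(IGT), 2003(PG)
ALLOCATIONS = {
    "FNAL":    (60, 0, 260, 10, 310),
    "Florida": (80, 0, 175, 5, 250),
    "Caltech": (120, 0, 88, 5, 203),
    "UCSD":    (128, 0, 88, 5, 211),
}
TOTAL = (388, 0, 611, 25, 974)

# Sum each column over the four sites.
column_sums = tuple(sum(col) for col in zip(*ALLOCATIONS.values()))

# Observed relation: 2002 IGT CPUs plus new 2003 CPUs, minus those kept
# for the 2003 IGT, equal the 2003 Production Grid allocation.
relation_holds = all(
    igt02 + new03 - igt03 == pg03
    for igt02, _pg02, new03, igt03, pg03 in ALLOCATIONS.values()
)

print(column_sums == TOTAL, relation_holds)
```

Both checks pass for the figures on the slide.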
Comparison to CMS Spring 02
This is really apples and oranges, but...
- Average CPU utilization was about 20-25% during Spring02 over all participating CMS Regional Centers
  - Though it is impossible to extract a concrete number...
  - It really isn’t known how many Spring02 CPUs should be counted in the denominator, estimated 700-1000 CPUs
- File transfers were much more complicated during Spring02
  - Objectivity data was kept and archived
  - Different events were processed at different steps
But the current efforts show that the Grids are in the same ballpark!
We still need a factor of ~2.5 for DC04 production
- want slightly better than 1 evt/sec
Manpower Results (so far)
Estimates of Manpower for the IGT
- 2.65 FTE equivalent during initial phase and debugging
  - Reported voluntarily in response to a general query
- 1.1 FTE equivalent during smooth running periods
  - The manager plus periodic small file transfers
- Expected to be less than 1 FTE when regular shift procedures are adopted
Caveats:
- This is the “second” attempt for the IGT. The first attempt last Spring needed more manpower.
- We STILL have a rapidly changing middleware environment
- This is not counting general sysadmin support
- This is really saying that production ops is becoming less specialized
EDG began in early December; these are first attempts, so there are no manpower estimates yet.
- A task force has been set up including reps from EDG, CMS, and LCG
Next steps
For the EDG:
- Deploy the fixes for the problems encountered so far
- Put the online monitoring in place
For the IGT:
- Deploy fixes for problems uncovered so far
- Deploy more functionality
IGT and EDG are preparation for LCG-1
- Recently, LCG started participating in the IGT
  - 50 nodes running CMSIM-only production
- Getting “top” and “bottom” to meet in the middle
Conclusion
Our approach to developing the software systems for the distributed data processing environment adopts “rolling prototyping”
- Analyze current practices
- Prototyping of the distributed processing environment
- Software Support and Transitioning
- Servicing external milestones
Next prototype system to be delivered is the US CMS contribution to the LCG Production Grid (June 2003)
- CMS will run a large Data Challenge on that system to prove the computing systems (including new object storage solution ?)
This scheme allows us to flexibly react to technology developments AND to changing and developing external requirements