The CMS Computing System: getting ready for Data Analysis
Matthias Kasemann, CERN/DESY

Presented at ISGC 2007 ("CMS Computing").

Slide 2/26: CMS achievements 2006
- Magnet & Cosmics Test (August 06)
- Detector Lowering (January 07)

Slide 3/26: CMS achievements 2006: Physics TDRs
- Feb 2006: Volume I of the Physics TDR, describing detector performance and software.
- Jun 2006: Volume II, describing the physics performance.
- The two volumes constitute the culmination of our plans for data analysis in CMS with up to 30 fb^-1 of data.
- The special study of detector commissioning and data analysis during the startup of CMS has been deferred to 2007.
- This activity mobilized hundreds of collaborators during the past two years, and many useful lessons have been learned.

Slide 4/26: CMS: Computing highlights 2006
- Main computing/software milestones: the Magnet Test Cosmic Challenge (Apr 06) and the Computing, Software and Analysis Challenge 06 (CSA06, Nov 06).
- 2006 was a year of fundamental software changes: new simulation and reconstruction software packages were released, with very positive feedback from users.
- Procedures for release integration, building and distribution were developed: release control tools, HyperNews, nightly builds, the tag collector, the WorkBook.
- Design control of all interfaces and data formats is in place; the CMSSW framework, framework-light and ROOT are available for data access.
- Integration with CMS detector and commissioning activities: strong connections with the various detector groups, key for commissioning.
- Validation software packages and a validation procedure are in place, crucial for startup preparation.

Slide 5/26: Major milestone in 2006: CSA06
The combined Computing, Software and Analysis challenge (CSA06) was a 25%-of-2008 exercise of the CMS data-handling model and computing operations: an integrated test of the full end-to-end chain of the complete system, from (simulated) raw data to analysis at Tier-1 and Tier-2 centers.
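CSA06's headline totals, quoted below (roughly 200M events reconstructed and over 1 PB of data shipped during the six-week challenge), imply sustained rates that are easy to verify with back-of-envelope arithmetic. This sketch is purely illustrative and uses only figures taken from the slides:

```python
# Back-of-envelope rates implied by the CSA06 totals (illustrative only).
events_reconstructed = 200e6              # >200M events at the Tier-0
data_shipped_bytes = 1e15                 # >1 PB across T0/T1/T2
challenge_seconds = 6 * 7 * 24 * 3600     # the six-week challenge window

avg_event_rate = events_reconstructed / challenge_seconds         # Hz
avg_transfer_mb_s = data_shipped_bytes / challenge_seconds / 1e6  # MB/s

print(f"average reconstruction rate: {avg_event_rate:.0f} Hz")   # ~55 Hz
print(f"average network rate: {avg_transfer_mb_s:.0f} MB/s")     # ~276 MB/s
```

These averages sit comfortably within the reported operating points: between the 40 Hz nominal and 100 Hz achieved reconstruction rates, and between the 150 MB/s transfer goal and the 350 MB/s daily peaks.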
Launched on Oct 2, 2006, it followed many months of preparation and the development of about 0.5M lines of software in the new CMSSW framework. Six weeks later, all technical goals of the challenge had been achieved: the code ran with a negligible crash rate and without any memory problems on all samples. By the end of CSA06 the Tier-0 centre had reconstructed >200M events, and >1 petabyte of data had been shipped across the network of the world-wide distributed system of regional Tier-1 and Tier-2 centers. Excellent collaboration with the CERN IT department was an important factor in the success of the challenge.

Slide 6/26: CSA06: T0 goals & achievements
- Prompt reconstruction: goal of 40 Hz; achieved 50 Hz for 2 weeks, then 100 Hz, with a peak rate of >300 Hz for >10 hours; 207M events in total.
- Uptime: goal of 80% over the best 2 weeks; achieved 100% over 4 weeks.
- Use of Frontier for DB access to the prompt-reconstruction conditions: the challenge was the first opportunity to test this on a large scale with the developed reconstruction software. Initial difficulties were encountered during commissioning, but patches and reduced logging allowed full inclusion into the CSA.
- CPU use: maximum CPU efficiency of 96% of 1400 CPUs over ~12 hours.
- Explored realistic T0 operations, upgrading and intervening on a running system.

Slide 7/26: CSA06: T0 to T1 transfers
- The goal was to sustain 150 MB/s to the T1s, twice the expected 40 Hz output rate.
- Averages in the last weeks hit 350 MB/s (daily) and 650 MB/s (hourly), i.e. 2008 levels were exceeded for ~10 days (with some backlog observed).
[Figure: monthly T1 transfer volume and T0 rate in Hz, marking the min-bias start and the target rate.]

Slide 8/26: CSA06: individual T0-T1 performance, goals and achievements
- 6 of 7 Tier-1s exceeded 90% availability over 30 days.
- The U.S. T1 (FNAL) hit 2x its goal.
- 5 sites stored data to MSS (tape).

Slide 9/26: CSA06: job execution on the Grid
- >50K jobs/day were submitted on all but one day in the final week, of which >30K/day were robot jobs; 90% job completion efficiency.
- Robot jobs have the same mechanics as user job submissions via CRAB.
- Mostly at T2 centers, as expected; OSG carries a large proportion.
- Scaling issues were encountered, but subsequently solved.

Slide 10/26: CSA06: prompt tracker alignment
- Closing the loop: analysis of re-reconstructed Z → μ+μ− data at T1/T2 sites, under three scenarios: ideal / misaligned / realigned (grid jobs at T1-PIC).
- Determine the new alignment: run the HIP algorithm on multiple CPUs at CERN over a dedicated alignment skim from the T0 (1 million events, ~4 h on 20 CPUs).
- Write the new alignment into the offline DB at the T0 (ORCOFF) and distribute the offline DB to the T1/T2s.
- Results 2 days after AlCaReco!
[Figure: TIB double-sided module positions.]

Slide 11/26: CSA06: physics analysis demonstrations
These demonstrations proved to be useful training exercises for collaborators in the new software and computing tools.
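The job-robot statistics above (>50K jobs/day with 90% completion efficiency) amount to simple bookkeeping over job outcomes. A minimal sketch of that bookkeeping; the job records, site names and status values are invented for illustration and do not reflect the actual CRAB or dashboard schema:

```python
from collections import Counter

# Invented job records; the real CRAB/dashboard bookkeeping differs.
jobs = [{"site": "T2_US_Purdue", "status": "done"}] * 45_000 \
     + [{"site": "T2_ES_CIEMAT", "status": "failed"}] * 5_000

counts = Counter(job["status"] for job in jobs)
total = sum(counts.values())
efficiency = counts["done"] / total

print(f"{total} jobs, completion efficiency {efficiency:.0%}")
# prints: 50000 jobs, completion efficiency 90%
```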
- Muon: extraction of W and of the di-muon reconstruction efficiency from Z and J/ψ → μ+μ− (Northwestern and Purdue groups, a T2 activity).
- Tau: selection of Z → ττ → lepton + jet; tau mis-id study from Z + jet; tau tagging efficiency.
[Plot legend: 1 GLB + 1 tracker track; 2 GLB tracks; 1 GLB + 1 STA track.]

Slide 12/26: CSA06 summary
- All goals were met: T0 prompt reconstruction of RECO, AOD and AlCaReco, with Frontier, for 207M events; export to the T1s at 150 MB/s and higher; data-reduction (skim) production performed at the T1s and transferred to the T2s; re-reconstruction demonstrated at 6 T1 centers; a job load exceeding 50K/day; alignment, calibration and physics analyses widely demonstrated.
- CSA06 was a huge enterprise: it commissioned CMS data handling at 25% scale, and everything worked down to the final analysis plots.
- Many lessons can be drawn for the future as we prepare for data-handling operations, and more things remain to be commissioned: the DAQ Storage Manager to T0 chain, and support of global data-taking during detector commissioning.

Slide 13/26: Some lessons from CSA06
- CMS needs some development work to ease the operations load.
- Strong engagement with OSG, WLCG and the sites was extremely useful: Grid service and site problems were addressed promptly; FTS at CERN was carefully monitored, with response when needed; CASTOR support at CERN was excellent. Support from CERN IT was key for the success and very instrumental.
- Data management needs an automatic way to ensure consistency across all components.
- Scale testing continues to be an extremely important activity.

Slide 14/26: CMS outlook and perspectives for 2007
- Lower the whole detector, and commission it underground.
- Prepare the final distributed computing and software system and the physics analysis capability.
- The initial* CMS detector will be ready for collisions at 900 GeV at the end of 2007; the low-luminosity detector will be ready for collisions at design energy in mid-2008.
- (*) The initial CMS detector is the low-luminosity detector minus the ECAL endcaps and pixels; both will be installed during the 07/08 winter shutdown.
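One of the CSA06 lessons above is that data management needs an automatic way to ensure consistency across all components. A toy illustration of the core idea, diffing a central catalogue against one site's storage listing; the dataset names and both listings are invented, and the real CMS tools operate on far richer metadata:

```python
# Toy consistency check between a central catalogue and one site's storage.
# Dataset names are invented for illustration.
catalogue = {"/Zmumu/CSA06/RECO", "/MinBias/CSA06/AOD", "/TTbar/CSA06/RECO"}
at_site = {"/Zmumu/CSA06/RECO", "/TTbar/CSA06/RECO", "/Orphan/old/RECO"}

missing_at_site = sorted(catalogue - at_site)  # subscribed but never arrived
orphans_at_site = sorted(at_site - catalogue)  # on disk but unknown centrally

print("missing at site:", missing_at_site)
print("orphans at site:", orphans_at_site)
```

In practice such a check has to run automatically and across all sites, which is exactly the operations load the slide is pointing at.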
Slide 15/26: CMS computing goals in 2007
- Demonstrate physics analysis performance using the final software, with high statistics: a major MC production of up to 200M events started last week; analysis starts in June and finishes by September.
- Regular data taking: Detector → HLT → TAPE → T0 → T1, at regular intervals, 3-4 days per month, starting in May.
- Month of October: MTCC3; readout of (successively more) components, with the data processed and distributed to the T1s.

Slide 16/26: Computing commissioning plans 2007
[Timeline chart, February through November, also marking the start of the large MC production and the Global Detector Run.]
- February: deploy PhEDEx 2.5; T0-T1, T1-T1 and T1-T2 independent transfers; restart the job robot; start work on SAM; full FTS deployment.
- March: SRM v2.2 tests start; T0-T1(tape)-T2 coupled transfers (same data); measure data serving at sites (esp. T1); production/analysis share at sites verified.
- April: repeat the transfer tests with SRM v2.2 and FTS v2; scale up the job load; gLite WMS test completed (synchronized with ATLAS).
- May: start ramping up to CSA07.
- July onwards: CSA07, global data-taking runs, the LHC engineering run, pre-CSA07 Event Filter tests, start of analysis.

Slide 17/26: Motivations for CSA07
There are two important goals for 2007, the last year of preparations for physics and analysis:
- 1) Scaling: we need to reach 100% of system scale and functionality by spring 2008. CSA06 demonstrated between 25% and 50%, depending on the metric.
- 2) Transition to sustainable operations: this spans all areas of computing (data management, job processing, user support, site configuration and consistency). In the past, functionality was valued more highly than the operations load; as we prepare for long-term support, this emphasis needs to change.

Slide 18/26: CSA07 goals: increase scale
- CMS demonstrated 25% performance in 2006; we have two more factors of 2 to ramp up before data taking in 2008. The data transfer between Tier-0 and Tier-1 reached about 50% of scale (a very successful test, but some signs of system stress were visible); the job submission rate reached 25%.
- We plan another formal challenge in 2007: a >50% challenge in the summer of 2007.
- Extend the system to include the HLT farm; add elements like simulation production; increase the user load; run concurrently with other experiments stressing the system.

Slide 19/26: CMS computing model & resources
[Figure: the CMS Tier-1 centers.]

Slide 20/26: CSA07 workflow
[Figure: the CSA07 workflow.]

Slide 21/26: CSA07 success metrics
[Table: the CSA07 success metrics.]

Slide 22/26: CSA07 goals for Tier-1s
- In the computing model the Tier-1 centers perform 4 functions: archiving data, both real and simulated, from Tier-2 centers; executing skimming and selection on the data for users and groups; re-reconstructing raw data; and serving data samples to Tier-2 centers for further analysis.
- As we transition to operations, we should bring the Tier-1 centers into alignment with their core functionality.

Slide 23/26: CSA07: expectations of Tier-2s
- MC production at the Tier-2s was a significant contributor to the 25M events/month for CSA06. When the experiment is running, the Tier-2s are the only dedicated simulation resource, and the expectation is 100M events per month. CMS now produces 30M events/month; the goal for CSA07 is 50M.
- Analysis submission: the Tier-2s are expected to support communities, either local groups or regions of interest; so far this is implemented only in a couple of specific communities.
- Unlike the Tier-1 data subscriptions and processing expectations, which are largely specified centrally by the experiment, the Tier-2s have control over the data and the activity.
- CMS will work to improve the reliability and availability of the Tier-2 centers.

Slide 24/26: Tier-2 analysis goals in 2007
- The Tier-2s are the primary analysis resource controlled by physicists; the activities are intended to be controlled by user communities.
- Up to now most of the analysis has been hosted at the Tier-1 sites. CMS will enlarge analysis support by hosting important physics samples exclusively at Tier-2 centers, forcing the migration of analysis to the Tier-2s.
- We have roughly … sites with sufficient disk and CPU resources to support multiple datasets; skims in CSA06 were about ~500 GB, and the largest of the raw samples was ~8 TB.

Slide 25/26: Transition to operations in 2007: goals
- We plan to measure the transition to operations with concrete metrics.
- Site availability: SAM (Site Availability Monitor) tests; put CMS functions into the site functional testing: analysis submissions, production, Frontier, data transfer.
- Measure the site availability: the WLCG goal for the Tier-1s in early 2007 is 90%; we should establish a goal for the Tier-2s, and 80% seems reasonable. Goals for the summer of '07 would be 95% and 90% respectively.

Slide 26/26: Prepare CMS for analysis: summary
- 2006 was a very successful year for CMS software and computing; 2007 promises to be a very busy year for Computing and Offline.
- Commissioning and integration remain the major tasks in 2007; balancing the needs of physics, computing and the detector will be a logistics challenge.
- The transition to operations has started, and a data operations group has been formed.
- The facilities will be ramping up resources to be ready for the pilot run and the 2008 physics run.
- An increased number of CMS people will be involved in the facilities, commissioning and operations, to prepare for CMS analysis.
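The site-availability goals above (90% for Tier-1s, 80% for Tier-2s, rising to 95%/90% by summer) boil down to a pass fraction over periodic SAM-style functional tests. A minimal sketch with invented per-site test results; real SAM aggregation is more elaborate (scheduled downtimes, test weighting, etc.):

```python
# Invented SAM-style results: True = the functional test passed in that bin.
results = {
    "T1_ES_PIC": [True] * 23 + [False] * 1,     # 24 hourly bins
    "T2_US_Purdue": [True] * 18 + [False] * 6,
}
targets = {"T1": 0.90, "T2": 0.80}              # early-2007 goals

for site, bins in results.items():
    availability = sum(bins) / len(bins)        # pass fraction
    target = targets[site[:2]]                  # tier from the site name
    verdict = "ok" if availability >= target else "below target"
    print(f"{site}: {availability:.0%} (target {target:.0%}) -> {verdict}")
```

With these made-up numbers the Tier-1 meets its 90% goal while the Tier-2 falls short of 80%, which is precisely the kind of flag the site functional testing is meant to raise.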