PCAP Management Overview
John Huth
Harvard University
PCAP Review of U.S. ATLAS
Lawrence Berkeley Laboratory
NOVEMBER 14-16, 2002
14 Nov 02, J. Huth, PCAP Review, Mgmt. Summary
Outline
• Overview
  - Changes from last year
  - LCG inception
  - U.S. ATLAS and International ATLAS
• Highlights
• Issues
  - Funding, base program funding
• Review of actions on recommendations
• External groups (iVDGL/PPDG/EDG)
• Change control
Major Changes Since Last Review
• Research Program launched – M+O and Computing considered as one “program”
  - Research program proposal submitted
    - Tier 2 funds
    - Physics generator interface
    - Some core support
    - CERN infrastructure support
    - Detector specific support
• “Large” ITR workshop
  - Data provenance, analysis in a grid environment
• LCG Project launched (covered in Torre’s talk)
  - Major US ATLAS participation
  - Data management scheme adopted
• LHC turn-on 2007 announced
Luminosity Evolution of the LHC
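[Figure: timeline of LHC luminosity evolution, 2005 to 2012. Rows show the instantaneous luminosity (L = 10^33 rising to 10^34), markers labelled DC 30% and DC 50%, the integrated luminosity in fb^-1, the accumulated data volume (1, 2.5, 4, 6 and 10 Pbyte), and Higgs channels (H → ZZ, H → WW).]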
International/US ATLAS
• Deliverables
  - Control/framework, data management effort, DC support, build support
  - Facility support of data challenges
  - Incorporation and inception of grid tools for data challenges
    - PACMAN, MAGDA, interoperability tests, Grappa, Grat
• Management
  - Architecture team
  - Data management leadership
  - Detector specific
FTE Fraction of Core SW
[Pie chart: share of core software FTEs]
U.S.     22%
France   21%
U.K.     13%
CERN      8%
Italy     6%
Other    17%
Needed   13%
Project Core SW FTE
[Chart: core software FTEs by contributor]
U.S.     10.5
France   10
U.K.      6
CERN      4
Italy     3
Other     8
Needed    6
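The FTE counts and the percentage shares on the two charts can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the chart values pair with the legend labels in the order listed (U.S., France, U.K., CERN, Italy, Other, Needed), and the names `core_sw_fte` and `fte_fractions` are hypothetical.

```python
# FTE counts as read from the "Project Core SW FTE" chart.
# Assumption: values pair with the legend labels in the order listed.
core_sw_fte = {
    "U.S.": 10.5,
    "France": 10.0,
    "U.K.": 6.0,
    "CERN": 4.0,
    "Italy": 3.0,
    "Other": 8.0,
    "Needed": 6.0,
}


def fte_fractions(fte_by_group):
    """Return each group's share of the total FTE count, in percent."""
    total = sum(fte_by_group.values())
    return {group: 100.0 * fte / total for group, fte in fte_by_group.items()}


fractions = fte_fractions(core_sw_fte)
total_fte = sum(core_sw_fte.values())

# Print one row per group, then the total.
for group, pct in fractions.items():
    print(f"{group:<7} {core_sw_fte[group]:>5.1f} FTE  {pct:5.1f}%")
print(f"Total   {total_fte:>5.1f} FTE")
```

The counts sum to 47.5 FTE, and rounded to whole percent the shares come out as 22, 21, 13, 8, 6, 17 and 13, which agrees with the "FTE Fraction of Core SW" chart.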
ATLAS Subsystem/Task Matrix
Subsystem                    Offline Coordinator  Reconstruction  Simulation      Database
Chair                        N. McCubbin          D. Rousseau     A. Dell’Acqua   D. Malon
Inner Detector               D. Barberis          D. Rousseau     F. Luehring     S. Bentvelsen / D. Calvet
Liquid Argon                 J. Collot            S. Rajagopalan  M. Leltchouk    H. Ma
Tile Calorimeter             A. Solodkov          F. Merritt      V. Tsulaya      T. LeCompte
Muon                         J. Shank             J.F. Laporte    A. Rimoldi      S. Goldfarb
LVL 2 Trigger / Trigger DAQ  S. George            S. Tapprogge    M. Weilers      A. Amorim / F. Touchard

Event Filter: V. Vercesi, F. Touchard

Computing Steering Group members/attendees: 4 of 19 from US (Malon, Quarrie, Shank, Wenaus)
Physics Coordinator: F. Gianotti
Chief Architect: D. Quarrie
Highlights of last year
• FADS/Goofy (alternative framework) issue solved
  - G4 now incorporated into Athena
• Increased usage of Athena by collaboration, support
• Adoption of (US ATLAS) hybrid database solution by LCG
• Major success in grid production for data challenges
Issues
• After baselining exercise, funding profile is lower than agency guidance (shortfall 10% of budget in 02).
  - NSF funding late relative to expectations
  - Budget shortfall
• Base programs at the supporting national labs are eroding
  - Threatens some deliverables (e.g. support of ADL)
• In some cases, fractions of FTEs are split many ways
• Growth of grid activities – spans facilities and software domains
  - Management of deployment and use of grid tools
• Coordination with CMS/LCG
• Infrastructure support improving (SIT)
Recommendations I
• 4.1 As is stated in the framework section of this report, we urge that the ARC-recommended adoption of the ATHENA framework for all steps in data simulation, reconstruction and analysis happen as soon as possible. Completion of the integration before DC1 is crucial for effective use of the planned 1E7 events by ATLAS collaborators for trigger/DAQ and algorithm studies.
  - This is being actively pursued as the sole supported strategy. A first production-quality version of the integrated Athena/G4 framework is scheduled for release 6.0.0 (end of January 2003).
• 4.2 We recommend that the ATLAS activities on GEANT4 hadronic physics validation receive high priority.
  - ATLAS is acting as a leader for G4 hadronic validation. This activity is now coming under the LCG project.
• 4.3 Communication with the CLHEP group is important to ensure that this step is completed. Recent exchanges look promising, and the committee encouraged U.S. ATLAS to further pursue contact with Fermilab and the individuals concerned.
  - See response from Ian Hinchliffe in his talk.
Recommendations II
• 4.4 The committee commends the nightly rebuilds as an important step and urges that high priority be given to the addition of QA/QC tests to the nightly builds. This will extend their use to monitoring and improving the performance of the software.
  - This has been done. US ATLAS led the incorporation of automated testing at many levels (unit testing, package testing, systems-level integration and regression testing) into the nightly builds. In the context of the new infrastructure group (SIT), testing is now being formally organized in the release structure.
Recommendations III
• 4.5 The Athena team should make every effort to respond to the recommendations made by the ARC. A clear statement should be given where it is intended to diverge from the recommendations. Requests for documentation should be respected.
  - The Athena team has responded to the recommendations made by the ARC. A substantial effort has been made to provide and streamline the documentation.
• 4.6 The outcome of the review and the program of work resulting from it should be communicated to the collaboration by the computing management as soon as possible. This should describe the immediate plans for the future development of Athena and for its use in the various data processing applications. This is urgently needed to give clear direction to the software effort. Mechanisms will need to be put in place to ensure that the architecture is respected and the framework used in order to bring cohesion to ATLAS software.
  - This was done.
Recommendations IV
• 4.7 We recommend that a clear milestone be set for migrating the simulation framework to Athena and that the simulation and framework groups collaborate closely on working towards this common aim. The timescale for this depends strongly on the effort that can be devoted to continue the integration of Athena, Geant 4 and specific ATLAS code. We recommend that this be given high priority.
  - A prototype was delivered at the end of 2001. This has been accepted and extended by the Simulation Coordinator to be the basis of the first production-quality integration framework, scheduled for release 6.0.0 (January ’03).
Recommendations V
• 4.8 We find that the consolidation of leadership in the Data Management area onto one person is positive and recommend that all activities continue under his direction. We recommend that technology selection criteria be specified. We recommend that U.S. ATLAS augment the manpower of the database effort as soon as possible.
  - The choice of technology has been made in the context of the LCG. In particular, the LCG has focused on the adoption of a relational-database/ROOT solution, which was led by U.S. ATLAS. The augmentation of manpower remains problematic in the face of budget shortfalls.
Recommendations VI
• 4.9 The committee recognizes the importance of GRID development and commends U.S. ATLAS for forging close contact and collaboration with the computer science component. This will pay off with usable distributed analysis infrastructure on a reasonable time scale. We urge U.S. ATLAS to continue to emphasize this collaboration and use the data challenges to strengthen the connection between HEP and CS GRID work.
  - This is continuing. We have had a number of successes in the implementation of grid tools for DC1 and the SC2002 conference. The workshop here in Berkeley is an example of the ongoing connections between physicists, computer scientists and the relevant funding agencies.
Recommendations VII
• 4.10 We strongly support the plan to first attack problems of distributed data management followed by adding more “smarts” to optimize scheduling of jobs on the GRID. We recommend close coupling of GRID and Data Challenge activities, and strengthening the communication between U.S. ATLAS GRID developers and Data Challenge management. Further coordination will help ensure that GRID results in a net gain of manpower.
  - That is precisely the route we have taken. This has been largely successful in the creation and deployment of tools for DC1. The optimization of job scheduling for production, while encapsulated in the way we’ve handled production, leaves many open questions for “chaotic analysis”.
Recommendations VIII
• 4.11 The Tier 1 computing facility should attempt a data challenge of the same scale as the proposed DC0 at CERN (100k events) in advance of DC1.
  - Although no formal work was done to replicate the effort for DC0, the DC1 effort, led by Pavel Nevski, gave the Tier 1 center at BNL a central role in DC1 production.
• 4.12 The T1 regional center at BNL should encourage, using all means available, physicists to analyze MC stored at BNL Tier 1 in order to support US physicists and believe the new scale of the facility is correct.
  - We have put effort into encouraging expanded facility use, particularly before DC1 started monopolizing the cluster. Jeffrey McDonald, Frank Paige and others have made significant use of the Tier 1 facility in the past year in addition to Mike Shupe, Pavel Nevski and Yuri Fisiak.
• 4.13 We support the re-scoped size of the BNL Tier 1 in order to support US physicists and believe the new scale of the facility is correct.
  - New estimates on personnel required in the long term. Further delays in the LHC schedule coupled with Moore’s law make this even more attractive.
Recommendations IX
• 4.14 The committee feels that an increase in Tier 2 capacities could help with the funding for the shortfall for DC2, but there is still a shortfall of manpower. Additional funding for the User facility project in advance of DC2 will likely be necessary in order to make this data challenge a success for the U.S. ATLAS computing facilities.
  - DC1 Phase 2 contributions from the Tier 2 (and Tier 3+) centers are expected to be significant. Given the funding scenarios, Tier 2 growth and the augmentation of resources via this route may be the only option for increased facility scope.
Recommendations X
• 4.15 We recommend that the excellent work on measuring and understanding networking needs be continued and commend ATLAS for its foresight in understanding the tested bottlenecks early in the process. This is extremely important in satisfying the ATLAS data challenge goals.
  - Shawn McKee, in addition to continuing the analysis and monitoring of the US ATLAS network, is playing a very active role in US HEP networking, both participating in and organizing efforts directed at satisfying the US ATLAS wide area network needs within the context of Internet2 and ESnet upgrades.
Recommendations XI
• 4.16 We recommend that the Data Management effort be strengthened to support the Data Challenges. We also recommend the deployment of sufficient resources so U.S. ATLAS can be a significant participant in the Data Challenges.
  - Although the cutbacks in funding preclude the addition of FTEs “on project”, a substantial amount of effort has been culled from the collaboration to make pilot runs of the data challenges a success in the US, and a model for international ATLAS. We should point out that the new 10 TB disk addition at the Tier 1 site (from FY02 year end funding) will be a significant resource in supporting DC1P2 and DC2, and will also play a key role in making the simulated data available for users.
• 4.17 The committee is worried about the reduced level of funding. If the present situation is not corrected in FY ’02, this part of the project will undoubtedly suffer. This concerns both the personnel and the facility at BNL.
  - This continues to be a problem.
Recommendations XII
• 4.18 The committee urges the U.S. ATLAS management to address the Athena problem at the appropriate level within the International ATLAS management to arrive at an agreed timetable for the transition to a unique framework.
  - This has largely happened already, with FADS/Goofy withering, and G4 implementations in Athena. Torsten Akesson (deputy spokesperson) has been instrumental in helping resolve this.
• 4.19 We also encourage the U.S. ATLAS management and their funding agencies to help establish a stronger presence at CERN.
  - This is part of the Research Program budget submitted to the NSF, which includes two dedicated infrastructure FTEs. Whether this is funded at the requisite level remains to be seen. Torre Wenaus is now in charge of the LCG applications area. Massimo Marino is a (nearly) full-time resident at CERN for Athena support. Pavel Nevski will be a (nearly) full-time resident next year.
Relations with External Groups
• PPDG
  - Metadata catalog (MAGDA) (Wensheng Deng, BNL)
  - Interoperability tests with EDG (Jerry Gieraltowski, ANL)
  - Virtual organization, monitoring in grid environment (Dantong Yu, BNL)
  - Distributed analysis (David Adams, BNL)
• iVDGL
  - Package deployment/installation (PACMAN) (S. Youssef, BU), adopted by VDT and CMS
  - Grid portals (Grappa) (Dan Engh, U. Chicago; S. Smallens, Indiana)
  - Hardware support
• NB: A tremendous amount of support comes from base efforts at labs and universities (netlogger – LBNL; grat – De, UTA; support – H. Severini, Oklahoma; S. McKee, Michigan; May, ANL; BNL, LBNL, BU)
An Instance of Change Control
• Our Project Management Plan describes a change control procedure, which invokes the CCB (Computing Coordination Board) in a process to grant change control.
• R. Gardner departed from Indiana University to Univ. of Chicago to become iVDGL coordinator. His funding via iVDGL was for a prototype Tier 2 site at Indiana.
  - Request was for prototype effort to remain at Indiana (substantial infrastructure), but have personnel funded at U. Chicago
  - Additional manpower, in effect, comes from this change
  - All parties agreed
  - CCB agreed with this, but didn’t see this change as an entitlement for a final Tier 2 at either Indiana or U. Chicago (to be revisited in 2 years).
  - Change control memo written to file.