ALICE: Offline Planning and personnel resources
LHCC Manpower Review of Computing
September 3, 2003
ALICE : planning & resources
Questions to be answered
Profile of available and required manpower at CERN / Regional Centers / Institutes
Other resources existing and potential
Computing elements which will not be provided in case the required manpower and resources are not available
Measures of progress in producing necessary software
Management tools to track the progress
Verification of the quality of the LCG software
Foreword
Lack of personnel in LHC computing (experiment & common HW/SW infrastructure) has been emphasized by the LHC Computing Review (2001) and judged extremely worrying
CERN and the Collaborations together must do all that they can to provide the HR that are needed for Core Software development
The shortage has been alleviated for the LCG project by influx of computing professionals funded by member countries
No such mechanism exists yet for experiments where the personnel shortage remains a problem
ALICE has re-profiled the planning
The data to be shown represent a bare minimum below which the readiness for data processing cannot be guaranteed.
Menu: Planning & Resources
ALICE Offline organization & management
Strategy for the Offline project, DC & milestones
Personnel resources: available and requests
Answer to questions & conclusions
Organization
Offline project mandate:
Prepare software and computing infrastructure for the experiment's data processing (+DAQ, +HLT projects)
Provide and maintain a complete infrastructure for simulation, reconstruction and analysis already during the construction phase
Offline personnel for software developments:
Core Offline project: minority, full time, located at CERN
Detector projects: most of the personnel, part time (preparation of apparatus), located in collaboration institutes
LCG provides common hardware and software infrastructure for LHC computing
Strict coordination required to make the best use of the personnel available
Organization: Management structure
Project Leader & Deputy
Resources Coordination
Planning Coordination
Production Environment Coordination
Framework & Infrastructure Coordination
Simulation Coordination
Reconstruction & Physics Coordination
Core Offline
Offline Board
US Grid Coordination
EU Grid Coordination
Software projects
Detector projects
DAQ
HLT
Int. Comp. Board
Regional Tiers
LCG SC2, GDB, POB
Core Offline Work Packages
Framework and infrastructure coordination
Simulation coordination
Reconstruction and physics coordination
Production environment coordination
Organization: Management structure
Lightweight, single structure
Efficient use of available personnel
High adaptability to rapidly changing technology
Merge framework developers (service providers) and physics algorithm developers (consumers)
Maximize communication
Economy of personnel (polymorphism of software experts)
Rapid feedback to user requirements
Planning
Strategy:
Dynamic management of the work schedule
Develop a long-term software infrastructure
Maintain the infrastructure in working state during detector construction
Constraints:
Depend on the planning of external projects (LCG, EDG, EGEE)
Most developers report to detector projects
Take advantage of latest developments in fast-evolving technology
No personnel available for in-depth planning activity
Majority of personnel in the Core Offline project is temporary and with unpredictable skills
Lightweight and opportunistic strategy with flexible data challenges as high-level milestones
Core team @ CERN
A choice, not a necessity
Need for a strong and centralized team of experts
To facilitate coordination in all detector projects and all regional centers
CERN, more than other ALICE groups, has the critical mass of people with the right skills
Benefit from co-habitation with ALICE management
And with LCG management
Benefit from the attraction CERN exercises on young people with the right profile
Development strategy
Minimize the effective amount of development
Choose mature and well-tested products
ROOT: common HEP solution for data persistency at the file level, interface to various libraries, visualization, graphical user interface, virtual Monte-Carlo, geometrical modeler
AliEn: the ALICE distributed computing environment, all made with Open Source components based on Open Standards; 2 FTE for development, 0.5 for operation, in production since 2002
Reduce staff and rely on temporary personnel
However, there is a threshold for staff
Delegate well-identified and modular packages to teams outside the Core group
Detector database
EDG/EGEE test bed
Data Challenges
Stress-test the ALICE data model, DAQ hardware and software infrastructure with prototypes of increasing complexity until the 2007 objectives are reached
Computing DC: record HI data at 1.2 GBytes/s and export quasi-online processing outside CERN
Physics DC: provide the infrastructure for organized Monte-Carlo production and world-wide random data analysis
Computing Data Challenge
ALICE & IT:
Assess the MS requirements and evaluate available products (1998)
Evaluate functions of DAQ, Offline, HLT projects
Large-scale high-throughput distributed DC (4) to:
Prototype the DAQ, Offline, HLT computing systems
Verify their integration
Assess technologies and computing models
Test hardware and software components in a realistic environment
Achieve an early integration of the overall computing infrastructure
Milestones
Physics Data Challenge
Objectives: prototype and test scalability of the components needed to simulate, reconstruct, and analyze data on distributed computing resources
Three interlinked components: ROOT, AliRoot, AliEn
Milestones
* Fraction of events simulated in one year of standard data taking
PDC-III resources estimate
Simulation
10^5 Pb-Pb + 10^7 p-p
Distributed production, (partial) data replication at CERN
Reconstruction and analysis
Data source is CERN: 5x10^6 Pb-Pb + 10^7 p-p
Reconstruction at CERN and outside depending on resource availability
Resources (CPU and storage)
2004 Q1: 1354 kSI2K and 165 TB
2004 Q2: 1400 kSI2K and 301 TB
Bandwidth
Simulation in 2004 Q1: ~90 TB will be shipped to CERN in about 2 months
~10 days using 10% of the CERN bandwidth
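The bandwidth figures above can be turned into average transfer rates. The following is a minimal sketch, assuming decimal units (1 TB = 10^6 MB), a 60-day reading of "about 2 months", and a steady transfer; the function name is hypothetical and the slide does not say how the rates were derived.

```python
SECONDS_PER_DAY = 86_400


def average_rate_mb_per_s(volume_tb: float, days: float) -> float:
    """Average sustained rate (MB/s) needed to move volume_tb in the given number of days."""
    megabytes = volume_tb * 1_000_000  # assumption: decimal terabytes
    return megabytes / (days * SECONDS_PER_DAY)


# ~90 TB of simulated data, figures quoted on the slide
two_months = average_rate_mb_per_s(90, 60)  # assumption: "about 2 months" = 60 days
ten_days = average_rate_mb_per_s(90, 10)

print(f"{two_months:.1f} MB/s averaged over 60 days")
print(f"{ten_days:.1f} MB/s averaged over 10 days")
```

Under these assumptions the 10-day figure is six times the 60-day one, which is the trade-off the slide points at: a short burst on a fraction of the CERN bandwidth versus a slow continuous shipment.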
PDC-III resources profile
PDC-III resources
Details in the ALICE Data Challenges paper, taking into account:
Results of previous PDC
Estimation of simulations in a standard year (2009)
Storage: 200 TB must be kept beyond the PDC end!
The numbers indicating the LCG resources for ALICE assume simultaneous use of the resources by all the experiments!
A dynamic resource allocation would easily solve the deficit
USA quota to be confirmed
Quarters: 03Q1, 03Q2, 03Q3, 03Q4, 04Q1, 04Q2, 04Q3
CPU requirements (kSI2k): 1354, 1400
LCG declared capacity for ALICE: 941, 941
Storage requirements, total active data (TB): 165, 301
LCG declared capacity for ALICE, disk: 192, 192
LCG declared capacity for ALICE, tapes: 578, 578
LCG declared capacity for ALICE, total: 770, 770
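The requirement and declared-capacity figures above can be compared directly to see where the deficit lies. This is a rough sketch only: it assumes the two capacity values line up with the two requirement quarters (2004 Q1 and Q2), that CPU capacity is in kSI2k and storage capacity in TB, and that active data is compared either with disk alone or with disk plus tape. None of these pairings is stated on the slide.

```python
# Requirements quoted for PDC-III (2004 Q1 and Q2)
REQUIRED = {
    "2004 Q1": {"cpu_ksi2k": 1354, "storage_tb": 165},
    "2004 Q2": {"cpu_ksi2k": 1400, "storage_tb": 301},
}

# LCG declared capacity for ALICE (assumed constant over both quarters)
DECLARED = {"cpu_ksi2k": 941, "disk_tb": 192, "tape_tb": 578}


def shortfall(required: float, available: float) -> float:
    """Amount by which the requirement exceeds what is available (0 if covered)."""
    return max(0, required - available)


report = {}
for quarter, need in REQUIRED.items():
    report[quarter] = {
        "cpu_ksi2k": shortfall(need["cpu_ksi2k"], DECLARED["cpu_ksi2k"]),
        "disk_only_tb": shortfall(need["storage_tb"], DECLARED["disk_tb"]),
        "disk_plus_tape_tb": shortfall(
            need["storage_tb"], DECLARED["disk_tb"] + DECLARED["tape_tb"]
        ),
    }

for quarter, gaps in report.items():
    print(quarter, gaps)
```

With these pairings the CPU requirement exceeds the declared capacity in both quarters, while storage is only short if tape is left out of the comparison.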
Tracking progress
Milestones set by the needs to prepare the Physics Performance Report
Full and fast simulation
Detector reconstruction
Global reconstruction
Progress monitored by Physics DC
Central coordination at CERN (architect, librarian, multi-platform compatibility)
Offline board takes the decisions on framework evolution and reviews progress
Developers implement during Offline week
Code reviewed by experts
Verification of LCG software quality: Grid technology area
Verification of LCG software quality: Grid deployment area
Verification of LCG software quality: Fabric area
ALICE Offline Planning
Today
Personnel Profile (task oriented)
4 permanent staff persons
Profile is built up with the assumption that temporary personnel is NOT replaced*
Evolution reported since 1998
* Unrealistic scenario to emphasize fragility of the structure
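Personnel Profile (task oriented) - 1/5

| Activity | | 98 | 99 | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Off-line Coordination | Avail. | 0.8 | 1.0 | 1.0 | 1.0 | 1.0 | 1.7 | 1.5 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| | Needed | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| | Missing | 0.3 | 0.0 | 0.0 | 0.0 | 1.0 | 0.3 | 0.5 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| DB and distributed computing infrastructure | Avail. | 0.6 | 2.2 | 1.6 | 1.5 | 1.8 | 2.0 | 2.0 | 2.4 | 2.0 | 0.8 | 0.0 | 0.0 |
| | Needed | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| | Missing | 1.5 | 0.2 | 0.4 | 0.5 | 0.3 | 0.0 | 0.0 | 0.0 | 0.0 | 1.2 | 2.0 | 2.0 |
| Framework Development | Avail. | 0.4 | 0.4 | 0.3 | 0.8 | 1.8 | 2.3 | 1.9 | 1.3 | 1.3 | 0.8 | 0.3 | 0.3 |
| | Needed | 1.0 | 1.0 | 1.5 | 1.5 | 1.5 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| | Missing | 0.6 | 0.6 | 1.2 | 0.7 | 0.3 | 0.3 | 0.1 | 0.7 | 0.7 | 1.2 | 1.7 | 1.7 |
| Simulation framework | Avail. | 1.9 | 2.0 | 2.8 | 3.0 | 3.3 | 3.0 | 2.8 | 2.0 | 1.5 | 1.0 | 1.0 | 1.0 |
| | Needed | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 2.0 | 2.0 | 1.5 | 1.0 | 1.0 |
| | Missing | 1.1 | 1.0 | 0.3 | 0.0 | 0.3 | 0.0 | 0.3 | 0.0 | 0.5 | 0.5 | 0.0 | 0.0 |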
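Personnel Profile (task oriented) - 2/5
Personnel Profile (task oriented) - 3/5

| Activity | | 98 | 99 | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Radiation Studies | Avail. | 0.5 | 0.3 | 0.8 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| | Needed | 0.5 | 0.5 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.5 | 0.5 | 0.5 | 0.5 |
| | Missing | 0.0 | 0.2 | 0.2 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.5 | 0.5 | 0.5 | 0.5 |
| System support | Avail. | 1.0 | 1.8 | 1.5 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| | Needed | 1.0 | 1.0 | 1.5 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| | Missing | 0.0 | 0.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Analysis support | Avail. | 0.0 | 0.0 | 0.3 | 1.0 | 1.2 | 1.4 | 0.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| | Needed | 0.0 | 0.0 | 0.5 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| | Missing | 0.0 | 0.0 | 0.2 | 0.0 | 0.2 | 0.4 | 0.2 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |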
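Personnel Profile (task oriented) - 4/5
Summary Core Offline team

| | 98 | 99 | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Avail. | 6.8 | 9.8 | 11.8 | 13.7 | 16.1 | 18.4 | 14.9 | 10.0 | 8.0 | 5.6 | 4.3 | 4.3 |
| Needed | 11.5 | 11.5 | 15.7 | 16.5 | 17.5 | 18.0 | 18.5 | 17.5 | 17.0 | 16.5 | 16.0 | 16.0 |
| Missing | 4.8 | 1.7 | 3.9 | 2.8 | 1.4 | 0.4 | 3.7 | 7.5 | 9.0 | 10.9 | 11.7 | 11.7 |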
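The "Missing" row of the summary can be cross-checked against "Needed" minus "Avail.". The sketch below is illustrative and assumes the values are FTEs per year from 1998 to 2009; the variable names are hypothetical. Small differences from the reported row may come from rounding of unrounded inputs, which the slide does not show.

```python
YEARS = list(range(1998, 2010))
AVAIL = [6.8, 9.8, 11.8, 13.7, 16.1, 18.4, 14.9, 10.0, 8.0, 5.6, 4.3, 4.3]
NEEDED = [11.5, 11.5, 15.7, 16.5, 17.5, 18.0, 18.5, 17.5, 17.0, 16.5, 16.0, 16.0]
MISSING = [4.8, 1.7, 3.9, 2.8, 1.4, 0.4, 3.7, 7.5, 9.0, 10.9, 11.7, 11.7]

# Signed gap per year: positive means personnel is missing, negative means a surplus
gap = [round(n - a, 1) for n, a in zip(NEEDED, AVAIL)]

# Years where the recomputed magnitude differs from the reported "Missing" value
mismatches = [y for y, g, m in zip(YEARS, gap, MISSING) if abs(abs(g) - m) > 0.05]

# Years where more personnel was available than needed
surplus_years = [y for y, g in zip(YEARS, gap) if g < 0]

for year, g, m in zip(YEARS, gap, MISSING):
    print(year, f"recomputed {g:+.1f}", f"reported {m:.1f}")
```

The recomputed gap grows steadily after 2004 because the "Avail." row assumes that departing temporary personnel is not replaced.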
Personnel Profile (task oriented) - 5/5
Long build-up time
Must sustain plateau after 2003
Personnel Profile (post oriented)
4 permanent CERN staff
Temporary CERN personnel (no replacement assumed*): Staff LD, Technical and Physics students, CERN Fellows
Temporary CERN Project Associates (direct contribution from collaboration institutes + ALICE CERN exploitation budget; no replacement assumed*)
* Unrealistic scenario to emphasize fragility of the structure
Personnel Profile (post oriented) - 1/5
Mostly temporary personnel
Substantial contribution from collaboration institutes
ROOT effect in 1999, AliEn effect in 2003
Personnel Profile (post oriented) - 2/5
Only 25% permanent personnel
More than 60% are short/medium-term personnel
Out-sourced projects - 1/3
Detector DB by the Physics Department and Computer Science Department @ Warsaw University: a single DB (economy of personnel) common to all detectors in the experiment
Out-sourced projects - 2/3
EDG testbed validation and participation in various GRID projects by ALICE/Italy, ALICE/US, and the EDG/DataTAG project; to be continued with EGEE
Out-sourced projects - 3/3
AliEn: basis of the ALICE distributed computing infrastructure; coordination and main development by the Core Offline group, but several specific sub-tasks delegated to individuals at remote places
Resources summary
Distribution of personnel for common offline activities
About 40% of the work is distributed outside CERN
HLT Software
Only personnel working on algorithms and simulation in collaboration with the Offline project
Part of the missing personnel should come from PhD students
LCG projects in application area
ALICE has already made most of the choices for critical issues (persistency, data DB, tracking, geometry descriptor, distributed computing, etc.)
Does not need to rely on common LCG applications
To come: AliEn coupled with PROOF as generic architecture for LCG interactive analysis
However, ALICE contributes to common developments: GANIS ????
Other resources
EU project: one person to work full time on EDG for ALICE
Industry: Do not remember who????: Code checker
Ericson: AliEn what exactly ????
NASA: one person full time on the Virtual Monte-Carlo ?????
Offline in detector projects - 1/3
AliRoot: an object-oriented framework which directly uses ROOT and provides:
Many event generators
Tracking using Virtual Monte-Carlo
IO infrastructure
Steering functionalities
Global reconstruction
Detector (13) tracking and reconstruction
Analysis
Offline in detector projects - 2/3
No full-time dedicated developers
Schedule defined by global milestones (DC)
Planning is task oriented rather than personnel oriented
Offline in detector projects - 3/3
Summary
Total: 39.7, 37.3, 35.8, 35.8
Needed: 8.6, 13.3, 14.4, 14.4
Personnel resources in Offline project
About 16% of the personnel at CERN, the remainder in collaboration institutes, no experiment-dedicated personnel at regional centers.
Personnel resources in Offline project
Outside institutes (84%)
CERN (16%)
Chart data: Analysis 38; Subdetector projects 51; HLT 4; Core Offline not at CERN 10; Core Offline at CERN 30
How to mitigate the lack of personnel
The ALICE Offline project is committed to providing the collaboration with adequate software to take and analyze data starting in 2007.
The project has already adapted its strategy to the lack of personnel and aims at a bare minimum that still enables it to fulfill its tasks.
The Core team cannot afford to lose more personnel without endangering the success of its goals.
The severe lack of personnel in the detector projects will translate into a lack of readiness, both in the accuracy of the algorithms and in the availability of whole categories of algorithms. Such a deplorable situation will have a negative impact on the quality of physics results.
ALICE priorities - 1/4
Core Offline group at CERN:
Less than 1/4 of personnel in the Core Offline group at CERN are permanent
More than 50% are temporary personnel
Dependence on availability of short-term CERN positions
Uncertainty on renewals
Loss of knowledge, difficulty of knowledge transfer
Difficulty to cover key positions with people with the appropriate profile
Competition within ALICE in a fixed-quota situation
ALICE priorities - 2/4
Core Offline group at CERN:
Have at least 1/3 of long-term personnel, limit use of fellows and students to 1/2, without changing the target number of FTEs
Ensure the coverage of key areas by converting two area coordinators (Production Environment, Framework & Infrastructure), now on temporary positions, into CERN permanent staff
Alleviate the volatility of the Core Offline team with at least two long-term (6 years, LD-like) positions at CERN to replace short-term ones (detaching LCG personnel to ALICE would be a natural solution)
Which profile/task????
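The two composition targets above (at least 1/3 long-term personnel, at most 1/2 fellows and students) can be expressed as a simple check. This is only an illustrative sketch: the function name and the head counts in the example are hypothetical and do not come from the presentation.

```python
from fractions import Fraction

MIN_LONG_TERM = Fraction(1, 3)         # target: at least 1/3 long-term personnel
MAX_FELLOWS_STUDENTS = Fraction(1, 2)  # target: at most 1/2 fellows and students


def meets_targets(long_term: int, fellows_students: int, total: int) -> bool:
    """Return True if a team composition satisfies both targets."""
    if total <= 0:
        raise ValueError("total must be positive")
    long_term_share = Fraction(long_term, total)
    volatile_share = Fraction(fellows_students, total)
    return long_term_share >= MIN_LONG_TERM and volatile_share <= MAX_FELLOWS_STUDENTS


# Hypothetical compositions for a team of 18 (not figures from the slides)
balanced = meets_targets(long_term=6, fellows_students=9, total=18)
volatile = meets_targets(long_term=4, fellows_students=10, total=18)
print(balanced, volatile)
```

Exact fractions are used so that a team sitting exactly on a threshold (6 out of 18) is not rejected because of floating-point rounding.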
ALICE priorities - 3/4
Core Offline group at CERN:
ALICE priorities - 4/4
Detector Offline at collaboration institutes:
About 10 FTEs missing in the subdetector projects for software development
This is a responsibility of the institutes in charge of the subdetector projects
We are working hard to find these people
Additional resources from funding agencies will have to be discussed case by case
Answer to questions - 1/4
Profile of available and required manpower at CERN / Regional Centers / Institutes
Core Offline group: 2 CERN staff + 2 long-term personnel would create satisfactory working conditions
We have reached an equilibrium which enables us to fulfill all the assigned tasks; however, the equilibrium is fragile.
Answer to questions - 2/4
Profile of available and required manpower at CERN / Regional Centers / Institutes
Detector groups: most of the groups are understaffed; personnel (about 10 FTE) dedicated to detector projects is systematically needed in the institutes
Solution to be found within the collaboration, with the help of funding agencies on a case-by-case basis
Answer to questions - 3/4
Other resources existing and potential
A few occasional collaborations with industry
Computing elements which will not be provided in case the required manpower and resources are not available
Lack of readiness of algorithms or of accuracy in algorithms
Serious difficulties in interfacing ALICE software to LCG middleware
Near impossibility of adopting new LCG common software
Answer to questions - 4/4
Measures of progress in producing necessary software:
Because of the scarce personnel available for the Offline project, a lightweight and dynamic organization has been adopted
This organization has so far been successful in producing a framework, detector software and a grid environment routinely used by the collaboration for detector design and physics validation
LCG software will be considered as soon as stable versions outperforming the software presently in use become available
Milestones to test LCG middleware and fabric will be closely watched
Conclusions - 1/2
The core team of the offline project has adapted to the reduced personnel available and established its tasks and objectives accordingly.
The edifice is fragile : any additional cut in (temporary) personnel might hinder the availability in due time of the software needed for data taking and analysis.
Securing two staff positions is instrumental for the project success.
Conclusions - 2/2
Adding 2-3 long-term personnel to the core team would alleviate the unstable situation by making it less dependent on temporary personnel
The lack of personnel fully dedicated to software development in the detector projects is worrisome, as the lack of indispensable algorithms might dramatically delay first physics results from LHC.
The needed personnel (not necessarily computer specialists) must be recruited by the institutes of the collaboration.
DAQ-HLT Data Flow
Diagram labels: Detector RO, DDL SIU, DDL DIU, D-RORC, H-RORC, Detector LDC, HLT LDC, FEP, HLT algorithm, Event Building Network (raw, HLT data, decisions), GDC, Storage Network, Trigger; ~400 DDL; ~300 DDL (TPC, TRD, MUON, ITS); 10 DDL; Pre(co)-processing; Mode A; Mode B&C
Tasks of offline
Framework and infrastructure coordination
Framework development (simulation, reconstruction, analysis)
Persistency technology
Computing/Physics data challenges with DAQ/HLT
Industrial joint projects
Technology tracking
Librarian, CVS maintenance, test procedures
QA tools
Support and documentation
Core
Tasks of offline
Simulation coordination
Detector simulation
Physics simulation
Physics validation
G4 integration
Fluka integration
Radiation studies
Geometrical modeler
Core
Tasks of offline
Reconstruction and physics coordination
Tracking
Detector reconstruction
Global reconstruction
Analysis tools
Analysis algorithms
Physics data challenges
Calibration and alignment algorithms
Core
Tasks of offline
Production environment coordination
Production environment for simulation, reconstruction and analysis
Distributed computing environment
Database organization
Core
Tasks of offline
World computing coordination
Planning and resources coordination for LCG1&2
Relations with national/international Grid projects
Core
Computing needs for PDC III
Flexibility of distributed computing model
Alternative scenarios
LCG resources pledged for ALICE in 2003
USA quota to be confirmed
Relation with HLT
In ALICE, Offline, HLT and DAQ are three distinct projects
Cooperation between HLT and Offline is good
HLT is using the common AliRoot framework to do simulation
Some of the HLT algorithms have been integrated in the Offline framework for testing
Like Offline, HLT coordinates the activities of the different subdetector projects
The main thrust of HLT is the definition of the HLT architecture (HW and SW) and some seminal work on algorithms
More work on algorithms is done in collaboration with the subdetector projects
Integration and testing of the three projects is also performed during DCs
NOT included is manpower for the online infrastructure, i.e. cluster management, process communication infrastructure, monitoring, FPGA coprocessor interface
ALICE Data challenges
Diagram labels: Simulated Data; Raw Data; DAQ; ROOT; ROOT I/O; CASTOR; GRID; CERN Tier 0 / Tier 1; Regional Tier 1 / Tier 2
ADC IV Hardware Setup
Total: 192 CPU servers (96 on GbE, 96 on FE), 36 disk servers, 10 tape servers
Diagram labels: Backbone (4 Gbps); 4 Gigabit switches; 3 Gigabit switches; 4 Gigabit switches; 2 Fast Ethernet switches; Fibers; CPU servers on FE; 10 tape servers (distributed); Total: 32 ports; Total: 18 ports; TBED and LXSHARE node groups
ALICE DC BW
ADC IV performances
Event building with flat data traffic: no recording; 5 days non-stop; 1800 MBytes/s sustained
Event building and data recording: ALICE-like data traffic; recording to CASTOR; 4.5 days non-stop; to disk: ~140 TBytes; 350 MBytes/s sustained
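The quoted recording volume can be checked against the sustained rate and duration. A minimal sketch, assuming decimal units (1 TByte = 10^6 MBytes) and a constant rate over the whole run; the helper name is hypothetical. The 1.2 GBytes/s figure is the Computing DC recording target quoted earlier in the presentation.

```python
SECONDS_PER_DAY = 86_400
TARGET_MB_PER_S = 1200  # Computing DC objective: record HI data at 1.2 GBytes/s


def volume_tb(rate_mb_per_s: float, days: float) -> float:
    """Data volume in TBytes moved at a constant rate over the given number of days."""
    return rate_mb_per_s * days * SECONDS_PER_DAY / 1_000_000


recorded_tb = volume_tb(350, 4.5)  # event building with recording to CASTOR
flat_tb = volume_tb(1800, 5)       # event building only, nothing recorded

recording_share_of_target = 350 / TARGET_MB_PER_S

print(f"recorded: {recorded_tb:.1f} TB")
print(f"flat traffic moved: {flat_tb:.1f} TB")
print(f"recording rate as share of target: {recording_share_of_target:.0%}")
```

Under these assumptions 350 MBytes/s over 4.5 days gives about 136 TBytes, consistent with the ~140 TBytes quoted for the recording test.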