DPS May 11/2002 USCMS
CMS-CCS Status and Plans
May 11, 2002 USCMS meeting
David Stickland
Slide 2
Outline

- Won't say anything about ORCA, OSCAR, DDD, IGUANA:
  - See the talks from Darin, Sarah and Jim
  - See the tutorials last week at UCSD
  - See Paris's talk to the LHCC this Tuesday
- Production
- LCG and its effect on the CMS program
- Draft of the new schedule
Slide 3
CMS - Productions and Computing Data Challenges

Already completed:
- 2000-01: single-site production challenges with up to 300 nodes; ~5 million events, pileup for 10^34
- 2000-01: GRID-enabled prototypes demonstrated
- 2001-02: worldwide production infrastructure; 11 Regional Centers comprising 21 computing installations; shared production database, job tracking, sample validation, etc.; 10M min-bias events simulated, reconstructed and analyzed for calibration studies

Underway now:
- Worldwide production of 10 million events for the DAQ TDR; ~1000 CPUs in use
- Production and analysis planned at CERN and offsite

Being scheduled:
- Single-site production challenges: test code performance and computing performance, identify bottlenecks, etc.
- Multi-site production challenges: test infrastructure, GRID prototypes, networks, replication, ...
- Single- and multi-site analysis challenges: stress local and GRID prototypes under the quite different conditions of analysis
Slide 4
Production Status

RC              Simulation   ooHit             NoPU digi         2x10^33 PU digi   10^34 PU digi
CERN            96%          Started           Started           Started           Started
INFN            100%         Started           Started
Imperial Coll.  89%          Started           Started           Test in progress
UCSD            95%          Started           Test successful   Started
Moscow          100%         Started           Test successful
FNAL            89%          Started           -                 Started
UFL             100%         Started           Test successful
Wisconsin       97%          Test successful   Test in progress
Caltech         100%         Test in progress
IN2P3           100%         Test in progress
Bristol/RAL     28%          Test in progress
USMOP           0%
Done            88%          61%               81%               5%                17%

Estimate as of 11/May/02: complete in 17 days (June 1 deadline!).
[Chart: May 11 production status - requested vs. produced events (0 to 7,000,000) for each cycle: Simulated, Hit, No PU, 2x10^33, 10^34, Filtered.]
Slide 5
Production 2002, Complexity

- Number of Regional Centers: 11
- Number of computing centers: 21
- Number of CPUs: ~1000
- Largest local center: 176 CPUs
- Production passes per dataset (including analysis-group processing done by production): 6-8
- Number of files: ~11,000
- Data size (not including fz files from simulation): 15 TB
- File transfer by GDMP and by perl scripts over scp/bbcp
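The perl-script transfer path mentioned above amounts to shelling out to scp or bbcp per file. A minimal Python sketch of that idea, shown as a dry run; the host, directory and file names are hypothetical, and the bbcp stream count is just a typical tuning choice, not a CMS setting:

```python
import shutil

def transfer_command(src: str, dest_host: str, dest_dir: str) -> list:
    """Build the copy command for one file, preferring bbcp over scp.

    bbcp is used when it is installed (it supports multi-stream
    transfers); otherwise we fall back to plain scp.
    """
    dest = f"{dest_host}:{dest_dir}"
    if shutil.which("bbcp"):
        # -s 8: eight parallel streams (illustrative tuning choice)
        return ["bbcp", "-s", "8", src, dest]
    return ["scp", src, dest]

# Dry run: print the commands instead of executing them.
for f in ["run01.fz", "run02.fz"]:  # hypothetical file names
    print(" ".join(transfer_command(f, "rc.example.org", "/data/incoming")))
```

A real production script would add retries and checksum verification, which is roughly what GDMP layered on top of the raw copies.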
Slide 6
LCG Status

- Applications area:
  - Persistency Framework: roadmap established for new software based on ROOT and an RDBMS layer (hybrid solution); project manager appointed, work starting!
  - LCG SW infrastructure group: parameters defined, but so far no people, and the big decision between SCRAM and CMT is still to be made
  - Mathlib: requirement for skilled mathlib personnel indicated; investigating use of the resources assigned to the LCG by India
  - MSS: premature for all but ALICE
- Detector Description Database: about to start; an excellent opportunity for collaboration exists
- Simulation: waiting for G4-HEPURC (we urgently need this body to start work)
- Next month: start an RTAG on interactive analysis
  - Urgent requirement to clarify the focus of this activity
  - (Interest in using IGUANA has also been expressed by some other experiments)
Slide 7
CMS - Schedule for Challenge Ramp-Up

- All CMS work to date has used Objectivity, now being phased out in favor of the LCG software
- Enforced lull in production challenges: there is no point optimizing a solution that is being replaced (but much was learnt in past challenges that will influence the new design)
- Use challenge time in 2002 to benchmark current performance
- Aim to start testing the new system as it becomes available; target early 2003 for the first realistic tests
- Thereafter return to a roughly exponential complexity ramp-up, reaching the 50%-complexity 20% data challenge in 2005 (50% complexity in 2005 is approximately 20% capacity)
Slide 8
Objectivity Issues

- Bleak: CERN has not renewed the Objectivity maintenance contract
  - Old licenses still apply, but cannot be migrated to new hardware
  - Our understanding is that we can continue to use the product as before, though clearly without support
  - Not clear whether this applies to any Red Hat version, or for that matter to other Linux OSs; recent statements from IT/DB have been contradictory (24/4/02: now clear we cannot use it on other OS versions)
- It will become increasingly difficult during this year to find sufficient resources correctly configured for our Objectivity usage
- We are preparing for the demise of our Objectivity-based code by the end of this year:
  - CMS is already contributing to the new LCG software
  - Aiming to have first prototypes of the catalog layer by July
  - Initial release of the CMS prototype (ROOT+LCG): September 2002
Slide 9
[Diagram: one possible mapping of the persistency framework to a ROOT implementation (under discussion). Components include the PersistencyMgr, StorageMgr, CacheMgr, FileCatalog, MetaDataCatalog, PlacementSvc, DictionarySvc, StreamerSvc, IteratorSvc and SelectorSvc, with interfaces such as IPers, ICache, ICnv, IReadWrite, IReflection, IFCatalog, IMCatalog and IPlacement, mapped onto ROOT classes such as TFile, TDirectory, TTree, TChain, TEventList, TDSet, TBuffer, TMessage, TRef, TKey, TClass, TStreamerInfo, TSocket and TGrid.]
Slide 10
CMS Action to Support LCG

- We expect >50% of our ATF (Architecture, Frameworks, Toolkits) effort to be directed to the LCG in the short/medium term
  - First person assigned full time to the persistency framework on the day the work package started; 3-5 more people ready to join as the task develops
- Initial emphasis:
  - Build the catalog layer that is missing from ROOT
  - Remove Objectivity from COBRA/ORCA (OSCAR)
  - Ensure simple ROOT storage of objects is working
- Aim to have basic catalog services by July, and a basic COBRA/ORCA/OSCAR using the new persistency scheme by September
- Try to get our release tools (SCRAM) adopted by the LCG (the two possibilities are CMT (LHCb, ATLAS) and SCRAM)
  - SCRAM is a better product!
  - If adopted, we would expect to put extra effort into supporting a wider community; aim to get some extra manpower from the LCG
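To make "the catalog layer that is missing from ROOT" concrete: ROOT can open a physical file directly, but jobs want to ask for data by logical name and be routed to a nearby replica. A minimal sketch of such an indirection in Python; the class, method names and replica model here are illustrative, not the actual LCG design:

```python
class FileCatalog:
    """Maps logical file names (LFNs) to one or more physical replicas."""

    def __init__(self):
        self._replicas = {}  # LFN -> list of physical file names (PFNs)

    def register(self, lfn: str, pfn: str) -> None:
        """Record one more physical replica for a logical file."""
        self._replicas.setdefault(lfn, []).append(pfn)

    def lookup(self, lfn: str) -> list:
        """Return all known replicas for an LFN (empty if unknown)."""
        return list(self._replicas.get(lfn, []))

# Usage: register two replicas of the same logical file, then resolve it.
catalog = FileCatalog()
catalog.register("minbias/run42.root", "rfio://cern.ch/data/run42.root")
catalog.register("minbias/run42.root", "file:/fnal/data/run42.root")
print(catalog.lookup("minbias/run42.root"))
```

A production catalog would of course be backed by a database and carry metadata per replica; the point is only the LFN-to-PFN indirection that sits in front of ROOT's file access.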
Slide 11
CMS and the GRID

- CMS Grid implementation plan for 2002 published
- Close collaboration with EDG and with GriPhyN/iVDGL and PPDG
- Upcoming CMS GRID/Production workshop (June CMS week):
  - File transfers and fabrics: production file-transfer software experiences; production file-transfer hardware status and reports; future evolution of the file-transfer tools
  - Production tools: Monte Carlo production system architecture; experiences with the tools
  - Monitoring / deployment planning: experiences with Grid monitoring tools; towards a rational system for tool deployment
Slide 12
The Computing Model

MONARC: CMS analysis process - a hierarchy of processes (experiment, analysis groups, individuals):

- Reconstruction / re-processing (experiment-wide activity, 10^9 events): RAW data reconstructed and re-processed three times per year, driven by new detector calibrations or understanding; 3000 SI95 sec/event, 1 job-year for reconstruction and 3 jobs per year for re-processing
- Selection (~20 groups' activity, 10^9 -> 10^7 events): iterative selection, once per month, with trigger-based and physics-based refinements; 25 SI95 sec/event, ~20 jobs per month
- Analysis (~25 individuals per group, 10^6 - 10^7 events): different physics cuts and MC comparison, ~once per day; algorithms applied to the data to get results; 10 SI95 sec/event, ~500 jobs per day
- Monte Carlo: 5000 SI95 sec/event

The CMS Computing Model needs updating:
1. CMS (and ATLAS) are refining Trigger/DAQ rates
2. The PASTA process is re-costing hardware and re-extrapolating "Moore's law"
3. Realistic cost constraints

With the above in place, optimize the computing model. This needs continued development and refinement of MONARC-like tools to simulate and validate computing models. Realistically this will take most of this year.
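A back-of-envelope check of the activity costs above: multiplying SI95 sec/event by events and passes per year gives the sustained capacity each activity needs. The per-event costs and event counts are from the slide; the assumption that the ~20 selection jobs/month amount to roughly one full pass over 10^9 events per month is mine, as is the ~3.15x10^7 seconds/year figure:

```python
SECONDS_PER_YEAR = 3.15e7  # ~one calendar year

def sustained_si95(cost_per_event, events_per_pass, passes_per_year):
    """Average SI95 capacity needed to keep up with one activity."""
    return cost_per_event * events_per_pass * passes_per_year / SECONDS_PER_YEAR

# Re-processing: 3000 SI95 sec/event over 10^9 events, 3 times per year.
reprocessing = sustained_si95(3000, 1e9, 3)

# Selection: 25 SI95 sec/event; assume ~one full pass over 10^9 events
# per month across the ~20 groups.
selection = sustained_si95(25, 1e9, 12)

print(f"re-processing: {reprocessing:.2e} SI95 sustained")
print(f"selection:     {selection:.2e} SI95 sustained")
```

Even this crude sum shows re-processing dominating selection by more than an order of magnitude, which is why the model schedules it only a few times per year.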
Slide 13
CCS Manpower

- More or less constant over the last year, with a small increase:
  - 52 "names" identified working on CCS tasks
  - 13 ~full-time "engineers" (all CERN or USA), of which 7 are in the ATF group
  - ~20 people in worldwide production operations or support
- The last detailed plan called for ~30 "engineers" this year (next year, with the LHC delay)
  - Delays are running at ~4 months of extra delay per year elapsed; OK as long as the LHC keeps getting delayed... on the old schedule we would be getting into big trouble by now
- Use the LCG to leverage external manpower; but there is no free lunch - CMS probably has the biggest software group and so will need to contribute proportionately more to the work
- Make an extra effort to use worldwide manpower (we do this already)
Slide 14
[Chart: draft CPT schedule change relative to the LHC beam date fix, with milestones slipping by 9 months, 12 months and ~15 months; the LCG TDR is shown on both the old and new timelines, 15 months apart.]
Slide 16
Summary

- It is still very hard to make firm plans
- Experiment and LCG schedules are being aligned; no big problems so far, but we do not yet know how much we will have to contribute and how much we will get back
- Manpower available is increasing slowly, but most of it still comes from CERN and the USA; some major parts of the collaboration are still contributing zero to the CCS effort
- If the LHC had not slipped, we would be in trouble defining our baseline (and then getting the Physics TDR underway):
  - Persistency: a transition of 18 months was foreseen, and will be needed
  - OSCAR validation: the SW product needs restructuring, and PRS is not available for physics/detector validation until after the DAQ TDR
- We are proactively trying to find commonality with other groups to offload work; CMS is a major contributor of LHC software, so there is no free lunch - we still have to do a lot of the work