ATLAS Computing Model – US Research Program Manpower
J. Shank
N.A. ATLAS Physics Workshop, Tucson, AZ, 21 Dec. 2004
Overview
- Updates to the Computing Model
  - The Tier hierarchy
  - The base numbers
  - Size estimates: T0, CAF, T1, T2
- US ATLAS Research Program Manpower
Computing Model
http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/computing-model/Comp-Model-December15.doc
- Computing Model presented at the October Overview Week
  - Revisions since then concerning the Tier-2s
  - Revision concerning the effect of pile-up and the luminosity profile
- There are (and will remain) many unknowns:
  - We are starting to see serious consideration of calibration and alignment needs in the sub-detector communities, but there is a way to go!
  - Physics data access patterns MAY start to be seen from the final stage of DC2
    - Too late for the document
    - Unlikely to know the real patterns until 2007/2008!
  - Still uncertainties on the event sizes:
    - RAW without pile-up is just over the 1.6 MB limit
    - ESD is (with only one tracking package) about 60% larger than nominal, 140% larger with pile-up
    - AOD is smaller than expected, but functionality will grow
  - With the advertised assumptions, we are at the limit of available disk
- The model must maintain as much flexibility as possible
- For review, we must present a single coherent model
- All Computing Model slides are from Roger Jones at the last s/w week: http://agenda.cern.ch/age?a036309
Resource estimates
- These have been revised again:
  - Luminosity profile 2007-2010 assumed
  - More simulation (20% of data rate)
  - Now only ~30 Tier-2s (we can count about 29 candidates)
    - This means the average Tier-2 has grown, both because of simulation and because it represents a larger fraction
  - The calibration needs from October have been used to update the CERN Analysis Facility resources
  - Input buffer added to Tier-0
The System

[Diagram: the ATLAS tier hierarchy and its data flows]
- Event Builder → Event Filter (~7.5 MSI2k) at ~10 GB/s; Event Filter → Tier-0 (~5 MSI2k) at 320 MB/s; raw detector readout into the Event Builder is of order PB/s
- Tier-0 (CERN): ~5 PB/year, no simulation
- Tier-0 → ~10 Tier-1 regional centres (e.g. US Regional Centre at BNL, UK Regional Centre at RAL, French and Dutch Regional Centres) over 100-1000 MB/s links, ~75 MB/s per Tier-1 for ATLAS; each Tier-1 has ~2 MSI2k and stores ~2 PB/year
- Tier-1s: re-reconstruction, storage of simulated data, group analysis
- Tier-1 → Tier-2 centres over 622 Mb/s links; each Tier-2 has ~200 kSI2k, ~200 TB/year, and a physics data cache
- Each Tier-2 has ~20 physicists working on one or more channels; each Tier-2 should have the full AOD, TAG and the relevant physics-group summary data; Tier-2s do the bulk of simulation
- Some data for calibration and monitoring flow to the institutes, and calibrations flow back
- Below the Tier-2s: Tier-3 centres (~0.25 TIPS) and desktops/workstations; PC (2004) = ~1 kSpecInt2k
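As a minimal illustration (not from the talk), the nominal per-site capacities quoted in the diagram can be captured as a small data structure; the Tier class, field names and the summed total are this sketch's own.

    # Illustrative only: nominal per-site capacities from the diagram above.
    from dataclasses import dataclass

    @dataclass
    class Tier:
        name: str
        count: int            # number of sites of this kind
        cpu_ksi2k: float      # per-site CPU
        storage_per_year: str # per-site yearly storage, as quoted

    hierarchy = [
        Tier("Tier-0 (CERN)",                    1, 5000, "~5 PB/year"),
        Tier("Tier-1 (regional, e.g. BNL, RAL)", 10, 2000, "~2 PB/year"),
        Tier("Tier-2",                           30,  200, "~200 TB/year"),
    ]

    # Diagram nominal values; the detailed resource estimates later in the talk differ.
    total_cpu_msi2k = sum(t.count * t.cpu_ksi2k for t in hierarchy) / 1000
    print(f"nominal CPU across the hierarchy: {total_cpu_msi2k:.0f} MSI2k")  # ~31 MSI2k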
Computing Resources
Assumptions:
- 200 days running in 2008 and 2009 at 50% efficiency (10^7 s live); 100 days running in 2007 (5x10^6 s live) (see the sketch below)
- Events recorded are rate-limited in all cases; luminosity only affects data size and data processing time
- Luminosity: 0.5x10^33 cm^-2 s^-1 in 2007, 2x10^33 cm^-2 s^-1 in 2008 and 2009, 10^34 cm^-2 s^-1 (design luminosity) from 2010 onwards
Hierarchy:
- Tier-0 has raw + calibration data + first-pass ESD
- The CERN Analysis Facility has AOD, ESD and RAW samples
- Tier-1s hold RAW data and derived samples and 'shadow' the ESD for another Tier-1; Tier-1s also house simulated data; Tier-1s provide reprocessing for their RAW and scheduled access to full ESD samples
- Tier-2s provide access to AOD and group Derived Physics Datasets and carry the full simulation load
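A small back-of-the-envelope sketch (mine, not the talk's) of how the live time and yearly event counts used throughout the model follow from the assumptions above.

    # Back-of-the-envelope arithmetic for the run-time assumptions (illustrative sketch).
    SECONDS_PER_DAY = 86400

    def live_seconds(days_running: int, efficiency: float) -> float:
        return days_running * SECONDS_PER_DAY * efficiency

    def events_recorded(rate_hz: float, live_s: float) -> float:
        return rate_hz * live_s

    # 2008/2009: 200 days at 50% efficiency -> ~8.6e6 s, rounded in the model to 1e7 s
    print(f"{live_seconds(200, 0.5):.2e} s live")
    # At the rate-limited 200 Hz this gives the ~2e9 events/year used in the tables below
    print(f"{events_recorded(200, 1e7):.1e} events/year")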
Processing
- Tier-0: first-pass processing on express/calibration lines; 24-48 hours later, process the full primary stream with reasonable calibrations
- Tier-1: reprocess 1-2 months after arrival with better calibrations (steady state: and with the same software version, to produce a coherent dataset); reprocess all resident RAW at year end with improved calibration and software
(A sketch of this cadence follows below.)
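A minimal sketch (structure and field names are mine) of the processing cadence described above, expressed as a list of passes with their site, input and typical delay.

    # Sketch of the processing cadence above; illustrative only.
    from dataclasses import dataclass

    @dataclass
    class ProcessingPass:
        site: str
        inputs: str
        delay: str
        calibration: str

    passes = [
        ProcessingPass("Tier-0", "express/calibration streams", "immediately", "first-pass"),
        ProcessingPass("Tier-0", "full primary stream",         "24-48 hours", "reasonable"),
        ProcessingPass("Tier-1", "resident RAW",                "1-2 months",  "better (same sw version)"),
        ProcessingPass("Tier-1", "all resident RAW",            "year end",    "improved calibration + software"),
    ]

    for p in passes:
        print(f"{p.site}: {p.inputs} after {p.delay}, calibration: {p.calibration}")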
The Input Numbers

                              Rate (Hz)   sec/year   Events/y   Size (MB)   Total (TB)
Raw Data (inc. express etc.)     200      1.00E+07   2.00E+09     1.6          3200
ESD (inc. express etc.)          200      1.00E+07   2.00E+09     0.5          1000
General ESD                      180      1.00E+07   1.80E+09     0.5           900
General AOD                      180      1.00E+07   1.80E+09     0.1           180
General TAG                      180      1.00E+07   1.80E+09     0.001           2
Calibration (ID, LAr, MDT)        44 (8 long-term)
MC Raw                                               2.00E+08     2             400
ESD Sim                                              2.00E+08     0.5            50
AOD Sim                                              2.00E+08     0.1            10
TAG Sim                                              2.00E+08     0.001           0
Tuple                                                             0.01

- Nominal year: 10^7 s
- Accelerator efficiency: 50%
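For the real-data rows, the Total (TB) column is just rate x live time x event size; a sketch (mine) of that product follows. The simulated-data rows do not follow this simple product, so they are left out here.

    # Sketch: Total (TB) = rate (Hz) * live seconds * event size (MB) / 1e6,
    # for the real-data rows of the table above. Illustrative only.
    LIVE_SECONDS = 1e7   # nominal year

    rows = {                    # name: (rate_hz, size_mb)
        "Raw Data":    (200, 1.6),
        "ESD":         (200, 0.5),
        "General ESD": (180, 0.5),
        "General AOD": (180, 0.1),
        "General TAG": (180, 0.001),
    }

    for name, (rate, size_mb) in rows.items():
        total_tb = rate * LIVE_SECONDS * size_mb / 1e6
        print(f"{name}: {total_tb:.0f} TB/year")  # 3200, 1000, 900, 180 and 2, matching the table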
Resource Summary (15 Dec. version)

                  CPU (MSI2k)   Tape (PB)   Disk (PB)
CERN Tier-0           4.1          4.2        0.35
CERN AF               2.2          0.4        1.6
Sum of Tier-1s       18.0          6.5       12.3
Sum of Tier-2s       16.2          0.0        6.9
Total                40.5         11.1       21.2

Table 1: The estimated resources required for one full year of data taking in 2008 or 2009.
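The Total row is simply the column sums; a quick illustrative check of Table 1.

    # Quick column-sum check of Table 1 above (illustrative).
    sites = {            # site: (CPU MSI2k, Tape PB, Disk PB)
        "CERN Tier-0":    (4.1, 4.2, 0.35),
        "CERN AF":        (2.2, 0.4, 1.6),
        "Sum of Tier-1s": (18.0, 6.5, 12.3),
        "Sum of Tier-2s": (16.2, 0.0, 6.9),
    }
    cpu, tape, disk = (sum(v[i] for v in sites.values()) for i in range(3))
    print(cpu, tape, disk)   # ~40.5 MSI2k, ~11.1 PB tape, ~21.2 PB disk, matching Table 1 up to rounding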
Amount of simulation is a "free" parameter

Simulation at 20% of the data rate:
                          CPU (MSI2k)   Tape (PB)   Disk (PB)
CERN T0   Simulation          0            0.0         0.0
          Other               4            4.2         0.4
CERN AF   Simulation          0            0.1         0.4
          Other               2            0.4         1.3
Tier 1    Simulation          2.8          1.3         1.7
          Other              15.2          5.2        10.6
Tier 2    Simulation          5.6          0.0         1.0
          Other              10.6          0.0         5.9

Simulation at 100% of the data rate:
                          CPU (MSI2k)   Tape (PB)   Disk (PB)
CERN T0   Simulation          0            0.0         0.0
          Other               4            4.2         0.4
CERN AF   Simulation          0            0.3         1.8
          Other               2            0.4         1.3
Tier 1    Simulation         14.0          6.4         8.6
          Other              15.2          5.2        10.6
Tier 2    Simulation         28.1          0.0         5.2
          Other              10.6          0.0         5.9
2008 T0 Requirements

CERN T0: storage requirement
                        Disk (TB)   Tape (TB)   Integrated Tape (TB)
Raw                         0         3040           4454
General ESD (prev.)         0         1000           1465
Calibration               240          168            280
Buffer                    114            0              0
Total                     354         4208           6165

Table Y1.2  CERN T0: computing requirement
              Reconstr.   Reprocess.   Calibr.   Cent. Analysis   User Analysis   Total
CPU (kSI2k)     3529          0           529           0               0          4058
Understanding of the calibration load is evolving.
T0 Evolution – Total capacity

                       2007        2008        2009        2010        2011        2012
Total Disk (TB)     164.692    354.1764    354.1764    495.847     660.539     850.0234
Total Tape (TB)    1956.684   6164.608   10372.53    16263.62    22154.72    30002.49
Total CPU (kSI2k)      1826        4058        4058        8239       10471       10471
Note detailed evolutions differ from the draft – revised and one bug fixed
T0 Cost/Year Evolution (Estimated Cost, MCHF)

                 2007        2008        2009        2010        2011        2012
CPU Cost     1.004457    0.825887    0           0.75243     0.258782    0
Tape Cost    1.636242    1.646287    0.836843    1.285329    0.635168    0.527049
Disk Cost    0.359029    0.257699    0           0.075085    0.054517    0.039151
CERN Analysis Facility
- Small-sample chaotic reprocessing: 170 kSI2k
- Calibration: 530 kSI2k
- User analysis: ~1470 kSI2k (much increased)
- This site does not share in the global simulation load
- The start-up balance would be very different, but we should try to respect the envelope

Storage requirement (2008 data only)
                        Disk (TB)   Auto. Tape (TB)
Raw 241 0
General ESD (curr.) 229 0
General ESD (prev.) 0 18
AOD (curr.) 257 0
AOD (prev.) 0 4
TAG (curr.) 3 0
TAG (prev.) 0 2
ESD Sim (curr.) 286 0
ESD Sim (prev.) 0 4
AOD Sim (curr.) 57 0
AOD Sim (prev.) 0 40
Tag Sim (curr.) 0.6 0
Tag Sim (prev.) 0 0.4
Calibration 240 168
User Data (100 users) 303 212
Total 1615 448
Analysis Facility Evolution

                       2007        2008        2009        2010        2011        2012
Total Disk (TB)    751.1699   1417.862    1837.704    2761.33     4100.279    5354.749
Total Tape (TB)    208.0896    532.0612    844.2629   1272.754    1670.445    2276.226
Total CPU (kSI2k)       974        2822        4286        8117       12279       16055
Analysis Facility Cost/Year Evolution (Estimated Cost, MCHF)

                 2007        2008        2009        2010        2011        2012
CPU Cost     0.535429    0.683959    0.351305    0.689681    0.482441    0.296137
Tape Cost    0.174011    0.126749    0.062089    0.093489    0.042878    0.040684
Disk Cost    1.63755     0.906701    0.356866    0.489522    0.443227    0.2592
Typical Tier-1, Year-1 Resources
- Estimate about 1800 kSI2k for each of 10 T1s
- Central analysis (by groups, not users): ~1300 kSI2k
- This includes a '1 year, 1 pass' buffer
- ESD is 47% of disk; ESD is 33% of tape
- Current pledges are ~55% of this requirement; making event sizes bigger makes things worse!

2008 Average T1 Requirements
T1: storage requirement
Disk (TB) Auto.Tape (TB)
Raw 43 304
General ESD (curr.) 257 90
General ESD (prev.) 129 90
AOD 283 36
TAG 3 0
Calib 240 0
RAW Sim 0 80
ESD Sim (curr.) 57 20
ESD Sim (prev.) 29 20
AOD Sim 63 8
Tag Sim 1 0
User Data (20 groups) 126 0
Total 1230 648
Single T1 Evolution

                       2007        2008        2009        2010        2011        2012
Total Disk (TB)    554.1362   1546.446    2309.36     4187.246    6253.862    8758.652
Total Tape (TB)    301.5246   1011.447    1853.589    3087.328    4506.174    6411.653
Total CPU (kSI2k)       790        2650        4760        8923       15033       22003
Single T1 Cost/Year Evolution (Estimated Cost, MCHF)

                 2007        2008        2009        2010        2011        2012
CPU Cost     0.434472    0.688281    0.506357    0.749367    0.708333    0.546539
Tape Cost    0.252144    0.277746    0.16748     0.269179    0.152978    0.12797
Disk Cost    1.208017    1.349541    0.648477    0.99528     0.684104    0.517543
20-User Tier-2, 2008 Data Only

Typical storage requirement
Disk (TB)
Raw 1
General ESD (curr.) 13
AOD 86
TAG 3
RAW Sim 0
ESD Sim (curr.) 6
AOD Sim 19
Tag Sim 1
User Group 42
User Data 61
Total 230
User activity includes some reconstruction (algorithm development etc)
Also includes user simulation (increased)
T2s also share the event simulation load (increased), but not the output data storage
Typical computing requirement
               Reconstruction   Reprocessing   Simulation   User Analysis   Total
CPU (kSI2k)         68               0            180            293         541
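The 180 kSI2k simulation entry is consistent with spreading the Tier-2 simulation load from the "free parameter" tables (5.6 MSI2k in the 20%-of-data case) over the ~30 assumed Tier-2s; a rough check (mine, not the talk's).

    # Rough check (illustrative): Tier-2 simulation CPU per site, from the 20%-of-data
    # scenario earlier in the talk, spread over the assumed ~30 Tier-2 centres.
    total_t2_sim_msi2k = 5.6     # "Tier 2 Simulation" CPU, 20%-of-data table
    n_tier2 = 30

    per_site_ksi2k = total_t2_sim_msi2k * 1000 / n_tier2
    print(f"{per_site_ksi2k:.0f} kSI2k per Tier-2")   # ~187 kSI2k, vs the ~180 kSI2k quoted above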
20-user T2 Evolution

                   2007        2008        2009        2010        2011        2012
Disk (TB)      107.1069   336.77117   566.33411   887.35887   1315.4905   1866.375
CPU (kSI2k)         244         704        1064        1983        3013        3944
20-user T2 Cost Evolution (CHF)

                 2007        2008        2009        2010        2011        2012
CPU (CHF)    0.133952    0.170215    0.086597    0.16529     0.119429    0.072998
Disk (CHF)   0.233493    0.312343    0.195128    0.170143    0.141723    0.113824
Overall 2008-Only Resources ('One Full Year' Resources)

                CERN     All T1    All T2    Total
Tape (PB)        4.6       6.5       0.0      11.1
Disk (PB)        2.0      12.3       6.9      21.2
CPU (MSI2k)      6.2      18.0      16.2      40.5

If a T2 supports private analysis, add about 1 TB and 1 kSI2k per user.
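A tiny illustrative sketch of that per-user rule; the function name and defaults are mine, not the talk's.

    # Illustrative: the "add about 1 TB and 1 kSI2k per user" rule for private analysis.
    def private_analysis_addon(users: int, tb_per_user: float = 1.0, ksi2k_per_user: float = 1.0):
        """Extra disk (TB) and CPU (kSI2k) a Tier-2 would need on top of the shared resources."""
        return users * tb_per_user, users * ksi2k_per_user

    print(private_analysis_addon(20))   # a 20-user Tier-2: +20 TB and +20 kSI2k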
Overall 2008 Total Resources

                CERN     All T1    All T2    Total
Tape (PB)        6.9       9.5       0.0      16.4
Disk (PB)        2.9      18.0      10.1      31.0
CPU (MSI2k)      9.0      26.1      23.5      58.7

If a T2 supports private analysis, add about 1.5 TB and 1.5 kSI2k per user.
Important points:
- Discussion on disk vs. tape storage at Tier-1s; 'tape' in this discussion means low-access, slow, secure storage
- Storage of simulation: assumed to be at T1s; need partnerships to plan networking; must have fail-over to other sites
- Commissioning: these numbers are calculated for the steady state, but with the requirement of flexibility in the early stages
- The simulation fraction is an important tunable parameter in the T2 numbers!
Latencies
- On the input side of the T0, assume the following streams:
  - Primary stream: every physics event; publications should be based on this, uniform processing
  - Calibration stream: calibration + copied selected physics triggers; needed to reduce the latency of processing the primary stream
  - Express stream: copied high-pT events for 'excitement' and (with the calibration stream) for detector optimisation; must be a small percentage of the total
- Express and calibration streams get priority in the T0
- New calibrations determine the latency for primary processing; the intention is to have primary processing within 48 hours; significantly more would require a prohibitively large input buffer
- Level of access to RAW? Depends on the functionality of ESD; discussion of a small fraction of DRD (augmented RAW data)
- The software and processing model must support very flexible data formats
Networking
- EF → T0: maximum 320 MB/s (450 MB/s with headroom)
- Off-site networking is now being calculated with David Foster
- Recent exercise with (almost) current numbers:
  - Traffic from T0 to each Tier-1 is ~75 MB/s; will be more with overheads and contention (225 MB/s)
  - Significant traffic of ESD and AOD from reprocessing between T1s: 52 MB/s raw, ~150 MB/s with overheads and contention
- Dedicated networking test beyond DC2; plans in HLT
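As a rough consistency check (mine, not the talk's): if each Tier-1 receives 1/10 of the RAW, its own plus one shadowed share of the ESD, and a full AOD copy, the yearly volumes from the Input Numbers table spread over ~10^7 live seconds land near the quoted per-Tier-1 rate. The sharing fractions are an assumption for illustration.

    # Rough consistency check of the ~75 MB/s T0 -> Tier-1 figure, assuming each of
    # 10 Tier-1s takes 1/10 of RAW, 2/10 of ESD (own share + shadow), and full AOD.
    raw_tb, esd_tb, aod_tb = 3200.0, 900.0, 180.0   # per-year volumes (TB), from the Input Numbers
    live_seconds = 1e7

    per_t1_tb = raw_tb / 10 + esd_tb * 2 / 10 + aod_tb
    rate_mb_s = per_t1_tb * 1e6 / live_seconds       # TB -> MB, spread over the live time
    print(round(rate_mb_s))                           # ~68 MB/s, the order of the quoted 75 MB/s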
Conclusions and Timetable
- Computing Model documents required by 15th December
  - This is the last chance to alter things
  - We need to present a single coherent model
  - We need to maintain flexibility
- Intend to produce a requirements and recommendations document
- Computing Model review in January 2005 (P. McBride); we need to have serious inputs at this point
- Documents to April RRBs
- MoU signatures in Summer 2005
- Computing & LCG TDR June 2005
Calibration and Start-up Discussions
- Richard will present some comments from others on what they would like at start-up
- Some would like a large e/mu second copy on disk for repeated reprocessing
  - Be aware of the disk and CPU requirements: 10 Hz + 2 ESD versions retained = >0.75 PB on disk; the full sample would take 5 MSI2k to reprocess in a week
  - Requires scheduled activity or huge resources; if there are many reprocessings you must either distribute them or work with smaller samples
- What were (are) we planning to provide?
  - At CERN: 1.1 MSI2k in the T0 and Super T2 for calibration etc.; the T2 also has 0.8 MSI2k for user analysis; Super T2 with 0.75 TB disk, mainly AOD but could be more; RAW+ESD to start
  - In the T1 cloud: 10% of RAW on disk and 0.5 MSI2k for calibration
  - In T2s: 0.5 PB for RAW+ESD, should allow small unscheduled activities
Reality check
- Putting 10 Hz of e/mu on disk would require more than double the CERN disk
- We are already short of disk in the T1s (the funding source is the same!)
- There is capacity in the T1s so long as the sets are replaced with the steady-state sets as the year progresses

Snapshot of Tier-1 status (split, 2008):
             ALICE    ATLAS      CMS     LHCb
Offered       6690    16240    10325     7450
Required      9100    16600    12600     9500
Balance       -26%      -2%     -18%     -22%

Offered        769     5171     4406     1154
Required      3000     9200     8700     1300
Balance       -74%     -44%     -49%     -11%

Offered        2.1      8.9      5.1      1.7
Required       3.6      6.0      6.6      0.4
Balance       -42%      48%     -23%     325%
End of Computing Model talk.
U.S. Research Program Manpower

[Chart: computing and physics funding profile, FY03-FY08, in AY k$ (0-20,000), broken down into Physics, LCG Common Project, T2, DC/prod., T1 and sw]
FY05 Software Manpower
- 5.75 FTE @ LBNL: Core/Framework
- 5 FTE @ ANL: Data Management, Event Store
- 5 FTE @ BNL: DB / Distributed Analysis / sw Infrastructure
- 1 FTE @ U Pittsburgh: Detector Description
- 1 FTE @ Indiana U.: Improving end-user usability of Athena
Est. FY06 Software Manpower
- Maintain FY05
- Move from PPDG to Program funds: puts about a 1 FTE burden on the program, plus approx. 2 FTE at universities
- In the long term, the total program-funded effort expected at universities is about 7 FTE
FY07 and Beyond: SW Manpower
- Reaching a plateau; maybe 1-2 more FTE at universities
- Obviously, manpower for physics analysis (students, post-docs) is going to have to come from the base program
- We (project management) try to help get DOE/NSF base funding for all, but... prospects have not been good
- "Redirection" from the Tevatron is starting to happen, but it might not be enough for our needs in 2007