
ATLAS Computing Model – US Research Program Manpower


Page 1: ATLAS Computing Model – US Research Program Manpower

ATLAS Computing Model –US Research Program Manpower

J. Shank

N.A. ATLAS Physics WorkshopTucson, AZ21 Dec., 2004

Page 2: ATLAS Computing Model – US Research Program Manpower


Overview
• Updates to the Computing Model
• The Tier hierarchy
• The base numbers
• Size estimates: T0, CAF, T1, T2
• US ATLAS Research Program Manpower

Page 3: ATLAS Computing Model – US Research Program Manpower


Computing Model
http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/computing-model/Comp-Model-December15.doc

• Computing Model presented at the October Overview Week
• Revisions since then concerning the Tier-2s and concerning the effect of pile-up and the luminosity profile
• There are (and will remain) many unknowns
  • We are starting to see serious consideration of calibration and alignment needs in the sub-detector communities, but there is a way to go!
  • Physics data access patterns MAY start to be seen from the final stage of DC2
    • Too late for the document; unlikely to know the real patterns until 2007/2008!
  • Still uncertainties in the event sizes
    • RAW without pile-up is just over the 1.6 MB limit
    • ESD (with only one tracking package) is about 60% larger than nominal, 140% larger with pile-up
    • AOD is smaller than expected, but functionality will grow
  • With the advertised assumptions, we are at the limit of available disk
• The model must maintain as much flexibility as possible
  • For review, we must present a single coherent model

All Computing Model slides are from Roger Jones at the last software week: http://agenda.cern.ch/age?a036309

Page 4: ATLAS Computing Model – US Research Program Manpower


Resource estimates
• These have been revised again
  • Luminosity profile for 2007-2010 assumed
  • More simulation (20% of the data rate)
  • Now only ~30 Tier-2s
    • We can count about 29 candidates
    • This means the average Tier-2 has grown, because of simulation and because it represents a larger fraction of the total
• The calibration needs identified in October have been used to update the CERN Analysis Facility resources
• Input buffer added to Tier-0

Page 5: ATLAS Computing Model – US Research Program Manpower


The System

[Diagram: Event Builder → Event Filter (~7.5 MSI2k) → Tier 0 (~5 MSI2k) → regional Tier-1 centres (US/BNL, UK/RAL, French, Dutch, ...) → Tier-2 centres (~200 kSI2k each) → Tier-3s (~0.25 TIPS) and desktop workstations, with a physics data cache at each level]

• Indicated data rates: ~PB/s (detector readout), ~10 GB/s (event building), 320 MB/s (Event Filter → Tier 0), ~75 MB/s per Tier-1 for ATLAS over 622 Mb/s links, 100-1000 MB/s links to Tier-2s
• Tier 0: ~5 PB/year, no simulation
• 10 Tier-1s: ~2 MSI2k and ~2 PB/year each; rereconstruction, storage of simulated data, group analysis
• Tier-2s: ~200 kSI2k and ~200 TB/year each; each Tier 2 has ~20 physicists working on one or more channels, holds the full AOD, TAG and relevant physics-group summary data, and the Tier 2s do the bulk of the simulation
• Some data for calibration and monitoring go to institutes; calibrations flow back
• PC (2004) = ~1 kSpecInt2k
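A minimal sketch, using only the nominal per-site figures from the diagram above (the revised estimates on the later resource slides supersede these, e.g. the average Tier-2 has grown because of the larger simulation share):

```python
# Nominal per-site capacities from the tier diagram (sketch only).
tiers = {
    #  name      (sites, kSI2k per site, role)
    "Tier-0": (1,  5000, "first-pass processing, no simulation, ~5 PB/year"),
    "Tier-1": (10, 2000, "rereconstruction, simulated data, group analysis, ~2 PB/year each"),
    "Tier-2": (30, 200,  "full AOD+TAG, bulk of simulation, ~200 TB/year each"),
}

for name, (sites, ksi2k, role) in tiers.items():
    total_msi2k = sites * ksi2k / 1000.0   # aggregate nominal CPU
    print(f"{name}: {sites:2d} x {ksi2k} kSI2k = {total_msi2k:.1f} MSI2k  ({role})")
```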

Page 6: ATLAS Computing Model – US Research Program Manpower


Computing Resources
• Assumptions (see the sketch below):
  • 200 days running in 2008 and 2009 at 50% efficiency (10^7 s live)
  • 100 days running in 2007 (5x10^6 s live)
  • Events recorded are rate-limited in all cases – luminosity only affects data size and data processing time
• Luminosity:
  • 0.5x10^33 cm^-2 s^-1 in 2007
  • 2x10^33 cm^-2 s^-1 in 2008 and 2009
  • 10^34 cm^-2 s^-1 (design luminosity) from 2010 onwards
• Hierarchy:
  • Tier-0 has RAW + calibration data + first-pass ESD
  • The CERN Analysis Facility has AOD, ESD and RAW samples
  • Tier-1s hold RAW data and derived samples and 'shadow' the ESD for another Tier-1
  • Tier-1s also house simulated data
  • Tier-1s provide reprocessing of their RAW and scheduled access to full ESD samples
  • Tier-2s provide access to AOD and group Derived Physics Datasets and carry the full simulation load
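A quick arithmetic check of the live-time assumptions above (a sketch; the model simply quotes the rounded nominal figures of 10^7 s and 5x10^6 s):

```python
# Live seconds per year = running days x 86400 s/day x accelerator efficiency.
def live_seconds(days, efficiency=0.5):
    return days * 86400 * efficiency

print(f"{live_seconds(200):.2e}")  # 8.64e+06 s for 2008/2009, quoted as ~1e7 s
print(f"{live_seconds(100):.2e}")  # 4.32e+06 s for 2007,      quoted as ~5e6 s
```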

Page 7: ATLAS Computing Model – US Research Program Manpower


Processing
• Tier-0:
  • First-pass processing on the express/calibration lines
  • 24-48 hours later, process the full primary stream with reasonable calibrations
• Tier-1:
  • Reprocess 1-2 months after arrival with better calibrations (in the steady state: and with the same software version, to produce a coherent dataset)
  • Reprocess all resident RAW at year end with improved calibration and software

Page 8: ATLAS Computing Model – US Research Program Manpower


The Input Numbers

                               Rate (Hz)   sec/year    Events/year   Size (MB)   Total (TB)
Raw Data (incl. express etc.)     200      1.00E+07     2.00E+09       1.6          3200
ESD (incl. express etc.)          200      1.00E+07     2.00E+09       0.5          1000
General ESD                       180      1.00E+07     1.80E+09       0.5           900
General AOD                       180      1.00E+07     1.80E+09       0.1           180
General TAG                       180      1.00E+07     1.80E+09       0.001           2
Calibration (ID, LAr, MDT)                                                            44 (8 long-term)
MC Raw                                                  2.00E+08       2             400
ESD Sim                                                 2.00E+08       0.5            50
AOD Sim                                                 2.00E+08       0.1            10
TAG Sim                                                 2.00E+08       0.001           0
Tuple                                                                  0.01

• Nominal year: 10^7 s
• Accelerator efficiency: 50%
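The real-data rows of the table follow directly from rate x live time x event size; a minimal sketch (the simulated-data rows use additional assumptions not spelled out on the slide):

```python
LIVE_SECONDS = 1.0e7   # nominal year

streams = {
    # name: (rate in Hz, event size in MB)
    "Raw Data":    (200, 1.6),
    "ESD":         (200, 0.5),
    "General ESD": (180, 0.5),
    "General AOD": (180, 0.1),
    "General TAG": (180, 0.001),
}

for name, (rate_hz, size_mb) in streams.items():
    events = rate_hz * LIVE_SECONDS           # events per year
    total_tb = events * size_mb / 1.0e6       # MB -> TB
    print(f"{name:12s} {events:.2e} events/yr  {total_tb:8.1f} TB")
# e.g. Raw Data: 2.00e+09 events/yr and 3200.0 TB, matching the table above.
```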

Page 9: ATLAS Computing Model – US Research Program Manpower


Resource Summary (15 Dec. version)

                  CPU (MSI2k)   Tape (PB)   Disk (PB)
CERN Tier-0           4.1          4.2         0.35
CERN AF               2.2          0.4         1.6
Sum of Tier-1s       18.0          6.5        12.3
Sum of Tier-2s       16.2          0.0         6.9
Total                40.5         11.1        21.2

Table 1: The estimated resources required for one full year of data taking in 2008 or 2009.
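A small consistency check on the table (the Total row is just the column sums, to within rounding):

```python
rows = {
    #                 CPU (MSI2k)  Tape (PB)  Disk (PB)
    "CERN Tier-0":    (4.1,        4.2,       0.35),
    "CERN AF":        (2.2,        0.4,       1.6),
    "Sum of Tier-1s": (18.0,       6.5,       12.3),
    "Sum of Tier-2s": (16.2,       0.0,       6.9),
}
cpu, tape, disk = (round(sum(col), 2) for col in zip(*rows.values()))
print(cpu, tape, disk)   # 40.5 11.1 21.15 -> quoted as 40.5 / 11.1 / 21.2
```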

Page 10: ATLAS Computing Model – US Research Program Manpower


Amount of Simulation is a “free” parameter

20% of data (the baseline simulation rate):

                          CPU (MSI2k)   Tape (PB)   Disk (PB)
CERN T0   Simulation          0            0.0         0.0
          Other               4            4.2         0.4
CERN AF   Simulation          0            0.1         0.4
          Other               2            0.4         1.3
Tier 1    Simulation          2.8          1.3         1.7
          Other              15.2          5.2        10.6
Tier 2    Simulation          5.6          0.0         1.0
          Other              10.6          0.0         5.9

100% of data:

                          CPU (MSI2k)   Tape (PB)   Disk (PB)
CERN T0   Simulation          0            0.0         0.0
          Other               4            4.2         0.4
CERN AF   Simulation          0            0.3         1.8
          Other               2            0.4         1.3
Tier 1    Simulation         14.0          6.4         8.6
          Other              15.2          5.2        10.6
Tier 2    Simulation         28.1          0.0         5.2
          Other              10.6          0.0         5.9

Page 11: ATLAS Computing Model – US Research Program Manpower


2008 T0 requirements

CERN T0: Storage requirement
                        Disk (TB)   Tape (TB)   Integrated Tape (TB)
Raw                         0          3040           4454
General ESD (prev.)         0          1000           1465
Calibration               240           168            280
Buffer                    114             0              0
Total                     354          4208           6165

Table Y1.2 – CERN T0: Computing requirement
              Reconstr.   Reprocess.   Calibr.   Cent. Analysis   User Analysis   Total
CPU (kSI2k)     3529           0          529            0               0         4058

Understanding of the calibration load is evolving.

Page 12: ATLAS Computing Model – US Research Program Manpower


T0 Evolution – Total capacity

                       2007        2008        2009        2010        2011        2012
Total Disk (TB)      164.692     354.1764    354.1764    495.847     660.539     850.0234
Total Tape (TB)     1956.684    6164.608   10372.53    16263.62    22154.72    30002.49
Total CPU (kSI2k)       1826        4058        4058        8239       10471       10471

Note: the detailed evolutions differ from the draft – revised and one bug fixed.

Page 13: ATLAS Computing Model – US Research Program Manpower


T0 Cost/Year Evolution

Estimated cost (MCHF) per year:
               2007        2008        2009        2010        2011        2012
CPU Cost     1.004457    0.825887    0           0.75243     0.258782    0
Tape Cost    1.636242    1.646287    0.836843    1.285329    0.635168    0.527049
Disk Cost    0.359029    0.257699    0           0.075085    0.054517    0.039151
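The shape of the cost curves is consistent with an incremental-purchase picture: each year only the newly added capacity is bought, at a unit price that falls over time. A minimal sketch of that idea only – the unit price and its decline rate below are placeholders, not the values used in the ATLAS estimate:

```python
def yearly_cost(capacity_by_year, unit_price, price_drop_per_year=0.3):
    """capacity_by_year: {year: installed capacity}.  Returns {year: cost of the
    capacity added that year}, with the unit price falling each year."""
    cost, prev = {}, 0.0
    for i, year in enumerate(sorted(capacity_by_year)):
        added = max(capacity_by_year[year] - prev, 0.0)   # new capacity this year
        cost[year] = added * unit_price * (1.0 - price_drop_per_year) ** i
        prev = capacity_by_year[year]
    return cost

# T0 CPU profile (kSI2k) from the previous slide, with an arbitrary unit price:
t0_cpu = {2007: 1826, 2008: 4058, 2009: 4058, 2010: 8239, 2011: 10471, 2012: 10471}
print(yearly_cost(t0_cpu, unit_price=1.0))
# Flat years (2009, 2012) cost nothing, matching the zero CPU-cost entries above.
```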

Page 14: ATLAS Computing Model – US Research Program Manpower


CERN Analysis Facility
• Small-sample chaotic reprocessing: 170 kSI2k
• Calibration: 530 kSI2k
• User analysis: ~1470 kSI2k – much increased
• This site does not share in the global simulation load
• The start-up balance would be very different, but we should try to respect the envelope

Storage requirement – 2008 data only
                          Disk (TB)   Auto. Tape (TB)
Raw                          241            0
General ESD (curr.)          229            0
General ESD (prev.)            0           18
AOD (curr.)                  257            0
AOD (prev.)                    0            4
TAG (curr.)                    3            0
TAG (prev.)                    0            2
ESD Sim (curr.)              286            0
ESD Sim (prev.)                0            4
AOD Sim (curr.)               57            0
AOD Sim (prev.)                0           40
Tag Sim (curr.)                0.6          0
Tag Sim (prev.)                0            0.4
Calibration                  240          168
User Data (100 users)        303          212
Total                       1615          448

Page 15: ATLAS Computing Model – US Research Program Manpower


Analysis Facility Evolution

                       2007        2008        2009        2010        2011        2012
Total Disk (TB)      751.1699   1417.862    1837.704    2761.33     4100.279    5354.749
Total Tape (TB)      208.0896    532.0612    844.2629   1272.754    1670.445    2276.226
Total CPU (kSI2k)        974        2822        4286        8117       12279       16055

Page 16: ATLAS Computing Model – US Research Program Manpower


Analysis Facility Cost/Year Evolution

Estimated cost (MCHF) per year:
               2007        2008        2009        2010        2011        2012
CPU Cost     0.535429    0.683959    0.351305    0.689681    0.482441    0.296137
Tape Cost    0.174011    0.126749    0.062089    0.093489    0.042878    0.040684
Disk Cost    1.63755     0.906701    0.356866    0.489522    0.443227    0.2592

Page 17: ATLAS Computing Model – US Research Program Manpower


Typical Tier-1, Year-1 resources
• Estimate about 1800 kSI2k for each of 10 T1s
• Central analysis (by groups, not users): ~1300 kSI2k
• This includes a '1 year, 1 pass' buffer
• ESD is 47% of disk; ESD is 33% of tape
• Current pledges are ~55% of this requirement – making event sizes bigger makes things worse!

2008 Average T1 Requirements – T1 storage requirement
                          Disk (TB)   Auto. Tape (TB)
Raw                           43           304
General ESD (curr.)          257            90
General ESD (prev.)          129            90
AOD                          283            36
TAG                            3             0
Calib                        240             0
RAW Sim                        0            80
ESD Sim (curr.)               57            20
ESD Sim (prev.)               29            20
AOD Sim                       63             8
Tag Sim                        1             0
User Data (20 groups)        126             0
Total                       1230           648

Page 18: ATLAS Computing Model – US Research Program Manpower


Single T1 Evolution

                       2007        2008        2009        2010        2011        2012
Total Disk (TB)      554.1362   1546.446    2309.36     4187.246    6253.862    8758.652
Total Tape (TB)      301.5246   1011.447    1853.589    3087.328    4506.174    6411.653
Total CPU (kSI2k)        790        2650        4760        8923       15033       22003

Page 19: ATLAS Computing Model – US Research Program Manpower


Single T1 Cost/Year Evolution

Estimated cost (MCHF) per year:
               2007        2008        2009        2010        2011        2012
CPU Cost     0.434472    0.688281    0.506357    0.749367    0.708333    0.546539
Tape Cost    0.252144    0.277746    0.16748     0.269179    0.152978    0.12797
Disk Cost    1.208017    1.349541    0.648477    0.99528     0.684104    0.517543

Page 20: ATLAS Computing Model – US Research Program Manpower


20-User Tier-2, 2008 Data Only

Typical storage requirement
                        Disk (TB)
Raw                          1
General ESD (curr.)         13
AOD                         86
TAG                          3
RAW Sim                      0
ESD Sim (curr.)              6
AOD Sim                     19
Tag Sim                      1
User Group                  42
User Data                   61
Total                      230

• User activity includes some reconstruction (algorithm development etc.)
• Also includes user simulation (increased)
• T2s also share the event simulation load (increased), but not the output data storage

Typical computing requirement
              Reconstruction   Reprocessing   Simulation   User Analysis   Total
CPU (kSI2k)         68               0            180           293         541

Page 21: ATLAS Computing Model – US Research Program Manpower


20-user T2 Evolution

                 2007        2008        2009        2010        2011        2012
Disk (TB)      107.1069    336.77117   566.33411   887.35887  1315.4905   1866.375
CPU (kSI2k)        244         704        1064        1983        3013        3944

Page 22: ATLAS Computing Model – US Research Program Manpower


20-user T2 Cost Evolution

Tier-2 cost (CHF) per year:
              2007        2008        2009        2010        2011        2012
CPU (CHF)   0.133952    0.170215    0.086597    0.16529     0.119429    0.072998
Disk (CHF)  0.233493    0.312343    0.195128    0.170143    0.141723    0.113824

Page 23: ATLAS Computing Model – US Research Program Manpower


Overall 2008-only Resources ('One Full Year' Resources)

               CERN      All T1     All T2     Total
Tape (PB)      4.6 PB    6.5 PB     0.0 PB    11.1 PB
Disk (PB)      2.0 PB   12.3 PB     6.9 PB    21.2 PB
CPU (MSI2k)    6.2       18.0       16.2      40.5

If a T2 supports private analysis, add about 1 TB and 1 kSI2k per user (see the sketch below).
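A minimal sketch of the per-user rule quoted just above, applied on top of the 20-user 'data only' Tier-2 figures from the earlier T2 slide (230 TB of disk, 541 kSI2k):

```python
T2_BASE_DISK_TB = 230      # 20-user T2, 2008 data only
T2_BASE_CPU_KSI2K = 541

def t2_with_private_analysis(n_users, disk_per_user_tb=1.0, cpu_per_user_ksi2k=1.0):
    """Add roughly 1 TB and 1 kSI2k per user supporting private analysis."""
    return (T2_BASE_DISK_TB + n_users * disk_per_user_tb,
            T2_BASE_CPU_KSI2K + n_users * cpu_per_user_ksi2k)

print(t2_with_private_analysis(20))   # (250.0, 561.0)
```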

Page 24: ATLAS Computing Model – US Research Program Manpower


Overall 2008 Total Resources

               CERN      All T1     All T2     Total
Tape (PB)      6.9 PB    9.5 PB     0.0 PB    16.4 PB
Disk (PB)      2.9 PB   18.0 PB    10.1 PB    31.0 PB
CPU (MSI2k)    9.0       26.1       23.5      58.7

If a T2 supports private analysis, add about 1.5 TB and 1.5 kSI2k per user.

Page 25: ATLAS Computing Model – US Research Program Manpower


Important points
• Discussion on disk vs. tape storage at Tier-1s
  • "Tape" in this discussion means low-access, slow, secure storage
• Storage of simulation
  • Assumed to be at T1s
  • Need partnerships to plan networking
  • Must have fail-over to other sites
• Commissioning
  • These numbers are calculated for the steady state, but with the requirement of flexibility in the early stages
• The simulation fraction is an important tunable parameter in the T2 numbers!

Page 26: ATLAS Computing Model – US Research Program Manpower


Latencies
• On the input side of the T0, assume the following:
  • Primary stream – every physics event
    • Publications should be based on this; uniform processing
  • Calibration stream – calibration + copied selected physics triggers
    • Need to reduce the latency of processing the primary stream
  • Express stream – copied high-pT events for 'excitement' and (with the calibration stream) for detector optimisation
    • Must be a small percentage of the total
• Express and calibration streams get priority in the T0
• New calibrations determine the latency for primary processing
  • The intention is to have primary processing within 48 hours
  • Significantly more would require a prohibitively large input buffer (see the sketch below)
• Level of access to RAW?
  • Depends on the functionality of the ESD
  • Discussion of a small fraction of DRD – augmented RAW data
• The software and processing model must support very flexible data formats
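Rough arithmetic behind the "prohibitively large input buffer" remark, as a sketch: the RAW buffer at the T0 grows linearly with the primary-processing latency (the 114 TB buffer in the T0 storage table evidently covers more than bare RAW at 48 hours):

```python
RAW_RATE_MB_S = 200 * 1.6            # 200 Hz x 1.6 MB/event = 320 MB/s

def input_buffer_tb(latency_hours):
    """RAW data accumulated while waiting for calibrations, in TB."""
    return RAW_RATE_MB_S * latency_hours * 3600 / 1.0e6   # MB -> TB

print(f"{input_buffer_tb(48):.0f} TB")      # ~55 TB for the intended 48-hour latency
print(f"{input_buffer_tb(7 * 24):.0f} TB")  # ~194 TB if processing slipped to a week
```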

Page 27: ATLAS Computing Model – US Research Program Manpower


Networking
• EF→T0: maximum 320 MB/s (450 MB/s with headroom)
• Off-site networking is now being calculated with David Foster
  • Recent exercise with (almost) current numbers
  • Traffic from the T0 to each Tier-1 is 75 MB/s – will be more with overheads and contention (225 MB/s)
  • Significant traffic of ESD and AOD from reprocessing between T1s: 52 MB/s raw, ~150 MB/s with overheads and contention (see the sketch below)
• Dedicated networking test beyond DC2; plans in HLT
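The "with overheads and contention" figures on this slide are roughly the raw averages scaled by a factor of about three; a one-line sketch of that rule of thumb:

```python
OVERHEAD_FACTOR = 3.0     # rough allowance for protocol overheads and contention

t0_to_t1_avg_mb_s = 75    # average T0 -> Tier-1 traffic per site
t1_to_t1_avg_mb_s = 52    # ESD/AOD exchange between Tier-1s from reprocessing

print(t0_to_t1_avg_mb_s * OVERHEAD_FACTOR)   # 225.0 MB/s, as quoted
print(t1_to_t1_avg_mb_s * OVERHEAD_FACTOR)   # 156.0 MB/s, quoted as ~150 MB/s
```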

Page 28: ATLAS Computing Model – US Research Program Manpower


Conclusions and Timetable
• Computing Model documents required by 15th December
  • This is the last chance to alter things
  • We need to present a single coherent model
  • We need to maintain flexibility
  • We intend to produce a requirements and recommendations document
• Computing Model review in January 2005 (P. McBride)
  • We need to have serious inputs at this point
• Documents to the April RRBs
• MoU signatures in Summer 2005
• Computing & LCG TDR in June 2005

Page 29: ATLAS Computing Model – US Research Program Manpower


Calibration and Start-up Discussions
• Richard will present some comments from others on what they would like at start-up
• Some would like a large e/mu second copy on disk for repeated reprocessing
  • Be aware of the disk and CPU requirements
    • 10 Hz + 2 ESD versions retained = >0.75 PB on disk
    • The full sample would take 5 MSI2k to reprocess in a week
  • This requires scheduled activity or huge resources
  • If there are many reprocessings, you must either distribute them or work with smaller samples
• What were (are) we planning to provide?
  • At CERN:
    • 1.1 MSI2k in the T0 and Super T2 for calibration etc.
    • The T2 also has 0.8 MSI2k for user analysis
    • Super T2 with 0.75 TB disk, mainly AOD but could be more – RAW+ESD to start
  • In the T1 cloud:
    • The T1 cloud has 10% of RAW on disk and 0.5 MSI2k for calibration
  • In the T2s:
    • 0.5 PB for RAW+ESD, which should allow small unscheduled activities

Page 30: ATLAS Computing Model – US Research Program Manpower


Reality check
• Putting 10 Hz of e/mu on disk would require more than double the CERN disk
• We are already short of disk in the T1s (the funding source is the same!)
• There is capacity in the T1s so long as the sets are replaced with the steady-state sets as the year progresses

Snapshot of Tier-1 status – split for 2008:

              ALICE     ATLAS     CMS      LHCb
Offered        6690     16240    10325     7450
Required       9100     16600    12600     9500
Balance        -26%       -2%     -18%     -22%

Offered         769      5171     4406     1154
Required       3000      9200     8700     1300
Balance        -74%      -44%     -49%     -11%

Offered         2.1       8.9      5.1      1.7
Required        3.6       6.0      6.6      0.4
Balance        -42%      +48%     -23%    +325%
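The Balance rows above are simply (offered − required) / required; a quick check against the ATLAS column:

```python
def balance(offered, required):
    return (offered - required) / required

print(f"{balance(16240, 16600):+.0%}")   # -2%, first block
print(f"{balance(5171, 9200):+.0%}")     # -44%, second block
print(f"{balance(8.9, 6.0):+.0%}")       # +48%, third block
```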

Page 31: ATLAS Computing Model – US Research Program Manpower


End of Computing Model talk.

Page 32: ATLAS Computing Model – US Research Program Manpower


U.S. Research Program Manpower – Computing and Physics Profile

[Chart: U.S. Research Program funding profile in AY k$ (scale 0-20,000) for FY03 through FY08, broken down into Physics, LCG Common project, T2, DC/prod., T1 and sw categories]

Page 33: ATLAS Computing Model – US Research Program Manpower


FY05 Software Manpower
• 5.75 FTE @ LBNL: Core/Framework
• 5 FTE @ ANL: Data Management, Event Store
• 5 FTE @ BNL: DB / Distributed Analysis / sw Infrastructure
• 1 FTE @ U. Pittsburgh: Detector Description
• 1 FTE @ Indiana U.: Improving end-user usability of Athena

Page 34: ATLAS Computing Model – US Research Program Manpower


Est. FY06 Software Manpower
• Maintain FY05
• Move from PPDG to Program funds
  • Puts about a 1 FTE burden on the program + approx. 2 FTE at universities
• In the long term, the total expected program-funded effort at universities is about 7 FTE

Page 35: ATLAS Computing Model – US Research Program Manpower


FY07 and beyond: sw manpower
• Reaching a plateau
  • Maybe 1-2 more FTE at universities
• Obviously, manpower for physics analysis (students, post-docs) is going to have to come from the base program
• We (project management) try to help get DOE/NSF base funding for all, but… prospects have not been good
• "Redirection" from the Tevatron is starting to happen, but it might not be enough for our needs in 2007