
U.S. ATLAS Computing Facilities

U.S. ATLAS Physics & Computing Review

Bruce G. Gibbard, BNL, 10-11 January 2000


US ATLAS Computing Facilities

• Facilities procured, installed, and operated ...
  – ... to meet US "MOU" obligations
    • Direct IT responsibility (Monte Carlo, for example)
    • Support for detector construction, testing, & calibration
    • Support for software development and testing
  – ... to enable effective participation by US physicists in the ATLAS physics program
    • Direct access to and analysis of physics data sets
    • Support for simulation, re-reconstruction, and reorganization of data associated with that analysis


Setting the Scale

• Uncertainties in defining requirements
  – Five years of detector, algorithm & software development
  – Five years of computer technology evolution
• Start from the ATLAS estimate & rules of thumb
• Adjust for the US ATLAS perspective (experience and priorities)
• Adjust for details of the architectural model of US ATLAS facilities


ATLAS Estimate & Rules of Thumb

• A Tier 1 Center in '05 should include ...
  – 30,000 SPECint95 for Analysis
  – 10-20,000 SPECint95 for Simulation
  – 50-100 TBytes/year of On-line (Disk) Storage
  – 200 TBytes/year of Near-line (Robotic Tape) Storage
  – 100 Mbit/sec connectivity to CERN
• Assume no major raw data processing or handling outside of CERN


US ATLAS Perspective

• US ATLAS facilities must be adequate to meet any reasonable U.S. ATLAS computing need (the U.S. role in ATLAS should not be constrained by a computing shortfall; rather, it should be enhanced by computing strength)
  – Store & re-reconstruct 10-30% of events
  – Take the high end of the simulation capacity range
  – Take the high end of the disk capacity range
  – Augment analysis capacity
  – Augment CERN link bandwidth


Adjusted For US ATLAS Perspective

• The US ATLAS Tier 1 Center in '05 should include ...
  – 10,000 SPECint95 for Re-reconstruction
  – 50,000 SPECint95 for Analysis
  – 20,000 SPECint95 for Simulation
  – 100 TBytes/year of On-line (Disk) Storage
  – 300 TBytes/year of Near-line (Robotic Tape) Storage
  – A dedicated OC12 (622 Mbit/sec) link to CERN


Architectural Model

• Consists of transparent, hierarchically distributed, Grid-connected computing resources (the ~20% scaling rules are sketched below)
  – Primary ATLAS Computing Centre at CERN
  – US ATLAS Tier 1 Computing Center at BNL
    • National in scope, at ~20% of CERN
  – US ATLAS Tier 2 Computing Centers
    • Six, each regional in scope, at ~20% of Tier 1
    • Likely one of them at CERN
  – US ATLAS institutional computing facilities
    • Local (LAN) in scope, not project supported
  – US ATLAS individual desktop systems
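
The ~20% scaling rules above compose in a simple way: the six Tier 2 centers together add roughly another 1.2 x Tier 1, so the aggregate US capacity is about 2.2 x Tier 1. Below is a minimal sketch of that arithmetic; the tier fractions come from this slide, while the reference CERN capacity used in the example is a placeholder, not an ATLAS figure.

```python
# Illustrative sketch of the hierarchical scaling described above.
# The 20% fractions are the rules of thumb quoted on this slide; the
# reference CERN capacity passed in below is a placeholder, not an ATLAS number.

def us_atlas_capacity(cern_capacity: float,
                      tier1_fraction: float = 0.20,  # Tier 1 at ~20% of CERN
                      tier2_fraction: float = 0.20,  # each Tier 2 at ~20% of Tier 1
                      n_tier2: int = 6) -> dict:
    """Per-tier and aggregate US capacity for a given CERN capacity."""
    tier1 = tier1_fraction * cern_capacity
    tier2_each = tier2_fraction * tier1
    return {
        "tier1": tier1,
        "tier2_each": tier2_each,
        "tier2_total": n_tier2 * tier2_each,
        "us_total": tier1 + n_tier2 * tier2_each,  # = 2.2 x Tier 1
    }

# Example with a nominal 250 kSPECint95 at CERN: Tier 1 would be 50 kSPECint95
# and the six Tier 2 centers together would add another 60 kSPECint95.
print(us_atlas_capacity(250_000))
```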


Schematic of Model

[Diagram: the ATLAS CERN Computing Center (international level) connects across the Atlantic to the US ATLAS Tier 1 Computing Center (national level); the Tier 1 connects to the US ATLAS Tier 2 Computing Centers (regional level); each Tier 2 serves Tier 3 computing (institutional level, over the LAN); and Tier 3 serves individual US ATLAS users.]


Distributed Model

• Rationale (benefits)
  – Improved user access to computing resources
    • Local geographic travel
    • Higher-performance regional networks
  – Enables local autonomy
    • Less widely shared
    • More locally managed resources
  – Increased capacities
    • Encourages integration of other equipment & expertise
      – Institutional, base program
    • Additional funding options
      – Com Sci, NSF


Distributed Model

• But increased vulnerability (risk)
  – Increased dependence on the network
  – Increased dependence on GRID infrastructure R&D
  – Increased dependence on facility modeling tools
  – More complex management
• The risk/benefit analysis must yield a positive result


Adjusted For Architectural Model

• US ATLAS facilities in '05 should include ...
  – 10,000 SPECint95 for Re-reconstruction
  – 85,000 SPECint95 for Analysis
  – 35,000 SPECint95 for Simulation
  – 190 TBytes/year of On-line (Disk) Storage
  – 300 TBytes/year of Near-line (Robotic Tape) Storage
  – Dedicated OC12 (622 Mbit/sec) Tier 1 connectivity to each Tier 2
  – A dedicated OC12 (622 Mbit/sec) link to CERN


GRID Infrastructure

• GRID infrastructure software must supply
  – Efficiency (optimizing hardware use)
  – Transparency (optimizing user effectiveness); a hypothetical illustration follows this list
• Projects
  – PPDG: distributed data services (later talk by D. Malon)
  – APOGEE: complete GRID infrastructure, including distributed resource management, modeling, instrumentation, etc.
  – GriPhyN: staged development toward delivery of a production system
• The alternative to success with these projects is a difficult-to-use and/or inefficient overall system
• U.S. ATLAS involvement includes ANL, BNL, and LBNL
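
To make "transparency" concrete, here is a minimal, purely hypothetical sketch of the idea: analysis code asks for a dataset by logical name and the infrastructure picks the best physical replica. The catalog, site names, and selection rule below are invented for illustration and are not taken from PPDG, APOGEE, or GriPhyN.

```python
# Hypothetical illustration of GRID "transparency": the user names a dataset,
# not a site. The catalog contents and the 'closest replica' rule below are
# invented for this example; they are not PPDG/APOGEE/GriPhyN interfaces.

REPLICA_CATALOG = {
    "esd/run0042": {
        "cern.ch": {"rtt_ms": 90.0},            # trans-Atlantic copy
        "bnl.gov": {"rtt_ms": 20.0},            # Tier 1 copy
        "tier2.example.edu": {"rtt_ms": 5.0},   # local Tier 2 cache
    },
}

def open_dataset(logical_name: str) -> str:
    """Resolve a logical dataset name to the 'closest' physical replica."""
    replicas = REPLICA_CATALOG[logical_name]
    site = min(replicas, key=lambda s: replicas[s]["rtt_ms"])
    return f"root://{site}/{logical_name}"      # the user never names a site

if __name__ == "__main__":
    print(open_dataset("esd/run0042"))          # root://tier2.example.edu/esd/run0042
```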


Facility Modeling

• The performance of a complex distributed system is difficult, but necessary, to predict
• MONARC, an LHC-centered project
  – Provides a toolset for modeling such systems
  – Develops guidelines for designing such systems
  – Is currently capable of relevant analyses
• U.S. ATLAS involvement
  – Later talk by K. Sliwa


Components of Model: Tier 1

• Full-function facility
  – Dedicated connectivity to CERN
  – Primary site for storage/serving
    • Cache/replicate CERN data needed by US ATLAS
    • Archive and serve over the WAN all data of interest to US ATLAS
  – Computation
    • Primary site for re-reconstruction (perhaps the only site)
    • Major site for simulation & analysis (~2 x Tier 2)
  – Repository of technical expertise and support
    • Hardware, OS's, utilities, and other standard elements of U.S. ATLAS
    • Network, AFS, GRID, & other infrastructure elements of the WAN model


Components of Model: Tier 2

• Limit personnel and maintenance support costs
• Focused-function facility
  – Excellent connectivity to Tier 1 (network + GRID)
  – Tertiary storage via the network at Tier 1 (none local)
  – Primary analysis site for its region
  – Major simulation capabilities
  – Major on-line storage cache for its region
• Leverage local expertise and other resources
  – Part of the site selection criteria; ~1 FTE contributed, for example


Technology Trends & Choices

• CPU
  – Range: commodity processors -> SMP servers
  – Factor of 2 decrease in price/performance in 1.5 years
• Disk
  – Range: commodity disk -> RAID disk
  – Factor of 2 decrease in price/performance in 1.5 years
• Tape storage
  – Range: desktop storage -> high-end storage
  – Factor of 2 decrease in price/performance in 1.5 - 2 years


Price/Performance Evolution

[Chart of price/performance evolution, from Harvey Newman's presentation at the Third LCB Workshop, Marseilles, Sept. 1999; baseline as of Dec. 1996.]


Technology Trends & Choices

• For costing purposes
  – Start with familiar, established technologies
  – Project by observed exponential slopes (sketched below)
• This is a conservative approach
  – There are no known near-term show stoppers to these established technologies
  – A new technology would have to be more cost effective to supplant the projection of an established technology
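
The projection rule can be written down directly: with price/performance improving by a factor of 2 every 1.5 years (previous slide), the cost of a fixed capacity in a later year follows a simple halving law. A minimal sketch of that rule; the starting unit cost in the example is a placeholder.

```python
# Minimal sketch of costing by observed exponential slopes: price/performance
# improves by a factor of 2 every `halving_years` (1.5 years for CPU and disk),
# so buying the same capacity later costs 0.5 ** (years_ahead / halving_years)
# times as much.

def projected_cost(cost_today: float, years_ahead: float,
                   halving_years: float = 1.5) -> float:
    """Cost of a fixed capacity purchased `years_ahead` years from now."""
    return cost_today * 0.5 ** (years_ahead / halving_years)

# Example with a placeholder unit cost: capacity costing 100 k$ in 2000 would
# be expected to cost roughly 10 k$ in 2005 (five years, ~3.3 halvings).
for dy in range(6):
    print(2000 + dy, round(projected_cost(100.0, dy), 1))
```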


Technology Trends & Choices

• CPU-intensive processing
  – Farms of commodity processors: Intel/Linux
• I/O-intensive processing and serving
  – Mid-scale SMP's (SUN, IBM, etc.)
• On-line storage (disk)
  – Fibre Channel connected RAID
• Near-line storage (robotic tape system)
  – STK / 9840 / HPSS
• LAN
  – Gigabit Ethernet


Composition of Tier 1

• Commodity processor farms (Intel/Linux)
• Mid-scale SMP servers (SUN)
• Fibre Channel connected RAID disk
• Robotic tape / HSM system (STK / HPSS)

                         1999  2000  2001  2002  2003  2004  2005  2006
CPU - kSPECint95          0.2     2     5     9    16    30    50    80
Disk - TB                 0.2     1     4    12    24    54   100   168
Tertiary Storage - TB       1     5    17    35    62   116   319   622
  - MBytes/sec              -    20    50    95   163   264   416   643


Current Tier 1 Status

• The U.S. ATLAS Tier 1 facility is currently operating as a small (~5%) adjunct to the RHIC Computing Facility (RCF)
• Deployment includes
  – Intel/Linux farms (28 CPU's)
  – Sun E450 server (2 CPU's)
  – 200 GBytes of Fibre Channel RAID disk
  – An Intel/Linux web server
  – Archiving via a low-priority HPSS Class of Service
  – Shared use of an AFS server (10 GBytes)


Current Tier 1 Status

• These RCF-chosen platforms/technologies are common to ATLAS
  – Allows a wide range of services with only 1 FTE of contributed sys admin effort (plus the US ATLAS librarian)
  – Significant divergence of direction between US ATLAS and RHIC has been allowed for
  – Complete divergence, extremely unlikely, would exceed current staffing estimates

Current U.S. ATLAS Tier 1 Capacities
  Compute             28 CPU's         500 SPECint95
  Disk                Fibre Channel    250 GBytes
  Sun Server / NIC    2 CPU's          100 Mbit/sec


US ATLAS Tier 1 Facility: Current Configuration

[Diagram of the current configuration. ATLAS equipment: the E450 NFS server (front line with SSH, running the Objectivity lock server, at XXX.USATLAS.BNL.GOV), dual-Intel Linux nodes (dual 450 MHz, 512 MBytes, 18 GBytes; 4 of 14 operational on 100 Mbit Ethernet), 200 GBytes of RAID disk on a SAN hub, and an Intel/Linux web server (128 MBytes, 18 GBytes). RCF infrastructure: LAN switch, backup server, HPSS archive server with 9840 tapes, a ~50 GBytes store, and AFS servers (~10 GBytes of RAID disk for ATLAS AFS), with LSF, AFS, Objectivity, Gnu tools, etc.]


RAID Disk Subsystem


Intel/Linux Processor Farm


Intel/Linux Nodes


Composition of Tier 2 (Initial One)

• Commodity processor farms (Intel/Linux)
• Mid-scale SMP servers
• Fibre Channel connected RAID disk

                    1999  2000  2001  2002  2003  2004  2005  2006
CPU - kSPECint95       -     -     1     3     5     8    15    26
Disk - TB              -     -     1     2     4     9    15    24


Staff Estimate (In Pseudo Detail)

Function                            Typical Tier 2   Tier 1   Total 6 x Tier 2
                                        (FTE's)      (FTE's)       (FTE's)
Simulation/Reconstruction System          0.1            1            0.6
Data Serving System                       0.5            3            3.0
Analysis System                           0.2            3            1.2
Computing Environment                     0.5            6            3.0
Data Access                               0.5            4            3.0
Network                                   0.4          2.5            2.4
Physical Infrastructure                   0.5            4            3.0
Facility Management                       0.3          2.5            1.8
Total                                       3           26             18

(Assume 1 FTE per Tier 2 is contributed from the base program; a total of 6 FTE's across the Tier 2's are contributed from the base program.)
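
The two "contributed from base" notes tie this table to the Staff Evolution slide: subtracting the contributed effort from the Tier 2 totals appears to give the project-funded staffing shown there. A minimal consistency check using only numbers from these two slides:

```python
# Consistency check between the staff estimate and the staff evolution slide.
# All inputs are taken from the slides; nothing here is newly estimated.

tier1_total = 26            # Tier 1 FTE's (staff estimate table)
tier2_total_each = 3        # FTE's at a typical Tier 2
n_tier2 = 6
contributed_all_tier2 = 6   # "Total of 6 FTE's are contributed from base"

project_tier2 = n_tier2 * tier2_total_each - contributed_all_tier2
print(project_tier2)                # 12, the steady-state "Tier 2 Total"
print(tier1_total + project_tier2)  # 38, the "Facilities Staff Total" in FY '05-'06
```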


Time Evolution of Facilities

• Tier 1 functioning as an early prototype
  – Ramp up to meet needs and validate the design
• Assume 2 years for a Tier 2 to become fully established
  – Initiate the first Tier 2 in 2001
    • True Tier 2 prototype
    • Demonstrate Tier 1 - Tier 2 interaction
  – Second Tier 2 initiated in 2002 (CERN?)
  – Four remaining initiated in 2003
• Fully operational by 2005
• The six are to be identical (CERN exception?)


Staff Evolution

                          FY '99  FY '00  FY '01  FY '02  FY '03  FY '04  FY '05  FY '06
Tier 1
  Tier 1 Total                3       5       8      11      13      19      26      26
Tier 2
  Initial Year Center         -       -       1       2       2       2       2       2
  Second Year Center          -       -       -       1       2       2       2       2
  4 Final Year Centers        -       -       -       -       4       8       8       8
  Tier 2 Total                -       -       1       3       8      12      12      12
Facilities Staff Total        3       5       9      14      21      31      38      38


Network

• Tier 1 connectivity to CERN and to the Tier 2's is critical
  – Must be guaranteed and allocable (dedicated and differentiated)
  – Must be adequate (triage of functions is disruptive)
  – Should grow with need; OC12 should be practical by 2005, when serious data will flow (see the arithmetic below)
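
As a sanity check on "adequate", the raw arithmetic is simple: a dedicated 622 Mbit/sec link running flat out moves on the order of a couple of petabytes per year, well above the ~300 TBytes/year of near-line storage growth planned for the Tier 1 center. A minimal sketch, ignoring protocol overhead, outages, and competing traffic:

```python
# Back-of-the-envelope capacity of a dedicated OC12 (622 Mbit/sec) link.
# Ignores protocol overhead, outages, and competing traffic.

OC12_BITS_PER_SEC = 622e6
SECONDS_PER_YEAR = 365 * 24 * 3600

bytes_per_year = OC12_BITS_PER_SEC / 8 * SECONDS_PER_YEAR
print(f"{bytes_per_year / 1e12:.0f} TBytes/year")   # ~2452 TBytes/year at full rate

# Compare with the planned near-line (robotic tape) growth of ~300 TBytes/year:
print(f"{bytes_per_year / 1e12 / 300:.1f}x headroom")  # ~8.2x
```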


WAN Configurations and Cost (FY 2000 k$)

                           1999  2000  2001  2002  2003  2004  2005  2006
Tier 1 to CERN Link           -     -     -     -    T3   OC3  OC12  OC12
CERN Link Cost                0     0     0     0   100   250   400   300
Tier 1 to Tier 2 OC3's        0     0     1     2     5     4     0     0
Tier 1 to Tier 2 OC12's       0     0     0     0     0     1     5     5
Domestic OC3 Cost           300   240   192   154   123    98    79    63
Domestic OC12 Cost          600   480   384   307   246   197   157   126
Domestic WAN Cost             0     0   192   307   614   590   786   629
Total WAN Cost                0     0   192   307   714   840  1186   929
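
The domestic OC3 and OC12 unit costs in the table appear to assume a roughly 20% annual price decline from their FY '99 values. A minimal sketch reproducing those two rows under that assumption:

```python
# The domestic link unit costs appear to follow a ~20% annual price decline.
# This reproduces the "Domestic OC3 Cost" and "Domestic OC12 Cost" rows.

def link_cost(fy99_cost: float, year: int, decline: float = 0.20) -> int:
    """Projected annual link cost (FY 2000 k$) for fiscal year `year`."""
    return round(fy99_cost * (1 - decline) ** (year - 1999))

print([link_cost(300, y) for y in range(1999, 2007)])
# [300, 240, 192, 154, 123, 98, 79, 63]    (OC3 row)
print([link_cost(600, y) for y in range(1999, 2007)])
# [600, 480, 384, 307, 246, 197, 157, 126] (OC12 row)
```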


Annual Equipment Costs for Tier 1 Center (FY 2000 k$)

                      FY '99  FY '00  FY '01  FY '02  FY '03  FY '04  FY '05  FY '06
Linux Farm                16     140     150     150     150     200     200     200
SMP Servers               40      25      50       0     100       0     150       0
Disk Subsystem            30     120     200     350     350     600     600     600
Robotic System             0       0     125       0       0     125       0     125
Tape Drives                0      50      50      50      50      50      50      50
Local Area Network         0       0      50      50      75      75      75      75
Media                      1      15      30      30      30      40     100     100
Desktops                   0      25      50      75      75      75      75      75
Hardware Maintenance       0      30      40     111     168     225     281     345
Software Licenses         10      40      60      80     100     350     250     250
Misc.                      5      20      50      80      90      90     100     100
Total                    102     465     855     976    1188    1830    1881    1920
Overhead                 12%     12%     12%     12%     12%     12%     12%     12%
Total Cost               114     521     957    1093    1331    2050    2107    2150
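
The bottom rows are mechanical: "Total" is the column sum and "Total Cost" applies the 12% overhead. A minimal check of the FY '00 column:

```python
# Column-sum and overhead check for the FY '00 Tier 1 equipment costs (k$).
fy00 = {
    "Linux Farm": 140, "SMP Servers": 25, "Disk Subsystem": 120,
    "Robotic System": 0, "Tape Drives": 50, "Local Area Network": 0,
    "Media": 15, "Desktops": 25, "Hardware Maintenance": 30,
    "Software Licenses": 40, "Misc.": 20,
}
total = sum(fy00.values())
print(total)                # 465, the "Total" row
print(round(total * 1.12))  # 521, the "Total Cost" row with 12% overhead
```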


Annual Equipment Costs for Tier 2 Center (FY 2000 k$)

                      FY '99  FY '00  FY '01  FY '02  FY '03  FY '04  FY '05  FY '06
Linux Farm                 0       0      50      50      50      50      70      70
SMP Servers                0       0      40      40      40      40      40      40
Disk Subsystem             0       0      39      51      81      81      81      81
Local Area Network         0       0      15      15      15      15      15      15
Desktops                   0       0       8       8       8       8       8       8
Hardware Maintenance       0       0       0      14      30      50      57      61
Software Licenses          0       0      10      20      20      20      20      20
Misc.                      0       0      10      20      20      20      20      20
Total                      0       0     172     218     264     284     311     315
Overhead                 12%     12%     12%     12%     12%     12%     12%     12%
Total Cost                 0       0     193     244     296     319     348     353


Integrated Facility Capacities by Year

                      1999  2000  2001  2002  2003  2004  2005  2006
Tier 2 Facilities        -     -     1     2     6     6     6     6
CPU - kSPECint95
  Tier 1                 0     2     5     9    16    30    50    80
  Tier 2                 -     -     1     4    20    48    91   156
  Total CPU              0     2     6    13    37    77   141   235
Disk - TB
  Tier 1                 0     1     4    12    24    54   100   168
  Tier 2                 -     -     1     3    20    52    89   145
  Total Disk             0     1     5    15    44   106   189   313
Robotic Tape - TB
  Total Tape             1     5    17    35    62   116   319   622
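
The "Total" rows are sums of the tier rows, and the Tier 2 aggregates are, up to rounding, six times the single-center capacities from the "Composition of Tier 2" slide. A minimal check of the 2005 column:

```python
# Consistency check of the 2005 column against the per-tier tables above.
tier1_cpu, tier2_cpu = 50, 91        # kSPECint95 (this table)
tier1_disk, tier2_disk = 100, 89     # TB (this table)
tier2_cpu_each, tier2_disk_each = 15, 15  # one Tier 2 in 2005 ("Composition of Tier 2")

print(tier1_cpu + tier2_cpu)         # 141, the "Total CPU" entry
print(tier1_disk + tier2_disk)       # 189, the "Total Disk" entry
print(6 * tier2_cpu_each, 6 * tier2_disk_each)  # 90, 90 vs 91, 89 (rounding)
```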


US ATLAS Facilities Annual Costs (FY 2000 k$)

                        FY '99  FY '00  FY '01  FY '02  FY '03  FY '04  FY '05  FY '06
Tier 1
  Equipment, etc.          110     520     960   1,090   1,330   2,050   2,110   2,150
  Personnel                100     700   1,120   1,540   1,820   2,650   3,630   3,630
  Tier 1 Total             220   1,220   2,070   2,630   3,150   4,700   5,740   5,780
Tier 2
  Equipment, etc.            -       -     190     470   1,630   2,030   2,050   2,140
  Personnel                  -       -     140     420   1,120   1,680   1,680   1,680
  Tier 2 Total               -       -     330     890   2,740   3,710   3,720   3,820
Network
  Network Total              -       -     190     310     710     840   1,190     930
Facilities Total           200   1,200   2,600   3,800   6,600   9,300  10,600  10,500


Major Milestones

Milestone Description                              Date
Selection of 1st Tier 2 site                       01-Oct-00
Procure dedicated Automated Tape Library (ATL)     01-Jun-01
Demo Tier 2 transparent use of Tier 1 ATL          01-Jan-02
Establish dedicated Tier 1 / CERN link             01-Jan-03
Select remaining (4) Tier 2 sites                  01-Jan-03
Mock Data Challenge I (25% turn-on capacity)       01-May-03
Final commitment to HSM                            01-Oct-03
Demo full hierarchy transparent operation          01-Apr-04
Mock Data Challenge II (50% turn-on capacity)      01-Jun-04
Achieve turn-on capacities                         01-Jan-05