
Page 1

LHC Data Challenges and Physics Analysis

Jim Shank
Boston University
VI DOSAR Workshop
16 Sept., 2005

Page 2

Overview

• I will concentrate on ATLAS, with some CMS
• ATLAS Computing timeline
• ATLAS Data Challenge 2
• CMS DC04
• The LHC Service Challenges
• Distributed Analysis
  • PanDA
• ATLAS Computing System Commissioning

Page 3

ATLAS Computing Timeline (2003-2007)

• POOL/SEAL release (done)
• ATLAS release 7 (with POOL persistency) (done)
• LCG-1 deployment (done)
• ATLAS complete Geant4 validation (done)
• ATLAS release 8 (done)
• DC2 Phase 1: simulation production (done)
• DC2 Phase 2: intensive reconstruction (only partially done…)
• Combined test beams (barrel wedge) (done)
• Computing Model paper (done)
• Computing Memoranda of Understanding (ready for signatures)
• ATLAS Computing TDR and LCG TDR (done)
• Computing System Commissioning
• Physics Readiness Report
• Start cosmic ray run
• GO!

Page 4

ATLAS Production Runs (2004-2005)

Grid Production             Worldwide               U.S.                    U.S. Tier 2
                            Jobs (k)   Events (M)   Jobs (k)   Events (M)   % of U.S. jobs done by the three U.S. T2 sites
Data Challenge 2 (DC2)      334        81           117        28           55%
Rome Physics Workshop       573        28           138        7            60%

• Over 400 physicists attended the Rome workshop; 100 papers were presented based on the data produced during DC2 and the Rome production

• Production during DC2 and Rome established a hardened Grid3 infrastructure, benefiting all participants in Grid3

Page 5

Rome Grid Production

Successful job count at 83 ATLAS sites
[Chart: jobs per site, with the U.S. sites Southwest T2, BNL T1, Boston T2, and Midwest T2 labeled]

Page 6

U.S. Grid Production (Rome/DC2 combined)

20 different sites used in the U.S.; the ATLAS Tier 2s played the dominant role.

Share of U.S. jobs by site:
  Southwest T2 (3 sites)   24%
  BNL T1                   22%
  Boston T2                20%
  Midwest T2 (2 sites)     13%
  FNAL                      4%
  PDSF                      4%
  Other US sites (7)        4%
  UM                        3%
  PSU                       3%
  UBuf                      2%
  UCSD                      1%

Page 7

ATLAS Production on Grid3

• Average Capone jobs/day: 350
• Maximum jobs/day: 1020
• US ATLAS dominated all other VOs in use of Grid3

[Plot: Grid3 jobs per day, 2004-2005]

Page 8

CMS Data and Service Challenges

• CMS is organizing a series of Data Challenges and participating in the LCG Service Challenges
• Each is a step in the preparation for the start of LHC data taking
• Data Challenge 2004 (DC04) was the first opportunity to test all the components of reconstruction, data transfer and registration in a real-time environment
  • The goal was 25 Hz of reconstruction and transfer of events to Tier-1 centers for archiving
• The first of the LCG Service Challenges started in FY05
  • Transfer data from disk to disk from CERN to Tier-1 centers
• Second LCG Service Challenge
  • Transfer data using …
• Service Challenge 3 is in progress
  • Use experiment services to demonstrate the functionality of the Tier-0, Tier-1 and Tier-2 services simultaneously
  • Transfer, publish, and verify consistency of data; run example analysis jobs against the transferred data

Page 9

Page 10

Page 11

Page 12

Service Challenges – ramp up to LHC start-up

Timeline (2005-2008): SC2 -> SC3 -> SC4 -> LHC Service Operation, running through cosmics, first beams, first physics, and the full physics run.

• Jun 05 – Technical Design Report
• Sep 05 – SC3 Service Phase
• May 06 – SC4 Service Phase
• Sep 06 – Initial LHC Service in stable operation
• Apr 07 – LHC Service commissioned

• SC2 – Reliable data transfer (disk-network-disk): 5 Tier-1s, aggregate 500 MB/s sustained at CERN
• SC3 – Reliable base service: most Tier-1s, some Tier-2s; basic experiment software chain; grid data throughput 500 MB/s, including mass storage (~25% of the nominal final throughput for the proton period)
• SC4 – All Tier-1s, major Tier-2s: capable of supporting the full experiment software chain, including analysis; sustain the nominal final grid data throughput
• LHC Service in Operation – September 2006: ramp up to full operational capacity by April 2007; capable of handling twice the nominal data throughput

Page 13

(Slides 13-16: Jon Bakken, DOE/NSF Review, March 3, 2005)

Service Challenge 1

• SC1 was an extremely valuable system integration exercise
• It showed that the currently deployed system has a high degree of usability
• SC1 demonstrated 10x higher throughput (25 TB/day over the WAN) than prior use, in a fairly realistic deployment [cf. the CDF LAN rate is ~30 TB/day]
• Many problems were exposed by the high number of transfers and the high rate; problems were fixed, new features were added, or the system was redesigned before proceeding with tests

Rate results:
• Using a fully integrated system: 300 MB/s, CERN to FNAL, disk to disk
• Using a dcache-java-class gridftp script: 500 MB/s, no disk at FNAL
• Using a dcache-java-class gridftp script: 400 MB/s, to dCache disks
• 20 parallel streams per transfer, each with 2 MB buffers, is the optimal tune

These transfer rates were only possible using the Starlight research networks.
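As a quick sanity check, the sustained rates above are consistent with the quoted daily WAN volume; the short sketch below (a plain unit conversion, no SC1-specific assumptions) makes the arithmetic explicit.

```python
# Convert a sustained transfer rate to an approximate daily volume.
# Rates are taken from the SC1 results above; the conversion itself is generic.

SECONDS_PER_DAY = 86_400

def mb_per_s_to_tb_per_day(rate_mb_s: float) -> float:
    """Convert MB/s to TB/day (decimal units: 1 TB = 1e6 MB)."""
    return rate_mb_s * SECONDS_PER_DAY / 1e6

print(mb_per_s_to_tb_per_day(300))  # ~25.9 TB/day, matching the ~25 TB/day WAN figure
print(mb_per_s_to_tb_per_day(500))  # ~43.2 TB/day for the 500 MB/s script-driven test
```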

Page 14

Service Challenge 1

[Plot: data written in SC1 by srmcp transfers in November 2004]
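For context, srmcp is the dCache SRM copy client that drove these transfers. A minimal, purely illustrative sketch of such a copy follows; the endpoints and paths are hypothetical placeholders, and only the basic "srmcp source destination" form is assumed.

```python
# Illustrative only: drive an SRM-managed copy with the dCache srmcp client.
# The endpoints and paths below are hypothetical placeholders, not SC1 configuration.
import subprocess

source = "srm://source.example.org:8443/castor/sc1/file001.dat"        # hypothetical
destination = "srm://dest.example.org:8443/pnfs/data/sc1/file001.dat"  # hypothetical

# srmcp negotiates the transfer via SRM and carries it out with GridFTP underneath.
subprocess.run(["srmcp", source, destination], check=True)
```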

Page 15

Service Challenge II

• Continue the goal of making transfers robust and reliable; understand and fix problems rather than just restart
• Only use SRM-managed transfers to FNAL dCache pools, using third-party GridFTP transfers. No special or contrived transfer scripts.
• Continue to use the deployed CMS dCache infrastructure and the Starlight network links. Real data transfers have to coexist with users, and the service challenge should do so as well. This exposed many bugs in SC1.
• Sustain 50 MB/s to tape from CERN to FNAL
• Transfer ~10 MB/s of user data from Castor to tape at FNAL (~5 tapes/day; a rough consistency check follows below)
• Transfer 40 MB/s of fake data from the CERN OPLAPRO lab to FNAL tape, but recycle tapes quickly
• Use PhEDEx for both user and fake data transfers; exercising PhEDEx and making it a robust tool is a goal of SC2
• Plan to participate in 500 MB/s SRM-managed transfers to CMS's resilient or volatile pools over the Starlight network links
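To put the ~5 tapes/day figure in perspective, 10 MB/s of user data is a bit under 1 TB/day, which corresponds to roughly four or five cartridges at the ~200 GB capacities typical of the period; the per-tape capacity here is my assumption, not stated on the slide.

```python
# Rough consistency check for the "~10 MB/s -> ~5 tapes/day" figure.
# The 200 GB per-cartridge capacity is an assumed, era-typical value, not from the slide.

SECONDS_PER_DAY = 86_400
rate_mb_s = 10            # user data rate from Castor to FNAL tape (from the slide)
tape_capacity_gb = 200    # assumed cartridge capacity

daily_volume_gb = rate_mb_s * SECONDS_PER_DAY / 1000  # ~864 GB/day
tapes_per_day = daily_volume_gb / tape_capacity_gb    # ~4.3 tapes/day

print(f"{daily_volume_gb:.0f} GB/day, about {tapes_per_day:.1f} tapes/day")
```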

Page 16

Service Challenge II: USCMS Tier 1 to Tier 2 Complement of SC2

• US CMS Tier 2 sites want some of the data we are transferring. We plan on delivering these data sets to them as part of this challenge, also using PhEDEx.
• Initially expect a low rate to each of 3-4 Tier 2 sites
• End-to-end functionality test of many components
• The rate can grow as the Tier 2 sites can accept data
• The UF, UCSD and Caltech Tier 2s already have resilient dCache + SRM deployed. Each site is installing PhEDEx daemons.
• Will use PhEDEx for these transfers as well
• Tier 2 sites do not have MSS, only disk cache (relatively cheap disk). The Tier 1 to Tier 2 operations are being investigated.

Page 17

US ATLAS Tier 1 SC3 Performance

Plots and statistics taken from BNL (Zhao and Yu) and CERN (Casey and Shiers)

Primary members of the BNL team: Zhao, Liu, Deng, Yu, Popescu

Page 18

Service Challenge 3 (Throughput Phase)

Primary goals:
• Sustained 150 MB/s disk (T0) to disk (T1) transfers
• Sustained 60 MB/s disk (T0) to tape (T1) transfers

Secondary goal:
• A few named T2 sites with T2 <=> T1 transfers at a few MB/s

Participating US ATLAS Tier 2 sites: Boston University, University of Chicago, Indiana University, University of Texas Arlington

Page 19

SC3 Services (and some issues)

Storage Element – US ATLAS production dCache system
• dCache/SRM (v1.6.5-2, with SRM 1.1 interface)
• Total of 332 nodes with about 170 TB of disk
• Multiple GridFTP, SRM, and dCap doors

Network to CERN
• Network for dCache at 2 x 1 Gb/s => OC48 (2.5 Gb/s) WAN
• Shared link to CERN with round-trip time >140 ms (RTT for European sites to CERN: ~20 ms)
• Occasional packet losses observed along the BNL-CERN path limited single-stream throughput
• 1.5 Gb/s aggregate bandwidth observed with iperf using 160 TCP streams (illustrated below)
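The need for so many parallel streams follows from the bandwidth-delay product of the BNL-CERN path: at more than 140 ms RTT, filling 1.5 Gb/s requires tens of megabytes in flight, far more than a single TCP stream with a typical window carries, especially with occasional packet loss. The sketch below is my own illustration of that arithmetic; the per-stream window size is an assumed typical value, not a measured BNL setting.

```python
# Bandwidth-delay product for the BNL-CERN path: why many parallel TCP streams
# (e.g. the 160 used with iperf) are needed to reach the observed aggregate rate.

rtt_s = 0.140        # round-trip time from the slide (>140 ms)
target_gb_s = 1.5    # aggregate rate observed with iperf (Gb/s)

bdp_bytes = (target_gb_s * 1e9 / 8) * rtt_s              # data that must be in flight
print(f"BDP: about {bdp_bytes / 1e6:.0f} MB in flight")  # ~26 MB

# Assuming a per-stream TCP window of 256 KB (a typical default, not a measured value):
window_bytes = 256 * 1024
print(f"about {bdp_bytes / window_bytes:.0f} streams needed at that window size")  # ~100
```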

Page 20

Disk to Disk Transfer Plot (daily)

Page 21

Disk to Disk Transfer Rates

Site      Daily average (MB/s)
ASCC       10
BNL       107
FNAL      185
GRIDKA     42
IN2P3      40
INFN       50
NDGF      129
PIC        54
RAL        52
SARA      111
TRIUMF     34
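Summed over the Tier-1s listed above, these daily averages correspond to roughly 0.8 GB/s aggregate out of CERN; the short sketch below simply reproduces that sum from the table.

```python
# Aggregate the per-site daily averages from the table above.
daily_avg_mb_s = {
    "ASCC": 10, "BNL": 107, "FNAL": 185, "GRIDKA": 42, "IN2P3": 40,
    "INFN": 50, "NDGF": 129, "PIC": 54, "RAL": 52, "SARA": 111, "TRIUMF": 34,
}

total = sum(daily_avg_mb_s.values())
print(f"Aggregate daily average out of CERN: {total} MB/s")  # 814 MB/s
```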

Page 22

Disk to Disk Transfer Plot

Page 23

Disk to Disk Transfer Plots

• Castor2 LSF plug-in problem
• Data routed through GridFTP doors

Page 24

Data Transfer Status

• Transfer component parameter settings influence successful transfer completions (timeouts, etc.)
• A 150 MB/s rate was achieved for one hour with large numbers (>50) of parallel file transfers; the original CERN FTS limit of 50 files per channel was not enough to fill the CERN-to-BNL data channel (see the sketch below)
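The file-concurrency requirement can be estimated from per-file throughput on this long-RTT path: if each individual file transfer sustains only a few MB/s, more than 50 concurrent files are needed to reach 150 MB/s, which is why the original 50-files-per-channel limit became a bottleneck. In the sketch below the per-file rate is an assumed illustrative value, not a measured SC3 number.

```python
# Estimate how many concurrent file transfers are needed to reach a target rate
# when each individual file moves slowly over a long-RTT path.
# The per-file rate is an assumption for illustration, not a measured SC3 value.

target_rate_mb_s = 150     # CERN -> BNL disk-to-disk target (from the slide)
per_file_rate_mb_s = 2.5   # assumed sustained rate of a single file transfer

files_needed = target_rate_mb_s / per_file_rate_mb_s
print(f"about {files_needed:.0f} concurrent files needed")  # ~60, above the original 50-file limit
```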

Page 25

Disk (T0) to Tape (T1) Transfer Plot

Sustained operation at the target 60 MB/s was achieved, but glitches, primarily with the tape HSM (HPSS), resulted in periods of stopped or reduced transfer.

Page 26

Aggregate Disk to Disk Tier 2 Transfers

While an average aggregate transfer rate of ~16 MB/s to the 4 sites was achieved (~4 MB/s per site, meeting the goal), Tier 2 availability limited consistency.

Page 27

PanDA

• US ATLAS Production and Distributed Analysis
• Based on the experience of DC2: a redesign of job submission on the grid
• Very actively being worked on now, with an aggressive schedule
• Integrates components that were separate in DC2 (see the sketch below):
  • Distributed Data Management
  • Production (job submission / workload management)
  • Distributed Analysis
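A purely conceptual sketch of that integration follows: a single submission service that consults data management for input locations and dispatches both production and analysis jobs through one workload manager. None of the class, method, or dataset names are PanDA's own; they are illustrative placeholders.

```python
# Conceptual illustration only: one entry point front-ending data management,
# production, and distributed analysis, in contrast to the separate DC2 components.
# Class, method, and dataset names are placeholders, not PanDA interfaces.
from dataclasses import dataclass

@dataclass
class Job:
    task: str            # e.g. "simulation", "reconstruction", "user analysis"
    input_dataset: str   # placeholder dataset name

class DataManagement:
    def locate(self, dataset: str) -> list[str]:
        # A real system would query the distributed data catalog here.
        return [f"site_hosting:{dataset}"]

class WorkloadManager:
    def dispatch(self, job: Job, sites: list[str]) -> str:
        # Send the job to a site that already hosts its input data.
        return f"{job.task} -> {sites[0]}"

class SubmissionService:
    """Single entry point for both production and distributed analysis jobs."""
    def __init__(self) -> None:
        self.ddm = DataManagement()
        self.wms = WorkloadManager()

    def submit(self, job: Job) -> str:
        return self.wms.dispatch(job, self.ddm.locate(job.input_dataset))

service = SubmissionService()
print(service.submit(Job("simulation", "dc2.simulation.sample")))
print(service.submit(Job("user analysis", "rome.aod.sample")))
```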

Page 28

Page 29

Page 30

ATLAS Computing System Commissioning

• Starts early 2006
• Will test the complete end-to-end ATLAS computing infrastructure
• Including some key components missing from DC2:
• Scale: production comparable to DC2