LHCb report to LHCC and C-RSG Philippe Charpentier CERN on behalf of LHCb

Page 1: LHCb report to LHCC and C-RSG

LHCb report to LHCC and C-RSG

Philippe Charpentier, CERN

on behalf of LHCb

Page 2: LHCb report to LHCC and C-RSG

LHCb to LHCC and C-RSG review, PhC 2

Activities in 2009-Q3/Q4

• Core Software
    • Stable versions of Gaudi and LCG-AA
• Applications
    • Stable as of September for real data
    • Fast minor releases to cope with reality of life…
• Monte-Carlo
    • Intensive MC09 simulation (@ 5 TeV)
        • Minimum bias
        • b- and c-inclusive
        • b signal channels
    • Few events in foreseen 2009 configuration (450 GeV)
    • MC09 stripping (2 passes)
        • Trigger stripping
        • Physics stripping
• Real data reconstruction and stripping
    • As of November 20th …

Page 3: LHCb report to LHCC and C-RSG

Resource usage

Page 4: LHCb report to LHCC and C-RSG

139 sites hit, 4.2 million jobs

• Start in June: start of MC09

Page 5: LHCb report to LHCC and C-RSG

Job failure: 15% (17% at Tier1s)

Page 6: LHCb report to LHCC and C-RSG

Failure breakdown

Page 7: LHCb report to LHCC and C-RSG

Production and user jobs

Page 8: LHCb report to LHCC and C-RSG

Jobs at Tier1s

Page 9: LHCb report to LHCC and C-RSG

Job types at Tier1s

Page 10: LHCb report to LHCC and C-RSG

CPU used (not normalised)

• Average job duration
    • 5.6 hours for all jobs
    • 20 min for user jobs (20%)
    • 6.6 hours for production jobs

Page 11: LHCb report to LHCC and C-RSG

Page 12: LHCb report to LHCC and C-RSG

CPU usage (not normalised)

Page 13: LHCb report to LHCC and C-RSG

WLCG vs LHCb accounting (unnormalised)

• 13% more in WLCG than in DIRAC (unnormalised)
    • 1.26 Mdays vs 1.1 Mdays
    • Overhead of non-reporting jobs + pilot/LCG/batch frameworks
• Average CPU power: 1.5 kSI2k (from WLCG accounting)
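The 13% figure can be checked against the two totals quoted on this slide. The sketch below is only an illustration: the slide does not say which total is used as the denominator, so both readings are computed, and the variable names are made up for the example.

```python
# Totals quoted on the slide, in millions of CPU days (unnormalised).
wlcg_mdays = 1.26
dirac_mdays = 1.10

excess = wlcg_mdays - dirac_mdays

# Two possible readings of "13% more"; the choice of denominator is an
# assumption, since the slide does not state it.
excess_vs_wlcg = excess / wlcg_mdays    # excess as a fraction of the WLCG total
excess_vs_dirac = excess / dirac_mdays  # excess as a fraction of the DIRAC total

print(f"relative to WLCG:  {excess_vs_wlcg:.1%}")
print(f"relative to DIRAC: {excess_vs_dirac:.1%}")
```

With these rounded inputs the first reading gives about 12.7% and the second about 14.5%, so the quoted 13% is consistent with taking the WLCG total as the reference.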

Page 14: LHCb report to LHCC and C-RSG

Normalised CPU usage in 2009

• Ramping up of pilot role in summer
• Resource usage decreased since LHC restarted
    • Concentrate on (few) real data
    • Wait for data analysis for continuing MC simulation
• Group 1: production
• Group 2: pilot
• Group 3 & 4: user
• Group 5: lcgadmin

Page 15: LHCb report to LHCC and C-RSG

Resource usage

Site      Used (kHS06.years)   Requested (kHS06.years)
CERN      1.48                 8.54
Tier1s    8.24                 11.7
Tier2s    24.44                17.12
Total     34.16                37.36

• Note: CERN above does not include non-Grid usage
    • From WLCG accounting: 32% is non-Grid at CERN
    • CERN number should then read: 2.18 kHS06.years
• CPU usage within 10% of requests
• Distribution not exactly as expected
    • More non-Tier1 resources available
        • Less MC run at CERN + Tier1s
    • Almost no real data: fewer resources used at CERN
        • CAF not used as much as expected
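The numbers on this slide can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming that the 32% non-Grid share means the Grid figure is 68% of the full CERN usage; the dictionary and variable names are illustrative only.

```python
# Values from the table, in kHS06.years.
used = {"CERN": 1.48, "Tier1s": 8.24, "Tier2s": 24.44}
requested = {"CERN": 8.54, "Tier1s": 11.7, "Tier2s": 17.12}

# From the slide: 32% of CERN usage is non-Grid (WLCG accounting).
NON_GRID_FRACTION_CERN = 0.32

total_used = sum(used.values())
total_requested = sum(requested.values())

# Assumption: the table's CERN value is the Grid-only part, i.e. 68% of the
# full CERN usage, so the full value is grid / (1 - 0.32).
cern_total = used["CERN"] / (1 - NON_GRID_FRACTION_CERN)

# Fraction by which total usage falls below the total request.
shortfall = 1 - total_used / total_requested

print(f"total used:      {total_used:.2f} kHS06.years")
print(f"total requested: {total_requested:.2f} kHS06.years")
print(f"CERN incl. non-Grid: {cern_total:.2f} kHS06.years")
print(f"usage below request by {shortfall:.1%}")
```

The totals reproduce the table (34.16 and 37.36), the corrected CERN value comes out at 2.18 kHS06.years, and the shortfall of roughly 8.6% agrees with "within 10% of requests".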

Page 16: LHCb report to LHCC and C-RSG

Storage usage

Site           Requested (TB)   Allocated (TB)   Used (TB)
CERN*) TxD1    650              696.5            482.7
CERN*) T1D0    70               148.5            irrelevant
CERN**)        720              721              478
Tier1s**)      1740***)         1915             633

• *) From Castor queries today
• **) From WLCG accounting end December
• ***) Including 420 TB for T1D0 cache
• Sites provided slightly more than the pledges
    • Thanks!
    • At CERN, some disk pools (default, T1D0) were not included in the requests but are in the accounting

Page 17: LHCb report to LHCC and C-RSG

Experience with real data

Page 18: LHCb report to LHCC and C-RSG

First experience with real data

• Very low crossing rate
    • Maximum 8 bunches colliding (88 kHz crossing)
    • Very low luminosity
    • Minimum bias trigger rate: from 0.1 to 10 Hz
    • Data taken with single beam and with collisions

No zero-suppression in VELO
Otherwise ~25 GB only!
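The crossing rate quoted for 8 colliding bunches can be related to the LHC revolution frequency. The sketch below is an illustration, not taken from the slides: the revolution frequency of about 11.245 kHz is supplied here as an outside value, and the variable names are hypothetical.

```python
# Assumption (not from the slides): the LHC revolution frequency is about
# 11.245 kHz, so each colliding bunch pair crosses that many times per second.
LHC_REVOLUTION_FREQUENCY_HZ = 11_245

colliding_bunches = 8  # maximum quoted on the slide
crossing_rate_hz = colliding_bunches * LHC_REVOLUTION_FREQUENCY_HZ

# Minimum bias trigger rates quoted on the slide, as a fraction of crossings.
min_bias_rates_hz = (0.1, 10.0)
trigger_fractions = [rate / crossing_rate_hz for rate in min_bias_rates_hz]

print(f"crossing rate: {crossing_rate_hz / 1000:.1f} kHz")
for rate, fraction in zip(min_bias_rates_hz, trigger_fractions):
    print(f"{rate} Hz trigger rate = {fraction:.1e} of crossings")
```

This gives about 90 kHz; the 88 kHz on the slide is what one gets with the revolution frequency rounded to 11 kHz. Either way, the trigger accepted only about one crossing in ten thousand to one in a million.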

Page 19: LHCb report to LHCC and C-RSG

Real data processing

• Iterative process
    • Small changes in reconstruction application
    • Improved alignment
    • In total 7 sets of processing conditions
        • Only last files were all processed 4 times now (twice in 2010)
• Processing submission
    • Automatic job creation and submission after:
        • File is successfully migrated in Castor
        • File is successfully replicated at Tier1
    • If job fails for a reason other than application crash
        • The file is reset as “to be processed”
        • New job is created / submitted (automatic)
    • Processing more efficient at CERN (see later)
        • Eventually after few trials at Tier1, the file is processed at CERN
    • No stripping ;-)
        • DST files distributed to all Tier1s for analysis
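The submission rules on this slide can be sketched as a small state machine. This is an illustration under stated assumptions, not the production system's code: the class, function names and outcome labels are hypothetical, the limit of three Tier1 attempts stands in for "few trials", and treating an application crash as a stop condition is inferred from the wording "a reason other than application crash".

```python
from dataclasses import dataclass, field

MAX_TIER1_ATTEMPTS = 3  # assumption: the slide only says "few trials"


@dataclass
class RawFile:
    name: str
    tier1: str
    migrated: bool = False    # successfully migrated in Castor
    replicated: bool = False  # successfully replicated at its Tier1
    status: str = "new"
    attempts: int = 0
    history: list = field(default_factory=list)


def next_site(f):
    """Return the site for the next job, or None if the file is not ready."""
    if not (f.migrated and f.replicated):
        return None
    return f.tier1 if f.attempts < MAX_TIER1_ATTEMPTS else "CERN"


def handle_result(f, site, outcome):
    """Record a job outcome: 'ok', 'app_crash' or 'other_failure'."""
    f.attempts += 1
    f.history.append((site, outcome))
    if outcome == "ok":
        f.status = "processed"
    elif outcome == "app_crash":
        f.status = "failed"           # assumed: no automatic retry
    else:
        f.status = "to be processed"  # reset; a new job will be created


# Toy run: the Tier1 keeps failing for non-application reasons, CERN succeeds.
f = RawFile("run_0001.raw", tier1="IN2P3", migrated=True, replicated=True)
while f.status not in ("processed", "failed"):
    site = next_site(f)
    handle_result(f, site, "ok" if site == "CERN" else "other_failure")

print(f.status, f.history)
```

In this toy run the file is retried three times at its Tier1 and then processed at CERN on the fourth attempt.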

Page 20: LHCb report to LHCC and C-RSG

Reconstruction jobs

Page 21: LHCb report to LHCC and C-RSG

Issues with real data

• Castor migration
    • Very low rate: had to change the migration algorithm for more frequent migration (1 hour instead of 8 hours)
• Issue with large files (above 2 GB)
    • Real data files are not ROOT files but are opened by ROOT
    • There was an issue with a compatibility library for slc4-32 bit on slc5 nodes
        • Fixed within a day
• Wrong magnetic field sign
    • Due to different coordinate systems for LHCb and LHC ;-)
    • Fixed within hours
• Data access problem (by protocol, directly from server)
    • Still dCache issue at IN2P3 and NIKHEF
        • dCache experts working on it
    • Moved to copy mode paradigm for reconstruction
    • Still a problem for user jobs: a pain!
        • Sites are regularly banned for analysis

Page 22: LHCb report to LHCC and C-RSG

Transfers and job latency

• No problem observed during file transfers
    • Files randomly distributed to Tier1
    • Will move to distribution by runs (few 100’s files)
    • For 2009, runs were never longer than 4-5 files!
    • Max file size set to 3 GB
• Very good Grid latency
    • Time between submission and jobs starting running

Page 23: LHCb report to LHCC and C-RSG

Resource requests

Page 24: LHCb report to LHCC and C-RSG

Resource requests for 2010-12

• 2010 running
    • The requests were made in April-June 2009
        • No additional resources expected
        • Try to fit within those requests
    • Running scenario for LHCb
        • March: 35% LHC efficiency @ 100 Hz
        • April-May-June: 50% LHC efficiency @ 1 kHz on average
        • July-August-September-half October: 50% @ 2 kHz
        • No Heavy Ion run for LHCb
        • This corresponds to 6.1 × 10^6 seconds @ 2 kHz
        • The 2009-10 request accounted precisely by chance for 6.1 × 10^6 seconds (0.5+5.6)
        • Therefore we use 6.1 × 10^6 seconds for 2010 at 2 kHz trigger rate
• 2011 running
    • Use the recommendation of MB
        • March: 35% LHC efficiency @ 2 kHz
        • April to mid-October: 50% LHC efficiency @ 2 kHz
        • Total running time: 8.9 × 10^6 seconds
• 2012: no run

Page 25: LHCb report to LHCC and C-RSG

Resource requirements for 2010-12

CPU (kHEP06*year)

                                      2010 (old)   2010 (confirmed)      2011 (prelim.)        2012 (very prelim.)
                                      Integrated   Integrated   Power    Integrated   Power    Integrated   Power
CERN T0                                            5.70                  4.50                  4.07
CERN CAF - Analysis/Calib/Alignment                11.56                 11.91                 15.46
CERN T0 + T1                          17.19        17.26        21       16.41        20       19.53        24
Tier1s                                32.99        33.84        41       57.49        70       65.55        80
Tier2s                                31.74        31.74        46       31.48        46       31.48        46
Total                                 81.91        82.83        108      105.38       136      116.57       150

Disk (TB)

                2010 (old)   2010 (confirmed)   2011 (prelim.)   2012 (very prelim.)
CERN T0 + T1    1290         1270               1685             1776
Tier1s          3290         3350               4215             4458
Tier2s          20           20                 20               20
Total           4600         4640               5920             6254

Tape (TB)

                2010 (old)   2010 (confirmed)   2011 (prelim.)   2012 (very prelim.)
CERN T0 + T1    1500         1462               3020             3723
Tier1s          1800         1922               4271             5605
Total           3300         3384               7290             9328
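The disk and tape totals in the table above can be checked column by column. A minimal sketch, assuming the four columns are 2010 (old), 2010 (confirmed), 2011 and 2012 in that order; the helper names are illustrative.

```python
# Requests in TB; columns: 2010 (old), 2010 (confirmed), 2011, 2012.
disk = {
    "CERN T0 + T1": [1290, 1270, 1685, 1776],
    "Tier1s": [3290, 3350, 4215, 4458],
    "Tier2s": [20, 20, 20, 20],
}
disk_total = [4600, 4640, 5920, 6254]

tape = {
    "CERN T0 + T1": [1500, 1462, 3020, 3723],
    "Tier1s": [1800, 1922, 4271, 5605],
}
tape_total = [3300, 3384, 7290, 9328]


def column_sums(rows):
    """Sum each column over all sites."""
    return [sum(col) for col in zip(*rows.values())]


def differences(rows, quoted):
    """Computed column sum minus the quoted total, per column."""
    return [s - q for s, q in zip(column_sums(rows), quoted)]


disk_diff = differences(disk, disk_total)
tape_diff = differences(tape, tape_total)

print("disk:", disk_diff)
print("tape:", tape_diff)
```

The disk columns add up exactly. The 2011 tape column sums to 7291 TB against a quoted 7290 TB, a 1 TB difference that looks like rounding of the individual entries.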

Page 26: LHCb report to LHCC and C-RSG

Comments on resources

• Very uncertain and fluctuating running plans!
• Depending on LHC running, MC requests may be different
    • Minimum bias, charm physics, b physics…
• Only after at least one year of experience can we see how running analysis on the Grid works
    • Analysis at CERN?
    • Analysis at Tier3s?
    • Reliability for analysis?
• 2012 is still very uncertain
    • No LHC running
    • Will the MC requests be the same as in previous years?
    • How many reprocessings?
        • Currently assume 1 full reprocessing of 2010 and 2 of 2011

Page 27: LHCb report to LHCC and C-RSG

Conclusions

• Real data in 2009
    • So few that it didn’t impact resource usage
    • Was extremely valuable for
        • Setting procedures
        • Starting to understand the detector
            • Already very promising performance after a few days
            • π0 peak, Λ and K0 reconstruction…
        • Exercising automatic processes
• 2010
    • Still expect somewhat chaotic running
        • Frequent changes in LHC settings, LHCb trigger commissioning
    • No change in LHCb resource requests w.r.t. June 2009
• 2011
    • More precise requests with experience from 2010
• 2012
    • Still very preliminary, but small increase only compared to 2011