The ALICE Tier-2’s in Italy

Roberto Barbera (*), Univ. of Catania and INFN

Workshop CCR INFN 2006, Otranto, 08.06.2006

(*) Many thanks to A. Dainese, D. Di Bari, S. Lusso, and M. Masera for providing slides and information for this presentation.



Workshop CCR INFN 2006, Otranto, 08.06.2006 2

Outline

• The ALICE computing model and its parameters
• ALICE and the Grid(s)
  - Layout
  - Implementation
  - Recent results
• ALICE Tier-2’s in Italy
  - Catania
  - Torino
  - Bari
  - LNL-PD
• Summary and conclusions


The ALICE computing model (1/2)

• pp
  - Quasi-online data distribution and first reconstruction at T0
  - Further reconstructions at T1’s
• AA
  - Calibration, alignment and pilot reconstructions during data taking
  - Data distribution and first reconstruction at T0 during the four months after the AA run
  - Further reconstructions at T1’s
• One copy of RAW at T0 and one distributed among the T1’s


The ALICE computing model (2/2)

• T0: first-pass reconstruction; storage of one copy of RAW, calibration data and first-pass ESD’s
• T1: reconstructions and scheduled analysis; storage of the second collective copy of RAW and of one copy of all data to be kept; disk replicas of ESD’s and AOD’s
• T2: simulation and end-user analysis; disk replicas of ESD’s and AOD’s
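As an illustration only, the tier responsibilities listed above can be encoded as a small lookup table. The data-type labels (RAW, ESD, AOD, calibration) and the helper names `tiers_storing` and `tiers_running` are simplifications introduced here, not part of the ALICE software; the T1 custodial copy of "all data to be kept" is not modelled.

```python
from __future__ import annotations

# Simplified, hypothetical encoding of the tier roles of the ALICE computing model.
TIER_ROLES: dict[str, dict[str, set[str]]] = {
    "T0": {
        "tasks": {"first-pass reconstruction"},
        "stores": {"RAW", "calibration", "ESD"},  # one RAW copy, first-pass ESDs
    },
    "T1": {
        "tasks": {"reconstruction", "scheduled analysis"},
        "stores": {"RAW", "ESD", "AOD"},  # second collective RAW copy, ESD/AOD disk replicas
    },
    "T2": {
        "tasks": {"simulation", "end-user analysis"},
        "stores": {"ESD", "AOD"},  # disk replicas only
    },
}


def tiers_storing(data_type: str) -> list[str]:
    """Return the tiers that keep a copy or replica of the given data type."""
    return sorted(t for t, roles in TIER_ROLES.items() if data_type in roles["stores"])


def tiers_running(task: str) -> list[str]:
    """Return the tiers responsible for the given task."""
    return sorted(t for t, roles in TIER_ROLES.items() if task in roles["tasks"])


if __name__ == "__main__":
    print(tiers_storing("RAW"))
    print(tiers_running("end-user analysis"))
```

With this encoding RAW data lives only at T0 and T1, while AOD replicas are found at T1 and T2.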


Parameters of the ALICE computing model

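Per collision system:

| Parameter | Unit | pp | PbPb |
|---|---|---|---|
| RAW event size | MB | 0.2 × 5 | 12.5 |
| Recording rate | Hz | 100 | 100 |
| ESD event size | MB | 0.04 | 2.50 |
| AOD event size | kB | 4 | 250 |
| Event catalogue | kB | 10 | 10 |
| Running time | s | 10^7 | 10^6 |
| Events per year | # | 10^9 | 10^8 |

Common parameters:

| Parameter | Unit | Value |
|---|---|---|
| T1 centres | # | 7 |
| T2 centres | # | 23 |
| Reconstruction passes (average) | # | 3 |
| RAW duplication | # | 2 |
| AOD/ESD duplication | # | 2 |
| Scheduled analysis passes per reconstructed event per year (average) | # | 3 |
| Chaotic analysis passes per reconstructed event per year (average) | # | 20 |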
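A quick back-of-envelope check of the parameter table: the yearly event count is the recording rate times the running time, and the data volumes follow from the per-event sizes. This is a sketch using only the table's values; the pp RAW size is read as the product 0.2 × 5 MB, decimal units are assumed (1 TB = 10^6 MB), and reconstruction passes, the event catalogue and simulated data are ignored.

```python
from __future__ import annotations

# Values copied from the parameter table; the pp RAW size is read as 0.2 x 5 MB.
PARAMS = {
    "pp": {"raw_mb": 0.2 * 5, "esd_mb": 0.04, "aod_kb": 4.0, "rate_hz": 100.0, "run_s": 1e7},
    "PbPb": {"raw_mb": 12.5, "esd_mb": 2.50, "aod_kb": 250.0, "rate_hz": 100.0, "run_s": 1e6},
}
RAW_DUPLICATION = 2  # one copy at T0, one distributed over the T1's


def yearly_volumes(p: dict[str, float]) -> dict[str, float]:
    """Yearly event count and data volumes in decimal TB."""
    events = p["rate_hz"] * p["run_s"]  # recording rate x running time
    raw_tb = events * p["raw_mb"] / 1e6  # MB -> TB
    return {
        "events": events,
        "raw_tb": raw_tb,
        "raw_tb_all_copies": raw_tb * RAW_DUPLICATION,
        "esd_tb_per_pass": events * p["esd_mb"] / 1e6,  # one reconstruction pass
        "aod_tb_per_pass": events * p["aod_kb"] / 1e9,  # kB -> TB
    }


if __name__ == "__main__":
    for system, p in PARAMS.items():
        v = yearly_volumes(p)
        print(f"{system}: {v['events']:.0e} events/y, RAW {v['raw_tb']:.0f} TB "
              f"({v['raw_tb_all_copies']:.0f} TB with duplication), "
              f"ESD {v['esd_tb_per_pass']:.0f} TB/pass, AOD {v['aod_tb_per_pass']:.0f} TB/pass")
```

For pp this gives 10^9 events per year and about 1 PB of RAW data (2 PB with both copies); for PbPb, 10^8 events and about 1.25 PB (2.5 PB). The event counts reproduce the "Events per year" row, which makes a useful consistency check on the table.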


ALICE & the Grid(s)

Legend: TQ = Task Queue (central job DB); CAT = Central Catalogue

[Diagram. Recoverable labels: ALICE user; ALICE TQ; ALICE CAT; ALICE Agents & Daemons; ROOT and AliRoot (computing framework); resources of several Grids (partly garbled in extraction, including "NU Grid" and "OSG" resources).]


Implementation: the “VO-Box”

[Diagram. Recoverable labels: LCG site; LCG CE; WN running a JobAgent; LCG SE; LCG RB; TQ; VO-Box; SCA; SA; job request; LFC (SURL registration); File Catalogue (LFN registration); PackMan request; configuration.]
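The diagram suggests a pull model: a JobAgent is started on a worker node through the normal LCG chain and then requests a real job from the central Task Queue. The sketch below illustrates only that idea; the classes `Job`, `TaskQueue`, `JobAgent` and the method `request_job` are hypothetical (not the AliEn API), and job matching is reduced to checking that the node has the required software packages installed.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    required_packages: set[str]  # software the job needs on the worker node


@dataclass
class TaskQueue:
    """Hypothetical central queue of waiting jobs (stand-in for the ALICE TQ)."""
    waiting: deque[Job] = field(default_factory=deque)

    def submit(self, job: Job) -> None:
        self.waiting.append(job)

    def request_job(self, installed: set[str]) -> Job | None:
        """Return the first waiting job that the requesting node can run."""
        for _ in range(len(self.waiting)):
            job = self.waiting.popleft()
            if job.required_packages <= installed:
                return job
            self.waiting.append(job)  # no match here: keep it for another site
        return None


@dataclass
class JobAgent:
    """Hypothetical agent started on a worker node via the site's batch system."""
    site: str
    installed: set[str]

    def run(self, tq: TaskQueue) -> list[int]:
        """Pull jobs until none match; return the IDs of the jobs taken."""
        taken = []
        while (job := tq.request_job(self.installed)) is not None:
            taken.append(job.job_id)  # a real agent would execute the job here
        return taken


if __name__ == "__main__":
    tq = TaskQueue()
    for job in (Job(1, {"AliRoot"}), Job(2, {"AliRoot", "Geant3"}), Job(3, {"ROOT"})):
        tq.submit(job)
    print(JobAgent("Catania", {"AliRoot", "ROOT"}).run(tq))
```

Here the agent takes jobs 1 and 3, while job 2 (which needs a hypothetical "Geant3" package) stays queued for another site. Since the local batch system only sees the agents, agent and payload jobs can easily end up counted together in site accounting.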


Who does what?

• Configure, submit and track jobs: user interface with massive production support; job DB (production and user); user and role management
• Install software on sites: package managers
• Distribute and execute jobs: Workload Management System (broker, L&B, …); Computing Element software; information services; interactive analysis jobs
• Store and catalogue data: data catalogues (file, replica, metadata, local, …); Storage Element software
• Move data around: file transfer services and schedulers
• Access data files: I/O services; file management (SRM)
• Monitor all that stuff: transport infrastructure; sensors; web presentation

...on top of that: enforce security!

Slide annotations (their position in the original table was lost in extraction): MIXED, PROOF, MonALISA, MIXED, Xrootd.


Some statistics and results for SC3/PDC05

In the last two months of 2005:
• 22,500 jobs (Pb+Pb and p+p)
• Average CPU time: 8 hours
• Data volume produced: 20 TB (90% in CASTOR2 at CERN, 10% at remote sites)
• Resource centres participating: 22 in total
  - 4 T1’s: CERN, CNAF, GridKa, CCIN2P3
  - 18 T2’s: Bari, Clermont (FR), GSI (D), Houston (USA), ITEP (RUS), JINR (RUS), KNU (UKR), Muenster (D), NIHAM (RO), OSC (USA), PNPI (RUS), SPbSU (RUS), Prague (CZ), RMKI (HU), SARA (NL), Sejong (SK), Torino, UiB (NO)
• Job share per site: T1’s: CERN 19%, CNAF 17% (CPU 20%), GridKa 31%, CCIN2P3 22%; T2’s: 11% in total
• AliRoot failure rate: 2.5%
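The figures above can be turned into a few derived quantities. This is a sketch that treats the 8-hour average as CPU time per job, uses decimal units (1 TB = 1000 GB) and 365-day years; the function name `summary` is illustrative.

```python
from __future__ import annotations

# Figures quoted on the slide for the last two months of 2005.
JOBS = 22_500
AVG_CPU_HOURS = 8  # average CPU time per job
DATA_TB = 20  # total data volume produced
JOB_SHARE_PCT = {"CERN": 19, "CNAF": 17, "GridKa": 31, "CCIN2P3": 22, "T2 total": 11}


def summary() -> dict[str, object]:
    cpu_hours = JOBS * AVG_CPU_HOURS
    return {
        "cpu_hours": cpu_hours,
        "cpu_years": cpu_hours / (24 * 365),  # 365-day years
        "gb_per_job": DATA_TB * 1000 / JOBS,  # decimal units
        "share_sum_pct": sum(JOB_SHARE_PCT.values()),
        "jobs_per_site": {s: round(JOBS * p / 100) for s, p in JOB_SHARE_PCT.items()},
    }


if __name__ == "__main__":
    for key, value in summary().items():
        print(key, value)
```

This gives about 180,000 CPU-hours (roughly 20.5 CPU-years) and just under 0.9 GB of output per job on average; the per-site shares add up to 100%, with GridKa alone running about 6,975 jobs.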


Job execution profile during SC3

2,450 jobs (25% more than the entire lxbatch capacity at CERN)

Negative slope: an AliEn problem during output retrieval, fixed in the following release.


Use of INFN Grid by LHC experiments: jobs per VO (Sep 2005 - Dec 2005)

[Charts: ~388,000 jobs (without INFN-T1) and ~811,000 jobs.]

Reminder: VO = Virtual Organization (i.e. experiment).

ALICE: 8% of the total number of jobs on the national grid.


Use of INFN Grid by LHC experiments: CPU time per VO (Sep 2005 - Dec 2005)

[Charts: ~98 years, 2 months, 18 days and ~358 years, 7 months, 11 days of CPU time; one chart excludes INFN-T1.]

ALICE: 14% of the CPU time outside the T1.
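The job and CPU-time charts can be combined into rough absolute numbers. This sketch assumes that the larger figures (~811,000 jobs, ~358 years of CPU) are the totals and the smaller ones (~388,000 jobs, ~98 years) exclude INFN-T1; durations are converted with 365.25-day years and 12 equal months, which is an approximation.

```python
from __future__ import annotations

DAYS_PER_YEAR = 365.25


def to_years(years: int, months: int, days: int) -> float:
    """Approximate decimal years for a 'Y years, M months, D days' figure."""
    return years + months / 12 + days / DAYS_PER_YEAR


# Figures read from the charts; the total/without-T1 pairing is an assumption.
JOBS_TOTAL, JOBS_WITHOUT_T1 = 811_000, 388_000
CPU_TOTAL_Y = to_years(358, 7, 11)
CPU_WITHOUT_T1_Y = to_years(98, 2, 18)


def alice_estimates() -> dict[str, float]:
    return {
        "alice_jobs": 0.08 * JOBS_TOTAL,  # 8% of all jobs on the national grid
        "alice_cpu_years_outside_t1": 0.14 * CPU_WITHOUT_T1_Y,  # 14% outside the T1
        "job_fraction_outside_t1": JOBS_WITHOUT_T1 / JOBS_TOTAL,
        "cpu_fraction_outside_t1": CPU_WITHOUT_T1_Y / CPU_TOTAL_Y,
    }


if __name__ == "__main__":
    for key, value in alice_estimates().items():
        print(f"{key}: {value:,.3f}")
```

Under that reading, ALICE ran roughly 65,000 jobs and used about 13.7 CPU-years outside the T1, while the sites other than INFN-T1 handled about 48% of the jobs but only about 27% of the CPU time.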


ALICE jobs per site

Warning: job agents and real jobs are accounted for in the same way.


ALICE Tier-2’s in Italy

Four candidates: Bari, Catania, LNL-PD, and Torino (T2 projects available at the URL: http://www.to.infn.it/~masera/TIER2/).

The team of ALICE referees with representatives of the INFN Management Board visited all Tier-2 candidates between 10/2005 and 02/2006.

Referees’ decision, communicated at a meeting in Rome on 10/03/2006: Catania and Torino approved; Bari and LNL-PD “incubated” (kept on “life support” until real ALICE needs are demonstrated by a real test of the computing model in production mode).


Network connectivity of the T2’s

[Map: ALICE Tier-2’s.]


Catania (1/5) – Computing room

[Photos: present installation; future expansion.]

Space available for installations: ~160 m²


Catania (2/5) - Infrastructure

[Photos: traditional system; high-density system.]


Catania (3/5) - CPU: 150 kSI2k

SuperMicro 1U nodes with dual AMD Opteron dual-core 275 and 4 GB RAM

IBM LS20 “blades” with dual AMD Opteron dual-core 280 and 4 GB RAM (expected by June)

LSF 6.1 as LRMS


Catania (4/5) - Storage

21+ TB with GPFS

• FC-2-SATA systems plus more traditional DAS with EIDE-2-SCSI controllers

• Filesystem: GPFS


Catania (5/5) - Statistics

[Plot: last month’s activity.]


Torino (1/5) – Computing Room


Torino (2/5) - Present installation

• Present solutions: blade servers (IBM) and 1U dual-processor nodes
• Guidelines for the future:
  - Minimize space
  - Minimize power consumption


Torino (3/5) - Resources

CPU
• 38 Intel(R) Xeon(TM) CPUs at 2.40 GHz and 12 Intel(R) Xeon(TM) CPUs at 3.06 GHz
• 45 Intel dual-processor nodes (≤ 4 years old; 14 blades)

DISK
• ~6 TB dedicated to ALICE and 2 TB shared among various VO’s (Classic SE)
• 1 dCache SE with an internal disk of ~80 GB for tests
• ~15 TB of disk space for ALICE will be commissioned soon: a StorageTek FLX210 with 3 FLC200 expansions

Filesystem
• Ext3 for the Classic SE; not yet defined for the new storage system
• Tests of xrootd for local and remote access (through a proxy) are scheduled

LRMS
• Torque/Maui, the default coming with the INFN Grid release

Slide annotations: “Open to all VO’s”; “Dedicated to ALICE (at the moment)”.


Torino (4/5) - Resources

Future evolution
• Many nodes (~20, the most recent ones) are being migrated from the ALICE farm to the LCG farm, taking advantage of the forthcoming upgrade to gLite 3.0.
• New WN’s (80 cores, 130 kSI2k), recently purchased, will be installed and configured very soon.

Networking
• All WN’s are on a hidden LAN (only outbound connectivity is allowed); NAT is done by an Extreme Networks switch. Almost all connections are Gigabit Ethernet.

Monitoring
• MRTG and Nagios for local monitoring of the farm.


Torino (5/5) - Usage

[Plots: local scheduler, number of jobs; LCG, number of jobs; ALICE central monitoring.]


Bari (1/2)

Bari is a Tier-2 candidate for both ALICE and CMS.

Bari also supports other VO’s. Priorities are assigned to the various VO’s in proportion to their respective budgets for acquiring resources.

In the last two years Bari has provided resources to ALICE for both PDC04 and SC3, and will do so for SC4.
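Budget-proportional priorities can be sketched as a static split of job slots among VO’s. This is an illustration only: the largest-remainder rounding, the function names and the budget figures in the example are assumptions made here, not Bari’s actual configuration.

```python
from __future__ import annotations


def fair_shares(budgets: dict[str, float]) -> dict[str, float]:
    """Fraction of the farm assigned to each VO, proportional to its budget."""
    total = sum(budgets.values())
    if total <= 0:
        raise ValueError("budgets must sum to a positive value")
    return {vo: b / total for vo, b in budgets.items()}


def allocate_slots(budgets: dict[str, float], n_slots: int) -> dict[str, int]:
    """Split n_slots job slots among VOs using the largest-remainder method."""
    exact = {vo: share * n_slots for vo, share in fair_shares(budgets).items()}
    slots = {vo: int(x) for vo, x in exact.items()}  # round every quota down
    leftover = n_slots - sum(slots.values())
    # Hand the remaining slots to the VOs with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda vo: exact[vo] - slots[vo], reverse=True)
    for vo in by_remainder[:leftover]:
        slots[vo] += 1
    return slots


if __name__ == "__main__":
    # Hypothetical budgets (arbitrary units), not real figures.
    print(allocate_slots({"alice": 50, "cms": 40, "other": 10}, 7))
```

With these hypothetical budgets, 7 slots are split 3/3/1. Production batch systems (for example LSF or Maui fair-share) also weight recent usage, which this static split ignores.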


Bari (2/2)

• One dual-CPU 700 MHz PIII (aligrid1.ba.infn.it), 40 GB HD
• One dual-CPU 1 GHz PIII (alicegrid2.ba.infn.it), 160 GB HD
• Three dual-CPU Intel Xeon 1.8 GHz (alicegrid4 - alicegrid6, VOBOX), 3 HDs of 80 GB
• One dual-CPU Intel Xeon 1.8 GHz (alicegrid3.ba.infn.it), SE for PDC04 with 0.7 TB of data
• One dual-CPU Intel Xeon 2.4 GHz (alicegrid5.ba.infn.it), SE for Finuda with 1.5 TB of disk space
• Three dual-CPU Intel Xeon 2.4 GHz, 80 GB HD
• One dual-CPU Intel Xeon 2.4 GHz (alicegrid7.ba.infn.it), 80 GB HD, software repository + Quattor installation server
• One node with 2 dual-core Opteron 275 CPUs, 120 GB HD
• Three dual-CPU Intel Xeon 2.8 GHz, 80 GB HD
• One dual-CPU Intel Xeon 3.0 GHz EM64T, 2 disk arrays × 2.5 TB (5 TB total), to be configured with xrootd for SC4


ALICE jobs at Bari (monitored by MonALISA)


LNL-PD

Background: LNL-PD is an approved Tier-2 for CMS, with many years of experience running a T2 prototype for CMS.

Size of the existing Tier-2 for CMS:
• CPU: ~200 kSI2k (almost all dual-core “blades”)
• Storage: EIDE-2-SCSI DAS with 3Ware + Storage Area Network
• LRMS: LSF
• Monitoring: Ganglia (local) + GridIce


ALICE at LNL-PD

ALICE activities already done:
• ALICE VO-box installed in 02/2006
• Site testing with small productions: OK
• Big ALICE production in April-May via LCG

Future activities foreseen for the rest of 2006:
• Participation in PDC06 (~10 kSI2k of dedicated resources, plus the possibility to use CMS resources if/when available)
• Installation of an ALICE storage system with xrootd (~1 TB initially)


ALICE jobs at LNL-PD (monitored by GridIce)

[Plot: ALICE jobs, 15 April 2006 – 15 May 2006.]


Common issues

Need for a common solution for the infrastructure (to benefit from economies of scale).

Need for an affordable, reliable, and scalable storage solution.

Need for a better organization of distributed support for Tier-2’s.

Although new technologies (“blades” with low-power CPU’s) help a bit, power consumption at Tier-2 sites is becoming increasingly important from an economic point of view. INFN Management should centrally define strict guidelines and a dedicated budget.
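To see why power consumption matters economically, the yearly electricity bill of a farm can be estimated from node count, power per node, a facility overhead factor (cooling, UPS, distribution) and the energy price. Every number in the example below is hypothetical and not taken from the talk; the function name is illustrative.

```python
from __future__ import annotations

HOURS_PER_YEAR = 24 * 365


def annual_energy_cost_eur(n_nodes: int, watts_per_node: float,
                           overhead_factor: float, eur_per_kwh: float) -> float:
    """Yearly electricity cost of a farm running continuously at constant power.

    overhead_factor scales IT power to total facility power (cooling, UPS, ...);
    1.0 means no overhead.
    """
    it_kw = n_nodes * watts_per_node / 1000
    facility_kwh = it_kw * overhead_factor * HOURS_PER_YEAR
    return facility_kwh * eur_per_kwh


if __name__ == "__main__":
    # Hypothetical farm: 100 nodes, overhead factor 2.0, 0.10 EUR/kWh.
    for label, watts in [("1U servers", 300), ("blades", 250)]:
        print(label, round(annual_energy_cost_eur(100, watts, 2.0, 0.10)))
```

With these assumed values, 100 nodes at 300 W cost about 52,560 EUR per year, and trimming 50 W per node saves about 8,760 EUR per year, which illustrates why low-power hardware alone helps only "a bit" compared with the overall budget for infrastructure.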


The future: PDC06 (June 2006)

Check of the distributed computing model:
• From RAW data to ESD
• Data transfers among sites
• Calibration and alignment
• Analysis

The SC3 experience has helped a lot to improve AliEn (current version 2.10).

Intense development of AliRoot to include calibration and alignment code for all sub-detectors and to reduce the percentage of run-time failures, with a huge effort by the Italian groups at many sites.


Resources ramp up at INFN Tier-2’s

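T2 - Present ramp up (year = acquisition)

Integrated estimates @ Tier-2

| | 2006 | 2007 | 2008 | 2009 | 2010 |
|---|---|---|---|---|---|
| CPU (kSI2k) | 460 | 1070 | 2520 | 5000 | 6000 |
| Disk (TB) | 160 | 379 | 894 | 1773 | 2128 |
| CPU/disk | 2.88 | 2.82 | 2.82 | 2.82 | 2.82 |

New resources (differential)

| | 2006 | 2007 | 2008 | 2009 | 2010 |
|---|---|---|---|---|---|
| CPU (kSI2k) | 160 | 610 | 1450 | 2480 | 1000 |
| Disk (TB) | 115 | 219 | 514 | 879 | 355 |

Replacements

| | 2006 | 2007 | 2008 | 2009 | 2010 |
|---|---|---|---|---|---|
| CPU (kSI2k) | 0 | 80 | 0 | 220 | 160 |
| Disk (TB) | 0 | 15 | 0 | 30 | 115 |

Total acquisitions (per year)

| | 2006 | 2007 | 2008 | 2009 | 2010 |
|---|---|---|---|---|---|
| CPU (kSI2k) | 160 | 690 | 1450 | 2700 | 1160 |
| Disk (TB) | 115 | 234 | 514 | 909 | 470 |

Costs (P. Capiluppi & A. Masoni)

| | 2006 | 2007 | 2008 | 2009 | 2010 |
|---|---|---|---|---|---|
| CPU (k€) | 92 | 261 | 369 | 446 | 144 |
| Disk (k€) | 258 | 329 | 450 | 498 | 160 |
| Total (k€) | 351 | 590 | 819 | 944 | 304 |

Grand total: 3008 k€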
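The ramp-up rows are linked by simple arithmetic, which can be checked mechanically. This sketch assumes the row-to-label mapping used above (integrated estimates, new resources, replacements, total acquisitions, costs); the helper names are illustrative.

```python
from __future__ import annotations

# Rows copied from the ramp-up tables (years 2006-2010).
INTEGRATED_CPU = [460, 1070, 2520, 5000, 6000]  # kSI2k
INTEGRATED_DISK = [160, 379, 894, 1773, 2128]  # TB
NEW_CPU = [160, 610, 1450, 2480, 1000]
NEW_DISK = [115, 219, 514, 879, 355]
REPL_CPU = [0, 80, 0, 220, 160]
REPL_DISK = [0, 15, 0, 30, 115]
TOTAL_CPU = [160, 690, 1450, 2700, 1160]
TOTAL_DISK = [115, 234, 514, 909, 470]
COST_CPU = [92, 261, 369, 446, 144]  # kEUR
COST_DISK = [258, 329, 450, 498, 160]
COST_TOT = [351, 590, 819, 944, 304]


def year_on_year(values: list[int]) -> list[int]:
    """Growth from each year to the next."""
    return [b - a for a, b in zip(values, values[1:])]


def elementwise_sum(a: list[int], b: list[int]) -> list[int]:
    return [x + y for x, y in zip(a, b)]


def max_mismatch(a: list[int], b: list[int]) -> int:
    """Largest absolute difference between two rows (rounding shows up as 1)."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    # New resources (2007 onwards) vs growth of the integrated capacity.
    print("CPU growth vs new:", max_mismatch(year_on_year(INTEGRATED_CPU), NEW_CPU[1:]))
    print("Disk growth vs new:", max_mismatch(year_on_year(INTEGRATED_DISK), NEW_DISK[1:]))
    # Total acquisitions = new resources + replacements.
    print("CPU:", max_mismatch(elementwise_sum(NEW_CPU, REPL_CPU), TOTAL_CPU))
    print("Disk:", max_mismatch(elementwise_sum(NEW_DISK, REPL_DISK), TOTAL_DISK))
    # Yearly cost = CPU + disk; grand total over the five years.
    print("Cost:", max_mismatch(elementwise_sum(COST_CPU, COST_DISK), COST_TOT))
    print("Grand total (kEUR):", sum(COST_TOT))
    # Capacity implied before 2006 if 'new' is the growth over the previous year.
    print("Implied pre-2006 baseline:", INTEGRATED_CPU[0] - NEW_CPU[0], "kSI2k,",
          INTEGRATED_DISK[0] - NEW_DISK[0], "TB")
```

All checks agree to within one unit (the 2008 disk growth, 515 vs 514 TB, and the 2006 cost, 350 vs 351 k€, differ by rounding), and the five yearly totals add up to the 3008 k€ grand total. The pre-2006 baseline of 300 kSI2k and 45 TB is derived here, not stated in the talk.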


Summary and conclusions

The ALICE computing model has been finalized and is now ready for the forthcoming LHC data.

INFN has identified the first official Tier-2’s for ALICE.

For both the design and the day-to-day operation of an LHC Tier-2, strong collaboration among the Experiments, the INFN Grid Project, the INFN CCR, and the Computing & Network Services of the various INFN Departments is vital.