History of the National INFN Pool P. Mazzanti, F. Semeria INFN – Bologna (Italy) European Condor Week 2006 Milan, 29-Jun-2006


History of the National INFN Pool

P. Mazzanti, F. Semeria

INFN – Bologna (Italy)

European Condor Week 2006

Milan, 29-Jun-2006


Our first experience (1997)

• Monte Carlo event generation.
• WA92 experiment at CERN: Beauty search in fixed target experiment.
• Working conditions: a dedicated farm of 3 Alpha VMS and 6 DecStation Ultrix.
• Results: 22000 events/day (0 dead time).


Then Condor came...

• Production Condor Pool:
– 23 DEC Alpha
  • 18 Bologna
  • 2 Cnaf (Bologna)
  • 2 Turin
  • 1 Rome
– 4 HP
– 6 DecStation Ultrix
– 5 Pentium Linux


Then Condor came… (cont.)

The throughput of the 23 Alpha subset of the pool: 75000 to 100000 events/day, plus 15000 events/day with the pool in Madison.

We got x5 the production at zero cost!


Give me a calculator…

• At INFN : 1000 PCs used 8 hours/day by the owners (16 hours/day idle)

• 1000 * 16 = 16000 hours = 1.8 years

1.8 CPU-years wasted each day!
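
The arithmetic on this slide can be checked in a few lines of Python. This is only a sketch: it assumes a 365-day year (8760 hours), and the variable names are illustrative.

```python
# Figures from the slide: 1000 PCs, each idle 16 hours per day.
PCS = 1000
IDLE_HOURS_PER_DAY = 16
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8760 hours

# Idle hours accumulated across the whole organisation in one day.
idle_hours = PCS * IDLE_HOURS_PER_DAY

# The same quantity expressed as years of continuous CPU time.
idle_cpu_years = idle_hours / HOURS_PER_YEAR

print(f"{idle_hours} idle hours/day = {idle_cpu_years:.1f} CPU-years/day")
```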


The ‘Condor on WAN’ INFN Project

• Approved by the Computing Committee in February 1998.

• Goal: install Condor on the INFN WAN and evaluate its effectiveness for the INFN computational needs.

• 30 people involved.


The Condor INFN Project (cont.)

The INFN Structure
• 27 sites
• More than 10 experiments on nuclear and sub-nuclear physics.
• Hundreds of researchers involved.
• Distributed and heterogeneous resources.

(good frame for a grid…)


The Condor INFN Project (cont.)

The first example in Europe of a national distributed computing environment


Collaboration

• INFN and Computer Science Dept. of the University of Wisconsin, Madison

• Coordinators for the project:
– for Madison: Miron Livny
– for INFN: Paolo Mazzanti.


General usage policy

Each group of people must be able to maintain full control over their own machines.


General usage policy (cont.)

A Condor job sent from a machine of a group must have the maximum access priority on the machines of the same group.


Subpools

• rank expression: a resource owner can give priority to requests from selected groups:

GROUP_ID = "My_Group"
RANK = target.GROUP_ID == "My_Group"

• From the group point of view the machines make a pool by themselves: a subpool.
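
To illustrate the idea behind the rank expression, here is a simplified Python model of how a machine could order waiting jobs by group. It is a sketch only, not Condor code or Condor's matchmaking algorithm: the `rank` function, the job dictionaries and the group name `Other_Group` are hypothetical, and it assumes a true comparison counts as 1.0 and anything else as 0.0.

```python
# Simplified model of a machine-side rank expression; this is not Condor code.
MACHINE_GROUP = "My_Group"  # group that owns the machine (value from the slide)


def rank(job):
    """Mimic RANK = target.GROUP_ID == "My_Group": 1.0 on a match, else 0.0."""
    return 1.0 if job.get("GROUP_ID") == MACHINE_GROUP else 0.0


# Hypothetical queue of waiting jobs.
jobs = [
    {"id": 1, "GROUP_ID": "Other_Group"},
    {"id": 2, "GROUP_ID": "My_Group"},
    {"id": 3},  # a job that declares no group at all
]

# The machine prefers the job with the highest rank.
preferred = max(jobs, key=rank)
print(preferred["id"])
```

In this model the job from `My_Group` wins on the group's own machine, while jobs from other groups can still run there when no job from the owning group is waiting.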


Checkpoint Server Domains

• The network could be a concern with a computing environment distributed over a WAN.

• Policy: a job should run in the ckpt domain if local resources are available.


The INFN-WAN Pool (2000)

ALPHA/OSF1 111

INTEL/LINUX 68

HP/HPUX 10

SUN/SOLARIS 11

INTEL/WNT 1

Total 201


The INFN-WAN Pool (2002)

ALPHA/OSF1 107

INTEL/LINUX 122

SUN/SOLARIS 6

INTEL/WNT 1

Total 235


INFN Condor Pool Allocation Time (Hours) (1999)

Feb 32877.5

Mar 39471.1

Apr 30427.6

May 9418.8

Jun 23027.5

Jul 25845.1

Aug 24797.5

Sep 34185.3

Oct 17834.9

Nov 35247.4

Dec 35432.3

Jan 13360.4

TOTAL 321925.4 (> 36 years)
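
The total and its conversion to years can be verified from the monthly figures. The sketch below assumes a 365-day year (8760 hours); the dictionary simply transcribes the table above.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8760 hours

# Monthly allocation time in hours, transcribed from the slide (Feb to Jan).
monthly_hours = {
    "Feb": 32877.5, "Mar": 39471.1, "Apr": 30427.6, "May": 9418.8,
    "Jun": 23027.5, "Jul": 25845.1, "Aug": 24797.5, "Sep": 34185.3,
    "Oct": 17834.9, "Nov": 35247.4, "Dec": 35432.3, "Jan": 13360.4,
}

total_hours = sum(monthly_hours.values())
total_cpu_years = total_hours / HOURS_PER_YEAR

print(f"{total_hours:.1f} hours = {total_cpu_years:.2f} CPU-years")
```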


Applications
• Simulation of the CMS detector.
• MC event production for CMS.
• Simulation of Cherenkov light in the atmosphere (CLUE).
• MC integration in perturbative QCD.
• Dynamic chaotic systems.
• Extra-solar planet orbits.
• Stochastic differential equations.
• Maxwell equations.


Simulation of Cherenkov light in the atmosphere (CLUE).

• Without Condor (1 Alpha):

– 20000 events/week.

• With Condor: 350000 events in 2 weeks (gain: x9)
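
The quoted gain follows from comparing weekly rates, as this short sketch shows. The variable names are illustrative; the figures are the ones on the slide.

```python
# Figures from the slide.
EVENTS_PER_WEEK_SINGLE_ALPHA = 20000
EVENTS_WITH_CONDOR = 350000
WEEKS_WITH_CONDOR = 2

# Normalise the Condor run to events per week, then compare the two rates.
events_per_week_condor = EVENTS_WITH_CONDOR / WEEKS_WITH_CONDOR
gain = events_per_week_condor / EVENTS_PER_WEEK_SINGLE_ALPHA

print(f"gain: x{gain:.2f}")  # the slide rounds this to x9
```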


Dynamic chaotic systems

• Computations based on complex matrices (multiplication, inversion, determinants, etc.).

• Very CPU-bound with little output and no input.

• Gains with Condor compared to the single Alpha used: x3.5 to x10.


MC integration in perturbative QCD

• CPU-bound

• No input, very small output

• Gains with Condor: x10.


Maxwell Equations

• 201 jobs, each with a different value of an input parameter.

• Output: 401 numbers/job

• Gains with Condor compared to the single Alpha available: x11


People very very very happy!!


The Pool Today

• 8 checkpoint servers: Bologna, Milano, Torino, Pavia, Trieste, Padova, LNGS, Napoli.

• 270 CPUs

• 45.5 years CPU equivalent used from January to June 25th -> 91 years CPU/year
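
The yearly figure is an extrapolation of roughly half a year of usage. The sketch below reproduces the slide's doubling and, for comparison, a day-count extrapolation. It assumes the period runs from 1 January to 25 June 2006 inclusive; the year is inferred from the talk date and is not stated on the slide.

```python
from datetime import date

CPU_YEARS_USED = 45.5  # figure from the slide

# Slide's estimate: treat the period as half a year and double it.
half_year_estimate = CPU_YEARS_USED * 2

# Day-count estimate. Assumption: 1 January to 25 June 2006, inclusive.
start, end = date(2006, 1, 1), date(2006, 6, 25)
days_elapsed = (end - start).days + 1
day_count_estimate = CPU_YEARS_USED * 365 / days_elapsed

print(f"doubling: {half_year_estimate:.0f} CPU-years/year")
print(f"day count ({days_elapsed} days): {day_count_estimate:.1f} CPU-years/year")
```

Both estimates land in the same range, so the slide's figure of 91 is a reasonable round number for the pool's yearly usage.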


Why does the pool not grow?

Why is Condor not installed on all PCs?

• Is it difficult to install?

• Is it difficult to use?

• Is it difficult to maintain?

• Do we prefer to buy new machines?


An automatic installation tool

• Three types of installation:
– server: binaries and libraries only
– client: configuration files only
– full: client + server

• RPM files are built

• Web interface: http://www.bo.infn.it/calcolo/condor/infn-installation-tool-6.6.7.html


Server installation

• Only binaries and libraries

• Usually done on NFS or AFS servers, which export bin and lib to the clients


Client installation

• Installs configuration files using data specified through the web interface

• Creates startup and shutdown scripts for the Condor daemons

• Adds the binaries path (from the ‘server’ installation) to the user's PATH


Full installation

• Client + Server

• The whole Condor distribution and the configuration files are on the same machine

• NFS and AFS are not required


Conclusion

• The INFN Condor Pool was the first ‘pre-grid’ wide-area distributed computing system.

• It is still used by people outside ‘big science’.


Conclusion (cont.)

BUT: why not Condor on each PC?

We did not find the answer in 10 years…