
Page 1: Tier 2 Centre in Prague

Prague, 9.9.2004

Tier 2 Centre in Prague

Jiří Chudoba Institute of Physics AS CR

Page 2: Tier 2 Centre in Prague


Tier 2 Centre in Prague
• Grid projects in the CR (local and international collaborations)
• Network in the Czech Republic
• Main grid applications and user communities, resources
• Plans

Page 3: Tier 2 Centre in Prague


Grid projects in CZ
• Grid-like project since the 1990s
  – MetaCentre
• Grid projects
  – EDG – European DataGrid
  – GRIDLAB, COREGRID – CESNET
  – LCG – LHC Computing Grid
  – EGEE – Enabling Grids for E-science in Europe
  – SAMGRID/JIM (working data handling system with grid functions and job monitoring) – developed at FNAL for D0 and CDF – starting at FZU (D0)

Page 4: Tier 2 Centre in Prague


Project MetaCentre (1/2)
• MetaCentre – a CESNET project with additional collaborators
  – CESNET z.s.p.o.
    • Association of legal bodies formed by Czech universities and the Academy of Sciences of the Czech Republic (AS CR); provider of the Czech research network; research in advanced network technologies and applications
  – Distributed environment for extensive computing
    • Super/cluster/grid computing
  – Collaborates with international computing projects
    • CESNET collaborates with international grid projects through MetaCentre
  – Main HW resources in
    • Supercomputing Centre at Masaryk University in Brno
    • Supercomputing Centre at Charles University in Prague
    • Supercomputing Centre at the University of West Bohemia in Pilsen

Page 5: Tier 2 Centre in Prague


Project MetaCentre (2/2)
  – High-speed network connectivity
  – Unified environment with
    • Single user login, same user interfaces and application interfaces
    • Shared file system OpenAFS with Kerberos
    • Interactive access and the PBS batch queue system, support for parallel and distributed computing
    • Special applications supported at specialized university locations – e.g. computational chemistry, molecular modeling, technical and material simulations

Page 6: Tier 2 Centre in Prague


EDG in CZ
• EDG – European DataGrid Project
  – 2001–2003 (extended to end of March 2004)
  – CZ participation
    • CESNET
    • Institute of Physics AS CR (FZU)
      – Research institute for solid state and particle physics
  – Contribution to work packages
    • WP1 – Workload Management System (scheduling and resource management)
      – CESNET – logging and bookkeeping service
    • WP6 – Testbed and Demonstrator
      – Grid farms at CESNET and FZU
      – Certification Authority created at CESNET for scientific computing
    • WP7 – Network monitoring
      – CESNET

Page 7: Tier 2 Centre in Prague


LCG in CZ (1/2)

• LCG – LHC Computing Grid (joined 2002)
  – Computing environment for the LHC experiments, built by deploying a worldwide computational grid service that integrates the capacity of scientific computing centres spread across Europe, America and Asia into a virtual computing organisation
  – CZ participation in the GDB – Grid Deployment Board
    • Forum within the LCG project where the computing management of the experiments and the regional centres can discuss and take, or prepare, the decisions necessary for planning, deploying and operating the LCG Grid
• FZU participates
  – Hosting a Tier-2 Regional Computing Centre
  – Currently installed MW is LCG-2 and AliEn

Page 8: Tier 2 Centre in Prague


LCG-2 status map: 2 centres in CZ – FZU and CESNET (EGEE)

Page 9: Tier 2 Centre in Prague


EGEE in CZ
• EGEE – Enabling Grids for E-science in Europe
  – For 2004–2005
  – Integrates current national, regional and thematic grid efforts to create a seamless European grid infrastructure for the support of the European Research Area
  – CZ participation in EGEE – CESNET
    • SA1 – European Grid Support, Operation and Management
      – Operation of the LCG-2 certified farms at FZU and CESNET
      – Local support
    • NA3 – User Training and Induction
    • NA4 – Application Identification and Support
      – Computational chemistry, technical and material simulations
    • JRA1 – Middleware Re-engineering and Integration

Page 10: Tier 2 Centre in Prague


Network in the CZ
• External connections
  – 2.5 Gbps line to GÉANT, used for academic traffic
  – 800 Mbps line to Telia, used for commodity traffic
  – 10 Gbps line to NetherLight for experimental traffic
• Internal network
  – Two connected stars
  – Redundant connections
    • Effectively connections of network rings with a small number of hops

Page 11: Tier 2 Centre in Prague


GÉANT Topology, May 2004

Page 12: Tier 2 Centre in Prague


CESNET Topology April 2004

Page 13: Tier 2 Centre in Prague


Main grid applications in CZ
• HEP – High Energy Physics (i.e. particle physics)
  – Well established
• Computational chemistry, technical and material simulations
  – Preparing environment and looking for users in the framework of EGEE

Page 14: Tier 2 Centre in Prague


Particle Physics Applications (1/6)

• Particle Physics in the Czech Republic
  – Charles University in Prague
  – Czech Technical University in Prague
  – Institute of Physics of the Academy of Sciences of the Czech Republic
  – Nuclear Physics Institute of the Academy of Sciences of the Czech Republic
• Main Applications
  – Projects ATLAS, ALICE, D0, STAR, TOTEM
  – Groups of theoreticians
  – Approximate size of the community in 2004: 58 scientists, 22 dipl. engineers, 21 technicians and 43 students and PhD students

Page 15: Tier 2 Centre in Prague


Particle Physics Applications (2/6)

• Computing/grid resources
• Since 2004, usage of the CESNET farm
  – Skurut
    • 16 PIII nodes, 2 processors per node, 700 MHz, U2, 1 GB RAM per node
  – Today Skurut is an LCG2-certified farm
    • Shared with other applications

Page 16: Tier 2 Centre in Prague


Particle Physics Applications (3/6)

• Computing farm GOLIAS at FZU
  – 34x dual 1.13 GHz PIII
  – 1 GB RAM per node
  – 1 TB disc array
  – 10 TB disc, 3 x 3 TB
  – Power: 34 kSI2000
  – Planned usage:
    • 50% LCG (ATLAS+ALICE), 50% D0

Page 17: Tier 2 Centre in Prague


Particle Physics Applications (4/6)

• New farm from July 2004
  – New computer room at FZU with 150 kW of electric power for computers
  – Air conditioning, UPS, diesel generator
  – Network connection
    • Standard 1 Gbps connection to the CESNET research network
      – via the metropolitan Prague research network PASNET
    • Direct 1 Gbps optical connection to CzechLight (CESNET – Amsterdam – Geneva)
      – With BGP fallback to GÉANT
  – 2 x 9 racks
  – Shared with FZU – the main user
  – Plus a room for the operator

Page 18: Tier 2 Centre in Prague


Particle Physics Applications (5/6)

• New equipment (from July 2004), being installed
  – 49x dual Intel Xeon 3.06 GHz with HyperThreading – computing elements
  – 2x dual Intel Xeon 2.8 GHz with HyperThreading – frontend
  – 3x dual AMD Opteron 1.6 GHz
    • 1 file server (64-bit)
    • 2 computing elements
  – Disc array 30 TB, ATA discs, RAID5
  – All nodes connected via 1 Gbps
  – 3x HP ProCurve Networking Switch 2848
• New power ~100 kSI2000

Page 19: Tier 2 Centre in Prague


Particle Physics Applications (6/6)

• LCG2 (LCG-2_2_0) installed and certified
• Queuing system
  – PBSPro 5.4.0
  – Task submission either locally or via the grid (LCG2)
  – For LCG2 we have prepared an interface to PBSPro
    • Changes to the middleware published on the LCG web
  – Queues
    • General: short, normal, long
    • Special for experiments: D0, ATLAS, ALICE
    • For LCG2: lcgshort, lcglong, lcginfinite
• Disk array partitions
  – Successfully created one 8 TB partition from smaller partitions (see the sketch after this list)
    • Using LVM2 from Linux kernel 2.6; this cannot easily be done with a 2.4 kernel due to its block device size limit
  – At least kernel 2.6.6 is needed; older versions have stability problems when NFS comes under heavy load on >1 TB partitions
• Installed and certified on both farms, Skurut and Golias
  – LCG-2_2_0: http://goc.grid-support.ac.uk/gppmonWorld/gppmon_maps/CERN_lxn1188.html
  – ATLAS SW: http://mbranco.home.cern.ch/mbranco/cern/lcg2.html
  – AliEn
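As a rough illustration of the LVM2 step mentioned in the list above, the sketch below drives the standard LVM2 command-line tools from Python to merge several smaller devices into one large logical volume. The device names, volume/LV names and filesystem choice are illustrative assumptions, not the actual Golias configuration.

#!/usr/bin/env python3
"""Minimal sketch (not the farm's actual setup script): combine several
smaller block devices into one large LVM2 logical volume, the approach
described above for building an ~8 TB partition on a 2.6 kernel."""

import subprocess

# Hypothetical member devices of the disk array (assumed, not from the slides).
DEVICES = ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1", "/dev/sde1"]
VG, LV = "vg_data", "lv_data"

def run(cmd):
    """Run an LVM2/filesystem command and fail loudly on error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1) Label each device as an LVM2 physical volume.
for dev in DEVICES:
    run(["pvcreate", dev])

# 2) Group them into one volume group.
run(["vgcreate", VG] + DEVICES)

# 3) Allocate all free extents to a single logical volume.
run(["lvcreate", "-l", "100%FREE", "-n", LV, VG])

# 4) Put a filesystem on it; the result can then be exported over NFS as one partition.
run(["mkfs.ext3", f"/dev/{VG}/{LV}"])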

Page 20: Tier 2 Centre in Prague


Participation in ATLAS DC
• Effectively started in July, during the Golias farm upgrade
• Nevertheless we managed to participate from the beginning:
  – New nodes were gradually installed and added to the farm
  – Disk space is big enough; the 30 TB disk array will be added later

Page 21: Tier 2 Centre in Prague


[Chart: ATLAS DC2 – Number of Jobs per day (as of September 6). X axis: days; Y axis: number of jobs (0–3000); series: Grid3, NorduGrid, LCG.]

Page 22: Tier 2 Centre in Prague


ATLAS DC2 – LCG part
[Pie chart: share of LCG jobs per site, as of September 7. Sites include at.uibk, ca.triumf, ca.ualberta, ca.umontreal, ca.utoronto, ch.cern, cz.golias, cz.skurut, de.fzk, es.ifae, es.ific, es.uam, fr.in2p3, it.infn.cnaf, it.infn.lnl, it.infn.mi, it.infn.na, it.infn.roma, it.infn.to, it.infn.lnf, jp.icepp, nl.nikhef, pl.zeus, ru.msu, tw.sinica, uk.bham, uk.ic, uk.lancs, uk.man, uk.rl; the Czech sites golias and skurut are highlighted.]

Page 23: Tier 2 Centre in Prague


Jobs statistics – Golias
[Charts: Golias farm usage and relative CPU usage, July–September, broken down by queue; legend: testx d0prod short test alicelcgatlasprod long atlas d0 infinite.]

Page 24: Tier 2 Centre in Prague


Skurut in ATLAS DC2
[Charts: "All Jobs" per day (0–700) and "Long Jobs" per day (0–60) on Skurut, 1.7.2004–26.8.2004, plus CPU time and wall time of long jobs (in days).]

Page 25: Tier 2 Centre in Prague


ALICE PDC Phase 1
Prague – Golias (before upgrade)
• Data transfer: 1.8 TB over 50 days of active running
• CPU work: 20.3 MSI-2K hours
• Number of CPUs: max 32, average 16
[Chart: number of running jobs on the Golias farm during the ALICE DC (1-week snapshot)]

Page 26: Tier 2 Centre in Prague


ALICE PDC – Phase 2
[Chart with the golias site marked]

Page 27: Tier 2 Centre in Prague


Technical problems to solve
• Fair-share usage in a heterogeneous environment (see the sketch below):
  – Just reserving a number of CPUs is not good enough
  – The difference in CPU power is smaller for jobs with big I/O requirements
• OpenPBS vs PBSPro
• Optimal usage of the facility
• HyperThreading: on or off?
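To make the fair-share point concrete, here is a minimal sketch of benchmark-weighted accounting: jobs are charged kSI2000-weighted CPU time instead of raw seconds or reserved CPU counts, so slow and fast nodes are treated consistently. The per-CPU ratings and job records are invented for illustration; this is not the accounting actually configured on Golias or Skurut.

"""Illustrative sketch only: fair-share accounting that is aware of
heterogeneous node speeds, charging normalised (SI2000-weighted) CPU time."""

# Approximate per-CPU power in kSI2000 (assumed values for illustration).
NODE_POWER = {
    "piii-1.13GHz": 0.5,
    "xeon-3.06GHz": 1.1,
    "opteron-1.6GHz": 0.9,
}

def charged_units(cpu_seconds, node_type):
    """Convert raw CPU seconds into kSI2000-seconds, so the same amount of
    work is charged equally on a slow and on a fast node."""
    return cpu_seconds * NODE_POWER[node_type]

# Example job records: (user, node type, CPU seconds used) -- invented data.
jobs = [
    ("atlas_prod", "xeon-3.06GHz", 36000),
    ("d0_prod", "piii-1.13GHz", 36000),
    ("alice_prod", "opteron-1.6GHz", 18000),
]

usage = {}
for user, node, secs in jobs:
    usage[user] = usage.get(user, 0.0) + charged_units(secs, node)

for user, units in sorted(usage.items(), key=lambda kv: -kv[1]):
    print(f"{user:12s} {units:10.0f} kSI2000-seconds")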

Page 28: Tier 2 Centre in Prague


HyperThreading

                     HT with scheduling    HT                  noHT
2 jobs, parallel     1296 ± 2 s            1.13337 ± 48 s      1297 ± 1 s
4 jobs, parallel     1.73514 ± 3 s         1.73515 ± 5 s       2596 ± 10 s
2+2 jobs             2592 s                2.26674 s           2594 s

Setup: CERN RH 7.3.3, kernel 2.4.20, AliRoot v4-01-05, 1000 tracks HIJINGParam, real time
ftp://ftp.kernel.org/pub/linux/kernel/people/rml/cpu-affinity/ + http://freshmeat.net/projects/sched-utils/

CPU utilisation (top) for two parallel jobs under two different placements on the four logical CPUs:

CPU0 states: 100.0% user, 0.0% system, 0.0% nice, 0.0% idle
CPU1 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
CPU2 states: 100.0% user, 0.0% system, 0.0% nice, 0.0% idle
CPU3 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle

CPU0 states: 100.0% user, 0.0% system, 0.0% nice, 0.0% idle
CPU1 states: 100.0% user, 0.0% system, 0.0% nice, 0.0% idle
CPU2 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
CPU3 states: 0.0% user, 0.1% system, 0.0% nice, 99.0% idle

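The "HT with scheduling" measurements above were obtained with the cpu-affinity kernel patch and the sched-utils tools linked in the setup line. As a present-day illustration of the same idea, the Python sketch below pins two CPU-bound jobs to logical CPUs on different physical packages using os.sched_setaffinity; the sibling layout assumed in the code is hypothetical and should be checked against /proc/cpuinfo on the actual node.

"""Minimal sketch of pinning two CPU-bound jobs to different physical CPUs
on a dual Xeon with HyperThreading, so they do not run as HT siblings.
Uses os.sched_setaffinity (Linux, Python 3.3+); the 2004 setup used the
cpu-affinity patch plus sched-utils instead. The sibling layout below is an
assumption -- check 'physical id' in /proc/cpuinfo on the real node."""

import os
import subprocess

# Assumed topology: logical CPUs 0/1 share one physical package, 2/3 the other.
PINNING = {0: {0}, 1: {2}}  # job index -> allowed logical CPUs

def launch(job_index, command):
    """Start one job with its CPU affinity set before exec()."""
    def set_affinity():
        os.sched_setaffinity(0, PINNING[job_index])  # 0 = the calling process
    return subprocess.Popen(command, preexec_fn=set_affinity)

if __name__ == "__main__":
    # Two placeholder CPU-bound jobs (stand-ins for the AliRoot runs above).
    job = ["python3", "-c", "sum(i * i for i in range(10**7))"]
    procs = [launch(i, job) for i in (0, 1)]
    for p in procs:
        p.wait()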

Page 29: Tier 2 Centre in Prague


Financial resources
• Financial resources for the grid activities in the Czech Republic
  – No dedicated government support for the grid projects
  – CESNET supports grid projects from its research budget and from EU grants (EGEE)
  – Institutions exceptionally get small grants
  – Occasional support from the institutes
    • Such as the new computing room at FZU
  – Marginal support from application projects needing computing

Page 30: Tier 2 Centre in Prague


Plans
• EGEE
  – Should help to attract other applications to the working grid infrastructure
• HEP
  – The existing environment serves at the level of a Tier-2 regional centre
  – We have to find resources to upgrade it roughly tenfold to have the required computing resources in 2007

Page 31: Tier 2 Centre in Prague


Conclusion
• The grid infrastructure in the Czech Republic is established
  – It profits from the earlier experience and know-how of the MetaCentre project
  – EDG, LCG and EGEE projects are established, plus some smaller ones
  – Applications
    • HEP – a well-established local community actively using the grid environment for LCG
    • Other applications – being sought within the EGEE project; local experience already exists with computational chemistry and technical and material simulations
  – No specialised resources for grid projects; partial financing from different research projects or institutions' ad hoc contributions