GSIAF
"CAF" experience at GSI

Kilian Schwarz


GSIAF

• Present status
• Installation and configuration
• Usage experience (see talk of M. Ivanov on Friday)
• Plans and outlook
• Issues and problems

ALICE T2 and GSIAF – present status

[Diagram of the site layout; recoverable labels:]

• Grid services: vobox, LCG RB/CE
• GSI batchfarm: ALICE cluster (39 nodes/252 cores for batch & GSIAF)
• Directly attached disk storage (55 TB): ALICE::GSI::SE_tactical::xrootd
• 30 TB + 120 TB: ALICE::GSI::SE::xrootd
• GSI batchfarm: common batch queue (132 nodes)
• Access paths: PROOF/Batch, Grid
• Sites and link: CERN, GridKa, GSI, 150 Mbps

Present Status

• ALICE::GSI::SE::xrootd
  – > 30 TB disk on fileserver (8 FS with 4 TB each)
  – + 120 TB disk on fileserver
    – 20 fileservers, 3U, 15*500 GB disks, RAID 5
    – 6 TB user space per server
• Batch Farm/GSIAF and ALICE::GSI::SE_tactical::xrootd, nodes dedicated to ALICE:
  – 15 D-Grid funded boxes, each:
    – 2*2core 2.67 GHz Xeon, 8 GB RAM
    – 2.1 TB local disk space on 3 disks + system disk
  – Additionally 24 new boxes, each:
    – 2*4core 2.67 GHz Xeon, 16 GB RAM
    – 2.0 TB local disk space on 4 disks including system
• on all machines: Debian Etch 64bit
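The hardware figures on this slide can be cross-checked against the "39 nodes/252 cores" and "120 TB" totals quoted elsewhere in the talk. The following is a minimal sketch of that arithmetic; the variable names are illustrative, and the RAID 5 line assumes a single array spanning all 15 disks of a file server, which the slides do not state.

```python
# Figures quoted on the slides; variable names are illustrative only.
old_boxes, old_cores = 15, 2 * 2   # 15 D-Grid boxes, 2 CPUs x 2 cores each
new_boxes, new_cores = 24, 2 * 4   # 24 new boxes, 2 CPUs x 4 cores each

total_nodes = old_boxes + new_boxes
total_cores = old_boxes * old_cores + new_boxes * new_cores

# File servers: 15 disks of 500 GB each.
# ASSUMPTION: one RAID 5 array per server, so one disk's worth goes to parity.
disks, disk_tb = 15, 0.5
raid5_net_tb = (disks - 1) * disk_tb

servers, user_tb_per_server = 20, 6
total_user_tb = servers * user_tb_per_server

print(total_nodes, total_cores)     # 39 252
print(raid5_net_tb, total_user_tb)  # 7.0 120
```

Under that assumption, the 6 TB of user space per server leaves roughly 1 TB of each array for other purposes.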

installation

• shared NFS dir, visible by all nodes
  – xrootd (version 2.7.0 build 20070926-0000)
  – ROOT (5.17/04)
  – AliRoot (head)
  – all compiled for 64bit
• reason: fast software changes
• disadvantage: possible stale NFS handles
• tried to build Debian packages of the software used, but this process takes too long

configuration

• setup: 1 standalone, high-end 8 GB machine for xrd redirector and PROOF master; cluster: xrd data servers and PROOF workers, AliEn SE
• so far no authentication/authorization
• via Cfengine
  – platform-independent computer administration system (main functionality: automatic configuration)
• xrootd.cf, proof.conf, TkAuthz.Authorization, access control, Debian-specific init scripts for start/stop of daemons (for the latter also Capistrano for fast prototyping)
• all configuration files are under version control (subversion)
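As an illustration of keeping such files reproducible, a static proof.conf could be rendered from a single node list that lives in version control. This is only a sketch and not the GSI setup: it assumes the static `master <host>` / `worker <host>` line format of proof.conf (older ROOT versions used `slave` instead of `worker`), and the function name, host names and `workers_per_node` parameter are hypothetical.

```python
def render_proof_conf(master, workers, workers_per_node=1):
    """Render a static proof.conf: one master line, N worker lines per host.

    ASSUMPTION: the 'master <host>' / 'worker <host>' line format;
    check the proof.conf sample shipped with the ROOT version in use.
    """
    lines = [f"master {master}"]
    for host in workers:
        # One line per PROOF worker process to start on this host.
        lines.extend(f"worker {host}" for _ in range(workers_per_node))
    return "\n".join(lines) + "\n"


# Hypothetical host names, for illustration only.
conf = render_proof_conf(
    "master.example.org",
    ["wn01.example.org", "wn02.example.org"],
    workers_per_node=2,
)
print(conf, end="")
```

The rendered text can then be committed and distributed by whatever configuration tool is in use.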

Cfengine – config files in subversion

monitoring

via MonALISA
http://grid5.gsi.de:8080

GSIAF usage experience

• see talk of M. Ivanov on Friday at 10:00, "Analysis Experience at GSIAF"
• real-life analysis work on staged data by the GSI ALICE group (1-4 concurrent users)
• 2 user tutorials for GSI ALICE users (10 students per training)
• GSIAF and CAF were used as PROOF clusters during the PROOF course at GridKa School 2007

GridKa School 2007

plans and outlook

• Study coexistence of interactive and batch processes (PROOF analysis on staged data and Grid user/production jobs) on the same machines. Develop the possibility to increase/decrease the number of batch jobs on the fly to give priority to analysis. Currently at GSI each PROOF worker is an LSF batch node.
• optimise I/O. Various methods of data access (local disk, file servers via xrd, mounted Lustre cluster) are being investigated systematically.
• extend GSI T2 and GSIAF according to the promised ramp-up plan

systematic I/O tests

still under investigation

lustre: 1 job: 200 events/sec up to 8 jobs: 1200 events/sec

more tests are on the way (also PROOF analysis on lustre)
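To put the two Lustre numbers in relation, one can compute the speedup and the parallel efficiency they imply. The sketch below uses only the rates quoted above; the function name is illustrative, and it assumes the 1200 events/sec is the aggregate rate over all 8 jobs.

```python
def scaling(single_rate, n_jobs, aggregate_rate):
    """Return (speedup, parallel efficiency) relative to a single job."""
    speedup = aggregate_rate / single_rate
    efficiency = speedup / n_jobs
    return speedup, efficiency


# 1 job: 200 events/sec; 8 jobs: 1200 events/sec (assumed aggregate).
speedup, efficiency = scaling(200.0, 8, 1200.0)
print(f"speedup x{speedup:.1f}, efficiency {efficiency:.0%}")
# prints: speedup x6.0, efficiency 75%
```

Read this way, eight concurrent jobs reach six times the single-job rate, or 150 events/sec per job.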

ALICE T2 – ramp up plans

http://lcg.web.cern.ch/LCG/C-RRB/MoU/WLCGMoU.pdf

GSIAF will grow at a similar rate

issues and problems

• instability of PROOF (or of our setup). During the last user tutorial (10 users) we had to restart the PROOF master 3 times within 5 hours. It looks as if a Reset of a messed-up PROOF session can, under certain circumstances, bring down the xrootd redirector.
• our first set of workers occasionally shows lengthy response times, which is why PROOF refuses to connect and skips them as workers. This sometimes leads to hanging login sessions, since PROOF does not seem able to skip WNs and continue at this stage. Reason: probably a missing NIS cache.
• it would be helpful if a keyword like "error" appeared in the xrootd log whenever something goes wrong. This would make finding problems like the above significantly easier.
• xrootd versioning in ROOT/AliEn/SLAC and software dependencies (AliRoot/ROOT/xrootd/AliEn)

POSIX

• GSI users want to know what files we have on the GSIAF cluster and want to be able to deal with their files in a POSIX-like way.
• XrootdFS (based on FUSE) tested in collaboration with Andreas Petzold (TUD)
  – so far it seems to work only with individual data servers, not on the whole cluster

issues and problems

• how to bring data to GSIAF?
• suggested method: see http://alien.cern.ch/twiki/bin/view/AliEn/HowToUseCollections
  – create a collection of files
  – mirror the collection (and the files in it) to the new SE
• But during the last exercise, of the 231614 files in /alice/sim/2007/LHC07c/pp_minbias/ only 42191 seem to have landed at GSI
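A simple way to quantify such a shortfall is to compare the list of files in the collection with the list of files actually found on the target SE. The sketch below is a generic set comparison, not an AliEn tool: the function name and the tiny file lists are hypothetical, and how the two listings are obtained is left open.

```python
def transfer_report(expected, arrived):
    """Return (fraction of expected files that arrived, sorted missing files)."""
    expected, arrived = set(expected), set(arrived)
    missing = sorted(expected - arrived)
    fraction = len(expected & arrived) / len(expected) if expected else 1.0
    return fraction, missing


# Hypothetical listings, for illustration only.
fraction, missing = transfer_report(
    ["a.root", "b.root", "c.root", "d.root"],
    ["a.root", "c.root"],
)
print(fraction, missing)

# With the counts quoted on the slide:
print(f"{42191 / 231614:.1%} of the files arrived")  # 18.2%
```

The list of missing files could then serve as input for a second mirroring attempt.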

issues and problems

• how to bring data to GSIAF? Since GSIAF is a working AliEn SE, we would like to access the stored files directly on the SE (local disks) via the same names the files would have in the AliEn FC, without additional staging from the closest SE.
• Do PROOF Datasets (see talk of Jan Fiete) actually work now? Is this a solution to our issue?