CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it
Summary of the HEPiX Spring 2013 Meeting
Arne Wiebalck
Luca Mascetti
Luis Fernandez Alvarez
CERN ITTF May 17, 2013
Alvarez, Mascetti, Wiebalck: Summary of the HEPiX Spring 2013 Meeting - 2
HEPiX – www.hepix.org
• Global organization of service managers and support staff providing computing facilities for HEP community
• Participating sites include BNL, CERN, DESY, FNAL, IN2P3, NIKHEF, RAL, SLAC, TRIUMF …
• Meetings are held twice per year
– Spring: Europe, Autumn: U.S./Asia
• Exchange of experiences, reports on recent work, work in progress & future plans
– Usually no showing-off
Outline
• Miscellaneous, Site reports, Storage (Arne)
• IT infrastructure, Computing (Luca)
• Virtualization, Networking & Security (Luis)
HEPiX Spring 2013
• April 15-19 at CNAF, INFN, Bologna (IT)
– Very well organized, pretty rich program
– Network access: eduroam (Thanks to CS for last minute support!)
• 83 registered participants
– Administrative hurdles (& illnesses) prevented better participation
– Europe: 69, U.S./Canada: 8, Asia: 5, Australia: 1 (CERN: 15)
• ~70 presentations from 40 institutes
– 3 BoF sessions (OpenAFS/IPv6, CMDBuild, Energy efficiency)
– Many offline discussions
• Sponsors: WD, DDN, IBM, E4, and Univa
Next HEPiX Meetings
• Autumn 2013
– U Michigan, Ann Arbor, MI, U.S.
– Oct 28 – Nov 1, 2013
• Spring 2014
– LAPP, Annecy, France
– May 19 – May 23, 2014
• Autumn 2014
– Several options, not yet decided
Updates from the WGs
• IPv6
– IPv4 address shortage will soon become a serious issue
– Distributed testbed has been set up, more and more sites joining, constant testing (file transfer)
– Tools & Software Survey, “problematic” applications identified
– http://indico.cern.ch/contributionDisplay.py?contribId=35&sessionId=2&confId=220443
• Storage
– WG terminated
– Summary report at Ann Arbor meeting
• Benchmarking
– No new SPEC benchmark
– Application/benchmark discrepancies are becoming worrying (benchmarks are used for purchases)
• Configuration Management
– New WG led by Ben Jones (CERN) and Yves Kemp (DESY)
Some Trends
• Batch system reviews everywhere
– BNL, CERN, GridKA, NERSC, …
– Univa GridEngine seems to take the lead
– WNs with HT
• Broad use of cloud services & virtualization
– Private clouds almost everywhere (mostly OpenStack)
– Idle VM detection (FNAL), EC2 spot pricing (BNL)
• Puppet taking the lead for configuration mgmt
– But: no monoculture expected
• Interest in Ceph for VM storage
– ASGC, BNL, CERN, RAL, …
– At an early stage everywhere
Site Reports (1)
• Storage/File Systems
– Lustre sites happy, GSI: 8PB, home-made access control
– NFS on BlueArc (BNL: almost 1PB of disk space, home+scratch)
– GlusterFS mentioned once
• Tape
– Mostly Sun SL8500s, some IBM, but also Spectra T-Finity (UiO)
– FNAL encountered excessive write errors on new tapes: contaminated with debris during manufacturing. Solution: f/w upgrade and change of manufacturing process
– Tape access optimization: BNL has developed its own tape scheduler in HPSS
• Authentication
– FNAL looking into consolidation of its authentication setup: MIT Kerberos + CA, two separate AD domains. Plan to be presented at the next HEPiX
Site Reports (2)
• Software
– SL5 still the most widely used OS for compute clusters (move to SL6 planned)
– BNL has successfully used ORACLE/ksplice (rebootless kernel patching) for about 2 years on their production clusters
• Hardware
– Dell systems dominate (PowerEdge R410, R510, R720, C6220, MD3260…), not only in the U.S.
• Infrastructure
– NERSC computing facilities will be relocated from Oakland to the new CRT building in Berkeley. First systems will move 1Q2015, last will stay until 4Q2016
• Networking
– Jumbo frames on LAN are being tried at several sites
Storage (1)
• Track dominated by CERN presentations (7/11)
– Mostly already reported in previous ITTF presentations (AFS, CASTOR/EOS, RAID optimizations), or to be covered in future ones (Ceph)
• DPHEP initiative and its impact for HEPiX
– Long-term data management a site responsibility
– Techniques and policies need cross-site coordination
• BoF Session on “OpenAFS & IPv6”
– Many sites regard AFS as one of their core services, value its robustness and plan to continue using it in the future
– Various options to deal with the IPv6 situation were discussed, but the lack of support is not regarded as a burning issue (at least right now)
– The need to gather more information was identified (use cases, traffic maps, prices for an implementation, …) to take an informed decision (before or at next HEPiX)
– Peter van der Reest (DESY) and Arne Wiebalck (CERN) to follow up
Storage (2)
• Storage Architecture at CNAF Tier1
– 11PB on disk; 16PB on tape
– >10k processes at 20GB/s (LAN)
• Few but big, dedicated, replicated storage systems
– GPFS + TSM
– Whole stack (DDN storage backend nodes, I/O servers, metadata servers, gridFTP servers, StoRM servers, HSM servers) replicated for each experiment
• Manageability problems
– Huge building blocks (compared to yearly growth)
– Small config changes (can) affect performance
– Storage re-balancing takes effort and (can) affect performance
– “Slow disk” problem: faulty disks (can) affect performance
• Evaluating alternatives
– Multiport SAS arrays with s/w-RAID?
– RAIN (simple EOS-like replication regarded as too expensive)?
– EMC Isilon (NAS w/ IB interconnect) under investigation
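To put two of the CNAF figures above into perspective, here is a rough back-of-the-envelope sketch. The first function only divides the quoted aggregate rate by the quoted process count (since the slide says ">10k", the result is an upper bound on the average). The second is a deliberately simplified, hypothetical model of the “slow disk” problem: it assumes a stripe that is read in lock-step across all disks, which is an assumption for illustration and not a description of the actual GPFS/DDN layout at CNAF. The function names and the per-disk rates are made up.

```python
from __future__ import annotations

# Back-of-the-envelope figures for the CNAF numbers quoted on the slide.
# The lock-step stripe model is an illustrative assumption, not CNAF's setup.


def per_process_rate_mb_s(aggregate_gb_s: float, processes: int) -> float:
    """Average rate per process in MB/s, taking 1 GB = 1000 MB."""
    return aggregate_gb_s * 1000 / processes


def striped_throughput_mb_s(disk_rates_mb_s: list[float]) -> float:
    """Throughput of a stripe read in lock-step.

    Every disk has to deliver an equal share of each stripe, so the
    slowest disk sets the pace for all of them.
    """
    return len(disk_rates_mb_s) * min(disk_rates_mb_s)


if __name__ == "__main__":
    # 20 GB/s shared by 10,000 processes
    print(per_process_rate_mb_s(20, 10_000))   # 2.0

    # Hypothetical 8-disk stripe: all healthy vs. one degraded disk
    healthy = [100.0] * 8
    degraded = [100.0] * 7 + [20.0]
    print(striped_throughput_mb_s(healthy))    # 800.0
    print(striped_throughput_mb_s(degraded))   # 160.0
```

Under these assumptions a single degraded disk pulls the whole stripe down to the slow disk's pace, which is one way to read why faulty disks "(can) affect performance" well beyond their own share of the data.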
Questions?