Maximising job throughput using Hyper-Threading

Alastair Dewhurst, Dimitrios Zilaskos (RAL Tier1)

Acknowledgements: RAL Tier1 team, especially John Kelly and James Adams

RAL Tier1 Setup

The RAL Tier1 batch farm consists of several multi-core, hyper-threading-capable CPUs. Increases in the amount of memory per node, combined with experience from other sites, made hyper-threading an attractive option for increasing job throughput. RAL supports all LHC VOs, the prime users being ATLAS, CMS and LHCb, and 10% of resources are devoted to non-LHC VOs. The virtual cores provided by hyper-threading could double the batch farm capacity, but the amount of memory available in the batch nodes did not permit that: LHC jobs require more than 2 GB of RAM to run smoothly.

With all HT cores enabled, total job slot capacity could have doubled. In practice, memory constraints limited the increase to about 30%, as the sketch below illustrates.
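The capacity arithmetic can be written down directly. A minimal sketch, in Python, with illustrative node figures (the function name and the numbers are ours, not RAL's actual node specifications):

```python
def usable_job_slots(virtual_cores, ram_gb, ram_per_job_gb=2.0):
    """Job slots are capped by whichever runs out first:
    virtual cores, or memory at ram_per_job_gb per job."""
    return min(virtual_cores, int(ram_gb // ram_per_job_gb))

# Illustrative numbers only: a 2-CPU, 6-core node exposes 24 virtual
# cores with HT, but with 32 GB of RAM only 16 jobs fit at ~2 GB each,
# a ~33% gain over the 12 physical cores rather than the 100% the
# virtual cores alone would suggest.
print(usable_job_slots(virtual_cores=24, ram_gb=32))  # -> 16
```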

Method

• Each generation of batch-farm hardware with hyper-threading capability was benchmarked with HEPSPEC, progressively increasing the number of threads up to the total number of virtual cores.
• Benchmarks at that time were conducted under Scientific Linux 5. Scientific Linux 6 benchmarks were run later, as the batch farm was due to be updated.
• Scientific Linux 6 performed slightly better than Scientific Linux 5; the overall trend was identical.
• Power, temperature, disk I/O and batch-server performance were closely monitored.
• The results showed a nearly linear increase in HEPSPEC scores, flattening at about 14 threads for 2-CPU, 4-core nodes and 20 threads for 2-CPU, 6-core nodes (see the sketch after this list).
• The sweet spots thus revealed were then configured on the batch farm, to verify that production VO jobs performed optimally there.
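The flattening point can be picked out of the sweep data programmatically. A minimal sketch with made-up (threads, HEPSPEC) pairs shaped like the measured curve of a 2-CPU, 4-core node; the cutoff of 1 HEPSPEC per extra thread is an assumption of ours, not the criterion actually used:

```python
def sweet_spot(sweep, min_gain_per_thread=1.0):
    """Return the highest thread count at which adding threads still
    pays off, i.e. the marginal HEPSPEC gain per extra thread is at
    least min_gain_per_thread. `sweep` is a list of (threads, hepspec)
    pairs sorted by thread count."""
    best = sweep[0][0]
    for (t0, s0), (t1, s1) in zip(sweep, sweep[1:]):
        if (s1 - s0) / (t1 - t0) >= min_gain_per_thread:
            best = t1
    return best

# Made-up scores that flatten around 14 threads, like the
# 2-CPU, 4-core nodes described above:
sweep = [(8, 80), (10, 96), (12, 110), (14, 120), (16, 121)]
print(sweet_spot(sweep))  # -> 14
```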

[Figure: HEPSPEC results vs. number of threads (8 to 24) for the 2009-streamline, 2009-viglen, 2010-clustervision, 2010-viglen, 2011-viglen and 2011-dell generations; HEPSPEC axis 0 to 250.]

Results

• Overall, 2000 extra job slots and 9298 extra HEPSPEC were added to the batch farm using already available hardware.
• Average job time increased, as expected, but overall job throughput increased.
• Network, disk, power and temperature usage did not increase in a way that could negatively affect overall throughput or require additional measures.
• The batch server was able to handle the extra job slots.
• Of critical importance is the sharp drop in job efficiency as job slots approach the upper hyper-threading limit: real-world VO jobs would suffer if we went for full batch-farm HEPSPEC performance! (A sketch of the efficiency metric follows this list.)
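Job efficiency here is the usual batch-farm metric, CPU time divided by wall-clock time. A minimal sketch of the calculation; the record fields and values are illustrative, not the actual accounting format:

```python
def job_efficiency(jobs):
    """Aggregate CPU time / wall time over a set of finished jobs.
    Contention between HT siblings shows up as wall time growing
    faster than CPU time, i.e. lower efficiency."""
    cpu = sum(j["cpu_seconds"] for j in jobs)
    wall = sum(j["wall_seconds"] for j in jobs)
    return cpu / wall

# Illustrative records, not real accounting data:
jobs = [
    {"cpu_seconds": 17100, "wall_seconds": 17640},
    {"cpu_seconds": 16600, "wall_seconds": 17900},
]
print(round(job_efficiency(jobs), 4))  # -> 0.9482
```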

[Figure: Evolution of job efficiency as more HT cores are used, for dell-2011 and viglen-2011; efficiency axis 0.88 to 1.00 over 12 to 24 threads.]

Conclusions

• New procurements now take hyper-threading capabilities into account.
• For 2012, dual 8-core CPU systems go up to 32 virtual cores.
• Systems were procured with 128 GB of RAM in order to exploit the full hyper-threading capability (see the check below).
• Dual gigabit links for now, moving to single 10-gigabit links as they become more cost-effective.
• So far, a software RAID0 setup has proven sufficient for disk I/O.
• Performance gains are so far on par with previous generations.
• By spending a bit extra on RAM, we save more by buying fewer nodes.
• This also saves machine-room space, cables and power.
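The 128 GB figure follows from the per-job memory requirement; a quick back-of-the-envelope check (our arithmetic, based on the numbers above):

```python
# Dual 8-core CPUs expose 32 virtual cores with hyper-threading.
virtual_cores = 2 * 8 * 2      # sockets * cores per socket * HT siblings
ram_gb = 128
# 4 GB per job slot, comfortably above the >2 GB that LHC jobs
# need, so all 32 virtual cores are usable as job slots.
print(ram_gb / virtual_cores)  # -> 4.0
```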

[Figure: 2012 procurement benchmarks, HEPSPEC vs. number of threads (16 to 32) for ocf-2012 and dell-2012; HEPSPEC axis 0 to 400.]

[Figure: Available RAM (GB) under different setups: RAM per physical core, per virtual core and per job slot, for the 2008-viglen (non-HT) through 2011-dell generations; RAM axis 0 to 4 GB.]

[Figure: Comparison of physical cores (before), full HT cores, and job slots in the optimum setup (after); cores axis 0 to 12000.]

[Figure: Evolution of batch farm size, pre-2012 farm vs. 2012-present farm, in physical cores and HT cores; job-slot axis 0 to 16000.]

[Figure: Physical cores (before) compared to HT cores, per hardware generation (2008-viglen non-HT through 2011-dell); axis 0 to 1400.]

[Figure: HEPSPEC % increase with full HT, per hardware generation (2009-streamline through 2011-dell); axis 0% to 25%.]

| Make   | Job slots per WN | Efficiency | Average job length (mins) | Standard deviation (mins) | Number of jobs |
|--------|------------------|------------|---------------------------|---------------------------|----------------|
| Dell   | 12               | 0.9715     | 297                       | 370                       | 19065          |
| Viglen | 12               | 0.9757     | 320                       | 390                       | 23864          |
| Dell   | 14               | 0.9719     | 238                       | 326                       | 6118           |
| Viglen | 14               | 0.9767     | 270                       | 341                       | 11249          |
| Dell   | 16               | 0.9859     | 343                       | 254                       | 6550           |
| Viglen | 16               | 0.9850     | 304                       | 249                       | 8756           |
| Dell   | 18               | 0.9781     | 377                       | 390                       | 5014           |
| Viglen | 18               | 0.9808     | 350                       | 391                       | 6263           |
| Dell   | 20               | 0.9758     | 318                       | 346                       | 11339          |
| Viglen | 20               | 0.9756     | 260                       | 285                       | 11229          |
| Dell   | 22               | 0.9747     | 387                       | 315                       | 6317           |
| Viglen | 22               | 0.9783     | 305                       | 236                       | 6307           |
| Dell   | 24               | 0.9257     | 544                       | 373                       | 6650           |
| Viglen | 24               | 0.9311     | 372                       | 278                       | 6713           |
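The table also lets the throughput claim be checked directly: completed jobs per hour per worker node is roughly the slot count divided by the average job length. A rough sketch using three of the Dell rows above (the job mix differs between rows, so this is only indicative):

```python
def jobs_per_hour(slots, avg_job_mins):
    """Rough per-node throughput: each slot completes one job
    every avg_job_mins minutes on average."""
    return slots * 60.0 / avg_job_mins

# (slots, efficiency, average job length in mins) from the Dell rows:
for slots, eff, mins in [(12, 0.9715, 297), (16, 0.9859, 343), (24, 0.9257, 544)]:
    print(f"{slots} slots: {jobs_per_hour(slots, mins):.2f} jobs/hour, efficiency {eff}")
```

On these rows, throughput peaks around 16 slots (about 2.8 jobs/hour versus 2.4 at 12 and 2.6 at 24), consistent with running at the configured sweet spots rather than at the full 24-thread hyper-threading limit.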