Why it might be interesting to look at ARM
Ben Couturier, Vijay Kartik, Niko Neufeld, PH-LBC
SFT Technical Group Meeting 08/10/2012
The challenge for LHCb
• Major upgrade during LS2
• Read out detector at bunch-crossing rate: 40 MHz
• No more hardware-based trigger: need to filter 40 million events/s (32 Tbit/s) in software
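The event rate and the aggregate bandwidth quoted on this slide together imply an average event size. A minimal sketch of that arithmetic, assuming decimal (SI) prefixes; the constant and function names are illustrative only:

```python
# Figures from the slide; decimal (SI) prefixes are an assumption.
EVENT_RATE_HZ = 40e6          # bunch-crossing rate: 40 MHz
TOTAL_BANDWIDTH_BPS = 32e12   # aggregate readout bandwidth: 32 Tbit/s


def implied_event_size_bytes(rate_hz, bandwidth_bps):
    """Average event size in bytes implied by a rate and a bandwidth."""
    bits_per_event = bandwidth_bps / rate_hz
    return bits_per_event / 8


event_size = implied_event_size_bytes(EVENT_RATE_HZ, TOTAL_BANDWIDTH_BPS)
print(f"Implied average event size: {event_size / 1e3:.0f} kB")
```

Under these assumptions the average event is on the order of 100 kB.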
Why look at ARM? N. Neufeld
Dataflow
Diagram stages: Detector, 100 m rock, Readout Units, DAQ network, Compute Units
• GBT: custom radiation-hard link over MMF, 3.2 Gbit/s (about 10000 links)
• Input into DAQ network (10/40 Gigabit Ethernet or FDR IB) (1000 to 4000 links)
• Output from DAQ network into compute unit clusters (100 Gbit Ethernet / EDR IB) (200 to 400 links)
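The link counts and per-link rates on the dataflow slide can be cross-checked against the 32 Tbit/s target. The sketch below uses nominal line rates; pairing the 4000-link count with 10 Gbit/s and the 1000-link count with 40 Gbit/s is an assumption, not something the slide states, and the labels are illustrative:

```python
# (number of links, nominal Gbit/s per link); the pairings are assumptions.
LINK_STAGES = {
    "GBT links": (10_000, 3.2),
    "DAQ input, 10 Gbit/s option": (4_000, 10.0),
    "DAQ input, 40 Gbit/s option": (1_000, 40.0),
    "DAQ output, 200 links": (200, 100.0),
    "DAQ output, 400 links": (400, 100.0),
}


def aggregate_tbps(n_links, gbps_per_link):
    """Aggregate nominal capacity of a stage in Tbit/s."""
    return n_links * gbps_per_link / 1000.0


for name, (n_links, gbps) in LINK_STAGES.items():
    print(f"{name:30s} {aggregate_tbps(n_links, gbps):6.1f} Tbit/s")
```

With these nominal numbers every stage lands between 20 and 40 Tbit/s, i.e. the same order as the detector output.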
What will be the Compute Unit?
• A compute unit is a destination for the event-data fragments from the readout units
• It assembles the fragments into a complete “event” and runs various selection algorithms on this event
• About 0.1 % of events are retained
• Baseline option: a high-density server platform (mainboard with standard CPUs). Using Moore’s law and some estimates on the algorithms, we need 4000 to 5000 servers of the 2018 type!
• Baseline could possibly be augmented with a co-processor card (like Intel MIC or a GPU): lots of interest from various groups
• Alternative 1: Use lower-power, cheaper x86 processors such as Intel Atom, AMD
– Optimize HEPSpec/CHF/W
• Alternative 2: Or use non-Intel processors. Try to profit from the highly competitive and innovative market for processors for portable devices: ARM
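The retention fraction and the server estimate translate into rates per farm and per server. A small sketch, assuming the input is spread evenly over the servers (the even split and all names are illustrative assumptions):

```python
INPUT_RATE_HZ = 40e6        # events/s entering the farm (from the slides)
RETAINED_FRACTION = 0.001   # about 0.1 % of events are kept
SERVER_COUNTS = (4000, 5000)


def output_rate_hz(input_rate_hz, retained_fraction):
    """Rate of events that survive the software selection."""
    return input_rate_hz * retained_fraction


def per_server_rate_hz(input_rate_hz, n_servers):
    """Input rate one server must sustain if load is spread evenly."""
    return input_rate_hz / n_servers


print(f"Retained: {output_rate_hz(INPUT_RATE_HZ, RETAINED_FRACTION) / 1e3:.0f} kHz")
for n in SERVER_COUNTS:
    print(f"{n} servers: {per_server_rate_hz(INPUT_RATE_HZ, n):.0f} events/s each")
```

So each 2018-type server would have to process roughly 8000 to 10000 events per second.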
ARM
• A “pure” RISC architecture (with some enhancements)
• A long tradition in the embedded market
• Billions of cores sold
– in many variants
– # cores / power vs performance
• Produced by various licensees
• Has a reputation of the best power-efficiency in the market
• We are here: 32-bit, IEEE floats, SIMD, native Java, offload
• Announced: 64-bit, SIMD with DP floats
So what would a compute unit look like?
Operational constraints
• The Online farms are very big
– O(2000) servers, of different generations, vendors
• Like a traditional data-centre with all the problems, and very few administrators, and some simplifications:
– A single client
– In Online operation at least, mostly a single work-load
• But want rack-mountable, remote-manageable, good mechanics, decent powering, vendor support etc. and of course low cost!
– Don’t want to build this ourselves: needs to fit in traditional data-centre structure
Embedded in the data-centre
• Boston Viridis (projects also from DELL and HP)
• Consists of 48 SoCs
• 4 cores, 4 GB RAM per SoC
• ARM Cortex-A9, 1.4 GHz
• 80 Gb Ethernet switch
• Total 192 cores / 192 GB RAM / 300 Watt
• Exists also from DELL/HP
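The chassis totals follow directly from the per-SoC figures. A minimal sketch; the class and field names are hypothetical and only encode the numbers given on the slide:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chassis:
    """Per-chassis figures as quoted on the slide (names are illustrative)."""
    n_soc: int = 48
    cores_per_soc: int = 4
    ram_gb_per_soc: int = 4
    power_w: float = 300.0

    @property
    def total_cores(self):
        return self.n_soc * self.cores_per_soc

    @property
    def total_ram_gb(self):
        return self.n_soc * self.ram_gb_per_soc

    @property
    def watts_per_core(self):
        return self.power_w / self.total_cores


box = Chassis()
print(box.total_cores, "cores,", box.total_ram_gb, "GB RAM,",
      f"{box.watts_per_core:.2f} W per core")
```

This reproduces the 192 cores and 192 GB RAM quoted above, at about 1.6 W per core.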
How fast is a core?
So we’ll need many
Is it worth it?
• ARM v7: 192 cores need 300 W and 2 U for about 520 HEPSpecs
• X5650: 96 hyperthreads need about 1400 W and 2 U for about 900 HEPSpecs
• If this ratio continues to hold into 2018, LHCb could do the upgrade with a 600 kW data-centre instead of a new (!) 2 MW one
• And maybe at some point we need to pay for the power
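The comparison above rests on HEPSpec per watt and per core. A short sketch of both ratios using only the figures quoted on this slide; the dictionary layout and function names are illustrative:

```python
# Figures quoted on the slide for one 2 U system of each kind.
SYSTEMS = {
    "ARM v7": {"hepspec": 520.0, "power_w": 300.0, "threads": 192},
    "X5650": {"hepspec": 900.0, "power_w": 1400.0, "threads": 96},
}


def hepspec_per_watt(system):
    return system["hepspec"] / system["power_w"]


def hepspec_per_thread(system):
    return system["hepspec"] / system["threads"]


for name, system in SYSTEMS.items():
    print(f"{name}: {hepspec_per_watt(system):.2f} HEPSpec/W, "
          f"{hepspec_per_thread(system):.2f} HEPSpec per core or hyperthread")

efficiency_ratio = (hepspec_per_watt(SYSTEMS["ARM v7"])
                    / hepspec_per_watt(SYSTEMS["X5650"]))
print(f"ARM advantage per watt: {efficiency_ratio:.1f}x")
```

Each ARM core is several times slower than an x86 hyperthread, so many more are needed, yet the system as a whole delivers noticeably more HEPSpec per watt.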
The acid test
• HEPSpec is not necessarily a good test for Online usage
– Online we (currently) run n instances of the same application in parallel, where n is the number of cores/hyperthreads
– No “mixed” work-load: hyperthreading typically adds more in the Online “mono-culture”
• Need to benchmark using the High Level Trigger code
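The Online pattern described above, n identical processes running side by side, can be mimicked with a small harness. This is only a sketch: the workload string is a hypothetical CPU-bound stand-in for the High Level Trigger application, and the function name and result keys are invented for illustration.

```python
import os
import subprocess
import sys
import time

# Hypothetical stand-in for one trigger process: burn CPU per "event".
WORKLOAD = (
    "import sys\n"
    "n_events = int(sys.argv[1])\n"
    "acc = 0\n"
    "for i in range(n_events * 1000):\n"
    "    acc += i * i\n"
)


def run_parallel_instances(n_instances, events_per_instance):
    """Start n identical processes, wait for all, report total throughput."""
    start = time.perf_counter()
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKLOAD, str(events_per_instance)]
        )
        for _ in range(n_instances)
    ]
    exit_codes = [proc.wait() for proc in procs]
    elapsed = time.perf_counter() - start
    if any(code != 0 for code in exit_codes):
        raise RuntimeError(f"an instance failed: {exit_codes}")
    total_events = n_instances * events_per_instance
    return {
        "instances": n_instances,
        "elapsed_s": elapsed,
        "events_per_s": total_events / elapsed,
    }


if __name__ == "__main__":
    # One instance per core/hyperthread, as in the Online farm.
    n = os.cpu_count() or 1
    print(run_parallel_instances(n, events_per_instance=200))
```

Comparing `events_per_s` for n equal to the number of physical cores against n equal to the number of hyperthreads gives a rough feel for what hyperthreading adds under a uniform work-load.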
Project: “Moore on ARM”
• Need to compile the LHCb software-stack (beginning from ROOT)
• Can compare with natively compiled code: everything works fine on the FC17 test-node, but compilation is slow
– ROOT 5.34.02: ./configure linuxarm --enable-c++11; make -j 4 takes 30m43s
• Team (part-time only): Ben Couturier, Vijay Kartik, Niko Neufeld
Future plans
• Cross-compiler chain ready
• Will now go on to compile the stack
• Verification and benchmarking
• Then: full-scale test on a fully loaded 192-core system (with a faster ARM: currently use A8, will have A9 or A15), possibly including real network input (for fun)