Farm Completion

Beat Jost and Niko Neufeld
LHCb Week, St. Petersburg, June 2010


Filling the farm

• Thanks for interesting and useful discussions to:
  – Loic Barda, Rolf Lindner, Laurent Roy and Eric Thomas

• Thanks for measurements and plots to:
  – Juan Caicedo and Patrick Robbe

Farm Completion St. Petersburg 06/2010 - Niko Neufeld 2

The three limits: Power, Cooling, Money

• Power: 550 kW available (105 kW used)

• Cooling: 525 kW nominally available

• Rack space: 1700 U (plenty)

• Money: xx MCHF
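To see which of the limits binds first, the headroom can be worked out from the figures above. This is a minimal sketch resting on one assumption of ours: since the slide gives a "used" figure only for power, it assumes the 105 kW already drawn is also removed by the same 525 kW cooling plant.

```python
# Figures from the "three limits" slide, in kW.
POWER_AVAILABLE_KW = 550
POWER_USED_KW = 105
COOLING_NOMINAL_KW = 525


def headroom_kw(power_available, power_used, cooling_nominal):
    """Return (power_headroom, cooling_headroom) in kW.

    ASSUMPTION: the load already drawn is also dissipated through the same
    cooling plant, so it is subtracted from both limits.
    """
    return power_available - power_used, cooling_nominal - power_used


if __name__ == "__main__":
    power_left, cooling_left = headroom_kw(
        POWER_AVAILABLE_KW, POWER_USED_KW, COOLING_NOMINAL_KW
    )
    binding = "cooling" if cooling_left < power_left else "power"
    print(f"power headroom:   {power_left} kW")    # 445 kW
    print(f"cooling headroom: {cooling_left} kW")  # 420 kW
    print(f"usable for new servers: {min(power_left, cooling_left)} kW ({binding}-limited)")
```

Under this assumption cooling, not power, is the binding limit (420 kW). If the existing 105 kW were cooled separately, power would bind instead, at 445 kW.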

Event Filter Farm

• Level 1:
  – 100 SuperMicro Twin servers (2 servers in a single 1U chassis with a shared power supply), Intel Harpertown CPU 5420 (2.5 GHz), 4 cores / socket, 1 GB RAM / core

• Level 2:
  – 350 DELL blade servers (up to 16 blades in a 10U chassis), Intel Harpertown CPU 5420 (2.5 GHz), 4 cores / socket, 2 GB RAM / core
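The size of the existing farm follows from these figures. This sketch makes two assumptions that the slide does not state: every server is dual-socket (consistent with today's 8 Moore jobs per node, since 16 jobs is described in the conclusions as twice today's number), and the 100 / 350 figures count individual servers rather than chassis.

```python
from dataclasses import dataclass


@dataclass
class ServerGroup:
    name: str
    servers: int            # ASSUMPTION: count of servers, not chassis
    sockets: int            # ASSUMPTION: dual-socket nodes
    cores_per_socket: int   # Harpertown E5420: 4 cores / socket (slide)
    ram_gb_per_core: float  # from the slide

    @property
    def cores(self) -> int:
        return self.servers * self.sockets * self.cores_per_socket

    @property
    def ram_gb(self) -> float:
        return self.cores * self.ram_gb_per_core


FARM = [
    ServerGroup("Level 1 (SuperMicro Twin)", 100, 2, 4, 1.0),
    ServerGroup("Level 2 (DELL blades)", 350, 2, 4, 2.0),
]

if __name__ == "__main__":
    for g in FARM:
        print(f"{g.name}: {g.cores} cores, {g.ram_gb:.0f} GB RAM")
    print(f"total: {sum(g.cores for g in FARM)} cores")  # 3600 cores
```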

The new farm node

• Both Intel and AMD have released new processors with up to 12 cores per chip and hyper-threads (Intel only; a.k.a. virtual CPUs)

• Memory has (again) become faster and cheaper (DDR3), and each (Intel) processor has 3 memory channels (“good” memory configuration = 3 × n, where n = 2, 4, 8, 16)

• Both processors are now NUMA (non-uniform memory access)
  – A study programme is ongoing to take advantage of this
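The "3 × n" rule can be enumerated for a dual-socket server. The reading here is ours: n is taken to be the DIMM size in GB, with one identical DIMM on each of the 3 channels per socket. Under that reading the rule yields exactly the 24 GB (expandable to 96 GB) quoted for the candidate server later in the talk.

```python
CHANNELS_PER_SOCKET = 3        # Intel 56xx, per the slide
DIMM_SIZES_GB = (2, 4, 8, 16)  # ASSUMPTION: our reading of "n = 2, 4, 8, 16"


def balanced_configs(sockets: int = 2):
    """Yield (dimm_size_gb, total_gb) with one DIMM on every channel.

    ASSUMPTION: n is the DIMM size in GB and all channels carry one DIMM
    of the same size, so each socket holds 3 * n GB.
    """
    for n in DIMM_SIZES_GB:
        yield n, sockets * CHANNELS_PER_SOCKET * n


def is_balanced(total_gb: int, sockets: int = 2) -> bool:
    """True if total_gb matches one of the balanced configurations."""
    return any(total == total_gb for _, total in balanced_configs(sockets))


if __name__ == "__main__":
    for n, total in balanced_configs():
        print(f"{n:>2} GB DIMMs -> {total} GB per dual-socket server")
    print(is_balanced(24), is_balanced(32))  # True False
```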

How many jobs / server

How fast?

Server specifications

• 1 GB RAM per hardware thread (= virtual core)

• A single power-supply failure should not affect more than 2 units

• 2 Gigabit Ethernet ports

• No constraints on power consumption

• CPU (AMD 61xx / Intel 56xx) chosen so as to optimise Moore performance per CHF (Moore/CHF)
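These requirements amount to a filter (the hard constraints) followed by an argmax (Moore/CHF). This sketch is not the actual procurement procedure. The `ServerOffer` fields, the idea of one measured Moore rate per offer and its unit are all hypothetical. The usage example takes one server of the candidate chassis presented in this talk and assumes its 12 cores expose 24 hardware threads.

```python
from dataclasses import dataclass


@dataclass
class ServerOffer:
    """One server as described in a (hypothetical) vendor offer."""
    name: str
    hw_threads: int                    # hardware threads == virtual cores
    ram_gb: float
    gbe_ports: int
    servers_lost_per_psu_failure: int  # 0 with fully redundant supplies
    price_chf: float
    moore_rate: float = 0.0            # Moore performance measured on the sample server


def meets_spec(s: ServerOffer) -> bool:
    """Check the hard requirements from the slide.

    Power consumption is deliberately not checked ("no constraints").
    """
    return (
        s.ram_gb >= 1.0 * s.hw_threads           # 1 GB RAM per hardware thread
        and s.servers_lost_per_psu_failure <= 2  # one PSU failure hits <= 2 units
        and s.gbe_ports >= 2                     # 2 Gigabit Ethernet ports
    )


def best_moore_per_chf(offers):
    """Among compliant offers, return the one with the highest Moore/CHF."""
    compliant = [s for s in offers if meets_spec(s)]
    if not compliant:
        raise ValueError("no offer meets the specification")
    return max(compliant, key=lambda s: s.moore_rate / s.price_chf)


if __name__ == "__main__":
    # One server of the candidate chassis: 12 cores, ASSUMED to expose
    # 24 hardware threads (Intel hyper-threading), 24 GB, 2 x GbE,
    # redundant PSUs; price = 21 kCHF list price / 4 servers.
    candidate = ServerOffer("candidate", hw_threads=24, ram_gb=24,
                            gbe_ports=2, servers_lost_per_psu_failure=0,
                            price_chf=21_000 / 4)
    print(meets_spec(candidate))  # True
```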

A likely candidate

• 1.2 kW
  – redundant power supplies

• 4 servers, each with
  – 12 cores
  – 24 GB RAM (up to 96 GB)
  – 1 HDD
  – 2 × Gigabit Ethernet

• 21 kCHF list price
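The candidate's figures translate into per-unit numbers, and into how many chassis a given power or cooling headroom could hold. This sketch assumes 1.2 kW is the worst-case draw of a fully loaded chassis. The two headroom values are 550 kW minus the 105 kW in use, and 525 kW minus the same 105 kW if that load is also cooled by the same plant (our assumption).

```python
# Figures for one candidate chassis, taken from the slide.
CHASSIS_POWER_W = 1200
SERVERS_PER_CHASSIS = 4
CORES_PER_SERVER = 12
LIST_PRICE_CHF = 21_000


def per_unit_figures():
    """Per-server and per-core list price (CHF) and per-server power (W)."""
    cores = SERVERS_PER_CHASSIS * CORES_PER_SERVER
    return {
        "chf_per_server": LIST_PRICE_CHF / SERVERS_PER_CHASSIS,
        "chf_per_core": LIST_PRICE_CHF / cores,
        "watts_per_server": CHASSIS_POWER_W / SERVERS_PER_CHASSIS,
    }


def chassis_within(headroom_w: int) -> int:
    """Whole chassis that fit in a given power or cooling headroom (W).

    ASSUMPTION: 1.2 kW is the worst-case draw of a fully loaded chassis.
    Integer watts avoid floating-point rounding at exact multiples.
    """
    return headroom_w // CHASSIS_POWER_W


if __name__ == "__main__":
    print(per_unit_figures())
    # {'chf_per_server': 5250.0, 'chf_per_core': 437.5, 'watts_per_server': 300.0}
    for label, headroom_w in (("power", 445_000), ("cooling", 420_000)):
        n = chassis_within(headroom_w)
        print(f"{label}: {n} chassis = {n * SERVERS_PER_CHASSIS} servers")
```

Rack space is not checked here because the chassis height is not given; with 1700 U available it is described as plentiful anyway.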

Conclusions

• We will run with 16 Moore jobs / server (twice as many as today)

• Each server will be 2 to 2.5× faster than the current HLT node

• Each Moore instance can use up to 1.5 GB RAM
  – If more RAM is really needed:
    1. Reduce the number of jobs
    2. Increase (double) the memory
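These conclusions follow from simple arithmetic. First, 16 jobs × 1.5 GB = 24 GB, the candidate server's installed memory. Second, a server that is 2 to 2.5× faster while running twice as many jobs implies each job runs at 1.0 to 1.25× today's per-job speed, if "faster" refers to aggregate throughput. The sketch below makes several assumptions of ours: 24 hardware threads (12 cores with hyper-threading) as a cap on jobs, a hypothetical 2 GB per job to illustrate the two remedies, and no allowance for OS memory.

```python
import math

# Candidate server figures from the slides.
RAM_PER_SERVER_GB = 24
HW_THREADS_PER_SERVER = 24  # ASSUMPTION: 12 cores x 2 hyper-threads
MOORE_RAM_GB = 1.5          # per-instance limit from the conclusions
JOBS_TODAY = 8              # 16 jobs is "twice as many as today"


def max_jobs(ram_gb: float, per_job_gb: float, hw_threads: int) -> int:
    """Jobs that fit in RAM, capped at one job per hardware thread.

    Simplification: ignores memory used by the OS and any pages shared
    between jobs.
    """
    return min(math.floor(ram_gb / per_job_gb), hw_threads)


def per_job_speedup(server_speedup: float, jobs_new: int, jobs_old: int) -> float:
    """Throughput of one new job relative to one job on today's node,
    assuming the server speed-up refers to aggregate throughput."""
    return server_speedup * jobs_old / jobs_new


if __name__ == "__main__":
    print(max_jobs(RAM_PER_SERVER_GB, MOORE_RAM_GB, HW_THREADS_PER_SERVER))  # 16
    # Hypothetical 2 GB per job: the two remedies listed on the slide.
    print(max_jobs(RAM_PER_SERVER_GB, 2.0, HW_THREADS_PER_SERVER))      # 12 (fewer jobs)
    print(max_jobs(2 * RAM_PER_SERVER_GB, 2.0, HW_THREADS_PER_SERVER))  # 24 (doubled RAM)
    for speedup in (2.0, 2.5):
        print(per_job_speedup(speedup, 16, JOBS_TODAY))  # 1.0, then 1.25
```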

Procedure / planning

Step                                                                Duration
Decision to buy (day X)                                             0
Technical specifications to firms                                   1 week
Firms reply (with offer) / validation of sample server              4 weeks
Adjudication (negotiation)                                          1 week
Delivery (in batches if possible; installation starts on arrival)   6 weeks
Finishing installation                                              1 week
Farm Level 3 in production                                          13 weeks after initial decision
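The durations add up to the quoted total (0 + 1 + 4 + 1 + 6 + 1 = 13 weeks). A minimal sketch that turns the table into cumulative milestones, assuming the steps are strictly sequential as the table implies:

```python
from itertools import accumulate

# (step, duration in weeks) from the planning table, in order.
STEPS = [
    ("Decision to buy (day X)", 0),
    ("Technical specifications to firms", 1),
    ("Firms reply (with offer) / validation of sample server", 4),
    ("Adjudication (negotiation)", 1),
    ("Delivery (installation starts on arrival)", 6),
    ("Finishing installation", 1),
]


def milestones(steps):
    """Return (step, week in which it completes), counted from day X.

    ASSUMPTION: steps run strictly one after another; the overlap of
    installation with delivery is already folded into the durations.
    """
    ends = accumulate(weeks for _, weeks in steps)
    return [(name, end) for (name, _), end in zip(steps, ends)]


if __name__ == "__main__":
    for name, week in milestones(STEPS):
        print(f"week {week:>2}: {name}")
    # The last line reads "week 13: Finishing installation".
```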

To-do list

Hardware

• Unpacking (at the surface, in SX8: needs a lot of space and friendly volunteers)

• Installation in D1
  – Power, network

• Burn-in (3 days)

• Exchange faulty servers / parts

Software

• Install OS, verify OS tuning (NIC, memory arrangement, etc.)

• Integrate into software management (Quattor)

• Add to farm control

DETAILS

How fast? (Moore v9r2 HLT1 only)
