
Design & Management of the JLAB Farms

Ian Bird, Jefferson Lab, May 24, 2001

FNAL LCCWS

Overview

• JLAB clusters
  – Aims
  – Description
  – Environment
• Batch software
• Management
  – Configuration
  – Maintenance
  – Monitoring
• Performance monitoring
• Comments

Clusters at JLAB - 1

• Farm
  – Supports experiments – reconstruction, analysis
  – 250 (→ 320) Intel Linux CPUs (+ 8 Sun Solaris)
    • 6400 → 8000 SPECint95
  – Goals:
    • Provide 2 passes of 1st-level reconstruction at the average incoming data rate (10 MB/s)
    • (More recently) provide an analysis, simulation, and general batch facility
  – Systems
    • First phase (1997) was 5 dual Ultra2 + 5 dual IBM 43p
    • 10 dual Linux (PII 300) acquired in 1998
    • Currently 165 dual PII/III (300, 400, 450, 500, 750 MHz, 1 GHz)
      – ASUS motherboards, 256 MB, ~40 GB SCSI or IDE, 100 Mbit
      – First 75 systems are towers, 50 are 2U rackmount, 40 are 1U (½U?)
• Interactive front-ends
  – Sun E450s and 4-processor Intel Xeon servers (2 of each), 2 GB RAM, Gigabit Ethernet

Intel Linux Farm

• First purchases: 9 duals per 24” rack
• Last summer: 16 duals (2U) + 500 GB cache (8U) per 19” rack
• Recently: 5 TB IDE cache disk (5 × 8U) per 19” rack

Clusters at JLAB - 2

• Lattice QCD cluster(s)
  – Existing clusters – in collaboration with MIT, at JLAB:
    • Compaq Alpha
      – 16 XP1000 (500 MHz 21264), 256 or 512 MB, 100 Mbit
      – 12 dual UP2000 (667 MHz 21264), 256 MB, 100 Mbit
    • All have Myrinet interconnect
    • Front-end (login) machine has Gigabit Ethernet and a 400 GB fileserver for data staging and MIT–JLAB transfers
  – Anticipated (funded):
    • 128 CPUs (June 2001), Alpha or P4(?) in 1U
    • 128 CPUs (Dec/Jan?) – identical to the first 128
    • Myrinet interconnect

LQCD Clusters

16 single Alpha 21264, 1999

12 dual Alpha (Linux Networks), 2000

Environment

• JLAB has a central computing environment (CUE)
  – NetApp fileservers – NFS & CIFS
    • Home directories, group (software) areas, etc.
  – Centrally provided software applications
• Available in:
  – General computing environment
  – Farms and clusters
  – Managed desktops
• Compatibility between all environments – home and group areas available in the farm, library compatibility, etc.
• Locally written software provides access to the farm (and mass storage) from any JLAB system
• Campus network backbone is Gigabit Ethernet, with 100 Mbit to physicist desktops and OC-3 to ESnet

Jefferson Lab Mass Storage and Farm Systems (2001)

[Diagram: the batch and interactive farm connected to work file servers (10 TB, RAID 5), farm cache file servers (4 × 400 GB), DST/cache file servers (15 TB, RAID 0), a DB server, and tape servers; data arrives from the CLAS DAQ and the Hall A/C DAQ over 100 Mbit/s and 1000 Mbit/s links, with FC-AL and SCSI connections to storage.]

Lattice QCD Metacenter

[Diagram: JLAB cluster – dual Alpha 21264 nodes (qty. 64, growing to 128) plus 2 interactive duals with 400 GB of disk, on a 128-port Myrinet switch and a cluster Ethernet switch (100 Mbit to nodes, Gigabit uplink); connected through a Cisco CAT 5500 switch (with an OC-3 link to ESnet) to other JLAB computers and the mass storage system – a quad Sun E4000 with 270 GB and 342 GB of staging disk, STK Redwood and STK 9840 tape drives (300 TB mass storage system), and two MetaStore SH7400 file servers (1 TB each). MIT cluster – quad Alpha 21264 nodes (qty. 12) plus 2 interactive Alpha 21164s with 296 GB of disk, on a 16-port Myrinet switch and cluster Ethernet switch; connected via an Alantec Ethernet switch to an AlphaServer 8400 (12 CPUs, 1 TB), a 13 TB DLT library, a Sun E6000 (10 CPUs, 120 GB), and other MIT computers. Development clusters are also shown, including a dual-Pentium RAID file server.]

Batch Software

• Farm
  – Uses LSF (v4.0.1)
    • Pricing now acceptable
  – Manage resource allocation with:
    • Job queues
      – Production (reconstruction, etc.)
      – Low-priority (for simulations), high-priority (short jobs)
      – Idle (pre-emptable)
    • User + group allocations (shares)
      – Make full use of hierarchical shares – allows a single, undivided cluster to be used efficiently by many groups (e.g. the share-tree sketch below)
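The original slide's share-tree example did not survive extraction. Below is a minimal, illustrative Python sketch of the idea behind hierarchical shares – a tree of groups, each with a share weight, whose normalized weights multiplied along the path to a leaf give that group's fraction of the cluster. The group names and numbers are invented for illustration and are not the actual JLAB or LSF configuration.

```python
# Illustrative only: hierarchical share tree, not LSF's actual configuration or
# scheduling algorithm. Group names and weights are hypothetical.

SHARE_TREE = {
    "halls": (70, {          # (weight, children)
        "hall_a": (30, {}),
        "hall_b": (50, {}),
        "hall_c": (20, {}),
    }),
    "theory": (20, {}),
    "other": (10, {}),
}

def cluster_fractions(tree, parent_fraction=1.0, prefix=""):
    """Walk the tree and return {group_path: fraction_of_cluster} for each leaf."""
    total = sum(weight for weight, _ in tree.values())
    fractions = {}
    for name, (weight, children) in tree.items():
        frac = parent_fraction * weight / total
        path = f"{prefix}/{name}" if prefix else name
        if children:
            fractions.update(cluster_fractions(children, frac, path))
        else:
            fractions[path] = frac
    return fractions

if __name__ == "__main__":
    for group, frac in sorted(cluster_fractions(SHARE_TREE).items()):
        print(f"{group:20s} {frac:5.1%}")
```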

Batch software - 2

• Users do not use LSF directly; they use a Java client (jsub) that:
  – Is available from any machine (does not need LSF)
  – Provides missing functionality, e.g.:
    • Submit 1000 jobs in 1 command
    • Fetches files from tape, pre-staging them before the job is queued for execution (don't block the farm with jobs waiting for data)
      – Ensures efficient retrieval of files from tape – e.g. sort 1000 files by tape and by file number on tape (see the sketch below)
  – Web interface (via servlet) to monitor job status and progress (as well as host, queue, etc.)
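A minimal sketch of the kind of tape-ordered sort described above. The record fields (tape label and file number on tape) are assumed to come from the mass-storage catalogue; their layout here is hypothetical.

```python
# Illustrative sketch: order a large staging request by tape, then by position on
# tape, so each tape is mounted once and read sequentially. The record fields are
# hypothetical stand-ins for whatever the mass-storage catalogue actually provides.

from collections import defaultdict
from typing import NamedTuple

class StageRequest(NamedTuple):
    path: str      # file to stage
    tape: str      # tape volume label
    file_no: int   # position of the file on that tape

def order_for_staging(requests):
    """Group requests per tape, each group sorted by position on the tape."""
    by_tape = defaultdict(list)
    for req in requests:
        by_tape[req.tape].append(req)
    return {tape: sorted(reqs, key=lambda r: r.file_no)
            for tape, reqs in sorted(by_tape.items())}

if __name__ == "__main__":
    reqs = [StageRequest("/mss/run1017.dat", "VOL042", 7),
            StageRequest("/mss/run1016.dat", "VOL042", 3),
            StageRequest("/mss/run1101.dat", "VOL077", 1)]
    for tape, ordered in order_for_staging(reqs).items():
        print(tape, [r.path for r in ordered])
```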

[Screenshots: web-interface views of job status and host status.]

Batch software - 3

• LQCD clusters use PBS
  – JLAB-written scheduler
    • 7 stages – mimics LSF's hierarchical behaviour
  – Users access PBS commands directly
  – Web interface (portal) – authorization based on certificates
    • Used to submit jobs between the JLAB & MIT clusters

Batch software - 4

• Future
  – Combine jsub & LQCD portal features to wrap both LSF and PBS
  – XML-based job description language (see the sketch below)
  – Provide a web-interface toolkit to experiments to enable them to generate jobs based on experiment run data
  – In the context of PPDG
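The slides do not define the planned XML job description format, so the schema below is entirely hypothetical; the sketch only illustrates the wrapping idea – parse one batch-system-neutral description and emit either an LSF (`bsub`) or PBS (`qsub`) submission command.

```python
# Hypothetical sketch of an XML job description being turned into an LSF or PBS
# submission. The XML element and attribute names are invented for illustration;
# only the bsub/qsub command names and their basic options are real.

import shlex
import xml.etree.ElementTree as ET

JOB_XML = """
<job name="reco-run1017">
  <queue>production</queue>
  <command>/apps/reco/bin/recon -r 1017</command>
  <output>/work/logs/reco-run1017.log</output>
</job>
"""

def build_submit_command(xml_text, system="lsf"):
    """Return an argv list for bsub (LSF) or qsub (PBS) from the XML description."""
    job = ET.fromstring(xml_text)
    name = job.get("name")
    queue = job.findtext("queue")
    command = job.findtext("command")
    output = job.findtext("output")
    if system == "lsf":
        return ["bsub", "-J", name, "-q", queue, "-o", output] + shlex.split(command)
    elif system == "pbs":
        # qsub normally submits a script file; generating that script is omitted here.
        return ["qsub", "-N", name, "-q", queue, "-o", output]
    raise ValueError(f"unknown batch system: {system}")

if __name__ == "__main__":
    print(" ".join(build_submit_command(JOB_XML, "lsf")))
    print(" ".join(build_submit_command(JOB_XML, "pbs")))
```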

Cluster management

• Configuration
  – Initial configuration
    • Kickstart, 2 post-install scripts for configuration and software install (LSF etc.), driven by a floppy
    • Looking at PXE + DHCP (available on newer motherboards)
      – Avoids the need for a floppy – just power on
      – System working (last week)
      – Software: PXE standard bootprom (www.nilo.org/docs/pxe.html) – talks to DHCP
        » bpbatch – pre-boot shell (www.bpbatch.org) – downloads vmlinux, kickstart, etc.
    • Alphas configured “by hand + kickstart”
  – Updates etc.
    • Autorpm (especially for patches)
    • New kernels – by hand with scripts
  – OS upgrades
    • Rolling upgrades – use queues to manage the transition (a host-draining helper is sketched below)
• Missing piece:
  – Remote, network-accessible console screen access
    • Have used serial consoles, KVM switches, a monitor on a cart …
    • Linux Networks Alphas have remote power management – don’t use!
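The slides only say that queues are used to manage rolling OS upgrades; the procedure itself is not given. The sketch below is a generic, hypothetical helper for one piece of that kind of transition – closing a batch of LSF hosts, waiting for running jobs to drain, and reopening them after reinstall – using the standard `badmin` and `bjobs` commands. It is not the procedure from the slides.

```python
# Hypothetical rolling-upgrade helper: close a batch of LSF hosts so they accept
# no new jobs, wait for running jobs to drain, then (after reinstall) reopen them.
# Uses the standard LSF commands badmin hclose/hopen and bjobs; the output
# parsing is deliberately crude and the host names are invented.

import subprocess
import time

def drain_hosts(hosts, poll_seconds=300):
    """Close hosts to new jobs and block until no running jobs remain on them."""
    subprocess.run(["badmin", "hclose"] + hosts, check=True)
    while True:
        result = subprocess.run(["bjobs", "-u", "all", "-r", "-m", " ".join(hosts)],
                                capture_output=True, text=True)
        # Crude check: bjobs prints a header line plus one line per running job.
        if len(result.stdout.strip().splitlines()) <= 1:
            return
        time.sleep(poll_seconds)

def reopen_hosts(hosts):
    """Reopen hosts once they have been reinstalled and rejoined the cluster."""
    subprocess.run(["badmin", "hopen"] + hosts, check=True)

if __name__ == "__main__":
    batch = ["farm0101", "farm0102"]   # hypothetical host names
    drain_hosts(batch)
    print("Reinstall", batch, "then run reopen_hosts().")
```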

System monitoring

• Farm systems – LM78 to monitor temperatures + fans via /proc
  – This was our largest failure mode for Pentiums
  – Mon (www.kernel.org/software/mon)
    • Used extensively for all our systems – pages the “on-call”
    • For the batch farm, checks are mostly fan, temperature, ping (a minimal check of this kind is sketched below)
  – Mprime (prime-number search) has checks on memory and arithmetic integrity
    • Used in initial system burn-in
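A minimal sketch of the kind of fan/temperature check described above. The /proc paths, file layouts, and thresholds are assumptions (they depend on the kernel and lm_sensors setup); this illustrates the idea, not the actual mon configuration used at JLAB.

```python
# Illustrative health check in the spirit of the fan/temperature monitoring above.
# The /proc paths and thresholds below are assumptions; real sensor locations and
# file formats depend on the kernel, motherboard, and lm_sensors configuration.

import glob
import sys

TEMP_LIMIT_C = 60.0      # hypothetical alarm threshold
FAN_MIN_RPM = 2000.0     # hypothetical minimum fan speed

def read_values(path):
    """Return the whitespace-separated numeric fields of a sensor file."""
    with open(path) as f:
        return [float(v) for v in f.read().split()]

def check_sensors(sensor_glob="/proc/sys/dev/sensors/*"):   # assumed lm_sensors layout
    problems = []
    for chip in glob.glob(sensor_glob):
        for temp_file in glob.glob(chip + "/temp*"):
            # Assumes the current reading is the last field of the temp file.
            if read_values(temp_file)[-1] > TEMP_LIMIT_C:
                problems.append(f"over-temperature: {temp_file}")
        for fan_file in glob.glob(chip + "/fan*"):
            if read_values(fan_file)[-1] < FAN_MIN_RPM:
                problems.append(f"fan too slow: {fan_file}")
    return problems

if __name__ == "__main__":
    issues = check_sensors()
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)   # non-zero exit so a monitor (e.g. mon) can page
```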

Monitoring

Performance monitoring

• Use a variety of mechanisms
  – Publish weekly tables and graphs based on LSF statistics
  – Graphs from mrtg/rrd
    • Network performance, number of jobs, utilization, etc. (a small rrd feeder is sketched below)
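As a small illustration of feeding batch statistics into an RRD for graphing: the RRD file name, its data-source layout, and the `bqueues` column parsing below are assumptions, not the actual JLAB setup, but `bqueues` and `rrdtool update` are real commands.

```python
# Illustrative sketch: count pending/running jobs from LSF's bqueues output and
# push them into a round-robin database with rrdtool, for graphing alongside
# mrtg/rrd network data. Assumes an RRD already created with PEND/RUN data sources.

import subprocess
import time

RRD_FILE = "farm_jobs.rrd"   # hypothetical, pre-created with two data sources

def count_jobs():
    """Sum the PEND and RUN columns of `bqueues` output across all queues."""
    out = subprocess.run(["bqueues"], capture_output=True, text=True, check=True).stdout
    lines = out.strip().splitlines()
    header = lines[0].split()
    pend_col, run_col = header.index("PEND"), header.index("RUN")
    pend = sum(int(line.split()[pend_col]) for line in lines[1:])
    run = sum(int(line.split()[run_col]) for line in lines[1:])
    return pend, run

if __name__ == "__main__":
    pend, run = count_jobs()
    # rrdtool update <file> <timestamp>:<value1>:<value2>
    subprocess.run(["rrdtool", "update", RRD_FILE, f"{int(time.time())}:{pend}:{run}"],
                   check=True)
```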

Comments & Issues

• Space – very limited
  – Installing a new STK silo; moved all sysadmins out
    • Now have no admins in the same building as the machine room
  – Plans to build a new Computer Center …
• Have always been lights-out

Future

• Accelerator and experiment upgrades
  – Expect first data in 2006, full rate in 2007
  – 100 MB/s data acquisition
  – 1 – 3 PB/year (1 PB raw, > 1 PB simulated)
  – Compute clusters for:
    • Level 3 triggers
    • Reconstruction
    • Simulation
    • Analysis – PWA can be parallelized, but needs access to very large reconstructed and simulated datasets
• Expansion of LQCD clusters
  – 10 Tflops by 2005
