
RAL Site Report

HEPiX Fall 2014, Lincoln, Nebraska, 13-17 October 2014
Martin Bly, STFC-RAL


Tier1 Hardware

• CPU: ~127k HS06 (~13k cores)
• Storage: ~13 PB disk
• Tape: 10k-slot SL8500 (one of two in the system)
• FY14/15 procurement
  – Tenders ‘in flight’, closing 17th October
  – Expect to procure 6 PB and 42k HS06
  – Depends on price…
• New this time:
  – Storage capable of both Castor and Ceph
    • Extra SSDs for Ceph journals
  – 10GbE for WNs


Networking

• Tier1 LAN
  – Mesh network transfer progressing slowly
  – Phase 1 of new Tier1 connectivity enabled
  – Phase 2: move the firewall bypass and OPN links to the new router
    • Will provide a 40 Gb/s pipe to the border
  – Phase 3: 40 Gb/s redundant link from the Tier1 to the RAL site network
• RAL LAN
  – Migration to new firewalls completed
  – Migration to new core switching infrastructure almost complete
  – Sandboxed IPv6 test network available
• Site WAN
  – No changes


Network Weathermap


Virtualisation

• Issues with VMs
  – Had two production clusters with shared storage, plus several local-storage hypervisors
  – Windows Server 2008 + Hyper-V
  – Stability and migration problems on the shared storage systems
  – Migrated all services to local storage clusters
• New HV clusters
  – New configuration of networking and hardware
  – Windows Server 2012 and Hyper-V
  – Three production clusters
    • Include additional hardware with more RAM
    • Tiered storage on primary clusters


CASTOR / Storage

• Castor
  – June: upgrade to the new major version (2.1.14) with various improvements (disk rebalancing, xroot internal protocol)
    • Upgrade complete
  – New logging system with ElasticSearch (see the sketch below)
  – Draining disk servers still slow
    • Major production problem
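To illustrate the kind of query the new ElasticSearch-based logging enables, here is a minimal sketch that pulls recent Castor log entries mentioning disk-server draining. The endpoint, index pattern ("castor-*") and field names are assumptions made for the example, not details of the actual RAL deployment.

```python
import json
import urllib.request

# Hypothetical ElasticSearch endpoint and index pattern -- placeholders only.
ES_URL = "http://logserver.example.org:9200/castor-*/_search"

# Fetch up to 10 log entries from the last hour whose message mentions "draining".
query = {
    "size": 10,
    "query": {
        "bool": {
            "must": [
                {"match": {"message": "draining"}},
                {"range": {"@timestamp": {"gte": "now-1h"}}},
            ]
        }
    },
}

req = urllib.request.Request(
    ES_URL,
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    hits = json.load(resp)["hits"]["hits"]

for hit in hits:
    print(hit["_source"].get("message", ""))
```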

• Ceph
  – Evaluations continue on the small test cluster
    • SSDs for journals installed in cluster nodes (see the check sketched below)
  – Testing shows mixed performance results, needs more study
  – Large departmental resource
    • 30 servers, ~1 PB total
    • Dell R520: 8 x 4 TB SATA HDD, 32 GB RAM, 2 x E5-2403v2, 2 x 10GbE
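One quick way to confirm that the journals really live on the SSDs (for FileStore OSDs of that era) is to look at where each OSD's journal symlink points. The sketch below assumes the conventional /var/lib/ceph/osd layout; it is an illustration, not a description of the RAL test cluster.

```python
import glob
import os

# For FileStore OSDs, /var/lib/ceph/osd/ceph-N/journal is a symlink when the
# journal is on a separate device (e.g. an SSD partition) and a plain file when
# it is co-located on the data disk. Paths assume the default layout.
for osd_dir in sorted(glob.glob("/var/lib/ceph/osd/ceph-*")):
    name = os.path.basename(osd_dir)
    journal = os.path.join(osd_dir, "journal")
    if os.path.islink(journal):
        # Resolve the symlink to see which block device holds the journal.
        print("%s journal -> %s" % (name, os.path.realpath(journal)))
    else:
        print("%s journal co-located on the data disk" % name)
```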


Storage failover

• What if the local storage is unavailable?
• What if someone else’s local storage is unavailable?
• Xrootd allows remote access to data resources on demand if local data is not available (see the sketch after this list)
• At RAL, bulk data traffic bypasses the firewall
  – To/from the OPN and SJ6 for disk servers only
  – NOT WNs
• What happens at the firewall?
  – Concern for non-T1 traffic if we have a failover
• Tested with assistance from CMS
• Firewall barely notices
  – Very small setup, then transfer offload to ASICs
  – Larger test to come
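The failover idea boils down to: use the local replica if it is there, otherwise build an xrootd URL against a remote redirector and read over the WAN. The sketch below shows that decision with a hypothetical local prefix and a placeholder redirector; it is not the actual RAL or CMS configuration.

```python
import os

# Placeholder endpoints -- assumptions for the sketch, not real configuration.
LOCAL_PREFIX = "/local/storage"
REMOTE_REDIRECTOR = "root://redirector.example.org/"


def resolve(lfn):
    """Return a local path if the file is available on local storage,
    otherwise an xrootd URL via the remote redirector (the failover case)."""
    local_path = os.path.join(LOCAL_PREFIX, lfn.lstrip("/"))
    if os.path.exists(local_path):
        return local_path
    # Local copy unavailable: fall back to remote access instead of failing the job.
    return REMOTE_REDIRECTOR + lfn.lstrip("/")


if __name__ == "__main__":
    print(resolve("/store/data/example/file.root"))
```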


JASMIN/CEMS Hardware

Sept 2014 RIG

• The JASMIN super-data-cluster
  – UK and worldwide climate and weather modelling community
  – Climate and Environmental Monitoring from Space (CEMS)
  – …and all of NERC environmental sciences since JASMIN2
    • E.g. environmental genomics, mud slides, etc.
  – Facilitating further comparison and evaluation of models with data
• 12 PB Panasas storage at STFC (largest in the world)
  – Fast parallel IO to physical and VM servers
  – Largest-capacity Panasas installation in the world (204 shelves)
  – Arguably one of the top ten IO systems in the world (~250 GByte/sec)
• Virtualised and physical compute (~3500 cores)
  – Physical batch compute: “LOTUS”
  – User- and admin-provisioned cloud of virtual machines
• Data transfer over private network links to UK and worldwide sites

• 2014-15: JASMIN2 expanded from 5.5 PB to 12 PB of high-performance disc and added ~3,000 CPU cores + ~5 PB tape
  – Largest single-site Panasas deployment in the world
  – Benchmarks suggest this might be in the top ten IO systems in the world
  – Includes a large (100 servers + 400 TB NetApp VMDK storage) VMware vCloud Director cloud deployment with a custom user portal
• 1200+ 10 Gb Ethernet ports: non-blocking, zero-congestion L3 ECMP/OSPF low-latency (7.5 ms MPI) interconnect
  – One converged network for everything
  – Implementing VXLAN L2-over-L3 technology for the cloud
• Same SuperMicro servers used for batch/MPI computing and cloud/hypervisor work
  – Mellanox ConnectX3 Pro NICs do low latency for MPI and VXLAN offload for the cloud
  – Servers are all 16-core Ivy Bridge with 128 GByte RAM, some at 512 GB; all 10 Gb networking
• JASMIN3 this year will add mostly 2 TByte RAM servers and several PB of storage


Other stuff

• Shellshock
  – Patched exposed systems quickly
  – Bulk done within days
  – Long tail of systems to chase down
• Electrical ‘shutdown’ for circuit testing
  – Scheduled for January
  – Phased circuit testing
  – Tier1 will continue to operate, possibly with some reduced capacity
• Windows XP (still) banned from site networks
• New telephone system rollout complete
• Recruited a grid admin, starts soon
• Recruiting a System Admin and a Hardware Technician soon


Questions?
