
Page 1: Science DMZ at Imperial

Science DMZ at Imperial
Phil Mayers, Campus network engineering workshop

19/10/2016


Page 2: Science DMZ at Imperial

Science DMZ at Imperial
Phil Mayers <[email protected]>

Page 3: Science DMZ at Imperial

About Imperial
● 14,700 students, 8,000 staff

● Focused on science, engineering, medicine and business

● 6 major campuses in London, plus Silwood Park and medical sites

● Perhaps more centralised IT than many universities?

● Dual 2x10G connections to JANET

● Various sponsored (a.k.a. BCE) customers - NHM, Science Museum, an NHS trust

● GridPP / HEP work - close relationship with researchers

Page 4: Science DMZ at Imperial

Campus network
● Decent-sized network - ~2400 switches, ~2300 APs, 15k simultaneous wifi users, >60k devices on-net including PCs, wifi/BYOD, SCADA, VoIP, etc.

● Campus to internet throughput ~2Gbit/s average, ~6Gbit/s peak (Oct 2016)

● Fully dual-stack network - 20-40% IPv6 by throughput, 15% by flows

● Typical architecture - switch, dist, router, core, firewall, WAN

Page 5: Science DMZ at Imperial

HEP group
● Main HEP grid cluster processes data for the LHC experiments, other physics experiments/projects & non-physics communities

○ CMS, LHCb, ATLAS, LZ, COMET, biomed & pheno are the main users

● 275 compute nodes (~4000 cores) connected on 1GbE

● 55 storage nodes (~3.7PB of disk) connected on 10GbE

● Simple stacked top-of-rack switches for connectivity

● Majority of WAN traffic is CMS local-storage <-> remote-storage

○ Popular datasets are automatically placed at CMS sites

○ Users can also request data: 50TB+ dataset requests not uncommon

● Local compute nodes can read remote storage over WAN (and vice versa)

○ Generally low rates compared to storage-storage transfers

Page 6: Science DMZ at Imperial

HEP growth - 1gig (April 2007)

Page 7: Science DMZ at Imperial

HEP growth - 10gig (Oct 2011)

Page 8: Science DMZ at Imperial

HEP growth - 20gig (Oct 2016)

Page 9: Science DMZ at Imperial

Issues faced
● Firewalls

○ Straight throughput

○ TCP window checking and other stateful inspection

○ Latency and jitter interfering with throughput (see the sketch after this list)

○ Impact on other traffic e.g. Office 365 is quite latency-sensitive with the Outlook client

● Equipment costs

○ Need the right size pipe at every forwarding hop

○ Building edge -> dist -> router -> core -> firewall -> WAN edge

○ A lot of those devices are of a class where fast ports are disproportionately costly

■ “Typical” campus router - approx. £1-2k for a 10gig port

■ 1U 48-port 10G switch - approx. £200 for a 10gig port
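A back-of-the-envelope illustration of the latency point above (a sketch with example window sizes and RTTs, not measurements of our kit): a single TCP stream tops out at roughly window size divided by round-trip time, so pushing the RTT from a typical 2 ms towards 100 ms turns a multi-gigabit flow into a ~1 Gbit/s one.

```python
# Illustrative only: a single TCP stream is capped at roughly window / RTT,
# so extra latency in the path directly cuts throughput.
# The RTT and window values below are examples, not measurements.

def max_tcp_throughput_gbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP stream: window size divided by round-trip time."""
    return (window_bytes * 8) / (rtt_ms / 1000.0) / 1e9

for rtt_ms in (2, 104):                       # typical 2 ms vs ~100 ms under stress
    for window in (64 * 1024, 4 * 1024 * 1024, 16 * 1024 * 1024):
        gbps = max_tcp_throughput_gbps(window, rtt_ms)
        print(f"RTT {rtt_ms:>3} ms, window {window // 1024:>5} KiB -> {gbps:7.3f} Gbit/s max")
```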

Page 10: Science DMZ at Imperial

Solution - Science DMZ
● Had no idea it had a name when we built it!

● Separate L3 switch, outside firewall, routes HEP traffic straight onto core and onward to JANET

● Simple stateless ACLs for the outer tier of security (see the sketch below)

● Fewer hops, shallower buffers, cheaper kit, wider pipes

● HEP @ Imperial - 4x10G ports to HEP, dual 2x10G ECMP to JANET

○ Split HEP into two subnets, use BGP communities outbound to split inbound traffic

○ Necessitates HEP managing which node IPs are used for transfer
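For a sense of what "stateless" buys you, a rough sketch in Python rather than real switch configuration (prefixes are documentation examples, not Imperial's actual policy; 2811 and 1094 are just the usual GridFTP/xrootd ports): every packet is matched against a static rule list on its own, with no connection table, no TCP window checking, and nothing to time out long-lived transfers.

```python
# Rough illustration of a stateless ACL: each packet is matched against a
# static rule list on its own - no connection state, no TCP window checking.
# Prefixes below are example/documentation ranges, not Imperial's policy.
from ipaddress import ip_address, ip_network

ACL = [
    # (source prefix, dest prefix,        dest port, action)
    ("0.0.0.0/0",     "198.51.100.0/24",  2811,      "permit"),  # GridFTP control
    ("0.0.0.0/0",     "198.51.100.0/24",  1094,      "permit"),  # xrootd
    ("0.0.0.0/0",     "0.0.0.0/0",        None,      "deny"),    # default deny
]

def check(src: str, dst: str, dport: int) -> str:
    """Return the action of the first matching rule (first-match semantics)."""
    for src_net, dst_net, port, action in ACL:
        if (ip_address(src) in ip_network(src_net)
                and ip_address(dst) in ip_network(dst_net)
                and (port is None or port == dport)):
            return action
    return "deny"

print(check("203.0.113.7", "198.51.100.20", 1094))   # permit
print(check("203.0.113.7", "198.51.100.20", 22))     # deny
```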

Page 11: Science DMZ at Imperial

Results - recent past
● Quite capable of driving 4x10G at >99.5% utilisation

● Apologies for the graph - low resolution and hourly averages hide the peaks (see the sketch below)

○ Don’t be fooled - 30-second and 5-minute averages on all 4 10G links to JANET were >99% load
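For anyone wondering how badly long averaging windows hide peaks, a synthetic example (made-up numbers, not our counters): a link that is saturated for half an hour and idle for the other half shows a comfortable-looking hourly mean.

```python
# Synthetic example of averaging hiding peaks.
# 30-second samples of a 40 Gbit/s bundle: half an hour at ~line rate, half idle.
samples = [39.8] * 60 + [0.5] * 60            # Gbit/s, 120 x 30 s = 1 hour

hourly_mean = sum(samples) / len(samples)
print(f"hourly average : {hourly_mean:.1f} Gbit/s")   # ~20 Gbit/s - looks fine
print(f"peak 30s sample: {max(samples):.1f} Gbit/s")  # 39.8 Gbit/s - saturated
```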

Page 12: Science DMZ at Imperial

Architecture

[Architecture diagram - labels: Janet, Border, Firewall, Core, Datacentre, Science DMZ, Possible]

Page 13: Science DMZ at Imperial

Benefits
● Works - capable of driving campus connectivity to capacity

● Cheap - equipment cost on our side manageable

○ As long as upstream connectivity exists, of course

● Easy - no need to poke at firewalls or building edge to improve throughput

Page 14: Science DMZ at Imperial

Issues
● Works too well!

● At capacity, it can drive other traffic off the campus links

○ 64 bytes from ...: icmp_seq=856 ttl=49 time=104 ms
○ from a typical 2ms to the same site

○ Have seen 10gig links running at essentially 100% for >1 hour

● Need to ensure enough spare capacity for other uses

○ Rate-limiting port channels (shudder)

○ Rate-limit $here - sure it’ll be hashed to the same bundle members at $nexthop?
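A toy illustration of that last worry (the hash function and flows are invented): two devices can hash the same flows onto different members of an otherwise identical 2-port bundle, so a per-member rate limit configured on one box doesn't necessarily line up with where the flow lands at the next hop.

```python
# Toy illustration: two devices hash the same flows onto a 2-member bundle
# differently, so a per-member rate limit here does not map cleanly onto the
# members at the next hop. Flows and the hash scheme are made up.
import hashlib

def member(flow: tuple, seed: str, members: int = 2) -> int:
    """Pick a bundle member from a hash of the 5-tuple; 'seed' stands in for
    per-platform differences in the hashing algorithm."""
    digest = hashlib.sha256((seed + repr(flow)).encode()).digest()
    return digest[0] % members

flows = [("10.0.1.1", "192.0.2.10", 6, 40001, 1094),
         ("10.0.1.2", "192.0.2.11", 6, 40002, 1094),
         ("10.0.1.3", "192.0.2.12", 6, 40003, 1094),
         ("10.0.1.4", "192.0.2.13", 6, 40004, 1094)]

for flow in flows:
    print(flow[0], "-> member", member(flow, "this-hop"),
          "here, member", member(flow, "next-hop"), "at the next hop")
```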

Page 15: Science DMZ at Imperial

Results - Thu 13 Oct
Latency across one leg of the default route, versus throughput on the same link

Noticeable to customers… not great. But very impressive throughput!

Page 16: Science DMZ at Imperial

Issues - Mark 2
● Cheap switches are cheap for a reason

● Doesn’t solve distance and fibre issues

○ Want to run in excess of 10G at distances of >10km? Get ready for a lot of zeroes

○ Fibre capacity on inter-site links (install & recurrent costs)

○ Or use DWDM (skills & training, tools, monitoring) - we do this

● Question mark over dual-use systems - is it appropriate to attach them to the DMZ?

○ Can you do a Windows domain login from a DMZ?

● Our implementation requires the HEP team to split transfer nodes across two subnets, to make use of both inbound paths

● Security policy - speak to your IT Security team first!

Page 17: Science DMZ at Imperial

Thoughts
● We are considering making Science DMZ a core part of network architecture

○ 100G still not cost-effective for widespread campus deployment - particularly if you are geographically distributed

○ Build parallel cheap/fast DMZ network, hook together at JANET & datacentre?

○ Present DMZ where needed (distance & fibre issues though…)

● Considerations

○ Equipment in normal office/lab locations e.g. high-throughput gene sequencers

○ Separate switches in wiring closets - have to manage patching, labelling, training

○ Spurious requests - people think they can drive 10gig and cannot

● Only applicable for mature research efforts with good tooling, IMO

○ Took GridPP community many years to be able to drive these speeds

Page 18: Science DMZ at Imperial

Recommendations
● Speak to researchers!

● Consider appropriate cost/benefit of implementation

○ Cheap vs. high-end routers

○ Fixed versus expandable

● How will you scale, monitor and manage it?

○ Counters, API, routing/switching capability (utilisation sketch after this list)

● Consider your upstream capacity
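On the monitoring point: whatever you deploy needs counters you can poll often enough to see short peaks (the hourly-average graph earlier is the cautionary tale). A minimal sketch of turning two octet-counter samples into utilisation; how the counters are collected (SNMP, streaming telemetry, vendor API) is left out, and the numbers are examples.

```python
# Minimal sketch: utilisation from two samples of a 64-bit interface octet
# counter. Counter collection itself (SNMP, telemetry, vendor API) is out of
# scope; the figures below are illustrative.
COUNTER_MAX = 2 ** 64

def utilisation(prev: int, curr: int, interval_s: float, speed_bps: float) -> float:
    """Fraction of link capacity used between two counter samples."""
    delta = (curr - prev) % COUNTER_MAX      # handles counter wrap
    return (delta * 8) / (interval_s * speed_bps)

# Example: 37.5 GB transferred in 30 s on a 10 Gbit/s port -> 100% busy.
print(f"{utilisation(0, 37_500_000_000, 30, 10e9):.0%}")
```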

Page 19: Science DMZ at Imperial

LHCONE - if we have time
● Overlay L3VPN - used to steer HEP traffic down separate links

○ Funding reasons

● Imperial already do L3VPN internally for network segmentation

○ JANET presented LHCONE as 802.1q-tagged subint & BGP peering, into L3VPN on core

○ Core presents as 2x “peerings” (internet & LHCONE) to Science DMZ router

○ DMZ router follows the routing table (401 IPv4 & 146 IPv6 BGP routes) - see the sketch at the end

● Basically works

○ Very impressive throughput

● Reservations internally about ultimate scalability of this model

○ If we had a multi-researcher Science DMZ - how would that work?

○ Policy routing? Shoot me now please...
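For the curious, the DMZ router's behaviour is nothing exotic: a longest-prefix match across the routes learned from both peerings, with LHCONE-announced prefixes winning over the internet default. A rough Python illustration with invented prefixes (not the real LHCONE table):

```python
# Rough illustration of the DMZ router's choice between the LHCONE and
# internet peerings: longest-prefix match over both route sets. Prefixes
# here are documentation examples, not the real LHCONE routes.
from ipaddress import ip_address, ip_network

ROUTES = [
    (ip_network("192.0.2.0/24"),  "LHCONE"),     # e.g. a remote grid site
    (ip_network("2001:db8::/32"), "LHCONE"),
    (ip_network("0.0.0.0/0"),     "internet"),   # default via the internet peering
    (ip_network("::/0"),          "internet"),
]

def next_hop(dst: str) -> str:
    """Longest-prefix match across everything learned from both peerings."""
    addr = ip_address(dst)
    candidates = [(net, peer) for net, peer in ROUTES
                  if addr.version == net.version and addr in net]
    return max(candidates, key=lambda item: item[0].prefixlen)[1]

print(next_hop("192.0.2.45"))   # LHCONE
print(next_hop("203.0.113.9"))  # internet
```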