Solving Network Throughput Problems at the Diamond Light Source
Alex White, alex.white@diamond.ac.uk
Campus Network Engineering Workshop, 19/10/2016

Introduction to Diamond Light Source
So, what do we actually do?
The Diamond machine is a type of particle accelerator
CERN = high-energy particles smashed together to analyse the “crash”!
Diamond = accelerate electrons to produce synchrotron light
Use this light to study matter – like a “super microscope”
The Diamond machine
Three particle accelerators:
Linear accelerator
Booster Synchrotron
Storage ring (48 straight sections angled together, 562m long)
Simultaneous Experiments
Data-intensive research
Lustre and GPFS filesystems: 430TB, 900TB, 3.3PB as of 2016
Typical X-ray camera: 4MB frames at 100Hz (worked out below)
An experiment can easily produce 300GB-1TB
Scientists want to take their data home
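To put the camera figure in context, here is a quick back-of-the-envelope rate calculation, a sketch using only the numbers quoted above:

# Sustained data rate of a typical X-ray detector,
# using the figures quoted above: 4MB per frame at 100 frames/s.
frame_bytes = 4 * 1024 * 1024   # 4MB per image
frames_per_second = 100

rate_bytes = frame_bytes * frames_per_second   # bytes/s
rate_gbits = rate_bytes * 8 / 1e9              # Gb/s on the wire

print(f"{rate_bytes / 1e6:.0f} MB/s ~= {rate_gbits:.1f} Gb/s sustained")
# -> roughly 419 MB/s, i.e. ~3.4 Gb/s from a single camera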
Site Limitations
Scientific data download speeds from Diamond to visiting users' institutes were inconsistent and slow, even though the facility had a “10Gb/s” JANET connection from STFC.
The limit on download speeds was delaying post-experiment data analysis by academics at their home institutes.
How did we characterise the problem?
We set ourselves an initial target of “a stable 50Mb/s over a 10ms path”
Initial Findings
10Gb/s inside our network, with no packet loss
Low speeds found with iperf over the STFC/JANET segment between Diamond's edge and the Physics Department at Oxford
We saw a small amount of packet loss over the STFC/JANET link
TCP Performance and the Mathis equation
TCP throughput depends on three factors:
Packet size
Latency (AKA Round Trip Time)
Packet loss
“Interesting” effects of packet loss
According to the Mathis equation, to achieve our initial goal of 50Mb/s over a 10ms path, the maximum tolerable packet loss is 0.026%.
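As a sketch of where that figure comes from: the commonly quoted simplified form of the Mathis formula bounds TCP throughput at MSS / (RTT * sqrt(loss)). Solving for loss, and assuming a segment size of roughly 1000 bytes (the slide does not state the MSS used), reproduces the 0.026%:

# Mathis et al. (simplified): throughput <= MSS / (RTT * sqrt(p)),
# where p is the packet loss rate. Solving for p:
#   p <= (MSS / (RTT * throughput))**2
mss_bits = 1000 * 8        # assumed ~1000-byte segments (not stated on the slide)
rtt_s = 0.010              # the 10ms path
target_bps = 50e6          # the 50Mb/s goal

max_loss = (mss_bits / (rtt_s * target_bps)) ** 2
print(f"max tolerable loss: {max_loss:.4%}")   # -> 0.0256%, i.e. ~0.026%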
Finding the problem – the Last Mile
We worked with STFC to connect a PerfSonar server directly to the Harwell site border router.
Tests with this extra server allowed us to pinpoint the STFC firewall (our “last mile”) as the source of the insidious packet loss.
The Fix: Science DMZ
Following ESnet's Science DMZ design, data transfer nodes are placed in a dedicated network segment at the site edge, outside the general-purpose stateful firewall, and protected with router ACLs instead.
Globus GridFTP
Uses parallel TCP streams (see the sketch below)
Simple, web-based interface
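A minimal sketch of why parallel streams help on a lossy path: the Mathis ceiling applies per TCP connection, so N independent streams can reach roughly N times the single-stream limit (until some other bottleneck takes over). The function name and the 1460-byte MSS are illustrative assumptions, not from the talk:

# The Mathis limit applies per TCP connection, so N independent
# streams give roughly N times the single-stream ceiling.
from math import sqrt

def mathis_bps(mss_bytes, rtt_s, loss, streams=1):
    # Approximate aggregate TCP throughput in bits/s.
    return streams * (mss_bytes * 8) / (rtt_s * sqrt(loss))

rtt, loss = 0.010, 0.001   # 10ms path with 0.1% loss
for n in (1, 4, 8):
    print(f"{n} stream(s): {mathis_bps(1460, rtt, loss, n) / 1e6:.0f} Mb/s")
# -> 1 stream ~37 Mb/s, 4 streams ~148 Mb/s, 8 streams ~295 Mb/s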
Performance with Science DMZ
Test data: 2Gb/s+ consistently between DLS and Brookhaven National Labs (USA)!
Actual transfers in August 2016 (times worked out below):
Fastest: Crystallography dataset from DLS to Newcastle: 260GB @ 480Mb/s
Biggest: Electron Microscope data from DLS to Imperial: 1120GB @ 290Mb/s
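For a sense of scale, a back-of-the-envelope conversion of those two transfers into wall-clock time, derived purely from the figures above:

# Rough wall-clock times for the two transfers quoted above.
for name, gbytes, mbps in [("Newcastle", 260, 480), ("Imperial", 1120, 290)]:
    seconds = gbytes * 1e9 * 8 / (mbps * 1e6)
    print(f"{name}: {seconds / 3600:.1f} h")
# -> Newcastle: 1.2 h, Imperial: 8.6 h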
Security in the Science DMZ
In Summary
1. Use real-world testing to find packet loss
2. Zero packet loss is crucial
3. The last mile is usually the problem
4. Firewalls have been shown to introduce packet loss – this is backed up by ESnet's own testing
5. Don't use SCP as the common implementation has a fixed TCP window size – it will never grow to fill your link (see the sketch below)
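To illustrate point 5: a fixed window caps TCP throughput at window / RTT, regardless of link capacity. A sketch, assuming an illustrative 64KB window (the real buffer size depends on the SSH implementation and version):

# A fixed window caps throughput at window / RTT,
# no matter how fast the underlying link is.
window_bytes = 64 * 1024    # illustrative fixed window; depends on the SSH build
rtt_s = 0.010               # the 10ms path from earlier slides

ceiling_bps = window_bytes * 8 / rtt_s
print(f"ceiling: {ceiling_bps / 1e6:.0f} Mb/s")   # -> ~52 Mb/s, even on a 10Gb/s link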