Alex White, Campus network engineering workshop 19/10/2016 Solving Network Throughput Problems at the Diamond Light Source

Page 1: Solving Network Throughput Problems at the Diamond Light Source

Alex White, Campus network engineering workshop, 19/10/2016

Solving Network Throughput Problems at the Diamond Light Source

Page 2: Solving Network Throughput Problems at the Diamond Light Source

Introduction to Diamond Light Source

Solving Network Throughput Problems at the Diamond Light Source

Alex White

Page 3: Solving Network Throughput Problems at the Diamond Light Source
Page 4: Solving Network Throughput Problems at the Diamond Light Source

So, what do we actually do?

The Diamond machine is a type of particle accelerator

CERN = high-energy particles are smashed together and the “crash” is analysed!

Diamond = accelerate electrons to produce synchrotron light

Use this light to study matter – like a “super microscope”

Page 5: Solving Network Throughput Problems at the Diamond Light Source

The Diamond machine

Three particle accelerators:

Linear accelerator

Booster Synchrotron

Storage ring (48 straight sections angled together, 562m long)

Page 6: Solving Network Throughput Problems at the Diamond Light Source

Simultaneous Experiments

Page 7: Solving Network Throughput Problems at the Diamond Light Source

Data-intensive research

Lustre and GPFS filesystems: 430TB, 900TB, 3.3PB as of 2016

Typical X-ray camera: 4MB * 100Hz

An experiment can easily produce 300GB-1TB

Scientists want to take their data home
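To put the camera figures in perspective, here is a rough sketch of the arithmetic. It assumes decimal units (1MB = 10^6 bytes) and a camera running continuously at full rate, neither of which the slide states; the function names are illustrative only.

```python
# Figures from the slide; decimal units and continuous capture are assumptions.
FRAME_BYTES = 4 * 10**6   # 4MB per frame
FRAME_RATE_HZ = 100       # 100 frames per second


def camera_rate_bytes_per_s(frame_bytes: int, rate_hz: int) -> int:
    """Sustained data rate of the detector in bytes per second."""
    return frame_bytes * rate_hz


def seconds_to_produce(total_bytes: int, bytes_per_s: int) -> float:
    """Time needed to generate total_bytes at a constant rate."""
    return total_bytes / bytes_per_s


rate = camera_rate_bytes_per_s(FRAME_BYTES, FRAME_RATE_HZ)
print(f"Camera rate: {rate / 1e6:.0f} MB/s = {rate * 8 / 1e9:.1f} Gb/s")
print(f"Time to produce 1TB: {seconds_to_produce(10**12, rate) / 60:.1f} minutes")
```

Under these assumptions a single detector produces data at several Gb/s, so a dataset in the 300GB-1TB range can accumulate in well under an hour of continuous capture.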

Page 8: Solving Network Throughput Problems at the Diamond Light Source

Site Limitations

Scientific data download speeds from Diamond to visiting users’ institutes were inconsistent and slow, even though the facility had a “10Gb/s” JANET connection from STFC.

The limit on download speeds was delaying post-experiment data analysis by academics at their home institutes.

Page 9: Solving Network Throughput Problems at the Diamond Light Source

How did we characterise the problem?

We set ourselves an initial target of “a stable 50Mb/s over a 10ms path”

Page 10: Solving Network Throughput Problems at the Diamond Light Source

Initial Findings

10Gb/s inside our network, with no packet loss

Low speeds found with iperf over the STFC/JANET segment between Diamond's edge and the Physics Department at Oxford

We saw a small amount of packet loss over the STFC/JANET link

Page 11: Solving Network Throughput Problems at the Diamond Light Source

TCP Performance and the Mathis equation

Packet size

Latency (AKA Round Trip Time)

Packet Loss
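The three factors above are the inputs to the Mathis bound, which limits TCP throughput to roughly (MSS / RTT) * (C / sqrt(p)), where p is the packet loss probability. A minimal sketch follows; the constant C depends on the TCP model and is left as a parameter (the default of 1.0 is a simplifying assumption), and the 1460-byte MSS is an assumed value for a standard Ethernet path, not a figure from the slides.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float,
                          c: float = 1.0) -> float:
    """Upper bound on TCP throughput in bits per second (Mathis model).

    mss_bytes: maximum segment size ("packet size")
    rtt_s:     round trip time in seconds ("latency")
    loss:      packet loss probability, e.g. 0.0001 for 0.01%
    c:         model constant; 1.0 is a simplifying assumption
    """
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss))


# Same 10ms path, two loss rates: a hundredfold rise in loss
# costs a factor of ten in achievable throughput.
for p in (0.0001, 0.01):
    mbps = mathis_throughput_bps(1460, 0.010, p) / 1e6
    print(f"loss {p:.2%}: at most {mbps:.1f} Mb/s")
```

Because throughput scales with 1/sqrt(p) and 1/RTT, even small loss rates become severe on long paths.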

Page 12: Solving Network Throughput Problems at the Diamond Light Source

“Interesting” effects of packet loss

Page 13: Solving Network Throughput Problems at the Diamond Light Source

Packet Loss

According to Mathis, to achieve our initial goal of 50Mb/s over a 10ms path the tolerable packet loss is 0.026% maximum.
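The tolerable loss can be estimated by rearranging the Mathis bound for p. The slide does not say which segment size or model constant was used, so the values below are assumptions: a 1460-byte MSS, with C = 0.7 and C = 1.0 shown side by side.

```python
def max_tolerable_loss(target_bps: float, mss_bytes: int, rtt_s: float,
                       c: float) -> float:
    """Largest loss probability that still permits target_bps under Mathis.

    Rearranged from: rate <= (MSS / RTT) * (C / sqrt(p)).
    """
    return (c * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


TARGET_BPS = 50e6   # 50Mb/s, from the slide
RTT_S = 0.010       # 10ms path, from the slide
MSS_BYTES = 1460    # assumed, not stated in the slides

for c in (0.7, 1.0):  # assumed model constants
    p = max_tolerable_loss(TARGET_BPS, MSS_BYTES, RTT_S, c)
    print(f"C = {c}: tolerable loss is about {p:.4%}")
```

With C = 0.7 this gives roughly 0.027%, close to the figure quoted above; with C = 1.0 it gives roughly 0.055%. Either way the budget is a few hundredths of a percent.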

Page 14: Solving Network Throughput Problems at the Diamond Light Source

Finding the problem – the Last Mile

We worked with STFC to connect a perfSONAR server directly to the Harwell site border router.

Tests with this extra server allowed us to pinpoint the STFC firewall (our “last mile”) as the source of the insidious packet loss.

Page 15: Solving Network Throughput Problems at the Diamond Light Source

The Fix: Science DMZ

Page 16: Solving Network Throughput Problems at the Diamond Light Source

The Fix: Science DMZ

Page 17: Solving Network Throughput Problems at the Diamond Light Source

Globus GridFTP

Uses parallel TCP streams

Simple, web-based interface
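One common explanation for why parallel streams help is that each TCP stream is limited separately by the Mathis bound, so several streams together can approach the link capacity where a single stream cannot. The sketch below illustrates that reasoning only. It assumes the loss rate is unaffected by adding streams, and the 80ms RTT, 0.01% loss, and 1460-byte MSS are hypothetical values rather than Diamond measurements.

```python
import math


def single_stream_bps(mss_bytes: int, rtt_s: float, loss: float,
                      c: float = 1.0) -> float:
    """Mathis bound for one TCP stream, in bits per second."""
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss))


def aggregate_bps(n_streams: int, mss_bytes: int, rtt_s: float, loss: float,
                  link_bps: float, c: float = 1.0) -> float:
    """Idealised aggregate of n independent streams, capped at link capacity."""
    total = n_streams * single_stream_bps(mss_bytes, rtt_s, loss, c)
    return min(total, link_bps)


# Hypothetical long path: 80ms RTT, 0.01% loss, 10Gb/s link.
for n in (1, 4, 8):
    mbps = aggregate_bps(n, 1460, 0.080, 0.0001, 10e9) / 1e6
    print(f"{n} stream(s): up to {mbps:.1f} Mb/s")
```

In practice streams share queues and loss events, so real gains are smaller than this linear model suggests.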

Page 18: Solving Network Throughput Problems at the Diamond Light Source

Performance with Science DMZ

Test data: 2Gb/s+ consistently between DLS and Brookhaven National Laboratory (USA)!

Actual transfers in August 2016:

Fastest: Crystallography dataset from DLS to Newcastle: 260GB @ 480Mb/s

Biggest: Electron Microscope data from DLS to Imperial: 1120GB @ 290Mb/s
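For a sense of scale, the quoted sizes and rates can be turned into approximate transfer durations. This sketch assumes decimal units (1GB = 10^9 bytes) and that each quoted rate is the average over the whole transfer; the slides confirm neither.

```python
def transfer_hours(size_bytes: float, rate_bps: float) -> float:
    """Duration in hours to move size_bytes at a constant rate_bps."""
    return size_bytes * 8 / rate_bps / 3600


# Sizes and rates from the slide; units and averaging are assumptions.
transfers = {
    "DLS to Newcastle (260GB @ 480Mb/s)": (260e9, 480e6),
    "DLS to Imperial (1120GB @ 290Mb/s)": (1120e9, 290e6),
}

for label, (size, rate) in transfers.items():
    print(f"{label}: about {transfer_hours(size, rate):.1f} hours")
```

Under these assumptions the Newcastle transfer takes a little over an hour and the Imperial transfer between eight and nine hours.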

Page 19: Solving Network Throughput Problems at the Diamond Light Source

Security in the Science DMZ

Page 20: Solving Network Throughput Problems at the Diamond Light Source

In Summary

1. Use real-world testing to find packet loss
2. Zero packet loss is crucial
3. The last mile is usually the problem
4. Firewalls have been shown to introduce packet loss – this is backed up by ESnet's own testing
5. Don't use SCP as the common implementation has a fixed TCP window size – it will never grow to fill your link
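The last point follows from the fact that a sender can have at most one window of data in flight per round trip, so a fixed window caps throughput at window / RTT regardless of link speed. A minimal sketch of that ceiling is below; the 64KiB window is a hypothetical value chosen for illustration, since the slides do not give the actual size.

```python
def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Throughput ceiling in bits per second for a fixed window."""
    return window_bytes * 8 / rtt_s


WINDOW_BYTES = 64 * 1024  # hypothetical fixed window, not from the slides

for rtt_ms in (1, 10, 100):
    mbps = window_limited_bps(WINDOW_BYTES, rtt_ms / 1000) / 1e6
    print(f"RTT {rtt_ms}ms: at most {mbps:.1f} Mb/s")
```

With this assumed window the ceiling falls tenfold for each tenfold increase in RTT, which is why such a tool can look acceptable on a LAN and very slow on a long path.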