17
Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT, Poland [email protected] Building the Modern Performance Monitoring with perfSONAR Mreža znanja 2016 24. november 2016, Ljubljana

Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

  • Upload
    vuthien

  • View
    217

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Szymon Trocha

Poznań Supercomputing & Networking Center /

GÉANT, Poland

[email protected]

Building the Modern

Performance Monitoring with

perfSONAR

Mreža znanja 201624. november 2016, Ljubljana

Page 2: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

• Identify problems, when they happen or (better) earlier

• The tools must be available (at campus endpoints, demarcations between networks, at exchange points, and near data resources such as storage and computing elements, etc)

• Access to testing resources

2

Motivations

…users encounter problems orwould like to know more…

Page 3: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

• The global Research & Education network ecosystem is comprised of hundreds of international, national, regional and local-scale networks

• While these networks all interconnect, each network is owned and operated by separate organizations (called “domains”) with different policies, customers, funding models, hardware, bandwidth and configurations

3

Problem statement

• This complex, heterogeneous set of networks must operate seamlessly from “end to end” to support your science and research collaborations that are distributed globally

Page 4: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

4

Where Are The Problems?

Source

Campus

S

Congested or faulty links between domains

Congested intra- campus links

D

Destination

Campus

Latency dependant problems inside domains with small RTT

Page 5: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

• Soft failures cause packet loss and degraded TCP performance

5

Local Testing Will Not Find Everything

DS

Short distance

Performance is good when RTT is < ~10 ms

Long distance

Performance is poor when RTT exceeds ~10 ms

Switch with small buffers

Page 6: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

• All networks do some form monitoring. • Addresses needs of local staff for understanding state of the network

• Would this information be useful to external users?

• Can these tools function on a multi-domain basis?

• Beyond passive methods, there are active tools. • E.g. often we want a ‘throughput’ number. Can we automate that

idea?

• Wouldn’t it be nice to get some sort of plot of performance over the course of a day? Week? Year? Multiple endpoints?

• perfSONAR = Measurement Middleware

6

Network Monitoring

Page 7: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

• It’s infeasible to perform at-scale data movement all the time – as we see in other forms of science, we need to rely on simulations

• perfSONAR is a tool to:• Set network performance expectations• Find network problems (“soft failures”)• Help fix these problems• All in multi-domain environments

• These problems are all harder when multiple networks are involved

• perfSONAR is provides a standard way to publish active and passive monitoring data

• This data is interesting to network researchers as well as network operators

7

What is perfSONAR?

Page 8: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

• Network performance comes down to a couple of key metrics:• Throughput (e.g. “how much can I get out of the network”)• Latency (time it takes to get to/from a destination)

• Packet loss/duplication/ordering (for some sampling of packets, do they all make it to the other side without serious abnormalities occurring?)

• Network utilization (the opposite of “throughput” for a moment in time)

• We can get many of these from a selection of active and passive measurement tools – the perfSONAR Toolkit

• The “perfSONAR Toolkit” is an open source implementation and packaging of the perfSONAR measurement infrastructure and protocols

• http://www.perfsonar.net

• All components are available as RPMs, DEBs, and bundled as a CentOS 6-based (CentOS 7 is coming) “netinstall”

• Very easy to install and configure• Usually takes less than 30 minutes

8

perfSONAR Toolkit

Page 9: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

• We can’t wait for users to report problems and then fix them

• Things just break sometimes• Bad system or network tuning

• Failing optics

• Somebody messed around in a patch panel and kinked a fiber

• Hardware goes bad

• Problems that get fixed have a way of coming back• System defaults come back after hardware/software upgrades

• New employees may not know why the previous employee set things up a certain way and back out fixes

• Important to continually collect, archive, and alert on active throughput test results

9

Importance of Regular Testing

Page 10: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

10

perfSONAR Regular Testing

• We run regular tests to check for three things: TCP throughput, One way packet loss and delay, traceroute

• perfSONAR has mechanisms for managing regular testing between perfSONAR hosts• Statistics collection and archiving• Graphs• MaDDash display

Page 11: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

• Above all, perfSONAR allows you to maintain a healthy, high-performing network because it helps identify the “soft failures” in the network path.

• Classical monitoring systems have limitations• Performance problems are often only visible at the ends

• Individual network components (e.g. routers) have no knowledge of end host state

• perfSONAR tests the network in ways that classical monitoring systems do not

• More perfSONAR distributions equal better network visibility

11

Benefits: Finding the needle in the haystack

Page 12: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

12

Small Node in GÉANT

•„I Low cost" by gazteaukera is licenced under CC BY-NC

2.0

Page 13: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

• Inexpensive but with known capabilities

• Cost target: € 200

• Many different hardware choices

• Enable you to have first hand experience with perfSONAR

• Increase awareness and usage of perfSONAR in the community

• For users with an outdated perception of perfSONAR or end-user organization

• Initially 6 months project with provided hardware, central server and support run by GÉANT

• Launched at TNC2016

13

perfSONARon Low Cost Hardware

Page 14: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

14

Hardware chosen

• What’s inside? Current configuration

•Intel Celeron N3150 1.6GHz

•4 cores

•8GB RAM Kingston

•120GB•Patriot SSD

•/ 20 GB•/var/lib 86 GB•Swap 8 GB

Page 15: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

15

Running infrastructure

• Regular testing mesh

• Dashboard in Central server

• Support

INSPIRATION

Page 16: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

• http://www.perfsonar.net/

• http://docs.perfsonar.net/

• http://www.perfsonar.net/about/getting-help/

• perfSONAR videos• https://learn.nsrc.org/perfsonar

• FasterData Knowledgebase• http://fasterdata.es.net/

• http://perfsonar-smallnodes.geant.org/maddash-webui/

16

Resources

Page 17: Building the Modern Performance Monitoring with perfSONARmrezaznanja.splet.arnes.si/files/2016/...perfSONAR... · Szymon Trocha Poznań Supercomputing & Networking Center / GÉANT,

Networks ∙ Services ∙ People www.geant.org

Thank you

Networks ∙ Services ∙ People

www.geant.org

•This work is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 691567 (GN4-1).

•17

[email protected]

This document is a result of work by the perfSONAR Project (http://www.perfsonar.net) and is licensedunder CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/).