End-2-End Network Monitoring: What do we do? What do we use it for?





  • End-2-End Network Monitoring: What do we do? What do we use it for?
    Richard Hughes-Jones

    Many people are involved:

  • DataGRID WP7: Network Monitoring Architecture for Grid Sites
    Robin Tasker

    Local Network Monitoring: Store & Analysis of Data (Access)
    Access to current and historic data and metrics via the Web, i.e. WP7 NM Pages; access to metric forecasts
    Backend LDAP script to fetch metrics
    Monitor process to push metrics
    Local LDAP Server

    Grid Application access via LDAP Schema to:
    - monitoring metrics;
    - location of monitoring data.

    Monitoring tools: PingER (RIPE TTB), iperf, UDPmon, rTPL, NWS, etc.
    Diagram labels: LDAP Schema, Grid Apps, GridFTP

  • WP7 Network Monitoring Components

  • WP7 MapCentre: Grid Monitoring & Visualisation
    Grid network monitoring architecture uses LDAP & R-GMA - DataGrid WP7
    Central MySQL archive hosting all network metrics and GridFTP logging
    Probe Coordination Protocol deployed, scheduling tests
    MapCentre also provides site & node Fabric health checks
    Franck Bonnassieux, CNRS Lyon

  • WP7 MapCentre: Grid Monitoring & Visualisation
    Plots: CERN RAL UDP, CERN IN2P3 UDP, CERN RAL TCP, CERN IN2P3 TCP

  • UK e-Science: Network Monitoring Technology Transfer
    Diagram labels: DataGrid WP7 M/c; UK e-Science DL; DataGrid WP7 M/c


  • UK e-Science: Network Problem Solving
    24 Jan to 4 Feb 04
    TCP iperf RAL to HEP
    Only 2 sites >80 Mbit/s
    RAL -> DL 250-300 Mbit/s

  • Tools: UDPmon Latency & Throughput
    UDP/IP packets sent between end systems

    Latency
    - Round trip times using Request-Response UDP frames
    - Latency as a function of frame size
    - Slope s given by: mem-mem copy(s) + pci + Gig Ethernet + pci + mem-mem copy(s)
    - Intercept indicates processing times + HW latencies
    - Histograms of singleton measurements

    UDP Throughput
    - Send a controlled stream of UDP frames spaced at regular intervals
    - Vary the frame size and the frame transmit spacing & measure:
      - The time of first and last frames received
      - The number of packets received, lost, & out of order
      - Histogram of the inter-packet spacing of received packets
      - Packet loss pattern
      - 1-way delay
      - CPU load
      - Number of interrupts
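The latency analysis above (slope and intercept of latency against frame size) can be sketched as a straight-line fit. This is a minimal illustration, not UDPmon's own code: the function name `fit_latency` and the sample data are hypothetical, and the data are synthetic points lying exactly on a line of 25 µs + 0.02 µs/byte.

```python
from statistics import mean


def fit_latency(frame_sizes, latencies_us):
    """Least-squares straight line: latency = intercept + slope * frame_size.

    The slope (µs per byte) lumps together the per-byte costs of the data
    path; the intercept (µs) covers fixed processing times and hardware
    latencies.
    """
    x_bar = mean(frame_sizes)
    y_bar = mean(latencies_us)
    sxx = sum((x - x_bar) ** 2 for x in frame_sizes)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(frame_sizes, latencies_us))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical measurements (frame size in bytes, round-trip latency in µs).
sizes = [64, 256, 512, 1024, 1472]
latencies = [25.0 + 0.02 * s for s in sizes]

slope, intercept = fit_latency(sizes, latencies)
print(f"slope = {slope:.4f} us/byte, intercept = {intercept:.1f} us")
```

With real measurements the points scatter about the line, so the fitted slope can be compared with the sum of the expected per-byte times of each stage in the path.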

  • UDPmon: Example 1 Gigabit NIC Intel PRO/1000
    Plots: Latency, Throughput, Bus Activity
    Motherboard: Supermicro P4DP6
    Chipset: E7500 (Plumas)
    CPU: Dual Xeon 2.2 GHz with 512k L2 cache
    Mem bus 400 MHz, PCI-X 64 bit 66 MHz
    HP Linux Kernel 2.4.19 SMP
    MTU 1500 bytes
    NIC: Intel PRO/1000 XT

  • Tools: Trace-Rate Hop-by-hop measurements
    A method to measure the hop-by-hop capacity, delay, and loss up to the path bottleneck
    - Not intrusive
    - Operates in a high-performance environment
    - Does not need cooperation of the destination

    Based on the Packet Pair Method
    - Send sets of back-to-back (b2b) packets with increasing time to live
    - For each set, filter noise from the rtt
    - Calculate the spacing, hence the bottleneck BW
    - Robust regarding the presence of invisible nodes

    Figures: effect of the bottleneck on a packet pair (L is the packet size, C is the capacity); examples of parameters that are iteratively analysed to extract the capacity mode
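The packet-pair idea on this slide can be illustrated with a short sketch: a pair of packets of size L leaves a bottleneck of capacity C spaced by L/C, so each measured spacing gives an estimate C = L / spacing, and the most populated histogram bin is taken as the capacity mode. This is a simplified stand-in, not the Trace-Rate algorithm itself; the function names, the 10 Mbit/s bin width, and the sample spacings are assumptions made for illustration.

```python
from collections import Counter


def capacity_mbit(packet_bytes, dispersion_us):
    """Capacity estimate C = L / dt. Bits per microsecond equal Mbit/s."""
    return packet_bytes * 8 / dispersion_us


def capacity_mode(packet_bytes, dispersions_us, bin_mbit=10.0):
    """Histogram the per-pair estimates; return the centre of the fullest bin."""
    bins = Counter(
        int(capacity_mbit(packet_bytes, d) // bin_mbit) for d in dispersions_us
    )
    best_bin, _count = bins.most_common(1)[0]
    return (best_bin + 0.5) * bin_mbit


# Hypothetical spacings (µs) for 1500-byte pairs. Most cluster near 126 µs;
# two outliers mimic pairs disturbed by cross traffic.
samples = [126.0, 127.0, 125.5, 126.5, 250.0, 63.0, 126.2]

print(f"capacity mode ~ {capacity_mode(1500, samples):.0f} Mbit/s")
```

Taking the mode rather than the mean keeps a few disturbed pairs from shifting the estimate, which is the role the noise filtering plays in the method described above.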

  • Tools: Trace-Rate Some Results
    Capacity measurements as a function of load in Mbit/s from tests on the DataTAG link
    Comparison of the number of packets required
    Validated by simulations in NS-2
    Linux implementations, working in a high-performance environment
    Research report: http://www.inria.fr/rrrt/rr-4959.html
    Research paper: ICC2004: International Conference on Communications, Paris, France, June 2004. IEEE Communication Society.

  • Network Monitoring as a Tool to study:
    - Protocol Behaviour
    - Network Performance
    - Application Performance

    Tools include:
    - web100
    - tcpdump
    - Output from the test tool: UDPmon, iperf
    - Output from the application: Gridftp, bbcp, apache

  • Protocol Performance: RDUDP
    Monitoring from Data Moving Application & Network Test Program
    DataTAG WP3 work

    Test Setup:
    - Path: Ams-Chi-Ams Force10 loopback
    - Moving data from DAS-2 cluster with RUDP, a UDP-based transport
    - Apply 11*11 TCP background streams from iperf

    Conclusions:
    - RUDP performs well
    - It does back off and share BW
    - Rapidly expands when BW is free

    Hans Blom

  • Performance of the GÉANT Core Network
    Test Setup:
    - Supermicro PC in: London & Amsterdam GÉANT PoP
    - Smartbits in: London & Frankfurt GÉANT PoP
    - Long link: UK-SE-DE2-IT-CH-FR-BE-NL
    - Short link: UK-FR-BE-NL

    Network Quality of Service: LBE, IP Premium
    High-Throughput Transfers: standard and advanced TCP stacks
    Packet re-ordering effects

  • Tests GÉANT Core: Packet re-ordering
    Effect of LBE background
    Amsterdam-London BE test flow
    Packets at 10 µs, line speed
    10,000 sent
    Packet loss ~0.1%

    Re-order Distributions:

  • Application Throughput + Web100
    2 Gbyte file transferred, RAID0 disks
    Web100 output every 10 ms
    Gridftp: see alternate 600/800 Mbit and zero

    Apache web server + curl-based client: see steady 720 Mbit

  • VLBI Project: Throughput, Jitter, 1-way Delay, Loss
    1472 byte packets man -> JIVE: FWHM 22 µs (B2B 3 µs)

    1-way delay: note the packet loss (points with zero 1-way delay)
    1472 byte packets Manchester -> Dwingeloo JIVE

    Packet loss distribution
    Prob. Density Function: $P(t) = \lambda e^{-\lambda t}$
    Mean $\lambda$ = 2360 / s [426 µs]

  • Passive Monitoring
    Time-series data from Routers and Switches
    - Immediate but usually historical: MRTG
    - Usually derived from SNMP
    Misconfigured / infected / misbehaving End Systems (or Users?)
    - Note Data Protection Laws & confidentiality
    Site, MAN and Back-bone topology & load
    - Help to user/sysadmin to isolate a problem, e.g. low TCP transfer
    - Essential for Proof of Concept tests or Protocol testing
    - Trends used for capacity planning
    - Control of P2P traffic
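As a rough illustration of how SNMP counters become the link-load time series mentioned above, the sketch below turns two successive octet-counter readings into a utilisation figure. It assumes a 32-bit counter (such as IF-MIB `ifInOctets`) that wraps at 2^32 and has wrapped at most once between polls; the function names and the sample readings are hypothetical, and this is not MRTG's actual code.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit SNMP counter wraps back to zero here


def octet_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Octets transferred between two polls, assuming at most one wrap."""
    return (current - previous) % modulus


def utilisation(previous, current, interval_s, link_bit_per_s):
    """Fraction of link capacity used over the polling interval."""
    bits = octet_delta(previous, current) * 8
    return bits / interval_s / link_bit_per_s


# Hypothetical readings taken 300 s apart on a 1 Gbit/s link.
# The second reading is smaller than the first because the counter wrapped.
before = 4_000_000_000
after = 3_455_032_704

load = utilisation(before, after, interval_s=300, link_bit_per_s=1e9)
print(f"average load: {load:.1%}")
```

On a busy Gigabit link a 32-bit octet counter can wrap in well under a minute, so the single-wrap assumption only holds with short polling intervals; 64-bit counters such as `ifHCInOctets` avoid the problem.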

  • Users: The Campus & the MAN [1]
    NNW to SJ4 Access: 2.5 Gbit PoS; hits 1 Gbit, 50 %

    Man - NNW Access: 2 * 1 Gbit Ethernet

    Pete White, Pat Myers

  • Users: The Campus & the MAN [2]
    LMN to site 1 Access: 1 Gbit Ethernet

    LMN to site 2 Access: 1 Gbit Ethernet

    Message:
    - Not a complaint
    - Continue to work with your network group
    - Understand the traffic levels
    - Understand the Network Topology

  • VLBI Traffic Flows

    Manchester - NetNorthWest - SuperJANET access links: two 1 Gbit/s

    Access links: SJ4 to GÉANT, GÉANT to SurfNet

    Only testing. Could be worse!

  • Network Measurement Working Group

    GGF: Hierarchy Characteristics Document

    A Hierarchy of Network Performance Characteristics for Grid Applications and Services
    Document defines terms & relations:
    - Network characteristics
    - Measurement methodologies
    - Observation
    Discusses Nodes & Paths
    For each Characteristic:
    - Defines the meaning
    - Attributes that SHOULD be included
    - Issues to consider when making an observation

    Status:
    - Originally submitted to GFSG as Community Practice Document draft-ggf-nmwg-hierarchy-00.pdf, Jul 2003
    - Revised to Proposed Recommendation http://www-didc.lbl.gov/NMWG/docs/draft-ggf-nmwg-hierarchy-02.pdf, 7 Jan 04
    - Now in 60 day Public comment from 28 Jan 04; 18 days to go

  • GGF: Schemata for Network Measurements
    Request Schema:
    - Ask for results / ask to make test
    - Schema Requirements Document made
    - Use DAMED style names, e.g. path.delay.oneWay
    - Send: Characteristic, Time, Subject = node | path, Methodology, Stats

    Response Schema:
    - Interpret results
    - Includes Observation environment

    Much work in progress:
    - Common components
    - Drafts almost done

    2 (3) proof-of-concept implementations:
    - 2 implementations using XML-RPC by Internet2 and SLAC
    - Implementation in progress using Document/Literal by DL & UCL
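To make the request fields listed on this slide concrete (characteristic, time, subject as node or path, methodology, statistics), here is a hypothetical in-memory representation. It is only a sketch: the class name, field names, host names, and validation rules are invented for illustration and do not reproduce the GGF NMWG schema or any of the implementations mentioned.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class MeasurementRequest:
    """Hypothetical request record; field names are illustrative only."""
    characteristic: str      # DAMED-style dotted name, e.g. "path.delay.oneWay"
    subject_type: str        # "node" or "path"
    subject: tuple           # one host for a node, (source, destination) for a path
    start_time: str          # ISO 8601 timestamps, an assumption of this sketch
    end_time: str
    methodology: str
    statistics: list = field(default_factory=list)

    def __post_init__(self):
        if self.subject_type not in ("node", "path"):
            raise ValueError("subject_type must be 'node' or 'path'")
        expected = 1 if self.subject_type == "node" else 2
        if len(self.subject) != expected:
            raise ValueError(f"{self.subject_type} subject needs {expected} host(s)")


# Hypothetical request for one-way delay results between two example hosts.
request = MeasurementRequest(
    characteristic="path.delay.oneWay",
    subject_type="path",
    subject=("host-a.example.org", "host-b.example.org"),
    start_time="2004-02-01T00:00:00Z",
    end_time="2004-02-02T00:00:00Z",
    methodology="udp-probe",
    statistics=["mean", "max"],
)
print(asdict(request))
```

A dictionary like the one printed here is the kind of structure that could be carried in an XML-RPC call, though the actual wire format is defined by the schema drafts, not by this sketch.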

  • So What do we Use Monitoring for: A Summary

    End2End Time Series (Throughput UDP/TCP, Rtt, Packet loss):
    - Detect or X-check problem reports
    - Isolate / determine a performance issue
    - Capacity planning
    - Publication of data: network cost for middleware
      - RBs for optimized matchmaking
      - WP2 Replica Manager

    Passive Monitoring (Routers, Switches, SNMP, MRTG; Historical MRTG):
    - Capacity planning
    - SLA verification
    - Isolate / determine throughput bottleneck: work with real user problems
    - Test conditions for Protocol/HW investigations

    Packet/Protocol Dynamics (tcpdump, web100):
    - Protocol performance / development
    - Hardware performance / development
    - Application analysis

    Output from Application tools:
    - Input to middleware, e.g. gridftp throughput
    - Isolate / determine a (user) performance issue
    - Hardware / protocol investigations

  • More Information: Some URLs
    DataGrid WP7 Mapcenter: http://ccwp7.in2p3.fr/wp7archive/ & http://mapcenter.in2p3.fr/datagrid-rgma/
    UK e-science monitoring: http://gridmon.dl.ac.uk/gridmon/
    MB-NG project web site: http://www.mb-ng.net/
    DataTAG project web site: http://www.datatag.org/
    UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
    Motherboard and NIC Tests: www.hep.man.ac.uk/~rich/net
    IEPM-BW site: http://www-iepm.slac.stanford.edu/bw

  • Network Monitoring to Grid Sites
    Network Tools Developed
    Using Network Monitoring as a Study Tool
    Applications & Network Monitoring: real users
    Passive Monitoring
    Standards: Links to GGF

  • Data Flow: SuperMicro 370DLE: SysKonnect
    Motherboard: SuperMicro 370DLE
    Chipset: ServerWorks III LE
    CPU: PIII 800 MHz
    PCI: 64 bit 66 MHz
    RedHat 7.1 Kernel 2.4.14

    1400 bytes sent
    Wait 100 us
    ~8 us for send or receive
    Stack & Application overhead ~10 us / node

  • 10 GigEthernet: Throughput
    1500 byte MTU gives ~2 Gbit/s
    Used 16144 byte MTU, max user length 16080

    DataTAG Supermicro PCs: Dual 2.2 GHz Xeon CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes, giving a wire rate throughput of 2.9 Gbit/s

    SLAC Dell PCs: Dual 3.0 GHz Xeon CPU, FSB 533 MHz, PCI-X mmrbc 4096 bytes, giving a wire rate of 5.4 Gbit/s

    CERN OpenLab HP Itanium PCs: Dual 1.0 GHz 64 bit Itanium CPU, FSB 400 MHz, PCI-X mmrbc 4096 bytes, giving a wire rate of 5.7 Gbit/s

  • Tuning PCI-X: Variation of mmrbc, IA32
    16080 byte packets every 200 µs
    Intel PRO/10GbE LR Adapter

    PCI-X bus occupancy vs mmrbc

    Plot:
    - Measured times
    - Times based on PCI-X times from the logic analyser
    - Expected throughput

  • 10 GigEthernet at SC2003 BW Challenge
    Three Server systems with 10 GigEthernet NICs
    Used the DataTAG altAIMD stack, 9000 byte MTU
    Send mem-mem iperf TCP streams from SLAC/FNAL booth in Phoenix to:

    Palo Alto PAIX
    - rtt 17 ms, window 30 MB
    - Shared with Caltech booth
    - 4.37 Gbit hstcp I=5%
    - Then 2.87 Gbit I=16%
    - Fall corresponds to 10 Gbit on link
    - 3.3 Gbit Scalable I=8%
    - Tested 2 flows, sum 1.9 Gbit I=39%

    Chicago Starlight
    - rtt 65 ms, window 60 MB
    - Phoenix CPU 2.2 GHz
    - 3.1 Gbit hstcp I=1.6%

    Amsterdam SARA
    - rtt 175 ms, window 200 MB
    - Phoenix CPU 2.2 GHz
    - 4.35 Gbit hstcp I=6.9%
    - Very stable

    Both used Abilene to Chicago
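The rtt and window figures quoted for each destination can be related to the achieved rates through the bandwidth-delay product: a TCP flow can carry at most one window per round trip. The sketch below works this through for the three paths. It assumes MB means 10^6 bytes and ignores protocol overheads; the function and variable names are illustrative.

```python
def window_limited_rate_gbit(window_bytes, rtt_s):
    """Upper bound on TCP throughput: one full window per round trip."""
    return window_bytes * 8 / rtt_s / 1e9


def bdp_bytes(rate_bit_per_s, rtt_s):
    """Bytes in flight needed to sustain a given rate over a given rtt."""
    return rate_bit_per_s * rtt_s / 8


# (rtt in s, window in bytes, quoted hstcp rate in Gbit/s) from the slide.
paths = {
    "Palo Alto PAIX": (0.017, 30e6, 4.37),
    "Chicago Starlight": (0.065, 60e6, 3.1),
    "Amsterdam SARA": (0.175, 200e6, 4.35),
}

for name, (rtt, window, achieved) in paths.items():
    ceiling = window_limited_rate_gbit(window, rtt)
    needed_mb = bdp_bytes(achieved * 1e9, rtt) / 1e6
    print(f"{name}: window allows {ceiling:.1f} Gbit/s, "
          f"{achieved} Gbit/s needs ~{needed_mb:.0f} MB in flight")
```

Under these assumptions each configured window exceeds what the quoted rate requires, which suggests the window size was not the limiting factor on these paths.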

  • Summary & Conclusions
    Intel PRO/10GbE LR Adapter and driver gave stable throughput and worked well
    Need large MTU (9000 or 16114); 1500 bytes gives ~2 Gbit/s

    PCI-X tuning: mmrbc = 4096 bytes, increase by 55% (3.2 to 5.7 Gbit/s)
    PCI-X sequences clear on transmit, gaps ~950 ns
    Transfers: transmission (22 µs) takes longer than receiving (18 µs)
    Tx rate 5.85 Gbit/s, Rx rate 7.0 Gbit/s (Itanium) (PCI-X max 8.5 Gbit/s)

    CPU load considerable: 60% Xeon, 40% Itanium
    BW of Memory system important: crosses 3 times!
    Sensitive to OS / Driver updates

    More study needed
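The transfer times and rates quoted in these conclusions can be cross-checked with simple arithmetic. The sketch below assumes each transfer moves one 16080-byte packet (the user length quoted on the throughput slides) and that the PCI-X maximum refers to a 64-bit bus clocked at 133 MHz; both are assumptions, chosen because they reproduce the quoted figures.

```python
def rate_gbit(n_bytes, time_us):
    """Data rate for n_bytes moved in time_us microseconds, in Gbit/s."""
    return n_bytes * 8 / time_us / 1000  # bits per µs is Mbit/s


PACKET_BYTES = 16080  # assumed packet size, taken from the throughput slides

tx_rate = rate_gbit(PACKET_BYTES, 22)  # transmit transfer takes 22 µs
rx_rate = rate_gbit(PACKET_BYTES, 18)  # receive transfer takes 18 µs

# Assumed bus parameters: 64-bit wide PCI-X at 133 MHz.
pcix_peak = 64 * 133e6 / 1e9

print(f"Tx {tx_rate:.2f} Gbit/s, Rx {rx_rate:.2f} Gbit/s, "
      f"PCI-X peak {pcix_peak:.2f} Gbit/s")
```

The transmit figure matches the quoted 5.85 Gbit/s; the receive figure comes out slightly above the quoted 7.0 Gbit/s, which is within what rounding of the 18 µs time would explain.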

  • PCI Activity: Read Multiple data blocks, 0 wait
    Read 999424 bytes
    Each Data block:
    - Setup CSRs
    - Data movement
    - Update CSRs
    For 0 wait between reads:
    - Data blocks ~600 µs long; take ~6 ms
    - Then 744 µs gap
    PCI transfer rate 1188 Mbit/s (148.5 Mbytes/s)
    Read_sstor rate 778 Mbit/s (97 Mbyte/s)
    PCI bus occupancy: 68.44%
    Concern about Ethernet traffic: 64 bit 33 MHz PCI needs ~82% for 930 Mbit/s; expect ~360 Mbit/s
    Diagram labels: Data transfer; CSR Access; PCI Burst 4096 bytes; Data Block 131,072 bytes

  • PCI Activity: Read Throughput
    Flat then 1/t dependence
    ~860 Mbit/s for Read blocks >= 262144 bytes

    CPU load ~20%
    Concern about CPU load needed to drive Gigabit link

  • BaBar Case Study: RAID Throughput & PCI Activity
    3Ware 7500-8 RAID5 parallel EIDE
    3Ware forces PCI bus to 33 MHz
    BaBar Tyan to MB-NG SuperMicro: network mem-mem 619 Mbit/s

    Disk-disk throughput bbcp 40-45 Mbytes/s (320-360 Mbit/s)
    PCI bus effectively full!
    Plots: Read from RAID5 Disks; Write to RAID5 Disks

  • BaBar: Serial ATA Raid Controllers
    3Ware 66 MHz PCI
    ICP 66 MHz PCI

  • VLBI Project: Packet Loss Distribution
    Measure the time between lost packets in the time series of packets sent.
    Lost 1410 in 0.6 s
    Is it a Poisson process?
    Assume the Poisson process is stationary: $\lambda(t) = \lambda$
    Use Prob. Density Function: $P(t) = \lambda e^{-\lambda t}$
    Mean $\lambda$ = 2360 / s [426 µs]

    Plot log: slope -0.0028, expect -0.0024
    Could be additional process involved
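The expected slope on this slide follows from the exponential distribution of intervals in a stationary Poisson process: the logarithm of the density falls linearly with a slope equal to minus the loss rate. The sketch below reproduces the arithmetic from the quoted loss count. It assumes the log plot uses natural logarithms with time in microseconds; the variable names are illustrative.

```python
import math

lost_packets = 1410
duration_s = 0.6

rate_per_s = lost_packets / duration_s    # mean loss rate, lambda
mean_interval_us = 1e6 / rate_per_s       # mean time between losses
expected_slope = -rate_per_s / 1e6        # slope of ln P(t), per µs


def exp_pdf(t_us, lam_per_us):
    """Density of inter-loss times for a stationary Poisson process."""
    return lam_per_us * math.exp(-lam_per_us * t_us)


# The log of the density falls linearly with slope -lambda.
lam = rate_per_s / 1e6
slope_check = (math.log(exp_pdf(1000, lam)) - math.log(exp_pdf(0, lam))) / 1000

print(f"lambda = {rate_per_s:.0f} /s, mean interval = {mean_interval_us:.0f} us, "
      f"expected slope = {expected_slope:.4f} per us")
```

The computed rate and slope agree with the quoted 2360 / s and -0.0024 to within rounding, so the steeper measured slope of -0.0028 is what prompts the remark that an additional process may be involved.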