

  • Reliability in a White Rabbit Network

    Maciej Lipinski, Javier Serrano, Tomasz Wlostowski, CERN, Geneva, Switzerland; Cesar Prados, GSI, Darmstadt, Germany

    Introduction

    White Rabbit (WR) is a time-deterministic, low-latency Ethernet-based network which enables transparent timing distribution with sub-ns accuracy. It is being developed to replace the General Machine Timing (GMT) system currently used at CERN and will become the foundation for the control system of the Facility for Antiproton and Ion Research (FAIR) at GSI. High reliability is an important issue in WR’s design, since unavailability of the accelerator’s control system translates directly into expensive downtime of the machine.

    Reliability is defined as the ability of a system to provide its services to clients under both routine and abnormal circumstances. Thus, we identify the critical services of a White Rabbit Network (WRN) based on a study of WR’s requirements (Table 1). We then analyze each critical service to identify possible reasons for its failure and propose targeted counter-measures to increase reliability. Finally, their impact on the overall system reliability is studied to identify the highest contributor and the focus for further studies.

    Requirement          CERN               GSI
    Max latency          1000 µs            100 µs
    CM failure rate      3.17 · 10^-11      3.17 · 10^-12
    CMs lost per year    1                  1
    d_max from DM        10 km              2 km
    CM size              1200-5000 bytes    200-500 bytes
    Accuracy             1 µs to 2 ns

    Table 1. The requirements are defined by GSI and CERN as the prospective users of WR to control their accelerators.
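The failure-rate row of Table 1 can be cross-checked with a little arithmetic. The sketch below assumes, and this is only one reading that happens to match the printed numbers, that one Control Message is sent per maximum-latency interval (1000 µs for CERN, 100 µs for GSI) and that exactly one CM may be lost per year. The function name and the sending period are illustrative assumptions, not taken from the poster.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, leap years ignored


def cm_failure_rate(cms_lost_per_year: float, cm_period_s: float) -> float:
    """Allowed fraction of lost CMs, assuming one CM is sent every cm_period_s."""
    cms_sent_per_year = SECONDS_PER_YEAR / cm_period_s
    return cms_lost_per_year / cms_sent_per_year


# Assumption: the CM period equals the maximum latency from Table 1.
cern_rate = cm_failure_rate(1, 1000e-6)
gsi_rate = cm_failure_rate(1, 100e-6)
print(f"CERN: {cern_rate:.2e}  GSI: {gsi_rate:.2e}")
```

Under these assumptions the two rates come out at roughly 3.17 · 10^-11 and 3.17 · 10^-12, which agrees with the table.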

    Reliability

    The reliability of the WRN relies on the deterministic delivery of the Control Data (CD) to all the designated nodes and on their sufficiently accurate and stable synchronization. Unreliability is expressed as the number of Control Messages (CMs) considered lost (not delivered, delivered corrupted, or delivered in a non-deterministic way) in a given period of time. Throughout this period, the synchronization must always be of the required quality. Quantitative requirements of the accelerator facilities are listed in Table 1.

    Information distributed over the WRN (types of services):

    • Time is distributed from a switch/node called the Timing Master (TM) to all the other nodes/switches in the network. The deviation between the clock of the TM and that of any other node/switch is called accuracy.

    • Control Data (CD): data is exchanged among all the nodes. However, the critical data is that sent by a Data Master (DM), carrying sets of commands (events) which are organized into Control Messages (CMs). The worst-case upper bound of their delivery latency (Max latency) from the DM to any node in the network, regardless of its location (d_max from DM), must be guaranteed; this is a determinism requirement.

    Deterministic packet delivery

    Data resilience

    Synchronization resilience

    Topology redundancy

    Conclusions

    A White Rabbit Network must be considered an ordinary Ethernet network with extra optional built-in features which, when properly used, can make it robust and more reliable. The reliability study identifies areas which need to be addressed to increase the reliability of a WRN. The development of WR is an ongoing effort: some of the suggested solutions have already been investigated or developed (FEC, clock distribution), while others need further verification (RSTP, cut-through forwarding). The suggested solutions make it possible to fulfill the requirements set by CERN and GSI. However, their cost might prompt a re-justification of at least two of these requirements: the upper-bound latency set by GSI and the number of CMs lost per year. The network topology and its reliability calculations turned out to be the greatest factor in the overall system reliability, so more precise calculations and simulations are needed to verify the rough estimations. These may include different techniques (e.g. Monte Carlo simulations) as well as more real-life use cases (not available at the time of the described study). In particular, the calculations need to take into account that not all the nodes connected to the WRN are equally critical in real-life applications.

    Deterministic packet delivery

    • A carefully configured and properly used WRN offers deterministic Ethernet frame delivery.

    • The upper bound of the delivery latency can be verified by analysis of the publicly available source code.

    • The analysis (Table 2) revealed that GSI’s requirement (100 µs over 2 km) is not fulfilled with the currently implemented store-and-forward solution.

    • It was proposed to reserve the highest-priority broadcast Ethernet traffic for the Control Data and to implement cut-through forwarding for this traffic. The analysis results (Table 2) show a significant improvement for the proposed cut-through method.

    Table 2. Control Message (CM) delivery latency estimation. The requirement is 1000 µs for CERN and 100 µs for GSI.
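To illustrate why cut-through forwarding helps, here is a minimal latency model. It is a sketch under stated assumptions and does not reproduce the poster's analysis or the Table 2 values: the 1 Gbit/s link rate, the 5 µs/km fibre delay, the 3 µs fixed switch delay and the hop count are all hypothetical parameters chosen for the example, and FEC overhead is ignored.

```python
def serialization_us(frame_bytes: int, link_bps: float = 1e9) -> float:
    """Time to clock a frame onto the wire, in microseconds."""
    return frame_bytes * 8 / link_bps * 1e6


def delivery_latency_us(frame_bytes: int, hops: int, distance_km: float,
                        cut_through: bool,
                        switch_fixed_us: float = 3.0,   # assumed value
                        fibre_us_per_km: float = 5.0,   # assumed value
                        link_bps: float = 1e9) -> float:
    """Rough end-to-end latency of one frame from sender to receiver."""
    tx = serialization_us(frame_bytes, link_bps)
    # Store-and-forward must receive the whole frame before sending it on,
    # so every hop pays the serialization time again; cut-through does not.
    per_hop = switch_fixed_us if cut_through else switch_fixed_us + tx
    return tx + hops * per_hop + distance_km * fibre_us_per_km


for size in (500, 1500, 5000):
    sf = delivery_latency_us(size, hops=3, distance_km=2, cut_through=False)
    ct = delivery_latency_us(size, hops=3, distance_km=2, cut_through=True)
    print(f"{size:5d} bytes: store-and-forward {sf:6.1f} us, cut-through {ct:6.1f} us")
```

In this model the gap between the two methods grows with both frame size and hop count, which is the qualitative effect the cut-through proposal relies on.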

    Synchronization resilience

    In WR, the switch-over between redundant sources of timing is heavily supported by the Clock Recovery System (CRS) of the switch and by the WR extension to PTP (WRPTP). Figure 2 presents an example where a switch (timing slave) is connected by its uplinks 1 and 2 to two other switches (primary and secondary masters), the sources of timing. On both uplinks the frequency is recovered from the signal and provided to the CRS. Similarly, WRPTP measures the delay and offset on each link and provides this data to the CRS. The information from uplink 1 (primary) is used to control the CRS and adjust the local time. However, all the necessary information from uplink 2 is available at any time, so a seamless switch-over can be performed if the primary master fails.

    Figure 2. Seamless switch-over between redundant uplinks.
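The idea of keeping both uplinks measured at all times can be sketched in a few lines. The offset and delay formulas are the standard PTP two-way exchange for a symmetric link; WRPTP refines this further, which is not modelled here. The class names, the uplink labels and the selection policy are hypothetical and only illustrate the principle.

```python
from dataclasses import dataclass


@dataclass
class LinkMeasurement:
    t1: float  # master sends (master clock)
    t2: float  # slave receives (slave clock)
    t3: float  # slave sends (slave clock)
    t4: float  # master receives (master clock)

    def delay(self) -> float:
        """One-way link delay, assuming a symmetric link."""
        return ((self.t2 - self.t1) + (self.t4 - self.t3)) / 2

    def offset(self) -> float:
        """Slave clock minus master clock."""
        return ((self.t2 - self.t1) - (self.t4 - self.t3)) / 2


class TimingSlave:
    """Tracks every uplink but steers the local clock from the active one only."""

    def __init__(self, primary: str = "uplink1"):
        self.active = primary
        self.latest: dict[str, LinkMeasurement] = {}

    def update(self, uplink: str, measurement: LinkMeasurement) -> None:
        self.latest[uplink] = measurement

    def link_down(self, uplink: str) -> None:
        self.latest.pop(uplink, None)
        if uplink == self.active and self.latest:
            # The backup measurement already exists, so no re-synchronization
            # phase is needed before it can be used.
            self.active = next(iter(self.latest))

    def correction(self) -> float:
        """Amount to add to the local clock to align it with the master."""
        return -self.latest[self.active].offset()


slave = TimingSlave()
slave.update("uplink1", LinkMeasurement(t1=100, t2=115, t3=120, t4=115))
slave.update("uplink2", LinkMeasurement(t1=100, t2=117, t3=120, t4=117))
slave.link_down("uplink1")
print(slave.active, slave.correction())
```

In the example the slave clock runs 10 time units ahead of the master; the two uplinks have different delays (5 and 7 units) but yield the same offset, so the correction is unchanged after the switch-over.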

    Figure 1. Control Message latency estimation.

    Data resilience

    Two concatenated Forward Error Correction (FEC) schemes are used to decrease the loss rate of the Control Messages:

    • Reed-Solomon (R-S) coding encodes k original frames into n encoded frames (n > k). Reception of any k encoded frames is enough to decode the original frames.

    • Hamming coding (SEC-DED) is used to correct bit errors.

    Hardware-supported Rapid Spanning Tree Protocol (RSTP) is proposed to prevent losing Control Messages (CMs) during the switch-over between redundant links. A maximum convergence time of 3 µs between active and backup connections is required.
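The two layers of protection can be illustrated with small stand-ins. The first part below is not Reed-Solomon: it is a single XOR parity frame, the simplest erasure code with the same "any k of n frames suffice" property for n = k + 1. The second part is an extended Hamming (8,4) code, a small instance of SEC-DED. All function names and the frame contents are made up for the example, and nothing here reflects the actual WR frame format.

```python
from functools import reduce


def xor_frames(frames):
    """Byte-wise XOR of equal-length frames."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*frames))


def erasure_encode(frames):
    """k frames -> k + 1 frames: the originals plus one XOR parity frame."""
    return list(frames) + [xor_frames(frames)]


def erasure_decode(received):
    """Recover the k originals from k + 1 slots, at most one of which is None."""
    missing = [i for i, frame in enumerate(received) if frame is None]
    if len(missing) > 1:
        raise ValueError("a single parity frame cannot repair more than one loss")
    slots = list(received)
    if missing:
        slots[missing[0]] = xor_frames([f for f in slots if f is not None])
    return slots[:-1]


def secded_encode(nibble):
    """4 data bits -> 8 code bits; index 0 holds the overall parity bit."""
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def secded_decode(code):
    """Return (nibble, status); nibble is None when a double error is detected."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position          # XOR of the positions of all set bits
    parity_broken = sum(code) % 2 == 1
    if syndrome and not parity_broken:
        return None, "double error"       # detectable but not correctable
    status = "ok"
    if parity_broken:
        code[syndrome] ^= 1               # syndrome 0 points at the parity bit
        status = "corrected"
    return code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3, status


frames = [b"CM-0", b"CM-1", b"CM-2"]
on_wire = erasure_encode(frames)
on_wire[1] = None                         # one frame lost in transit
print(erasure_decode(on_wire) == frames)

word = secded_encode(0b1011)
word[6] ^= 1                              # one bit flipped in transit
print(secded_decode(word))
```

A real deployment would use a Reed-Solomon code with n - k > 1 so that several lost frames can be tolerated; the XOR version only shows the mechanism.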

    Topology redundancy

    The overall reliability depends strongly on the WRN topology, which needs to be appropriate for the proposed solutions (SyncE, H/W-supported RSTP, determinism).

    Topology comparison:

    • We consider the reliability of a network of switches.

    • Each node is connected to the network with M links (each to a separate switch), where M reflects the level of redundancy.

    • Rough estimations of the probability of WRN failure (Table 3), using analytic calculations for the three considered topologies, show that the triple-redundancy topology can barely satisfy the requirements of CERN (Table 1).
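The effect of the redundancy level M can be sketched with a very rough analytic model. This is not the calculation behind Table 3 and will not reproduce its numbers: it assumes independent switch failures with an exponential lifetime, looks only at the switches a node is directly attached to, and uses a hypothetical 12-hour observation window. Only the switch MTBF of 200 000 hours comes from the poster.

```python
import math

SWITCH_MTBF_H = 200_000  # switch MTBF quoted with Figure 4


def switch_failure_prob(hours: float, mtbf_h: float = SWITCH_MTBF_H) -> float:
    """Probability that one switch fails within `hours` (exponential model)."""
    return 1.0 - math.exp(-hours / mtbf_h)


def node_isolation_prob(hours: float, m_links: int) -> float:
    """Probability that all M switches a node is attached to fail in the window.

    Assumes independent failures and ignores everything upstream of those switches.
    """
    return switch_failure_prob(hours) ** m_links


WINDOW_H = 12  # hypothetical observation window
for m in (1, 2, 3):
    print(f"M = {m}: {node_isolation_prob(WINDOW_H, m):.2e}")
```

Each additional independent link multiplies the isolation probability by the single-switch failure probability, which is why the estimate drops by several orders of magnitude per redundancy level.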

    [Figure: WRN overview. A Time & Data Master with a GPS reference distributes Time and Control Data through WR Switches to WR Nodes. Labels: 2000 nodes, 10 km.]

    Figure 4. WRN topologies; switch MTBF of 200 000 hours.

    [Figure 2 diagram: a WR switch (timing slave) whose Clock Recovery Unit receives the reference clocks recovered by the PHYs of uplink 1 and uplink 2. On each uplink, WR PTP (offset & delay) exchanges timestamps t1 to t4 with a WR switch acting as primary or secondary timing master and yields Offset_MS. The timing slave also has two downlinks.]

    Figure 3. FEC & RSTP

    Synchronization quality can deteriorate due to:

    ☑ Failure of network elements (resolved by network topology redundancy)

    ☑ Variation of external conditions, e.g. temperature (resolved by the PTP protocol in use)

    ☐ Switch-over between redundant sources of timing (proposed special solution)

    Table 3. Reliability of the WRN topologies.

    White Rabbit is an open hardware and open software effort. All its sources are freely available on the project’s website: www.ohwr.org/projects/white-rabbit

    The failure study resulted in dividing the problem of reliability in the WRN into sub-domains:

    • Determinism of the data delivery latency, which varies with cable length, the number of hops (switches) the data has to traverse to reach its destination, and the traffic load.

    • Synchronization resilience against element failure, switch-over between redundant elements and the variation of external conditions.

    • Data resilience against data loss or corruption due to imperfections of the physical medium and switching between redundant elements of the network.

    • Topology redundancy to eliminate Single Points of Failure (SPoF). Due to the one-to-all character of information distribution in the WRN, all the elements of the WRN are considered SPoF.

    [Figure 1 diagram: timeline from 0 µs to 75.6 µs of a Control Message travelling from a WR Node through WR Switches over fibre/copper links, with stages Encoding, Sending, Link, Receiving and Decoding, and a "Frames sent" marker. Stage durations as printed: 0.2 µs, 2.5 µs, 2.5 µs, 4 x 2.3 µs = 9.2 µs, 2.5 µs, 12.5 µs, 12.2 µs, 12.5 µs, 2.5 µs, 0.5 µs, 12.2 µs + 3 x 2.3 µs = 19.1 µs. Legend: WR Node, WR Switch, Fibre/copper.]

    Table 3 data (P: probability of WRN failure):

    Redundancy    Switches    P                MTBF [h]
    No            127         2.08 · 10^-3     5.77 · 10^3
    Double        292         4.71 · 10^-7     2.55 · 10^7
    Triple        495         3.06 · 10^-11    4.08 · 10^11

    Table 2 data (CM delivery latency):

    CM size [bytes]            500       1500      5000
    Store-and-forward, GSI     221 µs    285 µs    324 µs
    Store-and-forward, CERN    283 µs    325 µs    364 µs
    Cut-through, GSI           76 µs     102 µs    162 µs
    Cut-through, CERN          118 µs    142 µs    202 µs

    [Figure 3 diagram: a 500-byte Control Message passes from a WR Node through an FEC encoder (Reed-Solomon encoding, Hamming, parity check), across WR Switches with HP Bypass forwarding and HW-supported RSTP (ports 1 to 4, one path forwarding and one dropping), to the FEC decoder of the receiving WR Node, which rebuilds the original message despite lost frames. Printed annotations: Switch Delay (1 µs), Switch Delay (11 µs), Switch-over (2.3 µs). Message-size labels as extracted: "X kB (e.g. 500kB)", "8*CEIL(7*1*X)kB (e.g. 256kB)", "(8+1)*CEIL(9X) kB (e.g. ~288kB)".]

    [Figure 4 diagram: three topologies, T1, T2 and T3, each connecting a Data Master to Nodes. Printed labels: L0, L1, L2; network sizes 4096 nodes, 1024 nodes and 400 nodes.]