Track – Next Generation Data Center Fabrics: Do New Technologies Meet Operator Needs?
NANOG 76, 10 June 2019
1
2
Agenda
• Advancing Ethernet for Data Center Networks (Roger Marks, EthAirNet Associates)
• Towards Hyperscale High Performance Computing with RDMA (Omar Cardona, Microsoft)
• The IEEE P802.1Qcz Project on Congestion Isolation (Paul Congdon, Tallac Networks)
• Discussion: Identifying Problems and Inefficiencies in Current Data Center Operations
2
Advancing Ethernet for Data Center Networks
Roger Marks, roger@ethair.net, +1 802 capable
Prepared 3 June 2019
3
4
Disclaimer
• All speakers presenting information on IEEE standards speak as individuals, and their views should be considered the personal views of that individual rather than the formal position, explanation, or interpretation of the IEEE
4
5
IEEE 802 LAN/MAN Standards Committee (LMSC)
• Develops standards for Local Area Networks (LAN), Metropolitan Area Networks (MAN), Regional Area Networks (RAN), Personal Area Networks (PAN), Wireless Specialty Networks, etc.
• In operation since March 1980
5
6
Reference Model for End Stations
[Figure: OSI stack – Application / Presentation / Session / Transport / Network / Data Link / Physical – with IEEE 802 covering the Data Link and Physical layers over copper, fiber, air, and other media]
Lower-Layer Focus: IEEE 802 standards emphasize the functionality of the lowest two layers of the OSI reference model, and the higher layers as they relate to network management:
• physical layer (PHY; Layer 1)
• data link layer (DLL; Layer 2)
IEEE 802 divides the DLL into:
• Medium Access Control (MAC): multiple specifications
• Logical Link Control (LLC): common
See details in IEEE Std 802, "IEEE Standard for Local and Metropolitan Area Networks: Overview and Architecture"
6
[Figure: LLC and MAC sublayers of the Data Link Layer, above the Medium]
7
Nendica
• Nendica: IEEE 802 "Network Enhancements for the Next Decade" Industry Connections Activity
• IEEE Industry Connections activities "provide an efficient environment for building consensus and developing many different types of shared results. Such activities may complement, supplement, or be precursors of IEEE Standards projects"
• Organized under the IEEE 802.1 Working Group
• Chartered through March 2021
• Chair: Roger Marks
• Open to all participants; no membership
7
8
Nendica Motivation and Goals
• "The goal of this activity is to assess… emerging requirements for IEEE 802 wireless and higher-layer communication infrastructures, identify commonalities, gaps, and trends not currently addressed by IEEE 802 standards and projects, and facilitate building industry consensus towards proposals to initiate new standards development efforts"
8
9
Nendica Report The Lossless Network for Data Centers
• Paul Congdon, Editor
• Key messages regarding the data center:
  – Packet loss leads to large delays
  – Congestion leads to packet loss
  – Conventional methods are problematic
  – A Layer 3 network uses Layer 2 transport; action at Layer 2 can reduce congestion, and thereby loss
  – The paper is not specifying a "lossless" network, but describing a few prospective methods to progress towards a lossless data center network in the future
• The report is open to comment and may be revised
9
10
Use Cases The Lossless Network for Data Centers
• The scale of Data Center Networks continues to grow:
  – Online Data-Intensive (OLDI) Services
  – AI Deep Learning and Model Training
  – Cloud High-Performance Computing
  – Financial Trading
  – Distributed Storage
  – Non-Volatile Memory Express (NVMe) over Fabrics
• Data Center ⇒ High-Performance Computer
• New requirements for a lossless Ethernet fabric
10
11
Data Center Applications are distributed and latency-sensitive
11
Copyright © 2018 IEEE. All rights reserved.
…experience is highly dependent upon the system responsiveness, and even moderate delays of less than a second can have a measurable impact on individual queries and their associated advertising revenue. A large chunk of unavoidable delay, due to the speed of light, is inherently built into a system that uses the remote cloud as the source of decision and information. This puts even more pressure on the deadlines within the data center itself. To address these latency concerns, OLDI services deploy individual requests across thousands of servers simultaneously. The responses from these servers are coordinated and aggregated to form the best recommendations or answers. Delays in obtaining these answers are compounded by delayed or 'straggler' communication flows between the servers. This creates a long tail latency distribution in the data center for highly parallel applications. To combat tail latency, servers are often arranged in a hierarchy, as shown in Figure 1, with strict deadlines given to each tier to produce an answer. If valuable data arrives late because of latency in the network, the data is simply discarded and a sub-optimal answer may be returned. Studies have shown that the network becomes a significant component of overall data center latency when congestion occurs in the network [2].
Figure 1 – Parallel Application Hierarchy
The long tail of latency distribution in OLDI data centers can be caused by various factors [3]. One is simply related to the mix of traffic between control messages (mice) and data messages (elephants). While most of the flows in the data center are mice, most of the bytes transferred across the network are due to elephants. Therefore, a small number of elephant flows can delay the set-up of control channels established by mice flows. Since OLDI data centers are processing requests over thousands of servers simultaneously, the mix and interplay of mice and elephant flows is highly uncoordinated. An additional complexity is that flows can change behavior over time: what was once an elephant can transform into a mouse after an application has reached steady state. Another cause of latency is due to incast at the lower tiers of the node hierarchy. Leaf worker nodes return their answers to a common parent in the tree at nearly the same time. This can cause buffer over-runs and packet loss within an individual switch. It may invoke congestion management schemes, such as flow control or congestion notification, which have little effect on mice flows and tail latency.
• Tend toward congestion, e.g., due to incast
• Packet loss leads to retransmission, more congestion, more delay
Source: IEEE 802 Nendica Report, The Lossless Network for Data Centers [2]
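The loss-retransmission feedback loop described above can be illustrated with a toy queue model: dropped packets re-enter the offered load on the next round, so congestion compounds. All parameters here are illustrative, not taken from the report.

```python
# Toy discrete-time model of incast: N senders burst into one switch
# egress queue; tail-dropped packets are retransmitted next round,
# adding to the congestion. Parameters are hypothetical.
def incast_rounds(senders=32, pkts_each=10, queue_cap=100, rounds=4):
    backlog = 0                                  # packets awaiting retransmission
    history = []
    for _ in range(rounds):
        offered = senders * pkts_each + backlog  # fresh burst + retries
        queued = min(offered, queue_cap)         # queue accepts up to capacity
        dropped = offered - queued               # tail drop
        backlog = dropped                        # drops must be resent
        history.append((offered, dropped))
    return history

for offered, dropped in incast_rounds():
    print(f"offered={offered:4d}  dropped={dropped:4d}")  # drops grow each round
```

The point of the sketch: once the queue overflows, retransmissions alone guarantee that the next round's offered load is larger, exactly the spiral the slide summarizes.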
12
12
Remote Direct Memory Access (RDMA) in the Data Center
Source: InfiniBand Trade Association <http://www.roceinitiative.org/roce-introduction>
13
13
RoCE: RDMA over Converged Ethernet
• Dependent on the transport layer; intolerant of packet loss
• No protection against packet loss
• Typically no protection against packet loss
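One reason RDMA transports are so intolerant of loss: hardware implementations have commonly used go-back-N recovery, so a single drop forces retransmission of everything sent after it. A toy cost model (the window size and loss positions are illustrative):

```python
# Simplified go-back-N cost model: a drop at position p in a window
# forces retransmission of all packets after p. Numbers are illustrative.
def goback_n_cost(window=64, loss_positions=(10,)):
    """Total packets sent to deliver `window` packets under go-back-N,
    when the packets at `loss_positions` are each dropped once."""
    sent = window
    for pos in sorted(loss_positions):
        sent += window - pos          # everything after the loss is resent
    return sent

print(goback_n_cost())                       # one mid-window drop: 118 sends for 64 packets
print(goback_n_cost(loss_positions=(1, 2)))  # early drops nearly triple the cost
```

This is why even a small loss rate translates into a large goodput penalty for RoCE, and why the fabric underneath is expected to be (near-)lossless.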
14
Lossless Ethernet is a Foundation of the New Data Center
• Data Centers are becoming High-Performance Computers
• Most Data Center networking is at Layer 3/4, but it all rides on Layer 1/2
• Ethernet is the preferred option for Layer 1/2:
  – Ethernet infrastructure can support multiple upper layers (e.g., TCP and RoCEv2) simultaneously
• Most Data Center networking requires low latency
• Some Data Center networking requires a lossless Ethernet fabric
14
15
IEEE 802 Developed Data Center Bridging
• In the IEEE 802.1 Working Group, the Data Center Bridging Task Group developed many standards to enhance Layer 2 support for the data center, beginning around 2006
• Key technologies include:
  – Priority-based Flow Control (PFC)
  – Congestion Notification
  – Enhanced Transmission Selection (ETS)
15
16
Folded-Clos Network: Many Paths from Server to Server
16
[Figure: folded-Clos fabric – spine switches above top-of-rack switches, with servers in each rack]
17
Incast fills the output queue (note: ECMP cannot help)
17
[Figure: incast – ECMP spreads flows across the spines, but they converge on one rack switch's egress queue]
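Why ECMP cannot help here: equal-cost multipath hashes each flow onto a spine path, but incast flows share the same destination, so whatever paths they take, they all converge on the same last-hop egress link. A minimal sketch (the hash function and addresses are hypothetical):

```python
import hashlib

SPINES = 4  # equal-cost paths between leaf switches

def ecmp_path(src, dst, sport=12345, dport=4791, proto=17):
    """Pick a spine by hashing flow identifiers, as ECMP does."""
    key = f"{src},{dst},{sport},{dport},{proto}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % SPINES

# Eight workers answer the same parent server at once (incast).
flows = [(f"10.0.{i}.2", "10.0.99.2") for i in range(8)]
paths = [ecmp_path(src, dst) for src, dst in flows]
print("spine paths used:", sorted(set(paths)))          # flows spread across spines...
print("last-hop egress links:", {dst for _, dst in flows})  # ...but all meet at one port
```

ECMP balances load across the middle of the fabric; the congestion point in incast is the single egress queue at the edge, which no path choice avoids.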
18
Priority Flow Control (PFC)
[Figure: incast at a rack switch; PFC pause propagates upstream toward the spine]
• Output backup fills the ingress queue
• PFC can be used to pause input per QoS class
• IEEE 802.1Q (originally in 802.1Qbb)
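PFC's per-class pause behavior can be sketched as watermarks on an ingress queue: crossing an XOFF threshold sends a PAUSE for that priority; draining below XON resumes it. The thresholds and class below are illustrative, not values from 802.1Q.

```python
class PfcIngressQueue:
    """Toy per-priority ingress queue with PFC pause/resume watermarks."""
    def __init__(self, priority, xoff=80, xon=40):
        self.priority, self.xoff, self.xon = priority, xoff, xon
        self.depth = 0
        self.paused = False        # True => PAUSE frame sent upstream

    def enqueue(self, n=1):
        self.depth += n
        if not self.paused and self.depth >= self.xoff:
            self.paused = True     # pause this traffic class only

    def dequeue(self, n=1):
        self.depth = max(0, self.depth - n)
        if self.paused and self.depth <= self.xon:
            self.paused = False    # resume (PAUSE with zero quanta)

q = PfcIngressQueue(priority=3)
q.enqueue(100)      # incast burst crosses XOFF: upstream paused
print(q.paused)     # True
q.dequeue(70)       # drain below XON: upstream resumed
print(q.paused)     # False
```

Note the hysteresis between XOFF and XON: it prevents pause/resume flapping, but while paused, every flow in the class stops, which is exactly the "victim flow" problem the next slide shows.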
19
PFC pauses all flows of the class, including "victim" flows
19
[Figure: PFC stops both flows – the incast flow and a "victim" flow sharing the same class]
20
Traffic-class blocking analysis
Figure 11: Network performance degradation due to HOL blocking
…other flows in the network that are injected from other sources and are addressed to destinations different from X (Y and Z, respectively), so that they do not contribute to the congestion tree. However, as can be seen, the latter flows interact (i.e., share queues) with the congested flows at several points of the network; then the latter flows produce HOL blocking to the former ones. As a consequence, all the flows end up being injected at the same rate as the contributor ones, despite the fact that the links connected to Y and Z are not saturated. On the other hand, note that at the switch where the root is located, the HOL blocking appears due to the direct contention of the congested flows to access the root of the tree, but in other (upstream) switches, the HOL blocking appears due to congestion propagation through flow control. In general, the HOL blocking produced in the switch where congestion originates is called "low-order" HOL blocking, while the one produced at upstream switches is called "high-order" HOL blocking.
4 Reducing In-Network Congestion
Taking into account the problems derived from congestion trees, a natural approach to deal with them is trying to prevent their formation. This approach is actually a particular case…
Source: IEEE 802.1-19-0012-00 [4]
21
Congested-Flow Isolation (see IEEE Project P802.1Qcz [5])
21
[Figure: downstream and upstream switches, each with ingress ports (holding virtual queues) feeding egress ports; congested flows are separated from non-congested flows]
1. Identify the flow causing congestion and isolate it locally
2. CIP: signal to the neighbor when the congested queue fills
3. The upstream switch isolates the flow too, eliminating head-of-line (HoL) blocking
4. PFC: if the congested queue continues to fill, invoke PFC for lossless operation
Source: IEEE 802 Nendica Report, The Lossless Network for Data Centers [2]
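The four steps above can be sketched as switch-port logic. This is a simplified model of the P802.1Qcz idea, not the specified mechanism; the class name, thresholds, and signal format are hypothetical.

```python
class CiSwitchPort:
    """Toy model of congestion isolation at one switch port."""
    def __init__(self, cip_thresh=50, pfc_thresh=90):
        self.normal_q, self.isolated_q = [], []
        self.cip_thresh, self.pfc_thresh = cip_thresh, pfc_thresh
        self.isolated_flows = set()
        self.signals = []              # messages sent to the upstream neighbor

    def mark_congested(self, flow):
        # Step 1: identify the congesting flow and isolate it locally.
        self.isolated_flows.add(flow)

    def receive(self, flow, pkt):
        if flow not in self.isolated_flows:
            self.normal_q.append(pkt)  # victim flows keep moving
            return
        self.isolated_q.append(pkt)
        depth = len(self.isolated_q)
        if depth == self.cip_thresh:
            # Step 2: CIP tells the upstream switch to isolate too (Step 3),
            # which eliminates head-of-line blocking there.
            self.signals.append(("CIP", flow))
        if depth == self.pfc_thresh:
            # Step 4: congested queue still filling -> invoke PFC for losslessness.
            self.signals.append(("PFC", flow))

port = CiSwitchPort()
port.mark_congested("elephant")
for i in range(95):
    port.receive("elephant", i)
    port.receive("mouse", i)
print(port.signals)        # [('CIP', 'elephant'), ('PFC', 'elephant')]
print(len(port.normal_q))  # 95 -- the victim flow is unaffected
```

The key property the sketch shows: only the isolated flow's queue backs up and eventually triggers PFC, so the pause no longer stalls well-behaved flows in the same class.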
22
Congested-Flow Isolation: Analysis
Source: IEEE 802.1-19-0012-00 [4]
23
Figure 14: Isolating contributor flows eliminates HOL blocking
…other flows, so that if HOL blocking were prevented, congestion trees could exist without actually being a problem on their own.
Although this basic approach has been followed by very different techniques, in general a common advantage of all of them is that they deal with HOL blocking fast and locally, as they attack the HOL blocking when and where it may appear. Therefore, they can complement some techniques explained above that react slowly to congestion (see Section 7). Another common feature of all these techniques is that they configure several queues per port to separately store packets belonging to different flows, hence reducing or eliminating the interaction among these flows, and so the HOL blocking they can produce on each other. Hence, the ultimate goal of these techniques is to isolate in separate queues the flows that contribute to congestion, which are the ones producing significant HOL blocking. This can be seen in Figure 15, which shows the same situation depicted in Figure 11, but now with several queues configured per port to separate the flows addressed to different destinations. As can be seen, the separation of flows into different queues leads to the elimination of HOL blocking, and so to an improvement in the performance of the network, as the utilization of several links increases with respect to the situation where the flows share queues.
Note that HOL blocking would be ideally and completely eliminated if there were, at every port, a queue specifically dedicated to each destination [10], but this is obviously unaffordable, as the number of queues in the network would grow quadratically with…
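The quadratic growth the excerpt refers to is easy to make concrete: one queue per destination per port means queue count scales with (ports × hosts), and in a Clos fabric the total port count itself grows roughly linearly with host count. A quick illustration (the fabric sizing factor is hypothetical):

```python
def queues_per_destination_scheme(n_hosts, ports_per_host=3):
    """Queues needed if every fabric port keeps one queue per destination.

    Rough model: total fabric ports grow linearly with host count
    (ports_per_host is an illustrative folded-Clos sizing factor),
    so total queues grow with the square of the host count.
    """
    total_ports = ports_per_host * n_hosts
    return total_ports * n_hosts   # one queue per destination on every port

for n in (100, 1_000, 10_000):
    print(f"{n:6d} hosts -> {queues_per_destination_scheme(n):,} queues")
```

Growing the fabric 10× multiplies the queue requirement 100×, which is why congestion isolation settles for a small number of isolation queues per port instead of per-destination queues.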
23
Bibliography
1) IEEE 802 Orientation for New Participants, https://mentor.ieee.org/802-ec/dcn/18/ec-18-0117-02.pdf
2) IEEE 802 "Network Enhancements for the Next Decade" Industry Connections Activity (Nendica), https://1.ieee802.org/802-nendica
3) IEEE 802 Nendica Report, "The Lossless Network for Data Centers" (18 August 2018), https://mentor.ieee.org/802.1/dcn/18/1-18-0042-00.pdf
4) Paul Congdon, "The Lossless Network in the Data Center," IEEE 802.1-17-0007-01, 7 November 2017, https://mentor.ieee.org/802.1/dcn/17/1-17-0007-01.pdf
5) Pedro Javier Garcia, Jesus Escudero-Sahuquillo, Francisco J. Quiles, and Jose Duato, "Congestion Management for Ethernet-based Lossless Data Center Networks," IEEE 802.1-19-0012-00, 4 February 2019, https://mentor.ieee.org/802.1/dcn/19/1-19-0012-00.pdf
6) IEEE P802.1Qcz Project, "Congestion Isolation," https://1.ieee802.org/tsn/802-1qcz
23
24
Going forward
• IEEE 802 Nendica Report "The Lossless Network for Data Centers" (18 August 2018) is published but open to further comment; comments are welcome from all
• Could open an activity to revise the report, addressing new issues; proposal [5] may be discussed in a future teleconference
• The report could help identify issues in need of further research or unified action
• Nendica could instigate standardization of key topics, mainly in IEEE 802, but perhaps also in, e.g., the IETF
24
25
Nendica Participation
• Nendica is open to all participants; please join in (no membership requirements):
  – Comment by Internet or contribute documents
  – Call in to teleconferences
  – Attend meetings
• Nendica web site: https://1.ieee802.org/802-nendica (details of in-person and teleconference meetings)
• Feel free to contact the Chair (see first slide)
25
4 Reducing In-Network Congestion
Taking into account the problems derived from congestion trees a natural approach to dealwith them is trying to prevent their formation This approach is actually a particular case
Source IEEE 8021-19-0012-00 [4]
21
Congested-flow Isolation(see IEEE Project P8021Qcz [5])
21
DownstreamUpstream1 3 1 3
2
4
2
4Ingress Port
(Virtual Queues)Egress Port Ingress Port
(Virtual Queues)Egress Port
Congested Flows
Non-Congested Flows
1 Identify the flow causing congestion and isolate locally
CIP 2 Signal to neighbor when congested queue fills
Eliminate HoL Blocking
3 Upstream isolates the flow too eliminating head-of-line blocking
PFC 4 If congested queue continues to fill invoke PFC for lossless
Source IEEE 802 Nendica Report The Lossless Network for Data Centers [2]
22
Congested-flow Isolation Analysis22
Source IEEE 8021-19-0012-00 [4]
23
Figure 14 Isolating contributor flows eliminates HOL blocking
other flows so that if HOL blocking were prevented congestion trees could exist withoutactually being a problem on their own
Although this basic approach have been followed by very dicrarrerent techniques in generala common advantage of all of them is that they deal with HOL blocking fast and locallyas they attack the HOL blocking when and where it may appear Therefore they cancomplement some techniques explained above that react slowly to congestion (see Section7) Another common issue of all these techniques is that they configure several queues perport to separately store packets belonging to dicrarrerent flows hence reducing or eliminatingthe interaction among these flows and so the HOL blocking they can produce each otherHence the ultimate goal of these techniques is to isolate in separate queues the flows thatcontribute to congestion which are the ones producing significant HOL blocking This canbe seen in Figure 15 that shows the same situation depicted in Figure 11 but now withseveral queues configured per port to separate the flows addressed to dicrarrerent destinationsAs can be seen the separation of flows into dicrarrerent queues leads to the elimination ofHOL blocking so to an improvement of the performance of the network as the utilizationof several links increases with respect to the situation that the flows share queues
Note that the HOL blocking would be ideally and completely eliminated if there wereat every port a queue specifically dedicated to each destination [10] but this is obvi-ously unacrarrordable as the number of queues in the network would grow quadratically with
23
Bibliography1) IEEE 802 Orientation for New Participants httpsmentorieeeorg802-ecdcn18ec-18-0117-02pdf
2) IEEE 802 ldquoNetwork Enhancements for the Next Decaderdquo Industry Connections Activity (Nendica) https1ieee802org802-nendica
3) IEEE 802 Nendica Report ldquoThe Lossless Network for Data Centersrdquo (18 August 2018) httpsmentorieeeorg8021dcn181-18-0042-00pdf
4) Paul Congdon ldquoThe Lossless Network in the Data Centerrdquo IEEE 8021-17-0007-01 7 November 2017 httpsmentorieeeorg8021dcn171-17-0007-01pdf
5) Pedro Javier Garcia Jesus Escudero-Sahuquillo Francisco J Quiles and Jose Duato ldquoCongestion Management for Ethernet-based Lossless DataCenterNetworksrdquo IEEE 8021-19-0012-00 4 February 2012 httpsmentorieeeorg8021dcn191-19-0012-00pdf
6) IEEE P8021Qcz Project ldquoCongestion Isolationrdquo https1ieee802orgtsn802-1qcz
23
24
Going forward
bull IEEE 802 Nendica Report ldquoThe Lossless Network for Data Centersrdquo (18 August 2018) is published but open to further comment Comments are welcome from all
bull Could open an activity to revise the report addressing new issues Proposal [5] may be discussed in future teleconference
bull Report could help identify issues in need of further research or unified action
bull Nendica could instigate standardization of key topics Mainly in IEEE 802 perhaps also in eg IETF
24
25
Nendica Participation
bull Nendica is open to all participants please join in no membership requirements Comment by Internet or contribute documents Call in to teleconferences Attend meetings
bull Nendica web site https1ieee802org802-nendica Details of in-person and teleconference meetings
bull Feel free to contact the Chair (see first slide)
25
• Organized under the IEEE 802.1 Working Group
• Chartered through March 2021
• Chair: Roger Marks
• Open to all participants; no membership

7
8

Nendica Motivation and Goals
• "The goal of this activity is to assess… emerging requirements for IEEE 802 wireless and higher-layer communication infrastructures, identify commonalities, gaps, and trends not currently addressed by IEEE 802 standards and projects, and facilitate building industry consensus towards proposals to initiate new standards development efforts."
9

Nendica Report: The Lossless Network for Data Centers
• Paul Congdon, Editor
• Key messages regarding the data center:
  – Packet loss leads to large delays
  – Congestion leads to packet loss
  – Conventional methods are problematic
  – A Layer 3 network uses Layer 2 transport; action at Layer 2 can reduce congestion and thereby loss
  – The paper is not specifying a "lossless" network but describing a few prospective methods to progress towards a lossless data center network in the future
• The report is open to comment and may be revised
10

Use Cases: The Lossless Network for Data Centers
• The scale of Data Center Networks continues to grow:
  – Online Data Intensive (OLDI) Services
  – AI / Deep Learning and Model Training
  – Cloud High-Performance Computing
  – Financial Trading
  – Distributed Storage
  – Non-Volatile Memory Express (NVMe) over Fabrics
• Data Center => High Performance Computer
• New requirements for lossless Ethernet fabric
11

Data Center Applications are distributed and latency-sensitive
…experience is highly dependent upon the system responsiveness, and even moderate delays of less than a second can have a measurable impact on individual queries and their associated advertising revenue. A large chunk of unavoidable delay due to the speed of light is inherently built into a system that uses the remote cloud as the source of decision and information. This puts even more pressure on the deadlines within the data center itself. To address these latency concerns, OLDI services deploy individual requests across thousands of servers simultaneously. The responses from these servers are coordinated and aggregated to form the best recommendations or answers. Delays in obtaining these answers are compounded by delayed or 'straggler' communication flows between the servers. This creates a long tail latency distribution in the data center for highly parallel applications. To combat tail latency, servers are often arranged in a hierarchy, as shown in Figure 1, with strict deadlines given to each tier to produce an answer. If valuable data arrives late because of latency in the network, the data is simply discarded and a sub-optimal answer may be returned. Studies have shown that the network becomes a significant component of overall data center latency when congestion occurs in the network [2].

Figure 1 – Parallel Application Hierarchy

The long tail of the latency distribution in OLDI data centers can be caused by various factors [3]. One is simply related to the mix of traffic between control messages (mice) and data messages (elephants). While most of the flows in the data center are mice, most of the bytes transferred across the network are due to elephants. Therefore, a small number of elephant flows can delay the set-up of control channels established by mice flows. Since OLDI data centers are processing requests over thousands of servers simultaneously, the mix and interplay of mice and elephant flows is highly uncoordinated. An additional complexity is that flows can change behavior over time: what was once an elephant can transform into a mouse after an application has reached steady state. Another cause of latency is incast at the lower tiers of the node hierarchy. Leaf worker nodes return their answers to a common parent in the tree at nearly the same time. This can cause buffer over-runs and packet loss within an individual switch. It may invoke congestion management schemes such as flow-control or congestion notification, which have little effect on mice flows and tail latency.
• Tend toward congestion, e.g. due to incast
• Packet loss leads to retransmission: more congestion, more delay

Source: IEEE 802 Nendica Report, "The Lossless Network for Data Centers" [2]
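The fan-out arithmetic behind this tail-latency problem can be sketched numerically (an illustrative back-of-the-envelope calculation, not taken from the report): even when each worker individually meets its per-tier deadline 99% of the time, a request spread across hundreds of workers almost always waits on at least one straggler.

```python
# Tail-at-scale sketch: probability that a fanned-out request is held up
# by at least one straggler, assuming each worker independently misses
# its per-tier deadline with probability p_miss.

def p_straggler(p_miss: float, fanout: int) -> float:
    """P(at least one of `fanout` workers misses its deadline)."""
    return 1.0 - (1.0 - p_miss) ** fanout

# Each worker meets its deadline 99% of the time (p_miss = 0.01).
for n in (1, 10, 100, 1000):
    print(f"fan-out {n:4d}: {p_straggler(0.01, n):6.1%} of requests see a straggler")
```

With p_miss = 0.01 the straggler probability climbs from 1% at fan-out 1 to roughly 63% at fan-out 100, which is why each tier enforces a deadline and simply discards late answers.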
12
Remote Direct Memory Access (RDMA) in the Data Center

Source: InfiniBand Trade Association, <http://www.roceinitiative.org/roce-introduction>

13
RoCE: RDMA over Converged Ethernet
• Dependent on transport layer; intolerant of packet loss
• No protection against packet loss
• Typically no protection against packet loss
14

Lossless Ethernet is a Foundation of the New Data Center
• Data Centers are becoming High Performance Computers
• Most Data Center networking is at Layer 3/4, but it all rides on Layer 1/2
• Ethernet is the preferred option for Layer 1/2
  – Ethernet infrastructure can support multiple upper layers (e.g. TCP and RoCEv2) simultaneously
• Most Data Center networking requires low latency
• Some Data Center networking requires a lossless Ethernet fabric
15

IEEE 802 Developed Data Center Bridging
• In the IEEE 802.1 Working Group, the Data Center Bridging Task Group developed many standards to enhance Layer 2 support for the data center, beginning around 2006
• Key technologies include:
  – Priority-based Flow Control (PFC)
  – Congestion Notification
  – Enhanced Transmission Selection (ETS)
16

Folded-Clos Network: Many Paths from Server to Server
[Figure: folded-Clos topology with spine switches, racks, and servers]
17

Incast fills output queue (note: ECMP cannot help)
[Figure: incast flows converging across the folded-Clos fabric of spines, racks, and servers; ECMP spreads the flows, but the shared output queue still fills]
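The "ECMP cannot help" point can be illustrated with a small sketch (hypothetical hash-based path selection; the names and the md5-based hash are invented for illustration): per-flow hashing spreads the incast flows across different spine paths, but they all still converge on the single last-hop link to the common parent server.

```python
# ECMP sketch: each flow's 5-tuple hash picks one of several equal-cost
# spine paths, but incast flows share a destination, so they all funnel
# into the same last-hop link regardless of the path chosen.
import hashlib

N_SPINES = 4

def ecmp_spine(src: str, dst: str, sport: int, dport: int, proto: str = "tcp") -> int:
    """Deterministically hash the 5-tuple to an equal-cost spine index."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big") % N_SPINES

# Eight leaf workers answer the same parent (10.0.0.1:5000) at once.
flows = [(f"10.0.1.{i}", "10.0.0.1", 40000 + i, 5000) for i in range(8)]
spines = [ecmp_spine(*f) for f in flows]
last_hops = {f[1] for f in flows}          # every flow exits toward 10.0.0.1

print("spine per flow:", spines)           # typically spread across spines
print("distinct last-hop destinations:", last_hops)
```

ECMP balances *paths*, not *destinations*: since every incast flow terminates at the same server, the bottleneck output queue is shared no matter which spine each flow hashed to.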
18

Priority flow control (PFC)
[Figure: PFC pause propagating upstream through spines, racks, and servers from the incast point]
• Output backup fills ingress queue
• PFC can be used to pause input per QoS class
• IEEE 802.1Q (originally in 802.1Qbb)
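A minimal sketch of the per-class pause mechanism (the XOFF/XON watermark values and message strings are invented for illustration; real thresholds depend on buffer and cable sizing):

```python
# PFC sketch: an ingress queue per priority class; crossing the XOFF
# watermark sends a PAUSE upstream for that class only, and draining
# below XON sends a resume. Thresholds are illustrative.

XOFF, XON = 80, 40   # queue depths (e.g. in buffer cells) triggering pause/resume

class IngressQueue:
    def __init__(self, priority: int):
        self.priority = priority
        self.depth = 0
        self.paused = False

    def enqueue(self, n: int = 1):
        self.depth += n
        if not self.paused and self.depth >= XOFF:
            self.paused = True
            return f"PAUSE priority {self.priority}"   # sent to upstream neighbor
        return None

    def dequeue(self, n: int = 1):
        self.depth = max(0, self.depth - n)
        if self.paused and self.depth <= XON:
            self.paused = False
            return f"RESUME priority {self.priority}"
        return None

q = IngressQueue(priority=3)
msgs = [m for _ in range(100) if (m := q.enqueue())]
print(msgs)   # a single PAUSE once the XOFF watermark is crossed
```

Note the pause applies to the whole priority class, not to individual flows, which is exactly why "victim" flows of the same class stall too (next slide).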
19

PFC pauses all flows of the class, including "victim" flows
[Figure: PFC stops both the incast flow and an unrelated "victim" flow sharing the same class across the spine/rack/server fabric]
20

Traffic-class blocking analysis

Figure 11: Network performance degradation due to HOL blocking

…other flows in the network that are injected from other sources and are addressed to destinations different from X (Y and Z, respectively), so that they do not contribute to the congestion tree. However, as can be seen, the latter flows interact (i.e., share queues) with the congested flows at several points of the network; the latter flows then produce HOL blocking to the former ones. As a consequence, all the flows end up being injected at the same rate as the contributor ones, despite the fact that the links connected to Y and Z are not saturated. On the other hand, note that at the switch where the root is located, the HOL blocking appears due to the direct contention of the congested flows to access the root of the tree, but in other (upstream) switches the HOL blocking appears due to congestion propagation through flow control. In general, the HOL blocking produced in the switch where congestion originates is called "low-order" HOL blocking, while the one produced at upstream switches is called "high-order" HOL blocking.

4 Reducing In-Network Congestion

Taking into account the problems derived from congestion trees, a natural approach to deal with them is trying to prevent their formation. This approach is actually a particular case

Source: IEEE 802.1-19-0012-00 [4]
21

Congested-flow Isolation (see IEEE Project P802.1Qcz [5])
[Figure: upstream and downstream switches, each with ingress ports (virtual queues) and egress ports; congested flows are queued separately from non-congested flows]
1. Identify the flow causing congestion and isolate it locally
2. CIP: signal to neighbor when the congested queue fills
3. Upstream isolates the flow too, eliminating head-of-line (HoL) blocking
4. PFC: if the congested queue continues to fill, invoke PFC for lossless operation

Source: IEEE 802 Nendica Report, "The Lossless Network for Data Centers" [2]
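The four numbered steps can be sketched as a toy model (the thresholds, class names, and message strings here are invented; P802.1Qcz defines the real CIP message and queue behavior):

```python
# Congestion Isolation sketch: a port keeps a normal queue and a
# congested-flow queue. A flow identified as a congestion contributor is
# isolated locally (step 1), the upstream neighbor is signaled via CIP so
# it isolates the flow too (steps 2-3), and PFC is the last resort,
# applied to the congested queue only (step 4). Thresholds illustrative.

ISOLATE_AT, CIP_AT, PFC_AT = 20, 40, 60   # invented queue depths

class CIPort:
    def __init__(self):
        self.normal, self.congested = [], []
        self.isolated = set()             # flow IDs moved to the congested queue

    def receive(self, flow_id: str):
        events = []
        queue = self.congested if flow_id in self.isolated else self.normal
        queue.append(flow_id)
        # Step 1: identify the contributing flow and isolate it locally.
        if flow_id not in self.isolated and len(self.normal) == ISOLATE_AT:
            self.isolated.add(flow_id)
            events.append(f"isolate {flow_id}")
        # Steps 2-3: signal upstream (CIP) so the neighbor isolates it too.
        if len(self.congested) == CIP_AT:
            events.append("send CIP upstream")
        # Step 4: only if the congested queue keeps filling, PFC it.
        if len(self.congested) == PFC_AT:
            events.append("PFC pause congested queue")
        return events

port = CIPort()
log = [e for _ in range(100) for e in port.receive("elephant")]
print(log)
```

The key design point: because only the congested queue is ever paused, mice and other well-behaved flows keep flowing, avoiding the victim-flow HoL blocking of plain PFC.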
22

Congested-flow Isolation Analysis

Source: IEEE 802.1-19-0012-00 [4]
23

Figure 14: Isolating contributor flows eliminates HOL blocking

…other flows, so that if HOL blocking were prevented, congestion trees could exist without actually being a problem on their own.

Although this basic approach has been followed by very different techniques, in general a common advantage of all of them is that they deal with HOL blocking fast and locally, as they attack the HOL blocking when and where it may appear. Therefore, they can complement some techniques explained above that react slowly to congestion (see Section 7). Another common feature of all these techniques is that they configure several queues per port to separately store packets belonging to different flows, hence reducing or eliminating the interaction among these flows and so the HOL blocking they can produce on each other. Hence, the ultimate goal of these techniques is to isolate in separate queues the flows that contribute to congestion, which are the ones producing significant HOL blocking. This can be seen in Figure 15, which shows the same situation depicted in Figure 11, but now with several queues configured per port to separate the flows addressed to different destinations. As can be seen, the separation of flows into different queues leads to the elimination of HOL blocking, and so to an improvement of the performance of the network, as the utilization of several links increases with respect to the situation where the flows share queues.

Note that the HOL blocking would be ideally and completely eliminated if there were at every port a queue specifically dedicated to each destination [10], but this is obviously unaffordable, as the number of queues in the network would grow quadratically with
Bibliography
1) IEEE 802 Orientation for New Participants: https://mentor.ieee.org/802-ec/dcn/18/ec-18-0117-02.pdf
2) IEEE 802 "Network Enhancements for the Next Decade" Industry Connections Activity (Nendica): https://1.ieee802.org/802-nendica
3) IEEE 802 Nendica Report, "The Lossless Network for Data Centers" (18 August 2018): https://mentor.ieee.org/802.1/dcn/18/1-18-0042-00.pdf
4) Paul Congdon, "The Lossless Network in the Data Center," IEEE 802.1-17-0007-01, 7 November 2017: https://mentor.ieee.org/802.1/dcn/17/1-17-0007-01.pdf
5) Pedro Javier Garcia, Jesus Escudero-Sahuquillo, Francisco J. Quiles, and Jose Duato, "Congestion Management for Ethernet-based Lossless Data Center Networks," IEEE 802.1-19-0012-00, 4 February 2019: https://mentor.ieee.org/802.1/dcn/19/1-19-0012-00.pdf
6) IEEE P802.1Qcz Project, "Congestion Isolation": https://1.ieee802.org/tsn/802-1qcz

23
24

Going forward
• The IEEE 802 Nendica Report "The Lossless Network for Data Centers" (18 August 2018) is published but open to further comment; comments are welcome from all
• Could open an activity to revise the report, addressing new issues; proposal [5] may be discussed in a future teleconference
• The report could help identify issues in need of further research or unified action
• Nendica could instigate standardization of key topics, mainly in IEEE 802, perhaps also in e.g. the IETF
25

Nendica Participation
• Nendica is open to all participants; please join in, no membership requirements
  – Comment by Internet or contribute documents
  – Call in to teleconferences
  – Attend meetings
• Nendica web site: https://1.ieee802.org/802-nendica
  – Details of in-person and teleconference meetings
• Feel free to contact the Chair (see first slide)
5
IEEE 802 The LANMAN Standard Committee (LMSC) IEEE 802 LANMAN Standards Committee (LMSC) Develops standards for Local Area Networks (LAN) Metropolitan Area Networks (MAN) Regional Area Networks (RAN) Personal Area Networks (PAN) Wireless Specialty Networks etc In operation since March 1980
5
6
Reference Model for
End Stations
Application
Presentation
Session
Transport
Network
Data Link
Physical
bullCopperbullFiberbullAirbullOthers
IEEE802
Lower-Layer Focus IEEE 802 standards emphasize
the functionality of the lowest two layers of the OSI reference model and the higher layers as they relate to network management physical layer (PHY Layer 1) data link layer (DLL Layer 2)
IEEE 802 divides DLL into Medium Access Control (MAC)
Multiple specifications Common logical link control (LLC)
See details in IEEE Std 802 ldquoIEEE Standard for Local and
Metropolitan Area Networks Overview and Architecturerdquo
6
MAC
LLC
Medium
7
Nendicabull Nendica IEEE 802 ldquoNetwork Enhancements for the Next
Decaderdquo Industry Connections Activity IEEE Industry Connections Activities ldquoprovide an efficient
environment for building consensus and developing many different types of shared results Such activities may complement supplement or be precursors of IEEE Standards projectsrdquo
bull Organized under the IEEE 8021 Working Groupbull Chartered through March 2021bull Chair Roger Marksbull Open to all participants no membership
7
8
Nendica Motivation and Goalsbull ldquoThe goal of this activity is to assesshellip emerging
requirements for IEEE 802 wireless and higher-layer communication infrastructures identify commonalities gaps and trends not currently addressed by IEEE 802 standards and projects and facilitate building industry consensus towards proposals to initiate new standards development effortsrdquo
8
9
Nendica Report The Lossless Network for Data Centers
bull Paul Congdon Editorbull Key messages regarding the data center Packet loss leads to large delays Congestion leads to packet loss Conventional methods are problematic A Layer 3 network uses Layer 2 transport action at
Layer 2 can reduce congestion and thereby loss The paper is not specifying a ldquolosslessrdquo network but
describing a few prospective methods to progress towards a lossless data center network in the future
bull The report is open to comment and may be revised
9
10
Use Cases The Lossless Network for Data Centers
bull The scale of Data Center Networks continues to grow Online Data Intensive (OLDI) Services AI Deep Learning and Model Training Cloud High-Performance Computing Financial Trading Distributed Storage Non-Volatile Memory Express (NVMe) over Fabrics
bull Data Center =gt High Performance Computerbull New requirements for lossless Ethernet fabric
10
11
Data Center Applications are distributed and latency-sensitive
11
3
Copyright copy 2018 IEEE All rights reserved
experience is highly dependent upon the system responsiveness and even moderate delays of less than a second can have a measurable impact on individual queries and their associated advertising revenue A large chunk of unavoidable delay due to the speed of light is inherently built into a system that uses the remote cloud as the source of decision and information This puts even more pressure on the deadlines within the data center itself To address these latency concerns OLDI services deploy individual requests across thousands of servers simultaneously The responses from these servers are coordinated and aggregated to form the best recommendations or answers Delays in obtaining these answers are compounded by delayed or lsquostragglerrsquo communication flows between the servers This creates a long tail latency distribution in the data center for highly parallel applications To combat tail latency servers are often arranged in a hierarchy as shown in Figure 1 with strict deadlines given to each tier to produce an answer If valuable data arrives late because of latency in the network the data is simply discarded and a sub‐optimal answer may be returned Studies have shown that the network becomes a significant component of overall data center latency when congestion occurs in the network [2]
Figure 1 ndash Parallel Application Hierarchy
The long tail of latency distribution in OLDI data centers can be caused by various factors [3] One is simply related to the mix of traffic between control messages (mice) and data messages (elephants) While most of the flows in the data center are mice most of the bytes transferred across the network are due to elephants Therefore a small number of elephant flows can delay the set‐up of control channels established by mice flows Since OLDI data centers are processing requests over thousands of servers simultaneously the mix and interplay of mice and elephant flows is highly uncoordinated An additional complexity is that flows can change behavior over time what was once an elephant can transform into a mouse after an application has reached steady state Another cause of latency is due to incast at the lower tiers of the node hierarchy Leaf worker nodes return their answers to a common parent in the tree at nearly the same time This can cause buffer over‐runs and packet loss within an individual switch It may invoke congestion management schemes such as flow‐control or congestion notification which have little effect on mice flows and tail latency
Authorized licensed use limited to Roger Marks Downloaded on September 282018 at 034705 UTC from IEEE Xplore Restrictions apply
bull Tend toward congestion eg due to incastbull Packet loss leads to retransmission more
congestion more delaySource IEEE 802 Nendica Report The Lossless Network for Data Centers [2]
12
12
Remote Direct Memory Access (RDMA) in the Data Center
Source InfiniBand Trade Association lthttpwwwroceinitiativeorgroce-introductiongt
13
13
RoCE RDMA over Converged Ethernet
Dependent on transport layerIntolerant of packet loss
No protection against packet loss
Typically no protectionagainst packet loss
14
Lossless Ethernet is a Foundation of the New Data Center
bull Data Centers are becoming High Performance Computersbull Most Data Center networking is at Layer 34 but it all
rides on Layer 12bull Ethernet is preferred option for Layer 12 Ethernet infrastructure can support multiple upper layers
(eg TCP and RoCEv2) simultaneouslybull Most Data Center networking requires low latency bull Some Data Center networking requires a lossless
Ethernet fabric
14
15
IEEE 802 DevelopedData Center Bridging
bull In the IEEE 8021 Working Group the Data Center Bridging Task Group developed many standards to enhance Layer 2 support for the data center Beginning around 2006
bull Key technologies include Priority-based Flow Control (PFC) Congestion Notification Enhanced Transmission Selection (ETS)
15
16
Folded-Clos NetworkMany Paths from Server to Server
16
server
spine
rack
17
Incast fills output queue(note ECMP cannot help)
17
server
spine
rack
ECMPincast
18
Priority flow control (PFC)18
server
spine
rack
PFC
incast
bull Output backup fills ingress queuebull PFC can be used to pause input per QoS classbull IEEE 8021Q (originally in 8021Qbb)
19
PFC pauses all flows of the classincluding ldquovictimrdquo flows
19
server
spine
rack
PFC stops both flows
incast
20
Traffic-class blocking analysis20
17
Figure 11 Network performance degradation due to HOL blocking
other flows in the network that are injected from other sources and are addressed todestinations dicrarrerent from X (Y and Z respectively) so that they do not contribute to thecongestion tree However as can be seen the latter flows interact (ie share queues) withthe congested flows at several points of the network then the latter flows produce HOLblocking to the former ones As a consequence all the flows end up being injected at thesame rate as the contributor ones despite the fact that the links connected to Y and Zare not saturated On the other hand note that at the switch where the root is locatedthe HOL blocking appears due to the direct contention of the congested flows to accessthe root of the tree but in other (upstream) switches the HOL blocking appears due tocongestion propagation through flow control In general the HOL blocking produced inthe switch where congestion originates is called ldquolow-orderrdquo HOL blocking while the oneproduced at upstream switches is called ldquohigh-orderrdquo HOL blocking
4 Reducing In-Network Congestion
Taking into account the problems derived from congestion trees a natural approach to dealwith them is trying to prevent their formation This approach is actually a particular case
Source IEEE 8021-19-0012-00 [4]
21
Congested-flow Isolation(see IEEE Project P8021Qcz [5])
21
DownstreamUpstream1 3 1 3
2
4
2
4Ingress Port
(Virtual Queues)Egress Port Ingress Port
(Virtual Queues)Egress Port
Congested Flows
Non-Congested Flows
1 Identify the flow causing congestion and isolate locally
CIP 2 Signal to neighbor when congested queue fills
Eliminate HoL Blocking
3 Upstream isolates the flow too eliminating head-of-line blocking
PFC 4 If congested queue continues to fill invoke PFC for lossless
Source IEEE 802 Nendica Report The Lossless Network for Data Centers [2]
22
Congested-flow Isolation Analysis22
Source IEEE 8021-19-0012-00 [4]
23
Figure 14 Isolating contributor flows eliminates HOL blocking
other flows so that if HOL blocking were prevented congestion trees could exist withoutactually being a problem on their own
• Organized under the IEEE 802.1 Working Group
• Chartered through March 2021
• Chair: Roger Marks
• Open to all participants; no membership requirements
7
8
Nendica Motivation and Goals
• "The goal of this activity is to assess… emerging requirements for IEEE 802 wireless and higher-layer communication infrastructures, identify commonalities, gaps, and trends not currently addressed by IEEE 802 standards and projects, and facilitate building industry consensus towards proposals to initiate new standards development efforts"
8
9
Nendica Report: The Lossless Network for Data Centers
• Paul Congdon, Editor
• Key messages regarding the data center:
  – Packet loss leads to large delays
  – Congestion leads to packet loss
  – Conventional methods are problematic
  – A Layer 3 network uses Layer 2 transport; action at Layer 2 can reduce congestion and thereby loss
  – The paper is not specifying a "lossless" network but describing a few prospective methods to progress towards a lossless data center network in the future
bull The report is open to comment and may be revised
9
10
Use Cases The Lossless Network for Data Centers
bull The scale of Data Center Networks continues to grow Online Data Intensive (OLDI) Services AI Deep Learning and Model Training Cloud High-Performance Computing Financial Trading Distributed Storage Non-Volatile Memory Express (NVMe) over Fabrics
bull Data Center =gt High Performance Computerbull New requirements for lossless Ethernet fabric
10
11
Data Center Applications are distributed and latency-sensitive
11
Copyright © 2018 IEEE. All rights reserved.
experience is highly dependent upon the system responsiveness, and even moderate delays of less than a second can have a measurable impact on individual queries and their associated advertising revenue. A large chunk of unavoidable delay due to the speed of light is inherently built into a system that uses the remote cloud as the source of decision and information. This puts even more pressure on the deadlines within the data center itself. To address these latency concerns, OLDI services deploy individual requests across thousands of servers simultaneously. The responses from these servers are coordinated and aggregated to form the best recommendations or answers. Delays in obtaining these answers are compounded by delayed or 'straggler' communication flows between the servers. This creates a long tail latency distribution in the data center for highly parallel applications. To combat tail latency, servers are often arranged in a hierarchy, as shown in Figure 1, with strict deadlines given to each tier to produce an answer. If valuable data arrives late because of latency in the network, the data is simply discarded and a sub-optimal answer may be returned. Studies have shown that the network becomes a significant component of overall data center latency when congestion occurs in the network [2].
Figure 1 – Parallel Application Hierarchy
The long tail of latency distribution in OLDI data centers can be caused by various factors [3]. One is simply related to the mix of traffic between control messages (mice) and data messages (elephants). While most of the flows in the data center are mice, most of the bytes transferred across the network are due to elephants. Therefore, a small number of elephant flows can delay the set-up of control channels established by mice flows. Since OLDI data centers are processing requests over thousands of servers simultaneously, the mix and interplay of mice and elephant flows is highly uncoordinated. An additional complexity is that flows can change behavior over time: what was once an elephant can transform into a mouse after an application has reached steady state. Another cause of latency is due to incast at the lower tiers of the node hierarchy. Leaf worker nodes return their answers to a common parent in the tree at nearly the same time. This can cause buffer over-runs and packet loss within an individual switch. It may invoke congestion management schemes such as flow-control or congestion notification, which have little effect on mice flows and tail latency.
• Tend toward congestion, e.g. due to incast
• Packet loss leads to retransmission, more congestion, more delay
Source: IEEE 802 Nendica Report "The Lossless Network for Data Centers" [2]
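The fan-out arithmetic behind that long tail can be made concrete. A minimal sketch, assuming an illustrative per-worker deadline-miss probability of 0.1% and fan-outs of 1, 100, and 1000 workers (these numbers are assumptions for illustration, not figures from the report):

```python
def p_straggler(p, n):
    """Probability that at least one of n independent workers misses its deadline."""
    return 1 - (1 - p) ** n

for n in (1, 100, 1000):
    print(f"n={n:>4}: {p_straggler(0.001, n):.1%}")
```

Even a 0.1% per-server tail makes a straggler nearly certain at a fan-out of thousands, which is why the network's congestion behavior dominates OLDI response times.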
12
12
Remote Direct Memory Access (RDMA) in the Data Center
Source: InfiniBand Trade Association <http://www.roceinitiative.org/roce-introduction>
13
13
RoCE: RDMA over Converged Ethernet
[Figure: protocol stack comparison, annotated:]
– Dependent on transport layer; intolerant of packet loss
– No protection against packet loss
– Typically no protection against packet loss
14
Lossless Ethernet is a Foundation of the New Data Center
• Data Centers are becoming High Performance Computers
• Most Data Center networking is at Layer 3/4, but it all rides on Layer 1/2
• Ethernet is the preferred option for Layer 1/2
  – Ethernet infrastructure can support multiple upper layers (e.g. TCP and RoCEv2) simultaneously
• Most Data Center networking requires low latency
• Some Data Center networking requires a lossless Ethernet fabric
14
15
IEEE 802 Developed Data Center Bridging
• In the IEEE 802.1 Working Group, the Data Center Bridging Task Group developed many standards to enhance Layer 2 support for the data center
  – Beginning around 2006
• Key technologies include:
  – Priority-based Flow Control (PFC)
  – Congestion Notification
  – Enhanced Transmission Selection (ETS)
15
16
Folded-Clos Network: Many Paths from Server to Server
16
[Figure: folded-Clos topology – spine switches, rack switches, servers]
17
Incast fills output queue (note: ECMP cannot help)
17
[Figure: folded-Clos topology – ECMP spreads paths across spines, but incast flows converge on one rack-switch egress port toward the server]
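Why ECMP cannot help here can be sketched in a few lines: hashing the 5-tuple spreads flows across spine uplinks, but every incast flow still funnels into the single egress port facing the receiver. The addresses, spine count, and hash below are illustrative assumptions:

```python
import hashlib

SPINES = 4  # illustrative spine count

def ecmp_uplink(src, dst, sport, dport=4791):
    """Pick a spine uplink by hashing the flow 5-tuple, as ECMP does."""
    key = f"{src}:{dst}:{sport}:{dport}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % SPINES

# 16 workers answer the same parent server at nearly the same time (incast)
flows = [(f"10.0.0.{i}", "10.0.1.1", 49152 + i) for i in range(16)]

uplinks = {ecmp_uplink(*f) for f in flows}
print("spine uplinks used:", sorted(uplinks))  # load is balanced upstream...

# ...but every flow exits through the one port facing 10.0.1.1, so that
# single egress queue absorbs all 16 flows regardless of ECMP hashing
egress_ports = {dst for _, dst, _ in flows}
print("egress ports toward the receiver:", len(egress_ports))
```

ECMP balances the middle of the fabric; the bottleneck is the last hop, which is the same port for every flow of the incast.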
18
Priority flow control (PFC)
18
[Figure: folded-Clos topology – PFC pause propagating upstream from the incast point]
• Output backup fills ingress queue
• PFC can be used to pause input per QoS class
• IEEE 802.1Q (originally in 802.1Qbb)
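The per-class pause in the bullets above behaves like an XOFF/XON threshold pair on an ingress queue. A minimal sketch; the threshold values and single-queue model are illustrative assumptions, not values from IEEE 802.1Q:

```python
from dataclasses import dataclass, field

XOFF, XON = 80, 40  # illustrative thresholds, in packets

@dataclass
class PfcIngressQueue:
    """One ingress queue for one IEEE 802.1Q priority class."""
    priority: int
    occupancy: int = 0
    paused_upstream: bool = False
    events: list = field(default_factory=list)

    def enqueue(self, n=1):
        self.occupancy += n
        if not self.paused_upstream and self.occupancy >= XOFF:
            self.paused_upstream = True            # emit PFC PAUSE (XOFF)
            self.events.append(("PAUSE", self.priority))

    def dequeue(self, n=1):
        self.occupancy = max(0, self.occupancy - n)
        if self.paused_upstream and self.occupancy <= XON:
            self.paused_upstream = False           # emit PFC resume (XON)
            self.events.append(("RESUME", self.priority))

q = PfcIngressQueue(priority=3)
for _ in range(100):   # incast burst arrives faster than the queue drains
    q.enqueue()
for _ in range(60):    # queue drains once the burst subsides
    q.dequeue()
print(q.events)        # one PAUSE crossing XOFF, one RESUME at XON
```

Note the pause applies to the whole priority class, not to individual flows, which is exactly the limitation the next slide illustrates.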
19
PFC pauses all flows of the class, including "victim" flows
19
[Figure: folded-Clos topology – PFC stops both flows, the incast flow and the victim flow alike]
20
Traffic-class blocking analysis
20
Figure 11: Network performance degradation due to HOL blocking
[excerpt:] …other flows in the network that are injected from other sources and are addressed to destinations different from X (Y and Z, respectively), so that they do not contribute to the congestion tree. However, as can be seen, the latter flows interact (i.e., share queues) with the congested flows at several points of the network; then the latter flows produce HOL blocking to the former ones. As a consequence, all the flows end up being injected at the same rate as the contributor ones, despite the fact that the links connected to Y and Z are not saturated. On the other hand, note that at the switch where the root is located, the HOL blocking appears due to the direct contention of the congested flows to access the root of the tree, but in other (upstream) switches the HOL blocking appears due to congestion propagation through flow control. In general, the HOL blocking produced in the switch where congestion originates is called "low-order" HOL blocking, while the one produced at upstream switches is called "high-order" HOL blocking.
Source: IEEE 802.1-19-0012-00 [4]
21
Congested-flow Isolation (see IEEE Project P802.1Qcz [5])
21
[Figure: upstream and downstream switches, each with ingress ports (virtual queues) and egress ports; congested and non-congested flows held in separate queues]
1. Identify the flow causing congestion and isolate it locally
2. (CIP) Signal to neighbor when the congested queue fills
3. Upstream isolates the flow too, eliminating head-of-line (HoL) blocking
4. (PFC) If the congested queue continues to fill, invoke PFC for lossless operation
Source: IEEE 802 Nendica Report "The Lossless Network for Data Centers" [2]
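Steps 1–4 can be sketched as a toy model of one egress port holding a normal queue and a congested queue. The thresholds, flow names, and the "dominant flow" heuristic are illustrative assumptions; P802.1Qcz defines the actual identification rules and CIP signaling:

```python
ISOLATE_AT = 20   # per-flow share of the queue that marks it "congesting"
CIP_AT     = 50   # congested-queue depth that triggers a CIP upstream
PFC_AT     = 80   # congested-queue depth that finally falls back to PFC

class CiEgressPort:
    def __init__(self):
        self.normal, self.congested = [], []   # two queues per port
        self.isolated, self.signals = set(), []

    def receive(self, flow):
        # step 1: a flow dominating the queue is declared congesting
        # and moved to the congested queue (local isolation)
        if flow not in self.isolated and self.normal.count(flow) + 1 >= ISOLATE_AT:
            self.isolated.add(flow)
            self.congested += [p for p in self.normal if p == flow]
            self.normal = [p for p in self.normal if p != flow]
        (self.congested if flow in self.isolated else self.normal).append(flow)
        # steps 2-3: a CIP asks the upstream switch to isolate the flow too
        if len(self.congested) == CIP_AT:
            self.signals.append("CIP")
        # step 4: only if the congested queue keeps filling, invoke PFC
        if len(self.congested) == PFC_AT:
            self.signals.append("PFC")

port = CiEgressPort()
for i in range(100):          # flow A hammers the port; flow B is a victim
    port.receive("A")
    if i % 10 == 0:
        port.receive("B")
print(port.signals)                                   # ['CIP', 'PFC']
print("victim packets in the normal queue:", port.normal.count("B"))
```

The victim flow B never shares a queue with A after isolation, so it is not paused when A's congested queue backs up; PFC is the last resort, not the first.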
22
Congested-flow Isolation Analysis
22
Source: IEEE 802.1-19-0012-00 [4]
23
Figure 14: Isolating contributor flows eliminates HOL blocking
[excerpt:] …other flows, so that if HOL blocking were prevented, congestion trees could exist without actually being a problem on their own.
Although this basic approach has been followed by very different techniques, in general a common advantage of all of them is that they deal with HOL blocking fast and locally, as they attack the HOL blocking when and where it may appear. Therefore, they can complement some techniques explained above that react slowly to congestion (see Section 7). Another common issue of all these techniques is that they configure several queues per port to separately store packets belonging to different flows, hence reducing or eliminating the interaction among these flows, and so the HOL blocking they can produce each other. Hence, the ultimate goal of these techniques is to isolate in separate queues the flows that contribute to congestion, which are the ones producing significant HOL blocking. This can be seen in Figure 15, which shows the same situation depicted in Figure 11, but now with several queues configured per port to separate the flows addressed to different destinations. As can be seen, the separation of flows into different queues leads to the elimination of HOL blocking, and so to an improvement of the performance of the network, as the utilization of several links increases with respect to the situation where the flows share queues.
Note that HOL blocking would be ideally and completely eliminated if there were, at every port, a queue specifically dedicated to each destination [10], but this is obviously unaffordable, as the number of queues in the network would grow quadratically with…
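The "grows quadratically" point is easy to see with rough numbers: one dedicated queue per (port, destination) pair means the queue count is hosts × fabric ports, and in a folded Clos the fabric port count itself grows roughly linearly with the host count. The 3-ports-per-host constant below is an illustrative assumption:

```python
def queues_needed(hosts, fabric_ports):
    # one dedicated queue per (port, destination) pair
    return hosts * fabric_ports

for hosts in (1_000, 10_000, 100_000):
    ports = 3 * hosts   # fabric ports scale ~linearly with hosts (assumed)
    print(f"{hosts:>7} hosts -> {queues_needed(hosts, ports):,} queues")
```

A 10× growth in hosts costs 100× the queues, which is why per-destination queueing is ruled out and congestion isolation settles for a small number of queues per port instead.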
23
Bibliography
1) IEEE 802 Orientation for New Participants: https://mentor.ieee.org/802-ec/dcn/18/ec-18-0117-02.pdf
2) IEEE 802 "Network Enhancements for the Next Decade" Industry Connections Activity (Nendica): https://1.ieee802.org/802-nendica
3) IEEE 802 Nendica Report: "The Lossless Network for Data Centers" (18 August 2018): https://mentor.ieee.org/802.1/dcn/18/1-18-0042-00.pdf
4) Paul Congdon, "The Lossless Network in the Data Center," IEEE 802.1-17-0007-01, 7 November 2017: https://mentor.ieee.org/802.1/dcn/17/1-17-0007-01.pdf
5) Pedro Javier Garcia, Jesus Escudero-Sahuquillo, Francisco J. Quiles, and Jose Duato, "Congestion Management for Ethernet-based Lossless Data Center Networks," IEEE 802.1-19-0012-00, 4 February 2019: https://mentor.ieee.org/802.1/dcn/19/1-19-0012-00.pdf
6) IEEE P802.1Qcz Project, "Congestion Isolation": https://1.ieee802.org/tsn/802-1qcz
23
24
Going forward
• IEEE 802 Nendica Report "The Lossless Network for Data Centers" (18 August 2018) is published but open to further comment
  – Comments are welcome from all
• Could open an activity to revise the report, addressing new issues
  – Proposal [5] may be discussed in a future teleconference
• Report could help identify issues in need of further research or unified action
• Nendica could instigate standardization of key topics
  – Mainly in IEEE 802; perhaps also in, e.g., IETF
24
25
Nendica Participation
• Nendica is open to all participants; please join in, no membership requirements
  – Comment by Internet or contribute documents
  – Call in to teleconferences
  – Attend meetings
• Nendica web site: https://1.ieee802.org/802-nendica
  – Details of in-person and teleconference meetings
• Feel free to contact the Chair (see first slide)
25
Note that the HOL blocking would be ideally and completely eliminated if there wereat every port a queue specifically dedicated to each destination [10] but this is obvi-ously unacrarrordable as the number of queues in the network would grow quadratically with
23
Bibliography1) IEEE 802 Orientation for New Participants httpsmentorieeeorg802-ecdcn18ec-18-0117-02pdf
2) IEEE 802 ldquoNetwork Enhancements for the Next Decaderdquo Industry Connections Activity (Nendica) https1ieee802org802-nendica
3) IEEE 802 Nendica Report ldquoThe Lossless Network for Data Centersrdquo (18 August 2018) httpsmentorieeeorg8021dcn181-18-0042-00pdf
4) Paul Congdon ldquoThe Lossless Network in the Data Centerrdquo IEEE 8021-17-0007-01 7 November 2017 httpsmentorieeeorg8021dcn171-17-0007-01pdf
5) Pedro Javier Garcia Jesus Escudero-Sahuquillo Francisco J Quiles and Jose Duato ldquoCongestion Management for Ethernet-based Lossless DataCenterNetworksrdquo IEEE 8021-19-0012-00 4 February 2012 httpsmentorieeeorg8021dcn191-19-0012-00pdf
6) IEEE P8021Qcz Project ldquoCongestion Isolationrdquo https1ieee802orgtsn802-1qcz
23
24
Going forward
bull IEEE 802 Nendica Report ldquoThe Lossless Network for Data Centersrdquo (18 August 2018) is published but open to further comment Comments are welcome from all
bull Could open an activity to revise the report addressing new issues Proposal [5] may be discussed in future teleconference
bull Report could help identify issues in need of further research or unified action
bull Nendica could instigate standardization of key topics Mainly in IEEE 802 perhaps also in eg IETF
24
25
Nendica Participation
bull Nendica is open to all participants please join in no membership requirements Comment by Internet or contribute documents Call in to teleconferences Attend meetings
bull Nendica web site https1ieee802org802-nendica Details of in-person and teleconference meetings
bull Feel free to contact the Chair (see first slide)
25
Nendica Report: The Lossless Network for Data Centers

• Paul Congdon, Editor
• Key messages regarding the data center:
  - Packet loss leads to large delays
  - Congestion leads to packet loss
  - Conventional methods are problematic
  - A Layer 3 network uses Layer 2 transport; action at Layer 2 can reduce congestion and thereby loss
  - The paper is not specifying a "lossless" network but describing a few prospective methods to progress toward a lossless data center network in the future
• The report is open to comment and may be revised
Use Cases: The Lossless Network for Data Centers

• The scale of Data Center Networks continues to grow:
  - Online Data-Intensive (OLDI) Services
  - AI: Deep Learning and Model Training
  - Cloud High-Performance Computing
  - Financial Trading
  - Distributed Storage
  - Non-Volatile Memory Express (NVMe) over Fabrics
• Data Center => High-Performance Computer
• New requirements for a lossless Ethernet fabric
Data Center Applications are distributed and latency-sensitive
"…experience is highly dependent upon the system responsiveness, and even moderate delays of less than a second can have a measurable impact on individual queries and their associated advertising revenue. A large chunk of unavoidable delay, due to the speed of light, is inherently built into a system that uses the remote cloud as the source of decision and information. This puts even more pressure on the deadlines within the data center itself. To address these latency concerns, OLDI services deploy individual requests across thousands of servers simultaneously. The responses from these servers are coordinated and aggregated to form the best recommendations or answers. Delays in obtaining these answers are compounded by delayed or 'straggler' communication flows between the servers. This creates a long tail latency distribution in the data center for highly parallel applications. To combat tail latency, servers are often arranged in a hierarchy, as shown in Figure 1, with strict deadlines given to each tier to produce an answer. If valuable data arrives late because of latency in the network, the data is simply discarded and a sub-optimal answer may be returned. Studies have shown that the network becomes a significant component of overall data center latency when congestion occurs in the network [2].

Figure 1 – Parallel Application Hierarchy

The long tail of latency distribution in OLDI data centers can be caused by various factors [3]. One is simply related to the mix of traffic between control messages (mice) and data messages (elephants). While most of the flows in the data center are mice, most of the bytes transferred across the network are due to elephants. Therefore, a small number of elephant flows can delay the set-up of control channels established by mice flows. Since OLDI data centers are processing requests over thousands of servers simultaneously, the mix and interplay of mice and elephant flows is highly uncoordinated. An additional complexity is that flows can change behavior over time: what was once an elephant can transform into a mouse after an application has reached steady state. Another cause of latency is due to incast at the lower tiers of the node hierarchy. Leaf worker nodes return their answers to a common parent in the tree at nearly the same time. This can cause buffer over-runs and packet loss within an individual switch. It may invoke congestion management schemes, such as flow-control or congestion notification, which have little effect on mice flows and tail latency."
• Tend toward congestion, e.g. due to incast
• Packet loss leads to retransmission, more congestion, more delay

Source: IEEE 802 Nendica Report, The Lossless Network for Data Centers [2]
Remote Direct Memory Access (RDMA) in the Data Center

Source: InfiniBand Trade Association <http://www.roceinitiative.org/roce-introduction>
RoCE: RDMA over Converged Ethernet

[Figure: comparison of RDMA transport stacks; labels from the diagram:]
• Dependent on transport layer; intolerant of packet loss
• No protection against packet loss
• Typically no protection against packet loss
Lossless Ethernet is a Foundation of the New Data Center

• Data Centers are becoming High-Performance Computers
• Most Data Center networking is at Layer 3/4, but it all rides on Layer 1/2
• Ethernet is the preferred option for Layer 1/2
  - Ethernet infrastructure can support multiple upper layers (e.g. TCP and RoCEv2) simultaneously
• Most Data Center networking requires low latency
• Some Data Center networking requires a lossless Ethernet fabric
IEEE 802 Developed Data Center Bridging

• In the IEEE 802.1 Working Group, the Data Center Bridging Task Group developed many standards to enhance Layer 2 support for the data center, beginning around 2006
• Key technologies include:
  - Priority-based Flow Control (PFC)
  - Congestion Notification
  - Enhanced Transmission Selection (ETS)
Folded-Clos Network: Many Paths from Server to Server

[Diagram: leaf-spine fabric with spine switches, top-of-rack switches, and servers]
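As an illustrative aside (not part of the original slides), the "many paths" property of a folded-Clos fabric can be sketched in a few lines: between servers on different racks, each spine switch contributes one distinct equal-cost path. The function name and parameters are assumptions for illustration.

```python
# Sketch: path diversity in a 2-tier folded-Clos (leaf-spine) fabric.
# Assumption: every top-of-rack switch connects to every spine, so each
# spine yields one distinct path between two different racks.

def clos_paths(num_spines: int, src_rack: int, dst_rack: int) -> int:
    """Count distinct server-to-server paths through the fabric."""
    if src_rack == dst_rack:
        return 1          # traffic stays on the shared top-of-rack switch
    return num_spines     # one equal-cost path per spine switch

# A fabric with 4 spines offers 4 equal-cost paths between racks.
print(clos_paths(4, src_rack=0, dst_rack=1))  # -> 4
print(clos_paths(4, src_rack=2, dst_rack=2))  # -> 1
```

This path diversity is what ECMP load-balancing exploits on the next slide.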
Incast fills the output queue (note: ECMP cannot help)

[Diagram: ECMP spreads flows across the spines, but incast flows converge on one server's output queue]
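A toy model (not from the slides) of why ECMP cannot relieve incast: ECMP hashes each flow's 5-tuple to choose an uplink, which spreads distinct flows across the spines, yet every flow addressed to the same server still funnels into that server's single egress port. The hash scheme, addresses, and port numbers below are illustrative assumptions.

```python
# Sketch: ECMP balances *paths*, not the shared *destination* queue.
import hashlib

NUM_SPINES = 4

def ecmp_uplink(src, dst, sport, dport, proto="udp"):
    """Deterministically map a flow's 5-tuple to one spine uplink."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % NUM_SPINES

# 8 senders all target the same server ("10.0.0.9", port 4791).
flows = [(f"10.0.1.{i}", "10.0.0.9", 40000 + i, 4791) for i in range(8)]
uplinks = [ecmp_uplink(*f) for f in flows]
egress = {dst for _, dst, _, _ in flows}

print(sorted(set(uplinks)))  # the flows hash across the spine uplinks...
print(egress)                # ...but all share one egress: {'10.0.0.9'}
```

However the hash scatters the flows, the incast point is the destination's output queue, which ECMP never touches.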
Priority Flow Control (PFC)

[Diagram: PFC pause issued upstream from the incast point]

• Output backup fills the ingress queue
• PFC can be used to pause input per QoS class
• IEEE 802.1Q (originally in 802.1Qbb)
PFC pauses all flows of the class, including "victim" flows

[Diagram: PFC stops both the incast flow and an unrelated "victim" flow in the same class]
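The victim-flow effect follows from PFC's granularity: the pause applies to a priority class, not to individual flows. A minimal sketch, with an assumed XOFF threshold and invented sender names, shows an innocent flow being paused along with the incast senders.

```python
# Sketch: PFC pause granularity is the priority class, not the flow.
# XOFF_THRESHOLD and the sender names are illustrative assumptions.

XOFF_THRESHOLD = 10  # packets buffered before a PFC PAUSE is emitted

class_queue = []     # one shared queue for the lossless priority class
paused = set()       # senders currently paused by PFC
SENDERS = ["incast-A", "incast-B", "victim"]

def enqueue(sender, pkt):
    """Accept a packet unless the sender's class has been paused."""
    if sender in paused:
        return False                 # PFC: sender must hold its traffic
    class_queue.append((sender, pkt))
    if len(class_queue) >= XOFF_THRESHOLD:
        paused.update(SENDERS)       # PAUSE reaches *every* class sender
    return True

enqueue("victim", 0)                 # the victim sends a single packet
for i in range(6):                   # incast senders flood the queue
    enqueue("incast-A", i)
    enqueue("incast-B", i)

print("victim paused:", "victim" in paused)  # -> victim paused: True
```

The victim contributed one packet, yet once the incast traffic crosses the threshold it is paused like everyone else in the class; this is exactly the head-of-line blocking problem the following slides analyze.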
Traffic-class blocking analysis

Figure 11: Network performance degradation due to HOL blocking

"…other flows in the network that are injected from other sources and are addressed to destinations different from X (Y and Z, respectively), so that they do not contribute to the congestion tree. However, as can be seen, the latter flows interact (i.e. share queues) with the congested flows at several points of the network; then the latter flows produce HOL blocking to the former ones. As a consequence, all the flows end up being injected at the same rate as the contributor ones, despite the fact that the links connected to Y and Z are not saturated. On the other hand, note that at the switch where the root is located, the HOL blocking appears due to the direct contention of the congested flows to access the root of the tree, but in other (upstream) switches the HOL blocking appears due to congestion propagation through flow control. In general, the HOL blocking produced in the switch where congestion originates is called 'low-order' HOL blocking, while the one produced at upstream switches is called 'high-order' HOL blocking."

Source: IEEE 802.1-19-0012-00 [4]
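The HOL-blocking mechanism quoted above can be demonstrated with a toy queue model (my illustration, not from the cited contribution): in a shared FIFO, a head packet bound for a congested output stalls every packet behind it, even those bound for idle outputs, while per-output (virtual) queues let the unaffected traffic drain.

```python
# Sketch: shared FIFO vs. virtual output queues under a blocked output.
from collections import deque

def drain(queue, blocked_output):
    """Forward packets until the head targets the blocked output."""
    sent = 0
    while queue and queue[0] != blocked_output:
        queue.popleft()
        sent += 1
    return sent

# One shared FIFO: a packet for congested output "X" sits at the head.
shared = deque(["X", "Y", "Z", "Y", "Z"])
print(drain(shared, "X"))   # -> 0: the Y/Z packets are HOL-blocked

# Per-output (virtual) queues: Y and Z traffic is unaffected by X.
voq = {"X": deque(["X"]), "Y": deque(["Y", "Y"]), "Z": deque(["Z", "Z"])}
sent = sum(drain(q, "X") for out, q in voq.items() if out != "X")
print(sent)                 # -> 4: only the X queue has to wait
```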
Congested-flow Isolation (see IEEE Project P802.1Qcz [5])

[Diagram: upstream and downstream switches, each with ingress ports holding virtual queues and egress ports; congested flows are queued separately from non-congested flows]

1. Identify the flow causing congestion and isolate it locally
2. (CIP) Signal to the neighbor when the congested queue fills
3. Upstream isolates the flow too, eliminating head-of-line (HoL) blocking
4. (PFC) If the congested queue continues to fill, invoke PFC for lossless operation

Source: IEEE 802 Nendica Report, The Lossless Network for Data Centers [2]
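The four steps above can be sketched as a toy two-switch model. This is purely illustrative (class names, thresholds, and the message plumbing are my assumptions; the real behavior is what IEEE P802.1Qcz specifies).

```python
# Sketch of the four Congestion Isolation steps as a toy switch model.

CONGESTED_LIMIT = 4   # congested-queue depth that triggers a CIP signal
PFC_LIMIT = 8         # depth at which PFC becomes the last resort

class CISwitch:
    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream
        self.normal_q, self.congested_q = [], []
        self.isolated = set()   # flows identified as congesting
        self.pfc_sent = False

    def identify_congestion(self, flow):
        # Step 1: identify the congesting flow and isolate it locally.
        self.isolated.add(flow)

    def receive_cip(self, flow):
        # Step 3: upstream isolates the flow too, removing HoL blocking.
        self.isolated.add(flow)

    def enqueue(self, flow, pkt):
        if flow not in self.isolated:
            self.normal_q.append((flow, pkt))  # mice keep moving
            return
        self.congested_q.append((flow, pkt))
        if len(self.congested_q) >= CONGESTED_LIMIT and self.upstream:
            # Step 2: CIP message to the neighbor as the queue fills.
            self.upstream.receive_cip(flow)
        if len(self.congested_q) >= PFC_LIMIT:
            # Step 4: still filling -> invoke PFC to remain lossless.
            self.pfc_sent = True

upstream = CISwitch("upstream")
downstream = CISwitch("downstream", upstream=upstream)
downstream.identify_congestion("elephant")

for i in range(8):
    downstream.enqueue("elephant", i)   # the congesting flow
    downstream.enqueue("mouse", i)      # an unrelated flow

print(upstream.isolated)          # {'elephant'}: CIP propagated upstream
print(downstream.pfc_sent)        # True: the queue kept filling
print(len(downstream.normal_q))   # 8: the mouse was never blocked
```

The key contrast with plain PFC is visible in the last line: the unrelated flow keeps flowing because only the isolated flow's queue backs up.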
Congested-flow Isolation Analysis

Figure 14: Isolating contributor flows eliminates HOL blocking

"…other flows, so that if HOL blocking were prevented, congestion trees could exist without actually being a problem on their own.

Although this basic approach has been followed by very different techniques, in general a common advantage of all of them is that they deal with HOL blocking fast and locally, as they attack the HOL blocking when and where it may appear. Therefore, they can complement some techniques explained above that react slowly to congestion (see Section 7). Another common issue of all these techniques is that they configure several queues per port to separately store packets belonging to different flows, hence reducing or eliminating the interaction among these flows, and so the HOL blocking they can produce each other. Hence, the ultimate goal of these techniques is to isolate in separate queues the flows that contribute to congestion, which are the ones producing significant HOL blocking. This can be seen in Figure 15, which shows the same situation depicted in Figure 11, but now with several queues configured per port to separate the flows addressed to different destinations. As can be seen, the separation of flows into different queues leads to the elimination of HOL blocking, and so to an improvement of the performance of the network, as the utilization of several links increases with respect to the situation in which the flows share queues.

Note that the HOL blocking would be ideally and completely eliminated if there were, at every port, a queue specifically dedicated to each destination [10], but this is obviously unaffordable, as the number of queues in the network would grow quadratically with…"

Source: IEEE 802.1-19-0012-00 [4]
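The excerpt's closing observation, that a dedicated per-destination queue at every port would eliminate HOL blocking but is unaffordable, can be checked with back-of-envelope arithmetic. The linear port-count assumption below is mine, for illustration only.

```python
# Sketch: queue count with one queue per destination at every port.
# Assumption (illustrative): fabric port count grows roughly linearly
# with the number of endpoints, so total queues grow as O(N^2).

def total_queues(num_ports: int, num_destinations: int) -> int:
    """One dedicated queue per destination at each port."""
    return num_ports * num_destinations

for n in (100, 1000, 10000):
    print(n, total_queues(2 * n, n))   # ~2N fabric ports, N destinations
```

At 10,000 endpoints this is already hundreds of millions of queues, which is why congestion isolation settles for a small number of dynamically assigned congested-flow queues instead.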
Bibliography

1) IEEE 802 Orientation for New Participants: https://mentor.ieee.org/802-ec/dcn/18/ec-18-0117-02.pdf
2) IEEE 802 "Network Enhancements for the Next Decade" Industry Connections Activity (Nendica): https://1.ieee802.org/802-nendica
3) IEEE 802 Nendica Report, "The Lossless Network for Data Centers" (18 August 2018): https://mentor.ieee.org/802.1/dcn/18/1-18-0042-00.pdf
4) Paul Congdon, "The Lossless Network in the Data Center," IEEE 802.1-17-0007-01, 7 November 2017: https://mentor.ieee.org/802.1/dcn/17/1-17-0007-01.pdf
5) Pedro Javier Garcia, Jesus Escudero-Sahuquillo, Francisco J. Quiles, and Jose Duato, "Congestion Management for Ethernet-based Lossless Data Center Networks," IEEE 802.1-19-0012-00, 4 February 2019: https://mentor.ieee.org/802.1/dcn/19/1-19-0012-00.pdf
6) IEEE P802.1Qcz Project, "Congestion Isolation": https://1.ieee802.org/tsn/802-1qcz
Going forward

• The IEEE 802 Nendica Report "The Lossless Network for Data Centers" (18 August 2018) is published but open to further comment; comments are welcome from all
• Nendica could open an activity to revise the report, addressing new issues; proposal [5] may be discussed in a future teleconference
• The report could help identify issues in need of further research or unified action
• Nendica could instigate standardization of key topics, mainly in IEEE 802, perhaps also in e.g. the IETF
Nendica Participation

• Nendica is open to all participants; please join in, no membership requirements
  - Comment by Internet or contribute documents
  - Call in to teleconferences
  - Attend meetings
• Nendica web site: https://1.ieee802.org/802-nendica
  - Details of in-person and teleconference meetings
• Feel free to contact the Chair (see first slide)
10
Use Cases The Lossless Network for Data Centers
bull The scale of Data Center Networks continues to grow Online Data Intensive (OLDI) Services AI Deep Learning and Model Training Cloud High-Performance Computing Financial Trading Distributed Storage Non-Volatile Memory Express (NVMe) over Fabrics
bull Data Center =gt High Performance Computerbull New requirements for lossless Ethernet fabric
10
11
Data Center Applications are distributed and latency-sensitive
11
3
Copyright copy 2018 IEEE All rights reserved
experience is highly dependent upon the system responsiveness and even moderate delays of less than a second can have a measurable impact on individual queries and their associated advertising revenue A large chunk of unavoidable delay due to the speed of light is inherently built into a system that uses the remote cloud as the source of decision and information This puts even more pressure on the deadlines within the data center itself To address these latency concerns OLDI services deploy individual requests across thousands of servers simultaneously The responses from these servers are coordinated and aggregated to form the best recommendations or answers Delays in obtaining these answers are compounded by delayed or lsquostragglerrsquo communication flows between the servers This creates a long tail latency distribution in the data center for highly parallel applications To combat tail latency servers are often arranged in a hierarchy as shown in Figure 1 with strict deadlines given to each tier to produce an answer If valuable data arrives late because of latency in the network the data is simply discarded and a sub‐optimal answer may be returned Studies have shown that the network becomes a significant component of overall data center latency when congestion occurs in the network [2]
Figure 1 ndash Parallel Application Hierarchy
The long tail of latency distribution in OLDI data centers can be caused by various factors [3] One is simply related to the mix of traffic between control messages (mice) and data messages (elephants) While most of the flows in the data center are mice most of the bytes transferred across the network are due to elephants Therefore a small number of elephant flows can delay the set‐up of control channels established by mice flows Since OLDI data centers are processing requests over thousands of servers simultaneously the mix and interplay of mice and elephant flows is highly uncoordinated An additional complexity is that flows can change behavior over time what was once an elephant can transform into a mouse after an application has reached steady state Another cause of latency is due to incast at the lower tiers of the node hierarchy Leaf worker nodes return their answers to a common parent in the tree at nearly the same time This can cause buffer over‐runs and packet loss within an individual switch It may invoke congestion management schemes such as flow‐control or congestion notification which have little effect on mice flows and tail latency
Authorized licensed use limited to Roger Marks Downloaded on September 282018 at 034705 UTC from IEEE Xplore Restrictions apply
bull Tend toward congestion eg due to incastbull Packet loss leads to retransmission more
congestion more delaySource IEEE 802 Nendica Report The Lossless Network for Data Centers [2]
12
12
Remote Direct Memory Access (RDMA) in the Data Center
Source InfiniBand Trade Association lthttpwwwroceinitiativeorgroce-introductiongt
13
13
RoCE RDMA over Converged Ethernet
Dependent on transport layerIntolerant of packet loss
No protection against packet loss
Typically no protectionagainst packet loss
14
Lossless Ethernet is a Foundation of the New Data Center
bull Data Centers are becoming High Performance Computersbull Most Data Center networking is at Layer 34 but it all
rides on Layer 12bull Ethernet is preferred option for Layer 12 Ethernet infrastructure can support multiple upper layers
(eg TCP and RoCEv2) simultaneouslybull Most Data Center networking requires low latency bull Some Data Center networking requires a lossless
Ethernet fabric
14
15
IEEE 802 DevelopedData Center Bridging
bull In the IEEE 8021 Working Group the Data Center Bridging Task Group developed many standards to enhance Layer 2 support for the data center Beginning around 2006
bull Key technologies include Priority-based Flow Control (PFC) Congestion Notification Enhanced Transmission Selection (ETS)
15
16
Folded-Clos NetworkMany Paths from Server to Server
16
server
spine
rack
17
Incast fills output queue(note ECMP cannot help)
17
server
spine
rack
ECMPincast
18
Priority flow control (PFC)18
server
spine
rack
PFC
incast
bull Output backup fills ingress queuebull PFC can be used to pause input per QoS classbull IEEE 8021Q (originally in 8021Qbb)
19
PFC pauses all flows of the classincluding ldquovictimrdquo flows
19
server
spine
rack
PFC stops both flows
incast
20
Traffic-class blocking analysis20
17
Figure 11 Network performance degradation due to HOL blocking
other flows in the network that are injected from other sources and are addressed todestinations dicrarrerent from X (Y and Z respectively) so that they do not contribute to thecongestion tree However as can be seen the latter flows interact (ie share queues) withthe congested flows at several points of the network then the latter flows produce HOLblocking to the former ones As a consequence all the flows end up being injected at thesame rate as the contributor ones despite the fact that the links connected to Y and Zare not saturated On the other hand note that at the switch where the root is locatedthe HOL blocking appears due to the direct contention of the congested flows to accessthe root of the tree but in other (upstream) switches the HOL blocking appears due tocongestion propagation through flow control In general the HOL blocking produced inthe switch where congestion originates is called ldquolow-orderrdquo HOL blocking while the oneproduced at upstream switches is called ldquohigh-orderrdquo HOL blocking
4 Reducing In-Network Congestion
Taking into account the problems derived from congestion trees a natural approach to dealwith them is trying to prevent their formation This approach is actually a particular case
Source IEEE 8021-19-0012-00 [4]
21
Congested-flow Isolation(see IEEE Project P8021Qcz [5])
21
DownstreamUpstream1 3 1 3
2
4
2
4Ingress Port
(Virtual Queues)Egress Port Ingress Port
(Virtual Queues)Egress Port
Congested Flows
Non-Congested Flows
1 Identify the flow causing congestion and isolate locally
CIP 2 Signal to neighbor when congested queue fills
Eliminate HoL Blocking
3 Upstream isolates the flow too eliminating head-of-line blocking
PFC 4 If congested queue continues to fill invoke PFC for lossless
Source IEEE 802 Nendica Report The Lossless Network for Data Centers [2]
22
Congested-flow Isolation Analysis22
Source IEEE 8021-19-0012-00 [4]
23
Figure 14 Isolating contributor flows eliminates HOL blocking
other flows so that if HOL blocking were prevented congestion trees could exist withoutactually being a problem on their own
Although this basic approach have been followed by very dicrarrerent techniques in generala common advantage of all of them is that they deal with HOL blocking fast and locallyas they attack the HOL blocking when and where it may appear Therefore they cancomplement some techniques explained above that react slowly to congestion (see Section7) Another common issue of all these techniques is that they configure several queues perport to separately store packets belonging to dicrarrerent flows hence reducing or eliminatingthe interaction among these flows and so the HOL blocking they can produce each otherHence the ultimate goal of these techniques is to isolate in separate queues the flows thatcontribute to congestion which are the ones producing significant HOL blocking This canbe seen in Figure 15 that shows the same situation depicted in Figure 11 but now withseveral queues configured per port to separate the flows addressed to dicrarrerent destinationsAs can be seen the separation of flows into dicrarrerent queues leads to the elimination ofHOL blocking so to an improvement of the performance of the network as the utilizationof several links increases with respect to the situation that the flows share queues
Note that the HOL blocking would be ideally and completely eliminated if there wereat every port a queue specifically dedicated to each destination [10] but this is obvi-ously unacrarrordable as the number of queues in the network would grow quadratically with
23
Bibliography1) IEEE 802 Orientation for New Participants httpsmentorieeeorg802-ecdcn18ec-18-0117-02pdf
2) IEEE 802 ldquoNetwork Enhancements for the Next Decaderdquo Industry Connections Activity (Nendica) https1ieee802org802-nendica
3) IEEE 802 Nendica Report ldquoThe Lossless Network for Data Centersrdquo (18 August 2018) httpsmentorieeeorg8021dcn181-18-0042-00pdf
4) Paul Congdon ldquoThe Lossless Network in the Data Centerrdquo IEEE 8021-17-0007-01 7 November 2017 httpsmentorieeeorg8021dcn171-17-0007-01pdf
5) Pedro Javier Garcia Jesus Escudero-Sahuquillo Francisco J Quiles and Jose Duato ldquoCongestion Management for Ethernet-based Lossless DataCenterNetworksrdquo IEEE 8021-19-0012-00 4 February 2012 httpsmentorieeeorg8021dcn191-19-0012-00pdf
6) IEEE P8021Qcz Project ldquoCongestion Isolationrdquo https1ieee802orgtsn802-1qcz
23
24
Going forward
bull IEEE 802 Nendica Report ldquoThe Lossless Network for Data Centersrdquo (18 August 2018) is published but open to further comment Comments are welcome from all
bull Could open an activity to revise the report addressing new issues Proposal [5] may be discussed in future teleconference
bull Report could help identify issues in need of further research or unified action
bull Nendica could instigate standardization of key topics Mainly in IEEE 802 perhaps also in eg IETF
24
25
Nendica Participation
bull Nendica is open to all participants please join in no membership requirements Comment by Internet or contribute documents Call in to teleconferences Attend meetings
bull Nendica web site https1ieee802org802-nendica Details of in-person and teleconference meetings
bull Feel free to contact the Chair (see first slide)
25
11
Data Center Applications are distributed and latency-sensitive
11
3
Copyright copy 2018 IEEE All rights reserved
experience is highly dependent upon the system responsiveness and even moderate delays of less than a second can have a measurable impact on individual queries and their associated advertising revenue A large chunk of unavoidable delay due to the speed of light is inherently built into a system that uses the remote cloud as the source of decision and information This puts even more pressure on the deadlines within the data center itself To address these latency concerns OLDI services deploy individual requests across thousands of servers simultaneously The responses from these servers are coordinated and aggregated to form the best recommendations or answers Delays in obtaining these answers are compounded by delayed or lsquostragglerrsquo communication flows between the servers This creates a long tail latency distribution in the data center for highly parallel applications To combat tail latency servers are often arranged in a hierarchy as shown in Figure 1 with strict deadlines given to each tier to produce an answer If valuable data arrives late because of latency in the network the data is simply discarded and a sub‐optimal answer may be returned Studies have shown that the network becomes a significant component of overall data center latency when congestion occurs in the network [2]
Figure 1 ndash Parallel Application Hierarchy
The long tail of latency distribution in OLDI data centers can be caused by various factors [3] One is simply related to the mix of traffic between control messages (mice) and data messages (elephants) While most of the flows in the data center are mice most of the bytes transferred across the network are due to elephants Therefore a small number of elephant flows can delay the set‐up of control channels established by mice flows Since OLDI data centers are processing requests over thousands of servers simultaneously the mix and interplay of mice and elephant flows is highly uncoordinated An additional complexity is that flows can change behavior over time what was once an elephant can transform into a mouse after an application has reached steady state Another cause of latency is due to incast at the lower tiers of the node hierarchy Leaf worker nodes return their answers to a common parent in the tree at nearly the same time This can cause buffer over‐runs and packet loss within an individual switch It may invoke congestion management schemes such as flow‐control or congestion notification which have little effect on mice flows and tail latency
Authorized licensed use limited to Roger Marks Downloaded on September 282018 at 034705 UTC from IEEE Xplore Restrictions apply
bull Tend toward congestion eg due to incastbull Packet loss leads to retransmission more
congestion more delaySource IEEE 802 Nendica Report The Lossless Network for Data Centers [2]
12
12
Remote Direct Memory Access (RDMA) in the Data Center
Source InfiniBand Trade Association lthttpwwwroceinitiativeorgroce-introductiongt
13
13
RoCE RDMA over Converged Ethernet
Dependent on transport layerIntolerant of packet loss
No protection against packet loss
Typically no protectionagainst packet loss
14
Lossless Ethernet is a Foundation of the New Data Center
bull Data Centers are becoming High Performance Computersbull Most Data Center networking is at Layer 34 but it all
rides on Layer 12bull Ethernet is preferred option for Layer 12 Ethernet infrastructure can support multiple upper layers
(eg TCP and RoCEv2) simultaneouslybull Most Data Center networking requires low latency bull Some Data Center networking requires a lossless
Ethernet fabric
14
15
IEEE 802 DevelopedData Center Bridging
bull In the IEEE 8021 Working Group the Data Center Bridging Task Group developed many standards to enhance Layer 2 support for the data center Beginning around 2006
bull Key technologies include Priority-based Flow Control (PFC) Congestion Notification Enhanced Transmission Selection (ETS)
15
16
Folded-Clos NetworkMany Paths from Server to Server
16
server
spine
rack
17
Incast fills output queue(note ECMP cannot help)
17
server
spine
rack
ECMPincast
18
Priority flow control (PFC)18
server
spine
rack
PFC
incast
bull Output backup fills ingress queuebull PFC can be used to pause input per QoS classbull IEEE 8021Q (originally in 8021Qbb)
19
PFC pauses all flows of the classincluding ldquovictimrdquo flows
19
server
spine
rack
PFC stops both flows
incast
20
Traffic-class blocking analysis20
17
Figure 11 Network performance degradation due to HOL blocking
other flows in the network that are injected from other sources and are addressed todestinations dicrarrerent from X (Y and Z respectively) so that they do not contribute to thecongestion tree However as can be seen the latter flows interact (ie share queues) withthe congested flows at several points of the network then the latter flows produce HOLblocking to the former ones As a consequence all the flows end up being injected at thesame rate as the contributor ones despite the fact that the links connected to Y and Zare not saturated On the other hand note that at the switch where the root is locatedthe HOL blocking appears due to the direct contention of the congested flows to accessthe root of the tree but in other (upstream) switches the HOL blocking appears due tocongestion propagation through flow control In general the HOL blocking produced inthe switch where congestion originates is called ldquolow-orderrdquo HOL blocking while the oneproduced at upstream switches is called ldquohigh-orderrdquo HOL blocking
4 Reducing In-Network Congestion
Taking into account the problems derived from congestion trees a natural approach to dealwith them is trying to prevent their formation This approach is actually a particular case
Source IEEE 8021-19-0012-00 [4]
21
Congested-flow Isolation(see IEEE Project P8021Qcz [5])
21
DownstreamUpstream1 3 1 3
2
4
2
4Ingress Port
(Virtual Queues)Egress Port Ingress Port
(Virtual Queues)Egress Port
Congested Flows
Non-Congested Flows
1 Identify the flow causing congestion and isolate locally
CIP 2 Signal to neighbor when congested queue fills
Eliminate HoL Blocking
3 Upstream isolates the flow too eliminating head-of-line blocking
PFC 4 If congested queue continues to fill invoke PFC for lossless
Source IEEE 802 Nendica Report The Lossless Network for Data Centers [2]
22
Congested-flow Isolation Analysis22
Source IEEE 8021-19-0012-00 [4]
23
Figure 14: Isolating contributor flows eliminates HOL blocking

Excerpt from [5]: "... other flows, so that if HOL blocking were prevented, congestion trees could exist without actually being a problem on their own.

Although this basic approach has been followed by very different techniques, in general a common advantage of all of them is that they deal with HOL blocking fast and locally, as they attack the HOL blocking when and where it may appear. Therefore, they can complement some techniques explained above that react slowly to congestion (see Section 7). Another common issue of all these techniques is that they configure several queues per port to separately store packets belonging to different flows, hence reducing or eliminating the interaction among these flows, and so the HOL blocking they can produce on each other. Hence, the ultimate goal of these techniques is to isolate in separate queues the flows that contribute to congestion, which are the ones producing significant HOL blocking. This can be seen in Figure 15, which shows the same situation depicted in Figure 11, but now with several queues configured per port to separate the flows addressed to different destinations. As can be seen, the separation of flows into different queues leads to the elimination of HOL blocking, and so to an improvement of the performance of the network, as the utilization of several links increases with respect to the situation where the flows share queues.

Note that HOL blocking would ideally be completely eliminated if there were at every port a queue specifically dedicated to each destination [10], but this is obviously unaffordable, as the number of queues in the network would grow quadratically with ..."
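The excerpt's central claim can be illustrated with a toy simulation: with a single shared FIFO, one packet headed to a congested destination stalls everything behind it, while per-destination queues let unrelated traffic proceed. The destinations and packet counts below are invented for illustration.

```python
from collections import deque

def drain(queues, blocked):
    """Count packets delivered in one pass; 'blocked' destinations accept nothing.

    A queue stops draining as soon as its head packet targets a blocked
    destination, which is exactly head-of-line (HOL) blocking.
    """
    delivered = 0
    for q in queues:
        while q and q[0] not in blocked:
            q.popleft()
            delivered += 1
    return delivered

packets = ["X", "Y", "X", "Z", "Y"]   # X is congested; Y and Z are not

shared = [deque(packets)]                                      # one shared FIFO
per_dest = [deque(p for p in packets if p == d) for d in "XYZ"]  # one queue per destination

print(drain(shared, {"X"}))    # 0: the head-of-line X packet blocks Y and Z
print(drain(per_dest, {"X"}))  # 3: all Y and Z traffic is unaffected
```

This is also why the excerpt calls full per-destination queuing unaffordable: with N switch ports each holding a queue for every possible destination, the queue count grows quadratically, which motivates isolating only the few flows that actually contribute to congestion.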
Bibliography
1) IEEE 802 Orientation for New Participants, https://mentor.ieee.org/802-ec/dcn/18/ec-18-0117-02.pdf
2) IEEE 802 "Network Enhancements for the Next Decade" Industry Connections Activity (Nendica), https://1.ieee802.org/802-nendica
3) IEEE 802 Nendica Report, "The Lossless Network for Data Centers" (18 August 2018), https://mentor.ieee.org/802.1/dcn/18/1-18-0042-00.pdf
4) Paul Congdon, "The Lossless Network in the Data Center," IEEE 802.1-17-0007-01, 7 November 2017, https://mentor.ieee.org/802.1/dcn/17/1-17-0007-01.pdf
5) Pedro Javier Garcia, Jesus Escudero-Sahuquillo, Francisco J. Quiles, and Jose Duato, "Congestion Management for Ethernet-based Lossless Data Center Networks," IEEE 802.1-19-0012-00, 4 February 2019, https://mentor.ieee.org/802.1/dcn/19/1-19-0012-00.pdf
6) IEEE P802.1Qcz Project, "Congestion Isolation," https://1.ieee802.org/tsn/802-1qcz
24
Going forward

• The IEEE 802 Nendica Report "The Lossless Network for Data Centers" (18 August 2018) is published but remains open to further comment. Comments are welcome from all.
• Nendica could open an activity to revise the report, addressing new issues. Proposal [5] may be discussed in a future teleconference.
• The report could help identify issues in need of further research or unified action.
• Nendica could instigate standardization of key topics, mainly in IEEE 802, but perhaps also elsewhere, e.g., in the IETF.
25
Nendica Participation

• Nendica is open to all participants; please join in. There are no membership requirements. Comment by Internet, contribute documents, call in to teleconferences, or attend meetings.
• Nendica web site: https://1.ieee802.org/802-nendica (details of in-person and teleconference meetings)
• Feel free to contact the Chair (see first slide)
12
12
Remote Direct Memory Access (RDMA) in the Data Center
Source InfiniBand Trade Association lthttpwwwroceinitiativeorgroce-introductiongt
13
13
RoCE RDMA over Converged Ethernet
Dependent on transport layerIntolerant of packet loss
No protection against packet loss
Typically no protectionagainst packet loss
14
Lossless Ethernet is a Foundation of the New Data Center
bull Data Centers are becoming High Performance Computersbull Most Data Center networking is at Layer 34 but it all
rides on Layer 12bull Ethernet is preferred option for Layer 12 Ethernet infrastructure can support multiple upper layers
(eg TCP and RoCEv2) simultaneouslybull Most Data Center networking requires low latency bull Some Data Center networking requires a lossless
Ethernet fabric
14
15
IEEE 802 DevelopedData Center Bridging
bull In the IEEE 8021 Working Group the Data Center Bridging Task Group developed many standards to enhance Layer 2 support for the data center Beginning around 2006
bull Key technologies include Priority-based Flow Control (PFC) Congestion Notification Enhanced Transmission Selection (ETS)
15
16
Folded-Clos NetworkMany Paths from Server to Server
16
server
spine
rack
17
Incast fills output queue(note ECMP cannot help)
17
server
spine
rack
ECMPincast
18
Priority flow control (PFC)18
server
spine
rack
PFC
incast
bull Output backup fills ingress queuebull PFC can be used to pause input per QoS classbull IEEE 8021Q (originally in 8021Qbb)
19
PFC pauses all flows of the classincluding ldquovictimrdquo flows
19
server
spine
rack
PFC stops both flows
incast
20
Traffic-class blocking analysis20
17
Figure 11 Network performance degradation due to HOL blocking
other flows in the network that are injected from other sources and are addressed todestinations dicrarrerent from X (Y and Z respectively) so that they do not contribute to thecongestion tree However as can be seen the latter flows interact (ie share queues) withthe congested flows at several points of the network then the latter flows produce HOLblocking to the former ones As a consequence all the flows end up being injected at thesame rate as the contributor ones despite the fact that the links connected to Y and Zare not saturated On the other hand note that at the switch where the root is locatedthe HOL blocking appears due to the direct contention of the congested flows to accessthe root of the tree but in other (upstream) switches the HOL blocking appears due tocongestion propagation through flow control In general the HOL blocking produced inthe switch where congestion originates is called ldquolow-orderrdquo HOL blocking while the oneproduced at upstream switches is called ldquohigh-orderrdquo HOL blocking
4 Reducing In-Network Congestion
Taking into account the problems derived from congestion trees a natural approach to dealwith them is trying to prevent their formation This approach is actually a particular case
Source IEEE 8021-19-0012-00 [4]
21
Congested-flow Isolation(see IEEE Project P8021Qcz [5])
21
DownstreamUpstream1 3 1 3
2
4
2
4Ingress Port
(Virtual Queues)Egress Port Ingress Port
(Virtual Queues)Egress Port
Congested Flows
Non-Congested Flows
1 Identify the flow causing congestion and isolate locally
CIP 2 Signal to neighbor when congested queue fills
Eliminate HoL Blocking
3 Upstream isolates the flow too eliminating head-of-line blocking
PFC 4 If congested queue continues to fill invoke PFC for lossless
Source IEEE 802 Nendica Report The Lossless Network for Data Centers [2]
22
Congested-flow Isolation Analysis22
Source IEEE 8021-19-0012-00 [4]
23
Figure 14 Isolating contributor flows eliminates HOL blocking
other flows so that if HOL blocking were prevented congestion trees could exist withoutactually being a problem on their own
Although this basic approach have been followed by very dicrarrerent techniques in generala common advantage of all of them is that they deal with HOL blocking fast and locallyas they attack the HOL blocking when and where it may appear Therefore they cancomplement some techniques explained above that react slowly to congestion (see Section7) Another common issue of all these techniques is that they configure several queues perport to separately store packets belonging to dicrarrerent flows hence reducing or eliminatingthe interaction among these flows and so the HOL blocking they can produce each otherHence the ultimate goal of these techniques is to isolate in separate queues the flows thatcontribute to congestion which are the ones producing significant HOL blocking This canbe seen in Figure 15 that shows the same situation depicted in Figure 11 but now withseveral queues configured per port to separate the flows addressed to dicrarrerent destinationsAs can be seen the separation of flows into dicrarrerent queues leads to the elimination ofHOL blocking so to an improvement of the performance of the network as the utilizationof several links increases with respect to the situation that the flows share queues
Note that the HOL blocking would be ideally and completely eliminated if there wereat every port a queue specifically dedicated to each destination [10] but this is obvi-ously unacrarrordable as the number of queues in the network would grow quadratically with
23
Bibliography1) IEEE 802 Orientation for New Participants httpsmentorieeeorg802-ecdcn18ec-18-0117-02pdf
2) IEEE 802 ldquoNetwork Enhancements for the Next Decaderdquo Industry Connections Activity (Nendica) https1ieee802org802-nendica
3) IEEE 802 Nendica Report ldquoThe Lossless Network for Data Centersrdquo (18 August 2018) httpsmentorieeeorg8021dcn181-18-0042-00pdf
4) Paul Congdon ldquoThe Lossless Network in the Data Centerrdquo IEEE 8021-17-0007-01 7 November 2017 httpsmentorieeeorg8021dcn171-17-0007-01pdf
5) Pedro Javier Garcia Jesus Escudero-Sahuquillo Francisco J Quiles and Jose Duato ldquoCongestion Management for Ethernet-based Lossless DataCenterNetworksrdquo IEEE 8021-19-0012-00 4 February 2012 httpsmentorieeeorg8021dcn191-19-0012-00pdf
6) IEEE P8021Qcz Project ldquoCongestion Isolationrdquo https1ieee802orgtsn802-1qcz
23
24
Going forward
bull IEEE 802 Nendica Report ldquoThe Lossless Network for Data Centersrdquo (18 August 2018) is published but open to further comment Comments are welcome from all
bull Could open an activity to revise the report addressing new issues Proposal [5] may be discussed in future teleconference
bull Report could help identify issues in need of further research or unified action
bull Nendica could instigate standardization of key topics Mainly in IEEE 802 perhaps also in eg IETF
24
25
Nendica Participation
bull Nendica is open to all participants please join in no membership requirements Comment by Internet or contribute documents Call in to teleconferences Attend meetings
bull Nendica web site https1ieee802org802-nendica Details of in-person and teleconference meetings
bull Feel free to contact the Chair (see first slide)
25
13
13
RoCE RDMA over Converged Ethernet
Dependent on transport layerIntolerant of packet loss
No protection against packet loss
Typically no protectionagainst packet loss
14
Lossless Ethernet is a Foundation of the New Data Center
bull Data Centers are becoming High Performance Computersbull Most Data Center networking is at Layer 34 but it all
rides on Layer 12bull Ethernet is preferred option for Layer 12 Ethernet infrastructure can support multiple upper layers
(eg TCP and RoCEv2) simultaneouslybull Most Data Center networking requires low latency bull Some Data Center networking requires a lossless
Ethernet fabric
14
15
IEEE 802 DevelopedData Center Bridging
bull In the IEEE 8021 Working Group the Data Center Bridging Task Group developed many standards to enhance Layer 2 support for the data center Beginning around 2006
bull Key technologies include Priority-based Flow Control (PFC) Congestion Notification Enhanced Transmission Selection (ETS)
15
16
Folded-Clos NetworkMany Paths from Server to Server
16
server
spine
rack
17
Incast fills output queue(note ECMP cannot help)
17
server
spine
rack
ECMPincast
18
Priority flow control (PFC)18
server
spine
rack
PFC
incast
bull Output backup fills ingress queuebull PFC can be used to pause input per QoS classbull IEEE 8021Q (originally in 8021Qbb)
19
PFC pauses all flows of the classincluding ldquovictimrdquo flows
19
server
spine
rack
PFC stops both flows
incast
20
Traffic-class blocking analysis20
17
Figure 11 Network performance degradation due to HOL blocking
other flows in the network that are injected from other sources and are addressed todestinations dicrarrerent from X (Y and Z respectively) so that they do not contribute to thecongestion tree However as can be seen the latter flows interact (ie share queues) withthe congested flows at several points of the network then the latter flows produce HOLblocking to the former ones As a consequence all the flows end up being injected at thesame rate as the contributor ones despite the fact that the links connected to Y and Zare not saturated On the other hand note that at the switch where the root is locatedthe HOL blocking appears due to the direct contention of the congested flows to accessthe root of the tree but in other (upstream) switches the HOL blocking appears due tocongestion propagation through flow control In general the HOL blocking produced inthe switch where congestion originates is called ldquolow-orderrdquo HOL blocking while the oneproduced at upstream switches is called ldquohigh-orderrdquo HOL blocking
4 Reducing In-Network Congestion
Taking into account the problems derived from congestion trees a natural approach to dealwith them is trying to prevent their formation This approach is actually a particular case
Source IEEE 8021-19-0012-00 [4]
21
Congested-flow Isolation(see IEEE Project P8021Qcz [5])
21
DownstreamUpstream1 3 1 3
2
4
2
4Ingress Port
(Virtual Queues)Egress Port Ingress Port
(Virtual Queues)Egress Port
Congested Flows
Non-Congested Flows
1 Identify the flow causing congestion and isolate locally
CIP 2 Signal to neighbor when congested queue fills
Eliminate HoL Blocking
3 Upstream isolates the flow too eliminating head-of-line blocking
PFC 4 If congested queue continues to fill invoke PFC for lossless
Source IEEE 802 Nendica Report The Lossless Network for Data Centers [2]
22
Congested-flow Isolation Analysis22
Source IEEE 8021-19-0012-00 [4]
23
Figure 14 Isolating contributor flows eliminates HOL blocking
other flows so that if HOL blocking were prevented congestion trees could exist withoutactually being a problem on their own
Although this basic approach have been followed by very dicrarrerent techniques in generala common advantage of all of them is that they deal with HOL blocking fast and locallyas they attack the HOL blocking when and where it may appear Therefore they cancomplement some techniques explained above that react slowly to congestion (see Section7) Another common issue of all these techniques is that they configure several queues perport to separately store packets belonging to dicrarrerent flows hence reducing or eliminatingthe interaction among these flows and so the HOL blocking they can produce each otherHence the ultimate goal of these techniques is to isolate in separate queues the flows thatcontribute to congestion which are the ones producing significant HOL blocking This canbe seen in Figure 15 that shows the same situation depicted in Figure 11 but now withseveral queues configured per port to separate the flows addressed to dicrarrerent destinationsAs can be seen the separation of flows into dicrarrerent queues leads to the elimination ofHOL blocking so to an improvement of the performance of the network as the utilizationof several links increases with respect to the situation that the flows share queues
Note that the HOL blocking would be ideally and completely eliminated if there wereat every port a queue specifically dedicated to each destination [10] but this is obvi-ously unacrarrordable as the number of queues in the network would grow quadratically with
23
Bibliography1) IEEE 802 Orientation for New Participants httpsmentorieeeorg802-ecdcn18ec-18-0117-02pdf
2) IEEE 802 ldquoNetwork Enhancements for the Next Decaderdquo Industry Connections Activity (Nendica) https1ieee802org802-nendica
3) IEEE 802 Nendica Report ldquoThe Lossless Network for Data Centersrdquo (18 August 2018) httpsmentorieeeorg8021dcn181-18-0042-00pdf
4) Paul Congdon ldquoThe Lossless Network in the Data Centerrdquo IEEE 8021-17-0007-01 7 November 2017 httpsmentorieeeorg8021dcn171-17-0007-01pdf
5) Pedro Javier Garcia Jesus Escudero-Sahuquillo Francisco J Quiles and Jose Duato ldquoCongestion Management for Ethernet-based Lossless DataCenterNetworksrdquo IEEE 8021-19-0012-00 4 February 2012 httpsmentorieeeorg8021dcn191-19-0012-00pdf
6) IEEE P8021Qcz Project ldquoCongestion Isolationrdquo https1ieee802orgtsn802-1qcz
23
24
Going forward
bull IEEE 802 Nendica Report ldquoThe Lossless Network for Data Centersrdquo (18 August 2018) is published but open to further comment Comments are welcome from all
bull Could open an activity to revise the report addressing new issues Proposal [5] may be discussed in future teleconference
bull Report could help identify issues in need of further research or unified action
bull Nendica could instigate standardization of key topics Mainly in IEEE 802 perhaps also in eg IETF
24
25
Nendica Participation
bull Nendica is open to all participants please join in no membership requirements Comment by Internet or contribute documents Call in to teleconferences Attend meetings
bull Nendica web site https1ieee802org802-nendica Details of in-person and teleconference meetings
bull Feel free to contact the Chair (see first slide)
25
14
Lossless Ethernet is a Foundation of the New Data Center
bull Data Centers are becoming High Performance Computersbull Most Data Center networking is at Layer 34 but it all
rides on Layer 12bull Ethernet is preferred option for Layer 12 Ethernet infrastructure can support multiple upper layers
(eg TCP and RoCEv2) simultaneouslybull Most Data Center networking requires low latency bull Some Data Center networking requires a lossless
Ethernet fabric
14
15
IEEE 802 DevelopedData Center Bridging
bull In the IEEE 8021 Working Group the Data Center Bridging Task Group developed many standards to enhance Layer 2 support for the data center Beginning around 2006
bull Key technologies include Priority-based Flow Control (PFC) Congestion Notification Enhanced Transmission Selection (ETS)
15
16
Folded-Clos NetworkMany Paths from Server to Server
16
server
spine
rack
17
Incast fills output queue(note ECMP cannot help)
17
server
spine
rack
ECMPincast
18
Priority flow control (PFC)18
server
spine
rack
PFC
incast
bull Output backup fills ingress queuebull PFC can be used to pause input per QoS classbull IEEE 8021Q (originally in 8021Qbb)
19
PFC pauses all flows of the classincluding ldquovictimrdquo flows
19
server
spine
rack
PFC stops both flows
incast
20
Traffic-class blocking analysis20
17
Figure 11 Network performance degradation due to HOL blocking
other flows in the network that are injected from other sources and are addressed todestinations dicrarrerent from X (Y and Z respectively) so that they do not contribute to thecongestion tree However as can be seen the latter flows interact (ie share queues) withthe congested flows at several points of the network then the latter flows produce HOLblocking to the former ones As a consequence all the flows end up being injected at thesame rate as the contributor ones despite the fact that the links connected to Y and Zare not saturated On the other hand note that at the switch where the root is locatedthe HOL blocking appears due to the direct contention of the congested flows to accessthe root of the tree but in other (upstream) switches the HOL blocking appears due tocongestion propagation through flow control In general the HOL blocking produced inthe switch where congestion originates is called ldquolow-orderrdquo HOL blocking while the oneproduced at upstream switches is called ldquohigh-orderrdquo HOL blocking
4 Reducing In-Network Congestion
Taking into account the problems derived from congestion trees a natural approach to dealwith them is trying to prevent their formation This approach is actually a particular case
Source IEEE 8021-19-0012-00 [4]
21
Congested-flow Isolation(see IEEE Project P8021Qcz [5])
21
DownstreamUpstream1 3 1 3
2
4
2
4Ingress Port
(Virtual Queues)Egress Port Ingress Port
(Virtual Queues)Egress Port
Congested Flows
Non-Congested Flows
1 Identify the flow causing congestion and isolate locally
CIP 2 Signal to neighbor when congested queue fills
Eliminate HoL Blocking
3 Upstream isolates the flow too eliminating head-of-line blocking
PFC 4 If congested queue continues to fill invoke PFC for lossless
Source IEEE 802 Nendica Report The Lossless Network for Data Centers [2]
22
Congested-flow Isolation Analysis22
Source IEEE 8021-19-0012-00 [4]
23
Figure 14 Isolating contributor flows eliminates HOL blocking
other flows so that if HOL blocking were prevented congestion trees could exist withoutactually being a problem on their own
Although this basic approach have been followed by very dicrarrerent techniques in generala common advantage of all of them is that they deal with HOL blocking fast and locallyas they attack the HOL blocking when and where it may appear Therefore they cancomplement some techniques explained above that react slowly to congestion (see Section7) Another common issue of all these techniques is that they configure several queues perport to separately store packets belonging to dicrarrerent flows hence reducing or eliminatingthe interaction among these flows and so the HOL blocking they can produce each otherHence the ultimate goal of these techniques is to isolate in separate queues the flows thatcontribute to congestion which are the ones producing significant HOL blocking This canbe seen in Figure 15 that shows the same situation depicted in Figure 11 but now withseveral queues configured per port to separate the flows addressed to dicrarrerent destinationsAs can be seen the separation of flows into dicrarrerent queues leads to the elimination ofHOL blocking so to an improvement of the performance of the network as the utilizationof several links increases with respect to the situation that the flows share queues
Note that the HOL blocking would be ideally and completely eliminated if there wereat every port a queue specifically dedicated to each destination [10] but this is obvi-ously unacrarrordable as the number of queues in the network would grow quadratically with
23
Bibliography1) IEEE 802 Orientation for New Participants httpsmentorieeeorg802-ecdcn18ec-18-0117-02pdf
2) IEEE 802 ldquoNetwork Enhancements for the Next Decaderdquo Industry Connections Activity (Nendica) https1ieee802org802-nendica
3) IEEE 802 Nendica Report ldquoThe Lossless Network for Data Centersrdquo (18 August 2018) httpsmentorieeeorg8021dcn181-18-0042-00pdf
4) Paul Congdon ldquoThe Lossless Network in the Data Centerrdquo IEEE 8021-17-0007-01 7 November 2017 httpsmentorieeeorg8021dcn171-17-0007-01pdf
5) Pedro Javier Garcia Jesus Escudero-Sahuquillo Francisco J Quiles and Jose Duato ldquoCongestion Management for Ethernet-based Lossless DataCenterNetworksrdquo IEEE 8021-19-0012-00 4 February 2012 httpsmentorieeeorg8021dcn191-19-0012-00pdf
6) IEEE P8021Qcz Project ldquoCongestion Isolationrdquo https1ieee802orgtsn802-1qcz
23
24
Going forward
bull IEEE 802 Nendica Report ldquoThe Lossless Network for Data Centersrdquo (18 August 2018) is published but open to further comment Comments are welcome from all
bull Could open an activity to revise the report addressing new issues Proposal [5] may be discussed in future teleconference
bull Report could help identify issues in need of further research or unified action
bull Nendica could instigate standardization of key topics Mainly in IEEE 802 perhaps also in eg IETF
24
25
Nendica Participation
bull Nendica is open to all participants please join in no membership requirements Comment by Internet or contribute documents Call in to teleconferences Attend meetings
bull Nendica web site https1ieee802org802-nendica Details of in-person and teleconference meetings
bull Feel free to contact the Chair (see first slide)
25
15
IEEE 802 DevelopedData Center Bridging
bull In the IEEE 8021 Working Group the Data Center Bridging Task Group developed many standards to enhance Layer 2 support for the data center Beginning around 2006
bull Key technologies include Priority-based Flow Control (PFC) Congestion Notification Enhanced Transmission Selection (ETS)
15
16
Folded-Clos NetworkMany Paths from Server to Server
16
server
spine
rack
17
Incast fills output queue(note ECMP cannot help)
17
server
spine
rack
ECMPincast
18
Priority flow control (PFC)18
server
spine
rack
PFC
incast
bull Output backup fills ingress queuebull PFC can be used to pause input per QoS classbull IEEE 8021Q (originally in 8021Qbb)
19
PFC pauses all flows of the classincluding ldquovictimrdquo flows
19
server
spine
rack
PFC stops both flows
incast
20
Traffic-class blocking analysis20
17
Figure 11 Network performance degradation due to HOL blocking
other flows in the network that are injected from other sources and are addressed todestinations dicrarrerent from X (Y and Z respectively) so that they do not contribute to thecongestion tree However as can be seen the latter flows interact (ie share queues) withthe congested flows at several points of the network then the latter flows produce HOLblocking to the former ones As a consequence all the flows end up being injected at thesame rate as the contributor ones despite the fact that the links connected to Y and Zare not saturated On the other hand note that at the switch where the root is locatedthe HOL blocking appears due to the direct contention of the congested flows to accessthe root of the tree but in other (upstream) switches the HOL blocking appears due tocongestion propagation through flow control In general the HOL blocking produced inthe switch where congestion originates is called ldquolow-orderrdquo HOL blocking while the oneproduced at upstream switches is called ldquohigh-orderrdquo HOL blocking
4 Reducing In-Network Congestion
Taking into account the problems derived from congestion trees a natural approach to dealwith them is trying to prevent their formation This approach is actually a particular case
Source IEEE 8021-19-0012-00 [4]
21
Congested-flow Isolation(see IEEE Project P8021Qcz [5])
21
DownstreamUpstream1 3 1 3
2
4
2
4Ingress Port
(Virtual Queues)Egress Port Ingress Port
(Virtual Queues)Egress Port
Congested Flows
Non-Congested Flows
1 Identify the flow causing congestion and isolate locally
CIP 2 Signal to neighbor when congested queue fills
Eliminate HoL Blocking
3 Upstream isolates the flow too eliminating head-of-line blocking
PFC 4 If congested queue continues to fill invoke PFC for lossless
Source IEEE 802 Nendica Report The Lossless Network for Data Centers [2]
22
Congested-flow Isolation Analysis22
Source IEEE 8021-19-0012-00 [4]
23
Figure 14 Isolating contributor flows eliminates HOL blocking
other flows so that if HOL blocking were prevented congestion trees could exist withoutactually being a problem on their own
Although this basic approach have been followed by very dicrarrerent techniques in generala common advantage of all of them is that they deal with HOL blocking fast and locallyas they attack the HOL blocking when and where it may appear Therefore they cancomplement some techniques explained above that react slowly to congestion (see Section7) Another common issue of all these techniques is that they configure several queues perport to separately store packets belonging to dicrarrerent flows hence reducing or eliminatingthe interaction among these flows and so the HOL blocking they can produce each otherHence the ultimate goal of these techniques is to isolate in separate queues the flows thatcontribute to congestion which are the ones producing significant HOL blocking This canbe seen in Figure 15 that shows the same situation depicted in Figure 11 but now withseveral queues configured per port to separate the flows addressed to dicrarrerent destinationsAs can be seen the separation of flows into dicrarrerent queues leads to the elimination ofHOL blocking so to an improvement of the performance of the network as the utilizationof several links increases with respect to the situation that the flows share queues
Note that the HOL blocking would be ideally and completely eliminated if there wereat every port a queue specifically dedicated to each destination [10] but this is obvi-ously unacrarrordable as the number of queues in the network would grow quadratically with
23
Bibliography1) IEEE 802 Orientation for New Participants httpsmentorieeeorg802-ecdcn18ec-18-0117-02pdf
2) IEEE 802 ldquoNetwork Enhancements for the Next Decaderdquo Industry Connections Activity (Nendica) https1ieee802org802-nendica
3) IEEE 802 Nendica Report ldquoThe Lossless Network for Data Centersrdquo (18 August 2018) httpsmentorieeeorg8021dcn181-18-0042-00pdf
4) Paul Congdon ldquoThe Lossless Network in the Data Centerrdquo IEEE 8021-17-0007-01 7 November 2017 httpsmentorieeeorg8021dcn171-17-0007-01pdf
5) Pedro Javier Garcia Jesus Escudero-Sahuquillo Francisco J Quiles and Jose Duato ldquoCongestion Management for Ethernet-based Lossless DataCenterNetworksrdquo IEEE 8021-19-0012-00 4 February 2012 httpsmentorieeeorg8021dcn191-19-0012-00pdf
6) IEEE P8021Qcz Project ldquoCongestion Isolationrdquo https1ieee802orgtsn802-1qcz
23
24
Going forward
bull IEEE 802 Nendica Report ldquoThe Lossless Network for Data Centersrdquo (18 August 2018) is published but open to further comment Comments are welcome from all
bull Could open an activity to revise the report addressing new issues Proposal [5] may be discussed in future teleconference
bull Report could help identify issues in need of further research or unified action
bull Nendica could instigate standardization of key topics Mainly in IEEE 802 perhaps also in eg IETF
24
25
Nendica Participation
bull Nendica is open to all participants please join in no membership requirements Comment by Internet or contribute documents Call in to teleconferences Attend meetings
bull Nendica web site https1ieee802org802-nendica Details of in-person and teleconference meetings
bull Feel free to contact the Chair (see first slide)
25
16
Folded-Clos NetworkMany Paths from Server to Server
16
server
spine
rack
17
Incast fills output queue(note ECMP cannot help)
17
server
spine
rack
ECMPincast
18
Priority flow control (PFC)18
server
spine
rack
PFC
incast
bull Output backup fills ingress queuebull PFC can be used to pause input per QoS classbull IEEE 8021Q (originally in 8021Qbb)
19
PFC pauses all flows of the classincluding ldquovictimrdquo flows
19
server
spine
rack
PFC stops both flows
incast
20
Traffic-class blocking analysis20
17
Figure 11 Network performance degradation due to HOL blocking
other flows in the network that are injected from other sources and are addressed todestinations dicrarrerent from X (Y and Z respectively) so that they do not contribute to thecongestion tree However as can be seen the latter flows interact (ie share queues) withthe congested flows at several points of the network then the latter flows produce HOLblocking to the former ones As a consequence all the flows end up being injected at thesame rate as the contributor ones despite the fact that the links connected to Y and Zare not saturated On the other hand note that at the switch where the root is locatedthe HOL blocking appears due to the direct contention of the congested flows to accessthe root of the tree but in other (upstream) switches the HOL blocking appears due tocongestion propagation through flow control In general the HOL blocking produced inthe switch where congestion originates is called ldquolow-orderrdquo HOL blocking while the oneproduced at upstream switches is called ldquohigh-orderrdquo HOL blocking
4 Reducing In-Network Congestion
Taking into account the problems derived from congestion trees a natural approach to dealwith them is trying to prevent their formation This approach is actually a particular case
Source IEEE 8021-19-0012-00 [4]
21
17
Incast fills output queue (note: ECMP cannot help)
[Diagram: spine / rack / server fabric; incast traffic from many servers converges on one server-facing output port while ECMP spreads the flows across the spine paths]
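Why ECMP cannot help can be shown with back-of-the-envelope arithmetic (the link speed, fan-in, and burst size below are hypothetical, chosen only for illustration): however ECMP spreads the flows across spine paths, they all converge on the same receiver-facing link.

```python
# Illustrative sketch (hypothetical numbers): N servers send a burst to one
# receiver behind a single 100 Gb/s link. ECMP can balance the flows across
# spine paths, but every path converges on the same output port.

LINK_GBPS = 100          # capacity of the receiver-facing link
N_SENDERS = 32           # servers replying at once (e.g., partition/aggregate)
BURST_MB = 1             # burst size per sender, in megabytes

offered_gbps = N_SENDERS * LINK_GBPS   # each sender can fill its own link
overload = offered_gbps / LINK_GBPS    # 32x oversubscription at the output

total_bits = N_SENDERS * BURST_MB * 8e6
drain_time_ms = total_bits / (LINK_GBPS * 1e9) * 1e3

print(f"oversubscription: {overload:.0f}x")
print(f"time to drain {N_SENDERS} bursts: {drain_time_ms:.2f} ms")
# The output queue must absorb the excess no matter which spine paths ECMP
# chose: the bottleneck is the destination link, not the fabric paths.
```

The point of the sketch is that incast oversubscription is determined entirely at the last hop, so path-diversity mechanisms upstream cannot relieve it.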
18
Priority flow control (PFC)
[Diagram: spine / rack / server fabric under incast; the rack switch sends PFC upstream toward the spine]
• Output backup fills the ingress queue
• PFC can be used to pause input per QoS class
• IEEE 802.1Q (originally in 802.1Qbb)
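The per-class pause mechanism is carried in a MAC Control frame. The following is a minimal encoding sketch of a PFC frame per IEEE 802.1Qbb (now part of 802.1Q): a 16-bit priority-enable vector marks which of the eight 16-bit pause timers are valid. FCS and minimum-frame padding are omitted here.

```python
import struct

def build_pfc_frame(src_mac, pause_quanta):
    """Encode a PFC (Priority-based Flow Control) MAC Control frame.

    pause_quanta maps priority (0-7) -> pause time in 512-bit-time quanta.
    A quantum of 0xFFFF pauses the priority for the maximum time; 0 resumes it.
    """
    dst_mac = bytes.fromhex("0180c2000001")   # reserved multicast address
    ethertype = 0x8808                        # MAC Control
    opcode = 0x0101                           # PFC (classic PAUSE is 0x0001)
    enable_vector = 0
    times = [0] * 8
    for prio, quanta in pause_quanta.items():
        enable_vector |= 1 << prio            # bit e[n] marks time[n] as valid
        times[prio] = quanta
    return (dst_mac + src_mac
            + struct.pack("!HHH", ethertype, opcode, enable_vector)
            + struct.pack("!8H", *times))

frame = build_pfc_frame(bytes(6), {3: 0xFFFF})   # pause priority 3 only
assert len(frame) == 34   # 12 addr + 2 type + 2 opcode + 2 vector + 16 times
```

Because the pause is keyed only by priority, every flow mapped to priority 3 on that link stops, which is exactly the "victim flow" problem the next slide illustrates.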
19
PFC pauses all flows of the class, including “victim” flows
[Diagram: spine / rack / server fabric under incast; PFC stops both flows, the congesting flow and the innocent “victim” flow, because they share a traffic class]
20
Traffic-class blocking analysis
Figure 11: Network performance degradation due to HOL blocking

…other flows in the network that are injected from other sources and are addressed to destinations different from X (Y and Z, respectively), so that they do not contribute to the congestion tree. However, as can be seen, the latter flows interact (i.e., share queues) with the congested flows at several points of the network; then the latter flows suffer HOL blocking from the former ones. As a consequence, all the flows end up being injected at the same rate as the contributor ones, despite the fact that the links connected to Y and Z are not saturated. On the other hand, note that at the switch where the root is located, the HOL blocking appears due to the direct contention of the congested flows to access the root of the tree, but in other (upstream) switches the HOL blocking appears due to congestion propagation through flow control. In general, the HOL blocking produced in the switch where congestion originates is called “low-order” HOL blocking, while the one produced at upstream switches is called “high-order” HOL blocking.

4 Reducing In-Network Congestion
Taking into account the problems derived from congestion trees, a natural approach to deal with them is trying to prevent their formation. This approach is actually a particular case…

Source: IEEE 802.1-19-0012-00 [5]
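The low-order HOL blocking described above can be reproduced in a toy queue model (the scheduler, packet list, and pause set below are invented for illustration): a packet for an idle output Y is stuck behind packets for a paused output X when they share one FIFO, but drains immediately with per-destination queues.

```python
from collections import deque

paused = {"X"}                       # output X is backpressured (e.g., by PFC)
packets = [("X", 1), ("X", 2), ("Y", 1)]   # (destination, packet id)

def run(queues, slots=3):
    """Serve queue heads for a few slots; a paused destination blocks its queue."""
    delivered = []
    for _ in range(slots):
        for q in queues:
            if q and q[0][0] not in paused:
                delivered.append(q.popleft())
                break                # at most one packet per slot
    return delivered

# Shared FIFO: Y's packet is stuck behind X's packets (head-of-line blocking).
shared = run([deque(packets)])
# Per-destination queues: Y drains even while X stays paused.
voq = run([deque(p for p in packets if p[0] == "X"),
           deque(p for p in packets if p[0] == "Y")])

print(shared)  # []
print(voq)     # [('Y', 1)]
```

In the shared case nothing is delivered in any slot, even though Y's link is idle; that is precisely the performance loss the excerpt attributes to HOL blocking.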
21
Congested-flow Isolation (see IEEE Project P802.1Qcz [6])
[Diagram: an upstream and a downstream switch, each with ingress ports (virtual queues) and egress ports; congested flows are steered into a separate queue from non-congested flows]
1. Identify the flow causing congestion and isolate it locally
2. CIP: signal to the neighbor when the congested queue fills
3. Upstream isolates the flow too, eliminating head-of-line blocking
4. PFC: if the congested queue continues to fill, invoke PFC for lossless operation
Source: IEEE 802 Nendica Report, “The Lossless Network for Data Centers” [3]
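The four steps above can be sketched as event-driven switch logic. This is not the P802.1Qcz state machine; the class, thresholds, and message passing below are invented purely to show the sequence local isolation → CIP signal → upstream isolation → PFC fallback.

```python
# Hedged sketch of the congestion-isolation sequence from the slide.
# Names and thresholds are hypothetical, not taken from the standard.

class Switch:
    ISOLATE_AT = 4      # packets of one flow in the normal queue -> step 1
    CIP_AT = 6          # congested-queue depth that triggers CIP  -> step 2
    PFC_AT = 8          # congested-queue depth that triggers PFC  -> step 4

    def __init__(self, name):
        self.name = name
        self.normal, self.congested = [], []
        self.isolated_flows = set()
        self.upstream = None

    def enqueue(self, flow):
        queue = self.congested if flow in self.isolated_flows else self.normal
        queue.append(flow)
        if self.normal.count(flow) >= self.ISOLATE_AT:
            self.isolated_flows.add(flow)                    # step 1: isolate
            self.congested += [f for f in self.normal if f == flow]
            self.normal = [f for f in self.normal if f != flow]
        if len(self.congested) >= self.CIP_AT and self.upstream:
            self.upstream.isolated_flows.add(flow)           # steps 2-3: CIP
        return "PFC" if len(self.congested) >= self.PFC_AT else "ok"  # step 4

up, down = Switch("upstream"), Switch("downstream")
down.upstream = up
events = [down.enqueue("elephant") for _ in range(9)]
# The elephant flow is isolated locally, then upstream via CIP; PFC is only
# invoked once the congested queue keeps growing, so victim flows in the
# normal queue are never paused.
```

The design point the sketch illustrates: PFC becomes a last-resort lossless guarantee on the already-isolated queue, rather than the first response that pauses a whole traffic class.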
22
Congested-flow Isolation Analysis
Source: IEEE 802.1-19-0012-00 [5]
Figure 14: Isolating contributor flows eliminates HOL blocking

…other flows, so that if HOL blocking were prevented, congestion trees could exist without actually being a problem on their own.

Although this basic approach has been followed by very different techniques, in general a common advantage of all of them is that they deal with HOL blocking fast and locally, as they attack the HOL blocking when and where it may appear. Therefore, they can complement some techniques explained above that react slowly to congestion (see Section 7). Another common issue of all these techniques is that they configure several queues per port to separately store packets belonging to different flows, hence reducing or eliminating the interaction among these flows, and so the HOL blocking they can produce on each other. Hence, the ultimate goal of these techniques is to isolate in separate queues the flows that contribute to congestion, which are the ones producing significant HOL blocking. This can be seen in Figure 15, which shows the same situation depicted in Figure 11, but now with several queues configured per port to separate the flows addressed to different destinations. As can be seen, the separation of flows into different queues leads to the elimination of HOL blocking, and so to an improvement of the performance of the network, as the utilization of several links increases with respect to the situation where the flows share queues.

Note that HOL blocking would be ideally and completely eliminated if there were at every port a queue specifically dedicated to each destination [10], but this is obviously unaffordable, as the number of queues in the network would grow quadratically with…
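The "quadratic" cost of per-destination queues can be made concrete with illustrative arithmetic (fabric sizes below are hypothetical): total queues scale with ports × destinations, whereas isolating congested flows needs only a small fixed number of extra queues per port.

```python
# Illustrative count (hypothetical fabric): per-destination queues vs. the
# small fixed number of queues that congestion isolation adds per port.

hosts = 1024                 # end stations (destinations)
switches = 64
ports_per_switch = 48
total_ports = switches * ports_per_switch            # 3072 switch ports

per_destination_queues = total_ports * hosts         # -> 3145728
isolation_queues = total_ports * 2                   # -> 6144
# (two queues per port per class in this sketch: congested + non-congested)

print(per_destination_queues)
print(isolation_queues)
```

Since the number of ports itself grows with the number of hosts, the per-destination total grows roughly quadratically in network size, which is the point the excerpt makes.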
23
Bibliography
1) IEEE 802 Orientation for New Participants, https://mentor.ieee.org/802-ec/dcn/18/ec-18-0117-02.pdf
2) IEEE 802 “Network Enhancements for the Next Decade” Industry Connections Activity (Nendica), https://1.ieee802.org/802-nendica
3) IEEE 802 Nendica Report, “The Lossless Network for Data Centers” (18 August 2018), https://mentor.ieee.org/802.1/dcn/18/1-18-0042-00.pdf
4) Paul Congdon, “The Lossless Network in the Data Center,” IEEE 802.1-17-0007-01, 7 November 2017, https://mentor.ieee.org/802.1/dcn/17/1-17-0007-01.pdf
5) Pedro Javier Garcia, Jesus Escudero-Sahuquillo, Francisco J. Quiles, and Jose Duato, “Congestion Management for Ethernet-based Lossless Data Center Networks,” IEEE 802.1-19-0012-00, 4 February 2019, https://mentor.ieee.org/802.1/dcn/19/1-19-0012-00.pdf
6) IEEE P802.1Qcz Project, “Congestion Isolation,” https://1.ieee802.org/tsn/802-1qcz
24
Going forward
• The IEEE 802 Nendica Report “The Lossless Network for Data Centers” (18 August 2018) is published but open to further comment; comments are welcome from all
• An activity could be opened to revise the report, addressing new issues; proposal [5] may be discussed in a future teleconference
• The report could help identify issues in need of further research or unified action
• Nendica could instigate standardization of key topics, mainly in IEEE 802, perhaps also in, e.g., the IETF
25
Nendica Participation
• Nendica is open to all participants; please join in
  - No membership requirements
  - Comment by Internet or contribute documents
  - Call in to teleconferences
  - Attend meetings
• Nendica web site: https://1.ieee802.org/802-nendica
  - Details of in-person and teleconference meetings
• Feel free to contact the Chair (see first slide)
bull Nendica web site https1ieee802org802-nendica Details of in-person and teleconference meetings
bull Feel free to contact the Chair (see first slide)
25
25
Nendica Participation
bull Nendica is open to all participants please join in no membership requirements Comment by Internet or contribute documents Call in to teleconferences Attend meetings
bull Nendica web site https1ieee802org802-nendica Details of in-person and teleconference meetings
bull Feel free to contact the Chair (see first slide)
25