Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
42
Understanding the Performance Guarantee of PhysicalTopology Design for Optical Circuit Switched Data Centers
SHIZHEN ZHAO, PEIRUI CAO, and XINBING WANG, Shanghai Jiao Tong University, China
As a first step of designing Optical-circuit-switched Data Centers (ODC), physical topology design is critical
as it determines the scalability and the performance limit of the entire ODC. However, prior works on ODC
have not yet paid much attention to physical topology design, and the adopted physical topologies either scale
poorly, or lack performance guarantee.
We offer a mathematical foundation for the design and performance analysis of ODC physical topologies in
this paper. We introduce a new performance metric π½ (G) to evaluate the gap between a physical topology
G and the ideal physical topology. We develop a coupling technique that bypasses a significant amount of
computational complexity of calculating π½ (G). Using π½ (G) and the coupling technique, we study four physicaltopologies that are representative of those in literature, analyze their scalabilities and prove their performance
guarantees. Our analysis may provide new guidance for network operators to design better physical topologies
for their ODCs.
CCS Concepts: β’NetworksβNetwork performance modeling;Network performance analysis;Datacenter networks.
Additional Key Words and Phrases: Optical Circuit Switch, Data Center, Physical Topology, Logical Topology
ACM Reference Format:Shizhen Zhao, Peirui Cao, and Xinbing Wang. 2021. Understanding the Performance Guarantee of Physical
Topology Design for Optical Circuit Switched Data Centers. Proc. ACMMeas. Anal. Comput. Syst. 5, 3, Article 42(December 2021), 24 pages. https://doi.org/10.1145/3491054
1 INTRODUCTIONAs data center traffic doubles every year [25], building Clos topologies [1, 12, 25] for data centers
using electrical switches is becoming more and more expensive and power prohibitive [2]. In order
to meet the growing demand at reduced energy cost, building ODCs is becoming a promising
alternative for future data centers. An optical circuit switch, e.g. Calient [4], could offer at least
hundreds of times higher switching capacity than an electrical switch, while its energy cost is
hundreds of times lower (less than 45 Watts for an optical circuit switch with 320 Tx/Rx pairs).
While the eventual goal of ODC design is the full-optical data center, due to technological
immaturity, existing designs of ODC have mostly adopted a hybrid design that involves both
electrical switching and optical switching. The design of an ODC typically covers the following
four aspects:
(1) Physical topology design: Determine how to interconnect hosts, electrical packet switches
(EPS) and optical circuit switches (OCS).
(2) OCS control: Determine the configurations of all the OCS nodes in an ODC.
Shizhen Zhao is the corresponding author.
Authorsβ address: Shizhen Zhao; Peirui Cao; Xinbing Wang, Shanghai Jiao Tong University, China.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses,
contact the owner/author(s).
Β© 2022 Copyright held by the owner/author(s).
2476-1249/2021/12-ART42
https://doi.org/10.1145/3491054
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
This work is licensed under a Creative Commons Attribution International 4.0 License.
Β© 2021 Copyright held by the owner/author(s). 2476-1249/2021/12-ART42. https://doi.org/10.1145/3491054
42:2 Shizhen Zhao, Peirui Cao, and Xinbing Wang
(3) EPS control: Set up the routing strategies for all the EPS nodes. Since OCS reconfiguration
may affect the connected EPS nodes, the queuing and buffer management policies of the EPS
nodes may also require careful redesign.
(4) Host control: Modify the host protocol stacks at different layers in accordance with the
OCS/EPS control strategies to attain better end-to-end performance.
Existing works on ODC, e.g., Helios [10], c-Through [26], Mordia [24], REACToR [19], OSA [7],
Solstice [20], MegaSwitch [8], FireFly [13], ProjectToR [11], RotorNet [22], Opera [21], etc., have
primarily focused on the OCS/EPS/Host control aspects, but their physical topology designs may not
support large-scale data centers (see Section 2 for more details). Only until recently, three scalable
physical topology designs are adopted by Flexfly [27], Sirius [2] and 3D-Hyper-FleX-LION [18].
However, the performance guarantee of these physical topologies remain unclear. To the best of
our knowledge, there is a lack of rigorous performance metrics that evaluate the βgoodnessβ of
a physical topology design, and there is no systematic study on how to design a good physical
topology for ODC.
In this paper, we focus on the physical topology design for ODC, and offer a rigorous performance
evaluation for different physical topologies. One challenge of this work is to design a performance
metric for physical topologies. While much of the literature on ODC has adopted network through-
put, flow completion time, max link utilization, etc., to evaluate the end-to-end performance of an
ODC, these metrics may not accurately reflect the βgoodnessβ of a physical topology. The reason is
that, both the physical topology design and the network control policies may affect these metrics,
and thus it is hard to evaluate the exact contribution of a physical topology. To eliminate the impact
of network control policies, we introduce ` (G, π ), which is defined as the optimal throughput
across all possible network control policies for a given physical topology G and a given data center
demand pattern π (see Β§3.2). Based on ` (G, π ), we then define βπ½-optimalityβ for a given physical
topology G, i.e., G is said to be π½ (G)-optimal as long as ` (G, π ) β₯ π½ (G)` (Gβ², π ) for any physical
topology Gβ²and any demand pattern π (see Β§3.3). A physical topology G is said optimal if π½ (G) = 1.
We first study how to design the optimal physical topologies with π½ (G) = 1. A naive approach
requires calculating ` (G, π ) for different π βs and optimizing ` (G, π ) among different physical
topologies. Unfortunately, this approach is computationally prohibitive. First, the formulation of
` (G, π ) is essentially an integer programming problem (see (5) in Β§3.2), which is NP-hard in general.
Second, there are an uncountable number of different traffic patterns, and it is impossible to calculate
` (G, π ) one by one. To circumvent the above difficulty, we introduce a coupling technique (see
Lemma 1 in Β§3.3) to compare two physical topologies. The intuition is that, if one physical topology
can βimitateβ all the control strategies of another physical topology, then the first physical topology
will perform no worse than the second one. Using this coupling technique, we prove that the ideal
physical topology and the uniform bipartite physical topology are both optimal. While these two
physical topologies have been adopted by many existing ODC designs [5, 7, 10, 19β22, 24, 26] to
build prototypes, they scale poorly to support large data centers.
We then study how to design scalable physical topologies for large ODCs. Along this direction,
we first prove a negative result that no physical topology is optimal when the number of EPS nodes
exceeds the number of ports of an OCS node. Hence, we have to seek for sub-optimal physical
topologies, and calculate π½ (G) as a performance guarantee of these physical topologies. We could
still use the coupling technique to calculate π½ (G) for these physical topologies. However, due
to the lack of direct connectivity between certain node pairs, it may not always be possible for
one physical topology to imitate the control decisions of another physical topology. To overcome
this challenge, we enhance the above coupling technique using overlay topologies. If one physicaltopology allows constructing an overlay topology to βimitateβ all the control strategies of another
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers 42:3
physical topology, then the first physical topology will perform no worse than the second one (see
Lemma 2 in Β§3.3). Using the enhanced coupling technique, we successfully calculated π½ (G) forthree representative physical topologies.
The contributions of this work are summarized below:
(1) We introduce a new metric π½ (G) to evaluate the goodness of a physical topology G.
(2) We develop a coupling technique to calculate π½ (G). This technique circumvents the compu-
tational complexity of calculating ` (G, π ) for any traffic pattern π .
(3) We design an optimal physical topology (with rigorous proof) for small-scale data centers.
(4) We prove that no physical topology is optimal for large-scale data centers.
(5) Motivated by Flexfly [27], 3D-Hyper-FleX-LION [18] and Sirius [2], we design three physical
topologies that can scale to large ODCs, and prove their performance guarantees π½ (G).
2 RELATEDWORKExisting works on optical circuit switched data centers have primarily focused on the overall
network control. The adopted physical topology designs either scale poorly, or lack performance
guarantee.
c-Through [26], Mordia [24], REACToR [19], OSA [7], Solstice [20] simply use one OCS to connect
all the Top-of-Rack (ToR) switches. This physical topology design could only support tiny data
centers with Ξ(100) number of servers. To improve scalability, MegaSwitch [8] uses fiber rings
for interconnection (a similar idea was also discussed in [24]). This physical topology architecture
could support Ξ(1000) number of servers, but still does not work for large-scale data centers with
over 100k servers and thousands of ToRs.
Using free-space optics, FireFly [13] and ProjectToR [11] enable optical interconnection for thou-
sands of ToRs. However, the free-space optical switching technology faces tremendous deployment
complexity, as many environmental factors (e.g., vibration, dust, and humidity) may hinder the
performance of the free-space optical links.
Helios [10], RotorNet [22], Opera [21], TROD [5] create a uniform bipartite graph between all the
PoDs/ToRs and all the OCSs. This physical topology turns out to be optimal based on our analysis
in Β§4. While this design scales well for the PoD-level interconnect, it scales poorly for the ToR-level
interconnect. A large-scale data center may require hundreds of OCSs to connect over 100k servers.
Unfortunately, a ToR switch only has tens of uplinks and thus it is impossible to establish a link
between every pair of ToR and OCS. (In contrast, a PoD could have hundreds of uplinks.)
Recently, Flexfly [27], Sirius [2] and 3D-Hyper-FleX-LION [18] offered three physical topologies
that can potentially scale to large-scale data centers. However, no performance guarantee is provided
for any of the three physical topologies.
3 MATHEMATICAL MODEL3.1 Basic DefinitionsWe study a network with π electrical-packet-switching (EPS) nodes S = {π1, π2, ..., ππ } and πΎoptical-circuit-switching (OCS) nodes O = {π1,π2, ...,ππΎ }. Each EPS node has πΏ ports, each of
which has a transceiver. Each OCS node has π ingress ports and π egress ports. Each EPS port can
either connect to another EPS port, or connect to a pair of OCS ports. Note that when we connect
an EPS port to OCS ports, we need to separate the transmitter and the receiver of the EPS port,
have the transmitter connected to an OCS ingress port, and have the receiver connected to an OCS
egress port. Since an OCS port capacity is typically much higher (100Γ and more) than an EPS port
capacity, the capacity of each network link is dominated by the EPS port capacity π΅.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:4 Shizhen Zhao, Peirui Cao, and Xinbing Wang
Remark on OCS-OCS connections: The optical transceivers commonly used in data centers,
e.g., 100G QSFP28 CWDM4, typically have a link margin of 5dB and a maximum transmission
distance of 2km. Note that signals traversing an OCS node could experience 3dB optical loss. To
enable OCS-OCS connections, one may need to replace the 2km optical transceivers by 10km
optical transceivers, e.g, QSFP28 LR4. However, the QSFP28 LR4 transceiver has a much higher
consumption and laser cost than the QSFP28 CWDM4 transceiver. Due to the above cost reasons,
we do not allow OCS-OCS connections in this paper. Nevertheless, the methodologies introduced
in this paper can be easily generalized to incorporate OCS-OCS connections.
Traffic Matrix: Let ππ π be the relative traffic demand between the EPS node ππ and the EPS node
π π . Then, the network demand can be modeled using a traffic matrix π = [ππ π , π = 1, 2, ..., π , π =
1, 2, ..., π ] .Physical Topology: We use G = (V, E), where V = S βͺ O, to denote the physical topology
among all the EPS nodes and all the OCS nodes. As shown in Fig. 1(a), the edges of G consist of
two parts E = E1 βͺ E2, where E1 is the set of bidirectional links that interconnect two EPS nodes
and E2 is the set of unidirectional links that interconnect one EPS node and one OCS node. The
port count constraints of the EPS nodes and the OCS nodes are reflected by the degrees of the
nodes in G. Specifically, the in-degree and out-degree of each EPS node are no more than πΏ and the
in-degree and out-degree of each OCS node are no more than π . Note that there could be multiple
physical links connecting two adjacent nodes.
OCS Configuration: Each OCS node has π ingress ports and π egress ports. As shown in Fig. 1(b),
an OCS configuration is essentially a 1 β 1 mapping from the π ingress ports to the π egress
ports. In this paper, we denote by Ξ π , π = 1, 2, ..., πΎ as the OCS configuration of ππ , and let
Ξ = {Ξ 1,Ξ 2, ...,Ξ πΎ }. An OCS configuration helps create multiple logical links between EPS node
pairs. For example, if an ingress port π is mapped to an egress port π in an OCS node, let ππ be
the EPS node that connects to π and π π be the EPS node that connects to π, then a unidirectional
logical link is created between ππ and π π . Since OCS nodes do not perform packet decoding, a packet
traversing through a logical link will not be aware of the underlying OCS node.
Logical Topology: Given a physical topology G and an OCS configuration Ξ , a logical topologyGπ = Gπ (G,Ξ ) = (Vπ , Eπ ) is formed. In the logical topology Gπ (also see Fig. 1(c)), the node set
Vπ = S only contains the EPS nodes; the link set Eπ = E1 βͺ E3, where E1 is the set of physical
links among all the EPS nodes and E3 is the set of logical links formed by the OCS configuration Ξ .We use a non-negative integer π₯π π (Gπ ) to denote the number of links from ππ to π π in Gπ . Note thatπ₯π π (Gπ ) could be larger than one.
3.2 Network Control for Optimal ThroughputNetwork performance are usually evaluated from different dimensions, including throughput,
latency, flow completion time [9], deployment complexity [28], expansion complexity [28, 29], etc.
We focus on throughput in this paper due to the following reasons. First, compared to latency and
flow completion time, throughput can be mathematically formulated and thus is easy to evaluate
even at the planning stage. Second, having a higher throughput could also help reduce latency, flow
completion time, expansion complexity (the number of expansion stages can be reduced [28]), etc.
Given a traffic pattern π and a physical topology G = (V, E), network throughput can be affectedby various control policies, including route selection, load balancing, congestion control, etc. For
ODC, the OCS configurations can be also a determining factor for throughput. We consider an
βoracleβ that could choose the best control policies for every pair of (G, π ). Specifically, this oracleperforms topology engineering (ToE) and traffic engineering (TE) to attain the optimal network
throughput, denoted by ` (G, π ). Admittedly, this oracle is hard to realize in practice. But on the
other hand, this oracle allows us to eliminate the impact of different control policies on ` (G, π ),
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers 42:5
β¦
β¦
β¦ β¦ β¦
(a) Physical topology.
β¦
β¦
β¦ β¦ β¦
(b) OCS configuration.
β¦
β¦
β¦ β¦ β¦
(c) Logical topology.
Fig. 1. Physical topology and logical topology. When an OCS configuration acts on a physical topology, thelogical topology will be changed. In (a) physical topology, red arrowlines represent the set of bidirectionallinks that interconnect two EPS nodes and green arrowlines represent the set of unidirectional links thatinterconnect one EPS node and one OCS node. In (c) logical topology, blue arrowlines represent the setof logical links formed by the OCS configuration. Note that each EPS port can either connect to one redbidirectional arrowline, or connect to two green unidirectional arrowlines with opposite directions.
and thus ` (G, π ) can be used to measure the goodness of the physical topology G under the traffic
pattern π . Next, we will describe how this oracle calculates the optimal throughput ` (G, π ).Topology Engineering (ToE): ToE controls the logical topologies used to serve different traffic
patterns. For each traffic pattern π , ToE could either generate one logical topology, or generate
multiple logical topologies and perform time sharing among these logical topologies. Letπ β₯ 1 be
the number of logical topologies generated by ToE. Each logical topology, denoted by G (π)π
,π =
1, 2, ..., π, corresponds to a set of πΎ OCS configurations Ξ (π) = {Ξ (π)1,Ξ (π)
2, ...,Ξ (π)
πΎ}, and lasts for
Ξ(π)amount of time. Note that reconfiguring OCSs could incur a reconfiguration latency πΏ , which
ranges from hundreds of nanoseconds to tens of milliseconds, depending on the optical switching
technology being used. Hence, the average bandwidth allocated to (ππ’, ππ£) can be computed as
π΅π’π£ =
(πβπ=1
π₯π’π£ (G (π)π
)Ξ(π)π΅
) / (ππΏ +
πβπ=1
Ξ(π)
). (1)
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:6 Shizhen Zhao, Peirui Cao, and Xinbing Wang
In addition, traffic patterns may change in an ODC. To model this fact, we impose the following
constraint on Ξ(π):
ππΏ +πβπ=1
Ξ(π) β€ Ξ, (2)
where Ξ is the duration of the traffic pattern π .
Traffic Engineering (TE): Given a ToE solution, network operators can then perform traffic
engineering to maximize network throughput. Here we adopt the max-flow formulation. For every
flow from ππ to π π , let ππ π (π’, π£) be the amount of traffic allocated to (ππ’, ππ£). The traffic allocation
ππ π (π’, π£) must satisfy the following constraints:
(1) Flow conservation constraints: The total ingress traffic equals to the total egress traffic at
every EPS node, i.e.,βπ€1β π£
ππ π (π€1, π£) =βπ€2β π£
ππ π (π£,π€2), β π£ β π, π,` ππ π +
βπ€1β π
ππ π (π€1, π) =βπ€2β π
ππ π (π,π€2),βπ€1β π
ππ€1 π (π’, π) = `ππ π +βπ€2β π
ππ π ( π,π€2),(3)
where ` is a scaling factor of the network traffic pattern π .
(2) Network capacity constraints: For any EPS node pair (ππ’, ππ£), the total amount of traffic
allocated to (ππ’, ππ£) must be no larger than π΅π’π£ , i.e.,
πβπ=1
πβπ=1
ππ π (π’, π£) β€ π΅π’π£ . (4)
Defining ` (G, π ): Note that we introduced a scaling factor ` in (3). We could optimize ` by solving
the following optimization problem:
max `
s.t. G (π)π
= Gπ (G,Ξ (π) ),βπ = 1, 2, ..., π,
`, π΅π’π£, ππ π (π’, π£),G (π)π
,Ξ(π)satisfy (1)(2)(3)(4).
(5)
Then, ` (G, π ) is defined as the optimal objective value of the above optimization problem.
Remark on the complexity of calculating ` (G, π ): Even though we have defined ` (G, π ) rig-orously, calculating ` (G, π ) turns out to be an NP-Complete problem (see Appendix A for the
detailed proof). We could use Gurobi [23], the worldβs fastest linear programming solver, to solve (5).
Unfortunately, (5) contains Ξ(π 2πΎπ) number of integer variables and Ξ(π 4) fractional variables.As we will show in Section 7.2, Gurobi may fail to solve (5) even when there are only 9 EPS nodes.
Remark on TE:We have adopted an βedge formulationβ for TE in (3) and (4). This edge formulation
requires Ξ(π 4) routing variables, one for every link & flow pair. Another widely used formulation
for TE is the βpath formulationβ (see SWAN [14]). The path formulation pre-computes a set of paths
for each flow, and then defines a traffic-spitting ratio variable for every flow and every path. This
formulation generally requires fewer number of decision variables. However, its logical topology
must be determined first in order to compute paths. In contrast, calculating ` (G, π ) requires a jointoptimization over topology and routing, and thus cannot adopt the βpath formulationβ.
3.3 Objective of Physical Topology DesignDefinition 1. A physical topology G is said to be π½-optimal if for every traffic pattern π ,
` (G, π ) β₯ π½` (Gβ², π )holds for any possible physical topology Gβ². π½ is called the performance ratio of G.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers 42:7
Clearly, every physical topology G corresponds to a π½ (G) β [0, 1]. π½ (G) can be viewed as a
performance guarantee of the physical topology G, and the larger the better. A physical topology
is said to be optimal if its corresponding π½ = 1.
Unfortunately, directly calculating π½ (G) based on (5) and Definition 1 is computationally in-
tractable for large scale ODCs. First, calculating π½ (G, π ) is NP-Complete. Second, there are un-
countable number of traffic patterns π , and it is impossible to calculate ` (G, π ) one by one. Due
to the above difficulties, we have to resort to a theoretical approach. The key to our theoretical
approach is a coupling technique detailed in the following two lemmas. This coupling technique
helps us circumvent the algorithmic complexity of calculating π½ (G, π ) for all possible π βs.
Lemma 1. Given two physical topologies G and Gβ². If any logical topology Gπ formed on top of Gcan be also formed on top of Gβ², then for any traffic pattern π , ` (Gβ², π ) β₯ ` (G, π ).
Proof. Let (Gβ(π)π
, π βπ π (π’, π£)) be the optimal topology+routing solution that attains the optimal
throughput ` (G, π ). Since the logical topologies Gβ(π)π
are also realizable on Gβ², (Gβ(π)
π, π βπ π (π’, π£))
must also satisfy the constraints in (5) when calculating ` (Gβ², π ). Therefore, ` (Gβ², π ) β₯ ` (G, π ). β‘
Lemma 1 can be also generalized to the concept of overlay topology, which is defined below:
Definition 2. Given a logical topology Gπ . A topology Gππ can be formed as an overlay topologyof Gπ if we can reserve a number of paths for every source and destination pairs ππ , π π , such that thefollowing two conditions are met:
(1) From the overlay topology Gππ aspect, for every pair of EPS nodes ππ , π π , the total reserved capacityamong all paths is equal to π₯π π (Gππ )π΅.
(2) From the underlay topology Gπ aspect, for any EPS node pair (ππ , π π ), the total capacity offeredfor reservation does not exceed π₯π π (Gπ )π΅.
Lemma 2. Given two physical topologies G and Gβ². If for any logical topology Gπ formed on topof G, there is a logical topology Gβ²
πformed on top of Gβ², such that Gπ can be formed as an overlay
topology of Gβ²π, then for any traffic pattern π , ` (Gβ², π ) β₯ ` (G, π ).
Proof. See Appendix B. β‘
3.3.1 An Ideal Physical Topology. Fix the number of links πΏ and the link capacity π΅ of an EPS node.
Assume that each OCS node has the infinite number of ports. Then, we could connect all the EPS
nodes to a single OCS node and obtain an ideal physical topology Gideal
πΏ,π΅. It is easy to verify that
any logical topology formed on top of any physical topology is realizable on top of Gideal
πΏ,π΅. Then
according to Lemma 1, Gideal
πΏ,π΅is optimal, as stated below:
Theorem 3. For any fixed number of links πΏ and fixed link capacity π΅, GidealπΏ,π΅
is optimal.
Based on the ideal physical topology Gideal
πΏ,π΅, we can then define the ideal throughput `ideal
πΏ,π΅(π ) for
an arbitrary traffic pattern π as
`idealπΏ,π΅ (π ) = ` (Gideal
πΏ,π΅ , π ).
Clearly, `idealπΏ,π΅
(π ) scales linearly with respect to π΅. Further, as the number of links πΏ of an EPS node
increases, the ideal throughput `idealπΏ,π΅
(π ) would never decrease.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:8 Shizhen Zhao, Peirui Cao, and Xinbing Wang
4 OPTIMAL PHYSICAL TOPOLOGY FOR SMALL DATA CENTERS WITH π β€ π
Building an ideal physical topology requires π β₯ ππΏ. This only applies to tiny data centers with
at most tens of EPS nodes1. In this section, we study a less restrictive case where π β€ π , which
corresponds to hundreds of EPS nodes.
Before proposing our physical topology design, we first introduce the following lemma.
Lemma 4. Given an π Γ π non-negative integer matrix x = {π₯π, π }, for any integer π» β₯ 1 andmutually disjoint sets A1,A2, ...,AπΆ satisfying βͺπΆπ=1
Aπ = {1, 2, ..., π } and Aπ1β© Aπ2
= β ,βπ1 β π2,there must exist π» non-negative integer matrices x(1) , ..., x(H) satisfying(1) x = x(1) + Β· Β· Β· + x(H) ;
(2) 0 β€ βππ=1π₯(β)π, π
β€ββπ
π=1π₯π,π
π»
β, for any π = 1, ..., π and any β = 1, 2, ..., π» ;
(3) 0 β€ βπβAπ
βππ=1π₯(β)π, π
β€ββ
πβAπ
βππ=1π₯π,π
π»
β, for any π = 1, ...,πΆ and any β = 1, 2, ..., π» ;
(4) 0 β€ βππ=1π₯(β)π, π
β€ββπ
π=1π₯π,π
π»
β, for any π = 1, ..., π and any β = 1, 2, ..., π» ;
(5) 0 β€ βππ=1
βπ βAπ
π₯(β)π, π
β€ββπ
π=1
βπβAπ
π₯π,π
π»
β, for any π = 1, ...,πΆ and any β = 1, 2, ..., π» .
Lemma 4 can be proved by transforming it into a sequence of max-flow problems. It is actually
a direct consequence of Theorem 3 in [29] by setting A = B = {A1, ...,AπΆ , {1}, ..., {π }}. SeeAppendix A in [29] for the detailed proof.
4.1 Physical Topology DesignMotivated by Lemma 4, we set the number of OCS nodes as πΎ = πΏ and evenly distribute the πΏ links
of each EPS node across all the OCS nodes. In other words, the physical topology G becomes a
uniform bipartite graph, with exactly one bidirectional physical link between every pair of EPS
node and OCS node. (More specifically, the transmitter of the EPS port is connected to an OCS
ingress port, and the receiver of the EPS port is connected to an OCS egress port.) Note that π β€ π
ensures that the number of ports of each OCS node is not exhausted. We denote such a physical
topology by Guniform, as shown in Fig. 2. We can prove that Guniform is optimal.
β¦
β¦
β¦
β¦ β¦ β¦
β¦ β¦ β¦ β¦
β¦
Fig. 2. Guniform topology.
Theorem 5. If π β€ π , then Guniform is optimal.
Proof. We only need to prove that any logical topology Gπ is realizable on top of the physical
topology Guniform.
1A practical OCS node has hundreds of ports and a practical ToR switches has tens of ports.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers 42:9
Since each EPS node has πΏ ports, we must have{ βππ=1π₯π π (Gπ ) β€ πΏ, β π = 1, ..., π ,βπ
π=1π₯π π (Gπ ) β€ πΏ, β π = 1, ..., π .
(6)
Then, according to Lemma 4, we can decompose {π₯π π (Gπ )} into π» = πΎ = πΏ integer matrices
{π₯ (β)π π
(Gπ )}, β = 1, 2, ..., π» satisfying the conditions in Lemma 4. Based on (6) and the conditions (2)
and (4) of Lemma 4, it is easy to check that any {π₯ (β)π π
(Gπ )}, β = 1, 2, ..., π» , must satisfy{ βππ=1π₯(β)π π
(Gπ ) β€ 1, β π = 1, ..., π ,βππ=1π₯(β)π π
(Gπ ) β€ 1, β π = 1, ..., π .
Since π₯(β)π π
(Gπ )βs are non-negative integers, π₯ (β)π π
(Gπ ) is actually a permutation matrix. Note that
each EPS node has one bidirectional physical link connected to the β-th OCS node, it is easy to
verify that the permutation matrix π₯(β)π π
(Gπ ) is realizable on the β-th OCS node.
β‘
5 NEGATIVE RESULT ON PHYSICAL TOPOLOGY DESIGNHaving designed an optimal physical topology for the case π β€ π , we thus wonder if it is possible
to design an optimal physical topology for large-scale data centers with π > π . Unfortunately, the
following theorem demonstrates the impossibility of such a design.
Theorem 6. If π > π , then no physical topology is π½-optimal for π½ > π+π β1
2πβ2. In other words, for
any physical topology G, there exists a traffic pattern π and a physical topology Gβ², such that
` (G, π ) β€ π + π β 1
2π β 2
` (Gβ², π ).
Proof. We pick π EPS nodes from the π EPS nodes, and form a permutationZ = (π1, π2, ..., ππ )among the π EPS nodes. In total, we obtain π (π β 1) Β· Β· Β· (π β π + 1) different permutations.
For each permutation Z = (π1, π2, ..., ππ ), we count the maximum number of unidirectional
logical links that can be formed for ππ1β ππ2
, ππ2β ππ3
, ..., πππ β ππ1under a physical topology
G, and denote this number by π (G,Z).Step 1:We would like to prove that for any physical topology G, there must exist a permutation
Zβ, such that
π (G,Zβ) β€ πΏπ π
π β 1
.
Let ππβπ be the maximum number of links that can be created from the EPS node ππ to the EPS
node π π . We computeβZπ (G,Z) =
βZ
βπ, π :πβ π
1{ π is after π in Z}ππβπ =βπ, π :πβ π
ππβπ
βZ
1{ π is after π in Z} .
Here, β π is after π inZ = (π1, π2, ..., ππ )β means that either there exists π = 1, 2, ..., π β 1, such that
ππ = π, ππ+1 = π , or ππ = π, π1 = π . In order to compute the number of permutations in which π is
after π , i.e.,β
Z 1{ π is after π in Z} ,
(1) we first find a permutation with π β 2 elements excluding π and π , and the total number of
such permutations is (π β 2) (π β 3) Β· Β· Β· (π β π + 1);(2) we then insert π and π to the right places of the above permutation, and the number of
different inserting positions is π . (For example, when π = 4, π π β β, βπ πβ, β β π π and π β βπ arethe 4 generated permutations.)
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:10 Shizhen Zhao, Peirui Cao, and Xinbing Wang
Hence, βZ
1{ π is after π in Z} = π (π β 2) (π β 3) Β· Β· Β· (π β π + 1).
We then focus on
βπ, π :πβ π ππβπ . We first fix π and compute
βπ :πβ π ππβπ . Assume that the EPS node ππ
has at least one transmitter connected to the OCS node ππ . Let π π be the number of links between
the egress ports of the OCS node ππ and the receivers of all the EPS nodes excluding ππ . Then it is
easy to check that the contribution of the OCS node ππ toβπ :πβ π ππβπ is at most π π β€ π . Since the
EPS node ππ can connect to at most πΏ different OCS nodes, we must haveβπ :πβ π
ππβπ β€ πΏπ , for any fixed π .
Therefore, βπ, π :πβ π
ππβπ =
πβπ=1
βπ :πβ π
ππβπ β€ ππΏπ . (7)
Note that there are π (π β 1) Β· Β· Β· (π β π + 1) different permutations in total. According to the
Pigeonhole Principle, there must exist a permutation Zβsuch that
π (G,Zβ) β€β
Z π (G,Z)π (π β 1) Β· Β· Β· (π β π + 1) β€ πΏπ
π
π β 1
.
Step 2:We construct a permutation traffic pattern π β based on π β = (πβ1, πβ
2, ..., πβ
π ), i.e.,
π βπ π =
1, if π = πβπ , π = π
βπ+1, where π = 1, 2, ..., π β 1;
1, if π = πβπ , π = πβ
1;
0, otherwise.
Clearly, we can create a physical topology G(π β) by picking πΏ OCS nodes and connecting these OCSnodes to the π EPS nodes ππβ
1
, ππβ2
, ..., ππβπ using a uniform bipartite graph. On top of this physical
topology, we can then create πΏ identical cycles πβ1β πβ
2β ... β πβ
π β πβ
1as the logical topology.
It is easy to check that this logical topology offers the highest throughput for the traffic matrix π β
over the physical topology G(π β), and ` (G(π β), π β) = π΅πΏ.We then compute the throughput ` (G, π β) for π β over G. Consider the optimal ToE+TE strategy
that attains the throughput value ` (G, π β). Then, within a time period of Ξ,
` (G, π β)Ξπβπ=1
πβπ=1
π βπ π = ` (G, π β)Ξπ (8)
amount of traffic is delivered from sources to destinations. The delivered traffic can be grouped
into two classes:
(1) Class-1: traffic that is delivered via only one hop;
(2) Class-2: traffic that is delivered via at least two hops.
Class-1 traffic must occupy one of following types of logical links ππβ1
β ππβ2
, ..., ππβπ β1
βππβ
π , ππβ
π β ππβ
1
. Since the total number of such links is at most π (G,Zβ), we must have
Class-1 traffic β€ π (G,Zβ)Ξπ΅. (9)
Every byte of the Class-2 traffic occupies at least two bytes of the network capacity. Hence,
2 Γ Class-2 traffic β€ (πΏπ β π (G,Zβ))Ξπ΅. (10)
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers42:11
Based on (8)(9)(10), we then obtain
` (G, π β) β€ 1
π
(π΅π (G,Zβ) + π΅
2
(πΏπ β π (G,Zβ)))=π΅πΏ
2
+ π΅
2π π (G,Zβ)
β€ π΅πΏ
2
+ π΅πΏ2
π
π β 1
=π + π β 1
2π β 2
` (G(π β), π β).
This completes the proof. β‘
Discussion: Note that π+π β1
2πβ2< 1 when π > π + 1. Then according to Theorem 6, no physical
topology is optimal. Readers may wonder if it is possible to construct an optimal physical topology
for the case where π = π + 1. The answer is no. When π = π + 1,π+π β1
2πβ2= 1. This indicates that, if
there exists an optimal physical topology, the β=β must hold for the inequality (7). This β=β requires
that the 2π OCS ports (π ingress ports and π egress ports) must connect to 2π different EPS nodes.
This is impossible because there are only π = π + 1 EPS nodes.
6 SCALABLE PHYSICAL TOPOLOGY DESIGNS WITH PERFORMANCE GUARANTEEIn this section, we propose three scalable physical topology designs and analyze their performance
guarantees. The main results in this section are summarized in Table 1. Note that the maximum
scale of different physical topologies depends on the OCS node port count π and the EPS node port
count πΏ. A commercially-available OCS node, e.g., the 3D-MEMS based OCS [3] and the Arrayed
Wavelength Grating Routers [2], could have hundreds of ports. An off-the-shelf ToR EPS node, e.g.,
the CloudEngine 8800 [15], could have 32 βΌ 64 ports. Hence, the physical topologes studied in this
section could support large-scale ODCs with thousands or even tens of thousands of EPS nodes.
Table 1. The scalability and the performance ratios of different physical topologies.
Physical Topology Maximum Number of EPS Nodes Performance Ratio π½
Guniform π 1
Geps-group π βπΏ/3β minπ
{`idealβπΏ/3β,π΅ (π )/`
ideal
πΏ,π΅(π )
}Geps-mesh π 2
minπ
{`idealβπΏ/3β,π΅ (π )/`
ideal
πΏ,π΅(π )
}Gocs-mesh βπ /2β βπΏ/2β minπ
{`idealβπΏ/2β,π΅ (π )/`
ideal
πΏ,π΅(π )
}6.1 EPS-Group based Physical TopologyPhysical topology of Geps-group: Let π = βπΏ/3β. We create πΆ EPS node groups, each of which
contains π EPS nodes and is denoted by Aπ . Within each EPS node group, there are two links
between every pair of EPS nodes. Each EPS node has 2(π β 1) intra-group links, and the total
number of links in each EPS node group is π (π β 1). Each EPS node also has π links connected
to the OCS nodes to establish inter-group connections. It is easy to verify that the degree of each
EPS node is 3π β 2 β€ πΏ. We use πΎ = π2OCS nodes and divide these OCS nodes into π equal-sized
groups. We number the EPS nodes in a group from 1 to π . The π-th EPS node in a group has one
link connected to every OCS node in the π-th OCS node group, where π = 1, 2, ..., π . Since the
number of ports of an OCS node is π , the number of EPS node groups of Geps-group must satisfy
πΆ β€ π . Thus, the total number of EPS nodes in Geps-group satisfies π = πΆπ β€ π βπΏ/3β .
Theorem 7. For any traffic pattern π , the optimal throughput under Geps-group satisfies
` (Geps-group, π ) β₯ `idealβπΏ/3β,π΅ (π ).
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:12 Shizhen Zhao, Peirui Cao, and Xinbing Wang
Fig. 3. Toy example of EPS-Group topology (π = 2, π = 6).
Proof. Let Gideal
βπΏ/3β,π΅ be the ideal physical topology. According to Lemma 2, we need to show that
any logical topology Gπ formed over Gideal
βπΏ/3β,π΅ can be realized as a overlay topology over Geps-group.
Since each EPS node in Gideal
βπΏ/3β,π΅ has π = βπΏ/3β number of uplinks, the following constraints must
be satisfied { βππ=1π₯π π (Gπ ) β€ βπΏ/3β = π, β π = 1, ..., π ,βπ
π=1π₯π π (Gπ ) β€ βπΏ/3β = π, β π = 1, ..., π .
(11)
Note that it may not always be possible to establish a logical link between two EPS nodes in
Geps-group, especially when the two EPS nodes are in different EPS node groups and these two
nodes have different intra-group indices. Hence, we are interested in constructing Gπ as an overlay
topology on top of Geps-group. For every pair of EPS nodes ππ and π π in Geps-group, we would like to
create π₯π π (Gπ ) number of virtual logical links in between. Here, a virtual logical links is essentiallya path between two EPS nodes, with each path segment occupying exactly one logical link.
For every EPS node pair (ππ , π π ), we classify this pairβs virtual logical links into π = βπΏ/3β types:β’ If ππ and π π are in the same EPS node group, the type-π virtual logical links are two-hop
paths with intermediate node being the π-th EPS node in this EPS node group. It is possible
that the π-th EPS node is exactly ππ or π π . In this case, this βtwo-hop pathβ degenerates to a
single-hop path.
β’ If ππ and π π belong to different EPS node groups, the type-π virtual logical links are three-hop
paths with intermediate nodes being the π-th EPS node in the EPS node group of ππ and the
π-th EPS node in the EPS node group of π π . It is possible that the π-th EPS node in the EPS
node group of ππ is exactly ππ , or the π-th EPS node in the EPS node group of π π is exactly π π .
In this case, this βthree-hop pathβ degenerates to a two/one-hop path.
We use π₯π
π π, π = 1, 2, ..., π to denote the total number of type-π paths between ππ and π π . According
to Lemma 4, there exists an integer solution of [π₯ππ π] satisfying the following constraints:
(1) π₯π π =βππ=1
π₯π
π π,βπ, π = 1, ..., π ;
(2) 0 β€ βππ=1π₯π
π πβ€ 1,βπ = 1, ..., π , π = 1, ..., π ;
(3) 0 β€ βπβAπ
βππ=1π₯π
π πβ€ π,βπ = 1, ...,πΆ, π = 1, ..., π ;
(4) 0 β€ βππ=1π₯π
π πβ€ 1,βπ = 1, ..., π , π = 1, ..., π ;
(5) 0 β€ βππ=1
βπ βAπ
π₯π
π πβ€ π,βπ = 1, ...,πΆ, π = 1, ..., π .
Note that π₯π
π πdetermines the underlying logical topology of the overlay topology Gπ . We need to
show that this logical topology is compatible with the physical topology Geps-group.
Intra-group Logical Topology: If ππ and π π are in the same EPS node group, then the number of
links in between is
βππ’=1
π₯( π mod π )ππ’
+ βππ£=1
π₯(π mod π )π£ π
. Based on the constraints 2) and 4), it is easy to
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers42:13
check that the above number is no greater than 2, which is compatible with intra-group physical
topology design of Geps-group.
Inter-group Logical Topology: Fix π = 1, 2, ..., π . Let π (π1β1)π+π and π (π2β1)π+π be the π-th EPS
node in the EPS node groups Aπ1and Aπ2
, respectively. Then the total number of links from
π (π1β1)π+π to π (π2β1)π+π is π¦ππ1π2
=βπβAπ
1
βπ βAπ
2
π₯π
π π. Based on the constraints 3) and 5), it is easy
to verify that
βπΆπ1=1
π¦π€π1π2
β€ π,βπΆπ2=1
π¦π€π1π2
β€ π . Then, using the same arguments in the proof of
Theorem 5, we can prove that the logical topology π¦ππ1π2
, π1, π2 = 1, 2, ...,πΆ is realizable on top of
Geps-group. This completes the proof. β‘
Remark: Theorem 7 implies that π½ (Geps-group) = minπ {`idealβπΏ/3β,π΅ (π )/`ideal
πΏ,π΅(π )}. Geps-group is similar
to the physical topology used by Flexfly [27]. In order to prove the performance guarantee in
Theorem 7, we have restricted Geps-group to use uniform mesh for its intra-group topology, which
limits the scale of Geps-group. One could use flattened butterfly [17] for intra-group interconnect
to increase the size of each EPS group of Geps-group, as suggested by Flexfly [27]. However, the
performance guarantee would become much poorer.
6.2 EPS-Mesh based Physical Topology DesignPhysical topology of Geps-mesh: Let π = βπΏ/3β. We arrange all the ESP nodes into a πΆ Γπ mesh.
Theπ EPS nodes in the π-th row forms an EPS node group, denoted by Aπ , π = 1, 2, ...,πΆ . We index
all the EPS nodes row by row, from 1 to πΆπ . There are 2π OCS nodes dedicated for each EPS node
group Aπ . For every EPS node in Aπ , there is one bidirectional link between this EPS node and
each of the 2π OCS nodes. There are also π OCS nodes dedicated for the EPS nodes in each column.
For theπ€-th column, there is one bidirectional link between every EPS node in this column and
every OCS node dedicated for this column. Since the number of ports of an OCS node is π , the
number of EPS nodes in each row or each column must be no more than π . Thus, the total number
of EPS nodes must satisfy π = πΆπ β€ π 2.
Fig. 4. Toy example of EPS-Mesh topology (π = 3, π = 9).
Theorem 8. For any traffic pattern π , the optimal throughput under Geps-mesh satisfies
` (Geps-mesh, π ) β₯ `idealβπΏ/3β,π΅ (π ).
Proof. Let Gideal
βπΏ/3β,π΅ be the ideal physical topology. According to Lemma 2, we need to show that
any logical topology Gπ formed over Gideal
βπΏ/3β,π΅ can be realized as a overlay topology over Geps-mesh.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:14 Shizhen Zhao, Peirui Cao, and Xinbing Wang
Since each EPS node in Gideal
βπΏ/3β,π΅ has π = βπΏ/3β number of uplinks, the following constraints
must be satisfied { βππ=1π₯π π (Gπ ) β€ βπΏ/3β = π, β π = 1, ..., π ,βπ
π=1π₯π π (Gπ ) β€ βπΏ/3β = π, β π = 1, ..., π .
(12)
Similar to Geps-group, it may not always be possible to establish a logical link between two EPS
nodes in Geps-mesh as well, especially when the two EPS nodes are in different rows and different
columns. Hence, we will construct Gπ as an overlay topology on top of Glarge. For every pair of
EPS nodes ππ and π π in Geps-mesh, we would like to create π₯π π (Gπ ) number of virtual logical links in
between, and classify these virtual logical links intoπ types:
β’ If ππ and π π are in the same row, the type-π€ virtual logical links are two-hop paths with
intermediate node being theπ€-th EPS node in this row.
β’ If ππ and π π belong to different rows, the type-π€ virtual logical links are three-hop paths with
intermediate nodes being theπ€-th EPS node in the same row as ππ and theπ€-th EPS node in
the same row as π π .
We useπ₯π€π π,π€ = 1, 2, ...,π to denote the total number of type-π€ paths between ππ and π π . According to
Lemma 4, there exists an integer solution of π₯π€π π,π€ = 1, 2, ...,π satisfying the following constraints:
(1) π₯π π (Gπ ) =βππ€=1
π₯π€π π,βπ, π = 1, ..., π ;
(2) 0 β€ βπβAπ
βππ=1π₯π€π π
β€ π,βπ = 1, ...,πΆ,π€ = 1, ...,π ;
(3) 0 β€ βππ=1
βπ βAπ
π₯π€π π
β€ π,βπ = 1, ...,πΆ,π€ = 1, ...,π .
Note that π₯π€π πdetermines the underlying logical topology of the overlay topology Gπ . We need to
show that this logical topology is compatible with the physical topology Geps-group.
Inter-group Logical Topology: Fix π€ = 1, 2, ...,π . Let π (π1β1)π +π€ and π (π2β1)π +π€ be the π€-th
EPS node in the EPS node groups Aπ1and Aπ2
, respectively. Then the total number of links from
π (π1β1)π +π€ to π (π2β1)π +π€ is π¦π€π1π2
=βπβAπ
1
βπ βAπ
2
π₯π€π π. Based on the constraints 2) and 3), it is easy
to verify that
βπΆπ1=1
π¦π€π1π2
β€ π,βπΆπ2=1
π¦π€π1π2
β€ π . Then, using the same arguments in the proof of
Theorem 5, we can prove that the logical topology π¦π€π1π2
, π1, π2 = 1, 2, ...,πΆ is realizable on the π OCS
nodes dedicated for theπ€-th row.
Intra-group Logical Topology: Fix π = 1, 2, ...,πΆ . Let π (πβ1)π +π€1and π (πβ1)π +π€2
be the π€1-th
and the π€2-th EPS nodes in Aπ . Then the total number of links from π (πβ1)π +π€1to π (πβ1)π +π€2
is
π§ππ€1,π€2
=βππ’=1
π₯π€2
(πβ1)π +π€1,π’+βπ
π£=1π₯π€1
π£,(πβ1)π +π€2
. According to Eqn. (13) and the constraints 2) and 3),
we can verify that
βππ€1=1
π§ππ€1π€2
β€ 2π,βππ€2=1
π§ππ€1π€2
β€ 2π . Take the first one for example:
πβπ€1=1
π§ππ€1π€2
=
πβπ€1=1
πβπ’=1
π₯π€2
(πβ1)π +π€1,π’+
πβπ€1=1
πβπ£=1
π₯π€1
π£,(πβ1)π +π€2
=βπβAπ
πβπ’=1
π₯π€2
ππ’+
πβπ£=1
π₯π£,(πβ1)π +π€2(Gπ ) β€ 2π .
Again, using the same arguments in the proof of Theorem 5, we can prove that the logical topology
π§ππ€1π€2
,π€1,π€2 = 1, 2, ...,π is realizable on the 2π OCS nodes dedicated for the EPS node group Aπ .
This completes the proof. β‘
Remark:Theorem 8 implies that π½ (Geps-mesh) = minπ {`idealβπΏ/3β,π΅ (π )/`ideal
πΏ,π΅(π )}. The design ofGeps-mesh
shares a similar idea as the physical topology used by 3D-Hyper-FleX-LION [18]. Although 3D-
Hyper-FleX-LION uses a 3D-mesh instead, it is easy to generalize Geps-mesh from 2D-mesh to
3D-mesh to further improve its scalability. Notably, one critical design of Geps-mesh is that the
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers42:15
EPS ports used for intra-group interconnect is βtwoβ times the EPS ports used for inter-group
interconnect. This design allows us to attain the best performance guarantee.
6.3 OCS-Mesh based Physical Topology DesignPhysical Topology of Gocs-mesh: We arrangeπΆ2
OCS nodes into anπΆ ΓπΆ mesh, whereπΆ = βπΏ/2β.The EPS nodes are arranged into πΆ groups, each of which containsπ EPS nodes and is denoted by
Aπ , π = 1, 2, ...,πΆ . For the EPS node group Aπ , we connect two directed links from the transmitters
of each EPS node in Aπ to the ingress ports of each OCS node in the π-th row; we also connect
two directed links from the egress ports of each OCS node in the π-th column to the receivers
of each EPS node in Aπ . Since each OCS node has π ingress/egress ports, the number of EPS
nodes in each group must satisfyπ β€ βπ /2β. Hence, the total number of EPS nodes must satisfy
π = πΆπ β€ βπΏ/2β βπ /2β .
β¦ β¦β¦ β¦
Fig. 5. Toy example of OCS-Mesh topology (π = 6, π = 9). Some RX side links are omitted for clear visualization.Each EPS node in A2 is connected to two egress ports of OCS(1,2), OCS(2,2) and OCS(3,2). Each EPS node inA3 is connected to two egress ports of OCS(1,3), OCS(2,3) and OCS(3,3).
Theorem 9. For any traffic pattern π , the optimal throughput under Gocs-mesh satisfies
` (Gocs-mesh, π ) β₯ `idealβπΏ/2β,π΅ (π ).
Proof. Let Gideal
βπΏ/2β,π΅ be the ideal physical topology. According to Lemma 2, we need to show that
any logical topology Gπ formed over Gideal
βπΏ/2β,π΅ can be realized as a overlay topology over Gocs-mesh.
Since each EPS node in Gideal
βπΏ/2β,π΅ has πΆ = βπΏ/2β number of uplinks, the following constraints
must be satisfied { βππ=1π₯π π (Gπ ) β€ βπΏ/2β = πΆ, β π = 1, ..., π ,βπ
π=1π₯π π (Gπ ) β€ βπΏ/2β = πΆ, β π = 1, ..., π .
(13)
Again, we construct Gπ as an overlay topology on top of Gocs-mesh. For every pair of EPS nodes
ππ and π π in Gocs-mesh, we would like to create π₯π π (Gπ ) number of virtual logical links. Each virtual
logical link is essentially a two-hop path ππ β ππ β π π , π = 1, 2, ..., π , and we use π₯ππ π to denote the
number of such virtual logical links.
Next, we would like to divide π₯π π (Gπ ) =βππ=1
π₯ππ π , such that the π₯ππ π βs are compatible with the
physical topology Gocs-mesh. Unlike Geps-group and Geps-mesh, we need to apply Lemma 4 multiple
times to obtain a valid decomposition of π₯π π (Gπ ).Step 1: Decompose π₯π π (Gπ ) =
βπΆπ=1
π₯(π)π π
such that the following constraints are met:
β’ 0 β€ βππ=1π₯(π)π π
β€ 1,βπ = 1, ..., π , π = 1, ...,πΆ ;
β’ 0 β€ βππ=1π₯(π)π π
β€ 1,βπ = 1, ..., π , π = 1, ...,πΆ .
Step 2: For every π0 = 1, 2, ...,πΆ , decompose π₯(π0)π π
=βππ€=1
π₯(π0β1)π +π€π π
such that for anyπ€ = 1, ...,π ,
the following constraints are met:
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:16 Shizhen Zhao, Peirui Cao, and Xinbing Wang
β’ 0 β€ βπβAπ
βππ=1π₯(π0β1)π +π€π π
β€ 1,βπ = 1, ...,πΆ ;
β’ 0 β€ βππ=1
βπ βAπ
π₯(π0β1)π +π€π π
β€ 1,βπ = 1, ...,πΆ .
Here Aπ = {(π β 1)π + 1, (π β 1)π + 2, ..., ππ }.Based on Step 1&2, we obtain an integer decomposition π₯π π (Gπ ) =
βπΆπ=1
π₯(π)π π
=βππ=1
π₯ππ π satisfying
(1) 0 β€ βππ=1
βπβAπ
π₯ππ π β€ 1,βπ = 1, ..., π , π = 1, ...,πΆ;
(2) 0 β€ βππ=1
βπβAπ
π₯ππ π β€ 1,βπ = 1, ..., π , π = 1, ...,πΆ ;
(3) 0 β€ βπβAπ
βππ=1π₯ππ π β€ 1,βπ = 1, ...,πΆ, π = 1, ..., π ;
(4) 0 β€ βππ=1
βπ βAπ
π₯ππ π β€ 1,βπ = 1, ...,πΆ, π = 1, ..., π .
With π₯ππ π , the total number of logical links needed from ππ to π π can be computed as π¦π π =βππ=1
π₯ππ π
+ βππ=1
π₯π
ππ. Next, we will show that π¦π π is compatible with Gocs-mesh.
Recall that the OCS nodes in Gocs-mesh are arranged as a 2D mesh. Consider the OCS node in the
π1-th row and π2-th column. This OCS node connects the transmitters of the EPS node in group
Aπ1to the receivers of the EPS node in group Aπ2
. Then, if the following constraints are satisfiedβπ βAπ
2
π¦π π β€ 2,βπβAπ
1
π¦π π β€ 2, (14)
the logical topology π¦π π , π β Aπ1, π β Aπ2
will be realizable on this OCS. (14) can be verified below:βπ βAπ
2
π¦π π =βπ βAπ
2
πβπ=1
π₯ππ π
+βπ βAπ
2
πβπ=1
π₯π
ππβ€ 2;
βπβAπ
1
π¦π π =βπβAπ
1
πβπ=1
π₯ππ π
+βπβAπ
1
πβπ=1
π₯π
ππβ€ 2.
This completes the proof. β‘
Remark:Theorem 9 implies that π½ (Gocs-mesh) = minπ {`idealβπΏ/2β,π΅ (π )/`ideal
πΏ,π΅(π )}. The design ofGocs-mesh
shares a similar idea as the physical topology used by Sirius [2]. One critical difference between
Gocs-mesh and Siriusβ physical topology is that we connect βtwoβ links instead of one between each
connected pair of EPS node and OCS node. This subtle design allows us to derive the performance
guarantee in Theorem 9.
7 PERFORMANCE RATIO ANALYSISWe have performed theoretical analysis for four physical topologies Guniform, Geps-group, Geps-mesh
and Gocs-mesh, and related their throughput metrics with that of the ideal physical topology. Guniform
is optimal but scales poorly. Geps-group, Geps-mesh and Gocs-mesh have better scalability but are sub-
optimal. Accoring to Theorem 7, Theorem 8 and Theorem 9, It is easy to obtainπ½ (Geps-group) = minπ {π½ (Geps-group, π )}, where π½ (Geps-group, π ) = `idealβπΏ/3β,π΅ (π )/`
ideal
πΏ,π΅(π ),
π½ (Geps-mesh) = minπ {π½ (Geps-mesh, π )}, where π½ (Geps-mesh, π ) = `idealβπΏ/3β,π΅ (π )/`ideal
πΏ,π΅(π ),
π½ (Gocs-mesh) = minπ {π½ (Gocs-mesh, π )}, where π½ (Gocs-mesh, π ) = `idealβπΏ/2β,π΅ (π )/`ideal
πΏ,π΅(π ).
(15)
In this section, wewill calculate the approximated values of π½ (Geps-group), π½ (Geps-mesh) and π½ (Gocs-mesh).We focus on two cases below:
(1) ToE can generate multiple logical topologies to serve each traffic matrix. This setting was
adopted by [2, 20β22].
(2) ToE generates only one logical topology (i.e.,π = 1) for each traffic matrix. This setting was
adopted by [5, 10, 13].
For the first case, we obtain some theoretical results in Section 7.1. The second case is hard to
analyze, and we could only obtain some approximated numerical results in Section 7.2.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers42:17
7.1 Calculating π½ (Geps-group), π½ (Geps-mesh) and π½ (Gocs-mesh) in Case 1Definition 3. A traffic pattern π is π-decomposable if there exists π permutation traffic matrices
π1, π2, ..., ππ and π1 + π2 + Β· Β· Β· + ππ = 1, 0 β€ ππ β€ 1, such that π β€ βπ
π=1ππππ .
Definition 4. A traffic pattern π is a normalized traffic pattern if the following condition is met
max
{π
max
π=1
{πβπ=1
ππ π
},π
max
π=1
{πβπ=1
ππ π
}}= 1.
Lemma 10. For any normalized traffic pattern π , we must have `idealπΏ,π΅
(π ) β€ π΅πΏ.
Proof. Without loss of generality, we assume that the egress traffic from π1 sums to 1, e.g.,βππ=1
π1π = 1. Consider the network control policy that achieves the throughput value `idealπΏ,π΅
(π ).Then, `ideal
πΏ,π΅(π )βπ
π=1π1π = `
ideal
πΏ,π΅(π ) amount of traffic can be delivered from the EPS node π1 to other
EPS nodes in a unit time period. On the other hand, the total egress bandwidth of π1 is upper
bounded by π΅πΏ, because π1 has πΏ ports. Thus, `idealπΏ,π΅
(π ) β€ π΅πΏ. β‘
Lemma 11. Given a π-decomposable traffic pattern π that lasts Ξ amount of time, if the ToE policyallows generatingπ OCS configurations, then `ideal
πΏ,π΅(π ) β₯ π΅πΏ(1 β π
ππΏ) (1 β πΏπ
Ξ ).
Proof. We split Ξ into π time slots, such that Ξ(1) = Ξ(2) = Β· Β· Β· = Ξ(π) = Ξ/π β πΏ. In each
time slot, πΏ permutations can be formed because each EPS node has πΏ bidirectional links connected
the OCS. In total, we can formππΏ permutations.
Let π β€ βπ
π=1ππππ be the π-decomposition of π . For every permutation ππ , π = 1, 2, ..., π, we
configure OCSs such that ππ is formed β(ππΏ β π)ππβ times. It is easy to verify that
πβπ=1
β(ππΏ β π)ππβ β€πβπ=1
((ππΏ β π)ππ + 1) = ππΏ.
Hence, the above ToE strategy is feasible. Under this strategy, the bandwidth allocated to the ODC
satisfies
1
Ξ
πβπ=1
β(ππΏ β π)ππβπ΅(Ξ/π β πΏ)ππ β₯(ππΏ β π)π΅(Ξ/π β πΏ)
Ξ
πβπ=1
ππππ β₯(ππΏ β π)π΅(Ξ/π β πΏ)
Ξπ .
Therefore, under the above ToE strategy and the one-hop routing, we can achieve a throughput
value(ππΏβπ)π΅ (Ξ/πβπΏ)
Ξ . Based on the definition of ` (G, π ), we must have
`idealπΏ,π΅ (π ) β₯ (ππΏ β π)π΅(Ξ/π β πΏ)Ξ
= π΅πΏ(1 β π
ππΏ) (1 β πΏπ
Ξ).
β‘
Note that Lemma 11 holds for any π . According to the definitions of π½ (Geps-group), π½ (Geps-mesh)and π½ (Gocs-mesh) in (15), it is sufficient to consider normalized traffic patterns, because `ideal
πΏ,π΅(π )
increases linearly if we scales π linearly. Further, based on the Birkhoff and von Neumann theo-
rem [6], any normalized traffic pattern π is π-decomposable with π β€ π 2 β 2π + 2. Combining all
the above analysis with Lemma 10, we then obtain
Corollary 11.1. Let πΎ (π, π) =(1 β π 2β2π+2
ππ
) (1 β πΏπ
Ξ
), we have
βπΏ/3βπΏ
max
π{πΎ (π, βπΏ/3β)} β€ π½ (Geps-group) β€
βπΏ/3βπΏ
(max
π{πΎ (π, πΏ)}
)β1
,
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:18 Shizhen Zhao, Peirui Cao, and Xinbing Wang
βπΏ/3βπΏ
max
π{πΎ (π, βπΏ/3β)} β€ π½ (Geps-mesh) β€
βπΏ/3βπΏ
(max
π{πΎ (π, πΏ)}
)β1
,
βπΏ/2βπΏ
max
π{πΎ (π, βπΏ/2β)} β€ π½ (Gocs-mesh) β€
βπΏ/2βπΏ
(max
π{πΎ (π, πΏ)}
)β1
.
When the OCS reconfiguration delay πΏ = 0, πΎ (π, π) can be arbitrarily close to 1 as we increase
the number of OCS configurationsπ . In this case, we have
Corollary 11.2. If πΏ = 0, then π½ (Geps-group) = βπΏ/3βπΏ
, π½ (Geps-mesh) = βπΏ/3βπΏ
, π½ (Gocs-mesh) = βπΏ/2βπΏ
.
7.2 Calculating π½ (Geps-group), π½ (Geps-mesh) and π½ (Gocs-mesh) in Case 2 Whereπ = 1
π = 1 means that only one logical topology is generated for each traffic pattern π (see the definition
of ToE in Section 3.1).Whenπ = 1, the lower bound of `idealπΏ,π΅
(π ) given by Lemma 11 can be extremely
loose, and thus may not be useful for calculating π½ (Geps-group), π½ (Geps-mesh) and π½ (Gocs-mesh). Inthis case, we perform numerical analysis instead. Note that exhaustively enumerating all the
different traffic matrices is not feasible. We simply generate random traffic matrices to perform the
calculation. We use two approaches to generate random traffic matrices:
Approach 1: Randomly generate π permutation traffic matrices π (1) , π (2) , ..., π (π)and π pa-
rameters 0 β€ πΎ1, πΎ2, ..., πΎπ β€ 1 satisfying
βπ
π=1πΎπ = 1. Let π =
βπ
π=1πΎπ π
(π). It is easy to verify
that the traffic matrix π generated with this approach is a normalized traffic pattern.
Approach 2: Generate a random non-negative number for each entry ππ π , π β π , and then
normalize π such that max{maxππ=1
{βππ=1
ππ π },maxππ=1
{βππ=1
ππ π }} = 1.
For every π and every πΏ, we generate a number of normalized random traffic patterns using
the above two approaches, and calculate throughput values `idealβπΏ/3β,π΅ (π ), `ideal
βπΏ/3β,π΅ (π ), `ideal
βπΏ/2β,π΅ (π ),`idealπΏ,π΅
(π ) for every traffic pattern π . We use three approaches to calculate the throughput values:
Joint Optimization: Directly solve the optimization problem (5) using Gurobi [23].
Decoupled Optimization: First, compute a logical topologyΒ―G (1)π
that best approximates the
traffic pattern π by solving the following optimization problem:
max
πβπ=1
πβπ=1
(πΏππ π β βπΏππ π β
)π₯π π ( Β―G (1)
π)
s.t.βπΏππ π β β€ π₯π π ( Β―G (1)π
) β€ βπΏππ π β, for all π, π = 1, 2, ..., π ,
πβπ=1
π₯ππ ( Β―G (1)π
) β€ πΏ,
πβπ=1
π₯π π ( Β―G (1)π
) β€ πΏ, for all π, π = 1, 2, ..., π .
(16)
Then, solve the optimization problem (5) with the logical topology G (1)π
being fixed asΒ―G (1)π
.
Fast Approximation: First, compute a logical topologyΒ―G (1)π
by solving (16). Second, find all
the shortest paths for every EPS node pair inΒ―G (1)π
. Third, calculate the throughput value
based on Equal-Cost Multi-Path (ECMP) routing.
The Joint Optimization approach solves topology and routing jointly, and gives the exact through-
put values. However, this approach has the highest computational complexity. The Decoupled
Optimization approach reduces algorithmic complexity by solving topology and routing separately.
The Fast Approximation approach further reduces computational complexity by using a fixed
ECMP routing. Notably, the second and the third approaches can only compute approximated
throughput and performance ratio values.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers42:19
7.2.1 Approximating π½ (Geps-group), π½ (Geps-mesh) and π½ (Gocs-mesh). We generate a list of (π, πΏ) pairs:(7, 10), (8, 10), (9, 10), (15, 10), (20, 10), (30, 16), (40, 16), (128, 32), (256, 32), (512, 64) and (1024, 64).For each (π, πΏ) pair, we generate a set of 60 normalized traffic patterns, and denote this set by Fπ,πΏ .For every traffic pattern π β Fπ,πΏ , we use the above three approaches to compute the per-traffic-
pattern performance ratio π½ (G, π ), G β {Geps-group,Geps-mesh,Gocs-mesh} as defined in (15).
(7,1
0)(8
,10)
(9,1
0)(1
5,10
)(2
0,10
)(3
0,16
)(4
0,16
)(1
28,3
2)(2
56,3
2)(5
12,6
4)(1
024,
64)
0.2
0.3
0.4
0.5
Per
-Tra
ffic-
Pat
ternΞ²
Joint Decoupled Fast
(a) EPS-Group
(7,1
0)(8
,10)
(9,1
0)(1
5,10
)(2
0,10
)(3
0,16
)(4
0,16
)(1
28,3
2)(2
56,3
2)(5
12,6
4)(1
024,
64)
0.150.200.250.300.350.40
Per
-Tra
ffic-
Pat
ternΞ²
Joint Decoupled Fast
(b) EPS-Mesh
(7,1
0)(8
,10)
(9,1
0)(1
5,10
)(2
0,10
)(3
0,16
)(4
0,16
)(1
28,3
2)(2
56,3
2)(5
12,6
4)(1
024,
64)
0.20.30.40.50.60.7
Per
-Tra
ffic-
Pat
ternΞ²
Joint Decoupled Fast
(c) OCS-Mesh
(7,1
0)(8
,10)
(9,1
0)(1
5,10
)(2
0,10
)(3
0,16
)(4
0,16
)(1
30,3
2)(2
56,3
2)(5
12,6
4)(1
024,
64)
10β310β210β1
100101102103104
Ru
nn
ing
Tim
e(s
) X Unfinished X Unfinished
Joint Decoupled Fast
(d) Running Time Comparison
Fig. 6. Calculating π½ (G, π ) using Joint Optimization, Decoupled Optimization and Fast Approximation.
Since the Joint Optimization approach can compute the exact per-traffic-pattern performance
ratio values, we consider π½joint (G) = minπ βFπ,πΏπ½joint (G, π ) as the best approximation to π½ (G) for
every G β {Geps-group,Geps-mesh,Gocs-mesh}. However, calculating π½joint (G, π ) is computationally
expensive. The Joint Optimization approach can only compute π½joint (G) for the (π, πΏ) pairs (7, 10),(8, 10) and (9, 10). When π > 9, the Joint Optimization approach may not be able to compute a
solution even after running a few hours. Hence, in the following, we study how to approximate
π½joint (G) using the other two approaches.
We first use the Decoupled Optimization approach to approximate π½joint (G). For every π β Fπ,πΏ ,we compute π½decoupled (G, π ) using the Decoupled Optimization approach. As shown in Fig. 6,
the values of π½decoupled (G, π ) exhibit much higher variance than that of π½joint (G, π ). If we use
minπ βFπ,πΏπ½decoupled (G, π ) to approximate π½decoupled (G), the gap between the estimated value and
the true value would be too large. Hence, we adopt the following heuristic. Let πΌdecoupled (G, π)be the π-th percentile value among all the values of π½decoupled (G, π ). We aim to find π such that
πΌdecoupled (G, π) β π½joint (G). From Fig. 6, we can see that π (Geps-group) β 25th, π (Geps-mesh) β 35th
and π (Gocs-mesh) β 25th when (π, πΏ) = (7, 10), (8, 10) or (9, 10). Then, we use πΌdecoupled (G, π (G)) toapproximate π½joint (G) for larger data centers. However, even this Decoupled Optimization approach
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:20 Shizhen Zhao, Peirui Cao, and Xinbing Wang
Table 2. The values of π β²(G) for different (π, πΏ)βs. When π β€ 9, we pick π β²(G) such that πΌfast (G, π β²(G)) βπ½joint (G). When 9 < π β€ 40, we pick π β²(G) such that πΌfast (G, π β²(G)) β πΌdecoupled (G, π (G)).
(7,10) (8,10) (9,10) (15,10) (20,10) (30,16) (40,16)
π β²(Geps-group) 36th 26th 23th 35th 33th 58th 56th
π β²(Geps-mesh) 28th 31th 20th 36th 36th 56th 61th
π β²(Gocs-mesh) 48th 31th 26th 46th 45th 65th 63th
Table 3. Approximating π½ (Geps-group), π½ (Geps-mesh) and π½ (Gocs-mesh) using the Fast Approximation approach.
(7,10) (8,10) (9,10) (15,10) (20,10) (30,16)
π½ (Geps-group) 0.34 0.34 0.33 [0.28,0.31] [0.27,0.3] [0.25,0.28]
π½ (Geps-mesh) 0.24 0.23 0.22 [0.17,0.21] [0.16,0.19] [0.18,0.22]
π½ (Gocs-mesh) 0.45 0.45 0.44 [0.39,0.43] [0.36,0.42] [0.36,0.42]
(40,16) (128,32) (256,32) (512,64) (1024,64)
π½ (Geps-group) [0.24,0.26] [0.22,0.27] [0.23,0.27] [0.24,0.28] [0.25,0.29]
π½ (Geps-mesh) [0.18,0.2] [0.2,0.25] [0.19,0.24] [0.22,0.26] [0.21,0.28]
π½ (Gocs-mesh) [0.35,0.4] [0.37,0.45] [0.39,0.44] [0.38,0.43] [0.37,0.45]
cannot scale beyond 40 EPS nodes. When π > 40, the Decoupled Optimization approach may not
produce any solution after running a few hours.
We then use the Fast Approximation approach to approximate π½joint (G). As shown in Fig. 6,
the values of π½fast (G, π ) exhibit even higher variance, and the gap between minπ βFπ,πΏπ½fast (G, π )
and π½decoupled (G) would be even larger. To reduce the estimation gap, we let πΌfast (G, π β²) be theπ β²-th percentile value among all the values of π½fast (G, π ). We aim to find π β² such that πΌfast (G, π β²) βπ½joint (G) when π β€ 9 and πΌfast (G, π β²) β πΌdecoupled (G, π (G)) when 9 < π β€ 40. The values of
π β²(G) vary for different physical topologies and different (π, πΏ) pairs. The detailed values are
summarized in Table 2. For every G β {Geps-group,Geps-mesh,Gocs-mesh}, we pick the minimum and
the maximum values of π β²(G), and denote them by π β²min
(G) and π β²max
(G). Then, we can generate a
range [πΌfast (G, π β²min
(G)), πΌfast (G, π β²max(G))] to approximate π½joint (G) β π½ (G). We summarize the
approximation results for all π > 9 in Table 3.
Note: The idea of using percentile values to estimate π½ (G)βs is just a heuristic. There is no guar-antee that the true values of π½ (G) are indeed within the ranges provided by Table 3. Nevertheless,
we believe that our numerical estimation gives readers a sense on the magnitude of π½ (G).
8 CONCLUSIONWe study physical topology design for ODCs, and offer a novel methodology to analyze the
performance guarantee for different physical topologies in this paper. Based on our methodology,
we prove that the uniform bipartite graph based physical topology is optimal for small-scale
data centers with π β€ π , while optimal physical topologies do not exist when π > π . We also
design three physical topologies that support larger-scale ODCs, and prove their performance gaps
with respect to the ideal physical topology. The methodology in this paper provides a theoretical
foundation for ODC physical topology design, and may help network operators design physical
topologies with a better guarantee.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers42:21
There are a number of interesting directions for future work. First, we do not assume any prior
knowledge on the traffic patterns while we define the performancemetric π½ (G). In practice, differentToR switches may belong to different production groups, and the intra-group traffic can be higher
than the inter-group traffic. It would be interesting to study how such traffic knowledge affects
physical topology design. Second, in addition to throughput, deployment/management/expansion
complexities may also affect the physical topology design for ODC. It remains open to find a
physical topology with the best tradeoff among different performance metrics for ODC.
9 ACKNOWLEDGEMENTThis work was supported by the NSF China under Grant 61902246. We also thank our shepherd
Rachee Singh, and the SIGMETRICS reviewers.
REFERENCES[1] Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat. 2008. A scalable, commodity data center network
architecture. In SIGCOMM.
[2] Hitesh Ballani, Paolo Costa, Raphael Behrendt, Daniel Cletheroe, Istvan Haller, Krzysztof Jozwik, Fotini Karinou,
Sophie Lange, Kai Shi, Benn Thomsen, and Hugh Williams. 2020. Sirius: A Flat Datacenter Network with Nanosecond
Optical Switching. In SIGCOMM.
[3] Inc. CALIENT Technologies. [n.d.]. https://www.calient.net/.
[4] Inc. CALIENT Technologies. [n.d.]. S Series Optical Circuit Switch. https://www.calient.net/products/s-series-photonic-
switch/.
[5] Peirui Cao, Shizhen Zhao, Min Yee Teh, Yunzhuo Liu, and Xinbing Wang. 2021. TROD: Evolving From Electrical Data
Center to Optical Data Center. In ICNP.[6] Cheng-shang Chang, Wen-Jyh Chen, and Hsiang-Yi Huang. 2000. Birkhoff-von Neumann Input-Buffered Crossbar
Switches. In INFOCOM.
[7] Kai Chen, Ankit Singla, Atul Singh, Kishore Ramachandran, Lei Xu, Yueping Zhang, Xitao Wen, and Yan Chen. 2012.
OSA: An optical switching architecture for data center networks with unprecedented flexibility. In NSDI.[8] Li Chen, Kai Chen, Zhonghua Zhu, Minlan Yu, George Porter, Chunming Qiao, and Shan Zhong. 2017. Enabling
wide-spread communications on optical fabric with megaswitch. In NSDI.[9] Nandita Dukkipati and Nick McKeown. 2006. Why Flow-Completion Time is the Right Metric for Congestion Control.
In SIGCOMM Review.[10] Nathan Farrington, George Porter, Sivasankar Radhakrishnan, Hamid Hajabdolali Bazzaz, Vikram Subramanya,
Yeshaiahu Fainman, George Papen, and Amin Vahdat. 2010. Helios: a hybrid electrical/optical switch architecture for
modular data centers. In SIGCOMM.
[11] Monia Ghobadi, Ratul Mahajan, Amar Phanishayee, Nikhil Devanur, Janardhan Kulkarni, Gireeja Ranade, Pierre Alexan-
dre Blanche, Houman Rastegarfar, Madeleine Glick, and Daniel Kilper. 2016. ProjecToR: Agile Reconfigurable Data
Center Interconnect. In SIGCOMM.
[12] Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, and Sudipta Sengupta. 2009. VL2: A Scalable
and Flexible Data Center Network. In SIGCOMM.
[13] Navid Hamedazimi, Zafar Qazi, Himanshu Gupta, Vyas Sekar, Samir R. Das, Jon P. Longtin, Himanshu Shah, and
Ashish Tanwer. 2014. FireFly: A Reconfigurable Wireless Data Center Fabric Using Free-Space Optics. In SIGCOMM.
[14] Chi-Yao Hong, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Vijay Gill, Mohan Nanduri, and Roger Wattenhofer.
2013. Achieving High Utilization with Software-Driven WAN. In SIGCOMM.
[15] Co. Huawei Technologies. [n.d.]. https://e.huawei.com/en/products/enterprise-networking/switches/data-center-
switches/ce8800.
[16] Robert W. Irving and Mark R. Jerrum. 1992. Three-dimensional Statistical Data Security Problems. SIAM J. Comput. 23(October 1992), 170β184.
[17] John Kim, William J. Dally, and Dennis Abts. 2007. Flattened Butterfly : A Cost-Efficient Topology for High-Radix
Networks. In ISCA.[18] Gengchen Liu, Roberto Proietti, Marjan Fariborz, Pouya Fotouhi, Xian Xiao, and S.J. Ben Yoo. 2020. Architecture and
Performance Studies of 3D-Hyper-FleX-LION for Reconfigurable All-to-All HPC Networks. In SC.[19] He Liu, Feng Lu, Alex Forencich, Rishi Kapoor, Malveeka Tewari, Geoffrey M Voelker, George Papen, Alex C Snoeren,
and George Porter. 2014. Circuit Switching Under the Radar with REACToR. In NSDI.[20] He Liu, Matthew K. Mukerjee, Conglong Li, Nicolas Feltman, George Papen, Stefan Savage, Srinivasan Seshan,
Geoffrey M. Voelker, David G. Andersen, Michael Kaminsky, George Porter, and Alex C. Snoeren. 2015. Scheduling
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:22 Shizhen Zhao, Peirui Cao, and Xinbing Wang
Techniques for Hybrid Circuit/Packet Networks. In CoNEXT.[21] William M Mellette, Rajdeep Das, Yibo Guo, Rob McGuinness, Alex C Snoeren, and George Porter. 2020. Expanding
across time to deliver bandwidth efficiency and low latency. In NSDI.[22] William M. Mellette, Rob Mcguinness, Arjun Roy, Alex Forencich, and George Porter. 2017. RotorNet: A Scalable,
Low-complexity, Optical Datacenter Network. In SIGCOMM.
[23] Gurobi Optimization. [n.d.]. https://www.gurobi.com/.
[24] George Porter, Richard Strong, Nathan Farrington, Alex Forencich, Pang Chen-Sun, Tajana Rosing, Yeshaiahu Fainman,
George Papen, and Amin Vahdat. 2013. Integrating microsecond circuit switching into the data center. In SIGCOMM.
[25] Arjun Singh, Joon Ong, Amit Agarwal, Glen Anderson, Ashby Armistead, Roy Bannon, Seb Boving, Gaurav Desai,
Bob Felderman, Paulie Germano, et al. 2015. Jupiter Rising: A Decade of Clos Topologies and Centralized Control in
Googleβs Datacenter Network. In SIGCOMM.
[26] Guohui Wang, David G Andersen, Michael Kaminsky, Konstantina Papagiannaki, TS Ng, Michael Kozuch, and Michael
Ryan. 2010. c-Through: Part-time optics in data centers. In SIGCOMM.
[27] Ke Wen, Payman Samadi, Sebastien Rumley, Christine P. Chen, Yiwen Shen, Meisam Bahadori, Keren Bergman, and
Jeremiah Wilke. 2016. Flexfly: Enabling a Reconfigurable Dragonfly through Silicon Photonics. In SC.[28] Mingyang Zhang, Radhika Niranjan Mysore, Sucha Supittayapornpong, and Ramesh Govindan. 2019. Understanding
lifecycle management complexity of datacenter topologies. In NSDI.[29] Shizhen Zhao, Rui Wang, Junlan Zhou, Joon Ong, Jeffery C. Mogul, and Amin Vahdat. 2019. Minimal Rewiring: Efficient
Live Expansion for Clos Data Center Networks: Extended Version. In NSDI https://ai.google/ research/pubs/pub47492.
A NP-COMPLETENESS OF CALCULATING ` (G, π )We prove the following theorem in this section.
Theorem 12. The problem of calculating ` (G, π ) for a given physical topology G and a giventraffic pattern π is NP-Complete.
We prove Theorem 12 by reducing the following three-dimensional sum problem to the problem
of calculating ` (G, π ). The three-dimensional sum problem was proven to be NP-Complete [16].
Definition 5. (Three-dimensional Sum Problem) Given a positive integer π and three non-negativeinteger matrices π· (π, π), πΈ (π, π) and πΉ (π, π), where π, π, π = 1, 2, ..., π and
πβπ=1
πβπ=1
π· (π, π) =πβπ=1
πβπ=1
πΈ (π, π) =πβπ=1
πβπ=1
πΉ (π, π),
does there exist a non-negative integer assignment of π΄(π, π, π) such thatβππ=1π΄(π, π, π) = π· (π, π),βπ
π=1π΄(π, π, π) = πΈ (π, π),βπ
π=1π΄(π, π, π) = πΉ (π, π).
(17)
We are now ready to prove Theorem 12.
Proof. Given an instance of the three-dimensional sum problem, we construct an ODC physical
topology G and a trafficmatrix π as follows. G has 2π EPS nodes andπ OCS nodes. The 2π EPS nodes
are separated into two groups, each with π EPS nodes. There are π· (π, π) number of bidirectional
links between the π-th EPS node in Group 1 and the π-th OCS node; there are πΈ (π, π) number of
bidirectional links between the π-th OCS node and the π-th EPS node in Group 2. Each link is
assumed to have its link capacity π΅ = 1. The traffic demand from the π-th EPS node in Group 1 to the
π-th EPS node in Group 2 is πΉ (π, π). There is no traffic demand within each EPS group or from Group
2 to Group 1. In total, this network has π2flows, and the traffic matrix π is an 2π Γ 2π matrix with
at most π2non-zero entries (see Fig.7). For topology engineering, we only allow generating one
logical topology for each traffic matrix, i.e.,π = 1. Next, we will prove that the three-dimensional
sum problem is satisfiable if and only if ` (G, π ) = 1.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers42:23
β¦
β¦β¦β¦ β¦ β¦
Traffic demand
Fig. 7. Network topology and traffic model.
Sufficiency (β): Let π΄(π, π, π) be a solution of the three-dimensional sum problem. Then, we
can set up the OCS configurations as follows: for the π-th OCS, establish π΄(π, π, π) number of
bidirectional links between the π-th EPS node in Group 1 and the π-th EPS node in Group 2. Under
this set of OCS configurations, we establish
βππ=1
π΄(π, π, π) = πΉ (π, π) number of logical links between
the π-th EPS node in Group 1 and the π-th EPS node in Group 2. Then, if we route every flow
along its shortest path, it is easy to check that the achievable throughput ` (G, π ) = 1. Note that
` (G, π ) = 1 is indeed the optimal throughput, because the bisection bandwidth from Group 1 to
Group 2
βππ=1
βππ=1
π· (π, π) = βππ=1
βππ=1
βππ=1
π΄(π, π, π) divided by the total traffic demand from from
Group 1 to Group 2
βππ=1
βππ=1πΉ (π, π) = βπ
π=1
βππ=1
βππ=1
π΄(π, π, π) is exactly 1.
Necessity (β): Let Gπ be the logical topology such that the optimal throughput ` (G, π ) = 1 is
achieved, and let Ξ = {Ξ 1,Ξ 2, ...,Ξ π} be the corresponding configurations of the π OCS nodes.
Since the throughput is 1 under the logical topology Gπ , the bisection bandwidth from Group 1 to
Group 2 in Gπ must be at least
βππ=1
βππ=1πΉ (π, π). Note that the egress capacity of Group 1 and the
ingress capacity of Group 2 are both at most
πβπ=1
πβπ=1
π· (π, π) =πβπ=1
πβπ=1
πΈ (π, π) =πβπ=1
πβπ=1
πΉ (π, π).
Hence, in order to deliver
βππ=1
βππ=1πΉ (π, π) amount of traffic, the following statements must be true
for Gπ :(1) Every egress EPS port of Group 1 must be connected with an ingress EPS port of Group 2
throughput an OCS node;
(2) Every flow must be delivered directly from its source to its destination.
Now, let π΄(π, π, π) be the number of links from the π-th EPS node in Group 1 to the π-th EPS node
in Group 2 formed in the π-th OCS node. Note that there are π· (π, π) number of bidirectional
links between the π-th EPS node in Group 1 and the π-th OCS node; there are πΈ (π, π) number of
bidirectional links between the π-th OCS node and the π-th EPS node in Group 2. Combined with
the first statement, we must have
βππ=1π΄(π, π, π) = π· (π, π) and βπ
π=1π΄(π, π, π) = πΈ (π, π). According
to the second statement and the fact that the bisection bandwidth from Group 1 to Group 2 is at
most
βππ=1
βππ=1πΉ (π, π), the total number of links from the π-th EPS node in Group 1 to the π-th EPS
node in Group 2 must be exactly πΉ (π, π). Hence, βππ=1
π΄(π, π, π) = πΉ (π, π).The above analysis indicates that all the three-dimensional sum problem can be reduced to a
ODC throughput optimization problem. Since the three-dimensional sum problem is NP-Complete,
the ODC throughput optimization problem must also be NP-Complete. β‘
B PROOF OF LEMMA 2Proof. Let (Gβ(π)
π, π βπ π (π’, π£)) be the optimal topology+routing solution that attains the optimal
throughput ` (G, π ). Next we consider ` (Gβ², π ).
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:24 Shizhen Zhao, Peirui Cao, and Xinbing Wang
Note that the logical topologies Gβ(π)π
are realizable as overlay topologies of Gβ². Let G
β²β(π)π
be
the corresponding underlay logical topology of Gβ(π)π
. Consider the following overlay-topology
based routing allocation in Gβ². For every flow from ππ to π π , we allocate π
βπ π (π’, π£) amount of traffic
to the overlay paths from ππ’ to ππ£ . It is easy to verify that the resulting throughput is ` (G, π ). Notethat restricting routing allocation to overlay paths reduces routing flexibility. In contrast, our TE
formulation (3)(4) computes network throughput based on the underlay topologies Gβ²β(π)π
directly,
and thus the resulting throughput must be no smaller than ` (G, π ). Combined with the definition
of ` (Gβ², π ), we must have ` (Gβ², π ) β₯ ` (G, π ). β‘
C NOTATIONSThe notations in this paper are summarized in Table 4.
Table 4. Notations used in this paper
π , S π electrical-packet-switching (EPS) nodes S = {π1, π2, ..., ππ }.πΎ , O πΎ optical-circuit-switching (OCS) nodes O = {π1,π2, ...,ππΎ }.G Physical topology G = (V, E), where V = S βͺ O. (see Β§3.1).
Gπ Logical topology Gπ = Gπ (G,Ξ ) = (Vπ , Eπ ) (see Β§3.1).π₯π π π₯π π (Gπ ) denotes the number of links from ππ to π π in Gπ .π½ (G) Performance ratio of the physical topology G (see Β§3.3).
π Traffic matrix among the π EPS nodes (see Β§3.1).
` (G, π ) Optimal throughput for the given physical topology G and traffic matrix π (see Β§3.2).
πΏ Each EPS node has πΏ ports.
π Each OCS node has π ingress ports and π egress ports.
π΅ EPS port capacity.
Ξ OCS configuration Ξ = {Ξ 1,Ξ 2, ...,Ξ πΎ } (see Β§3.1).π The number of logical topologies generated by ToE for each traffic matrix (see Β§3.2).
ππ π (π’, π£) Amount of traffic from ππ to π π that is allocated to the link (ππ’, ππ£) (see Β§3.2).πΆ , Aπ πΆ mutually disjoint EPS node groups satisfying βͺπΆπ=1
Aπ = S.
Received August 2021; revised October 2021; accepted November 2021
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.