Packet-Mode Emulation of Output-Queued Switches
David Hay, CS, Technion
Joint work with Hagit Attiya (CS, Technion),
Isaac Keslassy (EE, Technion)
Trend towards Packet-Mode
Cell-mode scheduling is getting too hard
Fragmentation and reassembly should work very fast, at the external rate
Extra header for each cell → loss of bandwidth
For optical switches such fragmentation and reassembly are prohibitive
Cell-mode schedulers are packet-oblivious
Degradation of the overall performance
Packet-Mode Scheduling
No need for fragmentation and reassembly
Must ensure contiguous packet delivery over the fabric
While input i delivers a packet to output j, neither input i nor output j can handle other packets.
Can packet-mode schedulers provide similar
performance guarantees as cell-mode schedulers?
[Marsan et al., 2002][Ganjali et al., 2003][Turner, 2006]
Output Queuing Emulation
OQ switches are considered optimal with respect to queuing delay and throughput
But too hard to implement in practice…
Emulation: Same input traffic → same output traffic
How hard is it for cell-mode / packet-mode CIOQ switch to emulate OQ switch?
Easy with speedup S=N
N scheduling decisions every time-slot:
In the 1st decision forward the cell of input 1
In the 2nd decision forward the cell of input 2
⋮
In the Nth decision forward the cell of input N
Possible with speedup S=2: CCF algorithm
Lower bound: S≥2-1/N is required
[Chuang et al.,1999]
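The S=N case can be sketched in a few lines. This is only an illustration: the function name and the data layout (one optional arrival per input, one list per output queue) are assumptions, not part of the talk.

```python
def emulate_slot_with_speedup_n(arrivals, output_queues):
    """Run the N scheduling decisions of one time-slot.

    arrivals[i] is None or a pair (output, cell) for the cell that arrived
    at input i in this time-slot (hypothetical layout).
    """
    for i, arrival in enumerate(arrivals):  # the (i+1)-th decision serves input i
        if arrival is not None:
            out, cell = arrival
            # The cell reaches its output queue within its arrival slot,
            # as it would in an OQ switch.
            output_queues[out].append(cell)
```

With N decisions per slot no cell ever waits at an input, which is why this speedup makes emulation easy.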
Cell-Mode Emulation is Possible
What is the speedup required for
packet-mode emulation?
Emulation w/ Relative Queuing Delay
The CIOQ switch is allowed a bounded lag behind the shadow OQ switch
Exact same behavior as the optimal OQ switch, but with some extra delay Called relative queuing delay
Can we provide packet-mode OQ emulation with bounded RQD and small speedup?
Our Results: Speedup-RQD tradeoff
[Plot: Speedup vs. RQD; marked values: 2, 4, 2Lmax]
Lower bound on RQD (even with infinite speedup)
Lower bound on the speedup (from cell-mode scheduling)
Generalization of cell-mode scheduling with S=2: Taking each packet of size ≤ Lmax as one huge cell
Lmax=maximum packet size
First algorithm: S≈4 with RQD=O(NLmax)
Underlying CCF Algorithm
Observation: Packet-Mode OQ switch is a Cell-Mode OQ switch with different queuing discipline (called PIFO)
Cell-Mode CIOQ w/ CCF (and speedup S=2) emulates any PIFO cell-mode OQ switch [Chuang et al.,1999]
But, CCF does not maintain contiguous packet forwarding over the fabric!
[Diagram boxes: Packet Mode CIOQ; Packet Mode OQ = PIFO Cell-Mode OQ; Cell Mode CIOQ w/ S=2]
Intuition for Emulation Algorithms
[Diagram boxes: Packet Mode CIOQ; Packet Mode OQ; Cell Mode CIOQ w/ S=2]
Two sub-steps:
1. Framing
2. Contiguous Decomposition
Frame-Based Schedulers
Works in pipelined frame-based manner
Within each frame:
Build a demand matrix for this frame
Schedule the demand matrix of the previous frame
[Figure: pipeline of frames along a time axis]
At each frame of size T, CCF forwards at most 2T cells from each input and to each output.
Building the Demand Matrix
Example demand matrix (entry (i,j): cells from input i to output j):
3 0 1 2
1 2 2 1
2 2 2 0
0 2 1 3
Entry (1,1): number of cells CCF sent from input 1 to output 1 in the last frame
Sum of each row ≤ 2T; sum of each column ≤ 2T
Problem: A packet may span several frames.
Building the Demand Matrix
Count only packets whose last cell is forwarded by the CCF in the frame
Each row/column in the matrix is bounded by 2T+N(Lmax-1)
For each input-output pair only cells of one additional packet can be added.
Translates into RQD of 2T+Lmax-2.
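A minimal sketch of this counting rule, assuming the shadow CCF scheduler logs every forwarded cell. The `Cell` record, its fields, and `build_demand_matrix` are hypothetical names introduced for illustration.

```python
from collections import namedtuple

# Hypothetical log record for one cell forwarded by the shadow CCF scheduler.
Cell = namedtuple("Cell", "slot inp out packet_len is_last")

def build_demand_matrix(cells, n_ports, frame_start, frame_len):
    """Count whole packets whose last cell CCF forwarded inside the frame."""
    demand = [[0] * n_ports for _ in range(n_ports)]
    frame_end = frame_start + frame_len
    for c in cells:
        # A packet is charged to the frame in which its last cell is forwarded,
        # and it contributes all of its cells to that frame's matrix.
        if c.is_last and frame_start <= c.slot < frame_end:
            demand[c.inp][c.out] += c.packet_len
    return demand
```

A packet that spans two frames therefore appears only in the matrix of the later frame, with its full length.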
Decomposing the Demand Matrix
Challenge: Decompose the matrix into permutations while maintaining contiguous packet delivery.
Each permutation dictates a scheduling decision.
Speedup = Number of permutations / Frame Length
First try: optimal Birkhoff von-Neumann decomposition results in 2T+N(Lmax-1) permutations.
Example: the demand matrix
3 0 1 2
1 2 2 1
2 2 2 0
0 2 1 3
decomposes into six permutation matrices:
0010  1000  1000  0001  0001  1000
0100  0010  0100  0010  1000  0001
1000  0100  0010  1000  0100  0010
0001  0001  0001  0100  0010  0100
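A sketch of a Birkhoff von-Neumann style decomposition, assuming a non-negative integer matrix whose row and column sums are all equal (a matrix that is only bounded would first need padding, which is not shown). The perfect matching is found with a simple augmenting-path search; function names are illustrative.

```python
def perfect_matching(m, n):
    """Augmenting-path matching on the positive entries of m.

    Returns perm with perm[i] = column matched to row i, or None.
    """
    match_col = [-1] * n  # match_col[j] = row currently matched to column j

    def try_row(i, seen):
        for j in range(n):
            if m[i][j] > 0 and not seen[j]:
                seen[j] = True
                if match_col[j] == -1 or try_row(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, [False] * n):
            return None
    perm = [0] * n
    for j, i in enumerate(match_col):
        perm[i] = j
    return perm

def bvn_decompose(matrix):
    """Return a list of (weight, perm); the weighted permutations sum to matrix."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    parts = []
    while any(any(row) for row in m):
        perm = perfect_matching(m, n)
        if perm is None:
            raise ValueError("row and column sums must all be equal")
        # Use the permutation for as many time-slots as its smallest entry allows.
        weight = min(m[i][perm[i]] for i in range(n))
        for i in range(n):
            m[i][perm[i]] -= weight
        parts.append((weight, perm))
    return parts
```

The weights add up to the common row sum, so the matrix is served in exactly that many time-slots. Nothing in this procedure keeps the cells of one packet in consecutive permutations.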
Contiguous Greedy Decomposition
To maintain contiguous packet delivery:
If (i,j) was matched in iteration t-1 and there are more (i,j) cells to schedule, keep it for iteration t.
Find a greedy matching for the rest of the matrix.
[Figure: permutation matrices of iteration t-1 and iteration t; annotation: "Cells left from 1 to 1"]
Speedup: 4 + (2N(Lmax-1)-1)/T
RQD: 2T+Lmax-2
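The two rules of the contiguous greedy decomposition (keep a pair that was matched in the previous iteration while it still has cells, then match the rest greedily) can be sketched as follows. The function name and the dictionary representation of a matching are assumptions for illustration.

```python
def contiguous_greedy_decompose(demand):
    """Return a list of matchings (dict: input -> output), one per iteration."""
    n = len(demand)
    left = [row[:] for row in demand]  # cells still to schedule
    prev = {}                          # matching of the previous iteration
    schedule = []
    while any(any(row) for row in left):
        match = {}
        used_out = set()
        # Rule 1: keep every pair from iteration t-1 that still has cells.
        for i, j in prev.items():
            if left[i][j] > 0:
                match[i] = j
                used_out.add(j)
        # Rule 2: greedily match the remaining inputs to free outputs.
        for i in range(n):
            if i in match:
                continue
            for j in range(n):
                if j not in used_out and left[i][j] > 0:
                    match[i] = j
                    used_out.add(j)
                    break
        for i, j in match.items():
            left[i][j] -= 1
        schedule.append(match)
        prev = match
    return schedule
```

Once a pair (i,j) is matched it stays matched until its entry is exhausted, so all of its cells cross the fabric in consecutive iterations. The price is that the greedy matchings may need more iterations than an optimal decomposition.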
Packet-Mode Emulation w/ S≈2
Separate demand matrix for every possible packet size
Concatenate packets of the same size into mega-packets of size k=LCM(1,…,Lmax)
Leftover matrix for each size m
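A sketch of the split into mega-packets and leftovers for a single input-output pair. The function name and the `packet_counts` layout (packet size mapped to number of packets) are hypothetical.

```python
from functools import reduce
from math import gcd

def split_into_mega_packets(packet_counts, l_max):
    """Split per-size packet counts into mega-packets of k cells plus leftovers.

    packet_counts[m] = number of size-m packets for one input-output pair.
    Returns (k, number of mega-packets, leftover packets per size).
    """
    # k = LCM(1, ..., l_max), so every packet size divides k.
    k = reduce(lambda a, b: a * b // gcd(a, b), range(1, l_max + 1), 1)
    mega = 0
    leftover = {}
    for m, count in packet_counts.items():
        per_mega = k // m               # size-m packets in one mega-packet
        mega += count // per_mega
        leftover[m] = count % per_mega  # goes to the leftover matrix of size m
    return k, mega, leftover
```

For example, with Lmax=3 the mega-packet size is k=6, and 13 packets of size 1 give two mega-packets and one leftover packet.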
Optimally decompose (w/ Birkhoff von-Neumann) the mega-packets matrix, then the leftover matrices
Speedup: S = 2 + N(Lmax-1)k/T
RQD = 2T+Lmax-2
Wrap-up
Packet-mode scheduling can be done with the same speedup as cell-mode scheduling
At the price of bounded RQD
Future work: lower bounds