Belief-Propagation Assisted Scheduling in Input-Queued Switches
S. Atalla (1), D. Cuda (2), P. Giaccone (1), M. Pretti (2)
(1) Politecnico di Torino, (2) Italian National Research Council
Hot Interconnects 2010, August 2010

Outline
• Background motivations
• System model
• Basic belief-propagation algorithm for MWM
• Assisted scheduling
• Belief-propagation for assisted scheduling
• Performance evaluation
• Hardware implementation
• Conclusions

Background motivations
• Internet traffic is steadily increasing
  • Routers and switches must process a growing amount of data faster and faster
• Input-Queued (IQ) switches can be considered a reference architecture
  • Memory speed = line rate
• IQ switches require suitable scheduling algorithms that
  • Ensure good performance (throughput, delay, ...)
  • Run fast (a few ns to take each scheduling decision)
  • Are implementable in hardware (HW)

System model
• NxN crossbar with Virtual Output Queuing (VOQ)
  • one FIFO queue for each input-output pair
  • a total of $N^2$ queues
• Synchronous architecture:
  • time is slotted
  • fixed-size packets

Scheduling algorithm
• At each timeslot, the scheduler selects a set of head-of-line packets compatible with the crossbar constraint:
  • At most one packet can be transferred from each input port and to each output port
  • This is equivalent to choosing a matching in a bipartite graph
• Input: the VOQ lengths, $Q(t) = [q_{ij}(t)]$
• Output: a matching described through binary variables, $X(t) = [x_{ij}(t)]$
  • $x_{ij} = 1$ iff input $i$ transfers a packet to output $j$
[Figure: the scheduler (MWM, iSLIP, iLQF, ...) takes $Q(t)$ as input and outputs $X(t)$; example bipartite graph between inputs 0-3 and outputs 0-3, with $x_{00} = 1$ and $x_{33} = 0$.]
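
The crossbar constraint and the weight of a matching can be made concrete with a few lines of Python. This is only a minimal sketch: the function names `is_valid_matching` and `matching_weight` are hypothetical, and the matrices are plain nested lists indexed as `x[i][j]` (input `i`, output `j`).

```python
from typing import List

Matrix = List[List[int]]


def is_valid_matching(x: Matrix) -> bool:
    """Crossbar constraint: at most one 1 per row (input) and per column (output)."""
    n = len(x)
    rows_ok = all(sum(row) <= 1 for row in x)
    cols_ok = all(sum(x[i][j] for i in range(n)) <= 1 for j in range(n))
    return rows_ok and cols_ok


def matching_weight(q: Matrix, x: Matrix) -> int:
    """Sum of the queue lengths q[i][j] over the selected pairs (x[i][j] == 1)."""
    n = len(q)
    return sum(q[i][j] * x[i][j] for i in range(n) for j in range(n))
```

For example, with `q = [[3, 1], [2, 4]]` the identity matching `[[1, 0], [0, 1]]` is valid and has weight 7, while `[[1, 1], [0, 0]]` violates the constraint because input 0 would send two packets.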

Scheduling algorithm dichotomy
• Maximum Weight Matching (MWM) is
  • Optimal in terms of performance
  • Difficult to implement in HW
    • $O(N^3)$ operations, difficult to parallelize
• Heuristic algorithms mimicking MWM
  • E.g., iSLIP, iLQF, WFA (and many others)
  • Efficient to implement in HW
    • e.g., iSLIP was implemented in the Cisco 12000 series
  • Possible traffic losses under critical traffic patterns
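
As a reference point for what MWM computes, the sketch below finds it by exhaustive search over all permutations. This is deliberately not the $O(N^3)$ algorithm mentioned above: it costs $O(N!)$ and is only usable for very small $N$. The name `mwm_brute_force` is hypothetical.

```python
from itertools import permutations


def mwm_brute_force(q):
    """Return (p, weight), where p[i] is the output matched to input i.

    Exhaustive search over all N! perfect matchings. With non-negative
    queue lengths, the best perfect matching is also a maximum weight matching.
    """
    n = len(q)
    best = max(
        permutations(range(n)),
        key=lambda p: sum(q[i][p[i]] for i in range(n)),
    )
    return list(best), sum(q[i][best[i]] for i in range(n))
```

With `q = [[5, 4], [4, 0]]` the result is `([1, 0], 8)`: the two length-4 queues together outweigh the single longest queue.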

Basic belief-propagation for MWM
• Recently, the Belief-Propagation (BP) algorithm has been proposed to solve the MWM problem [1,2]
• BP algorithms are message-passing algorithms first conceived to study Graphical Models (GMs)
  • GMs combine graph theory and probability theory
• BP is exact for MWM over bipartite graphs (see [1]), but
  • To ensure convergence, the MWM must be unique
    • Small random noise can be added to the queue lengths
  • It takes $O(N^3/\varepsilon)$ to converge
    • $\varepsilon$: difference in weight between the two heaviest matchings, not known a priori

[1] M. Bayati, D. Shah, and M. Sharma, "Max-product for maximum weight matching: Convergence, correctness, and LP duality," IEEE Transactions on Information Theory, vol. 54, no. 3, pp. 1241-1251, Mar. 2008.
[2] M. Bayati, B. Prabhakar, D. Shah, and M. Sharma, "Iterative scheduling algorithms," in IEEE INFOCOM 2007, 2007, pp. 445-453.

Basic belief-propagation for MWM
[Figure: BP messages exchanged step by step over the bipartite graph between inputs 0-3 and outputs 0-3. A backward message $b$ is obtained as $\max\{0, \cdot\}$ over three terms of the form $q - f$, where $q$ is a queue length and $f$ a forward message.]
• After convergence, each output is matched to the input associated with the largest message.
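
The message-passing idea can be sketched in Python. The update used here is an assumption: it is the simplified max-product form usually associated with [1], $m_{i \to j} = w_{ij} - \max_{k \neq j} m_{k \to i}$, with messages initialized to the edge weights. The variant used in the slides may differ, for instance in how messages are clipped. The name `bp_matching` is hypothetical and the sketch needs $N \geq 2$.

```python
def bp_matching(w, iterations):
    """Simplified max-product on a complete bipartite graph (assumed form).

    a[i][j]: message from input i to output j
    b[j][i]: message from output j to input i
    Returns a list whose j-th entry is the input chosen by output j.
    """
    n = len(w)
    a = [[w[i][j] for j in range(n)] for i in range(n)]
    b = [[w[i][j] for i in range(n)] for j in range(n)]
    for _ in range(iterations):
        # Synchronous update: both directions use the previous iteration's messages.
        new_a = [
            [w[i][j] - max(b[l][i] for l in range(n) if l != j) for j in range(n)]
            for i in range(n)
        ]
        new_b = [
            [w[i][j] - max(a[k][j] for k in range(n) if k != i) for i in range(n)]
            for j in range(n)
        ]
        a, b = new_a, new_b
    # Each output picks the input that sent the largest message.
    return [max(range(n), key=lambda i: a[i][j]) for j in range(n)]
```

On the weights `[[5, 1], [1, 5]]` the outputs select inputs `[0, 1]`, the unique MWM. With a fixed iteration count and a non-unique MWM, nothing guarantees that the per-output choices form a valid matching.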

Assisted scheduling
• Our major contribution is the concept of assisted scheduling:
  • Instead of the queue lengths, scheduling algorithms are modified to use the messages computed by BP as weights
  • We show that BP-assisted scheduling boosts the performance of existing schedulers while keeping backward compatibility

Assisted scheduling
• We introduce the Belief-Propagation Message-Processing (BP-MP) module between the VOQs and the scheduler
• BP-MP computes message values as a function of the queue lengths $Q(t)$, based on a BP algorithm
• The scheduler works in the usual way, but scheduling decisions are based on the messages $F(t)$ computed by the BP-MP module instead of on $Q(t)$
  • $F(t)$ can be seen as a correction of the VOQ lengths $Q(t)$
[Figure: $Q(t)$ → BP-MP (few iterations $I$) → $F(t)$ → Scheduler → $X(t)$]
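
The pipeline amounts to function composition: the scheduler is untouched and only its input changes. In the sketch below every name is hypothetical. `no_correction` stands in for a BP-MP module that leaves the weights unchanged, and `sequential_scheduler` is a toy scheduler, not iLQF or iSLIP.

```python
def assisted_schedule(q, bp_mp, scheduler):
    """Feed the scheduler with F(t) = bp_mp(Q(t)) instead of Q(t)."""
    f = bp_mp(q)
    return scheduler(f)


def no_correction(q):
    """Placeholder BP-MP (hypothetical): returns the queue lengths unchanged."""
    return [row[:] for row in q]


def sequential_scheduler(weights):
    """Toy scheduler (hypothetical): inputs choose, in index order,
    the still-free output with the largest weight."""
    free = set(range(len(weights)))
    choice = []
    for i in range(len(weights)):
        j = max(sorted(free), key=lambda out: weights[i][out])
        free.remove(j)
        choice.append(j)
    return choice
```

With `q = [[3, 1], [2, 4]]` and `no_correction`, the scheduler returns `[0, 1]`. Any other `bp_mp` with the same matrix-in, matrix-out interface can be plugged in without modifying the scheduler, which is the backward-compatibility argument.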

Assisted scheduling
• BP has been improved with:
  • Relaxation of the MWM uniqueness constraint
    • We do not need BP to converge anymore
    • No random noise
  • Finite (and small) number of iterations
  • Integer number representation
  • Memory
  • Self-asynchronous update

Messages for assisted scheduling
• BP runs for a fixed (and small) number of iterations $I$
• Messages are bounded
• Messages are represented through integer numbers
  • Same numerical range as the queue lengths (around $\log_2 Q_{max}$ bits)
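
A small sketch of what bounded integer messages imply for storage. Both helper names are hypothetical, and the lower bound of 0 is an assumption; the slide only states that messages are bounded and share the numerical range of the queue lengths.

```python
def message_bits(q_max):
    """Bits needed to store an integer in [0, q_max], roughly log2(q_max)."""
    return max(1, q_max.bit_length())


def clamp_message(value, q_max):
    """Saturate a message to the assumed range [0, q_max]."""
    return max(0, min(q_max, int(value)))
```

For queues of at most 255 packets, `message_bits(255)` gives 8, so a message fits in the same register width as a queue-length counter.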

Memory for assisted scheduling
• Queues exhibit a strong correlation that is reflected in the message dynamics
  • Queue length can change by at most 1 at each timeslot
• Memory: messages are initialized to the last computed messages
• Memory speeds up convergence
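
Why a warm start helps when the input changes slowly can be illustrated with a toy iteration. The update `step_toward` below is hypothetical and is NOT the BP rule: each value simply moves one unit toward a target. It only demonstrates the effect of starting each timeslot from the previous result when the budget of iterations per slot is fixed.

```python
def step_toward(messages, targets):
    """Toy update (not BP): move every value one unit toward its target."""
    return [m + (t > m) - (t < m) for m, t in zip(messages, targets)]


def run_slot(targets, start, iterations):
    """Run a fixed number of toy iterations from a given starting point."""
    m = list(start)
    for _ in range(iterations):
        m = step_toward(m, targets)
    return m


def track(trace, iterations, use_memory):
    """Process a sequence of timeslots, with or without message memory."""
    m = [0] * len(trace[0])
    results = []
    for targets in trace:
        start = m if use_memory else [0] * len(targets)
        m = run_slot(targets, start, iterations)
        results.append(m)
    return results
```

With the trace `[[0, 0], [1, 0], [2, 1], [3, 1]]`, whose entries change by at most 1 per slot, one iteration per slot is enough to follow the targets when memory is used, whereas restarting from zero each slot ends at `[1, 1]` instead of `[3, 1]`.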

Self-asynchronous update for assisted scheduling
• Studies on BP showed that updating messages in a random sequential order (asynchronous update) is beneficial for convergence
  • Not easy to implement in HW
• Self-asynchronous update:
  • exploits the randomness of the arrival process
  • updates only the messages associated with queues that have changed since the previous timeslot
  • mimics asynchronous update
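
Selecting which messages to refresh reduces to comparing two consecutive queue-length matrices. A minimal sketch, with hypothetical function names:

```python
def change_flags(q_now, q_prev):
    """e[i][j] = 1 if queue (i, j) changed since the previous timeslot, else 0."""
    return [
        [1 if now != prev else 0 for now, prev in zip(row_now, row_prev)]
        for row_now, row_prev in zip(q_now, q_prev)
    ]


def edges_to_update(flags):
    """List the (input, output) pairs whose messages should be recomputed."""
    return [(i, j) for i, row in enumerate(flags) for j, e in enumerate(row) if e]
```

Going from `[[2, 0], [1, 3]]` to `[[2, 1], [0, 3]]`, only the pairs `(0, 1)` and `(1, 0)` are flagged, so only their messages would be updated in that timeslot.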

Scheduling algorithms
• iLQF vs. BP-assisted iLQF (BP-iLQF)
  • Distributed greedy algorithm
  • Each input (each output) is equipped with an arbiter which selects the output (input) associated with the longest queue
• Greedy MWM (GMWM) vs. BP-assisted GMWM (BP-GMWM)
  • centralized scheduling, iterating N times
  • at each iteration it selects the unmatched input/output pair associated with the longest queue
• iSLIP
  • like iLQF, but sending only binary information (queue empty/not empty)
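
The GMWM description translates almost directly into code. The sketch below follows it literally: N iterations, each selecting the heaviest pair among the still-unmatched inputs and outputs. The name `greedy_mwm` and the tie-breaking (lowest indices first) are assumptions. Passing BP messages instead of queue lengths as `w` would give the assisted variant.

```python
def greedy_mwm(w):
    """Greedy approximation of MWM: returns a dict {input: output}."""
    n = len(w)
    free_in, free_out = set(range(n)), set(range(n))
    match = {}
    for _ in range(n):
        # Heaviest pair among unmatched inputs and outputs.
        i, j = max(
            ((i, j) for i in sorted(free_in) for j in sorted(free_out)),
            key=lambda pair: w[pair[0]][pair[1]],
        )
        match[i] = j
        free_in.remove(i)
        free_out.remove(j)
    return match
```

On `w = [[5, 4], [4, 0]]` the greedy choice takes the pair (0, 0) first and ends with total weight 5, whereas the maximum weight matching pairs input 0 with output 1 and input 1 with output 0 for a total of 8. This is the kind of gap between heuristics and MWM that assisted scheduling aims to reduce.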

Performance evaluation settings
• Simulation settings:
• Traffic patterns:
  • Critical traffic pattern

Performance evaluation results
• BP-assisted scheduling improves performance ($I = 3$)
[Figure: performance plots comparing Memory vs. No Memory, and Self-asynchronous vs. Synchronous vs. Asynchronous update.]

Hardware design: General overview
• 2N modules running in parallel
• IM and OM perform the same operations
• When $n = I$, IM sends $F(t)$ to the scheduler
[Figure: VOQ → $Q(t)$ → BP-MP (modules exchanging forward and backward messages) → $F(t)$ → Scheduler]

Hardware design: IM details
• Self-asynchronous: flags associated with the VOQs at input $i$
  • if $w_{ij}(t) \neq w_{ij}(t-1)$ then $e_{ij} = 1$, else $e_{ij} = 0$
• Memory: registers storing the messages computed during the previous timeslot
  • N registers of size $\log_2 Q_{max}$
• Subtraction operation
  • c is used to select between 0 and the result of the subtraction
• Max operation
  • Tournament implementation: $\log_2(N-1)$ stages and $(N-2)$ comparisons
• When $n = I$, messages are sent to the scheduler
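
The stage and comparison counts of a tournament maximum can be checked with a software model. This is only a sketch of the comparison tree, not of the hardware; `tournament_max` is a hypothetical name. Finding the maximum of $N-1$ values takes $N-2$ pairwise comparisons arranged in $\lceil \log_2(N-1) \rceil$ stages.

```python
def tournament_max(values):
    """Return (maximum, comparisons, stages) of a pairwise tournament."""
    level = list(values)
    comparisons = 0
    stages = 0
    while len(level) > 1:
        winners = []
        # Compare neighbours pairwise; all comparisons in a stage are independent.
        for k in range(0, len(level) - 1, 2):
            winners.append(max(level[k], level[k + 1]))
            comparisons += 1
        if len(level) % 2:
            winners.append(level[-1])  # odd element advances without a comparison
        level = winners
        stages += 1
    return level[0], comparisons, stages
```

For $N = 5$ the module compares $N - 1 = 4$ values: `tournament_max([3, 9, 2, 7])` returns `(9, 3, 2)`, that is, 3 comparisons in 2 stages.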

Conclusion
• We proposed BP-assisted scheduling to boost the performance of existing scheduling algorithms while keeping backward compatibility
  • BP runs for a few iterations
• We simplified and improved the basic BP algorithm:
  • Relaxation of the MWM uniqueness constraint
  • Integer messages (backward compatibility)
  • Message memory
  • Self-asynchronous update
• We provided a high-level description of a possible HW implementation of the BP-MP:
  • BP-MP can be efficiently implemented in HW and is compatible with existing implementations

Any questions?
Thank you for your attention!

Example: MWM computation over a tree
• Node "1" must decide whether to add edge (1,2) to the matching
• Node "1" takes its decision based only on the information provided by the nodes in its neighborhood
• E.g., node "2" sends two messages to "1":
  • $W_{12}$: MWM of the sub-tree rooted at "2" including (2,1), given that (2,1) is part of the MWM rooted at "1"
  • $\bar{W}_{12}$: MWM of the sub-tree rooted at "2" including (2,1), given that (2,1) is not part of the MWM rooted at "1"
[Figure: tree rooted at node 1 with children 2, 6, 7 (edge weights $w_{21}$, $w_{61}$, $w_{71}$); node 2 has children 3, 4, 5 (edge weights $w_{32}$, $w_{42}$, $w_{52}$). Question at node 1: take or not take (2,1)?]
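
Example: MWM computation over a tree
• Message definitions:
  • If (2,1) is part of the MWM, then (3,2), (4,2), (5,2) cannot be in the MWM:
    $W_{12} = w_{21} + \bar{W}_{23} + \bar{W}_{24} + \bar{W}_{25}$
  • If (2,1) is not in the MWM, then at most one among (3,2), (4,2), (5,2) can be part of the MWM:
    $\bar{W}_{12} = \max\{\bar{W}_{23} + \bar{W}_{24} + \bar{W}_{25},\; W_{23} + \bar{W}_{24} + \bar{W}_{25},\; \bar{W}_{23} + W_{24} + \bar{W}_{25},\; \bar{W}_{23} + \bar{W}_{24} + W_{25}\}$
• It is possible to reduce the number of exchanged messages by combining $W_{12}$ and $\bar{W}_{12}$ into a single message
[Figure: node 2 with parent 1 (edge weight $w_{21}$) and children 3, 4, 5; node 5 sends $W_{25}$, $\bar{W}_{25}$ to node 2, which sends $W_{12}$, $\bar{W}_{12}$ to node 1.]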
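
Example: MWM computation over a tree
• Node "1" decision: node "1" adds edge (1,2) to the MWM if
  $W_{12} + \bar{W}_{16} + \bar{W}_{17} > \max\{\bar{W}_{12} + W_{16} + \bar{W}_{17},\; \bar{W}_{12} + \bar{W}_{16} + W_{17},\; \bar{W}_{12} + \bar{W}_{16} + \bar{W}_{17}\}$
  or equivalently
  $W_{12} - \bar{W}_{12} > \max\{W_{16} - \bar{W}_{16},\; W_{17} - \bar{W}_{17},\; 0\}$
[Figure: tree rooted at node 1 with children 2, 6, 7 (edge weights $w_{21}$, $w_{61}$, $w_{71}$); node 2 has children 3, 4, 5 (edge weights $w_{32}$, $w_{42}$, $w_{52}$). Question at node 1: take or not take (2,1)?]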
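
The tree example can be reproduced with a short recursion. This is a sketch under stated assumptions: the tree is given as a `children` dictionary, edge weights are keyed as `(child, parent)`, the function names are hypothetical, and the numeric weights in the usage note are made up for illustration. The strict inequality in the decision is also an assumption.

```python
def messages(v, parent, children, w):
    """Return (W, W_bar) sent by node v to its parent.

    W:     best weight in v's sub-tree, plus edge (v, parent), when that edge is taken
    W_bar: best weight in v's sub-tree when edge (v, parent) is not taken
    """
    msgs = [messages(c, v, children, w) for c in children.get(v, [])]
    base = sum(w_bar for _, w_bar in msgs)        # no child edge taken
    gains = [w_in - w_bar for w_in, w_bar in msgs]
    with_edge = w[(v, parent)] + base             # children edges are excluded
    without_edge = base + max([0] + gains)        # at most one child edge
    return with_edge, without_edge


def root_takes_edge(root, child, children, w):
    """Decision at the root: is edge (root, child) part of the MWM?"""
    delta = {}
    for c in children[root]:
        w_in, w_bar = messages(c, root, children, w)
        delta[c] = w_in - w_bar
    others = [d for c, d in delta.items() if c != child]
    return delta[child] > max([0] + others)
```

Using the tree of the example with illustrative weights `w = {(2, 1): 5, (6, 1): 4, (7, 1): 1, (3, 2): 1, (4, 2): 2, (5, 2): 3}` and `children = {1: [2, 6, 7], 2: [3, 4, 5]}`, node 2 sends `(5, 3)` to node 1. Node 1 then rejects edge (1,2) and takes edge (1,6), which together with edge (5,2) gives the maximum weight of 7.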

Graphical models
• BP algorithms are message-passing algorithms first conceived to study Graphical Models (GMs)
  • GMs are a "marriage" between probability theory and graph theory [Note: I would only say this aloud; it means nothing here]
• GMs are becoming a powerful tool in several fields of science (AI, speech recognition, coding/decoding, bioinformatics) to compute marginal probabilities and maximum a posteriori probabilities (max-product algorithm)
  • "BP" and "max-product" are usually both referred to simply as "BP", since computing the maximum a posteriori probability first requires computing the marginal distributions [Note: I did not understand this sentence and it seems very risky to me!]

[Figure: VOQ → BP-MP → Scheduler]

Scheduler: iLQF
• If the MWM is unique, BP-assisted iLQF, running with weights derived from the BP messages, computes exactly the MWM

Performance evaluation: results
• BP-assisted scheduling improves performance ($I = 3$)
• Average delays: the delays of BP-iLQF/BP-GMWM are at most 1.37 times those of iLQF/GMWM
[Figure: performance plots comparing Memory vs. No Memory, and Self-asynchronous vs. Synchronous vs. Asynchronous update.]