VoQ Crossbar Schedulers

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Hong Kong University of Science and Technology -MSc(IT) 2003

Fall Semester - Track 1

CSIT560 Internet Infrastructure: Switches and Routers

Final Project Presentation #T1-6-2

VoQ Crossbar SchedulersVoQ Crossbar Schedulers

StartStart

Group Members:Cyrus LO – 02784753 [email protected]

Ricky CHAN - 02784557 [email protected]

Ricky KWAN - 02784674 [email protected]

Presented on 1 December 2003

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Agenda• Part I. Introduction

Project goals (30secs by Ricky Chan)

• Part II. Some well-known MWM/MSM scheduling algorithms What are the differences between MSM & MWM ? Overview of PIM algorithm What is the common problem? (2’30 mins by Ricky Chan)

• Part III. Our findings Requirements and assumptions A new conceptual algorithm –

Real De-synchronized Round-Robin Model (RDESRR) What is RDESRR ? (6 mins by Ricky Kwan)

• Part IV. Experimental results SIM Results Result comparisons : RDESRR Vs SSRR & ISLIP Strengths & Weaknesses (4 mins by Cryus Lo)

• Part V. Conclusion Conclusions (1 min by Cryus Lo)

• Part VI. Q & A (1 min by all)

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Part I. Introduction

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Project goals

By further studying and analysis of the pointer desynchronization effect of some iterative arbitration algorithms, our group has the following goals:

•Tackle the common problem of some well-known MSM algorithms

•Propose a new conceptual algorithm for overcoming the problem

•Simulate the proposed algorithm by using SIM

•Justify the results

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Part II.

Some Well-known MWM/MSM scheduling algorithms

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

What are the differences between MSM & MWM ?

• Maximum weight matching (MWM) – stable for independent, 100% throughput under any traffic.

• Maximum size matching (MSM) – stable for i.i.d, uniform, admissible traffic.

• Both are too complex to implement, time complexities are O(N3logN) – MWM and O(N5/2) – MSM

•Only rely on some simple iterative algorithms to approximate MSMapproximate MSM•Some simple iterative algorithms:

•Parallel Iterative Matching (PIM)Round-Robin Matching (RRM)iterative SLIP (iSLIP).Fcfs in Round-Robin Matching (FIRM)Static Round-Robin Matching (SRR – SSRR, DSRR, Rotating DSRR & DRDSRRDRDSRR etc)

Derandomized rotating double Derandomized rotating double static round-robin !static round-robin !

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Overview of PIM algorithm

When no new matching can be found, the algorithm stops.

3.Accept - If an input receives a grant, it accepts one by selecting an output randomly among those that granted to this output..

2. Grant - If an unmatched output receives any requests, it grants to one by randomly selecting a request uniformly over all requests.

1. Request - Each unmatched input sends a request to every output for which it has a queued cell.

The basic matching algorithm. Each iteration of the algorithm follows these three steps:

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Is any improvement for PIM algorithms ?

• PIM -> RRM -> ISLIP -> FIRM -> Pointer Desynchronized (SRR)

• Some small change on pointers but make big improvement

• SRR is desynchronized the highest priority pointers

• It has no guarantee the grant result is desynchronized.

0

1

2

3

3 02 1

3 02 1

3 02 1

3 02 10

1

2

3

Both grant to 3

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Is any improvement for PIM algorithms ?

•If all grants are desynchronized.

Expect to get better result.

0

1

2

3

3 02 1

3 02 1

3 02 1

3 02 1

Grant Desynchronized(Real Desynchronized)

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Part III. Our Findings

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Assumptions

• Not enough knowledge in Hardware• Look for the theoretical solution• We ignore all hardware implementation issues

Requirements

• The system is stable• 100% Throughput in uniform traffic

•Target to desynchronize the grant results (so called Real Desynchronize)

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

A new algorithm – RDESRR

Concept:• Real Desynchronized Round Robin Model (RDESRR)

• Based on 2 phases RRM model (Request and Grant)

• Add a small share memory that each outputs can read/write (called Share Bits)

• The size of the memory is 1 bit per input

• If the bit is set, the corresponding input has already granted by an output

• If the bit is not set, the output may grant to corresponding input port

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR Conceptual model

0

1

2

3

0

1

2

3

3 02 1

3 02 1

3 02 1

3 02 1

3

0

1

2

Share Bits

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR model

• 2 phases only

• Request. Each input sends a request to every output for which it has a queued cell.

• Grant. If an output receives any requests, it chooses the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output check the corresponding bit is set or not, if not set, the output will set the bit and notifies the input its request was granted. Otherwise, the output will look for next request until all requests has gone through. The pointer gi to

the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input. If no request is received, the pointer stays unchanged.

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR Demo - Request

Step 1: Request

0

1

2

3

0

1

2

3

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR Demo – Add a share memory in Output

Step 2: Grant

0

1

2

3

0

1

2

3

3 02 1

3 02 1

3 02 1

3 02 1 3

0

1

2

Share Bits

•Add a small share memory that each outputs can read/write (called Share Bits)

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

3 02 1

3 02 1

3 02 1

3 02 1

RDESRR Demo – Output check the share bits

0

1

2

3

0

1

2

3

Step 2: Grant

3

0

1

2

Share Bits

•The output check the corresponding bit is set or not

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR Demo – When share bit is occupied

0

1

2

3

0

1

2

3

Step 2: Grant

3

0

1

2

3 02 1

3 02 1

3 02 1

3 02 1

Share Bits

•if not set, the output will set the bit and notifies the input its request was granted•The share bit is First Come First Serve

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR Demo – Output looks for next request

0

1

2

3

0

1

2

3

Step 2: Grant

3 02 1

3 02 1

3 02 1

3 02 1 3

0

1

2

Share Bits

•If set, the output will look for next request until all requests have gone through

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR Demo – All share bits are allocated

0

1

2

3

0

1

2

3

Step 2: Grant

3 02 1

3 02 1

3 02 1

3 02 1 3

0

1

2

Share Bits

•Fully allocate the share bit will result for fully grant all input request

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

3 02 1

3 02 1

3 02 1

3 02 1

RDESRR Demo – Pointer update/Share bit reset

0

1

2

3

0

1

2

3

3

0

1

2

Share Bits

•The pointer gi to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input•If no request is received, the pointer stays unchanged•Share bits are also reset

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

RDESRR model (Recap)

• 2 phases only

• Request. Each input sends a request to every output for which it has a queued cell.

• Grant. If an output receives any requests, it chooses the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output check the corresponding bit is set or not, if not set, the output will set the bit and notifies the input its request was granted. Otherwise, the output will look for next request until all requests has gone through. The pointer gi to

the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input. If no request is received, the pointer stays unchanged.

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Part IV. Experimental Results

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

SIM Results• Run the test for 32x32 port in SIM using –l 1000000

Total Latency Avg Match Size0.1 0.0588 3.1958 0.2 0.1447 6.3938 0.3 0.2686 9.5947 0.4 0.4501 12.7940 0.5 0.7198 15.9960 0.6 1.1398 19.1980 0.7 1.8636 22.3961 0.8 3.2619 25.5986 0.9 7.5087 28.8003 1.0 715.5900 31.9850

RDESRR

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Result comparisons : RDESRR Vs SSRR & ISLIP

• The latency in load = 1 is 715.59• The average match size in load = 1 is 31.985 (99.95% match)

Average Delay under Uniform Traffic

1.0E-02

1.0E-01

1.0E+00

1.0E+01

1.0E+02

1.0E+03

1.0E+04

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

NormalizedLoad

Ave

rage

Del

ay

RR-DESY

SSRR

ISLIP

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Strengths • 2 phase operations

• Each input only get one grant, • no arbiter in input port

Unknown factorsParallel processing for ports scheduling.

All outputs access the share bits simultaneously Multi-process protection for Share Bits

1. Prevent deadlock, Check and set bit should be atomic action

2. Any chance for two outputs access same bit at the same time?

• Save Cost•It is simple to implement because base on RR

•Has a lot of potential varieties to study

•May achieve Maximum Size Match after few iterations.

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Part V. Conclusions

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Final thought…• Future research and study

1. Hardware implementation feasibility?

2. Apply to ISLIP, FIRM, may get better result?

• Comments on SIM1. We have successfully build the SIM on Linux platform so we expect it

can also be build on Windows platform with Cywin library

2. It cannot simulate the parallel processing behaviour therefore we cannot know what happen in the real scheduler

3. The processing time of SIM is quite slow so we cannot have enough time to simulate more revised algorithms and more iterations

Conclusion: Conclusion: RDESRR is a conceptual model which get RDESRR is a conceptual model which get

the good result in SIMthe good result in SIM

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Part VI. Q & A

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

Q & AQ & A

41 2 3

1

3

2

44 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

4 13 2

PresentationPresentation End End

Documents

VoQ Crossbar Schedulers