Upload
barrett-sullivan
View
39
Download
2
Tags:
Embed Size (px)
DESCRIPTION
Hong Kong University of Science and Technology - MSc(IT) 2003 Fall Semester - Track 1 CSIT560 Internet Infrastructure: Switches and Routers Final Project Presentation #T1-6-2. VoQ Crossbar Schedulers. Group Members: Cyrus LO – 02784753 [email protected] Ricky CHAN - 02784557 [email protected] - PowerPoint PPT Presentation
Citation preview
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Hong Kong University of Science and Technology -MSc(IT) 2003
Fall Semester - Track 1
CSIT560 Internet Infrastructure: Switches and Routers
Final Project Presentation #T1-6-2
VoQ Crossbar SchedulersVoQ Crossbar Schedulers
StartStart
Group Members:Cyrus LO – 02784753 [email protected]
Ricky CHAN - 02784557 [email protected]
Ricky KWAN - 02784674 [email protected]
Presented on 1 December 2003
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Agenda• Part I. Introduction
Project goals (30secs by Ricky Chan)
• Part II. Some well-known MWM/MSM scheduling algorithms What are the differences between MSM & MWM ? Overview of PIM algorithm What is the common problem? (2’30 mins by Ricky Chan)
• Part III. Our findings Requirements and assumptions A new conceptual algorithm –
Real De-synchronized Round-Robin Model (RDESRR) What is RDESRR ? (6 mins by Ricky Kwan)
• Part IV. Experimental results SIM Results Result comparisons : RDESRR Vs SSRR & ISLIP Strengths & Weaknesses (4 mins by Cryus Lo)
• Part V. Conclusion Conclusions (1 min by Cryus Lo)
• Part VI. Q & A (1 min by all)
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Part I. Introduction
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Project goals
By further studying and analysis of the pointer desynchronization effect of some iterative arbitration algorithms, our group has the following goals:
•Tackle the common problem of some well-known MSM algorithms
•Propose a new conceptual algorithm for overcoming the problem
•Simulate the proposed algorithm by using SIM
•Justify the results
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Part II.
Some Well-known MWM/MSM scheduling algorithms
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
What are the differences between MSM & MWM ?
• Maximum weight matching (MWM) – stable for independent, 100% throughput under any traffic.
• Maximum size matching (MSM) – stable for i.i.d, uniform, admissible traffic.
• Both are too complex to implement, time complexities are O(N3logN) – MWM and O(N5/2) – MSM
•Only rely on some simple iterative algorithms to approximate MSMapproximate MSM•Some simple iterative algorithms:
•Parallel Iterative Matching (PIM)Round-Robin Matching (RRM)iterative SLIP (iSLIP).Fcfs in Round-Robin Matching (FIRM)Static Round-Robin Matching (SRR – SSRR, DSRR, Rotating DSRR & DRDSRRDRDSRR etc)
Derandomized rotating double Derandomized rotating double static round-robin !static round-robin !
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Overview of PIM algorithm
When no new matching can be found, the algorithm stops.
3.Accept - If an input receives a grant, it accepts one by selecting an output randomly among those that granted to this output..
2. Grant - If an unmatched output receives any requests, it grants to one by randomly selecting a request uniformly over all requests.
1. Request - Each unmatched input sends a request to every output for which it has a queued cell.
The basic matching algorithm. Each iteration of the algorithm follows these three steps:
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Is any improvement for PIM algorithms ?
• PIM -> RRM -> ISLIP -> FIRM -> Pointer Desynchronized (SRR)
• Some small change on pointers but make big improvement
• SRR is desynchronized the highest priority pointers
• It has no guarantee the grant result is desynchronized.
0
1
2
3
3 02 1
3 02 1
3 02 1
3 02 10
1
2
3
Both grant to 3
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Is any improvement for PIM algorithms ?
•If all grants are desynchronized.
Expect to get better result.
0
1
2
3
3 02 1
3 02 1
3 02 1
3 02 1
Grant Desynchronized(Real Desynchronized)
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Part III. Our Findings
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Assumptions
• Not enough knowledge in Hardware• Look for the theoretical solution• We ignore all hardware implementation issues
Requirements
• The system is stable• 100% Throughput in uniform traffic
•Target to desynchronize the grant results (so called Real Desynchronize)
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
A new algorithm – RDESRR
Concept:• Real Desynchronized Round Robin Model (RDESRR)
• Based on 2 phases RRM model (Request and Grant)
• Add a small share memory that each outputs can read/write (called Share Bits)
• The size of the memory is 1 bit per input
• If the bit is set, the corresponding input has already granted by an output
• If the bit is not set, the output may grant to corresponding input port
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
RDESRR Conceptual model
0
1
2
3
0
1
2
3
3 02 1
3 02 1
3 02 1
3 02 1
3
0
1
2
Share Bits
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
RDESRR model
• 2 phases only
• Request. Each input sends a request to every output for which it has a queued cell.
• Grant. If an output receives any requests, it chooses the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output check the corresponding bit is set or not, if not set, the output will set the bit and notifies the input its request was granted. Otherwise, the output will look for next request until all requests has gone through. The pointer gi to
the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input. If no request is received, the pointer stays unchanged.
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
RDESRR Demo - Request
Step 1: Request
0
1
2
3
0
1
2
3
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
RDESRR Demo – Add a share memory in Output
Step 2: Grant
0
1
2
3
0
1
2
3
3 02 1
3 02 1
3 02 1
3 02 1 3
0
1
2
Share Bits
•Add a small share memory that each outputs can read/write (called Share Bits)
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
3 02 1
3 02 1
3 02 1
3 02 1
RDESRR Demo – Output check the share bits
0
1
2
3
0
1
2
3
Step 2: Grant
3
0
1
2
Share Bits
•The output check the corresponding bit is set or not
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
RDESRR Demo – When share bit is occupied
0
1
2
3
0
1
2
3
Step 2: Grant
3
0
1
2
3 02 1
3 02 1
3 02 1
3 02 1
Share Bits
•if not set, the output will set the bit and notifies the input its request was granted•The share bit is First Come First Serve
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
RDESRR Demo – Output looks for next request
0
1
2
3
0
1
2
3
Step 2: Grant
3 02 1
3 02 1
3 02 1
3 02 1 3
0
1
2
Share Bits
•If set, the output will look for next request until all requests have gone through
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
RDESRR Demo – All share bits are allocated
0
1
2
3
0
1
2
3
Step 2: Grant
3 02 1
3 02 1
3 02 1
3 02 1 3
0
1
2
Share Bits
•Fully allocate the share bit will result for fully grant all input request
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
3 02 1
3 02 1
3 02 1
3 02 1
RDESRR Demo – Pointer update/Share bit reset
0
1
2
3
0
1
2
3
3
0
1
2
Share Bits
•The pointer gi to the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input•If no request is received, the pointer stays unchanged•Share bits are also reset
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
RDESRR model (Recap)
• 2 phases only
• Request. Each input sends a request to every output for which it has a queued cell.
• Grant. If an output receives any requests, it chooses the one that appears next in a fixed, round-robin schedule starting from the highest priority element. The output check the corresponding bit is set or not, if not set, the output will set the bit and notifies the input its request was granted. Otherwise, the output will look for next request until all requests has gone through. The pointer gi to
the highest priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input. If no request is received, the pointer stays unchanged.
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Part IV. Experimental Results
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
SIM Results• Run the test for 32x32 port in SIM using –l 1000000
Total Latency Avg Match Size0.1 0.0588 3.1958 0.2 0.1447 6.3938 0.3 0.2686 9.5947 0.4 0.4501 12.7940 0.5 0.7198 15.9960 0.6 1.1398 19.1980 0.7 1.8636 22.3961 0.8 3.2619 25.5986 0.9 7.5087 28.8003 1.0 715.5900 31.9850
RDESRR
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Result comparisons : RDESRR Vs SSRR & ISLIP
• The latency in load = 1 is 715.59• The average match size in load = 1 is 31.985 (99.95% match)
Average Delay under Uniform Traffic
1.0E-02
1.0E-01
1.0E+00
1.0E+01
1.0E+02
1.0E+03
1.0E+04
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
NormalizedLoad
Ave
rage
Del
ay
RR-DESY
SSRR
ISLIP
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Strengths • 2 phase operations
• Each input only get one grant, • no arbiter in input port
Unknown factorsParallel processing for ports scheduling.
All outputs access the share bits simultaneously Multi-process protection for Share Bits
1. Prevent deadlock, Check and set bit should be atomic action
2. Any chance for two outputs access same bit at the same time?
• Save Cost•It is simple to implement because base on RR
•Has a lot of potential varieties to study
•May achieve Maximum Size Match after few iterations.
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Part V. Conclusions
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Final thought…• Future research and study
1. Hardware implementation feasibility?
2. Apply to ISLIP, FIRM, may get better result?
• Comments on SIM1. We have successfully build the SIM on Linux platform so we expect it
can also be build on Windows platform with Cywin library
2. It cannot simulate the parallel processing behaviour therefore we cannot know what happen in the real scheduler
3. The processing time of SIM is quite slow so we cannot have enough time to simulate more revised algorithms and more iterations
Conclusion: Conclusion: RDESRR is a conceptual model which get RDESRR is a conceptual model which get
the good result in SIMthe good result in SIM
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Part VI. Q & A
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
Q & AQ & A
41 2 3
1
3
2
44 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
4 13 2
PresentationPresentation End End