High Performance Switching and Routing
Telecom Center Workshop: Sept 4, 1997.
Analyzing Single Buffered Routers
Sundar Iyer, Rui Zhang, Nick McKeown
(sundaes, rzhang, nickm)@stanford.edu
High Performance Networking Group
Departments of Electrical Engineering & Computer Science, Stanford University
Stanford University 2
What is an Ideal Router?
• Output Queued switches are ideal but not practical
– They minimize the delay faced by a packet
– They can give QoS guarantees
– The bandwidth to each output is NR; the total bandwidth is N²R
– The cost and power consumption are prohibitive
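To make the scaling concrete, here is a small arithmetic sketch of the two figures quoted above (NR per output, N²R in total). The function name and the example port count and line rate are hypothetical, chosen only for illustration.

```python
def oq_bandwidth(n_ports, line_rate):
    """Memory bandwidth of an output queued switch, in the units of line_rate.

    Each output memory may receive a cell from all N inputs in the same
    time slot, so it needs N*R; with N outputs the total is N*N*R.
    """
    per_output = n_ports * line_rate
    total = n_ports * per_output
    return per_output, total


# Hypothetical example: 32 ports at 10 Gb/s each.
per_output, total = oq_bandwidth(32, 10)
print(per_output, total)  # 320 10240
```

The quadratic growth in the total is what makes the architecture impractical at high port counts.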
[Figure: Output Queued Switch. Arriving packets on ports 1..N at rate R pass through an interconnect into memory and depart on ports 1..N at rate R. Labels: NR per output, BW: N²R in total.]
CIOQ Models
[Figure: two switch models, each with arriving packets on ports 1..N at rate R, N memories, an arbiter, and departing packets on ports 1..N at rate R.
Left: Input Queued Switch, labeled R, BW: NR.
Right: Combined Input-Output Queued Switch, labeled 2R, BW: 2NR.]
• CIOQ switches are better but still not practical
– They can emulate OQ switches
– They need a bandwidth of only 2NR
– They have high computational complexity
– The model does not capture many different architectures
The Single Buffered Router Model
• Single Buffered Routers buffer packets only once
• The interconnects may be
– physically separate or merged
– one of the interconnects may be optional
• The memory can be
– centralized or distributed
– one or many
– reserved or shared amongst all ports
[Figure: Single Buffered Router. Arriving packets on ports 1..N at rate R pass through an interconnect, a memory, and a second interconnect, then depart on ports 1..N at rate R.]
Why a New Model for Routers?
• SB Routers comprise a broader class of routers
– They replace the CIOQ model
– They also include other interesting router architectures such as
• shared memory routers, parallel packet switches, etc.
• With this model we can compare these routers to an ideal router and answer
– Does a router give me quality of service?
– Can a router guarantee me 100% throughput?
How to Compare Routers?
[Figure: an OQ Switch (interconnect at NR per output, memory BW: N²R) shown beside any SB Switch (interconnect, memory, interconnect), both with arriving and departing packets on ports 1..N at rate R. The question posed is whether the SB switch can emulate the OQ switch: Yes or No?]
A Modified Pigeon Hole Principle
– Consider the following
– Only one pigeon can enter or leave a hole in a given time
– A pigeon decides when it wants to leave
– A pigeonhole may contain many pigeons over time
– How many pigeon holes do we need so that departing pigeons are guaranteed to be able to leave, and arriving pigeons are guaranteed a pigeon hole?
The Constraint Set Technique
• A technique to analyze single buffered routers
1. Determine each packet’s departure time
2. Define the constraints on the system for both inputs and outputs (if applicable)
– buffer, fabrics, speedup, etc.
3. Apply the Pigeon Hole Principle
• Constraint Sets can be used to analyze
• Parallel Shared Memory Switches
• Distributed Shared Memory Switches (bus-based or crossbar-based)
• Input Queued Switches
• Parallel Packet Switches
... and we expect, in general, any Single Buffered router
Examples of Constraints
Physical Constraints
• These are limitations imposed by the hardware
– Memory: (e.g. Parallel Packet Switch) Can’t access a memory more than a certain number of times in a time period
– Bus: (e.g. Centralized Shared Memory) Can’t use the same bus simultaneously for more than a certain number of packets
– Crossbar: (e.g. Distributed Shared Memory) Each input and output may be busy only once in a scheduling period
Logical Constraints
• These are requirements imposed on the switch
– Time: (e.g. Input Queued Router) A packet must face a delay of no more than “p” time slots with respect to an ideal switch
An Example: Parallel Shared Memory (PSM) Router
[Figure: PSM router. Arriving packets on ports 1..N at rate R are written, under the control of an arbiter, into a DRAM consisting of k memories (labeled 2NR) and depart on ports 1..N at rate R. Read Access Time = T, Write Access Time = T.]

Property                       Value
Interconnect Fabric            Bus
Interconnect Number            One/Two
Interconnect Implementation    Separated/Merged
Memory Physical Location       Centralized
Memory Number                  One/Many
Memory Sharing Allowed         Yes

Number of Memories   BW per Memory   Total BW   Emulate?
k                    3NR/k           3NR        FIFO
k                    4NR/k           4NR        QoS
Question: Can a PSM Router emulate an OQ Router?
– Let a cell arrive at input “i” at time “t” and be destined to depart from output port “j” at time “DT”
– Such a cell must not be written to memories which
1. Are used to write the other N-1 cells arriving at t.
2. Are used to read the N cells departing at t.
3. Will be used to read the other N-1 cells departing at DT.
– There are three constraint sets, which together exclude at most (N-1) + N + (N-1) = 3N-2 memories
– By the pigeonhole principle, 3N memories at rate R, or a memory bandwidth of 3NR, is sufficient
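The argument above can be checked with a small time-slotted simulation. This is only a sketch under stated assumptions, not the authors' implementation: each memory performs one access per slot, departure times follow a FIFO OQ switch with a cell leaving no earlier than the slot after it arrives, traffic is uniform random, and the arbiter picks the lowest-numbered free memory. The name `simulate_psm` and all of its parameters are hypothetical.

```python
import random
from collections import defaultdict


def simulate_psm(n_ports, n_memories, n_slots, load=1.0, seed=0):
    """Return the largest constraint-set union seen, or None if a cell
    could not be placed (i.e. OQ emulation failed)."""
    rng = random.Random(seed)
    next_free = [0] * n_ports      # next departure slot available per output
    reads_at = defaultdict(set)    # slot -> memories read in that slot
    worst = 0
    for t in range(n_slots):
        writes_now = set()         # memories written by cells arriving at t
        for _inp in range(n_ports):
            if rng.random() >= load:
                continue           # no arrival on this input in this slot
            out = rng.randrange(n_ports)
            dt = max(t + 1, next_free[out])   # FIFO OQ departure time (assumed)
            next_free[out] = dt + 1
            # Union of the three constraint sets for this cell.
            busy = writes_now | reads_at[t] | reads_at[dt]
            worst = max(worst, len(busy))
            free = [m for m in range(n_memories) if m not in busy]
            if not free:
                return None
            m = free[0]
            writes_now.add(m)
            reads_at[dt].add(m)
        reads_at.pop(t, None)      # slot t is over; drop its read schedule
    return worst


n = 4
print(simulate_psm(n_ports=n, n_memories=3 * n - 1, n_slots=1000))
```

With 3N-1 memories the union never exceeds 3N-2, so a free memory always exists; with far fewer memories (for example 2 memories for 4 ports at full load) the function returns None in the first slot.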
Distributed Shared Memory Router

Property                       Value
Interconnect Fabric            Bus/Crossbar
Interconnect Number            One/Two
Interconnect Implementation    Separated/Merged
Memory Physical Location       Distributed
Memory Number                  Many
Memory Sharing Allowed         Yes

[Figure: DSM router. Arriving and departing packets on ports 1..N at rate R, N memories, and arbiters. The fabric into the memories runs at S1R (BW: S1NR) and the fabric out of the memories at S2R (BW: S2NR).]

No. Mem.   BW per Mem.   Total BW   Xbar speed   Emulate?
N          4R            4NR        4NR          FIFO
N          6R            6NR        6NR          QoS
Question: Can a DSM Router emulate an OQ Router?
– Let a cell arrive at input “i” at time “t” and be destined to depart from output port “j” at time “DT”
– The cell can be written to any intermediate port “x” such that
1. The edge (i,x) is available at time t. Since no more than N-1 other cells contend to write at time t, at most (N-1)/s1 vertices are unavailable.
2. The edge (x,j) is available at time DT. Since no more than N-1 other cells contend to leave at time DT, at most (N-1)/s2 vertices are unavailable.
– There are two constraint sets
• By the pigeonhole principle, it suffices that (N-1)/s1 + (N-1)/s2 < N.
• Hence s1 = s2 = 2, i.e. s = s1 + s2 = 4, is enough.
• A bandwidth of 4NR is sufficient
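The speedup condition can be checked numerically. The sketch below assumes the two constraint sets count unavailable intermediate ports, so that a free port exists whenever (N-1)/s1 + (N-1)/s2 is strictly less than N; the function name is hypothetical, and the comparison is done in integers to avoid rounding.

```python
def dsm_speedup_sufficient(n_ports, s1, s2):
    """True if (N-1)/s1 + (N-1)/s2 < N, i.e. the ports excluded by the two
    constraint sets cannot cover all N intermediate ports."""
    # Multiply both sides by s1*s2 (positive) to stay in integer arithmetic.
    return (n_ports - 1) * s2 + (n_ports - 1) * s1 < n_ports * s1 * s2


# s1 = s2 = 2 satisfies the condition for every port count tried here,
# while s1 = s2 = 1 fails as soon as there is more than one port.
print(all(dsm_speedup_sufficient(n, 2, 2) for n in range(1, 1025)))  # True
print(dsm_speedup_sufficient(16, 1, 1))                              # False
```

With s1 = s2 = 2 the left-hand side is N-1, which is below N for any N, matching the total speedup of 4.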
How Complex is the Arbiter?
For each packet, need to check k memory addresses for potential conflicts
Need to maintain the bitmap for scheduled departures from memories
Scheduling is done sequentially, O(N)
Communication from linecards is minimal
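One way to picture the bookkeeping described above is a per-slot bitmap of memories with a scheduled read, with one bit per memory. The class below is a hypothetical sketch, not the arbiter used by the authors: it assumes one access per memory per slot, takes the bitmap of memories already written in the current slot as an argument, and picks the lowest-numbered free memory.

```python
class BitmapArbiter:
    """Tracks scheduled departures as integer bitmaps, one bit per memory."""

    def __init__(self, n_memories):
        self.full_mask = (1 << n_memories) - 1
        self.reads = {}  # slot -> bitmap of memories with a read scheduled

    def place(self, t, dt, written_now):
        """Pick a memory for a cell arriving at t and departing at dt.

        written_now is the bitmap of memories already written in slot t.
        Returns the memory index, or None if every memory conflicts.
        """
        busy = written_now | self.reads.get(t, 0) | self.reads.get(dt, 0)
        free = ~busy & self.full_mask
        if free == 0:
            return None
        m = (free & -free).bit_length() - 1   # index of the lowest free bit
        self.reads[dt] = self.reads.get(dt, 0) | (1 << m)
        return m


arb = BitmapArbiter(4)
first = arb.place(0, 3, 0)             # no conflicts yet
second = arb.place(0, 3, 1 << first)   # same slot, same departure time
print(first, second)  # 0 1
```

Cells arriving in the same slot are placed one after another, which is the sequential O(N) behaviour noted above; the conflict check itself is a few bitwise operations over k bits.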
Summary: New Results, Previous Architectures, Comparison

#    Name                  Fabric Type  Num. Mem.  BW of Mem.  Total BW   Xbar BW  Arbiter        Emulate (QoS)
0    OQ                    Bus          N          (N+1)R      N(N+1)R    -        None           Yes
1    Shared Mem.           Bus          1          2NR         2NR        -        Simple         -
2    IQ                    Xbar         N          2R          2NR        NR       Max. Matching  No
3    IQ* (with speedup)    Xbar         2N         3R          6NR        3NR      Simple         Yes – for FIFO, Leaky Bucket Traffic
4    CIOQ                  Xbar         2N         3R          6NR        2NR      Complex        Yes
5    PSM                   Bus          k          4NR/k       4NR        -        Simple         Yes
6a   DSM-I                 Xbar         N          4R          4NR        5NR      Complex        Yes
6b   DSM-II                Xbar         N          4R          4NR        8NR      Simple         Yes
6c   DSM-III               Xbar         N          6R          6NR        6NR      Simple         Yes
7a   PPS-OQ                Clos         Nk         3R(N+1)/k   3N(N+1)R   -        Simple         Yes
7b   PPS – Shared Memory   Clos         Nk         6NR/k       6NR        -        Simple         Yes
7c   PPS – Shared Memory   Clos         Nk         4NR/k       4NR        -        Simple         Yes (FIFO)
DSM Router Variants (Trading Arbiter Complexity with Memory Speed)

No. Mem.   BW per Mem.   Total BW   Xbar speed   Emulate?   Arbiter
N          3R            3NR        4NR          FIFO       Complex
N          3R            3NR        6NR          FIFO       Simple
N          4R            4NR        4NR          FIFO       Simple
N          4R            4NR        5NR          QoS        Complex
N          4R            4NR        8NR          QoS        Simple
N          6R            6NR        6NR          QoS        Simple
Input Queued Router
[Figure: Input Queued Router. Arriving packets on ports 1..N at rate R, N memories, an arbiter, and departing packets on ports 1..N at rate R.]

Property                       Value
Interconnect Fabric            Crossbar
Interconnect Number            One
Interconnect Implementation    Merged
Memory Physical Location       Distributed
Memory Number                  Many
Memory Sharing Allowed         No

Number of Memories   BW per Memory   Total BW   Emulate?
N                    3R              3NR        FIFO – Leaky Bucket Traffic
N                    3R              3NR        QoS – Leaky Bucket Cons.
Parallel Packet Switch

Property                       Value
Interconnect Fabric            Clos Network
Interconnect Number            Two
Interconnect Implementation    Separated
Memory Physical Location       Distributed
Memory Number                  Many
Memory Sharing Allowed         Yes

No. Mem.   BW per Memory   Total BW    Xbar speed   Emulate?
Nk         2R(N+1)/k       2N(N+1)R    -            FIFO
Nk         3R(N+1)/k       3N(N+1)R    -            QoS

[Figure: Parallel Packet Switch with k=3. Arriving packets on ports 1..4 at rate R enter demultiplexors, are spread at rate R/k over three OQ switches, and are recombined by multiplexors onto departing ports 1..4 at rate R.]
Comparing DSM to CIOQ Routers
• DSM routers are less complex than CIOQ routers
– Lower requirements on memories
– Simpler scheduling algorithm
– Slightly higher crossbar bandwidth
• Two problems:
– Departure times must be determined centrally
– Scheduler is sequential

                 CIOQ        DSM
Num. Mem.        2N          N
Total Mem. BW    6NR         4NR
Xbar BW          4R          5R
Buffer Size      NR x RTT    << NR x RTT