A Deficit Round Robin 20MB/s Layer 2 Switch
Muraleedhara Navada, Francois Labonte
Fairness in Switches

• How to provide fair bandwidth allocation at the output link? A simple FIFO favors greedy flows.
• Separate flows into FIFOs at the output:
  – Bit-by-bit fair queuing serves the flows one bit at a time
  – Weighted Fair Queuing (WFQ) allows different weights for flows
  – Packetized Weighted Fair Queuing (aka PGPS) calculates a departure time for each packet
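The PGPS idea above can be sketched as follows. This is a simplified illustration, not the switch's implementation: the virtual clock is approximated by the finish time of the last packet served rather than a full GPS simulation, and all names are illustrative.

```python
import heapq

class PacketizedWFQ:
    """Simplified PGPS sketch: each packet gets a virtual finish time
    F = max(V, F_prev_of_flow) + length / weight, and packets are
    served in increasing F order."""

    def __init__(self):
        self.finish = {}   # last finish time per flow
        self.vtime = 0.0   # approximate virtual clock
        self.heap = []     # (finish_time, seq, flow, length)
        self.seq = 0       # tie-breaker to keep heap ordering stable

    def enqueue(self, flow, length, weight):
        start = max(self.vtime, self.finish.get(flow, 0.0))
        f = start + length / weight
        self.finish[flow] = f
        heapq.heappush(self.heap, (f, self.seq, flow, length))
        self.seq += 1

    def dequeue(self):
        f, _, flow, length = heapq.heappop(self.heap)
        self.vtime = f   # advance the (approximate) virtual clock
        return flow, length
```

With flow A at weight 2 and flow B at weight 1, two queued A packets are served before one B packet of the same length, since A's finish times grow half as fast.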
[Figure: Output Queued Switch: per-flow queues of 50- and 150-byte packets sharing one output link under round-robin bit-by-bit allocation]
Deficit Round Robin

• Packetized Weighted Fair Queuing is complicated to implement
• Deficit Round Robin keeps track of credits for each flow:
  – A flow sends according to its credits
  – Credits are added according to the flow's weight
  – Essentially PWFQ at a coarser level
[Figure: DRR example over time: per-flow queues of 50- and 150-byte packets drain as each flow's credit counter is replenished per round and charged per packet sent]
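The credit scheme above is essentially the classic Deficit Round Robin algorithm of Shreedhar and Varghese. A minimal sketch, with per-round quanta standing in for weight-proportional credits (names are illustrative, not from the hardware design):

```python
from collections import deque

def drr_schedule(queues, quantum, rounds):
    """Deficit Round Robin sketch: `queues` maps flow -> deque of packet
    lengths; each backlogged flow earns quantum[flow] credits per round
    and sends packets while the head packet fits in its credits."""
    deficit = {f: 0 for f in queues}
    served = []
    for _ in range(rounds):
        for flow, q in queues.items():
            if not q:
                continue
            deficit[flow] += quantum[flow]
            while q and q[0] <= deficit[flow]:
                pkt = q.popleft()
                deficit[flow] -= pkt
                served.append((flow, pkt))
            if not q:
                deficit[flow] = 0   # an idled flow keeps no credit
    return served
```

With a quantum of 50, a flow of 50-byte packets sends one packet per round, while a flow with a 150-byte packet accumulates credits for three rounds before sending, matching the figure's behavior.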
NetFPGA System

• 8-port 10 MB/s duplex Ethernet
• A Control FPGA (CFPGA) handles the physical interface (MAC)
• Our design targets both User FPGAs (UFPGAs)
[Figure: NetFPGA block diagram: the CFPGA connects the 10 MB/s Ethernet ports to UFPGA0 and UFPGA1, each FPGA with its own 1 MB SRAM]
Design Considerations

• 4 MACs behind each of the 8 ports
• Each flow is a unique Source Address-Destination Address pair: ~1024 flows
• Split across FPGAs:
  – Each UFPGA reads incoming packets from different ports (0-3 and 4-7)
  – Tradeoff between memory storage and fairness across all flows
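The flow count follows from the MAC arithmetic: 4 MACs behind each of 8 ports gives 32 MACs, and a flow per source-destination pair gives 32 x 32 = 1024 flows. A sketch of one possible flow-ID encoding (the index scheme is an assumption, not the deck's exact mapping):

```python
def flow_id(sa_index, da_index):
    """Map a (source, destination) MAC-index pair to a flow ID.
    With 32 MACs total, 5 bits per index cover 32 * 32 = 1024 flows.
    The bit packing here is illustrative."""
    assert 0 <= sa_index < 32 and 0 <= da_index < 32
    return (sa_index << 5) | da_index
```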
Memory Buffer Allocation

• Static partitioning of the 1 MB SRAM across 512 flows gives 2 kbytes per flow, less than 2 maximum-size packets
• Need more dynamic allocation:
  – Segments: a smaller size means less fragmentation, but more pointer and list-handling overhead
  – 128 bytes was chosen
  – Keep a free-segments list
  – Save on-chip only a pointer to the head and tail of each flow
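The segment scheme above can be sketched as a free-list allocator: the 1 MB SRAM is divided into 128-byte segments chained by next-pointers, and only each flow's head and tail pointers need to live on-chip. All names are illustrative, not from the design.

```python
SEGMENT_SIZE = 128
NUM_SEGMENTS = (1 << 20) // SEGMENT_SIZE   # 1 MB SRAM -> 8192 segments

class SegmentPool:
    """Sketch of the free-segment-list allocator."""

    def __init__(self):
        self.free = list(range(NUM_SEGMENTS))   # free-segments list
        self.next = [None] * NUM_SEGMENTS       # per-segment next pointer

    def alloc_packet(self, length):
        """Allocate enough segments for one packet; return (head, tail)."""
        nsegs = -(-length // SEGMENT_SIZE)      # ceiling division
        if len(self.free) < nsegs:
            raise MemoryError("out of segments")
        segs = [self.free.pop() for _ in range(nsegs)]
        for a, b in zip(segs, segs[1:]):        # chain the segments
            self.next[a] = b
        return segs[0], segs[-1]

    def free_packet(self, head, tail):
        """Return a packet's segment chain to the free list."""
        seg = head
        while True:
            nxt = self.next[seg]
            self.next[seg] = None
            self.free.append(seg)
            if seg == tail:
                break
            seg = nxt
```

A 300-byte packet consumes three 128-byte segments, so a maximum-size Ethernet frame of ~1518 bytes takes twelve, wasting at most one partial segment per packet.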
[Figure: 128-byte segments in SRAM holding pieces of packets P1-P6, chained into per-flow linked lists and interleaved with free segments]
MAC Address Learning

• Instead of being told which MAC addresses belong to which port, learn them from the source addresses of incoming packets
  – Note that our split-FPGA design (reading from different ports) requires the two UFPGAs to communicate the MACs they learn
• When the destination MAC has not been learned yet, broadcast (send to all other ports)
• So MAC learning implies broadcast capability
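The learn-and-flood behavior above can be sketched in a few lines. This is the standard Layer 2 learning bridge logic; the table type and port numbering are assumptions, not the hardware implementation.

```python
class MacLearningTable:
    """Sketch of source-address learning: record which port each
    source MAC arrived on; flood unknown destinations to all
    other ports."""

    def __init__(self, num_ports=8):
        self.num_ports = num_ports
        self.table = {}   # MAC -> port it was learned on

    def handle(self, src_mac, dst_mac, in_port):
        self.table[src_mac] = in_port          # learn the source
        if dst_mac in self.table:
            return [self.table[dst_mac]]       # known: forward to one port
        # unknown destination: broadcast to every other port
        return [p for p in range(self.num_ports) if p != in_port]
```

The first frame from A to B floods all seven other ports; once B replies (and is learned), later frames to B go out a single port.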
Read Operation
[Figure: read datapath: the Master Control coordinates the Packet Memory Manager, MAC Learning / Flow Assignment, DRR Engine, and Control Handler around the 1 MB SRAM and the CFPGA interface; signals include DA/SA, Flow ID, Flow Tail, Length/pointer, Read/port, and Share SA]
Write Operation
[Figure: write datapath: the same modules exchange Head/length, Next head/length/latency, Write/port, Port REQ, Port GNT, and Data Ready signals between the DRR Engine, Packet Memory Manager, 1 MB SRAM, and the CFPGA interface]
DRR Engine

• How to handle 512 flows and stay work-conserving:
  – Only one flow is active at any time
  – DRR allocation happens on dequeuing
  – FIFOs contain the next flow to be serviced for each port
• Statistics per flow:
  – Weight
  – Latency
  – Bytes sent
  – Packets sent
  – Packets active
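The bullets above can be sketched as a per-port scheduling loop: each output port keeps a FIFO of backlogged flow IDs, and credits are added when a flow reaches the head of its port FIFO, i.e. on dequeue, which keeps the scheduler work-conserving. Field names and the fixed quantum are assumptions, not the 160-bit flow record's actual layout.

```python
from collections import deque

class DRREngine:
    """Sketch of the work-conserving per-port DRR loop."""

    def __init__(self):
        self.port_fifo = {p: deque() for p in range(8)}   # pending flows
        self.flows = {}   # flow_id -> per-flow state and statistics

    def enqueue(self, port, flow_id, length):
        f = self.flows.setdefault(flow_id, {
            "queue": deque(), "credits": 0, "quantum": 128,
            "bytes_sent": 0, "packets_sent": 0})
        if not f["queue"]:                        # newly backlogged flow
            self.port_fifo[port].append(flow_id)
        f["queue"].append(length)

    def service(self, port):
        """Serve the next flow on this port; return the packets sent."""
        if not self.port_fifo[port]:
            return []
        flow_id = self.port_fifo[port].popleft()
        f = self.flows[flow_id]
        f["credits"] += f["quantum"]              # DRR credit on dequeue
        sent = []
        while f["queue"] and f["queue"][0] <= f["credits"]:
            pkt = f["queue"].popleft()
            f["credits"] -= pkt
            f["bytes_sent"] += pkt
            f["packets_sent"] += 1
            sent.append(pkt)
        if f["queue"]:
            self.port_fifo[port].append(flow_id)  # still backlogged
        else:
            f["credits"] = 0
        return sent
```

A flow whose head packet exceeds its credits is simply re-appended to the port FIFO, so the port moves on to another backlogged flow instead of idling.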
[Figure: DRR Engine structure: flow data in a 512 x 160-bit SRAM, with one FIFO of pending flows per output port (Port 0 FIFO through Port 7 FIFO)]