Analyzing Single Buffered Routers Sundar Iyer, Rui Zhang, Nick McKeown (sundaes, rzhang, nickm)@stanford.edu High Performance Networking Group Departments of Electrical Engineering & Computer Science, Stanford University


High Performance Switching and Routing
Telecom Center Workshop: Sept 4, 1997.


What is an Ideal Router?

• Output Queued (OQ) switches are ideal but not practical
  – They minimize the delay faced by a packet
  – They can give QoS guarantees
  – The bandwidth to each output is NR; the total bandwidth is N²R
  – The cost and power consumption are prohibitive

[Figure: Output Queued Switch. Arriving packets on ports 1..N at rate R pass through an interconnect (NR to each output) into memory with total bandwidth N²R, then depart on ports 1..N at rate R.]

CIOQ Models

• CIOQ switches are better but still not practical
  – They can emulate OQ switches
  – They need a bandwidth of only 2NR
  – They have high computational complexity
  – The model does not capture many different architectures

[Figure: Input Queued Switch (links at R; labeled BW: NR) and Combined Input-Output Queued Switch (links at 2R; labeled BW: 2NR). Each shows N memories, an arbiter, and arriving and departing packets on ports 1..N at rate R.]

The Single Buffered Router Model

• Single Buffered (SB) routers buffer packets only once
• The interconnects may be
  – physically separate or merged
  – one of the interconnects may be optional
• The memory can be
  – centralized or distributed
  – one or many
  – reserved or shared amongst all ports

[Figure: The Single Buffered Router model. Arriving packets on ports 1..N at rate R pass through an interconnect, the memory, and a second interconnect, then depart on ports 1..N at rate R.]

Why a New Model for Routers?

• SB routers comprise a broader class of routers
  – They replace the CIOQ model
  – They also include other interesting router architectures, such as shared memory routers, parallel packet switches, etc.
• With this model we can compare these routers to an ideal router and answer
  – Does a router give me quality of service?
  – Can a router guarantee me 100% throughput?

How to Compare Routers?

[Figure: An OQ switch (interconnect at NR, memory bandwidth N²R) shown beside any SB switch (interconnect, memory, interconnect), both with ports 1..N at rate R. The figure asks whether the SB switch can emulate the OQ switch: yes or no?]

A Modified Pigeon Hole Principle

– Consider the following:
  – Only one pigeon can enter or leave a hole in a given time
  – A pigeon decides when it wants to leave
  – A pigeonhole may contain many pigeons over time

– How many pigeonholes do we need so that departing pigeons are guaranteed to be able to leave, and arriving pigeons are guaranteed a pigeonhole?
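One way to read this question is as a counting bound: if an arriving pigeon must avoid several sets of busy holes, then one hole more than the worst-case total of those sets always leaves a free hole. The sketch below illustrates only that counting step. The function names `holes_needed` and `pick_free_hole` are hypothetical and do not come from the slides.

```python
def holes_needed(blocked_set_sizes):
    """Smallest number of holes that guarantees a free one.

    blocked_set_sizes: worst-case sizes of the sets of holes a pigeon
    must avoid. In the worst case the sets are disjoint, so their
    sizes add up, and one extra hole is always left over.
    """
    return sum(blocked_set_sizes) + 1


def pick_free_hole(num_holes, *busy_sets):
    """Return the lowest-numbered hole outside every busy set, or None."""
    busy = set().union(*busy_sets)
    for hole in range(num_holes):
        if hole not in busy:
            return hole
    return None


# Two busy sets of sizes 2 and 3 need 2 + 3 + 1 holes in the worst case.
print(holes_needed([2, 3]))
print(pick_free_hole(6, {0, 1}, {2, 3, 4}))
```

With one hole fewer than the bound, the same busy sets can cover every hole, and `pick_free_hole` returns `None`.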


The Constraint Set Technique

• A technique to analyze single buffered routers
  1. Determine each packet’s departure time
  2. Define the constraints on the system for both inputs and outputs (if applicable): buffer, fabrics, speedup, etc.
  3. Apply the Pigeon Hole Principle
• Constraint Sets can be used to analyze
  – Parallel Shared Memory Switches
  – Distributed Shared Memory Switches (bus-based or crossbar-based)
  – Input Queued Switches
  – Parallel Packet Switches
  ... and, we expect, in general any Single Buffered router

Examples of Constraints

Physical Constraints

• These are limitations imposed by the hardware
  – Memory (e.g., Parallel Packet Switch): a memory can’t be accessed more than a certain number of times in a time period
  – Bus (e.g., Centralized Shared Memory): the same bus can’t be used simultaneously for more than a certain number of packets
  – Crossbar (e.g., Distributed Shared Memory): each input and output may be busy only once in a scheduling period

Logical Constraints

• These are requirements imposed on the switch
  – Time (Input Queued Router): a packet must face a delay of no more than “p” time slots with respect to an ideal switch

An Example: Parallel Shared Memory (PSM) Router

[Figure: Parallel Shared Memory router. Arriving and departing packets on ports 1..N at rate R share a bus (labeled 2NR) to a DRAM consisting of k memories, controlled by an arbiter. Read Access Time = T, Write Access Time = T.]

Interconnect Fabric:            Bus
Interconnect Number:            One/Two
Interconnect Implementation:    Separated/Merged
Memory Physical Location:       Centralized
Memory Number:                  One/Many
Memory Sharing Allowed:         Yes

Number of Memories   BW per Memory   Total BW   Emulate?
k                    3NR/k           3NR        FIFO
k                    4NR/k           4NR        QoS

Question: Can a PSM Router emulate an OQ Router?

– Let a cell arrive at input “i” at time “t” and be destined to depart from output port “j” at time “DT”

– Such a cell must not be written to memories which
  1. Are used to write the other N-1 cells arriving at t.
  2. Are used to read the N cells departing at t.
  3. Will be used to read the other N-1 cells departing at DT.

– There are three constraint sets, which together exclude at most (N-1) + N + (N-1) = 3N-2 memories

– By the pigeonhole principle, 3N memories at rate R, or a memory bandwidth of 3NR, is sufficient
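The three constraint sets can be checked with a small conflict simulation. This is a minimal sketch under stated assumptions, not the authors' arbiter: time is slotted, each memory allows one access (read or write) per slot, every input receives a cell in every slot, destinations are uniformly random, and departure times follow a FIFO output queue. The name `simulate_psm` and its parameters are hypothetical.

```python
import random


def simulate_psm(n_ports, n_memories, n_slots, seed=0):
    """Return True if every arriving cell finds a conflict-free memory."""
    rng = random.Random(seed)
    next_free = [0] * n_ports   # earliest unused departure slot per output
    reads_at = {}               # slot -> memories already scheduled for a read

    for t in range(n_slots):
        # Constraint set 2: memories read in slot t by departing cells.
        busy_now = set(reads_at.pop(t, set()))
        for _ in range(n_ports):            # one arrival per input
            out = rng.randrange(n_ports)
            # FIFO departure time (assumed): next free slot on this output.
            dt = max(t + 1, next_free[out])
            next_free[out] = dt + 1
            # Constraint set 3: memories already due to be read at DT.
            busy_at_dt = reads_at.setdefault(dt, set())
            free = [m for m in range(n_memories)
                    if m not in busy_now and m not in busy_at_dt]
            if not free:
                return False
            chosen = free[0]
            busy_now.add(chosen)    # constraint set 1: writes in slot t
            busy_at_dt.add(chosen)  # reserve the read at DT
    return True


# 3N - 1 memories always suffice under these assumptions (N = 4 here).
print(simulate_psm(n_ports=4, n_memories=11, n_slots=200))
```

At most 3N-2 memories are ever excluded for one cell, so 3N-1 memories cannot fail in this model. With a single memory, the second cell arriving in the first slot already has nowhere to go.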


Distributed Shared Memory Router

Interconnect Fabric:            Bus/Crossbar
Interconnect Number:            One/Two
Interconnect Implementation:    Separated/Merged
Memory Physical Location:       Distributed
Memory Number:                  Many
Memory Sharing Allowed:         Yes

[Figure: Distributed Shared Memory router. N memories with arriving and departing packets on ports 1..N at rate R. Arbiters control the fabric, with links at S1R (BW: S1NR) and S2R (BW: S2NR).]

No. Mem.   BW per Mem.   Total BW   Xbar speed   Emulate?
N          4R            4NR        4NR          FIFO
N          6R            6NR        6NR          QoS

Question: Can a DSM Router emulate an OQ Router?

– Let a cell arrive at input “i” at time “t” and be destined to depart from output port “j” at time “DT”

– The cell can be written to any intermediate port “x” such that
  1. The edge (i,x) is available at time t. Since no more than N-1 other cells contend to write at time t, at most (N-1)/s1 vertices are unavailable.
  2. The edge (x,j) is available at time DT. Since no more than N-1 other cells contend to leave at time DT, at most (N-1)/s2 vertices are unavailable.

– There are two constraint sets

• By the pigeonhole principle, it suffices that (N-1)/s1 + (N-1)/s2 < N.

• Hence s1 = s2 = 2, i.e. s = s1 + s2 = 4, is enough.
• A bandwidth of 4NR is sufficient
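The speedup choice can be checked numerically. The sketch below assumes one particular reading of the two constraint sets: an intermediate port becomes unusable for writing once it has accepted s1 cells in the slot, so the N-1 competing writes can rule out at most floor((N-1)/s1) ports, and likewise floor((N-1)/s2) ports for the competing reads at DT. Under that assumption, a cell always finds a port when the two counts together stay below N. The function names are hypothetical.

```python
def free_ports_lower_bound(n, s1, s2):
    """Worst-case number of intermediate ports left for one cell."""
    blocked_on_write = (n - 1) // s1   # ports saturated by other writes at t
    blocked_on_read = (n - 1) // s2    # ports saturated by other reads at DT
    return n - blocked_on_write - blocked_on_read


def speedup_suffices(s1, s2, max_n=1024):
    """Check the bound for every port count from 1 to max_n."""
    return all(free_ports_lower_bound(n, s1, s2) >= 1
               for n in range(1, max_n + 1))


print(speedup_suffices(2, 2))   # s1 = s2 = 2, the case on the slide
print(speedup_suffices(1, 1))   # no speedup
```

With s1 = s2 = 2 the two blocked counts add up to at most N-1, so one port always remains. With no speedup the bound already fails for N = 2.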


How Complex is the Arbiter?

For each packet, need to check k memory addresses for potential conflicts

Need to maintain the bitmap for scheduled departures from memories

Scheduling is done sequentially, O(N)

Communication from linecards is minimal
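The bookkeeping described above can be sketched with one bitmap per time slot, where bit m is set when memory m already has an access in that slot. This is an illustrative sketch only: the class `BitmapArbiter` and its methods are hypothetical, Python integers stand in for hardware bit vectors, and each memory is assumed to allow one access per slot.

```python
class BitmapArbiter:
    """Sequential arbiter that tracks scheduled departures as bitmaps."""

    def __init__(self, k):
        self.full = (1 << k) - 1   # bitmap with all k memories set
        self.departures = {}       # slot -> bitmap of memories with a read
        self.busy_now = 0          # memories accessed in the current slot

    def start_slot(self, t):
        """Begin slot t: memories read in this slot are busy from the start."""
        self.busy_now = self.departures.pop(t, 0)

    def place(self, dt):
        """Pick a memory for a cell departing at slot dt, or return None."""
        candidates = self.full & ~self.busy_now & ~self.departures.get(dt, 0)
        if candidates == 0:
            return None
        lowest = candidates & -candidates      # isolate the lowest set bit
        self.busy_now |= lowest                # the write uses it now
        self.departures[dt] = self.departures.get(dt, 0) | lowest
        return lowest.bit_length() - 1         # index of the chosen memory


arbiter = BitmapArbiter(k=3)
arbiter.start_slot(0)
print(arbiter.place(5), arbiter.place(5), arbiter.place(6), arbiter.place(7))
```

Each call to `place` handles one cell, so a slot with N arrivals takes N sequential calls, which matches the O(N) scheduling noted above.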


Summary: New Results, Previous Architectures, Comparison

#    Type                  Fabric  Num. Mem.  BW of Mem.  Total BW   Xbar BW  Arbiter        Emulate (QoS)
0    OQ                    Bus     N          (N+1)R      N(N+1)R    -        None           Yes
1    Shared Mem.           Bus     1          2NR         2NR        -        Simple         -
2    IQ                    Xbar    N          2R          2NR        NR       Max. Matching  No
3    IQ* (with speedup)    Xbar    2N         3R          6NR        3NR      Simple         Yes, for FIFO, Leaky Bucket Traffic
4    CIOQ                  Xbar    2N         3R          6NR        2NR      Complex        Yes
5    PSM                   Bus     k          4NR/k       4NR        -        Simple         Yes
6a   DSM-I                 Xbar    N          4R          4NR        5NR      Complex        Yes
6b   DSM-II                Xbar    N          4R          4NR        8NR      Simple         Yes
6c   DSM-III               Xbar    N          6R          6NR        6NR      Simple         Yes
7a   PPS - OQ              Clos    Nk         3R(N+1)/k   3N(N+1)R   -        Simple         Yes
7b   PPS - Shared Memory   Clos    Nk         6NR/k       6NR        -        Simple         Yes
7c   PPS - Shared Memory   Clos    Nk         4NR/k       4NR        -        Simple         Yes (FIFO)

Backups


DSM Router Variants (Trading Arbiter Complexity with Memory Speed)

No. Mem.   BW per Mem.   Total BW   Xbar speed   Emulate?   Arbiter
N          3R            3NR        4NR          FIFO       Complex
N          3R            3NR        6NR          FIFO       Simple
N          4R            4NR        4NR          FIFO       Simple
N          4R            4NR        5NR          QoS        Complex
N          4R            4NR        8NR          QoS        Simple
N          6R            6NR        6NR          QoS        Simple

Input Queued Router

[Figure: Input Queued router. N memories, one per input, with arriving and departing packets on ports 1..N at rate R, connected through a crossbar controlled by an arbiter.]

Interconnect Fabric:            Crossbar
Interconnect Number:            One
Interconnect Implementation:    Merged
Memory Physical Location:       Distributed
Memory Number:                  Many
Memory Sharing Allowed:         No

Number of Memories   BW per Memory   Total BW   Emulate?
N                    3R              3NR        FIFO – Leaky Bucket Traffic
N                    3R              3NR        QoS – Leaky Bucket Cons.

Parallel Packet Switch

Interconnect Fabric:            Clos Network
Interconnect Number:            Two
Interconnect Implementation:    Separated
Memory Physical Location:       Distributed
Memory Number:                  Many
Memory Sharing Allowed:         Yes

No. Mem.   BW per Memory   Total BW    Xbar speed   Emulate?
Nk         2R(N+1)/k       2N(N+1)R    -            FIFO
Nk         3R(N+1)/k       3N(N+1)R    -            QoS

[Figure: Parallel Packet Switch with four input ports and four output ports at rate R. A demultiplexor at each input spreads arriving packets over k = 3 OQ switches on links of rate R/k, and a multiplexor at each output collects departing packets from links of rate R/k.]

Comparing DSM to CIOQ Routers

• DSM routers are less complex than CIOQ routers
  – Lower requirements on memories
  – Simpler scheduling algorithm
  – Slightly higher crossbar bandwidth

• Two problems:
  – Departure times must be determined centrally
  – Scheduler is sequential

                 CIOQ        DSM
Num. Mem.        2N          N
Total Mem. BW    6NR         4NR
Xbar BW          4R          5R
Buffer Size      NR x RTT    << NR x RTT