Sigcomm Tutorial


High Performance Switches and Routers: Theory and Practice

    Sigcomm 99, August 30, 1999, Harvard University

    Nick McKeown and Balaji Prabhakar
    Departments of Electrical Engineering and Computer Science
    [email protected] [email protected]

    Copyright 1999. All Rights Reserved.


    Tutorial Outline

    Introduction: What is a Packet Switch?

    Packet Lookup and Classification: Where does a packet go next?

    Switching Fabrics: How does the packet get there?

    Output Scheduling: When should the packet leave?


    Introduction

    What is a Packet Switch?

    Basic Architectural Components

    Some Example Packet Switches

    The Evolution of IP Routers


    Basic Architectural Components

    Control: Routing, Admission Control, Congestion Control, Reservation.

    Datapath (per-packet processing): Policing, Switching, Output Scheduling.


    Basic Architectural Components
    Datapath: per-packet processing

    1. Forwarding decision (forwarding table lookup, one per input port)
    2. Interconnect
    3. Output scheduling


    Where high performance packet switches are used

    Enterprise WAN access

    & Enterprise Campus Switch

    - Carrier Class Core Router

    - ATM Switch

    - Frame Relay Switch

    The Internet Core

    Edge Router


    Introduction

    What is a Packet Switch?

    Basic Architectural Components

    Some Example Packet Switches

    The Evolution of IP Routers


    ATM Switch

    Lookup cell VCI/VPI in VC table.

    Replace old VCI/VPI with new.

    Forward cell to outgoing interface.

    Transmit cell onto link.


    Ethernet Switch

    Lookup frame DA in forwarding table.

    If known, forward to correct port.

    If unknown, broadcast to all ports.

    Learn SA of incoming frame.

    Forward frame to outgoing interface.

    Transmit frame onto link.
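    The per-frame decision above can be sketched in a few lines of C. This is an illustrative sketch only; mac_learn and mac_lookup are hypothetical stand-ins for whatever lookup structure (CAM, hash table, trie) the later slides discuss.

    #include <stdint.h>

    #define FLOOD (-1)   /* send out every port except the input port */

    /* Hypothetical forwarding-table API: returns output port, or -1 if unknown. */
    int  mac_lookup(const uint8_t da[6]);
    void mac_learn(const uint8_t sa[6], int in_port);

    /* Per-frame datapath decision for a learning bridge / Ethernet switch. */
    int bridge_forward(const uint8_t da[6], const uint8_t sa[6], int in_port)
    {
        mac_learn(sa, in_port);             /* learn SA of incoming frame        */
        int port = mac_lookup(da);          /* lookup frame DA                   */
        return (port >= 0) ? port : FLOOD;  /* known: forward; unknown: broadcast */
    }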


    IP Router

    Lookup packet DA in forwarding table.

    If known, forward to correct port.

    If unknown, drop packet.

    Decrement TTL, update header Cksum.

    Forward packet to outgoing interface.

    Transmit packet onto link.
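    The TTL decrement and checksum update can be done incrementally, without recomputing the checksum over the whole header. A minimal sketch, assuming the usual IPv4 header layout and the RFC 1141-style incremental update (not taken from the slides):

    #include <stdint.h>
    #include <arpa/inet.h>

    struct ipv4_hdr {                  /* only the fields used here */
        uint8_t  ver_ihl, tos;
        uint16_t tot_len, id, frag_off;
        uint8_t  ttl, proto;
        uint16_t check;
        uint32_t saddr, daddr;
    };

    /* Decrement TTL and patch the header checksum incrementally. */
    static void decrement_ttl(struct ipv4_hdr *ip)
    {
        uint32_t sum;
        ip->ttl--;
        sum = ip->check + htons(0x0100);            /* TTL is the high byte of its 16-bit word */
        ip->check = (uint16_t)(sum + (sum >> 16));  /* fold the carry back in */
    }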


    Introduction

    What is a Packet Switch?

    Basic Architectural Components

    Some Example Packet Switches

    The Evolution of IP Routers


    First-Generation IP Routers

    (Figure: a shared backplane (bus); a central CPU with buffer memory; line interfaces (MAC + DMA) move every packet across the bus to the CPU for forwarding.)


    Second-Generation IP Routers

    (Figure: the central CPU and buffer memory remain on the shared bus, but each line card (MAC + DMA) now has local buffer memory, so packets can be forwarded card-to-card without passing through the central CPU.)


    Third-Generation Switches/Routers

    (Figure: line cards (MAC + local buffer memory) and a CPU card connect through a switched backplane of point-to-point line interfaces instead of a shared bus.)


    Fourth-Generation Switches/Routers
    Clustering and Multistage

    (Figure: line cards numbered 1-32 interconnected through a multistage cluster of switch elements.)


    Packet Switches

    References

    J. Giacopelli, M. Littlewood, W. D. Sincoskie, "Sunshine: A high performance self-routing broadband packet switch architecture", ISS 90.

    J. S. Turner, "Design of a Broadcast packet switching network", IEEE Trans. Comm., June 1988, pp. 734-743.

    C. Partridge et al., "A Fifty Gigabit per second IP Router", IEEE Trans. Networking, 1998.

    N. McKeown, M. Izzard, A. Mekkittikul, W. Ellersick, M. Horowitz, "The Tiny Tera: A Packet Switch Core", IEEE Micro Magazine, Jan-Feb 1997.


    Tutorial Outline

    Introduction: What is a Packet Switch?

    Packet Lookup and Classification: Where does a packet go next?

    Switching Fabrics: How does the packet get there?

    Output Scheduling: When should the packet leave?


    Basic Architectural Components
    Datapath: per-packet processing

    1. Forwarding decision (forwarding table lookup, one per input port)
    2. Interconnect
    3. Output scheduling


    Forwarding Decisions

    ATM and MPLS switches: Direct Lookup

    Bridges and Ethernet switches: Associative Lookup, Hashing, Trees and tries

    IP Routers: Caching, CIDR, Patricia trees/tries, Other methods

    Packet Classification


    ATM and MPLS Switches

    Direct Lookup

    (Figure: the incoming VCI is used directly as the memory address; the table entry returns the associated data (Port, VCI).)
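    A sketch of the direct lookup in C: the VCI indexes the VC table directly, so the forwarding decision is a single memory reference. The 16-bit table size is an assumption for illustration.

    #include <stdint.h>

    struct vc_entry { uint16_t out_port; uint32_t new_vci; };

    /* The incoming VCI is used directly as the table address. */
    static struct vc_entry vc_table[1 << 16];    /* e.g. a 16-bit VCI space */

    static struct vc_entry atm_lookup(uint32_t vci)
    {
        return vc_table[vci & 0xFFFF];           /* one memory read: (Port, new VCI) */
    }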


    Forwarding Decisions

    ATM and MPLS switches: Direct Lookup

    Bridges and Ethernet switches: Associative Lookup, Hashing, Trees and tries

    IP Routers: Caching, CIDR, Patricia trees/tries, Other methods

    Packet Classification


    Bridges and Ethernet Switches

    Associative Lookups

    (Figure: an associative memory (CAM) holds network address / associated data pairs; the 48-bit search data is compared against all entries at once, returning Hit?, a log2 N-bit address, and the associated data.)

    Advantages:

    Simple

    Disadvantages

    Slow

    High Power

    Small

    Expensive


    Bridges and Ethernet Switches

    Hashing

    (Figure: a hashing function reduces the 48-bit search data to a 16-bit memory address; the memory returns Hit?, a log2 N-bit address, and the associated data.)


    Lookups Using Hashing

    An example: CRC-16 of the 48-bit search data selects one of 2^16 memory locations; each location points to a linked list of entries (here lists of 4, 2 and 3 entries) holding the associated data.
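    A sketch of the chained-hash lookup just illustrated; hash16 stands in for the CRC-16 on the slide and is an assumed helper, not defined here.

    #include <stdint.h>
    #include <string.h>

    #define BUCKETS (1 << 16)          /* 16-bit hash value, as in the example */

    struct entry {                     /* one node of a per-bucket linked list */
        uint8_t key[6];                /* 48-bit search data (e.g. a MAC address) */
        int out_port;                  /* associated data */
        struct entry *next;
    };

    static struct entry *table[BUCKETS];

    uint16_t hash16(const uint8_t key[6]);   /* e.g. CRC-16 of the key (assumed) */

    /* Walk the selected bucket's list; the number of memory references
       equals the entry's position in the list. */
    static int hash_lookup(const uint8_t key[6])
    {
        for (struct entry *e = table[hash16(key)]; e; e = e->next)
            if (memcmp(e->key, key, 6) == 0)
                return e->out_port;            /* hit */
        return -1;                             /* miss */
    }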


    Lookups Using Hashing
    Performance of the simple example

    $$E_R = \frac{1}{2}\left(1 + \left(1 + \frac{1}{N}\right)^{M}\right)$$

    Where:
    E_R = expected number of memory references
    M = number of memory addresses in table
    N = number of linked lists
    M = N


    Lookups Using Hashing

    Advantages:

    Simple

    Expected lookup time can be small

    Disadvantages

    Non-deterministic lookup time

    Inefficient use of memory


    Trees and Tries

    Binary search tree: N entries, log2 N levels of <,> comparisons.

    Binary search trie: branch on one address bit (0 or 1) per level; the example stores the entries 010 and 111 at the leaves.


    Trees and Tries

    Multiway tries

    16-ary search trie: each node holds 16 (nibble value, pointer) entries; the example keys 0000 1111 0000 and 1111 1111 1111 each take three node accesses.


    Trees and Tries
    Multiway tries

    Degree of Tree | # Mem References | # Nodes (x10^6) | Total Memory (MBytes) | Fraction Wasted (%)
    2              | 48               | 1.09            | 4.3                   | 49
    4              | 24               | 0.53            | 4.3                   | 73
    8              | 16               | 0.35            | 5.6                   | 86
    16             | 12               | 0.25            | 8.3                   | 93
    64             |  8               | 0.17            | 21                    | 98
    256            |  6               | 0.12            | 64                    | 99.5

    E_n and E_w can be written in closed form as sums over the L trie levels in terms of D and N.

    Where:
    D = degree of tree
    L = number of layers/references
    N = number of entries in table
    E_n = expected number of nodes
    E_w = expected amount of wasted memory

    Table produced from 2^15 randomly generated 48-bit addresses.


    Forwarding Decisions

    ATM and MPLS switches: Direct Lookup

    Bridges and Ethernet switches: Associative Lookup, Hashing, Trees and tries

    IP Routers: Caching, CIDR, Patricia trees/tries, Other methods

    Packet Classification


    Caching Addresses

    (Figure: a second-generation router with a central CPU and buffer memory; each line card (MAC + DMA) holds local buffer memory and an address cache. Cache hits take the fast path entirely on the line card; misses take the slow path through the CPU.)


    Caching Addresses

    LAN: average flow < 40 packets. WAN: huge number of flows.

    (Figure: cache hit rate, 0%-100%, for a cache sized at 10% of the full table.)


    IP Routers

    Class-based addresses

    (Figure: the IP address space divided into Class A, Class B, Class C and D.)

    Routing table: exact match, e.g. 212.17.9.4 matches the entry 212.17.9.0 -> Port 4.


    IP Routers

    CIDR

    Class-based: the address space 0 to 2^32 - 1 is split into fixed Class A, B, C, D blocks.

    Classless (CIDR): prefixes of arbitrary length, e.g. 65/8, 128.9/16 (a block of 2^16 addresses starting at 128.9.0.0 and containing 128.9.16.14), 142.12/19.


    IP Routers

    CIDR

    The address 128.9.16.14 is covered by 128.9/16 and by the more specific 128.9.16/20 (other prefixes in the table: 128.9.176/20, 128.9.19/24, 128.9.25/24).

    Most specific route = longest matching prefix.


    IP Routers

    Metrics for Lookups

    (Figure: the address 128.9.16.14 matched against a table of prefixes 65/8, 128.9/16, 128.9.16/20, 128.9.176/20, 128.9.19/24, 128.9.25/24, 142.12/19, each with an outgoing port; the longest matching prefix determines the port.)

    Metrics: lookup time, storage space, update time, preprocessing time.


    IP Router

    Lookup

    IPv4 unicast destination address based lookup

    (Figure: the forwarding engine takes the header of each incoming packet, performs the next-hop computation against a forwarding table of (destination, next hop) entries, and returns the next hop.)


    Need more than IPv4 unicast lookups

    Multicast
    PIM-SM: longest prefix matching on the source and group address. Try (S,G), then (*,G), then (*,*,RP); check the incoming interface.
    DVMRP: incoming interface check followed by (S,G) lookup.

    IPv6: 128-bit destination address field; exact address architecture not yet known.


    Lookup Performance Required

    Gigabit Ethernet (84B packets): 1.49 Mpps

    Line  | Line Rate | Pkt size = 40B | Pkt size = 240B
    T1    | 1.5 Mbps  | 4.68 Kpps      | 0.78 Kpps
    OC3   | 155 Mbps  | 480 Kpps       | 80 Kpps
    OC12  | 622 Mbps  | 1.94 Mpps      | 323 Kpps
    OC48  | 2.5 Gbps  | 7.81 Mpps      | 1.3 Mpps
    OC192 | 10 Gbps   | 31.25 Mpps     | 5.21 Mpps


    Size of the Routing Table

    Source: http://www.telstra.net/ops/bgptable.html


    Ternary CAMs

    Value     | Mask            | Next Hop
    10.0.0.0  | 255.0.0.0       | R1
    10.1.0.0  | 255.255.0.0     | R2
    10.1.1.0  | 255.255.255.0   | R3
    10.1.3.0  | 255.255.255.0   | R4
    10.1.3.1  | 255.255.255.255 | R4

    The associative memory compares the key against every (value, mask) pair at once; a priority encoder selects the highest-priority match and returns its next hop.
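    A software model of the same table makes the priority behaviour explicit: in a real TCAM all rows are compared in parallel and the priority encoder returns the first match, so longer prefixes are stored at lower (higher-priority) indices. A sketch:

    #include <stdint.h>
    #include <stddef.h>

    struct tcam_entry { uint32_t value, mask; int next_hop; };

    /* Entries ordered longest-prefix first, mirroring the table above. */
    static const struct tcam_entry tcam[] = {
        { 0x0A010301, 0xFFFFFFFF, 4 },   /* 10.1.3.1/32 -> R4 */
        { 0x0A010300, 0xFFFFFF00, 4 },   /* 10.1.3.0/24 -> R4 */
        { 0x0A010100, 0xFFFFFF00, 3 },   /* 10.1.1.0/24 -> R3 */
        { 0x0A010000, 0xFFFF0000, 2 },   /* 10.1.0.0/16 -> R2 */
        { 0x0A000000, 0xFF000000, 1 },   /* 10.0.0.0/8  -> R1 */
    };

    static int tcam_lookup(uint32_t dst)
    {
        for (size_t i = 0; i < sizeof(tcam) / sizeof(tcam[0]); i++)
            if ((dst & tcam[i].mask) == tcam[i].value)
                return tcam[i].next_hop;   /* first match = highest priority */
        return -1;                         /* no route */
    }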


    Binary Tries

    Example Prefixes
    a) 00001
    b) 00010
    c) 00011
    d) 001
    e) 0101
    f) 011
    g) 100
    h) 1010
    i) 1100
    j) 11110000

    (Figure: the prefixes a-j placed in a binary trie; each node branches on the next address bit, 0 or 1.)
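    A minimal sketch of longest-prefix-match lookup in a 1-bit binary trie such as the one above: walk one destination-address bit per level, remembering the last prefix seen, at a cost of up to W memory accesses.

    #include <stdint.h>
    #include <stdlib.h>

    struct trie_node {
        struct trie_node *child[2];
        int next_hop;                 /* -1 if this node stores no prefix */
    };

    static int trie_lookup(const struct trie_node *root, uint32_t dst)
    {
        int best = -1;
        const struct trie_node *n = root;
        for (int bit = 31; n != NULL; bit--) {
            if (n->next_hop >= 0)
                best = n->next_hop;   /* remember the longest prefix seen so far */
            if (bit < 0)
                break;
            n = n->child[(dst >> bit) & 1];
        }
        return best;
    }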


    Patricia Tree

    (Figure: the same prefixes in a Patricia tree; chains of one-way branches are compressed, e.g. a skip of 5 bits on the path to prefix j.)

    Example Prefixes
    a) 00001
    b) 00010
    c) 00011
    d) 001
    e) 0101
    f) 011
    g) 100
    h) 1010
    i) 1100
    j) 11110000


    Patricia Tree

    Disadvantages: many memory accesses; may need backtracking; pointers take up a lot of space.

    Advantages: general solution; extensible to wider fields.

    Avoid backtracking by storing the intermediate best-matched prefix (Dynamic Prefix Tries).

    40K entries: 2MB data structure, 0.3-0.5 Mpps [O(W) memory accesses].


    Binary search on trie levels

    (Figure: the trie levels from level 0 downward; a binary search over the levels locates the longest prefix matching P.)


    Binary search on trie levels

    Store a hash table for each prefix length to aid search at a particular trie level (lengths 8, 12, 16, 24 in this example).

    Example prefixes: 10.0.0.0/8, 10.1.0.0/16, 10.1.1.0/24, 10.1.2.0/24, 10.2.3.0/24
    Example addresses: 10.1.1.4, 10.4.4.3, 10.2.3.9, 10.2.4.8

    Hash table contents: length 8: 10; length 16: 10.1, 10.2; length 24: 10.1.1, 10.1.2, 10.2.3
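    A sketch of the basic search over per-length hash tables. The full scheme (Waldvogel et al.) also inserts markers and precomputed best matches so that a hit can safely move the search toward longer prefixes; that machinery is omitted here, and hash_probe / hash_next_hop are assumed helpers.

    #include <stdint.h>

    /* Hypothetical per-length hash tables: return whether the first `len`
       bits of the address are stored (as a prefix or marker), and its next hop. */
    int hash_probe(int len, uint32_t key);
    int hash_next_hop(int len, uint32_t key);

    /* Binary search over the prefix lengths 8, 16, 24 from the example:
       a hit moves to longer prefixes, a miss to shorter ones, so only
       O(log W) hashed probes are needed. */
    static int bsearch_levels(uint32_t dst)
    {
        static const int level[] = { 8, 16, 24 };
        int lo = 0, hi = 2, best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            uint32_t key = dst >> (32 - level[mid]);    /* top level[mid] bits */
            if (hash_probe(level[mid], key)) {
                best = hash_next_hop(level[mid], key);  /* best match so far  */
                lo = mid + 1;                           /* try longer prefixes */
            } else {
                hi = mid - 1;                           /* try shorter prefixes */
            }
        }
        return best;
    }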


    Binary search on trie levels

    Disadvantages: multiple hashed memory accesses; updates are complex.

    Advantages: scalable to IPv6.

    33K entries: 1.4MB data structure, 1.2-2.2 Mpps [O(log W) hashed accesses].


    Compacting Forwarding Tables

    1 0 0 0 1 0 1 1 1 0 0 0 1 1 1 1


    Compacting Forwarding Tables

    (Figure: bit-vector chunks 10001010 11100010 10000010 10110100 11000000 with next-hop entries (R1,0), (R2,3), (R3,7), (R4,9), (R5,0); a codeword array and a base index array (0, 13) map a bit position to its entry in the next-hop table.)


    Multi-bit Tries

    16-ary search trie: each node holds 16 (nibble value, pointer) entries, so fewer, wider memory accesses are needed; the example keys 0000 1111 0000 and 1111 1111 1111 each follow three nodes.


    Compressed Tries

    (Figure: a compressed trie with levels L8, L16 and L24.) Only 3 memory accesses.


    Routing Lookups in Hardware

    (Figure: histogram of the number of prefixes vs. prefix length.) Most prefixes are 24-bits or shorter.


    Routing Lookups in Hardware

    Prefixes up to 24 bits: a table with 2^24 = 16M entries is indexed directly by the first 24 bits of the destination address (e.g. 142.19.6 for the address 142.19.6.14); each entry holds the next hop.


    Routing Lookups in Hardware

    Prefixes longer than 24 bits: for the address 128.3.72.44, the first-table entry for 128.3.72 holds a pointer (base) into a second table of next hops; the remaining 8 bits (44) give the offset, and base + offset selects the next hop.
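    A sketch of the two-table lookup described above. The entry encoding (top bit set means "entry is a pointer into the second table") is an assumption for illustration; the published scheme uses a similar flag-plus-index format.

    #include <stdint.h>

    static uint16_t tbl24[1 << 24];       /* next hop, or pointer if top bit set  */
    static uint16_t tbl_long[1u << 23];   /* 256 entries per spilled 24-bit prefix */

    static uint16_t route_lookup(uint32_t dst)
    {
        uint16_t e = tbl24[dst >> 8];              /* first memory access          */
        if ((e & 0x8000) == 0)
            return e;                              /* prefix was <= 24 bits long   */
        uint32_t base = (uint32_t)(e & 0x7FFF) << 8;
        return tbl_long[base + (dst & 0xFF)];      /* second access: base + offset */
    }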


    Routing Lookups in Hardware

    Generalization: the first N bits index a table of 2^N entries covering prefixes up to N bits; entry i points to a 2^M-entry second-level table indexed by the next M bits, and prefixes longer than N+M bits need further levels.


    Routing Lookups in Hardware

    Advantages: 20 Mpps with 50ns DRAM; easy to implement in hardware.

    Disadvantages: large memory required (9-33MB); depends on the prefix-length distribution.

    Various compression schemes can be employed to decrease the storage requirements: e.g. carefully chosen variable-length strides, bitmap compression, etc.


    IP Router Lookups

    References

    A. Brodnik, S. Carlsson, M. Degermark, S. Pink, "Small Forwarding Tables for Fast Routing Lookups", Sigcomm 1997, pp. 3-14.

    B. Lampson, V. Srinivasan, G. Varghese, "IP lookups using multiway and multicolumn search", Infocom 1998, pp. 1248-56, vol. 3.

    M. Waldvogel, G. Varghese, J. Turner, B. Plattner, "Scalable high speed IP routing lookups", Sigcomm 1997, pp. 25-36.

    P. Gupta, S. Lin, N. McKeown, "Routing lookups in hardware at memory access speeds", Infocom 1998, pp. 1241-1248, vol. 3.

    S. Nilsson, G. Karlsson, "Fast address lookup for Internet routers", IFIP Intl Conf on Broadband Communications, Stuttgart, Germany, April 1-3, 1998.

    V. Srinivasan, G. Varghese, "Fast IP lookups using controlled prefix expansion", Sigmetrics, June 1998.


    Forwarding Decisions

    ATM and MPLS switches: Direct Lookup

    Bridges and Ethernet switches: Associative Lookup, Hashing, Trees and tries

    IP Routers: Caching, CIDR, Patricia trees/tries, Other methods

    Packet Classification


    Providing Value-Added Services

    Some examples

    Differentiated services: regard traffic from Autonomous System #33 as "platinum-grade".

    Access Control Lists: deny udp host 194.72.72.33 194.72.6.64 0.0.0.15 eq snmp.

    Committed Access Rate: rate limit WWW traffic from sub-interface #739 to 10Mbps.

    Policy-based Routing: route all voice traffic through the ATM network.


    Multi-field Packet Classification

    Given a classifier with N rules, find the action associated with the highest-priority rule matching an incoming packet.

            | Field 1           | Field 2          | Field k | Action
    Rule 1  | 152.163.190.69/21 | 152.163.80.11/32 | UDP     | A1
    Rule 2  | 152.168.3.0/24    | 152.163.0.0/16   | TCP     | A2
    ...
    Rule N  | 152.168.0.0/16    | 152.0.0.0/8      | ANY     | An
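    For reference, the brute-force classifier is a linear scan of the rules in priority order; the schemes on the following slides exist to avoid exactly this O(N) per-packet cost. A sketch with two prefix fields and a protocol field:

    #include <stdint.h>
    #include <stddef.h>

    struct rule {
        uint32_t src, src_mask;     /* field 1: source prefix      */
        uint32_t dst, dst_mask;     /* field 2: destination prefix */
        int      proto;             /* field k: protocol, -1 = ANY */
        int      action;
    };

    /* Rules are stored in priority order, so the first matching rule
       is the highest-priority match. */
    static int classify(const struct rule *r, size_t n,
                        uint32_t src, uint32_t dst, int proto)
    {
        for (size_t i = 0; i < n; i++)
            if ((src & r[i].src_mask) == r[i].src &&
                (dst & r[i].dst_mask) == r[i].dst &&
                (r[i].proto < 0 || r[i].proto == proto))
                return r[i].action;
        return -1;   /* no rule matched: default action */
    }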


    Geometric Interpretation in 2D

    (Figure: rules R1-R7 drawn as rectangles in the (Field #1, Field #2) plane; packets P1 and P2 are points. A rule such as (128.16.46.23, *) is a line, while (144.24/16, 64/24) is a rectangle.)


    Proposed Schemes (Contd.)

    Crossproducting (Srinivasan et al [Sigcomm 98])
      Pros: fast accesses; suitable for multiple fields.
      Cons: large memory requirements; suitable without caching only for classifiers with fewer than 50 rules.

    Bit-level Parallelism (Lakshman and Stiliadis [Sigcomm 98])
      Pros: suitable for multiple fields.
      Cons: large memory bandwidth required; comparatively slow lookup rate; hardware only.


    Proposed Schemes (Contd.)

    Hierarchical Intelligent Cuttings (Gupta and McKeown [HotI 99])
      Pros: suitable for multiple fields; small memory requirements; good update time.
      Cons: large preprocessing time.

    Tuple Space Search (Srinivasan et al [Sigcomm 99])
      Pros: suitable for multiple fields; the basic scheme has good update times and memory requirements.
      Cons: classification rate can be low; requires perfect hashing for determinism.

    Recursive Flow Classification (Gupta and McKeown [Sigcomm 99])
      Pros: fast accesses; suitable for multiple fields; reasonable memory requirements for real-life classifiers.
      Cons: large preprocessing time and memory requirements for large classifiers.


    Grid of Tries

    (Figure: rules R1-R7 stored in a grid of tries: a trie on dimension 1 whose valid prefix nodes point to tries on dimension 2 holding the rules.)


    Grid of Tries

    Disadvantages: static solution; not easy to extend to higher dimensions.

    Advantages: good solution for two dimensions.

    20K entries: 2MB data structure, 9 memory accesses [at most 2W].


    Classification using Bit Parallelism

    (Figure: rules R1-R4; each per-field lookup returns a bit vector marking the rules that match on that field, and the AND of the per-field bit vectors identifies the matching rules.)


    Classification using Bit Parallelism

    Disadvantages: large memory bandwidth; hardware optimized.

    Advantages: good solution for multiple dimensions for small classifiers.

    512 rules: 1 Mpps with a single FPGA and five 128KB SRAM chips.


    Classification Using Multiple Fields

    Recursive Flow Classification

    (Figure: the packet header fields F1...Fn, 2^S = 2^128 possible headers in all, are looked up in a pipeline of memories; each phase recursively narrows the result, e.g. 2^128 -> 2^64 -> 2^24 -> 2^12, until the final memory returns one of 2^T = 2^12 actions.)


    Packet Classification

    References

    T. V. Lakshman, D. Stiliadis, "High speed policy based packet forwarding using efficient multi-dimensional range matching", Sigcomm 1998, pp. 191-202.

    V. Srinivasan, S. Suri, G. Varghese, M. Waldvogel, "Fast and scalable layer 4 switching", Sigcomm 1998, pp. 203-214.

    V. Srinivasan, G. Varghese, S. Suri, "Fast packet classification using tuple space search", to be presented at Sigcomm 1999.

    P. Gupta, N. McKeown, "Packet classification using hierarchical intelligent cuttings", Hot Interconnects VII, 1999.

    P. Gupta, N. McKeown, "Packet classification on multiple fields", Sigcomm 1999.


    Tutorial Outline

    Introduction: What is a Packet Switch?

    Packet Lookup and Classification: Where does a packet go next?

    Switching Fabrics: How does the packet get there?

    Output Scheduling: When should the packet leave?


    Switching Fabrics

    Output and Input Queueing

    Output Queueing

    Input Queueing

    Scheduling algorithms

    Combining input and output queues

    Other non-blocking fabrics

    Multicast traffic


    Basic Architectural Components
    Datapath: per-packet processing

    1. Forwarding decision (forwarding table lookup, one per input port)
    2. Interconnect
    3. Output scheduling


    Interconnects
    Two basic techniques

    Input Queueing: usually a non-blocking switch fabric (e.g. crossbar).

    Output Queueing: usually a fast bus.


    Interconnects
    Output Queueing

    Individual Output Queues (ports 1..N): memory b/w = (N+1)R.

    Centralized Shared Memory: memory b/w = 2NR.


    Output Queueing
    The ideal

    (Figure: cells arriving at several inputs for the same output are all placed immediately into that output's queue and depart in order; no cell waits at an input.)


    Switching Fabrics

    Output and Input Queueing

    Output Queueing

    Input Queueing

    Scheduling algorithms

    Other non-blocking fabrics

    Combining input and output queues

    Multicast traffic


    Input Queueing

    Head of Line Blocking

    (Figure: average delay vs. offered load with FIFO input queues; throughput saturates at 58.6% of the line rate.)


    Head of Line Blocking




    Virtual output queues


    Input Queueing
    Virtual Output Queues

    (Figure: average delay vs. offered load with virtual output queues; throughput can reach 100% of the line rate.)


    Input Queueing

    (Figure: per-input VOQs feed a crossbar; memory b/w = 2R per input. The scheduler that configures the crossbar can be quite complex!)


    Input Queueing
    Scheduling

    (Figure: input i (1..m) maintains VOQs Q(i,1)..Q(i,n) and sees arrivals Ai(t); output j sees departures Dj(t). Each cell time the scheduler computes a matching M between inputs and outputs, e.g. should the cell A1,1(t) be served?)


    Input Queueing

    Scheduling

    (Figure: a request graph between inputs 1-4 and outputs 1-4 with edge weights (queue occupancies such as 2, 5, 7), and one bipartite matching of weight 18.)

    Question: maximum weight or maximum size?


    Input Queueing

    Scheduling

    Maximum Size: maximizes instantaneous throughput. Does it maximize long-term throughput?

    Maximum Weight: can clear the most backlogged queues. But does it sacrifice long-term throughput?


    Input Queueing

    Scheduling

    (Figure: a 2x2 example, inputs 1-2 and outputs 1-2, comparing the matchings chosen by maximum-size and maximum-weight schedulers under non-uniform traffic.)


    Longest Queue First or Oldest Cell First

    (Figure: a 4x4 example with queue occupancies such as 10, 1, 1, 1; the maximum-weight matching favours the long queue.)

    Maximum weight matching with weight = {queue length (LQF), waiting time (OCF)} gives 100% throughput.


    Input Queueing

    Why is serving long/old queues better than serving the maximum number of queues?

    When traffic is uniformly distributed, servicing the maximum number of queues leads to 100% throughput. When traffic is non-uniform, some queues become longer than others. A good algorithm keeps the queue lengths matched, and services a large number of queues.

    (Figure: average VOQ occupancy per VOQ under uniform traffic and under non-uniform traffic.)


    Wave Front Arbiter

    (Figure: a 4x4 request matrix and the match produced by sweeping a diagonal wave front across it.)


    Wave Front Arbiter

    (Figure: another example of requests and the resulting match.)


    Wave Front Arbiter
    Implementation

    (Figure: an N x N array of combinational logic blocks, one per (input, output) pair, e.g. blocks (1,1)..(4,4) for a 4x4 switch.)


    Wave Front Arbiter
    Wrapped WFA (WWFA)

    (Figure: the wrapped wave front covers the request matrix in N steps instead of 2N-1.)


    Input Queueing
    Practical Algorithms

    Maximal Size Algorithms: Wave Front Arbiter (WFA), Parallel Iterative Matching (PIM), iSLIP.

    Maximal Weight Algorithms: Fair Access Round Robin (FARR), Longest Port First (LPF).


    Parallel Iterative Matching

    (Figure: a 4x4 example of PIM. Each iteration has three phases: Requests (each unmatched input requests every output it has a cell for), Grant (each output randomly selects one requesting input), and Accept/Match (each input randomly selects one of the grants it received). Iterations #1 and #2 are shown.)
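    One iteration of the three phases can be sketched as follows; in_match and out_match start at -1 and the function is called repeatedly until no new matches are added. The data layout is an assumption, and real implementations perform the random selections in parallel hardware.

    #include <stdlib.h>

    #define N 4   /* ports, as in the 4x4 example above */

    static void pim_iteration(int req[N][N], int in_match[N], int out_match[N])
    {
        int grant[N];                       /* grant[j] = input granted by output j */
        for (int j = 0; j < N; j++) {       /* grant phase */
            grant[j] = -1;
            if (out_match[j] >= 0) continue;            /* output already matched */
            int cand[N], k = 0;
            for (int i = 0; i < N; i++)
                if (in_match[i] < 0 && req[i][j]) cand[k++] = i;
            if (k) grant[j] = cand[rand() % k];         /* random grant */
        }
        for (int i = 0; i < N; i++) {       /* accept phase */
            if (in_match[i] >= 0) continue;
            int cand[N], k = 0;
            for (int j = 0; j < N; j++)
                if (grant[j] == i) cand[k++] = j;
            if (k) {                                    /* random accept */
                int j = cand[rand() % k];
                in_match[i] = j;
                out_match[j] = i;
            }
        }
    }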


    Parallel Iterative Matching
    Maximal is not Maximum

    (Figure: a request pattern for which the maximal match found by PIM leaves a feasible connection unused, while the maximum match serves one more input-output pair.)


    Parallel Iterative Matching
    Analytical Results

    Number of iterations to converge:

    $$E[U_i] \le \frac{N^2}{4^i}, \qquad E[C] = O(\log N)$$

    Where:
    C = number of iterations required to resolve connections
    N = number of ports
    U_i = number of unresolved connections after iteration i


    Input Queueing
    Practical Algorithms

    Maximal Size Algorithms: Wave Front Arbiter (WFA), Parallel Iterative Matching (PIM), iSLIP.

    Maximal Weight Algorithms: Fair Access Round Robin (FARR), Longest Port First (LPF).


    iSLIP


    iSLIP
    Properties

    Random under low load.
    TDM under high load.
    Lowest priority to MRU (most recently used).
    1 iteration: fair to outputs.
    Converges in at most N iterations (fewer on average).


    iSLIP


    iSLIP
    Implementation

    (Figure: N grant arbiters and N accept arbiters, each a programmable priority encoder with its own state; each decision is log2 N bits wide.)


    Input Queueing References

    M. Karol et al., "Input vs Output Queueing on a Space-Division Packet Switch", IEEE Trans. Comm., Dec 1987, pp. 1347-1356.

    Y. Tamir, "Symmetric Crossbar arbiters for VLSI communication switches", IEEE Trans. Parallel and Dist. Sys., Jan 1993, pp. 13-27.

    T. Anderson et al., "High-Speed Switch Scheduling for Local Area Networks", ACM Trans. Comp. Sys., Nov 1993, pp. 319-352.

    N. McKeown, "The iSLIP scheduling algorithm for Input-Queued Switches", IEEE Trans. Networking, April 1999, pp. 188-201.

    C. Lund et al., "Fair prioritized scheduling in an input-buffered switch", Proc. of IFIP-IEEE Conf., April 1996, pp. 358-69.

    A. Mekkittikul et al., "A Practical Scheduling Algorithm to Achieve 100% Throughput in Input-Queued Switches", IEEE Infocom 98, April 1998.


    Other Non-Blocking Fabrics
    Clos Network

    (Figure: a three-stage Clos network.)


    Other Non-Blocking Fabrics
    Clos Network

    Expansion factor required = 2 - 1/N (but still blocking for multicast).


    Other Non-Blocking Fabrics
    Self-Routing Networks

    (Figure: an 8x8 self-routing network with inputs and outputs labelled 000-111; each stage routes a cell on one bit of its destination address.)


    Speedup


    Context

    input-queued switches

    output-queued switches

    the speedup problem

    Early approaches

    Algorithms

    Implementation considerations


    Speedup: Context

    (Figure: a generic switch with memory at the inputs and/or the outputs. The placement of memory gives output-queued switches, input-queued switches, or combined input- and output-queued switches.)

    Output-queued switches


    Best delay and throughput performance; possible to erect bandwidth firewalls between sessions.

    Main problem: requires high fabric speedup (S = N). Unsuitable for high-speed switching.

    Input-queued switches


    Big advantage: a speedup of one is sufficient.

    Main problem: can't guarantee delay due to input contention.

    Overcoming input contention: use a higher speedup.


    A Comparison

    Memory speeds for a 32x32 switch:

    Line Rate | Output-queued: Memory BW | Access Time per cell | Input-queued: Memory BW | Access Time per cell
    100 Mb/s  | 3.3 Gb/s                 | 128 ns               | 200 Mb/s                | 2.12 us
    1 Gb/s    | 33 Gb/s                  | 12.8 ns              | 2 Gb/s                  | 212 ns
    2.5 Gb/s  | 82.5 Gb/s                | 5.12 ns              | 5 Gb/s                  | 84.8 ns
    10 Gb/s   | 330 Gb/s                 | 1.28 ns              | 20 Gb/s                 | 21.2 ns
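    The rows follow from simple arithmetic; a worked line, assuming 53-byte (424-bit) cells, which is consistent with the numbers shown:

    \[
      T_{cell} = \frac{424\ \mathrm{bits}}{10\ \mathrm{Gb/s}} \approx 42.4\ \mathrm{ns};\quad
      \text{OQ: } BW = (N{+}1)R = 330\ \mathrm{Gb/s},\ 
      T_{access} = \frac{42.4\ \mathrm{ns}}{N+1} \approx 1.28\ \mathrm{ns};\quad
      \text{IQ: } BW = 2R = 20\ \mathrm{Gb/s},\ 
      T_{access} = \frac{42.4\ \mathrm{ns}}{2} = 21.2\ \mathrm{ns}.
    \]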


    The Speedup Problem

    Find a compromise: 1 < speedup << N.


    The findings

    Very tantalizing: under different settings (traffic, loading, algorithm, etc.), and even for varying switch sizes, a speedup of between 2 and 5 was sufficient!

    Using Speedup


    (Figure: with speedup, cells are transferred from the inputs across the fabric more than once per cell time.)

    Intuition


    Speedup = 1: fabric throughput = 0.58 (Bernoulli IID inputs).

    Speedup = 2: fabric throughput = 1.16 (Bernoulli IID inputs); input efficiency = 1/1.16; average input queue = 6.25.

    Intuition (continued)


    Speedup = 3: fabric throughput = 1.74 (Bernoulli IID inputs); input efficiency = 1/1.74; average input queue = 1.35.

    Speedup = 4: fabric throughput = 2.32 (Bernoulli IID inputs); input efficiency = 1/2.32; average input queue = 0.75.

    Issues


    Need hard guarantees: exact, not average.

    Robustness: realistic, even adversarial, traffic, not friendly Bernoulli IID.

    The Ideal Solution


    Speedup = N gives the ideal (output-queued) behaviour. Can a smaller speedup achieve the same?


    Apply the same inputs to an OQ and a CIOQ switch, packet by packet; obtain the same outputs, packet by packet.

    Algorithm - MUCF (Most Urgent Cell First)


    Key concept: urgency value. Urgency = departure time (in the OQ switch being emulated) - present time.

    MUCF


    The algorithm:
    - Outputs try to get their most urgent packets.
    - Inputs grant to the output whose packet is most urgent, ties broken by port number.
    - Losing outputs try for their next most urgent packet.
    - The algorithm terminates when no more matchings are possible.

    Stable Marriage Problem


    (Figure: outputs as "men" (Pedro, John, Bill) and inputs as "women" (Maria, Hillary, Monica) in a stable-marriage matching.)

    An example


    Observation: there are only two reasons a packet doesn't get to its output: input contention and output contention. This is why a speedup of 2 works!!

    What does this get us?


    A speedup of 4 is sufficient for exact emulation of FIFO OQ switches with MUCF.

    What about non-FIFO OQ switches? E.g. WFQ, strict priority.


    What gives?


    Complexity of the algorithms: extra hardware for processing; extra run time (time complexity).

    What is the benefit? Reduced memory bandwidth requirements.

    Tradeoff: memory for processing. Moore's Law supports this tradeoff.

    Implementation - a closer look


    Main sources of difficulty:
    - Estimating urgency, etc.: the information is distributed (and must be communicated among inputs and outputs).
    - The matching process: too many iterations?

    Estimating urgency depends on what is being emulated (like taking a ticket to hold a place in a queue):
    - FIFO, strict priorities: no problem.
    - WFQ, etc.: problems.

    Implementation (contd)


    Matching process:
    - A variant of the stable marriage problem.
    - Worst-case number of iterations in switching = N; with high probability, and on average, approximately log(N).
    - Worst-case number of iterations for the general SMP = N^2.

    Other Work


    Relax the stringent requirement of exact emulation:
    - Least Occupied Output First Algorithm (LOOFA): keeps outputs busy whenever packets are present; by time-stamping packets it also exactly mimics an OQ switch.
    - Disallow arbitrary inputs, e.g. leaky-bucket constrained traffic, and obtain worst-case delay bounds.

    References for speedup


    Y. Oie et al., "Effect of speedup in nonblocking packet switch", ICC 89.

    A. L. Gupta, N. D. Georganas, "Analysis of a packet switch with input and output buffers and speed constraints", Infocom 91.

    S.-T. Chuang et al., "Matching output queueing with a combined input and output queued switch", IEEE JSAC, vol 17, no 6, 1999.

    B. Prabhakar, N. McKeown, "On the speedup required for combined input and output queued switching", Automatica, vol 35, 1999.

    P. Krishna et al., "On the speedup required for work-conserving crossbar switches", IEEE JSAC, vol 17, no 6, 1999.

    A. Charny, "Providing QoS guarantees in input buffered crossbar switches with speedup", PhD Thesis, MIT, 1998.


    Switching Fabrics

    Output and Input Queueing

    Output Queueing

    Input Queueing

    Scheduling algorithms

    Other non-blocking fabrics

    Combining input and output queues

    Multicast traffic

    Multicast Switching


    The problem

    Switching with crossbar fabrics

    Switching with other fabrics

    Multicasting



    Method 2


    Use copying properties of crossbar fabric

    No fanout-splitting: easy, but low throughput.

    Fanout-splitting: higher throughput, but not as simple; leaves residue.

    The effect of fanout-splitting


    (Figure: performance of an 8x8 switch with and without fanout-splitting under uniform IID traffic.)

    Placement of residue


    Key question: how should outputs grant requests (and hence decide the placement of residue)?

    Residue and throughput


    Result: concentrating residue brings more new work forward, and hence leads to higher throughput. But there are fairness problems to deal with.

    This and other problems can be looked at in a unified way by mapping the multicasting problem onto a variation of Tetris.

    Multicasting and Tetris


    (Figure: the Tetris analogy; cells from input ports 1-5 stack up over output ports 1-5, and the residue is the set of cells left behind.)

    Multicasting and Tetris


    (Figure: the same Tetris picture with the residue concentrated on as few input ports as possible.)

    Replication by recycling


    Main idea: make two copies at a time using a binary tree with the input at the root and all possible destination outputs at the leaves.

    (Figure: a recycling binary tree over outputs a, b, c, d, e, x, y.)


    Tutorial Outline

    Introduction: What is a Packet Switch?

    Packet Lookup and Classification: Where does a packet go next?

    Switching Fabrics: How does the packet get there?

    Output Scheduling: When should the packet leave?

    Output Scheduling


    What is output scheduling?

    How is it done?

    Practical Considerations

    Output Scheduling


    (Figure: per-flow queues feeding a scheduler at the output link.)

    The scheduler allocates output bandwidth and controls packet delay.

    Output Scheduling


    FIFO

    Fair Queueing

    Motivation


    FIFO is natural but gives poor QoS

    bursty flows increase delays for others

    hence cannot guarantee delays

    Need round robin scheduling of packets

    Fair Queueing

    Weighted Fair Queueing, Generalized Processor Sharing

    Fair queueing: Main issues


    Level of granularity: packet-by-packet? (favors long packets); bit-by-bit? (ideal, but very complicated).

    Packet Generalized Processor Sharing (PGPS) serves packet-by-packet and imitates the bit-by-bit schedule within a tolerance.

    How does WFQ work?


    (Figure: three flows sharing one output link with weights WR = 1, WG = 5, WP = 2; the link serves them in proportion to their weights.)
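    A sketch of the per-packet bookkeeping behind WFQ/PGPS: each arriving packet is stamped with a virtual finish time and the scheduler always transmits the backlogged packet with the smallest stamp. The virtual-time bookkeeping is simplified here; exact GPS virtual time tracks the set of busy flows.

    #include <stdint.h>

    struct flow { double weight; double last_finish; };

    /* Stamp an arriving packet: F = max(V, F_prev) + length / weight,
       where V is the fair-queueing virtual time. The scheduler then
       serves the packet with the smallest finish time F. */
    static double wfq_stamp(struct flow *f, double V, uint32_t len_bits)
    {
        double start = (V > f->last_finish) ? V : f->last_finish;
        f->last_finish = start + (double)len_bits / f->weight;
        return f->last_finish;
    }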

    Delay guarantees


    Theorem

    If flows are leaky-bucket constrained and all nodes employ GPS (WFQ), then the network can guarantee worst-case delay bounds to sessions.

    Practical considerations


    For every packet, the scheduler needs to:
    classify it into the right flow queue and maintain a linked list for each flow;
    schedule it for departure.

    The complexities of both are O(log [# of flows]); the first is hard to overcome; the second can be overcome by DRR.

    Deficit Round Robin


    (Figure: DRR example with quantum size 500; four queues hold packets of sizes 50/700/250, 400/600, 200/600/100 and 500, and each queue's deficit counter grows by the quantum on every round.)

    Good approximation of FQ

    Much simpler to implement

    But...
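    A sketch of one DRR round over per-flow queues like those in the example (quantum 500); dequeue and send are assumed helpers. The per-packet work is O(1), which is how DRR avoids the O(log n) scheduling cost mentioned above.

    #include <stdint.h>
    #include <stddef.h>

    #define NQUEUES 4
    #define QUANTUM 500            /* quantum size used in the example above */

    struct pkt { uint32_t len; struct pkt *next; };
    struct queue { struct pkt *head; uint32_t deficit; };

    struct pkt *dequeue(struct queue *q);   /* pop head packet (assumed helper) */
    void send(struct pkt *p);               /* transmit onto the link (assumed) */

    /* One DRR round: each backlogged queue's deficit grows by the quantum,
       and the queue sends packets as long as they fit within the deficit. */
    static void drr_round(struct queue q[NQUEUES])
    {
        for (int i = 0; i < NQUEUES; i++) {
            if (q[i].head == NULL) { q[i].deficit = 0; continue; }
            q[i].deficit += QUANTUM;
            while (q[i].head && q[i].head->len <= q[i].deficit) {
                q[i].deficit -= q[i].head->len;
                send(dequeue(&q[i]));
            }
        }
    }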


    WFQ is still very hard to implement: classification is a problem; it needs to maintain too much state information; it doesn't scale well.

    Strict Priorities and Diff Serv


    Classify flows into priority classes

    maintain only per-class queues

    perform FIFO within each class

    avoid curse of dimensionality

    Diff Serv


    A framework for providing differentiated QoS

    set Type of Service (ToS) bits in packet headers

    this classifies packets into classes

    routers maintain per-class queues

    condition traffic at network edges to conform to class requirements

    May still need queue management inside the network

    References for O/p Scheduling


    A. Demers et al., "Analysis and simulation of a fair queueing algorithm", ACM SIGCOMM 1989.

    A. Parekh, R. Gallager, "A generalized processor sharing approach to flow control in integrated services networks: the single node case", IEEE Trans. on Networking, June 1993.

    A. Parekh, R. Gallager, "A generalized processor sharing approach to flow control in integrated services networks: the multiple node case", IEEE Trans. on Networking, August 1993.

    M. Shreedhar, G. Varghese, "Efficient Fair Queueing using Deficit Round Robin", ACM SIGCOMM, 1995.

    K. Nichols, S. Blake (eds), "Differentiated Services: Operational Model and Definitions", Internet Draft, 1998.

    Active Queue Management


    Problems with traditional queue management

    tail drop

    Active Queue Management

    goals

    an example

    effectiveness


    Tail Drop Queue Management
    Lock-Out

    (Figure: a queue at its maximum length; a few flows occupy the whole buffer and lock out the others.)


    Drop packets only when queue is full

    long steady-state delay

    global synchronization

    bias against bursty traffic

    Global Synchronization


    (Figure: queue occupancy oscillating near the maximum queue length as the flows back off and ramp up in phase.)

    Bias Against Bursty Traffic


    (Figure: a nearly full queue; an arriving burst is dropped even though the long-term load is low.)

    Alternative Queue Management

    Schemes


    Active Queue Management
    Goals


    Solve lock-out and full-queue problems

    no lock-out behavior

    no global synchronization

    no bias against bursty flow

    Provide better QoS at a router

    low steady-state delay

    lower packet dropping


    Problems with traditional queue management

    tail drop

    Active Queue Management

    goals

    an example

    effectiveness

    Random Early Detection (RED)


    if q_avg < min_th: admit every packet
    else if min_th <= q_avg < max_th: drop an incoming packet with a probability that rises with q_avg toward max_p
    else if q_avg >= max_th: drop every incoming packet

    (Figure: drop probability vs. the average queue length q_avg, with thresholds min_th and max_th.)
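    The per-packet drop decision can be sketched as below; parameter names follow the RED paper, and the count-based refinement that spaces drops more evenly is omitted for brevity.

    #include <stdlib.h>

    /* RED drop test on the averaged queue length q_avg. */
    static int red_drop(double q_avg, double min_th, double max_th, double max_p)
    {
        if (q_avg < min_th)
            return 0;                                   /* admit every packet */
        if (q_avg >= max_th)
            return 1;                                   /* drop every packet  */
        double p = max_p * (q_avg - min_th) / (max_th - min_th);
        return ((double)rand() / RAND_MAX) < p;         /* drop with prob. p  */
    }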

    Effectiveness of RED: Lock-Out


    Packets are randomly dropped

    Each flow has the same probability of being discarded

    Effectiveness of RED: Full-Queue


    Drop packets probabilistically in anticipation of congestion (not when queue is full)

    Use qavg to decide packet dropping probability: allow instantaneous bursts

    Randomness avoids global synchronization

    What QoS does RED Provide?


    Lower buffer delay: good interactive service

    qavg is controlled to be small

    Given responsive flows: packet dropping is reduced

    early congestion indication allows traffic to throttle back before congestion

    Given responsive flows: fair bandwidth allocation

    Unresponsive or aggressive flows


    Don't properly back off during congestion.

    Take away bandwidth from TCP-compatible flows.

    Monopolize buffer space.

    Control Unresponsive Flows


    Some active queue management schemes

    RED with penalty box

    Flow RED (FRED)

    Stabilized RED (SRED)

    identify and penalize unresponsive flows with a bit of extra work

    Active Queue Management References


    B. Braden et al., "Recommendations on queue management and congestion avoidance in the Internet", RFC 2309, 1998.

    S. Floyd, V. Jacobson, "Random early detection gateways for congestion avoidance", IEEE/ACM Trans. on Networking, 1(4), Aug. 1993.

    D. Lin, R. Morris, "Dynamics of random early detection", ACM SIGCOMM, 1997.

    T. Ott et al., "SRED: Stabilized RED", INFOCOM 1999.

    S. Floyd, K. Fall, "Router mechanisms to support end-to-end congestion control", LBL technical report, 1997.

    Tutorial Outline


    Introduction: What is a Packet Switch?

    Packet Lookup and Classification: Where does a packet go next?

    Switching Fabrics: How does the packet get there?

    Output Scheduling: When should the packet leave?

    Basic Architectural Components


    Control: Routing, Admission Control, Congestion Control, Reservation.

    Datapath (per-packet processing): Policing, Switching, Output Scheduling.

    Basic Architectural Components
    Datapath: per-packet processing


    1. Forwarding decision (forwarding table lookup, one per input port)
    2. Interconnect
    3. Output scheduling