© Sudhakar Yalamanchili, Georgia Institute of Technology

  • 8/14/2019 Sudhakar Yalamanchili, Georgia Institute of Technology

    Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated)

    Topologies

    ECE 8813a (2)

    Overview

    Direct Networks

    Indirect Networks

    Cost Model

    Comparison of Direct and Indirect Networks

    ECE 8813a (3)

    Classification

    Shared medium networks
      Example: backplane buses

    Direct networks
      Example: k-ary n-cubes, meshes, and trees

    Indirect networks
      Example: multistage interconnection networks

    Hybrid networks
      Example: hypergraph topologies

    ECE 8813a (4)

    Direct Networks

    Buses do not scale, electrically or in bandwidth

    Full connectivity is too expensive (not the same as crossbars)

    Network built on point-to-point transfers

    Topologies: strongly and weakly orthogonal

    [Figure: a node with processor, memory, and a router with injection and ejection channels]

    ECE 8813a (5)

    System View

    [Figure: system view of a node. Labels: Processor, Memory, Router, injection channels, ejection channels, NB, SB, NI, PCIe. Annotations: high latency region, performance critical.]

    From http://www.psc.edu/publications/tech_reports/PDIO/CrayXT3-ScalableArchitecture.jpg

    ECE 8813a (6)

    Common Properties

    Diameter

    Node degree

    Bisection BW

    Channel length

    Regularity and symmetry

    Latency

    I/O BW (pin-out)

    Throughput

    Latency

    Routing and path diversity

    ECE 8813a (7)

    Evaluation Metrics

    Bisection bandwidth
      This is the minimum bandwidth across any bisection of the network

    Bisection bandwidth is a limiting attribute of performance

    [Figure: a network divided by a bisection]

    ECE 8813a (8)

    Engineering Considerations

    Distinguish between layout (physical) and topology (logical)

    [Figure: network layout, annotated with the average channel wire length]

    ECE 8813a (9)

    Extensions to Higher Dimensions

    Interleaved layout significantly reduces the wire/cable length

    Improves packaging modularity

    Note the end-around connections

    Impacts performance and cost

    Adapted from Scalable Switching Fabrics for Internet Routers, by W. J. Dally (can be found at www.avici.com)

    ECE 8813a (10)

    Common Topologies

    Binary hypercube

    Torus

    Multidimensional mesh

    ECE 8813a (11)

    Common Topologies

    Definition

    Basic connectivity properties
      Diameter
      I/O (also referred to as node size or pin-out)
      Bisection bandwidth

    Routing

    ECE 8813a (12)

    Metrics

    Network               Bisection Width    Node Size
    k-ary n-cube          2Wk^(n-1)          2Wn
    Binary n-cube         NW/2               nW
    n-dimensional mesh    Wk^(n-1)           2Wn
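    The expressions on this slide can be evaluated directly. The sketch below is illustrative only: it assumes W is the channel width, N is the number of nodes, and k is the number of nodes per dimension (the slide does not define these symbols), and the function names are hypothetical.

```python
def k_ary_n_cube(k, n, w):
    """k-ary n-cube (torus): bisection width 2Wk^(n-1), node size 2Wn."""
    return {
        "nodes": k ** n,
        "bisection_width": 2 * w * k ** (n - 1),
        "node_size": 2 * w * n,
    }


def binary_n_cube(n, w):
    """Binary n-cube (hypercube): bisection width NW/2, node size nW."""
    nodes = 2 ** n
    return {
        "nodes": nodes,
        "bisection_width": nodes * w // 2,
        "node_size": n * w,
    }


def n_dimensional_mesh(k, n, w):
    """n-dimensional mesh: bisection width Wk^(n-1), node size 2Wn."""
    return {
        "nodes": k ** n,
        "bisection_width": w * k ** (n - 1),
        "node_size": 2 * w * n,
    }


if __name__ == "__main__":
    # Assumed example: 16 nodes as a 4-ary 2-cube with 16-bit channels.
    print(k_ary_n_cube(4, 2, 16))
    print(n_dimensional_mesh(4, 2, 16))
```

    With the same k, n, and W, the torus has twice the bisection width of the mesh at the same node size, which is the effect of the end-around connections.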

    ECE 8813a (13)

    Less Common Topologies

    [Figure: (a) Cube-Connected Cycles (b) De Bruijn Network (c) Star Graph]

    Routing

    Basic properties

    ECE 8813a (14)

    Less Common Topologies (cont.)

    [Figure: (a) Binary Tree (b) Balanced Binary Tree]

    Routing

    Basic properties

    A note on irregular topologies

    ECE 8813a (15)

    Generalized Hypercubes

    Generalization of tori to multiple dimensions and multiple radices
      Unique radix in each dimension

    Preserves the structure of addressing and routing techniques
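    One way to read "unique radix in each dimension" is that a node address becomes a mixed-radix number, with one digit per dimension. The sketch below illustrates only that addressing idea; the function names and the least-significant-dimension-first digit order are assumptions made for the example, not taken from the slides.

```python
def to_address(node, radices):
    """Convert a node index to mixed-radix digits (dimension 0 first)."""
    digits = []
    for radix in radices:
        digits.append(node % radix)
        node //= radix
    return tuple(digits)


def from_address(digits, radices):
    """Convert mixed-radix digits (dimension 0 first) back to a node index."""
    node = 0
    for digit, radix in zip(reversed(digits), reversed(radices)):
        node = node * radix + digit
    return node


if __name__ == "__main__":
    radices = (4, 3, 2)  # assumed example: 4 x 3 x 2 = 24 nodes
    print(to_address(17, radices))
    print(from_address((1, 1, 1), radices))
```

    Dimension-order routing carries over unchanged in this representation: correct one digit at a time until the current address equals the destination address.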

    ECE 8813a (16)

    Indirect Networks

    [Figure: switches numbered 0 to 8, bidirectional links, and processing elements]

    Switches may or may not host end-points

    ECE 8813a (17)

    Multistage Interconnection Networks

    [Figure: multistage network connecting inputs 0 to 7 to outputs 0 to 7, with the switch states shown]

    Interconnect specified as a permutation

    Number of stages = log_2 N

    Can be generalized to K x K switches

    Networks defined by inter-stage permutations

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (18)

    The Shuffle Interconnection

    [Figure: (a) Perfect Shuffle (b) Inverse Perfect Shuffle, drawn between ports 0 (000) and 7 (111); label: shuffle(i)]
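    The perfect shuffle is commonly defined as a one-bit left rotation of the port address, and its inverse as a one-bit right rotation. The sketch below assumes that standard definition and n-bit binary addresses; the function names are chosen for the example.

```python
def shuffle(i, n):
    """Perfect shuffle: rotate the n-bit address i left by one bit."""
    mask = (1 << n) - 1
    return ((i << 1) | (i >> (n - 1))) & mask


def inverse_shuffle(i, n):
    """Inverse perfect shuffle: rotate the n-bit address i right by one bit."""
    return (i >> 1) | ((i & 1) << (n - 1))


if __name__ == "__main__":
    n = 3  # 8 ports, as in the figure
    for port in range(1 << n):
        print(f"{port:03b} -> {shuffle(port, n):03b}")
```

    For 8 ports the shuffle maps 0..7 to 0, 2, 4, 6, 1, 3, 5, 7, which is the interleaving of the two halves of a deck of cards that gives the permutation its name.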

    ECE 8813a (19)

    The Baseline Interconnection

    [Figure: (a) Second Baseline (b) First Baseline (c) Zeroth Baseline, drawn between ports 0 (000) and 7 (111); label: baseline(i)]
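    A common textbook definition of the i-th baseline permutation is a one-position right rotation of the i + 1 least significant address bits, leaving the higher bits alone. The sketch below assumes that definition for binary addresses; the function name and argument order are illustrative.

```python
def baseline(x, i):
    """i-th baseline: rotate the i+1 least significant bits of x right by one."""
    low_mask = (1 << (i + 1)) - 1
    low = x & low_mask            # the bits that take part in the rotation
    high = x & ~low_mask          # the bits that stay in place
    rotated = (low >> 1) | ((low & 1) << i)
    return high | rotated


if __name__ == "__main__":
    for i in (2, 1, 0):
        print(i, [baseline(x, i) for x in range(8)])
```

    Under this definition the zeroth baseline is the identity, and the (n-1)-th baseline on n-bit addresses is the inverse perfect shuffle.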

    ECE 8813a (20)

    The Butterfly Interconnection

    [Figure: (a) Second Butterfly (b) First Butterfly (c) Zeroth Butterfly, drawn between ports 0 (000) and 7 (111); label: butterfly(i)]
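    The i-th butterfly permutation is usually defined as exchanging bit 0 and bit i of the address. The sketch below assumes that definition for binary addresses; the function name is illustrative.

```python
def butterfly(x, i):
    """i-th butterfly: swap bit 0 and bit i of the address x."""
    bit0 = x & 1
    biti = (x >> i) & 1
    if bit0 != biti:
        # The two bits differ, so flipping both of them swaps them.
        x ^= 1 | (1 << i)
    return x


if __name__ == "__main__":
    for i in (2, 1, 0):
        print(i, [butterfly(x, i) for x in range(8)])
```

    The zeroth butterfly is the identity, and every butterfly permutation is its own inverse.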

    ECE 8813a (21)

    The Cube Interconnection

    [Figure: (a) Second Cube (b) First Cube (c) Zeroth Cube, drawn between ports 0 (000) and 7 (111); label: cube(i)]
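    The i-th cube permutation is usually defined as complementing bit i of the address, which pairs each port with its neighbor along dimension i of a hypercube. A minimal sketch under that assumption (the function name is illustrative):

```python
def cube(x, i):
    """i-th cube permutation: complement bit i of the address x."""
    return x ^ (1 << i)


if __name__ == "__main__":
    for i in (2, 1, 0):
        print(i, [cube(x, i) for x in range(8)])
```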

    ECE 8813a (22)

    Omega Network

    [Figure: 16-port Omega network, ports 0000 to 1111, with shuffle interconnections between stages]

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (23)

    Baseline Network

    [Figure: 16-port baseline network, ports 0000 to 1111; label: sub-shuffle i]

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (24)

    Butterfly

    [Figure: 16-port butterfly network, ports 0000 to 1111; label: butterfly i]

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (25)

    Cube Network

    [Figure: 16-port cube network, ports 0000 to 1111; label: cube i]

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (26)

    Routing in MINs

    Routing can be modeled as a sequence of address transformations

    Each stage transforms a bit of the source address into a bit of the destination address

    Routing implementation: a single bit of the destination address determines the output port
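    As an illustration of these address transformations, the sketch below traces destination-tag routing through an Omega network built from 2 x 2 switches. It assumes each stage is a perfect shuffle followed by a switch whose output port is selected by one destination bit, most significant bit first; the function name and the path representation are choices made for the example.

```python
def omega_route(src, dst, n):
    """Return the port address after each stage of an n-stage Omega network."""
    mask = (1 << n) - 1
    addr = src
    path = [src]
    for stage in range(n):
        # Inter-stage wiring: perfect shuffle (rotate left by one bit).
        addr = ((addr << 1) | (addr >> (n - 1))) & mask
        # Switch setting: one destination bit picks the upper (0) or lower (1) output.
        bit = (dst >> (n - 1 - stage)) & 1
        addr = (addr & ~1) | bit
        path.append(addr)
    return path


if __name__ == "__main__":
    for hop in omega_route(0b101, 0b010, 3):
        print(f"{hop:03b}")
```

    After n stages every source bit has been rotated out and replaced by a destination bit, so the final address equals the destination regardless of the source.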

    ECE 8813a (27)

    Basic Properties

    Diameter, path length and pin-out

    Bisection bandwidth

    ECE 8813a (28)

    Blocking vs. Non-blocking Networks

    [Figure: a blocking topology, with a conflict marked X, and a non-blocking topology, each connecting ports 0 to 7]

    Consider the permutation behavior
      Model the input-output requests as permutations of the source addresses

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (29)

    Blocking Behavior

    Strictly non-blocking
      A new connection can always be set up
      Every permutation can be realized

    Weakly non-blocking
      Strictly non-blocking only under some routing protocols

    Blocking
      Some permutations cannot be realized

    Rearrangeable
      Every permutation can be realized by rearranging existing connections
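    To make "some permutations cannot be realized" concrete, the sketch below tests whether a permutation can pass through an Omega network without two paths needing the same switch output. It assumes destination-tag routing with a perfect shuffle before each stage of 2 x 2 switches; the function names are illustrative.

```python
def omega_path(src, dst, n):
    """Switch output port used at each stage when routing src to dst."""
    mask = (1 << n) - 1
    addr = src
    ports = []
    for stage in range(n):
        addr = ((addr << 1) | (addr >> (n - 1))) & mask   # perfect shuffle
        bit = (dst >> (n - 1 - stage)) & 1                 # destination tag bit
        addr = (addr & ~1) | bit
        ports.append(addr)
    return ports


def is_conflict_free(perm, n):
    """True if no two paths of the permutation share a switch output port."""
    paths = [omega_path(src, dst, n) for src, dst in enumerate(perm)]
    for stage in range(n):
        used = set()
        for path in paths:
            if path[stage] in used:
                return False
            used.add(path[stage])
    return True


if __name__ == "__main__":
    identity = list(range(8))
    bit_reversal = [0, 4, 2, 6, 1, 5, 3, 7]
    print(is_conflict_free(identity, 3))
    print(is_conflict_free(bit_reversal, 3))
```

    In this model the identity permutation passes cleanly, while bit reversal on 8 ports makes sources 000 and 100 compete for the same first-stage output, so the Omega network is blocking.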

    ECE 8813a (30)

    Crossbar Network

    [Figure: 16-port crossbar connecting inputs 0 to 15 to outputs 0 to 15]

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (31)

    Non-Blocking Clos Network

    [Figure: 16-port non-blocking Clos network connecting inputs 0 to 15 to outputs 0 to 15]

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (32)

    Clos Network Properties

    General 3-stage non-blocking network
      Originally conceived for telephone networks

    Recursive decomposition
      Produces the Benes network with 2x2 switches
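    The recursive decomposition gives a Benes network with 2 log_2 N - 1 stages of N/2 switches each, which agrees with the 16-port, 7-stage network on the path diversity slide. The sketch below simply evaluates those standard counts; the function names are illustrative.

```python
def benes_stages(num_ports):
    """Number of 2x2 switch stages in a Benes network: 2*log2(N) - 1."""
    if num_ports < 2 or num_ports & (num_ports - 1):
        raise ValueError("num_ports must be a power of two, at least 2")
    log2_n = num_ports.bit_length() - 1
    return 2 * log2_n - 1


def benes_switches(num_ports):
    """Total 2x2 switches: N/2 switches in every stage."""
    return benes_stages(num_ports) * (num_ports // 2)


if __name__ == "__main__":
    print(benes_stages(16), benes_switches(16))
```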

    ECE 8813a (37)

    Path Diversity

    [Figure: 16-port network, ports 0 to 15, with two highlighted paths]

    Contention free, paths 0 to 1 and 4 to 1. 16 port, 7 stage Clos network = Benes topology

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (38)

    Bidirectional MINs

    [Figure: bidirectional MIN with nodes 000 to 111, connection patterns C0, C1, C2 and switch stages G0, G1, G2; switch connections: forward, backward, turnaround]

    ECE 8813a (39)

    Routing in Bidirectional MINs

    Networks are multi-path

    Routing takes place in two steps: route to an intermediate node, followed by routing to the destination
      Multiple intermediate nodes can be selected
      Path from the intermediate node to the destination is unique

    [Figure: bidirectional MIN with nodes 000 to 111, showing routes from source S to destination D]
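    For a butterfly bidirectional MIN with 2 x 2 switches, the turnaround stage is often derived from the most significant bit in which the source and destination differ. The sketch below assumes that model, with stage G0 adjacent to the nodes; the function names and the returned fields are hypothetical.

```python
def first_difference(src, dst):
    """Position of the most significant bit in which src and dst differ."""
    if src == dst:
        raise ValueError("source and destination must differ")
    return (src ^ dst).bit_length() - 1


def turnaround_summary(src, dst):
    """Describe shortest turnaround routes between src and dst."""
    t = first_difference(src, dst)
    return {
        "turnaround_stage": t,            # message climbs forward to stage G_t
        "switches_traversed": 2 * t + 1,  # t+1 going forward, t coming back
        "intermediate_choices": 2 ** t,   # one free choice at each stage below G_t
    }


if __name__ == "__main__":
    print(turnaround_summary(0b001, 0b101))
    print(turnaround_summary(0b000, 0b001))
```

    Nodes that share high-order address bits turn around early and stay local, while the number of alternative forward paths doubles with every extra stage climbed.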

    ECE 8813a (40)

    Moving to Fat Trees

    Nodes at tree leaves

    Switches at tree vertices

    Total link bandwidth is constant across all tree levels, with full bisection bandwidth

    Equivalent to folded Benes topology

    Preferred topology in many system area networks

    Folded Clos = Folded Benes = Fat tree network

    [Figure: 16-port network, ports 0 to 15, with the network bisection marked]

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (41)

    Fat Trees: Another View

    Equivalent to the preceding multistage implementation

    Common topology in many supercomputer installations

    [Figure: fat tree with forward and backward channels]

    ECE 8813a (42)

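    Generalized MINs

    [Figure: generalized MIN with N input ports and M output ports; connection patterns C_0, C_1, ..., C_g alternate with switch stages G_0, G_1, ..., G_{g-1}. Stage G_i has w_i switches of size a_{i,1} x b_{i,1}, a_{i,2} x b_{i,2}, ..., a_{i,w_i} x b_{i,w_i}. Connection C_i carries p_i = q_{i-1} links and connection C_{i+1} carries q_i = p_{i+1} links.]

    Generalized switch radix

    Routing and mathematics uniform across switch radix values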

    ECE 8813a (43)

    Hybrid Networks

    [Figure: a 2D hypermesh and a cluster-based 2D mesh, each built from cluster buses]

    ECE 8813a (44)

    A Cost Model

    Crossbar costs
      Switch cost: N^2
      Link cost: 2N

    Multistage interconnection networks (MINs)
      MINs interconnect N input/output ports using k x k switches
        o log_k N switch stages, each with N/k switches
        o (N/k) log_k N total number of switches

    Example: compute the relative switch and link costs of interconnecting 4096 nodes

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (45)

    Example

    Example: compute the relative switch and link costs, N = 4096

    relative_cost(2 x 2) switches = 4096^2 / (2^2 x 4096/2 x log_2 4096) = 170

    relative_cost(4 x 4) switches = 4096^2 / (4^2 x 4096/4 x log_4 4096) = 170

    relative_cost(16 x 16) switches = 4096^2 / (16^2 x 4096/16 x log_16 4096) = 85

    relative_cost(2 x 2) links = 8192 / (4096 x (log_2 4096 + 1)) = 2/13 = 0.1538

    relative_cost(4 x 4) links = 8192 / (4096 x (log_4 4096 + 1)) = 2/7 = 0.2857

    relative_cost(16 x 16) links = 8192 / (4096 x (log_16 4096 + 1)) = 2/4 = 0.5

    cost(crossbar) switches = 4096^2

    cost(crossbar) links = 8192
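    The arithmetic in this example can be reproduced with a few lines of code. The sketch below uses the cost expressions from the slides (crossbar: N^2 switch cost and 2N links; MIN: k^2 per switch, N/k switches per stage, log_k N stages, and N (log_k N + 1) links); the function names are chosen for the example.

```python
def num_stages(n_ports, k):
    """log_k(N), computed exactly with integers; N must be a power of k."""
    stages, reach = 0, 1
    while reach < n_ports:
        reach *= k
        stages += 1
    if reach != n_ports:
        raise ValueError("n_ports must be a power of k")
    return stages


def crossbar_costs(n_ports):
    return {"switches": n_ports ** 2, "links": 2 * n_ports}


def min_costs(n_ports, k):
    stages = num_stages(n_ports, k)
    return {
        "switches": k ** 2 * (n_ports // k) * stages,
        "links": n_ports * (stages + 1),
    }


def relative_costs(n_ports, k):
    """Cost of a crossbar relative to a MIN built from k x k switches."""
    xbar, mins = crossbar_costs(n_ports), min_costs(n_ports, k)
    return {
        "switches": xbar["switches"] / mins["switches"],
        "links": xbar["links"] / mins["links"],
    }


if __name__ == "__main__":
    for k in (2, 4, 16):
        print(k, relative_costs(4096, k))
```

    The slide values 170 and 85 are the truncated results of 170.67 and 85.33.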

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (46)

    Example (cont.)

    [Figure: two surface plots over k and N (axis ticks from 2 to 8192): relative link cost, on a scale of 0 to 2, and relative switch cost, on a scale of 0 to 350]

    Relative switch and link costs for various values of k and N (crossbar relative to a MIN)

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (47)

    Comparison of Direct and Indirect Networks

    [Figure: fat tree-like MIN, N = 16, k = 4]

    Indirect networks have end nodes connected at network periphery

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (48)

    Comparison of Direct and Indirect Networks

    [Figure: 2D torus, N = 8, k = 4]

    Direct networks have end nodes connected in the network area/volume

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (49)

    Comparison of Direct and Indirect Networks

    [Figure: 2D torus, N = 8, k = 4]

    Direct networks have end nodes connected in the network area/volume

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (50)

    Comparison of Direct and Indirect Networks

    [Figure: 2D torus, N = 16, k = 4]

    Direct networks have end nodes connected in the network area/volume

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (51)

    Comparison of Direct and Indirect Networks

    [Figure: 64-node system with 8-port switches, b = 4; 32-node system with 8-port switches]

    Bristling can be used to reduce direct network switch & link costs
      b end nodes connect to each switch, where b is the bristling factor
      Allows larger systems to be built from fewer switches and links
      Requires larger switch degree
      For N = 32 and k = 8, fewer switches and links than fat tree
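    The effect of the bristling factor on switch count and switch degree can be sketched numerically. The example below assumes a torus in which each switch has two bidirectional network ports per dimension plus b ports for end nodes; the function name, the dims parameter, and the choice of b = 4 for the 32-node case are assumptions made for illustration.

```python
def bristled_torus(num_nodes, b, dims=2):
    """Switch count and ports per switch for a torus with bristling factor b."""
    if num_nodes % b:
        raise ValueError("num_nodes must be a multiple of the bristling factor")
    return {
        "switches": num_nodes // b,        # b end nodes share each switch
        "ports_per_switch": 2 * dims + b,  # network ports plus end-node ports
    }


if __name__ == "__main__":
    print(bristled_torus(64, 4))  # matches "64-node system with 8-port switches, b = 4"
    print(bristled_torus(32, 4))  # assumed b = 4 for the 32-node case
    print(bristled_torus(64, 1))  # no bristling: one end node per switch
```

    Raising b from 1 to 4 cuts the 64-node system from 64 switches to 16, at the price of growing each switch from 5 ports to 8.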

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (52)

    Comparison of Direct and Indirect Networks

    [Figure: layout showing switches and end nodes]

    Distance scaling problems may be exacerbated in on-chip MINs

    T.M. Pinkston, J. Duato, with major contributions by J. Filch

    ECE 8813a (55)

    A Unified View of Direct and Indirect Networks

    Switch designs in both cases are coalescing

    Generic network may have 0, 1, or more compute nodes/switch

    Switches implement programmable routing functions

    Differences are primarily an issue of topology
      Imagine the use of source routed messages

    Deadlock avoidance

    ECE 8813a (56)

    Summary and Research Directions

    Use of hybrid interconnection networks
      Best way to utilize existing pin-out?

    Engineering considerations rapidly prune the space of candidate topologies

    Routing + switching + topology = network

    On to routing.