20
Phastlane: A Rapid Transit Optical Routing Network Mark Cianchetti, Joseph Kerekes, and David Albonesi Computer Systems Laboratory Cornell University

Phastlane: A Rapid Transit Optical Routing Network · Computer Systems Laboratory Cornell University. O/E Translation E/O Translation Optical Coupler Photonic System Components The

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

  • Phastlane: A Rapid Transit Optical Routing Network

    Mark Cianchetti, Joseph Kerekes, and David Albonesi

    Computer Systems Laboratory

    Cornell University

  • O/E

    Translation

    E/O

    Translation

    Optical

    Coupler

    Photonic System Components

    The Interconnect Bottleneck

    � Future processors: tens to hundreds of cores

    � Dire need for fast and power efficient on-chip interconnect

    � Nanophotonics

    • Ultra-fast signal propagation

    • High bandwidth density

    • Low power

    10001101111100101010111100001

    11011101110011100000110101000

    11111001010101000111000111000

    11101010101001110101010001110

    Electrical Input Data

    + Vinput -

  • Limitations of Nanophotonics

    � Lack of fundamental building blocks

    • Logic gates

    • Memory structures

    � Single routing layer

    • Power loss of waveguide crossings

    Upon formation, unimpeded data

    blasted from source to destination

    Traditional Approach To Exploiting Photonics

    Source

    NodeDestination

    Node

    Direct Bus Path

  • Respecting the Limitations…

    � Previous proposals largely bus-based

    � Direct photonic links between source and destination

    � Data blasted in a single optical transmission

    Corona ArchitectureColumbia Architecture FireFly ArchitectureCornell Ring Architecture

    Dest Node

    Me

    sh

    Co

    lum

    n

    Electrical Network

    Optical Network

  • Overview of Previous Work

    Cache Line

    Token

    Destination Bus

    Fully Connected XBAR

    Snake XBAR

    Corona

    Cache Line>> Cache LineCache LineUnit of Data Transfer

    Per HopPer HopWDMShared Resource Arb.

    Router ChannelsRouter ChannelsNoneShared Resources

    Packet SwitchedElectrical SetupSnoopy BusNetwork Operation

    MeshTorusRing-busNetwork Topology

    PhastlaneColumbiaCornell

    RingFeature

  • Phastlane Contributions

    � Novel nanophotonic router architecture

    � Hybrid optical/electrical packet-switched, mesh network

    • Nanophotonics: High-speed, multi-hop packet transfers

    • Electrical: Logic and buffering to handle packet conflicts

  • � Minimally impede flow of light from source to destination

    • Simplified routing, arbitration

    � Dual die configuration

    Phastlane Architecture

    Transmit: Node 0 to Node 10

    Arbitration

    Routing

  • Optical Node Operation

    Crossbar Resonators

    Optical Router

  • Electrical Node Operation

    � Blocked packets are buffered locally

    • Per port buffers, single processor buffer

    � Output multiplexers connect to output port optical transmitters

    Electrical Node

  • Simplified Packet Routing

    � Packets are dimension-order, source routed

    • Portion of packet holds pre-computed routing bits

    � Per hop control bits enable near-instant switch traversal

    Source Routing

    110010111010011100011111011111110001100101010100011

    100011100010010100011100001110001110001111100000000

    000101000111000111000101010101101010100111000001111

    +

    Data/Address/Other

    Per Hop Packet Control

    01000

    Straight

    R5

    01000

    Straight

    R4

    00100

    Left

    R3

    01000

    Straight

    R2

    01000

    Straight

    10000

    Local

    Direction

    Encoding

    R1R6Hop Number

    XY Dimension-Order

  • Per Hop Control Bits

    � Per hop control groups set switch resonators

    • Left, right, straight, local, broadcast

    � Frequency translation

    • Enables hop oblivious routing

    Left

    Rig

    ht

    Stra

    ight

    Local

    Bro

    adcast

    Control Waveguide

    R 3R 4

    Empty

    01000 Straight

    R5

    01000 Straight

    R4

    01000 Straight

    R3

    00100 Left

    R2

    01000 Straight

    10000 Local

    Direction

    Encoding

    R1R6Hop Number

    Router R1 examines

    control bits

    Frequency shift

  • Per Hop Control Bits Continued

    � Control waveguide split into C0 and C1

    • WDM limitations

    � Frequency and physical translation

    • Enables hop oblivious routing

    C1

    C0C1

    C0

  • Competing Packets: Resolution

    � Packet collisions avoided by packet blocking

    � Collision resolution performed “on the fly”

    N

    S

    W E

    Two packets collide

    Packet

    Received

    Collision resolution

  • High-Speed Drop Network

    � Drop path gradually formed as packet passes through network

    � Packet dropped if downstream buffer full

    � Drop signal travels opposite data packet

    Source

    Drop NodeN

    EW

    D0i D1 i C0i C1iD00D1oC0oC1o D2iD2o

    D0i

    D1i

    C0i

    C1i

    D00

    D1o

    C0o

    C1o

    D2i

    D2o

    D0i

    D1i

    C0i

    C1i

    D00

    D1o

    C0o

    C1o

    D2i

    D2o

    D0iD1iC0iC1i D00 D1o C0o C1oD2i D2o

    N

    EW

    D0i D1 i C0i C1iD00D1oC0oC1o D2iD2o

    D0i

    D1i

    C0i

    C1i

    D00

    D1o

    C0o

    C1o

    D2i

    D2o

    D0i

    D1i

    C0i

    C1i

    D00

    D1o

    C0o

    C1o

    D2i

    D2o

    D0iD1iC0iC1i D00 D1o C0o C1oD2i D2o

    N

    EW

    D0i D1 i C0i C1iD00D1oC0oC1o D2iD2o

    D0i

    D1i

    C0i

    C1i

    D00

    D1o

    C0o

    C1o

    D2i

    D2o

    D0i

    D1i

    C0i

    C1i

    D00

    D1o

    C0o

    C1o

    D2i

    D2o

    D0iD1iC0iC1i D00 D1o C0o C1oD2i D2o

    N

    EW

    D0i D1 i C0i C1iD00D1oC0oC1o D2iD2o

    D0i

    D1i

    C0i

    C1i

    D00

    D1o

    C0o

    C1o

    D2i

    D2o

    D0i

    D1i

    C0i

    C1i

    D00

    D1o

    C0o

    C1o

    D2i

    D2o

    D0iD1iC0iC1i D00 D1o C0o C1oD2i D2o

    N

    EW

    D0i D1 i C0i C1iD00D1oC0oC1o D2iD2o

    D0i

    D1i

    C0i

    C1i

    D00

    D1o

    C0o

    C1o

    D2i

    D2o

    D0i

    D1i

    C0i

    C1i

    D00

    D1o

    C0o

    C1o

    D2i

    D2o

    D0iD1iC0iC1i D00 D1o C0o C1oD2i D2o

    N

    EW

    D0i D1 i C0i C1iD00D1oC0oC1o D2iD2o

    D0i

    D1i

    C0i

    C1i

    D00

    D1o

    C0o

    C1o

    D2i

    D2o

    D0i

    D1i

    C0i

    C1i

    D00

    D1o

    C0o

    C1o

    D2i

    D2o

    D0iD1iC0iC1i D00 D1o C0o C1oD2i D2o

    D00C0oC1o D0 i C0 i C1 i

    D0o

    C0o

    C1o

    D0i

    C0i

    C1i

    D00

    C0o

    C1o

    D0i

    C0i

    C1i

    D00 C0o C1oD0 iC0 iC1 i

  • Router Design Space Exploration

  • Evaluation Methodology

    � 64 node, 8x8 mesh network topology

    � Low latency, high bandwidth electrical network baseline

    • Multi-port receive for destination packets

    � Packet size: 80 bytes, single flit

    � Synthetic and Splash benchmarks

    Baseline Configuration

    XBAR

    Baseline Node

  • Synthetic Benchmark Results

    0

    10

    20

    30

    40

    50

    0 0.05 0.1 0.15 0.2 0.25 0.3

    Offered Traffic

    Late

    ncy .

    0

    10

    20

    30

    40

    50

    0 0.05 0.1 0.15 0.2

    Offered Traffic

    Late

    ncy .

    0

    10

    20

    30

    40

    50

    0 0.05 0.1 0.15 0.2 0.25 0.3

    Offered Traffic

    Late

    ncy .

    0

    10

    20

    30

    40

    50

    0 0.05 0.1 0.15 0.2

    Offered Traffic

    Late

    ncy .

    8X6X

    5X 6X

    Bit Comp Bit Rev

    Shuffle Transpose

  • 0

    1

    2

    3

    4

    5

    Bar

    nes

    Cho

    lesk

    y

    FFT Lu

    Oce

    an

    Rad

    ixR

    aytra

    ceW

    ater

    NSq

    Wat

    ersp

    atia

    l

    FMM

    Ne

    two

    rk S

    pe

    ed

    up

    Electrical

    Optical

    Optical 64 Entry Buffers

    Optical Inf Buffers

    Splash Performance Analysis

    � 2X speedup across all benchmarks

  • Splash Power Analysis

    � 80% reduction in power across all benchmarks

    0

    0.25

    0.5

    0.75

    1

    Barn

    esC

    hole

    sky

    FFT Lu

    Oce

    an

    Rad

    ixR

    aytra

    ceW

    ater

    NSq

    Wat

    ersp

    atia

    l

    FMM

    Re

    lati

    ve

    Po

    we

    r

    Electrical

    Optical

    Optical 64 Entry Buffers

  • Conclusions

    � Novel nanophotonic router architecture

    � Packet-switched, hybrid optical/electrical mesh network

    � Up to 8X performance improvement for synthetic workloads

    � Up to 4X performance improvement, 80% power reduction, for Splash

    � Future work

    • Lower power broadcast scheme

    • Improved allocation and flow control