Advanced Digital IC Design
Network-on-Chip (NoC)
Chenxin Zhang & Xiaodong Liu
Agenda
Introduction
NoC Concept
NoC topology
Switching strategies
Routing algorithms
Flow control schemes
NoC Architecture Examples
Emerging NoC technologies
Introduction
Evolution of on-chip communication architectures
Network-on-Chip (NoC) is a packet-switched on-chip communication network designed using a layered methodology. NoC is a communication-centric design paradigm for System-on-Chip (SoC).
NoCs use packets to route data from the source processing element (PE) to the destination PE via a network fabric that consists of
Network interfaces/adapters (NI)
Routers (a.k.a. switches)
Interconnection links (channels, wire bundles)
Building Blocks: NI
The NI front end presents a session-layer (point-to-point) interface to the node; the back end manages the interface with the switches, with decoupling logic & synchronization between them.

Front end: standardized node interface @ session layer
1. Supported transactions (e.g. QoS read, …)
2. Degree of parallelism
3. Session protocol control flow & negotiation

Back end: NoC-specific (layers 1-4)
1. Physical channel interface
2. Link-level protocol
3. Network layer (packetization)
4. Transport layer (routing)

(Figure: node, NI front end, NI back end, switches; a standard P2P protocol on the node side, a proprietary link protocol on the switch side.)
Building Blocks: Router (Switch)
Router or Switch: receives and forwards packets
Buffers have a dual function: synchronization & queuing
(Figure: router internals: input buffers & flow control, crossbar, output buffers & flow control, allocator/arbiter, QoS & routing logic, data ports with flow-control wires.)
Building Blocks: Links
Connects two routers in both directions over a number of wires (e.g., 32 bits)
Wires for control are part of the link too
Can be pipelined (including handshaking for asynchronous links)
NoC Concept
Topology: how the nodes are connected together
Switching: allocation of network resources (bandwidth, buffer capacity, …) to information flows
Routing: path selection between a source and a destination node in a particular topology
Flow control: how the downstream node communicates forwarding availability to the upstream node
NoC Topology
Direct
Indirect
Irregular
Direct Topologies
Each node has a direct point-to-point link to a subset of other nodes in the system, called neighboring nodes
As the number of nodes in the system increases, the total available communication bandwidth also increases
The fundamental trade-off is between connectivity and cost
Most direct network topologies have an orthogonal implementation, where nodes can be arranged in an n-dimensional orthogonal space
e.g. n-dimensional mesh, torus, folded torus, hypercube, and octagon
2D-mesh
The most popular topology
All links have the same length, which eases physical design
Area grows linearly with the number of nodes
Must be designed to avoid traffic accumulating in the center of the mesh
Torus
Torus topology, also called a k-ary n-cube, is an n-dimensional grid with k nodes in each dimension
A k-ary 1-cube (1-D torus) is essentially a ring network with k nodes
limited scalability, as performance decreases when more nodes are added
A k-ary 2-cube (i.e., 2-D torus) topology is similar to a regular mesh, except that nodes at the edges are connected to switches at the opposite edge via wrap-around channels
long end-around connections can, however, lead to excessive delays
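The wrap-around channels can be illustrated with a short Python helper (the function name is illustrative, not from the slides) that lists a node's neighbors in a k-ary 2-cube:

```python
def torus_neighbors(x, y, k):
    """Neighbors of node (x, y) in a k-ary 2-cube (2-D torus).

    Edge nodes wrap around to the opposite edge, so every node
    has exactly four neighbors regardless of position.
    """
    return [((x + 1) % k, y), ((x - 1) % k, y),
            (x, (y + 1) % k), (x, (y - 1) % k)]

# A corner node of a 4-ary 2-cube reaches the opposite edges:
print(torus_neighbors(0, 0, 4))  # [(1, 0), (3, 0), (0, 1), (0, 3)]
```

In a plain mesh the corner node would have only two neighbors; the two extra wrap-around links are exactly the long end-around connections mentioned above.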
Folding Torus
The folded torus topology overcomes the long-link limitation of a 2-D torus
Meshes and tori can be extended by adding bypass links to increase performance at the cost of higher area
Octagon
Octagon topology is another example of a direct network
Messages sent between any 2 nodes require at most two hops
More octagons can be tiled together to accommodate larger designs by using one of the nodes as a bridge node
Indirect Topologies
Each node is connected to an external switch, and switches have point-to-point links to other switches
Fat-tree topology
Butterfly topology
Irregular or ad hoc network topologies
Customized for an application
Usually a mix of shared-bus, direct, and indirect network topologies
E.g. reduced mesh, cluster-based hybrid topology
Switching techniques
Circuit switching
+ Dedicated links, simple, low overhead, full bandwidth
- Inflexible, low utilization
Packet switching+ Shared links, flexible, variable bit rate (payload length)- Packet overhead
Switching mode:
o Datagram switching: packet oriented
o Virtual circuit switching: connection oriented
Switching scheme:
o Store and Forward (SAF) switching
o Virtual Cut-Through (VCT): e.g. Ethernet
• Low latency, decreased reliability
o Wormhole switching (WH): e.g. NoC
• Few buffers, lower latency, decreased reliability
Packet Switching (Store and Forward)
Buffers for data packets
Source end node
Destination end node
Store: packets are completely stored before any portion is forwarded
Requirement: buffers must be sized to hold the entire packet
Packet Switching (Virtual Cut Through)
Portions of a packet may be forwarded ("cut-through") to the next switch before the entire packet is stored at the current switch
Virtual Cut Through vs. Wormhole
Virtual Cut-Through operates at the packet level: buffers for data packets must be sized to hold an entire packet
Wormhole operates at the FLIT level: buffers hold flits, so packets can be larger than the buffers
When the outgoing link is busy:
Virtual Cut-Through: the packet is completely stored at the switch
Wormhole: the packet is stored along the path, spread across several switches
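The buffering difference translates directly into zero-load latency. A rough Python model (the function names are my own, and it assumes a per-hop routing delay t_r while ignoring contention):

```python
def saf_latency(hops, packet_bits, link_bw, t_r):
    # Store-and-forward: the whole packet is serialized at every hop.
    return hops * (t_r + packet_bits / link_bw)

def wormhole_latency(hops, packet_bits, flit_bits, link_bw, t_r):
    # Wormhole (and VCT at zero load): only the header flit pays the
    # per-hop cost; the rest of the packet pipelines behind it.
    return hops * (t_r + flit_bits / link_bw) + (packet_bits - flit_bits) / link_bw

# 4 hops, 512-bit packet, 32-bit flits, 32 bits/cycle links, 1-cycle routing:
print(saf_latency(4, 512, 32, 1))           # 68.0 cycles
print(wormhole_latency(4, 512, 32, 32, 1))  # 23.0 cycles
```

The gap widens with hop count, which is why wormhole switching dominates in NoCs despite its blocking behavior.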
Routing Algorithms
Responsible for correctly and efficiently routing packets or circuits from the source to the destination
Ensure load balancing
Latency minimization
Deadlock and livelock free
(Figures: deadlock and livelock examples between a source S and a destination D.)
Static vs. Dynamic Routing
Static routing
+ Simple logic, low overhead
+ Guaranteed in-order packet delivery
- Does not take into account current state of the network
Dynamic routing
+ Dynamic traffic distribution according to the current state of the network
- Complex, need to monitor state of the network and dynamically change routing paths
Turn model based routing algorithm
Basis: Mainly for mesh NoC
Analyze directions in which packets can turn in the network
Determine the cycles that such turns can form
Prohibit just enough turns to break all cycles
The resulting routing algorithms are:
Deadlock- and livelock-free
Minimal/non-minimal
Highly adaptive, based on the network load
Turn model
What is a turn?
From one dimension to another: 90-degree turn
To another virtual channel in the same direction: 0-degree turn
To the reverse direction: 180-degree turn
Turns combine to form cycles
X-Y routing algorithm (Deterministic)
(Figure: the eight 90-degree turns in a 2-D mesh; X-Y routing permits only the turns from the x dimension (+x, -x) into the y dimension (+y, -y), never from y back into x, so a packet travels in X first, then in Y.)
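Dimension-ordered X-Y routing is simple enough to sketch directly (illustrative Python, not from the slides):

```python
def xy_route(src, dst):
    """Deterministic X-Y route on a 2-D mesh as a list of hop directions.

    The X offset is exhausted before any Y move, and a packet never
    turns from Y back into X, so the cycles required for deadlock
    cannot form.
    """
    x, y = src
    path = []
    while x != dst[0]:                      # first, resolve the X dimension
        path.append('+x' if dst[0] > x else '-x')
        x += 1 if dst[0] > x else -1
    while y != dst[1]:                      # then, resolve the Y dimension
        path.append('+y' if dst[1] > y else '-y')
        y += 1 if dst[1] > y else -1
    return path

print(xy_route((0, 0), (2, 1)))  # ['+x', '+x', '+y']
```

Every source-destination pair yields exactly one path, which is what makes X-Y routing deterministic and in-order but blind to congestion.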
West-First routing algorithm
Route a packet west first, if it needs to travel west at all; after that, route adaptively among the remaining directions. All turns into the west direction are prohibited.
(Figures: example west-first routes between source S and destination D.)
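West-first routing can be sketched in Python (hypothetical function name): if the destination lies to the west, the only legal move is west; otherwise the packet may choose adaptively among the remaining productive directions.

```python
def west_first_step(cur, dst):
    """Set of directions a west-first router may take from cur toward dst.

    Turns into the west direction are prohibited, so all westward
    travel must happen first, before any other move.
    """
    dx = dst[0] - cur[0]
    dy = dst[1] - cur[1]
    if dx < 0:
        return {'west'}          # must finish every west hop first
    dirs = set()                 # afterwards: fully adaptive choice
    if dx > 0:
        dirs.add('east')
    if dy > 0:
        dirs.add('north')
    if dy < 0:
        dirs.add('south')
    return dirs

print(west_first_step((3, 2), (1, 4)))  # {'west'}
```

When the destination is east of the source, the returned set can hold two directions, which is where the adaptivity (and the load balancing) of the turn model comes from.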
North-Last routing algorithm
Route a packet north only as its final moves; all turns out of the north direction are prohibited.
(Figure: example north-last route between source S and destination D.)
Source routing
Flow Control
Required in non-Circuit Switched networks to deal with congestion
Recover from transmission errors
Commonly used schemes:
ACK-NACK Flow control
Credit based Flow control
Xon/Xoff (STALL-GO) Flow Control
(Figure: backpressure: when node C blocks, B's buffer fills and B signals A "buffer full, don't send", propagating the stall upstream.)
Flow Control Schemes
Credit-based flow control
The sender sends packets whenever its credit counter is not zero; each packet sent decrements the counter.
(Figure: the credit counter counts down from 10 toward 0 while the receiver's queue is not serviced; transfers are pipelined.)
When buffer slots become available, the receiver sends credits back (e.g. +5) and the sender resumes injection.
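The credit mechanism fits in a few lines of Python; the class and method names here are illustrative, not from any NoC library.

```python
class CreditSender:
    """Minimal sketch of the sender side of credit-based flow control.

    The sender holds one credit per free slot in the receiver's input
    buffer and may inject a packet only while the counter is non-zero.
    """
    def __init__(self, credits):
        self.credits = credits

    def try_send(self):
        if self.credits == 0:
            return False          # stalled: receiver buffer is full
        self.credits -= 1         # one buffer slot is now in flight
        return True

    def receive_credits(self, n):
        self.credits += n         # receiver drained n slots

s = CreditSender(credits=3)
print([s.try_send() for _ in range(4)])  # [True, True, True, False]
s.receive_credits(2)                     # receiver services its queue
print(s.try_send())                      # True: injection resumes
```

Because the counter mirrors exactly the free space downstream, the receiver's buffer can never overflow, regardless of link latency.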
Flow Control Schemes
Xon/Xoff flow control
A packet is injected as long as the sender's control bit is in the Xon state.
When the Xoff threshold is reached, an Xoff notification is sent; while in Xoff, the sender cannot inject packets and the receiver's queue drains. An Xon notification later re-enables injection.
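The two-threshold behavior can be sketched as follows; the class name and the particular threshold values are assumptions for illustration, not taken from any real NoC.

```python
class XonXoffReceiver:
    """Minimal sketch of Xon/Xoff (STALL-GO) flow control, receiver side.

    Crossing the high-water mark sends Xoff; draining below the
    low-water mark sends Xon. The gap between the two thresholds
    absorbs packets already in flight.
    """
    def __init__(self, xoff_at=8, xon_at=4):
        self.queue = 0
        self.xoff_at = xoff_at
        self.xon_at = xon_at
        self.sender_state = 'Xon'    # control bit mirrored at the sender

    def accept(self):
        self.queue += 1
        if self.queue >= self.xoff_at:
            self.sender_state = 'Xoff'   # notify sender: stop injecting

    def service(self):
        self.queue -= 1
        if self.queue <= self.xon_at:
            self.sender_state = 'Xon'    # notify sender: resume injecting

r = XonXoffReceiver()
for _ in range(8):
    r.accept()
print(r.sender_state)   # 'Xoff': threshold reached
for _ in range(4):
    r.service()
print(r.sender_state)   # 'Xon': queue drained below the Xon mark
```

Note the sender only sees two control values instead of a full credit count, which is why the scheme needs less control traffic but gives coarser information than credits.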
Credit-Based vs. Xon/Xoff Flow Control
Both schemes can fully utilize buffers
Restart latency is lower for credit-based schemes; therefore credit-based flow control has higher average buffer occupancy at high loads
Credit-based flow control leads to higher throughput at high loads
Smaller inter-packet gap
Control traffic is higher for credit schemes
Block credits can be used to tune link behavior
Buffer sizes are independent of round trip latency for credit schemes (at the expense of performance)
Credit schemes have higher information content useful for QoS schemes
NoC Architecture Examples
Intel’s Teraflops Research Processor
Deliver Tera-scale performance
Single-precision TFLOP at desktop power
Frequency target 5 GHz
Bi-section bandwidth on the order of Terabits/s
Link bandwidth in hundreds of GB/s
Prototype two key technologies
On-die interconnect fabric
3D stacked memory
Develop a scalable design methodology
Tiled design approach
Mesochronous clocking
Power-aware capability

Technology: 65nm, 1 poly, 8 metal (Cu)
Transistors: 100 Million (full-chip), 1.2 Million (tile)
Die area: 275 mm2 (full-chip, 21.72 mm x 12.64 mm), 3 mm2 (tile, 2.0 mm x 1.5 mm)
C4 bumps: 8390

(Die photo: tiled array with I/O areas, PLL and TAP.)
[Vangal08]
Main Building Blocks
Special-purpose cores connected by a 2D mesh interconnect; 40 GB/s links
Crossbar router with mesochronous interfaces (MSINT)
Mesochronous clocking; workload-aware power management
Each tile contains a Processing Engine (PE) with:
3KB instruction memory (IMEM) and 2KB data memory (DMEM)
6-read, 4-write 32-entry register file
Two floating-point multiply-accumulators (FPMAC0, FPMAC1) with normalization
Router interface block (RIB)
Emerging NoC technologies
Multi-Band RF-Interconnect
Source: M. Chang, CMP Network-on-Chip Overlaid With Multi-Band RF-Interconnect
An area overhead of 0.13%, an average 13% (max 18%) boost in application performance, and an average 22% (max 24%) reduction in packet latency.
3D optical NoC
32Gbps optical link bandwidth, 70% power reduction compared to a matched 2D electronic NoC.
Source: Y. Ye, 3D Optical Networks-on-chip (NoC) for Multiprocessor Systems-on-chip (MPSoC)
References
[1] S. Pasricha, On-Chip Communication: Networks on Chip (NoCs), 2011.
[2] C. J. Glass, The Turn Model for Adaptive Routing.
[3] U. M. Mirza, Network on Chip, 2011.
[4] M.F. Chang et al., "CMP network-on-chip overlaid with multi-band RF-interconnect," IEEE 14th International Symposium on High Performance Computer Architecture (HPCA), pp. 191-202, Feb. 2008.
[5] Y. Ye, "3D Optical Networks-on-chip (NoC) for Multiprocessor Systems-on-chip (MPSoC)," IEEE International Conference on 3D System Integration (3DIC), pp. 1-6, Sept. 2009.