Advanced Digital IC Design
Network-on-Chip (NoC)
Chenxin Zhang & Xiaodong Liu
Agenda
Introduction
NoC Concept
NoC topology
Switching strategies
Routing algorithms
Flow control schemes
NoC Architecture Examples
Emerging NoC technologies
Introduction
Evolution of on-chip communication architectures
Network-on-Chip (NoC) is a packet-switched on-chip communication network designed using a layered methodology. NoC is a communication-centric design paradigm for System-on-Chip (SoC).
NoCs use packets to route data from the source processing element (PE) to the destination PE via a network fabric that consists of
Network interfaces/adapters (NI)
Routers (a.k.a. switches)
Interconnection links (channels, wire bundles)
Building Blocks: NI
The NI front end presents a session-layer (point-to-point) interface to the node; the back end manages the interface with the switches, with decoupling logic & synchronization between them.

Front end: standardized node interface @ session layer
1. Supported transactions (e.g. QoS read, …)
2. Degree of parallelism
3. Session protocol control flow & negotiation

Back end: NoC-specific (layers 1-4)
1. Physical channel interface
2. Link-level protocol
3. Network layer (packetization)
4. Transport layer (routing)

(Figure: node, NI front end, NI back end, switches; a standard P2P protocol on the node side, a proprietary link protocol on the switch side.)
Building Blocks: Router (Switch)
Router or Switch: receives and forwards packets
Buffers have a dual function: synchronization & queuing
(Figure: router internals: input buffers & flow control, crossbar, output buffers & flow control, allocator/arbiter, QoS & routing logic, data ports with flow-control wires.)
Building Blocks: Links
Connects two routers in both directions over a number of wires (e.g., 32 bits)
Wires for control are part of the link too
Can be pipelined (including handshaking for asynchronous links)
NoC Concept
Topology: how the nodes are connected together
Switching: allocation of network resources (bandwidth, buffer capacity, …) to information flows
Routing: path selection between a source and a destination node in a particular topology
Flow control: how the downstream node communicates forwarding availability to the upstream node
NoC Topology
Direct
Indirect
Irregular
Direct Topologies
Each node has a direct point-to-point link to a subset of other nodes in the system, called neighboring nodes
As the number of nodes in the system increases, the total available communication bandwidth also increases
The fundamental trade-off is between connectivity and cost
Most direct network topologies have an orthogonal implementation, where nodes can be arranged in an n-dimensional orthogonal space
e.g. n-dimensional mesh, torus, folded torus, hypercube, and octagon
2D-mesh
The most popular topology
All links have the same length, which eases physical design
Area grows linearly with the number of nodes
Must be designed to avoid traffic accumulating in the center of the mesh
Torus
Torus topology, also called a k-ary n-cube, is an n-dimensional grid with k nodes in each dimension
A k-ary 1-cube (1-D torus) is essentially a ring network with k nodes
limited scalability, as performance decreases when more nodes are added
A k-ary 2-cube (i.e., 2-D torus) topology is similar to a regular mesh, except that nodes at the edges are connected to switches at the opposite edge via wrap-around channels
long end-around connections can, however, lead to excessive delays
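The wrap-around channels can be illustrated with a short Python helper (the function name is illustrative, not from the slides) that lists a node's neighbors in a k-ary 2-cube:

```python
def torus_neighbors(x, y, k):
    """Neighbors of node (x, y) in a k-ary 2-cube (2-D torus).

    Edge nodes wrap around to the opposite edge, so every node
    has exactly four neighbors regardless of position.
    """
    return [((x + 1) % k, y), ((x - 1) % k, y),
            (x, (y + 1) % k), (x, (y - 1) % k)]

# A corner node of a 4-ary 2-cube reaches the opposite edges:
print(torus_neighbors(0, 0, 4))  # [(1, 0), (3, 0), (0, 1), (0, 3)]
```

In a plain mesh the corner node would have only two neighbors; the two extra wrap-around links are exactly the long end-around connections mentioned above.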
Folding Torus
The folded torus topology overcomes the long-link limitation of a 2-D torus
Meshes and tori can be extended by adding bypass links to increase performance at the cost of higher area
Octagon
Octagon topology is another example of a direct network
Messages sent between any 2 nodes require at most two hops
More octagons can be tiled together to accommodate larger designs by using one of the nodes as a bridge node
Indirect Topologies
Each node is connected to an external switch, and switches have point-to-point links to other switches
Fat-tree topology
Butterfly topology
Irregular or ad hoc network topologies
Customized for an application
Usually a mix of shared-bus, direct, and indirect network topologies
E.g. reduced mesh, cluster-based hybrid topology
Switching techniques
Circuit switching
+ Dedicated links, simple, low overhead, full bandwidth
- Inflexible, low utilization
Packet switching+ Shared links, flexible, variable bit rate (payload length)- Packet overhead
Switching mode:
o Datagram switching: packet oriented
o Virtual circuit switching: connection oriented
Switching scheme:
o Store and Forward (SAF) switching
o Virtual Cut-Through (VCT): e.g. Ethernet
• Low latency, decreased reliability
o Wormhole switching (WH): e.g. NoC
• Few buffers, lower latency, decreased reliability
Packet Switching (Store and Forward)
Buffers for data packets
Source end node
Destination end node
Store: packets are completely stored before any portion is forwarded
Requirement: buffers must be sized to hold the entire packet
Packet Switching (Virtual Cut Through)
Portions of a packet may be forwarded ("cut-through") to the next switch before the entire packet is stored at the current switch
Virtual Cut Through vs. Wormhole
Virtual Cut-Through operates at the packet level: buffers for data packets must be sized to hold an entire packet
Wormhole operates at the FLIT level: buffers hold flits, so packets can be larger than the buffers
When the outgoing link is busy:
Virtual Cut-Through: the packet is completely stored at the switch
Wormhole: the packet is stored along the path, spread across several switches
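The buffering difference translates directly into zero-load latency. A rough Python model (the function names are my own, and it assumes a per-hop routing delay t_r while ignoring contention):

```python
def saf_latency(hops, packet_bits, link_bw, t_r):
    # Store-and-forward: the whole packet is serialized at every hop.
    return hops * (t_r + packet_bits / link_bw)

def wormhole_latency(hops, packet_bits, flit_bits, link_bw, t_r):
    # Wormhole (and VCT at zero load): only the header flit pays the
    # per-hop cost; the rest of the packet pipelines behind it.
    return hops * (t_r + flit_bits / link_bw) + (packet_bits - flit_bits) / link_bw

# 4 hops, 512-bit packet, 32-bit flits, 32 bits/cycle links, 1-cycle routing:
print(saf_latency(4, 512, 32, 1))           # 68.0 cycles
print(wormhole_latency(4, 512, 32, 32, 1))  # 23.0 cycles
```

The gap widens with hop count, which is why wormhole switching dominates in NoCs despite its blocking behavior.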
Routing Algorithms
Responsible for correctly and efficiently routing packets or circuits from the source to the destination
Ensure load balancing
Latency minimization
Deadlock and livelock free
(Figures: deadlock and livelock examples between a source S and a destination D.)
Static vs. Dynamic Routing
Static routing
+ Simple logic, low overhead
+ Guaranteed in-order packet delivery
- Does not take into account current state of the network
Dynamic routing
+ Dynamic traffic distribution according to the current state of the network
- Complex, need to monitor state of the network and dynamically change routing paths
Turn model based routing algorithm
Basis: Mainly for mesh NoC
Analyze directions in which packets can turn in the network
Determine the cycles that such turns can form
Prohibit just enough turns to break all cycles
The resulting routing algorithms are:
Deadlock- and livelock-free
Minimal/non-minimal
Highly adaptive, based on the network load
Turn model
What is a turn?
From one dimension to another: 90-degree turn
To another virtual channel in the same direction: 0-degree turn
To the reverse direction: 180-degree turn
Turns combine to form cycles
X-Y routing algorithm (Deterministic)
(Figure: the eight 90-degree turns in a 2-D mesh; X-Y routing permits only the turns from the x dimension (+x, -x) into the y dimension (+y, -y), never from y back into x, so a packet travels in X first, then in Y.)
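Dimension-ordered X-Y routing is simple enough to sketch directly (illustrative Python, not from the slides):

```python
def xy_route(src, dst):
    """Deterministic X-Y route on a 2-D mesh as a list of hop directions.

    The X offset is exhausted before any Y move, and a packet never
    turns from Y back into X, so the cycles required for deadlock
    cannot form.
    """
    x, y = src
    path = []
    while x != dst[0]:                      # first, resolve the X dimension
        path.append('+x' if dst[0] > x else '-x')
        x += 1 if dst[0] > x else -1
    while y != dst[1]:                      # then, resolve the Y dimension
        path.append('+y' if dst[1] > y else '-y')
        y += 1 if dst[1] > y else -1
    return path

print(xy_route((0, 0), (2, 1)))  # ['+x', '+x', '+y']
```

Every source-destination pair yields exactly one path, which is what makes X-Y routing deterministic and in-order but blind to congestion.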
West-First routing algorithm
Route a packet west first, if it needs to travel west at all; after that, route adaptively among the remaining directions. All turns into the west direction are prohibited.
(Figures: example west-first routes between source S and destination D.)
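West-first routing can be sketched in Python (hypothetical function name): if the destination lies to the west, the only legal move is west; otherwise the packet may choose adaptively among the remaining productive directions.

```python
def west_first_step(cur, dst):
    """Set of directions a west-first router may take from cur toward dst.

    Turns into the west direction are prohibited, so all westward
    travel must happen first, before any other move.
    """
    dx = dst[0] - cur[0]
    dy = dst[1] - cur[1]
    if dx < 0:
        return {'west'}          # must finish every west hop first
    dirs = set()                 # afterwards: fully adaptive choice
    if dx > 0:
        dirs.add('east')
    if dy > 0:
        dirs.add('north')
    if dy < 0:
        dirs.add('south')
    return dirs

print(west_first_step((3, 2), (1, 4)))  # {'west'}
```

When the destination is east of the source, the returned set can hold two directions, which is where the adaptivity (and the load balancing) of the turn model comes from.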
North-Last routing algorithm
Route a packet north only as its final moves; all turns out of the north direction are prohibited.
(Figure: example north-last route between source S and destination D.)
Source routing
Flow Control
Required in non-Circuit Switched networks to deal with congestion
Recover from transmission errors
Commonly used schemes:
ACK-NACK Flow control
Credit based Flow control
Xon/Xoff (STALL-GO) Flow Control
(Figure: backpressure: when node C blocks, B's buffer fills and B signals A "buffer full, don't send", propagating the stall upstream.)
Flow Control Schemes
Credit-based flow control
The sender sends packets whenever its credit counter is not zero; each packet sent decrements the counter.
(Figure: the credit counter counts down from 10 toward 0 while the receiver's queue is not serviced; transfers are pipelined.)
When buffer slots become available, the receiver sends credits back (e.g. +5) and the sender resumes injection.
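The credit mechanism fits in a few lines of Python; the class and method names here are illustrative, not from any NoC library.

```python
class CreditSender:
    """Minimal sketch of the sender side of credit-based flow control.

    The sender holds one credit per free slot in the receiver's input
    buffer and may inject a packet only while the counter is non-zero.
    """
    def __init__(self, credits):
        self.credits = credits

    def try_send(self):
        if self.credits == 0:
            return False          # stalled: receiver buffer is full
        self.credits -= 1         # one buffer slot is now in flight
        return True

    def receive_credits(self, n):
        self.credits += n         # receiver drained n slots

s = CreditSender(credits=3)
print([s.try_send() for _ in range(4)])  # [True, True, True, False]
s.receive_credits(2)                     # receiver services its queue
print(s.try_send())                      # True: injection resumes
```

Because the counter mirrors exactly the free space downstream, the receiver's buffer can never overflow, regardless of link latency.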
Flow Control Schemes
Xon/Xoff flow control
A packet is injected as long as the sender's control bit is in the Xon state.
When the Xoff threshold is reached, an Xoff notification is sent; while in Xoff, the sender cannot inject packets and the receiver's queue drains. An Xon notification later re-enables injection.
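The two-threshold behavior can be sketched as follows; the class name and the particular threshold values are assumptions for illustration, not taken from any real NoC.

```python
class XonXoffReceiver:
    """Minimal sketch of Xon/Xoff (STALL-GO) flow control, receiver side.

    Crossing the high-water mark sends Xoff; draining below the
    low-water mark sends Xon. The gap between the two thresholds
    absorbs packets already in flight.
    """
    def __init__(self, xoff_at=8, xon_at=4):
        self.queue = 0
        self.xoff_at = xoff_at
        self.xon_at = xon_at
        self.sender_state = 'Xon'    # control bit mirrored at the sender

    def accept(self):
        self.queue += 1
        if self.queue >= self.xoff_at:
            self.sender_state = 'Xoff'   # notify sender: stop injecting

    def service(self):
        self.queue -= 1
        if self.queue <= self.xon_at:
            self.sender_state = 'Xon'    # notify sender: resume injecting

r = XonXoffReceiver()
for _ in range(8):
    r.accept()
print(r.sender_state)   # 'Xoff': threshold reached
for _ in range(4):
    r.service()
print(r.sender_state)   # 'Xon': queue drained below the Xon mark
```

Note the sender only sees two control values instead of a full credit count, which is why the scheme needs less control traffic but gives coarser information than credits.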
Credit-Based vs. Xon/Xoff Flow Control
Both schemes can fully utilize buffers
Restart latency is lower for credit-based schemes; therefore credit-based flow control has higher average buffer occupancy at high loads
Credit-based flow control leads to higher throughput at high loads
Smaller inter-packet gap
Control traffic is higher for credit schemes
Block credits can be used to tune link behavior
Buffer sizes are independent of round trip latency for credit schemes (at the expense of performance)
Credit schemes have higher information content useful for QoS schemes
NoC Architecture Examples
Intel’s Teraflops Research Processor
Deliver Tera-scale performance
Single-precision TFLOP at desktop power
Frequency target 5 GHz
Bi-section bandwidth on the order of Terabits/s
Link bandwidth in hundreds of GB/s
Prototype two key technologies
On-die interconnect fabric
3D stacked memory
Develop a scalable design methodology
Tiled design approach
Mesochronous clocking
Power-aware capability

Technology: 65nm, 1 poly, 8 metal (Cu)
Transistors: 100 Million (full-chip), 1.2 Million (tile)
Die area: 275 mm2 (full-chip, 21.72 mm x 12.64 mm), 3 mm2 (tile, 2.0 mm x 1.5 mm)
C4 bumps: 8390

(Die photo: tiled array with I/O areas, PLL and TAP.)
[Vangal08]
Main Building Blocks
Special-purpose cores connected by a 2D mesh interconnect; 40 GB/s links
Crossbar router with mesochronous interfaces (MSINT)
Mesochronous clocking; workload-aware power management
Each tile contains a Processing Engine (PE) with:
3KB instruction memory (IMEM) and 2KB data memory (DMEM)
6-read, 4-write 32-entry register file
Two floating-point multiply-accumulators (FPMAC0, FPMAC1) with normalization
Router interface block (RIB)
Emerging NoC technologies
Multi-Band RF-Interconnect
Source: M. Chang, CMP Network-on-Chip Overlaid With Multi-Band RF-Interconnect
An area overhead of 0.13%, an average 13% (max 18%) boost in application performance, and an average 22% (max 24%) reduction in packet latency.
3D optical NoC
32Gbps optical link bandwidth, 70% power reduction compared to a matched 2D electronic NoC.
Source: Y. Ye, 3D Optical Networks-on-chip (NoC) for Multiprocessor Systems-on-chip (MPSoC)
References
[1] S. Pasricha, On-Chip Communication: Networks on Chip (NoCs), 2011.
[2] C. J. Glass, The Turn Model for Adaptive Routing.
[3] U. M. Mirza, Network on Chip, 2011.
[4] M.F. Chang et al., "CMP network-on-chip overlaid with multi-band RF-interconnect," IEEE 14th International Symposium on High Performance Computer Architecture (HPCA), pp. 191-202, Feb. 2008.
[5] Y. Ye, "3D Optical Networks-on-chip (NoC) for Multiprocessor Systems-on-chip (MPSoC)," IEEE International Conference on 3D System Integration (3DIC), pp. 1-6, Sept. 2009.