Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections
M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol
Computer Engineering Department, Sharif University of Technology, Tehran
Outline
- Introduction and Motivations
- Virtual Point-to-Point (VIP) Connections
- Static VIP Construction Scheme
- Dynamic VIP Construction Scheme
- Setup Network
- Evaluation Results
- Conclusions and Future Work
On-Chip Communication Mechanisms
- Packet-Switched NoCs
  - Good Resource Utilization
  - Modest Design Effort/Time Due to Structured and Predictable Links
  - Some Power and Performance Overheads Due to Multi-Stage Pipelined Routers
- Dedicated Point-to-Point Links
  - Ideal Power and Performance
  - Poor Scalability: Significant Area Overhead for Large Systems
  - Significant Design Effort/Time Due to Non-Predictable Link Properties
- Virtual Point-to-Point Connections in a Packet-Switched NoC
VIP Connections
- VIP: VIrtual Point-to-point Connections
  - Over One VC (Virtual Channel) of Each Physical Channel
  - Bypass Some Router Pipeline Stages
- Inexpensive Extensions to a Traditional Wormhole Router
  - Router Control Unit, Arbiter, Buffer of the VIP Virtual Channels
Router Architecture
- Buffer at the VIP Virtual Channels Is Replaced by a Register (1-Flit Buffer)
- VIP Paths Are Kept by VIP Allocator Units at Output Ports
  - Determines Which Input Is Connected to This Port Along the VIP
  - Allocates the Output Port to the VIP When Control Signals Indicate That the VIP Has an Incoming Flit to Forward
- A Flow-Control Mechanism Prevents Starvation of Packet-Switched Flits
VIP Connections
- A VIP Is Constructed by Chaining the VIP Registers in the Routers Between the Source and Destination Nodes of a Communication Flow
  - Provides a Virtual Dedicated Pipelined Link With 1-Flit VIP Buffers as Staging Registers
  - Flits Only Travel Over the Crossbars and Links Which Cover the Actual Physical Distance Between Their Source and Destination Nodes
  - Skip Through Buffer Read, Buffer Write, and Allocation Operations
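As a rough illustration of why skipping these stages matters, the sketch below compares zero-load latency over a conventional path and over a VIP. The per-hop cycle counts are assumptions made for illustration only: 5 cycles per hop for a conventional router (in line with the 5-stage pipelined baseline used in the evaluation) and 1 cycle per hop for a VIP staging register, with link delay folded into the per-hop figure. The function name `zero_load_latency` is hypothetical.

```python
def zero_load_latency(hops: int, cycles_per_hop: int, packet_flits: int) -> int:
    """Cycles until the tail flit arrives when the network is otherwise idle.

    The head flit needs hops * cycles_per_hop cycles; the remaining flits
    follow it in a pipeline, one per cycle.
    """
    return hops * cycles_per_hop + (packet_flits - 1)


# Assumed figures, for illustration only (not taken from the slides).
CONVENTIONAL_CYCLES_PER_HOP = 5  # assumed: one cycle per stage of a 5-stage router
VIP_CYCLES_PER_HOP = 1           # assumed: one VIP register stage per router

conventional = zero_load_latency(4, CONVENTIONAL_CYCLES_PER_HOP, 8)
vip = zero_load_latency(4, VIP_CYCLES_PER_HOP, 8)
print(conventional, vip)  # 27 11
```

Under these assumptions the saving grows linearly with the hop count, since only the per-hop term differs between the two cases.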
VIP Connections
- VIPs Are Not Allowed to Share a Common Link
  - To Remove Buffering, Arbitration,…
- A Limited Number of VIPs in a Network
  - But VIPs Cover a Significant Portion of On-Chip Traffic Due to Communication Locality
  - In Most Multi-Core SoC Applications Each Core Communicates With a Few Other Cores
  - In CMP Workloads Each Node Tends to Have a Small Number of Favored Destinations for Its Messages
VIP Construction Algorithm - Static
- Based on Application Traffic Pattern
- Input Applications Are Described by a Task-Graph (TG)
- A Heuristic Algorithm
  - Map the TG Cores onto the Nodes of a Mesh-Based NoC
  - Construct VIPs for TG Edges in Order of Their Communication Volumes
  - Find a Path Through the Packet-Switched Network for a TG Edge If There Are Not Sufficient Free Resources to Build a VIP for It
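The steps above can be sketched as a greedy procedure. This is a minimal sketch under several assumptions that the slides do not spell out: the core-to-node mapping is taken as given (`placement`), edges are served in descending order of volume, VIPs are restricted to minimal mesh paths, and links are treated as directed. All function and variable names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Node = Tuple[int, int]    # (x, y) mesh coordinate
Link = Tuple[Node, Node]  # directed link between neighbouring nodes


def find_free_minimal_path(src: Node, dst: Node, used: Set[Link]) -> Optional[List[Node]]:
    """Depth-first search over minimal mesh paths that avoid already-used links."""
    if src == dst:
        return [src]
    x, y = src
    steps: List[Node] = []
    if x != dst[0]:
        steps.append((x + (1 if dst[0] > x else -1), y))
    if y != dst[1]:
        steps.append((x, y + (1 if dst[1] > y else -1)))
    for nxt in steps:
        if (src, nxt) in used:
            continue  # VIPs may not share a link
        rest = find_free_minimal_path(nxt, dst, used)
        if rest is not None:
            return [src] + rest
    return None


def build_static_vips(
    edges: List[Tuple[str, str, int]],  # (source core, destination core, volume)
    placement: Dict[str, Node],         # assumed to come from the mapping step
):
    used: Set[Link] = set()
    vips: Dict[Tuple[str, str], List[Node]] = {}
    packet_switched: List[Tuple[str, str]] = []
    # Assumption: highest-volume edges get the first chance at a VIP.
    for a, b, _volume in sorted(edges, key=lambda e: e[2], reverse=True):
        path = find_free_minimal_path(placement[a], placement[b], used)
        if path is None:
            packet_switched.append((a, b))  # fall back to packet switching
            continue
        used.update(zip(path, path[1:]))
        vips[(a, b)] = path
    return vips, packet_switched


placement = {"A": (0, 0), "B": (1, 0), "C": (2, 0)}
edges = [("A", "B", 50), ("A", "C", 100), ("B", "C", 10)]
vips, packet_switched = build_static_vips(edges, placement)
print(vips)             # {('A', 'C'): [(0, 0), (1, 0), (2, 0)]}
print(packet_switched)  # [('A', 'B'), ('B', 'C')]
```

In this toy example the heaviest edge claims both links of the row, so the two lighter edges are left to the packet-switched network.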
VIPs for the VOPD Application
- VIPs Cover 100% of the On-Chip Traffic for This Application
- Static VIP Construction Scheme:
  - Benchmarks: VOPD, MWD, MPEG, MP3+H263
  - Up to 58% Reduction in Message Latency (39% on Average)
  - Up to 65% Reduction in Power Consumption (49% on Average)
[Figure: VOPD task graph (cores V1 to V16, edges labeled with communication volumes) and the placement of the cores on the NoC]
VIPs vs. Physical Point-to-Point Connections
- VIPs Offer:
  - Power and Performance Close to Dedicated Physical Point-to-Point Connections
  - More Flexibility
    - Dynamically Reconfigurable Based on the Traffic Pattern of the Running Application
  - Less Design Effort
    - Customized Dedicated Connections Over Regular Components
Dynamic VIP Construction
- An Alternative VIP Construction Scheme
- Dynamically Changes the VIP Connections in Response to Communication Requirements Imposed by the Running Application
  - Monitoring the NoC Traffic
  - Detecting High-Volume Communications and Constructing a VIP for Them
- Select the Best Route for a VIP Using a Simple Setup Network
Setup Network
- Setup Network Structure
  - A Light-Weight Control Network
    - Simple Node Structure and Small Bit-Width
  - The Same Topology as the Main Data Network
- Setup Network Operation
  - Keeps Track of the Number and Destination of Packets Sent by Each Node
  - Selects Traffic Flows Whose Weight Exceeds a Threshold (bit/sec.)
  - Finds a Path Along One of the Shortest Paths Between the Source and Destination Nodes of the Traffic Flow to Construct a VIP
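The monitoring step can be pictured with a small per-node counter. This is only a sketch: the class `FlowMonitor`, its fixed measurement window, and the choice to count bits rather than packets are assumptions introduced here so that the rate can be compared against a bit/sec. threshold.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class FlowMonitor:
    """Per-node traffic counter (hypothetical model of one setup-network node)."""

    def __init__(self, threshold_bits_per_sec: float, window_sec: float) -> None:
        self.threshold = threshold_bits_per_sec
        self.window = window_sec  # assumed fixed measurement window
        self.bits_to: Dict[int, int] = defaultdict(int)

    def record_packet(self, dst: int, bits: int) -> None:
        """Account for one packet sent by this node to destination dst."""
        self.bits_to[dst] += bits

    def hot_flows(self) -> List[Tuple[int, float]]:
        """Destinations whose rate exceeds the threshold, heaviest first."""
        rates = ((dst, bits / self.window) for dst, bits in self.bits_to.items())
        hot = [(dst, rate) for dst, rate in rates if rate > self.threshold]
        return sorted(hot, key=lambda flow: flow[1], reverse=True)

    def reset(self) -> None:
        """Start a new measurement window."""
        self.bits_to.clear()


monitor = FlowMonitor(threshold_bits_per_sec=1000.0, window_sec=1.0)
for _ in range(3):
    monitor.record_packet(dst=3, bits=512)
monitor.record_packet(dst=5, bits=256)
print(monitor.hot_flows())  # [(3, 1536.0)]
```

Each flow returned by `hot_flows` would then become a VIP setup request carrying its rate as the flow weight.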
Dynamic VIP Construction
- Establishing a New VIP May Tear Down Some Existing VIPs
  - Cost of a VIP: The Cumulative Weight (bit/sec.) of the VIPs That Will Be Torn Down by This New VIP
- Setup Network:
  - Finds the Path With Minimum Cost
  - Sends the Cost to the Source Node to Decide on Establishing the New VIP
- A New VIP Is Established If the Cumulative Weight of the Torn Down VIPs Is Less Than the Weight of the Requesting Traffic Flow
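The acceptance rule can be written down in a few lines. In this sketch a VIP is represented as a weight plus the set of links it occupies, and each conflicting VIP is counted once toward the cost; both the representation and the function name `try_establish_vip` are assumptions made for illustration.

```python
from typing import Dict, Hashable, Set, Tuple

Link = Tuple[str, str]
VipTable = Dict[Hashable, Tuple[float, Set[Link]]]  # id -> (weight, links)


def try_establish_vip(
    vips: VipTable, new_id: Hashable, new_weight: float, new_links: Set[Link]
) -> Tuple[bool, float]:
    """Apply the rule: establish only if the torn-down weight is smaller."""
    # A VIP conflicts if it shares at least one link with the new path.
    conflicts = [vid for vid, (_, links) in vips.items() if links & new_links]
    cost = sum(vips[vid][0] for vid in conflicts)
    if cost >= new_weight:
        return False, cost  # keep the existing VIPs
    for vid in conflicts:
        del vips[vid]       # tear down conflicting VIPs
    vips[new_id] = (new_weight, set(new_links))
    return True, cost


table: VipTable = {
    "v1": (5.0, {("a", "b")}),
    "v2": (4.0, {("b", "c")}),
}
print(try_establish_vip(table, "v3", 12.0, {("a", "b"), ("b", "c")}))  # (True, 9.0)
print(try_establish_vip(table, "v4", 10.0, {("a", "b")}))              # (False, 12.0)
print(sorted(table))                                                   # ['v3']
```

The second request is rejected because the only VIP it would displace carries more traffic than the requesting flow.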
Setup Network
- VIP Setup Procedure:
  - Arbitrating Among VIP Setup Requests
  - Running the Distributed VIP Setup Algorithm
  - Setting Up a VIP in the Data Network by Configuring the VIP Allocator of the Nodes Along the VIP Path
  - Tearing Down Conflicting VIPs
- Each Setup Network Node Contains the Configuration Information of Its Corresponding Data Network Node Due to the Distributed Nature of the Algorithm
- Short Reconfiguration Time
[Figure: example mesh with source node S and destination node D; each port is labeled with its cost (the weight of the VIP using it)]
1. Add the Received Cost (4) to the Weight of the Ports Along the Shortest Paths (the W and N Ports) Toward the Destination Node
2. Send the New Costs (9 and 12) to the Neighboring Nodes Toward the Destination Node
Select the Minimum Cost and Keep the Port from Which the Smaller Cost Is Received
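The propagation described in these steps amounts to a minimum-cost search restricted to shortest paths. The sketch below is a centralized stand-in for the distributed procedure, not the authors' hardware algorithm: it visits the nodes between source and destination in order of distance, adds each output port's weight to the received cost, and keeps the cheaper predecessor. The names `min_cost_vip_path` and `port_weight` are hypothetical, and free ports are assumed to have weight 0.

```python
from typing import Dict, List, Optional, Tuple

Node = Tuple[int, int]


def min_cost_vip_path(
    src: Node, dst: Node, port_weight: Dict[Tuple[Node, Node], int]
) -> Tuple[int, List[Node]]:
    """Cheapest minimal path; cost is the summed weight of the ports it uses."""
    sx = 1 if dst[0] >= src[0] else -1
    sy = 1 if dst[1] >= src[1] else -1
    width, height = abs(dst[0] - src[0]), abs(dst[1] - src[1])
    # node -> (cost received so far, neighbour the smaller cost came from)
    best: Dict[Node, Tuple[int, Optional[Node]]] = {src: (0, None)}
    for dx in range(width + 1):
        for dy in range(height + 1):
            node = (src[0] + sx * dx, src[1] + sy * dy)
            cost = best[node][0]
            neighbours = []
            if dx < width:
                neighbours.append((node[0] + sx, node[1]))
            if dy < height:
                neighbours.append((node[0], node[1] + sy))
            for nxt in neighbours:
                # Add the port weight to the received cost and forward it.
                new_cost = cost + port_weight.get((node, nxt), 0)
                if nxt not in best or new_cost < best[nxt][0]:
                    best[nxt] = (new_cost, node)
    # Walk back from the destination along the kept predecessors.
    path: List[Node] = []
    cursor: Optional[Node] = dst
    while cursor is not None:
        path.append(cursor)
        cursor = best[cursor][1]
    return best[dst][0], path[::-1]


weights = {
    ((0, 0), (1, 0)): 5,
    ((0, 0), (0, 1)): 8,
    ((1, 0), (1, 1)): 4,
    ((0, 1), (1, 1)): 0,
}
print(min_cost_vip_path((0, 0), (1, 1), weights))  # (8, [(0, 0), (0, 1), (1, 1)])
```

Here the route through (1, 0) would cost 5 + 4 = 9, so the destination keeps the cost of 8 received from (0, 1).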
Dynamic VIP Construction
- The Setup Network Operates in Parallel With Packet Transmission in the Packet-Switched Network
  - Hides the Setup Time
- The Setup Network Has a Small Bit-Width and Operates Infrequently (Only When a High-Volume Flow Is Detected)
  - Negligible Power and Area Overhead
Evaluation Results
- XMulator NoC Simulator (www.xmulator.org)
  - A C#-Based Simulator
  - Orion Power Library
- Comparison With a Conventional NoC (5-Stage Pipelined Wormhole Switch)
- Multi-Core SoC Traffic:
  - H.263 Decoder + MP3 Decoder, H.263 Decoder + MP3 Encoder, MP3 Decoder + MP3 Encoder
  - 38% Reduction in Message Latency, 46% Reduction in Power Consumption
Evaluation Results
- Synthetic Traffic:
  - N-Hot Traffic: 80% of Messages to Exactly N Destinations, 20% to Randomly Chosen Nodes
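An N-hot destination chooser along these lines might look as follows. The slide does not say how the N hot destinations are picked or whether the random 20% may also land on a hot node; this sketch assumes the hot set is drawn uniformly at random once per source node and that the random share is uniform over all other nodes. The function name is hypothetical.

```python
import random
from typing import Callable, List, Tuple


def make_n_hot_chooser(
    node: int,
    num_nodes: int,
    n_hot: int,
    rng: random.Random,
    hot_fraction: float = 0.8,
) -> Tuple[List[int], Callable[[], int]]:
    """Return the hot destinations of a node and a function that picks a destination."""
    others = [d for d in range(num_nodes) if d != node]
    hot = rng.sample(others, n_hot)  # assumed: fixed hot set per source node

    def choose() -> int:
        if rng.random() < hot_fraction:
            return rng.choice(hot)   # 80%: one of the N hot destinations
        return rng.choice(others)    # 20%: any other node, uniformly

    return hot, choose


rng = random.Random(0)
hot, choose = make_n_hot_chooser(node=0, num_nodes=16, n_hot=2, rng=rng)
samples = [choose() for _ in range(10000)]
hot_share = sum(s in hot for s in samples) / len(samples)
print(hot, round(hot_share, 2))
```

Because the uniform share can also hit a hot node, the measured hot share ends up slightly above 0.8 under these assumptions.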
[Figure: Power (nJ/cycle) vs. traffic (messages/node/cycle) for 1-hot, 2-hot, and 3-hot traffic, conventional NoC vs. VIP]
[Figure: Average message latency (cycles, for 8-flit packets) vs. traffic (messages/node/cycle) for 1-hot, 2-hot, and 3-hot traffic, conventional NoC vs. VIP]
Summary and Future Work
- Adaptable Virtual Point-to-Point Connections in a Packet-Switched NoC
  - Benefit From the Advantages of Both Communication Methods
  - Two VIP Construction Schemes: Static and Dynamic
  - Significant Power/Latency Reduction
- Future Work
  - Comparing the Method With Related Work: Express Virtual Channels, Single-Cycle Routers, …
  - Precise Area/Power Results by Implementing the NoC in Hardware
    - Analytical Models Show Small Area Overhead