The Power of Communication: Energy-Efficient NoCs for FPGAs
Mohamed ABDELFATTAH, Vaughn BETZ
Outline
1. Why NoCs on FPGAs?
2. Embedded NoCs
3. Power Analysis
1. Why NoCs on FPGAs? Motivation
[Figure: FPGA fabric; labels: Logic Blocks, Switch Blocks, Wires, Interconnect]
Hard blocks:
• Memory
• Multiplier
• Processor
Hard interfaces: DDR/PCIe ..
Interconnect still the same
[Figure labels: 1600 MHz, 200 MHz, 800 MHz]
[Figure: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet]
Problems:
1. Bandwidth requirements for hard logic/interfaces
2. Timing closure
3. High interconnect utilization:
– Huge CAD problem
– Slow compilation
– Power/area utilization
4. Wire speed not scaling:
– Delay is interconnect-dominated
[Figure: street maps of Barcelona and Los Angeles (Source: Google Earth); labels: Hard Blocks, Logic Cluster]
Keep the “roads”, but add “freeways”.
FPGA with NoC
[Figure: NoC routers and links on the FPGA]
• A router forwards data packets
• A router moves data to the local interconnect
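The slides describe the router only at this level: it forwards a packet toward its destination, and the destination router hands the data to the local interconnect. As an illustration, here is a minimal sketch of one common way a mesh router could choose an output port, dimension-order (XY) routing. The slides do not state which routing algorithm the embedded NoC uses; the coordinates, the port names and the functions `xy_route` and `path` are hypothetical.

```python
# Hypothetical sketch: dimension-order (XY) routing on a 2D mesh.
# Not taken from the slides; port names are illustrative only.


def xy_route(cur: tuple, dst: tuple) -> str:
    """Return the output port a router at `cur` uses for a packet bound for `dst`.

    Packets travel along X until the column matches, then along Y.
    "LOCAL" means the packet has arrived and is handed to the local
    (FPGA fabric) interconnect.
    """
    cx, cy = cur
    dx, dy = dst
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    if dy > cy:
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "LOCAL"


def path(src: tuple, dst: tuple) -> list:
    """List every router a packet visits from src to dst (inclusive)."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    hops = [src]
    cur = src
    while (port := xy_route(cur, dst)) != "LOCAL":
        ox, oy = step[port]
        cur = (cur[0] + ox, cur[1] + oy)
        hops.append(cur)
    return hops


print(path((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The five possible outputs here (four mesh directions plus local) happen to line up with a 5-port mesh router, but XY routing is only one deterministic option and is our assumption.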
5. Abstraction favors modularity:
– Parallel compilation
– Partial reconfiguration
– Multi-chip interconnect
FPGA with NoC:
• High-bandwidth endpoints known: pre-design NoC to requirements
• NoC links are “re-usable”
• NoC is heavily “pipelined”
• NoC abstraction favors modularity
• Latency-tolerant communication
Previous work: Compelling area efficiency and performance
NoCs can simplify FPGA design
Does the NoC abstraction come at a high power cost?
2. Embedded NoCs
Mixed NoCs, Hard NoCs
[Figure: FPGA with an embedded NoC; labels: Router, Compute Module, Links (Hard or Soft), Fabric Port (Hard or Soft), DDRx Interface, PCIe Interface]
“Soft” NoC = Soft Links + Soft Routers
“Mixed” NoC = Soft Links + Hard Routers
“Hard” NoC = Hard Links + Hard Routers
Methodology: area, speed and power
• Soft: FPGA CAD tools; toggle rates from gate-level simulation
• Hard: ASIC CAD tools (Design Compiler); toggle rates from gate-level simulation
• Mixed: HSPICE
Mixed NoCs
“Mixed” NoC = Soft Links + Hard Routers
[Figure: hard router logic among the FPGA logic blocks, connected through the programmable “soft” interconnect]
Baseline router:
Width   VCs   Ports   Buffer
32      2     5       10/VC
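To put the baseline router parameters in concrete terms, the sketch below computes the buffer storage they imply. It assumes an input-buffered router in which every port holds 10 flits per VC and each flit is as wide as the 32-bit datapath; both are our assumptions, and the `RouterConfig` name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterConfig:
    """Hypothetical container for the baseline router parameters."""

    width_bits: int    # datapath (flit) width, assumed equal to link width
    vcs: int           # virtual channels per port
    ports: int         # router ports
    flits_per_vc: int  # buffer depth per VC

    def buffer_bits(self) -> int:
        # Assumes one set of VC buffers per input port and one flit = one datapath word.
        return self.ports * self.vcs * self.flits_per_vc * self.width_bits


baseline = RouterConfig(width_bits=32, vcs=2, ports=5, flits_per_vc=10)
print(baseline.buffer_bits())  # 3200 bits (400 bytes)
```

Under this model buffer storage grows linearly with width, VC count and depth, so doubling the width to 64 bits doubles it.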
Special feature: configurable topology
• Assumed a mesh
• Soft links can form any topology
Hard NoCs
“Hard” NoC = Hard Links + Hard Routers
[Figure: hard router logic among the FPGA logic blocks, connected by dedicated “hard” interconnect alongside the programmable “soft” interconnect]
Special feature: low-V mode
• Supply voltage lowered from 1.1 V to 0.9 V
• Saves 33% dynamic power
• ~15% slower
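The 33% figure is consistent with first-order CMOS scaling, where dynamic power is P = αCV²f and therefore falls with the square of the supply voltage when activity, capacitance and clock frequency are held fixed. The sketch below checks that arithmetic; it is a first-order estimate only and ignores the ~15% speed loss and any leakage effects. The function name is hypothetical.

```python
def dynamic_power_ratio(v_new: float, v_nominal: float) -> float:
    """First-order ratio P_new / P_nominal for P = alpha * C * V**2 * f,
    holding activity (alpha), capacitance (C) and frequency (f) constant."""
    return (v_new / v_nominal) ** 2


ratio = dynamic_power_ratio(0.9, 1.1)
saving = 1.0 - ratio
print(f"power ratio = {ratio:.3f}, saving = {saving:.0%}")  # power ratio = 0.669, saving = 33%
```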
3. Power Analysis: Soft, Mixed and Hard NoCs
• Components analysis
• System analysis
Area, Speed and Power Gaps (64-node NoC)
                Mixed / Hard        Soft
Area            ~1.5% of FPGA       33% of FPGA    (20X – 23X smaller)
Speed           730 – 940 MHz       166 MHz        (5X – 6X faster)
Bisection BW    ~50 GB/s            ~10 GB/s
Power gap (average), Soft = 1X: Mixed 9X, Hard 11X, Hard Low-V 15X lower
Investigate BW and power together:
1. Power-aware design
2. NoC power budget
3. Comparison
Power-Aware NoC Design
Total BW = 250 GB/s: which NoC is most efficient?
[Figure: link power and router power; wider links, fewer routers]
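One way to see the “wider links, fewer routers” trade-off is to count how many routers a fixed 250 GB/s total bandwidth needs at different link widths. This sketch is a simplification under stated assumptions: it treats total bandwidth as the sum of per-router injection bandwidth (width × clock ÷ 8 bytes per second), and it uses 730 MHz, the low end of the mixed/hard speed range on the gap slide, as an illustrative clock. The function name is hypothetical and no power model is implied.

```python
def routers_needed(total_bw_bytes: int, width_bits: int, clock_hz: int) -> int:
    """Hypothetical: smallest router count whose combined injection bandwidth
    (width_bits * clock_hz / 8 bytes/s each) reaches total_bw_bytes."""
    per_router = width_bits * clock_hz // 8      # bytes per second per router
    return -(-total_bw_bytes // per_router)      # integer ceiling division


TOTAL_BW = 250 * 10**9   # 250 GB/s target from the slide
CLOCK = 730 * 10**6      # illustrative clock (assumption)

for width in (32, 64, 128, 256, 512):
    print(width, routers_needed(TOTAL_BW, width, CLOCK))
```

This prints 86, 43, 22, 11 and 6 routers for widths of 32 to 512 bits: each doubling of width roughly halves the router count. Which point is most efficient depends on how link and router power scale with width, which this sketch does not model.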
NoC Power Budget
How much of a typical FPGA's dynamic power (17.4 W) is used for system-level communication at 250 GB/s total bandwidth?
Soft NoC            123%
Mixed NoC           15%
Hard NoC            11%
Hard NoC (Low-V)    7%
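The percentages can be turned back into watts, and into energy per gigabyte moved, because energy per byte is power divided by bandwidth. The sketch below does that conversion for the four NoC variants, assuming 1 GB = 10⁹ bytes; the dictionary and function names are hypothetical.

```python
FPGA_DYNAMIC_POWER_W = 17.4   # typical FPGA dynamic power (slide)
TOTAL_BW_GB_PER_S = 250.0     # total NoC bandwidth (slide)

# Share of FPGA dynamic power needed for 250 GB/s (slide values).
budget_share = {
    "Soft NoC": 1.23,
    "Mixed NoC": 0.15,
    "Hard NoC": 0.11,
    "Hard NoC (Low-V)": 0.07,
}


def noc_power_w(share: float) -> float:
    return share * FPGA_DYNAMIC_POWER_W


def energy_mj_per_gb(share: float) -> float:
    # W / (GB/s) = J/GB; multiply by 1000 for mJ/GB.
    return noc_power_w(share) / TOTAL_BW_GB_PER_S * 1000.0


for name, share in budget_share.items():
    print(f"{name:17s} {noc_power_w(share):5.2f} W  {energy_mj_per_gb(share):5.1f} mJ/GB")
```

The mixed NoC comes out at about 10.4 mJ/GB, matching the upper end of the embedded-NoC range in the summary; the low-V figure (about 4.9 mJ/GB) sits slightly above the quoted 4.5 mJ/GB, presumably because 7% is a rounded share.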
Bandwidth in Perspective
[Figure: DDR3 Module 1 and PCIe Module 2; four connections labelled 14.6 GB/s and four labelled 17 GB/s (full theoretical BW); traffic crosses the whole chip]
Aggregate bandwidth: 126 GB/s
NoC power budget: 3.5%
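The 126 GB/s aggregate is the sum of the eight labelled connections. The 3.5% budget is consistent with scaling the Hard NoC (Low-V) share of 7% at 250 GB/s down to 126 GB/s, if NoC power is assumed to scale linearly with the bandwidth it carries. That linear-scaling assumption is ours, not the slide's; the sketch below only checks that the numbers agree with it.

```python
link_bw_gb_s = [14.6] * 4 + [17.0] * 4   # eight connections labelled on the slide
aggregate = sum(link_bw_gb_s)            # about 126 GB/s

# Assumption: NoC power scales linearly with the bandwidth it carries.
LOW_V_SHARE_AT_250 = 0.07                # Hard NoC (Low-V) share at 250 GB/s
scaled_share = LOW_V_SHARE_AT_250 * aggregate / 250.0

print(f"aggregate = {aggregate:.1f} GB/s, budget = {scaled_share:.1%}")
```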
FPGA Interconnect
Communication patterns:
• Point-to-point links (1 to 1)
• Broadcast (1 to n)
• Multiple masters (n to 1): mux + arbiter
• Multiple masters, multiple slaves (n to n): muxes + arbiters
Interconnect = just wires; interconnect = wires + logic; interconnect = NoC
Compare “wires” interconnect to NoCs
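The “mux + arbiter” in the multiple-master cases is the logic that soft interconnect needs once more than one master can drive the same slave: the arbiter picks a winner and the mux passes that master's data. The slides do not say which arbitration policy is meant; the sketch below assumes a simple round-robin arbiter, and the class name is hypothetical.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Hypothetical n-input round-robin arbiter: grants one requesting master
    per cycle, searching from the master after the one granted last time.
    The returned index would drive the select input of the data mux."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.last = n - 1  # so master 0 has priority on the first cycle

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the index of the granted master, or None if nobody requests."""
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if requests[i]:
                self.last = i
                return i
        return None


arb = RoundRobinArbiter(3)
print([arb.grant([True, True, True]) for _ in range(4)])  # [0, 1, 2, 0]
```

Round-robin keeps every requesting master from starving, at the cost of a priority pointer per arbiter; a fixed-priority arbiter would be smaller but unfair.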
NoC Power vs. FPGA Interconnect
Hard and Mixed NoCs are very compelling:
• 1% area overhead on Stratix V
• Run at 730-943 MHz
• Power on par with the simplest FPGA interconnect
[Figure annotation: length of 1 NoC link]
Summary
1. Why NoCs on FPGAs?
Big city needs freeways to handle traffic
[Figure labels: 200 MHz; High Performance / Packet Switched]
2. Embedded NoCs: Mixed & Hard
Area: 20-23X smaller   Speed: 5-6X faster   Power: 9-15X lower
3. Power Analysis
• Power-aware design of embedded NoCs
• Power budget for 100 GB/s: 3-7%
• Point-to-point soft links: 4.7 mJ/GB
• Embedded NoCs: 4.5 – 10.4 mJ/GB
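For comparison with energy figures quoted per bit, the summary's numbers can be converted from mJ/GB to pJ/bit. The sketch assumes 1 GB = 10⁹ bytes = 8 × 10⁹ bits, so 1 mJ/GB = 0.125 pJ/bit; the constant and function names are hypothetical.

```python
# Assumption: 1 GB = 1e9 bytes = 8e9 bits.
# 1 mJ/GB = 1e-3 J / 8e9 bit = 1.25e-13 J/bit = 0.125 pJ/bit
PJ_PER_BIT_PER_MJ_PER_GB = 1e-3 / 8e9 / 1e-12


def mj_per_gb_to_pj_per_bit(mj_per_gb: float) -> float:
    return mj_per_gb * PJ_PER_BIT_PER_MJ_PER_GB


for label, value in [("Point-to-point soft links", 4.7),
                     ("Embedded NoCs (low end)", 4.5),
                     ("Embedded NoCs (high end)", 10.4)]:
    print(f"{label}: {mj_per_gb_to_pj_per_bit(value):.2f} pJ/bit")
```

That puts the soft point-to-point links at roughly 0.59 pJ/bit and the embedded NoCs at roughly 0.56 to 1.3 pJ/bit.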
eecg.utoronto.ca/~mohamed/noc_designer.html
Thank You!