The Power of Communication: Energy-Efficient NoCs for FPGAs

Mohamed ABDELFATTAH, Vaughn BETZ

Outline

1. Why NoCs on FPGAs?
2. Embedded NoCs
3. Power Analysis

1. Why NoCs on FPGAs?
Motivation

Figure labels: Interconnect, Logic Blocks, Switch Blocks, Wires

Hard Blocks:
• Memory
• Multiplier
• Processor

Hard Interfaces: DDR/PCIe ..

Interconnect still the same

Clock frequencies shown in the figure: 1600 MHz, 200 MHz, 800 MHz

1. Why NoCs on FPGAs?
Motivation

Figure labels: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet (1600 MHz, 200 MHz, 800 MHz)

Problems:
1. Bandwidth requirements for hard logic/interfaces
2. Timing closure
3. High interconnect utilization:
– Huge CAD Problem
– Slow compilation
– Power/area utilization
4. Wire speed not scaling:
– Delay is interconnect-dominated

Barcelona vs. Los Angeles (Source: Google Earth)

Keep the “roads”, but add “freeways”.

Figure labels: Hard Blocks, Logic Cluster

1. Why NoCs on FPGAs?
FPGA with NoC

Figure labels: NoC, Routers, Links, DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet

• Router forwards data packet
• Router moves data to local interconnect
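The two router actions on this slide (forward a packet to a neighbouring router, or hand it to the local interconnect at its destination) can be illustrated with a small sketch. The slides do not name a routing algorithm; the dimension-order (XY) routing below, the 2D mesh coordinates, and the names `Port`, `route_xy` and `path` are assumptions chosen for illustration only.

```python
from enum import Enum


class Port(Enum):
    LOCAL = "local"   # deliver to the FPGA fabric at this router
    EAST = "east"
    WEST = "west"
    NORTH = "north"
    SOUTH = "south"


def route_xy(current, dest):
    """Pick the output port for a packet at router `current` headed to `dest`.

    Coordinates are (x, y) positions in a 2D mesh (an assumption; the slides
    only say that routers forward packets). X is corrected first, then Y.
    """
    cx, cy = current
    dx, dy = dest
    if cx != dx:
        return Port.EAST if dx > cx else Port.WEST
    if cy != dy:
        return Port.NORTH if dy > cy else Port.SOUTH
    return Port.LOCAL  # arrived: move data to the local interconnect


def path(src, dest):
    """List every router a packet visits on its way from `src` to `dest`."""
    step = {Port.EAST: (1, 0), Port.WEST: (-1, 0),
            Port.NORTH: (0, 1), Port.SOUTH: (0, -1)}
    hops = [src]
    while (port := route_xy(hops[-1], dest)) is not Port.LOCAL:
        mx, my = step[port]
        hops.append((hops[-1][0] + mx, hops[-1][1] + my))
    return hops
```

For example, `path((0, 0), (2, 1))` visits `(0, 0)`, `(1, 0)`, `(2, 0)` and `(2, 1)`: every router but the last forwards the packet, and the last one delivers it locally.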

1. Why NoCs on FPGAs?
FPGA with NoC

• High bandwidth endpoints known
• Pre-design NoC to requirements
• NoC links are “re-usable”
• NoC is heavily “pipelined”
• NoC abstraction favors modularity

Problems (continued):
5. Abstraction favors modularity:
– Parallel compilation
– Partial reconfiguration
– Multi-chip interconnect

1. Why NoCs on FPGAs?
FPGA with NoC

• Latency-tolerant communication
• NoC abstraction favors modularity

Previous work: Compelling area efficiency and performance

NoCs can simplify FPGA design

Does the NoC abstraction come at a high power cost?

Outline

1. Why NoCs on FPGAs?
2. Embedded NoCs
– Mixed NoCs
– Hard NoCs
3. Power Analysis

2. Embedded NoCs
Embedded NoCs

Figure labels: FPGA, DDRx Interface, PCIe Interface, Router, Compute Module, Links (Hard or Soft), Fabric Port (Hard or Soft)

“Soft” NoC = Soft Links + Soft Routers
“Mixed” NoC = Soft Links + Hard Routers
“Hard” NoC = Hard Links + Hard Routers

Methodology

Soft: FPGA CAD Tools
Hard: ASIC CAD Tools (Design Compiler)
Mixed: HSPICE

Metrics: Area, Speed, Power

Power: Toggle rates, Gate-level simulation (Soft and Hard)

2. Embedded NoCs
Mixed NoCs

“Mixed” NoC = Soft Links + Hard Routers

Figure labels: FPGA, Router, Router Logic, Logic blocks, Programmable Interconnect, Programmable “soft” interconnect

Baseline Router:
Width  VCs  Ports  Buffer
32     2    5      10/VC


2. Embedded NoCs
Mixed NoCs

Special Feature: Configurable topology
• Assumed a mesh
• Can form any topology

2. Embedded NoCs
Hard NoCs

“Hard” NoC = Hard Links + Hard Routers

Figure labels: FPGA, Router, Router Logic, Logic blocks, Dedicated Interconnect, Dedicated “hard” interconnect, Programmable “soft” interconnect


2. Embedded NoCs
Hard NoCs

Special Feature: Low-V mode
• Supply voltage: 1.1 V → 0.9 V
• Save 33% Dynamic Power
• ~15% slower
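The 33% figure is consistent with the standard first-order model in which dynamic power scales with the square of the supply voltage at a fixed clock frequency. A minimal sketch of that arithmetic, assuming this quadratic model (the slides do not state how the number was derived) and a hypothetical helper name `dynamic_power_saving`:

```python
def dynamic_power_saving(v_nominal, v_low):
    """Fractional dynamic power saved when lowering the supply voltage.

    Assumes P_dyn is proportional to V^2 at fixed frequency and activity
    (first-order CMOS model; an assumption, not taken from the slides).
    """
    return 1.0 - (v_low / v_nominal) ** 2


saving = dynamic_power_saving(1.1, 0.9)
print(f"{saving:.0%}")  # 33%
```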

Outline

1. Why NoCs on FPGAs?
2. Embedded NoCs
3. Power Analysis
– Components Analysis
– System Analysis

Soft, Mixed and Hard

Soft, Mixed and Hard (Average 64 – NoC)

              Soft         Mixed / Hard
Area          33% of FPGA  ~ 1.5% of FPGA
Speed         166 MHz      730 – 940 MHz
Bisection BW  ~ 10 GB/s    ~ 50 GB/s

Area Gap: 20X – 23X smaller
Speed Gap: 5X – 6X faster
Power Gap: Soft 1X, Mixed 9X, Hard 11X (Low-V: 15X)

1. Power-aware design
2. NoC power budget
3. Comparison

Investigate BW and power together
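The bisection bandwidth figures can be roughly reproduced from the link width and clock speeds quoted in this deck. The sketch below assumes the "64" refers to a 64-node network laid out as an 8×8 mesh, that the baseline router width of 32 is in bits, and that links run in both directions across the cut. The mesh shape, the link model and the function name `mesh_bisection_gbps` are assumptions, not details stated on the slides.

```python
def mesh_bisection_gbps(side, link_bits, freq_mhz):
    """Bisection bandwidth in GB/s of a side x side mesh.

    Cutting the mesh in half severs `side` links in each direction, so
    2 * side unidirectional links cross the cut (assumed link model).
    """
    links_across_cut = 2 * side
    bytes_per_cycle = link_bits / 8
    return links_across_cut * bytes_per_cycle * freq_mhz * 1e6 / 1e9


soft = mesh_bisection_gbps(side=8, link_bits=32, freq_mhz=166)
hard_lo = mesh_bisection_gbps(side=8, link_bits=32, freq_mhz=730)
hard_hi = mesh_bisection_gbps(side=8, link_bits=32, freq_mhz=940)
print(f"soft: {soft:.1f} GB/s, hard/mixed: {hard_lo:.1f} to {hard_hi:.1f} GB/s")
```

Under these assumptions the soft NoC comes out at about 10.6 GB/s and the hard and mixed NoCs at about 46.7 to 60.2 GB/s, in line with the ~ 10 GB/s and ~ 50 GB/s quoted on the slide.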

3. Power Analysis
Power-Aware NoC Design

Total BW = 250 GB/s. Most Efficient NoC?

Chart labels: Links Power, Routers Power

Wider Links, Fewer Routers

3. Power Analysis
NoC Power Budget

How much is used for system-level communication?

Typical FPGA Dynamic Power: 17.4 W

NoC power at 250 GB/s total bandwidth, as a share of typical FPGA dynamic power:

Soft NoC          123%
Mixed NoC         15%
Hard NoC          11%
Hard NoC (Low-V)  7%
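To put the percentages in absolute terms, each share can be converted to watts and to energy per gigabyte moved (watts divided by GB/s gives J/GB). A minimal sketch, assuming the percentages are fractions of the 17.4 W typical dynamic power while carrying the full 250 GB/s; the dictionary and function names are illustrative, not from the slides:

```python
TYPICAL_FPGA_DYNAMIC_W = 17.4   # from the slide
TOTAL_BANDWIDTH_GBPS = 250.0    # GB/s, from the slide

# NoC power as a fraction of typical FPGA dynamic power (from the slide).
NOC_POWER_SHARE = {
    "Soft NoC": 1.23,
    "Mixed NoC": 0.15,
    "Hard NoC": 0.11,
    "Hard NoC (Low-V)": 0.07,
}


def noc_power_w(share):
    """Absolute NoC power in watts for a given share of the budget."""
    return share * TYPICAL_FPGA_DYNAMIC_W


def energy_mj_per_gb(share):
    """Energy per gigabyte moved: W / (GB/s) = J/GB, times 1000 for mJ/GB."""
    return noc_power_w(share) / TOTAL_BANDWIDTH_GBPS * 1000.0


for name, share in NOC_POWER_SHARE.items():
    print(f"{name}: {noc_power_w(share):.2f} W, "
          f"{energy_mj_per_gb(share):.1f} mJ/GB")
```

With these inputs the mixed NoC works out to about 2.6 W, or roughly 10.4 mJ/GB. The other rows inherit the rounding of the percentages shown on the slide.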

3. Power Analysis
Bandwidth in Perspective

Figure labels: DDR3, PCIe, Module 1, Module 2; four flows of 14.6 GB/s and four flows of 17 GB/s; Full theoretical BW

Aggregate Bandwidth: 126 GB/s
NoC Power Budget: 3.5%

Cross whole chip!
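The aggregate figure follows from summing the eight flows shown, and the 3.5% budget is consistent with scaling the 7% Low-V hard NoC budget at 250 GB/s linearly with bandwidth. That linear scaling and the choice of the Low-V hard NoC as the reference point are assumptions made for this sketch; the slides state only the end results.

```python
# Flows from the slide, in GB/s: four at 14.6 and four at 17.
flows_gbps = [14.6] * 4 + [17.0] * 4

aggregate_gbps = sum(flows_gbps)


def scaled_budget(share_at_ref, ref_gbps, actual_gbps):
    """Scale a NoC power share linearly with carried bandwidth (assumed)."""
    return share_at_ref * actual_gbps / ref_gbps


budget = scaled_budget(share_at_ref=0.07, ref_gbps=250.0,
                       actual_gbps=aggregate_gbps)
print(f"{aggregate_gbps:.1f} GB/s -> {budget:.1%} of FPGA dynamic power")
```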

3. Power Analysis
FPGA Interconnect

Communication patterns:
• Point-to-point Links (1 to 1)
• Broadcast (1 to n)
• Multiple Masters (n to 1): Mux + Arbiter
• Multiple Masters, Multiple Slaves (n to n): Mux + Arbiter

Interconnect = Just wires
Interconnect = Wires + Logic
Interconnect = NoC

Compare “wires” interconnect to NoCs

3. Power Analysis
NoC Power vs. FPGA Interconnect

Chart labels: Length of 1 NoC Link, 200 MHz

• Hard and Mixed NoCs very compelling
• 1% area overhead on Stratix 5
• Runs at 730-943 MHz
• Power on-par with simplest FPGA interconnect

High Performance / Packet Switched

1. Why NoCs on FPGAs?
• Big city needs freeways to handle traffic

2. Embedded NoCs: Mixed & Hard
• Area: 20-23X
• Speed: 5-6X
• Power: 9-15X

3. Power Analysis
• Power-aware design of embedded NoCs
• Power Budget for 100 GB/s: 3-7%
• Point-to-point soft Links: 4.7 mJ/GB
• Embedded NoCs: 4.5 – 10.4 mJ/GB


Thank You!

eecg.utoronto.ca/~mohamed/noc_designer.html