FPGA Logic Cluster Design Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223

Page 1: FPGA  Logic Cluster Design

FPGA Logic Cluster Design

Dr. Philip Brisk

Department of Computer Science and Engineering

University of California, Riverside

CS 223

Page 2: FPGA  Logic Cluster Design

How Much Logic Should Go in an FPGA Logic Block?

Vaughn Betz, Jonathan Rose

IEEE Design & Test of Computers 15(1): 10-15 (1998)

Page 3: FPGA  Logic Cluster Design

Three Questions

• How many inputs should the FPGA routing provide to a cluster of LUTs? (I)

– Routing flexibility vs. area

• As the number of LUTs in a logic cluster changes, how should the FPGA’s routing architecture change? (Fc)

• How many LUTs should be included in a cluster? (N)

Page 4: FPGA  Logic Cluster Design

Experimental Methodology

• 20 MCNC Benchmarks

– Well-established

– A bit old, even by 1998 standards

– Sadly, still in use

• 4-LUT Architecture

• Fs = 3

– Vary other parameters to see what works best

Page 5: FPGA  Logic Cluster Design

Area Model

• Count the number of min-width transistors required to implement a benchmark circuit in an FPGA architecture

• Normalized Area = (Num min-width transistors used) / (Num BLEs used)
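The normalized area metric above can be sketched as a one-line calculation. The function name and the sample counts are illustrative assumptions, not values from the paper.

```python
def normalized_area(min_width_transistors: int, bles_used: int) -> float:
    """Min-width transistor count per BLE used by one benchmark circuit."""
    if bles_used <= 0:
        raise ValueError("bles_used must be positive")
    return min_width_transistors / bles_used


# Hypothetical counts, chosen only to show the arithmetic.
print(normalized_area(1_200_000, 1_000))
```

Dividing by the number of BLEs used makes circuits of different sizes comparable, so a lower value means a more area-efficient architecture for that circuit.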

Page 6: FPGA  Logic Cluster Design

How many cluster inputs do we need?

We hit near 100% utilization when I = 50-60% of the total number of BLE inputs

We can pack BLEs together to share common inputs

Re-use locally generated outputs

Works because the packing algorithm was effective!

Input sharing and output re-use within a logic cluster

Page 7: FPGA  Logic Cluster Design

Visual Depiction

I = ~0.6KN is pretty good

Use the feedbacks!

Fanout

Page 8: FPGA  Logic Cluster Design

The Packer was Effective!

It packed BLEs together to share common inputs

It re-used locally generated outputs via the feedbacks

Page 9: FPGA  Logic Cluster Design

Cluster inputs vs. Cluster size

Approx. (2N + 2)

N = 1 BLE uses 3.5 / 4 inputs (on average)

N = 16 BLEs use 19.7 / 64 inputs, on average

Page 10: FPGA  Logic Cluster Design

Commercial FPGAs

• Altera Flex 8000 FPGA uses a cluster of size N=8 with I=24

– Results suggest reducing I to 18 (save area)

• Xilinx 5200 FPGA uses a cluster of size N=4 with I=16

– Results suggest reducing I to 10 (save area)
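The suggested reductions for both commercial parts are what the approximate rule I = 2N + 2 gives for 4-LUT clusters. A minimal sketch of that check follows; the function names are made up for illustration.

```python
def suggested_cluster_inputs(n: int) -> int:
    """Approximate cluster inputs for n 4-LUT BLEs, using the 2N + 2 rule."""
    return 2 * n + 2


def total_ble_inputs(n: int, k: int = 4) -> int:
    """Total LUT input pins inside a cluster of n K-input BLEs."""
    return k * n


# (device, cluster size N, shipped I) as quoted on the slide.
for name, n, shipped_i in [("Altera Flex 8000", 8, 24), ("Xilinx 5200", 4, 16)]:
    i = suggested_cluster_inputs(n)
    share = i / total_ble_inputs(n)
    print(f"{name}: N={n}, shipped I={shipped_i}, 2N+2={i}, share of K*N={share:.3f}")
```

For N=8 the rule gives 18 and for N=4 it gives 10, matching the slide; both are roughly 55-65% of the K·N BLE input pins.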

Page 11: FPGA  Logic Cluster Design

Routing Flexibility vs. Cluster Size

• Set Fc = W/N

– Each routing track is driven by one LUT output pin in the cluster
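One way to picture Fc = W/N is to hand out the W tracks among the N cluster output pins so that every track has exactly one driver. The interleaved assignment below is an assumption made for illustration (the slide does not specify the pattern), and the sketch assumes W is a multiple of N.

```python
def tracks_per_output_pin(w: int, n: int) -> int:
    """Fc for cluster outputs under the rule Fc = W / N."""
    if w <= 0 or n <= 0:
        raise ValueError("w and n must be positive")
    if w % n != 0:
        raise ValueError("this sketch assumes W is a multiple of N")
    return w // n


def assign_tracks(w: int, n: int) -> dict[int, list[int]]:
    """Map each output pin to the tracks it drives (assumed interleaved pattern)."""
    tracks_per_output_pin(w, n)  # validates the arguments
    return {pin: list(range(pin, w, n)) for pin in range(n)}


# Hypothetical channel width W=24 and cluster size N=4.
for pin, tracks in assign_tracks(24, 4).items():
    print(pin, tracks)
```

Each pin reaches W/N tracks, so per-pin flexibility shrinks as the cluster grows while every track in the channel still has a driver.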

Page 12: FPGA  Logic Cluster Design

Area Efficiency vs. Cluster Size

I is set to achieve 98% logic utilization

N=2 BLEs introduces intra-cluster routing

Reduce routing between logic blocks

Area efficiency rapidly degrades beyond this point

Page 13: FPGA  Logic Cluster Design

Conclusions

• I = 2N + 2 for N < 16

– Slow, linear growth

• Reduce Fc

– Works because LUT inputs are equivalent

• Cluster area efficiency is within 10% for 1 < N < 8

• Large clusters reduce the size of the placement problem and increase FPGA speed

Page 14: FPGA  Logic Cluster Design

The Effect of LUT and Cluster Size on Deep-Submicron FPGA Performance and Density

Elias Ahmed, Jonathan Rose

IEEE Transactions on VLSI Systems 12(3): 288-298 (2004)

Page 15: FPGA  Logic Cluster Design

Contributions

• Vary LUT size (K) from 2 to 7

• Vary cluster size (N) from 1 to 10 LUTs

– Experimentally determine the number of cluster inputs (I) as a function of K and N

– Clustering small LUTs (K=2,3) produces good area results, but bad performance (~2x worse)

– LUTs of size (K=4,5,6), clusters of size (N=3…10) yield the best area-delay product

Page 16: FPGA  Logic Cluster Design

CAD Flow

Page 17: FPGA  Logic Cluster Design

Inputs Required for 98% Area Utilization

I = ½K(N+1)
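The rule I = ½K(N+1) generalizes the earlier 4-LUT result: with K = 4 it reduces to 2N + 2. A short sketch that tabulates it over the K and N ranges studied is shown below; the function name is hypothetical, and the value is returned unrounded because the slide does not say how fractional results (odd K with even N) are handled.

```python
def cluster_inputs(k: int, n: int) -> float:
    """Cluster inputs I = K(N + 1) / 2 for N K-input LUTs per cluster."""
    if k <= 0 or n <= 0:
        raise ValueError("k and n must be positive")
    return k * (n + 1) / 2


# Tabulate I for the ranges explored: K = 2..7, N = 1..10.
for k in range(2, 8):
    row = [cluster_inputs(k, n) for n in range(1, 11)]
    print(f"K={k}:", row)
```

For example, K = 4 and N = 8 gives I = 18, the same value the 2N + 2 rule suggested for the Flex 8000.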

Page 18: FPGA  Logic Cluster Design

Total Area

• LUT sizes of K = 4,5 are the most area efficient for all cluster sizes

• Reduction in total area as cluster size increases from 1-3 for all LUT sizes

• As clusters are made larger (N > 4) there is little impact on total FPGA area

• Intra-cluster routing area is 25-35% of the total area

Page 19: FPGA  Logic Cluster Design

Total Intra-cluster Routing Area

The increase in cluster size far outweighs the rate of decrease in the number of clusters: hence the upward trend

Page 20: FPGA  Logic Cluster Design

#Clusters and Area/Cluster vs. K

25-35%

N = 1 BLE per Cluster

Page 21: FPGA  Logic Cluster Design

LUT area vs. Intra-cluster Mux Area

Intra-cluster routing area is 25-35% of logic cluster area

LUT area dominates

Page 22: FPGA  Logic Cluster Design

Intra-cluster Routing Area as a Function of LUT Size

Total intra-cluster routing area decreases near-linearly from K = 3 to 7

Page 23: FPGA  Logic Cluster Design

Total Intra-cluster Routing Area

The product of these two curves gives the total inter-cluster routing area.

Routing area decreases linearly with LUT size

• Increasing LUT size decreases the number of clusters used faster than the rate of increase in routing area per cluster

• Depends on good CAD tools

Page 24: FPGA  Logic Cluster Design

Critical Path Delay vs. LUT Size

Increasing both N and K has a positive effect

• Benefits saturate as N and K get large

As N and K increase

• LUT delay and the delay through a single cluster increase

• The number of LUTs and clusters in series on the critical path decreases

• Reduced global routing delay
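The trade-off above can be illustrated with a simple additive delay model. This model and every number in it are assumptions made for illustration only; they are not the paper's timing model or its measured delays.

```python
def critical_path_delay(
    ble_levels: int, t_ble: int,
    intra_hops: int, t_intra: int,
    global_hops: int, t_global: int,
) -> int:
    """Assumed additive model: BLE delay + intra-cluster hops + global hops."""
    return ble_levels * t_ble + intra_hops * t_intra + global_hops * t_global


# Hypothetical delays in arbitrary integer units.
# Small K and N: fast BLEs, but many levels and many global routing links.
small = critical_path_delay(10, 5, 4, 3, 6, 20)
# Larger K and N: slower BLEs and muxes, but fewer levels and global links.
large = critical_path_delay(6, 8, 3, 4, 3, 20)

print(small, large)
```

With these placeholder values the larger configuration wins because the saving in global routing hops outweighs the slower BLEs and intra-cluster muxes, which is the qualitative effect the slide describes.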

Page 25: FPGA  Logic Cluster Design

Intra-cluster Delay vs. LUT Size

Intra-cluster delay decreases as K increases

• Reduction in number of BLE levels on critical path

Intra-cluster delay increases as N increases

• Larger intra-cluster muxes are slower

• The delay through these muxes is still much smaller than global routing delay

Page 26: FPGA  Logic Cluster Design

BLE Delay vs. K

BLE delay increases linearly as K increases (intuitive)

Number of BLEs on the critical path decreases quadratically as K increases

• Fewer, but larger, BLEs

Page 27: FPGA  Logic Cluster Design

Global Routing Delay vs. K

As K increases

• Fewer LUTs on the critical path

• Fewer global routing links

As N increases

• More opportunities to use faster intra-cluster routing

Page 28: FPGA  Logic Cluster Design

Critical Path Delay (K = 4)

• K remains constant

– No reduction in number of BLEs on critical path

• N increases

– BLE and intra-cluster routing delay increase

– More logic implemented internally within clusters

– Can use faster intra-cluster routing instead of global routing

Page 29: FPGA  Logic Cluster Design

Critical Path Delay vs. LUT Size (Recap)

Increasing N beyond 3 has minimal effect

• Limited effectiveness of clustering

• Architectural weakness?

• Semi-effective CAD tools?

Page 30: FPGA  Logic Cluster Design

Number of Logic Clusters on Critical Path

The number of logic levels decreases with increasing N and K

• For a given K, most of the reduction is from N = 1 to 3

• The majority of the critical path delay was reduced in this range

• Increasing N is less effective when K is large

Page 31: FPGA  Logic Cluster Design

BLE Fanout vs. LUT Size

Smaller LUTs have better response to increasing N because each LUT has a relatively small fanout

• Adding an extra BLE to the cluster guaranteed some reduction in the number of logic levels

Larger LUTs have larger average fanout

• Harder to ensure that increasing N will result in fewer cluster levels on the critical path

Page 32: FPGA  Logic Cluster Design

Area-Delay Product

Large Delays

• Many BLEs on critical path

• Slightly larger area requirement

Large area cost for K=7 outweighs marginal delay improvement
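Picking an architecture by area-delay product amounts to minimizing area × delay over the (K, N) points explored. The sketch below shows that selection; the function name is hypothetical and the numbers are placeholders in arbitrary units, chosen only to mirror the qualitative trend on the slide, not results from the paper.

```python
def best_area_delay(results: dict[tuple[int, int], tuple[int, int]]) -> tuple[int, int]:
    """Return the (K, N) key whose area * delay product is smallest."""
    return min(results, key=lambda kn: results[kn][0] * results[kn][1])


# Placeholder (area, delay) values in arbitrary units, NOT measured data.
hypothetical = {
    (2, 4): (90, 20),   # small LUTs: low area, long critical path
    (4, 4): (100, 10),  # mid-size LUTs: balanced
    (7, 4): (160, 9),   # large LUTs: small delay gain, large area cost
}

print(best_area_delay(hypothetical))
```

With these placeholders the products are 1800, 1000, and 1440, so the mid-size LUT wins: the K=7 point gains little delay for a large area increase.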

Page 33: FPGA  Logic Cluster Design

Caveats

• Quality of CAD tools

• Mix of benchmark circuits

• Limited exploration of routing parameter design space

– Parameters were derived from N = K = 4

Page 34: FPGA  Logic Cluster Design

Best Overall Results and Summary

• To achieve 98% LUT utilization, set I = ½K(N+1)

• Small LUT sizes are not area efficient and have poor performance characteristics

• Future challenges

– Reduce number of BLEs on critical path without resorting to larger LUTs

– Reduce intra-cluster routing delays