Application-Specific Customization of Parameterized FPGA Soft-Core Processors

David Sheldon (a), Rakesh Kumar (b), Roman Lysecky (c), Frank Vahid (a*), Dean Tullsen (b)

(a) Department of Computer Science and Engineering, University of California, Riverside
*Also with the Center for Embedded Computer Systems at UC Irvine
(b) Department of Computer Science and Engineering, University of California, San Diego
(c) Department of Electrical and Computer Engineering, University of Arizona

This work was supported in part by the National Science Foundation, the Semiconductor Research Corporation, and by hardware and software donations from Xilinx.

David Sheldon, UC Riverside 2 of 22

FPGA Soft-Core Processors

Soft-core processor: an HDL description

Flexible implementation: FPGA or ASIC

Technology independent

[Figure: one HDL description implemented as an ASIC or on FPGAs such as Spartan 3, Virtex 2, and Virtex 4]

FPGA Soft-Core Processors

Soft-core processors can have configurable options: datapath units, cache, bus architecture

Current commercial FPGA soft-core processors: Xilinx MicroBlaze, Altera Nios

[Figure: an FPGA containing a μP with cache, FPU, and MAC units]

Goal: Tune an FPGA soft-core microprocessor for a given application

[Figure: the application and the μP parameter values drive synthesis of a configured μP on the FPGA; candidate configurations trade off size against execution time]

Microblaze – Xilinx FPGA Soft-Core

[Figure: base MicroBlaze with instantiatable units: multiplier, barrel shifter, divider, FPU, cache]

[Figure: application runtime (ms) vs. size (equivalent LUTs) for the configurations base, bs, mul, mul+bs, bs+cache, mul+bs+cache, and FPU]

[Figure: speedup per benchmark (aifir, BaseFP01, bitmnp, brev, canrdr, g3fax, g721_ps, idct, matmul, raytrace, tblook, ttsprk, and the average) for Base MB, Full MB, and Optimal MB]

Significant tradeoffs

Instantiating all units is not necessarily the fastest option, because added units can lengthen the critical path

Problem: Need Fast Exploration

Synthesis runs can take an hour (~20-60 mins each)

[Figure: exploration chooses μP parameter values; synthesis (~20-60 mins) produces the configured μP]

This talk: two approaches

Approach 1: Using traditional CAD techniques

Approach 2: Synthesis-in-the-loop

Results

Constraints on Configurations

Size constraints may prevent use of all possible units: multiplier, FPU, cache, barrel shifter, divider

[Figure: MicroBlaze with cache, multiplier, and FPU that must fit under a maximum area]
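To make the constraint concrete, here is a minimal Python sketch that enumerates every on/off combination of the five units and keeps only those that fit a maximum area. The unit sizes, the base size, and the area limit are hypothetical illustration values, not measurements from this work, and the sketch assumes that unit sizes simply add.

```python
from itertools import combinations

# Hypothetical per-unit sizes in equivalent LUTs (illustration only).
UNIT_SIZES = {"MUL": 300, "FPU": 900, "CACHE": 400, "BS": 200, "DIV": 150}
BASE_SIZE = 1000  # hypothetical size of the base MicroBlaze


def feasible_configs(unit_sizes, base_size, max_area):
    """Return every subset of units whose total size fits within max_area.

    Assumes unit sizes add to the base size; real synthesis results
    need not be additive.
    """
    units = sorted(unit_sizes)
    feasible = []
    for k in range(len(units) + 1):
        for subset in combinations(units, k):
            size = base_size + sum(unit_sizes[u] for u in subset)
            if size <= max_area:
                feasible.append(frozenset(subset))
    return feasible


configs = feasible_configs(UNIT_SIZES, BASE_SIZE, max_area=1700)
print(f"{len(configs)} of {2 ** len(UNIT_SIZES)} configurations fit")
```

With these numbers, 12 of the 32 configurations fit and the full configuration (all five units) does not, which is the situation the slide describes.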

Approach 1: Traditional CAD Techniques

Create a model of the problem

Solve the model with extensive search heuristics

We will model this problem as a 0-1 knapsack problem

[Figure: creating the model is slow (includes synthesis); exploring the model is fast and considers 1000s of configurations. Knapsack picture: MicroBlaze, cache, multiplier, and FPU packed under a maximum area]

Approach 1: Traditional CAD Techniques

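Creating the model: synthesize the base MicroBlaze and the MicroBlaze with each unit added individually (e.g., MicroBlaze + FPU), run the application on each, and record each unit's performance and size increments

[Figure: MicroBlaze with multiplier, cache, divider, barrel shifter, and FPU, each annotated with its size and performance]

Unit             BS     FPU    MUL    DIV    CACHE
Perf increment   1.1    0.9    1.2    1.0    1.3
Size increment   1.4    2.7    1.8    1.1    1.6
Perf/Size        0.96   0.34   0.63   0.93   0.80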
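The model-building step can be sketched in Python as below. The slides do not define the increments precisely, so this sketch assumes the performance increment is the base runtime divided by the runtime with the unit, and the size increment is the size with the unit divided by the base size; the measurement values are hypothetical. With five units this costs 1 + 5 = 6 synthesis runs, matching the count given for model creation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    runtime_ms: float  # application runtime on this configuration
    size_luts: int     # synthesized size in equivalent LUTs


def unit_increments(base, single_unit):
    """Compute (perf increment, size increment, perf/size) for each unit.

    base: Measurement of the base MicroBlaze.
    single_unit: unit name -> Measurement of base + that unit only.
    Both increments are ratios relative to the base (an assumption).
    """
    result = {}
    for unit, m in single_unit.items():
        perf_inc = base.runtime_ms / m.runtime_ms  # speedup over base
        size_inc = m.size_luts / base.size_luts    # growth over base
        result[unit] = (perf_inc, size_inc, perf_inc / size_inc)
    return result


# Hypothetical measurements: one base run plus one run per unit.
base = Measurement(runtime_ms=10.0, size_luts=1000)
runs = {
    "BS": Measurement(runtime_ms=8.0, size_luts=1250),
    "MUL": Measurement(runtime_ms=5.0, size_luts=2500),
}
for unit, (perf, size, ratio) in unit_increments(base, runs).items():
    print(f"{unit}: perf {perf:.2f}, size {size:.2f}, perf/size {ratio:.2f}")
```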

Approach 1: Traditional CAD Techniques

0-1 knapsack model:

Object’s benefit = unit’s performance increment / size increment

Object’s weight = unit’s size

Knapsack’s size constraint = FPGA size constraint
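Written out, the mapping above is the standard 0-1 knapsack problem. In this sketch, x_i ∈ {0,1} says whether unit i is included, b_i is its benefit (performance increment / size increment), w_i its size, and A the FPGA size constraint; these symbol names are introduced here for illustration only.

```latex
\max_{x \in \{0,1\}^n} \; \sum_{i=1}^{n} b_i x_i
\quad \text{subject to} \quad \sum_{i=1}^{n} w_i x_i \le A,
\qquad b_i = \frac{\Delta\mathrm{perf}_i}{\Delta\mathrm{size}_i}
```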

Approach 1: Traditional CAD Techniques

Solved the 0-1 knapsack problem using established methods

Toth, P., Dynamic Programming Algorithms for the Zero-One Knapsack Problem. Computing, 1980

Running time:

6 MicroBlaze configuration synthesis runs to create the model

O(n·p) to solve the model, where n is the number of factors and p is the available area

Solving time is negligible (seconds) compared to synthesis runtimes (up to about an hour each)
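A compact dynamic-programming solver with the O(n·p) cost above is sketched below. It is a textbook table-filling formulation, not necessarily the exact algorithm from Toth's paper. It needs integer weights, so as an illustration only, the slide's size increments are scaled by 10 to integers, the Perf/Size values serve as benefits, and the capacity of 45 is a hypothetical area budget.

```python
def knapsack_01(values, weights, capacity):
    """Solve 0-1 knapsack by dynamic programming in O(n * capacity) time.

    Returns (best_total_value, sorted indices of the chosen items).
    """
    n = len(values)
    # best[i][c]: best value using only the first i items within capacity c
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: take it
    # Walk back through the table to recover which items were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], sorted(chosen)


units = ["BS", "FPU", "MUL", "DIV", "CACHE"]
benefit = [0.96, 0.34, 0.63, 0.93, 0.80]  # Perf/Size row from the slide
size = [14, 27, 18, 11, 16]               # size increments x 10, as integers
value, picked = knapsack_01(benefit, size, capacity=45)
print([units[i] for i in picked], round(value, 2))
```

Under these assumptions the solver picks BS, DIV, and CACHE. The table has (n+1)·(p+1) cells, which is why solving takes seconds while each synthesis run takes far longer.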

Approach 1: Traditional CAD Techniques

Problems

Hundreds of target FPGAs, with different hard-core resources (multipliers, block RAM)

The model estimates size and performance for combinations of two or more units from single-unit measurements: e.g., MUL speedup 1.3 and DIV speedup 1.6 give an estimated MUL+DIV speedup of 1.9, but it may really be 1.7

Model inaccuracies may be large

Example target devices (PPCs = embedded PowerPC hard cores):

Device      LUTs    PPCs
XC2V2000    21504   0
XC2VP2      2816    0
XC4VLX80    71680   0
XC4VLX15    12288   0
XC2S300E    6140    0
XC2V4000    46080   0
XC2VP40     38784   2
XC4VSX25    20480   0
XC4VSX35    30720   0
XC4VFX20    17088   1
XC2S150E    3456    0
XC2VP30     27392   2
XC4VLX60    53248   0
XC2S600E    13824   0
XC2VP20     18560   2
XC2V500     6144    0
XC2VPX70    66176   2
XC4VLX40    36864   0
XC2V6000    67584   0
XC4VFX60    50560   2
XC4VFX100   84352   2
XC2VP4      6016    1
XC2VP70     66176   2
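The example numbers in the MUL+DIV bullet are consistent with a model that adds the individual speedup increments; this additive reading is an inference from the numbers, not something the slide states. Under it, the estimate and its error against a possible actual speedup of 1.7 work out as:

```latex
\hat{S}_{\mathrm{MUL+DIV}} = 1 + (S_{\mathrm{MUL}} - 1) + (S_{\mathrm{DIV}} - 1)
  = 1 + 0.3 + 0.6 = 1.9,
\qquad
\frac{\hat{S}_{\mathrm{MUL+DIV}} - S_{\mathrm{MUL+DIV}}}{S_{\mathrm{MUL+DIV}}}
  = \frac{1.9 - 1.7}{1.7} \approx 11.8\%
```

A multiplicative combination (1.3 × 1.6 = 2.08) would overestimate even more in this example.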

Approach 2: Synthesis-in-the-Loop

Problem with the traditional CAD approach:

Hundreds of target FPGAs

The model approach estimates size and performance for two or more units

Model inaccuracies may be large

Solution: synthesis in the loop

No abstract model; guided by actual size and performance data

But slow: can only explore a few configurations

[Figure: synthesis-in-the-loop, where exploration sends each candidate through synthesis and execution (10's of minutes) and gets actual size and performance back, contrasted with creating a model once and exploring the model]

Approach 2: Synthesis-in-the-Loop

First, pre-analyze the units to guide the heuristic, using the same calculations as when creating the model for the knapsack

[Figure: multiplier, cache, divider, barrel shifter, and floating-point unit, each annotated with its size and performance, alongside the same per-unit Perf increment, Size increment, and Perf/Size table used for the knapsack model]

Approach 2: Synthesis-in-the-Loop

Build an “impact-ordered tree” structure; the tree is specific to a given application

Perf/Size: BS 0.96, FPU 0.34, MUL 0.63, DIV 0.93, CACHE 0.80

Sort by Perf/Size (impact): BS 0.96, DIV 0.93, CACHE 0.80, MUL 0.63, FPU 0.34

[Figure: application-specific impact-ordered tree built from the sorted units]
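The sorting step can be sketched in a few lines of Python, using the Perf/Size values from the slide. Treating the tree as a simple list of levels, one include/exclude decision per unit in descending impact order, is a simplifying assumption of this sketch.

```python
def impact_order(perf_per_size):
    """Return unit names sorted by impact (Perf/Size), highest first."""
    return sorted(perf_per_size, key=perf_per_size.get, reverse=True)


ratios = {"BS": 0.96, "FPU": 0.34, "MUL": 0.63, "DIV": 0.93, "CACHE": 0.80}
print(impact_order(ratios))
```

Because the ratios come from the application's own pre-analysis runs, a different application would generally yield a different ordering, which is what makes the tree application specific.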

Approach 2: Synthesis-in-the-Loop

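Run a tree-based search heuristic

[Figure: the search descends the application-specific impact-ordered tree (BS, MUL, FPU, DIV, CACHE with their Perf/Size values); at each level the unit is included or not included depending on whether it proves useful (Yes, Yes, No, No, No in the example)]

[Figure: synthesis-in-the-loop, where each explored configuration is synthesized and executed and its measured size and performance guide the next step]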
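The descent can be sketched in Python as below. The `evaluate` hook stands in for "synthesize, then execute" and is hypothetical, as is the usefulness test (the configuration must fit the size constraint and run faster than the best so far); the slides do not spell out the exact criterion. The mock evaluator uses made-up sizes and savings, including a slowdown for the FPU to mimic critical-path lengthening.

```python
from typing import Callable, FrozenSet, List, Tuple

Config = FrozenSet[str]
# Hypothetical hook: synthesize a configuration, run the application on it,
# and return (runtime_ms, size_luts).
Evaluate = Callable[[Config], Tuple[float, int]]


def tree_search(order: List[str], base_runtime: float,
                evaluate: Evaluate, max_size: int) -> Tuple[Config, float]:
    """Descend the impact-ordered tree, one include/exclude decision per level.

    A unit is kept only if the configuration with it fits within max_size
    and runs faster than the best configuration so far (assumed criterion).
    Costs exactly one synthesis run per unit.
    """
    chosen: Config = frozenset()
    best_runtime = base_runtime
    for unit in order:
        candidate = chosen | {unit}
        runtime, size = evaluate(candidate)
        if size <= max_size and runtime < best_runtime:
            chosen, best_runtime = candidate, runtime  # useful: include
    return chosen, best_runtime


# Mock evaluator with made-up numbers (not measurements).
SIZES = {"BS": 200, "DIV": 150, "CACHE": 400, "MUL": 300, "FPU": 900}
SAVINGS = {"BS": 2.0, "DIV": 1.0, "CACHE": 2.5, "MUL": 1.5, "FPU": -0.5}
calls = []


def fake_evaluate(cfg: Config) -> Tuple[float, int]:
    calls.append(cfg)
    runtime = 10.0 - sum(SAVINGS[u] for u in cfg)
    return runtime, 1000 + sum(SIZES[u] for u in cfg)


config, runtime = tree_search(["BS", "DIV", "CACHE", "MUL", "FPU"],
                              10.0, fake_evaluate, max_size=1700)
print(sorted(config), runtime, len(calls))
```

In this mock run CACHE is rejected because adding it would exceed the area limit, yet MUL still fits afterwards; only five configurations are synthesized, one per tree level.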

Comparison of Approaches

Approach 1: Traditional CAD

6 synthesis runs to build the model

O(np) knapsack solution

Examines thousands of configurations during exploration

Approach 2: Synthesis in the loop

11 synthesis runs (6 pre-analysis, 5 exploration)

Examines at most 5 configurations during exploration
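The run counts follow from the number of units n. The quick sketch below assumes each of the n = 5 units is a binary include/exclude choice, applies the ~20-60 min per synthesis run quoted for this design flow, and ignores the negligible knapsack solve time and application execution time. Exhaustive search is included for reference, since the results compare against it.

```python
def synthesis_runs(n_units):
    """Synthesis runs per approach, assuming each unit is simply on or off."""
    return {
        "exhaustive": 2 ** n_units,       # every on/off combination
        "knapsack": n_units + 1,          # base + one run per unit (model)
        "tree": (n_units + 1) + n_units,  # pre-analysis + one run per level
    }


MIN_PER_RUN, MAX_PER_RUN = 20, 60  # minutes per synthesis run
for name, runs in synthesis_runs(5).items():
    print(f"{name}: {runs} runs, {runs * MIN_PER_RUN}-{runs * MAX_PER_RUN} min")
```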

Results

EEMBC and Powerstone benchmarks: aifir, BaseFP01, bitmnp, brev, canrdr, g3fax, g721_ps, idct, matmul, tblook, ttsprk

Average results shown, on Virtex 2 Pro, for a particular size constraint

[Figure: tool run time (min, 0-800) vs. speedup (1-2.5) for the Exhaustive, App-Spec, and Knapsack approaches]

Application-specific impact-ordered tree approach yields near-optimal results in acceptable tool runtime

Knapsack sub-optimality due to multi-unit estimation inaccuracy

Results

Obtained results for six different size constraints

Results shown for a second size constraint

Similar findings for all six constraints

[Figure: tool run time (min, 0-800) vs. speedup (1-2.5) for the Exhaustive, App-Spec, and Knapsack approaches at the second size constraint]

Results

Also ran for a different FPGA, the Xilinx Spartan2: similar findings

[Figure: tool run time (min, 0-300) vs. speedup (1-1.6) for the Exhaustive, App-Spec, and Knapsack approaches on Spartan2]

Conclusions

The synthesis-in-the-loop approach outperformed the traditional CAD approach: better results, slightly longer runtime

The application-specific impact-ordered tree heuristic served well for the synthesis-in-the-loop approach

Future: extend to highly configurable soft-core processors, and to multiple processors competing for and/or sharing resources

Questions?