3D-DRESD Polaris

POLITECNICO DI MILANO

Polaris

A workflow to manage allocation and relocation of tasks in a reconfigurable architecture

Final goal: generation of the complete architecture (bitstreams)

Management of 2D Reconfiguration in a Reconfigurable System

Massimo Morandi, massimo.morandi@dresd.org

Outline

Introduction: problem description; project goals and contributions

Project in detail: phases; results

Future Work

Problem Description

New generation of FPGAs: Virtex-4 and Virtex-5 allow bi-dimensional reconfiguration

This makes it possible to: better exploit the reconfigurable area; optimize module performance

Management becomes more complex, since it must: handle one more degree of freedom; avoid increased fragmentation; make good placement choices to keep the task rejection rate (TRR) low; keep intra-module routing paths acceptable

Project Goals and Contributions

Analyze the effects of 2D reconfiguration: new advantages and new problems

Examine possible solutions to the new problems: explore the literature to find promising ideas; evaluate those solutions in various scenarios

Propose a new solution: combine ideas from the literature with new ones to obtain a good cost-quality tradeoff

Setting and Advantages Definition

Definition of the setting: 2D self, partial, dynamic, run-time reconfiguration

Analysis of the advantages of 2D reconfiguration, in area usage and performance

2D Fragmentation Problem

Analysis of the 2D fragmentation problem: the area is generally more fragmented, which can nullify the area optimizations obtained

Placement Decisions

Analysis of the effects of 2D placement choices: again, bad choices can lead to performance loss

Allocation manager

Desired features of the allocation manager: low TRR; low management overhead; high routing efficiency; low fragmentation

Structure of the allocation manager: an empty space manager, which keeps either complete information on the free space or a heuristic selection of it, and a fitter, which can be general (FF, BL, BF, WF…) or focused (FA, RA…)
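The general fitters differ only in which free rectangle they choose among those large enough for the incoming task. A minimal Python sketch of three of them (FF = first fit, BF = best fit, WF = worst fit), assuming the empty space manager hands the fitter a plain list of free rectangles `(x, y, h, w)` and that "best" and "worst" are judged by the area of the chosen rectangle. The function names are illustrative, not taken from the project code; bottom-left (BL) and the focused fitters are left out.

```python
from typing import List, Optional, Tuple

# A free rectangle: (x, y, h, w) in device cells (hypothetical units).
Rect = Tuple[int, int, int, int]


def fits(rect: Rect, h: int, w: int) -> bool:
    """True if a task of size h x w fits inside the free rectangle."""
    _, _, rh, rw = rect
    return h <= rh and w <= rw


def first_fit(free: List[Rect], h: int, w: int) -> Optional[Rect]:
    """Return the first free rectangle (in list order) that can host the task."""
    for rect in free:
        if fits(rect, h, w):
            return rect
    return None


def best_fit(free: List[Rect], h: int, w: int) -> Optional[Rect]:
    """Return the smallest fitting rectangle, i.e. the one that wastes least area."""
    candidates = [r for r in free if fits(r, h, w)]
    return min(candidates, key=lambda r: r[2] * r[3], default=None)


def worst_fit(free: List[Rect], h: int, w: int) -> Optional[Rect]:
    """Return the largest fitting rectangle, leaving the biggest leftover."""
    candidates = [r for r in free if fits(r, h, w)]
    return max(candidates, key=lambda r: r[2] * r[3], default=None)
```

With the free rectangles `[(0, 0, 4, 10), (0, 4, 6, 3), (3, 4, 6, 7)]` and a 3 x 3 task, first fit returns `(0, 0, 4, 10)`, best fit `(0, 4, 6, 3)` and worst fit `(3, 4, 6, 7)`.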

Most relevant works

Maintain complete information on the empty space:
KAMER: Keep All Maximally Empty Rectangles; apply a general fitting strategy
CUR: maintain the Contour of a Union of Rectangles; apply a focused fitting strategy

Heuristically prune part of the information:
KNER: Keep Non-overlapping Empty Rectangles; apply a general fitting strategy
2D-HASHING: keep non-overlapping empty rectangles in an optimized data structure; apply (exclusively) a general fitting strategy
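Keeping non-overlapping empty rectangles means that, once a task is placed in a corner of a free rectangle, the L-shaped remainder has to be cut into two rectangles. A hedged Python sketch, assuming the task is anchored at the rectangle's (x, y) corner, with x/w horizontal and y/h vertical, and assuming a "keep the largest piece" rule for choosing between the horizontal and the vertical cut; that rule is a common heuristic chosen here for illustration, not necessarily the one used by KNER or by the proposed manager.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, h, w): x/w horizontal, y/h vertical (assumed)


def area(r: Rect) -> int:
    return r[2] * r[3]


def split_after_placement(free: Rect, h: int, w: int) -> List[Rect]:
    """Place an h x w task at the (x, y) corner of `free` and return the
    non-overlapping empty rectangles that remain (zero-area pieces dropped)."""
    x, y, fh, fw = free
    if h > fh or w > fw:
        raise ValueError("task does not fit in the free rectangle")
    # Vertical cut: the piece to the right of the task spans the full height.
    vertical = [(x + w, y, fh, fw - w), (x, y + h, fh - h, w)]
    # Horizontal cut: the piece above the task spans the full width.
    horizontal = [(x, y + h, fh - h, fw), (x + w, y, h, fw - w)]
    # Assumed heuristic: keep the cut that yields the largest single piece.
    best = max((vertical, horizontal),
               key=lambda pieces: max(area(p) for p in pieces))
    return [p for p in best if area(p) > 0]
```

Placing a 4 x 6 task (h = 4, w = 6) in a free 10 x 10 rectangle at the origin yields `[(0, 4, 6, 10), (6, 0, 4, 4)]`, whose areas add up to the 76 cells left free.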

Evaluation and Proposed Approach

Evaluation: high placement quality => high complexity; lowest complexity => no focused fitting (bad especially for routing)

Proposed approach: a heuristic (KNER-like) empty space manager, to keep complexity low enough for use in a self-reconfigurable system, and a fitting strategy focused on minimizing routing paths, to maintain high performance of the reconfigurable system (the chosen metric to minimize is the Manhattan distance)
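A routing-focused fitter can be sketched as a scoring loop: among the free rectangles that can host the task, pick the one whose placement minimizes the total Manhattan distance to the tasks it communicates with. In this hypothetical Python sketch the task is assumed to sit at the rectangle's (x, y) corner and distances are measured between module centers; `comm_cost` and `routing_aware_fit` are illustrative names, not the project's API.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, h, w)
Point = Tuple[float, float]


def center(x: int, y: int, h: int, w: int) -> Point:
    return (x + w / 2, y + h / 2)


def comm_cost(pos: Point, partners: List[Point]) -> float:
    """Sum of Manhattan distances from `pos` to every communicating task."""
    return sum(abs(pos[0] - px) + abs(pos[1] - py) for px, py in partners)


def routing_aware_fit(free: List[Rect], h: int, w: int,
                      partners: List[Point]) -> Optional[Rect]:
    """Return the fitting free rectangle with the lowest communication cost,
    assuming the task is anchored at the rectangle's (x, y) corner.
    Ties keep the earliest rectangle, so with no partners this is first fit."""
    best, best_cost = None, float("inf")
    for rect in free:
        x, y, rh, rw = rect
        if h <= rh and w <= rw:
            cost = comm_cost(center(x, y, h, w), partners)
            if cost < best_cost:
                best, best_cost = rect, cost
    return best
```

For a 4 x 4 task whose only partner sits at (23, 3), the rectangle `(20, 0, 10, 10)` wins over `(0, 0, 10, 10)`: the centers are at distance 2 and 22 respectively.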

Structure of the allocation manager

Task, defined by: arrival time, ASAP, (ALAP), H, W, latency, communicating tasks. Tasks are hosted in a queue, which also adds a pointer to the rectangle where each task is placed

Reconfigurable device, represented as a binary tree: each node is a rectangle and each leaf is an empty rectangle. Navigation uses pointers to the left child, the right child and the next leaf, plus a function that finds the previous leaf (for bookkeeping after a split or merge)

Rectangle, defined by: X, Y, H, W. Initially there is a single rectangle, with (X,Y) = (0,0), H = FPGA rows, W = FPGA columns
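These structures map naturally onto a few record types. A minimal Python sketch, assuming that a split turns a leaf into an internal node with two empty children and that the next-leaf links chain the leaves from left to right. Field and function names are illustrative, and the previous-leaf lookup is done by a simple in-order walk, which may differ from the bookkeeping used in the actual manager.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rectangle:
    x: int
    y: int
    h: int
    w: int


@dataclass
class Task:
    arrival: int
    asap: int
    h: int
    w: int
    latency: int
    alap: Optional[int] = None
    communicating: List[int] = field(default_factory=list)  # ids of partner tasks
    placed_in: Optional[Rectangle] = None  # pointer added by the task queue


@dataclass(eq=False)
class Node:
    rect: Rectangle
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    next_leaf: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def leaves(root: Node) -> List[Node]:
    """Return the leaves (empty rectangles) in left-to-right order."""
    if root.is_leaf():
        return [root]
    out: List[Node] = []
    for child in (root.left, root.right):
        if child is not None:
            out.extend(leaves(child))
    return out


def prev_leaf(root: Node, target: Node) -> Optional[Node]:
    """Find the leaf that precedes `target` by walking the leaves in order."""
    prev = None
    for leaf in leaves(root):
        if leaf is target:
            return prev
        prev = leaf
    return None


def split(root: Node, leaf: Node, a: Rectangle, b: Rectangle) -> None:
    """Turn `leaf` into an internal node with two empty leaf children and
    repair the next-leaf chain around them."""
    before = prev_leaf(root, leaf)  # must be found before `leaf` stops being a leaf
    leaf.left, leaf.right = Node(a), Node(b)
    leaf.left.next_leaf = leaf.right
    leaf.right.next_leaf = leaf.next_leaf
    leaf.next_leaf = None  # internal nodes leave the leaf chain
    if before is not None:
        before.next_leaf = leaf.left
```

The device starts as a single root leaf covering (0, 0) with H = rows and W = columns; each split keeps the next-leaf chain consistent, which is exactly where the previous-leaf lookup is needed.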

The Placement Algorithm

Experimental Results

Benchmark of 100 randomly generated tasks: sizes from 5% to 25% of the FPGA area, randomly interconnected

Execution time: 3x lower than CUR, close to KNER; communication cost: 3x lower than KNER, close to CUR; task rejection rate: all solutions are quite close
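A benchmark of this kind can be reproduced with a small generator. This is a hedged sketch: the device grid size (40 x 60 cells), the connection probability `p_edge` and the use of rejection sampling to enforce the 5% to 25% size range are assumptions made for illustration, not parameters reported for the original experiments.

```python
import random
from typing import List, Tuple


def generate_benchmark(n: int = 100, rows: int = 40, cols: int = 60,
                       p_edge: float = 0.05, seed: int = 0
                       ) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return (sizes, edges): sizes[i] is the (h, w) of task i, covering 5%-25%
    of the device area; edges lists randomly chosen communicating pairs (i, j)."""
    rng = random.Random(seed)  # seeded, so a benchmark can be regenerated exactly
    device_area = rows * cols
    sizes: List[Tuple[int, int]] = []
    while len(sizes) < n:
        h, w = rng.randint(1, rows), rng.randint(1, cols)
        # Rejection sampling: keep only tasks inside the 5%-25% area range.
        if 0.05 <= h * w / device_area <= 0.25:
            sizes.append((h, w))
    # Random interconnection: each pair communicates with probability p_edge.
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < p_edge]
    return sizes, edges
```

Fixing the seed makes every allocation strategy run on exactly the same task set, which keeps execution time, communication cost and TRR comparable across strategies.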

Future Work

Apply the proposed solution to self-reconfiguration: adapt the algorithm to run on the internal processor; create a validation reconfigurable architecture; integrate the architecture with relocation

Tune the algorithm to improve results: experiment with techniques to reduce the TRR; optimize the code to lower the algorithm's running time

Evaluate other fitting strategies

Questions?

Relocation for 2D Reconfigurable Systems

Marco Novati, marco.novati@dresd.org

Project Outline

Introduction: problem description; project goals

Project in detail: phases; results

What’s next

Problem Description

Self, dynamic, run-time 2D reconfiguration: Xilinx Virtex-4 and Virtex-5

Relocation, different solutions: software or hardware

We chose a hardware solution: BiRF Square

Project Goals

Study of the new FPGA families: examination of the Xilinx documentation on V4 and V5

Analysis of the new bitstream structure: generation of V4 and V5 bitstreams

Development of the new version of BiRF: implementation and validation

Frame Addressing (1/2)

New frame addressing: both rows and columns can be addressed
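With row and column addressing, relocating a module essentially means rewriting the row and column fields of every frame address while keeping the index of the frame within the column. A hedged Python sketch: the bit positions below are an assumption based on the Virtex-4 Frame Address Register layout described in the Xilinx configuration guide and must be checked there before use. The sketch also ignores the top/bottom half bit, so it only moves frames within the same half of the device; `relocate_far` and the other helpers are hypothetical, not part of BiRF Square.

```python
# Assumed Virtex-4 FAR layout (verify against the Xilinx configuration guide):
# field name -> (lowest bit, width in bits)
FAR_FIELDS = {
    "top": (22, 1),    # top/bottom half of the device
    "block": (19, 3),  # block type
    "row": (14, 5),    # row address
    "major": (6, 8),   # column (major) address
    "minor": (0, 6),   # frame index within the column
}


def get_field(far: int, name: str) -> int:
    shift, width = FAR_FIELDS[name]
    return (far >> shift) & ((1 << width) - 1)


def set_field(far: int, name: str, value: int) -> int:
    shift, width = FAR_FIELDS[name]
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")
    mask = ((1 << width) - 1) << shift
    return (far & ~mask) | (value << shift)


def make_far(**fields: int) -> int:
    """Build a frame address from named fields; unspecified fields are 0."""
    far = 0
    for name, value in fields.items():
        far = set_field(far, name, value)
    return far


def relocate_far(far: int, d_row: int, d_col: int) -> int:
    """Shift a frame address by d_row rows and d_col columns (same half only)."""
    far = set_field(far, "row", get_field(far, "row") + d_row)
    return set_field(far, "major", get_field(far, "major") + d_col)
```

For example, the frame with row 1, column 10, minor 3 moved by two rows and five columns becomes row 3, column 15, minor 3; a move that would overflow a field raises `ValueError` instead of silently wrapping.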

Frame Addressing (2/2)

New Parser
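A parser for the new bitstreams mainly has to recognize configuration packet headers, so that writes to the Frame Address Register (FAR) can be located and patched. A hedged Python sketch, assuming the Type 1 / Type 2 header layout described in the Xilinx Virtex-4 and Virtex-5 configuration guides: header type in bits 31:29, opcode in bits 28:27, a register address starting at bit 13 (only its low 5 bits are decoded here) and an 11-bit word count for Type 1, and a 27-bit word count for Type 2, which reuses the register of the preceding Type 1 packet. The register table is partial, and `decode_header` is an illustrative name.

```python
from typing import NamedTuple, Optional

# Partial register map (assumed; check the configuration user guide).
REGISTERS = {0: "CRC", 1: "FAR", 2: "FDRI", 3: "FDRO", 4: "CMD", 12: "IDCODE"}
OPCODES = {0: "NOP", 1: "READ", 2: "WRITE"}


class Header(NamedTuple):
    packet_type: int         # 1 or 2
    opcode: str
    register: Optional[str]  # None for Type 2 (it reuses the previous register)
    word_count: int


def decode_header(word: int) -> Header:
    """Decode a 32-bit configuration packet header (assumed layout)."""
    packet_type = word >> 29
    opcode = OPCODES.get((word >> 27) & 0x3, "RESERVED")
    if packet_type == 1:
        reg = (word >> 13) & 0x1F
        return Header(1, opcode, REGISTERS.get(reg, f"REG{reg}"), word & 0x7FF)
    if packet_type == 2:
        return Header(2, opcode, None, word & 0x7FFFFFF)
    raise ValueError(f"not a packet header: {word:#010x}")
```

Under these assumptions, `0x30002001` decodes as a Type 1 write of one word to FAR, which is the kind of packet a relocation filter has to rewrite.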

CRC Calculation

A particular CRC value is used by the Xilinx tools

Two versions of BiRF Square: one using the “predefined” value, one with actual CRC calculation

An optimized algorithm has been used for the CRC calculation
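When the CRC is actually computed, changing a frame address or frame data also means updating the checksum. A hedged Python sketch of a bit-serial CRC-32C (Castagnoli polynomial, reflected constant 0x82F63B78). The assumption that the configuration CRC of these families is built on this polynomial comes from the Xilinx configuration guides, but the exact framing (which bits of each data word and register address are fed in, and in what order) is not reproduced here and must be taken from that documentation. The bit-serial form is shown because it can absorb fields of any width, not only whole bytes; it is not the optimized algorithm used in BiRF Square.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reversed form


def crc32c_bits(crc: int, value: int, nbits: int) -> int:
    """Feed the low `nbits` of `value` into the CRC register, LSB first."""
    for i in range(nbits):
        bit = (value >> i) & 1
        if (crc ^ bit) & 1:
            crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
        else:
            crc >>= 1
    return crc


def crc32c(data: bytes) -> int:
    """Standard CRC-32C of a byte string (initial value and final XOR 0xFFFFFFFF)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = crc32c_bits(crc, byte, 8)
    return crc ^ 0xFFFFFFFF
```

`crc32c(b"123456789")` gives the standard CRC-32C check value `0xE3069283`, a quick way to validate the engine before adapting it to the bitstream framing.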

Synthesis results

On a Virtex-4 with speed grade -12: general-purpose version, maximum frequency of 160 MHz; specific version, maximum frequency of 290 MHz

Target Device

Validation Architecture

Results (1/2)

BiRF Square makes it possible to apply relocation in a self, partially and dynamically 2D-reconfigurable system; the occupation ratio is relatively small; the frequency is more than acceptable; internal memory requirements are reduced

Results (2/2)

Throughput of 7.3 MB/s

The total configuration file size is about 1 MB. Consider an architecture with 1/3 of the area as the fixed part and 2/3 as the reconfigurable part, divided into 6 slots

Under these hypotheses, the size of a partial bitstream is about 110 KB, giving a relocation time of about 15 ms
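The estimate follows from simple arithmetic. The short Python check below reproduces it, assuming decimal units (1 MB = 1000 KB) and that the reconfigurable two thirds of the device are split evenly among the 6 slots.

```python
full_bitstream_kb = 1000          # ~1 MB total configuration file (decimal units assumed)
reconfigurable_fraction = 2 / 3   # 2/3 of the area is reconfigurable
slots = 6                         # equally sized reconfigurable slots (assumed)
throughput_kb_per_s = 7.3 * 1000  # 7.3 MB/s measured relocation throughput

partial_kb = full_bitstream_kb * reconfigurable_fraction / slots
relocation_ms = partial_kb / throughput_kb_per_s * 1000

print(f"partial bitstream ~ {partial_kb:.0f} KB, relocation ~ {relocation_ms:.1f} ms")
```

This gives about 111 KB per slot and about 15.2 ms per relocation, in line with the ~110 KB and ~15 ms quoted above.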

What’s Next

Future improvements: direct memory access (DMA); direct manipulation of the bitstream; portability; integration with ICAP; elimination of the relocation overhead (relocation time << reconfiguration time)

Future work: provide a simulation framework to monitor the evolution of the reconfigurable system and to evaluate different choices

The final goal: the creation of a real architecture that exploits self, partial and dynamic 2D reconfiguration, with relocation

Questions?
