POLITECNICO DI MILANO
DReAMS: Dynamic Reconfigurability Applied to Multi-FPGA Systems
Matteo Murgida, Alessandro Panella
{matteo.murgida, alessandro.panella}@dresd.org
DReAMS
Dynamic Reconfigurability Applied to Multi-FPGA Systems
• Branch of the DRESD project
  • Inherits architectures and tools
• Automatic workflow from VHDL system description to FPGA implementation
  • VHDL parsing and system simulation
  • System creation over a specific architecture
  • Bitstream creation and download onto FPGAs
Workflow
POLITECNICO DI MILANO
SPartA: A novel algorithm for multi-FPGA partitioning
Alessandro Panella
alessandro.panella@dresd.org
Outline
• Problem description
• Project goals and contributions
• What is partitioning?
• Existing approaches
• Going deeper into the problem
• SPartA
  • The framework
  • The idea
  • The algorithm
• Experimental results
• Future work
Problem description
• Multi-FPGA rationale
  • Large designs do not fit into a single chip
  • High-performance parallelized applications
  • Our case: apply dynamic reconfigurability
• Need to break the initial design into several blocks
  • One block corresponds to a single FPGA chip
• Which inputs/outputs?
• Which objectives?
• Which techniques?
Project goals and contributions
• Analyze existing approaches
  • Obtain a deep knowledge of this well-explored field
  • Extract basic ideas for a new approach
  • Obtain some terms of comparison
• Define precisely which problem(s) we cope with
  • Contextualize the problem
  • Focus on our needs
• Develop a new solution
  • Theoretical background
  • Implementation and evaluation
What is partitioning?
• Goal
  • Divide a set of interrelated objects into a set of subsets
  • Optimize one or more specific objectives
• K-way partitioning
  • Given a graph G = (V, E), partition V into k subsets V1, ..., Vk that are pairwise disjoint and whose union is V
  • Balance constraint: |Vi| ≈ |V|/k
  • Aims at minimizing (or maximizing) an objective function
    • Edge-cut
    • Other objectives
• In general: NP-complete
• Several heuristics that provide good results have been developed
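As a minimal sketch of the definitions above, the following checks that a candidate k-way partition is valid (pairwise disjoint parts whose union is V) and counts its edge-cut. The graph, the function names and the unweighted-edge model are assumptions made for illustration only.

```python
from itertools import combinations

def is_valid_partition(vertices, parts):
    """True if the parts are pairwise disjoint and their union is the vertex set."""
    disjoint = all(a.isdisjoint(b) for a, b in combinations(parts, 2))
    return disjoint and set().union(*parts) == set(vertices)

def edge_cut(edges, parts):
    """Count the edges whose endpoints lie in different parts."""
    part_of = {v: i for i, part in enumerate(parts) for v in part}
    return sum(1 for u, v in edges if part_of[u] != part_of[v])

# Hypothetical 6-vertex graph: two triangles joined by a single edge.
vertices = range(6)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
parts = [{0, 1, 2}, {3, 4, 5}]

print(is_valid_partition(vertices, parts))  # the two triangles form a valid 2-way partition
print(edge_cut(edges, parts))               # only the joining edge (2, 3) is cut
```

Both parts here have |V|/k = 3 vertices, so the balance constraint holds exactly.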
Existing approaches - a glance
• Traditional methods
  • Kernighan-Lin and Fiduccia-Mattheyses heuristics
  • Iterative-improvement algorithms
    • Begin with an initial partition and iteratively improve it
    • O(n^3) complexity
• Iterative algorithms
  • Genetic
  • Simulated annealing
• Multilevel algorithms
  • Clustering -> Initial partitioning -> Refining
  • METIS/hMETIS suite: best current results for partitioning large flattened graphs
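Iterative-improvement heuristics of the Kernighan-Lin and Fiduccia-Mattheyses family rank candidate moves by their gain, that is, the reduction in edge-cut obtained by moving a vertex to the other side of a bipartition. The sketch below computes only that gain on a hypothetical unweighted graph; it is not a full implementation of either heuristic (no vertex locking, balance check or bucket structure), and all names are illustrative.

```python
def move_gain(vertex, adjacency, side):
    """Edge-cut reduction if `vertex` moves to the other side of a bipartition.

    gain = (edges to the other side) - (edges to the same side)
    """
    external = sum(1 for n in adjacency[vertex] if side[n] != side[vertex])
    internal = sum(1 for n in adjacency[vertex] if side[n] == side[vertex])
    return external - internal

def best_move(adjacency, side):
    """Return the (vertex, gain) pair with the highest gain; ties go to the smallest id."""
    candidates = ((v, move_gain(v, adjacency, side)) for v in sorted(adjacency))
    return max(candidates, key=lambda pair: pair[1])

# Hypothetical graph: two triangles joined by edge 2-3, with vertex 2 on the "wrong" side.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
side = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}

print(best_move(adjacency, side))  # moving vertex 2 cuts one edge instead of two
```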
Going deeper into the problem
• Two kinds of multi-FPGA partitioning
  • Topology-aware
    • Architecture topology is an input
    • No optimization of the number of FPGAs needed
    • Main task: association between the (larger) system graph and the (smaller) architectural graph
  • Topology-free
    • Architecture topology is not provided
    • Input: dimension and communication features of FPGAs
    • Minimization of the number of FPGAs
    • Place and route after partitioning
• At the moment, we deal with the topology-free problem
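For the topology-free case, a simple capacity argument gives a lower bound on the number of FPGAs: the total design size divided by the size of one FPGA, rounded up. The sketch below assumes that "dimension" can be modeled as a single integer per block and ignores the communication (I/O) features; the function name and the numbers are hypothetical.

```python
def min_fpgas_lower_bound(block_sizes, fpga_capacity):
    """Lower bound on the FPGA count: total size over capacity, rounded up."""
    if any(size > fpga_capacity for size in block_sizes):
        raise ValueError("a block larger than one FPGA cannot be placed")
    total = sum(block_sizes)
    return -(-total // fpga_capacity)  # integer ceiling division

# Hypothetical block sizes and FPGA capacity, in arbitrary area units.
print(min_fpgas_lower_bound([400, 300, 300, 250], 500))
```

The bound is not always achievable: in this example no two blocks fit together on one FPGA, so four chips are needed although the bound is three.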
SPartA: the framework
• Input: VHDL system description
• Output: several VHDL files, one for each block (FPGA)
• Three main phases:
  • Extract design from VHDL description
  • "Real" partitioning phase (core)
  • Build VHDL files
SPartA: the idea
• Structural approach
  • Fully exploits the design hierarchy (*)
  • Modules can be treated as single blocks
  • Basis for expansion toward dynamic reconfigurability
• Objectives
  • Minimize the cut size
  • Minimize the number of FPGAs used
  • Preserve module integrity
(*) From Wen-Jong Fang and Allen C.-H. Wu, "Multiway FPGA Partitioning by Fully Exploiting Design Hierarchy"
SPartA: the algorithm 1/2
• Recursive algorithm (deals with trees)
  • Starts from the TOP node
• Precondition
  • No leaves with dimension > FPGA size
• At every moment, a node can be COVERED, UNCOVERED or PARTIALLY COVERED
• Stop condition
  • Node TOP is COVERED
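The node states and the precondition can be sketched as a recursive walk over the design tree. The slides do not define the states formally, so the reading used here is an assumption: a leaf is COVERED once it has been assigned to a partition, and an internal node is COVERED (or UNCOVERED) only when all of its children are. The `Node` class and both function names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    COVERED = "covered"
    UNCOVERED = "uncovered"
    PARTIALLY_COVERED = "partially covered"

@dataclass
class Node:
    name: str
    size: int = 0                                  # meaningful for leaves only
    children: list = field(default_factory=list)
    assigned: bool = False                         # leaf already placed in a partition

def status(node):
    """Derive a node's state recursively from its subtree (assumed semantics)."""
    if not node.children:
        return Status.COVERED if node.assigned else Status.UNCOVERED
    states = {status(child) for child in node.children}
    if states == {Status.COVERED}:
        return Status.COVERED
    if states == {Status.UNCOVERED}:
        return Status.UNCOVERED
    return Status.PARTIALLY_COVERED

def precondition_holds(node, fpga_size):
    """Check that no leaf is larger than one FPGA."""
    if not node.children:
        return node.size <= fpga_size
    return all(precondition_holds(child, fpga_size) for child in node.children)

# Hypothetical design tree: TOP -> {A -> {a1, a2}, b}
a1, a2 = Node("a1", 200, assigned=True), Node("a2", 150)
top = Node("TOP", children=[Node("A", children=[a1, a2]), Node("b", 300, assigned=True)])

print(status(top))   # a2 is not placed yet, so TOP is only partially covered
a2.assigned = True
print(status(top))   # every leaf is placed: the stop condition is met
```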
SPartA: the algorithm 2/2
• OPEN ISSUE: selecting the first node to be inserted into an empty partition
  • Random node
  • Node with overall max communication
  • Node with max communication with its siblings
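The three candidate policies can be compared on a small example. This is a sketch under assumptions: communication is modeled as a dictionary of weighted, undirected edges, and the node names, weights and helper functions are all hypothetical rather than taken from the SPartA implementation.

```python
import random

def total_comm(node, comm):
    """Sum of the weights of all edges touching `node`."""
    return sum(w for (a, b), w in comm.items() if node in (a, b))

def sibling_comm(node, siblings, comm):
    """Sum of the weights of edges between `node` and its siblings."""
    return sum(w for (a, b), w in comm.items()
               if (a == node and b in siblings) or (b == node and a in siblings))

def pick_random(candidates, rng=random):
    """Policy 1: any candidate, chosen uniformly at random."""
    return rng.choice(sorted(candidates))

def pick_max_overall(candidates, comm):
    """Policy 2: the candidate with the highest total communication."""
    return max(sorted(candidates), key=lambda n: total_comm(n, comm))

def pick_max_siblings(candidates, siblings_of, comm):
    """Policy 3: the candidate that communicates most with its siblings."""
    return max(sorted(candidates), key=lambda n: sibling_comm(n, siblings_of[n], comm))

# Hypothetical data: a, b, c are siblings; x and y belong to other modules.
comm = {("a", "b"): 1, ("a", "x"): 10, ("b", "c"): 4, ("c", "y"): 2}
siblings_of = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
candidates = {"a", "b", "c"}

print(pick_max_overall(candidates, comm))                # "a": dominated by its link to x
print(pick_max_siblings(candidates, siblings_of, comm))  # "b": strongest ties inside the module
```

The example shows how the two deterministic policies can disagree: a node that talks mostly to other modules wins on overall communication but not on sibling communication.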
Results 1/3
• Complexity: exponential, due to the recursive nature of the algorithm
• Execution time is nevertheless low (tens of seconds for a reasonably large design)
• Example: [figure: original tree and partitioned tree]
Results 2/3
• Evaluation metrics
  • EDGECUT, FILLING and SPLITS
• Evaluation of the three policies for node selection
  • 18 different trees of varying size
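The slides name the three metrics without defining them. The sketch below shows one plausible reading, stated purely as an assumption: EDGECUT as the total weight of communication edges crossing partitions, FILLING as the fraction of the capacity of the used FPGAs that is occupied, and SPLITS as the number of modules whose leaves end up in more than one partition. All names and data are hypothetical.

```python
def edgecut(comm, part_of):
    """Assumed EDGECUT: total weight of edges whose endpoints are in different partitions."""
    return sum(w for (a, b), w in comm.items() if part_of[a] != part_of[b])

def filling(sizes, part_of, fpga_size):
    """Assumed FILLING: occupied area divided by the total area of the FPGAs in use."""
    used = {}
    for leaf, part in part_of.items():
        used[part] = used.get(part, 0) + sizes[leaf]
    return sum(used.values()) / (len(used) * fpga_size)

def splits(modules, part_of):
    """Assumed SPLITS: number of modules spread over more than one partition."""
    return sum(1 for leaves in modules.values()
               if len({part_of[leaf] for leaf in leaves}) > 1)

# Hypothetical design: modules A and B with two leaves each, placed on two FPGAs.
sizes = {"a1": 200, "a2": 200, "b1": 300, "b2": 100}
modules = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
comm = {("a1", "a2"): 3, ("a2", "b1"): 2, ("b1", "b2"): 1}
part_of = {"a1": 0, "a2": 0, "b1": 1, "b2": 0}

print(edgecut(comm, part_of), filling(sizes, part_of, 500), splits(modules, part_of))
```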
Results 3/3
Future work
• Algorithm improvement
  • Balancing of the last partition
  • First-node selection policies
  • More refined "score" function for selecting nodes
    • Use closeness metrics
  • Comparisons with existing algorithms
• Expansion
  • SPartA framework development
  • Topology-aware partitioning
The end
ANY QUESTIONS?
POLITECNICO DI MILANO
Chimera: Multi-FPGA Architecture Definition
Matteo Murgida
matteo.murgida@dresd.org
Murgida - Outline
• Introduction
  • Problem description
  • Project goals
  • State of the art
• Project in detail
  • Contributions
  • Development
  • Results
  • Future work
• Demo
Problem Description
• Architectural description of a distributed FPGA environment
• Three-layer architecture
Project Goals
• Design the architecture of the most generic distributed system
  • Node definition
  • Interface definition
  • Communication channel definition
• Design a communication protocol
  • Essential protocol
  • Interrupt-based protocol
  • Timeout improvement
State of the Art
• CONFigurable ElecTronic TIssue (CONFETTI) by EPFL
  • Cellular-based architecture
  • PROs: high degree of parallelism, high computational power
  • CONs: no flexibility, oversized for small problems, small architectural customizations imply high cost/effort
• Splash 2 by IDA Supercomputing Center
  • Architecture composed of a Sun SPARCstation host, an interface board and "Splash Array" boards
  • PROs: again high parallelism and power
  • CONs: a central host coordinates the computational units, no fault tolerance, no flexibility
Contributions
The proposed architecture:
• Allows several Spartan-3 Starter Boards to communicate and exchange data
• Is portable to different FPGAs with minimum effort
• Is the basic infrastructure that will allow external partial dynamic reconfiguration
Board Study
• How to use resources like switches, LEDs and connectors on the board
• How to map an IP-Core port to a physical pin of the board
• Choice of the A2 Expansion Connector to connect two boards
MicroBlaze Communication
• Communication between two MicroBlaze soft processors
• Development of a display controller to visualize the data flow
GPIO Insertion
• Higher architecture portability through the use of the GPIO IP-Core
Interrupt Controller Insertion
• Communication protocol improved by interrupt handling, which prevents the processor from busy waiting
• An Interrupt Controller is included in the architecture to permit detection and handling of multiple interrupts
Timeout
• Malfunctions due to interference on the communication channel lead to deadlocks
• The communication protocol is not reliable at all
• Counter implementation, including the driver used by the processor to clear raised interrupts
• Development of a simple application to verify the correctness of the proposed approach
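The effect of the timeout can be illustrated with a small software simulation: a receiver that gives up after a bounded number of ticks instead of waiting forever on a corrupted channel. This is only a sketch of the idea. The design on the slides uses a counter and interrupts on the board, whereas this example polls a simulated channel for simplicity, and every name in it is hypothetical.

```python
class ReceiveTimeout(Exception):
    """Raised when no data arrives before the counter expires."""

def receive_with_timeout(poll, max_ticks):
    """Poll the channel once per tick; give up after `max_ticks` empty polls.

    `poll` is a hypothetical callable returning a received word,
    or None when the channel is idle.
    """
    for _ in range(max_ticks):
        word = poll()
        if word is not None:
            return word
    raise ReceiveTimeout(f"no data after {max_ticks} ticks")

def make_channel(words):
    """Simulated channel that yields the given sequence, then stays idle."""
    it = iter(words)
    return lambda: next(it, None)

# A healthy channel delivers its word on the third tick.
print(receive_with_timeout(make_channel([None, None, 0x2A]), max_ticks=5))

# A dead channel no longer blocks the receiver: the timeout fires instead.
try:
    receive_with_timeout(make_channel([]), max_ticks=3)
except ReceiveTimeout as exc:
    print("timeout:", exc)
```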
Results
• A short demo ...
Demo
Future work
• Development of a SystemC/VHDL Co-Simulation Framework
• Expert system integration
The end
ANY QUESTIONS?