Intelligent Design Space Exploration for High-Level and System Synthesis

AIDArc 2020

Antonino Tumeo

Outline

• Synthesis of accelerators for irregular applications and data analytics

• Speeding up design space exploration in high-level and system synthesis with bio-inspired heuristics

• Overview of SODALITE

• Opportunities for artificial intelligence in SODALITE

Irregular Applications Characteristics

• Unpredictable, fine-grained data accesses

• Poor locality

• Pointer or linked-list based data structures

• Graphs & sparse matrices, unbalanced trees, unstructured grids

• Difficult to partition in a balanced way

• Inherent parallelism (for each element)

• High synchronization intensity

• In general, memory-bound

• High memory parallelism, but many small memory operations in unrelated locations

• The key problem is actual bandwidth utilization

• Prototypical irregular kernels: graph algorithms

• Data analysts want not only to compute metrics on graphs but also, and foremost, to query graph databases (e.g., to find interesting patterns)

Application-specific Accelerators

• As Moore’s law slows down, application-specific accelerators appear to be the main approach to keep increasing efficiency

• At one end, a sea of application-specific accelerators

• At the other end, (re)emergence of (re)configurable designs

• FPGAs in the cloud, FPGAs for HPC

• Renewed interest in Coarse-Grained Reconfigurable Arrays (CGRAs)

• Reconfigurable architectures, and FPGAs in particular, may have a hard time reaching the peak flop rates of ASICs

• Can make it up in efficiency

• Key aspect (especially for irregular applications): enable exploration around the memory interface

High-Level Synthesis

• Tries to bridge the design gap of FPGA accelerators

• Generation of hardware description language designs starting from high-level program specifications

• Conventional High-Level Synthesis flows address:

• Dense, regular data structures

• Simple memory models

• Instruction-level parallelism

• Compute-bound kernels (Digital Signal Processing-like)

• Latest commercial tools based on OpenCL work well for regular, compute-bound workloads

• Significant limits for nested loops, no support for atomic memory operations

Our contributions

• We have developed a set of techniques to enable HLS of irregular applications

• Customizable architectural templates and related analysis and synthesis methodologies

• Implemented in an open-source HLS research framework, PandA Bambu, available at: https://panda.dei.polimi.it

Query example

Return the names of all persons owning at least two cars, of which at least one is an SUV
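A query of this kind can be read as nested traversals over graph edges, which is the sort of irregular, pointer-chasing code discussed in the earlier slides. The sketch below is illustrative only and is not the code from the talk: the triple layout, the predicate names (`owns`, `type`, `name`), and the sample data are all assumptions.

```python
# Hypothetical toy graph stored as (subject, predicate, object) triples.
TRIPLES = [
    ("p1", "name", "Alice"), ("p2", "name", "Bob"), ("p3", "name", "Carol"),
    ("p1", "owns", "c1"), ("p1", "owns", "c2"),
    ("p2", "owns", "c3"),
    ("p3", "owns", "c4"), ("p3", "owns", "c5"),
    ("c1", "type", "SUV"), ("c2", "type", "sedan"),
    ("c3", "type", "SUV"),
    ("c4", "type", "sedan"), ("c5", "type", "coupe"),
]


def build_index(triples):
    """Index triples as subject -> predicate -> list of objects."""
    index = {}
    for s, p, o in triples:
        index.setdefault(s, {}).setdefault(p, []).append(o)
    return index


def owners_with_two_cars_one_suv(triples):
    """Names of persons owning at least two cars, at least one of them an SUV."""
    index = build_index(triples)
    names = []
    for edges in index.values():
        cars = set(edges.get("owns", []))
        if len(cars) < 2:
            continue
        # Inner traversal: follow each "owns" edge and inspect the car's type.
        if any("SUV" in index.get(car, {}).get("type", []) for car in cars):
            names.extend(edges.get("name", []))
    return sorted(names)
```

Each iteration of the outer loop is independent of the others, and the inner lookups touch unrelated memory locations, which matches the characteristics listed for irregular applications.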

Source Code Example

Multithreaded architecture template

• Architectural templates expose a set of parameters (number of accelerators, memory channels, contexts) to explore

• Effectively synthesizes parallel loop iterations with atomic memory operations
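To make the idea of exploring template parameters concrete, here is a minimal sketch of an exhaustive sweep over the three parameters named above, followed by a Pareto filter on area and latency. The cost model in `toy_estimate` is entirely made up for illustration (it is not measured data and not the PandA Bambu estimator); the parameter ranges are assumptions as well.

```python
from itertools import product


def toy_estimate(accelerators, channels, contexts):
    """Made-up analytic model returning (area, latency); NOT measured data."""
    area = 100 * accelerators + 40 * channels + 10 * accelerators * contexts
    # Memory-bound toy behaviour: throughput is capped by the memory channels.
    throughput = min(accelerators * contexts, 4 * channels)
    latency = 10_000 / throughput
    return area, latency


def dominates(a, b):
    """True if a is no worse than b in every objective and better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """points maps a configuration to its objective tuple; keep the non-dominated ones."""
    return {cfg: obj for cfg, obj in points.items()
            if not any(dominates(other, obj) for other in points.values())}


def explore(acc_range=(1, 2, 4), chan_range=(1, 2, 4), ctx_range=(1, 2, 4, 8)):
    """Evaluate every (accelerators, channels, contexts) combination and filter."""
    points = {cfg: toy_estimate(*cfg)
              for cfg in product(acc_range, chan_range, ctx_range)}
    return pareto_front(points)
```

Even this three-parameter space has 36 points with the ranges above; a sweep like this stops being practical once the synthesis decisions themselves become part of the space.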

Design Space Exploration

Intelligent System Design

• The previous example has shown only the space of the parameters for the multithreaded architecture template

• In reality, High-Level or System Synthesis needs to solve various NP-complete problems

• High-Level Synthesis: Resource Allocation, Scheduling, Resource Binding, Interconnection…

• System Synthesis: HW/SW partitioning, Scheduling, Mapping, Communication orchestration…

• Brute-force methods require too much time

• The problems are also tightly correlated and can be solved in different orders

• Many Integer Linear Programming formulations

• Still too much time to converge

• Heuristic optimization algorithms

• Many bio-inspired (genetic algorithms, swarm optimization)

Genetic Algorithm for High-Level Synthesis

• Genetic Algorithms (GAs) enable exploring non-convex design spaces by evolving a population of solutions

• Mutation introduces local variations

• Crossover allows jumping across areas of the space and escaping local minima or maxima

• Selection of the fittest then guides the search along the most promising areas

• We apply GAs to the High-Level Synthesis process

• Each chromosome represents a full synthesis process

• By considering the full synthesis process, we can explore a much larger design space than by considering each synthesis “task” alone
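The three operators can be sketched in a few lines. This is a deliberately simplified, single-objective illustration and not the authors' implementation: the chromosome here encodes only a binding of operations to functional units (not a full synthesis process), and `toy_cost`, the latencies, and the unit areas are hypothetical.

```python
import random


def toy_cost(chromosome, op_latency, unit_area):
    """Made-up cost: latency of the busiest unit plus the area of the units in use."""
    load = {}
    for op, unit in enumerate(chromosome):
        load[unit] = load.get(unit, 0) + op_latency[op]
    return max(load.values()) + sum(unit_area[u] for u in load)


def evolve(op_latency, unit_area, pop_size=30, generations=50, p_mut=0.1, seed=0):
    """Return (best chromosome, its cost, best cost in the initial population)."""
    rng = random.Random(seed)
    n_ops, n_units = len(op_latency), len(unit_area)

    def cost(c):
        return toy_cost(c, op_latency, unit_area)

    population = [[rng.randrange(n_units) for _ in range(n_ops)]
                  for _ in range(pop_size)]
    initial_best = min(cost(c) for c in population)

    for _ in range(generations):
        def pick():
            # Selection: binary tournament between two random individuals.
            a, b = rng.sample(population, 2)
            return a if cost(a) <= cost(b) else b

        # Elitism: the best individual survives unchanged.
        children = [min(population, key=cost)]
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_ops)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: rebind each operation with probability p_mut.
            child = [rng.randrange(n_units) if rng.random() < p_mut else gene
                     for gene in child]
            children.append(child)
        population = children

    best = min(population, key=cost)
    return best, cost(best), initial_best
```

Because the best individual is always carried over, the best cost never gets worse from one generation to the next. A multi-objective variant would replace the scalar comparison in `pick` with a dominance-based ranking.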

NSGA-II for synthesis example

• Non-Dominated Sorting Genetic Algorithm II

• Chromosome encoding

• Binding of Operations to Functional Units

• Algorithms for Scheduling, Register Allocation, Interconnection

• Mutation and crossover

• Elitism preserves diversity in the population

• Crowded-comparison operator based on density estimation, which yields the crowding distance

• Selection: non-dominated rank and crowding distance

• Solutions are also ranked inside a non-dominated level: if they have the same rank, they belong to the same front and selection prefers the less crowded region

[C. Pilato, A. Tumeo, G. Palermo, F. Ferrandi, P. L. Lanzi, D. Sciuto: Improving evolutionary exploration to area-time optimization of FPGA designs. J. Syst. Archit. 54(11): 1046-1057 (2008)]
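The selection rule on this slide (rank first, then crowding distance) can be sketched as follows. This is a minimal illustration assuming all objectives are minimized; it peels fronts with a simple quadratic scan rather than the bookkeeping of the original fast non-dominated sort, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(objs):
    """Split indices of objs into fronts; front 0 holds the non-dominated points."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def crowding_distance(objs, front):
    """Density estimate: larger values mean a less crowded neighbourhood."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        # Boundary points are always kept.
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][m] - objs[prev][m]) / (hi - lo)
    return dist


def select(objs, k):
    """Pick k indices: lower front first, then larger crowding distance."""
    chosen = []
    for front in non_dominated_fronts(objs):
        dist = crowding_distance(objs, front)
        chosen += sorted(front, key=lambda i: -dist[i])
        if len(chosen) >= k:
            break
    return chosen[:k]
```

For example, with the (area, time) pairs `[(1, 5), (2, 3), (3, 2.5), (5, 1), (4, 4), (6, 6)]`, the first four points form front 0, and `select(..., 3)` keeps the two extremes plus the interior point with the larger crowding distance.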

Ant Colony Optimization for Scheduling and Mapping

• Ant Colony Optimization: multi-agent optimization heuristic

• Ants randomly explore different paths to the food.

• At each decision point:

• They deposit pheromone in inverse proportion to the length of the path (shorter paths receive more), which encourages other ants to follow the same trail. Pheromone also evaporates with time.
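A minimal sketch of the mechanism, applied to a toy mapping problem: each ant assigns every task to a processor, choosing with probability proportional to the pheromone on that (task, processor) pair; pheromone then evaporates and each ant deposits an amount inversely proportional to the cost of its solution, as in the standard Ant System formulation. This is not the algorithm from the cited paper: it ignores task dependencies, communications, and heuristic information, and `makespan`, the parameters, and the data layout are assumptions.

```python
import random


def makespan(assignment, exec_time):
    """Finish time of the busiest processor; exec_time[t][p] is task t on processor p."""
    load = {}
    for task, proc in enumerate(assignment):
        load[proc] = load.get(proc, 0) + exec_time[task][proc]
    return max(load.values())


def ant_colony_mapping(exec_time, ants=20, iterations=40, evaporation=0.3, seed=0):
    """Return (best assignment found, its makespan)."""
    rng = random.Random(seed)
    n_tasks, n_procs = len(exec_time), len(exec_time[0])
    pheromone = [[1.0] * n_procs for _ in range(n_tasks)]
    best, best_cost = None, float("inf")

    for _ in range(iterations):
        trails = []
        for _ in range(ants):
            # Decision point: pick a processor with probability proportional to pheromone.
            assignment = [rng.choices(range(n_procs), weights=pheromone[t])[0]
                          for t in range(n_tasks)]
            cost = makespan(assignment, exec_time)
            trails.append((assignment, cost))
            if cost < best_cost:
                best, best_cost = assignment, cost

        # Evaporation: old trails fade.
        pheromone = [[(1 - evaporation) * tau for tau in row] for row in pheromone]
        # Deposit: cheaper solutions leave more pheromone.
        for assignment, cost in trails:
            for task, proc in enumerate(assignment):
                pheromone[task][proc] += 1.0 / cost

    return best, best_cost
```

Evaporation keeps early, possibly poor, choices from dominating the search, while the deposit step biases later ants toward assignments that appeared in good solutions.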

Scheduling and Mapping Example

[F. Ferrandi, P. L. Lanzi, C. Pilato, D. Sciuto, A. Tumeo: Ant Colony Heuristic for Mapping and Scheduling Tasks and Communications on Heterogeneous Embedded Systems. IEEE Trans. on CAD of Integrated Circuits and Systems 29(6): 911-924 (2010)]

Bayesian Optimization for Mapping Pipelined Applications

• The Bayesian Optimization Algorithm (BOA) is a Probabilistic Model Building Genetic Algorithm (PMBGA)

• Mutation and crossover operators are replaced by the construction and the sampling of a Bayesian network

• Through the Bayesian network, it can find underlying sub-structures of some complex problems

• We apply BOA for the mapping of pipelined applications on a heterogeneous platform
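The build-a-model-then-sample loop shared by all PMBGAs can be illustrated with the simplest member of the family, the Univariate Marginal Distribution Algorithm (UMDA). Note the swap: UMDA models every decision independently, whereas BOA learns a Bayesian network that captures dependencies between decisions; that network-learning step is omitted here. The `bottleneck` cost, the stage times, and all parameter values are hypothetical.

```python
import random


def bottleneck(mapping, stage_time):
    """Toy pipeline cost: load of the busiest processor; stage_time[s][p] is stage s on p."""
    load = {}
    for stage, proc in enumerate(mapping):
        load[proc] = load.get(proc, 0) + stage_time[stage][proc]
    return max(load.values())


def umda_mapping(cost, n_tasks, n_procs, pop_size=40, keep=10, generations=30, seed=0):
    """Return (best mapping found, its cost) using independent per-task marginals."""
    rng = random.Random(seed)
    # probs[t][p]: current probability of mapping task t to processor p.
    probs = [[1.0 / n_procs] * n_procs for _ in range(n_tasks)]
    best, best_cost = None, float("inf")

    for _ in range(generations):
        # Sampling replaces crossover and mutation.
        population = [[rng.choices(range(n_procs), weights=probs[t])[0]
                       for t in range(n_tasks)]
                      for _ in range(pop_size)]
        population.sort(key=cost)
        if cost(population[0]) < best_cost:
            best, best_cost = population[0], cost(population[0])

        # Model building: re-estimate the marginals from the best individuals,
        # with add-one smoothing so that no choice reaches probability zero.
        elite = population[:keep]
        for t in range(n_tasks):
            counts = [1.0] * n_procs
            for individual in elite:
                counts[individual[t]] += 1
            total = sum(counts)
            probs[t] = [c / total for c in counts]

    return best, best_cost
```

A call such as `umda_mapping(lambda m: bottleneck(m, stage_time), n_tasks=4, n_procs=2)` runs the loop on a four-stage pipeline. Independent marginals cannot express "stage A and stage B should share a processor", which is the kind of sub-structure the Bayesian network in BOA is meant to capture.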

BOA example

[A. Tumeo, M. Branca, L. Camerini, C. Pilato, P. L. Lanzi, F. Ferrandi, D. Sciuto: Mapping pipelined applications onto heterogeneous embedded systems: a bayesian optimization algorithm based approach. CODES+ISSS 2009: 443-452]

BOA Results

• Applied to more complex task graphs (B, C, D)

• Compared to multiobjective Simulated Annealing (MSA), Tabu Search (TSA), Genetic Algorithm (GA)

• Also a hybrid formulation where each offspring generation of BOA is followed by several iterations of SA

• Reports execution latency in clock cycles, Relative Standard Deviation, and execution time of the optimization algorithm

Multi-objective Synthesis for Real-Time Systems

• Consider a real-time application with hard and soft deadlines, described by a task graph

• We are given a set of resources that could be composed together to form a system

• Processors, accelerators, memories, communication elements (buses or point-to-point channels)

• We want to obtain the system that minimizes area, is feasible (no violations of hard deadlines), minimizes buffer/memory size, and minimizes violations of soft deadlines

Multi-objective Synthesis for Real-Time Systems

[Figures: overall flow; conversion to multi-rate task graph]

Multi-objective Synthesis for Real-Time Systems

• Problem formulation

• Resource library, communication paths, mapping, scheduling

• Optimization algorithms evaluated:

• Multiobjective Simulated Annealing (SA)

• Multiobjective Tabu Search (TS)

• Niched Pareto Genetic Algorithm II (GA)

• On average, the GA is more robust and able to cover more non-dominated solutions in highly constrained problems

• The TS performs worse than the SA with a high number of evaluations, but is comparable or better with few evaluations

• SA obtains valuable results on problems with higher degrees of freedom

[M. Ceriani, F. Ferrandi, P. L. Lanzi, D. Sciuto, A. Tumeo: Multiprocessor systems-on-chip synthesis using multi-objective evolutionary computation. GECCO 2010: 1267-1274]

SODALITE: Software Defined Accelerators from Machine Learning Tools Environment

• SODALITE is PNNL’s project in the DARPA RTML (Real Time Machine Learning) program

• 3 years, 2 phases of 1.5 years each

• Coordinated with a parallel NSF program

• DARPA RTML looks at the development of a compiler that can generate Verilog designs starting from high-level machine learning frameworks (e.g., PyTorch, TensorFlow, MXNet, CNTK, …)

• The designs will then be fabricated in chiplets

SODALITE overview

• Distill promising network architectures from the suggested application area

• High-Bandwidth Imaging

• Driver to enable an agile codesign approach and the identification of architectural templates, but the objective is generality of the synthesizer

• Synthesizer frontend lowers a High-Level Intermediate Representation (HLIR) to a Low-Level IR (LLIR)

• Initially exploit ONNX to lower to a common HLIR

• Explore opportunities to employ MLIR as HLIR

• LLIR: LLVM IR

• Synthesizer middle end performs the actual synthesis

• New dataflow template-based synthesis

• Classical high-level synthesis path

• Design Space Exploration engine plugs into the middle end

• Heuristic optimization algorithms, including bio-inspired

• Closed loop with chip design and evaluation

• Provides constant feedback for synthesizer development

Artificial Intelligence in SODALITE

• SODALITE is a new-generation synthesizer

• As in the examples for high-level and system synthesis, we will use optimization algorithms to explore a multidimensional design space

• Performance, power, accuracy, heat…

• A synthesizer is a compiler

• A large number of compiler optimizations can significantly influence the Verilog generation process

• Not only optimizations, but also the ability to understand computational patterns and reuse

• Patterns may not be the conventional ones

• We also need estimation methods to assess the quality of the results

• ASIC vs FPGA interconnect

• Estimators for FPGAs mostly rely on linear regression over results from the synthesizers; can we do better for ASICs?
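As a point of reference for the linear-regression style of estimator mentioned above, the sketch below fits a one-variable least-squares line that could, for instance, predict an area figure from an operation count. The choice of feature, the function names, and any numbers fed to it are assumptions for illustration; real estimators use more features and real synthesis reports.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate a fitted (slope, intercept) model at x."""
    slope, intercept = model
    return slope * x + intercept
```

A model of this kind is cheap to evaluate inside a design space exploration loop, which is why it is attractive; its weakness is that effects such as interconnect congestion are not linear in simple design features.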

Conclusion

• Synthesis techniques for graph analytics and large design spaces

• Overview of heuristic optimization methods for high-level and system synthesis

• Overview of SODALITE

• Possible directions for SODALITE design space exploration

• Looking to create an open-source ecosystem for synthesis and system-level design space exploration

Thank you!

• Thank you to my past and present collaborators

• Thanks to the SODALITE and SO(DA)2 team

• Vinay Amatya, Vito Giovanni Castellana, Joseph Manzano, Marco Minutoli, Cheng Tan

• Questions?

• [email protected]
