Implementation of Industrial FPGA Synthesis Flow Revisited
Alan Mishchenko, UC Berkeley

Page 1: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

Implementation of Industrial FPGA Synthesis Flow Revisited

Alan Mishchenko
UC Berkeley

Page 2: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

Overview

• Introduction
  • Motivation
  • Structure of FPGA synthesis flow
  • Overview of the previous system
• Lessons learned while developing new system
  • Verilog parsing
  • Design representation
  • Netlist data structure
  • Integration of application packages
  • Customization
• Experimental results
• Future work

Page 3: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

Motivation

• ABC is a logic synthesis and verification tool developed at Berkeley (http://www.bvsrc.org/)
• ABC has been in the public domain since 2005, but it does not meet all of the industrial requirements
• A new system is needed to fill the gap
• Magic was an industrial version of ABC developed in 2010 and used by several companies
• A new system to enhance ABC and replace Magic is being developed at this time
• This presentation shares this experience

Page 4: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

What Is Missing in ABC?

The baseline version of ABC is not applicable to industrial designs because it does not support:
• Complex flops
• Multiple clock domains
• Special objects (adders, RAMs, DSPs, etc.)
• Standard-cell libraries

Page 5: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

FPGA Synthesis Flow

Flow diagram, stages in order:
1. Inputting the design
2. Sequential synthesis
3. Comb synthesis with choices
4. Retiming and resynthesis
5. Tech mapping
6. Outputting the design

Verification (label placed alongside the flow stages)

Page 6: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

Magic: Synthesis Flow Based on ABC

Block diagram:
• Design database
• File / Code interface (Verilog, EDIF, BLIF; programmable APIs)
• Application packages: sequential synthesis, AIG rewriting, computing choices, tech mapping, retiming, structuring for delay, post-place resynthesis, verification

A. Mishchenko, N. Een, R. K. Brayton, S. Jang, M. Ciesielski, and T. Daniel, "Magic: An industrial-strength logic optimization, technology mapping, and formal verification tool". Proc. IWLS'10.

Page 7: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

Case Study 1: Combinational Synthesis with Structural Choices

[Figure: traditional synthesis (D1, D2, D3, D4) compared with synthesis with choices (D1, D2, D3, D4 and the HAIG)]

• Perform synthesis and keep track of changes
  • Iterate fast local AIG rewriting with a global view (via hash table)
  • Collect AIG snapshots and prove equivalences across them
  • Use equivalences (choices) during technology mapping
• Observations
  • Leads to improved QoR after technology mapping
  • Successfully applied to 1M gate designs

Page 8: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

Case Study 2: Sequential Verification

• Property checking
  • Takes design and property and makes a miter (AIG)
• Equivalence checking
  • Takes two designs and makes a miter (AIG)
• The goal is to transform AIG until the output can be proved const 0
• Equivalence checking in Magic is based on the model checker that won Hardware Model Checking Competition in 2008, 2010, 2011
  • http://fmv.jku.at/hwmcc11/results.html

[Figure: equivalence-checking miter over designs D1 and D2, output 0; property-checking miter over design D1 and property p, output 0]
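The miter idea can be illustrated with a small combinational sketch in Python. This is not how Magic proves equivalence (the slide refers to a sequential model checker); exhaustive simulation stands in for the proof engine here, and the names `fadd_opt`, `fadd_bug`, `make_miter`, and `find_counterexample` are hypothetical. The reference design is the full adder from the Verilog example in this talk.

```python
from itertools import product

def fadd_ref(ci, a, b):
    """Full adder as in the fadd example: returns (s, co)."""
    return ci ^ a ^ b, (ci & a) | (ci & b) | (a & b)

def fadd_opt(ci, a, b):
    """Hypothetical restructured version that shares the a ^ b term."""
    p = a ^ b
    return p ^ ci, (a & b) | (ci & p)

def fadd_bug(ci, a, b):
    """Deliberately wrong carry (the ci & b term is missing)."""
    return ci ^ a ^ b, (ci & a) | (a & b)

def make_miter(design1, design2):
    """Miter output is 1 if any pair of corresponding outputs differs."""
    def miter(*inputs):
        outs1, outs2 = design1(*inputs), design2(*inputs)
        return int(any(x != y for x, y in zip(outs1, outs2)))
    return miter

def find_counterexample(miter, num_inputs):
    """Return an input pattern that drives the miter to 1, or None if it is const 0."""
    for bits in product((0, 1), repeat=num_inputs):
        if miter(*bits):
            return bits
    return None

cex_equivalent = find_counterexample(make_miter(fadd_ref, fadd_opt), 3)
cex_buggy = find_counterexample(make_miter(fadd_ref, fadd_bug), 3)
print(cex_equivalent, cex_buggy)
```

A result of `None` means the miter is constant 0 and the designs agree on every input; any returned pattern is a counterexample given in the order (ci, a, b).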

Page 9: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

AIG: A Unifying Representation

• An underlying data structure for various computations
  • Representing both local and global functions
  • Used in rewriting, resubstitution, simulation, SAT sweeping, induction, etc.
• A unifying representation for the whole flow
  • Synthesis, mapping, verification pass around AIGs
  • Stored multiple structures for mapping (‘AIG with choices’)
• The main functional representation in ABC
  • Foundation of ‘contemporary’ logic synthesis
  • Source of ‘signature features’ (speed, scalability, etc.)

Page 10: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

AIG: Definition and Examples

Karnaugh map of F (the same map is shown for both AIG structures):

cd\ab   00   01   11   10
00       0    0    1    0
01       0    0    1    1
11       0    1    1    0
10       0    0    1    0

F(a,b,c,d) = ab + d(ac' + bc)
AIG with 6 nodes, 4 levels

F(a,b,c,d) = ac'(b'd')' + bc(a'd')' = ac'(b+d) + bc(a+d)
AIG with 7 nodes, 3 levels

AIG is a Boolean network composed of two-input ANDs and inverters
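The definition can be made concrete with a minimal Python sketch of an AIG with structural hashing. The class `Aig` and its literal encoding (2*node + complement bit) are assumptions chosen for illustration, not ABC's actual data structure. The sketch builds the two forms of F from this slide, ab + d(ac' + bc) and ac'(b+d) + bc(a+d), then reports their AND-node counts and depths and checks that they compute the same function.

```python
from itertools import product

class Aig:
    """Minimal AIG sketch. A literal is 2*node + c, where c = 1 means inverted.
    Node 0 is constant false, so literal 0 is false and literal 1 is true."""

    def __init__(self, input_names):
        self.fanins = [None]   # per node: None, or a pair of fanin literals
        self.level = [0]       # per node: depth counted in AND gates
        self.strash = {}       # structural hash table: (lit, lit) -> literal
        self.inputs = {name: self._new_node(None, 0) for name in input_names}

    def _new_node(self, fanins, level):
        self.fanins.append(fanins)
        self.level.append(level)
        return 2 * (len(self.fanins) - 1)

    def and_(self, x, y):
        if x > y:
            x, y = y, x
        if x == 0 or x == (y ^ 1):   # 0 & y = 0 and x & !x = 0
            return 0
        if x == 1 or x == y:         # 1 & y = y and x & x = x
            return y
        if (x, y) not in self.strash:
            lvl = 1 + max(self.level[x >> 1], self.level[y >> 1])
            self.strash[(x, y)] = self._new_node((x, y), lvl)
        return self.strash[(x, y)]

    def or_(self, x, y):             # De Morgan: x | y = !(!x & !y)
        return self.and_(x ^ 1, y ^ 1) ^ 1

    def num_ands(self):
        return len(self.strash)

    def depth(self, lit):
        return self.level[lit >> 1]

    def evaluate(self, lit, assignment):
        value = [False] * len(self.fanins)
        for name, in_lit in self.inputs.items():
            value[in_lit >> 1] = bool(assignment[name])
        for node, pair in enumerate(self.fanins):   # nodes are in topological order
            if pair is not None:
                value[node] = all(value[f >> 1] != bool(f & 1) for f in pair)
        return value[lit >> 1] != bool(lit & 1)

def build_form1():
    """F = ab + d(ac' + bc)"""
    g = Aig("abcd")
    a, b, c, d = (g.inputs[n] for n in "abcd")
    inner = g.or_(g.and_(a, c ^ 1), g.and_(b, c))
    return g, g.or_(g.and_(a, b), g.and_(d, inner))

def build_form2():
    """F = ac'(b'd')' + bc(a'd')'"""
    g = Aig("abcd")
    a, b, c, d = (g.inputs[n] for n in "abcd")
    t1 = g.and_(g.and_(a, c ^ 1), g.and_(b ^ 1, d ^ 1) ^ 1)
    t2 = g.and_(g.and_(b, c), g.and_(a ^ 1, d ^ 1) ^ 1)
    return g, g.or_(t1, t2)

g1, f1 = build_form1()
g2, f2 = build_form2()
print(g1.num_ands(), g1.depth(f1), g2.num_ands(), g2.depth(f2))
same = all(
    g1.evaluate(f1, dict(zip("abcd", bits))) == g2.evaluate(f2, dict(zip("abcd", bits)))
    for bits in product((0, 1), repeat=4)
)
print(same)
```

The hash table in `and_` is what gives local rewriting a "global view": a request for an AND that already exists returns the existing node instead of creating a duplicate.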

Page 11: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

Historical Perspective

[Chart: design size (gate count; axis marks 10, 100, 10,000, 1,000,000) versus time (years; axis marks 1950-1970, 1980, 1990, 2000, 2010). Representations shown: truth tables, sum-of-products, conjunctive normal forms, Binary Decision Diagrams, And-Inverter Graphs. Tools shown: Espresso, MIS, SIS; SIS, VIS, MVSIS; ABC, Magic.]

Page 12: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

Magic 2: Lessons Learned

(1) Verilog parsing
  • Limit Verilog to a structural subset
(2) Design representation
  • Represent only relevant data and hide useless details
(3) Netlist data structure
  • Use simple, compact netlist data structure
(4) Integration of application packages
  • Make packages independent of the netlist and interface them using AIGs
(5) Customization
  • Make the system user-independent

Page 13: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

(1) Verilog Parsing

• Verilog parsing is believed to be a difficult problem, and companies (e.g. Verific) offer industry-standard solutions
• However, several simplifying assumptions can make Verilog parsing a 1-person 1-month project:
  • Consider only structural Verilog
  • Read the file into memory and parse it in memory
  • Remove preprocessor definitions, comments, line endings, etc.
  • Split into statements separated by semicolons (;)
  • Parse in two passes: first, statements for module interfaces
    • module/endmodule, input/output/inout, etc.
  • Second, parse remaining statements, including instance definitions
  • Connect all constructed objects using net/pin names
  • Check the correctness of the connectivity info
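The steps listed above can be sketched in a few dozen lines of Python. This is an illustrative toy, not the parser described in the talk: the function names (`split_statements`, `parse_structural`, `check_connectivity`) and the dictionary layout are assumptions, bus widths are dropped, and only named port connections are handled.

```python
import re

def split_statements(text):
    """Work on the whole file in memory: strip comments and preprocessor
    lines, then split into statements at semicolons."""
    text = re.sub(r"/\*.*?\*/", " ", text, flags=re.S)     # block comments
    text = re.sub(r"//[^\n]*", " ", text)                  # line comments
    text = re.sub(r"^\s*`[^\n]*", " ", text, flags=re.M)   # `define, `timescale, ...
    text = re.sub(r"\bendmodule\b", "endmodule;", text)    # endmodule carries no ';'
    return [" ".join(s.split()) for s in text.split(";") if s.strip()]

def parse_structural(text):
    stmts = split_statements(text)
    modules, cur = {}, None
    # Pass 1: module interfaces only (module, input/output/inout).
    for s in stmts:
        key = re.match(r"\w*", s).group()
        if key == "module":
            cur = modules[re.match(r"module\s+(\w+)", s).group(1)] = {"ports": {}, "insts": []}
        elif key in ("input", "output", "inout"):
            names = re.findall(r"[A-Za-z_]\w*", re.sub(r"\[[^\]]*\]", " ", s))
            for name in names[1:]:                         # names[0] is the keyword
                cur["ports"][name] = key
    # Pass 2: instances; every module name is known by now, so an instance
    # may refer to a module defined later in the file.
    for s in stmts:
        key = re.match(r"\w*", s).group()
        if key == "module":
            cur = modules[re.match(r"module\s+(\w+)", s).group(1)]
        elif key in modules:
            m = re.match(r"\w+\s+(\w+)\s*\((.*)\)$", s)
            pins = dict(re.findall(r"\.(\w+)\s*\(\s*([^()]*?)\s*\)", m.group(2)))
            cur["insts"].append({"module": key, "name": m.group(1), "pins": pins})
    return modules

def check_connectivity(modules):
    """Report pins that do not exist on the instantiated module."""
    errors = []
    for mod_name, mod in modules.items():
        for inst in mod["insts"]:
            ports = modules[inst["module"]]["ports"]
            for pin in inst["pins"]:
                if pin not in ports:
                    errors.append(f"{mod_name}.{inst['name']}: no pin '{pin}' on {inst['module']}")
    return errors

SOURCE = """
// two-bit adder built from full adders
module add2( A, B, S, CO );
  input [1:0] A, B;
  output [1:0] S;
  output CO;
  wire n1;
  fadd inst1 (.ci(1'b0), .a(A[0]), .b(B[0]), .s(S[0]), .co(n1) );
  fadd inst2 (.ci(n1), .a(A[1]), .b(B[1]), .s(S[1]), .co(CO) );
endmodule
module fadd( ci, a, b, s, co );
  input ci, a, b;
  output s, co;
  assign s = ci ^ a ^ b;
  assign co = (ci & a) | (ci & b) | (a & b);
endmodule
"""

design = parse_structural(SOURCE)
print(sorted(design), check_connectivity(design))
```

The two passes matter in this input because `add2` instantiates `fadd` before `fadd` is defined; collecting interfaces first lets the second pass resolve both instances.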

Page 14: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

Example

    module add2( A, B, S, CO );
      input  [1:0] A, B;
      output [1:0] S;
      output       CO;
      wire         n1;
      fadd inst1 ( .ci(1'b0), .a(A[0]), .b(B[0]), .s(S[0]), .co(n1) );
      fadd inst2 ( .ci(n1),   .a(A[1]), .b(B[1]), .s(S[1]), .co(CO) );
    endmodule

    module fadd( ci, a, b, s, co );
      input  ci, a, b;
      output s, co;
      assign s  = ci ^ a ^ b;
      assign co = (ci & a) | (ci & b) | (a & b);
    endmodule

Page 15: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

(2) Design Representation

• Structural information
  • Inputs, outputs, wires, internal objects, etc.
  • Hierarchy (to be flattened, to be kept, library cells, etc.)
• Functional information
  • Combinational: gates, LUTs
  • Sequential: flip-flops, clocks
• Additional structural information
  • White/black/grey boxes: RAM, DSP, regfiles, etc.
  • Multiple clock domains, clock network
  • Tri-states, in-outs, etc.

Page 16: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

Handling Design Representation

• Design representation should be comprehensive (represent complete information) but flexible (work only on what is necessary at each time)
• Examples:
  • to flatten hierarchy, only structural info is needed
  • to perform comb synthesis, only comb logic is needed
• In both cases, it should be possible to access and modify each type of information without changing other types

Page 17: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

(3) Netlist Data Structure

• Should be very simple and easy to construct
  • Objects use as little memory as possible
    • Currently, 4-LUT uses 28 bytes + memory for attributes
  • Object attributes are added/removed on demand
    • For example, no need for fanout information in most cases
  • Objects ordered in memory in a topological order
    • Improves runtime of iterative traversals
    • Makes the code much simpler
• Limitation
  • Each time the netlist is modified, it needs to be duplicated
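A rough Python sketch of these properties follows. The layout is an assumption for illustration and does not reproduce the 28-byte object of the actual system: objects are integer IDs stored in flat arrays in topological order, fanout is an optional attribute computed only when requested, and a modification (here, dropping dangling logic) is expressed as building a duplicate.

```python
from array import array

class Netlist:
    """Objects are integer IDs: primary inputs first, then LUTs in
    topological order. Per-object data lives in flat, packed arrays."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.first_fanin = array("I", [0] * (num_inputs + 1))  # offsets into fanins
        self.fanins = array("I")                               # all fanin IDs, packed
        self.truth = array("I", [0] * num_inputs)              # LUT truth tables
        self.attrs = {}                                        # optional, built on demand

    def num_objs(self):
        return len(self.truth)

    def add_lut(self, fanins, truth):
        obj = self.num_objs()
        assert all(f < obj for f in fanins), "fanins must precede the object"
        self.fanins.extend(fanins)
        self.first_fanin.append(len(self.fanins))
        self.truth.append(truth)
        self.attrs.clear()               # derived attributes are now stale
        return obj

    def fanins_of(self, obj):
        return self.fanins[self.first_fanin[obj]:self.first_fanin[obj + 1]]

    def levels(self):
        """One forward sweep suffices because IDs are topologically ordered."""
        lev = [0] * self.num_objs()
        for obj in range(self.num_inputs, self.num_objs()):
            lev[obj] = 1 + max((lev[f] for f in self.fanins_of(obj)), default=0)
        return lev

    def fanouts(self):
        """Fanout lists are not stored; they are derived only when asked for."""
        if "fanouts" not in self.attrs:
            fo = [[] for _ in range(self.num_objs())]
            for obj in range(self.num_inputs, self.num_objs()):
                for f in self.fanins_of(obj):
                    fo[f].append(obj)
            self.attrs["fanouts"] = fo
        return self.attrs["fanouts"]

    def duplicate(self, outputs):
        """Copy only the logic reachable from `outputs`; return (copy, new output IDs)."""
        n = self.num_objs()
        used = [False] * n
        for o in outputs:
            used[o] = True
        for obj in range(n - 1, self.num_inputs - 1, -1):      # reverse topological sweep
            if used[obj]:
                for f in self.fanins_of(obj):
                    used[f] = True
        new = Netlist(self.num_inputs)
        remap = list(range(n))                                  # inputs keep their IDs
        for obj in range(self.num_inputs, n):
            if used[obj]:
                remap[obj] = new.add_lut([remap[f] for f in self.fanins_of(obj)], self.truth[obj])
        return new, [remap[o] for o in outputs]

net = Netlist(3)
n3 = net.add_lut([0, 1], 0b1000)     # AND of inputs 0 and 1
n4 = net.add_lut([1, 2], 0b0110)     # XOR, left dangling on purpose
n5 = net.add_lut([n3, 2], 0b1110)    # OR
clean, outs = net.duplicate([n5])
print(net.levels(), clean.num_objs(), outs)
```

Because the copy is rebuilt in one forward sweep, it is again topologically ordered, which is what keeps the traversal code simple at the cost of duplicating on every change.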

Page 18: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

(4) Integration of Application Packages

• Application packages interact with design database
• Logic information is extracted and inserted in the form of AIGs
• Synthesis & verification are performed by ABC working on these AIGs

[Block diagram: design database; file / code interface; application packages (sequential synthesis, AIG rewriting, computing choices, tech mapping, retiming, structuring for delay, post-place resynthesis, verification)]

Page 19: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

(5) Customization

• The system should be easily customizable
  • The source code is the same for all users
  • Configuration files differ
• Currently, the user “owns” the following:
  • The library of primitives (a Verilog file)
  • Timing info for primitives (e.g. LUT pin delays)
  • Timing models used for calculating data for boxes, complex flops, wires, etc.

Page 20: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

Experimental Setup

• Integrated Magic into an industrial FPGA synthesis flow
• Experimented with the full flow, including P&R
  • Did not use retiming
  • Did not use post-placement re-synthesis
• Verified by running Magic and in-house simulation tools
• Experimented with 20 designs, from 175K to 648K LUT4
• Two experimental runs:
  • “Reference” stands for the typical industrial flow without Magic
  • “Magic” stands for the new flow with Magic

[Flow diagram]
• Frontend: design entry, high-level synthesis, quick mapping
• Magic: seq and comb synthesis, mapping, legalization
• Backend: placement, routing, design rule checking, etc.

Page 21: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

Experimental Results

Profile            |  Reference                           |  Magic
Circuit   PI   PO  |     LUT      FF    Lev    fMAX   Time  |     LUT      FF    Lev    fMAX   Time
C1       736  369  |  174972  113157     12  128.53   1.05  |  173561  100398     10  133.87   0.70
C2       150   67  |  187037  112991     18   91.32   0.53  |  161303   93930     16   95.69   0.67
C3         4   80  |  199097   53954     27   68.49   0.69  |  137126   36190     20   75.59   0.77
C4       517  253  |  206725  132416     11  105.37   1.31  |  197029  114745      8  129.20   0.67
C5         4  280  |  212124   64120     26   68.82   0.65  |  152799   49513     19   77.70   0.74
C6       803  258  |  255415  166644     11  113.25   2.08  |  255026  148445      8  123.00   1.00
C7        24   10  |  296152  133704     17   89.93   0.72  |  246908  114002     14  120.48   0.90
C8       124   58  |  323818   86712     32   40.68   1.99  |  346516   86662     25   47.08   1.94
C9       268  132  |  413017  195150     18   81.50   1.40  |  375481  174306     15   79.81   1.61
C10      205   94  |  439963  134139     20   63.17   3.55  |  445950  133575     15   69.06   2.64
C11      148  456  |  455429  160450     96   27.53   2.23  |  398428  149126     56   33.11   1.90
C12        4    3  |  455630   20277      6   66.67   0.78  |  152414   19446      6  100.40   0.41
C13        4  240  |  470436  230811     28   53.59   3.30  |  462010  225676     18   57.34   6.18
C14      218   69  |  522988  311436     17   68.78   1.83  |  448426  257996     15   69.40   2.19
C15      377  183  |  575355  351911     10  136.05   2.59  |  575672  349715      8  136.99   2.95
C16       73   33  |  599413  216051      4  202.02   1.07  |  599413  216051      4  209.21   1.79
C17      136   66  |  618377  259844     56   47.66   2.75  |  562367  243084     34   53.53   2.61
C18      136   66  |  621875  249327     27   45.68   4.60  |  606135  247825     27   52.58   4.03
C19      146  391  |  630918  275871     55   46.36   2.50  |  572834  259336     36   50.76   2.51
C20      135   32  |  648849  353940      7  127.71   2.45  |  645501  353616      5  136.43   2.91
Geomean            |  377883  150015  18.54  74.768  1.591  |  329751  135972  14.40  83.572  1.541
Ratio              |       1       1      1       1      1  |   0.873   0.906  0.777   1.118  0.969
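The Geomean and Ratio rows can be recomputed from the per-circuit columns. A short Python sketch for the Lev column is given here, assuming the Ratio row is the Magic geomean divided by the Reference geomean, which is consistent with the reported numbers (for example 14.40 / 18.54 ≈ 0.777). The helper name `geomean` is hypothetical.

```python
import math

def geomean(values):
    """Geometric mean, computed through logarithms to avoid huge products."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Lev columns for C1..C20, copied from the results table.
lev_reference = [12, 18, 27, 11, 26, 11, 17, 32, 18, 20,
                 96, 6, 28, 17, 10, 4, 56, 27, 55, 7]
lev_magic = [10, 16, 20, 8, 19, 8, 14, 25, 15, 15,
             56, 6, 18, 15, 8, 4, 34, 27, 36, 5]

g_ref = geomean(lev_reference)
g_magic = geomean(lev_magic)
ratio = g_magic / g_ref
print(f"Reference {g_ref:.2f}  Magic {g_magic:.2f}  Ratio {ratio:.3f}")
```

The same function applies unchanged to the LUT, FF, fMAX, and Time columns; a ratio below 1 is an improvement for every column except fMAX, where higher is better.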

Page 22: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

Cumulative Improvement (retiming excluded)

[Figure: cumulative improvement chart]

Page 23: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

Future Work

• Improve the integration
  • Simpler interfaces, better data consistency checking, etc.
• Improve application packages
  • AIG rewriting, tech-mapping, sequential synthesis, etc.
• Integrate logic and physical synthesis
  • Synthesis/mapping/retiming before placement
  • Retiming/restructuring after placement
• Extend to work for various technologies
  • Standard cells
  • Macro cells
  • LUT structures
  • LUT/MUX structures

Page 24: 1 Alan Mishchenko UC Berkeley Implementation of Industrial FPGA Synthesis Flow Revisited

Abstract

This talk is inspired by the recent experiences gained while developing an industrial-strength system for FPGA synthesis and mapping. First, we review the design representation with "industrial stuff", such as black and white boxes, complex flops, multiple clock domains, tristates, inouts, etc., and how to handle them in the tool whose primary strength is applying combinational synthesis and mapping. Next, we discuss several ideas for implementing a custom Verilog parser for hierarchical designs. Finally, we propose a low-memory netlist representation used to store the data and interface various optimization engines.