Page 1

Enumeration of Irredundant Circuit Structures

Alan Mishchenko
Department of EECS
UC Berkeley

Page 2

Overview
Logic synthesis is an important and challenging task
  Boolean decomposition is a way to do logic synthesis
    Several algorithms - many heuristics
Drawbacks
  Incomplete algorithms - suboptimal results
  Computationally expensive algorithms - high runtime
Our goal is to overcome these drawbacks
  Perform exhaustive enumeration offline
  Use pre-computed results online, to get good quality of results (QoR) and low runtime
Practical discoveries
  The number of unique functions up to 16 inputs is not too high
  The number of unique decompositions of a function is not too high

Page 3

Small Practical Functions
Classifications of Boolean functions
  Random functions
  Special function classes
    Symmetric
    Unate
    etc.
Logic synthesis and technology mapping deal with
  Functions appearing in the designs
  Functions with small support (up to 16 variables)
These functions are called small practical functions (SPFs)
We will concentrate on SPFs and study their properties
In particular, we will ask
  How many different SPFs exist?
  How many different irredundant logic structures do they have?

Page 4

DSD Structure
A DSD structure is a tree of nodes derived by applying DSD (disjoint-support decomposition) recursively until the remaining nodes are not decomposable
DSD is full if the resulting tree consists of only simple gates (AND/XOR/MUX)
DSD is partial if the resulting tree has non-decomposable nodes (called prime nodes)
DSD does not exist if the tree is composed of one node

[Figure: three example trees over inputs a-f, labeled Full DSD, Partial DSD, and No DSD]

Page 5

Computing DSD
The input is a Boolean function
The output is a DSD structure
The structure is unique up to several normalizations:
  Selection of base functions (elementary gates)
  Placement of inverters
  Factoring of multi-input AND/XOR gates
  Ordering of fanins of AND/XOR gates
  Ordering of data inputs of MUXes
  NPN representative of prime nodes
This computation is fast and reliable
  Originally implemented with BDDs (Bertacco et al.)
  In a limited form, re-implemented with truth tables
    Detects about 95% of DSDs of cut functions
To put DSD computation in perspective
  For 8-LUT mapping, it takes roughly the same time
    to compute structural cuts
    to derive their truth tables
    to compute DSDs of the truth tables

Example: F(a,b,c,d) = ab + cd
[Figure: DSD tree of F, with inputs a, b and c, d feeding the output F]
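For the example F(a,b,c,d) = ab + cd, a simple way to see which variable groups can be pulled out is the classic column-multiplicity test: a bound set can be replaced by one single-output function when F has at most two distinct cofactors over the assignments of that set. The Python sketch below illustrates this test only; it is not the BDD-based or truth-table-based DSD code mentioned on the slide, and the helper name `column_multiplicity` is made up for this example.

```python
from itertools import product

def column_multiplicity(func, variables, bound):
    """Count distinct cofactors of func over the free variables,
    one cofactor per assignment of the bound variables."""
    free = [v for v in variables if v not in bound]
    columns = set()
    for b_vals in product((0, 1), repeat=len(bound)):
        env = dict(zip(bound, b_vals))
        column = tuple(
            func(**env, **dict(zip(free, f_vals)))
            for f_vals in product((0, 1), repeat=len(free))
        )
        columns.add(column)
    return len(columns)

def F(a, b, c, d):
    return (a & b) | (c & d)

VARS = ["a", "b", "c", "d"]
print(column_multiplicity(F, VARS, ["a", "b"]))  # 2: {a, b} is a valid bound set
print(column_multiplicity(F, VARS, ["a", "c"]))  # 4: {a, c} is not
```

The bound set {a, b} has multiplicity 2 because ab acts as a single signal feeding the OR gate, which matches the tree on the slide.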

Page 6

Pre-computing Non-Disjoint-Support Decompositions
Enumerate bound sets while increasing size
  Enumerate shared sets while increasing size
    If the bound+shared set is irredundant
      Add it to the computed set
Bound+shared set is redundant
  If a variable can be removed and the resulting set is decomposable
  Ex: (abCD) is redundant if (abcD) or (abD) is a valid set

[Figure: decompositions of F into blocks G and H over inputs a-e, drawn for the example sets]
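The redundancy rule can be sketched as a filter over a collection of sets already known to be decomposable. The Python sketch below reads capital letters as shared variables (consistent with the S counts in the 4:1 MUX listing) and assumes, from the (abCD) example alone, that "removing" a shared variable means either dropping it or demoting it to an unshared bound variable. Both readings and the function names are assumptions, not the author's implementation.

```python
def is_redundant(bound, shared, valid):
    """bound: variables feeding only G; shared: variables feeding both G and H.
    valid: set of (frozenset(bound), frozenset(shared)) pairs known decomposable."""
    bound, shared = frozenset(bound), frozenset(shared)
    for v in shared:
        dropped = (bound, shared - {v})          # e.g. (abCD) -> (abD)
        demoted = (bound | {v}, shared - {v})    # e.g. (abCD) -> (abcD)
        if dropped in valid or demoted in valid:
            return True
    return False

def irredundant(valid):
    """Keep only the sets that cannot be simplified into another valid set."""
    return {s for s in valid if not is_redundant(s[0], s[1], valid)}

# Toy input: (abCD) and (abD) are both assumed to be decomposable.
valid_sets = {
    (frozenset("ab"), frozenset("cd")),  # (abCD)
    (frozenset("ab"), frozenset("d")),   # (abD)
}
print(irredundant(valid_sets) == {(frozenset("ab"), frozenset("d"))})
```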

Page 7

Example of Non-DS Decomposition: Mapping 4:1 MUX into two 4-LUTs

The complete set of support-reducing bound-sets for Boolean function of 4:1 MUX:

Set  0 : S = 1  D = 3  C = 5  x=Acd    y=xAbef
Set  1 : S = 1  D = 3  C = 5  x=Bce    y=xaBdf
Set  2 : S = 1  D = 3  C = 5  x=Ade    y=xAbcf
Set  3 : S = 1  D = 3  C = 5  x=Bde    y=xaBcf
Set  4 : S = 1  D = 3  C = 5  x=Acf    y=xAbde
Set  5 : S = 1  D = 3  C = 5  x=Bcf    y=xaBde
Set  6 : S = 1  D = 3  C = 5  x=Bdf    y=xaBce
Set  7 : S = 1  D = 3  C = 5  x=Aef    y=xAbcd
Set  8 : S = 1  D = 4  C = 4  x=aBcd   y=xBef
Set  9 : S = 1  D = 4  C = 4  x=Abce   y=xAdf
Set 10 : S = 1  D = 4  C = 4  x=Abdf   y=xAce
Set 11 : S = 1  D = 4  C = 4  x=aBef   y=xBcd
Set 12 : S = 2  D = 5  C = 4  x=ABcde  y=xABf
Set 13 : S = 2  D = 5  C = 4  x=ABcdf  y=xABe
Set 14 : S = 2  D = 5  C = 4  x=ABcef  y=xABd
Set 15 : S = 2  D = 5  C = 4  x=ABdef  y=xABc

[Figure: the 4:1 MUX F with control inputs a, b and data inputs c, d, e, f, shown next to its mapping into 4-LUTs]
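Individual entries of this listing can be checked by brute force. The Python sketch below treats capital letters as shared variables and tests, for each assignment of the shared variables, that the function has at most two distinct cofactors over the remaining bound variables, which is what a single-output G requires. The data-input order of `mux4` (a selects between c and d, b between c and e) is an assumption chosen to agree with the listing; the helper names are hypothetical.

```python
from itertools import product

VARS = "abcdef"

def mux4(a, b, c, d, e, f):
    """4:1 MUX with controls a, b (data order is an assumption)."""
    return (c, d, e, f)[a + 2 * b]

def is_valid_set(func, private, shared):
    """True if func = H(G(private, shared), shared, rest) with single-output G."""
    rest = [v for v in VARS if v not in private and v not in shared]
    for s_vals in product((0, 1), repeat=len(shared)):
        env_s = dict(zip(shared, s_vals))
        columns = set()
        for p_vals in product((0, 1), repeat=len(private)):
            env_p = dict(zip(private, p_vals))
            columns.add(tuple(
                func(**env_s, **env_p, **dict(zip(rest, r_vals)))
                for r_vals in product((0, 1), repeat=len(rest))
            ))
        if len(columns) > 2:  # more than two columns needs a multi-output G
            return False
    return True

def sizes(private, shared):
    """Return (S, D, C): shared count, inputs of G, inputs of H (including x)."""
    rest = len(VARS) - len(private) - len(shared)
    return len(shared), len(private) + len(shared), 1 + len(shared) + rest

print(is_valid_set(mux4, "cd", "a"), sizes("cd", "a"))      # Set 0: x=Acd
print(is_valid_set(mux4, "acd", "b"), sizes("acd", "b"))    # Set 8: x=aBcd
print(is_valid_set(mux4, "cde", "ab"), sizes("cde", "ab"))  # Set 12: x=ABcde
print(is_valid_set(mux4, "cd", "b"))                        # x=Bcd is not listed
```

Sets 8 to 11 are the ones with D = 4 and C = 4, which is why they fit into two 4-LUTs.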

Page 8

Application to LUT Structure Mapping: Matching 6-input function with LUT structure “44”

[Figure: three matching scenarios, labeled Case 1, Case 2, and Case 3, each showing blocks G and H (H’ in one case) over inputs a-f; capital letters mark shared inputs]
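Once the decomposable sets of a function are pre-computed, matching against a two-LUT structure reduces to a size test on each set. The Python sketch below applies that test to the (S, D, C) values from the 4:1 MUX listing. Reading the first digit of the structure string as the input limit of G and the second as the input limit of H is an assumption made for illustration (the two readings coincide for “44”), and the `fits` helper is hypothetical.

```python
# (set index, S, D, C) copied from the 4:1 MUX listing
MUX_SETS = (
    [(i, 1, 3, 5) for i in range(0, 8)]
    + [(i, 1, 4, 4) for i in range(8, 12)]
    + [(i, 2, 5, 4) for i in range(12, 16)]
)

def fits(structure, d, c):
    """structure like '44': assumed limits on the inputs of G and of H."""
    g_limit, h_limit = int(structure[0]), int(structure[1])
    return d <= g_limit and c <= h_limit

matches = [index for index, _s, d, c in MUX_SETS if fits("44", d, c)]
print(matches)  # indices of sets that map onto two 4-LUTs
```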

Page 9

Application to Standard Cell Mapping
Enumerate decomposable bound sets
  For each bound set, enumerate NPN classes of G and H
  Use them as choice nodes
  Use choice nodes to improve quality of Boolean matching

Property: When non-disjoint-support decomposition is applied, there are exactly M = 2^((2^k)-1) pairs of different NPN classes of decomposition/composition functions, G and H, where k is the number of shared variables

[Figure: F decomposed into G feeding H]

k   M
0   1
1   2
2   8
3   128
4   32768
5   2147483648
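The k/M table follows directly from the formula. The short Python check below reproduces it; the function name is made up for this example. For k = 1 the formula gives M = 2, which agrees with the two G/H pairs printed per set in the "Example of a Typical SPF" listing, where every set has one shared variable.

```python
def num_pairs(k):
    """M = 2^((2^k)-1) pairs of NPN classes for k shared variables."""
    return 2 ** (2 ** k - 1)

for k in range(6):
    print(k, num_pairs(k))
```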

Page 10

Example of a Typical SPF

abc 01> rt 000A115F
abc 02> print_dsd -d
F = 0505003F(a,b,c,d,e)
This 5-variable function has 10 decomposable variable sets:
Set 0 : S = 1 D = 3 C = 4 x=abC y=xCde
  0 : <cba> 011D{decf}
  1 : <c!ba> 110D{decf}
Set 1 : S = 1 D = 3 C = 4 x=bCd y=xaCe
  0 : !(!d!(cb)) <e(!c!a)!f>
  1 : 1C{bdc} 3407{aecf}
Set 2 : S = 1 D = 3 C = 4 x=abE y=xcdE
  0 : <eab> 0153{cdef}
  1 : <e!ab> 5103{cdef}
Set 3 : S = 1 D = 3 C = 4 x=acE y=xbdE
  0 : !(!c!(ea)) 01F3{bdef}
  1 : 1C{ace} F103{bdef}
Set 4 : S = 1 D = 3 C = 4 x=bcE y=xadE
  0 : (c!(!e!b)) (!f<e!a!d>)
  1 : 38{bce} 5003{adef}
Set 5 : S = 1 D = 3 C = 4 x=bCe y=xaCd
  0 : !(!e!(cb)) <f(!c!a)!d>
  1 : 1C{bec} 3503{adcf}
Set 6 : S = 1 D = 3 C = 4 x=adE y=xbcE
  0 : <ead> (!f!(c!(!e!b)))
  1 : <e!ad> 3007{bcef}
Set 7 : S = 1 D = 4 C = 3 x=abcE y=xdE
  0 : FAC0{abce} (!f!(!ed))
  1 : 05C0{abce} C1{def}
Set 8 : S = 1 D = 4 C = 3 x=aCde y=xbC
  0 : <e!(!c!a)d> (!f!(cb))
  1 : 03AC{adec} 43{bcf}
Set 9 : S = 1 D = 4 C = 3 x=bcdE y=xaE
  0 : CCF8{bcde} (!f!(ea))
  1 : 33F8{bcde} 43{aef}

abc 01> rt 000A115F
abc 02> pk
Truth table: 000a115f

 d e \ a b c
       0   0   0   0   1   1   1   1
       0   0   1   1   1   1   0   0
       0   1   1   0   0   1   1   0
     +---+---+---+---+---+---+---+---+
  00 | 1 | 1 | 1 | 1 | 1 |   |   | 1 |
     +---+---+---+---+---+---+---+---+
  01 |   |   |   |   | 1 |   |   | 1 |
     +---+---+---+---+---+---+---+---+
  11 |   |   |   |   |   |   |   |   |
     +---+---+---+---+---+---+---+---+
  10 | 1 | 1 |   |   |   |   |   |   |
     +---+---+---+---+---+---+---+---+

NOTATIONS:
  !a is complementation NOT(a)
  (ab) is AND(a,b)
  [ab] is XOR(a,b)
  <abc> is MUX(a, b, c) = ab + !ac
  <truth_table>{abc} is PRIME node
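The notation is compact enough to evaluate with a few lines of recursive descent. The Python sketch below handles only the four constructs defined above (NOT, AND, XOR, MUX) over lowercase variable names; PRIME nodes are deliberately rejected, because the bit order of their truth tables is not stated on the slide. The function names are hypothetical and unrelated to the ABC source code.

```python
from itertools import product

def evaluate(expr, env):
    """Evaluate a DSD string built from !, (..), [..], <...> under env: var -> 0/1."""
    pos = 0

    def parse():
        nonlocal pos
        ch = expr[pos]
        pos += 1
        if ch == "!":
            return 1 - parse()
        if ch.islower():
            return env[ch]
        closer = {"(": ")", "[": "]", "<": ">"}.get(ch)
        if closer is None:
            raise ValueError(f"unsupported token {ch!r} (prime nodes are not handled)")
        args = []
        while expr[pos] != closer:
            args.append(parse())
        pos += 1  # skip the closing bracket
        if ch == "(":
            return int(all(args))
        if ch == "[":
            return sum(args) % 2
        sel, then, other = args  # <abc> is ab + !ac
        return then if sel else other

    value = parse()
    if pos != len(expr):
        raise ValueError("trailing characters")
    return value

def onset_size(expr, variables):
    """Number of input assignments for which the structure evaluates to 1."""
    return sum(
        evaluate(expr, dict(zip(variables, bits)))
        for bits in product((0, 1), repeat=len(variables))
    )

print(evaluate("<cba>", {"a": 1, "b": 0, "c": 0}))  # c = 0 selects a
print(onset_size("(a!(bc)!(d!(ef)))", "abcdef"))
```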

Page 11

Statistics of DSD Manager

abc 01> pub12_16.dsd; dsd_ps
Total number of objects   = 3567880
Externally used objects   = 3060774
Non-DSD objects (max =12) = 479945
Non-DSD structures        = 3220044
Prime objects             = 1405170

Memory used for objects    = 100.04 MB.
Memory used for functions  = 238.01 MB.
Memory used for hash table = 40.83 MB.
Memory used for bound sets = 79.98 MB.
Memory used for array      = 27.22 MB.

  0 : All = 1
  1 : All = 1
  2 : All = 2
  3 : All = 10
  4 : All = 229
  5 : All = 3823
  6 : All = 22273
  7 : All = 77959
  8 : All = 200088
  9 : All = 396307
 10 : All = 661620
 11 : All = 972333
 12 : All = 1233234
All : All = 3567880

abc 01> time
elapse: 3.00 seconds, total: 3.00 seconds

This DSD manager was created using cut enumeration applied to *all* MCNC, ISCAS, and ITC benchmark circuits (a total of about 835K AIG nodes).
This involved computing 16 priority 12-input cuts at each node.
Binary file “pub12_16.dsd” has size 177 MB.
Gzipped archive has size 42 MB.
Reading it into ABC takes 3 sec.
Harvesting functions contained in this DSD manager took 1 hour.
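The printed statistics are internally consistent, which a few lines of Python can confirm. The sketch below reads the `N : All = ...` lines as object counts per support size (an interpretation of the printout, not something the slide states) and sums the five memory figures.

```python
# Object counts per support size, copied from the "N : All = ..." lines.
counts = {
    0: 1, 1: 1, 2: 2, 3: 10, 4: 229, 5: 3823, 6: 22273, 7: 77959,
    8: 200088, 9: 396307, 10: 661620, 11: 972333, 12: 1233234,
}
total_objects = sum(counts.values())

# Memory lines in MB: objects, functions, hash table, bound sets, array.
memory_mb = [100.04, 238.01, 40.83, 79.98, 27.22]
total_memory = sum(memory_mb)

print(total_objects)           # matches "Total number of objects"
print(round(total_memory, 2))  # in-memory footprint in MB
```

The in-memory total is larger than the 177 MB binary file, so the file evidently does not store everything the manager keeps in memory.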

Page 12

Typical DSD Structures

#    6 inputs              Occurs  9 inputs                       Occurs  12 inputs                   Occurs
1    (a!(bc)!(d!(ef)))     4386    (abcdefg!(hi))                 5511    (abcdefghij!(kl))           4092
2    (a!(b!(c!(d!(ef)))))  4128    (abcd!(ef)!(ghi))              5375    (abcdefghi!(j!(kl)))        2788
3    (ab!(c!(d!(ef))))     3727    (abcdef!(g!(hi)))              4901    (abcdefgh!(ij)!(kl))        2447
4    (a!(bc!(d!(ef))))     3503    (abcd!(ef)!(g!(hi)))           4625    (abcdefghi!(jkl))           2318
5    (a!(b!(cd!(ef))))     3075    (abcde!(fg)!(hi))              4588    (abcdefg!(hi)!(jkl))        2087
101  (a!(b!(c17{def})))    485     (abcdef[g(hi)])                1109    (abcdefg!(hi!(jkl)))        557
102  (a!(bc)17{def})       483     (abcd!(e!(fg))[hi])            1090    (abc!(!(defg)!(hijkl)))     556
103  <ab(c!(def))>         471     (!(ab!(cd))!(ef!(ghi)))        1067    (abcdef!(gh)!(i!(j!(kl))))  555
104  (a17{!(b!(cd))ef})    470     (a!(bc!(de))!(fg!(hi)))        1061    (!(ab)!(cde)!(fghijkl))     550
105  (abcd[ef])            466     (a!(b!(cd))!(ef!(g!(hi))))     1058    (ab!(cdefg)!(hijkl))        549
201  (!(ab)!(c[def]))      209     (a!(bc)!(de!(fg)!(hi)))        652     (abcde!(f!(gh!(ijkl))))     378
202  (a17{[b(cd)]ef})      207     (!(a!(b!(cd)))!(e!(fghi)))     644     (a!(!(bc)!(defghijkl)))     378
203  <a(bc)<def>>          206     (abcde!(f<ghi>))               644     (!(abcd!(ef))!(ghijkl))     378
204  (a!(b!(!(cd)[ef])))   203     (a!(bcdefg[hi]))               641     (a!(bc)!(d!(e!(fghijkl))))  378
205  (a!000A115F{bcdef})   203     (a!(bc)!(d!(e!(f!(g!(hi))))))  638     (ab!(c!(de))!(fghijkl))     377

NOTATIONS:
  !a is complementation NOT(a)
  (ab) is AND(a,b)
  [ab] is XOR(a,b)
  <abc> is MUX(a, b, c) = ab + !ac
  <truth_table>{abc} is a PRIME node with hexadecimal <truth_table>

Page 13

Support-Reducing Decompositions

S   NPN      Percentage of functions having the given number of different decompositions         Decs   Decs
    classes  0      1-10   11-20  21-30  31-40  41-50  51-60  61-70  71-80  81-90  91-100 More   max    ave
0   0        0
1   0        0
2   0        0
3   4        10     0
4   176      52.3   47.2   0.6                                                                   12     1.1
5   3438     12.8   81.7   5.2    0.1                                                            50     4.1
6   17397    7.0    48.6   36.4   5.7    1.3    0.5    0.2    0.1                                150    9.7
7   43926    2.1    32.0   31.8   19.6   7.3    3.0    1.5    0.8    0.5    0.4    0.2    0.7    392    17.9
8   78979    2.4    30.3   23.9   15.1   9.9    6.4    3.5    2.2    1.6    1.0    0.7    3.1    1000   25.9
9   104584   2.4    31.5   23.0   12.8   8.1    5.3    3.6    2.7    2.0    1.6    1.2    5.7    2214   32.3
10  106462   2.8    33.3   22.2   12.2   7.5    4.6    3.4    2.4    1.8    1.4    1.1    7.1    5454   37.7
11  83125    3.6    35.1   21.7   10.9   6.9    4.5    3.1    2.3    1.9    1.3    1.1    7.8    11132  42.8
12  41854    5.4    38.7   20.6   9.9    6.0    3.8    2.7    1.8    1.6    1.2    1.0    7.4    20144  47.3

For each support size (S) of NPN classes of non-DSD-decomposable functions
- the columns are ranges of counts of irredundant decompositions
- the entries are percentages of functions in each range
- the last two columns are the maximum and average decomposition counts
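The NPN class counts in this table can be cross-checked against the DSD manager statistics: summed over all support sizes they give 479945, the number reported there as "Non-DSD objects (max =12)". The Python sketch below performs that check and also computes a class-weighted average of the "Decs ave" column for S = 4 to 12; the weighted average is a derived figure that does not appear on the slides.

```python
# S: (NPN classes, average number of decompositions), rows S = 4..12 of the table.
rows = {
    4: (176, 1.1), 5: (3438, 4.1), 6: (17397, 9.7), 7: (43926, 17.9),
    8: (78979, 25.9), 9: (104584, 32.3), 10: (106462, 37.7),
    11: (83125, 42.8), 12: (41854, 47.3),
}
CLASSES_S3 = 4  # the S = 3 row lists 4 NPN classes

classes_4_to_12 = sum(n for n, _ in rows.values())
total_classes = CLASSES_S3 + classes_4_to_12

# Average decomposition count over all classes with S >= 4 (derived here).
weighted_ave = sum(n * ave for n, ave in rows.values()) / classes_4_to_12

print(total_classes)
print(round(weighted_ave, 1))
```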

Page 14

LUT Structure Mapping

Design   Without DSD structures    With DSD structures
         LUT    Level  Time, s     LUT    Level  Time, s  Time, s
01       33648  30     130.29      33212  30     180.27   32.33
02       19751  7      6.91        19777  7      3.37     2.36
03       28266  20     114.18      27859  20     123.18   52.49
04       40286  12     121.58      40332  12     55.22    32.36
05       47858  15     126.29      47016  15     91.29    32.99
06       95630  15     243.48      93901  15     123.09   60.70
07       32118  15     66.32       31564  15     73.01    16.67
08       33611  21     72.14       33083  21     32.68    16.63
09       34887  5      8.36        34835  5      9.68     3.60
10       13364  10     13.47       13218  9      67.70    4.68
Geomean  1.000  1.000  1.000       0.989  0.990  0.901    0.300

LUT: LUT count
Level: LUT level count
Time, s: Runtime, in seconds
The last two columns:
- with online DSD computations
- with offline DSD computations (based on pre-computed data)
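The runtime entries of the Geomean row can be recomputed from the per-design numbers. The Python sketch below assumes that the row is the geometric mean of each design's ratio to the "Without DSD structures" runtime, which is an assumption about how the row was produced. Under that assumption the two values should come out close to the reported 0.901 (online) and 0.300 (offline).

```python
import math

# (without DSD structures, with online DSD, with offline DSD), runtime in seconds
times = [
    (130.29, 180.27, 32.33), (6.91, 3.37, 2.36), (114.18, 123.18, 52.49),
    (121.58, 55.22, 32.36), (126.29, 91.29, 32.99), (243.48, 123.09, 60.70),
    (66.32, 73.01, 16.67), (72.14, 32.68, 16.63), (8.36, 9.68, 3.60),
    (13.47, 67.70, 4.68),
]

def geomean_ratio(pairs):
    """Geometric mean of new/base ratios over (base, new) pairs."""
    logs = [math.log(new / base) for base, new in pairs]
    return math.exp(sum(logs) / len(logs))

online = geomean_ratio([(base, on) for base, on, _ in times])
offline = geomean_ratio([(base, off) for base, _, off in times])
print(f"online {online:.3f}, offline {offline:.3f}, speedup {1 / offline:.2f}x")
```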

Page 15

LUT Level Minimization

Design   6-LUT mapping   LUTB            SOPB + LUTB     LMS + LUTB
         LUT    Level    LUT    Level    LUT    Level    LUT    Level
01       32788  20       32483  18       31104  20       33047  17
02       19768  5        19818  5        20039  5        19956  5
03       27545  13       26716  13       27057  12       27081  12
04       39727  9        37644  9        39180  9        38906  8
05       46633  10       46225  9        46740  8        46754  8
06       93172  10       92238  9        92970  8        93270  8
07       31299  9        30929  8        30480  8        30811  8
08       36583  20       34730  19       36576  17       37334  17
09       33600  5        33455  5        33559  5        33565  5
10       13099  8        12943  7        13028  6        13011  6
Geomean  1.000  1.000    0.981  0.940    0.990  0.896    0.998  0.872

6-LUT mapping: Standard mapping into 6-LUTs with structural choices
LUTB: DSD-based LUT balancing proposed in this work
SOPB+LUTB: SOP balancing followed by LUT balancing (ICCAD’11)
LMS+LUTB: Lazy Man’s Logic Synthesis followed by LUT balancing (ICCAD’12)

Page 16

Conclusions
Introduced Boolean decomposition
Proposed exhaustive enumeration of decomposable sets
Discussed applications to Boolean matching
Experimented with benchmarks to find a 3x speedup in LUT structure mapping
Future work will focus on
  Improving implementation
  Extending to standard cells
  Use in technology-independent synthesis

Page 17

Abstract
A new approach to Boolean decomposition and matching is proposed. It uses enumeration of all support-reducing decompositions of Boolean functions up to 16 inputs. The approach is implemented in a new framework that compactly stores multiple circuit structures. The method makes use of pre-computations performed offline, before the framework is started by the calling application. As a result, the runtime of the online computations is substantially reduced. For example, the runtime of matching Boolean functions against an interconnected LUT structure during technology mapping is reduced to the extent that it no longer dominates the runtime of the mapper. Experimental results indicate that this work has promising applications in CAD tools for both FPGAs and standard cells.