Power Optimization Toolbox for Logic Synthesis and Mapping
Stephen Jang, Kevin Chung (Xilinx Inc.)
Alan Mishchenko, Robert Brayton (UC Berkeley)

Outline
• Introduction
• Background
• Contributions
  • SimSwitch: Switching activity estimation
  • PowerMap: Mapping for power reduction
  • PowerDC: Re-synthesis for power reduction
• Experiments
• Conclusions

Introduction
• High power dissipation is a rising concern
  • It was shown that, in FPGAs, 2/3 of dissipation is due to dynamic power [J. Anderson, F. N. Najm, FPGA’02]
• Minimization of dynamic power is achieved by reducing the total switching activity of the nodes
• This work
  • Uses sequential simulation to estimate switching
  • Controls switching during synthesis and mapping
\[ P_{\mathrm{dynamic}} = \frac{1}{2} \, f \, V^2 \sum_{i \in \mathrm{signals}} C_i S_i \]
where f is the clock frequency, V is the supply voltage, C_i is the capacitance switched by signal i, and S_i is the probability of signal i making a transition (switching).
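
As a small worked example of the dynamic power formula, the sketch below evaluates it for three signals. The function name and all numeric values (clock, voltage, capacitances, switching probabilities) are hypothetical and chosen only for illustration.

```python
def dynamic_power(f_hz, vdd, caps, switch_probs):
    """Return 0.5 * f * V^2 * sum_i(C_i * S_i) in watts."""
    if len(caps) != len(switch_probs):
        raise ValueError("need one switching probability per capacitance")
    return 0.5 * f_hz * vdd ** 2 * sum(c * s for c, s in zip(caps, switch_probs))

# Hypothetical design: 100 MHz clock, 1.0 V supply, three signals.
caps = [2e-15, 5e-15, 1e-15]     # C_i in farads (made-up values)
probs = [0.5, 0.1, 0.25]         # S_i, transition probability per cycle (made-up)
power = dynamic_power(100e6, 1.0, caps, probs)
print(f"{power:.3e} W")
```

With f and V fixed, the only term a synthesis tool can influence is the sum of C_i * S_i, which is why the flow targets switching activity.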

Background
• Boolean network
  • And-Inverter Graphs
• Technology mapping
  • LUTs and standard cells
• SAT-based re-synthesis
  • Resubstitution with don’t-cares

AIGs: Unifying Representation
• An underlying data structure for various computations
  • Rewriting, resubstitution, simulation, SAT sweeping, induction, etc. are based on the same AIG manager
• A unifying representation for the whole synthesis/mapping/resynthesis/verification flow
  • Synthesis, mapping, verification use the same data structure
  • Allows multiple structures to be stored and used for mapping
• The main functional representation in ABC
  • A foundation of “contemporary logic synthesis”

AIG Definition and Examples
An AIG is a Boolean network composed of two-input ANDs and inverters.

Karnaugh map of F (rows: cd, columns: ab); both examples implement this function:
cd\ab  00  01  11  10
00      0   0   1   0
01      0   0   1   1
11      0   1   1   0
10      0   0   1   0

Example 1: F(a,b,c,d) = ab + d(ac’+bc), 6 nodes, 4 levels
Example 2: F(a,b,c,d) = ac’(b’d’)’ + bc(a’d’)’ = ac’(b+d) + bc(a+d), 7 nodes, 3 levels
[Figure: the two AIGs over inputs a, b, c, d]
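
The claim that the two structures implement the same function can be checked by exhaustive simulation. The sketch below is only an illustration with hypothetical function names. It takes the second form as ac'(b'd')' + bc(a'd')', which is what the expansion ac'(b+d) + bc(a+d) requires, and rebuilds the Karnaugh map with rows cd and columns ab in Gray-code order.

```python
from itertools import product

def f1(a, b, c, d):
    # F = ab + d(ac' + bc)
    return bool((a and b) or (d and ((a and not c) or (b and c))))

def f2(a, b, c, d):
    # F = ac'(b'd')' + bc(a'd')'
    left = (a and not c) and not (not b and not d)
    right = (b and c) and not (not a and not d)
    return bool(left or right)

# Exhaustive equivalence check over all 16 input assignments.
equivalent = all(f1(*v) == f2(*v) for v in product([0, 1], repeat=4))

# Karnaugh map: rows are (c, d), columns are (a, b), both in Gray-code order.
gray = [(0, 0), (0, 1), (1, 1), (1, 0)]
kmap = [[int(f1(a, b, c, d)) for (a, b) in gray] for (c, d) in gray]

print(equivalent)
for row in kmap:
    print(row)
```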

Three Tricks That Make AIGs Tick
• Structural hashing
  • Makes sure AIG is always stored in a compact form
  • Is applied during AIG construction
    • Propagates constants
    • Ensures each node is structurally unique
• Complemented edges
  • Represents inverters as attributes on the edges
    • Leads to fast, uniform manipulation
    • Does not use memory for inverters
    • Leads to efficient structural hashing
• Memory allocation
  • Uses fixed amount of memory for each node
    • Can be done by a simple custom memory manager
    • Even dynamic fanout manipulation is supported!
  • Allocates memory for nodes in a topological order
    • Optimized for traversal in the same topological order
    • Small static memory footprint for many applications
[Figure: the same logic over inputs a, b, c, d, drawn without hashing and with hashing]
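
A minimal sketch of structural hashing with complemented edges is shown below. It is not ABC's implementation: the class `Aig`, its method names, and the literal encoding (2 * node id + complement bit, node 0 as constant false) are assumptions made for illustration.

```python
class Aig:
    """Toy AIG. A literal is 2 * node_id + complement_bit; node 0 is constant false."""

    def __init__(self):
        self.fanins = [None]   # fanins[i] is a pair of literals, or None for const/inputs
        self.table = {}        # structural hash table: (lit0, lit1) -> node id

    def new_input(self):
        self.fanins.append(None)
        return 2 * (len(self.fanins) - 1)

    @staticmethod
    def neg(lit):
        # An inverter is just a flipped bit on the edge, so it costs no node.
        return lit ^ 1

    def and_(self, x, y):
        if x > y:
            x, y = y, x        # canonical fanin order
        # Constant propagation and trivial cases.
        if x == 0:
            return 0           # 0 & y = 0
        if x == 1:
            return y           # 1 & y = y
        if x == y:
            return x           # y & y = y
        if x == (y ^ 1):
            return 0           # y & !y = 0
        node = self.table.get((x, y))
        if node is None:       # create a node only if no identical one exists
            self.fanins.append((x, y))
            node = len(self.fanins) - 1
            self.table[(x, y)] = node
        return 2 * node

    def or_(self, x, y):
        return self.neg(self.and_(self.neg(x), self.neg(y)))

    def num_ands(self):
        return len(self.table)


g = Aig()
a, b, c, d = (g.new_input() for _ in range(4))
same = g.and_(a, b) == g.and_(b, a)            # second call reuses the first node
# F = ab + d(ac' + bc)
f = g.or_(g.and_(a, b), g.and_(d, g.or_(g.and_(a, g.neg(c)), g.and_(b, c))))
print(same, g.num_ands())
```

Because hashing happens inside `and_`, the graph never holds two AND nodes with the same pair of fanin literals, and the OR gates add no inverter nodes.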

SimSwitch
• Fast sequential logic simulator
  • Useful for switching activity estimation
• Improvements in simulation
  • Compact logic representation
    • Only 12 bytes per AIG node
  • Recycling simulation memory
    • Allocates simulation memory only for nodes on the frontier
  • Bit-parallel simulation of two time frames
    • When comparing simulation info in two consecutive time frames, avoids storing the simulation info from the previous frame
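
The idea of estimating switching by bit-parallel simulation can be sketched as follows. This is not the SimSwitch code: the toy circuit (one toggle flip-flop `q`, one random input `en`, one AND node `n`), the function name, and the default parameters are all hypothetical. Each Python integer holds one bit per input pattern, and toggles are counted by XOR-ing a signal's word with its word from the previous frame.

```python
import random

def simulate_switching(width=64, frames=64, seed=1):
    """Estimate per-signal toggle probability for a toy sequential circuit."""
    rng = random.Random(seed)
    mask = (1 << width) - 1
    q = 0                        # flip-flop state, one bit per pattern
    prev = None                  # words of the previous frame only
    toggles = {"en": 0, "q": 0, "n": 0}
    for _ in range(frames):
        en = rng.getrandbits(width)          # random primary-input patterns
        n = en & q                           # internal AND node
        cur = {"en": en, "q": q, "n": n}
        if prev is not None:
            for name, word in cur.items():
                # Bits that differ between consecutive frames are transitions.
                toggles[name] += bin((word ^ prev[name]) & mask).count("1")
        prev = cur
        q = (q ^ en) & mask                  # next state: toggle when en is 1
    total = width * (frames - 1)
    return {name: count / total for name, count in toggles.items()}

activity = simulate_switching()
print(activity)
```

Only two frames are ever alive at once, so memory does not grow with the number of simulated frames.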

Simulation Runtime Evaluation
Runtime (sec) for 64 frames and the given number of input patterns

Design  AIG    FF     2560  5120  10240  20480
C1      304K   1585   0.1   0.2   0.2    0.4
C2      362K   27514  2.7   2.9   4.1    6.6
C3      842K   58322  7.4   7.6   10.2   18.2
C4      1306K  87157  12.1  15.4  15.7   24.2

Intel Xeon 2-CPU 4-core computer with 8GB RAM.
Less than 100 MB of memory was used in these experiments.

Review of Cut-Based Mapping
Input: And-Inverter Graph
1. Compute K-feasible cuts for each node
2. Compute best arrival time at each node
   • In topological order (from PI to PO)
   • Compute the depth of all cuts and choose the best one
3. Iterate area recovery
   • Using area flow
   • Using exact local area
4. Choose the best cover
   • In reverse topological order (from PO to PI)
Output: Mapped netlist
S. Chatterjee et al., “Reducing structural bias in technology mapping”, Proc. ICCAD’05.
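
Step 1 of the mapping flow above can be sketched as a bottom-up merge of fanin cuts. The code below is a simplified illustration, not the mapper's implementation: it keeps every cut (no dominance filtering or priority-cut pruning), and the function name and example network are hypothetical.

```python
def enumerate_cuts(fanins, inputs, k):
    """Return {node: set of cuts}; each cut is a frozenset of at most k leaves.

    fanins maps each AND node to its two fanins and must list nodes in
    topological order (Python dicts preserve insertion order).
    """
    cuts = {pi: {frozenset([pi])} for pi in inputs}
    for node, (left, right) in fanins.items():
        node_cuts = {frozenset([node])}            # the trivial cut
        for cl in cuts[left]:
            for cr in cuts[right]:
                merged = cl | cr
                if len(merged) <= k:               # keep only K-feasible cuts
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts

# Hypothetical network: n3 = AND(n1, n2), n1 = AND(a, b), n2 = AND(c, d).
network = {"n1": ("a", "b"), "n2": ("c", "d"), "n3": ("n1", "n2")}
cuts4 = enumerate_cuts(network, "abcd", k=4)
cuts3 = enumerate_cuts(network, "abcd", k=3)
print(sorted(sorted(c) for c in cuts4["n3"]))
```

Each cut found this way is a candidate LUT rooted at the node, and the later steps only choose among these candidates.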
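
Cost Functions

Area flow (J. Cong, FPGA’99; S. Chatterjee, ICCAD’05):
\[ \mathit{AF}(n) = \mathit{Area}(n) + \sum_i \frac{\mathit{AF}(\mathit{Leaf}_i(n))}{\mathit{NumFanout}(\mathit{Leaf}_i(n))} \]

Wire flow (S. Jang, FPGA’08):
\[ \mathit{EF}(n) = \mathit{Edge}(n) + \sum_i \frac{\mathit{EF}(\mathit{Leaf}_i(n))}{\mathit{NumFanout}(\mathit{Leaf}_i(n))} \]

Switching flow (this work):
\[ \mathit{SwitchFlow}(n) = \mathit{Switch}(n) + \sum_i \frac{\mathit{SwitchFlow}(\mathit{Leaf}_i(n))}{\mathit{NumFanout}(\mathit{Leaf}_i(n))} \]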
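
Reading the switching-flow cost as SwitchFlow(n) = Switch(n) + the sum over cut leaves of SwitchFlow(leaf) / NumFanout(leaf), it can be evaluated recursively. The sketch below is an illustration under that reading. The function name, the example network, and all activity and fanout values are hypothetical, and a primary input is assumed to contribute just its own switching activity (the sum over its leaves is empty).

```python
def switch_flow(node, leaves, switch, num_fanout, memo=None):
    """SwitchFlow(n) = Switch(n) + sum_i SwitchFlow(leaf_i) / NumFanout(leaf_i)."""
    if memo is None:
        memo = {}
    if node not in memo:
        shared = sum(
            switch_flow(leaf, leaves, switch, num_fanout, memo) / num_fanout[leaf]
            for leaf in leaves.get(node, ())   # primary inputs have no leaves
        )
        memo[node] = switch[node] + shared
    return memo[node]

# Hypothetical mapped network: n1 uses cut {a, b}, n2 uses cut {n1, c}.
leaves = {"n1": ["a", "b"], "n2": ["n1", "c"]}
switch = {"a": 0.5, "b": 0.5, "c": 0.25, "n1": 0.375, "n2": 0.2}
num_fanout = {"a": 1, "b": 2, "c": 1, "n1": 1}

flow_n1 = switch_flow("n1", leaves, switch, num_fanout)
flow_n2 = switch_flow("n2", leaves, switch, num_fanout)
print(flow_n1, flow_n2)
```

Dividing by the fanout count spreads the cost of a shared leaf over all of its users, so a cut that reuses already-needed logic looks cheaper than one that pulls in private, high-activity logic.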

Understanding a Cost-Function Flow
[Figure: nodes n1 to n5, annotated with the resources owned by the nodes]

SAT-based Re-synthesis Framework
• SAT-based re-synthesis (FPGA’09) has these features
  • Substantial optimization power
    • due to the use of internal don’t-cares
  • Scalable local computation
    • due to the use of windowing
  • Practical computation speed
    • due to the use of Boolean satisfiability for functional manipulation
  • Ability to use various optimization objectives
    • due to the flexible conceptual framework
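
Two Ways to Cool Down a Hot Wire
[Figure: three views of a network with nodes n1 to n5 and n8 to n10, inputs a and b, and signals x and z: the original network, the network after resubstituting the hot wire with a cool one, and the network after removing the hot wire]
• Resub the hot wire with a cool one
• Remove the hot wire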

Experimental Setup
• Considered 20 industrial designs (12K to 165K 6-LUTs)
• Used Intel Xeon 2-CPU 4-core computer with 8GB RAM
• Verified the results using command “cec” in ABC
• Experimental runs performed:
  • Baseline: comb synthesis with choices
    • (dch; if -e)^2 (WireMap [FPGA’08] is disabled)
  • FullOpt: complete flow including high-effort seq and synthesis
    • (scl; lcorr; scorr) + (dch; if)^2 (WireMap is enabled)
  • PowerMap: power-aware LUT-mapping
    • FullOpt + (dch; if -p)^2
  • PowerDC: power-aware resynthesis
    • PowerMap + (mfs -p)^2
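
Experimental Data
Columns: design name and statistics (PI, PO), followed by FF, LUT, Lv, and Pwr for each of the four runs in the order Base, FullOpt, PowerMap, PowerDC.
name PI PO | Base: FF LUT Lv Pwr | FullOpt: FF LUT Lv Pwr | PowerMap: FF LUT Lv Pwr | PowerDC: FF LUT Lv Pwr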
D01 4725 16657 43309 71956 14 63592 41868 70060 13 53666 41868 67214 12 49268 41868 65693 12 46139
D02 8561 20356 65881 144295 27 59306 41327 90304 27 34090 41327 86789 25 33418 41327 85798 25 30497
D03 8879 52334 41521 123845 11 79928 39884 122947 11 66571 39884 119625 10 61239 39884 117946 10 57068
D04 781 5563 16205 50328 10 38961 15123 48392 10 29901 15123 47361 9 26706 15123 46651 9 23938
D05 3343 5533 23740 65704 17 37158 18933 55512 13 25415 18933 52086 12 20771 18933 50973 12 16247
D06 3664 21989 29947 36188 8 36899 27896 33733 8 27844 27896 33220 8 25985 27896 32962 8 25351
D07 1284 2929 81328 164437 10 90849 74898 153317 11 66515 74898 145918 10 56595 74898 143357 10 52249
D08 261 359 4586 12412 22 6480 4463 12325 23 4938 4463 11986 21 4461 4463 11844 21 3326
D09 2561 9586 23612 55639 9 40244 16290 37736 8 21981 16290 35123 7 20053 16290 34103 7 17538
D10 3765 9987 37630 96677 31 62304 36665 95005 31 50132 36665 90167 30 43864 36665 88191 30 37483
D11 2418 6000 34834 76798 19 51724 34446 75724 20 43953 34446 70531 18 36783 34446 69117 18 32231
D12 1134 3965 8371 14939 12 11632 8256 15316 12 10480 8256 14825 12 9254 8256 14623 12 8624
D13 210 299 6662 15888 11 5766 6591 15381 11 3893 6591 14960 11 3265 6591 14710 11 2498
D14 2326 3713 61789 109865 18 22738 36887 67338 19 7956 36887 66681 17 7582 36887 66097 17 7537
D15 2312 5523 26233 49031 7 35819 13575 27498 6 20779 13575 26234 6 17972 13575 25871 6 16819
D16 5124 21571 39127 146931 17 18685 37772 146298 16 14087 37772 139226 16 13095 37772 135903 16 12966
D17 2587 7025 6975 12528 12 8152 6429 11780 12 7124 6429 11957 11 6904 6429 11834 11 6685
D18 3918 6110 23727 35996 13 24990 22260 34630 12 21728 22260 34215 10 19339 22260 33994 10 18297
D19 4633 7540 28376 43515 13 31148 26241 41415 12 26941 26241 41120 10 24176 26241 40839 10 22822
D20 6631 19368 58322 158216 25 101921 53581 144455 25 75781 53581 139174 22 64332 53581 136747 22 54554
Geom 2438 6590 25656 55456 14.1 31294 22118 48700 13.7 22621 22118 47049 12.6 20238 22118 46318 12.6 18190
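Ratio (FF LUT Lv Pwr) vs. Base: Base 1 1 1 1 | FullOpt 0.862 0.878 0.97 0.723 | PowerMap 0.862 0.848 0.90 0.647 | PowerDC 0.862 0.835 0.90 0.581
Ratio (FF LUT Lv Pwr) vs. FullOpt: FullOpt 1 1 1 1 | PowerMap 1.000 0.966 0.92 0.895 | PowerDC 1.000 0.951 0.92 0.804
Ratio (FF LUT Lv Pwr) vs. PowerMap: PowerMap 1 1 1 1 | PowerDC 1.000 0.984 1.00 0.899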

Power Reduction due to Power-Aware Optimization

Table 1: Inputs toggle rate is 0.25
Power               BaseLine  FullOpt  PwrMap  PwrDC
Geomean             17312     12687    11445   10445
Ratio vs. BaseLine  1         0.73     0.66    0.60
Ratio vs. FullOpt             1        0.90    0.82
Ratio vs. PwrMap                       1       0.91

Table 2: Inputs toggle rate is 0.50
Power               BaseLine  FullOpt  PwrMap  PwrDC
Geomean             31294     22621    20238   18190
Ratio vs. BaseLine  1         0.72     0.64    0.58
Ratio vs. FullOpt             1        0.89    0.80
Ratio vs. PwrMap                       1       0.89

The results are geometric averages over 20 industrial designs.
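
The ratio rows follow directly from the geometric-mean row: each entry is a run's geomean power divided by the geomean of the reference run. The sketch below reproduces that arithmetic for the geomean values 31294, 22621, 20238, and 18190 reported above. The helper name is hypothetical, and the printed ratios may differ from the slide in the last digit depending on rounding.

```python
geomean_power = {"BaseLine": 31294, "FullOpt": 22621, "PwrMap": 20238, "PwrDC": 18190}

def ratios(values, reference):
    """Divide every entry by the entry of the chosen reference run."""
    ref = values[reference]
    return {name: value / ref for name, value in values.items()}

vs_base = ratios(geomean_power, "BaseLine")
vs_fullopt = ratios(geomean_power, "FullOpt")

for name in geomean_power:
    print(f"{name:9s} vs BaseLine {vs_base[name]:.3f}   vs FullOpt {vs_fullopt[name]:.3f}")
```

Here FullOpt comes out near 0.72 of the baseline, and PowerDC near 0.80 of FullOpt.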

Changes in Wire Ratios due to Power-Aware Optimization
[Figure: “Wire Comparison” bar chart. Vertical axis: reduction (negative is good), from -15% to 10%. Groups: PowerMap vs. FullOpt, PowerDC vs. PowerMap. Series: T5, T4, T3, T2, T1, Total Wrs.]
Wire group codes: T5: “hot wires” (p > 0.4) … T1: “cold wires” (p < 0.1), where p is the probability of switching (note that p can be more than 0.5).

Power Dissipation per Wire Group With / Without Power-Aware Optimization
[Figure: “Power distribution before/after optimization” bar chart. Horizontal axis: switching frequency (temperature). Vertical axis: percentage, from 0% to 100%. Series: Wire, Wire2, Pwr, Pwr2.]

       T5      T4     T3     T2     T1
Wire   13.90%  1.62%  0.91%  1.29%  82.27%
Wire2  11.44%  1.79%  0.93%  1.29%  84.55%
Pwr    85.90%  7.36%  2.68%  2.32%  1.74%
Pwr2   82.62%  9.48%  3.19%  2.69%  2.02%

Wire (Wire2) are wires before (after) synthesis.
Pwr (Pwr2) are power dissipations before (after) synthesis.

Conclusions
• Presented several contributions
  • SimSwitch: Estimation of switching activity
  • PowerMap: An extension of the priority cut LUT mapper [ICCAD’07] to prioritize cuts based on switching activity of the nodes
  • PowerDC: An extension of SAT-based resynthesis [FPGA’09] to remove signals with high switching
• Demonstrated reductions in switching activity (without degradation of area and delay)
  • 27% reduction due to seq synthesis [ICCAD’08] and WireMap [FPGA’08] against a plain-vanilla flow
  • +19% reduction due to PowerMap and PowerDC described in this paper

Future Work
• Speeding up switching activity estimation
  • Current implementation can be made faster
• More accurate power estimation
  • Estimating glitching in addition to switching
• Making other transforms power-aware
  • Computing power-aware choices
  • Specialized logic structuring (power gating)
• Sequential techniques for power reduction
  • Clock-gating that uses induction to compute signals that are valid clock gates on the reachable states

Abstract
The paper describes several complementary algorithms for power-aware logic optimization: (1) SimSwitch is an efficient sequential simulator for estimating switching activity of signals in large sequential designs. (2) PowerMap uses switching activity to make better decisions during power-aware technology mapping. (3) PowerDC is a resynthesis algorithm that eliminates wires with high switching activity. The proposed simulator draws on new ideas in logic representation and is geared for speed, e.g. it can simulate a 1M-node sequential design using 1000 bit patterns for 100 cycles in about 10 seconds on a typical one-core CPU. Experiments show that, although each technique contributes to the final quality, it is their combination that gives the best results. When applied to industrial designs in a highly-optimized industrial flow, previous work on sequential synthesis and wire-aware technology mapping led to a 27.6% reduction in switching activity, while the techniques of this paper reduce it additionally by 19.6% without a substantial increase in runtime or degradation of other metrics.