
On-Chip Decoupling Capacitance and P/G Wire Co-optimization for Dynamic Noise

Min Zhao, Rajendran Panda, Ben Reschke, Yuhong Fu, Trudi Mewett†, Sri Chandrasekaran, Savithri Sundareswaran, Shu Yan

Fruxalc Semiconductor Inc., Austin, TX, 78729
†Australia Semiconductor Technology Company Pty Ltd, Adelaide 5000, South Australia

ABSTRACT

Decap allocation is the primary method for addressing the dynamic voltage noise problem of on-chip power networks. When space in the immediate proximity of a hot spot is constrained, simply adding decoupling capacitance without improving the local wiring is ineffective. Based on this key observation, we propose an efficient co-optimization of decap allocation and local wiring enhancement. The method solves a linear program (LP) iteratively and is based on the decap budgeting algorithm [10]. Experimental results on two actual chip designs demonstrate the area and run-time efficiency of the co-optimization algorithm. Moreover, it provides excellent solutions even in cases where decap allocation alone fails to provide a feasible solution.

Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Layout; J.6 [Computer-Aided Engineering]: computer-aided design

General Terms
Algorithms, Reliability, Verification

Keywords
decap, On-chip Decoupling Capacitance, Wire Enhancement, Co-optimization, Dynamic noise

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 2007, June 4-8, 2007, San Diego, California, USA. Copyright 2007 ACM 978-1-59593-627-1/07/0006 ... $5.00.

1. INTRODUCTION AND MOTIVATION

With the increase in VLSI circuit frequency and supply voltage scaling, designing a robust power distribution network has become a challenging task. For limiting dynamic voltage fluctuations of the power network, the chief technique is to place decoupling capacitors (known also as decap) close to the problem spots [8, 3, 9, 5, 6]. However, decaps are effective in suppressing noise only if placed very close to the noise sources. When a large gate such as a clock driver makes a sudden and high current demand, charge stored in the decap should be transported very quickly to that gate in order to reduce the supply voltage droop. If the path between the decap and the high current demand location has significant resistance, then the charge transportation cannot occur as fast as required. This diminishes the effectiveness of the decap and requires more decap to be added. However, the decoupling capacitance density achievable with a given technology is limited. Moreover, the white space available around a problem spot may be limited. Due to these decap constraints, it is not always possible to place the entire amount of required decap close to a problem spot.

Figure 1: Impact of decap constraints

Figure 1 illustrates the impact of decap constraint on the total amount of decap required for limiting the supply noise within 10% of Vdd in a real design. Decap constraint is the maximum amount of decap allowed per unit area, which is the product of decap density of the decap cells (decap per unit area) and vacancy ratio (white space divided by total area). The total decap (y-axis) was determined by first placing the decap around the worst voltage nodes and then gradually expanding the decap insertion region in all directions until the voltage threshold was satisfied. The decap added at each location was kept at the maximum amount permitted by the specified decap constraints (x-axis). Figure 1 assumes that the decap constraints are the same across the chip. We can see that the total decap needed increases dramatically when the decap constraint is tightened.

Alternatively, dynamic noise could be suppressed by decreasing the resistances between the large current sinks and the decap or supply pads. This can be done by sizing up the local wires around the current sinks and creating additional paths to supply pads or decap locations. As we will show later (in Figure 5 of Section 5), there are often situations where a hot-spot node is located physically close to another node connected well to the supply, but the connection between these 2 nodes is highly resistive. Adding local connections between these two nodes may dramatically decrease the decap required in that region. Co-optimizing wires for dynamic noise is beneficial for several reasons: (1) Reducing wire resistance reduces both the DC noise (IR drop) and the high-frequency AC noise (due to reduced resistance of discharge paths). Adding capacitance alone does not help to reduce the DC noise. (2) Locations that are congested for adding decap circuits may not be so for adding additional wires, as decap circuits largely use the device layers while the P/G wiring uses the metal routing layers. (3) While additional decaps add to area, increase leakage, and pose a higher risk for yield, adding or sizing up wires does not cause such problems.

In this paper, we propose a method for co-optimization of decoupling capacitance and local wiring enhancement. To the best of our knowledge, the only previous work that addresses the decap and wiring codesign is [9]. However, (1) the method in [9] is limited to regular mesh structures whereas our method works for regular as well as irregular P/G networks. (2) The wire optimization in [9] is limited to wire sizing while our method can do both wire sizing and topology changes through the addition of new wires. (3) The method in [9] needs many iterations of very costly adjoint sensitivity calculation of voltages with respect to decap locations and wires. The method proposed here accomplishes the co-optimization through several iterations of linear programming on a reduced problem size.

We formulate the co-optimization problem as a constrained minimization. The solution aims at minimizing the overall area cost of decap and wires while realizing dynamic voltages better than a user-specified threshold level at all times. Basically, the decap/wire co-optimization uses the synergy between decap optimization and wire optimization. We use the method proposed in [10] for the decap optimization part and propose a new wire optimization algorithm (wire sizing and wire addition) based on similar concepts as [10].

2. BACKGROUND

2.1 Power Grid Simulation

A chip's power distribution system is modeled as a linear RLC network with independent time-varying current sources modeling the switching currents of the transistors. Simulating the network requires solving the following system of differential equations, which are formed in a typical approach such as the Modified Nodal Analysis (MNA) [4] approach:

$$G \cdot x(t) + C \cdot \dot{x}(t) = b(t), \qquad (1)$$

where $G$ is a conductance matrix, $C$ is the admittance matrix resulting from capacitive and inductive elements, $x(t)$ is the time-varying vector of voltages at the nodes and currents through inductors and voltage sources, and $b(t)$ is the vector of independent time-varying current sources. This differential system is very efficiently solved by reducing it to a linear algebraic system

$$(G + C/h) \cdot x(t) = b(t) + C/h \cdot x(t - h), \qquad (2)$$

using the Backward Euler (BE) technique with fixed time step, $h$.
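The Backward Euler update (2) can be illustrated with a minimal pure-Python sketch. The helper names `solve` and `backward_euler_step`, the dense list-of-lists matrices, and the single-node RC values are assumptions made for illustration only; a production power-grid simulator would use a sparse factorization rather than dense elimination.

```python
def solve(a, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def backward_euler_step(G, C, b_t, x_prev, h):
    """One step of (G + C/h) x(t) = b(t) + (C/h) x(t - h)."""
    n = len(x_prev)
    lhs = [[G[i][j] + C[i][j] / h for j in range(n)] for i in range(n)]
    rhs = [b_t[i] + sum(C[i][j] / h * x_prev[j] for j in range(n))
           for i in range(n)]
    return solve(lhs, rhs)


# Hypothetical single-node RC in normalised units: G = 1, C = 1, b = 1.
G = [[1.0]]
C = [[1.0]]
x = [0.0]
for _ in range(200):
    x = backward_euler_step(G, C, [1.0], x, h=0.1)
# x[0] approaches the DC solution G^{-1} b = 1.0
```

Because the time step is fixed, the left-hand matrix $G + C/h$ does not change between steps, so in practice it can be factored once and reused for every time point.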

2.2 Decap Optimization Algorithm

The decap/wire co-optimization proposed in this paper is based on the decap optimization algorithm [10], which will be reviewed briefly in this section.

2.2.1 Violation Region, Sampling Nodes, and Violation Time Window

A violation region is illustrated in Figure 2. First, the die is partitioned into uniformly sized tiles using pre-defined x and y pitches. A set of contiguous tiles wherein one or more nodes violate the voltage constraint defines the core violation region. The core violation region is then expanded in all directions by a predetermined distance (known as the effective radius) to include additional tiles. This expanded region is called the violation region. The expansion is based on the fact that decap added outside the expanded region has little or no impact on the voltages in the core region, and vice versa.

Sampling nodes are representative nodes sampled from each tile having a voltage violation. A few (less than 10) nodes per tile with the worst voltages within the tiles are sampled to represent the behavior of all nodes in those tiles.

Figure 2: An illustration of the violation region (labels: a tile, the core violation region, the violation region)

The violation time windows for a sampling node are determined from the voltage waveform at that node obtained through the dynamic analysis of the power network. An example of a violation time window is $[t_s, t_e]$ shown in Figure 3. $t_s$ is the time instance of maximum voltage that occurred before a violation and $t_e$ is the time instance when the voltage recovers back to $V_{thre}$ after a violation. The voltage at $t_s$ is assumed to be the initial voltage of the decap before discharge.
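As a rough illustration of how a violation time window might be extracted from a sampled waveform, consider the sketch below. The function name `violation_window` and the sampled-list representation are hypothetical. It also assumes that $t_s$ is searched over all samples preceding the first violation and that only the first violation is reported.

```python
def violation_window(times, volts, v_thre):
    """Return (t_s, t_e) for the first violation in a sampled waveform, or None.

    Assumption: the waveform is given as equal-length lists of sample times
    and node voltages, and a violation is any sample below v_thre.
    """
    first_bad = next((k for k, v in enumerate(volts) if v < v_thre), None)
    if first_bad is None:
        return None  # no violation at this node
    # t_s: instant of the maximum voltage seen before the violation starts
    # (falls back to the first sample if the waveform starts in violation).
    before = range(first_bad) if first_bad > 0 else range(1)
    k_s = max(before, key=lambda k: volts[k])
    # t_e: first instant after the violation where the voltage has recovered;
    # if it never recovers, the last sample is used.
    k_e = next((k for k in range(first_bad, len(volts)) if volts[k] >= v_thre),
               len(volts) - 1)
    return times[k_s], times[k_e]


# Hypothetical waveform with Vdd = 1.0 and a 0.90 threshold.
window = violation_window([0, 1, 2, 3, 4, 5],
                          [1.00, 0.98, 0.92, 0.85, 0.88, 0.95],
                          0.90)
```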

2.2.2 Reduce Problem Size through Macromodeling

The current transfer characteristics of a part of the power network, a current-based macromodel, can be written as:

$$I_{ext} = A \cdot V + I, \quad I_{ext}, V, I \in R^m, \; A \in R^{m \times m} \qquad (3)$$

where $m$ is the number of nodes in the model, $A$ is the admittance matrix, $V$ is the vector of voltages of nodes in the model, $I$ is the vector of the equivalent current sources connected between each macromodel node and the reference node, and $I_{ext}$ is the vector of unknown currents flowing through the macromodel nodes from the external surroundings. Macromodeling is the procedure of deriving the macromodel (3) from the linear system of the entire power grid (2), through either Gaussian elimination or a more efficient procedure proposed in [11].

In [10], a macromodel composed of only sampling nodes in the violation region is generated and used to approximately represent the behavior of the entire power network. The rest of the nodes are abstracted away by macromodeling. The optimization assumes that all the decap is added at the sampling nodes. After solving the LP, however, the decap at the sampling nodes is evenly distributed among the nodes in the tile regions that the sampling nodes belong to.
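The Gaussian-elimination route to a macromodel can be sketched as eliminating one internal node at a time from $A \cdot V + I = I_{ext}$, using the fact that no external current enters an eliminated node. The function name `eliminate_node` and the three-node example are illustrative assumptions. This is plain dense elimination, not the more efficient procedure of [11].

```python
def eliminate_node(A, I, e):
    """Eliminate internal node e from A*V + I = I_ext, assuming I_ext[e] = 0.

    Returns the reduced (A, I) over the remaining nodes, in their original order.
    """
    keep = [i for i in range(len(I)) if i != e]
    A_red = [[A[i][j] - A[i][e] * A[e][j] / A[e][e] for j in keep] for i in keep]
    I_red = [I[i] - A[i][e] * I[e] / A[e][e] for i in keep]
    return A_red, I_red


# Hypothetical 3-node chain in normalised units; node 1 is abstracted away.
A = [[2.0, -1.0, 0.0],
     [-1.0, 3.0, -1.0],
     [0.0, -1.0, 2.0]]
I = [-1.0, 0.0, -1.0]
A_red, I_red = eliminate_node(A, I, 1)
# The full system with I_ext = 0 has V = [0.75, 0.5, 0.75]; the reduced
# model reproduces the kept-node voltages [0.75, 0.75].
```

Repeating this step for every non-sampling node leaves a model whose only unknowns are the sampling-node voltages, which is what keeps the later LP small.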

2.2.3 Charge-Based MNA Constraints

By integrating the current-based macromodel (3) over the violation time window, $[t_s, t_e]$, a charge-based macromodel can be obtained as below:

$$Q = A \cdot Y + B \qquad (4)$$


where $Q$ is $\int_{t_s}^{t_e} I_{ext}\,dt$, $Y$ is $\int_{t_s}^{t_e} V\,dt$, and $B$ is $\int_{t_s}^{t_e} I\,dt$. As each equation of (3) is a KCL, equation (4) implies that the total charge flowing out of each node over the integration period is zero. The optimization procedure determines the amount of decap, $C_i$, which is to be placed at every node $i$ in the set of sampling nodes (SP). Now $Q_i$ is the charge flowing from the decap $C_i$ into the macromodel, approximated as $C_i \times (V_{0,i} - V_i)$, where $V_{0,i}$ and $V_i$ are the voltages at $t_s$ and $t_e$, respectively. In order to keep the voltage above $V_{thre}$, enough charge should be released from the decap. Working with the objective of minimizing $\sum_{i \in SP} C_i$ together, the equality constraints (4) could be relaxed to form inequality constraints:

$$M \circ C \ge A \cdot Y + B \qquad (5)$$

where $M = [V_{0,1} - V_1, V_{0,2} - V_2, \ldots, V_{0,m} - V_m]^T$ and $C = [C_1, C_2, \ldots, C_m]^T$. The $\circ$ operation represents the entry-wise product of vectors $M$ and $C$.

2.2.4 Charge-Based Voltage Constraints

Figure 3 shows a voltage waveform at a node before and after adding decap. The objective of adding decap is to pull up the voltage above the $V_{thre}$ level as shown in Figure 3.

Figure 3: The voltage waveform with/without decap

The shaded area in the figure represents $Y_i = \int_{t_s}^{t_e} V_i\,dt$. Suppose $V_{0,i}$ and $V_i$ represent the voltage of node $i$ at time points $t_s$ and $t_e$, respectively. By approximating the shaded area with a trapezoidal area, $Y_i$ can be written as:

$$Y_i \approx (V_{0,i} + V_i) \cdot (t_e - t_s)/2, \quad i \in SP, \qquad (6)$$

where SP is the set of sampling nodes. By assuming the voltage at $t_e$, i.e. $V_i$, as the worst voltage within the violation time window, the constraint that the dynamic node voltages are better than a specified threshold is approximated as

$$V_i \ge V_{thre}, \quad i \in SP, \qquad (7)$$

which, by (6), is equivalent to $Y_i \ge (V_{0,i} + V_{thre}) \cdot (t_e - t_s)/2$.
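The trapezoidal approximation (6) and its inverse are used repeatedly in what follows, so a small numeric sketch may help. The function names and the sample values (a 2 ns window, 1.0 V initial voltage, 0.9 V threshold) are illustrative assumptions rather than values from the experiments.

```python
def y_from_v(v0, v, t_s, t_e):
    """Trapezoidal estimate of Y = integral of V dt over [t_s, t_e], as in (6)."""
    return (v0 + v) * (t_e - t_s) / 2.0


def v_from_y(v0, y, t_s, t_e):
    """Invert (6): recover the end-of-window voltage V from Y."""
    return 2.0 * y / (t_e - t_s) - v0


def y_threshold(v0, v_thre, t_s, t_e):
    """Value of Y at which V equals the threshold, i.e. the bound implied by (6)."""
    return y_from_v(v0, v_thre, t_s, t_e)


# Hypothetical node: 1.0 V at t_s, droops to 0.85 V at t_e, 2 ns window.
y = y_from_v(1.0, 0.85, 0.0, 2e-9)
e = y_threshold(1.0, 0.90, 0.0, 2e-9)
violates = y < e  # the voltage constraint expressed on Y instead of V
```

Expressing the threshold on $Y$ rather than on $V$ is what allows the voltage requirement to sit in the same linear program as the charge-based constraints (5).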

2.2.5 Sequence of Linear Programming (SLP)

The decap optimization for a violation region is formulated as a sequence of LPs. In each iteration, an LP is formulated as follows:

$$\begin{aligned}
\text{minimize} \quad & \sum_{i \in SP} C_i \qquad (8) \\
\text{subject to} \quad & Y \ge E \\
& M \circ C \ge A \cdot Y + B \\
& \sum_{k \in \text{tile } i} C_k \le C_{max,i}, \quad \forall\, \text{tile } i \in \text{violation region}
\end{aligned}$$

where $m = |SP|$, $C = [C_1, C_2, \ldots, C_m]^T$, $M = [V_{0,1} - V_1, \ldots, V_{0,m} - V_m]^T$, and $E = [E_1, \ldots, E_m]^T$ with $E_i = (V_{0,i} + V_{thre}) \cdot (t_e - t_s)/2$. (9)

Here, vectors $C$ and $Y$ are the variables of the LP optimization. $M$ is assumed to be a constant vector, decided by the node voltages of the previous iteration. $C_{max,i}$ represents the decap constraint of tile $i$. In each iteration, the LP is performed to determine the decap amount. Then the reduced network with the new decap is solved to get the updated voltages. $M$ is then recalculated with the new voltages and used in the next iteration.

3. WIRE OPTIMIZATION

The decap/wire co-optimization is the synergy between decap optimization and wire optimization. Here, we propose a new wire optimization algorithm that shares the same framework as the decap optimization algorithm [10] and thus could be easily combined with [10] for co-optimization.

3.1 Sequence of Linear Programming

Given a network, the MNA equations of a system with existing wires and wires to be added could be written as:

$$(A' + A) \cdot V + I = 0 \qquad (10)$$

where $A$ is the conductance of the existing network, and $A'$ is the conductance to be added. Working with the optimization objective of minimizing total wires, we then relax (10) to inequality constraints and rewrite them as:

$$-\sum_{1 \le j \le m} A'_{ij} V_j \ge \sum_{1 \le j \le m} A_{ij} V_j + I_i, \quad \forall i \in SP \qquad (11)$$

where $m$ is the number of nodes in the power network. $-\sum_{1 \le j \le m} A'_{ij} V_j$ is the total current flowing into node $i$ through the wire candidates of node $i$. Combined with constraints $V_i \ge V_{thre}$, the constraints in (11) indicate that, in order to keep the voltage at node $i$ above $V_{thre}$, enough current needs to flow through the wire candidates. Since both $V_j$ and $A'_{ij}$ are variables, the constraints (11) are not linear. Similar to [10], we linearize the term $-\sum_{1 \le j \le m} A'_{ij} V_j$ through an iterative linear programming procedure. The $V_j$ variables in the term $-\sum_{1 \le j \le m} A'_{ij} V_j$ are treated as constants in the LP and obtained from the previous iteration.

Suppose a wire candidate's length and width are $L_{ij}$ and $W_{ij}$ respectively, and $\rho$ is the resistivity of the wire. Then a total current of $W_{ij}(V_j - V_i)/\rho L_{ij}$ flows through the wire from node $j$ into node $i$. Hence, the left hand side of (11) becomes

$$-\sum_{1 \le j \le m} A'_{ij} V_j = \sum_{1 \le j \le m} \frac{V_j - V_i}{\rho L_{ij}} W_{ij} \qquad (12)$$

In (12), $V_i$, $V_j$ are treated as constants and $W_{ij}$ as variables. So, without loss of generality of row and column indices, $-\sum_{1 \le j \le m} A'_{ij} V_j$ can be rewritten as a product of a constant matrix $D$ and a variable vector $W$:

$$-A' \cdot V = D \cdot W \qquad (13)$$


Here, $W$ represents the widths of the wire candidates and each column of matrix $D$ has two non-zero elements. Then, the MNA constraints (11) become:

$$D \cdot W \ge A \cdot V + I \qquad (14)$$

Thus, the LP formulation of wire optimization is as follows:

$$\begin{aligned}
\text{minimize} \quad & \sum_{i,j} W_{ij} L_{ij} && (15) \\
\text{subject to} \quad & V \ge [V_{thre}, V_{thre}, \ldots]^T && (16) \\
& D \cdot W \ge A \cdot V + I && (17) \\
& W_{ij} \le W_{max,ij}, \quad \forall\, ij \in \text{wire candidates} && (18)
\end{aligned}$$

The constant parameters $D$ here evaluate the charges flowing through the wire candidates and rely on the voltages of the nodes in the current LP iteration. The wire optimization procedure executes multiple iterations. In each iteration, the LP (15) is performed to determine the wire widths to be added. Then the network with the new wires is solved to get the updated voltages. $D$ is then recalculated with the new voltages and used in the next iteration. The iterations of solving LP (15) and updating the wires and voltages continue until the voltages of all nodes are above $V_{thre}$.
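To make the structure of $D$ concrete, the sketch below assembles it from a list of wire candidates and the node voltages of the previous iteration, following the per-wire current $W_{ij}(V_j - V_i)/\rho L_{ij}$. The function names, the dense list-of-lists storage, and the numeric values are assumptions for illustration; a real implementation would store $D$ sparsely.

```python
def build_D(num_nodes, candidates, V, rho):
    """Assemble D so that (D*W)[i] is the current into node i via wire candidates.

    candidates: list of (i, j, length) tuples, one per candidate wire.
    V: node voltages from the previous LP iteration (treated as constants).
    """
    D = [[0.0] * len(candidates) for _ in range(num_nodes)]
    for k, (i, j, length) in enumerate(candidates):
        g_per_width = 1.0 / (rho * length)       # conductance per unit width
        D[i][k] = (V[j] - V[i]) * g_per_width    # current into i per unit width
        D[j][k] = (V[i] - V[j]) * g_per_width    # equal and opposite at j
    return D


def matvec(D, W):
    """Dense matrix-vector product D*W."""
    return [sum(d * w for d, w in zip(row, W)) for row in D]


# Hypothetical example: node 0 is a hot spot, nodes 1 and 2 are healthier.
V = [0.85, 1.00, 0.95]
D = build_D(3, [(0, 1, 10.0), (0, 2, 20.0)], V, rho=0.05)
currents = matvec(D, [2.0, 1.0])  # widths of the two candidates
```

Each column has exactly two non-zero entries of opposite sign, so whatever current a candidate wire delivers to one end node is drawn from the other.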

3.2 Region-wise Wire Enhancement

For a P/G network with millions of nodes, it is impractical to optimize the entire network even with the SLP formulation proposed in Section 3.1. Therefore, we propose to first determine metalization between neighboring regions of the power grid with the optimization procedure described in Section 3.1 and then implement the additional metalization through detailed routing between the regions. The procedure is quite similar to the global and detailed routing of general interconnect optimization [ ].

Suppose that the power grid is divided into regions (or tiles) as shown in Figure 4. When electrical connectivity between two adjacent tiles is improved, charge in one tile can be transferred more effectively to the other tile. If a tile adjacent to a tile with a hot spot sees a very stable voltage, then connecting (or enhancing the connectivity between) these tiles will bring dramatic improvements to the hot spot. Potentially, wiring can be added or enhanced between any two nodes in the network. Considering all such connections and optimizing them will be impractical and unnecessary. To simplify the problem, we consider only 4 basic connections for any tile: the connections to its 4 neighboring tiles. With these 4 basic connections, any 2 distant tiles in the design can be connected together, if necessary, through a chain of connections. If the optimizer determines that it is best to connect a region sitting N tiles away to a hot spot, then a sequence of adjacent connections will be established automatically to achieve this.

Figure 4: Wire candidates between sampling nodes (labels: a tile, Box A, Box B)

Similar to algorithm [10], we select one sampling node in each tile to represent all the nodes in that tile and use macromodeling to reduce the original network to a reduced network consisting only of sampling nodes. The SLP wire optimization is executed on the reduced network. The size of the reduced network could be orders of magnitude smaller than that of the original network. In constraints (17) of the LP formulation (15), the MNA equations of wire optimization are replaced with macromodel equations (3), where only variables of the sampling nodes remain. $W_{ij}$ in (15) now represents the wires that go across the boundary of tile $i$ and tile $j$. Suppose $m$ is the number of sampling nodes. Because wiring enhancements are done only between adjacent pairs of nodes, $|W| < 2m$ and $D$ is a sparse $m \times |W|$ matrix with at most 4 elements in each row. In the wire width constraints (18), $W_{max,ij}$ is determined by the available routing tracks in tiles $i$ and $j$. This information can be easily obtained from the routed layout.
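The bound $|W| < 2m$ follows directly from restricting candidates to 4-neighbor adjacency. A minimal sketch, assuming a rectangular `rows` x `cols` tile grid with one sampling node per tile (the function name and row-major tile numbering are hypothetical):

```python
def neighbor_wire_candidates(rows, cols):
    """List each pair of 4-neighbor adjacent tiles once, as (tile_a, tile_b)."""
    def tile_id(r, c):
        return r * cols + c  # row-major numbering, an arbitrary convention

    pairs = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:                      # neighbor to the right
                pairs.append((tile_id(r, c), tile_id(r, c + 1)))
            if r + 1 < rows:                      # neighbor below
                pairs.append((tile_id(r, c), tile_id(r + 1, c)))
    return pairs


candidates = neighbor_wire_candidates(3, 4)   # m = 12 tiles
m = 3 * 4
num_candidates = len(candidates)              # 3*3 + 2*4 = 17 candidates
```

Each tile contributes at most a "right" and a "down" candidate, and boundary tiles contribute fewer, which is why the count stays strictly below $2m$.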

After optimization, the added metalization between 2 neighboring tiles (represented by a pair of sampling nodes) is distributed within their corresponding regions. Suppose the optimizer determines $W_{ij}$ as the additional metalization between tiles $i$ and $j$; then $W_{ij}$ is routed within the Box A region. Similarly, $W_{mn}$ is distributed within Box B, as shown in Figure 4. In fact, on occasions, the designers prefer to get high-level guidance such as how much metal should be added between 2 tile regions for improvement, rather than the optimizer making detailed changes to the wiring topology.

Since we use one sampling node to approximately represent an entire tile region, it is imperative that all the nodes in a tile have similar voltage behavior. Suppose $V_{min,i}$ and $V_{max,i}$ are the minimum and maximum voltages in tile $i$, respectively; $V_{max,i} - V_{min,i}$ is used as a metric for tile generation. In our implementation, we initially divide the chip into uniform tiles of very small size. Then we continuously merge neighboring tiles whose joint $V_{max} - V_{min}$ is minimum until the LP problem size is within control. Tiles could also be cut through layers and thus can be treated as three dimensional.
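The greedy merging rule can be sketched as follows. To keep the example short it assumes a single row of tiles, so "neighboring" means adjacent in the list; the function name `merge_tiles` and the `max_tiles` stopping rule are illustrative stand-ins for "until the LP problem size is within control". A 2-D or layer-aware version would need an explicit adjacency structure.

```python
def merge_tiles(tiles, max_tiles):
    """Greedily merge adjacent tiles with the smallest joint V_max - V_min.

    tiles: list of (v_min, v_max) per tile, in spatial order along one row.
    Stops when at most max_tiles tiles remain (at least one tile is kept).
    """
    tiles = list(tiles)

    def joined(k):
        # Voltage range of tile k merged with its right-hand neighbor.
        return (min(tiles[k][0], tiles[k + 1][0]),
                max(tiles[k][1], tiles[k + 1][1]))

    while len(tiles) > max(max_tiles, 1):
        # Pick the adjacent pair whose merged spread is smallest.
        k = min(range(len(tiles) - 1),
                key=lambda idx: joined(idx)[1] - joined(idx)[0])
        tiles[k:k + 2] = [joined(k)]
    return tiles


# Hypothetical row: two droopy tiles next to two healthy ones.
merged = merge_tiles([(0.80, 0.82), (0.81, 0.84), (0.95, 0.97), (0.96, 0.98)], 2)
```

In this example the droopy tiles end up in one merged tile and the healthy tiles in another, so each sampling node still stands for nodes with similar voltage behavior.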

4. DECAP/WIRE CO-OPTIMIZATION

As pointed out in Section 1, the best way to suppress the dynamic noise is through a combined effort of decap insertion and wire enhancement. In this section, we set out to minimize the total area of decap and wires to be added, subject to the constraints that (i) the dynamic node voltages are better than a specified threshold, (ii) the node voltages satisfy the MNA equations of the network at any time point, (iii) the capacitance that can be added at a node is bounded, and (iv) the wire widths that can be added between tiles are bounded.

4.1 Merging of LP Formulations (8) and (15)

The decap optimization algorithm [10] and the wire optimization described in Section 3 follow the same framework: (i) reduce the original network through region-wise sampling and macromodeling; (ii) optimize the reduced network through the sequence of linear programming. Therefore, the LP formulation of the decap/wire co-optimization could be easily obtained by merging the LP formulation of wire optimization (15) and decap optimization (8).

In co-optimization, decap optimization and wire optimization use the same tiles and select one node in each tile as a


sampling node. For each problem spot, only the P/G wires of the nearby region, i.e. the violation region, are optimized. The network is macromodeled with the sampling nodes of the violation regions, thus abstracting away the bulk of the network.

In Section 3, the LP formulation of the wire optimization is built on the current-based macromodel. Alternatively, the LP formulation of wire optimization could be based on the charge-based nodal equations (4) described in Section 2.2.3. In this way, the MNA constraints of (5) and (14) could be merged into

$$M \circ C + D' \cdot W \ge A \cdot Y + B, \qquad (19)$$

where $D'$ is obtained by replacing $V_i$ with $Y_i$ in matrix $D$ of (13), that is,

$$(D' \cdot W)_i = \sum_{1 \le j \le m} \frac{Y_j - Y_i}{\rho L_{ij}} W_{ij}. \qquad (20)$$

$M \circ C$ is the vector of charges flowing from the decap into the network through the sampling nodes and $D' \cdot W$ is the vector of the charges flowing through the wire candidates. Both sources of charges could help with pulling the voltage up to the specified threshold.

4.2 The Complete LP Formulation

The complete LP formulation of the co-optimization problem for a violation region is as follows.

$$\begin{aligned}
\text{minimize} \quad & \alpha \cdot \sum_i C_i / d_c + (1 - \alpha) \cdot \sum_{i,j} W_{ij} L_{ij} \qquad (21) \\
\text{subject to} \quad & Y \ge E \\
& M \circ C + D' \cdot W \ge A \cdot Y + B \\
& C_i \le C_{max,i}, \quad \forall\, \text{tile } i \in \text{violation region} \\
& W_{ij} \le W_{max,ij}, \quad \forall\, ij \in \text{wire candidates}
\end{aligned}$$

where variable vector $C = [C_1, C_2, \ldots, C_m]^T$ represents the decap amount to be added at the sampling nodes, and $W = [W_{12}, W_{13}, \ldots]^T$ represents the wire widths to be added between the sampling nodes. Here, the constant parameter $d_c$ is the capacitance density of the decap cells for a given technology. $W_{ij}$ and $L_{ij}$ are the wire width and length, respectively. $\sum_i C_i / d_c$ is the area of additional decap and $\sum_{i,j} W_{ij} L_{ij}$ is the area of additional wires. $C_{max,i}$ is set depending on local congestion and decap density. Routing resource constraints between tile regions are imposed through $W_{max,ij}$. $E$, defined in (9), represents the voltage threshold. $M$, defined in (9), evaluates the charge flowing out of the decap to be added. $D'$, defined in (20), evaluates the charge flowing through the wire candidates.

The constant parameters $D'$ and $M$ here rely on the voltages of the sampling nodes. These are updated at each LP iteration. The decap/wire update procedure is discussed in the next section.

4.3 LP Iterations and Overall Flow

The complete iterative LP and decap/wire update procedure is as follows:

Step 1 Set iteration index $p = 0$ and the initial values of $A^0$, $B^0$, $V^0$, $Y^0$. Here, $A^0$ and $B^0$ are obtained through the macromodeling procedure described in Section 2.2.3. $V^0$ is obtained from the initial transient analysis results. $Y^0$ is calculated as $Y_i^0 = \int_{t_s}^{t_e} V_i^0\,dt$.

Step 2 Update $M^p$ and $D'^p$. By plugging $V^p$ into formula (9), we can get $M^p$. By substituting $Y^p$ into (20), we can get $D'^p$.

Step 3 Determine the decap budget $C^p$ and the wire widths $W^p$ by solving the LP problem (21).

Step 4 Limit the step length. Since $D'^p$ varies with $W_{ij}$, we need to keep the change in $W_{ij}$ small enough so that $D'^p$ is still valid for evaluating $W_{ij}$'s influence on the charge flowing through the wire. This is true also for $C_i$. Given the max step size for capacitance, $C_s$, and the max step size for wire width, $W_s$: if $C_i^p > C_s$, set $C_i^p = C_s$; if $W_{ij}^p > W_s$, set $W_{ij}^p = W_s$.

Step 5 Update $A^p$ and $B^p$ by stamping the new decap $C$ and the new wires $W$ into $A$ and $B$. Stamping $W$ into matrix $A$ is straightforward, and is the same as the regular conductance stamping of MNA [4]. $C$ is stamped into $A^p$ and $B^p$ through the assumption (6) addressed in Section 2.2.4. By substituting $V_i = 2Y_i/(t_e - t_s) - V_{0,i}$, the total charge flowing into node $i$ from the capacitance $C_i$ becomes:

$$C_i (V_{0,i} - V_i) = 2 V_{0,i} C_i - \frac{2 C_i}{t_e - t_s} Y_i$$

After stamping $-2 V_{0,i} C_i$, $\forall i \in SP$, into $B^p$, and $2 C_i/(t_e - t_s)$, $\forall i \in SP$, into the diagonal position of $A^p$, the charge nodal equations of the macromodel become:

$$A^p \cdot Y^p + B^p = 0 \qquad (22)$$

Step 6 Update $Y^p$ and $V^p$. $Y^p$ represents the average voltage multiplied by the time period $[t_s, t_e]$ after the new capacitance and new wires are added, which is obtained by solving the linear system (22). $V^p$ indicates the voltage value at time point $t_e$ after the new capacitance and new wires are added. By applying the approximation (6) of Section 2.2.4, we can get $V_i^p = 2 Y_i^p/(t_e - t_s) - V_{0,i}$.

Step 7 If $V_i^p \ge V_{thre}$, $\forall i \in SP$, then stop; else set $p = p + 1$ and go to Step 2.
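Steps 4 to 6 can be traced numerically on a single sampling node, where the macromodel collapses to the scalar equation $a \cdot Y + b = 0$. The sketch below is an illustration under that simplifying assumption, in normalised units; the function names and numbers are hypothetical and the LP of Step 3 is not solved here.

```python
def clamp_step(value, max_step):
    """Step 4: limit a per-iteration increment to the allowed step size."""
    return min(value, max_step)


def stamp_and_solve(a, b, c_i, v0, t_s, t_e):
    """Steps 5-6 for one sampling node with a scalar macromodel a*Y + b = 0.

    c_i is the decap added at the node; v0 is the node voltage at t_s.
    Returns (Y, V), where V is the estimated voltage at t_e.
    """
    window = t_e - t_s
    a_p = a + 2.0 * c_i / window   # decap contribution on the diagonal of A
    b_p = b - 2.0 * v0 * c_i       # decap contribution stamped into B
    y = -b_p / a_p                 # solve a_p * Y + b_p = 0
    v = 2.0 * y / window - v0      # recover V at t_e from approximation (6)
    return y, v


# Hypothetical node: without decap, V droops from 1.0 to 0.8 over a unit window.
a, b, v0 = 1.0, -0.9, 1.0
_, v_before = stamp_and_solve(a, b, 0.0, v0, 0.0, 1.0)
c_added = clamp_step(3.0, 1.0)             # a proposed 3.0 is limited to 1.0
y_after, v_after = stamp_and_solve(a, b, c_added, v0, 0.0, 1.0)
```

The result satisfies the charge balance $C_i (V_{0,i} - V_i) = a \cdot Y + b$, which is the scalar form of the relaxed constraint (5) holding with equality.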

Note that the above iterations are performed on the reduced network that consists only of the sampling nodes of each individual violation region. Since $|W| < 2m$, and all the other vectors are of dimension $m$, the computations for the LP in Step 3 and the linear solutions in Step 6 are in the order of $m$.

Similar to algorithm [10], we iteratively run transient analysis, identify the violation regions and optimize for each individual violation region until no new violation regions are found. From column 9 of Table 1, we can see that voltage constraints are satisfied typically in 1-2 iterations.

5. EXPERIMENTAL RESULTS

The proposed techniques were implemented in our in-house power grid analysis tool [2]. An efficient direct linear solver based on Cholesky factorization was used for macromodeling, and a public domain linear programming solver, lpsolve [1], was used to solve the linear programs. All experiments were performed on Linux machines with 3.2GHz CPUs and 1GB-2GB of memory.
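For readers unfamiliar with Cholesky-based direct solution, the idea can be sketched in a few lines of dense, pure-Python code. This is only an illustration on a small symmetric positive definite matrix with hypothetical values; a production power grid solver would use sparse data structures and ordering, which are not shown here.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def cholesky_solve(low, b):
    """Solve (L * L^T) x = b using forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):                       # forward: L z = b
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):           # backward: L^T x = z
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - s) / low[i][i]
    return x


# Hypothetical 3-node conductance-like matrix and right-hand side.
G = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
rhs = [2.0, 4.0, 10.0]

L = cholesky(G)
x = cholesky_solve(L, rhs)
print(x)
```

The factor `L` is computed once and can be reused for many right-hand sides, which is the usual reason a direct factorization suits repeated solves on a fixed grid.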

We benchmarked the performance of the proposed technique using global power networks from 2 real designs. The number of nodes, supply voltage, number of time steps and CPU time for running dynamic analysis are listed in Table 2.

Table 1: Comparison of decap/wire co-optimization method with decap-only method

Table 2: The circuits information

Table 1 compares the results of the decap/wire co-optimization method with the results of decap allocation only. Most of the decap-only results were obtained by the decap optimization algorithm [10], except for the results for decap constraints of 20 fF/µm² and 15 fF/µm² of Chip-2. Those two results were obtained by using the method of uniform decap allocation and gradual expansion of the insertion region, as described in Section 1. For these two cases, the uniform allocation method generates better results than [10]. The voltage thresholds of both circuits were set to 90% of the supply voltage. To illustrate the trend for the decap amount required, we list the results for various decap constraints and assume the decap constraints are the same across the chip.
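The uniform allocation with gradual expansion mentioned above can be sketched as follows. This is a toy under stated assumptions: `meets_threshold` stands in for a transient analysis check, and `toy_check` uses an invented rule in which decap placed farther from the worst node counts for less (weight 1/(1 + ring)). The cell area, the required amount and the one-cell-per-ring layout are hypothetical values, not data from the paper.

```python
def uniform_expand(constraint, cell_area, cells_per_ring, meets_threshold):
    """Fill rings around the worst node to the allowed maximum until the check passes.

    constraint: maximum decap per unit area (e.g. fF/um^2)
    Returns (total_decap, rings_used), or None if the rings run out.
    """
    placed = []                                   # decap placed in each ring
    for cells in cells_per_ring:
        placed.append(constraint * cell_area * cells)
        if meets_threshold(placed):
            return sum(placed), len(placed)
    return None


def toy_check(placed, required=2500.0):
    """Invented stand-in for transient analysis: distant decap is less effective."""
    effective = sum(cap / (1 + ring) for ring, cap in enumerate(placed))
    return effective >= required


cell_area = 100.0                                 # um^2 per cell (hypothetical)
rings = [1] * 20                                  # expand one cell at a time
results = {c: uniform_expand(c, cell_area, rings, toy_check)
           for c in (20.0, 15.0, 10.0)}
for constraint, outcome in results.items():
    print(constraint, outcome)
```

Even in this crude model, tightening the constraint forces the insertion region to grow and the total decap to rise, because the extra capacitance lands where it helps least.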

In Table 1, the specified decap constraints are in Column 2. The amount of decap added and the total area with the decap-only method are listed in Columns 3 and 4, respectively. The amount of decap added, the area of wires added, the total area of decaps and wires, the CPU time, the number of external iterations and the number of internal iterations of our co-optimization method are listed in Columns 5-10. The percentage of total area reduction is shown in Column 11. The CPU time here includes the time for the entire optimization flow, including the pre-optimization transient analysis, decap/wire co-optimization, and the post-optimization transient analysis verification. The number of external iterations refers to the number of optimization and transient analysis verification iterations. The total number of transient analysis runs is #extiter + 1. The number of internal iterations refers to the iterations of LP (Steps 1-7) addressed in Section 4.3. Listed in Column 10 is the average number of internal iterations for all violation regions.
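The derived columns can be computed as in the short sketch below. It assumes that the area reduction is expressed relative to the decap-only area, which the text does not state explicitly; the function names and the numbers are hypothetical and are not taken from Table 1.

```python
def total_area(decap_area, wire_area):
    """Total area of the co-optimized solution: decaps plus added wires."""
    return decap_area + wire_area


def area_reduction_percent(area_decap_only, area_co_opt):
    """Percentage area saved by co-optimization, relative to decap-only (assumed basis)."""
    return 100.0 * (area_decap_only - area_co_opt) / area_decap_only


# Hypothetical areas in um^2, for illustration only.
co_opt_area = total_area(decap_area=1200.0, wire_area=300.0)
reduction = area_reduction_percent(area_decap_only=2000.0,
                                   area_co_opt=co_opt_area)
print(co_opt_area, reduction)
```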

From Table 1, we can see that the decap and wire co-optimization method always uses less area than the decap-only method. When the space around the worst voltage spot is adequate, the co-optimization method gives moderately better results than the decap-only method. But, when current demand is dramatic and the capacitance constraints are tight, co-optimization shows a much greater advantage. In Chip-2, we found, somewhat to our surprise, that all three violation regions could be easily fixed with a couple of wire enhancements. Additional decap was not needed. This situation is illustrated in Figure 5. The node A (on Metal 2) in a row of standard cells was originally the worst IR drop node since its connection to M4 was detoured. The detour involved thin M2 segments (inside the standard cells) which were abutted to one another. With the addition of a few M3 strips to that region, the voltage at A was improved without adding any decap in or around that region.
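The effect of the added strips can be seen with a back-of-the-envelope resistance calculation. This is a static simplification with hypothetical values: the detour through thin M2 is lumped into one resistance, the added M3 strips into another, and the two are treated as parallel paths from M4 to node A.

```python
def parallel(r1, r2):
    """Equivalent resistance of two parallel paths."""
    return r1 * r2 / (r1 + r2)


def ir_drop(current, resistance):
    """Static voltage drop across a resistance carrying the given current."""
    return current * resistance


# Hypothetical values: 50 mA demand, 2.0 ohm M2 detour, 0.5 ohm M3 strip path.
current = 0.05
r_detour = 2.0
r_strip = 0.5

drop_before = ir_drop(current, r_detour)
drop_after = ir_drop(current, parallel(r_detour, r_strip))
print(drop_before, drop_after)
```

With these numbers the drop falls from about 0.1 V to about 0.02 V, without any change in capacitance, which mirrors the observation that wiring alone fixed the violation.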

Figure 5: Illustration of Chip-2's violation region

6. REFERENCES

[1] M. R. C. M. Berkelaar, et al. LP SOLVE 5.5 Users' Manual. 2005.

[2] A. Dharchoudhury, et al. Design and analysis of power distribution networks in PowerPC microprocessors. In DAC, pages 738-743, 1998.

[3] J. Fu, et al. A fast decoupling capacitor budgeting algorithm for robust on-chip power delivery. In ASP-DAC, pages 505-510, 2004.

[4] C. Ho, A. Ruehli, and P. Brennan. The modified nodal approach to network analysis. IEEE Trans. Circuits and Systems, CAS-22(6):504-509, 1975.

[5] H. Li, et al. Partitioning-based approach to fast on-chip decoupling capacitor budgeting and minimization. In DAC, pages 170-175, 2005.

[6] Z. Qi, et al. On-chip decoupling capacitor budgeting by sequence of linear programming. In Proceedings of 6th International Conference on ASIC, 2005.

[7] N. A. Sherwani. Algorithms for VLSI Physical Design Automation. Springer, 3rd Edition, 1998.

[8] H. Su, S. S. Sapatnekar, and S. R. Nassif. An algorithm for optimal decoupling capacitor sizing and placement for standard cell layouts. In ISPD, pages 68-73, 2002.

[9] K. Wang and M. M. Sadowska. On-chip power supply network optimization using multigrid-based technique. In DAC, pages 113-118, 2003.

[10] M. Zhao, et al. A fast on-chip decoupling capacitance budgeting algorithm using macromodeling and linear programming. In DAC, 2006.

[11] M. Zhao, et al. Hierarchical analysis of power distribution networks. TCAD, Feb. 2002.
