An Efficient Spectral Graph Sparsification Approach to Scalable Reduction of Large Flip-Chip Power Grids

Xueqian Zhao
Department of ECE
Michigan Technological University
Houghton, Michigan 49931
Email: [email protected]

Zhuo Feng
Department of ECE
Michigan Technological University
Houghton, Michigan 49931
Email: [email protected]

Cheng Zhuo
Intel Corporation
2501 NW 229th Ave
Hillsboro, OR 97124
Email: [email protected]

Abstract—Existing state-of-the-art realizable RC reduction methods may not be suitable for scalable power grid reduction due to their fast-growing computational complexity and the large number of ports. In this work, we present a scalable method for reducing large-scale flip-chip power grids based on recent spectral graph sparsification techniques. The first step of the proposed approach aggressively reduces the large power grid blocks into much smaller blocks by properly matching the effective resistances of the original power grid networks. Next, an efficient spectral graph sparsification scheme is introduced to dramatically sparsify the relatively dense power grid blocks generated during the previous step. Finally, an effective grid compensation scheme is proposed to further improve the model accuracy of the reduced and sparsified power grid. Since the reduction of each power grid block can be performed independently, our method can be easily accelerated on parallel computers, and is therefore expected to be capable of handling large power grid designs as well as incremental designs. Extensive experimental results show that our method scales linearly with power grid size and efficiently reduces industrial power grid sizes by 20X without much loss of accuracy in both DC and transient analysis.

I. INTRODUCTION

As relentless technology scaling reaches into the nanoscale regime, the design and verification of large power delivery networks (PDNs) have been hindered by the ever-increasing design complexity imposed by the huge number of components. Since direct modeling and simulation of large PDNs can be computationally very expensive or even intractable, a significant amount of research effort has been made in recent years to reduce the original large power grid systems into much smaller ones that can be used for much faster circuit simulations during the design and verification of large PDNs.

Several model order reduction algorithms proposed in the past decades can be used for power grid reduction. They typically fall into the following categories: Krylov-subspace based model order reduction methods via moment matching [1]–[3], nodal elimination methods [4], [5], and multigrid-like reduction methods [6], [7]. Unfortunately, existing model order reduction methods may face many difficulties when reducing large-scale flip-chip power grids that include hundreds or thousands of ports. Krylov-subspace based model order reduction methods via moment matching are not realizable, and their reduction efficiency may be limited when handling a large number of ports. The TIme-Constant Equilibration Reduction (TICER) algorithm is a realizable linear circuit reduction method, and has been very successful in reducing tree-like RC or RLC networks by eliminating low-degree nodes and connecting new edges between existing nodes [4], [5]. However, reducing flip-chip power grid circuits with massive mesh-structured interconnects using TICER may introduce too many new edges (e.g., even more edges than the original power grid), making the reduced power grid models extremely difficult to use. In [8], the authors proposed a hierarchical power grid analysis method that applies a Schur complement-based reduction algorithm and a 0-1 knapsack sparsification technique to each block of the grid. Although it can achieve very good accuracy, it may still suffer from heavy computational cost. Since each reduced block can contain hundreds to thousands of ports while the Schur complement reduction builds a near-complete graph, the reduced block may include more than hundreds of thousands of edges. Multigrid-like power grid reduction methods [6], [7] can quickly reduce the number of power grid nodes as well as edges through multilevel node reductions/aggregations based on electrical connectivity properties, and generate realizable power grid models, but these methods typically lack effective error prediction and control schemes that are critical for accuracy-aware power grid reductions.

Fig. 1. Proposed scalable power grid reduction approach based on recent spectral graph sparsification techniques. [Diagram labels: Original Power Grid; Block Reduction & Model Stitching; Spectral Graph Sparsification; Spectral Sparsification & Model Compensation; Reduced Power Grid.]

In this work, we propose an efficient spectral graph sparsification approach to scalable power grid reduction by introducing an effective resistance-oriented port-merging scheme and a spectral graph sparsification technique [9]. It is well known that although the Schur complement method can reduce a large power grid into a much smaller network [6], [10], solving the resultant reduced but extremely dense grid may cost even more runtime and memory than solving the original system. To address this issue, we propose a spectral graph sparsification method based on recent spectral graph sparsification theory [9] to sparsify a dense graph into a much sparser one while maintaining almost the same spectral characteristics. However, directly applying spectral graph sparsification to large power grids with more than hundreds of thousands of elements can be computationally intensive due to the high cost of computing edge probabilities and creating the sparsifier [9]. In this work, as illustrated in Fig. 1, a divide-and-conquer power grid reduction and sparsification approach that includes the following key steps is exploited: 1) the original large power grid is first partitioned into a massive number of smaller power grid blocks; 2) by merging outgoing ports that exhibit small effective resistances and using the Schur complement method [10], each individual power grid block can be efficiently reduced independently; 3) the reduced dense grid blocks are subsequently sparsified using the proposed spectral graph sparsification scheme based on recent spectral graph sparsification theory [9]; 4) finally, the reduced and sparsified block power grid models are stitched together to form the final power grid model.

978-1-4799-6278-5/14/$31.00 ©2014 IEEE

This new power grid reduction flow has the following advantages when compared with previous reduction techniques:

1) Since the proposed power grid reduction and sparsification procedures can be applied independently to each grid block, the flow can efficiently handle very large power grid designs with a massive number of ports, and can be easily accelerated on parallel computers.

2) This approach inherently allows for very efficient on-demand power grid model updates, and thus can facilitate very fast incremental power grid design and optimization. Since only the modified small power grid blocks need to be reprocessed by the proposed block reduction and sparsification methods, the updated reduced power grid model can be quickly generated by stitching these new block power grid models with the existing unmodified ones.

Our experimental results show that the proposed reduction method scales linearly with power grid size and efficiently reduces the sizes of large industrial power grid designs by a factor of 20 without much loss of accuracy in both DC and transient analysis. We want to emphasize that although this work mainly focuses on RC power grid reduction problems, there is good potential to extend this technique to handle RLC power grid reductions in the future.

The rest of this paper is organized as follows. The background of the spectral graph sparsification technique, as well as power grid modeling and simulation, is reviewed in Section II. In Section III, we introduce an efficient spectral graph sparsification approach to scalable power grid reduction by applying an effective resistance-oriented port-merging scheme, a Schur complement-based block grid reduction method, and a spectral graph sparsification technique with a graph scaling (GS) method. Section IV presents extensive experimental results for a set of IBM benchmarks to validate the proposed approach, followed by the conclusion of this work in Section V.

II. BACKGROUND

A. Power Grid Modeling and Analysis

Power grid analysis usually falls into two categories: DC (steady-state) analysis and transient simulation. Power grid modeling for DC and transient (TR) simulations of a circuit with $n$ unknowns can be formulated using nodal analysis (NA) as follows [11]:

$$\text{DC}: \quad G\vec{x} = \vec{b}, \tag{1}$$

$$\text{TR}: \quad C\frac{d\vec{x}(t)}{dt} + G\vec{x}(t) = \vec{b}(t), \tag{2}$$

where $G \in \mathbb{R}^{n \times n}$ denotes the conductance matrix, $C \in \mathbb{R}^{n \times n}$ denotes the capacitance matrix, $\vec{x} \in \mathbb{R}^{n}$ is the solution vector storing all the nodal voltages, and $\vec{b} \in \mathbb{R}^{n}$ is the input excitation vector including all excitation sources. In steady-state DC analysis, power grids are modeled as a network of pure resistors with excitation sources such as current and voltage sources. It can also be shown that transient analysis can be performed by first replacing the energy-storage components such as capacitors or inductors with companion models that include resistors and current/voltage sources through backward Euler (BE) or trapezoidal (TR) approximations [12], and subsequently solving the equivalent DC problems.
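As a minimal sketch of how (1) and (2) are used in practice, the following Python/NumPy snippet solves the DC system and steps the transient system with backward Euler, which turns each time step into an equivalent DC solve. The 3-node RC ladder, its element values, and the function names `solve_dc` and `solve_transient_be` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Hypothetical 3-node RC ladder: a 1 V supply feeds node 0 through a 1-ohm
# resistor (stamped as its Norton equivalent), neighbouring nodes are joined
# by 1-ohm resistors, and every node has a 1 F capacitor to ground.
G = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])
C = np.eye(3)
b = np.array([1.0, 0.0, 0.0])   # Norton current injected at node 0


def solve_dc(G, b):
    """DC analysis, eq. (1): solve G x = b."""
    return np.linalg.solve(G, b)


def solve_transient_be(G, C, b, x0, h, steps):
    """Backward Euler on eq. (2): (G + C/h) x_{k+1} = b + (C/h) x_k."""
    A = G + C / h                       # fixed as long as the step h is fixed
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = np.linalg.solve(A, b + (C / h) @ x)
    return x


x_dc = solve_dc(G, b)
x_tr = solve_transient_be(G, C, b, np.zeros(3), h=0.05, steps=2000)
```

With a constant excitation and no load currents, the transient solution settles to the DC solution, which is a convenient sanity check for a companion-model implementation.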

B. Spectral Graph Sparsification

Spectral graph sparsification approximates a graph by a sparse subgraph that well preserves the Laplacian quadratic form of the original graph for any real vector input [9]. The Laplacian quadratic form of a weighted graph $A = (V, E, w)$ is given by

$$\vec{x}^T L_A \vec{x} = \sum_{(u,v) \in E} w_{(u,v)} \left(\vec{x}(u) - \vec{x}(v)\right)^2, \tag{3}$$

where $\vec{x} \in \mathbb{R}^{V}$ is the real vector input, $w_{(u,v)}$ is the weight of the edge connecting vertices $u$ and $v$, and $L_A$ is the Laplacian matrix defined by

$$L_A(u, v) = \begin{cases} -w(u, v), & \text{if } u \neq v, \\ \sum_{z} w(u, z), & \text{if } u = v. \end{cases} \tag{4}$$

Spectral graph sparsification can well capture the spectral similarities between the original graph $A$ and its sparsifier $\tilde{A}$, which is said to be a $\sigma$-spectral approximation of $A$ if for all $\vec{x} \in \mathbb{R}^{V}$ the following conditions hold [9]:

$$\frac{1}{\sigma}\, \vec{x}^T L_{\tilde{A}} \vec{x} \le \vec{x}^T L_A \vec{x} \le \sigma\, \vec{x}^T L_{\tilde{A}} \vec{x}. \tag{5}$$

It can be shown that $\vec{x}^T L_A \vec{x}$ also denotes the power dissipation of a linear resistive network with conductance matrix $L_A$ when $\vec{x}$ holds the node voltages. It was also shown in [9] that for every weighted graph $A = (V, E, w)$ and every $\varepsilon > 0$, there is a re-weighted subgraph of $A$ with $O(n/\varepsilon^2)$ edges that is a $(1 + \varepsilon)$ approximation of $A$.
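The relation between the Laplacian matrix and the edge sum in (3), and the meaning of the bound in (5), can be checked numerically. The sketch below is illustrative only: the 4-vertex graph, the helper names, and the random-vector estimate of $\sigma$ are assumptions made for this example, and the estimate is merely an empirical lower bound on the true $\sigma$.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted graph Laplacian per eq. (4); edges are (u, v, weight) tuples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


def quadratic_form_edges(edges, x):
    """Right-hand side of eq. (3): sum of w * (x(u) - x(v))^2 over all edges."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)


def spectral_sigma(L_orig, L_sparse, trials=1000, seed=0):
    """Empirical lower bound on sigma in eq. (5), from random test vectors."""
    rng = np.random.default_rng(seed)
    sigma = 1.0
    for _ in range(trials):
        x = rng.standard_normal(L_orig.shape[0])
        ratio = (x @ L_orig @ x) / (x @ L_sparse @ x)
        sigma = max(sigma, ratio, 1.0 / ratio)
    return sigma


# Hypothetical 4-vertex graph; the last edge is dropped to form a crude "sparsifier".
edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 3.0), (3, 0, 1.0), (0, 2, 0.5)]
L = laplacian(4, edges)
x = np.array([1.0, 0.0, -1.0, 2.0])
energy = float(x @ L @ x)                      # equals the edge sum in eq. (3)
sigma_est = spectral_sigma(L, laplacian(4, edges[:-1]))
```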

In practical applications, spectral graph sparsifiers can be obtained using the following procedure [13]: 1) the Laplacian matrix of the graph is first factorized once and solved $m$ times in order to compute the effective resistances of the $m$ edges; 2) the probability for sampling each edge is computed based on its effective resistance and edge weight; 3) each edge is then sampled many times using its corresponding probability to form the sparsifier. It is shown in [13] that to create a $(1 \pm \varepsilon)$ sparsifier of a graph $A$, each edge of the graph $A$ has to be sampled $M = O(n \log n / \varepsilon^2)$ times, where $n$ is the number of vertices of graph $A$. For each sample, edge $e$ is assigned a probability $p_e$ proportional to $q_e = w_e R^{\mathrm{eff}}_A(e)$, where $w_e$ is the conductance of $e$, and $R^{\mathrm{eff}}_A(e)$ is the effective resistance across $e$. If $e$ is selected in a sample, we add it to the sparsifier with weight $w_e / (M p_e)$. Therefore the updated weight for edge $e$ can be computed by

$$w'_e = \frac{K w_e}{M p_e}, \tag{6}$$

if edge $e$ is selected $K$ times in $M$ samples.

The effective resistance $R^{\mathrm{eff}}_A(e)$ on edge $e$ that connects vertices $a$ and $b$ of graph $A$ can be simply written as:

$$R^{\mathrm{eff}}_A(e) = (\vec{e}_a - \vec{e}_b)^T L_A^{-1} (\vec{e}_a - \vec{e}_b), \tag{7}$$

where $\vec{e}_a$ and $\vec{e}_b$ are the $a$-th and $b$-th unit vectors, respectively.
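A small numerical sketch of (7) follows. It assumes an ungrounded graph Laplacian, which is singular, so the Moore-Penrose pseudoinverse (`numpy.linalg.pinv`) stands in for $L_A^{-1}$; this substitution, the dense pseudoinverse in place of a one-time factorization, the triangle graph, and the function name are assumptions of the example rather than the authors' implementation.

```python
import numpy as np


def effective_resistances(n, edges):
    """Effective resistance across each edge, following eq. (7).

    edges are (u, v, conductance) tuples. The Laplacian of an ungrounded
    graph is singular, so its pseudoinverse is used in place of the inverse.
    """
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    L_pinv = np.linalg.pinv(L)
    r_eff = []
    for u, v, _ in edges:
        d = np.zeros(n)
        d[u], d[v] = 1.0, -1.0          # the vector e_a - e_b
        r_eff.append(d @ L_pinv @ d)
    return np.array(r_eff)


# Triangle with unit conductances: each edge is 1 ohm in parallel with 2 ohms.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
r_eff = effective_resistances(3, edges)
weights = np.array([w for _, _, w in edges])
q = weights * r_eff                      # unnormalized sampling scores q_e
p = q / q.sum()                          # sampling probabilities p_e
```

For a connected graph the scores $q_e$ sum to $n - 1$, which gives a quick consistency check on any effective-resistance routine.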

III. SCALABLE POWER GRID REDUCTION APPROACH

A. Overview of Our Method

Our power grid reduction approach is illustrated in Fig. 1 and includes the following key steps: 1) the original power grid is first geometrically partitioned into many smaller blocks, where the block sizes are determined based on the desired model accuracy and complexity; 2) the outgoing ports of each power grid block are kept or merged based on their effective resistance values, as shown in Fig. 2; 3) an efficient network reduction method [10] based on the Schur complement is applied to obtain the reduced yet dense conductance matrices corresponding to the reduced power grid blocks; 4) the reduced dense grid blocks are significantly sparsified using the proposed spectral graph sparsification technique, and subsequently stitched together to form the final reduced model, which is usually much smaller than the original power grid. When dealing with RC power grids, grounded capacitors can be evenly distributed within each reduced grid block to match the total grounded capacitance of the corresponding block in the original power grid.

Fig. 2. Port merging based on effective resistance. [Left panel: original block and ports (Block i with ports a, b, c, d, 1 and 2 on its sides). Right panel: ports merged based on effective resistance (a, b and c merged into port a'; 1 and 2 merged into port 1'; port d kept).]

B. Block Power Grid Reduction

For a power grid network with $n$ nodes and $m$ current sources, (1) can be rewritten as follows [10]:

$$\begin{bmatrix} G_{11} & G_{12} \\ G_{12}^T & G_{22} \end{bmatrix} \begin{bmatrix} \vec{v}_p \\ \vec{v}_{np} \end{bmatrix} = \begin{bmatrix} B \\ 0 \end{bmatrix} \vec{i}_p, \tag{8}$$

where $G_{11} \in \mathbb{R}^{m \times m}$ can be considered as the port block, $G_{12} \in \mathbb{R}^{m \times (n-m)}$ is a block denoting the connections between port nodes and non-port nodes, $G_{22} \in \mathbb{R}^{(n-m) \times (n-m)}$ denotes the non-port block, $\vec{v}_p \in \mathbb{R}^{m}$ and $\vec{v}_{np} \in \mathbb{R}^{n-m}$ are the voltage vectors for port and non-port nodes, respectively, $B \in \mathbb{R}^{m \times m}$ is the incidence matrix for current sources, and $\vec{i}_p \in \mathbb{R}^{m}$ is the input excitation vector. If the $k$-th current source of $\vec{i}_p$ is injected at node $a$, then $B(a, k) = 1$.

If we keep only the port nodes and eliminate all the non-port nodes, a much smaller equivalent system can be obtained by computing the Schur complement of $G_{22}$ [10]:

$$(G_{11} - G_{12} G_{22}^{-1} G_{12}^T)\,\vec{v}_p = B\,\vec{i}_p. \tag{9}$$

Since $G_{22}$ is a symmetric positive definite (SPD) matrix, the equivalent system conductance matrix $G_{eq}$ can be computed by:

$$G_{eq} = G_{11} - G_{12} (L^{-1})^T L^{-1} G_{12}^T, \tag{10}$$

where the Cholesky decomposition gives $G_{22} = LL^T$.

Unfortunately, in many cases, eliminating all the non-port nodes of a graph results in a reduced but dense conductance matrix $G_{eq}$ that corresponds to a nearly complete graph. When the Schur complement method is used to reduce large-scale power grids that contain millions of grid nodes and thousands of port nodes, eliminating all the non-port nodes leads to a much smaller system but also to much denser conductance matrices that cannot be handled efficiently during circuit simulations. Even for the partitioned grid blocks, the number of ports on the boundaries can still be high, making the final reduced model rather costly to use.
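The Cholesky-based reduction in (10) and the resulting fill-in can be illustrated on two toy graphs. The sketch below uses dense NumPy matrices; the function names, the 5-node chain, and the star graph are hypothetical examples, and a production implementation would presumably rely on a sparse Cholesky package and triangular solves instead of the general solver used here.

```python
import numpy as np


def laplacian(n, edges):
    """Dense conductance (Laplacian) matrix from (u, v, conductance) tuples."""
    G = np.zeros((n, n))
    for u, v, w in edges:
        G[u, u] += w
        G[v, v] += w
        G[u, v] -= w
        G[v, u] -= w
    return G


def schur_reduce(G, ports):
    """Eliminate all non-port nodes following eqs. (9)-(10)."""
    ports = list(ports)
    port_set = set(ports)
    rest = [k for k in range(G.shape[0]) if k not in port_set]
    G11 = G[np.ix_(ports, ports)]
    G12 = G[np.ix_(ports, rest)]
    G22 = G[np.ix_(rest, rest)]
    L = np.linalg.cholesky(G22)          # G22 = L L^T
    Y = np.linalg.solve(L, G12.T)        # Y = L^{-1} G12^T
    return G11 - Y.T @ Y                 # G11 - G12 (L^{-1})^T L^{-1} G12^T


# Chain 0-1-2-3-4 with unit conductances: four 1-ohm resistors in series
# collapse to a single edge of conductance 1/4 between the two ports.
G_chain = schur_reduce(laplacian(5, [(k, k + 1, 1.0) for k in range(4)]), [0, 4])

# Star with centre node 4: eliminating the centre turns 4 edges into the
# 6 edges of a complete graph on the 4 ports, illustrating the fill-in.
G_star = schur_reduce(laplacian(5, [(4, k, 1.0) for k in range(4)]), [0, 1, 2, 3])
num_star_edges = int(np.count_nonzero(np.triu(G_star, 1)))
```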

C. An Efficient Port-Merging Scheme

Fig. 3. Adaptive sparsification control based on average weighted degree. [Example with given threshold 0.9 and awd_orig = 1.7: awd_smp = 1.2 after M samples and 1.4 after M+m samples, both below 0.9×awd_orig; awd_smp = 1.6 after M+2m samples, above 0.9×awd_orig.]

In this work, the following port-merging scheme based on effective resistance computations is used to quickly reduce the number of ports of each power grid block: if a resistive path within a power grid block connects two boundary nodes (port nodes), and its effective resistance is not greater than a given threshold thresR, then the two port nodes can be merged into a single port node. A larger thresR value results in fewer ports and thus lower model complexity, at the cost of increased model error. For instance, if the model accuracy after merging is not satisfactory, we can progressively decrease the threshold thresR such that fewer boundary nodes are merged and better model accuracy is obtained. To better illustrate the proposed port-merging scheme, consider the example shown in Fig. 2. Since there exist resistive paths connecting port nodes a, b and c, and the effective resistance values between any two of these port nodes are smaller than thresR, a, b and c can be merged into a single port node. However, since we cannot find any path that connects to port node d, node d is kept as a separate port node.
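One possible way to organize the port-merging rule is sketched below in plain Python. It assumes the pairwise effective resistances between ports of a block have already been computed; grouping ports transitively with a union-find structure is a design choice of this sketch, and the function name, the dictionary layout, and the resistance values are hypothetical.

```python
def merge_ports(ports, r_eff, thres_r):
    """Group ports whose pairwise effective resistance is at most thres_r.

    r_eff maps a pair (p, q) to the effective resistance of the resistive
    path between ports p and q inside the block; pairs without such a path
    are simply absent. Groups are formed transitively via union-find, which
    is an assumption of this sketch.
    """
    parent = {p: p for p in ports}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]   # path halving
            p = parent[p]
        return p

    for (a, b), r in r_eff.items():
        if r <= thres_r:
            parent[find(a)] = find(b)

    groups = {}
    for p in ports:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical values mimicking Fig. 2: a, b and c are tightly connected,
# while d has no resistive path to the other ports within the block.
ports = ["a", "b", "c", "d"]
r_eff = {("a", "b"): 0.2, ("b", "c"): 0.3, ("a", "c"): 0.4}
merged_loose = merge_ports(ports, r_eff, thres_r=0.5)
merged_tight = merge_ports(ports, r_eff, thres_r=0.25)
```

Lowering the threshold from 0.5 to 0.25 keeps c as a separate port, which matches the accuracy/complexity trade-off described in the text.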

D. A Block Model Sparsification Technique

The above port-merging scheme can effectively reduce the number of port nodes within a power grid block, and thus the reduced model size. However, if the number of port nodes after merging is still large, using the Schur complement can result in rather dense reduced models that are very expensive to use. To this end, we propose a novel model sparsification method that significantly reduces the model complexity via an iterative spectral graph sparsification scheme based on recent spectral graph sparsification theory [9].

As introduced in Section II, a spectral graph sparsifier can be formed by sampling edges according to their effective resistances. In practice, however, the resultant graph sparsity and approximation accuracy are highly dependent on the number of samples. According to (6), when the number of samples is large enough, nearly all the edges will be sampled many times, resulting in a spectral sparsifier that has almost the same graph density as the original graph. On the other hand, if only a small number of samples are used, many edges may not be sampled at all, leading to a sparsifier that is very sparse and may not approximate the spectral properties of the original graph well.

To control the reduced block sparsity while maintaining the desired accuracy, the weighted degree metric is examined [14]. For a graph $A_{graph}$ given by its Laplacian matrix, the weighted degree $wd(v)$ of a vertex $v \in A_{graph}$ is defined as the ratio [14]:

$$wd(v) = \frac{vol(v)}{\max_{u \in N(v)} w(u, v)}, \tag{11}$$

where $vol(v)$ represents the total weight incident to the vertex $v$, $N(v)$ represents the set of vertices adjacent to $v$, and $w(u, v)$ represents the weight of the edge connecting vertex $v$ and vertex $u$. The average weighted degree of the graph can be calculated by [14]

$$awd(A_{graph}) = \frac{1}{n} \sum_{v \in V} wd(v), \tag{12}$$

where $V$ denotes the set of all the vertices in graph $A_{graph}$, and $n$ is the total number of vertices.
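The weighted degree metric in (11) and (12) is straightforward to compute from an edge list. The following plain-Python sketch assumes every vertex has at least one incident edge; the function names and the triangle example are illustrative only.

```python
def weighted_degrees(n, edges):
    """wd(v) = vol(v) / max incident edge weight, as in eq. (11).

    edges are (u, v, weight) tuples; every vertex is assumed to have at
    least one incident edge, otherwise the ratio is undefined.
    """
    vol = [0.0] * n
    w_max = [0.0] * n
    for u, v, w in edges:
        for node in (u, v):
            vol[node] += w
            w_max[node] = max(w_max[node], w)
    return [vol[v] / w_max[v] for v in range(n)]


def average_weighted_degree(n, edges):
    """awd of the graph, as in eq. (12)."""
    return sum(weighted_degrees(n, edges)) / n


# Triangle with one heavy edge: wd = 3/2, 3/2 and 2/1 for vertices 0, 1, 2.
edges = [(0, 1, 2.0), (1, 2, 1.0), (0, 2, 1.0)]
wd = weighted_degrees(3, edges)
awd = average_weighted_degree(3, edges)
```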

During the spectral sparsification step, we repeatedly apply the weighted degree metric to determine the number of samples used for graph sparsification: we require that the average weighted degree of the sparsified graph be close enough to that of the original graph. As shown in Fig. 3, for a given threshold $\theta$, we progressively add new samples for computing the sparsified graph until its average weighted degree satisfies $awd_{smp} > \theta \times awd_{orig}$, where $awd_{orig}$ denotes the average weighted degree of the original graph.

Fig. 4. Spectral graph-based block sparsification with conductance-based graph scaling. [Panels: original reduced block; after spectral sparsification; after edge scaling. Legend: port, original edge of reduced grid, sampled edges, scaled sampled edges.]

We propose the following procedure for computing the spectral graph sparsifier of the reduced power grid block $i$:

1) Build the Laplacian matrix $G_i$ of reduced block $i$ and compute the average weighted degree $awd^{i}_{orig}$.

2) Factorize $G_i$ once and store the factors.

3) For each edge $j$, compute the effective resistance $R^{\mathrm{eff}}_j$ by (7).

4) Compute the initial probability by $p_{e_j} = w_{e_j} R^{\mathrm{eff}}_j$, where $w_{e_j}$ is the conductance of edge $j$. Sum up $p_{e_j}$ over all edges to compute the total sampling probability $totp_e$.

5) Update the sampling probability of each edge so that it is proportional to the original one: $p_{e_j} = p_{e_j} / totp_e$ [15].

6) Sample each edge $M$ times: each time, generate a random number $rn_j$ in the range 0 to 1; if $rn_j$ is less than or equal to $p_{e_j}$, the corresponding edge sample is selected. If edge $j$ is selected $K$ times in $M$ samples, edge $j$ is added to the sparsifier with a total weight of $K w_{e_j} / (M p_{e_j})$.

7) Progressively add new samples to compute the updated sparsified grid block until its average weighted degree value matches the original average weighted degree well.
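The seven steps above can be sketched as follows. This is a simplified illustration under several assumptions that do not come from the paper: an edge sample is taken to be selected with probability $p_{e_j}$, so a batch of per-edge trials is drawn as a binomial count; the pseudoinverse replaces the factorization of a possibly singular block Laplacian; isolated vertices of the sparsified graph contribute zero to the average weighted degree; and the names `sparsify_block`, `batch`, and `max_rounds` are hypothetical.

```python
import numpy as np


def awd(n, edges):
    """Average weighted degree, eqs. (11)-(12); edges maps (u, v) -> weight.
    Vertices without incident edges contribute zero (an assumption)."""
    vol = np.zeros(n)
    w_max = np.zeros(n)
    for (u, v), w in edges.items():
        vol[u] += w
        vol[v] += w
        w_max[u] = max(w_max[u], w)
        w_max[v] = max(w_max[v], w)
    mask = w_max > 0
    return float(np.sum(vol[mask] / w_max[mask]) / n)


def sparsify_block(n, edges, theta=0.9, batch=None, max_rounds=100, seed=0):
    """Sample edges by w * R_eff until awd_smp > theta * awd_orig."""
    rng = np.random.default_rng(seed)
    keys = list(edges)
    w = np.array([edges[k] for k in keys])
    L = np.zeros((n, n))                               # step 1
    for (u, v), wk in edges.items():
        L[u, u] += wk
        L[v, v] += wk
        L[u, v] -= wk
        L[v, u] -= wk
    L_pinv = np.linalg.pinv(L)                         # stands in for step 2
    r_eff = np.array([L_pinv[u, u] + L_pinv[v, v] - 2 * L_pinv[u, v]
                      for u, v in keys])               # step 3, eq. (7)
    p = w * r_eff                                      # step 4
    p = p / p.sum()                                    # step 5
    awd_orig = awd(n, edges)
    batch = batch or n                                 # hypothetical batch size
    counts = np.zeros(len(keys), dtype=int)
    M = 0
    for _ in range(max_rounds):
        counts += rng.binomial(batch, p)               # step 6: `batch` trials per edge
        M += batch
        sparse = {keys[k]: counts[k] * w[k] / (M * p[k])
                  for k in np.flatnonzero(counts)}     # weights from eq. (6)
        if awd(n, sparse) > theta * awd_orig:          # step 7
            break
    return sparse, M


# Hypothetical dense block: complete graph on 6 ports with varying weights.
n = 6
edges = {(u, v): 1.0 + 0.5 * ((u + v) % 3) for u in range(n) for v in range(u + 1, n)}
sparse, M = sparsify_block(n, edges, theta=0.5)
```

If the threshold is not reached within `max_rounds`, the sketch simply returns the last sparsifier; how the original implementation bounds the number of samples is not stated in the text.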

After performing the above spectral sparsification procedure for each reduced grid block, only the edges with relatively large conductance values will be included in the sparsified graph. As a result, the conductance of the sparsified graph is always smaller than the conductance of the original graph. To further improve the spectral graph approximation without adding extra edges, a simple graph scaling (GS) scheme can be applied to each grid block, as shown in Fig. 4. As a result, we have

$$\sum_{i=1}^{M_k} g_i^k = \sum_{j=1}^{m_k} \alpha_k g_j^k, \tag{13}$$

where $k$ is the block index, $M_k$ denotes the number of edges in the original reduced block, $\alpha_k$ is the scaling factor of block $k$, $m_k$ denotes the number of edges in the sparsified reduced block, $g_i^k$ denotes the edge weight before sparsification, and $g_j^k$ denotes the edge weight obtained from sparsification.
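Solving (13) for the scaling factor gives $\alpha_k$ as the ratio of the total edge conductance before sparsification to the total after sparsification. A minimal sketch with hypothetical edge weights and function names:

```python
def graph_scaling_factor(original_weights, sparsified_weights):
    """alpha_k from eq. (13): total original conductance / total sparsified."""
    return sum(original_weights) / sum(sparsified_weights)


def scale_edges(sparsified_edges, alpha):
    """Apply the scaling factor to every edge of the sparsified block."""
    return {edge: alpha * weight for edge, weight in sparsified_edges.items()}


# Hypothetical block: four original edges, two of which survive sparsification.
original = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 3.0, (0, 3): 4.0}
sparsified = {(1, 2): 2.5, (0, 3): 5.5}
alpha = graph_scaling_factor(original.values(), sparsified.values())
scaled = scale_edges(sparsified, alpha)
```

After scaling, the total conductance of the sparsified block equals that of the original reduced block, while the number of edges is unchanged.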

E. Current Redistribution Using Numerical Method

When applying the block-based grid reduction and graph sparsification methods to each block, we need to consider the eliminated nodes that were previously connected to current sources. To accurately preserve the voltage droops on the remaining nodes, we apply the following method derived from the Schur complement for current redistribution [8]. For each block $j$ of the original power grid, the modified nodal equations can be written as

$$\begin{bmatrix} G_{11}^{j} & G_{12}^{j} \\ {G_{12}^{j}}^{T} & G_{22}^{j} \end{bmatrix} \begin{bmatrix} \vec{v}_{p}^{\,j} \\ \vec{v}_{np}^{\,j} \end{bmatrix} = \begin{bmatrix} I_{1}^{j} + B^{j} \\ I_{2}^{j} \end{bmatrix}, \tag{14}$$

where $I_{1}^{j}$ and $I_{2}^{j}$ are vectors of current sources connected to the port nodes and non-port nodes, respectively, $B^{j}$ is a vector of currents flowing into the neighboring blocks through the ports of block $j$, $\vec{v}_{p}^{\,j}$ and $\vec{v}_{np}^{\,j}$ are vectors of voltages for port and non-port nodes, $G_{11}^{j}$ is the admittance matrix of port nodes, $G_{12}^{j}$ is the admittance matrix of connections between port and non-port nodes, and $G_{22}^{j}$ is the admittance matrix of non-port nodes.

Fig. 5. Block-to-block edges stitch on merged ports. [Panels: before merging (Block i with ports 1 to 4, Block j with ports 5 to 8, block-to-block edges g1 to g4 with g2 > g3 > g1 > g4); after merging (Block i with ports 1' and 4, Block j with ports 5', 7 and 8). Legend: port, original edge, merged edge.]

Similar to (9), the first part of (14) can be written as

$$B^{j} = \left(G_{11}^{j} - G_{12}^{j} {G_{22}^{j}}^{-1} {G_{12}^{j}}^{T}\right) \vec{v}_{p}^{\,j} + \left(G_{12}^{j} {G_{22}^{j}}^{-1} I_{2}^{j} - I_{1}^{j}\right), \tag{15}$$

where the currents $B^{j}$ flowing through the ports into the neighboring blocks are described using the reduced system matrix with the port voltages and the current sources redistributed onto the ports. Therefore, for each reduced block $j$, the redistributed current sources can be represented by

$$b^{j} = G_{12}^{j} {G_{22}^{j}}^{-1} I_{2}^{j} - I_{1}^{j}, \tag{16}$$

which can be used for both DC and transient analysis.
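The current redistribution in (16) can be verified on a small example: for an isolated block ($B^j = 0$), (15) reduces to $G_{eq}\vec{v}_p = -b^j$, so the port voltages of the reduced model should match those of the full system. The sketch below uses a hypothetical 4-node chain with one grounded pad conductance; the function name, the element values, and the sign convention of the right-hand side (supply current positive, load currents negative) are assumptions of this example.

```python
import numpy as np


def reduce_block_with_sources(G, rhs, ports):
    """Return (G_eq, b) for one block, following eqs. (15)-(16).

    rhs stacks the source currents of all nodes in the same ordering as G;
    its port and non-port parts play the roles of I_1 and I_2.
    """
    ports = list(ports)
    port_set = set(ports)
    rest = [k for k in range(G.shape[0]) if k not in port_set]
    G11 = G[np.ix_(ports, ports)]
    G12 = G[np.ix_(ports, rest)]
    G22 = G[np.ix_(rest, rest)]
    I1, I2 = rhs[ports], rhs[rest]
    # One solve with G22 for both G22^{-1} G12^T and G22^{-1} I_2.
    X = np.linalg.solve(G22, np.column_stack([G12.T, I2]))
    G_eq = G11 - G12 @ X[:, :-1]
    b = G12 @ X[:, -1] - I1              # eq. (16)
    return G_eq, b


# Hypothetical block: chain 0-1-2-3 with unit conductances plus a 1 S pad
# conductance to ground at node 0; node 0 is fed by a 1 A Norton source and
# nodes 1-3 draw load currents.
G = np.array([[ 2.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -1.0,  0.0],
              [ 0.0, -1.0,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  1.0]])
rhs = np.array([1.0, -0.1, -0.2, -0.3])
ports = [0, 3]

v_full = np.linalg.solve(G, rhs)
G_eq, b = reduce_block_with_sources(G, rhs, ports)
v_ports = np.linalg.solve(G_eq, -b)      # isolated block: B = 0 in eq. (15)
```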

F. Block Power Grid Model Stitching

Once the reduced block models are generated through the above procedures, they can be stitched together with recreated block-to-block edges to form the final reduced power grid system. First, at the common side of blocks $i$ and $j$, we need to pair the ports of block $i$ with the ports of block $j$. For each port $k$ of block $i$ that has a connection to block $j$ in the original grid, we calculate the port conductivity $Pw_{i,k}$ by summing up the weights of the block-to-block edges incident to the original boundary nodes of block $i$ that are later merged into port $k$. The same procedure is also applied to block $j$. Then we sort $Pw_{i,k}$ for block $i$ and $Pw_{j,k}$ for block $j$, respectively. If block $i$ has $Np_{ij}$ ports as its interface to block $j$ and block $j$ has $Np_{ji}$ ports as its interface to block $i$, we only recreate $\min(Np_{ij}, Np_{ji})$ block-to-block edges. Then, the ports of blocks $i$ and $j$ with the top $\min(Np_{ij}, Np_{ji})$ largest port conductivities are paired in sorted order, and each pair is connected by an edge of weight $(Pw_{i,k} + Pw_{j,k})/2$. To better illustrate the reduced block model stitching scheme, consider the example in Fig. 5: after reduction, block $i$ has two ports as its interface to block $j$, while block $j$ has three, so two new edges are created. Port 1′ of block $i$ and port 5′ of block $j$ have the largest conductivities in their respective blocks, $g_1 + g_2 + g_3$ and $g_1 + g_2$, so a new edge with weight $(2g_1 + 2g_2 + g_3)/2$ is created to connect them. Port 4 of block $i$ and port 7 of block $j$ have the second largest conductivities in their blocks, so they are connected with edge weight $(g_3 + g_4)/2$. Port 8 has no connection to block $i$. It is also worth noting that once the power grid design is partially modified during the design and optimization process, only the modified power grid blocks and the corresponding block-wise connections need to be reprocessed using our block reduction and model stitching methods, which can facilitate much faster incremental power grid design and optimization.
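The pairing rule can be sketched in a few lines of Python. The numeric edge weights below are hypothetical values chosen to satisfy $g_2 > g_3 > g_1 > g_4$, and the conductivities assigned to ports 4, 7 and 8 are inferred from the edge weights quoted in the text rather than read from Fig. 5; the function name is likewise an assumption.

```python
def stitch_ports(pw_i, pw_j):
    """Pair interface ports of two blocks by descending port conductivity.

    pw_i and pw_j map port names to port conductivities Pw. Only
    min(len(pw_i), len(pw_j)) edges are recreated, each with weight equal
    to the mean of the two paired conductivities.
    """
    ranked_i = sorted(pw_i.items(), key=lambda kv: kv[1], reverse=True)
    ranked_j = sorted(pw_j.items(), key=lambda kv: kv[1], reverse=True)
    # zip stops at the shorter list, so unpaired low-conductivity ports are dropped.
    return [(port_i, port_j, (w_i + w_j) / 2.0)
            for (port_i, w_i), (port_j, w_j) in zip(ranked_i, ranked_j)]


# Hypothetical edge weights with g2 > g3 > g1 > g4.
g1, g2, g3, g4 = 2.0, 4.0, 3.0, 1.0
pw_i = {"1'": g1 + g2 + g3, "4": g4}             # block i side (port 4 inferred)
pw_j = {"5'": g1 + g2, "7": g3, "8": g4}         # block j side (ports 7, 8 inferred)
new_edges = stitch_ports(pw_i, pw_j)
```

With these values the two recreated edges have weights $(2g_1 + 2g_2 + g_3)/2 = 7.5$ and $(g_3 + g_4)/2 = 2$, and port 8 is left without a connection to block $i$.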


TABLE I. EXPERIMENTAL SETUP OF POWER GRID TEST CASES.

| CKT | # Nodes | # Res. | # Lay. | # C4s | # I | Max. Vd. |
|---|---|---|---|---|---|---|
| ibmpg3 | 440,615 | 724,184 | 5 | 461 | 88,471 | 182mV |
| ibmpg4 | 478,094 | 779,946 | 6 | 650 | 127,221 | 3.6mV |
| ibmpg5 | 581,472 | 871,182 | 3 | 177 | 236,600 | 43mV |
| ibmpg6 | 862,417 | 1,283,371 | 3 | 249 | 315,568 | 114mV |

Fig. 6. Reduction time for 90%, 93% and 95% node elimination of power grid. [Plot: total reduction time (s) versus power grid size (×10^5), with one curve each for 90%, 93% and 95% elimination.]

G. Capacitance Redistribution for Transient Analysis

In transient analysis, the dynamic charging and discharging effects of loading capacitors dominate the voltage supply variations of the power/ground network. In order to preserve the dynamic behaviors in the reduced grid, we propose to evenly redistribute the capacitors within each block grid such that the total block capacitance remains unchanged after the reduction. Since power grid blocks are usually kept very small to maintain a good accuracy level, the proposed uniform capacitance distribution scheme within each block produces very accurate results, as shown in our extensive experiments.
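The uniform redistribution can be illustrated with the short Python sketch below. The function name, the dictionary representation of a block, and the capacitance values are assumptions chosen for the example; only the rule itself (equal share per retained node, total block capacitance preserved) comes from the text.

```python
import math


def redistribute_capacitance(original_caps, reduced_nodes):
    """Spread a block's total capacitance evenly over its reduced nodes.

    original_caps maps each original node of the block to its grounded
    capacitance; reduced_nodes lists the nodes kept after reduction.
    """
    total = sum(original_caps.values())
    share = total / len(reduced_nodes)
    # Every retained node receives the same share, so the block total
    # is unchanged by construction.
    return {node: share for node in reduced_nodes}


# Hypothetical block: four original nodes (values in fF), two kept nodes.
caps = {"n1": 2.0, "n2": 1.0, "n3": 3.0, "n4": 2.0}
reduced = redistribute_capacitance(caps, ["p1", "p2"])
```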

IV. EXPERIMENTAL RESULTS

In this section, extensive experiments are conducted to evaluate our proposed scalable reduction method for power grid networks.

A. Experimental Setup

We evaluate our proposed power grid reduction method in terms of runtime, accuracy, and memory cost. An external library is adopted for Cholesky decomposition [16]. A set of IBM power grid benchmarks has been tested [17]. The algorithm is implemented in C++. All experiments are performed using a single CPU core of a computing platform running 64-bit RHEL 6.0 with a 2.67GHz 12-core CPU and 48GB of DRAM. The characteristics of the power grid benchmarks and the maximal voltage droops are summarized in Table I, where “# Nodes” denotes the total number of nodes in the power grid, “# Res.” the number of resistors, “# Lay.” the number of metal layers, “# C4s” the number of C4 bumps, “# I” the number of current sources, and “Max. Vd.” the maximal voltage droop on the grid.

B. The Results of Power Grid Reductions

First, we demonstrate the proposed power grid reduction method for DC analysis. The results of the proposed method for the IBM power grid benchmarks are summarized in Table II, where “#blk” denotes the number of blocks, “thresR” denotes the threshold for port merging, “#N” denotes the number of nodes after reduction, “#R” is the number of resistors in the reduced network after sparsification, “Err.” is the absolute error, “w/o GS” denotes sparsification without applying graph scaling, and “w/ GS” denotes sparsification with graph scaling. All thresholds are measured in Ohms, and the absolute errors are measured in millivolts (mV). Three node elimination ratios are applied: 90%, 93% and 95% node elimination of the power grids. The initial number of samples for each edge in spectral sparsification is set to 400 for all benchmarks. A weighted degree threshold of 0.9 is used to control the sparsification.

TABLE III. COMPARISON OF ACCURACY BETWEEN DIFFERENT SPARSITIES FOR “IBMPG5” UNDER 90% NODE ELIMINATION OF POWER GRID.

| Sparsify? | Nsmp | Fact. (s) | Solve (s) | Mem | #R | Err. |
|---|---|---|---|---|---|---|
| No | - | 0.26 | 0.018 | 88MB | 1,064k | 0.68mV |
| Yes | 1600 | 0.21 | 0.016 | 47MB | 319k | 0.80mV |
| Yes | 1200 | 0.20 | 0.016 | 45MB | 282k | 0.9mV |
| Yes | 800 | 0.19 | 0.015 | 43MB | 238k | 0.9mV |
| Yes | 600 | 0.18 | 0.015 | 39MB | 209k | 1.05mV |
| Yes | 400 | 0.16 | 0.014 | 33MB | 175k | 1.28mV |
| Yes | 200 | 0.12 | 0.012 | 25MB | 132k | 2.3mV |

TABLE IV. RUNTIME ANALYSIS BETWEEN COMPLETE POWER GRID REDUCTION AND INCREMENTAL REDUCTION. ALL RESULTS ARE MEASURED IN SECONDS.

| CKT | 90% Full | 90% Inc. | 93% Full | 93% Inc. | 95% Full | 95% Inc. |
|---|---|---|---|---|---|---|
| ibmpg3 | 23.5 | 2.66 (9X) | 21.6 | 2.50 (8X) | 20.7 | 2.40 (8X) |
| ibmpg4 | 38.7 | 4.40 (9X) | 29.5 | 3.39 (9X) | 25.2 | 2.94 (9X) |
| ibmpg5 | 26.0 | 2.70 (10X) | 19.4 | 2.17 (9X) | 18.4 | 2.10 (9X) |
| ibmpg6 | 88.0 | 9.25 (9X) | 71.0 | 7.40 (9X) | 54.6 | 5.98 (9X) |

As observed in Table II, for each benchmark, a larger resistive threshold for port merging yields a higher node elimination ratio. However, larger node elimination ratios sacrifice accuracy and lead to larger solution errors (e.g. the solution error of “ibmpg3”, which has a maximal voltage droop of 182mV, increases from 1.4mV to 2.4mV). It is also observed that for around 10X power grid reduction (90% elimination ratio), the number of edges can be reduced by up to 6X, which is around half of the node reduction. For the 93% elimination ratio, the number of edges can be reduced by up to 7X, and for the 95% elimination ratio by up to 8X. Moreover, Fig. 6 shows the total runtime of the proposed reduction method for 90%, 93% and 95% node elimination. Most of the time is spent in port merging, in which the effective resistance between each pair of boundary nodes needs to be calculated, and in spectral sparsification of every reduced block, which requires evaluating many samples for each edge to build the sparsifier. We can observe that for the same elimination ratio, the reduction time scales almost linearly with the problem size. Furthermore, the effectiveness of the graph scaling scheme is also analyzed. By comparing the errors obtained by “w/o GS” and “w/ GS”, we can see that applying graph scaling reduces the error by up to 4.5X.
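To make the cost of port merging concrete, the following Python sketch computes the effective resistance between two nodes of a small resistive network by grounding one node, injecting 1A at the other, and solving the resulting nodal equations. It is only an illustration under stated assumptions: the function name and the dense Gaussian elimination are choices made here for self-containment, whereas the experiments in this paper rely on a sparse Cholesky library [16]. The network must be connected and the two nodes distinct.

```python
def effective_resistance(num_nodes, edges, a, b):
    """Effective resistance between nodes a and b of a connected network.

    edges is a list of (u, v, conductance) tuples. Node b is grounded and
    1A is injected at node a; the resulting voltage at a equals R_eff(a, b).
    """
    # Assemble the conductance (Laplacian) matrix.
    lap = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v, g in edges:
        lap[u][u] += g
        lap[v][v] += g
        lap[u][v] -= g
        lap[v][u] -= g
    # Drop the grounded node and append the current vector as last column.
    keep = [k for k in range(num_nodes) if k != b]
    aug = [[lap[r][c] for c in keep] + [1.0 if r == a else 0.0] for r in keep]
    m = len(keep)
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    # Back substitution for the node voltages.
    volt = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = aug[r][m] - sum(aug[r][c] * volt[c] for c in range(r + 1, m))
        volt[r] = s / aug[r][r]
    return volt[keep.index(a)]


# Two resistors in series (2 Ohm and 3 Ohm) and a unit-conductance triangle.
r_series = effective_resistance(3, [(0, 1, 0.5), (1, 2, 1.0 / 3.0)], 0, 2)
r_triangle = effective_resistance(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)], 0, 1)
```

A pair of boundary nodes would then be a merging candidate when this value falls below the chosen “thresR”; the exact merging rule is the one defined earlier in the paper, not this sketch.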

TABLE II. EXPERIMENTAL RESULTS OF DC ANALYSIS FOR REDUCED POWER GRID NETWORKS W/ SPECTRAL SPARSIFICATION UNDER 90%, 93%, AND 95% NODE ELIMINATION.

| CKT | Elimination | #blk | thresR | #N | #R | Err. w/o GS | Err. w/ GS |
|---|---|---|---|---|---|---|---|
| ibmpg3 | 90% | 2116 | 0.14Ω | 44k | 158k | 1.6mV | 1.4mV |
| ibmpg3 | 93% | 2116 | 0.20Ω | 32k | 129k | 2.0mV | 1.5mV |
| ibmpg3 | 95% | 2116 | 0.32Ω | 23k | 95k | 2.5mV | 2.4mV |
| ibmpg4 | 90% | 961 | 0.30Ω | 49k | 201k | 0.87mV | 0.19mV |
| ibmpg4 | 93% | 961 | 0.40Ω | 33k | 165k | 0.80mV | 0.28mV |
| ibmpg4 | 95% | 961 | 0.68Ω | 23k | 119k | 0.70mV | 0.52mV |
| ibmpg5 | 90% | 1156 | 0.10Ω | 52k | 175k | 1.5mV | 1.2mV |
| ibmpg5 | 93% | 1156 | 0.20Ω | 35k | 141k | 3.0mV | 1.5mV |
| ibmpg5 | 95% | 1156 | 0.30Ω | 31k | 135k | 2.9mV | 1.7mV |
| ibmpg6 | 90% | 900 | 0.05Ω | 73k | 183k | 2.8mV | 2.4mV |
| ibmpg6 | 93% | 900 | 0.15Ω | 62k | 172k | 4.6mV | 3.0mV |
| ibmpg6 | 95% | 900 | 0.25Ω | 43k | 148k | 6.7mV | 3.9mV |

Fig. 7. Transient waveforms comparison of “ibmpg5” for 90% node elimination. [Plot: voltage (V) versus transient steps for the original (“Orig”) and reduced (“Reduced”) grids, with an inset zooming in on steps 32 to 36.]

In the incremental power grid reduction analysis, 10% of the blocks of each benchmark are modified. The runtime results of the incremental power grid reduction, as well as of the complete power grid reduction, are summarized in Table IV, where “Full” denotes execution of a complete power grid reduction, and “Inc.” denotes the proposed incremental power grid reduction method, in which only the modified block power grid models are reprocessed. The total runtime of the reduction process, which includes port merging, Schur complement, and spectral sparsification, is measured in seconds. We can observe from the table that, owing to the proposed block-based power grid reduction method, the incremental reduction process achieves an average runtime speedup of about 9X over all benchmarks for the 90%, 93% and 95% elimination ratios. It should be noted that if the modified power grid blocks have more nodes and ports than average, their incremental reduction takes longer, leading to smaller speedups, and vice versa.
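The block-level reuse behind the incremental flow can be sketched as a simple cache keyed by block. Everything in this Python example is hypothetical: `reduce_incrementally`, the tuple encoding of a block, and `toy_reduce` (a stand-in for port merging, Schur complement, and sparsification) are assumptions used only to show that unchanged blocks are never reprocessed.

```python
def reduce_incrementally(blocks, cache, reduce_block):
    """Reduce only the blocks whose description changed since the last run.

    blocks maps a block id to an immutable description of that block;
    cache maps a block id to (description, reduced_model) from earlier runs.
    Returns the reduced models and the list of block ids that were reprocessed.
    """
    reprocessed = []
    for block_id, block in blocks.items():
        cached = cache.get(block_id)
        if cached is None or cached[0] != block:
            # New or modified block: run the per-block reduction again.
            cache[block_id] = (block, reduce_block(block))
            reprocessed.append(block_id)
    reduced = {block_id: cache[block_id][1] for block_id in blocks}
    return reduced, reprocessed


def toy_reduce(block):
    # Placeholder reduction: just sums the edge conductances of the block.
    return sum(g for _, _, g in block)


# Ten hypothetical blocks, each a tuple of (u, v, conductance) edges.
blocks = {b: ((0, 1, 1.0 + b), (1, 2, 2.0)) for b in range(10)}
cache = {}
_, first_pass = reduce_incrementally(blocks, cache, toy_reduce)
blocks[3] = ((0, 1, 9.0), (1, 2, 2.0))  # modify one block out of ten (10%)
reduced, second_pass = reduce_incrementally(blocks, cache, toy_reduce)
```

In a full flow the block-to-block edges touching a modified block would also have to be stitched again; that step is omitted here.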

Table III compares the solution quality between different sparsities for “ibmpg5”, where “Nsmp” denotes the number of samples, “#R” denotes the number of edges in the reduced power grid, and “Err.” is the absolute solution error. The memory and runtime costs for different sparsities of the reduced power grid are also measured: “Fact.” denotes the Cholesky factorization time, “Solve” denotes the solving time, and “Mem” is the memory cost during factorization. Runtime is measured in seconds and memory cost in megabytes. On one hand, without any sparsification, solving the reduced power grid requires the longest runtime and the largest memory cost due to the extremely high grid density. We can observe that more than one million edges are created, which is even more than in the original grid, although the best solution quality is obtained. On the other hand, spectral sparsification using 200 samples per edge reduces the number of edges in the reduced power grid by a factor of 8 at the cost of 3.4X larger errors, while 1600 samples give a factor of 3 with only slightly larger errors. With a larger number of samples, more edges are selected, so the corresponding sparsifier better approximates the electrical characteristics of the original grid. Fewer samples greatly reduce the density by generating a sparsifier with fewer edges, but the result may suffer from larger errors. Therefore, there is a tradeoff between the reduced model sparsity (computational cost) and the solution quality, which can be well controlled by the block average weighted degree scheme.
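The link between sample count and sparsifier density can be illustrated with the standard effective-resistance sampling scheme of [13]. The Python sketch below is that textbook scheme under simplifying assumptions, not necessarily the exact procedure used in this work: it ignores the weighted degree threshold and graph scaling, takes the effective resistances as precomputed inputs, and uses a hypothetical function name and a toy complete graph for which every edge has effective resistance 2/n.

```python
import random
from collections import defaultdict


def sample_sparsifier(edges, reff, num_samples, seed=0):
    """Sample edges with probability proportional to weight * R_eff.

    edges maps (u, v) to the edge conductance; reff maps (u, v) to the
    effective resistance across that edge (assumed precomputed).
    Returns a dict of the kept edges with their rescaled weights.
    """
    rng = random.Random(seed)
    keys = list(edges)
    scores = [edges[e] * reff[e] for e in keys]
    total = sum(scores)
    sparse = defaultdict(float)
    for e in rng.choices(keys, weights=scores, k=num_samples):
        p = edges[e] * reff[e] / total
        # Each pick adds w_e / (q * p_e), which keeps the sparsifier an
        # unbiased estimate of the original weighted graph.
        sparse[e] += edges[e] / (num_samples * p)
    return dict(sparse)


# Toy input: complete graph on 20 nodes with unit conductances.
n = 20
edges = {(u, v): 1.0 for u in range(n) for v in range(u + 1, n)}
reff = {e: 2.0 / n for e in edges}
few = sample_sparsifier(edges, reff, num_samples=50)
many = sample_sparsifier(edges, reff, num_samples=400)
```

Because both runs share a seed, the 50-sample run draws a prefix of the 400-sample run, so the larger run never keeps fewer distinct edges; this mirrors the trend in Table III in which more samples produce a denser but more accurate sparsifier.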

The reduced models are also validated in transient analysis. We show the results for benchmark “ibmpg5” as an example. In “ibmpg5”, a total capacitance of 5e-7 Farad is distributed over every node of the original grid. Periodic pulse currents are generated at each current source for transient analysis. Fig. 7 illustrates the voltage waveforms obtained by a 50-step transient analysis at the same node of “ibmpg5” before and after reduction. We can observe that the two waveforms match each other well and that the maximal error between them is 0.4mV.

V. CONCLUSION

We present a spectral graph sparsification technique for scalable reduction of large-scale flip-chip power grids. By partitioning large power grids into many smaller power grid blocks, the reduction of each block can be performed efficiently and independently: the blocks are reduced to even smaller and sparser power grid blocks by using the effective-resistance-oriented port-merging algorithm and the spectral graph sparsification method. Extensive experimental results show that the proposed method can scale linearly with power grid sizes and efficiently reduce industrial power grid sizes by 20X without much loss of accuracy in both DC and transient analysis.

VI. ACKNOWLEDGEMENTS

This work is supported in part by the National Science Foundation under Grant No. CCF-1318694, and by a grant from Intel Corporation.

REFERENCES

[1] P. Feldmann and T. Liu, “Sparse and efficient reduced order modeling of linear subcircuits with large number of terminals,” in Proc. of IEEE/ACM ICCAD, 2004, pp. 88–92.

[2] P. Li and W. Shi, “Model order reduction of linear networks with massive ports via frequency-dependent port packing,” in Proc. of IEEE/ACM DAC, 2006, pp. 267–272.

[3] B. Yan, S. X.-D. Tan, L. Zhou, J. Chen, and R. Shen, “Decentralized and passive model order reduction of linear networks with massive ports,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 5, pp. 865–877, 2012.

[4] B.N. Sheehan, “Realizable reduction of RC networks,” IEEE Trans. on Computer-Aided Design, vol. 26, no. 8, pp. 1393–1407, 2007.

[5] C.S. Amin, M.H. Chowdhury, and Y.I. Ismail, “Realizable RLCK circuit crunching,” in Proc. of IEEE/ACM DAC, 2003, pp. 226–231.

[6] H. Su, E. Acar, and S.R. Nassif, “Power grid reduction based on algebraic multigrid principles,” in Proc. of IEEE/ACM DAC, 2003, pp. 109–112.

[7] Y. Su, F. Yang, and X. Zeng, “AMOR: an efficient aggregating based model order reduction method for many-terminal interconnect circuits,” in Proc. of IEEE/ACM DAC, 2012, pp. 295–300.

[8] M. Zhao, R.V. Panda, S.S. Sapatnekar, T. Edwards, R. Chaudhry, and D. Blaauw, “Hierarchical analysis of power distribution networks,” in Proc. of IEEE/ACM DAC, 2000.

[9] D.A. Spielman and S. Teng, “Spectral sparsification of graphs,” CoRR, vol. abs/0808.4134, 2008.

[10] J. Rommes and W.H.A. Schilders, “Efficient methods for large resistor networks,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 1, pp. 28–39, Jan. 2010.

[11] H. Qian, S.R. Nassif, and S.S. Sapatnekar, “Power grid analysis using random walks,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 8, pp. 1204–1224, 2005.

[12] L. Pillage, R. Rohrer, and C. Visweswariah, Electronic circuit & system simulation methods, McGraw-Hill, 1995.

[13] D.A. Spielman and N. Srivastava, “Graph sparsification by effective resistances,” in Proc. of ACM STOC, 2008, pp. 563–568.

[14] I. Koutis, G. L. Miller, A. Sinop, and D. Tolliver, “Combinatorial preconditioners and multilevel solvers for problems in computer vision and image processing,” Tech. Rep., CMU, 2009.

[15] I. Koutis, G. L. Miller, and R. Peng, “A fast solver for a class of linear systems,” Commun. ACM, vol. 55, no. 10, pp. 99–107, Oct. 2012.

[16] T. Davis, CHOLMOD: sparse supernodal Cholesky factorization and update/downdate, [Online]. Available: http://www.cise.ufl.edu/research/sparse/cholmod/, 2008.

[17] S.R. Nassif, IBM power grid benchmarks, [Online]. Available: http://dropzone.tamu.edu/ pli/PGBench/, 2008.
