A Fast Metal Layer Elimination Approach for Power Grid Reduction in Integrated Circuits

by

Abdul-Amir Yassine

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

Graduate Department of Electrical & Computer Engineering
University of Toronto

© Copyright 2016 by Abdul-Amir Yassine

Abstract

A Fast Metal Layer Elimination Approach for Power Grid Reduction in Integrated Circuits

Abdul-Amir Yassine

Master of Applied Science

Graduate Department of Electrical & Computer Engineering

University of Toronto

2016

Simulation and verification of the on-die power delivery network (PDN) is one of the

key steps in the design of integrated circuits. With the very large sizes of modern grids,

verification of PDNs has become very expensive and a host of techniques for grid model

approximation have been proposed. These include topological node elimination and

full-blown numerical model order reduction (MOR). However, both of these traditional

approaches suffer from drawbacks that limit their scalability to very large grids. In

this thesis, we propose a novel technique for grid reduction that is a hybrid of both

approaches – the method is numerical but also factors in grid topology. It works by

eliminating whole internal layers of the grid at a time, while aiming to preserve the

dynamic behavior of the grid. Effectively, instead of traditional node-by-node topological

elimination we provide a numerical layer-by-layer block-matrix approach that is both fast

and accurate.

Acknowledgements

I would like to express my sincere gratitude to several individuals without whom this work would not

have become a reality. Thank you all for providing me with the immense support and for being there

when I needed you the most.

It goes without saying that my research supervisor, Professor Farid N. Najm, deserves all the gratitude

and appreciation for the continuous guidance, inspiration, motivation and encouragement he has been

giving me the past two years; without which this work could not have been completed. I am honored to

have been given the chance to be part of his research group. Professor Najm’s technical leadership and

warm friendliness are traits every individual hopes to find in a supervisor. I hope I would achieve half

of what Professor Najm has already achieved in his career. He is a real role model one could look up to.

Thank you professor for pushing me and providing me with such an amazing experience that I am sure

will help me build a very good career and a much better future.

I would also like to thank Professor Jason Anderson, Professor Jianwen Zhu and Professor Wai Tung

Ng from the ECE department at the University of Toronto for reviewing this work and for providing

me with constructive comments. I would also like to acknowledge the financial support provided by the

University of Toronto and the Natural Sciences and Engineering Research Council of Canada (NSERC).

Much gratitude goes also to Mohammad Fawaz whom I will forever be in debt to. Mohammad is one

of the great friends I have had before being a colleague. I would like to thank him for all the support and

time he has dedicated for me to help me crack my research problem. I can never forget all the discussions

we had inside and outside the lab. I hope to keep those discussions and such a great friendship going.

To my best friend, Zahi Moudallal, I would like to express how grateful I am to have him as a friend,

a colleague and a roommate. Without Zahi, these two years would have been really hard to finish. I

cannot overlook the help Zahi has given me in my courses and my research. Thank you for all the long

and fruitful discussions, jokes and adventures we had (and will keep having).

I would also like to thank Sandeep Chatterjee, my friend and colleague. Thank you Sandeep for all

the research and non-research related discussions we had. Sandeep is the man whom you find when you

need him the most. His help in my research is undeniable. I cannot forget the guidance he has given

me to finish my degree. Of course, I cannot also forget all the jokes and super-heroes and time-travel

discussions we had!

A special gratitude is due to Natali Kobayaa for her constant love, support and patience throughout

these two years. I am very lucky, and forever grateful, to have her in my life. I cannot thank her enough
for the patience she has shown while listening to me complain about my research, especially during the
phase in which I was stuck on my problem. Thank you for all the support and encouragement you have given me.

This work would not have been the same without you!

These acknowledgements would not be complete without mentioning my friends: Ali, Reem, Ethar,

Noha, Noura, Maher, Mohammad Sinno and Elias, with whom I have spent quality time outside the

lab. Of course, I cannot neglect the value of my office mates at Pratt building, room 392, for providing

a friendly and pleasant environment.

Last but not least, my greatest gratitude goes to my parents, Mounir Yassine and Ward Nasrallah,

and my siblings, Zeinab, Layal, Rana and Abbas, to whom this thesis is dedicated. I cannot thank

them enough for the continuous support they have given me, and for the trust they have put in me,

despite being very far away. This journey wouldn’t have been possible without them. Thank you all for

everything you have done for me. I would not be the same without you.

Contents

1 Introduction
1.1 Motivation
1.2 Contribution
1.3 Thesis Organization

2 Background & Notation
2.1 Introduction
2.2 Notation
2.3 The Power Grid
2.3.1 Overview
2.3.2 Power Grid Model
2.3.3 Spatial Locality
2.4 Power Grid Simulation and Verification
2.5 Model Order Reduction
2.5.1 Moment Matching
2.5.2 Truncated Balanced Realization
2.5.3 Multigrid Reduction
2.5.4 Topological Node Elimination
2.5.5 Other Methods
2.6 Conclusion

3 Grid Reduction
3.1 Introduction
3.2 Layer Elimination in Resistive Grids
3.3 Sparsification
3.4 Extension to RC Grids
3.5 Incremental Layer Elimination
3.6 Experimental Results
3.6.1 SPER vs. Exact Effective Resistance
3.6.2 Direct vs. Incremental Elimination
3.6.3 Transient Simulations

4 Voltage Drop Predictor
4.1 Introduction
4.2 Background
4.2.1 Power Grid Model
4.2.2 Current Constraints
4.2.3 Vectorless Power Grid Verification
4.3 Voltage Drop Predictor
4.3.1 DC Response
4.3.2 Transient Response
4.3.3 Transient Robustness
4.4 Conclusion

5 Conclusion and Future Work

Bibliography

List of Tables

3.1 Power Grids Specifications
3.2 Comparing SPER to computing exact effective resistance for sparsification
3.3 Comparison between Direct and Incremental Elimination. Grid Name: G1, Grid Size: 152K
3.4 Numerical Elimination vs. Topological Elimination
3.5 Incremental layer elimination results
3.6 Power grids reduction time and Transient simulations run time

List of Figures

2.1 A 3D multi-layer on-die power delivery network
2.2 An RC model of a power grid
2.3 Hierarchical power grid analysis
2.4 Graphical illustration of model order reduction [11]
2.5 Circuit models between pairs of ports after reduction
2.6 A simple node elimination process
3.1 Effective resistance between grid nodes vs. distance
3.2 Quick Node property of a power grid of size 152K nodes (fmax = 1 GHz)
3.3 Illustration of G(q)
3.4 Illustration of c(q)
3.5 Sparsity patterns of the system matrix at different phases of the elimination process
3.6 (a): Voltage Drop Waveforms at one node in the grid (b): Histogram of the error over all nodes
3.7 Total Simulation Time before and after reduction
3.8 A 3D plot of the error at M1 nodes after reduction
3.9 Error rate versus Actual voltage drop
4.1 DC voltage drop distribution at each layer
4.2 DC voltage drop distribution at each layer. Case 1 has 50% current source distribution
4.3 Upper bound & DC voltage drop distribution at each layer
4.4 Transient & DC voltage drop distribution at each layer
4.5 Average transient and DC voltage drop distribution of each layer - Original Grid
4.6 Average transient and DC voltage drop distribution of each layer - Increased Pitches
4.7 Average transient and DC voltage drop distribution of each layer - Decreased Pitches
4.8 Average transient and DC voltage drop distribution of each layer - Increased Widths
4.9 Average transient and DC voltage drop distribution of each layer - Decreased Widths

Chapter 1

Introduction

“The mind is the limit. As long as the mind can envision the fact that you can do something,
you can do it, as long as you really believe 100 percent” – Arnold Schwarzenegger

1.1 Motivation

Soon after the invention of monolithic integrated circuits (ICs) at the beginning of the 1960s, Gordon
Moore predicted that the number of transistors on a dense integrated circuit would double every
two years. With the continuous improvement of performance, power, feature size and cost characteristics,
the semiconductor industry has proved Moore’s law right, and has experienced huge growth to reach
where it is today. Currently, we are in the age of ultra large scale integration, or ULSI. A typical
modern chip has an area of around 400 mm² and contains up to 5 billion transistors. With such a huge
number of devices, chip reliability is of great concern to designers. A chip is reliable if it provides the
correct logic functionality at the intended design speeds. This reliability depends heavily on the power
delivery network, or power grid, that delivers the intended power levels to the on-chip devices. Any
significant voltage fluctuation due to the many parasitics in the grid may lead to functional failures and
circuit delays. As the number of on-chip devices increases, the number of grid nodes increases and the
interconnect width is scaled down, leading to higher grid parasitics. In addition, most modern
high-performance chips operate at reduced supply voltages with high current switching activity, which
adds to the voltage fluctuation problem. Therefore, designers verify power grids to ensure that the
voltage drop at any node in the grid does not exceed a certain threshold, in which case the grid is
said to be safe.

The problem of power grid verification has been extensively studied in the past, and lots of approaches

have been introduced trying to cope with the very large sizes of modern grids. Most of the techniques

involve simulating the grid against current waveforms drawn by the underlying transistor circuitry in

order to determine the voltage response at the grid nodes. Such methods suffer from a major problem,

as the set of all possible current waveforms that cover the behavior of the underlying logic is huge, which

makes the simulations very expensive. Many studies have tried to come up with algorithms that search
for the set of waveforms that would cause the worst-case voltage drop. However, such techniques do not
allow for early verification, when any needed fixes to the grid can still be easily implemented,
since no knowledge of the current drawn is available at that stage of the design.

Other efforts have been made to verify the grids at the early stages of the design flow. Vectorless

verification, introduced in [14], is a verification technique that does not require full knowledge of current

waveforms, and relies only on partial information available at the early stages in the form of current

constraints. This information captures the uncertainty in circuit behaviors. With vectorless verification,

the verification problem is reduced to a problem of finding the worst-case voltage drops (upper bounds)

that satisfy the given constraints. However, these techniques require running a linear program (LP) for

each node in the grid, which makes it infeasible to verify very large power grids (billions of nodes) given

the existing resources. The authors in [1] developed an efficient method of incremental verification of

RC power grids based on vectorless techniques. They were able to reduce the number of LPs required
to verify the grids, although that number remained very large. Another drawback is that the constraints
are obtained from engineering judgement and expertise gained in previous design activities. This may
lead to overly pessimistic or overly optimistic results when evaluating the worst-case voltage drops.

Looking at the problem, if one is interested in the safety of part of the grid, then it would be a waste

of time and resources to simulate the whole grid. For example, if one needs to test the behavior of a

small block of on-chip transistors, then only the power grid nodes connected to those devices need to be

verified. Another example is that if a change is made to a previously verified grid, then only the local

impact (on the surrounding region) of that change needs to be verified. Therefore, grid reduction comes

into play. Grid reduction aims to reduce the size (number of nodes) of the grid using existing model

order reduction techniques while preserving the essential characteristics of the original grid. With a

reduced grid, the verification problem becomes considerably cheaper to solve. Model order reduction (MOR)

has been extensively studied in the past decades, and proves beneficial in many applications involving

large networks.

In the context of power grid verification, several MOR techniques have been employed in recent years.

These techniques differ in the approach followed to reduce the grid. Some employ a numerical approach

by working on the system matrices and model equations directly such as PRIMA [26], others follow

an iterative and divide-and-conquer approach [39], and other methods adopt a topological approach to

selectively remove nodes from the grid [38]. Several other attempts have been made aiming to come

up with efficient and accurate reduction techniques. However, most of the reduction techniques, if not

all, suffer from drawbacks and/or perform well only on grids with specific characteristics. For example,

PRIMA and most of the numerical techniques, although they are relatively fast reduction techniques,

do not scale well with the number of ports (inputs and outputs) of the system, as they produce dense

reduced models. Modern power grids contain a large number of ports making such methods unfavorable.

Iterative methods do not perform well with non-uniform power grids (which is the case with modern

grids), as it is hard to keep track of the structure and topology of the grid at different iterations and

the errors incurred are not predictable. Finally, a lot of effort has been put into topological reduction
methods [12, 38]; however, such methods have proved to be relatively slow on large networks, and they
do not perform well on mesh-structured grids due to the very dense systems they produce. Such dense
systems are more difficult to simulate than the original systems.

Furthermore, most of the grid reduction techniques encountered in the literature do not exploit the

3D multi-layered structure of modern power grids, and do not perform well when taking into account

the capacitive and inductive parasitics of the grids. Given the multi-layered structure of modern power
grids, and given that all transistors and logic circuitry are connected to the bottommost layer, it is safe
to say that the safety of the nodes in that bottom layer is what matters for the verification problem. If
the voltage at these nodes is within the desired limits, then the supply voltage to the on-chip devices is
in check, and no failures would occur due to voltage fluctuations.

1.2 Contribution

The goal of this research is to find a fast, efficient and accurate reduction technique for very large RC

power grids with many ports. Given that numerical reduction methods are in general fast, we propose

a novel numerical reduction technique that exploits the fact that the whole logic circuitry is attached to

the bottom metal layer of a power grid. The method works by eliminating whole internal metal layers

at a time, while preserving the behavior of the original grid. Instead of the node-by-node topological

elimination studied in [12] and [38], we propose a fast, numerical layer-by-layer block-matrix elimination

approach. The key contributions of this technique are the following:

1. Sparsification: In order to preserve the sparsity of the grid after reduction while preserving its

original input-output behavior, we propose a simple, yet effective, sparsification technique that

factors in grid topology in order to find the connections that strongly affect the output response

of the system.

2. Capacitive effects consideration: Our method provides a numerical approach to approximate

the capacitive parasitics of the reduced system. We show that the time constants of the original

and reduced grids are almost the same.
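To make the idea of block elimination concrete, the following is a minimal sketch for a purely resistive grid. It is not the algorithm of this thesis: it assumes the standard Schur complement as the elimination step, ignores capacitances and sparsification, and uses a hypothetical five-node, two-layer network whose node numbering and conductance values are invented for illustration.

```python
import numpy as np

def eliminate_nodes(G, keep, drop):
    """Schur complement of G onto `keep`; exact when no current enters `drop` nodes."""
    G_kk = G[np.ix_(keep, keep)]
    G_kd = G[np.ix_(keep, drop)]
    G_dd = G[np.ix_(drop, drop)]
    # G is symmetric, so the drop-to-keep coupling block is G_kd transposed.
    return G_kk - G_kd @ np.linalg.solve(G_dd, G_kd.T)

# Hypothetical two-layer grid: nodes 0-2 on the bottom layer, nodes 3-4 above.
n = 5
G = np.zeros((n, n))
branches = [(0, 1, 1.0), (1, 2, 1.0),   # bottom-layer wires
            (0, 3, 4.0), (2, 4, 4.0),   # vias between the layers
            (3, 4, 2.0)]                # upper-layer wire
for a, b, g in branches:
    G[a, a] += g
    G[b, b] += g
    G[a, b] -= g
    G[b, a] -= g
for node, g in [(3, 10.0), (4, 10.0)]:  # supply connections (sources shorted)
    G[node, node] += g

keep, drop = [0, 1, 2], [3, 4]
G_red = eliminate_nodes(G, keep, drop)

I = np.array([0.0, 1e-3, 0.0, 0.0, 0.0])  # current drawn only at a bottom node
V_full = np.linalg.solve(G, I)
V_reduced = np.linalg.solve(G_red, I[keep])
print(V_full[keep], V_reduced)
```

In this toy case the reduced system reproduces the bottom-layer voltage drops of the full system, but `G_red` gains a direct coupling between nodes 0 and 2 that did not exist in `G`. That fill-in is the loss of sparsity that the sparsification step listed above is meant to address.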

Aside from reduction, with a multi-layered structure of power grids consisting of several blocks, it

would be interesting to compute the voltage drop at a specific layer without the need to simulate the

whole grid. This proves beneficial in various applications such as design and verification. We propose an

efficient method to estimate the voltage drop at different metal layers of the grid. We show empirically

that the method gives exact results in relatively short times.

1.3 Thesis Organization

The rest of this thesis is organized as follows: Chapter 2 provides the necessary background for the

work done in this thesis. We review the power grid models and their different parasitics. On top of

that, we familiarize the reader with some of the existing power grid verification techniques found in the

literature, along with their drawbacks. We also review model order reduction and its history with some

of the well-known existing techniques and their drawbacks. Chapter 3 constitutes the main contribution

of this thesis. We present the main theory and heuristics behind our RC grid reduction technique, along

with the detailed implementation of the algorithm and some experimental results and comparisons.

Chapter 4 presents our voltage drop prediction technique with some experimental results. We conclude

in Chapter 5 with a summary of our work and contributions, and with possible future directions of

investigation.

Chapter 2

Background & Notation

2.1 Introduction

This chapter provides a brief overview of the background material necessary for the research presented

in later chapters. We first present some notation in section 2.2 that will be used throughout the rest

of this thesis. After that, section 2.3 provides an overview of the power delivery network (PDN) and

the main parasitic effects in modern chip design, along with a description of the power grid model

used and the main system equations. We then briefly describe a spatial locality property of the power

grid. Subsequently, we introduce the power grid verification problem and the issues faced with modern

verification techniques in section 2.4. We discuss several research works that have added to and improved

the modern vector-based verification approach. Finally, we present in section 2.5 a brief overview of

Model Order Reduction (MOR) and the different techniques found in the literature, along with issues

and drawbacks of those techniques.

2.2 Notation

Throughout the rest of this thesis, we use standard definitions and results from [31], and we use the
following notation. Let $(M)_{ij}$ denote the $(i, j)$th entry of any matrix $M$, and let $(v)_j$ denote the $j$th
entry of a vector $v$. We will use the notation $M > 0$ (or $M \geq 0$), for any matrix $M$, to denote that
$(M)_{ij} > 0$ (or $(M)_{ij} \geq 0$), $\forall i, j$. Finally, let $\mathrm{diag}(\alpha_1, \alpha_2, \ldots, \alpha_n)$ be a diagonal $n \times n$ matrix
with the diagonal consisting of the entries $\alpha_1, \alpha_2, \ldots, \alpha_n$.

2.3 The Power Grid

2.3.1 Overview

A power delivery network (PDN) of an integrated circuit is a distribution system that delivers the power

and the ground voltages from pad locations to all devices in the design. A typical power distribution

system for a high speed integrated circuit spans several levels of packaging hierarchy [19]. It consists of

a switching voltage regulator module (VRM), the power delivery networks on a printed circuit board

(PCB), and on an integrated circuit package, and the on-die power delivery network, plus the decoupling
capacitances connected to those networks. A voltage regulator converts the DC voltage level provided by
an external power supply unit to a voltage Vdd required for powering an integrated circuit. The PCB
and package power and ground delivery networks typically comprise several low impedance metal
layers. The package PDN is connected to the on-die PDN through a flip-chip array of C4 (controlled
collapse chip connection) bump contacts [41, 10]. In this work, our focus is mainly on the on-die power
delivery network, or what is commonly known as the “power grid”.

Figure 2.1: A 3D multi-layer on-die power delivery network

An on-die power grid of a high complexity, high performance integrated circuit typically comprises a

multi-layer metallic mesh structure. Each layer consists of many equidistantly spaced lines (interconnect

trees) of equal width. The direction of the power and ground lines in each layer is orthogonal to the

direction of those in the upper/lower layers. Each power and ground line is connected through vias to

the power and ground lines, respectively, in the upper/lower layers at the intersection sites. In a modern

power grid, the lower the metal layer, the smaller the width and the pitch of the lines. Fig. 2.1 shows

a 3D example of a power grid with three metal layers. We can see how the power and ground lines
are connected among the different layers, and how the line width and pitch increase toward the
higher layers.

Ideally, the voltage levels on the grid are uniform, meaning that every node in the grid should have

the same voltage level, equal to the supply voltage Vdd. However, due to the resistance and other

parasitics in the power and ground lines constituting the grid, circuit activity, coupling effects, and

electromigration [4], there occurs a voltage drop across the grid. Excessive voltage drops may reduce

switching speeds and noise margins of circuits, and may inject noise which may lead to soft errors and

functional failures [3, 8, 29, 35]. Therefore, achieving good voltage regulation across the grid is a key

feature in modern high performance integrated circuit designs in order to provide a certain voltage level

to the underlying logic circuit components and ensure a reliable operation of the chip. This can be done

through simulation and/or verification of the power grid as will be discussed in section 2.4. To this end,

designers have to devise an accurate model of the power grid and its parasitics.

The main parasitic effects of an on-die power grid can be categorized into three components: resistive,


capacitive and inductive. Due to the sheet resistance of the metal layers and the large number of layers

in modern grids required for grid routing, resistive effects arise and cause a voltage drop across the grid.

This drop is commonly referred to as the IR drop, and constitutes a major component of the total drop in

the grid [28, 36]. Parasitic capacitive effects arise in power grids because of the proximity of metal wires,

the intrinsic capacitance of non-switching devices, the capacitance between the N-well and substrate in

MOSFETs, and the on-chip decoupling capacitance. Decoupling capacitance, or decap, is the capacitance

between power and ground distribution networks. It acts as local charge storage and is helpful in reducing

the effect of the voltage drop at supply points. These types of parasitic effects are considered implicit

capacitance in a power grid. Implicit decoupling capacitance is not enough to constrain the voltage

drop within safe bounds. Therefore, designers tend to add intentional explicit decoupling capacitance

structures on the die at strategic locations [7]. These explicitly added decoupling capacitances are not

free and may increase the area and the leakage-power consumption of the chip. Therefore, any power

grid model must account for both implicit and explicit types of capacitance for thorough simulations.

Finally, inductive effects arise from the inductance of the interconnections between the grid and the C4

package. This inductance causes a voltage drop at the pad locations due to time-varying currents drawn

by the underlying circuitry. This voltage drop is referred to as the $L\,di/dt$ drop. Inductance in the top

metal layers can be significant [21]. However, because our main focus will be to reduce the lower metal

layers of the grid, we will focus on the grid RC model only.

2.3.2 Power Grid Model

Consider an RC model of a power grid where each metal branch is represented by a resistor and where

there exists a capacitor from every node to the ground. Some nodes have ideal current sources (to

ground) to represent the current drawn by the underlying circuitry, and some have ideal voltage sources

to represent the connections to external power supply. Let the power grid consist of n+ p nodes, where

nodes 1, 2, . . . , n have no voltage sources attached, and the remaining nodes are the nodes where p

voltage sources are attached. Fig. 2.2 shows an RC model of a power grid consisting of nine nodes,

three of which are attached to current sources. Let $i(t)$ be the element-wise non-negative vector of all
current sources connected to the grid. We assume that $\forall k = 1, \ldots, n$, the entry $(i(t))_k$ is well-defined,
so that nodes with no current source attached have $(i(t))_k = 0$. Furthermore, let $\mathcal{N}(i)$ denote the set of
all neighboring nodes of a node $i$, where a node $j$ is considered a neighboring node of $i$ if and only if it
is directly connected to node $i$ through a metal branch. In addition, let $g(ij)$ denote the physical
conductance between two nodes, $i$ and $j$, with $g(ij) = 0$ if the two nodes are not directly connected, and
let $g(i)$ denote the total incident conductance at node $i$, i.e.:

$$g(i) = \sum_{j \in \mathcal{N}(i)} g(ij) \tag{2.1}$$

Let c(i) denote a capacitance connected from node i to ground. Finally, let u(t) be the vector of all

nodal voltages. Applying Nodal Analysis (NA) [22] to the grid leads to:

$$G u(t) + C \dot{u}(t) = -i(t) + G_0 V_{dd} \tag{2.2}$$

where $G$ and $G_0$ are $n \times n$ conductance matrices, $C$ is an $n \times n$ diagonal non-singular matrix consisting
of all node-to-ground capacitances, and $V_{dd}$ is an $n \times 1$ constant vector each entry of which is equal to
the ideal supply voltage source value.

Figure 2.2: An RC model of a power grid

Let $v(t) = V_{dd} - u(t)$ be the vector of voltage drops at all nodes
in the grid. Then the RC model for the power grid can be written as [25]:

$$G v(t) + C \dot{v}(t) = i(t) \tag{2.3}$$

Note that this equation can be obtained directly by writing the Nodal Analysis (NA) system for a

modified network in which all voltage sources are shorted (set to 0) and all current sources are reversed.

Throughout the rest of this thesis, we will assume this modified network topology. Moreover, note that

a resistive power grid is modeled in the same way by neglecting all capacitances, and the model can be

written as:

$$G V = I \tag{2.4}$$

where V and I are n×1 time-independent vectors representing the DC node voltage drops and currents,

respectively.
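As an illustration of how the RC model (2.3) relates to the resistive model (2.4), the sketch below integrates (2.3) in time and compares the settled response with the DC solution. The integration scheme is an assumption: this section does not name one, and backward Euler is used here only because it is simple and stable. The 3-node matrices, the step size `h` and the helper name `simulate_backward_euler` are all hypothetical.

```python
import numpy as np

def simulate_backward_euler(G, C, current_at, h, steps):
    """Integrate G v + C dv/dt = i(t) from v(0) = 0 with a fixed time step h.

    Each step solves (G + C/h) v_next = (C/h) v_prev + i(t_next).
    """
    n = G.shape[0]
    A = G + C / h                      # system matrix, constant for a fixed h
    v = np.zeros(n)
    history = [v.copy()]
    for k in range(1, steps + 1):
        rhs = (C / h) @ v + current_at(k * h)
        v = np.linalg.solve(A, rhs)
        history.append(v.copy())
    return np.array(history)

# Hypothetical 3-node chain; node 0 also connects to the (shorted) supply.
G = np.array([[3.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 1.0]])
C = np.diag([1e-12, 1e-12, 1e-12])     # 1 pF from every node to ground
i_dc = np.array([0.0, 0.0, 1e-3])      # constant 1 mA drawn at node 2

waveform = simulate_backward_euler(G, C, lambda t: i_dc, h=1e-12, steps=200)
v_dc = np.linalg.solve(G, i_dc)        # solution of the resistive model (2.4)
print(waveform[-1], v_dc)
```

With a constant current, the transient response rises from zero and settles at the DC voltage drops, which is what neglecting the capacitances in (2.4) amounts to.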

The matrix $G$ is known to be symmetric and diagonally dominant with positive diagonal entries and
non-positive off-diagonal entries [22], and it is defined as follows, $\forall i, j \in \{1, 2, \ldots, n\}$:

$$(G)_{ij} = \begin{cases} g(i) & i = j \\ -g(ij) & i \neq j \end{cases} \tag{2.5}$$

Assuming the grid is strongly connected and there is at least one external voltage source, then G is known

to be irreducibly diagonally dominant. With these properties, G becomes a so-called M-matrix [31], so

that $G^{-1}$ exists and is non-negative ($G^{-1} \geq 0$).
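
As a small illustration of (2.5), the following Python sketch assembles G for a hypothetical three-node chain in the modified network (supply shorted to ground) and checks the properties stated above. The function names, the branch-list input format and the example values are assumptions made for this illustration only.

```python
def build_conductance_matrix(n, branches, pad_branches):
    """Assemble G per (2.5) for the modified network (supplies shorted to ground).

    branches:     (i, j, g) metal branches between grid nodes i and j
    pad_branches: (i, g) branches from node i to a shorted supply node
    """
    G = [[0.0] * n for _ in range(n)]
    for i, j, g in branches:
        G[i][i] += g          # g(ij) contributes to g(i) ...
        G[j][j] += g          # ... and to g(j)
        G[i][j] -= g          # off-diagonal entries are -g(ij)
        G[j][i] -= g
    for i, g in pad_branches:
        G[i][i] += g          # a supply branch only adds to the diagonal
    return G


def is_symmetric(G):
    n = len(G)
    return all(G[i][j] == G[j][i] for i in range(n) for j in range(n))


def is_diagonally_dominant(G):
    """True if every diagonal entry is >= the sum of |off-diagonals| in its row."""
    return all(
        row[i] >= sum(abs(x) for j, x in enumerate(row) if j != i)
        for i, row in enumerate(G)
    )


# Hypothetical example: pad --1S-- node 0 --1S-- node 1 --1S-- node 2
G = build_conductance_matrix(3, [(0, 1, 1.0), (1, 2, 1.0)], [(0, 1.0)])
print(G)   # [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
print(is_symmetric(G), is_diagonally_dominant(G))   # True True
```

In this example only row 0 is strictly dominant, and it is so because of the supply branch. This is the role the external voltage source plays in making G irreducibly diagonally dominant.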

2.3.3 Spatial Locality

Chip power grids have been shown to have a property of spatial locality, in which the voltage drop, due

to a current source, is limited to the proximity of that source, due to the C4 bumps. A C4 bump acts as


a low impedance path for the local die currents to flow in-chip and therefore affects mostly the current

passing through the nodes closest to it [9]. In other words, a supply connection to the grid strongly

affects the voltage at the nodes closest to it. This phenomenon allows one to divide the grid into blocks

to enhance the grid analysis process.

2.4 Power Grid Simulation and Verification

As mentioned in section 2.3.1, on-die power grid integrity checking plays a major role in the design

of modern complex integrated circuits. With the fast technology scaling in integrated circuits design,

the supply voltage decreases, parasitic effects increase, and more variations occur at the node voltages.

This leads to soft errors, increased circuit delays and loss of yield. Integrity checking ensures that the

voltage drop at any node in the grid does not exceed a certain threshold under all possible input current

waveforms, otherwise, the chip will not perform as intended. This can be done through simulation

and/or verification of the power grid. However, as the grid size increases (to around a billion nodes

today), accurate and complete simulation and verification of the grid become computationally expensive

and almost impossible to implement, mainly for two reasons: 1) the number of traces required to

cover the space of all possible current waveforms increases exponentially, and 2) the existing resources of

CPU and memory prohibit the designers from doing a thorough verification. Over the past two decades,

much research has aimed to simulate large power grids efficiently and to find the

voltage drops at all desired nodes.

In general, power grid analysis can be classified into two main approaches: vector-based and vectorless

methods. Vector-based techniques, or simulation-based techniques, employ search methods to find a set

of input patterns which cause the worst drop in the grid. They involve simulating the grid under various

current patterns drawn from the underlying non-linear circuitry. On the other hand, vectorless methods

aim to find conservative bounds on the worst-case voltage drop in an efficient manner without full

knowledge of the input vectors.

In this section, we will review some of the techniques used in the literature to verify power grids

based on the vector-based approach, since we will test our work using such techniques.

Vector-based Power Grid Verification

As mentioned earlier, vector-based verification is a simulation based technique, where the power grid

is exhaustively simulated against all possible input waveforms in order to find the worst-case voltage

drops. In recent years, a lot of effort has been put into finding efficient simulation techniques using

the existing resources. Of those techniques, some try to perform a smart search to look for the set of

input waveforms that would cause the worst-case (maximum) voltage drop [13], other methods follow a

divide-and-conquer manner of solving problems in order to find solutions at specific areas in the grid [46],

and some try to speed up the simulations of the power grids for a given set of current waveforms [33, 2].

We will briefly review some of those techniques to give an idea of how things have been developing in

the literature regarding the simulation-based power grid verification problem.

In [13], the authors suggest a genetic algorithm (GA) based method to iteratively find a smaller

set of input patterns that would maximize the voltage drop across the grid. They start with an initial

set of input patterns (generated randomly or specified by the user) such that each input pattern has a

fitness function assigned to it. In their work, the authors use the peak currents that the design draws


Figure 2.3: Hierarchical power grid analysis

in response to those patterns, reasoning that higher peak currents tend to cause higher voltage drops,

as more current flows in the resistive network. Besides, the peak current can be efficiently estimated

using a waveform simulator based on the event-driven logic simulation algorithm. The GA engine then

iteratively generates input patterns with higher fitness functions by using evolution-like operations, such

as mutation and selection, where patterns with low fitness functions are removed and new ones with

higher fitness functions are produced from parts of “healthy” input patterns from the previous run.

Notice that during each iteration, the waveform simulator is used to determine the fitness function of

each input pattern. The algorithm stops when there is no more improvement observed, or when the

number of iterations exceeds a certain value. Finally, the resulting patterns from the GA engine are fed

to a transistor level simulator in order to accurately identify the one with the worst-case voltage drop

on the grid.

Another technique adopted in the literature for efficient power grid

analysis is based on a divide-and-conquer strategy. In [46], the authors exploit the

hierarchical structure of modern power grids by proposing a divide-and-conquer based method to speed

up the analysis of the power grid. The grid is partitioned into smaller local grids of manageable sizes.

Given that the power grid is strongly connected, the local grids are connected to the global grid through

port nodes, in a way that abstracts away the behavior of these small grids from the global one. These

abstractions are called macromodels, where a macromodel is a multiport linear element that has the

same linear relation between the electrical characteristics at its ports as the local partition it represents.

This hierarchical approach for power grid verification is shown in Fig. 2.3 taken from [46]. The transfer

function of each macromodel is given by:

I = Y V + S (2.6)

where V and I are vectors representing the voltages at the port nodes and the currents passing through

them, respectively. Y is the port admittance matrix, and S is a current vector that captures the effect

of the current sources internal to a local grid at each port node. The set (Y, S) in (2.6) is referred to as

the macromodel of the respective local grid. Once the macromodels of the local grids are derived from

the original grid, the entire network is abstracted simply as a global grid with the macromodel elements

connected to it at the port nodes. Now the analysis problem can be easily solved by simulating the small

global grid, and then solving for the voltages on the internal nodes of each local grid. Experimental


results show that there is a 10x-20x reduction in memory with the hierarchical approach, and a 2x-5x

speed-up in runtime. It is worth mentioning that this type of analysis is useful only when one is

interested in one part of the grid, since a simulation of the entire design would require more work than

the traditional flat approach.

Other efforts have been made to improve the run time and memory usage of power grid analysis

simulations. In [33], the authors used the concept of random walks to analyze power grids. A random

walk is a stochastic process in which a path of consecutive steps is chosen randomly

and analyzed statistically. The authors start this work by performing a DC analysis on a resistive grid.

In such a model, the voltage at node i can be obtained by the following equation:

$$(V)_i = \sum_{j \in \mathcal{N}(i)} \frac{g(ij)}{g(i)} (V)_j - \frac{(I)_i}{g(i)} \tag{2.7}$$

where (I)i is the net current flowing out of node i. Recall that N (i) is the set of all neighboring nodes

of (or into) node i. Using this equation, the authors represent the grid as an undirected graph, where

each node is a vertex and each metal branch is an edge. Starting at vertex i, the algorithm jumps to

any of the neighboring vertices j ∈ N (i) with a probability:

$$p(ij) = \frac{g(ij)}{g(i)} \tag{2.8}$$

The process stops when a Vdd node is reached. The voltage drop at node i is then estimated from

the expected value of the voltage drops at the neighboring nodes. Several Monte-Carlo simulations are

performed for the random walk process and the average of the estimates of those runs is considered as

the voltage drop at node i.
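
The walk described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [33]: the dictionary-based grid format, the function name and the three-node example are hypothetical. The sketch estimates the node voltage of (2.7) directly, so the voltage drop is Vdd minus the returned value.

```python
import random


def random_walk_voltage(start, neighbors, current, vdd, pads, n_walks, rng):
    """Monte-Carlo estimate of the voltage at `start` based on (2.7) and (2.8).

    neighbors[i] maps each neighbor j of i to the conductance g(ij);
    current[i] is the current drawn at node i; pads are ideal Vdd nodes.
    """
    total = 0.0
    for _ in range(n_walks):
        node, accumulated = start, 0.0
        while node not in pads:
            g_i = sum(neighbors[node].values())       # g(i), as in (2.1)
            accumulated -= current[node] / g_i        # the -(I)_i / g(i) term
            nbrs = list(neighbors[node])
            weights = [neighbors[node][j] for j in nbrs]
            node = rng.choices(nbrs, weights=weights)[0]   # p(ij) = g(ij)/g(i)
        total += vdd + accumulated                    # walk ended at a Vdd node
    return total / n_walks


# Hypothetical chain: pad --1S-- n0 --1S-- n1 --1S-- n2, each node drawing 10 mA
neighbors = {
    "n0": {"pad": 1.0, "n1": 1.0},
    "n1": {"n0": 1.0, "n2": 1.0},
    "n2": {"n1": 1.0},
}
current = {"n0": 0.01, "n1": 0.01, "n2": 0.01}
estimate = random_walk_voltage("n2", neighbors, current, vdd=1.0, pads={"pad"},
                               n_walks=20000, rng=random.Random(0))
print(round(estimate, 3))   # the exact nodal solution for n2 is 0.94 V
```

Only the walks started at the node of interest are simulated, which is why the approach can target selected nodes without solving the whole grid.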

The authors then extend their work to analyze RC power grids. They compare their approach to the

hierarchical approach in [46]. The random walk algorithm is shown to be faster than the hierarchical

one, and has an advantage in a sense that it can be used to estimate the voltage drop at selected nodes

without the need to analyze the whole grid. This makes random walk suitable for incremental verification

of power grids.

Recently, a new attempt for faster transient power grid analysis has been published in [2]. The

authors propose a new hybrid algorithm that involves both direct and iterative solvers based on waveform

relaxation [16]. The algorithm starts by a divide-and-conquer approach to partition the grid into smaller

networks (sub-circuits) suitable for waveform relaxation. The coupling between the different partitions

is represented by relaxation sources, appended at the boundaries of each sub-circuit, since the coupling

between partitions is mainly resistive, which serves as a current path. Initial guesses are given at the

start of the algorithm for the relaxation sources. Then each sub-circuit is iteratively simulated for a

desired duration using some state-of-the-art direct solvers and the values of the relaxation sources are

updated to reflect the new values of the electrical characteristics of each sub-circuit. The process stops

once convergence is achieved. The authors give some efficient partitioning schemes, and some good

initial guesses for the sources based on the DC analysis of the grid in order for the process to converge in

a few iterations. Moreover, for totally independent partitions, the authors propose a highly parallelizable

scheme to run those partitions in parallel using pipelining (taking into account that the number of CPU

cores is limited) in order to maximize throughput. The method is compared to conventional direct

and iterative solvers, and the results show that it is able to achieve 2x-6x speed-up, and it scales well


with increasing number of CPU cores available. However, a major drawback of this method is that its

performance may be affected by the grid topology, as it requires some way to smooth the errors that

exist on the boundary nodes.

In summary, all vector-based power grid verification techniques, no matter what approach they

follow, are simulation-based, i.e., they require a circuit simulator to simulate the grid against the loading

currents from the underlying non-linear logic circuitry. A major drawback of such techniques is that

they can only be applied after the grid is designed, when the entire chip design is complete and detailed

information about the currents drawn is known. Power grid issues that are found at this stage of the design

are usually hard and expensive to fix. Besides, these techniques are often optimistic, in the sense that they

underestimate the worst-case voltage drop if the wrong input patterns are used. Most modern

industrial verification tools for power delivery networks are simulation-based.

2.5 Model Order Reduction

As seen in section 2.4, simulation-based power grid verification techniques perform well when the grid

is well-designed, and is fairly small and sparse. However, as the grid size increases, simulations become

computationally expensive, and exhaustive simulation is almost impossible to implement. Given that

the logic circuitry is connected to the low metal layers of a power grid, then the safety (integrity) of the

nodes in those layers is what matters to ensure reliable performance. To this end, compact modeling of

on-chip passive interconnect networks has been extensively used in IC designs for the past two decades.

Researchers have been using model order reduction (MOR) techniques to reduce the original large power

grid systems to much smaller ones, preserving accuracy and timing behavior as much as possible, such

that they can be used in further simulations and design verification. In this section, we give a brief

background on model order reduction, its history, and some of the well-known MOR techniques developed

in the past two decades and their applications to power grid networks.

Realistic simulations of highly complex design products require efficient and fast ways to do these

computations. Despite the fact that computational speed and power increase year after year, the

problem sizes of real-life simulations have been increasing at a faster rate, and the existing computational

power cannot cope with such simulations. Thus, the need for model order reduction in numerical

simulations has emerged. Originally, MOR was developed in the area of systems and control theory,

which studies the properties of dynamical systems aiming to reduce their complexity, while preserving

their input-output behavior as much as possible [37]. MOR tries to automatically capture the essential

properties of a system without computing all its details. In other words, as its name indicates, MOR

transforms a dynamical system into a much smaller one whose behavior is almost identical to the original

one. Such simplification helps one perform simulations within an acceptable amount of time and limited

storage capacity.

Fig. 2.4 (taken from [11]) illustrates graphically the concept of MOR. The figure demonstrates that

a model can be described with very little information about it. In this example, one can deduce that

the model represents a dinosaur even with a few faces.

Model order reduction has been studied a lot in the past. In fact, the fundamental methods in

this area date back to the eighties and nineties of the last century. Since then, much research and

improvement have been added to the basics of this field resulting in a large variety of MOR techniques.

This section reviews the fundamental methods and some of the recent work published on this topic


Figure 2.4: Graphical illustration of model order reduction [11]

along with the drawbacks of each method. Specifically, we review the moment matching techniques, the

truncated balanced realization techniques, the algebraic multigrid reduction techniques, the topological

node elimination techniques, and some hybrid techniques that combine several of these methods.

2.5.1 Moment Matching

Moment matching methods are based on numerical matching of moments of a system to reduce its order.

Assume that the transfer function of the original linear network is given by:

$$H(s) = \frac{Y(s)}{X(s)} \tag{2.9}$$

where Y (s) and X(s) are the output and input functions of the system, respectively. H(s) can be

expanded around s = 0 using Taylor series expansion as follows:

$$H(s) = \sum_{k=0}^{\infty} M_k s^k = M_0 + M_1 s + M_2 s^2 + \dots \tag{2.10}$$

where Mk is called the kth order moment, and is given by:

$$M_k = \frac{1}{k!} \times \left. \frac{d^k H(s)}{ds^k} \right|_{s=0} \tag{2.11}$$

To apply this to our RC power delivery network, the ordinary differential equation of our MNA

model in (2.3) can be transformed into a linear system as follows:

$$\begin{aligned} C \dot{v}(t) &= -G v(t) + B r(t) \\ y(t) &= L^T v(t) \end{aligned} \tag{2.12}$$

where r(t) is an m× 1 vector representing the m excitation current sources attached to the power grid

(typically m << n), B is an n×m input position matrix specifying the nodes that the current sources

are attached to, v(t) is the n× 1 vector of voltage drops at all nodes, and it represents the states of the

system, y(t) is a q × 1 output vector of the nodes at which the voltage drop is to be observed, and L is

an n× q output position matrix specifying those output nodes. Note that i(t) in (2.3) is given by:

i(t) = Br(t)

Taking the Laplace transform of the system in (2.12) with zero initial conditions, we get:

$$(G + sC) V(s) = B R(s) \tag{2.13a}$$

$$Y(s) = L^T V(s) \tag{2.13b}$$

where V (s), Y (s) and R(s) are the Laplace transform vectors of their respective time domain vectors.

Expanding V (s) in (2.13a) around s = 0, we get:

$$(G + sC)(v_0 + v_1 s + v_2 s^2 + \dots) = B R(s) \tag{2.14}$$

where the moments vk are given by:

$$\begin{aligned} v_0 &= G^{-1} B \\ v_k &= -G^{-1} C v_{k-1} \quad \forall k \geq 1 \end{aligned} \tag{2.15}$$

Note that at DC, the capacitance acts as an open circuit and the solution of the system is given from (2.4)

as:

$$V = G^{-1} I$$

This is the same as $v_0 R(s)$, which should be the case. The output moment $M_k = L^T v_k$ is given by:

$$M_k = L^T (-G^{-1} C)^k G^{-1} B \tag{2.16}$$
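
The recursion in (2.15) and the output moments in (2.16) can be checked on a toy grid. The sketch below is illustrative only: the two-node network, the helper names and the use of exact fractions are assumptions, and the 2x2 solver is just enough for this example.

```python
from fractions import Fraction


def solve2(G, b):
    """Solve the 2x2 system G x = b by Cramer's rule (enough for this toy grid)."""
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return [
        (G[1][1] * b[0] - G[0][1] * b[1]) / det,
        (G[0][0] * b[1] - G[1][0] * b[0]) / det,
    ]


def output_moments(G, c, b, l, count):
    """Return M_0..M_{count-1} using v_0 = G^{-1} b and v_k = -G^{-1} C v_{k-1}.

    c holds the diagonal of C; b and l are single columns of B and L.
    """
    moments = []
    v = solve2(G, b)                                   # v_0
    for _ in range(count):
        moments.append(l[0] * v[0] + l[1] * v[1])      # M_k = L^T v_k
        cv = [c[0] * v[0], c[1] * v[1]]                # C v_k (C is diagonal)
        v = [-x for x in solve2(G, cv)]                # v_{k+1}
    return moments


F = Fraction
# Hypothetical grid: pad --1S-- node 0 --1S-- node 1, 1 F to ground at each node,
# one current source and one observed output, both at node 1.
G = [[F(2), F(-1)], [F(-1), F(1)]]
moments = output_moments(G, c=[F(1), F(1)], b=[F(0), F(1)], l=[F(0), F(1)], count=3)
print([int(m) for m in moments])   # [2, -5, 13]
```

For this network the transfer function is H(s) = (2 + s)/(1 + 3s + s^2), whose Taylor expansion around s = 0 starts with 2 - 5s + 13s^2, in agreement with the computed moments. M0 = 2 is simply the DC resistance seen from node 1.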

Moment matching techniques can be divided into two classes: explicit and implicit

moment matching. We will give a brief review of each, along with a well-known method found in

the literature for both.


Explicit Moment Matching

Explicit moment matching is based on the direct matching of the original system moments to reduce the

model. Given the transfer function in (2.9), one tries to reduce the model by matching only the first β

moments, such that the transfer function of the reduced model is given by:

$$H_\beta(s) = M_0 + M_1 s + M_2 s^2 + \dots + M_\beta s^\beta \tag{2.17}$$

where β ≥ 0 is user-defined, and is decided based on the accuracy desired. The higher β is, the more

accurate, but also the larger, the reduced system will be.
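
The effect of β can be seen numerically by truncating a known expansion. The transfer function below is an illustrative assumption, not a model taken from the literature reviewed here, and its moments are hard-coded from its Taylor series around s = 0.

```python
def exact_h(s):
    """Illustrative transfer function (an assumption, not taken from the thesis)."""
    return (2 + s) / (1 + 3 * s + s * s)


# Its first moments, i.e. the Taylor coefficients of exact_h around s = 0.
MOMENTS = [2, -5, 13, -34]


def truncated_h(s, beta):
    """Evaluate H_beta(s) = M_0 + M_1 s + ... + M_beta s^beta as in (2.17)."""
    return sum(MOMENTS[k] * s ** k for k in range(beta + 1))


s = 0.1
errors = [abs(truncated_h(s, beta) - exact_h(s)) for beta in range(4)]
print(errors)   # the error shrinks as beta grows
```

The series only converges near s = 0 (here for |s| below roughly 0.38, the magnitude of the pole closest to the origin), which is one reason models built from very few moments lose accuracy at higher frequencies.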

One of the earliest explicit moment matching attempts applied to power grids is found in [17].

The authors partition a large RC network into small subnetworks that are reduced to lower-order

RC equivalent circuits. Partitioning is performed using an S-parameter matrix. The objective of the

partitioning is to minimize the total number of entries of the S-matrix. After partitioning, each sub-circuit

is reduced by matching the first two moments of the admittance matrix looking into its

ports, where a port here is a node that was originally connected to another sub-circuit or the original main

RC network. In other words:

$$A(s) \approx M_0 + M_1 s$$

where A(s) is the admittance matrix. The circuit models shown in Figs. 2.5a and 2.5b are used to

synthesize the elements between the different pairs of ports, while the parallel RC circuit model shown

in Fig. 2.5c is used to synthesize the port-to-ground elements. The elements of the T-model in Fig. 2.5a

are given by:

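$$\begin{aligned}
R(ij1) &= \frac{-\sqrt{(M_1)_{jj}}}{(M_0)_{ij}\left(\sqrt{(M_1)_{ii}} + \sqrt{(M_1)_{jj}}\right)} \\
R(ij2) &= \frac{-\sqrt{(M_1)_{ii}}}{(M_0)_{ij}\left(\sqrt{(M_1)_{ii}} + \sqrt{(M_1)_{jj}}\right)} \\
C(ij) &= \frac{(M_1)_{ij}\left(\sqrt{(M_1)_{ii}} + \sqrt{(M_1)_{jj}}\right)^2}{\sqrt{(M_1)_{ii}(M_1)_{jj}}}
\end{aligned} \tag{2.18}$$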

If the circuit contains floating capacitors, (M1)ij becomes negative and in that case, we use the

floating capacitance model shown in Fig. 2.5b whose elements are given by:

$$\begin{aligned} R(ij) &= \frac{-1}{(M_0)_{ij}} \\ C(ij) &= -(M_1)_{ij} \end{aligned} \tag{2.19}$$

Finally, the port-to-ground elements shown in Fig. 2.5c are given by:

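$$\begin{aligned}
R(ii) &= \left( (M_0)_{ii} + \sum_{\substack{j=1 \\ j \neq i}}^{n_{prt}} (M_0)_{ij} \right)^{-1} \\
C(ii) &= (M_1)_{ii} - \sum_{\substack{j=1 \\ j \neq i}}^{n_{prt}} \frac{C(ij) R^2(ij2)}{\left(R(ij1) + R(ij2)\right)^2}
\end{aligned} \tag{2.20}$$

where $n_{prt}$ is the total number of ports for the corresponding partition. The modeling time required for this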


(a) T-model (b) Floating capacitance model (c) Parallel RC model

Figure 2.5: Circuit models between pairs of ports after reduction

RC MOR method is linear with the number of ports of the original grid. This method preserves the

block and sparse structure in the reduced networks. Besides, the reduced sub-circuits are stable and

realizable, as the resultant network is also an RC network. A major disadvantage of this method is that

it is not very accurate, as it uses only the first two moments in its approximation. Using higher order

moments makes the reduced networks unrealizable. An attempt to extend the idea for reducing RLC

networks was made by the authors in [20], in which they used the first three order moments for their

approximation.

Implicit Moment Matching

Implicit moment matching is based on a numerical projection of the moment space onto a smaller

orthonormal subspace called the Krylov subspace. This reduction technique is commonly referred to as

a projection-based method. Given the linear system of an RC network in (2.12), define:

$$A := -G^{-1} C \quad \text{and} \quad P := G^{-1} B \tag{2.21}$$

where A is of size n× n, and P is of size n×m. The Krylov subspace for matrices A and P is defined

as:

$$K_d(A, P) = \mathrm{span}\{P, AP, A^2 P, \dots, A^{d-1} P\} \tag{2.22}$$

where d ≪ n is the order of the reduced system. Recall that the span of a set of vectors (or matrices) is the

set of all linear combinations of those vectors (or the bases of column spaces of those matrices).

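The main idea lies in finding a transformation matrix $V \in \mathbb{R}^{n \times d}$ such that:

$$\mathrm{colspace}(V) = K_d(A, P)$$

where colspace(V ) is the column space of matrix V . V is constructed to have orthonormal columns, i.e., $V^T V = I$.

The reduced system is given by:

$$\begin{aligned} \tilde{C} \dot{\tilde{v}}(t) &= -\tilde{G} \tilde{v}(t) + \tilde{B} r(t) \\ \tilde{y}(t) &= \tilde{L}^T \tilde{v}(t) \end{aligned} \tag{2.23}$$

where $\tilde{v}(t)$ is a d×1 vector representing the state variables of the reduced system, and $\tilde{y}(t)$ is a q×1 output

vector. Typically, we aim for $\tilde{y}(t) \approx y(t)$. Here $\tilde{G}, \tilde{C} \in \mathbb{R}^{d \times d}$, $\tilde{B} \in \mathbb{R}^{d \times m}$ and $\tilde{L} \in \mathbb{R}^{d \times q}$. The system matrices

are given by:

$$\begin{aligned} \tilde{G} &= V^T G V \\ \tilde{C} &= V^T C V \\ \tilde{B} &= V^T B \\ \tilde{L} &= V^T L \end{aligned} \tag{2.24}$$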


Once the projection matrix V is found, one can get a reduced system of order d by (2.23) and (2.24).
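
The projection can be tried out on a very small grid. The following Python sketch is a single-input illustration under stated assumptions and is not PRIMA itself: it builds an orthonormal basis of the Krylov subspace with a plain Arnoldi-style Gram-Schmidt loop, applies the congruence transforms of (2.24), and compares the leading moments of the full and reduced models. All function names and the three-node example are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def krylov_basis(G, C, b, d):
    """Orthonormal basis of K_d(A, P), A = -G^{-1}C, P = G^{-1}b, for one input."""
    basis = []
    w = solve(G, b)                                  # P
    for _ in range(d):
        for q in basis:                              # Gram-Schmidt step
            h = dot(q, w)
            w = [wi - h * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
        w = [-x for x in solve(G, matvec(C, basis[-1]))]   # A times last vector
    return basis                                     # the d columns of V


def project(M, basis):
    """Congruence transform V^T M V as in (2.24)."""
    return [[dot(p, matvec(M, q)) for q in basis] for p in basis]


def moments(G, C, b, l, count):
    """Output moments M_k = l^T (-G^{-1}C)^k G^{-1} b, as in (2.16)."""
    out, v = [], solve(G, b)
    for _ in range(count):
        out.append(dot(l, v))
        v = [-x for x in solve(G, matvec(C, v))]
    return out


# Hypothetical 3-node chain (supply shorted at node 0), unit capacitors,
# single source and single output at node 2, reduced to order d = 2.
G = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
C = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [0.0, 0.0, 1.0]
V = krylov_basis(G, C, b, d=2)
G_r, C_r = project(G, V), project(C, V)
b_r = [dot(q, b) for q in V]                         # V^T b (here l = b)
full = moments(G, C, b, b, 2)
reduced = moments(G_r, C_r, b_r, b_r, 2)
print(full, reduced)
```

Both lists agree up to rounding, so the order-2 model reproduces the first two moments (3 and -14) of the order-3 grid without those moments ever being matched explicitly.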

The passive reduced-order interconnect macromodeling algorithm, or PRIMA [26], is a well-known

algorithm for implicit moment matching. PRIMA constructs the transformation matrix V using the

block Arnoldi algorithm [40]. The authors try to match the dominant block moments of the original

system. In the case of RC networks, the block moments Mk are given by (2.16):

$$M_k = L^T A^k P$$

If the order of the reduced system is d ($\tilde{G}$ is d×d), and the number of input excitation sources (ports) is

m, then PRIMA matches at least $\lfloor d/m \rfloor$ block moments. PRIMA has the advantage over other reduction

techniques in guaranteeing passivity of the reduced system given that the original system is in MNA

form and L = B.

However, PRIMA suffers from some major drawbacks. For example, PRIMA does not preserve some

of the essential circuit properties such as reciprocity. Also, it does not scale well with the number

of ports. The algorithm generates m new poles for every block moment order increase, which means

that as the number of ports m increases, the reduced system becomes larger and denser. To better

understand this limitation, assume that the original system has 10K ports, and that PRIMA matches

only 2 moments, then the reduced system would be of size 20K and much denser than the original one.

Given that modern on-die power grids have a large number of ports, PRIMA is rarely used as is in modern

power grid reduction techniques. The authors in [18] proposed a method to reduce the number of ports

before applying PRIMA by merging ports with similar timing behavior. Finally, PRIMA wastes memory

resources by computing the transformation matrix explicitly. Recently, the authors in [27] proposed a

numerical moment matching reduction technique based on PRIMA, but without explicitly computing

the transformation matrix. Results show that the proposed approach is able to achieve around 80x

speed-up in reduction time compared to PRIMA with much less memory usage.

2.5.2 Truncated Balanced Realization

Similar to projection-based methods, truncated balanced realization (TBR) approaches map the original

system onto a smaller subspace. The key advantage of TBR over other projection-based techniques is

that it gives high quality reduced systems by making an extra effort in choosing the projection subspaces.

The idea in such an approach is that the method makes sure that all difficult-to-reach states in the original

system are truncated. This can be done by computing the controllability and observability Gramians.

We will briefly review the TBR method. For deeper understanding, the reader is advised to refer to [40].

Consider the linear system in (2.12), and let $A = -C^{-1} G$ and $B_c = C^{-1} B$, then we get the following

linear system:

$$\begin{aligned} \dot{v} &= A v + B_c r \\ y &= L^T v \end{aligned} \tag{2.25}$$

Note that we dropped the time variable t in the system equations for simplicity. The controllability

Gramian X and observability Gramian Y are defined as the solutions to the Lyapunov equations [40]:

$$\begin{aligned} A X + X A^T + B_c B_c^T &= 0 \\ A^T Y + Y A + L L^T &= 0 \end{aligned} \tag{2.26}$$


where X and Y are unique symmetric, positive definite solutions given that the system is controllable

and observable. Hence, X and Y can be factorized using Cholesky factorization to get:

$$X = L_c L_c^T \quad \text{and} \quad Y = L_o L_o^T$$

where Lc and Lo are both lower triangular matrices. Let $L_o^T L_c = P \Sigma Q^T$ be the singular value decomposition

(SVD, refer to [40]) of $L_o^T L_c$, where

$$\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_n)$$

such that σ1 ≥ σ2 ≥ · · · ≥ σn > 0, and the σi’s are the Hankel singular values of the system, i.e., the square roots of the eigenvalues of the product XY . P and Q are

both orthonormal matrices.

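Let $T = L_c Q \Sigma^{-1/2}$ be the balancing transformation matrix, then by applying a similarity transformation

to the original system matrices, we get:

$$\begin{aligned} \tilde{A} &= T^{-1} A T \\ \tilde{B}_c &= T^{-1} B_c \\ \tilde{L}^T &= L^T T \end{aligned} \tag{2.27}$$

Let q be the order of the reduced system desired, then partition Σ as follows:

$$\Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}$$

where Σ1 is a q×q diagonal matrix consisting of the q largest singular values σ1, σ2, . . . , σq. Accordingly,

partition the transformed matrices as:

$$\tilde{A} = \begin{bmatrix} \tilde{A}_{11} & \tilde{A}_{12} \\ \tilde{A}_{21} & \tilde{A}_{22} \end{bmatrix} \qquad \tilde{B}_c = \begin{bmatrix} \tilde{B}_{c1} \\ \tilde{B}_{c2} \end{bmatrix} \qquad \tilde{L} = \begin{bmatrix} \tilde{L}_1 \\ \tilde{L}_2 \end{bmatrix} \tag{2.28}$$

The reduced model is then obtained by a simple truncation, that is, by taking the leading blocks $\tilde{A}_{11}$, $\tilde{B}_{c1}$

and $\tilde{L}_1$ to form the reduced balanced model.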
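
To see why T is called a balancing transformation, one can check that it makes both Gramians equal to Σ. The short derivation below is an added sanity check rather than part of the cited references. It assumes the change of state v = T ṽ, under which the Gramians transform as $\tilde{X} = T^{-1} X T^{-T}$ and $\tilde{Y} = T^T Y T$, and it uses the factorizations and the SVD defined above.

```latex
\begin{aligned}
T^{-1} &= \Sigma^{1/2} Q^T L_c^{-1}, \\
\tilde{X} &= T^{-1} X T^{-T}
  = \Sigma^{1/2} Q^T L_c^{-1} \left( L_c L_c^T \right) L_c^{-T} Q \Sigma^{1/2}
  = \Sigma, \\
\tilde{Y} &= T^T Y T
  = \Sigma^{-1/2} Q^T \left( L_c^T L_o \right) \left( L_o^T L_c \right) Q \Sigma^{-1/2} \\
  &= \Sigma^{-1/2} Q^T \left( Q \Sigma P^T \right) \left( P \Sigma Q^T \right) Q \Sigma^{-1/2}
  = \Sigma .
\end{aligned}
```

In the balanced coordinates each state is therefore as hard to reach as it is to observe, with σi measuring both, so truncating the states associated with the smallest σi discards the ones that contribute least to the input-output behavior.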

Despite the fact that TBR methods provide excellent accuracy, they are computationally expensive,

since solving the Lyapunov equations is of cubic complexity. A lot of effort has been made to reduce the

time complexity of the reduction method, but at the expense of losing accuracy [30, 42].

2.5.3 Multigrid Reduction

Multigrid reduction is a class of model order reduction inspired by the classic algebraic multigrid (AMG)

and standard multigrid (SMG) methods. The reduction scheme tends to map the problem of a large

passive linear network to some coarse network. This is known as the restriction step. The coarse network


is then analyzed using some of the existing analysis techniques. An interpolation step follows, in which

the solution is mapped back to the original network. Multigrid-like techniques have been widely used in

power grid analysis [15, 24, 39, 43, 48].

In [15], the authors propose a multigrid-based method in order to efficiently analyze and verify on-

die power delivery networks for some given set of current waveforms representing the underlying logic

circuitry. The method starts by selectively removing nodes from the power grid in an iterative manner

in order to reduce the grid to a much smaller coarse one. The resulting grid is small enough that it

can be solved and analyzed using a direct approach. The solution of the small grid is then mapped to

the larger (finer) grid exploiting the strength of wire connections and the dominant neighbors (strongly

connected neighbors). In other words, the voltage (or voltage drop) at a removed node from the original

grid is obtained from the voltages at its strongly connected kept neighbors. To this end, current sources

attached to the nodes being removed from the original grid are split to current sources at their respective

strongly connected neighbors. Results show that the method provides 16x-20x speed-up in DC analysis

compared to direct solvers, and up to 600x speed-up in transient analysis.

Different multigrid-based reduction methods follow different reduction (restriction) schemes. In general,

methods that follow standard multigrid (SMG) techniques tend to use uniform coarsening to reduce

the grid, and linear interpolation methods to get the solution of the original network, while algebraic

multigrid (AMG) techniques tend to fix the number of iterations while coarsening, and applying some

correction schemes at the reduced networks to mitigate the error [39].

Typically, multigrid-like reduction techniques do not incur very large errors (less than 1%), since

well-designed power grids are characterized by smooth voltage variations. However, most of the modern

power grids contain a lot of irregularities and non-uniformity. Such non-uniformity results in rapid

voltage variation at some nodes, and slow variation at other nodes. This may lead to unpredictable

errors and loss of accuracy. Besides, multigrid techniques become inefficient with non-uniform grids, as

the topology and geometry of the grid have to be saved at each multigrid level.

2.5.4 Topological Node Elimination

Nodal elimination techniques refer to the topological approach to reduce the number of nodes in the

original circuit and approximate the newly added elements in the reduced circuit [40]. The main idea

of Node Elimination is based on the well-known Y −∆, or star-mesh, transformation technique used in

circuit analysis [32]. Suppose that node i is to be eliminated, then the elimination process is as follows:

• Remove all conductances connecting node i to its neighbors.

• For each two nodes j, k ∈ N (i), insert a new conductance between them given by:

$$g(jk)_{new} := g(jk)_{old} + \frac{g(ij) g(ik)}{g(i)} \tag{2.29}$$

• Remove node i.

This elimination of node i does not affect the surrounding nodes’ voltages. The basic node elimination

process for a node with four neighbors is summarized in Fig. 2.6. The main advantage of nodal

elimination techniques is that the elimination process is local, i.e., one only needs information about the

node to be eliminated and its surroundings. This makes the reduction much faster compared to other

approaches, which makes it suitable for reducing large interconnect networks.
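
The three elimination steps can be sketched as follows. This is a minimal illustration assuming the network is stored as a nested dictionary of conductances. The function name and data layout are hypothetical, and a shorted supply or ground node would simply appear as one more key.

```python
def eliminate_node(g, i):
    """Eliminate node i from a resistive network using the update in (2.29).

    g maps each node to a dict {neighbor: conductance}; it is modified in place.
    """
    nbrs = g.pop(i)                       # remove node i and its branches
    g_i = sum(nbrs.values())              # total incident conductance g(i)
    for j in nbrs:
        del g[j][i]
    nodes = list(nbrs)
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            j, k = nodes[a], nodes[b]
            new = g[j].get(k, 0.0) + nbrs[j] * nbrs[k] / g_i
            g[j][k] = new
            g[k][j] = new
    return g


# Hypothetical example: a --2S-- x --2S-- b, with node x eliminated.
net = {"a": {"x": 2.0}, "x": {"a": 2.0, "b": 2.0}, "b": {"x": 2.0}}
eliminate_node(net, "x")
print(net)   # {'a': {'b': 1.0}, 'b': {'a': 1.0}}
```

The result is the familiar series combination of two 0.5 Ω resistors into one 1 Ω resistor. Note that the function only touches node i and its neighbors, which is the locality property mentioned above. A node with k neighbors can, however, create up to k(k-1)/2 new branches.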


Figure 2.6: A simple node elimination process

One of the well-known nodal elimination methods is the Time Constant Equilibrium Reduction

Scheme, or TICER [38]. The aim of TICER is to convert circuits to smaller realizable networks while

preserving Elmore delays through RC trees. TICER’s key idea is to eliminate a node that has few

neighbors and a small time constant, or what the authors call “quick nodes”. The time constant of a

node i is given by:

$$\tau(i) = \frac{c(i)}{g(i)} \tag{2.30}$$

The node is said to be a quick node, and hence can be eliminated, if $|s\tau(i)| \ll 1$, where s is related to

the maximum operating frequency.

Taking the Laplace transform of any RC electric network system as in (2.13a), and I(s) = BR(s),

we have:

Y (s)V (s) = I(s) (2.31)

where Y (s) = G+ sC is the admittance matrix. In what follows, we will drop the s part in the equation

for simplicity, and refer to Y (s), V (s) and I(s) by Y, V and I, respectively. TICER works on a node by

node basis. In other words, assume that node n is being eliminated, then TICER partitions Y in such

a way that the (n, n)th entry forms one partition as follows:

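$$\begin{bmatrix} \tilde{Y} & y \\ y^T & g(n) + s c(n) \end{bmatrix} \begin{bmatrix} \tilde{V} \\ (V)_n \end{bmatrix} = \begin{bmatrix} \tilde{I} \\ (I)_n \end{bmatrix}$$

where y is a column vector representing the interconnections to node n, and $\tilde{Y}$ contains the admittance

information about the remaining sub-circuit. Eliminating (V )n from the equation, we get:

$$(\tilde{Y} - E)\tilde{V} = \tilde{I} - F \tag{2.32}$$

where

$$E = \frac{\lvert y y^T \rvert}{g(n) + s c(n)} \qquad F = \frac{(I)_n \lvert y \rvert}{g(n) + s c(n)} \tag{2.33}$$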


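Alternatively:

$$\begin{aligned}
(E)_{ij} &= \left\lvert \frac{(y)_i (y)_j}{g(n) + s c(n)} \right\rvert
 = \frac{\left(g(in) + s c(in)\right)\left(g(jn) + s c(jn)\right)}{g(n) + s c(n)} \\
&= \frac{g(in) g(jn)}{g(n)} \left(1 + \frac{s c(n)}{g(n)}\right)^{-1}
 + \frac{g(in) c(jn) + c(in) g(jn)}{g(n)} s \left(1 + \frac{s c(n)}{g(n)}\right)^{-1} + O(s^2)
\end{aligned} \tag{2.34}$$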

To get a realizable RC network, TICER approximates (E)ij by expanding it using Taylor series expansion

around s = 0, and dropping the higher order terms in s to get:

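$$(E)_{ij} \approx \frac{g(in) g(jn)}{g(n)} \left(1 - \frac{s c(n)}{g(n)}\right) + \frac{g(in) c(jn) + c(in) g(jn)}{g(n)} s \left(1 - \frac{s c(n)}{g(n)}\right) \tag{2.35}$$

Assuming the node being eliminated is a quick node, i.e., $\left\lvert s c(n) / g(n) \right\rvert \ll 1$, (E)ij becomes:

$$(E)_{ij} \approx \frac{g(in) g(jn)}{g(n)} + \frac{g(in) c(jn) + c(in) g(jn)}{g(n)} s \tag{2.36}$$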

Similarly, from (2.33), (F )i is approximated as follows:

$$(F)_i = \frac{\left| (y)_i \right|}{g_{(n)} + s c_{(n)}} (I)_n = \frac{g_{(in)} + s c_{(in)}}{g_{(n)} + s c_{(n)}} (I)_n \approx \frac{g_{(in)}}{g_{(n)}} (I)_n \tag{2.37}$$

This approximation lets us eliminate nodes topologically, similar to the aforementioned node elimination process in (2.29), but on RC networks, as follows. Assume node n is being eliminated; then:

1. Remove all resistors and capacitors connecting node n to other nodes.

2. Insert new resistors and capacitors between former neighbors using the following two rules:

• If nodes i and j were connected to node n through metal branches $g_{(in)}$ and $g_{(jn)}$, insert a conductance $\frac{g_{(in)} g_{(jn)}}{g_{(n)}}$ between i and j.

• If node i had a capacitor $c_{(in)}$ connected to n, and node j had a conductance $g_{(jn)}$ connected to n, insert a capacitor $\frac{c_{(in)} g_{(jn)}}{g_{(n)}}$ between nodes i and j.

3. Remove node n.
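The three steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the dictionary-based network representation, the function name `eliminate_quick_node`, and the `"gnd"` label are hypothetical choices made here, and $g_{(n)}$ is taken to be the sum of all conductances incident on node n.

```python
from itertools import combinations

def eliminate_quick_node(n, G, C):
    """Eliminate node n from an RC network using the two insertion rules.

    G and C map frozenset({a, b}) -> conductance / capacitance of the branch
    between nodes a and b (assumption: ground is just another label, "gnd").
    """
    # Branches incident on n, keyed by the node at the other end.
    g_adj = {next(iter(e - {n})): v for e, v in G.items() if n in e}
    c_adj = {next(iter(e - {n})): v for e, v in C.items() if n in e}
    g_n = sum(g_adj.values())  # assumed definition of g_(n)

    # Step 1: remove all resistors and capacitors connected to n.
    G2 = {e: v for e, v in G.items() if n not in e}
    C2 = {e: v for e, v in C.items() if n not in e}

    # Step 2, first rule: conductance g_(in) g_(jn) / g_(n) between i and j.
    for i, j in combinations(g_adj, 2):
        e = frozenset((i, j))
        G2[e] = G2.get(e, 0) + g_adj[i] * g_adj[j] / g_n

    # Step 2, second rule: capacitor c_(in) g_(jn) / g_(n) between i and j.
    for i, c_in in c_adj.items():
        for j, g_jn in g_adj.items():
            if i != j:
                e = frozenset((i, j))
                C2[e] = C2.get(e, 0) + c_in * g_jn / g_n

    # Step 3: node n no longer appears in either dictionary.
    return G2, C2

# Hypothetical example: 1 -(2 S)- 2 -(2 S)- 3, with 4 F from node 2 to ground.
G = {frozenset((1, 2)): 2, frozenset((2, 3)): 2}
C = {frozenset((2, "gnd")): 4}
G2, C2 = eliminate_quick_node(2, G, C)
```

In this small example the two series conductances collapse into a single branch between nodes 1 and 3, and the capacitance of node 2 is split between its former neighbors in proportion to their conductances.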

The authors in [5] made an attempt to extend TICER to RLC networks. They match the DC

characteristics and the first two moments at all nodes. The method is applicable to simplified RLC

models, in which at least one of the three elements R, L or C must be zero between any pair of nodes.

However, since the approximation takes only the first two moments into consideration, errors occur in a

local manner and accumulate, making the global error harder to predict and control.


An obvious limitation of TICER is that the addition of new elements makes the reduced network

denser compared to the original one. This makes TICER suitable for tree-like structures of networks

observed in circuit timing analysis problems, but not readily applicable to mesh-structured chip power

grids. The authors in [12] use TICER to eliminate nodes from on-die power grids in order to check the

safety of those grids. For the nodes being eliminated that are attached to current sources, the authors

propose a way to move the current sources to the neighboring nodes without affecting accuracy based

on some local constraints (upper and lower bounds) given by the user. Also, node-to-ground capacitors

are distributed among neighbors in a weighted ratio of conductances. The authors maintain sparsity of

the reduced grids by providing a lower bound on the conductances to be added to the grid. However,

experimental results show that the reduction ratio is not very high, and the work cannot be extended

to very large power grids. Furthermore, the sparsification procedure results in an unpredictable loss of

accuracy. Throughout the rest of this thesis, we refer to the method used in [12] to eliminate nodes from power grids topologically as the node elimination process, and we compare our results against it.

2.5.5 Other Methods

We have covered most of the popular model order reduction methods used in the literature to reduce the

size of power delivery networks for the purpose of verification. However, MOR techniques are certainly

not restricted to the ones mentioned in this section [40]. Recently, several MOR techniques that combine

more than one fundamental method have been published.

In [47], the authors follow a divide-and-conquer strategy to geometrically partition the grid into

several blocks. Each block is reduced to a much smaller one using Gaussian Elimination. The reduction

of each block is done efficiently and independently of the other blocks. After reduction, the authors apply

a port merging scheme to reduce the number of ports for each block, and a spectral graph sparsification

scheme based on random sampling of edges to reduce the density of each block and keep the simulation

runtime in check. Results show a good runtime and memory reduction with good accuracy in DC

simulations. However, the authors give very little information on their approach for RC grids, and most

of their results are shown for resistive grids.

Following a divide-and-conquer strategy also, the authors in [44] partition the grid into smaller

blocks based on the spatial locality property mentioned in section 2.3.3. The partitioner is combined

with existing MOR methods to further reduce the blocks. Each block is then simulated independently,

and then all simulation results are combined to perform full-chip analysis. The work is similar to the

hierarchical idea for power grid analysis mentioned in section 2.4. However, there too the results shown

are only for resistive power grids.

2.6 Conclusion

In this chapter, we have reviewed the modern chip design process and the resulting power grids and their

models. Moreover, we introduced the verification problem of modern power grids and the difficulties

faced in verifying very large integrated circuits, along with some of the existing techniques made in

an attempt to make the simulations faster. Finally, we reviewed the class of model order reduction

methods and their benefits, along with most of the fundamental techniques used for different purposes.

As we have seen, each model order reduction technique has its own drawbacks, and there is always

a trade-off between the accuracy of the reduced model and the runtime of the simulations. In the next


chapter, we introduce a fast hybrid approach (numerical method that factors in grid topology) for RC

power grid reduction that exploits the fact that all logic circuitry is connected to the low metal layers of

power grids. The method is a layer-by-layer elimination process that extends the TICER approach into

a block-matrix approach that preserves grid topology.

Chapter 3

Grid Reduction

3.1 Introduction

As seen in chapter 2, existing model order reduction (MOR) methods have various difficulties when

reducing very large on-die power grids with many ports. Numerical methods based on moment matching,

like PRIMA, do not scale well with the number of ports, as the reduced system becomes denser and

larger. Besides, PRIMA puts an upper bound on the size of the grid it can reduce, as it needs significant

computational resources to compute the actual projection matrix. Topological methods, like TICER and

node elimination [12], are suitable for selective node elimination. However, this node-by-node elimination

process becomes computationally expensive when eliminating several nodes next to each other, as the

grid becomes dense. Therefore, node elimination does not serve our purpose to eliminate whole metal

layers from the power grid. Other methods do not perform well when there is non-uniformity in the

grids, or when taking into account the capacitive and inductive parasitics while modeling the grid. In this

chapter, we propose a novel technique for RC power grid reduction that is a hybrid of both numerical and

topological approaches – the method is numerical but also factors in grid topology to preserve sparsity.

Instead of the traditional node-by-node topological elimination, we provide a numerical layer-by-layer

block-matrix approach that is both fast and accurate. A version of this work has been accepted and is

to appear in ICCAD, November 2016 [45].

The chapter is organized as follows. Section 3.2 introduces the block-matrix representation of the

grid and the reduction process performed on resistive grids. Section 3.3 provides a sparsification scheme

based on the topology and electrical characteristics of the grid in order to preserve the sparsity of the

reduced grid while maintaining the original dynamic behavior. After that, section 3.4 presents a major

contribution of this thesis that extends our numerical reduction to RC grids, giving a proof that it is

equivalent to the approximation used in [12]. Section 3.5 presents a faster incremental approach to

eliminate the metal layers. Finally, section 3.6 shows some experimental results and analysis of our

proposed approach.

3.2 Layer Elimination in Resistive Grids

In [47], the authors divided the NA conductance matrix to a 2 × 2 block matrix representing port and

non-port (to be eliminated) blocks, where a port block is one that contains nodes attached to current

sources (underlying circuitry), and then they performed Gaussian Elimination to reduce the grid to only

the port block. In this section, we divide the grid into three regions and generalize the work done in [47]

to a 3× 3 block-matrix Gaussian Elimination. Since the resulting grid is dense, we propose in the next

section a topological technique for power grid sparsification based on the relation between the effective

resistance between two nodes in the grid and spatial locality.

Let us divide a resistive grid into 3 blocks; one representing the top layers of the grid that should be

kept (the topmost layer is connected to the C4 bumps), another representing the layers that are desired

to be eliminated (middle layers), and finally a block representing the lower (bottom) layers that are

connected to the current sources.

Let nl be the total number of nodes in the layers to be eliminated (middle layers), nt be the number

of nodes in the top layers of the grid, and let nb be the number of nodes in the bottom layers of the grid.

Note that n = nt + nl + nb, where n is the total number of nodes in the grid. Consider the following

representation of the resistive grid conductance matrix of size n× n:

$$G = \begin{bmatrix} G_{11} & G_{12} & 0 \\ G_{12}^T & G_{22} & G_{23} \\ 0 & G_{23}^T & G_{33} \end{bmatrix} \tag{3.1}$$

where $G_{22}$ is an $n_l \times n_l$ conductance matrix that represents the middle metal layers, $G_{11}$ is an $n_t \times n_t$ conductance matrix representing the top layers of the grid, and $G_{33}$ is an $n_b \times n_b$ conductance matrix representing the bottom layers of the grid. $G_{12}$ and $G_{23}$ are non-positive, non-square matrices that represent the vias between the middle layers and the top and bottom layers, respectively, where $G_{12}$ is of size $n_t \times n_l$, and $G_{23}$ is of size $n_l \times n_b$. Note that $G_{11}$, $G_{22}$ and $G_{33}$ are all M-matrices (their inverses exist and are non-negative), since any principal sub-matrix of an M-matrix is also an M-matrix [31].

Recall that from (2.4), one can get the DC voltage drops $V = \begin{bmatrix} V_1 \\ V_2 \\ V_3 \end{bmatrix}$ at all nodes in the grid by solving:

$$GV = I \tag{3.2}$$

where $V_1$ is of size $n_t \times 1$, $V_2$ is of size $n_l \times 1$, and $V_3$ is of size $n_b \times 1$, and $I = \begin{bmatrix} 0 \\ 0 \\ I_3 \end{bmatrix}$ is an $n \times 1$ vector representing the DC current values drawn from each node in the grid to ground. Note that $I_3$ is of size $n_b \times 1$, and the zeros are vectors of sizes $n_t \times 1$ and $n_l \times 1$.

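Combining (3.1) and (3.2), and solving for $V_2$, we get:

$$V_2 = -G_{22}^{-1}\left(G_{12}^T V_1 + G_{23} V_3\right) \tag{3.3}$$

From (3.1) and (3.2), we also have:

$$G_{11} V_1 + G_{12} V_2 = 0 \tag{3.4}$$

Substituting (3.3) into (3.4), we get:

$$\left(G_{11} - G_{12} G_{22}^{-1} G_{12}^T\right) V_1 - G_{12} G_{22}^{-1} G_{23} V_3 = 0 \tag{3.5}$$

Following a similar procedure, we also get:

$$-G_{23}^T G_{22}^{-1} G_{12}^T V_1 + \left(G_{33} - G_{23}^T G_{22}^{-1} G_{23}\right) V_3 = I_3 \tag{3.6}$$

Let:

$$\begin{aligned}
\hat{G}_{11} &\triangleq G_{11} - G_{12} G_{22}^{-1} G_{12}^T \\
\hat{G}_{13} &\triangleq -G_{12} G_{22}^{-1} G_{23} \\
\hat{G}_{33} &\triangleq G_{33} - G_{23}^T G_{22}^{-1} G_{23}
\end{aligned} \tag{3.7}$$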

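Then, the reduced system matrix can be represented by:

$$\hat{G} = \begin{bmatrix} \hat{G}_{11} & \hat{G}_{13} \\ \hat{G}_{13}^T & \hat{G}_{33} \end{bmatrix} \tag{3.8}$$

where $\hat{G}$ is of size $\hat{n} \times \hat{n}$, with $\hat{n} = n - n_l = n_t + n_b$, and the DC voltage drops at all nodes in the reduced grid can be obtained by solving:

$$\hat{G}\hat{V} = \hat{I} \tag{3.9}$$

where $\hat{V} = \begin{bmatrix} V_1 \\ V_3 \end{bmatrix}$ is an $\hat{n} \times 1$ vector that represents the DC voltage drops at all nodes in the reduced grid. $V_1$ and $V_3$ are of sizes $n_t \times 1$ and $n_b \times 1$, respectively. $\hat{I} = \begin{bmatrix} 0 \\ I_3 \end{bmatrix}$ is also an $\hat{n} \times 1$ vector representing the DC current values drawn from each node in the reduced grid to ground. Note that $I_3$ is still the same as in (3.2).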
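As a sanity check of the block elimination in (3.7) to (3.9), the following sketch reduces a tiny resistive grid and compares the reduced solution with the full one. It is a minimal illustration, not the implementation used in this work: the helper names (`solve`, `matmul`, `transpose`, `eliminate_middle`) and the 4-node chain with its conductance values are hypothetical, and exact rational arithmetic is used only to keep the comparison free of rounding.

```python
from fractions import Fraction

def solve(A, B):
    """Return X with A X = B, using exact Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in A[r]] + [Fraction(v) for v in B[r]] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # find a pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def eliminate_middle(G11, G12, G22, G23, G33):
    """Block Gaussian elimination of the middle block, following (3.7)."""
    X12 = solve(G22, transpose(G12))  # G22^{-1} G12^T
    X23 = solve(G22, G23)             # G22^{-1} G23
    S11, S13 = matmul(G12, X12), matmul(G12, X23)
    S33 = matmul(transpose(G23), X23)
    Gh11 = [[a - b for a, b in zip(r, s)] for r, s in zip(G11, S11)]
    Gh13 = [[-b for b in s] for s in S13]
    Gh33 = [[a - b for a, b in zip(r, s)] for r, s in zip(G33, S33)]
    return Gh11, Gh13, Gh33

# Hypothetical chain: pad -(1 S)- t -(2 S)- m1 -(2 S)- m2 -(2 S)- b
G11, G12 = [[3]], [[-2, 0]]
G22, G23 = [[4, -2], [-2, 4]], [[0], [-2]]
G33 = [[2]]
Gh11, Gh13, Gh33 = eliminate_middle(G11, G12, G22, G23, G33)

# Assemble the reduced matrix (3.8) and solve (3.9) with 1 A drawn at node b.
G_red = [r1 + r3 for r1, r3 in zip(Gh11, Gh13)] + \
        [r3t + r33 for r3t, r33 in zip(transpose(Gh13), Gh33)]
V_red = solve(G_red, [[0], [1]])

# Solve the full system (3.2) for comparison.
G_full = [[3, -2, 0, 0], [-2, 4, -2, 0], [0, -2, 4, -2], [0, 0, -2, 2]]
V_full = solve(G_full, [[0], [0], [0], [1]])
```

The voltage drops at the kept nodes (`V_full[0]` and `V_full[3]`) coincide with `V_red`, which is what exact Gaussian elimination of the middle block should deliver before any sparsification is applied.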

3.3 Sparsification

As the size of the grid increases, the number of nodes in the middle layers increases, and thus, Gaussian

Elimination results in rather dense models that are hard to store in memory, and expensive to use

in simulations. To overcome this issue, we use some heuristic techniques to reduce the number of

connections in the reduced model topologically, without creating too many inaccuracies in both the DC

and transient simulations. The method is based on the effective resistance between any two nodes in

the original grid.

For every conductance in the reduced model, the sparsification process considers the effective resistance between the terminals of that conductance in the original grid. If that effective resistance is less than a user-defined threshold $R_{\text{eff,th}}$, and if the value of the newly added conductance is greater than a user-defined tolerance $\delta > 0$, then we keep the conductance; otherwise, we remove it. The effective resistance $R^{(ab)}_{\text{eff}}$ between nodes a and b can be computed as follows:

$$R^{(ab)}_{\text{eff}} = (e_a - e_b)^T G^{-1} (e_a - e_b) \tag{3.10}$$

where $e_i$ is an $n \times 1$ vector that includes a 1 at index $i$ and 0 otherwise. Since the reduced grid is dense (close to full), the number of edges is almost $\binom{\hat{n}}{2}$, where $\hat{n}$ is the number of nodes in the reduced grid, i.e., the method would require $\binom{\hat{n}}{2}$ DC solves of the original grid in order to sparsify the new reduced grid. This becomes very inefficient as the size of the grid increases. Instead, one can notice that there is no need to actually compute the effective resistance, if we have an estimate of how small or large it is. This leads to our proposed sparsification technique.

Figure 3.1: Effective resistance between grid nodes vs. distance
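To make the cost argument concrete, the sketch below evaluates (3.10) with one linear solve per node pair. It is an illustration only: the function names and the 4-node chain (a pad conductance of 1 S at node 0 and three 2 S branches) are hypothetical, and exact fractions are used purely for readability of the result.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in A[r]] + [Fraction(b[r])] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # find a pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n] for row in M]

def effective_resistance(G, a, b):
    """Evaluate (e_a - e_b)^T G^{-1} (e_a - e_b) with a single DC solve."""
    rhs = [0] * len(G)
    rhs[a] += 1
    rhs[b] -= 1
    x = solve(G, rhs)      # x = G^{-1} (e_a - e_b)
    return x[a] - x[b]     # (e_a - e_b)^T x

# Hypothetical chain: pad -(1 S)- 0 -(2 S)- 1 -(2 S)- 2 -(2 S)- 3
G = [[3, -2, 0, 0], [-2, 4, -2, 0], [0, -2, 4, -2], [0, 0, -2, 2]]
R_03 = effective_resistance(G, 0, 3)  # three 0.5-ohm branches in series
R_01 = effective_resistance(G, 0, 1)  # a single 0.5-ohm branch
```

Each call costs one solve of the original grid, so evaluating every candidate edge of a nearly full reduced grid this way scales with the number of node pairs, which is the inefficiency the topological estimate is meant to avoid.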

Fig. 3.1 shows the effective resistance between different nodes in the reduced grid versus the distance

between them. The grid used was resistive of size 3K nodes with 8 layers (15µm× 15µm in dimension).

Six layers were eliminated, keeping only the topmost (M8) and bottommost (M1) layers. The figure

shows the distribution of the original effective resistance between nodes in layer M8 (in magenta),

between nodes in layer M8 and layer M1 (in green), between nodes in layer M1 with the same x-

coordinate but different y-coordinate (in yellow), between nodes in layer M1 with the same y-coordinate

but different x-coordinate (in blue), and between nodes in layer M1 with different x and different y

coordinates (in red). The figure also shows a 13Ω line threshold, below which the effective resistance is

considered good, and the edge is considered important. This threshold was chosen based on empirical

results, as it gave a clear division of the effective resistance region and resulted in a good accuracy after

reduction. We can see that all connections in the reduced grid that are between nodes in layer M8,

or between nodes in M8 and M1 are considered important no matter what the distance between them

is. Moreover, all connections between nodes in M1 that have the same x-coordinate and lie on the same

line (interconnect tree), or on lines that are one line apart (inside dashed ovals in the figure), are considered

important. In addition, all connections between nodes in M1 that have the same y-coordinate and lie at

most three lines (interconnect trees) away from each other (inside pink dotted oval in the figure) are

also important. All other connections are considered not important. This is the basis of our efficient

sparsification technique.

Based on the spatial locality, the closeness to the boundaries and the relation between the top and

bottom layers of the grid, we propose a method that captures the important connections in the reduced

grid while preserving the electrical properties of the original one as much as possible. Our method uses

the relation between the effective resistance between two nodes and spatial locality in power grids. Using


the concept of spatial locality mentioned in section 2.3.3, one can say that there is a correlation between

the effective resistance between any two nodes and the length of the path between those nodes. In other

words, as the nodes become spatially closer to each other, the effective resistance between them becomes

smaller.

Assume that the metal layers are indexed in increasing order from bottom to top. Let h be the index

of the lowest layer of the top layers in the grid that is not being eliminated, and let l, l < h, be the

index of the highest layer in the bottom layers, which means that ∀k ∈ {l+1, . . . , h−1}, layer k is being

eliminated. After reduction, there are three types of new connections in the resulting dense grid:

• Connections between nodes within layer l.

• Connections between nodes within layer h.

• Connections between nodes in layer l, and nodes in layer h.

For the nodes in layer l, the effective resistance is small between nodes that lie in a neighborhood of a

specific size (based on spatial locality). In other words, for an edge that connects two nodes in the same

neighborhood, one can safely assume that it is an important edge and keep it (provided the new conductance is greater than the tolerance δ); otherwise, it can be removed. The choice of the size of the neighborhood is up to the user. Clearly, the larger the neighborhood, the denser, and the more accurate, the final sparsified grid will be. Hence, there is a trade-off between the sparsity of the grid (and thus simulation speed) and the accuracy of the simulation. As seen in Fig. 3.1, nodes on the same interconnect tree that lie within 2-3 nodes of each other, and nodes in different interconnect trees that lie within 2-3 lines of each other, have relatively small effective resistance between them. Hence, in

our sparsification, we use this as the neighborhood.

For the other two types of connections, we keep them, if their conductance is greater than δ. The

reason behind this is first of all, the interconnect tree width in the top layer h is high. This means

that layer h tends to have high connectivity, which implies lower effective resistance. Second, the

via resistances between layers in the original grid are small compared to the branch resistances. This

implies that nodes in different layers tend to have lower effective resistance between them. Algorithm 1

summarizes our sparsification method, which we call SPER, or sparsification based on effective resistance.

The algorithm takes as input the grid Gin to sparsify, along with h, l and the tolerance δ, and outputs

Gsp, which is the sparsified grid. If we have a good representation of the topology of the grid, the

sparsification procedure, which indicates the important edges in the reduced model, will be very efficient

and easy to implement.

In section 3.6, we will compare SPER to the method of computing exact effective resistances and

setting a threshold on them by looking at the average error of the DC simulation of the new reduced

sparsified grids from both approaches compared to the original grid. We will see that they are almost

the same.
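The edge filter described in this section (Algorithm 1, SPER) can be sketched as below. This is a hedged illustration rather than the implementation used in this work: the edge dictionary, the `layer` and `x` maps, and the `same_neighborhood` predicate are hypothetical stand-ins for the grid topology, and the predicate is assumed to return True whenever an endpoint lies in layer h, so that such edges are kept whenever their conductance exceeds δ, as the prose above describes.

```python
def sper(edges, layer, h, l, delta, same_neighborhood):
    """Sketch of SPER: keep only the edges considered important."""
    kept = {}
    for (i, j), g in edges.items():
        if any(layer[v] < l or layer[v] > h for v in (i, j)):
            kept[(i, j)] = g      # edge touches a layer outside [l, h]: keep
        elif not same_neighborhood(i, j) or g <= delta:
            continue              # far apart or negligible: do not add
        else:
            kept[(i, j)] = g
    return kept

# Hypothetical reduced grid: three nodes in layer l = 1, one node in layer h = 8.
layer = {"a": 1, "b": 1, "c": 1, "t": 8}
x = {"a": 0, "b": 1, "c": 10, "t": 0}   # made-up line indices along one axis

def same_neighborhood(i, j, radius=3):
    # Assumption: any edge with an endpoint in layer h counts as "near".
    if layer[i] == 8 or layer[j] == 8:
        return True
    return abs(x[i] - x[j]) <= radius

edges = {("a", "b"): 0.5, ("a", "c"): 0.4, ("a", "t"): 0.3, ("b", "t"): 0.001}
sparse = sper(edges, layer, h=8, l=1, delta=0.01,
              same_neighborhood=same_neighborhood)
```

Here the edge between `a` and `c` is dropped because the nodes are far apart, and the edge between `b` and `t` is dropped because its conductance is below the tolerance; no effective resistance is ever computed.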

3.4 Extension to RC Grids

In this section, we will extend the DC grid reduction of section 3.2 to RC power grids. Our main

contribution is to reduce an RC grid numerically to eliminate multiple layers at once, rather than

topologically on a node by node basis. We introduce a method to approximate the reduced capacitance matrix such that we have a realizable grid reduction technique. The approximation is based on the work done in [38] and [12] (refer to section 2.5.4). We give a proof to show that eliminating all the layers at once using our approach and eliminating them on a node by node basis produce the same reduced system.

Algorithm 1 SPER

Inputs: Gin, h, l, δ
Outputs: Gsp

Gsp = new empty grid
for all edges g(ij) between nodes i and j in Gin do
    if i or j is in layer k && (k < l || k > h) then
        add g(ij) to Gsp
    else
        if i and j are not inside the same neighborhood in layer l || g(ij) ≤ δ then
            continue // DO NOT add g(ij) to Gsp
        else
            add g(ij) to Gsp
        end if
    end if
end for
return Gsp

Let C be the capacitance matrix defined in section 2.3.2. Assume that it is partitioned as follows:

$$C = \begin{bmatrix} C_1 & 0 & 0 \\ 0 & C_2 & 0 \\ 0 & 0 & C_3 \end{bmatrix} \tag{3.11}$$

where $C_1$, $C_2$ and $C_3$ are all diagonal matrices representing the capacitances connected from each node in the top, middle and bottom layers, respectively, to ground. Note that $C_1$ is of size $n_t \times n_t$, $C_2$ is of size $n_l \times n_l$ and $C_3$ is of size $n_b \times n_b$.

Recall that the authors in [12] used the TICER algorithm [38] to account for the capacitance connected to a node being eliminated using Node Elimination while approximately preserving the time constant of the circuit. TICER distributes the capacitance of a node among its neighbors in a weighted ratio of conductances. Analytically, this translates to:

$$c'_{(j)} \triangleq c_{(j)} + \frac{c_{(i)} g_{(ij)}}{g_{(i)}} \quad \forall j \in \mathcal{N}(i) \tag{3.12}$$

where $i$ is the node being eliminated, and $c'_{(j)}$ is the updated value of the capacitance connected from node $j$ to ground. This approximation assumes that the node being eliminated is a so-called quick node [38], i.e.:

$$2\pi f_{\max} \tau_{(i)} \ll 1$$

where $f_{\max}$ is the maximum operating frequency of interest, and $\tau_{(i)} = \frac{c_{(i)}}{g_{(i)}}$ is referred to as the time constant of node $i$. Fortunately, as seen in Fig. 3.2, the maximum $2\pi f_{\max} \tau_{(i)}$ over all the nodes $i$ in metal layers M2 to M8 in modern chip power grids is very small, less than 0.012. Note that the power grid used is generated internally based on 45nm technology data. Even if a higher $f_{\max}$ is chosen, such as 2 or 3 GHz, it is clear that the quick node approximation remains valid for modern grids. Hence, all the nodes in the layers being eliminated are considered quick nodes.
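The quick node criterion is a one-line computation per node. The sketch below evaluates $2\pi f_{\max}\tau_{(i)}$ for a single node; the function name and the numerical values (1 fF to ground, 0.5 S of total incident conductance) are purely hypothetical and are not taken from the grid of Fig. 3.2.

```python
import math

def quick_node_metric(c, g, f_max):
    """Return 2*pi*f_max*tau, with tau = c / g (c in farads, g in siemens)."""
    return 2 * math.pi * f_max * (c / g)

# Hypothetical node: 1 fF to ground, 0.5 S total incident conductance.
m_1ghz = quick_node_metric(1e-15, 0.5, 1e9)
m_3ghz = quick_node_metric(1e-15, 0.5, 3e9)
```

Because the metric is linear in $f_{\max}$, tripling the frequency triples the value, so a node whose metric is far below 1 at 1 GHz remains a quick node at 3 GHz.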

[Histogram of node count (0 to 6000) versus $2\pi f_{\max}\tau_{(i)}$ (0 to 0.012).]

Figure 3.2: Quick Node property of a power grid of size 152K nodes ($f_{\max}$ = 1 GHz).

In this chapter, we generalize (3.12) to eliminate multiple layers at once, as follows. Let $c_1$ be an $n_t \times 1$ column vector of the capacitors connected from node to ground at every node in the top layers, i.e., $c_1$ is a column vector of the diagonal entries of $C_1$. Likewise, let $c_2$ and $c_3$ be $n_l \times 1$ and $n_b \times 1$ column vectors of the diagonal entries of $C_2$ and $C_3$, respectively.

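Let:

$$\hat{c}_1 \triangleq c_1 - G_{12} G_{22}^{-1} c_2 \tag{3.13a}$$

$$\hat{c}_3 \triangleq c_3 - G_{23}^T G_{22}^{-1} c_2 \tag{3.13b}$$

and let:

$$\hat{C} \triangleq \begin{bmatrix} \hat{C}_1 & 0 \\ 0 & \hat{C}_3 \end{bmatrix} \tag{3.14}$$

where $\hat{C}_1$ and $\hat{C}_3$ are diagonal matrices consisting of the entries of $\hat{c}_1$ and $\hat{c}_3$, respectively. Claim 2 proves that $\hat{C}$ is the reduced capacitance matrix of the grid obtained by applying (3.12) sequentially, while eliminating the desired nodes.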
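A small numerical sketch of (3.13) follows. The helper names and the 4-node chain (one top node, two middle nodes, one bottom node, with made-up capacitances) are hypothetical. The final check relies on an additional assumption stated here explicitly: the middle nodes have no direct conductance to the reference, in which case the redistributed capacitance sums to the original total.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in A[r]] + [Fraction(b[r])] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # find a pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n] for row in M]

def reduce_caps(c1, c2, c3, G12, G22, G23):
    """Evaluate (3.13a) and (3.13b) for node-to-ground capacitance vectors."""
    x = solve(G22, c2)  # G22^{-1} c2
    ch1 = [c - sum(g * xi for g, xi in zip(row, x)) for c, row in zip(c1, G12)]
    ch3 = [c - sum(g * xi for g, xi in zip(col, x)) for c, col in zip(c3, zip(*G23))]
    return ch1, ch3

# Hypothetical chain: t -(2 S)- m1 -(2 S)- m2 -(2 S)- b
G12, G22, G23 = [[-2, 0]], [[4, -2], [-2, 4]], [[0], [-2]]
c1, c2, c3 = [1], [2, 4], [3]   # made-up capacitances
ch1, ch3 = reduce_caps(c1, c2, c3, G12, G22, G23)
total_before = sum(c1) + sum(c2) + sum(c3)
total_after = sum(ch1) + sum(ch3)
```

Since $G_{12}$ and $G_{23}$ are non-positive and $G_{22}^{-1}$ is non-negative, the subtracted terms are non-positive, so the capacitances at the kept nodes can only grow.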

Before we start with the main claim of this section, we will prove that the order of the nodes (or

layers) to be eliminated in the original system matrix does not matter, as we always obtain the same

reduced model using our layer elimination approach. This observation helps in proving our main result.

Claim 1. Eliminating q nodes in any order using (3.7) and (3.13) from an RC power grid produces the

same reduced RC grid.

Proof. Let $G_{22}$ be an $n_l \times n_l$ conductance matrix that represents the connections between the $n_l$ nodes to be eliminated in a specific order. Changing the order of the nodes to be eliminated is achieved by permuting the order of the rows and columns of $G_{22}$, using $P G_{22} P^T$, where $P$ is a permutation matrix. A permutation matrix $P$ is a matrix obtained by permuting the rows or columns of an identity matrix according to some permutation. Moreover, $P^{-1} = P^T$, because permutation matrices are unitary [34].

Let $P$ be an arbitrary $n_l \times n_l$ permutation matrix, and let $Q$ be another permutation matrix of size $n \times n$ given by:

$$Q = \begin{bmatrix} I_{n_t} & 0 & 0 \\ 0 & P & 0 \\ 0 & 0 & I_{n_b} \end{bmatrix} \tag{3.15}$$

where $n = n_t + n_l + n_b$, and $I_{n_t}$ and $I_{n_b}$ are identity matrices of sizes $n_t \times n_t$ and $n_b \times n_b$, respectively.


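Let $G'$ be a reordered version of $G$ in (3.1) given by:

$$\begin{aligned}
G' &= Q G Q^T \\
&= \begin{bmatrix} I_{n_t} & 0 & 0 \\ 0 & P & 0 \\ 0 & 0 & I_{n_b} \end{bmatrix} \begin{bmatrix} G_{11} & G_{12} & 0 \\ G_{12}^T & G_{22} & G_{23} \\ 0 & G_{23}^T & G_{33} \end{bmatrix} \begin{bmatrix} I_{n_t} & 0 & 0 \\ 0 & P^T & 0 \\ 0 & 0 & I_{n_b} \end{bmatrix} \\
&= \begin{bmatrix} G_{11} & G_{12} P^T & 0 \\ P G_{12}^T & P G_{22} P^T & P G_{23} \\ 0 & G_{23}^T P^T & G_{33} \end{bmatrix}
\end{aligned} \tag{3.16}$$

From (3.16), we have:

$$\begin{aligned}
G'_{11} &= G_{11} \\
G'_{12} &= G_{12} P^T \\
G'_{22} &= P G_{22} P^T \\
G'_{23} &= P G_{23} \\
G'_{33} &= G_{33}
\end{aligned} \tag{3.17}$$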

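To find the reduced grid of $G'$ after elimination, we apply (3.7) on the partitions in (3.17). Hence, we get:

$$\begin{aligned}
\hat{G}'_{11} &= G'_{11} - G'_{12} G'^{-1}_{22} G'^T_{12} \\
&= G_{11} - G_{12} P^T P G_{22}^{-1} P^T P G_{12}^T \\
&= G_{11} - G_{12} G_{22}^{-1} G_{12}^T \\
&= \hat{G}_{11} \\
\hat{G}'_{13} &= -G'_{12} G'^{-1}_{22} G'_{23} \\
&= -G_{12} P^T P G_{22}^{-1} P^T P G_{23} \\
&= -G_{12} G_{22}^{-1} G_{23} \\
&= \hat{G}_{13} \\
\hat{G}'_{33} &= G'_{33} - G'^T_{23} G'^{-1}_{22} G'_{23} \\
&= G_{33} - G_{23}^T P^T P G_{22}^{-1} P^T P G_{23} \\
&= G_{33} - G_{23}^T G_{22}^{-1} G_{23} \\
&= \hat{G}_{33}
\end{aligned} \tag{3.18}$$

where $G'^{-1}_{22} = (P G_{22} P^T)^{-1} = P G_{22}^{-1} P^T$, since $P$ is unitary. From (3.18), we can see that $\hat{G} = \hat{G}'$.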

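Similarly, reordering the matrix $C$ using $Q$ gives the following reordered versions of the vectors $c_1$, $c_2$ and $c_3$:

$$\begin{aligned}
c'_1 &= c_1 \\
c'_2 &= P c_2 \\
c'_3 &= c_3
\end{aligned} \tag{3.19}$$

Using (3.17) and (3.19) in (3.13), we get:

$$\begin{aligned}
\hat{c}'_1 &= c'_1 - G'_{12} G'^{-1}_{22} c'_2 \\
&= c_1 - G_{12} P^T P G_{22}^{-1} P^T P c_2 \\
&= \hat{c}_1 \\
\hat{c}'_3 &= c'_3 - G'^T_{23} G'^{-1}_{22} c'_2 \\
&= c_3 - G_{23}^T P^T P G_{22}^{-1} P^T P c_2 \\
&= \hat{c}_3
\end{aligned} \tag{3.20}$$

From (3.20), we can see that $\hat{C}' = \hat{C}$, and this completes the proof.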

Now we prove the equivalence between (3.12) and (3.13).

Claim 2. Applying (3.13) when eliminating all q nodes of the middle layers gives the same capacitance

values as applying (3.12) sequentially q times for the nodes in the middle layers, while using Node

Elimination.

Proof. We will prove that the new capacitances connected to the top layers are exactly the same in both

cases. In other words, we are going to prove that (3.13a) and (3.12) are equivalent. The proof for the

bottom layers’ capacitances (3.13b) follows the same structure, and is skipped for brevity. The proof is

by induction on the number of nodes being eliminated. Throughout the rest of the proof, the following

notation will be used.

Let $G$ be the $n \times n$ conductance matrix as defined in (2.5). Define $G^{(q)}$ to be a q-partition of $G$, where a q-partition is a block-matrix representation of $G$ such that the number of nodes to be eliminated is $q$, and the indexing of the nodes is as follows: all the nodes before the $q$ nodes (top nodes) have indices $1, 2, \ldots, \bar{n}_t$, all the nodes to be eliminated have indices $\bar{n}_t + 1, \ldots, \bar{n}_t + q$, and all nodes after the $q$ nodes (bottom nodes) have indices $\bar{n}_t + q + 1, \ldots, n$. Analytically,

$$G^{(q)} = \begin{bmatrix} G^{(q)}_{11} & G^{(q)}_{12} & G^{(q)}_{13} \\ G^{(q)T}_{12} & G^{(q)}_{22} & G^{(q)}_{23} \\ G^{(q)T}_{13} & G^{(q)T}_{23} & G^{(q)}_{33} \end{bmatrix} \tag{3.21}$$

This means that $G^{(q)}_{22}$ consists of all the $q$ nodes to be eliminated. The sizes of the partitions are as follows: $G^{(q)}_{11}$ is $\bar{n}_t \times \bar{n}_t$, $G^{(q)}_{12}$ is $\bar{n}_t \times q$, $G^{(q)}_{22}$ is $q \times q$, $G^{(q)}_{23}$ is $q \times \bar{n}_b$, $G^{(q)}_{13}$ is $\bar{n}_t \times \bar{n}_b$, and $G^{(q)}_{33}$ is $\bar{n}_b \times \bar{n}_b$, where $\bar{n}_b = n - (\bar{n}_t + q)$.

Recall that in (3.1), $G_{13} = 0$. This is due to the fact that in (3.1), $G_{22}$ consists of a whole metal layer (or multiple layers), and as discussed in section 2.3.1, each metal layer in the structure of chip power grids is connected to only the layers above and below it directly. This means that there are no physical connections between the top and bottom layers, and $G_{13} = 0$. Hence, if the $q$ nodes being eliminated form a whole layer, then $G^{(q)}_{13} = 0$. In fact, if $q = n_l$, then $\bar{n}_t = n_t$, $\bar{n}_b = n_b$, and the block-matrix representations in (3.1) and (3.21) are equivalent.

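Furthermore, let $\hat{G}^{(q)}$ be the resulting $(\bar{n}_t + \bar{n}_b) \times (\bar{n}_t + \bar{n}_b)$ conductance matrix after applying Gaussian Elimination on $G^{(q)}_{22}$, i.e., let $\hat{G}^{(q)}_{11}$, $\hat{G}^{(q)}_{13}$ and $\hat{G}^{(q)}_{33}$ be such that:

$$\begin{aligned}
\hat{G}^{(q)}_{11} &= G^{(q)}_{11} - G^{(q)}_{12} G^{(q)-1}_{22} G^{(q)T}_{12} \\
\hat{G}^{(q)}_{13} &= G^{(q)}_{13} - G^{(q)}_{12} G^{(q)-1}_{22} G^{(q)}_{23} \\
\hat{G}^{(q)}_{33} &= G^{(q)}_{33} - G^{(q)T}_{23} G^{(q)-1}_{22} G^{(q)}_{23}
\end{aligned} \tag{3.22}$$

Then:

$$\hat{G}^{(q)} = \begin{bmatrix} \hat{G}^{(q)}_{11} & \hat{G}^{(q)}_{13} \\ \hat{G}^{(q)T}_{13} & \hat{G}^{(q)}_{33} \end{bmatrix}$$

Note that finding the above expression for $\hat{G}^{(q)}_{13}$ follows the same structure of the Gaussian Elimination mentioned in section 3.2.

Finally, let us define $C^{(q)}$, a q-partition of $C$, such that $c^{(q)}_1$, $c^{(q)}_2$ and $c^{(q)}_3$ are the $\bar{n}_t \times 1$, $q \times 1$ and $\bar{n}_b \times 1$ vectors representing the capacitors from each node to ground in the corresponding partitions of the grid. And, let $\hat{c}^{(q)}_1$ and $\hat{c}^{(q)}_3$ be the resulting $\bar{n}_t \times 1$ and $\bar{n}_b \times 1$ vectors after the elimination of $q$ nodes using (3.13), respectively.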

The structure of the proof is as follows:

• Base step: Form a 1-partition of G and C, and prove that (3.12) and (3.13a) are equivalent.

• Inductive step: For any integer q > 1,

1. Form a (q− 1)-partition of G and C, and reduce the system using (3.22) and (3.13) to get G′

and C ′. Assume that (3.12) and (3.13a) are equivalent when eliminating q − 1 nodes.

2. Eliminate 1 more node using node elimination and (3.12) from G′ and C ′ to get C ′′ and its

partitions c′′1 and c′′3 .

3. Form a q-partition of G and C using the (q − 1) nodes and the additional node used in steps 1 and 2. Apply (3.13a) to get $\hat{c}^{(q)}_1$.

4. Prove that $c''_1 = \hat{c}^{(q)}_1$.

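Base Step: Let node $k = \bar{n}_t + 1$ be the node we need to eliminate, and let $G^{(1)}$ and $C^{(1)}$ be the 1-partitions made on $G$ and $C$ to eliminate node $k$. Then, $G^{(1)}_{22} = g_{(k)}$, $c^{(1)}_2 = c_{(k)}$, and:

$$\left(G^{(1)}_{12}\right)_{jk} = \begin{cases} -g_{(jk)} & \text{if } j \text{ is a neighbor of } k \\ 0 & \text{otherwise} \end{cases}$$

Applying (3.13a), we get:

$$\hat{c}^{(1)}_1 = c^{(1)}_1 - G^{(1)}_{12} G^{(1)-1}_{22} c^{(1)}_2 = c^{(1)}_1 - \frac{c_{(k)}}{g_{(k)}} G^{(1)}_{12}$$

which translates to:

$$\left(\hat{c}^{(1)}_1\right)_j = \begin{cases} \left(c^{(1)}_1\right)_j + \dfrac{c_{(k)} g_{(jk)}}{g_{(k)}} & \text{if } j \text{ is a neighbor of } k \\ \left(c^{(1)}_1\right)_j & \text{otherwise} \end{cases} \tag{3.23}$$

On the other hand, applying (3.12) to get the new capacitances at all top nodes, we get:

$$\left(\hat{c}^{(1)}_1\right)_j = \begin{cases} \left(c^{(1)}_1\right)_j + \dfrac{c_{(k)} g_{(jk)}}{g_{(k)}} & \text{if } j \text{ is a neighbor of } k \\ \left(c^{(1)}_1\right)_j & \text{otherwise} \end{cases} \tag{3.24}$$

We can see that (3.23) and (3.24) are the same. This concludes the base step.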

Inductive Step: Assume that the claim is true for eliminating $q - 1$ nodes, i.e., assume that:

$$\begin{aligned}
\hat{c}^{(q-1)}_1 &= c^{(q-1)}_1 - G^{(q-1)}_{12} G^{(q-1)-1}_{22} c^{(q-1)}_2 \\
\hat{c}^{(q-1)}_3 &= c^{(q-1)}_3 - G^{(q-1)T}_{23} G^{(q-1)-1}_{22} c^{(q-1)}_2
\end{aligned} \tag{3.25}$$

are equivalent to applying (3.12) $q - 1$ times sequentially while eliminating $q - 1$ nodes using node elimination.

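Let the additional node to be eliminated be the first node in the bottom nodes, which has index $k = \bar{n}_t + q$. From Claim 1, note that eliminating the nodes in any order does not affect the final reduced system. Forming the q-partition of $G$ with this additional node, we get:

\begin{align}
c^{(q)}_1 &= c^{(q-1)}_1 = c_1 \tag{3.26a} \\
c^{(q)}_2 &= \begin{bmatrix} c^{(q-1)}_2 \\ c_{(k)} \end{bmatrix} \tag{3.26b} \\
c^{(q-1)}_3 &= \begin{bmatrix} c_{(k)} \\ c^{(q)}_3 \end{bmatrix} \notag \\
G^{(q)}_{11} &= G^{(q-1)}_{11} \notag \\
G^{(q)}_{12} &= \begin{bmatrix} G^{(q-1)}_{12} & \gamma_k \end{bmatrix} \tag{3.26c} \\
G^{(q-1)}_{13} &= \begin{bmatrix} \gamma_k & G^{(q)}_{13} \end{bmatrix} \notag \\
G^{(q)}_{22} &= \begin{bmatrix} G^{(q-1)}_{22} & \mu_k \\ \mu_k^T & g_{(k)} \end{bmatrix} \tag{3.26d} \\
G^{(q-1)}_{33} &= \begin{bmatrix} g_{(k)} & \alpha_k^T \\ \alpha_k & G^{(q)}_{33} \end{bmatrix} \notag
\end{align}

where $\gamma_k$ is an $\bar{n}_t \times 1$ vector representing the connections between the top nodes and node $k$, $\mu_k$ is a $(q-1) \times 1$ vector representing the connections between the block of the first $(q-1)$ nodes being eliminated and node $k$, and $\alpha_k$ is an $\bar{n}_b \times 1$ vector representing the connections between the bottom nodes and node $k$. Note that $G^{(q-1)}_{23} = \begin{bmatrix} \mu_k & X \end{bmatrix}$, while $G^{(q)}_{23} = \begin{bmatrix} X \\ \alpha_k^T \end{bmatrix}$, where $X$ is a $(q-1) \times \bar{n}_b$ non-positive matrix. Figs. 3.3 and 3.4 illustrate how $G^{(q)}$ and $c^{(q)}$ are composed in terms of $G^{(q-1)}$ and $c^{(q-1)}$, respectively.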

Figure 3.3: Illustration of $G^{(q)}$

Figure 3.4: Illustration of $c^{(q)}$

In what follows, we will give some results that will be helpful in the rest of the proof. From [34], the inverse of any non-singular 2×2 block-matrix $A = \begin{bmatrix} B & E \\ F & C \end{bmatrix}$, where $B$ is non-singular, can be written as:

$$A^{-1} = \begin{bmatrix} B^{-1} + B^{-1} E S^{-1} F B^{-1} & -B^{-1} E S^{-1} \\ -S^{-1} F B^{-1} & S^{-1} \end{bmatrix} \tag{3.27}$$

where $S = C - F B^{-1} E$ is the well-known Schur complement of $C$ in $A$.

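Let $\varphi_k = G^{(q-1)-1}_{22} \mu_k$. Then, using (3.26d) and (3.27), we can write:

$$G^{(q)-1}_{22} = \begin{bmatrix} P & \lambda \\ \lambda^T & r \end{bmatrix} \tag{3.28}$$

where

$$\begin{aligned}
P &= G^{(q-1)-1}_{22} + r \varphi_k \varphi_k^T \\
\lambda &= -r \varphi_k \\
r &= \left(g_{(k)} - \mu_k^T \varphi_k\right)^{-1}
\end{aligned} \tag{3.29}$$

where $r$ is a scalar, $\lambda$ is a $(q-1) \times 1$ vector, and $P$ is a $(q-1) \times (q-1)$ matrix.

Let:

$$\hat{g}^{(q-1)}_{(k)} \triangleq r^{-1} = g_{(k)} - \mu_k^T \varphi_k = g_{(k)} - \mu_k^T G^{(q-1)-1}_{22} \mu_k \tag{3.30}$$

Moreover, let:

$$\begin{aligned}
\hat{\gamma}_k &\triangleq \gamma_k - G^{(q-1)}_{12} \varphi_k = \gamma_k - G^{(q-1)}_{12} G^{(q-1)-1}_{22} \mu_k \\
\hat{c}^{(q-1)}_{(k)} &\triangleq c_{(k)} - \varphi_k^T c^{(q-1)}_2 = c_{(k)} - \mu_k^T G^{(q-1)-1}_{22} c^{(q-1)}_2
\end{aligned} \tag{3.31}$$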

Note that $\hat{g}^{(q-1)}_{(k)}$ is the total incident conductance at node $k$ after eliminating the first $q - 1$ nodes. This can be seen by evaluating $\hat{G}^{(q-1)}_{33}$ using (3.22):

$$\hat{G}^{(q-1)}_{33} = G^{(q-1)}_{33} - G^{(q-1)T}_{23} G^{(q-1)-1}_{22} G^{(q-1)}_{23}$$

Since $g_{(k)} = \left(G^{(q-1)}_{33}\right)_{11}$, and $\mu_k$ is the column of $G^{(q-1)}_{23}$ that corresponds to the connections to node $k$, then:

$$\hat{g}^{(q-1)}_{(k)} = \left(\hat{G}^{(q-1)}_{33}\right)_{11} \tag{3.32}$$

Similarly, we can show that $\hat{c}^{(q-1)}_{(k)}$ is the capacitance connected to node $k$ after eliminating the first $q - 1$ nodes by evaluating $\hat{c}^{(q-1)}_3$ using (3.25):

$$\hat{c}^{(q-1)}_3 = c^{(q-1)}_3 - G^{(q-1)T}_{23} G^{(q-1)-1}_{22} c^{(q-1)}_2$$

Since $c_{(k)} = \left(c^{(q-1)}_3\right)_1$, and $\mu_k$ is the column of $G^{(q-1)}_{23}$ that corresponds to the connections to node $k$, then:

$$\hat{c}^{(q-1)}_{(k)} = \left(\hat{c}^{(q-1)}_3\right)_1 \tag{3.33}$$

Using the same reasoning, one can prove that $\hat{\gamma}_k$ is an $\bar{n}_t \times 1$ vector representing the connections between the top nodes and node $k$ after eliminating the first $q - 1$ nodes.

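Going back to the main part of the proof, we have from (3.13a):

$$\hat{c}^{(q)}_1 = c^{(q)}_1 - G^{(q)}_{12} \left(G^{(q)}_{22}\right)^{-1} c^{(q)}_2 \tag{3.34}$$

Plugging (3.26a)-(3.26d) and (3.28)-(3.31) into (3.34), we get:

$$\begin{aligned}
\hat{c}^{(q)}_1 &= c_1 - \begin{bmatrix} G^{(q-1)}_{12} & \gamma_k \end{bmatrix} \begin{bmatrix} P & \lambda \\ \lambda^T & r \end{bmatrix} \begin{bmatrix} c^{(q-1)}_2 \\ c_{(k)} \end{bmatrix} \\
&= c_1 - G^{(q-1)}_{12} \left(P c^{(q-1)}_2 + c_{(k)} \lambda\right) - \gamma_k \left(\lambda^T c^{(q-1)}_2 + r c_{(k)}\right) \\
&= c_1 - G^{(q-1)}_{12} \left( \left( \left(G^{(q-1)}_{22}\right)^{-1} + r \phi_k \phi_k^T \right) c^{(q-1)}_2 - c_{(k)} r \phi_k \right) - \gamma_k \left( -r \phi_k^T c^{(q-1)}_2 + r c_{(k)} \right) \\
&= c_1 - G^{(q-1)}_{12} \left( \left(G^{(q-1)}_{22}\right)^{-1} c^{(q-1)}_2 + \frac{\phi_k \phi_k^T c^{(q-1)}_2}{g^{(q-1)}_{(k)}} - \frac{c_{(k)} \phi_k}{g^{(q-1)}_{(k)}} \right) - \frac{-\phi_k^T c^{(q-1)}_2 + c_{(k)}}{g^{(q-1)}_{(k)}} \gamma_k \\
&= c_1 - G^{(q-1)}_{12} \left(G^{(q-1)}_{22}\right)^{-1} c^{(q-1)}_2 + G^{(q-1)}_{12} \phi_k \frac{c_{(k)} - \phi_k^T c^{(q-1)}_2}{g^{(q-1)}_{(k)}} - \frac{c_{(k)} - \phi_k^T c^{(q-1)}_2}{g^{(q-1)}_{(k)}} \gamma_k \\
&= \hat{c}^{(q-1)}_1 - \frac{c_{(k)} - \phi_k^T c^{(q-1)}_2}{g^{(q-1)}_{(k)}} \left( \gamma_k - G^{(q-1)}_{12} \phi_k \right) \\
&= \hat{c}^{(q-1)}_1 - \frac{c^{(q-1)}_{(k)}}{g^{(q-1)}_{(k)}} \tilde{\gamma}_k
\end{aligned} \tag{3.35}$$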

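which translates to:

$$\left(\hat{c}^{(q)}_1\right)_j = \begin{cases} \left(\hat{c}^{(q-1)}_1\right)_j + \dfrac{c^{(q-1)}_{(k)} \left| (\tilde{\gamma}_k)_j \right|}{g^{(q-1)}_{(k)}} & \text{if } j \text{ is a neighbor of } k \\[2ex] \left(\hat{c}^{(q-1)}_1\right)_j & \text{otherwise} \end{cases} \tag{3.36}$$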


Algorithm 2 Incremental Elimination

Inputs: $G$, $C$, $h$, $l$, $\delta$
Outputs: $\hat{G}$, $\hat{C}$
 1: for k = h − 1 down to l + 1 do
 2:     layer_nodes := get nodes of layer k
 3:     Form partitions of $G$ and $C$ using (3.1) and (3.11) based on layer_nodes
 4:     Compute $\hat{G}$ based on (3.7) using $G$
 5:     Compute $\hat{C}$ based on (3.14) using $C$
 6:     $C := \hat{C}$
 7:     $G := \text{SPER}(\hat{G}, h, k-1, \delta)$
 8: end for
 9: $\hat{G} := G$
10: $\hat{C} := C$
11: return $\hat{G}$, $\hat{C}$

On the other hand, let $G' = \hat{G}^{(q-1)}$ and $C' = \hat{C}^{(q-1)}$. Applying node elimination and then (3.12) to eliminate the same node $k$, the new capacitances $c''_1$ at the top nodes are:

$$\left(c''_1\right)_j = \begin{cases} \left(c'_1\right)_j + \dfrac{c'_{(k)}\, g'_{(kj)}}{g'_{(k)}} & \text{if } j \text{ is a neighbor of } k \\[2ex] \left(c'_1\right)_j & \text{otherwise} \end{cases} \tag{3.37}$$

Using the fact that $G' = \hat{G}^{(q-1)}$ and $C' = \hat{C}^{(q-1)}$, we can see from (3.32) that $g'_{(k)} = g^{(q-1)}_{(k)}$. Similarly, from (3.33), we can see that $c'_{(k)} = c^{(q-1)}_{(k)}$. It can also be shown using similar reasoning that $g'_{(kj)} = \left| (\tilde{\gamma}_k)_j \right|, \forall j$. Hence, $c''_1 = \hat{c}^{(q)}_1$, and this completes the proof.
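The single-node update that the proof reduces to can be illustrated on a tiny grid. The sketch below assumes dense list-of-lists matrices and a hypothetical helper `eliminate_node`; it applies the Schur complement to the conductance matrix and redistributes the eliminated node's capacitance to each neighbor in proportion to the connecting conductance, which is the form of (3.36) and (3.37). The numerical values are made up for illustration.

```python
def eliminate_node(G, c, k):
    """Eliminate node k from conductance matrix G and capacitance vector c.

    G is a dense symmetric conductance matrix (list of lists) and c holds the
    capacitance at each node. Returns the reduced matrix and vector.
    """
    n = len(G)
    keep = [i for i in range(n) if i != k]
    g_k = G[k][k]  # total incident conductance at node k
    # Schur complement: G_ij - G_ik * G_kj / g_k for the kept nodes
    G_red = [[G[i][j] - G[i][k] * G[k][j] / g_k for j in keep] for i in keep]
    # Each neighbor j receives the share |G_jk| / g_k of the capacitance at k
    c_red = [c[j] + c[k] * abs(G[j][k]) / g_k for j in keep]
    return G_red, c_red

# Node 2 connects to node 0 (conductance 1) and node 1 (conductance 3);
# nodes 0 and 1 each also have a conductance of 2 to the supply.
G = [[3, 0, -1],
     [0, 5, -3],
     [-1, -3, 4]]
c = [1, 1, 4]

G_red, c_red = eliminate_node(G, c, 2)
```

In this example the eliminated node has no direct connection to the supply, so the total capacitance (6 units) is preserved and split 1:3 between the two neighbors.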

3.5 Incremental Layer Elimination

Looking at the reduced conductance matrix in (3.7) and the approximate reduced capacitance matrix in (3.13), we can see that both involve computing $G_{22}^{-1}$, where $G_{22}$ represents all the nodes in the layers to be eliminated. As the size of the grid and the number of layers increase, the number of nodes to be eliminated becomes larger. Clearly, computing the inverse will take more CPU time, and the result may no longer fit in memory. To this end, we propose an incremental layer-by-layer approach to the elimination process. The idea is to eliminate one layer at a time and sparsify the reduced grid along the way until all the desired layers are eliminated. This should reduce both the time and the memory needed for the elimination. In section 3.6, we compare the direct and incremental approaches. We will see that the incremental approach takes much less time and allows us to handle very large grids that we could not test with the direct approach. Algorithm 2 summarizes our incremental elimination approach. The algorithm takes as input the original grid conductance matrix $G$, the original grid capacitance matrix $C$, the index $h$ of the lowest top layer, and the index $l$ of the topmost bottom layer, such that layer $k$ is to be eliminated for all $k \in \{l+1, \ldots, h-1\}$. It also requires $\delta$, the sparsification threshold on the conductances to keep. The outputs of the algorithm are the reduced grid conductance and capacitance matrices. SPER is the sparsification method described in section 3.3.
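The control flow of Algorithm 2 can be sketched on a toy resistive grid. This is only an illustration under stated assumptions: matrices are dense Python lists, the capacitance update is omitted, and `drop_small` is a placeholder that simply deletes resistors below the threshold; it is not the SPER method of section 3.3. All function names are hypothetical.

```python
def eliminate_nodes(G, ids, nodes):
    """Schur-complement the given nodes (original labels) out of dense matrix G."""
    G, ids = [row[:] for row in G], list(ids)
    for k in nodes:
        p = ids.index(k)  # current row/column of node k
        keep = [i for i in range(len(G)) if i != p]
        G = [[G[i][j] - G[i][p] * G[p][j] / G[p][p] for j in keep] for i in keep]
        ids.pop(p)
    return G, ids

def drop_small(G, delta):
    """Stand-in sparsifier (NOT SPER): delete resistors with conductance below delta."""
    G = [row[:] for row in G]
    for i in range(len(G)):
        for j in range(i + 1, len(G)):
            g = -G[i][j]  # branch conductance between nodes i and j
            if 0 < g < delta:
                G[i][i] -= g
                G[j][j] -= g
                G[i][j] = G[j][i] = 0.0
    return G

def incremental_elimination(G, layers, h, l, delta):
    """Eliminate layers h-1 down to l+1, sparsifying after each layer."""
    ids = list(range(len(G)))
    for k in range(h - 1, l, -1):
        G, ids = eliminate_nodes(G, ids, layers[k])
        G = drop_small(G, delta)
    return G, ids

# Four-layer chain: pad (g = 1) - node 0 (layer 4) - node 1 - node 2 - node 3 (layer 1),
# with a conductance of 3 on every branch between consecutive nodes.
G = [[4.0, -3.0, 0.0, 0.0],
     [-3.0, 6.0, -3.0, 0.0],
     [0.0, -3.0, 6.0, -3.0],
     [0.0, 0.0, -3.0, 3.0]]
layers = {4: [0], 3: [1], 2: [2], 1: [3]}

G_red, kept = incremental_elimination(G, layers, h=4, l=1, delta=1e-6)
```

Three series branches of conductance 3 collapse to a single branch of conductance 1 between the kept nodes 0 and 3, which is what the reduced matrix encodes.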


Table 3.1: Power Grids Specifications

Grid Name   Size (# Nodes)   # Current Sources   # Voltage Sources
G1          152K             32970               81
G2          342K             73710               169
G3          609K             131320              289
G4          950K             205450              484
G5          1.36M            295890              676
G6          1.86M            403025              900
G7          2.43M            526680              1156
G8          3.086M           668972              1482

3.6 Experimental Results

In this section, we present results from tests run on the proposed layer elimination method with sparsification. First, we compare the proposed sparsification technique (SPER) with sparsification based on computing the exact effective resistance and applying a threshold of $R_{\text{eff,th}} = 13\,\Omega$. Then, we compare the direct and incremental approaches for layer elimination in terms of their DC responses and elimination times. Finally, we test the reduced RC grid by comparing its transient voltage drops to those computed on the original grid.

A C++ implementation was written to perform the above tests. Our test grids were generated based on user specifications such as the grid dimensions, number of layers, layers' geometrical specifications (pitch, width, orientation, offset and sheet resistance) and current source distributions. The generated grids are consistent with a 45nm technology. Table 3.1 shows the grids used in our tests along with their specifications. In all tests, the number of metal layers in the grids was set to 8. Six layers were eliminated, keeping only layer M8 (the top layer, attached to the C4 bumps) and layer M1 (the bottom layer, attached to the current sources). The model assumes that there is a current source attached to each non-boundary node in M1. All tests were run on a Linux machine with a 3.4 GHz Intel Core i7-4770 processor with 8 cores and 32GB of RAM.

Fig. 3.5 shows sparsity patterns of the system matrix at different phases of the elimination process. The grid used has 6K nodes and 8 layers. Fig. 3.5a shows the original sparsity pattern of the conductance matrix G before reduction. Fig. 3.5b shows the grid matrix after eliminating only four layers, while Fig. 3.5c shows the final pattern of the reduced grid after eliminating six layers, keeping only the topmost and bottommost layers. Looking at how the pattern changes, we can see how SPER captures the important edges of the reduced model (only connections to spatially close neighbors and to the upper layer's nodes), which preserve almost the same electrical characteristics as the original grid.

Whenever we report an error in a comparison, we mean the error at the nodes of layer M1 (which are attached to current sources) in the reduced grid relative to the exact values at the same nodes in the original grid. The comparison may be based on the DC voltage drops or on the transient voltage drops in an RC grid.


Figure 3.5: Sparsity patterns of the system matrix at different phases of the elimination process. (a) Sparsity pattern of the original G matrix (nz = 24774). (b) Sparsity pattern after eliminating 4 layers (nz = 34613). (c) Sparsity pattern of the final reduced G matrix (nz = 36918).

3.6.1 SPER vs. Exact Effective Resistance

In this section, we compare SPER with the approach of computing the exact effective resistance and comparing it to the threshold for sparsification. We use the direct approach for elimination, i.e. we eliminate 6 layers at once. The value of the tolerance used is $\delta = 10^{-6}$, and the threshold on the effective resistance is $13\,\Omega$, as mentioned earlier. Table 3.2 displays the results of this comparison for two grid sizes, given as total number of nodes. For each method, we show the elimination time (ET), the sparsification time (SPT) and the average error of the DC response compared to the original grid. The sparsification step of the effective resistance approach is far slower (minutes rather than a fraction of a second), which is expected because the reduced matrix is dense. Recall that when the matrix is full, we need at least $\binom{n}{2}$ DC solves to compute all the effective resistances. Notice also that SPER performs better in terms of error. This is because the neighborhood chosen in SPER, which is discussed in section 3.3, keeps more neighbors per node than the $13\,\Omega$ threshold on effective resistances.


Table 3.2: Comparing SPER to computing exact effective resistance for sparsification

            SPER                                      Exact ER with Reff,th = 13Ω
Grid Size   ET (sec)   SPT (sec)   Avg. Error (mV)    ET (sec)   SPT (min)   Avg. Error (mV)
6K          2.165      0.0158      2.25               2.165      5.33        2.7
10K         6.474      0.0618      1.7                6.474      21.32       2.2

Table 3.3: Comparison between Direct and Incremental Elimination. Grid Name: G1, Grid Size: 152K

Measure                 Direct Elimination   Incremental Elimination
Sparsity                0.279%               0.1833%
ET (min)                11.11                0.7
SPT (sec)               11.09                33.974
Speed-up in DC solve    1.57x                1.71x
Average Error (mV)      1.907                1.943

3.6.2 Direct vs. Incremental Elimination

In this section, we present a comparison between the direct and the incremental elimination processes discussed in sections 3.2 and 3.5, respectively. Table 3.3 presents such a comparison on a grid of size 152K nodes. The size of the reduced system is 33K nodes. For each approach, the table shows the sparsity of the reduced resistive system (the sparsity of the original system is 0.00261%), the elimination time (ET), the sparsification time (SPT) using SPER, the speed-up in the DC solve compared to the original system, and the average error of the DC voltage drops at layer M1. The sparsity of any $n \times m$ matrix $A$ is calculated as follows:

$$\text{Sparsity}(A) = \frac{\text{nnz}(A)}{nm} \times 100\%$$

where nnz(A) is the number of non-zeros in the matrix A. We can see that the elimination time of the direct approach is around 16 times that of the incremental approach. This is because, in the direct approach, the matrix being inverted is much larger than those inverted in the incremental process. However, sparsification in the incremental elimination is about 3 times slower than in the direct one. This is because, in the incremental approach, SPER is called once per eliminated layer, and each call goes over all the connections in the reduced system. The error in the incremental process is insignificantly larger, while its speed-up in the DC solve is better. Finally, it is worth mentioning that the direct approach stopped working on grids larger than 300K nodes due to lack of memory, while with the incremental approach we were able to go up to sizes of the order of 3M nodes, as shown in Table 3.5.
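As a small illustration of the sparsity measure defined above, the following sketch counts non-zeros in a dense list-of-lists matrix. The function name `sparsity_percent` and the example matrix are assumptions made for illustration; a real implementation would read the non-zero count from a sparse matrix structure instead of scanning every entry.

```python
def sparsity_percent(A):
    """Percentage of non-zero entries in an n x m matrix given as a list of rows."""
    n, m = len(A), len(A[0])
    nnz = sum(1 for row in A for x in row if x != 0)
    return 100.0 * nnz / (n * m)

# A 4x4 tridiagonal conductance-like matrix: 10 non-zeros out of 16 entries.
A = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]

value = sparsity_percent(A)
```

Note that, with this definition, a smaller percentage means a sparser matrix.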

Table 3.4 compares the reduction time and ratio of our incremental numerical approach with the topological one implemented in [12], which is based on TICER [38]. A lower threshold of $\delta = 10^{-5}$ was used on the new conductances to be added to the reduced grids in both cases. The table presents, along with the size of the original grid, the reduction time (ET + SPT) in our incremental approach and RT in the topological one. In addition, the table presents the reduction ratio in both approaches, which is the percentage of the original grid's nodes that are removed, i.e. $(1 - \text{reduced size}/\text{original size}) \times 100\%$. From the table, we can see that our incremental approach is roughly 2 to 2.7 times faster than the topological one. Besides, the reduction ratio in our technique is higher, which is better as we aim to eliminate complete layers and keep only the important nodes. Note that the topological approach broke down beyond a grid size of 609K due to lack of memory, while our approach handled much larger grid sizes.

Table 3.4: Numerical Elimination vs. Topological Elimination

Original Grid   Numerical Elimination                               Topological Elimination
Name   Size     Reduced Size   ET + SPT   Reduction Ratio (%)       Reduced Size   RT      Reduction Ratio (%)
G1     152K     33K            1.27m      78.3                      44.7K          3.4m    70.6
G2     342K     74K            6.54m      78.4                      100.4K         16.3m   70.6
G3     609K     133K           25.2m      78.2                      179K           50.7m   70.6

Table 3.5: Incremental layer elimination results

Original Grid                            Reduced Grid                                        Speedup in   Average
Name   Size     FT (sec)   ST (sec)      Size   ET        SPT       FT (sec)   ST (sec)      DC Solve     Error (mV)
G2     342K     1.6605     0.0771        74K    3.6m      2.9m      0.8039     0.0161        2.12x        2.2
G3     609K     3.6797     0.1278        133K   15.3m     9.9m      2.0739     0.0299        1.81x        2.2
G4     950K     7.140      0.2061        208K   46.10m    19.2m     6.5557     0.0477        1.12x        2.3
G5     1.36M    13.191     0.3377        299K   111.98m   40.11m    9.8535     0.0698        1.36x        2.2
G6     1.86M    19.892     0.4296        407K   3.4hr     1.2hr     13.9014    0.0964        1.45x        2.2
G7     2.43M    30.5293    0.6038        532K   5.746hr   2.16hr    18.9831    0.1292        1.63x        2.3
G8     3.086M   47.1174    0.777         675K   9.04hr    3.55hr    25.0598    0.1732        1.9x         2.3

Finally, Table 3.5 displays the results of our incremental layer elimination on multiple grid sizes. The table shows the size of each of the original and reduced grids in number of nodes, the elimination time (ET) and the sparsification time (SPT). For each of the original and reduced grids, the table shows the factorization time (FT) of the system matrix, which must be done before solving for the DC voltage drops, and the solve time (ST) after factorization. Finally, the table shows the speed-up in the DC solve of the reduced grid compared to the original grid and the average error in the DC voltage drops. The speed-up was computed as follows:

$$\text{Speedup} = \frac{(FT + ST)_{\text{original}}}{(FT + ST)_{\text{reduced}}}$$

If we exclude the factorization time from the speed-up computation, the DC solve is almost 4.5 times faster on the reduced grid. The benefit of this will be seen in the transient simulations. From the table, we can see that the reduction ratio is around 80% with an average speed-up of 1.63x and an average error of around 2.2mV.
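To make the two speed-up figures concrete, the sketch below recomputes them for three rows of Table 3.5. The timing values are copied from the table; the dictionary layout and the function names are illustrative assumptions.

```python
# (FT_original, ST_original, FT_reduced, ST_reduced) in seconds, from Table 3.5
timings = {
    "G2": (1.6605, 0.0771, 0.8039, 0.0161),
    "G3": (3.6797, 0.1278, 2.0739, 0.0299),
    "G7": (30.5293, 0.6038, 18.9831, 0.1292),
}

def dc_speedup(ft_orig, st_orig, ft_red, st_red):
    """Speed-up including factorization: (FT + ST)_original / (FT + ST)_reduced."""
    return (ft_orig + st_orig) / (ft_red + st_red)

def solve_only_speedup(ft_orig, st_orig, ft_red, st_red):
    """Speed-up of the substitution step alone: ST_original / ST_reduced."""
    return st_orig / st_red

full = {name: dc_speedup(*t) for name, t in timings.items()}
solve_only = {name: solve_only_speedup(*t) for name, t in timings.items()}
```

The first quantity reproduces the "Speedup in DC Solve" column to two decimals, while the second falls between roughly 4 and 5 for these rows. The second is the more relevant figure for transient simulation, where the matrix is factorized once and then reused at every time step.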

3.6.3 Transient Simulations

To find the transient voltage drops on all the nodes in an RC power grid, we created a simple simulation engine for the ODE in (2.3) based on a standard backward-Euler discretization. All current waveforms were generated randomly. When comparing the voltage drops on the original and reduced grids, we computed the worst-case voltage drop error at each node over the whole simulation time, and then took the average and maximum error over all the nodes.

Table 3.6: Power grids reduction time and Transient simulations run time

Original Grid                 Reduced Grid
Name   Size     Transient     Size   Elimination   Sparsification   Transient     Speedup   Average      Worst Case
                Time (TT)            Time (ET)     Time (SPT)       Time (TT)               Error (mV)   Error (mV)
G1     152K     1.17hr        33K    43.68s        36.92s           16.68m        4.2x      2.4          14.5
G2     342K     2.51hr        74K    3.71m         2.94m            35.49m        4.24x     2.5          19.1
G3     609K     4.34hr        133K   15.8m         9.7m             1.01hr        4.3x      2.4          16.2
G4     950K     8.05hr        208K   47.25m        21.06m           1.91hr        4.21x     2.3          14.3
G5     1.36M    11.22hr       299K   112.95m       43.41m           2.64hr        4.25x     2.3          17.7
G6     1.86M    19.18hr       407K   4.24hr        1.34hr           4.25hr        4.51x     2.3          15.4
G7     2.43M    23.65hr       532K   7.42hr        2.39hr           5.11hr        4.63x     2.4          16.0
G8     3.08M    26.59hr       676K   9.06hr        3.49hr           5.73hr        4.64x     2.39         16.9

Fig. 3.6a shows the voltage drop waveform before and after reduction at one node in a grid of size

152K. We can see that our capacitance approximation is good, since the rate of convergence to steady

state is almost the same. In other words, the time constant of both grids is almost the same.

Table 3.6 shows the transient simulation results for different grid sizes. The length of the simulation was taken to be 120ns. For each grid, the table presents the transient simulation time (TT) before and after reduction, the elimination and sparsification times (ET and SPT), the speed-up gained after reduction, and the average and maximum worst-case errors on the voltage drops. The speed-up is computed as follows:

$$\text{Speedup} = \frac{TT_{\text{original}}}{TT_{\text{reduced}}}$$

As can be seen in the table, the speed-up is around 4.25x. This speed-up is obtained due to our topological sparsification procedure, which keeps only the important connections in the reduced grid. Accordingly, the time taken in one linear solve of the reduced grid is around 4.25 times smaller than that of the original grid, and hence the 4.25x speed-up. This is consistent with the roughly 4.5x speed-up in the DC solve time (excluding factorization) reported in the previous section. Fig. 3.7 shows that as the simulation time increases, the effect of the reduction overhead diminishes and becomes negligible.

Fig. 3.6b shows a histogram of the worst-case errors on all M1 nodes in a grid of size 152K nodes. Moreover, Fig. 3.8 shows a 3D plot of the worst-case error at all nodes in M1. We can see that the error is above average only at the boundary nodes of the grid. This is expected as there are not enough connections to the nodes at the boundary to counter this effect. As mentioned before, there is a clear trade-off between the simulation speed-up and the accuracy of the simulations. To see this trade-off, Fig. 3.9 shows the relative worst-case error rate at the M1 nodes versus the actual voltage drop for two cases. In the first case (Fig. 3.9a), we eliminate 6 out of 8 layers, while in the second case (Fig. 3.9b), we eliminate only 5 layers by keeping layer M2. From the figures, we can see how keeping layer M2 restricts the errors to below the 5mV hyperbolas. In fact, the average worst-case error while keeping layer M2 is 1.2mV. However, the simulation speed-up in the first case is 4.2x, while in the second it is only 2.34x. Moreover, the elimination ratio in the first case is around 80%, while in the second it is only 46%. This illustrates the trade-off: eliminating more layers gives a larger speed-up and reduction ratio at the cost of accuracy.


Figure 3.6: (a) Voltage drop waveforms at one node in the grid (voltage drop in mV versus simulation time in ns; original voltage drop and voltage drop after reduction). (b) Histogram of the worst-case error (mV) over all nodes.

Figure 3.7: Total simulation time (min) before and after reduction, versus transient simulation time (ns). Curves: original simulation time, reduced simulation time with reduction time, and reduced simulation time without reduction time.


Figure 3.8: A 3D plot of the error at M1 nodes after reduction (worst-case transient error in mV versus the node's x- and y-position in µm).


Figure 3.9: Error rate (%) versus actual voltage drop (mV), with the −5 mV and 5 mV change-in-voltage-drop curves. (a) Error rate with layer M2 eliminated. (b) Error rate while keeping layer M2.

Chapter 4

Voltage Drop Predictor

4.1 Introduction

As previously mentioned in chapter 2, with modern scaling techniques, the number of nodes in power grids is rapidly increasing, making full grid simulation very expensive, especially if storage elements such as capacitors and inductors are involved. To this end, we aim in this chapter to estimate the voltage drops at different metal layers without explicitly simulating the whole grid. In fact, we show empirically that one can estimate the voltage drop at a specific metal layer from the voltage drop at the bottommost layer, irrespective of the characteristics of the grid. This proves beneficial in many applications. For example, with separate blocks and power domains in modern grids, knowing how much voltage drop to allow in the global metal layers is always an issue for grid designers. Therefore, it would be useful to estimate the level of the voltage drop at the different metal layers without simulating the whole grid model. As another example, applying the grid reduction technique studied in chapter 3 and then simulating the reduced grid only verifies the safety of the nodes at the bottommost layers (layers that are connected to the underlying circuitry). If one is concerned with the simulation response in all layers of the grid, one would otherwise have to simulate the whole grid. With the estimated levels of voltage drop, one can instead use the numerical reduction method studied in chapter 3 to reduce the grid and simulate it to get the voltage drops at the bottommost layer. After that, the voltage drops at the other layers can be easily inferred and checked.

This chapter is organized as follows: In section 4.2, we review a power grid verification technique

based on vectorless verification. This technique will be used in this chapter, instead of the direct vector-

based simulation methods, to simulate the grid and study the effects of its various characteristics. In

section 4.3, we present the main contribution of this chapter in predicting the voltage drops at different

metal layers. We also present some experimental results of the prediction method and give some analysis.

4.2 Background

4.2.1 Power Grid Model

Consider the RC grid model presented in section 2.3.2. Recall that the grid equation can be written as:

$$G v(t) + C \dot{v}(t) = i(t) \tag{4.1}$$

Using a finite difference approximation as in [25], (4.1) can be rewritten as:

$$A v(t) = \frac{C}{\Delta t} v(t - \Delta t) + i(t) \tag{4.2}$$

where $\Delta t > 0$ is a small time increment, and $A = G + \frac{C}{\Delta t}$, similar to $G$, is a symmetric positive-definite M-matrix [23], so that $A^{-1} \ge 0$. In the next subsection, we define constraints on the current vectors used in the vectorless verification of power grids.
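The time-stepping scheme in (4.2) can be sketched on a two-node RC grid. The example below is a minimal illustration, assuming a diagonal capacitance matrix, a zero initial condition, a constant current and a hand-written 2x2 solver; the names `solve2` and `backward_euler` and all numerical values are hypothetical. A practical engine would factorize $A$ once and reuse the factors at every step.

```python
def solve2(A, b):
    """Solve a 2x2 linear system A x = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def backward_euler(G, c, i_of_t, dt, steps):
    """March A v(t) = (C/dt) v(t - dt) + i(t), with A = G + C/dt and C = diag(c)."""
    A = [[G[r][s] + (c[r] / dt if r == s else 0.0) for s in range(2)]
         for r in range(2)]
    v = [0.0, 0.0]  # assumed initial condition: no voltage drop at t = 0
    for n in range(1, steps + 1):
        i = i_of_t(n * dt)
        rhs = [c[r] / dt * v[r] + i[r] for r in range(2)]
        v = solve2(A, rhs)
    return v

G = [[2.0, -1.0], [-1.0, 1.0]]   # conductance matrix (arbitrary units)
c = [1.0, 1.0]                   # node capacitances (arbitrary units)

v_end = backward_euler(G, c, lambda t: [0.0, 1.0], dt=0.1, steps=1000)
v_dc = solve2(G, [0.0, 1.0])     # steady state for the same constant current
```

With a constant current, the transient response settles to the DC solution $G^{-1} i$, which gives a simple sanity check for the discretization.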

4.2.2 Current Constraints

Vectorless verification is an approach based on partial current specification in the form of current constraints [23]. Those constraints capture the uncertainty about circuit behavior early in the design flow. We use two types of constraints defined as follows:

1. Local constraints: These constraints are upper bounds on the individual current sources. We assume that every grid node has a current upper bound associated with it; if a node does not have a current source attached, the upper bound for that node is 0. We can express these constraints as follows:

$$0 \le i(t) \le i_L, \quad \forall t \ge 0 \tag{4.3}$$

2. Global constraints: These constraints are upper bounds on the sums of currents for groups of current sources. They represent the peak total power dissipation of a group of circuit blocks. Let $m$ be the total number of global current constraints, then we represent these constraints as follows:

$$0 \le U_m i(t) \le i_G, \quad \forall t \ge 0 \tag{4.4}$$

where $U_m$ is an $m \times n$ incidence matrix that contains 0's and 1's, which indicate the current sources that form each global constraint.

Typically, those constraints are obtained from engineering judgement and expertise from previous design activities. Using the above constraints, we define a feasible space of currents, denoted by $F$, such that $i(t) \in F$ if and only if the local and global constraints are satisfied.
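Membership in the feasible space can be checked directly from (4.3) and (4.4). The sketch below assumes plain Python lists for the current vector, the bounds and the incidence matrix; the function name `is_feasible` and the numbers are illustrative only.

```python
def is_feasible(i, i_L, U, i_G):
    """Return True if current vector i satisfies 0 <= i <= i_L and 0 <= U i <= i_G."""
    local_ok = all(0 <= x <= ub for x, ub in zip(i, i_L))
    group_sums = [sum(u * x for u, x in zip(row, i)) for row in U]
    global_ok = all(0 <= s <= ub for s, ub in zip(group_sums, i_G))
    return local_ok and global_ok

# Three nodes; the third has no current source, so its local bound is 0.
i_L = [1.0, 1.0, 0.0]
# One global constraint grouping the first two sources.
U = [[1, 1, 0]]
i_G = [1.5]

ok = is_feasible([1.0, 0.4, 0.0], i_L, U, i_G)
violates_global = is_feasible([1.0, 1.0, 0.0], i_L, U, i_G)
violates_local = is_feasible([0.2, 0.2, 0.1], i_L, U, i_G)
```

The second vector respects every local bound but exceeds the group budget of 1.5, which shows why the global constraints shrink the feasible space beyond the box defined by the local ones.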

4.2.3 Vectorless Power Grid Verification

Vectorless verification refers to the class of verification methods that do not rely on simulating the grid

for specific input patterns. Given a power grid, we are interested in finding upper bounds on the worst-

case voltage drops at all nodes in order to ensure the grid’s safety given a set of possible transient current

waveforms i(t) specified by current constraints. The derivation of the upper bound is as follows [23].

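Let $\operatorname{emax}[x]$ be an element-wise maximization operator of its vector argument $x$, and let $v^*$ denote the vector whose every entry is the worst-case voltage drop at its corresponding node under all possible current waveforms that satisfy the current constraints, then we have from (4.2):

$$\begin{aligned}
v^* = \operatorname*{emax}_{\forall i(t) \in F} v(t) &= \operatorname*{emax}_{\forall i(t) \in F} \left[ A^{-1} \frac{C}{\Delta t} v(t - \Delta t) + A^{-1} i(t) \right] \\
&\le \operatorname*{emax}_{\forall i \in F} \left[ A^{-1} \frac{C}{\Delta t} v(t - \Delta t) \right] + \operatorname*{emax}_{\forall i \in F} \left[ A^{-1} i \right]
\end{aligned} \tag{4.5}$$

Note that we dropped the time variable $t$ in $i$, as the local and global constraints do not depend on time, and hence, the feasible space $F$ is the same at every time point.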

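Given that $A^{-1} \ge 0$, and $C \ge 0$ for an RC grid, then $A^{-1} \frac{C}{\Delta t} \ge 0$, and it follows from (4.5) that:

$$v^* \le \left( A^{-1} \frac{C}{\Delta t} \right) \operatorname*{emax}_{\forall i \in F} \left[ v(t - \Delta t) \right] + \operatorname*{emax}_{\forall i \in F} \left[ A^{-1} i \right] \tag{4.6}$$

and because $v^*$ is time-independent, then $\operatorname*{emax}_{\forall i \in F} [v(t - \Delta t)] = \operatorname*{emax}_{\forall i \in F} [v(t)] = v^*$ and:

$$v^* \le \left( A^{-1} \frac{C}{\Delta t} \right) v^* + \operatorname*{emax}_{\forall i \in F} \left[ A^{-1} i \right] \tag{4.7}$$

so that:

$$\left( I - A^{-1} \frac{C}{\Delta t} \right) v^* \le \operatorname*{emax}_{\forall i \in F} \left[ A^{-1} i \right] \tag{4.8}$$

Given that $\frac{C}{\Delta t} = A - G$, we have $I - A^{-1} \frac{C}{\Delta t} = I - A^{-1}(A - G) = A^{-1} G$. Since $(A^{-1} G)^{-1} = G^{-1} A = I + G^{-1} \frac{C}{\Delta t} \ge 0$ (because $G^{-1} \ge 0$ and $C \ge 0$), multiplying both sides of (4.8) by it preserves the inequality, and (4.8) becomes:

$$v^* \le (A^{-1} G)^{-1} \operatorname*{emax}_{\forall i \in F} \left[ A^{-1} i \right] = G^{-1} A \operatorname*{emax}_{\forall i \in F} \left[ A^{-1} i \right] \tag{4.9}$$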

Hence, finding an upper bound on the voltage drop at each node boils down to running a linear

program (LP) for that node over all i ∈ F followed by a standard linear solve, where, for the purpose of

the optimization, i can be seen as an n× 1 dummy variable with units of current.
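As a worked special case (an illustration added here, not one of the thesis experiments), suppose there are no global constraints, so that $F$ is simply the box $0 \le i \le i_L$. Because every entry of $A^{-1}$ is non-negative, each row of $A^{-1} i$ is maximized by the same choice $i = i_L$, and the bound in (4.9) collapses to the DC response to the peak currents:

```latex
\operatorname*{emax}_{\forall i \in F}\left[ A^{-1} i \right] = A^{-1} i_L
\quad\Longrightarrow\quad
v^* \le G^{-1} A \, A^{-1} i_L = G^{-1} i_L
```

When global constraints are present, the maximizing current generally differs from one row of $A^{-1}$ to the next, which is why one LP per node is needed in the general case.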

4.3 Voltage Drop Predictor

Even though computing the upper bound in (4.9) is much more efficient than finding the exact solution for a transient analysis, it would be a waste of time and resources to run millions of LPs for very large grids in order to find the upper bound at specific sets of nodes. Given that modern power grids have 3D multi-layered structures consisting of several separate blocks, it would be useful to compute or predict the voltage drop in the different layers of a specific block without simulating the whole grid. In this chapter, we aim to show empirically that one can approximately predict the general characteristics of the voltage drop in each layer of the grid by running only one LU factorization followed by forward and backward substitution. In other words, one can estimate the average voltage drop in a specific layer for the transient case by looking only at the DC response of the grid.

We start by showing that, irrespective of the DC voltage drop level at each metal layer, simulations produce the same voltage drop budget level for each layer relative to the other layers. In other words, no matter how the current sources (underlying circuitry) are distributed at the bottommost layer, the average DC response of each layer is the same relative to the other layers.


Figure 4.1: DC voltage drop distribution at each layer (percentage of the DC voltage drop at M1, for metal layers M8 through M1; DC voltage drop cases 1 and 2)

4.3.1 DC Response

In this section, we show that given a voltage drop budget at layer M1 (the bottommost layer), we can predict the average DC response at each layer of the grid, regardless of the input excitation characteristics. The test conducted to show this property is as follows. For an arbitrary grid, we randomly generate two current vectors $i_1$ and $i_2$ based on a uniform distribution. Then, for each current vector, we get the corresponding DC response: $v_1 = G^{-1} i_1$ and $v_2 = G^{-1} i_2$. We then compute the average voltage drop at layer M1 in each case, and set that average to be our 100% voltage drop budget. After that, we compute the average voltage drop at each layer and find its level relative to the 100% level at layer M1.
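The normalization step of this test can be sketched as follows. The code assumes that the node voltage drops and a per-node layer label are already available (for example from a DC solve); the function name `layer_profile` and the sample values are hypothetical and are not data from the thesis.

```python
def layer_profile(v, layer_of, ref="M1"):
    """Average voltage drop per layer, as a percentage of the average at layer `ref`."""
    totals, counts = {}, {}
    for drop, layer in zip(v, layer_of):
        totals[layer] = totals.get(layer, 0.0) + drop
        counts[layer] = counts.get(layer, 0) + 1
    avg = {layer: totals[layer] / counts[layer] for layer in totals}
    return {layer: 100.0 * a / avg[ref] for layer, a in avg.items()}

# Made-up voltage drops (mV) at six nodes spread over three layers.
layer_of = ["M8", "M8", "M2", "M2", "M1", "M1"]
v1 = [1.0, 3.0, 4.0, 6.0, 8.0, 12.0]
v2 = [3.0 * x for x in v1]   # the same response scaled by a constant factor

p1 = layer_profile(v1, layer_of)
p2 = layer_profile(v2, layer_of)
```

This only demonstrates the bookkeeping: a uniform scaling of the excitation cancels in the normalization because the DC response is linear in the currents. The thesis experiments go further by comparing independently generated current vectors.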

Fig. 4.1 shows the results of such a test with two different current vectors; one with an average of

0.2mA (case 1), and the other with an average of 0.65A (case 2). The grid used is of size 152K nodes,

consistent with a 45nm technology, with 8 metal layers. The yellow bars indicate the range of voltage

drops at each layer, and the solid lines indicate the average at the corresponding layer. As can be seen

in the figure, the average at each layer is the same in both cases compared to the voltage drop budget

limit.

In the next experiment, we change the number of the current sources used in one of the cases. Fig.

4.2 shows the results of such a test. In case 1, we use a 50% current source distribution at layer M1,

while in case 2, we use a 100% distribution. As seen in the figure, the average voltage drop at each

layer is the same compared to the 100% level.

With the previous tests, one can safely say that regardless of the characteristics of the current sources

attached to a power grid, we always get the same average DC voltage drop level at each layer compared

to the bottommost one. With such a property, we can immediately estimate the average voltage drop at

a specific layer in a specific block, if we have an estimate of the average voltage supplied to the transistor

circuitry.


Figure 4.2: DC voltage drop distribution at each layer. Case 1 has 50% current source distribution (percentage of the DC voltage drop at M1, for metal layers M8 through M1; DC voltage drop cases 1 and 2)

4.3.2 Transient Response

Similar to what has been done in the DC case, we simulate an RC grid using (4.9) to get an upper bound on the worst-case voltage drop at each node. We use the MOSEK optimization package [6] to solve the LPs in (4.9). After that, we compute the average at each layer, set the average at layer M1 to be the 100% voltage drop budget, and relate the averages of the other layers to it. We compare our results against the DC case. Fig. 4.3 shows the results of such a comparison. All local and global constraints were generated randomly. The size of the grid used is 152K nodes with 8 layers. We run the same test using a vector-based simulation for the ODE in (2.3) with randomly generated current waveforms. Fig. 4.4 shows the results of such a test. As one can notice from both figures, the average upper bound or transient voltage drop at each layer is exactly the same as the average of the DC response, while the maximum and minimum values differ.

4.3.3 Transient Robustness

It is known that there are many factors that affect the voltage drop at a specific node in the grid. These include the geometry and topology of the grid, such as the width and pitch of each layer, the current waveforms or constraints used to simulate the grid, the number of current sources relative to the grid size, and many other factors. In the previous section, we saw that the worst-case transient response follows a profile similar to that of the DC response. In this section, we show that such a property is robust by changing several of the factors that affect the voltage drop.

The experiments are implemented as follows:

1. Find the DC response at each metal layer of the grid for a randomly generated current excitation vector (based on a uniform distribution).

2. Find the transient response at each metal layer using (4.9) for randomly generated current constraints.

3. Change the space of current constraints and repeat step 2.

4. For each response, set the average at layer M1 to be the 100% voltage drop budget limit and relate the other layers to that limit.

5. Change one or multiple factors affecting the voltage drop, and repeat steps 1-4.

Figure 4.3: Upper bound & DC voltage drop distribution at each layer (percentage of the upper bound and voltage drop at layer M1, for metal layers M8 through M1; transient upper bound and DC voltage drop)

Figure 4.4: Transient & DC voltage drop distribution at each layer (percentage of the transient and DC voltage drop at layer M1, for metal layers M8 through M1; transient voltage drop and DC voltage drop)

Figure 4.5: Average transient and DC voltage drop distribution of each layer - Original Grid (percentage of the upper bound and voltage drop at M1 for simulation types DC, F1, F2, F3 and F4; one average line per layer, M1 through M8)

All tests are implemented in C++ on a Linux machine with a 3.4 GHz Intel Core i7-4770 processor with 8 cores and 32GB of RAM. The test grids are generated based on user specifications, such as the grid dimensions, number of metal layers, layers' geometrical specifications, and the current source distributions. The grids generated are consistent with a 45nm technology. All tests are run on a grid of 8 metal layers and 38K nodes. In each test case, we compute the DC voltage drop at each node in the resistive grid, and then we compute the upper bound on a similar RC grid with 4 different spaces of currents defined below:

• F1: 100% current source distribution at layer M1 with 7 global constraints.


• F3: 100% current source distribution at layer M1 with 7 global constraints, but the values of the

global constraints were made smaller.


After computing the DC voltage drops and transient upper bounds, we normalize the average of

nodes’ voltage drops at each layer relative to the voltage drop budget limit at layer M1.
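The normalization described above can be illustrated with a minimal sketch. The function name `normalize_to_m1`, the layer keys, and the voltage drop values are hypothetical and chosen only for illustration; they are not taken from our test grids.

```python
from statistics import mean

def normalize_to_m1(drops_by_layer):
    """Return each layer's average drop as a percentage of the M1 average."""
    averages = {layer: mean(values) for layer, values in drops_by_layer.items()}
    m1_average = averages["M1"]
    if m1_average == 0:
        raise ValueError("M1 average drop is zero; cannot normalize")
    return {layer: 100.0 * avg / m1_average for layer, avg in averages.items()}

# Hypothetical per-node voltage drops in mV (illustrative values only).
drops = {
    "M8": [5.0, 7.0],
    "M4": [20.0, 30.0],
    "M1": [45.0, 55.0],
}
profile = normalize_to_m1(drops)
```

By construction, layer M1 maps to 100%, and every other layer is expressed as a fraction of that budget.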

Figs. 4.5-4.9 show the results of the 5 tests, each run on a different grid. Each horizontal line represents the average voltage drop or upper bound at a metal layer, starting at layer M8 at the bottom and ending at layer M1 at the top. Test 1 uses the grid with the original pitches and widths. In test 2, we increase the pitches of some layers, which reduces the number of nodes in the grid to 35K. In test 3, we decrease the pitches of the layers, which increases the size of the grid to 41K nodes. In tests 4 and 5, we increase and decrease some layers’ widths, respectively. Note that changing the width of a layer affects the branch resistances in that layer, which changes the G matrix of the system.
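To make the dependence of G on wire width concrete, the following minimal sketch uses the standard sheet-resistance relation R = R_sheet · L / W for a rectangular wire segment and the usual conductance stamp for a two-terminal branch. The sheet resistance, segment length, and widths are assumed values for illustration, not the parameters of our 45 nm test grids.

```python
def branch_resistance(sheet_res, length, width):
    """Resistance of a rectangular wire segment: R = R_sheet * L / W."""
    return sheet_res * length / width

def stamp(G, a, b, resistance):
    """Add the conductance of a branch between nodes a and b to G."""
    g = 1.0 / resistance
    G[a][a] += g
    G[b][b] += g
    G[a][b] -= g
    G[b][a] -= g

def two_node_G(width):
    """Build the 2x2 conductance matrix of one segment of the given width."""
    G = [[0.0, 0.0], [0.0, 0.0]]
    # Assumed values: 0.1 ohm/square sheet resistance, segment length 10 (same unit as width).
    stamp(G, 0, 1, branch_resistance(0.1, 10.0, width))
    return G

G_narrow = two_node_G(width=1.0)  # branch resistance 1.0 ohm
G_wide = two_node_G(width=2.0)    # branch resistance 0.5 ohm
```

Doubling the width halves the branch resistance, so every entry that the branch contributes to G doubles in magnitude.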

Examining the figures, we can see that in each test case, regardless of the nature of the feasible space, the ratio of the average voltage drop at layer Mk, 2 ≤ k ≤ 8, to the average at layer M1 is essentially the same as in the DC run. This suggests that the user can predict the voltage drop budget at each metal layer, relative to the maximum budget allowed, from a single DC solve, i.e., one LU factorization and solve of the linear system.
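As an illustration of what such a single DC solve involves, the sketch below solves G v = i for the voltage drops of a toy three-node resistive chain and expresses each drop relative to the bottom node. The chain (pad, then three nodes in series through 1 ohm branches, with a 1 A sink at the bottom node) is a hypothetical stand-in for a multi-layer grid, and the dense LU routine is only for illustration; a real grid would require a sparse solver.

```python
def lu_solve(A, b):
    """Solve A x = b by LU factorization without pivoting, then substitution."""
    n = len(A)
    LU = [row[:] for row in A]
    for k in range(n):
        for r in range(k + 1, n):
            LU[r][k] /= LU[k][k]                 # multiplier, stored in L
            for c in range(k + 1, n):
                LU[r][c] -= LU[r][k] * LU[k][c]  # update the trailing block
    x = list(b)
    for r in range(n):                           # forward substitution, L y = b
        for c in range(r):
            x[r] -= LU[r][c] * x[c]
    for r in range(n - 1, -1, -1):               # back substitution, U x = y
        for c in range(r + 1, n):
            x[r] -= LU[r][c] * x[c]
        x[r] /= LU[r][r]
    return x

# Toy chain: pad -1ohm- node0 -1ohm- node1 -1ohm- node2, 1 A drawn at node2.
# Unknowns are voltage drops relative to the pad, so the pad does not appear.
G = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]
i = [0.0, 0.0, 1.0]

drops = lu_solve(G, i)
relative = [100.0 * d / drops[-1] for d in drops]
```

Skipping pivoting is acceptable here because the conductance matrix of a connected grid tied to a pad is symmetric positive definite.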

4.4 Conclusion

In this chapter, we have shown that a DC simulation of any power grid is sufficient to get a good idea of the profile and characteristics of the voltage drop in that power grid. In other words, given the DC voltage drop in the grid, we can find the level of the transient response at one layer relative to another. This property is beneficial in many applications. For example, in grid verification, if we have an estimate of the voltage supplied to the transistor circuitry relative to the external power supply, then we can estimate the voltage level at any other metal layer and check it against the desired voltage threshold. This property also helps in approximately verifying the whole grid even after reducing its size. As mentioned earlier, when we use the reduction technique proposed in Chapter 3 and then simulate the reduced grid, we obtain a reading on the safety of the bottommost layer. With our predictor, we can also obtain a reading on the voltage drop of all upper layers in the original grid.

Figure 4.6: Average transient and DC voltage drop distribution of each layer - Increased Pitches [plot; axes: simulation type (DC, F1, F2, F3, F4), and percentage of upper bound & voltage drop at M1 from 0% to 120%; one line per layer average, M1avg to M8avg]

Figure 4.7: Average transient and DC voltage drop distribution of each layer - Decreased Pitches [plot; same axes and series as Figure 4.6]


Figure 4.8: Average transient and DC voltage drop distribution of each layer - Increased Widths [plot; axes: simulation type (DC, F1, F2, F3, F4), and percentage of upper bound & voltage drop at M1 from 0% to 120%; one line per layer average, M1avg to M8avg]

Figure 4.9: Average transient and DC voltage drop distribution of each layer - Decreased Widths [plot; same axes and series as Figure 4.8]

Chapter 5

Conclusion and Future Work

With the fast, continued scaling of semiconductor technology, robust design of on-die power delivery networks has become critically important. A well-designed power distribution network delivers a supply voltage that remains largely free of excessive fluctuations over time at the intended design speeds. Excessive fluctuations may lead to functional failures, increased circuit delays, and loss of yield. Therefore, verification and simulation of power grids is a key step in the design of high-performance integrated circuits. Verification ensures that the grid is reliable and that the voltage drop at any node does not exceed a certain threshold. Several verification methods have been introduced over the past decades; however, none has overcome the problem posed by the large size of power grids. Modern power grids may comprise billions of components and nodes, and hundreds of thousands of ports. Directly simulating and verifying such grids is computationally expensive, and in some cases infeasible given the existing resources.

To overcome these issues, model order reduction (MOR) can be used to reduce the grid

in size while keeping its essential characteristics. MOR has been used extensively in IC design research

for the past two decades. However, traditional MOR techniques suffer from several drawbacks

that limit their scalability to very large power grids. Memory requirements and reduction time increase

dramatically as the size of the network increases. Traditional numerical methods based on moment

matching require significant computational resources to compute the actual projection matrix, and the

resulting system is dense. Topological methods, like TICER [38], are suitable only for selective node

elimination, as they do not perform well with the mesh structure of power grids. Other methods do not

perform well when taking into account the capacitive and inductive parasitics while modeling the grids.

Although a number of attempts have been made to address such issues, they have limitations in terms

of accuracy, and serve only specific models of grids. Given the 3D multi-layered structure of modern

on-die power grids and the fact that all the transistor circuitry is connected to the bottommost layer

of the grid, it would be a waste of resources to simulate the whole grid in order to verify the nodes in

that layer. The focus of this thesis is twofold: 1) We propose an efficient reduction technique that is

suitable for modern power grids. Our technique aims to eliminate whole metal layers and keep only the

bottommost layers to simulate without losing too much accuracy. 2) We propose an efficient way to

predict and estimate the voltage drop at specific desired areas in the grid without the need to simulate

it.

In chapter 3, we propose a novel numerical layer-by-layer grid reduction technique that is both fast


and accurate. The method works by eliminating whole internal metal layers at a time, while preserving

the behavior of the original grid. Our reduction technique is realizable, as it produces a smaller RC

network from a large input RC network, and it contributes to the literature in mainly two ways. First, we

propose a simple, yet effective, sparsification technique in order to preserve grid sparsity after reduction

while preserving the original input-output behavior. The method factors in grid topology in order to

find the connections that strongly affect the output response of the system. A future direction would

be optimizing our sparsification technique to make it accuracy-driven, such that the algorithm sparsifies

the grid based on the accuracy desired. Second, we provide a numerical approach to approximate the

capacitive parasitics of the reduced system, while preserving the time constant of the original system.

Experimental results show that our reduction technique is capable of handling very large power grids.

The method achieves a speed-up of around 4.5x with an average worst-case error of less than 5% (less than 2.4 mV)

at the nodes of the bottommost metal layer.
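To illustrate the general idea of eliminating a block of internal nodes at once, the sketch below applies the standard Schur complement (Kron reduction) to a small conductance matrix: with kept nodes k and eliminated nodes e, the reduced matrix is G_kk - G_ke G_ee^{-1} G_ek. This is a textbook block elimination for a purely resistive network, shown under the assumption that the eliminated nodes carry no current sources. It omits the sparsification and capacitance approximation steps and is not the algorithm of Chapter 3. The helper names `solve` and `eliminate` and the toy chain are hypothetical.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[r][:] + B[r][:] for r in range(n)]    # augmented matrix [A | B]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def eliminate(G, keep, drop):
    """Return the Schur complement of G onto the kept nodes."""
    Gkk = [[G[i][j] for j in keep] for i in keep]
    Gke = [[G[i][j] for j in drop] for i in keep]
    Gek = [[G[i][j] for j in keep] for i in drop]
    Gee = [[G[i][j] for j in drop] for i in drop]
    X = solve(Gee, Gek)                          # X = Gee^{-1} Gek
    return [[Gkk[i][j] - sum(Gke[i][t] * X[t][j] for t in range(len(drop)))
             for j in range(len(keep))]
            for i in range(len(keep))]

# Toy chain: pad -1ohm- node0 -1ohm- node1 -1ohm- node2 (pad is the reference).
G = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]

# Eliminate the internal node1 and keep node0 and node2.
G_reduced = eliminate(G, keep=[0, 2], drop=[1])
```

In the reduced matrix, node0 and node2 are joined by a conductance of 0.5 S, which is the two 1 ohm branches in series, so the response seen at the kept nodes is unchanged for this resistive example.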

In chapter 4, we propose a fast way to estimate the average voltage drop in a specific metal layer of

the grid without the need to run a transient simulation of the whole grid. We show that a DC simulation

of the grid is all that is needed to get a good idea of the profile of the voltage drop in the grid. Given

the average DC voltage drop at one layer, we can determine the average DC/transient voltage drop at any

other layer relative to that layer. In other words, we can determine the voltage drop budget expected at each

layer. This can help in speeding up grid verification, as one can simulate only part of the grid, knowing

that the budget supplied to that section of the grid is at a specific level. Furthermore, it can add to the

reduction technique proposed in chapter 3, as one can reduce the grid, simulate the reduced grid, get

the voltage drop at the bottommost metal layer, and finally estimate the average voltage drop at any

other layer in the original grid.
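The last step of this flow, estimating the other layers from a reading at the bottommost layer, can be sketched as follows. The sketch assumes that the per-layer ratios observed in the DC run carry over to the transient case, as the experiments in Chapter 4 suggest. The function name `predict_layer_drops` and all numerical values are hypothetical.

```python
def predict_layer_drops(dc_avg_by_layer, m1_transient_avg):
    """Scale the DC per-layer profile so its M1 entry matches the M1 reading."""
    scale = m1_transient_avg / dc_avg_by_layer["M1"]
    return {layer: avg * scale for layer, avg in dc_avg_by_layer.items()}

# Hypothetical average DC drops per layer, in mV, from one DC solve of the original grid.
dc_avg = {"M8": 4.0, "M5": 10.0, "M1": 20.0}

# Hypothetical average drop at M1, in mV, obtained by simulating the reduced grid.
estimate = predict_layer_drops(dc_avg, m1_transient_avg=30.0)
```

The estimate is only as good as the assumption that the layer-to-layer ratios are preserved, so it gives an approximate budget per layer rather than a bound on individual nodes.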

Several future directions arise from the research presented in this thesis. As mentioned earlier, it would be beneficial to make the sparsification technique accuracy-driven. Based on the accuracy desired, one can divide the bottommost layer into sections and specify a different local neighborhood to be used in the sparsification process for each section. For example, from Fig. 3.9, we can see that the nodes that have low voltage drop can tolerate larger relative error. Those nodes, as can be seen from Fig. 3.8, lie in the middle of the bottommost layer. Specifying a less dense neighborhood for those nodes would make the reduced system sparser, which would speed up the simulations. Another interesting direction would be to eliminate sections of layers at a time, instead of whole layers, in order to find the combination that leads to the fastest reduction scheme. As we saw earlier, the incremental elimination was much faster than both the direct elimination of all the middle layers and the node-by-node elimination. Dissecting the middle layers before reduction may result in a much faster elimination process. Finally, an important issue remains: simulations may be expensive even on the reduced systems, since the bottom layers have the largest number of nodes. Finding a condition on the safety of the bottommost layer’s nodes in the reduced system would be useful in this case. If we can find a condition on the nodes in the reduced system that ensures the safety of the nodes in the original grid, then there would be no need to simulate the grid, even the reduced one, and this would help designers analyze grids much faster.

Bibliography

[1] Abhishek and F. N. Najm. Incremental power grid verification. In ACM/IEEE 49th Design Automation Conference (DAC-2012), pages 151–156, San Francisco, CA, June 3-7 2012.

[2] R. Achar, M. S. Nakhla, H. S. Dhindsa, A. R. Sridhar, D. Paul, and N. M. Nakhla. Parallel and scalable transient simulator for power grids via waveform relaxation (PTS-PWR). IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 19(2):319–332, February 2011.

[3] R. Ahmadi and F. N. Najm. Timing analysis in presence of power supply and ground voltage

variations. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages

176–183, San Jose, CA, November 9-13 2003.

[4] E. A. Amerasekera and F. N. Najm. Failure mechanisms in semiconductor devices. Wiley, 1997.

[5] C. S. Amin, M. H. Chowdhury, and Y. I. Ismail. Realizable RLCK circuit crunching. In ACM/IEEE

40th Design Automation Conference (DAC-03), pages 226–231, Anaheim, CA, June 2-6 2003.

[6] MOSEK ApS. The MOSEK C optimizer API manual Version 7.1 (Revision 41), 2015.

[7] H. H. Chen and D. D. Ling. Power supply noise analysis methodology for deep-submicron VLSI

chip design. In 34th Design Automation Conference, pages 638–643, Anaheim, CA, June 9-13 1997.

[8] L. H. Chen, M. Marek-Sadowska, and F. Brewer. Coping with buffer delay change due to power

and ground noise. In ACM/IEEE 39th Design Automation Conference (DAC-02), pages 860–865,

New Orleans, LA, June 10-14 2002.

[9] E. Chiprout. Fast flip-chip power grid analysis via locality and grid shells. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 485–488, San Jose, CA, November 7-11 2004.

[10] K. DeHaven and J. Dietz. Controlled collapse chip connection (C4)–an enabling technology. In

Proceedings of 44th Electronic Components and Technology Conference (ECTC-94), pages 1–6, May

1994.

[11] M. Garland. Quadric-based polygonal surface simplification. PhD thesis, Carnegie Mellon University, 1999.

[12] A. Goyal and F. N. Najm. Efficient RC power grid verification using node elimination. pages

257–260, Grenoble, France, March 14-18 2011.


[13] Y-M. Jiang, K-T. Cheng, and A-C. Deng. Estimation of maximum power supply noise for deep sub-micron designs. In ACM/IEEE International Symposium on Low Power Electronics and Design, pages 233–238, Monterey, CA, August 10-12 1998.

[14] D. Kouroussis and F. N. Najm. A static pattern-independent technique for power grid voltage

integrity verification. In ACM/IEEE 40th Design Automation Conference (DAC-03), pages 99–104,

Anaheim, CA, June 2-6 2003.

[15] J. N. Kozhaya, S. R. Nassif, and F. N. Najm. A multigrid-like technique for power grid analysis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 21(10):1148–1160, October 2002.

[16] E. Lelarasmee, A. E. Ruehli, and A. L. Sangiovanni-Vincentelli. The waveform relaxation method

for time-domain analysis of large scale integrated circuits. IEEE Transactions on Computer-Aided

Design of Integrated Circuits and Systems, 1(3):131–145, July 1982.

[17] H. Liao and W. Wei-Ming Dai. Partitioning and reduction of RC interconnect networks based on

scattering parameter macromodels. In IEEE/ACM International Conference on Computer-Aided

Design, pages 704–709, November 1995.

[18] P. Liu, S. X.-D. Tan, B. McGaughy, L. Wu, and L. He. Termmerg: an efficient terminal-reduction method for interconnect circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26(8):1382–1392, August 2007.

[19] A. V. Mezhiba and E. G. Friedman. Power Distribution Networks in High Speed Integrated Circuits.

Kluwer Academic Publishers, Boston, MA, USA, 2004.

[20] P. Miettinen, M. Honkala, J. Roos, and M. Valtonen. Partmor: partitioning-based realizable model-order reduction method for RLC circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 30(3):374–387, March 2011.

[21] A. Muramatsu, M. Hashimoto, and H. Onodera. Effects of on-chip inductance on power distribution

grid. IEICE Transactions on fundamentals of electronics, communications and computer sciences,

88(12):3564–3572, 2005.

[22] F. N. Najm. Circuit simulation. John Wiley & Sons, 2010.

[23] F. N. Najm. Overview of vectorless/early power grid verification. In IEEE/ACM International

Conference on Computer-Aided Design (ICCAD), pages 670–677, Nov 2012.

[24] S. R. Nassif and J. N. Kozhaya. Fast power grid simulation. In Design Automation Conference,

pages 156–161, Los Angeles, CA, June 5-9 2000.

[25] M. Nizam, F. N. Najm, and A. Devgan. Power grid voltage integrity verification. In ACM/IEEE

International Symposium on Low Power Electronics and Design, pages 239–244, San Diego, CA,

August 8-10 2005.

[26] A. Odabasioglu, M. Celik, and L. T. Pileggi. PRIMA: Passive reduced-order interconnect macromodeling algorithm. IEEE Transactions on Computer-Aided Design, 17(8):645–654, August 1998.


[27] D. Oyaro and P. Triverio. Turbomor: an efficient model order reduction technique for RC networks with many ports. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2016.

[28] R. Panda, D. Blaauw, R. Chaudhry, V. Zolotov, B. Young, and R. Ramaraju. Model and analysis

for combined package and on-chip power grid simulation. In ACM/IEEE International Symposium

on Low Power Electronics and Design, pages 179–184, Italy, July 26-27 2000.

[29] S. Pant, D. Blaauw, V. Zolotov, S. Sundareswaran, and R. Panda. Vectorless analysis of supply

noise induced delay variation. In IEEE/ACM International Conference on Computer-Aided Design

(ICCAD), pages 184–191, San Jose, CA, November 9-13 2003.

[30] J. R. Phillips and L. M. Silveira. Poor man’s TBR: a simple model reduction scheme. IEEE

Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24(1):43–55, January

2005.

[31] R. J. Plemmons and A. Berman. Nonnegative matrices in the mathematical sciences. New York:

Academic, 1979.

[32] R. Powell. Introduction to electric circuits. Butterworth-Heinemann, 1995.

[33] H. Qian, S. R. Nassif, and S. S. Sapatnekar. Random walks in a supply network. In ACM/IEEE

40th Design Automation Conference (DAC-03), pages 93–98, Anaheim, CA, June 2-6 2003.

[34] Y. Saad. Iterative methods for sparse linear systems. PWS Publishing Company, Boston, MA, USA,

1996.

[35] R. Saleh, S. Z. Hussain, S. Rochel, and D. Overhauser. Clock skew verification in the presence

of IR-drop in the power distribution network. IEEE Transactions on Computer-Aided Design,

19(6):635–644, June 2000.

[36] S. Sapatnekar and H. Su. Analysis and optimization of power grids. IEEE Design & Test of

Computers, pages 7–15, May-June 2003.

[37] W. H. A. Schilders, H. A. Van der Vorst, and J. Rommes. Model order reduction: theory, research

aspects and applications, volume 13. Springer, Berlin, 2008.

[38] B. Sheehan. Realizable reduction of RC networks. IEEE Transactions on Computer-Aided Design

of Integrated Circuits and Systems, 26(8):1393–1407, August 2007.

[39] H. Su, E. Acar, and S. R. Nassif. Power grid reduction based on algebraic multigrid principles. In

ACM/IEEE 40th Design Automation Conference (DAC-03), pages 109–112, Anaheim, CA, June

2-6 2003.

[40] S. Tan and L. He. Advanced model order reduction techniques in VLSI design. Cambridge University

Press, 2007.

[41] R. Tummala, E. J. Rymaszewski, and A. G. Klopfenstein. Microelectronics packaging handbook.

Kluwer Academic Publishers, Boston, MA, USA, 1997.


[42] H. Wang, S. X-D Tan, and R. Rakib. Compact modeling of interconnect circuits over wide frequency

band by adaptive complex-valued sampling method. ACM Transactions on Design Automation of

Electronic Systems, 17(1):5, 2012.

[43] K. Wang and M. Marek-Sadowska. On-chip power supply network optimization using multigrid-based technique. In ACM/IEEE 40th Design Automation Conference (DAC-03), pages 113–118, Anaheim, CA, June 2-6 2003.

[44] C.-J. Wei, H. Chen, and S.-J. Chen. Design and implementation of block-based partitioning for

parallel flip-chip power-grid analysis. IEEE Transactions on Computer-Aided Design of Integrated

Circuits and Systems, 31(3):370–379, March 2012.

[45] A. Yassine and F. N. Najm. A fast layer elimination approach for power grid reduction. To Appear

in IEEE/ACM 35th International Conference on Computer-Aided Design (ICCAD), Austin, TX,

November 7-10 2016.

[46] M. Zhao, R. V. Panda, S. S. Sapatnekar, and D. Blaauw. Hierarchical analysis of power distribution networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 21(2):159–168, February 2002.

[47] X. Zhao, Z. Feng, and C. Zhuo. An efficient spectral graph sparsification approach to scalable reduction of large flip-chip power grids. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 218–223, San Jose, CA, November 2-6 2014.

[48] Z. Zhu, B. Yao, and C.-K. Cheng. Power network analysis using an adaptive algebraic multigrid

approach. In ACM/IEEE 40th Design Automation Conference (DAC-03), pages 105–108, Anaheim,

CA, June 2-6 2003.
