DAC14_MATEX_PowerDistributionNetworkSimulationSlides

MATEX: A Distributed Framework of Transient Simulation for Power Distribution Networks

Hao Zhuang1*, Shih-Hung Weng2, Jeng-Hau Lin1, Chung-Kuan Cheng1

1. Computer Science & Engineering Dept., University of California, San Diego, CA

2. Facebook Inc., Menlo Park, CA

* Email: [email protected]

Outline

• Problem Formulation
• MATEX Framework
  • Circuit Solver
    • Matrix Exponential Kernel
    • Krylov Subspace Accelerations for PDNs
  • Distributed Framework
    • Linear System's Superposition Property and Parallel Processing
    • Reduce Krylov Subspace Computations
• Experimental Results
• Conclusions

2

Problem Formulation for PDN Transient Simulation

Linear differential equations: $\mathbf{C}\dot{\mathbf{x}}(t) = -\mathbf{G}\mathbf{x}(t) + \mathbf{B}\mathbf{u}(t)$

• $\mathbf{C}$: capacitance/inductance matrix
• $\mathbf{G}$: conductance matrix
• $\mathbf{x}(t)$: voltage/current vector
• $\mathbf{B}$: input selection matrix
• $\mathbf{u}(t)$: input current sources (vector)

Tens of millions or billions of unknowns

Figure: PDN structure and its RLC model.

3

Previous Work

Time step size $h$ is determined by
• Input transition distances, which define the upper bound of the time step, e.g. $h_2 = \min(h_1, h_2, h_3)$
• Stiffness of systems
• Local truncation error (LTE)

Figure: A pulse input example with segments $h_1$, $h_2$, $h_3$.

Low order approximations, e.g. Trapezoidal method (TR):

$\left(\frac{\mathbf{C}}{h} + \frac{\mathbf{G}}{2}\right)\mathbf{x}(t+h) = \left(\frac{\mathbf{C}}{h} - \frac{\mathbf{G}}{2}\right)\mathbf{x}(t) + \mathbf{B}\,\frac{\mathbf{u}(t+h) + \mathbf{u}(t)}{2}$

TR with fixed time-step $h$ was used by the top solvers in TAU'12 power grid (PG) simulation contest
• Efficient for IBM PG Benchmarks
• Only one matrix factorization for transient stepping
• Process forward and backward substitutions to calculate $\mathbf{x}(t+h)$
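The TR update can be illustrated on a single RC node. The following is a minimal sketch, assuming a scalar system with made-up values; the names `tr_step` and `simulate_tr` and all numbers are hypothetical and not from the slides. In the scalar case the "factorization" of $\mathbf{C}/h + \mathbf{G}/2$ reduces to one constant, which depends on $h$ and is therefore reusable only while $h$ stays fixed.

```python
def tr_step(x, u_now, u_next, c, g, b, h):
    """One trapezoidal step for the scalar system c*x' = -g*x + b*u(t)."""
    lhs = c / h + g / 2.0  # left-hand-side coefficient; it changes whenever h changes
    rhs = (c / h - g / 2.0) * x + b * (u_now + u_next) / 2.0
    return rhs / lhs


def simulate_tr(u, x0, c, g, b, h, steps):
    """Fixed-step TR transient simulation; u is a function of time."""
    xs = [x0]
    for k in range(steps):
        t = k * h
        xs.append(tr_step(xs[-1], u(t), u(t + h), c, g, b, h))
    return xs


# Hypothetical example: constant unit input, so the steady state is b*u/g = 2.0
xs = simulate_tr(lambda t: 1.0, x0=0.0, c=1.0, g=0.5, b=1.0, h=0.1, steps=400)
```

After 400 steps (20 time constants of this toy circuit) the response has settled at the steady state.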

4

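Our Matrix Exponential Method

Analytical solution [Weng et al., IEEE TCAD 2012]:

$\mathbf{x}(t+h) = e^{h\mathbf{A}}\mathbf{x}(t) + \int_0^h e^{(h-\tau)\mathbf{A}}\,\mathbf{b}(t+\tau)\,d\tau$

where $\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$, $\mathbf{b}(t) = \mathbf{C}^{-1}\mathbf{B}\mathbf{u}(t)$

Input sources are piecewise linear (PWL):

$\mathbf{x}(t+h) = e^{h\mathbf{A}}\left(\mathbf{x}(t) + \mathbf{F}(t,h)\right) - \mathbf{P}(t,h)$

where

$\mathbf{F}(t,h) = \mathbf{A}^{-1}\mathbf{b}(t) + \mathbf{A}^{-2}\,\frac{\mathbf{b}(t+h) - \mathbf{b}(t)}{h}$,

$\mathbf{P}(t,h) = \mathbf{A}^{-1}\mathbf{b}(t+h) + \mathbf{A}^{-2}\,\frac{\mathbf{b}(t+h) - \mathbf{b}(t)}{h}$

Here $e^{h\mathbf{A}}$ is a matrix exponential, while $\mathbf{x}(t) + \mathbf{F}(t,h)$ and $\mathbf{P}(t,h)$ are vectors.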
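The PWL closed form follows from the analytical solution by integrating the linear input segment exactly. A sketch of the derivation, assuming $\mathbf{A}$ is invertible and writing $\mathbf{s}$ (a symbol introduced here, not on the slide) for the slope of $\mathbf{b}$ on $[t, t+h]$:

```latex
\begin{aligned}
\mathbf{b}(t+\tau) &= \mathbf{b}(t) + \tau\,\mathbf{s}, \qquad
\mathbf{s} = \frac{\mathbf{b}(t+h)-\mathbf{b}(t)}{h}, \\
\int_0^h e^{(h-\tau)\mathbf{A}}\,d\tau
  &= \mathbf{A}^{-1}\left(e^{h\mathbf{A}}-\mathbf{I}\right), \\
\int_0^h e^{(h-\tau)\mathbf{A}}\,\tau\,d\tau
  &= \mathbf{A}^{-2}\left(e^{h\mathbf{A}}-\mathbf{I}\right) - h\,\mathbf{A}^{-1}, \\
\mathbf{x}(t+h) &= e^{h\mathbf{A}}\mathbf{x}(t)
  + \mathbf{A}^{-1}\left(e^{h\mathbf{A}}-\mathbf{I}\right)\mathbf{b}(t)
  + \left[\mathbf{A}^{-2}\left(e^{h\mathbf{A}}-\mathbf{I}\right) - h\,\mathbf{A}^{-1}\right]\mathbf{s} \\
&= e^{h\mathbf{A}}\left(\mathbf{x}(t) + \mathbf{A}^{-1}\mathbf{b}(t) + \mathbf{A}^{-2}\mathbf{s}\right)
  - \left(\mathbf{A}^{-1}\left(\mathbf{b}(t) + h\,\mathbf{s}\right) + \mathbf{A}^{-2}\mathbf{s}\right) \\
&= e^{h\mathbf{A}}\left(\mathbf{x}(t) + \mathbf{F}(t,h)\right) - \mathbf{P}(t,h).
\end{aligned}
```

The last step uses $\mathbf{b}(t) + h\,\mathbf{s} = \mathbf{b}(t+h)$, which gives the $\mathbf{A}^{-1}\mathbf{b}(t+h)$ term in $\mathbf{P}(t,h)$.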

5

Advantage in Accuracy

Figure: simulated waveforms compared against the reference solution.

With the same h, the matrix exponential method reaches the reference solution, while Backward Euler does not.

6

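Not $e^{\mathbf{A}}$, but $e^{\mathbf{A}}\mathbf{v}$ [Weng et al., IEEE TCAD 2012]

Computing $e^{\mathbf{A}}$ is very expensive when $\mathbf{A}$ is large!

$e^{\mathbf{A}}\mathbf{v}$: Matrix Exponential and Vector Product (MEVP)
• Efficiently approximated via Krylov subspace (MEXP)

Standard Krylov subspace: $K_m(\mathbf{A}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{m-1}\mathbf{v}\}$

Basis generation: $\mathbf{V}_m = [\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_m]$

Arnoldi process and matrix reduction: $\mathbf{A}\mathbf{V}_m = \mathbf{V}_m\mathbf{H}_m + h_{m+1,m}\,\mathbf{v}_{m+1}\mathbf{e}_m^{\mathsf{T}}$

MEVP is computed by

$e^{\mathbf{A}}\mathbf{v} \approx \|\mathbf{v}\|_2\,\mathbf{V}_m\,e^{\mathbf{H}_m}\mathbf{e}_1$

Time stepping only by scaling h:

$e^{h\mathbf{A}}\mathbf{v} \approx \|\mathbf{v}\|_2\,\mathbf{V}_m\,e^{h\mathbf{H}_m}\mathbf{e}_1$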
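The Arnoldi-based MEVP and the "time stepping only by scaling h" idea can be sketched in a few lines. This is an illustrative toy in plain Python under stated assumptions: a dense 3-by-3 matrix with made-up entries, a simple Taylor-series `expm`, and helper names (`matvec`, `matmul`, `expm`, `arnoldi`, `mevp`) that are hypothetical and not from the slides. A real PDN solver would use sparse matrices and a robust matrix-exponential routine for the small $\mathbf{H}_m$.

```python
import math


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def expm(a, terms=30, squarings=10):
    """Dense matrix exponential: Taylor series on a scaled matrix, then squaring."""
    n = len(a)
    scaled = [[x / 2.0 ** squarings for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def arnoldi(a, v, m):
    """Return ||v||_2, an orthonormal basis of K_m(a, v), and the m-by-m matrix H_m."""
    beta = math.sqrt(sum(x * x for x in v))
    basis = [[x / beta for x in v]]
    hess = [[0.0] * m for _ in range(m)]
    for j in range(m):
        w = matvec(a, basis[j])
        for i in range(j + 1):  # modified Gram-Schmidt orthogonalization
            hess[i][j] = sum(x * y for x, y in zip(w, basis[i]))
            w = [x - hess[i][j] * y for x, y in zip(w, basis[i])]
        norm_w = math.sqrt(sum(x * x for x in w))
        if j + 1 < m:
            if norm_w < 1e-14:
                break  # invariant subspace found; stop early
            hess[j + 1][j] = norm_w
            basis.append([x / norm_w for x in w])
    return beta, basis, hess


def mevp(beta, basis, hess, step):
    """Approximate exp(step*A) v as beta * V_m * exp(step*H_m) * e_1."""
    eh = expm([[step * x for x in row] for row in hess])
    coeffs = [beta * row[0] for row in eh]  # beta * (first column of exp(step*H_m))
    return [sum(c * vec[i] for c, vec in zip(coeffs, basis))
            for i in range(len(basis[0]))]


# Made-up test matrix and vector; one Arnoldi run serves two different step sizes
a = [[-2.0, 1.0, 0.0], [1.0, -3.0, 1.0], [0.0, 1.0, -4.0]]
v = [1.0, 2.0, 3.0]
beta, basis, hess = arnoldi(a, v, 3)
approx_small = mevp(beta, basis, hess, 0.5)
approx_large = mevp(beta, basis, hess, 2.0)
```

Here $m$ equals the matrix dimension, so the approximation is exact up to rounding. The point of the sketch is that `arnoldi` runs once and both step sizes reuse the same `basis` and `hess`.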

7

Algorithm of Computing 𝐱𝐱(𝑡𝑡 + ℎ)

PDN is a linear system, so the input matrices $\mathbf{X}_2$, $\mathbf{L}$, $\mathbf{U}$ do not change. The LU decomposition of $\mathbf{X}_1$ is done only once for the whole simulation.

Method   X1   X2   L, U
MEXP     C    G    LU decomposition of X1

8

Problem #1: Stiff PDN Circuits

PDNs are usually highly stiff circuits
• Generalized eigenvalues spread over a wide range within the spectrum of $\mathbf{A}$ ($\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$)
• This requires the standard Krylov subspace to build a very large number of bases to approximate MEVP.

9

10

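Standard Krylov subspace (MEXP)

(a) Standard Krylov Basis (MEXP): $K_m(\mathbf{A}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{m-1}\mathbf{v}\}$

Figure (a): eigenvalues of $\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$ in the complex plane (axes Re, Im), distinguishing eigenvalues with small and with large magnitude of real components.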

11



Figure (a): eigenvalues of A in the complex plane (axes Re, Im), distinguishing eigenvalues with small and with large magnitude of real components.

Eigenvalues of A with large magnitude of real components:
• Fast mode of dynamical behavior of circuits.
• Standard Krylov basis tends to capture these eigenvalues with large magnitude.

12



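Figure (a): eigenvalues of A in the complex plane (axes Re, Im).

Eigenvalues of A with small magnitude of real components:
• These eigenvalues define the major dynamical behavior of circuits.
• The standard Krylov basis demands more bases in order to characterize these eigenvalues.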


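Inverted Krylov subspace (I-MATEX)

(a) Standard Krylov Basis (MEXP): $K_m(\mathbf{A}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{m-1}\mathbf{v}\}$

(b) Inverted Krylov Basis (I-MATEX): $K_m(\mathbf{A}^{-1}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}^{-1}\mathbf{v}, \mathbf{A}^{-2}\mathbf{v}, \ldots, \mathbf{A}^{-m+1}\mathbf{v}\}$

Figure: panels (a) and (b), eigenvalues in the complex plane (axes Re, Im).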



Inverted Krylov subspace is more likely to capture these “important” eigenvalues


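Rational Krylov subspace (R-MATEX)

(a) Standard Krylov Basis (MEXP): $K_m(\mathbf{A}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{m-1}\mathbf{v}\}$

(c) Rational Krylov Basis (R-MATEX): $K_m((\mathbf{I} - \gamma\mathbf{A})^{-1}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, (\mathbf{I} - \gamma\mathbf{A})^{-1}\mathbf{v}, (\mathbf{I} - \gamma\mathbf{A})^{-2}\mathbf{v}, \ldots, (\mathbf{I} - \gamma\mathbf{A})^{-m+1}\mathbf{v}\}$

Figure: panels (a) and (c), eigenvalues of A in the complex plane (axes Re, Im), distinguishing eigenvalues with small and with large magnitude of real components.

• Rational Krylov is still likely to capture these “important” eigenvalues
• More robust numerical property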

16

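Error trend of R-MATEX

$\mathrm{error} = |e^{h\mathbf{A}}\mathbf{v} - \mathbf{V}_m e^{h\mathbf{H}_m}\mathbf{e}_1|$ vs. m vs. h

The first term is $e^{h\mathbf{A}}\mathbf{v}$ computed directly; the second term is the MEVP via R-MATEX.

Figure: error plotted against m and h.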

17

Same Algorithm with Different Input Matrices

Still only one LU decomposition of $\mathbf{X}_1$ (giving $\mathbf{L}$, $\mathbf{U}$)

Method    X1        X2   H_m
MEXP      C         G    $\mathbf{H}_m$
I-MATEX   G         C    $\mathbf{H}'^{-1}_m$
R-MATEX   C + γG    C    $(\mathbf{I} - \tilde{\mathbf{H}}_m^{-1})/\gamma$
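The table entries can be read off from the definition $\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$. The relations below are a sketch of that reasoning; the primed and tilde notation for the Hessenberg matrices of the inverted and rational subspaces is reconstructed, and the minus signs in the first two rows are assumed to be handled inside the algorithm, which the slides do not show.

```latex
\begin{aligned}
\text{MEXP:}\quad & \mathbf{A}\mathbf{v} = -\mathbf{C}^{-1}\mathbf{G}\mathbf{v}
  && (\mathbf{X}_1 = \mathbf{C},\ \mathbf{X}_2 = \mathbf{G}) \\
\text{I-MATEX:}\quad & \mathbf{A}^{-1}\mathbf{v} = -\mathbf{G}^{-1}\mathbf{C}\mathbf{v}
  && (\mathbf{X}_1 = \mathbf{G},\ \mathbf{X}_2 = \mathbf{C}) \\
\text{R-MATEX:}\quad & (\mathbf{I}-\gamma\mathbf{A})^{-1}\mathbf{v}
  = (\mathbf{C}+\gamma\mathbf{G})^{-1}\mathbf{C}\mathbf{v}
  && (\mathbf{X}_1 = \mathbf{C}+\gamma\mathbf{G},\ \mathbf{X}_2 = \mathbf{C}) \\[4pt]
\mathbf{A}^{-1}\mathbf{V}_m \approx \mathbf{V}_m\mathbf{H}'_m
  \ &\Rightarrow\ e^{h\mathbf{A}}\mathbf{v} \approx
  \|\mathbf{v}\|_2\,\mathbf{V}_m\,e^{h\mathbf{H}'^{-1}_m}\mathbf{e}_1 \\
(\mathbf{I}-\gamma\mathbf{A})^{-1}\mathbf{V}_m \approx \mathbf{V}_m\tilde{\mathbf{H}}_m
  \ &\Rightarrow\ e^{h\mathbf{A}}\mathbf{v} \approx
  \|\mathbf{v}\|_2\,\mathbf{V}_m\,e^{h(\mathbf{I}-\tilde{\mathbf{H}}_m^{-1})/\gamma}\mathbf{e}_1
\end{aligned}
```

In every case each Arnoldi step is one solve with the factors of $\mathbf{X}_1$ applied to $\mathbf{X}_2\mathbf{v}_j$, so a single LU decomposition serves the whole simulation.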

18

Testcases: RC Circuits with Different Stiffness

• $m_a$: average dimension of Krylov subspace ($\mathbf{V}_m$, $\mathbf{H}_m$)
• $m_p$: peak dimension of Krylov subspace ($\mathbf{V}_m$, $\mathbf{H}_m$)
• Err(%): relative error compared to reference solution
• Speedup/MEXP: speedups brought by Krylov subspace reduction
• Stiffness: $|\mathrm{Re}\{\lambda_{\min}(\mathbf{A})\}| \,/\, |\mathrm{Re}\{\lambda_{\max}(\mathbf{A})\}|$

Stiffness    Method    m_a     m_p   Err(%)   Speedup/MEXP
2.1×10^16    MEXP      211.4   229   0.510    1X
             I-MATEX   5.7     14    0.004    2616X
             R-MATEX   6.9     12    0.004    2735X
2.1×10^12    MEXP      154.2   224   0.004    1X
             I-MATEX   5.7     14    0.004    583X
             R-MATEX   6.9     12    0.004    611X
2.1×10^8     MEXP      148.6   223   0.004    1X
             I-MATEX   5.7     14    0.004    229X
             R-MATEX   6.9     12    0.004    252X

19

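Problem #2: Initial Vector Change

MEVP = $e^{\mathbf{A}}\mathbf{v}$

Once $\mathbf{v}$ changes, we need to compute a new $K_m$ for MEVP, because $\mathbf{v}$ is the initial vector of $K_m((\mathbf{I} - \gamma\mathbf{A})^{-1}, \mathbf{v})$.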

20

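Problem #2: Initial Vector Change

MEVP = $e^{\mathbf{A}}\mathbf{v}$. Once $\mathbf{v}$ changes, we need to compute $K_m$ for MEVP.

In circuit solver,

$\mathbf{x}(t+h) = e^{h\mathbf{A}}\left(\mathbf{x}(t) + \mathbf{F}(t,h)\right) - \mathbf{P}(t,h)$

where $\mathbf{F}(t,h) = \mathbf{A}^{-1}\mathbf{b}(t) + \mathbf{A}^{-2}\,\frac{\mathbf{b}(t+h) - \mathbf{b}(t)}{h}$

The initial vector $\mathbf{x}(t) + \mathbf{F}(t,h)$ changes when input sources cannot keep the previous trend.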

21

Figure: A pulse input example.
• The dashed lines are places where the initial vector changes.
• Each such place is a “transition spot”.


22


Many input current sources in PDN make the initial vector change frequently, which triggers Krylov subspace generations and consumes runtime (trouble maker).

23

24

Input sources, the trouble maker

A PDN with three input current sources.

25



26


Some definitions

Local Transition Spot (LTS): for one input source, its transition spots.

Global Transition Spot (GTS): the union of all LTS.

Snapshot: for one input source, a spot that is in GTS but not in its LTS.
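These three definitions map directly onto set operations. The following is a small sketch with made-up transition times for three sources; the function name `transition_sets`, the source names and the time values are hypothetical.

```python
def transition_sets(local_spots):
    """local_spots maps a source name to the times where its PWL slope changes."""
    lts = {name: set(spots) for name, spots in local_spots.items()}
    gts = set().union(*lts.values())  # GTS: union of all LTS
    # Snapshots of one source: spots in GTS that are not in its own LTS
    snapshots = {name: gts - spots for name, spots in lts.items()}
    return lts, gts, snapshots


# Hypothetical transition times (in ps) for three input current sources
sources = {
    "i1": [0, 10, 20, 50],
    "i2": [0, 30, 40, 50],
    "i3": [0, 10, 40, 50],
}
lts, gts, snapshots = transition_sets(sources)
print(sorted(gts))              # [0, 10, 20, 30, 40, 50]
print(sorted(snapshots["i1"]))  # [30, 40]
```

In this toy case each source has four LTS while the GTS set has six spots, so simulating a source on its own needs fewer Krylov subspace generations than simulating all sources together.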


27




When the circuit is simulated with all input sources as a whole, every GTS triggers a Krylov subspace generation.

28


How about simulating the circuit with each individual source separately, then summing the results by superposition?
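Superposition can be checked on a scalar toy system $x' = a x + b_1(t) + b_2(t)$, stepped exactly with the PWL closed form from the matrix exponential method. This is a sketch under stated assumptions: the scalar coefficient `a`, the two PWL sources and the helper names (`exact_step`, `simulate`, `pwl`) are all made up for illustration.

```python
import math


def exact_step(x, b_now, b_next, a, h):
    """Exact step of x' = a*x + b(t) when b is linear on [t, t+h] (scalar case)."""
    slope = (b_next - b_now) / h
    f = b_now / a + slope / a ** 2   # scalar F(t, h)
    p = b_next / a + slope / a ** 2  # scalar P(t, h)
    return math.exp(h * a) * (x + f) - p


def simulate(b, times, a, x0=0.0):
    """Step through the time points; b must be linear between consecutive points."""
    xs = [x0]
    for t0, t1 in zip(times, times[1:]):
        xs.append(exact_step(xs[-1], b(t0), b(t1), a, t1 - t0))
    return xs


def pwl(points):
    """Piecewise-linear interpolation through (time, value) points."""
    def value(t):
        for (t0, v0), (t1, v1) in zip(points, points[1:]):
            if t0 <= t <= t1:
                return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        return points[-1][1]
    return value


a = -2.0
b1 = pwl([(0, 0.0), (1, 1.0), (2, 0.0), (5, 0.0)])  # LTS of source 1: 0, 1, 2, 5
b2 = pwl([(0, 0.0), (3, 0.0), (4, 2.0), (5, 0.0)])  # LTS of source 2: 0, 3, 4, 5
gts = [0, 1, 2, 3, 4, 5]                            # union of both LTS

whole = simulate(lambda t: b1(t) + b2(t), gts, a)   # all sources together
part1 = simulate(b1, gts, a)                        # source 1 alone
part2 = simulate(b2, gts, a)                        # source 2 alone
summed = [x1 + x2 for x1, x2 in zip(part1, part2)]  # superposition
coarse1 = simulate(b1, [0, 1, 2, 5], a)             # source 1 on its own LTS only
```

`summed` matches `whole` at every GTS, and `coarse1` reaches the same final value as `part1` while taking only three steps instead of five, which is the saving the distributed framework aims for.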



29

Reduce the number of Krylov subspace generations and reuse subspaces

For one input source, the number of LTS is much smaller than the number of GTS.

Meanwhile, the solution at each snapshot must be tracked for later superposition.

Snapshots are computed without extra Krylov subspace generations.

30


Given a previous solution $\mathbf{x}(t)$

31


To compute the solutions at snapshots, $\mathbf{x}(t+h_1)$ and $\mathbf{x}(t+h_2)$, without Krylov subspace generations

32


Generate $\mathbf{V}_m$ and $\mathbf{H}_m$ at $t$

33


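Reuse $\mathbf{V}_m$ and $\mathbf{H}_m$, scaling $h$ to $h_1$ and $h_2$ for MEVP, until reaching the next LTS:

$\mathbf{x}(t+h_1) = \|\mathbf{v}\|_2\,\mathbf{V}_m\,e^{h_1\mathbf{H}_m}\mathbf{e}_1 - \mathbf{P}(t, h_1)$

$\mathbf{x}(t+h_2) = \|\mathbf{v}\|_2\,\mathbf{V}_m\,e^{h_2\mathbf{H}_m}\mathbf{e}_1 - \mathbf{P}(t, h_2)$

No matrix factorizations during this adaptive stepping!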

34

MATEX’s Distributed Framework

35

More aggressive! Each computing node is responsible for one set of bumps.

36

Experimental Results

Test cases: IBM power grid benchmarks
• TR: Trapezoidal method with fixed time step
• MATEX: circuit solver uses R-MATEX

Environment
• Linux workstations, Intel Core™ i7-4770 3.40GHz processor, 32GB memory
• Implemented in MATLAB 2013
• Easy to emulate distributed environment (no synchronization during the simulation)

37


TR with h=10ps

Design    t1000(s)   tttotal(s)
ibmpg1t   5.94       6.20
ibmpg2t   26.98      28.61
ibmpg3t   245.92     272.47
ibmpg4t   329.36     368.55
ibmpg5t   408.78     428.43
ibmpg6t   542.04     567.38

MATEX

Design    # Grp   trmatex(s)   trtotal(s)   Avg Err.   Speedup t1000/trmatex   Speedup tttotal/trtotal
ibmpg1t   100     0.50         0.85         2.5E-5     11.9X                   7.3X
ibmpg2t   100     2.02         3.72         4.3E-5     13.4X                   7.7X
ibmpg3t   100     20.15        45.77        3.7E-5     12.2X                   6.0X
ibmpg4t   15      22.35        65.66        3.9E-5     14.7X                   5.6X
ibmpg5t   100     35.67        54.21        1.1E-5     11.5X                   7.9X
ibmpg6t   100     47.27        74.94        3.4E-5     11.5X                   7.6X

• Avg Err.: average differences compared to all output nodes' solutions provided by IBM Power Grid Benchmarks;

• Speedup t1000/trmatex: transient stepping runtime speedups of MATEX over TR;

• Speedup tttotal/trtotal: total simulation runtime speedups of MATEX over TR.

38









39









40

Contributions

• New time-integration kernel is applied with improved Krylov subspace-based MEVP approximations for PDNs
  • Adaptive time stepping without matrix re-factorization during the transient (stepping) simulation
  • This feature cannot be achieved with low order approximation strategies, e.g., trapezoidal (TR), due to the explicitly embedded $h$ in $\frac{\mathbf{C}}{h} + \frac{\mathbf{G}}{2}$
• Distributed computing framework
  • Decompose the simulation task based on LTS, then do superposition using GTS and snapshots to form the final solution.
  • Exploit the advantages of large time stepping, and also reduce and reuse Krylov subspaces.
• Results of IBM PG benchmarks
  • Compared to TR with fixed time step (10ps), the speedup of transient stepping is 13X on average.

41

THANK YOU

42