DESCRIPTION
These are the draft slides we used for the DAC 2014 presentation. Abstract: We propose MATEX, a distributed framework for transient simulation of power distribution networks (PDNs). MATEX uses a matrix exponential kernel with Krylov subspace approximations to solve the differential equations of linear circuits. First, the whole simulation task is divided into subtasks based on decompositions of the current sources, in order to reduce computational overhead. These subtasks are then distributed to different computing nodes and processed in parallel. Within each node, after the matrix factorization at the beginning of the simulation, the adaptive time-stepping solver runs without extra matrix re-factorizations. MATEX overcomes the stiffness limitation of previous matrix exponential-based circuit simulators by using the rational Krylov subspace method, which allows larger step sizes with smaller Krylov subspace bases and greatly accelerates the whole computation. MATEX outperforms both traditional fixed and adaptive time-stepping methods, e.g., achieving around 13X speedup over the trapezoidal framework with a fixed time step on the IBM power grid benchmarks.
MATEX: A Distributed Framework of Transient Simulation for Power Distribution Networks
Hao Zhuang1*, Shih-Hung Weng2, Jeng-Hau Lin1, Chung-Kuan Cheng1
1. Computer Science & Engineering Dept., University of California, San Diego, CA
2. Facebook Inc., Menlo Park, CA
* Email: [email protected]
Outline
• Problem Formulation
• MATEX Framework
  • Circuit Solver
    • Matrix Exponential Kernel
    • Krylov Subspace Accelerations for PDNs
  • Distributed Framework
    • Linear System's Superposition Property and Parallel Processing
    • Reduce Krylov Subspace Computations
• Experimental Results
• Conclusions
Problem Formulation for PDN Transient Simulation
• Linear differential equations: $\mathbf{C}\dot{\mathbf{x}}(t) = -\mathbf{G}\mathbf{x}(t) + \mathbf{B}\mathbf{u}(t)$
  • $\mathbf{C}$: capacitance/inductance matrix
  • $\mathbf{G}$: conductance matrix
  • $\mathbf{x}(t)$: voltage/current vector
  • $\mathbf{B}$: input selection matrix
  • $\mathbf{u}(t)$: input current sources (vector)
• Tens of millions or billions of unknowns
[Figure: PDN structure and its RLC model]
Previous Work
• The time step size $h$ is determined by:
  • Input transition distances, which define the upper bound of the time step, e.g., $h_2 = \min(h_1, h_2, h_3)$
  • Stiffness of the system
  • Local truncation error (LTE)
[Figure: a pulse input example with transition distances $h_1$, $h_2$, $h_3$]
• Low-order approximations, e.g., the trapezoidal method (TR):
  $\left(\frac{\mathbf{C}}{h} + \frac{\mathbf{G}}{2}\right)\mathbf{x}(t+h) = \left(\frac{\mathbf{C}}{h} - \frac{\mathbf{G}}{2}\right)\mathbf{x}(t) + \mathbf{B}\,\frac{\mathbf{u}(t+h) + \mathbf{u}(t)}{2}$
• TR with a fixed time step $h$ was used by the top solvers in the TAU'12 power grid (PG) simulation contest
  • Efficient for the IBM PG benchmarks
  • Only one matrix factorization for transient stepping
  • Forward and backward substitutions compute $\mathbf{x}(t+h)$ at each step
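A minimal Python sketch of fixed-step TR, offered as an illustration and not as the contest solvers' code. It shows why a fixed $h$ needs only one factorization: the matrix $\mathbf{C}/h + \mathbf{G}/2$ never changes, so each step is one forward/backward substitution. The name `tr_fixed_step`, the dense SciPy LU, and the one-node RC example are assumptions made for this sketch; a real PDN solver would use a sparse factorization.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve


def tr_fixed_step(C, G, B, u, x0, h, n_steps):
    """Trapezoidal integration of C x' = -G x + B u(t) with a fixed step h.

    `u` is a callable returning the source vector at time t (hypothetical
    interface chosen for this sketch).
    """
    lhs = lu_factor(C / h + G / 2.0)   # the only matrix factorization
    rhs_mat = C / h - G / 2.0
    x = np.asarray(x0, dtype=float)
    t = 0.0
    xs = [x.copy()]
    for _ in range(n_steps):
        rhs = rhs_mat @ x + B @ (u(t + h) + u(t)) / 2.0
        x = lu_solve(lhs, rhs)         # forward and backward substitution
        t += h
        xs.append(x.copy())
    return np.array(xs)


# Hypothetical one-node RC circuit driven by a unit step current.
C = np.array([[1.0]])
G = np.array([[1.0]])
B = np.array([[1.0]])
xs = tr_fixed_step(C, G, B, lambda t: np.array([1.0]), [0.0], h=0.01, n_steps=100)
```

Changing $h$ in this scheme changes `C / h + G / 2.0`, which is what forces a re-factorization in adaptive TR.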
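Our Matrix Exponential Method
• Analytical solution [Weng et al., IEEE TCAD 2012]:
  $\mathbf{x}(t+h) = e^{h\mathbf{A}}\mathbf{x}(t) + \int_0^h e^{(h-\tau)\mathbf{A}}\,\mathbf{b}(t+\tau)\,d\tau$
  where $\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$ and $\mathbf{b}(t) = \mathbf{C}^{-1}\mathbf{B}\mathbf{u}(t)$
• Input sources are piecewise linear (PWL):
  $\mathbf{x}(t+h) = e^{h\mathbf{A}}\left(\mathbf{x}(t) + \mathbf{F}(t,h)\right) - \mathbf{P}(t,h)$
  where
  $\mathbf{F}(t,h) = \mathbf{A}^{-1}\mathbf{b}(t) + \mathbf{A}^{-2}\,\frac{\mathbf{b}(t+h) - \mathbf{b}(t)}{h}$,
  $\mathbf{P}(t,h) = \mathbf{A}^{-1}\mathbf{b}(t+h) + \mathbf{A}^{-2}\,\frac{\mathbf{b}(t+h) - \mathbf{b}(t)}{h}$
• $e^{h\mathbf{A}}$ is a matrix exponential; $\mathbf{x}(t) + \mathbf{F}(t,h)$ and $\mathbf{P}(t,h)$ are vectors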
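The PWL update $\mathbf{x}(t+h) = e^{h\mathbf{A}}(\mathbf{x}(t) + \mathbf{F}(t,h)) - \mathbf{P}(t,h)$ can be checked on a small dense system. The sketch below is an assumption-laden illustration: the name `expm_pwl_step` is hypothetical, $\mathbf{A}$ is formed explicitly, and `scipy.linalg.expm` stands in for the Krylov approximation, which is only reasonable for tiny matrices with nonsingular $\mathbf{C}$ and $\mathbf{A}$.

```python
import numpy as np
from scipy.linalg import expm, solve


def expm_pwl_step(C, G, B, u_t, u_th, x_t, h):
    """One exact step of C x' = -G x + B u(t) for u linear on [t, t+h]."""
    A = -solve(C, G)                      # A = -C^{-1} G
    b_t = solve(C, B @ u_t)               # b(t)   = C^{-1} B u(t)
    b_th = solve(C, B @ u_th)             # b(t+h) = C^{-1} B u(t+h)
    slope = (b_th - b_t) / h
    a_inv_slope = solve(A, slope)
    F = solve(A, b_t + a_inv_slope)       # A^{-1} b(t)   + A^{-2} slope
    P = solve(A, b_th + a_inv_slope)      # A^{-1} b(t+h) + A^{-2} slope
    return expm(h * A) @ (x_t + F) - P


# Hypothetical scalar check: x' = -x + t with x(0) = 1 has the closed form
# x(t) = t - 1 + 2 exp(-t).
one = np.array([[1.0]])
x_half = expm_pwl_step(one, one, one,
                       np.array([0.0]), np.array([0.5]), np.array([1.0]), 0.5)
```

Because the input is exactly linear over the step, the result matches the closed form to rounding error regardless of the step size.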
Advantage in Accuracy
• With the same $h$, the matrix exponential method reaches the reference solution, while backward Euler does not.
[Figure: comparison against the reference solution]
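Not $e^{\mathbf{A}}$, but $e^{\mathbf{A}}\mathbf{v}$ [Weng et al., IEEE TCAD 2012]
• Computing $e^{\mathbf{A}}$ is very expensive when $\mathbf{A}$ is large!
• $e^{\mathbf{A}}\mathbf{v}$: matrix exponential and vector product (MEVP), efficiently approximated via a Krylov subspace (MEXP)
• Standard Krylov subspace: $K_m(\mathbf{A}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \dots, \mathbf{A}^{m-1}\mathbf{v}\}$
• Basis generation: $\mathbf{V}_m = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_m]$
• Arnoldi process and matrix reduction: $\mathbf{A}\mathbf{V}_m = \mathbf{V}_m\mathbf{H}_m + h_{m+1,m}\mathbf{v}_{m+1}\mathbf{e}_m^T$
• MEVP is computed by $e^{\mathbf{A}}\mathbf{v} \approx \|\mathbf{v}\|\,\mathbf{V}_m e^{\mathbf{H}_m}\mathbf{e}_1$
• Time stepping only by scaling $h$: $e^{h\mathbf{A}}\mathbf{v} \approx \|\mathbf{v}\|\,\mathbf{V}_m e^{h\mathbf{H}_m}\mathbf{e}_1$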
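A short Python sketch of the standard Krylov (MEXP) approximation, assuming a dense `A` and a modified Gram-Schmidt Arnoldi process. The function names `arnoldi` and `mevp` are hypothetical. It illustrates the last point: once $\mathbf{V}_m$ and $\mathbf{H}_m$ exist, a new step size only needs the small exponential $e^{h\mathbf{H}_m}$.

```python
import numpy as np
from scipy.linalg import expm


def arnoldi(A, v, m):
    """Return (V_m, H_m, ||v||) from at most m Arnoldi steps on K_m(A, v)."""
    n = v.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:           # subspace became invariant
            return V[:, : j + 1], H[: j + 1, : j + 1], beta
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m], H[:m, :m], beta


def mevp(V, H, beta, h):
    """Approximate exp(h A) v as ||v|| V_m exp(h H_m) e_1."""
    e1 = np.zeros(H.shape[0])
    e1[0] = 1.0
    return beta * (V @ (expm(h * H) @ e1))
```

A typical use would call `arnoldi` once and then `mevp(V, H, beta, h)` for several values of `h`, without touching `A` again.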
Algorithm of Computing $\mathbf{x}(t+h)$
• A PDN is a linear system, so the matrices $\mathbf{X}_1$, $\mathbf{L}$, $\mathbf{U}$ do not change.
• lu_decompose($\mathbf{X}_1$) is done only once for the whole simulation.
Method   X1   X2   [L, U]
MEXP     C    G    lu_decompose(X1)
Problem #1: Stiff PDN Circuits
• PDNs are usually highly stiff circuits
  • Generalized eigenvalues spread over a wide range within the spectrum of $\mathbf{A}$ ($\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$)
  • The standard Krylov subspace requires a very large number of basis vectors to approximate the MEVP
Standard Krylov subspace (MEXP)
(a) Standard Krylov basis (MEXP): $K_m(\mathbf{A}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \dots, \mathbf{A}^{m-1}\mathbf{v}\}$, with $\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$
[Figure (a): eigenvalues of $\mathbf{A}$ in the complex plane (axes Re, Im), marked in two groups: small and large magnitude of real components]
• Eigenvalues of $\mathbf{A}$ with large magnitude of real components:
  • Fast modes of the circuit's dynamical behavior
  • The standard Krylov basis tends to capture these large-magnitude eigenvalues
• Eigenvalues of $\mathbf{A}$ with small magnitude of real components:
  • These eigenvalues define the major dynamical behavior of the circuit
  • More basis vectors are needed to characterize these eigenvalues
Inverted Krylov subspace (I-MATEX)
(a) Standard Krylov basis (MEXP): $K_m(\mathbf{A}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \dots, \mathbf{A}^{m-1}\mathbf{v}\}$
(b) Inverted Krylov basis (I-MATEX): $K_m(\mathbf{A}^{-1}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}^{-1}\mathbf{v}, \mathbf{A}^{-2}\mathbf{v}, \dots, \mathbf{A}^{-m+1}\mathbf{v}\}$
[Figure: panels (a) and (b), eigenvalues of $\mathbf{A}$ in the complex plane (axes Re, Im), marked by small and large magnitude of real components]
• The inverted Krylov subspace is more likely to capture the "important" eigenvalues, i.e., those with small magnitude of real components
Rational Krylov subspace (R-MATEX)
(a) Standard Krylov basis (MEXP): $K_m(\mathbf{A}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \dots, \mathbf{A}^{m-1}\mathbf{v}\}$
(c) Rational Krylov basis (R-MATEX): $K_m((\mathbf{I} - \gamma\mathbf{A})^{-1}, \mathbf{v}) = \mathrm{span}\{\mathbf{v}, (\mathbf{I} - \gamma\mathbf{A})^{-1}\mathbf{v}, (\mathbf{I} - \gamma\mathbf{A})^{-2}\mathbf{v}, \dots, (\mathbf{I} - \gamma\mathbf{A})^{-m+1}\mathbf{v}\}$
[Figure: panels (a) and (c), eigenvalues of $\mathbf{A}$ in the complex plane (axes Re, Im), marked by small and large magnitude of real components]
• The rational Krylov subspace is still likely to capture these "important" eigenvalues
• More robust numerical properties
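Error trend of R-MATEX
• Directly compute the MEVP $e^{h\mathbf{A}}\mathbf{v}$ via R-MATEX
• $Error = \left| e^{h\mathbf{A}}\mathbf{v} - \|\mathbf{v}\|\,\mathbf{V}_m e^{h\mathbf{H}_m}\mathbf{e}_1 \right|$ vs. $m$ vs. $h$
[Figure: Error as a function of $m$ and $h$]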
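Same Algorithm with Different Input Matrices
• Still only one $[\mathbf{L}, \mathbf{U}]$ = lu_decompose($\mathbf{X}_1$)
Method    $\mathbf{X}_1$                       $\mathbf{X}_2$   $\mathbf{H}_m$
MEXP      $\mathbf{C}$                         $\mathbf{G}$     $\mathbf{H}_m$
I-MATEX   $\mathbf{G}$                         $\mathbf{C}$     ${\mathbf{H}'_m}^{-1}$
R-MATEX   $\mathbf{C} + \gamma\mathbf{G}$      $\mathbf{C}$     $(\mathbf{I} - \tilde{\mathbf{H}}_m^{-1})/\gamma$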
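The R-MATEX row can be read as an Arnoldi process on $(\mathbf{I} - \gamma\mathbf{A})^{-1} = (\mathbf{C} + \gamma\mathbf{G})^{-1}\mathbf{C}$, followed by mapping the small matrix back with $(\mathbf{I} - \tilde{\mathbf{H}}_m^{-1})/\gamma$. The Python sketch below works under that reading; the function name `rational_krylov_mevp`, the dense LU, and the breakdown tolerance are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import expm, lu_factor, lu_solve


def rational_krylov_mevp(C, G, v, h, gamma, m):
    """Approximate exp(h A) v, A = -C^{-1} G, with a shift-and-invert basis."""
    lu = lu_factor(C + gamma * G)         # X1 = C + gamma*G, factorized once
    n = v.shape[0]
    V = np.zeros((n, m + 1))
    Ht = np.zeros((m + 1, m))             # Hessenberg matrix of (I - gamma*A)^{-1}
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    k = m
    for j in range(m):
        w = lu_solve(lu, C @ V[:, j])     # (C + gamma*G)^{-1} C v_j, with X2 = C
        for i in range(j + 1):            # modified Gram-Schmidt
            Ht[i, j] = V[:, i] @ w
            w = w - Ht[i, j] * V[:, i]
        Ht[j + 1, j] = np.linalg.norm(w)
        if Ht[j + 1, j] < 1e-12:          # subspace became invariant
            k = j + 1
            break
        V[:, j + 1] = w / Ht[j + 1, j]
    Ht = Ht[:k, :k]
    Hm = (np.eye(k) - np.linalg.inv(Ht)) / gamma   # map back to approximate A
    e1 = np.zeros(k)
    e1[0] = 1.0
    return beta * (V[:, :k] @ (expm(h * Hm) @ e1))
```

Only `C + gamma * G` is factorized, and changing `h` afterwards touches only the small `Hm`, which is the property the slide highlights.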
Testcases: RC Circuits with Different Stiffness
• ma: average dimension of Krylov subspace (Vm, Hm)
• mp: peak dimension of Krylov subspace (Vm, Hm)
• Err(%): relative error compared to the reference solution
• Stiffness: $\frac{|Re\{\lambda_{min}(\mathbf{A})\}|}{|Re\{\lambda_{max}(\mathbf{A})\}|}$
• Speedups are brought by Krylov subspace reduction
Stiffness    Method    ma      mp    Err(%)   Speedup over MEXP
2.1×10^16    MEXP      211.4   229   0.510    1X
             I-MATEX   5.7     14    0.004    2616X
             R-MATEX   6.9     12    0.004    2735X
2.1×10^12    MEXP      154.2   224   0.004    1X
             I-MATEX   5.7     14    0.004    583X
             R-MATEX   6.9     12    0.004    611X
2.1×10^8     MEXP      148.6   223   0.004    1X
             I-MATEX   5.7     14    0.004    229X
             R-MATEX   6.9     12    0.004    252X
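For a small dense system, the stiffness ratio can be computed directly from the eigenvalues of $\mathbf{A} = -\mathbf{C}^{-1}\mathbf{G}$. This sketch assumes the ratio is the largest over the smallest magnitude of the real parts (for a stable circuit the most negative eigenvalue is $\lambda_{min}$). The function name `stiffness` and the two-node example are hypothetical.

```python
import numpy as np


def stiffness(C, G):
    """|Re(lambda_min(A))| / |Re(lambda_max(A))| for A = -C^{-1} G."""
    eigenvalues = np.linalg.eigvals(-np.linalg.solve(C, G))
    magnitudes = np.abs(eigenvalues.real)
    return magnitudes.max() / magnitudes.min()


# Hypothetical two-node example with time constants eight decades apart.
ratio = stiffness(np.eye(2), np.diag([1.0, 1e8]))
```

A full eigendecomposition is only practical for small test matrices like this one, not for PDNs with millions of unknowns.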
Problem #2: Initial Vector Change
• MEVP $= e^{\mathbf{A}}\mathbf{v}$, where $\mathbf{v}$ is the initial vector of $K_m((\mathbf{I} - \gamma\mathbf{A})^{-1}, \mathbf{v})$
• Once $\mathbf{v}$ changes, we need to recompute $K_m$ for the MEVP.
• In the circuit solver,
  $\mathbf{x}(t+h) = e^{h\mathbf{A}}\left(\mathbf{x}(t) + \mathbf{F}(t,h)\right) - \mathbf{P}(t,h)$
  where $\mathbf{F}(t,h) = \mathbf{A}^{-1}\mathbf{b}(t) + \mathbf{A}^{-2}\,\frac{\mathbf{b}(t+h) - \mathbf{b}(t)}{h}$
  • The initial vector is $\mathbf{x}(t) + \mathbf{F}(t,h)$
  • It changes when the input sources cannot keep the previous trend
[Figure: a pulse input example; the dashed lines mark the "transition spots", where the initial vector changes]
• Many input current sources in a PDN make the initial vector change frequently, which triggers Krylov subspace generations and consumes runtime (the trouble maker).
Input sources, the trouble maker
[Figure: a PDN with three input current sources]
• Some definitions
  • Local Transition Spot (LTS): for one input source, its transition spots.
  • Global Transition Spot (GTS): the union of all LTS.
  • Snapshot: for one input source, a spot in GTS but not in its LTS.
• When simulating the circuit with all input sources as a whole, every GTS triggers a Krylov subspace generation.
• How about simulating the circuit with each individual source, then summing the results by superposition?
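The three definitions map directly onto set operations. The Python sketch below uses made-up transition times (in picoseconds) for three hypothetical sources named `i1`, `i2`, `i3`; none of these values come from the slides.

```python
# Hypothetical LTS of three PWL current sources, as transition times in ps.
lts = {
    "i1": {0, 100, 200, 500},
    "i2": {0, 300, 400},
    "i3": {0, 100, 600},
}

# GTS: the union of all LTS.
gts = sorted(set().union(*lts.values()))

# Snapshots: for each source, the spots in GTS that are not in its own LTS.
snapshots = {
    name: [t for t in gts if t not in spots]
    for name, spots in lts.items()
}

for name in lts:
    print(name, "LTS:", sorted(lts[name]), "snapshots:", snapshots[name])
```

Simulating all sources together would regenerate a subspace at each of the 7 GTS points, whereas each single-source subtask has at most 4 LTS points in this example.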
Reduce the Krylov subspace generation chances and reuse subspace
• For one input source, LTS is much smaller than GTS.
• Meanwhile, the snapshots must be tracked for later superposition.
• Compute the snapshots without extra Krylov subspace generations:
  • Given a previous solution $\mathbf{x}(t)$
  • Goal: compute the solutions at the snapshots, $\mathbf{x}(t+h_1)$ and $\mathbf{x}(t+h_2)$
  • Generate $\mathbf{V}_m$ and $\mathbf{H}_m$ at $t$
  • Use $\mathbf{V}_m$, $\mathbf{H}_m$ and scale $h$ to $h_1$ and $h_2$ for the MEVP, until reaching the next LTS:
    $\mathbf{x}(t+h_1) = \|\mathbf{v}\|\,\mathbf{V}_m e^{h_1\mathbf{H}_m}\mathbf{e}_1 - \mathbf{P}(t, h_1)$
    $\mathbf{x}(t+h_2) = \|\mathbf{v}\|\,\mathbf{V}_m e^{h_2\mathbf{H}_m}\mathbf{e}_1 - \mathbf{P}(t, h_2)$
• No matrix factorizations during this adaptive stepping!
[Figure: one input source waveform with $\mathbf{x}(t)$, the snapshots at $t+h_1$ and $t+h_2$, and the subspace $\mathbf{V}_m$, $\mathbf{H}_m$ generated at $t$]
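A short derivation of why one subspace serves every snapshot inside a PWL segment. The slope symbol $\mathbf{s}$ is introduced here only for illustration; the remaining symbols follow the PWL update formula of the circuit solver.

```latex
\begin{align*}
\mathbf{b}(t+h) &= \mathbf{b}(t) + h\,\mathbf{s},
  \qquad \mathbf{s} = \text{constant slope within the segment},\\
\mathbf{F}(t,h) &= \mathbf{A}^{-1}\mathbf{b}(t) + \mathbf{A}^{-2}\mathbf{s}
  \qquad (\text{independent of } h),\\
\mathbf{P}(t,h) &= \mathbf{A}^{-1}\bigl(\mathbf{b}(t) + h\,\mathbf{s}\bigr) + \mathbf{A}^{-2}\mathbf{s},\\
\mathbf{x}(t+h) &\approx \|\mathbf{v}\|\,\mathbf{V}_m e^{h\mathbf{H}_m}\mathbf{e}_1 - \mathbf{P}(t,h),
  \qquad \mathbf{v} = \mathbf{x}(t) + \mathbf{F}(t,h).
\end{align*}
```

Since $\mathbf{v}$ does not depend on $h$ within the segment, $\mathbf{V}_m$ and $\mathbf{H}_m$ generated at $t$ stay valid for every $h$ up to the next LTS; only $e^{h\mathbf{H}_m}$ and $\mathbf{P}(t,h)$ are re-evaluated.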
MATEX's Distributed Framework
• More aggressive! Each computing node is responsible for one set of bumps.
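A small Python sketch of the superposition idea on a hypothetical two-state system. Each source is simulated on its own (a plain loop emulates the computing nodes), stepping only at its LTS and evaluating snapshots by rescaling $h$; the partial results are then summed at the GTS. The matrix `A`, the source data, and the helper `simulate` are assumptions for illustration, and a dense `expm` stands in for the Krylov MEVP.

```python
import numpy as np
from scipy.linalg import expm, solve


def simulate(A, b, spots, outputs):
    """Zero-state response of x' = A x + b(t), with b linear between `spots`.

    One vector v = x + F is formed per segment (where MATEX would generate
    one Krylov subspace); every output time in the segment only rescales h.
    """
    x = np.zeros(A.shape[0])
    result = {spots[0]: x.copy()}
    for t0, t1 in zip(spots[:-1], spots[1:]):
        slope = (b(t1) - b(t0)) / (t1 - t0)
        a2s = solve(A, solve(A, slope))              # A^{-2} slope
        v = x + solve(A, b(t0)) + a2s                # x(t) + F, fixed in segment
        for g in outputs:
            if t0 < g <= t1:
                p = solve(A, b(g)) + a2s             # P(t0, g - t0)
                result[g] = expm((g - t0) * A) @ v - p
        x = result[t1]
    return result


# Hypothetical system and two PWL sources with different transition spots.
A = np.array([[-2.0, 1.0], [1.0, -3.0]])
c = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]     # columns of C^{-1} B
src_times = [[0.0, 1.0, 2.0, 4.0], [0.0, 3.0, 4.0]]  # LTS of each source
src_vals = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
gts = sorted(set(src_times[0]) | set(src_times[1]))

# One subtask per source; each would run on its own node without communication.
parts = [
    simulate(A, lambda t, k=k: c[k] * np.interp(t, src_times[k], src_vals[k]),
             src_times[k], gts)
    for k in range(2)
]
combined = {g: parts[0][g] + parts[1][g] for g in gts}

# Reference: all sources simulated together, stepping at every GTS.
full = simulate(
    A,
    lambda t: sum(c[k] * np.interp(t, src_times[k], src_vals[k]) for k in range(2)),
    gts, gts)
```

In this example the first subtask forms 3 segment vectors and the second 2, while the combined simulation forms 4, and the summed result agrees with the combined run at every GTS.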
Experimental Results
• Test cases: IBM power grid benchmarks
  • TR: trapezoidal method with fixed time step
  • MATEX: circuit solver uses R-MATEX
• Environment
  • Linux workstations, Intel Core™ i7-4770 3.40GHz processor, 32GB memory
  • Implemented in MATLAB 2013
  • Easy to emulate the distributed environment (no synchronization during the simulation)
Experimental Results
MATEX
Design    # Grp   trmatex(s)   trtotal(s)   Avg Err.   Speedup t1000/trmatex   Speedup ttotal/trtotal
ibmpg1t   100     0.50         0.85         2.5E-5     11.9X                   7.3X
ibmpg2t   100     2.02         3.72         4.3E-5     13.4X                   7.7X
ibmpg3t   100     20.15        45.77        3.7E-5     12.2X                   6.0X
ibmpg4t   15      22.35        65.66        3.9E-5     14.7X                   5.6X
ibmpg5t   100     35.67        54.21        1.1E-5     11.5X                   7.9X
ibmpg6t   100     47.27        74.94        3.4E-5     11.5X                   7.6X
TR with h = 10ps
Design    t1000(s)   ttotal(s)
ibmpg1t   5.94       6.20
ibmpg2t   26.98      28.61
ibmpg3t   245.92     272.47
ibmpg4t   329.36     368.55
ibmpg5t   408.78     428.43
ibmpg6t   542.04     567.38
• Avg Err.: average difference from the solutions provided by the IBM power grid benchmarks, over all output nodes;
• Speedup t1000/trmatex: transient stepping runtime speedup of MATEX over TR;
• Speedup ttotal/trtotal: total simulation runtime speedup of MATEX over TR.
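The speedup columns follow from the runtimes by simple division. The Python sketch below recomputes them from the reported numbers; the dictionary layout and variable names are chosen only for this illustration.

```python
# Runtimes in seconds, copied from the tables: (transient stepping, total).
tr = {
    "ibmpg1t": (5.94, 6.20), "ibmpg2t": (26.98, 28.61),
    "ibmpg3t": (245.92, 272.47), "ibmpg4t": (329.36, 368.55),
    "ibmpg5t": (408.78, 428.43), "ibmpg6t": (542.04, 567.38),
}
matex = {
    "ibmpg1t": (0.50, 0.85), "ibmpg2t": (2.02, 3.72),
    "ibmpg3t": (20.15, 45.77), "ibmpg4t": (22.35, 65.66),
    "ibmpg5t": (35.67, 54.21), "ibmpg6t": (47.27, 74.94),
}

# Speedup of MATEX over TR for transient stepping and for the total runtime.
stepping = {d: tr[d][0] / matex[d][0] for d in tr}
total = {d: tr[d][1] / matex[d][1] for d in tr}
mean_stepping = sum(stepping.values()) / len(stepping)

for d in tr:
    print(f"{d}: stepping {stepping[d]:.1f}X, total {total[d]:.1f}X")
print(f"mean stepping speedup: {mean_stepping:.1f}X")
```

The total-runtime speedups are lower than the stepping speedups because the total also includes work outside transient stepping, which the stepping columns exclude.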
Contributions
• New time-integration kernel applied with improved Krylov subspace-based MEVP approximations for PDNs
• Adaptive time stepping without matrix re-factorization during the transient (stepping) simulation
  • This feature cannot be achieved with a low-order approximation strategy, e.g., trapezoidal (TR), due to the $h$ explicitly embedded in $\frac{\mathbf{C}}{h} + \frac{\mathbf{G}}{2}$
• Distributed computing framework
  • Decompose the simulation task based on LTS, then do superposition using GTS and snapshots to form the final solution.
  • Exploit the advantages of large time stepping, and also reduce and reuse Krylov subspaces.
• Results on IBM PG benchmarks
  • Compared to TR with a fixed time step (10ps), the speedup of transient stepping is 13X on average.
THANK YOU