Model-Order Reduction of High-Speed Interconnects:
Challenges and Opportunities
Michel Nakhla
Carleton University
Canada
Model Reduction for Complex Dynamical Systems
Berlin 2010
Delay
Distortion
Reflection
EMI
Crosstalk
Interconnect Hierarchy
Backplanes and cables
DIE
Package
BOARD
Lumped segmentation
If all you have is a hammer, every problem starts looking like a nail! (Mark Twain)
Lumped segmentation
Is this a good starting point for MOR??
Agenda
• Interconnect Macromodeling
• MOR of Interconnect Macromodels
Lumped segmentation
High-Speed Interconnect ??
Interconnect length d becomes comparable to the wavelength:
$\lambda = v / f \approx d$
$f_{max} = 0.35 / t_r$
Sharper pulses contain higher frequency harmonics
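The two rules of thumb on this slide (f_max = 0.35 / t_r and wavelength = v / f) can be tried on numbers. The sketch below is only illustrative: the rise time, propagation velocity and line length are hypothetical values chosen for the example, not data from the talk.

```python
def max_significant_frequency(rise_time_s):
    """Rule of thumb from the slide: f_max = 0.35 / t_r."""
    return 0.35 / rise_time_s


def wavelength(velocity_m_per_s, frequency_hz):
    """Wavelength = v / f."""
    return velocity_m_per_s / frequency_hz


# Hypothetical example values (assumptions, not taken from the talk).
t_r = 0.1e-9   # rise time: 0.1 ns
v = 1.5e8      # assumed propagation velocity in m/s
d = 0.10       # interconnect length: 10 cm

f_max = max_significant_frequency(t_r)
lam = wavelength(v, f_max)
ratio = d / lam

print(f"f_max = {f_max / 1e9:.2f} GHz")
print(f"wavelength at f_max = {lam * 100:.2f} cm")
print(f"d / wavelength = {ratio:.2f}")
```

With these assumed values the line is longer than the wavelength at f_max, which is the regime the slide calls high-speed.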
From Maxwell to the Telegrapher Equations
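Maxwell's curl equations → (QTEM assumption) → Telegrapher's Equations, with
$v(x,t) = -\int \mathbf{E}_t \cdot d\mathbf{l}, \qquad i(x,t) = \oint \mathbf{H}_t \cdot d\mathbf{l}$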
C. R. Paul, Analysis of Multiconductor Transmission Lines. John
Wiley and Sons, 1994
The Telegrapher's Equations
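$\frac{\partial}{\partial x} v(x,t) = -R\, i(x,t) - L\, \frac{\partial}{\partial t} i(x,t)$
$\frac{\partial}{\partial x} i(x,t) = -G\, v(x,t) - C\, \frac{\partial}{\partial t} v(x,t)$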
Interconnect Delay
[Figure: transmission line (TL) with source Vs and 50 Ω terminations]
S. Grivet-Talocia et al., “Transient analysis of lossy transmission lines: an efficient approach based on the method of characteristics,” IEEE Transactions on Advanced Packaging, Feb. 2004.
Time-of-Flight delay
From Telegrapher’s Equations to SPICE
Mixed Frequency/Time Problem
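H-S Interconnect: Telegrapher’s Equations → Freq-Domain Equations
$\begin{bmatrix} V(d,s) \\ I(d,s) \end{bmatrix} = e^{(A + sB)d} \begin{bmatrix} V(0,s) \\ I(0,s) \end{bmatrix}$
SPICE (Nonlinear Simulator): Time-Domain Equations
$W \frac{\partial x}{\partial t} + Hx + F(x) = b(t)$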
From Telegrapher's Equations to Circuit Simulation
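$\frac{\partial}{\partial x} v(x,t) = -R\, i(x,t) - L\, \frac{\partial}{\partial t} i(x,t)$
$\frac{\partial}{\partial x} i(x,t) = -G\, v(x,t) - C\, \frac{\partial}{\partial t} v(x,t)$
Telegrapher's equations → “Macromodeling” → ODEs → Circuit Simulator (“ODE Solver”)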
Macromodeling
Uniform Segmentation
DELAY ???
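How well uniform lumped segmentation captures pure delay can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's code: it cascades N identical series-L / shunt-C sections for a lossless line, terminates both ends in Z0, and compares the transfer function with the exact matched-line result 0.5·exp(-s·tau). The per-unit-length values, length and frequency are hypothetical.

```python
import cmath
import math


def lumped_chain(l_pul, c_pul, d, n_sections, s):
    """ABCD matrix of n cascaded lumped sections (series L, then shunt C)."""
    dx = d / n_sections
    z = s * l_pul * dx          # series impedance of one section
    y = s * c_pul * dx          # shunt admittance of one section
    a, b, c, e = 1 + z * y, z, y, 1.0
    A, B, C, D = 1.0, 0.0, 0.0, 1.0
    for _ in range(n_sections):
        # Right-multiply the running product by the section matrix.
        A, B, C, D = A * a + B * c, A * b + B * e, C * a + D * c, C * b + D * e
    return A, B, C, D


def matched_transfer(abcd, z0):
    """V_out / V_source with source and load impedances both equal to z0."""
    A, B, C, D = abcd
    return z0 / (A * z0 + B + z0 * (C * z0 + D))


# Hypothetical lossless line: Z0 = 50 ohm, delay = 1 ns.
L_PUL, C_PUL, LENGTH = 250e-9, 100e-12, 0.2
z0 = math.sqrt(L_PUL / C_PUL)
tau = LENGTH * math.sqrt(L_PUL * C_PUL)
s = 2j * math.pi * 3.5e9        # evaluate at 3.5 GHz

exact = 0.5 * cmath.exp(-s * tau)
errors = {}
for n in (10, 100, 1000):
    approx = matched_transfer(lumped_chain(L_PUL, C_PUL, LENGTH, n, s), z0)
    errors[n] = abs(approx - exact)
    print(f"N = {n:5d} sections: |error| = {errors[n]:.3e}")
```

In this setup the 10-section model is far off at 3.5 GHz, and many more sections are needed before the delay is reproduced, which is the cost that delay extraction aims to avoid.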
Delay Modeling
With Delay Extraction: 4 s
Lumped Segmentation: 1272 s
[Figure: transmission line (TL) with source Vs and 50 Ω terminations]
Time-of-Flight delay
Why?
Without Delay Extraction
With Delay Extraction
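[Figure: method-of-characteristics (MoC) equivalent circuit. Port 1: Z0 in series with the source w1(t), carrying i1(t), across v1(t). Port 2: Z0 in series with the source w2(t), carrying i2(t), across v2(t)]
$w_1(t) = 2 v_2(t - \tau) - w_2(t - \tau)$
$w_2(t) = 2 v_1(t - \tau) - w_1(t - \tau)$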
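The MoC model of a lossless line replaces the line by Z0 in series with a delayed source at each port, with w1(t) = 2·v2(t - tau) - w2(t - tau) and w2(t) = 2·v1(t - tau) - w1(t - tau). A minimal time-stepping sketch of that model follows. It assumes a resistive source and load, a fixed time step that divides the delay, and hypothetical element values; it is not the simulator used in the talk.

```python
def simulate_moc(vs, z0, tau, rs, rl, dt, n_steps):
    """Lossless line via the method of characteristics.

    vs: function of time giving the source voltage (assumed interface).
    Port model: v1 = z0*i1 + w1 and v2 = z0*i2 + w2, currents into the line.
    """
    nd = max(1, round(tau / dt))       # delay expressed in time steps
    v1, v2, w1, w2 = [], [], [], []
    for n in range(n_steps):
        if n >= nd:
            w1_n = 2.0 * v2[n - nd] - w2[n - nd]
            w2_n = 2.0 * v1[n - nd] - w1[n - nd]
        else:
            w1_n = w2_n = 0.0          # line initially at rest
        i1 = (vs(n * dt) - w1_n) / (rs + z0)   # source branch: vs - rs*i1 = v1
        i2 = -w2_n / (rl + z0)                 # load branch: v2 = -rl*i2
        v1.append(w1_n + z0 * i1)
        v2.append(w2_n + z0 * i2)
        w1.append(w1_n)
        w2.append(w2_n)
    return v1, v2


# Hypothetical matched case: 50 ohm line, 1 ns delay, 1 V step input.
DT, TAU = 0.01e-9, 1e-9
v_near, v_far = simulate_moc(lambda t: 1.0, 50.0, TAU, 50.0, 50.0, DT, 400)
delay_steps = round(TAU / DT)
print("far end just before the delay:", v_far[delay_steps - 1])
print("far end just after the delay: ", v_far[delay_steps])
```

The delay is reproduced exactly with two sources and no lumped sections. The model is not passive by construction, which is the motivation for the passive delay extraction that follows.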
NOT Passive By Construction
Delay Modeling- MoC
Passive Delay Extraction
Baker-Campbell-Hausdorff Series (BCH)
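$e^{A + sB} = e^{X} e^{sB}$
where
$X = \sum_{k \ge 1} X_k, \qquad X_k = f_k(s, A, B)$
$e^{X} = e^{A + sB} e^{-sB}$
$A = \begin{bmatrix} 0 & -R \\ -G & 0 \end{bmatrix} d, \qquad B = \begin{bmatrix} 0 & -L \\ -C & 0 \end{bmatrix} d$
(the line length d is absorbed into A and B here)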
Passive Delay Extraction
Lie Product
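The product $\prod_{k=1}^{m} \Psi_k$, where
$\Psi_k = e^{A/m}\, e^{sB/m}$
$A = \begin{bmatrix} 0 & -R \\ -G & 0 \end{bmatrix} d, \qquad B = \begin{bmatrix} 0 & -L \\ -C & 0 \end{bmatrix} d$
converges asymptotically to $e^{A + sB}$ as $m \to \infty$, with
$\text{error} \approx O(1/m)$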
Passive Delay Extraction
Modified Lie Product
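The product $\prod_{k=1}^{m} \Psi_k$, where
$\Psi_k = e^{sB/(2m)}\, e^{A/m}\, e^{sB/(2m)}$
$A = \begin{bmatrix} 0 & -R \\ -G & 0 \end{bmatrix} d, \qquad B = \begin{bmatrix} 0 & -L \\ -C & 0 \end{bmatrix} d$
converges asymptotically to $e^{A + sB}$ as $m \to \infty$, with
$\text{error} \approx O(1/m^2)$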
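The O(1/m) and O(1/m^2) error rates of the Lie product and the modified Lie product can be observed numerically. The sketch below is a hedged illustration for a single line with scalar R, L, G, C, so that A and B are 2x2 and their exponentials have a closed form. The normalized parameter values and the helper names are assumptions made for this example.

```python
import cmath


def exp_antidiag(a, b):
    """Closed-form exponential of the 2x2 matrix [[0, a], [b, 0]]."""
    g = cmath.sqrt(a * b)
    ratio = cmath.sinh(g) / g if abs(g) > 1e-12 else 1.0
    return [[cmath.cosh(g), a * ratio], [b * ratio, cmath.cosh(g)]]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matpow(x, m):
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(m):
        result = matmul(result, x)
    return result


def max_abs_diff(x, y):
    return max(abs(x[i][j] - y[i][j]) for i in range(2) for j in range(2))


# Hypothetical normalized line (assumed values): A = [[0,-R],[-G,0]]*d, B = [[0,-L],[-C,0]]*d.
R, L, G, C, D_LEN = 0.5, 1.0, 0.05, 1.0, 1.0
S = 1.5j
A12, A21 = -R * D_LEN, -G * D_LEN
B12, B21 = -L * D_LEN, -C * D_LEN
EXACT = exp_antidiag(A12 + S * B12, A21 + S * B21)


def lie_error(m):
    """Error of (e^{A/m} e^{sB/m})^m against e^{A+sB}."""
    cell = matmul(exp_antidiag(A12 / m, A21 / m),
                  exp_antidiag(S * B12 / m, S * B21 / m))
    return max_abs_diff(matpow(cell, m), EXACT)


def modified_lie_error(m):
    """Error of (e^{sB/2m} e^{A/m} e^{sB/2m})^m against e^{A+sB}."""
    half = exp_antidiag(S * B12 / (2 * m), S * B21 / (2 * m))
    cell = matmul(matmul(half, exp_antidiag(A12 / m, A21 / m)), half)
    return max_abs_diff(matpow(cell, m), EXACT)


for m in (50, 100):
    print(f"m = {m:3d}: Lie error = {lie_error(m):.3e}, "
          f"modified Lie error = {modified_lie_error(m):.3e}")
```

Doubling m should roughly halve the Lie error and quarter the modified Lie error, consistent with the rates stated on the slides.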
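[Figure: one cell = delay sources, RLC section, delay sources]
Passive Delay Extraction
$\Psi_k = \underbrace{e^{sB/(2m)}}_{\text{lossless}}\; \underbrace{e^{A/m}}_{\text{lossy}}\; \underbrace{e^{sB/(2m)}}_{\text{lossless}}$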
DEPACT Macromodel
[Figure: cascade of DEPACT cells 1, ..., k, ..., m; the kth DEPACT cell is a lossless, lossy, lossless sequence]
$e^{A + sB} \approx \prod_{k=1}^{m} e^{sB/(2m)}\, e^{A/m}\, e^{sB/(2m)}$
N. Nakhla, A. Dounavis, R. Achar, and M. Nakhla, “DEPACT: Delay extraction based passive compact transmission line macromodelling algorithm,” IEEE Transactions on Advanced Packaging, Feb. 2005.
Passivity of this realization is guaranteed
by CONSTRUCTION.
Realization of the Lossy Sections
Are uniform sections a good choice?
Example:
$H_{2m}(s) = G_2^{m}(s)$
$H_{2m}(s) = G_i(s)\, G_j(s)\, G_k(s)\, G_l(s) \cdots, \qquad i + j + k + l + \cdots = 2m$
Realization of the Lossy Sections- MRA
• Based on Padé approximation of the exponential matrix
• “Closed-form” approximation
• Passivity is guaranteed by construction
• Realized as a cascade of RLC sections
A. Dounavis, R. Achar, and M. Nakhla, “A general class of passive macromodels for lossy multiconductor transmission lines,” IEEE Transactions on Microwave Theory and Techniques, Oct. 2001.
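Example: Lossy Coupled TL
[Figure: coupled-line circuit with source V, 50 Ω resistors, 0.25 pF capacitors, two 1 ns, 50 Ω lines, one open termination, near-end nodes B1 and B2, far-end nodes C1 and C2]
Line lengths: 5 cm, 20 cm, 40 cm
Input: step, rise time = 0.035 ns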
Example 3: Far End Active Line (Node C1) 40cm
Example 3: Far End Victim Line (Node C2) 40cm
Example 3: Near End Active Line (Node B1) 40cm
Example 3: Near End Victim Line (Node B2) 40cm
Performance Comparison
Simulations | MRA (MNA size) | Lumped (MNA size) | MNA savings
Example 1 | 8281 | 48000 | 83%
Example 2 | 355 | 2482 | 86%
Example 3 (5 cm) | 914 | 6002 | 85%
Example 3 (20 cm) | 3650 | 24002 | 85%
Example 3 (40 cm) | 7298 | 80002 | 91%
Example 4: Several Coupled TL
[Figure: network of coupled lines with two segments of length 0.4 cm and three segments of length 2.5 cm; observed nodes V1 and V4]
Input: trapezoidal pulse, pulse width = 0.8 ns, rise time = 0.1 ns, fall time = 0.1 ns, period = 2 ns
Example 4: CPU Comparison
Algorithm | Total number of lumped sections | CPU time (SPARC Ultra 5-10) (seconds)
Conventional Lumped | 1606 | 470
MRA | 221 | 76
[Figure: circuit with source V, 30 Ω resistors, 1.5 pF and 0.1 pF capacitors, 5 V supplies, line length = 10 cm, nodes V1 to V4]
Input: trapezoidal pulse with rise/fall time 0.1 ns, pulse width 5 ns, and period 10 ns
R, L, C and G are functions of frequency
Example : Nonlinear Terminations
Example 5
R and L of interconnects are functions of frequency
Algorithm | Total number of lumped sections | CPU time (SPARC Ultra 5-10) (seconds)
Conventional Lumped | 150 | 921
MRA | 20 | 49
[Figure: circuit with 25 Ω resistors, 1 pF capacitors, 5 V supplies and a lossless line of length 15 cm; the output node is marked]
Example 6: Nonlinear Terminations
Algorithm | Total number of lumped sections | CPU time (SPARC Ultra 5-10) (seconds)
Conventional Lumped | 300 | 3282
MRA | 31 | 315
[Figure: voltage (volts) waveforms for IBM Line-4*; DEPACT vs. lumped segmentation]
Macromodeling- Example
Ruehli, Cangellaris and Huang, “Three Test Problems for the Comparison of
Lossy Transmission Line Algorithms”, Proc. EPEP-2002
Macromodeling- CPU Comparison
Method | CPU time (sec) | Speed-up
Lumped Segmentation | 1272 | 318
MRA | 463 | 116
DEPACT | 4 | -------
Frequency-dependent Parameters
(IBM Line-4*)
DEPACT Macromodel - Example 2
DEPACT order m = 6, tr = 0.1 ns, 1 V
[Figure: active line far-end voltage of subnetwork #2, volts vs. time (ns); DEPACT vs. lumped segmentation]
[Figure: victim line far-end voltage of subnetwork #2, volts vs. time (ns); DEPACT vs. lumped segmentation]
CPU speed-up
Lumped Segmentation | DEPACT | Speed-Up
23.4 sec | 0.75 sec | 31
Macromodeling
RLC coupled sections + distributed delay sources
Agenda
• Interconnect Macromodeling
• MOR of Interconnect Macromodels
MOR for RLC+Delay
ODE → MOR → ODE
ODDE → ODE → MOR, with the delay approximated as $e^{-s\tau} \approx \sum_k a_k s^k$
• Expanded system
• Passivity
• Expanded system
• Passivity
MOR for RLC+Delay
ODE → MOR → ODE
ODDE → ODE → MOR
ODDE → MOR ?? → ODDE
W. Tseng, C. Chen, E. Gad, M. Nakhla, and R. Achar, “Passive order reduction for
RLC circuits with delay elements”, IEEE Trans. Adv. Pkg.,Nov. 2007.
Q1: ODDE → ODE
Q2: ODDE → ODDE
MOR for RLC+Delay
• Passive by Construction
• Preserve TL causality
MOR for RLC+Delay
1024 Resistors
477 Capacitors
596 Inductors
120 lossless TLs
Order of the original network: 2390
Order of the reduced model: 60
CPU speed-up: 17
Frequency Response
[Figure: N coupled lines reduced by MOR; No. of ports = 2 x N]
Multi-port MOR
N = number of coupled lines
P = number of ports = 2 x N
No. of block moments k = 20 (reduced order q = P x k):
N = 4: q = 160
N = 64: q = 2560
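The quoted orders are consistent with q = P x k, where P = 2N is the number of ports and k the number of block moments. The short sketch below reproduces the two figures; it assumes that each block moment contributes one column per port, as in block Krylov methods, and the function name is hypothetical.

```python
def reduced_order(n_lines, block_moments):
    """Reduced-model order q = P * k, with P = 2 * N ports (assumed relation)."""
    ports = 2 * n_lines
    return ports * block_moments


for n_lines in (4, 64):
    q = reduced_order(n_lines, 20)
    print(f"N = {n_lines:2d}: q = {q}, dense reduced matrix has {q * q} entries")
```

The order grows linearly with the number of ports and the dense reduced matrices grow quadratically, which is the multi-port difficulty the following slides address.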
Interconnects
General RLC Circuits??
Interconnect PUL Parameters
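$\frac{\partial}{\partial x} v(x,t) = -R\, i(x,t) - L\, \frac{\partial}{\partial t} i(x,t)$
$\frac{\partial}{\partial x} i(x,t) = -G\, v(x,t) - C\, \frac{\partial}{\partial t} v(x,t)$
G and C are diagonally dominant matrices
L and R: the diagonal element is the largest (in absolute value)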
Agenda
• Interconnect Macromodeling
• MOR of Interconnect Macromodels
MOR for RLC+Delay
Partitioning
1. Physical
2. Electrical
Previous Methods
[Figure: Driver Subcircuit, Interconnect Subcircuit, Receiver Subcircuit]
References:
1) F. Y. Chang, “The generalized method of characteristics for waveform relaxation analysis for lossy coupled transmission lines,” IEEE Trans. MTT, vol. 37, pp. 2028-2038, Dec. 1989
2) R. Wang and O. Wing, “Transient analysis of dispersive VLSI interconnects terminated in nonlinear load,” IEEE Trans. CAD, vol. 11, no. 10, pp. 1258-1277, Oct. 1992
Transverse Partitioning (Conceptual View)
[Figure: coupled lines separated into single lines, with the coupling represented by sources on each line]
WR-TP
• The coupled-line circuit is split into single-line subcircuits
• Exploits the rapid decrease in coupling effects as the distance between the lines increases
• The method can be implemented using parallel processing
N. Nakhla, A. E. Ruehli, M. Nakhla, and R. Achar, “Simulation of coupled interconnects using waveform relaxation and transverse partitioning,” IEEE Transactions on Advanced Packaging, Feb. 2006.
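WR-TP: Mathematical View
Telegrapher’s equations can be written, for the jth line, as:
$\frac{\partial v_j}{\partial x} + R_{jj}\, i_j + L_{jj}\, \frac{\partial i_j}{\partial t} = e_j(x,t)$
$\frac{\partial i_j}{\partial x} + \hat{G}_{jj}\, v_j + \hat{C}_{jj}\, \frac{\partial v_j}{\partial t} = q_j(x,t)$
Applying relaxation techniques (iteration index k):
$\frac{\partial v_j^{(k+1)}}{\partial x} + R_{jj}\, i_j^{(k+1)} + L_{jj}\, \frac{\partial i_j^{(k+1)}}{\partial t} = e_j^{(k)}(x,t)$
$\frac{\partial i_j^{(k+1)}}{\partial x} + \hat{G}_{jj}\, v_j^{(k+1)} + \hat{C}_{jj}\, \frac{\partial v_j^{(k+1)}}{\partial t} = q_j^{(k)}(x,t)$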
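The relaxation idea (solve each line on its own over the whole time window, using the other lines' waveforms from the previous iteration as sources) can be illustrated on a toy problem. The sketch below is not WR-TP itself: it replaces each line by a scalar first-order ODE with weak coupling c, uses backward Euler, and applies Gauss-Jacobi waveform relaxation. All values and function names are assumptions for illustration.

```python
def solve_single(drive, h):
    """Backward Euler for x' = -x + drive(t), x(0) = 0, on a fixed time grid."""
    x = [0.0]
    for n in range(1, len(drive)):
        x.append((x[-1] + h * drive[n]) / (1.0 + h))
    return x


def solve_coupled(c, u, h, n_steps):
    """Reference: backward Euler on the coupled pair, solved simultaneously."""
    x1, x2 = [0.0], [0.0]
    det = (1.0 + h) ** 2 - (h * c) ** 2
    for _ in range(n_steps):
        r1, r2 = x1[-1] + h * u, x2[-1]
        x1.append(((1.0 + h) * r1 + h * c * r2) / det)
        x2.append((h * c * r1 + (1.0 + h) * r2) / det)
    return x1, x2


def waveform_relaxation(c, u, h, n_steps, n_iters):
    """Gauss-Jacobi WR: each subsystem sees the other's previous-iteration waveform."""
    x1 = [0.0] * (n_steps + 1)      # initial guess: zero waveforms
    x2 = [0.0] * (n_steps + 1)
    history = []
    for _ in range(n_iters):
        drive1 = [c * x2[n] + u for n in range(n_steps + 1)]
        drive2 = [c * x1[n] for n in range(n_steps + 1)]
        x1, x2 = solve_single(drive1, h), solve_single(drive2, h)
        history.append((x1, x2))
    return history


# Hypothetical setup: active variable x1 driven by a unit step, victim x2 coupled by c.
C_COUPLING, STEP, H, N_STEPS = 0.1, 1.0, 0.01, 500
ref1, ref2 = solve_coupled(C_COUPLING, STEP, H, N_STEPS)
errs = []
for k, (w1, w2) in enumerate(waveform_relaxation(C_COUPLING, STEP, H, N_STEPS, 3), 1):
    err = max(max(abs(a - b) for a, b in zip(w1, ref1)),
              max(abs(a - b) for a, b in zip(w2, ref2)))
    errs.append(err)
    print(f"iteration {k}: max waveform error = {err:.3e}")
```

With weak coupling the error drops quickly over a few iterations, and the two subsystem solves inside each iteration are independent, which is what makes a parallel implementation possible.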
[Figure: Single-ended Representation. N coupled lines (Line 1, ..., Line j, ..., Line N) become N single-line subcircuits, with one source per line]
[Figure: Distributed Representation. N coupled lines (Line 1, ..., Line j, ..., Line N) become N single-line subcircuits, with several sources along each line]
Nine Coupled Line circuit
1) A. Ruehli, A.C. Cangellaris and H-M Huang, “Three test problems for the
comparison of lossy transmission line algorithms,” Proceedings EPEP,
pp. 347-350, Oct. 2002.
R = 50 Ω, C = 1 pF, N = 9, d = 1 cm
WR-TP: Numerical Examples
[Figure: voltage at far end of active line and voltage at far end of victim line vs. time (sec); HSPICE vs. WR-TP, showing the initial guess and the result after 3 iterations]
WR-TP: Example 1
I. Elfadel, “Convergence of Transverse Waveform Relaxation for the Electrical Analysis of Very Wide Transmission Line Buses”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, August 2009
Example 2
[Figure: CPU time (seconds) vs. N (number of lines) for WR-TP and HSPICE; the HSPICE curve is annotated N^4]
[Figure: CPU time (seconds) vs. N (number of lines) for WR-TP, showing linear growth, against 2.7 hours in HSPICE]
Computer: INTEL P4 2GHz CPU
Example 3
# lines | W Element (HSPICE) | WR-TP
12 | 455 sec | 13.66 sec
15 | 1319.61 sec | 17.10 sec
24 | 2986.22 sec | 27.30 sec
[Figure: N = 24 coupled lines, highly resistive and low inductive, with source V, R = 50 Ω, C = 1 pF]
Computer: INTEL P4 2GHz CPU
Example 3: 24 Coupled lines
[Figure: voltage at near end of victim line vs. time (sec); IFFT vs. WR-TP (3 iterations)]
[Figure: voltage at near end of victim line vs. time (sec), 24 coupled lines; IFFT vs. WR-TP (3 iterations) vs. W element (HSPICE)]
Direct MOR vs. WR-TP+MOR
Direct MOR: tightly coupled 2N-port network → one reduced model
WR-TP+MOR: N 2-port reduced subcircuits (#1, #2, ..., #N)
N. Nakhla, M. Nakhla, and R. Achar, “Model order reduction of large multiport interconnect structures using waveform relaxation techniques,” IEEE International Conference on Computer Aided Design, 2007
Model Reduction of subcircuits
[Figure: the jth line with its sources, between Port 1 and Port 2]
2N-port circuit → N 2-port subcircuits
Sparsity Patterns
Sparsity pattern of reduced MNA Eqs. using Direct MOR (PRIMA): large and dense! Dimension: 2kN x 2kN
Sparsity pattern of reduced MNA Eqs. using WR-TP: sparse, block diagonal. Dimension of each block: 2k x 2k
k = no. of preserved block moments
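The difference between the two sparsity patterns can be quantified by counting stored entries: a dense 2kN x 2kN matrix against N diagonal blocks of size 2k x 2k. The sketch below is a simple count under that assumption; the function names are hypothetical, and N = 36, k = 20 is chosen to match a dense reduced size of 1440.

```python
def dense_entries(n_lines, k):
    """Entries of a dense reduced matrix of dimension 2kN x 2kN."""
    size = 2 * k * n_lines
    return size * size


def block_diagonal_entries(n_lines, k):
    """Entries of N dense diagonal blocks, each of dimension 2k x 2k."""
    return n_lines * (2 * k) ** 2


N_LINES, K = 36, 20
dense = dense_entries(N_LINES, K)
blocks = block_diagonal_entries(N_LINES, K)
print(f"dense: {dense} entries, block diagonal: {blocks} entries, "
      f"ratio: {dense // blocks}")
```

The storage ratio equals N, so the advantage of the block-diagonal form grows with the number of coupled lines.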
Direct MOR vs. WR-TP+MOR
Direct MOR
1. Approximation of Z(s), Y(s) by positive real matrices
2. To ensure passivity of the reduced model: requires passive synthesis of multi-port Z(s), Y(s) [1]
WR-TP+MOR
1. Approximation of Z(s), Y(s) by a positive real scalar rational function
2. To ensure passivity of the reduced model: requires passive synthesis of a single-port immittance
WR-TP + MOR: FD parameters
$Z(s) = R(s) + sL(s), \qquad Y(s) = G(s) + sC(s)$
Computational Results
Example: 8 coupled lines
[Figure: victim line near end (line 4), voltage (volts) vs. time (sec); original network vs. WR-TP+MOR (1 iteration)]
[Figure: victim line near end (line 4), voltage (volts) vs. time (sec); original network vs. WR-TP+MOR (2 iterations)]
[Figure: victim line near end (line 4), voltage (volts) vs. time (sec); original network vs. WR-TP+MOR (3 iterations)]
Computational Results
Example: 8 coupled lines
d = 3 cm, tr = 0.5 ns
Model | Size
Original network | 1810
Direct MOR | 320
WR-TP+MOR | 40
Computational Results
Example: 36 coupled lines (72 ports)
d= 15 cm, tr=0.2ns
Model | Size | CPU Time (sec) | Speed-up
Reduced | 1440 | 5560 | -------
WR-TP+MOR | 40 | 19.29 | 288 x
Parallel Implementation
[Figure: speedup vs. # CPUs (2 to 8) on an 8-core CPU (Intel Xeon E5310 1.6 GHz)]
D. Paul, N. Nakhla, R. Achar, and M. Nakhla, “Parallel simulation of massively coupled interconnect networks,” IEEE Transactions on Advanced Packaging, Feb. 2010
SUMMARY
Maxwell’s equations to Telegrapher’s equations
Properties of Interconnects
Interconnect Macromodels
• Uniform Lumped Segmentation
• Non-uniform Lumped Segmentation
MOR
• RLC + Delay
• Partitioning: Physical, Electrical