Solving Large-scale Eigenvalue Problems in SciDAC Applications. Chao Yang, Lawrence Berkeley National Laboratory, June 27, 2005. People Involved. LBNL: W. Gao, P. Husbands, X. S. Li, E. Ng, C. Yang (TOPS); J. Meza, L. W. Wang, C. Yang (Nano-science). SLAC: L. Lee, K. Ko. Stanford: G. Golub.
C O M P U T A T I O N A L R E S E A R C H D I V I S I O N
Solving Large-scale Eigenvalue Problems in SciDAC Applications
Chao Yang, Lawrence Berkeley National Laboratory
June 27, 2005
People Involved
LBNL: W. Gao, P. Husbands, X. S. Li, E. Ng, C. Yang (TOPS); J. Meza, L. W. Wang, C. Yang (Nano-science)
SLAC: L. Lee, K. Ko
Stanford: G. Golub
UC Davis: Z. Bai
SciDAC Applications
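Accelerator Modeling: Maxwell eigenvalue problem for the electric field $E$ (curl-curl form with boundary conditions), discretized as $Kx = \lambda Mx$
Nano-science: $H\psi_i = E_i\psi_i$, discretized as $H(X)X = X\Lambda$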
Algorithms
Krylov Subspace Method
Alternatives
• Optimization based approach
• Non-linear solver based approach
Multi-level Sub-structuring
Non-linear Eigenvalue Problems
• Structure preserving methods
• Optimization based method
Krylov Subspace Method
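Eigenvalue problem: $Ax = \lambda x$
Krylov subspace: $\mathcal{K}(A, v_0; k) = \mathrm{span}\{v_0, Av_0, \ldots, A^{k-1}v_0\}$
Factorization: $AV_k = V_k H_k + f e_k^T$, $V_k^T V_k = I_k$, $H_k = V_k^T A V_k$
• Widely used, relatively well understood (polynomial approximation theory): $z = p(A)v_0 = \gamma_1 p(\lambda_1)x_1 + \gamma_2 p(\lambda_2)x_2 + \cdots + \gamma_n p(\lambda_n)x_n$, where $v_0 = \sum_i \gamma_i x_i$
• Convergence of KSM: well separated, large eigenvalues converge rapidly; convergence also depends on the starting vector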
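The factorization on this slide can be illustrated with a short NumPy sketch. This is a minimal k-step Arnoldi routine, not ARPACK's implementation; the function name `arnoldi`, the diagonal test matrix, and its spectrum (a cluster in [0, 1] plus two well separated large eigenvalues) are assumptions chosen only to show the convergence behaviour described above.

```python
import numpy as np

def arnoldi(A, v0, k):
    """k-step Arnoldi factorization: A V = V H + f e_k^T with V^T V = I."""
    n = v0.shape[0]
    V = np.zeros((n, k))
    H = np.zeros((k, k))
    f = np.zeros(n)
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(k):
        w = A @ V[:, j]
        # Classical Gram-Schmidt applied twice to keep V orthonormal
        h = V[:, :j + 1].T @ w
        w = w - V[:, :j + 1] @ h
        c = V[:, :j + 1].T @ w
        w = w - V[:, :j + 1] @ c
        H[:j + 1, j] = h + c
        if j + 1 < k:
            beta = np.linalg.norm(w)  # assumed nonzero (no breakdown)
            H[j + 1, j] = beta
            V[:, j + 1] = w / beta
        else:
            f = w                     # residual vector of the factorization
    return V, H, f

rng = np.random.default_rng(0)
n, k = 200, 20
# Hypothetical spectrum: cluster in [0, 1] plus eigenvalues 2 and 3
eigs = np.concatenate([np.linspace(0.0, 1.0, n - 2), [2.0, 3.0]])
A = np.diag(eigs)
V, H, f = arnoldi(A, rng.standard_normal(n), k)

ritz = np.sort(np.linalg.eigvals(H).real)   # Ritz values approximate eigenvalues of A
ek = np.zeros(k)
ek[-1] = 1.0
residual = np.linalg.norm(A @ V - V @ H - np.outer(f, ek))
print("two largest Ritz values:", ritz[-2:])
print("factorization residual:", residual)
```

With only 20 steps the two isolated eigenvalues are resolved, while the clustered part of the spectrum is not, which is the behaviour the polynomial approximation argument predicts.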
Acceleration Techniques
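Implicit Restart
• $\mathcal{K}(A, v_0; k) \rightarrow \mathcal{K}(A, \hat v_0; k)$: filter out unwanted spectral components from $v_0$
• $H_k - \mu I = QR$, $A(V_k Q) = (V_k Q)(Q^T H_k Q) + f e_k^T Q$
• ARPACK
Spectral transformation
• $(A - \sigma I)^{-1} x = \frac{1}{\lambda - \sigma} x$
• $(K - \sigma M)^{-1} M x = \frac{1}{\lambda - \sigma} x$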
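As a rough illustration of combining implicit restart with a spectral transformation, the sketch below calls SciPy's `eigsh`, which wraps ARPACK, in shift-invert mode on a generalized problem $Kx = \lambda Mx$. The matrices (a 1-D Laplacian for K and a scaled identity for M) and the shift `sigma` are assumptions picked so that the exact eigenvalues are known in closed form; they are not the accelerator matrices discussed in the talk.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 500
h = 1.0 / (n + 1)
# Hypothetical test pair: K = tridiag(-1, 2, -1), M = h * I
K = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)],
             [-1, 0, 1], format="csc")
M = (h * sp.identity(n)).tocsc()

# Closed-form eigenvalues of this pair, used only for checking
exact = (2.0 - 2.0 * np.cos(np.arange(1, n + 1) * np.pi * h)) / h

# Shift placed between the 10th and 11th eigenvalues (interior of the spectrum)
sigma = 0.5 * (exact[9] + exact[10])

# Shift-invert mode: ARPACK iterates with (K - sigma M)^{-1} M, so the
# eigenvalues closest to sigma become the largest in magnitude ("LM")
vals, vecs = eigsh(K, k=4, M=M, sigma=sigma, which="LM")

nearest = np.sort(exact[np.argsort(np.abs(exact - sigma))[:4]])
max_residual = max(
    np.linalg.norm(K @ vecs[:, j] - vals[j] * (M @ vecs[:, j])) for j in range(4)
)
print("computed:", np.sort(vals))
print("expected:", nearest)
```

The cost is dominated by the sparse factorization of $K - \sigma M$, which is the step that becomes a memory and scalability concern at the problem sizes quoted in the talk.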
Using KSM in accelerator modeling
the spectrum of the problem
Example: H60VG3 structure, linear element, N=30M, nnz=484M
• 1024 CPUs, 738GB
• Ordering time: 4143s
• Numerical Factorization: 133s
• Total: 5068s for 12 eigenvalues
Software: PARPACK (implicit restart) + SuperLU, WSMP (spectral transformation)
Limitations of the KSM
• High degree polynomial needed for computing small clustered eigenvalues: many matrix vector multiplications
• Spectral transformation can be expensive: memory limitation, scalability
• Not easy to introduce a preconditioner: eigenvectors of $P^{-1}A$ are different from eigenvectors of $A$
Alternative algorithms
Optimization based approach
• Minimizing Rayleigh Quotient: $\min_{x^T x = 1} x^T A x$
• Minimizing Residual (Wood & Zunger 85, Jia 97): $\min_{x \in V,\; x^T x = 1} \|Ax - \theta x\|$
Nonlinear equation solver based approach (Jacobi-Davidson)
• Newton correction: $A(u + z) = (\theta + \delta)(u + z)$, $u^T z = 0$
• Preconditioner: $(I - uu^T)P(I - uu^T)$
• Stopping criteria for the inner iteration (Notay 2002, Stathopoulos 2005)
Allows us to solve problems with more than 90M DOF
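To make the optimization viewpoint concrete, here is a minimal sketch of preconditioned steepest descent on the Rayleigh quotient: each step does a Rayleigh-Ritz projection onto span{x, Pr}, where r is the residual and P a preconditioner. It is not the Jacobi-Davidson solver referred to on the slide; the function name, the nearly diagonal test matrix, and the Jacobi (diagonal) preconditioner are all assumptions for illustration.

```python
import numpy as np

def rayleigh_ritz_descent(A, x0, precond, tol=1e-8, maxiter=500):
    """Minimize x^T A x subject to x^T x = 1 by preconditioned steepest descent."""
    x = x0 / np.linalg.norm(x0)
    for it in range(maxiter):
        theta = x @ (A @ x)            # Rayleigh quotient
        r = A @ x - theta * x          # residual (gradient direction)
        if np.linalg.norm(r) < tol:
            break
        w = precond(r)                 # preconditioned residual
        # Rayleigh-Ritz on span{x, w}: keep the vector with the smallest quotient
        Q, _ = np.linalg.qr(np.column_stack([x, w]))
        _, evecs = np.linalg.eigh(Q.T @ A @ Q)
        x = Q @ evecs[:, 0]
    return theta, x, it

rng = np.random.default_rng(0)
n = 200
# Hypothetical test matrix: diag(1..n) with weak nearest-neighbour coupling
off = 0.1 * np.ones(n - 1)
A = np.diag(np.arange(1.0, n + 1)) + np.diag(off, 1) + np.diag(off, -1)
d = np.diag(A).copy()

# Jacobi preconditioner: divide the residual by the diagonal of A
theta, x, iters = rayleigh_ritz_descent(A, rng.standard_normal(n), lambda r: r / d)
exact = np.linalg.eigvalsh(A)[0]
print("smallest eigenvalue:", theta, "after", iters, "iterations")
```

Unlike the Krylov approach, the preconditioner enters only through the search direction, so the computed vector is still an eigenvector of A itself.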
Multi-level Sub-structuring (for computing many eigenpairs)
• Domain Decomposition concept
• Multi-level extension of the Component Mode Synthesis (CMS) method (Bennighof 92)
• Decomposition can be done algebraically (Lehoucq & Bennighof 2002)
• Success story in structure engineering....
• Error analysis
• Extend to accelerator modeling
Single-level Sub-structuring
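• Matrix Partition: $(K, M)$ with sub-structure blocks $K_{11}, K_{22}$ and $M_{11}, M_{22}$
• Block elimination: $\hat K = L^{-1} K L^{-T}$, $\hat M = L^{-1} M L^{-T}$
• Sub-structure calculation (mode selection): $K_{11}v^{(1)} = \mu^{(1)} M_{11}v^{(1)}$, $K_{22}v^{(2)} = \mu^{(2)} M_{22}v^{(2)}$, $\hat K_{33}v^{(3)} = \mu^{(3)} \hat M_{33}v^{(3)}$
• Subspace assembling: $S_j = (v_1^{(j)}, v_2^{(j)}, \ldots, v_{k_j}^{(j)})$ for $j = 1, 2, 3$, and $S = \mathrm{diag}(S_1, S_2, S_3)$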
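The four steps above can be sketched with dense NumPy arrays on a small model problem. Everything concrete here is an assumption made for illustration: the 1-D finite element stiffness and mass matrices, the split into two sub-structures joined by a single interface node, the choice of 10 modes per sub-structure, and the helper `geigh`. A production code would use sparse block Cholesky factors instead of explicit inverses.

```python
import numpy as np

def geigh(K, M):
    """Solve K v = mu M v (K symmetric, M SPD) by Cholesky reduction."""
    C = np.linalg.cholesky(M)                 # M = C C^T
    Ci = np.linalg.inv(C)
    mu, Y = np.linalg.eigh(Ci @ K @ Ci.T)
    return mu, Ci.T @ Y                       # eigenvalues in ascending order

# Hypothetical model: linear finite elements for -u'' = lambda u on (0, 1)
n, h = 61, 1.0 / 62
off = np.ones(n - 1)
K = (2 * np.eye(n) - np.diag(off, 1) - np.diag(off, -1)) / h
M = (4 * np.eye(n) + np.diag(off, 1) + np.diag(off, -1)) * h / 6

# 1) Matrix partition: two sub-structures, then the interface node
perm = np.concatenate([np.arange(0, 30), np.arange(31, 61), [30]])
K, M = K[np.ix_(perm, perm)], M[np.ix_(perm, perm)]
s1, s2, s3 = slice(0, 30), slice(30, 60), slice(60, 61)

# 2) Block elimination: K = L Khat L^T with Khat block diagonal
L = np.eye(n)
L[s3, s1] = K[s3, s1] @ np.linalg.inv(K[s1, s1])
L[s3, s2] = K[s3, s2] @ np.linalg.inv(K[s2, s2])
Li = np.linalg.inv(L)
Khat, Mhat = Li @ K @ Li.T, Li @ M @ Li.T

# 3) Sub-structure calculation: keep the k1, k2 lowest modes, all interface modes
k1 = k2 = 10
_, V1 = geigh(Khat[s1, s1], Mhat[s1, s1])
_, V2 = geigh(Khat[s2, s2], Mhat[s2, s2])
_, V3 = geigh(Khat[s3, s3], Mhat[s3, s3])

# 4) Subspace assembling: S = diag(S1, S2, S3), then project and solve
S = np.zeros((n, k1 + k2 + 1))
S[s1, :k1] = V1[:, :k1]
S[s2, k1:k1 + k2] = V2[:, :k2]
S[s3, k1 + k2:] = V3
theta, _ = geigh(S.T @ Khat @ S, S.T @ Mhat @ S)

exact, _ = geigh(K, M)
print("sub-structuring:", theta[:3])
print("full problem:   ", exact[:3])
```

Because the final step is a Rayleigh-Ritz projection, the approximations are upper bounds on the exact eigenvalues, and their accuracy is controlled by which sub-structure modes are kept.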
Mode Selection
Implementation & cost
Cost:
• Flops: more than a single sparse Cholesky factorization
• Storage: Block Cholesky factor + Projected matrix + some other stuff
• NO triangular solves (involving the original K and M), NO orthogonalization
Attractive when:
1) the problem is large enough
2) a large number of eigenvalues are needed
AMLS vs. Shift-invert Lanczos (SIL)
DOF=65K, 3 levels of partition
Cavity with External Coupling
Vector wave equation with waveguide boundary conditions can be modeled by a non-linear eigenvalue problem
Open cavity, waveguide BC for $j = 1, 2, 3$:
$n \times (\nabla \times E) + i\sqrt{k^2 - k_{c_j}^2}\; n \times (n \times E) = 0$
Quadratic Eigenvalue Problem
Consider only one mode propagating in the waveguides
Algorithms
• Linearize then solve by KSM (does not preserve the structure of the problem)
• Second Order Arnoldi Iteration (Bai & Su 2005): project the QEP into 2nd order Krylov Subspace
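The first option, linearization, can be sketched for a generic quadratic eigenvalue problem $(\lambda^2 M + \lambda C + K)x = 0$. The matrices below are small random stand-ins, not the cavity matrices, and the companion form used is one common choice among several. The sketch solves the 2n-by-2n linear problem densely and checks the residual of the original quadratic problem; it does not implement SOAR.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
# Hypothetical, well scaled test matrices
M = np.eye(n) + 0.1 * np.diag(rng.random(n))
C = rng.standard_normal((n, n)) / np.sqrt(n)
K = rng.standard_normal((n, n)) / np.sqrt(n)

# Companion linearization with z = [x; lam * x]:
#   [ 0           I        ] [ x     ]        [ x     ]
#   [ -M^{-1} K   -M^{-1} C ] [ lam x ] = lam  [ lam x ]
Minv = np.linalg.inv(M)
Z = np.block([[np.zeros((n, n)), np.eye(n)],
              [-Minv @ K, -Minv @ C]])
lam, W = np.linalg.eig(Z)

def relative_residual(l, x):
    """Scaled residual of the original quadratic problem for the pair (l, x)."""
    x = x / np.linalg.norm(x)
    num = np.linalg.norm((l**2 * M + l * C + K) @ x)
    den = (abs(l)**2 * np.linalg.norm(M, 2)
           + abs(l) * np.linalg.norm(C, 2)
           + np.linalg.norm(K, 2))
    return num / den

# The top half of each eigenvector of Z is an eigenvector of the QEP
worst = max(relative_residual(lam[j], W[:n, j]) for j in range(2 * n))
print("number of eigenvalues:", lam.size, "worst relative residual:", worst)
```

The linearized matrix is twice the size of the original and no longer shows the M, C, K structure, which is the drawback the slide attributes to this approach.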
Second-Order Krylov Space (Bai)
$A = M^{-1}(iW)$, $B = M^{-1}K - k_c^2 I$
SOAR is faster and more accurate (than linearization)
Accelerating cavity model for the International Linear Collider (ILC)
9-cell superconducting cavity coupled to one input coupler and two Higher-Order-Mode couplers
NDOFs = 3.2 million, NCPUs = 768, Memory = 300 GB
18 eigenpairs in 2634 seconds (linearization took more than 1 hour)
Electronic Structure Calculation
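Wave function: $X = (x_1, x_2, \ldots, x_k)$, $x_i \in \mathbb{R}^n$
n – real space grid size, e.g. $32^3 \approx 32000$
k – number of occupied states, 1~10% of n
Charge density: $\rho(X) = \mathrm{diag}(XX^T)$
• Ekinetic = $\frac{1}{2}\mathrm{trace}(X^T L X)$
• Eionic = $\mathrm{trace}(X^T D_{ion} X) + \sum_i |x_i^T w|^2$
• EHartree = $\frac{1}{2}\rho(X)^T S \rho(X)$
• Exc = $e^T f_{xc}(\rho(X))$
Etotal(X) = Ekinetic + Eionic + EHartree + Exc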
Non-linear Eigenvalue Problem
Total energy minimization: $\min_X E_{total}(X)$ s.t. $X^T X = I$
KKT condition: $H(X)X = X\Lambda$, $X^T X = I$, where $H(X) = L + D_{ion} + ww^T + \mathrm{Diag}(S\rho(X)) + \mathrm{Diag}(g_{xc}(\rho(X)))$
The Self Consistent Field Iteration
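Input: initial guess $X^{(0)}$ and $L, D_{ion}, S, w$
Output: $X = \arg\min_X E_{total}(X)$ s.t. $X^T X = I_k$
Major steps
o For i=1,2,…, until converged
1) Form $H^{(i)} = H(X^{(i)})$
2) Compute k smallest eigenpairs of $H^{(i)}$: $X^{(i+1)} = \arg\min_X \mathrm{trace}(X^T H^{(i)} X)$ s.t. $X^T X = I_k$
Converged when $H(X^{(i)})X^{(i)} = X^{(i)}\Lambda^{(i)}$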
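The two major steps of the SCF loop can be sketched on a toy model. The Hamiltonian used here, $H(X) = L + \alpha\,\mathrm{Diag}(\rho(X))$ with a 1-D Laplacian $L$, is a deliberately simplified stand-in for the full expression with ionic, Hartree and exchange-correlation terms; the coupling strength `alpha`, the problem size, and the function names are assumptions. The coupling is kept weak so that the plain fixed-point iteration converges, which is not guaranteed in general.

```python
import numpy as np

def hamiltonian(L, alpha, rho):
    """Toy H(X) = L + alpha * Diag(rho(X)); alpha is a hypothetical coupling strength."""
    return L + alpha * np.diag(rho)

def scf(L, k, alpha, tol=1e-10, maxiter=100):
    # Initial guess X^(0): k lowest eigenvectors of the linear part L
    _, V = np.linalg.eigh(L)
    X = V[:, :k]
    rho = np.sum(X * X, axis=1)               # rho(X) = diag(X X^T)
    for it in range(1, maxiter + 1):
        H = hamiltonian(L, alpha, rho)         # 1) form H^(i) = H(X^(i))
        evals, V = np.linalg.eigh(H)           # 2) k smallest eigenpairs of H^(i)
        X, lam = V[:, :k], evals[:k]
        rho_new = np.sum(X * X, axis=1)
        change = np.linalg.norm(rho_new - rho)
        rho = rho_new
        if change < tol:                       # density is self-consistent
            return X, lam, rho, it
    raise RuntimeError("SCF did not converge")

n, k, alpha = 10, 2, 0.02
off = np.ones(n - 1)
L = 2 * np.eye(n) - np.diag(off, 1) - np.diag(off, -1)

X, lam, rho, iters = scf(L, k, alpha)
# Self-consistency check: H(X) X = X Lambda at the converged X
residual = np.linalg.norm(hamiltonian(L, alpha, rho) @ X - X @ np.diag(lam))
print("converged in", iters, "iterations, residual", residual)
```

Each outer iteration requires a full eigenvalue solve with a fixed Hamiltonian, which is the cost that motivates looking at direct minimization of the total energy instead.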
Direct Constrained Minimization (DCM)
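For i = 1, 2, … until convergence
1. Form $H^{(i)} = H(X^{(i)})$
2. Compute $R^{(i)} = K^{-1}\left(H^{(i)}X^{(i)} - X^{(i)}\Lambda^{(i)}\right)$, where $\Lambda^{(i)} = \mathrm{Diag}(X^{(i)T}H^{(i)}X^{(i)})$ and $K$ is a preconditioner
3. If (i>1) then
• set $Y^{(i)} = (X^{(i)}, R^{(i)}, P^{(i-1)})$
4. else
• set $Y^{(i)} = (X^{(i)}, R^{(i)})$
5. Solve $\min_G E_{total}(Y^{(i)}G)$ s.t. $G^T Y^{(i)T} Y^{(i)} G = I_k$, and set $X^{(i+1)} = Y^{(i)}G$
6. If (i>1) then
• set $P^{(i)} = Y^{(i)}(:, k+1:3k)\,G(k+1:3k, 1:k)$
7. else
• set $P^{(i)} = Y^{(i)}(:, k+1:2k)\,G(k+1:2k, 1:k)$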
DCM vs. SCF
• Atomic system: SiH4
• Discretization: spectral method with plane wave basis: n = $32^3$ in real space, N = 2103 (# of basis functions) in frequency space
• Number of occupied states: k = 4
• PETOT version of SCF uses 10 PCG steps (inner iterations) per outer iteration
• DCM: 3 inner iterations
Convergence measure: $\Delta E(X^{(i)}) = E_{total}(X^{(i)}) - E_{min}$
Concluding Remarks
Krylov Subspace Method (with appropriate acceleration strategies) continues to play an important role in solving SciDAC eigenvalue problems
Steady progress has been made in alternative approaches that can make better use of preconditioners
Multi-level sub-structuring is promising for computing many eigenpairs
Significant progress has been made in solving QEP
Non-linear eigenvalue problems remain challenging