Computational Challenges of Coupled Cluster Theory
Jeff Hammond
Leadership Computing Facility
Argonne National Laboratory
11 January 2012
Jeff Hammond ICERM
Atomistic simulation in chemistry
1 classical molecular dynamics (MD) with empirical potentials
2 quantum molecular dynamics based upon density-functional theory (DFT)
3 quantum chemistry with wavefunctions, e.g. perturbation theory (PT), coupled-cluster (CC) or quantum Monte Carlo (QMC).
Classical molecular dynamics
Image courtesy of Benoît Roux via ALCF.
Solves Newton’s equations of motion with empirical terms and classical electrostatics.
Size: 100K-10M atoms
Time: 1-10 ns/day
Scaling: ∼ N_atoms
Math: N-body
Data from K. Schulten, et al. “Biomolecular modeling in the era of
petascale computing.” In D. Bader, ed., Petascale Computing:
Algorithms and Applications.
Car-Parrinello molecular dynamics
Image courtesy of Giulia Galli via ALCF.
Forces obtained from solving an approximate single-particle Schrödinger equation.
Size: 100-1000 atoms
Time: 0.01-1 ps/day
Scaling: ∼ N_el^x (x = 1-3)
Math: FFT, eigensolve.
F. Gygi, IBM J. Res. Dev. 52, 137 (2008); E. J. Bylaska et al. J.
Phys.: Conf. Ser. 180, 012028 (2009).
Wavefunction theory
MP2 is second-order PT and is accurate via magical cancellation of error.
CC is an infinite-order solution to the many-body Schrödinger equation, truncated via clusters.
QMC is Monte Carlo integration applied to the Schrödinger equation.
Size: 10-100 atoms, maybe 100-1000 atoms with MP2.
Time: N/A (LOL)
Scaling: ∼ N_bf^x (x = 4-7)
Math: DLA (tensors)
Image courtesy of Karol Kowalski and Niri Govind.
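The exponents quoted for wavefunction methods translate directly into how fast the cost grows with basis-set size. A minimal sketch, assuming cost is proportional to N^x with no prefactor (the function name `cost_ratio` is hypothetical and not from the talk):

```python
def cost_ratio(size_factor: float, exponent: float) -> float:
    """Relative cost increase when the problem size grows by size_factor,
    assuming cost is proportional to N**exponent (prefactors ignored)."""
    return size_factor ** exponent


# Effect of doubling the number of basis functions for x = 4..7
ratios = {x: cost_ratio(2.0, x) for x in range(4, 8)}
for x, ratio in ratios.items():
    print(f"N^{x}: doubling N multiplies the cost by {ratio:.0f}")
```

Under this simple model, doubling the basis costs between 16 and 128 times more, which is why system sizes stay in the 10-100 atom range.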
The Standard Model (of Quantum Chemistry)
Quantum chemistry
1 Separate molecule(s) from environment (closed to both matter and energy)
2 Boundary conditions:
ψ(x → ∞) = 0 (finite system)
ψ(x) = φ(x + g) (infinite, periodic system)
3 Ignore relativity, QED, spin-orbit coupling
4 Separate electronic and nuclear degrees of freedom
→ non-relativistic electronic Schrödinger equation in a vacuum at zero temperature.
Quantum chemistry
\[ H = T_{el} + V_{el-nuc} + V_{el-el} \]
\[ H = -\frac{1}{2} \sum_{i=1}^{M} \nabla_i^2 - \sum_{n=1}^{N} \sum_{i=1}^{M} \frac{Z_n}{R_{ni}} + \sum_{i<j}^{M} \frac{1}{r_{ij}} \]
\[ \Psi(x_1, \ldots, x_n, x_{n+1}, \ldots, x_N) = -\Psi(x_1, \ldots, x_{n+1}, x_n, \ldots, x_N) \]
The electron coordinates (x_i) include both space (r) and spin (σ). We will integrate out spin wherever possible.
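The antisymmetry condition can be checked numerically on a Slater determinant. This is only an illustrative sketch: the three one-electron functions are hypothetical, spin is left out, and the determinant is not normalized.

```python
import numpy as np


def slater_determinant(orbitals, coords):
    """Unnormalized Slater determinant: det of M[i, k] = orbitals[k](coords[i])."""
    matrix = np.array([[phi(x) for phi in orbitals] for x in coords])
    return np.linalg.det(matrix)


# Three hypothetical one-electron functions of a single spatial coordinate
orbitals = [
    lambda x: np.exp(-x**2),
    lambda x: x * np.exp(-x**2),
    lambda x: (2 * x**2 - 1) * np.exp(-x**2),
]

coords = [0.3, -0.7, 1.1]
swapped = [-0.7, 0.3, 1.1]        # exchange electrons 1 and 2
coincident = [0.3, 0.3, 1.1]      # two electrons at the same point

psi = slater_determinant(orbitals, coords)
psi_swapped = slater_determinant(orbitals, swapped)
psi_coincident = slater_determinant(orbitals, coincident)
print(psi, psi_swapped, psi_coincident)
```

Exchanging two coordinates swaps two rows of the matrix, so the value changes sign, and two identical coordinates give two identical rows and a vanishing wavefunction.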
Quantum chemistry
Wavefunction antisymmetry is enforced by expanding in determinants, which we now capture in second quantization.
1 project physical operators (e.g. Coulomb) into a one-electron basis (usually atom-centered Gaussians)
2 generate a mean-field reference and expand the many-body wavefunction in terms of excitations out of that reference
→ Full configuration-interaction (FCI) ansatz.
1 truncate the exponentially-growing FCI ansatz (CI = linear generator, CC = exponential generator)
2 solve CC (or CI) iteratively
3 add more correlation via perturbation theory
→ CCSD(T), as one example.
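To make "exponentially-growing" concrete, the number of determinants in an FCI expansion can be counted with binomial coefficients. A minimal sketch, assuming a fixed number of alpha and beta electrons and ignoring spatial symmetry (the function name `fci_dimension` is hypothetical):

```python
from math import comb


def fci_dimension(n_orbitals: int, n_alpha: int, n_beta: int) -> int:
    """Number of determinants with fixed alpha/beta electron counts,
    ignoring any reduction from spatial symmetry."""
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)


# Ten electrons (5 alpha, 5 beta) in a growing orbital basis
dimensions = {n_orb: fci_dimension(n_orb, 5, 5) for n_orb in (10, 20, 40)}
for n_orb, dim in dimensions.items():
    print(f"{n_orb:3d} orbitals: {dim} determinants")
```

The count grows combinatorially with the basis, which is why truncated CI or CC expansions are used instead.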
Quantum chemistry
Correct for missing physics using perturbation theory (a posteriori error correction) or mixed (e.g. QM/MM) formalism:
1 relativistic corrections
2 non-adiabatic corrections
3 solvent corrections
4 open BC corrections (less common)
Coupled-cluster theory
Coupled-cluster theory
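\[ |\Psi_{CC}\rangle = \exp(T)\,|\Psi_{HF}\rangle, \qquad T = T_1 + T_2 + \cdots + T_n \quad (n \ll N) \]
\[ T_1 = \sum_{ia} t_i^a \, a_a^\dagger a_i, \qquad T_2 = \sum_{ijab} t_{ij}^{ab} \, a_a^\dagger a_b^\dagger a_j a_i \]
\[ |\Psi_{CCD}\rangle = \exp(T_2)\,|\Psi_{HF}\rangle = (1 + T_2 + T_2^2)\,|\Psi_{HF}\rangle \]
\[ |\Psi_{CCSD}\rangle = \exp(T_1 + T_2)\,|\Psi_{HF}\rangle = (1 + T_1 + \cdots + T_1^4 + T_2 + T_2^2 + T_1 T_2 + T_1^2 T_2)\,|\Psi_{HF}\rangle \]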
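The expansions on this slide are written without the 1/k! Taylor factors and stop after a few terms. The series can stop because excitation operators only promote electrons from occupied to virtual orbitals, so high enough powers vanish. The sketch below is a toy analogy only, not a coupled-cluster code: a strictly upper-triangular matrix with made-up entries stands in for T, because such a matrix is nilpotent in the same way.

```python
from math import factorial

import numpy as np


def exp_nilpotent(t: np.ndarray) -> np.ndarray:
    """Sum the Taylor series of exp(t) for an n x n nilpotent matrix.

    For a strictly upper-triangular t, t**n is zero, so n terms are exact.
    """
    n = t.shape[0]
    result = np.zeros_like(t)
    power = np.eye(n)
    for k in range(n):
        result += power / factorial(k)
        power = power @ t
    return result


rng = np.random.default_rng(0)
# Strictly upper-triangular stand-in for a cluster operator (hypothetical amplitudes)
t = np.triu(rng.standard_normal((5, 5)), k=1)

highest_power = np.linalg.matrix_power(t, 5)      # vanishes identically
identity_check = exp_nilpotent(t) @ exp_nilpotent(-t)
print(np.abs(highest_power).max(), np.abs(identity_check - np.eye(5)).max())
```

The finite sum reproduces the exact exponential, since exp(t) exp(-t) comes out as the identity.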
Coupled cluster (CCD) implementation
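exp(T_2)|Ψ_HF⟩ turns into:
\begin{align*}
R^{ab}_{ij} &= V^{ab}_{ij} + P(ia, jb) \Big[ T^{ae}_{ij} I^{b}_{e} - T^{ab}_{im} I^{m}_{j} + \tfrac{1}{2} V^{ab}_{ef} T^{ef}_{ij} + \tfrac{1}{2} T^{ab}_{mn} I^{mn}_{ij} \\
&\qquad - T^{ae}_{mj} I^{mb}_{ie} - I^{ma}_{ie} T^{eb}_{mj} + (2 T^{ea}_{mi} - T^{ea}_{im}) I^{mb}_{ej} \Big] \\
I^{a}_{b} &= (-2 V^{mn}_{eb} + V^{mn}_{be}) T^{ea}_{mn} \\
I^{i}_{j} &= (2 V^{mi}_{ef} - V^{im}_{ef}) T^{ef}_{mj} \\
I^{ij}_{kl} &= V^{ij}_{kl} + V^{ij}_{ef} T^{ef}_{kl} \\
I^{ia}_{jb} &= V^{ia}_{jb} - \tfrac{1}{2} V^{im}_{eb} T^{ea}_{jm} \\
I^{ia}_{bj} &= V^{ia}_{bj} + V^{im}_{be} (T^{ea}_{mj} - \tfrac{1}{2} T^{ae}_{mj}) - \tfrac{1}{2} V^{mi}_{be} T^{ae}_{mj}
\end{align*}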
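Each intermediate on this slide is a sum of tensor contractions. As a hedged illustration, the I^{ij}_{kl} intermediate (V^{ij}_{kl} plus V^{ij}_{ef} T^{ef}_{kl} summed over e and f) can be written with `numpy.einsum`. The array storage order, the toy dimensions and the random data are all assumptions made for this sketch.

```python
import numpy as np


def build_i_ijkl(v_oooo, v_oovv, t2):
    """I^{ij}_{kl} = V^{ij}_{kl} + sum_{ef} V^{ij}_{ef} T^{ef}_{kl}.

    Assumed storage: v_oooo[i, j, k, l], v_oovv[i, j, e, f], t2[e, f, k, l].
    """
    return v_oooo + np.einsum("ijef,efkl->ijkl", v_oovv, t2)


n_occ, n_virt = 3, 4   # toy sizes
rng = np.random.default_rng(1)
v_oooo = rng.standard_normal((n_occ,) * 4)
v_oovv = rng.standard_normal((n_occ, n_occ, n_virt, n_virt))
t2 = rng.standard_normal((n_virt, n_virt, n_occ, n_occ))

i_ijkl = build_i_ijkl(v_oooo, v_oovv, t2)

# The same contraction as one matrix multiplication on reshaped arrays
as_matmul = v_oooo + (
    v_oovv.reshape(n_occ**2, n_virt**2) @ t2.reshape(n_virt**2, n_occ**2)
).reshape((n_occ,) * 4)
print(np.abs(i_ijkl - as_matmul).max())
```

With this index ordering the contraction is already a plain matrix product, which is the DGEMM-like structure that dominates the cost.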
Tensor Contraction Engine
Tensor Contraction Engine
What does it do?
1 GUI input of a quantum many-body theory, e.g. CCSD.
2 Operator specification of theory (as in a theory paper).
3 Apply Wick’s theorem to transform operator expressions into array expressions (as in a computational paper).
4 Transform input array expression to operation tree using many types of optimization (i.e. compile).
5 Generate F77/GA/NXTVAL implementation for NWChem or C++/MemoryGrp for MPQC or F90/.. for UTChem.
Developer can intercept at various stages to modify theory, algorithm or implementation (may be painful).
TCE Input
We get 73 lines of serial F90 or 604 lines of parallel F77 from this:
1/1 Sum(g1 g2 p3 h4) f(g1 g2) t(p3 h4) {g1+ g2}{p3+ h4}
1/4 Sum(g1 g2 g3 g4 p5 h6) v(g1 g2 g3 g4) t(p5 h6) {g1+ g2+ g4 g3}{p5+ h6}
1/16 Sum(g1 g2 g3 g4 p5 p6 h7 h8) v(g1 g2 g3 g4) t(p5 p6 h7 h8) {g1+ g2+ g4 g3}{p5+ p6+ h8 h7}
1/8 Sum(g1 g2 g3 g4 p5 h6 p7 h8) v(g1 g2 g3 g4) t(p5 h6) t(p7 h8) {g1+ g2+ g4 g3}{p5+ h6}{p7+ h8}
LaTeX equivalent of the first term:
\[ \sum_{g_1, g_2, p_3, h_4} f_{g_1, g_2} \, t_{p_3, h_4} \, \{ g_1^\dagger g_2 \} \{ p_3^\dagger h_4 \} \]
Summary of TCE module
http://cloc.sourceforge.net v 1.53 T=30.0 s
-------------------------------------------------------
Language        files     blank   comment        code
-------------------------------------------------------
Fortran 77      11451      1004    115129     2824724
-------------------------------------------------------
SUM:            11451      1004    115129     2824724
-------------------------------------------------------
Perhaps <25 KLOC are hand-written; ∼100 KLOC is utility code following the TCE data-parallel template.
Expansion from TCE input to massively-parallel F77 is ∼200× (drops with language abstractions).
TCE template
Pseudocode for $R^{a,b}_{i,j} = T^{c,d}_{i,j} * V^{a,b}_{c,d}$:
for i,j in occupied blocks:
    for a,b in virtual blocks:
        for c,d in virtual blocks:
            if symmetry_criteria(i,j,a,b,c,d):
                if dynamic_load_balancer(me):
                    Get block t(i,j,c,d) from T
                    Permute t(i,j,c,d)
                    Get block v(a,b,c,d) from V
                    Permute v(a,b,c,d)
                    r(i,j,a,b) += t(i,j,c,d) * v(a,b,c,d)
        Permute r(i,j,a,b)
        Accumulate r(i,j,a,b) block to R
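The template can be mimicked serially with NumPy. This is a sketch under stated assumptions, not the generated TCE code: "Get" is a slice copy from an in-memory array instead of a Global Arrays call, the symmetry test and the load balancer are omitted, and the index order is chosen so that "Permute" reduces to a reshape. The helper names `tiles` and `blocked_contraction` are hypothetical.

```python
from itertools import product

import numpy as np


def tiles(n, size):
    """Split range(n) into contiguous slices of at most `size` elements."""
    return [slice(s, min(s + size, n)) for s in range(0, n, size)]


def blocked_contraction(T, V, tile=2):
    """R[i,j,a,b] = sum_{c,d} T[i,j,c,d] * V[a,b,c,d], one tile at a time."""
    n_occ, n_virt = T.shape[0], T.shape[2]
    R = np.zeros((n_occ, n_occ, n_virt, n_virt))
    occ, virt = tiles(n_occ, tile), tiles(n_virt, tile)
    for i, j in product(occ, occ):
        for a, b in product(virt, virt):
            r = None
            for c, d in product(virt, virt):
                t = T[i, j, c, d].copy()                          # "Get block t"
                v = V[a, b, c, d].copy()                          # "Get block v"
                t_mat = t.reshape(t.shape[0] * t.shape[1], -1)    # "Permute" to (ij) x (cd)
                v_mat = v.reshape(v.shape[0] * v.shape[1], -1)    # "Permute" to (ab) x (cd)
                partial = t_mat @ v_mat.T                         # DGEMM-like step
                r = partial if r is None else r + partial
            shape = (i.stop - i.start, j.stop - j.start,
                     a.stop - a.start, b.stop - b.start)
            R[i, j, a, b] += r.reshape(shape)                     # "Permute r" + "Accumulate"
    return R


rng = np.random.default_rng(0)
T = rng.standard_normal((3, 3, 4, 4))   # T[i,j,c,d], toy sizes
V = rng.standard_normal((4, 4, 4, 4))   # V[a,b,c,d]
R = blocked_contraction(T, V)
reference = np.einsum("ijcd,abcd->ijab", T, V)
print(np.abs(R - reference).max())
```

Even in this toy version, every t tile is fetched once per (a,b) tile pair and every v tile once per (i,j) tile pair.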
TCE profile
ccsd_t2_8 (DGEMM-like):
timer            min      max      avg
dgemm         68.605   91.296   81.282
ga_acc         0.042    0.070    0.050
ga_get         5.845    7.779    6.679
nxtask         0.012   28.710   13.638
tce_sort4      6.184    8.174    7.347
tce_sortacc4   7.892   11.042    9.290
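The average timings can be turned into rough time fractions. This sketch assumes the six timers do not overlap and together account for the whole kernel, which the slide does not state; the timer names are written with underscores here.

```python
# Average seconds per timer, copied from the profile above
avg_seconds = {
    "dgemm": 81.282,
    "ga_acc": 0.050,
    "ga_get": 6.679,
    "nxtask": 13.638,
    "tce_sort4": 7.347,
    "tce_sortacc4": 9.290,
}

total = sum(avg_seconds.values())
fractions = {name: seconds / total for name, seconds in avg_seconds.items()}
for name, fraction in sorted(fractions.items(), key=lambda item: -item[1]):
    print(f"{name:14s} {100 * fraction:5.1f}%")
```

Under that assumption, roughly two thirds of the time is DGEMM, and the remainder is split between the shared counter, the sorts (permutes) and the gets.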
Observations about the TCE template
1 Blocking get means no overlap
2 Dynamic load balancing is global (shared counter)
3 Get+Permute of t(i,j,c,d) happens for all (a,b)
4 Get+Permute of v(a,b,c,d) happens for all (i,j)
5 Permute is a nasty operation (desire fused contraction).
We could apply well-known techniques to fix everything. . .
(There are an uncountable number of good programming techniques not being used in any scientific code.)
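Observations 3 and 4 can be quantified by counting block fetches in the six-loop template. A minimal sketch, assuming no symmetry screening and counting in units of blocks (the function name `get_counts` and the block counts are hypothetical):

```python
def get_counts(n_occ_blocks: int, n_virt_blocks: int) -> dict:
    """Count block fetches in the six-loop template, ignoring symmetry screening."""
    loop_iterations = n_occ_blocks**2 * n_virt_blocks**4   # (i,j) x (a,b) x (c,d)
    unique_t = n_occ_blocks**2 * n_virt_blocks**2          # distinct t(i,j,c,d) blocks
    unique_v = n_virt_blocks**4                            # distinct v(a,b,c,d) blocks
    return {
        "t_gets": loop_iterations,
        "v_gets": loop_iterations,
        "t_redundancy": loop_iterations // unique_t,   # equals the number of (a,b) pairs
        "v_redundancy": loop_iterations // unique_v,   # equals the number of (i,j) pairs
    }


counts = get_counts(n_occ_blocks=4, n_virt_blocks=10)
print(counts)
```

With these made-up block counts, each t block is fetched and permuted 100 times and each v block 16 times.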
TCE Template for MMM
Pseudocode for $C^{i}_{j} = A^{i}_{k} * B^{k}_{j}$:
for i in I blocks:
    for j in J blocks:
        for k in K blocks:
            if dynamic_load_balancer(me):
                Get block a(i,k) from A
                Get block b(k,j) from B
                c(i,j) += a(i,k) * b(k,j)
                Accumulate c(i,j) block to C
Algorithms trump tuned runtimes and libraries every time.
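A serial NumPy rendering of the matrix-multiply template makes its structure easy to inspect. It is a sketch under assumptions: a Python counter stands in for the global shared counter, and tasks are assigned round-robin, which only imitates the bookkeeping of dynamic load balancing. The names `block_starts` and `blocked_matmul` are hypothetical.

```python
from itertools import count, product

import numpy as np


def block_starts(n, tile):
    """Starting offsets of the blocks that cover range(n)."""
    return range(0, n, tile)


def blocked_matmul(A, B, tile=2, n_workers=3):
    """C = A @ B computed tile by tile; tasks are numbered by one shared counter."""
    n, m, p = A.shape[0], A.shape[1], B.shape[1]
    C = np.zeros((n, p))
    next_task = count()                      # stand-in for the global shared counter
    tasks_per_worker = [0] * n_workers
    for i, j, k in product(block_starts(n, tile), block_starts(p, tile), block_starts(m, tile)):
        worker = next(next_task) % n_workers  # serial simulation of task claiming
        a = A[i:i + tile, k:k + tile]         # Get block a(i,k)
        b = B[k:k + tile, j:j + tile]         # Get block b(k,j)
        C[i:i + tile, j:j + tile] += a @ b    # multiply, then accumulate into c(i,j)
        tasks_per_worker[worker] += 1
    return C, tasks_per_worker


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 5))
C, tasks_per_worker = blocked_matmul(A, B)
print(np.abs(C - A @ B).max(), tasks_per_worker)
```

Nothing in this loop nest knows where blocks live or which worker touched them last, which is the locality-oblivious behavior that structured parallel matmul algorithms avoid.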
A better way
TCE has it right, but only serially: tensor contractions are permute + matmul.
Parallel permute = parallel sorting = well-understood.
Parallel matmul = well-understood.
Therefore, parallel tensor contractions are solved, up to the implementation details and future algorithm developments in sorting and matmul.
All existing TCE technology for operation trees is still valid.
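The "permute + matmul" decomposition can be demonstrated on a contraction whose indices are deliberately interleaved. The index layout, sizes and data below are made up for illustration, and the helper name `contract_via_matmul` is hypothetical.

```python
import numpy as np


def contract_via_matmul(T, V):
    """R[a,b,i,j] = sum_{c,d} T[c,i,d,j] * V[a,c,b,d] via transpose, reshape and matmul."""
    nc, ni, nd, nj = T.shape
    na, _, nb, _ = V.shape
    # Permute so that the contracted indices (c,d) are adjacent, then flatten to matrices
    v_mat = V.transpose(0, 2, 1, 3).reshape(na * nb, nc * nd)   # rows (ab), columns (cd)
    t_mat = T.transpose(0, 2, 1, 3).reshape(nc * nd, ni * nj)   # rows (cd), columns (ij)
    return (v_mat @ t_mat).reshape(na, nb, ni, nj)


rng = np.random.default_rng(0)
T = rng.standard_normal((3, 2, 4, 5))   # T[c,i,d,j]
V = rng.standard_normal((6, 3, 2, 4))   # V[a,c,b,d]

result = contract_via_matmul(T, V)
reference = np.einsum("cidj,acbd->abij", T, V)
print(np.abs(result - reference).max())
```

The transposes are the "permute" (sort) step and the single `@` is the matmul, so a parallel version can reuse parallel sorting and parallel matrix multiplication unchanged.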
Cyclops Tensor Framework
Written by Edgar Solomonik (I am just a cheerleader).
Very preliminary (Summer 2011) strong-scaling results:
Communication
But where’s the one-sided communication?!?
Like parallel matmul and sorting, CTF does fine with MPI-1.
There are good uses of one-sided but TCE isn’t one*.
* Unless matmul or sorting benefits from it.
Summary
Dense tensor contractions are dense linear algebra plus some lower-order bookkeeping.
Permutation symmetry folded into cyclic/elemental distribution in a load-balanced way.
Parallel dense linear algebra is a well-understood problem that is continuously studied by smart people; parallel libraries exist.
Parallel dense tensor contractions are best implemented in terms of parallel dense linear algebra and not as serial dense linear algebra directed by a locality-oblivious dynamic runtime, especially if flops are “free.”
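Why a cyclic distribution balances symmetric data can be seen on a two-index toy case. This sketch is an assumption-laden illustration, not the Cyclops Tensor Framework layout: it counts how many unique elements (i ≤ j) of a symmetric n × n matrix each process on a p × p grid owns under blocked and cyclic distributions. The function name `owner_counts` is hypothetical.

```python
import numpy as np


def owner_counts(n, p, cyclic):
    """Unique elements (i <= j) of a symmetric n x n matrix owned per process on a p x p grid."""
    counts = np.zeros((p, p), dtype=int)
    block = n // p   # assumes p divides n
    for i in range(n):
        for j in range(i, n):
            if cyclic:
                counts[i % p, j % p] += 1
            else:
                counts[i // block, j // block] += 1
    return counts


blocked = owner_counts(64, 4, cyclic=False)
cyclic = owner_counts(64, 4, cyclic=True)
print("blocked min/max:", blocked.min(), blocked.max())
print("cyclic  min/max:", cyclic.min(), cyclic.max())
```

With a blocked layout the processes below the diagonal own nothing, whereas the cyclic layout gives every process nearly the same share of the unique elements.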
The future
Sparse formalisms are much more involved, but there is every reason to believe that representing them in terms of sparse sorting and sparse matmul will be reasonable, whereas the one-sided approach would only move farther and farther away from its sweet spot (bandwidth-limited bulk operations and infrequent active messages).
TCE treated the matrix-chain multiplication problem in a straightforward way, ignoring possibilities for exotic multi-level fusion.
PNNL and Argonne have efforts using task-parallelism and DAG-scheduling for MPPs and heterogeneous nodes, respectively, but much parallelism remains. We can learn from DAG efforts in DLA. . .
The end of CC
A theory for hard science problems. . .
“It is our belief that CCSDTQ represents the (practical) end of advancement in coupled-cluster theory. We are unaware of any problems in quantum chemistry that are not treated satisfactorily by this level of theory.” – John Stanton
\[ |\Psi_{CC}\rangle = \exp(T)\,|\Psi_{HF}\rangle, \qquad T = T_1 + T_2 + T_3 + T_4 \]
\[ |\Psi_{CCSDTQ}\rangle = \exp(T_1 + T_2 + T_3 + T_4)\,|\Psi_{HF}\rangle \]
Writing this theory down in full detail takes longer than some simulations. . .
CCSDTQ singles
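\begin{align*}
r1^{p2}_{h1} = {} & f^{p2}_{h1} \\
& + f^{h3}_{h1} t^{p2}_{h3} \\
& + f^{p2}_{p3} t^{p3}_{h1} \\
& - t^{p3}_{h4} v^{h4p2}_{h1p3} \\
& + f^{h3}_{p4} t^{p4p2}_{h3h1} \\
& + \tfrac{1}{2} t^{p3p2}_{h4h5} v^{h4h5}_{h1p3} \\
& + \tfrac{1}{2} t^{p3p4}_{h5h1} v^{h5p2}_{p3p4} \\
& + \tfrac{1}{4} t^{p3p4p2}_{h5h6h1} v^{h5h6}_{p3p4} \\
& - t^{p3}_{h1} t^{p2}_{h4} f^{h4}_{p3} \\
& - t^{p2}_{h3} t^{p4}_{h5} v^{h3h5}_{h1p4} \\
& - t^{p3}_{h1} t^{p4}_{h5} v^{h5p2}_{p3p4} \\
& - \tfrac{1}{2} t^{p3p2}_{h4h5} t^{p6}_{h1} v^{h4h5}_{p3p6} \\
& - \tfrac{1}{2} t^{p3p4}_{h5h1} t^{p2}_{h6} v^{h5h6}_{p3p4} \\
& + t^{p3p2}_{h4h1} t^{p5}_{h6} v^{h4h6}_{p3p5} \\
& - t^{p3}_{h1} t^{p2}_{h4} t^{p5}_{h6} v^{h4h6}_{p3p5}
\end{align*}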
CCSDTQ doubles
\begin{align*}
r2^{p3p4}_{h1h2} = {} & + v^{p3p4}_{h1h2} \\
& - (1 - P^{p4p3h1h2}_{p3p4h1h2}) t^{p3}_{h5} v^{h5p4}_{h1h2} \\
& + (1 - P^{p3p4h2h1}_{p3p4h1h2}) t^{p5}_{h2} v^{p3p4}_{h1p5} \\
& - (1 - P^{p3p4h1h2}_{p3p4h2h1}) f^{h5}_{h1} t^{p3p4}_{h5h2} \\
& - (1 - P^{p3p4h1h2}_{p4p3h1h2}) f^{p4}_{p5} t^{p5p3}_{h1h2} \\
& + \tfrac{1}{2} t^{p3p4}_{h5h6} v^{h5h6}_{h1h2} \\
& + (1 - P^{p4p3h2h1}_{p3p4h2h1} - P^{p3p4h2h1}_{p3p4h1h2} + P^{p4p3h2h1}_{p3p4h1h2}) t^{p5p3}_{h6h2} v^{h6p4}_{h1p5} \\
& + \tfrac{1}{2} t^{p5p6}_{h1h2} v^{p3p4}_{p5p6} \\
& + f^{h5}_{p6} t^{p6p3p4}_{h5h1h2} \\
& + \tfrac{1}{2} (1 - P^{p3p4h2h1}_{p3p4h1h2}) t^{p5p3p4}_{h6h7h2} v^{h6h7}_{h1p5} \\
& - \tfrac{1}{2} (1 - P^{p4p3h1h2}_{p3p4h1h2}) t^{p5p6p3}_{h7h1h2} v^{h7p4}_{p5p6} \\
& + \tfrac{1}{4} t^{p5p6p3p4}_{h7h8h1h2} v^{h7h8}_{p5p6} \\
& + t^{p3}_{h5} t^{p4}_{h6} v^{h5h6}_{h1h2} \\
& - (1 - P^{p4p3h2h1}_{p3p4h2h1} - P^{p3p4h2h1}_{p3p4h1h2} + P^{p4p3h2h1}_{p3p4h1h2}) t^{p5}_{h2} t^{p3}_{h6} v^{h6p4}_{h1p5} \\
& + t^{p5}_{h1} t^{p6}_{h2} v^{p3p4}_{p5p6} \\
& - (1 - P^{p3p4h2h1}_{p3p4h1h2}) f^{h5}_{p6} t^{p3p4}_{h5h2} t^{p6}_{h1} \\
& + (1 - P^{p4p3h1h2}_{p3p4h1h2}) f^{h5}_{p6} t^{p6p3}_{h1h2} t^{p4}_{h5} \\
& + \tfrac{1}{2} (1 - P^{p3p4h2h1}_{p3p4h1h2}) t^{p3p4}_{h5h6} t^{p7}_{h2} v^{h5h6}_{h1p7} \\
& - (1 - P^{p4p3h2h1}_{p3p4h2h1} - P^{p3p4h2h1}_{p3p4h1h2} + P^{p4p3h2h1}_{p3p4h1h2}) t^{p5p3}_{h6h2} t^{p4}_{h7} v^{h6h7}_{h1p5} \\
& - (1 - P^{p3p4h2h1}_{p3p4h1h2}) t^{p3p4}_{h5h2} t^{p6}_{h7} v^{h5h7}_{h1p6} \\
& - (1 - P^{p4p3h2h1}_{p3p4h2h1} - P^{p3p4h2h1}_{p3p4h1h2} + P^{p4p3h2h1}_{p3p4h1h2}) t^{p5p3}_{h6h2} t^{p7}_{h1} v^{h6p4}_{p5p7} \\
& - \tfrac{1}{2} (1 - P^{p4p3h1h2}_{p3p4h1h2}) t^{p5p6}_{h1h2} t^{p3}_{h7} v^{h7p4}_{p5p6} \\
& + (1 - P^{p4p3h1h2}_{p3p4h1h2}) t^{p5p3}_{h1h2} t^{p6}_{h7} v^{h7p4}_{p5p6} \\
& - \tfrac{1}{2} (1 - P^{p3p4h2h1}_{p3p4h1h2}) t^{p5p3p4}_{h6h7h2} t^{p8}_{h1} v^{h6h7}_{p5p8} \\
& + \tfrac{1}{2} (1 - P^{p4p3h1h2}_{p3p4h1h2}) t^{p5p6p3}_{h7h1h2} t^{p4}_{h8} v^{h7h8}_{p5p6} \\
& + t^{p5p3p4}_{h6h1h2} t^{p7}_{h8} v^{h6h8}_{p5p7} \\
& + \tfrac{1}{2} (1 - P^{p3p4h1h2}_{p4p3h1h2}) t^{p5p4}_{h1h2} t^{p6p3}_{h7h8} v^{h7h8}_{p5p6} \\
& + \tfrac{1}{4} t^{p5p6}_{h1h2} t^{p3p4}_{h7h8} v^{h7h8}_{p5p6} \\
& - \tfrac{1}{2} (1 - P^{p3p4h1h2}_{p3p4h2h1}) t^{p3p4}_{h5h1} t^{p6p7}_{h8h2} v^{h5h8}_{p6p7} \\
& - (1 - P^{p3p4h1h2}_{p4p3h1h2}) t^{p5p4}_{h6h1} t^{p7p3}_{h8h2} v^{h6h8}_{p5p7} \\
& + (1 - P^{p3p4h2h1}_{p3p4h1h2}) t^{p5}_{h2} t^{p3}_{h6} t^{p4}_{h7} v^{h6h7}_{h1p5} \\
& - (1 - P^{p4p3h1h2}_{p3p4h1h2}) t^{p5}_{h1} t^{p6}_{h2} t^{p3}_{h7} v^{h7p4}_{p5p6} \\
& + \tfrac{1}{2} t^{p5}_{h1} t^{p6}_{h2} t^{p3p4}_{h7h8} v^{h7h8}_{p5p6} \\
& + (1 - P^{p3p4h1h2}_{p4p3h1h2} - P^{p4p3h1h2}_{p4p3h2h1} + P^{p3p4h1h2}_{p4p3h2h1}) t^{p5}_{h1} t^{p4}_{h6} t^{p7p3}_{h8h2} v^{h6h8}_{p5p7} \\
& + (1 - P^{p3p4h1h2}_{p3p4h2h1}) t^{p5}_{h1} t^{p6}_{h7} t^{p3p4}_{h8h2} v^{h7h8}_{p5p6} \\
& + \tfrac{1}{2} t^{p3}_{h5} t^{p4}_{h6} t^{p7p8}_{h1h2} v^{h5h6}_{p7p8} \\
& - (1 - P^{p3p4h1h2}_{p4p3h1h2}) t^{p4}_{h5} t^{p6}_{h7} t^{p8p3}_{h1h2} v^{h5h7}_{p6p8} \\
& + t^{p5}_{h1} t^{p6}_{h2} t^{p3}_{h7} t^{p4}_{h8} v^{h7h8}_{p5p6}
\end{align*}
CCSDTQ triples
\begin{align*}
r3^{p4p5p6}_{h1h2h3} = {} & (1 - P^{p4p6p5h3h1h2}_{p4p5p6h3h1h2} - P^{p5p6p4h3h1h2}_{p5p4p6h3h1h2} - P^{p4p5p6h3h1h2}_{p4p5p6h2h1h3} + P^{p4p6p5h3h1h2}_{p4p5p6h2h1h3} + P^{p5p6p4h3h1h2}_{p5p4p6h2h1h3} - P^{p4p5p6h3h2h1}_{p4p5p6h1h2h3} + P^{p4p6p5h3h2h1}_{p4p5p6h1h2h3} + P^{p5p6p4h3h2h1}_{p5p4p6h1h2h3}) t^{p4p5}_{h7h3} v^{h7p6}_{h1h2} \\
& + (1 - P^{p5p4p6h2h3h1}_{p4p5p6h2h3h1} - P^{p6p4p5h2h3h1}_{p4p6p5h2h3h1} + P^{p4p5p6h3h2h1}_{p4p5p6h1h3h2} - P^{p5p4p6h3h2h1}_{p4p5p6h1h3h2} - P^{p6p4p5h3h2h1}_{p4p6p5h1h3h2} + P^{p4p5p6h2h3h1}_{p4p5p6h1h2h3} - P^{p5p4p6h2h3h1}_{p4p5p6h1h2h3} - P^{p6p4p5h2h3h1}_{p4p6p5h1h2h3}) t^{p7p4}_{h2h3} v^{p5p6}_{h1p7} \\
& - (1 - P^{p4p5p6h1h2h3}_{p4p5p6h2h1h3} - P^{p4p5p6h1h3h2}_{p4p5p6h3h1h2}) f^{h7}_{h1} t^{p4p5p6}_{h7h2h3} \\
& + (1 - P^{p5p4p6h1h2h3}_{p6p4p5h1h2h3} - P^{p4p5p6h1h2h3}_{p6p5p4h1h2h3}) f^{p6}_{p7} t^{p7p4p5}_{h1h2h3} \\
& + \tfrac{1}{2} (1 - P^{p4p5p6h3h1h2}_{p4p5p6h2h1h3} - P^{p4p5p6h3h2h1}_{p4p5p6h1h2h3}) t^{p4p5p6}_{h7h8h3} v^{h7h8}_{h1h2} \\
& - (1 - P^{p4p6p5h2h3h1}_{p4p5p6h2h3h1} - P^{p5p6p4h2h3h1}_{p5p4p6h2h3h1} + P^{p4p5p6h3h2h1}_{p4p5p6h1h3h2} - P^{p4p6p5h3h2h1}_{p4p5p6h1h3h2} - P^{p5p6p4h3h2h1}_{p5p4p6h1h3h2} + P^{p4p5p6h2h3h1}_{p4p5p6h1h2h3} - P^{p4p6p5h2h3h1}_{p4p5p6h1h2h3} - P^{p5p6p4h2h3h1}_{p5p4p6h1h2h3}) t^{p7p4p5}_{h8h2h3} v^{h8p6}_{h1p7} \\
& + \tfrac{1}{2} (1 - P^{p5p4p6h1h2h3}_{p4p5p6h1h2h3} - P^{p6p4p5h1h2h3}_{p4p6p5h1h2h3}) t^{p7p8p4}_{h1h2h3} v^{p5p6}_{p7p8} \\
& + f^{h7}_{p8} t^{p8p4p5p6}_{h7h1h2h3} \\
& + \tfrac{1}{2} (1 + P^{p4p5p6h3h2h1}_{p4p5p6h1h3h2} + P^{p4p5p6h2h3h1}_{p4p5p6h1h2h3}) t^{p7p4p5p6}_{h8h9h2h3} v^{h8h9}_{h1p7} \\
& + \tfrac{1}{2} (1 - P^{p4p6p5h1h2h3}_{p4p5p6h1h2h3} - P^{p5p6p4h1h2h3}_{p5p4p6h1h2h3}) t^{p7p8p4p5}_{h9h1h2h3} v^{h9p6}_{p7p8} \\
& - (1 - P^{p4p6p5h3h1h2}_{p4p5p6h3h1h2} - P^{p5p6p4h3h1h2}_{p5p4p6h3h1h2} - P^{p4p5p6h3h1h2}_{p4p5p6h2h1h3} + P^{p4p6p5h3h1h2}_{p4p5p6h2h1h3} + P^{p5p6p4h3h1h2}_{p5p4p6h2h1h3} - P^{p4p5p6h3h2h1}_{p4p5p6h1h2h3} + P^{p4p6p5h3h2h1}_{p4p5p6h1h2h3} + P^{p5p6p4h3h2h1}_{p5p4p6h1h2h3}) t^{p4p5}_{h7h3} t^{p6}_{h8} v^{h7h8}_{h1h2} \\
& + (1 - P^{p4p6p5h3h2h1}_{p4p5p6h3h2h1} - P^{p5p6p4h3h2h1}_{p5p4p6h3h2h1} - P^{p4p5p6h3h2h1}_{p4p5p6h2h3h1} + P^{p4p6p5h3h2h1}_{p4p5p6h2h3h1} + P^{p5p6p4h3h2h1}_{p5p4p6h2h3h1} - P^{p4p5p6h3h2h1}_{p4p5p6h3h1h2} + P^{p4p6p5h3h2h1}_{p4p5p6h3h1h2} + P^{p5p6p4h3h2h1}_{p5p4p6h3h1h2} + P^{p4p5p6h3h2h1}_{p4p5p6h2h1h3} - P^{p4p6p5h3h2h1}_{p4p5p6h2h1h3} - P^{p5p6p4h3h2h1}_{p5p4p6h2h1h3} + P^{p4p5p6h3h2h1}_{p4p5p6h1h3h2} - P^{p4p6p5h3h2h1}_{p4p5p6h1h3h2} - P^{p5p6p4h3h2h1}_{p5p4p6h1h3h2} - P^{p4p5p6h3h2h1}_{p4p5p6h1h2h3} + P^{p4p6p5h3h2h1}_{p4p5p6h1h2h3} + P^{p5p6p4h3h2h1}_{p5p4p6h1h2h3}) t^{p4p5}_{h7h3} t^{p8}_{h2} v^{h7p6}_{h1p8} \\
& - (1 - P^{p5p4p6h2h3h1}_{p4p5p6h2h3h1} - P^{p4p6p5h2h3h1}_{p4p5p6h2h3h1} + P^{p5p6p4h2h3h1}_{p4p5p6h2h3h1} + P^{p6p4p5h2h3h1}_{p4p5p6h2h3h1} - P^{p6p5p4h2h3h1}_{p4p5p6h2h3h1} + P^{p4p5p6h3h2h1}_{p4p5p6h1h3h2} - P^{p5p4p6h3h2h1}_{p4p5p6h1h3h2} - P^{p4p6p5h3h2h1}_{p4p5p6h1h3h2} + P^{p5p6p4h3h2h1}_{p4p5p6h1h3h2} + P^{p6p4p5h3h2h1}_{p4p5p6h1h3h2} - P^{p6p5p4h3h2h1}_{p4p5p6h1h3h2} + P^{p4p5p6h2h3h1}_{p4p5p6h1h2h3} - P^{p5p4p6h2h3h1}_{p4p5p6h1h2h3} - P^{p4p6p5h2h3h1}_{p4p5p6h1h2h3} + P^{p5p6p4h2h3h1}_{p4p5p6h1h2h3} + P^{p6p4p5h2h3h1}_{p4p5p6h1h2h3} - P^{p6p5p4h2h3h1}_{p4p5p6h1h2h3}) t^{p7p4}_{h2h3} t^{p5}_{h8} v^{h8p6}_{h1p7} \\
& - (1 - P^{p5p4p6h2h3h1}_{p4p5p6h2h3h1} - P^{p6p4p5h2h3h1}_{p4p6p5h2h3h1} + P^{p4p5p6h3h2h1}_{p4p5p6h1h3h2} - P^{p5p4p6h3h2h1}_{p4p5p6h1h3h2} - P^{p6p4p5h3h2h1}_{p4p6p5h1h3h2} + P^{p4p5p6h2h3h1}_{p4p5p6h1h2h3} - P^{p5p4p6h2h3h1}_{p4p5p6h1h2h3} - P^{p6p4p5h2h3h1}_{p4p6p5h1h2h3}) t^{p7p4}_{h2h3} t^{p8}_{h1} v^{p5p6}_{p7p8} \\
& - (1 + P^{p4p5p6h3h2h1}_{p4p5p6h1h3h2} + P^{p4p5p6h2h3h1}_{p4p5p6h1h2h3}) f^{h7}_{p8} t^{p4p5p6}_{h7h2h3} t^{p8}_{h1} \\
& - (1 - P^{p4p6p5h1h2h3}_{p4p5p6h1h2h3} - P^{p5p6p4h1h2h3}_{p5p4p6h1h2h3}) f^{h7}_{p8} t^{p8p4p5}_{h1h2h3} t^{p6}_{h7} \\
& + \tfrac{1}{2} (1 - P^{p4p5p6h3h2h1}_{p4p5p6h2h3h1} - P^{p4p5p6h3h2h1}_{p4p5p6h3h1h2} + P^{p4p5p6h3h2h1}_{p4p5p6h2h1h3} + P^{p4p5p6h3h2h1}_{p4p5p6h1h3h2} - P^{p4p5p6h3h2h1}_{p4p5p6h1h2h3}) t^{p4p5p6}_{h7h8h3} t^{p9}_{h2} v^{h7h8}_{h1p9} \\
& + (1 - P^{p4p6p5h2h3h1}_{p4p5p6h2h3h1} - P^{p5p6p4h2h3h1}_{p5p4p6h2h3h1} + P^{p4p5p6h3h2h1}_{p4p5p6h1h3h2} - P^{p4p6p5h3h2h1}_{p4p5p6h1h3h2} - P^{p5p6p4h3h2h1}_{p5p4p6h1h3h2} + P^{p4p5p6h2h3h1}_{p4p5p6h1h2h3} - P^{p4p6p5h2h3h1}_{p4p5p6h1h2h3} - P^{p5p6p4h2h3h1}_{p5p4p6h1h2h3}) t^{p7p4p5}_{h8h2h3} t^{p6}_{h9} v^{h8h9}_{h1p7} \\
& - (1 + P^{p4p5p6h3h2h1}_{p4p5p6h1h3h2} + P^{p4p5p6h2h3h1}_{p4p5p6h1h2h3}) t^{p4p5p6}_{h7h2h3} t^{p8}_{h9} v^{h7h9}_{h1p8} \\
& + (1 - P^{p4p6p5h2h3h1}_{p4p5p6h2h3h1} - P^{p5p6p4h2h3h1}_{p5p4p6h2h3h1} + P^{p4p5p6h3h2h1}_{p4p5p6h1h3h2} - P^{p4p6p5h3h2h1}_{p4p5p6h1h3h2} - P^{p5p6p4h3h2h1}_{p5p4p6h1h3h2} + P^{p4p5p6h2h3h1}_{p4p5p6h1h2h3} - P^{p4p6p5h2h3h1}_{p4p5p6h1h2h3} - P^{p5p6p4h2h3h1}_{p5p4p6h1h2h3}) t^{p7p4p5}_{h8h2h3} t^{p9}_{h1} v^{h8p6}_{p7p9} \\
& - \tfrac{1}{2} (1 - P^{p5p4p6h1h2h3}_{p4p5p6h1h2h3} - P^{p4p6p5h1h2h3}_{p4p5p6h1h2h3} + P^{p5p6p4h1h2h3}_{p4p5p6h1h2h3} + P^{p6p4p5h1h2h3}_{p4p5p6h1h2h3} - P^{p6p5p4h1h2h3}_{p4p5p6h1h2h3}) t^{p7p8p4}_{h1h2h3} t^{p5}_{h9} v^{h9p6}_{p7p8} \\
& - (1 - P^{p4p6p5h1h2h3}_{p4p5p6h1h2h3} - P^{p5p6p4h1h2h3}_{p5p4p6h1h2h3}) t^{p7p4p5}_{h1h2h3} t^{p8}_{h9} v^{h9p6}_{p7p8} \\
& - \tfrac{1}{2} (1 + P^{p4p5p6h3h2h1}_{p4p5p6h1h3h2} + P^{p4p5p6h2h3h1}_{p4p5p6h1h2h3}) t^{p7p4p5p6}_{h8h9h2h3} t^{p10}_{h1} v^{h8h9}_{p7p10} \\
& - \tfrac{1}{2} (1 - P^{p4p6p5h1h2h3}_{p4p5p6h1h2h3} - P^{p5p6p4h1h2h3}_{p5p4p6h1h2h3}) t^{p7p8p4p5}_{h9h1h2h3} t^{p6}_{h10} v^{h9h10}_{p7p8} \\
& + t^{p7p4p5p6}_{h8h1h2h3} t^{p9}_{h10} v^{h8h10}_{p7p9} \\
& + (1 - P^{p5p4p6h1h2h3}_{p6p4p5h1h2h3} - P^{p4p5p6h1h2h3}_{p6p5p4h1h2h3} - P^{p6p4p5h1h2h3}_{p6p4p5h1h3h2} + P^{p5p4p6h1h2h3}_{p6p4p5h1h3h2} + P^{p4p5p6h1h2h3}_{p6p5p4h1h3h2} + P^{p5p6p4h3h2h1}_{p5p4p6h1h2h3} + P^{p4p6p5h3h2h1}_{p4p5p6h1h2h3} - P^{p4p5p6h3h2h1}_{p4p5p6h1h2h3}) t^{p7p6}_{h1h2} t^{p4p5}_{h8h3} f^{h8}_{p7} \\
& + \tfrac{1}{2} (1 - P^{p5p4p6h2h3h1}_{p6p4p5h2h3h1} - P^{p4p5p6h2h3h1}_{p6p5p4h2h3h1} + P^{p6p4p5h3h2h1}_{p6p4p5h1h3h2} - P^{p5p4p6h3h2h1}_{p6p4p5h1h3h2} - P^{p4p5p6h3h2h1}_{p6p5p4h1h3h2} + P^{p6p4p5h2h3h1}_{p6p4p5h1h2h3} - P^{p5p4p6h2h3h1}_{p6p4p5h1h2h3} - P^{p4p5p6h2h3h1}_{p6p5p4h1h2h3}) t^{p7p6}_{h2h3} t^{p4p5}_{h8h9} v^{h8h9}_{h1p7} \\
& - (1 + P^{p4p6p5h2h3h1}_{p6p5p4h2h3h1} - P^{p4p5p6h2h3h1}_{p5p6p4h3h2h1} + P^{p4p5p6h2h3h1}_{p5p6p4h2h3h1} - P^{p4p6p5h2h3h1}_{p6p5p4h3h2h1} - P^{p5p6p4h2h3h1}_{p5p6p4h3h2h1} - P^{p5p6p4h2h3h1}_{p5p6p4h1h3h2} - P^{p4p6p5h2h3h1}_{p6p5p4h1h3h2} + P^{p4p5p6h2h3h1}_{p5p6p4h3h1h2} - P^{p4p5p6h2h3h1}_{p5p6p4h1h3h2} + P^{p4p6p5h2h3h1}_{p6p5p4h3h1h2} + P^{p5p6p4h2h3h1}_{p5p6p4h3h1h2} + P^{p5p6p4h2h3h1}_{p5p6p4h1h2h3} + P^{p4p6p5h2h3h1}_{p6p5p4h1h2h3} - P^{p4p5p6h2h3h1}_{p5p6p4h2h1h3} + P^{p4p5p6h2h3h1}_{p5p6p4h1h2h3} - P^{p4p6p5h2h3h1}_{p6p5p4h2h1h3} - P^{p5p6p4h2h3h1}_{p5p6p4h2h1h3}) t^{p5p6}_{h7h2} t^{p8p4}_{h9h3} v^{h7h9}_{h1p8} \\
& + (1 - P^{p4p5p6h1h2h3}_{p5p4p6h1h2h3} - P^{p6p4p5h1h2h3}_{p5p4p6h1h2h3} + P^{p6p5p4h1h2h3}_{p5p4p6h1h2h3} + P^{p4p6p5h1h2h3}_{p5p4p6h1h2h3} - P^{p5p6p4h1h2h3}_{p5p4p6h1h2h3} - P^{p5p4p6h1h2h3}_{p5p4p6h1h3h2} + P^{p4p5p6h1h2h3}_{p5p4p6h1h3h2} + P^{p6p4p5h1h2h3}_{p5p4p6h1h3h2} - P^{p6p5p4h1h2h3}_{p5p4p6h1h3h2} - P^{p4p6p5h1h2h3}_{p5p4p6h1h3h2} + P^{p5p6p4h1h2h3}_{p5p4p6h1h3h2} + P^{p4p5p6h2h1h3}_{p5p4p6h2h3h1} - P^{p5p4p6h2h1h3}_{p5p4p6h2h3h1} - P^{p4p6p5h2h1h3}_{p5p4p6h2h3h1} + P^{p5p6p4h2h1h3}_{p5p4p6h2h3h1} + P^{p6p4p5h2h1h3}_{p5p4p6h2h3h1} - P^{p6p5p4h2h1h3}_{p5p4p6h2h3h1}) t^{p7p5}_{h1h2} t^{p8p4}_{h9h3} v^{h9p6}_{p7p8} \\
& + \tfrac{1}{2} (1 - P^{p4p6p5h3h1h2}_{p4p5p6h3h1h2} - P^{p5p6p4h3h1h2}_{p5p4p6h3h1h2} - P^{p4p5p6h3h1h2}_{p4p5p6h2h1h3} + P^{p4p6p5h3h1h2}_{p4p5p6h2h1h3} + P^{p5p6p4h3h1h2}_{p5p4p6h2h1h3} - P^{p4p5p6h3h2h1}_{p4p5p6h1h2h3} + P^{p4p6p5h3h2h1}_{p4p5p6h1h2h3} + P^{p5p6p4h3h2h1}_{p5p4p6h1h2h3}) t^{p4p5}_{h7h3} t^{p8p9}_{h1h2} v^{h7p6}_{p8p9} \\
& + \tfrac{1}{2} (1 - P^{p4p6p5h3h1h2}_{p4p5p6h3h1h2} - P^{p5p6p4h3h1h2}_{p5p4p6h3h1h2} - P^{p4p5p6h3h1h2}_{p4p5p6h2h1h3} + P^{p4p6p5h3h1h2}_{p4p5p6h2h1h3} + P^{p5p6p4h3h1h2}_{p5p4p6h2h1h3} - P^{p4p5p6h3h2h1}_{p4p5p6h1h2h3} + P^{p4p6p5h3h2h1}_{p4p5p6h1h2h3} + P^{p5p6p4h3h2h1}_{p5p4p6h1h2h3}) t^{p7p4p5}_{h8h9h3} t^{p10p6}_{h1h2} v^{h8h9}_{p7p10} \\
& + \tfrac{1}{4} (1 - P^{p4p5p6h3h1h2}_{p4p5p6h2h1h3} - P^{p4p5p6h3h2h1}_{p4p5p6h1h2h3}) t^{p4p5p6}_{h7h8h3} t^{p9p10}_{h1h2} v^{h7h8}_{p9p10} \\
& + \tfrac{1}{2} (1 - P^{p5p4p6h2h3h1}_{p4p5p6h2h3h1} - P^{p6p4p5h2h3h1}_{p4p6p5h2h3h1} + P^{p4p5p6h3h2h1}_{p4p5p6h1h3h2} - P^{p5p4p6h3h2h1}_{p4p5p6h1h3h2} - P^{p6p4p5h3h2h1}_{p4p6p5h1h3h2} + P^{p4p5p6h2h3h1}_{p4p5p6h1h2h3} - P^{p5p4p6h2h3h1}_{p4p5p6h1h2h3} - P^{p6p4p5h2h3h1}_{p4p6p5h1h2h3}) t^{p7p8p4}_{h9h2h3} t^{p5p6}_{h10h1} v^{h9h10}_{p7p8} \\
& + (1 - P^{p4p6p5h2h3h1}_{p4p5p6h2h3h1} - P^{p5p6p4h2h3h1}_{p5p4p6h2h3h1} + P^{p4p5p6h3h2h1}_{p4p5p6h1h3h2} - P^{p4p6p5h3h2h1}_{p4p5p6h1h3h2} - P^{p5p6p4h3h2h1}_{p5p4p6h1h3h2} + P^{p4p5p6h2h3h1}_{p4p5p6h1h2h3} - P^{p4p6p5h2h3h1}_{p4p5p6h1h2h3} - P^{p5p6p4h2h3h1}_{p5p4p6h1h2h3}) t^{p7p4p5}_{h8h2h3} t^{p9p6}_{h10h1} v^{h8h10}_{p7p9} \\
& + \tfrac{1}{2} (1 + P^{p4p5p6h3h2h1}_{p4p5p6h1h3h2} + P^{p4p5p6h2h3h1}_{p4p5p6h1h2h3}) t^{p4p5p6}_{h7h2h3} t^{p8p9}_{h10h1} v^{h7h10}_{p8p9} \\
& + \tfrac{1}{4} (1 - P^{p5p4p6h1h2h3}_{p4p5p6h1h2h3} - P^{p6p4p5h1h2h3}_{p4p6p5h1h2h3}) t^{p7p8p4}_{h1h2h3} t^{p5p6}_{h9h10} v^{h9h10}_{p7p8} \\
& + \tfrac{1}{2} (1 - P^{p4p6p5h1h2h3}_{p4p5p6h1h2h3} - P^{p5p6p4h1h2h3}_{p5p4p6h1h2h3}) t^{p7p4p5}_{h1h2h3} t^{p8p6}_{h9h10} v^{h9h10}_{p7p8} \\
& + (1 - P^{p5p4p6h2h3h1}_{p6p4p5h2h3h1} - P^{p4p5p6h2h3h1}_{p6p5p4h2h3h1} - P^{p6p4p5h2h3h1}_{p6p4p5h3h2h1} + P^{p5p4p6h2h3h1}_{p6p4p5h3h2h1} + P^{p4p5p6h2h3h1}_{p6p5p4h3h2h1} - P^{p6p4p5h2h3h1}_{p6p4p5h1h3h2} + P^{p5p4p6h2h3h1}_{p6p4p5h1h3h2} + P^{p4p5p6h2h3h1}_{p6p5p4h1h3h2} + P^{p6p4p5h2h3h1}_{p6p4p5h1h2h3} - P^{p5p4p6h2h3h1}_{p6p4p5h1h2h3} - P^{p4p5p6h2h3h1}_{p6p5p4h1h2h3} + P^{p6p4p5h2h3h1}_{p6p4p5h3h1h2} - P^{p5p4p6h2h3h1}_{p6p4p5h3h1h2} - P^{p4p5p6h2h3h1}_{p6p5p4h3h1h2} - P^{p6p4p5h2h3h1}_{p6p4p5h2h1h3} + P^{p5p4p6h2h3h1}_{p6p4p5h2h1h3} + P^{p4p5p6h2h3h1}_{p6p5p4h2h1h3}) t^{p7}_{h2} t^{p6}_{h8} t^{p4p5}_{h9h3} v^{h8h9}_{h1p7} \\
& + (1 - P^{p4p6p5h2h3h1}_{p5p6p4h2h3h1} + P^{p4p5p6h2h3h1}_{p5p6p4h2h3h1} + P^{p5p6p4h3h2h1}_{p5p6p4h1h3h2} - P^{p4p6p5h3h2h1}_{p5p6p4h1h3h2} + P^{p4p5p6h3h2h1}_{p5p6p4h1h3h2} + P^{p5p6p4h2h3h1}_{p5p6p4h1h2h3} - P^{p4p6p5h2h3h1}_{p5p6p4h1h2h3} + P^{p4p5p6h2h3h1}_{p5p6p4h1h2h3}) t^{p5}_{h7} t^{p6}_{h8} t^{p9p4}_{h2h3} v^{h7h8}_{h1p9} \\
& + (1 - P^{p4p6p5h1h2h3}_{p4p5p6h1h2h3} - P^{p5p6p4h1h2h3}_{p5p4p6h1h2h3} - P^{p4p5p6h1h2h3}_{p4p5p6h1h3h2} + P^{p4p6p5h1h2h3}_{p4p5p6h1h3h2} + P^{p5p6p4h1h2h3}_{p5p4p6h1h3h2} + P^{p4p5p6h1h2h3}_{p4p5p6h2h3h1} - P^{p4p6p5h1h2h3}_{p4p5p6h2h3h1} - P^{p5p6p4h1h2h3}_{p5p4p6h2h3h1}) t^{p7}_{h1} t^{p8}_{h2} t^{p4p5}_{h9h3} v^{h9p6}_{p7p8} \\
& - (1 - P^{p4p5p6h1h2h3}_{p5p4p6h1h2h3} - P^{p6p4p5h1h2h3}_{p5p4p6h1h2h3} + P^{p6p5p4h1h2h3}_{p5p4p6h1h2h3} + P^{p4p6p5h1h2h3}_{p5p4p6h1h2h3} - P^{p5p6p4h1h2h3}_{p5p4p6h1h2h3} - P^{p5p4p6h1h2h3}_{p5p4p6h2h1h3} + P^{p4p5p6h1h2h3}_{p5p4p6h2h1h3} + P^{p6p4p5h1h2h3}_{p5p4p6h2h1h3} - P^{p6p5p4h1h2h3}_{p5p4p6h2h1h3} - P^{p4p6p5h1h2h3}_{p5p4p6h2h1h3} + P^{p5p6p4h1h2h3}_{p5p4p6h2h1h3} - P^{p5p4p6h1h3h2}_{p5p4p6h3h1h2} + P^{p4p5p6h1h3h2}_{p5p4p6h3h1h2} + P^{p6p4p5h1h3h2}_{p5p4p6h3h1h2} - P^{p6p5p4h1h3h2}_{p5p4p6h3h1h2} - P^{p4p6p5h1h3h2}_{p5p4p6h3h1h2} + P^{p5p6p4h1h3h2}_{p5p4p6h3h1h2}) t^{p7}_{h1} t^{p5}_{h8} t^{p9p4}_{h2h3} v^{h8p6}_{p7p9} \\
& + \tfrac{1}{2} (1 - P^{p4p5p6h1h2h3}_{p4p5p6h1h3h2} + P^{p4p5p6h1h2h3}_{p4p5p6h2h3h1}) t^{p7}_{h1} t^{p8}_{h2} t^{p4p5p6}_{h9h10h3} v^{h9h10}_{p7p8} \\
& - (1 - P^{p5p4p6h1h2h3}_{p6p4p5h1h2h3} - P^{p4p5p6h1h2h3}_{p6p5p4h1h2h3} - P^{p6p4p5h1h2h3}_{p6p4p5h2h1h3} + P^{p5p4p6h1h2h3}_{p6p4p5h2h1h3} + P^{p4p5p6h1h2h3}_{p6p5p4h2h1h3} - P^{p6p4p5h1h3h2}_{p6p4p5h3h1h2} + P^{p5p4p6h1h3h2}_{p6p4p5h3h1h2} + P^{p4p5p6h1h3h2}_{p6p5p4h3h1h2}) t^{p7}_{h1} t^{p6}_{h8} t^{p9p4p5}_{h10h2h3} v^{h8h10}_{p7p9} \\
& + (1 - P^{p4p5p6h1h2h3}_{p4p5p6h2h1h3} - P^{p4p5p6h1h3h2}_{p4p5p6h3h1h2}) t^{p7}_{h1} t^{p8}_{h9} t^{p4p5p6}_{h10h2h3} v^{h9h10}_{p7p8} \\
& + \tfrac{1}{2} (1 - P^{p4p6p5h1h2h3}_{p5p6p4h1h2h3} + P^{p4p5p6h1h2h3}_{p5p6p4h1h2h3}) t^{p5}_{h7} t^{p6}_{h8} t^{p9p10p4}_{h1h2h3} v^{h7h8}_{p9p10} \\
& + (1 - P^{p5p4p6h1h2h3}_{p6p4p5h1h2h3} - P^{p4p5p6h1h2h3}_{p6p5p4h1h2h3}) t^{p6}_{h7} t^{p8}_{h9} t^{p10p4p5}_{h1h2h3} v^{h7h9}_{p8p10} \\
& - \tfrac{1}{2} (1 - P^{p5p4p6h2h3h1}_{p6p4p5h2h3h1} - P^{p4p5p6h2h3h1}_{p6p5p4h2h3h1} + P^{p6p4p5h3h2h1}_{p6p4p5h1h3h2} - P^{p5p4p6h3h2h1}_{p6p4p5h1h3h2} - P^{p4p5p6h3h2h1}_{p6p5p4h1h3h2} + P^{p6p4p5h2h3h1}_{p6p4p5h1h2h3} - P^{p5p4p6h2h3h1}_{p6p4p5h1h2h3} - P^{p4p5p6h2h3h1}_{p6p5p4h1h2h3}) t^{p7p6}_{h2h3} t^{p4p5}_{h8h9} t^{p10}_{h1} v^{h8h9}_{p7p10} \\
& + (1 + P^{p4p6p5h2h3h1}_{p6p5p4h2h3h1} - P^{p4p5p6h2h3h1}_{p5p6p4h3h2h1} + P^{p4p5p6h2h3h1}_{p5p6p4h2h3h1} - P^{p4p6p5h2h3h1}_{p6p5p4h3h2h1} - P^{p5p6p4h2h3h1}_{p5p6p4h3h2h1} - P^{p5p6p4h2h3h1}_{p5p6p4h1h3h2} - P^{p4p6p5h2h3h1}_{p6p5p4h1h3h2} + P^{p4p5p6h2h3h1}_{p5p6p4h3h1h2} - P^{p4p5p6h2h3h1}_{p5p6p4h1h3h2} + P^{p4p6p5h2h3h1}_{p6p5p4h3h1h2} + P^{p5p6p4h2h3h1}_{p5p6p4h3h1h2} + P^{p5p6p4h2h3h1}_{p5p6p4h1h2h3} + P^{p4p6p5h2h3h1}_{p6p5p4h1h2h3} - P^{p4p5p6h2h3h1}_{p5p6p4h2h1h3} + P^{p4p5p6h2h3h1}_{p5p6p4h1h2h3} - P^{p4p6p5h2h3h1}_{p6p5p4h2h1h3} - P^{p5p6p4h2h3h1}_{p5p6p4h2h1h3}) t^{p5p6}_{h7h2} t^{p8p4}_{h9h3} t^{p10}_{h1} v^{h7h9}_{p8p10} \\
& - (1 - P^{p4p5p6h1h2h3}_{p5p4p6h1h2h3} - P^{p6p4p5h1h2h3}_{p5p4p6h1h2h3} + P^{p6p5p4h1h2h3}_{p5p4p6h1h2h3} + P^{p4p6p5h1h2h3}_{p5p4p6h1h2h3} - P^{p5p6p4h1h2h3}_{p5p4p6h1h2h3} - P^{p5p4p6h1h2h3}_{p5p4p6h1h3h2} + P^{p4p5p6h1h2h3}_{p5p4p6h1h3h2} + P^{p6p4p5h1h2h3}_{p5p4p6h1h3h2} - P^{p6p5p4h1h2h3}_{p5p4p6h1h3h2} - P^{p4p6p5h1h2h3}_{p5p4p6h1h3h2} + P^{p5p6p4h1h2h3}_{p5p4p6h1h3h2} + P^{p4p5p6h2h1h3}_{p5p4p6h2h3h1} - P^{p5p4p6h2h1h3}_{p5p4p6h2h3h1} - P^{p4p6p5h2h1h3}_{p5p4p6h2h3h1} + P^{p5p6p4h2h1h3}_{p5p4p6h2h3h1} + P^{p6p4p5h2h1h3}_{p5p4p6h2h3h1} - P^{p6p5p4h2h1h3}_{p5p4p6h2h3h1}) t^{p7p5}_{h1h2} t^{p8p4}_{h9h3} t^{p6}_{h10} v^{h9h10}_{p7p8} \\
& - \tfrac{1}{2} (1 - P^{p4p6p5h3h1h2}_{p4p5p6h3h1h2} - P^{p5p6p4h3h1h2}_{p5p4p6h3h1h2} - P^{p4p5p6h3h1h2}_{p4p5p6h2h1h3} + P^{p4p6p5h3h1h2}_{p4p5p6h2h1h3} + P^{p5p6p4h3h1h2}_{p5p4p6h2h1h3} - P^{p4p5p6h3h2h1}_{p4p5p6h1h2h3} + P^{p4p6p5h3h2h1}_{p4p5p6h1h2h3} + P^{p5p6p4h3h2h1}_{p5p4p6h1h2h3}) t^{p4p5}_{h7h3} t^{p8p9}_{h1h2} t^{p6}_{h10} v^{h7h10}_{p8p9} \\
& + (1 - P^{p5p4p6h1h2h3}_{p6p4p5h1h2h3} - P^{p4p5p6h1h2h3}_{p6p5p4h1h2h3} - P^{p6p4p5h1h2h3}_{p6p4p5h1h3h2} + P^{p5p4p6h1h2h3}_{p6p4p5h1h3h2} + P^{p4p5p6h1h2h3}_{p6p5p4h1h3h2} + P^{p5p6p4h3h2h1}_{p5p4p6h1h2h3} + P^{p4p6p5h3h2h1}_{p4p5p6h1h2h3} - P^{p4p5p6h3h2h1}_{p4p5p6h1h2h3}) t^{p7p6}_{h1h2} t^{p4p5}_{h8h3} t^{p9}_{h10} v^{h8h10}_{p7p9} \\
& + (1 - P^{p5p4p6h1h2h3}_{p6p4p5h1h2h3} - P^{p4p5p6h1h2h3}_{p6p5p4h1h2h3} - P^{p6p4p5h1h2h3}_{p6p4p5h1h3h2} + P^{p5p4p6h1h2h3}_{p6p4p5h1h3h2} + P^{p4p5p6h1h2h3}_{p6p5p4h1h3h2} + P^{p6p4p5h1h2h3}_{p6p4p5h2h3h1} - P^{p5p4p6h1h2h3}_{p6p4p5h2h3h1} - P^{p4p5p6h1h2h3}_{p6p5p4h2h3h1}) t^{p7}_{h1} t^{p8}_{h2} t^{p6}_{h9} t^{p4p5}_{h10h3} v^{h9h10}_{p7p8} \\
& + (1 - P^{p4p6p5h1h2h3}_{p5p6p4h1h2h3} + P^{p4p5p6h1h2h3}_{p5p6p4h1h2h3} - P^{p5p6p4h1h2h3}_{p5p6p4h2h1h3} + P^{p4p6p5h1h2h3}_{p5p6p4h2h1h3} - P^{p4p5p6h1h2h3}_{p5p6p4h2h1h3} - P^{p5p6p4h1h3h2}_{p5p6p4h3h1h2} + P^{p4p6p5h1h3h2}_{p5p6p4h3h1h2} - P^{p4p5p6h1h3h2}_{p5p6p4h3h1h2}) t^{p7}_{h1} t^{p5}_{h8} t^{p6}_{h9} t^{p10p4}_{h2h3} v^{h8h9}_{p7p10}
\end{align*}
CCSDTQ-LR quadruples
Summary:
· · · /ccsdtq_lr> wc -l ccsdtq_t[1234].out
15 ccsdtq_t1.out
38 ccsdtq_t2.out
53 ccsdtq_t3.out
74 ccsdtq_t4.out
180 total
These 180 equations, each of which may include dozens of permutations of one contraction, are implemented as 50 KLOC of Fortran 77.
Side note: The operator algebra and code generation for CCSDTQ takes close to an hour. CCSDTQP exhausted the memory on all known machines capable of running Python.
Summary
Know:
Don’t send a runtime to do an algorithm’s job.
Elemental/cyclic distribution solves the critical data distribution problem of symmetric tensors.
Communication-reducing algorithms for sorting and matmul can be reused for tensors.
Binary contractions are just one piece of the puzzle.
Want:
Generalized N-body requires continuous FMM for r^{−α}
Getting to science is hopeless without automation.
Acknowledgments
Berkeley: Edgar Solomonik