18
Implementation of Density Functional Theory based Electronic Structure Codes on Advanced Computing Architectures W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

  • Upload
    joann

  • View
    39

  • Download
    0

Embed Size (px)

DESCRIPTION

Implementation of Density Functional Theory based Electronic Structure Codes on Advanced Computing Architectures. W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison. Cray-X1E. 1024 Multi-streaming vector processor (MSP) Each MSP has 2 MB of cache and a peak computation rate of 12.8 GF - PowerPoint PPT Presentation

Citation preview

Page 1: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Implementation of Density Functional Theory based Electronic Structure Codes on Advanced Computing

Architectures

W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Page 2: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Cray-X1E• 1024 Multi-streaming vector processor (MSP)

– Each MSP has 2 MB of cache and a peak computation rate of 12.8 GF– 4 single-streaming processors (SSPs) form a node with 16 Gbytes of

shared memory– Memory is physically distributed on individual modules– all memory is directly addressable to and accessible by any MSP in the

system through the use of load and store instructions

Page 3: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

Cray XT3

• 5294 nodes• 4-processors per node connected through

hypertransport 2.4-GHz AMD Opteron processor and 2 GB of memory

• SDRAM memory controller and function of Northbridge is pulled onto the Opteron die. Memory latency reduced to <60 ns

• Interface off the chip is an open standard (HyperTransport)

Six Network LinksEach >3 GB/s x 2

(7.6 GB/sec Peak for each link)

6.5 GB/secSustained

2.17 GB/secSustained

Page 4: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Applications• Porphyrin Functionalized

Nanotube • 1532 atoms• STO-3G basis set = 6380 basis

functions• Do the covalently attached

porphyrins undergo facile absorption of visible light and transferelectrons to the nanotube

• What type of efficiencies does one obtain

• Time for LDA energy + gradient using 300 processors = less than 1 hour

Page 5: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

LSDA &Multiple Scattering Theory (MST)

1

2{( )1 } 1 ( ')2

Im ( ) Tr

Im (

( , '; )

( , '; )

( , '

( )

r )) ;( ) T1

eff G

G

Vm

dn f

d f G

π

ε δ

ε ε μ

μ σ

ε

ε εε

ε

π

+ ∇ − = −

= −

= −

r

m

r r

r r

r r

r

r

r) )) )

)

)

h

%

No

Yes

Mix

in & out

Initial guess

nin(r) , min(r)

Calculate

Veff[n,m]in

Solve Schrodinger

Equation

Recalculate

nout(r) , mout(r)

nin(r) = nout(r) ?

min(r) =mout(r) ?

Calculate

Total Energy

Multiple Scattering Theory (MST)J. Korringa, Physica 13, 392, (1947)W. Kohn, N. Rostoker, PR, 94, 1111,(1954)

MST Green function methodsB. Gyorffy, and M. J. Stott, “Band Structure

Spectroscopy of Metals and Alloys”, Ed. D.J. Fabian and L. M. Watson (Academic 1972)

S.J. Faulkner and G.M. Stocks, PR B 21, 3222, (1980)

Page 6: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Algorithm Design for future generation architectures

• More accurate• Spectral or pseudo-spectral accuracy

• Wider range of applicability

• Sparse representation• Memory requirements grow linearly

• Each processor can treat thousands of atoms• Make use of large number of processors

• Message-Passing• Each atom/node local message-passing is independent of the size of the system

• Time consuming step of model• Sparse linear solver

•Direct or preconditioned iterative approach

Page 7: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Multiple Scattering Theory

nR

n′R

nr

nr

r

′r

n

n′

decay slowly with increasing distancecontain free-electron singularities

Generalization of t-matrix. Converts incoming wave at site n into outgoing wave at site m in the presence of all the other sites

M(ε) =

t0−1 −G01(ε) −G02 (ε) −G03 (ε)

−G10 (ε) t1−1 −G12 (ε) −G1m(ε)

−G20 (ε) −G21(ε) t2−1 −G2m(ε)

−G30 (ε) −G31(ε) −G32 (ε) tm−1

⎢ ⎢ ⎢ ⎢

⎥ ⎥ ⎥ ⎥

Multiple scattering theory• Green function

• Scattering path matrix

τnm(ε) =tn(ε)δnm+ tnn≠m

∑ (ε)G( r Rn −

r Rm)τ

nm(ε)

τnm(ε) =(Mnm(ε))−1

ML ′ Lnm(ε) =tL ′ L

−1δL ′ L δnm−GL ′ L ( r Rn −

r Rm)

Gnm( r r, r′ r ;ε) = ZL

n

L ′ L

∑ ( r rn;ε)τL ′ L

nm(ε)Z ′ Lm(

r′ rm;ε)−ZL

n( r r<;ε)J ′ L

m( r′ r>;ε)δL ′ L

Gnm(ε) =−14π

eiε 1/ 2

r Rn−

r Rm

r Rn −

r Rm

Page 8: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Complex Energy Plane

The scattering properties at complex energy can be used to develop highly efficient real-space and k-space methods

Real ε

Im ε

εf

Semi-conductors and insulators could work well since they have no states at εf

Real ε

Im ε

εf

Scattering is local since there are no states near the bottom of the energy contourScattering is local since a large Im ε is equivalent to rising temperature which smears out the states

Near εf scattering is non-local (metal)

εf is the highest occupied electronic state in energy

Page 9: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Multiple Scattering Theory

τ−1L ′ L (ε) =t

L , ′ L ,0

−1 (ε) - GL ′ L(R

ij,ε)

Depends on constituent V(r)Independent of lattice

Depends on ε and Underlying lattice structure

Representation is ideal for disorder systems since t-matrixis site diagonal !!Coherent Potential Approximation (CPA)Non-local CPA (based Dynamical Cluster Approximation toDynamical mean-field theory

Page 10: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

t-matrix• Solve for t-matrix inside

Voroni polyhedron– Using Calergo method

R l (r) =jl (κr)−κ s∫ 2V(s)R l (s) nl (κs) jl (κr)−jl (κs)nl (κr)[ ]ds

R l (r) =cl (r) jl (κr)−sl nl (κr)dcl (r)

dr=−κr2V(r)nl (κr) cl (r) jl (κr)−sl nl (κr)[ ]

dsl (r)dr

=−κr2V(r) jl (κr) cl (r) jl (κr)−sl nl (κr)[ ]

Equations solved for both regular and irregular solutionsSince each atom and is independent can solve in shared memory. Similar for the structure constants.

l

Page 11: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Parallel Implementation

• Green’s function

• Scattering path matrix: real space

t : scattering from single site

G: structure constant matrix

M=[t-1()-G(Rnm,]

τ=M-1

• Once M is fixed increasing N does not

affect the local calculation of M-1

Gnm( r r, r′ r ;ε) = ZL

n

L ′ L

∑ ( r rn;ε)τL ′ L

nm(ε)Z ′ Lm(

r′ rm;ε)−ZL

n( r r<;ε)J ′ L

m( r′ r>;ε)δL ′ L

Page 12: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Tight-Binding MST Representation

Tight Binding Multiple Scattering Theory

Embed a constant repulsive potential

Shifts the energy zero allowing for calculations at negative energy

Rapidly decaying interactions

Free electron singularities are not a problem

Sparse representation

G ,nms(ε)∝e−(Vs−ε)1/ 2 |

r R n−

r R m|

Gnm(ε)∝e−ε1/ 2

r R n−

r R m

r Rn −

r Rm

G =G0 +G0tG0 +G0tG0tG0 =G0 (I −tG0 )−1

G−1 =G0

−1 −tG r =G0 +G0t

rG0 +G0trG0t

rG0 =G0 (I −trG0 )−1

(G r )−1 =G0

−1 −tr → G0

−1 =(G r )−1 +tr

G−1 =(G r )−1 +tr −t → (G r )−1 −Δt

}

2 Ryd.

0Vr

Constant inside a sphere

r R ≤

r Rmt

r R >

r Rmt

Page 13: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Screened Structure Constants

Gs (ε) =[I −tsGfree(ε)]−1

G( r k,ε) = Gm,s (ε)ei

r k•

r Rm

m

• Screened Structure Constants Gs

on the left unscreened on the right– Screened structure constants

rapidly go to zero, whereas the free space structure constants have hardly changed

• Linear solve using m atom cluster that is less than the n atom system

• Easy to perform Fourier transform–K-space method

Page 14: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Screened MST Methods

• Formulation produces a sparse matrix representation– 2-D case has tridiagonal structure with a few distant elements due to

periodicity– 3-D case has scattered elements

• Mainly due to mapping 3-D structure to a matrix (2-D)• A few elements due to periodic boundary conditions

• Require block diagonals of the inverse of τε matrix– Block diagonals represent the site τε matrix and are needed to

calculate the Green’s function for each atomic site• Sparse direct and preconditioned iterative methods are used to calculate

τii(ε)– SuperLU– Transpose free Quasi-Minimal Residual Method (TFQMR)

Page 15: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Screened KKR Accuracy

fcc Cubcc Cubcc Mohcp Co

Page 16: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Timing and Scaling of Scr-K KKR-CPA

Page 17: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Conclusion

• Initial benchmarking of the Screened KKR method– SuperLU N1.8 for finding the inverse of the upper left block of τ– TFQMR with block Jacobi preconditioner N1.06 for finding the inverse of

the upper left block of τ• Extremely high sparsity (97%-99% zeros increases with increasing

system size)• Large number of atoms on a single processor• Real-space/Scr-KKR hybrid may provide the most efficient parallel

approach for new generation architectres• Single code contains

– LSMS, KKR-CPA, Scr-LSMS and Scr-KKR-CPA

Page 18: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison

Local Poisson EquationDiscontinuous Galerkin approach

σh ⋅τK∫ dx=- uh∇⋅τ

K∫ dx+ ˆ uKK

∫ ⋅nKτ ,ds ∀τ ∈ ( )K∑

σh ⋅∇νK∫ dx=- fνK∫ dx+ ˆ σ KK∫ ⋅nKν ,ds ∀ν ∈ P(K )

−Δu=f inΩ

∇u=σ∇⋅σ =f

τ =tensor basis in d-dimension

ν =basis

uh =is ,the representation of u on an element

same forσh

√ uh =