Hybrid Programming Models in Computational Chemistry

Edo Apra, George I. Fann¹, and Robert J. Harrison¹,²
¹Oak Ridge National Laboratory
²University of Tennessee, Knoxville
Outline
• This presentation provides a brief overview of two programming models that computational chemists have been using that are scalable, achieve high performance, and are practical:
  – Global Arrays, as used in PNNL's NWChem
  – Futures and the MADNESS runtime, as used in ORNL's MADNESS
http://code.google.com/p/m-a-d-n-e-s-s
http://harrison2.chem.utk.edu/~rjh/madness/
open source GPL-2
Schrödinger's Equation, Electronic Structure

Electronic structure, e.g. Hartree-Fock wavefunction methods, where N is the number of particles (e.g. electrons and nuclei).

Kohn-Sham Equations
• Ground-state energy, given as a functional of the charge density, with kinetic energy, exchange potential + nuclei-electron + electron-electron contributions
• Kinetic energy, summing over all particles
• Electron-electron energy, including electrostatic and exchange-correlation terms
• Re-introduce orthonormal wavefunctions
• Define a functional, with Lagrange multipliers to ensure orthonormality
• Minimize with respect to the orbitals (standard forms are sketched below)
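In standard textbook notation (a sketch added here for reference, not reproduced from the original slide graphics), the Kohn-Sham forms these bullets refer to are:

$$E[\rho] = T_s[\{\psi_i\}] + \int v_{\mathrm{ext}}(\mathbf r)\,\rho(\mathbf r)\,d\mathbf r + \frac{1}{2}\iint \frac{\rho(\mathbf r)\,\rho(\mathbf r')}{|\mathbf r - \mathbf r'|}\,d\mathbf r\,d\mathbf r' + E_{xc}[\rho]$$

$$\rho(\mathbf r) = \sum_i |\psi_i(\mathbf r)|^2, \qquad T_s = -\frac{1}{2}\sum_i \langle \psi_i | \nabla^2 | \psi_i \rangle$$

Imposing orthonormality with Lagrange multipliers and minimizing with respect to the orbitals gives

$$\mathcal L = E[\rho] - \sum_{ij} \varepsilon_{ij}\left(\langle\psi_i|\psi_j\rangle - \delta_{ij}\right), \qquad \frac{\delta \mathcal L}{\delta \psi_i^*} = 0 \;\Rightarrow\; \left(-\tfrac{1}{2}\nabla^2 + v_{\mathrm{eff}}(\mathbf r)\right)\psi_i = \varepsilon_i\,\psi_i$$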
NWChem
• Premier DOE computational chemistry software developed at PNNL
• Provides science to solution
• Provides major modeling and simulation capability for molecular science
– Broad range of molecules, including catalysts, biomolecules, and heavy elements
– Solid state capabilities
• Performance characteristics – designed for MPP
– Single node performance comparable to best serial codes
– Runs on a wide range of computers
• Freely distributed → large and active user community
• 1000 plus peer-reviewed publications citing NWChem usage
• Uses Global Arrays/ARMCI for parallelization
• ORNL contribution: Cray XT porting and tuning
NWChem Architecture

(architecture diagram)
• Generic Tasks: Energy, structure, …
• Molecular Calculation Modules: DFT energy, gradient, …; MD, NMR, Solvation, …; Optimize, Dynamics, …; SCF energy, gradient, …
• Molecular Modeling Toolkit: Integral API, Geometry Object, Basis Set Object, PeIGS, …
• Molecular Software Development Toolkit: Run-time database, Global Arrays, Memory Allocator, Parallel IO

Object-oriented design
• abstraction, data hiding, APIs
Parallel programming model
• non-uniform memory access, Global Arrays, MPI
Infrastructure
• GA, Parallel I/O, RTDB, MA, ...
Program modules
• communication only through the database
• persistence for easy restart
Global Arrays

Legacy of Jarek Nieplocha

• Single, shared data structure with global indexing
  – e.g., access A(4,3) rather than buf(7) on task 2
• Physically distributed data
• Distributed dense arrays that can be accessed through a shared-memory-like style
• High-level abstraction layer for the application developer (that's me!)
• One-sided model = no need to worry about send/receive

Global Address Space
GA call to ARMCI communication
Gaussian DFT computational kernel
Evaluation of XC potential matrix element
my_next_task = SharedCounter()
do i = 1, max_i
   if (i .eq. my_next_task) then
      call ga_get()
      (do work)
      call ga_acc()
      my_next_task = SharedCounter()
   endif
enddo
barrier()
Both GA operations are strongly dependent on the communication latency.
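Below is a minimal, illustrative C++ sketch of the same shared-counter pattern written against the Global Arrays C API; the array sizes, the "work" step, and names such as g_data are placeholders and not NWChem code (NGA_Read_inc supplies the atomic counter):

#include <ga.h>
#include <macdecls.h>
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    GA_Initialize();

    const int max_i = 64, block = 100;           // placeholder problem size
    int dims[1] = {max_i * block};
    int g_data    = NGA_Create(C_DBL, 1, dims, (char*)"data", nullptr);
    int g_result  = NGA_Create(C_DBL, 1, dims, (char*)"result", nullptr);
    int cdims[1]  = {1};
    int g_counter = NGA_Create(C_LONG, 1, cdims, (char*)"counter", nullptr);
    GA_Zero(g_data); GA_Zero(g_result); GA_Zero(g_counter);
    GA_Sync();

    int zero[1] = {0};
    long my_next_task = NGA_Read_inc(g_counter, zero, 1);   // atomic shared counter
    std::vector<double> buf(block);
    int ld[1] = {block};

    for (long i = 0; i < max_i; ++i) {
        if (i == my_next_task) {
            int lo[1] = {static_cast<int>(i) * block};
            int hi[1] = {lo[0] + block - 1};
            NGA_Get(g_data, lo, hi, buf.data(), ld);          // one-sided get
            for (double& x : buf) x += 1.0;                   // (do work)
            double one = 1.0;
            NGA_Acc(g_result, lo, hi, buf.data(), ld, &one);  // one-sided accumulate
            my_next_task = NGA_Read_inc(g_counter, zero, 1);
        }
    }
    GA_Sync();                                                // barrier()
    GA_Terminate();
    MPI_Finalize();
    return 0;
}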
Methods for Latency Hiding
• Aggregating small messages
  – Not always feasible, depending on the algorithm
• Data replication
  – Large memory requirement
  – Use of collective operations → scaling bottleneck
• Non-blocking communication (see the sketch after this list)
  – Not always supported by either the communication layer (MPI) or the network
  – HW and SW must be able to overlap communication and computation
• Task-based parallelism
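A rough sketch of the non-blocking idea using GA's non-blocking one-sided calls (NGA_NbGet/NGA_NbWait): prefetch the next block while computing on the current one. The array handle, block sizes, and the compute step are illustrative assumptions, not code from NWChem.

#include <ga.h>
#include <vector>

void process_blocks(int g_a, int nblocks, int block) {
    std::vector<double> cur(block), next(block);
    int ld[1] = {block};
    ga_nbhdl_t handle;

    int lo[1] = {0}, hi[1] = {block - 1};
    NGA_Get(g_a, lo, hi, cur.data(), ld);                 // fetch the first block

    for (int b = 0; b < nblocks; ++b) {
        if (b + 1 < nblocks) {                            // start prefetch of block b+1
            int nlo[1] = {(b + 1) * block};
            int nhi[1] = {nlo[0] + block - 1};
            NGA_NbGet(g_a, nlo, nhi, next.data(), ld, &handle);
        }
        // ... compute on cur while the next transfer is in flight ...
        if (b + 1 < nblocks) {
            NGA_NbWait(&handle);                          // complete the prefetch
            cur.swap(next);
        }
    }
}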
Issues in Petaflops Computing - Programming Models
• Message passing
  – MPI-1: the current workhorse
  – Strengths and limitations
  – Failure of MPI-2
  – Promising new features of MPI-3, under development/discussion: fault tolerance, non-blocking collectives, Remote Memory Access (RMA)
• Partitioned global address space (PGAS) programming models
  – Libraries: Global Arrays
  – Languages: X10, Chapel, Fortress, Charm++, MADNESS RT
  – Networking layers: GASNet, ARMCI, more to come, …
• Hybrid approaches using MPI inter-node plus threading intra-node (see the sketch after this list)
  – OpenMP, Pthreads, Intel Threading Building Blocks (TBB), etc.
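A minimal sketch of the hybrid MPI + OpenMP pattern just mentioned: one MPI rank per node with OpenMP threads inside the node. It assumes an MPI implementation providing MPI_THREAD_FUNNELED; the vector sum is only a stand-in for real work.

#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int n = 1 << 20;
    std::vector<double> x(n, 1.0);

    double local = 0.0;
    #pragma omp parallel for reduction(+:local)   // intra-node threading
    for (int i = 0; i < n; ++i) local += x[i];

    double global = 0.0;                          // inter-node reduction
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("sum = %.1f\n", global);

    MPI_Finalize();
    return 0;
}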
Trends in Chemistry Codes
• Multi-level parallelism
– Effective path for most applications to scale to 100K processors
– Examples: coarse grain over vibrational degrees of freedom in numerical hessian, or geometries in a surface scan or parameter study
– Conventional distributed memory within each subtask
– Fine grain parallelism within a few processor SMP (multi-threads, OpenMP, parallel BLAS, …)
• Efficient exploitation of fine grain parallelism is a major concern on future architectures
ARMCI GAS Runtime

(diagram) Application → { Global Arrays, CAF, GPSHMEM, Message Passing (MPI) } → ARMCI → native interfaces: LAPI, GM/Myrinet, threads, IB, Portals

ARMCI can coexist with MPI as an implementation layer for multiple programming models and applications.
• Remote Memory Access Functionality (see the sketch after this list)
– put, get, accumulate (also handles noncontiguous data transfers)
– nonblocking communication interfaces
– atomic read-modify-write, mutexes and locks
– memory allocation
– fence operations for managing memory consistency
• Characteristics
– simple progress rules (like Cray SHMEM)
– widespread portability through ports to native network interfaces
– exploits and exposes shared memory to the user
– works with MPI
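A rough, version-dependent sketch of the put/fence functionality listed above, against the ARMCI C API (the buffer sizes and neighbour exchange are illustrative; exact signatures can differ between ARMCI releases, so treat this as an assumption rather than a reference):

#include <armci.h>
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    ARMCI_Init();

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int n = 1024;
    // Collective allocation: every process receives a pointer to every
    // process's segment.
    std::vector<void*> ptrs(nprocs);
    ARMCI_Malloc(ptrs.data(), n * sizeof(double));

    std::vector<double> local(n, static_cast<double>(rank));
    int right = (rank + 1) % nprocs;

    // One-sided put of my buffer into my right neighbour's segment.
    ARMCI_Put(local.data(), ptrs[right], (int)(n * sizeof(double)), right);
    ARMCI_Fence(right);             // enforce remote completion
    MPI_Barrier(MPI_COMM_WORLD);    // ARMCI is designed to coexist with MPI

    ARMCI_Finalize();
    MPI_Finalize();
    return 0;
}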
ARMCI port for Cray XT
• Part of collaborative effort between ORNL and PNNL. This ORNL contributed port will be integrated in GA/NWChem releases distributed by PNNL
• Uses the Portals library for inter-node communication
• SMP aware: uses SysV shared memory for intra-node communication
• Server-layer for calls that do not map directly onto the network
XT Specific enhancements to ARMCI
• Cloning vs. pthreads/fork
  – The traditional implementation forks a process to act as the Communication Helper Thread, or creates a thread using the pthreads library
  – For XT, to ensure the Portals-specific environment is not duplicated, we clone with a clean stack
• State checking
  – For debugging, advanced state information is maintained and aggregated to obtain the overall communication state in case of errors or race conditions
• Smarter memory usage for communication
  – A single Portals memory descriptor (MD) is created and used across multiple application memory allocations, even across process groups
• Data response as ACK’s
– Utilizes Cray Portals events to consider data responses as acknowledgements
• Utilizes a single copy of Portals NID/PID information necessary for communication across all the clients and communication helper thread
(diagram: functionality that matches the Portals API is handled by a thin wrapper over Portals/CLE; missing functionality gets a full implementation)
Global Address Space
• Global Address Space (GAS) programming models: Co-Array Fortran, SHMEM, and X10
• Global Arrays = GAS library that allows PGAS-style programming (i.e. distinction of global and local memory)
Preliminary Steps for a CCSD(T) Run
• Hartree-Fock run
  – requires a few hundred procs; followed by
• CCSD (iterative run)
  – starts from the molecular orbitals generated by Hartree-Fock
  – requires a few thousand procs; communication intensive
• CCSD(T) reuses
  – the T1 and T2 amplitudes computed by CCSD
  – the HF molecular orbitals
NWChem code changes required to
scale beyond 100K cores
• Many-to-one communication patterns cause the job
– to progress very slowly (best case)
– to hang (or worse)
• Diagnosis: cumbersome … lucky coredumps!
• Causes: several processors simultaneously accessing the same patch of a matrix, where the patch is owned by a single processor.
• Solutions:
– Staggered access (see the sketch after this list)
– Use of a subset of processing elements
– Atomic shared counter: first token is static
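A tiny illustrative sketch of the "staggered access" idea (the helper and its work routine are hypothetical, not NWChem code): every process visits all patches, but starts its sweep at a rank-dependent offset so that simultaneous requests land on patches with different owners.

#include <functional>

void staggered_sweep(int me, int nproc, int npatches,
                     const std::function<void(int)>& process_patch) {
    const int start = static_cast<int>((static_cast<long long>(me) * npatches) / nproc);
    for (int k = 0; k < npatches; ++k) {
        const int patch = (start + k) % npatches;  // rank-dependent visiting order
        process_patch(patch);                      // e.g. ga_get / work / ga_acc
    }
}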
CCSD(T) run on Cray XT5: 24 waters, (H2O)24
• 72 atoms, 1224 basis functions
• cc-pVTZ(-f) basis
• November 2009
• Floating-point performance at 223K cores: 1.39 PetaFLOP/s
Scientific impact

Potential development philosophy:
• Water clusters → molecular-level information: binding energies, structures
• Liquid water, ices, aqueous interfaces → macroscopic properties: enthalpies (H), radial distribution functions, diffusion coefficient, dielectric constant, 2nd virial, …

Picture: snapshot during a nuclear quantum statistical trajectory (32 beads/atom) of liquid water with the developed interaction potential
Objective: model the properties of liquid water and ice
Approach: rely on highly accurate electronic structure results in the absence of experimental data
Implementation: highly accurate CCSD(T) binding energies of water clusters up to (H2O)24; cannot be obtained on commodity hardware, requires petascale hardware
Scientific results:
• first quantitative information on the magnitude of quantum effects in liquid water
• accurate description of structural (radial distribution functions), thermodynamic (enthalpy), transport (diffusion coefficient), and spectral (infrared and 2-D infrared) properties of liquid water
• testbed for the development of a general polarizable force field to study aqueous solvation of biomolecules
Summary
• Outstanding 1.39 PetaFLOP/s sustained double-precision performance using the freely distributed NWChem electronic structure code
• Performance result obtained on the full Cray XT5 machine at 60% of peak FP, fully exploiting the aggregate CPU, RAM, and network resources
• Innovative use of the one-sided communication model via the Global Arrays library
• Addresses important scientific questions regarding the modeling of water
What is MADNESS?
• A general purpose numerical environment for reliable and fast scientific simulation
  – Chemistry, nuclear physics, atomic physics, material science, nanoscience, climate, fusion, ...
• A general purpose parallel programming environment designed for the peta/exa-scale
  – Standard C++ with concepts from Cilk, Charm++, HPCS languages
  – Compatible by design with existing applications
  – Runs on the world's largest computers
What is MADNESS?
• MADNESS is a framework
  – Like NWChem, PETSc, ...
• Frameworks
  – Increase productivity; hide complexity
  – Interface disciplines; capture knowledge
  – Open HPC to a wider community
  – Long-lived, communal projects with broad impact
(stack diagram: Applications on top of Numerics on top of the Parallel Runtime)
Why MADNESS?
• MADNESS addresses many of the sources of complexity that constrain our HPC ambitions
  – Science, physics, theory, ...
    • Constantly evolving, but can take years to implement
  – Scalable algorithms and math
    • Need rapid deployment of the latest and greatest
  – Software
    • Crude parallel programming tools with explicit expression and management of concurrency and data
  – Hardware
    • Millions of cores with deep memory hierarchy
Why MADNESS?
• Reduces S/W complexity
  – MATLAB-like level of composition of scientific problems with guaranteed speed and precision
  – Programmer not responsible for managing dependencies, scheduling, or placement
• Reduces numerical complexity
  – Solution of integral, not differential, equations
  – Framework makes the latest techniques in applied math and physics available to a wide audience
E.g., with a guaranteed precision of 1e-6, form a numerical representation of a Gaussian in the cube [-20,20]^3, solve Poisson's equation, and plot the resulting potential (all running in parallel with threads+MPI).

There are only two lines doing real work. First the Gaussian (g) is projected into the adaptive basis to the default precision. Second, the Green's function is applied. The exact results are norm=1.0 and energy=0.3989422804. (A sketch using the MADNESS API follows below.)

output: norm of f 1.00000000e+00 energy 3.989420526e-01
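A minimal sketch of what such a program can look like with the public MADNESS C++ API. This is not the authors' exact code: header paths and the initialization idiom vary between MADNESS releases, and the 0.5 factor in the energy reflects the Gaussian self-energy convention assumed here.

// Project a Gaussian into the adaptive basis, apply the Poisson Green's
// function, and report the norm and Coulomb self-energy.
#include <madness/mra/mra.h>
#include <madness/mra/operator.h>
using namespace madness;

// Normalized Gaussian: integrates to 1 over all space.
static double gaussian(const coord_3d& r) {
    const double r2 = r[0]*r[0] + r[1]*r[1] + r[2]*r[2];
    return std::pow(constants::pi, -1.5) * std::exp(-r2);
}

int main(int argc, char** argv) {
    initialize(argc, argv);
    World world(SafeMPI::COMM_WORLD);

    FunctionDefaults<3>::set_cubic_cell(-20.0, 20.0);   // the cube [-20,20]^3
    FunctionDefaults<3>::set_thresh(1e-6);              // guaranteed precision

    real_function_3d g = real_factory_3d(world).f(gaussian);   // adaptive projection
    real_convolution_3d op = CoulombOperator(world, 1e-4, 1e-6);
    real_function_3d u = op(g);                                 // solve Poisson's equation

    const double norm = g.trace();              // exact: 1.0
    const double energy = 0.5 * inner(u, g);    // exact: 1/sqrt(2*pi) ~ 0.3989422804
    if (world.rank() == 0) print("norm", norm, "energy", energy);

    finalize();
    return 0;
}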
Example Applications

Molecular HF and DFT
• Energy and gradients
• ECPs coming (Sekino, Kato)
• Response properties (Vasquez and Sekino)
• Still not as functional as the previous Python version

(figure: spin density of a solvated electron)
Nuclear physics
J. Pei, G.I. Fann, Y. Ou, W. Nazarewicz (UT/ORNL)
• DOE UNEDF
• Nuclei & neutron matter
• ASLDA
• Hartree-Fock-Bogoliubov
• Spinors
• Gamow states

(figure: imaginary part of the seventh eigenfunction of a two-well Woods-Saxon potential)
Solid-state physics
• Thornton, Eguiluz and Harrison (UT/ORNL)
  – NSF OCI-0904972: Computational chemistry and physics beyond the petascale
• Full band structure with LDA and HF for periodic systems
• In development: hybrid functionals, response theory, post-DFT methods such as GW, and model many-body Hamiltonians via Wannier functions

(figure: Coulomb potential isosurface in LiF)
Time evolution
Vence, Harrison, Krstic, Jia, Fann (UT/ORNL)
• Multiwavelet basis not optimal
  – Not strongly band limited; explicit methods unstable
  – DG introduces flux limiters; we use filters
• Semi-group approach
  – Split into linear and non-linear parts
• Trotter-Suzuki methods
  – Time-ordered exponentials
  – Chin-Chen gradient correction (JCP 114, 7338, 2001)
$$\dot u(x,t) = L\,u + N(u,t)$$
$$u(x,t) = e^{Lt}\,u(x,0) + \int_0^t e^{L(t-\tau)}\,N(u,\tau)\,d\tau$$
$$e^{A+B} = e^{A/2}\,e^{B}\,e^{A/2} + O\!\left(\left\|[[A,B],A]\right\|\right)$$
Dynamics of H2+ in a laser field
• 4D: 3 electronic coordinates + the internuclear coordinate
• First simulation with quantum nuclei and a non-collinear field (the field shown is transverse)

(figure: electronic dipole (a.u.) and field versus time (a.u.), and internuclear separation R (a.u.))
Nanoscale photonics (Reuter, Northwestern; Hill, Harrison, ORNL)
Diffuse domain approximation for an interior boundary value problem; long-wavelength Maxwell equations; Poisson equation; micron-scale Au tip 2 nm above a Si surface with an H2 molecule in the gap – a 10^7 difference between the shortest and longest length scales.
Advantages of Integral Form
• E.g.,
  – Often soluble without any preconditioning, and often without any iteration (as in this case)
  – Can obtain higher accuracy
  – In simple domains, builds in correct asymptotics
  – Potentially more computationally efficient
• Challenge and solution
  – In most bases the integral operator is dense & slow
  – Multiresolution analysis provides fast algorithms with guaranteed precision
$$\nabla^2 u(r) = -4\pi\rho(r) \quad \text{vs.} \quad u(r) = \int G(r,r')\,\rho(r')\,d^3r'$$

Electrostatics:
$$\nabla^2 u(r) = -4\pi\rho(r)$$
$$u(r) = \int G(r,r')\,\rho(r')\,d^3r' + \oint_{\partial\Omega}\left[G(r,r')\,\nabla' u(r') - u(r')\,\nabla' G(r,r')\right]\cdot dS$$
$$G(r,r') = \frac{1}{4\pi\,|r-r'|}$$

Quantum mechanics:
$$\left(-\tfrac{1}{2}\nabla^2 + V(r)\right)\psi(r) = E\,\psi(r)$$
$$\psi(r) = -2\left(\nabla^2 + 2E\right)^{-1} V(r)\,\psi(r)$$

Time evolution:
$$\hat{L}\,u(r,t) + N(u,t) = \frac{du}{dt}$$
$$u(r,t) = e^{t\hat{L}}\,u(r,0) + \int_0^t e^{(t-\tau)\hat{L}}\,N(u,\tau)\,d\tau$$
$$e^{t\nabla^2} f(r) = (4\pi t)^{-d/2}\int e^{-\frac{(r-r')^2}{4t}}\,f(r')\,d^d r'$$
The math behind the MADNESS
• Discontinuous spectral element basis
  – High-order convergence ideally suited for modern computer technology
• Multi-resolution analysis for fast algorithms
  – Sparse representation of many integral operators
  – Precision guaranteed through adaptive refinement
• Separated representations of operators and functions
  – Enable efficient computation in many dimensions
Essential techniques for fast computation
• Multiresolution
$$V_0 \subset V_1 \subset \cdots \subset V_n, \qquad V_n = V_0 \oplus (V_1 \ominus V_0) \oplus \cdots \oplus (V_n \ominus V_{n-1})$$
• Low-separation rank
$$f(x_1, \ldots, x_d) = \sum_{l=1}^{M} \sigma_l \prod_{i=1}^{d} f_i^{(l)}(x_i) + O(\epsilon), \qquad \| f_i^{(l)} \|_2 = 1, \ \ \sigma_l > 0$$
• Low-operator rank
$$A = \sum_{\mu=1}^{r} \sigma_\mu\, u_\mu v_\mu^T + O(\epsilon), \qquad \sigma_\mu > 0, \ \ v_\mu^T v_\nu = u_\mu^T u_\nu = \delta_{\mu\nu}$$
1-D Scaling Function Basis: Alpert's Multiwavelets

Divide the domain into 2^n pieces (level n)
– Adaptive sub-division (local refinement)
– lth sub-interval [l·2^{-n}, (l+1)·2^{-n}], l = 0, …, 2^n − 1

In each sub-interval define a polynomial basis
– First k Legendre polynomials P_i, rescaled and normalized on [0,1)
– Orthonormal, disjoint support

$$\phi_i(x) = \sqrt{2i+1}\;P_i(2x-1), \quad x \in [0,1)$$
$$\phi_{il}^{\,n}(x) = 2^{n/2}\,\phi_i(2^n x - l), \quad 2^n x - l \in [0,1), \quad 0 \le l < 2^n$$

(figure: refinement levels n = 0, 1, 2 with sub-intervals l = 0, …, 3)
Scaling Function Basis: Alpert's, level n = 2

(figure: the scaling functions $\phi_i^{\,2}(x)$, i = 0, …, 3, plotted on [0,1))
Alpert's Multiwavelet Basis

(figure: the multiwavelets $\psi_i(x)$, i = 0, …, 3, for k = 4, n = 0, plotted on [0,1))
Multiwavelet Basis

An orthonormal basis spanning $W_{n-1} = V_n \ominus V_{n-1}$.

Two-scale relations:
$$\phi_{il}^{\,n-1}(x) = \sum_{j=0}^{k-1}\left( h^{(0)}_{ij}\,\phi_{j,2l}^{\,n}(x) + h^{(1)}_{ij}\,\phi_{j,2l+1}^{\,n}(x)\right)$$
$$\psi_{il}^{\,n-1}(x) = \sum_{j=0}^{k-1}\left( g^{(0)}_{ij}\,\phi_{j,2l}^{\,n}(x) + g^{(1)}_{ij}\,\phi_{j,2l+1}^{\,n}(x)\right)$$

Coefficient transforms:
$$s_{l}^{\,n-1} = \tfrac{1}{\sqrt{2}}\left(s_{2l}^{\,n} + s_{2l+1}^{\,n}\right)\ \text{(sum)}, \qquad d_{l}^{\,n-1} = \tfrac{1}{\sqrt{2}}\left(s_{2l}^{\,n} - s_{2l+1}^{\,n}\right)\ \text{(difference)}$$
Non-standard form of functions

Construct a local basis for $W_{n-1} = V_n \ominus V_{n-1}$ (2-D example, n = 0, 1):
$$V_0:\ \phi(x)\,\phi(y)$$
$$V_1:\ \phi(2x)\phi(2y),\ \ \phi(2x)\phi(2y-1),\ \ \phi(2x-1)\phi(2y),\ \ \phi(2x-1)\phi(2y-1)$$
$$W_0 = V_1 \ominus V_0:\ \phi(x)\,\psi(y),\ \ \psi(x)\,\phi(y),\ \ \psi(x)\,\psi(y)$$
Oct-tree Adaptivity
1-D Example: Sub-Tree Parallelism

(figure: a 1-D refinement tree with nodes 0–6, showing the two sub-trees below the root)

Both sub-trees can be done in parallel.
In 3-D nodes split into 8 children … in 6-D there are 64 children.
Separated form for integral operators

• Circumvents exponential scaling with the number of dimensions
• Approach
  – Represent the kernel over a finite range as a sum of products of 1-D operators (often, not always, Gaussian; see the expansion below)
  – Only need to compute the 1-D transition matrices (X, Y, Z)
  – SVD the 1-D operators (low rank away from the singularity)
  – Apply the most efficient choice of low/full-rank 1-D operator
  – Even better algorithms not yet implemented

$$T * f = \int ds\, K(r - s)\, f(s)$$
$$r^{\,n,\,l-l'}_{ii',\,jj',\,kk'} = \sum_{\mu=0}^{M} X^{\,n,\,l_x-l'_x}_{ii',\mu}\; Y^{\,n,\,l_y-l'_y}_{jj',\mu}\; Z^{\,n,\,l_z-l'_z}_{kk',\mu} + O(\epsilon)$$
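For concreteness, the standard separated representation of the Poisson kernel from the multiresolution literature cited in the references (a well-known result, not copied from this slide) writes it, over a finite range $\epsilon \le r \le R$, as a short sum of Gaussians:

$$\frac{1}{r} \;\approx\; \sum_{\mu=1}^{M} c_\mu\, e^{-t_\mu r^2} \;=\; \sum_{\mu=1}^{M} c_\mu\, e^{-t_\mu x^2}\, e^{-t_\mu y^2}\, e^{-t_\mu z^2},$$

so each term factors into a product of 1-D Gaussian convolutions (the X, Y, Z transition matrices above), and the number of terms M grows only logarithmically with the requested precision and dynamic range.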
High-level composition
• Close to the physics
operatorT op = CoulombOperator(k, rlo, thresh);
functionT rho = psi*psi;
double twoe = inner(apply(op,rho),rho);
double pe = 2.0*inner(Vnuc*psi,psi);
double ke = 0.0;
for (int axis=0; axis<3; axis++) {
functionT dpsi = diff(psi,axis);
ke += inner(dpsi,dpsi);
}
double energy = ke + pe + twoe;
$$E = \left\langle \psi \left| -\tfrac{1}{2}\nabla^2 + V \right| \psi \right\rangle + \iint \psi^2(x)\,\frac{1}{|x-y|}\,\psi^2(y)\,dx\,dy$$
Compose directly in terms of functions and operators.

This is a LaTeX rendering of a program to solve the Hartree-Fock equations for the helium atom. The compiler also outputs C++ code that can be compiled without modification and run in parallel.

He atom Hartree-Fock
MADNESS architecture

(software stack, top to bottom)
MADNESS applications – chemistry, physics, nuclear, ...
MADNESS math and numerics
MADNESS parallel runtime
MPI, Global Arrays, ARMCI, GPC/GASNET

Intel Threading Building Blocks: future target for multicore
Runtime Objectives
• Scalability to 1+M processors ASAP
• Runtime responsible for
  – scheduling and placement
  – managing dependencies & hiding latency
• Compatible with existing models (MPI, GA)
• Borrow successful concepts from Cilk, Charm++, Python, HPCS languages
Why a new runtime?
• MADNESS computation is irregular & dynamic
  – 1000s of dynamically-refined meshes changing frequently & independently (to guarantee precision)
• Because we wanted to make MADNESS itself easier to write, not just the applications using it
  – We explored implementations with MPI, Global Arrays, and Charm++, and all were inadequate
• MADNESS is helping drive
  – One-sided operations in MPI-3, DOE projects in fault tolerance, ...
Key runtime elements
• Futures for hiding latency and automating dependency management
• Global names and name spaces
• Non-process-centric computing
  – One-sided messaging between objects
  – Retain place = process for MPI/GA legacy compatibility
• Dynamic load balancing
  – Data redistribution, work stealing, randomization
Futures
• Result of an asynchronous computation
  – Cilk, Java, HPCS languages, C++0x
• Hide latency due to communication or computation
• Management of dependencies
  – Via callbacks

int f(int arg);
ProcessID me, p;
Future<int> r0 = task(p, f, 0);
Future<int> r1 = task(me, f, r0);
// Work until need result
cout << r0 << r1 << endl;

Process "me" spawns a new task in process "p" to execute f(0), with the result eventually returned as the value of future r0. This is used as the argument of a second task whose execution is deferred until its argument is assigned. Tasks and futures can register multiple local or remote callbacks to express complex and dynamic dependencies.
Global Names
• Objects with global names, with different state in each process
  – C.f. shared [threads] in UPC; Co-Array Fortran
• Non-collective constructor; deferred destructor
  – Eliminates synchronization

class A : public WorldObject<A> {
    int f(int);
};
ProcessID p;
A a;
Future<int> b = a.task(p, &A::f, 0);

A task is sent to the instance of a in process p. If this has not yet been constructed, the message is stored in a pending queue. Destruction of a global object is deferred until the next user synchronization point.
Global Namespaces
• Specialize global names to containers
  – Hash table done
  – Arrays, etc., planned
• Replace global pointer (process + local pointer) with a more powerful concept
• User-definable map from keys to "owner" process

class Index;  // Hashable
class Value {
    double f(int);
};
WorldContainer<Index,Value> c;
Index i, j;
Value v;
c.insert(i, v);
Future<double> r = c.task(j, &Value::f, 666);

Namespaces are a large part of the elegance of Python and the success of Charm++ (chares + arrays).
A container is created mapping indices to values. A value is inserted into the container. A task is spawned in the process owning key j to invoke c[j].f(666).
Multi-threaded architecture

(diagram components)
• RMI server (MPI or Portals): incoming active messages
• Task dequeue: incoming active messages; work stealing
• Application logical main thread: outgoing active messages

Work stealing: must augment with cache-aware algorithms and scheduling
Virtualization of data and tasks

Future: holds a parameter on some MPI rank; probe(), set(), get()
Task: input parameters, output parameters; probe(), run()

Future Compress(tree):
    Future left = Compress(tree.left)
    Future right = Compress(tree.right)
    return Task(Op, left, right)

Compress(tree)
Wait for all tasks to complete

(an illustrative sketch of this pattern in standard C++ follows below)

Benefits:
• Communication latency & transfer time largely hidden
• Much simpler composition than explicit message passing
• Positions code to use "intelligent" runtimes with work stealing
• Positions code for efficient use of multi-core chips
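An illustrative sketch (not MADNESS source) of the Compress pattern above using std::async/std::future: each node's result depends on futures from its children, and independent sub-trees may run concurrently. Real MADNESS futures trigger dependent tasks through callbacks rather than the blocking get() used here.

#include <future>
#include <memory>
#include <iostream>

struct Node {
    double value = 0.0;
    std::unique_ptr<Node> left, right;
};

// Returns a future for the "compressed" (here: summed) value of the sub-tree.
std::future<double> compress(const Node* tree) {
    return std::async(std::launch::async, [tree]() -> double {
        if (!tree) return 0.0;
        // Launch both sub-trees; they may execute in parallel.
        std::future<double> l = compress(tree->left.get());
        std::future<double> r = compress(tree->right.get());
        // This task blocks only when its inputs are actually needed.
        return tree->value + l.get() + r.get();
    });
}

int main() {
    Node root;
    root.value = 1.0;
    root.left = std::make_unique<Node>();
    root.left->value = 2.0;
    root.right = std::make_unique<Node>();
    root.right->value = 3.0;

    std::cout << "compressed sum = " << compress(&root).get() << "\n";  // prints 6
    return 0;
}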
Scaling of Poisson and Helmholtz in MADNESS with Lattice of Atom Centered Potential
• Charge density for the Poisson equation
• Scattering for the Helmholtz equation (bound-state or Yukawa)
• Lattice of 1000 atoms
• Potential centered at each atom center
• 10 levels of refinement
• 1.91B equations and variables
Scaling of MADNESS for Poisson and Helmholtz on Cray XT-5
1.91B eqns, 10 levels of refinement, rel. accuracy 1.e-10, 6/2010
(figure: wall time in seconds, 0.01–1000 on a log scale, versus core count from 1024 to 65536, for Projection, Compression, Reconstruction, Poisson, and Helmholtz)
Summary
• MADNESS is a general purpose framework for scientific simulation
  – Conceived for the next (not the last) decade
– Increases HPC productivity by reducing many sources of complexity
– Deploys advanced math, numerics, and C/S
http://code.google.com/p/m-a-d-n-e-s-s
Funding
• DOE: SciDAC, Office of Science divisions of Advanced Scientific Computing Research and Basic Energy Science, and the Math Base Program under contract DE-AC05-00OR22725 with Oak Ridge National Laboratory, in part using the National Center for Computational Sciences.
• DARPA HPCS2: HPCS programming language evaluation
• NSF CHE-0625598: Cyber-infrastructure and Research Facilities: Chemical Computations on Future High-end Computers
• NSF CNS-0509410: CAS-AES: An integrated framework for compile-time/run-time support for multi-scale applications on high-end systems
• NSF OCI-0904972: Computational Chemistry and Physics Beyond the Petascale
References

B. Alpert, G. Beylkin, D. Gines, and L. Vozovoi, "Adaptive Solution of Partial Differential Equations in Multiwavelet Bases," Journal of Computational Physics, v. 182, pp. 149-190, 2002.
G.I. Fann, G. Beylkin, R.J. Harrison and K.E. Jordan, “Singular operators in multiwavelet bases,” IBM J. Res. Dev., 48 (2004) 161.
R.J. Harrison, G.I. Fann, T. Yanai, Z. Gan and G. Beylkin, “Multiresolution quantum chemistry: basic theory and initial applications,” J. Chem. Phys., 121 (2004) 11587.
T. Yanai, G.I. Fann, Z. Gan, R.J. Harrison, G. Beylkin, “Multiresolution quantum chemistry in multiwavelet bases: Analytic derivatives for Hartree-Fock and density functional theory,” J. Chem. Phys., 121 (2004) 2866.
T. Yanai, R.J. Harrison and N.C. Handy, “Multiresolution quantum chemistry in multiwavelet bases: time-dependent density functional theory with asymptotically corrected potentials in local density and generalized gradient approximations,” Mol. Phys., 103 (2004) 403.
R. Harrison, G. Fann, Z. Gan, T. Yanai, S. Sugiki, A. Beste, and G. Beylkin, “Multiresolution computational chemistry,” J. Physics, Conference Series, 16, 243-246, 2005.
T. Yanai, R.J. Harrison, G.I. Fann and G. Beylkin, “Multi-resolution quantum chemistry: linear response for excited states,” J. Chem. Phys., submitted for publication, 2005.
G. Beylkin, R. Cramer, G. Fann and R. J. Harrison, “Multiresolution separated representations of singular and weakly singular operators,” Applied and Computational Harmonic Analysis, 23 (2007) 235-253
H. Sekino, Y. Maeda, T. Yanai, and R.J. Harrison, "Basis set limit Hartree-Fock and density functional theory response property evaluation by multiresolution multiwavelet basis," J. Chem. Phys., 129(3):034111, 2008.
G.I. Fann, J.C. Pei, R.J. Harrison, J. Jia, J.C. Hill, M.J. Ou, W. Nazarewicz, W.A. Shelton and N. Schunck, "Fast multiresolution methods for density functional theory in nuclear physics," Journal of Physics: Conference Series, 180, 012080, 2009.
M. Reuter, J. Hill and R.J. Harrison, “Solving PDEs in Irregular Geometries with Multiresolution Methods I: Embedded Dirichlet Boundary Conditions,” Computer Physics Communications, submitted March 2010