Hybrid Programming Models in Computational Chemistry

Edo Apra, George I. Fann¹, and Robert J. Harrison¹,²
¹Oak Ridge National Laboratory
²University of Tennessee, Knoxville
Outline
• This presentation provides a brief overview of two programming models that computational chemists have been using that are scalable, achieve high performance, and are practical:
  – Global Arrays, as used in PNNL's NWChem
  – Futures and the MADNESS runtime, as used in ORNL's MADNESS
http://code.google.com/p/m-a-d-n-e-s-s
http://harrison2.chem.utk.edu/~rjh/madness/
open source GPL-2
Schrödinger's Equation, Electronic Structure

Electronic structure, e.g. Hartree-Fock wavefunction methods, where N is the number of particles (e.g. electrons and nuclei).

Kohn-Sham Equations
• Ground-state energy, given as a functional of the charge density, with kinetic energy, exchange potential + nuclei-electron + electron-electron contributions
• Kinetic energy, summing over all particles
• Electron-electron energy, including electrostatic and exchange-correlation terms
• Re-introduce orthonormal wavefunctions
• Define a functional, with Lagrange multipliers to ensure orthonormality
• Minimize with respect to the orbitals (standard forms are sketched below)
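In standard textbook notation (a sketch added here for reference, not reproduced from the original slide graphics), the Kohn-Sham forms these bullets refer to are:

$$E[\rho] = T_s[\{\psi_i\}] + \int v_{\mathrm{ext}}(\mathbf r)\,\rho(\mathbf r)\,d\mathbf r + \frac{1}{2}\iint \frac{\rho(\mathbf r)\,\rho(\mathbf r')}{|\mathbf r - \mathbf r'|}\,d\mathbf r\,d\mathbf r' + E_{xc}[\rho]$$

$$\rho(\mathbf r) = \sum_i |\psi_i(\mathbf r)|^2, \qquad T_s = -\frac{1}{2}\sum_i \langle \psi_i | \nabla^2 | \psi_i \rangle$$

Imposing orthonormality with Lagrange multipliers and minimizing with respect to the orbitals gives

$$\mathcal L = E[\rho] - \sum_{ij} \varepsilon_{ij}\left(\langle\psi_i|\psi_j\rangle - \delta_{ij}\right), \qquad \frac{\delta \mathcal L}{\delta \psi_i^*} = 0 \;\Rightarrow\; \left(-\tfrac{1}{2}\nabla^2 + v_{\mathrm{eff}}(\mathbf r)\right)\psi_i = \varepsilon_i\,\psi_i$$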
NWChem
• Premier DOE computational chemistry software developed at PNNL
• Provides science to solution
• Provides major modeling and simulation capability for molecular science
– Broad range of molecules, including catalysts, biomolecules, and heavy elements
– Solid state capabilities
• Performance characteristics – designed for MPP
– Single node performance comparable to best serial codes
– Runs on a wide range of computers
• Freely distributed → large and active user community
• 1000 plus peer-reviewed publications citing NWChem usage
• Uses Global Arrays/ARMCI for parallelization
• ORNL contribution: Cray XT porting and tuning
NWChem Architecture

(architecture diagram)
• Generic Tasks: Energy, structure, …
• Molecular Calculation Modules: DFT energy, gradient, …; MD, NMR, Solvation, …; Optimize, Dynamics, …; SCF energy, gradient, …
• Molecular Modeling Toolkit: Integral API, Geometry Object, Basis Set Object, PeIGS, …
• Molecular Software Development Toolkit: Run-time database, Global Arrays, Memory Allocator, Parallel IO

Object-oriented design
• abstraction, data hiding, APIs
Parallel programming model
• non-uniform memory access, Global Arrays, MPI
Infrastructure
• GA, Parallel I/O, RTDB, MA, ...
Program modules
• communication only through the database
• persistence for easy restart
Global Arrays

Legacy of Jarek Nieplocha

• Single, shared data structure with global indexing
  – e.g., access A(4,3) rather than buf(7) on task 2
• Physically distributed data
• Distributed dense arrays that can be accessed through a shared-memory-like style
• High-level abstraction layer for the application developer (that's me!)
• One-sided model = no need to worry about send/receive

Global Address Space
GA call to ARMCI communication
Gaussian DFT computational kernel
Evaluation of XC potential matrix element
my_next_task = SharedCounter()
do i = 1, max_i
   if (i .eq. my_next_task) then
      call ga_get()
      (do work)
      call ga_acc()
      my_next_task = SharedCounter()
   endif
enddo
barrier()
Both GA operations are strongly dependent on the communication latency.
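Below is a minimal, illustrative C++ sketch of the same shared-counter pattern written against the Global Arrays C API; the array sizes, the "work" step, and names such as g_data are placeholders and not NWChem code (NGA_Read_inc supplies the atomic counter):

#include <ga.h>
#include <macdecls.h>
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    GA_Initialize();

    const int max_i = 64, block = 100;           // placeholder problem size
    int dims[1] = {max_i * block};
    int g_data    = NGA_Create(C_DBL, 1, dims, (char*)"data", nullptr);
    int g_result  = NGA_Create(C_DBL, 1, dims, (char*)"result", nullptr);
    int cdims[1]  = {1};
    int g_counter = NGA_Create(C_LONG, 1, cdims, (char*)"counter", nullptr);
    GA_Zero(g_data); GA_Zero(g_result); GA_Zero(g_counter);
    GA_Sync();

    int zero[1] = {0};
    long my_next_task = NGA_Read_inc(g_counter, zero, 1);   // atomic shared counter
    std::vector<double> buf(block);
    int ld[1] = {block};

    for (long i = 0; i < max_i; ++i) {
        if (i == my_next_task) {
            int lo[1] = {static_cast<int>(i) * block};
            int hi[1] = {lo[0] + block - 1};
            NGA_Get(g_data, lo, hi, buf.data(), ld);          // one-sided get
            for (double& x : buf) x += 1.0;                   // (do work)
            double one = 1.0;
            NGA_Acc(g_result, lo, hi, buf.data(), ld, &one);  // one-sided accumulate
            my_next_task = NGA_Read_inc(g_counter, zero, 1);
        }
    }
    GA_Sync();                                                // barrier()
    GA_Terminate();
    MPI_Finalize();
    return 0;
}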
Methods for Latency Hiding
• Aggregating small messages
  – Not always feasible, depending on the algorithm
• Data replication
  – Large memory requirement
  – Use of collective operations → scaling bottleneck
• Non-blocking communication (see the sketch after this list)
  – Not always supported by either the communication layer (MPI) or the network
  – HW and SW must be able to overlap communication and computation
• Task-based parallelism
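A rough sketch of the non-blocking idea using GA's non-blocking one-sided calls (NGA_NbGet/NGA_NbWait): prefetch the next block while computing on the current one. The array handle, block sizes, and the compute step are illustrative assumptions, not code from NWChem.

#include <ga.h>
#include <vector>

void process_blocks(int g_a, int nblocks, int block) {
    std::vector<double> cur(block), next(block);
    int ld[1] = {block};
    ga_nbhdl_t handle;

    int lo[1] = {0}, hi[1] = {block - 1};
    NGA_Get(g_a, lo, hi, cur.data(), ld);                 // fetch the first block

    for (int b = 0; b < nblocks; ++b) {
        if (b + 1 < nblocks) {                            // start prefetch of block b+1
            int nlo[1] = {(b + 1) * block};
            int nhi[1] = {nlo[0] + block - 1};
            NGA_NbGet(g_a, nlo, nhi, next.data(), ld, &handle);
        }
        // ... compute on cur while the next transfer is in flight ...
        if (b + 1 < nblocks) {
            NGA_NbWait(&handle);                          // complete the prefetch
            cur.swap(next);
        }
    }
}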
Issues in Petaflops Computing - Programming Models
• Message passing
  – MPI-1: the current workhorse
  – Strengths and limitations
  – Failure of MPI-2
  – Promising new features of MPI-3, under development/discussion: fault tolerance, non-blocking collectives, Remote Memory Access (RMA)
• Partitioned global address space (PGAS) programming models
  – Libraries: Global Arrays
  – Languages: X10, Chapel, Fortress, Charm++, MADNESS RT
  – Networking layers: GASNet, ARMCI, more to come, …
• Hybrid approaches using MPI inter-node plus threading intra-node (see the sketch after this list)
  – OpenMP, Pthreads, Intel Threading Building Blocks (TBB), etc.
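A minimal sketch of the hybrid MPI + OpenMP pattern just mentioned: one MPI rank per node with OpenMP threads inside the node. It assumes an MPI implementation providing MPI_THREAD_FUNNELED; the vector sum is only a stand-in for real work.

#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int n = 1 << 20;
    std::vector<double> x(n, 1.0);

    double local = 0.0;
    #pragma omp parallel for reduction(+:local)   // intra-node threading
    for (int i = 0; i < n; ++i) local += x[i];

    double global = 0.0;                          // inter-node reduction
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("sum = %.1f\n", global);

    MPI_Finalize();
    return 0;
}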
Trends in Chemistry Codes
• Multi-level parallelism
– Effective path for most applications to scale to 100K processors
– Examples: coarse grain over vibrational degrees of freedom in numerical hessian, or geometries in a surface scan or parameter study
– Conventional distributed memory within each subtask
– Fine grain parallelism within a few processor SMP (multi-threads, OpenMP, parallel BLAS, …)
• Efficient exploitation of fine grain parallelism is a major concern on future architectures
ARMCI GAS Runtime

(diagram) Application → { Global Arrays, CAF, GPSHMEM, Message Passing (MPI) } → ARMCI → native interfaces: LAPI, GM/Myrinet, threads, IB, Portals

ARMCI can coexist with MPI as an implementation layer for multiple programming models and applications.
• Remote Memory Access Functionality (see the sketch after this list)
– put, get, accumulate (also handles noncontiguous data transfers)
– nonblocking communication interfaces
– atomic read-modify-write, mutexes and locks
– memory allocation
– fence operations for managing memory consistency
• Characteristics
– simple progress rules (like Cray SHMEM)
– widespread portability through ports to native network interfaces
– exploits and exposes shared memory to the user
– works with MPI
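A rough, version-dependent sketch of the put/fence functionality listed above, against the ARMCI C API (the buffer sizes and neighbour exchange are illustrative; exact signatures can differ between ARMCI releases, so treat this as an assumption rather than a reference):

#include <armci.h>
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    ARMCI_Init();

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int n = 1024;
    // Collective allocation: every process receives a pointer to every
    // process's segment.
    std::vector<void*> ptrs(nprocs);
    ARMCI_Malloc(ptrs.data(), n * sizeof(double));

    std::vector<double> local(n, static_cast<double>(rank));
    int right = (rank + 1) % nprocs;

    // One-sided put of my buffer into my right neighbour's segment.
    ARMCI_Put(local.data(), ptrs[right], (int)(n * sizeof(double)), right);
    ARMCI_Fence(right);             // enforce remote completion
    MPI_Barrier(MPI_COMM_WORLD);    // ARMCI is designed to coexist with MPI

    ARMCI_Finalize();
    MPI_Finalize();
    return 0;
}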
ARMCI port for Cray XT
• Part of collaborative effort between ORNL and PNNL. This ORNL contributed port will be integrated in GA/NWChem releases distributed by PNNL
• Uses the Portals library for inter-node communication
• SMP aware: uses SysV shared memory for intra-node communication
• Server-layer for calls that do not map directly onto the network
XT Specific enhancements to ARMCI
• Cloning vs. pthreads/fork
  – The traditional implementation forks a process to act as the Communication Helper Thread, or creates a thread using the pthreads library
  – For XT, to ensure the Portals-specific environment is not duplicated, we clone with a clean stack
• State checking
  – For debugging, advanced state information is maintained and aggregated to obtain the overall communication state in case of errors or race conditions
• Smarter memory usage for communication
  – A single Portals memory descriptor (MD) is created and used across multiple application memory allocations, even across process groups
• Data response as ACK’s
– Utilizes Cray Portals events to consider data responses as acknowledgements
• Utilizes a single copy of Portals NID/PID information necessary for communication across all the clients and communication helper thread
(diagram: functionality that matches the Portals API is handled by a thin wrapper over Portals/CLE; missing functionality gets a full implementation)
Global Address Space
• Global Address Space (GAS) programming models: Co-Array Fortran, SHMEM, and X10
• Global Arrays = GAS library that allows PGAS-style programming (i.e. distinction of global and local memory)
Preliminary Steps for a CCSD(T) Run
• Hartree-Fock run
  – requires a few hundred procs; followed by
• CCSD (iterative run)
  – starts from the molecular orbitals generated by Hartree-Fock
  – requires a few thousand procs; communication intensive
• CCSD(T) reuses
  – the T1 and T2 amplitudes computed by CCSD
  – the HF molecular orbitals
NWChem code changes required to
scale beyond 100K cores
• Many-to-one communication patterns cause the job
– to progress very slowly (best case)
– to hang (or worse)
• Diagnosis: cumbersome … lucky coredumps!
• Causes: several processors simultaneously accessing the same patch of a matrix, where the patch is owned by a single processor.
• Solutions:
– Staggered access (see the sketch after this list)
– Use of a subset of processing elements
– Atomic shared counter: first token is static
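A tiny illustrative sketch of the "staggered access" idea (the helper and its work routine are hypothetical, not NWChem code): every process visits all patches, but starts its sweep at a rank-dependent offset so that simultaneous requests land on patches with different owners.

#include <functional>

void staggered_sweep(int me, int nproc, int npatches,
                     const std::function<void(int)>& process_patch) {
    const int start = static_cast<int>((static_cast<long long>(me) * npatches) / nproc);
    for (int k = 0; k < npatches; ++k) {
        const int patch = (start + k) % npatches;  // rank-dependent visiting order
        process_patch(patch);                      // e.g. ga_get / work / ga_acc
    }
}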
CCSD(T) run on Cray XT5: 24 waters, (H2O)24
• 72 atoms, 1224 basis functions
• cc-pVTZ(-f) basis
• November 2009
• Floating-point performance at 223K cores: 1.39 PetaFLOP/s
Scientific impact

Potential development philosophy:
• Water clusters → molecular-level information: binding energies, structures
• Liquid water, ices, aqueous interfaces → macroscopic properties: enthalpies (H), radial distribution functions, diffusion coefficient, dielectric constant, 2nd virial, …

Picture: snapshot during a nuclear quantum statistical trajectory (32 beads/atom) of liquid water with the developed interaction potential
Objective: model the properties of liquid water and ice
Approach: rely on highly accurate electronic structure results in the absence of experimental data
Implementation: highly accurate CCSD(T) binding energies of water clusters up to (H2O)24; cannot be obtained on commodity hardware, requires petascale hardware
Scientific results:
• first quantitative information on the magnitude of quantum effects in liquid water
• accurate description of structural (radial distribution functions), thermodynamic (enthalpy), transport (diffusion coefficient), and spectral (infrared and 2-D infrared) properties of liquid water
• testbed for the development of a general polarizable force field to study aqueous solvation of biomolecules
Summary
• Outstanding 1.39 PetaFLOP/s sustained double-precision performance using the freely distributed NWChem electronic structure code
• Performance result obtained on the full Cray XT5 machine at 60% of peak FP, fully exploiting the aggregate CPU, RAM, and network resources
• Innovative use of the one-sided communication model via the Global Arrays library
• Addresses important scientific questions regarding the modeling of water
What is MADNESS?
• A general purpose numerical environment for reliable and fast scientific simulation
  – Chemistry, nuclear physics, atomic physics, material science, nanoscience, climate, fusion, ...
• A general purpose parallel programming environment designed for the peta/exa-scale
  – Standard C++ with concepts from Cilk, Charm++, HPCS languages
  – Compatible by design with existing applications
  – Runs on the world's largest computers
What is MADNESS?
• MADNESS is a framework
  – Like NWChem, PETSc, ...
• Frameworks
  – Increase productivity; hide complexity
  – Interface disciplines; capture knowledge
  – Open HPC to a wider community
  – Long-lived, communal projects with broad impact
(stack diagram: Applications on top of Numerics on top of the Parallel Runtime)
Why MADNESS?
• MADNESS addresses many of the sources of complexity that constrain our HPC ambitions
  – Science, physics, theory, ...
    • Constantly evolving, but can take years to implement
  – Scalable algorithms and math
    • Need rapid deployment of the latest and greatest
  – Software
    • Crude parallel programming tools with explicit expression and management of concurrency and data
  – Hardware
    • Millions of cores with deep memory hierarchy
Why MADNESS?
• Reduces S/W complexity
  – MATLAB-like level of composition of scientific problems with guaranteed speed and precision
  – Programmer not responsible for managing dependencies, scheduling, or placement
• Reduces numerical complexity
  – Solution of integral, not differential, equations
  – Framework makes the latest techniques in applied math and physics available to a wide audience
E.g., with a guaranteed precision of 1e-6, form a numerical representation of a Gaussian in the cube [-20,20]^3, solve Poisson's equation, and plot the resulting potential (all running in parallel with threads+MPI).

There are only two lines doing real work. First the Gaussian (g) is projected into the adaptive basis to the default precision. Second, the Green's function is applied. The exact results are norm=1.0 and energy=0.3989422804. (A sketch using the MADNESS API follows below.)

output: norm of f 1.00000000e+00 energy 3.989420526e-01
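A minimal sketch of what such a program can look like with the public MADNESS C++ API. This is not the authors' exact code: header paths and the initialization idiom vary between MADNESS releases, and the 0.5 factor in the energy reflects the Gaussian self-energy convention assumed here.

// Project a Gaussian into the adaptive basis, apply the Poisson Green's
// function, and report the norm and Coulomb self-energy.
#include <madness/mra/mra.h>
#include <madness/mra/operator.h>
using namespace madness;

// Normalized Gaussian: integrates to 1 over all space.
static double gaussian(const coord_3d& r) {
    const double r2 = r[0]*r[0] + r[1]*r[1] + r[2]*r[2];
    return std::pow(constants::pi, -1.5) * std::exp(-r2);
}

int main(int argc, char** argv) {
    initialize(argc, argv);
    World world(SafeMPI::COMM_WORLD);

    FunctionDefaults<3>::set_cubic_cell(-20.0, 20.0);   // the cube [-20,20]^3
    FunctionDefaults<3>::set_thresh(1e-6);              // guaranteed precision

    real_function_3d g = real_factory_3d(world).f(gaussian);   // adaptive projection
    real_convolution_3d op = CoulombOperator(world, 1e-4, 1e-6);
    real_function_3d u = op(g);                                 // solve Poisson's equation

    const double norm = g.trace();              // exact: 1.0
    const double energy = 0.5 * inner(u, g);    // exact: 1/sqrt(2*pi) ~ 0.3989422804
    if (world.rank() == 0) print("norm", norm, "energy", energy);

    finalize();
    return 0;
}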
Example Applications

Molecular HF and DFT
• Energy and gradients
• ECPs coming (Sekino, Kato)
• Response properties (Vasquez and Sekino)
• Still not as functional as the previous Python version

(figure: spin density of a solvated electron)
Nuclear physics
J. Pei, G.I. Fann, Y. Ou, W. Nazarewicz (UT/ORNL)
• DOE UNEDF
• Nuclei & neutron matter
• ASLDA
• Hartree-Fock-Bogoliubov
• Spinors
• Gamow states

(figure: imaginary part of the seventh eigenfunction of a two-well Woods-Saxon potential)
Solid-state physics
• Thornton, Eguiluz and Harrison (UT/ORNL)
  – NSF OCI-0904972: Computational chemistry and physics beyond the petascale
• Full band structure with LDA and HF for periodic systems
• In development: hybrid functionals, response theory, post-DFT methods such as GW, and model many-body Hamiltonians via Wannier functions

(figure: Coulomb potential isosurface in LiF)
Time evolution
Vence, Harrison, Krstic, Jia, Fann (UT/ORNL)
• Multiwavelet basis not optimal
  – Not strongly band limited; explicit methods unstable
  – DG introduces flux limiters; we use filters
• Semi-group approach
  – Split into linear and non-linear parts
• Trotter-Suzuki methods
  – Time-ordered exponentials
  – Chin-Chen gradient correction (JCP 114, 7338, 2001)
$$\dot u(x,t) = L\,u + N(u,t)$$
$$u(x,t) = e^{Lt}\,u(x,0) + \int_0^t e^{L(t-\tau)}\,N(u,\tau)\,d\tau$$
$$e^{A+B} = e^{A/2}\,e^{B}\,e^{A/2} + O\!\left(\left\|[[A,B],A]\right\|\right)$$
Dynamics of H2+ in a laser field
• 4D: 3 electronic coordinates + the internuclear coordinate
• First simulation with quantum nuclei and a non-collinear field (the field shown is transverse)

(figure: electronic dipole (a.u.) and field versus time (a.u.), and internuclear separation R (a.u.))
Nanoscale photonics (Reuter, Northwestern; Hill, Harrison, ORNL)
Diffuse domain approximation for an interior boundary value problem; long-wavelength Maxwell equations; Poisson equation; micron-scale Au tip 2 nm above a Si surface with an H2 molecule in the gap – a 10^7 difference between the shortest and longest length scales.
Advantages of Integral Form
• E.g.,
  – Often soluble without any preconditioning, and often without any iteration (as in this case)
  – Can obtain higher accuracy
  – In simple domains, builds in correct asymptotics
  – Potentially more computationally efficient
• Challenge and solution
  – In most bases the integral operator is dense & slow
  – Multiresolution analysis provides fast algorithms with guaranteed precision
$$\nabla^2 u(r) = -4\pi\rho(r) \quad \text{vs.} \quad u(r) = \int G(r,r')\,\rho(r')\,d^3r'$$

Electrostatics:
$$\nabla^2 u(r) = -4\pi\rho(r)$$
$$u(r) = \int G(r,r')\,\rho(r')\,d^3r' + \oint_{\partial\Omega}\left[G(r,r')\,\nabla' u(r') - u(r')\,\nabla' G(r,r')\right]\cdot dS$$
$$G(r,r') = \frac{1}{4\pi\,|r-r'|}$$

Quantum mechanics:
$$\left(-\tfrac{1}{2}\nabla^2 + V(r)\right)\psi(r) = E\,\psi(r)$$
$$\psi(r) = -2\left(\nabla^2 + 2E\right)^{-1} V(r)\,\psi(r)$$

Time evolution:
$$\hat{L}\,u(r,t) + N(u,t) = \frac{du}{dt}$$
$$u(r,t) = e^{t\hat{L}}\,u(r,0) + \int_0^t e^{(t-\tau)\hat{L}}\,N(u,\tau)\,d\tau$$
$$e^{t\nabla^2} f(r) = (4\pi t)^{-d/2}\int e^{-\frac{(r-r')^2}{4t}}\,f(r')\,d^d r'$$
The math behind the MADNESS
• Discontinuous spectral element basis
  – High-order convergence ideally suited for modern computer technology
• Multi-resolution analysis for fast algorithms
  – Sparse representation of many integral operators
  – Precision guaranteed through adaptive refinement
• Separated representations of operators and functions
  – Enable efficient computation in many dimensions
Essential techniques for fast computation
• Multiresolution
$$V_0 \subset V_1 \subset \cdots \subset V_n, \qquad V_n = V_0 \oplus (V_1 \ominus V_0) \oplus \cdots \oplus (V_n \ominus V_{n-1})$$
• Low-separation rank
$$f(x_1, \ldots, x_d) = \sum_{l=1}^{M} \sigma_l \prod_{i=1}^{d} f_i^{(l)}(x_i) + O(\epsilon), \qquad \| f_i^{(l)} \|_2 = 1, \ \ \sigma_l > 0$$
• Low-operator rank
$$A = \sum_{\mu=1}^{r} \sigma_\mu\, u_\mu v_\mu^T + O(\epsilon), \qquad \sigma_\mu > 0, \ \ v_\mu^T v_\nu = u_\mu^T u_\nu = \delta_{\mu\nu}$$
1-D Scaling Function Basis: Alpert's Multiwavelets

Divide the domain into 2^n pieces (level n)
– Adaptive sub-division (local refinement)
– lth sub-interval [l·2^{-n}, (l+1)·2^{-n}], l = 0, …, 2^n − 1

In each sub-interval define a polynomial basis
– First k Legendre polynomials P_i, rescaled and normalized on [0,1)
– Orthonormal, disjoint support

$$\phi_i(x) = \sqrt{2i+1}\;P_i(2x-1), \quad x \in [0,1)$$
$$\phi_{il}^{\,n}(x) = 2^{n/2}\,\phi_i(2^n x - l), \quad 2^n x - l \in [0,1), \quad 0 \le l < 2^n$$

(figure: refinement levels n = 0, 1, 2 with sub-intervals l = 0, …, 3)
Scaling Function Basis: Alpert's, level n = 2

(figure: the scaling functions $\phi_i^{\,2}(x)$, i = 0, …, 3, plotted on [0,1))
Alpert's Multiwavelet Basis

(figure: the multiwavelets $\psi_i(x)$, i = 0, …, 3, for k = 4, n = 0, plotted on [0,1))
Multiwavelet Basis

An orthonormal basis spanning $W_{n-1} = V_n \ominus V_{n-1}$.

Two-scale relations:
$$\phi_{il}^{\,n-1}(x) = \sum_{j=0}^{k-1}\left( h^{(0)}_{ij}\,\phi_{j,2l}^{\,n}(x) + h^{(1)}_{ij}\,\phi_{j,2l+1}^{\,n}(x)\right)$$
$$\psi_{il}^{\,n-1}(x) = \sum_{j=0}^{k-1}\left( g^{(0)}_{ij}\,\phi_{j,2l}^{\,n}(x) + g^{(1)}_{ij}\,\phi_{j,2l+1}^{\,n}(x)\right)$$

Coefficient transforms:
$$s_{l}^{\,n-1} = \tfrac{1}{\sqrt{2}}\left(s_{2l}^{\,n} + s_{2l+1}^{\,n}\right)\ \text{(sum)}, \qquad d_{l}^{\,n-1} = \tfrac{1}{\sqrt{2}}\left(s_{2l}^{\,n} - s_{2l+1}^{\,n}\right)\ \text{(difference)}$$
Non-standard form of functions

Construct a local basis for $W_{n-1} = V_n \ominus V_{n-1}$ (2-D example, n = 0, 1):
$$V_0:\ \phi(x)\,\phi(y)$$
$$V_1:\ \phi(2x)\phi(2y),\ \ \phi(2x)\phi(2y-1),\ \ \phi(2x-1)\phi(2y),\ \ \phi(2x-1)\phi(2y-1)$$
$$W_0 = V_1 \ominus V_0:\ \phi(x)\,\psi(y),\ \ \psi(x)\,\phi(y),\ \ \psi(x)\,\psi(y)$$
Oct-tree Adaptivity
1-D Example: Sub-Tree Parallelism

(figure: a 1-D refinement tree with nodes 0–6, showing the two sub-trees below the root)

Both sub-trees can be done in parallel.
In 3-D nodes split into 8 children … in 6-D there are 64 children.
Separated form for integral operators

• Circumvents exponential scaling with the number of dimensions
• Approach
  – Represent the kernel over a finite range as a sum of products of 1-D operators (often, not always, Gaussian; see the expansion below)
  – Only need to compute the 1-D transition matrices (X, Y, Z)
  – SVD the 1-D operators (low rank away from the singularity)
  – Apply the most efficient choice of low/full-rank 1-D operator
  – Even better algorithms not yet implemented

$$T * f = \int ds\, K(r - s)\, f(s)$$
$$r^{\,n,\,l-l'}_{ii',\,jj',\,kk'} = \sum_{\mu=0}^{M} X^{\,n,\,l_x-l'_x}_{ii',\mu}\; Y^{\,n,\,l_y-l'_y}_{jj',\mu}\; Z^{\,n,\,l_z-l'_z}_{kk',\mu} + O(\epsilon)$$
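For concreteness, the standard separated representation of the Poisson kernel from the multiresolution literature cited in the references (a well-known result, not copied from this slide) writes it, over a finite range $\epsilon \le r \le R$, as a short sum of Gaussians:

$$\frac{1}{r} \;\approx\; \sum_{\mu=1}^{M} c_\mu\, e^{-t_\mu r^2} \;=\; \sum_{\mu=1}^{M} c_\mu\, e^{-t_\mu x^2}\, e^{-t_\mu y^2}\, e^{-t_\mu z^2},$$

so each term factors into a product of 1-D Gaussian convolutions (the X, Y, Z transition matrices above), and the number of terms M grows only logarithmically with the requested precision and dynamic range.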
High-level composition
• Close to the physics
operatorT op = CoulombOperator(k, rlo, thresh);
functionT rho = psi*psi;
double twoe = inner(apply(op,rho),rho);
double pe = 2.0*inner(Vnuc*psi,psi);
double ke = 0.0;
for (int axis=0; axis<3; axis++) {
functionT dpsi = diff(psi,axis);
ke += inner(dpsi,dpsi);
}
double energy = ke + pe + twoe;
$$E = \left\langle \psi \left| -\tfrac{1}{2}\nabla^2 + V \right| \psi \right\rangle + \iint \psi^2(x)\,\frac{1}{|x-y|}\,\psi^2(y)\,dx\,dy$$
Compose directly in terms of functions and operators.

This is a LaTeX rendering of a program to solve the Hartree-Fock equations for the helium atom. The compiler also outputs C++ code that can be compiled without modification and run in parallel.

He atom Hartree-Fock
MADNESS architecture

(software stack, top to bottom)
MADNESS applications – chemistry, physics, nuclear, ...
MADNESS math and numerics
MADNESS parallel runtime
MPI, Global Arrays, ARMCI, GPC/GASNET

Intel Threading Building Blocks: future target for multicore
Runtime Objectives
• Scalability to 1+M processors ASAP
• Runtime responsible for
  – scheduling and placement
  – managing dependencies & hiding latency
• Compatible with existing models (MPI, GA)
• Borrow successful concepts from Cilk, Charm++, Python, HPCS languages
Why a new runtime?
• MADNESS computation is irregular & dynamic
  – 1000s of dynamically-refined meshes changing frequently & independently (to guarantee precision)
• Because we wanted to make MADNESS itself easier to write, not just the applications using it
  – We explored implementations with MPI, Global Arrays, and Charm++, and all were inadequate
• MADNESS is helping drive
  – One-sided operations in MPI-3, DOE projects in fault tolerance, ...
Key runtime elements
• Futures for hiding latency and automating dependency management
• Global names and name spaces
• Non-process-centric computing
  – One-sided messaging between objects
  – Retain place = process for MPI/GA legacy compatibility
• Dynamic load balancing
  – Data redistribution, work stealing, randomization
Futures
• Result of an asynchronous computation
  – Cilk, Java, HPCS languages, C++0x
• Hide latency due to communication or computation
• Management of dependencies
  – Via callbacks

int f(int arg);
ProcessID me, p;
Future<int> r0 = task(p, f, 0);
Future<int> r1 = task(me, f, r0);
// Work until need result
cout << r0 << r1 << endl;

Process "me" spawns a new task in process "p" to execute f(0), with the result eventually returned as the value of future r0. This is used as the argument of a second task whose execution is deferred until its argument is assigned. Tasks and futures can register multiple local or remote callbacks to express complex and dynamic dependencies.
Global Names
• Objects with global names, with different state in each process
  – C.f. shared [threads] in UPC; Co-Array Fortran
• Non-collective constructor; deferred destructor
  – Eliminates synchronization

class A : public WorldObject<A> {
    int f(int);
};
ProcessID p;
A a;
Future<int> b = a.task(p, &A::f, 0);

A task is sent to the instance of a in process p. If this has not yet been constructed, the message is stored in a pending queue. Destruction of a global object is deferred until the next user synchronization point.
Global Namespaces
• Specialize global names to containers
  – Hash table done
  – Arrays, etc., planned
• Replace global pointer (process + local pointer) with a more powerful concept
• User-definable map from keys to "owner" process

class Index;  // Hashable
class Value {
    double f(int);
};
WorldContainer<Index,Value> c;
Index i, j;
Value v;
c.insert(i, v);
Future<double> r = c.task(j, &Value::f, 666);

Namespaces are a large part of the elegance of Python and the success of Charm++ (chares + arrays).
A container is created mapping indices to values. A value is inserted into the container. A task is spawned in the process owning key j to invoke c[j].f(666).
Multi-threaded architecture

(diagram components)
• RMI server (MPI or Portals): incoming active messages
• Task dequeue: incoming active messages; work stealing
• Application logical main thread: outgoing active messages

Work stealing: must augment with cache-aware algorithms and scheduling
Virtualization of data and tasks

Future: holds a parameter on some MPI rank; probe(), set(), get()
Task: input parameters, output parameters; probe(), run()

Future Compress(tree):
    Future left = Compress(tree.left)
    Future right = Compress(tree.right)
    return Task(Op, left, right)

Compress(tree)
Wait for all tasks to complete

(an illustrative sketch of this pattern in standard C++ follows below)

Benefits:
• Communication latency & transfer time largely hidden
• Much simpler composition than explicit message passing
• Positions code to use "intelligent" runtimes with work stealing
• Positions code for efficient use of multi-core chips
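An illustrative sketch (not MADNESS source) of the Compress pattern above using std::async/std::future: each node's result depends on futures from its children, and independent sub-trees may run concurrently. Real MADNESS futures trigger dependent tasks through callbacks rather than the blocking get() used here.

#include <future>
#include <memory>
#include <iostream>

struct Node {
    double value = 0.0;
    std::unique_ptr<Node> left, right;
};

// Returns a future for the "compressed" (here: summed) value of the sub-tree.
std::future<double> compress(const Node* tree) {
    return std::async(std::launch::async, [tree]() -> double {
        if (!tree) return 0.0;
        // Launch both sub-trees; they may execute in parallel.
        std::future<double> l = compress(tree->left.get());
        std::future<double> r = compress(tree->right.get());
        // This task blocks only when its inputs are actually needed.
        return tree->value + l.get() + r.get();
    });
}

int main() {
    Node root;
    root.value = 1.0;
    root.left = std::make_unique<Node>();
    root.left->value = 2.0;
    root.right = std::make_unique<Node>();
    root.right->value = 3.0;

    std::cout << "compressed sum = " << compress(&root).get() << "\n";  // prints 6
    return 0;
}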
Scaling of Poisson and Helmholtz in MADNESS with Lattice of Atom Centered Potential
• Charge density for the Poisson equation
• Scattering for the Helmholtz equation (bound-state or Yukawa)
• Lattice of 1000 atoms
• Potential centered at each atom center
• 10 levels of refinement
• 1.91B equations and variables
Scaling of MADNESS for Poisson and Helmholtz on Cray XT-5
1.91B eqns, 10 levels of refinement, rel. accuracy 1.e-10, 6/2010
(figure: wall time in seconds, 0.01–1000 on a log scale, versus core count from 1024 to 65536, for Projection, Compression, Reconstruction, Poisson, and Helmholtz)
Summary
• MADNESS is a general purpose framework for scientific simulation
  – Conceived for the next (not the last) decade
– Increases HPC productivity by reducing many sources of complexity
– Deploys advanced math, numerics, and C/S
http://code.google.com/p/m-a-d-n-e-s-s
Funding
• DOE: SciDAC, Office of Science divisions of Advanced Scientific Computing Research and Basic Energy Science, and the Math Base Program under contract DE-AC05-00OR22725 with Oak Ridge National Laboratory, in part using the National Center for Computational Sciences.
• DARPA HPCS2: HPCS programming language evaluation
• NSF CHE-0625598: Cyber-infrastructure and Research Facilities: Chemical Computations on Future High-end Computers
• NSF CNS-0509410: CAS-AES: An integrated framework for compile-time/run-time support for multi-scale applications on high-end systems
• NSF OCI-0904972: Computational Chemistry and Physics Beyond the Petascale
References

B. Alpert, G. Beylkin, D. Gines, and L. Vozovoi, "Adaptive Solution of Partial Differential Equations in Multiwavelet Bases," Journal of Computational Physics, v. 182, pp. 149-190, 2002.
G.I. Fann, G. Beylkin, R.J. Harrison and K.E. Jordan, “Singular operators in multiwavelet bases,” IBM J. Res. Dev., 48 (2004) 161.
R.J. Harrison, G.I. Fann, T. Yanai, Z. Gan and G. Beylkin, “Multiresolution quantum chemistry: basic theory and initial applications,” J. Chem. Phys., 121 (2004) 11587.
T. Yanai, G.I. Fann, Z. Gan, R.J. Harrison, G. Beylkin, “Multiresolution quantum chemistry in multiwavelet bases: Analytic derivatives for Hartree-Fock and density functional theory,” J. Chem. Phys., 121 (2004) 2866.
T. Yanai, R.J. Harrison and N.C. Handy, “Multiresolution quantum chemistry in multiwavelet bases: time-dependent density functional theory with asymptotically corrected potentials in local density and generalized gradient approximations,” Mol. Phys., 103 (2004) 403.
R. Harrison, G. Fann, Z. Gan, T. Yanai, S. Sugiki, A. Beste, and G. Beylkin, “Multiresolution computational chemistry,” J. Physics, Conference Series, 16, 243-246, 2005.
T. Yanai, R.J. Harrison, G.I. Fann and G. Beylkin, “Multi-resolution quantum chemistry: linear response for excited states,” J. Chem. Phys., submitted for publication, 2005.
G. Beylkin, R. Cramer, G. Fann and R. J. Harrison, “Multiresolution separated representations of singular and weakly singular operators,” Applied and Computational Harmonic Analysis, 23 (2007) 235-253
H. Sekino, Y. Maeda, T. Yanai, and R.J. Harrison, "Basis set limit Hartree-Fock and density functional theory response property evaluation by multiresolution multiwavelet basis," J. Chem. Phys., 129(3):034111, 2008.
G.I. Fann, J.C. Pei, R.J. Harrison, J. Jia, J.C. Hill, M.J. Ou, W. Nazarewicz, W.A. Shelton and N. Schunck, "Fast multiresolution methods for density functional theory in nuclear physics," Journal of Physics: Conference Series, 180, 012080, 2009.
M. Reuter, J. Hill and R.J. Harrison, “Solving PDEs in Irregular Geometries with Multiresolution Methods I: Embedded Dirichlet Boundary Conditions,” Computer Physics Communications, submitted March 2010