Preliminary CPMD Benchmarks on Ranger, Pople, and Abe
TG AUS Materials Science Project
Matt McKenzie, LONI
What is CPMD?
• Car-Parrinello Molecular Dynamics
  ▫ www.cpmd.org
• Parallelized plane wave / pseudopotential implementation of Density Functional Theory
• Common chemical systems: liquids, solids, interfaces, gas clusters, reactions
  ▫ Large systems: ~500 atoms
  ▫ Scales with # electrons, NOT atoms
Key Points in Optimizing CPMD
• Developers have done a lot of work here
• The Intel compiler is used in this study
• BLAS/LAPACK
  ▫ BLAS levels 1 (vector ops) and 3 (matrix-matrix ops); some level 2 (matrix-vector ops)
• Integrated optimized FFT library
  ▫ Compiler flag: -DFFT_DEFAULT
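For readers unfamiliar with the BLAS levels named above, here is a minimal sketch of what level 1 and level 3 operations look like. It uses SciPy's BLAS bindings purely for illustration; these are not CPMD's actual Fortran call sites.

```python
import numpy as np
from scipy.linalg.blas import daxpy, dgemm  # double-precision BLAS wrappers

n = 1024
x, y = np.random.rand(n), np.random.rand(n)
A, B = np.random.rand(n, n), np.random.rand(n, n)

# Level 1 (vector op): y <- a*x + y
y = daxpy(x, y, a=2.0)

# Level 3 (matrix-matrix op): C <- alpha * A @ B; level 3 routines are where
# plane-wave codes spend most of their floating-point work
C = dgemm(alpha=1.0, a=A, b=B)
print(C.shape)  # (1024, 1024)
```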
Benchmarking CPMD is difficult because…
• Nature of the modeled chemical system
  ▫ Solids, liquids, interfaces require different parameters, stressing the memory along the way
  ▫ Volume and # electrons
• Choice of the pseudopotential (psp)
  ▫ Norm-conserving, ‘soft’, non-linear core correction (++memory)
• Type of simulation conducted
  ▫ CPMD, BOMD, Path Integral, Simulated Annealing, etc.
  ▫ CPMD is a robust code
• Very chemical-system specific
  ▫ Any one CPMD sim. cannot be easily compared to another
  ▫ However, THERE ARE TRENDS
• FOCUS: simple wave function optimization timing
  ▫ This is a common ab initio calculation
Probing Memory Limitations
• For any ab initio calculation, accuracy is proportional to the # of basis sets used
• These are stored in matrices, requiring increased RAM
• The energy cutoff determines the size of the plane wave basis set:

  N_PW = (1 / 2π²) · Ω · E_cut^(3/2)
Model Accuracy & Memory Overview
Image obtained from the CPMD user’s manual
Pseudopotential convergence behavior w.r.t. basis set size (cutoff).
NOTE: Choice of psp is important, i.e. ‘softer’ psp = lower cutoff = loss of transferability.
VASP specializes in soft psp’s; CPMD works with any psp.
Memory Comparison: Ψ optimization, 63 Si atoms, SGS psp
• Ecut = 50 Ryd: NPW ≈ 134,000; Memory = 1.0 GB
• Ecut = 70 Ryd: NPW ≈ 222,000; Memory = 1.8 GB
Well-known CPMD benchmarking model: www.cpmd.org
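As a consistency check, the N_PW formula from the previous slide reproduces these numbers. A minimal sketch in Python, back-solving the cell volume from the 50 Ryd data point, since the volume itself is not given on the slide:

```python
import math

# N_PW = (1 / (2 * pi**2)) * Omega * Ecut**(3/2), Omega in bohr^3, Ecut in Ryd
def n_pw(omega_bohr3, ecut_ryd):
    return omega_bohr3 * ecut_ryd ** 1.5 / (2 * math.pi ** 2)

# Back-solve the cell volume from the slide's 50 Ryd point (NPW ~ 134,000)
omega = 134_000 * 2 * math.pi ** 2 / 50 ** 1.5

# The same volume at 70 Ryd reproduces the slide's ~222,000 plane waves
print(f"NPW(70 Ryd) = {n_pw(omega, 70):,.0f}")  # ~221,971
```

Note that memory grows a bit faster than NPW here (1.8 GB / 1.0 GB vs. a basis-size ratio of ~1.66), presumably because other cutoff-dependent arrays grow as well.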
Typical Results / Interpretations (nothing new here)
Results can be shown either by:
• Wall time = (n steps × iteration time/step) + network overhead
• Iteration time = the fundamental unit, used throughout any given CPMD calculation; it neglects the network, yet results are comparable
Note: CPMD runs well on a few nodes connected with gigabit Ethernet.
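A minimal illustration of that bookkeeping; all numbers below are hypothetical placeholders, not measurements from this study:

```python
# Wall time = (n_steps * iteration time) + network overhead
n_steps = 100        # optimization steps (hypothetical)
t_iter = 2.5         # seconds per iteration -- the unit the benchmarks compare
overhead = 12.0      # seconds of network overhead (hypothetical)

wall_time = n_steps * t_iter + overhead
print(f"wall = {wall_time:.0f} s ({overhead / wall_time:.1%} network overhead)")
```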
Two important factors affect CPMD performance:
• MEMORY BANDWIDTH
• FLOATING-POINT performance
Pople, Abe, Ranger CPMD Benchmarks
[Plot: Average Iteration Time (seconds, 0-8) vs. Number of Cores (0-256) for Pople, Abe, and Ranger at 50 and 70 Ryd]
Results I
• All calculations ran no longer than 2 hours
• Ranger is not the preferred machine for CPMD
• Scales well between 8 and 96 cores
  ▫ This is a common CPMD trend
• CPMD is known to scale super-linearly above ~1000 processors
  ▫ Will look into this
  ▫ The chemical system would have to change, as this smaller simulation is unlikely to scale in this manner
Results II
• Pople and Abe gave the best performance
• IF a system requires more than 96 procs, Abe would be a slightly better choice
• Knowing the difficulties in benchmarking CPMD (psp, volume, system phase, sim. protocol), this benchmark is not a good representation of all the possible uses of CPMD
  ▫ Only explored one part of the code
• How each system performs when taxed with additional memory requirements is a better indicator of CPMD’s performance
  ▫ To increase system accuracy, increase Ecut
Percent Difference between 70 and 50 Ryd
%Diff = [(t70 − t50) / t50] × 100
[Plot: Percent Difference relative to 50 Ryd (0-70%) vs. Number of Cores (0-256) for Pople, Abe, and Ranger]
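A minimal sketch of the %Diff metric defined above, with hypothetical iteration times rather than the measured data behind the figure:

```python
def pct_diff(t70, t50):
    """%Diff = [(t70 - t50) / t50] * 100: the slowdown at the higher cutoff."""
    return (t70 - t50) / t50 * 100

# Hypothetical iteration times in seconds at some core count
print(f"{pct_diff(t70=3.0, t50=2.0):.0f}%")  # -> 50%
```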
Conclusions
RANGER
• Re-ran the Ranger calculations
• Lower performance may be linked to the Intel compiler on AMD chips
  ▫ The PGI compiler could show an improvement
  ▫ Nothing over 5% is expected: Ranger would still be the slowest
  ▫ Wanted to use the same compiler/math libraries across systems
ABE
• Possible super-linear scaling: t(Abe, 256 procs) < t(others, 256 procs)
• Memory size effects hinder performance below 96 procs
POPLE
• Is the best system for wave function optimization
• Shows a (relatively) stable, modest speed decrease as the memory requirement is increased; it is the recommended system
Future Work
• Half-node benchmarking
• Profiling tools
• Test the MD part of CPMD
  ▫ Force calculations involving the non-local parts of the psp will increase memory
  ▫ Extensive level 3 BLAS & some level 2
  ▫ Many FFT all-to-all calls; now the network plays a role
  ▫ Memory > 2 GB
  ▫ A new variable! Monitor the fictitious electron mass
• Changing the model
  ▫ Metallic system (lots of electrons; change of psp and Ecut)
  ▫ Check super-linear scaling