QE intro - Eötvös Loránd University
oroszl.web.elte.hu/dft/slides04.pdf


I need a DFT, very first steps

1. find the best code for the problem
2. download & install
3. benchmark
4. set up run and run calculation
5. postprocess results

1. find the best code for the problem

Quantum ESPRESSO: the poor man's plane-wave code

Gaussian: the chemists' choice
http://www.bannedbygaussian.org
go for Siesta instead...

"reference code"

Why plane-wave?

+ orthonormal
+ independent of atomic positions
+ no BSSE
± naturally periodic
– many functions needed
– localised functions difficult to represent

2. download & install

Ubuntu users: sudo apt install quantum-espresso (mpirun version!)
    Not the latest version
    No pwgui
    Slow, only for educational use

src package from www.quantum-espresso.org/download and compile it!
    Have you ever compiled a linux kernel?

Quantum Mobile (virtualbox image)
kooplex-edu

3. benchmark

we skip this for now, but it is important

parallelization of the code vs parallelization over some of the parameters

FFT is important, gnu-compiler vs intel + MKL

Parallelization of QE

Parallelization on k-points:
- guarantees (almost) linear scaling if the number of k-points is a multiple of the number of pools;
- requires little communication (suitable for ethernet communication);
- reduces the required memory per processor by distributing wavefunctions (but not other quantities like the charge density), unless you set disk_io='high'.

Parallelization on PWs:
- yields good to very good scaling, especially if the number of processors in a pool is a divisor of N3 and Nr3 (the dimensions along the z-axis of the FFT grids, nr3 and nr3s, which coincide for NCPPs);
- requires heavy communication (suitable for Gigabit ethernet up to 4, 8 CPUs at most; specialized communication hardware needed for 8 or more processors);
- yields almost linear reduction of memory per processor with the number of processors in the pool.
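
The two divisibility rules above can be checked before submitting a job. A minimal sketch, assuming equal-sized pools; the function name pool_layouts and the job sizes (12 processes, 10 k-points, nr3 = 24) are made up for illustration. The number of pools is passed to pw.x with the -nk option.

```python
def pool_layouts(nprocs, nkpts, nr3):
    """List (npool, procs_per_pool) pairs that satisfy both rules of thumb:
    nkpts is a multiple of npool, and procs_per_pool divides nr3."""
    layouts = []
    for npool in range(1, nprocs + 1):
        if nprocs % npool != 0:
            continue  # pools must contain equal numbers of processes
        per_pool = nprocs // npool
        if nkpts % npool == 0 and nr3 % per_pool == 0:
            layouts.append((npool, per_pool))
    return layouts


# Hypothetical job: 12 MPI processes, 10 k-points, FFT grid with nr3 = 24
layouts = pool_layouts(12, 10, 24)
for npool, per_pool in layouts:
    print(f"mpirun -np 12 pw.x -nk {npool} < infile > outfile   # {per_pool} procs/pool")
```

Which of the admissible layouts is fastest still has to be measured; the rules only rule out the obviously unbalanced ones.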

kooplex-edu

2 × Xeon E5645 @2.40GHz

Cores  Time (s)
 1     238
 2     139
 3      94
 4      72
 6      56
 8      50
12      38

Cores  Time (s)
 1     208
 2     147
 3     108
 4      87
 6      52
 8      48
12      40
16      41
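
To read such timings, convert them to speedup and parallel efficiency. A small sketch using the first timing table; the function name scaling is our own choice. With these numbers the efficiency at 12 cores comes out at roughly 52%.

```python
def scaling(timings):
    """Return {cores: (speedup, efficiency)} relative to the 1-core run."""
    t1 = timings[1]
    return {n: (t1 / t, t1 / (t * n)) for n, t in sorted(timings.items())}


# First timing table from the slide (cores -> wall time in seconds)
table_a = {1: 238, 2: 139, 3: 94, 4: 72, 6: 56, 8: 50, 12: 38}

results = scaling(table_a)
for n, (speedup, eff) in results.items():
    print(f"{n:3d} cores: speedup {speedup:5.2f}, efficiency {eff:6.1%}")
```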

4. setup run ...

- examples and tests for every package, e.g. qe_dir/PW/examples
- graphical interfaces: pwgui, avogadro, etc.
- text editor & scripting

main control block

 &control
    prefix='silicon',
    ! this is a comment: you can comment out variables
    ! set pseudo_dir and outdir to suitable directories
    pseudo_dir = '../pseudo/',
    outdir='../tmp/'
 /

System block

 &system
    ibrav=  2, celldm(1) =10.2, nat=  2, ntyp= 1,
    ecutwfc = 18.0,
 /

ibrav   : type of Bravais lattice
celldm  : cell parameters; celldm(1) is the lattice constant in atomic units (bohr)
nat     : number of atoms in unit cell
ntyp    : number of types of atoms
ecutwfc : cutoff energy (in Ry)
ecutrho : default 4*ecutwfc (in Ry)

more info: wiki

MIND the commas and slashes!
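
Since celldm(1) is given in bohr while lattice constants are usually tabulated in Å, a conversion helper is handy. A minimal sketch; the helper names are our own, and the conversion factor is the rounded value 1 bohr = 0.529177 Å.

```python
BOHR_IN_ANGSTROM = 0.529177  # 1 bohr expressed in Å (rounded)


def angstrom_to_bohr(a_angstrom):
    """Convert a length from Å to bohr (the unit of celldm(1))."""
    return a_angstrom / BOHR_IN_ANGSTROM


def bohr_to_angstrom(a_bohr):
    """Convert a length from bohr to Å."""
    return a_bohr * BOHR_IN_ANGSTROM


print(f"celldm(1) = 10.2 bohr  ->  a = {bohr_to_angstrom(10.2):.3f} Å")
print(f"a = 5.431 Å            ->  celldm(1) = {angstrom_to_bohr(5.431):.3f} bohr")
```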

Atoms & kpoints

ATOMIC_SPECIES
 Si  28.086  Si.pz-vbc.UPF
ATOMIC_POSITIONS alat
 Si 0.00 0.00 0.00
 Si 0.25 0.25 0.25
#K_POINTS tpiba
# this is also a comment: next line, number of k-points
# 2
# 0.25 0.25 0.25  1.0
# 0.25 0.25 0.75  3.0
K_POINTS automatic
 2 2 2 1 1 1
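
For scripting, the blocks shown so far can be assembled into one input file from a template. A sketch only: the function name make_pw_input and its defaults are our own, calculation='scf' is spelled out although it is the default, and an empty &electrons namelist is added because pw.x expects that namelist even when all its defaults are kept.

```python
def make_pw_input(ecutwfc=18.0, kgrid=(2, 2, 2), celldm1=10.2):
    """Return the text of a pw.x input file for bulk silicon."""
    nk1, nk2, nk3 = kgrid
    return f"""\
 &control
    calculation = 'scf',
    prefix = 'silicon',
    pseudo_dir = '../pseudo/',
    outdir = '../tmp/'
 /
 &system
    ibrav = 2, celldm(1) = {celldm1}, nat = 2, ntyp = 1,
    ecutwfc = {ecutwfc}
 /
 &electrons
 /
ATOMIC_SPECIES
 Si  28.086  Si.pz-vbc.UPF
ATOMIC_POSITIONS alat
 Si 0.00 0.00 0.00
 Si 0.25 0.25 0.25
K_POINTS automatic
 {nk1} {nk2} {nk3} 1 1 1
"""


text = make_pw_input(ecutwfc=20.0, kgrid=(4, 4, 4))
print(text)
```

Writing the returned text to a file and running pw.x < infile > outfile then works as in the examples that follow.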

4. ... run calculation

run erhm what?

ls pwscfdir/bin:
alpha2f.x  average.x  bands.x  bgw2pw.x  cell2ibrav.x  dist.x
dos.x  dynmat.x  epa.x  epsilon.x  ev.x  fd_ef.x
fd_ifc.x  fd.x  fermi_proj.x  fermi_velocity.x  fqha.x  fs.x
generate_rVV10_kernel_table.x  generate_vdW_kernel_table.x  ibrav2cell.x
importexport_binary.x  initial_state.x  iotk
iotk_print_kinds.x  iotk.x  kpoints.x  lambda.x  manypw.x  matdyn.x
molecularpdos.x  neb.x  open_grid.x  path_interpolation.x  phcg.x  ph.x
plan_avg.x  plotband.x  plotproj.x  plotrho.x  pmw.x  pp.x
projwfc.x  pw2bgw.x  pw2gw.x  pw2wannier90.x  pwcond.x  pw_export.x
pwi2xsf.x  pw.x  q2qstar.x  q2r.x  q2trans_fd.x  q2trans.x
sumpdos.x  wannier_ham.x  wannier_plot.x  wfck2r.x

 

PWSCF executables

pw.x : Plane-Wave Self-Consistent Field
ph.x : phonons with Density-Functional Perturbation Theory
cp.x : Car-Parrinello Molecular Dynamics
neb.x : energy barriers and reaction pathways through the Nudged Elastic Band method
pwcond.x : ballistic conductance
dos.x bands.x pp.x ... : various utilities for data postprocessing
pwgui : a Graphical User Interface, producing input data files for PWscf
ld1.x : (atomic) a program for atomic calculations and generation of pseudopotentials

Bulk silicon

select your favourite ... (lattice constants in Å)

diamond-like:
C     3.567
Si    5.431
Ge    5.658

zinc-blende-like:
AlAs  5.6605
AlP   5.4510
AlSb  6.1355
GaP   5.4505
GaAs  5.653
GaSb  6.0959
InP   5.869
InAs  6.0583
InSb  6.479

Example 1

Convergence tests

- download pseudo (LDA+PAW for Si)
PART 1
- change ecutwfc (10, 20, ... Ry) & run
- find total energies
- plot results & find the converged ecutwfc
PART 2
- set ecutwfc to converged value
- increase kpoint set 2x2x2 ... 9x9x9 & run
- plot results & find the converged k-set

wget <link>
pw.x < infile > outfile
grep '!' outfile

go to my_example1 folder
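
The grep '!' step can also be scripted: pw.x marks the converged total energy with a leading '!'. A sketch under that assumption; the function names, the tolerance of 1e-3 Ry and all numbers below are made up for illustration, not results of a real run.

```python
def total_energy(output_text):
    """Return the last total energy (Ry) on a line starting with '!', or None."""
    energy = None
    for line in output_text.splitlines():
        if line.startswith("!"):
            # the line looks like: "!    total energy   =   -15.84 Ry"
            energy = float(line.split("=")[1].split()[0])
    return energy


def converged_index(energies, tol=1e-3):
    """Index of the first entry that differs from the next one by less than tol (Ry)."""
    for i in range(len(energies) - 1):
        if abs(energies[i + 1] - energies[i]) < tol:
            return i
    return None


# Illustrative output line and energies (NOT real data)
sample = "!    total energy              =     -15.84445040 Ry\n"
print(total_energy(sample))

cutoffs = [10, 20, 30, 40]                        # ecutwfc values in Ry
energies = [-15.70, -15.84, -15.8495, -15.8499]   # made-up total energies in Ry
i = converged_index(energies)
print("converged ecutwfc:", cutoffs[i], "Ry")
```

The same two helpers apply unchanged to the k-point scan of PART 2.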

Example 2

Geometry optimization by hand

- set ecutwfc, k-set (from my_example1)
- change unit cell parameter A=5.3 -- 5.5 in 0.02 steps (Ångström) & run
- find total energies
- plot & fit parabola

pw.x < infile > outfile
grep '!' outfile

go to my_example2 folder

experimental cell parameter A=5.431 Å
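
The parabola fit needs nothing beyond the standard library. A sketch of a least-squares quadratic fit via the normal equations; the function names are our own, and the energies below are synthetic values generated from an exact parabola with its minimum at 5.40 Å, purely to show that the fit recovers the minimum.

```python
def fit_parabola(xs, ys):
    """Least-squares fit y = c0 + c1*d + c2*d**2 with d = x - x0, x0 = mean(xs).
    Returns (x0, c0, c1, c2)."""
    x0 = sum(xs) / len(xs)
    d = [x - x0 for x in xs]
    # Normal equations M c = b with M[i][j] = sum d^(i+j), b[i] = sum y * d^i
    M = [[sum(t ** (i + j) for t in d) for j in range(3)] for i in range(3)]
    b = [sum(y * t ** i for t, y in zip(d, ys)) for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 3):
                M[r][k] -= f * M[col][k]
            b[r] -= f * b[col]
    # Back substitution
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        c[i] = (b[i] - sum(M[i][k] * c[k] for k in range(i + 1, 3))) / M[i][i]
    return x0, c[0], c[1], c[2]


def parabola_minimum(xs, ys):
    """Position of the minimum of the fitted parabola."""
    x0, _, c1, c2 = fit_parabola(xs, ys)
    return x0 - c1 / (2.0 * c2)


# Scan A = 5.30 ... 5.50 Å in 0.02 steps, with SYNTHETIC energies (Ry)
xs = [5.30 + 0.02 * i for i in range(11)]
ys = [-15.85 + 0.5 * (a - 5.40) ** 2 for a in xs]
print(f"fitted minimum: A = {parabola_minimum(xs, ys):.4f} Å")
```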

Example 3

Geometry optimization by built-in method

- add calculation='vc-relax' to control block
- add &cell + &ions blocks (even if they are empty)
- find final coordinates (in units of original lattice constant ...)
- compare result with the "by hand" method

pw.x < infile > outfile

calculation switch

go to my_example3 folder

Example 4

Band structure

- optimized structure, scf run with k-set
- add calculation='bands'
      nbnd = number_of_bands
      k-path in BZ
- run pw.x again
- bands.x to create gnuplot-compatible output
- plotband.x (writes plot scripts)

pw.x < infile > outfile
grep '!' outfile

go to my_example4 folder
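
The k-path for the bands run can be generated rather than typed. A sketch that writes a K_POINTS tpiba_b card, where the last number on each line is the number of points on the segment to the next vertex. The helper name kpath_card, the choice of path L-Γ-X and the 20 points per segment are our own assumptions; the coordinates are the usual fcc high-symmetry points in units of 2π/a.

```python
# High-symmetry points of the fcc Brillouin zone in units of 2*pi/a (tpiba)
FCC_POINTS = {
    "G": (0.0, 0.0, 0.0),
    "X": (1.0, 0.0, 0.0),
    "L": (0.5, 0.5, 0.5),
    "K": (0.75, 0.75, 0.0),
    "W": (1.0, 0.5, 0.0),
}


def kpath_card(labels, npts=20):
    """Build a K_POINTS tpiba_b card for the path through the given vertices."""
    lines = ["K_POINTS tpiba_b", str(len(labels))]
    for i, lab in enumerate(labels):
        kx, ky, kz = FCC_POINTS[lab]
        # points on the segment to the next vertex; 1 as a placeholder on the last one
        n = npts if i < len(labels) - 1 else 1
        lines.append(f" {kx:.4f} {ky:.4f} {kz:.4f} {n}")
    return "\n".join(lines)


card = kpath_card(["L", "G", "X"])
print(card)
```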

Band structure of bulk Si

Example 5

Density of states

- set up scf calculation
- set up dos calculation & run
- plot DOS
- DOS is ugly
   - increase kpoints
   - add occupations='tetrahedra' to scf & remove degauss from dos input
- DOS is pretty

pw.x < infile > outfile
dos.x < dos.in > dos.out
mpirun -np 6 pw.x

manual

go to my_example5 folder
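
Why the DOS looks ugly with few k-points: with Gaussian broadening every eigenvalue contributes one peak of width degauss, so a coarse k-set gives a sum of a few isolated bumps. A toy sketch of that broadening, not the dos.x implementation; the function name, the single test level and the grid are made up for illustration.

```python
import math


def gaussian_dos(eigenvalues, weights, energies, degauss):
    """DOS(E) = sum_k w_k sum_n g(E - e_nk), with g a normalized Gaussian
    g(x) = exp(-(x/degauss)**2) / (degauss * sqrt(pi))."""
    norm = 1.0 / (degauss * math.sqrt(math.pi))
    dos = []
    for e in energies:
        total = 0.0
        for bands, w in zip(eigenvalues, weights):
            for eps in bands:
                total += w * norm * math.exp(-((e - eps) / degauss) ** 2)
        dos.append(total)
    return dos


# Toy input: one k-point with weight 1 and a single level at E = 0
step = 0.01
grid = [-1.0 + step * i for i in range(201)]
dos = gaussian_dos([[0.0]], [1.0], grid, degauss=0.1)
print(f"integrated DOS: {sum(dos) * step:.4f}")  # one state per level
```

More k-points add more, closer-spaced peaks that merge into a smooth curve; the tetrahedron method avoids the broadening parameter altogether.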

DOS of bulk silicon

Example 6

Bulk modulus

- add tstress=.true. tag to &control block
- change the volume size by ±0.5%, ±1.0%, ...
- find the stress tensor in output, differentiate numerically & get the units right ...

go to my_example6 folder

$B = -V \, \frac{\partial p}{\partial V}$

Alternatively, set target stress to some value, perform geometry optimization, work out the bulk modulus...
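
The numerical differentiation and the unit handling can be sketched as follows. pw.x prints the pressure in kbar, and 1 kbar = 0.1 GPa; the volume unit cancels in B = -V dp/dV. The function name and every number below are hypothetical placeholders, not results of an actual run.

```python
KBAR_TO_GPA = 0.1  # 1 kbar = 0.1 GPa


def bulk_modulus(volumes, pressures_kbar):
    """Central-difference estimate of B = -V dp/dV at the middle point, in GPa.
    Expects three points ordered as V0 - dV, V0, V0 + dV."""
    if len(volumes) != 3 or len(pressures_kbar) != 3:
        raise ValueError("expects exactly three (V, p) points")
    v_minus, v0, v_plus = volumes
    p_minus, _, p_plus = pressures_kbar
    dp_dv = (p_plus - p_minus) / (v_plus - v_minus)
    return -v0 * dp_dv * KBAR_TO_GPA


# Made-up illustration, NOT computed data: volume changed by ±1%
v0 = 270.0                          # bohr^3, hypothetical cell volume
volumes = [0.99 * v0, v0, 1.01 * v0]
pressures = [9.8, 0.0, -9.6]        # kbar, hypothetical pressures
print(f"B ≈ {bulk_modulus(volumes, pressures):.1f} GPa")
```

Repeating the estimate with ±0.5% and ±1.0% volume changes gives a quick check that the finite-difference step is small enough.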
