http://aimpro.ncl.ac.uk MMG Skills Lecture Series Introductory concepts: Computer etiquette Jon Goss

Introductory concepts: Computer etiquette




Outline

  • Get organised
  • Consistency vs efficiency
  • Compute machine vs file server
  • Hierarchical calculation strategy
  • Restarting calculations
  • Throughput vs turn-around time
  • Interacting with a batch queue


Get organised

  • You are in an area of study that has the potential to produce vast amounts of data
      • Why do we need to store it?
  • It is crucial that you adopt a comprehensive, structured filing system from the very start.
      • Alpha-numerical file naming with index files – could be chronological.
          • Potentially many files per directory.
      • Hierarchical set of directories based on application
          • Try to keep few files per directory.
  • You’ll probably have to revise your filing system over time.
  • Acid test:
      • When asked to locate a file, can you do it within a few seconds? Minutes? Hours? Days?
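One way to make the acid test concrete is to script it. The sketch below is only an illustration in Python: the function names build_index and locate are hypothetical, and nothing here is part of AIMPRO or of any particular filing convention. It counts the files in each directory under a root (to spot directories that are getting crowded) and looks a file up by a name pattern.

```python
import fnmatch
import os


def build_index(root):
    """Return sorted (relative_directory, file_count) pairs for every directory under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        entries.append((os.path.relpath(dirpath, root), len(filenames)))
    return sorted(entries)


def locate(root, pattern):
    """Return the sorted paths of all files under root whose name matches a shell-style pattern."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

If locate needs more than a moment to find what you want, or build_index shows directories with hundreds of files, the filing system is probably due for a revision.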


Consistency vs efficiency

  • When you begin researching an application, you may adopt a parameter set from some previous calculations to be consistent with them
  • The foundation work may already be there
      • Bulk material: basis sets, lattice constants, convergence tests…
  • This may allow direct comparison between previous and current calculations
      • Binding / reaction energies
      • Marker method calculations for electrical levels
      • Structures for better starting points
  • It reduces the probability of trivial errors – it does not eliminate them.


Consistency vs efficiency

  • Consistency is good, but not at any cost:
      • The old parameter set may be inefficient in terms of CPU time per calculation.
      • New parameters may be better designed (e.g. optimised basis sets) for this specific project

In the long run it may be better to spend time at the start setting your parameters, as this gives you a higher level of confidence in the accuracy of the results, and grounds you in the fundamentals of the calculations

You may well find that you have to repeat large numbers of calculations if you have to revise from a sub-optimal setup…

Calculating the same basic values twice is unlikely to be exemplary efficiency


Consistency vs efficiency

Whatever you do, it should be agreed with all involved in the project, and you should choose your strategy carefully at the start:

Think!


Compute machine vs file server

This distinction is part of “organisation”, and is important in:

  • Avoiding duplication
  • Avoiding loss of data
  • Allowing access for others involved in the project
  • Keeping access to important files for yourself

Some computers are there for calculations:

  • Rhodes, Verity, Braid, Hector, HPCx, …

Other machines are designed to store files:

  • Trueman, Snufkin…


File storage

We need to answer three questions:

  • Why do we need to store files?
  • Which files do we need to keep?
  • How can we minimise file-space usage?


File storage: why?

Why do we need to store files?

  • Scientific ethics require us to be able to back up our claims!
  • They are one of our chief resources for future research.


File storage

Which files do we need to keep?

  • Always AIMPRO standard output, plus
      • Bandstructures: bandst.out, bandt.plt
      • EELS / OA: bandst.out, dieln output files
      • Mulliken: bandt.out
      • NEB: maybe res.neb
      • AIMVIEW: bandst.out, dump files (careful with these)
      • DDS: maybe derivs.txt
      • DoS: bandst.out, dos.out, dos.dump
  • Keep all parts of a run
      • If you have to restart a relaxation, for example, keep all parts.

What don’t we keep in our permanent filing:

  • Restart dump files!
  • fort.99, standard error files (e.g. aim2.3.01b.sh.e18371)
  • Aimpro input files (dat, pseudo-pots, hgh-pot, bandst.dat,…) unless they are modified specifically for this run, and cannot be re-created from the aimpro output.


File storage

How can we minimise file-space usage?

  • Most files we generate are ASCII
  • They can be reduced in size by compressing them (we’ll look again at this in another lecture):
      bzip2 <file>
    (You can learn about this by typing man bzip2 on snufkin.)
      • A typical aimpro output file may be reduced in size by 80% without any loss of data.
      • ‘Bzipped’ files can be viewed, grep’d and even edited.
  • You have a quota on most machines
      • If you exceed it, you’ll be unable to do very much
      • You may be prevented from logging into the machine!
  • If there is no quota, you may fill the disk space and affect all other users.
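The two claims about bzip2 above (no loss of data, and files remaining searchable while compressed) can be checked with Python's standard bz2 module, which reads and writes the same format as the bzip2 command. This is a minimal sketch; the function names are hypothetical and the compression you get depends on the file, so no particular ratio is assumed.

```python
import bz2


def roundtrip_is_lossless(text):
    """Compress and decompress a string in memory and confirm nothing changed."""
    packed = bz2.compress(text.encode("utf-8"))
    return bz2.decompress(packed).decode("utf-8") == text


def compressed_fraction(text):
    """Size of the compressed data as a fraction of the original size."""
    raw = text.encode("utf-8")
    return len(bz2.compress(raw)) / len(raw)


def grep_bz2(path, needle):
    """Return the lines containing needle, read straight from a .bz2 file."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if needle in line]
```

grep_bz2 never writes an uncompressed copy to disk, which is the point when you are close to your quota.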


Calculation strategy: hierarchy of costs

The majority of the AIMPRO computational time is taken in obtaining total energies:

The self-consistent cycle.

We focus on this part of the calculation for speed.

[Flowchart: Obtain ρin → Generate Ĥ → Obtain ρout → ρin = ρout? If so, done; if not, repeat.]
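The self-consistent cycle is a fixed-point iteration, and a toy version shows its shape. In the Python sketch below the "density" is a single number and the update function stands in for "generate Ĥ, obtain ρout"; the simple linear mixing of old and new densities is an assumption made for illustration and is not a description of how AIMPRO mixes charge densities.

```python
import math


def scf(update, rho_in, mix=0.5, tol=1e-10, max_steps=200):
    """Iterate rho_in -> rho_out = update(rho_in) until input and output agree.

    Returns the converged value and the number of steps taken.
    """
    for step in range(1, max_steps + 1):
        rho_out = update(rho_in)          # toy stand-in for building H and solving it
        if abs(rho_out - rho_in) < tol:   # rho_in = rho_out? Done
            return rho_in, step
        # Feed a mixture of old and new back in as the next input.
        rho_in = (1.0 - mix) * rho_in + mix * rho_out
    raise RuntimeError("self-consistency not reached within max_steps")


# Toy example: the self-consistent solution of rho = cos(rho).
rho, steps = scf(math.cos, 1.0)
```

Starting closer to the converged value reduces the number of steps, which is why a good initial charge density matters for the cost of each energy.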


Calculation strategy: hierarchy of costs

  • Commonly, a goal may be reached in a sequential method, minimizing computational effort…
  • …we want to maximise the amount of science we can do, so we need to be able to answer the following questions:
      • How do the decisions we make in constructing a data file, and how we run the job, affect the time & efficiency?
      • Which factors are most important?


Calculation strategy: hierarchy of costs

  • Number of atoms
  • Number of different species
  • Self-consistency method
  • Number of basis functions
  • Number of exponents
  • Maximum orbital angular momentum
  • Initialisation charge density basis
  • Number of k-points
  • Location of k-points
  • Spin state
  • Plane-wave basis
  • Number of processors
  • The amount of memory available
  • K-point parallelism
  • Number of symmetry operations
  • DIIS
  • Real-space build
  • The amount of vacuum (molecules and surfaces)
  • Number of images in a NEB
  • Number of k-points in a band structure
  • Number of bands in an EELS run
  • Number of energies in an EELS run
  • Number of atoms included in the derivatives


Calculation strategy: hierarchy of costs

(Items marked * are emphasised on the slide, in whole or in part.)

  • Number of atoms
  • Number of different species
  • Self-consistency method
  • Number of basis functions *
  • Number of exponents
  • Maximum orbital angular momentum
  • Initialisation charge density basis
  • Number of k-points *
  • Location of k-points (real/complex) *
  • Spin state *
  • Plane-wave basis (vacuum) *
  • Number of processors
  • The amount of memory available *
  • K-point parallelism
  • Number of symmetry operations
  • DIIS (extensive parallelism) *
  • Real-space build
  • The amount of vacuum (molecules and surfaces)
  • Number of images in a NEB
  • Number of k-points in a band structure
  • Number of bands in an EELS run
  • Number of energies in an EELS run
  • Number of atoms included in the derivatives


Calculation strategy: hierarchy of costs

For an SCF step:

  • The time for an energy scales as n^α, with α ~ 2 to 4, where n is the dimension of the Hamiltonian.
  • Going from real arithmetic to complex (at a general k-point), the time increases by a factor of ~½ to 1 order of magnitude.
  • Sampling the Brillouin-zone with MP-2³ generally includes 4 (complex) points for a cubic cell.
  • Spin polarisation doubles the time taken
      • The number of SCF steps for an energy may increase with spin-polarisation.
  • Compare the time for an energy for a pppp, pdpp, ddpp and dddd basis.
  • Compare the time for pppp, gamma point, spin averaged, and dddd, 4 complex k-points, spin polarised.
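The last comparison can be estimated on the back of an envelope by multiplying the factors listed above. The Python sketch below does this under loudly stated assumptions: α = 3 is just one value from the quoted range, the complex-arithmetic factor is taken as half an order of magnitude (the low end of the quoted range), and the basis sizes of 16 functions per atom for pppp and 40 for dddd are assumed for illustration rather than taken from the lecture. The function name is hypothetical.

```python
def relative_scf_cost(n_functions, alpha=3.0, n_kpoints=1,
                      complex_factor=1.0, spin_polarised=False):
    """Relative time for one energy, in arbitrary units.

    n_functions     dimension of the Hamiltonian (basis functions)
    alpha           scaling exponent, assumed here; the lecture quotes 2 to 4
    n_kpoints       number of k-points sampled
    complex_factor  extra cost of complex arithmetic (1.0 for real arithmetic)
    spin_polarised  spin polarisation doubles the time
    """
    cost = float(n_functions) ** alpha * n_kpoints * complex_factor
    return 2.0 * cost if spin_polarised else cost


# Assumed basis sizes per atom (illustrative): pppp = 16, dddd = 40.
cheap = relative_scf_cost(16)
expensive = relative_scf_cost(40, n_kpoints=4, complex_factor=10 ** 0.5,
                              spin_polarised=True)
ratio = expensive / cheap
```

With these assumptions the ratio comes out at roughly 400, before allowing for any increase in the number of SCF steps. Changing α within the quoted range moves the answer substantially, which is itself worth knowing.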


Calculation strategy: hierarchy of costs

  • The most common calculation we perform is a structural relaxation.
  • If we do not have an accurate starting structure, it is likely that the optimisation will require multiple optimisation iterations.
      • Why is this likely to be the case?
  • The number of structural iterations is approximately independent of how we run AIMPRO, provided the calculation is performed within a ‘reasonable’ set of parameters (i.e. not necessarily convergent ones).
      • What does this tell us about the energy surface?
  • A structural iteration generally takes more SCF steps when we’re far from the structural minimum.
      • This is related to recycling charge densities from previous structures – why does this affect the number of SCF cycles?


Calculation strategy: hierarchy of costs

Example: Substitutional gold in silicon. We want to know the symmetry, donor and acceptor levels. We need to check the convergence with

  • Cell size
  • Basis
  • Pseudo-potential – whether the 5d electrons are included in the valence or not

How do we start? Sketch out a strategy to get all the data we need.


Calculation strategy: hierarchy of costs

Basis

  • Design a hierarchical basis set sequence:
      C44G* → pdpp → dddd

  • How do we know where we can start?
  • And how far we need to go?


Calculation strategy: hierarchy of costs

Sampling

  • Start with the simplest viable sampling scheme:
      • Gamma-point, MP-2³, …
  • Use k-point parallelism
      • The Hamiltonians for different k-points are diagonalised in parallel, rather than in serial
          parameter{use_kpar}
      • If nk<np, then np must be an integer multiple of nk.
      • If nk>np, then this is not the case.

  • How do we know what is “viable”?
  • How does this improve efficiency?
  • What are the potential pitfalls?

[Figure: two processor grids illustrating k-point parallelism for k-points k1 to k4: one with a single processor per k-point, one with four processors per k-point.]
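The rule relating the number of k-points (nk) to the number of processors (np) can be written down as a small check. The Python sketch below is an illustration only: the function names are hypothetical, and the round-robin dealing used when there are more k-points than processors is an assumed scheme, not a description of what AIMPRO does internally.

```python
import math


def kpoint_layout(n_k, n_p):
    """Map each k-point index to the list of processor indices working on it."""
    if n_k < 1 or n_p < 1:
        raise ValueError("need at least one k-point and one processor")
    if n_k < n_p:
        # Fewer k-points than processors: every k-point gets an equal share.
        if n_p % n_k != 0:
            raise ValueError("n_p must be an integer multiple of n_k")
        share = n_p // n_k
        return {k: list(range(k * share, (k + 1) * share)) for k in range(n_k)}
    # At least as many k-points as processors: no divisibility constraint.
    # Assumed scheme: deal the k-points out one processor at a time.
    return {k: [k % n_p] for k in range(n_k)}


def rounds_needed(n_k, n_p):
    """Number of sequential batches when each processor handles whole k-points."""
    return math.ceil(n_k / n_p)
```

For example, kpoint_layout(4, 16) gives four processors to each k-point, while rounds_needed(5, 4) is 2, so one processor works through a second batch while three sit idle. That kind of imbalance is one of the pitfalls to look out for.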


Calculation strategy: hierarchy of costs

Spin polarisation

  • For spin-polarised problems, first relax spin averaged – this makes the calculation run twice as fast.

  • What does spin averaged / polarised mean?
  • Can we always relax S=0?
  • Is it always really helpful?


Calculation strategy: hierarchy of costs

Supercell size

  • Start with a small unit cell and embed it in larger ones:
      • 64 → 216 → 512 → 1000 atoms
  • Use an anchor point
  • Take care over symmetry

What are the implications for timing?
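A rough answer to the timing question follows from the n^α scaling quoted earlier in the lecture. The Python sketch below assumes α = 3 (one value from the quoted range) and notes that the sequence 64, 216, 512, 1000 is what you get from 8 atoms per cubic cell repeated n×n×n times for n = 2 to 5; reading the sequence that way is an assumption, and the function names are hypothetical.

```python
def supercell_atoms(n, atoms_per_cell=8):
    """Atoms in an n x n x n supercell built from a cell of atoms_per_cell atoms."""
    return atoms_per_cell * n ** 3


def relative_time(atoms, reference_atoms=64, alpha=3.0):
    """Time per energy relative to the reference cell, assuming time ~ atoms**alpha."""
    return (atoms / reference_atoms) ** alpha


sizes = [supercell_atoms(n) for n in range(2, 6)]   # [64, 216, 512, 1000]
costs = [relative_time(a) for a in sizes]
```

Under these assumptions the 512-atom cell costs about 500 times as much per energy as the 64-atom cell, and the 1000-atom cell several thousand times as much, so it pays to settle everything you can in the small cell first.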


Calculation strategy: hierarchy of costs

Starting structures continued…

  • Use the structure obtained from one charge state to start others.
  • Recycle similar systems: e.g. use a phosphorus structure to start an arsenic one.

If you’ve already run in LDA and you now want it GGA, scale the structure according to the ratio of standardized lengths (typically lattice constants).

What sort of errors might these short-cuts lead to, if any?
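The LDA to GGA rescaling mentioned above is a uniform scaling of every coordinate by the ratio of the two lattice constants. A minimal Python sketch, assuming Cartesian coordinates and a hypothetical function name; the lattice vectors of the cell must be scaled by the same factor, and any lattice constants you feed in should be your own converged values.

```python
def rescale_structure(positions, a_old, a_new):
    """Scale Cartesian atomic positions by the ratio of lattice constants a_new / a_old."""
    if a_old <= 0 or a_new <= 0:
        raise ValueError("lattice constants must be positive")
    factor = a_new / a_old
    return [tuple(factor * coordinate for coordinate in atom) for atom in positions]
```

This preserves the shape of the relaxed structure exactly, so it is only a starting point: local bond lengths around a defect need not scale in the same way as the bulk lattice constant.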


Calculation strategy: hierarchy of costs

Recycle the charge-density!

  • When restarting an incomplete relaxation, or restarting at the end of the relaxation to get some analytical data (e.g. AIMVIEW dumps, band-structures, EELS spectra...)
      • Use a “restart-dump”!
      • What does the restart dump contain?

restart{make-dump}

restart{load-dump}

restart{load-dump,override-positions}

  • How can the restart be used?
  • How much time might this save?


Restarting calculations: more generally

  • Your default mode of operation is to always write to a restart dump using:
      restart{make-dump,file=dump.xxxxx}
  • You might store them all in one place
      filespace{/scratch/njpg/DUMP-FILES}
      • This is what I do!
  • The xxxxx is a unique identifier associating the dump-file with the run.
  • The restart dump files may be very large in terms of disk-usage!
  • Restarts also exist for other calculations:
      • NEB
          • This will be discussed in detail elsewhere in this course
      • Energy second derivatives with respect to position
          • You should check the on-line documentation


Sometimes we have to bite the bullet apple(?!).

After all this…


Throughput vs turn-around time

When running a parallel job, there is a scaling penalty:

  • Doubling the number of nodes will NOT halve the time (in general).
      • Why is this?
  • It WILL prevent other people from using the nodes.

You will constantly balance throughput (the number of jobs completed per day) with turn-around time (wall time from start to job completion).

In general:

  • throughput is maximised by adopting the minimum number of nodes for the job, given the batch queue time limits
  • turn-around time is minimised by adopting larger numbers of nodes (subject to scaling).

Minimising the turn-around time is anti-social behaviour.

Big brother is watching you.
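The trade-off can be put into numbers with a simple scaling model. The Python sketch below uses Amdahl's law, which is a model brought in here for illustration and not one named in the lecture: a fixed fraction of each job is assumed to parallelise perfectly and the rest not at all. Real scaling also depends on the interconnect and on the job itself, and the function names and the figures in the example are hypothetical.

```python
def amdahl_speedup(nodes, parallel_fraction):
    """Speed-up on `nodes` nodes when only `parallel_fraction` of the work parallelises."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / nodes)


def turnaround_hours(nodes, serial_hours, parallel_fraction):
    """Wall time for one job on `nodes` nodes."""
    return serial_hours / amdahl_speedup(nodes, parallel_fraction)


def jobs_per_day(total_nodes, nodes_per_job, serial_hours, parallel_fraction):
    """Throughput when the machine is filled with identical jobs."""
    concurrent_jobs = total_nodes // nodes_per_job
    return concurrent_jobs * 24.0 / turnaround_hours(
        nodes_per_job, serial_hours, parallel_fraction)


# Hypothetical figures: 16 nodes, jobs taking 24 h on one node, 90% parallel.
one_node_each = jobs_per_day(16, 1, 24.0, 0.9)    # 16 jobs per day, 24 h turn-around
all_nodes_each = jobs_per_day(16, 16, 24.0, 0.9)  # fewer jobs per day, shorter turn-around
```

In this example giving every job all 16 nodes cuts the turn-around from 24 hours to under 4, but the machine completes well under half as many jobs per day.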


Interacting with a batch queue system

The batch queue systems differ from machine to machine.

It is important that you familiarise yourself with:

  • Memory per node
  • CPUs/cores per node
  • Time limits
  • Scheduling priority
  • Disk-usage
  • Scaling (interconnect)
  • Reliability

How do these relate to the preceding discussion?


Interacting with a batch queue system

  • Every day, check all of your jobs
      • Running
      • Finished
  • For those still running, check to see:
      • Is all well?
      • Can/should this run be terminated?
          • When might this be?
          • Maximise the available resources for all users
  • For those finished:
      • Restart them, if need be
          • Incomplete relaxation
          • Analysis
      • File them carefully


Concluding thoughts

  • Be organised
  • Be hardware aware
  • Be efficient
  • Be sociable

THINK about what you’re doing.