www.ccdc.cam.ac.uk

How to make the most of a QM calculation

Noel O’Boyle

[email protected]

Background

• ‘Career’:

– (ROI) UCG, DCU, UCD

– (UK) UCC, CCDC

• PhD in Computational Inorganic Chemistry

– Han Vos, Dublin City University (Ru polypyridyls)

• Postdoc in Cheminformatics

– Ciaran Regan, University College Dublin

– John Mitchell, University of Cambridge (MACiE)

• Postdoc in Protein-Ligand Docking

– Cambridge Crystallographic Data Centre (GOLD)

Tools

• GaussSum

– GUI for analysing results of comp chem calculations

• cclib

– Python library for extracting data from comp chem calculations (now used by GaussSum…and others)

• Pybel

– Python library giving access to OpenBabel

Some general themes

• Interoperability

• Reinvent the wheel

– Libraries spread the work, and increase the reach

• Tools can add value

• Cross-platform

• Python where possible

Python is the dominant scripting language in chemistry

• Cheminformatics

– OpenBabel, RDKit, OEChem, Daylight, Cambios Molecular Toolkit, Frowns, PyBabel

• Computational chemistry

– OpenBabel, PyQuante, NWChem, Maestro/Jaguar, MMTK

• Visualisation

– CCP1GUI, PyMOL, Zeobuilder

• Scientific programming

– numpy (interface to ATLAS, LAPACK), can interface to C/C++, FORTRAN, matplotlib, VTK

GaussSum (.sf.net)

• GUI written in Python

• Enables comparisons of calculated properties with experimental results

– orbitals and molecular structure, partial density of states

• HOMO is 40% Ligand 1, 20% Ligand 2, etc.

– vibrational frequencies and IR spectrum

• scale frequencies individually or generally

– electronic transitions and UV-vis, CD spectra

– electronic transitions and molecular structure

• lowest energy transition involves change in ‘charge density’ on Ligand 1 from 0% to 80%

• (Electron density difference map removed, but how to make package independent?)

NM O’Boyle, AL Tenderholt, KM Langner. J. Comp. Chem. 2008, 29, 839. http://gausssum.sf.net
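A statement such as "HOMO is 40% Ligand 1, 20% Ligand 2" can be obtained by splitting a molecular orbital into contributions from groups of basis functions. The slide does not say how GaussSum does this, so the following is only a minimal sketch of one common approach (a Mulliken-style partition). The function name, the two-function orbital and the overlap matrix are hypothetical.

```python
def group_contributions(coeffs, overlap, groups):
    """Percentage contribution of each group of basis functions to one MO.

    Mulliken-style partition: basis function i contributes
    c_i * sum_j(c_j * S_ij); the result is normalised to 100%.
    """
    n = len(coeffs)
    per_function = [
        coeffs[i] * sum(coeffs[j] * overlap[i][j] for j in range(n))
        for i in range(n)
    ]
    total = sum(per_function)
    return {
        name: 100.0 * sum(per_function[i] for i in indices) / total
        for name, indices in groups.items()
    }


# Hypothetical orbital built from two basis functions, one on each ligand.
coeffs = [1.0, 2.0]
overlap = [[1.0, 0.5], [0.5, 1.0]]
groups = {"Ligand 1": [0], "Ligand 2": [1]}
shares = group_contributions(coeffs, overlap, groups)
```

In a real calculation the coefficients and overlap matrix would come from the parsed output file, and each group would list the basis functions centred on the atoms of one ligand.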

GaussSum

• Simple features that make life easier for modellers

– ‘grep’ for lines containing particular expressions

• can store up to four expressions

– spectra and extracted data are written to files suitable for Excel

– plot convergence of geometry or SCF

• early warning of problems (unlike plotting of energy)

• “GaussSum parameter”

– Sum of log(deviation from target value) over all unmet targets
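The slide defines this parameter only loosely, so the following is a minimal sketch under stated assumptions: the deviation is taken as the ratio of the current value to its target, the logarithm is base 10, and criteria that already meet their target contribute nothing. The function name and the sample numbers are hypothetical and are not taken from GaussSum's source.

```python
import math


def convergence_parameter(values, targets):
    """Sum log10(value / target) over the criteria that are still unmet."""
    total = 0.0
    for value, target in zip(values, targets):
        if abs(value) > target:  # criterion not yet satisfied
            total += math.log10(abs(value) / target)
    return total


# Hypothetical convergence targets and the values reached at three steps.
targets = [4.5e-4, 3.0e-4, 1.8e-3, 1.2e-3]
step_values = [
    [4.5e-2, 3.0e-3, 1.8e-1, 1.2e-2],  # far from converged
    [4.5e-3, 3.0e-4, 1.8e-3, 1.2e-3],  # one criterion still unmet
    [1.0e-4, 1.0e-4, 1.0e-3, 1.0e-3],  # all criteria met
]
progress = [convergence_parameter(v, targets) for v in step_values]
```

Plotted against step number, a quantity like this falls towards zero as the criteria are met, which is why it can flag a stalled optimisation earlier than a plot of the energy alone.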

cclib (.sf.net) - a Python library for package-independent computational chemistry algorithms

• In Jan 2005, Adam Tenderholt started writing PyMOlyze (now QMForge)

– some overlap with GaussSum

– we decided to collaborate on a common framework for extracting data from QM log files

• Karol Langner joined in Jan 2007

• cclib now extracts and standardises data from ADF, GAMESS, GAMESS-UK, Gaussian, PC GAMESS, Jaguar, Molpro, ORCA...(someone offered this week to help with ACES, Dalton, NWChem, and PSI too)

NM O’Boyle, AL Tenderholt, KM Langner. J. Comp. Chem. 2008, 29, 839. http://cclib.sf.net

Why is cclib needed?

• Analysis methods are available only to users of certain packages

– Morokuma energy decomposition (implemented in GAMESS)

– Charge Decomposition Analysis (Frenking's code only reads Gaussian output files)

• Keeps up to date with new versions of packages

• Allows chemists to focus on algorithms

• Makes implementation of algorithms independent of proprietary software

>>> from cclib.parser import ccopen
>>> myfile = ccopen("basicGAMESS-UK/water_mp3.out")
>>> data = myfile.parse()
>>> dir(data)
['__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__',
 '__hash__', '__init__', '__module__', '__new__', '__reduce__',
 '__reduce_ex__', '__repr__', '__setattr__', '__str__', '__weakref__',
 '_attrlist', '_attrtypes', '_intarrays', '_listsofarrays', 'aonames',
 'arrayify', 'atombasis', 'atomcoords', 'atomnos', 'charge',
 'coreelectrons', 'gbasis', 'getattributes', 'homos', 'listify',
 'mocoeffs', 'moenergies', 'mosyms', 'mpenergies', 'mult', 'natom',
 'nbasis', 'nmo', 'scfenergies', 'scftargets', 'scfvalues',
 'setattributes']
>>> print(data.nbasis)
7
>>> print(data.atomcoords)
[[[ 0.         0.        -0.2251786]
  [ 0.         1.4941103  0.9007143]
  [ 0.        -1.4941103  0.9007143]]]
>>>

| Attribute name | Description | Units | Data type |
|---|---|---|---|
| aonames | atomic orbital names | | list |
| aooverlaps | atomic orbital overlap matrix | | array of rank 2 |
| atomcoords | atom coordinates | Å | array of rank 3 |
| atomnos | atomic numbers | | array of rank 1 |
| coreelectrons | number of core electrons in an atom's pseudopotential | | array of rank 1 |
| etenergies | energies of electronic transitions | cm^-1 | array of rank 1 |
| etoscs | oscillator strengths of electronic transitions | | array of rank 1 |
| etrotats | rotatory strengths of electronic transitions | | array of rank 1 |
| etsecs | singly-excited configurations for electronic transitions | | list of lists |
| etsyms | symmetries of electronic transitions | | list |
| fonames | fragment molecular orbital names | | list |
| fooverlaps | fragment molecular orbital overlap matrix | | array of rank 2 |
| gbasis | coefficients and exponents of Gaussian basis functions | | PyQuante format |
| geotargets | criteria target values for geometry convergence | | array of rank 1 |
| geovalues | criteria values for geometry convergence | | array of rank 2 |
| homos | molecular orbital index of the HOMO(s) | | array of rank 1 |
| mocoeffs | molecular orbital coefficients | | list of arrays of rank 2 |
| moenergies | molecular orbital energies | eV | list of arrays of rank 1 |
| mosyms | molecular orbital symmetries | | list of lists |
| mpenergies | Møller-Plesset corrected electronic energies | eV | array of rank 2 |
| natom | number of atoms | | integer |
| nbasis | number of basis functions | | integer |
| nmo | number of molecular orbitals | | integer |
| scfenergies | electronic energy of the molecule | eV | array of rank 1 |
| scftargets | criteria target values for SCF convergence | | array of rank 2 |
| scfvalues | criteria values for SCF convergence | | list of arrays of rank 2 |
| vibdisps | Cartesian displacement vectors | ΔÅ | array of rank 3 |
| vibfreqs | vibrational frequencies | cm^-1 | array of rank 1 |
| vibirs | IR intensities | km mol^-1 | array of rank 1 |
| vibramans | Raman intensities | Å^4 amu^-1 | array of rank 1 |
| vibsyms | symmetries of vibrations | | list |

Standardisation of Symmetry Labels

• For the symmetry labelled BU by GAMESS and Gaussian, ADF uses B.u, GAMESS-UK uses bu and Jaguar uses Bu

– cclib normalises all of these to Bu

• In other cases all of the programs disagree: A” is variously represented by AAA (ADF), A’’ (GAMESS), a1” (GAMESS-UK), A” (Gaussian) and App (Jaguar)

• (one of the programs is internally inconsistent in another case)
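The normalisation described above can be sketched as follows. This is not cclib's own code: the function names and the alias table are hypothetical, and only a few of the spellings listed on the slide are covered. Labels that differ only in case or punctuation are handled by a rule, while spellings such as AAA or App need an explicit per-program table.

```python
def normalise_case_and_dots(label):
    """Map spellings such as BU, B.u and bu to the single form 'Bu'."""
    cleaned = label.replace(".", "")
    return cleaned[0].upper() + cleaned[1:].lower()


# Spellings that cannot be fixed by a case/punctuation rule (hypothetical table).
EXPLICIT_ALIASES = {
    ("ADF", "AAA"): 'A"',
    ("GAMESS", "A''"): 'A"',
    ("Gaussian", 'A"'): 'A"',
    ("Jaguar", "App"): 'A"',
}


def normalise_label(program, label):
    """Return one standard symmetry label for a program-specific spelling."""
    key = (program, label)
    if key in EXPLICIT_ALIASES:
        return EXPLICIT_ALIASES[key]
    return normalise_case_and_dots(label)


standardised = [
    normalise_label(program, label)
    for program, label in [
        ("GAMESS", "BU"),
        ("ADF", "B.u"),
        ("GAMESS-UK", "bu"),
        ("Jaguar", "Bu"),
    ]
]
```

Keeping the rule and the table separate means a new program usually needs only a few extra table entries instead of a new parser.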

..\data\ADF\ADF2004.01\MoOCl4-sp.adfout.bz2... parsed
..\data\ADF\ADF2004.01\mo_sp.adfout.bz2... parsed
..\data\ADF\ADF2004.01\NH3.adfout.bz2... parsed
..\data\ADF\ADF2005.01\Os3(CO)12-D3h.zip... parsed
..\data\ADF\ADF2005.01\Os3.zip... parsed
..\data\ADF\ADF2006.01\Au2.out... parsed
..\data\ADF\ADF2006.01\Frags_NiCO4_orig.out... parsed
..\data\ADF\ADF2006.01\HgMeBr_zso_orig.out... parsed
..\data\ADF\ADF2006.01\dvb_gopt.adfout.bz2... parsed

Are the GAMESS UK files ccopened and parsed correctly?
..\data\GAMESS-UK\basicGAMESS-UK\dvb_gopt.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_gopt_b.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_gopt_c.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_gopt_d.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_ir.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_raman.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_sp.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_sp_b.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_un_sp.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_un_sp_b.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\MoOCl4-sp.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\water_mp2.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\water_mp3.out... parsed
..\data\GAMESS-UK\GAMESS-UK6.0\dscf_4.out.gz... parsed
..\data\GAMESS-UK\GAMESS-UK6.0\duhf_1.out.gz... parsed
..\data\GAMESS-UK\GAMESS-UK7.0\mg10.out.gz... parsed
..\data\GAMESS-UK\GAMESS-UK7.0\pyridine.out.gz... parsed
..\data\GAMESS-UK\GAMESS-UK7.0\pyridine2_21m10r.out.gz... parsed

Are the Jaguar files ccopened and parsed correctly?
..\data\Jaguar\Jaguar4.2\dvb_gopt.out.bz2... parsed
..\data\Jaguar\Jaguar4.2\dvb_gopt_b.out.bz2... parsed
..\data\Jaguar\Jaguar4.2\dvb_ir.out.bz2... parsed
..\data\Jaguar\Jaguar4.2\dvb_sp.out.bz2... parsed
Total: 147 Failed: 0 Errors: 2

**** testGeoOpt: GAMESS-UK geometry optimization unittest. ****
Are the indices in atombasis the right amount and unique? ... ok
Are atomcoords consistent with natom and Angstroms? ... ok
Are the atomnos correct? ... ok
Are the charge and multiplicity correct? ... ok
Are the coreelectrons all 0? ... ok
Are the dimensions of mocoeffs equal to 1 x (homo+5) x nbasis? ... ok
Do the geo targets have the right dimensions? ... ok
Are atomcoords consistent with geovalues? ... ok
Are scfvalues consistent with geovalues? ... ok
Is the index of the HOMO equal to 34? ... ok
Is the number of evalues equal to nmo? ... ok
Is the number of atoms equal to 20? ... ok
Is the number of basis set functions correct? ... ok
Did this subclass overwrite normalisesym? ... ok
Is the SCF energy within 40eV of target? ... ok
Do the scf targets have the right dimensions? ... ok
Are scfvalues and its elements the right type? ... ok
Are all the symmetry labels either Ag/u or Bg/u? ... ok
Is moenergies a list containing one numpy array? ... ok

----------------------------------------------------------------------
Ran 19 tests in 0.016s

********* SUMMARY PER PACKAGE ****************
             Total  Passed  Failed  Errors  Skipped
ADF2007.01      48      46       0       0        2
GAMESS-UK       58      58       0       0        0
GAMESS-US       75      71       2       0        2
Gaussian03      92      88       1       0        3
Jaguar7.0       54      47       0       0        7
Molpro2006      63      59       0       0        4
ORCA2.6         54      44       5       3        2
PCGAMESS        75      74       0       0        1

********* SUMMARY OF EVERYTHING **************
TOTAL: 519 PASSED: 487 FAILED: 8 ERRORS: 3 SKIPPED: 21

But it’s Python! I only code C, FORTRAN, etc.

• Use cclib to convert the log file to JSON

• JSON libraries are available for

– C, C++, Java, Javascript, Perl, PHP, Python, Ruby

• Trivial to write data to some type of FORTRAN format
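As a rough illustration of this route, the sketch below writes already-extracted values to JSON with Python's standard `json` module and then formats the coordinates in fixed-width columns of the kind a FORTRAN `READ` could consume. The dictionary is a hand-made stand-in (its values are copied from the water example earlier in the talk), not the output of a cclib call, and the `F12.7`-style column width is an arbitrary choice. Note that numpy arrays would need converting with `.tolist()` before being passed to `json.dumps`.

```python
import json

# Stand-in for parsed data; in practice these values would come from the log file.
parsed = {
    "nbasis": 7,
    "atomcoords": [[[0.0, 0.0, -0.2251786],
                    [0.0, 1.4941103, 0.9007143],
                    [0.0, -1.4941103, 0.9007143]]],
}

# Serialise to JSON text that any language with a JSON library can read back.
json_text = json.dumps(parsed, indent=2, sort_keys=True)
restored = json.loads(json_text)

# Fixed-width output (three 12-character columns per atom) for the first geometry.
fixed_width = [
    "".join("%12.7f" % x for x in atom)
    for atom in restored["atomcoords"][0]
]
```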

OpenBabel - “Not just file conversion”

• A C++ library for…

• Cheminformatics

– SMARTS searching, InChI, SMILES, molecular fingerprints, group-contribution based descriptors, determination of SSSR, bond order perception, hydrogen addition, Gasteiger charge calculation

• Computational chemistry

– AMBER, DMol3, Gaussian, GAMESS, GROMOS96, HyperChem, Jaguar, MOPAC, Q-Chem, Turbomole, ZINDO

• varying levels of support

• if you want to change this…

– forcefield minimisation (UFF, MMFF94, Ghemical)

– symmetrisation of almost symmetric molecules (coming soon)

http://openbabel.org

Language bindings…and wrappers

• OpenBabel is a C++ library

• SWIG allows access to OpenBabel from

– Java, Perl, Python, Ruby (and many more if we wish)

• SWIG bindings are a direct 1-to-1 translation of the C++ API and objects to a Python API and objects

• Pybel is a Pythonic wrapper around the SWIG bindings

– Makes it easy to carry out common tasks

– Allows idiomatic Python, e.g. using iterators, direct access to attribute values rather than Get/Set, reduces verbosity

NM O’Boyle, C Morley, GR Hutchison. Chem. Cent. J. 2008, 2, 5. http://openbabel.org/wiki/Python

Let’s read a MOL file and optimise the geometry with the UFF forcefield

SWIG bindings

import openbabel as ob

obconv = ob.OBConversion()
obconv.SetInFormat("mol")
obmol = ob.OBMol()
obconv.ReadFile(obmol, "caffeine.mol")

obff = ob.OBForceField.FindForceField("UFF")
obff.Setup(obmol)
obff.ConjugateGradients(1000)
obff.UpdateCoordinates(obmol)

Pybel

import pybel

mol = next(pybel.readfile("mol", "caffeine.mol"))
mol.optimise("UFF") # Coming soon!

Eliminate duplicate molecules from a multimolecule SD file

import pybel

inchis = []
output = pybel.Outputfile("sdf", "uniquemols.sdf")

for mol in pybel.readfile("sdf", "inputfile.sdf"):
    inchi = mol.write("inchi")
    if inchi not in inchis:
        output.write(mol)
        inchis.append(inchi)
output.close()

Note to self: should use ‘set’ instead of ‘list’ for O(N) instead of O(N**2)
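The change suggested in the note can be sketched without Pybel. In the example below the records are plain (name, InChI) tuples standing in for molecules, and the helper name `unique_by_key` is hypothetical. Membership tests on a set take roughly constant time, so the loop as a whole is linear in the number of molecules.

```python
def unique_by_key(records, key):
    """Keep the first record for each distinct key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        k = key(record)
        if k not in seen:  # average O(1) lookup, unlike a list
            seen.add(k)
            unique.append(record)
    return unique


# Stand-in data: two names for the same structure plus one different molecule.
molecules = [
    ("ethanol", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    ("EtOH", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    ("methane", "InChI=1S/CH4/h1H4"),
]
unique_molecules = unique_by_key(molecules, key=lambda record: record[1])
```

In the Pybel script the same idea amounts to replacing the list with a set and `append` with `add`.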

Make it work on Windows!

• Most users use Windows, and even Linux users want the option of jumping between OSs

• You restrict the reach of your software (and hasten its replacement)

• Case study cclib-0.8 (Nov 07):

– cclib-0.8.tar.gz 63


– cclib-0.8.zip 58

– cclib-0.8-py2.4.exe 26

– cclib-0.8-py2.5.exe 45

• For every Linux user, there are 2 Windows users
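The 2:1 figure can be checked from the case-study numbers. The sketch below assumes the numbers are download counts, that the .tar.gz was downloaded by Linux users, and that the .zip and .exe files were downloaded by Windows users; the slide does not state these assumptions explicitly.

```python
# Numbers as listed in the cclib-0.8 case study.
downloads = {
    "cclib-0.8.tar.gz": 63,
    "cclib-0.8.zip": 58,
    "cclib-0.8-py2.4.exe": 26,
    "cclib-0.8-py2.5.exe": 45,
}

linux = downloads["cclib-0.8.tar.gz"]
windows = sum(
    count for name, count in downloads.items()
    if name.endswith((".zip", ".exe"))
)
ratio = windows / linux  # Windows downloads per Linux download
```

Under these assumptions there are 129 Windows downloads against 63 for Linux, a ratio of about 2.05.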

Make it easy to install on Windows!

• No dependencies

• Case study: GaussSum 2.1.4 (Nov 2007)

– GaussSum-2.1.4.tar.gz 143 (Linux)

– GaussSum-2.1.4.zip 206 (Windows, requires Python, Numpy and Python Imaging Library)

– GaussSumexe-2.1.4.zip 396 (Windows, no dependencies)

• Lower the barrier to installation

– A one-click installer > a .zip file >> a .tar.gz file

– Make the installation instructions easy

• Case study: OpenBabel

– OB 2.0.1 Linux:Windows 5:4

– OB 2.1.1 Linux:Windows 5:7.5

Some questions

• Why is it so easy to add value to QM calculations?

– QM developers don’t consider analysis of results?

• Why don’t QM software developers list compatible tools on their website?

– Good for the QM software, good for the tool

• Why don’t QM software developers make it easier for tool developers?

– API, documentation describing output, XML, interoperability

• Why not open source?

– Could fix these problems myself

• Why can’t I mix and match calculation methods from different programs?

Some more questions

• Why do academics restrict usage of their sophisticated routines to a single proprietary code?

• Why do some visualisation packages use their own parsing routines instead of adding them to libraries like OpenBabel or cclib?

• Why don’t QM packages donate code or contract developers to improve support in libraries like OpenBabel or cclib?

– ADF is doing this

• How can we coordinate interoperability?

• BlueObelisk.org

• I propose [email protected]

Wish list

• Build farm (buildbot)

• Calculation farm

• Electron density export will give a major payoff

– Coarse (STO-3G)

– Medium (6-31G..)

– Fine (..)

• Let’s promote each other (help us help you)

Conclusions

• Interoperability

• Reinvent the wheel

– Libraries spread the work, and increase the reach

• Tools can add value

• Cross-platform

• Python where possible

• “Some of the people some of the time” is a good aim

Thanks!

• The OpenBabel development team and particularly Geoff Hutchison and Chris Morley

• cclib: Adam Tenderholt and Karol Langner

• SourceForge

• Email: [email protected], [email protected]

• Blog: http://baoilleach.blogspot.com

• Website: http://www.redbrick.dcu.ie/~noel

• Check out Linux4Chemistry

• Consider subscribing to RSS feed for Blue Obelisk blogs

– http://cb.openmolecules.net/posts.php?category=Blue Obelisk

QM at Cambridge Crystallographic Data Centre (CCDC)

Noel O’Boyle
