Looking for a (standard) Common Format for (Quantum) Computational Chemistry

A WG activity within COST action 23 (WG D23/0006/01)

– Elda Rossi, Andrew Emerson – CINECA
– Gian Luigi Bendazzoli, Antonio Monari – Università di Bologna
– Renzo Cimiraglia, Celestino Angeli, Stefano Borini – Università di Ferrara
– Daniel Maynau, Stefano Evangelisti – IRSAMC, Toulouse
– José Sanchez-Marin – Universitat de Valencia
– Peter Szalay – Eötvös Loránd University
– Rosa Caballol – Universitat Rovira i Virgili, Tarragona






Motivation for the work

To build a meta-system for supporting research collaboration in the field of “Localised Orbitals in post-SCF methods … Linear Scaling methods in a Multi-Reference context”




The scenario

• Different laboratories need to collaborate
• Different “home-made” codes need to be used together, since they give different views of the same problem
• General-purpose “basic” codes are needed to pre-compute data in a sort of pipeline
• Programmes should remain on their original sites, under the responsibility of their authors
• Different platforms
• Network connections (grid architecture)





The need for a Common Format

The first problem we faced: how can different codes (on different platforms) communicate?

We need a Common Format for (at least) Quantum Chemistry codes




Preliminary steps

Looking around …
o CML has been available for a long time
o XML is used by Accelrys for internal files
o XML is used by ArgusLab for internal files

None of them is completely suited to computational chemistry: they mainly cover structural chemistry, with no Quantum Chemistry properties

XML seems the best technology, so we decided to try another XML-based format

HDF5 looked nice for storing large binary data typical of QC









How the engine should work

Data Repository (XML/HDF)

• Leaves the program unchanged
• One wrapper for each program: if a code is added, only one wrapper has to be written




QCML: an XML format for QC

In order to be as general as possible we need to write down a hierarchical schema of Quantum Chemistry quantities

As a first approximation three domains can be identified

• Base FACTS: initial data describing the physics of the system
• DERIVED Facts: quantities computed from FACTS using QC algorithms (energies, properties, integrals, coefficients, …)
• W-FLOW Parameters: which codes are in the pipeline, specific input data, …

• A base fact is a fact that is a given in the world and is remembered (stored) in the system.
• A derived fact is created by an inference or a mathematical calculation from terms, facts, other derivations, or even action assertions.




FACT: molecule

<system title date program author>
  <molecule nElectrons charge spinMultiplicity spaceSymmetry>
    <symmetry groupName/>
    <geometry type unit numAtoms symmetryRef>
      <atom symbol isotope x3 y3 z3/>
    <basis name type numOrbitals>
      <atomBase angularMomMAX symbol>
        <angularMom value symbol numOrbitals>
          <orbital id numPrimitives>
            <exps/>
            <coeffs/>


Symmetry: group name & other symmetry data

Geometry: Cartesian only, either full or symmetry-unique

Basis: by name or fully defined
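A wrapper could produce the molecule FACT outlined on this slide roughly as follows. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the element and attribute names follow the outline above, while the function name `build_molecule_fact`, the attribute values (`"cartesian"`, `"angstrom"`) and the H2 geometry are assumptions made for illustration and are not taken from the QCML schema.

```python
import xml.etree.ElementTree as ET


def build_molecule_fact(title, atoms, basis_name, charge=0, spin_multiplicity=1):
    """Build a <system>/<molecule> tree from a list of (symbol, x, y, z) tuples.

    Hypothetical helper: only the element and attribute names come from the slide.
    """
    system = ET.Element("system", title=title)
    molecule = ET.SubElement(
        system,
        "molecule",
        charge=str(charge),
        spinMultiplicity=str(spin_multiplicity),
    )
    # Geometry is Cartesian only; the value strings are illustrative assumptions.
    geometry = ET.SubElement(
        molecule,
        "geometry",
        type="cartesian",
        unit="angstrom",
        numAtoms=str(len(atoms)),
    )
    for symbol, x, y, z in atoms:
        ET.SubElement(geometry, "atom", symbol=symbol, x3=str(x), y3=str(y), z3=str(z))
    # Basis given "by name" rather than fully defined.
    ET.SubElement(molecule, "basis", name=basis_name)
    return system


fact = build_molecule_fact(
    "H2 test",
    [("H", 0.0, 0.0, 0.0), ("H", 0.0, 0.0, 0.74)],
    basis_name="cc-pVDZ",
)
xml_text = ET.tostring(fact, encoding="unicode")
```

Specifying the basis by name keeps the file small; the fully defined form would add the `atomBase`, `angularMom` and `orbital` levels under `basis`.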




DERIVED data: computedData

<system …>
  <energy unit levelOfTheory quality value>
    <state spaceSymmetry spinMultiplicity excitationLevel />
  <property unit levelOfTheory quality value>
    <state “bra” spaceSymmetry spinMultiplicity excitationLevel />
    <state “ket” spaceSymmetry spinMultiplicity excitationLevel />
    <operator order name/>
  <file address URL/>


A “schema” has been written for QCML
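Reading DERIVED data back could look like the sketch below, which collects energies by level of theory and excitation level. The attribute names come from the outline above; the enclosing `computedData` element, the sample values and the helper name `energies_by_level` are assumptions for illustration only, and the numbers are placeholders rather than results of any calculation.

```python
import xml.etree.ElementTree as ET

# Illustrative document; values are placeholders, not computed results.
SAMPLE_QCML = """
<system title="illustrative">
  <computedData>
    <energy unit="hartree" levelOfTheory="SCF" quality="test" value="-1.0">
      <state spaceSymmetry="A1" spinMultiplicity="1" excitationLevel="0"/>
    </energy>
    <energy unit="hartree" levelOfTheory="FCI" quality="test" value="-1.1">
      <state spaceSymmetry="A1" spinMultiplicity="1" excitationLevel="0"/>
    </energy>
  </computedData>
</system>
"""


def energies_by_level(xml_text):
    """Return {(levelOfTheory, excitationLevel): (value, unit)} for every <energy>."""
    root = ET.fromstring(xml_text)
    result = {}
    for energy in root.iter("energy"):
        state = energy.find("state")
        key = (energy.get("levelOfTheory"), int(state.get("excitationLevel")))
        result[key] = (float(energy.get("value")), energy.get("unit"))
    return result


energies = energies_by_level(SAMPLE_QCML)
```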




DERIVED : computedData/file

Two possible strategies:
1. Leave data in their native format and translate them only when needed. Maintain different versions (formats) of the same data
2. Define a “standard” format for binary data and convert them anyway

Problem with large binary datasets: include the reference, not the actual data

The second was the solution of choice: HDF5 appears to be a good solution
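The "reference, not the actual data" idea can be sketched as follows. To stay within the Python standard library, a raw binary file written with the `array` module stands in for the HDF5 file, so this is not the HDF5 solution itself. The `file` element and its `address` attribute come from the computedData outline; the helper names and the file name `occupations.bin` are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from array import array


def store_by_reference(parent, values, directory, name):
    """Write values to a separate binary file and record only its address in XML."""
    path = os.path.join(directory, name)
    with open(path, "wb") as handle:
        array("d", values).tofile(handle)
    return ET.SubElement(parent, "file", address=path)


def load_reference(file_element):
    """Follow the address stored in a <file> element and read the doubles back."""
    path = file_element.get("address")
    count = os.path.getsize(path) // array("d").itemsize
    data = array("d")
    with open(path, "rb") as handle:
        data.fromfile(handle, count)
    return list(data)


computed = ET.Element("computedData")
with tempfile.TemporaryDirectory() as workdir:
    reference = store_by_reference(computed, [0.0, 0.5, 2.0], workdir, "occupations.bin")
    xml_text = ET.tostring(computed, encoding="unicode")
    restored = load_reference(reference)
```

The XML stays small and readable however large the array grows, since it only carries the address.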




HDF Mission

To develop, promote, deploy, and support open and free technologies that facilitate scientific data storage, exchange, access, analysis and discovery.

• Format and software for scientific data
• Stores images, multidimensional arrays, tables, etc.
• Emphasis on storage and I/O efficiency
• Free and commercial software support
• Emphasis on standards
• Users from many engineering and scientific fields





Example HDF5 file

[Figure: file tree starting at “/” (root), containing a table, the datasets Kinetic, Overlap and Repulsion, and a 4-D array]

Orb | occ | energy
----|-----|-------
1   | 0   | 0.35
2   | 0.5 | 0.26
3   | 2.  | 0.69






HDF file structure for QC

Root
  AO: <i/j>, <i/T/j>, <i/Vnuc/j>
  MO: <i/T/j>, <i/V/j>, <i/T/j>+<i/Vnuc/j>
  Property: <i/p/j>
  Spin Polar.
  Orb Classif: Core
  Orb Energies
  Orb Symm: [1-order]

+ format metadata (integer, binary, Endian-ism, …)
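Why the format metadata matters can be shown with a small sketch. It uses Python's standard `struct` module rather than HDF5 (which records the byte order as part of each dataset's datatype) to show that the same doubles read with the wrong byte order give different numbers. The values are the energy column of the example table on the previous slide; the helper name `read_doubles` is hypothetical.

```python
import struct

energies = [0.35, 0.26, 0.69]

# The same three doubles, written with explicit big- and little-endian layouts.
big_endian_bytes = struct.pack(">3d", *energies)
little_endian_bytes = struct.pack("<3d", *energies)


def read_doubles(raw, byte_order):
    """Decode a byte string as 8-byte floats using the given byte order ('<' or '>')."""
    count = len(raw) // 8
    return list(struct.unpack(f"{byte_order}{count}d", raw))


correct = read_doubles(big_endian_bytes, ">")
# Reading without the metadata, assuming the wrong byte order, silently corrupts the data.
corrupted = read_doubles(big_endian_bytes, "<")
```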




QCML processing: wrappers

One pair of wrappers for each code in the meta-system

They should be written & maintained by the authors of the chemical codes

XML processing can be used (DOM) but … what language???

o Fortran: no easy and stable DOM available

o Scripting languages (Perl/Python/Java): not known by chemists

We tried both ways (Fortran & Python)




Fortran DOM: drawbacks

The only problem is the Fortran binding
o It does not exist (or at least did not last year …)
o DOM is OO and Fortran is not

A C binding exists (Gdome2)
o Gdome2 was installed, with very hard work, on a mainframe platform (it was conceived for Linux)

We are currently converting it to Fortran, by adopting the DOM recommendations (simplified …)




Why Fortran

GOOD
• Users don't need to learn a new language
• Homogeneous environment

BAD
• Tricky: needs an external library (f77xml) built on top of gdome2
• Porting problems for gdome2/libxml2 may arise




F77xml library

Still in development
o v0.4 is out (experimental, with limited features)
o v1.0 upcoming, API changed to be nearly DOM2 compliant

Written in C on top of gdome2 (http://gdome2.cs.unibo.it/index.html)

Designed for interfacing to F77 (also F90 soon)

Reduced namespace pollution

Cons:
● F77 syntax is difficult (DOM2 + tricks)
● F90 syntax is simpler
● A pre-processor will convert F90 syntax to





F77xml library - V1.0 example

Gdome2 (C):

GdomeNode* gdome_el_firstChild (GdomeElement *self, GdomeException *exc);

Fortran call:

Call f77xml_el_firstChild(nodeCode, elemCode, exc)

• First position: return value
• nodeCode, elemCode, exc mapped to INTEGER

Internally:

Func='el_firstChild'
Call xp3t1(nodeCode,func,elemCode,exc)

Multiplexer function: x
• p3: 3 parameters (+ function name)
• t1: type 1 parameter schema (code/code/error)
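The two ideas on this slide, objects mapped to INTEGER codes and a single multiplexer routine that dispatches on the function name, can be sketched in Python as below. This illustrates the technique only and is not the F77xml implementation: the `HandleTable` class, the dispatch table and the error convention (0 for success, 1 for failure) are assumptions, Python's `xml.dom.minidom` stands in for Gdome2, and the results are returned as a tuple because Python has no output arguments.

```python
from xml.dom import minidom


class HandleTable:
    """Maps DOM objects to integer codes, as a language without objects would need."""

    def __init__(self):
        self._objects = {}
        self._next_code = 1  # 0 is reserved for "no node"

    def register(self, obj):
        code = self._next_code
        self._objects[code] = obj
        self._next_code += 1
        return code

    def lookup(self, code):
        return self._objects[code]


HANDLES = HandleTable()

# Function name -> operation taking one DOM object and returning another.
DISPATCH = {
    "el_firstChild": lambda element: element.firstChild,
}


def xp3t1(func, elem_code):
    """Multiplexer for the code/code/error schema: returns (node_code, exc)."""
    try:
        node = DISPATCH[func](HANDLES.lookup(elem_code))
    except KeyError:
        return 0, 1  # unknown function name or unknown handle
    if node is None:
        return 0, 0  # valid call, but the element has no child
    return HANDLES.register(node), 0


document = minidom.parseString("<molecule><geometry/></molecule>")
root_code = HANDLES.register(document.documentElement)
node_code, exc = xp3t1("el_firstChild", root_code)
child_name = HANDLES.lookup(node_code).tagName
```

One multiplexer per parameter schema keeps the number of symbols exported to Fortran small, which matches the "reduced namespace pollution" goal stated for the library.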




Why Python

GOOD
• Very easy, object-oriented language
• Works well with strings
• Simple and efficient DOM interface for XML
• Present in almost all UNIX/Linux distributions

BAD
• Users do need to learn a new language
• Maybe less powerful than Perl
• Usually not used by chemists




Python Wrapper

At present, a prototype works with the MOLPRO-FCI chain. It:
• takes information from the XML repository
• writes the proper MOLPRO and FCI input
• starts the two programs

With a different XML file, users only need to specify the file name and some simple parameters (orbital guess for FCI)
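The first two steps of such a wrapper (read the repository, write an input) can be sketched with Python's standard DOM module. The slides do not show the actual MOLPRO or FCI input, so the sketch writes a plain XYZ geometry block as a stand-in. The XML attribute names follow the molecule FACT outline, while the sample document, the water coordinates and the function name `geometry_to_xyz` are assumptions for illustration. Starting the programs (for instance with `subprocess`) is left out.

```python
from xml.dom import minidom

# Illustrative repository entry; coordinates are approximate and for demonstration only.
QCML_TEXT = """<system title="water test">
  <molecule charge="0" spinMultiplicity="1">
    <geometry type="cartesian" unit="angstrom" numAtoms="3">
      <atom symbol="O" x3="0.000" y3="0.000" z3="0.117"/>
      <atom symbol="H" x3="0.000" y3="0.757" z3="-0.469"/>
      <atom symbol="H" x3="0.000" y3="-0.757" z3="-0.469"/>
    </geometry>
  </molecule>
</system>"""


def geometry_to_xyz(qcml_text):
    """Convert the <geometry> of a QCML-like document into an XYZ-format string."""
    document = minidom.parseString(qcml_text)
    title = document.documentElement.getAttribute("title")
    atoms = document.getElementsByTagName("atom")
    lines = [str(len(atoms)), title]  # XYZ header: atom count, then a comment line
    for atom in atoms:
        x, y, z = (float(atom.getAttribute(name)) for name in ("x3", "y3", "z3"))
        lines.append(f"{atom.getAttribute('symbol')} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"


xyz_block = geometry_to_xyz(QCML_TEXT)
```

Switching to another molecule then only requires pointing the wrapper at a different XML file, as described above.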




Python or not

Python is very simple to learn and works very efficiently with XML

Scripts written in Python (at least for prototypes) are quite clear, linear and easy to maintain or upgrade

The possibility of a GUI could make our project much more user-friendly




What we have done …

Single platform: IBM SP4

Two code chains:
• MolPro to FCI
• MolPro to CasDI

[Figure: workflow diagram with the labels QCML Repository, HDF5 Repository, Bin file for FCI, Start here, Stop here]


In conclusion …

Two important hints on data…
1. Use some XML dialect for describing simple structured data
2. Use HDF5 for storing large arrays and binary data

Need for a good and easy API to XML & HDF

How to manage the workflow

How to manage the grid connection