Materials Virtual Lab: Python Materials Genomics (pymatgen). Shyue Ping Ong, November 10, 2014, MAVRL Workshop 2014

MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)


DESCRIPTION

This presentation was part of the workshop on Materials Project software infrastructure conducted for the Materials Virtual Lab on November 10, 2014. It presents an introduction to the Python Materials Genomics (pymatgen) materials analysis library. Pymatgen is a robust, open-source Python library for materials analysis. It currently powers the public Materials Project (http://www.materialsproject.org), an initiative to make calculated properties of all known inorganic materials available to materials researchers. These are some of the main features:
1. Highly flexible classes for the representation of Element, Site, Molecule, and Structure objects.
2. Extensive I/O capabilities to manipulate many VASP (http://cms.mpi.univie.ac.at/vasp/) and ABINIT (http://www.abinit.org/) input and output files and the crystallographic information file (CIF) format. This includes generating Structure objects from VASP input and output. There is also support for Gaussian input files and XYZ files for molecules.
3. Comprehensive tools to generate and view compositional and grand canonical phase diagrams.
4. Electronic structure analyses (DOS and Bandstructure).
5. Integration with the Materials Project REST API.


Page 1: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Materials Virtual Lab

Python Materials Genomics (pymatgen)

Shyue Ping Ong

November 10, 2014

MAVRL Workshop 2014

Page 2: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Python Materials Genomics (pymatgen)

Core materials analysis powering the Materials Project
•  Defines core extensible Python objects for materials data representation.
•  Provides a robust and well-documented set of structure and thermodynamic analysis tools relevant to many applications.
•  Establishes an open platform for researchers to collaboratively develop sophisticated analyses of materials data.


Page 3: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Vision for pymatgen

To be the leading open-source software platform for robust materials analysis.


Page 4: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

pymatgen is now global.


Page 5: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Overview of Pymatgen


Page 6: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

from pymatgen import dao

1. Great code enables great materials science.
2. Comprehensive tests ensure robustness.
3. Clear documentation leads to more usage.
4. More usage improves code quality (and increases citations).
5. Even complex scientific ideas can be broken down into simple interfaces.
6. Though deep (Hulk-level) understanding is often necessary to develop the right interface design.
7. Slow and accurate is better than fast and wrong.
8. But efficiency matters for core classes.
9. The laws of thermodynamics apply: code entropy always increases in a closed system.
10. Constant refactoring is the hallmark of an open platform.


Page 7: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Most frequently used packages

| Package name | Purpose |
| --- | --- |
| core (start here) | Defines classes and methods that are common to many analyses, e.g., Element, Site, PeriodicSite, Lattice, Structure, Molecule, etc. |
| electronic_structure | Bandstructure and DOS classes. Plotting and analysis tools. |
| entries | ComputedEntry: basic unit of most thermodynamic and other analyses (e.g., constructing phase diagrams or calculating reaction enthalpies). Compatibility: defines schemes to “correct” entries for compatibility between different computational methods and/or certain analyses. |
| io | Input and output between pymatgen’s objects and various file formats, e.g., reading CIF files, and reading and writing VASP, ABINIT, Gaussian, and Qchem input and output. |
| symmetry | Symmetry analysis: space group, point group, etc. |


Page 8: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Analysis packages

| Package name | Purpose |
| --- | --- |
| phasediagram | Constructs compositional and grand canonical phase diagrams. Analyzes stability. |
| analysis | Master package containing many different materials analyses. A few key ones are listed in the rows that follow. |
| .structure_matcher (Will Richards, Steve Dacek and Shyue Ping) | In-house, very powerful structure matching algorithm. Tells you whether two structures are the same, have the same framework, etc. Use this to avoid duplicate calculations. |
| .reaction_calculator | Calculates enthalpies of reactions. Balances reactions. |
| .diffusion_analyzer | Analyzes MD runs to determine diffusivity, conductivity, Arrhenius plots, etc. |
| .pourbaix.* | Constructs Pourbaix diagrams. Similar to phase diagrams, except that they describe aqueous stability. |
| .defects (Bharat Medasani) | Analysis of defects: interstitials, vacancies, etc. Highly experimental at this stage. |


Page 9: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Other packages

Structure manipulations and generation

| Package name | Purpose |
| --- | --- |
| transformations | Defines ways of making changes to structures. Examples: substituting one species for another, removing certain species, ordering disordered structures, etc. |
| structure_prediction | Predicts completely novel structures! Based on algorithms developed by Geoffroy Hautier and fine-tuned by Will Richards. |
| alchemy | High-throughput tools to make lots of changes to lots of structures in a manner that preserves provenance / history. |

Interfacing with the Materials Project

| Package name | Purpose |
| --- | --- |
| matproj | High-level interface to the Materials Project RESTful API. Allows one to download computed data (energies, DOS, bandstructures) and relaxed structures. |

Page 10: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

www.pymatgen.org stats
•  Steady increase over the past year
•  > 1000 views per month on average

v2.7.0 → v3.0.7: > 500 commits over the last year.

Pymatgen coders work Mon-Wed.

Major new features / functionality
•  Support for ABINIT 7.6.1 (ABINIT group/UCL)
•  Defects (Haranczyk/LBNL)
•  Qchem (JCESR)
•  Robust units handling (UCSD/UCL)
•  XRD pattern simulation (UCSD)

# of active contributors has more than doubled!

Major new users / fans


Page 11: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Getting Started

http://www.pymatgen.org is your friend
•  Usage guide: http://pymatgen.org/usage.html
•  Simple examples: http://pymatgen.org/examples
•  API docs: http://pymatgen.org/modules.html

Source code
•  Openly available on GitHub: https://github.com/materialsproject/pymatgen
•  Very comprehensive unit tests


Page 12: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Practical example of typical usage

You have an experimental collaborator who has an idea that substituting Sn for Ge in Li4GeS4, a fast but expensive Li-ion conductor, might improve its properties and make it cheaper. But before he attempts a potentially difficult synthesis, he wants to know whether you can use first-principles calculations to estimate whether a potential Li4SnS4 phase would be stable. What would you do?


Page 13: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Broad steps

1.  Get the known Li4GeS4 phase.
2.  Substitute Sn for Ge and generate the input files.
3.  Do calculations with your favorite DFT code.
4.  Construct the phase diagram for the Li-Sn-S system to understand if Li4SnS4 is stable.

Page 14: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Hands-on Tutorial

For this tutorial, we will be using the excellent IPython Notebook. Basically, the notebook is like a superpowered scratch space for you to write quick analyses and scripts. You can install and run this software on your own computers, but for the purposes of this workshop, we are running a notebook server on Amazon EC2, which has all the necessary packages (pymatgen, etc.) already installed.

1.  Go to http://bit.ly/mavrlwksp2014 (bypass any security warnings). When asked for a password, type in “MVLworkshop”.

2.  Create a new notebook. Rename it as <first_name>_<last_name>_pmg.


Page 15: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Step 1: Getting Li4GeS4

Option 1: The traditional, slow and bad option
•  Search for and download the CIF for Li4GeS4 from an existing database like the ICSD (this is already done for you; the filename is ICSD_95649.cif).

Option 2: Use the MPRester interface to the Materials API
•  Register at www.materialsproject.org.
•  Go to www.materialsproject.org/dashboard.
•  Generate your API key and copy it.
•  Use pymatgen’s Materials API interface to get the structure.

Advantages
1.  You get pre-relaxed structures
2.  You can get a lot of structures at once

Hint: If you ever want to see the doc of any method, use IPython’s “?” syntax. For example, “Structure.from_file?” will show you the doc of what it does and the args. Pymatgen is extremely well-documented.

Page 16: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Step 2: Doing the substitution and generating the input files

Simple method:
•  Depending on whether you got the structure from the ICSD or the Materials Project, you need to replace either Ge4+ or Ge with Sn.

•  Pymatgen has support for all VASP input files, but generating them manually is a bit of work. We will use what are known as “InputSets” to generate input files. Input sets are basically well-defined rules for generating inputs from structures. They define things like the appropriate INCAR parameters (e.g., the U value for each element), an algorithm for generating a KPOINTS grid, and the pseudopotentials (PSP) to use. We will use the MPVaspInputSet, which is the well-tested set of parameters currently used in the Materials Project.
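The substitution step is essentially a relabeling of sites. The sketch below illustrates the idea in plain Python rather than with pymatgen itself; the site lists, their coordinates, and the helper names `element_of` and `substitute` are hypothetical and exist only for illustration. It covers both cases described above: labels decorated with oxidation states (Ge4+) and bare element labels (Ge).

```python
import re

# Toy site lists: (species label, fractional coordinates).
# Labels and coordinates are made up for illustration; this is not the real Li4GeS4 cell.
icsd_like_sites = [
    ("Li+", (0.0, 0.0, 0.0)),
    ("Ge4+", (0.5, 0.5, 0.5)),
    ("S2-", (0.25, 0.25, 0.25)),
]
mp_like_sites = [
    ("Li", (0.0, 0.0, 0.0)),
    ("Ge", (0.5, 0.5, 0.5)),
    ("S", (0.25, 0.25, 0.25)),
]


def element_of(species):
    """Return the bare element symbol, e.g. 'Ge4+' -> 'Ge'."""
    match = re.match(r"[A-Z][a-z]?", species)
    if match is None:
        raise ValueError(f"not a species label: {species!r}")
    return match.group(0)


def substitute(sites, old, new):
    """Replace element `old` with `new`, keeping any oxidation-state suffix."""
    result = []
    for species, coords in sites:
        if element_of(species) == old:
            species = new + species[len(old):]
        result.append((species, coords))
    return result


print(substitute(icsd_like_sites, "Ge", "Sn"))
print(substitute(mp_like_sites, "Ge", "Sn"))
```

Matching on the element symbol rather than on the raw label means one call works whichever source the structure came from, and an element such as Sn is never mistaken for S.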


Page 17: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Step 2: Doing the substitution and generating the input files, contd.

“Advanced” method:
•  The method described in the previous slide works perfectly fine and is the fastest way. But a major problem is that all provenance is lost, i.e., if you revisit your calculations many months down the road, you may have forgotten how you generated the structure and input files in the first place.

•  Pymatgen’s alchemy + transformations packages are designed to deal with such issues. They are a bit more complex to use, but if you are doing a lot of calculations on many different structures, it is important to keep a record of where each structure came from.

•  An example is given on the slide. We will not go through this exercise, but just mention that in every directory of VASP input files, there will be a “transformations.json” file that records everything that has been done, e.g., the source of the structure, the transformations performed, etc. This file will be parsed by pymatgen-db to be recorded in the database.
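To make the provenance idea concrete, here is a minimal plain-Python sketch of a structure that carries its own history and serializes it to JSON. It does not use pymatgen's alchemy or transformations packages, and the class name `TrackedStructure`, its fields, and the JSON layout are assumptions for illustration; the real “transformations.json” schema is not reproduced here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TrackedStructure:
    """Hypothetical container: a species list plus a record of how it was made."""
    source: str
    species: list
    history: list = field(default_factory=list)

    def apply(self, name, params, func):
        """Apply `func` to the species list and log the step with its parameters."""
        self.species = func(self.species, **params)
        self.history.append({"transformation": name, "params": params})
        return self

    def to_json(self):
        """Serialize source, current species, and full history."""
        return json.dumps(
            {"source": self.source, "species": self.species, "history": self.history},
            indent=2,
        )


def substitute(species, old, new):
    """Swap every occurrence of element `old` for `new`."""
    return [new if s == old else s for s in species]


tracked = TrackedStructure(
    source="ICSD_95649.cif",
    species=["Li", "Li", "Li", "Li", "Ge", "S", "S", "S", "S"],
)
tracked.apply("substitute", {"old": "Ge", "new": "Sn"}, substitute)

record = json.loads(tracked.to_json())
print(record["history"])
```

Because every change goes through `apply`, the history cannot drift out of sync with the structure, which is the property that makes the record trustworthy months later.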


Page 18: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Step 3: Do your DFT calculations

We are not actually going to run DFT calculations in this tutorial. We will just note that the Materials Project infrastructure has many tools (Custodian, FireWorks) that help with running them (covered in later parts).

For this tutorial, a vasprun.xml from a completed calculation is already present for you to parse. The Vasprun object is pymatgen’s highly efficient parser for the vasprun.xml. From that, we can get a ComputedEntry for analysis.
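As a rough illustration of what "parse the output, then build an entry" involves, the sketch below reads a final energy from a small XML fragment and pairs it with a composition. It uses only the standard library, not pymatgen's Vasprun or ComputedEntry. The XML is a heavily simplified stand-in loosely modeled on vasprun.xml; the choice of the `e_fr_energy` field, the energy values, and the `SimpleEntry` class are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Simplified stand-in for a vasprun.xml with two ionic steps.
# The energies are placeholders, not results of a real calculation.
SAMPLE_XML = """
<modeling>
  <calculation>
    <energy><i name="e_fr_energy"> -35.20000000 </i></energy>
  </calculation>
  <calculation>
    <energy><i name="e_fr_energy"> -36.00000000 </i></energy>
  </calculation>
</modeling>
"""


@dataclass
class SimpleEntry:
    """Hypothetical minimal entry: a composition and a total energy in eV."""
    composition: dict
    energy: float

    @property
    def energy_per_atom(self):
        return self.energy / sum(self.composition.values())


def final_energy(xml_text):
    """Return the energy recorded for the last ionic step."""
    root = ET.fromstring(xml_text)
    last_step = root.findall("calculation")[-1]
    return float(last_step.find("energy/i[@name='e_fr_energy']").text)


entry = SimpleEntry({"Li": 4, "Sn": 1, "S": 4}, final_energy(SAMPLE_XML))
print(entry.energy, entry.energy_per_atom)
```

The point of reducing a run to a composition plus an energy is that this pair is all the downstream thermodynamic analyses need.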

Hint: IPython has an excellent tab-completion system. For example, if you type the filename as “vasprun” and hit tab, IPython will autocomplete it for you, similar to most Unix command lines.

Page 19: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Step 4: Construct the Li-Sn-S phase diagram

To construct the phase diagram of Li-Sn-S, you need the energies of all structures in the Li-Sn-S system, i.e., all Li, Sn, S, LixSny, LixSy, SnxSy and LixSnySz phases.
•  The number of calculations grows rapidly as more components are added.

•  => Good news is that we can use the MPRester to get pre-calculated data from the Materials Project!
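To see why the work grows quickly, the following sketch enumerates every chemical subsystem that a phase diagram of a given set of elements has to cover. It is plain Python, and the function name `chemical_subsystems` is hypothetical. For n elements there are 2^n - 1 non-empty subsystems, each of which may contain many phases.

```python
from itertools import combinations


def chemical_subsystems(elements):
    """List every non-empty subset of the elements as a 'A-B-C' string."""
    elements = sorted(elements)
    return [
        "-".join(subset)
        for size in range(1, len(elements) + 1)
        for subset in combinations(elements, size)
    ]


ternary = chemical_subsystems(["Li", "Sn", "S"])
print(ternary)

# Adding one more element doubles the count (plus one).
quaternary = chemical_subsystems(["Li", "Sn", "S", "O"])
print(len(ternary), len(quaternary))
```

For Li-Sn-S this gives the three elements, the three binaries, and the ternary itself, matching the list of phase families above.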


Page 20: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Final result


Li4SnS4 is stable!
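One way to read this result: a phase is stable when no combination of competing phases at the same overall composition has a lower energy. The sketch below checks a single candidate decomposition, 2 Li2S + SnS2 -> Li4SnS4, in plain Python. The energies are placeholder numbers chosen for illustration, not DFT or Materials Project values, and the helper names are hypothetical. A negative reaction energy for one decomposition is necessary but not sufficient; the phase diagram performs this comparison against all competing phases at once.

```python
from collections import Counter

# name -> (composition, energy in eV per formula unit).
# PLACEHOLDER energies for illustration only, not computed values.
PHASES = {
    "Li2S": ({"Li": 2, "S": 1}, -11.9),
    "SnS2": ({"Sn": 1, "S": 2}, -13.2),
    "Li4SnS4": ({"Li": 4, "Sn": 1, "S": 4}, -37.5),
}


def atom_totals(side):
    """Count atoms of each element on one side of a reaction."""
    totals = Counter()
    for name, coeff in side.items():
        composition, _ = PHASES[name]
        for element, count in composition.items():
            totals[element] += coeff * count
    return totals


def side_energy(side):
    """Total energy of one side: sum of coefficient times phase energy."""
    return sum(coeff * PHASES[name][1] for name, coeff in side.items())


def reaction_energy(reactants, products):
    """Energy of products minus reactants; refuses unbalanced reactions."""
    if atom_totals(reactants) != atom_totals(products):
        raise ValueError("reaction is not balanced")
    return side_energy(products) - side_energy(reactants)


delta_e = reaction_energy({"Li2S": 2, "SnS2": 1}, {"Li4SnS4": 1})
print(f"{delta_e:.2f} eV per formula unit of Li4SnS4")
```

With these placeholder numbers the reaction energy is negative, so the ternary would be favored over this particular pair of binaries.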


Page 21: MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)

Summary

•  Pymatgen is an extremely powerful tool for materials analysis and for facilitating first-principles calculations.
•  Tight integration with the Materials Project is a key feature: it enables analyses that would otherwise be very time-consuming to perform.
•  Very well-documented and robustly tested.
•  Supported by a large and growing community of materials developer-scientists.