Creating It from Bit - Designing Materials by Integrating Quantum Mechanics, Informatics and Computer Science

Materials Virtual Lab


Shyue Ping Ong

February 23, 2017

57th Sanibel Symposium

Electronic structure calculations are today reliable and reasonably accurate.


(SSSP) libraries]. The considerable difference in the older potentials, even for the predefined structures in this relatively simple test set, provides a compelling argument to use only the most recent potential files of a given code.

In addition to the comparison with all-electron codes, it is also interesting to assess how the same PAW or pseudopotential recipes are implemented in different ways. When both the GPAW and ABINIT codes use the GPAW 0.9 PAW set, for example, they agree to within a Δ of 0.6 meV per atom. A similar correspondence is found for the Schlipf-Gygi 2015-01-24 optimized norm-conserving Vanderbilt pseudopotentials (ONCVPSP) (0.3 meV per atom between Quantum ESPRESSO and CASTEP), the Garrity-Bennett-Rabe-Vanderbilt (GBRV) 1.4 ultrasoft pseudopotentials (0.3 meV per atom between Quantum ESPRESSO and CASTEP), and the GBRV 1.2 set (0.7 meV per atom between PAW potentials in ABINIT and ultrasoft potentials in Quantum ESPRESSO). In this case, too, the small Δ values indicate a good agreement between codes. This agreement moreover encompasses varying degrees of numerical convergence, differences in the numerical implementation of the particular potentials, and computational differences beyond the pseudization scheme, most of which are expected to be of the same order of magnitude or smaller than the differences among all-electron codes (1 meV per atom at most).

Conclusions and outlook

Solid-state DFT codes have evolved considerably. The change from small and personalized codes to widespread general-purpose packages has pushed developers to aim for the best possible precision. Whereas past DFT-PBE literature on the lattice parameter of silicon indicated a spread of 0.05 Å, the most recent versions of the implementations discussed here agree on this value within 0.01 Å (Fig. 1 and tables S3 to S42). By comparing codes on a more detailed level using the Δ gauge, we have found the most recent methods to yield nearly indistinguishable EOS, with the associated error bar comparable to that between different high-precision experiments. This underpins the validity of recent DFT EOS results and confirms that correctly converged calculations yield reliable predictions. The implications are moreover relevant throughout the multidisciplinary set of fields that build upon DFT results, ranging from the physical to the biological sciences.

In spite of the absence of one absolute reference code, we were able to improve and demonstrate the reproducibility of DFT results by means of a pairwise comparison of a wide range of codes and methods. It is now possible to verify whether any newly developed methodology can reach the same precision described here, and new DFT applications can be shown to have used a method and/or potentials that were screened in this way. The data generated in this study serve as a crucial enabler for such a reproducibility-driven paradigm shift, and future updates of available Δ values will be presented at http://molmod.ugent.be/deltacodesdft. The reproducibility of reported results also provides a sound basis for further improvement to the accuracy of DFT, particularly in the investigation of new DFT functionals, or for the development of new computational approaches. This work might therefore substantially accelerate methodological advances in solid-state DFT.

Future work can examine the reproducibility of different codes even further. Such work might involve larger benchmark sets (describing different atomic environments per element), other functionals, an exhaustive comparison of different relativistic treatments, and/or a more detailed account of computational differences (using databases or scripts, for example). The precision of band gaps, magnetic anisotropies, and other non-EOS properties would also be of interest. However, the current investigation of EOS parameters provides the most important pass-fail test of the quality of different implementations of Kohn-Sham theory. A method that is not able to reach


Fig. 4. Δ values for comparisons between the most important DFT methods considered (in millielectron volts per atom). Shown are comparisons of all-electron (AE), PAW, ultrasoft (USPP), and norm-conserving pseudopotential (NCPP) results with all-electron results (methods are listed in alphabetical order in each category). The labels for each method stand for code, code/specification (AE), or potential set/code (PAW, USPP, and NCPP) and are explained in full in tables S3 to S42. The color coding illustrates the range from small (green) to large (red) Δ values. The mixed potential set SSSP was added to the ultrasoft category, in agreement with its prevalent potential type. Both the code settings and the DFT-predicted EOS parameters behind these numbers are included in tables S3 to S42, and fig. S1 provides a full Δ matrix for all methods mentioned in this article.


Lejaeghere et al. Science, 2016, 351 (6280), aad3000.

Nitrides are an important class of optoelectronics (35), and the reported synthesizability of highly metastable nitrides from reactive nitrogen precursors (36, 37) suggests that there may be a broad spectrum of promising and technologically relevant metastable nitrides awaiting discovery.

Although our study focuses on the metastability of inorganic crystals, polymorphism and metastability in organic molecular solids are of great technological relevance to pharmaceuticals, organic electronics, and protein folding (7). Our observation relating cohesive energy to metastability could address a deep fundamental question in organic molecular solids: Why do many molecular solids exhibit numerous polymorphs within a small (~15°C) temperature range, whereas inorganic solids often see >100°C differences between polymorph transition temperatures? The weak intermolecular bonds of molecular solids yield cohesive energies of roughly 100 kJ/mol (12) or 1 eV per molecule, about a third of the cohesion of the weakest class of inorganic solids (iodides; Fig. 2B). This weak lattice cohesion yields a correspondingly small energy scale of accessible metastability (38). When this small energy scale of organic crystalline metastability is coupled with the rich structural diversity arising from high conformational degrees of freedom during molecular packing (39), this inevitably leads to a wide range of accessible polymorphs over a small span of thermodynamic conditions.

Influence of composition

The space of metastable compounds hovers above an energy landscape of equilibrium phases. As chemical elements are added to a thermodynamic system, the complexity of this energy landscape grows. Figure 2A shows an example calculated energy landscape for the ternary Fe-Al-O system, plotted as a convex hull of formation energies referenced to the elemental standard states (see section S1.2 for discussion). We anticipate the thermodynamic metastability of a phase to be different when it is competing against a polymorph (a stable phase of the same composition; Fig. 2A, red stars) or against a phase-separated state (multiple phases of different compositions; Fig. 2A, purple triangles). In Fig. 2B, we explore this hypothesis by constructing probability distributions of metastability for allotropes, binaries, ternaries, quaternaries, and pentanaries and beyond, grouped by whether the competing equilibrium phase is a polymorph (shaded light) or phase-separated (shaded dark). The relative areas of the shaded regions are proportional to the ratio of entries within each composition.

Figure 2B demonstrates that the more elements present in a metastable compound, the more likely that its competing equilibrium state is phase-separated rather than polymorphic, and that in general, these phase-separating compounds tend to be more metastable than polymorphs. The increased probability for phase separation with increasing number of elements results from a higher likelihood of low-energy decomposition products to exist in a broader chemical space. However, even though this brings about greater thermodynamic risk for phase separation, long-range chemical separation is diffusion-limited, which can be a kinetic barrier that enables the persistence of highly metastable (>70 meV/atom) multinary compounds. Indeed, there are emerging examples that the formation of low-dimensional crystals from multicomponent precursors under diffusion-limited conditions can result in novel crystalline phases that are metastable with respect to phase separation (4, 40–42). In contrast, polymorphic phase transformations occur under constant local composition and thus lack this kinetic barrier of chemical separation, which may rationalize why the energy scale of metastability for

Fig. 1. Influence of chemistry on thermodynamic scale of metastability. (A) Cumulative distribution functions of crystalline metastability for the most-represented chemistries in the Materials Project. Manual investigation reveals that the 20% highest-energy structures in the ICSD do not correspond to observed, crystalline polymorphs. (B) Bivariate sample density maps of metastability versus cohesive energy for group VI compounds. Chemistries with higher electronegativities, χ, exhibit stronger bonds, resulting in greater median cohesive energies and higher accessible crystalline metastabilities. (C) Energy scale of metastability for various chemistries, ordered vertically by the median cohesive energy. Left vertex, median metastability; right vertex, 90th percentile. Within a periodic group, greater lattice cohesivity yields greater crystalline metastability, as strong bonds can lock more metastable crystal structures.


or meV/atom); 10 meV/atom corresponds to about 1 kJ/mol-atom.
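The rule of thumb above can be checked with a short worked conversion. This is a minimal sketch that assumes only the exact SI values of the elementary charge and the Avogadro constant; the function name is illustrative and not taken from the paper.

```python
# Exact SI defining constants (2019 redefinition).
EV_IN_JOULE = 1.602176634e-19   # joules per electron volt
AVOGADRO = 6.02214076e23        # particles per mole


def mev_per_atom_to_kj_per_mol_atom(value_mev: float) -> float:
    """Convert an energy in meV/atom to kJ per mole of atoms."""
    joule_per_atom = value_mev * 1e-3 * EV_IN_JOULE
    return joule_per_atom * AVOGADRO / 1e3


if __name__ == "__main__":
    # 10 meV/atom comes out slightly below 1 kJ/mol-atom.
    print(f"{mev_per_atom_to_kj_per_mol_atom(10.0):.3f} kJ/mol-atom")
```

The same factor (about 96.5 kJ/mol per eV) relates the 3 kJ/mol-atom and 30 meV/atom figures quoted later in the excerpt.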

III. RESULTS

Figure 2 plots the experimental reaction energies as a function of the computed reaction energies. All reactions involve binary oxides to ternary oxides and have been chosen as presented in Sec. II. The error bars indicate the experimental error on the reaction energy. The data points follow roughly the diagonal and no computed reaction energy deviates from the experimental data by more than 150 meV/atom. Figure 2 does not show any systematic increase in the DFT error with larger reaction energies. This justifies our focus in this study on absolute and not relative errors.

In Fig. 3, we plot a histogram of the difference between the DFT and experimental reaction energies. GGA + U underestimates and overestimates the energy of reaction with the same frequency, and the mean difference between computed and experimental energies is 9.6 meV/atom. The root-mean-square (rms) deviation of the computed energies with respect to experiments is 34.4 meV/atom. Both the mean and rms are very different from the results obtained by Lany on reaction energies from the elements.52 Using pure GGA, Lany found that elemental formation energies are underestimated by GGA with a much larger rms of 240 meV/atom. Our results are closer to experiments because of the greater accuracy of DFT when comparing chemically similar compounds such as binary and ternary oxides due to errors cancellation.40 We should note that even using elemental energies that are fitted to minimize the error versus experiment in a large set of reactions, Lany reports that the error is still 70 meV/atom and much larger than what we find for the relevant reaction energies. The rms we found is consistent with the error of 3 kJ/mol-atom

FIG. 2. (Color online) Experimental reaction energy as function of the computed reaction energy (in meV/atom). The error bar indicates the experimental error. As the reaction energies are typically negative, the graph actually plots the negative of the reaction energy.

FIG. 3. (Color online) Histogram of the difference between computed ($\Delta E^{\mathrm{comp}}_{0\,\mathrm{K}}$) and experimental ($\Delta E^{\mathrm{expt}}_{0\,\mathrm{K}}$) energies of reaction (in meV/atom).

(30 meV/atom) for reaction energies from the binaries in the limited set of perovskites reported by Martinez et al.29

Very often, instead of the exact reaction energy, one is interested in knowing if a ternary compound is stable enough to form with respect to the binaries. This is typically the case when a new ternary oxide phase is proposed and tested for stability versus the competing binary phases.18 From the 131 compounds for which reaction energies are negative according to experiments, all but two (Al2SiO5 and CeAlO3) are also negative according to computations. This success in predicting stability versus binary oxides of known ternary oxides can be related to the very large magnitude of reaction energies from binary to ternary oxides compared to the typical errors observed (rms of 34 meV/atom). Indeed, for the vast majority of the reactions (109 among 131), the experimental reaction energies are larger than 50 meV/atom. It is unlikely then that the DFT error would be large enough to offset this large reaction energy and make a stable compound unstable versus the binary oxides.

The histogram in Fig. 3 shows several reaction energies with significant errors. Failures and successes of DFT are often known to be chemistry dependent, and we present the effect of the chemistry on the DFT error by plotting, in Fig. 4, a matrix of absolute reaction energy errors. The x axis represents the oxides of element A and the y axis the oxide of element B. Each element in the matrix corresponds to an A-B-O chemical system. When several reaction energies are available in a chemical system (i.e., several ternary compounds are present), we plotted the maximum absolute error energy in this system. The matrix is symmetric as A-B-O is equivalent to B-A-O. The elements are sorted by their Mendeleev number53 so that important chemical classes (e.g., alkalis or transition metals) are grouped together. The first row and column in the matrix indicate the mean of the difference (computed − experimental) for one given element across all ternary oxide chemistries.

It is remarkable that no systematically larger error is present for elements with partially filled d orbitals (e.g., Fe, Mn, Co, or Ni), which indicates that the use of a Hubbard U is sufficient to compensate the error associated with the localized d orbitals. On the other hand, elements containing f electrons such as


Properties

The JSON document for each entry contains an organized list of sub-entries that describes the properties of each surface in detail. Each sub-entry contains information such as the Miller index, surface energy and the fraction of the Wulff shape's surface area occupied by this facet. For each Miller index, the lowest surface energy termination, including among different reconstructions investigated where applicable, is provided in each sub-entry. The slab structure used to model the surface is available as a string in the JSON document in the format of a Crystallographic Information File (cif), which can also be downloaded via the Materials Project website and Crystalium web application. In addition, the weighted surface energy (equation (2)), shape factor (equation (3)), and surface anisotropy (equation (4)) are given. Table 2 provides a full description of all properties available in each entry as well as their corresponding JSON key.

Technical Validation

The data was validated through an extensive comparison with surface energies from experiments and other DFT studies in the literature. Due to limitations in the available literature, only the data on ground state phases were compared.

Comparison to experimental measurements

Experimental determination of surface energy typically involves measuring the liquid surface tension and solid-liquid interfacial energy of the material20 to estimate the solid surface energy at the melting temperature, which is then extrapolated to 0 K under isotropic approximations. Surface energies for individual crystal facets are rarely available experimentally. Figure 5 compares the weighted surface energies of all crystals (equation (2)) to experimental values in the literature20,23,26–28. It should be noted that we have adopted the latest experimental values available for comparison, i.e., values were obtained from the 2016 review by Mills et al.27, followed by Keene28, and finally Niessen et al.26 and Miller and Tyson20. A one-factor linear regression line $\gamma_{\mathrm{DFT}} = \gamma_{\mathrm{EXP}} + c$ was fitted for the data points. The choice of the one factor fit is motivated by the fact that standard broken bond models show that there is a direct relationship between surface energies and cohesive energies, and previous studies have found no evidence that DFT errors in the cohesive energy scale with the magnitude of the cohesive energy itself61.
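To make the one-factor fit concrete, the sketch below estimates the single offset c by least squares with the slope fixed at 1, which reduces to the mean of the differences between calculated and experimental values. The function name is illustrative, and the n - 1 divisor used for the standard error of the estimate is an assumption (one fitted parameter); the paper does not state its convention.

```python
import math
from typing import Sequence, Tuple


def one_factor_fit(exp: Sequence[float], dft: Sequence[float]) -> Tuple[float, float]:
    """Fit dft = exp + c with the slope fixed at 1.

    Returns (c, see): the least-squares offset and the standard error of
    the estimate, using n - 1 degrees of freedom (assumed convention).
    """
    if len(exp) != len(dft) or len(exp) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    n = len(exp)
    # With unit slope, minimizing sum((d - e - c)^2) gives c = mean(d - e).
    c = sum(d - e for e, d in zip(exp, dft)) / n
    residuals = [d - (e + c) for e, d in zip(exp, dft)]
    see = math.sqrt(sum(r * r for r in residuals) / (n - 1))
    return c, see
```

A negative c corresponds to DFT underestimating the experimental values on average, which is how the offset is read in the results that follow.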

We find that the DFT weighted surface energies are in excellent agreement with experimental values, with an average underestimation of only 0.01 J m⁻² and a standard error of the estimate (SEE) of 0.27 J m⁻². The Pearson correlation coefficient r is 0.966. Crystals with surfaces that are well-known to undergo significant reconstruction tend to have errors in weighted surface energies that are larger than the SEE.

The differences between the calculated and experimental surface energies can be attributed to three main factors. First, there are uncertainties in the experimental surface energies. The experimental values derived by Miller and Tyson20 are extrapolations from extreme temperatures beyond the melting point. The surface energy of Ge, Si62, Te63, and Se64 were determined at 77, 77, 432 and 313 K respectively while

Figure 5. Comparison to experimental surface energies. Plot of experimental versus calculated weighted surface energies for ground-state elemental crystals. Structures known to reconstruct have blue data points while square data points correspond to non-metals. Points that are within the standard error of the estimate (0.27 J m⁻²) lie in the white region.


Phase stability Formation energies

Tran, et al. Sci. Data 2016, 3, 160080.

Sun, et al. Sci. Adv. 2016, 2 (11), e1600225.

and Strickman14,55. Other properties computed in this work are the index of elastic anisotropy56 and the Poisson ratio in the isotropic approximation. The various derived properties are listed in Table 1, including expressions relating these properties to the elements of the single-crystal elastic tensor. The corresponding JSON keys and the datatypes are also listed in Table 1. The elastic tensor Cij is presented in two ways in Table 1: i) in the standardized IEEE-format and ii) in the format corresponding to the orientation of the crystal structure as defined in the poscar-key in Table 2.

Graphical representation of results

A graphical representation of our dataset is presented in Fig. 2, which shows a log-log plot of the VRH averaged bulk modulus versus the VRH averaged shear modulus for all materials considered in this work. The orientation of each arrow corresponds to the volume per atom (VPA) of that specific material. The material with the minimum VPA in our dataset is assigned an arrow pointing at 12 o'clock (diamond) and the arrows rotate anti-clockwise towards the materials with the maximum VPA in our dataset at 6 o'clock (barium). The angle of rotation from 12 o'clock to 6 o'clock is proportional to the normalized VPA. The VPA is considered since it is known to correlate well with elastic properties such as bulk modulus57–59. Indeed, Fig. 2 illustrates this apparent correlation. Specifically, diamond exhibits the highest bulk and shear moduli of all materials in our database and it also has the smallest VPA among those materials. The more elastically compliant materials in Fig. 2 show relatively higher values for the VPA. The color coding in Fig. 2 represents the Poisson ratio in the isotropic approximation. Also, two lines of constant K_VRH/G_VRH ratio are drawn. As described in the Introduction, this quantity, known as Pugh's ratio2, has been shown to correlate with ductility in crystalline compounds2,3 and is further related

Figure 2. Distribution of calculated volume per atom, Poisson ratio, bulk modulus and shear modulus. Vector field-plot showing the distribution of the bulk and shear modulus, Poisson ratio and atomic volume for 1,181 metals, compounds and non-metals. Arrows pointing at 12 o'clock correspond to minimum volume-per-atom and move anti-clockwise in the direction of maximum volume-per-atom, which is located at 6 o'clock. Bar plots indicate the distribution of materials in terms of their shear and bulk moduli.
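The arrow encoding described above amounts to a linear map from volume per atom to a rotation angle, and Pugh's ratio is a simple quotient of the two moduli. The sketch below illustrates both; the function names are illustrative and the linear normalization between the dataset minimum and maximum is an assumption about what "normalized VPA" means.

```python
def arrow_angle_deg(vpa: float, vpa_min: float, vpa_max: float) -> float:
    """Anti-clockwise rotation from 12 o'clock, in degrees.

    0 degrees (12 o'clock) at the minimum VPA, 180 degrees (6 o'clock) at
    the maximum VPA, linear in between (assumed normalization).
    """
    if vpa_max <= vpa_min:
        raise ValueError("vpa_max must exceed vpa_min")
    if not vpa_min <= vpa <= vpa_max:
        raise ValueError("vpa lies outside the dataset range")
    return 180.0 * (vpa - vpa_min) / (vpa_max - vpa_min)


def pugh_ratio(k_vrh: float, g_vrh: float) -> float:
    """Ratio of VRH-averaged bulk modulus to VRH-averaged shear modulus."""
    if g_vrh <= 0:
        raise ValueError("shear modulus must be positive")
    return k_vrh / g_vrh
```

For example, a material whose VPA sits halfway between the dataset extremes would get an arrow pointing at 9 o'clock (a 90 degree anti-clockwise rotation).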


Surface energies Elastic constants

de Jong et al. Sci. Data 2015, 2, 150009.

Hautier et al. Phys. Rev. B 2012, 85, 155208.


Modern electronic structure codes give relatively consistent equations of state.

Of course, challenges remain


Automation: What are the tools necessary for automation of electronic structure calculations?

Data: What is a model for open-vs-private data?

Learning & Design: How and what can we learn from large quantities of materials data? Can we really do in silico design of materials?

Reliability + Reasonable Accuracy


Our automated future

Write reusable software frameworks


User requirements for electronic structure calculation automation


$\hat{H}\Psi = [\hat{T} + \hat{V} + \hat{U}]\Psi = E\Psi$ ???

Need #1: Robust materials analysis software to talk between application, computable property and electronic structure calculations

Need #2: Error detection and correction

Need #3: Scientific workflow management


Software frameworks for HT electronic structure computations


Atomic Simulation Environment: https://wiki.fysik.dtu.dk/ase

Materials Project [1]: https://www.materialsproject.org

Custodian

http://aflowlib.org

http://www.aiida.net

[1] Jain et al. APL Mater. 2013, 1 (1), 11002.
[2] Ong et al. Comput. Mater. Sci. 2013, 68, 314–319.
[3] Jain et al. Concurr. Comput. Pract. Exp. 2015, 27 (17), 5037–5059.


Extensive Materials Analysis Capabilities

Input/Output

objects

(Modular, Reusable, Extendable)

Defects and Transformations

Electronic Structure

XRD Patterns

Phase and Pourbaix Diagrams

Functional properties

Comprehensively documented

Continuously tested and integrated

Active dev/user community

Ong et al. Comput. Mater. Sci. 2013, 68, 314–319.

Global network of users

Some recent additions:
Dielectric constants
Elastic constants and phonons
X-ray Absorption Spectroscopy

Number of visits on pymatgen.org

Very active developer community!


Custodian

Simple, robust and flexible just-in-time (JIT) job management framework.

Wrappers to perform error checking, job management and error recovery.

Error recovery is an important aspect for HT: O(100,000) jobs + 1% error rate => O(1000) errored jobs.

Existing sub-packages for error handling for VASP, NwChem and QChem calculations.
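The check, correct and rerun pattern described on this slide can be sketched in a few lines of plain Python. This is an illustrative toy, not the Custodian API: the `ErrorHandler` class, the `run_with_recovery` function and the `max_errors` parameter are hypothetical names introduced here only to show the control flow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ErrorHandler:
    """Pairs an error detector with a correction (hypothetical interface)."""
    name: str
    detect: Callable[[str], bool]      # inspects the job output
    correct: Callable[[Dict], Dict]    # returns adjusted job parameters


def run_with_recovery(
    job: Callable[[Dict], str],
    params: Dict,
    handlers: List[ErrorHandler],
    max_errors: int = 3,
) -> Tuple[str, List[str]]:
    """Run job(params); on a detected error apply the first matching fix and rerun.

    Returns the final output and the names of the corrections applied.
    Raises RuntimeError once more than max_errors corrections would be needed.
    """
    corrections: List[str] = []
    while True:
        output = job(params)
        triggered = [h for h in handlers if h.detect(output)]
        if not triggered:
            return output, corrections
        if len(corrections) >= max_errors:
            raise RuntimeError(f"gave up after {max_errors} corrections")
        params = triggered[0].correct(params)
        corrections.append(triggered[0].name)


# Scale of the problem quoted above: 100,000 jobs at a 1% error rate.
EXPECTED_ERRORED_JOBS = 100_000 * 1 // 100  # 1000
```

In a high-throughput setting the point of such a loop is that the roughly 1000 failing jobs are repaired without human intervention, and only jobs that exhaust their corrections need manual attention.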


FireWorks is the Workflow Manager

Custom material

A cool material!! Lots of information about cool material!!

Submit!

Input generation (parameter choice)
Workflow mapping
Supercomputer submission / monitoring
Error handling
File transfer
File parsing / DB insertion


FireWorks as a platform

Community can write any workflow in FireWorks

We can automate it over most supercomputing resources

structure

charge

Band structure

DOS

Optical

phonons

XAFS spectra

GW







Learning & Design



With great automation comes great quantities of data

Jain et al. APL Mater. 2013, 1 (1), 11002.


User-friendly web interface (but unfriendly for advanced users requiring large quantities of data!)

Materials Explorer
Battery Explorer
Crystal Toolkit
Structure Predictor
Phase Diagram App
Pourbaix Diagram App
Reaction Calculator


Structure

Electronic Structure

Elastic properties

XRD

Energetic properties

Jain et al. APL Mater. 2013, 1 (1), 11002.

Materials Project DB

How do I access MP data?

Web Apps

RESTful API

The Materials API

Provides programmatic access to large quantities of data. Data can be used for analysis or for learning.

Ong et al. Comput. Mater. Sci. 2015, 97, 209–215.
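Programmatic access through a RESTful API comes down to composing a resource URL and attaching an API key. The sketch below only builds the request object and sends nothing over the network. The base URL, the path layout (`materials/<identifier>/vasp/<property>`) and the `X-API-KEY` header name are assumptions modelled on the published description of the Materials API and may not match the service as currently deployed; `build_request` is an illustrative name.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed base URL and path layout; check the current API documentation.
BASE_URL = "https://www.materialsproject.org/rest/v1"


def build_request(identifier: str, prop: str, api_key: str) -> Request:
    """Compose (but do not send) a GET request for one property of one material.

    identifier: a formula or material id, e.g. "Fe2O3" (illustrative).
    prop: a property name, e.g. "energy" (illustrative).
    """
    url = f"{BASE_URL}/materials/{quote(identifier)}/vasp/{quote(prop)}"
    return Request(url, headers={"X-API-KEY": api_key})


if __name__ == "__main__":
    req = build_request("Fe2O3", "energy", "YOUR_KEY")
    print(req.get_method(), req.full_url)
```

In practice a client library wraps these calls and parses the JSON responses, so that large queries can feed directly into analysis or learning code.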


A modern data model for high-throughput computational research groups


Materials Project

Large, open electronic structure databases

AFLOW OQMD

APIs

Private small databases for individual research groups







Learning & Design



Applications


Screening
Rapidly explore vast chemical spaces
Exclude bad candidates using a minimum of computational resources
Multi-property optimization
Identify best candidate(s)

Learning
Analyze large data sets
Identify trends
Obtain new physics/chemistry insights

Next Generation All-solid-state Batteries


[Schematic: NMC | solid electrolyte | Li metal, with Li+ moving through the solid electrolyte]

Solid electrolyte (SE) is non-flammable and can be stacked for higher system energy densities.

Can potentially enable high voltage cathodes like NMC

Can potentially enable Li metal anode, with significantly higher energy densities

Design Requirements for SE
Extremely high (super) ionic conductivity > 0.1 mS/cm
Low electronic conductivity
Phase stability
Good electrochemical stability with electrodes (intrinsic or passivation)
Mechanical compatibility

Predicting non-dilute diffusion properties with first principles

Lithium motion in Li10GeP2S12 (sulfur tetrahedra frozen for clarity)

Quantum mechanics:

$$E\psi(\mathbf{r}) = -\frac{\hbar^2}{2m}\nabla^2\psi(\mathbf{r}) + V(\mathbf{r})\psi(\mathbf{r})$$

Newton's laws of motion:

$$\mathbf{F} = m\mathbf{a}$$

Ab initio molecular dynamics or AIMD:

$$D = \frac{1}{2Ndt}\sum_{i}\left\langle [\mathbf{r}_i(t+t_0)]^2 - [\mathbf{r}_i(t)]^2 \right\rangle$$

Computationally and human-time intensive
Huge quantities of data need to be stored and analyzed

Typical AIMD simulation: 50,000-100,000 time steps of 2 fs at 4-6 temperatures (400,000 structures, 2-3 weeks on cluster)
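As a rough illustration of how a diffusivity is extracted from a trajectory, the sketch below computes a mean squared displacement (MSD) and divides it by 2dt. It assumes the common displacement form of the MSD, averaged over ions and over all available time origins, with unwrapped Cartesian positions; the function names and the trajectory layout (`traj[step][ion]`) are illustrative choices, not taken from the talk.

```python
from typing import Sequence

Vector = Sequence[float]
Trajectory = Sequence[Sequence[Vector]]  # traj[step][ion] -> (x, y, z)


def mean_squared_displacement(traj: Trajectory, lag: int) -> float:
    """MSD at a lag of `lag` steps, averaged over ions and time origins."""
    n_steps = len(traj)
    if not 0 < lag < n_steps:
        raise ValueError("lag must satisfy 0 < lag < number of steps")
    total, count = 0.0, 0
    for t0 in range(n_steps - lag):
        for r0, r1 in zip(traj[t0], traj[t0 + lag]):
            total += sum((b - a) ** 2 for a, b in zip(r0, r1))
            count += 1
    return total / count


def diffusivity(traj: Trajectory, lag: int, timestep_fs: float, dim: int = 3) -> float:
    """D = MSD / (2 d t); units are (length unit)^2 per fs."""
    return mean_squared_displacement(traj, lag) / (2 * dim * lag * timestep_fs)


def simulated_time_ps(n_steps: int, timestep_fs: float) -> float:
    """Total simulated time in picoseconds."""
    return n_steps * timestep_fs / 1000.0
```

With the numbers on the slide, `simulated_time_ps(50_000, 2.0)` gives 100 ps and `simulated_time_ps(100_000, 2.0)` gives 200 ps per temperature. In practice D is usually taken from the slope of MSD against time over the diffusive regime rather than from a single lag.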


Completely automated ab initio molecular dynamics


[Workflow diagram: Start → Set up simulation box → Initial relaxation → AIMD simulations (one per temperature) → Converged? (N: dynamically add continuation AIMD job starting from previous one; Y: End). Initial AIMD jobs running at different temperatures are added dynamically.]

Deng, Z.; Zhu, Z.; Chu, I.-H.; Ong, S. P. Chem. Mater. 2017, 29 (1), 281–288.

In-house system built entirely on pymatgen, custodian and fireworks!

[Figure: MSD-based screening plots. Labelled materials: Li3OClxBr1-x, pristine-Na3PS4, doped-Na3PS4, Li4P2S6, Li7P3S11, Li10GeP2S12, Li10Si1.5P1.5S11.5Cl0.5, Li10SnP2S12, Li10SiP2S12, Li15P4S16Cl3, LiZnPS4, LiAl(PS3)2, Li5PS4Cl2, Li3Y(PS4)2. Annotations: MSD800K = 5, MSD1200K = 7, "Superionic conductor region". Axis label: MSD800K.]

Can we screen with short (~50 ps) AIMD simulations?


Zhu et al. Chem. Mater. 2017, acs.chemmater.6b04049.

Establish minimum level of diffusivity

Many Li superionic conductors have Ag analogues


Li7P3S11

Ag7P3S11

Yamane et al., Solid State Ionics 2007, 178 (15–18), 1163–1167.

Hautier et al., Inorg. Chem., 2010, 656–663.

Data-mined ionic substitution probabilities

For an ionic substitution model, one could choose for example as a feature function:

$$f(X, X') = \begin{cases} 1 & \text{if Ca}^{2+}\text{ substitutes for Ba}^{2+}\text{ in the presence of O}^{2-} \\ 0 & \text{else} \end{cases} \qquad (4)$$

The relevant feature functions are commonly defined by experts from prior knowledge. If our chosen set of feature functions is informative enough, we expect to be able to approximate the probability function by a weighted sum of those feature functions:

$$p_n(X, X') \approx \frac{e^{\sum_i \lambda_i f_i^{(n)}(X, X')}}{Z} \qquad (5)$$

The $\lambda_i$ indicate the weight given to the feature $f_i^{(n)}(X, X')$ in the probabilistic model. Z is a partition function ensuring the normalization of the probability function. The exponential form chosen in eq 5 follows a commonly used convention in the machine learning community.25

2.3. Binary Feature Model. A first assumption we make is to consider that the feature functions do not depend on the number n of ions in the compound. Simply put, we assume that the ionic substitution rules are independent of the compound's number of components (binary, ternary, quaternary, ...).

Therefore, we will omit any reference to n in the probability and feature functions. Equation 5 becomes

$$p(X, X') \approx \frac{e^{\sum_i \lambda_i f_i(X, X')}}{Z} \qquad (6)$$

While the feature functions could be more complex, only simple binary substitutions are considered in this paper. This means that the likelihood for two ions to substitute to each other is independent of the nature of the other ionic species present in the compound. Mathematically, this translates in assuming that the relevant feature functions are simple binary features of the form:

$$f_k^{a,b}(X, X') = \begin{cases} 1 & X_k = a \text{ and } X'_k = b \\ 0 & \text{else} \end{cases} \qquad (7)$$

Each pair of ions a and b present in the domain is assigned a set of feature functions with corresponding weights $\lambda_k^{a,b}$ indicating how likely the ions a and b can substitute in position k. For instance, one of the feature functions will be related to the Ca2+ to Ba2+ substitution.

$$f_k^{\mathrm{Ca}^{2+},\mathrm{Ba}^{2+}}(X, X') = \begin{cases} 1 & X_k = \mathrm{Ca}^{2+} \text{ and } X'_k = \mathrm{Ba}^{2+} \\ 0 & \text{else} \end{cases} \qquad (8)$$

The magnitude of the weight $\lambda_k^{\mathrm{Ca}^{2+},\mathrm{Ba}^{2+}}$ associated with this feature function indicates how likely this binary substitution is to happen.

Finally, the feature weights should satisfy certain constraints for any permutation of the components to not change the result of the probability evaluation. Those symmetry conditions are

$$\lambda_k^{a,b} = \lambda_k^{b,a} \qquad (9)$$

and

$$\lambda_k^{a,b} = \lambda_l^{a,b} \qquad (10)$$
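The binary feature model of eqs 6 to 10 can be sketched as a lookup of pairwise weights summed over positions. In the sketch below the weight values are made-up illustrative numbers, not fitted parameters from the paper. Storing weights under `frozenset` keys is one way to enforce the symmetry of eq 9, and using a single table for all positions reflects eq 10. The normalization over an explicit candidate list is a simplification standing in for the partition function Z.

```python
import math
from typing import Dict, FrozenSet, List, Sequence

Weights = Dict[FrozenSet[str], float]

# Hypothetical weights for illustration only. frozenset({a, b}) keys make
# lambda^{a,b} == lambda^{b,a} automatic (eq 9); one table serves every
# position k (eq 10). frozenset({a}) holds the weight for keeping ion a.
WEIGHTS: Weights = {
    frozenset({"Ca2+", "Ba2+"}): 1.2,
    frozenset({"Ca2+", "Ti4+"}): -1.0,
    frozenset({"Ti4+"}): 2.0,
    frozenset({"O2-"}): 2.5,
}


def weight(a: str, b: str, weights: Weights, default: float = 0.0) -> float:
    """Symmetric pair weight; unknown pairs get `default`."""
    return weights.get(frozenset((a, b)), default)


def score(x: Sequence[str], x_prime: Sequence[str], weights: Weights) -> float:
    """Exponent of eq 6 under binary features: sum over positions k."""
    if len(x) != len(x_prime):
        raise ValueError("x and x_prime must have the same number of ions")
    return sum(weight(a, b, weights) for a, b in zip(x, x_prime))


def probabilities(
    x: Sequence[str], candidates: List[Sequence[str]], weights: Weights
) -> List[float]:
    """Normalize exp(score) over the given candidates (stand-in for Z)."""
    unnormalized = [math.exp(score(x, c, weights)) for c in candidates]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]
```

With these illustrative weights, substituting Ba2+ for Ca2+ in (Ca2+, Ti4+, O2-) scores higher than substituting Ti4+ for Ca2+, so it receives the larger share of the probability.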

2.4. Training of the Probability Function. While the mathematical form for our probabilistic model is now well established, the model parameters (the weights $\lambda_k^{a,b}$) still need to be evaluated. Those weights are estimated from the information present in an experimental crystal structure database.

From any experimental crystal structure database, structural similarities can be obtained using structure comparison algorithms.26,27 For instance, CaTiO3 and BaTiO3 both form cubic perovskite structures with Ca and Ba on equivalent sites. This translates in our mathematical framework as a specific assignment for the variables vector (X, X') = (Ca2+, Ti4+, O2-, Ba2+, Ti4+, O2-). We will follow the convention in probability theory of designating specific values of the random variable vector (X, X') by lower case letters (x, x'). An entire crystal structure database D will lead to m assignments: (X, X') = (x, x')_t with t = 1, ..., m

$$D = \{(X, X') = (x, x')_1,\ (X, X') = (x, x')_2,\ \ldots,\ (X, X') = (x, x')_{m-1},\ (X, X') = (x, x')_m\} \qquad (11)$$

Coming back to our analogy to machine translation, probabilistic translation models are estimated from databases of texts with their corresponding translation. The analog to the translated texts database in our substitution model is the crystal structure database.

Using these assignments obtained from the database, we follow the commonly used maximum-likelihood approach to find the adequate weights from the database available.28 The weights maximizing the likelihood to observe the training data are considered as the best estimates to use in the model. For notational purpose we will represent the set of weights by a weight vector $\lambda$.

From those m assignments, the log-likelihood l of the observed data D can be computed:

$$l(D, \lambda) = \sum_{t=1}^{m} \log p\big((x, x')_t \mid \lambda\big) \qquad (12)$$

$$= \sum_{t=1}^{m} \Big[ \sum_i \lambda_i f_i\big((x, x')_t\big) - \log Z(\lambda) \Big] \qquad (13)$$

(25) Della Pietra, S. A.; Della Pietra, V. J.; Lafferty, J. IEEE Trans. Pattern Anal. Machine Intell. 1997, 19, 1–13.
(26) Parthé, E.; Gelato, L. Acta Crystallogr., Sect. A 1984, 40, 169–183.
(27) Hundt, R.; Schön, J. C.; Jansen, M. J. Appl. Crystallogr. 2006, 39, 6–16.
(28) Eliason, S. R. Maximum Likelihood Estimation: Logic and Practice; Sage Publications, Inc: Thousand Oaks, CA, 1993.

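For an ionic substitution model, one could choose, for example, the following feature function:

$$f(X, X') = \begin{cases} 1 & \text{if } \mathrm{Ca^{2+}} \text{ substitutes for } \mathrm{Ba^{2+}} \text{ in the presence of } \mathrm{O^{2-}} \\ 0 & \text{else} \end{cases} \qquad (4)$$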

The relevant feature functions are commonly defined by experts from prior knowledge. If our chosen set of feature functions is informative enough, we expect to be able to approximate the probability function by the exponential of a weighted sum of those feature functions:

$$p_n(X, X') \approx \frac{e^{\sum_i \lambda_i f_i^{(n)}(X, X')}}{Z} \qquad (5)$$

The $\lambda_i$ indicate the weight given to the feature $f_i^{(n)}(X, X')$ in the probabilistic model. $Z$ is a partition function ensuring the normalization of the probability function. The exponential form chosen in eq 5 follows a commonly used convention in the machine learning community (25).

2.3. Binary Feature Model. A first assumption we make is that the feature functions do not depend on the number n of ions in the compound. Simply put, we assume that the ionic substitution rules are independent of the compound's number of components (binary, ternary, quaternary, ...).

Therefore, we will omit any reference to n in the probability and feature functions. Equation 5 becomes

$$p(X, X') \approx \frac{e^{\sum_i \lambda_i f_i(X, X')}}{Z} \qquad (6)$$

While the feature functions could be more complex, only simple binary substitutions are considered in this paper. This means that the likelihood for two ions to substitute for each other is independent of the nature of the other ionic species present in the compound. Mathematically, this translates into assuming that the relevant feature functions are simple binary features of the form:

$$f_k^{a,b}(X, X') = \begin{cases} 1 & X_k = a \text{ and } X'_k = b \\ 0 & \text{else} \end{cases} \qquad (7)$$

Each pair of ions a and b present in the domain is assigned a set of feature functions with corresponding weights $\lambda_k^{a,b}$ indicating how likely the ions a and b are to substitute for each other in position k. For instance, one of the feature functions is related to the Ca²⁺ to Ba²⁺ substitution:

$$f_k^{\mathrm{Ca^{2+}},\mathrm{Ba^{2+}}}(X, X') = \begin{cases} 1 & X_k = \mathrm{Ca^{2+}} \text{ and } X'_k = \mathrm{Ba^{2+}} \\ 0 & \text{else} \end{cases} \qquad (8)$$

The magnitude of the weight $\lambda_k^{\mathrm{Ca^{2+}},\mathrm{Ba^{2+}}}$ associated with this feature function indicates how likely this binary substitution is to happen.

Finally, the feature weights should satisfy certain constraints so that no permutation of the components changes the result of the probability evaluation. Those symmetry conditions are

$$\lambda_k^{a,b} = \lambda_k^{b,a} \qquad (9)$$

and

$$\lambda_k^{a,b} = \lambda_l^{a,b} \qquad (10)$$
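As an illustration of eqs 6 to 10, the following minimal sketch evaluates the binary feature model on a toy domain. The weight values, the four-ion species list, and the function names are assumptions made for the example and are not taken from the paper. The partition function is summed by brute force, which is only practical for very small domains.

```python
import math
from itertools import product


def pair_weight(weights, a, b):
    """Weight for ions a and b substituting for each other.

    Keys are frozensets, so lambda(a, b) == lambda(b, a) (eq 9), and there is
    no position index, so the same weight applies at every k (eq 10).
    Pairs that are not listed get weight 0.
    """
    return weights.get(frozenset((a, b)), 0.0)


def score(weights, x, x_prime):
    """Unnormalized probability exp(sum_k lambda(x_k, x'_k)), from eqs 6 and 7."""
    return math.exp(sum(pair_weight(weights, a, b) for a, b in zip(x, x_prime)))


def probability(weights, x, x_prime, species):
    """Normalized p(x, x'), with Z summed by brute force over all assignments."""
    assignments = list(product(species, repeat=len(x)))
    z = sum(score(weights, u, v) for u in assignments for v in assignments)
    return score(weights, x, x_prime) / z


species = ("Ca2+", "Ba2+", "Ti4+", "O2-")
# Hypothetical weights for illustration only; they are not fitted to any database.
weights = {
    frozenset(("Ca2+", "Ba2+")): 2.0,
    frozenset(("Ti4+",)): 1.0,   # Ti4+ stays Ti4+
    frozenset(("O2-",)): 1.0,    # O2- stays O2-
    frozenset(("Ca2+", "Ti4+")): -3.0,
}
catio3 = ("Ca2+", "Ti4+", "O2-")
batio3 = ("Ba2+", "Ti4+", "O2-")
p_likely = probability(weights, catio3, batio3, species)
p_unlikely = probability(weights, catio3, ("Ti4+", "Ti4+", "O2-"), species)
print(p_likely > p_unlikely)
```

Storing the weights under unordered keys enforces the symmetry of eq 9 by construction, so swapping the two compounds leaves the probability unchanged.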

2.4. Training of the Probability Function. While the mathematical form for our probabilistic model is now well established, the model parameters (the weights $\lambda_k^{a,b}$) still need to be evaluated. Those weights are estimated from the information present in an experimental crystal structure database.

From any experimental crystal structure database, structural similarities can be obtained using structure comparison algorithms (26, 27). For instance, CaTiO₃ and BaTiO₃ both form cubic perovskite structures with Ca and Ba on equivalent sites. This translates into our mathematical framework as a specific assignment for the variables vector (X, X') = (Ca²⁺, Ti⁴⁺, O²⁻, Ba²⁺, Ti⁴⁺, O²⁻). We will follow the convention in probability theory of designating specific values of the random variable vector (X, X') by lower case letters (x, x'). An entire crystal structure database D will lead to m assignments: $(X, X') = (x, x')^{t}$ with t = 1, ..., m

$$D = \{(X, X') = (x, x')^{1},\ (X, X') = (x, x')^{2},\ \ldots,\ (X, X') = (x, x')^{m-1},\ (X, X') = (x, x')^{m}\} \qquad (11)$$

Coming back to our analogy to machine translation, probabilistic translation models are estimated from databases of texts with their corresponding translation. The analog to the translated texts database in our substitution model is the crystal structure database.

Using these assignments obtained from the database, we follow the commonly used maximum-likelihood approach to find the adequate weights from the database available (28). The weights maximizing the likelihood of observing the training data are considered the best estimates to use in the model. For notational purposes, we will represent the set of weights by a weight vector $\lambda$.

From those m assignments, the log-likelihood l of the observed data D can be computed:

$$l(D, \lambda) = \sum_{t=1}^{m} \log p\big((x, x')^{t} \mid \lambda\big) \qquad (12)$$

$$= \sum_{t=1}^{m} \Big[ \sum_i \lambda_i f_i\big((x, x')^{t}\big) - \log Z(\lambda) \Big] \qquad (13)$$
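The log-likelihood of eqs 12 and 13 can be sketched for a toy database as follows. This is an illustrative sketch under stated assumptions: the two training assignments, the weight values and the function names are invented for the example, and the domain is taken to be every combination of the listed species, in which case $Z$ factorizes over positions. It only evaluates the likelihood for two candidate weight vectors and does not perform the maximization itself.

```python
import math


def pair_weight(weights, a, b):
    """Symmetric, position-independent weight lambda(a, b); 0 if not listed."""
    return weights.get(frozenset((a, b)), 0.0)


def log_partition(weights, species, n):
    """log Z for n-site assignments over an unconstrained domain.

    Each binary feature involves a single position, so the sum over all
    (x, x') factorizes into n identical per-position sums.
    """
    per_site = sum(math.exp(pair_weight(weights, a, b))
                   for a in species for b in species)
    return n * math.log(per_site)


def log_likelihood(weights, data, species):
    """Eq 13: sum over assignments of (weighted feature sum - log Z)."""
    total = 0.0
    for x, x_prime in data:
        feature_sum = sum(pair_weight(weights, a, b) for a, b in zip(x, x_prime))
        total += feature_sum - log_partition(weights, species, len(x))
    return total


species = ("Ca2+", "Ba2+", "Ti4+", "O2-")
# Toy "database" of structurally similar pairs (hypothetical training set).
data = [
    (("Ca2+", "Ti4+", "O2-"), ("Ba2+", "Ti4+", "O2-")),
    (("Ca2+", "O2-"), ("Ba2+", "O2-")),
]
zero_weights = {}
# Hypothetical weights that favor the substitutions seen in the data.
trial_weights = {
    frozenset(("Ca2+", "Ba2+")): 2.0,
    frozenset(("Ti4+",)): 1.0,
    frozenset(("O2-",)): 1.0,
}
ll_zero = log_likelihood(zero_weights, data, species)
ll_trial = log_likelihood(trial_weights, data, species)
print(ll_trial > ll_zero)
```

A maximum-likelihood fit would search for the weight vector that makes this quantity as large as possible, for example by gradient ascent.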

(25) Della Pietra, S. A.; Della Pietra, V. J.; Lafferty, J. IEEE Trans. Pattern Anal. Machine Intell. 1997, 19, 1–13.
(26) Parthé, E.; Gelato, L. Acta Crystallogr., Sect. A 1984, 40, 169–183.
(27) Hundt, R.; Schön, J. C.; Jansen, M. J. Appl. Crystallogr. 2006, 39, 6–16.
(28) Eliason, S. R. Maximum Likelihood Estimation: Logic and Practice; Sage Publications, Inc: Thousand Oaks, CA, 1993.


Initial candidates: Ag-P-S and Ag-M-P-S compounds from the ICSD.

Replace Ag with Li to obtain Li-P-S and Li-M-P-S compounds, then screen in tiers:

1. Phase stability: Ehull < 30 meV/atom
2. Topological analysis: rc > 1.75
3. Diffusivity screening (short AIMD estimation): MSD800K > 5 Å² and MSD1200K/MSD800K < 7
4. Long AIMD at multiple temperatures: σ300K > 1 mS/cm
5. Dopant and composition optimization

Promising candidates:

Li3Y(PS4)2: Ehull = 2 meV/atom, rc = 1.88, MSD800K = 65.1 Å², MSD ratio = 4.5. Parent: Ag3Y(PS4)2 (ICSD 417658).

Li5PS4Cl2: Ehull = 17 meV/atom, rc = 1.76, MSD800K = 77.9 Å², MSD ratio = 3.1. Parent: Ag5PS4Cl2 (ICSD 416587).
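The tiered screening above can be sketched as a chain of cheap-to-expensive filters. This is a minimal sketch, not the group's actual workflow code: the class and function names are hypothetical, only the thresholds and the two candidates' values come from the slide, and the third entry is an invented compound included to show a rejection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    formula: str
    e_hull: float     # energy above hull, meV/atom
    r_c: float        # topological descriptor r_c from the slide
    msd_800: float    # mean square displacement at 800 K, in Å^2
    msd_ratio: float  # MSD at 1200 K divided by MSD at 800 K


# Stages in the order they are applied; thresholds are those on the slide.
STAGES = [
    ("phase stability", lambda c: c.e_hull < 30),
    ("topological analysis", lambda c: c.r_c > 1.75),
    ("short AIMD MSD", lambda c: c.msd_800 > 5),
    ("short AIMD MSD ratio", lambda c: c.msd_ratio < 7),
]


def first_failed_stage(candidate: Candidate) -> Optional[str]:
    """Return the name of the first stage a candidate fails, or None if it passes all."""
    for name, passes in STAGES:
        if not passes(candidate):
            return name
    return None


candidates = [
    Candidate("Li3Y(PS4)2", e_hull=2, r_c=1.88, msd_800=65.1, msd_ratio=4.5),
    Candidate("Li5PS4Cl2", e_hull=17, r_c=1.76, msd_800=77.9, msd_ratio=3.1),
    # Hypothetical compound, included only to show a rejection.
    Candidate("hypothetical-X", e_hull=45, r_c=1.90, msd_800=20.0, msd_ratio=3.0),
]
survivors = [c.formula for c in candidates if first_failed_stage(c) is None]
print(survivors)
```

Ordering the stages from cheapest to most expensive means the costly long AIMD runs are only spent on compounds that survive every earlier tier.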


Provisional patent filed

[Figure: Ehull schematic for a composition A0.5B0.5 between phases A and B]

Converged AIMD simulations confirm 3D Superionic Conductivities



σ300K = 2.16 1 mS/cm, Ea = 278 meV

σ300K = 1.78 1 mS/cm, Ea = 304 meV

Ca-doped: σ300K = 7.14 3 mS/cm
Zr-doped: σ300K = 5.25 1 mS/cm

Li3Y(PS4)2 likely to exhibit better electrochemical stability than current state-of-the-art


Zhu, Z.; et al., Chem. Mater., 2016, Accepted.
Kato, et al., Nat. Energy, 2016, 1, 16030.

Crystalium: World's Largest Database of Surface Energies and Wulff Shapes

Workflow:

1. Input bulk crystal structure.
2. Generation of oriented unit cells (OUCs) up to the max Miller index.
3. Relaxation calculation of each OUC (hkl).
4. Surface calculations for each orientation, e.g. (h1k1l1) and (h2k2l2), with one calculation per termination (Termination 1, Termination 2).
5. Calculations completed? If no, adjust parameters and rerun. If yes, continue.
6. Surface energy and Wulff shape calculations.
7. Surface database (Materials Project, Dryad Repository).

http://crystalium.materialsvirtuallab.org
Tran, R.; Xu, Z.; Radhakrishnan, B.; Winston, D.; Sun, W.; Persson, K. A.; Ong, S. P., Sci. Data, 2016, 3, 160080.
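The first step of the workflow above, listing the orientations to consider up to a maximum Miller index, can be sketched in plain Python. This is an illustrative sketch with a hypothetical function name: it removes only multiples such as (2, 2, 0) and the trivial (hkl) versus (-h, -k, -l) duplicates, and it does not apply the crystal's point-group symmetry, which a production workflow would use to shrink the list much further.

```python
from itertools import product
from math import gcd


def candidate_miller_indices(max_index):
    """List (h, k, l) with |h|, |k|, |l| <= max_index and no common factor.

    Only one of (h, k, l) and (-h, -k, -l) is kept: the one whose first
    nonzero component is positive.
    """
    indices = []
    for hkl in product(range(-max_index, max_index + 1), repeat=3):
        if hkl == (0, 0, 0):
            continue
        h, k, l = hkl
        if gcd(gcd(abs(h), abs(k)), abs(l)) != 1:
            continue  # e.g. (2, 0, 0) is the same plane family as (1, 0, 0)
        first_nonzero = next(v for v in hkl if v != 0)
        if first_nonzero > 0:
            indices.append(hkl)
    return indices


print(len(candidate_miller_indices(1)))
print(len(candidate_miller_indices(2)))
```

For a cubic crystal, symmetry would reduce the max-index-1 list to the three familiar families (100), (110) and (111).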

Insights from Large Materials Datasets


DFT does surprisingly well in terms of surface energies, contrary to popular perception

Trends in energies between reconstructed and unreconstructed surfaces can be reproduced.

Fcc(110) missing row

Expt. known to reconstruct!

Tran, R.; Xu, Z.; Radhakrishnan, B.; Winston, D.; Sun, W.; Persson, K. A.; Ong, S. P., Sci. Data, 2016, 3, 160080.

SEE = 0.27 J m⁻²

Building software infrastructure enables new capabilities


Tran, R.; Xu, Z.; Zhou, N.; Radhakrishnan, B.; Luo, J.; Ong, S. P., Acta Mater., 2016, 117, 91–99, doi:10.1016/j.actamat.2016.07.005.

Re, Os, Ta and W are strengthening dopants for Mo alloys.

Learning new insights into dopant effect on GB strength


the noble metals and the 3d transition metals. The dopants for which there is the most significant disagreement are those that tend to form intermetallic compounds with Mo. For example, Re, Os, Co, Pt, Ni, Ir, Pt, Tc, Zr, Hf, and Fe form at least one intermetallic compound with Mo [43], and the empirical models significantly underestimate the magnitude of the segregation energy for these dopants. Elements for which the empirical models are in good agreement with the DFT results, such as W, Nb, Cr, Cd, Ta, and V, generally do not form intermetallic compounds with Mo. Precipitation of intermetallics at the two-dimensional GB is a precursor to bulk formation and is a chemical effect not accounted for in the empirical models. In Fig. S3 of the Supplementary Information, we have plotted the $E_{seg}^{X,tilt}$ against experimental solubilities of the dopant X in Mo. In general, the same qualitative trend is observed wherein dopants with higher solubility have lower $E_{seg}^{X,tilt}$, in agreement with the model proposed by Hondros et al. [44]. Moreover, it has been demonstrated generally for many metallic and ceramic materials (see several reviews by Cantwell et al. [22], Kaplan et al. [45], Harmer [46], Luo [47] and references therein) and specifically for several binary [13,48] and ternary Mo based alloys [49,50] as well as W-based alloys [51,52] that GBs can undergo 2-D phase-like structural transitions, which are more likely to form in

Fig. 7. Plots of the strengthening energy $E_{SE}^{X}$ versus segregation energy $E_{seg}^{X}$ for the 29 dopants in the Σ5(310) tilt GB. (a) based on lowest energy dopant site in GB and free surface (l-to-l approach); (b) based on Site 0 (m-to-s approach). Dopants in the white region (positive $E_{seg}^{X}$) prefer to stay in the bulk. For dopants that segregate, those with negative $E_{SE}^{X}$ (blue region) tend to strengthen the GB.

Fig. 8. Plot of the observed strengthening energy $E_{SE}^{X}$ for the site with the lowest $E_{seg}^{GB}$ versus the two factor linear model for $E_{SE}^{X,model}\left(E_{coh}^{X}, \frac{R_X - R_{Mo}}{R_{Mo}}\right)$ for the 29 dopants in the Σ5(310) tilt GB. (a) Fitting performed based on the l-to-l approach. $k_{coh}$ = 0.14 ± 0.023, $k_R$ = 7.37 ± 0.578. (b) Fitting performed based on the m-to-s approach. $k_{coh}$ = 0.39 ± 0.042, $k_R$ = 2.40 ± 1.076. In all fittings, the p-value for all coefficients are


Future challenges



Challenge 1: Software development is typically not viewed as science

Most research group software is:
a. Badly coded
b. Poorly documented
c. Not available to the community
d. Unlikely to last beyond the current PhD/postdoc
e. All of the above


https://xkcd.com/292/

Challenge 2: Data APIs are not that common

Many materials data repository projects are still Web 1.0.

Development of an API for programmatic materials data access is often not part of the distribution strategy.

APIs need complementary software support.


The Materials Application Programming Interface (API): A simple, flexible and efficient API for materials data based on REpresentational State Transfer (REST) principles

Shyue Ping Ong (a), Shreyas Cholia (b), Anubhav Jain (b), Miriam Brafman (b), Dan Gunter (b), Gerbrand Ceder (c), Kristin A. Persson (b)
(a) Department of NanoEngineering, University of California, San Diego, 9500 Gilman Drive, Mail Code 0448, La Jolla, CA 92093, USA
(b) Lawrence Berkeley National Lab, 1 Cyclotron Rd, Berkeley, CA 94720, USA
(c) Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA

Article history: Received 18 August 2014. Accepted 18 October 2014.

Keywords: Materials Project; Application Programming Interface; High-throughput; Materials genome; Rest; Representational state transfer

Abstract

In this paper, we describe the Materials Application Programming Interface (API), a simple, flexible and efficient interface to programmatically query and interact with the Materials Project database based on the REpresentational State Transfer (REST) pattern for the web. Since its creation in Aug 2012, the Materials API has been the Materials Project's de facto platform for data access, supporting not only the Materials Project's many collaborative efforts but also enabling new applications and analyses. We will highlight some of these analyses enabled by the Materials API, particularly those requiring consolidation of data on a large number of materials, such as data mining of structural and property trends, and generation of phase diagrams. We will conclude with a discussion of the role of the API in building a community that is developing novel applications and analyses based on Materials Project data.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

First principles methods are today a critical tool in the study and design of materials. Starting from the fundamental laws of physics with minimal assumptions and approximations, first principles techniques can access a wide range of chemistries in a relatively agnostic manner, making them especially powerful in materials investigations or design problems spanning diverse chemical spaces.

In the past decade, electronic structure calculation codes [1–4] have reached a level of maturity that it is now possible to reliably automate and scale first principles calculations across any number of compounds. Coupled with computing advances, this development has led to the advent of high throughput (HT) first principles calculations as an investigative and design tool in materials science. Even today, there are already several examples of HT first principles computation-guided materials design efforts in applications as varied as alkali-ion batteries [5–9], catalysts for hydrogen production [10], topological insulators [11], and organic semiconductors [12], with many of these efforts resulting in the discovery of novel materials that have already been synthesized and verified experimentally. This HT capability has also spurred the development of large databases of computed data on materials, such as the Materials Project [13], the AFLOWLIB library [14] and the Harvard Clean Energy Project [12].

In particular, the Materials Project [13], created by the authors of this paper, has led the charge of combining a large database of materials properties with a diverse and growing set of online analysis and comprehensive open source software tools [15–17]. The Materials Project's database today contains computed energetic properties for over 59,000 crystal structures along with over 25,000 electronic structure properties. More structures and properties (e.g., elastic constants, dielectric constants, etc.) are being added on a daily basis. A series of web applications provide users with the capability to perform advanced searches and common analyses such as phase diagram and Pourbaix diagram generation [18–20], reaction energy computations, prediction of novel structures [21,22], etc. However, while these web applications provide user-friendly graphical interfaces to explore materials data and analyses, they do not provide easy programmatic access to the underlying resources or a means for the community to develop novel applications or analyses.


Ong et al. Comput. Mater. Sci. 2015, 97, 209–215.

A RESTful API for exchanging materials data in the AFLOWLIB.org consortium

Richard H. Taylor (a,b), Frisco Rose (b), Cormac Toher (b), Ohad Levy (b,1), Kesong Yang (c), Marco Buongiorno Nardelli (d,e), Stefano Curtarolo (f)
(a) National Institute of Standards and Technology, Gaithersburg, MD 20878, USA
(b) Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC 27708, USA
(c) Department of Nanoengineering, University of California, San Diego, La Jolla, CA 92093, USA
(d) Department of Physics, University of North Texas, Denton, TX, USA
(e) Department of Chemistry, University of North Texas, Denton, TX, USA
(f) Materials Science, Electrical Engineering, Physics and Chemistry, Duke University, Durham, NC 27708, USA

Article history: Received 20 March 2014. Received in revised form 5 May 2014. Accepted 10 May 2014. Available online 24 July 2014.

Keywords: High-throughput; Combinatorial materials science; Computer simulations; Materials databases; AFLOWLIB

Abstract

The continued advancement of science depends on shared and reproducible data. In the field of computational materials science and rational materials design this entails the construction of large open databases of materials properties. To this end, an Application Program Interface (API) following REST principles is introduced for the AFLOWLIB.org materials data repositories consortium. AUIDs (Aflowlib Unique IDentifier) and AURLs (Aflowlib Uniform Resource Locator) are assigned to the database resources according to a well-defined protocol described herein, which enables the client to access, through appropriate queries, the desired data for post-processing. This introduces a new level of openness into the AFLOWLIB repository, allowing the community to construct high-level work-flows and tools exploiting its rich data set of calculated structural, thermodynamic, and electronic properties. Furthermore, federating these tools will open the door to collaborative investigations of unprecedented scope that will dramatically accelerate the advancement of computational materials design and development.

© 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/3.0/).

1. Introduction

Data-driven materials science has gained considerable traction over the last decade or so. This is due to the confluence of three key factors: (1) improved computational methods and tools; (2) greater computational power; and (3) heightened awareness of the power of extensive databases in science [1]. The recent Materials Genome Initiative (MGI) [1,2] reflects the recognition that many important social and economic challenges of the 21st century could be solved or mitigated by advanced materials. Computational materials science currently presents the most promising path to the resolution of these challenges.

The first and second factors above are epitomized by high-throughput computation of materials properties by ab initio methods, which is the foundation of an effective approach to materials design and discovery [3–12]. Recently, the software used to manage the calculation work-flow and perform the analyses has trended toward more public and user-friendly frameworks. The emphasis is increasingly on portability and sharing of tools and data [13–15]. Similar to the effort presented here, the Materials Project [16] has been providing open access to its database of computed materials properties through a RESTful API and a python library enabling ad hoc applications [17]. Other examples of online material properties databases include that being implemented by the Engineering Virtual Organization for Cyber Design (EVOCD) [18], which contains a repository of experimental data, materials constants and computational tools for use in Integrated Computational Material Engineering (ICME). The future advance of computational materials science would rely on interoperable and federatable tools and databases as much as on the quantities and types of data being produced.

A principle of high-throughput materials science is that one does not know a priori where the value of the data lies for any specific application. Trends and insights are deduced a posteriori. This requires efficient interfaces to interrogate available data on various levels. We have developed a simple WEB-based API to greatly improve the accessibility and utility of the AFLOWLIB database [14] to the scientific community. Through it, the client can access


1 On leave from the Physics Department, NRCN, Israel.


Taylor et al. Comput. Mater. Sci. 2014, 93, 178–192.

Pymatgen provides high-level tools for users to easily obtain and work with data via Materials API.
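To make the REST idea concrete, the sketch below builds a resource URL for a single property of a material. It is an illustrative sketch only: the base URL and the path layout are assumptions modeled on the v1 URI scheme described in Ong et al. (2015), the function name is hypothetical, and the live service may have changed since. A real request would also need an API key, which is omitted here. In practice, pymatgen's MPRester class wraps such requests so that users rarely build URLs by hand.

```python
from urllib.parse import quote

# Assumed root of the legacy v1 Materials API; treat as illustrative.
BASE_URL = "https://www.materialsproject.org/rest/v1"


def materials_url(identifier, prop, data_type="vasp"):
    """Build a REST-style URL: one resource per (material, data type, property).

    identifier: a formula, chemical system or material id (assumed usage).
    prop: the property name to retrieve, e.g. "energy".
    """
    parts = [
        BASE_URL,
        "materials",
        quote(identifier, safe=""),
        data_type,
        quote(prop, safe=""),
    ]
    return "/".join(parts)


print(materials_url("Fe2O3", "energy"))
```

Because each piece of data has its own predictable address, any HTTP client in any language can retrieve it, which is what allows analysis code to run over thousands of materials without a web browser.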

Challenge 3: There are still problems too large for first principles


                   First principles            Empirical
Transferability    Transferable                Non-transferable
Cost               Costly                      Cheap
Scale              Short time/length scales    Longer time/length scales

(Figure label between the two approaches: Automation)

"It from bit." - John Archibald Wheeler

[Figure labels: Materials, Data]

Acknowledgements


Ong group

Iek-Heng Chu, Balachandran Radhakrishnan, Zhuoying Zhu, Yuh-Chieh Lin, Zhenbin Wang, Zihan Xu, Zhi Deng, Hanmei Tang, Weike Ye, Chen Zheng

Collaborators

Prof Shirley Meng, Christopher Kompella, Han Nguyen, Sunny Hy

Prof Joanna McKittrick, Jungmin Ha


Thank you.
