Materials Database and Machine Learning: AFLOW-ML

Cormac Toher

July 14th, 2021


AFLOW.org: Using the data

• Tools like the REST API and AFLUX provide a way to wrangle data.

• Each entry’s properties were calculated with density functional theory (DFT), which requires a high-performance computing environment.

• Even with such resources, calculations can take days to months, depending on the system and property.

Goal: Use existing data to construct a model that predicts these properties with high accuracy to accelerate materials discovery.


AFLOW Machine Learning

• AFLOW data for > 26,674 materials on AFLOW.org used to train a gradient boosting decision tree machine-learning model

• Predictions based on structural morphology and elemental properties

[Figure: PLMF construction. (a) Crystal structure. (b) Voronoi tessellation and neighbors search. (c) Infinite periodic graph construction and property labeling: nodes (atoms), edges (bonds). (d) Decomposition to fragments: path fragments of length l, l = 2, 3, …; circular fragments (polyhedrons).]

O. Isayev et al., Nat. Commun. 8, 15679 (2017).

• Voronoi tessellation used to determine atomic connectivity

• Atoms which share a Voronoi cell face are connected to form a graph
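The gradient boosting idea named on this slide can be illustrated with a toy sketch. This is not the AFLOW-ML implementation: it assumes a single numeric feature, squared-error loss, and one-split trees (stumps), and all function names and data are hypothetical. Each round fits a stump to the current residuals and adds a damped copy of it to the running prediction.

```python
def fit_stump(xs, residuals):
    """Find the one-split tree on a 1-D feature that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boosting for squared loss: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (left if x <= threshold else right)
        for threshold, left, right in stumps
    )


def mean_squared_error(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy data: one descriptor value per material and a target property.
XS = [1, 2, 3, 4, 5, 6]
YS = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

model = fit_boosted_stumps(XS, YS)
print("training MSE:", mean_squared_error(model, XS, YS))
```

In the real models the feature vector is built from the structure fragments described on these slides, and the trees split on many descriptors rather than one.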


AFLOW Machine Learning

• Connected atoms form structure fragment descriptors

[Figure: PLMF construction, panels (a) to (d): crystal structure; Voronoi tessellation and neighbors search; infinite periodic graph construction and property labeling, with nodes (atoms) and edges (bonds); decomposition to fragments, with path fragments of length l, l = 2, 3, … and circular fragments (polyhedrons).]

O. Isayev et al., Nat. Commun. 8, 15679 (2017).

• Atomic nodes in structure fragments are decorated with elemental properties to form Property-Labeled Materials Fragments (PLMF)

• Properties used include number of valence electrons, ionization potential, electron affinity, electronegativity, covalent radii, etc.


AFLOW Machine Learning

• Model predicts electronic and thermo-mechanical properties

[Figure: FIG. 2. Outline of the modeling work-flow. ML models are represented by orange diamonds. Target properties predicted by these models are highlighted in green. Starting from the crystal structure, the electronic properties branch uses a classification model ("metal or insulator?"): metals have no E_BG, and for insulators a regression model gives the band gap energy prediction {E_BG ∈ R : E_BG > 0}. The thermo-mechanical properties branch uses regression models, including the bulk modulus (VRH) prediction {X ∈ R}.]

structure of the material and determine the atomic connectivity within it. In general, atomic connectivity is not a trivial property to determine within materials. Not only must we consider the potential bonding distances among the atoms, but also whether the topology of nearby atoms allows for bonding. Therefore, we have employed a computational geometry approach to partition the crystal structure (Figure 1a) into atom-centered Voronoi-Dirichlet polyhedra [59–62] (Figure 1b). This partitioning scheme was found to be invaluable in the topological analysis of metal organic frameworks (MOF), molecules, and inorganic crystals [63, 64]. Connectivity between atoms is established by satisfying two criteria: (i) the atoms must share a Voronoi face (perpendicular bisector between neighboring atoms), and (ii) the interatomic distance must be shorter than the sum of the Cordero covalent radii [65] to within a 0.25 Å tolerance. Here, we consider only strong interatomic interactions such as covalent, ionic, and metallic bonding, ignoring van der Waals interactions. Due to the ambiguity within materials, the bond order (single/double/triple bond classification) is not considered. Taken together, the Voronoi centers that share a Voronoi face and are within the sum of their covalent radii form a three-dimensional graph defining the connectivity within the material.
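The two connectivity criteria can be written as a short check. This is a minimal sketch rather than the authors' code: it reads criterion (ii) as distance < r_i + r_j + 0.25 Å, takes the Voronoi-face test as a precomputed boolean, and uses illustrative covalent radii that should be checked against the Cordero tabulation. The function and variable names are hypothetical.

```python
# Illustrative covalent radii in angstrom (assumed values for this sketch;
# check them against the Cordero tabulation before relying on them).
COVALENT_RADII = {"Na": 1.66, "Cl": 1.02}
TOLERANCE = 0.25  # angstrom, the tolerance quoted in the text


def is_bonded(element_i, element_j, distance, shares_voronoi_face,
              radii=COVALENT_RADII, tolerance=TOLERANCE):
    """Apply both criteria: shared Voronoi face and a short enough distance."""
    if not shares_voronoi_face:
        return False
    return distance < radii[element_i] + radii[element_j] + tolerance


# Rock-salt NaCl with a = 5.63931 angstrom (the structure used later in the slides):
a = 5.63931
print(is_bonded("Na", "Cl", a / 2, True))         # nearest Na-Cl neighbors
print(is_bonded("Cl", "Cl", a / 2 ** 0.5, True))  # Cl-Cl pair, too far apart
```

Whether two atoms share a Voronoi face has to come from a tessellation of the periodic structure; that step is outside this sketch.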

In the final steps of the PLMF construction, the full graph and corresponding adjacency matrix (Figure 1c) are constructed from the total list of connections. The adjacency matrix A of a simple graph (material) with n vertices (atoms) is a square matrix (n × n) with entries a_ij = 1 if atom i is connected to atom j, and a_ij = 0 otherwise. This adjacency matrix reflects the global topology for a given system, including interatomic bonds and contacts within the crystal. The full graph is partitioned into smaller subgraphs, corresponding to individual fragments (Figure 1d). While there are several subgraphs to consider in general, we restrict the length l to a maximum of three, where l is the largest number of consecutive, non-repetitive edges in the subgraph. This restriction serves to curb the complexity of the final descriptor vector. In particular, we consider two types of fragments. Path fragments are subgraphs of at most l = 3 that encode any linear strand of up to four atoms. Only the shortest paths between atoms are considered. Circular fragments are subgraphs of l = 2 that encode the first shell of nearest neighbor atoms. In this context, circular fragments represent coordination polyhedra, or clusters of atoms with anion/cation centers each surrounded by a set of its respective counter ion. Coordination polyhedra are used extensively in crystallography and mineralogy [66].
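The adjacency matrix and the path fragments can be sketched on a small finite graph. The assumptions here are mine: the graph is a finite toy example (not a periodic crystal), atoms are plain integer indices, and "only the shortest paths" is interpreted as keeping a path only when its edge count equals the shortest-path distance between its end atoms. All names are hypothetical.

```python
from collections import deque


def adjacency_matrix(n_atoms, bonds):
    """a[i][j] = 1 if atoms i and j are connected, 0 otherwise (symmetric)."""
    a = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        a[i][j] = 1
        a[j][i] = 1
    return a


def shortest_distances(a, source):
    """Breadth-first search: number of edges from `source` to each reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, connected in enumerate(a[u]):
            if connected and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_fragments(a, max_edges=3):
    """Enumerate non-repeating paths of up to `max_edges` edges that are shortest paths."""
    dist = [shortest_distances(a, s) for s in range(len(a))]
    found = set()

    def extend(path):
        edges = len(path) - 1
        if edges >= 1 and dist[path[0]][path[-1]] == edges:
            # Store one orientation only, so 0-1-2 and 2-1-0 count once.
            found.add(min(tuple(path), tuple(reversed(path))))
        if edges == max_edges:
            return
        for v, connected in enumerate(a[path[-1]]):
            if connected and v not in path:
                extend(path + [v])

    for start in range(len(a)):
        extend([start])
    return sorted(found, key=lambda p: (len(p), p))


# Toy graph: a triangle of atoms 0, 1, 2 with atom 3 attached to atom 2.
A = adjacency_matrix(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
for fragment in path_fragments(A):
    print(fragment)
```

In the triangle, 0-1-2 is dropped because atoms 0 and 2 are directly bonded, which is the effect of the shortest-path restriction.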

Property labeling. PLMFs are differentiated by local (standard atomic) reference properties [57], which include: (i) general properties: the Mendeleev group and period numbers, number of valence electrons (NV); (ii) measured properties [57]: atomic mass, electron affinity (EA), thermal conductivity (λ), heat capacity (C), enthalpies of atomization (ΔH_at), fusion (ΔH_fusion), and vaporization, first three ionization potentials (IP1,2,3); and (iii) derived properties: effective atomic charge (Z_eff), molar volume (V_molar), chemical hardness (η) [57, 67], covalent (r_cov) [65], absolute [68], and van der Waals radii [57], electronegativity (χ), and polarizability. We also combine pairs of properties in the form of their multiplication and ratio, as well as include the property value

• Model predicts electronic band gap for non-metals

O. Isayev et al., Nat. Commun. 8, 15679 (2017).


AFLOW Machine Learning

• Good agreement of predictions with both DFT and experiment

• Partial dependence of properties on descriptors:

B, G: r² = 0.99; θ_D: r² = 0.97

O. Isayev et al., Nat. Commun. 8, 15679 (2017).


AFLOW-ML Online

• Models are available online at aflow.org/aflow-ml

O. Isayev et al., Nat. Commun. 8, 15679 (2017), E. Gossett et al., Comput. Mater. Sci. 152, 134 (2018).


AFLOW-ML Online

• Models are available online at aflow.org/aflow-ml

[Screenshot: AFLOW-ML web application showing the model options PLMF, MFD, and ASC, the input format POSCAR (VASP 5), and the Run prediction button.]

PLMF: O. Isayev et al., Nat. Commun. 8, 15679 (2017)
MFD: F. Legrain et al., J. Chem. Inf. Model. 58(12), 2460-2466 (2018)
ASC: V. Stanev et al., npj Comput. Mater. 4, 29 (2018)


AFLOW-ML Online

• Convert POSCAR for VASP 4 to POSCAR for VASP 5

VASP 4

    ClNa/AB_cF8_225_a_b.AB params=5.63931 SG=225
    1.000000
       0.00000000000000   2.81965500000000   2.81965500000000
       2.81965500000000   0.00000000000000   2.81965500000000
       2.81965500000000   2.81965500000000   0.00000000000000
    1 1
    Direct(2) [A1B1]
       0.00000000000000   0.00000000000000   0.00000000000000  Cl
       0.50000000000000   0.50000000000000   0.50000000000000  Na

VASP 5: Add line with list of elements

    ClNa/AB_cF8_225_a_b.AB params=5.63931 SG=225
    1.000000
       0.00000000000000   2.81965500000000   2.81965500000000
       2.81965500000000   0.00000000000000   2.81965500000000
       2.81965500000000   2.81965500000000   0.00000000000000
    Cl Na
    1 1
    Direct(2) [A1B1]
       0.00000000000000   0.00000000000000   0.00000000000000  Cl
       0.50000000000000   0.50000000000000   0.50000000000000  Na
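The conversion shown on this slide can be automated with a few lines of Python. This is a sketch under stated assumptions, not an AFLOW tool: the function name is hypothetical, the file is assumed to have no "Selective dynamics" line, and the element symbols are taken from the label after each coordinate triple unless they are passed in explicitly.

```python
def poscar4_to_poscar5(text, species=None):
    """Insert the element-symbol line that VASP 5 expects above the atom counts."""
    lines = text.strip("\n").split("\n")
    count_tokens = lines[5].split()
    if not all(token.isdigit() for token in count_tokens):
        return text  # line 6 is not purely numeric: assume it is already VASP 5
    counts = [int(token) for token in count_tokens]

    if species is None:
        # Layout assumed: title, scale, 3 lattice vectors, counts, coordinate mode.
        offset = 7
        species = []
        for count in counts:
            fields = lines[offset].split()
            if len(fields) < 4:
                raise ValueError("no element label found; pass species explicitly")
            species.append(fields[3])
            offset += count

    if len(species) != len(counts):
        raise ValueError("need one element symbol per atom count")
    return "\n".join(lines[:5] + [" ".join(species)] + lines[5:]) + "\n"


NACL_VASP4 = """\
ClNa/AB_cF8_225_a_b.AB params=5.63931 SG=225
1.000000
   0.00000000000000   2.81965500000000   2.81965500000000
   2.81965500000000   0.00000000000000   2.81965500000000
   2.81965500000000   2.81965500000000   0.00000000000000
1 1
Direct(2) [A1B1]
   0.00000000000000   0.00000000000000   0.00000000000000  Cl
   0.50000000000000   0.50000000000000   0.50000000000000  Na
"""

print(poscar4_to_poscar5(NACL_VASP4))
```

If the coordinate lines carry no labels, the symbols can be supplied directly, for example `poscar4_to_poscar5(text, species=["Cl", "Na"])`.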


AFLOW-ML Online

Exercises:

• Convert the Heusler structure POSCAR you decorated in Session 4 from VASP 4 to VASP 5.

• Copy this structure into the AFLOW-ML application, and run the PLMF model. Is it a metal or insulator? What are the values of the bulk and shear moduli?

• Run the MFD model for the same structure. What properties does this model give?

• Upload the chemical formula for this material to the AFLOW-ML application, and run the ASC model. What is the superconducting critical temperature for this composition?

O. Isayev et al., Nat. Commun. 8, 15679 (2017); E. Gossett et al., Comput. Mater. Sci. 152, 134 (2018); F. Legrain et al., J. Chem. Inf. Model. 58(12), 2460-2466 (2018); V. Stanev et al., npj Comput. Mater. 4, 29 (2018)

AFLOW-ML API

• With machine learning models becoming more prevalent, we wanted to create a programmable interface to access our ML models.

• In tandem, we wanted this interface to be simple and to give users access to predictions without the need to install ML libraries or codebases.

• Finally, we wanted a centralized location to continuously update our models as well as add those of our collaborators.


AFLOW-ML API

• Models are now programmatically accessible via the AFLOW-ML API

[Figure: API workflow. POST the POSCAR to the endpoint <model>/prediction; the response is a task object (which includes {id}). GET the endpoint /prediction/result/{id}; the response is a status or prediction object. If status = "SUCCESS" (yes), the response is the prediction; if not (no), repeat the GET request.]

E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).

AFLOW-ML API

• AFLOW-ML API Python client can be downloaded from: http://aflow.org/src/aflow-ml/

E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).

AFLOW-ML API: Applications

• PLMF model integrated with genetic algorithm code XtalOpt to discover new superhard carbon phases

[Figure: Hv (GPa, axis from 0 to 80) versus energy (eV per formula unit, axis from -9 to -5), with diamond-like and graphite-like groups and a "superhard and stable" region marked. Panels (a) to (f) show structures labeled P-1-12 (75.6 GPa), P-1-16b (72.4 GPa), P-1-16c (71.3 GPa), P-1-16d (76.2 GPa), P1-16d (72.4 GPa), and P-1-16e (71.5 GPa).]

P. Avery et al., npj Comput. Mater. 5, 89 (2019).

AFLOW-ML API: Example

• Submit VASP 5 POSCAR to prediction endpoint with curl:

    curl http://aflow.org/API/aflow-ml/v1.0/plmf/prediction --data-urlencode file@POSCAR

• Receive task object with task ID:

    {
      "id": "39b0f11a-671d-4144-9465-997013ab19c0",
      "model": "plmf",
      "results_endpoint": "/prediction/result/39b0f11a-671d-4144-9465-997013ab19c0"
    }

• Query task ID to retrieve results:

    curl http://aflow.org/API/aflow-ml/v1.0/prediction/result/39b0f11a-671d-4144-9465-997013ab19c0

• Receive results object

E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).

AFLOW-ML API: Example

• PLMF results object

    {
      "citation": "10.1038/ncomms15679",
      "description": "The job has completed.",
      "ml_ael_bulk_modulus_vrh": 144.522,
      "ml_ael_shear_modulus_vrh": 104.453,
      "ml_agl_debye": 777.163,
      "ml_agl_heat_capacity_Cp_300K": 4.33,
      "ml_agl_heat_capacity_Cp_300K_per_atom": 2.194,
      "ml_agl_heat_capacity_Cv_300K": 4.178,
      "ml_agl_heat_capacity_Cv_300K_per_atom": 2.139,
      "ml_agl_thermal_conductivity_300K": 3.509,
      "ml_agl_thermal_expansion_300K": 6.18e-05,
      "ml_egap": 3.375,
      "ml_egap_type": "Insulator",
      "ml_energy_per_atom": -5.742,
      "model": "plmf",
      "status": "SUCCESS"
    }

E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).
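A results object like this is ordinary JSON, so the predicted quantities can be separated from the bookkeeping fields in a few lines. The sketch below assumes only what the object above shows, namely that prediction keys begin with "ml_"; the helper name is hypothetical and the JSON is an abridged copy of the slide's object.

```python
import json

# Abridged copy of the PLMF results object shown on the slide.
RESULTS_TEXT = """
{
  "citation": "10.1038/ncomms15679",
  "ml_ael_bulk_modulus_vrh": 144.522,
  "ml_egap": 3.375,
  "ml_egap_type": "Insulator",
  "model": "plmf",
  "status": "SUCCESS"
}
"""


def prediction_fields(results):
    """Return only the predicted quantities (keys starting with 'ml_')."""
    return {key: value for key, value in results.items() if key.startswith("ml_")}


results = json.loads(RESULTS_TEXT)
if results["status"] == "SUCCESS":
    for name, value in sorted(prediction_fields(results).items()):
        print(name, "=", value)
```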

AFLOW-ML API: Example

• ML API Python script

    #!/usr/bin/python3
    import json, sys, os
    from time import sleep  # Sleep library
    from urllib.parse import urlencode
    from urllib.request import urlopen
    from urllib.request import Request
    from urllib.error import HTTPError

    SERVER = "http://aflow.org"  # AFLOW-ML server
    API = "/API/aflow-ml/v1.0"
    MODEL = "plmf"               # PLMF model

    # Encode POSCAR
    with open('POSCAR', 'r') as poscar_file:
        poscar = poscar_file.read()
    encoded_data = urlencode({'file': poscar}).encode('utf-8')

    # Retrieve task object
    url = SERVER + API + "/" + MODEL + "/prediction"
    request_task = Request(url, encoded_data)
    task = urlopen(request_task).read()
    task_json = json.loads(task.decode('utf-8'))

    # Extract task ID and results endpoint
    results_endpoint = task_json["results_endpoint"]

    # Results URL
    results_url = SERVER + API + results_endpoint

E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).

AFLOW-ML API: Example

• ML API Python script (continued)

    incomplete = True
    while incomplete:
        # Retrieve status/results object
        request_results = Request(results_url)
        results = urlopen(request_results).read()
        results_json = json.loads(results.decode('utf-8'))

        # Check status: if PENDING or STARTED, sleep for 10 seconds and recheck
        if results_json["status"] == 'PENDING':
            sleep(10)
            continue
        elif results_json["status"] == 'STARTED':
            sleep(10)
            continue
        # Check status: if FAILURE, write error message
        elif results_json["status"] == 'FAILURE':
            print("Error: prediction failure")
            incomplete = False
        # Check status: if SUCCESS, write out the results json
        elif results_json["status"] == 'SUCCESS':
            print("Successful prediction")
            print(results_json)
            incomplete = False

E. Gossett et al., Comput. Mater. Sci. 152, 134-145 (2018).
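The polling loop above runs until the server reports SUCCESS or FAILURE. One possible variation, offered as a sketch rather than as part of the AFLOW-ML client, caps the number of attempts so that a stalled task cannot block a script forever. The function name and parameters are hypothetical; the status fetcher is passed in as a callable, so the example runs without network access.

```python
from time import sleep


def poll_until_done(fetch_status, interval=10, max_attempts=30, wait=sleep):
    """Call fetch_status() until the status is SUCCESS or FAILURE, or give up."""
    for _ in range(max_attempts):
        result = fetch_status()
        if result["status"] in ("SUCCESS", "FAILURE"):
            return result
        wait(interval)  # PENDING, STARTED, or anything else: wait and retry
    raise TimeoutError(
        "prediction did not finish after {} attempts".format(max_attempts)
    )


# Offline demonstration with a canned sequence of status objects.
canned = iter([
    {"status": "PENDING"},
    {"status": "STARTED"},
    {"status": "SUCCESS", "model": "plmf"},
])
final = poll_until_done(lambda: next(canned), wait=lambda seconds: None)
print(final["status"])
```

With the script above, `fetch_status` would be a small function that requests `results_url` and returns the decoded JSON.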


AFLOW-ML Online

Exercises:

• Copy the VASP 5 Heusler structure POSCAR from the previous exercise to the 2_AFLOW-ML_API directory. Modify the aflow_ml_api.py script to print whether the material is a metal or an insulator, and if it is an insulator, to print the band gap.

• Modify the script to run the MFD model for the same structure. What results are returned?

• Use AFLUX or the AFLOW.org advanced search page to find the entry in the Mo-Ti alloy system with the lowest formation enthalpy per atom. Download the relaxed structure, convert it to VASP 5 format, and use the AFLOW-ML API to find the bulk and shear moduli.

O. Isayev et al., Nat. Commun. 8, 15679 (2017); E. Gossett et al., Comput. Mater. Sci. 152, 134 (2018); F. Legrain et al., J. Chem. Inf. Model. 58(12), 2460-2466 (2018); V. Stanev et al., npj Comput. Mater. 4, 29 (2018)