

Informatics Infrastructure for the Materials Genome Initiative

ALDEN DIMA,1,3 SUNIL BHASKARLA,1 CHANDLER BECKER,1

MARY BRADY,1 CARELYN CAMPBELL,1 PHILIPPE DESSAUW,1

ROBERT HANISCH,1 URSULA KATTNER,1 KENNETH KROENLEIN,2

MARCUS NEWROCK,1 ADELE PESKIN,2 RAYMOND PLANTE,1

SHENG-YEN LI,1 PIERRE-FRANÇOIS RIGODIAT,1 GUILLAUME SOUSA AMARAL,1 ZACHARY TRAUTT,1 XAVIER SCHMITT,1 JAMES WARREN,1

and SHARIEF YOUSSEF1

1. National Institute of Standards and Technology, Gaithersburg, MD, USA. 2. National Institute of Standards and Technology, Boulder, CO, USA. 3. e-mail: [email protected]

A materials data infrastructure that enables the sharing and transformation of a wide range of materials data is an essential part of achieving the goals of the Materials Genome Initiative. We describe two high-level requirements of such an infrastructure as well as an emerging open-source implementation consisting of the Materials Data Curation System and the National Institute of Standards and Technology Materials Resource Registry.

INTRODUCTION

New technologies are often limited by currently existing materials because the time to develop and deploy new materials generally exceeds the product design cycle. For example, it takes approximately 2 years to design a new jet engine using available materials, but it may take 10–15 years to design and certify the new materials needed for the engine.1 Integrated computational materials engineering (ICME) approaches have proven successful at decreasing this gap between the materials development cycle and product development cycle,2 but these approaches are not well developed for all classes and applications of materials, and there is a critical need for materials data and modeling tools that further enable these approaches.

To address the need to decrease the time and cost to develop and deploy new materials by 50%, President Obama announced the Materials Genome Initiative (MGI) in 2011.3 The MGI recognizes that advanced materials play a critical role in clean energy, human welfare, and national security. It is a multiagency initiative that focuses on the infrastructure needed to accelerate materials development, particularly in the following areas: (I) Computational Tools, (II) Experimental Tools, (III) Collaborative Networks, and (IV) Digital Data.

A materials data infrastructure that allows the wide range of materials data to be easily shared and transformed is essential to achieving the goals of the MGI, because it facilitates the integration of data into developing ICME approaches and other computational approaches to materials discovery, design, development, and deployment.

As a part of this materials data infrastructure, the National Institute of Standards and Technology (NIST) is establishing essential data exchange protocols and the means to ensure the quality of materials data and models needed to foster widespread adoption of MGI approaches. This informatics infrastructure will play an important role, in particular, in the form of repositories that contain materials simulation and experimental data and metadata, models, and code. These repositories and other infrastructure will provide resources for use in the materials development process as researchers strive to create materials with targeted properties. NIST is particularly working to enable and enhance the exchange of materials resources across repositories, subdomains of the materials community, and industries. NIST is also working to assess and improve the quality of materials data, models, and infrastructure.

Users of these developing data resources come from diverse communities. Many informatics efforts are, by immediate necessity, ad hoc and organic as opposed to being top-down. Each community has its own data, metadata, and tools that are often incompatible. NIST believes that there is a need for new methods to enable the rapid definition of data and metadata, as well as a need for tools to enable rapid discovery and integration of these diverse data.

JOM, Vol. 68, No. 8, 2016. DOI: 10.1007/s11837-016-2000-4. © 2016 The Minerals, Metals & Materials Society (outside the U.S.). Published online July 6, 2016.

HIGH-LEVEL REQUIREMENTS

We believe that, from an informatics perspective, the MGI goals of accelerating materials development and deployment will hinge on two high-level requirements:

(1) Materials researchers require a platform for interoperable exchange of materials data and metadata, which supports an approach of modular community-developed data standards.

(2) Materials researchers need a decentralized infrastructure to enable finding and sharing of materials resources.

To meet the first requirement, researchers must have a system of data templates that can be designed to form custom containers for their experimental and simulation data and its associated metadata. These custom data formats will, however, be made from combinations of standardized components including community-developed templates that describe particular experiments or simulations and low-level reusable data types that encode data values and metadata fields in a standard way. As a result, it is anticipated that many of the issues associated with the current diversity of materials data formats will disappear without requiring researchers to force fit their data into monolithic data formats ill-suited to their needs.
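The idea of building a custom template from standardized components can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the `Quantity` type, the `diffusionExperiment` root element, and the helper names are hypothetical and are not taken from any MDCS template. It assembles an XML Schema document in which a reusable low-level type (a number with a required unit) is shared by several fields of an experiment-specific template.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def xs(tag):
    """Return the fully qualified name of an XML Schema tag."""
    return f"{{{XS}}}{tag}"


def quantity_type():
    """Hypothetical reusable data type: a double with a required 'unit' attribute."""
    ctype = ET.Element(xs("complexType"), {"name": "Quantity"})
    content = ET.SubElement(ctype, xs("simpleContent"))
    ext = ET.SubElement(content, xs("extension"), {"base": "xs:double"})
    ET.SubElement(ext, xs("attribute"),
                  {"name": "unit", "type": "xs:string", "use": "required"})
    return ctype


def compose_template(root_name, fields, reusable_types):
    """Combine reusable types and a list of (name, type) fields into one schema."""
    schema = ET.Element(xs("schema"))
    for type_def in reusable_types:
        schema.append(type_def)
    root = ET.SubElement(schema, xs("element"), {"name": root_name})
    sequence = ET.SubElement(ET.SubElement(root, xs("complexType")), xs("sequence"))
    for name, type_name in fields:
        ET.SubElement(sequence, xs("element"), {"name": name, "type": type_name})
    return schema


# Illustrative template: two fields that reuse the same low-level type.
template = compose_template(
    "diffusionExperiment",
    [("temperature", "Quantity"), ("diffusivity", "Quantity")],
    [quantity_type()],
)
template_text = ET.tostring(template, encoding="unicode")
```

Because both fields point at the same `Quantity` definition, a tool that understands that one type can read either field, which is the interoperability benefit the modular approach is after.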

Despite the success of Web-based search engines, they are in many ways not suited for searching for scientific resources. In this context, we use the term "resources" to include datasets and data collections or repositories, and information about organizations, application programming interfaces (APIs) and other information services, informational websites, and software. Simple text-based searches often return too many irrelevant results that require researchers to filter tediously through pages of output or to spend time devising clever search queries. Meeting the second requirement implies creating an informatics infrastructure that will enable materials researchers to search for materials data using metadata schemas with well-defined meanings. It will also enable them to make their data and other resources available to others using the same decentralized infrastructure.
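The difference between a plain text search and a search over metadata fields with well-defined meanings can be shown with a small in-memory sketch. Everything here is an assumption made for illustration: the `ResourceRecord` fields, the sample records, and the function names are hypothetical and do not reflect the actual NMRR metadata schema.

```python
from dataclasses import dataclass


@dataclass
class ResourceRecord:
    """Hypothetical resource description with typed metadata fields."""
    title: str
    resource_type: str              # e.g. "dataset", "software", "website"
    subjects: frozenset = frozenset()
    description: str = ""


def text_search(records, term):
    """Naive search: match the term anywhere in the title or description."""
    term = term.lower()
    return [r for r in records if term in f"{r.title} {r.description}".lower()]


def field_search(records, resource_type=None, subject=None):
    """Structured search: match on metadata fields with defined meanings."""
    hits = []
    for record in records:
        if resource_type is not None and record.resource_type != resource_type:
            continue
        if subject is not None and subject not in record.subjects:
            continue
        hits.append(record)
    return hits


# Made-up records, for illustration only.
records = [
    ResourceRecord("Nickel diffusion measurements", "dataset",
                   frozenset({"diffusion"}), "Tracer diffusion coefficients"),
    ResourceRecord("Workshop report", "website",
                   frozenset({"policy"}), "Notes on diffusion of innovation"),
    ResourceRecord("Diffusion solver", "software",
                   frozenset({"diffusion"}), "Finite-difference code"),
]

broad = text_search(records, "diffusion")                             # 3 hits
narrow = field_search(records, resource_type="dataset", subject="diffusion")  # 1 hit
```

The text query returns every record that happens to contain the word, including the unrelated workshop report, while the field query returns only the dataset whose subject is diffusion.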

The use of registries in informatics infrastructures is not new. In healthcare, registries support the task of identifying documents related to a patient in systems conforming to the Integrating the Healthcare Enterprise (IHE) Cross-Enterprise Document Sharing (XDS) integration profile.4 Metadata pertaining to a patient document are indexed in a registry that can be queried. In astronomy, the Virtual Observatory5 provides astronomers with a distributed ecosystem for data-based research that includes community-established data protocols, formats, and tools. A key component of the discovery framework is a federation of data resource registries that contain searchable metadata about archives, data collections, and services that are available.6

In addition, various other scientific registries and support tools are being developed.7–9 The Research Data Alliance (RDA) Data Type Registries Working Group has defined a data model for the collection of scientific data and has implemented a prototype data type registry10 to facilitate the understanding of scientific data collected by different research groups.

Also, a variety of materials science-based efforts exist to improve the exchange of materials-based data. The Materials Intelligence system from Granta Design* integrates materials data with a variety of software tools. Boyce et al. worked to develop an integrated system by using HDF5 formats.11 MatSeek12 developed an ontology-focused system to federate search capabilities for materials data. The Materials Commons platform13 is a JavaScript Object Notation (JSON)-based modular system for data curation and provenance documentation. As far as we know, no materials informatics infrastructure currently exists that can easily and flexibly adapt with minimum development effort to the variety of needs described by our high-level requirements.

OVERALL ARCHITECTURE

After considering these previous and current efforts, we have chosen a Web-based approach that uses the Python-based Django framework, as illustrated in Fig. 1. User interaction can occur via a graphical user interface (GUI) or through scripts connected via a representational state transfer (REST) API. For data harvesting applications, we use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to query and retrieve data from known data providers such as repositories and other registries. Data, metadata, and binary large objects (BLOBs), such as images, are handled by a data management layer that ultimately stores the data and metadata contained in the Extensible Markup Language (XML) documents in a MongoDB NoSQL database. BLOBs are stored separately by default with MongoDB’s GridFS, but other repositories such as DSpace can also be used. Our system can act as a data provider for harvesting via other OAI-PMH compliant systems.
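As a rough illustration of the harvesting step, the sketch below builds an OAI-PMH `ListRecords` request URL and extracts record identifiers and the resumption token from a response. The verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespace come from the OAI-PMH 2.0 specification; the base URL and the sample response are made up for the example, and no network call is performed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text.strip())
    identifiers = [
        header.findtext("oai:identifier", namespaces=OAI_NS)
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS)
    ]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return identifiers, (token or None)


# Hypothetical data provider and a made-up, abbreviated response.
request_url = list_records_url("https://registry.example.org/oai")

SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

identifiers, next_token = parse_list_records(SAMPLE_RESPONSE)
```

A harvester would repeat the request with the returned token until the provider sends an empty `resumptionToken`, at which point `next_token` becomes `None`.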

*Certain commercial equipment or software is identified in this article to foster understanding. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the material or equipment identified is necessarily the best available for the purpose.


Fig. 1. Overall architecture of the informatics infrastructure under development.

Fig. 2. How the curation of data from literature fits into the overall architecture.

An important aspect of our architecture is the use of XML to structure data and metadata because this provides standardized methods for encoding, interpretation, and transformation. We expect that user communities will work together to generate shared data and metadata models expressed as XML Schema. Our infrastructure then dynamically renders a GUI based on the schema to allow users to input data conforming to that schema. As MongoDB uses Binary JSON (BSON), a variant of JSON, to represent its data, we have created a translation layer that converts XML documents into the corresponding BSON and then back to XML as needed. The transformability of XML is also used to export retrieved XML documents to other formats. Currently, we allow for conversion to other text-based formats such as comma-separated values (CSV), but in principle any format can be generated, including graphics.
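A translation layer of this kind can be pictured with the following sketch, which maps an XML document to a nested, JSON-compatible dictionary and back again. This is not the MDCS translation layer: the dictionary layout and function names are assumptions, JSON from the standard library stands in for BSON, and the sample document is made up.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Map an element to a JSON-compatible dict; child order is preserved.

    Simplification: text that follows a child element (mixed content) is dropped.
    """
    return {
        "tag": elem.tag,
        "attrib": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [element_to_dict(child) for child in elem],
    }


def dict_to_element(node):
    """Rebuild an element tree from the dict produced by element_to_dict."""
    elem = ET.Element(node["tag"], node["attrib"])
    if node["text"]:
        elem.text = node["text"]
    for child in node["children"]:
        elem.append(dict_to_element(child))
    return elem


# Made-up document, for illustration only.
SOURCE_XML = (
    '<experiment id="e1">'
    '<temperature unit="K">1273</temperature>'
    '<material>Ni</material>'
    '</experiment>'
)

stored = json.dumps(element_to_dict(ET.fromstring(SOURCE_XML)))   # what a document store would hold
restored = ET.tostring(dict_to_element(json.loads(stored)), encoding="unicode")
```

Keeping children in an ordered list rather than keying them by tag name is a deliberate choice here: it lets repeated elements and element order survive the round trip.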

Our architecture has been implemented for Windows, Mac OS X, and Linux and is currently the basis for four systems: the Materials Data Curation System (MDCS), the NIST Materials Resource Registry (NMRR), the MGI Code Catalog (MCC), and the National Metrology Institutes Resource Registry (NMIRR). The first two systems will be discussed in more detail here.

Fig. 3. A data entry form dynamically generated from an XML Schema file.

MATERIALS DATA CURATION SYSTEM

The MDCS was designed to address the first high-level informatics requirement of the MGI that materials researchers need modular data models that capture their data and metadata in community-developed templates using reusable data types. The MDCS source code and installation instructions are available from https://github.com/usnistgov/MDCS.

Scientific data exist in a multitude of formats, and similar data are often encoded in many ways. This diversity makes it difficult to combine data from multiple sources, understand and reuse existing data, find associated metadata, and transform data into new formats to support its reuse. Figure 2 shows how the MDCS fits in our overall architecture when data are curated from literature. By using a community-developed template expressed in XML Schema, a user can interact with a dynamically generated user interface to enter data and load images or other binary data into the MDCS. A similar user interface will allow the user to retrieve data already entered into the MDCS. Data are converted for storage and retrieval from MongoDB by a data management layer. Images and BLOBs are stored separately. An MDCS instance may act as a data provider to a registry; this functionality is available via the OAI-PMH data provider. The exporter functionality is also available to convert the data into other, possibly non-XML, formats.

Fig. 4. Search interface generated from XML Schema.

Fig. 5. Interface of the template composer.

Multiple instances of the MDCS can be connected to support federated searches. Figures 3 and 4 show the types of graphical user interfaces that can be dynamically generated from an XML Schema. The data entry form in Fig. 3 was generated from an XML Schema representing diffusion data, and Fig. 4 shows a search form also generated from the schema. The ability to generate forms dynamically directly from XML Schema saves development effort and increases the flexibility of the MDCS.
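To make the idea of schema-driven forms concrete, here is a minimal sketch that walks an XML Schema and emits one form-field description per typed element. It is an illustration only, not how the MDCS renders its forms: the widget names, the type-to-widget mapping, and the sample schema are assumptions, and the mapping relies on the schema using the `xs:` prefix for built-in types.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Assumed mapping from built-in schema types to generic form widgets.
WIDGETS = {
    "xs:string": "text",
    "xs:double": "number",
    "xs:date": "date",
    "xs:boolean": "checkbox",
}


def form_fields(xsd_text):
    """Return one field description per element that declares a type."""
    root = ET.fromstring(xsd_text.strip())
    fields = []
    for element in root.iter(f"{XS}element"):
        type_name = element.get("type")
        if type_name is None:
            continue  # container element with an inline complexType
        fields.append({
            "label": element.get("name"),
            "widget": WIDGETS.get(type_name, "text"),
            "required": element.get("minOccurs", "1") != "0",
        })
    return fields


# Made-up schema, for illustration only.
SAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="diffusionRecord">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="material" type="xs:string"/>
        <xs:element name="temperature" type="xs:double"/>
        <xs:element name="measuredOn" type="xs:date" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

fields = form_fields(SAMPLE_XSD)
```

Since the field list is derived from the schema at run time, changing the schema changes both the entry form and any search form built the same way, with no hand-written form code to update.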

As XML Schema plays a central role in the MDCS, a concern is that reliance on this technology might prove an obstacle to widespread use of the MDCS by users who are not versed in schema development and use. In recognizing this, we have created a template composer as part of the MDCS that allows users to either start with an existing XML Schema and modify it or use an existing collection of lower level templates to create an entirely new template. Figure 5 shows a screenshot of the MDCS template composer. We plan to leverage public registries in the future to enable an ecosystem centered on the creation and sharing of MDCS templates.

The dynamically generated user interfaces are limited to generating default user interface widgets for a given schema. Certain schema elements, such as the one representing the elements of the Periodic Table, would by default be rendered as a long pull-down list. This is unnecessarily tedious, and the MDCS provides facilities to override the default user interface elements with custom widgets. Figure 6 shows how the default Periodic Table element pull-down list is replaced by a custom Periodic Table. The custom widget was developed by a programmer and is associated with the XML Schema elements in the Admin dashboard by the MDCS system administrator. Subsequent uses of the template now render the Periodic Table in a more familiar format.

Fig. 6. Overriding the default rendering of an element in a schema using a UI module.

Fig. 7. UI modules support backend processing.

The user interface (UI) module system can do more than just override the default rendering of XML tags in the input form. It can also be used to create entire mini applications that can do backend processing to support the overall use of the MDCS for curating particular types of data. Figure 7 shows that the UI module architecture is capable of interacting with the server, remote data sources, and external programs to support data processing and validation. Figure 8 shows the administrative user interface that allows a module to be associated with elements from a schema. This effectively allows the default rendering behavior associated with an element to be replaced by the behavior specified in the module.
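The association between schema elements and custom modules can be thought of as a lookup with a default fallback. The sketch below shows that pattern in isolation; the `ModuleRegistry` class, the renderer functions, and the element names are hypothetical and are not the MDCS module API.

```python
from typing import Callable, Dict

Renderer = Callable[[str], str]


def default_renderer(element_name: str) -> str:
    """Default widget: a plain text input for any element."""
    return f'<input type="text" name="{element_name}">'


def periodic_table_widget(element_name: str) -> str:
    """Hypothetical custom widget standing in for a clickable periodic table."""
    return f'<div class="periodic-table" data-field="{element_name}"></div>'


class ModuleRegistry:
    """Associates schema element names with custom renderers."""

    def __init__(self) -> None:
        self._modules: Dict[str, Renderer] = {}

    def register(self, element_name: str, renderer: Renderer) -> None:
        """Override the default rendering for one schema element."""
        self._modules[element_name] = renderer

    def render(self, element_name: str) -> str:
        """Use the registered module if there is one, else the default widget."""
        renderer = self._modules.get(element_name, default_renderer)
        return renderer(element_name)


registry = ModuleRegistry()
registry.register("chemicalElement", periodic_table_widget)

custom_html = registry.render("chemicalElement")
default_html = registry.render("temperature")
```

Elements without a registered module keep their default widget, so an administrator only has to configure the few elements that benefit from a custom one.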

The MDCS allows for automated curation of data via user scripts written in languages such as Python that interact via the MDCS REST API (Fig. 9). The REST API enables the full functionality of the MDCS to be accessed by using a wide variety of programming languages without using the graphical user interface. Scientific equipment will often generate output data in a text format. It is a relatively simple matter to write code that will convert the text data into an XML document and then submit it to the MDCS for storage. We have used Swagger to expose and document the MDCS REST API to users via a Web browser. This should greatly facilitate its use.
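A script of the kind described above might look like the following sketch, which converts simple `key = value` instrument output into an XML document and prepares an HTTP POST request for it. The endpoint path `/rest/curate`, the host name, and the input format are assumptions made for illustration; the real routes are the ones documented by the Swagger interface of a given MDCS instance. The request is only constructed, not sent.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request


def text_to_xml(text, root_tag="measurement"):
    """Convert 'key = value' lines into child elements of a root element.

    Keys are assumed to be valid XML element names; other lines are skipped.
    """
    root = ET.Element(root_tag)
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def build_submission(base_url, xml_doc):
    """Prepare (but do not send) a POST request carrying the XML document."""
    return Request(
        url=f"{base_url}/rest/curate",        # hypothetical endpoint path
        data=xml_doc.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


# Made-up instrument output, for illustration only.
INSTRUMENT_OUTPUT = """\
# furnace log
temperature = 1273
dwell_time = 3600
"""

xml_doc = text_to_xml(INSTRUMENT_OUTPUT)
request = build_submission("https://mdcs.example.org", xml_doc)
```

Passing `request` to `urllib.request.urlopen` would perform the upload; a real script would also need whatever authentication the target instance requires and should validate the document against its template first.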

One of the great strengths of XML is its ability to be transformed into other formats by using standard tools such as Extensible Stylesheet Language Transformations (XSLT), a programming language that uses XML syntax. The MDCS Exporter allows for the XML documents stored in the MDCS to be transformed into other formats such as CSV by using an XSLT stylesheet associated with the schema. This enables data stored in XML to be converted into tool-specific formats for use as part of scientific workflows.
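The effect of such an export can be sketched in Python. Note the substitution: Python's standard library has no XSLT processor, so this example flattens the XML with `xml.etree.ElementTree` and the `csv` module instead of applying a stylesheet. The element names and values are made up, and the function is an illustration rather than the MDCS Exporter.

```python
import csv
import io
import xml.etree.ElementTree as ET


def records_to_csv(xml_text, record_tag, columns):
    """Write one CSV row per record element, one column per named child."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for record in root.iter(record_tag):
        # Missing children become empty cells.
        writer.writerow([record.findtext(col, default="") for col in columns])
    return buffer.getvalue()


# Made-up dataset, for illustration only.
DATASET_XML = (
    "<dataset>"
    "<point><temperature>1173</temperature><diffusivity>1.2e-15</diffusivity></point>"
    "<point><temperature>1273</temperature><diffusivity>9.8e-15</diffusivity></point>"
    "</dataset>"
)

csv_text = records_to_csv(DATASET_XML, "point", ["temperature", "diffusivity"])
```

An XSLT stylesheet tied to the schema does the same job declaratively, which lets the export rule travel with the template instead of living in application code.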

NIST MATERIALS RESOURCE REGISTRY

The NIST Materials Resource Registry (NMRR) was developed to address the second high-level MGI informatics requirement that materials researchers need to be able to find and share materials resources in a decentralized way. The source code for the NMRR is available from https://github.com/usnistgov/MaterialsResourceRegistry.

Fig. 8. The module manager associates UI modules with elements in documents generated from an XML Schema.

Figure 10 shows how the NMRR fits within our overall architecture. NMRR users can publish metadata describing their resources using community-developed metadata templates rendered by a graphical user interface, and they can also search and discover existing resources. Additionally, resource metadata can be published in an automated fashion by using the REST API or it can be harvested from registered data providers (such as repositories and other registries) using OAI-PMH. An NMRR registry can also serve as a data provider for other OAI-PMH compliant registries. In this fashion, multiple NMRR installations can be interconnected to create a decentralized federation of registries. Figure 11 shows the interface presented to users searching for resources. The resource search and retrieval process begins when a user submits a query to the NMRR search interface. The NMRR then responds with a list of available resources that match the query. The user then selects the link to the appropriate resource and the user’s browser is redirected to that resource.

The NMRR and the MDCS are complementary systems where the MDCS can be used to make materials data accessible and the NMRR can be used to make materials data discoverable. From the perspective of the data consumer, a search on the NMRR returns candidate instances of the MDCS and other repositories. The user can then search an individual repository for candidate datasets.

Fig. 9. The MDCS REST API as exposed by Swagger.


DISCUSSION

The goal of the MDCS is to facilitate the collection, use, and reuse of materials data and to provide the needed informatics infrastructure to facilitate the implementation of ICME approaches. Several collaborators are using the MDCS for their own work. Northwestern University’s NanoMine, an online platform for the prediction of polymer nanocomposites, uses the MDCS to curate nanocomposite processing, structure, and property data reported in literature and then to link it to a variety of modeling tools.14 Raymundo Arroyave’s group at Texas A&M University is using the MDCS to collect data from computational materials science simulations and measurements of differential scanning calorimetry. At NIST, work is being done to curate both literature and experimental thermodynamic data with the MDCS. The NIST Thermodynamic Research Center is expanding ThermoML15,16 to include data on metals and plans to integrate their efforts with the MDCS.

Fig. 10. How the NMRR fits into the overall architecture.

Fig. 11. NMRR search interface.

The MDCS is also being integrated with the Interatomic Potentials Repository (IPR) Project.17 A recent article summarized the expanded scope of the IPR Project as a response to the MGI.18 Prior to the creation of the MDCS, metadata for interatomic potentials were manually curated in semistructured text files. As the project is working to enable selection of interatomic potentials based on material properties and other metadata, the MDCS is being used to curate all supporting data and metadata. Furthermore, rapid property calculation tools are being developed and directly integrated with the MDCS via its API. This combined toolset could also be used to develop new potentials, where local instances of IPR tools and the MDCS address data management issues associated with developing many different iterations or variants of interatomic potentials, as part of the typical development process.

A 2014 whitepaper indicated that high-throughput experiments (HTEs) are uniquely suited to meet many needs within the MGI by generating large volumes of high-quality experimental data suitable for model validation or model input.19 Efforts at NIST are focused on capturing data as it is generated on the synthesis or measurement apparatus and automatically transforming applicable data and metadata into XML formats, which are compliant with the MDCS. This effort is part of a broader effort to exchange samples and data across institutions to advance HTE metrology.

As the use of the MDCS expands, users of this software will be able to register datasets to share using the Materials Resource Registry. Registering a dataset will allow the metadata to be harvested, enabling potential users to find it. Figure 12 illustrates how the potential user might use the Materials Resource Registry to locate data stored in Materials Data Curators and a variety of other data repositories.

Fig. 12. (a) MRR harvests metadata from a variety of data sources including both MDCS and data repositories around the world. (b) User sends data request to MRR. (c) MRR sends the user a variety of potential data sets and enables the use of a REST API to push data to the user.

The open source software infrastructure presented in this work supports both data curation using modular data schema models for data exchange and a decentralized data search platform. The MDCS will enable the materials science community to build and share community-based data models for the curation of specific data types. The Materials Resource Registry will improve the ability to find and share data with the metadata harvestable by other registries. Both the MDCS and the NMRR are designed to work with other data curation and sharing tools to further the aims of the MGI.

ACKNOWLEDGEMENTS

The authors would like to thank Raymundo Arroyave, Cate Brinson, Yannick Congo, Lucas Hale, Ya-Shian Li-Baboud, Greta Lindwall, Chris Muzny, Pierre Savonitto, Daniel Sauceda, and Richard Zhao for their support and contributions.

REFERENCES

1. C. Rae, Mater. Sci. Techn. 25, 479 (2009).
2. W. Xiong and G.B. Olson, MRS Bull. 40, 1035 (2015).
3. National Science and Technology Council, Materials Genome Initiative for Global Competitiveness (Washington: Office of Science and Technology Policy, 2011).
4. Integrating the Healthcare Enterprise (IHE International 2015), http://www.ihe.net/. Accessed 24 March 2016.
5. R.J. Hanisch, G.B. Berriman, T.J.W. Lazio, S. Emery Bunn, J. Evans, T.A. McGlynn, and R. Plante, Astron. Comput. 11, 190 (2015).
6. M. Demleitner, G. Greene, P. Le Sidaner, and R.L. Plante, Astron. Comput. 7–8, 101 (2014).
7. Corda (Corporation of National Research Initiatives, January 2016), https://www.cordra.org/. Accessed 24 March 2016.
8. 2nd Generation of Open Access Infrastructure for Research in Europe, OpenAIRE (Openaire Consortium, February 2016), https://www.openaire.eu/. Accessed 22 March 2016.
9. Research Data Switchboard (Research Data Alliance, March 2015), http://www.rd-switchboard.org/. Accessed 21 March 2016.
10. Data Type Registry (Corporation of National Research Initiatives, August 2014), http://typeregistry.org/registrar/. Accessed 13 July 2015.
11. D.E. Boyce, P.R. Dawson, and M.P. Miller, Metall. Mater. Trans. A 40A, 2301 (2009).
12. K. Cheung, J. Hunter, and J. Drennan, Intell. Syst. 24, 47 (2009).
13. B. Puchala, G. Tarcea, E.A. Marquis, M. Hedstrom, H.V. Jagadish, and J.E. Allison, JOM (2016). doi:10.1007/s11837-016-1998-7.
14. C. Brinson, H.R. Zhao, NanoMine, http://brinson.mech.northwestern.edu/research/Nanomine.html. Accessed Feb 2016.
15. M. Frenkel, R.D. Chirico, V.V. Diky, Q. Dong, S. Frenkel, P.R. Franchois, D.L. Embry, T.L. Teague, K.N. Marsh, and R.C. Wilhoit, J. Chem. Eng. Data 48, 2 (2003).
16. R.D. Chirico, M. Frenkel, V.V. Diky, K.N. Marsh, and R.C. Wilhoit, J. Chem. Eng. Data 48, 1344 (2003).
17. C.A. Becker, F. Tavazza, Z.T. Trautt, and R.A. Buarque de Macedo, Curr. Opin. Solid State Mater. Sci. 17, 277 (2013).
18. Z.T. Trautt, F. Tavazza, and C. Becker, Model. Simul. Mater. Sci. Eng. 23, 074009 (2015).
19. M.L. Green, J.R. Hattrick-Simpers, C.L. Choi, I. Takeuchi, A.M. Joshi, S.C. Barron, T. Chiang, A. Davydov, S. Empedocles, J. Gregoire, and A. Mehta, Fulfilling the Promise of the Materials Genome Initiative with High-Throughput Experimentation (MR Society, 2014), http://www.mrs.org/mgi-workshop-full-report/. Accessed 24 March 2016.
