Data Science Solutions by Materials Scientists: The Early Case Studies

Tony Fast, Materials Data Analyst, Materials Informatics for Engineering Design, Woodruff School of Mechanical Engineering, Georgia Institute of Technology


DESCRIPTION

Improvements in algorithms, technology, and computation are directly impacting the landscape of information use in materials science. The 3 V’s of Big Data (volume, velocity, and variety) are becoming ever more apparent in all sectors of the field. Novel approaches will be required to confront the emerging data deluge and extract the richest knowledge from simulated and empirical information in complex, evolving 3-D spaces. Microstructure Informatics (μInformatics) is an emerging suite of signal processing techniques, advanced statistical tools, and data science methods tailored specifically to this new frontier. μInformatics curates and transforms large collections of materials science information using efficient workflows to extract knowledge of bi-directional structure-property/processing connections for most material classes. This talk explores a few early case studies in which data-driven methods solve materials science problems. First, emerging spatial statistics tools enable an objective comparison of static and evolving 3-D material volumes from molecular dynamics simulation, micro-CT, and scanning electron microscopy. Second, these statistics provide a foundation for improved bottom-up homogenization relationships in fuel cell materials. Lastly, the Materials Knowledge System (MKS), a data-driven meta-model that creates top-down localization relationships, is applied to phase field and finite element model information.


Page 1: Data Science Solutions by Materials Scientists: The Early Case Studies

Data Science Solutions by Materials Scientists

The Early Case Studies

Tony Fast, Materials Data Analyst, Materials Informatics for Engineering Design

Woodruff School of Mechanical Engineering

Georgia Institute of Technology

*Any MINED shield is a link to a resource.

Page 2

An Archival and Self-Describing Data Format using HDF5

Data and Metadata stored in one file, Support in many languages, and Ideal support for high-dimensional data

*MXADataModel – Archival Data Format – ONR/DARPA Dynamic 3-D Digital Structures Program

Page 3

HDF5 - The little zip file that could…

One dataset: 1.6 GB, 4 experiments with 160 datasets each... no long-term value.

Page 4

Volume, Variety, Velocity = Big Data

[Figure: materials science examples of Big Data: polymer (MD), titanium, bamboo, martensitic steel, SiC/SiC, Al-Cu solidification. Credits: Jacobs (GaTech), Frasier (OSU), Wegst (Dartmouth), Gumbusch, Ritchie (LLNL), Voorhees (NW)]

The velocity at which data is generated will rise, and the speed at which it is analyzed will decrease.

Page 5

Rowenhorst, Lewis, Spanos, Acta Mater., 2010

β-Titanium

REDUCED OUTPUT: grain size, grain faces, number of grains, mean curvature, nearest-grain analysis

10 micron resolution with 4300 grains; compare with empirical models

Materials Science is a Big Data domain, but it is not treated that way.
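Reduced outputs such as the number of grains and grain size can be computed from a segmented grain-ID map. The sketch below is a minimal 2-D illustration, not the 3-D analysis of the cited study. It assumes the map is already segmented so that each integer ID labels exactly one connected grain, that the 10 micron resolution is the edge length of a square pixel, and it uses the equivalent-circle diameter as one common definition of grain size. The function name `grain_statistics` is hypothetical.

```python
import math
from collections import Counter

def grain_statistics(grain_ids, pixel_size_um):
    """Count grains and compute equivalent-circle diameters (hypothetical helper).

    grain_ids: 2-D list of integer grain IDs, one per pixel; each ID is assumed
               to label a single connected grain.
    pixel_size_um: assumed edge length of one square pixel, in microns.
    """
    # Number of pixels belonging to each grain ID
    areas_px = Counter(gid for row in grain_ids for gid in row)
    pixel_area = pixel_size_um ** 2
    diameters = {}
    for gid, n_px in areas_px.items():
        area = n_px * pixel_area
        # Diameter of the circle whose area equals the grain area
        diameters[gid] = 2.0 * math.sqrt(area / math.pi)
    mean_d = sum(diameters.values()) / len(diameters)
    return len(diameters), diameters, mean_d

# Toy 4x4 map with three grains
grain_map = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 3, 3],
    [3, 3, 3, 3],
]
n, diam, mean_d = grain_statistics(grain_map, pixel_size_um=10.0)
print(n)  # 3
```

In 3-D the same idea applies with voxel volumes and an equivalent-sphere diameter.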

Page 6

Scalable, objective, parametric materials descriptors

Manage data with care for the future

Interoperability, sharing, and collaboration

Educate data scientists who can extract value from data using statistics, computation, and materials domain knowledge

Embrace complexity in big materials data

Example Databases

AFLOW (Curtarolo Group); Harvard Clean Energy Project Database

Page 7

STRUCTURE INFORMATICS WORKFLOW

[Figure: workflow stages: physics-based models, simulation, and experiment; microstructure (material) signal processing; advanced & objective statistical encoding; data science modules; innovation accounting; with the label "Intelligent Design of Experiments"]

Microstructure Informatics is a scalable, data-driven system that mines structure-property/processing connections from experimental and simulated materials science information, with structure as the independent variable. The system is agnostic to material system and length scale, is objectively quantifiable, and iterates rapidly, in fewer cycles, for both materials improvement and discovery.

Page 8

DATA SCIENCE MODULES

[Figure: linkages among microstructure (material structure), processing, and property]

Data science modules are machine learning and statistical tools to extract rich bi-directional structure-property/processing linkages from encodings of materials & microstructure datasets. Mining modules create structure taxonomies, homogenization and localization relationships, ground truth comparison between simulation and experiment, materials discovery, and materials improvement.
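One way a mining module could build a structure taxonomy is to cluster microstructures by their statistical descriptors. The talk does not say which clustering method is used; the sketch below assumes plain k-means on short descriptor vectors, and the descriptor values are made up for illustration. The function name `kmeans` and the choice of the first k points as initial centroids are assumptions of this sketch.

```python
import math

def kmeans(points, k, n_iter=100):
    """Plain k-means; the first k points serve as initial centroids (assumed)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

# Hypothetical 2-element descriptors, one row per microstructure
descriptors = [(0.10, 0.20), (0.90, 0.80), (0.12, 0.22),
               (0.88, 0.79), (0.11, 0.18), (0.92, 0.83)]
labels, centroids = kmeans(descriptors, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The naive initialization depends on input order; a real taxonomy would use a more robust initialization and descriptors derived from the spatial statistics.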

Page 9

ADVANCED & OBJECTIVE STATISTICAL ENCODING

THE MICROSTRUCTURE IS A SAMPLE IN AN IMMENSE STATISTICAL POPULATION.

α-β Titanium

Page 10

SPATIAL STATISTICS

[Figure: vectors t placed in the microstructure]

Statistical correlations between random points in space/time, separated by a vector t, which reveal systematic patterns in the microstructure. The statistics contain the original microstructure (μS) up to a translation and an inversion. They are an objective encoding for most materials datasets.
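A minimal sketch of the simplest spatial statistic, the periodic two-point autocorrelation of one phase, assuming a 2-D binary field (1 = phase of interest) with periodic boundaries. It is written as a direct loop for clarity; an FFT-based version is the usual choice for real data. The function name is hypothetical. The usage lines check the property stated above: translating or inverting the microstructure leaves the statistics unchanged.

```python
def two_point_autocorrelation(m):
    """Periodic two-point autocorrelation of a 2-D binary field.

    f[ty][tx] is the probability that a random point s and the point
    s + t both lie in the phase of interest (periodic boundaries).
    """
    ny, nx = len(m), len(m[0])
    n = nx * ny
    f = [[0.0] * nx for _ in range(ny)]
    for ty in range(ny):
        for tx in range(nx):
            total = 0
            for y in range(ny):
                for x in range(nx):
                    total += m[y][x] * m[(y + ty) % ny][(x + tx) % nx]
            f[ty][tx] = total / n
    return f

# 4x4 field with a 2x2 block of the phase of interest
m = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
f = two_point_autocorrelation(m)
print(f[0][0])  # 0.25, the volume fraction

# Translated and inverted copies produce identical statistics
shifted = [[m[(y - 1) % 4][(x - 2) % 4] for x in range(4)] for y in range(4)]
inverted = [[m[(-y) % 4][(-x) % 4] for x in range(4)] for y in range(4)]
```

At t = 0 the autocorrelation reduces to the volume fraction, which is why it is a natural starting point for an objective encoding.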

Page 11

CURRENT APPLICATIONS: metals, polymers, fuel cells, ceramic matrix composites (CMC), molecular dynamics (MD), and other systems

TYPES OF SIGNALS: sparse, experimental, simulation, heterogeneous, surface, bulk

The fidelity of the spatial statistics is affected by how the material structure is parameterized as a signal.
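A minimal sketch of one common parameterization, assuming a discrete phase-label image: each local state h gets its own indicator field, equal to 1 where a pixel is in state h and 0 elsewhere. This indicator (one-hot) encoding is an assumption for illustration; continuous local states such as crystal orientations need a different basis. The function name is hypothetical.

```python
def microstructure_function(labels, n_states):
    """Encode a 2-D label image as one indicator field per local state.

    fields[h][y][x] == 1 if pixel (y, x) is in state h, else 0.
    """
    return [
        [[1 if v == h else 0 for v in row] for row in labels]
        for h in range(n_states)
    ]

labels = [[0, 1, 2],
          [2, 1, 0]]
fields = microstructure_function(labels, n_states=3)

# Volume fraction of each state: mean of its indicator field
n_pixels = len(labels) * len(labels[0])
volume_fractions = [sum(map(sum, fh)) / n_pixels for fh in fields]
```

Each pixel belongs to exactly one state, so the indicator fields sum to 1 everywhere; spatial statistics are then computed on these fields rather than on the raw labels.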

Page 13

Mechanical Deformation of Polymer Chains

Molecular Dynamics of Aluminum Atoms

Page 14

MPL (microporous layer), GDL (gas diffusion layer)

X-CT, finite element modeling, statistics, and regression to connect the statistics with diffusivity values from FEM

Bottom-up Homogenization Relationships

[Figure: model vs. simulation, with exact-fit line]
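The regression step connects a statistical descriptor of each sample to the diffusivity computed by FEM. The talk does not state the regression form; the sketch below assumes ordinary least squares on a single scalar descriptor per sample (for example, one reduced-dimension score of the spatial statistics). The numbers are synthetic and are not results from the fuel cell study. The function name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit y ~ a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    pred = [a + b * xi for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot

# Synthetic descriptor values and "FEM" diffusivities lying on a line
descriptor = [0.1, 0.2, 0.3, 0.4, 0.5]
diffusivity = [0.15, 0.25, 0.35, 0.45, 0.55]
a, b, r2 = fit_line(descriptor, diffusivity)
```

Plotting the fitted predictions against the simulated values gives the kind of model vs. simulation comparison shown above, where points on the exact-fit line are perfect predictions.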

Page 15

FEM, ε = 5e-4

Meta-modeling with Materials Knowledge Systems: Top-down localization relationships

The MKS designs filters that capture the effect of the local arrangement of the microstructure on the response. The filters are learned from physics-based models and can only be as accurate as the model, never better.
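A minimal 1-D sketch of how such filters can be learned, under strong simplifying assumptions: a periodic microstructure described by the indicator m of a single phase, and a linear localization p_s = Σ_t α_t m_(s−t). In Fourier space this becomes P_k = A_k M_k, so the filter can be fit by least squares one frequency at a time. The calibration responses come from a hypothetical linear stand-in (`local_model`) with a known filter rather than from an FEM or phase field model, so the recovery is exact here; a real physics-based model would not be reproduced this perfectly. All function names are hypothetical.

```python
import cmath
import random

def dft(x):
    """Naive discrete Fourier transform (kept simple for clarity)."""
    n = len(x)
    return [sum(x[s] * cmath.exp(-2j * cmath.pi * k * s / n) for s in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * s / n) for k in range(n)) / n
            for s in range(n)]

def local_model(alpha, m):
    """Hypothetical stand-in for a physics model: p_s = sum_t alpha_t * m_(s-t)."""
    n = len(m)
    return [sum(alpha[t] * m[(s - t) % n] for t in range(n)) for s in range(n)]

def calibrate_filter(samples, responses):
    """Least-squares influence coefficients, solved independently per frequency."""
    M = [dft(m) for m in samples]
    P = [dft(p) for p in responses]
    A = []
    for k in range(len(samples[0])):
        num = sum(Mi[k].conjugate() * Pi[k] for Mi, Pi in zip(M, P))
        den = sum(abs(Mi[k]) ** 2 for Mi in M)
        A.append(num / den)
    # Back to real space; imaginary parts are round-off for real data
    return [a.real for a in idft(A)]

# Hypothetical "true" filter used by the stand-in model
alpha_true = [0.5, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]
rng = random.Random(0)
# A single-voxel (delta) sample excites every frequency, so den > 0 for all k
samples = [[1, 0, 0, 0, 0, 0, 0, 0]]
samples += [[rng.randint(0, 1) for _ in range(8)] for _ in range(20)]
responses = [local_model(alpha_true, m) for m in samples]

alpha_fit = calibrate_filter(samples, responses)
```

Once calibrated, the filter predicts the response of a new microstructure with a single convolution instead of a new simulation.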

Page 16

Meta-modeling with Materials Knowledge Systems: Top-down localization relationships

[Figure: INPUT → Any Model → OUTPUT, with Control]

Page 17

OTHER APPLICATIONS: spinodal decomposition, grain coarsening, thermo-mechanical, polycrystalline

Top-Down Localization Relationships for High Contrast Composites

The MKS is a scalable, parallel meta-model that learns from physics-based models to enable rapid simulation at some cost in accuracy.

N² vs. N log(N) complexity. It learns top-down localization relationships to extract extreme-value events and enables multiscale integration.
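The N² vs. N log(N) comparison corresponds to evaluating a localization convolution directly versus through the fast Fourier transform. A minimal sketch of that equivalence, assuming a power-of-two signal length and a textbook recursive radix-2 Cooley-Tukey FFT written from scratch; in practice a library FFT (for example NumPy's `numpy.fft`) would be used. The filter and microstructure values are hypothetical.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def ifft(X):
    """Inverse FFT via the conjugation trick: conj(fft(conj(X))) / n."""
    n = len(X)
    return [v.conjugate() / n for v in fft([x.conjugate() for x in X])]

def convolve_direct(a, m):
    """O(N^2): p_s = sum_t a_t * m_(s - t) with periodic wrap-around."""
    n = len(m)
    return [sum(a[t] * m[(s - t) % n] for t in range(n)) for s in range(n)]

def convolve_fft(a, m):
    """O(N log N): multiply the spectra, then transform back."""
    A, M = fft(a), fft(m)
    return [v.real for v in ifft([x * y for x, y in zip(A, M)])]

a = [0.5, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]
m = [0, 1, 1, 0, 1, 0, 0, 1]
print(convolve_direct(a, m))
print(convolve_fft(a, m))  # same values up to floating-point round-off
```

The direct loop does N multiplications for each of N outputs, while the FFT route costs a few transforms of O(N log N) each, which is what makes the approach scale to large 3-D volumes.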

Page 18

[Figure: three linkages: Structure-Processing (MKS, processing history); Structure-Property (homogenization); Structure-Property (localization)]

Objective parametric descriptors and data science enable integration of bi-directional structure-property/processing linkages.

Page 19

Data enables bidirectional S-P/P, multiscale integration, and higher throughput

CORE TECHNOLOGIES TO FUEL THE DATA AGE OF MATERIALS SCIENCE

Open Access, Open Source Software, Scalable Databases, High-Statistical Throughput Simulation and Experiment, Image Segmentation, Machine Learning, Metadata Integration, Mobile Technology, Visualization, High Performance Computing, Cyberinfrastructure/Collaboratories, Collaboration & Sharing

Page 20

Selected Links

Any shield in this presentation is a link

HDF5: http://www.hdfgroup.org/HDF5/whatishdf5.html

HDFView: http://www.hdfgroup.org/hdf-java-html/hdfview/

MXADataModel: http://mxa.web.cmu.edu/Background.html

Curtarolo Group: http://www.mems.duke.edu/faculty/stefano-curtarolo

AFLOW: http://materials.duke.edu/apool.html

Harvard Clean Energy Project: http://www.molecularspace.org/

Serial Sectioned Titanium: https://cosmicweb.mse.iastate.edu/wiki/pages/viewpage.action?pageId=753830

MATIN: http://www.materials.gatech.edu/matin

Materials Genome Initiative: http://www.whitehouse.gov/mgi