32
Karthik Duraisamy May 18 2017, Univ of Michigan ConFlux: An Ecosystem for Data-enabled Computational Physics

ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Karthik Duraisamy

May 18 2017, Univ of Michigan

ConFlux: An Ecosystem for Data-enabled Computational Physics

Page 2: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

“Commercial” applications

Data

Predictive capability

Machine Learning

• No physical law ;• Data is directly useful for model;• Large amounts of relevant data.

Page 3: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Data, UQ, Physics and all that..

SEQUOIA Team

3

Page 4: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Data, UQ, Physics and all that..

SEQUOIA Team

4

Not really !

Page 5: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Data+

Inference +

Physical model+

Machine Learning+

Theoretical insight +

Problem-specific thought process

+Computer science

= Useful solution

Data &Comp. Science

High perf. computing

Physics & Modeling

Page 6: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Why may this not work in physical systems?

Predictive capability

Machine Learning

• We may never have enough data to address the entire range of physical phenomena

• Boundary conditions, physical constraints, etc.

Data

Page 7: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Combining Data and Physical Modeling

Truth

Model

Data-augmented

model

Page 8: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Combining Machine Learning and Physical Modeling

Predictive capability

Physical Model + Machine Learning

• We will never have enough data to address the entire range of physical phenomena Use physics model to pin data and

constrain it

• Data contains real quantities; Model contains “modeled” quantities (loss of consistency)

• Data will be only loosely connected to model (and not objective)

• Data will be noisy and of variable quality

Data

Page 9: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Field Inversion + Machine learning to Augment Physics-based, Consistent Models

Data

Predictive capability

Physical model + Inference

Machine learning

Physical model + consistent augmentation

• We will never have enough data to address the entire range of physical phenomena Use physics model to pin data and

constrain it

• Data contains real quantities; Model contains “modeled” quantities (loss of consistency) Inference connects real quantities to

modeled ones

• Data will be only loosely connected to model (and not objective) Inference connects secondary, non-

objective data to model quantities

• Data will be noisy and of variable quality Probabilistic casting of inference and

learning

Page 10: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Inverse Problem 1

Gd1

Inverse Problem 2

Gd2

Inverse Problem n

Gdn

Machine Learning / Big Data

PDF / Map of β (η)

Dataset 1 Dataset 2 Dataset n Schematic of Framework

Extrapolation capability

(from theory, existing models)

δ1, η1 δ2, η2 δn, ηn

Page 11: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Inverse Problem 1

Gd1

Inverse Problem 2

Gd2

Inverse Problem n

Machine Learning / Big Data

PDF / Map of δ (η)

Predictive model

M(Q,Qt,Qx, δ(η),..) =0

ηQuery

Realization

Pre-processing Prediction (for one realization)

Dataset 1 Dataset 2 Schematic of Framework

Extrapolation capability

(from theory, existing models)

Dataset n

Gdn

δ1, η1 δ2, η2 δn, ηn

δ

Page 12: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

ConFlux: A Novel Platform for Data-driven Computational Physics

Page 13: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Big Data Machine LearningTraining Tier

Spark, Hadoop, GraphLab, BlinkDB, DBMS, ...

4 TB RAM

Spark, Hadoop, GraphLab, etc for scalable machinelearning training aggregation analytics on massive in-memory and out of core datasets.

Training Tier

Page 14: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Machine LearningTesting /HPC Tier

Pascal GPU + NVLink

Receive results from training Tier and perform massively parallel evaluations (e.g. GP)

(CAPI)/NVLink to allow low Latency data transfers between GPUs and CPUs.

Testing Tier

Big Data Machine LearningTraining Tier

Spark, Hadoop, GraphLab, BlinkDB, DBMS, ...

4TB RAM

Page 15: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

IBM Elastic Storage System (GPFS)

HDFS / MPI-IO / Posix

SSD Buffer 1PB HDD

Two subsystems high throughput bulk storage holding I/O of largesimulations. low latency storage for

check-pointing andtransactional model updates during different iterations of the physical modeling.

IBM ESS supporting GPFS high-throughput I/O withboth traditional and HDFS compatible access.

Storage Tier

Big Data Machine LearningTraining Tier

Spark, Hadoop, GraphLab, BlinkDB, DBMS, ...

4TB RAM

Machine LearningTesting /HPC Tier

Pascal GPU + NVLink

Page 16: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

IBM Elastic Storage System (GPFS)

HDFS / MPI-IO / Posix

SSD Buffer 3PB HDD

Big Data Machine LearningTraining Tier

Spark, Hadoop, GraphLab, BlinkDB, DBMS, ...

4TB RAM

Machine LearningTesting /HPC Tier

Pascal GPU + NVLink

Resource Manager & Scheduler

IBM Platform Symphony

Seamless, multi-tenant andlow-latency schedulingacross heterogeneous HPCand Hadoop-based clusters.

Manage both YARN-based components (e.g., Hadoopnodes, Spark nodes, etc.) aswell as HPC nodes

Scheduling Tier

Page 17: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

IBM Elastic Storage System (GPFS)

HDFS / MPI-IO / Posix

Big Data Machine LearningTraining Tier

Machine LearningTesting /HPC Tier

Resource Manager & Scheduler

SSD Buffer 1PB HDD

Spark, Hadoop, GraphLab, BlinkDB, DBMS, ...

4TB RAMPascal GPU + NVLink

Throughput Big Data Interconnect (100Gbps speed)

IBM Platform Symphony

Control Interconnect (1Gbps)

InfiniBand interconnects to enable both low-latency and high-throughput transfers

TCP drivers to ensurecapability for ourHadoop-based components.

Interconnects

Page 18: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

IBM Elastic Storage System (GPFS)

HDFS / MPI-IO / Posix

Big Data Machine LearningTraining Tier

Machine LearningTesting /HPC Tier

Resource Manager & Scheduler

Commodity HPC cluster (Flux, XSEDE, Globus-Transfer) SSD Buffer 1PB HDD

Spark, Hadoop, GraphLab, BlinkDB, DBMS, ...

4TB RAMPascal GPU + NVLink

Throughput Big Data Interconnect (100Gbps speed)

IBM Platform Symphony

WAN 100Gbps Internet 2 Service via

UM Science DMZ

Control Interconnect (1Gbps)

Page 19: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Example: Turbulence Modeling for Aerodynamics

Singh, A., Medida, S. & Duraisamy, K., Data-augmented Predictive Modeling of Turbulent Separated Flows over Airfoils, AIAA Journal, 2017

Models perform poorly in separated flows.

Page 20: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

True prediction

Singh, A., Medida, S. & Duraisamy, K., Data-augmented Predictive Modeling of Turbulent Separated Flows over Airfoils, AIAA Journal, 2017

Page 21: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Prediction – S805

Collaboration with Altair, Inc.

Page 22: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

True prediction !

Inference used only CL data, NN-augmented model provides considerable predictive improvements of Cp

S 809, Re=2 Million

Page 23: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Variability

α=0

α=14 α=20

Training from different sets

S 809, Re=2 Million

Page 24: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Portability : Implementation in AcuSolve

S809 Airfoil : Predictive results in Commercial CFD solver

Page 25: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Vision for the futureA continuously augmented curated database / website of inferred corrections that are input to the machine learning process

Users upload/download/process data, generate maps.

Page 26: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

• Traditionally, models have been provided through:– A complete set of PDEs with modeled closure terms– A printed article to deliver all this content

• In the future, we expect to provide:– A set of PDEs comprising all that is well known about the

behavior of the turbulence– An auxiliary piece of software that can be embedded into a

RANS solver and that contains the machine-learned closure terms (appropriately version controlled)

– A software repository with clear explanations of the datasets that were used to create the “model”

– Possibly multiple versions of the “models” that have been trained with different datasets and are more appropriate for different flow conditions.

Turbulence models of the future

Page 27: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Precipitate Morphology

G. Treichert & K. Garikipati (Mech Engineering and Materials Science)

Material properties can be significantly improved through alloying and its associated precipitate formation.

● The geometry of the precipitate affects the resulting material properties, and it is driven largely by the minimization of the interfacial and strain energies in the precipitate.

● Current methods rely on phase field models, which involve solving time dependentPDEs, using inherently serial time stepping schemes.

Page 28: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Precipitate Morphology

G. Treichert & K. Garikipati (Mech Engineering and Materials Science)

Page 29: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Materials ModelingThe goal is to identify, explain, predict and ultimately to design the properties and responses of these materials.

Hierarchical models have been developed at several scales These methods have thus far provided insight and qualitative connections to

parameters and phenomena from lower scales, but have not been predictive

Quantum Monte Carlo Density Functional Theory Continuum physics

Profs. Vikram Gavini and Krishna Garikipati (Mech Engineering and Materials Science)

Page 30: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Subject-specific blood flow modeling

Biggest challenges lack of physiologic data to

inform the boundary conditions lack of data on mechanical

properties of the vascular model

Obtain data from tomography and MRI

Solve inverse problem for parameters

Massive data size

On-the-fly Lagrangian computation of Motion

Evaluation of arterial stiffness from medicalImages !

Prof. Alberto Figueroa (Biomedical Engineering & Surgery)

Page 31: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Climate system interactionsThe Earth's climate system is composed of multiple interacting components that span spatial scales of 13 orders of magnitude and temporal scales that range frommicroseconds to centuries. key responses and feedbacks in the system are not well characterized

Understanding how clouds interact with thelarger scale circulation, thermodynamic state, and radiative balance is one of the most challenging problems

We use statistical inversion and machine learning to explore the interaction between changes in the Earths climate system and the radiative fluxes, circulation, and precipitation generated by large scale organized cloud systems.

Prof. Derek Posselt (Atmospheric Oceanic & Space Sciences)

Page 32: ConFlux: An Ecosystem for Data-enabled Computational Physics · Spark, Hadoop, GraphLab, BlinkDB, DBMS, ... 4TB RAM Pascal GPU + NVLink Throughput Big Data Interconnect (100Gbps speed)

Data+

Inference +

Physical model+

Machine Learning+

Theoretical insight +

Problem-specific thought process

+Computer science

= Useful solution

Data &Comp. Science

High perf. computing

Physics & Modeling