
1

CODAR: Center for Online Data Analysis and Reduction, a "potential" ECP Codesign Center

Scott A. Klasky, ORNL
October 27, 2016, Xi'an, China

(Diagram: CODAR at the intersection of data services, exascale platforms, and applications)

2

The CODAR team

PI: Ian Foster (a)

Co-Is: Scott Klasky (o), Kerstin Kleese-Van Dam (b), Todd Munson (a)

Participants: Mark Ainsworth (d), Franck Cappello (a), Barbara Chapman (b,s), Jong Choi (o), Emil Constantinescu (a), Hanqi Guo (a), Tahsin Kurc (s), Qing Liu (o), Jeremy Logan (o), Klaus Mueller (s), George Ostrouchov (o), Manish Parashar (r), Tom Peterka (a), Norbert Podhorszki (o), Dave Pugmire (o), Rangan Sukumar (o), Stefan Wild (a), Matthew Wolf (o), Justin Wozniak (a), Wei Xu (b), Shinjae Yoo (b)

a: Argonne National Laboratory; b: Brookhaven National Laboratory; o: Oak Ridge National Laboratory; d: Brown University; r: Rutgers University; s: Stony Brook University

3

Survey of Application Motifs

(Table: applications vs. computational motifs; the per-application check marks did not survive this transcript.)

Motif columns: Monte Carlo; Particles; Sparse Linear Algebra; Dense Linear Algebra; Spectral Methods; Unstructured Grid; Structured Grid; Combinatorial Logic; Graph Traversal; Dynamic Programming; Backtrack & Branch and Bound; Graphical Models; Finite State Machine

Applications: Cosmology; Subsurface; Materials (QMC); Additive Manufacturing; Chemistry for Catalysts & Plants; Climate Science; Precision Medicine; Machine Learning; QCD for Standard Model Validation; Accelerator Physics; Nuclear Binding and Heavy Elements; MD for Materials Discovery & Design; Magnetically Confined Fusion

4

Survey of Application Motifs

(Table: same motif columns as the previous slide; check marks not recoverable.)

Applications: Combustion S&T; Free Electron Laser Data Analytics; Microbiome Analysis; Catalyst Design; Wind Plant Flow Physics; SMR Core Physics; Next-Gen Engine Design; Urban Systems; Seismic Hazard Assessment; Systems Biology; Biological Neutron Science; Power Grid Dynamics

5

Survey of Application Motifs

(Table: same motif columns as the previous slides; check marks not recoverable.)

Applications: Stellar Explosions; Excited State Material Properties; Light Sources; Materials for Energy Conversion/Storage; Hypersonic Vehicle Design; Multiphase Energy Conversion Devices

6

Mapping of Applications to Co-Design Centers

Application | Motifs | Relevant Co-Design Centers

• Computing the Sky at Extreme Scales | Particles, Sparse LA, Spectral, Structured Grid, Graph Traversal | CODAR (ANL); AMR (LBNL); Particles (LANL); Graph (PNNL)
• Exascale Deep Learning and Simulation Enabled Precision Medicine for Cancer | Sparse LA, Dense LA, Combinatorial Logic, Graph Traversal, Dynamic Programming, Backtrack and Branch-and-Bound, Graphical Models, Finite State Machine | Data Analytics (SLAC); CODAR (ANL); GraphEx (PNNL)
• Exascale Lattice Gauge Theory Opportunities and Requirements for Nuclear and High Energy Physics | Monte Carlo, Sparse LA, Dense LA, Spectral, Structured Grid | CODAR (ANL); AMR (LBNL); ExaMC (LANL)
• Molecular Dynamics at the Exascale: Spanning the Accuracy, Length and Time Scales for Critical Problems in Materials Science | Particles, Sparse LA, Dense LA, Spectral, Structured Grid | CODAR (ANL); Particles (LANL); GraphEx (PNNL); QUASIX (ORNL)
• Exascale Modeling of Advanced Particle Accelerators | Particles, Sparse LA, Spectral, Structured Grid | AMR (LBNL); Particles (LANL); Data Science (FNAL)
• An Exascale Subsurface Simulator of Coupled Flow, Transport, Reactions and Mechanics | Sparse LA, Structured Grid, Unstructured Grid | AMR (LBNL); CEED (LLNL); CHIME (LLNL); PUMA (SNL)
• Exascale Predictive Wind Plant Flow Physics Modeling | Sparse LA, Dense LA, Unstructured Grid | AMR (LBNL); CHIME (LLNL); PUMA (SNL)
• QMCPACK: A Framework for Predictive and Systematically Improvable Quantum-Mechanics Based Simulations of Materials | Monte Carlo, Particles, Sparse LA, Dense LA, Spectral, Dynamic Programming | Particles (LANL); CHIME (LLNL); ExaMC (LANL); QUASIX (ORNL); CODAR (ANL)
• Coupled Monte Carlo Neutronics and Fluid Flow Simulation of Small Modular Reactors | Monte Carlo, Particles, Sparse LA, Dense LA, Spectral, Structured Grid, Unstructured Grid | Particles (LANL); CHIME (LLNL); ExaMC (LANL); CODAR (ANL)
• Transforming Additive Manufacturing through Exascale Simulation (TrAMEx) | Particles, Sparse LA, Dense LA, Spectral, Unstructured Grid | AMR (LBNL); Particles (LANL); CEED (LLNL); PUMA (SNL); CODAR (ANL)
• NWChemEx: Tackling Chemical, Materials and Biomolecular Challenges in the Exascale Era | Monte Carlo, Sparse LA, Dense LA, Spectral, Structured Grid, Graph Traversal | QUASIX (ORNL); ExaMC (LANL); GraphEx (PNNL); AMR (LBNL)
• High-Fidelity Whole Device Modeling of Magnetically Confined Fusion Plasma | Particles, Sparse LA, Spectral, Structured Grid, Unstructured Grid | CODAR (ANL); Particles (LANL); AMR (LBNL); CEED (LLNL); PUMA (SNL)

7

Mapping of Applications to Co-Design Centers

Application | Motifs | Relevant Co-Design Centers

• Data Analytics at the Exascale for Free Electron Lasers | Monte Carlo, Particles, Sparse LA, Dense LA, Spectral, Structured Grid, Dynamic Programming, Backtrack and Branch-and-Bound, Graphical Models | CODAR (ANL); Particles (LANL); AMR (LBNL); GraphEx (PNNL)
• Transforming Combustion Science and Technology with Exascale Simulations | Particles, Sparse LA, Dense LA, Structured Grid | CODAR (ANL); AMR (LBNL); Particles (LANL)
• Cloud-Resolving Climate Modeling of the Earth's Water Cycle | Particles, Sparse LA, Structured Grid, Unstructured Grid | CODAR (ANL); CEED (LLNL); CHIME (LLNL); PUMA (SNL); Data Analytics (SLAC); GraphEx (PNNL)
• Enabling GAMESS for Exascale Computing in Chemistry & Materials [seed] | Monte Carlo, Particles, Sparse LA, Dense LA, Spectral | ExaMC (LANL); QUASIX (ORNL); Particles (LANL)
• Multiscale Coupled Urban Systems [seed] | Sparse LA, Spectral, Structured Grid, Unstructured Grid | CEED (LLNL); AMR (LBNL); PUMA (SNL)
• Exascale Models of Stellar Explosions: Quintessential Multi-Physics Simulation [seed] | Monte Carlo, Particles, Sparse LA, Dense LA, Structured Grid | AMR (LBNL); Particles (LANL); ExaMC (LANL)
• Exascale Solutions for Microbiome Analysis [seed] | Sparse LA, Combinatorial Logic, Graph Traversal, Dynamic Programming, Graphical Models | GraphEx (PNNL)
• High Performance, Multidisciplinary Simulations for Regional Scale Seismic Hazard and Risk Assessments [seed] | Sparse LA, Spectral, Structured Grid, Unstructured Grid, Dynamic Programming | AMR (LBNL); CEED (LLNL); PUMA (SNL)
• Performance Prediction of Multiphase Energy Conversion Devices with Discrete Element, Particle-in-Cell, and Two-Fluid Models (MFIX-Exa) [seed] | Particles, Sparse LA, Structured Grid, Unstructured Grid | AMR (LBNL); CEED (LLNL); PUMA (SNL); Particles (LANL)
• Optimizing Stochastic Grid Dynamics at Exascale [seed] | Monte Carlo, Sparse LA, Dense LA, Graph Traversal, Dynamic Programming, Backtrack and Branch-and-Bound | ExaMC (LANL); GraphEx (PNNL)

8

Computation: Fusion

• Develop a high-fidelity Whole Device Model (WDM) of magnetically confined fusion plasmas to predict the performance of ITER
• Couple existing, well-established extreme-scale gyrokinetic codes:
  • GENE continuum code for the plasma core
  • XGC (PIC) code for the plasma edge
• Data challenges:
  • Couple the codes (XGC to GENE) using a service-oriented architecture (SOA)
  • Large volumes of data to place in a knowledge repository
• Math challenges: stability and accuracy
• Physics components: PMI, dust, RF, neutral particles, Ohmic power supply, poloidal field, magnetic equilibrium, RMP coils
• Fusion reaction: α-particles, neutrons

9

Filesystem/network bandwidth falls behind CPU/memory: Fewer bytes/operation

Swap I/O for CPU cycles:

• Data (de)compression

• Online data analysis

Right bytes in right place at right time!

Applications are already demanding 100 PB of data for "medium-term" (12-month) retention.

The compute-data gap is a major challenge for exascale

10

The co-design concept and exascale computing

Exascale co-design: Evaluate, deploy, and integrate exascale hardware-savvy software designs and technologies for key crosscutting algorithmic motifs into applications

11

The need for online data analysis and reduction

Traditional approach: Simulate, output, analyze

Write simulation output to secondary storage; read back for analysis

Decimate in time when the simulation's data-generation rate exceeds the computer's output rate

New approach: Online data analysis and reduction

Co-optimize simulation, analysis, reduction for performance and information output

Substitute CPU cycles for I/O, via data (de)compression and/or online data analysis

Right bytes in right place at right time
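To make the contrast concrete, here is a minimal Python sketch of the two approaches; `simulate_step` is a hypothetical stand-in for a real solver, and the sizes are toy-scale:

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

def simulate_step(t):
    # Hypothetical stand-in for one timestep of a real simulation.
    return rng.standard_normal((256, 256))

def traditional(steps=100, write_every=10):
    # Simulate, output, analyze: decimate in time, keeping only every
    # Nth full-resolution snapshot for later offline analysis.
    return [simulate_step(t) for t in range(steps) if t % write_every == 0]

def online(steps=100):
    # Online analysis and reduction: touch every step, but store only a
    # small summary plus a compressed representation.
    out = []
    for t in range(steps):
        f = simulate_step(t)
        summary = (float(f.mean()), float(f.std()), float(f.max()))
        reduced = zlib.compress(f.astype(np.float32).tobytes())
        out.append((summary, reduced))
    return out
```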

12

CODAR codesign questions

• What are the best data analysis and reduction algorithms for different application classes, in terms of speed, accuracy, and resource requirements? How can we implement those algorithms to achieve scalability and performance portability?

• What are the tradeoffs in data analysis accuracy, resource needs, and overall application performance between using various data reduction methods to reduce file size prior to offline data reconstruction and analysis vs. performing more online data analysis? How do these tradeoffs vary with exascale hardware and software choices?

• How do we effectively orchestrate online data analysis and reduction to reduce associated overheads? How can exascale hardware and software help with orchestration?

13

Mission

• To create the infrastructure for adding data services for exascale applications:
  • Analysis
  • Reduction
• Interface to ECP applications and ECP software for "data-related" activities

14

Start with the "challenge problems" for CODAR

• Challenges in data reduction:
  • Understanding the science requires massive data reduction
  • How do we reduce:
    • The time spent reducing the data to knowledge?
    • The amount of data being moved on the exascale platform?
    • The amount of data being read from the storage system?
    • The amount of data stored in memory, on the storage system, and moved over the WAN?
  • ...without removing the knowledge
    • Requires our team to take deep dives into the applications' post-processing routines and simulations
• Goal is to create infrastructure, reduction routines, and analysis routines:
  • General: e.g., can reduce N bytes to M bytes, with N >> M
  • Motif-specific: e.g., better for a finite-difference mesh vs. particles vs. finite elements
  • Application-specific: e.g., reduced physics allows us to understand the deltas

15

Challenge 1: Laser-Ion Acceleration
High-Frequency, Time-Averaged, Derived Force & Particle Energy Gain

• Driver: a short high-power laser with period T0
• Separation of time scales: instantaneous, linear force vs. effective, non-linear force over T0 = 300 steps
• Each time step: 500 MB/node × 10k nodes → 5 TB of instantaneous field data
• Store 5 TB × 300 steps = 1.5 PB (150 GB/node)?
• → Derive the force online instead (100 ms between updates)
• → Precision: allow energy evolution of test particles (stream lines)

16

Challenge 1: Laser-Ion Acceleration (continued)
High-Frequency, Time-Averaged, Derived Force & Particle Energy Gain

• Each time step: 500 MB/node × 10k nodes → 5 TB of instantaneous field data
• (Figure: Ex, Ey, Ez, |E| at steps t, t+1, t+2, ..., t+T0 are averaged element-wise; the accumulator is still 5 TB, not 300 × 5 TB; sketched below)
• From <Exyz> and <|E|>: subtract, apply a stencil → force; stream lines → acceleration
• The Lorentz force includes magnetic fields (+5 TB), but they might be weak: only store them if necessary
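A minimal numpy sketch of the element-wise time averaging described above; `field_at_step` is a hypothetical stand-in for the PIC code's field output, and the array sizes are shrunk for illustration:

```python
import numpy as np

T0 = 300                    # laser period, in steps
shape = (64, 64, 64, 4)     # stand-in for Ex, Ey, Ez, |E| on the local grid

rng = np.random.default_rng(1)

def field_at_step(t):
    # Hypothetical stand-in for the instantaneous field data at step t.
    return rng.standard_normal(shape, dtype=np.float32)

# Element-wise running average: memory stays at one field's worth
# (5 TB at full scale) instead of T0 fields (1.5 PB).
acc = np.zeros(shape, dtype=np.float64)
for t in range(T0):
    acc += field_at_step(t)
avg = (acc / T0).astype(np.float32)   # <E_xyz> and <|E|> over one period
```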

17

Challenge 2: Laser-Plasma Accelerators
Inverse Problem of Particle Origin & Trajectories

• Particle injection and acceleration are intrinsically coupled in time
• Example: laser electron acceleration; the same applies to laser ion acceleration
• Randomly distributed test particles: some will end up in the selected region
• An ensemble of ~10^9 particles in a small phase-space volume at t_end is selected from ~10^12 particles; these are the microscopic observables of the process
• Can we determine the region(s) of origin and the trajectories more efficiently than "mark IDs & rerun"?

18

High-Fidelity Whole Device Model of Magnetically Confined Fusion Plasma

• Progress in magnetic fusion energy relies on understanding the complex processes of plasma confinement. High-fidelity simulation across the entire fusion plasma requires exascale resources. Novel coupling of scalable core and edge plasma codes will be the first critical step toward such a model.

• XGC1: particles and a finite element mesh
  • In general the PDF is a Maxwellian
  • Macro-particles can represent the particles, similar to "the cells"
  • Allows us to reduce in phase space by saving only the large (e) and keeping the majority of the data in "macro" particles
• Our team is working with C. S. Chang, M. Churchill, F. Jenko; contacts: E. Suchyta, J. Choi, G. Liu
• GENE: finite-difference mesh, large output (Eric)
• GEM: S. Parker (Colorado) wants to add physics-based reduction methods

(Figure: histogram of particle count vs. energy E)

19

Prototypical CODAR data analysis and reduction pipeline

(Diagram: a running simulation writes through the CODAR data API into the CODAR runtime, which applies
• CODAR data analysis: multivariate statistics, feature analysis, outlier detection
• CODAR data reduction: application-aware transforms, encodings
• CODAR data monitoring: error calculation, refinement hints
and sends reduced output plus reconstruction info to the I/O system; offline data analysis reads it back through the CODAR data API. Simulation knowledge (application, models, numerics, performance optimization, ...) informs all stages.)
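The pipeline can be pictured as a chain of per-timestep stages. A minimal Python sketch, with hypothetical stage functions standing in for the CODAR services named above:

```python
import zlib
import numpy as np

rng = np.random.default_rng(2)

def analyze(data):
    # Data analysis stage: cheap online statistics (illustrative only).
    return {"mean": float(data.mean()), "min": float(data.min()),
            "max": float(data.max()), "outliers": int((np.abs(data) > 4).sum())}

def reduce(data):
    # Data reduction stage: transform + encode (lossless here).
    return zlib.compress(data.astype(np.float32).tobytes())

def monitor(data, reduced):
    # Data monitoring stage: error calculation on the round trip.
    back = np.frombuffer(zlib.decompress(reduced), dtype=np.float32)
    return float(np.max(np.abs(data.astype(np.float32).ravel() - back)))

for t in range(5):                     # stand-in for the running simulation
    field = rng.standard_normal((128, 128))
    stats = analyze(field)
    reduced = reduce(field)
    err = monitor(field, reduced)      # 0.0 for a lossless encoding
    # A real runtime would hand (stats, reduced, err) to the I/O system here.
```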

20

Runtime

(Diagram of the CODAR runtime control layer:)
• Orchestration: in situ slowdown vs. reduction; storage vs. accuracy; policies and hints
• Analysis, reduction, and monitoring catalog
• Delivery: service providers (ADIOS staging, Decaf, Swift); I/O interfaces; data (re)organization
• Awareness: scheduling and placement; platform awareness (memory, bandwidth, utilization); application awareness (triggers, end-to-end progress)

21

CODAR building blocks

• ADIOS

• Dataspaces

• DECAF

• SZ/EZ

• Swift

• ProvEn

22

XSSA: eXtreme Scale Service Architecture

• Philosophy based on service-oriented architecture:
  • System management
  • Changing requirements
  • Evolving target platforms
  • Diverse, distributed teams
• Applications built by assembling services:
  • Universal view of functionality
  • Well-defined APIs
  • Implementations can be easily modified and assembled
• Manage complexity while maintaining performance and scalability:
  • Scientific problems and codes
  • Underlying disruptive infrastructure
  • Coordination across codes and research teams
  • End-to-end workflows

23

ADIOS roadmap to Exascale

2017

• Create test harness

• Create a clearer, more modular layering of application interfaces, data abstractions, and runtime components

• Burst buffer support

• New methods for CORAL optimizations

2018

• Code coupling support with hybrid staging

• Living workflow

• Support for new programming models

• WAN staging

2019

• EOD integration for validation workflows

• Ensemble workflow optimizations

• Data Model support for software ecosystem

24

Transformation layer

• Designed for data conversions, compression, and transformations (zlib, bzip2, szip, ISOBAR, ALACRITY, FastBit, ...) that can transform local data on each processor
• Transparent to users: user code reads/writes the original, untransformed data
• Applications:
  • Compressed output
  • Automatically indexed data
  • Local data reorganization
  • Data reduction
• Released in ADIOS 1.6 in 2013 with compression transformations

(Diagram: the user application writes Variable A and Variable B through ADIOS; the data transform layer invokes a transform plugin on write and read, passing regular and transformed variables to the I/O transport layer, which targets a BP file, staging area, etc.)
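The actual ADIOS transform plugin API is in C; this Python sketch only illustrates the transparency contract the slide describes: the writer transforms behind the interface and the reader inverts it, so the application only ever sees the original array.

```python
import zlib
import numpy as np

class CompressTransform:
    """Illustrative transform plugin: zlib on write, inverse on read."""
    def write(self, arr: np.ndarray) -> bytes:
        self.dtype, self.shape = arr.dtype, arr.shape
        return zlib.compress(arr.tobytes())          # transformed variable
    def read(self, blob: bytes) -> np.ndarray:
        flat = np.frombuffer(zlib.decompress(blob), dtype=self.dtype)
        return flat.reshape(self.shape)              # original variable

# User code sees only the untransformed data:
t = CompressTransform()
a = np.arange(1_000_000, dtype=np.float64)
stored = t.write(a)                                  # what lands in the BP file
assert np.array_equal(t.read(stored), a)             # lossless round trip
```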

25

Staging

• Use compute and deep-memory hierarchies to optimize the overall workflow for power vs. performance tradeoffs

• Abstract complex/deep memory hierarchy access

• Placement of analysis and visualization tasks in a complex system

• Impact of network data movement compared to memory movement

• Abstraction allows staging

• On-same core

• On different cores

• On different nodes

• On different machines

• Through the storage system

26

Reduction comes with challenges

• Handling high entropy
• Performance: no benefit otherwise
• Not only errors in the variable itself, $E \equiv f - \tilde{f}$; we must also consider the impact on derived quantities:

$$E \equiv g_l^t\big(f(\vec{x}, t)\big) - \widetilde{g}_l^t\big(\tilde{f}_l^t(\vec{x}, t)\big)$$

(Figure: "Where did it go?" on 400× reduction techniques)
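A small numpy illustration of the point above: a reducer can keep the pointwise error on f tiny while a derived quantity g amplifies it. The derived operator here (a finite-difference gradient) is a hypothetical choice:

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 10_000)
f = np.sin(x)

# Stand-in lossy reduction: quantize f to a fixed absolute error bound.
eps = 1e-3
f_tilde = np.round(f / eps) * eps

g = np.gradient(f, x)                 # derived quantity g(f)
g_tilde = np.gradient(f_tilde, x)     # same operator on the reduced data

print(np.max(np.abs(f - f_tilde)))    # ~eps/2: bounded as promised
print(np.max(np.abs(g - g_tilde)))    # far larger: differentiation amplifies error
```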

27

Several HPC floating-point compression algorithms have emerged

• Current interest is in lossy algorithms; some use preprocessing
• Lossless usually achieves up to ~3× reduction (measured below)

Compress each variable separately:
• ISABELA
• SZ (coming to ADIOS)
• ZFP (in ADIOS 1.10)
• Linear auditing
• SVD
• Adaptive gradient methods

Several variables simultaneously:
• PCA
• Tensor decomposition
• ...
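The "~3× lossless" figure is easy to sanity-check on your own data; here is a quick measurement with zlib. The synthetic field is smooth, so treat the exact ratio as illustrative rather than representative:

```python
import zlib
import numpy as np

x = np.linspace(0, 100, 2_000_000)
data = np.sin(x).astype(np.float64)     # smooth, simulation-like field

raw = data.tobytes()
packed = zlib.compress(raw, level=9)    # lossless
print(f"lossless ratio: {len(raw) / len(packed):.2f}x")
```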

28

Lossy compression with Argonne SZ

• No existing compressor can reduce hard-to-compress datasets by more than a factor of 2
• Objective 1: reduce hard-to-compress datasets by one order of magnitude
• Objective 2: add user-required error controls (error bound, shape of the error distribution, spectral behavior of the error function, etc.)

(Figures: NCAR atmosphere simulation output (1.5 TB); WRF hurricane simulation output; Advanced Photon Source mouse brain data; a bitmap of 128 floating-point numbers showing that what we need to compress looks like random noise)
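SZ's core contract is a user-set pointwise error bound. A minimal sketch of that idea follows: uniform scalar quantization plus a lossless back end. This is far simpler than SZ's predictor-based scheme; it only illustrates the error-bound guarantee:

```python
import zlib
import numpy as np

def compress_abs_bound(data: np.ndarray, abs_err: float) -> bytes:
    # Quantize so every point lands within abs_err of the original...
    q = np.round(data / (2 * abs_err)).astype(np.int64)
    # ...then let a lossless coder exploit the resulting redundancy.
    return zlib.compress(q.tobytes())

def decompress_abs_bound(blob: bytes, abs_err: float) -> np.ndarray:
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int64)
    return q * (2 * abs_err)

data = np.sin(np.linspace(0, 100, 1_000_000))
blob = compress_abs_bound(data, abs_err=1e-4)
back = decompress_abs_bound(blob, abs_err=1e-4)
assert np.max(np.abs(data - back)) <= 1e-4   # the guarantee SZ-style codecs make
print(f"ratio: {data.nbytes / len(blob):.1f}x")
```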

29

Next-generation version of ZFP

• ZFP enables progressive segmentation of data into low-, medium-, and high-precision "buckets"
• Works in blocks of 4 elements in each dimension
• A family of spatially decorrelating transforms (e.g., the DCT) can be parameterized; ZFP chooses the member that can be implemented in a highly optimized fashion (the parameterization shown on the slide did not survive this transcript; see the cited paper)
• Assume smoothness: zig-zag ordering "sorts" the coefficients
• Arrange by bit plane; how many bits or bit planes to keep in each of N output streams, and where to write each, is an adjustable parameter
• Almost correct summary: a fast, piece-wise FFT with tunable precision for the Fourier coefficients, splittable into N pieces on disk
• l2-norm bounded in each 4×4 block (2D data)

P. Lindstrom, "Fixed-Rate Compressed Floating-Point Arrays," IEEE Transactions on Visualization and Computer Graphics, vol. 20, no. 12, pp. 2674-2683, 2014.
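To make the decorrelate-then-drop-bit-planes idea concrete, here is a toy numpy sketch. It uses a 4-point DCT-II as a stand-in for ZFP's (different, cheaper) transform, and a block-wide quantization step as a stand-in for discarding low-order bit planes; purely illustrative, not the ZFP algorithm:

```python
import numpy as np

# 4-point orthonormal DCT-II matrix (stand-in decorrelating transform).
k = np.arange(4.0)
D = np.sqrt(2 / 4) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / 8)
D[0] /= np.sqrt(2)                      # normalize the DC row

def compress_block(block, keep_bits):
    c = D @ block                       # decorrelate: energy moves to low freqs
    step = np.max(np.abs(c)) / (1 << keep_bits)   # block-wide quantum
    return np.round(c / step) * step if step > 0 else c

def decompress_block(c):
    return D.T @ c                      # orthonormal, so transpose inverts

x = np.sin(np.linspace(0, 1, 4) * 3) + 2.0   # one smooth 4-element block
for bits in (4, 8, 12):
    y = decompress_block(compress_block(x, bits))
    print(bits, np.max(np.abs(x - y)))  # error shrinks as more planes are kept
```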

30

ISABELA: Sort and Spline Compression

• Preconditioning: sort the data (in chunks); sorting smooths the data
• Then fit cubic splines to an l∞-norm error tolerance; fewer fit parameters are needed because of the preconditioning
• Losslessly compress the differentials to ensure point-wise error bounds, not just "chunk-averaged" ones
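A toy scipy sketch of the sort-then-spline preconditioning (not ISABELA itself; the knot count and chunk size here are arbitrary choices for illustration):

```python
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(3)
chunk = rng.standard_normal(1024)       # one chunk of noisy field data

order = np.argsort(chunk)               # preconditioning permutation (must be kept)
sorted_vals = chunk[order]              # monotone, hence very smooth

knots = np.linspace(0, 1023, 32)        # few knots suffice after sorting
spline = CubicSpline(knots, np.interp(knots, np.arange(1024), sorted_vals))
approx = spline(np.arange(1024))

resid = sorted_vals - approx            # differentials: encode these losslessly
print(np.max(np.abs(resid)))            # point-wise, not chunk-averaged, bound
```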

31

CODAR: Machine Learning Potentials

• NWChemEx: molecular dynamics (MD)
  • MD simulations are essential for modeling large molecular systems with appropriate models for the complex, realistic environments affecting the chemical processes of interest
  • Rare-trajectory event identification (anomaly detection):
    • NWChem uses methods that bias the calculations to make rare events more likely
    • It would be good if, among a set of trajectories, we could identify the ones where something interesting happens, give more compute resources to those paths, and downgrade the others (see the sketch after this list)
• Climate: causation of interactions (e.g., aerosol-cloud interaction)
• Cancer: potential contributions from applying and/or developing DL methods
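One plausible realization of the rare-trajectory idea, using scikit-learn's IsolationForest on per-trajectory summary features; the features, sizes, and contamination rate here are hypothetical:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(4)

# Hypothetical per-trajectory features, e.g., (energy drift, max displacement).
normal = rng.normal(0.0, 1.0, size=(500, 2))
rare = rng.normal(5.0, 1.0, size=(5, 2))        # "interesting" outlier paths
features = np.vstack([normal, rare])

clf = IsolationForest(contamination=0.01, random_state=0).fit(features)
flags = clf.predict(features)                   # -1 marks anomalous trajectories
interesting = np.flatnonzero(flags == -1)       # candidates for more resources
print(interesting)
```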

32

Provenance: data reduction for performance data

• What is the “best” data to save?

33

Performance Optimization Strategies for Data-Intensive Applications

• Data reduction and analysis algorithms and workflows need to run as efficiently on exascale systems as the numerical applications they serve, so as not to create bottlenecks for the computational studies.

• These applications need to adapt to the new hardware, including deeper memory hierarchies and different architectural swim lanes. Our team will explore runtime system enhancements, plus programming models and compiler directives that specifically benefit data-intensive applications as used by ExaFEL (SLAC) and Cancer (ANL), as well as numerical applications with extreme volumes of data such as Fusion, Climate, and Cosmology.

• We will provide tailored, improved memory management (supported through OpenMP), communication and resource management strategies, and execution performance improvements and performance portability through the use of tailored compiler directives and strategies.

34

CODAR summary
PI: Ian Foster (ANL); Institutions: ANL, BNL, ORNL, Brown, Rutgers, Stony Brook

Applications Targeted

• Climate (ACME), Materials Science, Magnetic Fusion Energy, Chemistry, Cosmology, QCD, Experimental Light Sources

Software Technologies Cited

• MPI, OpenMP, Pthreads

• ADIOS, Swift, ZFP, SZ, Globus, DIY/Decaf, VTK-m, PETSc/TAO, Trilinos

Hardware Technologies Addressed

• Communication network, on-node memory hierarchy, system solid state memory (placement and usage for burst buffers), I/O design for data movement on and off exascale system

Center Objectives (Co-Designed Motifs)

• Motif(s): Online data analysis and reduction

• Address the growing disparity between simulation speeds and I/O rates, which renders it infeasible for HPC and data-analytic applications to perform offline analysis

• Target common data analysis and reduction methods (e.g., feature and outlier detection, compression) and methods specific to particular data types and domains (e.g., particles, FEM)

• Reduce application development risk by providing performance tradeoffs for offline vs. inline analyses of simulation results; produce, and integrate into applications, high-performance products embodying data analysis and reduction

• Key metrics: use of CODAR tools and technologies by applications; improvements in data analysis efficiency and information-to-data ratios in application outputs

Development Plan

Y1: Application engagements under way; experiments on LCF systems; CODAR Services Beta with first integrated data analysis and reduction services; domain-specific services

Y2: Experiments on ECP testbed; first CODAR Services monitoring capability; expanded set of data analysis and reduction services; CODAR Runtime Beta with integrated control layer; demonstration of 10:1 data reduction for two ECP apps

Y3: CODAR Services and Runtime v1.0 with additional services included; CODAR used inside four ECP applications on LCF systems

Y4: CODAR v2.0, with adaptive data reduction; at-scale demonstrations on LCF systems; CODAR used in four ECP applications with 100:1 data reduction

Risks and Challenges

• Ineffective interaction with application project teams

• Insufficient resources to perform work

• Scope creep

• Unknown performance of tools at exascale

• Poor performance, scalability, or energy efficiency

• Failure to meet application team delivery expectations

• Poor interaction with ECP and other software teams

• Exascale apps not ready

• Test systems not available