CODAR: Center for Online Data Analysis and Reduction, a "potential" ECP Codesign Center
Scott A. Klasky, ORNL
October 27, 2016, Xi'an, China

CODAR sits at the intersection of data services, exascale platforms, and applications.
The CODAR team
PI: Ian Foster (ANL)
Co-Is: Scott Klasky (ORNL), Kerstin Kleese-Van Dam (BNL), Todd Munson (ANL)
Participants: Mark Ainsworth (Brown), Franck Cappello (ANL), Barbara Chapman (BNL, Stony Brook), Jong Choi (ORNL), Emil Constantinescu (ANL), Hanqi Guo (ANL), Tahsin Kurc (Stony Brook), Qing Liu (ORNL), Jeremy Logan, Klaus Mueller (Stony Brook), George Ostrouchov (ORNL), Manish Parashar (Rutgers), Tom Peterka (ANL), Norbert Podhorszki (ORNL), Dave Pugmire (ORNL), Rangan Sukumar (ORNL), Stefan Wild (ANL), Matthew Wolf (ORNL), Justin Wozniak (ANL), Wei Xu (BNL), Shinjae Yoo (BNL)
Survey of Application Motifs
[Table: applications vs. motifs. Columns: Monte Carlo, Particles, Sparse Linear Algebra, Dense Linear Algebra, Spectral Methods, Unstructured Grid, Structured Grid, Combinatorial Logic, Graph Traversal, Dynamic Programming, Backtrack & Branch and Bound, Graphical Models, Finite State Machine]
Applications: Cosmology; Subsurface; Materials (QMC); Additive Manufacturing; Chemistry for Catalysts & Plants; Climate Science; Precision Medicine; Machine Learning; QCD for Standard Model Validation; Accelerator Physics; Nuclear Binding and Heavy Elements; MD for Materials Discovery & Design; Magnetically Confined Fusion
Survey of Application Motifs (continued)
Applications: Combustion S&T; Free Electron Laser Data Analytics; Microbiome Analysis; Catalyst Design; Wind Plant Flow Physics; SMR Core Physics; Next-Gen Engine Design; Urban Systems; Seismic Hazard Assessment; Systems Biology; Biological Neutron Science; Power Grid Dynamics
Survey of Application Motifs (continued)
Applications: Stellar Explosions; Excited State Material Properties; Light Sources; Materials for Energy Conversion/Storage; Hypersonic Vehicle Design; Multiphase Energy Conversion Devices
Mapping of Applications to Co-Design Centers
• Computing the Sky at Extreme Scales. Motifs: Particles, Sparse LA, Spectral, Structured Grid, Graph Traversal. Centers: CODAR (ANL); AMR (LBNL); Particles (LANL); Graph (PNNL)
• Exascale Deep Learning and Simulation Enabled Precision Medicine for Cancer. Motifs: Sparse LA, Dense LA, Combinatorial Logic, Graph Traversal, Dynamic Programming, Backtrack and Branch-and-Bound, Graphical Models, Finite State Machine. Centers: Data Analytics (SLAC); CODAR (ANL); GraphEx (PNNL)
• Exascale Lattice Gauge Theory Opportunities and Requirements for Nuclear and High Energy Physics. Motifs: Monte Carlo, Sparse LA, Dense LA, Spectral, Structured Grid. Centers: CODAR (ANL); AMR (LBNL); ExaMC (LANL)
• Molecular Dynamics at the Exascale: Spanning the Accuracy, Length and Time Scales for Critical Problems in Materials Science. Motifs: Particles, Sparse LA, Dense LA, Spectral, Structured Grid. Centers: CODAR (ANL); Particles (LANL); GraphEx (PNNL); QUASIX (ORNL)
• Exascale Modeling of Advanced Particle Accelerators. Motifs: Particles, Sparse LA, Spectral, Structured Grid. Centers: AMR (LBNL); Particles (LANL); Data Science (FNAL)
• An Exascale Subsurface Simulator of Coupled Flow, Transport, Reactions and Mechanics. Motifs: Sparse LA, Structured Grid, Unstructured Grid. Centers: AMR (LBNL); CEED (LLNL); CHIME (LLNL); PUMA (SNL)
• Exascale Predictive Wind Plant Flow Physics Modeling. Motifs: Sparse LA, Dense LA, Unstructured Grid. Centers: AMR (LBNL); CHIME (LLNL); PUMA (SNL)
• QMCPACK: A Framework for Predictive and Systematically Improvable Quantum-Mechanics Based Simulations of Materials. Motifs: Monte Carlo, Particles, Sparse LA, Dense LA, Spectral, Dynamic Programming. Centers: Particles (LANL); CHIME (LLNL); ExaMC (LANL); QUASIX (ORNL); CODAR (ANL)
• Coupled Monte Carlo Neutronics and Fluid Flow Simulation of Small Modular Reactors. Motifs: Monte Carlo, Particles, Sparse LA, Dense LA, Spectral, Structured Grid, Unstructured Grid. Centers: Particles (LANL); CHIME (LLNL); ExaMC (LANL); CODAR (ANL)
• Transforming Additive Manufacturing through Exascale Simulation (TrAMEx). Motifs: Particles, Sparse LA, Dense LA, Spectral, Unstructured Grid. Centers: AMR (LBNL); Particles (LANL); CEED (LLNL); PUMA (SNL); CODAR (ANL)
• NWChemEx: Tackling Chemical, Materials and Biomolecular Challenges in the Exascale Era. Motifs: Monte Carlo, Sparse LA, Dense LA, Spectral, Structured Grid, Graph Traversal. Centers: QUASIX (ORNL); ExaMC (LANL); GraphEx (PNNL); AMR (LBNL)
• High-Fidelity Whole Device Modeling of Magnetically Confined Fusion Plasma. Motifs: Particles, Sparse LA, Spectral, Structured Grid, Unstructured Grid. Centers: CODAR (ANL); Particles (LANL); AMR (LBNL); CEED (LLNL); PUMA (SNL)
Mapping of Applications to Co-Design Centers (continued)
• Data Analytics at the Exascale for Free Electron Lasers. Motifs: Monte Carlo, Particles, Sparse LA, Dense LA, Spectral, Structured Grid, Dynamic Programming, Backtrack and Branch-and-Bound, Graphical Models. Centers: CODAR (ANL); Particles (LANL); AMR (LBNL); GraphEx (PNNL)
• Transforming Combustion Science and Technology with Exascale Simulations. Motifs: Particles, Sparse LA, Dense LA, Structured Grid. Centers: CODAR (ANL); AMR (LBNL); Particles (LANL)
• Cloud-Resolving Climate Modeling of the Earth's Water Cycle. Motifs: Particles, Sparse LA, Structured Grid, Unstructured Grid. Centers: CODAR (ANL); CEED (LLNL); CHIME (LLNL); PUMA (SNL); Data Analytics (SLAC); GraphEx (PNNL)
• Enabling GAMESS for Exascale Computing in Chemistry & Materials [seed]. Motifs: Monte Carlo, Particles, Sparse LA, Dense LA, Spectral. Centers: ExaMC (LANL); QUASIX (ORNL); Particles (LANL)
• Multiscale Coupled Urban Systems [seed]. Motifs: Sparse LA, Spectral, Structured Grid, Unstructured Grid. Centers: CEED (LLNL); AMR (LBNL); PUMA (SNL)
• Exascale Models of Stellar Explosions: Quintessential Multi-Physics Simulation [seed]. Motifs: Monte Carlo, Particles, Sparse LA, Dense LA, Structured Grid. Centers: AMR (LBNL); Particles (LANL); ExaMC (LANL)
• Exascale Solutions for Microbiome Analysis [seed]. Motifs: Sparse LA, Combinatorial Logic, Graph Traversal, Dynamic Programming, Graphical Models. Centers: GraphEx (PNNL)
• High Performance, Multidisciplinary Simulations for Regional Scale Seismic Hazard and Risk Assessments [seed]. Motifs: Sparse LA, Spectral, Structured Grid, Unstructured Grid, Dynamic Programming. Centers: AMR (LBNL); CEED (LLNL); PUMA (SNL)
• Performance Prediction of Multiphase Energy Conversion Devices with Discrete Element, Particle-in-Cell, and Two-Fluid Models (MFIX-Exa) [seed]. Motifs: Particles, Sparse LA, Structured Grid, Unstructured Grid. Centers: AMR (LBNL); CEED (LLNL); PUMA (SNL); Particles (LANL)
• Optimizing Stochastic Grid Dynamics at Exascale [seed]. Motifs: Monte Carlo, Sparse LA, Dense LA, Graph Traversal, Dynamic Programming, Backtrack and Branch-and-Bound. Centers: ExaMC (LANL); GraphEx (PNNL)
Computation: Fusion
• Develop a high-fidelity Whole Device Model (WDM) of magnetically confined fusion plasmas to predict the performance of ITER
• Couple existing, well-established extreme-scale gyrokinetic codes
  • GENE: continuum code for the plasma core
  • XGC: particle-in-cell (PIC) code for the plasma edge
• Data challenges
  • Couple the codes (XGC to GENE) using a service-oriented architecture (SOA)
  • Place large volumes of data in a knowledge repository
• Math challenges: stability and accuracy
• Additional physics: PMI, dust, RF, neutral particles, Ohmic power supply, poloidal field, magnetic equilibrium, RMP coils
• Fusion reaction products: α-particles, neutrons
The compute-data gap is a major challenge for exascale
• Filesystem/network bandwidth falls behind CPU/memory: fewer bytes per operation
• Swap I/O for CPU cycles:
  • Data (de)compression
  • Online data analysis
• Applications are already demanding 100 PB of data for "medium-term" (12-month) storage
• Right bytes in the right place at the right time!
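The bandwidth shortfall can be made concrete with a back-of-envelope calculation. The sketch below uses hypothetical round numbers (an exaflop machine, ~1 TB/s of filesystem bandwidth, 5 TB of output per step), not measurements from any specific system:

```python
# Back-of-envelope illustration of the compute-data gap.
# All machine parameters here are hypothetical round numbers.

flops = 1e18            # exascale: 10^18 floating-point ops per second
io_bw = 1e12            # filesystem bandwidth: ~1 TB/s
step_output = 5e12      # 5 TB of field data produced per time step

bytes_per_flop = io_bw / flops          # bytes the I/O system absorbs per op
write_seconds = step_output / io_bw     # time to drain one step to storage

print(f"{bytes_per_flop:.0e} bytes/flop")   # far below earlier generations
print(f"{write_seconds:.0f} s stalled on I/O per step")
```

With these numbers the machine can store only about one byte per million operations, which is why trading CPU cycles for I/O (compression, online analysis) becomes attractive.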
The co-design concept and exascale computing
Exascale co-design: Evaluate, deploy, and integrate exascale hardware-savvy software designs and technologies for key crosscutting algorithmic motifs into applications
The need for online data analysis and reduction
• Traditional approach: simulate, output, analyze
  • Write simulation output to secondary storage; read it back for analysis
  • Decimate in time when the simulation's output rate exceeds the machine's I/O rate
• New approach: online data analysis and reduction
  • Co-optimize simulation, analysis, and reduction for performance and information output
  • Substitute CPU cycles for I/O, via data (de)compression and/or online data analysis
  • Right bytes in the right place at the right time
CODAR codesign questions
• What are the best data analysis and reduction algorithms for different application classes, in terms of speed, accuracy, and resource requirements? How can we implement those algorithms to achieve scalability and performance portability?
• What are the tradeoffs in data analysis accuracy, resource needs, and overall application performance between using various data reduction methods to reduce file size prior to offline data reconstruction and analysis vs. performing more online data analysis? How do these tradeoffs vary with exascale hardware and software choices?
• How do we effectively orchestrate online data analysis and reduction to reduce associated overheads? How can exascale hardware and software help with orchestration?
Mission
• To create the infrastructure for adding data services to exascale applications
  • Analysis
  • Reduction
• To interface with ECP applications and ECP software for "data-related" activities
Start with the "challenge problems" for CODAR
• Challenges in data reduction
  • Understanding the science requires massive data reduction
  • How do we reduce:
    • The time spent reducing the data to knowledge?
    • The amount of data being moved on the exascale platform?
    • The amount of data being read from the storage system?
    • The amount of data stored in memory, on the storage system, and moved over the WAN?
  • ...without removing the knowledge?
    • Requires our team to take deep dives into the application post-processing routines and simulations
• Goal: create infrastructure, reduction routines, and analysis routines
  • General: e.g., reduce N bytes to M bytes, with N >> M
  • Motif-specific: e.g., better for finite-difference meshes vs. particles vs. finite elements
  • Application-specific: e.g., reduced physics allows us to understand the deltas
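The "general" N-to-M-bytes case can be illustrated with ordinary lossless compression from the Python standard library. This is only an accounting sketch, not a CODAR reduction routine:

```python
# Minimal sketch of "general" reduction: N bytes in, M bytes out, N >> M,
# here with lossless zlib on smooth, simulation-like data.
import struct
import zlib

# A slowly varying field sampled at 4096 points, packed as float64.
values = [0.001 * i for i in range(4096)]
raw = struct.pack(f"{len(values)}d", *values)   # N bytes

compressed = zlib.compress(raw, level=9)        # M bytes

n, m = len(raw), len(compressed)
print(f"N = {n} bytes, M = {m} bytes, ratio = {n / m:.1f}x")

# Lossless: the original bytes come back exactly.
assert zlib.decompress(compressed) == raw
```

Smooth fields compress; the later slides on SZ and ZFP show why lossy, error-bounded methods are needed when lossless ratios top out.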
Challenge 1: Laser-Ion Acceleration (high-frequency, time-averaged, derived force and particle energy gain)
• Driver: short high-power laser with period T0
• Separation of time scales: instantaneous, linear force vs. effective, non-linear force over T0 = 300 steps
• Each time step: 500 MB/node x 10k nodes → 5 TB of instantaneous field data
• Store 5 TB x 300 steps = 1.5 PB (150 GB/node)?
• Instead, derive the force online (100 ms between updates)
• Precision requirement: allow energy evolution of test particles (stream lines)
Challenge 1 (continued): averaging pipeline
• Each time step: 500 MB/node x 10k nodes → 5 TB of instantaneous field data
• Average the field components (Ex, Ey, Ez, |E|) element-wise over t, t+1, t+2, ..., t+T0: each step is 5 TB, but the running average is still only 5 TB
• From <Exyz> and <|E|>: subtract, apply a stencil → force; stream lines → acceleration
• Process online!
• The Lorentz force includes magnetic fields (another 5 TB), but they might be weak: only store them if necessary
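The element-wise averaging above can be sketched as a constant-memory streaming accumulation; toy sizes stand in for the real 5 TB/step fields:

```python
# Sketch of the "still 5 TB" point: accumulate a running sum over T0 steps,
# so the time-averaged field costs one field-sized buffer instead of
# storing all 300 instantaneous steps. Toy sizes only.
T0 = 300
field_len = 8          # stands in for ~6e11 doubles per step

acc = [0.0] * field_len
for step in range(T0):
    # In the real pipeline this is the instantaneous field at this step;
    # here each element just grows linearly in time.
    field = [step + i for i in range(field_len)]
    for i, v in enumerate(field):
        acc[i] += v            # element-wise accumulation, constant memory

avg = [a / T0 for a in acc]    # <E> over the window: one buffer, not 300
print(avg[0])                  # mean of 0..299
```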
Challenge 2: Laser-Plasma Accelerators (inverse problem of particle origin and trajectories)
• Particle injection and acceleration are intrinsically coupled
• Example: laser electron acceleration; the same applies to laser ion acceleration
• Randomly distributed test particles: some will end up in an ensemble of ~10^9 particles in a small phase-space volume at t_end, selected from ~10^12 particles
• Can we determine the region(s) of origin and the trajectories, more efficiently than "mark IDs & rerun"?
• These are the microscopic observables of the process
High-Fidelity Whole Device Model of Magnetically Confined Fusion Plasma
• Progress in magnetic fusion energy relies on understanding the complex processes of plasma confinement. High-fidelity simulation across the entire fusion plasma requires exascale resources. Novel coupling of scalable core and edge plasma codes will be the first critical step toward such a model.
• XGC1: particles and a finite element mesh
  • In general the particle distribution function (PDF) is a Maxwellian
  • Macro-particles can represent the particles, similar to "the cells"
  • Allows us to reduce in phase space by saving only the large components and keeping the majority of the data in "macro" particles (figure: particle count vs. energy E)
• Our team is working with C. S. Chang, M. Churchill, F. Jenko; contacts: E. Suchyta, J. Choi, G. Liu
• GENE: finite-difference mesh, large output (Eric)
• GEM: S. Parker (Colorado) wants to add physics-based reduction methods
Prototypical CODAR data analysis and reduction pipeline
• Running simulation → CODAR data API → CODAR runtime → reduced output and reconstruction info → I/O system → CODAR data API → offline data analysis
• CODAR data analysis: multivariate statistics, feature analysis, outlier detection
• CODAR data reduction: application-aware transforms, encodings
• CODAR data monitoring: error calculation, refinement hints
• Informed by simulation knowledge: application, models, numerics, performance optimization, ...
CODAR Runtime
• Orchestration: in situ slowdown vs. reduction; storage vs. accuracy; policies and hints
• Analysis, reduction, and monitoring catalog
• Delivery via service providers (ADIOS staging, Decaf, Swift): I/O interfaces, data (re)organization
• Awareness: scheduling and placement
  • Platform awareness: memory, bandwidth, utilization
  • Application awareness: triggers, end-to-end progress
XSSA: eXtreme Scale Service Architecture
• Philosophy based on a Service-Oriented Architecture
  • System management
  • Changing requirements
  • Evolving target platforms
  • Diverse, distributed teams
• Applications are built by assembling services
  • Universal view of functionality
  • Well-defined APIs
  • Implementations can be easily modified and assembled
• Manage complexity while maintaining performance and scalability
  • Scientific problems and codes
  • Underlying disruptive infrastructure
  • Coordination across codes and research teams
  • End-to-end workflows
ADIOS roadmap to exascale
2017
• Create a test harness
• Create a clearer, more modular layering of application interfaces, data abstractions, and runtime components
• Burst buffer support
• New methods for CORAL optimizations
2018
• Code coupling support with hybrid staging
• Living workflow
• Support for new programming models
• WAN staging
2019
• EOD integration for validation workflows
• Ensemble workflow optimizations
• Data model support for the software ecosystem
Transformation layer
• Designed for data conversions, compression, and transformation (zlib, bzip2, szip, ISOBAR, ALACRITY, FastBit, ...) that transform local data on each processor
• Transparent to users: user code reads/writes the original, untransformed data
• Applications: compressed output, automatically indexed data, local data reorganization, data reduction
• Released in ADIOS 1.6 (2013) with compression transformations
• Architecture: user application → ADIOS → data transform layer (transform plugin with plugin write and plugin read) → I/O transport layer → BP file, staging area, etc.; each variable passes through as either a regular or a transformed variable
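The write/read plugin structure in the diagram can be sketched as below. This is a hypothetical illustration of the transform-layer idea; the class names and interface are invented for the sketch and are not the actual ADIOS plugin API:

```python
# Hypothetical sketch: a plugin transforms each processor's local data on
# write and inverts it on read, while user code only ever sees the
# untransformed values. Names and interface are illustrative only.
import zlib


class ZlibTransform:
    """One transformation method: a write/read plugin pair."""

    def plugin_write(self, local_bytes: bytes) -> bytes:
        return zlib.compress(local_bytes)

    def plugin_read(self, stored_bytes: bytes) -> bytes:
        return zlib.decompress(stored_bytes)


class TransformLayer:
    """Sits between the user-facing API and the I/O transport layer."""

    def __init__(self, transform=None):
        self.transform = transform
        self.store = {}                  # stands in for the BP file / staging area

    def write(self, name: str, data: bytes) -> None:
        if self.transform:               # transformed variable
            data = self.transform.plugin_write(data)
        self.store[name] = data

    def read(self, name: str) -> bytes:
        data = self.store[name]
        if self.transform:
            data = self.transform.plugin_read(data)
        return data                      # user sees the original data


layer = TransformLayer(ZlibTransform())
payload = b"temperature field " * 100
layer.write("T", payload)
assert layer.read("T") == payload        # transparent to the user
```

The design point is that the transform is a property of the variable's path through the I/O stack, not of the user code, so compressors can be swapped without touching the application.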
Staging
• Use compute and deep-memory hierarchies to optimize the overall workflow for power vs. performance tradeoffs
• Abstract complex/deep memory-hierarchy access
• Placement of analysis and visualization tasks in a complex system
• Impact of network data movement compared to memory movement
• The abstraction allows staging:
  • On the same core
  • On different cores
  • On different nodes
  • On different machines
  • Through the storage system
Reduction comes with challenges
• Handling high entropy
• Performance: no benefit otherwise
• Not only the error in the variable itself, E ≡ f − f̃; we must also consider the impact on derived quantities: E ≡ g_l^t(f(x⃗, t)) − g̃_l^t(f̃_l^t(x⃗, t))
• With 400x reduction techniques: where did the information go?
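The derived-quantity point can be demonstrated numerically: a perturbation bounded by ±eps in f can produce a much larger error in a finite-difference gradient of f. The values below are illustrative, not from any CODAR dataset:

```python
# Bounding the error in f does not bound the error in a derived g(f).
# Here g is a finite-difference gradient; a tiny alternating perturbation
# within +/-eps produces a much larger error in g. Illustrative values.
eps = 1e-3
h = 1e-2                                   # grid spacing

f = [x * h for x in range(100)]            # exact field, gradient = 1 everywhere
f_tilde = [v + (eps if i % 2 == 0 else -eps) for i, v in enumerate(f)]

def grad(u):
    return [(u[i + 1] - u[i]) / h for i in range(len(u) - 1)]

err_f = max(abs(a - b) for a, b in zip(f, f_tilde))
err_g = max(abs(a - b) for a, b in zip(grad(f), grad(f_tilde)))

print(err_f)   # ~1e-3: within the compressor's point-wise bound
print(err_g)   # ~0.2: amplified by 2*eps/h in the derived gradient
```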
Several HPC floating-point compression algorithms have emerged
• Current interest is in lossy algorithms; some use preprocessing
• Lossless usually achieves up to ~3x reduction
• Compress each variable separately:
  • ISABELA
  • SZ (coming to ADIOS)
  • ZFP (in ADIOS 1.10)
  • Linear auditing
  • SVD
  • Adaptive gradient methods
• Compress several variables simultaneously:
  • PCA
  • Tensor decomposition
  • ...
Lossy compression with Argonne SZ
• No existing compressor can reduce hard-to-compress datasets by more than a factor of 2
• Objective 1: reduce hard-to-compress datasets by one order of magnitude
• Objective 2: add user-required error controls (error bound, shape of the error distribution, spectral behavior of the error function, etc.)
• Example datasets: NCAR atmosphere simulation output (1.5 TB); WRF hurricane simulation output; Advanced Photon Source mouse brain data
• What we need to compress (a bit map of 128 floating-point numbers) looks like random noise
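The error-control contract of Objective 2 can be sketched with a toy quantizer in the spirit of SZ. Real SZ adds prediction and entropy coding; only the absolute-error-bound behavior is shown here:

```python
# Minimal sketch of error-bounded lossy compression: quantize each value
# to a multiple of 2*eb so that reconstruction error stays within the
# user's absolute error bound eb. This is the contract, not the codec.
import math

eb = 1e-2                                  # user-required absolute error bound

def compress(values, eb):
    # Each value becomes a small integer code: lossy, but error-bounded.
    return [round(v / (2 * eb)) for v in values]

def decompress(codes, eb):
    return [c * 2 * eb for c in codes]

data = [math.sin(0.1 * i) for i in range(1000)]
codes = compress(data, eb)
recon = decompress(codes, eb)

max_err = max(abs(a - b) for a, b in zip(data, recon))
assert max_err <= eb                       # the control the user asked for
```

The small-integer codes are what a real compressor would then predict and entropy-code to reach high ratios.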
Next-generation version of ZFP
• ZFP enables progressive segmentation of data into low-, medium-, and high-precision "buckets"
• Works in blocks of 4 elements in each dimension
• Uses a family of spatially decorrelating transforms (e.g., DCT) that can be parameterized; ZFP chooses one that can be implemented in a highly optimized fashion
• Assumes smoothness: zig-zag ordering "sorts" the coefficients
• Coefficients are arranged by bit plane; how many bits or bit planes to keep in each of N output streams, and where to write each, are adjustable parameters
• Almost-correct summary: a fast, piecewise FFT-like transform with tunable precision for the coefficients, splittable into N pieces on disk
• l2-norm error bounded in each 4x4 block (2D data)
• P. Lindstrom, "Fixed-Rate Compressed Floating-Point Arrays," IEEE Transactions on Visualization and Computer Graphics, vol. 20, no. 12, pp. 2674-2683, 2014.
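The zig-zag ordering step can be sketched on its own. This shows only the traversal order for a 4x4 coefficient block, not ZFP's transform or bit-plane coding:

```python
# Visit a 4x4 block along anti-diagonals so that, under the smoothness
# assumption, coefficients come out roughly sorted from low frequency to
# high frequency (the "sorting" the slide refers to).
def zigzag_indices(n=4):
    order = []
    for s in range(2 * n - 1):             # anti-diagonal: i + j = s
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2:                          # alternate direction per diagonal
            diag.reverse()
        order.extend(diag)
    return order

order = zigzag_indices(4)
print(order[:5])
```

Ordering by expected magnitude is what makes later bit-plane truncation cheap: dropping trailing positions mostly drops high-frequency detail.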
ISABELA: sort-and-spline compression
• Preconditioning: sort the data (in chunks), which smooths the data
• Then fit cubic splines to an l∞-norm error tolerance; fewer fit parameters are needed because of the preconditioning
• Losslessly compress the differentials to ensure point-wise error bounds, not just "chunk-averaged" ones
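ISABELA's sort-then-fit idea can be sketched with a straight line standing in for the cubic spline (an illustrative simplification; the real method also encodes the permutation and quantizes the differentials):

```python
# Sorting a chunk turns wiggly data into a smooth monotone curve that a
# low-order fit captures well; keeping the differentials restores
# point-wise accuracy, and undoing the sort restores the original order.
import math

chunk = [math.sin(1.3 * i) for i in range(256)]   # hard to fit directly

perm = sorted(range(len(chunk)), key=lambda i: chunk[i])
sorted_vals = [chunk[i] for i in perm]            # smooth monotone curve

# Two-parameter fit to the sorted curve (a line through its endpoints).
a, b = sorted_vals[0], sorted_vals[-1]
n = len(sorted_vals)
fit = [a + (b - a) * k / (n - 1) for k in range(n)]

diffs = [v - f for v, f in zip(sorted_vals, fit)] # stored alongside the fit

# Reconstruction: fit + differentials, then undo the sort permutation.
recon = [0.0] * n
for k, i in enumerate(perm):
    recon[i] = fit[k] + diffs[k]

# Point-wise error is tiny (exact up to floating-point round-off here;
# exact by construction with integer-coded differentials).
assert max(abs(x - y) for x, y in zip(chunk, recon)) < 1e-12
```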
CODAR: machine learning potentials
• NWChemEx: molecular dynamics (MD)
  • MD simulations are essential for modeling large molecular systems with appropriate models of the complex, realistic environments affecting the chemical processes of interest
  • Rare-trajectory event identification (anomaly detection)
    • NWChem uses methods that bias the calculations to make rare events more likely
    • Among a set of trajectories, we would like to identify the ones where something interesting happens, give more compute resources to those paths, and downgrade the others
• Climate: causation of interactions (e.g., aerosol-cloud interaction)
• Cancer: potential contributions from applying and/or developing DL methods
Performance optimization strategies for data-intensive applications
• Data reduction and analysis algorithms and workflows need to run as efficiently on exascale systems as the numerical applications they serve, so as not to create bottlenecks for the computational studies
• These applications need to adapt to new hardware, including deeper memory hierarchies and different architectural swim lanes. Our team will explore runtime-system enhancements, plus programming models and compiler directives that specifically benefit data-intensive applications as used by ExaFEL (SLAC) and Cancer (ANL), as well as numerical applications with extreme data volumes such as fusion, climate, and cosmology
• We will provide tailored, improved memory management (supported through OpenMP), communication and resource management strategies, and execution performance improvements and performance portability through tailored compiler directives and strategies
CODAR summary
PI: Ian Foster (ANL); Institutions: ANL, BNL, ORNL, Brown, Rutgers, Stony Brook

Center objectives (co-designed motifs)
• Motif(s): online data analysis and reduction
• Address the growing disparity between simulation speeds and I/O rates, which renders it infeasible for HPC and data-analytic applications to perform offline analysis
• Target common data analysis and reduction methods (e.g., feature and outlier detection, compression) and methods specific to particular data types and domains (e.g., particles, FEM)
• Reduce application development risk by providing performance tradeoffs for offline vs. inline analyses of simulation results; produce, and integrate into applications, high-performance products embodying data analysis and reduction
• Key metrics: use of CODAR tools and technologies by applications; improvements in data analysis efficiency and information-to-data ratios in application outputs

Applications targeted
• Climate (ACME), materials science, magnetic fusion energy, chemistry, cosmology, QCD, experimental light sources

Software technologies cited
• MPI, OpenMP, Pthreads
• ADIOS, Swift, ZFP, SZ, Globus, DIY/Decaf, VTK-m, PETSc/TAO, Trilinos

Hardware technologies addressed
• Communication network, on-node memory hierarchy, system solid-state memory (placement and usage for burst buffers), I/O design for data movement on and off the exascale system

Development plan
• Y1: Application engagements under way; experiments on LCF systems; CODAR Services Beta with first integrated data analysis and reduction services; domain-specific services
• Y2: Experiments on ECP testbed; first CODAR Services monitoring capability; expanded set of data analysis and reduction services; CODAR Runtime Beta with integrated control layer; demonstration of 10:1 data reduction for two ECP apps
• Y3: CODAR Services and Runtime v1.0 with additional services included; CODAR used inside four ECP applications on LCF systems
• Y4: CODAR v2.0 with adaptive data reduction; at-scale demonstrations on LCF systems; CODAR used in four ECP applications with 100:1 data reduction

Risks and challenges
• Ineffective interaction with application project teams
• Insufficient resources to perform the work
• Scope creep
• Unknown performance of tools at exascale
• Poor performance, scalability, or energy efficiency
• Failure to meet application teams' delivery expectations
• Poor interaction with ECP and other software teams
• Exascale apps not ready
• Test systems not available