DIY Parallel Data Analysis


Tom Peterka

tpeterka@mcs.anl.gov

Mathematics and Computer Science Division

ATPESC Talk 8/11/14


“I have had my results for a long time, but I do not yet know how I am to arrive at them.”

–Carl Friedrich Gauss, 1777-1855

Preliminaries


Moving from Postprocessing to Run-Time Scientific Data Analysis in HPC

(Figure: a data flow network contrasting postprocessing analysis and visualization with run-time analysis and visualization.)

Definition of Data Analysis

•  Any data transformation, or a network of transformations.
•  Anything done to original data beyond its original generation.
•  Can be visual, analytical, statistical, or data management.


Examples


Streamlines and pathlines
Stream surfaces
FTLE
Information entropy
Morse-Smale complex
Voronoi and Delaunay tessellation
Ptychography

Common Denominators


•  Big science => big data, big machines
•  Most analysis algorithms are not up to speed
   •  Either serial, or
   •  Overheads kill scalability
•  Solutions
   •  Process data closer to the source
   •  Write scalable analysis algorithms
   •  Parallelize in various forms
   •  Build software stacks of useful and reusable layers
•  Usability and workflow
   •  Develop libraries rather than tools
   •  Users write small main programs and call into libraries

Abstractions Matter: Think Blocks, not Tasks


•  Block = unit of decomposition
•  Block size and shape can be configured
   •  From coarse to fine
   •  Regular, adaptive, KD-tree
•  Block placement is flexible and dynamic
   •  Blocks per task
   •  Tasks per block
   •  Memory / storage hierarchy
•  Data is a first-class citizen
   •  Separate operations per block
   •  Thread safety

Parallel data analysis consists of decomposing a problem into blocks, operating on them, and communicating between them.
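As a concrete illustration of the block abstraction, the minimal C sketch below decomposes a domain into more blocks than MPI processes and assigns them round-robin. The Block struct, the block count, and the assignment scheme are illustrative assumptions for this sketch only, not DIY's actual data structures or API.

/* Minimal sketch of the block abstraction (illustrative only; these
   struct and function names are not DIY's actual API). Each MPI process
   owns zero or more blocks of a regularly decomposed domain. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int   gid;        /* global block id */
    float min[3];     /* lower corner of the block's extent */
    float max[3];     /* upper corner of the block's extent */
    float *data;      /* this block's portion of the data */
} Block;

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int tot_blocks = 64;                     /* blocks in the whole domain (assumed) */
    int nblocks = tot_blocks / size +        /* this rank's share under a   */
                  (rank < tot_blocks % size ? 1 : 0);  /* round-robin assignment */

    Block *blocks = malloc(nblocks * sizeof(Block));
    for (int i = 0; i < nblocks; i++) {
        blocks[i].gid = rank + i * size;     /* round-robin gid -> rank mapping */
        blocks[i].data = NULL;               /* fill extents and load data here */
    }

    printf("rank %d owns %d of %d blocks\n", rank, nblocks, tot_blocks);

    free(blocks);
    MPI_Finalize();
    return 0;
}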

You Have Two Choices to Parallelize Data Analysis


By hand:

void ParallelAlgorithm()
{
    …
    MPI_Send();
    …
    MPI_Recv();
    …
    MPI_Barrier();
    …
    MPI_File_write();
}

With tools (DIY):

void ParallelAlgorithm()
{
    …
    LocalAlgorithm();
    …
    DIY_Merge_blocks();
    …
    DIY_File_write();
}
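To see what the hand-written path entails, here is a self-contained sketch of one common case: a global histogram merged with MPI_Reduce and written out with MPI-IO. The local analysis (local_histogram) and the bin count NBINS are illustrative stand-ins; a library call such as the DIY merge in the second version above replaces roughly this much hand-written communication and I/O code.

/* "By hand" sketch: a global histogram reduced with MPI_Reduce and
   written with MPI-IO. local_histogram and NBINS are stand-ins for the
   user's local analysis, not part of DIY. */
#include <mpi.h>
#include <stdlib.h>

#define NBINS 64

static void local_histogram(int *bins)
{
    /* stand-in for the local analysis: fill bins from this rank's data */
    for (int i = 0; i < NBINS; i++)
        bins[i] = rand() % 100;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int local[NBINS], global[NBINS];
    local_histogram(local);

    /* hand-written global reduction of all ranks' bins onto rank 0 */
    MPI_Reduce(local, global, NBINS, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    /* hand-written result output with MPI-IO */
    if (rank == 0) {
        MPI_File fh;
        MPI_File_open(MPI_COMM_SELF, "histogram.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_write(fh, global, NBINS, MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
    }

    MPI_Finalize();
    return 0;
}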

DIY Concepts


Library

Written in C++ with C bindings
Autoconf build system (configure, make, make install)
Lightweight: libdiy.a 800 KB
Maintainable: ~15K lines of code, including examples

(Figure: DIY usage and library organization.)

Features

Parallel I/O to/from storage
Domain decomposition
Network communication
Utilities


DIY helps the user write data-parallel analysis algorithms by decomposing a problem into blocks and communicating items between blocks.

Nine Things That DIY Does


1. Separate analysis ops from data ops

2. Group data items into blocks

3. Assign blocks to processes

4. Group blocks into neighborhoods

5. Support multiple instances of 2, 3, and 4

6. Handle time

7. Communicate between blocks in various ways

8. Read data and write results

9. Integrate with other libraries and tools


Usage


Writing a DIY Program


Tutorial Examples
•  Block I/O: Reading data, writing analysis results
•  Static: Merge-based and swap-based reduction, neighborhood exchange
•  Time-varying: Neighborhood exchange
•  Spare thread: Simulation and analysis overlap
•  MOAB: Unstructured mesh data model
•  VTK: Integrating DIY communication with VTK filters
•  R: Integrating DIY communication with R stats algorithms
•  Multimodel: Multiple domains and communicating between them

Documentation
•  README for installation
•  User's manual with description, examples of custom datatypes, complete API reference

Execution: Same Core and Spare Core


(Figure: same-core and spare-core execution.)

Time: Static and Time-Varying


(Figure: static and time-varying analysis.)

Communication: Merge and Swap Reduction, Neighbor Exchange


(Figures: merge and swap reduction; neighbor exchange.)
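The merge-reduction pattern can be sketched by hand with point-to-point MPI: in each round, half of the still-active blocks send their contents to a partner block, which applies a merge operator; after about log2(p) rounds a single block holds the merged result. The sketch below illustrates the pattern only (radix 2, one block per rank, element-wise sum as the merge operator); a library such as DIY provides this kind of reduction so the user does not write the loop by hand.

/* Radix-2 merge reduction sketch: one block per rank, element-wise sum
   as the merge operator. Illustrates the communication pattern only. */
#include <mpi.h>
#include <stdio.h>

#define N 16   /* items per block (illustrative) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double block[N], incoming[N];
    for (int i = 0; i < N; i++)
        block[i] = rank;                     /* stand-in for this block's data */

    for (int stride = 1; stride < size; stride *= 2) {
        if (rank % (2 * stride) == 0) {      /* receiver: merge partner's block */
            int partner = rank + stride;
            if (partner < size) {
                MPI_Recv(incoming, N, MPI_DOUBLE, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                for (int i = 0; i < N; i++)
                    block[i] += incoming[i]; /* merge operator */
            }
        } else if (rank % (2 * stride) == stride) {  /* sender: done after sending */
            MPI_Send(block, N, MPI_DOUBLE, rank - stride, 0, MPI_COMM_WORLD);
            break;
        }
    }

    if (rank == 0)
        printf("merged block[0] = %g\n", block[0]);  /* sum over all ranks */

    MPI_Finalize();
    return 0;
}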

One Example in Greater Detail


Parallel Tessellation

We developed a prototype library for computing in situ Voronoi and Delaunay tessellations from particle data and applied it to cosmology, molecular dynamics, and plasma fusion.

Key Ideas

•  Mesh tessellations convert sparse point data into continuous dense field data.

•  Meshing the output of simulations is data-intensive and requires supercomputing resources.

•  No large-scale data-parallel tessellation tools exist.

•  We developed such a library, tess.

•  We achieved good parallel performance and scalability.

•  Widespread GIS applicability in addition to the datasets we tested.


Scalability


Strong and weak scaling for up to 2048³ synthetic particles and up to 128K processes (excluding I/O) shows up to 90% strong scaling and up to 98% weak scaling.

Applications in Cosmology


(Figure: histograms of cell density contrast, (density − mean) / mean, at t = 11, 21, and 31; 100 bins each.)

Time step   Range            Bin width   Skewness   Kurtosis
t = 11      [−0.77, 0.59]    0.014       1.6        4.1
t = 21      [−0.77, 2.4]     0.033       2.0        5.5
t = 31      [−0.72, 15]      0.15        4.5        23

Temporal structure dynamics: As time progresses, the range of cell volume and density expands, and kurtosis and skewness increase. These statistics are consistent with the governing physics of the formation of high- and low-density structures over time and can perhaps be used to summarize evolution at given time steps.
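For reference, the density contrast plotted above and the skewness and kurtosis reported in the table are, assuming the standard (non-excess) definitions:

\[
\delta = \frac{\rho - \bar{\rho}}{\bar{\rho}}, \qquad
\mathrm{skewness} = \frac{E[(X - \mu)^3]}{\sigma^3}, \qquad
\mathrm{kurtosis} = \frac{E[(X - \mu)^4]}{\sigma^4}
\]

where ρ is a cell's density, ρ̄ the mean density, and μ and σ the mean and standard deviation of the histogrammed values.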

Feature statistics: Total volume, surface area, curvature, topology of connected components of Voronoi cells classify and quantify features.

Density estimation: Tessellations as intermediate representations enable accurate regular grid density estimators.

Recap

Block abstraction for parallelizing data analysis

Encapsulate data movement in a separate library

Define design patterns for data movement in HPC data analysis in terms of:

•  Execution
•  Temporal behavior
•  Communication pattern

The benefits are:

•  Abstraction, implementation independence
•  Reuse, programmer productivity
•  Standardization
•  Benchmarking

Further Reading


DIY
•  Peterka, T., Ross, R., Kendall, W., Gyulassy, A., Pascucci, V., Shen, H.-W., Lee, T.-Y., Chaudhuri, A.: Scalable Parallel Building Blocks for Custom Data Analysis. Proceedings of the Large Data Analysis and Visualization Symposium (LDAV'11), IEEE Visualization Conference, Providence, RI, 2011.
•  Peterka, T., Ross, R.: Versatile Communication Algorithms for Data Analysis. EuroMPI 2012 Special Session on Improving MPI User and Developer Interaction (IMUDI'12), Vienna, Austria, 2012.

DIY applications
•  Peterka, T., Ross, R., Nouanesengsy, B., Lee, T.-Y., Shen, H.-W., Kendall, W., Huang, J.: A Study of Parallel Particle Tracing for Steady-State and Time-Varying Flow Fields. Proceedings of IPDPS'11, Anchorage, AK, May 2011.
•  Gyulassy, A., Peterka, T., Pascucci, V., Ross, R.: The Parallel Computation of Morse-Smale Complexes. Proceedings of IPDPS'12, Shanghai, China, 2012.
•  Nouanesengsy, B., Lee, T.-Y., Lu, K., Shen, H.-W., Peterka, T.: Parallel Particle Advection and FTLE Computation for Time-Varying Flow Fields. Proceedings of SC12, Salt Lake City, UT, 2012.
•  Peterka, T., Kwan, J., Pope, A., Finkel, H., Heitmann, K., Habib, S., Wang, J., Zagaris, G.: Meshing the Universe: Integrating Analysis in Cosmological Simulations. Proceedings of the SC12 Ultrascale Visualization Workshop, Salt Lake City, UT, 2012.
•  Chaudhuri, A., Lee, T.-Y., Zhou, B., Wang, C., Xu, T., Shen, H.-W., Peterka, T., Chiang, Y.-J.: Scalable Computation of Distributions from Large Scale Data Sets. Proceedings of the 2012 Symposium on Large Data Analysis and Visualization (LDAV'12), Seattle, WA, 2012.

Tom Peterka

tpeterka@mcs.anl.gov

Mathematics and Computer Science Division

Acknowledgments:

Facilities
•  Argonne Leadership Computing Facility (ALCF)
•  Oak Ridge National Center for Computational Sciences (NCCS)

Funding
•  DOE SDMAV Exascale Initiative
•  DOE Exascale Codesign Center
•  DOE SciDAC SDAV Institute

https://bitbucket.org/diatomic/diy

ATPESC Talk 8/11/14

“The purpose of computing is insight, not numbers.” –Richard Hamming, 1962
