N Tropy: A Framework for Parallel Data Analysis
Harnessing the Power of Parallel Grid Resources for Astronomical Data Analysis
Jeffrey P. Gardner, Andy Connolly
Pittsburgh Supercomputing Center
University of Pittsburgh
Carnegie Mellon University
Mining the Universe Can Be Computationally Expensive
Astronomy now generates ~1 TB of data per night.
With VOs, one can pool data from multiple catalogs.
Computational requirements are becoming much more extreme relative to the current state of the art.
There will be many problems that would be impossible without parallel machines.
Example: N-point correlation functions for the SDSS
2-pt: CPU hours
3-pt: CPU weeks
4-pt: 100 CPU years!
There will be many more problems for which throughput can be substantially enhanced by parallel machines.
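The cost figures above come from pair counting: a naive 2-point estimator compares every pair of objects, so the work grows as O(N^2), and k-point functions grow as O(N^k). A minimal Python sketch of brute-force pair counting (illustrative only; the function name and toy data are invented, not from N Tropy):

```python
import itertools
import math
import random

def brute_force_pair_count(points, r_min, r_max):
    """Count pairs with separation in [r_min, r_max) by testing every
    pair: O(N^2) work, which is why SDSS-scale 2-point runs cost CPU
    hours and the 4-point function costs CPU years."""
    count = 0
    for p, q in itertools.combinations(points, 2):
        if r_min <= math.dist(p, q) < r_max:
            count += 1
    return count

random.seed(0)
pts = [(random.random(), random.random(), random.random()) for _ in range(200)]
print(brute_force_pair_count(pts, 0.0, 0.1))
```

Tree-based methods prune most of these comparisons, which is the motivation for the tree abstraction later in the talk.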
Types of Parallelism
Data Parallel (or “Embarrassingly Parallel”):
Example:
100,000 QSO spectra
Each spectrum takes ~1 hour to reduce
Each spectrum is computationally independent from the others
If you have root access to a Grid resource:
Solution for a “traditional” environment: Condor
VOs will provide an integrated workflow solution (e.g., Pegasus)
Running on shared resources like the TeraGrid is more difficult
TeraGrid has no metascheduler
TeraGrid batch systems cannot handle 100,000 independent work units
Solution: GridShell (talk to me if you are interested!)
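The defining property of the data-parallel case is that work units share nothing, so any scheduler (Condor, GridShell, or a local process pool) can farm them out independently. A toy single-machine sketch using Python's multiprocessing; `reduce_spectrum` is a hypothetical stand-in for the real ~1-hour reduction pipeline:

```python
from multiprocessing import Pool

def reduce_spectrum(spectrum_id):
    # Hypothetical stand-in for a ~1-hour reduction; each spectrum
    # is computationally independent of the others.
    return spectrum_id, spectrum_id * 2  # toy "result"

if __name__ == "__main__":
    spectrum_ids = range(100)  # 100,000 in the real survey
    # No inter-task communication is needed, so the tasks can be
    # distributed with no coordination beyond collecting results.
    with Pool(processes=4) as pool:
        results = dict(pool.map(reduce_spectrum, spectrum_ids))
    print(len(results))  # -> 100
```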
Types of Parallelism
Tightly Coupled Parallelism (what this talk is about):
Data and computational domains overlap
Examples:
N-Point correlation functions
New object classification
Density estimation
Intersections in parameter space
Solution(?): N Tropy
The Challenge of Parallel Data Analysis
Parallel programs are hard to write!
Steep learning curve to learn parallel programming
Lengthy development time
The parallel world is dominated by simulations:
Code is often reused for many years by many people
Therefore, you can afford to spend lots of time writing the code.
Data analysis does not work this way:
Rapidly changing scientific inquiries
Less code reuse
Data Analysis requires rapid software development!
Even the simulation community rarely does data analysis in parallel.
The Goal
GOAL: Minimize development time for parallel applications.
GOAL: Allow scientists who don’t have the time to learn how to write parallel programs to still implement their algorithms in parallel.
GOAL: Provide seamless scalability from single-processor machines to TeraGrid platforms.
GOAL: Do not restrict inquiry space.
Methodology
Limited Data Structures:
Most (all?) efficient data analysis methods use grids or trees.
Limited Methods:
Analysis methods perform a limited number of operations on these data structures.
Methodology
Examples:
Fast Fourier Transform
Abstraction: Grid
Method: Global Reduction
N-Body Gravity Calculation
Abstraction: Tree
Method: Global Top-Down Tree Walk
2-Point Correlation Function Calculation
Abstraction: Tree
Method: Global Top-Down Tree Walk
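The shared pattern behind these examples can be made concrete: the framework owns a generic top-down traversal, and the analysis supplies only the test and accumulate callbacks. The callback names below echo N Tropy's user-supplied routines, but this Python sketch is illustrative, not the actual C/C++ API:

```python
class Node:
    """Minimal tree node: leaves hold particles, interior nodes hold children."""
    def __init__(self, bounds, children=(), particles=()):
        self.bounds = bounds            # e.g. (lo, hi) along one axis
        self.children = list(children)
        self.particles = list(particles)

def top_down_walk(node, user_test_cell, user_accumulate, result):
    """Generic top-down tree walk: the framework drives the traversal;
    the user supplies the open/prune test and the leaf accumulation
    (in the spirit of UserTestCells / UserCellAccumulate)."""
    if not user_test_cell(node):        # prune this subtree
        return
    if not node.children:               # leaf: hand particles to user code
        user_accumulate(node, result)
        return
    for child in node.children:
        top_down_walk(child, user_test_cell, user_accumulate, result)

# Toy usage: collect particles from cells whose lower bound is < 4.
root = Node((0, 8), children=[Node((0, 4), particles=[1, 2]),
                              Node((4, 8), particles=[5, 6, 7])])
result = []
top_down_walk(root, lambda n: n.bounds[0] < 4,
              lambda n, r: r.extend(n.particles), result)
print(result)  # -> [1, 2]
```

Gravity, 2-point, and 3-point calculations then differ only in the callbacks, which is what makes a shared framework plausible.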
Vision: A Parallel Framework
[Architecture diagram:]
Computational Steering: Python? (C? / Fortran?)
Framework (“Black Box”): C++ or CHARM++
Framework components: domain decomposition, tree/grid traversal, parallel I/O, result tracking, workload scheduling, tree services
User supplied: serial compute routines, serial I/O routines, traversal/decision routines
Web Service Layer (at least from Python): connects to the VO via XML? SOAP? WSDL?
Proof of Concept: PHASE 1 (complete)
Convert the parallel N-body code PKDGRAV* to a 3-point correlation function calculator by modifying existing code as little as possible.
*PKDGRAV developed by Tom Quinn, Joachim Stadel, and others at the University of Washington
PKDGRAV (aka GASOLINE) benefits:
Highly portable
MPI, POSIX Threads, SHMEM, Quadrics, & more
Highly scalable
92% linear speedup on 512 processors
Development time:
Writing PKDGRAV: ~10 FTE years (could be rewritten in ~2)
PKDGRAV -> 2-Point: 2 FTE weeks
2-Point -> 3-Point: >3 FTE months
PHASE 1 Performance
[Scaling plot: 10 million particles, spatial 3-point, 3–4 Mpc]
(SDSS DR1 takes less than 1 minute with perfect load balancing)
Proof of Concept: PHASE 2
N Tropy (currently in progress)
Use only Parallel Management Layer of PKDGRAV.
Rewrite everything else from scratch
PKDGRAV Functional Layout:
Computational Steering Layer: executes on the master processor
Parallel Management Layer: coordinates execution and data distribution among processors
Serial Layer (Gravity Calculator, Hydro Calculator): executes on all processors
Proof of Concept: PHASE 2
N Tropy (currently in progress)
Use only the Parallel Management Layer of PKDGRAV.
Rewrite everything else from scratch.
PKDGRAV benefits to keep:
Flexible client-server scheduling architecture
Threads respond to service requests issued by the master.
To do a new task, simply add a new service.
Portability
Interprocessor communication occurs via high-level requests to the “Machine-Dependent Layer” (MDL), which is rewritten to take advantage of each parallel architecture.
Advanced interprocessor data caching
< 1 in 100,000 off-PE requests actually results in communication.
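The caching idea is simple to state: fetch a remote datum once, then serve repeated requests locally so only a tiny fraction of off-PE accesses touch the network. A minimal illustrative sketch (the real MDL layer does this per architecture in C; the class and names here are invented):

```python
class OffPECache:
    """Toy read-only cache in the spirit of MDL's interprocessor
    data caching: one expensive fetch per key, local hits after."""
    def __init__(self, fetch_remote):
        self.fetch_remote = fetch_remote  # expensive off-PE request
        self.cache = {}
        self.requests = 0                 # total accesses
        self.fetches = 0                  # accesses that hit the network

    def get(self, key):
        self.requests += 1
        if key not in self.cache:
            self.fetches += 1
            self.cache[key] = self.fetch_remote(key)
        return self.cache[key]

remote = lambda key: key * 10             # stand-in for real communication
cache = OffPECache(remote)
for _ in range(1000):
    cache.get(7)                          # same cell requested repeatedly
print(cache.requests, cache.fetches)      # -> 1000 1
```

With tree walks, nearby particles tend to request the same remote cells, which is why hit rates like the quoted 1-in-100,000 figure are achievable.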
N Tropy Design
[Architecture diagram:]
Computational Steering Layer / Web Service Layer (completely rewritten)
Tree Services: general-purpose tree building and tree walking routines: domain decomposition, tree building, tree traversal, parallel I/O, result tracking (completely rewritten)
“User” Supplied Layer: UserTestCells, UserTestParticles, UserCellAccumulate, UserParticleAccumulate, UserCellSubsume, UserParticleSubsume
PKDGRAV Parallel Management Layer and interprocessor communication layer (retained from PKDGRAV)
2-Point and 3-Point algorithms are now complete!
N Tropy “Meaningful” Benchmarks
The purpose of this framework is to minimize development time!
Rewriting the user and scheduling layers to do an N-body gravity calculation:
3 Hours
N Tropy New Features (coming soon)
Dynamic load balancing
Workload and processor domain boundaries can be dynamically reallocated as computation progresses.
Data pre-fetching
Predict and request off-PE data that will be needed for upcoming tree nodes.
Work with the CMU Auton Lab to investigate active learning algorithms to prefetch off-PE data.
N Tropy New Features (coming soon)
Computing across grid nodes
Much more difficult than between nodes on a tightly coupled parallel machine:
Network latencies between grid resources are ~1000 times higher than between nodes on a single parallel machine.
Nodes on far grid resources must be treated differently than the processor next door:
Data mirroring or aggressive prefetching.
Sophisticated workload management and synchronization.
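A back-of-the-envelope way to see why aggregation helps across grid nodes: if each off-node request pays a full round trip, batching requests divides the latency cost by the batch size. The latency figure below is illustrative only:

```python
def round_trips_unbatched(n_requests):
    # One latency hit per individual request.
    return n_requests

def round_trips_batched(n_requests, batch_size):
    # Aggregate requests so the wide-area latency is paid once per
    # batch instead of once per item (ceiling division).
    return -(-n_requests // batch_size)

latency_ms = 100      # illustrative wide-area round-trip latency
n = 10_000
print(round_trips_unbatched(n) * latency_ms)     # -> 1000000
print(round_trips_batched(n, 500) * latency_ms)  # -> 2000
```

Mirroring and prefetching push this further by paying the latency before the data is needed at all.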
Conclusions
Most data analysis in astronomy is done using trees as the fundamental data structure.
Most operations on these tree structures are functionally identical.
Based on our studies so far, it appears feasible to construct a general-purpose parallel framework that users can rapidly customize to their needs.