- 1 - Workshop on Pattern Analysis

Data Flow Pattern Analysis of Scientific Applications

Michael Frumkin
Parallel Systems & Applications
Intel Corporation

May 6, 2005




- 2 - Workshop on Pattern Analysis

Outline

Why Data Flow Pattern Analysis?
CFD Applications
The NAS Parallel Benchmarks
The NAS Grid Benchmarks
Trace File Analysis
Conclusions


- 3 - Workshop on Pattern Analysis

Why Data Flow Pattern Analysis?

Scientific applications
– model few natural processes
– new effects are added infrequently
– the influence of new effects on the existing data flows is insignificant

Knowledge of data flow in a program helps with
– program understanding
– program optimization, parallelization, multithreading
– building an application performance model


- 4 - Workshop on Pattern Analysis

Design of Scientific Applications

Time is represented as an outer loop
– Iterations over time steps

Space is represented by structured/unstructured grids
– Important for understanding data locality
– Data access patterns
– Spatial parallelism

Physics is represented by an operator at each grid point
– Data flow
– Operator level of parallelism/dependence
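The three design elements on this slide (time as an outer loop, space as a grid, physics as a per-point operator) can be sketched in C as follows. This is only an illustration under stated assumptions: the 1-D grid, the explicit diffusion operator, and the names `diffuse_point` and `time_march` are hypothetical and do not come from the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical physics operator: one explicit diffusion update at grid
 * point i. It reads only the point and its two neighbours, which is what
 * makes the data access pattern local. */
static double diffuse_point(const double *u, size_t i, double alpha)
{
    return u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
}

/* Advance the 1-D grid u[0..n-1] by nsteps time steps.
 * tmp is caller-provided scratch of the same length.
 * The boundary values u[0] and u[n-1] are held fixed. */
void time_march(double *u, double *tmp, size_t n, int nsteps, double alpha)
{
    assert(n >= 3);
    for (int step = 0; step < nsteps; ++step) {      /* time: outer loop   */
        for (size_t i = 1; i + 1 < n; ++i)           /* space: grid sweep  */
            tmp[i] = diffuse_point(u, i, alpha);     /* physics: operator  */
        for (size_t i = 1; i + 1 < n; ++i)           /* commit the step    */
            u[i] = tmp[i];
    }
}
```

Within one time step every interior point can be updated independently (spatial parallelism), while successive time steps depend on each other.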


- 5 - Workshop on Pattern Analysis

CFD Data Flow Patterns

Solve the Navier-Stokes equation

$K(u_{i+1}) = L u_i$

– u is a five-dimensional vector
– K is a non-linear operator

Solver

RHS computation


- 6 - Workshop on Pattern Analysis

ADI Pattern

ADI method: K ~ Kx*Ky*Kz
Multilevel parallelism

[Figure: x-solve, y-solve, and z-solve sweeps; multipartition]
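The factorization K ~ Kx*Ky*Kz replaces one implicit 3-D solve with three sweeps of independent 1-D solves. The sketch below illustrates that data flow in C. It assumes a constant-coefficient tridiagonal system (-a, 1+2a, -a) as a stand-in for each factor; the systems solved by an actual CFD code are not reproduced here, and `solve_line`, `adi_step`, and `ADI_MAX_LINE` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define ADI_MAX_LINE 64   /* assumed upper bound on points per grid line */

/* Solve the tridiagonal system (-a, 1+2a, -a) in place along one grid
 * line of n points spaced `stride` elements apart (Thomas algorithm). */
static void solve_line(double *x, size_t n, size_t stride, double a)
{
    double c[ADI_MAX_LINE];                 /* modified super-diagonal */
    double b = 1.0 + 2.0 * a;
    assert(n >= 1 && n <= ADI_MAX_LINE);
    c[0] = -a / b;
    x[0] = x[0] / b;
    for (size_t i = 1; i < n; ++i) {        /* forward elimination */
        double m = b + a * c[i - 1];
        c[i] = -a / m;
        x[i * stride] = (x[i * stride] + a * x[(i - 1) * stride]) / m;
    }
    for (size_t i = n - 1; i-- > 0; )       /* back substitution */
        x[i * stride] -= c[i] * x[(i + 1) * stride];
}

/* One ADI step on an nx*ny*nz grid stored with x varying fastest.
 * Inside each sweep, all lines are independent of one another. */
void adi_step(double *u, size_t nx, size_t ny, size_t nz, double a)
{
    for (size_t k = 0; k < nz; ++k)         /* x-solve */
        for (size_t j = 0; j < ny; ++j)
            solve_line(u + (k * ny + j) * nx, nx, 1, a);
    for (size_t k = 0; k < nz; ++k)         /* y-solve */
        for (size_t i = 0; i < nx; ++i)
            solve_line(u + k * ny * nx + i, ny, nx, a);
    for (size_t j = 0; j < ny; ++j)         /* z-solve */
        for (size_t i = 0; i < nx; ++i)
            solve_line(u + j * nx + i, nz, nx * ny, a);
}
```

The two loop levels around each `solve_line` call are where the multilevel parallelism comes from: lines within a sweep can be distributed freely, while the three sweeps must run in order.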


- 7 - Workshop on Pattern Analysis

BT Communication


- 8 - Workshop on Pattern Analysis

Explicit Operators

Stencil operators (explicit methods)
At each point of a 3-dimensional mesh apply:

[Figure: seven-point and 27-point stencils]
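As an illustration of a stencil operator, the following C sketch applies a seven-point stencil at every interior point of a 3-dimensional mesh. The storage layout (x varying fastest), the two coefficients `c0` and `c1`, and the function names are assumptions made for this example; the talk does not specify them.

```c
#include <assert.h>
#include <stddef.h>

/* Linear index of mesh point (i, j, k) with i varying fastest. */
static size_t idx(size_t i, size_t j, size_t k, size_t nx, size_t ny)
{
    return (k * ny + j) * nx + i;
}

/* Seven-point stencil: out = c0 * centre + c1 * (sum of six face
 * neighbours) at every interior point. Boundary points of out are
 * left untouched. */
void stencil7(const double *in, double *out,
              size_t nx, size_t ny, size_t nz, double c0, double c1)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    for (size_t k = 1; k + 1 < nz; ++k)
        for (size_t j = 1; j + 1 < ny; ++j)
            for (size_t i = 1; i + 1 < nx; ++i) {
                double nb = in[idx(i - 1, j, k, nx, ny)]
                          + in[idx(i + 1, j, k, nx, ny)]
                          + in[idx(i, j - 1, k, nx, ny)]
                          + in[idx(i, j + 1, k, nx, ny)]
                          + in[idx(i, j, k - 1, nx, ny)]
                          + in[idx(i, j, k + 1, nx, ny)];
                out[idx(i, j, k, nx, ny)] =
                    c0 * in[idx(i, j, k, nx, ny)] + c1 * nb;
            }
}
```

Because `out` is written and only `in` is read, all points can be updated in any order. A 27-point stencil would follow the same pattern with the full 3x3x3 neighbourhood.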


- 9 - Workshop on Pattern Analysis

Lower-Upper Triangular

Two-dimensional pipeline
Hyperplane algorithm

Dependence Matrices

$$\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
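With dependence vectors (-1,0,0), (0,-1,0), and (0,0,-1), a point (i,j,k) depends only on points of the previous hyperplane i+j+k = h-1, so all points of one hyperplane can be updated independently. The C sketch below compares a plain lexicographic sweep with a hyperplane-ordered sweep. The update rule (one plus the sum of the three predecessors) and the names `update`, `sweep_lexicographic`, `sweep_hyperplane`, and `HP_N` are assumptions for illustration, not the LU benchmark's actual kernel.

```c
#include <assert.h>
#include <stddef.h>

#define HP_N 4   /* assumed small cubic mesh, HP_N points per side */

/* Hypothetical update with dependence vectors (-1,0,0), (0,-1,0),
 * (0,0,-1): one plus the sum of the available predecessors. */
static long update(long u[HP_N][HP_N][HP_N], int i, int j, int k)
{
    long s = 1;
    if (i > 0) s += u[i - 1][j][k];
    if (j > 0) s += u[i][j - 1][k];
    if (k > 0) s += u[i][j][k - 1];
    return s;
}

/* Reference order: plain nested loops. */
void sweep_lexicographic(long u[HP_N][HP_N][HP_N])
{
    for (int i = 0; i < HP_N; ++i)
        for (int j = 0; j < HP_N; ++j)
            for (int k = 0; k < HP_N; ++k)
                u[i][j][k] = update(u, i, j, k);
}

/* Hyperplane order: planes i + j + k = h are visited in increasing h.
 * Points on the same plane do not depend on each other. */
void sweep_hyperplane(long u[HP_N][HP_N][HP_N])
{
    for (int h = 0; h <= 3 * (HP_N - 1); ++h)
        for (int i = 0; i < HP_N; ++i)
            for (int j = 0; j < HP_N; ++j) {
                int k = h - i - j;
                if (k >= 0 && k < HP_N)
                    u[i][j][k] = update(u, i, j, k);
            }
}
```

Both orders produce the same values. Only the hyperplane order exposes the parallelism, because the two inner loops of `sweep_hyperplane` carry no dependence.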


- 10 - Workshop on Pattern Analysis

LU Communication


- 11 - Workshop on Pattern Analysis

Multigrid V-Cycle

[Figure: V-cycle diagram with four Projection steps, one Smoothing step, and four Interpolation & Smoothing steps]
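The order of operations in a V-cycle can be sketched as a short recursion. The C code below only records which operation runs at which point; the grid operators themselves are omitted. It assumes the usual V-cycle shape (project down to the coarsest grid, smooth there, then interpolate and smooth on the way back up), and the names `mg_op`, `record`, and `v_cycle` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum mg_op { MG_PROJECT, MG_SMOOTH, MG_INTERP_SMOOTH };

/* Append one operation to the trace and return the new length. */
static size_t record(enum mg_op *trace, size_t len, size_t cap, enum mg_op op)
{
    assert(len < cap);
    trace[len] = op;
    return len + 1;
}

/* Record the operations of one V-cycle that starts on grid `level`
 * (level 0 is the coarsest grid). Returns the new trace length. */
size_t v_cycle(int level, enum mg_op *trace, size_t len, size_t cap)
{
    if (level == 0)                                      /* coarsest grid  */
        return record(trace, len, cap, MG_SMOOTH);
    len = record(trace, len, cap, MG_PROJECT);           /* fine to coarse */
    len = v_cycle(level - 1, trace, len, cap);
    return record(trace, len, cap, MG_INTERP_SMOOTH);    /* coarse to fine */
}
```

Starting from level 4, the trace holds four projections, one smoothing step, and four interpolation-and-smoothing steps, in that order.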


- 12 - Workshop on Pattern Analysis

MG Communication


- 13 - Workshop on Pattern Analysis

Data Flow Analysis

BT x_solve (serial) Call Graph

[Figure: call graph with loop nest labels do k=1,ksize; do i=1,isize; do j=1,jsize]


- 14 - Workshop on Pattern Analysis

Nest Data Flow Graph

[Figure: graph with nodes do_45, do_134, do_330]

Each arc represents an affinity relation


- 15 - Workshop on Pattern Analysis

NAS Parallel Benchmarks

Application Benchmarks
– CFD: BT, SP, LU
– Data Intensive: DC, DT, BTIO
– Computational Chemistry: UA

Kernel Benchmarks
– FT, CG, MG, IS

Verification
Performance Model
FORTRAN, C, HPF, Java*
Serial, MPI, OpenMP, Java* Threads

www.nas.nasa.gov/Software/NPB

* Other names and brands may be claimed as the property of others.


- 16 - Workshop on Pattern Analysis

NPB Performance on Altix* **

* Other names and brands may be claimed as the property of others.
** Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing.


- 17 - Workshop on Pattern Analysis

Basic Data Flow Patterns

Shuffles
– Sorting
– FFT
– Routing

Gather/Scatter
– Conjugate Gradient
– MD and FE codes
– Sparse matrices

Transpose
– FFT
– Sorting

Tree
– Parallel prefix, Reduction
– Sorting
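Two of these patterns can be shown in a few lines of C. The sketch below gives a gather (the indirect reads in a sparse matrix-vector product, the core operation of Conjugate Gradient) and a tree (a pairwise reduction). The CSR storage format and the names `csr_matvec` and `tree_sum` are choices made for this example and are not taken from the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Gather pattern: y = A * x for a sparse matrix in CSR format.
 * Row i owns entries row_ptr[i] .. row_ptr[i + 1] - 1 of col[] and val[]. */
void csr_matvec(size_t nrows, const size_t *row_ptr, const size_t *col,
                const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < nrows; ++i) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col[k]];    /* indirect (gathered) read of x */
        y[i] = sum;
    }
}

/* Tree pattern: sum a[0..n-1] in place by combining pairs at doubling
 * strides. The additions within one stride level are independent. */
double tree_sum(double *a, size_t n)
{
    assert(n >= 1);
    for (size_t stride = 1; stride < n; stride *= 2)
        for (size_t i = 0; i + stride < n; i += 2 * stride)
            a[i] += a[i + stride];
    return a[0];
}
```

In the gather, the access pattern to `x` is fixed by `col[]` and is therefore data dependent. In the tree, the pattern depends only on `n`.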


- 18 - Workshop on Pattern Analysis

HPC Challenge Benchmarks

HPL*
DGEMM*
STREAM*
PTRANS*
FFTE*
RandomAccess*
Effective Bandwidth b_eff*

icl.cs.utk.edu/hpcc

* Other names and brands may be claimed as the property of others.


- 19 - Workshop on Pattern Analysis

Programming With Directed Graphs

Arc
– Arc* newArc(Node *tail, Node *head)
– AttachArc(DGraph *dg)
– deleArc(Arc *ar)

Node
– newNode(char *name)
– Node* AttachNode(DGraph *dg)
– deleteNode(Node *nd)

DGraph
– DGraph* newDGraph(char *name)
– writeGraph(DGraph *dg, char* fname)
– DGraph * readGraph(char* fname)

Implemented in DT of NPB and in NGB

[Figure: graph node do_134]
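The slide lists only function names, so the following C sketch is a guess at how such an interface could fit together, not the implementation shipped with NPB or NGB. The struct layouts, the fixed capacity `DG_MAX`, the extra `Node *` and `Arc *` parameters on the attach functions, the `const` qualifiers, and the cleanup function `deleteDGraph` are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DG_MAX 16            /* assumed fixed capacity, for brevity */

typedef struct Node { char name[32]; } Node;
typedef struct Arc  { Node *tail; Node *head; } Arc;
typedef struct DGraph {
    char  name[32];
    Node *nodes[DG_MAX]; int num_nodes;
    Arc  *arcs[DG_MAX];  int num_arcs;
} DGraph;

static void copy_name(char *dst, const char *src)
{
    strncpy(dst, src, 31);
    dst[31] = '\0';
}

Node *newNode(const char *name)
{
    Node *nd = malloc(sizeof *nd);
    if (nd) copy_name(nd->name, name);
    return nd;
}

Arc *newArc(Node *tail, Node *head)
{
    Arc *ar = malloc(sizeof *ar);
    if (ar) { ar->tail = tail; ar->head = head; }
    return ar;
}

DGraph *newDGraph(const char *name)
{
    DGraph *dg = calloc(1, sizeof *dg);
    if (dg) copy_name(dg->name, name);
    return dg;
}

/* Assumed signature: the slide shows AttachNode(DGraph *dg) only. */
Node *AttachNode(DGraph *dg, Node *nd)
{
    if (!dg || !nd || dg->num_nodes == DG_MAX) return NULL;
    dg->nodes[dg->num_nodes++] = nd;
    return nd;
}

/* Assumed signature: the slide shows AttachArc(DGraph *dg) only. */
Arc *AttachArc(DGraph *dg, Arc *ar)
{
    if (!dg || !ar || dg->num_arcs == DG_MAX) return NULL;
    dg->arcs[dg->num_arcs++] = ar;
    return ar;
}

/* Hypothetical helper: free the graph and everything attached to it. */
void deleteDGraph(DGraph *dg)
{
    if (!dg) return;
    for (int i = 0; i < dg->num_arcs; ++i)  free(dg->arcs[i]);
    for (int i = 0; i < dg->num_nodes; ++i) free(dg->nodes[i]);
    free(dg);
}
```

A loop-nest graph with two nodes and one affinity arc could then be built with `newDGraph`, two `AttachNode(dg, newNode(...))` calls, and one `AttachArc(dg, newArc(a, b))` call.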


- 20 - Workshop on Pattern Analysis

Directed Graphs Around

Parse trees
File Systems
Application task graphs
Device Schematics

Visualization and layout tools
VCG tool
Edge tool
Tom Sawyer Software
Commercial tools


- 21 - Workshop on Pattern Analysis

Cart3D*

Performs CFD analysis on complex geometries

Uses six executables
– Intersect*: intersects geometry
– Cubes*: produces Cartesian meshes
– Reorder*: reorders meshes
– Mgprep*: coarsens mesh
– flowCart*: convergence acceleration
– Clic*: analyzes the flow

Executables communicate via files

Returns relevant forces
– Lift, Drag, Side Force

Task Graphs are rapidly growing

* Other names and brands may be claimed as the property of others.


- 22 - Workshop on Pattern Analysis

The NAS Grid Benchmarks

Reflect task level programming paradigm

Contain four patterns
– Embarrassingly Distributed (ED)
– Helical Chain (HC)
– Visualization Pipeline (VP)
– Mixed Bag (MB)

[Figure: Embarrassingly Distributed (ED): Launch, three rows of SP SP SP tasks, Report]
[Figure: Visualization Pipeline (VP): Launch, three rows of BT MG FT tasks, Report]
[Figure: Mixed Bag (MB): Launch, task rows FT8 FT8 FT2; LU2 LU4 LU8; MG4 MG8 MG2, Report]
[Figure: Helical Chain (HC): Launch, three rows of BT SP LU tasks, Report]
[Figure label: #steps]
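A task graph of this kind is just a list of tasks plus a list of dependence arcs. The C sketch below builds one for the Helical Chain pattern. It assumes that the tasks form a single chain BT, SP, LU, BT, SP, LU, ... in which each task depends on its predecessor; the slide shows only the rows of task names, so this linkage, like the `Task` and `Dep` types and the function name, is an assumption.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Task { const char *kind; } Task;        /* "BT", "SP" or "LU" */
typedef struct Dep  { size_t from; size_t to; } Dep;   /* arc from -> to     */

/* Fill tasks[] with rounds * 3 tasks cycling through BT, SP, LU, and
 * deps[] with one arc from each task to its successor.
 * tasks must hold rounds * 3 entries and deps rounds * 3 - 1 entries.
 * Returns the number of tasks. */
size_t build_helical_chain(size_t rounds, Task *tasks, Dep *deps)
{
    static const char *const kinds[3] = { "BT", "SP", "LU" };
    size_t n = rounds * 3;
    for (size_t t = 0; t < n; ++t) {
        tasks[t].kind = kinds[t % 3];
        if (t > 0) {                  /* chain: previous task feeds this one */
            deps[t - 1].from = t - 1;
            deps[t - 1].to = t;
        }
    }
    return n;
}
```

With `rounds = 3` this gives the nine tasks shown in the figure and eight arcs. An Embarrassingly Distributed graph would use the same `Task` array with no arcs between the tasks.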


- 23 - Workshop on Pattern Analysis

Data Dependent Patterns

Intermittent patterns
– Useful for application performance tuning

Visualization is important
– Exploits the human eye's ability to detect patterns

Automatic Pattern Mining
– OLAP approach

MPI communication patterns

Automatic Trace Analysis Using OLAP
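One basic OLAP operation is a roll-up: aggregating detailed records along a chosen set of dimensions. The C sketch below rolls message records up first by (sender, receiver) pair and then by sender alone. The record layout and all names are hypothetical; the talk does not describe its trace format or its OLAP tooling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trace record: one point-to-point message. */
typedef struct TraceRecord {
    int  src;     /* sending rank   */
    int  dst;     /* receiving rank */
    long bytes;   /* message size   */
} TraceRecord;

/* Roll up by (sender, receiver): matrix[src * nproc + dst] receives the
 * total bytes sent from src to dst. matrix must hold nproc * nproc longs. */
void rollup_by_pair(const TraceRecord *rec, size_t nrec, int nproc,
                    long *matrix)
{
    for (int i = 0; i < nproc * nproc; ++i)
        matrix[i] = 0;
    for (size_t r = 0; r < nrec; ++r) {
        assert(rec[r].src >= 0 && rec[r].src < nproc);
        assert(rec[r].dst >= 0 && rec[r].dst < nproc);
        matrix[rec[r].src * nproc + rec[r].dst] += rec[r].bytes;
    }
}

/* Roll up one level further: total bytes sent by each rank. */
void rollup_by_sender(const long *matrix, int nproc, long *sent)
{
    for (int s = 0; s < nproc; ++s) {
        sent[s] = 0;
        for (int d = 0; d < nproc; ++d)
            sent[s] += matrix[s * nproc + d];
    }
}
```

The pair-level matrix is the kind of table that a communication-pattern plot displays: nonzero entries near the diagonal suggest nearest-neighbour exchange, while a dense matrix suggests all-to-all traffic.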


- 24 - Workshop on Pattern Analysis

Conclusions

Data Flow in Applications

Application Parallelization
Application Understanding
Application Mapping
Application Performance