On-line detection of large-scale parallel application's structure
German Llort, Juan Gonzalez, Harald Servat, Judit Gimenez, Jesus Labarta
Barcelona Supercomputing Center, Universitat Politècnica de Catalunya


8/7/2019 On-line detection

http://slidepdf.com/reader/full/on-line-detection 1/16


Introduction (1/2)

- Trace-based performance analysis of large parallel applications has become a challenging task
  - Traces rapidly become unmanageable due to long runs and many processes
    - Saving all traces might be infeasible due to storage limitations
    - The vast amount of data degrades the responsiveness of the analysis tools
    - Irrelevant data can distort the results and hinder the understanding of the application's performance
  - Filtering irrelevant (either meaningless or repetitive) data is a first step toward an efficient analysis


Introduction (2/2)

- This paper proposes an on-line analysis framework:
  - i) Automatic analysis: users only specify a trace size
  - ii) Clustering technique: at runtime, a small region of the execution that represents the overall behavior of the application is chosen
  - iii) Selective collection: only performance data related to that region is stored in the trace
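Selective collection can be pictured as a time-window filter applied to the buffered events once the representative region is known. The sketch below is a minimal illustration under assumptions: the record type `Event` and the function `select_region` are hypothetical names, not MPItrace's interface, and the region is assumed to be given as a start and end timestamp.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """Hypothetical trace record; MPItrace's real record format differs."""
    task: int          # MPI rank that produced the event
    timestamp: float   # seconds since the start of the run
    kind: str          # illustrative label, e.g. "burst_end"


def select_region(events: List[Event], t_start: float, t_end: float) -> List[Event]:
    """Keep only the events that fall inside the representative region."""
    return [e for e in events if t_start <= e.timestamp <= t_end]
```

For example, with events at 1.0 s, 2.5 s and 4.0 s, `select_region(events, 2.0, 3.0)` keeps only the event at 2.5 s; everything outside the region is never written to the trace.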


Framework (1/7)

- System components interaction:
  - MPItrace intercepts calls and records the values
  - MRNet interconnects processes in a tree-like topology and summarizes data on its way
  - CPU bursts are grouped according to their similarity in terms of duration and performance counters, giving a fine-grain characterization of the application's structure
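The idea of summarizing data "on its way" up the tree can be illustrated with a level-by-level reduction. This is a plain-Python simulation, not MRNet's C++ API (in MRNet the same role is played by filters running on the internal tree nodes); the names `Summary`, `leaf_summary` and `tree_reduce`, the summary contents, and the fan-out value are all assumptions made for illustration.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List


@dataclass
class Summary:
    """Hypothetical per-node summary: number of CPU bursts and their total time."""
    bursts: int = 0
    total_duration: float = 0.0  # seconds

    def merge(self, other: "Summary") -> "Summary":
        return Summary(self.bursts + other.bursts,
                       self.total_duration + other.total_duration)


def leaf_summary(durations: List[float]) -> Summary:
    """Summary produced by one application process (a leaf of the tree)."""
    return Summary(len(durations), sum(durations))


def tree_reduce(leaves: List[Summary], fanout: int = 2) -> Summary:
    """Merge summaries level by level, as the internal nodes of a tree would."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not leaves:
        return Summary()
    level = leaves
    while len(level) > 1:
        # Each internal node merges the summaries of up to `fanout` children.
        level = [reduce(Summary.merge, level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]
```

With four processes whose burst durations are `[0.5, 0.25]`, `[1.0]`, `[0.25, 0.25, 0.5]` and `[2.0]`, the root receives `Summary(bursts=7, total_duration=4.75)`: the front-end sees one small record instead of every individual burst.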


Framework (2/7)

- Data acquisition
  - MPItrace gathers information from the processors whenever any of the instrumented events occurs
    - e.g., elapsed cycles, completed instructions, cache misses
    - Values are stored per task in separate memory buffers, and every new event overwrites the oldest one
    - Data for analysis belongs to a time region where all processes are active simultaneously
- Data transmission
  - A back-end thread per process connects to the tool's front-end through the MRNet network
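The per-task buffers that overwrite their oldest entry behave like fixed-capacity ring buffers. The sketch below assumes that events arrive in timestamp order and interprets "a time region where all processes are active simultaneously" as the interval covered by every task's buffer; the class `TaskBuffer` and the function `common_window` are hypothetical names, not part of MPItrace.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


class TaskBuffer:
    """Fixed-capacity per-task buffer: a new event evicts the oldest one."""

    def __init__(self, capacity: int):
        # A deque with maxlen silently drops items from the left when full.
        self.events: deque = deque(maxlen=capacity)

    def record(self, timestamp: float, counters: Dict[str, int]) -> None:
        self.events.append((timestamp, counters))

    def time_span(self) -> Optional[Tuple[float, float]]:
        """Oldest and newest timestamps still held (assumes in-order arrival)."""
        if not self.events:
            return None
        return self.events[0][0], self.events[-1][0]


def common_window(buffers: List[TaskBuffer]) -> Optional[Tuple[float, float]]:
    """Interval for which every task still has data in its buffer."""
    spans = [b.time_span() for b in buffers]
    if not spans or any(s is None for s in spans):
        return None
    start = max(s[0] for s in spans)
    end = min(s[1] for s in spans)
    return (start, end) if start <= end else None
```

If task 0 keeps its last three events at 3, 4 and 5 s while task 1 holds events at 2, 4 and 6 s, the window usable for analysis is (3.0, 5.0). A bounded buffer keeps memory use constant no matter how long the application runs.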


Framework (3/7)

  - All communication processes run on a separate set of processors to reduce the burden on the application tasks
  - Communication processes transfer the buffered performance data when awakened by a broadcast message from the front-end
- Data analysis
  - The main purpose is to detect computing regions (i.e., CPU bursts) with similar behavior in order to identify the computation structure (i.e., the application's phases)
    - Every CPU burst is defined by its duration and a set of performance metrics read at the start and end of the region
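A CPU burst can be derived from two counter readings: one taken when the computation region starts and one when it ends. In the sketch below the names `CounterSample`, `CpuBurst` and `make_burst` are hypothetical, the counters are assumed to be cumulative, and instructions per cycle (IPC) is only an example of a derived metric that a clustering step could use.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CounterSample:
    time: float               # seconds
    counters: Dict[str, int]  # cumulative hardware counter values


@dataclass
class CpuBurst:
    duration: float           # seconds between the two samples
    deltas: Dict[str, int]    # counter increments within the burst

    def ipc(self) -> float:
        """Instructions per cycle; 0.0 when no cycles were recorded."""
        cycles = self.deltas.get("cycles", 0)
        return self.deltas.get("instructions", 0) / cycles if cycles else 0.0


def make_burst(start: CounterSample, end: CounterSample) -> CpuBurst:
    """Build a burst from the samples taken at the start and end of the region."""
    deltas = {name: end.counters[name] - start.counters[name]
              for name in start.counters if name in end.counters}
    return CpuBurst(end.time - start.time, deltas)
```

A burst sampled at 1.0 s with 1,000 cycles and 1,500 instructions and at 1.5 s with 3,000 cycles and 5,500 instructions yields a duration of 0.5 s, 2,000 cycles, 4,000 instructions and an IPC of 2.0.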


Framework (4/7)

  - The clustering algorithm uses these metrics to characterize the application
    - Only a small subset of the bursts is clustered, to speed up the clustering process
    - Applications such as Gromacs, SPECFEM3D and NAS BT generated 50,000 bursts in 30 seconds, which can take up to 10 minutes to analyze
    - The remaining bursts are assigned to their closest cluster using a nearest-neighbor search
    - Reducing the data through selection strategies (varying the sampling time or the set of sampled processes) cut the analysis time to 5-10 seconds
  - The results are presented as a numerical report with the average values and a scatter plot
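The "cluster a subset, classify the rest" strategy can be sketched in two steps: pick a subset of bursts, then give every remaining burst the label of its nearest clustered burst. The clustering algorithm itself is not specified on these slides, so the subset labels are taken as input here; the feature tuples, the every-`step`-th sampling rule, and the names `sample_every` and `classify` are assumptions for illustration. The search is brute-force 1-nearest-neighbor, which is fine for a sketch but would be replaced by a spatial index at 50,000 bursts.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # e.g. (normalized duration, normalized IPC)


def sample_every(points: Sequence[Point], step: int) -> Tuple[List[Point], List[Point]]:
    """Split bursts into a subset to cluster (every step-th) and the rest."""
    if step < 1:
        raise ValueError("step must be at least 1")
    subset = [p for i, p in enumerate(points) if i % step == 0]
    rest = [p for i, p in enumerate(points) if i % step != 0]
    return subset, rest


def nearest_label(p: Point, refs: Sequence[Point], labels: Sequence[int]) -> int:
    """Label of the closest already-clustered burst (Euclidean 1-NN)."""
    best = min(range(len(refs)), key=lambda i: math.dist(p, refs[i]))
    return labels[best]


def classify(rest: Sequence[Point], refs: Sequence[Point],
             labels: Sequence[int]) -> List[int]:
    """Assign every remaining burst to the cluster of its nearest neighbor."""
    return [nearest_label(p, refs, labels) for p in rest]
```

If the subset `[(0.1, 0.1), (5.0, 5.0), (0.0, 0.2)]` was clustered as `[0, 1, 0]`, the remaining bursts `(0.2, 0.0)`, `(5.1, 4.9)` and `(4.9, 5.2)` are labeled `[0, 1, 1]` without running the clustering algorithm on them.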


Framework (5/7)


Framework (6/7)

- Tracking the application's evolution
  - Whenever the application produces a new volume of data of a given size, a new clustering analysis is triggered
  - Once a stable region has been detected, the clustering results are transferred back to the back-end threads, and every CPU burst is labeled with the cluster it belongs to
    - The application is considered stable when several clusterings in a row are equivalent
    - Two clusterings are considered equivalent if the matching clusters represent at least 85% of the total computation time
  - Along with the cluster distribution, all performance data within the same time interval is flushed from the tracing buffers to produce a detailed trace of that region
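The stability test can be sketched as an equivalence check between consecutive clusterings plus a counter of how many equivalent results occurred in a row. Only the 85% threshold comes from the slide. The rest is assumed: clusters are matched by centroid distance within a hypothetical tolerance `tol`, the time fraction is measured over the newer clustering, and `required` (how many consecutive equivalent comparisons count as "several") is a hypothetical parameter.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cluster:
    centroid: Tuple[float, ...]  # e.g. (normalized duration, normalized IPC)
    time: float                  # total computation time of its bursts (s)


def equivalent(prev: List[Cluster], curr: List[Cluster],
               tol: float = 0.1, threshold: float = 0.85) -> bool:
    """True if clusters of `curr` that match one in `prev` cover >= threshold of its time."""
    total = sum(c.time for c in curr)
    if total == 0:
        return False
    matched = sum(c.time for c in curr
                  if any(math.dist(c.centroid, p.centroid) <= tol for p in prev))
    return matched / total >= threshold


class StabilityTracker:
    """Reports stability after `required` consecutive equivalent comparisons."""

    def __init__(self, required: int = 3):
        self.required = required
        self.streak = 0
        self.prev: List[Cluster] = []

    def update(self, clusters: List[Cluster]) -> bool:
        """Feed the result of a new clustering; return True once stable."""
        if self.prev and equivalent(self.prev, clusters):
            self.streak += 1
        else:
            self.streak = 0
        self.prev = clusters
        return self.streak >= self.required
```

For instance, a new clustering whose matched clusters account for 9 of 10 seconds of computation is equivalent (90% >= 85%), while one that matches only 8 of 10 seconds is not, so the streak restarts. Once `update` returns True, the back-ends can label their bursts and flush the corresponding time interval.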


Framework (7/7)


Experimental Setup

- MareNostrum supercomputer
  - A cluster comprising 10,240 IBM PowerPC 970MP processors at 2.3 GHz, interconnected by a Myrinet network


Gromacs (1/2)

- An engine to perform molecular dynamics simulations and energy minimization
  - 64 MPI tasks with 10 iterations
  - Indication of potential load imbalance


Gromacs (2/2)


Zeus-MP

- A computational fluid dynamics code for the simulation of astrophysical phenomena
  - 256 MPI tasks with 4 iterations


SPECFEM3D


Quality of the results