Automatic Detection of Performance Anomalies in Task-Parallel Programs

Work in progress on the Aftermath trace analysis tool

Andi Drebes

Université Pierre et Marie Curie, Laboratoire d’Informatique de Paris VI

andi.drebes@lip6.fr

Joint work with: Antoniu Pop, Karine Heydemann, Albert Cohen, Nathalie Drach

RACING’14, May 30th, 2014

OpenStream

Context

Hardware and software environment

I Multi-core / many-core systems

I How to exploit the hardware efficiently?

I Task-parallel languages based on fine-grained tasks

Performance debugging

I Requires analysis of complex interactions at execution time: Application / Run-time / Machine

I Possible solution: Record dynamic events to a trace file

I Post-mortem (offline) analysis

Aftermath

I Trace visualization and support for manual analysis

I Originally developed for OpenStream language & run-time

I Work in progress: Automate repetitive tasks

Andi Drebes – Automatic Detection of Performance Anomalies in Task-Parallel Programs 1 / 16

Outline

1. Aftermath

2. Insufficient parallelism and its causes

3. Performance anomalies during task execution

4. Work in progress: Status

5. Summary & Questions

Aftermath

[Screenshot: Aftermath main window with timeline, filters, detailed text view, statistics, and a menu bar for derived metrics]

[Screenshot: timeline showing activity during execution, with time and processors as axes]

I Task execution (dark blue), shown per task instance

I Task creation (white)

I Searching for work (light blue)

I Basic statistics for run-time states

I Heatmap indicating task duration (white: fast, red: slow)

I NUMA heatmap indicating locality of memory accesses (blue: local, pink: remote)

Navigating through a trace

I Huge amounts of high-dimensional data

I Lots of features → lots of possibilities for analysis

I Where to look? What to look for?

I Expertise & lots of time required

Guide the user through performance analysis

I Refine analysis in several steps
  I Start by analyzing parallelism
  I Analyze what happens inside tasks

I Automate repetitive tasks

Detecting insufficient parallelism

[Figure: per-processor timelines (P0 ... Pn-1) over time, distinguishing task execution from other states, for two cases]

Ideal situation: all CPUs are in the task execution state without any interruption

Realistic scenario: task creation, memory allocation, idle time, over-synchronization, ...

Detect insufficient parallelism / high overhead automatically

Threshold-based analysis of parallelism

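[Figure: per-processor timelines (P0 ... Pn-1) over an interval of duration d, distinguishing task execution from other states; the task execution times $d_{e,0}, \ldots, d_{e,n-1}$ of the processors are added up]

$d$: duration of the interval

$d_{e,i}$: time that processor $i$ spends in the task execution state

$t_e$: threshold for task execution, e.g. $t_e = 0.95$

$n$: number of processors

Consider that there is sufficient parallelism if the following inequality holds:

$$\sum_{i=0}^{n-1} d_{e,i} > t_e \cdot n \cdot d$$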
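The threshold test can be sketched in a few lines of Python. This is only an illustration: the function name, the list-based input, and the sample numbers are assumptions made here and are not part of Aftermath.

```python
def has_sufficient_parallelism(exec_times, interval_duration, threshold=0.95):
    """Check whether the summed task execution time exceeds t_e * n * d.

    exec_times holds one entry per processor: the time that processor
    spent in the task execution state during the interval.
    """
    n = len(exec_times)
    return sum(exec_times) > threshold * n * interval_duration


# Hypothetical data: four processors observed over an interval of 100 time units.
busy = [98.0, 97.0, 99.0, 96.0]         # sum = 390, above 0.95 * 4 * 100 = 380
partly_idle = [98.0, 97.0, 99.0, 60.0]  # sum = 354, below 380

print(has_sufficient_parallelism(busy, 100.0))
print(has_sufficient_parallelism(partly_idle, 100.0))
```

A single processor that is idle for 40% of the interval is enough to push the second example below the 0.95 threshold.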

Detecting the cause of insufficient parallelism

Multiple stages during analysis

I If the inequality does not hold, find out why

I Possible causes: task creation overhead, memory allocation, not enough tasks available for execution, ...

I Use thresholds for the associated states: $t_c$ (task creation), $t_i$ (idle time)

Interval selection

[Figure: ten numbered intervals, 1 to 10]

I Multiple intervals: initialization, termination, etc.

I Repeat analysis for different intervals

Per-interval analysis of parallelism & overhead

[Flowchart: per-interval analysis]

1. Choose an interval. If no interval is left, the analysis ends.

2. Analyze task execution time. Above the threshold: sufficient parallelism, continue with the next interval. Below the threshold: continue with step 3.

3. Analyze the other states, each against its own threshold (above / below):
   I Idle time: not enough parallelism exposed
   I Task synchronization: inefficient synchronization
   I Task creation time: high task creation overhead
   I ...
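The per-interval procedure can be sketched as follows. The state names, the threshold values, and the input layout (one dictionary of summed state times per interval) are hypothetical choices for this example; the slides do not specify how Aftermath represents them.

```python
# Hypothetical state names mapped to the diagnosis reported for each state.
DIAGNOSES = {
    "idle": "not enough parallelism exposed",
    "synchronization": "inefficient synchronization",
    "task_creation": "high task creation overhead",
}


def diagnose_interval(state_times, n_procs, duration, t_exec=0.95,
                      overhead_thresholds=None):
    """Classify one interval.

    state_times maps a state name to the time spent in that state,
    summed over all processors.
    """
    if overhead_thresholds is None:
        # Assumed thresholds: flag a state that takes more than 5% of the capacity.
        overhead_thresholds = {"idle": 0.05, "synchronization": 0.05,
                               "task_creation": 0.05}
    capacity = n_procs * duration
    if state_times.get("task_execution", 0.0) > t_exec * capacity:
        return ["sufficient parallelism"]
    findings = [DIAGNOSES[state]
                for state, limit in overhead_thresholds.items()
                if state_times.get(state, 0.0) > limit * capacity]
    return findings or ["cause not identified"]


def analyze(intervals, n_procs):
    """Repeat the diagnosis for every (duration, state_times) interval."""
    return [diagnose_interval(times, n_procs, duration)
            for duration, times in intervals]


intervals = [
    (100.0, {"task_execution": 390.0, "idle": 5.0, "task_creation": 5.0}),
    (100.0, {"task_execution": 300.0, "idle": 70.0, "task_creation": 25.0,
             "synchronization": 5.0}),
]
report = analyze(intervals, n_procs=4)
```

In this sketch an interval can receive several diagnoses at once, which is one possible reading of the parallel branches in the flowchart.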

Detecting performance anomalies during task execution

During task execution

I Performance anomaly possible even at 100% task execution (ineffective use of caches, remote memory accesses, branch misprediction)

[Figure: two histograms showing the number of tasks over task duration]

Impact on the distribution of task duration

I Slowdown of all tasks

I Different groups / peaks

Using performance counters

Hardware performance counters

I Implemented in hardware, no slowdown of the application

I Low tracing overhead if sampled at beginning / end of a task

I Dozens of hardware events can be monitored

Automatic analysis of performance counters

I Which hardware events are relevant?

I Manual testing tedious & time consuming

Analyzing performance counters

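[Figure: value $v(c, i, t)$ of a counter on processor $P_i$ over time; a task $T$ starts at timestamp $s$ and ends at timestamp $e$, and $N_{c,T}$ is the increase of the counter in between]

Per-CPU performance counter

I Absolute value $v(c, i, t)$, monotonically increasing
  I $c$: counter (e.g. cache misses)
  I $i$: processor identifier
  I $t$: timestamp

I Sampled at the beginning and end of a task

Break down counter evolution to task instances

I Increase of $c$ by task $T$: $N_{c,T} = v(c, i, e) - v(c, i, s)$, where $s$ and $e$ are the start and end timestamps of $T$ on processor $i$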
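The breakdown to task instances amounts to one subtraction per task. The sketch below assumes a dictionary of samples keyed by processor and timestamp; this layout, the `TaskInstance` type, and the sample values are invented for illustration and do not reflect the Aftermath trace format.

```python
from dataclasses import dataclass


@dataclass
class TaskInstance:
    cpu: int    # processor identifier i
    start: int  # timestamp s at which the task starts
    end: int    # timestamp e at which the task ends


def counter_increase(samples, task):
    """Return N_{c,T} = v(c, i, e) - v(c, i, s) for one counter c.

    samples maps (cpu, timestamp) to the absolute counter value.
    """
    return samples[(task.cpu, task.end)] - samples[(task.cpu, task.start)]


# Hypothetical cache-miss samples taken at task boundaries on processors 0 and 1.
samples = {(0, 10): 1000, (0, 50): 1400, (0, 90): 1450,
           (1, 20): 7000, (1, 80): 9500}
tasks = [TaskInstance(0, 10, 50), TaskInstance(0, 50, 90), TaskInstance(1, 20, 80)]

increases = [counter_increase(samples, t) for t in tasks]
print(increases)
```

Because the counter is monotonically increasing and sampled exactly at task boundaries, every difference is non-negative and attributable to a single task instance.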

Linear regression

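[Figure: scatter plot of task duration $d_{T_j}$ against the performance indicator value $N_{c,T_j}$]

Perform linear regression

I Assume a linear model: $d_{T_j} = \alpha \cdot N_{c,T_j} + \beta$ ($\alpha$ and $\beta$ constant)

I Compare the coefficient of determination with a threshold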
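A least-squares fit and the resulting coefficient of determination can be computed without external libraries. The sketch below is illustrative only: the function names and the threshold of 0.8 are assumptions, since the slides do not state which value Aftermath uses.

```python
def linear_fit(xs, ys):
    """Least-squares fit of ys = alpha * xs + beta.

    Returns (alpha, beta, r_squared). Assumes that neither xs nor ys
    is constant, otherwise the divisions below are undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    ss_res = sum((y - (alpha * x + beta)) ** 2 for x, y in zip(xs, ys))
    return alpha, beta, 1.0 - ss_res / syy


def counter_is_relevant(counter_increases, durations, r2_threshold=0.8):
    """Treat a counter as relevant if the linear model explains the durations well."""
    _, _, r2 = linear_fit(counter_increases, durations)
    return r2 > r2_threshold


# Hypothetical per-task values: durations follow 10 * mispredictions + 500 exactly.
mispredictions = [100, 200, 300, 400]
durations = [1500, 2500, 3500, 4500]
unrelated = [3000, 1000, 1000, 3000]

alpha, beta, r2 = linear_fit(mispredictions, durations)
```

A coefficient of determination close to 1 means the counter explains most of the variation in task duration; a value near 0, as for the `unrelated` series, means it explains almost none.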

Shortcuts & refinement

Variation of task duration

I Determine coefficient of variation for task duration

I Only perform analysis if significant
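The coefficient of variation is the standard deviation divided by the mean. The sketch below uses the population standard deviation and a cut-off of 0.1; both choices are assumptions made for this example, as the slides do not say how "significant" is defined.

```python
import statistics


def coefficient_of_variation(durations):
    """Population standard deviation of the durations divided by their mean."""
    return statistics.pstdev(durations) / statistics.mean(durations)


def worth_analyzing(durations, cv_threshold=0.1):
    """Only continue with the counter analysis if durations vary noticeably."""
    return coefficient_of_variation(durations) > cv_threshold


# Hypothetical task durations in cycles.
uniform = [1000, 1010, 990, 1000]    # nearly identical durations
bimodal = [1000, 1000, 3000, 3000]   # two distinct groups of tasks

print(worth_analyzing(uniform), worth_analyzing(bimodal))
```

Skipping task sets with nearly uniform durations avoids fitting a regression where there is no variation to explain.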

Task types

I Different task types in an application
  I Auxiliary tasks: initialization, termination
  I Work tasks: matrix multiplication, decomposition, etc.

I Performance anomaly not necessarily present in all types

Topology of the machine

I Anomaly only present on subset of processors

I Example: Memory accesses local for one NUMA node, remote on another

Per-interval analysis of parallelism & overhead

[Flowchart: correlation of performance counters with task duration]

1. Choose a task type.

2. Choose a set of processors.

3. Check the variation of task duration. Low: no counter analysis for this selection. High: continue with step 4.

4. Choose a set of performance counters.

5. Break down the counter values to task instances.

6. Correlate the per-task-instance values with the task duration. Low: event set irrelevant. High: event set relevant.

Two edges labeled "No set left" lead back to an earlier selection step once every set at the current level has been tried.
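The nested selection loop can be sketched as below. The sketch assumes that counter values have already been broken down to task instances, and it uses an invented record layout (dictionaries with `type`, `cpu`, `duration`, and `counters` keys) and invented thresholds; none of these come from Aftermath.

```python
import statistics


def r_squared(xs, ys):
    """Coefficient of determination of a least-squares line through (xs, ys)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series cannot explain any variation
    return (sxy * sxy) / (sxx * syy)


def relevant_counters(tasks, processor_sets, cv_threshold=0.1, r2_threshold=0.8):
    """Map (task type, processor set name) to the counters judged relevant."""
    results = {}
    for task_type in sorted({t["type"] for t in tasks}):
        for set_name, cpus in processor_sets.items():
            selected = [t for t in tasks
                        if t["type"] == task_type and t["cpu"] in cpus]
            if len(selected) < 2:
                continue
            durations = [t["duration"] for t in selected]
            cv = statistics.pstdev(durations) / statistics.mean(durations)
            if cv <= cv_threshold:
                continue  # low variation of task duration: skip this selection
            for counter in sorted(selected[0]["counters"]):
                values = [t["counters"][counter] for t in selected]
                if r_squared(values, durations) > r2_threshold:
                    results.setdefault((task_type, set_name), []).append(counter)
    return results


# Hypothetical per-task records with already broken-down counter increases.
tasks = [
    {"type": "work", "cpu": 0, "duration": 1500,
     "counters": {"branch_misses": 100, "cache_misses": 30}},
    {"type": "work", "cpu": 1, "duration": 2500,
     "counters": {"branch_misses": 200, "cache_misses": 10}},
    {"type": "work", "cpu": 0, "duration": 3500,
     "counters": {"branch_misses": 300, "cache_misses": 10}},
    {"type": "work", "cpu": 1, "duration": 4500,
     "counters": {"branch_misses": 400, "cache_misses": 30}},
    {"type": "init", "cpu": 0, "duration": 1000,
     "counters": {"branch_misses": 5, "cache_misses": 7}},
    {"type": "init", "cpu": 1, "duration": 1001,
     "counters": {"branch_misses": 9, "cache_misses": 3}},
]
found = relevant_counters(tasks, {"all": {0, 1}})
```

For a simple linear model with one predictor, the coefficient of determination equals the squared correlation coefficient, which is what `r_squared` computes directly.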

Example: K-means branch misprediction

[Figure: histogram of task durations; x-axis: task duration [cycles], 0 to 2e+07; y-axis: number of tasks, 0 to 2500]

[Figure: scatter plot; x-axis: branch mispredictions, 0 to 100000; y-axis: task duration [cycles], 0 to 2.0×10^7]

[Figure: annotations mark a low misprediction rate and a high misprediction rate]

Work in progress: Status

Analysis of parallelism

I Per-interval analysis of time spent on task execution

I Per-interval analysis of time spent in run-time states

I Support for thresholds

I Loop performing analysis on set of intervals

Correlation of performance indicators

I Support for performance counters

I Task duration histogram

I Analysis of the variation of task durations

I Breaking down performance counter values to task instances

I Linear regression

Summary

Aftermath

I Tool for trace-based analysis of task-parallel programs

I Currently provides only support for manual analysis

I Available at http://openstream.info/aftermath

Automatic analysis of parallelism based on thresholds

I Amount of time spent on task execution sufficiently high?

I If not, perform subsequent threshold-based analysis for states associated with overhead of the run-time system

Automatic correlation of performance indicators

I Indicate which events are relevant

I Break down counter evolution to task instances

I Correlate with task duration
