Paradyn Week – April 14, 2004 – Madison, WI DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Bernd Mohr Forschungszentrum Jülich [email protected] * Work done while authors were at IBM Research * Seetharami Seelam University of Texas El Paso [email protected] Luiz DeRose Cray Inc. [email protected] *


Paradyn Week – April 14, 2004 – Madison, WI

DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications

Bernd Mohr
Forschungszentrum Jülich

Seetharami Seelam *
University of Texas El Paso

Luiz DeRose *
Cray Inc.

* Work done while authors were at IBM Research


DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Luiz DeRose

Outline

What is POMP

What is DPCL

IBM compiler and run-time library features that make DPOMP possible

Implementation

    POMP features not supported (and why)

Probe libraries

    POMPROF

    KOJAK

Conclusions


“Standard” OpenMP Monitoring API?

Problem: OpenMP (unlike MPI) does not define a standard monitoring interface

    OpenMP is defined mainly by directives/pragmas

Solution: POMP: OpenMP Monitoring Interface

    Joint development

        Forschungszentrum Jülich

        University of Oregon

    Presented at EWOMP’01, LACSI’01 and SC’01

    “The Journal of Supercomputing”, 23, Aug. 2002.


What is POMP?

Portable cross-platform/cross-language API to simplify the design and implementation of OpenMP tools

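POMP was motivated by the MPI profiling interface (PMPI)

    PMPI allows selective replacement of MPI routines at link time

    Used by most MPI performance tools

        VampirTrace

        MP-Profiler

[Figure: PMPI interposition. Without a profiling library, the user program's calls to MPI_Bcast and MPI_Send go to the MPI library, which provides MPI_Bcast, MPI_Send, and PMPI_Send. With a profiling library, the call to MPI_Send goes to the profiling library's MPI_Send, which uses PMPI_Send from the MPI library; MPI_Bcast is still served by the MPI library.]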
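The name-shifting idea behind PMPI can be sketched in plain C without an MPI installation. In the sketch below, `demo_PMPI_Send` and `demo_MPI_Send` are hypothetical stand-ins, not real MPI routines, and their argument lists are simplified for illustration. The first plays the role of the library's name-shifted entry point; the second plays the role of the wrapper that a profiling library supplies under the public name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: NOT real MPI routines. They only mimic the
 * PMPI naming pattern so the sketch compiles without an MPI library. */

static size_t bytes_delivered = 0;   /* state of the stand-in "library" */
static size_t send_calls_seen = 0;   /* counter kept by the profiling wrapper */

/* Plays the role of PMPI_Send: the name-shifted entry point that does the work. */
int demo_PMPI_Send(const void *buf, size_t nbytes)
{
    (void)buf;                       /* a real library would transmit the buffer */
    bytes_delivered += nbytes;
    return 0;                        /* 0 stands for success in this sketch */
}

/* Plays the role of the profiling library's MPI_Send: record, then forward. */
int demo_MPI_Send(const void *buf, size_t nbytes)
{
    assert(buf != NULL || nbytes == 0);
    send_calls_seen++;               /* monitoring work happens in the wrapper */
    return demo_PMPI_Send(buf, nbytes);
}

/* Accessors so a caller can inspect what the wrapper and library recorded. */
size_t demo_send_calls(void)      { return send_calls_seen; }
size_t demo_bytes_delivered(void) { return bytes_delivered; }
```

With a real MPI library the wrapper would carry the name `MPI_Send`, take the standard MPI argument list, and forward to `PMPI_Send`; the application itself stays unchanged, and only the link step decides whether the wrapper is picked up.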


A Brief History of OpenMP Instrumentation

POMP1: OpenMP performance monitoring interface

    Forschungszentrum Jülich, University of Oregon

    Published in "The Journal of Supercomputing", Vol. 23, 2002.

European IST Project INTONE

    Development of OpenMP tools (includes monitoring interface)

    Pallas, CEPBA, Royal Inst. of Technology, Tech. Univ. Dresden

    http://www.cepba.upc.es/intone/

Intel KAI Software Laboratory (KSL) - POMP

    Development of OpenMP monitoring interface inside ASCI

    Based on POMP, but further developed in other directions

Joint proposal presented at EWOMP’02

    POMP2 == POMP


POMP Proposal

Three groups of events

    OpenMP constructs and directives/pragmas

        Enter/Exit around each OpenMP construct

        Begin/End around associated body

        Special case for parallel loops: ChunkBegin/End, IterBegin/End, or IterEvent instead of Begin/End

        “Single” events for small constructs like atomic or flush

    OpenMP API calls

        Enter/Exit events around omp_set_*_lock() functions

        “Single” events for all API functions

    User functions and regions

Allows application programmers to specify and control the amount of instrumentation


Example: Standard Instrumentation

Original program:

    1: int main() {
    2:   int id;
    3:
    4:   #pragma omp parallel private(id)
    5:   {
    6:     id = omp_get_thread_num();
    7:     printf("hello from %d\n", id);
    8:   }
    9: }

Instrumented program (lines marked *** are inserted by the instrumentation; Enter/Exit surround the construct, Begin/End surround its body):

    1: int main() {
    2:   int id;
    3:
    ***  POMP_Init();
    ***  { POMP_handle_t pomp_hd1 = 0;
    ***    int32 pomp_tid = omp_get_thread_num();
    ***    POMP_Parallel_enter(&pomp_hd1, pomp_tid, -1, 1,
    ***      "49*type=pregion*file=demo.c*slines=4,4*elines=8,8**");
    4:   #pragma omp parallel private(id)
    5:   {
    ***    int32 pomp_tid = omp_get_thread_num();
    ***    POMP_Parallel_begin(pomp_hd1, pomp_tid);
    6:     id = omp_get_thread_num();
    7:     printf("hello from %d\n", id);
    ***    POMP_Parallel_end(pomp_hd1, pomp_tid);
    8:   }
    ***    POMP_Parallel_exit(pomp_hd1, pomp_tid);
    ***  }
    ***  POMP_Finalize();
    9: }
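The last argument of POMP_Parallel_enter in the example is a descriptor string that carries compile-time context such as the region type, the file name, and the start and end lines. A monitoring library has to pull these fields apart before it can label its measurements. The sketch below shows one way to do that; `ctc_get` is a hypothetical helper, and the field layout ('*'-separated `key=value` pairs after a leading number) is inferred only from the single example string on the slide, so the real format may differ.

```c
#include <assert.h>
#include <string.h>

/* Copy the value of `key` from a '*'-separated "key=value" descriptor into
 * `out`. Returns 1 if the key was found, 0 otherwise. The layout is an
 * assumption inferred from the one example string on the slide. */
int ctc_get(const char *ctc, const char *key, char *out, size_t outsz)
{
    assert(ctc != NULL && key != NULL && out != NULL);

    size_t klen = strlen(key);
    const char *p = ctc;

    if (outsz == 0)
        return 0;

    while (*p != '\0') {
        const char *end = strchr(p, '*');      /* end of the current field */
        size_t flen = end ? (size_t)(end - p) : strlen(p);

        /* A matching field starts with "key=". */
        if (flen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = flen - klen - 1;
            if (vlen >= outsz)
                vlen = outsz - 1;              /* truncate to fit the buffer */
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        if (end == NULL)
            break;
        p = end + 1;                           /* skip the '*' separator */
    }
    return 0;
}
```

Applied to the string from the example, `ctc_get(ctc, "file", ...)` yields `demo.c` and `ctc_get(ctc, "slines", ...)` yields `4,4`. In this one example the leading `49` equals the number of characters that follow it; whether that holds in general is an assumption that the slide does not confirm.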


Dynamic Performance Monitoring Interface for OpenMP

Collaboration with Forschungszentrum Jülich

Motivation:

    POMP under review by the OpenMP ARB!

    May take too long to be implemented (if accepted)

Approach:

    A POMP implementation based on dynamic probes

    Built on top of DPCL

    Modifies the binary with performance instrumentation

    No source code or re-compilation required


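The IBM compiler and run-time library: DPOMP Instrumentation

[Figure: source code shown next to the compiler-generated code, with the DPOMP instrumentation points marked.

Source code: main() calls A(); A() contains an OMP parallel region with an OMP loop inside it.

Compiler generated: main() calls A(); A() calls the run-time library routine xlf_Par (label: master thread); the outlined function A@OL1 (label: all threads) calls the run-time library routine xlf_DoPar; the outlined function A@OL1@OL2 holds the loop "do I=start,end / loop body / enddo".

DPOMP instrumentation: POMP_Function_enter and POMP_Function_exit, POMP_Parallel_enter and POMP_Parallel_exit, POMP_Parallel_begin and POMP_Parallel_end, POMP_Loop_enter and POMP_Loop_exit, POMP_Loop_chunk_begin and POMP_Loop_chunk_end.]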
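The outlining scheme in the figure can be illustrated with a small C sketch. Everything here is a stand-in: `fake_runtime_parallel` plays the role of the run-time library entry point, `A_outlined` plays the role of the compiler-generated function for the parallel region, threads are simulated by sequential calls, and `probe` only logs an event name. Where each probe is placed is an assumption that follows the Enter/Exit (around the construct) and Begin/End (around the body) semantics of the POMP proposal; it is not taken from the DPOMP source.

```c
#include <assert.h>
#include <string.h>

#define MAX_EVENTS 32

static const char *event_log[MAX_EVENTS];  /* names of events, in firing order */
static int n_events = 0;

/* Stand-in for an inserted probe: it only records which event fired. */
static void probe(const char *name)
{
    if (n_events < MAX_EVENTS)
        event_log[n_events++] = name;
}

/* Hypothetical outlined body of the parallel region. */
static void A_outlined(int tid)
{
    (void)tid;
    probe("POMP_Parallel_begin");           /* at entry of the outlined function */
    /* ... body of the parallel region would run here ... */
    probe("POMP_Parallel_end");             /* at exit of the outlined function */
}

/* Stand-in for the run-time entry point. A real run-time would run
 * `outlined` on a team of threads; this sketch calls it once per
 * simulated thread, one after the other. */
static void fake_runtime_parallel(void (*outlined)(int), int nthreads)
{
    for (int tid = 0; tid < nthreads; tid++)
        outlined(tid);
}

/* The user function after outlining: only the master thread runs this. */
void A(int nthreads)
{
    probe("POMP_Parallel_enter");           /* before the call into the run-time */
    fake_runtime_parallel(A_outlined, nthreads);
    probe("POMP_Parallel_exit");            /* after the call returns */
}

/* Accessors for inspecting the recorded event order. */
int logged_events(void) { return n_events; }

int logged_event_is(int i, const char *name)
{
    assert(i >= 0 && i < n_events);
    return strcmp(event_log[i], name) == 0;
}
```

Calling `A(2)` logs one enter event, a begin/end pair for each simulated thread, and one exit event. Because the compiler has already turned the construct into ordinary function calls, a binary instrumenter can attach probes at function entries, exits, and call sites without seeing the source.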


DPOMP Usage

% dpomp <pomp-lib> <exe>

Input parameters

    <exe>         OpenMP application (or mixed-mode)

    <pomp-lib>    POMP compliant monitoring library

    List of user functions to instrument (optional)

% dpomp [-f function.lst] libpomp a.out


DPOMP Instrumentation

Amount of instrumentation can be controlled

    By the tool builder

        Set of POMP calls available in the monitoring library

    and/or by the user

        Environment variables

Events instrumented by default:

    All OpenMP constructs

    All user functions called in the main program

    All MPI calls

Once the instrumentation is finished, the modified program is executed


Limitations

63 out of 68 POMP events supported!

Limitations due to compiler issues

    POMP_Loop_iter_(begin, or end, or event)

    POMP_Implicit_barrier_(end, or exit)

    OMP Parallel Loop NOT = OMP Parallel / OMP Loop

    Compile Time Context (CTC)

        hasFirstPrivate, hasLastPrivate, hasNowait, hasCopyin, schedule, hasOrdered, and hasCopypriv not available

Limitations due to DPCL issues

    Loop iteration values (init, final, incr, chunk)

Other limitations

    C++ not supported


POMP Profiler (POMPROF)

Generates a detailed profile describing overheads and time spent by each thread in three key regions of the parallel application:

    Parallel regions

    OpenMP loops inside a parallel region

    User defined functions

Profile data is presented in the form of an XML file that can be visualized with PeekPerf
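To make the idea of a per-thread profile concrete, the sketch below shows how a POMP monitoring library might accumulate the time each thread spends inside parallel regions. It is not the POMPROF implementation. The two event routines follow the call shapes shown in the instrumentation example, but the `POMP_handle_t` definition, the use of `int` for the thread id, the replaceable clock hook, and the helper names `prof_set_clock`, `prof_fake_clock`, and `prof_parallel_time` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 64

typedef void *POMP_handle_t;   /* assumed: the slides only show the type name */

static double region_start[MAX_THREADS];   /* time of the last begin event */
static double region_total[MAX_THREADS];   /* accumulated time per thread */

/* Clock hook returning seconds. It is replaceable so the sketch can be
 * driven deterministically; a real library would read a hardware timer. */
static double (*prof_clock)(void) = NULL;

void prof_set_clock(double (*clock_fn)(void)) { prof_clock = clock_fn; }

/* Deterministic stand-in clock: advances one "second" per reading. */
double prof_fake_clock(void)
{
    static double t = 0.0;
    t += 1.0;
    return t;
}

/* Each thread writes only its own slot, so the sketch needs no locking. */
void POMP_Parallel_begin(POMP_handle_t handle, int tid)
{
    (void)handle;
    assert(prof_clock != NULL && tid >= 0 && tid < MAX_THREADS);
    region_start[tid] = prof_clock();
}

void POMP_Parallel_end(POMP_handle_t handle, int tid)
{
    (void)handle;
    assert(prof_clock != NULL && tid >= 0 && tid < MAX_THREADS);
    region_total[tid] += prof_clock() - region_start[tid];
}

/* Total time the given thread has spent between begin and end events. */
double prof_parallel_time(int tid)
{
    assert(tid >= 0 && tid < MAX_THREADS);
    return region_total[tid];
}
```

A full profiler would keep one such table per region handle, do the same for loops and user functions, and write the totals out at finalization; this sketch keeps a single table to stay short.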


POMP Profiler (pomprof)


POMP Profiler (II)


KOJAK POMP Library (Forschungszentrum Juelich)

POMP monitoring library which generates EPILOG event traces

    Processed by KOJAK’s automatic event trace analyzer EXPERT

[Figure: EXPERT display, with these labels:

    Performance Property: Which type of behavior caused the problem?

    Call Tree: Where in the source code? In which context?

    Location: How is the problem distributed across the machine?

    Color Scale: How severe is the problem?]


EPILOG Trace Converted to VTF3 (FZ Juelich)

EPILOG-to-VTF3

    Maps OpenMP constructs into VAMPIR symbols and activities


Conclusions

DPCL based implementation of the POMP performance monitoring interface for OpenMP

    Easy to use

Two POMP libraries

    KOJAK POMP Library

    POMP Profiler Library

    or build your own library