Program Systems Institute RAS

TDB: THE INTERACTIVE DISTRIBUTED DEBUGGING TOOL FOR PARALLEL MPI PROGRAMS

Authors:

RCMS PSI RAS, Pereslavl-Zalessky, Russia

• A. Adamovich
• M. Kovalenko

History of the Development

T-system, RCMS PSI RAS, since the early 90s

The SKIF project of the Russia-Belarus Union State, 2000-2004

T-system and its environment:
• T-system (industrial version);
• the TGCC compiler;
• the TDB interactive debugging system;
• and others.

Objectives of the Development

Support of software design and development using computing systems of the SKIF family
• an element of the integrated toolkit;
• directed towards T-system support.

Cost-effectiveness
• reduced expenses for purchasing and maintaining the SKIF computing system

Information independence

Predecessors and Analogues

P2D2 (Portable Debugger for Parallel and Distributed Programs, NASA, 1994, Doreen Cheng, Robert Hood)

TotalView (Etnus)

DDT (Distributed Debugging Tool, Streamline Computing)

Basic Architecture Principles

The TDB architecture:
• distributed and multi-component
• open and portable
• flexible
• multi-user

The TDB Architecture: Distributed and Multi-component

1) The primary daemon
2) The secondary daemon
3) The central server
4) The client component
5) The debugging server

The TDB Architecture (2/2)

Flexible
• uses free software: ACE, libxml++, libpcre, libgtk2.x, scintilla, gnome-debug-tdb (based on gnome-debug)
• commercial products (for example, system debuggers) can also be used

TDB Features

• Debug C, C++, and Fortran programs
• Linux for 32-bit or 64-bit processors
• Debug parallel MPI programs
• Supported MPI implementations: LAM, MPICH, SCAMPI, MP-MPICH, DMPI
• Advanced job launch methods
• Monitoring of states of target nodes
• Multi-user support

TDB Features

• One-touch breakpoint setting/manipulating
• Step into, over, or out of functions
• Watchpoints
• One-touch symbolic display
• Control processes individually or collectively
• Color-coded process/node states
• Log files

TDB Features

Groups
• Group processes using a flexible definition language
• Two types of groups supported: static groups and dynamic groups
• Control grouped processes as if they were a single process (step, next, stop...) with real-time visual feedback
• Special group commands: group breakpoint, group display
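
The slide does not spell out how the two group types differ. The sketch below assumes that a static group is a fixed set of process ranks, while a dynamic group is a condition re-evaluated each time the group is used. The class names, the `resolve` method, and the dictionary of process attributes are all hypothetical; this is not TDB's group definition language.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class StaticGroup:
    """Assumed model: membership is fixed when the group is defined."""
    name: str
    members: FrozenSet[int]

    def resolve(self, processes: Dict[int, dict]) -> FrozenSet[int]:
        # Only ranks that still exist in the session are returned.
        return self.members & frozenset(processes)


@dataclass(frozen=True)
class DynamicGroup:
    """Assumed model: membership is recomputed on every use."""
    name: str
    predicate: Callable[[dict], bool]

    def resolve(self, processes: Dict[int, dict]) -> FrozenSet[int]:
        return frozenset(
            rank for rank, info in processes.items() if self.predicate(info)
        )


# Hypothetical session state: rank -> attributes of the debugged process.
session = {
    0: {"state": "stopped"},
    1: {"state": "running"},
    2: {"state": "stopped"},
}
pair = StaticGroup("pair", frozenset({0, 1}))
stopped = DynamicGroup("stopped", lambda info: info["state"] == "stopped")
```

Under this assumption, `stopped.resolve(session)` changes as processes stop and resume, whereas `pair.resolve(session)` keeps returning the same ranks.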

TDB Features

Two process control modes:
• active process control mode
• group control mode

Two GTDB operational modes:
• active process / active group debugging mode
• per process debugging mode

TDB Features

Special support for parallelizing systems:
• T-system support: special commands t-break, t-print…

GTDB (TDB GUI client) windows and components features

Main window:
• Active Process window
• Source Code display with breakpoints
• Command buttons
• Command component
• Active process / Active group selection component

GTDB windows and components features

GUI component for per process debugging:
• GUI features that make the status of processes and MPI nodes easy to read
• The ability to pick one of the processes
• A full-featured subcomponent for process debugging, similar to the main subcomponent for debugging the active process

MPI-nodes/processes states window, also used for selecting processes to inspect

GTDB windows and components features

Breakpoints manipulation component window

Configuration / Properties component window

Various pop-up menus used for:
• selected expression data inspection and manipulation: print, display, watchpoints, value set...
• execution control (breakpoints set, disable, delete...)

GTDB – TDB Client Component

intuitive interface and ergonomic design

the presentation of information is handy and convenient

GTDB Node Selection Component

The user can select the exact set of computational nodes that are available for debugging MPI tasks.

The list of all nodes available for MPI task debugging can be obtained through a request to the TDB daemons.

The primary TDB daemon runs on the front-end, and secondary TDB daemons run on the computational nodes of the cluster. TDB daemons are monitor processes.

Secondary daemons collect, and the primary daemon accumulates, useful information about the status of the computational nodes.
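
The collect-and-accumulate arrangement described above can be sketched as follows. This is a minimal illustration, not TDB's daemon protocol: the class names, the `probe` callback, and the `available` and `load` report fields are all hypothetical.

```python
class SecondaryDaemon:
    """Hypothetical monitor running on one computational node."""

    def __init__(self, node, probe):
        self.node = node
        self.probe = probe  # callable returning a dict describing the node

    def report(self):
        # Collect local status and tag it with the node name.
        return {"node": self.node, **self.probe()}


class PrimaryDaemon:
    """Hypothetical monitor on the front-end that accumulates reports."""

    def __init__(self):
        self.nodes = {}

    def accept(self, report):
        # Keep only the latest report per node.
        self.nodes[report["node"]] = report

    def available_nodes(self):
        # What a client might request before choosing nodes to debug on.
        return sorted(
            name for name, rep in self.nodes.items() if rep.get("available")
        )


primary = PrimaryDaemon()
for daemon in (
    SecondaryDaemon("node1", lambda: {"available": True, "load": 0.1}),
    SecondaryDaemon("node2", lambda: {"available": False, "load": 0.9}),
):
    primary.accept(daemon.report())
```

A node selection dialog could then be populated from `primary.available_nodes()`.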

GTDB Properties Component

Used to configure various TDB, GTDB, and MPI implementation settings.

GTDB Nodes Status Component

Describes the statuses of the processes on MPI nodes.

• Green color marks running processes

• Yellow color marks stopped processes

• Red color marks processes that have been stopped or terminated by a signal

Upper bar: common MPI-node status
• Green: all processes of the node are running
• Yellow: at least one of the processes is stopped
• Red: at least one process caught a signal

The common status bar lets the user read the state of the debugged processes more simply and clearly. All status subcomponents are implemented as button widgets: clicking one opens the corresponding process (or processes) for individual exploration in the PROCS GTDB mode.
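
The upper-bar rule can be written as a small fold over the per-process states. In this sketch the enum and function names are hypothetical, and letting red take precedence over yellow when both conditions hold is an assumption the slide does not state.

```python
from enum import Enum


class ProcState(Enum):
    RUNNING = "green"    # process is running
    STOPPED = "yellow"   # process is stopped
    SIGNALED = "red"     # process was stopped or terminated by a signal


def node_status(process_states):
    """Fold the states of one node's processes into the upper-bar color."""
    states = list(process_states)
    # Assumption: a signal outranks an ordinary stop.
    if any(s is ProcState.SIGNALED for s in states):
        return ProcState.SIGNALED
    if any(s is ProcState.STOPPED for s in states):
        return ProcState.STOPPED
    # Green means every process of the node is running.
    return ProcState.RUNNING
```

A node with no processes is reported as green here, which is another simplification made for the sketch.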

GTDB Breakpoints Component

The component is used to work with the various types of breakpoints supported in TDB:
• source line breakpoints,
• function breakpoints, and
• watchpoints;
all of them may have conditions.

TDB also implements a special type of breakpoint, the so-called “group breakpoint”. A group breakpoint allows the user to set a number of uniform breakpoints in a group of parallel processes. The user can set, delete, disable, or enable a group breakpoint with one command or click.
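
One way to picture a group breakpoint is as a single object that fans each operation out to every process in the group. The sketch below is illustrative only: `ProcessStub`, `GroupBreakpoint`, and their methods are hypothetical names and do not reflect TDB's internal interfaces.

```python
class ProcessStub:
    """Hypothetical stand-in for one debugged process."""

    def __init__(self, rank):
        self.rank = rank
        self.breakpoints = {}  # location -> enabled flag

    def set_breakpoint(self, location):
        self.breakpoints[location] = True

    def enable_breakpoint(self, location, enabled):
        self.breakpoints[location] = enabled

    def delete_breakpoint(self, location):
        self.breakpoints.pop(location, None)


class GroupBreakpoint:
    """One handle that applies the same breakpoint to a whole group."""

    def __init__(self, processes, location):
        self.processes = list(processes)
        self.location = location
        for proc in self.processes:
            proc.set_breakpoint(location)

    def set_enabled(self, enabled):
        for proc in self.processes:
            proc.enable_breakpoint(self.location, enabled)

    def delete(self):
        for proc in self.processes:
            proc.delete_breakpoint(self.location)


group = [ProcessStub(rank) for rank in range(3)]
gb = GroupBreakpoint(group, "main.c:42")  # hypothetical source location
gb.set_enabled(False)                      # one call disables it everywhere
```

Each user action maps to one call on the group object, which matches the "one command or click" behavior described on the slide.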

The Main GTDB Window. Sample Debug Session

GTDB in the MAIN -> PROC mode. Process 2:0 is an active (selected, exploring) process...

Example Debug Session: Debugging a Simple MPI Program

Example of defining dynamic groups using the "dgroup" command

Example Debug Session: Debugging a Simple MPI Program

We continue the execution of the processes in the masters dynamic group and then stop at the previously set breakpoints in the loop.

Example Debug Session: Debugging a Simple MPI Program

As we can see, the ‘i’ variable equals zero on all processes in the masters group (the "print" command on the masters group was used). To get out of the loop, we set the ‘i’ variable on all masters to 1.

We continue execution of the masters group processes, but after the loop, execution is stopped by the SIGSEGV signal.

Per Procs GTDB Debugging Mode

In the Main mode, the user can work with one selected (active) process or group.

In the Procs mode, he/she can examine any process individually.

The component is implemented as two “notebooks”, one nested inside the other.

The first (outer, placed vertically) notebook is the MPI-nodes notebook. Its bookmarks contain information about the corresponding processes and the common MPI-node statuses, colored as in the nodes status component.

The second (inner, placed horizontally) notebook is a notebook of processes...

Contacts

Max Kovalenko [email protected]@botik.ru
Alexei Adamovich [email protected]@botik.ru
Sergei Abramov [email protected]@botik.ru