TRACEREP: GATEWAY FOR SHARING AND COLLECTING TRACES IN HPC SYSTEMS
Iván Pérez, Enrique Vallejo, José Luis Bosque
University of Cantabria
IWSG'15



Overview
• HPC Traces - Introduction
  • Traces for Application Developers
  • Traces for Computer Architects
  • Traces - Objections
  • Goals
• BSC Trace Tools
  • Extrae
  • Paraver
• TraceRep
  • Architecture
  • Design
  • Implementation
  • Limitations
  • Snapshots
• Conclusions and Future Work


1. HPC Traces – Introduction
• HPC traces are sequences of events and messages recorded during the execution of a parallel HPC program.
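To make the definition concrete, here is a minimal sketch of a trace as a time-ordered sequence of events. The `TraceEvent` fields and the event kinds are assumptions chosen for illustration; they are not the Extrae or Paraver trace format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TraceEvent:
    """One record of a trace (hypothetical layout, not a real trace format)."""
    time_ns: int         # timestamp of the event
    rank: int            # process that produced the event
    kind: str            # e.g. "compute", "send", "recv", "wait"
    peer: int = -1       # partner rank for messages, -1 otherwise
    size_bytes: int = 0  # payload size for messages, 0 otherwise


def messages(trace: List[TraceEvent]) -> List[TraceEvent]:
    """Return only the communication events of a trace."""
    return [e for e in trace if e.kind in ("send", "recv")]


# A tiny two-process trace, kept sorted by timestamp.
trace = sorted(
    [
        TraceEvent(0, 0, "compute"),
        TraceEvent(0, 1, "compute"),
        TraceEvent(120, 0, "send", peer=1, size_bytes=4096),
        TraceEvent(150, 1, "recv", peer=0, size_bytes=4096),
    ],
    key=lambda e: e.time_ns,
)
```

Real traces carry far more detail (hardware counters, call stacks, samples), but the shape is the same: timestamped records per process.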


1.1. Traces for Application Developers
• Evaluation, tuning and optimization of applications.
[Figure: trace timeline annotated with computation, synchronization waits, point-to-point messages and load imbalance.]
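As an illustration of the kind of analysis a developer can run on a trace, the sketch below computes a load imbalance factor from per-process computation times. The metric (maximum over mean) is a common choice and an assumption here; the slides do not define a specific formula.

```python
from typing import Sequence


def load_imbalance(compute_time_per_rank: Sequence[float]) -> float:
    """Ratio of the slowest rank's computation time to the mean.

    1.0 means perfectly balanced; larger values mean the other ranks
    spend more time waiting for the slowest one.
    """
    if not compute_time_per_rank:
        raise ValueError("at least one rank is required")
    mean = sum(compute_time_per_rank) / len(compute_time_per_rank)
    if mean == 0:
        raise ValueError("total computation time must be positive")
    return max(compute_time_per_rank) / mean


# Hypothetical per-rank computation times (seconds) extracted from a trace.
imbalance = load_imbalance([10.0, 10.0, 10.0, 20.0])  # 20 / 12.5
```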


1.2. Traces for Computer Architects
• Evaluate computer architectures.
• Workloads for feeding simulators.
[Figure: workflow diagram. Labels: Application Binaries, Application Execution, Extraction Tool, Simulator, Hardware models 1-3, Stats 1-3.]
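The workflow in the figure, one trace replayed on several hardware models to obtain one set of statistics per model, can be sketched as follows. The latency/bandwidth cost model and all names are assumptions for illustration; a real simulator is far more detailed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class HardwareModel:
    """Toy network model (hypothetical): fixed latency plus bandwidth term."""
    name: str
    latency_ns: float
    bytes_per_ns: float


def simulate(message_sizes: List[int], model: HardwareModel) -> Dict[str, object]:
    """Replay the message sizes of a trace on one hardware model."""
    total = sum(model.latency_ns + size / model.bytes_per_ns for size in message_sizes)
    return {"model": model.name, "messages": len(message_sizes), "comm_time_ns": total}


# The same trace feeds every model, so the results are directly comparable.
trace_messages = [1024, 4096, 65536]
models = [
    HardwareModel("model-1", latency_ns=1000.0, bytes_per_ns=1.0),
    HardwareModel("model-2", latency_ns=500.0, bytes_per_ns=4.0),
]
stats = [simulate(trace_messages, m) for m in models]
```

Because the trace is recorded once and reused, the architect does not need access to the original application or cluster for each experiment.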


1.3. Traces - Objections
• Complexity of tools and environment.
• Limited access to HPC clusters.
• Traces can reach very large sizes.
• Traces are often not shared between researchers:
  • Traces are hard to obtain and distribute.
  • The tracing effort is not recognized.


1.4. TraceRep - Goals
• User friendly interface to collect traces.
  • Support for multiple clusters.
  • Easy to incorporate new clusters.
• Public trace repository.
  • Computer architects can access traces of parallel applications for their experiments.
  • Users can upload their own traces for the community.
  • Author encouragement:
    • Authorship: Users can set Creative Commons licenses which protect the authorship of their traces.
    • Citation of related work: Users can add a citation (.bib file) of a paper which studied the traced application, so it can be cited when the trace is used.



2.1. Extrae
• Collects information during the program execution and generates traces:
  • Runtime entries and exits, hardware counters, user functions, periodic samples…
• Supported programming models:
  • MPI, OpenMP, CUDA, OpenCL, pthreads, OmpSs, Java, Python.
• Supported platforms:
  • Linux clusters, BlueGene/Q, Cray, nVidia GPUs, Intel Xeon Phi, ARM, Android.
[Figure: Extrae configuration file]
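A minimal sketch of how a launcher script might prepare the environment for a traced MPI run. It assumes the usual Extrae setup in which `EXTRAE_CONFIG_FILE` points to the XML configuration file and the MPI tracing library `libmpitrace.so` is preloaded; both should be checked against the installed Extrae version. The function name and the install path are hypothetical.

```python
import os
from typing import Dict, Optional


def extrae_env(extrae_home: str, config_file: str,
               base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment prepared for tracing an MPI program.

    Assumption: Extrae reads its XML configuration from EXTRAE_CONFIG_FILE
    and is injected through LD_PRELOAD (library name may differ per install).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["EXTRAE_CONFIG_FILE"] = config_file
    env["LD_PRELOAD"] = os.path.join(extrae_home, "lib", "libmpitrace.so")
    return env


# "/opt/extrae" is a hypothetical install prefix.
env = extrae_env("/opt/extrae", "extrae.xml", base_env={})
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the application.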


2.2. BSC Tools - Paraver
• Very flexible visualization tool for trace files.



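3.1. TraceRep - Architecture
[Figure: architecture diagram. Users reach the gateway with a web browser over the Internet, authenticating with a user and password. The gateway runs Drupal 7 (core modules, the TraceRep module and third-party modules) on Apache 2, PHP and MySQL. It connects over the Internet through SSH/SFTP, with a TraceRep user and password, to the HPC clusters, which host the TraceRep scripts, Extrae and the resource manager.]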


3.2. TraceRep - Design
[Figure: design diagram. Gateway: authentication, source code upload, create experiment, trace upload, and a trace repository open to anonymous and registered users. Cluster: compilation, environment setup, experiment launch.]


3.3. TraceRep - Implementation
• Drupal’s modules covered most of the features.
• The trace extraction service is implemented on both sides:
  • Gateway side: new Drupal module.
  • Cluster side: Python scripts adapted to the specific cluster.
[Figure: implementation diagram. Drupal: trace extraction, experiment, periodic task ("Is the experiment over?"). Cluster: cluster filesystem with the TraceRep directory (Makefile, scripts), compilation tools, Extrae, resource manager.]
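The periodic "Is the experiment over?" check could be sketched as below. This assumes each experiment has its own subdirectory in the TraceRep directory and that the cluster-side scripts leave a marker file when a run finishes; the marker name and directory layout are hypothetical, not taken from the TraceRep implementation.

```python
import tempfile
from pathlib import Path
from typing import List

DONE_MARKER = "experiment.done"  # hypothetical marker written when a run ends


def experiment_is_over(experiment_dir: Path) -> bool:
    """True if the experiment directory contains the completion marker."""
    return (Path(experiment_dir) / DONE_MARKER).exists()


def finished_experiments(tracerep_dir: Path) -> List[str]:
    """Names of all experiment subdirectories that have finished."""
    root = Path(tracerep_dir)
    return sorted(p.name for p in root.iterdir()
                  if p.is_dir() and experiment_is_over(p))


# Small self-contained demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "exp-1").mkdir()
    (root / "exp-2").mkdir()
    (root / "exp-1" / DONE_MARKER).touch()  # only exp-1 has finished
    done = finished_experiments(root)
```

A gateway-side periodic task would run such a check over SSH/SFTP and download the trace of every finished experiment.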


3.4. TraceRep – Current prototype limitations
• Security:
  • TraceRep users upload code to the HPC clusters.
  • Alternatives:
    • Restricted privileges for the user account of TraceRep.
    • Require a cluster account per user to extract traces.
• Compilation:
  • Paths to compilers and libraries can vary from cluster to cluster.
  • Compilation constraints: a generic Makefile is currently used for all source codes. Applications that use complex building tools are currently not supported.
  • Alternative: provide a unified environment for compilation.
• Storage:
  • Storage in the gateway server is limited (limitation of the service used).
  • Alternative: $$$
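One way to cope with compiler and library paths that differ between clusters, while keeping a single generic Makefile, is to pass per-cluster settings as `make` variable overrides. The sketch below illustrates the idea; the cluster names, paths and variable names are hypothetical.

```python
from typing import Dict, List

# Hypothetical per-cluster settings; a real deployment would keep these
# in a configuration file maintained by each cluster's administrator.
CLUSTERS: Dict[str, Dict[str, str]] = {
    "cluster-a": {"CC": "/opt/openmpi/bin/mpicc", "EXTRAE_HOME": "/opt/extrae"},
    "cluster-b": {"CC": "/usr/lib64/mpich/bin/mpicc", "EXTRAE_HOME": "/apps/extrae"},
}


def make_command(cluster: str, makefile: str = "Makefile") -> List[str]:
    """Build the make invocation for one cluster.

    Variables given on the make command line override the values
    assigned inside the generic Makefile.
    """
    settings = CLUSTERS[cluster]
    overrides = [f"{name}={value}" for name, value in sorted(settings.items())]
    return ["make", "-f", makefile] + overrides


command = make_command("cluster-a")
```

This does not help applications that need complex build tools, which is the limitation noted above.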


3.5. Snapshots
http://tracerep.unican.es



4. TraceRep – Conclusions
• Traces are very useful for HPC parallel application developers and computer architects.
• TraceRep provides a user friendly interface to collect and share traces.
• It encourages trace sharing through trace licensing and citations.
• There are some limitations that must be addressed, regarding security, compilation and storage.


4. TraceRep – Future work
• Alternative frameworks to replace the Drupal prototype:
  • Liferay [1]
  • Apache Airavata [2]
• Improve the compilation toolchain to present a consistent view on different clusters and allow for more complex codes.
• Exploiting the advanced features of Paraver is complex. We are looking for a way to integrate Paraver in TraceRep.

[1] “Liferay,” 2015. Available: http://www.liferay.com/
[2] “Apache Airavata architecture overview,” 2015. Available: http://airavata.apache.org/architecture/overview.html


Thank you for your attention