



Dynamic task migration in HPC for Exascale challenge

M. Rodríguez Pascual, J.A. Moríñigo, R. Mayo-García

Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas CIEMAT. Madrid, Spain

Reduce energy consumption by hibernating idle nodes

What are we doing?

New possibilities in scheduling policies

RES 11th Users' Conference, 28-29 Sept. 2017, Santiago de Compostela

- Implement full support for checkpoint/restart in Slurm through the use of DMTCP, a checkpointing library

- Create dynamic scheduling algorithms

- Create tools for system administration

- We can migrate serial, MPI, and hybrid MPI+OpenMP codes

- The whole process is transparent to end users

Available possibilities

Migration of single jobs

How are we doing it? Why are we doing it?

Migration process:
- Checkpoint the running job
- Copy the image to the destination node
- Resume the job there
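The three migration steps can be sketched as the command sequence a migration controller would issue. A minimal dry-run illustration in Python; the host names, coordinator port, and image path are hypothetical, and while `dmtcp_command` and `dmtcp_restart` are DMTCP's standard CLI tools, exact flags may differ between DMTCP versions:

```python
# Sketch of the three-step migration: checkpoint, copy, resume.
# Hosts, port, and paths below are placeholders for the example.

def migration_commands(coord_host, coord_port, dest_node, image_dir):
    """Return the shell commands for one job migration."""
    return [
        # 1. Ask the DMTCP coordinator to checkpoint the running job.
        f"dmtcp_command --coord-host {coord_host} "
        f"--coord-port {coord_port} --checkpoint",
        # 2. Copy the checkpoint image(s) to the destination node.
        f"scp {image_dir}/ckpt_*.dmtcp {dest_node}:{image_dir}/",
        # 3. Restart the job from the image(s) on the destination node.
        f"ssh {dest_node} dmtcp_restart --coord-host {coord_host} "
        f"--coord-port {coord_port} {image_dir}/ckpt_*.dmtcp",
    ]

if __name__ == "__main__":
    for cmd in migration_commands("frontend", 7779, "node42", "/scratch/job123"):
        print(cmd)
```

In the real system these steps are driven by Slurm rather than issued by hand, so the sketch only shows the ordering of operations.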

Software Stack:

- Resource manager: SLURM
- MPI library: MVAPICH
- OpenMP
- Checkpointing mechanism: DMTCP
- Container: Docker

Because it is great!

We want to provide a whole set of new tools for HPC management

This solution can:

- Provide fault tolerance, essential for long-running and highly parallel applications in the exascale era

- Increase performance: spread tasks across the cluster to maximize disk and network availability

- Reduce energy consumption: concentrate tasks in part of the cluster so the rest remains idle and can be hibernated or powered down

- Reduce communication overhead: place tasks that communicate close to each other

Altogether, this allows creating sophisticated scheduling policies
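As an illustration, the energy-saving policy above (concentrate tasks so freed nodes can be hibernated) can be sketched as a greedy consolidation step. This is a minimal sketch, not the scheduler actually implemented: node names, the per-node capacity, and the first-fit heuristic are assumptions for the example.

```python
# Greedy consolidation sketch: migrate tasks off lightly loaded nodes so
# whole nodes become idle and can be hibernated. A real scheduler would
# also weigh migration cost, data locality, and fault tolerance.

def consolidate(load, capacity):
    """load: {node: number of running tasks}; mutated in place.
    Returns (migrations as (src, dst, n_tasks), nodes left idle)."""
    # Try to empty the least-loaded nodes first.
    nodes = sorted(load, key=lambda n: load[n])
    migrations, idle = [], []
    for src in nodes:
        if src in idle or load[src] == 0:
            continue
        # First-fit: move tasks onto the most-loaded nodes with room.
        targets = sorted((n for n in load if n != src and n not in idle),
                         key=lambda n: load[n], reverse=True)
        plan, need = [], load[src]
        for dst in targets:
            room = capacity - load[dst]
            if room <= 0:
                continue
            moved = min(room, need)
            plan.append((src, dst, moved))
            need -= moved
            if need == 0:
                break
        if need == 0:  # migrate only if the node can be fully emptied
            for _, dst, moved in plan:
                load[dst] += moved
            load[src] = 0
            migrations.extend(plan)
            idle.append(src)
    return migrations, idle
```

For example, with `load = {"n1": 1, "n2": 3, "n3": 4}` and capacity 4, the single task on `n1` moves to `n2`, leaving `n1` idle and eligible for hibernation.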

We expect it to have an impact on the next generation of HPC systems

Open Problems

- Ensure scalability on very large parallel jobs

- Migration on heterogeneous infrastructures

- Provide support for Docker with DMTCP


Migration of parallel jobs

Migration of software containers

Future work

- Create a profiling tool able to monitor memory usage.

- Provide support for GPU and Xeon Phi

- Support heterogeneous clusters

- Inter-cluster job migration

Reduce MPI network overhead

Increase data locality

Maximize performance

This work was supported by the COST Action NESUS (IC1305) and partially funded by the Spanish Ministry of Economy, Industry and Competitiveness project CODEC2 (TIN2015-63562-R) with FEDER funds, and by the EU H2020 project HPC4E (grant agreement no. 689772).