Dynamic task migration in HPC for Exascale challenge
M. Rodríguez Pascual, J.A. Moríñigo, R. Mayo-García
Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT), Madrid, Spain
Reduce energy consumption by hibernating idle nodes
What are we doing?
New possibilities in scheduling policies
RES 11th Users' Conference, 28-29 Sept. 2017, Santiago de Compostela
- Implement full support for checkpoint/restart in Slurm through DMTCP, a checkpointing library
- Create dynamic scheduling algorithms
- Create tools for system administration
- We can migrate serial, MPI, and hybrid MPI+OpenMP codes
- The whole process is transparent to end users
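As a hedged illustration of the approach (not the project's exact integration), a job can be launched under DMTCP control from a Slurm batch script so that it becomes checkpointable later; the application name, port number, and resource sizes below are hypothetical:

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=16

# Start a DMTCP coordinator that will serve checkpoint/restart requests
dmtcp_coordinator --daemon -p 7779

# Launch the MPI application under DMTCP control via srun
srun dmtcp_launch -p 7779 ./my_mpi_app
```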
Available possibilities
Migration of single jobs
How are we doing it? Why are we doing it?
Migration process:
- Checkpoint the running job
- Copy the image to the destination node
- Resume the job there
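The three migration steps can be sketched with plain DMTCP commands; the coordinator port, hostname, and paths are hypothetical, and in the actual system these steps are driven by Slurm rather than typed by hand:

```shell
# 1. Checkpoint the running job (the DMTCP coordinator dumps image files)
dmtcp_command -p 7779 --checkpoint

# 2. Copy the checkpoint image(s) to the destination node
scp ckpt_*.dmtcp destnode:/scratch/migrate/

# 3. Resume the job on the destination node from the images
ssh destnode 'cd /scratch/migrate && dmtcp_restart ckpt_*.dmtcp'
```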
Software Stack:
- Resource manager: Slurm
- MPI library: MVAPICH
- OpenMP
- Checkpointing mechanism: DMTCP
- Containers: Docker
Because it is great!
We want to provide a whole set of new tools for HPC management
This solution can:
- provide fault tolerance, essential for long-running and highly parallel applications in the exascale era
- increase performance: spread tasks across the cluster to maximize disk & network availability
- reduce energy consumption: concentrate tasks so part of the cluster remains idle & can be hibernated or powered down
- reduce communication overhead: place tasks that communicate with each other close together
All together, this allows creating sophisticated scheduling policies
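To make the consolidation idea concrete, here is an illustrative sketch (not the project's actual scheduler) of a greedy first-fit-decreasing placement: migratable tasks are packed onto as few nodes as possible so that the remaining nodes stay idle and can be hibernated. Task sizes and node capacity are hypothetical:

```python
def consolidate(tasks, node_capacity):
    """Greedily pack tasks (core counts) onto identical nodes.

    Returns a placement: one list of task sizes per node in use.
    Sorting largest-first is the classic first-fit-decreasing heuristic.
    """
    placement = []  # placement[i] = tasks assigned to node i
    free = []       # free[i] = remaining cores on node i
    for t in sorted(tasks, reverse=True):
        for i, f in enumerate(free):
            if t <= f:              # first node with enough room
                placement[i].append(t)
                free[i] -= t
                break
        else:                       # no node fits: power up a new one
            placement.append([t])
            free.append(node_capacity - t)
    return placement

# Example: six tasks on 16-core nodes
nodes = consolidate([8, 6, 4, 4, 2, 2], node_capacity=16)
print(len(nodes))  # → 2 nodes in use; the rest of the cluster can hibernate
```

With migration support, such a policy can be applied on-line: running tasks are checkpointed and moved to re-pack the cluster whenever jobs finish.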
We expect it to have an impact on the next generation of HPC systems
Open Problems
- Ensure scalability on very large parallel jobs
- Migration on heterogeneous infrastructures
- Provide support for Docker with DMTCP
[Diagram: software stack with Slurm, MVAPICH and DMTCP]
Migration of parallel jobs
Migration of software containers
Future work
- Create a profiling tool able to monitor memory usage.
- Provide support for GPUs and Xeon Phi
- Support heterogeneous clusters
- Inter-cluster job migration
Reduce MPI network overhead
Increase data locality
Maximize performance
This work was supported by the COST Action NESUS (IC1305) and partially funded by the Spanish Ministry of Economy, Industry and Competitiveness project CODEC2 (TIN2015-63562-R) with FEDER funds and the EU H2020 project HPC4E (grant agreement no. 689772).