Scheduling for Large Scale Distributed Computing Systems: Approaches and Performance Evaluation Issues

Arnaud Legrand 1, 2
1 MESCAL - Middleware efficiently scalable
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract : Although our everyday life and society now depend heavily on communication and computation infrastructures, scientists and engineers have always been among the main consumers of computing power. This document provides a coherent overview of the research I have conducted over the last 15 years, which targets the management and performance evaluation of large-scale distributed computing infrastructures (clusters, grids, desktop grids, volunteer computing platforms, and so on) when used for scientific computing. In the first part of this document, I present how I have addressed scheduling problems arising on distributed platforms (such as computing grids), with a particular emphasis on heterogeneity and multi-user issues, hence the connection with game theory. Most of these problems are relaxed from a classical combinatorial optimization formulation into a continuous form, which makes it easy to account for key platform characteristics such as heterogeneity or complex topology while providing efficient, practical, and distributed solutions. The second part presents my main contributions to the SimGrid project, a simulation toolkit for building simulators of distributed applications (originally designed for evaluating scheduling algorithms). It comprises a unified presentation of how the questions of validation and scalability have been addressed in SimGrid, as well as thoughts on specific challenges related to methodological aspects and to the application of SimGrid to the HPC context.
Document type :
Habilitation à diriger des recherches


https://tel.archives-ouvertes.fr/tel-01247932
Contributor : Arnaud Legrand
Submitted on : Wednesday, December 23, 2015 - 10:27:54 AM
Last modification on : Thursday, October 11, 2018 - 8:48:02 AM
Long-term archiving on : Sunday, April 30, 2017 - 12:25:05 AM

Identifiers

  • HAL Id : tel-01247932, version 1

Citation

Arnaud Legrand. Scheduling for Large Scale Distributed Computing Systems: Approaches and Performance Evaluation Issues. Distributed, Parallel, and Cluster Computing [cs.DC]. Université Grenoble Alpes, 2015. ⟨tel-01247932⟩
