Scheduling for Large Scale Distributed Computing Systems: Approaches and Performance Evaluation Issues

Arnaud Legrand 1, 2
1 MESCAL - Middleware efficiently scalable
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: Although our everyday life and society now depend heavily on communication and computation infrastructures, scientists and engineers have always been among the main consumers of computing power. This document provides a coherent overview of the research I have conducted over the last 15 years, which targets the management and performance evaluation of large-scale distributed computing infrastructures (clusters, grids, desktop grids, volunteer computing platforms, etc.) when used for scientific computing. In the first part of this document, I present how I have addressed scheduling problems arising on distributed platforms (such as computing grids), with a particular emphasis on heterogeneity and multi-user issues, hence the connection with game theory. Most of these problems are relaxed from a classical combinatorial optimization formulation into a continuous form, which makes it easy to account for key platform characteristics such as heterogeneity or complex topology while providing efficient, practical, and distributed solutions. The second part presents my main contributions to the SimGrid project, a simulation toolkit for building simulators of distributed applications (originally designed for evaluating scheduling algorithms). It comprises a unified presentation of how the questions of validation and scalability have been addressed in SimGrid, as well as thoughts on specific challenges related to methodological aspects and to the application of SimGrid to the HPC context.
Document type:
HDR
Distributed, Parallel, and Cluster Computing [cs.DC]. Université Grenoble Alpes, 2015

https://tel.archives-ouvertes.fr/tel-01247932
Contributor: Arnaud Legrand
Submitted on: Wednesday, December 23, 2015 - 10:27:54
Last modified on: Tuesday, March 15, 2016 - 16:16:22

Identifiers

  • HAL Id: tel-01247932, version 1

Citation

Arnaud Legrand. Scheduling for Large Scale Distributed Computing Systems: Approaches and Performance Evaluation Issues. Distributed, Parallel, and Cluster Computing [cs.DC]. Université Grenoble Alpes, 2015. <tel-01247932>
