Evaluation de précision et vitesse de simulation pour des systèmes de calcul distribué à large échelle

Abstract: Large-Scale Distributed Computing (LSDC) systems are in production today to solve problems that require huge amounts of computational power or storage. Such systems are composed of a set of computational resources sharing a communication infrastructure. In such systems, as in any computing environment, specialists need to conduct experiments to validate alternatives and compare solutions. However, due to the distributed nature of the resources, performing experiments in LSDC environments is hard and costly. The execution flow depends on the order of events, which is likely to change from one execution to another. Consequently, experiments are hard to reproduce, which hinders the development process. Moreover, resources are very likely to fail or go off-line. In addition, LSDC architectures are shared, and interference among different applications, or even among processes of the same application, affects the overall application behavior. Finally, LSDC applications are time-consuming, so conducting many experiments with several parameters is often infeasible. For all these reasons, experiments in LSDC often rely on simulation. Many simulation approaches for LSDC exist today. Most of them target specific architectures, such as cluster, grid, or volunteer computing. Each simulator claims to be better suited to a particular research purpose. Nevertheless, these simulators must address the same problems: modeling the network and managing computing resources. Moreover, they must satisfy the same requirements, providing fast, accurate, scalable, and repeatable simulations. To meet these requirements, LSDC simulators use models to approximate the system behavior, neglecting some aspects to focus on the phenomena of interest. However, models may be wrong. When this is the case, trusting the models leads to arbitrary conclusions. In other words, we need evidence that the models are accurate before accepting the conclusions supported by simulated results.
Although many simulators exist for LSDC, studies of their accuracy are rare. In this thesis, we are particularly interested in analyzing and proposing accurate models that respect the requirements of LSDC research. To pursue this goal, we propose an accuracy evaluation study to verify common and new simulation models. Throughout this document, we propose model improvements to mitigate the simulation error of LSDC simulation, using SimGrid as a case study. We also evaluate the effect of these improvements on scalability and speed. As our main contribution, we show that intuitive models have better accuracy, speed, and scalability than other state-of-the-art models. These better results are achieved by performing a thorough and systematic analysis of problematic situations. This analysis reveals that many small yet common phenomena had been neglected in previous models and had to be accounted for to design sound models.
Document type:
Theses
https://tel.archives-ouvertes.fr/tel-00625497
Contributor: Abes Star
Submitted on: Wednesday, September 21, 2011 - 4:47:44 PM
Last modification on: Tuesday, April 16, 2019 - 1:41:06 AM
Document(s) archived on: Thursday, December 22, 2011 - 2:35:52 AM

File

18610_VELHO_2011_archivage_1_....
Version validated by the jury (STAR)

Identifiers

  • HAL Id: tel-00625497, version 1

Citation

Pedro Antonio Madeira de Campos Velho. Evaluation de précision et vitesse de simulation pour des systèmes de calcul distribué à large échelle. Other [cs.OH]. Université Grenoble Alpes, 2011. French. ⟨NNT: 2011GRENM027⟩. ⟨tel-00625497⟩
