Exploiting Heterogeneous Distributed Systems for Monte-Carlo Simulations in the Medical Field

Sorina Pop 1, 2
1 Service Informatique et développements
CREATIS - Centre de Recherche en Acquisition et Traitement de l'Image pour la Santé
2 Images et Modèles
CREATIS - Centre de Recherche en Acquisition et Traitement de l'Image pour la Santé
Abstract : Particle-tracking Monte-Carlo applications are easily parallelizable, but efficient parallelization on computing grids is difficult to achieve. Advanced scheduling strategies and parallelization methods are required to cope with failures and resource heterogeneity on distributed architectures. Merging the partial simulation results is also a critical step. In this context, the main goal of our work is to propose new strategies for faster and more reliable execution of Monte-Carlo applications on computing grids. These strategies cover both the computing and merging phases of Monte-Carlo applications and are intended for production use. In this thesis, we introduce a parallelization approach based on pilot jobs and on a new dynamic partitioning algorithm. Results obtained on the production European Grid Infrastructure (EGI) using the GATE application show that pilot jobs bring a strong improvement over regular metascheduling and that the proposed dynamic partitioning algorithm solves the load-balancing problem of particle-tracking Monte-Carlo applications executed in parallel on distributed heterogeneous systems. Since all tasks complete almost simultaneously, our method can be considered optimal in terms of both resource usage and makespan. We also propose advanced merging strategies with multiple parallel mergers. Checkpointing is used to enable incremental merging of partial results and to improve reliability. A model is proposed to analyze the behavior of the complete framework and to help tune its parameters. Experimental results show that the model fits the real makespan with a relative error of at most 10%, that using multiple parallel mergers reduces the makespan by 40% on average, and that checkpointing enables the completion of very long simulations without penalizing the makespan.
To evaluate our load-balancing and merging strategies, we implement an end-to-end SimGrid-based simulation of the framework described above for Monte-Carlo computations on EGI. Simulated and real makespans are consistent, and the conclusions drawn in production about the influence of application parameters, such as the checkpointing frequency and the number of mergers, are also reached in simulation. These results open the door to better and faster experimentation. To illustrate the outcome of the proposed framework, we present usage statistics and a few examples of results obtained in production. They show that our production experience is significant in terms of users and executions, that the dynamic load balancing can be used extensively in production, and that it significantly improves performance regardless of the variable grid conditions.
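The load-balancing claim in the abstract can be illustrated with a toy comparison. The sketch below is not the algorithm from the thesis; it is a simplified reading of the idea, assuming that each worker simulates events at a constant rate, that there are no failures, and that a master stops all workers once the global event count reaches the target. The function names, the `speeds` values, and the `report_interval` parameter are hypothetical.

```python
def static_makespan(total_events, speeds):
    """Static partitioning: every worker gets the same fixed share of events,
    so the slowest worker determines the makespan."""
    share = total_events / len(speeds)
    return max(share / s for s in speeds)


def dynamic_partitioning(total_events, speeds, report_interval=1.0):
    """Dynamic partitioning (illustrative assumption): workers keep simulating
    and report their counts every `report_interval` seconds; everyone stops
    once the summed count reaches `total_events`. The total may overshoot the
    target by up to one reporting interval of work."""
    done = [0.0] * len(speeds)
    elapsed = 0.0
    while sum(done) < total_events:
        elapsed += report_interval
        for i, s in enumerate(speeds):
            done[i] += s * report_interval
    return elapsed, done


speeds = [1.0, 2.0, 5.0]  # hypothetical events per second for three workers
print(static_makespan(1200, speeds))       # 400.0
print(dynamic_partitioning(1200, speeds))  # (150.0, [150.0, 300.0, 750.0])
```

In this idealized setting the fast worker ends up doing most of the work and all workers stop together, which is the sense in which tasks "complete almost simultaneously"; real grid executions add queuing delays, failures, and merging costs that this sketch ignores.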
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-01015270
Contributor : Abes Star
Submitted on : Thursday, June 26, 2014 - 10:47:09 AM
Last modification on : Tuesday, November 19, 2019 - 11:51:41 AM
Long-term archiving on : Friday, September 26, 2014 - 11:20:49 AM

File

these.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01015270, version 1

Citation

Sorina Pop. Exploiting Heterogeneous Distributed Systems for Monte-Carlo Simulations in the Medical Field. Medical Imaging. INSA de Lyon, 2013. English. ⟨NNT : 2013ISAL0114⟩. ⟨tel-01015270⟩
