Improving memory consumption and performance scalability of HPC applications with multi-threaded network communications

Sylvain Didelot

Thèse Année : 2014

Improving memory consumption and performance scalability of HPC applications with multi-threaded network communications

Amélioration de la consommation mémoire et de l'extensibilité des performances des applications HPC par le multi-threading des communications réseaux

(1, 1)

Sylvain Didelot

Fonction : Auteur
PersonId : 773549
IdRef : 184284120

Parallélisme, Réseaux, Systèmes, Modélisation

Résumé

A recent trend in high performance computing shows a rising number of cores per compute node, while the total amount of memory per compute node remains constant. To scale parallel applications on such large machines, one of the major challenges is to keep a low memory consumption. This thesis develops a multi-threaded communication layer over Infiniband which provides both good performance of communications and a low memory consumption. We target scientific applications parallelized using the MPI standard in pure mode or combined with a shared memory programming model. Starting with the observation that network endpoints and communication buffers are critical for the scalability of MPI runtimes, the first contribution proposes three approaches to control their usage. We introduce a scalable and fully-connected virtual topology for connection-oriented high-speed networks. In the context of multirail configurations, we then detail a runtime technique which reduces the number of network connections. We finally present a protocol for dynamically resizing network buffers over the RDMA technology. The second contribution proposes a runtime optimization to enforce the overlap potential of MPI communications, showing a 2x improvement factor on communications. The third contribution evaluates the performance of several MPI runtimes running a seismic modeling application in a hybrid context. On large compute nodes up to 128 cores, the introduction of OpenMP in the MPI application saves up to 17 % of memory. Moreover, we show a performance improvement with our multi-threaded communication layer where the OpenMP threads concurrently participate to the MPI communications

La tendance en HPC est à l'accroissement du nombre de coeurs par noeud de calcul pour une quantité totale de mémoire par noeud constante. A large échelle, l'un des principaux défis pour les applications parallèles est de garder une faible consommation mémoire. Cette thèse présente une couche de communication multi-threadée sur Infiniband, laquelle fournie de bonnes performances et une faible consommation mémoire. Nous ciblons les applications scientifiques parallélisées grâce à la bibliothèque MPI ou bien combinées avec un modèle de programmation en mémoire partagée. En partant du constat que le nombre de connexions réseau et de buffers de communication est critique pour la mise à l'échelle des bibliothèques MPI, la première contribution propose trois approches afin de contrôler leur utilisation. Nous présentons une topologie virtuelle extensible et entièrement connectée pour réseaux rapides orientés connexion. Dans un contexte agrégeant plusieurs cartes permettant d'ajuster dynamiquement la configuration des buffers réseau utilisant la technologie RDMA. La seconde contribution propose une optimisation qui renforce le potentiel d'asynchronisme des applications MPI, laquelle montre une accélération de deux des communications. La troisième contribution évalue les performances de plusieurs bibliothèques MPI exécutant une application de modélisation sismique en contexte hybride. Les expériences sur des noeuds de calcul jusqu'à 128 coeurs montrent une économie de 17 % sur la mémoire. De plus, notre couche de communication multi-threadée réduit le temps d'exécution dans le cas où plusieurs threads OpenMP participent simultanément aux communications MPI.

Mots clés

High performance computing High-speed networks

Calcul haute performance Multi-threading Réseaux haut débit MPI NUMA

Domaines

Calcul parallèle, distribué et partagé [cs.DC] Performance et fiabilité [cs.PF]

Fichier principal

2014VERS0029bis.pdf (4.3 Mo)

Origine : Version validée par le jury (STAR)

ABES STAR : Contact

https://theses.hal.science/tel-01132316

Soumis le : mardi 17 mars 2015-08:27:05

Dernière modification le : vendredi 24 mars 2023-14:53:00

Archivage à long terme le : jeudi 18 juin 2015-10:35:22

Dates et versions

tel-01132316 , version 1 (17-03-2015)

Identifiants

HAL Id : tel-01132316 , version 1

Citer

Sylvain Didelot. Improving memory consumption and performance scalability of HPC applications with multi-threaded network communications. Distributed, Parallel, and Cluster Computing [cs.DC]. Université de Versailles-Saint Quentin en Yvelines, 2014. English. ⟨NNT : 2014VERS0029⟩. ⟨tel-01132316⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

CNRS STAR UVSQ GENCI

223 Consultations

716 Téléchargements

Improving memory consumption and performance scalability of HPC applications with multi-threaded network communications

Amélioration de la consommation mémoire et de l'extensibilité des performances des applications HPC par le multi-threading des communications réseaux

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager