Calculs pour les matrices denses : coût de communication et stabilité numérique

Amal Khabou

Résumé

This dissertation focuses on a widely used linear algebra kernel to solve linear systems, that is the LU decomposition. Usually, to perform such a computation one uses the Gaussian elimination with partial pivoting (GEPP). The backward stability of GEPP depends on a quantity which is referred to as the growth factor, it is known that in general GEPP leads to modest element growth in practice. However its parallel version does not attain the communication lower bounds. Indeed the panel factorization rep- resents a bottleneck in terms of communication. To overcome this communication bottleneck, Grigori et al [60] have developed a communication avoiding LU factorization (CALU), which is asymptotically optimal in terms of communication cost at the cost of some redundant computation. In theory, the upper bound of the growth factor is larger than that of Gaussian elimination with partial pivoting, however CALU is stable in practice. To improve the upper bound of the growth factor, we study a new pivoting strategy based on strong rank revealing QR factorization. Thus we develop a new block algorithm for the LU factorization. This algorithm has a smaller growth factor upper bound compared to Gaussian elimination with partial pivoting. The strong rank revealing pivoting is then combined with tournament pivoting strategy to produce a communication avoiding LU factorization that is more stable than CALU. For hierarchical systems, multiple levels of parallelism are available. However, none of the previously cited methods fully exploit these hierarchical systems. We propose and study two recursive algorithms based on the communication avoiding LU algorithm, which are more suitable for architectures with multiple levels of parallelism. For an accurate and realistic cost analysis of these hierarchical algo- rithms, we introduce a hierarchical parallel performance model that takes into account processor and network hierarchies. This analysis enables us to accurately predict the performance of the hierarchical LU factorization on an exascale platform.

Cette thèse traite d’une routine d’algèbre linéaire largement utilisée pour la résolution des systèmes li- néaires, il s’agit de la factorisation LU. Habituellement, pour calculer une telle décomposition, on utilise l’élimination de Gauss avec pivotage partiel (GEPP). La stabilité numérique de l’élimination de Gauss avec pivotage partiel est caractérisée par un facteur de croissance qui est reste assez petit en pratique. Toutefois, la version parallèle de cet algorithme ne permet pas d’atteindre les bornes inférieures qui ca- ractérisent le coût de communication pour un algorithme donné. En effet, la factorisation d’un bloc de colonnes constitue un goulot d’étranglement en termes de communication. Pour remédier à ce problème, Grigori et al [60] ont développé une factorisation LU qui minimise la communication(CALU) au prix de quelques calculs redondants. En théorie la borne supérieure du facteur de croissance de CALU est plus grande que celle de l’élimination de Gauss avec pivotage partiel, cependant CALU est stable en pratique. Pour améliorer la borne supérieure du facteur de croissance, nous étudions une nouvelle stra- tégie de pivotage utilisant la factorisation QR avec forte révélation de rang. Ainsi nous développons un nouvel algorithme pour la factorisation LU par blocs. La borne supérieure du facteur de croissance de cet algorithme est plus petite que celle de l’élimination de Gauss avec pivotage partiel. Cette stratégie de pivotage est ensuite combinée avec le pivotage basé sur un tournoi pour produire une factorisation LU qui minimise la communication et qui est plus stable que CALU. Pour les systèmes hiérarchiques, plusieurs niveaux de parallélisme sont disponibles. Cependant, aucune des méthodes précédemment ci- tées n’exploite pleinement ces ressources. Nous proposons et étudions alors deux algorithmes récursifs qui utilisent les mêmes principes que CALU mais qui sont plus appropriés pour des architectures à plu- sieurs niveaux de parallélisme. Pour analyser d’une façon précise et réaliste

Calculs pour les matrices denses : coût de communication et stabilité numérique

Dense matrix computations : communication cost and numerical stability

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager