Memory and performance issues in parallel multifrontal factorizations and triangular solutions with sparse right-hand sides

Abstract : We consider the solution of very large sparse systems of linear equations on parallel architectures. In this context, memory is often a bottleneck that prevents or limits the use of direct solvers, especially those based on the multifrontal method. This work focuses on memory and performance issues of the two memory and computationally intensive phases of direct methods, namely, the numerical factorization and the solution phase. In the first part we consider the solution phase with sparse right-hand sides, and in the second part we consider the memory scalability of the multifrontal factorization. In the first part, we focus on the triangular solution phase with multiple sparse right-hand sides, that appear in numerous applications. We especially emphasize the computation of entries of the inverse, where both the right-hand sides and the solution are sparse. We first present several storage schemes that enable a significant compression of the solution space, both in a sequential and a parallel context. We then show that the way the right-hand sides are partitioned into blocks strongly influences the performance and we consider two different settings: the out-of-core case, where the aim is to reduce the number of accesses to the factors, that are stored on disk, and the in-core case, where the aim is to reduce the computational cost. Finally, we show how to enhance the parallel efficiency. In the second part, we consider the parallel multifrontal factorization. We show that controlling the active memory specific to the multifrontal method is critical, and that commonly used mapping techniques usually fail to do so: they cannot achieve a high memory scalability, i.e., they dramatically increase the amount of memory needed by the factorization when the number of processors increases. We propose a class of "memory-aware" mapping and scheduling algorithms that aim at maximizing performance while enforcing a user-given memory constraint and provide robust memory estimates before the factorization. These techniques have raised performance issues in the parallel dense kernels used at each step of the factorization, and we have proposed some algorithmic improvements. The ideas presented throughout this study have been implemented within the MUMPS (MUltifrontal Massively Parallel Solver) solver and experimented on large matrices (up to a few tens of millions unknowns) and massively parallel architectures (up to a few thousand cores). They have demonstrated to improve the performance and the robustness of the code, and will be available in a future release. Some of the ideas presented in the first part have also been implemented within the PDSLin (Parallel Domain decomposition Schur complement based Linear solver) package.
Complete list of metadatas

Cited literature [91 references]  Display  Hide  Download
Contributor : François-Henry Rouet <>
Submitted on : Wednesday, February 6, 2013 - 7:37:42 PM
Last modification on : Monday, April 29, 2019 - 5:30:45 PM
Long-term archiving on : Saturday, April 1, 2017 - 5:49:04 PM


  • HAL Id : tel-00785748, version 1



François-Henry Rouet. Memory and performance issues in parallel multifrontal factorizations and triangular solutions with sparse right-hand sides. Mathematical Software [cs.MS]. Institut National Polytechnique de Toulouse - INPT, 2012. English. ⟨tel-00785748⟩



Record views


Files downloads