Minimizing communication for incomplete factorizations and low-rank approximations on large scale computers

Sébastien Cayrols 1
1 ALPINES - Algorithms and parallel tools for integrated numerical simulations
INSMI - Institut National des Sciences Mathématiques et de leurs Interactions, Inria de Paris, LJLL (UMR_7598) - Laboratoire Jacques-Louis Lions
Abstract: The impact of communication on the performance of numerical algorithms increases with the number of cores. In the context of sparse linear systems of equations, solving Ax = b on a very large computer with thousands of nodes requires minimizing communication to achieve very high efficiency as well as low energy cost. The high level of sequentiality in the Incomplete LU factorization (ILU) makes it difficult to parallelize. We first introduce in this manuscript a Communication-Avoiding ILU preconditioner, denoted CA-ILU(k), that factors A in parallel and is then applied at each iteration of a solver such as GMRES, both steps without communication. Considering a row block of A, the key idea is to gather all the required dependencies of the block so that the factorization and the application can be done without communication. Experiments show that the CA-ILU(k) preconditioner can be competitive with the Block Jacobi and Restricted Additive Schwarz preconditioners. We then present a low-rank algorithm named LU factorization with Column Row Tournament Pivoting (LU-CRTP). This algorithm uses a tournament pivoting strategy to select a subset of columns of A that are used to compute the block LU factorization of the permuted A as well as a good approximation of the singular values of A. Extensive parallel and sequential tests show that LU-CRTP approximates the singular values with an error close to that of the Rank Revealing QR factorization (RRQR), while the memory storage of the factors in LU-CRTP is up to 200 times lower than that of the factors in RRQR. In this context, we propose an improvement of the tournament pivoting strategy that tends to reduce the number of flops performed as well as the communication. A column of A is discarded when it is a linear combination of other columns of A, with respect to a threshold τ. Extensive experiments show that this modification does not degrade the accuracy of LU-CRTP by much. Moreover, compared to the Communication-Avoiding variant of RRQR, our modification reduces the number of operations by a factor of up to 36.
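The tournament pivoting idea mentioned in the abstract can be illustrated with a small sequential sketch. It is not the implementation from the thesis: the function names `select_columns` and `tournament_pivoting`, the block size of 2k, and the use of greedy pivoted Gram-Schmidt in place of a rank-revealing QR are all assumptions made for illustration. The `tau` cutoff is only loosely inspired by the threshold τ described above and is not the thesis's criterion.

```python
import numpy as np

def select_columns(A, cols, k, tau=1e-12):
    """Pick up to k of the candidate columns `cols` of A.

    Greedy pivoted Gram-Schmidt, used here as a simple stand-in for a
    rank-revealing QR (an assumption of this sketch).
    """
    R = A[:, cols].astype(float).copy()
    scale = np.linalg.norm(R, axis=0).max() if len(cols) else 0.0
    chosen = []
    for _ in range(min(k, len(cols))):
        norms = np.linalg.norm(R, axis=0)
        j = int(np.argmax(norms))
        # Stop when the remaining columns are (numerically) linear
        # combinations of the columns already chosen.
        if scale == 0.0 or norms[j] <= tau * scale:
            break
        chosen.append(cols[j])
        q = R[:, j] / norms[j]
        R -= np.outer(q, q @ R)   # remove the chosen direction everywhere
        R[:, j] = 0.0             # the chosen column cannot win again
    return chosen

def tournament_pivoting(A, k, block=None):
    """Select up to k column indices of A through a binary reduction tree."""
    n = A.shape[1]
    block = block or 2 * k        # hypothetical default block size
    # Leaves: each block of columns elects its local winners.
    candidates = [select_columns(A, list(range(s, min(s + block, n))), k)
                  for s in range(0, n, block)]
    # Internal nodes: winners are merged pairwise and compete again.
    while len(candidates) > 1:
        merged = []
        for i in range(0, len(candidates), 2):
            pool = list(candidates[i])
            if i + 1 < len(candidates):
                pool += candidates[i + 1]
            merged.append(select_columns(A, pool, k))
        candidates = merged
    return candidates[0]
```

In a parallel setting, each leaf would live on a different process and only the k winning columns would travel up the tree, which is where the communication savings come from; this sketch runs everything in one process.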

Cited literature: 86 references
Contributor: Sebastien Cayrols
Submitted on: Monday, January 13, 2020 - 9:57:16 PM
Last modification on: Thursday, January 16, 2020 - 1:38:23 AM




  • HAL Id: tel-02437769, version 1


Sébastien Cayrols. Minimizing communication for incomplete factorizations and low-rank approximations on large scale computers. Mathematics [math]. Sorbonne Universites, UPMC University of Paris 6, 2019. English. ⟨tel-02437769⟩


