Task-based multifrontal QR solver for heterogeneous architectures

Abstract : To face the advent of multicore processors and the ever-increasing complexity of hardware architectures, programming models based on DAG parallelism have regained popularity in the high-performance scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm, together with powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved their effectiveness on a number of dense linear algebra applications. In this study we investigate the design, on top of runtime systems, of task-based sparse direct solvers, which constitute extremely irregular workloads with tasks of different granularities and characteristics and with variable memory consumption. In the context of the qr_mumps solver, we demonstrate the usability and effectiveness of our approach with the implementation of a sparse matrix multifrontal factorization based on a Sequential Task Flow parallel programming model. Using this programming model, we developed features such as the integration of dense 2D Communication Avoiding algorithms in the multifrontal method, allowing for better scalability compared to the original approach used in qr_mumps. In addition, we introduced a memory-aware algorithm to control the memory behaviour of our solver and show, in the context of multicore architectures, a significant reduction of the memory footprint for the multifrontal QR factorization with a small impact on performance. Following this approach, we move to heterogeneous architectures, where task granularity and scheduling strategies are critical to achieving performance. We present, for the multifrontal method, a hierarchical strategy for data partitioning and a scheduling algorithm capable of handling the heterogeneity of resources. Finally, we present a study on the reproducibility of executions and the use of alternative programming models for the implementation of the multifrontal method.
All the experimental results presented in this study are evaluated with a detailed performance analysis that measures the impact of several identified effects on performance and scalability. Thanks to this original analysis, presented in the first part of this study, we are able to fully understand the results obtained with our solver.

Cited literature : 121 references

https://tel.archives-ouvertes.fr/tel-01386600
Contributor : Abes Star
Submitted on : Monday, October 24, 2016 - 1:13:05 PM
Last modification on : Thursday, June 27, 2019 - 4:27:49 PM

File

2015TOU30303.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01386600, version 1

Citation

Florent Lopez. Task-based multifrontal QR solver for heterogeneous architectures. Distributed, Parallel, and Cluster Computing [cs.DC]. Université Paul Sabatier - Toulouse III, 2015. English. ⟨NNT : 2015TOU30303⟩. ⟨tel-01386600⟩
