Scaling the solution of large sparse linear systems using multifrontal methods on hybrid shared-distributed memory architectures

Abstract: The solution of sparse systems of linear equations is at the heart of numerous application fields. While the amount of computational resources in modern architectures keeps increasing and offers new perspectives, the size of the problems arising in today's numerical simulation applications also grows rapidly. Exploiting modern architectures to solve very large problems efficiently is thus a challenge, from both a theoretical and an algorithmic point of view. The aim of this thesis is to address the scalability of sparse direct solvers based on multifrontal methods in parallel asynchronous environments.

In the first part of this thesis, we focus on exploiting multi-threaded parallelism on shared-memory architectures. A variant of the Geist-Ng algorithm is introduced to handle both fine-grain parallelism, through the use of optimized sequential and multi-threaded BLAS libraries, and coarser-grain parallelism, through explicit OpenMP-based parallelization. Memory aspects are then considered to further improve performance on NUMA architectures: (i) on the one hand, we analyse the influence of memory locality and exploit adaptive memory allocation strategies to manage private and shared workspaces; (ii) on the other hand, resource sharing on multicore processors induces performance penalties when many cores are active (machine load effects), which we also consider. Finally, in order to avoid resources remaining idle when they have finished their share of the work, and thus to exploit all available computational resources efficiently, we propose an algorithm which is conceptually very close to the work-stealing approach and which consists in dynamically assigning idle cores to busy threads/activities.

In the second part of this thesis, we target hybrid shared-distributed memory architectures, for which specific work is needed to improve scalability when processing large problems. We first study and optimize the dense linear algebra kernels used in distributed asynchronous multifrontal methods. Simulation, experimentation and profiling have been performed to tune the parameters controlling the algorithm, in correlation with problem size and computer architecture characteristics. To do so, right-looking and left-looking variants of the LU factorization with partial pivoting have been revisited in our distributed context. Furthermore, when computations are accelerated with multiple cores, the relative weight of communication with respect to computation is higher. We explain how to design mapping algorithms that minimize the communication between nodes of the dependency tree of the multifrontal method, and show that collective asynchronous communications become critical on large numbers of processors. We explain why asynchronous broadcasts using standard tree-based communication algorithms must be used. We then show that, in a fully asynchronous multifrontal context where several such asynchronous communication trees coexist, new synchronization issues must be addressed. We analyse and characterize the possible deadlock situations and formally establish simple global properties to handle deadlocks. Such properties partially force synchronization and may limit performance. Hence, we define properties which enable us to relax synchronization and thus improve performance. Our approach is based on the observation that, in our case, as long as memory is available, deadlocks cannot occur; consequently, we just need to keep enough memory to guarantee that a deadlock can always be avoided. Finally, we show that synchronizations can be relaxed in a state-of-the-art solver and illustrate the performance gains on large real problems with our fully asynchronous multifrontal approach.
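To make the idle-core reassignment idea from the first part concrete, the following C/OpenMP sketch illustrates one way of lending cores: threads that run out of subtrees place their core in a shared pool, and busy threads borrow from that pool before launching a dense kernel. This is only an illustration of the principle under assumed names (idle_cores, process_subtree, dense_kernel); the actual mechanism in the thesis operates on the multifrontal assembly tree and adjusts the number of threads given to multi-threaded BLAS calls.

/* Illustrative sketch, not the solver's code: cores, rather than tasks, are
 * dynamically reassigned, which is conceptually close to work stealing.    */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

static int idle_cores = 0;               /* cores currently without work   */

/* Placeholder for a multi-threaded BLAS call whose thread count would be
 * set to 1 + the number of borrowed cores.                                 */
static void dense_kernel(double *a, int n, int nthreads)
{
    #pragma omp parallel for num_threads(nthreads)
    for (int i = 0; i < n; i++)
        a[i] = 2.0 * a[i] + 1.0;
}

static void process_subtree(double *a, int n)
{
    int borrowed;

    /* Grab every currently idle core before the expensive kernel starts.   */
    #pragma omp critical(idle_pool)
    { borrowed = idle_cores; idle_cores = 0; }

    dense_kernel(a, n, 1 + borrowed);

    /* Return the borrowed cores to the pool once the kernel is done.       */
    #pragma omp critical(idle_pool)
    { idle_cores += borrowed; }
}

int main(void)
{
    enum { NSUBTREES = 8, N = 1 << 20 };
    double *work = calloc((size_t)NSUBTREES * N, sizeof *work);
    if (!work) return 1;

    omp_set_max_active_levels(2);        /* allow nested parallel kernels   */

    #pragma omp parallel
    {
        /* Static, possibly unbalanced distribution of independent subtrees. */
        #pragma omp for schedule(static) nowait
        for (int t = 0; t < NSUBTREES; t++)
            process_subtree(work + (size_t)t * N, N);

        /* No subtree left for this thread: declare its core idle.          */
        #pragma omp critical(idle_pool)
        idle_cores++;
    }

    printf("all subtrees processed; %d core(s) idle at the end\n", idle_cores);
    free(work);
    return 0;
}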
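The second part revisits right-looking and left-looking organisations of the LU factorization. As a reminder of the difference between the two orderings, here is a minimal, unblocked C sketch without pivoting (the thesis considers blocked variants with partial pivoting in a distributed asynchronous setting). Both routines overwrite A with its L and U factors in place and perform the same arithmetic in a different order: the right-looking variant updates the whole trailing submatrix as soon as a column is eliminated, whereas the left-looking variant delays all updates to a column until just before that column is eliminated.

/* Illustrative sketch of the two elimination orderings (no pivoting).      */
#include <stdio.h>

#define N 4

/* Right-looking: eliminate column k, then immediately update the trailing
 * submatrix A(k+1:n, k+1:n).                                               */
static void lu_right_looking(double a[N][N])
{
    for (int k = 0; k < N; k++) {
        for (int i = k + 1; i < N; i++)
            a[i][k] /= a[k][k];
        for (int i = k + 1; i < N; i++)
            for (int j = k + 1; j < N; j++)
                a[i][j] -= a[i][k] * a[k][j];
    }
}

/* Left-looking: before eliminating column k, apply to it all the updates
 * coming from the previously factored columns 0..k-1.                      */
static void lu_left_looking(double a[N][N])
{
    for (int k = 0; k < N; k++) {
        for (int j = 0; j < k; j++)
            for (int i = j + 1; i < N; i++)
                a[i][k] -= a[i][j] * a[j][k];
        for (int i = k + 1; i < N; i++)
            a[i][k] /= a[k][k];
    }
}

int main(void)
{
    /* Symmetric positive definite test matrix, factorizable without pivoting. */
    double a[N][N] = {{4,3,2,1},{3,4,3,2},{2,3,4,3},{1,2,3,4}}, b[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            b[i][j] = a[i][j];

    lu_right_looking(a);
    lu_left_looking(b);

    /* Both variants must produce the same factors (up to round-off).       */
    double maxdiff = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double d = a[i][j] > b[i][j] ? a[i][j] - b[i][j] : b[i][j] - a[i][j];
            if (d > maxdiff) maxdiff = d;
        }
    printf("max |right-looking - left-looking| = %g\n", maxdiff);
    return 0;
}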
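Regarding collective asynchronous communications, the abstract argues for broadcasts that follow standard tree-based algorithms rather than a flat scheme in which the root posts one message per destination. The sketch below shows a binomial-tree broadcast built from non-blocking MPI point-to-point sends, so that the sender can overlap computation with communication; the function name tree_ibcast and its interface are assumptions made for this illustration, not the solver's actual communication layer.

/* Illustrative binomial-tree broadcast with non-blocking sends: each rank
 * receives once from its tree parent, forwards to at most log2(size)
 * children with MPI_Isend, and completes the sends later.                  */
#include <mpi.h>
#include <stdio.h>

static void tree_ibcast(double *buf, int count, int root, MPI_Comm comm,
                        MPI_Request *reqs, int *nreqs)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int rel = (rank - root + size) % size;   /* rank relative to the root  */
    *nreqs = 0;

    /* Every rank except the root first receives from its tree parent,
     * obtained by clearing the lowest set bit of its relative rank.        */
    if (rel != 0) {
        int mask = 1;
        while (!(rel & mask)) mask <<= 1;
        int parent = (rel - mask + root) % size;
        MPI_Recv(buf, count, MPI_DOUBLE, parent, 0, comm, MPI_STATUS_IGNORE);
    }

    /* Then it forwards the message to its children without blocking.       */
    for (int mask = 1; mask < size; mask <<= 1) {
        if (rel & mask)                      /* stop at my own lowest bit   */
            break;
        if (rel + mask < size)
            MPI_Isend(buf, count, MPI_DOUBLE, (rel + mask + root) % size, 0,
                      comm, &reqs[(*nreqs)++]);
    }
    /* The caller can now overlap computation and call MPI_Waitall later.   */
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double msg[4] = {0.0};
    if (rank == 0)
        msg[0] = 42.0;                       /* payload broadcast by rank 0 */

    MPI_Request reqs[32];
    int nreqs;
    tree_ibcast(msg, 4, 0, MPI_COMM_WORLD, reqs, &nreqs);

    /* ... computation could be overlapped here ... */
    MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);

    printf("rank %d received %g\n", rank, msg[0]);
    MPI_Finalize();
    return 0;
}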
Document type: Theses

Cited literature: 118 references

https://hal.inria.fr/tel-01111259
Contributor: Equipe Roma
Submitted on: Friday, January 30, 2015 - 12:35:19 AM
Last modification on: Friday, April 20, 2018 - 3:44:27 PM
Long-term archiving on: Saturday, April 15, 2017 - 11:08:20 PM

Identifiers

  • HAL Id: tel-01111259, version 1

Citation

Mohamed Wissam Sid Lakhdar. Scaling the solution of large sparse linear systems using multifrontal methods on hybrid shared-distributed memory architectures. Other [cs.OH]. École normale supérieure de Lyon - ENS LYON, 2014. English. ⟨NNT: 2014ENSL0958⟩. ⟨tel-01111259⟩

Metrics: 844 record views, 647 file downloads