Task-based FMM for multicore architectures, SIAM Journal on Scientific Computing, vol.36, issue.1, pp.66-93, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00807368
Tensorflow: A system for large-scale machine learning, Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16), 2016. ,
Task-based FMM for heterogeneous architectures, Concurrency and Computation: Practice and Experience, vol.28, issue.9, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-00974674
LAPACK: A Portable Linear Algebra Library for High-performance Computers, Proceedings of the 1990 ACM/IEEE Conference on Supercomputing, Supercomputing '90, pp.2-11, 1990. ,
Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures, José M. Laginha M. Palma, Michel Daydé, Osni Marques, and João Correia Lopes, pp.129-138, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00548906
Multifrontal QR Factorization for Multicore Architectures over Runtime Systems, 19th International Conference Euro-Par, vol.8097, pp.521-532, 2013. ,
Implementing multifrontal sparse solvers for multicore architectures with Sequential Task Flow runtime systems, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01333645
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Proceedings of the 15th Euro-Par Conference, 2009. ,
The design of OpenMP tasks, IEEE Transactions on Parallel and Distributed Systems, vol.20, issue.3, pp.404-418, 2009. ,
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, In Journal of Physics: Conference Series, vol.180, p.12037, 2009. ,
On the application task granularity and the interplay with the scheduling overhead in manycore shared memory systems, 2015 IEEE International Conference on Cluster Computing (CLUSTER), pp.428-437, 2015. ,
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures. Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, vol.23, pp.187-198, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
Gama framework: Hardware aware scheduling in heterogeneous environments, 2012. ,
DAGuE: A generic distributed DAG engine for high performance computing, 2010. ,
Flexible development of dense linear algebra algorithms on massively parallel architectures with dplasma, Proceedings of the 25th IEEE International Symposium on Parallel & Distributed Processing Workshops and Phd Forum (IPDPSW'11), pp.1432-1441, 2011. ,
From serial loops to parallel execution on distributed systems, European Conference on Parallel Processing, pp.246-257, 2012. ,
PaRSEC: A programming paradigm exploiting heterogeneity for enhancing scalability, Computing in Science and Engineering, vol.15, issue.6, pp.36-45, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00930217
Poster: programming clusters of gpus with ompss, Proceedings of the international conference on Supercomputing, ICS '11, pp.378-378, 2011. ,
Approximation of Boundary Element Matrices, Numerische Mathematik, vol.86, pp.565-589, 2000. ,
Influence of Tasks Duration Variability on Task-Based Runtime Schedulers, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01716489
TEMANEJO-a debugger for task based parallel programming models, 2011. ,
Scheduling Independent Moldable Tasks on Multi-Cores with GPUs, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01263100
Cilk: An Efficient Multithreaded Runtime System, SIGPLAN Not, vol.30, issue.8, pp.207-216, 1995. ,
Complexity results for scheduling problems, 2009. ,
Scheduling Independent Tasks on Multi-cores with GPU Accelerators, Concurrency and Computation: Practice and Experience, vol.27, issue.6, pp.1625-1638, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01081625
Scheduling multithreaded computations by work stealing, Journal of the ACM (JACM), vol.46, issue.5, pp.720-748, 1999. ,
A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009. ,
COMP Superscalar, an interoperable programming framework, SoftwareX, vol.1, pp.32-36, 2003. ,
Cellss: Scheduling techniques to better exploit memory hierarchy, Scientific Programming, vol.17, pp.77-95, 2009. ,
Productive programming of gpu clusters with ompss, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.557-568, 2012. ,
Legion: expressing locality and independence with logical regions, Proceedings of the international conference on high performance computing, networking, storage and analysis, p.66, 2012. ,
Parallel programmability and the chapel language, The International Journal of High Performance Computing Applications, vol.21, issue.3, pp.291-312, 2007. ,
ScaLAPACK: A portable linear algebra library for distributed memory computers-Design issues and performance, Applied Parallel Computing Computations in Physics, pp.95-106, 1996. ,
Barra: A parallel functional simulator for gpgpu, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp.351-360, 2010. ,
Arthur Redondy, and Clément Vuchener. Vite's project page ,
The MPI message passing interface standard, Programming environments for massively parallel distributed systems, pp.213-218, 1994. ,
X10: An object-oriented approach to non-uniform cluster computing, SIGPLAN Not, vol.40, issue.10, pp.519-538, 2005. ,
Evaluation and optimization of the robustness of dag schedules in heterogeneous environments, IEEE Transactions on Parallel and Distributed Systems, vol.99, pp.532-546, 2009. ,
Distributed snapshots: Determining global states of distributed systems, ACM Trans. Comput. Syst, vol.3, issue.1, pp.63-75, 1985. ,
Automatic task graph generation techniques, Proceedings of the Twenty-Eighth Hawaii International Conference on, vol.2, pp.113-122, 1995. ,
SimGrid: a Generic Framework for Large-Scale Distributed Experiments, 10th IEEE International Conference on Computer Modeling and Simulation, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00260697
Multiprogram scheduling: Parts 1 and 2. introduction and theory, Commun. ACM, vol.3, issue.6, pp.347-350, 1960. ,
Supermatrix: a multithreaded runtime scheduling system for algorithms-by-blocks, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, pp.123-132, 2008. ,
HMPP: A hybrid multi-core parallel programming environment, 2007. ,
PTG: an abstraction for unhindered parallelism, Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), pp.21-30, 2014. ,
LINPACK users' guide, 1979. ,
Large Scale Distributed Deep Networks, Advances in Neural Information Processing Systems 25 (F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors), pp.1223-1231, 2012. ,
Auto-tuning skepu: a multi-backend skeleton programming framework for multi-gpu systems, Proceeding of the 4th international workshop on Multicore software engineering, IWMSE '11, pp.25-32, 2011. ,
First version of a data flow procedure language, Programming Symposium, pp.362-376, 1974. ,
Mapreduce: Simplified data processing on large clusters, Commun. ACM, vol.51, issue.1, pp.107-113, 2008. ,
Een algorithme ter voorkoming van de dodelijke omarming, 1965. ,
The mathematics behind the banker's algorithm, Selected Writings on Computing: A Personal Perspective, 1982. ,
Scheduling Parallel Tasks: Approximation Algorithms, Handbook of Scheduling: Algorithms, Models, and Performance Analysis, vol.26, pp.26-27, 2004. ,
URL : https://hal.archives-ouvertes.fr/hal-00003126
An efficient multi-level trace toolkit for multi-threaded applications, EuroPar, 2005. ,
URL : https://hal.archives-ouvertes.fr/hal-00360309
Architecture-aware algorithms for scalable performance and resilience on heterogeneous architectures, vol.3, 2013. ,
MPI Overlap: Benchmark and Analysis, 45th International Conference on Parallel Processing, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01324179
Software libraries for linear algebra computations on high performance computers, SIAM Review, vol.37, issue.2, pp.151-180, 1995. ,
UPC performance and potential: A NPB experimental study, Supercomputing, ACM/IEEE 2002 Conference, pp.17-17, 2002. ,
Short Description, and Lucent Technologies, Lecture Notes in Computer Science, pp.483-484, 2001. ,
Skepu: a multi-backend skeleton programming library for multi-gpu systems, Proceedings of the fourth international workshop on High-level parallel programming and applications, HLPP '10, pp.5-14, 2010. ,
A NUMA Aware Scheduler for a Parallel Sparse Direct Solver, Workshop on Massively Multiprocessor and Multicore Computers, page 5p, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00549827
Athapascan-1: Online building data flow graph in a parallel language, Parallel Architectures and Compilation Techniques, pp.88-95, 1998. ,
FLAME: Formal linear algebra methods environment, ACM Transactions on Mathematical Software (TOMS), vol.27, issue.4, pp.422-455, 2001. ,
Computers and Intractability, a Guide to the Theory of NP-Completeness, 1979. ,
Xkaapi: A runtime system for data-flow task programming on heterogeneous architectures, Parallel & Distributed Processing (IPDPS), pp.1299-1308, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00799904
Bounds for certain multiprocessing anomalies, Bell System Technical Journal, vol.45, issue.9, pp.1563-1581, 1966. ,
MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory, Computing, vol.95, issue.12, pp.1121-1136, 2013. ,
Communicating sequential processes, Commun. ACM, vol.21, pp.666-677, 1978. ,
Multi-gpu and multi-cpu parallelization for interactive physics simulations, Euro-Par 2010-Parallel Processing, vol.6272, pp.235-246, 2010. ,
Coloured petri net modelling of task scheduling on a heterogeneous computational node, IEEE 10th International Conference on Intelligent Computer Communication and Processing (ICCP), pp.323-330, 2014. ,
Intel Math Kernel Library. Reference Manual, Intel Corporation, 2009. ,
Introducing the open trace format (otf), Computational Science-ICCS 2006, pp.526-533, 2006. ,
Parallex an advanced parallel execution model for scaling-impaired applications, 2009 International Conference on Parallel Processing Workshops, pp.394-401, 2009. ,
Hpx: A task based programming model in a global address space, Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, vol.14, pp.1-6, 2014. ,
Charm++: a portable concurrent object oriented system based on c++, Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications, OOPSLA '93, pp.91-108, 1993. ,
A comparison of multiprocessor scheduling heuristics, Proceedings of the 1994 International Conference on Parallel Processing, vol.II, pp.243-250, 1994. ,
The CHARM Parallel Programming Language and System: Part II-The Runtime system, 1994. ,
Paje: An extensible environment for visualizing multithreaded program executions, Proc. Euro-Par, vol.1900, pp.133-144, 2000. ,
How to make a multiprocessor computer that correctly executes multiprocess programs, IEEE Trans. Comput, vol.28, pp.690-691, 1979. ,
Message passing for GPGPU clusters: CudaMPI, Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on, pp.1-8, 2009. ,
The Cilk++ concurrency platform, The Journal of Supercomputing, vol.51, pp.522-527, 2009. ,
Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes, Parallel & Distributed Processing Symposium Workshops (IPDPSW), pp.29-38, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00925017
Basic linear algebra subprograms for Fortran usage, ACM Transactions on Mathematical Software (TOMS), vol.5, issue.3, pp.308-323, 1979. ,
Capturing OS expertise in an Event Type System: the Bossa experience, Proceedings of the 10th workshop on ACM SIGOPS European workshop, pp.54-61, 2002. ,
Approximation algorithms for scheduling unrelated parallel machines, Mathematical Programming, 1990. ,
Parallel scheduling of dags under memory constraints, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.204-213, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01828312
Scheduling for new computing platforms with GPUs, 2014. ,
URL : https://hal.archives-ouvertes.fr/tel-01127919
Capsules: Expressing composable computations in a parallel programming model, Languages and Compilers for Parallel Computing, vol.5234, pp.276-291, 2008. ,
The OpenACC application programming interface, 2013. ,
Hierarchical task-based programming with StarSs, International Journal of High Performance Computing Applications, vol.23, issue.3, pp.284-299, 2009. ,
A comparison of some recent task-based parallel programming models, 3rd Workshop on Programmability Issues for Multi-Core Computers, 2010. ,
A dependency-aware task-based programming environment for multi-core architectures, Proceedings of the 2008 IEEE International Conference on Cluster Computing, pp.142-151, 2008. ,
Composing parallel software efficiently with lithe, Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation, PLDI '10, pp.376-387, 2010. ,
Tuning pipelined scientific data analyses for efficient multicore execution, 2016 International Conference on High Performance Computing Simulation (HPCS), pp.751-758, 2016. ,
Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs, High-Performance Computing and Networking, vol.1067, pp.493-498, 1996. ,
Intel Threading Building Blocks, 2007. ,
Dask: Parallel computation with blocked algorithms and task scheduling, Proceedings of the 14th Python in Science Conference, pp.130-136, 2015. ,
Jade: A high-level, machine-independent language for parallel programming, Computer, vol.26, pp.28-38, 1993. ,
Scheduling task graphs on modern computing platforms. Theses, 2018. ,
URL : https://hal.archives-ouvertes.fr/tel-01843558
Regent: A high-productivity programming language for hpc with logical regions, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, p.81, 2015. ,
Knights landing (knl): 2nd generation intel xeon phi processor, 2015 IEEE Hot Chips 27 Symposium (HCS), pp.1-24, 2015. ,
Dynamic grain-size adaptation on object oriented parallel programming the SCOOPP approach, Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing, IPPS '99/SPDP '99, pp.728-732, 1999. ,
The gaspi api: A failure tolerant pgas api for asynchronous dataflow on heterogeneous architectures, Sustained Simulation Performance, pp.17-32, 2014. ,
The Two-dimensional Block-Cyclic Distribution, 1997. ,
Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp.1-11, 2009. ,
Component technology: what, where, and how?, Proceedings of the 25th international conference on Software engineering, pp.684-693, 2003. ,
A scalable and generic task scheduling system for communication libraries, IEEE International Conference on Cluster Computing, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00408521
Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, pp.232-240, 2010. ,
A taxonomy of task-based parallel programming technologies for high-performance computing, The Journal of Supercomputing, vol.74, issue.4, pp.1422-1434, 2018. ,
A high-productivity task-based programming model for clusters, Concurrency and Computation: Practice and Experience, vol.24, issue.18, pp.2421-2448, 2012. ,
Task scheduling algorithms for heterogeneous processors, Proceedings of the Eighth Heterogeneous Computing Workshop, HCW '99, vol.3, 1999. ,
Scientific Computing on Multicore Architectures, 2014. ,
SuperGlue: A shared memory framework using data versioning for dependency-aware task-based parallelization, SIAM Journal on Scientific Computing, vol.37, issue.6, pp.617-642, 2015. ,
Building portable thread schedulers for hierarchical multiprocessors: The bubblesched framework, Euro-Par 2007 Parallel Processing, vol.4641, pp.42-51, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00154506
Poster: Task management for irregular workloads on the gpu, Proceeding of NVIDIA GPU Technology Conference, 2010. ,
De l'interaction des communications et de l'ordonnancement de threads au sein des grappes de machines multi-coeurs, Alexandre Informatique Bordeaux, vol.1, 2009. ,
URL : https://hal.archives-ouvertes.fr/tel-00469488
Multi2sim: A simulation framework for cpu-gpu computing, Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, PACT '12, pp.335-344, 2012. ,
NP-complete scheduling problems, Journal of Computer and System sciences, vol.10, issue.3, pp.384-393, 1975. ,
A bridging model for parallel computation, Commun. ACM, vol.33, issue.8, pp.103-111, 1990. ,
Hierarchical DAG scheduling for Hybrid Distributed Systems, 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01078359
Dynamic task execution on shared and distributed memory architectures, 2012. ,
Quark users' guide: Queueing and runtime for kernels ,
Porting the plasma numerical library to the openmp standard, International Journal of Parallel Programming, vol.45, issue.3, pp.612-633, 2017. ,
Taskuniverse: A task-based unified interface for versatile parallel execution, Parallel Processing and Applied Mathematics, pp.169-184, 2018. ,
UPC++: a PGAS Extension for C++, Parallel and Distributed Processing Symposium, pp.1105-1114, 2014. ,
Distributed dynamic load balancing for task parallel programming, 2018. ,
DuctTeip: A task-based parallel programming framework for distributed memory architectures, 2016. ,
Ordonnancement de processus légers sur architectures multiprocesseurs hiérarchiques : BubbleSched, une approche exploitant la structure du parallélisme des applications, vol.1, 2007. ,
Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01618526
List Scheduling in Embedded Systems Under Memory Constraints, International Journal of Parallel Programming, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00906117
A Visual Performance Analysis Framework for Task-based Parallel Applications running on Hybrid Clusters, Concurrency and Computation: Practice and Experience, 2018. ,
Faithful Performance Prediction of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures, Concurrency and Computation: Practice and Experience, p.16, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01147997
A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs, GPU Computing Gems, vol.2, 2010. ,
QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 25th IEEE International Parallel & Distributed Processing Symposium, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00547614
List Scheduling in Embedded Systems under Memory Constraints, SBAC-PAD'2013-25th International Symposium on Computer Architecture and High-Performance Computing, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00906117
DKPN: A Composite Dataflow/Kahn Process Networks Execution Model, 24th Euromicro International Conference on Parallel, Distributed and Network-based processing, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01234333
Data-Aware Task Scheduling on Multi-Accelerator based Platforms, The 16th International Conference on Parallel and Distributed Systems (ICPADS), 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00523937
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Proceedings of the 15th International Euro-Par Conference, vol.5704, pp.863-874, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
High-Level Support for Pipeline Parallelism on Many-Core Architectures, Euro-Par 2012 - International European Conference on Parallel and Distributed Computing, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00697020
hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications, Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2010), pp.180-186, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00429889
Flexible runtime support for efficient skeleton programming on hybrid systems, Proceedings of the International Conference on Parallel Computing (ParCo), Applications, Tools and Techniques on the Road to Exascale Computing, vol.22, pp.159-166, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00606200
Siegfried Benkner, Jesper Larsson Träff, and Sabri Pllana. Programmability and Performance Portability Aspects of Heterogeneous Multi-/Manycore Systems, Design, Automation and Test in Europe (DATE), 2012. ,
Towards seismic wave modeling on heterogeneous many-core architectures using task-based runtime system, 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), Florianopolis, 2015. ,
Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing, The 2013 International Symposium on Advances of Distributed and Parallel Computing (ADPC 2013), 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00920915
Implementation of FEM Application on GPU with StarPU, SIAM CSE13 - SIAM Conference on Computational Science and Engineering, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00926144
Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures, Euro-Par - 20th International Conference on Parallel Processing, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01011633
Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators, Symposium on Application Accelerators in High Performance Computing (SAAHPC), 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00547616
Harnessing clusters of hybrid nodes with a sequential task-based programming model, 8th International Workshop on Parallel Matrix Algorithms and Applications, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01283949
Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms, Heterogeneity in Computing Workshop, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01120507
Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures, Proceedings of the International Euro-Par Workshops 2009, HPPC'09, vol.6043, pp.56-65, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00421333
Exploiting the Cell/BE architecture with the StarPU unified runtime system, SAMOS Workshop-International Workshop on Systems, Architectures, Modeling, and Simulation, vol.5657, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00378705
Analyzing Dynamic Task-Based Applications on Hybrid Platforms: An Agile Scripting Approach, 3rd Workshop on Visual Performance Analysis (VPA), 2016. ,
Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes, HCW'2014 workshop of IPDPS, pp.8446-8446, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00925017
A NUMA-aware fine grain parallelization framework for multi-core architecture, PDSEC - 14th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00858350
Controlling the Memory Subscription of Distributed Applications with a Task-Based Runtime System, 21st International Workshop on High-Level Parallel Programming Models and Supportive Environments, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01284004
An Efficient OpenMP Runtime System for Hierarchical Architectures, A Practical Programming Model for the Multi-Core Era, 3rd International Workshop on OpenMP, vol.4935, pp.161-172, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00154502
Evaluation of OpenMP Dependent Tasks with the KASTORS Benchmark Suite, 10th International Workshop on OpenMP, IWOMP2014, pp.16-29, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01081974
Traitements d'images sur architectures parallèles et hétérogènes, 2012. ,
Ordonnancement de liste dans les systèmes embarqués sous contrainte de mémoire, 21èmes Rencontres Francophones du Parallélisme (RenPar'21), 2013. ,
Détection optimale des coins et contours dans des bases d'images volumineuses sur architectures multicoeurs hétérogènes, 20èmes Rencontres Francophones du Parallélisme (RenPar'20), 2011. ,
Detecção de Anomalias de Desempenho em Aplicações de Alto Desempenho baseadas em Tarefas em Clusters Híbridos, 17o Workshop em Desempenho de Sistemas Computacionais e de Comunicação (WPerformance), 2018. ,
Modeling Irregular Kernels of Task-based codes: Illustration with the Fast Multipole Method, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01474556
Achieving high-performance with a sparse direct solver on Intel KNL, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01473475
StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00725477
StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00467677
StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators, LNCS. Springer, vol.7490, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00725477
Scheduling of dynamic streaming applications on hybrid embedded MPSoCs comprising programmable computing units and hardware accelerators, 2015. ,
URL : https://hal.archives-ouvertes.fr/tel-01159519
Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System's Perspective, 2011. ,
Scheduling of Dense Linear Algebra Kernels on Heterogeneous Resources, 2017. ,
URL : https://hal.archives-ouvertes.fr/tel-01538516
A fine grain model programming for parallelization of sparse linear solver, 2015. ,
URL : https://hal.archives-ouvertes.fr/tel-01230876
Scalability of a task-based runtime system for dense linear algebra applications, 2016. ,
URL : https://hal.archives-ouvertes.fr/tel-01483666
Bridging the gap between OpenMP 4.0 and native runtime systems for the fast multipole method, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01372022
Bridging the gap between OpenMP and task-based runtime systems for the fast multipole method, IEEE Transactions on Parallel and Distributed Systems, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01517153
LU factorization for accelerator-based systems, 9th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 11), 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00654193
Task-based FMM for heterogeneous architectures, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00974674
Task-based fast multipole method for clusters of multicore processors, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01387482
Are Static Schedules so Bad ? A Case Study on Cholesky Factorization, Proceedings of the 30th IEEE International Parallel & Distributed Processing Symposium, IPDPS'16, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01223573
Task-based Conjugate Gradient: from multi-GPU towards heterogeneous architectures, Research Report, vol.8912, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01334734
A unified runtime system for heterogeneous multicore architectures, Proceedings of the International Euro-Par Workshops 2008, HPPC'08, vol.5415, pp.174-183, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00326917
Vers des supports d'exécution capables d'exploiter les machines multicoeurs hétérogènes, 2008. ,
StarPU: un support exécutif unifié pour les architectures multicoeurs hétérogènes, 19èmes Rencontres Francophones du Parallélisme (RenPar'19), 2009. ,
Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources, International Conference on High Performance Computing, Data, and Analytics (HiPC), 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01361992
Approximation proofs of a fast and efficient list scheduling algorithm for task-based runtime systems on multicores and gpus, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.768-777, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01386174
Ordonnancement dynamique, adapté aux architectures hétérogènes, de la méthode multipôle pour les équations de Maxwell, en électromagnétisme, Université Bordeaux 1, 2013. ,
PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems, IEEE Micro, vol.31, issue.5, pp.28-41, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00648480
Design and Analysis of a Task-based Parallelization over a Runtime System of an Explicit Finite-Volume CFD Code with Adaptive Time Stepping, International Journal of Computational Science and Engineering, pp.1-22, 2017. ,
Resource aggregation for task-based Cholesky Factorization on top of heterogeneous machines, HeteroPar'2016 workshop of Euro-Par, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01181135
Critical resources management and scheduling under StarPU, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01718280
Programmation of heterogeneous architectures using moldable tasks, 2018. ,
URL : https://hal.archives-ouvertes.fr/tel-01816341
Ludovic Courtès. C Language Extensions for Hybrid CPU/GPU Programming with StarPU, 2013. ,
Programmation unifiée multiaccélérateur OpenCL, pp.1233-1249, 2012. ,
Toward OpenCL Automatic Multi-Device Support, 2014. ,
Programmation multi-accélérateurs unifiée en OpenCL, Rencontres Francophones du Parallélisme (RenPar'20), p.20, 2011. ,
Modèles de programmation et supports exécutifs pour architectures hétérogènes, 2013. ,
ViperVM: a Runtime System for Parallel Functional High-Performance Computing on Heterogeneous Architectures, 2nd Workshop on Functional High-Performance Computing (FHPC'13), 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00851122
Composing multiple StarPU applications over heterogeneous machines: a supervised approach, Third International Workshop on Accelerators and Hybrid Exascale Systems, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00824514
Composabilité de codes parallèles sur architectures hétérogènes. Mémoire de master, 2011. ,
Le problème de la composition parallèle : une approche supervisée, 21èmes Rencontres Francophones du Parallélisme (RenPar'21), 2013. ,
Composability of parallel codes on heterogeneous architectures, 2014. ,
URL : https://hal.archives-ouvertes.fr/tel-01162975
Partitioning GPUs for Improved Scalability, IEEE 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2016. ,
Modulariser les ordonnanceurs de tâches : une approche structurelle, Conférence d'informatique en Parallélisme, Architecture et Système (ComPAS'2014), 2014. ,
Fast and Accurate Simulation of Multithreaded Sparse Linear Algebra Solvers, The 21st IEEE International Conference on Parallel and Distributed Systems, 2015. ,
DOI : 10.1109/icpads.2015.67
URL : https://hal.archives-ouvertes.fr/hal-01180272