Task-Based Fast Multipole Method for Clusters of Multicore Processors,
URL : https://hal.archives-ouvertes.fr/hal-01387482
Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Transactions on Mathematical Software, vol.43, issue.2, pp.13-14, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01333645
Online algorithms: a survey, Mathematical Programming, pp.3-26, 2003. ,
DOI : 10.1007/s10107-003-0436-0
Communication-Avoiding QR Decomposition for GPUs, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.48-58, 2011. ,
DOI : 10.1109/IPDPS.2011.15
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
Parallelizing dense and banded linear algebra libraries using SMPSs, Concurrency and Computation: Practice and Experience, pp.2438-2456, 2009. ,
Communication Bounds for Heterogeneous Architectures, 2011. ,
Communication-optimal parallel algorithm for Strassen's matrix multiplication, Proceedings of the 24th ACM symposium on Parallelism in algorithms and architectures, SPAA '12, pp.193-204, 2012. ,
DOI : 10.1145/2312005.2312044
Minimizing Communication in Numerical Linear Algebra, SIAM Journal on Matrix Analysis and Applications, vol.32, issue.3, pp.866-901, 2011. ,
DOI : 10.1137/090769156
Constraint-Based Scheduling: Applying Constraint Programming to Scheduling Problems, 2012. ,
DOI : 10.1007/978-1-4615-1479-4
URL : https://hal.archives-ouvertes.fr/inria-00123562
A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers), IEEE Transactions on Computers, vol.50, issue.10, pp.1052-1070, 2001. ,
DOI : 10.1109/12.956091
URL : https://hal.archives-ouvertes.fr/hal-00808287
Partitioning a Square into Rectangles: NP-Completeness and Approximation Algorithms, Algorithmica, vol.34, issue.3, pp.217-239, 2002. ,
DOI : 10.1007/s00453-002-0962-9
URL : https://hal.archives-ouvertes.fr/hal-00807407
Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources, 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), 2016. ,
DOI : 10.1109/HiPC.2016.045
URL : https://hal.archives-ouvertes.fr/hal-01361992
Comparison of Static and Dynamic Resource Allocation Strategies for Matrix Multiplication, International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp.170-177, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01163936
A New Approximation Algorithm for Matrix Partitioning in Presence of Strongly Heterogeneous Processors, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.474-483, 2016. ,
DOI : 10.1109/IPDPS.2016.32
URL : https://hal.archives-ouvertes.fr/hal-01216245
Cuboid Partitioning for Parallel Matrix Multiplication on Heterogeneous Platforms, International European Conference on Parallel and Distributed Computing (Euro-Par), pp.171-182, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01269881
Matching-Based Assignment Strategies for Improving Data Locality of Map Tasks in MapReduce, Inria Research Centre Grenoble-Rhône-Alpes; Inria Bordeaux Sud-Ouest, 2017. ,
Non Linear Divisible Loads: There is No Free Lunch, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pp.863-873, 2013. ,
DOI : 10.1109/IPDPS.2013.94
URL : https://hal.archives-ouvertes.fr/hal-00762008
Static LU Decomposition on Heterogeneous Platforms, The International Journal of High Performance Computing Applications, vol.15, issue.3, pp.310-323, 2001. ,
URL : https://hal.archives-ouvertes.fr/hal-00856641
Analysis of dynamic scheduling strategies for matrix multiplication on heterogeneous platforms, Proceedings of the 23rd international symposium on High-performance parallel and distributed computing, HPDC '14, pp.141-152, 2014. ,
DOI : 10.1145/2600212.2600223
URL : https://hal.archives-ouvertes.fr/hal-01090254
Balanced allocations, Proceedings of the thirty-second annual ACM symposium on Theory of computing, STOC '00, pp.745-754, 2000. ,
DOI : 10.1145/335305.335411
On weighted balls-into-bins games, Theoretical Computer Science, vol.409, issue.3, pp.511-520, 2008. ,
DOI : 10.1016/j.tcs.2008.09.023
Two Theorems in Graph Theory, Proceedings of the National Academy of Sciences, vol.43, issue.9, pp.842-844, 1957. ,
DOI : 10.1073/pnas.43.9.842
Scheduling independent tasks on multi-cores with GPU accelerators, Concurrency and Computation: Practice and Experience, pp.1625-1638, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01081625
Scheduling multithreaded computations by work stealing, Journal of the ACM, vol.46, issue.5, pp.720-748, 1999. ,
DOI : 10.1145/324133.324234
Modern Graph Theory, 2013. ,
DOI : 10.1007/978-1-4612-0619-4
HDFS Architecture Guide, p.39, 2008. ,
PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013. ,
DOI : 10.1109/MCSE.2013.98
Algorithm 781: generating Hilbert's space-filling curve by recursion, ACM Transactions on Mathematical Software, vol.24, issue.2, pp.184-189, 1998. ,
DOI : 10.1145/290200.290219
Productive Programming of GPU Clusters with OmpSs, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.557-568, 2012. ,
DOI : 10.1109/IPDPS.2012.58
Alternative Algorithm for Hilbert's Space-Filling Curve, IEEE Transactions on Computers, vol.20, issue.4, pp.424-426, 1971. ,
DOI : 10.1109/T-C.1971.223258
The random graph threshold for k-orientability and a fast algorithm for optimal multiple-choice allocation, Symposium on Discrete Algorithms (SODA), pp.469-476, 2007. ,
The data grid: Towards an architecture for the distributed management and analysis of large scientific datasets, Journal of Network and Computer Applications, vol.23, issue.3, pp.187-200, 2000. ,
DOI : 10.1006/jnca.2000.0110
ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers: Design Issues and Performance, Computer Physics Communications, pp.1-15, 1996. ,
Coflow, Proceedings of the 11th ACM Workshop on Hot Topics in Networks, HotNets-XI, pp.31-36, 2012. ,
DOI : 10.1145/2390231.2390237
Efficient Coflow Scheduling Without Prior Knowledge, ACM SIGCOMM Computer Communication Review, vol.45, issue.5, pp.393-406, 2015. ,
On the efficacy, efficiency and emergent behavior of task replication in large distributed systems, Parallel Computing, vol.33, issue.3, pp.213-234, 2007. ,
DOI : 10.1016/j.parco.2007.01.002
Hierarchical Partitioning Algorithm for Scientific Computing on Highly Heterogeneous CPU + GPU Clusters, International European Conference on Parallel and Distributed Computing (Euro-Par), pp.489-501, 2012. ,
Resource Aggregation for Task-Based Cholesky Factorization on Top of Heterogeneous Machines, International European Conference on Parallel and Distributed Computing, 2016. ,
Matrix multiplication via arithmetic progressions, Journal of Symbolic Computation, vol.9, issue.3, pp.251-280, 1990. ,
DOI : 10.1016/S0747-7171(08)80013-2
MapReduce, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008. ,
DOI : 10.1145/1327452.1327492
Optimal Partitioning for Parallel Matrix Computation on a Small Number of Abstract Heterogeneous Processors, PhD thesis, 2014. ,
Searching for the Optimal Data Partitioning Shape for Parallel Matrix Matrix Multiplication on 3 Heterogeneous Processors, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp.17-28, 2014. ,
DOI : 10.1109/IPDPSW.2014.8
Optimal Data Partitioning Shape for Matrix Multiplication on Three Fully Connected Heterogeneous Processors, International European Conference on Parallel and Distributed Computing (Euro-Par) Workshop, pp.201-214, 2014. ,
DOI : 10.1007/978-3-319-14325-5_18
Partitioning for Parallel Matrix-Matrix Multiplication with Heterogeneous Processors: The Optimal Solution, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, pp.125-139, 2012. ,
DOI : 10.1109/IPDPSW.2012.12
Avoiding communication in sparse matrix computations, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.1-12, 2008. ,
DOI : 10.1109/IPDPS.2008.4536305
Multi-Jagged: A Scalable Parallel Spatial Partitioning Algorithm, IEEE Transactions on Parallel and Distributed Systems, vol.27, issue.3, pp.803-817, 2016. ,
DOI : 10.1109/TPDS.2015.2412545
Guest Editors' Introduction to the Top 10 Algorithms, Computing in Science & Engineering, vol.2, issue.1, pp.22-23, 2000. ,
DOI : 10.1109/MCISE.2000.814652
Two approximation algorithms for bipartite matching on multicore architectures, Journal of Parallel and Distributed Computing, vol.85, pp.62-78, 2015. ,
DOI : 10.1016/j.jpdc.2015.06.009
On Random Matrices, Studia Scientiarum Mathematicarum Hungarica, vol.8, pp.455-461, 1964. ,
Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing, International Conference on High Performance Computing, p.78, 2012. ,
Maximal flow through a network, Canadian Journal of Mathematics, vol.8, pp.399-404, 1956. ,
DOI : 10.4153/CJM-1956-045-5
A case study of MapReduce speculation for failure recovery, Proceedings of the 2015 International Workshop on Data-Intensive Scalable Computing Systems, DISCS '15, pp.1-7, 2015. ,
Exact and approximation algorithms for a soft rectangle packing problem, Optimization, vol.63, issue.11, pp.1637-1663, 2014. ,
KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, pp.15-23, 2007. ,
DOI : 10.1145/1278177.1278182
URL : https://hal.archives-ouvertes.fr/hal-00727795
Major Computer Science Challenges At Exascale, The International Journal of High Performance Computing Applications, vol.23, issue.4, pp.427-436, 2009. ,
DOI : 10.1177/1094342009347445
Perfect Matchings in $O(n\log n)$ Time in Regular Bipartite Graphs, SIAM Journal on Computing, vol.42, issue.3, pp.1392-1404, 2013. ,
DOI : 10.1137/100812513
Firmament: Fast, Centralized Cluster Scheduling at Scale, Symposium on Operating Systems Design and Implementation (OSDI), pp.99-115, 2016. ,
Anatomy of high-performance matrix multiplication, ACM Transactions on Mathematical Software, vol.34, issue.3, pp.12:1-12:25, 2008. ,
DOI : 10.1145/1356052.1356053
Getting Up to Speed: The Future of Supercomputing, 2005. ,
A fast algorithm for particle simulations, Journal of Computational Physics, vol.73, issue.2, pp.325-348, 1987. ,
DOI : 10.1016/0021-9991(87)90140-9
Improving MapReduce Performance in Heterogeneous Network Environments and Resource Utilization, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), pp.714-716, 2012. ,
DOI : 10.1109/CCGrid.2012.12
Investigation of Data Locality in MapReduce, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), pp.419-426, 2012. ,
DOI : 10.1109/CCGrid.2012.42
On Representatives of Subsets, Journal of the London Mathematical Society, vol.s1-10, issue.1, pp.26-30, 1935. ,
Locality-Aware Reduce Task Scheduling for MapReduce, 2011 IEEE Third International Conference on Cloud Computing Technology and Science, pp.570-576, 2011. ,
DOI : 10.1109/CloudCom.2011.87
Parallel matrix multiplication based on space-filling curves on shared memory multicore platforms, Proceedings of the 2008 workshop on Memory access on future processors: a solved problem?, MAW '08, pp.385-392, 2008. ,
DOI : 10.1145/1366219.1366223
Ueber die stetige Abbildung einer Line auf ein Flächenstück, Mathematische Annalen, vol.38, issue.3, pp.459-460, 1891. ,
DOI : 10.1007/BF01199431
Communication-avoiding Krylov Subspace Methods, PhD thesis, 2010. ,
An $n^{5/2}$ Algorithm for Maximum Matchings in Bipartite Graphs, SIAM Journal on Computing, vol.2, issue.4, pp.225-231, 1973. ,
DOI : 10.1137/0202019
Strassen's Algorithm for Tensor Contraction, Computing Research Repository, 2017. ,
Maestro: Replica-Aware Map Scheduling for MapReduce, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), pp.435-442, 2012. ,
DOI : 10.1109/CCGrid.2012.122
URL : https://hal.archives-ouvertes.fr/hal-00670813
Quincy, Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, SOSP '09, pp.261-276, 2009. ,
DOI : 10.1145/1629575.1629601
Heterogeneous Distribution of Computations Solving Linear Algebra Problems on Networks of Heterogeneous Computers, Journal of Parallel and Distributed Computing, vol.61, issue.4, pp.520-535, 2001. ,
DOI : 10.1006/jpdc.2000.1686
An Analysis of Traces from a Production MapReduce Cluster, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, pp.94-103, 2010. ,
DOI : 10.1109/CCGRID.2010.112
Tiling a Rectangle with the Fewest Squares, Journal of Combinatorial Theory, Series A, vol.76, issue.2, pp.272-291, 1996. ,
DOI : 10.1006/jcta.1996.0104
Heuristic initialization for bipartite matching problems, Journal of Experimental Algorithmics, vol.15, pp.1-3, 2010. ,
DOI : 10.1145/1671970.1712656
Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs, 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing, pp.75-82, 2012. ,
DOI : 10.1109/SBAC-PAD.2012.28
URL : https://hal.archives-ouvertes.fr/hal-00735470
Efficient task replication and management for adaptive fault tolerance in Mobile Grid environments, Future Generation Computer Systems, vol.23, issue.2, pp.163-178, 2007. ,
DOI : 10.1016/j.future.2006.04.014
The Use of Triple-Modular Redundancy to Improve Computer Reliability, IBM Journal of Research and Development, vol.6, issue.2, pp.200-209, 1962. ,
DOI : 10.1147/rd.62.0200
Partitioning an Array onto a Mesh of Processors, Applied Parallel Computing: Industrial Computation and Optimization, pp.467-477, 1996. ,
The power of two choices in randomized load balancing, Transactions on Parallel and Distributed Systems (TPDS), pp.1094-1104, 2001. ,
DOI : 10.1109/71.963420
An approximation algorithm for dissecting a rectangle into rectangles with specified areas, Discrete Applied Mathematics, vol.155, issue.4, pp.523-537, 2007. ,
DOI : 10.1016/j.dam.2006.08.005
Towards an efficient use of the BLAS library for multilinear tensor contractions, Applied Mathematics and Computation, vol.235, pp.454-468, 2014. ,
DOI : 10.1016/j.amc.2014.02.051
Generating Efficient Tensor Contractions for GPUs, 2015 44th International Conference on Parallel Processing, pp.969-978, 2015. ,
DOI : 10.1109/ICPP.2015.106
A survey of dynamic scheduling in manufacturing systems, Journal of Scheduling, vol.12, issue.4, pp.417-431, 2009. ,
On the Performance Prediction of BLAS-based Tensor Contractions, International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), pp.193-212, 2014. ,
DOI : 10.1007/978-3-319-17248-4_10
The (1 + β)-Choice Process and Weighted Balls-into-Bins, Symposium on Discrete Algorithms (SODA), pp.1613-1619, 2010. ,
DOI : 10.1137/1.9781611973075.131
Partitioning with Spacefilling Curves, 1994. ,
Minimizing the Total Weighted Completion Time of Coflows in Datacenter Networks, Proceedings of the 27th ACM on Symposium on Parallelism in Algorithms and Architectures, SPAA '15, pp.294-303, 2015. ,
"Balls into Bins" - A Simple and Tight Analysis, Randomization and Approximation Techniques in Computer Science (RANDOM), pp.159-170, 1998. ,
DOI : 10.1007/3-540-49543-6_13
Identifying Dynamic Replication Strategies for a High-Performance Data Grid, Grid Computing - GRID, pp.75-86, 2001. ,
DOI : 10.1007/3-540-45644-9_8
Exascale computing and big data, Communications of the ACM, vol.58, issue.7, pp.56-68, 2015. ,
The Power of Two Random Choices: A Survey of Techniques and Results, Combinatorial Optimization, vol.9, pp.255-304, 2001. ,
Algorithms for Scalable Storage Servers, International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM), pp.82-101, 2004. ,
DOI : 10.1007/978-3-540-24618-3_8
Fast Concurrent Access to Parallel Disks, Algorithmica, vol.35, issue.1, pp.21-55, 2003. ,
DOI : 10.1007/s00453-002-0987-0
Scheduling Task Graphs Optimally with A*, The Journal of Supercomputing, vol.51, issue.3, pp.310-332, 2010. ,
A state-of-art survey of static scheduling research involving due dates, Omega, vol.12, issue.1, pp.63-76, 1984. ,
DOI : 10.1016/0305-0483(84)90011-2
Exascale Computing Technology Challenges, High Performance Computing for Computational Science (VECPAR), pp.1-25, 2011. ,
On optimization of finite-difference time-domain (FDTD) computation on heterogeneous and GPU clusters, Journal of Parallel and Distributed Computing, vol.71, issue.4, pp.584-593, 2011. ,
DOI : 10.1016/j.jpdc.2010.10.011
Tensor Contractions with Extended BLAS Kernels on CPU and GPU, 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), pp.193-202, 2016. ,
DOI : 10.1109/HiPC.2016.031
Programming the Hilbert curve, AIP Conference Proceedings, pp.381-387, 2004. ,
DOI : 10.1063/1.1751381
Communication-optimal Parallel 2.5D Matrix Multiplication and LU factorization Algorithms, International European Conference on Parallel and Distributed Computing (Euro-Par), pp.90-109, 2011. ,
Apache Spark: Lightning-Fast Cluster Computing, 2016. ,
File and object replication in data grids, Proceedings 10th IEEE International Symposium on High Performance Distributed Computing, pp.76-86, 2001. ,
DOI : 10.1109/HPDC.2001.945178
Gaussian elimination is not optimal, Numerische Mathematik, vol.13, issue.4, pp.354-356, 1969. ,
DOI : 10.1007/BF02165411
Improving ReduceTask data locality for sequential MapReduce jobs, 2013 Proceedings IEEE INFOCOM, pp.1627-1635, 2013. ,
DOI : 10.1109/INFCOM.2013.6566959
Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002. ,
DOI : 10.1109/71.993206
Matchings in random regular bipartite digraphs, Discrete Mathematics, vol.31, issue.1, pp.59-64, 1980. ,
DOI : 10.1016/0012-365X(80)90172-7
Rectangles as sums of squares, Discrete Mathematics, vol.309, issue.9, pp.2913-2921, 2009. ,
DOI : 10.1016/j.disc.2008.07.028
Determining redundancy levels for fault tolerant real-time systems, IEEE Transactions on Computers, vol.44, issue.2, pp.292-301, 1995. ,
DOI : 10.1109/12.364540
Map task scheduling in MapReduce with data locality: Throughput and heavy-traffic optimality, 2013 Proceedings IEEE INFOCOM, pp.1609-1617, 2013. ,
DOI : 10.1109/INFCOM.2013.6566957
Introduction to Graph Theory, 2001. ,
Hadoop: The Definitive Guide, O'Reilly Media, 2012. ,
Degree-guided map-reduce task assignment with data locality constraint, 2012 IEEE International Symposium on Information Theory Proceedings, pp.985-989, 2012. ,
DOI : 10.1109/ISIT.2012.6284711
Near Data Scheduling for Data Centers with Multi Levels of Data Locality, Computing Research Repository, 2017. ,
Delay scheduling, Proceedings of the 5th European conference on Computer systems, EuroSys '10, pp.265-278, 2010. ,
DOI : 10.1145/1755913.1755940
Improving MapReduce Performance in Heterogeneous Environments, Symposium on Operating Systems Design and Implementation (OSDI), pp.29-42, 2008. ,
3-D data partitioning for 3-level perfectly nested loops on heterogeneous distributed systems, Concurrency and Computation: Practice and Experience, 2016. ,