E. Agullo, B. Bramas, O. Coulaud et al., Task-Based Fast Multipole Method for Clusters of Multicore Processors
URL : https://hal.archives-ouvertes.fr/hal-01387482

E. Agullo, A. Buttari, A. Guermouche et al., Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Transactions on Mathematical Software, vol.43, issue.2, pp.13-14, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01333645

S. Albers, Online algorithms: a survey, Mathematical Programming, pp.3-26, 2003.
DOI : 10.1007/s10107-003-0436-0

M. Anderson, G. Ballard, J. Demmel, and K. Keutzer, Communication-Avoiding QR Decomposition for GPUs, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.48-58, 2011.
DOI : 10.1109/IPDPS.2011.15

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00384363

R. M. Badia, J. R. Herrero, J. Labarta, J. M. Pérez, E. S. Quintana-Ortí et al., Parallelizing dense and banded linear algebra libraries using SMPSs, Concurrency and Computation: Practice and Experience, pp.2438-2456, 2009.

G. Ballard, J. Demmel, and A. Gearhart, Communication Bounds for Heterogeneous Architectures, 2011.

G. Ballard, J. Demmel, O. Holtz et al., Communication-optimal parallel algorithm for Strassen's matrix multiplication, Proceedings of the 24th ACM symposium on Parallelism in algorithms and architectures, SPAA '12, pp.193-204, 2012.
DOI : 10.1145/2312005.2312044

G. Ballard, J. Demmel, O. Holtz et al., Minimizing Communication in Numerical Linear Algebra, SIAM Journal on Matrix Analysis and Applications, vol.32, issue.3, pp.866-901, 2011.
DOI : 10.1137/090769156

P. Baptiste, C. Le Pape, and W. Nuijten, Constraint-Based Scheduling: Applying Constraint Programming to Scheduling Problems, 2012.
DOI : 10.1007/978-1-4615-1479-4

URL : https://hal.archives-ouvertes.fr/inria-00123562

O. Beaumont, V. Boudet, A. Petitet et al., A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers), IEEE Transactions on Computers, vol.50, issue.10, pp.1052-1070, 2001.
DOI : 10.1109/12.956091

URL : https://hal.archives-ouvertes.fr/hal-00808287

O. Beaumont, V. Boudet, F. Rastello, and Y. Robert, Partitioning a Square into Rectangles: NP-Completeness and Approximation Algorithms, Algorithmica, vol.34, issue.3, pp.217-239, 2002.
DOI : 10.1007/s00453-002-0962-9

URL : https://hal.archives-ouvertes.fr/hal-00807407

O. Beaumont, T. Cojean, L. Eyraud-Dubois et al., Scheduling of Linear Algebra Kernels on Multiple Heterogeneous Resources, 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), 2016.
DOI : 10.1109/HiPC.2016.045

URL : https://hal.archives-ouvertes.fr/hal-01361992

O. Beaumont, L. Eyraud-Dubois, A. Guermouche et al., Comparison of Static and Dynamic Resource Allocation Strategies for Matrix Multiplication, International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp.170-177, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01163936

O. Beaumont, L. Eyraud-Dubois, and T. Lambert, A New Approximation Algorithm for Matrix Partitioning in Presence of Strongly Heterogeneous Processors, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.474-483, 2016.
DOI : 10.1109/IPDPS.2016.32

URL : https://hal.archives-ouvertes.fr/hal-01216245

O. Beaumont, L. Eyraud-Dubois, and T. Lambert, Cuboid Partitioning for Parallel Matrix Multiplication on Heterogeneous Platforms, International European Conference on Parallel and Distributed Computing (Euro-Par), pp.171-182, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01269881

O. Beaumont, T. Lambert, L. Marchal et al., Matching-Based Assignment Strategies for Improving Data Locality of Map Tasks in MapReduce, Inria Research Centre Grenoble-Rhône-Alpes; Inria Bordeaux Sud-Ouest. URL https, 2017.

O. Beaumont, H. Larchevêque, and L. Marchal, Non Linear Divisible Loads: There is No Free Lunch, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pp.863-873, 2013.
DOI : 10.1109/IPDPS.2013.94

URL : https://hal.archives-ouvertes.fr/hal-00762008

O. Beaumont, A. Legrand, F. Rastello et al., Static LU Decomposition on Heterogeneous Platforms, The International Journal of High Performance Computing Applications, vol.36, issue.2, pp.310-323, 2001.
URL : https://hal.archives-ouvertes.fr/hal-00856641

O. Beaumont and L. Marchal, Analysis of dynamic scheduling strategies for matrix multiplication on heterogeneous platforms, Proceedings of the 23rd international symposium on High-performance parallel and distributed computing, HPDC '14, pp.141-152, 2014.
DOI : 10.1145/2600212.2600223

URL : https://hal.archives-ouvertes.fr/hal-01090254

P. Berenbrink, A. Czumaj, A. Steger, and B. Vöcking, Balanced allocations, Proceedings of the thirty-second annual ACM symposium on Theory of computing, STOC '00, pp.745-754, 2000.
DOI : 10.1145/335305.335411

P. Berenbrink, T. Friedetzky, Z. Hu et al., On weighted balls-into-bins games, Theoretical Computer Science, vol.409, issue.3, pp.511-520, 2008.
DOI : 10.1016/j.tcs.2008.09.023

C. Berge, Two Theorems in Graph Theory, Proceedings of the National Academy of Sciences, vol.43, issue.9, pp.842-844, 1957.
DOI : 10.1073/pnas.43.9.842

R. Bleuse, S. Kedad-Sidhoum, F. Monna et al., Scheduling independent tasks on multi-cores with GPU accelerators, Concurrency and Computation: Practice and Experience, pp.1625-1638, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01081625

R. D. Blumofe and C. E. Leiserson, Scheduling multithreaded computations by work stealing, Journal of the ACM, vol.46, issue.5, pp.720-748, 1999.
DOI : 10.1145/324133.324234

B. Bollobás, Modern Graph Theory, 2013.
DOI : 10.1007/978-1-4612-0619-4

D. Borthakur, HDFS Architecture Guide, p.39, 2008.

G. Bosilca, A. Bouteiller, A. Danalis et al., PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013.
DOI : 10.1109/MCSE.2013.98

G. Breinholt and C. Schierz, Algorithm 781: generating Hilbert's space-filling curve by recursion, ACM Transactions on Mathematical Software, vol.24, issue.2, pp.184-189, 1998.
DOI : 10.1145/290200.290219

J. Bueno, J. Planas, A. Duran, R. M. Badia, X. Martorell et al., Productive Programming of GPU Clusters with OmpSs, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.557-568, 2012.
DOI : 10.1109/IPDPS.2012.58

A. R. Butz, Alternative Algorithm for Hilbert's Space-Filling Curve, IEEE Transactions on Computers, vol.20, issue.4, pp.424-426, 1971.
DOI : 10.1109/T-C.1971.223258

J. A. Cain, P. Sanders, and N. Wormald, The random graph threshold for k-orientability and a fast algorithm for optimal multiple-choice allocation, Symposium on Discrete Algorithms (SODA), pp.469-476, 2007.

A. Chervenak, I. Foster, C. Kesselman et al., The data grid: Towards an architecture for the distributed management and analysis of large scientific datasets, Journal of Network and Computer Applications, vol.23, issue.3, pp.187-200, 2000.
DOI : 10.1006/jnca.2000.0110

J. Choi, J. Demmel, I. Dhillon et al., ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers: Design Issues and Performance, Computer Physics Communications, pp.1-15, 1996.

M. Chowdhury and I. Stoica, Coflow, Proceedings of the 11th ACM Workshop on Hot Topics in Networks, HotNets-XI, pp.31-36, 2012.
DOI : 10.1145/2390231.2390237

M. Chowdhury and I. Stoica, Efficient Coflow Scheduling Without Prior Knowledge, ACM SIGCOMM Computer Communication Review, vol.45, issue.5, pp.393-406, 2015.

W. Cirne, F. Brasileiro et al., On the efficacy, efficiency and emergent behavior of task replication in large distributed systems, Parallel Computing, vol.33, issue.3, pp.213-234, 2007.
DOI : 10.1016/j.parco.2007.01.002

D. Clarke, A. Ilic, A. Lastovetsky, and L. Sousa, Hierarchical Partitioning Algorithm for Scientific Computing on Highly Heterogeneous CPU + GPU Clusters, International European Conference on Parallel and Distributed Computing (Euro-Par), pp.489-501, 2012.

T. Cojean, A. Guermouche, A. Hugo et al., Resource Aggregation for Task-Based Cholesky Factorization on Top of Heterogeneous Machines, International European Conference on Parallel and Distributed Computing, 2016.

D. Coppersmith and S. Winograd, Matrix multiplication via arithmetic progressions, Journal of Symbolic Computation, vol.9, issue.3, pp.251-280, 1990.
DOI : 10.1016/S0747-7171(08)80013-2

J. Dean and S. Ghemawat, MapReduce, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.
DOI : 10.1145/1327452.1327492

A. Deflumere, Optimal Partitioning for Parallel Matrix Computation on a Small Number of Abstract Heterogeneous Processors, PhD thesis, 2014.

A. Deflumere and A. Lastovetsky, Searching for the Optimal Data Partitioning Shape for Parallel Matrix Matrix Multiplication on 3 Heterogeneous Processors, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp.17-28, 2014.
DOI : 10.1109/IPDPSW.2014.8

A. Deflumere, A. Lastovetsky, and B. Becker, Optimal Data Partitioning Shape for Matrix Multiplication on Three Fully Connected Heterogeneous Processors, International European Conference on Parallel and Distributed Computing (Euro-Par) Workshop, pp.201-214, 2014.
DOI : 10.1007/978-3-319-14325-5_18

A. Deflumere, A. Lastovetsky, and B. A. Becker, Partitioning for Parallel Matrix-Matrix Multiplication with Heterogeneous Processors: The Optimal Solution, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, pp.125-139, 2012.
DOI : 10.1109/IPDPSW.2012.12

J. Demmel, M. Hoemmen, M. Mohiyuddin et al., Avoiding communication in sparse matrix computations, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.1-12, 2008.
DOI : 10.1109/IPDPS.2008.4536305

M. Deveci, S. Rajamanickam, K. D. Devine et al., Multi-Jagged: A Scalable Parallel Spatial Partitioning Algorithm, IEEE Transactions on Parallel and Distributed Systems, vol.27, issue.3, pp.803-817, 2016.
DOI : 10.1109/TPDS.2015.2412545

J. Dongarra and F. Sullivan, Guest Editors' Introduction to the Top 10 Algorithms, Computing in Science & Engineering, vol.2, issue.1, pp.22-23, 2000.
DOI : 10.1109/MCISE.2000.814652

F. Dufossé, K. Kaya, and B. Uçar, Two approximation algorithms for bipartite matching on multicore architectures, Journal of Parallel and Distributed Computing, vol.85, pp.62-78, 2015.
DOI : 10.1016/j.jpdc.2015.06.009

P. Erdös and A. Rényi, On Random Matrices, Studia Scientiarum Mathematicarum Hungarica, vol.8, pp.455-461, 1964.

D. Fiala, F. Mueller, C. Engelmann et al., Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing, International Conference on High Performance Computing, p.78, 2012.

L. R. Ford and D. R. Fulkerson, Maximal flow through a network, Canadian Journal of Mathematics, vol.8, pp.399-404, 1956.
DOI : 10.4153/CJM-1956-045-5

H. Fu, Y. Zhu, and W. Yu, A case study of MapReduce speculation for failure recovery, Proceedings of the 2015 International Workshop on Data-Intensive Scalable Computing Systems, DISCS '15, pp.1-7, 2015.

A. Fügenschuh, K. Junosza-Szaniawski, and Z. Lonc, Exact and approximation algorithms for a soft rectangle packing problem, Optimization, vol.63, issue.11, pp.1637-1663, 2014.

T. Gautier, X. Besseron, and L. Pigeon, KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, pp.15-23, 2007.
DOI : 10.1145/1278177.1278182

URL : https://hal.archives-ouvertes.fr/hal-00727795

A. Geist and R. Lucas, Major Computer Science Challenges At Exascale, The International Journal of High Performance Computing Applications, vol.23, issue.4, pp.427-436, 2009.
DOI : 10.1177/1094342009347445

A. Goel, M. Kapralov, and S. Khanna, Perfect Matchings in $O(n\log n)$ Time in Regular Bipartite Graphs, SIAM Journal on Computing, vol.42, issue.3, pp.1392-1404, 2013.
DOI : 10.1137/100812513

I. Gog, M. Schwarzkopf et al., Firmament: Fast, Centralized Cluster Scheduling at Scale, Symposium on Operating Systems Design and Implementation (OSDI), pp.99-115, 2016.

K. Goto and R. A. van de Geijn, Anatomy of high-performance matrix multiplication, ACM Transactions on Mathematical Software, vol.34, issue.3, article 12, pp.1-25, 2008.
DOI : 10.1145/1356052.1356053

S. L. Graham, C. A. Patterson, and M. Snir, Getting Up to Speed: The Future of Supercomputing, 2005.

L. Greengard and V. Rokhlin, A fast algorithm for particle simulations, Journal of Computational Physics, vol.73, issue.2, pp.325-348, 1987.
DOI : 10.1016/0021-9991(87)90140-9

Z. Guo and G. Fox, Improving MapReduce Performance in Heterogeneous Network Environments and Resource Utilization, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), pp.714-716, 2012.
DOI : 10.1109/CCGrid.2012.12

Z. Guo, G. C. Fox, and M. Zhou, Investigation of Data Locality in MapReduce, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), pp.419-426, 2012.
DOI : 10.1109/CCGrid.2012.42

P. Hall, On Representatives of Subsets, Journal of the London Mathematical Society, vol.10, issue.1, pp.26-30, 1935.

M. Hammoud and M. F. Sakr, Locality-Aware Reduce Task Scheduling for MapReduce, 2011 IEEE Third International Conference on Cloud Computing Technology and Science, pp.570-576, 2011.
DOI : 10.1109/CloudCom.2011.87

A. Heinecke and M. Bader, Parallel matrix multiplication based on space-filling curves on shared memory multicore platforms, Proceedings of the 2008 workshop on Memory access on future processors: a solved problem?, MAW '08, pp.385-392, 2008.
DOI : 10.1145/1366219.1366223

D. Hilbert, Ueber die stetige Abbildung einer Line auf ein Flächenstück, Mathematische Annalen, vol.38, issue.3, pp.459-460, 1891.
DOI : 10.1007/BF01199431

M. Hoemmen, Communication-avoiding Krylov Subspace Methods, PhD thesis, 2010.

J. E. Hopcroft and R. M. Karp, An $n^{5/2}$ Algorithm for Maximum Matchings in Bipartite Graphs, SIAM Journal on Computing, vol.2, issue.4, pp.225-231, 1973.
DOI : 10.1137/0202019

J. Huang, D. A. Matthews, and R. A. van de Geijn, Strassen's Algorithm for Tensor Contraction, Computing Research Repository, 2017.

S. Ibrahim, H. Jin et al., Maestro: Replica-Aware Map Scheduling for MapReduce, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), pp.435-442, 2012.
DOI : 10.1109/CCGrid.2012.122

URL : https://hal.archives-ouvertes.fr/hal-00670813

M. Isard, V. Prabhakaran, J. Currey et al., Quincy, Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, SOSP '09, pp.261-276, 2009.
DOI : 10.1145/1629575.1629601

A. Kalinov and A. Lastovetsky, Heterogeneous Distribution of Computations Solving Linear Algebra Problems on Networks of Heterogeneous Computers, Journal of Parallel and Distributed Computing, vol.61, issue.4, pp.520-535, 2001.
DOI : 10.1006/jpdc.2000.1686

S. Kavulya, J. Tan, R. Gandhi, and P. Narasimhan, An Analysis of Traces from a Production MapReduce Cluster, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, pp.94-103, 2010.
DOI : 10.1109/CCGRID.2010.112

R. Kenyon, Tiling a Rectangle with the Fewest Squares, Journal of Combinatorial Theory, Series A, vol.76, issue.2, pp.272-291, 1996.
DOI : 10.1006/jcta.1996.0104

J. Langguth, F. Manne, and P. Sanders, Heuristic initialization for bipartite matching problems, Journal of Experimental Algorithmics, vol.15, pp.1-3, 2010.
DOI : 10.1145/1671970.1712656

J. V. Lima, T. Gautier, N. Maillard, and V. Danjean, Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs, 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing, pp.75-82, 2012.
DOI : 10.1109/SBAC-PAD.2012.28

URL : https://hal.archives-ouvertes.fr/hal-00735470

A. Litke, D. Skoutas, K. Tserpes, and T. Varvarigou, Efficient task replication and management for adaptive fault tolerance in Mobile Grid environments, Future Generation Computer Systems, vol.23, issue.2, pp.163-178, 2007.
DOI : 10.1016/j.future.2006.04.014

R. E. Lyons and W. Vanderkulk, The Use of Triple-Modular Redundancy to Improve Computer Reliability, IBM Journal of Research and Development, vol.6, issue.2, pp.200-209, 1962.
DOI : 10.1147/rd.62.0200

F. Manne and T. Sørevik, Partitioning an Array onto a Mesh of Processors, Applied Parallel Computing: Industrial Computation and Optimization, pp.467-477, 1996.

M. Mitzenmacher, The power of two choices in randomized load balancing, Transactions on Parallel and Distributed Systems (TPDS), pp.1094-1104, 2001.
DOI : 10.1109/71.963420

H. Nagamochi and Y. Abe, An approximation algorithm for dissecting a rectangle into rectangles with specified areas, Discrete Applied Mathematics, vol.155, issue.4, pp.523-537, 2007.
DOI : 10.1016/j.dam.2006.08.005

E. Di Napoli, D. Fabregat-Traver, G. Quintana-Ortí, and P. Bientinesi, Towards an efficient use of the BLAS library for multilinear tensor contractions, Applied Mathematics and Computation, vol.235, pp.454-468, 2014.
DOI : 10.1016/j.amc.2014.02.051

T. Nelson, A. Rivera, P. Balaprakash et al., Generating Efficient Tensor Contractions for GPUs, 2015 44th International Conference on Parallel Processing, pp.969-978, 2015.
DOI : 10.1109/ICPP.2015.106

D. Ouelhadj and S. Petrovic, A survey of dynamic scheduling in manufacturing systems, Journal of Scheduling, vol.40, issue.3, pp.417-431, 2009.

E. Peise, D. Fabregat-Traver, and P. Bientinesi, On the Performance Prediction of BLAS-based Tensor Contractions, International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), pp.193-212, 2014.
DOI : 10.1007/978-3-319-17248-4_10

Y. Peres, K. Talwar, and U. Wieder, The (1 + β)-Choice Process and Weighted Balls-into-Bins, Symposium on Discrete Algorithms (SODA), pp.1613-1619, 2010.
DOI : 10.1137/1.9781611973075.131

J. R. Pilkington and S. B. Baden, Partitioning with Spacefilling Curves, 1994.

Z. Qiu, C. Stein, and Y. Zhong, Minimizing the Total Weighted Completion Time of Coflows in Datacenter Networks, Proceedings of the 27th ACM on Symposium on Parallelism in Algorithms and Architectures, SPAA '15, pp.294-303, 2015.

M. Raab and A. Steger, "Balls into Bins" – A Simple and Tight Analysis, Randomization and Approximation Techniques in Computer Science (RANDOM), pp.159-170, 1998.
DOI : 10.1007/3-540-49543-6_13

K. Ranganathan and I. Foster, Identifying Dynamic Replication Strategies for a High-Performance Data Grid, Grid Computing – GRID, pp.75-86, 2001.
DOI : 10.1007/3-540-45644-9_8

D. A. Reed and J. Dongarra, Exascale computing and big data, Communications of the ACM, vol.58, issue.7, pp.56-68, 2015.

A. W. Richa, M. Mitzenmacher, and R. Sitaraman, The Power of Two Random Choices: A Survey of Techniques and Results, Combinatorial Optimization, vol.9, pp.255-304, 2001.

P. Sanders, Algorithms for Scalable Storage Servers, International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM), pp.82-101, 2004.
DOI : 10.1007/978-3-540-24618-3_8

P. Sanders, S. Egner, and J. Korst, Fast Concurrent Access to Parallel Disks, Algorithmica, vol.35, issue.1, pp.21-55, 2003.
DOI : 10.1007/s00453-002-0987-0

A. Z. S. Shahul and O. Sinnen, Scheduling Task Graphs Optimally with A*, The Journal of Supercomputing, vol.51, issue.3, pp.310-332, 2010.

T. Sen and S. K. Gupta, A state-of-art survey of static scheduling research involving due dates, Omega, vol.12, issue.1, pp.63-76, 1984.
DOI : 10.1016/0305-0483(84)90011-2

J. Shalf, S. Dosanjh, and J. Morrison, Exascale Computing Technology Challenges, High Performance Computing for Computational Science (VECPAR), pp.1-25, 2011.

R. Shams and P. Sadeghi, On optimization of finite-difference time-domain (FDTD) computation on heterogeneous and GPU clusters, Journal of Parallel and Distributed Computing, vol.71, issue.4, pp.584-593, 2011.
DOI : 10.1016/j.jpdc.2010.10.011

Y. Shi, U. N. Niranjan, A. Anandkumar et al., Tensor Contractions with Extended BLAS Kernels on CPU and GPU, 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), pp.193-202, 2016.
DOI : 10.1109/HiPC.2016.031

J. Skilling, G. J. Erickson, and Y. Zhai, Programming the Hilbert curve, AIP Conference Proceedings, pp.381-387, 2004.
DOI : 10.1063/1.1751381

E. Solomonik and J. Demmel, Communication-optimal Parallel 2.5D Matrix Multiplication and LU factorization Algorithms, International European Conference on Parallel and Distributed Computing (Euro-Par), pp.90-109, 2011.

Apache Spark: Lightning-Fast Cluster Computing, 2016.

H. Stockinger, A. Samar, B. Allcock et al., File and object replication in data grids, Proceedings 10th IEEE International Symposium on High Performance Distributed Computing, pp.76-86, 2001.
DOI : 10.1109/HPDC.2001.945178

V. Strassen, Gaussian elimination is not optimal, Numerische Mathematik, vol.13, issue.4, pp.354-356, 1969.
DOI : 10.1007/BF02165411

J. Tan, S. Meng et al., Improving ReduceTask data locality for sequential MapReduce jobs, 2013 Proceedings IEEE INFOCOM, pp.1627-1635, 2013.
DOI : 10.1109/INFCOM.2013.6566959

H. Topcuoglu, S. Hariri, and M.-Y. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.
DOI : 10.1109/71.993206

D. W. Walkup, Matchings in random regular bipartite digraphs, Discrete Mathematics, vol.31, issue.1, pp.59-64, 1980.
DOI : 10.1016/0012-365X(80)90172-7

M. Walters, Rectangles as sums of squares, Discrete Mathematics, vol.309, issue.9, pp.2913-2921, 2009.
DOI : 10.1016/j.disc.2008.07.028

F. Wang, K. Ramamritham, and J. A. Stankovic, Determining redundancy levels for fault tolerant real-time systems, IEEE Transactions on Computers, vol.44, issue.2, pp.292-301, 1995.
DOI : 10.1109/12.364540

W. Wang, K. Zhu, L. Ying et al., Map task scheduling in MapReduce with data locality: Throughput and heavy-traffic optimality, 2013 Proceedings IEEE INFOCOM, pp.1609-1617, 2013.
DOI : 10.1109/INFCOM.2013.6566957

D. B. West, Introduction to Graph Theory, 2001.

T. White, Hadoop: The Definitive Guide, O'Reilly Media, 2012.

Q. Xie and Y. Lu, Degree-guided map-reduce task assignment with data locality constraint, 2012 IEEE International Symposium on Information Theory Proceedings, pp.985-989, 2012.
DOI : 10.1109/ISIT.2012.6284711

A. Yekkehkhany, Near Data Scheduling for Data Centers with Multi Levels of Data Locality, Computing Research Repository, 2017.

M. Zaharia, D. Borthakur et al., Delay scheduling, Proceedings of the 5th European conference on Computer systems, EuroSys '10, pp.265-278, 2010.
DOI : 10.1145/1755913.1755940

M. Zaharia, A. Konwinski, A. D. Joseph et al., Improving MapReduce Performance in Heterogeneous Environments, Symposium on Operating Systems Design and Implementation (OSDI), pp.29-42, 2008.

E. Z. Zefreh, S. Lotfi et al., 3-D data partitioning for 3-level perfectly nested loops on heterogeneous distributed systems, Concurrency and Computation: Practice and Experience, 2016.