Green500 list, URL: https://www.green500.org/lists, 2017. ,
DOI : 10.1109/ipdpsw.2010.5470905
URL: http://www.aics.riken, 2017. ,
The Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, pp.154-165, 2014. ,
DOI : 10.1109/SC.2014.18
LU factorization for accelerator-based systems, 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), pp.217-224, 2011. ,
DOI : 10.1109/AICCSA.2011.6126599
URL : https://hal.archives-ouvertes.fr/hal-00654193
QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.932-943, 2011. ,
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614
Characterizing node orderings for improved performance, Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, PMBS '15, pp.1-6, 2015. ,
DOI : 10.2172/800975
Opportunities and Challenges of Exascale Computing, Tech. rep. U.S. Department of Energy, URL: https://science.energy.gov, 2010. ,
Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures, Euro-Par Workshops. Lecture Notes in Computer Science, vol.6043, pp.56-65, 2009. ,
DOI : 10.1007/978-3-642-14122-5_9
URL : https://hal.archives-ouvertes.fr/inria-00421333
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, pp.187-198, 2011. ,
DOI : 10.1007/978-3-642-03869-3_80
URL : https://hal.archives-ouvertes.fr/inria-00384363
Some models for scheduling parallel programs with communication delays, Discrete Applied Mathematics, vol.72, issue.1-2, pp.5-24, 1997. ,
DOI : 10.1016/S0166-218X(96)00034-0
URL : https://doi.org/10.1016/s0166-218x(96)00034-0
There goes the neighborhood, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, pp.41:1-41:12, 2013. ,
DOI : 10.1145/2503210.2503247
Handbook on Scheduling: From Theory to Applications. International Handbooks on Information Systems, 2007. ,
Scheduling independent tasks on multi-cores with GPU accelerators, Concurrency and Computation: Practice and Experience, vol.27, issue.6, pp.1625-1638, 2015. ,
DOI : 10.1002/cpe.3359
URL : https://hal.archives-ouvertes.fr/hal-01081625
Scheduling Independent Moldable Tasks on Multi-Cores with GPUs, IEEE Transactions on Parallel and Distributed Systems, vol.28, issue.9, pp.2689-2702, 2017. ,
DOI : 10.1109/TPDS.2017.2675891
URL : https://hal.archives-ouvertes.fr/hal-01263100
DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012. ,
DOI : 10.1016/j.parco.2011.10.003
URL : http://www.netlib.org/lapack/lawnspdf/lawn231.pdf
A Fast 5/2-Approximation Algorithm for Hierarchical Scheduling, Euro-Par Lecture Notes in Computer Science, vol.17, issue.3, pp.157-167, 2010. ,
DOI : 10.1137/0217033
URL : https://hal.archives-ouvertes.fr/hal-00738518
A Hardware Accelerator for the Fast Retrieval of DIALIGN Biological Sequence Alignments in Linear Space, IEEE Transactions on Computers, vol.59, pp.808-821, 2010. ,
Approximation Algorithms for Multiple Strip Packing and Scheduling Parallel Jobs in Platforms. ,
The Parallel Evaluation of General Arithmetic Expressions, Journal of the ACM, vol.21, issue.2, pp.201-206, 1974. ,
DOI : 10.1145/321812.321815
URL : http://cr.yp.to/bib/1974/brent.pdf
Scheduling Algorithms. Fifth Edition, 2007. ,
Productive Programming of GPU Clusters with OmpSs, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.557-568, 2012. ,
DOI : 10.1109/IPDPS.2012.58
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009. ,
DOI : 10.1016/j.parco.2008.10.002
Scheduling Unrelated Machines of Few Different Types, arXiv:1205.0974, 2012. ,
Understanding and Improving Computational Science Storage Access through Continuous Characterization, ACM Transactions on Storage, vol.7, issue.81, p.77, 2011. ,
Considering Time in Designing Large-Scale Systems for Scientific Computing, pp.1533-1545, 2016. ,
Performance Bounds for Level-Oriented Two-Dimensional Packing Algorithms, SIAM Journal on Computing, vol.9, issue.4, pp.808-826, 1980. ,
DOI : 10.1137/0209062
LogP: Towards a Realistic Model of Parallel Computation, pp.1-12, 1993. ,
Online Scheduling on a CPU-GPU Cluster, TAMC. Lecture Notes in Computer Science, vol.7876, pp.1-9, 2013. ,
DOI : 10.1007/978-3-642-38236-9_1
Exploiting Geometric Partitioning in Task Mapping for Parallel Computers, 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp.16-27, 2014. ,
DOI : 10.1109/IPDPS.2014.15
URL : http://bmi.osu.edu/hpc/papers/Deveci14-IPDPS.pdf
Scheduling Parallel Tasks: Approximation Algorithms, Handbook of Scheduling: Algorithms, Models, and Performance Analysis, Computer & Information Science Series. Chapman and Hall/CRC, 2004. ,
The International Exascale Software Project roadmap, The International Journal of High Performance Computing Applications, vol.25, issue.1, pp.3-60, 2011. ,
DOI : 10.1088/1742-6596/180/1/012045
URL : http://www.exascale.org/mediawiki/images/2/20/IESP-roadmap.pdf
Using Formal Grammars to Predict I/O Behaviors in HPC: The Omnisc'IO Approach, IEEE Transactions on Parallel and Distributed Systems, vol.27, issue.8, pp.2435-2449, 2016. ,
DOI : 10.1109/TPDS.2015.2485980
URL : https://hal.archives-ouvertes.fr/hal-01238103
Scheduling for Parallel Processing. Computer Communications and Networks, 2009. ,
Fast parallel sorting under LogP: experience with the CM-5, IEEE Transactions on Parallel and Distributed Systems, pp.791-805, 1996. ,
DOI : 10.1109/71.532111
URL : http://www.ece.eng.wayne.edu/~czxu/ece561/lnotes/logp-sort-paper.pdf
Understanding Application and System Performance Through System-Wide Monitoring, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp.1702-1710, 2016. ,
DOI : 10.1109/IPDPSW.2016.145
Topology-Aware Job Scheduling Strategies for Torus Networks, Cray User Group, URL: https://cug.org/proceedings, pp.74-77, 2014. ,
URL: http://graal.ens-lyon.fr/~leyraudd/These/manuscrit.pdf, 2006. ,
An effective approximation algorithm for the Malleable Parallel Task Scheduling problem, Journal of Parallel and Distributed Computing, vol.72, issue.5, pp.693-704, 2012. ,
Theory and Practice in Parallel Job Scheduling, JSSPP. Lecture Notes in Computer Science, vol.1291, pp.1-34, 1997. ,
Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs, pp.75-82, 2012. ,
Tighter Bounds for LPT Scheduling on Uniform Processors, SIAM Journal on Computing, vol.16, issue.3, pp.554-560, 1987. ,
Parallelism in random access machines, Proceedings of the tenth annual ACM symposium on Theory of computing, STOC '78, pp.114-118, 1978. ,
DOI : 10.1145/800133.804339
URL : http://ecommons.cornell.edu/bitstream/1813/7454/1/78-334.pdf
Scheduling the I/O of HPC Applications Under Congestion, 2015 IEEE International Parallel and Distributed Processing Symposium, pp.1013-1022, 2015. ,
DOI : 10.1109/IPDPS.2015.116
URL : https://hal.archives-ouvertes.fr/hal-01251938
XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, pp.1299-1308, 2013. ,
DOI : 10.1109/IPDPS.2013.66
KAAPI: A thread scheduling runtime system for data flow computations on cluster of multiprocessors, 2007. ,
Topology-aware Resource Management for HPC Applications. ,
Contributions for Resource and Job Management in High Performance Computing, 2010. ,
Algorithms for Compile-Time Memory Optimization, SODA. ACM/SIAM, pp.907-908, 1999. ,
Bounds for Multiprocessor Scheduling with Resource Constraints, SIAM Journal on Computing, vol.4, issue.2, pp.187-200, 1975. ,
Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979. ,
Optimization and Approximation in Deterministic Sequencing and Scheduling: a Survey, Annals of Discrete Mathematics, vol.5, pp.287-326, 1979. ,
DOI : 10.1016/S0167-5060(08)70356-X
URL : https://ir.cwi.nl/pub/18052/18052A.pdf
Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, Euro-Par Lecture Notes in Computer Science, vol.35, issue.3, pp.235-246, 2010. ,
DOI : 10.1007/s00224-002-1055-5
URL : https://hal.archives-ouvertes.fr/inria-00502448
Ueber die stetige Abbildung einer Linie auf ein Flächenstück, Mathematische Annalen, vol.38, issue.3, pp.459-460, 1891. ,
DOI : 10.1007/bf01199431
Using Dual Approximation Algorithms for Scheduling Problems: Theoretical and Practical Results, Journal of the ACM, vol.34, issue.1, pp.144-162, 1987. ,
A Polynomial Approximation Scheme for Scheduling on Uniform Processors: Using the Dual Approximation Approach, SIAM Journal on Computing, vol.17, issue.3, pp.539-551, 1988. ,
CLARISSE: A Middleware for Data-Staging Coordination and Control on Large-Scale HPC Platforms, 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp.346-355, 2016. ,
DOI : 10.1109/CCGrid.2016.24
Scheduling Problems on Two Sets of Identical Machines, Computing, vol.70, issue.4, pp.277-294, 2003. ,
DOI : 10.1007/s00607-003-0011-9
Partitioning Low-Diameter Networks to Eliminate Inter-Job Interference, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.19-439, 2017. ,
DOI : 10.1109/IPDPS.2017.91
Cost-Effective Diameter-Two Topologies: Analysis and Evaluation. ,
Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU. ,
Linear-time Approximation Schemes for Scheduling Malleable Parallel Tasks, pp.490-498, 1999. ,
Processor Allocation on Cplant: Achieving General Processor Locality Using One-Dimensional Allocation Strategies, pp.296-304, 2002. ,
Approximation algorithms for scheduling unrelated parallel machines, Mathematical Programming, vol.46, issue.1-3, pp.259-271, 1990. ,
DOI : 10.1007/BF01585745
Scheduling Malleable and Nonmalleable Parallel Tasks, SODA. ACM/SIAM, pp.167-176, 1994. ,
URL : https://dl.acm.org/citation.cfm?id=314464.314491
Contiguity and Locality in Backfilling Scheduling. ,
Scheduling for new computing platforms with GPUs, 2014. ,
A Computer Oriented Geodetic Data Base; and a New Technique in File Sequencing, Tech. rep. IBM Ltd, p.72, 1966. ,
A 3/2-Approximation Algorithm for Scheduling Independent Monotonic Malleable Tasks, SIAM Journal on Computing, vol.37, issue.2, pp.401-412, 2007. ,
Solving very large instances of the scheduling of independent tasks problem on the GPU, Journal of Parallel and Distributed Computing, vol.73, issue.1. ,
Application-aware metrics for partition selection in cube-shaped topologies, Parallel Computing, vol.40, issue.5, pp.129-139, 2014. ,
Gurulingesh Raravi and Vincent Nélis, A PTAS for Assigning Sporadic Tasks on Two-type Heterogeneous Multiprocessors. ,
Fengguang Song, Hatem Ltaief, Bilel Hadri, and Jack Dongarra, Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems. ,
Adapting a Message-Driven Parallel Application to GPU-Accelerated Clusters, DOI: 10.1145/1413370.1413379, pp.1-8, 2008. ,
DOI : 10.1109/sc.2008.5214716
URL : http://mc.stanford.edu/cgi-bin/images/8/8a/SC08_NAMD.pdf
An approximation algorithm for the generalized assignment problem, pp.461-474, 1993. ,
DOI : 10.1007/BF01585178
Fengguang Song, Stanimire Tomov, and Jack Dongarra, Enabling and Scaling Matrix Computations on Heterogeneous Multi-Core and Multi-GPU Systems. ,
A Strip-Packing Algorithm with Absolute Performance Bound 2. ,
An optimal rounding gives a better approximation for scheduling unrelated machines, Operations Research Letters, vol.33, issue.2, pp.127-133, 2005. ,
On the existence of schedules that are near-optimal for both makespan and total weighted completion time, Operations Research Letters, vol.21, issue.3, pp.115-122, 1997. ,
DOI : 10.1016/s0167-6377(97)00025-4
Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, pp.232-240, 2010. ,
DOI : 10.1016/j.parco.2009.12.005
URL : http://icl.cs.utk.edu/news_pub/submissions/tdb.pdf
Topology-Aware Data Aggregation for Intensive I/O on Large-Scale Supercomputers, 2016 First International Workshop on Communication Optimizations in HPC (COMHPC), pp.73-81, 2016. ,
DOI : 10.1109/COMHPC.2016.013
URL : https://hal.archives-ouvertes.fr/hal-01394741
Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002. ,
DOI : 10.1109/71.993206
URL : http://meseec.ce.rit.edu/eecc722-fall2002/papers/hc/5/l0260.pdf
PaCMap, Proceedings of the 29th ACM on International Conference on Supercomputing, ICS '15, pp.37-46, 2015. ,
DOI : 10.1145/2751205.2751225
URL : http://dl.acm.org/ft_gateway.cfm?id=2751225&type=pdf
Approximate algorithms scheduling parallelizable tasks, Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures, SPAA '92, pp.323-332, 1992. ,
DOI : 10.1145/140901.141909
A Bridging Model for Parallel Computation, Communications of the ACM, vol.33, issue.8, 1990. ,
QUARK Users' Guide: QUeueing And Runtime for Kernels. Tech. rep. ICL-UT-11-02, p.30, 2011. ,