D. Carastan-Santos, D. C. Martins-Jr, S. W. Song, L. C. S. Rozante, and R. Y. de Camargo, A hybrid CPU-GPU-MIC algorithm for the hitting set problem, XVIII Simpósio de Computação de Alto-Desempenho (WSCAD), 2017. Full paper in a recognized Brazilian High Performance Computing workshop.

A. K. Amoura, E. Bampis, C. Kenyon, and Y. Manoussakis, Scheduling independent multiprocessor tasks, Algorithmica, vol.32, issue.2, p.17, 2002.

B. S. Baker, E. G. Coffman, and R. L. Rivest, Orthogonal packings in two dimensions, SIAM Journal on Computing, vol.9, p.17, 1980.

M. A. Bender, S. Chakrabarti, and S. Muthukrishnan, Flow and Stretch Metrics for Scheduling Continuous Job Streams, SODA, vol.98, p.17, 1998.

E. Berkowitz, M. A. Clark, and A. Gambhir, Simulating the Weak Death of the Neutron in a Femtoscale Universe with Near-exascale Computing, Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '18), vol.55, p.1, 2018.

M. R. Berthold, C. Borgelt, F. Höppner, and F. Klawonn, Guide to Intelligent Data Analysis: How to Intelligently Make Sense of Real Data, vol.29, p.27, 2010.

A. Bhatele, K. Mohror, S. H. Langer, and K. E. Isaacs, There Goes the Neighborhood: Performance Degradation due to Nearby Jobs, vol.41, p.15, 2013.

C. M. Bishop, Pattern Recognition and Machine Learning, vol.30, p.29, 2006.

R. Bleuse, Apprehending heterogeneity at (very) large scale (Appréhender l'hétérogénéité à (très) grande échelle), PhD thesis, vol.17, p.16, 2017.

M. Bougeret, P.-F. Dutot, K. Jansen, C. Otte, and D. Trystram, Approximation Algorithms for Multiple Strip Packing, Approximation and Online Algorithms, 7th International Workshop (WAOA), p.17, 2009.

P. Brucker, Scheduling Algorithms, 5th edition, 2007.

N. Capit, G. Da Costa, Y. Georgiou, et al., A batch scheduler with high level components, Cluster Computing and the Grid (CCGrid), vol.2, p.22, 2005.

R. J. Carroll and D. Ruppert, Transformation and Weighting in Regression, vol.30, p.40, 1988.

H. Casanova, A. Giersch, A. Legrand, M. Quinson, and F. Suter, Versatile, Scalable, and Accurate Simulation of Distributed Applications and Platforms, Journal of Parallel and Distributed Computing, vol.74, issue.10, pp.2899-2917, 2014.

W. Cirne and F. Berman, A model for moldable supercomputer jobs, Proceedings 15th International Parallel and Distributed Processing Symposium (IPDPS), vol.8, p.15, 2000.

Adaptive Computing, Moab Workload Manager Documentation, p.22, 2017.

V. Cuevas-Vicenttín, S. Dey, S. Köhler, S. Riddle, and B. Ludäscher, Scientific workflows and provenance: Introduction and research opportunities, Datenbank-Spektrum, vol.12, issue.3, p.14, 2012.

L. Dagum and R. Menon, OpenMP: An Industry-Standard API for Shared-Memory Programming, IEEE Comput. Sci. Eng., vol.5, issue.1, p.13, 1998.

C. Darwin, On the Origin of Species, p.25, 2004.

J. Du and J. Y.-T. Leung, Complexity of scheduling parallel task systems, SIAM Journal on Discrete Mathematics, vol.2, issue.4, p.17, 1989.

A. Duran and M. Klemm, The Intel® Many Integrated Core architecture, 2012 International Conference on High Performance Computing and Simulation (HPCS), pp.365-366, 2012.

P.-F. Dutot, M. Mercier, M. Poquet, and O. Richard, Batsim: A Realistic Language-Independent Resources and Jobs Management Systems Simulator, Job Scheduling Strategies for Parallel Processing, p.64, 2017.

P.-F. Dutot, E. Saule, A. Srivastav, and D. Trystram, Online Non-preemptive Scheduling to Optimize Max Stretch on a Single Machine, Computing and Combinatorics, 22nd International Conference (COCOON), p.17, 2016.

U. Farooq, Z. Marrakchi, and H. Mehrez, FPGA Architectures: An Overview, in Tree-based Heterogeneous FPGA Architectures: Application Specific Exploration and Optimization, p.14, 2012.

D. G. Feitelson, Parallel Workloads Archive: Logs, online; last accessed 2018, p.67.

D. G. Feitelson, Metrics for parallel job scheduling and their convergence, Workshop on Job Scheduling Strategies for Parallel Processing, p.36, 2001.

D. G. Feitelson, Resampling with feedback: a new paradigm of using workload data for performance evaluation, European Conference on Parallel Processing, vol.65, p.12, 2016.

D. G. Feitelson, Workload Modeling for Computer Systems Performance Evaluation, p.25, 2015.

D. G. Feitelson and M. A. Jette, Improved utilization and responsiveness with gang scheduling, Workshop on Job Scheduling Strategies for Parallel Processing, p.23, 1997.

D. G. Feitelson and L. Rudolph, Metrics and benchmarking for parallel job scheduling, Workshop on Job Scheduling Strategies for Parallel Processing, p.69, 1998.

D. G. Feitelson, L. Rudolph, U. Schwiegelshohn, K. C. Sevcik, and P. Wong, Theory and practice in parallel job scheduling, Workshop on Job Scheduling Strategies for Parallel Processing, pp.1-34, 1997.

D. G. Feitelson, D. Tsafrir, and D. Krakov, Experience with using the Parallel Workloads Archive, Journal of Parallel and Distributed Computing, vol.74, p.61, 2014.

Message Passing Interface Forum, MPI: A Message-Passing Interface Standard, p.13, 1994.

E. Frachtenberg and D. G. Feitelson, Pitfalls in parallel job scheduling evaluation, Workshop on Job Scheduling Strategies for Parallel Processing, p.12, 2005.

J. Friedman, T. Hastie, and R. Tibshirani, The Elements of Statistical Learning, Springer Series in Statistics, vol.1, p.27, 2001.

H. Fu, C. He, B. Chen, et al., 18.9-Pflops Nonlinear Earthquake Simulation on Sunway TaihuLight: Enabling Depiction of 18-Hz and 8-meter Scenarios, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17), vol.2, p.1, 2017.

A. Gainaru, G. Aupy, A. Benoit, et al., Scheduling the I/O of HPC Applications Under Congestion, IPDPS, IEEE, p.17, 2015.

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, p.13, 1979.

E. Gaussier, J. Lelong, V. Reis, and D. Trystram, Online Tuning of EASY-Backfilling using Queue Reordering Policies, IEEE Transactions on Parallel and Distributed Systems, vol.29, pp.2304-2316, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01963216

E. Gaussier, D. Glesser, V. Reis, and D. Trystram, Improving Backfilling by Using Machine Learning to Predict Running Times, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15), vol.64, pp.1-64, 2015.

Y. Georgiou, Resource and Job Management in High Performance Computing, p.22, 2010.
URL : https://hal.archives-ouvertes.fr/tel-01499598

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, vol.30, p.27, 2016.

R. L. Graham, E. L. Lawler, J. K. Lenstra, and A. H. G. Rinnooy Kan, Optimization and Approximation in Deterministic Sequencing and Scheduling: a Survey, Annals of Discrete Mathematics, vol.5, pp.287-326, 1979.

J. L. Hurink and J. J. Paulus, Online algorithm for parallel job scheduling and strip packing, International Workshop on Approximation and Online Algorithms, p.17, 2007.

K. Jansen, A (3/2+ε)-approximation algorithm for scheduling moldable and nonmoldable parallel tasks, Proceedings of the Twenty-Fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), p.17, 2012.

K. Jansen and L. Porkolab, Linear-time approximation schemes for scheduling malleable parallel tasks, Algorithmica, vol.32, p.17, 2002.

E. Jones, T. Oliphant, and P. Peterson, SciPy: Open source scientific tools for Python, p.42, 2001.

H. Kellerer, T. Tautenhahn, and G. Woeginger, Approximability and nonapproximability results for minimizing total flow time on a single machine, SIAM Journal on Computing, vol.28, p.17, 1999.

J. Kepler, Pragae, 1609 (2015 edition), p.25.

C.-Y. Lee and X. Cai, Scheduling one and two-processor tasks on two parallel processors, IIE Transactions, vol.31, p.17, 1999.

A. Legrand, A. Su, and F. Vivien, Minimizing the stretch when scheduling flows of divisible requests, Journal of Scheduling, vol.11, p.17, 2008.

A. Legrand, D. Trystram, and S. Zrigui, Adapting Batch Scheduling to Workload Characteristics: What can we expect From Online Learning?, 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019.

J. Lelong, V. Reis, and D. Trystram, Tuning EASY-Backfilling Queues, 21st Workshop on Job Scheduling Strategies for Parallel Processing, 2017.

D. A. Lifka, The ANL/IBM SP scheduling system, Workshop on Job Scheduling Strategies for Parallel Processing, p.24, 1995.

U. Lublin and D. G. Feitelson, The workload on parallel supercomputers: modeling the characteristics of rigid jobs, Journal of Parallel and Distributed Computing, vol.63, p.55, 2003.

G. Lucarelli, F. Mendonça, D. Trystram, and F. Wagner, Contiguity and Locality in Backfilling Scheduling, p.20, 2015.

G. Lucarelli, B. Moseley, N. K. Thang, A. Srivastav, et al., Online Non-Preemptive Scheduling to Minimize Weighted Flow-time on Unrelated Machines, p.17, 2018.

S. M. Lundberg and S.-I. Lee, A Unified Approach to Interpreting Model Predictions, Advances in Neural Information Processing Systems, vol.30, p.30, 2017.

C. McCann, R. Vaswani, and J. Zahorjan, A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors, ACM Transactions on Computer Systems (TOCS), vol.11, issue.2, p.23, 1993.

A. W. Mu'alem and D. G. Feitelson, Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling, IEEE Transactions on Parallel and Distributed Systems, vol.12, pp.21-24, 2001.

B. Nitzberg, J. M. Schopf, and J. P. Jones, PBS Pro: Grid computing and scheduling attributes, in Grid Resource Management, pp.183-190, 2004.

J. D. Owens, M. Houston, D. Luebke, et al., GPU Computing, Proceedings of the IEEE, vol.96, issue.5, pp.879-899, 2008.

D. Perkovic and P. J. Keleher, Randomization, speculation, and adaptation in batch schedulers, Proceedings of the 2000 ACM/IEEE Conference on Supercomputing, p.25, 2000.

M. L. Pinedo, Scheduling: Theory, Algorithms, and Systems. 3rd, 2008.

M. Poquet, Simulation approach for resource management (Approche par la simulation pour la gestion de ressources), PhD thesis, p.21, 2017.

G. P. Rodrigo, P.-O. Östberg, E. Elmroth, et al., Towards understanding HPC users and systems: a NERSC case study, Journal of Parallel and Distributed Computing, vol.111, p.22, 2018.

L. F. Sant'Ana, D. Carastan-Santos, D. Cordeiro, and R. Y. de Camargo, Analysis of Potential Online Scheduling Improvements by Real-Time Strategy Selection, 2018 Symposium on High Performance Computing Systems (WSCAD), p.38, 2018.

G. A. F. Seber and C. J. Wild, Nonlinear Regression, Hoboken: John Wiley & Sons, vol.62, p.30, 2003.

D. Silver, T. Hubert, J. Schrittwieser, et al., A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, Science, vol.362, pp.1140-1144, 2018.

J. Skovira, W. Chan, H. Zhou, and D. Lifka, The EASY-LoadLeveler API Project, Workshop on Job Scheduling Strategies for Parallel Processing, vol.63, p.42, 1996.

S. Srinivasan, R. Kettimuthu, V. Subramani, and P. Sadayappan, Characterization of backfilling strategies for parallel job scheduling, International Conference on Parallel Processing Workshops, vol.63, p.42, 2002.

G. Staples, TORQUE resource manager, Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, p.22, 2006.

A. Streit, The self-tuning dynP job-scheduler, International Parallel and Distributed Processing Symposium, vol.8, p.25, 2001.

R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, vol.135, p.82, 1998.

W. Tang, Z. Lan, N. Desai, and D. Buettner, Fault-aware, utility-based job scheduling on Blue Gene/P systems, Cluster Computing and Workshops, pp.1-10, 2009.

Top500 Supercomputer Sites, online; last accessed 2018, p.1.

D. Trystram, Scheduling parallel applications using malleable tasks on clusters, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS, p.17, 2001.

D. Tsafrir, Y. Etsion, and D. G. Feitelson, Modeling user runtime estimates, Workshop on Job Scheduling Strategies for Parallel Processing, vol.48, p.47, 2005.

D. Ye, X. Han, and G. Zhang, Combinatorial Optimization and Applications, Theoretical Computer Science, vol.412, p.17, 2011.

A. B. Yoo, M. A. Jette, and M. Grondona, SLURM: Simple Linux Utility for Resource Management, Workshop on Job Scheduling Strategies for Parallel Processing, pp.44-60, 2003.

J. Yu and R. Buyya, A taxonomy of scientific workflow systems for grid computing, ACM SIGMOD Record, vol.34, p.14, 2005.

S. N. Zhuk, Approximate algorithms to pack rectangles into several strips, Discrete Mathematics and Applications, vol.16, p.17, 2006.

D. Zotkin and P. J. Keleher, Job-length estimation and performance in backfilling schedulers, The Eighth International Symposium on High Performance Distributed Computing, IEEE, p.61, 1999.

Abstract: High Performance Computing (HPC) platforms are growing in size and complexity.

In order to make more responsible use of this computing power, researchers devote considerable effort to designing algorithms and techniques that improve various aspects of performance, such as scheduling and resource management. However, HPC platform administrators remain reluctant to deploy state-of-the-art scheduling methods, and most of them fall back on simple heuristics such as EASY Backfilling, which relies on a naive First-Come-First-Served (FCFS) ordering. Meanwhile, the energy demand of such platforms has also increased rapidly: current supercomputers require a power draw equivalent to that of an entire power plant.

In a first step, we exploited Machine Learning (ML) to learn online scheduling heuristics for parallel jobs. Using simulations and a workload generation model, we determined the characteristics of HPC applications (jobs) that contribute to reducing the average slowdown of the jobs in an execution queue. Modeling these characteristics with a nonlinear function and applying this function to select the next job to execute from the queue improved the average job slowdown on synthetic workloads. Applied to real workload traces from very different HPC platforms, these functions nevertheless still improved performance.
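As an illustrative sketch only (the scoring function below is hypothetical, not the one actually learned in the thesis), queue reordering by a nonlinear function of job characteristics can be expressed as:

```python
# Illustrative sketch: selecting the next job from a waiting queue using a
# nonlinear score of its characteristics. The score() function here is a
# made-up example; the thesis learns its function from simulation data.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    estimated_runtime: float  # user-provided runtime estimate (seconds)
    processors: int           # number of requested processors
    wait_time: float          # time already spent in the queue (seconds)

def score(job: Job) -> float:
    # Hypothetical nonlinear scoring: a smaller estimated area and a longer
    # waiting time both make a job more urgent (lower score = run first).
    area = job.estimated_runtime * job.processors
    return area / (1.0 + job.wait_time)

def next_job(queue: list[Job]) -> Job:
    # Pick the job with the lowest score instead of plain FCFS order.
    return min(queue, key=score)

queue = [
    Job("large", estimated_runtime=3600, processors=64, wait_time=10),
    Job("small", estimated_runtime=60, processors=4, wait_time=10),
]
print(next_job(queue).name)  # the small job is scheduled first
```

The same scheme covers FCFS as a special case (score = arrival time), which is why a learned function can be dropped into an existing queue-based scheduler.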

In a second step, using simulations and workload traces from several real HPC platforms, we performed an in-depth analysis of the cumulative results of four simple scheduling heuristics (including EASY Backfilling). We also evaluated other effects, such as the relationship between job size and slowdown, the distribution of slowdown values, and the number of jobs backfilled, for each HPC platform and scheduling policy. We show experimentally that one can only gain by replacing EASY Backfilling with the SAF (Smallest estimated Area First) policy combined with backfilling, since it yields performance improvements of up to 80% in the slowdown metric while preserving the simplicity and transparency of EASY Backfilling. SAF reduces the number of jobs with high slowdown values.

Overall, we draw the following conclusions: (i) simple and efficient heuristics in the form of a nonlinear function of the job characteristics can be learned automatically, although it remains subjective whether the reasoning behind the scheduling decisions of these heuristics is clear; (ii) the area of a job (its runtime estimate multiplied by its number of processors) appears to be a rather important property for a good parallel job scheduling heuristic, since many of the heuristics that performed well (notably SAF) take the job area as input; (iii) the backfilling mechanism always seems to help improve performance, although it is no substitute for a better ordering of the waiting queue.