M. A. Aba, L. Zaourar, and A. Munier, Approximation Algorithm for Scheduling a Chain of Tasks on Heterogeneous Systems, European Conference on Parallel Processing, pp.353-365, 2017.
URL : https://hal.archives-ouvertes.fr/cea-01772616

S. Amarasinghe, D. Campbell, W. Carlson, A. Chien, W. Dally et al., Exascale software study: Software challenges in extreme scale systems, DARPA IPTO, Air Force Research Labs, pp.1-153, 2009.

G. M. Amdahl, The validity of the single processor approach to achieving large scale computing capabilities, AFIPS Conference Proceedings, pp.483-485, 1967.

F. Angiolini, L. Benini, and A. Caprara, Polynomial-time algorithm for on-chip scratchpad memory partitioning, Proceedings of the 2003 international conference on Compilers, pp.318-326, 2003.

F. Angiolini, F. Menichelli, A. Ferrero, L. Benini, and M. Olivieri, A post-compiler approach to scratchpad mapping of code, Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems, pp.259-267, 2004.

R. Asai, Clustering Modes in Knights Landing Processors: Developer's Guide, 2016.

C. Augonnet, J. Clet-ortega, S. Thibault, and R. Namyst, Data-Aware Task Scheduling on Multi-accelerator Based Platforms, IEEE Int. Conf. on Parallel and Distributed Systems, pp.291-298, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00523937

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, pp.187-198, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00384363

G. Aupy, M. Shantharam, A. Benoit, Y. Robert, and P. Raghavan, Co-scheduling algorithms for high-throughput workload execution, Journal of Scheduling, vol.19, pp.627-640, 2016.
URL : https://hal.archives-ouvertes.fr/hal-00819036

O. Avissar, R. Barua, and D. Stewart, An optimal memory allocation scheme for scratch-pad-based embedded systems, ACM Transactions on Embedded Computing Systems (TECS), vol.1, issue.1, pp.6-26, 2002.

D. H. Bailey, The NAS Parallel Benchmarks-Summary and Preliminary Results, Proc. of the 1991 ACM/IEEE Conf. on Supercomputing, 1991.

S. Bao, Y. Huo, P. Parvathaneni, A. J. Plassard, C. Bermudez et al., A Data Colocation Grid Framework for Big Data Medical Image Processing: Backend Design, 2017.

A. C. Bauer, H. Abbasi, J. Ahrens, H. Childs, B. Geveci et al., In situ methods, infrastructures, and applications on high performance computing platforms, Computer Graphics Forum, vol.35, pp.577-597, 2016.


P. B. Bhat, C. S. Raghavendra, and V. K. Prasanna, Efficient collective communication in distributed heterogeneous systems, Journal of Parallel and Distributed Computing, vol.63, pp.251-263, 2003.

R. Bitirgen, E. Ipek, and J. F. Martinez, Coordinated management of multiple interacting resources in chip multiprocessors: A machine learning approach, 41st IEEE/ACM International Symposium on Microarchitecture (MICRO-41), pp.318-329, 2008.

S. Blagodurov, S. Zhuravlev, and A. Fedorova, Contention-Aware Scheduling on Multicore Systems, ACM Trans. Comput. Syst, vol.28, p.45, 2010.

J. Blazewicz, M. Drabowski, and J. Weglarz, Scheduling Multiprocessor Tasks to Minimize Schedule Length, IEEE Transactions on Computers, vol.C-35, 1986.

J. Blazewicz, M. Machowiak, G. Mounié, and D. Trystram, Approximation Algorithms for Scheduling Independent Malleable Tasks, Euro-Par 2001 Parallel Processing, vol.2150, 2001.

J. A. Bondy and U. S. Murty, Graph theory with applications, 1976.

G. Bosilca, A. Bouteiller, E. Brunet, F. Cappello, J. Dongarra et al., Unified model for assessing checkpointing protocols at extreme-scale, Concurrency and Computation: Practice and Experience, vol.26, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00908447

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Herault et al., PaRSEC: Exploiting heterogeneity for enhancing scalability, Computing in Science & Engineering, vol.15, pp.36-45, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00930217

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for high performance computing, Parallel Computing, vol.38, issue.2, pp.37-51, 2012.

M. Bougeret, H. Casanova, M. Rabie, Y. Robert, and F. Vivien, Checkpointing strategies for parallel jobs, High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for, pp.1-11, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00560582

S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci, A portable programming interface for performance evaluation on modern processors, vol.14, pp.189-204, 2000.

F. Cappello and D. Etiemble, MPI Versus MPI+OpenMP on IBM SP for the NAS Benchmarks, SC '00, 2000.

F. Cappello, A. Geist, B. Gropp, L. Kale, B. Kramer et al., Toward exascale resilience, The International Journal of High Performance Computing Applications, vol.23, pp.374-388, 2009.

F. Cappello, A. Geist, W. Gropp, S. Kale, B. Kramer et al., Toward exascale resilience: 2014 update, Supercomputing frontiers and innovations, vol.1, issue.1, pp.5-28, 2014.

D. Chandra, F. Guo, S. Kim, and Y. Solihin, Predicting inter-thread cache contention on a chip multi-processor architecture, 11th International Symposium on High-Performance Computer Architecture (HPCA-11), pp.340-351, 2005.

K. Chandrasekar, X. Ni, and L. V. Kalé, A Memory Heterogeneity-Aware Runtime System for Bandwidth-Sensitive HPC Applications, IEEE Int. Parallel and Distributed Processing Symposium Workshops, pp.1293-1300, 2017.

H. Cho, B. Egger, J. Lee, and H. Shin, Dynamic data scratchpad memory management for a memory subsystem with an MMU, ACM SIGPLAN Notices, vol.42, issue.7, pp.195-206, 2007.

PEZY Computing, ZettaScaler-2.0 Configurable Liquid Immersion Cooling System, 2017.

Intel Corporation, Memkind: A User Extensible Heap Manager, 2018.

J. T. Daly, A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Computer Systems, vol.22, pp.303-312, 2004.

D. Dauwe, E. Jonardi, R. Friese, S. Pasricha, A. A. Maciejewski et al., A Methodology for Co-Location Aware Application Performance Modeling in Multicore Computing, Parallel and Distributed Processing Symposium Workshop (IPDPSW), pp.434-443, 2015.

J. Dongarra, Report on the Sunway TaihuLight system, 2016.

J. Dongarra, T. Hérault, and Y. Robert, Performance and reliability trade-offs for the double checkpointing algorithm, International Journal of Networking and Computing, vol.4, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01091928

J. Dongarra, P. Beckman, P. Aerts, F. Cappello, T. Lippert et al., The international exascale software project: a call to cooperative action by the global high-performance community, The International Journal of High Performance Computing Applications, vol.23, pp.309-322, 2009.

M. Dreher and B. Raffin, A Flexible Framework for Asynchronous In Situ and In Transit Analytics for Scientific Simulations, 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00941413

M. Drozdowski, Scheduling Parallel Tasks-Algorithms and Complexity, ISBN 1584883979, 2004.

J. Du and J. Y.-T. Leung, Complexity of Scheduling Parallel Task Systems, SIAM Journal on Discrete Mathematics, vol.2, pp.473-487, 1989.

B. Egger, J. Lee, and H. Shin, Dynamic scratchpad memory management for code in portable systems with an MMU, ACM Transactions on Embedded Computing Systems (TECS), vol.7, issue.2, p.11, 2008.

B. Egger, J. Lee, and H. Shin, Scratchpad memory management for portable systems with a memory management unit, Proceedings of the 6th ACM & IEEE International conference on Embedded software, pp.321-330, 2006.

M. Elnozahy, L. Alvisi, Y. Wang, and D. B. Johnson, A Survey of Rollback-recovery Protocols in Message-passing Systems, ACM Comput. Surv, vol.34, issue.3, 2002.

E. Strohmaier, The TOP500 benchmark

D. Fiala, F. Mueller, C. Engelmann, R. Riesen, K. Ferreira et al., Detection and Correction of Silent Data Corruption for Large-scale High-performance Computing, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. SC '12, vol.78, 2012.

M. Frigo, C. E. Leiserson, and K. H. Randall, The Implementation of the Cilk-5 Multithreaded Language, Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation. PLDI '98, 1998.

A. Gainaru, G. Aupy, A. Benoit, F. Cappello, Y. Robert et al., Scheduling the I/O of HPC applications under congestion, IEEE Int. Parallel and Distributed Processing Symposium (IPDPS), pp.1013-1022, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01251938

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979.

T. Gautier, J. V. Lima, N. Maillard, and B. Raffin, XKaapi: A Runtime System for DataFlow Task Programming on Heterogeneous Architectures, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pp.1299-1308, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00799904

J. Gecsei, D. Slutz, and I. Traiger, Evaluation techniques for storage hierarchies, IBM Systems Journal, vol.9, pp.78-117, 1970.

N. Guan, M. Stigge, W. Yi, and G. Yu, Cache-aware Scheduling and Analysis for Multicores, Proc. 7th ACM Int. Conf. Embedded Software. EMSOFT '09, pp.245-254, 2009.

N. J. Gunther, Guerrilla capacity planning-a tactical approach to planning for highly scalable applications and services, 2007.

T. Harris, M. Maas, and V. J. Marathe, Callisto: co-scheduling parallel runtime systems, Proceedings of the Ninth European Conference on Computer Systems, p.24, 2014.

A. Hartstein, V. Srinivasan, T. Puzak, and P. Emma, On the nature of cache miss behavior: Is it √2?, The Journal of Instruction-Level Parallelism, vol.10, pp.1-22, 2008.

L. He, H. Zhu, and S. A. Jarvis, Developing Graph-Based Co-Scheduling Algorithms on Multicore Computers, IEEE Trans. Parallel Distributed Systems, vol.27, pp.1617-1632, 2016.

M. T. Heath, A tale of two laws, Int. J. High Performance Computing Applications, vol.29, pp.320-330, 2015.

T. Herault and Y. Robert, Fault-Tolerance Techniques for High-Performance Computing, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01200488

T. Herault and Y. Robert, Fault-tolerance techniques for high-performance computing, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01200488

H. Hulett, T. G. Will, and G. J. Woeginger, Multigraph realizations of degree sequences: Maximization is easy, minimization is hard, Operations Research Letters, vol.36, issue.5, pp.594-596, 2008.

. Intel, Intel 64 and IA-32 Architectures Software Developer's Manual, 2014.

. Intel, Intel Xeon Phi Processor: Performance Monitoring Reference Manual, Registers, vol.1, 2017.

A. Jaleel, W. Hasenplaugh, M. Qureshi, J. Sebot, S. Steely et al., Adaptive insertion policies for managing shared caches, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, pp.208-219, 2008.

Y. Jiang, X. Shen, J. Chen, and R. Tripathi, Analysis and Approximation of Optimal Coscheduling on Chip Multiprocessors, Proc. 17th Int. Conf. Parallel Architectures Compilation Techniques. PACT '08, pp.220-229, 2008.

O. Kang and D. P. Agrawal, Scalable scheduling for symmetric multiprocessors (SMP), Journal of Parallel and Distributed Computing, vol.63, pp.273-285, 2003.

S. Kim, D. Chandra, and Y. Solihin, Fair cache sharing and partitioning in a chip multiprocessor architecture, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, pp.111-122, 2004.

S. Kim and J. Browne, A general approach to mapping of parallel computation upon multiprocessor architectures, International conference on parallel processing, vol.3, p.8, 1988.

R. Knauerhase, P. Brett, B. Hohlt, T. Li, and S. Hahn, Using OS observations to improve performance in multicore systems, IEEE Micro, vol.28, issue.3, 2008.

A. Krishna, A. Samih, and Y. Solihin, Data sharing in multi-threaded applications and its impact on chip design, Int. Symp. Performance Analysis of Systems and Software (ISPASS), pp.125-134, 2012.

E. Kultursay, M. Kandemir, A. Sivasubramaniam, and O. Mutlu, Evaluating STT-RAM as an energy-efficient main memory alternative, IEEE Int. Symp. on Performance Analysis of Systems and Software (ISPASS), pp.256-267, 2013.

L. A. Laboratory, Simplified Interface to Complex Memory, 2017.

R. Landaverde, T. Zhang, A. K. Coskun, and M. Herbordt, An investigation of Unified Memory Access performance in CUDA, 2014 IEEE High Performance Extreme Computing Conference (HPEC), pp.1-6, 2014.

M. A. Laurenzano, M. M. Tikir, L. Carrington, and A. Snavely, PEBIL: Efficient static binary instrumentation for Linux, IEEE Int. Symp. on Performance Analysis of Systems Software (ISPASS), pp.175-183, 2010.

J. Leverich and C. Kozyrakis, Reconciling high server utilization and sub-millisecond quality-of-service, 9th European Conf. on Computer Systems, 2014.

J. Liedtke, H. Hartig, and M. Hohmuth, OS-controlled cache predictability for real-time systems, Proceedings of the Third IEEE Real-Time Technology and Applications Symposium, pp.213-224, 1997.

J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang et al., Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems, IEEE 14th International Symposium on High Performance Computer Architecture (HPCA), pp.367-378, 2008.

D. Lo, L. Cheng, R. Govindaraju, P. Ranganathan, and C. Kozyrakis, Improving resource efficiency at scale with Heracles, ACM Transactions on Computer Systems (TOCS), vol.34, issue.2, 2016.

P. Malakar, V. Vishwanath, T. Munson, C. Knight, M. Hereld et al., Optimal scheduling of in-situ analysis for large-scale scientific simulations, Proc. of the Int. Conf. for High Performance Computing, Networking, Storage and Analysis

J. Mars, N. Vachharajani, R. Hundt, and M. L. Soffa, Contention aware execution: online contention detection and response, Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, pp.257-265, 2010.

G. Martín, D. E. Singh, M. Marinescu, and J. Carretero, Enhancing the performance of malleable MPI applications by using performance-aware dynamic reconfiguration, Parallel Computing, vol.46, 2015.

R. L. McGregor, C. D. Antonopoulos, and D. S. Nikolopoulos, Scheduling algorithms for effective thread pairing on hybrid multiprocessors, Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, p.10, 2005.

G. De Micheli, Synthesis and Optimization of Digital Circuits, 1st ed., McGraw-Hill Higher Education, ISBN 0070163332, 1994.

D. Molka, D. Hackenberg, R. Schone, and W. E. Nagel, Cache Coherence Protocol and Memory Performance of the Intel Haswell-EP Architecture, Int. Conf. on Parallel Processing (ICPP), pp.739-748, 2015.

S. P. Muralidhara, L. Subramanian, O. Mutlu, M. Kandemir, and T. Moscibroda, Reducing Memory Interference in Multicore Systems via Application-aware Memory Channel Partitioning, Proc. 44th IEEE/ACM Int. Sym. Microarchitecture. MICRO-44, pp.374-385, 2011.

N. Muthuvelu, I. Chai, E. Chikkannan, and R. Buyya, Batch Resizing Policies and Techniques for Fine-Grain Grid Tasks: The Nuts and Bolts, J. Information Processing Systems, vol.7, issue.2, 2011.

K. J. Nesbit, N. Aggarwal, J. Laudon, and J. E. Smith, Fair queuing memory systems, Proceedings of the 39th Annual IEEE/ACM international Symposium on Microarchitecture, pp.208-222, 2006.

K. J. Nesbit, J. Laudon, and J. E. Smith, Virtual private caches, ACM SIGARCH Computer Architecture News, vol.35, pp.57-68, 2007.

K. T. Nguyen, Introduction to Cache Allocation Technology in the Intel® Xeon® Processor E5 Family, 2016.

X. Ni, E. Meneses, and L. Kale, Hiding Checkpoint Overhead in HPC Applications with a Semi-Blocking Algorithm, Cluster Computing (CLUSTER), pp.364-372, 2012.

NVIDIA CUDA, Unified Memory Programming, 2018.

L. Oden and P. Balaji, Hexe: A Toolkit for Heterogeneous Memory Management, IEEE International Conference on Parallel and Distributed Systems (ICPADS), 2017.

K. Olukotun, B. A. Nayfeh, L. Hammond, K. Wilson, and K. Chang, The case for a single-chip multiprocessor, ACM SIGPLAN Notices, vol.31, issue.9, pp.2-11, 1996.

OpenMP Architecture Review Board, OpenMP Application Program Interface, 2013.

J. K. Ousterhout, Scheduling Techniques for Concurrent Systems, ICDCS, vol.82, pp.22-30, 1982.

J. K. Ousterhout, D. A. Scelza, and P. S. Sindhu, Medusa: an experiment in distributed operating system structure, Communications of the ACM, vol.23, pp.92-105, 1980.

A. J. Peña and P. Balaji, Toward the efficient use of multiple explicitly managed memory subsystems, IEEE Int. Conf. on Cluster Computing (CLUSTER), pp.123-131, 2014.

S. Perarnau, M. Tchiboukdjian, and G. Huard, Controlling Cache Utilization of HPC Applications, International Conference on Supercomputing (ICS), 2011.

S. Perarnau, J. A. Zounmevo, B. Gerofi, K. Iskra, and P. Beckman, Exploring Data Migration for Future Deep-Memory Many-Core Systems, 2016.

M. K. Qureshi and Y. N. Patt, Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches, 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39), pp.423-432, 2006.

Ten technical approaches to address the challenges of Exascale computing, 2014.

D. A. Reed, R. Bajcsy, M. A. Fernandez, J. Griffiths, R. D. Mott et al., Computational science: Ensuring America's competitiveness, Tech. rep., President's Information Technology Advisory Committee, Arlington VA, 2005.

B. M. Rogers, A. Krishna, G. B. Bell, K. Vu, X. Jiang et al., Scaling the bandwidth wall: challenges in and avenues for CMP scaling, ACM SIGARCH Computer Architecture News, vol.37, pp.371-382, 2009.

H. Servat, A. J. Peña, G. Llort, E. Mercadal, H. Hoppe et al., Automating the Application Data Placement in Hybrid Memory Systems, 2017 IEEE International Conference on Cluster Computing, pp.126-136, 2017.

R. Sethi and J. D. Ullman, The generation of optimal code for arithmetic expressions, Journal of the ACM (JACM), vol.17, pp.715-728, 1970.

C. Sewell, Large-scale compute-intensive analysis via a combined in-situ and coscheduling workflow approach, Proc. of the Int. Conf. for High Perf. Computing, Networking, Storage and Analysis, SC'15, 2015.

M. Shantharam, Y. Youn, and P. Raghavan, Speedup-aware co-schedules for efficient workload management, Parallel Processing Letters, vol.23, p.1340001, 2013.

T. Sherwood, B. Calder, and J. Emer, Reducing cache misses using hardware and software page placement, Proceedings of the 13th international conference on Supercomputing, pp.155-164, 1999.

A. Snavely, N. Mitchell, L. Carter, J. Ferrante, and D. Tullsen, Explorations in symbiosis on two multithreaded architectures, Workshop on Multi-Threaded Execution, Architecture, and Compilers, 1999.

A. Snavely and D. M. Tullsen, Symbiotic jobscheduling for a simultaneous multithreading processor, ACM SIGPLAN Notices, vol.35, pp.234-244, 2000.

G. E. Suh, L. Rudolph, and S. Devadas, Effects of memory performance on parallel job scheduling, Workshop on Job Scheduling Strategies for Parallel Processing, pp.116-132, 2001.

D. K. Tam, R. Azimi, L. B. Soares, and M. Stumm, RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations, ACM SIGARCH Computer Architecture News, vol.37, pp.121-132, 2009.


D. Tam, R. Azimi, L. Soares, and M. Stumm, Managing shared L2 caches on multicore systems in software, Workshop on the Interaction between Operating Systems and Computer Architecture. Citeseer, pp.26-33, 2007.

G. Taylor, P. Davies, and M. Farmwald, The TLB slice-a low-cost high-speed address translation mechanism, Proceedings of the 17th Annual International Symposium on Computer Architecture, pp.355-363, 1990.

K. Tian, Y. Jiang, and X. Shen, A Study on Optimally Co-scheduling Jobs of Different Lengths on Chip Multiprocessors, Proc. 6th ACM Conf. Computing Frontiers. CF '09, pp.41-50, 2009.

T. Tobita and H. Kasahara, A standard task graph set for fair evaluation of multiprocessor scheduling algorithms, Journal of Scheduling, vol.5, pp.379-394, 2002.

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, 2002.

D. Unat, J. Shalf, T. Hoefler, T. Schulthess, and A. D. , Programming Abstractions for Data Locality, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01083080

S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson et al., An 80-tile 1.28 TFLOPS network-on-chip in 65nm CMOS, IEEE International Solid-State Circuits Conference, pp.98-99, 2007.

A. Vladimirov and R. Asai, MCDRAM as High-Bandwidth Memory (HBM) in Knights Landing Processors: Developer's Guide, 2016.

G. Voskuilen, A. F. Rodrigues, and S. D. Hammond, Analyzing allocation behavior for multilevel memory, Proceedings of the Second International Symposium on Memory Systems, MEMSYS 2016, pp.204-207, 2016.

G. V. Wilson, The history of the development of parallel computing, 1994.

Y. Xie and G. Loh, Dynamic classification of program memory behaviors in CMPs, the 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.

J. W. Young, A First Order Approximation to the Optimum Checkpoint Interval, Commun. ACM, vol.17, 1974.

Y. Zhang, M. A. Laurenzano, J. Mars, and L. Tang, SMiTe: Precise QoS prediction on real-system SMT processors to improve utilization in warehouse scale computers, Proc. of the 47th Int. Symp. on Microarchitecture, pp.406-418, 2014.

H. Zhu, L. He, B. Gao, K. Li, J. Sun et al., Modelling and Developing Coscheduling Strategies on Multicore Processors, Int. Conf. Parallel Processing (ICPP), pp.220-229, 2015.

S. Zhuravlev, S. Blagodurov, and A. Fedorova, Addressing shared resource contention in multicore processors via scheduling, ACM SIGPLAN Notices, vol.45, pp.129-142, 2010.

S. Zhuravlev, J. C. Saez, S. Blagodurov, A. Fedorova, and M. Prieto, Survey of scheduling techniques for addressing shared resources in multicore processors, ACM Computing Surveys (CSUR), vol.45, issue.1, p.4, 2012.

G. Aupy, A. Benoit, L. Pottier, P. Raghavan, Y. Robert et al., Co-scheduling high-performance computing applications, Big Data Management and Processing, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02082818

Articles in International Refereed Journals

G. Aupy, A. Benoit, S. Dai, L. Pottier, P. Raghavan et al., Coscheduling Amdahl applications on cache-partitioned systems, International Journal of High Performance Computing and Applications, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01670137

A. Benoit, L. Pottier, and Y. Robert, Resilient co-scheduling of malleable applications, International Journal of High Performance Computing and Applications, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01670153

Articles in International Refereed Conferences

A. Benoit, L. Pottier, and Y. Robert, Resilient application co-scheduling with processor redistribution, 45th International Conference on Parallel Processing, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01219258

A. Benoit, S. Perarnau, L. Pottier, and Y. Robert, A performance model to execute workflows on high-bandwidth-memory architectures, 47th International Conference on Parallel Processing, ICPP, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01798726

G. Aupy, A. Benoit, B. Goglin, L. Pottier, and Y. Robert, Co-scheduling HPC workloads on cache-partitioned CMP platforms, IEEE International Conference on Cluster Computing, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01874154

A. Benoit, L. Pottier, and Y. Robert, Resilient application co-scheduling with processor redistribution, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01219258

G. Aupy, A. Benoit, L. Pottier, P. Raghavan, Y. Robert et al., Co-scheduling algorithms for cache-partitioned systems, p.28, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01654660

G. Aupy, A. Benoit, S. Dai, L. Pottier, P. Raghavan et al., Coscheduling Amdahl applications on cache-partitioned systems, p.33, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01670137

A. Benoit, S. Perarnau, L. Pottier, and Y. Robert, A performance model to execute workflows on high-bandwidth memory architectures, ENS Lyon, pp.1-28, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01798726

G. Aupy, A. Benoit, B. Goglin, L. Pottier, and Y. Robert, Co-scheduling HPC workloads on cache-partitioned CMP platforms, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01874154