E. Agullo, Faster, Cheaper, Better - a Hybridization Methodology to Develop Linear Algebra Software for GPUs, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00547847

T. W. Ainsworth and T. M. Pinkston, Characterizing the Cell EIB On-Chip Network, IEEE Micro, vol.27, issue.5, pp.6-14, 2007.

B. Alexander, Mapping Adl to the Bird-Meertens Formalism, 1994.

B. Alexander, D. Engelhardt, and A. Wendelborn, A supercomputer implementation of a functional data parallel language, 1994.

B. Alexander, D. Engelhardt, and A. Wendelborn, An overview of the Adl language project, 1997.

B. Alexander and A. Wendelborn, Automated transformation of BMF programs, The First International Workshop on Object Systems and Software Architectures, pp.133-141, 2004.

M. Amini, Par4All: From convex array regions to heterogeneous computing, 2nd International Workshop on Polyhedral Compilation Techniques (IMPACT 2012), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00744733

J. Ansel, PetaBricks: A Language and Compiler for Algorithmic Choice, ACM SIGPLAN Conference on Programming Language Design and Implementation, 2009.

OpenMP Architecture Review Board, OpenMP Application Programming Interface, version 3.1, 2011.

J. Armstrong, Concurrent Programming in ERLANG, 1993.

C. Augonnet, Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System's Perspective, 2011.

C. Augonnet, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00384363

E. Ayguadé, An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Euro-Par 2009 Parallel Processing, pp.851-862, 2009.

J. Backus, Can programming be liberated from the von Neumann style?: a functional style and its algebra of programs, 1977.

D. Barthou, QIRAL: A High Level Language for Lattice QCD Code Generation, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00666885

C. Basaran and K.-D. Kang, Grex: An efficient MapReduce framework for graphics processing units, Journal of Parallel and Distributed Computing, vol.73, issue.4, 2013.
DOI : 10.1016/j.jpdc.2013.01.004

R. Bird, Lectures on Constructive Functional Programming, 1988.
DOI : 10.1007/978-3-642-74884-4_5

G. Blelloch, Nesl: A Nested Data-Parallel Language, 3, 1995.

G. E. Blelloch, Prefix sums and their applications, Synthesis of Parallel Algorithms, pp.35-60, 1991.

G. E. Blelloch and G. W. Sabot, Compiling Collection-Oriented Languages onto Massively Parallel Computers, 1989.

R. D. Blumofe, Cilk: An efficient multithreaded runtime system, 1995.

G. Bosilca, DAGuE: A generic distributed DAG engine for high performance computing, Parallel Computing, vol.38, issue.1, pp.37-51, 2011.

G. Bosilca, Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp.1432-1441
DOI : 10.1109/IPDPS.2011.299

M. Boyer, Load balancing in a changing world, Proceedings of the ACM International Conference on Computing Frontiers, CF '13, p.21, 2013.
DOI : 10.1145/2482767.2482794

S. Breitinger, The Eden coordination model for distributed memory systems, Proceedings Second International Workshop on High-Level Parallel Programming Models and Supportive Environments, pp.120-124, 1997.
DOI : 10.1109/HIPS.1997.582964

A. R. Brodtkorb, State-of-the-art in heterogeneous computing, Scientific Programming, vol.18, issue.1, pp.1-33, 2010.

F. Broquedis, ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures, International Journal of Parallel Programming, vol.38, issue.5-6, pp.418-439, 2010.
DOI : 10.1007/s10766-010-0136-3
URL : https://hal.archives-ouvertes.fr/inria-00496295

F. Broquedis, hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications, PDP 2010 - The 18th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, IEEE, 2010.

I. Buck, Brook for GPUs: stream computing on graphics hardware, ACM SIGGRAPH 2004 Papers, SIGGRAPH '04, pp.777-786, 2004.

D. Cann, Retire Fortran? A debate rekindled, Supercomputing '91: Proceedings of the 1991 ACM/IEEE conference on Supercomputing, pp.264-272, 1991.

M. M. T. Chakravarty, Accelerating Haskell array codes with multicore GPUs, Proceedings of the sixth workshop on Declarative aspects of multicore programming, pp.3-14, 2011.

M. M. T. Chakravarty, Nepal - Nested Data-Parallelism in Haskell, Euro-Par '01, pp.524-534, 2001.

B. L. Chamberlain, The case for high-level parallel programming in ZPL, IEEE Computational Science and Engineering, vol.5, issue.3, pp.76-86, 1998.
DOI : 10.1109/99.714604

B. Chamberlain, A Brief Overview of Chapel (revision 1.0), to be published, 2013.

I. Christadler and V. Weinberg, RapidMind: Portability across Architectures and Its Limitations, Lecture Notes in Computer Science, pp.4-15, 2011.

K. Claessen and J. Hughes, QuickCheck, ACM SIGPLAN Notices, vol.46, issue.4, pp.53-64, 2011.
DOI : 10.1145/1988042.1988046

C. Clauss, Evaluation and improvements of programming models for the Intel SCC many-core processor, 2011 International Conference on High Performance Computing & Simulation, pp.525-532, 2011.
DOI : 10.1109/HPCSim.2011.5999870

M. Cole, Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming, Parallel Computing, vol.30, issue.3, pp.389-406, 2004.
DOI : 10.1016/j.parco.2003.12.002

UPC Consortium, UPC Language Specifications, version 1.2, 2005.

M. Cosnard and M. Loi, Automatic task graph generation techniques, Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences, vol.II, pp.113-122, 1995.

J. Darlington, Functional skeletons for parallel coordination, pp.55-66, 1995.
DOI : 10.1007/BFb0020455

J. Darlington, Parallel programming using skeleton functions, Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe, PARLE '93, pp.146-160, 1993.
DOI : 10.1007/3-540-56891-3_12
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.6680

J. Dean and S. Ghemawat, MapReduce, Proceedings of the 6th Symposium on Operating Systems Design & Implementation, pp.137-150, 2004.
DOI : 10.1145/1327452.1327492

P. J. Denning and J. B. Dennis, The resurgence of parallelism, Commun. ACM, vol.53, issue.6, pp.30-32, 2010.

R. Dolbeau, S. Bihan, and F. Bodin, HMPP: A hybrid multi-core parallel programming environment, 2007.

U. Drepper and I. Molnar, The native POSIX thread library for Linux, 2003.

K. Fatahalian, Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), 2006.
DOI : 10.1109/SC.2006.55

J. Fischer, S. Gorlatch, and H. Bischof, Foundations of Data-parallel Skeletons, pp.1-27, 2003.
DOI : 10.1007/978-1-4471-0097-3_1

M. Fluet, Manticore, Proceedings of the 2007 workshop on Declarative aspects of multicore architectures, DAMP '07, pp.37-44, 2007.
DOI : 10.1145/1248648.1248656

MPI Forum, MPI: A Message-Passing Interface Standard, version 3.0, 2012.

S. Frenz, A Practical Comparison of Cluster Operating Systems Implementing Sequential and Transactional Consistency, A. M. Goscinski and W. Zhou (eds.), Lecture Notes in Computer Science, vol.3719, pp.23-33, 2005.
DOI : 10.1007/11564621_3
URL : https://hal.archives-ouvertes.fr/hal-01271219

J. Gaudiot, The Sisal model of functional programming and its implementation, Proceedings of IEEE International Symposium on Parallel Algorithms Architecture Synthesis, pp.112-123, 1997.
DOI : 10.1109/AISPAS.1997.581640

T. Gautier, XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing
DOI : 10.1109/IPDPS.2013.66
URL : https://hal.archives-ouvertes.fr/hal-00799904

D. Gelernter and N. Carriero, Coordination languages and their significance, Communications of the ACM, vol.35, issue.2, p.96, 1992.
DOI : 10.1145/129630.376083

A. Geser and S. Gorlatch, Parallelizing Functional Programs by Generalization, Journal of Functional Programming, pp.46-60, 1997.

A. Ghuloum, Ct, Proceedings of the 4th ACM SIGPLAN workshop on Commercial users of functional programming, CUFP '07, 2007.
DOI : 10.1145/1362702.1362707

H. González-Vélez and M. Leyton, A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers, Software: Practice and Experience, vol.40, pp.1135-1160, 2010.

C. Grelck and S. Scholz, SAC, Proceedings of the 2007 workshop on Declarative aspects of multicore architectures, DAMP '07, pp.401-412, 2003.
DOI : 10.1145/1248648.1248654

D. Grewe and M. F. O'Boyle, A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL, Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software, pp.286-305, 2011.
DOI : 10.1007/978-3-540-92990-1_4

D. Grewe, Z. Wang, and M. F. O'Boyle, Portable mapping of data parallel programs to OpenCL for heterogeneous systems, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2013.
DOI : 10.1109/CGO.2013.6494993

Khronos Group, OpenCL Specification, version 1.2, 2011.

P. Haller and M. Odersky, Scala Actors: Unifying thread-based and event-based programming, Theoretical Computer Science, vol.410, issue.2-3, Distributed Computing Techniques, 2009.

K. Hammond, Parallel Functional Programming: An Introduction, International Symposium on Parallel Symbolic Computation, Hagenberg, 1994.

K. Hammond and Á. Portillo, HaskSkel: Algorithmic Skeletons in Haskell, Implementation of Functional Languages, pp.181-198, 2000.
DOI : 10.1007/10722298_11

A. Heinecke, Towards High-Performance Implementations of a Custom HPC Kernel Using Intel Array Building Blocks, Facing the Multicore-Challenge II, R. Keller, D. Kramer, and J.-P. Weiss (eds.), Lecture Notes in Computer Science, pp.36-47.

P. Hilfinger, Titanium Language Reference Manual, version 2.20, 2006.

R. Huggahalli, R. Iyer, and S. Tetrick, Direct Cache Access for High Bandwidth Network I/O, Proceedings of the 32nd annual international symposium on Computer Architecture, ISCA '05, pp.50-59, 2005.

A. Hugo, Composing multiple StarPU applications over heterogeneous machines: a supervised approach, Third International Workshop on Accelerators and Hybrid Exascale Systems, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00824514

IBM, SPE Runtime Management Library, version 2.2, 2007.

E. Jeannot, G. Mercier, and F. Tessier, TreeMatch: Un algorithme de placement de processus sur architectures multicoeurs (in French), RenPAR - 21e Rencontres Francophones du Parallélisme, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00773254

L. V. Kale and S. Krishnan, CHARM++: a portable concurrent object oriented system based on C++, Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications, OOPSLA '93, pp.91-108, 1993.

G. Keller, Regular, shape-polymorphic, parallel arrays in Haskell, Proceedings of the 15th ACM SIGPLAN international conference on Functional programming, ICFP '10, pp.261-272, 2010.

P. H. Kelly, Functional programming for loosely-coupled multiprocessors, 1989.

K. Kennedy, C. Koelbel, and H. Zima, The rise and fall of High Performance Fortran, Proceedings of the third ACM SIGPLAN conference on History of programming languages, HOPL III, 2007.
DOI : 10.1145/1238844.1238851

J. Kim, SnuCL, Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, pp.341-352, 2012.
DOI : 10.1145/2304576.2304623

J. Kim, Achieving a single compute device image in OpenCL for multiple GPUs, SIGPLAN Notices, vol.46, issue.8, p.277, 2011.

M. S. Lam and M. C. Rinard, Coarse-grain parallel programming in Jade, ACM SIGPLAN Notices, vol.26, issue.7, pp.94-105, 1991.

C. S. de la Lama, Static Multi-device Load Balancing for OpenCL, 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications (ISPA), IEEE, pp.675-682, 2012.

S. Lee, GPU kernels as data-parallel array computations in Haskell, Workshop on Exploiting Parallelism using GPUs and other Hardware-Assisted Methods, 2009.

C. Lin and L. Snyder, ZPL: An array sublanguage, Lecture Notes in Computer Science, vol.768, pp.96-114, 1994.
DOI : 10.1007/3-540-57659-2_6

H. Loidl, Comparing Parallel Functional Languages: Programming and Performance, Higher-Order and Symbolic Computation, vol.16, issue.3, pp.203-251, 2003.
DOI : 10.1023/A:1025641323400

R. Loogen, Y. Ortega-Mallén, and R. Peña-Marí, Parallel functional programming in Eden, Journal of Functional Programming, vol.15, issue.3, pp.431-476, 2005.
DOI : 10.1017/S0956796805005526

R. Lottiaux, OpenMosix, OpenSSI and Kerrighed: a comparative study, IEEE International Symposium on Cluster Computing and the Grid, vol.2, issue.2, pp.1016-1023, 2005.

C. Luk, S. Hong, and H. Kim, Qilin, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42, pp.45-55, 2009.
DOI : 10.1145/1669112.1669121

E. Lusk and K. Yelick, Languages for High-Productivity Computing: the DARPA HPCS Language Project, Parallel Processing Letters, pp.89-102, 2007.
DOI : 10.1142/S0129626407002892

LuxRender, GPL physically based renderer, 2013.

G. Mainland and G. Morrisett, Nikola: embedding compiled GPU functions in Haskell, Proceedings of the third ACM Haskell symposium on Haskell, Haskell '10, pp.67-78, 2010.

W. R. Mark, Cg: a system for programming graphics hardware in a C-like language, ACM Trans. Graph., vol.22, issue.3, pp.896-907, 2003.

M. D. McCool and S. Du Toit, Metaprogramming GPUs with Sh, 2004.

D. Melpignano, Platform 2012, a many-core computing accelerator for embedded SoCs, Proceedings of the 49th Annual Design Automation Conference on, DAC '12, pp.1137-1142, 2012.
DOI : 10.1145/2228360.2228568

G. Michaelson, Nested Algorithmic Skeletons from Higher Order Functions, Parallel Algorithms and Applications, vol.16, issue.3, pp.181-206, 2001.
DOI : 10.1007/BFb0012826

MulticoreWare Inc., GMAC: Global Memory for Accelerator, TM: Task Manager, 2011.

M. Nakao, Productivity and Performance of Global-View Programming with XcalableMP PGAS Language, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), pp.402-409
DOI : 10.1109/CCGrid.2012.118

N. Nakasato, Astrophysical Particle Simulations on Heterogeneous CPU-GPU Systems, arXiv:1206.1199.

C. J. Newburn, Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language, International Symposium on Code Generation and Optimization (CGO 2011), pp.224-235, 2011.
DOI : 10.1109/CGO.2011.5764690

R. W. Numrich and J. Reid, Co-array Fortran for parallel programming, SIGPLAN Fortran Forum, vol.17, issue.2, pp.1-31, 1998.

S. L. Peyton Jones, Parallel Implementations of Functional Programming Languages, The Computer Journal, vol.32, issue.2, pp.175-186, 1989.
DOI : 10.1093/comjnl/32.2.175

S. L. Peyton Jones and D. R. Lester, Implementing functional languages: a tutorial, 1992.

J. Planas, Hierarchical Task-Based Programming With StarSs, International Journal of High Performance Computing Applications, vol.23, issue.3, pp.284-299, 2009.
DOI : 10.1177/1094342009106195

R. Plasmeijer and M. van Eekelen, Functional programming and parallel graph rewriting, 1993.

M. C. Rinard and M. S. Lam, The design, implementation, and evaluation of Jade, ACM Trans. Program. Lang. Syst., vol.20, issue.3, pp.483-545, 1998.

M. C. Rinard, D. J. Scales, and M. S. Lam, Heterogeneous parallel programming in Jade, Proceedings Supercomputing '92, pp.245-256
DOI : 10.1109/SUPERC.1992.236678

P. Roe and A. Wendelborn, Implicit Array Copying: Prevention is Better than Cure, 1992.

P. Van Roy and S. Haridi, Concepts, Techniques, and Models of Computer Programming, 2004.

V. Saraswat, Report on the Programming Language X10, version 2.3, 2013.

W. Schreiner, Parallel Functional Programming - an Annotated Bibliography, 1993.

J. T. Schwartz, Programming with sets: an introduction to SETL, 1986.

M. L. Scott, Programming language pragmatics, 2009.

K. Spafford, J. Meredith, and J. Vetter, Maestro: Data Orchestration and Tuning for OpenCL Devices, Euro-Par 2010 - Parallel Processing, pp.275-286, 2010.
DOI : 10.1007/978-3-642-15291-7_26

Sun Microsystems Inc., The Fortress Language Specification, version 1.0, 2007.

B. Svensson and R. Newton, Programming Future Parallel Architectures with Haskell and Intel ArBB, 2011.

B. Svensson and M. Sheeran, Parallel programming in Haskell almost for free, Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing, FHPC '12, pp.3-14, 2012.
DOI : 10.1145/2364474.2364477

J. Svensson, M. Sheeran, and K. Claessen, Obsidian: A Domain Specific Embedded Language for Parallel Programming of Graphics Processors, Implementation and Application of Functional Languages, pp.156-173, 2011.
DOI : 10.1007/978-3-540-25935-0_3

D. Tarditi, S. Puri, and J. Oglesby, Accelerator: using data parallelism to program GPUs for general-purpose uses, ASPLOS-XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, pp.325-335, 2006.

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, pp.260-274, 2002.
DOI : 10.1109/71.993206

P. W. Trinder, Algorithm + strategy = parallelism, Journal of Functional Programming, vol.8, issue.1, pp.23-60, 1998.
DOI : 10.1017/S0956796897002967

K. Vaidyanathan and D. K. Panda, Benefits of I/O Acceleration Technology (I/OAT) in Clusters, 2007 IEEE International Symposium on Performance Analysis of Systems & Software, pp.220-229, 2007.
DOI : 10.1109/ISPASS.2007.363752

T. White, Hadoop: The definitive guide, O'Reilly Media, 2012.

J. Windows, Automated Parallelisation of code written in the Bird-Meertens Formalism, 2003.

M. Wolfe, Implementing the PGI Accelerator model, Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, GPGPU '10, 2010.
DOI : 10.1145/1735688.1735697

Y. Zhang, Architecture comparisons between Nvidia and ATI GPUs: Computation parallelism and data communications, 2011 IEEE International Symposium on Workload Characterization (IISWC), pp.205-215, 2011.
DOI : 10.1109/IISWC.2011.6114180

G. Zheng, Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers, 2010 39th International Conference on Parallel Processing Workshops, 2010.
DOI : 10.1109/ICPPW.2010.65