Faster, Cheaper, Better - a Hybridization Methodology to Develop Linear Algebra Software for, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00547847
Characterizing the Cell EIB On-Chip Network, IEEE Micro, vol.27, issue.5, pp.6-14, 2007. ,
Mapping Adl to the Bird-Meertens Formalism, 1994. ,
A supercomputer implementation of a functional data parallel language, 1994. ,
An overview of the Adl language project, 1997. ,
Automated transformation of BMF programs, The First International Workshop on Object Systems and Software Architectures, pp.133-141, 2004. ,
Par4All: From convex array regions to heterogeneous computing, 2nd International Workshop on Polyhedral Compilation Techniques (IMPACT 2012), 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00744733
PetaBricks: A Language and Compiler for Algorithmic Choice, ACM SIGPLAN Conference on Programming Language Design and Implementation, 2009. ,
OpenMP Application Programming Interface 3.1, 2011. ,
Concurrent Programming in ERLANG, 1993. ,
Scheduling Tasks over Multicore machines enhanced with Accelerators: a Runtime System's Perspective, 2011. ,
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Euro-Par 2009 Parallel Processing, pp.851-862, 2009. ,
DOI : 10.1109/TPDS.2003.1214317
Can programming be liberated from the von Neumann style?: a functional style and its algebra of programs, 1977. ,
QIRAL: A High Level Language for Lattice QCD Code Generation, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00666885
Grex: An efficient MapReduce framework for graphics processing units, Journal of Parallel and Distributed Computing, vol.73, issue.4, 2013. ,
DOI : 10.1016/j.jpdc.2013.01.004
Lectures on Constructive Functional Programming, 1988. ,
DOI : 10.1007/978-3-642-74884-4_5
Nesl: A Nested Data-Parallel Language, 3, 1995. ,
Prefix sums and their applications, Synthesis of Parallel Algorithms, pp.35-60, 1991. ,
Compiling Collection-Oriented Languages onto Massively Parallel Computers, 1989. ,
Cilk: An efficient multithreaded runtime system, 1995. ,
DAGuE: A generic distributed DAG engine for high performance computing, Parallel Computing, vol.38, issue.1, pp.37-51, 2011. ,
Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp.1432-1441 ,
DOI : 10.1109/IPDPS.2011.299
Load balancing in a changing world, Proceedings of the ACM International Conference on Computing Frontiers, CF '13, p.21, 2013. ,
DOI : 10.1145/2482767.2482794
The Eden coordination model for distributed memory systems, Proceedings Second International Workshop on High-Level Parallel Programming Models and Supportive Environments, pp.120-124, 1997. ,
DOI : 10.1109/HIPS.1997.582964
State-of-the-art in heterogeneous computing, Scientific Programming, vol.18, issue.1, pp.1-33, 2010. ,
ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures, International Journal of Parallel Programming, vol.38, issue.5-6, pp.418-439, 2010. ,
DOI : 10.1007/s10766-010-0136-3
URL : https://hal.archives-ouvertes.fr/inria-00496295
hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications, PDP 2010 - The 18th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, IEEE, 2010. ,
Brook for GPUs: stream computing on graphics hardware, ACM SIGGRAPH 2004 Papers. SIGGRAPH '04, pp.777-786, 2004. ,
Retire Fortran? A debate rekindled, In: Supercomputing '91: Proceedings of the 1991 ACM/IEEE conference on Supercomputing, pp.264-272, 1991. ,
Accelerating Haskell array codes with multicore GPUs, Proceedings of the sixth workshop on Declarative aspects of multicore programming, pp.3-14, 2011. ,
Nepal - Nested Data-Parallelism in Haskell, Euro-Par '01, pp.524-534, 2001. ,
The case for high-level parallel programming in ZPL, IEEE Computational Science and Engineering, vol.5, issue.3, pp.76-86, 1998. ,
DOI : 10.1109/99.714604
A Brief Overview of Chapel (revision 1.0), to be published, 2013. ,
RapidMind: Portability across Architectures and Its Limitations, Lecture Notes in Computer Science, vol.23, issue.3, pp.4-15, 2011. ,
DOI : 10.1177/1094342009106195
QuickCheck, ACM SIGPLAN Notices, vol.46, issue.4, pp.53-64, 2011. ,
DOI : 10.1145/1988042.1988046
Evaluation and improvements of programming models for the Intel SCC many-core processor, 2011 International Conference on High Performance Computing & Simulation, pp.525-532, 2011. ,
DOI : 10.1109/HPCSim.2011.5999870
Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming, Parallel Computing, vol.30, issue.3, pp.389-406, 2004. ,
DOI : 10.1016/j.parco.2003.12.002
UPC Language Specifications, 1.2, 2005. ,
Automatic task graph generation techniques, In: System Sciences II. Proceedings of the Twenty-Eighth Hawaii International Conference on, pp.113-122, 1995. ,
Functional skeletons for parallel coordination, pp.55-66, 1995. ,
DOI : 10.1007/BFb0020455
Parallel programming using skeleton functions, Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe. PARLE '93, pp.146-160, 1993. ,
DOI : 10.1007/3-540-56891-3_12
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.6680
MapReduce, Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, pp.137-150, 2004. ,
DOI : 10.1145/1327452.1327492
The resurgence of parallelism, Communications of the ACM, vol.53, issue.6, pp.30-32, 2010. ,
HMPP: A hybrid multi-core parallel programming environment, 2007. ,
The native POSIX thread library for Linux, 2003. ,
Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), 2006. ,
DOI : 10.1109/SC.2006.55
Foundations of Data-parallel Skeletons, pp.1-27, 2003. ,
DOI : 10.1007/978-1-4471-0097-3_1
Manticore, Proceedings of the 2007 workshop on Declarative aspects of multicore architectures, DAMP '07, pp.37-44, 2007. ,
DOI : 10.1145/1248648.1248656
MPI: A Message-Passing Interface Standard 3.0, 2012. ,
A Practical Comparison of Cluster Operating Systems Implementing Sequential and Transactional Consistency, Andrzej M. Goscinski and Wanlei Zhou, Lecture Notes in Computer Science, vol.3719, pp.23-33, 2005. ,
DOI : 10.1007/11564621_3
URL : https://hal.archives-ouvertes.fr/hal-01271219
The Sisal model of functional programming and its implementation, Proceedings of IEEE International Symposium on Parallel Algorithms Architecture Synthesis, pp.112-123, 1997. ,
DOI : 10.1109/AISPAS.1997.581640
XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing ,
DOI : 10.1109/IPDPS.2013.66
URL : https://hal.archives-ouvertes.fr/hal-00799904
Coordination languages and their significance, Communications of the ACM, vol.35, issue.2, p.96, 1992. ,
DOI : 10.1145/129630.376083
Parallelizing Functional Programs by Generalization, Journal of Functional Programming, pp.46-60, 1997. ,
Ct, Proceedings of the 4th ACM SIGPLAN workshop on Commercial users of functional programming, CUFP '07, 2007. ,
DOI : 10.1145/1362702.1362707
A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers, Software: Practice and Experience, vol.40, pp.1135-1160, 2010. ,
SAC, Proceedings of the 2007 workshop on Declarative aspects of multicore architectures, DAMP '07, pp.401-412, 2003. ,
DOI : 10.1145/1248648.1248654
A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL, Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software, pp.286-305, 2011. ,
DOI : 10.1007/978-3-540-92990-1_4
Portable mapping of data parallel programs to OpenCL for heterogeneous systems, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2013. ,
DOI : 10.1109/CGO.2013.6494993
OpenCL Specification, 1.2, 2011. ,
Scala Actors: Unifying thread-based and event-based programming, Theoretical Computer Science, vol.410, issue.2-3, Distributed Computing Techniques, 2009. ,
Parallel Functional Programming: An Introduction, International Symposium on Parallel Symbolic Computation. Hagenberg, 1994. ,
HaskSkel: Algorithmic Skeletons in Haskell, In: Implementation of Functional Languages, pp.181-198, 2000. ,
DOI : 10.1007/10722298_11
Towards High-Performance Implementations of a Custom HPC Kernel Using Intel Array Building Blocks, Facing the Multicore-Challenge II, Lecture Notes in Computer Science, pp.36-47 ,
DOI : 10.1145/1095408.1095421
Titanium Language Reference Manual, 2.20, 2006. ,
Direct Cache Access for High Bandwidth Network I/O, Proceedings of the 32nd annual international symposium on Computer Architecture, ISCA '05, pp.50-59, 2005. ,
Composing multiple StarPU applications over heterogeneous machines: a supervised approach, Third International Workshop on Accelerators and Hybrid Exascale Systems, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00824514
Runtime Management Library, Version 2.2, 2007. ,
TreeMatch : Un algorithme de placement de processus sur architectures multicoeurs (in French), RenPAR - 21e Rencontres Francophones du Parallélisme, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00773254
CHARM++: a portable concurrent object oriented system based on C++, Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications, OOPSLA '93, pp.91-108, 1993. ,
Regular, shape-polymorphic, parallel arrays in Haskell, Proceedings of the 15th ACM SIGPLAN international conference on Functional programming. ICFP '10, pp.261-272, 2010. ,
Functional programming for loosely-coupled multiprocessors, 1989. ,
The rise and fall of High Performance Fortran, Proceedings of the third ACM SIGPLAN conference on History of programming languages, HOPL III, 2007. ,
DOI : 10.1145/1238844.1238851
SnuCL, Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, pp.341-352, 2012. ,
DOI : 10.1145/2304576.2304623
Achieving a single compute device image in OpenCL for multiple GPUs, ACM SIGPLAN Notices, vol.46, issue.8, p.277, 2011. ,
Coarse-grain parallel programming in Jade, In: ACM SIGPLAN Notices, vol.26, issue.7, pp.94-105, 1991. ,
Static Multi-device Load Balancing for OpenCL, 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications (ISPA), pp.675-682, 2012. ,
GPU kernels as data-parallel array computations in Haskell, Workshop on Exploiting Parallelism using GPUs and other Hardware-Assisted Methods, 2009. ,
ZPL: An array sublanguage, Lecture Notes in Computer Science, vol.768, pp.96-114, 1994. ,
DOI : 10.1007/3-540-57659-2_6
Comparing Parallel Functional Languages: Programming and Performance, Higher-Order and Symbolic Computation, vol.16, issue.3, pp.203-251, 2003. ,
DOI : 10.1023/A:1025641323400
Parallel functional programming in Eden, Journal of Functional Programming, vol.15, issue.3, pp.431-476, 2005. ,
DOI : 10.1017/S0956796805005526
OpenMosix, OpenSSI and Kerrighed: a comparative study, IEEE International Symposium on Cluster Computing and the Grid, vol.2, issue.2, pp.1016-1023, 2005. ,
Qilin, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42, pp.45-55, 2009. ,
DOI : 10.1145/1669112.1669121
Languages for High-Productivity Computing: The DARPA HPCS Language Project, Parallel Processing Letters, pp.89-102, 2007. ,
DOI : 10.1142/S0129626407002892
GPL physically based renderer, 2013. ,
Nikola: embedding compiled GPU functions in Haskell, Proceedings of the third ACM Haskell symposium on Haskell. Haskell '10, pp.67-78, 2010. ,
Cg: a system for programming graphics hardware in a C-like language, ACM Transactions on Graphics, vol.22, issue.3, pp.896-907, 2003. ,
Metaprogramming GPUs with Sh, 2004. ,
Platform 2012, a many-core computing accelerator for embedded SoCs, Proceedings of the 49th Annual Design Automation Conference on, DAC '12, pp.1137-1142, 2012. ,
DOI : 10.1145/2228360.2228568
Nested Algorithmic Skeletons from Higher Order Functions, Parallel Algorithms and Applications, vol.16, issue.3, pp.181-206, 2001. ,
DOI : 10.1007/BFb0012826
GMAC: Global Memory for Accelerator, TM: Task Manager, 2011. ,
Productivity and Performance of Global-View Programming with XcalableMP PGAS Language, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), pp.402-409 ,
DOI : 10.1109/CCGrid.2012.118
Astrophysical Particle Simulations on Heterogeneous CPU-GPU Systems, p.1199, 1206. ,
Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language, International Symposium on Code Generation and Optimization (CGO 2011), pp.224-235, 2011. ,
DOI : 10.1109/CGO.2011.5764690
Co-array Fortran for parallel programming, In: SIGPLAN Fortran Forum, vol.17, issue.2, pp.1-31, 1998. ,
Parallel Implementations of Functional Programming Languages, The Computer Journal, vol.32, issue.2, pp.175-186, 1989. ,
DOI : 10.1093/comjnl/32.2.175
Implementing functional languages: a tutorial, 1992. ,
Hierarchical Task-Based Programming With StarSs, International Journal of High Performance Computing Applications, vol.23, issue.3, pp.284-299, 2009. ,
DOI : 10.1177/1094342009106195
Functional programming and parallel graph rewriting, 1993. ,
The design, implementation, and evaluation of Jade, ACM Transactions on Programming Languages and Systems, vol.20, issue.3, pp.483-545, 1998. ,
Heterogeneous parallel programming in Jade, Proceedings Supercomputing '92, pp.245-256 ,
DOI : 10.1109/SUPERC.1992.236678
Implicit Array Copying: Prevention is Better than Cure, 1992. ,
Concepts, Techniques, and Models of Computer Programming, 2004. ,
Report on the Programming Language X10. Version 2.3, 2013. ,
Parallel Functional Programming - an Annotated Bibliography, 1993. ,
Programming with sets: an introduction to SETL, 1986. ,
Programming language pragmatics, 2009. ,
Maestro: Data Orchestration and Tuning for OpenCL Devices, Euro-Par 2010 - Parallel Processing, pp.275-286, 2010. ,
DOI : 10.1007/978-3-642-15291-7_26
The Fortress Language Specification. Version 1.0, 2007. ,
Programming Future Parallel Architectures with Haskell and Intel ArBB, 2011. ,
Parallel programming in Haskell almost for free, Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing, FHPC '12, pp.3-14, 2012. ,
DOI : 10.1145/2364474.2364477
Obsidian: A Domain Specific Embedded Language for Parallel Programming of Graphics Processors, Implementation and Application of Functional Languages, pp.156-173, 2011. ,
DOI : 10.1007/978-3-540-25935-0_3
Accelerator: using data parallelism to program GPUs for general-purpose uses, ASPLOS-XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, pp.325-335, 2006. ,
Performance-effective and low-complexity task scheduling for heterogeneous computing, Parallel and Distributed Systems, pp.260-274, 2002. ,
DOI : 10.1109/71.993206
Algorithm + strategy = parallelism, Journal of Functional Programming, vol.8, issue.1, pp.23-60, 1998. ,
DOI : 10.1017/S0956796897002967
Benefits of I/O Acceleration Technology (I/OAT) in Clusters, 2007 IEEE International Symposium on Performance Analysis of Systems & Software, pp.220-229, 2007. ,
DOI : 10.1109/ISPASS.2007.363752
Hadoop: The definitive guide. O'Reilly Media, 2012. ,
Automated Parallelisation of code written in the Bird-Meertens Formalism, 2003. ,
Implementing the PGI Accelerator model, Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, GPGPU '10, 2010. ,
DOI : 10.1145/1735688.1735697
Architecture comparisons between Nvidia and ATI GPUs: Computation parallelism and data communications, 2011 IEEE International Symposium on Workload Characterization (IISWC), pp.205-215, 2011. ,
DOI : 10.1109/IISWC.2011.6114180
Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers, 2010 39th International Conference on Parallel Processing Workshops, 2010. ,
DOI : 10.1109/ICPPW.2010.65