Platforms used for testing (B.1, Single Multicore Processor): Intel Core i7 Q720 (4 cores, 8 threads, 4 x 32 KB L1 cache, 4 x 256 KB L2 cache, 1 x 6 MB L3 cache, 1.6 GHz). This platform is equipped with a GPU

GDDR5. The platform has 32 GB of RAM (ECC Registered at 1600 MHz)

Most of the nodes ... "Intel Xeon X5560 Quad Core at 2.8 GHz". The remaining 38 nodes are bi-processor boards with 2 x "Intel Xeon X5677 Quad Core at 3.46 GHz". Therefore the CAPARMOR Supercomputer is ...

Bibliography

E. Alba, F. Almeida, M. J. Blesa, J. Cabeza, C. Cotta et al., MALLBA: A Library of Skeletons for Combinatorial Optimisation, Proceedings of the 8th International Euro-Par Conference on Parallel Processing, Euro-Par '02, p.927

E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra et al., LAPACK Users' Guide, 1992.

K. Asanovic, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer et al., A view of the parallel computing landscape, Communications of the ACM, vol.52, issue.10, pp.56-67, 2009.
DOI : 10.1145/1562764.1562783

E. Ayguadé, N. Copty, A. Duran, J. Hoeflinger, Y. Lin et al., The Design of OpenMP Tasks, IEEE Transactions on Parallel and Distributed Systems, vol.20, issue.3, pp.404-418, 2009.

M. Aldinucci and M. Danelutto, The cost of security in skeletal systems, 15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing (PDP'07), pp.213-220, 2007.
DOI : 10.1109/PDP.2007.79

M. Aldinucci, M. Danelutto, and P. Dazzi, MUSKEL: An Expandable Skeleton Environment, 2007.

M. Aldinucci, M. Danelutto, and P. Kilpatrick, Autonomic management of non-functional concerns in distributed & parallel application programming, 2009 IEEE International Symposium on Parallel & Distributed Processing, pp.1-12, 2009.
DOI : 10.1109/IPDPS.2009.5161034

M. Aldinucci, M. Danelutto, P. Kilpatrick, and M. Torquati, FastFlow: high-level and efficient streaming on multi-core, Programming Multi-core and Many-core Computing Systems, ser. Parallel and Distributed Computing, S. Pllana, p.13, 2012.

M. Aldinucci, M. Danelutto, and P. Teti, An advanced environment supporting structured parallel programming in Java, Future Generation Computer Systems, vol.19, issue.5, pp.611-626, 2003.
DOI : 10.1016/S0167-739X(02)00172-3

M. Alt and S. Gorlatch, Using Skeletons in a Java-Based Grid System
DOI : 10.1007/978-3-540-45209-6_103

M. Alt and S. Gorlatch, Adapting Java RMI for grid computing, Future Generation Computer Systems, vol.21, issue.5, pp.699-707, 2005.
DOI : 10.1016/j.future.2004.05.010

Khronos Group and A. Munshi, The OpenCL Specification, version 1, 2011.

AMD, AMD64 Architecture Programmer's Manual: 128-Bit and 256-Bit XOP, FMA4 and CVT16 Instructions, 2009.

C. Augonnet, S. Thibault, R. Namyst, and P.-A. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurr. Comput.: Pract. Exper., vol.23, issue.2, pp.187-198, 2011.

U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan, Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model, Compiler Construction, pp.132-146

B. Bacci, B. Cantalupo, M. Danelutto, S. Orlando, D. Pasetto et al., An Environment for Structured Parallel Programming, Advances in High Performance Computing, pp.219-234, 1997.

A. Benoit, M. Cole, S. Gilmore, and J. Hillston, Flexible Skeletal Programming with eSkel, Proceedings of the 11th International Euro-Par Conference on Parallel Processing, Euro-Par'05, pp.761-770, 2005.

U. Bondhugula, A. Devulapalli, J. Fernando, P. Wyckoff et al., Parallel FPGA-based all-pairs shortest-paths in a directed graph, Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS'06), 2006.

G. Blake, R. G. Dreslinski, and T. Mudge, A survey of multicore processors, IEEE Signal Processing Magazine, vol.26, issue.6, pp.26-37, 2009.
DOI : 10.1109/MSP.2009.934110

B. Bacci, M. Danelutto, S. Orlando, S. Pelagatti, and M. Vanneschi, P3L: A structured high level programming language and its structured support, Concurrency: Practice and Experience, pp.225-255, 1995.

B. Bacci, M. Danelutto, S. Pelagatti, and M. Vanneschi, SkIE: A heterogeneous environment for HPC applications, Parallel Computing, vol.25, issue.13-14, pp.1827-1852, 1999.
DOI : 10.1016/S0167-8191(99)00072-1

W. Blume and R. Eigenmann, Performance analysis of parallelizing compilers on the Perfect Benchmarks programs, IEEE Transactions on Parallel and Distributed Systems, vol.3, issue.6, pp.643-656, 1992.
DOI : 10.1109/71.180621

W. Blume, R. Eigenmann, K. Faigin, J. Grout, J. Hoeflinger et al., Polaris: Improving the Effectiveness of Parallelizing Compilers, Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing, LCPC '94, pp.141-154, 1995.

S. Benkner, VFC: The Vienna Fortran Compiler, Scientific Programming, vol.7, issue.1, pp.67-81, 1999.
DOI : 10.1155/1999/304639

G. E. Blelloch, J. C. Hardwick, S. Chatterjee, J. Sipelstein et al., Implementation of a Portable Nested Data-parallel Language, SIGPLAN Not., vol.28, issue.7, pp.102-111, 1993.

R. M. Badia, J. R. Herrero, J. Labarta, J. M. Pérez, E. S. Quintana-Ortí et al., Parallelizing dense and banded linear algebra libraries using SMPSs, Concurrency and Computation: Practice and Experience, pp.2438-2456, 2009.

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A Practical Automatic Polyhedral Parallelizer and Locality Optimizer, pp.101-113, 2008.
DOI : 10.1145/1375581.1375595

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.151.5126

G. H. Botorog and H. Kuchen, Skil: an imperative language with algorithmic skeletons for efficient distributed programming, Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing, HPDC-96, p.243, 1996.
DOI : 10.1109/HPDC.1996.546194

C. Bienia, S. Kumar, J. P. Singh, and K. Li, The PARSEC benchmark suite, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, PACT '08, 2008.
DOI : 10.1145/1454115.1454128

G. E. Blelloch, Vector Models for Data-parallel Computing, 1990.

G. E. Blelloch, Programming parallel algorithms, Communications of the ACM, vol.39, issue.3, pp.85-97, 1996.
DOI : 10.1145/227234.227246

S. Breitinger, R. Loogen, Y. Ortega-Mallén, and R. Peña, The Eden coordination model for distributed memory systems, Proceedings Second International Workshop on High-Level Parallel Programming Models and Supportive Environments, p.120, 1997.
DOI : 10.1109/HIPS.1997.582964

S. Bromling, S. MacDonald, J. Anvik, J. Schaeffer, D. Szafron et al., Pattern-based parallel programming, Proceedings of the International Conference on Parallel Programming, pp.257-265, 2002.

E. D. Berger, K. S. McKinley, R. D. Blumofe, and P. R. Wilson, Hoard: A Scalable Memory Allocator for Multithreaded Applications, SIGOPS Oper. Syst. Rev., vol.34, issue.5, pp.117-128, 2000.

U. Bondhugula, J. Ramanujam, and P. Sadayappan, Automatic mapping of nested loops to FPGAs, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'07), 2007.

U. Bondhugula, J. Ramanujam, and P. Sadayappan, PLuTo: A Practical and Fully Automatic Polyhedral Parallelizer and Locality Optimizer, 2007.

J. Canny, A Computational Approach to Edge Detection, in Readings in Computer Vision: Issues, Problems, Principles, and Paradigms, pp.184-203, 1987.

B. Chamberlain, D. Callahan, and H. P. Zima, Parallel Programmability and the Chapel Language, Int. J. High Perform. Comput. Appl., vol.21, issue.3, pp.291-312, 2007.

Barcelona Supercomputing Center, SMP Superscalar (SMPSs) User's Manual, Version 1, 2011.

Q. Chen, M. Guo, and Z. Huang, Adaptive Cache Aware Bitier Work-Stealing in Multisocket Multicore Architectures, IEEE Transactions on Parallel and Distributed Systems, vol.24, issue.12, pp.2334-2343, 2013.
DOI : 10.1109/TPDS.2012.322

P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra et al., X10: An Object-oriented Approach to Non-uniform Cluster Computing, SIGPLAN Not., vol.40, issue.10, pp.519-538, 2005.

G. Cong, S. Kodali, S. Krishnamoorthy, D. Lea, V. Saraswat et al., Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing, 2008 37th International Conference on Parallel Processing, pp.536-545, 2008.
DOI : 10.1109/ICPP.2008.88

D. Caromel and M. Leyton, Fine Tuning Algorithmic Skeletons, Proceedings of the 13th International Euro-Par Conference on Parallel Processing, Euro-Par'07, pp.72-81, 2007.
DOI : 10.1007/978-3-540-74466-5_9

D. Caromel and M. Leyton, A transparent non-invasive file data model for algorithmic skeletons, IEEE International Symposium on Parallel and Distributed Processing, pp.1-10, 2008.

C. Campbell and A. Miller, Parallel Programming with Microsoft Visual C++: Design Patterns for Decomposition and Coordination on Multicore Architectures, 2011.

F. Clément, V. Martin, A. Vodicka, R. Di Cosmo, and P. Weis, Domain Decomposition and Skeleton Programming with OCamlP3l, Parallel Comput., vol.32, issue.7, pp.539-550, 2006.

M. Cole, Algorithmic Skeletons, 1991.
DOI : 10.1007/978-1-4471-0841-2_13

M. Cole, Bringing Skeletons out of the Closet: A Pragmatic Manifesto for Skeletal Parallel Programming, Parallel Comput., vol.30, issue.3, pp.389-406, 2004.

L. Courtès, C Language Extensions for Hybrid CPU/GPU Programming with StarPU, 2013.

P. Ciechanowicz, M. Poldner, and H. Kuchen, The Münster Skeleton Library Muesli: A comprehensive overview, 2009.

M. Danelutto, QoS in Parallel Programming Through Application Managers, Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing, PDP '05, pp.282-289, 2005.

C. Dave, H. Bae, S. J. Min, S. Lee, R. Eigenmann et al., Cetus: A Source-to-Source Compiler Infrastructure for Multicores, Computer, vol.42, issue.12, pp.36-42, 2009.

J. Dünnweber and S. Gorlatch, HOC-SA: a grid service architecture for higher-order components, IEEE International Conference on Services Computing (SCC 2004), Proceedings, pp.288-294, 2004.
DOI : 10.1109/SCC.2004.1358017

A. J. Dorta, J. A. González, C. Rodríguez, and F. de Sande, llc: A Parallel Skeletal Language, Proc. of the Second International Workshop on High Level Parallel Programming and Applications, pp.77-88, 2003.
DOI : 10.1142/S0129626403001409

J. Darlington, Y. Guo, H. W. To, and J. Yang, Parallel Skeletons for Structured Composition, Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP '95, pp.19-28, 1995.

E. Dijkstra, Go To Statement Considered Harmful, in Classics in Software Engineering, pp.27-33, 1979.

L. Dagum and R. Menon, OpenMP: an industry standard API for shared-memory programming, IEEE Computational Science and Engineering, vol.5, issue.1, pp.46-55, 1998.
DOI : 10.1109/99.660313

A. Darte, Y. Robert, and F. Vivien, Scheduling and Automatic Parallelization, 2000.
DOI : 10.1007/978-1-4612-1362-8

URL : https://hal.archives-ouvertes.fr/hal-00856645

M. Danelutto and M. Stigliani, SKElib: Parallel Programming with Skeletons in C, Proceedings from the 6th International Euro-Par Conference on Parallel Processing, Euro-Par '00, pp.1175-1184, 2000.
DOI : 10.1007/3-540-44520-X_166

J. Darlington and H. W. To, Building Parallel Applications Without Programming, in Abstract Machine Models for Highly Parallel Computers, pp.140-154, 1995.

P. Estérie, M. Gaunard, J. Falcou, J.-T. Lapresté, and B. Rozoy, Boost.SIMD: Generic Programming for Portable SIMDization, Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, PACT '12, pp.431-432

P. Estérie, M. Gaunard, J. Falcou, and J.-T. Lapresté, Exploiting Multimedia Extensions in C++: A Portable Approach, Computing in Science and Engineering, vol.14, issue.5, pp.72-77, 2012.

J. Enmyren and C. Kessler, SkePU, Proceedings of the fourth international workshop on High-level parallel programming and applications, HLPP '10, 2010.
DOI : 10.1145/1863482.1863487

J. Evans, A Scalable Concurrent malloc(3) Implementation for FreeBSD

M. Frigo and S. G. Johnson, The Design and Implementation of FFTW3, Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Platform Adaptation", pp.216-231, 2005.
DOI : 10.1109/JPROC.2004.840301

M. Frigo, C. E. Leiserson, and K. H. Randall, The Implementation of the Cilk-5 Multithreaded Language, SIGPLAN Not., vol.33, issue.5, pp.212-223, 1998.

J. Ferreira, J. Sobral, and A. J. Proença, JaSkel: a Java skeleton-based framework for structured cluster and grid computing, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), pp.301-304, 2006.
DOI : 10.1109/CCGRID.2006.65

C. Grelck, J. Julku, and F. Penczek, S-Net for multi-memory multicores, Proceedings of the 5th ACM SIGPLAN workshop on Declarative aspects of multicore programming, DAMP '10, pp.25-34, 2010.
DOI : 10.1145/1708046.1708054

S. Ghemawat and P. Menage, TCMalloc : Thread-Caching Malloc

J. Giacomoni, G. Price, B. Bushnell, M. Vachharajani et al., Toward a toolchain for pipeline parallel programming on CMPs, Workshop on Software Tools for Multi-Core Systems, 2007.

C. Grelck, Shared Memory Multiprocessor Support for SAC, Selected Papers from the 10th International Workshop on Implementation of Functional Languages, IFL '98, pp.38-53, 1999.
DOI : 10.1007/3-540-48515-5_3

C. Grelck, Shared Memory Multiprocessor Support for Functional Array Processing in SAC, J. Funct. Program., vol.15, issue.3, pp.353-401, 2005.

H. González-Vélez, Self-adaptive skeletal task farm for computational grids, Parallel Computing, vol.32, issue.7-8, pp.479-490, 2006.
DOI : 10.1016/j.parco.2006.07.002

H. González-Vélez and M. Cole, Adaptive structured parallelism for computational grids, Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '07, pp.140-141, 2007.
DOI : 10.1145/1229428.1229456

H. González-Vélez and M. Cole, An adaptive parallel pipeline pattern for grids, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.1-11, 2008.
DOI : 10.1109/IPDPS.2008.4536264

H. González-Vélez and M. Leyton, A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers, Software: Practice and Experience, vol.21, issue.6
DOI : 10.1002/spe.1026

J. Herrington, Code Generation in Action, 2003.

C. A. Herrmann and C. Lengauer, HDC: A Higher-Order Language for Divide-and-Conquer, 2000.

J. Hoberock, V. Lu, Y. Jia, and J. C. Hart, Stream compaction for deferred shading, Proceedings of the 1st ACM conference on High Performance Graphics, HPG '09, pp.173-180, 2009.
DOI : 10.1145/1572769.1572797

J. Hofmann, J. Treibig, G. Hager, and G. Wellein, Comparing the Performance of Different x86 SIMD Instruction Sets for a Medical Imaging Application on Modern Multi- and Manycore Chips, Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing, pp.57-64, 2014.

M. Ishihara, H. Honda, and M. Sato, Development and Implementation of an Interactive Parallelization Assistance Tool for OpenMP: iPat/OMP, IEICE Transactions on Information and Systems, vol.89, issue.2
DOI : 10.1093/ietisy/e89-d.2.399

Intel, Automatic Parallelization with Intel Compilers, https://software.intel.com/en-us/articles/automatic-parallelization-with-intel-compilers

Intel, Intel Core i7-720QM, http://ark.intel.com/products

N. Javed and F. Loulergue, OSL: Optimized Bulk Synchronous Parallel Skeletons on Distributed Arrays, Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies, APPT '09, pp.436-451, 2009.
DOI : 10.1145/79173.79181

URL : https://hal.archives-ouvertes.fr/inria-00452523

V. Joshi, A study of possible optimizations for the task scheduler "QUARK" on the shared memory architecture, 2013.

L. J. Karam, I. AlKamal, A. Gatherer, G. A. Frantz, D. V. Anderson et al., Trends in multicore DSP platforms, IEEE Signal Processing Magazine, vol.26, issue.6, pp.38-49, 2009.

H. Kim and R. Bond, Multicore software technologies, IEEE Signal Processing Magazine, vol.26, issue.6, pp.80-89, 2009.

N. Khammassi, J. Le Lann, J. P. Diguet, and A. Skrzyniarz, MHPM: Multi-Scale Hybrid Programming Model: A Flexible Parallelization Methodology, 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems, pp.71-80
DOI : 10.1109/HPCC.2012.20

U. Klusik, R. Loogen, S. Priebe, and F. Rubio, Implementation Skeletons in Eden: Low-Effort Parallel Programming, Selected Papers from the 12th International Workshop on Implementation of Functional Languages, IFL '00, pp.71-88, 2001.
DOI : 10.1007/3-540-45361-X_5

J. Kurzak, P. Luszczek, A. YarKhan, M. Faverge, J. Langou et al., Multithreading in the PLASMA Library, Handbook of Multi and Many-Core Processing: Architecture, Algorithms, Programming, and Applications, Chapman and Hall/CRC, 2014.

P. Kobalicek, Complete x86/x64 JIT and Remote Assembler for C++, https

J. Koskinen, Metaprogramming in C++, www.cs.tut

L. Lamport, The parallel execution of DO loops, Communications of the ACM, vol.17, issue.2, pp.83-93, 1974.
DOI : 10.1145/360827.360844

E. A. Lee, The Problem with Threads, Computer, vol.39, issue.5, pp.33-42, 2006.
DOI : 10.1109/MC.2006.180

C. E. Leiserson, The Cilk++ Concurrency Platform, Proceedings of the 46th Annual Design Automation Conference, DAC '09, pp.522-527, 2009.

J. Lebak, J. Kepner, H. Hoffmann, and E. Rutledge, Parallel VSIPL++: An Open Standard Software Library for High-Performance Parallel Signal Processing, Proceedings of the IEEE, pp.313-330, 2005.
DOI : 10.1109/JPROC.2004.840303

G. Lashari, O. Lhoták, and M. McCool, Control Flow Emulation on Tiled SIMD Architectures, Proceedings of the Joint European Conferences on Theory and Practice of Software 17th International Conference on Compiler Construction, CC'08/ETAPS'08, pp.100-115, 2008.
DOI : 10.1007/978-3-540-78791-4_7

R. Loogen, Y. Ortega-Mallén, and R. Peña-Marí, Parallel Functional Programming in Eden, J. Funct. Program., vol.15, issue.3, pp.431-475, 2005.

M. Leyton and J. M. Piquer, Skandium: Multi-core Programming with Algorithmic Skeletons, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.289-296, 2010.
DOI : 10.1109/PDP.2010.26

D. Leijen, W. Schulte, and S. Burckhardt, The Design of a Task Parallel Library, Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA '09, pp.227-242, 2009.

R. G. Lyons, Understanding Digital Signal Processing. Prentice-Hall accounting series, 2010.

D. Mackenzie, An Engine, Not a Camera: How Financial Models Shape Markets. Inside Technology, 2008.
DOI : 10.7551/mitpress/9780262134606.001.0001

M. McCool, Structured Parallel Programming with Deterministic Patterns, Proceedings of the 2nd USENIX Conference on Hot Topics in Parallelism, HotPar'10, p.5, 2010.

M. Müller, D. Charypar, and M. Gross, Particle-based Fluid Simulation for Interactive Applications, Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '03, pp.154-159, 2003.

Microsoft, Processor Information, http://msdn.microsoft.com/en-us/library/windows/desktop/ms683194

Microsoft, Task Parallel Library

K. Matsuzaki, H. Iwasaki, K. Emoto, and Z. Hu, A library of constructive skeletons for sequential style of parallel programming, Proceedings of the 1st international conference on Scalable information systems, InfoScale '06, 2006.
DOI : 10.1145/1146847.1146860

M. McCool, J. Reinders, and A. Robison, Structured Parallel Programming: Patterns for Efficient Computation, 2012.

T. Mattson, B. Sanders, and B. Massingill, Patterns for Parallel Programming, 2004.

S. MacDonald, D. Szafron, and J. Schaeffer, Rethinking the pipeline as object-oriented states with transformations, Ninth International Workshop on High-Level Parallel Programming Models and Supportive Environments, Proceedings, pp.12-21, 2004.
DOI : 10.1109/HIPS.2004.1299186

M. McCool, K. Wadleigh, B. Henderson, and H. Y. Lin, Performance Evaluation of GPUs Using the RapidMind Development Platform, Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC '06, 2006.

S. Ostadzadeh, R. Meeuws, C. Galuzzi, and K. Bertels, QUAD - A Memory Access Pattern Analyser, Proceedings of the 6th International Conference on Reconfigurable Computing: Architectures, Tools and Applications, ARC'10, pp.269-281, 2010.
DOI : 10.1007/978-3-642-12133-3_25

D. Parnham, An Infrastructure for Video-Augmented Environments

D. Pnueli and C. Gutfinger, Fluid Mechanics, 1992.
DOI : 10.1017/CBO9781139172561

D. Parnham, J. Robinson, and Y. Zhao, A compact fiducial for affine augmented reality, Proceedings of the 2005 IEEE International Conference on Visual Information Engineering, VIE'05, pp.347-352, 2005.

A. Peleg and U. Weiser, MMX technology extension to the Intel architecture, IEEE Micro, vol.16, issue.4, pp.42-50, 1996.
DOI : 10.1109/40.526924

R. Reyes, A. J. Dorta, F. Almeida, and F. de Sande, Automatic Hybrid MPI+OpenMP Code Generation with llc, Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.185-195, 2009.
DOI : 10.1007/978-3-642-03770-2_25

A. D. Robison, Composable Parallel Patterns with Intel Cilk Plus, Computing in Science & Engineering, vol.15, issue.2, pp.66-71, 2013.
DOI : 10.1109/MCSE.2013.21

S. K. Raman, V. Pentkovski, and J. Keshava, Implementing streaming SIMD extensions on the Pentium III processor, IEEE Micro, vol.20, issue.4, pp.47-57, 2000.
DOI : 10.1109/40.865866

B. Schäling, The Boost C++ Libraries, 2011.

J. Sérot and D. Ginhac, Skeletons for parallel image processing: an overview of the SKIPPER project, Parallel Computing, vol.28, issue.12, pp.1685-1708, 2002.
DOI : 10.1016/S0167-8191(02)00189-8

H. Singh, Introspective C++, 2004.

M. S. Squillante and E. D. Lazowska, Using Processor-Cache Affinity Information in Shared-Memory Multiprocessor Scheduling, IEEE Trans. Parallel Distrib. Syst., vol.4, issue.2, pp.131-143, 1993.

A. Stepanov and M. Lee, The Standard Template Library, WG21/N0482, ISO Programming Language C++ Project, 1994.

J. P. Shen and M. H. Lipasti, Modern Processor Design: Fundamentals of Superscalar Processors, 2002.

D. B. Skillicorn and D. Talia, Models and languages for parallel computation, ACM Computing Surveys, vol.30, issue.2, pp.123-169, 1998.
DOI : 10.1145/280277.280278

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.1801

J. Stam, Real-Time Fluid Dynamics for Games, 2003.

J. E. Savage and M. Zubair, A Unified Model for Multicore Architectures, Proceedings of the 1st International Forum on Next-generation Multicore/Manycore Technologies, IFMT '08, p.12, 2008.

D. Tam, R. Azimi, and M. Stumm, Thread Clustering: Sharing-aware Scheduling on SMP-CMP-SMT Multiprocessors, SIGOPS Oper. Syst. Rev., vol.41, issue.3, pp.47-58, 2007.

K. Tan, D. Szafron, J. Schaeffer, J. Anvik, and S. MacDonald, Using Generative Design Patterns to Generate Parallel Code for a Distributed Memory Environment, SIGPLAN Not., vol.38, issue.10, pp.203-215, 2003.

C. Teijeiro, G. L. Taboada, J. Touriño, B. B. Fraguela, R. Doallo et al., Evaluation of UPC programmability using classroom studies, Proceedings of the Third Conference on Partitioned Global Address Space Programing Models, PGAS '09, pp.110-117, 2009.
DOI : 10.1145/1809961.1809975

M. Vanneschi, The Programming Model of ASSIST, an Environment for Parallel and Distributed Portable Applications, Parallel Comput., vol.28, issue.12, pp.1709-1732, 2002.

N. Ventroux, T. Sassolas, A. Guerre, B. Creusillet, R. Keryell et al., SESAM/Par4All: A Tool for Joint Exploration of MPSoC Architectures and Dynamic Dataflow Code Generation, Proceedings of the 2012 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, RAPIDO '12, pp.9-16, 2012.

P. R. Wilson, M. S. Johnstone, M. Neely, and D. Boles, Dynamic storage allocation: A survey and critical review, Proceedings of the International Workshop on Memory Management, IWMM '95, pp.1-116, 1995.
DOI : 10.1007/3-540-60368-9_19

W. Wolf, Multiprocessor system-on-chip technology, IEEE Signal Processing Magazine, vol.26, issue.6, 2009.
DOI : 10.1109/MSP.2009.934138

A. Yarkhan, Dynamic Task Execution on Shared and Distributed Memory Architectures, 2012.

T. Yang, C. Lin, and C. L. Yang, Cache-aware task scheduling on multi-core architecture, International Symposium on VLSI Design Automation and Test (VLSI-DAT), pp.139-142, 2010.

... execution time on a 16-thread SMP platform with two Intel Xeon E5620 processors at 2.4 GHz

Black-Scholes programmability comparison: line count of the sequential version and the parallel versions using XPU (vectorized), ...

Black-Scholes programmability comparison: line count of the sequential version and the vectorized parallel versions using XPU (vectorized), OpenMP+SSE, ...

... X., T., C. Plus, and O. generates unnecessary idle times when executing certain task graphs (DAGs).

8.4 The super-scalar execution model used by FATMA, SMPSs or Quark executes tasks asynchronously and uses an event-based peer-to-peer synchronization model between dependent tasks. This allows FATMA to eliminate unnecessary idle times when executing task graphs.

Comparison between FATMA, Quark, and SMPSs implementations of the tiled Cholesky factorization on an 8-thread Intel Core i7 Q720 processor

Comparison between FATMA and PLASMA (static scheduling) implementations of the tiled dgesv on an SMP platform with 2 x Intel Xeon E5620 at 2.4 GHz (16 hardware threads)