Concerning these, they all finish at the bottom of the ranking, even if only the computation cpp is taken into account. Paradoxically, the most efficient is the one that

It is interesting to consider the energy efficiency of these machines for smaller images: 300×300 (Table 5.8). In this case, all the machines are oversized: the dual Yorkfield reaches a processing rate of 26,315 images/s and the "small" Penryn U9300 a rate of 2,777 images/s. The performance ordering is now respected: the U9300

Until now, the performance of the machines was evaluated for fixed image sizes. It can be interesting to turn the problem around and ask over what range of image sizes these processors perform well

M. Aldinucci, M. Danelutto, and P. Dazzi, Muskel: an expandable skeleton environment, Scalable Computing: Practice and Experience, pp.325-341, 2007.

M. Aldinucci, M. Danelutto, and P. Teti, An advanced environment supporting structured parallel programming in Java, Future Generation Computer Systems, vol.19, issue.5, pp.611-626, 2003.
DOI : 10.1016/S0167-739X(02)00172-3

G. M. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the Spring Joint Computer Conference, AFIPS '67 (Spring), pp.483-485, 1967.

B. Bacci, M. Danelutto, S. Pelagatti, and M. Vanneschi, SkIE: A heterogeneous environment for HPC applications, Parallel Computing, vol.25, issue.13-14, pp.1827-1852, 1999.
DOI : 10.1016/S0167-8191(99)00072-1

B. Bacci, M. Danelutto, S. Pelagatti, M. Vanneschi, and S. Orlando, Summarising an experiment in parallel programming language design, Proceedings of the International Conference and Exhibition on High-Performance Computing and Networking, HPCN Europe '95, pp.7-13, 1995.
DOI : 10.1007/BFb0046602

R. M. Badia, D. Du, E. Huedo, A. Kokossis, I. M. Llorente et al., Integration of GRID Superscalar and GridWay Metascheduler with the DRMAA OGF Standard, Euro-Par '08: Proceedings of the 14th International Euro-Par Conference on Parallel Processing, pp.445-455, 2008.
DOI : 10.1007/978-3-540-85451-7_49

H. B. Bakoglu, G. F. Grohoski, and R. K. Montoye, The IBM RISC System/6000 processor: Hardware overview, IBM Journal of Research and Development, vol.34, issue.1, pp.12-22, 1990.
DOI : 10.1147/rd.341.0012

P. Bellens, J. M. Perez, R. M. Badia, and J. Labarta, CellSs: a Programming Model for the Cell BE Architecture, ACM/IEEE SC 2006 Conference (SC'06), 2006.
DOI : 10.1109/SC.2006.17

A. Bergmann, Linux on Cell Broadband Engine status update, Proceedings of the Linux Symposium, pp.21-27, 2007.

R. J. Blainey, Instruction scheduling in the TOBEY compiler, IBM Journal of Research and Development, vol.38, issue.5, pp.577-593, 1994.
DOI : 10.1147/rd.385.0577

F. Bodin and S. Bihan, Heterogeneous Multicore Parallel Programming for Graphics Processing Units, Scientific Programming, pp.325-336, 2009.
DOI : 10.1155/2009/784893

G. H. Botorog and H. Kuchen, Skil: An imperative language with algorithmic skeletons for efficient distributed programming, Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing, HPDC '96, p.243, 1996.

I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian et al., Brook for GPUs: stream computing on graphics hardware

R. Calkin, R. Hempel, H.-C. Hoppe, and P. Wypior, Portable programming with the PARMACS message-passing library, Parallel Computing, vol.20, issue.4, pp.615-632, 1994.
DOI : 10.1016/0167-8191(94)90031-0

D. Caromel and M. Leyton, Fine Tuning Algorithmic Skeletons, Euro-Par, pp.72-81, 2007.
DOI : 10.1007/978-3-540-74466-5_9

P. Ciechanowicz, M. Poldner, and H. Kuchen, The Münster Skeleton Library Muesli - A Comprehensive Overview, 2009.

F. Clément, V. Martin, A. Vodicka, R. Di Cosmo, and P. Weis, Domain decomposition and skeleton programming with OCamlP3l, Parallel Computing, vol.32, issue.7-8, pp.539-550, 2006.
DOI : 10.1016/j.parco.2006.04.003

M. Cole, Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming, Parallel Computing, vol.30, issue.3, pp.389-406, 2004.
DOI : 10.1016/j.parco.2003.12.002

M. I. Cole, Algorithmic Skeletons: Structured Management of Parallel Computation, 1989.

P. Courbin, A. Pédron, T. Saidani, and L. Lacassagne, Parallélisation d'opérateurs de TI : multi-coeurs, Cell ou GPU ?, Actes de la Conférence du GRETSI, 2009.

X. Martorell, D. Jimenez-Gonzalez, and A. Ramirez, Performance Analysis of Cell Broadband Engine for High Memory Bandwidth Applications, Proceedings of the IEEE International Symposium on Performance Analysis of Systems & Software, pp.210-219, 2007.

L. Dagum and R. Menon, OpenMP: an industry standard API for shared-memory programming, IEEE Computational Science and Engineering, vol.5, issue.1, pp.46-55, 1998.
DOI : 10.1109/99.660313

M. Danelutto and M. Stigliani, SKElib: Parallel Programming with Skeletons in C, Proceedings from the 6th International Euro-Par Conference on Parallel Processing, Euro-Par '00, pp.1175-1184, 2000.
DOI : 10.1007/3-540-44520-X_166

J. Darlington, A. J. Field, P. G. Harrison, P. H. Kelly, D. W. Sharp et al., Parallel programming using skeleton functions, Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe, PARLE '93, pp.146-160, 1993.
DOI : 10.1007/3-540-56891-3_12

J. R. Davy and P. M. Dew, Abstract machine models for highly parallel computers, 1995.

K. Diefendorff, P. K. Dubey, R. Hochsprung, and H. Scales, AltiVec extension to PowerPC accelerates media processing, IEEE Micro, vol.20, issue.2, pp.85-95, 2000.
DOI : 10.1109/40.848475

R. Dolbeau, HMPP: A hybrid multi-core parallel programming environment, First Workshop on General Purpose Processing on Graphics Processing Units, pp.1-5, 2007.

J. J. Dongarra, J. Du Croz, S. Hammarling, and I. S. Duff, A set of level 3 basic linear algebra subprograms, ACM Transactions on Mathematical Software, vol.16, issue.1, pp.1-17, 1990.
DOI : 10.1145/77626.79170

A. J. Dorta, J. A. González, C. Rodríguez, and F. Sande, llc: a Parallel Skeletal Language, Parallel Processing Letters, pp.437-448, 2003.

A. E. Eichenberger, J. K. O'Brien, K. M. O'Brien, P. Wu, T. Chen et al., Using advanced compiler technology to exploit the performance of the Cell Broadband Engine architecture, IBM Systems Journal, vol.45, issue.1, pp.59-84, 2006.

A. E. Eichenberger, K. O'Brien, K. O'Brien, P. Wu, T. Chen et al., Optimizing Compiler for the CELL Processor, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05), pp.161-172, 2005.
DOI : 10.1109/PACT.2005.33

J. Falcou, J. Sérot, T. Chateau, and J. T. Lapresté, Quaff: efficient C++ design for parallel skeletons, Parallel Computing, vol.32, issue.7-8, pp.604-615, 2006.
DOI : 10.1016/j.parco.2006.06.001

URL : https://hal.archives-ouvertes.fr/hal-00167412

J. Falcou, Un cluster pour la Vision Temps Réel : Architecture, Outils et Applications, 2006.

J. Falcou, High Level Parallel Programming EDSL - A BOOST Libraries Use Case, BOOST'CON 09, 2009.

J. Falcou and J. Sérot, Formal semantics applied to the implementation of a skeleton-based parallel programming library, PARCO, pp.243-252, 2007.

J. Falcou and J. Sérot, EVE, an Object Oriented SIMD Library, International Conference on Computational Science, pp.314-321, 2004.
DOI : 10.1007/978-3-540-24688-6_43

URL : https://hal.archives-ouvertes.fr/hal-00103176

K. Fatahalian, T. J. Knight, M. Houston, M. Erez, D. Reiter Horn et al., Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), 2006.
DOI : 10.1109/SC.2006.55

J. F. Ferreira, J. L. Sobral, and A. J. Proenca, JaSkel: a Java skeleton-based framework for structured cluster and grid computing, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), pp.301-304, 2006.
DOI : 10.1109/CCGRID.2006.65

M. J. Flynn, Some computer organizations and their effectiveness, IEEE Transactions on Computers, vol.21, issue.9, pp.948-960, 1972.

M. J. Flynn and K. W. Rudd, Parallel architectures, ACM Computing Surveys, vol.28, issue.1, pp.67-70, 1996.

H. González-Vélez and M. Leyton, A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers, Software: Practice and Experience, vol.40, issue.12, pp.1135-1160, 2010.
DOI : 10.1002/spe.1026

C. Grelck, Shared memory multiprocessor support for functional array processing in SAC, Journal of Functional Programming, vol.15, issue.3, pp.353-401, 2005.
DOI : 10.1017/S0956796805005538

M. Gschwind, D. Erb, S. Manning, and M. Nutter, An Open Source Environment for Cell Broadband Engine System Software, Computer, vol.40, issue.6, pp.37-47, 2007.
DOI : 10.1109/MC.2007.192

J. L. Gustafson, Reevaluating Amdahl's law, Communications of the ACM, vol.31, issue.5, pp.532-533, 1988.
DOI : 10.1145/42411.42415

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.509.6892

C. Harris and M. Stephens, A Combined Corner and Edge Detector, Proceedings of the Alvey Vision Conference 1988, pp.147-151, 1988.
DOI : 10.5244/C.2.23

C. A. Herrmann and C. Lengauer, HDC: A higher-order language for divide-and-conquer, Parallel Processing Letters, pp.239-250, 2000.

H. P. Hofstee, Power Efficient Processor Architecture and The Cell Processor, 11th International Symposium on High-Performance Computer Architecture, pp.258-262, 2005.
DOI : 10.1109/HPCA.2005.26

Z. Horvath, V. Zsok, P. Serrarens, and R. Plasmeijer, Parallel elementwise processable functions in concurrent clean, Mathematical and Computer Modelling, vol.38, issue.7-9, pp.865-875, 2003.
DOI : 10.1016/S0895-7177(03)90071-9

IEEE Standard 1003.1c-1995, thread extensions, 1995.

L. Lacassagne, J. Falcou, T. Saidani, and D. Etiemble, Programmation par squelettes algorithmiques pour le processeur Cell, SYMPA '08 : SYMPosium en Architectures nouvelles de machines, 2008.

C. R. Johns and D. A. Brokenshire, Introduction to the Cell Broadband Engine Architecture, IBM Journal of Research and Development, vol.51, issue.5, pp.503-519, 2007.
DOI : 10.1147/rd.515.0503

J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer et al., Introduction to the Cell multiprocessor, IBM Journal of Research and Development, vol.49, issue.4-5, pp.589-604, 2005.
DOI : 10.1147/rd.494.0589

U. Kapasi, W. J. Dally, S. Rixner, J. D. Owens, and B. Khailany, The Imagine Stream Processor, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors, pp.282-288, 2002.
DOI : 10.1109/ICCD.2002.1106783

U. J. Kapasi, S. Rixner, W. J. Dally, B. Khailany, J. H. Ahn et al., Programmable stream processors, Computer, vol.36, issue.8, pp.54-62, 2003.

A. H. Karp and H. P. Flatt, Measuring parallel processor performance, Communications of the ACM, vol.33, issue.5, pp.539-543, 1990.
DOI : 10.1145/78607.78614

K. Kennedy and J. R. Allen, Optimizing compilers for modern architectures: a dependence-based approach, 2002.

K. Kennedy, C. Koelbel, and H. Zima, The rise and fall of High Performance Fortran, Proceedings of the third ACM SIGPLAN conference on History of programming languages, HOPL III, pp.7-8, 2007.
DOI : 10.1145/1238844.1238851

Khronos OpenCL Working Group, The OpenCL Specification, version 1.0, 2008.

M. Kistler, M. Perrone, and F. Petrini, Cell Multiprocessor Communication Network: Built for Speed, IEEE Micro, vol.26, issue.3, pp.10-23, 2006.
DOI : 10.1109/MM.2006.49

H. Kuchen, A Skeleton Library, Euro-Par, pp.620-629, 2002.
DOI : 10.1007/3-540-45706-2_86

A. Kumar, G. Senthilkumar, M. Krishna, N. Jayam, P. Baruah et al., A Buffered-Mode MPI Implementation for the Cell BE™ Processor, Computational Science - ICCS 2007, pp.603-610, 2007.
DOI : 10.1007/978-3-540-72584-8_80

H. T. Kung and C. E. Leiserson, Systolic Arrays (for VLSI), CMU-CS, Carnegie-Mellon University, Dept. of Computer Science, 1978.

S. Lee, S. Min, and R. Eigenmann, OpenMP to GPGPU: a compiler framework for automatic translation and optimization, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '09, pp.101-110, 2009.

M. Leyton and J. M. Piquer, Skandium: Multi-core Programming with Algorithmic Skeletons, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.289-296, 2010.
DOI : 10.1109/PDP.2010.26

R. Loogen, Y. Ortega-Mallén, and R. Peña-Marí, Parallel functional programming in Eden, Journal of Functional Programming, vol.15, issue.3, pp.431-475, 2005.
DOI : 10.1017/S0956796805005526

K. Matsuzaki, H. Iwasaki, K. Emoto, and Z. Hu, A library of constructive skeletons for sequential style of parallel programming, Proceedings of the 1st international conference on Scalable information systems, InfoScale '06, 2006.
DOI : 10.1145/1146847.1146860

M. D. McCool, Data-Parallel Programming on the Cell BE and the GPU using the RapidMind Development Platform, Proceedings of GSPx Multicore Applications Conference, 2006.

M. Mernik, J. Heering, and A. M. Sloane, When and how to develop domain-specific languages, ACM Computing Surveys, vol.37, issue.4, pp.316-344, 2005.
DOI : 10.1145/1118890.1118892

G. Michaelson, N. Scaife, P. Bristow, and P. King, Nested algorithmic skeletons from higher order functions, Parallel Algorithms and Applications, pp.181-206, 2001.

G. E. Moore, Cramming More Components Onto Integrated Circuits, Electronics, vol.38, issue.8, pp.114-117, 1965. Reprinted in Proceedings of the IEEE, vol.86, issue.1, 1998.
DOI : 10.1109/JPROC.1998.658762

H. Moravec, Obstacle avoidance and navigation in the real world by a seeing robot rover, Robotics Institute, 1980.

C. J. Newburn, B. So, Z. Liu, M. McCool, A. Ghuloum et al., Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language, International Symposium on Code Generation and Optimization (CGO 2011), pp.224-235, 2011.
DOI : 10.1109/CGO.2011.5764690

J. Nickolls, I. Buck, M. Garland, and K. Skadron, Scalable parallel programming with CUDA, Queue, pp.40-53, 2008.

K. O'Brien, K. O'Brien, Z. Sura, T. Chen et al., Supporting OpenMP on Cell, International Journal of Parallel Programming, vol.36, issue.3, pp.289-311, 2008.

M. Ohara, H. Inoue, Y. Sohda, H. Komatsu, and T. Nakatani, MPI microtask for programming the Cell Broadband Engine™ processor, IBM Systems Journal, vol.45, issue.1, pp.85-102, 2006.
DOI : 10.1147/sj.451.0085

J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger et al., A Survey of General-Purpose Computation on Graphics Hardware, Computer Graphics Forum, vol.26, issue.1, pp.80-113, 2007.
DOI : 10.1016/j.rti.2005.04.002

A. Pédron, F. Laguzet, T. Saidani, P. Courbin, L. Lacassagne et al., Parallélisation d'opérateurs de TI : multi-coeurs, Cell ou GPU ?, Traitement du Signal, pp.161-187, 2010.

S. Pemmaraju and S. Skiena, Computational Discrete Mathematics: Combinatorics and Graph Theory with Mathematica, pp.336-337, 2003.
DOI : 10.1017/CBO9781139164849

T. Saidani, J. Falcou, C. Tadonki, L. Lacassagne, and D. Etiemble, Algorithmic Skeletons within an Embedded Domain Specific Language for the CELL Processor, 2009 18th International Conference on Parallel Architectures and Compilation Techniques, 2009.
DOI : 10.1109/PACT.2009.21

URL : https://hal.archives-ouvertes.fr/hal-00905054

T. Saidani, L. Lacassagne, S. Bouaziz, and T. M. Khan, Parallelization Strategies for the Points of Interests Algorithm on the Cell Processor, ISPA '07: Proceedings of the 5th International Symposium on Parallel and Distributed Processing and Applications, 2007.
DOI : 10.1007/978-3-540-74742-0_12

T. Saidani, L. Lacassagne, J. Falcou, C. Tadonki, and S. Bouaziz, Parallelization Schemes for Memory Optimization on the Cell Processor: A Case Study on the Harris Corner Detector, Transactions on High-Performance Embedded Architectures and Compilers III, pp.177-200, 2011.
DOI : 10.1007/s10766-007-0034-5

URL : https://hal.archives-ouvertes.fr/hal-00753708

T. Saidani, S. Piskorski, L. Lacassagne, and S. Bouaziz, Parallelization schemes for memory optimization on the cell processor, Proceedings of the 2007 workshop on MEmory performance DEaling with Applications, systems and architecture, MEDEA '07, 2007.
DOI : 10.1145/1327171.1327172

URL : https://hal.archives-ouvertes.fr/hal-00753708

S. Schaetz, J. Falcou, and L. Lacassagne, Cell-MPI mastering the cell broadband engine architecture through a boost based parallel communication library, the 5th Annual Boost Libraries Conference, 2011.

J. Sérot and D. Ginhac, Skeletons for parallel image processing: an overview of the SKIPPER project, Parallel Computing, vol.28, issue.12, pp.1685-1708, 2002.
DOI : 10.1016/S0167-8191(02)00189-8

V. S. Sunderam, PVM: A framework for parallel distributed computing, Concurrency: Practice and Experience, vol.2, issue.4, pp.315-339, 1990.
DOI : 10.1002/cpe.4330020404

C. Tadonki, L. Lacassagne, T. Saidani, J. Falcou, and K. Hamidouche, The Harris algorithm revisited on the Cell processor, Proceedings of the 1st International Workshop on Highly Efficient Accelerators and Reconfigurable Technologies HEART 2010, 2010.

W. Thies, M. Karczmarek, and S. Amarasinghe, StreamIt: A Language for Streaming Applications, International Conference on Compiler Construction, 2002.
DOI : 10.1007/3-540-45937-5_14

D. Vandevoorde and N. M. Josuttis, C++ Templates: The Complete Guide

M. Vanneschi, The programming model of ASSIST, an environment for parallel and distributed portable applications, Parallel Computing, vol.28, issue.12, pp.1709-1732, 2002.
DOI : 10.1016/S0167-8191(02)00188-6

T. Veldhuizen, Expression templates, 1995.

T. L. Veldhuizen, C++ templates as partial evaluation, PEPM, pp.13-18, 1999.

J. von Neumann, First draft of a report on the EDVAC, IEEE Annals of the History of Computing, vol.15, issue.4, pp.27-75, 1993.
DOI : 10.1109/85.238389

M. Wolfe, More iteration space tiling, Proceedings of the 1989 ACM/IEEE conference on Supercomputing, Supercomputing '89, 1989.
DOI : 10.1145/76263.76337