K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands et al., The landscape of parallel computing research: A view from Berkeley, 2006.

J. L. Henning, SPEC CPU2006 benchmark descriptions, ACM SIGARCH Computer Architecture News, vol.34, issue.4, pp.1-17, 2006.
DOI : 10.1145/1186736.1186737

D. Bailey, The NAS parallel benchmarks---summary and preliminary results, Proceedings of the 1991 ACM/IEEE conference on Supercomputing, Supercomputing '91, pp.158-165, 1991.
DOI : 10.1145/125826.125925

K. Hoste, Analysis, estimation and optimization of computer system performance using machine learning, 2010.

C. Bienia and K. Li, Benchmarking modern multiprocessors, 2011.

J. J. Yi, H. Vandierendonck, L. Eeckhout, and D. J. Lilja, The exigency of benchmark and compiler drift, Proceedings of the 20th annual international conference on Supercomputing, ICS '06, pp.75-86, 2006.
DOI : 10.1145/1183401.1183414

K. C. Barr, Summarizing multiprocessor program execution with versatile, microarchitecture-independent snapshots, 2006.

S. Hong and H. Kim, An integrated GPU power and performance model, ACM SIGARCH Computer Architecture News, pp.280-289, 2010.
DOI : 10.1145/1815961.1815998

G. Marin and J. Mellor-Crummey, Cross-architecture performance predictions for scientific applications using parameterized models, ACM SIGMETRICS Performance Evaluation Review, pp.2-13, 2004.
DOI : 10.1145/1012888.1005691

URL : http://www.cs.rice.edu/~johnmc/papers/MM-SIGMETRICS04.pdf

J. Cavazos, C. Dubach, F. Agakov, E. Bonilla, M. F. O'Boyle et al., Automatic performance model construction for the fast software exploration of new hardware designs, Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, CASES '06, pp.24-34, 2006.
DOI : 10.1145/1176760.1176765

E. Petit, P. De-oliveira-castro, T. Menour, B. Krammer, and W. Jalby, Computing-kernels performance prediction using dataflow analysis and microbenchmarking, International Workshop on Compilers for Parallel Computers, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00699525

C. Haine, O. Aumage, E. Petit, and D. Barthou, Exploring and evaluating array layout restructuration for SIMDization, Proceedings of the 27th international conference on Languages and Compilers for Parallel Computing, p.14
DOI : 10.1007/978-3-319-17473-0_23

URL : https://hal.archives-ouvertes.fr/hal-01070467

T. Sherwood, E. Perelman, and B. Calder, Basic block distribution analysis to find periodic behavior and simulation points in applications, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques, pp.3-14, 2001.
DOI : 10.1109/PACT.2001.953283

URL : http://www.cse.ucsd.edu/~calder/papers/PACT-01-BBDA.pdf

G. De Micheli and R. K. Gupta, Hardware/software co-design, Proceedings of the IEEE, pp.349-365, 1997.

C. Dubach, T. M. Jones, and M. F. O'Boyle, Exploring and predicting the architecture/optimising compiler co-design space, Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems, CASES '08, pp.31-40, 2008.
DOI : 10.1145/1450095.1450103

K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter et al., Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, p.4, 2008.
DOI : 10.1109/SC.2008.5222004

C. Lattner and V. Adve, LLVM: A compilation framework for lifelong program analysis & transformation, International Symposium on Code Generation and Optimization, 2004. CGO 2004., pp.75-86, 2004.
DOI : 10.1109/CGO.2004.1281665

S. R. Ladd, Acovea: Analysis of compiler options via evolutionary algorithm, 2007.

K. Hoste and L. Eeckhout, COLE: compiler optimization level exploration, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, CGO '08, pp.165-174, 2008.
DOI : 10.1145/1356058.1356080

G. Fursin, Milepost GCC: Machine Learning Enabled Self-tuning Compiler, International Journal of Parallel Programming, vol.16, issue.2-3, pp.296-327, 2011.
DOI : 10.1088/1742-6596/16/1/071

URL : https://hal.archives-ouvertes.fr/hal-00685276

J. J. Yi, R. Sendag, L. Eeckhout, A. Joshi, D. J. Lilja et al., Evaluating benchmark subsetting approaches, IEEE International Symposium on Workload Characterization, pp.93-104, 2006.

A. Phansalkar, A. Joshi, and L. K. John, Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite, ACM SIGARCH Computer Architecture News, pp.412-423, 2007.

G. Hamerly, E. Perelman, J. Lau, and B. Calder, Simpoint 3.0: Faster and more flexible program phase analysis, Journal of Instruction Level Parallelism, vol.7, issue.4, pp.1-28, 2005.
DOI : 10.1201/9781420037425.ch7

E. Perelman, G. Hamerly, M. Van-biesbrouck, T. Sherwood, and B. Calder, Using simpoint for accurate and efficient simulation, ACM SIGMETRICS Performance Evaluation Review, pp.318-319, 2003.
DOI : 10.1145/781027.781076

URL : http://www.cs.ucsd.edu/~calder/papers/SIGMETRICS-03-SimPoint.pdf

L. Eeckhout, J. Sampson, and B. Calder, Exploiting program microarchitecture independent characteristics and phase behavior for reduced benchmark suite simulation, Proceedings of the IEEE International Workload Characterization Symposium, 2005, pp.2-12, 2005.
DOI : 10.1109/IISWC.2005.1525996

URL : http://www.elis.ugent.be/~leeckhou/papers/iiswc05-phase.pdf

Y. Lee and M. Hall, A Code Isolator: Isolating Code Fragments from Large Programs, Languages and Compilers for High Performance Computing, pp.164-178, 2005.
DOI : 10.1007/11532378_13

P. De-oliveira-castro, C. Akel, E. Petit, M. Popov, and W. Jalby, CERE: LLVM Based Codelet Extractor and REplayer for Piecewise Benchmarking and Optimization, Transactions on Architecture and Code Optimization, vol.12, issue.1, p.6, 2015.

M. Popov, C. Akel, F. Conti, W. Jalby, P. De-oliveira et al., PCERE: Fine-Grained Parallel Benchmark Decomposition for Scalability Prediction, 2015 IEEE International Parallel and Distributed Processing Symposium, pp.1151-1160, 2015.
DOI : 10.1109/IPDPS.2015.19

URL : https://hal.archives-ouvertes.fr/hal-01417304

M. Popov, C. Akel, W. Jalby, and P. D. Castro, Piecewise Holistic Autotuning of Compiler and Runtime Parameters, Euro-Par 2016: Parallel Processing, 22nd International Conference, 2016.
DOI : 10.1145/1755888.1755903

URL : https://hal.archives-ouvertes.fr/hal-01417211

P. De-oliveira-castro, Y. Kashnikov, C. Akel, M. Popov, and W. Jalby, Fine-grained Benchmark Subsetting for System Selection, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, pp.132-142, 2014.
DOI : 10.1145/2581122.2544144

URL : https://hal.archives-ouvertes.fr/hal-00952256

J. Dongarra, K. London, S. Moore, P. Mucci, and D. Terpstra, Using PAPI for hardware performance monitoring on Linux systems, Proc. Conf. on Linux Clusters, pp.25-27, 2001.

A. S. Charif-rubial, E. Oseret, J. Noudohouenou, W. Jalby, and G. Lartigue, CQA: A code quality analyzer tool at binary level, 2014 21st International Conference on High Performance Computing (HiPC), pp.1-10, 2014.
DOI : 10.1109/HiPC.2014.7116904

URL : https://hal.archives-ouvertes.fr/hal-01658710

B. Sprunt, The basics of performance-monitoring hardware, IEEE Micro, vol.22, issue.4, pp.64-71, 2002.
DOI : 10.1109/MM.2002.1028477

S. C. Johnson, Lint, a C program checker, 1977.

L. Djoudi, D. Barthou, P. Carribault, C. Lemuet, J. Acquaviva et al., MAQAO: Modular assembler quality analyzer and optimizer for Itanium 2, The 4th Workshop on EPIC architectures and compiler technology, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00141075

J. Treibig, G. Hager, and G. Wellein, LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, pp.207-216, 2010.
DOI : 10.1109/ICPPW.2010.38

URL : http://arxiv.org/pdf/1104.4874

A. S. Phansalkar, Measuring program similarity for efficient benchmarking and performance analysis of computer systems, 2007.

K. Hoste and L. Eeckhout, Comparing Benchmarks Using Key Microarchitecture-Independent Characteristics, 2006 IEEE International Symposium on Workload Characterization, pp.83-92, 2006.
DOI : 10.1109/IISWC.2006.302732

URL : http://www.bioperf.org/HE06.pdf

V. Palomares, Combining static and dynamic approaches to model loop performance in hpc, 2015.
URL : https://hal.archives-ouvertes.fr/tel-01293040

Z. Bendifallah, Generalization of the decremental performance analysis to differential analysis, 2015.
URL : https://hal.archives-ouvertes.fr/tel-01293039

H. Vandierendonck and K. De-bosschere, Experiments with subsetting benchmark suites, Workload Characterization, WWC-7, pp.55-62, 2004.
DOI : 10.1109/wwc.2004.1437398

URL : http://escher.elis.ugent.be/publ/Edocs/DOC/P104_101.pdf

A. Joshi, A. Phansalkar, L. Eeckhout, and L. K. John, Measuring benchmark similarity using inherent program characteristics, IEEE Transactions on Computers, vol.55, issue.6, pp.769-782, 2006.
DOI : 10.1109/TC.2006.85

H. Vandierendonck and K. De-bosschere, Many benchmarks stress the same bottlenecks, Workshop on Computer Architecture Evaluation Using Commercial Workloads, 2004.

T. Sherwood, E. Perelman, G. Hamerly, and B. Calder, Automatically characterizing large scale program behavior, ACM SIGARCH Computer Architecture News, pp.45-57, 2002.
DOI : 10.1145/635506.605403

J. MacQueen, Some methods for classification and analysis of multivariate observations, Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp.281-297, 1967.

G. Hamerly and C. Elkan, Alternatives to the k-means algorithm that find better clusterings, Proceedings of the eleventh international conference on Information and knowledge management, CIKM '02, pp.600-607, 2002.
DOI : 10.1145/584792.584890

URL : http://ai.ucsd.edu/~ghamerly/academic_papers/techreport_01_alternatives.ps.gz

J. H. Ward, Hierarchical Grouping to Optimize an Objective Function, Journal of the American Statistical Association, vol.58, issue.301, pp.236-244, 1963.
DOI : 10.1080/01621459.1963.10500845

M. Kaur and U. Kaur, Comparison between k-mean and hierarchical algorithm using query redirection, International Journal of Advanced Research in Computer Science and Software Engineering, vol.3, issue.7, 2013.

R. Thorndike, Who belongs in the family?, Psychometrika, vol.18, issue.4, pp.267-276, 1953.
DOI : 10.1007/BF02289263

I. Jolliffe, Principal component analysis, 2002.

A. Hyvärinen, J. Hurri, and P. O. Hoyer, Natural Image Statistics: A Probabilistic Approach to Early Computational Vision, 2009.
DOI : 10.1007/978-1-84882-491-1

E. Ipek, B. R. De-supinski, M. Schulz, and S. A. McKee, An Approach to Performance Prediction for Parallel Applications, European Conference on Parallel Processing, pp.196-205, 2005.
DOI : 10.1007/11549468_24

URL : https://digital.library.unt.edu/ark:/67531/metadc873470/m2/1/high_res_d/878233.pdf

K. D. Cooper, P. J. Schielke, and D. Subramanian, Optimizing for reduced code space using genetic algorithms, SIGPLAN Notices, pp.1-9, 1999.
DOI : 10.1145/315253.314414

URL : http://www.cs.rice.edu/~keith/EMBED/lctes99.pdf

K. Hoste, A. Phansalkar, L. Eeckhout, A. Georges, L. K. John et al., Performance prediction based on inherent program similarity, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, PACT '06, pp.114-122, 2006.
DOI : 10.1145/1152154.1152174

URL : http://lca.ece.utexas.edu/pubs/aashish_pact06.pdf

D. Whitley, A genetic algorithm tutorial, Statistics and Computing, vol.4, issue.2, pp.65-85, 1994.
DOI : 10.1007/BF00175354

URL : http://www.cs.uga.edu/~potter/CompIntell/ga_tutorial.pdf

W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical recipes: The art of scientific computing, 1986.

C. Bienia, S. Kumar, and K. Li, PARSEC vs. SPLASH-2: A quantitative comparison of two multithreaded benchmark suites on chip-multiprocessors, Workload Characterization, pp.47-56, 2008.

T. E. Carlson, W. Heirman, K. Van-craeynest, and L. Eeckhout, BarrierPoint: Sampled simulation of multi-threaded applications, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2014.
DOI : 10.1109/ISPASS.2014.6844456

T. Lafage and A. Seznec, Choosing Representative Slices of Program Execution for Microarchitecture Simulations: A Preliminary Application to the Data Stream, Workload characterization of emerging computer applications, pp.145-163, 2001.
DOI : 10.1007/978-1-4615-1613-2_7

URL : https://hal.archives-ouvertes.fr/inria-00476687

T. F. Wenisch, R. E. Wunderlich, M. Ferdman, A. Ailamaki, B. Falsafi et al., SimFlex: Statistical Sampling of Computer System Simulation, IEEE Micro, vol.26, issue.4, pp.18-31, 2006.
DOI : 10.1109/MM.2006.79

M. Van-biesbrouck, T. Sherwood, and B. Calder, A co-phase matrix to guide simultaneous multithreading simulation, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.45-56, 2004.

T. E. Carlson, W. Heirman, and L. Eeckhout, Sampled simulation of multi-threaded applications, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.2-12, 2013.
DOI : 10.1109/ISPASS.2013.6557141

E. K. Ardestani and J. Renau, ESESC: A fast multicore simulator using Time-Based Sampling, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), pp.448-459, 2013.
DOI : 10.1109/HPCA.2013.6522340

URL : http://masc.soe.ucsc.edu/docs/hpca13.pdf

J. Gonzalez, J. Gimenez, and J. Labarta, Automatic detection of parallel applications computation phases, 2009 IEEE International Symposium on Parallel & Distributed Processing, pp.1-11, 2009.
DOI : 10.1109/IPDPS.2009.5161027

D. E. Knuth, An empirical study of FORTRAN programs, Software: Practice and Experience, pp.105-133, 1971.
DOI : 10.1002/spe.4380010203

C. Akel, Y. Kashnikov, P. De-oliveira-castro, and W. Jalby, Is Source-code Isolation Viable for Performance Characterization?, International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI), 2013.
DOI : 10.1109/icpp.2013.116

URL : https://hal.archives-ouvertes.fr/hal-00952290

E. Petit, G. Papaure, and F. Bodin, Poster reception---ASTEX, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, SC '06, 2006.
DOI : 10.1145/1188455.1188602

C. Liao, D. J. Quinlan, R. Vuduc, and T. Panas, Effective source-to-source outlining to support whole program empirical optimization, Languages and Compilers for Parallel Computing, pp.308-322, 2010.
DOI : 10.1007/978-3-642-13374-9_21

Y. Kashnikov, P. De-oliveira-castro, E. Oseret, and W. Jalby, Evaluating architecture and compiler design through static loop analysis, 2013 International Conference on High Performance Computing & Simulation (HPCS), pp.535-544, 2013.
DOI : 10.1109/HPCSim.2013.6641465

URL : https://hal.archives-ouvertes.fr/hal-00952298

R. E. Kessler, M. D. Hill, D. A. Wood, T. M. Conte, M. A. Hirsch et al., A comparison of trace-sampling techniques for multi-megabyte caches, Proceedings of the 1996 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD '96), pp.664-675, 1994.
DOI : 10.1109/12.286300

J. W. Haskins Jr. and K. Skadron, Memory reference reuse latency: Accelerated warmup for sampled microarchitecture simulation, 2003 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS 2003., pp.195-203, 2003.
DOI : 10.1109/ISPASS.2003.1190246

X. Gao, M. Laurenzano, B. Simon, and A. Snavely, Reducing overheads for acquiring dynamic memory traces, Proceedings of the IEEE International Workload Characterization Symposium, 2005, pp.46-55, 2005.

E. Petit and F. Bodin, Code-Partitioning for a Concise Characterization of Programs for Decoupled Code Tuning, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00460897

D. Quinlan and C. Liao, The ROSE source-to-source compiler infrastructure, Cetus Users and Compiler Infrastructure Workshop, in conjunction with PACT, p.1, 2011.

T. Kisuki, P. M. Knijnenburg, M. F. O'Boyle, F. Bodin, and H. A. Wijshoff, A feasibility study in iterative compilation, High Performance Computing, pp.121-132, 1999.
DOI : 10.1007/BFb0094916

A. Mazouz, S. Touati, and D. Barthou, Performance evaluation and analysis of thread pinning strategies on multi-core platforms: Case study of SPEC OMP applications on intel architectures, 2011 International Conference on High Performance Computing & Simulation, pp.273-279, 2011.
DOI : 10.1109/HPCSim.2011.5999834

URL : https://hal.archives-ouvertes.fr/inria-00636845

B. Rountree, D. K. Lowenthal, B. R. De-supinski, M. Schulz, V. W. Freeh et al., Adagio: making DVS practical for complex HPC applications, Proceedings of the 23rd international conference on Supercomputing, ICS '09, pp.460-469, 2009.
DOI : 10.1145/1542275.1542340

S. Triantafyllis, M. Vachharajani, N. Vachharajani, and D. I. August, Compiler optimization-space exploration, International Symposium on Code Generation and Optimization, 2003. CGO 2003., pp.204-215, 2003.
DOI : 10.1109/CGO.2003.1191546

URL : http://www.cs.princeton.edu/~nvachhar/papers/cgo01_ose.pdf

P. De-oliveira-castro, E. Petit, A. Farjallah, and W. Jalby, Adaptive sampling for performance characterization of application kernels, Concurrency and Computation: Practice and Experience, 2013.
DOI : 10.1109/SC.2010.2

URL : https://hal.archives-ouvertes.fr/hal-00952288

G. Fursin, A. Cohen, M. O'Boyle, and O. Temam, Quick and Practical Run-Time Evaluation of Multiple Program Optimizations, pp.34-53, 2007.
DOI : 10.1109/SC.1998.10004

URL : https://hal.archives-ouvertes.fr/inria-00084110

D. Sands, Reimplementing llvm-gcc as a gcc plugin, Third Annual LLVM Developers' Meeting, 2009.

C. Liao, O. Hernandez, B. Chapman, W. Chen, and W. Zheng, OpenUH: an optimizing, portable OpenMP compiler, Concurrency and Computation: Practice and Experience, vol.6, issue.18, pp.2317-2332, 2007.
DOI : 10.1002/cpe.1174

URL : http://www2.cs.uh.edu/%7Ehpctools/pub/openuh_cpe_2007.pdf

Z. Pan and R. Eigenmann, Fast, automatic, procedure-level performance tuning, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, PACT '06, pp.173-181, 2006.
DOI : 10.1145/1152154.1152182

URL : http://www.ece.purdue.edu/~eigenman/reports/pact2006.pdf

D. Mikushin, N. Likhogrud, E. Z. Zhang, and C. Bergström, KernelGen -- The Design and Implementation of a Next Generation Compiler Platform for Accelerating Numerical Models on GPUs, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, p.75, 2013.
DOI : 10.1109/IPDPSW.2014.115

A. N. Burton and P. H. Kelly, Performance prediction of paging workloads using lightweight tracing, Future Generation Computer Systems, vol.22, issue.7, pp.784-793, 2006.
DOI : 10.1016/j.future.2006.02.003

M. Arenaz, J. Touriño, and R. Doallo, XARK, ACM Transactions on Programming Languages and Systems, vol.30, issue.6, p.32, 2008.
DOI : 10.1145/1391956.1391959

M. A. Khan, H. Charles, and D. Barthou, An Effective Automated Approach to Specialization of Code, Languages and Compilers for Parallel Computing, pp.308-322, 2008.
DOI : 10.1007/978-3-540-85261-2_21

B. Calder, P. Feller, and A. Eustace, Value profiling, Proceedings of 30th Annual International Symposium on Microarchitecture, pp.259-269, 1997.
DOI : 10.1109/MICRO.1997.645816

Y. Chen, Y. Huang, L. Eeckhout, G. Fursin, L. Peng et al., Evaluating iterative optimization across 1000 data sets, Proceedings of the ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation (PLDI'10), 2010.
DOI : 10.1145/1809028.1806647

L. Kaufman and P. J. Rousseeuw, Finding groups in data: an introduction to cluster analysis, 2009.
DOI : 10.1002/9780470316801

J. Noudohouenou, Performance prediction based on codelet driven application characterization, 2013.

E. Willighagen, GNU R package 'genalg', 2013. Available: http://cran.r-project

L. T. Yang, X. Ma, and F. Mueller, Cross-Platform Performance Prediction of Parallel Applications Using Partial Execution, ACM/IEEE SC 2005 Conference (SC'05), pp.40-40, 2005.
DOI : 10.1109/SC.2005.20

E. Baysal, Reverse time migration, GEOPHYSICS, vol.48, issue.11, p.1514, 1983.
DOI : 10.1190/1.1441434

M. Popov, NAS 3.0 C OpenMP

D. H. Bailey, NAS parallel benchmarks, 2011.

C. Bienia, Benchmarking modern multiprocessors, 2011.

S. N. Natarajan, B. Swamy, and A. Seznec, Modeling multi-threaded programs execution time in the many-core era, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00914335

Intel, Reference Guide for the Intel(R) C++ Compiler 15.0

P. A. Kulkarni, M. R. Jantz, and D. B. Whalley, Improving both the performance benefits and speed of optimization phase sequence searches, SIGPLAN Notices, pp.95-104, 2010.

S. Purini and L. Jain, Finding good optimization sequences covering program space, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, p.56, 2013.
DOI : 10.1145/2400682.2400715

J. Cavazos, G. Fursin, F. Agakov, E. Bonilla, M. F. O'Boyle et al., Rapidly Selecting Good Compiler Optimizations using Performance Counters, International Symposium on Code Generation and Optimization (CGO'07), pp.185-197, 2007.
DOI : 10.1109/CGO.2007.32