The landscape of parallel computing research: A view from berkeley, 2006. ,
SPEC CPU2006 benchmark descriptions, ACM SIGARCH Computer Architecture News, vol.34, issue.4, pp.1-17, 2006. ,
DOI : 10.1145/1186736.1186737
The NAS parallel benchmarks---summary and preliminary results, Proceedings of the 1991 ACM/IEEE conference on Supercomputing , Supercomputing '91, pp.158-165, 1991. ,
DOI : 10.1145/125826.125925
Analysis, estimation and optimization of computer system performance using machine learning, 2010. ,
Benchmarking modern multiprocessors, 2011. ,
The exigency of benchmark and compiler drift, Proceedings of the 20th annual international conference on Supercomputing , ICS '06, pp.75-86, 2006. ,
DOI : 10.1145/1183401.1183414
Summarizing multiprocessor program execution with versatile , microarchitecture-independent snapshots, 2006. ,
An integrated gpu power and performance model, ACM SIGARCH Computer Architecture News, pp.280-289, 2010. ,
DOI : 10.1145/1815961.1815998
Cross-architecture performance predictions for scientific applications using parameterized models, ACM SIGMETRICS Performance Evaluation Review, pp.2-13, 2004. ,
DOI : 10.1145/1012888.1005691
URL : http://www.cs.rice.edu/~johnmc/papers/MM-SIGMETRICS04.pdf
Automatic performance model construction for the fast software exploration of new hardware designs, Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems , CASES '06, pp.24-34, 2006. ,
DOI : 10.1145/1176760.1176765
Computing-kernels performance prediction using dataflow analysis and microbenchmarking, International Workshop on Compilers for Parallel Computers, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00699525
Exploring and evaluating array layout restructuration for SIMDization, Proceedings of the 27th international conference on Languages and Compilers for Parallel Computing, p.14 ,
DOI : 10.1007/978-3-319-17473-0_23
URL : https://hal.archives-ouvertes.fr/hal-01070467
Basic block distribution analysis to find periodic behavior and simulation points in applications, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques, pp.3-14, 2001. ,
DOI : 10.1109/PACT.2001.953283
URL : http://www.cse.ucsd.edu/~calder/papers/PACT-01-BBDA.pdf
Hardware/software co-design, Proceedings of the IEEE, pp.349-365, 1997. ,
Exploring and predicting the architecture/optimising compiler co-design space, Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems, CASES '08, pp.31-40, 2008. ,
DOI : 10.1145/1450095.1450103
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, p.4, 2008. ,
DOI : 10.1109/SC.2008.5222004
LLVM: A compilation framework for lifelong program analysis & transformation, International Symposium on Code Generation and Optimization, 2004. CGO 2004., pp.75-86, 2004. ,
DOI : 10.1109/CGO.2004.1281665
Acovea: Analysis of compiler options via evolutionary algorithm, 2007. ,
Cole, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization , CGO '08, pp.165-174, 2008. ,
DOI : 10.1145/1356058.1356080
Milepost GCC: Machine Learning Enabled Self-tuning Compiler, International Journal of Parallel Programming, vol.16, issue.2???3, pp.296-327, 2011. ,
DOI : 10.1088/1742-6596/16/1/071
URL : https://hal.archives-ouvertes.fr/hal-00685276
Evaluating benchmark subsetting approaches, IEEE International Symposium on Workload Characterization, pp.93-104, 2006. ,
Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite, ACM SIGARCH Computer Architecture News, pp.412-423, 2007. ,
Simpoint 3.0: Faster and more flexible program phase analysis, Journal of Instruction Level Parallelism, vol.7, issue.4, pp.1-28, 2005. ,
DOI : 10.1201/9781420037425.ch7
Using simpoint for accurate and efficient simulation, ACM SIGMETRICS Performance Evaluation Review, pp.318-319, 2003. ,
DOI : 10.1145/781027.781076
URL : http://www.cs.ucsd.edu/~calder/papers/SIGMETRICS-03-SimPoint.pdf
Exploiting program microarchitecture independent characteristics and phase behavior for reduced benchmark suite simulation, IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005., pp.2-12, 2005. ,
DOI : 10.1109/IISWC.2005.1525996
URL : http://www.elis.ugent.be/~leeckhou/papers/iiswc05-phase.pdf
A Code Isolator: Isolating Code Fragments from Large Programs, Languages and Compilers for High Performance Computing, pp.164-178, 2005. ,
DOI : 10.1007/11532378_13
CERE: LLVM Based Codelet Extractor and REplayer for Piecewise Benchmarking and Optimization, Transactions on Architecture and Code Optimization, vol.12, issue.1, p.6, 2015. ,
PCERE: Fine-Grained Parallel Benchmark Decomposition for Scalability Prediction, 2015 IEEE International Parallel and Distributed Processing Symposium, pp.1151-1160, 2015. ,
DOI : 10.1109/IPDPS.2015.19
URL : https://hal.archives-ouvertes.fr/hal-01417304
Piecewise Holistic Autotuning of Compiler and Runtime Parameters, Euro-Par 2016 Parallel Processing -22nd International Conference, p.2016 ,
DOI : 10.1145/1755888.1755903
URL : https://hal.archives-ouvertes.fr/hal-01417211
Fine-grained Benchmark Subsetting for System Selection, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, pp.132-142, 2014. ,
DOI : 10.1145/2581122.2544144
URL : https://hal.archives-ouvertes.fr/hal-00952256
Using papi for hardware performance monitoring on linux systems, Proc. Conf. on Linux Clusters, pp.25-27, 2001. ,
CQA: A code quality analyzer tool at binary level, 2014 21st International Conference on High Performance Computing (HiPC), pp.1-10, 2014. ,
DOI : 10.1109/HiPC.2014.7116904
URL : https://hal.archives-ouvertes.fr/hal-01658710
The basics of performance-monitoring hardware, IEEE Micro, vol.22, issue.4, pp.64-71, 2002. ,
DOI : 10.1109/MM.2002.1028477
Lint, a C program checker, 1977. ,
Maqao: Modular assembler quality analyzer and optimizer for itanium 2, The 4th Workshop on EPIC architectures and compiler technology, 2005. ,
URL : https://hal.archives-ouvertes.fr/hal-00141075
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, pp.207-216, 2010. ,
DOI : 10.1109/ICPPW.2010.38
URL : http://arxiv.org/pdf/1104.4874
Measuring program similarity for efficient benchmarking and performance analysis of computer systems, 2007. ,
Comparing Benchmarks Using Key Microarchitecture-Independent Characteristics, 2006 IEEE International Symposium on Workload Characterization, pp.83-92, 2006. ,
DOI : 10.1109/IISWC.2006.302732
URL : http://www.bioperf.org/HE06.pdf
Combining static and dynamic approaches to model loop performance in hpc, 2015. ,
URL : https://hal.archives-ouvertes.fr/tel-01293040
Generalization of the decremental performance analysis to differential analysis, 2015. ,
URL : https://hal.archives-ouvertes.fr/tel-01293039
Experiments with subsetting benchmark suites WWC-7, Workload Characterization, pp.55-62, 2004. ,
DOI : 10.1109/wwc.2004.1437398
URL : http://escher.elis.ugent.be/publ/Edocs/DOC/P104_101.pdf
Measuring benchmark similarity using inherent program characteristics, IEEE Transactions on Computers, vol.55, issue.6, pp.769-782, 2006. ,
DOI : 10.1109/TC.2006.85
Many benchmarks stress the same bottlenecks, Workshop on Computer Architecture Evaluation Using Commercial Workloads, 2004. ,
Automatically characterizing large scale program behavior, ACM SIGARCH Computer Architecture News, pp.45-57, 2002. ,
DOI : 10.1145/635506.605403
Some methods for classification and analysis of multivariate observations, Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp.281-297, 1967. ,
Alternatives to the k-means algorithm that find better clusterings, Proceedings of the eleventh international conference on Information and knowledge management , CIKM '02, pp.600-607, 2002. ,
DOI : 10.1145/584792.584890
URL : http://ai.ucsd.edu/~ghamerly/academic_papers/techreport_01_alternatives.ps.gz
Hierarchical Grouping to Optimize an Objective Function, Journal of the American Statistical Association, vol.58, issue.301, pp.236-244, 1963. ,
DOI : 10.1007/BF02289263
Comparison between k-mean and hierarchical algorithm using query redirection, International Journal of Advanced Research in Computer Science and Software Engineering, vol.3, issue.7, 2013. ,
Who belongs in the family?, Psychometrika, vol.18, issue.4, pp.267-276, 1953. ,
DOI : 10.1007/BF02289263
Principal component analysis, 2002. ,
Natural Image Statistics: A Probabilistic Approach to Early Computational Vision, 2009. ,
DOI : 10.1007/978-1-84882-491-1
An Approach to Performance Prediction for Parallel Applications, European Conference on Parallel Processing, pp.196-205, 2005. ,
DOI : 10.1007/11549468_24
URL : https://digital.library.unt.edu/ark:/67531/metadc873470/m2/1/high_res_d/878233.pdf
Optimizing for reduced code space using genetic algorithms, SIGPLAN Notices, pp.1-9, 1999. ,
DOI : 10.1145/315253.314414
URL : http://www.cs.rice.edu/~keith/EMBED/lctes99.pdf
Performance prediction based on inherent program similarity, Proceedings of the 15th international conference on Parallel architectures and compilation techniques , PACT '06, pp.114-122, 2006. ,
DOI : 10.1145/1152154.1152174
URL : http://lca.ece.utexas.edu/pubs/aashish_pact06.pdf
A genetic algorithm tutorial, Statistics and Computing, vol.4, issue.2, pp.65-85, 1994. ,
DOI : 10.1007/BF00175354
URL : http://www.cs.uga.edu/~potter/CompIntell/ga_tutorial.pdf
Numerical recipes: The art of scientific computing, 1986. ,
PARSEC vs. SPLASH-2: A quantitative comparison of two multithreaded benchmark suites on chipmultiprocessors, Workload Characterization, pp.47-56, 2008. ,
BarrierPoint: Sampled simulation of multi-threaded applications, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2014. ,
DOI : 10.1109/ISPASS.2014.6844456
Choosing Representative Slices of Program Execution for Microarchitecture Simulations: A Preliminary Application to the Data Stream, Workload characterization of emerging computer applications, pp.145-163, 2001. ,
DOI : 10.1007/978-1-4615-1613-2_7
URL : https://hal.archives-ouvertes.fr/inria-00476687
SimFlex: Statistical Sampling of Computer System Simulation, IEEE Micro, vol.26, issue.4, pp.18-31, 2006. ,
DOI : 10.1109/MM.2006.79
A co-phase matrix to guide simultaneous multithreading simulation, " in Performance Analysis of Systems and Software, IEEE International Symposium on-ISPASS, pp.45-56, 2004. ,
Sampled simulation of multi-threaded applications, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.2-12, 2013. ,
DOI : 10.1109/ISPASS.2013.6557141
ESESC: A fast multicore simulator using Time-Based Sampling, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), pp.448-459, 2013. ,
DOI : 10.1109/HPCA.2013.6522340
URL : http://masc.soe.ucsc.edu/docs/hpca13.pdf
Automatic detection of parallel applications computation phases, 2009 IEEE International Symposium on Parallel & Distributed Processing, pp.1-11, 2009. ,
DOI : 10.1109/IPDPS.2009.5161027
An empirical study of FORTRAN programs, Software: Practice and Experience, pp.105-133, 1971. ,
DOI : 10.1002/spe.4380010203
Is Sourcecode Isolation Viable for Performance Characterization, International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI), 2013. ,
DOI : 10.1109/icpp.2013.116
URL : https://hal.archives-ouvertes.fr/hal-00952290
Poster reception---ASTEX, Proceedings of the 2006 ACM/IEEE conference on Supercomputing , SC '06, 2006. ,
DOI : 10.1145/1188455.1188602
Effective source-tosource outlining to support whole program empirical optimization, Languages and Compilers for Parallel Computing, pp.308-322, 2010. ,
DOI : 10.1007/978-3-642-13374-9_21
Evaluating architecture and compiler design through static loop analysis, 2013 International Conference on High Performance Computing & Simulation (HPCS), pp.535-544, 2013. ,
DOI : 10.1109/HPCSim.2013.6641465
URL : https://hal.archives-ouvertes.fr/hal-00952298
A comparison of trace-sampling techniques for multi-megabyte caches, Computer Design: VLSI in Computers and Processors ICCD'96. Proceedings ., 1996 IEEE International Conference on, pp.664-675, 1994. ,
DOI : 10.1109/12.286300
Memory reference reuse latency: Accelerated warmup for sampled microarchitecture simulation, 2003 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS 2003., pp.195-203, 2003. ,
DOI : 10.1109/ISPASS.2003.1190246
Reducing overheads for acquiring dynamic memory traces, Workload Characterization Symposium, 2005. Proceedings of the IEEE International, pp.46-55, 2005. ,
Code-Partitioning for a Concise Characterization of Programs for Decoupled Code Tuning, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00460897
The rose source-to-source compiler infrastructure, Cetus Users and Compiler Infrastructure Workshop, in conjunction with PACT, p.1, 2011. ,
A feasibility study in iterative compilation, High Performance Computing, pp.121-132, 1999. ,
DOI : 10.1007/BFb0094916
Performance evaluation and analysis of thread pinning strategies on multi-core platforms: Case study of SPEC OMP applications on intel architectures, 2011 International Conference on High Performance Computing & Simulation, pp.273-279, 2011. ,
DOI : 10.1109/HPCSim.2011.5999834
URL : https://hal.archives-ouvertes.fr/inria-00636845
Adagio, Proceedings of the 23rd international conference on Conference on Supercomputing, ICS '09, pp.460-469, 2009. ,
DOI : 10.1145/1542275.1542340
Compiler optimization-space exploration, International Symposium on Code Generation and Optimization, 2003. CGO 2003., pp.204-215, 2003. ,
DOI : 10.1109/CGO.2003.1191546
URL : http://www.cs.princeton.edu/~nvachhar/papers/cgo01_ose.pdf
Adaptive sampling for performance characterization of application kernels, Concurrency and Computation: Practice and Experience, 2013. ,
DOI : 10.1109/SC.2010.2
URL : https://hal.archives-ouvertes.fr/hal-00952288
Quick and Practical Run-Time Evaluation of Multiple Program Optimizations, pp.34-53, 2007. ,
DOI : 10.1109/SC.1998.10004
URL : https://hal.archives-ouvertes.fr/inria-00084110
Reimplementing llvm-gcc as a gcc plugin, Third Annual LLVM Developers' Meeting, 2009. ,
OpenUH: an optimizing, portable OpenMP compiler, Concurrency and Computation: Practice and Experience, vol.6, issue.18, pp.2317-2332, 2007. ,
DOI : 10.1002/cpe.1174
URL : http://www2.cs.uh.edu/%7Ehpctools/pub/openuh_cpe_2007.pdf
Fast, automatic, procedure-level performance tuning, Proceedings of the 15th international conference on Parallel architectures and compilation techniques , PACT '06, pp.173-181, 2006. ,
DOI : 10.1145/1152154.1152182
URL : http://www.ece.purdue.edu/~eigenman/reports/pact2006.pdf
KernelGen -- The Design and Implementation of a Next Generation Compiler Platform for Accelerating Numerical Models on GPUs, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, p.75, 2013. ,
DOI : 10.1109/IPDPSW.2014.115
Performance prediction of paging workloads using lightweight tracing, Future Generation Computer Systems, vol.22, issue.7, pp.784-793, 2006. ,
DOI : 10.1016/j.future.2006.02.003
XARK, ACM Transactions on Programming Languages and Systems, vol.30, issue.6, p.32, 2008. ,
DOI : 10.1145/1391956.1391959
An Effective Automated Approach to Specialization of Code, Languages and Compilers for Parallel Computing, pp.308-322, 2008. ,
DOI : 10.1007/978-3-540-85261-2_21
Value profiling, Proceedings of 30th Annual International Symposium on Microarchitecture, pp.259-269, 1997. ,
DOI : 10.1109/MICRO.1997.645816
Evaluating iterative optimization across 1000 data sets, Proceedings of the ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation (PLDI'10), 2010. ,
DOI : 10.1145/1809028.1806647
Finding groups in data: an introduction to cluster analysis, 2009. ,
DOI : 10.1002/9780470316801
Performance prediction based on codelet driven application characterization, 2013. ,
GNU R package 'genalg', " 2013 Available: http://cran.r-project ,
Cross-Platform Performance Prediction of Parallel Applications Using Partial Execution, ACM/IEEE SC 2005 Conference (SC'05), pp.40-40, 2005. ,
DOI : 10.1109/SC.2005.20
Reverse time migration, GEOPHYSICS, vol.48, issue.11, p.1514, 1983. ,
DOI : 10.1190/1.1441434
NAS 3.0 C OpenMP ,
Nas parallel benchmarks, 2011. ,
Benchmarking modern multiprocessors, 2011. ,
Modeling multi-threaded programs execution time in the many-core era, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00914335
Reference Guide for the Intel(R) C++ Compiler 15.0 ,
Improving both the performance benefits and speed of optimization phase sequence searches, SIGPLAN Notices, pp.95-104, 2010. ,
Finding good optimization sequences covering program space, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, p.56, 2013. ,
DOI : 10.1145/2400682.2400715
Rapidly Selecting Good Compiler Optimizations using Performance Counters, International Symposium on Code Generation and Optimization (CGO'07), pp.185-197, 2007. ,
DOI : 10.1109/CGO.2007.32