The path to the ONEVIEW configuration file containing the experiment parameters. The OneView file can be generated using the following command:
"[...] if clean": Loop throughput (number of cycles per iteration) if all data are in L1 and scalar integer instructions are removed
"[...] if fully vectorized": Loop throughput (number of cycles per iteration) if all data are in L1 and the loop is fully vectorized
"Vector-efficiency ratio all": Vector efficiency ratio (average proportion of used vector length) of instructions processing FP or integer elements
"Vectorization ratio all": Vectorization ratio (proportion of vectorizable instructions that were vectorized) of instructions processing FP or integer elements
"FP op per cycle L1": FLOPs per cycle if all data are in L1
"Speedup if clean": Speedup of instructions if clean (Cycles L1 / Cycles L1 if clean)
"Speedup if fully vectorized": Speedup of instructions if fully vectorized (Cycles L1 / Cycles L1 if fully vectorized)
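The two speedup metrics above are plain ratios of cycle counts per iteration. The sketch below illustrates that arithmetic only. The function name `speedup` and the cycle values are hypothetical, chosen for illustration; they are not taken from MAQAO/CQA output or its API.

```python
def speedup(cycles_l1: float, cycles_l1_variant: float) -> float:
    """Return Cycles L1 / Cycles L1 of the variant (clean or fully vectorized)."""
    if cycles_l1_variant <= 0:
        raise ValueError("cycle count per iteration must be positive")
    return cycles_l1 / cycles_l1_variant


# Hypothetical loop (assumed values): 8 cycles per iteration as compiled,
# 4 cycles if clean, 2 cycles if fully vectorized.
cycles_l1 = 8.0
cycles_l1_if_clean = 4.0
cycles_l1_if_fully_vectorized = 2.0

speedup_if_clean = speedup(cycles_l1, cycles_l1_if_clean)
speedup_if_fully_vectorized = speedup(cycles_l1, cycles_l1_if_fully_vectorized)

print("Speedup if clean:", speedup_if_clean)
print("Speedup if fully vectorized:", speedup_if_fully_vectorized)
```

A speedup close to 1 suggests that the corresponding transformation (removing scalar integer instructions, or vectorizing fully) would bring little benefit for that loop, under the assumption that all data are in L1.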
Installation Requirements
Before using ASSIST, the following packages and libraries must be installed:
Openjdk-6-jdk: version 6 or higher (not tested with older versions)
Boost Libraries, available at git.maqao.org:S2S/LIBS.git
ASSIST can be downloaded from the git repository, as well as the required libraries and a pre-compiled binary:
Binary (containing MAQAO, ASSIST and all libraries): git clone git@git, S2S/LIBS.git
Sources: git clone git@git.maqao.org:S2S/S2S.git
Required libraries: git clone git@git.maqao.org:S2S/LIBS.git
End-to-end Auto-tuning with Active Harmony, Performance Tuning of Scientific Applications. Chapman and Hall/CRC Computational Science Series, 2010.
HPCTOOLKIT: tools for performance analysis of optimized parallel programs, vol.22, pp.685-701.
Multi-level tiling: M for the price of one, ACM/IEEE conference on Supercomputing, p.51, 2007.
Source-to-Source Automatic Program Transformations for GPU-like Hardware Accelerators, 2012.
Program Analysis and Specialization for the C Programming Language, 1994.
APS
ATLAS
AVBP
Performance Tuning of x86 OpenMP Codes with MAQAO, Parallel Tools Workshop. Dresden, pp.95-113, 2009.
Code generation in the polyhedral model is easier than you think, International Conference on Parallel Architectures and Compilation Techniques, pp.7-16, 2004.
URL: https://hal.archives-ouvertes.fr/hal-00017260
PAMDA: Performance Assessment Using MAQAO Toolset and Differential Analysis, Tools for High Performance Computing 2013: Proceedings of the 7th International Workshop on Parallel Tools for High Performance Computing, pp.107-127, 2014.
Generalization of the decremental performance analysis to differential analysis, 2015.
URL: https://hal.archives-ouvertes.fr/tel-01293039
[RFC] llvm-mca: a static performance analysis tool, 2018.
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, ACM International Conference on Supercomputing, pp.253-260, 2014.
A practical automatic polyhedral parallelizer and locality optimizer, ACM SIGPLAN Notices. ACM, pp.101-113, 2008.
The SmaCC Transformation Engine, OOPSLA '09, 2009.
PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications, pp.1-11, 2010.
MIL: A language to build program analysis tools through static binary instrumentation, 20th Annual International Conference on High Performance Computing, pp.206-215, 2013.
URL: https://hal.archives-ouvertes.fr/hal-00920875
CQA: A code quality analyzer tool at binary level, pp.1-10, 2014.
URL: https://hal.archives-ouvertes.fr/hal-01658710
CHiLL: A Framework for Composing High-Level Loop Transformations, 2008.
AutoFDO: Automatic Feedback-directed Optimization for Warehouse-scale Applications, Proceedings of the 2016 International Symposium on Code Generation and Optimization. CGO '16, pp.978-979, 2016.
Automatic Source Code Specialization for Energy Reduction, Proceedings of the 2001 International Symposium on Low Power Electronics and Design. ISLPED '01. Huntington Beach, California, pp.1-58113, 2001.
Source Transformation, Analysis and Generation in TXL, Proceedings of the 2006 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation. PEPM '06, pp.1-11, 2006.
Coria
Cetus: A source-to-source compiler infrastructure for Multicores, pp.36-42, 2009.
HMPP: A hybrid multi-core parallel programming environment, Workshop on general purpose processing on graphics processing units, vol.28, 2007.
A Language for the Compact Representation of Multiple Program Versions, International Workshop on Languages and Compilers for Parallel Computing, pp.136-151, 2005.
URL: https://hal.archives-ouvertes.fr/hal-00141067
A set of level 3 basic linear algebra subprograms, ACM Transactions on Mathematical Software. ACM, pp.1-17, 1990.
FFTW: An adaptive software architecture for the FFT, Proceedings of the ICASSP Conference, p.1381, 1998.
MACC: Mercurium ACCelerator Model, International Workshop on OpenMP, 2014.
The SCALASCA Performance Toolset Architecture, International Workshop on Scalable Tools for High-End Computing (STHEC), pp.51-65, 2008.
Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies, International Journal of Parallel Programming, pp.261-317, 2006.
GNU
ABINIT: First-principles approach to material and nanosystem properties, Computer Physics Communications, pp.2582-2615, 2009.
Maximizing Multiprocessor Performance with the SUIF Compiler, IEEE Computer, 1996.
Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels.
Annotation-based empirical performance tuning using Orio, 2009 IEEE International Symposium on Parallel Distributed Processing, pp.1-11, 2009.
Towards the Generalization of Value Profiling for High-Performance Application Optimization.
Interprocedural Analyses for Programming Environments, Workshop on Environments and Tools For Parallel Scientific Computing, 1992.
Partial dead code elimination, Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation, pp.147-158, 1994.
The Long and Winding Road Toward Efficient High-Performance Computing, vol.106, pp.1985-2003, 2018.
URL: https://hal.archives-ouvertes.fr/hal-02179181
Mira: A Framework for Static Performance Analysis, Cluster Computing (CLUSTER), pp.978-979, 2017.
RASCAL: A Domain Specific Language for Source Code Analysis and Manipulation, IEEE International Working Conference on Source Code Analysis and Manipulation, pp.168-177, 2009.
Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Tools for High Performance Computing, pp.79-91, 2011.
Spectre Attacks: Exploiting Speculative Execution, 2018.
Quantifying Performance Bottleneck Cost Through Differential Analysis, Proceedings of the 27th International ACM Conference on International Conference on Supercomputing. ICS '13, pp.263-272, 2013.
Static and dynamic approach for performance evaluation of scientific codes, 2011.
Scout: A Source-to-Source Transformator for SIMD-Optimizations, Euro-Par, pp.137-145, 2012.
Automatic configuration of GCC using irace, Artificial Evolution, pp.202-216, 2017.
Techniques for Increasing and Detecting Memory Alignment, 2001.
When Prefetching Works, When It Doesn't, and Why, vol.9, pp.1-29, 2012.
Machine learning-based prefetch optimization for data center applications, Proceedings of the Conference on High Performance Computing Networking, pp.1-10, 2009.
Meltdown, 2018.
On the usefulness of object tracking techniques in performance analysis, SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2013.
PIT: A Framework for Effectively Composing High-Level Loop Transformations, Computing and Informatics. Open Journal Systems, pp.943-963, 2012.
Stratego/XT 0.17. A language and toolset for program transformation, Science of Computer Programming, 2008.
Using Dynamic Compilation to Achieve Ninja Performance for CNN Training on Many-Core Processors, Euro-Par. IEEE, 2018.
ACOTES Project: Advanced Compiler Technologies for Embedded Streaming, vol.39, pp.397-450, 2011.
Code Specialization based on Value Profiles, International Static Analysis Symposium, pp.340-359, 2000.
CIL: Intermediate language and tools for analysis and transformation of C programs, International Conference on Compiler Construction, pp.213-228, 2002.
SamplePGO: The Power of Profile Guided Optimizations Without the Usability Burden, Proceedings of the 2014 LLVM Compiler Infrastructure in HPC. LLVM-HPC '14, pp.22-28, 2014.
OpenMP
GCC Instrumentation Options
GCC Optimizations Options
Aestimo: a feedback-directed optimization evaluation tool, 2006.
TRACO Parallelizing Compiler, Soft Computing in Computer and Information Science, pp.409-421, 2015.
Combining static and dynamic approaches to model loop performance in HPC, 2015.
URL: https://hal.archives-ouvertes.fr/tel-01293040
BOLT: A Practical Binary Optimizer for Data Centers and Beyond, 2018.
A data alignment technique for improving cache performance, Proceedings International Conference on Computer Design VLSI in Computers and Processors, pp.587-592, 1997.
An Automatic tool for tuning compiler optimizations, Ninth International Conference on Computer Science and Information Technologies Revised Selected Papers, pp.1-7, 2013.
Automatic Tuning of Compiler Optimizations and Analysis of their Impact, vol.18, pp.1312-1321, 2013.
Piecewise Holistic Autotuning of Parallel Programs with CERE, vol.29, p.4190, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01542912
Software Methods for Improvement of Cache Performance on Supercomputer Applications, 1989.
Iterative Optimization in the Polyhedral Model: Part II, Multidimensional Time, ACM SIGPLAN Notices. ACM, pp.90-100, 2008.
URL: https://hal.archives-ouvertes.fr/hal-01257273
Numerical Recipes 3rd Edition: The Art of Scientific Computing, p.521880688, 2007.
SPIRAL: Code generation for DSP transforms, Proceedings of the IEEE, pp.216-231, 2005.
ROSE: Compiler Support for Object-Oriented Framework, Parallel Processing Letters, pp.215-226, 2000.
Automated empirical optimizations of software and the ATLAS project, Parallel Computing, pp.3-35, 2000.
Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers, pp.65-79, 2010.
Steady and Unsteady Flow Simulations Using the Hybrid Flow Solver AVBP, AIAA Journal. AIAA ARC, pp.1378-1385, 1999.
The Tau Parallel Performance System, vol.20, pp.287-311, 2006.
Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers, IEEE 13th International Symposium on High Performance Computer Architecture, pp.63-74, 2007.
Xevtgen: Fortran code transformer generator for high performance scientific codes, International Journal of Networking and Computing, pp.263-289, 2016.
A Code Generation Framework for Targeting Optimized Library Calls for Multiple Platforms, IEEE Transactions on Parallel and Distributed Systems, vol.26, 2014.
Locus: A System and a Language for Program Optimization, Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization. CGO 2019, pp.217-228, 2019.
URL: https://hal.archives-ouvertes.fr/hal-02135657
A Scalable Auto-tuning Framework for Compiler Optimization, Parallel and Distributed Processing, pp.1-12, 2009.
A generic approach to the definition of low-level components for multi-architecture binary analysis, 2014.
Polyhedral Parallel Code Generation for CUDA, ACM Trans. Architec. Code Optim, 2013.
URL: https://hal.archives-ouvertes.fr/hal-00786677
DMS®: program transformations for practical scalable software evolution, Software Engineering, ICSE 2004. Proceedings. 26th International Conference on, pp.625-634, 2004.
LLVM: A compilation framework for lifelong program analysis and transformation, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, 2004.
OSKI: A library of automatically tuned sparse matrix kernels, Journal of Physics: Conference Series, p.521, 2005.
Warp3d
An Overview of the Open Research Compiler, Languages and Compilers for High Performance Computing: 17th International Workshop, LCPC 2004, pp.17-31, 2004.
Hitting the Memory Wall: Implications of the Obvious, vol.23, pp.20-24, 1995.
An Approach to Customization of Compiler Directives for Application-Specific Code Transformations, 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, pp.99-106, 2014.
POET: A Scripting Language For Applying Parameterized Source-to-source Program Transformations, Software Practice And Experience. University of Texas at, pp.675-706, 2012.