Runtime optimization of binary through vectorization transformations

Nabil Hallou

Thesis, 2017


French title: Optimisation dynamique de code binaire par des transformations vectorielles (Dynamic optimization of binary code through vector transformations)

Nabil Hallou (Author)

Affiliations: Inria Rennes – Bretagne Atlantique; Pushing Architecture and Compilation for Application Performance

Abstract

In many cases, applications are not optimized for the hardware on which they run. This is due to the backward compatibility of the ISA, which guarantees functionality but not the best exploitation of the hardware. Several factors contribute to this unsatisfying situation, such as legacy code, commercial code distributed in binary form, and deployment on compute farms. Our work focuses on maximizing CPU efficiency with respect to the SIMD extensions.

The first contribution is a lightweight binary translation mechanism that does not include a vectorizer but instead leverages what a static vectorizer previously did. We show that many loops compiled for x86 SSE can be dynamically converted to the more recent and more powerful AVX, and we show how correctness is maintained in the presence of challenges such as data dependencies and reductions. We obtain speedups in line with those of a native compiler targeting AVX.

The second contribution is runtime auto-vectorization of scalar loops. For this purpose, we use open-source frameworks that we have tuned and integrated to (1) dynamically lift the x86 binary into the intermediate representation of the LLVM compiler, (2) abstract hot loops in the polyhedral model, (3) use the power of this mathematical framework to vectorize them, and (4) compile them back into executable form using the LLVM Just-In-Time compiler. In most cases, the speedups obtained are close to the number of elements that the SIMD unit can process simultaneously. The re-vectorizer and the auto-vectorizer are implemented inside a dynamic optimization platform that is completely transparent to the user, requires no rewriting of the binaries, and operates during program execution.
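The widening idea behind SSE-to-AVX re-vectorization, and the two correctness concerns named above, can be illustrated with a minimal, portable C sketch. It makes several assumptions and is not the thesis's implementation: it does not decode or rewrite any x86 instructions, the inner `j` loops merely stand in for one SIMD instruction, all function names (`can_use_width`, `vec_add`, `vec_sum`) are hypothetical, and the dependence-distance test is the standard textbook legality condition rather than necessarily the exact check used by the re-vectorizer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Lane counts for 32-bit elements (illustrative assumption). */
enum { SSE_LANES = 4,   /* 128-bit XMM register */
       AVX_LANES = 8 }; /* 256-bit YMM register */

/* Hypothetical legality test: if iteration i reads a value written
   `distance` iterations earlier, running `lanes` iterations at once is
   safe only when distance >= lanes. With no carried dependence, any
   width is safe. */
static bool can_use_width(bool has_carried_dep, size_t distance, size_t lanes) {
    return !has_carried_dep || distance >= lanes;
}

/* a[i] = b[i] + c[i], `lanes` elements per "vector" iteration; the j-loop
   stands in for one SIMD add. Leftover elements go through a scalar
   epilogue. Returns the number of vector iterations executed. */
static size_t vec_add(int *a, const int *b, const int *c, size_t n, size_t lanes) {
    assert(lanes > 0);
    size_t i = 0, iters = 0;
    for (; i + lanes <= n; i += lanes, ++iters)
        for (size_t j = 0; j < lanes; ++j)
            a[i + j] = b[i + j] + c[i + j];
    for (; i < n; ++i) /* scalar epilogue */
        a[i] = b[i] + c[i];
    return iters;
}

/* Sum reduction with one partial accumulator per lane. Widening from
   SSE_LANES to AVX_LANES doubles the number of partial sums, and all of
   them must be combined (a horizontal add) after the vector loop. */
static long vec_sum(const int *x, size_t n, size_t lanes) {
    long partial[AVX_LANES] = {0};
    assert(lanes > 0 && lanes <= AVX_LANES);
    size_t i = 0;
    for (; i + lanes <= n; i += lanes)
        for (size_t j = 0; j < lanes; ++j)
            partial[j] += x[i + j];
    long total = 0;
    for (size_t j = 0; j < lanes; ++j) /* horizontal reduction */
        total += partial[j];
    for (; i < n; ++i) /* scalar epilogue */
        total += x[i];
    return total;
}
```

With 19 elements, the 4-lane version runs 4 vector iterations and the 8-lane version runs 2, each followed by 3 scalar iterations. Both produce the same array and the same reduction result (171 for the values 0 to 18). Integers are used so that reordering the partial sums cannot change the result, which is not guaranteed for floating-point reductions.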

Applications are not always optimized for the hardware on which they run, as with software distributed in binary form or programs deployed on compute farms. We focus on maximizing processor efficiency for the SIMD extensions. We show that many loops compiled for x86 SSE can be dynamically converted into more recent and more powerful AVX versions. We obtain speedups in line with those of a native compiler targeting AVX. In addition, we vectorize scalar loops at runtime. We integrated open-source software to (1) dynamically transform the binary into an intermediate representation, (2) abstract and vectorize frequently executed loops in the polyhedral model, and (3) finally compile them. The speedups obtained are close to the number of elements that the SIMD unit can process simultaneously.
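As a generic illustration of how the polyhedral model can expose a vectorizable dimension, consider a one-dimensional loop whose iteration domain is strip-mined by a vector factor $VF$. This assumes strip-mining, a standard polyhedral transformation, and is not necessarily the exact schedule produced in the thesis; the symbols $\mathcal{D}$, $t$, $v$, and $VF$ are introduced here for illustration only.

```latex
\mathcal{D} = \{\, i \in \mathbb{Z} \mid 0 \le i \le N-1 \,\}
\;\longrightarrow\;
\mathcal{D}' = \{\, (t, v) \in \mathbb{Z}^2 \mid 0 \le v \le VF-1,\; 0 \le VF \cdot t + v \le N-1 \,\},
\qquad i = VF \cdot t + v
```

Each value of $t$ corresponds to one vector iteration, and the inner dimension $v$ maps onto the SIMD lanes (for example, $VF = 8$ for 32-bit elements in 256-bit AVX registers). The constraint $VF \cdot t + v \le N-1$ truncates the last strip when $N$ is not a multiple of $VF$; that partial strip is typically handled separately, for instance by a scalar epilogue.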

Keywords

Performance; Runtime optimization; Dynamic binary optimization; Vectorization; Polyhedral model

Performance; Dynamic binary code optimization; Vectorization; Polyhedral model

Domains

Computer Arithmetic

Main file

HALLOU_Nabil.pdf (1.51 MB)

Origin: version validated by the jury (STAR)


https://theses.hal.science/tel-01795489

Submitted on: Friday, 18 May 2018, 15:01:08

Last modified on: Thursday, 6 April 2023, 03:55:08

Long-term archiving on: Monday, 24 September 2018, 21:44:42

Dates and versions

tel-01795489, version 1 (23-12-2017)

tel-01795489, version 2 (18-05-2018)

Identifiers

HAL Id: tel-01795489, version 2

Cite

Nabil Hallou. Runtime optimization of binary through vectorization transformations. Computer Arithmetic. Université de Rennes, 2017. English. ⟨NNT : 2017REN1S120⟩. ⟨tel-01795489v2⟩


Collections

UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA STAR CENTRALESUPELEC INRIA2 UR1-THESES UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

