Runtime optimization of binary through vectorization transformations

Abstract : Applications are often not optimized for the hardware on which they run. Backward compatibility of the ISA guarantees that they function, but not that they make the best use of the hardware. Several factors contribute to this unsatisfying situation, such as legacy code, commercial code distributed in binary form, and deployment on compute farms. Our work focuses on maximizing CPU efficiency for the SIMD extensions. The first contribution is a lightweight binary translation mechanism that does not include a vectorizer, but instead leverages what a static vectorizer previously did. We show that many loops compiled for x86 SSE can be dynamically converted to the more recent and more powerful AVX, and how correctness is maintained with regard to challenges such as data dependencies and reductions. We obtain speedups in line with those of a native compiler targeting AVX. The second contribution is runtime auto-vectorization of scalar loops. For this purpose, we use open-source frameworks that we have tuned and integrated to (1) dynamically lift the x86 binary into the Intermediate Representation of the LLVM compiler, (2) abstract hot loops in the polyhedral model, (3) use the power of this mathematical framework to vectorize them, and (4) compile them back into executable form using the LLVM Just-In-Time compiler. In most cases, the obtained speedups are close to the number of elements that can be simultaneously processed by the SIMD unit. The re-vectorizer and auto-vectorizer are implemented inside a dynamic optimization platform that is completely transparent to the user, does not require any rewriting of the binaries, and operates during program execution.
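
The abstract names reductions as one of the correctness challenges when a loop vectorized for SSE is widened to AVX. The sketch below is only an illustration of that arithmetic, not the thesis's mechanism and not real SIMD code: it is a plain Python model in which the hypothetical function `reduce_sum_lanes` keeps one partial sum per lane, combines the lanes after the loop, and handles leftover elements in a scalar epilogue. The lane counts 4 and 8 assume 32-bit elements in 128-bit and 256-bit registers. Integers are used so the result is exact; with floating-point data, changing the lane count reorders the additions and may change rounding.

```python
def reduce_sum_lanes(values, lanes):
    """Sum `values` the way a loop vectorized with `lanes` SIMD lanes would.

    Hypothetical helper for illustration only; it models the lane arithmetic
    in plain Python and does not use any SIMD instructions.
    """
    acc = [0] * lanes                          # one partial sum per lane
    full = len(values) - len(values) % lanes   # elements covered by whole vectors
    for i in range(0, full, lanes):            # vector body: one chunk per iteration
        for k in range(lanes):
            acc[k] += values[i + k]
    total = sum(acc)                           # horizontal add after the loop
    for x in values[full:]:                    # scalar epilogue for the remainder
        total += x
    return total


data = list(range(1, 20))               # 19 integers: not a multiple of 4 or 8
sse_like = reduce_sum_lanes(data, 4)    # assumed: 4 lanes (128-bit, 32-bit elements)
avx_like = reduce_sum_lanes(data, 8)    # assumed: 8 lanes (256-bit, 32-bit elements)
print(sse_like, avx_like, sum(data))
```

Widening from 4 to 8 lanes changes how many partial sums are kept and how many elements fall into the epilogue, which is why a reduction cannot be translated instruction by instruction without also adjusting the final combination step.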
Document type : Theses

Cited literature [72 references]

https://tel.archives-ouvertes.fr/tel-01795489
Contributor : Abes Star
Submitted on : Friday, May 18, 2018 - 3:01:08 PM
Last modification on : Friday, May 3, 2019 - 4:23:44 AM
Long-term archiving on : Monday, September 24, 2018 - 9:44:42 PM

File

HALLOU_Nabil.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01795489, version 1

Citation

Nabil Hallou. Runtime optimization of binary through vectorization transformations. Computer Arithmetic. Université Rennes 1, 2017. English. ⟨NNT : 2017REN1S120⟩. ⟨tel-01795489⟩
