This decomposition yields main effects $f_i(x_i)$ and sums of interactions $f_u(x_u)$, $|u| \geq 2$, also called secondary effects. Continuing the analogy, we would also like to estimate the secondary effects. However, when the program contains a large number of functions, we run into the combinatorial explosion problems discussed in Section 4.9.1. Two kinds of solutions avoid a combinatorial explosion in the number of factors to analyze:

• examining only a limited number of interactions between factors, which amounts to bounding the size of $u$;
• partitioning the application to isolate groups of functions with no computational dependencies between them, which reduces the number of factors to analyze.

Much work has been done on partitioning HPC applications into independent pieces.
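
The combinatorial explosion can be made concrete. With $d$ factors, the decomposition has one term $f_u$ per non-empty subset $u$, that is $2^d - 1$ terms. Bounding $|u| \le k$ leaves only $\sum_{j=1}^{k} \binom{d}{j}$ of them. The short Python sketch below just counts terms. `count_terms` is a hypothetical helper, not part of the tooling used in this thesis.

```python
from math import comb
from typing import Optional


def count_terms(d: int, max_order: Optional[int] = None) -> int:
    """Number of non-constant terms f_u for a function of d factors.

    Each term corresponds to a non-empty subset u of {1, ..., d}; if max_order
    is given, only the subsets with |u| <= max_order are counted.
    """
    k = d if max_order is None else min(max_order, d)
    return sum(comb(d, j) for j in range(1, k + 1))


print(count_terms(20))     # 1048575 = 2**20 - 1 terms in the full decomposition
print(count_terms(20, 2))  # 210 terms with main effects and pairwise interactions only
```

Keeping only main effects and pairwise interactions therefore grows quadratically in $d$ (5050 terms for $d = 100$) instead of exponentially.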

Codelet approach. A codelet is a piece of code that can be replayed independently of the original application. The codelet approach consists of partitioning an application into independent codelets, using tools such as CERE [22]. To modify, compile and run codelets independently of the original code, CERE captures the memory that a region (codelet) uses during the original execution of the code. CERE works at memory-page granularity and saves only the touched pages, which limits the size of the captures.
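
The following toy model of page-granularity capture assumes 4 KiB pages and a trace of byte addresses accessed by a codelet. It is not CERE's implementation, which works on the real process memory. It only shows why saving the touched pages bounds the capture size by the pages actually used, rather than by the whole address range. The helper names and the trace are hypothetical.

```python
from typing import Iterable, List

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual size on x86-64 Linux


def touched_pages(addresses: Iterable[int], page_size: int = PAGE_SIZE) -> List[int]:
    """Sorted indices of the pages containing at least one accessed byte address."""
    return sorted({addr // page_size for addr in addresses})


def capture_size(addresses: Iterable[int], page_size: int = PAGE_SIZE) -> int:
    """Bytes to save when only the touched pages are captured."""
    return len(touched_pages(addresses, page_size)) * page_size


# Hypothetical access trace: three pages touched out of a larger address range.
trace = [0, 8, 4095, 4096, 10 * 4096 + 5]
print(touched_pages(trace))   # [0, 1, 10]
print(capture_size(trace))    # 12288
```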

The codelet approach has proven very effective for characterizing the performance of HPC codes.

In the context of numerical code analysis, we see three benefits in being able to replay a compute kernel independently of the original application:

• Sensitivity analysis on a region. Codelets are attractive because they are small sections of code: they are faster to run and they delimit a restricted region. Moreover, they can be replayed independently.

• Characterization of compute kernels. By analogy with performance characterization, it would be useful to characterize a sequence of computations numerically, for example to predict the impact

ASCII American Standard Code for Information Interchange

ASIC Application-Specific Integrated Circuit

BLAS Basic Linear Algebra Subprograms (see Glossary: BLAS)

CADNA Control of Accuracy and Debugging for Numerical Applications

CESTAC Contrôle et Estimation Stochastique des Arrondis de Calculs (see Glossary: CESTAC)

CG Conjugate Gradient

CPU Central Processing Unit

CSV Comma Separated Values

DAZ Denormals As Zero

DFT Density Functional Theory (see Glossary: DFT)

DPCG Deflated Preconditioned Conjugate Gradient

DSA Discrete Stochastic Arithmetic

EFT Error-Free Transformation (see Glossary: EFT)

FORTRAN FORmula TRANSlator

FPGA Field-Programmable Gate Array

FTZ Flush To Zero

GCC GNU C Compiler

GDB GNU Debugger

GNU GNU's Not Unix

HPC High-Performance Computing (see Glossary: HPC)

HPCG High-Performance Conjugate Gradient (see Glossary: HPCG)

IA-32 Intel Architecture

LLVM Low Level Virtual Machine

LTO Link Time Optimization

MAC Multiply-Accumulate

MCA Monte Carlo Arithmetic (see Glossary: MCA)

MPI Message Passing Interface

NaN Not a Number (see Glossary: NaN)

PB Precision Bounding (see Glossary: PB)

RAM Random Access Memory

RR Random Rounding (see Glossary: RR)

SIMD Single Instruction Multiple Data

SMT Satisfiability Modulo Theory (see Glossary: SMT)

SSA Static Single Assignment

SSD Solid State Drive

TLS Thread Local Storage

TPU Tensor Processing Unit

ulp Unit in the Last Place (see Glossary: ulp)

Glossary

etotal Name of the variable in the ABINIT application that holds the total energy (in Hartree) of the physical system under study. It is the main result of interest computed by ABINIT.

Agnostic Qualifier expressing independence from a given context. For example, a compiler is said to be architecture-agnostic if it can compile code on any architecture.

AND Logical AND, with truth table: 0 AND 0 = 0, 0 AND 1 = 0, 1 AND 0 = 0, 1 AND 1 = 1.

Cancellation Catastrophic cancellation occurs when the two operands of a subtraction are very close: the leading digits cancel, and the rounding errors carried by the last digits of the mantissa are brought up to the leading digits of the result.
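
Here is a minimal illustration in Python, whose `float` is IEEE 754 binary64. The subtraction itself is exact here, by the Sterbenz lemma. However, because the operands agree on their leading digits, the rounding error committed earlier, when storing $1 + 10^{-15}$, becomes the dominant part of the result.

```python
x = 1.0 + 1e-15        # 1 + 1e-15 is not representable: x is rounded to the nearest double
diff = x - 1.0         # exact subtraction, but it exposes the rounding error made on x
rel_err = abs(diff - 1e-15) / 1e-15

print(diff)            # about 1.11e-15 instead of 1e-15
print(rel_err)         # about 0.11: only one correct significant digit is left
print((0.1 + 0.2) - 0.3)  # non-zero: the rounding errors of 0.1, 0.2 and 0.3 surface
```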

Backend Each backend must implement a set of arithmetic and logical operations to interface with Verificarlo.

Backtrace Context of calls leading to a function, presented as a call stack onto which the calling function is pushed at each call.

Backward error For a given numerical scheme $f$, the backward error is the distance between the exact problem $x^\star$ and its finite-precision discretization $x$. If $y^\star$ is the exact solution, then the backward error is the $\Delta x$ such that $f(x + \Delta x) = y^\star$.

Basic block A basic block is a sequence of instructions with a single entry point and a single exit point, which requires any branch instruction to be the last instruction of the basic block. The instructions inside a basic block are therefore always executed in the same order.
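
The usual way to split straight-line code into basic blocks is the "leaders" method. A leader is the first instruction, any branch target, or any instruction that follows a branch, and each block runs from one leader to the next. The Python sketch below uses a hypothetical toy instruction format (opcode, optional target index). Which opcodes count as branches is an assumption of the sketch.

```python
from typing import List, Optional, Tuple

# Hypothetical toy instruction: (opcode, index of the branch target or None).
Instr = Tuple[str, Optional[int]]
# Assumption: these opcodes transfer control and therefore end a basic block.
BRANCHES = {"jmp", "br", "ret"}


def basic_blocks(code: List[Instr]) -> List[List[int]]:
    """Split a linear instruction list into basic blocks (classic 'leaders' method)."""
    if not code:
        return []
    leaders = {0}
    for i, (op, target) in enumerate(code):
        if op in BRANCHES:
            if target is not None:
                leaders.add(target)   # a branch target is an entry point
            if i + 1 < len(code):
                leaders.add(i + 1)    # the fall-through successor starts a new block
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


prog = [
    ("load", None),  # 0
    ("cmp", None),   # 1
    ("br", 4),       # 2: conditional branch to 4
    ("add", None),   # 3
    ("mul", None),   # 4
    ("jmp", 1),      # 5: loop back to 1
    ("ret", None),   # 6
]
print(basic_blocks(prog))  # [[0], [1, 2], [3], [4, 5], [6]]
```

Every branch ends up as the last instruction of its block, as the definition requires.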

Benchmark Test used to measure the performance of a computational code or of an architecture. Several metrics exist: FLOPS, energy, memory, input/output.

Bit Binary digit, with two possible values {0, 1}.

Bit-flip Uncontrolled inversion of a bit in hardware, caused by a random physical phenomenon such as a cosmic ray.
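
The sketch below simulates a single bit-flip on a binary64 value in software, using only the standard `struct` module. The effect depends heavily on which bit is hit. A low mantissa bit barely changes the value, while one exponent bit can turn 1.0 into infinity. This is an illustration only, not a fault-injection tool, and `flip_bit` is a hypothetical helper.

```python
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its IEEE 754 binary64 encoding inverted.

    Bit 0 is the least significant mantissa bit, bits 52-62 hold the exponent,
    and bit 63 is the sign bit.
    """
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return y


print(flip_bit(1.0, 0))   # 1.0000000000000002: one ulp away
print(flip_bit(1.0, 52))  # 0.5: lowest exponent bit
print(flip_bit(1.0, 62))  # inf: highest exponent bit
print(flip_bit(1.0, 63))  # -1.0: sign bit
```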

BLAS Linear algebra library providing the basic operations (dot product, matrix-vector product, matrix-matrix product).

Branch and Bound The branch-and-bound method for an optimization problem consists of building a decision tree representing the possible solutions (branching), then evaluating a bound on the cost of each branch (bounding). If a branch can be shown to be too costly compared with the best solution found so far, it is pruned and its sub-branches are not explored.
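
The following small branch-and-bound solver for the 0/1 knapsack problem illustrates the method in general, not any specific use in this thesis. Knapsack is a textbook maximization case. The "too costly" test becomes "the optimistic bound of this branch cannot beat the best value found so far". Here the bound is the classic fractional (greedy) relaxation.

```python
from typing import List, Tuple


def knapsack_bb(items: List[Tuple[float, float]], capacity: float) -> float:
    """0/1 knapsack by branch and bound; items are (value, weight) pairs with weight > 0."""
    # Sort by value density so the fractional bound is easy to compute.
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    best = 0.0

    def bound(i: int, value: float, room: float) -> float:
        # Optimistic estimate: fill the remaining room greedily, allowing a fraction of one item.
        for v, w in items[i:]:
            if w <= room:
                room -= w
                value += v
            else:
                return value + v * room / w
        return value

    def explore(i: int, value: float, room: float) -> None:
        nonlocal best
        best = max(best, value)          # every node is a feasible partial solution
        if i == len(items) or bound(i, value, room) <= best:
            return                       # leaf reached, or branch pruned
        v, w = items[i]
        if w <= room:
            explore(i + 1, value + v, room - w)  # branch: take item i
        explore(i + 1, value, room)              # branch: skip item i

    explore(0, 0.0, capacity)
    return best


print(knapsack_bb([(60, 10), (100, 20), (120, 30)], 50))  # 220
```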

CESTAC Stochastic arithmetic introduced by J. Vignes in 1974, which applies random rounding to the results of floating-point operations in order to model rounding errors.
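
The idea of random rounding can be sketched for addition alone, with finite operands and no overflow. The sketch uses exact rational arithmetic (`fractions.Fraction`) to locate the two floating-point neighbours of the exact result. It is a teaching model, not the CADNA or Verificarlo implementation. `rr_add` and `rr_sum` are hypothetical names.

```python
import math
import random
from fractions import Fraction
from typing import Iterable


def rr_add(a: float, b: float, rng: random.Random) -> float:
    """Add two finite floats, rounding the exact sum down or up with probability 1/2 each.

    When the exact sum is representable, it is returned unchanged.
    """
    nearest = a + b                     # IEEE round-to-nearest result
    exact = Fraction(a) + Fraction(b)   # exact rational sum
    if Fraction(nearest) == exact:
        return nearest
    # The exact sum lies strictly between `nearest` and its neighbour on the other side.
    direction = -math.inf if Fraction(nearest) > exact else math.inf
    other = math.nextafter(nearest, direction)
    return nearest if rng.random() < 0.5 else other


def rr_sum(values: Iterable[float], rng: random.Random) -> float:
    """Left-to-right summation where every addition uses random rounding."""
    total = 0.0
    for v in values:
        total = rr_add(total, v, rng)
    return total


rng = random.Random(42)
samples = [rr_sum([0.1] * 10, rng) for _ in range(3)]
print(samples)  # the samples usually differ in their last bits
```

The spread between a few such samples is what stochastic arithmetic uses to estimate how many significant digits of the result can be trusted.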

Clang Compiler front end for the C, C++, Objective-C and Objective-C++ languages that uses the LLVM infrastructure as its back end.

Convolution The convolution product of two functions $f, g \in L^2$ is defined by $(f * g)(x) = \int_{-\infty}^{+\infty} f(x - t)\, g(t)\, dt$.
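
For finitely supported sequences, the discrete analogue of this definition replaces the integral over $t$ by a sum over $k$. The sketch below mirrors the formula term by term, with $f$ indexed at $n - k$ and $g$ at $k$. `convolve` is a hypothetical helper.

```python
from typing import List, Sequence


def convolve(f: Sequence[float], g: Sequence[float]) -> List[float]:
    """Full discrete convolution: (f * g)[n] = sum over k of f[n - k] * g[k]."""
    n_out = len(f) + len(g) - 1
    out = [0.0] * max(n_out, 0)
    for n in range(n_out):
        for k in range(len(g)):
            if 0 <= n - k < len(f):   # f is zero outside its support
                out[n] += f[n - k] * g[k]
    return out


print(convolve([1.0, 2.0, 3.0], [0.0, 1.0, 0.5]))  # [0.0, 1.0, 2.5, 4.0, 1.5]
```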

Code coverage Code coverage measures which of the execution flows a program can take are actually exercised. It is used, in particular, to show that an execution path will never be taken (dead code), or to check that a test suite covers all the possible executions of a code.

Criticality Criticality is the determination and ranking of the degree of importance and availability of a computing process.

Disassemble Disassembly consists of decoding binary machine instructions into mnemonics, that is, assembly instructions readable by a human.

Deep learning Deep neural network learning methods are algorithms that combine several layers of neurons. On many tasks they outperform more conventional learning methods, hence the growing interest in them.

DFT Density functional theory is a physical theory for studying the electron density, that is, the probability of finding an electron at a given point in space.

Quantile-quantile plot The quantile-quantile plot, or Q-Q plot, is a graph that compares the position of given quantiles in the observed population with their position in the theoretical population.
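
To build a Q-Q plot against a normal distribution, one sorts the sample and pairs each observation with the matching theoretical quantile. Points lying close to a straight line indicate agreement. The sketch below uses Python's `statistics.NormalDist`. The $(i + 0.5)/n$ plotting position is one common choice among several, and `qq_points` is a hypothetical helper.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def qq_points(sample: Sequence[float], dist: NormalDist = NormalDist()) -> List[Tuple[float, float]]:
    """Pairs (theoretical quantile, observed quantile) for a Q-Q plot.

    The i-th smallest observation (0-based) is matched with the theoretical
    quantile of order (i + 0.5) / n.
    """
    xs = sorted(sample)
    n = len(xs)
    return [(dist.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]


for theo, obs in qq_points([2.1, -0.3, 0.8, 1.5, -1.2]):
    print(f"{theo:+.3f} {obs:+.3f}")
```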

Divide & Conquer Algorithmic method that solves a problem by splitting it into sub-problems that are easier to solve, then combining their solutions to obtain the solution of the initial problem.
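
A floating-point example of divide and conquer is pairwise (recursive) summation. It splits the array in two, sums each half recursively and adds the two partial sums. Its worst-case error bound grows like $\log_2 n$ rather than $n$ for left-to-right summation (see Higham's book in the bibliography). The sketch below is a minimal version, and `pairwise_sum` is a hypothetical helper.

```python
import math
from typing import Sequence


def pairwise_sum(x: Sequence[float]) -> float:
    """Divide and conquer summation: split in two halves, sum each recursively, add."""
    n = len(x)
    if n == 0:
        return 0.0
    if n == 1:
        return x[0]
    mid = n // 2
    return pairwise_sum(x[:mid]) + pairwise_sum(x[mid:])


values = [0.1] * 1024
print(pairwise_sum(values), sum(values), math.fsum(values))
```

On this particular input, the two halves are always equal, so every addition is an exact doubling and the result matches the correctly rounded `math.fsum`. The left-to-right `sum`, by contrast, typically accumulates rounding errors.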

Denormal A denormal is a floating-point number whose significand has a leading digit of 0. Since the IEEE 754-2008 standard, such numbers are also called subnormal.
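
In binary64, subnormals fill the gap between zero and the smallest normal number $2^{-1022}$, down to $2^{-1074}$. The short Python sketch below shows both limits. Python does not enable flush-to-zero modes (FTZ/DAZ), so the subnormal values are kept.

```python
import sys

smallest_normal = sys.float_info.min   # 2**-1022, smallest positive normal binary64 value
smallest_subnormal = 2.0 ** -1074      # smallest positive subnormal value

x = smallest_normal / 2                # below the normal range, yet still non-zero
print(0 < x < smallest_normal)         # True: x is subnormal
print(smallest_subnormal / 2)          # 0.0: the halfway case rounds to even, i.e. to zero
```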

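EFT Set of algorithms for computing exactly with floating-point numbers, by representing the exact result of an operation as the sum of its rounded result and its rounding error, both of which are floating-point numbers.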
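
A classic error-free transformation is the TwoSum algorithm described by Knuth (The Art of Computer Programming, vol. 2, in the bibliography). Six floating-point operations return both the rounded sum and its exact rounding error. The Python sketch below assumes round-to-nearest and no overflow, and checks the identity $a + b = s + e$ with exact rationals.

```python
from fractions import Fraction
from typing import Tuple


def two_sum(a: float, b: float) -> Tuple[float, float]:
    """TwoSum: return (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    b_virtual = s - a              # part of b actually absorbed into s
    a_virtual = s - b_virtual      # part of a actually absorbed into s
    e = (a - a_virtual) + (b - b_virtual)  # what was lost when rounding
    return s, e


s, e = two_sum(1.0, 1e-16)
print(s, e)  # 1.0 1e-16: the small operand is entirely in the error term
print(Fraction(s) + Fraction(e) == Fraction(1.0) + Fraction(1e-16))  # True
```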

Relative error Distance between two numbers, normalized by one of them, for instance $|x - \hat{x}| / |x|$ for a reference value $x$ and an approximation $\hat{x}$.

Exponent If $x = m \times \beta^{e}$, then $m$ is its mantissa, $e$ its exponent and $\beta$ its base.
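
In Python, `math.frexp` performs this decomposition for $\beta = 2$. However, it normalizes the mantissa to $0.5 \le |m| < 1$, whereas IEEE 754 uses $1 \le |m| < 2$ for normal numbers. The hypothetical helper below converts between the two conventions.

```python
import math
from typing import Tuple


def decompose(x: float) -> Tuple[float, int]:
    """Return (m, e) with x = m * 2**e and 1 <= |m| < 2, for a finite non-zero x."""
    m, e = math.frexp(x)   # frexp normalises to 0.5 <= |m| < 1
    return 2.0 * m, e - 1


print(decompose(12.0))    # (1.5, 3): 12 = 1.5 * 2**3
print(decompose(-0.375))  # (-1.5, -2)
```

For normal numbers, `m` and `e` then match the IEEE significand (with its implicit leading 1) and the unbiased exponent.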

Multi-physics A multi-physics code models several physical disciplines within a single simulation. For example, the vibrations of a wing result from the interaction between the deformation of the wing (mechanical stresses) and the airflow (fluid mechanics).

NaN Floating-point value (Not a Number) that has no mathematical meaning in the real numbers $\mathbb{R}$, for example $\sqrt{-1}$ or $0/0$.
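
Python raises exceptions for `math.sqrt(-1)` and `0.0 / 0.0` instead of returning NaN. The sketch below therefore produces NaN through other invalid operations and shows its distinctive comparison behaviour.

```python
import math

inf = math.inf
nan = inf - inf             # invalid operation: the result is NaN
print(math.isnan(nan))      # True
print(nan == nan)           # False: NaN compares unequal to everything, itself included
print(math.isnan(inf * 0))  # True
```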

Obfuscated Obfuscation is a computer-security technique that makes code unreadable to an outside person who does not hold the key to decipher it.

Octet An octet (byte) consists of 8 bits.

OpenMP Programming interface for shared-memory parallel computing, based on the fork-join model.

OR Logical OR, with truth table: 0 OR 0 = 0, 0 OR 1 = 1, 1 OR 0 = 1, 1 OR 1 = 1.

Recurrence order The order of a recurrence is the gap between the smallest and the largest index appearing in the recurrence relation; for example, $u_{n+2} = u_{n+1} + u_n$ has order 2. See the corresponding definition in the main text.

Overflow An overflow (overflow toward $\pm\infty$) occurs when the absolute value of a floating-point number is larger than the largest number representable in a given format.

Padding Technique that adds zeros in order to align a memory word.

PB MCA noise mode used to detect cancellations. See Section 2.6.1.

Process Program running on a machine. Unlike a thread, a process has its own memory.

Quantile Each of the values that divide an ordered dataset into parts of equal size.

Race condition Situation in which two threads compete to access a shared resource, at least one of them modifying it. Without a mutual-exclusion mechanism, this situation produces bugs that are hard to reproduce.

Root Solutions $x$ of the equation $f(x) = 0$ for a given function $f$.
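
Roots are usually computed iteratively. The Newton-Raphson method (see Lancaster's error analysis in the bibliography) is the classic example. The minimal sketch below assumes that $f$ is differentiable and that $f'$ does not vanish along the iterates. The stopping rule (relative correction below `rtol`) is one simple choice among several, and `newton` is a hypothetical helper.

```python
import math
from typing import Callable


def newton(f: Callable[[float], float], df: Callable[[float], float],
           x0: float, rtol: float = 1e-15, max_iter: int = 50) -> float:
    """Newton-Raphson iteration x_{k+1} = x_k - f(x_k) / f'(x_k).

    Stops when the correction is below rtol * |x| or after max_iter iterations.
    """
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) <= rtol * abs(x):
            break
    return x


root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root, math.sqrt(2.0))  # both approximate the root sqrt(2) of x**2 - 2
```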

RR Random rounding mode of MCA. See Section 2.6.1.

Semantics The semantics of a program is its expected behavior, that is, what the program actually computes. It describes all the possible states of the program.

Satisfiability A propositional formula is satisfiable if some assignment of its variables makes it true.
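
The definition can be checked directly on small formulas by trying every assignment. The sketch below does this for formulas in conjunctive normal form, with literals encoded as signed integers as in the DIMACS convention. It is exponential in the number of variables, which is why real SAT and SMT solvers rely on much smarter search. `brute_force_sat` is a hypothetical helper.

```python
from itertools import product
from typing import Dict, List, Optional

# A CNF formula is a list of clauses; literal k means variable k, -k means "not k".
Clause = List[int]


def brute_force_sat(clauses: List[Clause], n_vars: int) -> Optional[Dict[int, bool]]:
    """Return an assignment making every clause true, or None if unsatisfiable."""
    for values in product([False, True], repeat=n_vars):
        assignment = {i + 1: v for i, v in enumerate(values)}
        # A clause is true if at least one of its literals is true.
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return assignment
    return None


print(brute_force_sat([[1, 2], [-1]], 2))  # {1: False, 2: True}
print(brute_force_sat([[1], [-1]], 1))     # None: x1 and not x1 cannot both hold
```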

Spatial section Section of code containing several spatial units. The size of a unit depends on the spatial resolution used (function, source line, instruction).

M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis et al., Tensorflow : A system for large-scale machine learning, 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp.265-283, 2016.

R. Andraka, A survey of CORDIC algorithms for FPGA based computers, Proceedings of the 1998 ACM/SIGDA sixth international symposium on Field programmable gate arrays, pp.191-200, 1998.

H. Anzt, J. Dongarra, G. Flegar, N. J. Higham, and E. S. Quintana-Ortí, Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers, Concurrency and Computation : Practice and Experience, p.4460, 2017.

M. Arafa, B. Fahim, S. Kottapalli, A. Kumar, L. P. Looi et al., Cascade Lake : Next generation Intel Xeon scalable processor, IEEE Micro, vol.39, issue.2, pp.29-36, 2019.

J. Bajard, D. Michelucci, J. Moreau, and J. Muller, Introduction to the Special Issue "Real Numbers and Computers", The Journal of Universal Computer Science, pp.436-438, 1996.

E. T. Barr, T. Vo, V. Le, and Z. Su, Automatic detection of floating-point exceptions, ACM SIGPLAN Notices, vol.48, pp.549-560, 2013.

J. Batlle, J. Martí, P. Ridao, and J. Amat, A new FPGA/DSP-based parallel architecture for real-time image processing, Real-Time Imaging, vol.8, issue.5, pp.345-356, 2002.

H. Becker, E. Darulova, M. O. Myreen, and Z. Tatlock, Icing : Supporting fast-math style optimizations in a verified compiler, International Conference on Computer Aided Verification, pp.155-173, 2019.

P. Bénard, G. Lartigue, V. Moureau, and R. Mercier, Large-eddy simulation of the lean-premixed PRECCINSTA burner with wall heat loss, Proceedings of the Combustion Institute, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02130417

P. Bénard, G. Lartigue, V. Moureau, and R. Mercier, Large-eddy simulation of the lean-premixed PRECCINSTA burner with wall heat loss, Proceedings of the Combustion Institute, vol.37, pp.5233-5243, 2019.

F. Benz, A. Hildebrandt, and S. Hack, A dynamic program analysis to find floating-point accuracy problems, ACM SIGPLAN Notices, vol.47, pp.453-462, 2012.

J. Blythe, S. Jain, E. Deelman, Y. Gil, K. Vahi et al., Task scheduling strategies for workflow-based applications in grids, CCGrid 2005. IEEE International Symposium on Cluster Computing and the Grid, vol.2, pp.759-767, 2005.

S. Boldo, Preuves formelles en arithmétiques à virgule flottante, École normale supérieure (sciences), 2004.

S. Boldo, F. Clément, J. Filliâtre, M. Mayero, G. Melquiond et al., Wave equation numerical resolution : a comprehensive mechanized proof of a C program, Journal of Automated Reasoning, vol.50, issue.4, pp.423-456, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00649240

S. Boldo, J. Jourdan, X. Leroy, and G. Melquiond, Verified compilation of floating-point computations, Journal of Automated Reasoning, vol.54, issue.2, pp.135-163, 2015.
URL : https://hal.archives-ouvertes.fr/hal-00862689

S. Boldo and G. Melquiond, Flocq : A unified library for proving floating-point algorithms in Coq, 2011 IEEE 20th Symposium on Computer Arithmetic, pp.243-252, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00534854

L. Bottou, Large-scale machine learning with stochastic gradient descent, Proceedings of COMPSTAT'2010, pp.177-186, 2010.

D. Bruening, T. Garnett, and S. Amarasinghe, An infrastructure for adaptive dynamic optimization, International Symposium on Code Generation and Optimization, pp.265-275, 2003.

B. Buck and J. K. Hollingsworth, An API for runtime code patching, The International Journal of High Performance Computing Applications, vol.14, issue.4, pp.317-329, 2000.

C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds et al., Learning to rank using gradient descent, Proceedings of the 22nd International Conference on Machine learning (ICML-05), pp.89-96, 2005.

F. Cappello, A. Geist, W. Gropp, S. Kale, B. Kramer et al., Toward exascale resilience : 2014 update, Supercomputing frontiers and innovations, vol.1, issue.1, pp.5-28, 2014.

P. de Oliveira Castro, C. Akel, E. Petit, M. Popov, and W. Jalby, CERE : LLVM-based Codelet Extractor and REplayer for piecewise benchmarking and optimization, ACM Transactions on Architecture and Code Optimization (TACO), vol.12, issue.1, p.6, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01417214

R. Chandra, L. Dagum, D. Kohr, R. Menon, D. Maydan et al., 2001.

Y. Chatelain, P. de Oliveira Castro, E. Petit, D. Defour, J. Bieder et al., Veritracer : Context-enriched tracer for floating-point arithmetic analysis, IEEE 25th Symposium on Computer Arithmetic (ARITH), pp.61-68, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01989607

Y. Chatelain, E. Petit, P. de Oliveira Castro, G. Lartigue et al., Automatic exploration of reduced floating-point representations in iterative methods, European Conference on Parallel Processing, pp.481-494, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02564972

S. Cherubin, D. Cattaneo, M. Chiari, A. Di Bello, and G. Agosta, TAFFO : Tuning assistant for floating to fixed point optimization, IEEE Embedded Systems Letters, 2019.

W. Chiang, G. Gopalakrishnan, Z. Rakamaric, and A. Solovyev, Efficient search for inputs causing high floating-point errors, ACM SIGPLAN Notices, vol.49, pp.43-52, 2014.

A. Chorin, Numerical solution of the Navier-Stokes equations, Mathematics of computation, vol.22, issue.104, pp.745-762, 1968.

D. V. Chudnovsky and G. V. Chudnovsky, Approximations and complex multiplication according to Ramanujan, Pi : A Source Book, pp.596-622, 2004.

E. Chung, J. Fowers, K. Ovtcharov, M. Papamichael, A. Caulfield et al., Serving dnns in real time at datacenter scale with project brainwave, IEEE Micro, vol.38, issue.2, pp.8-20, 2018.

A. Cimatti, A. Griggio, B. J. Schaafsma, and R. Sebastiani, The MathSAT 5 SMT solver, International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp.93-107, 2013.

S. Collange, D. Defour, S. Graillat, and R. Iakymchuk, Full-speed deterministic bit-accurate parallel floating-point summation on multi-and many-core architectures, HAL-CCSD, 2014.

IEEE Standards Committee and American National Standards Institute, IEEE standard for binary floating-point arithmetic, ANSI/IEEE Std 754-1985, IEEE Computer Society, 1985.

IEEE Standards Committee, 754-2008 IEEE standard for floating-point arithmetic, p.517, 2008.

S. Cools, E. F. Yetkin, E. Agullo, L. Giraud, and W. Vanroose, Analysis of rounding error accumulation in Conjugate Gradients to improve the maximal attainable accuracy of pipelined CG, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01262716

M. J. Corden and D. Kreitzer, Consistency of floating-point results using the Intel compiler or why doesn't my application always give the same answer, 2009.

M. A. Cornea-Hasegan, R. A. Golliver, and P. Markstein, Correctness proofs outline for Newton-Raphson based floating-point divide and square root algorithms, Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No. 99CB36336), pp.96-105, 1999.

M. Courbariaux, J. David, and Y. Bengio, Low precision storage for deep learning, 2014.

P. Cousot and R. Cousot, Abstract interpretation : a unified lattice model for static analysis of programs by construction or approximation of fixpoints, Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, pp.238-252, 1977.

R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman et al., Efficiently computing static single assignment form and the control dependence graph, ACM Transactions on Programming Languages and Systems (TOPLAS), vol.13, issue.4, pp.451-490, 1991.

N. Damouche and M. Martel, Mixed precision tuning with Salsa, PECCS, pp.185-194, 2018.

E. Darulova, E. Horn, and S. Sharma, Sound mixed-precision optimization with rewriting, 2018 ACM/IEEE 9th International Conference on Cyber-Physical Systems (ICCPS), pp.208-219, 2018.

E. Darulova and V. Kuncak, Sound compilation of reals, ACM SIGPLAN Notices, vol.49, pp.235-248, 2014.

F. de Dinechin, L. Forget, J.-M. Muller, and Y. Uguen, Posits : the good, the bad and the ugly, Proceedings of the Conference for Next Generation Arithmetic, 2019.

F. de Dinechin, C. Q. Lauter, and G. Melquiond, Assisted verification of elementary functions using Gappa, Proceedings of the 2006 ACM symposium on Applied computing, pp.1318-1322, 2006.

L. H. de Figueiredo and J. Stolfi, Affine arithmetic : concepts and applications, Numerical Algorithms, vol.37, issue.1-4, pp.147-158, 2004.

L. De-moura and N. Bjørner, Z3 : An efficient SMT solver, International conference on Tools and Algorithms for the Construction and Analysis of Systems, pp.337-340, 2008.

J. Dean and S. Ghemawat, MapReduce : simplified data processing on large clusters, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.

D. Defour, Contribution au calcul sur GPU : considérations arithmétiques et architecturales. Habilitation à Diriger des Recherches, 2014.

D. Defour, FP-ANR : A representation format to handle floating-point cancellation at run-time, 2018 IEEE 25th Symposium on Computer Arithmetic (ARITH), pp.76-83, 2018.
URL : https://hal.archives-ouvertes.fr/lirmm-01549601

T. Dekker, A floating-point technique for extending the available precision, Numerische Mathematik, vol.18, issue.3, pp.224-242, 1971.

D. Delmas, E. Goubault, S. Putot, J. Souyris, K. Tekkal et al., Towards an industrial use of FLUCTUAT on safety-critical avionics software, International Workshop on Formal Methods for Industrial Critical Systems, pp.53-69, 2009.

J. Demmel and H. D. Nguyen, Fast reproducible floating-point summation, 2013 IEEE 21st Symposium on Computer Arithmetic, pp.163-172, 2013.

J. Demmel and H. D. Nguyen, Numerical reproducibility and accuracy at exascale, IEEE 21st Symposium on Computer Arithmetic, pp.235-237, 2013.

J. E. Dendy, Black box multigrid, Journal of Computational Physics, vol.48, issue.3, pp.366-386, 1982.

C. Denis, P. de Oliveira Castro, and E. Petit, Verificarlo : Checking floating point accuracy through Monte Carlo arithmetic, 2016 IEEE 23nd Symposium on Computer Arithmetic (ARITH), pp.55-62, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01417293

J. Detrey, Arithmétiques réelles sur FPGA : virgule fixe, virgule flottante et système logarithmique, 2007.

J. Detrey and F. de Dinechin, Opérateurs trigonométriques en virgule flottante sur FPGA, RenPar, vol.17, pp.96-105, 2006.

L. Djoudi, D. Barthou, P. Carribault, C. Lemuet, J. Acquaviva et al., Modular assembler quality analyzer and optimizer for itanium 2, The 4th Workshop on EPIC architectures and compiler technology, vol.200, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00141075

J. Dongarra, H. Meuer, and E. Strohmaier, Top 500 supercomputers. website, 2008.

H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, and D. Burger, Dark silicon and the end of multicore scaling, 38th Annual international symposium on computer architecture (ISCA), pp.365-376, 2011.

G. Even, P.-M. Seidel, and W. E. Ferguson, A parametric error analysis of Goldschmidt's division algorithm, Journal of Computer and System Sciences, vol.70, issue.1, pp.118-139, 2005.

F. Févotte and B. Lathuiliere, VERROU : a CESTAC evaluation without recompilation, p.47, 2016.

G. E. Forsythe, Reprint of a note on rounding-off errors, SIAM Review, vol.1, issue.1, p.66, 1959.

Message Passing Interface Forum, MPI : A Message-Passing Interface Standard, Version 3.1. High Performance Computing Center Stuttgart (HLRS), 2015.

R. Fourer, D. M. Gay, and B. W. Kernighan, AMPL : A modeling language for mathematical programming, 1993.

L. Fousse, G. Hanrot, V. Lefèvre, P. Pélissier, and P. Zimmermann, MPFR : A multiple-precision binary floating-point library with correct rounding, ACM Transactions on Mathematical Software (TOMS), vol.33, issue.2, p.13, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00070266

M. Frechtling and P. H. W. Leong, MCALIB : Measuring sensitivity to rounding error with Monte Carlo programming, ACM Transactions on Programming Languages and Systems (TOPLAS), vol.37, issue.2, 2015.

E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun et al., Open MPI : Goals, concept, and design of a next generation MPI implementation, European Parallel Virtual Machine/Message Passing Interface Users' Group Meeting, pp.97-104, Springer, 2004.

GNU Debugger (GDB), org/software/gdb, 2019.

D. Goldberg, What every computer scientist should know about floating-point arithmetic, ACM Computing Surveys (CSUR), vol.23, issue.1, pp.5-48, 1991.

M. Goldstein, Significance arithmetic on a digital computer, Communications of the ACM, vol.6, issue.3, pp.111-117, 1963.

X. Gonze and F. Jollet, Recent developments in the ABINIT software package, Computer Physics Communications, vol.205, pp.106-131, 2016.
URL : https://hal.archives-ouvertes.fr/cea-01849847

E. Goubault, Static analysis by abstract interpretation of numerical programs and systems, and fluctuat, International Static Analysis Symposium, pp.1-3, 2013.
URL : https://hal.archives-ouvertes.fr/cea-01834987

S. Graillat, F. Jézéquel, R. Picot, F. Févotte, and B. Lathuiliere, PROMISE : floating-point precision tuning with stochastic arithmetic, Proceedings of the 17th International Symposium on Scientific Computing, Computer Arithmetics and Verified Numerics (SCAN), pp.98-99, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01355005

S. Graillat, F. Jézéquel, R. Picot, F. Févotte, and B. Lathuilière, Auto-tuning for floating-point precision with discrete stochastic arithmetic, Journal of Computational Science, vol.36, p.101017, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01331917

S. Graillat and V. Ménissier-morain, Error-free transformations in real and complex floating point arithmetic, Proceedings of the International Symposium on Nonlinear Theory and its Applications, pp.341-344, 2007.
URL : https://hal.archives-ouvertes.fr/hal-01306229

A. Griewank, D. Juedes, and J. Utke, Algorithm 755 : ADOL-C : a package for the automatic differentiation of algorithms written in C/C++, ACM Transactions on Mathematical Software (TOMS), vol.22, issue.2, pp.131-167, 1996.

W. Gropp, E. Lusk, and A. Skjellum, Using MPI : portable parallel programming with the message-passing interface, vol.1, 1999.

W. Gropp and E. Lusk, User's guide for mpich, a portable implementation of MPI, 1996.

Posit Working Group, Posit standard documentation, 2018.

J. L. Gustafson and I. T. Yonemoto, Beating floating point at its own game : Posit arithmetic, Supercomputing Frontiers and Innovations, vol.4, issue.2, pp.71-86, 2017.

A. Haidar, S. Tomov, J. Dongarra, and N. J. Higham, Harnessing GPU Tensor Cores for fast fp16 arithmetic to speed up mixed-precision iterative refinement solvers, Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC '18, vol.47, pp.1-47, 2018.

J. Harrison, Floating point verification in HOL Light : the exponential function, 1997.

J. Harrison, A machine-checked theory of floating point arithmetic, International Conference on Theorem Proving in Higher Order Logics, pp.113-130, 1999.

J. Harrison, HOL light : An overview, International Conference on Theorem Proving in Higher Order Logics, pp.60-66, 2009.

L. Hascoet and V. Pascual, The tapenade automatic differentiation tool : Principles, model, and specification, ACM Transactions on Mathematical Software (TOMS), vol.39, issue.3, p.20, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00695839

M. C. Herbordt, T. VanCourt, Y. Gu, B. Sukhwani et al., Achieving high performance with FPGA-based computing, Computer, vol.40, issue.3, pp.50-57, 2007.

N. J. Higham, Accuracy and stability of numerical algorithms, vol.80, 2002.

N. Ho, E. Manogaran, W. Wong, and A. Anoosheh, Efficient floating point precision tuning for approximate computing, 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), pp.63-68, 2017.

G. Huet, G. Kahn, and C. Paulin-Mohring, 2002.

R. Iakymchuk, S. Collange, D. Defour, and S. Graillat, ExBLAS : Reproducible and accurate BLAS library, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01140280

Intel, 2008.

Intel Corporation, Intel64 and IA-32 Architectures Software Developer's Manual, vol.1, 2019.

International Union of Pure and Applied Chemistry (IUPAC), and International Union of Pure and Applied Physics (IUPAP). ISO and OIML : The international vocabulary of metrology-basic and general concepts and associated terms (VIM), Bureau International des Poids et Mesures (BIPM), 2012.

A. Ioualalen and M. Martel, Sardana : an automatic tool for numerical accuracy optimization, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00698619

F. Jézéquel and J. Chesneaux, CADNA : a library for estimating round-off error propagation, Computer Physics Communications, vol.178, issue.12, pp.933-955, 2008.

N. P. Jouppi, C. Young, N. Patil, D. Patterson et al., In-datacenter performance analysis of a tensor processing unit, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp.1-12, 2017.

W. Kahan, Pracniques : further remarks on reducing truncation errors, Communications of the ACM, vol.8, issue.1, p.40, 1965.

W. Kahan, A logarithm too clever by half, 2004.

D. Kalamkar, D. Mudigere, N. Mellempudi, D. Das, K. Banerjee et al., A study of bfloat16 for deep learning training, 2019.

A. A. Khokhar, V. K. Prasanna, M. E. Shaaban et al., Heterogeneous computing : Challenges and opportunities, Computer, vol.26, issue.6, pp.18-27, 1993.

D. E. Knuth, The Art of Computer Programming, Seminumerical algorithms, vol.2, 2014.

U. Köster, T. Webb, X. Wang, M. Nassar, A. K. Bansal et al., Flexpoint : An adaptive numerical format for efficient training of deep neural networks, Advances in neural information processing systems, pp.1742-1752, 2017.

E. V. Krishnamurthy, On optimal iterative schemes for high-speed division, IEEE Transactions on Computers, vol.100, issue.3, pp.227-231, 1970.

U. Kulisch and V. Snyder, The exact dot product as basic tool for long interval arithmetic, Computing, vol.91, issue.3, pp.307-313, 2011.

H. T. Kung, Let's design algorithms for VLSI systems, 1979.

M. O. Lam, J. K. Hollingsworth, B. R. de Supinski, and M. P. LeGendre, Automatically adapting programs for mixed-precision floating-point computation, Proceedings of the 27th international ACM conference on International conference on supercomputing, pp.369-378, 2013.

M. O. Lam, J. K. Hollingsworth, and G. W. Stewart, Dynamic floating-point cancellation detection, Parallel Computing, vol.39, issue.3, pp.146-155, 2013.

P. Lancaster, Error analysis for the Newton-Raphson method, Numerische Mathematik, vol.9, issue.1, pp.55-68, 1966.

P. Langlois, R. Nheili, and C. Denis, Recovering numerical reproducibility in hydrodynamic simulations, 2016 IEEE 23nd Symposium on Computer Arithmetic (ARITH), pp.63-70, 2016.
URL : https://hal.archives-ouvertes.fr/lirmm-01274671

G. Lartigue, U. Meier, and C. Bérat, Experimental and numerical investigation of self-excited combustion oscillations in a scaled gas turbine combustor, Applied thermal engineering, vol.24, pp.1583-1592, 2004.

C. Lattner and V. Adve, LLVM : A compilation framework for lifelong program analysis & transformation, Proceedings of the international symposium on Code generation and optimization : feedback-directed and runtime optimization, p.75, 2004.

C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, Basic linear algebra subprograms for Fortran usage, 1977.

M. Leeser, S. Mukherjee, J. Ramachandran, and T. Wahl, Make it real : effective floating-point reasoning via exact arithmetic, Proceedings of the conference on Design, p.117, 2014.

X. Leroy, The Compcert verified compiler. Documentation and user's manual. INRIA Paris-Rocquencourt, vol.53, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01399482

Google, Using bfloat16 with TensorFlow models, 2019.

DragonEgg - Using LLVM as a GCC backend, LLVM, 2014.

C. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser et al., Pin : building customized program analysis tools with dynamic instrumentation, ACM SIGPLAN Notices, vol.40, pp.190-200, 2005.

Commissariat à l'énergie atomique et aux énergies alternatives (CEA), 2016.

V. Magron, G. Constantinides, and A. Donaldson, Certified roundoff error bounds using semidefinite programming, ACM Transactions on Mathematical Software (TOMS), vol.43, issue.4, p.34, 2017.

M. Malandain, Massively parallel simulation of low-Mach number turbulent flows. Theses, INSA de Rouen, 2013.
URL : https://hal.archives-ouvertes.fr/tel-00801502

M. Malandain, N. Maheu, and V. Moureau, Optimization of the deflated Conjugate Gradient algorithm for the solving of elliptic equations on massively parallel machines, Journal of Computational Physics, vol.238, pp.32-47, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01657525

N. Manjikian and T. Abdelrahman, Array data layout for the reduction of cache conflicts, Proceedings of the 8th International Conference on Parallel and Distributed Computing Systems, pp.1-8, 1995.

P. Markstein, Software division and square root using Goldschmidt's algorithms, Proceedings of the 6th Conference on Real Numbers and Computers (RNC'6), vol.123, pp.146-157, 2004.

C. Maxfield, Application-Specific Integrated Circuits (ASICs), Bebop to the Boolean Boogie, pp.235-249, 2009.

H. Menon, M. Lam, D. Kuffour, M. Schordan, K. Llyod et al., ADAPT : Algorithmic differentiation for floating-point precision tuning, 2018.

P. Micikevicius, S. Narang, J. Alben, G. Diamos, E. Elsen et al., 2017.

S. Mittal and J. S. Vetter, A survey of CPU-GPU heterogeneous computing techniques, ACM Computing Surveys (CSUR), vol.47, issue.4, p.69, 2015.

J. Monaghan, L. Trouche, and J. M. Borwein, Tools and mathematics, 2016.

D. Monniaux, The pitfalls of verifying floating-point computations, ACM Transactions on Programming Languages and Systems (TOPLAS), vol.30, issue.3, p.12, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00128124

R. E. Moore, Methods and applications of interval analysis, 1979.

V. Moureau, P. Domingo, and L. Vervisch, Design of a massively parallel CFD code for complex geometries, Comptes Rendus Mécanique, vol.339, pp.141-148, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01672172

J. Muller, N. Brisebarre, F. de Dinechin, C. Jeannerod, V. Lefèvre et al., Handbook of floating-point arithmetic, 2010.
URL : https://hal.archives-ouvertes.fr/ensl-00379167

Z. Navabi, VHDL : Analysis and modeling of digital systems, 1997.

N. S. Nedialkov, V. Kreinovich, and S. A. Starks, Interval arithmetic, affine arithmetic, Taylor series methods : why, what next ? Numerical Algorithms, vol.37, pp.325-336, 2004.

N. Nethercote and J. Seward, Valgrind : a framework for heavyweight dynamic binary instrumentation, ACM SIGPLAN Notices, vol.42, pp.89-100, 2007.

J. Newsome and D. Song, Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software, 2005.

R. A. Nicolaides, Deflation of conjugate gradients with applications to boundary value problems, SIAM Journal on Numerical Analysis, vol.24, issue.2, pp.355-365, 1987.

A. Nötzli and F. Brown, LifeJacket : verifying precise floating-point optimizations in LLVM, Proceedings of the 5th ACM SIGPLAN International Workshop on State Of the Art in Program Analysis, pp.24-29, 2016.

T. Ogita, S. M. Rump, and S. Oishi, Accurate sum and dot product, SIAM Journal on Scientific Computing, vol.26, issue.6, pp.1955-1988, 2005.

P. Osmialowski, How the flang frontend works, Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC 2017), 2017.

G. Paganelli and W. Ahrendt, Verifying (in-) stability in floating-point programs by increasing precision, using SMT solving, 15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, pp.209-216, 2013.

P. Panchekha, A. Sanchez-Stern, J. R. Wilcox, and Z. Tatlock, Automatically improving accuracy for floating point expressions, ACM SIGPLAN Notices, vol.50, pp.1-11, 2015.

D. S. Parker, B. Pierce, and P. R. Eggert, Monte Carlo arithmetic : how to gamble with floating point and win, Computing in Science & Engineering, vol.2, issue.4, p.58, 2000.

D. S. Parker, Monte Carlo Arithmetic : exploiting randomness in floating-point arithmetic, 1997.

K. Persson, Materials data on BaTiO3 (sg :99) by Materials Project, vol.7, 2014.

E. Petit, Vers un partitionnement automatique d'applications en codelets spéculatifs pour les systèmes hétérogènes à mémoires distribuées, 2009.

C. D. Pierce and P. Moin, Progress-variable approach for large-eddy simulation of nonpremixed turbulent combustion, Journal of Fluid Mechanics, vol.504, pp.73-97, 2004.

M. Popov, C. Akel, W. Jalby, and P. Castro, Piecewise holistic autotuning of compiler and runtime parameters, European Conference on Parallel Processing, pp.238-250, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01417211

J. Porter, Google employee calculates pi to record 31 trillion digits : But remember, only 40 or so of them are actually useful. Web site, 2019.

M. M. Resch and E. Gabriel, Supercomputers in grids, Cloud, Grid and High Performance Computing : Emerging Applications, pp.1-9, 2011.

N. Revol, Introduction à l'arithmétique par intervalles. Rapport de recherche RR-4297, INRIA, 2001.

C. Rubio-González, C. Nguyen, B. Mehne, K. Sen, J. Demmel et al., Floating-point precision tuning using blame analysis, Proceedings of the 38th International Conference on Software Engineering, pp.1074-1085, 2016.

C. Rubio-González, C. Nguyen, H. D. Nguyen, J. Demmel, W. Kahan et al., Precimonious : Tuning assistant for floating-point precision, SC'13 : Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp.1-12, 2013.

A. Sanchez-Stern, P. Panchekha, S. Lerner, and Z. Tatlock, Finding root causes of floating point error with Herbgrind, 2017.

G. Saporta, Probabilités, analyse des données et statistique. Editions Technip, 2006.

G. Sawaya, M. Bentley, I. Briggs, G. Gopalakrishnan, and D. H. Ahn, FLiT : Cross-platform floating-point result-consistency tester and workload, 2017 IEEE international symposium on workload characterization (IISWC), pp.229-238, 2017.

E. Schkufza, R. Sharma, and A. Aiken, Stochastic optimization of floating-point programs with tunable precision, ACM SIGPLAN Notices, vol.49, pp.53-64, 2014.

K. Schloegel, G. Karypis, and V. Kumar, Graph partitioning for high performance scientific simulations. Army High Performance Computing Research Center, 2000.

E. J. Schwartz, T. Avgerinos, and D. Brumley, All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask), 2010 IEEE Symposium on Security and Privacy (SP), pp.317-331, 2010.

A. Shamir, A survey on mesh segmentation techniques, vol.27, pp.1539-1556, 2008.

S. S. Shapiro and M. B. Wilk, An analysis of variance test for normality (complete samples), Biometrika, vol.52, issue.3/4, pp.591-611, 1965.

J. E. Smith, Decoupled access/execute computer architectures, ACM SIGARCH Computer Architecture News, vol.10, pp.112-119, 1982.

S. W. Smith, The scientist and engineer's guide to digital signal processing, 1997.

D. Sohier, P. Castro, F. Févotte, B. Lathuilière, E. Petit et al., Confidence intervals for stochastic arithmetic, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01827319

A. Solovyev, M. S. Baranowski, I. Briggs, C. Jacobsen, Z. Rakamarić et al., Rigorous estimation of floating-point round-off errors with symbolic taylor expansions, ACM Transactions on Programming Languages and Systems (TOPLAS), vol.41, issue.1, p.20, 2018.

M. Souza, M. Borges, M. Amorim, and C. S. Păsăreanu, CORAL : solving complex constraints for symbolic pathfinder, NASA Formal Methods Symposium, pp.359-374, 2011.

N. Stephens, S. Biles, M. Boettcher, J. Eapen, M. Eyole et al., The ARM scalable vector extension, IEEE Micro, vol.37, issue.2, pp.26-39, 2017.

B. Stevens and S. Bony, What are climate models missing ?, Science, vol.340, issue.6136, pp.1053-1054, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01109028

G. Tagliavini, S. Mach, D. Rossi, A. Marongiu, and L. Benini, A transprecision floating-point platform for ultra-low power computing, Design, Automation & Test in Europe Conference & Exhibition (DATE), pp.1051-1056, 2018.

A. F. Tenca and M. D. Ercegovac, A variable long-precision arithmetic unit design for reconfigurable coprocessor architectures, Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No. 98TB100251), pp.216-225, 1998.

D. Thomas and P. Moorby, The Verilog® Hardware Description Language, 2008.

A. Tisserand, P. Marchal, and C. Piguet, An on-line arithmetic based FPGA for low-power custom computing, International Workshop on Field Programmable Logic and Applications, pp.264-273, 1999.

J. Tissot, Sur la décomposition ANOVA et l'estimation des indices de Sobol'. Application à un modèle d'écosystème marin, 2012.

L. Titolo, M. A. Feliú, M. Moscato, and C. A. Muñoz, An abstract interpretation framework for the round-off error analysis of floating-point programs, International Conference on Verification, Model Checking, and Abstract Interpretation, pp.516-537, Springer, 2018.

H. D. Tiwari, G. Gankhuyag, C. M. Kim, and Y. B. Cho, Multiplier design based on ancient Indian Vedic Mathematics, International SoC Design Conference, vol.2, p.65, 2008.

J. D. Ullman, NP-complete scheduling problems, Journal of Computer and System Sciences, vol.10, issue.3, pp.384-393, 1975.

M. L. Van De Vanter, D. E. Post, and M. E. Zosel, HPC needs a tool strategy, Proceedings of the second international workshop on Software engineering for high performance computing system applications, pp.55-59, 2005.

V. Vassiliadis, J. Riehme, J. Deussen, K. Parasyris, and C. D. Antonopoulos, Towards automatic significance analysis for approximate computing, IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp.182-193, 2016.

J. Vignes and M. La Porte, Error analysis in computing, 1974.

N. Whitehead and A. Fit-Florea, Precision & performance : Floating point and IEEE 754 compliance for NVIDIA GPUs, 2011.

J. H. Wilkinson, Rounding errors in algebraic processes, Courier Corporation, 1994.

T. J. Ypma, Historical development of the Newton-Raphson method, SIAM Review, vol.37, issue.4, pp.531-551, 1995.

A. Zeller, Yesterday, my program worked. Today, it does not. Why ?, ACM SIGSOFT Software engineering notes, vol.24, pp.253-267, 1999.