I. Bediaga, H. Chanal, P. Hopchev, S. Cadeddu, S. Stoica et al., Framework TDR for the LHCb upgrade, tech. rep, 2012.

R. Aaij, M. Fontana, R. Le Gac, E. A. Zacharjasz, R. Schwemmer et al., Upgrade trigger : Biannual performance update, 2017.

G. Guennebaud and B. Jacob, Eigen v3, 2016.

S. Tomov, R. Nath, P. Du, and J. Dongarra, MAGMA, matrix algebra on GPU and multicore architectures

Intel Corporation, Intel(R) Math Kernel Library (MKL)

I. Masliah, A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin et al., High-performance matrix-matrix multiplications of very small matrices, European Conference on Parallel Processing, pp.659-671, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01409286

J. Shin, M. W. Hall, J. Chame, C. Chen, and P. D. Hovland, Autotuning and specialization : Speeding up matrix multiply for small matrices with compiler technology, Software Automatic Tuning, pp.353-370, 2011.

A. Heinecke, G. Henry, M. Hutchinson, and H. Pabst, LIBXSMM : accelerating small matrix multiplications by runtime code generation, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, p.84, 2016.

X. Tian, H. Saito, S. V. Preis, E. N. Garcia, S. S. Kozhukhov et al., Effective SIMD vectorization for Intel Xeon Phi coprocessors, Scientific Programming, vol.2015, pp.1-14, 2015.

D. G. Spampinato and M. Püschel, A basic linear algebra compiler, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, p.23, 2014.

A. R. Terán, L. Lacassagne, A. H. Zahraee, and M. Gouiffes, Real-time covariance tracking algorithm for embedded systems, Design and Architectures for Signal and Image Processing (DASIP), 2013 Conference on, pp.104-111, 2013.

F. Laguzet, A. Romero, M. Gouiffès, L. Lacassagne, and D. Etiemble, Color tracking with contextual switching : real-time implementation on CPU, Journal of Real-Time Image Processing, vol.10, issue.2, pp.403-422, 2015.

R. Allen and K. Kennedy, Optimizing compilers for modern architectures : a dependence-based approach, 2002.

A. Cohen, M. Sigler, S. Girbal, O. Temam, D. Parello et al., Facilitating the search for compositions of program transformations, Proceedings of the 19th annual international conference on Supercomputing, pp.151-160, 2005.
URL : https://hal.archives-ouvertes.fr/hal-01257296

M. J. Flynn, Some computer organizations and their effectiveness, IEEE Transactions on Computers, issue.21, pp.948-960, 1972.

P. Estérie, J. Falcou, M. Gaunard, and J. Lapresté, Boost.SIMD : Generic programming for portable SIMDization, Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing, WPMVP '14, pp.1-8, 2014.

T. Ewart, F. Delalondre, and F. Schürmann, Cyme : A library maximizing SIMD computation on user-defined containers, Proceedings of the 29th International Conference on Supercomputing, vol.8488, pp.440-449, 2014.

Header-only zero-overhead C++ wrapper for SIMD intrinsics of multiple instruction sets, pp.7-9

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, and D. Barthou, An efficient, portable and generic library for successive cancellation decoding of polar codes, International Workshop on Languages and Compilers for Parallel Computing, pp.303-317, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01203105

P. Karpiński and J. McDonald, A high-performance portable abstract interface for explicit SIMD vectorization, Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM'17, pp.21-28, 2017.

M. Kretz and V. Lindenstruth, Vc : A C++ library for explicit vectorization, Software : Practice and Experience, vol.42, issue.11, pp.1409-1430, 2012.

A. Fog, C++ vector class library, 2017.

D. Khaldi, P. Jouvelot, C. Ancourt, and F. Irigoin, Task parallelism and data distribution : An overview of explicit parallel programming languages, International Workshop on Languages and Compilers for Parallel Computing, pp.174-189, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00742536

M. Pharr and W. R. Mark, ispc : A SPMD compiler for high-performance CPU programming, Innovative Parallel Computing (InPar), pp.1-13, 2012.

S. Wienke, P. Springer, C. Terboven, and D. an Mey, OpenACC - first experiences with real-world applications, European Conference on Parallel Processing, pp.859-870, 2012.

A. Munshi, The OpenCL specification, Hot Chips 21 Symposium (HCS), pp.1-314, 2009.

L. Dagum and R. Menon, OpenMP : an industry standard API for shared-memory programming, Computational Science & Engineering, IEEE, vol.5, issue.1, pp.46-55, 1998.

IEEE standard for information technology - portable operating system interface (POSIX(R)) base specifications, issue 7, IEEE Std, pp.1-3951, 2018.

Intel Corporation, Intel(R) C++ Compiler 18.0 Developer Guide and Reference, 2018.

Programming languages - C, standard, International Organization for Standardization, 2011.

G. H. Barnes, R. M. Brown, M. Kato, D. J. Kuck, D. L. Slotnick et al., The ILLIAC IV computer, IEEE Transactions on computers, vol.100, issue.8, pp.746-757, 1968.

J. Watson, The Texas Instruments Advanced Scientific Computer, International Workshop on (AFIPS), vol.00, p.1899

C. J. Purcell, The Control Data STAR-100 : performance measurements, Proceedings of the May 6-10, 1974, National Computer Conference and Exposition, pp.385-387, 1974.

R. M. Russell, The CRAY-1 computer system, Communications of the ACM, vol.21, issue.1, pp.63-72, 1978.

S. F. Reddaway, DAP-a distributed array processor, ACM SIGARCH Computer Architecture News, vol.2, pp.61-65, 1973.

W. D. Hillis, The Connection Machine, 1989.

R. J. Fisher and H. G. Dietz, Compiling for SIMD within a register, International Workshop on Languages and Compilers for Parallel Computing, pp.290-305, 1998.

IEEE standard for binary floating-point arithmetic, ANSI/IEEE Std 754-1985, 1985.

N. Stephens, S. Biles, M. Boettcher, J. Eapen, M. Eyole et al., The ARM Scalable Vector Extension, IEEE Micro, vol.37, issue.2, pp.26-39, 2017.

NEC, NEC SX-Aurora TSUBASA Vector Engine, 2018.

D. M. Carlson, D. C. Sullivan, R. E. Bach, and D. R. Resnick, The ETA10 liquid-nitrogen-cooled supercomputer system, IEEE Transactions on Electron Devices, vol.36, issue.8, pp.1404-1413, 1989.

T. Watanabe, NEC SX-3 supercomputer system, Supercomputers and Their Performance in Computational Fluid Dynamics, pp.63-75, 1993.

J. Palmer and G. Steele, Connection Machine model CM-5 system overview, Frontiers of Massively Parallel Computation, 1992., Fourth Symposium on the, pp.474-483, 1992.

J. J. Dongarra, P. Luszczek, and A. Petitet, The LINPACK benchmark : past, present and future, Concurrency and Computation : Practice and Experience, vol.15, pp.803-820, 2003.

J. D. McCalpin, A survey of memory bandwidth and machine balance in current high performance computers, IEEE TCCA Newsletter, vol.19, p.25, 1995.

J. Abel, K. Balasubramanian, M. Bargeron, T. Craver, and M. Phlipot, Applications tuning for streaming SIMD extensions, Intel Technology Journal, vol.2, 1999.

U. Drepper, What every programmer should know about memory, 2007.

R. Leißa, I. Haffner, and S. Hack, Sierra : A SIMD extension for C++, Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing, WPMVP '14, pp.17-24, 2014.

D. Callahan, S. Carr, and K. Kennedy, Improving register allocation for subscripted variables, ACM Sigplan Notices, vol.25, issue.6, pp.53-65, 1990.

F. Irigoin, P. Jouvelot, and R. Triolet, Semantical interprocedural parallelization : An overview of the PIPS project, ACM International Conference on Supercomputing 25th Anniversary Volume, pp.143-150, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00984684

C. Bastoul, Code generation in the polyhedral model is easier than you think, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, pp.7-16, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00017260

S. Pop, A. Cohen, C. Bastoul, S. Girbal, G. Silber et al., GRAPHITE : Polyhedral analyses and optimizations for GCC, Proceedings of the 2006 GCC Developers Summit, 2006.

K. Trifunovic, A. Cohen, D. Edelsohn, F. Li, T. Grosser et al., Graphite two years after : First lessons learned from real-world polyhedral compilation, GCC Research Opportunities Workshop (GROW'10), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00551516

D. Bernstein, M. Rodeh, and I. Gertner, On the complexity of scheduling problems for parallel/pipelined machines, IEEE Transactions on Computers, vol.38, pp.1308-1313, 1989.

L. Lacassagne, D. Etiemble, A. Hassan-zahraee, A. Dominguez, and P. Vezolle, High level transforms for SIMD and low-level computer vision algorithms, Workshop on Programming Models for SIMD/Vector Processing (WPMVP, associated with ACM PPoPP), pp.49-56, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01094906

D. Goldberg, What every computer scientist should know about floating-point arithmetic, ACM Comput. Surv, vol.23, pp.5-48, 1991.

J. Muller, N. Brisebarre, F. de Dinechin, C. Jeannerod, V. Lefèvre et al., Handbook of floating-point arithmetic, 2010.
URL : https://hal.archives-ouvertes.fr/ensl-00379167

N. Revol and P. Théveny, Numerical reproducibility and parallel computations : Issues for interval algorithms, CoRR, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00916931

F. Jézéquel, P. Langlois, and N. Revol, First steps towards more numerical reproducibility, ESAIM : Proceedings, vol.45, pp.229-238, 2014.

L. Fousse, G. Hanrot, V. Lefèvre, P. Pélissier, and P. Zimmermann, MPFR : A multiple-precision binary floating-point library with correct rounding, ACM Trans. Math. Softw, vol.33, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00103655

N. Revol and F. Rouillier, Motivations for an arbitrary precision interval arithmetic and the MPFI library, Reliable computing, vol.11, issue.4, pp.275-290, 2005.
URL : https://hal.archives-ouvertes.fr/inria-00100985

W. Kahan, Pracniques : Further remarks on reducing truncation errors, Commun. ACM, vol.8, p.40, 1965.

A. Fog, Instruction tables : Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs, 2018.

P. Soderquist and M. Leeser, Area and performance tradeoffs in floating-point divide and square-root implementations, ACM Comput. Surv, vol.28, pp.518-564, 1996.

C. Lomont, Fast inverse square root, tech. rep, 2003.

Intel Corporation, Intel(R) 64 and IA-32 Architectures Optimization Reference Manual, 2018.

B. Barrois, O. Sentieys, and D. Menard, The hidden cost of functional approximation against careful data sizing : a case study, Proceedings of the Conference on Design, Automation & Test in Europe, pp.181-186, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01423147

P. Sebah and X. Gourdon, Newton's method and high order iterations, 2001.

V. Y. Pan, Methods of computing values of polynomials, Russian Mathematical Surveys, vol.21, issue.1, pp.105-136, 1966.

M. Bourgoin, E. Chailloux, and J. L. Lamotte, SPOC : GPGPU programming through stream processing with OCaml, Parallel Processing Letters, vol.22, p.1240007, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00697257

Jinja2, Python template engine

I. Masliah, M. Baboulin, and J. Falcou, Metaprogramming dense linear algebra solvers applications to multi and many-core architectures, Trustcom/BigDataSE/ISPA, 2015 IEEE, vol.3, pp.69-76, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01221358

P. Kocher, D. Genkin, D. Gruss, W. Haas, M. Hamburg et al., Spectre attacks : Exploiting speculative execution, 2018.

M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas et al., Meltdown, 2018.

Intel Corporation, Intel(R) 64 and IA-32 Architectures Software Developer's Manual, 2018.

D. M. Tullsen, S. J. Eggers, and H. M. Levy, Simultaneous multithreading : Maximizing on-chip parallelism, Proceedings 22nd Annual International Symposium on Computer Architecture, pp.392-403, 1995.

F. Jézéquel and J. Chesneaux, CADNA : a library for estimating round-off error propagation, Computer Physics Communications, vol.178, issue.12, pp.933-955, 2008.

N. J. Higham, Cholesky factorization, Wiley Interdisciplinary Reviews : Computational Statistics, vol.1, issue.2, pp.251-254, 2009.

G. H. Golub and C. F. Van Loan, Matrix computations, vol.3, 2012.

K. Aneja, F. Laguzet, L. Lacassagne, and A. Merigot, Video rate image segmentation by means of region splitting and merging, IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2009.

L. Cabaret, L. Lacassagne, and D. Etiemble, Parallel Light Speed Labeling : an efficient connected component labeling algorithm for multi-core processors, IEEE International Conference on Image Processing (ICIP), pp.1-4, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01243310

N. J. Higham, Accuracy and stability of numerical algorithms, 2002.

J. J. Du Croz and N. J. Higham, Stability of methods for matrix inversion, IMA Journal of Numerical Analysis, vol.12, issue.1, pp.1-19, 1992.

Spiral : Software/hardware generation for DSP algorithms

Automatically Tuned Linear Algebra Software (ATLAS)

T. Dong, A. Haidar, P. Luszczek, J. A. Harris, S. Tomov et al., LU factorization of small matrices : accelerating batched DGETRF on the GPU, IEEE 11th Intl Conf on Embedded Software and Syst (HPCC, CSS, ICESS), pp.157-160, 2014.

T. Dong, A. Haidar, S. Tomov, and J. Dongarra, A fast batched Cholesky factorization on a GPU, Parallel Processing (ICPP), 2014 43rd International Conference on, pp.432-440, 2014.

R. E. Kalman, A new approach to linear filtering and prediction problems, Journal of Basic Engineering, vol.82, issue.1, pp.35-45, 1960.

M. S. Grewal, Kalman filtering, International Encyclopedia of Statistical Science, pp.705-708, 2011.

E. Wolin and L. Ho, Covariance matrices for track fitting with the Kalman filter, Nuclear Instruments and Methods in Physics Research Section A : Accelerators, Spectrometers, Detectors and Associated Equipment, vol.329, issue.3, pp.493-500, 1993.

A. Romero, M. Gouiffès, and L. Lacassagne, Enhanced Local Binary Covariance Matrices (ELBCM) for texture analysis and object tracking, ACM International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01805045

D. Reid, An algorithm for tracking multiple targets, IEEE Transactions on Automatic Control, vol.24, issue.6, pp.843-854, 1979.

D. Beymer, P. McLauchlan, B. Coifman, and J. Malik, A real-time computer vision system for measuring traffic parameters, Computer Vision and Pattern Recognition, pp.495-501, 1997.

R. G. Brown and P. Y. Hwang, Introduction to random signals and applied Kalman filtering, vol.3, 1992.

A. Mohamed and K. Schwarz, Adaptive Kalman filtering for INS/GPS, Journal of Geodesy, vol.73, issue.4, pp.193-203, 1999.

J. Sasiadek and Q. Wang, Sensor fusion based on fuzzy Kalman filtering for autonomous robot vehicle, Proceedings. 1999 IEEE International Conference on, vol.4, pp.2970-2975, 1999.

A. C. Harvey, Forecasting, structural time series models and the Kalman filter, 1990.

M. A. Palis and D. K. Krecker, Parallel Kalman filtering on the connection machine, Frontiers of Massively Parallel Computation, pp.55-58, 1990.

S. Li, D. C. Wunsch, E. O'Hair, and M. G. Giesselmann, Extended Kalman filter training of neural networks on a SIMD parallel machine, Journal of Parallel and Distributed Computing, vol.62, issue.4, pp.544-562, 2002.

M. Huang, S. Wei, B. Huang, and Y. Chang, Accelerating the Kalman filter on a GPU, 2011 IEEE 17th International Conference on Parallel and Distributed Systems, pp.1016-1020, 2011.

T. Hu and M. Shing, Computation of matrix chain products. Part I, SIAM Journal on Computing, vol.11, issue.2, pp.362-373, 1982.

T. Hu and M. Shing, Computation of matrix chain products. Part II, SIAM Journal on Computing, vol.13, issue.2, pp.228-251, 1984.

G. J. Chaitin, Register allocation & spilling via graph coloring, ACM Sigplan Notices, vol.17, pp.98-105, 1982.

A. B. Kahn, Topological sorting of large networks, Commun. ACM, vol.5, pp.558-562, 1962.

R. Frühwirth, Application of Kalman filtering to track and vertex fitting, Nuclear Instruments and Methods in Physics Research Section A : Accelerators, Spectrometers, Detectors and Associated Equipment, vol.262, issue.2, pp.444-450, 1987.

B. D. Anderson and J. B. Moore, Optimal filtering, vol.21, 1979.

H. E. Rauch, C. Striebel, and F. Tung, Maximum likelihood estimates of linear dynamic systems, AIAA journal, vol.3, issue.8, pp.1445-1450, 1965.

P. Billoir, Track fitting with multiple scattering : A new method, Nuclear Instruments and Methods in Physics Research, vol.225, issue.2, pp.352-366, 1984.

P. Kaminski, A. Bryson, and S. Schmidt, Discrete square root filtering : A survey of current techniques, IEEE Transactions on Automatic Control, vol.16, pp.727-736, 1971.

D. H. Cámpora Pérez, LHCb Kalman Filter cross architecture studies, Journal of Physics : Conference Series, vol.898, issue.3, p.32052, 2017.

D. H. Cámpora Pérez and O. Awile, An efficient low-rank Kalman filter for modern SIMD architectures, Concurrency and Computation : Practice and Experience, p.4483, 2018.

G. Cerati, P. Elmer, S. Krutelyov, S. Lantz, M. Lefebvre et al., Kalman filter tracking on parallel architectures, Journal of Physics : Conference Series, vol.898, issue.4, p.42051, 2017.

S. Gorbunov, U. Kebschull, I. Kisel, V. Lindenstruth, and W. Müller, Fast SIMDized Kalman filter based track fit, Computer Physics Communications, vol.178, issue.5, pp.374-383, 2008.

F. Lemaitre and L. Lacassagne, Batched Cholesky factorization for tiny matrices, Design and Architectures for Signal and Image Processing (DASIP), 2016 Conference on, pp.130-137, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01361204

F. Lemaitre, B. Couturier, and L. Lacassagne, Cholesky factorization on SIMD multi-core architectures, Journal of Systems Architecture, vol.79, pp.1-15, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01550129

F. Lemaitre, B. Couturier, and L. Lacassagne, Small SIMD matrices for CERN high throughput computing, Workshop on Programming Models for SIMD/Vector Processing (WPMVP, associated with ACM PPoPP), p.1, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01760260

Glossary

Precision: Amount of information used to represent the value (also called [...]

Python: Programming language.

RAM: Random Access Memory.

SIMD: Single Instruction Multiple Data: the same operation applied in parallel to several different data elements.

SMT: Simultaneous Multithreading: technique that runs several threads at the same time on a single core, the threads thus sharing its functional units.

SoA: Structure of Arrays: memory layout of the structure-of-arrays kind.

SSE: SIMD instruction set for x86. Width: 128 bits.

SVE: SIMD instruction set for ARM. Width: between 128 and 2048 bits.

SWAR: SIMD Within A Register: architecture providing instructions that use a register as a vector of several smaller elements.

Thread: Thread of execution.

UME::SIMD: C++ library for writing SIMD code.

Vc: C++ library for writing SIMD code.

VCL: C++ library for writing SIMD code.

VSX: SIMD instruction set for Power. Width: 128 bits. Extension of the Altivec instruction set.

x86: Hardware architecture introduced by Intel.