Framework TDR for the LHCb upgrade, tech. rep, 2012. ,
Upgrade trigger : Biannual performance update, 2017. ,
Eigen v3, 2016. ,
MAGMA : Matrix algebra on GPU and multicore architectures ,
, Intel(R) Math Kernel Library (MKL)
High-performance matrix-matrix multiplications of very small matrices, European Conference on Parallel Processing, pp.659-671, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01409286
Autotuning and specialization : Speeding up matrix multiply for small matrices with compiler technology, Software Automatic Tuning, pp.353-370, 2011. ,
LIBXSMM : accelerating small matrix multiplications by runtime code generation, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, p.84, 2016. ,
Effective SIMD vectorization for Intel Xeon Phi coprocessors, Scientific Programming, vol.2015, pp.1-14, 2015. ,
A basic linear algebra compiler, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, p.23, 2014. ,
Real-time covariance tracking algorithm for embedded systems, Design and Architectures for Signal and Image Processing (DASIP), 2013 Conference on, pp.104-111, 2013. ,
Color tracking with contextual switching : real-time implementation on CPU, Journal of Real-Time Image Processing, vol.10, issue.2, pp.403-422, 2015. ,
, Optimizing compilers for modern architectures : a dependence-based approach, vol.9, p.11, 2002.
Facilitating the search for compositions of program transformations, Proceedings of the 19th annual international conference on Supercomputing, pp.151-160, 2005. ,
URL : https://hal.archives-ouvertes.fr/hal-01257296
Some computer organizations and their effectiveness, IEEE Transactions on Computers, issue.21, pp.948-960, 1972. ,
Boost.SIMD : Generic programming for portable SIMDization, Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing, WPMVP '14, pp.1-8, 2014. ,
Cyme : A library maximizing SIMD computation on user-defined containers, Proceedings of the 29th International Conference on Supercomputing, vol.8488, pp.440-449, 2014. ,
, Header-only zero-overhead C++ wrapper for SIMD intrinsics of multiple instruction sets, pp.7-9
An efficient, portable and generic library for successive cancellation decoding of polar codes, International Workshop on Languages and Compilers for Parallel Computing, pp.303-317, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01203105
A high-performance portable abstract interface for explicit SIMD vectorization, Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM'17, pp.21-28, 2017. ,
Vc : A C++ library for explicit vectorization, Software : Practice and Experience, vol.42, issue.11, pp.1409-1430, 2012. ,
C++ vector class library, pp.2017-2024, 2017. ,
Task parallelism and data distribution : An overview of explicit parallel programming languages, International Workshop on Languages and Compilers for Parallel Computing, pp.174-189, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00742536
ispc : A SPMD compiler for high-performance CPU programming, Innovative Parallel Computing (InPar), pp.1-13, 2012. ,
OpenACC-first experiences with real-world applications, European Conference on Parallel Processing, pp.859-870, 2012. ,
The OpenCL specification, Hot Chips 21 Symposium (HCS), pp.1-314, 2009. ,
OpenMP : an industry standard API for shared-memory programming, Computational Science & Engineering, IEEE, vol.5, issue.1, pp.46-55, 1998. ,
, IEEE standard for information technology-portable operating system interface (POSIX(R)) base specifications, issue 7, IEEE Std, pp.1-3951, 2018.
, Intel Corporation, Intel(R) C++ Compiler 18.0 Developer Guide and Reference, pp.2018-2024, 2018.
, Programming languages - C, standard, International Organization for Standardization, 2011.
The ILLIAC IV computer, IEEE Transactions on Computers, vol.100, issue.8, pp.746-757, 1968. ,
The Texas Instruments Advanced Scientific Computer, International Workshop on (AFIPS), vol.00, p.1899 ,
The Control Data STAR-100 : performance measurements, Proceedings of the May 6-10, pp.385-387, 1974. ,
The CRAY-1 computer system, Communications of the ACM, vol.21, issue.1, pp.63-72, 1978. ,
DAP-a distributed array processor, ACM SIGARCH Computer Architecture News, vol.2, pp.61-65, 1973. ,
The Connection Machine, 1989. ,
Compiling for SIMD within a register, International Workshop on Languages and Compilers for Parallel Computing, pp.290-305, 1998. ,
, IEEE standard for binary floating-point arithmetic, pp.754-1985, 1985.
The ARM Scalable Vector Extension, IEEE Micro, vol.37, issue.2, pp.26-39, 2017. ,
The ETA10 liquid-nitrogen-cooled supercomputer system, IEEE Transactions on Electron Devices, vol.36, issue.8, pp.1404-1413, 1989. ,
NEC SX-3 supercomputer system, Supercomputers and Their Performance in Computational Fluid Dynamics, pp.63-75, 1993. ,
Connection Machine model CM-5 system overview, Frontiers of Massively Parallel Computation, 1992., Fourth Symposium on the, pp.474-483, 1992. ,
Concurrency and Computation : practice and experience, vol.15, pp.803-820, 2003. ,
A survey of memory bandwidth and machine balance in current high performance computers, IEEE TCCA Newsletter, vol.19, p.25, 1995. ,
Applications tuning for streaming SIMD extensions, Intel Technology Journal, vol.2, 1999. ,
What every programmer should know about memory, vol.11, p.2007, 2007. ,
Sierra : A SIMD extension for C++, Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing, WPMVP '14, pp.17-24, 2014. ,
Improving register allocation for subscripted variables, ACM Sigplan Notices, vol.25, issue.6, pp.53-65, 1990. ,
Semantical interprocedural parallelization : An overview of the PIPS project, ACM International Conference on Supercomputing 25th Anniversary Volume, pp.143-150, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00984684
Code generation in the polyhedral model is easier than you think, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, pp.7-16, 2004. ,
URL : https://hal.archives-ouvertes.fr/hal-00017260
GRAPHITE : Polyhedral analyses and optimizations for GCC, Proceedings of the 2006 GCC Developers Summit, 2006. ,
Graphite two years after : First lessons learned from real-world polyhedral compilation, GCC Research Opportunities Workshop (GROW'10), 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00551516
On the complexity of scheduling problems for parallel/pipelined machines, IEEE Transactions on Computers, vol.38, pp.1308-1313, 1989. ,
High level transforms for SIMD and low-level computer vision algorithms, Workshop on Programming Models for SIMD/Vector Processing (WPMVP, associated with ACM PPoPP), pp.49-56, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01094906
What every computer scientist should know about floating-point arithmetic, ACM Comput. Surv, vol.23, pp.5-48, 1991. ,
Handbook of floating-point arithmetic, 2010. ,
URL : https://hal.archives-ouvertes.fr/ensl-00379167
Numerical reproducibility and parallel computations : Issues for interval algorithms, CoRR, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00916931
First steps towards more numerical reproducibility, ESAIM : Proceedings, vol.45, pp.229-238, 2014. ,
MPFR : A multiple-precision binary floating-point library with correct rounding, ACM Trans. Math. Softw, vol.33, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00103655
Motivations for an arbitrary precision interval arithmetic and the MPFI library, Reliable Computing, vol.11, issue.4, pp.275-290, 2005. ,
URL : https://hal.archives-ouvertes.fr/inria-00100985
Pracniques : Further remarks on reducing truncation errors, Commun. ACM, vol.8, p.40, 1965. ,
Instruction tables : Lists of instruction latencies, throughputs and microoperation breakdowns for Intel, AMD and VIA CPUs, pp.2018-2022, 2018. ,
Area and performance tradeoffs in floating-point divide and square-root implementations, ACM Comput. Surv, vol.28, pp.518-564, 1996. ,
Fast inverse square root, tech. rep, 2003. ,
, Intel Corporation, Intel(R) 64 and IA-32 Architectures Optimization Reference Manual, 2018.
The hidden cost of functional approximation against careful data sizing : a case study, Proceedings of the Conference on Design, Automation & Test in Europe, pp.181-186, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01423147
Newton's method and high order iterations, 2001. ,
Methods of computing values of polynomials, Russian Mathematical Surveys, vol.21, issue.1, pp.105-136, 1966. ,
SPOC : GPGPU programming through stream processing with OCaml, Parallel Processing Letters, vol.22, p.1240007, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00697257
, Python template engine, JINJA2
Metaprogramming dense linear algebra solvers applications to multi and many-core architectures, Trustcom/BigDataSE/ISPA, 2015 IEEE, vol.3, pp.69-76, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01221358
Spectre attacks : Exploiting speculative execution, 2018. ,
, Intel Corporation, Intel(R) 64 and IA-32 Architectures Software Developer's Manual, 2018.
Simultaneous multithreading : Maximizing on-chip parallelism, Proceedings 22nd Annual International Symposium on Computer Architecture, pp.392-403, 1995. ,
CADNA : a library for estimating round-off error propagation, Computer Physics Communications, vol.178, issue.12, pp.933-955, 2008. ,
Cholesky factorization, Wiley Interdisciplinary Reviews : Computational Statistics, vol.1, issue.2, pp.251-254, 2009. ,
, Matrix computations, vol.3, 2012.
Video rate image segmentation by means of region splitting and merging, IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2009. ,
Parallel Light Speed Labeling : an efficient connected component labeling algorithm for multi-core processors, IEEE International Conference on Image Processing (ICIP), pp.1-4, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01243310
Accuracy and stability of numerical algorithms, 2002. ,
Stability of methods for matrix inversion, IMA Journal of Numerical Analysis, vol.12, issue.1, pp.1-19, 1992. ,
, Spiral : Software/hardware generation for DSP algorithms
, Automatically tuned linear algebra software, ATLAS
LU factorization of small matrices : accelerating batched DGETRF on the GPU, IEEE 11th Intl Conf on Embedded Software and Syst (HPCC, CSS, ICESS), pp.157-160, 2014. ,
A fast batched Cholesky factorization on a GPU, Parallel Processing (ICPP), 2014 43rd International Conference on, pp.432-440, 2014. ,
A new approach to linear filtering and prediction problems, Journal of Basic Engineering, vol.82, issue.1, pp.35-45, 1960. ,
Kalman filtering, International Encyclopedia of Statistical Science, pp.705-708, 2011. ,
Covariance matrices for track fitting with the Kalman filter, Nuclear Instruments and Methods in Physics Research Section A : Accelerators, Spectrometers, Detectors and Associated Equipment, vol.329, issue.3, pp.493-500, 1993. ,
Enhanced Local Binary Covariance Matrices (ELBCM) for texture analysis and object tracking, ACM International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-01805045
An algorithm for tracking multiple targets, IEEE Transactions on Automatic Control, vol.24, issue.6, pp.843-854, 1979. ,
A real-time computer vision system for measuring traffic parameters, Computer Vision and Pattern Recognition, pp.495-501, 1997. ,
, Introduction to random signals and applied Kalman filtering, vol.3, 1992.
Adaptive Kalman filtering for INS/GPS, Journal of Geodesy, vol.73, issue.4, pp.193-203, 1999. ,
Sensor fusion based on fuzzy Kalman filtering for autonomous robot vehicle, Proceedings. 1999 IEEE International Conference on, vol.4, pp.2970-2975, 1999. ,
Forecasting, structural time series models and the Kalman filter, 1990. ,
Parallel Kalman filtering on the connection machine, Frontiers of Massively Parallel Computation, pp.55-58, 1990. ,
Extended Kalman filter training of neural networks on a SIMD parallel machine, Journal of Parallel and Distributed Computing, vol.62, issue.4, pp.544-562, 2002. ,
Accelerating the Kalman filter on a GPU, 2011 IEEE 17th International Conference on Parallel and Distributed Systems, pp.1016-1020, 2011. ,
Computation of matrix chain products. Part I, SIAM Journal on Computing, vol.11, issue.2, pp.362-373, 1982. ,
Computation of matrix chain products. Part II, SIAM Journal on Computing, vol.13, issue.2, pp.228-251, 1984. ,
Register allocation & spilling via graph coloring, ACM Sigplan Notices, vol.17, pp.98-105, 1982. ,
Topological sorting of large networks, Commun. ACM, vol.5, pp.558-562, 1962. ,
Application of Kalman filtering to track and vertex fitting, Nuclear Instruments and Methods in Physics Research Section A : Accelerators, Spectrometers, Detectors and Associated Equipment, vol.262, issue.2, pp.444-450, 1987. ,
, Optimal filtering, vol.21, 1979.
Maximum likelihood estimates of linear dynamic systems, AIAA journal, vol.3, issue.8, pp.1445-1450, 1965. ,
Track fitting with multiple scattering : A new method, Nuclear Instruments and Methods in Physics Research, vol.225, issue.2, pp.352-366, 1984. ,
Discrete square root filtering : A survey of current techniques, IEEE Transactions on Automatic Control, vol.16, pp.727-736, 1971. ,
LHCb Kalman Filter cross architecture studies, Journal of Physics : Conference Series, vol.898, issue.3, p.32052, 2017. ,
An efficient low-rank Kalman filter for modern SIMD architectures, Concurrency and Computation : Practice and Experience, p.4483, 2018. ,
Kalman filter tracking on parallel architectures, Journal of Physics : Conference Series, vol.898, issue.4, p.42051, 2017. ,
Fast SIMDized Kalman filter based track fit, Computer Physics Communications, vol.178, issue.5, pp.374-383, 2008. ,
Batched Cholesky factorization for tiny matrices, Design and Architectures for Signal and Image Processing (DASIP), 2016 Conference on, pp.130-137, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01361204
Cholesky factorization on SIMD multi-core architectures, Journal of Systems Architecture, vol.79, pp.1-15, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01550129
Small SIMD matrices for CERN high throughput computing, Workshop on Programming Models for SIMD/Vector Processing (WPMVP, associated with ACM PPoPP), p.1, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01760260
Glossary
Precision: amount of information used to represent a value (also called ...)
Python: programming language
RAM (Random Access Memory): memory with random access
SIMD (Single Instruction Multiple Data): a single operation applied in parallel to several different data elements
Technique of running several threads at the same time on a single core, so that they share its functional units
SoA (Structure of Arrays): memory layout organized as a structure of arrays
SSE: SIMD instruction set for x86. Width: 128 bits
SVE: SIMD instruction set for ARM. Width: between 128 and 2048 bits
Architecture with instructions that treat a register as a vector of several smaller elements
Thread: thread of execution
SIMD: C++ library for writing SIMD code
Vc: C++ library for writing SIMD code
Vcl: C++ library for writing SIMD code
VSX: SIMD instruction set for Power. Width: 128 bits. Extension of the AltiVec instruction set
x86: hardware architecture originated by Intel