C. Artigues, S. Demassey, and E. Néron, Resource-Constrained Project Scheduling: Models, Algorithms, Extensions and Applications, ISTE, p.27, 2007.
DOI : 10.1002/9780470611227
URL : https://hal.archives-ouvertes.fr/hal-00482946

B. Boissinot, A. Darte, F. Rastello, B. Dupont de Dinechin, and C. Guillon, Revisiting Out-of-SSA Translation for Correctness, Code Quality and Efficiency, 2009 International Symposium on Code Generation and Optimization, pp.114-125, 2009.
DOI : 10.1109/CGO.2009.19
URL : https://hal.archives-ouvertes.fr/inria-00349925

T. Betcke, Optimal Scaling of Generalized and Polynomial Eigenvalue Problems, SIAM Journal on Matrix Analysis and Applications, vol.30, issue.4, pp.1320-1338, 2008.
DOI : 10.1137/070704769

R. Bodík, R. Gupta, and V. Sarkar, ABCD: eliminating array bounds checks on demand, PLDI, pp.321-333, 2000.

C. Bertin, C.-P. Jeannerod, J. Jourdan-Lu, H. Knochel, C. Monat et al., Techniques and tools for implementing IEEE 754 floating-point arithmetic on VLIW integer processors, Proceedings of PASCO'10, pp.1-9, 2010.

J. L. Blue, A Portable Fortran Program to Find the Euclidean Norm of a Vector, ACM Transactions on Mathematical Software, vol.4, issue.1, pp.15-23, 1978.
DOI : 10.1145/355769.355771

W. J. Cody and J. T. Coonen, Algorithm 722: Functions to support the IEEE standard for binary floating-point arithmetic, ACM Transactions on Mathematical Software, vol.19, issue.4, pp.443-451, 1993.
DOI : 10.1145/168173.168185

A. G. M. Cilio and H. Corporaal, Floating point to fixed point conversion of C code, Proceedings of the 8th International Conference on Compiler Construction, Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS'99, CC'99, pp.229-243, 1999.

M. Cornea, J. Harrison, and P. T. P. Tang, Scientific Computing on Itanium®-based Systems, 2002.

S. Chevillard, M. Joldes, and C. Lauter, Sollya: An Environment for the Development of Numerical Codes, Proc. of the Third International Congress on Mathematical Software (ICMS), pp.28-31, 2010.
DOI : 10.1007/978-3-642-15582-6_5
URL : https://hal.archives-ouvertes.fr/hal-00761644

Y. J. Chong and S. Parameswaran, Automatic application specific floating-point unit generation, Proceedings of the conference on Design, automation and test in Europe, DATE '07, pp.461-466, 2007.

Y. J. Chong and S. Parameswaran, Custom floating-point unit generation for embedded systems, Trans. Comp.-Aided Des. Integ. Cir. Sys., vol.28, issue.5, pp.638-650, 2009.

B. Dupont de Dinechin, Time-indexed formulations and a large neighborhood search for the resource-constrained modulo scheduling problem, 3rd Multidisciplinary International Scheduling conference: Theory and Applications (MISTA), p.27, 2007.

J. Detrey and F. de Dinechin, Floating-Point Trigonometric Functions for FPGAs, 2007 International Conference on Field Programmable Logic and Applications, pp.29-34, 2007.
DOI : 10.1109/FPL.2007.4380621

F. de Dinechin and B. Pasca, Designing Custom Arithmetic Data Paths with FloPoCo, IEEE Design & Test of Computers, vol.28, issue.4, pp.18-27, 2011.
DOI : 10.1109/MDT.2011.44

J. Demmel, Underflow and the reliability of numerical software, SIAM Journal on Scientific and Statistical Computing, vol.5, issue.20, pp.887-919, 1984.

M. D. Ercegovac and T. Lang, Digital Arithmetic, pp.89-90, 2004.
URL : https://hal.archives-ouvertes.fr/ensl-00542215

P. Faraboschi, G. Brown, J. A. Fisher, G. Desoli, and F. Homewood, Lx: a technology platform for customizable VLIW embedded processing, Proc. of the 27th International Symposium on Computer Architecture (ISCA), pp.203-213, 2000.

S. Gal and B. Bachelis, An accurate elementary mathematical library for the IEEE floating point standard, ACM Transactions on Mathematical Software, vol.17, issue.1, pp.26-45, 1991.
DOI : 10.1145/103147.103151

R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics: A Foundation for Computer Science, p.47, 1994.

M. Gök, Integer squarers with overflow detection, Computers & Electrical Engineering, vol.34, issue.5, pp.378-391, 2008.
DOI : 10.1016/j.compeleceng.2007.11.002

D. Goldberg, What every computer scientist should know about floating-point arithmetic, ACM Computing Surveys, vol.23, issue.1, pp.5-48, 1991.
DOI : 10.1145/103162.103163

C. Guillon, F. Rastello, T. Bidault, and F. Bouchez, Procedure placement using temporal-ordering information, Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems, CASES '04, pp.437-459, 2005.
DOI : 10.1145/1023833.1023870

J. R. Hauser, The SoftFloat and TestFloat Packages. Available at http://www.jhauser.us/arithmetic

J. R. Hauser, Handling floating-point exceptions in numeric programs, ACM Transactions on Programming Languages and Systems, vol.18, issue.2, pp.139-174, 1996.
DOI : 10.1145/227699.227701

D. J. Higham and N. J. Higham, MATLAB Guide, Second Edition, Society for Industrial and Applied Mathematics, p.81, 2005.

N. J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, pp.41-56, 2002.

E. R. Hansen, M. L. Patrick, and R. L. C. Wang, Polynomial evaluation with scaling, ACM Transactions on Mathematical Software, vol.16, issue.1, pp.86-93, 1990.
DOI : 10.1145/77626.77633

C.-P. Jeannerod and J. Jourdan-Lu, Simultaneous Floating-Point Sine and Cosine for VLIW Integer Processors, 2012 IEEE 23rd International Conference on Application-Specific Systems, Architectures and Processors, 2012.
DOI : 10.1109/ASAP.2012.12
URL : https://hal.archives-ouvertes.fr/hal-00672327

C.-P. Jeannerod, J. Jourdan-Lu, C. Monat, and G. Revy, How to Square Floats Accurately and Efficiently on the ST231 Integer Processor, 2011 IEEE 20th Symposium on Computer Arithmetic, pp.77-81, 2011.
DOI : 10.1109/ARITH.2011.19
URL : https://hal.archives-ouvertes.fr/ensl-00644147

C.-P. Jeannerod, H. Knochel, C. Monat, and G. Revy, Computing Floating-Point Square Roots via Bivariate Polynomial Evaluation, IEEE Transactions on Computers, vol.60, issue.2, pp.214-227, 2011.
DOI : 10.1109/TC.2010.152
URL : https://hal.archives-ouvertes.fr/ensl-00559236

C.-P. Jeannerod, N. Louvet, and J.-M. Muller, Further analysis of Kahan's algorithm for the accurate computation of 2×2 determinants, Mathematics of Computation, 2012. To appear. Preliminary version available at http

C.-P. Jeannerod and G. Revy, FLIP 1.0: a fast floating-point library for integer processors. http://flip.gforge.inria, 2009.

C.-P. Jeannerod and G. Revy, Optimizing correctly-rounded reciprocal square roots for embedded VLIW cores, Proceedings of the 43rd Asilomar Conference on Signals, Systems, and Computers (Asilomar'09), p.2, 2009.
DOI : 10.1109/acssc.2009.5469948
URL : https://hal.archives-ouvertes.fr/ensl-00391185

W. Kahan, Why do we need a floating-point arithmetic standard?, 1981.

W. Kahan, Lecture notes on the status of IEEE Standard 754 for binary floating-point arithmetic. Manuscript, 1996.

W. Kahan, Matlab's loss is nobody's gain, 1998.

D. E. Knuth, Seminumerical Algorithms, volume 2 of The Art of Computer Programming, 1987.

C. G. Lee, UTDSP Benchmark Suite. Available at http://www. eecg.toronto, UTDSP.html, p.148

K. G. Lenzi and O. Saotome, Optimized math functions for a fixed-point DSP architecture, 19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp.125-132, 2007.

P. Markstein, IA-64 and Elementary Functions: Speed and Precision, 2000.

P. Markstein, Accelerating sine and cosine evaluation with compiler assistance, Proceedings of the 16th IEEE Symposium on Computer Arithmetic, pp.137-140, 2003.
DOI : 10.1109/ARITH.2003.1207671

J.-M. Muller, N. Brisebarre, F. de Dinechin, C.-P. Jeannerod, V. Lefèvre et al., Handbook of Floating-Point Arithmetic, Birkhäuser, 2010.
URL : https://hal.archives-ouvertes.fr/ensl-00379167

D. Ménard, D. Chillet, F. Charot, and O. Sentieys, Automatic floating-point to fixed-point conversion for DSP code generation, Proceedings of the international conference on Compilers, architecture, and synthesis for embedded systems, CASES '02, pp.270-276, 2002.
DOI : 10.1145/581630.581674

A. Mitra, M. Chakraborty, and H. Sakai, A block floating-point treatment to the LMS algorithm: efficient realization and a roundoff error analysis, IEEE Transactions on Signal Processing, vol.53, issue.12, pp.4536-4544, 2005.
DOI : 10.1109/TSP.2005.859342

G. Melquiond, Gappa: génération automatique de preuves de propriétés arithmétiques [automatic generation of proofs of arithmetic properties]

C. Mouilleron and G. Revy, Automatic Generation of Fast and Certified Code for Polynomial Evaluation, 2011 IEEE 20th Symposium on Computer Arithmetic, pp.233-242, 2011.
DOI : 10.1109/ARITH.2011.39
URL : https://hal.archives-ouvertes.fr/ensl-00531721

S. Mahlke, R. Ravindran, M. Schlansker, R. Schreiber, and T. Sherwood, Bitwidth cognizant architecture synthesis of custom hardware accelerators, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.20, issue.11, pp.1355-1371, 2001.
DOI : 10.1109/43.959864

J.-M. Muller, On the definition of ulp(x), 2005.
URL : https://hal.archives-ouvertes.fr/inria-00070503

J.-M. Muller, Elementary Functions, Algorithms and Implementation, 2006.
URL : https://hal.archives-ouvertes.fr/ensl-00000008

J. R. C. Patterson, Accurate static branch prediction by value range propagation, SIGPLAN Not., vol.30, issue.6, pp.67-78, 1995.

V. Paliouras, K. Karagianni, and T. Stouraitis, A floating-point processor for fast and accurate sine/cosine evaluation, IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol.47, issue.5
DOI : 10.1109/82.842112

B. N. Parlett and C. Reinsch, Balancing a matrix for calculation of eigenvalues and eigenvectors, Numerische Mathematik, vol.12, issue.4, pp.293-304, 1969.
DOI : 10.1007/BF02165404

D. M. Priest, Efficient scaling for complex division, ACM Transactions on Mathematical Software, vol.30, issue.4, pp.389-401, 2004.
DOI : 10.1145/1039813.1039814

W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes, Third Edition: The Art of Scientific Computing, Cambridge University Press, pp.41-60, 2007.

S. K. Raina, FLIP: a Floating-point Library for Integer Processors, pp.42-114, 2006.

G. Revy, Implementation of binary floating-point arithmetic on embedded integer processors: polynomial evaluation-based algorithms and certified code generation, pp.20-54, 2009.
URL : https://hal.archives-ouvertes.fr/tel-00469661

H. H. Saleh, Fused Floating-Point Arithmetic for DSP, 2009.

H. H. Saleh and E. E. Swartzlander, Jr., A floating-point fused dot-product unit, IEEE International Conference on Computer Design, pp.427-431, 2008.

R. Sol, C. Guillon, F. M. Q. Pereira, and M. A. Bigonha, Dynamic Elimination of Overflow Tests in a Trace Compiler, Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software, pp.2-21, 2011.
DOI : 10.1016/j.tcs.2005.07.035

N. Shibata, Efficient evaluation methods of elementary functions suitable for SIMD computation, Computer Science - Research and Development, vol.23, issue.1, pp.25-32, 2010.
DOI : 10.1007/s00450-010-0108-2

A. Simon, Value-Range Analysis of C Programs: Towards Proving the Absence of Buffer Overflow Vulnerabilities, 2008.
DOI : 10.1007/978-1-84800-017-9

E. E. Swartzlander, Jr. and H. H. M. Saleh, FFT implementation with fused floating-point operations, IEEE Trans. on Computers, vol.61, issue.2, pp.284-288, 2012.

E. M. Schwarz, M. Schmookler, and S. D. Trong, FPU implementations with denormalized numbers, IEEE Trans. Comput., vol.54, issue.20, pp.825-836, 2005.

P. H. Sterbenz, Floating-point computation, Prentice-Hall series in automatic computation, 1974.

P. T. P. Tang, Some software implementations of the functions sine and cosine, Argonne, Ill., 1990.

A. Tisserand, Hardware Operator for Simultaneous Sine and Cosine Evaluation, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, pp.992-995, 2006.
DOI : 10.1109/ICASSP.2006.1660823
URL : https://hal.archives-ouvertes.fr/lirmm-00125366

Á. Vázquez and J. D. Bruguera, Composite iterative algorithm and architecture for q-th root calculation, Proceedings of the 20th IEEE Symposium on Computer Arithmetic (ARITH-20), pp.52-61, 2011.

E. G. Walters, J. Schlessman, and M. J. Schulte, Combined unsigned and two's complement hybrid squarers, Conference Record of Thirty-Fifth Asilomar Conference on Signals, Systems and Computers (Cat. No. 01CH37256), pp.861-866, 2001.
DOI : 10.1109/ACSSC.2001.987046

M. N. Wegman and F. K. Zadeck, Constant propagation with conditional branches, ACM Trans. Program. Lang. Syst., vol.13, issue.2, pp.181-210, 1991.
