Table 13.1: Syntheses and accuracy measurements of a FIR filter generated using Matlab and the proposed method, vol.224

XC2000 Logic Cell Array Families. Xilinx Corporation, 1985.

FIR suite, 2009.

7 Series DSP48E1 Slice User Guide (UG479). Xilinx Corporation

7 Series FPGAs Configurable Logic Block User Guide (UG474). Xilinx Corporation

7 Series FPGAs Memory Resources User Guide (UG473). Xilinx Corporation

Stratix 10 Embedded Memory User Guide (ug-s10-memory), 2016.

Stratix 10 Logic Array Blocks and Adaptive Logic Modules User Guide (ug-s10-lab), 2016.

Stratix 10 Variable Precision DSP Blocks User Guide (ug-s10-dsp), 2016.

H. M. Ahmed, Signal Processing Algorithms and Architectures, 1982.

L. Aksoy, E. Costa, P. Flores, and J. Monteiro, Exact and approximate algorithms for the optimization of area and delay in multiple constant multiplications, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.27, issue.6, pp.1013-1026, 2008.

R. Andraka, A survey of CORDIC algorithms for FPGA based computers, Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays, FPGA '98, pp.191-200, 1998.

A. Antoniou, Digital Signal Processing: Signals, Systems, and Filters, 2005.

V. Balakrishnan and S. Boyd, On computing the worst-case peak gain of linear systems, Systems & Control Letters, vol.19, pp.265-269, 1992.

S. Banescu, F. De-dinechin, B. Pasca, and R. Tudoran, Multipliers for floating-point double precision and beyond on FPGAs, ACM SIGARCH Computer Architecture News, vol.38, issue.4, pp.73-79, 2011.
URL : https://hal.archives-ouvertes.fr/ensl-00475781

C. R. Baugh and B. A. Wooley, A two's complement parallel array multiplication algorithm, IEEE Transactions on Computers, vol.22, issue.12, pp.1045-1047, 1973.

O. Beaumont, V. Boudet, F. Rastello, and Y. Robert, Partitioning a square into rectangles: NP-completeness and approximation algorithms, Algorithmica, vol.34, issue.3, pp.217-239, 2002.
URL : https://hal.archives-ouvertes.fr/hal-00807407

J. M. Borwein and P. B. Borwein, Pi and the AGM: A Study in the Analytic Number Theory and Computational Complexity, 1987.

N. Boullis and A. Tisserand, Some optimizations of hardware multiplication by constant matrices, IEEE Transactions on Computers, vol.54, issue.10, pp.1271-1282, 2005.
URL : https://hal.archives-ouvertes.fr/lirmm-00113092

S. Boyd and J. Doyle, Comparison of peak and RMS gains for discrete-time systems, Systems & Control Letters, vol.9, issue.1, pp.1-6, 1987.

C. Brandolese, W. Fornaciari, and F. Salice, An area estimation methodology for FPGA based designs at SystemC-level, pp.129-132, 2004.

N. Brisebarre, S. Filip, and G. Hanrot, A lattice basis reduction approach for the design of quantized FIR filters, 2016.

P. R. Cappello and C. Wu, Computer-aided design of VLSI FIR filters, Proceedings of the IEEE, vol.75, issue.9, pp.1260-1271, 1987.

S. C. Chan, K. M. Tsui, and S. H. Zhao, A methodology for automatic hardware synthesis of multiplier-less digital filters with prescribed output accuracy, APCCAS 2006 - 2006 IEEE Asia Pacific Conference on Circuits and Systems, pp.61-64, 2006.

K. D. Chapman, Fast integer multipliers fit in FPGAs, 1994.

D. Chen and M. Sima, Fixed-point CORDIC-based QR decomposition by Givens rotations on FPGA, 2011 International Conference on Reconfigurable Computing and FPGAs, pp.327-332, 2011.

A. J. Chung, K. Cobden, M. Jervis, M. Langhammer, and B. Pasca, Tools and techniques for efficient high-level system design on FPGAs, 2014.

F. Chung, E. Gilbert, R. Graham, J. Shearer, and J. Van-lint, Tiling rectangles with rectangles, Mathematics Magazine, vol.55, pp.286-291, 1982.

L. Dadda, Some schemes for parallel multipliers, Alta frequenza, vol.34, issue.5, pp.349-356, 1965.

D. Das Sarma and D. Matula, Faithful bipartite ROM reciprocal tables, ARITH'12, pp.17-28, 1995.

R. Dawson and R. Paré, Characterizing tileorders, Order, vol.10, issue.2, pp.111-128, 1993.

F. De-dinechin, J. Detrey, O. Creț, and R. Tudoran, When FPGAs are better at floating-point than microprocessors, ÉNS Lyon, 2007.
URL : https://hal.archives-ouvertes.fr/ensl-00174627

F. De-dinechin and L. Didier, Table-based division by small integer constants, International Symposium on Applied Reconfigurable Computing, pp.53-63, 2012.
URL : https://hal.archives-ouvertes.fr/ensl-00642145

F. De-dinechin, M. Joldes, and B. Pasca, Automatic generation of polynomial-based hardware architectures for function evaluation, ASAP 21st IEEE International Conference on Application-specific Systems, Architectures and Processors, pp.216-222, 2010.
URL : https://hal.archives-ouvertes.fr/ensl-00470506

F. De-dinechin and B. Pasca, Large multipliers with fewer DSP blocks, International Conference on Field Programmable Logic and Applications, pp.250-255, 2009.

F. De-dinechin and B. Pasca, Designing custom arithmetic data paths with FloPoCo, IEEE Design & Test of Computers, vol.28, issue.4, pp.18-27, 2011.
URL : https://hal.archives-ouvertes.fr/ensl-00646282

F. De-dinechin, H. Takeugming, and J. Tanguy, A 128-tap complex FIR filter processing 20 giga-samples/s in a single FPGA, 44th Asilomar Conference on Signals, Systems & Computers, 2010.
URL : https://hal.archives-ouvertes.fr/ensl-00542950

F. De-dinechin and A. Tisserand, Some improvements on multipartite table methods, Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15, pp.128-135, 2001.
URL : https://hal.archives-ouvertes.fr/inria-00072577

J. Detrey and F. De-dinechin, Floating-point trigonometric functions for FPGAs, International Conference on Field Programmable Logic and Applications, pp.29-34, 2007.

M. Ercegovac, T. Lang, J. Muller, and A. Tisserand, Reciprocation, square root, inverse square root, and some elementary functions using small multipliers, IEEE Transactions on Computers, vol.49, issue.7, pp.628-637, 2000.
URL : https://hal.archives-ouvertes.fr/hal-02101940

M. D. Ercegovac and T. Lang, Digital arithmetic, p.247, 2004.
URL : https://hal.archives-ouvertes.fr/ensl-00542215

C. Farabet, C. Poulet, J. Y. Han, and Y. Lecun, CNP: An FPGA-based processor for convolutional networks, 2009 International Conference on Field Programmable Logic and Applications, pp.32-37, 2009.

S. I. Filip, A robust and scalable implementation of the Parks-McClellan algorithm for designing FIR filters, ACM Transactions on Mathematical Software (TOMS), vol.43, issue.1, p.7, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01136005

M. Flynn, On division by functional iteration, IEEE Transactions on Computers, vol.19, pp.702-706, 1970.

B. Gaide, Methods of pipelining a data path in an integrated circuit, US Patent, vol.8, p.71, 2014.

S. Gal, Computing elementary functions: A new approach for achieving high accuracy and good performance, Accurate Scientific Computations, pp.1-16, 1986.

S. Gal, An accurate elementary mathematical library for the IEEE floating point standard, ACM Transactions on Mathematical Software, p.17, 1991.

I. Ganusov, H. Fraisse, A. N. Ng, R. T. Possignolo, and S. Das, Automated extra pipeline analysis of applications mapped to Xilinx UltraScale+ FPGAs, Field Programmable Logic and Applications, 2016.

R. Gutierrez, V. Torres, and J. Valls, FPGA-implementation of atan(Y/X) based on logarithmic transformation and LUT-based techniques, Journal of Systems Architecture, p.56, 2010.

R. Gutierrez and J. Valls, Low-Power FPGA-Implementation of atan(Y/X) Using Look-Up Table Methods for Communication Applications, Journal of Signal Processing Systems, p.56, 2008.

S. Hauck and A. Dehon, Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation, 2007.

T. Hilaire and B. Lopez, Reliable implementation of linear filters with fixed-point arithmetic, SiPS, Workshop on Signal Processing Systems, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01076048

S. Hsiao, P. Wu, C. Wen, and P. Meher, Table size reduction methods for faithfully rounded lookup-table-based multiplierless function evaluation, IEEE Transactions on Circuits and Systems II: Express Briefs, vol.62, issue.5, pp.466-470, 2015.

D. Hwang and A. Willson, A 400-MHz processor for the conversion of rectangular to polar coordinates in 0.25-µm CMOS, IEEE Journal of Solid-State Circuits, vol.38, p.248, 2003.

J. Hwang and J. Ballagh, Building custom FIR filters using system generator, Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications, FPL '02, pp.1101-1104, 2002.

IEEE Standard for Information Technology - Local and metropolitan area networks - Specific requirements - Part 15.4: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Low-Rate Wireless Personal Area Networks (WPANs), IEEE Std 802.15.4, pp.1-320, 2006.

M. Ishikawa, M. Edahiro, T. Yoshimura, T. Miyazaki, S. I. Aikoh et al., Automatic layout synthesis for FIR filters using a silicon compiler, IEEE International Symposium on, vol.4, pp.2588-2591, 1990.

M. Ito, N. Takagi, and S. Yajima, Efficient initial approximation for multiplicative division and square root by a multiplication with operand modification, IEEE Transactions on Computers, vol.46, issue.4, pp.495-498, 1997.

G. Jaberipur, B. Parhami, and M. Ghodsi, An efficient universal addition scheme for all hybrid-redundant representations with weighted bit-set encoding, vol.42, pp.149-158, 2006.

R. Jain, P. T. Yang, and T. Yoshino, FIRGEN: a computer-aided design system for high performance FIR filter integrated circuits, IEEE Transactions on Signal Processing, vol.39, issue.7, pp.1655-1668, 1991.

P. K. Jha and N. D. Dutt, A Fast Area-Delay Estimation Technique for RTL Component Generators, 1992.

T. Kailath, Linear Systems, 1980.

J. Kaiser, Nonrecursive digital filter design using the I0-sinh window function, Proc. 1974 IEEE International Symposium on Circuits & Systems, pp.20-23, 1974.

A. Karatsuba and Y. Ofman, Multiplication of multidigit numbers on automata, Soviet Physics Doklady, vol.7, p.595, 1963.

M. Karkooti, J. R. Cavallaro, and C. Dick, FPGA implementation of matrix inversion using QRD-RLS algorithm, Asilomar Conference on Signals, Systems, and Computers, 2005.
DOI : 10.1109/acssc.2005.1600043
URL : https://scholarship.rice.edu/bitstream/1911/20002/1/Kar2005Oct5FPGAImplem.PDF

S. Kilts, Advanced FPGA Design: Architecture, Implementation, and Optimization, 2007.

J. Koomey, S. Berard, M. Sanchez, and H. Wong, Implications of historical trends in the electrical efficiency of computing, IEEE Annals of the History of Computing, vol.33, issue.3, pp.46-54, 2011.

M. Kumm, K. Möller, and P. Zipf, Dynamically reconfigurable FIR filter architectures with fast reconfiguration, 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC), pp.1-8, 2013.
DOI : 10.1109/recosoc.2013.6581517

M. Kumm and P. Zipf, Efficient high speed compression trees on Xilinx FPGAs, Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen (MBMV), pp.171-182, 2014.

M. Kumm and P. Zipf, Pipelined compressor tree optimization using integer linear programming, 24th International Conference on Field Programmable Logic and Applications (FPL), pp.1-8, 2014.
DOI : 10.1109/fpl.2014.6927468

C. E. Leiserson and J. B. Saxe, Retiming synchronous circuitry, Algorithmica, vol.6, issue.1, pp.5-35, 1991.
DOI : 10.1007/bf01759032
URL : http://www.cs.columbia.edu/~cs6861/handouts/leiserson-algorithmica-88.pdf

A. K. Lenstra, H. W. Lenstra, and L. Lovász, Factoring polynomials with rational coefficients, Mathematische Annalen, vol.261, issue.4, pp.515-534, 1982.

L. A. Levin, Problems, complete in “average” instance, Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing, STOC '84, p.465, 1984.

L. A. Levin, Average case complete problems, SIAM Journal on Computing, vol.15, issue.1, pp.285-286, 1986.
DOI : 10.1007/978-1-4612-4808-8_26

B. Lopez, T. Hilaire, and L. Didier, Formatting bits to better implement signal processing algorithms, Pervasive and Embedded Computing and Communication Systems, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01076049

J. Lotze, S. A. Fahmy, J. Noguera, L. Doyle, and R. Esser, An FPGA-based cognitive radio framework, IET Conference Proceedings, pp.138-143, 2008.
DOI : 10.1049/cp:20080652

J. Low and C. C. Jong, A memory-efficient tables-and-additions method for accurate computation of elementary functions, IEEE Transactions on Computers, vol.62, issue.5, pp.858-872, 2013.
DOI : 10.1109/tc.2012.43

R. Marlow, C. Dobson, and P. Athanas, An enhanced and embedded GNU Radio flow, 24th International Conference on Field Programmable Logic and Applications (FPL), pp.1-4, 2014.
DOI : 10.1109/fpl.2014.6927427

T. Matsunaga, S. Kimura, and Y. Matsunaga, An exact approach for GPC-based compressor tree synthesis, IEICE Transactions on Fundamentals of Electronics, vol.96, issue.12, pp.2553-2560, 2013.

J. V. McCanny, Y. Hu, and M. Yan, Automated design of DSP array processor chips, Proceedings. International Conference on, pp.33-44, 1994.

J. H. McClellan, T. W. Parks, and L. Rabiner, A computer program for designing optimum FIR linear phase digital filters, IEEE Transactions on Audio and Electroacoustics, vol.21, issue.6, pp.506-526, 1973.

M. Mehendale, S. D. Sherlekar, and G. Venkatesh, Synthesis of multiplier-less FIR filters with minimum number of additions, IEEE/ACM International Conference on Computer-Aided Design, pp.668-671, 1995.
DOI : 10.1109/iccad.1995.480201

P. K. Meher, J. Valls, T. Juang, K. Sridharan, and K. Maharatna, 50 years of CORDIC: Algorithms, architectures, and applications, IEEE Transactions on Circuits and Systems I: Regular Papers, vol.56, issue.9, pp.1893-1907, 2009.

O. Mencer, L. Semeria, M. Morf, and J. Delosme, Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, vol.24, pp.211-221, 2000.

P. Milder, M. Ahmad, J. C. Hoe, and M. Püschel, Fast and accurate resource estimation of automatically generated custom DFT IP cores, Proceedings of the International Symposium on Field Programmable Gate Arrays - FPGA '06, p.211, 2006.

S. Mohanakrishnan and J. B. Evans, Automatic implementation of FIR filters on field programmable gate arrays, IEEE Signal Processing Letters, vol.2, issue.3, pp.51-53, 1995.

P. L. Montgomery, Five, six, and seven-term Karatsuba-like formulae, IEEE Transactions on Computers, vol.54, issue.3, pp.362-369, 2005.

C. Moore and J. M. Robson, Hard tiling problems with simple tiles, Discrete & Computational Geometry, vol.26, pp.573-590, 2001.

G. E. Moore, Cramming more components onto integrated circuits, reprinted from Electronics, vol.38, pp.33-35, 1965.

J. Muller, A few results on table-based methods, Reliable Computing, vol.5, issue.3, pp.279-288, 1999.
URL : https://hal.archives-ouvertes.fr/hal-02101969

J. Muller, Elementary functions, 2006.
URL : https://hal.archives-ouvertes.fr/ensl-00989001

D. A. Narayan and A. J. Schwenk, Tiling large rectangles, Mathematics Magazine, vol.75, pp.372-380, 2002.

A. Nayak, M. Haldar, A. Choudhary, and P. Banerjee, Accurate area and delay estimators for FPGAs, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition, pp.862-869, 2002.

H. D. Nguyen, B. Pasca, and T. B. Preußer, FPGA-specific arithmetic optimizations of short-latency adders, 21st International Conference on Field Programmable Logic and Applications, pp.232-237, 2011.
URL : https://hal.archives-ouvertes.fr/ensl-00542389

V. G. Oklobdzija, D. Villeger, and S. S. Liu, A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach, IEEE Transactions on Computers, vol.45, issue.3, pp.294-306, 1996.

A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, 2010.

H. Parandeh-Afshar, A. Neogy, P. Brisk, and P. Ienne, Compressor tree synthesis on commercial high-performance FPGAs, ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol.4, issue.4, p.39, 2011.

T. W. Parks and J. H. McClellan, Chebyshev Approximation for Nonrecursive Digital Filters with Linear Phase, IEEE Transactions on Circuit Theory, vol.19, issue.2, pp.189-194, 1972.

B. Pasca, Correctly rounded floating-point division for DSP-enabled FPGAs, 22nd International Conference on Field Programmable Logic and Applications (FPL), pp.249-254, 2012.

D. Perrelet, A. Villanueva, M. Sundal, Y. Brischetto, D. Oberson et al., White-Rabbit Based Revolution Frequency Program for the Longitudinal Beam Control of the CERN PS, 2015.

M. Potkonjak, M. Srivastava, and A. Chandrakasan, Efficient substitution of multiple constant multiplications by shifts and additions using iterative pairwise matching, ACM IEEE Design Automation Conference, pp.189-194, 1994.

P. Rabinowitz, Multiple-precision division, Commun. ACM, vol.4, issue.2, p.98, 1961.

S. Rajan, S. Wang, and R. Inkol, Efficient Approximations for the Four-Quadrant Arctangent Function, Canadian Conference on Electrical and Computer Engineering, 2006.

E. Remes, Sur le calcul effectif des polynômes d'approximation de Tchebichef, Comptes rendus hebdomadaires des séances de l'Académie des Sciences, vol.199, pp.337-340, 1934.

M. Saber, Y. Jitsumatsu, and T. Kohda, A low-power implementation of arctangent function for communication applications using FPGA, Signal Design and its Applications in Communications, 2009.

S. M. Sait and H. Youssef, VLSI Physical Design Automation: Theory and Practice, 1994.

J. Schlessman, C. Chen, W. Wolf, B. Ozer, K. Fujino et al., Hardware/software co-design of an FPGA-based embedded tracking system, Conference on Computer Vision and Pattern Recognition Workshop, p.123, 2006.

M. Schulte and J. Stine, Symmetric bipartite tables for accurate function approximation, ARITH'13, pp.175-183, 1997.

P. Schumacher, Fast and accurate resource estimation of RTL-based designs targeting FPGAs, International Conference on Field Programmable Logic and Applications, 2008.

J. Shen and G. Strang, The asymptotics of optimal (equiripple) filters, IEEE Transactions on Signal Processing, vol.47, issue.4, pp.1087-1098, 1999.

R. Smyk, FIReWORK: FIR filters hardware structures auto-generator, Journal of Applied Computer Science, vol.21, issue.1, pp.135-149, 2013.

J. L. Stanislaus and T. Mohsenin, Low-complexity FPGA implementation of compressive sensing reconstruction, 2013 International Conference on Computing, Networking and Communications (ICNC), pp.671-675, 2013.

P. F. Stelling, C. U. Martel, V. G. Oklobdzija, and R. Ravi, Optimal circuits for parallel multipliers, IEEE Transactions on Computers, vol.47, issue.3, pp.273-285, 1998.

J. Stine and M. Schulte, The symmetric table addition method for accurate function approximation, vol.21, pp.167-177, 1999.

S. Story and P. T. Tang, New algorithms for improved transcendental functions on IA-64, 14th IEEE Symposium on Computer Arithmetic, 1999.

E. E. Swartzlander, Merged arithmetic, IEEE Transactions on Computers, vol.29, issue.10, pp.946-950, 1980.

P. E. Sweeney and E. R. Paternoster, Cutting and packing problems: a categorized, application-orientated research bibliography, Journal of the Operational Research Society, vol.43, issue.7, pp.691-706, 1992.

N. Takagi, Generating a power of an operand by a table look-up and a multiplication, ARITH'13, pp.126-131, 1997.

D. Timmermann, H. Hahn, and B. Hosticka, Modified CORDIC algorithm with reduced iterations, Electronics Letters, vol.25, issue.15, pp.950-951, 1989.

J. Valls, M. Kuhlmann, and K. K. Parhi, Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, vol.32, pp.207-222, 2002.

P. Van-emde-boas and M. Savelsbergh, Bounded tiling, an alternative to satisfiability, Proceedings of the 2nd Frege Memorial Conference, pp.401-407, 1984.

G. Venkataramani and Y. Gu, System-level retiming and pipelining, International Symposium on Field-Programmable Custom Computing Machines (FCCM), pp.80-87, 2014.

A. K. Verma, P. Brisk, and P. Ienne, Data-flow transformations to maximize the use of carry-save representation in arithmetic circuits, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.27, issue.10, pp.1761-1774, 2008.

J. E. Volder, The CORDIC trigonometric computing technique, IRE Transactions on Electronic Computers, EC, vol.8, issue.3, pp.330-334, 1959.

A. Volkova, T. Hilaire, and C. Lauter, Reliable evaluation of the Worst-Case Peak Gain matrix in multiple precision, IEEE Symposium on Computer Arithmetic, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01083879

A. Volkova, T. Hilaire, and C. Q. Lauter, Determining fixed-point formats for a digital filter implementation using the worst-case peak-gain measure, Asilomar Conference on Signals, Systems and Computers, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01308403

Y. Voronenko and M. Püschel, Multiplierless multiple constant multiplication, ACM Trans. Algorithms, issue.2, p.3, 2007.

C. S. Wallace, A suggestion for a fast multiplier, IEEE Transactions on Electronic Computers, EC, vol.13, issue.1, pp.14-17, 1964.

J. S. Walther, A unified algorithm for elementary functions, Spring Joint Computer Conference, AFIPS '71 (Spring), pp.379-385, 1971.

D. Wang, M. Ercegovac, and Y. Xiao, Complex function approximation using two-dimensional interpolation, IEEE Transactions on Computers, vol.63, issue.12, pp.2948-2960, 2014.

S. White, Applications of distributed arithmetic to digital signal processing: a tutorial review, IEEE ASSP Magazine, vol.6, issue.3, pp.4-19, 1989.

F. Willers and R. Beyer, Practical Analysis: Graphical and Numerical Methods, 1948.

M. J. Wirthlin, Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, vol.36, pp.7-15, 2004.

W. Wong and E. Goto, Fast evaluation of the elementary functions in single precision, IEEE Transactions on Computers, vol.44, issue.3, pp.453-457, 1995.

P. T. Yang, R. Jain, T. Yoshino, W. Gass, and A. Shah, A functional silicon compiler for high speed FIR digital filters, Acoustics, Speech, and Signal Processing, pp.1329-1332, 1990.