A. H. El Moussawi and S. Derrien, Superword level parallelism aware word length optimization, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.
DOI : 10.23919/DATE.2017.7927148

URL : https://hal.archives-ouvertes.fr/hal-01425550

N. Estibals, G. Deest, A. H. El Moussawi, and S. Derrien, System Level Synthesis for Virtual Memory Enabled Hardware Threads, Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp.738-743, 2016.
DOI : 10.3850/9783981537079_0733

URL : https://hal.archives-ouvertes.fr/hal-01424772

A. H. El Moussawi and S. Derrien, Demo: SLP-aware word length optimization, 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP), 2016.
DOI : 10.1109/DASIP.2016.7853829

A. Floc'h, T. Yuki, A. El-Moussawi, A. Morvan, K. Martin et al., GeCoS: A framework for prototyping custom hardware design flows, 2013 IEEE 13th International Working Conference on Source Code Analysis and Manipulation (SCAM), pp.100-105, 2013.

G. Goulas, C. Valouxis, P. Alefragis, N. S. Voros et al., Coarse-grain optimization and code generation for embedded multicore systems, 2013 Euromicro Conference on Digital System Design (DSD), pp.379-386, 2013.

A. V. Aho and S. C. Johnson, Optimal code generation for expression trees, Proceedings of Seventh Annual ACM Symposium on Theory of Computing, STOC '75, pp.207-217, 1975.

A. V. Aho, S. C. Johnson, and J. D. Ullman, Code Generation for Expressions with Common Subexpressions, Journal of the ACM, vol.24, issue.1, pp.146-160, 1977.
DOI : 10.1145/321992.322001

A. V. Aho, M. Ganapathi, and S. W. K. Tjiang, Code generation using tree matching and dynamic programming, ACM Transactions on Programming Languages and Systems (TOPLAS), vol.11, issue.4, pp.491-516, 1989.

J. R. Allen and K. Kennedy, PFC: A Program to Convert Fortran to Parallel Form, 1982.

R. Allen and K. Kennedy, Automatic translation of FORTRAN programs to vector form, ACM Transactions on Programming Languages and Systems, vol.9, issue.4, pp.491-542, 1987.
DOI : 10.1145/29873.29875

R. Barik, J. Zhao, and V. Sarkar, Efficient Selection of Vector Instructions Using Dynamic Programming, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp.201-212, 2010.
DOI : 10.1109/MICRO.2010.38

M. Bass, P. Knebel, D. W. Quint, and W. L. Walker, The PA 7100LC microprocessor: A Case Study of IC Design Decisions in a Competitive Environment, Hewlett-Packard Journal, vol.46, pp.12-22, 1995.

P. Belanovic and M. Rupp, Fixify: A Toolset for Automated Floating-point to Fixed-point Conversion, International Conference on Computer, Communication, and Control Technologies (CCCT'04), 2004.

P. Belanovic and M. Rupp, Automated floating-point to fixed-point conversion with the fixify environment, 16th IEEE International Workshop on Rapid System Prototyping, pp.172-178, 2005.

R. Bhargava, L. K. John, B. L. Evans, and R. Radhakrishnan, Evaluating MMX technology using DSP and multimedia applications, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture, pp.37-46, 1998.
DOI : 10.1109/MICRO.1998.742767

A. J. C. Bik, M. Girkar, P. M. Grey, and X. Tian, Automatic intra-register vectorization for the Intel architecture, International Journal of Parallel Programming, vol.30, issue.2, pp.65-98, 2002.

H.-J. Boehm and R. Cartwright, Exact real arithmetic: Formulating real numbers as functions, 1993.

D. Boland and G. A. Constantinides, A scalable approach for automated precision analysis, Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays, FPGA '12, pp.185-194, 2012.
DOI : 10.1145/2145694.2145726

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A practical automatic polyhedral parallelizer and locality optimizer, ACM SIGPLAN Notices, vol.43, issue.6, pp.101-113, 2008.
DOI : 10.1145/1379022.1375595

J. Bruno and R. Sethi, Code Generation for a One-Register Machine, Journal of the ACM, vol.23, issue.3, pp.502-510, 1976.
DOI : 10.1145/321958.321971

G. Caffarena, A. Fernández, C. Carreras, and O. Nieto-Taladriz, Fixed-point refinement of OFDM-based adaptive equalizers: An heuristic approach, European Signal Processing Conference (EUSIPCO), 2004.

G. Caffarena, C. Carreras, J. A. López, and Á. Fernández, SQNR Estimation of Fixed-Point DSP Algorithms, EURASIP Journal on Advances in Signal Processing, vol.2010, issue.1, pp.1-2112, 2010.
DOI : 10.1109/TCSI.2004.823652

M. Cantin, Y. Savaria, and P. Lavoie, A comparison of automatic word length optimization procedures, 2002 IEEE International Symposium on Circuits and Systems. Proceedings (Cat. No.02CH37353), pp.612-615, 2002.
DOI : 10.1109/ISCAS.2002.1011427

C. Chang, C. Chen, and C. King, Using integer linear programming for instruction scheduling and register allocation in multi-issue processors, Computers & Mathematics with Applications, vol.34, issue.9, pp.1-14, 1997.
DOI : 10.1016/S0898-1221(97)00184-3

G. Cheong and M. Lam, An Optimizer for Multimedia Instruction Sets, 1997.

A. G. M. Cilio and H. Corporaal, Floating Point to Fixed Point Conversion of C Code, Compiler Construction, pp.229-243, 1999.

F. Cladera, M. Gautier, and O. Sentieys, Energy-Aware Computing via Adaptive Precision under Performance Constraints in OFDM Wireless Receivers, 2015 IEEE Computer Society Annual Symposium on VLSI, pp.591-596, 2015.
DOI : 10.1109/ISVLSI.2015.88

URL : https://hal.archives-ouvertes.fr/hal-01175920

S. Coleman and K. S. McKinley, Tile size selection using cache organization and data layout, ACM SIGPLAN Notices, vol.30, issue.6, pp.279-290, 1995.
DOI : 10.1145/223428.207162

M. Coors, H. Keding, O. Lüthje, and H. Meyr, Design and DSP Implementation of Fixed-Point Systems, EURASIP Journal on Advances in Signal Processing, vol.2002, issue.9, pp.908-925, 2002.
DOI : 10.1155/S1110865702205065

G. Deest, T. Yuki, O. Sentieys, and S. Derrien, Toward scalable source level accuracy analysis for floating-point to fixed-point conversion, 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp.726-733, 2014.
DOI : 10.1109/ICCAD.2014.7001432

URL : https://hal.archives-ouvertes.fr/hal-01095207

A. E. Eichenberger, P. Wu, and K. O'Brien, Vectorization for SIMD Architectures with Alignment Constraints, ACM SIGPLAN Notices, vol.39, pp.82-93, 2004.

P. Feautrier, Dataflow analysis of array and scalar references, International Journal of Parallel Programming, vol.20, issue.1, pp.23-53, 1991.
DOI : 10.1007/BF01407931

P. Feautrier, Some efficient solutions to the affine scheduling problem. I. One-dimensional time, International Journal of Parallel Programming, vol.21, issue.5, pp.313-347, 1992.
DOI : 10.1007/BF01407835

P. Feautrier, Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time, International Journal of Parallel Programming, vol.21, issue.6, pp.389-420, 1992.
DOI : 10.1007/BF01379404

D. Feld, T. Soddemann, M. Jünger, and S. Mallach, Facilitate SIMD-Code-Generation in the Polyhedral Model by Hardware-aware Automatic Code-Transformation, IMPACT 2013, p.45, 2013.

L. Fireman, E. Petrank, and A. Zaks, New Algorithms for SIMD Alignment, Compiler Construction, pp.1-15, 2007.
DOI : 10.1007/978-3-540-71229-9_1

R. J. Fisher and H. G. Dietz, Compiling For SIMD Within A Register, Languages and Compilers for Parallel Computing, pp.290-305, 1999.

A. Floc'h, T. Yuki, A. El-Moussawi, A. Morvan, K. Martin et al., GeCoS: A framework for prototyping custom hardware design flows, 2013 IEEE 13th International Working Conference on Source Code Analysis and Manipulation (SCAM), pp.100-105, 2013.

F. Franchetti, S. Kral, J. Lorenz, and C. W. Ueberhuber, Efficient Utilization of SIMD Extensions, Proceedings of the IEEE, pp.409-425, 2005.
DOI : 10.1109/JPROC.2004.840491

J. Fridman, Sub-word parallelism in digital signal processing, IEEE Signal Processing Magazine, vol.17, issue.2, pp.27-35, 2000.
DOI : 10.1109/79.826409

S. Ghosh, M. Martonosi, and S. Malik, Cache miss equations, Proceedings of the 11th international conference on Supercomputing, ICS '97, pp.317-324, 1997.
DOI : 10.1145/263580.263657

M. Willems, H. Keding, F. Hürtgen, and M. Coors, Transformation of floating-point into fixed-point algorithms by interpolation applying a statistical approach, Proc. Int. Conf. on Signal Processing Application and Technology (ICSPAT), 1998.

K. Han, B. L. Evans, and E. E. Swartzlander, Data wordlength reduction for low-power signal processing software, Signal Processing Systems, pp.343-348, 2004.

T. Henretty, K. Stock, L.-N. Pouchet, F. Franchetti, J. Ramanujam et al., Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures, Compiler Construction, pp.225-245, 2011.
DOI : 10.1109/COMPSAC.2009.82

M. Hohenauer, F. Engel, R. Leupers, G. Ascheid, and H. Meyr, A SIMD optimization framework for retargetable compilers, ACM Transactions on Architecture and Code Optimization, vol.6, issue.1, 2009.
DOI : 10.1145/1509864.1509866

L. Huang, L. Shen, S. Ma, N. Xiao, and Z. Wang, DM-SIMD: a new SIMD predication mechanism for exploiting superword level parallelism, 2009 IEEE 8th International Conference on ASIC, pp.863-866, 2009.

Inria, Generic Compiler Suite (GeCoS), 2016.

B. Jang, P. Mistry, D. Schaa, R. Dominguez, and D. Kaeli, Data transformations enabling loop vectorization on multithreaded data parallel architectures, ACM SIGPLAN Notices, vol.45, issue.5, pp.353-354, 2010.
DOI : 10.1145/1837853.1693510

M. Jiménez, J. M. Llabería, A. Fernández, and E. Morancho, A general algorithm for tiling the register level, Proceedings of the 12th international conference on Supercomputing, ICS '98, pp.133-140, 1998.
DOI : 10.1145/277830.277859

H. Keding, M. Coors, O. Lüthje, and H. Meyr, Fast bit-true simulation, Proceedings of the 38th conference on Design automation, DAC '01, pp.708-713, 2001.
DOI : 10.1145/378239.379052

H. Keding, M. Willems, M. Coors, and H. Meyr, FRIDGE: a fixed-point design and simulation environment, Proceedings Design, Automation and Test in Europe, pp.429-435, 1998.
DOI : 10.1109/DATE.1998.655893

S. Kim, K. Kum, and W. Sung, Fixed-point optimization utility for C and C++ based digital signal processing programs, IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol.45, issue.11, pp.1455-1464, 1998.

S. Kim and W. Sung, Fixed-point simulation utility for C and C++ based digital signal processing programs, Conference Record of the Twenty-Eighth Asilomar Conference on Signals, Systems and Computers, pp.162-166, 1994.

S. Kim and W. Sung, A floating-point to fixed-point assembly program translator for the TMS 320C25, IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol.41, issue.11, pp.730-739, 1994.

S. Kim and H. Han, Efficient SIMD Code Generation for Irregular Kernels, Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pp.55-64, 2012.

T. Kim and Y. Hoskote, Automatic generation of custom SIMD instructions for superword level parallelism, Proceedings of the Conference on Design, Automation & Test in Europe (DATE '14), European Design and Automation Association, p.362, 2014.

P. Knebel, B. Arnold, M. Bass, W. Kever, J. D. Lamb et al., HP's PA7100LC: a low-cost superscalar PA-RISC processor, Digest of Papers. Compcon Spring, pp.441-447, 1993.
DOI : 10.1109/CMPCON.1993.289711

R. Koenig, L. Bauer, T. Stripf, M. Shafique, W. Ahmed et al., KAHRISMA: A Novel Hypermorphic Reconfigurable-Instruction-Set Multi-grained-Array Architecture, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010), pp.819-824, 2010.
DOI : 10.1109/DATE.2010.5456939

D. R. Koes and S. C. Goldstein, Near-optimal instruction selection on DAGs, Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, pp.45-54, 2008.

M. Kong, R. Veras, K. Stock, F. Franchetti, L.-N. Pouchet et al., When Polyhedral Transformations Meet SIMD Code Generation, Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, pp.127-138, 2013.

S. Kraemer, R. Leupers, G. Ascheid, and H. Meyr, SoftSIMD - Exploiting Subword Parallelism Using Source Code Transformations, 2007 Design, Automation & Test in Europe Conference & Exhibition, pp.1349-1354, 2007.
DOI : 10.1109/DATE.2007.364485

A. Krall and S. Lelait, Compilation Techniques for Multimedia Processors, International Journal of Parallel Programming, vol.28, issue.4, pp.347-361, 2000.
DOI : 10.1023/A:1007507005174

A. Kudriavtsev and P. Kogge, Generation of permutations for SIMD processors, ACM SIGPLAN Notices, vol.40, issue.7, pp.147-156, 2005.
DOI : 10.1145/1070891.1065931

K. Kum, J. Kang, and W. Sung, AUTOSCALER for C: An optimizing floating-point to integer C program converter for fixed-point digital signal processors, IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol.47, issue.9, pp.840-848, 2000.

K. Kum and W. Sung, Word-length optimization for high-level synthesis of digital signal processing systems, 1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374), pp.569-578, 1998.
DOI : 10.1109/SIPS.1998.715819

S. Larsen, Compilation techniques for short-vector instructions, PhD thesis, Massachusetts Institute of Technology, 2006.

S. Larsen and S. Amarasinghe, Exploiting Superword Level Parallelism with Multimedia Instruction Sets, Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, PLDI '00, pp.145-156, 2000.

S. Larsen, R. Rabbah, and S. Amarasinghe, Exploiting Vector Parallelism in Software Pipelined Loops, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05), pp.119-129, 2005.
DOI : 10.1109/MICRO.2005.20

S. Larsen, E. Witchel, and S. Amarasinghe, Increasing and detecting memory address congruence, Proceedings. International Conference on Parallel Architectures and Compilation Techniques, pp.18-29, 2002.
DOI : 10.1109/PACT.2002.1105970

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.6590

R. B. Lee, Subword Parallelism with MAX-2, IEEE Micro, vol.16, issue.4, pp.51-59, 1996.

S. Lee and A. Gerstlauer, Fine Grain Precision Scaling for Datapath Approximations in Digital Signal Processing Systems, IFIP/IEEE International Conference on Very Large Scale Integration-System on a Chip, pp.119-143, 2013.
DOI : 10.1007/978-3-319-23799-2_6

URL : https://hal.archives-ouvertes.fr/hal-01380301

R. Leupers, Code Selection for Media Processors with SIMD Instructions, Proceedings of the conference on Design, Automation and Test in Europe, pp.4-8, 2000.

R. Leupers and P. Marwedel, Instruction selection for embedded DSPs with complex instructions, Proceedings EURO-DAC '96. European Design Automation Conference with EURO-VHDL '96 and Exhibition, pp.200-205, 1996.
DOI : 10.1109/EURDAC.1996.558205

C. Li, W. Luo, S. S. Sapatnekar, and J. Hu, Joint precision optimization and high level synthesis for approximate computing, Proceedings of the 52nd Annual Design Automation Conference, DAC '15, p.104, 2015.
DOI : 10.1109/TCAD.2006.873887

B. Liu, Effect of finite word length on the accuracy of digital filters--a review, IEEE Transactions on Circuit Theory, vol.18, issue.6, pp.670-677, 1971.
DOI : 10.1109/TCT.1971.1083365

J. Liu, Y. Zhang, O. Jang, W. Ding, and M. Kandemir, A Compiler Framework for Extracting Superword Level Parallelism, Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '12, pp.347-358, 2012.

P. Liu, R. Zhao, W. Gao, and S. Wei, A New Algorithm to Exploit Superword Level Parallelism, 2013 IEEE 11th International Conference on Dependable, Autonomic and Secure Computing, pp.521-527, 2013.
DOI : 10.1109/DASC.2013.118

Y. Liu, L. Liu, V. Öwall, and S. Chen, Implementation of a dynamic wordlength SIMD multiplier, NORCHIP 2014, pp.1-4, 2014.

J. A. Lopez, G. Caffarena, C. Carreras, and O. Nieto-Taladriz, Fast and accurate computation of the round-off noise of linear time-invariant systems, IET Circuits, Devices & Systems, vol.2, issue.4, pp.393-408, 2008.
DOI : 10.1049/iet-cds:20070198

D. Menard, H. N. Nguyen, F. Charot, S. Guyetant, J. Guillot et al., Exploiting reconfigurable SWP operators for multimedia applications, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.1717-1720, 2011.
DOI : 10.1109/ICASSP.2011.5946832

URL : https://hal.archives-ouvertes.fr/inria-00567017

D. Menard, R. Rocher, and O. Sentieys, Analytical fixed-point accuracy evaluation in linear time-invariant systems, IEEE Transactions on Circuits and Systems I: Regular Papers, vol.55, issue.10, pp.3197-3208, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00459231

D. Menard, D. Chillet, F. Charot, and O. Sentieys, Automatic floating-point to fixed-point conversion for DSP code generation, Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems, pp.270-276, 2002.
URL : https://hal.archives-ouvertes.fr/inria-00482916

D. Menard, D. Chillet, and O. Sentieys, Floating-to-Fixed-Point Conversion for Digital Signal Processors, EURASIP Journal on Applied Signal Processing, vol.37, issue.8, pp.77-77, 2006.
DOI : 10.1155/ASP/2006/96421

URL : https://hal.archives-ouvertes.fr/inria-00459212

D. Menard, D. Novo, R. Rocher, F. Catthoor, and O. Sentieys, Quantization mode opportunities in fixed-point system design, 18th European Signal Processing Conference, pp.542-546, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00534526

D. Menard, R. Rocher, P. Scalart, and O. Sentieys, Automatic SQNR determination in non-linear and non-recursive fixed-point systems, 12th European Signal Processing Conference, pp.1349-1352, 2004.
URL : https://hal.archives-ouvertes.fr/inria-00482941

D. Menard and O. Sentieys, DSP Code Generation with Optimized Data Word-Length Selection, Software and Compilers for Embedded Systems, pp.214-228, 2004.
URL : https://hal.archives-ouvertes.fr/inria-00482942

D. Menard and O. Sentieys, Automatic evaluation of the accuracy of fixed-point algorithms, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition, 2002.
DOI : 10.1109/DATE.2002.998351

URL : https://hal.archives-ouvertes.fr/inria-00482931

D. Naishlos, Autovectorization in GCC, Proceedings of the 2004 GCC Developers Summit, pp.105-118, 2004.

H.-N. Nguyen, D. Menard, and O. Sentieys, Novel algorithms for word-length optimization, 19th European Signal Processing Conference, pp.1944-1948, 2011.

H. Nobayashi and C. Eoyang, A comparison study of automatically vectorizing Fortran compilers, Proceedings of the 1989 ACM/IEEE conference on Supercomputing, Supercomputing '89, pp.820-825, 1989.
DOI : 10.1145/76263.76356

D. Novillo, OpenMP and automatic parallelization in GCC, Proceedings of the GCC Developers' Summit, 2006.

D. Novo, S. El-alaoui, and P. Ienne, Accuracy vs Speed Tradeoffs in the Estimation of Fixed-Point Errors on Linear Time-Invariant Systems, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, pp.15-20, 2013.
DOI : 10.7873/DATE.2013.018

D. Nuzman, I. Rosen, and A. Zaks, Auto-vectorization of interleaved data for SIMD, ACM SIGPLAN Notices, vol.41, issue.6, pp.132-143, 2006.
DOI : 10.1145/1133255.1133997

D. A. Padua and M. J. Wolfe, Advanced compiler optimizations for supercomputers, Communications of the ACM, vol.29, issue.12, pp.1184-1201, 1986.

K. Parashar, D. Menard, R. Rocher, O. Sentieys, D. Novo et al., Fast performance evaluation of fixed-point systems with un-smooth operators, 2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp.9-16, 2010.
DOI : 10.1109/ICCAD.2010.5654064

URL : https://hal.archives-ouvertes.fr/inria-00534527

A. Peleg and U. Weiser, MMX technology extension to the Intel architecture, IEEE Micro, vol.16, issue.4, pp.42-50, 1996.
DOI : 10.1109/40.526924

M. Philippsen, A survey of concurrent object-oriented languages, Concurrency: Practice and Experience, pp.917-980, 2000.

V. Porpodas and T. M. Jones, Throttling Automatic Vectorization: When Less is More, 2015 International Conference on Parallel Architecture and Compilation (PACT), 2015.
DOI : 10.1109/PACT.2015.32

I. Pryanishnikov, A. Krall, and N. Horspool, Pointer alignment analysis for processors with SIMD instructions, Proceedings of the 5th Workshop on Media and Streaming Processors, pp.50-57, 2003.

I. Pryanishnikov, A. Krall, and N. Horspool, Compiler optimizations for processors with SIMD instructions, Software: Practice and Experience, pp.93-113, 2007.

G. Psychou, R. Fasthuber, F. Catthoor, J. Hulzink, and J. Huisken, Sub-word Handling in Data-parallel Mapping, ARCS Workshops (ARCS), 2012, pp.1-7, 2012.

G. Ren, P. Wu, and D. Padua, A preliminary study on the vectorization of multimedia applications for multimedia extensions, Languages and Compilers for Parallel Computing, pp.420-435, 2004.

G. Ren, P. Wu, and D. Padua, An Empirical Study On the Vectorization of Multimedia Applications for Multimedia Extensions, 19th IEEE International Parallel and Distributed Processing Symposium, pp.89-89, 2005.
DOI : 10.1109/IPDPS.2005.94

G. Ren, P. Wu, and D. Padua, Optimizing data permutations for SIMD devices, ACM SIGPLAN Notices, vol.41, issue.6, pp.118-131, 2006.

L. Renganarayana, U. Bondhugula, S. Derisavi, A. E. Eichenberger, and K. O'Brien, Compact multi-dimensional kernel extraction for register tiling, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, p.45, 2009.
DOI : 10.1145/1654059.1654105

R. Rocher and D. Menard, Analytical approach for numerical accuracy estimation of fixed-point systems based on smooth operations, IEEE Transactions on Circuits and Systems I: Regular Papers, vol.59, pp.2326-2339, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00741741

R. Rocher, D. Menard, N. Herve, and O. Sentieys, Fixed-point configurable hardware components, EURASIP J. Embedded Syst, issue.1, pp.20-20, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00455557

J. R. Rose and G. L. Steele, C*: An extended C language for data parallel programming, International Conference on Supercomputing, 1987.

S. Roy and P. Banerjee, An algorithm for converting floating-point computations to fixed-point in MATLAB based FPGA design, Proceedings of the 41st annual conference on Design automation, DAC '04, pp.484-487, 2004.
DOI : 10.1145/996566.996701

S. Roy and P. Banerjee, An Algorithm for Trading Off Quantization Error with Hardware Resources for MATLAB-Based FPGA Design, IEEE Transactions on Computers, vol.54, issue.7, pp.886-896, 2005.
DOI : 10.1109/TC.2005.106

R. Rugina and M. Rinard, Pointer analysis for multithreaded programs, ACM SIGPLAN Notices, vol.34, issue.5, pp.77-90, 1999.
DOI : 10.1145/301631.301645

R. M. Russell, The CRAY-1 Computer System, Communications of the ACM, vol.21, issue.1, pp.63-72, 1978.

R. R. Schaller, Moore's law: past, present, and future, IEEE Spectrum, vol.34, issue.6, pp.52-59, 1997.

K. Scott and J. Davidson, Exploring the limits of sub-word level parallelism, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00622), pp.81-91, 2000.
DOI : 10.1109/PACT.2000.888333

E. Sedano, D. Menard, and J. A. López, Automated Data Flow Graph Partitioning for a Hierarchical Approach to Wordlength Optimization, 2014.
DOI : 10.1007/978-3-319-05960-0_12

R. Sethi, Complete Register Allocation Problems, SIAM Journal on Computing, vol.4, issue.3, pp.226-248, 1975.
DOI : 10.1137/0204020

C. Shi and R. W. Brodersen, An automated floating-point to fixed-point conversion methodology, Proc. IEEE Int. Conf. on Acoust., Speech, and Signal Processing, pp.529-532, 2003.

C. Shi and R. W. Brodersen, A perturbation theory on statistical quantization effects in fixed-point DSP with non-stationary inputs, Proceedings of the 2004 International Symposium on Circuits and Systems, pp.373-379, 2004.

J. Shin, Compiler Optimizations for Architectures Supporting Superword-level Parallelism, PhD thesis, University of Southern California, 2005.

J. Shin, Introducing Control Flow into Vectorized Code, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pp.280-291, 2007.
DOI : 10.1109/PACT.2007.4336219

J. Shin, J. Chame, and M. W. Hall, Exploiting Superword-Level Locality in Multimedia Extension Architectures, Journal of Instruction-Level Parallelism, vol.5, pp.1-28, 2003.

J. Shin, M. Hall, and J. Chame, Superword-Level Parallelism in the Presence of Control Flow, Proceedings of the international symposium on Code generation and optimization, pp.165-175, 2005.

N. Sreraman and R. Govindarajan, A vectorizing compiler for multimedia extensions, International Journal of Parallel Programming, vol.28, issue.4, pp.363-400, 2000.
DOI : 10.1023/A:1007559022013

T. Stripf, R. Koenig, and J. Becker, A cycle-approximate, mixed-ISA simulator for the KAHRISMA architecture, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp.21-26, 2012.
DOI : 10.1109/DATE.2012.6176426

W. Sung and K. Kum, Simulation-based word-length optimization method for fixed-point digital signal processing systems, IEEE Transactions on Signal Processing, vol.43, issue.12, pp.3087-3090, 1995.
DOI : 10.1109/78.476465

R. Surendran, R. Barik, J. Zhao, and V. Sarkar, Inter-iteration Scalar Replacement Using Array SSA Form, International Conference on Compiler Construction, pp.40-60, 2014.
DOI : 10.1007/978-3-642-54807-9_3

S. Tallam and R. Gupta, Bitwidth aware global register allocation, ACM SIGPLAN Notices, vol.38, issue.1, pp.85-96, 2003.
DOI : 10.1145/640128.604139

H. Tanaka, S. Kobayashi, Y. Takeuchi, K. Sakanushi, and M. Imai, A Code Selection Method for SIMD Processors with PACK Instructions, Software and Compilers for Embedded Systems, pp.66-80, 2003.
DOI : 10.1007/978-3-540-39920-9_6

C. Tenllado, L. Piñuel, M. Prieto, and F. Catthoor, Pack Transposition: Enhancing Superword Level Parallelism Exploitation, ParCo, pp.573-580, 2005.

C. Tenllado, L. Pinuel, M. Prieto, F. Tirado, and F. Catthoor, Improving superword level parallelism support in modern compilers, Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, CODES+ISSS '05, pp.303-308, 2005.
DOI : 10.1145/1084834.1084909

K. Trifunovic, D. Nuzman, A. Cohen, A. Zaks, and I. Rosen, Polyhedral-model guided loop-nest auto-vectorization, 18th International Conference on Parallel Architectures and Compilation Techniques (PACT'09), pp.327-337, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00645325

Y. Tung, C. Ho, and J. Wu, MMX-based DCT and MC Algorithms for Real-Time Pure Software MPEG Decoding, IEEE International Conference on Multimedia Computing and Systems, pp.357-362, 1999.

N. Vasilache, B. Meister, M. Baskaran, and R. Lethin, Joint scheduling and layout optimization to enable multi-level vectorization, IMPACT, 2012.

S. Verdoolaege, isl: An Integer Set Library for the Polyhedral Model, International Congress on Mathematical Software, pp.299-302, 2010.
DOI : 10.1007/978-3-642-15582-6_49

S. A. Wadekar and A. C. Parker, Accuracy sensitive word-length selection for algorithm optimization, International Conference on Computer Design: VLSI in Computers and Processors (ICCD'98), pp.54-61, 1998.

P. Wu, A. E. Eichenberger, and A. Wang, Efficient SIMD Code Generation for Runtime Alignment and Length Conversion, International Symposium on Code Generation and Optimization (CGO 2005), pp.153-164, 2005.

P. Wu, A. E. Eichenberger, A. Wang, and P. Zhao, An integrated simdization framework using virtual vectors, Proceedings of the 19th annual international conference on Supercomputing, ICS '05, pp.169-178, 2005.
DOI : 10.1145/1088149.1088172