Superword level parallelism aware word length optimization, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, 2017. ,
DOI : 10.23919/DATE.2017.7927148
URL : https://hal.archives-ouvertes.fr/hal-01425550
System Level Synthesis for Virtual Memory Enabled Hardware Threads, Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp.738-743, 2016. ,
DOI : 10.3850/9783981537079_0733
URL : https://hal.archives-ouvertes.fr/hal-01424772
Demo: SLP-aware word length optimization, 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP), 2016. ,
DOI : 10.1109/DASIP.2016.7853829
Gecos: A framework for prototyping custom hardware design flows, Source Code Analysis and Manipulation (SCAM), 2013 IEEE 13th International Working Conference on, pp.100-105, 2013. ,
Coarse-grain optimization and code generation for embedded multicore systems, Digital System Design (DSD), 2013 Euromicro Conference on, pp.379-386, 2013. ,
Optimal code generation for expression trees, Proceedings of Seventh Annual ACM Symposium on Theory of Computing, STOC '75, pp.207-217, 1975. ,
Code Generation for Expressions with Common Subexpressions, Journal of the ACM, vol.24, issue.1, pp.146-160, 1977. ,
DOI : 10.1145/321992.322001
Code generation using tree matching and dynamic programming, ACM Transactions on Programming Languages and Systems (TOPLAS), vol.11, issue.4, pp.491-516, 1989. ,
PFC: A Program to Convert Fortran to Parallel Form, 1982. ,
Automatic translation of FORTRAN programs to vector form, ACM Transactions on Programming Languages and Systems, vol.9, issue.4, pp.491-542, 1987. ,
DOI : 10.1145/29873.29875
Efficient Selection of Vector Instructions Using Dynamic Programming, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp.201-212, 2010. ,
DOI : 10.1109/MICRO.2010.38
The PA 7100LC microprocessor: A Case Study of IC Design Decisions in a Competitive Environment, HEWLETT PACKARD JOURNAL, vol.46, pp.12-22, 1995. ,
Fixify: A Toolset for Automated Floating-point to Fixed-point Conversion, International Conference on Computer, Communication, and Control Technologies CCCT'04, 2004. ,
Automated floating-point to fixed-point conversion with the fixify environment, The 16th IEEE International Workshop on Rapid System Prototyping, pp.172-178, 2005. ,
Evaluating MMX technology using DSP and multimedia applications, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture, pp.37-46, 1998. ,
DOI : 10.1109/MICRO.1998.742767
Automatic intra-register vectorization for the intel architecture, International Journal of Parallel Programming, vol.30, issue.2, pp.65-98, 2002. ,
Exact real arithmetic: Formulating real numbers as functions, 1993. ,
A scalable approach for automated precision analysis, Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays, FPGA '12, pp.185-194 ,
DOI : 10.1145/2145694.2145726
A practical automatic polyhedral parallelizer and locality optimizer, ACM SIGPLAN Notices, vol.43, issue.6, pp.101-113, 2008. ,
DOI : 10.1145/1379022.1375595
Code Generation for a One-Register Machine, Journal of the ACM, vol.23, issue.3, pp.502-510, 1976. ,
DOI : 10.1145/321958.321971
Fixed-point refinement of OFDM-based adaptive equalizers: An heuristic approach, EUSIPCO Conference, 2004. ,
SQNR Estimation of Fixed-Point DSP Algorithms, EURASIP Journal on Advances in Signal Processing, vol.2010, issue.1, pp.1-2112, 2010. ,
DOI : 10.1109/TCSI.2004.823652
A comparison of automatic word length optimization procedures, 2002 IEEE International Symposium on Circuits and Systems. Proceedings (Cat. No.02CH37353), pp.612-615, 2002. ,
DOI : 10.1109/ISCAS.2002.1011427
Using integer linear programming for instruction scheduling and register allocation in multi-issue processors, Computers & Mathematics with Applications, vol.34, issue.9, pp.1-14, 1997. ,
DOI : 10.1016/S0898-1221(97)00184-3
An Optimizer for Multimedia Instruction Sets, Contract, vol.30602, issue.95, p.98, 1997. ,
Floating Point to Fixed Point Conversion of C Code, Compiler Construction, pp.229-243, 1999. ,
Energy-Aware Computing via Adaptive Precision under Performance Constraints in OFDM Wireless Receivers, 2015 IEEE Computer Society Annual Symposium on VLSI, pp.591-596, 2015. ,
DOI : 10.1109/ISVLSI.2015.88
URL : https://hal.archives-ouvertes.fr/hal-01175920
Tile size selection using cache organization and data layout, ACM SIGPLAN Notices, vol.30, issue.6, pp.279-290, 1995. ,
DOI : 10.1145/223428.207162
Design and DSP Implementation of Fixed-Point Systems, EURASIP Journal on Advances in Signal Processing, vol.2002, issue.9, pp.908-925, 2002. ,
DOI : 10.1155/S1110865702205065
Toward scalable source level accuracy analysis for floating-point to fixed-point conversion, 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp.726-733, 2014. ,
DOI : 10.1109/ICCAD.2014.7001432
URL : https://hal.archives-ouvertes.fr/hal-01095207
Vectorization for SIMD Architectures with Alignment Constraints, ACM SIGPLAN Notices, vol.39, pp.82-93, 2004. ,
Dataflow analysis of array and scalar references, International Journal of Parallel Programming, vol.24, issue.4, pp.23-53, 1991. ,
DOI : 10.1145/360827.360844
Some efficient solutions to the affine scheduling problem. I. One-dimensional time, International Journal of Parallel Programming, vol.40, issue.6, pp.313-347, 1992. ,
DOI : 10.1145/360827.360844
Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time, International Journal of Parallel Programming, vol.2, issue.4, pp.389-420, 1992. ,
DOI : 10.1007/BF01379404
Facilitate SIMD-Code-Generation in the Polyhedral Model by Hardware-aware Automatic Code-Transformation, IMPACT 2013, p.45, 2013. ,
New Algorithms for SIMD Alignment, Compiler Construction, pp.1-15, 2007. ,
DOI : 10.1007/978-3-540-71229-9_1
Compiling For SIMD Within A Register, Languages and Compilers for Parallel Computing, pp.290-305, 1999. ,
Efficient Utilization of SIMD Extensions, Proceedings of the IEEE, pp.409-425, 2005. ,
DOI : 10.1109/JPROC.2004.840491
Sub-word parallelism in digital signal processing, IEEE Signal Processing Magazine, vol.17, issue.2, pp.27-35, 2000. ,
DOI : 10.1109/79.826409
Cache miss equations, Proceedings of the 11th international conference on Supercomputing, ICS '97, pp.317-324, 1997. ,
DOI : 10.1145/263580.263657
Transformation of floating-point into fixed-point algorithms by interpolation applying a statistical approach, Proc. Int. Conf. on Signal Processing Application and Technology (ICSPAT), 1998. ,
Data wordlength reduction for low-power signal processing software, Signal Processing Systems, pp.343-348, 2004. ,
Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures, Compiler Construction, pp.225-245, 2011. ,
DOI : 10.1109/COMPSAC.2009.82
A SIMD optimization framework for retargetable compilers, ACM Transactions on Architecture and Code Optimization, vol.6, issue.1, 2009. ,
DOI : 10.1145/1509864.1509866
Dm-simd: a new simd predication mechanism for exploiting superword level parallelism, 2009 IEEE 8th International Conference on ASIC, pp.863-866, 2009. ,
Generic Compiler Suite (GeCoS), 2016. ,
Data transformations enabling loop vectorization on multithreaded data parallel architectures, ACM SIGPLAN Notices, vol.45, issue.5, pp.353-354, 2010. ,
DOI : 10.1145/1837853.1693510
A general algorithm for tiling the register level, Proceedings of the 12th international conference on Supercomputing, ICS '98, pp.133-140, 1998. ,
DOI : 10.1145/277830.277859
Fast bit-true simulation, Proceedings of the 38th conference on Design automation, DAC '01, pp.708-713, 2001. ,
DOI : 10.1145/378239.379052
FRIDGE: a fixed-point design and simulation environment, Proceedings Design, Automation and Test in Europe, pp.429-435, 1998. ,
DOI : 10.1109/DATE.1998.655893
Fixed-point optimization utility for C and C++ based digital signal processing programs, Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, vol.45, issue.11, pp.1455-1464, 1998. ,
Fixed-point simulation utility for C and C++ based digital signal processing programs, Signals, Systems and Computers Conference Record of the Twenty-Eighth Asilomar Conference on, pp.162-166, 1994. ,
A floating-point to fixed-point assembly program translator for the TMS320C25, Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, vol.41, issue.11, pp.730-739, 1994. ,
Efficient SIMD Code Generation for Irregular Kernels, Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pp.55-64 ,
Automatic generation of custom SIMD instructions for superword level parallelism, Proceedings of the conference on Design, Automation & Test in Europe, European Design and Automation Association, p.362, 2014. ,
HP's PA7100LC: a low-cost superscalar PA-RISC processor, Digest of Papers. Compcon Spring, pp.441-447, 1993. ,
DOI : 10.1109/CMPCON.1993.289711
KAHRISMA: A Novel Hypermorphic Reconfigurable-Instruction-Set Multi-grained-Array Architecture, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010), pp.819-824, 2010. ,
DOI : 10.1109/DATE.2010.5456939
Near-optimal instruction selection on dags, Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, pp.45-54, 2008. ,
When Polyhedral Transformations Meet SIMD Code Generation, Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation of PLDI '13, pp.127-138, 2013. ,
SoftSIMD - Exploiting Subword Parallelism Using Source Code Transformations, 2007 Design, Automation & Test in Europe Conference & Exhibition, pp.1349-1354, 2007. ,
DOI : 10.1109/DATE.2007.364485
Compilation Techniques for Multimedia Processors, International Journal of Parallel Programming, vol.28, issue.4, pp.347-361, 2000. ,
DOI : 10.1023/A:1007507005174
Generation of permutations for SIMD processors, ACM SIGPLAN Notices, vol.40, issue.7, pp.147-156, 2005. ,
DOI : 10.1145/1070891.1065931
Autoscaler for C: An optimizing floating-point to integer C program converter for fixed-point digital signal processors, Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, vol.47, issue.9, pp.840-848, 2000. ,
Word-length optimization for high-level synthesis of digital signal processing systems, 1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374), pp.569-578, 1998. ,
DOI : 10.1109/SIPS.1998.715819
Compilation techniques for short-vector instructions, Massachusetts Institute of Technology, 2006. ,
Exploiting Superword Level Parallelism with Multimedia Instruction Sets, Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, PLDI '00, pp.145-156, 2000. ,
Exploiting Vector Parallelism in Software Pipelined Loops, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05), pp.119-129, 2005. ,
DOI : 10.1109/MICRO.2005.20
Increasing and detecting memory address congruence, Proceedings.International Conference on Parallel Architectures and Compilation Techniques, pp.18-29, 2002. ,
DOI : 10.1109/PACT.2002.1105970
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.6590
Subword Parallelism with MAX-2, IEEE Micro, vol.16, issue.4, pp.51-59, 1996. ,
Fine Grain Precision Scaling for Datapath Approximations in Digital Signal Processing Systems, IFIP/IEEE International Conference on Very Large Scale Integration-System on a Chip, pp.119-143, 2013. ,
DOI : 10.1007/978-3-319-23799-2_6
URL : https://hal.archives-ouvertes.fr/hal-01380301
Code Selection for Media Processors with SIMD Instructions, Proceedings of the conference on Design, Automation and Test in Europe, pp.4-8, 2000. ,
Instruction selection for embedded DSPs with complex instructions, Proceedings EURO-DAC '96. European Design Automation Conference with EURO-VHDL '96 and Exhibition, pp.200-205, 1996. ,
DOI : 10.1109/EURDAC.1996.558205
Joint precision optimization and high level synthesis for approximate computing, Proceedings of the 52nd Annual Design Automation Conference on, DAC '15, p.104, 2015. ,
DOI : 10.1109/TCAD.2006.873887
Effect of finite word length on the accuracy of digital filters--a review, IEEE Transactions on Circuit Theory, vol.18, issue.6, pp.670-677, 1971. ,
DOI : 10.1109/TCT.1971.1083365
A Compiler Framework for Extracting Superword Level Parallelism, Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation of PLDI '12, pp.347-358, 2012. ,
A New Algorithm to Exploit Superword Level Parallelism, 2013 IEEE 11th International Conference on Dependable, Autonomic and Secure Computing, pp.521-527, 2013. ,
DOI : 10.1109/DASC.2013.118
Implementation of a dynamic wordlength simd multiplier, NORCHIP, 2014, pp.1-4, 2014. ,
Fast and accurate computation of the round-off noise of linear time-invariant systems, IET Circuits, Devices & Systems, vol.2, issue.4, pp.393-408, 2008. ,
DOI : 10.1049/iet-cds:20070198
Exploiting reconfigurable SWP operators for multimedia applications, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.1717-1720, 2011. ,
DOI : 10.1109/ICASSP.2011.5946832
URL : https://hal.archives-ouvertes.fr/inria-00567017
Analytical fixed-point accuracy evaluation in linear time-invariant systems, Circuits and Systems I: Regular Papers, IEEE Transactions on, vol.55, issue.10, pp.3197-3208, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00459231
Automatic floating-point to fixed-point conversion for DSP code generation, Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems, pp.270-276, 2002. ,
URL : https://hal.archives-ouvertes.fr/inria-00482916
Floating-to-Fixed-Point Conversion for Digital Signal Processors, EURASIP Journal on Applied Signal Processing, vol.37, issue.8, pp.77-77, 2006. ,
DOI : 10.1155/ASP/2006/96421
URL : https://hal.archives-ouvertes.fr/inria-00459212
Quantization mode opportunities in fixed-point system design, 18th European Signal Processing Conference, pp.542-546, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00534526
Automatic sqnr determination in non-linear and non-recursive fixed-point systems, Signal Processing Conference 12th European, pp.1349-1352, 2004. ,
URL : https://hal.archives-ouvertes.fr/inria-00482941
DSP Code Generation with Optimized Data Word-Length Selection, Software and Compilers for Embedded Systems, pp.214-228, 2004. ,
URL : https://hal.archives-ouvertes.fr/inria-00482942
Automatic evaluation of the accuracy of fixed-point algorithms, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition, 2002. ,
DOI : 10.1109/DATE.2002.998351
URL : https://hal.archives-ouvertes.fr/inria-00482931
Autovectorization in GCC, Proceedings of the 2004 GCC Developers Summit, pp.105-118, 2004. ,
Novel algorithms for word-length optimization, Signal Processing Conference 19th European, pp.1944-1948, 2011. ,
A comparison study of automatically vectorizing Fortran compilers, Proceedings of the 1989 ACM/IEEE conference on Supercomputing, Supercomputing '89, pp.820-825, 1989. ,
DOI : 10.1145/76263.76356
OpenMP and automatic parallelization in GCC, Proceedings of the GCC Developers Summit, 2006. ,
Accuracy vs Speed Tradeoffs in the Estimation of Fixed-Point Errors on Linear Time-Invariant Systems, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, pp.15-20, 2013. ,
DOI : 10.7873/DATE.2013.018
Auto-vectorization of interleaved data for SIMD, ACM SIGPLAN Notices, vol.41, issue.6, pp.132-143, 2006. ,
DOI : 10.1145/1133255.1133997
Advanced compiler optimizations for supercomputers, Communications of the ACM, vol.29, issue.12, pp.1184-1201, 1986. ,
Fast performance evaluation of fixed-point systems with un-smooth operators, 2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp.9-16, 2010. ,
DOI : 10.1109/ICCAD.2010.5654064
URL : https://hal.archives-ouvertes.fr/inria-00534527
MMX technology extension to the Intel architecture, IEEE Micro, vol.16, issue.4, pp.42-50, 1996. ,
DOI : 10.1109/40.526924
A survey of concurrent object-oriented languages, Concurrency - Practice and Experience, pp.917-980, 2000. ,
Throttling Automatic Vectorization: When Less is More, 2015 International Conference on Parallel Architecture and Compilation (PACT), 2015. ,
DOI : 10.1109/PACT.2015.32
Pointer alignment analysis for processors with simd instructions, Proceedings of the 5th Workshop on Media and Streaming Processors, pp.50-57, 2003. ,
Compiler optimizations for processors with SIMD instructions, Software: Practice and Experience, pp.93-113, 2007. ,
Sub-word Handling in Data-parallel Mapping, ARCS Workshops (ARCS), 2012, pp.1-7, 2012. ,
A preliminary study on the vectorization of multimedia applications for multimedia extensions, Languages and Compilers for Parallel Computing, pp.420-435, 2004. ,
An Empirical Study On the Vectorization of Multimedia Applications for Multimedia Extensions, 19th IEEE International Parallel and Distributed Processing Symposium, pp.89-89, 2005. ,
DOI : 10.1109/IPDPS.2005.94
Optimizing data permutations for SIMD devices, ACM SIGPLAN Notices, vol.41, issue.6, pp.118-131, 2006. ,
Compact multi-dimensional kernel extraction for register tiling, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, p.45, 2009. ,
DOI : 10.1145/1654059.1654105
Analytical approach for numerical accuracy estimation of fixed-point systems based on smooth operations, Circuits and Systems I, vol.59, pp.2326-2339, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00741741
Fixed-point configurable hardware components, EURASIP Journal on Embedded Systems, issue.1, pp.20-20, 2006. ,
URL : https://hal.archives-ouvertes.fr/inria-00455557
C*: an extended c language for data parallel programming, International Conference on Supercomputing, 1987. ,
An algorithm for converting floating-point computations to fixed-point in MATLAB based FPGA design, Proceedings of the 41st annual conference on Design automation, DAC '04, pp.484-487, 2004. ,
DOI : 10.1145/996566.996701
An Algorithm for Trading Off Quantization Error with Hardware Resources for MATLAB-Based FPGA Design, IEEE Transactions on Computers, vol.54, issue.7, pp.886-896, 2005. ,
DOI : 10.1109/TC.2005.106
Pointer analysis for multithreaded programs, ACM SIGPLAN Notices, vol.34, issue.5, pp.77-90, 1999. ,
DOI : 10.1145/301631.301645
The CRAY-1 Computer System, Communications of the ACM, vol.21, issue.1, pp.63-72, 1978. ,
Moore's law: past, present, and future, IEEE Spectrum, vol.34, issue.6, pp.52-59, 1997. ,
Exploring the limits of sub-word level parallelism, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00622), pp.81-91, 2000. ,
DOI : 10.1109/PACT.2000.888333
Automated Data Flow Graph Partitioning for a Hierarchical Approach to Wordlength Optimization, 2014. ,
DOI : 10.1007/978-3-319-05960-0_12
Complete Register Allocation Problems, SIAM Journal on Computing, vol.4, issue.3, pp.226-248, 1975. ,
DOI : 10.1137/0204020
An automated floating-point to fixed-point conversion methodology, Proc. IEEE Int. Conf. on Acoust., Speech, and Signal Processing, pp.529-532, 2003. ,
A perturbation theory on statistical quantization effects in fixed-point dsp with non-stationary inputs, Circuits and Systems Proceedings of the 2004 International Symposium on, pp.373-379, 2004. ,
Compiler Optimizations for Architectures Supporting Superword-level Parallelism, p.3196891, 2005. ,
Introducing Control Flow into Vectorized Code, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pp.280-291, 2007. ,
DOI : 10.1109/PACT.2007.4336219
Exploiting Superword-Level Locality in Multimedia Extension Architectures, J. Instr. Level Parallel, vol.5, pp.1-28, 2003. ,
Superword-Level Parallelism in the Presence of Control Flow, Proceedings of the international symposium on Code generation and optimization, pp.165-175, 2005. ,
A vectorizing compiler for multimedia extensions, International Journal of Parallel Programming, vol.28, issue.4, pp.363-400, 2000. ,
DOI : 10.1023/A:1007559022013
A cycle-approximate, mixed-ISA simulator for the KAHRISMA architecture, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp.21-26, 2012. ,
DOI : 10.1109/DATE.2012.6176426
Simulation-based word-length optimization method for fixed-point digital signal processing systems, IEEE Transactions on Signal Processing, vol.43, issue.12, pp.3087-3090, 1995. ,
DOI : 10.1109/78.476465
Inter-iteration Scalar Replacement Using Array SSA Form, International Conference on Compiler Construction, pp.40-60, 2014. ,
DOI : 10.1007/978-3-642-54807-9_3
Bitwidth aware global register allocation, ACM SIGPLAN Notices, vol.38, issue.1, pp.85-96, 2003. ,
DOI : 10.1145/640128.604139
A Code Selection Method for SIMD Processors with PACK Instructions, pp.66-80 ,
DOI : 10.1007/978-3-540-39920-9_6
Pack Transposition: Enhancing Superword Level Parallelism Exploitation, ParCo, pp.573-580, 2005. ,
Improving superword level parallelism support in modern compilers, Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, CODES+ISSS '05, pp.303-308, 2005. ,
DOI : 10.1145/1084834.1084909
Polyhedral-model guided loop-nest auto-vectorization, Parallel Architectures and Compilation Techniques, 2009. PACT'09. 18th International Conference on, pp.327-337, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00645325
MMX-based DCT and MC Algorithms for Real-Time Pure Software MPEG Decoding, Multimedia Computing and Systems, 1999. IEEE International Conference on, pp.357-362, 1999. ,
Joint scheduling and layout optimization to enable multi-level vectorization, IMPACT, 2012. ,
isl: An Integer Set Library for the Polyhedral Model, International Congress on Mathematical Software, pp.299-302, 2010. ,
DOI : 10.1007/978-3-642-15582-6_49
Accuracy sensitive word-length selection for algorithm optimization, Computer Design: VLSI in Computers and Processors, 1998. ICCD'98. Proceedings. International Conference on, pp.54-61, 1998. ,
Efficient SIMD Code Generation for Runtime Alignment and Length Conversion, International Symposium on Code Generation and Optimization, 2005. CGO 2005, pp.153-164, 2005. ,
An integrated simdization framework using virtual vectors, Proceedings of the 19th annual international conference on Supercomputing, ICS '05, pp.169-178, 2005. ,
DOI : 10.1145/1088149.1088172