Mentor Graphics, Catapult-C Synthesis

C. Bastoul, Code generation in the polyhedral model is easier than you think, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004., pp.7-16, 2004.
DOI : 10.1109/PACT.2004.1342537

URL : https://hal.archives-ouvertes.fr/hal-00017260

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, PLuTo: A practical and fully automatic polyhedral program optimization system, Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2008.

L. Pouchet, C. Bastoul, A. Cohen, and J. Cavazos, Iterative optimization in the polyhedral model: Part II, multidimensional time, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'08), pp.90-100, 2008.
URL : https://hal.archives-ouvertes.fr/hal-01257273

K. Muthukumar and G. Doshi, Software Pipelining of Nested Loops, Proceedings of the 10th International Conference on Compiler Construction, ser. CC '01, pp.165-181, 2001.
DOI : 10.1007/3-540-45306-7_12

H. Rong, Z. Tang, R. Govindarajan, A. Douillet, and G. R. Gao, Single-dimension software pipelining for multidimensional loops, ACM Transactions on Architecture and Code Optimization, vol.4, issue.1, 2007.
DOI : 10.1145/1216544.1216550

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.105.1796

S. Derrien, S. Rajopadhye, and S. Sur-Kolay, Combined instruction and loop parallelism in array synthesis for FPGAs, Proceedings of the 14th international symposium on Systems synthesis, ISSS '01, pp.165-170, 2001.
DOI : 10.1145/500001.500039

J. Teich, L. Thiele, and L. Z. Zhang, Partitioning processor arrays under resource constraints, The Journal of VLSI Signal Processing, vol.17, issue.1, pp.5-20, 1997.
DOI : 10.1023/A:1007935215591

M. W. Benabderrahmane, L. Pouchet, A. Cohen, and C. Bastoul, The Polyhedral Model Is More Widely Applicable Than You Think, Compiler Construction, pp.283-303, 2010.
DOI : 10.1007/978-3-642-11970-5_16

URL : https://hal.archives-ouvertes.fr/inria-00551087

J. F. Collard, D. Barthou, and P. Feautrier, Fuzzy array dataflow analysis, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, pp.92-101, 1995.
DOI : 10.1145/209936.209947

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.29.6305

P. Feautrier, Dataflow analysis of array and scalar references, International Journal of Parallel Programming, vol.24, issue.4, pp.23-53, 1991.
DOI : 10.1007/BF01407931

F. Quilleré, S. Rajopadhye, and D. Wilde, Generation of efficient nested loops from polyhedra, International Journal of Parallel Programming, vol.28, issue.5, pp.469-498, 2000.
DOI : 10.1023/A:1007554627716

W. Kelly, W. Pugh, and E. Rosser, Code generation for multiple mappings, Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation, pp.332-341, 1995.
DOI : 10.1109/FMPC.1995.380437

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.8696

P. Boulet and P. Feautrier, Scanning polyhedra without Do-loops, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192), p.4, 1998.
DOI : 10.1109/PACT.1998.727127

URL : https://hal.archives-ouvertes.fr/inria-00564990

A. Guillou, P. Quinton, and T. Risset, Hardware Synthesis for Multi-Dimensional Time, in ASAP, IEEE Computer Society, pp.40-50, 2003.

A. Turjan, B. Kienhuis, and E. F. Deprettere, Classifying interprocess communication in process network representation of nested-loop programs, ACM Transactions on Embedded Computing Systems, vol.6, issue.2, 2007.
DOI : 10.1145/1234675.1234680

S. Verdoolaege, R. Seghir, K. Beyls, V. Loechner, and M. Bruynooghe, Counting Integer Points in Parametric Polytopes Using Barvinok's Rational Functions, Algorithmica, vol.48, issue.1, pp.37-66, 2007.
DOI : 10.1007/s00453-006-1231-0

P. Feautrier, Parametric integer programming, RAIRO - Operations Research, vol.22, issue.3, pp.243-268, 1988.
DOI : 10.1051/ro/1988220302431

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.9957

S. Verdoolaege, Available: http://www A library for doing polyhedral operations, IRISA, Tech. Rep, 1993.

P. Feautrier, Some efficient solutions to the affine scheduling problem. I. One-dimensional time, International Journal of Parallel Programming, vol.40, issue.6, pp.313-347, 1992.
DOI : 10.1007/BF01407835

S. Verdoolaege, Handbook of Signal Processing Systems, Polyhedral process networks, 2004.

C. Zissulescu, B. Kienhuis, and E. F. Deprettere, Increasing Pipelined IP Core Utilization in Process Networks Using Exploration, in FPL, ser. Lecture Notes in Computer Science, pp.690-699, 2004.

M. S. Lam, Software pipelining, PLDI, pp.318-328, 1988.
DOI : 10.1145/989393.989420

H. Yun, J. Kim, and S. Moon, Time optimal software pipelining of loops with control flows, International Journal of Parallel Programming, vol.31, issue.5, pp.339-391, 2003.
DOI : 10.1023/A:1027387028481

C. Akturan and J. M. , CALiBeR: a software pipelining algorithm for clustered embedded VLIW processors, IEEE/ACM International Conference on Computer Aided Design. ICCAD 2001. IEEE/ACM Digest of Technical Papers (Cat. No.01CH37281), pp.112-118, 2001.
DOI : 10.1109/ICCAD.2001.968606

M. Fellahi and A. Cohen, Software Pipelining in Nested Loops with Prolog-Epilog Merging, in HiPEAC, ser. Lecture Notes in Computer Science, pp.80-94, 2009.

M. T. O'Keefe and H. G. Dietz, Loop coalescing and scheduling for barrier MIMD architectures, IEEE Transactions on Parallel and Distributed Systems, vol.4, issue.9, pp.1060-1064, 1993.
DOI : 10.1109/71.243531

C. Bastoul and P. Feautrier, Adjusting a Program Transformation for Legality, Parallel Processing Letters, vol.15, issue.01n02, pp.3-17, 2005.
DOI : 10.1142/S0129626405002027

N. Vasilache, A. Cohen, and L. Pouchet, Automatic Correction of Loop Transformations, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pp.292-304, 2007.
DOI : 10.1109/PACT.2007.4336220

URL : https://hal.archives-ouvertes.fr/hal-01257283

V. Basupalli, T. Yuki, S. V. Rajopadhye, A. Morvan, S. Derrien et al., ompVerify: Polyhedral Analysis for the OpenMP Programmer, 7th International Workshop on OpenMP, IWOMP 2011, pp.37-53, 2011.
DOI : 10.1007/978-3-642-13217-9_2

URL : https://hal.archives-ouvertes.fr/hal-00752626

R. Chikhi, S. Derrien, A. Noumsi, and P. Quinton, Combining Flash Memory and FPGAs to Efficiently Implement a Massively Parallel Algorithm for Content-Based Image Retrieval, Reconfigurable Computing: Architectures, Tools and Applications, Third International Workshop, pp.247-258, 2007.

A. Cornu, S. Derrien, and D. Lavenier, HLS Tools for FPGA: Faster Development with Better Performance, Reconfigurable Computing: Architectures, Tools and Applications, 7th International Symposium, ARC 2011, pp.67-78, 2011.
DOI : 10.1007/978-3-642-19475-7_8

V. D. Tovinakere, O. Sentieys, and S. Derrien, A Polynomial Based Approach to Wakeup Time and Energy Estimation in Power-Gated Logic Clusters, Journal of Low Power Electronics (accepted for publication), p.51, 2011.

V. D. Tovinakere, O. Sentieys, and S. Derrien, Wakeup Time and Wakeup Energy Estimation in Power-Gated Logic Clusters, 2011 24th International Conference on VLSI Design, pp.340-345, 2011.
DOI : 10.1109/VLSID.2011.18

A. Darte, S. Derrien, and T. Risset, Hardware/Software Interface for Multi-Dimensional Processor Arrays, 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors (ASAP'05), pp.23-25, 2005.
DOI : 10.1109/ASAP.2005.38

S. Derrien, A. Guillou, P. Quinton, T. Risset, and C. Wagner, Automatic Synthesis of Efficient Interfaces for Compiled Regular Architectures, Workshop on Systems, Architectures, Modeling, and Simulation (SAMOS), pp.56-57, 2002.

S. Derrien and P. Quinton, Parallelizing HMMER for Hardware Acceleration on FPGAs, 18th IEEE International Conference on Application-Specific Systems, Architectures, and Processors, pp.10-17, 2007.

S. Derrien and P. Quinton, Hardware Acceleration of HMMER on FPGAs, Journal of Signal Processing Systems, vol.21, issue.16, pp.53-67, 2010.
DOI : 10.1007/s11265-008-0262-y

URL : https://hal.archives-ouvertes.fr/inria-00453947

S. Derrien and S. Rajopadhye, FCCMs and the memory wall, Proceedings 2000 IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.PR00871), pp.329-330, 2000.
DOI : 10.1109/FPGA.2000.903439

S. Derrien and S. Rajopadhye, Loop Tiling for Reconfigurable Accelerators, Proceedings of the 11th International Conference on Field-Programmable Logic and Applications, pp.398-408, 2001.
DOI : 10.1007/3-540-44687-7_41

S. Derrien and S. Rajopadhye, Energy/power estimation of regular processor arrays, Proceedings of the 15th international symposium on System Synthesis, ISSS '02, pp.50-55, 2002.
DOI : 10.1145/581199.581212

S. Derrien, S. Rajopadhye, and S. Sur-Kolay, Optimal Partitioning for FPGA Based Regular Array Implementations, IEEE PARELEC'00, pp.155-159, 2000.

S. Derrien, S. V. Rajopadhye, and S. Sur-Kolay, Combining Instruction and Loop Level Parallelism for Array Synthesis on FPGAs, International Symposium on System Synthesis (ISSS'01), pp.273-282, 2001.

S. Derrien and T. Risset, Interfacing Compiled FPGA Programs: the MMAlpha Approach, PDPTA2000: Second International Workshop on Engineering of Reconfigurable Hardware/Software Objects, pp.189-195, 2000.

S. Derrien, A. Turjan, and C. Kienhuis, Deriving Efficient Control for Process Networks, Workshop on Systems, Architectures, Modeling, and Simulation (SAMOS), p.64, 2003.

S. Derrien, A. Turjan, C. Zissulescu, B. Kienhuis, and E. F. Deprettere, Deriving efficient control in Process Networks with Compaan/Laura, International Journal of Embedded Systems, vol.3, issue.3, pp.170-180, 2008.
DOI : 10.1504/IJES.2008.020298

A. Floch, T. Yuki, C. Guy, S. Derrien, B. Combemale et al., Model-Driven Engineering and Optimizing Compilers: A Bridge Too Far?, ACM/IEEE 14th International Conference on Model Driven Engineering Languages and Systems (Models'11), pp.608-622, 2011.
DOI : 10.1007/s10270-006-0036-6

URL : https://hal.archives-ouvertes.fr/inria-00613575

S. Guyetant, M. Giraud, L. L'Hours, S. Derrien et al., Cluster of re-configurable nodes for scanning large genomic banks, Parallel Computing, vol.31, issue.1, pp.73-96, 2005.
DOI : 10.1016/j.parco.2004.12.005

D. Lavenier, S. Guyetant, S. Derrien, and S. Rubini, A Reconfigurable Parallel Disk System for Filtering Genomic Banks, Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms, pp.154-166, 2003.

A. Morvan, S. Derrien, and P. Quinton, Efficient nested loop pipelining in high level synthesis using polyhedral bubble insertion, 2011 International Conference on Field-Programmable Technology, p.66, 2011.
DOI : 10.1109/FPT.2011.6132715

URL : https://hal.archives-ouvertes.fr/hal-00746434

A. Noumsi, S. Derrien, and P. Quinton, Acceleration of a content-based image-retrieval application on the RDISK cluster, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, pp.109-109, 2006.
DOI : 10.1109/IPDPS.2006.1639346

M. A. Pasha, S. Derrien, and O. Sentieys, Ultra Low-Power FSM for Control Oriented Applications, ISCAS '09: IEEE International Symposium on Circuits and Systems, pp.1577-1580, 2009.

M. A. Pasha, S. Derrien, and O. Sentieys, A Complete Design-Flow for the Generation of Ultra Low-Power WSN Node Architectures Based on Micro-Tasking, Proc. of the IEEE/ACM Design Automation Conference (DAC), pp.693-698, 2010.

M. A. Pasha, S. Derrien, and O. Sentieys, System-Level Synthesis for Ultra Low-Power Wireless Sensor Nodes, Proc. of the 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD), pp.493-500, 2010.

M. A. Pasha, S. Derrien, and O. Sentieys, System Level Synthesis for Wireless Sensor Node Controllers: A Complete Design Flow, ACM Transactions on Design Automation of Electronic Systems, vol.51, pp.44-52, 2011.

M. A. Pasha, S. Derrien, and O. Sentieys, Toward Ultra Low-Power Hardware Specialization of a Wireless Sensor Network Node, INMIC 2009. IEEE International Multi Topic Conference, pp.1-6, 2009.

R. Chikhi, S. Derrien, A. Noumsi, and P. Quinton, Combining flash memory and FPGAs to efficiently implement a massively parallel algorithm for content-based image retrieval, International Journal of Electronics, vol.95, issue.7, pp.621-635, 2008.
DOI : 10.1109/40.592312

Frama-C, a Framework for Modular Analysis of C programs, p.72

C. Alias, F. Baray, and A. Darte, Bee+Cl@k: an implementation of lattice-based array contraction in the source-to-source translator Rose, Proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES'07), pp.73-82, 2007.

S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Research, vol.25, issue.17, pp.3389-3402, 1997.
DOI : 10.1093/nar/25.17.3389

L. Amsaleg and P. Gros, Content-based Retrieval Using Local Descriptors: Problems and Issues from a Database Perspective, Pattern Analysis & Applications, vol.4, issue.2-3, pp.108-124, 2001.
DOI : 10.1007/s100440170011

URL : https://hal.archives-ouvertes.fr/inria-00568567

M. M. Baskaran, A. Hartono, S. Tavarageri, T. Henretty, J. Ramanujam et al., Parameterized tiling revisited, Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, CGO '10, pp.200-209, 2010.

C. Bastoul, Efficient code generation for automatic parallelization and optimization, Second International Symposium on Parallel and Distributed Computing, 2003. Proceedings., pp.23-30, 2003.
DOI : 10.1109/ISPDC.2003.1267639

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.621.1073

C. Bastoul, Code generation in the polyhedral model is easier than you think, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004., pp.7-16, 2004.
DOI : 10.1109/PACT.2004.1342537

URL : https://hal.archives-ouvertes.fr/hal-00017260

R. B. Batista, A. Boukerche, and A. C. M. A. de Melo, A parallel strategy for biological sequence alignment in restricted memory space, Journal of Parallel and Distributed Computing, vol.68, issue.4, pp.548-561, 2008.
DOI : 10.1016/j.jpdc.2007.08.007

M. Bednara and J. Teich, Interface synthesis for FPGA-based VLSI processor arrays, International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA-02), p.57, 2002.

M. W. Benabderrahmane, L. Pouchet, A. Cohen, and C. Bastoul, The Polyhedral Model Is More Widely Applicable Than You Think, Compiler Construction, pp.283-303, 2010.
DOI : 10.1007/978-3-642-11970-5_16

URL : https://hal.archives-ouvertes.fr/inria-00551087

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, PLuTo: A practical and fully automatic polyhedral program optimization system, Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pp.53-72, 2008.

J. L. Bosque, O. D. Robles, A. Rodriguez, and L. Pastor, Study of a parallel CBIR implementation using MPI, Proceedings Fifth IEEE International Workshop on Computer Architectures for Machine Perception, p.19, 2000.
DOI : 10.1109/CAMP.2000.875978

P. Boulet and P. Feautrier, Scanning polyhedra without Do-loops, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192), pp.64-68, 1998.
DOI : 10.1109/PACT.1998.727127

URL : https://hal.archives-ouvertes.fr/inria-00564990

R. P. Brent and H. T. Kung, A Regular Layout for Parallel Adders, IEEE Transactions on Computers, vol.31, issue.3, pp.260-264, 1982.
DOI : 10.1109/TC.1982.1675982

J. Bu, E. F. Deprettere, and P. Dewilde, A design methodology for fixed-size systolic arrays, [1990] Proceedings of the International Conference on Application Specific Array Processors, pp.591-602, 1990.
DOI : 10.1109/ASAP.1990.145495

Y. Cai, E. F. Haratsch, M. Mccartney, and K. Mai, FPGA-Based Solid-State Drive Prototyping Platform, 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, pp.101-104, 2011.
DOI : 10.1109/FCCM.2011.28

P. Clauss and V. Loechner, Parametric analysis of polyhedral iteration spaces, Proceedings of International Conference on Application Specific Systems, Architectures and Processors: ASAP '96, pp.179-194, 1998.
DOI : 10.1109/ASAP.1996.542833

URL : https://hal.archives-ouvertes.fr/inria-00534840

J. F. Collard, D. Barthou, and P. Feautrier, Fuzzy array dataflow analysis, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, pp.92-101, 1995.
DOI : 10.1145/209936.209947

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.29.6305

A. Darte, Regular partitioning for synthesizing fixed-size systolic arrays, Integration, the VLSI Journal, pp.293-304, 1991.
DOI : 10.1016/0167-9260(91)90026-h

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.7037

A. Darte and J. Delosme, Partitioning for array processors, p.57, 1991.

A. Darte, R. Schreiber, B. R. Rau, and F. Vivien, Constructing and exploiting linear schedules with prescribed parallelism, ACM Transactions on Design Automation of Electronic Systems, vol.7, issue.1, pp.159-172, 2002.
DOI : 10.1145/504914.504921

URL : https://hal.archives-ouvertes.fr/hal-00807410

F. de Dinechin, J. Detrey, O. Cret, and R. Tudoran, When FPGAs are better at floating-point than microprocessors, Proceedings of the ACM/SIGDA 16th International Symposium on Field Programmable Gate Arrays, pp.260-279, 2008.

F. de Dinechin, C. Klein, and B. Pasca, Generating high-performance custom floating-point pipelines, 19th International Conference on Field Programmable Logic and Applications, pp.59-64, 2009.

B. Dutertre and L. de Moura, Yices: An SMT Solver, p.72, 2006.

H. Dutta, F. Hannig, H. Ruckdeschel, and J. Teich, Efficient control generation for mapping nested loop programs onto processor arrays, Journal of Systems Architecture, vol.53, issue.5-6, pp.300-309, 2007.
DOI : 10.1016/j.sysarc.2006.10.009

H. Dutta, F. Hannig, and J. Teich, PARO: A Design Tool for Synthesis of Hardware Accelerators for SoCs, Tool Presentation at the University Booth at Design, Automation and Test in Europe (DATE), p.54, 2010.

S. Eddy, HMMER3: a new generation of sequence homology search software, p.28

S. R. Eddy, Profile hidden Markov models, Bioinformatics, vol.14, issue.9, pp.755-763, 1998.
DOI : 10.1093/bioinformatics/14.9.755

S. R. Eddy, Accelerated profile HMM searches (preprint), p.6, 2011.
DOI : 10.1371/journal.pcbi.1002195

URL : http://doi.org/10.1371/journal.pcbi.1002195

V. Ekanayake, C. Kelly IV, and R. Manohar, An ultra low-power processor for sensor networks, ACM SIGOPS Operating Systems Review, vol.38, issue.5, pp.27-36, 2004.
DOI : 10.1145/1037949.1024397

J. F. Eusse Giraldo, N. Moreano, R. P. Jacobi, A. C. M. A. de Melo et al., A HMMER hardware accelerator using divergences, Proceedings of the Conference on Design, Automation and Test in Europe, pp.405-410, 2010.

M. Fellahi and A. Cohen, Software Pipelining in Nested Loops with Prolog-Epilog Merging, High Performance Embedded Architectures and Compilers, Fourth International Conference Proceedings, pp.80-94, 2009.
DOI : 10.1007/978-3-540-92990-1_8

URL : https://hal.archives-ouvertes.fr/inria-00445489

D. Fimmel, Generation of scheduling functions supporting LSGP-partitioning, Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors, pp.349-358, 2000.
DOI : 10.1109/ASAP.2000.862405

A. Fraboulet and T. Risset, Efficient on-chip communications for data-flow IPs, IEEE International Conference on Application-specific Systems, Architectures and Processors, pp.293-303, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00399632

C. W. Fraser, R. R. Henry, and T. A. Proebsting, BURG, ACM SIGPLAN Notices, vol.27, issue.4, pp.68-76, 1992.
DOI : 10.1145/131080.131089

N. Ganesan, R. D. Chamberlain, J. Buhler, and M. Taufer, Accelerating HMMER on GPUs by implementing hybrid data and task parallelism, Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology, BCB '10, pp.418-421, 2010.
DOI : 10.1145/1854776.1854844

S. Girbal, N. Vasilache, C. Bastoul, A. Cohen, D. Parello et al., Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies, International Journal of Parallel Programming, vol.20, issue.1, pp.261-317, 2006.
DOI : 10.1007/s10766-006-0012-3

URL : https://hal.archives-ouvertes.fr/hal-01257288

L. Gonnord and N. Halbwachs, Combining Widening and Acceleration in Linear Relation Analysis, 13th International Static Analysis Symposium, SAS'06, p.72, 2006.
DOI : 10.1007/11823230_10

URL : https://hal.archives-ouvertes.fr/hal-00189614

Mentor Graphics, Catapult-C Synthesis, p.56

A. Guillou, P. Quinton, T. Risset, C. Wagner, and D. Massicotte, High-level design of digital filters in mobile communications, DATE Design Contest, p.54, 2001.

A. Guillou, P. Quinton, and T. Risset, Hardware synthesis for multi-dimensional time, Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003, pp.40-50, 2003.
DOI : 10.1109/ASAP.2003.1212828

T. Han and D. A. Carlson, Fast area-efficient VLSI adders, 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH), pp.49-55, 1987.
DOI : 10.1109/ARITH.1987.6158699

F. Hannig, H. Ruckdeschel, and J. Teich, The PAULA Language for Designing Multi-Dimensional Dataflow-Intensive Applications, Proceedings of the GI/ITG/GMM-Workshop Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen, pp.129-138, 2008.

F. Hannig and J. Teich, Output serialization for fpga-based and coarse-grained processor arrays, Proceedings of The 2005 International Conference on Engineering of Reconfigurable Systems and Algorithms, ERSA 2005, pp.78-84, 2005.

D. Harris, A taxonomy of parallel prefix networks, The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, pp.2213-2217, 2003.
DOI : 10.1109/ACSSC.2003.1292373

M. Hempstead, G.-Y. Wei, and D. Brooks, An accelerator-based wireless sensor network processor in 130nm CMOS, Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, CASES '09, pp.215-222, 2009.
DOI : 10.1145/1629395.1629426

R. Hess, An open-source SIFT Library, Proceedings of the international conference on Multimedia, MM '10, pp.1493-1496, 2010.
DOI : 10.1145/1873951.1874256

D. T. Hoang, Searching genetic databases on Splash 2, [1993] Proceedings IEEE Workshop on FPGAs for Custom Computing Machines, pp.185-191, 1993.
DOI : 10.1109/FPGA.1993.279464

D. R. Horn, M. Houston, and P. Hanrahan, ClawHMMER: A Streaming HMMer-Search Implementation, ACM/IEEE SC 2005 Conference (SC'05), p.47
DOI : 10.1109/SC.2005.18

F. Irigoin and R. Triolet, Supernode partitioning, Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '88, pp.319-329, 1988.
DOI : 10.1145/73560.73588

G. Kahn, The Semantics of a Simple Language For Parallel Programming, Proceedings of the IFIP Congress, p.56, 1974.

K. Keeton, D. A. Patterson, and J. M. Hellerstein, A case for intelligent disks (IDISKs), SIGMOD Record, pp.42-52, 1998.

A. C. Kienhuis, Design Space Exploration of Stream-based Dataflow Architectures: Method and Tools, p.63, 1999.

B. Kienhuis, E. Rijpkema, and E. F. Deprettere, Compaan, Proceedings of the eighth international workshop on Hardware/software codesign, CODES '00, p.61, 2000.
DOI : 10.1145/334012.334015

B. Kienhuis, E. Rijpkema, and E. F. Deprettere, Deriving Process Networks from Nested Loop Algorithms, Proc. 8th International Workshop on Hardware/Software Codesign (CODES'2000), p.56, 2000.

S. Knowles, A family of adders, ARITH '99: Proceedings of the 14th IEEE Symposium on Computer Arithmetic, p.35, 1999.

P. M. Kogge and H. S. Stone, A parallel algorithm for the efficient solution of a general class of recurrence equations, IEEE Transactions on Computers, vol.22, issue.8, pp.786-793, 1973.

A. Krogh, M. Brown, I. S. Mian, K. Sjölander, and D. Haussler, Hidden Markov Models in Computational Biology, Journal of Molecular Biology, vol.235, issue.5, pp.1501-1531, 1994.
DOI : 10.1006/jmbi.1994.1104

H. T. Kung and C. E. Leiserson, Algorithms for VLSI Processor Arrays, p.54, 1978.

J. Kwong, Y. K. Ramadass, N. Verma, and A. P. Chandrakasan, A 65 nm Sub-$V_{t}$ Microcontroller With Integrated SRAM and Switched Capacitor DC-DC Converter, IEEE Journal of Solid-State Circuits, vol.44, issue.1, pp.115-126, 2009.
DOI : 10.1109/JSSC.2008.2007160

R. E. Ladner and M. J. Fischer, Parallel Prefix Computation, Journal of the ACM, vol.27, issue.4, pp.831-838, 1980.

M. Lam, Software pipelining, Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation, PLDI '88, pp.318-328, 1988.
DOI : 10.1145/989393.989420

D. Lavenier, G. Georges, and X. Liu, A Reconfigurable Index FLASH Memory tailored to Seed-Based Genomic Sequence Comparison Algorithms, The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, vol.85, issue.11, pp.255-269, 2007.
DOI : 10.1007/s11265-007-0073-6

URL : https://hal.archives-ouvertes.fr/inria-00178314

H. Lejsek, B. Jónsson, and L. Amsaleg, NV-Tree, Proceedings of the 1st ACM International Conference on Multimedia Retrieval, ICMR '11, pp.1-54, 2011.
DOI : 10.1145/1991996.1992050

URL : https://hal.archives-ouvertes.fr/hal-00644939

L. L'Hours, Generating efficient custom FPGA soft-cores for control-dominated applications, 16th IEEE International Conference on Application-Specific Systems, Architectures, and Processors, pp.23-25, 2005.

T. Li, M. Huang, T. El-ghazawi, and H. H. Huang, Reconfigurable Active Disk: An FPGA Accelerated Storage Architecture for Data-Intensive Applications, Symposium on Application Accelerators in High-Performance Computing, p.24, 2009.

S. Liao, S. Devadas, K. Keutzer, and S. Tjiang, Instruction selection using binate covering for code size optimization, IEEE/ACM International Conference on Computer-Aided Design, ICCAD'95, pp.393-399, 1995.

E. A. Lin, J. M. Rabaey, and A. Wolisz, Power-efficient rendez-vous schemes for dense wireless sensor networks, 2004 IEEE International Conference on Communications (IEEE Cat. No.04CH37577), pp.3769-3776, 2004.
DOI : 10.1109/ICC.2004.1313259

E. Lindahl, HMMer Altivec Implementation, p.37, 2005.

L. L'Hours, Generating Efficient Custom FPGA Soft-Cores for Control-Dominated Applications, Proceedings of the IEEE International Conference on Application-Specific Systems, Architecture Processors: ASAP '05, pp.127-133, 2005.

J. Luu, I. Kuon, P. Jamieson, T. Campbell, A. Ye et al., VPR 5.0: FPGA CAD and architecture exploration tools with single-driver routing, heterogeneity and process scaling, Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays, FPGA '09, pp.133-142, 2009.

R. P. Maddimsetty, J. Buhler, R. D. Chamberlain, M. A. Franklin, and B. Harris, Accelerator design for protein sequence HMM search, Proceedings of the 20th annual international conference on Supercomputing, ICS '06, pp.37-38, 2006.
DOI : 10.1145/1183401.1183442

K. Martin, C. Wolinski, K. Kuchcinski, A. Floch, and F. Charot, Constraint-Driven Instructions Selection and Application Scheduling in the DURASE system, 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors, p.45
DOI : 10.1109/ASAP.2009.19

URL : https://hal.archives-ouvertes.fr/inria-00449747

S. Meijer, H. Nikolov, and T. Stefanov, On compile-time evaluation of process partitioning transformations for Kahn process networks, Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis, CODES+ISSS '09, pp.31-40, 2009.
DOI : 10.1145/1629435.1629441

G. Memik, M. T. Kandemir, and A. Choudhary, Design and Evaluation of a Smart Disk Cluster for DSS Commercial Workloads, Journal of Parallel and Distributed Computing, vol.61, issue.11, pp.1633-1664, 2001.
DOI : 10.1006/jpdc.2001.1743

D. I. Moldovan and J. A. Fortes, Partitioning and Mapping Algorithms into Fixed Size Systolic Arrays, IEEE Transactions on Computers, vol.35, issue.1, pp.1-12, 1986.
DOI : 10.1109/TC.1986.1676652

R. Mueller and J. Teubner, FPGAs, Proceedings of the 13th International Conference on Extending Database Technology, EDBT '10, pp.721-723, 2010.
DOI : 10.1145/1739041.1739137

K. Muthukumar and G. Doshi, Software Pipelining of Nested Loops, Proceedings of the 10th International Conference on Compiler Construction, CC '01, pp.165-181, 2001.
DOI : 10.1007/3-540-45306-7_12

S. Mysore, B. Agrawal, F. T. Chong, and T. Sherwood, Exploring the Processor and ISA Design for Wireless Sensor Network Applications, 21st International Conference on VLSI Design (VLSID 2008), pp.59-64, 2008.
DOI : 10.1109/VLSI.2008.72

L. Nazhandali, M. Minuth, and T. Austin, Sensebench: toward an accurate evaluation of sensor network processors, IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005., pp.197-203, 2005.
DOI : 10.1109/IISWC.2005.1526017

T. Oliver, B. Schmidt, Y. Jakop, and D. L. Maskell, Accelerating the Viterbi Algorithm for Profile Hidden Markov Models Using Reconfigurable Hardware, International Conference on Computational Science, pp.33-38, 2006.
DOI : 10.1007/11758501_71

T. Oliver, L. Y. Yeow, and B. Schmidt, High Performance Database Searching with HMMer on FPGAs, 2007 IEEE International Parallel and Distributed Processing Symposium, pp.37-38, 2007.
DOI : 10.1109/IPDPS.2007.370448

J. Park and P. C. Diniz, Synthesis of pipelined memory access controllers for streamed data applications on FPGA-based computing engines, Proceedings of the 14th international symposium on Systems synthesis, ISSS '01, pp.221-226, 2001.
DOI : 10.1145/500001.500054

J. Park and P. C. Diniz, Synthesis and estimation of memory interfaces for FPGA-based reconfigurable computing engines, International Symposium on FPGA Custom Computing Machines, p.57, 2003.

F. A. P. Petitcolas, R. J. Anderson, and M. G. Kuhn, Attacks on copyright marking systems, Information Hiding, pp.218-238, 1998.

A. Plesco, Program Transformations and Memory Architecture Optimizations for High-Level Synthesis of Hardware Accelerators, Ecole normale supérieure de Lyon, pp.56-72, 2010.

L. Pouchet, C. Bastoul, A. Cohen, and J. Cavazos, Iterative optimization in the polyhedral model: Part II, multidimensional time, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'08), pp.90-100, 2008.
URL : https://hal.archives-ouvertes.fr/hal-01257273

L. Pouchet, U. Bondhugula, C. Bastoul, A. Cohen, J. Ramanujam et al., Loop Transformations: Convexity, Pruning and Optimization, 38th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL'11), pp.549-562, 2011.
DOI : 10.1145/1925844.1926449

URL : https://hal.archives-ouvertes.fr/inria-00551077

P. Quinton, Automatic synthesis of systolic arrays from uniform recurrent equations, Proceedings of the 11th annual international symposium on Computer architecture, ISCA '84, pp.208-214, 1984.

L. Rabiner and B. Juang, An introduction to hidden Markov models, IEEE ASSP Magazine, vol.3, issue.1, pp.4-16, 1986.
DOI : 10.1109/MASSP.1986.1165342

R. K. Raval, Low-power TinyOS tuned processor platform for wireless sensor network motes, ACM Transactions on Design Automation of Electronic Systems, vol.15, issue.3, pp.1-17, 2010.
DOI : 10.1145/1754405.1754408

L. Renganarayanan, D. Kim, S. Rajopadhye, and M. M. Strout, Parameterized tiled loops for free, Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, PLDI '07, pp.405-414, 2007.

CAIRN-INRIA research group, GeCoS: The generic compiler suite.

E. Riedel, C. Faloutsos, G. A. Gibson, and D. Nagle, Active disks for large-scale data processing, Computer, vol.34, issue.6, p.14, 2001.
DOI : 10.1109/2.928624

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.15.4090

E. Rijpkema, From Piecewise Regular Algorithms to Dataflow Architectures, p.61, 2001.

O. D. Robles, J. L. Bosque, L. Pastor, and A. Rodríguez, Performance Analysis of a CBIR System on Shared-Memory Systems and Heterogeneous Clusters, Seventh International Workshop on Computer Architecture for Machine Perception (CAMP'05), p.19, 2005.
DOI : 10.1109/CAMP.2005.40

H. Rong, Z. Tang, R. Govindarajan, A. Douillet, and G. R. Gao, Single-dimension software pipelining for multidimensional loops, ACM Transactions on Architecture and Code Optimization, vol.4, issue.1, 2007.
DOI : 10.1145/1216544.1216550

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.105.1796

R. Schreiber, S. Aditya, B. R. Rau, V. Kathail, S. Mahlke et al., High-level synthesis of non programmable hardware accelerators, IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP'00), pp.113-126, 2000.

M. Seok, The Phoenix Processor: A 30pW Platform for Sensor Applications, VLSI'08: Proceedings of the IEEE Symposium on VLSI Circuits, pp.188-189, 2008.

W. Shang and J. A. Fortes, Independent partitioning of algorithms with uniform dependencies, International Conference on Parallel Processing (ICPP'88), pp.26-33, 1988.
DOI : 10.1109/12.123395

M. Sheets, F. Burghardt, T. Karalar, J. Ammer, Y. H. Chee et al., A power-managed protocol processor for wireless sensor networks, Symposium on VLSI Circuits Digest of Technical Papers, pp.212-213, 2006.

J. Sklansky, Conditional-Sum Addition Logic, IEEE Transactions on Electronic Computers, vol.9, issue.2, pp.226-231, 1960.
DOI : 10.1109/TEC.1960.5219822

T. F. Smith and M. S. Waterman, Identification of common molecular subsequences, Journal of Molecular Biology, vol.147, issue.1, pp.195-197, 1981.
DOI : 10.1016/0022-2836(81)90087-5

C. W. Smullen IV, S. R. Tarapore, S. Gurumurthi, P. Ranganathan, and M. Uysal, Active storage revisited, Proceedings of the 2008 conference on Computing frontiers, CF '08, pp.293-304, 2008.
DOI : 10.1145/1366230.1366280

P. Sotin and B. Jeannet, Precise Interprocedural Analysis in the Presence of Pointers to the Stack, p.72
DOI : 10.1007/11547662_20

URL : https://hal.archives-ouvertes.fr/inria-00547888

J. Steel and J. Jézéquel, Model Typing for Improving Reuse in Model-Driven Engineering, Proceedings of MODELS/UML'2005, p.75, 2005.
DOI : 10.1007/11557432_7

URL : https://hal.archives-ouvertes.fr/hal-00795081

T. Stefanov, B. Kienhuis, and E. Deprettere, Algorithmic transformation techniques for efficient exploration of alternative application instances, Proceedings of the tenth international symposium on Hardware/software codesign, CODES '02, pp.7-12, 2002.
DOI : 10.1145/774789.774792

T. Stefanov, C. Zissulescu, A. Turjan, B. Kienhuis, and E. Deprettere, System design using Kahn process networks: The Compaan/Laura approach, Proceedings of DATE2004, p.61, 2004.

Y. Sun, P. Li, G. Gu, Y. Wen, Y. Liu et al., HMMer acceleration using systolic array based reconfigurable architecture, Proceeding of the ACM/SIGDA international symposium on Field programmable gate arrays, FPGA '09, pp.37-38, 2009.
DOI : 10.1145/1508128.1508193

T. Takagi and T. Maruyama, Accelerating HMMER search using FPGA, 2009 International Conference on Field Programmable Logic and Applications, pp.37-38, 2009.
DOI : 10.1109/FPL.2009.5272276

S. Tavarageri, L. Pouchet, J. Ramanujam, A. Rountev, and P. Sadayappan, Dynamic selection of tile sizes, 2011 18th International Conference on High Performance Computing, p.73, 2011.
DOI : 10.1109/HiPC.2011.6152742

J. Teich, L. Thiele, and L. Z. Zhang, Partitioning Processor Arrays under Resource Constraints, The Journal of VLSI Signal Processing, vol.17, issue.1, pp.5-20, 1997.
DOI : 10.1023/A:1007935215591

A. Turjan, B. Kienhuis, and E. F. Deprettere, Classifying interprocess communication in process network representation of nested-loop programs, ACM Transactions on Embedded Computing Systems, vol.6, issue.2, p.56, 2007.
DOI : 10.1145/1234675.1234680

A. Turjan, T. Stefanov, B. Kienhuis, and E. Deprettere, The Compaan Tool Chain: Converting Matlab into Process Networks, Designers' Forum, Design, Automation and Test in Europe, pp.7-57, 2002.

S. Verdoolaege, Polyhedral process networks, Handbook of Signal Processing Systems, pp.56-61, 2004.

E. Viaud, F. Pêcheux, and A. Greiner, An efficient TLM/T modeling and simulation environment based on conservative parallel discrete event principles, Proceedings of the conference on Design, automation and test in Europe: Proceedings, DATE '06, European Design and Automation Association, pp.94-99, 2006.

J. E. Vuillemin, On computing power, pp.69-86, 1994.

J. P. Walters, B. Qudah, and V. Chaudhary, Accelerating the HMMER sequence analysis suite using conventional processors, 20th International Conference on Advanced Information Networking and Applications, Volume 1 (AINA'06), p.37, 2006.
DOI : 10.1109/AINA.2006.68

J. P. Walters, V. Balu, S. Kompalli, and V. Chaudhary, Evaluating the use of GPUs in liver image segmentation and HMMER database searches, 2009 IEEE International Symposium on Parallel & Distributed Processing, pp.1-12, 2009.
DOI : 10.1109/IPDPS.2009.5161073

H. Wang, X. Zhu, L. Peh, and S. Malik, Orion: a power-performance simulator for interconnection networks, Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture, pp.294-305, 2002.

D. Wilde, A Library for Doing Polyhedral Operations, Parallel Algorithms and Applications, vol.15, issue.3-4, p.54, 1993.
DOI : 10.1007/BF02574699

URL : https://hal.archives-ouvertes.fr/inria-00074515

B. Wun, J. Buhler, and P. Crowley, Exploiting coarse-grained parallelism to accelerate protein motif finding with a network processor, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05), p.37, 2005.
DOI : 10.1109/PACT.2005.21

J. Xue and C. Lengauer, The synthesis of control signals for one-dimensional systolic arrays, Integration, the VLSI Journal, vol.14, issue.1, pp.1-32, 1992.
DOI : 10.1016/0167-9260(92)90008-M

C. Zissulescu, B. Kienhuis, and E. F. Deprettere, Expression Synthesis in Process Networks generated by LAURA, 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors (ASAP'05), pp.15-21, 2005.
DOI : 10.1109/ASAP.2005.34

C. Zissulescu, T. Stefanov, B. Kienhuis, and E. F. Deprettere, Laura: Leiden Architecture Research and Exploration Tool, Field Programmable Logic and Applications (FPL'03), pp.911-920, 2003.
DOI : 10.1007/978-3-540-45234-8_88