N. L. Passos and E. H. Sha, Achieving full parallelism using multidimensional retiming, IEEE Transactions on Parallel and Distributed Systems, vol.7, issue.11, pp.1150-1163, 1996.
DOI : 10.1109/71.544356

P. Sheliga and E. H. Sha, Fully parallel hardware/software codesign for multidimensional DSP applications, Proceedings of the 4th International Workshop on Hardware/Software Co-Design CODES'96, Pennsylvania (USA), 1996.

L. Pouchet, C. Bastoul, A. Cohen, and J. Cavazos, Iterative optimization in the polyhedral model: Part II, multidimensional time, Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '08), pp.90-100, 2008.
URL : https://hal.archives-ouvertes.fr/hal-01257273

Q. Zhuge, C. J. Xue, M. Qiu, J. Hu, and E. H. Sha, Timing optimization via nest-loop pipelining considering code size, Microprocessors and Microsystems, vol.32, issue.7, pp.351-363, 2008.
DOI : 10.1016/j.micpro.2008.02.002

C. Xue and E. H. Sha, Maximize Parallelism Minimize Overhead for Nested Loops via Loop Striping, The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, vol.2, issue.4, pp.153-167, 2006.
DOI : 10.1007/s11265-006-0034-5

Q. Zhuge, C. Xue, Z. Shao, M. Liu, M. Qiu et al., Design optimization and space minimization considering timing and code size via retiming and unfolding, Microprocessors and Microsystems, vol.30, issue.4, pp.173-183, 2006.
DOI : 10.1016/j.micpro.2005.11.002

T. W. O'Neil, S. Tongsima, and E. H. Sha, Extended retiming: Optimal scheduling via a graph-theoretical approach, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1999.

T. W. O'Neil and E. H. Sha, Combining extended retiming and unfolding for rate-optimal graph transformation, J. of VLSI Sign. Process, vol.39, issue.3, pp.273-293, 2005.

T. Grosser, A. Cohen, P. H. Kelly, J. Ramanujam, P. Sadayappan et al., Split tiling for GPUs, Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU-6, pp.24-31, 2013.
DOI : 10.1145/2458523.2458526
URL : https://hal.archives-ouvertes.fr/hal-00786812

Q. Zhuge, B. Xiao, Z. Shao, E. H. Sha, and C. Chantrapornchai, Optimal code size reduction for software-pipelined and unfolded loops, Proceedings of the 15th international symposium on System Synthesis, ISSS '02, pp.144-149, 2002.
DOI : 10.1145/581199.581232

Q. Zhuge, Z. Shao, B. Xiao, and E. H. Sha, Design space minimization with timing and code size optimization for embedded DSP, Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign & system synthesis, CODES+ISSS '03, pp.144-149, 2003.
DOI : 10.1145/944645.944685

C. J. Xue, E. H. Sha, Z. Shao, and M. Qiu, Effective loop partitioning and scheduling under memory and register dual constraints, Proceedings of the conference on Design, automation and test in Europe, DATE '08, pp.1202-1207, 2008.
DOI : 10.1145/1403375.1403667

C. J. Xue, J. Hu, Z. Shao, and E. H. Sha, Iterational retiming with partitioning, ACM Transactions on Embedded Computing Systems, vol.9, issue.3, 2010.
DOI : 10.1145/1698772.1698780

D. Liu, Z. Shao, M. Wang, M. Guo, and J. Xue, Optimal loop parallelization for maximizing iteration-level parallelism, Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, CASES '09, 2009.
DOI : 10.1145/1629395.1629407

A. Qasem and K. Kennedy, Model-guided empirical tuning of loop fusion, International Journal of High Performance Systems Architecture, vol.1, issue.3, 2008.
DOI : 10.1504/IJHPSA.2008.021798

H. Rong, Z. Tang, R. Govindarajan, A. Douillet, and G. R. Gao, Single-dimension software pipelining for multidimensional loops, ACM Transactions on Architecture and Code Optimization, vol.4, issue.1, 2007.
DOI : 10.1145/1216544.1216550

M. Liu, E. H. Sha, Q. Zhuge, Y. He, and M. Qiu, Loop Distribution and Fusion with Timing and Code Size Optimization, Journal of Signal Processing Systems, vol.18, issue.2, pp.325-340, 2011.
DOI : 10.1007/s11265-010-0465-x

M. Fellahi and A. Cohen, Software Pipelining in Nested Loops with Prolog-Epilog Merging, High Performance Embedded Architectures and Compilers Lecture Notes in Computer Science, pp.80-94, 2009.
DOI : 10.1007/978-3-540-92990-1_8
URL : https://hal.archives-ouvertes.fr/inria-00445489

M. A. Khan, Improving performance through deep value profiling and specialization with code transformation, Computer Languages, Systems & Structures, vol.37, issue.4, pp.193-203, 2011.
DOI : 10.1016/j.cl.2011.08.001

A. Morvan, S. Derrien, and P. Quinton, Efficient nested loop pipelining in high level synthesis using polyhedral bubble insertion, International Conference on Field-Programmable Technology (FPT), pp.1-10, 2011.

K. Turkington, G. A. Constantinides, K. Masselos, and P. Y. Cheung, Outer Loop Pipelining for Application Specific Datapaths in FPGAs, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp.1268-1280, 2008.
DOI : 10.1109/TVLSI.2008.2001744

K. Muthukumar and G. Doshi, Software Pipelining of Nested Loops, Lecture Notes in Computer Science, vol.2027, pp.165-181, 2001.
DOI : 10.1007/3-540-45306-7_12

M. Fellahi, A. Cohen, and S. Touati, Code-size conscious pipelining of imperfectly nested loops, Proceedings of the 2007 workshop on MEmory performance DEaling with Applications, systems and architecture, MEDEA '07, 2007.
DOI : 10.1145/1327171.1327177
URL : https://hal.archives-ouvertes.fr/hal-00646688

C. E. Leiserson and J. B. Saxe, Retiming synchronous circuitry, Algorithmica, vol.6, issue.1, pp.5-35, 1991.
DOI : 10.1007/BF01759032

N. L. Passos and E. H. Sha, Full Parallelism in Uniform Nested Loops Using Multi-Dimensional Retiming, 1994 International Conference on Parallel Processing (ICPP'94), 1994.
DOI : 10.1109/ICPP.1994.114

N. L. Passos, D. C. Defoe, R. J. Bailey, R. H. Halverson, and R. P. Simpson, Theoretical Constraints on Multi-Dimensional Retiming Design Techniques, Proceedings of the AeroSense-Aerospace/Defense Sensing, Simulation and Controls, 2001.

Q. Zhuge, Z. Shao, and E. H. Sha, Timing optimization of nested loops considering code size for DSP applications, Proceedings of the 2004 International Conference on Parallel Processing, 2004.

Q. Zhuge, E. H. Sha, and C. Chantrapornchai, CRED: code size reduction technique and implementation for software-pipelined applications, Proceedings of the IEEE Workshop of Embedded System Codesign (ESCODES), pp.50-56, 2002.

T. C. Denk and K. K. Parhi, Two-Dimensional Retiming, IEEE Transactions on VLSI Systems, pp.198-211, 1999.

N. L. Passos and E. H. Sha, Scheduling of Uniform Multi-Dimensional Systems under Resource Constraints, IEEE Transactions on VLSI Systems, pp.719-730, 1998.

L. F. Chao and E. H. Sha, Scheduling data flow graphs via retiming and unfolding, IEEE, 1997.

C. J. Xue, Z. Shao, M. Liu, M. K. Qiu, and E. H. Sha, Optimizing parallelism for nested loops with iterational and instructional retiming, J. Embed. Comput, vol.3, issue.1, pp.29-37, 2009.

T. W. O'Neil and E. H. Sha, Combining extended retiming and unfolding for rate-optimal graph transformation, J. of VLSI Sign. Process, vol.39, issue.3, pp.273-293, 2005.

L. Kaouane, M. Akil, T. Grandpierre, and Y. Sorel, A Methodology to Implement Real-Time Applications onto Reconfigurable Circuits, The Journal of Supercomputing, vol.30, issue.3, pp.283-301, 2004.
DOI : 10.1023/B:SUPE.0000045213.82276.8e

K. K. Parhi and D. G. Messerschmitt, Static rate-optimal scheduling of iterative data-flow programs via optimum unfolding, IEEE Transactions on Computers, vol.40, issue.2, pp.178-195, 1991.
DOI : 10.1109/12.73588

O. Lobachev, M. Guthe, and R. Loogen, Estimating parallel performance, Journal of Parallel and Distributed Computing, vol.73, issue.6, 2013.
DOI : 10.1016/j.jpdc.2013.01.011

G. Romanazzi, P. K. Jimack, and C. E. Goodyer, Reliable performance prediction for multigrid software on distributed memory systems, Advances in Engineering Software, vol.42, issue.5, pp.247-258, 2011.
DOI : 10.1016/j.advengsoft.2010.10.005

O. Sinnen, Task Scheduling for Parallel Systems, 2007.
DOI : 10.1002/0470121173

P. Y. Calland, A. Darte, Y. Robert, and F. Vivien, On the removal of anti and output dependencies, International Journal of Parallel Programming, vol.26, issue.3, pp.285-312, 1998.
DOI : 10.1023/A:1018790129478

F. Sanchez and J. Cortadella, Time-Constrained Loop Pipelining, p.3, 1995.

J. P. Elloy, Systèmes réactifs synchrones et asynchrones, Applications, Réseaux et Systèmes - École d'été temps réel'99, pp.43-51, 1997.

P. Richard, F. Cottet, and C. Kaiser, Précédences généralisées et ordonnançabilité des tâches de suivi temps réel d'un laminoir, Journal Européen des Systèmes Automatisés, vol.35, issue.9, pp.1055-1071, 2001.

C. Kaiser, Description et critique d'un système temps réel pour le suivi d'un laminoir : Robustesse et potentiel d'évolutivité, Hermes Science, vol.20, issue.1, 2001.

D. Isovic, G. Fohler, and L. Steffens, Real-time issues of mpeg-2 playout in resource constrained systems, International Journal on Embedded Systems, vol.1, issue.6, 2004.

F. Cottet, J. Delacroix, C. Kaiser, and Z. Mammeri, Ordonnancement temps réel, Hermes, p.207, 2000.

C. Kaiser and G. Stoffel, Système d'acquisition et d'analyse en temps réel des signaux d'un laminoir, 1999.

F. Sanchez and J. Cortadella, Maximum-Throughput Software Pipelining, 1998.

C. Jesshope, Scalable Instruction-Level Parallelism, Proc. Computer Systems: Architectures, Modeling and Simulation, 3rd and 4th Int. Workshops, 2004.
DOI : 10.1007/978-3-540-27776-7_40

B. R. Rau and J. A. Fisher, Instruction-level parallel processing: History, overview, and perspective, The Journal of Supercomputing, vol.7, pp.1-2, 1993.

D. K. Arvind and V. E. Rebello, Instruction-Level Parallelism in Asynchronous Processor Architectures, Proceedings of the 3rd International Workshop on Algorithms and Parallel VLSI Architectures
DOI : 10.1016/B978-044482106-5/50018-1

K. W. Rudd, Vliw Processors: Efficiently Exploiting Instruction Level Parallelism, 1999.

K. Lodaya and P. Weil, A Kleene Iteration for Parallelism, Foundations of Software Technology and Theoretical Computer Science, Lecture Notes in Computer Science, vol.1530, pp.355-366, 1998.
DOI : 10.1007/978-3-540-49382-2_33

M. Lam, Software pipelining, PLDI '88 Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation, pp.318-328
DOI : 10.1145/989393.989420

D. Liu, Y. Wang, Z. Shao, M. Guo, and J. Xue, Optimally Maximizing Iteration-Level Loop Parallelism, IEEE Transactions on Parallel and Distributed Systems, vol.23, issue.3, pp.564-572, 2012.
DOI : 10.1109/TPDS.2011.171

J. M. Roy, Exploiting iteration-level parallelism in declarative programs

S. Shekhar and H. Xiong, Nested-Loop, Blocked, Encyclopedia of GIS, p.787, 2008.
DOI : 10.1007/978-0-387-35973-1_874

C. Beeri and M. Y. Vardi, The implication problem for data dependencies, Automata, Languages and Programming Lecture Notes in Computer Science, vol.115, pp.73-85, 1981.
DOI : 10.1007/3-540-10843-2_7

L. Qiao, W. Huang, and Z. Tang, Coping with Data Dependencies of Multi-dimensional Array References, Network and Parallel Computing Lecture Notes in Computer Science, vol.3779, pp.278-284, 2005.
DOI : 10.1007/11577188_40

M. Heffernan and K. Wilken, Data-Dependency Graph Transformations for Instruction Scheduling, Journal of Scheduling, vol.21, issue.2, pp.427-451, 2005.
DOI : 10.1007/s10951-005-2862-8

N. Vasilache, C. Bastoul, and A. Cohen, Polyhedral Code Generation in the Real World, Compiler Construction Lecture Notes in Computer Science, vol.3923, pp.185-201, 2006.
DOI : 10.1007/11688839_16
URL : https://hal.archives-ouvertes.fr/inria-00001106

S. Derrien, S. Rajopadhye, P. Quinton, and T. Risset, High-Level Synthesis of Loops Using the Polyhedral Model, High-Level Synthesis, pp.215-230, 2008.
DOI : 10.1007/978-1-4020-8588-8_12
URL : https://hal.archives-ouvertes.fr/hal-00410719

M. W. Benabderrahmane, L. N. Pouchet, A. Cohen, and C. Bastoul, The Polyhedral Model Is More Widely Applicable Than You Think, Compiler Construction Lecture Notes in Computer Science, vol.6011, pp.283-303, 2010.
DOI : 10.1007/978-3-642-11970-5_16
URL : https://hal.archives-ouvertes.fr/inria-00551087

Z. Huang and S. Malik, Exploiting operation level parallelism through dynamically reconfigurable datapaths, Proceedings of the 39th conference on Design automation, DAC '02, 2002.
DOI : 10.1145/513918.514006

L. Wang, Z. Wang, and K. Dai, Cycle Period Analysis and Optimization of Timed Circuits, Advances in Computer Systems Architecture Lecture Notes in Computer Science, pp.502-508, 2006.
DOI : 10.1007/11859802_50

G. Tu, F. Yang, and Y. Lu, Scheduling algorithms based on weakly hard real-time constraints, Journal of Computer Science and Technology, vol.20, issue.1, pp.815-821, 2003.
DOI : 10.1007/BF02945471

S. Leonardi, A. Marchetti-Spaccamela, and A. Vitaletti, Approximation Algorithms for Bandwidth and Storage Allocation Problems under Real Time Constraints, Foundations of Software Technology and Theoretical Computer Science (FST TCS), Lecture Notes in Computer Science, vol.1974, pp.409-420, 2000.
DOI : 10.1007/3-540-44450-5_33

N. Ge, M. Pantel, and X. Crégut, Formal Specification and Verification of Task Time Constraints for Real-Time Systems, Leveraging Applications of Formal Methods, Verification and Validation. Applications and Case Studies, Lecture Notes in Computer Science, pp.143-157, 2012.

T. Sato and I. Arita, Execution Latency Reduction via Variable Latency Pipeline and Instruction Reuse, Euro-Par Parallel Processing Lecture Notes in Computer Science, vol.2150, pp.428-438, 2001.
DOI : 10.1007/3-540-44681-8_62

C. Xue, Z. Shao, M. Liu, and E. H. Sha, Iterational retiming, Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, CODES+ISSS '05, 2005.
DOI : 10.1145/1084834.1084910

F. M. Ciorba, Algorithms Design for the Parallelization of Nested Loops, doctoral dissertation, National Technical University of Athens, 2008.

A. Cohen, Program analysis and transformation: from the polytope model to formal languages, 2009.
URL : https://hal.archives-ouvertes.fr/tel-00550829

D. Petkov, R. Harr, and S. Amarasinghe, Efficient pipelining of nested loops: unroll-and-squash, Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS 2002, 2001.

T. W. O'Neil and E. H. Sha, Time-Constrained Loop Scheduling with Minimal Resources, Journal of Embedded Computing - Embedded Processors and Systems, pp.103-117, 2006.

J. L. Lo, S. J. Eggers, J. S. Emer, H. M. Levy, R. L. Stamm et al., Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading, ACM Transactions on Computer Systems, vol.15, issue.3, 1997.
DOI : 10.1145/263326.263382

F. Balasa, P. G. Kjeldsberg, A. Vandecappelle, M. Palkovic, Q. Hu et al., Storage Estimation and Design Space Exploration Methodologies for the Memory Management of Signal Processing Applications, Journal of Signal Processing Systems, vol.2, issue.1, pp.1-2, 2008.
DOI : 10.1007/s11265-008-0244-0

Y. Elloumi, M. Akil, and M. H. Bedoui, Timing and Code Size Optimization on Achieving Full Parallelism in Uniform Nested Loop, Journal of Computing, vol.3, issue.7, pp.68-77, 2011.

Y. Elloumi, M. Akil, and M. H. Bedoui, Execution Time Optimization Using Delayed Multidimensional Retiming, 2012 IEEE/ACM 16th International Symposium on Distributed Simulation and Real Time Applications, pp.25-27
DOI : 10.1109/DS-RT.2012.34

Y. Elloumi, M. Akil, and M. H. Bedoui, Execution Time and Code Size Optimization Using Multidimensional Retiming and Loop Striping, 2013 Euromicro Conference on Digital System Design, 2013.
DOI : 10.1109/DSD.2013.132

Y. Elloumi, M. Akil, and M. H. Bedoui, Execution time and code size optimization using delayed multidimensional retiming, submitted to ACM Transactions on Architecture and Code Optimization

L. Kaouane, Formalisation et optimisation d'applications s'exécutant sur architecture reconfigurable, doctoral thesis, 2004.

L. Kaouane, M. Akil, and T. Grandpierre, A Methodology to Implement Real-Time Applications onto Reconfigurable Circuits, The Journal of Supercomputing, vol.30, issue.3, pp.283-301, 2004.
DOI : 10.1023/B:SUPE.0000045213.82276.8e

Y. Elloumi, M. Akil, T. Grandpierre, and M. H. Bedoui, Latency and power optimization in AAA methodology for integrated circuits, 2010 17th IEEE International Conference on Electronics, Circuits and Systems, pp.639-642
DOI : 10.1109/ICECS.2010.5724593

S. Kurra, N. K. Singh, and P. R. Panda, The Impact of Loop Unrolling on Controller Delay in High Level Synthesis, 2007 Design, Automation & Test in Europe Conference & Exhibition, pp.391-396
DOI : 10.1109/DATE.2007.364623

V. Sarkar, Optimized unrolling of nested loops, Proceedings of the 14th international conference on Supercomputing, ICS '00, pp.545-581, 2001.
DOI : 10.1145/335231.335246

Y. Dong, J. Zhou, Y. Dou, L. Deng, and J. Zhao, Impact of Loop Unrolling on Area, Throughput and Clock Frequency for Window Operations Based on a Data Schedule Method, 2008 Congress on Image and Signal Processing, 2008.
DOI : 10.1109/CISP.2008.211

X. Liu, M. C. Papaefthymiou, and E. G. Friedman, Retiming and Clock Scheduling for Digital Circuit Optimization, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2002.

M. Gao, J. Huang, S. Zhang, Z. Qian, S. Voros et al., 4D Cardiac Reconstruction Using High Resolution CT Images, Lecture Notes in Computer Science, vol.29, issue.9, pp.153-160, 2011.
DOI : 10.1007/978-3-540-85988-8_76

M. Ioannides, A. Hadjiprocopis, and N. Doulamis, Online 4D Reconstruction Using Multi-Images Available under Open Access, ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-5/W1, pp.2-6, 2013.
DOI : 10.5194/isprsannals-II-5-W1-169-2013

P. Crowley and J. L. Baer, Worst-Case Execution Time Estimation for Hardware-Assisted Multithreaded Processors, Proc. of the 2nd Workshop on Network Processors
DOI : 10.1016/B978-012198157-0/50006-4

V. Blanco, J. A. Gonzalez, C. Leon, C. Rodríguez, G. Rodríguez et al., Predicting the performance of parallel programs, Parallel Computing, vol.30, issue.3, pp.337-356, 2004.
DOI : 10.1016/j.parco.2003.11.004

O. Lobachev, M. Guthe, and R. Loogen, Estimating parallel performance, Journal of Parallel and Distributed Computing, vol.73, issue.6, p.22, 2013.
DOI : 10.1016/j.jpdc.2013.01.011

V. Bandishti, I. Pananilath, and U. Bondhugula, Tiling stencil computations to maximize parallelism, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, 2012.
DOI : 10.1109/SC.2012.107

L. Renganarayanan, D. Kim, S. Rajopadhye, and M. M. Strout, Parameterized Tiled Loops for Free, PLDI '07 Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pp.405-414, 2007.

G. Goumas, N. Drosinos, M. Athanasaki, and N. Koziris, Automatic parallel code generation for tiled nested loops, Proceedings of the 2004 ACM symposium on Applied computing, SAC '04, 2004.
DOI : 10.1145/967900.968184

U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev et al., Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model, Lecture Notes in Computer Science, vol.4959, pp.132-146, 2008.
DOI : 10.1007/978-3-540-78791-4_9

H. Liu, Z. Shao, M. Wang, J. Du, C. J. Xue, and Z. Jia, Combining Coarse-Grained Software Pipelining with DVS for Scheduling Real-Time Periodic Dependent Tasks on Multi-Core Embedded Systems, Journal of Signal Processing Systems, vol.23, issue.9, 2008.
DOI : 10.1007/s11265-008-0315-2

S. Simon, E. Bernard, M. Sauer, and J. A. Nossek, A new retiming algorithm for circuit design, Proceedings of IEEE International Symposium on Circuits and Systems, ISCAS '94, pp.35-38
DOI : 10.1109/ISCAS.1994.409190

L. N. Pouchet, C. Bastoul, A. Cohen, and N. Vasilache, Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time, International Symposium on Code Generation and Optimization (CGO'07), pp.144-156
DOI : 10.1109/CGO.2007.21
URL : https://hal.archives-ouvertes.fr/hal-01257281

P. Lieverse, P. V. Derwolf, E. Deprettere, and K. Vissers, A methodology for architecture exploration of heterogeneous signal processing systems, 1999 IEEE Workshop on Signal Processing Systems. SiPS 99. Design and Implementation (Cat. No.99TH8461), pp.181-190, 1999.
DOI : 10.1109/SIPS.1999.822323

T. W. O'Neil and E. H. Sha, Retiming Synchronous Data-Flow Graphs to Reduce Execution Time, IEEE Transactions on Signal Processing, vol.49, issue.10, pp.2397-2407, 2001.

R. Seghir, Méthodes de dénombrement de points entiers de polyèdres et applications à l'optimisation de programmes, 2006.

Y. H. Lee and C. Chen, An effective and efficient code generation algorithm for uniform loops on non-orthogonal DSP architecture, Journal of Systems and Software, vol.80, issue.3, pp.410-428, 2006.
DOI : 10.1016/j.jss.2006.06.002