A comparison of list schedules for parallel processing systems, Communications of the ACM, vol.17, issue.12, pp.685-690, 1974. ,
DOI : 10.1145/361604.361619
Dependence-Conscious Global Register Allocation Optimal Code Generation for Expression Trees, Journal of the ACM, pp.488-501, 1976. ,
Software pipelining, ACM Computing Surveys, vol.27, issue.3, pp.367-432, 1995. ,
DOI : 10.1145/212094.212131
Code Generation for Machines with Multiregister, Proceedings of the ACM Symposium on Principles of Programming Languages, pp.21-28, 1977. ,
Ordering Problems Approximated: Register Sufficiency, Single Processor Scheduling and Interval Graph Completion. Internal research report CS-91-18, 1991. ,
SimpleScalar: an infrastructure for computer system modeling, Computer, vol.35, issue.2, pp.59-67, 2002. ,
DOI : 10.1109/2.982917
Optimal Software Pipelining with Functional Units and Registers, 1995. ,
A formal approach to code optimization, ACM SIGPLAN Notices, vol.5, issue.7, pp.86-100, 1970. ,
DOI : 10.1145/390013.808486
Spill code minimization via interference region spilling, ACM SIGPLAN Notices, vol.32, issue.5, pp.287-295, 1997. ,
DOI : 10.1145/258916.258941
Integrating Register Allocation and Instruction Scheduling for RISCs, ACM SIGPLAN Notices, vol.26, issue.4, pp.122-131, 1991. ,
Unification of Register Allocation and Instruction Scheduling in Compilers for Fine-Grain Parallel Architecture Array Data Flow Analysis for Load-Store Optimizations in Fine-Grain Architectures, BG96] Rastislav Bodik and Rajiv Gupta, pp.24481-512, 1977. ,
URSA: A Unified ReSource Allocator for Registers and Functional Units in VLIW Architectures, Proceedings of the ACM SIGPLAN '89 Conference on Programming Language Design and Implementation. Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, pp.258-263, 1989. ,
Compiler transformations for high-performance computing, ACM Computing Surveys, vol.26, issue.4, pp.345-420, 1994. ,
DOI : 10.1145/197405.197406
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.114.4215
Resource Spackling: A Framework for Integrating Register Allocation in Local and Global Schedulers, International Conference on Parallel Architectures and Compilation Techniques, 1994. ,
Representing Architecture Constraints in URSA Scheduling Arithmetic and Load Operations in parallel with No Spilling, SIAM Journal on Computing, vol.18, issue.6, pp.1098-1127, 1989. ,
Lattices and Orders ,
Register Allocation via Graph Coloring, 1992. ,
Code Generation for a One-Register Machine, Journal of the ACM, vol.23, issue.3, pp.502-510, 1976. ,
DOI : 10.1145/321958.321971
CRAIG: A Practical Framework for Combining Instruction Scheduling and Register Assignment Introduction to Linear Optimization, Parallel Architectures and Compilation Techniques (PACT '95) Car91] M. Carlisle. On Local Register Allocation, 1991. ,
Improving register allocation for subscripted variables, ACM SIGPLAN Notices, vol.25, issue.6, pp.53-65, 1990. ,
DOI : 10.1145/93548.93553
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.441
Algebraic Theory of Lattices, 1973. ,
On the removal of anti-and output-dependences Flexible Issue Slot Assignment for VLIW Architectures, CER99] Zbigniew Chamski, Christine Eisenbeis, and Erven Rohou, pp.285-312, 1998. ,
What's in a Name? -or-The Value of Renaming for Parallelism Detection and Storage Allocation, Proceedings of the 1987 International Conference on Parallel Processing, 1987. ,
The Priority-Based Coloring Approach to Register Allocation, ACM Transactions on Programming Languages and Systems, vol.12, issue.4, pp.501-536, 1990. ,
Register Allocation and Spilling via Graph Coloring Register Allocation via Hierarchical Graph Coloring, Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation. CLR90] Thomas Cormen, Charles Eric Leiserson, and Ronald Rivest. Introduction to Algorithms Cof76] E. G. Coffman. Computer and Job-Shop Scheduling Theory CPL93] CPLEX Optimization, Inc., Incline Village, Nevada. Using the CPLEX Callable Library and CPLEX Mixed Integer Library, pp.98-105192, 1976. ,
Load-Store Optimization For Software Pipelining Register Pipelining: An Integrated Approach to Register Allocation for Scalar and Subscripted Variables A Practical Data Flow Framework for Array Reference Analysis and its Application in Optimizations, Compiler Construction, 4th International Conference on Compiler Construction Proc. of ACM SIGPLAN Conference on Programming Language Design and Implementation, pp.68-77, 1992. ,
Loop Shifting for Loop Compaction, International Journal of Parallel Programming, vol.28, issue.5, 2000. ,
DOI : 10.1007/3-540-44905-1_26
Overlapped Loop Support in the Cydra 5, Proceedings of Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp.26-38, 1989. ,
Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling, Parallel Processing Letters, vol.07, issue.04, 1996. ,
DOI : 10.1142/S0129626497000383
URL : https://hal.archives-ouvertes.fr/hal-00856915
Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling, Parallel Processing Letters, pp.379-392, 1998. ,
DOI : 10.1142/S0129626497000383
URL : https://hal.archives-ouvertes.fr/hal-00856915
Compiling for the Cydra 5, The Journal of Supercomputing, vol.7, issue.1-2, pp.181-227, 1993. ,
On a graph-theoretical model for cyclic register allocation, Discrete Applied Mathematics, vol.93, issue.2-3, pp.191-203, 1999. ,
DOI : 10.1016/S0166-218X(99)00105-5
Minimizing Register Requirements of a Modulo Schedule via Optimum Stage Scheduling, International Journal of Parallel Programming, vol.7, issue.3, pp.103-132, 1996. ,
DOI : 10.1007/BF03356744
Allocating Registers in Multiple Instruction-Issuing Processors Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems, Proceedings of the IFIP WG 10.3 Working Conference on Parallel Architectures and Compilation Techniques, PACT'95, pp.290-293248, 1972. ,
Bulldog: A Compiler for VLIW Architectures, 1986. ,
The Meeting Graph: A New Model for Loop Cyclic Register Allocation, Proceedings of the IFIP WG 10.3 Working Conference on Parallel Architectures and Compilation Techniques, PACT '95, pp.264-267, 1995. ,
Circular-arc Graph Coloring and Unrolling, Proceedings of the 5th Twente Workshop on Graphs and Combinatorial Optimization, pp.71-74, 1997. ,
URL : https://hal.archives-ouvertes.fr/inria-00073353
Optimal Loop Parallelization under Register Constraints, Sixth Workshop on Compilers for Parallel Computers CPC'96, pp.245-259, 1996. ,
URL : https://hal.archives-ouvertes.fr/inria-00073911
Optimal Loop Parallelization under Register Constraints, 1996. ,
URL : https://hal.archives-ouvertes.fr/inria-00073911
Fine-grain scheduling under resource constraints, Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing, pp.1-15, 1994. ,
DOI : 10.1007/BFb0025867
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.29.9737
A Clustered VLIW Architecture Based on Queue Register Files Trace Scheduling: A Technique for Global Microcode Compaction, IEEE Trans. Comput., vol.30, issue.7, pp.478-490, 1981. ,
On Local Register Allocation, Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp.564-573, 1998. ,
Reducing DRAM Latencies with an Integrated Memory Hierarchy Design FM01] D. Fimmel and J. Muller. Optimal Software Pipelining Under Resource Constraints, Proceedings of the 7th International Symposium on High-Performance Computer Architecture, pp.12697-718, 2001. ,
The program dependence graph and its use in optimization, Code Generation - Concepts, Tools, Techniques. Proceedings of the International Workshop on Code Generation, pp.319-349, 1987. ,
DOI : 10.1145/24039.24041
Iterated register coalescing, ACM Transactions on Programming Languages and Systems, vol.18, issue.3, pp.300-324, 1996. ,
DOI : 10.1145/229542.229546
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.4803
Minimizing register requirements under resource-constrained rate-optimal software pipelining, Proceedings of the 27th annual international symposium on Microarchitecture, MICRO 27, pp.85-94, 1994. ,
DOI : 10.1145/192724.192733
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.2377
Global Predicate Analysis and its Application to Register Allocation Gebotys and M. I. Elmasry. A Global Optimization Approach for Architectural Synthesis, Proceedings of the 29th Annual International Symposium on Microarchitecture Proceedings of the IEEE International Conference on Computer-Aided Design, pp.114-125, 1990. ,
Optimal Scheduling and Allocation of Embedded VLSI Chips, Proceedings of the 29th Conference on Design Automation, 1992. ,
Handling Cross Interferences by Cyclic Cache Line Coloring Code Scheduling and Register Allocation in Large Basic Blocks, Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques (PACT '98), pages 112-117 Conference Proceedings 1988 International Conference on Supercomputing, pp.442-452, 1988. ,
Computers and Intractability, 1979. ,
The Complexity of Coloring Circular Arcs and Chords, SIAM Journal on Algebraic Discrete Methods, vol.1, issue.2, pp.216-227, 1980. ,
DOI : 10.1137/0601025
Integer Programming, Series in Decision and Control, 1972. ,
Algorithmic Graph Theory and Perfect Graphs, 1980. ,
Extending Software Pipelining Techniques for Scheduling Nested Loops, Lecture Notes in Computer Science, vol.768, 1994. ,
Generating Close to Optimum Loop Schedules on Parallel Processors. Parallel Processing Letters Register Allocation via Clique Separators, Proceedings of Theory of Computing and Systems (ISTCS'92) GS94] Franco Gasperoni and Uwe Schwiegelshohn GT86] Z. Galil and E. Tardos. An O(n 2 (m + n log n) log n) Min-Cost Flow Algorithm 27th Annual Symposium on Foundations of Computer Science, pp.32-42391, 1986. ,
Minimum register instruction sequence problem: revisiting optimal code generation for DAGs, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001, 1999. ,
DOI : 10.1109/IPDPS.2001.924962
Minimum Register Instruction Sequence Problem: Revisiting Optimal Code Generation for DAGs Minimal Register Instruction Scheduling: A New Approach for Dynamic Instruction Scheduling Processors, Proceedings of the 15th International Parallel and Distributed Processing Symposium (IPDPS-01), pages 26-26 Proc. of the Twelfth International Workshop on Languages and Compilers for Parallel Computing, 1999. ,
Register Allocation Using Cyclic Interval Graphs: A New Approach to an Old Problem, ACAPS Technical Memo, vol.33, 1992. ,
Study of NP-hard Cyclic Scheduling Problem: the Periodic Recurrent Job-Shop On the Minimization of Loads/Stores in Local Register Allocation, International Workshop on Compiler for Parallel Computers, pp.151252-1260, 1989. ,
A register allocation framework based on hierarchical cyclic interval graphs, Lecture Notes in Computer Science, vol.641, p.176, 1992. ,
DOI : 10.1007/3-540-55984-1_17
Index Register Allocation, Journal of the ACM, vol.13, issue.1, pp.43-61, 1966. ,
DOI : 10.1145/321312.321317
The Superblock: An Effective Technique for VLIW and Superscalar Compilation, The Journal of Supercomputing, vol.7, pp.229-248, 1993. ,
Code Generation for Transport Triggered Architectures, 1996. ,
Translation, Static Analysis and Software Pipelining for Guarded Code, 2000. ,
Algorithmique du Décalage d'Instructions, 2001. ,
Lifetime-Sensitive Modulo Scheduling, PLDI 93, pp.258-267, 1993. ,
Compilers Strategies for Transport Triggered Architectures, 2001. ,
Reducibility Among Combinatorial Problems, Complexity of Computer Computations, pp.85-103, 1972. ,
A New Polynomial-Time Algorithm for Linear Programming Scheduling Expression DAGs for Minimal Register Need, Combinatorica Computer Languages, vol.4, issue.241, pp.373-39533, 1984. ,
Tutorial: IA64 Architecture and Compilers, Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation. KL99] D. Kaestner and M. Langenbach. Code Optimization by Integer Linear Programming, pp.268-277122, 1993. ,
Optimal register assignment to loops for embedded code generation, ACM Transactions on Design Automation of Electronic Systems, vol.1, issue.2, pp.251-279, 1996. ,
DOI : 10.1145/233539.233542
Efficient Instruction Scheduling for Delayed-Load Architectures A Randomized Heuristic Approach to Register Allocation Precise Register Allocation for Irregular Register Architectures, Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture (MICRO-98), pp.740-776, 1991. ,
Handbuch der Lehre von der Verteilung der Primzahlen Reprinted from the First Edition Optimal Cycles on Graphs and Minimal Cost-to-Time Ratio Problem, Periodic Optimization, pp.38-58, 1909. ,
The Interaction of Compilation Technology and Computer Architecture. Kluwer Academic, 1994. ,
Contribution à l'Allocation de Registres dans les Boucles, 1996. ,
The multiflow trace scheduling compiler, The Journal of Supercomputing, vol.34, issue.1, pp.51-142, 1993. ,
DOI : 10.1007/BF01205182
Fusion-Based Register Allocation, ACM Transactions on Programming Languages and Systems, vol.22, 2000. ,
Swing Modulo Scheduling: A Lifetime-Sensitive Approach, PACT 96, 1996. ,
Reducing the Impact of Register Pressure on Software Pipelined Loops, 1996. ,
Heuristics for Register-Constrained Software Pipelining, Proceedings of the 29th International Symposium on Microarchitecture, pp.250-261, 1996. ,
Dual-Issue Scheduling for Binary Trees with Spills and Pipelined Loads, SIAM Journal on Computing, vol.30, issue.6, pp.1921-1941, 2001. ,
DOI : 10.1137/S009753979834610X
Effective Compiler Support for Predicated Execution Using the Hyperblock, 25th Annual International Symposium on Microarchitecture (MICRO-25), pp.45-54, 1992. ,
LEDA: A Platform of Combinatorial and Geometric Computing, 1999. ,
Combining Register Allocation and Instruction Scheduling, 1995. ,
Register Requirements of Pipelined Processors Net] Netlib. Performance database server, ACM Conference proceedings / 1992 International Conference on Supercomputing, pp.492-494, 1967. ,
A novel framework of register allocation for software pipelining, Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '93, pp.29-42, 1993. ,
DOI : 10.1145/158511.158519
Uniform Parallelism Exploitation in Ordinary Programs, 1985 International Conference on Parallel Processing, pp.614-618, 1985. ,
Optimal Register Allocation to Support Time Optimal Scheduling for Loops, 1993. ,
A Scheduler-Sensitive Global Register Allocator, IEEE Supercomputing 93 Proceedings: Portland, Oregon, pages 804-813, 1109 Spring Street, 1993. ,
The design and implementation of RAP: a PDG-based register allocator, Software: Practice and Experience, vol.28, issue.4, pp.401-424, 1998. ,
DOI : 10.1002/(SICI)1097-024X(19980410)28:4<401::AID-SPE159>3.0.CO;2-R
Probabilistic Register Allocation Computer Organization and Design The Hardware-Software Interface, Proceedings of the 20th Annual ACM Symposium on the Theory of Computing Pin93] Schlomit S. Pinter. Register Allocation with Instruction Scheduling: A New Approach. SIGPLAN Notices PS99] Massimiliano Poletto and Vivek Sarkar. Linear Scan Register Allocation, pp.248-257895, 1988. ,
Optimal Software Pipelining of Nested Loops, Proceedings of the 8th International Symposium on Parallel Processing, pp.335-343, 1994. ,
An Optimizing Compiler for Vector Processors, Proceedings of the International Conference on Parallel and Distributed Computing and Systems (ISMM), pages 97-103. Acta press Fisher. Instruction-Level Parallel Processing: History, Overview, and Perspective, pp.81-849, 1969. ,
Register allocation for software pipelined loops, Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pp.283-299, 1992. ,
DOI : 10.1145/143103.143141
Processor Architecture: from Dataflow to Superscalar and Beyond RESIS: A New Methodology for Register Optimization in Software Pipelining, Pipeline Logiciel: Découplage et Contraintes de Registres Proceedings of Second International Euro-Par Conference, Euro-Par'96, 1996. ,
EPIC: Explicitly Parallel Instruction Computing, Computer, vol.33, issue.2, pp.37-45, 2000. ,
Theme Feature: Compilers for Instruction-Level Parallelism, Computer, vol.30, issue.12, pp.63-69, 1997. ,
Schedule-independent storage mapping for loops, ACM SIGPLAN Notices, vol.33, issue.11, pp.24-33, 1998. ,
DOI : 10.1145/291006.291015
Theory of Linear and Integer Programming Complete register allocation problems, SIAM Journal on Computing, vol.4, issue.3, pp.226-248, 1975. ,
Achieving High Levels of Instruction-Level Parallelism with Reduced Hardware Complexity The Generation of Optimal Code for Arithmetic Expressions, Journal of the ACM, vol.17, issue.4, pp.715-728, 1970. ,
A Register Pressure Sensitive Instruction Scheduler for Dynamic Issue Processors TE01] Sid-Ahmed-Ali Touati and Christine Eisenbeis. Schedule Independent Register Allocation for Software Pipelining Cyclic Register Pressure and Allocation for Modulo Scheduled Loops, Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques (PACT-97), pages 78-89 9th Workshop on Compilers for Parallel ComputersSIRA.ps.gz. Tou01a] Sid-Ahmed-Ali Touati. EquiMax: A New Formulation of Acyclic Scheduling Problem for ILP Processors. In Interaction between Compilers and Computer Architectures, 1997. ,
Maximizing for Reducing Register Need in Acyclic Schedules, Proceedings of 5th International Workshop on Software and Compilers for Embedded Systems, 2001. ,
URL : https://hal.archives-ouvertes.fr/hal-00646770
Optimal Acyclic Fine-Grain Schedule with Cache Effects for Embedded and Real Time Systems, Proceedings of 9th International Symposium on Hardware/Software Codesign, CODES, 2001. ,
Optimal Register Saturation in Acyclic Superscalar and VLIW Codes, 2001. ,
URL : https://hal.archives-ouvertes.fr/inria-00072324
Register Saturation in Superscalar and VLIW Codes TT00] Sid-Ahmed-Ali Touati and François Thomasset. Register Saturation in Data Dependence Graphs Optimization of Microprograms, Proceedings of The International Conference on Compiler Construction, pp.30491-504, 1981. ,
Coloring a Family of Circular Arcs, SIAM Journal on Applied Mathematics, vol.29, issue.3, pp.493-502, 1975. ,
DOI : 10.1137/0129040
A unified framework for schedule and storage optimization, ACM SIGPLAN Notices, vol.36, issue.5, pp.232-242, 2001. ,
DOI : 10.1145/381694.378852
URL : https://hal.archives-ouvertes.fr/hal-00808285
Dhrystone 2.1 and MIPS Results, Siemens Nixdorf Inf. Syst. electronic document, 1999. ,
Decomposed software pipelining: A new perspective and a new approach, International Journal of Parallel Programming, vol.19, issue.7, pp.351-373, 1994. ,
DOI : 10.1007/BF02577737
Decomposed Software Pipelining with Reduced Register Requirement, Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques, PACT95, pp.277-280, 1995. ,
Software pipelining with register allocation and spilling, Proceedings of the 27th annual international symposium on Microarchitecture, MICRO 27, pp.95-99, 1994. ,
DOI : 10.1145/192724.192734
Hierarchical Graph Coloring, Zha96] L. Zhang. SILP: Scheduling and Register Allocation with Integer Linear Programming, p.92, 1996. ,
Program Structure as a Basis for the Parallelization of Global Compiler Optimizations Integrated Scheduling and Register Assignment for VLIW-DSP Architectures, Proceedings of the 14th IEEE International ASIC/SOC Conference, 1992. ,