J. A. Fisher, P. Faraboschi, and C. Young, Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools, 2005.

T. Austin, E. Larson, and D. Ernst, SimpleScalar: an infrastructure for computer system modeling, Computer, vol.35, issue.2, pp.59-67, 2002.

N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi et al., The gem5 simulator, SIGARCH Comput. Archit. News, vol.39, issue.2, pp.1-7, 2011.

A. Bouchhima, I. Bacivarov, W. Youssef, M. Bonaciu, and A. A. Jerraya, Using abstract CPU subsystem simulation model for high level HW/SW architecture exploration, Asia and South Pacific Design Automation Conference, vol.2, pp.969-972, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00005754

N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi et al., The m5 simulator: Modeling networked systems, IEEE Micro, vol.26, issue.4, pp.52-60, 2006.

D. Berlin, D. Edelsohn, and S. Pop, High-level loop optimizations for gcc, GCC Developers' Summit, 2004.

A. Bouchhima, P. Gerin, and F. Pétrot, Automatic instrumentation of embedded software for high level hardware/software co-simulation, Asia and South Pacific Design Automation Conference, pp.546-551, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00418912

A. Bouchhima, S. Yoo, and A. Jerraya, Fast and accurate timed execution of high level embedded software using hw/sw interface simulation model, Asia and South Pacific Design Automation Conference, pp.469-474, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00008006

E. Cheung, H. Hsieh, and F. Balarin, Fast and accurate performance simulation of embedded software for mpsoc, Asia and South Pacific Design Automation Conference, pp.552-557, 2009.

E. Cheung, H. Hsieh, and F. Balarin, Memory subsystem simulation in software tlm/t models, Asia and South Pacific Design Automation Conference, pp.811-816, 2009.

C. Cifuentes and V. Malhotra, Binary translation: Static, dynamic, retargetable, IEEE International Conference on Software Maintenance, pp.340-349, 1996.

J. Cornet, F. Maraninchi, and L. Maillet-Contoz, A method for the efficient development of timed and untimed transaction-level models of systems-on-chip, 2008 Design, Automation and Test in Europe, pp.9-14, 2008.

M. Cunha, O. Matoussi, and F. Pétrot, Detecting software cache coherence violations in MPSoC using traces captured on virtual platforms, ACM Trans. Embedded Comput. Syst, vol.16, issue.2, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01709225

R. J. Douma, S. Altmeyer, and A. D. Pimentel, Fast and precise cache performance estimation for out-of-order execution, Design, Automation & Test in Europe Conference & Exhibition (DATE), pp.1132-1137, 2015.

A clustered manycore processor architecture for embedded and accelerated applications, IEEE High Performance Extreme Computing Conference, pp.1-6, 2013.

L. Díaz, H. Posadas, and E. Villar, Fast data-cache modeling for native cosimulation, Design Automation Conference (ASP-DAC), 2011.

T. Dullien and R. Rolles, Graph-based comparison of executable objects, Actes du Symposium sur la securite des technologies de l'information et des communications, pp.1-13, 2005.

A. Gerstlauer, S. Chakravarty, and M. Kathuria, Abstract system-level models for early performance and power exploration, Asia South-Pacific Design Automation Conference, pp.213-218, 2012.

R. K. Gupta, C. N. Coelho, and G. De Micheli, Synthesis and simulation of digital systems containing interacting hardware and software components, Proceedings 29th ACM/IEEE Design Automation Conference, pp.225-230, 1992.

A. Gerstlauer, S. Chakravarty, and Z. Zhao, Automated, retargetable back-annotation for host compiled performance and power modeling, International Conference on Hardware/Software Codesign and System Synthesis, pp.1-10, 2013.

P. Gerin, Modèles de Simulation pour la Validation Logicielle et l'Exploration d'Architectures des Systèmes Multiprocesseurs sur Puce, 2009.

P. Gerin, M. M. Hamayun, and F. Pétrot, Native MPSoC co-simulation environment for software performance estimation, Proceedings of the 7th IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis, pp.403-412, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00472526

A. Gerstlauer, H. Yu, and D. D. Gajski, Rtos modeling for system level design, Design, Automation and Test in Europe Conference and Exhibition, pp.130-135, 2003.

P. Gerin, S. Yoo, G. Nicolescu, and A. A. Jerraya, Scalable and flexible cosimulation of soc designs with heterogeneous multi-processor target architectures, Proceedings of the ASP-DAC, pp.63-68, 2001.
URL : https://hal.archives-ouvertes.fr/hal-00008089

J. Howard, S. Dighe, Y. Hoskote, S. Vangal, D. Finan et al., 48-core ia-32 message-passing processor with dvfs in 45nm cmos, Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp.108-109, 2010.

Y. Hwang, G. Schirner, S. Abdi, and D. D. Gajski, Accurate timed RTOS model for transaction level modeling, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010), pp.1333-1336, 2010.

Intel, Intel Itanium Processor 9300 Series Reference Manual for Software Development and Optimization, 2010.

S. Kraemer, L. Gao, J. Weinstock, R. Leupers, G. Ascheid et al., Hysim: A fast simulation framework for embedded software development, International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pp.75-80, 2007.

T. Kempf, K. Karuri, S. Wallentowitz, G. Ascheid, R. Leupers et al., A sw performance estimation framework for early system-level-design using fine-grained instrumentation, Proceedings of the Design Automation Test in Europe Conference, vol.1, 2006.

K. Lu, D. Mueller-Gritschneder, and U. Schlichtmann, Memory access reconstruction based on memory allocation mechanism for source-level simulation of embedded software, Design Automation Conference (ASP-DAC), 2013.

M. T. Lazarescu, J. R. Bammi, E. Harcourt, L. Lavagno, and M. Lajolo, Compilation-based software performance estimation for system level design, International High-Level Design Validation and Test Workshop, pp.167-172, 2000.

R. A. Lethin, How VLIW almost disappeared - and then proliferated, IEEE Solid-State Circuits Magazine, vol.1, issue.3, pp.15-23, 2009.

K. Lin, C. Lo, and R. Tsay, Source-level timing annotation for fast and accurate tlm computation model generation, Asia and South Pacific Design Automation Conference (ASP-DAC), pp.235-240, 2010.

K. Lu, D. Mueller-Gritschneder, and U. Schlichtmann, Hierarchical control flow matching for source-level simulation of embedded software, System On Chip International Symposium, pp.1-5, 2012.

K. Lu, D. Mueller-Gritschneder, U. Schlichtmann, and O. Bringmann, Fast cache simulation for host-compiled simulation of embedded software, Design, Automation and Test in Europe Conference and Exhibition (DATE), pp.637-642, 2013.

X. Li, A. Roychoudhury, and T. Mitra, Modeling out-of-order processors for wcet analysis, Real-Time Syst, vol.34, issue.3, pp.195-227, 2006.

Y. S. Li, S. Malik, and A. Wolfe, Performance estimation of embedded software with instruction cache modeling, International Conference on Computer-Aided Design, 1995.

D. Melpignano, L. Benini, E. Flamand, B. Jego, T. Lepley et al., Platform 2012, a many-core computing accelerator for embedded socs: performance evaluation of visual analytics applications, Proceedings of the 49th Annual Design Automation Conference, pp.1137-1142, 2012.

D. Mueller-Gritschneder, K. Lu, and U. Schlichtmann, Control-flow-driven source level timing annotation for embedded software models on transaction level, Euromicro Conference on Digital System Design, 2011.

O. Matoussi and F. Pétrot, IR-level annotation strategy dealing with aggressive loop optimizations for performance estimation in native simulation, Work-in-progress at the 2017 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2017.
URL : https://hal.archives-ouvertes.fr/hal-01701141

O. Matoussi and F. Pétrot, Loop aware IR-level annotation framework for performance estimation in native simulation, Asia and South Pacific Design Automation Conference (ASP-DAC), pp.220-225, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01522712

O. Matoussi and F. Pétrot, Modeling instruction cache and instruction buffer for performance estimation of VLIW architectures using native simulation, Design, Automation & Test in Europe Conference & Exhibition (DATE), pp.266-269, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01570789

O. Matoussi and F. Pétrot, A mapping approach between IR and binary CFGs dealing with aggressive compiler optimizations for performance estimation, Asia and South Pacific Design Automation Conference (ASP-DAC), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01908782

R. Le Moigne, O. Pasquier, and J. P. Calvez, A generic RTOS model for real-time systems simulation with SystemC, Proceedings Design, Automation and Test in Europe Conference and Exhibition, vol.3, pp.82-87, 2004.

A. Muttreja, A. Raghunathan, S. Ravi, and N. K. Jha, Hybrid simulation for embedded software energy estimation, Proceedings. 42nd Design Automation Conference, pp.23-26, 2005.

T. Meyerowitz, A. Sangiovanni-vincentelli, M. Sauermann, and D. Langen, Source-level timing annotation and simulation for a heterogeneous multiprocessor, Design, Automation and Test in Europe, pp.276-279, 2008.

D. Novillo, GCC: an architectural overview, current status, and future directions, Proceedings of the Linux Symposium, 2006.

OSCI TLM-2.0 language reference manual

S. Pop, A. Cohen, and C. Bastoul, Graphite: Polyhedral analyses and optimizations for gcc, Proceedings of the GCC Developers Summit, pp.1-18, 2006.

A. Pedram, D. Craven, and A. Gerstlauer, Modeling Cache Effects at the Transaction Level, pp.89-101, 2009.

F. Pétrot, N. Fournel, P. Gerin, M. Gligor, M. Hamayun et al., On mpsoc software execution at the transaction level, IEEE design and test of computers, vol.28, issue.3, pp.32-43, 2011.

X. Pan and B. Jonsson, A modeling framework for reuse distance-based estimation of cache performance. Performance Analysis of Systems and Software (ISPASS), pp.62-71, 2015.

I. Puaut, B. Lesage, and D. Hardy, Scalable fixed-point free instruction cache analysis, Real-Time System Symposium (RTSS), 2011.
URL : https://hal.archives-ouvertes.fr/inria-00638698

L. N. Pouchet, Polybench benchmark

R. Plyaskin, T. Wild, and A. Herkersdorf, System-level software performance simulation considering out-of-order processor execution, IEEE International Symposium on System On Chip, pp.1-7, 2012.

C. Rochange and P. Sainrat, A Context-Parameterized Model for Static Analysis of Execution Times, pp.222-241, 2009.

G. Sarrazin, Simulation fonctionnelle native pour des systèmes manycoeurs, 2016.

J. Schnerr and O. Bringmann, High-performance timing simulation of embedded software, Design Automation Conference (DAC), 2008.

S. Stattelmann, O. Bringmann, and W. Rosenstiel, Dominator homomorphism based code matching for source-level simulation of embedded software, International Conference on Hardware/Software Codesign and System Synthesis, 2011.

S. Stattelmann, O. Bringmann, and W. Rosenstiel, Fast and accurate source-level simulation of software timing considering complex code optimizations, Design Automation Conference, 2011.

S. Stattelmann, G. Gebhard, C. Cullmann, and O. Bringmann, Hybrid source-level simulation of data caches using abstract cache models, Design, Automation and Test in Europe Conference and Exhibition (DATE), 2012.

H. Shen, M. M. Hamayun, and F. Pétrot, Native simulation of MPSoC using hardware-assisted virtualization, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.31, issue.7, pp.1074-1087, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00744439

A. Seznec and P. Michaud, A case for (partially) tagged geometric history length branch prediction, J. Instruction-Level Parallelism, issue.8, 2006.

STMicroelectronics, ST200 VLIW Series ST231 Core and Instruction Set Architecture Manual, 2004.

R. Tarjan, Depth-first search and linear graph algorithms, SIAM journal on computing, vol.1, issue.2, pp.146-160, 1972.

Tile-Gx8072 processor product brief

M. B. Taylor, J. Kim, J. Miller, D. Wentzlaff, F. Ghodrat et al., The Raw microprocessor: A computational fabric for software circuits and general-purpose programs, IEEE Micro, vol.22, issue.2, pp.25-35, 2002.

W. Tibboel, V. Reyes, M. Klompstra, and D. Alders, System-level design flow based on a functional reference for hw and sw, Design Automation Conference, pp.23-28, 2007.

Z. Wang, Software Performance Estimation Methods for System-Level Design of Embedded Systems, 2010.

Z. Wang and J. Henkel, Accurate source-level simulation of embedded software with respect to compiler optimizations, Design, Automation and Test in Europe Conference and Exhibition, pp.382-387, 2012.

Z. Wang and J. Henkel, Fast and accurate cache modeling in source-level simulation of embedded software, Design, Automation and Test in Europe Conference and Exhibition (DATE), 2013.
DOI : 10.7873/date.2013.129

S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, The SPLASH-2 programs: Characterization and methodological considerations, SIGARCH Comput. Archit. News, vol.23, issue.2, pp.24-36, 1995.

S. Yoo, I. Bacivarov, A. Bouchhima, Y. Paviot, and A. A. Jerraya, Building fast and accurate SW simulation models based on hardware abstraction layer and simulation environment abstraction layer, Design, Automation and Test in Europe Conference and Exhibition, pp.550-555, 2003.

S. Yoo and A. A. Jerraya, Introduction to hardware abstraction layers for soc, Design, Automation and Test in Europe Conference and Exhibition, pp.336-337, 2003.
URL : https://hal.archives-ouvertes.fr/hal-00016132

R. Yan, D. Ma, K. Huang, X. Zhang, and S. Xiu, Annotation and analysis combined cache modeling for native simulation, Asia and South Pacific Design Automation Conference (ASP-DAC), pp.406-411, 2014.
DOI : 10.1109/aspdac.2014.6742925

Z. Wang and A. Herkersdorf, An efficient approach for system-level timing simulation of compiler-optimized embedded software, Design Automation Conference (DAC), 2009.

V. Zivojnovic and H. Meyr, Compiled hw/sw co-simulation, Design Automation Conference Proceedings, pp.690-695, 1996.