Bibliography

M. K. Aguilera, W. Chen, and S. Toueg, Failure detection and consensus in the crash-recovery model, Distrib. Comput, vol.13, issue.2, pp.99-125, 2000.

L. Alvisi, E. Elnozahy, S. Rao, S. A. Husain et al., An analysis of communication-induced checkpointing, 1999.

A. Agarwal and V. K. Garg, Efficient dependency tracking for relevant events in shared-memory systems, PODC '05: Proceedings of the twenty-fourth annual ACM symposium on Principles of distributed computing, pp.19-28, 2005.

L. Alvisi, B. Hoppe, and K. Marzullo, Nonblocking and orphan-free message logging protocols, Proceedings of the 23rd Fault-Tolerant Computing Symposium, pp.145-154, 1993.

N. Arshad, D. Heimbigner, and A. L. Wolf, Dealing with failures during failure recovery of distributed systems, DEAS '05: Proceedings of the 2005 workshop on Design and evolution of autonomic application software, pp.1-6, 2005.

A. Avizienis, J. Laprie, B. Randell, and C. Landwehr, Basic concepts and taxonomy of dependable and secure computing, IEEE Transactions on Dependable and Secure Computing, vol.1, issue.1, pp.11-33, 2004.
DOI : 10.1109/TDSC.2004.2

L. Alvisi and K. Marzullo, Message logging: pessimistic, optimistic, and causal, Proceedings of 15th International Conference on Distributed Computing Systems, 1995.
DOI : 10.1109/ICDCS.1995.500024

D. Avresky and N. Natchev, Dynamic reconfiguration in computer clusters with irregular topologies in the presence of multiple node and link failures, IEEE Transactions on Computers, vol.54, issue.5, pp.603-615, 2005.
DOI : 10.1109/TC.2005.76

A. Armoush, F. Salewski, and S. Kowalewski, A Hybrid Fault Tolerance Method for Recovery Block with a Weak Acceptance Test, 2008 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, pp.484-491, 2008.
DOI : 10.1109/EUC.2008.102

A. Armoush, F. Salewski, and S. Kowalewski, Recovery Block with Backup Voting: A New Pattern with Extended Representation for Safety Critical Embedded Systems, 2008 International Conference on Information Technology, pp.232-237, 2008.
DOI : 10.1109/ICIT.2008.60

D. Bertozzi, L. Benini, and G. De Micheli, Low power error resilient encoding for on-chip data buses, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition, p.102, 2002.
DOI : 10.1109/DATE.2002.998256

D. Bertozzi, L. Benini, and G. De Micheli, Error control schemes for on-chip communication links: the energy-reliability tradeoff, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.24, issue.6, pp.818-831, 2005.
DOI : 10.1109/TCAD.2005.847907

L. Bononi and N. Concer, Simulation and analysis of network on chip architectures: ring, spidergon and 2D mesh, Proceedings of the Design Automation & Test in Europe Conference, pp.154-159, 2006.
DOI : 10.1109/DATE.2006.243841

E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, QNoC: QoS architecture and design process for network on chip, Journal of Systems Architecture, vol.50, issue.2-3, pp.105-128, 2004.
DOI : 10.1016/j.sysarc.2003.07.004

E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, Routing Table Minimization for Irregular Mesh NoCs, 2007 Design, Automation & Test in Europe Conference & Exhibition, pp.942-947, 2007.
DOI : 10.1109/DATE.2007.364414

D. Buntinas, C. Coti, T. Herault, P. Lemarinier, L. Pilard et al., Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI Protocols, Future Generation Computer Systems, vol.24, issue.1, pp.73-84, 2008.
DOI : 10.1016/j.future.2007.02.002

URL : https://hal.archives-ouvertes.fr/hal-00688644

G. Bronevetsky, D. Marques, K. Pingali, and P. Stodghill, Automated application-level checkpointing of MPI programs, PPoPP '03: Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, pp.84-94, 2003.

G. Bronevetsky, K. Pingali, and P. Stodghill, Experimental evaluation of application-level checkpointing for OpenMP programs, ICS '06: Proceedings of the 20th annual international conference on Supercomputing, pp.2-13, 2006.

P. Bhojwani, R. Singhal, G. Choi, and R. Mahapatra, Forward error correction for on-chip interconnection networks, UCAS-II: Proceedings of International Workshop on Unique Chips and Systems, 2006.

K. Banerjee, S. J. Souri, P. Kapur, and K. C. Saraswat, 3-D ICs: a novel chip design for improving deep-submicrometer interconnect performance and systems-on-chip integration, Proceedings of the IEEE, pp.602-633, 2001.
DOI : 10.1109/5.929647

A. Bartzas, K. Siozios, and D. Soudris, Topology exploration and buffer sizing for three-dimensional networks-on-chip, DATE Workshop on 3D Integration, 2009.

J. Bainbridge, W. Toms, D. Edwards, and S. Furber, Delay-insensitive, point-to-point interconnect using m-of-n codes, ASYNC'03 Proceedings. Ninth International Symposium on Asynchronous Circuits and Systems, pp.132-140, 2003.

C. M. Cunningham and D. R. Avresky, Fault-tolerant adaptive routing for two-dimensional meshes, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture, p.122, 1995.
DOI : 10.1109/HPCA.1995.386549

F. Chaix, D. Avresky, N. Zergainoh, and M. Nicolaidis, Fault-tolerant deadlock-free adaptive routing for any set of link and node failures in multi-core systems, NCA '10: The 9th IEEE International Symposium on Network Computing and Applications, 2010.

S. Chalasani and R. V. Boppana, Communication in multicomputers with nonconvex faults, IEEE Transactions on Computers, vol.46, issue.5, pp.616-622, 1997.
DOI : 10.1109/12.589238

C. Chen and G. Chiu, A fault-tolerant routing scheme for meshes with nonconvex faults, IEEE Transactions on Parallel and Distributed Systems, vol.12, issue.5, pp.467-475, 2001.
DOI : 10.1109/71.926168

S. Chandra and P. M. Chen, The impact of recovery mechanisms on the likelihood of saving corrupted state, 13th International Symposium on Software Reliability Engineering, 2002. Proceedings., p.91, 2002.
DOI : 10.1109/ISSRE.2002.1173219

G. Campobello, M. Castano, C. Ciofi, and D. Mangano, GALS Networks on Chip: A New Solution for Asynchronous Delay-Insensitive Links, Proceedings of the Design Automation & Test in Europe Conference, pp.160-165, 2006.
DOI : 10.1109/DATE.2006.243842

L. Cai and D. Gajski, Transaction level modeling: an overview, CODES+ISSS '03: Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, pp.19-24, 2003.

M. Coppola, M. D. Grammatikakis, R. Locatelli, G. Maruccia, and L. Pieralisi, Design of Cost-Efficient Interconnect Processing Units, 2008.

C. L. Chen and M. Y. Hsiao, Error-Correcting Codes for Semiconductor Memory Applications: A State-of-the-Art Review, IBM Journal of Research and Development, vol.28, issue.2, pp.124-134, 1984.
DOI : 10.1147/rd.282.0124

N. Concer, S. Iamundo, and L. Bononi, aEqualized: A novel routing algorithm for the Spidergon Network On Chip, 2009 Design, Automation & Test in Europe Conference & Exhibition, pp.749-754, 2009.
DOI : 10.1109/DATE.2009.5090764

X. Caron, J. Kienzle, and A. Strohmeier, Object-Oriented Stable Storage Based on Mirroring, Ada Europe '01: Proceedings of the 6th Ada-Europe International Conference Leuven on Reliable Software Technologies, pp.278-289, 2001.
DOI : 10.1007/3-540-45136-6_22

K. M. Chandy and L. Lamport, Distributed snapshots: determining global states of distributed systems, ACM Trans. Comput. Syst, vol.3, issue.1, pp.63-75, 1985.

F. Cristian, S. Mishra, and Y. S. Hyun, Implementation and performance of a stable-storage service in Unix, Proceedings 15th Symposium on Reliable Distributed Systems, p.86, 1996.
DOI : 10.1109/RELDIS.1996.559701

C. Constantinescu, Intermittent faults and effects on reliability of integrated circuits, 2008 Annual Reliability and Maintainability Symposium, pp.370-374, 2008.
DOI : 10.1109/RAMS.2008.4925824

G. Cao and M. Singhal, On coordinated checkpointing in distributed systems, IEEE Trans. Parallel Distrib. Syst, vol.9, issue.12, pp.1213-1225, 1998.

G. Cao and M. Singhal, Mutable checkpoints, Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing, PODC '99, pp.157-172, 2001.
DOI : 10.1145/301308.301371

C. Chiang and S. Sinha, The road to 3D EDA tool readiness, 2009 Asia and South Pacific Design Automation Conference, pp.429-436, 2009.
DOI : 10.1109/ASPDAC.2009.4796519

N. Chen, Y. Yu, and S. Ren, Checkpoint Interval and System's Overall Quality for Message Logging-Based Rollback and Recovery in Distributed and Embedded Computing, 2009 International Conference on Embedded Software and Systems, pp.315-322, 2009.
DOI : 10.1109/ICESS.2009.34

W. J. Dally, Future directions for on-chip interconnection networks, OCIN Workshop, 2006.

D. de Andrés, J. C. Ruiz, D. Gil, and P. J. Gil, Fast emulation of permanent faults in VLSI systems, Proceedings of the 2006 International Conference on Field Programmable Logic and Applications (FPL), pp.1-6, 2006.

R. Y. de Camargo, A. Goldchleger et al., Checkpointing-based rollback recovery for parallel applications on the InteGrade grid middleware, MGC '04: Proceedings of the 2nd workshop on Middleware for grid computing, 2004.

S. Das, A. Chandrakasan, and R. Reif, Three-dimensional integrated circuits: performance, design methodology, and CAD tools, IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings., p.13, 2003.
DOI : 10.1109/ISVLSI.2003.1183348

G. Druais, G. Dilliway, P. Fischer, E. Guidotti, O. Lühn et al., High aspect ratio via metallization for 3D integration using CVD TiN barrier and electrografted Cu seed, Microelectronic Engineering, vol.85, issue.10, pp.1957-1961, 2008.
DOI : 10.1016/j.mee.2008.06.004

E. W. Dijkstra, Self-stabilizing systems in spite of distributed control, Commun. ACM, vol.17, issue.11, pp.643-644, 1974.

T. Dumitras and R. Marculescu, On-Chip Stochastic Communication, DATE '03: Proceedings of the conference on Design, Automation and Test in Europe, 2003.
DOI : 10.1007/0-306-48709-8_28

G. Denaro and L. Mariani, Towards Testing and Analysis of Systems that Use Serialization, Electronic Notes in Theoretical Computer Science, vol.116, pp.171-184, 2005.
DOI : 10.1016/j.entcs.2004.02.075

G. De Micheli, An outlook on design technologies for future integrated systems, Trans. Comp.-Aided Des. Integ. Cir. Sys, vol.28, issue.6, pp.777-790, 2009.

W. J. Dally and C. L. Seitz, Deadlock-free message routing in multiprocessor interconnection networks, pp.345-351, 1994.

A. Dutta and N. A. Touba, Reliable Network-on-Chip Using a Low Cost Unequal Error Protection Code, 22nd IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT 2007), pp.3-11, 2007.
DOI : 10.1109/DFT.2007.20

J. M. Dumoulin, Apollo-11, 2001.

C. Duan, C. Zhu, and S. P. Khatri, Forbidden transition free crosstalk avoidance CODEC design, Proceedings of the 45th annual conference on Design automation, DAC '08, pp.986-991, 2008.
DOI : 10.1145/1391469.1391717

E. N. Elnozahy, L. Alvisi, Y. Wang, and D. B. Johnson, A survey of rollback-recovery protocols in message-passing systems, ACM Comput. Surv, vol.34, issue.3, pp.375-408, 2002.

E. N. Elnozahy, D. B. Johnson, and W. Zwaenepoel, The performance of consistent checkpointing, Proceedings of the 11th Symposium on Reliable Distributed Systems, pp.39-47, 1992.

E. N. Elnozahy and J. S. Plank, Checkpointing for peta-scale systems: A look into the future of practical rollback-recovery, IEEE Trans. Dependable Secur. Comput, vol.1, issue.2, pp.97-108, 2004.

D. Fick, A. Deorio, G. K. Chen, V. Bertacco, D. Sylvester et al., A highly resilient routing algorithm for fault-tolerant NoCs, 2009 Design, Automation & Test in Europe Conference & Exhibition, pp.21-26, 2009.
DOI : 10.1109/DATE.2009.5090627

A. P. Frantz, F. L. Kastensmidt, L. Carro, and E. Cota, Dependable network-on-chip router able to simultaneously tolerate soft errors and crosstalk, pp.1-9, 2006.

T. H. Feng and E. A. Lee, Incremental checkpointing with application to distributed discrete event simulation, WSC '06: Proceedings of the 38th conference on Winter simulation, pp.1004-1011, 2006.

C. Ferri, S. Reda, and R. I. Bahar, Parametric yield management for 3D ICs, ACM Journal on Emerging Technologies in Computing Systems, vol.4, issue.4, pp.1-22, 2008.
DOI : 10.1145/1412587.1412592

S. Furber, Living with failure: Lessons from nature?, ETS '06: Proceedings of the Eleventh IEEE European Test Symposium, pp.4-8, 2006.

Essential fault-tolerance metrics for NoC infrastructures, IOLTS '07: Proceedings of the 13th IEEE International On-Line Testing Symposium, 2007.

Q. Gao, W. Huang, M. J. Koop, and D. K. Panda, Group-based Coordinated Checkpointing for MPI: A Case Study on InfiniBand, 2007 International Conference on Parallel Processing (ICPP 2007), p.47, 2007.
DOI : 10.1109/ICPP.2007.44

C. Grecu, A. Ivanov, R. Saleh, E. S. Sogomonyan, and P. P. Pande, Online fault detection and location for NoC interconnects, IOLTS '06: Proceedings of IEEE International On-Line Testing Symposium, 2006.

C. Grecu, A. Ivanov, R. Saleh, and P. P. Pande, NoC Interconnect Yield Improvement Using Crosspoint Redundancy, 2006 21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pp.457-465, 2006.
DOI : 10.1109/DFT.2006.46

B. Gupta, N. Mogharreban, S. Rahimi, and A. Vemuri, A high performance non-blocking checkpointing/recovery algorithm for ring networks, PDPTA, pp.234-240, 2006.

A. Ganguly, P. P. Pande, B. Belzer, and C. Grecu, Design of Low Power & Reliable Networks on Chip Through Joint Crosstalk Avoidance and Multiple Error Correction Coding, Journal of Electronic Testing, vol.5, issue.4, pp.1-367, 2008.
DOI : 10.1007/s10836-007-5035-1

B. Gupta and S. Rahimi, A fast and efficient non-blocking coordinated checkpointing approach for distributed systems, PDPTA, pp.99-105, 2006.

B. Gupta and S. Rahimi, A Novel Low-Overhead Recovery Approach for Distributed Systems, Journal of Computer Systems, Networks, and Communications, vol.3, issue.3, pp.1-8, 2009.
DOI : 10.1016/j.jpdc.2008.08.003

Q. Gao, W. Yu, W. Huang, and D. K. Panda, Application-transparent checkpoint/restart for MPI programs over infiniband, ICPP '06: Proceedings of the 2006 International Conference on Parallel Processing, pp.471-478, 2006.

M. Huang and B. Bode, A performance comparison of tree and ring topologies in distributed systems, IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), pp.258-259, 2005.
DOI : 10.2172/837269

A. Hansson, M. Coenen, and K. Goossens, Undisrupted Quality-of-Service during Reconfiguration of Multiple Applications in Networks on Chip, 2007 Design, Automation & Test in Europe Conference & Exhibition, pp.954-959, 2007.
DOI : 10.1109/DATE.2007.364416

A. Hansson, K. Goossens, and A. Radulescu, A unified approach to constrained mapping and routing on network-on-chip architectures, Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, CODES+ISSS '05, pp.75-80, 2005.
DOI : 10.1145/1084834.1084857

D. Hodges, H. Jackson, and R. Saleh, Analysis and Design of Digital Integrated Circuits, 2004.

S. Hong, Y. Kim, H. Y. Cho, . Yeom, and . Park, On the choice of checkpoint interval using memory usage profile and adaptive time series analysis, PRDC '01: Proceedings of the 2001 Pacific Rim International Symposium on Dependable Computing, p.45, 2001.

J. Hu and R. Marculescu, DyAD: smart routing for networks-on-chip, Proceedings of the 41st annual conference on Design automation, DAC '04, 2004.
DOI : 10.1145/996566.996638

C. Ho and L. Stockmeyer, A new approach to fault-tolerant wormhole routing for mesh-connected parallel computers, IEEE Trans. Comput, vol.53, issue.4, pp.427-439, 2004.

M. Y. Hsiao, A Class of Optimal Minimum Odd-weight-column SEC-DED Codes, IBM Journal of Research and Development, vol.14, issue.4, pp.395-401, 1970.
DOI : 10.1147/rd.144.0395

S. Jafar, A. Krings, and T. Gautier, Flexible Rollback Recovery in Dynamic Heterogeneous Grid Computing, IEEE Transactions on Dependable and Secure Computing, vol.6, issue.1, pp.32-44, 2009.
DOI : 10.1109/TDSC.2008.17

URL : https://hal.archives-ouvertes.fr/hal-00684942

A. Jantsch, R. Lauter, and A. Vitkovski, Power analysis of link level and end-to-end data protection in networks on chip, 2005 IEEE International Symposium on Circuits and Systems, pp.1770-1773, 2005.
DOI : 10.1109/ISCAS.2005.1464951

A. Jain and R. K. Shyamasundar, Failure Detection and Membership Management in Grid Environments, Fifth IEEE/ACM International Workshop on Grid Computing, pp.44-52, 2004.
DOI : 10.1109/GRID.2004.30

F. L. Kastensmidt, L. Carro, and R. Reis, Fault-Tolerance Techniques for SRAM-Based FPGAs (Frontiers in Electronic Testing), 2006.

C. Kretzschmar, A. K. Nieuwland, and D. Müller, Why transition coding for power minimization of on-chip buses does not work, Proceedings Design, Automation and Test in Europe Conference and Exhibition, p.10512, 2004.
DOI : 10.1109/DATE.2004.1268897

J. Kim, C. Nicopoulos, D. Park, V. Narayanan, M. S. Yousif et al., A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks, ISCA '06: 33rd International Symposium on Computer Architecture, pp.4-15, 2006.
DOI : 10.1145/1150019.1136487

R. Koo and S. Toueg, Checkpointing and Rollback-Recovery for Distributed Systems, ACM '86: Proceedings of 1986 ACM Fall joint computer conference, pp.1150-1158, 1986.
DOI : 10.1109/TSE.1987.232562

R. Koo and S. Toueg, Checkpointing and Rollback-Recovery for Distributed Systems, IEEE Transactions on Software Engineering, vol.13, issue.1, pp.23-31, 1987.
DOI : 10.1109/TSE.1987.232562

R. Koetter and A. Vardy, Algebraic soft-decision decoding of Reed-Solomon codes, IEEE Transactions on Information Theory, vol.49, issue.11, pp.2809-2825, 2003.
DOI : 10.1109/TIT.2003.819332

L. Lamport, Time, clocks, and the ordering of events in a distributed system, Communications of the ACM, vol.21, issue.7, pp.558-565, 1978.
DOI : 10.1145/359545.359563

P. Lemarinier, A. Bouteiller, G. Krawezik, and F. Cappello, Coordinated checkpoint versus message log for fault tolerant MPI, International Journal of High Performance Computing and Networking, vol.2, issue.2/3/4, pp.2-4146, 2004.
DOI : 10.1504/IJHPCN.2004.008899

P. Leduc, 3D integration: A solution for interconnects, 2007.

J. Li, 3D integration opportunities and challenges, RFID '07: IEEE International Conference on Radio Frequency Identification, pp.175-182, 2008.

J. Lequepeys and D. Lattard, Trends in complex SoC design: From technology variability to multiprocessor architectures, 2008 Joint 6th International IEEE Northeast Workshop on Circuits and Systems and TAISA Conference, 2008.
DOI : 10.1109/NEWCAS.2008.4606406

T. Lehtonen, P. Liljeberg, and J. Plosila, Online reconfigurable self-timed links for fault tolerant NoC, VLSI Design, p.13, 2007.

T. Lehtonen, P. Liljeberg, and J. Plosila, Fault tolerant distributed routing algorithms for mesh networks-on-chip, ISSCS '09: International Symposium on Signals, Circuits and Systems, pp.1-4, 2009.

I. Loi, S. Mitra, T. H. Lee et al., A low-overhead fault tolerance scheme for TSV-based 3D network on chip links, ICCAD '08: Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design, 2008.

A. Leroy, P. Marchal, A. Shickova, F. Catthoor et al., Spatial division multiplexing: a novel approach for guaranteed throughput on NoCs, CODES+ISSS '05: Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, pp.81-86, 2005.

Y. Li, S. Peng, and W. Chu, Adaptive box-based efficient fault-tolerant routing in 3D torus, pp.71-77, 2005.

L. Lu, H. Shang, H. Zhou, F. Zhu, X. Yang et al., Statistical reliability analysis under process variation and aging effects, Proceedings of the 46th Annual Design Automation Conference, DAC '09, pp.514-519, 2009.
DOI : 10.1145/1629911.1630044

N. A. Touba, L.-T. Wang, and C. E. Stroud, System-on-Chip Test Architectures, 2007.

L. Li, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin, Adaptive error protection for energy efficiency, ICCAD '03: Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design, 2003.

Z. Li, L. Yuan, P. Mohapatra, and C. Chuah, On the analysis of overlay failure detection and recovery, Computer Networks, vol.51, issue.13, pp.3828-3843, 2007.
DOI : 10.1016/j.comnet.2007.04.007

S. Murali, D. Atienza, L. Benini, and G. De Micheli, A multi-path routing strategy with guaranteed in-order packet delivery and fault-tolerance for networks on chip, Proceedings of the 43rd annual conference on Design automation, DAC '06, pp.845-848, 2006.
DOI : 10.1145/1146909.1147124

S. Murali, L. Benini, T. Theocharides, N. Vijaykrishnan, M. J. Irwin et al., Analysis of Error Recovery Schemes for Networks on Chips, IEEE Design and Test of Computers, vol.22, issue.5, pp.434-442, 2005.
DOI : 10.1109/MDT.2005.104

C. Marcon, N. Calazans, F. Moraes, A. Susin, I. Reis et al., Exploring NoC Mapping Strategies: An Energy and Timing Aware Technique, Design, Automation and Test in Europe, pp.502-507, 2005.
DOI : 10.1109/DATE.2005.149

URL : https://hal.archives-ouvertes.fr/hal-00181561

N. Mittal, F. C. Freiling, S. Venkatesan, and L. D. Penso, On termination detection in crash-prone distributed systems with failure detectors, Journal of Parallel and Distributed Computing, vol.68, issue.6, pp.855-875, 2008.
DOI : 10.1016/j.jpdc.2008.02.001

N. Magen, A. Kolodny, U. Weiser, and N. Shamir, Interconnect-power dissipation in a microprocessor, Proceedings of the 2004 international workshop on System level interconnect prediction, SLIP '04, pp.7-13, 2004.
DOI : 10.1145/966747.966750

P. S. Mandal and K. Mukhopadhyaya, Concurrent checkpoint initiation and recovery algorithms on asynchronous ring networks, J. Parallel Distrib. Comput, vol.64, issue.5, pp.649-661, 2004.

S. Monnet, C. Morin, and R. Badrinath, A hierarchical checkpointing protocol for parallel applications in cluster federations, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings., p.211, 2004.
DOI : 10.1109/IPDPS.2004.1303242

URL : https://hal.archives-ouvertes.fr/inria-00000990

T. K. Moon, Error Correction Coding: Mathematical Methods and Algorithms, 2005.

F. Moll, M. Roca, and A. Rubio, Measurement of crosstalk-induced delay errors in integrated circuits, Electronics Letters, vol.33, issue.19, pp.1623-1624, 1997.
DOI : 10.1049/el:19971083

M. Mutyam, Selective shielding: a crosstalk-free bus encoding technique, 2007 IEEE/ACM International Conference on Computer-Aided Design, pp.618-621, 2007.
DOI : 10.1109/ICCAD.2007.4397333

N. Nordbotten, M. Gómez, J. Flich, P. López, A. Robles et al., A Fully Adaptive Fault-Tolerant Routing Methodology Based on Intermediate Nodes, pp.341-356, 2004.
DOI : 10.1137/0211027

M. Nicolaidis, Design for soft error mitigation, IEEE Transactions on Device and Materials Reliability, vol.5, issue.3, pp.405-418, 2005.
DOI : 10.1109/TDMR.2005.855790

URL : https://hal.archives-ouvertes.fr/hal-00107331

A. K. Nieuwland, A. Katoch, D. Rossi, and C. Metra, Coding techniques for low switching noise in fault tolerant busses, 11th IEEE International On-Line Testing Symposium, pp.183-189, 2005.
DOI : 10.1109/IOLTS.2005.19

B. Parhami, Design of reliable software via general combination of N-version programming and acceptance testing, Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering, p.104, 1996.
DOI : 10.1109/ISSRE.1996.558714

S. Pasricha, Exploring serial vertical interconnects for 3D ICs, Proceedings of the 46th Annual Design Automation Conference, DAC '09, pp.581-586, 2009.
DOI : 10.1145/1629911.1630061

I. Parulkar and R. Cypher, Trends and trade-offs in designing highly robust throughput computing oriented chips and systems, 11th IEEE International On-Line Testing Symposium, pp.74-77, 2005.
DOI : 10.1109/IOLTS.2005.68

J. A. Patel, I. Gupta, and N. Contractor, Jetstream: Achieving predictable gossip dissemination by leveraging social network principles, NCA '06: Proceedings of the Fifth IEEE International Symposium on Network Computing and Applications, pp.32-39, 2006.

D. A. Patterson, G. Gibson, and R. H. Katz, A case for redundant arrays of inexpensive disks (RAID), pp.474-481, 2000.

M. Palesi, R. Holsmark, S. Kumar, and V. Catania, Application Specific Routing Algorithms for Networks on Chip, IEEE Transactions on Parallel and Distributed Systems, vol.20, issue.3, pp.316-330, 2009.
DOI : 10.1109/TPDS.2008.106

M. Palesi, S. Kumar, and R. Holsmark, A method for router table compression for application specific routing in mesh topology NoC architectures, Proc. International Workshop on Architectures, Modeling, and Simulation.

M. Pirretti, G. M. Link et al., Fault tolerant algorithms for network-on-chip interconnect, VLSI '04: Proceedings. IEEE Computer Society Annual Symposium on VLSI, 2004.

K. N. Patel and I. L. Markov, Error-correction and crosstalk avoidance in DSM busses, pp.1076-1080, 2004.

D. Park, C. Nicopoulos, J. Kim, N. Vijaykrishnan, and C. R. Das, Exploring fault-tolerant network-on-chip architectures, DSN '06: Proceedings of the International Conference on Dependable Systems and Networks, 2006.

P. P. Pande, H. Zhu, A. Ganguly, and C. Grecu, Energy reduction through crosstalk avoidance coding in NoC paradigm, DSD '06: Proceedings of the 9th EUROMICRO Conference on Digital System Design, pp.689-695, 2006.

F. Quaglia and A. Santoro, Modeling and optimization of non-blocking checkpointing for optimistic simulation on myrinet clusters, ICS '03: Proceedings of the 17th annual international conference on Supercomputing, pp.130-139, 2003.

H. Räcke, Survey on Oblivious Routing Strategies, CiE '09: Proceedings of the 5th Conference on Computability in Europe, pp.419-429, 2009.
DOI : 10.1145/828.1892

D. Rossi, P. Angelini, and C. Metra, Configurable Error Control Scheme for NoC Signal Integrity, 13th IEEE International On-Line Testing Symposium (IOLTS 2007), pp.43-48, 2007.
DOI : 10.1109/IOLTS.2007.24

B. Randell, System structure for software fault tolerance, Proceedings of the international conference on Reliable software, pp.437-449, 1975.

E. Rijpkema, K. G. Goossens, A. Radulescu, J. Dielissen, J. van Meerbergen et al., Trade offs in the design of a router with both guaranteed and best-effort services for networks on chip, DATE '03: Proceedings of the conference on Design, Automation and Test in Europe, p.10350, 2003.

D. Rossi, A. K. Nieuwland, A. Katoch, and C. Metra, Exploiting ECC redundancy to minimize crosstalk impact, IEEE Design and Test of Computers, vol.22, issue.1, pp.59-70, 2005.
DOI : 10.1109/MDT.2005.10

I. S. Reed and G. Solomon, Polynomial Codes Over Certain Finite Fields, Journal of the Society for Industrial and Applied Mathematics, vol.8, issue.2, pp.300-304, 1960.
DOI : 10.1137/0108018

B. Ramkumar and V. Strumpen, Portable checkpointing for heterogeneous architectures, p.58, 1997.
DOI : 10.1109/ftcs.1997.614078

J. Srinivasan, S. V. Adve, P. Bose, and J. A. Rivers, Lifetime Reliability: Toward an Architectural Solution, IEEE Micro, vol.25, issue.3, pp.70-80, 2005.
DOI : 10.1109/MM.2005.54

M. Scannell, CEA-Leti 3D activities and roadmap, EMC-3D European Technical Symposium, 2007.

F. B. Schneider, Byzantine generals in action: implementing fail-stop processors, ACM Transactions on Computer Systems, vol.2, issue.2, pp.145-154, 1984.
DOI : 10.1145/190.357399

K.-F. Ssu, W. K. Fuchs, and H. C. Jiau, Process recovery in heterogeneous systems, IEEE Transactions on Computers, vol.52, issue.2, pp.126-138, 2003.
DOI : 10.1109/TC.2003.1176981

T. Sato and Y. Kunitake, A Simple Flip-Flop Circuit for Typical-Case Designs for DFM, 8th International Symposium on Quality Electronic Design (ISQED'07), pp.539-544, 2007.
DOI : 10.1109/ISQED.2007.23

C. Seiculescu, S. Murali, L. Benini, and G. De Micheli, SunFloor 3D: A tool for Networks On Chip topology synthesis for 3D systems on chips, 2009 Design, Automation & Test in Europe Conference & Exhibition, pp.9-14, 2009.
DOI : 10.1109/DATE.2009.5090625

R. D. Schlichting and F. B. Schneider, Fail-stop processors: An approach to designing fault-tolerant computing systems, 1983.

M. Singhal and N. G. Shivaratri, Advanced Concepts in Operating Systems, 1994.

S. R. Sridhara and N. R. Shanbhag, Coding for system-on-chip networks: a unified framework, DAC '04: Proceedings of the 41st annual Design Automation Conference, pp.103-106, 2004.

L. M. Silva, J. G. Silva, and S. Chapple, Portable transparent checkpointing for distributed shared memory, Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing HPDC-96, p.422, 1996.
DOI : 10.1109/HPDC.1996.546213

F. Salewski and A. Taylor, Fault Handling in FPGAs and Microcontrollers in Safety-Critical Embedded Applications: A Comparative Survey, 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007), pp.124-131, 2007.
DOI : 10.1109/DSD.2007.4341459

B. Towles, W. J. Dally, and S. Boyd, Throughput-centric routing algorithm design, Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures, SPAA '03, pp.200-209, 2003.
DOI : 10.1145/777412.777444

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.15.4317

D. Tutsch and M. Malek, Comparison of Network-on-Chip Topologies for Multicore Systems Considering Multicast and Local Traffic, Proceedings of the Second International ICST Conference on Simulation Tools and Techniques, pp.1-9, 2009.
DOI : 10.4108/ICST.SIMUTOOLS2009.5590

P. Vellanki, N. Banerjee, and K. S. Chatha, Quality-of-service and error control techniques for mesh-based network-on-chip architectures, Integration, the VLSI Journal, vol.38, issue.3, pp.353-382, 2005.
DOI : 10.1016/j.vlsi.2004.07.009

W. Torres-Pomales, Software fault tolerance: A tutorial, 2000.

E. Wong and S. K. Lim, 3D floorplanning with thermal vias, DATE '06: Proceedings of the conference on Design, automation and test in Europe, European Design and Automation Association, pp.878-883, 2006.

J. Wu, A simple fault-tolerant adaptive and minimal routing approach in 3-D meshes, Journal of Computer Science and Technology, vol.5, issue.3, pp.1-13, 2003.
DOI : 10.1007/BF02946645

D. Xiang, J. Sun, and J. Wu, Fault-Tolerant Routing in Meshes/Tori Using Planarly Constructed Fault Blocks, 2005 International Conference on Parallel Processing (ICPP'05), pp.577-584, 2005.
DOI : 10.1109/ICPP.2005.40

Q. Yu and P. Ampadu, Adaptive Error Control for NoC Switch-to-Switch Links in a Variable Noise Environment, 2008 IEEE International Symposium on Defect and Fault Tolerance of VLSI Systems, pp.352-360, 2008.
DOI : 10.1109/DFT.2008.40

Y. Yang and J. Wang, Efficient all-to-all broadcast in all-port mesh and torus networks, Proceedings Fifth International Symposium on High-Performance Computer Architecture, p.290, 1999.
DOI : 10.1109/HPCA.1999.744382

Z. Zhang, A. Greiner, and S. Taktak, A reconfigurable routing algorithm for a fault-tolerant 2D-mesh network-on-chip, DAC '08: Proceedings of the 45th annual Design Automation Conference, pp.441-446, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00591783

H. Zimmermann, OSI Reference Model--The ISO Model of Architecture for Open Systems Interconnection, IEEE Transactions on Communications, vol.28, issue.4, pp.2-9, 1988.
DOI : 10.1109/TCOM.1980.1094702

H. Zimmer and A. Jantsch, A fault model notation and error-control scheme for switch-to-switch buses in a network-on-chip, CODES+ISSS '03: Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, pp.188-193, 2003.

Y. Zhou, V. Lakamraju, I. Koren, and C. M. Krishna, Software-Based Failure Detection and Recovery in Programmable Network Interfaces, IEEE Transactions on Parallel and Distributed Systems, vol.18, issue.11, pp.1539-1550, 2007.
DOI : 10.1109/TPDS.2007.1093

K. Zhang and S. Pande, Minimizing downtime in seamless migrations of mobile applications, LCTES '06: Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems, pp.12-21, 2006.

C. Rusu: Publications

C. Rusu, L. Anghel, and D. Avresky, Adaptive Inter-Layer Message Routing in 3D Networks-on-Chip, Microprocessors and Microsystems Journal, 2010.

C. Rusu, L. Anghel, and D. Avresky, RILM: Reconfigurable inter-layer routing mechanism for 3D multi-layer networks-on-chip, 16th IEEE International On-Line Testing Symposium (IOLTS), 2010.

V. Pasca, L. Anghel, C. Rusu, and M. Benabdenbi, Configurable serial fault-tolerant link for communication in 3D integrated systems, 16th IEEE International On-Line Testing Symposium (IOLTS), 2010.

V. Pasca, L. Anghel, C. Rusu, and M. Benabdenbi, Configurable fault-tolerant link for inter-die communication in 3D on-chip networks, 15th IEEE European Test Symposium (ETS), 2010.

V. Pasca, L. Anghel, C. Rusu, and M. Benabdenbi, Non-regular 3D mesh networks-on-chip, DAC Workshop on Diagnostic Services in Network-on-Chips (DSNoC), 2010.

C. Rusu and L. Anghel, Checkpoint and rollback recovery in network-on-chip based systems, 2010.

V. Pasca, L. Anghel, C. Rusu, R. Locatelli, and M. Coppola, Error resilience of intra-die and inter-die communication with 3D Spidergon STNoC, DATE, 2010.

C. Rusu, L. Anghel, and D. Avresky, Message routing in 3D networks-on-chip, NORCHIP, 2009.

C. Rusu, C. Grecu, and L. Anghel, Efficient coordinated checkpointing recovery schemes for network-on-chip based systems, International Workshop on Dependable Circuit Design (DECIDE), 2008.

C. Rusu, C. Grecu, and L. Anghel, Network-on-chip fault tolerance through checkpoint and rollback recovery, National Symposium on System-on-Chip - System-in-Package, 2008.

C. Rusu, C. Grecu, and L. Anghel, Communication-aware recovery configurations for networks-on-chip, 14th IEEE International On-Line Testing Symposium (IOLTS), 2008.

C. Rusu, C. Grecu, and L. Anghel, Blocking and non-blocking checkpointing for networks-on-chip, Secure Nanocomputing (WDSN), 2008.

C. Rusu, C. Grecu, and L. Anghel, Improving the scalability of checkpoint recovery for networks-on-chip, IEEE International Symposium on Circuits and Systems (ISCAS), 2008.

C. Rusu, C. Grecu, and L. Anghel, Coordinated versus uncoordinated checkpoint recovery for network-on-chip based systems, IEEE International Symposium on Electronic Design, Test and Applications (DELTA), 2008.

C. Grecu, A. Ivanov, R. Saleh, C. Rusu, L. Anghel et al., A flexible network-on-chip simulator for early design space exploration, Microsystems and Nanoelectronics Research Conference (MNRC), 2008.

C. Rusu, A. Bougerol, L. Anghel, C. Weulerse, N. Buard et al., Multiple event transient induced by nuclear reactions in CMOS logic cells, 13th IEEE International On-Line Testing Symposium (IOLTS), 2007.