Design, Optimization, and Formal Verification of Circuit Fault-Tolerance Techniques

Dmitry Burlyaev 1
1 SPADES - Sound Programming of Adaptive Dependable Embedded Systems
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract : Technology shrinking and voltage scaling increase the risk of fault occurrences in digital circuits. To address this challenge, engineers use fault-tolerance techniques to mask or, at least, to detect faults. These techniques are especially needed in safety-critical domains (e.g., aerospace, medical, nuclear), where ensuring the circuit functionality and fault-tolerance is crucial. However, the verification of functional and fault-tolerance properties is a complex problem that cannot be solved with simulation-based methodologies due to the need to check a huge number of executions and fault occurrence scenarios. The optimization of the overheads imposed by fault-tolerance techniques also requires a proof that the circuit keeps its fault-tolerance properties after the optimization.
In this work, we propose a verification-based optimization of existing fault-tolerance techniques as well as the design of new techniques and their formal verification using theorem proving. We first investigate how some majority voters can be removed from Triple-Modular Redundant (TMR) circuits without violating their fault-tolerance properties. The developed methodology clarifies how to take into account circuit native error-masking capabilities that may exist due to the structure of the combinational part or due to the way the circuit is used and communicates with the surrounding device.
Second, we propose a family of time-redundant fault-tolerance techniques as automatic circuit transformations. They require fewer hardware resources than TMR alternatives and could be easily integrated in EDA tools. The transformations are based on the novel idea of dynamic time redundancy that allows the redundancy level to be changed "on-the-fly" without interrupting the computation. Therefore, time redundancy can be used only in critical situations (e.g., above the Earth's poles, where the radiation level is increased), during the processing of crucial data (e.g., the encryption of selected data), or during critical processes (e.g., a satellite computer reboot).
Third, merging dynamic time redundancy with a micro-checkpointing mechanism, we have created a double-time redundancy transformation capable of masking transient faults. Our technique makes the recovery procedure transparent, and the circuit's input/output behavior remains unchanged even under faults. Due to the complexity of that method and the need to provide full assurance of its fault-tolerance capabilities, we have formally certified the technique using the Coq proof assistant. The developed proof methodology can be applied to certify other fault-tolerance techniques implemented through circuit transformations at the netlist level.
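As a rough illustration of the majority voting that TMR relies on, the following Python sketch triplicates a small combinational function and votes bitwise on the three outputs. It is not taken from the thesis: the names `majority`, `tmr_step`, and `increment4`, as well as the fault-injection parameters, are hypothetical and only model a single corrupted copy at the software level.

```python
from typing import Callable


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote over three integer-encoded words."""
    return (a & b) | (a & c) | (b & c)


def tmr_step(logic: Callable[[int], int], x: int,
             fault_mask: int = 0, faulty_copy: int = 0) -> int:
    """Evaluate three copies of `logic` on `x` and vote on their outputs.

    `fault_mask` (hypothetical) flips the selected bits in the output of
    one copy, `faulty_copy`, to imitate a transient fault in that copy.
    """
    outputs = [logic(x) for _ in range(3)]
    outputs[faulty_copy] ^= fault_mask  # inject the fault into one copy only
    return majority(*outputs)


def increment4(x: int) -> int:
    """Toy 4-bit combinational function used only for this illustration."""
    return (x + 1) & 0xF


# Fault-free run and a run with one corrupted copy give the same voted output.
clean = tmr_step(increment4, 5)
faulty = tmr_step(increment4, 5, fault_mask=0b0100, faulty_copy=1)
assert clean == faulty == increment4(5)
```

The vote masks any corruption confined to one copy, which is why removing a voter, as the abstract discusses, needs a proof that the remaining structure still masks the fault.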

Cited literature [149 references]

https://tel.archives-ouvertes.fr/tel-01253368
Contributor : Abes Star
Submitted on : Tuesday, May 10, 2016 - 7:02:51 PM
Last modification on : Thursday, October 11, 2018 - 8:48:04 AM
Long-term archiving on : Wednesday, May 25, 2016 - 8:48:10 AM

File

BURLYAEV_2015_archivage.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01253368, version 2

Citation

Dmitry Burlyaev. Design, Optimization, and Formal Verification of Circuit Fault-Tolerance Techniques. Hardware Architecture [cs.AR]. Université Grenoble Alpes, 2015. English. ⟨NNT : 2015GREAM058⟩. ⟨tel-01253368v2⟩
