Compact modeling and circuit design based on spin injection
Qi An

HAL Id: tel-01720258
https://tel.archives-ouvertes.fr/tel-01720258
Submitted on 1 Mar 2018

Doctoral thesis
of Université Paris-Saclay
prepared at Université Paris-Sud

Doctoral school no. 575
Electrical, Optical, Bio-physics and Engineering (EOBE)
Doctoral specialty: Physics

by

Ms. Qi An

Modélisation compacte et conception de circuit à base d’injection de spin

Compact modeling and circuit design based on spin injection

Thesis presented and defended in Orsay on 5 October 2017.

Jury members:

Mr. Arnaud Bournel, Professor, Université Paris-Sud (Jury president)
Mr. Lionel Torres, Professor, Université Montpellier (Reviewer)
Ms. Cristell Maneux, Professor, Université de Bordeaux (Reviewer)
Mr. Ian O’Connor, Professor, Ecole Centrale de Lyon (Examiner)
Mr. Sébastien Le Beux, Associate Professor, Ecole Centrale de Lyon (Examiner)
Mr. Weisheng Zhao, Professor, Beihang University (Examiner)
Mr. Jacques-Olivier Klein, Professor, Université Paris-Sud (Thesis advisor)
Acknowledgements

I would like to thank all the people who helped and supported me in my research and in life during the past three years. This work would not have been possible without their help and support.

Before the acknowledgments, I would like to tell a short story. In 2013, I was able to come to France to study through a Chinese government scholarship, based on the thesis subject provided by Sébastien Le Beux and Ian O’Connor at Ecole Centrale de Lyon. After finishing my Master's degree, I was supposed to continue my doctoral studies in Lyon. However, because of the review by the “Fonctionnaire de Sécurité Défense (FSD)”, I could not continue my research at Ecole Centrale de Lyon. Under these circumstances, Sébastien Le Beux and Ian O’Connor, to whom I am truly thankful, helped me contact Jacques-Olivier Klein, who became my advisor for the past three years, and Weisheng Zhao, who has helped me and discussed with me a great deal. Thanks to all of them, I was able to continue my research in France and work on this topic in this laboratory. Beyond the opportunity they offered me, I would also like to thank them for their help in my research. I want to express my deep gratitude to Jacques-Olivier Klein for his patience, the fruitful discussions and the invaluable advice that inspired me on the subject. I felt encouraged by his trust and learned from him what a doctor and researcher really is. I am also thankful to my co-advisor, Sébastien Le Beux. He was always patient and communicated with me on every detail of my work plans, the subject, and the paper manuscripts. I was greatly influenced by his rigor in doing research. I also want to thank Weisheng Zhao for his guidance and the fruitful discussions during my thesis. I am also thankful to Ian O’Connor for his patience and the fruitful discussions on my papers; I felt encouraged by his confidence in my ability.

I am very grateful to the members of my thesis committee. I want to thank Prof. Lionel Torres and Prof. Cristell Maneux for agreeing to review this thesis. Similarly, I want to thank Prof. Arnaud Bournel, Prof. Ian O’Connor, Prof. Sébastien Le Beux and Prof. Weisheng Zhao for agreeing to be the examiners. It is my honor to have this thesis reviewed by such a distinguished committee. Thank you very much for having accepted this charge.

I would like to thank Prof. Nicolas Vernier for his help in understanding the fundamentals of magnetism, and for his kindness. I am also thankful to Prof. Arnaud Bournel for the discussions on material physics. My sincere thanks go to all the members of the NANOARCHI group: Damien Querluz, Christopher Benett, Nicolas Locatelli, Adrien Vincent, Alice Mizrahi, Damir Vodenicarevic... I also want to thank Dr. Zhaohao Wang, who helped me a lot with technical problems and with understanding the fundamentals of magnetism.

I also thank the members of the administration: Mme Lydia Pactole, Mme Marie-Pierre Garon, Mme Florence Gomez... It has been a pleasure to work in such a friendly laboratory. Thanks also to Prof. Eric Cassan, Mme Sophie Bouchoule and Mme Laurence Stephen from the Doctoral School for their patience and their assistance with the registration and thesis defense procedures.

To all the other friends who made my stay in Paris enjoyable: Erya Deng (thanks a lot for your encouragement), Li Su (I appreciate your help in my first year), Gefei Wang, Xueying Zhang, Yu Zhang, You Wang, Ping Che, Jiaqi Zhou, Boyu Zhang, Men Su, Lu Lu, Ming Wu, Xiaoyi Yang, Yiyi Li, Hao Zhang, Zheng Yang, Si Chen, Jiating Luo, Weilong Li...

A few lines to my dearest friends: Changhui Zhuang, who has always been close to me and gives me the spiritual strength to confront the problems of life; Chenying Gan, who influences me with her optimism and humor about life; Rui Sun, for your concern and support; and Lingfei Wan, ... you pushed me to like life and gave me optimism.

Special gratitude goes to my family for their continuous support and encouragement, especially to my dearest parents. Thank you for your selfless love and tolerance, which helped me get through all the difficulties and gave me hope in my life. Thanks a lot, and I love you all!

Last but not least, I would like to thank the China Scholarship Council (CSC) for its financial support.
Abstract

Complementary Metal Oxide Semiconductor (CMOS) technology has tremendously affected the development of the semiconductor industry. However, as the technology node is scaled down, CMOS technology faces significant challenges set by leakage power and short channel effects. To cope with this problem, researchers have turned their attention to spintronics in recent years, considering its potential for smaller-size fabrication and lower-power operation. The Magnetic Tunnel Junction (MTJ) is one of the most important spintronic devices; it can store binary data based on the Tunnel Magnetoresistance (TMR) effect. Beyond non-volatile memory, the MTJ can also be combined with or replace CMOS circuits to implement a hybrid circuit, with the potential to achieve low power consumption and high-speed performance. However, the frequent spin-charge conversion in a hybrid circuit may cause large power consumption, which diminishes the advantage of hybrid circuits. Therefore, the All-Spin Logic (ASL) concept, which uses a pure spin current to transport information, has been proposed to reduce the number of charge-spin conversions and thus the power consumption. The design of circuits based on ASL devices leads to numerous challenges related to the heterogeneity they introduce and the large design space to explore. Hence, this thesis focuses on filling the gap between application requirements at the system level and fabrication at the device level.

At the device level, we developed a compact model integrating the Spin-Transfer Torque (STT), the TMR, the spin injection/accumulation effects, the channel breakdown current and the spin diffusion delay. Validated by comparison with experimental results, this model allows the exploration of fabrication-related device parameters such as channel lengths and MTJ sizes, and helps designers prevent device damage. Moreover, since it is programmed in Verilog-A on Cadence and divided into several blocks (injector, detector, channel and contact devices), this model allows the independent design and cross-layer optimization of ASL-based circuits, which eases the design of hierarchical, complex circuits. Furthermore, the spin injection/accumulation expressions for the considered ASL device are derived, making it possible to discuss the experimental phenomena of the ASL device.

At the circuit level, we developed a circuit/system design methodology that takes into account the multi-channel distribution, the gate interconnection and the different injection current ratios caused by spin diffusion. Given the circuit/system specifications and constraints, the Boolean functions of a circuit are synthesized with the developed synthesis methods, and the fabrication-level parameters (channel lengths and MTJ sizes) are specified. Based on this methodology, basic combinational circuits that form a circuit library are designed and evaluated using the developed compact model.

At the system level, a convolution circuit and an Intel i7 system are evaluated, exploring the interconnection issues: the interconnection distribution between gates and the inserted buffer count. With theoretical parameters, the results show that an ASL-based circuit/system can outperform a CMOS-based circuit/system. Moreover, the pipelining scheme of the ASL-based circuit is discussed, with MTJs inserted between stages as latches. The reconfigurability of ASL-based circuits, which results from the injection current polarities/values and the control terminal states, is also discussed through the reconfigurable exploration of basic logic circuits.

**Keywords:** All spin logic, compact modeling, design methodology, pipelining, reconfigurability
Résumé

CMOS technology has contributed considerably to the development of the semiconductor industry. However, as the technology node shrinks, CMOS technology faces major challenges related to the dissipation caused by leakage currents and to short channel effects. To address this problem, researchers have turned to spintronics in recent years, given the possibility of fabricating smaller devices and of low-power operation. The magnetic tunnel junction (MTJ) is one of the most important spintronic devices; it can store binary data thanks to tunnel magnetoresistance (TMR). Beyond non-volatile memory applications, the MTJ can also be combined with or replace CMOS circuits to implement a hybrid circuit, so as to combine low power consumption with high-speed performance. However, the frequent charge-to-spin conversion in a hybrid circuit can lead to significant power consumption, which undermines the appeal of hybrid circuits. Consequently, the ASL concept, which relies on a pure spin current as the information carrier, has been proposed to limit the conversions between charge and spin and thus reduce power consumption. The design of circuits based on ASL devices raises numerous challenges related to the heterogeneity they introduce and to the large design space to explore. This thesis therefore focuses on the gap between application requirements at the system level and the fabrication of nanodevices.

At the device level, we developed a compact model integrating the spin-transfer torque (STT), the TMR, the spin injection/accumulation effects, the channel breakdown current and the spin diffusion delay. Validated by comparison with experimental results, this model makes it possible to explore fabrication-related device parameters, such as channel lengths and MTJ sizes, and helps designers avoid destroying the devices. Moreover, this model, described in Verilog-A on Cadence and divided into several blocks (injector, detector, channel and contact), allows the independent design and optimization of ASL circuits, which eases the design of hierarchical and complex circuits. In addition, the expressions for computing the spin injection/accumulation of the ASL device used are derived. They make it possible to discuss the experimental phenomena observed in ASL devices.

At the system level, a convolution circuit and an Intel i7 system are evaluated by exploring the interconnection issues: the distribution of the interconnections between gates and the number of inserted buffers. With theoretical parameters, the results show that the ASL circuit/system can outperform the CMOS-based circuit/system. Moreover, the pipelining scheme of the ASL-based circuit is discussed, with MTJs inserted between stages as buffers. The reconfigurability of ASL circuits, which results from the polarities/values of the injection current and from the states of the control terminals, is also discussed through the reconfigurable exploration of basic logic circuits.

**Keywords:** pure spin current logic, compact modeling, design methodology, pipelining, reconfigurability
Contents

Acknowledgements
Abstract
Résumé

1 Introduction
  1.1 Background
    1.1.1 Device
    1.1.2 Circuit and system
  1.2 Motivation
  1.3 Objectives and Methods
  1.4 Research Contributions
    1.4.1 Compact modeling of All Spin Logic (ASL) device
    1.4.2 ASL based circuit design method
    1.4.3 System design & evaluation method
  1.5 Organization of the Thesis

2 State-of-the-art
  2.1 MTJs
    2.1.1 Structure and working principle
    2.1.2 MTJ fundamental and development
    2.1.3 Memory and circuit applications
  2.2 All Spin Logic (ASL) Device
    2.2.1 Structure and working principle
    2.2.2 ASL fundamental and development
    2.2.3 ASL modeling and benchmarking
    2.2.4 ASL circuit and system application
  2.3 Summary

3 Compact Modeling of ASL Device
  3.1 Physical Model of ASL Device
    3.1.1 MTJ models
    3.1.2 Spin injection/detection Model
    3.1.3 Scaling effects
  3.2 Electrical Model of ASL Device
    3.2.1 Model language
    3.2.2 Model parameters
    3.2.3 Model hierarchy
    3.2.4 Model implementation
  3.3 Results
    3.3.1 Model validation
    3.3.2 ASL device performance analysis
    3.3.3 Inverter/Buffer simulation
  3.4 Summary

4 Circuit Design and Simulations
  4.1 Background and Related Work
    4.1.1 Majority principle
    4.1.2 Circuit synthesis method
    4.1.3 Benchmarking
  4.2 Circuit Design Method
  4.3 Logic Circuits Simulations and Evaluations
    4.3.1 Basic logic circuit
    4.3.2 Arithmetic logical functions
    4.3.3 Data transmission
    4.3.4 Arbitrary circuit
  4.4 Circuit Benchmarking
  4.5 Summary

5 System Level Design
  5.1 System Design Issues
    5.1.1 Reconfigurability
    5.1.2 ASL-based pipelining
    5.1.3 Interconnection issues
  5.2 Computing Circuits/Systems Evaluation
    5.2.1 Convolution circuit
    5.2.2 Intel i7 System
  5.3 Summary

6 Conclusions and Perspectives
  6.1 Conclusions
    6.1.1 Global conclusions
    6.1.2 Device level
    6.1.3 Circuit level
    6.1.4 System level
  6.2 Perspectives
    6.2.1 Modeling
    6.2.2 Circuit Layout
    6.2.3 System evaluation and application

References

Appendix A ASL Performance Equations Derivation

Appendix B Source Code of ASL Compact Model
  Input ferromagnetic model
  Output ferromagnetic model
  Tunnel barrier model
  Interface model
  Channel model
    Channel shunt model
    Channel series model
  Ground model

List of Figures

List of Tables
List of Acronyms

List of Publications

Summary in French
  Chapter 1 General introduction
  Chapter 2 State of the art
  Chapter 3 Compact modeling of ASL
  Chapter 4 Design and simulation of ASL-based circuits
  Chapter 5 System-level modeling and evaluation
  Conclusions and perspectives
Chapter 1

Introduction

1.1 Background

1.1.1 Device

CMOS technology has tremendously affected the development of the semiconductor industry over the past decades. Its ability to scale electronic devices to ever-smaller dimensions has been the primary driver of the performance gains behind this development. For over 40 years, the industry has been able to pack twice as many CMOS Field-Effect Transistors (FETs) onto a chip every 18 months, a trend known as “Moore’s Law” [1]. Moore’s prediction proved accurate for several decades and has been used in the semiconductor industry to guide long-term planning and to set targets for research and development. However, as device scaling continues into the 21st century, the past growth trends, namely doubling the circuit density and increasing the performance by around 40% with each new technology generation, can no longer be maintained by conventional scaling. According to the International Technology Roadmap for Semiconductors (ITRS) [2], one of the leaders in the fields of semiconductor research and industry, CMOS technology faces significant challenges that will slow down the growth of the semiconductor industry.

The limitations are found in three aspects: performance, lithography and economics. When devices are scaled down to the nanoscale, the Short Channel Effect (SCE) becomes increasingly dominant, lowering the threshold voltage and making the devices more vulnerable to variability. Moreover, the leakage current [3] increases with scaling, which leads to higher power consumption. From the lithographic point of view, scaling down to sub-50 nm requires several innovations in terms of design and equipment: optical proximity correction, high-output-power laser light sources, off-axis illumination, short wavelengths, etc. These will increase the manufacturing costs. Given these observations, the way forward for the semiconductor industry is either to find a way to continue the scaling of CMOS technology (“More Moore” and “More-than-Moore” [4]) or to find replacement technologies that promise more scaling opportunities (“Beyond-CMOS” [2]), as shown in Fig. 1.1.

Figure 1.1 – Hierarchical organization and opportunities for CMOS and emerging technologies [5].

“Beyond CMOS” is the name of one of the seven focus groups in ITRS 2.0 and refers to the possible future digital logic technologies beyond the CMOS scaling limits, such as spin-based devices, ferromagnetic logic, and atomic switches. To date, various kinds of “Beyond-CMOS” devices have been proposed for their potential to overcome the power and performance limitations, including memory devices (MTJ [6–33], Ferroelectric Field-Effect Transistor (FeFET) [34, 35], Resistive Random-Access Memory (ReRAM) [36–38], molecular memory [39, 40], etc.) and logic devices (Spin Field-Effect Transistor (Spin-FET) [41–43], Spin Wave Device (SWD) [44], spin torque majority gate [45], All-Spin Logic (ASL) [46–59], Spin Torque Oscillator (STO) [60–62], etc.).

Among emerging memory devices, the MTJ, which can store binary data based on the Tunnel MagnetoResistance (TMR) effect [6, 31, 63], has been studied extensively, since the main source of static power consumption in a computational system is the memory, which must be maintained by a continuous power supply. Besides being used as Magnetoresistive Random Access Memory (MRAM) [15, 16, 64, 65], the MTJ is also combined with CMOS to build hybrid MRAM/CMOS circuits [8, 24, 25, 66]. This type of circuit has been shown to have high power efficiency. Moreover, the hybrid circuit overcomes the communication bottleneck between memory and logic. Nevertheless, it is difficult to manage the power consumption caused by the frequent conversion between spin and charge. Moreover, from the perspective of design method, hybrid circuits still belong to the category of CMOS design. Hence, to better exploit the advantages of spintronic devices, ASL devices have been proposed, which use a pure spin current to transport information and thus lower the power consumption caused by charge-spin conversion. It has been argued that the ASL device could potentially lead to ultra-low-power switches, since a stable nanomagnet with an activation barrier of about \( k_B T \) could be switched with less than 1 aJ [57]. In this context, this thesis focuses on the study of the ASL device.

1.1.2 Circuit and system

As for the possible applications of emerging research memory and logic devices, they can take many forms, according to ITRS 2011 [4]:

- as a drop-in replacement for conventional circuits,
- as supplemental devices that complement and coexist with CMOS devices,
- as devices whose unusual properties can provide unique functionality for selected information processing applications.

The aforementioned hybrid MRAM/CMOS circuits belong to the second category of applications. Considering the scaling limitation of CMOS technology, the future trend will possibly focus on the first and the third categories. ASL devices, which integrate the functions of memory and logic, can potentially be used for new circuit/system designs. Furthermore, the current superposition and switching threshold of ASL devices motivate research into their neuromorphic applications [29, 30, 52, 55, 56, 67–98], which differ from traditional Von Neumann machines [99, 100] and attract growing interest due to their potential to achieve human-like intelligence and low-power operation. With zero static power, low power consumption, high density, and non-volatility, ASL could provide a dominant implementation for future circuits and systems.

1.2 Motivation

The ASL device was first proposed in [46]. It is an emerging device that uses a pure spin current to transport information and can realize both memory and logic functions. It is argued that the ASL device shows five essential characteristics for logic applications: concatenability, non-linearity, feedback elimination, gain and a complete set of Boolean operations [46]. Hence, along with the possibility of low-power and high-density operation, these advantages prompt the discussion of the ASL device for new Boolean computing and neuromorphic computing.

Nevertheless, ASL research is still in its infancy. The experiments on its physical fundamentals, spin injection and transport, began at the end of the 20th century [101, 102]. Most of these experiments demonstrate the spin injection/transport phenomena and explore how to enhance the injection/transport efficiency by using different materials and structures in a single ASL device [22, 103–145]. When this thesis began, few articles had explored the possible applications of ASL in circuits and systems. Yet, as an emerging device that differs from CMOS, ASL needs to be explored up to the system level with a new design paradigm. This situation prompts us to study ASL at three levels: device, circuit, and system.

First and most fundamentally, an electrical model is necessary to explore the potential of ASL in circuits and systems and to fill the gap between application requirements at the system level and circuit fabrication at the device level. Currently, some electrical models based on the spin-circuit concept have been proposed using MATLAB [51, 54, 146–151]. However, these approaches are not scalable and cannot be used for complex circuits. Other electrical models [152, 153] use Verilog-A as the modeling language, yet they cannot be used for circuit simulation, either because the whole device is integrated into one block [152] or because they do not take into account the STT effect, the channel diffusion and the breakdown current effects, which are essential for delay calculation in circuit design [153]. Hence, an electrical model of the ASL device that allows the independent design of its different parts needs to be developed for circuit simulation and analysis.

At the circuit level, the addition or subtraction of currents in the ASL device exhibits the majority property [154]. ASL circuit design is therefore based on the majority principle, which means that a new design/synthesis method exploiting the majority property, unlike the AND/OR/Inverter (AOI) method [155], needs to be developed. Moreover, to evaluate the performance of circuits and even systems and to compare them with CMOS-based circuits, the benchmarking of ASL-based circuits [156] also needs to be developed on the basis of this new synthesis method. Unlike for CMOS technology, where they are fairly straightforward, circuit design and benchmarking for ASL are far more complicated. Many of the devices may perform computation using different architectures, so one must look not just at the device but also at the circuit implementation and, in some cases, even at the specific application or computational algorithm being implemented. Moreover, finding a quantitative set of metrics that can be used to contrast the devices and architectures is also necessary for circuit/system evaluation.
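
The majority property described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each input contributes a current of equal magnitude whose sign encodes the logic value, and that the output follows the sign of the summed current. The function names and the unit current are hypothetical and are not taken from the compact model of this thesis.

```python
from itertools import product


def majority_from_currents(bits, unit_current=1.0):
    """Return the majority value of an odd number of input bits.

    Illustrative assumption: a logic 1 injects +unit_current and a
    logic 0 injects -unit_current; the sign of the sum sets the output.
    """
    if len(bits) % 2 == 0:
        raise ValueError("an odd number of inputs is required to avoid ties")
    total = sum(unit_current if b else -unit_current for b in bits)
    return 1 if total > 0 else 0


def and_gate(a, b):
    # Fixing the third input to 0 turns a 3-input majority into AND.
    return majority_from_currents((a, b, 0))


def or_gate(a, b):
    # Fixing the third input to 1 turns a 3-input majority into OR.
    return majority_from_currents((a, b, 1))


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(f"a={a} b={b} AND={and_gate(a, b)} OR={or_gate(a, b)}")
```

With a fixed control input, the same three-input structure yields either AND or OR, which is why a synthesis method built around majority gates differs from the AOI approach.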

Moreover, since the ASL device prompts the investigation of new computing paradigms, new computational architectures and the relevant optimization methods need to be explored. How the ASL device can provide unique functionality in information processing applications is also one of the most promising topics.

1.3 Objectives and Methods

This section presents the objectives and methods of our research, which aims to advance the synthesis methodology used in the framework of emerging computing technologies, here the ASL device. We approach this objective in three steps/levels: compact modeling of the ASL device, ASL-based circuit design and system design/evaluation.

Based on the above-mentioned motivations, the goal of this thesis is divided into three parts: the device, circuit and system levels.

First, at the device level, a compact model of the ASL device needs to be developed in order to fill the gap between application requirements at the system level and circuit fabrication at the device level. This compact model should achieve the following goals:

- Accurate simulations of the spin injection/detection effects are needed to estimate the MTJ switching time, the spin diffusion delay and the spin accumulation according to the material properties.

- A scalable approach that allows the independent design of the different parts is mandatory to investigate the design of complex and hierarchical circuits. It is worth noting that, to be adopted by the designer community, the approach should be compliant with current standardized CMOS-based design techniques and should be implemented in an existing commercial environment.

- The models should be generic enough to allow the exploration of fabrication-related device parameters such as channel lengths and MTJ sizes. Such exploration should not only allow the investigation of performance tradeoffs but also help designers prevent device damage.

At the circuit level, since the ASL device follows the majority principle, an entirely different design/synthesis method from that of CMOS is required. Hence, the most important task is to develop a methodology for ASL-based circuit design that approaches the circuit layout to the greatest extent possible. Moreover, considering the design/evaluation at the system level and the comparison with CMOS technology, the benchmarking of these circuits is needed.

At the system level, the main goal is to develop a method to evaluate the system performance based on the circuit benchmarking, and to study the advantages of ASL devices and the possible optimizations compared with CMOS technology. The exploration of new computing paradigms, new computational architectures and the relevant optimization methods is also necessary.

The development of the compact model is the core of this thesis and the foundation for circuit/system design and evaluation. In line with our objectives for the ASL compact model, we program the model in the Verilog-A language [157], which is compatible with standard circuit simulation tools. On the Cadence platform, this model provides an easy parameter interface and can be divided and reorganized to implement different circuits. This allows cross-layer optimization of ASL-based circuits and eases the design of hierarchical circuits. The model is validated by comparison with experimental results extracted from the published literature.

A circuit/system design methodology is developed in combination with the majority synthesis methods. Given the circuit specifications and constraints, the circuit is implemented by specifying the channel lengths and MTJ sizes, and its functional behavior is verified with the developed compact model. Moreover, basic circuits are implemented and benchmarked to build a library for system design/evaluation.

The system design and evaluation are based on the benchmarked circuit library and consider the gate interconnection distributions and the inserted buffer count. Moreover, the pipelining of ASL-based circuits/systems is discussed as a way to improve the performance, namely the throughput. Finally, reconfigurability, an unusual property of the ASL device induced by the majority functions, is introduced; it motivates the application of the ASL device in new computational architectures.

### 1.4 Research Contributions

In accordance with the goals of this thesis, contributions have been made at three levels, from device modeling to system architecture/application. The compact modeling of the ASL device and the ASL-based circuit design methodology are the core of this thesis. The reconfigurability of ASL devices is discussed using the developed basic logic circuits. Based on the designed circuits and their benchmarking, a system design/evaluation methodology is developed, as well as a pipelining method.

#### 1.4.1 Compact modeling of All Spin Logic (ASL) device

At the device level, this thesis develops a compact model of the ASL device based on the spin-circuit concept, written in Verilog-A on the Cadence platform, with an easy parameter interface. This compact model integrates the STT effect, the TMR effect, the spin injection/diffusion/accumulation effects and the channel breakdown current effect, which allows the investigation of performance tradeoffs and also helps designers prevent device damage. Furthermore, the model is divided into six blocks (injector/detector F1/F2, channel N, ground G, interface C1/C2), which allows the independent design of different parts and eases the design of hierarchical circuits.

Besides the compact modeling of the ASL device, we also derive the equations of different performance criteria from the fundamental Maxwell’s equations in the spin domain [148]. This provides insight into circuit optimization and makes it possible to discuss the experimental phenomena of the ASL device.

#### 1.4.2 ASL based circuit design method

At the circuit level, a circuit design methodology is developed. Given the parameters, the circuit specifications and the constraints, the circuit is synthesized, implemented and laid out with optimized performance. The synthesis method defines the majority functions of the circuit: the “truth table” method in [154], or the “AOI replacement” method in [158]. Once the majority functions are determined, the circuit topology that yields the minimum possible area is found by exploring all possible layout topologies. Based on the chosen topology, the MTJ sizes, the channel lengths and the injection currents are explored to implement the circuit and optimize the performance. Based on this methodology, combinational circuits are implemented and benchmarked; they form a circuit library for system design and evaluation. It is worth mentioning that, for integrated circuit design, the gate interconnection distribution and the inserted buffers are considered in order to evaluate the performance more precisely.

#### 1.4.3 System design & evaluation method

The design and evaluation of ASL devices at the system level are still in their infancy, and there is no suitable methodology for ASL-based system design. In this thesis, we use a cell-library approach to evaluate the ASL-based system. The system functionality is realized by substituting the basic ASL-based combinational circuits; the gate interconnection distribution and the inserted buffer count are then calculated. Knowing the numbers and types of the different basic circuits, the system performance can be calculated and optimized with different parameters of the basic circuits. Moreover, for system optimization, we start from two notions familiar from CMOS-based systems: pipelining and reconfigurability. A possible pipelining method is developed for the ASL-based system by adding registers (MTJs) as latches between stages, as in a CMOS-based system. Reconfigurability is an inherent property of the ASL device: by modifying the values and the polarities of the injected currents, the function of an ASL-based circuit can be changed.
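The reconfigurability described above can be illustrated at a purely Boolean level. The following is a minimal sketch, not the thesis model: it assumes a three-input majority function in which one input acts as a fixed control, and it represents the polarity of a signal by a sign (+1 or -1). The function names (`majority`, `configurable_gate`) are hypothetical.

```python
from itertools import product


def majority(*spins):
    """Return +1 or -1 according to the sign of the summed spin inputs."""
    total = sum(spins)
    if total == 0:
        raise ValueError("tie: use an odd number of equally weighted inputs")
    return 1 if total > 0 else -1


def to_spin(bit):
    """Map a logic bit (0/1) to a spin sign (-1/+1)."""
    return 1 if bit else -1


def to_bit(spin):
    """Map a spin sign (-1/+1) back to a logic bit (0/1)."""
    return 1 if spin > 0 else 0


def configurable_gate(a, b, control_bit, invert=False):
    """Two-input gate built from a three-input majority function.

    control_bit = 0 yields AND, control_bit = 1 yields OR.
    invert=True stands for a reversed output polarity (NAND / NOR).
    """
    out = majority(to_spin(a), to_spin(b), to_spin(control_bit))
    if invert:
        out = -out
    return to_bit(out)


if __name__ == "__main__":
    print("a b AND OR NAND NOR")
    for a, b in product((0, 1), repeat=2):
        print(a, b,
              configurable_gate(a, b, 0),
              configurable_gate(a, b, 1),
              configurable_gate(a, b, 0, invert=True),
              configurable_gate(a, b, 1, invert=True))
```

In this abstraction, the same structure computes four different functions depending only on the control value and the output polarity; how these map to physical injection currents is left to the compact model.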

### 1.5 Organization of the Thesis

The present thesis is divided into six chapters as follows.

This chapter presented the background, motivation, objectives and methods, and our contributions.

Chapter 2 reviews the state-of-the-art of MTJs and ASL devices. The basic principle, milestones in the development and the circuits/applications related to our work will be introduced.

Chapter 3 is the modeling part of our work. The physical and electrical models of MTJs and the ASL device are presented. Moreover, the dependence of the performance criteria on different device parameters is derived and simulated, which helps ASL-based optimization. The developed compact model is validated by comparison with experiments on different materials and structures.

In Chapter 4, we present the ASL-based circuit design. A methodology for ASL-based circuit design is developed, considering the current diffusion problems. Basic circuits, including Inverter/Buffer, AND/OR (NAND/NOR), Adder, Multiplexer (MUX) and Multiplier, and some combinational circuits are implemented and analysed based on this methodology. These circuits are also benchmarked, and the results are used for system evaluation.

Chapter 5 focuses on system-level design and evaluation. The system performance is evaluated with the cell-library approach, based on the benchmarked circuit library of Chapter 4. Three high-level computing/system circuits: convolution circuit, and i7 system, are designed and evaluated as examples. Moreover, we also initiate research on the pipelining method and the reconfigurable properties of ASL-based circuits/systems, which are essential for the performance optimization of a system.

Chapter 6 concludes the thesis and presents some perspectives.
Chapter 2
State-of-the-art

2.1 MTJs

2.1.1 Structure and working principle

The basic structure of an MTJ is shown in Fig. 2.1. It is composed of an insulating barrier sandwiched between two Ferromagnetic (FM) layers. The insulating barrier can be CuO [159], CoO [160], ZnO [161], NiO [162,163], TiO₂ [164], MgO [21,165,166], Al₂O₃ [167,168], SiO₂, or manganites [169,170], and needs to be thin enough to allow electron tunneling. One FM layer is magnetically pinned and is called the pinned layer, whereas the other one is called the free layer; its magnetization can be switched by a magnetic field or by a sufficiently large current through the STT effect [171,172]. Depending on the relative magnetization orientations of these two FM layers, i.e. Parallel (P) or Anti-parallel (AP), an MTJ has two resistance states, R_P or R_AP with R_P < R_AP; this effect is named Tunnel MagnetoResistance (TMR). The relative change between these two resistances is the primary performance metric of an MTJ. It is usually named the TMR ratio and is defined as:

\[
TMR \text{ ratio} = \frac{R_{AP} - R_P}{R_P} = \frac{G_P - G_{AP}}{G_{AP}} \tag{2.1}
\]
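As a quick numerical check of Eq. 2.1, the sketch below evaluates the TMR ratio from resistances and from conductances and shows that both forms agree. The resistance values are hypothetical and chosen only for illustration.

```python
def tmr_ratio(r_p, r_ap):
    """TMR ratio of Eq. 2.1, expressed with the two resistances."""
    return (r_ap - r_p) / r_p


def tmr_ratio_from_conductance(g_p, g_ap):
    """TMR ratio of Eq. 2.1, expressed with the two conductances."""
    return (g_p - g_ap) / g_ap


if __name__ == "__main__":
    r_p, r_ap = 1000.0, 2500.0  # ohms, hypothetical values
    print(tmr_ratio(r_p, r_ap))
    print(tmr_ratio_from_conductance(1.0 / r_p, 1.0 / r_ap))
```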

2.1.2 MTJ fundamental and development

In this subsection, we present the fundamental effects of the MTJ, namely the TMR effect and the STT effect, and outline some crucial progress in their enhancement, including the scaling effect. Moreover, two types of Multi-Level (ML) MTJs are presented, which motivate research on neural networks.
Figure 2.1 – Two Magnetic Tunnel Junction (MTJ) states with different resistances based on Tunnel MagnetoResistance (TMR) effect: Parallel ($R_P$, state “0”) and Antiparallel ($R_{AP}$, state “1”); If the current flows from the free layer to the pinned layer and is larger than the critical current $I_{c0}$, the state will be switched to Parallel; on the contrary, the state will be switched to Anti-parallel.

2.1.2.1 TMR effect

The origin of TMR arises from the difference in the electronic density of states (DOS) at the Fermi level $E_F$ between spin-up $N_\uparrow$ and spin-down $N_\downarrow$ electrons. The tunnel conductance is proportional to the product of DOS of the two FMs with same spin orientation, and is given by $N_{1\uparrow}N_{2\uparrow} + N_{1\downarrow}N_{2\downarrow}$.

An intuitive figure of tunneling process explained above is shown in Fig. 2.2. As shown in this figure, an electron tunnels to the spin subband of the same spin orientation, i.e. spin-up to spin-up and spin-down to spin-down. A change from the parallel configuration (2.2(a)) to the antiparallel configuration (2.2(b)) of the magnetizations of two FM layers results in an exchange of the spin subband, causing a corresponding change in resistance/conductance and thus giving rise to TMR ratio.

Figure 2.2 – A schematic of tunneling process of MTJ, electron spin orientation is preserved while traveling from one FM layer to another. (a) Parallel configuration; (b) Anti-parallel configuration.

To calculate the TMR ratio, the resistance/conductance of parallel and anti-parallel states need to be calculated based on Eq. 2.1.

The MTJ resistance/conductance depends on the relative magnetic orientations of two FM layers. Supposing that the angle between these two magnetization orientations is $\theta$, the conductance of MTJ is given by [173]:

$$G(\theta, T, V) = G_T(T, V)(1 + P_1P_2\cos\theta) + G_{ie}(T) \tag{2.2}$$

where $G_T$ is the tunnel conductance [12, 174–176], which depends on the bias voltage $V$ of the junction and on its temperature $T$; $G_{AP}$ and $G_P$ correspond to $\theta = 180^\circ$ and $\theta = 0^\circ$. $P_1$ and $P_2$ are the polarizations of the first and second FM layers, defined as:

$$P = \frac{N_\uparrow - N_\downarrow}{N_\uparrow + N_\downarrow} \tag{2.3}$$
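To make the link between Eqs. 2.1, 2.2 and 2.3 concrete, the sketch below evaluates the angular conductance at fixed temperature and bias and derives the TMR ratio from the two polarizations. It assumes the inelastic term $G_{ie}$ is neglected when computing the TMR, in which case Eq. 2.2 reduces to $2P_1P_2/(1 - P_1P_2)$; the function names and numerical values are illustrative only.

```python
import math


def polarization(n_up, n_down):
    """Spin polarization of Eq. 2.3 from the spin-up / spin-down DOS."""
    return (n_up - n_down) / (n_up + n_down)


def mtj_conductance(theta, g_t, p1, p2, g_ie=0.0):
    """Conductance of Eq. 2.2 at fixed T and V; theta is in radians."""
    return g_t * (1.0 + p1 * p2 * math.cos(theta)) + g_ie


def tmr_from_polarizations(p1, p2):
    """TMR ratio of Eq. 2.1 using Eq. 2.2 with G_ie neglected."""
    g_p = mtj_conductance(0.0, 1.0, p1, p2)
    g_ap = mtj_conductance(math.pi, 1.0, p1, p2)
    return (g_p - g_ap) / g_ap


if __name__ == "__main__":
    p = polarization(3.0, 1.0)  # hypothetical DOS values, gives P = 0.5
    print(p, tmr_from_polarizations(p, p))
```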

The dependence of $G_T$ upon the temperature $T$ is given by Stratton model [174]:

$$G_T(T, V) = G_T(0, V) \frac{\lambda T}{\sin(\lambda T)} + G_{ie}(T) \tag{2.4}$$

where the constant $\lambda$ is given by $\lambda = (\pi t_{ox} k/\delta t_{ox})$, $t_{ox}$ is the oxide thickness, $k$ the Boltzmann constant, $m_e$ the electron mass and $e$ its charge, $G_{ie}(T)$ is a second term of inelastic conductance to describe the thermal variations in the conductance and is given by $G_{ie}(T) = \tau_n T^\beta$, $\tau_n$ is a material dependent constant and $\beta$ depends on the number of states occupied by the electrons when traversing the tunnel barrier. For a second order system, $\beta = \frac{4}{3}$.

The voltage dependence of the conductance is given by Brinkman model [175] and Simmons model [177]:

$$G_T(0, V) = G_T(0, 0)(1 - 2\beta V + 3\delta V^2)$$

$$G_T(0, 0) = k_0 k_1 A \frac{\sqrt{\phi}}{2t_{ox}} e^{-k_1 t_{ox} \sqrt{\phi}} \tag{2.5}$$

where $G_T(0, V)$ is the tunnel conductance at 0 K and $G_T(0, 0)$ is the tunnel conductance at 0 V, 0 K. $\beta$ is given by $e \sqrt{2m_e t_{ox} d\phi}/(24h\phi^{3/2})$, $\delta$ is given by $e^2 m_e^{1/2}/(12h\phi)$, $k_0 = e^2/(2\pi h)$, $k_1 = 4\pi \sqrt{2m_e e}/h$, $\phi$ is the height of the tunnel barrier, $d\phi$ the barrier asymmetry and $A$ the surface of the junction.
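The temperature and bias factors above can be explored numerically. The sketch below covers only the elastic part, i.e. the factor $\lambda T / \sin(\lambda T)$ of Eq. 2.4 and the polynomial $1 - 2\beta V + 3\delta V^2$ of Eq. 2.5; the inelastic term $G_{ie}(T)$ is left out. The coefficients are passed in as plain numbers (named `lam`, `beta_v` and `delta_v` here, hypothetical names), so no material constants are assumed.

```python
import math


def thermal_factor(lam, temperature):
    """lambda*T / sin(lambda*T) factor of Eq. 2.4; tends to 1 as T -> 0.

    Meaningful for 0 <= lambda*T < pi.
    """
    x = lam * temperature
    if x == 0.0:
        return 1.0
    return x / math.sin(x)


def voltage_factor(beta_v, delta_v, voltage):
    """1 - 2*beta*V + 3*delta*V**2 factor of Eq. 2.5."""
    return 1.0 - 2.0 * beta_v * voltage + 3.0 * delta_v * voltage ** 2


def elastic_conductance(g00, lam, beta_v, delta_v, temperature, voltage):
    """G_T(0, 0) scaled by the bias and temperature factors."""
    return (g00 * voltage_factor(beta_v, delta_v, voltage)
            * thermal_factor(lam, temperature))


if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only.
    for v in (0.0, 0.2, 0.4):
        print(v, elastic_conductance(1e-3, 1e-3, 0.0, 0.1, 300.0, v))
```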

The TMR effect was first measured by Jullière in 1975 [173], with a maximum value of 14% in Fe-Ge-Co junctions at 4 K. The observed values, however, were rather small and could not be applied in practice. The above equations show that the TMR depends on the DOS, i.e. on the spin polarization coefficient $P$. This means that realistic electronic structures and disorder at the interfaces have a large effect on TMR. Hence, over the following decades, research on these factors flourished in pursuit of a larger TMR. In 1995, a large TMR of 18% at room temperature was reported for Fe/Al$_2$O$_3$/Fe [178]. Research on aluminum oxide barriers then yielded a steady increase in TMR ratio through improved spin polarization and fabrication; in 2004, a TMR of 70.4% at room temperature was reported for a CoFeB/AlO$_x$/CoFeB junction [179]. In 2001, a series of theoretical calculations predicted high TMR ratios for Fe/MgO/Fe MTJs [180] [181], where the tunnel barrier is a crystalline MgO layer with (001) texture, and MgO attracted new research attention. So far, a TMR ratio as high as 604% has been reported in a CoFeB/MgO/CoFeB MTJ [182]. There is no doubt that research on improving the TMR ratio will continue by exploring different materials/structures, such as half-metals [43,183] with extremely high spin polarization.

### 2.1.2.2 STT effect

As mentioned in subsection 2.1.1, the magnetization orientation of the FM free layer can be switched by applying a magnetic field or a current through the STT effect. As shown in Fig. 2.3, when a current passes through the device, electrons are first polarized along the magnetization orientation of FM1 and then injected into FM2 through NM. The spins of the injected electrons interact with those in FM2 by exchange interaction and exert a torque. If the torque is large enough, the magnetization orientation in FM2 is reversed. The dynamics of the magnetic switching can be described by the Landau-Lifshitz-Gilbert (LLG) equation [184], which includes an STT term:
\[
\frac{d\vec{M}}{dt} = \underbrace{-\gamma\mu_0\,\vec{M}\times\vec{H}_{\text{eff}}}_{\text{Effective field torque}} + \underbrace{\frac{\alpha}{M_s}\,\vec{M}\times\frac{d\vec{M}}{dt}}_{\text{Gilbert damping}} + \underbrace{T_{\text{CIP}}\ \text{or}\ T_{\text{CPP}}}_{\text{STT}} \tag{2.6}
\]

\[T_{\text{CIP}} = -(\mu \cdot \nabla)M + \frac{\beta}{M_s} M \times [(\mu \cdot \nabla)M] \]

\[T_{\text{CPP}} = g(\theta) \frac{\alpha I}{M_s} M \times (p \times M) \]

where the first term is the effective field torque, with $\vec{H}_{\text{eff}}$ the effective field; the second is the Gilbert damping; and the third is the spin-transfer torque, which differs depending on the geometry: Current In Plane (CIP) or Current Perpendicular to Plane (CPP).

Figure 2.3 – Schematic illustration of Spin Transfer Torque (STT) effect in a magnetic nanopillar consisting of two Ferromagnetic (FM) layers (FM1/2) sandwiching a non-magnetic layer (NM).

Compared to CIP, CPP has several advantages and has been the main research focus for MTJs in recent years. First, the magnetic anisotropy of the MTJ is directly related to the thermal stability and data retention. In-plane anisotropy mainly originates from the shape anisotropy, so an elongated cell surface and a small thickness are required to provide enough thermal stability. As the MTJ size shrinks, the in-plane-anisotropy MTJ has difficulty maintaining satisfactory thermal stability. The perpendicular-anisotropy MTJ does not require an elongated shape and can thus overcome this issue. Second, the perpendicular MTJ is more suitable for STT switching than the in-plane MTJ, as explained below. The critical current \( I_{c0} \) for STT switching can be derived from the LLG equation.

For the in-plane MTJ, it is expressed as:

\[ I_{c0\parallel} \approx \alpha \frac{\gamma \mu_0 e}{\mu_B P} M_s V_F \left[ H_{k\parallel} + \frac{M_s}{2} \right] \tag{2.7} \]

where \( \mu_B \) is the Bohr magneton, \( V_F \) is the free layer volume, and \( H_{k\parallel} \) is the uniaxial in-plane anisotropy field. The energy barrier of thermal stability (E) of the in-plane MTJ is given by:

\[ E_{\parallel} = \frac{\mu_0 M_s H_{k\parallel} V_F}{2} \tag{2.8} \]

The comparison between Eqs. 2.7 and 2.8 indicates that the STT must overcome an additional field \( M_s/2 \), which makes no contribution to the thermal stability. In a perpendicular MTJ, by contrast, the critical current is proportional to the thermal stability:

\[ I_{c0\perp} \approx \alpha \frac{\gamma \mu_0 e}{\mu_B P} M_s V_F H_{k\perp} = 2\alpha \frac{\gamma e}{\mu_B P} E_{\perp} \tag{2.9} \]

Therefore, perpendicular MTJ requires lower write current given the same thermal stability, thus lower power consumption.
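The comparison can be made quantitative with a short numerical sketch of the two critical-current expressions and of the energy barrier. The device parameters below (damping, polarization, saturation magnetization, anisotropy field, free-layer volume) are hypothetical and chosen only to exercise the formulas; for equal $M_s$, $V_F$ and $H_k$, the ratio of the two currents reduces to $1 + M_s/(2H_k)$.

```python
import math

MU_0 = 4e-7 * math.pi        # vacuum permeability, T*m/A
E_CHARGE = 1.602176634e-19   # elementary charge, C
MU_B = 9.2740100783e-24      # Bohr magneton, J/T
GAMMA = 1.76085963e11        # electron gyromagnetic ratio, rad/(s*T)


def prefactor(alpha, pol):
    """Common factor alpha * gamma * mu_0 * e / (mu_B * P)."""
    return alpha * GAMMA * MU_0 * E_CHARGE / (MU_B * pol)


def ic0_in_plane(alpha, pol, m_s, volume, h_k):
    """In-plane critical current: the extra M_s / 2 term adds to H_k."""
    return prefactor(alpha, pol) * m_s * volume * (h_k + m_s / 2.0)


def ic0_perpendicular(alpha, pol, m_s, volume, h_k):
    """Perpendicular critical current: proportional to H_k only."""
    return prefactor(alpha, pol) * m_s * volume * h_k


def energy_barrier(m_s, volume, h_k):
    """Thermal stability barrier E = mu_0 * M_s * H_k * V_F / 2."""
    return MU_0 * m_s * h_k * volume / 2.0


if __name__ == "__main__":
    # Hypothetical device: 40 nm diameter, 1 nm thick free layer.
    alpha, pol = 0.01, 0.5
    m_s, h_k = 1.0e6, 1.0e5  # A/m
    volume = math.pi * (20e-9) ** 2 * 1e-9  # m^3
    i_par = ic0_in_plane(alpha, pol, m_s, volume, h_k)
    i_perp = ic0_perpendicular(alpha, pol, m_s, volume, h_k)
    print("I_c0 in-plane      [A]:", i_par)
    print("I_c0 perpendicular [A]:", i_perp)
    print("ratio:", i_par / i_perp)
```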
The perpendicular MTJ was experimentally demonstrated for the first time in 2002, with a TbFeCo/CoFe/Al₂O₃/CoFe/GdFeCo structure and a TMR ratio of 55% [185]. In the past decade, much effort has been devoted to achieving higher TMR, lower power consumption and higher density, through both new materials research and dimension scaling [13, 15, 16, 32, 33, 65, 186–190]. A high TMR ratio (120%), a small area and a low write current (49 μA) can now be achieved [22].

2.1.2.3 Multi-level MTJs

To improve the storage density and scalability of MTJs in STT-RAM, the ML MTJ was proposed [10, 11]; it can store multi-bit data per cell. Two types of ML MTJ structures have been proposed: parallel and series, as shown in Fig. 2.4. A 2-bit parallel ML MTJ is shown in Fig. 2.4(a). It is composed of a single MTJ whose free layer has two domains. These two domains switch at different spin-polarized currents and form different resistance levels with the reference layer. The 2-bit series ML MTJ shown in Fig. 2.4(b) is composed of two vertically stacked single MTJs that have different TMR ratios. Multiple resistance levels can be achieved with different magnetization configurations of the two MTJs.
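For the series structure, the four resistance levels can be enumerated directly. The sketch below assumes that the resistances of the two stacked MTJs simply add in series and that each antiparallel resistance follows from Eq. 2.1 as $R_{AP} = R_P(1 + TMR)$; the numerical values are hypothetical.

```python
from itertools import product


def series_levels(r_p1, tmr1, r_p2, tmr2):
    """Return the total resistance for each (state1, state2) configuration."""
    r_ap1 = r_p1 * (1.0 + tmr1)
    r_ap2 = r_p2 * (1.0 + tmr2)
    levels = {}
    for s1, s2 in product(("P", "AP"), repeat=2):
        r1 = r_p1 if s1 == "P" else r_ap1
        r2 = r_p2 if s2 == "P" else r_ap2
        levels[(s1, s2)] = r1 + r2
    return levels


if __name__ == "__main__":
    # Hypothetical junctions with different R_P and TMR ratios.
    for states, resistance in series_levels(1000.0, 1.0, 1500.0, 2.0).items():
        print(states, resistance)
```

The four levels are distinct only if the two junctions have different resistance swings $R_{AP} - R_P$, which is one way to read the requirement that the stacked MTJs differ.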

Figure 2.4 – Two different structures of 2-bit Multi-level MTJ (ML MTJ). (a) parallel ML MTJ; (b) series ML MTJ.

So far, both structures have been designed and fabricated, and spin transfer switching has been demonstrated [11, 191, 192] by using different currents according to the access scheme [10, 193]. The impacts of process variations and thermal fluctuations on performance and reliability have also been analyzed [10], which demonstrates their feasibility. ML MTJs are now used as processor caches [194, 195] or for fast local checkpointing in computing systems [196].

Another possible application domain of ML MTJs is neuromorphic computing. Theoretically, an arbitrary number of bits can be stored in one ML MTJ cell if multiple MTJs are stacked vertically and the switching currents are carefully chosen. This suggests its use as a synapse in a deep neural network, combined with a learning diagram with integer weights. The compact modeling developed in [27] will further facilitate this possible application.

2.1.3 Memory and circuit applications

Due to their non-volatility, MTJs provide a new route to next-generation memories and logic circuits. So far, MTJ-based memory, i.e. MRAM, has been widely explored and produced commercially. Moreover, its possible application in neuromorphic computing has also been explored, based on its memristive property and two-state representation. This subsection reviews these applications.
2.1.3.1 Memory

The discovery of TMR at room temperature, with increasingly high values, prompted the use of MTJs in memory applications. One of the first working MRAMs, using a magnetic field for writing, was developed at IBM in 2000 [197]. The development of spin transfer switching then represented a huge step forward for MRAM. A schematic of an STT-RAM cell in 1T1R (one transistor and one resistor) form is presented in Fig. 2.5. The word line is connected to the gate of a transistor, which is used to select the MTJ to be written or read. Writing is done through spin transfer switching by applying either a positive or a negative voltage pulse between the source line and the bit line. Reading is done by applying a weaker voltage to the bit line to sense the resistance of the MTJ.

Figure 2.5 – Schematic of 1T1R memory cell [198].

Recently, a 64 Mbit SPI/DDR4 chip was fabricated by Avalanche Technology using 55 nm CMOS technology, achieving a raw read Bit Error Rate (BER) below $10^{-7}$ [64]. Many studies on failure analysis [199, 200], stability [200], and power and delay [14, 201] have been carried out.

2.1.3.2 Logic circuits

Another possible application of MTJs is in logic circuits. The most mature approach is the hybrid MRAM/CMOS circuit [202], which contains both MTJs and CMOS transistors. The logic functions are still provided by CMOS transistors, but the MTJs provide enhanced functionalities such as instant on/off or enhanced radiation hardness. The hybrid circuits realized so far include the flip-flop [203, 204], full-adder [20, 205], sensing amplifier [66] and magnetic FPGA [206, 207]. The other approach for logic circuits uses magnetic interactions between magnetic nanostructures. Full logic functions (AND, OR, NAND, NOR, XOR, and XNOR) can be realized in this way [28, 208, 209].

2.1.3.3 Neural network

Neuromorphic computing has emerged as a future computing architecture because of its potential for low power consumption. CMOS-based neural networks have been studied. However, given the scaling limit of CMOS technology, the complexity of neural networks prompts research on nanodevices. The MTJ is one of the candidate devices, since it can act as both synapse and neuron in a neural network.

The neuron property originates from the thresholding operation during the switching of MTJ states [210]. Fig. 2.7 shows a crossbar neuromorphic architecture consisting of programmable resistive synapses and the MTJ neuron [29]. The synapses generate the excitatory/inhibitory charge current, inputting to the MTJ neuron and switching the neuron states.
Figure 2.6 – Schematic of Spin-MTJ based Non-Volatile Flip-Flop [203].

Figure 2.7 – Neuromorphic architecture based on “STT-Neuron” [29].

An MTJ can act as a synapse in three ways: through its intrinsic non-volatile memory property, representing integral weights, or through its stochastic [211] and memristive [73, 98, 212, 213] properties, representing continuous weights. To represent integral synaptic weights, a single MTJ can be used to represent a binary (0, 1) [80, 82, 214, 215] or ternary (-1, 0, +1) [216] number by adjusting the MTJ states and the input current values. The recent discovery of the ML MTJ makes it possible to represent arbitrary integers, and even floating-point numbers, if the function of each MTJ is well defined. The single MTJ with intrinsic stochastic property was proposed in [211] to implement learning-capable synapses, giving insight into a new way of using memory nanodevices. The most popular way of using an MTJ as a synapse is as a memristor, following the discovery of the simultaneous occurrence of TMR and Resistive Switching (RS), a displacement of oxygen vacancies located at the interface. By applying several voltage pulses, the resistance of the AP or P state is switched periodically between a High Resistance State (HRS) and a Low Resistance State (LRS). The resistance depends on the flux \( \Phi \), which is defined as \( \Phi(t) = \sum_{i=0}^{n} v_i t_i \), with \( v_i \) and \( t_i \) the voltage and duration of the \( i \)th pulse, respectively. Many oxides can exhibit memristive switching behavior, including Magnesia (MgO) [213, 217], Barium Titanate (BTO) [218] and Tantalum Oxide (Ta_{x}O_{2}) [219]. Research on new materials is ongoing to increase the resistance change and to improve reliability and robustness.
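The flux definition above is a running sum over the applied pulses. The sketch below computes it after each pulse for a hypothetical pulse train; how the resistance depends on the flux is material-specific and is not modeled here.

```python
from itertools import accumulate


def flux_after_each_pulse(pulses):
    """Cumulative flux sum(v_i * t_i) after each (voltage, duration) pulse."""
    return list(accumulate(v * t for v, t in pulses))


if __name__ == "__main__":
    # Hypothetical pulse train: (volts, seconds).
    train = [(0.5, 1e-6), (0.5, 1e-6), (-0.3, 2e-6)]
    for index, phi in enumerate(flux_after_each_pulse(train)):
        print(index, phi)
```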
2.2 All Spin Logic (ASL) Device

2.2.1 Structure and working principle

The ASL device is fundamentally based on spin injection/detection. To perform electrical spin injection into a spin channel, two types of measurement, known as “non-local” [101, 102, 117, 220, 221] and “local” [220, 222, 223], are commonly used, as shown in Fig. 2.8 (a) and (b). For the non-local measurement, a current source is applied between the electrodes E1 and E2, where E2 serves as the injector through which the charge current injects a spin current. After spin injection, the spin current in the channel underneath E2 diffuses in both directions: towards E1 (as a spin current accompanied by a charge current) and towards E3 (mostly as a pure spin current). The spin is then detected by measuring the voltage across E3 and E4, where E3 (FM) is the spin detector. This measurement is called non-local because the voltage probe lies outside the charge current loop; this geometry allows the voltage to detect the spin density at E3 arising from the pure spin current, i.e. from the diffusion of spin-polarized electrons. The measured voltage $V_{NL}$ is positive or negative depending on whether the magnetization configurations of E2 and E3 are parallel or antiparallel to each other. The difference between these two voltages is the non-local spin signal, and it is often converted to units of resistance by dividing by the injection current $I_{inj}$, as $\Delta R_{NL} = (V_{NL}^P - V_{NL}^{AP})/I_{inj}$.
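As a small worked example of the last relation, the sketch below converts two measured non-local voltages into the non-local spin signal. The voltage and current values are hypothetical.

```python
def nonlocal_spin_signal(v_parallel, v_antiparallel, i_inj):
    """Delta R_NL = (V_NL^P - V_NL^AP) / I_inj, in ohms."""
    if i_inj == 0:
        raise ValueError("injection current must be non-zero")
    return (v_parallel - v_antiparallel) / i_inj


if __name__ == "__main__":
    # Hypothetical measurement: +/- 2 microvolts at 1 mA injection.
    print(nonlocal_spin_signal(2e-6, -2e-6, 1e-3))
```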

Figure 2.8 – Spin valve structure. (a) Non-local spin valve; (b) local spin valve; (c) schematic of All Spin Logic with perpendicular MTJs.

The local measurement directly measures the standard two-terminal resistance across two FM electrodes (E1 and E2) as shown in Fig. 2.8(b). Spin-polarized electrons are injected from one electrode, transported across the channel, and detected by the second electrode. The difference in the resistance between the parallel and antiparallel magnetization alignments of the two electrodes is the local magnetoresistance which is the signal of spin transport.

The ASL device is a non-local spin valve device that can perform logic functions when several such devices are combined. There are several types of ASL devices: ASL with No Clock (ASLNC), ASL with Clock (ASLC), ASL with Clock with Biaxial anisotropy (ASLCB) [58] and Graphene based All Spin Logic Gate (G-ASLG) [152]. Fig. 2.8 (c) shows a typical ASL device. It is composed of two MTJs, which serve as memories and as injector/detector, and one channel for spin transport. Fig. 2.9 shows the working flow of the ASL device. First, a voltage/current source $V_{supply}/I_{write}$ is applied to write the MTJs. When the writing process has finished, a charge current $I_{inj}$ is injected into the channel through the free layer of the MTJ injector and is polarized into a spin current, whose orientation depends on the magnetization orientation of the MTJ free layer and on the polarity of the injection current. The spin current in the channel diffuses in both directions: towards the MTJ detector and towards the input electrode. The spin current flowing into the MTJ detector switches the state of the detector if enough torque is applied, according to the STT effect. The switched state can be read by applying a read voltage/current $I_{read}$; it depends on the magnetization orientation of the injector free layer and on the polarity of the injected current.
2.2.2 ASL fundamental and development

2.2.2.1 Spin injection/detection and transport

To describe the spin injection/detection fundamental [148], we use the non-local structure in Fig. 2.10. Two ferromagnetic magnets are on top of a nonmagnetic conductor, separated by a distance of $L_N$. Spin is injected into $N$ from $F1$ and a part of spin current flows towards $F2$, which is indicated in Fig. 2.10(b).

In this case, we suppose that the spin currents in $F1$, $F2$ and $N$ are one-dimensional. The boundary conditions at infinity are:

$$\mu_{sN}(\pm\infty) = \mu_{sF1}(-\infty) = \mu_{sF2}(\infty) = 0$$  \hspace{1cm} (2.10)

Figure 2.10 – (a) Non-local geometry for spin injection and detection. (b) Cross view of the non-local geometry.
The charge and spin currents are presented in the spin domain and expressed as:
\[
\begin{aligned}
j &= j_\uparrow + j_\downarrow = \sigma \nabla \mu + \sigma_s \nabla \mu_s \\
j_s &= j_\uparrow - j_\downarrow = \sigma_s \nabla \mu + \sigma \nabla \mu_s
\end{aligned}
\tag{2.11}
\]
where \(j_{\uparrow/\downarrow}\) is the electric current carried by spin up/down electrons, \(\mu / \mu_s\) is the charge/spin quasichemical potential, and \(\sigma / \sigma_s\) is the charge/spin conductivity, expressed as:
\[
\begin{aligned}
\sigma &= \sigma_\uparrow + \sigma_\downarrow \\
\sigma_s &= \sigma_\uparrow - \sigma_\downarrow
\end{aligned}
\tag{2.12}
\]

Based on Eqs. 2.11 and 2.12, the spin currents at the contact and through the spin-polarizing contact C1 are:
\[
\begin{aligned}
j_{sF1}(0) &= P_{\sigma F1}\, j + \frac{1}{R_{F1}}\mu_{sF1}(0) \\
j_{sc1} &= P_{\Sigma 1}\, j + \frac{1}{R_{C1}}[\mu_{sN}(0) - \mu_{sF1}(0)]
\end{aligned}
\tag{2.13}
\]
where \(P_{\sigma F1/\Sigma 1}\) is the conductivity spin polarization in F1/C1, expressed as \(P_\sigma = \frac{\sigma_\uparrow - \sigma_\downarrow}{\sigma_\uparrow + \sigma_\downarrow} = \frac{\sigma_s}{\sigma}\), and \(R_{F1/C1}\) is the spin resistance of F1/C1 and expressed as \(\frac{\sigma}{\sigma_\uparrow}\).

The spin current at \(x = 0\) diffuses in two directions and is expressed as:
\[
\begin{aligned}
j_{sN}(0^+) &= \frac{1}{R_N}\left[-\mu_{sN}(0)\coth(L_N / \lambda_{sN}) + \frac{\mu_{sN}(L_N)}{\sinh(L_N / \lambda_{sN})}\right] \\
j_{sN}(0^-) &= \frac{1}{R_N}\mu_{sN}(0)
\end{aligned}
\tag{2.14}
\]
where \(R_N\) is the spin resistance of the channel, \(L_N\) is the channel length, \(\mu_{sN}(0)\) and \(\mu_{sN}(L_N)\) are the spin quasichemical potentials at \(x = 0\) and \(x = L_N\), and \(\lambda_{sN}\) is the spin diffusion length of the channel.

The continuity of the spin current in the injector gives:
\[
j_{sN}(0^+) = j_{sN}(0^-) + j_{sc1} = j_{sN}(0^-) + j_{sF1}(0)
\tag{2.15}
\]

Applying the same procedure to the detector, the spin currents at the contact and in the channel are expressed as:
\[
\begin{aligned}
j_{sF2}(0) &= -\frac{1}{R_{F2}}\mu_{sF2}(0) \\
j_{sc2} &= \frac{1}{R_{C2}}[\mu_{sF2}(0) - \mu_{sN}(L_N)] \\
j_{sN}(L_N^-) &= \frac{1}{R_N}\left[-\frac{\mu_{sN}(0)}{\sinh(L_N / \lambda_{sN})} + \mu_{sN}(L_N)\coth(L_N / \lambda_{sN})\right] \\
j_{sN}(L_N^+) &= -\frac{1}{R_N}\mu_{sN}(L_N)
\end{aligned}
\tag{2.16}
\]

The continuity of spin currents in the detector is:
\[
j_{sN}(L_N^-) = j_{sN}(L_N^+) + j_{sc2} = j_{sN}(L_N^+) + j_{sF2}(0)
\tag{2.17}
\]

The detected voltage \(V_{det}\) is expressed as:
\[
V_{det} = \mu_N(\infty) - \mu_{F2}(\infty) - (R_{C2}P_{\Sigma 2} + R_{F2}P_{\sigma 2})\frac{\mu_{sN}(L_N)}{R_{C2} + R_{F2}}
\tag{2.18}
\]
where \(\mu_{sN}(L_N)\) can be extracted from Eqs. 2.15 and 2.17.
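Extracting \(\mu_{sN}(L_N)\) amounts to solving a small linear system. The sketch below is one way to do it numerically, under the following assumptions: the injector and detector equations are taken with the polarization terms written as \(P_{\sigma F1} j\) and \(P_{\Sigma 1} j\), the unknowns are ordered as \(\mu_{sF1}(0)\), \(\mu_{sN}(0)\), \(\mu_{sN}(L_N)\), \(\mu_{sF2}(0)\), and all quantities are in normalized units. The function and parameter names are hypothetical and are not taken from the Verilog-A model.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def spin_potentials(j, length, lam, r_n, r_f1, r_c1, r_f2, r_c2, p_f1, p_c1):
    """Return (mu_sF1(0), mu_sN(0), mu_sN(L_N), mu_sF2(0))."""
    coth = 1.0 / math.tanh(length / lam)
    sinh = math.sinh(length / lam)
    matrix = [
        # j_sF1(0) = j_sc1 at the injector
        [1 / r_f1 + 1 / r_c1, -1 / r_c1, 0.0, 0.0],
        # j_sN(0+) = j_sN(0-) + j_sc1
        [1 / r_c1, -(coth + 1) / r_n - 1 / r_c1, 1 / (r_n * sinh), 0.0],
        # j_sc2 = j_sF2(0) at the detector
        [0.0, 0.0, -1 / r_c2, 1 / r_c2 + 1 / r_f2],
        # j_sN(L_N-) = j_sN(L_N+) + j_sc2
        [0.0, -1 / (r_n * sinh), (coth + 1) / r_n + 1 / r_c2, -1 / r_c2],
    ]
    rhs = [(p_c1 - p_f1) * j, p_c1 * j, 0.0, 0.0]
    return tuple(solve_linear(matrix, rhs))


if __name__ == "__main__":
    # Normalized, hypothetical parameters; the channel length is swept.
    for length in (0.5, 1.0, 2.0):
        mu = spin_potentials(-1.0, length, 1.0, 1.0, 0.1, 10.0, 0.1, 10.0,
                             0.4, 0.6)
        print(length, mu[2])
```

With these parameters, the detector-side potential shrinks as the channel gets longer relative to the spin diffusion length, which is the trend the discussion of channel materials relies on.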
The detected voltage $V_{\text{det}}$ is in general positive for parallel and negative for antiparallel magnetization orientations. Often what is detected is the nonlocal resistance $R_{\text{nl}}$, which is expressed as:

$$R_{\text{nl}} = V_{\text{det}} / j$$  \hspace{1cm} (2.19)

or the corresponding difference in the nonlocal resistance for parallel and antiparallel orientations of $F_1$ and $F_2$:

$$\Delta R_{\text{nl}} = R_{\text{nl}}^\uparrow - R_{\text{nl}}^\downarrow$$  \hspace{1cm} (2.20)

### 2.2.2.2 ASL development

In 1985, Johnson and Silsbee [101] first reported that nonequilibrium spin injected from a ferromagnet diffuses into an Al film over a spin diffusion length of the order of 1 $\mu$m. During the following two decades, researchers mainly devoted themselves to finding an efficient way to inject and detect the spin signal and to fabricating spin valve devices. It was not until 2010 [46] that the application of the spin valve to logic was proposed by Behzad Behin-Aein, with a device named the All Spin Logic device. In this proposal, in-plane MTJs are used as the injector/detector. In the previous section, we explained that the perpendicular MTJ consumes less power. Because of this advantage, a structure with perpendicular MTJs was proposed in [152].

As explained in the previous subsection, the charge current is injected into the channel and polarized into a spin current, which switches the MTJ state if enough torque is applied. The value of the spin current flowing into the detector is therefore the most important criterion of an ASL device, and it is directly related to the spin injection efficiency and the detected non-local resistance. Consequently, to increase the detected spin current, the spin injection efficiency and the non-local resistance should be improved.

One way to improve the spin injection efficiency focuses on the materials, and can itself be divided into two directions: finding new materials and improving the material quality. Research on new materials concerns both the FM electrodes and the channel. The basic idea for the FM electrode material is to look for a material with high spin polarization, since it determines how many spins are polarized into the channel. Half-metallic materials [183] are an interesting topic since their spin polarization is nearly 100%. The materials for the channel, including metals (Cu, Mg) [54, 108, 114, 117, 131, 224–226], semiconductors (silicon) [48, 227] and new materials like graphene [111, 124–127, 134, 138, 152, 228–232], are also explored. Their spin diffusion lengths directly influence the spin detection current. Theoretically, a long spin diffusion length is better for spin transport and thus for the spin detection current. Graphene, whose spin diffusion length could be as long as 100 $\mu$m [233], has been studied extensively in recent years. Another line of materials research concerns how to improve the spin diffusion length of each material. The temperature and the extrinsic scattering by impurities, defects and boundaries are the two important factors involved. The temperature dependence of the spin relaxation process can be studied by means of nonlocal spin valve measurements. Based on such a study, the maximum spin diffusion length can be obtained at a specific temperature [122]. Extrinsic scattering reduces the spin diffusion length of the material and increases the delay and energy consumption. To reduce this scattering, advanced fabrication methods, proper surface manipulation (such as oxidation [108]), the use of a smaller junction area [123] and new structures (e.g. suspended graphene devices [133]) are feasible ways.

Another way to improve the spin injection efficiency focuses on overcoming the conductance mismatch problem between the ferromagnet and the channel. Because of the conductance mismatch, most of the spin polarization of the current in the FM relaxes at the interface, which makes it difficult to inject spins into the channel. This problem was first revealed by G. Schmidt [234] in 2000. In the same year, many groups began research on how to eliminate this problem and found that inserting a spin-polarized interface (e.g. a tunnel contact) between the FM and the channel can remedy it [46, 109, 114, 139, 235–237]. Besides the insertion of a tunnel barrier, another approach reported in [103] uses ultra-fast optical excitation instead of electrical spin injection. The amount of injected spin is then constrained by the number of excited electrons in the FM. With a femtosecond laser pump pulse, a population of excited electrons and holes is created in the FM and a spin polarization of 80% is obtained at the Ni-Si interface.

Beyond the injection efficiency enhancement of ASL, researchers have also made efforts towards low power consumption and high density. A 3-D ASL design consisting of multiple vertically stacked ASL layers [238] and the scaling down of the ASL device [239] can achieve effective power savings and area benefits. The design method and the clocking for 3-D stacking are presented in [238], and the scaling limits (e.g. dipole coupling between input and output) and material targets are presented in [51, 239, 240].

2.2.3 ASL modeling and benchmarking

For performance assessment and circuit analysis, a compact model of ASL describing the magnetization dynamics and the spin transport properties is necessary. Table 2.1 compares several ASL compact models. In 2011, the group of Supriyo Datta proposed spin-circuit based compact modeling [47, 57, 151, 241]. This model contains two components: a description of the magnetization dynamics and a circuit model for non-collinear spin transport, obtained by combining the well-established spin-diffusion model developed by Johnson-Silsbee [242] and Valet-Fert [243] with a conductance model pioneered by Brataas et al. [244]. The circuit model representation is shown in Fig. 2.11(a). Each element (FM, Contact, Channel) is represented by a $\pi$-network. Each block of the $\pi$-network is a $4 \times 4$ conductance matrix: one component for charge information and three for spin information corresponding to the x, y and z directions. In 2014, Philip Bonhomme et al. [54] proposed another modeling approach, using basic electrical circuit elements such as resistors, capacitors and current sources to create the circuit simulation environment, as shown in Fig. 2.11(b). These approaches allow the modeling and analysis of circuits based on the ASL device. However, on the MatLab platform, implementing a complex circuit requires large-scale matrices to be carefully established, which limits the application of this approach. [152] proposed a Verilog-A model which implements the ASL device as a single block based on pre-established spin injection/detection equations, which prevents design space exploration for optimization as well as hierarchical design. A scalable Verilog-A model is proposed in [153], yet it does not integrate important characteristics such as the spin diffusion delay and channel breakdown effects. Moreover, these models rely on the non-collinear magneto-electronic theory [244], which disregards interface spin-flip scattering and uses a finite-element formulation in which the current-voltage relation is only implicitly defined. In this thesis, we developed an ASL compact model which takes into account the STT/TMR effects, the spin injection/detection/accumulation effects, the spin diffusion delay and the channel breakdown effect. It is based on the fundamental Maxwell’s equations in the spin domain, which leads to an explicit definition of the current and voltage relations in circuits. This compact model is divided into several parts, which allows the independent design of the injector, detector, channel and contact devices. This allows cross-layer optimization of ASL-based circuits and eases the design of hierarchical circuits.

The benchmarking method for ASL-based circuits was proposed by Dmitri E. Nikonov and Ian A. Young in 2013 [156]. This method is based on the number of majority gates used in a circuit. The area is proportional to the number of majority gates used, and the energy $E$ is evaluated with the common formula $E = I^2 R t$. In this thesis, we use this method to evaluate the developed circuits and systems, also taking into account the number of buffers inserted because of the spin diffusion length limitation.

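
As a rough illustration of this gate-count based benchmarking, the sketch below estimates area and energy from the number of majority gates and inserted buffers. It reads the energy formula as current squared times resistance times switching time, and it assumes that a buffer costs as much as a majority gate. The function name, the per-device area and all numerical values are hypothetical placeholders, not values taken from [156].

```python
def benchmark(n_majority, n_buffers, current_a, resistance_ohm, delay_s, device_area_m2):
    """Gate-count based estimate: every majority gate or buffer counts as one device."""
    n_devices = n_majority + n_buffers
    # Energy of one switching event, read as E = I^2 * R * t (assumption).
    energy_per_device = current_a ** 2 * resistance_ohm * delay_s
    return {
        "area_m2": n_devices * device_area_m2,
        "energy_j": n_devices * energy_per_device,
    }


# Hypothetical circuit: 3 majority gates, 2 buffers, 100 uA, 1 kOhm, 1 ns per event.
result = benchmark(3, 2, 100e-6, 1e3, 1e-9, device_area_m2=1e-14)
print(result)
```

Counting the buffers explicitly makes the cost of a short spin diffusion length visible in both the area and the energy figures.
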
<table>
<thead>
<tr>
<th>Feature</th>
<th>[47, 57, 151, 241]</th>
<th>[54]</th>
<th>[153]</th>
<th>[152]</th>
<th>This work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Language/software</td>
<td>MatLab</td>
<td>Circuit simulator</td>
<td>Verilog-A</td>
<td>Verilog-A/Cadence</td>
<td>Verilog-A/Cadence</td>
</tr>
<tr>
<td>Scalability</td>
<td>low</td>
<td>low</td>
<td>high</td>
<td>×</td>
<td>high</td>
</tr>
<tr>
<td>Module</td>
<td>separated conductance matrix</td>
<td>separated electrical circuit element</td>
<td>separated conductance block</td>
<td>one block</td>
<td>separated conductance block</td>
</tr>
<tr>
<td>STT effect</td>
<td>coupled with LLG equation (in-plane)</td>
<td>coupled with LLG equation (perpendicular)</td>
<td>×</td>
<td>integrated the static and dynamic switching expressions (perpendicular)</td>
<td>integrated the static and dynamic switching expressions (perpendicular)</td>
</tr>
<tr>
<td>TMR effect</td>
<td>×</td>
<td>×</td>
<td>×</td>
<td>√</td>
<td>√</td>
</tr>
<tr>
<td>Channel breakdown</td>
<td>×</td>
<td>×</td>
<td>×</td>
<td>×</td>
<td>√</td>
</tr>
<tr>
<td>Channel diffusion time</td>
<td>×</td>
<td>×</td>
<td>×</td>
<td>×</td>
<td>√</td>
</tr>
<tr>
<td>Spin relaxation in FM</td>
<td>×</td>
<td>×</td>
<td>√</td>
<td>×</td>
<td>√</td>
</tr>
<tr>
<td>Fundamental theory</td>
<td>[244]</td>
<td>[244]</td>
<td>[244]</td>
<td>[244]</td>
<td>[148]</td>
</tr>
</tbody>
</table>

### 2.2.4 ASL circuit and system application

The ASL device is a promising candidate for future computing, since it integrates memory and logic into one device. Research on ASL applications currently focuses on logic circuit implementation and neural networks. In this subsection, we present these two applications in detail.

#### 2.2.4.1 ASL logic circuit

The ASL device was first proposed by Behtash Behin-Aein et al. in 2010 [46]. It is argued that the ASL device shows the five essential characteristics for logic applications: concatenability, non-linearity, feedback elimination, gain and a complete set of Boolean operations. Fig. 2.12 shows the Inverter/Buffer and AND/OR logic gates implemented with ASL devices.

The working principle of the inverter/buffer is the same as that of a single ASL device. The injected charge current is polarized and injected into the channel. The resulting spin current carries the magnetization orientation information to the detector MTJ and switches its state if enough torque is applied. This orientation depends not only on the magnetization orientation of the input MTJ, but also on the polarity of the injection current. As shown in Fig. 2.12(a), if the injection current flows into the FM, called positive in this thesis, the spin orientation injected into the channel is opposite to the magnetization orientation of the FM because of reflection. In this case, the circuit realizes a “NOT” function, i.e. an inverter. On the contrary, if the injection current is negative, the circuit realizes a “COPY” function, i.e. a buffer. This dependence on current polarity can be used to design reconfigurable circuits.

Fig. 2.12 (b) shows the structure of an AND/OR logic circuit based on the ASL device. It contains two input terminals A and B, one function terminal F and one output terminal Output. Unlike the inverter/buffer circuit, where the current polarity changes, the polarity of the injection current for each terminal of this circuit is fixed to negative, which injects spins with the same orientation as the magnetization of the input MTJs. The final state of the output is determined by the superposition of the spin currents (majority principle) injected from the three terminals A, B and F. The function of this circuit depends on the relative magnetization orientation of the two layers of the F terminal. If it is parallel, defined as “0” in this thesis, the circuit realizes “AND”; if it is antiparallel, defined as “1”, the circuit realizes “OR”. Given the influence of the injection current polarity on the circuit function, this AND/OR circuit becomes a NAND/NOR circuit if positive injection currents are applied.
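
The logic behaviour described above can be summarized by a purely Boolean sketch. It ignores all device physics (torque threshold, spin decay along the channel) and only encodes the polarity rule of the inverter/buffer and the majority principle of the three-terminal gate. The function names are hypothetical.

```python
def asl_inverter_buffer(s_in: int, current_positive: bool) -> int:
    """Positive injection current inverts the input state, negative current copies it."""
    return 1 - s_in if current_positive else s_in


def majority(a: int, b: int, c: int) -> int:
    """Majority vote of three binary spin contributions."""
    return 1 if a + b + c >= 2 else 0


def asl_gate(a: int, b: int, f: int, current_positive: bool = False) -> int:
    """Three-terminal ASL gate: F=0 gives AND, F=1 gives OR (negative currents).

    With positive injection currents every injected spin is inverted, and the
    majority of inverted inputs is the inverted majority: NAND (F=0) or NOR (F=1).
    """
    out = majority(a, b, f)
    return 1 - out if current_positive else out


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", asl_gate(a, b, 0), "OR:", asl_gate(a, b, 1),
              "NAND:", asl_gate(a, b, 0, True), "NOR:", asl_gate(a, b, 1, True))
```
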

Theoretically, any complex circuit, such as a full adder, a multiplier or more complex computational blocks, can be constructed from these basic ASL logic circuits. Regarding the circuit design method, the superposition or cancellation of spin currents in the channel, with the final state determined by the majority spin orientation, follows the majority principle, so the synthesis method using majority gates in [154] is also suitable for ASL-based circuit design. This method starts from the truth table of the designed function and, after several transformations, obtains a “reduced table” which can be directly used for circuit synthesis based on majority gates. However, this “truth table” method is impractical for circuits with complex truth tables. This limitation can be resolved by the replacement method in [158, 245]. The function is first implemented with AND/OR/Inverter logic gates, which are then replaced by their corresponding majority gates. The majority gate representation is then optimized, and the resulting circuit can be implemented with ASL devices. However, circuit synthesis is only the first step towards a real circuit realization. More effort is needed on the circuit layout, such as the current diffusion, the area optimization, the scaling limit, etc. In this thesis, we take these problems into consideration when designing a circuit and list the corresponding parameters for each designed circuit.
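
As a small illustration of majority-gate based design (not the synthesis flows of [154] or [158, 245] themselves), a one-bit full adder can be written with three majority gates and two inverters, using a well-known majority-logic identity. The function names are hypothetical, and the sketch only checks the Boolean behaviour.

```python
def majority(a: int, b: int, c: int) -> int:
    """Three-input majority gate."""
    return 1 if a + b + c >= 2 else 0


def invert(a: int) -> int:
    """Inverter (in ASL terms, a stage driven with the opposite current polarity)."""
    return 1 - a


def full_adder(a: int, b: int, c_in: int):
    """One-bit full adder built from 3 majority gates and 2 inverters."""
    c_out = majority(a, b, c_in)
    s = majority(invert(c_out), majority(a, b, invert(c_in)), c_in)
    return s, c_out


for a in (0, 1):
    for b in (0, 1):
        for c_in in (0, 1):
            print(a, b, c_in, "->", full_adder(a, b, c_in))
```

Counting gates in such a netlist (here three majority gates) is what the gate-count based benchmarking relies on.
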

2.2.4.2 Neuromorphic application

In the previous chapter, we discussed the possible application of spintronics to neural networks. The ASL device, as one of the most promising spintronic devices, can also be used to implement neuromorphic architectures with low power consumption. Mrigank Sharad et al. proposed two ASL-based neural network structures [68, 210, 246]: bipolar and unipolar, as shown in Fig. 2.13 (a) and (b).

![Diagram](image)

Figure 2.13 – (a) Device structure of the ASL-based bipolar spin neuron; (b) device structure of the ASL-based unipolar spin neuron.

The bipolar structure shown in Fig. 2.13(a) consists of five magnet terminals: inputs \( m_1 \) and \( m_2 \), preset \( m_3 \), output \( m_4 \) and read \( m_5 \). The magnetizations of \( m_1 \) and \( m_2 \) both lie along their easy axis but are opposite to each other, which makes this structure bipolar. The initial magnetization orientation of \( m_4 \) is the same as that of either \( m_1 \) or \( m_2 \). The magnet \( m_3 \) has its easy axis orthogonal to that of \( m_1 \) and \( m_2 \) and is used to implement current-mode Bennett clocking for low power consumption. A current is injected through \( m_3 \) to \( m_4 \), presetting \( m_4 \) along its hard axis. The preset current applied to \( m_4 \) overlaps with the currents injected through \( m_1 \) and \( m_2 \). When the current of \( m_3 \) is removed, \( m_4 \) switches back to its easy axis, and its magnetization orientation depends on the superposition of the spin currents from \( m_1 \) and \( m_2 \). The final state can be read by \( m_5 \). In this structure, \( m_4 \) works as a reception neuron and evaluates the step function with its threshold, while \( m_1 \) and \( m_2 \) realize the excitatory and inhibitory synapse functions respectively, or the reverse.

Fig. 2.13(b) shows the unipolar structure. It consists of four terminals: input \( m_1 \), preset \( m_2 \), output \( m_3 \) and read \( m_4 \). The functions of \( m_2 \), \( m_3 \) and \( m_4 \) are the same as those of the preset, output and read magnets of the bipolar structure. Having only one input terminal \( m_1 \) makes this structure unipolar. It receives the difference between the currents from the excitatory and inhibitory synapses, which is computed by external circuits, such as CMOS circuits.

The MTJ used in ASL can only represent binary states, which limits the synapse function. Many efforts have therefore been made to find a spintronic device with analogue states that can replace the MTJs, and the Domain Wall Magnet (DWM) has been found suitable for the analogue synapse function (Fig. 2.14) [55, 72, 75, 79]. The synaptic weight, i.e. the spin polarization of the DWM, is proportional to the offset of the DW location from the center. Supposing the left part of the DWM is up-spin and the right part is down-spin, for the extreme left location of the DW the current injected into the channel is maximally up-spin polarized, and vice versa. The net polarization is reduced to zero for the central location of the DW, as equal amounts of up-spin and down-spin electrons are injected into the channel in this case.

The aforementioned ASL-based neural structures have some limitations. One limitation concerns the fan-in/fan-out, which is restricted by the spin diffusion length of the channel. The number of input synapses is limited, whereas a real neural network often needs thousands of synapses. Moreover, the limited spin diffusion length introduces a mismatch between the strengths of different DWM synapses, depending on their location with respect to the neuron magnet, like the magnets S1 and S2 in Fig. 2.14. This mismatch can be mitigated by adjusting the value of the current injected into each magnet. Considering these limitations, one solution is to find a new material with a longer spin diffusion length or to insert buffers to improve the fan-in/fan-out. Another solution is to combine the ASL device with memristive synapses to establish the neural network.

2.3 Summary

In this chapter, we reviewed the state of the art of MTJs and the All Spin Logic (ASL) device. For MTJs, we mainly investigated their fundamentals: the TMR effect, for which we deduced the conductance equation of the MTJ for current calculations, and the STT effect, where the LLG equation is the basis of delay calculation and MTJ scaling. Their applications as memory, circuit and neural synapse were also introduced. For the ASL device, its working principle and fundamentals were investigated for the performance analysis. The development of ASL, mainly the injection efficiency enhancement, was presented. We discussed ASL modeling and circuit design at length, elaborating the need for an improved modeling and design method, which is the focus of this thesis.
Chapter 3

Compact Modeling of ASL Device

3.1 Physical Model of ASL Device

3.1.1 MTJ models

3.1.2 Spin injection/detection Model

3.1.3 Scaling effects

3.2 Electrical Model of ASL Device

3.2.1 Model language

3.2.2 Model parameters

3.2.3 Model hierarchy

3.2.4 Model implementation

3.3 Results

3.3.1 Model validation

3.3.2 ASL device performance analysis

3.3.3 Inverter/Buffer simulation

3.4 Summary

Preface

As ASL technology is gaining in maturity, compact models are needed to bridge the gap between application requirements at the circuit/system level and fabrication at the device level. Accurate simulations of spin injection/detection effects are needed to evaluate the circuit performance. Furthermore, the models should be generic, to allow exploring the performance tradeoffs, and scalable, to investigate hierarchical circuit design. However, to our knowledge, there is no such model in the literature. Indeed, models have been implemented in MatLab to execute transformed conductance matrices [47,57,151,241]. These approaches are not scalable and cannot be used for complex circuit design. The Verilog-A model in [152] implements the ASL device as a single block, which prevents design space exploration for optimization as well as hierarchical design. A scalable Verilog-A model is proposed in [153], yet it does not integrate important characteristics such as the spin diffusion delay, channel breakdown effects and the STT effect.

In this chapter, an electrical model of the ASL device is developed in the Verilog-A language on the Cadence platform, based on Maxwell’s equations in the spin domain. It integrates i) MTJ effects: the tunneling resistance and the static and dynamic properties of the STT effect, and ii) spin injection/detection/accumulation effects. The channel breakdown current and the scaling effects are also investigated in this model. Divided into six independent blocks (the injector and detector, the channel N, the ground G and two types of contacts C1/C2), this model allows the design of hierarchical circuits. Validated by comparison with experimental results, this model is used to implement and evaluate ASL-based circuits. Moreover, spin injection/detection expressions are derived from the fundamental Maxwell’s equations in the spin domain, which makes it possible to discuss the phenomena observed in ASL experiments and provides a basis for circuit optimization.

3.1 Physical Model of ASL Device

In this section, the physical models of an ASL device, including the MTJ model and spin injection/detection model, are presented. The MTJ model integrates the tunnel resistance, the switching threshold current, and the switching time calculations. The spin injection/detection model is developed based on the spin-circuit concept, specifying the current-voltage relation of each part of the ASL device. The breakdown current of the channel (metal or semiconductor) and the diffusion time in the channel are also investigated. Considering the scaling down of the ASL device for the performance improvement, we also analyzed the scaling effects of the ASL device. Moreover, the equations of certain performance criteria of the ASL device: the injection/detection efficiency, detection voltage and non-local resistance, are deduced, which enables the analysis of spin injection/detection experiments.

![Asymmetric ASL device with perpendicular MTJ and its compact model.](image)

Figure 3.1 – Asymmetric ASL device with perpendicular MTJ and its compact model. (a) ASL device with the asymmetric structure: $L_N$, $W$ and $L_F$ are the channel length, MTJ width and MTJ length. Two MTJs are used as the memories and their free layers form the injector/detector with the channel; a tunnel barrier is placed only between the injection free layer and the channel, which forms an asymmetric structure; an insulator is placed underneath the MTJ, to prevent the current from flowing into another channel; a ground lead is placed near the injector, to guarantee the non-reciprocity of the circuit. (b) MTJ switching with different current polarities. (c) Spin-circuit model of the basic ASL device. Each block is a π-network and corresponds to a component in (a).

Fig. 3.1 (a) illustrates the ASL device we consider. It is mainly composed of i) two perpendicular MTJs to inject/detect spin currents and store spin information and ii) a channel for spin current transmission. The MTJ is composed of one oxide barrier sandwiched between two ferromagnetic layers (FM). Depending on the relative magnetization orientations of the two FM layers, the MTJ has two resistance levels ($R_P$ and $R_{AP}$) that represent the states “0” and “1”. The state of the MTJ is written by applying a voltage/current source ($V_{write}/I_{write}$) that drives a current above the critical current $I_{c0}$. Then, a charge current $I_{inj}$ is injected through the MTJ free layer and polarized into the channel. After spin-flipping and diffusion through the channel, the spin current arriving at the detector switches the MTJ state if it is larger than the critical current $I_{c0}$. The resulting state $S_{out}$ depends on the polarity of the injected current $I_{inj}$ and on the input MTJ state $S_{in}$. A negative value of $I_{inj}$ (injected from the MTJ free layer to the channel) leads to $S_{out} = S_{in}$, whereas a positive value (injected from the channel to the free layer) leads to $S_{out} = \text{not}(S_{in})$.

3.1.1 MTJ models

As illustrated in Fig. 2.8 of Chapter 2, ASL device contains two MTJs as the memories and their free layers form the injector/detector with the channel. The MTJ model integrates the tunneling resistance model for the current calculation, and the STT model for critical current and delay calculations. The temperature evolution is also analyzed in this model.

3.1.1.1 Tunneling resistance model

As explained in Chapter 2, an MTJ has two different resistances/conductances, depending on the relative magnetization orientations of its two magnetic layers: low resistance for the parallel magnetizations and high resistance for the anti-parallel magnetizations. Based on the two different resistances, an MTJ can represent the binary numbers: “0” for parallel and “1” for anti-parallel. Its conductance depends on the applied bias voltage and the temperature, as shown in Eqs. 2.2, 2.4 and 2.5. In our model, the simplified resistance model of a perpendicular CoFeB/MgO MTJ in [247] is used, as shown in Eq. 3.1:

\[
R_P = \frac{t_{ox}}{F \times \phi^{1/2} \times Area} \times \exp(1.025 \times t_{ox} \times \phi^{1/2}) \tag{3.1}
\]

\[
R_{AP} = R_P \times (1 + TMR)
\]

where \(R_P\) (\(R_{AP}\)) is the resistance of the MTJ in the parallel (anti-parallel) state, \(TMR\) is the tunnel magnetoresistance ratio, \(\phi = 0.4\) is the potential barrier height of crystalline MgO [31], \(t_{ox}\) is the thickness of the oxide barrier and \(Area\) is the MTJ area. \(F\) is a factor calculated from the resistance-area product (\(RA\)) of the MTJ. In this model, \(RA\) is set to 10, which gives \(F = 332.2\).
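
Eq. 3.1 can be evaluated numerically as in the minimal sketch below. The units are an assumption that is not stated above: the oxide thickness is taken in angstroms and the area in µm², a choice under which $F = 332.2$ reproduces a resistance-area product of about 10 Ω·µm² for an 8.5 Å barrier. The function name and the example values (8.5 Å barrier, 40 nm × 40 nm junction, TMR ratio of 1.2) are hypothetical.

```python
import math


def mtj_resistances(t_ox_angstrom, area_um2, tmr, phi=0.4, factor_f=332.2):
    """Return (R_P, R_AP) in ohms from Eq. 3.1 (assumed units: angstrom and um^2)."""
    sqrt_phi = math.sqrt(phi)
    r_p = (t_ox_angstrom / (factor_f * sqrt_phi * area_um2)
           * math.exp(1.025 * t_ox_angstrom * sqrt_phi))
    r_ap = r_p * (1.0 + tmr)
    return r_p, r_ap


area = 0.04 * 0.04  # 40 nm x 40 nm junction expressed in um^2 (hypothetical)
r_p, r_ap = mtj_resistances(t_ox_angstrom=8.5, area_um2=area, tmr=1.2)
print(f"R_P = {r_p:.0f} ohm, R_AP = {r_ap:.0f} ohm, RA = {r_p * area:.2f} ohm.um^2")
```

The exponential dependence on \(t_{ox}\) means that a small change in barrier thickness changes both resistance levels strongly, while the ratio \(R_{AP}/R_P\) only depends on the TMR ratio.
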

3.1.1.2 TMR ratio model

Like the conductance, the TMR ratio depends not only on the bias voltage but also on the temperature. It is given in [248]:

\[
TMR(V, T) = TMR(T) \cdot (1 + \frac{V^2}{V_h^2})^{-1}
\]

\[
TMR(T) = (TMR_0 + 1)/(1 + 2Q \beta_{AP} \ln(k_B T/E_c)) - 1
\]

where \(V\) is the bias voltage applied to the MTJ, \(V_h\) is the bias voltage at which the TMR ratio drops to half of its zero-bias value, \(TMR_0\) is the TMR ratio at zero temperature, \(E_c\) is the magnon cutoff energy, \(Q\) describes the probability of a magnon being involved in the tunneling process, \(\beta_{AP} = S k_B T/E_m\), \(S\) is the spin parameter, and \(E_m\) is related to the Curie temperature \(T_C\) of the ferromagnetic electrodes by \(E_m = 3k_B T_C/(S + 1)\).
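
The bias dependence can be checked with a short sketch. It only implements the voltage factor of the first equation and takes the temperature-dependent value \(TMR(T)\) as an input; the function name and the numerical values are hypothetical.

```python
def tmr_bias(tmr_t, v, v_h):
    """Bias dependence of the TMR ratio: TMR(T) / (1 + V^2 / V_h^2)."""
    return tmr_t / (1.0 + (v / v_h) ** 2)


# Hypothetical values: TMR(T) = 1.2 and V_h = 0.5 V.
for v in (0.0, 0.25, 0.5, 1.0):
    print(f"V = {v:.2f} V -> TMR = {tmr_bias(1.2, v, 0.5):.3f}")
```

At \(V = V_h\) the factor equals 1/2, which is consistent with the definition of \(V_h\) as the half-value bias voltage.
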

3.1.1.3 Static switching model

In STT switching, only a current/voltage greater than the threshold current/voltage can switch the state of the MTJ. The threshold current is calculated as follows:

- In the high bias regime and for uniaxial anisotropy only, the threshold current \(I_{t0}\) is given by:

\[
I_{t0} = \alpha \frac{\gamma e}{\mu_B g} (\mu_0 M_S) H_K V = 2\alpha \frac{\gamma e}{\mu_B g} E
\]

where \(E\) is the barrier energy, \(\alpha\) is the magnetic damping constant, \(\gamma\) is the gyromagnetic ratio, \(e\) is the elementary charge, \(\mu_B\) is the Bohr magneton, \(V\) is the free layer volume, $H_K$ is the anisotropy field, $\mu_0$ is the permeability of free space, $M_S$ is the saturation magnetization and $g$ is the spin polarization efficiency factor, given by:

$$
g = g_{sv} \pm g_{tunnel}
$$

$$
g_{sv} = \left[-4 + (P^{-1/2} + P^{1/2})^3 \frac{3 + \cos \theta}{4}\right]^{-1}
$$

$$
g_{tunnel} = \frac{P}{2(1 + P^2 \cos \theta)}
$$

where $g_{sv}$ and $g_{tunnel}$ are the spin polarization efficiency values in a spin valve and in a tunnel junction nanopillar, respectively; $P$ is the spin polarization percentage of the tunnel current and $\theta$ is the angle between the magnetizations of the free and reference layers.

- In the low bias regime and for non-uniaxial (triaxial and cubic) anisotropy systems, the threshold current should be calculated with the LLG equation (Eq. 2.6 in Chapter 2.1.2.2).
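
For the high bias, uniaxial case, the threshold current can be evaluated as in the sketch below. As a simplifying assumption it uses \(g = g_{tunnel}\) only, and all device values (damping, magnetization, anisotropy field, free layer size, polarization) are hypothetical placeholders in SI units, not parameters of the thesis model.

```python
import math

GAMMA = 1.76e11              # gyromagnetic ratio, rad/(s*T)
E_CHARGE = 1.602176634e-19   # elementary charge, C
MU_B = 9.2740100783e-24      # Bohr magneton, J/T
MU_0 = 4e-7 * math.pi        # vacuum permeability, T*m/A


def g_tunnel(p, theta=0.0):
    """Spin polarization efficiency of a tunnel junction: P / (2 (1 + P^2 cos(theta)))."""
    return p / (2.0 * (1.0 + p ** 2 * math.cos(theta)))


def barrier_energy(m_s, h_k, volume):
    """E = mu_0 * M_S * H_K * V / 2, in joules."""
    return MU_0 * m_s * h_k * volume / 2.0


def threshold_current(alpha, m_s, h_k, volume, g):
    """I_t0 = alpha * gamma * e / (mu_B * g) * (mu_0 * M_S) * H_K * V, in amperes."""
    return alpha * GAMMA * E_CHARGE / (MU_B * g) * MU_0 * m_s * h_k * volume


# Hypothetical free layer: 40 nm diameter disc, 1.3 nm thick.
volume = math.pi / 4.0 * (40e-9) ** 2 * 1.3e-9
g = g_tunnel(p=0.52)
i_t0 = threshold_current(alpha=0.027, m_s=1.0e6, h_k=1.13e5, volume=volume, g=g)
print(f"g = {g:.3f}, I_t0 = {i_t0 * 1e6:.1f} uA")
```

Since \(I_{t0}\) is proportional to the barrier energy, reducing the free layer volume lowers the threshold current and the thermal stability at the same time.
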

### 3.1.1.4 Dynamic switching model

The switching delays in the two regimes, the Sun model ($I > I_0$) and the Néel-Brown model ($I < I_0$), are calculated as follows:

$$
\tau = \tau_0 \exp\left(\frac{E}{k_B T}\left(1 - \frac{I}{I_0}\right)\right) \quad (I < I_0)
$$

$$
\frac{1}{\langle \tau \rangle} = \left[\frac{2}{C + \ln\left(\frac{\pi^2 \Delta}{4}\right)}\right] \frac{\mu_B P_{ref}}{e m (1 + P_{ref}P_{free})}(I - I_0) \quad (I > I_0)
$$

where $I_0$ is the threshold current, $\tau_0$ is the attempt period, $C \approx 0.577$ is Euler’s constant, $\Delta = \frac{E}{k_B T}$ is the thermal stability factor, $P_{ref}$ and $P_{free}$ are the tunneling spin polarizations of the reference (fixed) and free layers and $m$ is the magnetic moment of the free layer.

As with the threshold current, this model is not suitable for a triaxial or cubic anisotropy system, in which the switching time should be calculated with the fundamental LLG equation (Eq. 2.6 in Chapter 2.1.2.2).
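
The two delay regimes can be combined in one function, as in the sketch below. It assumes that the logarithm in the Sun model takes the argument $\pi^2 \Delta / 4$ and that the attempt period is $\tau_0 = 1$ ns; the function names and all device values are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
MU_B = 9.2740100783e-24      # Bohr magneton, J/T
EULER_C = 0.5772156649       # Euler's constant


def delay_neel_brown(i, i_0, delta, tau_0=1e-9):
    """Thermally activated regime (I < I_0): tau = tau_0 * exp(delta * (1 - I / I_0))."""
    return tau_0 * math.exp(delta * (1.0 - i / i_0))


def delay_sun(i, i_0, delta, moment, p_ref, p_free):
    """Precessional regime (I > I_0): the switching rate grows linearly with I - I_0."""
    prefactor = 2.0 / (EULER_C + math.log(math.pi ** 2 * delta / 4.0))
    rate = (prefactor * MU_B * p_ref
            / (E_CHARGE * moment * (1.0 + p_ref * p_free)) * (i - i_0))
    return 1.0 / rate


def switching_delay(i, i_0, delta, moment, p_ref, p_free, tau_0=1e-9):
    """Pick the regime from the sign of I - I_0."""
    if i > i_0:
        return delay_sun(i, i_0, delta, moment, p_ref, p_free)
    return delay_neel_brown(i, i_0, delta, tau_0)


# Hypothetical device: I_0 = 50 uA, delta = 40, m = 1.6e-18 A*m^2, P = 0.5 for both layers.
for current in (40e-6, 60e-6, 70e-6):
    tau = switching_delay(current, 50e-6, 40.0, 1.6e-18, 0.5, 0.5)
    print(f"I = {current * 1e6:.0f} uA -> tau = {tau:.3e} s")
```

The two expressions do not join continuously at $I = I_0$, so a circuit-level model has to decide which regime applies near the threshold.
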

### 3.1.1.5 Temperature evolution model

In an MTJ, a large current density is necessary for magnetization switching, which heats up the MTJ through Joule heating. The temperature influences the TMR, the critical current $I_c$ and the switching delay $\tau$ of the MTJ. Hence, it is necessary to investigate the temperature evolution in an MTJ. It is given by [26]:

$$
\frac{V \times j}{2\lambda/\text{thick}_b} = \langle T - T_R \rangle + \tau_{th} \times \frac{dT}{dt}
$$

$$
\tau_{th} = \frac{C_v \times \text{thick}_s}{2\lambda/\text{thick}_b}
$$

where $V$ is the voltage across the MTJ, $\lambda$ is the thermal conductivity of the thermal barrier, $C_v$ is the heat capacity per unit volume, $j$ is the current density, $T_R$ is the room temperature, $\text{thick}_b$ is the thickness of the thermal barrier, $\text{thick}_s$ is the total thickness of the MTJ and $\tau_{th}$ is the characteristic heating/cooling time.

Based on this temperature evolution model, one can get the value of the temperature and thus the values of the temperature-dependent parameters.
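
For a constant voltage and current density, this first-order equation has a closed-form solution, sketched below under the assumptions that the left-hand side is the steady-state temperature rise and that the MTJ starts at room temperature. The function names and material values are hypothetical placeholders in SI units.

```python
import math


def steady_state_rise(v, j, lam, thick_b):
    """Steady-state temperature rise V*j / (2*lambda/thick_b), in kelvin."""
    return v * j / (2.0 * lam / thick_b)


def thermal_time_constant(c_v, thick_s, lam, thick_b):
    """tau_th = C_v * thick_s / (2*lambda/thick_b), in seconds."""
    return c_v * thick_s / (2.0 * lam / thick_b)


def temperature(t, v, j, lam, thick_b, c_v, thick_s, t_room=300.0):
    """T(t) for constant V and j, starting from T(0) = T_R."""
    rise = steady_state_rise(v, j, lam, thick_b)
    tau_th = thermal_time_constant(c_v, thick_s, lam, thick_b)
    return t_room + rise * (1.0 - math.exp(-t / tau_th))


# Hypothetical values: 0.5 V, 5e10 A/m^2, lambda = 1.5 W/(m*K), 1 nm barrier,
# C_v = 2.5e6 J/(m^3*K), 30 nm total stack thickness.
params = dict(v=0.5, j=5e10, lam=1.5, thick_b=1e-9, c_v=2.5e6, thick_s=30e-9)
tau_th = thermal_time_constant(2.5e6, 30e-9, 1.5, 1e-9)
for t in (0.0, tau_th, 5 * tau_th):
    print(f"t = {t:.2e} s -> T = {temperature(t, **params):.2f} K")
```

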
3.1.2 Spin injection/detection Model

We developed the spin injection/detection model based on the spin-circuit concept and the fundamental Maxwell’s equations in the spin domain. The voltage-current relation of each part in ASL device: Ferromagnet FM1/FM2, Interface C1/C2, Channel N and Ground lead G, is investigated. The breakdown current model and the diffusion time model of the channel are also developed. Moreover, in this subsection, we also deduce the equations of performance criteria for our spin injection/detection model, like the spin injection/detection efficiency, the detected voltage and the non-local resistance, from the Maxwell’s equations in the spin domain, which can be used to analyze the phenomena and optimize the device performance in spin injection/detection experiments.

3.1.2.1 Spin circuit model

In an ASL device, the non-reciprocity required for logic implementation is enabled by putting the ground lead closer to the input FM. Hence, a ground lead model is required to guarantee the non-reciprocity. Moreover, two interface models are included to analyze the conductance mismatch problem and the asymmetric structure of the ASL device: i) a simple FM-N contact with no material between the ferromagnet and the channel and ii) a more complex contact involving a tunnel barrier (TB) to improve the spin injection efficiency. Therefore, our spin injection/detection model is divided into six cells: two ferromagnetic layers FM1/FM2, one channel N, one ground G and two interfaces C1/C2. Each cell is represented by a $\pi$-network, containing a series conductance matrix and two shunt conductance matrices, as shown in Fig. 3.1 (c).

**Modified Maxwell’s equations for ASL model** Our ASL compact model relies on current-voltage equations deduced from Maxwell’s equations in the spin domain. The set of equations defined in [148] corresponds to the generalized form of Kirchhoff’s Potential and Flow laws (KPL and KFL). They are defined by:

\[
\begin{align*}
    j &= \sigma \nabla \mu + \sigma_s \nabla \mu_s \tag{3.7} \\
    j_s &= \sigma_s \nabla \mu + \sigma \nabla \mu_s \tag{3.8}
\end{align*}
\]

where $j$ (resp. $j_s$) is the charge (resp. spin) current density, $\sigma$ (resp. $\sigma_s$) is the charge (resp. spin) conductivity and $\mu$ (resp. $\mu_s$) is the charge (resp. spin) quasi-chemical potential.

From these basic current rules, we define a set of device-specific rules for the charge and spin currents. In our model, $\rho$ ($\rho_s$) is the charge (spin) resistivity, $L$ is the length, $\lambda_s$ is the spin diffusion length and $t$ is the thickness.

We consider four types of devices: injector/detector, contact, channel and ground, and define their charge/spin currents as follows:

- **Injector and detector:**

\[
\begin{align*}
    I_F(0) &= \pi W L_{Fi} \Delta \mu + \frac{\pi P_{Fi} W L_{Fi}}{4\rho_{Fi} t_{Fi}} \mu_{sFi}(0) \tag{3.9} \\
    I_{sF}(0) &= \frac{\pi P_{Fi} W L_{Fi}}{4\rho_{Fi} t_{Fi}} \Delta \mu + \frac{P_{Fi}^2 \pi W L_{Fi}}{4\rho_{Fi} \lambda_{sFi}} \mu_{sFi}(0) \tag{3.10}
\end{align*}
\]

where $W$ is the MTJ width, $P_{Fi}$ is the ferromagnet spin conductivity polarization, $t_{Fi}$ is the free layer thickness and $L_{Fi}$ is the MTJ free layer length.

- **Contacts:**

\[
\begin{align*}
    I_{Ci} &= \frac{\pi W L_{Fi}}{8 RA_{Ci}} \Delta \mu + P_{Ci} \cdot \frac{\pi W L_{Fi}}{8 RA_{Ci}} \Delta \mu_s \tag{3.11} \\
    I_{sCi} &= P_{Ci} \cdot \frac{\pi W L_{Fi}}{8 RA_{Ci}} \Delta \mu + \frac{\pi W L_{Fi}}{8 RA_{Ci}} \Delta \mu_s \tag{3.12}
\end{align*}
\]

where $P_{Ci}$ is the spin resistance polarization of the contact and $RA_{Ci}$ is the resistance-area product of the contact. We assume two types of contacts: i) a simple FM-N contact with no material between the ferromagnet and the channel and ii) a more complex contact involving a Tunnel Barrier (TB) to improve the spin injection efficiency.

- **Channel:**

\[
I_N = \frac{W t_N}{\rho_N L_N} \Delta \mu \\
I_{SN}(0/L_N) = \frac{W t_N}{\rho_N \lambda_{SN}} \frac{\mu_{SN}(L_N) - \mu_{SN}(0)}{\sinh(L_N/\lambda_{SN})} \left[ (\cosh(L_N/\lambda_{SN}) - 1) \mu_{SN}(L_N), \right] 
\]

where $L_N$ is the channel length.

- **Ground:**

\[
I_G = \frac{L_{F1} t_G}{2 \rho_G L_G} \Delta \mu \\
I_{SG} = \frac{L_{F1} t_G}{2 \rho_G \lambda_{SG}} \mu_{SG}(0)
\]
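
The contact relations (Eqs. 3.11 and 3.12) form a symmetric 2 × 2 conductance matrix acting on the charge and spin potential drops. The sketch below evaluates them, assuming that the potentials are expressed in volts and the resistance-area product in Ω·m²; the function name and the numerical values are hypothetical.

```python
import math


def contact_currents(w, l_f, ra_c, p_c, d_mu, d_mu_s):
    """Charge and spin currents through a contact, following Eqs. 3.11 and 3.12."""
    g_c = math.pi * w * l_f / (8.0 * ra_c)       # common conductance factor, in siemens
    i_c = g_c * d_mu + p_c * g_c * d_mu_s        # charge current
    i_sc = p_c * g_c * d_mu + g_c * d_mu_s       # spin current
    return i_c, i_sc


# Hypothetical tunnel contact: 40 nm x 40 nm, RA = 1 ohm.um^2 = 1e-12 ohm.m^2, P_C = 0.6.
i_c, i_sc = contact_currents(40e-9, 40e-9, 1e-12, 0.6, d_mu=0.1, d_mu_s=0.0)
print(f"I_C = {i_c:.3e} A, I_sC = {i_sc:.3e} A, I_sC / I_C = {i_sc / i_c:.2f}")
```

When there is no spin potential drop across the contact, the ratio of spin to charge current equals the contact polarization $P_{Ci}$; a spin accumulation on the channel side changes this ratio.
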

**Channel diffusion delay and breakdown current**

- **Channel diffusion delay**

The average transit time of carriers through the interconnect proposed in [249] is defined as:

\[
t_{DIFF} = \frac{L_N^2}{2D} + \frac{L_N}{v_f}
\]

where $L_N^2/2D$ is the diffusive time constant and $L_N/v_f$ is the ballistic time constant. $D$ is the electron diffusion coefficient, and $v_f$ is the Fermi velocity of electrons.

- **Channel breakdown current**

A channel is characterized by a breakdown current density $J_{BR}$, which is the upper limit that the current density should not exceed to avoid channel destruction or malfunction. As detailed in [59, 250–253], the physical phenomenon that causes the breakdown depends on the channel material:

- For a metal, a large current density leads to strong electromigration, which results in the breakdown of the channel. For copper, the Blech model [252, 253] defines the maximum current density $J_{BR,Cu}$ by:

\[
J_{BR,Cu} \times L_N = \frac{\Omega \Delta \sigma}{Z^* e \rho_{Cu}}
\]

where $\Omega$, $\Delta \sigma$, $Z^*$ are the atomic volume, normal stress difference between stripe ends and the effective valence of Cu, respectively; $\rho_{Cu}$ is the resistivity and $e$ is the electron charge.

- For semiconductor materials, channel breakdown occurs when Joule heating raises the temperature above the melting point. For such a material, the maximum current density is defined by [250, 251]:

$$J_{BR} = \left[ g(T_{BD} - T_0) \right] \frac{\rho T_N W}{\pi k_{ox} \ln(6(t_{subox} W + 1)) + \frac{k_{ox}}{t_{subox}} W}$$

\begin{align*}
g^{-1} &= \left\{ \frac{\pi k_{ox}}{\ln(6(t_{subox} W + 1))} + \frac{k_{ox}}{t_{subox}} W \right\}^{-1} \\
R_T &\approx L_{Hm} / [k_m t_m (W + 2L_{Hm})] \\
L_{Hm} &= [k_m / (k_{ox} t_m t_{subox})]^{1/2}
\end{align*}

(3.19)

where $T_{BD}$ and $T_0$ are the breakdown and room temperatures, respectively; $g$ is the thermal conductance per unit length between the channel and the substrate; $L_H = \sqrt{k_g W t_N / g}$ is the thermal healing length; $k_g$ is the thermal conductivity of the channel material; $k_{ox}$ and $t_{subox}$ are the thermal conductivity and thickness of the substrate, respectively; $R_{Cox}$ is the contact thermal resistance between channel and substrate; $K_S$ is the thermal conductivity of the highly doped Si substrate; $R_T$ is the thermal resistance of the metal contacts; $L_{Hm}$ is the thermal healing length of heat spreading into the contact; and $k_m$ and $t_m$ are the thermal conductivity and thickness of the metal electrodes.
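For the metallic case, the Blech relation given earlier, $J_{BR,Cu} \times L_N = \Omega \Delta \sigma / (Z^* e \rho_{Cu})$, can be evaluated with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the numerical values in the example (atomic volume, stress difference, effective valence, resistivity) are placeholders chosen for illustration, not the values used in the model.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def blech_breakdown_current_density(atomic_volume_m3, stress_difference_pa,
                                    effective_valence, resistivity_ohm_m,
                                    length_m):
    """Return J_BR (A/m^2) from the Blech product J_BR * L_N."""
    if length_m <= 0:
        raise ValueError("channel length must be positive")
    # Blech product Omega * delta_sigma / (Z* e rho), in A/m.
    blech_product = (atomic_volume_m3 * stress_difference_pa
                     / (effective_valence * ELEMENTARY_CHARGE * resistivity_ohm_m))
    return blech_product / length_m


if __name__ == "__main__":
    # Placeholder inputs (assumptions): Omega, delta_sigma, Z*, rho_Cu, L_N.
    j_br = blech_breakdown_current_density(1.18e-29, 1.0e8, 1.0, 1.7e-8, 200e-9)
    print(f"J_BR,Cu = {j_br:.3e} A/m^2")
```

Since the product $J_{BR,Cu} \times L_N$ is fixed by the material, halving the channel length doubles the tolerable current density in this model.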

3.1.2.2 Performance equations

To analyze and optimize how the performance of an ASL device depends on its parameters, we derived the equations of its performance criteria from the fundamental Maxwell's equations. The ASL device used in this derivation is shown in Fig. 3.1 (a), with an insulator inserted underneath the MTJs to prevent the current from flowing into another channel. Conversely, these equations can be used to extract the parameters from a known performance in an ASL spin injection/detection experiment and to discuss the experimental phenomena.

**Detected voltage and non-local resistance** In an ASL device, the voltage caused by spin accumulation and spin transport is measured. This is related to the amount of polarized spins that has been transported to the detector. It is defined as:

$$V_{det} = \mu_N(\infty) - \mu_{F2}(\infty)$$

(3.20)

By applying the current continuity conditions, $V_{det}$ is calculated as:

\begin{align*}
V_{det} &= -I_{inj}(R_{C2} + R_{F2}) - I_{det} \frac{R_{C2} R_{F2}}{R_{C2} + R_{F2}} (P_{C2} - P_{F2})^2 \\
&\quad - \frac{P_{C2} R_{C2} + P_{F2} R_{F2}}{R_{C2} + R_{F2}} \mu_{SN}(L_N) \\
R_{C2} &= \frac{RA_{C2}}{W L_{F2}} \\
R_{F2} &= \rho_{F2} \frac{t_{FM}}{WL_{F2}}
\end{align*}

(3.21)

where $I_{inj}$ and $I_{det}$ are the charge currents in the injector and detector; $R_{C2/F2}$ are the real resistances of $C2/F2$ given above, while the spin resistances of $C2/F2$ follow Eq. 3.28; $P_{C2}$ is the conductance polarization of $C2$ and $P_{F2}$ is the conductivity polarization of $F2$; $\mu_{SN}(L_N)$ is the spin quasi-chemical potential at the end of the channel, given by

\begin{align*}
\mu_{SN}(L_N) &= \frac{A}{B}
\end{align*}

with $A$ and $B$ given in Eq. 3.28.

The non-local resistance $R_{NL}$ is:

$$R_{NL} = \frac{V_{det}}{I_{inj}}$$

(3.22)

and the non-local resistance difference $\Delta R_{NL}$ between the anti-parallel (AP) and parallel (P) configurations is:

$$\Delta R_{NL} = \frac{V_{det,AP} - V_{det,P}}{I_{inj}} \approx 2R_{NL}$$

(3.23)

**Injection efficiency** The spin injection efficiency $P_{inj}$ quantifies the share of the injected charge current that enters the channel as spin current. The spin current out of the injector MTJ and the injection efficiency are expressed as:

$$I_{sinj} = I_{inj} \frac{P_{C1}R_{C1} + P_{F1}R_{F1} + 2\mu_{SN}(0)/I_{inj}}{R_{C1} + R_{F1}}$$

(3.24)

$$P_{inj} = \frac{I_{sinj}}{I_{inj}}$$

(3.25)

where $R_{C1/F1}$ are the spin resistances of $C1/F1$; $P_{C1}$ is the conductance polarization of $C1$; $P_{F1}$ is the conductivity polarization of $F1$; and $\mu_{SN}(0)$ is the spin quasi-chemical potential at the head of the channel.

**Detection efficiency** The ASL efficiency $P_{eff}$ gives the spin current that is transported to the detector to switch the MTJ state. The spin current out of the channel and the detection efficiency are expressed as:

$$I_{sdet} = I_{det} \frac{P_{C2}R_{c2} + P_{F2}R_{F2}}{R_{c2} + R_{F2}} - \frac{1}{\mu_{SN}(L_N)}$$

(3.26)

$$P_{eff} = \frac{I_{sdet}}{I_{inj}}$$

(3.27)

The spin resistances of the contacts and ferromagnets ($i = 1, 2$), the spin resistance of the channel, the terms $A$ and $B$, and the potential at the head of the channel are:

\begin{align*}
R_{Ci} &= \frac{RA_{Ci}}{(1 - P_{Ci}^2) W L_{Fi}} \\
R_{Fi} &= \frac{\rho_{Fi} \lambda_{SFi}}{(1 - P_{Fi}^2) W L_{Fi}} \\
R_N &= \rho_N \frac{\lambda_{SN}}{W t_N} \\
A &= -I_{inj} R_N \frac{P_{C1}R_{C1} + P_{F1}R_{F1}}{R_{C1} + R_{F1}} + I_{det} R_N^2 \sinh(L_N/\lambda_{SN}) \frac{P_{C2}R_{C2} + P_{F2}R_{F2}}{(R_{C2} + R_{F2})(R_{C1} + R_{F1})} \\
&\quad + I_{det} R_N e^{L_N/\lambda_{SN}} \frac{P_{C2}R_{C2} + P_{F2}R_{F2}}{R_{C2} + R_{F2}} \\
B &= e^{L_N/\lambda_{SN}} \frac{\sinh(L_N/\lambda_{SN})}{\cosh(L_N/\lambda_{SN})} \frac{R_N \cosh(L_N/\lambda_{SN})}{R_{C1} + R_{F1}} + e^{L_N/\lambda_{SN}} \frac{R_N^2}{R_{C2} + R_{F2}} \\
&\quad + \frac{R_N^2}{(R_{C1} + R_{F1})(R_{C2} + R_{F2})} - \frac{1}{\sinh(L_N/\lambda_{SN})} \\
\mu_{SN}(0) &= \left[ \sinh(L_N/\lambda_{SN}) \frac{R_N}{R_{C2} + R_{F2}} + \cosh(L_N/\lambda_{SN}) \right] \mu_{SN}(L_N) \\
&\quad - I_{det} \sinh(L_N/\lambda_{SN}) R_N \frac{P_{C2}R_{C2} + P_{F2}R_{F2}}{R_{C2} + R_{F2}}
\end{align*}

(3.28)

The above equations vary depending on the following cases:

• There is a charge current $I_{det}$ in the channel or not, namely $I_{det} \neq 0$ or $I_{det} = 0$.

• The contact is transparent ($R_{ci} \ll R_N$) or a tunnel barrier ($R_{ci} \gg R_{Fi}$).

• The contact is spin polarized ($P_{ci} \neq 0$) or unpolarized ($R_C = \Re_C$).

The above equations allow the performance of an ASL device to be estimated from the material parameters and device dimensions. The model can thus be fitted to experimental results and, by extrapolation, used to predict ASL device performance for more advanced technologies. For a circuit with multiple inputs/outputs, the closed-form expressions cannot be used, and we use the compact model to implement the circuit and simulate its performance.

3.1.3 Scaling effects

One way to improve the energy and delay performance of the ASL device is to scale down its dimensions. However, the experimental results of ASL devices with small dimensions cannot be analyzed with the existing models presented in the previous subsection. Therefore, it is necessary to explore the phenomena caused by scaling and to modify the models accordingly.

3.1.3.1 Thermal stability

The thermal stability factor $\Delta = \frac{\mu_0 M_s H_{K_{eff}} V}{2 k_B T} = \frac{K_{eff} V}{k_B T}$ determines how long an MTJ remains non-volatile. In general, it has to stay above 69 to guarantee a retention time of ten years. The equation shows that the MTJ dimensions, through the volume $V$, influence the thermal stability factor. Hence, we have to study the relation between the MTJ dimensions and the thermal stability factor during scaling in order to guarantee the non-volatility of the MTJ.

As observed in several experiments [254, 255], the thermal stability factor $\Delta$ remains constant down to an MTJ diameter $W$ of 30 nm and starts to decrease when $W$ becomes smaller. This behavior originates from the size dependence of the effective magnetic fields and from the different reversal regimes. Considering a circular MTJ, we analyze the dependence of $\Delta$ on the MTJ diameter $W$.

First, we analyze the size dependence of effective magnetic fields. The effective perpendicular magnetic anisotropy energy density $K_{eff}$ is calculated as follows:

$$K_{eff} = \frac{K_i}{t_F} - \frac{\mu_0 M_s^2}{2} (N_Z - N_X) \quad (3.29)$$

where $K_i$ is the interfacial anisotropy; $t_F$ is the thickness of the free layer; $N_Z$ and $N_X$ are demagnetization factors along out-of-plane and in-plane directions, respectively. In the case where the shape of P-MTJ is circular, $N_X$ is expressed as $N_X = (1 - N_Z)/2$ and $N_Z$ is given as:

$$N_Z = \frac{1}{t_F} \left( t_F + \frac{W}{2} - \sqrt{t_F^2 + \frac{W^2}{4}} \right) \quad (3.30)$$

We can deduce from this equation that, as the diameter $W$ shrinks, $N_Z$ decreases and $N_X$ increases [256]; $K_{eff}$ therefore increases with decreasing $W$.

As observed in [255], $\Delta$ remains constant above 30 nm and decreases when $W$ becomes smaller, which suggests that two different regimes apply in these two domains [254], separated by the nucleation size $W_n$ [254]. $W_n$ is of the order of the domain wall width $\delta_w = \pi (A_S/K_{eff})^{1/2}$ [190], where $A_S$ is the exchange stiffness constant.
• When \( W > W_n \), the nucleation type magnetization reversal takes place, in which case 
\( \Delta \) is determined by the nucleation size instead of the junction diameter and is expressed by 
\( \Delta \approx \pi^3 A_s t_F/4k_BT \).

• When \( W < W_n \), the single-domain magnetization reversal takes place and \( \Delta \) is given by 
the effective anisotropy times the volume of the recording layer 
\( \Delta = K_{eff} \pi(W/2)^2 t_F/k_BT \).
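The two reversal regimes above, together with Eqs. 3.29 and 3.30, can be combined into a short numerical sketch. It assumes that the nucleation size $W_n$ is taken equal to the domain wall width $\delta_w$ (the text only states that they are of the same order), and all material values in the example ($K_i$, $M_s$, $t_F$, $A_S$) are illustrative placeholders rather than the parameters of the model. Function names are hypothetical.

```python
import math

MU_0 = 4.0e-7 * math.pi   # vacuum permeability (T m / A)
K_B = 1.380649e-23        # Boltzmann constant (J / K)


def demag_factor_z(width_m, t_free_m):
    """Out-of-plane demagnetization factor N_Z of Eq. 3.30 (circular MTJ)."""
    root = math.sqrt(t_free_m ** 2 + width_m ** 2 / 4.0)
    return (t_free_m + width_m / 2.0 - root) / t_free_m


def effective_anisotropy(width_m, t_free_m, k_i, m_s):
    """K_eff of Eq. 3.29, with N_X = (1 - N_Z) / 2."""
    n_z = demag_factor_z(width_m, t_free_m)
    n_x = (1.0 - n_z) / 2.0
    return k_i / t_free_m - 0.5 * MU_0 * m_s ** 2 * (n_z - n_x)


def thermal_stability(width_m, t_free_m, k_i, m_s, a_s, temperature_k=300.0):
    """Return (Delta, regime); W_n is approximated by the wall width delta_w."""
    k_eff = effective_anisotropy(width_m, t_free_m, k_i, m_s)
    if k_eff <= 0.0:
        raise ValueError("K_eff <= 0: no perpendicular anisotropy")
    w_n = math.pi * math.sqrt(a_s / k_eff)  # assumption: W_n = delta_w
    thermal_energy = K_B * temperature_k
    if width_m > w_n:
        # Nucleation regime: Delta set by the nucleation size, not by W.
        return math.pi ** 3 * a_s * t_free_m / (4.0 * thermal_energy), "nucleation"
    # Single-domain regime: Delta = K_eff * volume / (k_B T).
    volume = math.pi * (width_m / 2.0) ** 2 * t_free_m
    return k_eff * volume / thermal_energy, "single-domain"


if __name__ == "__main__":
    # Placeholder material values (assumptions for illustration only).
    params = dict(t_free_m=1.3e-9, k_i=1.3e-3, m_s=1.1e6, a_s=2.0e-11)
    for w_nm in (60, 40, 20):
        delta, regime = thermal_stability(w_nm * 1e-9, **params)
        print(f"W = {w_nm} nm: Delta = {delta:.1f} ({regime})")
```

With $W_n = \delta_w$ the two expressions for $\Delta$ coincide at $W = W_n$, so the sketch reproduces the qualitative behavior described above: a plateau for large diameters and a decrease once the junction becomes single-domain.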

3.1.3.2 Critical current/damping factor

The critical current \( I_{c0} \) decreases monotonically as \( W \) decreases. To further explore how \( I_{c0} \) varies with \( W \), we investigate the \( I_{c0} \) models in the two regimes.

• Previous studies show that \( I_{c0} \) in the nucleation reversal regime, i.e. \( W > W_n \), is proportional to the area of the device, which corresponds to Eq. 3.3.

• For a device with \( W < W_n \), \( I_{c0} \) can be expressed as \( I_{c0} = 4\alpha e K_{eff} V / (\hbar P) \), where \( \alpha \) is the magnetic damping constant, \( e \) is the elementary charge, \( \hbar \) is the Dirac constant, \( P \) is the spin polarization, and \( V \) is the recording layer volume (\( = \pi(W/2)^2 t_F \)).

Based on the models of \( \Delta \) and \( I_{c0} \) in the single-domain magnetization reversal regime (\( W < W_n \)), the ratio of these two parameters, \( \frac{\Delta}{I_{c0}} = \frac{\hbar P}{4\alpha e k_B T} \), should be constant. However, experimental results show that it keeps increasing as \( W \) decreases. This suggests that the effective damping constant \( \alpha \) decreases as \( W \) decreases below \( W_n \).
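The argument about the ratio $\Delta / I_{c0}$ can be checked numerically. The sketch below assumes that the single-domain critical current reads $I_{c0} = 4 \alpha e K_{eff} V / (\hbar P)$, which is the combination of $\alpha$, $e$, $\hbar$, $P$, $K_{eff}$ and $V$ consistent with the symbol list above; the function names and the example values are hypothetical.

```python
E_CHARGE = 1.602176634e-19   # elementary charge (C)
HBAR = 1.054571817e-34       # Dirac (reduced Planck) constant (J s)
K_B = 1.380649e-23           # Boltzmann constant (J / K)


def critical_current(alpha, polarization, k_eff, volume):
    """Assumed single-domain form I_c0 = 4 alpha e K_eff V / (hbar P), in A."""
    return 4.0 * alpha * E_CHARGE * k_eff * volume / (HBAR * polarization)


def delta_single_domain(k_eff, volume, temperature_k=300.0):
    """Single-domain thermal stability factor Delta = K_eff V / (k_B T)."""
    return k_eff * volume / (K_B * temperature_k)


def delta_over_ic0(alpha, polarization, temperature_k=300.0):
    """Closed-form ratio hbar P / (4 alpha e k_B T), independent of K_eff and V."""
    return HBAR * polarization / (4.0 * alpha * E_CHARGE * K_B * temperature_k)


if __name__ == "__main__":
    # Example inputs (assumptions): alpha = 0.027, P = 0.5, K_eff = 3e5 J/m^3.
    for volume in (4.0e-25, 1.6e-24):
        ratio = (delta_single_domain(3.0e5, volume)
                 / critical_current(0.027, 0.5, 3.0e5, volume))
        print(f"V = {volume:.1e} m^3: Delta / I_c0 = {ratio:.3e} 1/A")
```

Both volumes give the same ratio, so within this model any measured increase of $\Delta / I_{c0}$ at small $W$ has to come from a change in $\alpha$ or $P$, in line with the interpretation above.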

3.1.3.3 Interconnection

In an ASL device, the spin diffusion delay in the channel and the spin detection efficiency are 
determined by the channel spin diffusion length \( \lambda_{sN} \) and the channel length \( L_N \). When 
the ASL device is scaled down, the size effect, i.e. surface and grain boundary scattering, will 
affect several important parameters of ASL interconnect including the resistivity, diffusion 
coefficient and spin diffusion length.

In most centrosymmetric materials (metals and Group IV semiconductors), the dominant spin relaxation mechanism is the Elliott-Yafet (EY) mechanism. As the interconnect dimensions scale down, the surface-to-volume ratio of the wire increases, so the electrons interact more often with the wire surfaces and are backscattered more frequently. In addition, the average grain size in a wire is equal to either the wire width or the wire thickness, whichever is smaller. As the cross-sectional dimensions are scaled down, the grains therefore become smaller and the electrons pass through a larger number of grains, which again increases the scattering rate. The increase in scattering rate lowers the spin relaxation time and the diffusion coefficient [50]. The models for the spin relaxation time \( \tau_s \) and the spin diffusion length \( \lambda_{sN} \) are presented as [257]:

\[
\begin{aligned}
\frac{1}{\tau_s^{net}} &= \frac{a^{ph}}{\tau_p^{ph}} + \frac{a^{d}}{\tau_p^{d}} \\
\lambda_{sN}^{net} &= \sqrt{D^{net} \tau_s^{net}} \\
D^{net} &= \left( \frac{2E_{Fermi}}{3m^* v_f^2} \right) \lambda_m^{net} v_f
\end{aligned}
\]

where \( a^d \) and \( a^{ph} \) are spin-flipping probabilities corresponding to defects and phonons, respectively; 
\( \tau_p \) is the momentum-relaxation time; \( D^{net} \) is the net diffusion coefficient of electrons,
\( m^* \) is their effective mass, \( E_{Fermi} \) is the metal Fermi level, \( v_f \) is the Fermi velocity of electrons,
\( \lambda_m^{net} \) is the net electron Mean Free Path (MFP) in the material.
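To make the size effect concrete, the following sketch evaluates the spin relaxation time and the spin diffusion length. It assumes the usual Elliott-Yafet rate sum, $1/\tau_s = a^{ph}/\tau_p^{ph} + a^{d}/\tau_p^{d}$, which may differ in detail from the expression used in [257]; the function names and every numerical value are hypothetical and serve only to show the trend.

```python
import math


def spin_relaxation_time(a_ph, tau_p_ph, a_d, tau_p_d):
    """Net spin relaxation time from phonon and defect spin-flip rates (s)."""
    rate = a_ph / tau_p_ph + a_d / tau_p_d  # assumed Elliott-Yafet rate sum
    return 1.0 / rate


def spin_diffusion_length(diffusion_coeff, tau_s):
    """lambda_sN = sqrt(D * tau_s), in metres."""
    return math.sqrt(diffusion_coeff * tau_s)


if __name__ == "__main__":
    # Hypothetical values: stronger defect scattering mimics a narrower wire.
    for tau_p_d in (1.0e-13, 2.0e-14):
        tau_s = spin_relaxation_time(1.0e-3, 3.0e-14, 1.0e-3, tau_p_d)
        lam = spin_diffusion_length(5.0e-3, tau_s)
        print(f"tau_p_d = {tau_p_d:.1e} s: lambda_sN = {lam * 1e9:.0f} nm")
```

A shorter defect-limited momentum relaxation time, as expected from surface and grain-boundary scattering in narrow wires, shortens both $\tau_s$ and $\lambda_{sN}$. In a full model $D^{net}$ would also shrink with the mean free path, which the sketch keeps fixed for simplicity.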
3.1.3.4 Dipolar interaction

Since a shorter interconnect leads to a smaller spin transport delay and a larger spin detection current, scaling down the interconnect length enhances the performance of the ASL device. However, below a certain length, the dipolar coupling between magnets affects the performance of the circuit. As presented in [240], when the dipolar coupling between magnets is included, the switching delay increases and the input magnetization dynamics are also modified. This is because the dipolar field acts like an additional anisotropy on the magnet, making the STT relatively weaker. One way to mitigate the dipolar effect is to add material anisotropy to the magnets. Another is to increase the current so that the output is dominated by the STT, at the cost of a higher energy consumption. Rather than injecting larger currents, a magnet with a smaller saturation magnetization ($M_s$) or a smaller size is usually employed to enhance the STT without consuming additional energy.

In circuit design, since it is difficult to obtain the exact value of the injection current, the applied anisotropy field or the saturation magnetization, the problem of dipolar interaction can be approached from another angle: guaranteeing enough space between magnets to eliminate the dipolar coupling. The micromagnetic simulation software OOMMF [51, 258] takes the magnetic material parameters and configuration as input and numerically computes the resulting magnetization dynamics by solving the LLG equation. By observing the magnetization dynamics, we can determine the distance beyond which the dipolar interaction vanishes.

3.2 Electrical Model of ASL Device

This section focuses on modeling the ASL device with the Verilog-A language in Cadence, based on the physical model developed in the previous section. Using the inverter/buffer circuit as an example, we present how to implement circuits with the developed model.

3.2.1 Model language

We use the Verilog-A hardware description language to program our electrical model of the ASL device. The behavior of each module is described mathematically in terms of the currents, voltages and external parameters applied to it. The interconnections between modules are defined by Kirchhoff's laws, which matches the aforementioned spin-circuit concept. Verilog-A thus provides good programming flexibility, which makes it easy to implement ASL-based circuits. Moreover, the user-friendly interface makes it easy to modify the parameters and to update the model.

3.2.2 Model parameters

The model parameters and variables are listed in Table 3.1 for a graphene channel.

3.2.3 Model hierarchy

The hierarchy of the developed electrical model is illustrated in Fig. 3.2. The main physical equations are described in the Verilog-A language, and the parameters and constants are fed into this model. An MTJ write voltage and an injection current are fed into the MTJ model and the injector model, respectively. The write voltage determines the MTJ state; the state information is then transported to the detector by the spin current polarized from the injection current. The output MTJ state is switched, after the switching delay, if the spin detection current is above the threshold current. The injected
Table 3.1 – ASL device parameters.

<table>
<thead>
<tr><th>Symbol</th><th>Parameter (unit)</th><th>Description</th><th>Default value</th><th>Range</th></tr>
</thead>
<tbody>
<tr><td>Global</td><td>W (nm)</td><td>Device width</td><td>40</td><td>[25, 50]</td></tr>
<tr><td></td><td>T0 (K)</td><td>Temperature</td><td>300</td><td></td></tr>
<tr><td>MTJ</td><td>TMR(0)</td><td>TMR ratio with 0 Vwrite</td><td>1.005</td><td></td></tr>
<tr><td></td><td>α</td><td>Ferromagnetic damping factor</td><td>0.027</td><td>[0.007, 0.027]</td></tr>
<tr><td></td><td>H_K (A/m)</td><td>Perpendicular anisotropy field</td><td>270 × 10^3 [22]</td><td></td></tr>
<tr><td></td><td>M_s (A/m)</td><td>Perpendicular magnetization</td><td>1.1 × 10^3 [22]</td><td></td></tr>
<tr><td></td><td>ρ_F (Ω·m)</td><td>Ferromagnetic resistivity</td><td>2.6 × 10^-6</td><td></td></tr>
<tr><td></td><td>λ_sF (nm)</td><td>Perpendicular spin diffusion length</td><td>0.2</td><td></td></tr>
<tr><td></td><td>P_F</td><td>Ferromagnetic spin polarization</td><td>0.5</td><td>[0.0, 0.99]</td></tr>
<tr><td></td><td>State</td><td>MTJ state representation</td><td>&quot;0&quot; (&quot;P&quot;); &quot;1&quot; (&quot;AP&quot;)</td><td></td></tr>
<tr><td></td><td>L_F (nm)</td><td>MTJ length</td><td>40</td><td>[25, 50]</td></tr>
<tr><td></td><td>t_ox (nm)</td><td>MTJ oxide barrier height</td><td>0.85</td><td>[0.8-1.5]</td></tr>
<tr><td></td><td>RA (Ω·µm²)</td><td>MTJ resistance area product</td><td>1</td><td></td></tr>
<tr><td>Contact</td><td></td><td>Contact type selection</td><td>1 (with TB)</td><td>1.0</td></tr>
<tr><td></td><td>P_C</td><td>Contact spin polarization</td><td>0.5</td><td>[0.0, 0.09]</td></tr>
<tr><td></td><td>RA_C (Ω·µm²)</td><td>Contact resistance area product</td><td>100 (with TB)</td><td></td></tr>
<tr><td>Ground</td><td>t_m (nm)</td><td>Metal electrode thickness</td><td>5 [250]</td><td>X</td></tr>
<tr><td></td><td>k_m (W·m^-1·K^-1)</td><td>Metal electrode thermal conductivity</td><td>22 [250]</td><td>X</td></tr>
<tr><td></td><td>t_subox (nm)</td><td>Substrate thickness</td><td>100 [250]</td><td>-</td></tr>
<tr><td></td><td>k_ox (W·m^-1·K^-1)</td><td>Substrate thermal conductivity</td><td>1.4 (SiO2) [250]</td><td>X</td></tr>
<tr><td></td><td>T_BD (K)</td><td>G/N material breakdown temperature</td><td>875 [250]</td><td>X</td></tr>
<tr><td></td><td>k_g (W·m^-1·K^-1)</td><td>G/N material thermal conductivity</td><td>100 (graphene) [250]</td><td>X</td></tr>
<tr><td></td><td>v_f (m/s)</td><td>G/N material Fermi velocity of electrons</td><td>6.85 (graphene)</td><td>X</td></tr>
<tr><td></td><td>D (m²/s)</td><td>G/N material electron diffusion coefficient</td><td>0.02 (graphene)</td><td>X</td></tr>
<tr><td></td><td>t_N (nm)</td><td>G/N material thickness</td><td>&quot;1&quot; (graphene)</td><td>X</td></tr>
<tr><td></td><td>ρ_N</td><td>G/N material resistivity</td><td>2.86 × 10^3 (graphene)</td><td>X</td></tr>
<tr><td></td><td>λ_sN (µm)</td><td>G/N material spin diffusion length</td><td>1</td><td>X</td></tr>
<tr><td></td><td></td><td>G/N material selection</td><td>1 (semi); 0 (metal)</td><td>X</td></tr>
<tr><td></td><td></td><td>Metal G/N breakdown current density calculation factor</td><td>5.1 × 10^-4 (copper) [56]</td><td>X</td></tr>
<tr><td>Channel</td><td>L_N (nm)</td><td>Channel length</td><td>90</td><td>&lt; 0.8 λ_sN</td></tr>
</tbody>
</table>

1 Parameters used to calculate the breakdown current density for semiconductor material.
2 Parameters used to calculate the spin diffusion time.
3 Parameters used to calculate MTJ spin transfer torque and TMR effects.
4 Parameters are fixed in this model.
5 Parameters depend on the material.
6 The unit of the graphene resistivity is [Ω] and the graphene resistance is calculated as R = ρN × L/W instead of R = ρN × L/(W × tN). We thus arbitrarily set the graphene thickness to "1". The resistivity unit of other materials is [Ωm]; the material thickness is thus set to the actual one.

The compact model has been implemented in Cadence using Verilog-A. Table 3.1 lists the 5 symbols corresponding to the following ASL blocks and their parameters: Injector and Detector, contacts $C_{TB}$ and $C_{FM-N}$ (tunnel barrier and FM-N interface), ground lead G and channel N. Each block describes the current-voltage relations of the device, based on the equations described previously. Injector and Detector also take into account the spin torque switching effect, and N integrates the spin diffusion and channel breakdown effects. The 5 blocks are detailed below:

- "Injector" integrates a resistance tunneling model, an STT model, and a spin injection model. The state of the MTJ depends on the voltage source $V_{write}$ connected to terminals "1" and "2". The MTJ state is output on terminal "Sout", taking into account the switching delay. The output is represented as a voltage signal: "V=0V" and "V=1V" correspond to the parallel and anti-parallel states, respectively. Once the MTJ state has been configured, an injection current $I_{inj}$ is injected into the channel from the MTJ free layer through the terminal "$I_{inj}$". Based on the integrated spin injection effect (Eqs. 3.9 and 3.10), a charge current and a spin current are output through terminals "outc" and "outs".

- "C" corresponds to the contact model, which can be implemented with or without a tunnel barrier TB ($C_{TB}/C_{FM-N}$). The two input terminals "inc" and "ins" represent the input charge and spin currents. Terminals "outc" and "outs" represent the output charge and spin currents.

- “G” and “N” correspond to the ground and channel models, respectively. Part of the charge and spin currents output by the contact flows into the ground, while the remaining part flows into the channel, where it propagates until it reaches a detector. The breakdown current model and the spin diffusion delay model are integrated into these models.

- “Detector” corresponds to a block able to switch an MTJ state according to the current flowing through a contact; it integrates the spin circuit model and the STT model. When the spin current through the terminal “ins” is above the threshold current $I_{th}$, the “State” terminal is switched to 0V (parallel) or 1V (anti-parallel), depending on the injection current polarity and the input MTJ state. The state can be read by applying a voltage source $V_{read}$ to terminals “T1” and “T2”; it is output on the “Sout” terminal as a voltage signal that depends on the MTJ resistance.

Fig. 3.3 shows the symbols of the developed ASL model in Cadence platform.

### 3.3 Results

In this section, we validate the integrated spin injection/detection effects by comparing them with three experimental results obtained with different channel materials. The performance dependence on different parameters and the scaling effects are then analyzed, giving a basis for performance improvement.
The simple ASL device in Fig. 3.3 is implemented and simulated with the developed model, and the results verify its functional behaviour as an inverter/buffer.

### 3.3.1 Model validation

In order to evaluate the accuracy of the compact model, we compare the simulation results with the experimental results of different channel materials: metal (Mg and Cu) and graphene.

![Figure 3.4](image)

**Figure 3.4** — (a) Simulation and characterization results $\Delta R_{NL}$ comparison for channels implemented with Mg [115] and Cu materials [105]. (b) Spin resistance difference $\Delta R_{NL}$ comparison of the trilayer-graphene/MgO/Py junction between compact model and experimental result.

To set up the simulation environment, we first tune the compact model to match the characterization results. For this purpose, we simulate the non-local resistance difference $\Delta R_{NL}$ between the parallel and anti-parallel configurations at room temperature, which is approximately twice the non-local resistance $R_{NL}$ (Eqs. 3.22 and 3.23).

- **Metal**

  The experimental data have been reported in the literature for Py/Mg (ferromagnet/channel) [115] and Py/Cu [105] materials. By adjusting the spin polarization to 0.58 (resp. 0.37) and the channel spin diffusion length to 205 nm (resp. 320 nm) for Py/Mg (Cu) material, the simulation results are well aligned with characterization results, as illustrated in Fig. 3.4 (a).

- **Graphene**
In the case of a graphene channel [124], the spin polarization \( P_C \) is 0.038 and the spin diffusion length \( \lambda_{SN} \) of the graphene channel is 1.5 \( \mu m \). Fig. 3.4(b) shows that our model fits the experimental data extracted from [124] well.

In conclusion, the comparison with experiments on different channel materials validates our compact model, which can be used in the general case with adjustable parameters. It thus provides a preliminary estimate of the performance of logic circuits.

### 3.3.2 ASL device performance analysis

ASL is used to implement logic circuits. To optimize the circuit design, the dependence of the performance on the different parameters needs to be evaluated. In this subsection, we define a set of performance criteria and analyze their dependence on the parameters of a circuit design.

#### Performance-parameter dependence analysis

An ASL device integrates the STT and TMR effects of the MTJ model, which are related to the switching, and the spin injection/detection effects, which are related to the spin current detection. Hence, for an ASL-based circuit, speed and energy consumption are two important performance criteria, which we evaluate and analyze in this subsection.

![Figure 3.5 – Performance dependence of the parameters of ASL device.](image)

Based on the physical equations in Section 3.1, we identify several device parameters and intermediate parameters that influence the device speed and energy consumption:

- The global device width \( W \), which is related to the scaling effect and is the most important parameter of an ASL device.

- STT model parameters: the damping factor \( \alpha \) and the thermal factor \( \Delta \), which determine the critical switching current \( I_{c0} \) and hence the switching delay \( t \).

- Spin injection/detection model parameters: the channel length \( L_N \), the channel spin diffusion length \( \lambda_{SN} \), the contact polarization \( P_C \) and the contact resistance area product \( RA_C \). Integrated into the spin injection/detection model, these parameters influence the spin injection/detection efficiency \( P_{inj/eff} \). For a given injection current \( I_{inj} \), the detection current devoted to the spin transfer torque effect, \( I_{sdet} \), depends on \( P_{inj/eff} \) and, together with the critical current \( I_{c0} \), determines the switching delay \( t \). Moreover, according to the energy equation \( E = I_{inj}^2 R t \), the energy consumption is given by the product of the square of \( I_{inj} \), the device resistance \( R \) and the current pulse duration.
Fig. 3.5 illustrates the interrelations between the performance and intermediate criteria ($P_{inj/eff}$, $I_{sdet}$, $I_{c0}$, delay $t$ and energy) and the ASL parameters ($W$, $\alpha$, $\Delta$, $RA_C$, $P_C$, $L_N$, $\lambda_{sN}$ and $I_{inj}$). Besides these performance criteria, our model also integrates the channel breakdown current density $J_{BR}$, which is related to the channel width $W$ and the channel length $L_N$. For a given $J_{BR}$ and given device parameters, there is a maximum injection current $I_{inj}$ that must not be exceeded to prevent channel damage. In the following, we analyze the performance dependence on different groups of parameters and the channel breakdown current density $J_{BR}$ based on this scheme.
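The energy equation and the breakdown constraint can be combined into a small helper, sketched below. It assumes that the current density is uniform over the channel cross-section $W \times t_N$, so that the maximum injection current is $J_{BR} W t_N$; the function names, the resistance, the pulse duration and the breakdown current density used in the example are hypothetical values chosen for illustration.

```python
def switching_energy(i_inj_a, resistance_ohm, pulse_s):
    """Energy E = I_inj^2 * R * t of one current pulse, in joules."""
    return i_inj_a ** 2 * resistance_ohm * pulse_s


def max_injection_current(j_br_a_m2, width_m, thickness_m):
    """Largest current (A) keeping the density below J_BR (uniform flow assumed)."""
    return j_br_a_m2 * width_m * thickness_m


def is_safe(i_inj_a, j_br_a_m2, width_m, thickness_m):
    """True if the injection current does not exceed the breakdown limit."""
    return abs(i_inj_a) <= max_injection_current(j_br_a_m2, width_m, thickness_m)


if __name__ == "__main__":
    # Hypothetical example: 0.5 mA pulse of 1 ns through a 1 kOhm device.
    energy = switching_energy(0.5e-3, 1.0e3, 1.0e-9)
    print(f"E = {energy * 1e12:.2f} pJ")
    # Hypothetical breakdown density and cross-section (40 nm x 1 nm).
    print("safe:", is_safe(0.5e-3, 2.0e13, 40e-9, 1e-9))
```

Because the energy grows with the square of $I_{inj}$ but only linearly with the pulse duration, lowering the injection current is worthwhile as long as the resulting increase in switching delay stays moderate.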

### 3.3.2.1 Channel breakdown current density $J_{BR}$

Fig. 3.6 reports the estimated breakdown current density as a function of the length and the width of the channel. We use graphene as an example and calculate the breakdown current density based on Eq. 3.19, with the parameters shown in Table 3.1. It is worth mentioning that the parameters $k$ and $R_{Cox}$ change with the device width; in the calculation, we use the experimental values from [250].

The area under a line corresponds to current densities for which the channel works properly. The area above a line corresponds to current densities exceeding the breakdown value, which are likely to damage the channel through Joule self-heating or electromigration (for semiconductor and metal materials, respectively). The larger the channel area, the smaller $J_{BR}$. This is due to the lateral 3D heat spreading into the substrate, into the contacts and along the graphene channel. The heat transfer depends on the thermal conductance and hence on the channel length and width: a small channel length/width leads to a high thermal conductance, which helps the heat spread along the graphene and into the contacts [251].

![Graph](image)

Figure 3.6 – Channel breakdown current density $J_{BR}$ according to channel length $L_N$ and channel width $W$.

### 3.3.2.2 Scaling effect and STT effect of MTJ model

As presented in the previous sections, the device width is an important parameter that influences not only the STT effect but also the spin injection/detection effect. In the following, we analyse its influence on the performance criteria (delay $t$ and energy) and on the intermediate parameters $\alpha$, $I_{c0}$, $\Delta$ and $K_{eff}$.

Fig. 3.7 (a) shows the dependence of the delay on the device width $W$. In this simulation, only the device width $W$ changes; the other parameters that vary with $W$, such as $\alpha$, are assumed to keep the values shown in Table 3.1. The delay and the critical current $I_{c0}$ are calculated using the LLG equation, taking the demagnetization effect into account. The channel length is set to 190 nm, considering that the maximum value of $W$ in our simulation is 50 nm and the minimum distance of the two MTJs to avoid the dipolar interaction is 10
Figure 3.7 – Performance dependence of channel width W and inter-dependence of STT parameters. (a) Delay dependence of the width, with the other parameters constant; Inset shows the thermal factor $\Delta$ and the critical current $I_{c0}$ dependence of the device width; (b) Critical current and delay dependence of the device width, with the thermal factor $\Delta$ fixed at 80, by changing the thickness of the free layer $t_F$; Inset shows the corresponding thickness of the free layer and the $K_{eff}$ with different widths; (c) Critical current and delay dependence of the device width, with the thermal factor $\Delta$ fixed at 40, by changing the thickness of the free layer $t_F$; Inset shows the corresponding thickness of the free layer and the $K_{eff}$ with different widths; (d) Delay dependence of the damping factor $\alpha$.

The injection current is set to 0.5 mA, based on the breakdown current calculation, to avoid the channel breakdown. From the figure, we can see that the delay increases with $W$, since the two most important factors in delay calculation (Eq. 3.5): the critical current $I_{c0}$ and the thermal factor $\Delta$ increases with $W$, as shown in the inset.

In fact, the thermal factor determines the lifetime of the data stored in MTJ and influence the critical current $I_{c0}$ and the switching time $t$. In a circuit design, $\Delta$ is generally fixed to guarantee the retention time. Fig. 3.7(b) and (c) shows the delay and critical current $I_{c0}$ dependence of the device width in two cases: $\Delta = 80$ and $\Delta = 40$. In these two cases, we fix the value of thermal factor $\Delta$ by changing the thickness of the free layer of MTJ $t_F$. The $K_{eff}$ and $t_F$ variations with the width $W$ in two cases are shown in the insets, respectively. By comparing these two figures, we can conclude that with a smaller $\Delta$, the MTJ is easy to be switched with a smaller critical current $I_{c0}$, which leads to a smaller delay $t$.

Fig. 3.7 (d) shows the delay dependence of the damping factor $\alpha$. In this simulation, we suppose the device width $W$ and the injection current $I_{inj}$ are fixed. According to Eq. 3.3, the critical current $I_{c0}$ increases with $\alpha$, which will lead to the increase of switching time, as illustrated in the figure.

In conclusion, to improve the ASL device performance, the circuit design calls for a smaller device width $W$, a smaller thermal factor $\Delta$ and a smaller damping factor $\alpha$, all of which are related to the critical current $I_{c0}$ and the STT effect.

### 3.3.2.3 Spin injection/detection effect

In a spin injection/detection model, an injection current $I_{inj}$ generates a spin current that carries the MTJ magnetization orientation information and is transported to the detector to switch the output MTJ state. The value of the detection current $I_{det}$ is related to the spin injection/detection efficiency $P_{inj/eff}$ and influences the switching time $t$ according to Eq. 3.5. In this subsection, we analyze the dependence of the performance on several parameters integrated into the spin injection/detection model: the tunnel barrier parameters ($RA_C$ and $P_C$), the ground parameter ($R_G$) and the channel parameters ($L_N$ and $\lambda_{SN}$). Moreover, since our compact model also integrates the channel breakdown current $J_{BR}$, we present how $J_{BR}$ is used in our model to prevent channel damage.

**Tunnel barrier** In an ASL device, the resistance mismatch between the ferromagnet and the channel impedes the injection of spin current into the channel, which decreases the spin injection efficiency. To mitigate this problem, a tunnel barrier is added between the ferromagnet and the channel. However, a tunnel barrier added in the detector impedes the flow of spin current into the ferromagnetic detector. Fig. 3.8 (a) shows the dependence of the spin injection efficiency on the tunnel resistance area product in a symmetric structure. At first, $P_{eff}$ increases with $RA_C$, because the mismatch problem is mitigated. Once $RA_C$ passes a certain value, the inhibiting effect in the detector dominates and $P_{eff}$ decreases with $RA_C$. In our device, the optimized value of $RA_C$, based on the optimization of the spin injection efficiency $P_{eff}$, is about 10 $\Omega \mu m^2$, with the default values of the device parameters shown in Table 3.1.

![Graphs showing spin injection efficiency $P_{eff}$ vs. $RAC$](image1)

Figure 3.8 – (a) Spin injection efficiency $P_{eff}$ versus resistance area product of the tunnel barrier $RA_C$ in a symmetric structure, with tunnel barriers added in both the injector and the detector. (b) Spin injection efficiency $P_{eff}$ versus resistance area product of the tunnel barrier $RA_C$ of the injector in an asymmetric structure, with the tunnel barrier added only in the injector. (c) Delay and energy dependence on the tunnel resistance area product $RA_C$; there is a value of $RA_C$ that minimizes the energy, in this case $4 \times 10^{-11}$ $\Omega \mu m^2$. (d) Spin injection efficiency increases with the ground resistance while the resistances of the other parts are constant.

In practice, however, it can be difficult to fabricate a tunnel barrier with the optimized value. Given the different effects of the tunnel barrier in the injector and in the detector, an asymmetric structure can be designed in which the tunnel barrier is added only in the injector. Fig. 3.8 (b) shows the dependence of $P_{eff}$ on the $RA_C$ of the injector in an asymmetric structure: $P_{eff}$ increases with $RA_C$ until it reaches its maximum value.

From these two simulations, we can conclude that an asymmetric structure with a tunnel barrier inserted only in the injector leads to a large spin detection efficiency. However, a large tunnel barrier resistance increases the energy consumption. Hence, in a circuit, the tradeoff between spin detection efficiency and energy consumption should be simulated to optimize the performance.

Fig. 3.8 (c) shows the dependence of the delay and the energy on the resistance area product of the tunnel barrier $RA_C$. In this figure, with the parameter values shown in Table 3.1, the delay decreases with $RA_C$ because of the enhancement of the spin injection efficiency and thus of the spin detection current. The energy consumption first decreases with $RA_C$ and then increases. At first, the enhancement of the injection efficiency dominates over the increase of the resistance. When $RA_C$ exceeds $4 \times 10^{-12}$ in this case, the increase of the resistance dominates over the increase of the injection efficiency, which leads to an increase of the energy consumption.

The tunnel barrier conductance polarization $P_C$ determines what fraction of the charge current is polarized into spin current. The larger $P_C$ is, the more spin current is generated, and thus the better the spin injection/detection efficiency $P_{inj/eff}$ and the performance. Fig. 3.10 illustrates this tendency.

**Ground** In our ASL device, a ground lead is placed near the injector to enhance the non-reciprocity of the ASL device. Since the compact model is based on the spin circuit concept, the value of the ground resistance influences the backflow spin current and the spin current in the channel. Fig. 3.8 (d) shows the dependence of the spin detection efficiency $P_{eff}$ on the value of the ground resistance in a single ASL device. As can be derived from the spin circuit model, a larger ground resistance increases the spin detection efficiency and reduces the backflow.

![Image of graph showing delay and channel spin current $I_{inj}$ according to the injection current $I_{inj}$ and channel lengths $L_N$.](image)

**Figure 3.9** - Delay and channel spin current $I_{sinj}$ according to the injection current $I_{inj}$ and channel length $L_N$. For each channel length, the breakdown current is labeled on the $I_{sinj}$ curves. The following lists i) the maximum injection current, ii) the corresponding spin injection current and iii) the delay for each channel length: (1.9 mA, 803 μA, 0.292 ns) for 100 nm, (1.587 mA, 581 μA, 0.4164 ns) for 200 nm, (1.565 mA, 509 μA, 0.5039 ns) for 300 nm, (1.63 mA, 478 μA, 0.5586 ns) for 400 nm, (1.72 mA, 463 μA, 0.6108 ns) for 500 nm. Inset gives the spin diffusion delay $t_{diff}$ according to $L_N$.

**Channel** In the previous analysis, we examined how the channel breakdown current integrated into the compact model varies with the channel length and channel width. Because of the channel breakdown current density, an ASL device has a maximum injection current that prevents channel damage, and hence a minimum achievable delay. Fig. 3.9 shows the relationship between the injection current and the delay for different channel lengths, taking the channel breakdown current density into account. For each channel length $L_N$ and each injection current, the spin current injected into the channel $I_{\text{sinj}}$ and the switching delay $t$ are calculated with the compact model. Breakdown currents are calculated for each channel length and compared with $I_{\text{sinj}}$ to determine whether the channel is damaged. In the figure, solid lines correspond to injection currents that respect the breakdown current constraint, while dashed lines represent cases in which the channel is likely to be damaged. For instance, for $L_N = 100$ nm, the breakdown current is estimated to be 803 µA, which corresponds to a maximum value of 1.9 mA for $I_{\text{inj}}$.
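The breakdown limits quoted in the caption of Fig. 3.9 can be turned into a simple lookup for a quick design check. The sketch below is only an illustration: the numbers are copied from that caption, while the dictionary and function names are hypothetical and are not part of the Verilog-A compact model.

```python
# (max injection current [mA], spin current at the breakdown limit [uA], delay [ns])
# per channel length [nm], copied from the caption of Fig. 3.9.
FIG_3_9_LIMITS = {
    100: (1.9, 803.0, 0.292),
    200: (1.587, 581.0, 0.4164),
    300: (1.565, 509.0, 0.5039),
    400: (1.63, 478.0, 0.5586),
    500: (1.72, 463.0, 0.6108),
}


def is_channel_safe(channel_length_nm: int, injection_current_ma: float) -> bool:
    """Return True if the injection current does not exceed the tabulated maximum."""
    max_injection_ma, _, _ = FIG_3_9_LIMITS[channel_length_nm]
    return injection_current_ma <= max_injection_ma


def min_delay_ns(channel_length_nm: int) -> float:
    """Delay reached at the maximum allowed injection current."""
    return FIG_3_9_LIMITS[channel_length_nm][2]


for length_nm in sorted(FIG_3_9_LIMITS):
    print(length_nm, is_channel_safe(length_nm, 1.6), min_delay_ns(length_nm))
```

Only the five tabulated channel lengths are supported; other lengths would require rerunning the compact model.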

The inset shows the spin diffusion delay $t_{\text{diff}}$ as a function of $L_N$ for $W = 40$ nm. Results show that $t_{\text{diff}}$ is on the order of 1 ps, which is negligible compared with the MTJ switching delay (100 ps to a few ns for $W = 40$ nm). It is worth noting that the diffusion delay is expected to play a more significant role in the total delay as MTJ fabrication technology matures.

![Figure 3.10](image)

Figure 3.10 – (a) Delay dependence of the spin diffusion length of the channel $\lambda_{SN}$ with different spin polarizations of the tunnel resistance $P_C$; Inset shows the dependence of the spin detection efficiency on $\lambda_{SN}$, with different values of $P_C$; (b) Delay dependence of the channel length $L_N$, with different spin polarizations of the tunnel barrier $P_C$; Inset shows the dependence of the spin detection efficiency on $L_N$, with different values of $P_C$.

Fig. 3.10 (a) and (b) show the dependence of the delay on the channel spin diffusion length $\lambda_{SN}$ and the channel length $L_N$. The delay decreases with the spin diffusion length and increases with the channel length, because the spin current attenuates in the channel by a factor of $\exp(-L_N/\lambda_{SN})$. A longer spin diffusion length and a shorter channel length reduce the spin attenuation in the channel, which in turn increases the spin detection efficiency $P_{\text{eff}}$, as shown in the insets of these two figures.
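As a rough numerical illustration of the attenuation factor $\exp(-L_N/\lambda_{SN})$, the following sketch evaluates it for the channel lengths used in Fig. 3.9. The spin diffusion length of 1 µm is a placeholder assumption and not the value of Table 3.1, and the function name is hypothetical.

```python
import math


def spin_attenuation(channel_length_nm: float, diffusion_length_nm: float) -> float:
    """Fraction of the spin current that survives the channel: exp(-L_N / lambda_SN)."""
    if diffusion_length_nm <= 0:
        raise ValueError("spin diffusion length must be positive")
    return math.exp(-channel_length_nm / diffusion_length_nm)


# Assumed (illustrative) spin diffusion length; not taken from the thesis.
LAMBDA_SN_NM = 1000.0

for l_n in (100, 200, 300, 400, 500):
    factor = spin_attenuation(l_n, LAMBDA_SN_NM)
    print(f"L_N = {l_n} nm -> attenuation factor = {factor:.3f}")
```

The factor only captures the exponential decay along the channel; the full efficiency $P_{\text{eff}}$ in the compact model also depends on the interface and ground resistances.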

In conclusion, for the spin injection model in the circuit design, a shorter channel length $L_N$, a longer channel spin diffusion length $\lambda_{SN}$ and a larger tunnel conductance polarization $P_C$ lead to improved performance. For the tunnel barrier resistance area product $RA_C$, by contrast, a compromise value needs to be found by simulation to optimize the performance.

### 3.3.3 Inverter/Buffer simulation

The developed model was validated against two experimental results in Subsection 3.3.1. With this model, we can implement arbitrary logic/analog circuits. Fig. 3.11 shows the simulation results of the simplest Inverter/Buffer circuit. The values of the parameters used in this simulation are presented in Table 3.1. Applying a voltage source $V_{\text{write}}$ across the MTJ sets the MTJ state $S_{\text{in}}$, which is the input state of the Inverter/Buffer. To transport the information of the MTJ state, a charge current $I_{\text{inj}}$ of 1.9 $\text{mA}$ is applied through the junction between the MTJ free layer and the channel, polarizing a spin current $I_{\text{sinj}}$ of 912 $\mu\text{A}$ into the channel. A detected spin current $I_{\text{sdet}}$ of 868 $\mu\text{A}$, carrying the MTJ magnetization information through the channel, reaches the detector MTJ and switches its state if enough torque is applied. The switched state depends on the injector MTJ magnetization and on the polarity of the injection current $I_{\text{inj}}$. As shown in the figure, we suppose that a negative charge current realizes the Buffer function and a positive charge current realizes the Inverter function. This property can be used to design reconfigurable/programmable circuits. The delay, in this case, is 0.65 $\text{ns}$, considering the read time of the MTJ state and the injection current pulse.

Figure 3.11 – Simulation of ASL based Inverter/buffer. $V_{\text{write}}$, $S_{\text{in}}$, $I_{\text{inj}}$ and $S_{\text{out}}$ are the writing voltage, input state, injection current and output state, in Fig. 3.3. $I_{\text{sinj}}$ and $I_{\text{sdet}}$ are the injected and detected spin current, corresponding to $\text{outs}$ of $\text{Injector}$ and $\text{ins}$ of $\text{Detector}$ in Fig. 3.3.

### 3.4 Summary

In this chapter, we have presented the physical model of the ASL device: i) an MTJ model integrating the TMR and STT effects, and ii) a spin injection/detection model relying on extended Maxwell’s equations in the spin domain and integrating the spin injection/detection/accumulation effects, the spin diffusion effect and the channel breakdown current effect. Moreover, the temperature evaluation in the MTJ model and the scaling effects on thermal stability, critical current, switching time and interconnection are investigated. For a given injection current, the switching time, the channel diffusion delay and the detection current can be calculated. After validating the model against experimental results, we have investigated the dependence of the performance on device characteristics such as the width and the channel length. Furthermore, the model has been programmed in Verilog-A on Cadence and divided into several independent blocks, which allows the independent design of complex circuits and eases the design of the hierarchical circuits presented in the next chapter. In addition, the expressions for spin injection/detection are explored, which makes it possible to discuss the controversy surrounding the ASL experiments and provides a basis for circuit optimization.
Chapter 4
Circuit Design and Simulations

<table>
<thead>
<tr><th>Section</th><th>Title</th></tr>
</thead>
<tbody>
<tr><td>4.1</td><td>Background and Related Work</td></tr>
<tr><td>4.1.1</td><td>Majority principle</td></tr>
<tr><td>4.1.2</td><td>Circuit synthesis method</td></tr>
<tr><td>4.1.3</td><td>Benchmarking</td></tr>
<tr><td>4.2</td><td>Circuit Design Method</td></tr>
<tr><td>4.3</td><td>Logic Circuits Simulations and Evaluations</td></tr>
<tr><td>4.3.1</td><td>Basic logic circuit</td></tr>
<tr><td>4.3.2</td><td>Arithmetic logical functions</td></tr>
<tr><td>4.3.3</td><td>Data transmission</td></tr>
<tr><td>4.3.4</td><td>Arbitrary circuit</td></tr>
<tr><td>4.4</td><td>Circuit Benchmarking</td></tr>
<tr><td>4.5</td><td>Summary</td></tr>
</tbody>
</table>

The All Spin Logic (ASL) device is a promising technology owing to its potential for low-power and high-density computation. Thanks to its spin-based nature, it has the potential to replace conventional charge-based technologies such as CMOS. However, new circuits and architectures are needed, and designing them is challenging because of the numerous physical parameters to consider and the lack of tools. In this chapter, we propose a methodology for designing ASL-based circuits that specifies the circuit parameters at layout level for given materials and constraints. Using this methodology, a library of ASL logic circuits is defined, and the energy, area and delay of the circuits are estimated with the ASL compact model developed in Chapter 3.

### 4.1 Background and Related Work

In an ASL device, the spin currents combine according to the majority principle, so ASL-based circuit design relies on majority synthesis. In this section, we present two ASL-based circuit synthesis methods: the “truth table” method [154] for simple circuits, and the “AND/OR/Inverter (AOI) replacement” method [158, 259] for integrated circuits. We also introduce the benchmarking methodology [156] used to evaluate the designed circuits at the circuit and system levels.

### 4.1.1 Majority principle

As presented in 2.2.1, the information of the magnetization orientation is transmitted in the form of spin current. When multiple spin currents share a channel, they add or subtract depending on their polarization orientations (up-spin or down-spin). One spin direction dominates the state switching of the detector MTJ, which corresponds to the majority principle. Hence, ASL-based circuit design follows the majority principle.

Fig. 4.1 shows a 5-input majority gate, with five inputs: \( \text{In}_1, \text{In}_2, \text{In}_3, \text{In}_4, \text{In}_5 \) and one output: \( \text{Out} \). Each terminal state is initially written by a voltage source \( V_{\text{write}} \). An injection current \( I_{\text{inj}} \) is applied to each input terminal to transmit the magnetization orientation information. After the spin currents add to or subtract from one another, the cumulative spin current in the channel switches the output MTJ state if it is larger than the threshold current. Majority gates with a different number of inputs are implemented in the same way by adding or removing input terminals. Fig. 4.1 also presents the gate symbol with the terminals connected to the voltage and current sources: \( \text{inj}^+ (\text{P}) \) for positive current, \( \text{inj}^- (\text{N}) \) for negative current, and “0” meaning that no current is injected. In the following circuit designs, we use the symbols of the majority gates to illustrate the circuit implementations for simplicity.

![5-inputs majority gate with inputs In1, In2, In3, In4, In5, output Out and its symbol presentation.](image)
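The behavior described above can be sketched with a toy model in which every input contributes a signed spin current and the output switches only when the net current exceeds a threshold. This is a deliberately idealized illustration: attenuation, backflow and current division are ignored, a state of 1 is assumed to contribute a positive current, and the function name and numerical values are hypothetical.

```python
from typing import Sequence


def majority_gate(states: Sequence[int],
                  currents_ua: Sequence[float],
                  threshold_ua: float,
                  previous_output: int) -> int:
    """Toy N-input majority gate based on signed spin-current summation.

    states:          input MTJ states (0 or 1); 1 adds +I, 0 adds -I (assumed convention)
    currents_ua:     spin-current magnitude contributed by each input, in uA
    threshold_ua:    net current needed to switch the output MTJ, in uA
    previous_output: output state kept when the net current is too small
    """
    if len(states) != len(currents_ua):
        raise ValueError("one current per input state is required")
    net = sum(i if s == 1 else -i for s, i in zip(states, currents_ua))
    if abs(net) <= threshold_ua:
        return previous_output  # not enough torque: the output MTJ keeps its state
    return 1 if net > 0 else 0


# Three of five inputs at '1': the net current is +100 uA and the output becomes 1.
print(majority_gate([1, 1, 1, 0, 0], [100.0] * 5, 50.0, previous_output=0))
```

With equal input currents the gate reduces to a plain majority vote, provided that a single unbalanced input current already exceeds the threshold.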

### 4.1.2 Circuit synthesis method

ASL-based circuit design follows the majority principle: circuits must be synthesized with majority functions and implemented with majority gates. In this subsection, we use the example of an XOR3 circuit to present two synthesis methods: the “truth table” method, based on the truth table of the circuit function, and the “AOI replacement” method, which replaces the Boolean function of the circuit with combinations of the majority functions of basic circuits.

#### 4.1.2.1 "Truth table" method

The “truth table” method is used for simple circuits: it synthesizes the circuit from its transformed truth table. The procedure is as follows:

1. Make \( F \) a logically passive self-dual (lpsd) function \([154]\) by adding sufficient complements and constants so that for any two rows, \( r_i \) and \( r_j \), there exists a column \( x_k \) such that \( r_{ik} = F(r_i) \) and \( r_{jk} = F(r_j) \).

2. Obtain the unitized table (all \( F_s = 1 \)) for \( F \) by complementing every row where \( F \) is 0 and removing the \( F \)-column.

3. Eliminate any column whose removal does not violate the basic property that every pair of rows has a 1 in common.

4. If \( r_i < r_j \) (that is, if and only if \( r_j \) has 1’s everywhere that \( r_i \) does), remove row \( r_j \).

5. Realize the function with majority gates of arbitrary input count:

   - Select any column \( x_k \). Let \( Z \) be the number of zeros in this column, let the rows where \( x_k = 0 \) be \( r_1, r_2, \ldots, r_Z \), and let the number of units in these rows be \( u_1, u_2, \ldots, u_Z \).
   - Build a \((2u_1 - 1)\)-input majority gate. As inputs, use the \( u_1 \) variables whose columns have units in \( r_1 \), plus \( x_k \) on the remaining \( u_1 - 1 \) inputs. Call the output of this gate \( m_1 \).
   - Build a \((2u_2 - 1)\)-input majority gate. As inputs, use the \( u_2 \) variables whose columns have units in \( r_2 \), plus \( m_1 \) on the remaining \( u_2 - 1 \) inputs. Call the output of this gate \( m_2 \).
   - Continue in this manner until a “chain” of \( Z \) gates corresponding to \( r_1, r_2, \ldots, r_Z \) has been built. This chain of gates will realize the given function.

There may be some “interplay” between steps 3 and 4 in that the elimination of a column may permit more rows to be eliminated and vice-versa. Therefore, these two steps should be repeated successively until neither results in further elimination.
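Steps 1 and 2 of the procedure lend themselves to a short script. The sketch below is a minimal illustration: it checks the lpsd condition pair by pair and builds the unitized table, using the carry and sum outputs of a full adder (the example worked out in the following paragraphs) as test functions. Rows are numbered from 1 in counting order with \( A \) as the most significant bit, which is an assumption about the row ordering, and all function names are hypothetical.

```python
from itertools import combinations, product


def truth_rows(n_inputs, func):
    """Truth table in counting order: a list of (input bits, output) pairs."""
    return [(bits, func(*bits)) for bits in product((0, 1), repeat=n_inputs)]


def lpsd_violations(rows):
    """Step 1 check: 1-based row pairs (i, j) for which no column k satisfies
    r_ik = F(r_i) and r_jk = F(r_j)."""
    bad = []
    for (i, (ri, fi)), (j, (rj, fj)) in combinations(enumerate(rows, start=1), 2):
        if not any(a == fi and b == fj for a, b in zip(ri, rj)):
            bad.append((i, j))
    return bad


def unitize(rows):
    """Step 2: complement every row whose output is 0, then drop duplicate rows."""
    unit_rows = []
    for bits, out in rows:
        row = bits if out == 1 else tuple(1 - b for b in bits)
        if row not in unit_rows:
            unit_rows.append(row)
    return unit_rows


def carry(a, b, c_in):
    return int(a + b + c_in >= 2)


def parity(a, b, c_in):
    return (a + b + c_in) % 2


print(lpsd_violations(truth_rows(3, carry)))   # no violating pair: carry is lpsd
print(lpsd_violations(truth_rows(3, parity)))  # violating pairs: sum is not lpsd
print(unitize(truth_rows(3, carry)))           # unitized table of the carry
```

Column and row elimination (steps 3 and 4) are not covered by this sketch.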

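Table 4.1 – Truth table of the full-adder.

<table>
<thead>
<tr><th>A</th><th>B</th><th>$C_{in}$</th><th>$C_{out}$</th><th>$Sum$</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
<tr><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td></tr>
<tr><td>0</td><td>1</td><td>0</td><td>0</td><td>1</td></tr>
<tr><td>0</td><td>1</td><td>1</td><td>1</td><td>0</td></tr>
<tr><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td></tr>
<tr><td>1</td><td>0</td><td>1</td><td>1</td><td>0</td></tr>
<tr><td>1</td><td>1</td><td>0</td><td>1</td><td>0</td></tr>
<tr><td>1</td><td>1</td><td>1</td><td>1</td><td>1</td></tr>
</tbody>
</table>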

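Table 4.2 – Truth table of the full-adder with the added column $\overline{C_{out}}$.

<table>
<thead>
<tr><th>A</th><th>B</th><th>$C_{in}$</th><th>$C_{out}$</th><th>$\overline{C_{out}}$</th><th>$Sum$</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td></tr>
<tr><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td><td>1</td></tr>
<tr><td>0</td><td>1</td><td>0</td><td>0</td><td>1</td><td>1</td></tr>
<tr><td>0</td><td>1</td><td>1</td><td>1</td><td>0</td><td>0</td></tr>
<tr><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td></tr>
<tr><td>1</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td></tr>
<tr><td>1</td><td>1</td><td>0</td><td>1</td><td>0</td><td>0</td></tr>
<tr><td>1</td><td>1</td><td>1</td><td>1</td><td>0</td><td>1</td></tr>
</tbody>
</table>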

The full-adder (XOR3) circuit function is synthesized with the previous procedure, with two outputs \( C_{out} \) and \( \text{Sum} \).

1. Consider the truth table of the full-adder shown in Table 4.1. The first step is to verify whether each function is lpsd. Based on the theorem, \( C_{out} \) is lpsd, while \( \text{Sum} \) is not: the condition is violated by some pairs of rows, e.g. (2,3/4/5/6), (3,4/5/7), (4,6/7), (5,6/7) and (6,7). To make \( \text{Sum} \) lpsd, we add the inversion of \( C_{out} \), and the truth table becomes Table 4.2.

2. The second step replaces each row where the output is 0 by its complement. After removing the duplicate rows in these two tables, the final unitized tables for \( C_{out} \) and \( \text{Sum} \) are shown in Tables 4.3 and 4.4, respectively.

3. Considering steps 3 and 4 of the procedure: for \( C_{out} \), no column or row can be removed and Table 4.3 is the final unitized table; for \( \text{Sum} \), the only unit in the column \( C_{out} \) is in the 1st row and this case is covered by \( A/B/C_{in} \), thus the column \( C_{out} \) can be removed, as shown in Table 4.5.

4. Based on Tables 4.3 and 4.5, we synthesize their majority functions.

Table 4.3 – Unitized truth table for $C_{out}$.

<table>
<thead>
<tr>
<th>A</th>
<th>B</th>
<th>$C_{in}$</th>
<th>$C_{out}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Table 4.4 – Unitized truth table for $Sum$.

<table>
<thead>
<tr>
<th>A</th>
<th>B</th>
<th>$C_{in}$</th>
<th>$C_{out}$</th>
<th>$\overline{C_{out}}$</th>
<th>$Sum$</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
</table>

For $C_{out}$, we select the column $C_{in}$: the number of zeros in this column is 1, and the corresponding row contains 2 units. Hence, $C_{out}$ can be synthesized with one 3-input ($2 \times 2 - 1 = 3$) majority gate: $C_{out} = MAJ(A, B, C_{in})$.

For $Sum$, we select the column $\overline{C_{out}}$: the number of zeros in this column is 1, and the corresponding row contains 3 units. Hence, $Sum$ can be synthesized with one 5-input ($3 \times 2 - 1 = 5$) majority gate: $Sum = MAJ(A, B, C_{in}, \overline{C_{out}}, \overline{C_{out}})$.
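The two synthesized majority functions can be checked exhaustively against binary addition. The following sketch is purely a logic-level verification: it says nothing about the ASL implementation itself, and the helper names are hypothetical.

```python
from itertools import product


def maj(*bits):
    """Majority of an odd number of binary inputs."""
    if len(bits) % 2 == 0:
        raise ValueError("a majority gate needs an odd number of inputs")
    return int(sum(bits) > len(bits) // 2)


def full_adder_maj(a, b, c_in):
    """Full adder built from the two majority functions derived above."""
    c_out = maj(a, b, c_in)                     # C_out = MAJ(A, B, C_in)
    n_c_out = 1 - c_out                         # inverted carry
    s = maj(a, b, c_in, n_c_out, n_c_out)       # Sum = MAJ(A, B, C_in, ~C_out, ~C_out)
    return c_out, s


for a, b, c_in in product((0, 1), repeat=3):
    c_out, s = full_adder_maj(a, b, c_in)
    # The two output bits must encode the arithmetic sum of the three inputs.
    assert 2 * c_out + s == a + b + c_in
    print(a, b, c_in, "->", c_out, s)
```

The inverted carry enters the 5-input gate twice, which gives it a weight of two against the three primary inputs.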

#### 4.1.2.2 "AOI replacement" method

For a function with a complex truth table, the “truth table” method is not suitable. In this case, the function $F$ is synthesized with the “AOI replacement” method, which replaces the AND/OR/Inverter functions with the corresponding 3-input majority functions. The majority representation of $F$ can then be optimized using the novel Boolean algebra of majority and inverter operations. This algebra [158] is defined over the set $(\mathbb{B}, M, ', 0, 1)$, where $M$ is the majority operator of three variables and $'$ is the complementation operator. The following set of five primitive transformation rules, referred to as $\Omega$, is an axiomatic system for $(\mathbb{B}, M, ', 0, 1)$. All the variables considered hereafter belong to $\mathbb{B}$.

$$
\Omega \left\{
\begin{array}{ll}
\text{Commutativity} - \Omega.C & M(x, y, z) = M(y, x, z) = M(z, y, x) \\[4pt]
\text{Majority} - \Omega.M & \left\{ \begin{array}{l}
\text{if } (x = y):\ M(x, y, z) = x = y \\
\text{if } (x = y'):\ M(x, y, z) = z
\end{array} \right. \\[4pt]
\text{Associativity} - \Omega.A & M(x, u, M(y, u, z)) = M(z, u, M(y, u, x)) \\[4pt]
\text{Distributivity} - \Omega.D & M(x, y, M(u, v, z)) = M(M(x, y, u), M(x, y, v), z) \\[4pt]
\text{Inverter Propagation} - \Omega.I & M'(x, y, z) = M(x', y', z')
\end{array}
\right.
\tag{4.1}
$$

Several other, more complex rules, formally called theorems, in $(\mathbb{B}, M, ', 0, 1)$ are derived from $\Omega$ [158]. The following lists three particular rules ($\Psi$) for logic optimization. The symbol $z_{x/y}$ represents a replacement operation, i.e., replacing $x$ with $y$ in all its appearances in $z$.

$$
\Psi \left\{
\begin{array}{ll}
\text{Relevance} - \Psi.R & M(x, y, z) = M(x, y, z_{x/y'}) \\[4pt]
\text{Complementary Associativity} - \Psi.C & M(x, u, M(y, u', z)) = M(x, u, M(y, x, z)) \\[4pt]
\text{Substitution} - \Psi.S & M(x, y, z) = M(v, M(v', M_{v/u}(x, y, z), u), M(v', M_{v/u'}(x, y, z), u'))
\end{array}
\right.
\tag{4.2}
$$

Based on this novel Boolean algebra, the majority representation of a function $F$ can be transformed, with better figures of merit in terms of area, delay, and power.
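Because all variables are binary, the transformation rules can be sanity-checked by brute force. The sketch below is a minimal illustration under stated assumptions: for $\Psi.R$ the inner term is taken to be $M(x, u, v)$, and for $\Psi.S$ the function being substituted is taken to be $f(v) = M(v, y, z)$. Both choices are made only for this example, and the helper names are hypothetical.

```python
from itertools import product


def M(x, y, z):
    """Three-input majority operator."""
    return int(x + y + z >= 2)


def inv(x):
    """Complementation operator."""
    return 1 - x


def check_rules():
    """Exhaustively check the Omega axioms and three Psi theorems over {0, 1}."""
    for x, y, z, u, v in product((0, 1), repeat=5):
        # Omega.C: commutativity
        assert M(x, y, z) == M(y, x, z) == M(z, y, x)
        # Omega.M: majority
        if x == y:
            assert M(x, y, z) == x
        if x == inv(y):
            assert M(x, y, z) == z
        # Omega.A: associativity
        assert M(x, u, M(y, u, z)) == M(z, u, M(y, u, x))
        # Omega.D: distributivity
        assert M(x, y, M(u, v, z)) == M(M(x, y, u), M(x, y, v), z)
        # Omega.I: complementing the output equals complementing every input
        assert inv(M(x, y, z)) == M(inv(x), inv(y), inv(z))
        # Psi.R: inside the third operand, x may be replaced by y'
        assert M(x, y, M(x, u, v)) == M(x, y, M(inv(y), u, v))
        # Psi.C: complementary associativity
        assert M(x, u, M(y, inv(u), z)) == M(x, u, M(y, x, z))
        # Psi.S: substitution, applied to f(v) = M(v, y, z)
        f = lambda t: M(t, y, z)
        assert f(v) == M(v, M(inv(v), f(u), u), M(inv(v), f(inv(u)), inv(u)))
    return True


print(check_rules())
```

An exhaustive check of this kind confirms the identities only for the specific inner terms chosen here; the general statements are proved in [158].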

Fig. 4.2 shows the depth optimization of the XOR3 function with the two theorems $\Psi.S$ and $\Omega.M$ [158]. The majority-based XOR3 function is $f = MAJ(x, MAJ(x', y, z), MAJ(x', y', z'))$.

Figure 4.2 – XOR2/3 synthesized based on replacement method.
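The majority-based XOR3 expression can be verified against the usual exclusive-or over all eight input combinations. This is only a logic-level check of the expression quoted above; the function names are hypothetical.

```python
from itertools import product


def maj3(x, y, z):
    """Three-input majority."""
    return int(x + y + z >= 2)


def xor3_maj(x, y, z):
    """XOR3 as f = MAJ(x, MAJ(x', y, z), MAJ(x', y', z'))."""
    nx, ny, nz = 1 - x, 1 - y, 1 - z
    return maj3(x, maj3(nx, y, z), maj3(nx, ny, nz))


for x, y, z in product((0, 1), repeat=3):
    assert xor3_maj(x, y, z) == x ^ y ^ z
    print(x, y, z, "->", xor3_maj(x, y, z))
```

The expression uses three 3-input majority gates arranged in two logic levels, not counting the inverters.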

### 4.1.3 Benchmarking

In this subsection, we present the benchmarking method of ASL-based circuits in terms of area, power, and computational throughput [156]. The CMOS auxiliary circuits for voltage and injection current sources are not considered.

#### 4.1.3.1 Area

Based on [156], semiconductor process generations are labeled by a characteristic lithography size, the DRAM half-pitch $F$. The metal-1 pitch of the contacted transistor is assumed to be $p_m = 8\lambda = 4F$, where $\lambda$ is the maximum mask misalignment. Since ASL-based circuits rely on the majority principle, we approximate the length of a majority gate by:

$$l_{maj} = 2p_m \quad (4.3)$$

and the area of a majority gate is:

$$a_{maj} = l_{maj}^2 \quad (4.4)$$

The area of one circuit is determined by the number of majority gates required:

$$a = N_{maj}a_{maj}M_{bit} \quad (4.5)$$

where $M_{bit}$ is the bit area overhead.

#### 4.1.3.2 Energy consumption

The energy for ASL-based circuits is calculated as:

$$E = I_{inj}^2 R t \quad (4.6)$$

where $I_{inj}$ and $t$ are the injection current amplitude and pulse width, $R$ is the real circuit resistance.

#### 4.1.3.3 Computational throughput

The computational throughput is a measure of the useful work performed by a circuit and is defined as the number of integer operations per second per unit area. We estimate it as:

$$T_{throughput} = \frac{1}{at} \quad (4.7)$$

where $a$ and $t$ are the area and delay of the circuit, respectively.
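Equations 4.3 to 4.7 translate directly into a few helper functions. The sketch below is only an illustration of the arithmetic: the half-pitch, gate count, bit overhead and resistance used in the example call are placeholder assumptions rather than values from this work, and the function names are hypothetical.

```python
def majority_gate_area_nm2(half_pitch_nm: float) -> float:
    """Area of one majority gate, Eqs. 4.3 and 4.4."""
    p_m = 4.0 * half_pitch_nm   # metal-1 pitch, p_m = 8*lambda = 4F
    l_maj = 2.0 * p_m           # Eq. 4.3
    return l_maj ** 2           # Eq. 4.4


def circuit_area_nm2(n_maj: int, half_pitch_nm: float, m_bit: float = 1.0) -> float:
    """Circuit area, Eq. 4.5: number of gates x gate area x bit area overhead."""
    return n_maj * majority_gate_area_nm2(half_pitch_nm) * m_bit


def energy_j(i_inj_a: float, resistance_ohm: float, pulse_width_s: float) -> float:
    """Energy of one injection pulse, Eq. 4.6: E = I_inj^2 * R * t."""
    return i_inj_a ** 2 * resistance_ohm * pulse_width_s


def throughput(area: float, delay_s: float) -> float:
    """Computational throughput, Eq. 4.7: operations per second per unit area."""
    return 1.0 / (area * delay_s)


# Example with assumed values: F = 15 nm, 2 gates, 1 mA through 100 ohm for 1 ns.
area = circuit_area_nm2(n_maj=2, half_pitch_nm=15.0)
print(area, energy_j(1e-3, 100.0, 1e-9), throughput(area, 1e-9))
```

The throughput inherits the units of its arguments: with the area in nm² and the delay in seconds, it is expressed in operations per second per nm².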
### 4.2 Circuit Design Method

In this section, we present the ASL-based circuit design methodology, from the synthesis of the circuit with majority functions, to the specifications of circuit parameters with the implementation based on the ASL compact model for the circuit layout.

Fig. 4.3 illustrates the circuit design methodology with 4 successive steps:

![Diagram of the Circuit Design Methodology]

**Figure 4.3 – Circuit design methodology based on ASL device.**

**Step 1:** Specification of i) the Boolean functions and requirements of the circuit, and ii) ASL device parameters.

The circuit requirement specifications allow defining performance objectives such as power consumption, speed, and area. The ASL device parameters are technology-dependent and are thus determined by the considered materials.

**Step 2:** Circuit function synthesis.

The ASL-based circuits are designed according to the majority principle. Hence, the circuit to be implemented is specified as majority functions by using the majority synthesis methods presented in the previous section. Basic logic circuits are synthesized with the “truth table” method, whereas large-scale integrated circuits are synthesized with the “replacement” method by using the basic circuit library. Moreover, since the spin current attenuates in the channel, buffers need to be inserted in the channel to guarantee the functionality of the circuit. The number of buffers inserted is related to the number of logic gates, which must therefore be counted in this step for the buffer calculation.

**Step 3:** Search for injection currents and device parameters satisfying the system constraints.

We explore the technological parameters and the corresponding injection current in order to meet the system constraints. If the constraints are met and the circuits are optimized, the parameters are exported to Step 4 for system implementation. In this step, we explore the channel lengths and the MTJ width and length; the other parameters are specified in Step 1 together with the materials.

**MTJ width/length:** the MTJ width and length are first specified based on the fabrication technology. In this work, we assume a minimum value of 5 nm for both the width and the length, which allows investigating the scaling of the device.

**Channel length:** we explore the multi-channel problem in this step.

Theoretically, the smaller the channel length $L_N$, the better the circuit performance. However, if $L_N$ is too small, a dipolar-coupling effect occurs [51], which reduces the spin current injection and increases the MTJ switching time. Hence, a minimal value for $L_N$ needs to be assumed in order to ensure that there is no dipolar coupling. Following the conclusions from the OOMMF simulations [51] [258], we assume the minimum value $L_N = 10 + W$ nm. Furthermore, circuits such as XOR (see Fig. 4.12 (a)) involve channel fork and join junctions, which lead to a multi-channel design problem. To solve this problem, the length of each channel must be defined. Since the channel lengths influence the spin current division and diffusion in the circuit, they have a direct impact on performance. As defined in Section 3.1.2.2, the circuit can be optimized by increasing the non-local resistance $R_{NL}(\Delta R_{NL})$ and the injection/detection efficiency $P_{inj/eff}$.

**Injection current:** For a given MTJ width and length and for given multi-channel lengths, injection currents have to be defined in order to ensure circuit functionality and to evaluate performance. The injection currents are defined by taking into account the input weights in the circuit majority functions, the influence of the current division and diffusion, and the channel breakdown current. For instance, for the majority functions (4.10) of XOR2/3, we set the same injection current for the three input terminals since the weights for $I_{in1}$, $I_{in2}$, and $I_{in3}$ are the same. However, for some terminals, the injection currents do not follow the weight ratios between the terminals because of the current division and the backflow current caused by spin diffusion. For instance, the injection current for $M1$ to reach $Out$ is not twice $I_{injIn1/In2/In3}$, because of the current division in fork $P2$. Moreover, the total spin current injected into the channel should remain below the channel breakdown current to prevent channel damage. Hence, simulations are carried out to determine the injection currents.

For the injection current specification, we do not take into account the current ratio margins that result from the nature of the majority principle. In a majority-based circuit, the output state depends on the predominant spin magnetization orientation. The current ratios may vary within a certain range as long as the predominant spin magnetization orientation does not change in any case. Taking the XOR circuit (see Fig. 4.12 (a)) as an example, state “1” on $M1$ is reached by setting $In1$, $In2$, $In3$ to “011”. Ideally, the injection current should be the same for the three inputs. However, if we set, for instance, $I_{injIn1} = 0.8 \times I_{injIn2} = 0.8 \times I_{injIn3}$, the state of $M1$ remains “1” as long as the detected current $I_{det}$ exceeds the critical current $I_{c1}$. Hence, the injection current ratios have margins, which depend on the dimensional parameters, the injection current $I_{inj}$ and the critical current $I_{c1}$.
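The notion of a current ratio margin can be illustrated with an idealized weighted-sum model: the majority function is preserved as long as, for every input pattern, the sign of the net current matches the ideal majority and its magnitude exceeds the switching threshold. The sketch below normalizes all currents to the nominal injection current and ignores attenuation, current division and backflow, so the numbers are illustrative assumptions only; the function name is hypothetical.

```python
from itertools import product


def majority_preserved(weights, threshold=0.0):
    """Check that unequal input currents still implement the majority function.

    weights:   relative spin-current magnitude of each input (nominal value 1.0)
    threshold: normalized net current required to switch the output MTJ
    """
    for bits in product((0, 1), repeat=len(weights)):
        # A '1' input adds its current, a '0' input subtracts it (assumed convention).
        net = sum(w if b else -w for w, b in zip(weights, bits))
        ideal = sum(bits) > len(bits) / 2
        if abs(net) <= threshold or (net > 0) != ideal:
            return False
    return True


print(majority_preserved((0.8, 1.0, 1.0)))                  # 20 % mismatch on In1
print(majority_preserved((2.5, 1.0, 1.0)))                  # In1 overrides the others
print(majority_preserved((0.8, 1.0, 1.0), threshold=0.9))   # threshold too high
```

In this simplified picture the margin shrinks as the threshold grows, which mirrors the dependence on the critical current $I_{c1}$ described above.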

**Buffer count:** The buffer count depends on the buffer channel length. A longer channel length reduces the buffer count, but a higher charge current is then needed to compensate for the diffusion loss in the channel. Hence, a compromise channel length needs to be found to optimize the performance. We will discuss this optimization in the next chapter.

**Step 4:** Implement the circuit/system.

According to the technological parameters and variables calculated in Step 3, the optimized system is implemented.

### 4.3 Logic Circuits Simulations and Evaluations

In this section, based on the developed methodology, we implemented the basic logic circuits and the combinational logic circuits (Fig. 4.4). Their architectures are presented, specifying the dimensional parameters. Their functional behaviors are simulated and verified based on the ASL compact model. Moreover, the performance is evaluated for high-level circuits and system evaluation.

![Diagram of basic logic circuits and combinational logic circuits](image)

**Figure 4.4 – Basic logic circuits and combinational logic circuits.**

#### 4.3.1 Basic logic circuit

The three main basic logic circuits, Inverter/Buffer, AND/NAND/OR/NOR and XOR/XNOR, are implemented and analyzed in this subsection. They can be combined for integrated circuit design.

##### 4.3.1.1 Inverter/Buffer

- **Architecture:**

  The inverter/buffer is realized with a simple ASL device, with one input terminal and one output terminal, as shown in Fig. 4.5(a).

  As shown in Table 4.6, the inverter or buffer function is selected by the polarity of the injection current. A positive injection current, flowing from the MTJ free layer to the channel, injects spins whose magnetization orientation is opposite to that of the MTJ free layer, realizing the inversion; a negative injection current, on the contrary, realizes the buffer function. Fig. 4.5 (c) shows the functional symbol of the inverter/buffer structure with \(In\) as the input, \(Out\) as the output and \(I_{\text{inj}}\) as the control signal.

Figure 4.5 – (a) Inverter/Buffer architecture, In as the input and Out as the output. Positive current: flowing from the MTJ free layer to the channel, induces an opposite spin magnetization orientation, realizing the inversion; on the contrary, negative current realizes the buffer function. (b) Vertical view of the architecture with the channel length. (c) Functional symbol of the inverter/buffer, In as the input and Out as the output, I_{inj} as the control signal.

Table 4.6 – Reconfigurable Functions Based on inverter/buffer architecture.

<table>
<thead>
<tr>
<th>Function</th>
<th>I_{inj}</th>
</tr>
</thead>
<tbody>
<tr>
<td>Buffer</td>
<td>N</td>
</tr>
<tr>
<td>Inverter</td>
<td>P</td>
</tr>
</tbody>
</table>

Table 4.7 – ASL channel length and injection current parameters.

<table>
<thead>
<tr>
<th>Parameters</th>
<th>Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>L_N</td>
<td>100 nm</td>
</tr>
<tr>
<td>I_{inj}</td>
<td>1.9 mA</td>
</tr>
</tbody>
</table>

- Parameters:
  The parameters that need to be set for this architecture are the channel length and the injection current. As shown in Table 4.7, the channel length must be long enough to guarantee that there is no dipolar coupling between the input and output magnets. In our case, we set the channel length to 100 nm. The injection current must be chosen so that the channel does not break down. With the parameters in Table 3.1 and a channel length of 100 nm, simulation gives a maximum injection current of 1.9 mA.

- Timing: The input In state is written by applying a voltage/current source across the MTJ, with a “write delay” of T_{write}. Then an injection current I_{inj} is injected through the MTJ free layer into the channel, producing a spin current that diffuses towards the detector MTJ with a “propagation delay” of T_{diff}. The spin current entering the detector switches the MTJ state if it is larger than the threshold current, with a “switching delay” of T_{switch}. A reading voltage/current is finally applied to read the switched state, with a “read delay” of T_{read}. The critical delay of this inverter/buffer circuit is T_{write} + T_{diff} + T_{switch} + T_{read}.

- Simulation:
  Figs. 4.6 and 4.7 show the simulation results of the inverter/buffer obtained with the ASL compact model for two injection current values: I_{inj} = 1.9 mA and I_{inj} = 697 μA. These two simulations verify the functions of this architecture and give the corresponding delays.

Figure 4.6 – Simulation of ASL based inverter/buffer with maximum injection current $I_{inj} = 1.9 \text{ mA}$.

Figure 4.7 – Simulation of ASL based inverter/buffer with injection current $I_{inj} = 697 \mu\text{A}$.

##### 4.3.1.2 AND/OR(NAND/NOR)

**AND/OR/NAND/NOR 2**

- Architecture:
  The AND/OR/NAND/NOR2 functions can be implemented with the same majority function (Eq. 4.8), the 3-input majority function, and with the same architecture, the 3-input majority gate MAJ3, as shown in Fig. 4.8 (a). $In1$ and $In2$ are the input terminals, $F$ the control terminal and $Out$ the output terminal. The realized functions are modified with different control terminal states and different injection current polarities, listed in Table 4.8.

Figure 4.8 – (a) 2-bit AND/OR/NAND/NOR architecture, $In1$ and $In2$ as the inputs, $F$ as the control terminal and $Out$ as the output. Different injection current polarities and $F$ states lead to different functions. (b) Vertical view of the architecture with the channel length $L1$ and $L2$. (c) Functional symbol of the 2-bit AND/OR/NAND/NOR. (d) Spin injection efficiency $P_{eff}$ vs. Channel distribution of this architecture.

$$Function = Maj(In1, In2, F) \tag{4.8}$$
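
As an illustration of Eq. 4.8, the following sketch enumerates the four configurations of the MAJ3 gate at the purely logical level. It assumes the usual Boolean convention, in which $Maj(In1, In2, 0)$ reduces to AND and $Maj(In1, In2, 1)$ to OR, and it models the injection current polarity as in the inverter/buffer (`"N"` buffers, `"P"` inverts). The function names are hypothetical and no device physics is represented.

```python
from itertools import product


def maj(*bits):
    """Majority of an odd number of bits."""
    return 1 if sum(bits) > len(bits) // 2 else 0


def maj3_gate(in1, in2, f, polarity):
    """Eq. 4.8 followed by the polarity rule: 'N' buffers, 'P' inverts."""
    out = maj(in1, in2, f)
    return out if polarity == "N" else 1 - out


# Truth table of each (F, polarity) configuration over inputs 00, 01, 10, 11.
configs = {
    (f, pol): [maj3_gate(a, b, f, pol) for a, b in product((0, 1), repeat=2)]
    for f in (0, 1)
    for pol in ("N", "P")
}
for (f, pol), table in configs.items():
    print(f"F={f}, I_inj={pol}: {table}")
```

Each of the four printed truth tables corresponds to one of the AND2, OR2, NAND2 and NOR2 functions, so a single structure covers all four gates.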

- Parameter:

As shown in Fig. 4.8 (b), the channel distribution, i.e. the two channel lengths $L1$ and $L2$, needs to be determined by simulation. The total channel length is set to 100 nm. We simulated the dependence of the spin detection efficiency $P_{eff}$ (Eq. 3.27 in Section 3.1.2.2) on the channel length ratio $L_1/(L_1+L_2)$. Fig. 4.8 (d) shows that a larger ratio leads to a larger spin detection efficiency $P_{eff}$. Hence, in the following simulations, we set $L1$ to 70 nm and $L2$ to 30 nm, taking into account the dipolar interaction between the magnets. The injection current is set by simulation to 697 $\mu\text{A}$, the value at which the channel current reaches the breakdown current (Table 4.9).

Table 4.8 – Reconfigurable Functions Based on AND/OR/NAND/NOR2 architecture.

<table>
<thead>
<tr>
<th>Function</th>
<th>F</th>
<th>( I_{inj} )</th>
</tr>
</thead>
<tbody>
<tr>
<td>AND2</td>
<td>0</td>
<td>N</td>
</tr>
<tr>
<td>OR2</td>
<td>1</td>
<td>N</td>
</tr>
<tr>
<td>NAND2</td>
<td>0</td>
<td>P</td>
</tr>
<tr>
<td>NOR2</td>
<td>1</td>
<td>P</td>
</tr>
</table>

Table 4.9 – ASL channel distribution and injection current parameters.

<table>
<thead>
<tr>
<th>Parameters</th>
<th>Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>( L_1 )</td>
<td>70 nm</td>
</tr>
<tr>
<td>( L_2 )</td>
<td>30 nm</td>
</tr>
<tr>
<td>( I_{inj} )</td>
<td>697 ( \mu A )</td>
</tr>
</tbody>
</table>

- Timing:
  Since the 3-input majority gate is a one-step function, its timing is the same as that of the inverter/buffer, the MTJ states of the three inputs being written with the same current.

- Simulation:
  The simulation result is shown in Fig. 4.9, which verifies the proposed functions of this architecture: AND/OR/NAND/NOR2. With an injection current of 697 \( \mu A \), the average delay of this architecture is 0.87 ns.

![Figure 4.9 – Function simulation of 2-bit AND/OR/NAND/NOR.](image)

The backflow problem can be observed by comparing the spin detection currents \( I_{det} \) in Figs. 4.9 and 4.7. With the same injection current \( I_{inj} = 697 \mu A \), the detected spin current \( I_{det} \) is \( \sim 292.6 \mu A \) in the inverter/buffer, which has only one input, whereas the average detected current is \( \sim 266 \mu A \) in the AND/OR/NAND/NOR2 circuit, which has three inputs. This means that the spin current flows not only into the output magnet but also into the other magnets, which demonstrates the backflow phenomenon.
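
The size of this effect can be quantified directly from the two detected currents quoted above. The short calculation below uses only those two values; the variable names are illustrative.

```python
I_DET_SINGLE = 292.6e-6   # inverter/buffer, one input (Fig. 4.7), in amperes
I_DET_MAJ3 = 266e-6       # AND/OR/NAND/NOR2, three inputs (Fig. 4.9, average)

# Fraction of the detected current lost when two more magnets share the channel.
relative_drop = (I_DET_SINGLE - I_DET_MAJ3) / I_DET_SINGLE
print(f"relative drop in I_det: {relative_drop:.1%}")
```

With these figures the detected current drops by roughly 9 % at identical injection current.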

**AND/OR/NAND/NOR 3**

- Architecture
  Similarly to the AND/OR/NAND/NOR2 architecture, the AND/OR/NAND/NOR3 functions can be implemented with the 5-input majority function (Eq. 4.9) and with the MAJ5 architecture shown in Fig. 4.10 (a). \( In_1, In_2 \) and \( In_3 \) are the input terminals, \( F1 \) and \( F2 \) the control terminals and \( Out \) the output terminal. The realized functions are configured with different control terminal states and different injection current polarities, listed in Table 4.10.

\[ \text{Function} = \text{Maj}(\text{In}1, \text{In}2, \text{In}3, F1, F2) \tag{4.9} \]

- Parameter

As shown in Fig. 4.10 (b), the channel distribution, i.e. the two channel lengths \( L1 \) and \( L2 \), needs to be determined by simulation. The total channel length is set to 100 nm. We simulated the dependence of the spin detection efficiency \( P_{eff} \) on the channel length ratio \( \frac{L1}{L1+L2} \). Fig. 4.10 (d) shows that a larger ratio leads to a larger spin detection efficiency \( P_{eff} \). Hence, in the following simulations, we set \( L1 \) to 70 nm and \( L2 \) to 30 nm, taking into account the dipolar interaction between the magnets. The injection current is set by simulation to 455 μA, the value at which the channel current reaches the breakdown current (Table 4.11).

Table 4.10 – Reconfigurable Functions Based on 5-input majority gate architecture.

<table>
<thead>
<tr>
<th>Function</th>
<th>In3</th>
<th>F1</th>
<th>F2</th>
<th>( I_{inj} )</th>
</tr>
</thead>
<tbody>
<tr>
<td>AND3</td>
<td>X</td>
<td>0</td>
<td>0</td>
<td>N</td>
</tr>
<tr>
<td>OR3</td>
<td>X</td>
<td>1</td>
<td>1</td>
<td>N</td>
</tr>
<tr>
<td>NAND3</td>
<td>X</td>
<td>0</td>
<td>0</td>
<td>P</td>
</tr>
<tr>
<td>NOR3</td>
<td>X</td>
<td>1</td>
<td>1</td>
<td>P</td>
</tr>
<tr>
<td>AND2</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>N</td>
</tr>
<tr>
<td></td>
<td></td>
<td>1</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>OR2</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>N</td>
</tr>
<tr>
<td></td>
<td></td>
<td>1</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>NAND2</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>P</td>
</tr>
<tr>
<td></td>
<td></td>
<td>1</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>NOR2</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>P</td>
</tr>
<tr>
<td></td>
<td></td>
<td>1</td>
<td>0</td>
<td></td>
</tr>
</tbody>
</table>
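
The configurations of Table 4.10 can be checked at the logical level with a short exhaustive test. The sketch below only evaluates Eq. 4.9 as a Boolean function; it assumes that a `"P"` injection current inverts the result, as for the inverter/buffer, and the helper names are hypothetical.

```python
from itertools import product


def maj(*bits):
    """Majority of an odd number of bits."""
    return 1 if sum(bits) > len(bits) // 2 else 0


def maj5_gate(in1, in2, in3, f1, f2, polarity="N"):
    """Eq. 4.9; a 'P' injection current inverts the result."""
    out = maj(in1, in2, in3, f1, f2)
    return out if polarity == "N" else 1 - out


# 3-bit functions: F1 = F2 = 0 gives AND3, F1 = F2 = 1 gives OR3.
and3_ok = all(maj5_gate(a, b, c, 0, 0) == (a & b & c)
              for a, b, c in product((0, 1), repeat=3))
or3_ok = all(maj5_gate(a, b, c, 1, 1) == (a | b | c)
             for a, b, c in product((0, 1), repeat=3))

# 2-bit functions: F1 != F2 cancel each other and In3 selects AND or OR.
and2_ok = all(maj5_gate(a, b, 0, f1, 1 - f1) == (a & b)
              for a, b, f1 in product((0, 1), repeat=3))
or2_ok = all(maj5_gate(a, b, 1, f1, 1 - f1) == (a | b)
             for a, b, f1 in product((0, 1), repeat=3))
print(and3_ok, or3_ok, and2_ok, or2_ok)
```

The 2-bit cases show why either assignment of opposite states to $F1$ and $F2$ is acceptable: the two control inputs cancel, and the gate behaves as a MAJ3 gate with $In3$ as control terminal.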

Table 4.11 – ASL channel distribution and injection current parameters.

<table>
<thead>
<tr>
<th>Parameters</th>
<th>Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>( L1 )</td>
<td>70 nm</td>
</tr>
<tr>
<td>( L2 )</td>
<td>30 nm</td>
</tr>
<tr>
<td>( I_{inj} )</td>
<td>455 μA</td>
</tr>
</tbody>
</table>

- Timing:

As a one-step majority function, its timing is the same as that of the inverter/buffer and MAJ3, the MTJ states of the five inputs being written with the same current.

- Simulation:

The simulation result is shown in Fig. 4.11, which verifies the proposed functions of this architecture: AND/OR/NAND/NOR3. The AND/OR/NAND/NOR2 functions have not been simulated again: they are obtained with different In3 states and different injection current polarities while the injection currents into F1 and F2 cancel each other, and the corresponding result has already been presented in Fig. 4.9. With an injection current of 455 $\mu$A, the average delay of this architecture is 1.25 ns.

![Figure 4.11 – Function simulation of 3-bit AND/OR/NAND/NOR.](image)

##### 4.3.1.3 XOR/XNOR2/3

**Architecture**

As presented in Section 4.1.2, the XOR3 function can be synthesized with two different methods: the “truth table” and the “replacement” methods. The corresponding majority functions are given in Eqs. 4.10 and 4.11, with $In1/In2/In3$ as the three inputs and Out as the output:

\[
M1 = In1 \cdot In2 + In3(In1 \oplus In2) = Maj(In1, In2, In3)
\]

\[
Out = (In1 \oplus In2) \oplus In3 = Maj(In1, In2, In3, \overline{M1}, \overline{M1}) \tag{4.10}
\]

\[
M1 = Maj(In1, In2, In3)
\]

\[
M2 = Maj(\overline{In1}, In2, In3)
\]

\[
Out = Maj(\overline{M1}, M2, In1) \tag{4.11}
\]

The two architectures $XOR_{TT}$ and $XOR_{rep}$ are illustrated in Fig. 4.12 (a) and (b).
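
Both decompositions can be verified exhaustively at the Boolean level. The sketch below evaluates Eqs. 4.10 and 4.11 for the eight input combinations; it ignores injection currents and weights other than the doubled $\overline{M1}$ input, and the function names are hypothetical.

```python
from itertools import product


def maj(*bits):
    """Majority of an odd number of bits."""
    return 1 if sum(bits) > len(bits) // 2 else 0


def xor_tt(in1, in2, in3):
    """Eq. 4.10: M1 first, then a 5-input majority using NOT M1 twice."""
    m1 = maj(in1, in2, in3)
    out = maj(in1, in2, in3, 1 - m1, 1 - m1)
    return m1, out


def xor_rep(in1, in2, in3):
    """Eq. 4.11: two MAJ3 gates in stage 1, one MAJ3 gate in stage 2."""
    m1 = maj(in1, in2, in3)
    m2 = maj(1 - in1, in2, in3)
    out = maj(1 - m1, m2, in1)
    return m1, m2, out


for a, b, c in product((0, 1), repeat=3):
    assert xor_tt(a, b, c)[1] == a ^ b ^ c
    assert xor_rep(a, b, c)[2] == a ^ b ^ c

# With In3 = 0 the same circuits give XOR2.
xor2_ok = all(xor_tt(a, b, 0)[1] == a ^ b and xor_rep(a, b, 0)[2] == a ^ b
              for a, b in product((0, 1), repeat=2))
print(xor2_ok)
```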

As presented in the Inverter/Buffer subsection, the XNOR3 function can be configured with the same architecture as XOR3, except that the injection current polarity is opposite. The XOR/XNOR2 functions are obtained by setting one input (e.g. $In3$) to 0.

**Parameter**

Fig. 4.12 (a) also shows the channel distribution of the $XOR_{TT}$, whose default values are listed in Table 4.13. The injection current of the three inputs is set to 700 $\mu$A, which leads to the maximum channel current. Considering the backflow and current division problems caused by spin diffusion, the injection current of M1 used to generate Out is determined by simulation and set to 637 $\mu A$, which guarantees that the weight of M1 is twice that of $In1/In2/In3$ for Out. For the $XOR_{rep}$, the channel lengths and the injection currents are the same as those of the 3-input majority gate. The functional symbol is shown in Fig. 4.12 (c), with the achievable functions of these two architectures. Table 4.12 lists the configured functions of the $XOR_{TT}$ circuit.

Figure 4.12 – (a) “Truth table” method based XOR/XNOR2/3 circuit $XOR_{TT}$: $In1/2/3$ as three inputs, M1 as the intermediate terminal, Out as the final output. (b) “Replacement” method based XOR/XNOR2/3 circuit $XOR_{rep}$, $I_{inj1} = -I_{inj2}$. This architecture can also realize the functions of full-adder and full-subtractor, with the other two intermediate outputs: M1’ as the inversion of the output carry of the full-adder, M2 as the output borrow of the full-subtractor. (c) Functional symbol of the XOR/XNOR2/3 circuit: In1/2/3 as three inputs, Out as final output for XOR/XNOR2/3, M1 as the output carry for the full-adder, M2 as the output borrow for the full-subtractor, which is only available with the architecture in (b).

Table 4.12 – Configured functions of the $XOR_{TT}$ circuit.

<table>
<thead>
<tr>
<th>Function (M1)</th>
<th>Function (Out)</th>
<th>$In3$</th>
<th>$I_{inj}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>$In1In2$</td>
<td>$In1 \oplus In2$</td>
<td>0</td>
<td>N</td>
</tr>
<tr>
<td>$In1In2$</td>
<td>$In1 \oplus In2$</td>
<td>0</td>
<td>P</td>
</tr>
<tr>
<td>$In1 + In2$</td>
<td>$In1 \oplus In2$</td>
<td>1</td>
<td>N</td>
</tr>
<tr>
<td>$In1 + In2$</td>
<td>$In1 \oplus In2$</td>
<td>1</td>
<td>P</td>
</tr>
<tr>
<td>Maj($In1, In2, In3$)</td>
<td>XOR3</td>
<td>#</td>
<td>N</td>
</tr>
<tr>
<td>Maj($In1, In2, In3$)</td>
<td>XNOR3</td>
<td>#</td>
<td>P</td>
</tr>
</tbody>
</table>

Table 4.13 – Channel distribution and injection current parameters of the $XOR_{TT}$ circuit.

<table>
<thead>
<tr>
<th>Parameters</th>
<th>Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>$L_1$</td>
<td>50 nm</td>
</tr>
<tr>
<td>$L_2$</td>
<td>25 nm</td>
</tr>
<tr>
<td>$L_3$</td>
<td>25 nm</td>
</tr>
<tr>
<td>$I_{inj1}$</td>
<td>700 $\mu A$</td>
</tr>
<tr>
<td>$I_{inj2}$</td>
<td>637 $\mu A$</td>
</tr>
</tbody>
</table>

- **Timing**
  - $XOR_{TT}$: Based on its majority functions (4.10), the circuit operates in two steps. In the first step, the states of the three inputs $In1$, $In2$ and $In3$ are written and injected to obtain the intermediate output $M1$; in the second step, currents are injected into $In1$, $In2$, $In3$ and $M1$ to obtain $Out$. Both $M1$ and $Out$ are read once the whole calculation is finished.
  - $XOR_{rep}$: This architecture contains two 3-input majority gates in stage 1 and one 3-input majority gate in stage 2. The states of the initial inputs are written first, including the In1 of stage 2. Currents are then injected into the inputs of stage 1 to obtain M1 and M2. With the next current injection, into M1, M2 and the In1 of stage 2, Out can be switched. The voltage source is then applied to Out to read its state; M1 and M2 can also be read for the full-adder/subtractor functions.

- **Simulation**

Fig. 4.13 shows the simulation results of the XOR<sub>TT</sub> circuit, which verify the functional behavior of our design. Compared with the 3-input majority gate based XOR/XNOR, this circuit has a larger delay (7.75 ns vs. 1.77 ns), a smaller area (0.08 vs. 0.12 μm<sup>2</sup>) and consumes more energy (0.81 vs. 0.244 nJ).

![Figure 4.13 – Function simulation of XOR<sub>TT</sub> circuit.](image)

#### 4.3.2 Arithmetic logical functions

An arithmetic logic circuit is a combinational digital electronic circuit that performs arithmetic and bitwise operations on integer binary numbers. In this subsection, we present the design of five circuits: adder, subtractor, comparator, multiplier and an Arithmetic Logical Unit (ALU).

##### 4.3.2.1 Adder

**Full-adder** A binary full adder performs the arithmetic sum of three input bits. It has three inputs: two input variables representing the two significant bits to be added, labeled A and B, and a third input that is the carry from the previous lower significant position, labeled C<sub>in</sub>. The two outputs are a sum and a carry output, labeled Sum and C<sub>out</sub> respectively. The Boolean expressions of a full-adder are as follows:

\[
\begin{aligned}
C_{out} &= A \cdot B + C_{in} (A \oplus B) = \text{Maj}(A, B, C_{in}) \\
\text{Sum} &= (A \oplus B) \oplus C_{in}
\end{aligned} \tag{4.12}
\]

We can see that \( C_{out} \) and \( \text{Sum} \) can be realized with a 3-input majority gate and an XOR3 gate, i.e. with the structures in Fig. 4.12 based on the majority functions 4.10 and 4.11. The injection currents and the channel length parameters are the same as for these two structures.

![Figure 4.14 - 4-bit adder implementations. (a) series implementation; (b) parallel implementation; (c) 4-bit adder functional symbol.](image)

**Multi-bit adder**  
A single full-adder performs the addition of two one-bit numbers and an input carry; to add binary numbers with more than one bit, a multi-bit adder is needed. We use a 4-bit adder to present the ASL-based multi-bit adder design; 16-bit and 32-bit adders can then be constructed by cascading several 4-bit adders. Fig. 4.14 shows a serial and a parallel 4-bit adder implementation and the functional symbol of the 4-bit adder. The two binary numbers to be added are \( A = A_3A_2A_1A_0 \) and \( B = B_3B_2B_1B_0 \).

**Series Adder**  
Fig. 4.14 (a) shows the serial adder implementation. Four full-adders are cascaded to produce the result, which is known as a Ripple Carry Adder: the carry output of each full-adder stage is connected to the carry input of the next higher-order stage. Enough time must be allowed for the signals to propagate through all the stages before the correct output is produced. The delay of the 4-bit serial adder is \( T_{\text{write}} + 4 \times T_{C_{out}} + T_{\text{Sum}} + T_{\text{read}} \), where \( T_{\text{write/read}} \) is the delay of writing/reading an MTJ state and \( T_{C_{out}/\text{Sum}} \) is the delay needed to switch the state of one carry/sum. The number of MTJs is four times that of a full-adder \( (4 \times 5 = 20) \).

Fig. 4.15 illustrates the simulation results of the 4-bit serial adder with the injection currents \( I_{\text{inj}1} = 700 \mu A \), which leads to the maximum spin current in the channel, and \( I_{\text{inj}2} = 0.91I_{\text{inj}1} = 637 \mu A \), which makes the weight of \( C_{out} \) twice that of \( A/B/C_{in} \). The simulated average delay and the corresponding energy are 10.5 ns and 1.876 nJ.
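
The ripple-carry organization and its delay expression can be sketched at the logical level as follows. This is a behavioral illustration only: the full adder uses the majority forms of Eqs. 4.12 and 4.10, the function names are hypothetical, and the delay helper simply evaluates the expression given above for whatever stage delays are supplied.

```python
from itertools import product


def maj(*bits):
    """Majority of an odd number of bits."""
    return 1 if sum(bits) > len(bits) // 2 else 0


def full_adder(a, b, c_in):
    """Eq. 4.12, with Sum realized as in Eq. 4.10."""
    c_out = maj(a, b, c_in)
    s = maj(a, b, c_in, 1 - c_out, 1 - c_out)
    return s, c_out


def ripple_carry_add4(a, b, c_in=0):
    """Add two 4-bit integers by chaining four full adders (LSB first)."""
    result, carry = 0, c_in
    for i in range(4):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


def serial_delay(t_write, t_cout, t_sum, t_read, n_bits=4):
    """T_write + n_bits * T_Cout + T_Sum + T_read, as stated in the text."""
    return t_write + n_bits * t_cout + t_sum + t_read


for a, b, c in product(range(16), range(16), (0, 1)):
    s, c_out = ripple_carry_add4(a, b, c)
    assert (c_out << 4) | s == a + b + c
```

The loop makes the serial dependency explicit: stage `i` cannot be evaluated before the carry of stage `i - 1` is known, which is why the carry delay appears four times in the delay expression.
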

For a serial adder, there is a considerable delay in the addition process, since the sum and carry outputs of a stage cannot be produced until its input carry is available. The speed of a serial adder can be improved in two ways:

1. We can reduce the delay by changing the properties of the MTJ/channel, e.g. the dimensions, $Hk$, $Ms$, $I_{inj}$, etc. However, the ASL device sets an intrinsic or breakdown limit to this approach.

2. Another way is to increase the circuit complexity in order to reduce the delay. Several methods are available; one commonly used in CMOS technology is carry-lookahead addition, which eliminates the inter-stage carry logic.


![Figure 4.15 – Function simulation of 4-bit series adder.](image)

**Carry-Lookahead/Parallel Adder** A carry-lookahead adder is a fast parallel adder that uses logic gates to look at the lower-order bits of the augend and addend and determine whether a higher-order carry is to be generated. Our parallel adder is designed on the same principle, by replacing the input carry with the augend, the addend and the initial input carry. We found that in a majority gate, the input carry $C_{i-1}$ of stage $i$ can be replaced with its primary inputs $A_{0/.../i-1}$, $B_{0/.../i-1}$, provided that the weights of the original inputs of this stage, $A_i$ and $B_i$, are doubled. Eq. 4.13 shows the example of $C_1$.

\[
C_1 = Maj(A_1, B_1, C_0) = Maj(A_1, B_1, A_1, B_1, A_0, B_0, C_{in}) \tag{4.13}
\]

$A_1$ and $B_1$ are duplicated to compute $C_1$, which leads to a higher number of inputs than that used to compute $C_0$. More generally, the computation of $C_i$ requires $2^i$ copies of $A_i/B_i$, $2^{i-1}$ copies of $A_{i-1}/B_{i-1}$ and so on; the same holds for $S_i$. The implementation of the parallel calculation is shown in Fig. 4.14(b), with 3/5/7/9-input majority gates. With this implementation, the 4-bit adder can achieve a smaller delay of $T_{write} + T_{Cout} + T_{Sum} + T_{read}$, at the cost of area $(5 + 7 + 9 + 11 = 32 \text{ MTJs})$ and energy (exponential growth of $I_{inj}$).
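
The replacement rule can be checked numerically. The sketch below models input duplication as an integer weight, so that $A_j$ and $B_j$ enter the majority with weight $2^j$ and $C_{in}$ with weight 1; for $i = 1$ this is exactly the 7-input majority of Eq. 4.13. The weighted-majority helper and the function names are illustrative assumptions, and the check is purely logical (no backflow or current division is represented).

```python
from itertools import product


def weighted_maj(pairs):
    """Majority over (bit, weight) pairs; the total weight is assumed odd."""
    total = sum(w for _, w in pairs)
    return 1 if 2 * sum(b * w for b, w in pairs) > total else 0


def lookahead_carry(a, b, c_in, i):
    """Carry C_i computed directly from the primary inputs.

    A_j and B_j enter with weight 2**j (j = 0..i) and C_in with weight 1.
    """
    pairs = [(c_in, 1)]
    for j in range(i + 1):
        pairs.append(((a >> j) & 1, 2 ** j))
        pairs.append(((b >> j) & 1, 2 ** j))
    return weighted_maj(pairs)


def reference_carry(a, b, c_in, i):
    """Carry out of bit position i in ordinary binary addition."""
    mask = (1 << (i + 1)) - 1
    return ((a & mask) + (b & mask) + c_in) >> (i + 1)


for a, b, c_in, i in product(range(16), range(16), (0, 1), range(4)):
    assert lookahead_carry(a, b, c_in, i) == reference_carry(a, b, c_in, i)
```

Every carry is thus obtained in a single majority step, which is the origin of the reduced delay, while the doubling of the weights at each bit position reflects the exponential growth in injection current mentioned above.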

In conclusion, a multi-bit adder can be implemented with series or parallel calculation. Compared with the serial adder, the parallel adder can achieve a smaller delay at the cost of energy and area. Moreover, the number of inputs of the majority gates increases with the number of inputs of the adder. Considering the backflow current, it becomes more and more difficult to guarantee the accuracy of the circuit.

##### 4.3.2.2 Subtractor

Subtraction is a mathematical operation in which one integer number is deducted from another. The number from which the other is deducted is called the minuend, and the number subtracted from the minuend is called the subtrahend.

**Full-subtractor** The full-subtractor performs the subtraction of two binary bits while taking into account the borrow of the lower significant stage. It has three input terminals: two correspond to the two bits to be subtracted, and the third to the input borrow. There are two outputs: the difference $D$ and the output borrow $B_{out}$.

The majority functions of the difference $D$ and of $B_{out}$ can be configured with the $XOR_{rep}$ circuit shown in Fig. 4.12 (b): $In1$ and $In2$ as inputs, $In3$ as the input borrow, $M2$ as the output borrow and $Out$ as the output difference. The half-subtractor function is obtained with $In3 = 0$.

The parameters and the simulation result are the same as those of the $XOR_{rep}$ circuit, with a delay of 1.77 ns and an energy consumption of 0.24 nJ.

![4-bit Subtractor](image)

Figure 4.16 - 4-bit subtractor implementation. (a) Architecture of a series 4-bit subtractor with 4 full-subtractors. The output borrow of the previous stage is the input borrow of the next stage. (b) Functional symbol of the series 4-bit subtractor: $A_3A_2A_1A_0$ are the minuend and $B_3B_2B_1B_0$ are the subtrahend; $B_{in}$ is the input borrow; $D_3D_2D_1D_0$ is the output difference; $B_{out3}$ is the output borrow.

**4-bit subtractor** As with the binary adder, n full-subtractors can be cascaded to subtract two n-bit numbers from each other. Fig. 4.16 (a) shows the architecture of a series 4-bit subtractor formed by connecting four full-subtractors. In this subtractor, the 4-bit subtrahend $B_3B_2B_1B_0$ is subtracted from the 4-bit minuend $A_3A_2A_1A_0$, giving the difference output $D_3D_2D_1D_0$. The borrow output of each subtractor is connected to the borrow input of the next higher-order subtractor. Fig. 4.17 shows the simulation results of the 4-bit subtractor, which verify the functional behavior of the design.
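
The use of the $XOR_{rep}$ majority functions (Eq. 4.11) as a full-subtractor, and their cascade into a 4-bit subtractor, can be sketched at the logical level as follows. The function names are hypothetical; the terminal mapping ($M2$ as output borrow, $Out$ as difference) follows the description above.

```python
from itertools import product


def maj(*bits):
    """Majority of an odd number of bits."""
    return 1 if sum(bits) > len(bits) // 2 else 0


def full_subtractor(a, b, b_in):
    """XOR_rep functions (Eq. 4.11) read as a full-subtractor.

    M2 = Maj(NOT a, b, b_in) is the output borrow, Out is the difference.
    """
    m1 = maj(a, b, b_in)
    m2 = maj(1 - a, b, b_in)
    diff = maj(1 - m1, m2, a)
    return diff, m2


def ripple_borrow_sub4(a, b, b_in=0):
    """Subtract two 4-bit integers with four chained full-subtractors."""
    result, borrow = 0, b_in
    for i in range(4):
        d, borrow = full_subtractor((a >> i) & 1, (b >> i) & 1, borrow)
        result |= d << i
    return result, borrow


for a, b, b_in in product(range(16), range(16), (0, 1)):
    d, b_out = ripple_borrow_sub4(a, b, b_in)
    # The 4-bit difference wraps modulo 16; b_out flags a negative result.
    assert d == (a - b - b_in) % 16
    assert b_out == (1 if a - b - b_in < 0 else 0)
```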

It is also possible to design a 4-bit parallel subtractor with the replacement method used for multi-bit parallel adders. As for the multi-bit adder, the parallel subtractor is expected to have a smaller delay than the serial subtractor, since there is no borrow propagation.

##### 4.3.2.3 Comparators

Data comparison is needed in digital systems when performing arithmetic or logical operations. The comparison determines whether one number is greater than, equal to, or less than another. Digital comparators are widely used in combinational systems and are specially designed to compare the relative magnitudes of binary numbers.

![Figure 4.17 – Function simulation of 4-bit Subtractor with full-subtractor.](image)

![Figure 4.18 – 1-bit comparator implementation. (a) Architecture of 1-bit comparator: A/B as inputs, L(A < B), E(A = B) and H(A > B) as three outputs, I_{inj1/2} as two different injection current sources, where I_{inj1} = -I_{inj2}; (b) Functional symbol of 1-bit comparator.](image)

**1-bit magnitude comparator** A 1-bit comparator compares two single-bit numbers. It has two inputs, one for each number, and three outputs that indicate the less than, equal and greater than results. The majority functions of the 1-bit comparator, with A and B as the inputs and L(A < B), E(A = B) and \( H(A > B) \) as the three outputs, are given in Eq. 4.14, where we choose the AND-based XNOR to realize the function \( E \) because of its lower energy and delay.

\[
\begin{aligned}
L &= \bar{A}B = Maj(\bar{A}, B, 0) \\
E_{maj} &= XNOR(A, B) = Maj(\bar{A}, Maj(A, B, 0), Maj(A, \bar{B}, 1)) \\
H &= A\bar{B} = Maj(A, \bar{B}, 0)
\end{aligned} \tag{4.14}
\]

The 1-bit comparator can be implemented with one XOR/XNOR circuit and 3-input majority gates. Fig. 4.18 (a) and (b) show the functional diagram of a single-bit AND-based magnitude comparator and the functional symbol of the 1-bit comparator, with two injection current sources \( I_{inj1/2} \), where \( I_{inj1} = -I_{inj2} \).

Fig. 4.19 shows the simulation results of the 1-bit comparator. Because the gates used for the 1-bit comparator are the 3-input majority gate and the XNOR gate, which is itself composed of 3-input majority gates, the amplitude of the injection currents \( I_{inj1/2} \) is set to 700 \( \mu \)A, which is the injection current of the 3-input majority gate.

![Function_simulation_of_1-bit_comparator](image)

Figure 4.19 – Function simulation of 1-bit comparator. \( V_{\text{write/read}} \) is the writing/reading source of MTJs; \( I_{inj1/2} \) is the corresponding injection current; \( A/B \) are the input states; \( L/E/H \) are the output states; \( F \) is the control state of the 3-input majority gate, which is 0 in this case to realize the AND function.

**2-bit comparator** A 2-bit comparator compares two binary numbers of two bits each and indicates whether one number is equal to, greater than or less than the other. The first number is designated as \( A = A_1A_0 \) and the second as \( B = B_1B_0 \). The comparator produces three outputs: \( H \) (\( H = 1 \) if \( A > B \)), \( E \) (\( E = 1 \) if \( A = B \)) and \( L \) (\( L = 1 \) if \( A < B \)).

The boolean function and the corresponding majority function based on "replacement" method for each output can be expressed as:

\[
\begin{aligned}
L &= \bar{A}_1B_1 + \bar{A}_0B_1B_0 + \bar{A}_1\bar{A}_0B_0 \\
  &= Maj(Maj(\bar{A}_1, B_1, 0), Maj(\bar{A}_0, B_1, B_0, 0, 0), Maj(\bar{A}_1, \bar{A}_0, B_0, 0, 0), 1, 1) \\
E &= (A_0 \odot B_0)(A_1 \odot B_1) \\
  &= Maj(XNOR(A_0, B_0), XNOR(A_1, B_1), 0) \\
H &= A_0\bar{B}_1\bar{B}_0 + A_1\bar{B}_1 + A_1A_0\bar{B}_0 \\
  &= Maj(Maj(A_0, \bar{B}_1, \bar{B}_0, 0, 0), Maj(A_1, \bar{B}_1, 0), Maj(A_1, A_0, \bar{B}_0, 0, 0), 1, 1)
\end{aligned} \tag{4.15}
\]

Figure 4.20 - 2-bit comparator implementation. (a) Majority gates implementation of 2-bit comparator by using 3-input majority gates, 5-input majority gates, and XOR/XNOR gates; (b) Functional symbol of 2-bit comparator, with inputs A1A0 and B1B0, outputs L/E/H and four injection current sources: \( I_{inj1-4} \).
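
The "replacement" form of the 2-bit comparator can be checked exhaustively at the Boolean level. In the sketch below, $H$ and $E$ follow the majority decomposition of Eq. 4.15 and the AND-based XNOR of Eq. 4.14, and $L$ is taken as $H$ with the roles of $A$ and $B$ exchanged, which is what the definition $L = 1$ if $A < B$ requires. The helper names are hypothetical.

```python
from itertools import product


def maj(*bits):
    """Majority of an odd number of bits."""
    return 1 if sum(bits) > len(bits) // 2 else 0


def inv(x):
    return 1 - x


def xnor(a, b):
    """AND-based XNOR of Eq. 4.14."""
    return maj(inv(a), maj(a, b, 0), maj(a, inv(b), 1))


def compare2(a1, a0, b1, b0):
    """Return (L, E, H) for A = A1A0 and B = B1B0."""
    h = maj(maj(a0, inv(b1), inv(b0), 0, 0),
            maj(a1, inv(b1), 0),
            maj(a1, a0, inv(b0), 0, 0), 1, 1)
    # L is H with the roles of A and B exchanged.
    l = maj(maj(b0, inv(a1), inv(a0), 0, 0),
            maj(b1, inv(a1), 0),
            maj(b1, b0, inv(a0), 0, 0), 1, 1)
    e = maj(xnor(a0, b0), xnor(a1, b1), 0)
    return l, e, h


for a1, a0, b1, b0 in product((0, 1), repeat=4):
    a, b = 2 * a1 + a0, 2 * b1 + b0
    assert compare2(a1, a0, b1, b0) == (int(a < b), int(a == b), int(a > b))
```

Here the 5-input majorities with two constant 0 inputs act as 3-input AND gates, and the outer 5-input majority with two constant 1 inputs acts as a 3-input OR gate.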

Or, the majority functions of the 2-bit comparator based on "truth table" method are expressed as:

\[
\begin{aligned}
LM1 &= Maj(A_0, B_0, 0, \bar{A}_1, B_1, B_1) \\
LM2 &= Maj(0, \bar{A}_1, A_0, LM1, LM1) \\
L &= Maj(B_0, \bar{A}_1, \bar{A}_0, LM2, LM2)
\end{aligned} \tag{4.16}
\]

\[
\begin{aligned}
EM1 &= Maj(B_1, \bar{A}_0, \bar{B}_0, 1, A_1, A_1) \\
EM2 &= Maj(A_0, B_0, \bar{B}_1, \bar{A}_1, 1, EM1, EM1, EM1, EM1) \\
EM3 &= Maj(A_0, B_0, B_1, 0, \bar{A}_1, EM2, EM2, EM2, EM2) \\
EM4 &= Maj(\bar{A}_0, \bar{A}_1, B_0, B_1, 0, EM3, EM3, EM3, EM3) \\
EM5 &= Maj(A_0, \bar{A}_1, B_0, B_1, 0, EM4, EM4, EM4, EM4) \\
EM6 &= Maj(A_0, \bar{A}_1, B_0, B_1, 0, EM5, EM5, EM5, EM5) \\
EM7 &= Maj(A_0, \bar{A}_1, B_0, B_1, 0, EM6, EM6, EM6, EM6) \\
E &= Maj(A_0, \bar{A}_1, B_0, B_1, 0, EM7, EM7, EM7, EM7)
\end{aligned} \tag{4.17}
\]

\[
\begin{aligned}
GM1 &= Maj(A_0, B_0, \bar{B}_1, A_1, A_1) \\
GM2 &= Maj(A_0, B_0, \bar{B}_1, 0, GM1, GM1, GM1) \\
GM3 &= Maj(A_0, \bar{B}_0, \bar{B}_1, 0, GM2, GM2, GM2) \\
GM4 &= Maj(\bar{B}_0, B_1, 0, GM3, GM3)
\end{aligned} \tag{4.18}
\]

By comparing the two methods, we find that the first expression uses fewer MTJs and can achieve smaller delays. Hence, we use the first majority expression to implement the 2-bit comparator. Fig. 4.20 (a) shows the implementation of the three outputs: L/H/E, with four different injection current sources \( I_{inj1-4} \).
Fig. 4.21 shows the simulation results of the implemented 2-bit comparator. The values of different injection currents are set according to the original 3/5-input majority gates. Simulation results verified the functional behaviors of the 2-bit comparator.

Figure 4.21 – Function simulation of 2-bit comparator. \( V_{\text{write/read}} \) are the writing/reading voltage source of MTJs; \( I_{\text{inj}} \) is the injection current value; Input states are expressed as \( A1/A0/B1/B0 \) and output states are expressed as \( L/E/H \).

**4-bit magnitude comparator**

A 4-bit comparator can be used to compare two four-bit words. The two 4-bit numbers are \( A = A_3A_2A_1A_0 \) and \( B = B_3B_2B_1B_0 \), where \( A_3 \) and \( B_3 \) are the most significant bits. The three outputs are \( L \) (\( L = 1 \) if \( A < B \)), \( H \) (\( H = 1 \) if \( A > B \)) and \( E \) (\( E = 1 \) if \( A = B \)). The Boolean functions can be written as:

\[
\begin{aligned}
L &= \bar{A}_3B_3 + (A_3 \odot B_3)\bar{A}_2B_2 + (A_3 \odot B_3)(A_2 \odot B_2)\bar{A}_1B_1 + (A_3 \odot B_3)(A_2 \odot B_2)(A_1 \odot B_1)\bar{A}_0B_0 \\
E &= (A_3 \odot B_3)(A_2 \odot B_2)(A_1 \odot B_1)(A_0 \odot B_0) \\
H &= A_3\bar{B}_3 + (A_3 \odot B_3)A_2\bar{B}_2 + (A_3 \odot B_3)(A_2 \odot B_2)A_1\bar{B}_1 + (A_3 \odot B_3)(A_2 \odot B_2)(A_1 \odot B_1)A_0\bar{B}_0
\end{aligned} \tag{4.19}
\]

From the above output boolean expressions, the logic circuits for this 4-bit comparator can be implemented by using the XOR/XNOR circuit to realize the XNOR(\( \odot \)) function and 3-input majority circuits to realize the AND/OR functions. The inversion function can be realized by using a positive injection current.

Fig. 4.22 shows the simulation results of the 4-bit comparator, which verified the functional behavior of the implemented circuit.

When comparing larger binary numbers, the comparator can be implemented by cascading several smaller comparators. For example, an 8-bit comparator can be implemented by using two 4-bit comparators. The comparison mechanism is as follows: the comparator starts by comparing the highest-order bits. If they are equal, it compares the next lower bits, and so on until it reaches the lowest-order bits. If equality still holds, the two numbers are equal. If an inequality is found, either \( A < B \) or \( A > B \), the relationship between the two numbers is determined and the comparison of the remaining lower-order bits stops.
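The cascading rule can be sketched as follows, assuming a purely behavioural 4-bit comparator block (`compare4`) and a hypothetical combining function `compare8`: the low-order block only matters when the high-order block reports equality.

```python
def compare4(a, b):
    """Behavioural 4-bit comparator: returns (L, E, H) for 0 <= a, b <= 15."""
    return int(a < b), int(a == b), int(a > b)

def compare8(a, b):
    """8-bit comparator built from two 4-bit blocks (high nibble decides first)."""
    l_hi, e_hi, h_hi = compare4(a >> 4, b >> 4)
    l_lo, e_lo, h_lo = compare4(a & 0xF, b & 0xF)
    less = l_hi | (e_hi & l_lo)      # A < B: decided high, or tie high and low says less
    equal = e_hi & e_lo              # A = B: both blocks report equality
    greater = h_hi | (e_hi & h_lo)   # A > B: symmetric to the first case
    return less, equal, greater

for a in range(256):
    for b in range(256):
        assert compare8(a, b) == (int(a < b), int(a == b), int(a > b))
```

The same combination can be repeated to chain more blocks for wider words.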

Digital magnitude comparators are widely used:

- in the address decoding circuitry of computers and microprocessor-based devices, to select a specific input/output device for data storage.

- in control applications in which binary numbers representing physical variables such as temperature, position, etc., are compared with a reference value. The outputs from the comparator are then used to drive the actuators so as to bring the physical variables as close as possible to the set or reference value.

- in process controllers.

### 4.3.2.4 Multipliers

A binary multiplier is a combinational logic circuit used in digital systems to perform the multiplication of two binary numbers. Multipliers are commonly used in digital signal processing, where many algorithms rely on multiplication. Computers, mobile phones, high-speed calculators and general purpose processors all require binary multipliers.

- Architecture:

Fig. 4.23 (a) shows an implementation of a 4-bit array multiplier, with three 4-bit adders and sixteen AND gates. \( A = A_3A_2A_1A_0 \) and \( B = B_3B_2B_1B_0 \) are the multiplier and multiplicand respectively. The functional symbol is shown in Fig. 4.23 (b).

A classical array structure is used: the first stage is the multiplication of \( A_i \) and \( B_{0/1} \); results are transmitted to the second stage, where additions occur, etc. The multiplier is implemented using 16 AND gates (each AND gate corresponds to a 3-input majority gate configured for the AND function) and three 4-bit adders, for which serial and parallel implementations are possible. The multiplier is thus a hierarchical circuit for which multiple design options are possible.

Figure 4.23 – (a) Implementation of the 4-bit array multiplier, with three 4-bit adders and sixteen AND gates: \( A = A_3A_2A_1A_0 \) and \( B = B_3B_2B_1B_0 \) as the multiplicand and multiplier, \( M_7M_6M_5M_4M_3M_2M_1M_0 \) as the output, \( C_{in} \) as the input carry. (b) Functional symbol of the 4-bit array multiplier.
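The array structure can be illustrated with a short behavioural sketch. It assumes the usual shift-and-add wiring of an array multiplier (each adder receives one row of partial products and the shifted output of the previous row); the function names `and2`, `adder4` and `multiply4` are hypothetical and the adders are modelled arithmetically rather than at gate level.

```python
def maj(*bits):
    """Majority vote over an odd number of 0/1 inputs."""
    return int(sum(bits) > len(bits) // 2)

def and2(x, y):
    return maj(x, y, 0)  # AND as a 3-input majority gate with one input at 0

def adder4(x, y, cin=0):
    """Behavioural 4-bit adder: returns (4-bit sum, carry out)."""
    total = x + y + cin
    return total & 0xF, total >> 4

def multiply4(a, b):
    """4-bit x 4-bit array multiplier with 16 AND gates and three 4-bit adders."""
    # Partial products: row j holds A_i AND B_j for i = 0..3
    rows = [sum(and2((a >> i) & 1, (b >> j) & 1) << i for i in range(4))
            for j in range(4)]
    result = rows[0] & 1      # M0 comes straight from the first row
    upper = rows[0] >> 1      # remaining bits feed the first adder
    for j in (1, 2, 3):
        s, carry = adder4(rows[j], upper)
        result |= (s & 1) << j            # M_j is the LSB of adder j
        upper = (carry << 3) | (s >> 1)   # shifted sum goes to the next stage
    return result | (upper << 4)          # M7..M4 from the last adder

assert all(multiply4(a, b) == a * b for a in range(16) for b in range(16))
```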

- Timing:

In the first step, all the inputs \( A \) and \( B \) are written simultaneously with a delay of \( T_{write} \). Then the 8 AND gates computing \( A_iB_0 \) and \( A_iB_1 \) are evaluated with a delay of \( T_{AND} \). Step 3 contains a 4-bit adder and 4 parallel AND gates. As shown in Table 4.21, with the same injection current, the delay of the AND gate is smaller than that of the 4-bit adder. Hence we take the delay of the 4-bit adder, \( T_{adder} \), as the delay of step 3. Similarly, the delay of each of the last two steps is \( T_{adder} \). After all the outputs have been calculated, the results are read with a delay of \( T_{read} \). In conclusion, the delay of the 4-bit multiplier is \( T_{write} + T_{AND} + 3T_{adder} + T_{read} \).
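The step-by-step delay accounting can be written as a small sketch. The function name `multiplier_delay` and the numeric values in the usage line are placeholders chosen for illustration only; they are not measured values from this work.

```python
def multiplier_delay(t_write, t_and, t_adder, t_read):
    """Total delay of the 4-bit array multiplier, one list entry per step."""
    steps = [
        t_write,               # step 1: write all inputs A and B
        t_and,                 # step 2: first group of AND gates
        max(t_adder, t_and),   # step 3: adder in parallel with AND gates, slower one wins
        t_adder,               # step 4: adder delay, as stated in the text
        t_adder,               # step 5: adder delay, as stated in the text
        t_read,                # final step: read the outputs
    ]
    return sum(steps)

# Placeholder values in arbitrary time units (assumption, not thesis data)
total = multiplier_delay(t_write=2, t_and=1, t_adder=5, t_read=2)
assert total == 2 + 1 + 3 * 5 + 2
```

As long as the adder is slower than the AND gate, the sum reduces to the closed-form expression given above.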

Fig. 4.24 shows the simulation results of the 4-bit array multiplier, which verify the functional behavior of the design.

### 4.3.2.5 ALU

An ALU is a combinational digital electronic circuit that performs arithmetic and bitwise operations on integer binary numbers. Its functionality and complexity depend on the system requirements and the data size to handle. Most ALUs perform operations such as addition, subtraction, two’s complement, increment, decrement, AND, OR, XOR, ones’ complement and bit shift. In this subsection, we consider 1-bit and 4-bit ALUs able to execute the following functions: addition, subtraction, multiplexing, increment, decrement, AND/OR/NAND/NOR and XOR/XNOR.

#### 1-bit ALU

The implementation of the 1-bit ALU relies on two methods: circuit assembly and the majority gate synthesis technique. Both ALUs rely on 5-input majority gates, considering the ASL device breakdown current and the backflow issues.

- Logic circuit based ALU

$ALU_{LCA}$ is designed by assembling an ASL-based full-adder and a multiplexer, and is illustrated in Fig. 4.25 (a). It is implemented using two 5-input gates (M1 and M2) and a 3-input gate (M3, for which # indicates that no current is injected into two terminals). The state of each input and control terminal is written by a voltage source “$V_{write}$” and then injected into the channel by a positive or negative injection current “$I_{inj}+$” or “$I_{inj}-$”. The labels “$A$” and “$B$” mean that the states of these terminals are written by the same write voltage sources, i.e. “$V_{writeA}$” and “$V_{writeB}$”, respectively, whereas the injection current polarity is specified for each terminal. “$2M1$” means that the injection current for terminal “$M1$” is doubled.

Fig. 4.25 (b) illustrates the symbol of $ALU_{LCA}$ and Table 4.14 summarizes the configuration schemes for each terminal. In this table, possible values for the injected spin current “$I_{inj}$” are “0” (i.e. no spin current), “P” (positive) or “N” (negative). The state of each MTJ can be “0/1” (i.e. parallel/anti-parallel) or “X” (i.e. don’t care). For example, the full-subtraction operation requires the following configuration: no current injected for $H/U/Z$, positive currents for $A1/M1$ and negative currents for $A2/A3/B1/B2/C$. The half-subtraction is performed by setting $C = 0$ with the other states unchanged. The half-adder is configured in the same way. For the multiplexing operation, the configuration is very different: $U = 0$, $Z = 1$, $H = B1$, no injection current for $A1/B2$, a positive current for $A2$ and negative currents for $A3/B1/C/H/U/Z$. These two examples show that configuring $ALU_{LCA}$ is a tedious task, since the current polarity and the state of each terminal need to be controlled independently.

- Majority gate synthesis method based ALU

$ALU_{MG}$ is designed using a majority gate synthesis method. Assuming that basic logic circuits can be implemented by combining full-adder and multiplexer circuits, the ALU function is formalized as:

$$Function = \overline{H} \times (adder) + H \times (Multiplexer) \tag{4.20}$$

Figure 4.25 – Proposed ASL devices based ALUs: (a) $ALU_{LCA}$ requires three 5-inputs majority gates and 11 control signals for currents/voltages (S1-S11); The “A” and “B” in the figure mean the states of these terminals are written by the same write voltage sources, i.e. “$V_{write_A}$” and “$V_{write_B}$”, respectively, whereas the injection current polarity is specified to each terminal. “2M1” means the injection current for terminal “M1” is doubled; and (c) $ALU_{MG}$ is implemented using 14 5-inputs majority gates and one control signal of the current (S1); The terminals with the same symbol are connected to a same writing voltage source, e.g. all “A” to “$V_{write_A}$”. Green and violet lines for the terminals connect to the corresponding injection currents; (b) and (d) are the corresponding functional symbol of two ALUs. Symbol # indicates that no current is injected in the terminal.

The resulting circuit requires 14 majority gates and is illustrated in Fig. 4.25 (c). The input signals are “A”, “B” and “C”; there are three control signals (“U”, “Z” and “H”) and two outputs (“$F_0$” and “$F_1$”). The terminals with the same symbol are connected to the same writing voltage source, e.g. all “A” to “$V_{write_A}$”. All the terminals have the same injection current polarity, which simplifies the ALU configuration by significantly reducing the number of control signals. Fig. 4.25 (d) presents the symbol for $ALU_{MG}$ and Table 4.15 summarizes the possible configurations. As an example, the subtractor function is configured as follows: $UZH = 101$ and a negative injection current is used for all the terminals. The full-adder function (second row in the table) is configured as follows: $UZH = 010$ or $UZH = 101$ (both combinations are possible), a negative injection current is used for all the majority gates and the results are output on $F_1$ (carry) and $F_0$ (sum). Following the approach described in [58], the XOR3 function corresponds to the sum and can thus be obtained on terminal $F_0$. The XNOR3 function is obtained with the same configuration for $UZH$ but using a positive injection current.
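The relation between the full-adder outputs and majority logic can be illustrated with a common textbook construction, assumed here for illustration and not necessarily the gate arrangement of Fig. 4.25: the carry is a 3-input majority, and the sum is a 5-input majority in which the inverted carry enters with a doubled weight. The function names are hypothetical.

```python
from itertools import product

def maj(*bits):
    """Majority vote over an odd number of 0/1 inputs."""
    return int(sum(bits) > len(bits) // 2)

def full_adder(a, b, c):
    """Return (F1, F0) = (carry, sum) from one 3-input and one 5-input majority gate."""
    carry = maj(a, b, c)
    not_carry = 1 - carry                      # inversion, e.g. via current polarity
    total = maj(a, b, c, not_carry, not_carry)  # inverted carry counted twice
    return carry, total

def xor3(a, b, c):
    return full_adder(a, b, c)[1]   # XOR3 is the sum output

def xnor3(a, b, c):
    return 1 - xor3(a, b, c)        # models the opposite output polarity

for a, b, c in product((0, 1), repeat=3):
    carry, total = full_adder(a, b, c)
    assert 2 * carry + total == a + b + c
    assert xor3(a, b, c) == a ^ b ^ c
```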

- **Simulation**

  We first model $ALU_{MG}$ using the compact model with the following parameters: the MTJ diameter/width is assumed to be 40 nm and \( I_{inj} \) is set to 420 \( \mu A \) with a 2.5 \( ns \) pulse duration. This sets the maximum current in the channel equal to the breakdown current and leads to channel propagation delays of 1.24 \( ns \) and 2.8 \( ns \). SPICE simulations for both $ALU_{LCA}$ and $ALU_{MG}$ have been carried out. We show the simulation results for $ALU_{MG}$ executing the full-adder, half-adder, AND/OR3 and multiplexer functions (Fig. 4.26). In addition to the input and output terminals, we also show the intermediate outputs \( mi \) of the \( i \)th majority gate. The simulations validate the behavior of the models and provide the delay and energy figures used in the following comparison.


Figure 4.26 – $ALU_{MG}$ simulation results for full-adder, half adder, AND3/OR3 and multiplexer functions.

- **Comparison**

  $ALU_{LCA}$ and $ALU_{MG}$ require 3 and 14 gates respectively, which leads to footprints of 0.21 \( \mu m^2 \) and 0.98 \( \mu m^2 \) (Table 4.16). Due to its smaller number of devices, $ALU_{LCA}$ is also the more energy-efficient and faster solution. However, it is important to note that the delay and energy results do not take into account the control circuits. Indeed, configuring $ALU_{LCA}$ is tedious since both terminal state and injection current polarity need to be specified independently. This drastically increases the number of control signals required for $ALU_{LCA}$ (Tables 4.14 and 4.15). $ALU_{MG}$ thus requires a simpler control circuit, which could significantly reduce design complexity when integrating an ALU in a complete computing system. This evaluation is left for future work.

Moreover, while these results are given for 40 nm MTJ only, it is also possible to investigate ALU performances/area for other technological parameters. This allows i) to explore the scalability opportunity offered by ASL devices and ii) to optimize ALUs according to the area and energy consumption figures.

In this subsection, we have proposed two 1-bit ALU circuits implemented using ASL devices. $ALU_{LCA}$ is an assembly of previously proposed circuits while $ALU_{MG}$ is synthesized using the majority gate design method. The results give a significant advantage to $ALU_{LCA}$, which is the most efficient implementation regarding energy, area and latency. However, configuring $ALU_{LCA}$ is a tedious task since it requires a significant number of control signals. Its integration into a complete computing system may thus lead to a high design complexity, a disadvantage not shared by the more easily configurable $ALU_{MG}$. This will be further investigated in future work.
Table 4.14 – Integrated functions configurations of $ALU_{LCA}$.

<table>
<thead>
<tr>
<th>Function</th>
<th>A1</th>
<th>A2</th>
<th>A3</th>
<th>B1</th>
<th>B2</th>
<th>C</th>
<th>H</th>
<th>Z</th>
<th>U</th>
<th>M1</th>
<th>$F_{out}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>Full-adder /XOR3(XNOR3)</td>
<td>0/1</td>
<td>0</td>
<td>0/1</td>
<td>N</td>
<td>(P)</td>
<td>0/1</td>
<td>N</td>
<td>(P)</td>
<td>0/1</td>
<td>N</td>
<td>(P)</td>
</tr>
<tr>
<td>Subtractor</td>
<td>0/1</td>
<td>P</td>
<td>0/1</td>
<td>N</td>
<td>0/1</td>
<td>N</td>
<td>0/1</td>
<td>N</td>
<td>0/1</td>
<td>N</td>
<td>0/1</td>
</tr>
<tr>
<td>Multiplexer</td>
<td>0/1</td>
<td>0</td>
<td>0/1</td>
<td>P</td>
<td>0/1</td>
<td>N</td>
<td>0/1</td>
<td>N</td>
<td>0/1</td>
<td>0</td>
<td>0/1</td>
</tr>
<tr>
<td>increment</td>
<td>0/1</td>
<td>0</td>
<td>0/1</td>
<td>N</td>
<td>0/1</td>
<td>N</td>
<td>1</td>
<td>N</td>
<td>0</td>
<td>N</td>
<td>X</td>
</tr>
<tr>
<td>decrement</td>
<td>0/1</td>
<td>P</td>
<td>0/1</td>
<td>N</td>
<td>0/1</td>
<td>N</td>
<td>1</td>
<td>N</td>
<td>0</td>
<td>N</td>
<td>X</td>
</tr>
<tr>
<td>AND2(OR2) /NAND2(NOR2)</td>
<td>0/1</td>
<td>0</td>
<td>0/1</td>
<td>N</td>
<td>(P)</td>
<td>0/1</td>
<td>0</td>
<td>0/1</td>
<td>N</td>
<td>(P)</td>
<td>X</td>
</tr>
<tr>
<td>AND3(OR3) /NAND3(NOR3)</td>
<td>0/1</td>
<td>0</td>
<td>0/1</td>
<td>N</td>
<td>(P)</td>
<td>0/1</td>
<td>0</td>
<td>0/1</td>
<td>N</td>
<td>(P)</td>
<td>Z</td>
</tr>
</tbody>
</table>

## F1F0 /F0(F0)
Table 4.15 – Integrated functions configurations of $ALU_{MG}$.

<table>
<thead>
<tr>
<th>Function</th>
<th>A</th>
<th>B</th>
<th>C</th>
<th>U</th>
<th>Z</th>
<th>H</th>
<th>$I_{inj}$</th>
<th>Output</th>
</tr>
</thead>
<tbody>
<tr>
<td>Full-adder /XOR3(XNOR3)</td>
<td>0/1</td>
<td>0/1</td>
<td>0/1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>N</td>
<td>F1F0</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>(P)</td>
<td>F0(F0)</td>
</tr>
<tr>
<td>Subtractor without borrow</td>
<td>1</td>
<td>0/1</td>
<td>0/1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>P</td>
<td>F1F0</td>
</tr>
<tr>
<td>Subtractor</td>
<td>0/1</td>
<td>0/1</td>
<td>0/1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>N</td>
<td>F1F0</td>
</tr>
<tr>
<td>Multiplexer</td>
<td>0/1</td>
<td>0/1</td>
<td>0/1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>N</td>
<td>F0</td>
</tr>
</tbody>
</table>

**Increment**

<table>
<thead>
<tr>
<th></th>
<th>0</th>
<th>1</th>
<th>0/1</th>
<th>0</th>
<th>X</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>1</td>
<td>0</td>
<td></td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>0</td>
<td>1</td>
<td></td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>1</td>
<td>0</td>
<td></td>
<td>1</td>
<td>X</td>
<td>1</td>
</tr>
</tbody>
</table>

**Decrement**

<table>
<thead>
<tr>
<th></th>
<th>0</th>
<th>1</th>
<th>0/1</th>
<th>0</th>
<th>0</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>0</td>
<td>0</td>
<td></td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>1</td>
<td>0</td>
<td></td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>1</td>
<td>1</td>
<td></td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

**AND2(OR2) /NAND2(NOR2)**

<table>
<thead>
<tr>
<th></th>
<th>(0)</th>
<th>0/1</th>
<th>0/1</th>
<th>0</th>
<th>X</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>(0)</td>
<td></td>
<td></td>
<td>1</td>
<td>X</td>
<td>1</td>
</tr>
</tbody>
</table>

**AND3(OR3) /NAND3(NOR3)**

|       | 0/1 | 0/1 | 0/1 | (0) | (0) | (0) | N | F0 |

Table 4.16 – $ALU_{LCA}$ and $ALU_{MG}$ performance comparison.

<table>
<thead>
<tr>
<th></th>
<th>$ALU_{LCA}$</th>
<th>$ALU_{MG}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>Gate number</td>
<td>3</td>
<td>14</td>
</tr>
<tr>
<td>Area ($\mu m^2$)</td>
<td>0.21</td>
<td>0.98</td>
</tr>
<tr>
<td>Delay (ns)</td>
<td>F0 2.48</td>
<td>14.88</td>
</tr>
<tr>
<td></td>
<td>F1 1.24</td>
<td>3.72</td>
</tr>
<tr>
<td></td>
<td>F2 1.24</td>
<td>#</td>
</tr>
<tr>
<td>Energy (nJ)</td>
<td>F0 0.27</td>
<td>1.566</td>
</tr>
<tr>
<td></td>
<td>F1 0.135</td>
<td>0.351</td>
</tr>
<tr>
<td></td>
<td>F2 0.081</td>
<td>#</td>
</tr>
</tbody>
</table>
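The figures of Table 4.16 can be cross-checked with a few lines of arithmetic. The sketch below uses the gate counts, areas and the $F_0$ delay and energy values from the table; the dictionary layout and function names are illustrative assumptions.

```python
import math

# Values copied from Table 4.16 (F0 rows for delay and energy)
table_4_16 = {
    "ALU_LCA": {"gates": 3, "area_um2": 0.21, "delay_ns": 2.48, "energy_nj": 0.27},
    "ALU_MG": {"gates": 14, "area_um2": 0.98, "delay_ns": 14.88, "energy_nj": 1.566},
}

def area_per_gate(name):
    entry = table_4_16[name]
    return entry["area_um2"] / entry["gates"]

def ratio(metric):
    """How many times larger the ALU_MG figure is than the ALU_LCA figure."""
    return table_4_16["ALU_MG"][metric] / table_4_16["ALU_LCA"][metric]

# Both designs use the same footprint per gate
assert math.isclose(area_per_gate("ALU_LCA"), area_per_gate("ALU_MG"))
# The F0 delay of ALU_MG is six times that of ALU_LCA
assert math.isclose(ratio("delay_ns"), 6.0)
```

The area ratio follows directly from the gate counts (14/3), while the delay ratio reflects the longer gate chain on the $F_0$ path of $ALU_{MG}$.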

## 4.3.3 Data transmission

### 4.3.3.1 Multiplexer (MUX)

The multiplexer is a digital switch, also called a data selector. It is a combinational circuit with several input lines, one or more select lines and a single output line. Depending on the state of the select lines, the binary information from one particular input line is routed onto the output line.

Figure 4.27 − (a). 2-to-1 multiplexer architecture: \( In1/2 \) as two inputs, \( S_0 \) as select signal, \( Z = 1 \) and \( U = 0 \) are the control signals, \( Q \) as the output, \( M1 \) as the intermediate state; the “2” after the \( In1 \) and \( M1 \) means the weights of terminals are twice the others. (b). Vertical view of the 2-to-1 multiplexer, with channel distributions: \( L1 – L4 \). (c). Functional symbol of 2-to-1 multiplexer, with three different injection current sources, where \( I_{inj1} = -I_{inj2} \).

**2-to-1 Multiplexer** A 2-to-1 multiplexer consists of two inputs \( In1 \) and \( In2 \), one select input \( S_0 \) and one output \( Q \). Depending on the select signal, the output is connected to either of the inputs. Since there are two input signals, only one select signal is needed. The boolean function and the corresponding majority function based on the "replacement" method are expressed as:

\[
Q = \bar{S}_0 In1 + S_0 In2 \\
= Maj(Maj(\bar{S}_0, In1, 0), Maj(S_0, In2, 0), 1) \tag{4.21}
\]

The multiplexer based on the "truth table" method can be expressed, with \( Z = 1 \) and \( U = 0 \), as:

\[
M1 = Maj(\bar{S}_0, Z, In1, In1, In2) \\
Q = Maj(S_0, M1, M1, In2, U) \tag{4.22}
\]

Fig. 4.27 (a) shows the architecture of the multiplexer based on the “truth table” synthesis method. Fig. 4.27(b) is the vertical view of the multiplexer architecture with the channel distributions. The functional symbol of the multiplexer is shown in Fig. 4.27(c).

With different control terminal states \( U \) and \( Z \), this circuit can be configured as AND/OR/NAND/NOR and MUX functions, as shown in Table 4.17. For example, if \( U \) and \( Z \) are both configured as “0”, and the injection current is negative, the circuit will output the function \( AND(In1, In2) \).
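The two majority formulations can be exercised with a short sketch. It evaluates Eq. 4.21 and Eq. 4.22 literally, with a hypothetical `maj` helper, and checks the AND and OR configurations of the control terminals. For the mixed setting \( Z = 1 \), \( U = 0 \), the sketch only checks that the output copies exactly one input for each value of \( S_0 \); which input is selected depends on how the inverted select terminal maps to the injection current polarity, which this logic-level model does not capture.

```python
from itertools import product

def maj(*bits):
    """Majority vote over an odd number of 0/1 inputs."""
    return int(sum(bits) > len(bits) // 2)

def mux_replacement(s0, in1, in2):
    """Eq. 4.21: OR of two AND terms, all built from 3-input majority gates."""
    return maj(maj(1 - s0, in1, 0), maj(s0, in2, 0), 1)

def mux_truth_table(s0, in1, in2, z, u):
    """Eq. 4.22 evaluated as written: two 5-input majority gates."""
    m1 = maj(1 - s0, z, in1, in1, in2)
    return maj(s0, m1, m1, in2, u)

def copied_input(s0, z, u):
    """Name of the input that Q copies for this S0, or None if it copies neither."""
    pairs = list(product((0, 1), repeat=2))
    if all(mux_truth_table(s0, a, b, z, u) == a for a, b in pairs):
        return "In1"
    if all(mux_truth_table(s0, a, b, z, u) == b for a, b in pairs):
        return "In2"
    return None

for s0, in1, in2 in product((0, 1), repeat=3):
    assert mux_replacement(s0, in1, in2) == (in2 if s0 else in1)
    assert mux_truth_table(s0, in1, in2, z=0, u=0) == (in1 & in2)  # AND setting
    assert mux_truth_table(s0, in1, in2, z=1, u=1) == (in1 | in2)  # OR setting

# With Z = 1 and U = 0 the circuit selects a different input for each S0
assert {copied_input(0, 1, 0), copied_input(1, 1, 0)} == {"In1", "In2"}
```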

Fig. 4.28 shows the simulation results of the 2-to-1 multiplexer synthesized with the "truth table" method, with the MTJ writing/reading signal \( V_{write/read} \), the injection current signals \( I_{inj1–3} \), the select signal \( S_0 \), the input signals \( In1/2 \), the control signals \( U/Z \) and the output signal \( Q \). The injection current is set to 455 \( \mu \)A, the same value as in the 5-input majority gate, to guarantee the channel function. The simulation results verify the functional behavior of the designed 2-to-1 multiplexer.
Table 4.17 – Reconfigurable Functions Based on MUX Structure synthesized with “truth table” method.

<table>
<thead>
<tr>
<th>Function Q</th>
<th>U</th>
<th>Z</th>
<th>$I_{inj}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>$In1In2$</td>
<td>0</td>
<td>0</td>
<td>N</td>
</tr>
<tr>
<td>$\overline{In1In2}$</td>
<td>0</td>
<td>0</td>
<td>P</td>
</tr>
<tr>
<td>$S0In1 + S0In2$</td>
<td>0</td>
<td>1</td>
<td>N</td>
</tr>
<tr>
<td>$S0In1 + S0In2$</td>
<td>0</td>
<td>1</td>
<td>P</td>
</tr>
<tr>
<td>$S0In1 + S0In2$</td>
<td>1</td>
<td>0</td>
<td>N</td>
</tr>
<tr>
<td>$S0In1 + S0In2$</td>
<td>1</td>
<td>0</td>
<td>P</td>
</tr>
<tr>
<td>$In1 + In2$</td>
<td>1</td>
<td>1</td>
<td>N</td>
</tr>
<tr>
<td>$\overline{In1 + In2}$</td>
<td>1</td>
<td>1</td>
<td>P</td>
</tr>
</tbody>
</table>

Figure 4.28 – Function simulation of 2-to-1 multiplexer: $V_{write/read}$ as MTJ writing/reading signal, $I_{inj1−3}$ as the injection current signals, $S0$ as the select signal, $In1/2$ as the inputs signals, $U/Z$ as the control signals and $Q$ as the output signal.
**4-to-1 Multiplexer** A 4-to-1 multiplexer consists of four data input lines \( A/B/C/D \), two select lines \( S_1 \) and \( S_2 \) and a single output line \( Q \). Each combination on the select lines \( S_1 \) and \( S_2 \) connects one of the four inputs (\( A \) through \( D \)) to the output line.

We suppose the boolean function of this 4-to-1 multiplexer is expressed as:

\[
Q = \overline{S_1}\,\overline{S_2}A + S_1\overline{S_2}B + \overline{S_1}S_2C + S_1S_2D \tag{4.23}
\]

By using the \( 3/5 \)-input majority gates, the 4-to-1 multiplexer is implemented as illustrated in Fig. 4.29 (a); Fig. 4.29 (b) presents the functional symbol with three injection current signals \( I_{inj1/2/3} \).


Figure 4.29 – a. Implementation of 4-to-1 multiplexer with 3/5-input majority gates: \( S_1/2 \) as two select signals, \( A/B/C/D \) as four input signals, \( Q \) as output, \( I_{inj1/2/3} \) as injection current sources, where \( I_{inj1} = -I_{inj2} \). b. Functional symbol of 4-to-1 multiplexer.

By setting the injection currents values \( I_{inj1} = -455 \, \mu A \), \( I_{inj2} = 455 \, \mu A \) and \( I_{inj3} = -700 \, \mu A \), which lead to the maximum spin current in the channel, the functional behaviors of the multiplexer are shown in Fig. 4.30.

In general, an \( n \)-to-1 multiplexer needs \( n \) input signals and \( \lceil \log_2 n \rceil \) select signals.
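A behavioural sketch of the general case is given below, with hypothetical function names. The number of select signals is computed with integer arithmetic, and the multiplexer is written as an OR of product terms, one per input, with the first select bit taken as the low-order bit (so that the second input is chosen when only the first select line is at 1).

```python
from itertools import product

def select_lines(n):
    """Number of select signals of an n-to-1 multiplexer, i.e. ceil(log2(n))."""
    return (n - 1).bit_length()

def mux(inputs, selects):
    """inputs: list of 0/1 data bits; selects: select bits, low-order bit first."""
    q = 0
    for index, data in enumerate(inputs):
        term = data
        for k, s in enumerate(selects):
            wanted = (index >> k) & 1
            term &= s if wanted else 1 - s  # AND with S_k or its complement
        q |= term                            # OR of all product terms
    return q

# 4-to-1 case: the selected input is A, B, C or D for (S1, S2) = 00, 10, 01, 11
for a, b, c, d, s1, s2 in product((0, 1), repeat=6):
    assert mux([a, b, c, d], [s1, s2]) == [a, b, c, d][s1 + 2 * s2]
```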

Multiplexers are used in all types of digital systems. Since they allow multiple inputs to be connected independently to a single output, they are found in a variety of applications including data routing, logic function generation, control sequences, parallel-to-serial converters, etc.

- Data routing: Multiplexers are extensively used in data routing applications to route the data to one particular destination from one of the several sources.

- Logic function generator: In place of logic gates, a logical expression can be generated by using a multiplexer. It is possible to connect the multiplexer such that it duplicates the logic of any truth table. In such cases it can generate the Boolean algebraic function of a set of input variables. This considerably reduces the number of logic gates or integrated circuits needed to perform the logic function, since the multiplexer is a single integrated circuit. In this kind of application, multiplexers are viewed as logic function generators.

- Parallel to serial conversion: A multiplexer circuit can be used to convert parallel data to serial data, so as to reduce parallel buses to serial signals. This type of conversion is needed in telecommunication, test and measurement, military/aerospace and data communication applications.

- Other applications of multiplexers include control sequences, pulse train generators, encoders, register-to-register data transfer, waveform generators, etc.

Figure 4.30 – Simulation of the 4-to-1 multiplexer. $U = 0(Z = 1)$ is the control signal state of the 3/5-input majority gate to realize the AND/OR function.

### 4.3.3.2 Demultiplexer

A demultiplexer is a combinational logic circuit that receives the information on a single input and transmits the same information over one of $2^n$ possible output lines. Since the demultiplexers are used to select or enable the one signal out of many, these are extensively used in microprocessor or computer control systems such as:

- Selecting different IO devices for data transfer
- Choosing different banks of memory
- Enabling different rows of memory chips, depending on the addresses
- Enabling different functional units
- Synchronising data transmission systems

**1-to-2 demultiplexer**  A 1-to-2 demultiplexer consists of one input line $In$, two output lines $Y0/1$ and one select line $S$. The signal on the select line helps to switch the input to one of the two outputs. We suppose that the boolean function of the 1-to-2 demultiplexer is expressed as:

\[
\begin{aligned}
Y_0 &= \bar{S}\,In = \text{Maj}(\bar{S}, In, 0) \\
Y_1 &= S\,In = \text{Maj}(S, In, 0)
\end{aligned} \tag{4.24}
\]

Based on the above boolean equations, this demultiplexer is implemented by using two AND gates, each of which is a 3-input majority gate with the control signal \( F = 0 \). The architecture is shown in Fig. 4.31 (a), and Fig. 4.31 (b) shows its functional symbol with two injection current signals \( I_{inj1} \) and \( I_{inj2} \), where \( I_{inj1} = -I_{inj2} \).

Figure 4.31 - 1-to-2 bit demultiplexer. (a) Implemented architecture of the 1-to-2 demultiplexer with two 3-inputs majority gate (AND function with control terminal \( F = 0 \)); two injection current signals \( I_{inj1} = -I_{inj2} \). (b) Functional symbol of the 1-to-2 demultiplexer.


Figure 4.32 – Function simulation of 1-to-2 bit demultiplexer: \( V_{\text{write/read}} \) as the writing/reading voltage sources to write/read the MTJ states, \( I_n \) as the input state, \( S \) as the select signal state, \( Y_0/Y_1 \) as the output states, \( U = 0 \) as the control signal state to realize the AND function in a 3-inputs majority gate.

Fig. 4.32 shows the simulation results of the demultiplexer, with \( I_{inj1} = -700 \mu A \) and \( I_{inj2} = 700 \mu A \), which are the maximum currents that guarantee the channel function. The simulation results verify the functional behavior of the designed 1-to-2 demultiplexer.

**1-to-4 demultiplexer** A 1-to-4 demultiplexer has a single input \( (I_n) \), two selection lines \( (S1 \text{ and } S0) \) and four outputs \( (Y0 \text{ to } Y3) \). The input data goes to one of the four outputs at a given time, depending on the combination of the select lines. This demultiplexer is also called a 2-to-4 demultiplexer, since it has two select lines and four output lines. The boolean functions of the designed 1-to-4 demultiplexer are expressed as:

\[
\begin{aligned}
Y_0 &= \bar{S}_1\bar{S}_0In = \text{Maj}(\bar{S}_1, \bar{S}_0, In, 0, 0) \\
Y_1 &= \bar{S}_1S_0In = \text{Maj}(\bar{S}_1, S_0, In, 0, 0) \\
Y_2 &= S_1\bar{S}_0In = \text{Maj}(S_1, \bar{S}_0, In, 0, 0) \\
Y_3 &= S_1S_0In = \text{Maj}(S_1, S_0, In, 0, 0)
\end{aligned} \tag{4.25}
\]

Based on the above equations, this demultiplexer is implemented as shown in Fig. 4.33(a), with four AND3 gates (5-input majority gates with the control state \( F = 0 \)). Fig. 4.33 (b) illustrates the functional symbol of the 1-to-4 demultiplexer, with two injection current signals \( I_{\text{inj}_1} = -I_{\text{inj}_2} \).
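At the logic level, the four AND3 terms can be sketched as 5-input majority gates with two inputs held at 0. The names `and3` and `demux4` are illustrative, and the inversion of the select signals is modelled as a plain complement.

```python
from itertools import product

def maj(*bits):
    """Majority vote over an odd number of 0/1 inputs."""
    return int(sum(bits) > len(bits) // 2)

def and3(x, y, z):
    return maj(x, y, z, 0, 0)  # 5-input majority with two inputs fixed to 0

def demux4(data, s1, s0):
    """Return (Y0, Y1, Y2, Y3) of a 1-to-4 demultiplexer."""
    n1, n0 = 1 - s1, 1 - s0    # complemented select signals
    return (and3(n1, n0, data), and3(n1, s0, data),
            and3(s1, n0, data), and3(s1, s0, data))

for data, s1, s0 in product((0, 1), repeat=3):
    outputs = demux4(data, s1, s0)
    selected = 2 * s1 + s0
    # Only the selected output follows the input; the others stay at 0
    assert all(y == (data if i == selected else 0) for i, y in enumerate(outputs))
```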


Figure 4.33 – 1-to-4 bit demultiplexer. (a) Implementation of the 1-to-4 demultiplexer, with four AND gates (5-input majority gate with control state \( F = 0 \)): \( In \) as input, \( S_0/1 \) as select signals, \( Y_0/1/2/3 \) as output signals, \( I_{\text{inj}_1/2} \) as injection current signals with \( I_{\text{inj}_1} = -I_{\text{inj}_2} \). (b) Functional symbol of the 1-to-4 demultiplexer, with \( I_{\text{inj}_1/2} \) as two injection current signals.

Fig. 4.34 shows the simulation results of the 1-to-4 bit demultiplexer, which verifies its functional behaviors. The amplitude of the two injection currents is 455 \( \mu A \), which leads to the maximum spin current in the channel.


Figure 4.34 – Function simulation of 1-to-4 bit demultiplexer. \( V_{\text{write/read}} \) are the writing/reading signal of MTJ states; \( In \) is the input signal; \( S_0/1 \) are the select signals; \( Y_0/1/2/3 \) are the output signals; \( U = 0 \) is the control signal of the 5-input majority gate which realizes the AND function.
**1-to-8 demultiplexer** A 1-to-8 demultiplexer consists of one single input $I_n$, three select inputs $S2, S1$ and $S0$ and eight outputs from $Y0$ to $Y7$. It is also called a 3-to-8 demultiplexer due to its three select input lines. It distributes one input line to one of 8 output lines depending on the combination of select inputs.

The boolean function of the designed 1-to-8 demultiplexer is expressed as:

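$$
\begin{aligned}
Y0 &= \overline{S2}\,\overline{S1}\,\overline{S0}\,I_n \\
Y1 &= \overline{S2}\,\overline{S1}\,S0\,I_n \\
Y2 &= \overline{S2}\,S1\,\overline{S0}\,I_n \\
Y3 &= \overline{S2}\,S1\,S0\,I_n \\
Y4 &= S2\,\overline{S1}\,\overline{S0}\,I_n \\
Y5 &= S2\,\overline{S1}\,S0\,I_n \\
Y6 &= S2\,S1\,\overline{S0}\,I_n \\
Y7 &= S2\,S1\,S0\,I_n
\end{aligned}
$$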

From the above equations, the demultiplexer can be implemented by using AND gates. Considering the backflow problem, we use one AND2 and one AND3 gate to realize each AND4 function. Hence, eight 3-input and eight 5-input majority gates are used for the implementation. The injection currents for these two gate types are set to 700 $\mu$A and 455 $\mu$A respectively, as in the 3-input and 5-input majority gates.

Fig. 4.35 shows the simulation results of this implementation, which verifies the functional behaviours of this demultiplexer.

Figure 4.35 – Function simulation of 1-to-8 bit demultiplexer. $V_{write}$ is the writing signal of MTJ states; $I_n$ is the input signal; $S0/1/2$ are the select signals; $Y0 – 7$ are the output signals; $U = 0$ is the control signal of the 3/5-input majority gates which realizes the AND function.

When the application requires a large demultiplexer with a greater number of output pins, it cannot be implemented by a single integrated circuit, and two or more demultiplexers need to be cascaded to fulfill the requirement. In general, a 1-to-n demultiplexer needs $\lceil \log_2 n \rceil$ select lines. Smaller demultiplexers can be cascaded in a tree: a first stage driven by the high-order select lines routes the input to one of several 1-to-k demultiplexers, which are driven by the remaining $\lceil \log_2 k \rceil$ select lines. For example, a 1-to-32 demultiplexer can be realized with two 1-to-16 demultiplexers behind a 1-to-2 stage, or with four 1-to-8 demultiplexers behind a 1-to-4 stage.
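One possible cascading scheme is sketched below, assuming a two-level tree since the exact topology is not specified here: a first stage decodes the high-order select bits and routes the input to one of several identical smaller blocks, each driven by the low-order select bits. The function names and the `low_bits` parameter are hypothetical.

```python
from itertools import product

def demux(data, selects):
    """1-to-2**len(selects) demultiplexer; selects are given low-order bit first."""
    index = sum(bit << k for k, bit in enumerate(selects))
    return [data if i == index else 0 for i in range(2 ** len(selects))]

def cascaded_demux(data, selects, low_bits):
    """Two-level demultiplexer built from 1-to-2**low_bits blocks."""
    first_stage = demux(data, selects[low_bits:])  # high-order bits pick one block
    outputs = []
    for routed in first_stage:
        # Every block sees the same low-order select bits
        outputs.extend(demux(routed, selects[:low_bits]))
    return outputs

# 1-to-32 case: 1-to-16 blocks (low_bits=4) or 1-to-8 blocks (low_bits=3)
for selects in product((0, 1), repeat=5):
    reference = demux(1, list(selects))
    assert cascaded_demux(1, list(selects), low_bits=4) == reference
    assert cascaded_demux(1, list(selects), low_bits=3) == reference
```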
### 4.3.3.3 Encoders

Unlike a multiplexer, which selects one individual data input line and then sends that data to a single output line or switch, a digital encoder, more commonly called a binary encoder, takes all its data inputs one at a time and then converts them into a single encoded output. Generally, digital encoders produce outputs of 2-bit, 3-bit or 4-bit codes depending upon the number of data input lines. An “n-bit” binary encoder has $2^n$ input lines and n output lines, with common types that include 4-to-2, 8-to-3 and 16-to-4 line configurations.

Encoders are very common electronic circuits used in all digital systems. In pocket calculators, they are used to translate decimal values to binary in order to perform binary functions such as addition, subtraction, multiplication, etc. They are also used to generate digital signals in response to movement; such encoders are classified into shaft encoders and linear encoders.

**4-to-2 bit binary encoder** A binary encoder has $2^n$ input lines and $n$ output lines, hence it encodes the information from $2^n$ inputs into an n-bit code. Only one of the input lines is activated at a time, and depending on which input line is active, the encoder produces the n-bit output code. We use a 4-to-2 bit binary encoder as an example to present the binary encoder.

Fig. 4.36 shows the block diagram of a 4 input binary encoder, with four inputs $W0/1/2/3$ and two outputs $Y0/1$. Table 4.18 shows the truth table of this 4 input binary encoder. It is observed from the table that the output $Y0$ is 1 when either input $W1$ or $W3$ is 1; the output $Y1$ is 1 when either input $W2$ or $W3$ is 1. Hence, the 4-input binary encoder can be implemented with two OR2 gates.


Figure 4.36 – Block diagram of 4 input binary encoder: $W0/1/2/3$ are four inputs, $Y0/1$ are two outputs.

<table>
<thead>
<tr>
<th>$W3$</th>
<th>$W2$</th>
<th>$W1$</th>
<th>$W0$</th>
<th>$Y1$</th>
<th>$Y0$</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Table 4.18 – Truth table of 4 input binary encoder.

One of the main disadvantages of standard binary encoders is that they can generate the wrong output code when more than one input is at logic level “1”. For example, if inputs $W1$ and $W2$ are both at logic “1” at the same time, the resulting output is neither “01” nor “10” but “11”, an output binary number that does not correspond to any input actually present. Also, an output code of all logic “0”s can be generated either when all of the inputs are at “0” or when input $W0$ is equal to “1”.
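The two-OR2 implementation and its ambiguous cases can be reproduced with a minimal sketch, in which OR2 is modelled as a 3-input majority gate with one input held at 1. The function name `encode4` is an assumption of this example.

```python
def maj(*bits):
    """Majority vote over an odd number of 0/1 inputs."""
    return int(sum(bits) > len(bits) // 2)

def or2(x, y):
    return maj(x, y, 1)  # OR as a 3-input majority gate with one input at 1

def encode4(w3, w2, w1, w0):
    """4-to-2 binary encoder: returns (Y1, Y0). W0 does not drive any gate."""
    y0 = or2(w1, w3)
    y1 = or2(w2, w3)
    return y1, y0

# One-hot inputs reproduce Table 4.18
assert encode4(0, 0, 0, 1) == (0, 0)
assert encode4(0, 0, 1, 0) == (0, 1)
assert encode4(0, 1, 0, 0) == (1, 0)
assert encode4(1, 0, 0, 0) == (1, 1)

# Ambiguous cases described in the text
assert encode4(0, 1, 1, 0) == (1, 1)                      # W1 and W2 together look like W3
assert encode4(0, 0, 0, 0) == encode4(0, 0, 0, 1)         # no input looks like W0
```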

One simple way to overcome this problem is to prioritize the inputs: if more than one input is at logic level “1”, the output code corresponds only to the input with the highest designated priority. This type of digital encoder is commonly known as a priority encoder, or P-encoder for short.

**4 input priority encoder** The priority encoder solves the problems mentioned above by allocating a priority level to each input. Its output corresponds to the currently active input with the highest priority, so when an input with a higher priority is active, all inputs with a lower priority are ignored.

We use a 4-input priority encoder, with four inputs $D0 - 3$ and three outputs $Y0/1$ and $V$, as an example. The third output $V$ is a valid-bit indicator: it is set to “1” when one or more inputs are equal to “1”, and to “0” when all the inputs are “0”, which indicates that there is no valid input.

Table 4.19 is the truth table of the 4 input priority encoder. In the truth table, $D3$ has the highest priority and $D0$ has the lowest priority. When $D3$ is active or “1”, then regardless of other inputs, the output is “11”. The priority from high to low is $D3, D2, D1, D0$.

<table>
<thead>
<tr>
<th>$D0$</th>
<th>$D1$</th>
<th>$D2$</th>
<th>$D3$</th>
<th>$Y1$</th>
<th>$Y0$</th>
<th>$V$</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>X</td>
<td>X</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>X</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>X</td>
<td>X</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>X</td>
<td>X</td>
<td>X</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Table 4.19 – Truth table of 4 input priority encoder.

From the above truth table, the boolean functions of this 4 input priority encoder can be expressed as:

$$V = D3 + D2 + D1 + D0$$

$$Y1 = D3 + D2$$

$$Y0 = D3 + \overline{D2}\,D1$$
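As a quick consistency check, the three Boolean functions can be compared against a behavioral reference that returns the index of the highest-priority active input. This is a minimal logical sketch; the names `priority_encode` and `reference` are hypothetical and not part of the original design flow.

```python
from itertools import product


def priority_encode(d3, d2, d1, d0):
    """Boolean functions of the 4-input priority encoder (D3 has top priority)."""
    v = d3 | d2 | d1 | d0
    y1 = d3 | d2
    y0 = d3 | ((1 - d2) & d1)
    return y1, y0, v


def reference(d3, d2, d1, d0):
    """Behavioral model: binary index of the highest active input, or None."""
    inputs = [d0, d1, d2, d3]
    for idx in (3, 2, 1, 0):
        if inputs[idx]:
            return (idx >> 1) & 1, idx & 1, 1
    return None


for bits in product((0, 1), repeat=4):
    expected = reference(*bits)
    y1, y0, v = priority_encode(*bits)
    if expected is None:
        assert v == 0  # Y1 and Y0 are don't-care when no input is active
    else:
        assert (y1, y0, v) == expected
```

The loop covers all 16 input combinations, so the equations and the priority order $D3, D2, D1, D0$ agree for every case in which $V = 1$.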

Based on these Boolean functions, the 4-input priority encoder can be implemented with AND/OR gates, as illustrated in Fig. 4.37 (a). Fig. 4.37 (b) shows the functional symbol of the encoder with five injection currents $I_{inj1/2/3/4/5}$.

![Diagram](image)

Figure 4.37 – 4-input priority encoder. (a) Implementation of the encoder: $D0 - 3$ as input signals, $Y0/1$ and $V$ as output signals, AND/OR functions realized with the control terminal $F = 0/1$, $I_{inj1/2/3/4/5}$ as five different injection current signals where $I_{inj1} = -I_{inj2}$. (b) Functional symbol of the 4-input priority encoder.

With $I_{inj1} = I_{inj4} = -700 \mu A, I_{inj2} = 700 \mu A$ and $I_{inj3} = I_{inj5} = 455 \mu A$, the simulation results are illustrated in Fig. 4.38, which verify the functional behaviors of the designed priority encoder.

### 4.3.3.4 Decoders

The binary decoder is another combinational logic circuit constructed from individual logic gates and is the exact opposite of an encoder. The name “decoder” means to translate or decode coded information from one format into another, so a digital decoder transforms a set of digital input signals into an equivalent decimal code at its output. Generally, a binary decoder has $n$ inputs and $2^n$ outputs. The most commonly used binary decoders are the 2-to-4 decoder and the 3-to-8 decoder, which are presented in turn below.

**2-to-4 binary decoder** The 2-to-4 decoder decodes two binary inputs labelled $A$ and $B$ into one of four outputs $Q_0$–$Q_3$. Only one output is active at any time while the other outputs are held at logic “0”; the active output is determined by the two binary inputs $A$ and $B$. The relationship between the inputs and the outputs is expressed with the following Boolean equations:

$$
\begin{aligned}
Q_0 &= \bar{A}\bar{B} = Maj(\bar{A}, \bar{B}, 0) \\
Q_1 &= \bar{A}B = Maj(\bar{A}, B, 0) \\
Q_2 &= A\bar{B} = Maj(A, \bar{B}, 0) \\
Q_3 &= AB = Maj(A, B, 0)
\end{aligned}
$$  (4.28)

These boolean equations can be implemented by using four AND gates, which is illustrated in Fig. 4.39 (a). Fig. 4.39 (b) shows the functional symbol of the 2-to-4 bit decoder.
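The mapping of Eq. (4.28) onto majority gates can be sketched in a few lines of Python. The model is purely logical: the third majority input is the control state fixed at 0, and input inversion is represented by a logical complement rather than by an injection current polarity. The names `maj3` and `decode_2to4` are hypothetical.

```python
def maj3(a, b, c):
    """3-input majority function: 1 when at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0


def decode_2to4(a, b):
    """2-to-4 decoder: each output is an AND realized as Maj(x, y, 0)."""
    na, nb = 1 - a, 1 - b
    return (
        maj3(na, nb, 0),  # Q0
        maj3(na, b, 0),   # Q1
        maj3(a, nb, 0),   # Q2
        maj3(a, b, 0),    # Q3
    )


for a in (0, 1):
    for b in (0, 1):
        q = decode_2to4(a, b)
        assert sum(q) == 1          # exactly one output is active
        assert q[2 * a + b] == 1    # and it is the one selected by (A, B)
```

Setting the control input to 1 instead of 0 turns the same gate into an OR, which is the reconfiguration mechanism used throughout the chapter.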

With the amplitude of $I_{inj1/2}$ of 700 $\mu$A, the simulation results of the designed decoder are shown in Fig. 4.40, which verify the functional behaviors of the decoder.

**3-to-8 binary decoder** In a 3-to-8 decoder, three inputs are decoded into eight outputs. It has three inputs $A$, $B$ and $C$, and eight outputs $Y_0$ through $Y_7$. Based on the combination of the three inputs, only one of the eight outputs is selected.

Figure 4.39 – 2-to-4 bit decoder. (a) Implementation of 2-to-4 bit decoder with four AND gates (realized with the control state $F = 0$ in a 3-input majority gate): $A$ and $B$ as inputs, $Q_0-3$ as outputs, the inversions are realized with a positive injection current $I_{inj2}$, $I_{inj1} = -I_{inj2}$. (b) Functional symbol of the 2-to-4 bit decoder.

Figure 4.40 – Function simulation of 2-to-4 bit decoder. $V_{write/read}$ are the writing/reading voltage signal of MTJ states; $A$ and $B$ are the input states; $Q_0-3$ are the output states; $I_{inj1/2}$ are the injection currents; $U = 0$ is the control state of the 3-input majority gate to realize the AND function.

The boolean functions of this 3-to-8 decoder are expressed as:

$$
\begin{aligned}
Y_0 &= \bar{A}\bar{B}\bar{C} \\
Y_1 &= \bar{A}\bar{B}C \\
Y_2 &= \bar{A}B\bar{C} \\
Y_3 &= \bar{A}BC \\
Y_4 &= A\bar{B}\bar{C} \\
Y_5 &= A\bar{B}C \\
Y_6 &= AB\bar{C} \\
Y_7 &= ABC
\end{aligned}
$$  (4.29)

Based on the above boolean equations, this decoder can be implemented with eight AND gates, as shown in Fig. 4.41 (a). Fig. 4.41 (b) illustrates the functional symbol of the 3-to-8 decoder, with two different injection currents $I_{inj1/2}$, where $I_{inj1} = -I_{inj2}$.
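A 3-input AND can be obtained from a 5-input majority gate by fixing two inputs at 0. The sketch below assumes that both remaining terminals carry the control state $F = 0$, which is an interpretation made for this illustration; `maj5`, `and3` and `decode_3to8` are hypothetical names, and inversions are modeled as logical complements.

```python
def maj5(a, b, c, d, e):
    """5-input majority function: 1 when at least three inputs are 1."""
    return 1 if a + b + c + d + e >= 3 else 0


def and3(a, b, c):
    """AND3 as a 5-input majority gate with two inputs held at 0."""
    return maj5(a, b, c, 0, 0)


def decode_3to8(a, b, c):
    """3-to-8 decoder with A as the most significant input bit."""
    outputs = []
    for index in range(8):
        want_a, want_b, want_c = (index >> 2) & 1, (index >> 1) & 1, index & 1
        literals = (
            a if want_a else 1 - a,
            b if want_b else 1 - b,
            c if want_c else 1 - c,
        )
        outputs.append(and3(*literals))
    return outputs


for value in range(8):
    a, b, c = (value >> 2) & 1, (value >> 1) & 1, value & 1
    y = decode_3to8(a, b, c)
    assert sum(y) == 1 and y[value] == 1
```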

With the amplitude of the two injection currents $I_{inj1/2} = 455 \mu A$, the simulation results are shown in Fig. 4.42, which verify the functional behavior of the designed 3-to-8 binary decoder.

Figure 4.41 – 3-to-8 bit decoder. (a) Implementation of a 3-to-8 binary decoder with eight AND gates (realized by setting the control state $F = 0$ of a 5-input majority gate). $A/B/C$ as three inputs, $Y_0 – 7$ as eight outputs, $I_{inj1/2}$ are two injection currents where $I_{inj1} = -I_{inj2}$. (b) Functional symbol of the designed 3-to-8 binary decoder.

It is possible to combine or cascade two or more decoders to produce a decoder with a larger number of input bits. For example, two 3-to-8 decoders can be cascaded into a 4-to-16 decoder.

### 4.3.4 Arbitrary circuit

This subsection uses the Binary Coded Decimal (BCD) to 7-segment display decoder to illustrate the design of an arbitrary circuit with multiple inputs and multiple outputs. It combines a BCD decoder and a 7-segment decoder, which are presented below.

#### 4.3.4.1 BCD

A BCD decoder produces the decimal digit corresponding to a specific input combination. A BCD number needs 4 binary digits to represent the decimal digits 0 to 9, so the decoder has 4 input lines and 10 output lines, one per decimal digit. Table 4.20 shows the truth table of the BCD and the general binary-to-decimal decoder.

#### 4.3.4.2 7-segment decoder

A BCD to 7-segment display decoder has 4 BCD inputs $A – D$ and 7 outputs $a – g$, one for each segment, as shown in Fig. 4.43. For example, in order to display the number 3, segments $a, b, c, d$ and $g$ would need to be illuminated. If we want to display a different number or letter then a different set of segments would need to be illuminated. The corresponding boolean equations are expressed as follows:

\[
\begin{aligned}
a &= A + C + BD + \bar{B}\bar{D} \\
b &= \bar{B} + \bar{C}\bar{D} + CD \\
c &= B + \bar{C} + D \\
d &= \bar{B}\bar{D} + C\bar{D} + B\bar{C}D + \bar{B}C + A \\
e &= \bar{B}\bar{D} + C\bar{D} \\
f &= A + \bar{C}\bar{D} + B\bar{C} + B\bar{D} \\
g &= A + B\bar{C} + \bar{B}C + C\bar{D}
\end{aligned}
\] (4.30)
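The segment functions can be checked against the usual digit shapes with a short Python sketch. It assumes that $A$ is the most significant BCD bit, that outputs are active-high, and that the digits 6, 7 and 9 are drawn with segments $a$, $a$-only top (no $f$) and $d$ respectively; every complemented literal is written explicitly. The names `seven_segment` and `SEGMENTS` are hypothetical.

```python
# Expected lit segments per decimal digit (assumed display convention).
SEGMENTS = {
    0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
    5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg", 9: "abcdfg",
}


def seven_segment(A, B, C, D):
    """Sum-of-products segment functions; A is the most significant BCD bit."""
    nB, nC, nD = 1 - B, 1 - C, 1 - D
    return {
        "a": A | C | (B & D) | (nB & nD),
        "b": nB | (nC & nD) | (C & D),
        "c": B | nC | D,
        "d": (nB & nD) | (C & nD) | (B & nC & D) | (nB & C) | A,
        "e": (nB & nD) | (C & nD),
        "f": A | (nC & nD) | (B & nC) | (B & nD),
        "g": A | (B & nC) | (nB & C) | (C & nD),
    }


for digit, expected in SEGMENTS.items():
    bits = [(digit >> shift) & 1 for shift in (3, 2, 1, 0)]
    lit = "".join(name for name, on in seven_segment(*bits).items() if on)
    assert lit == expected, (digit, lit)
```

For the digit 3 the check yields the segments $a, b, c, d$ and $g$, in agreement with the example given in the text.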
Figure 4.42 – Function simulation of 3-to-8 bit decoder. $V_{write/read}$ are the writing/reading signals of MTJ states; $A/B/C$ are the three input states; $Y_{0-7}$ are the eight output states; $I_{inj1/2}$ are the two injection currents. $U = 0$ is the control state which realizes the AND function in a majority gate.

![Image of 7-segment display elements for all numbers. Each number corresponds to a set of illuminated segments.](image)

From the above equations, we can see that the BCD to 7-segment display decoder can be implemented with AND/OR gates, which is illustrated in Fig. 4.44 (a). Fig. 4.44 (b) shows the functional symbol of this designed display decoder.

With the amplitudes of injection currents $I_{inj1} = I_{inj2} = 700 \ \mu A$ and $I_{inj3} = I_{inj4} = 455 \ \mu A$, the simulation results of the designed display decoder are shown in Fig. 4.45, which verify the functional behaviors of this design.

### 4.4 Circuit Benchmarking

From the above simulations of the different circuits, we analyze their performance in terms of delay, energy, Energy Delay Product (EDP), and throughput. Table 4.21 shows the performance evaluations with the delay-optimized and energy-optimized approaches. Compared with the CMOS-based circuits listed in Table 4.22, ASL-based circuits have advantages in terms of area and leakage power, yet have a larger delay and energy due to the MTJ switching. Hence, ASL performance enhancement focuses on delay and energy optimization, which are related to the STT switching and the spin injection/detection efficiency, as shown by the performance analysis in Chapter 3.
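The derived metrics can be recomputed from the delay, energy and area entries. The sketch below takes the EDP as the product of energy and delay; the throughput definition, the inverse of delay times area, is an assumption inferred from the tabulated values rather than a definition stated in the text. Function names are hypothetical, and the numbers are the delay-optimized Inverter and AND2 entries of Table 4.21.

```python
def energy_delay_product(delay_ns, energy):
    """EDP as the product of delay and energy (units follow the inputs)."""
    return delay_ns * energy


def throughput(delay_ns, area_um2):
    """Assumed definition: operations per unit time and unit area."""
    return 1.0 / (delay_ns * area_um2)


# Inverter, delay-optimized: area 0.04, delay 0.29, energy 0.065
assert abs(energy_delay_product(0.29, 0.065) - 0.019) < 5e-4
assert abs(throughput(0.29, 0.04) - 86.21) < 0.01

# AND2, delay-optimized: area 0.04, delay 0.886, energy 0.0807
assert abs(energy_delay_product(0.886, 0.0807) - 0.0715) < 1e-4
assert abs(throughput(0.886, 0.04) - 28.22) < 0.01
```

Both rows are reproduced to the printed precision, which supports reading the throughput column as a per-area figure.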
Table 4.20 – Truth table of BCD decoder.

<table>
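<thead>
<tr>
<th rowspan="2">Decimal</th>
<th colspan="4">Binary Pattern</th>
<th rowspan="2">BCD</th>
</tr>
<tr>
<th>8</th>
<th>4</th>
<th>2</th>
<th>1</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>3</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>3</td>
</tr>
<tr>
<td>4</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>4</td>
</tr>
<tr>
<td>5</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>5</td>
</tr>
<tr>
<td>6</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>6</td>
</tr>
<tr>
<td>7</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>7</td>
</tr>
<tr>
<td>8</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>8</td>
</tr>
<tr>
<td>9</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>9</td>
</tr>
<tr>
<td>10</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>Invalid</td>
</tr>
<tr>
<td>11</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>Invalid</td>
</tr>
<tr>
<td>12</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>Invalid</td>
</tr>
<tr>
<td>13</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>Invalid</td>
</tr>
<tr>
<td>14</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>Invalid</td>
</tr>
<tr>
<td>15</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>Invalid</td>
</tr>
</tbody>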
</table>

### 4.5 Summary

In this chapter, we developed a design methodology for ASL-based circuits. Based on the synthesized majority functions, the parameters of the designed circuits are determined to meet the design requirements. With the ASL compact model developed in Chapter 3, the circuits can be simulated and their performance benchmarked. With this design methodology, we designed and implemented basic logic circuits and combinational circuits: arithmetic logic circuits, data transmission circuits and code converters. Their functional behaviors are simulated and validated, and their performance is benchmarked to form a basic circuit library, which is useful for the evaluation of complex circuits and systems. Moreover, compared with CMOS technology, ASL-based circuits have a larger delay and energy, mainly due to the MTJ switching. The performance can be improved through device scaling and material studies, as indicated by the performance analysis in Chapter 3.
Figure 4.44 – 7-segment display decoder. (a) Implementation of the decoder: A/B/C/D as four inputs, a – g as seven outputs; F = 0/1 to realize the AND/OR function in a majority gate. (b) Functional symbol of the designed BCD to 7-segment display decoder.

Figure 4.45 – Function simulation of 7-segment display decoder. V_{write/read} are the writing/reading signals of MTJ states; A/B/C/D are the input states; a – g are the seven output states; U = 0/Z = 1 are the control states to realize the AND/OR function in a majority gate.
Table 4.21 – Basic circuits benchmarking.

<table>
<thead>
<tr>
<th>Function</th>
<th>Area ($\mu m^2$)</th>
<th>Delay-optimized $I_{inj}$ ($\mu A$)</th>
<th>Delay-optimized delay (ns)</th>
<th>Delay-optimized energy (nJ)</th>
<th>Delay-optimized EDP (aJ·s)</th>
<th>Delay-optimized throughput ($1/(ns \cdot \mu m^2)$)</th>
<th>Energy-optimized $I_{inj}$ ($\mu A$)</th>
<th>Energy-optimized delay (ns)</th>
<th>Energy-optimized energy (nJ)</th>
<th>Energy-optimized EDP (aJ·s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Inverter</td>
<td>0.04</td>
<td>$1.9 \times 10^3$</td>
<td>0.29</td>
<td>0.065</td>
<td>0.019</td>
<td>86.21</td>
<td>410</td>
<td>2.285</td>
<td>0.024</td>
<td>0.0549</td>
</tr>
<tr>
<td>Buffer</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>AND2</td>
<td>0.04</td>
<td>700</td>
<td>0.886</td>
<td>0.0807</td>
<td>0.0715</td>
<td>28.22</td>
<td>450</td>
<td>1.82</td>
<td>0.069</td>
<td>0.126</td>
</tr>
<tr>
<td>OR2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>NAND2</td>
<td>0.08</td>
<td>455</td>
<td>1.659</td>
<td>0.0644</td>
<td>0.107</td>
<td>7.534</td>
<td>455</td>
<td>1.659</td>
<td>0.0644</td>
<td>0.107</td>
</tr>
<tr>
<td>NOR2</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>AND3</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>OR3</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>NAND3</td>
<td>0.08</td>
<td>455</td>
<td>1.659</td>
<td>0.0644</td>
<td>0.107</td>
<td>7.534</td>
<td>455</td>
<td>1.659</td>
<td>0.0644</td>
<td>0.107</td>
</tr>
<tr>
<td>NOR3</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>XOR2</td>
<td>TT</td>
<td>0.08</td>
<td>700/637</td>
<td>7.754</td>
<td>0.81</td>
<td>6.286</td>
<td>700/637</td>
<td>7.754</td>
<td>0.81</td>
<td>6.286</td>
</tr>
<tr>
<td>XNOR2</td>
<td>rep</td>
<td>0.12</td>
<td>700</td>
<td>1.772</td>
<td>0.244</td>
<td>0.433</td>
<td>450</td>
<td>3.641</td>
<td>0.207</td>
<td>0.755</td>
</tr>
<tr>
<td>XOR3</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>XNOR3</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Adder</td>
<td>1-bit</td>
<td>0.16</td>
<td>700/970</td>
<td>1.77</td>
<td>0.274</td>
<td>0.485</td>
<td>450/457</td>
<td>3.641</td>
<td>0.231</td>
<td>0.842</td>
</tr>
<tr>
<td></td>
<td>4-bit</td>
<td>0.64</td>
<td>N/A</td>
<td>7.088</td>
<td>1.095</td>
<td>7.76</td>
<td>N/A</td>
<td>14.56</td>
<td>0.925</td>
<td>13.47</td>
</tr>
<tr>
<td>Subtractor</td>
<td>1-bit</td>
<td>0.12</td>
<td>700</td>
<td>1.772</td>
<td>0.244</td>
<td>0.433</td>
<td>450</td>
<td>3.64</td>
<td>0.207</td>
<td>0.755</td>
</tr>
<tr>
<td></td>
<td>4-bit</td>
<td>0.48</td>
<td>N/A</td>
<td>7.088</td>
<td>0.977</td>
<td>6.92</td>
<td>N/A</td>
<td>14.56</td>
<td>0.83</td>
<td>12.08</td>
</tr>
<tr>
<td>Comparator</td>
<td>1-bit</td>
<td>0.16</td>
<td>700/475</td>
<td>1.772</td>
<td>0.319</td>
<td>0.506</td>
<td>450</td>
<td>3.64</td>
<td>0.277</td>
<td>1.007</td>
</tr>
<tr>
<td></td>
<td>2-bit</td>
<td>0.92</td>
<td>600/455</td>
<td>3.318</td>
<td>1.038</td>
<td>3.443</td>
<td>600/455</td>
<td>3.318</td>
<td>1.038</td>
<td>3.443</td>
</tr>
<tr>
<td></td>
<td>4-bit</td>
<td>1.24</td>
<td>700/450</td>
<td>6.292</td>
<td>1.713</td>
<td>10.62</td>
<td>450</td>
<td>12.74</td>
<td>1.53</td>
<td>19.49</td>
</tr>
<tr>
<td>Multiplier</td>
<td>4-bit</td>
<td>2.56</td>
<td>700</td>
<td>22.15</td>
<td>4.483</td>
<td>99.3</td>
<td>450</td>
<td>45.52</td>
<td>3.88</td>
<td>176.62</td>
</tr>
<tr>
<td>ALU</td>
<td>MG</td>
<td>0.2</td>
<td>450/455</td>
<td>3.318</td>
<td>0.21</td>
<td>0.695</td>
<td>450/455</td>
<td>3.318</td>
<td>0.21</td>
<td>0.695</td>
</tr>
<tr>
<td></td>
<td>LG</td>
<td>1.04</td>
<td>N/A</td>
<td>19.14</td>
<td>0.923</td>
<td>17.66</td>
<td>N/A</td>
<td>20.07</td>
<td>0.911</td>
<td>18.29</td>
</tr>
<tr>
<td>Multiplexer</td>
<td>2-to-1</td>
<td>0.16</td>
<td>455</td>
<td>3.318</td>
<td>0.129</td>
<td>0.427</td>
<td>455</td>
<td>3.318</td>
<td>0.129</td>
<td>0.427</td>
</tr>
<tr>
<td></td>
<td>4-to-1</td>
<td>0.44</td>
<td>450/455</td>
<td>3.43</td>
<td>0.5</td>
<td>1.715</td>
<td>450/455</td>
<td>3.43</td>
<td>0.5</td>
<td>1.715</td>
</tr>
<tr>
<td>Demultiplexer</td>
<td>1-to-2</td>
<td>0.08</td>
<td>700</td>
<td>0.886</td>
<td>0.161</td>
<td>0.143</td>
<td>450</td>
<td>1.82</td>
<td>0.138</td>
<td>0.252</td>
</tr>
<tr>
<td></td>
<td>1-to-4</td>
<td>0.32</td>
<td>455</td>
<td>1.659</td>
<td>0.258</td>
<td>0.427</td>
<td>455</td>
<td>1.659</td>
<td>0.258</td>
<td>0.427</td>
</tr>
<tr>
<td></td>
<td>1-to-8</td>
<td>0.96</td>
<td>700</td>
<td>1.77</td>
<td>0.242</td>
<td>0.429</td>
<td>450/455</td>
<td>3.48</td>
<td>0.134</td>
<td>0.465</td>
</tr>
<tr>
<td>4 input priority encoder</td>
<td></td>
<td>0.28</td>
<td>N/A</td>
<td>2.545</td>
<td>0.36</td>
<td>0.915</td>
<td>N/A</td>
<td>3.48</td>
<td>0.336</td>
<td>1.17</td>
</tr>
<tr>
<td>Decoder</td>
<td>2-to-4</td>
<td>0.16</td>
<td>700</td>
<td>0.886</td>
<td>0.323</td>
<td>0.286</td>
<td>450</td>
<td>1.821</td>
<td>0.277</td>
<td>0.503</td>
</tr>
<tr>
<td></td>
<td>3-to-8</td>
<td>0.64</td>
<td>455</td>
<td>1.659</td>
<td>0.515</td>
<td>0.855</td>
<td>455</td>
<td>1.659</td>
<td>0.515</td>
<td>0.855</td>
</tr>
<tr>
<td>7-segment decoder</td>
<td></td>
<td>1.36</td>
<td>N/A</td>
<td>4.204</td>
<td>2.137</td>
<td>8.986</td>
<td>N/A</td>
<td>5.462</td>
<td>1.907</td>
<td>10.42</td>
</tr>
</tbody>
</table>
Table 4.22 – Performance of CMOS-based logic circuits at 25 °C and 1 V, typical process, with 40 nm CMOS [260].

<table>
<thead>
<tr>
<th>Function</th>
<th>Area ($\mu m^2$)</th>
<th>Average leakage power (mW)</th>
<th>Propagation delay (ns)</th>
<th>Energy ($\mu W/MHz$)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Inverter</td>
<td>0.7056</td>
<td>3.596e-7</td>
<td>0.0346</td>
<td>7.158e-4</td>
</tr>
<tr>
<td>AND2</td>
<td>1.76</td>
<td>3.73e-6</td>
<td>0.064</td>
<td>5.74e-3</td>
</tr>
<tr>
<td>NAND2</td>
<td>2.4696</td>
<td>3.524e-6</td>
<td>0.03</td>
<td>4e-3</td>
</tr>
<tr>
<td>OR2</td>
<td>1.764</td>
<td>2.77e-6</td>
<td>5.659e-3</td>
<td>0.0754</td>
</tr>
<tr>
<td>NOR2</td>
<td>3.1752</td>
<td>3.345e-6</td>
<td>0.0433</td>
<td>6.084e-4</td>
</tr>
<tr>
<td>XOR2</td>
<td>6.3504</td>
<td>1.533e-5</td>
<td>0.11</td>
<td>2.1e-2</td>
</tr>
<tr>
<td>MUX</td>
<td>3.88</td>
<td>6.417e-6</td>
<td>0.08</td>
<td>6.8e-3</td>
</tr>
<tr>
<td>OR3</td>
<td>3.8808</td>
<td>5.579e-6</td>
<td>1.16e-2</td>
<td>0.1067</td>
</tr>
</tbody>
</table>
Chapter 5

System Level Design

5.1 System Design Issues
  5.1.1 Reconfigurability
  5.1.2 ASL-based pipelining
  5.1.3 Interconnection issues
5.2 Computing Circuits/Systems Evaluation
  5.2.1 Convolution circuit
  5.2.2 Intel i7 System
5.3 Summary

In the previous chapter, we analyzed ASL-based circuits at the circuit level with the developed circuit design method: the dimensions, the different parameters, and the performance optimization. However, this circuit design methodology cannot be used to evaluate the performance of large systems. At the system level, other methods are necessary that take into account the interconnection issues, which play an important role. A convolution circuit and an Intel i7 system are developed and evaluated as examples. The pipelining of ASL-based circuits is analyzed using the convolution circuit. Moreover, we analyze the reconfigurability of ASL-based circuits, induced by the injection current value/polarity and the control terminal states of the circuit, with the design of a 2-input LookUp Table (LUT).

### 5.1 System Design Issues

In this section, we analyze system design issues: the reconfigurability, the pipelining used to improve the throughput, and the interconnection issues, for which we calculate the buffer count and the input/output counts.

### 5.1.1 Reconfigurability

In Chapter 4, we have designed several basic circuits and combinational circuits. Since the ASL-based circuit follows the majority principle, we found that the ASL-based circuit is reconfigurable with:

1. different control terminal states

2. injection current polarities: positive or negative

3. terminal weights caused by the injection current amplitudes, the channel length, and the terminal dimensions

In this manuscript, we only consider the first two variations to configure the ASL-based circuits. The configurations of the different ASL-based circuits obtained with different injection current polarities and control terminal states are presented in Section 4.3.1: inverter/buffer, AND/OR/NAND/NOR, XOR/XNOR2/3, MUX.

Section 4.3.2.5 presents two structures of the 1-bit ALU. In this subsection, we present another 1-bit ALU design which further exploits the reconfigurability of ASL-based circuits and shows better performance in terms of area and energy consumption.

The ALU circuit is the XOR/XNOR2/3 circuit (Fig. 4.12 (b) in Section 4.3.1) plus one inverter (to configure the output carry for the full-adder). Fig. 5.1 shows the circuit implementation with three 3-input majority gates and one inverter gate: A, B and C as three inputs, M1, M2 and Out as three outputs.
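To illustrate how one network of three 3-input majority gates can provide several functions at once, the following Python sketch computes a 3-input XOR and the full-adder carry. The wiring used here (M1 = Maj(A, B, C), M2 = Maj(A, B, not C), Out = Maj(not M1, M2, C)) is a well-known majority-logic identity chosen for illustration; it is an assumption and is not claimed to reproduce the exact connections of Fig. 5.1. All names are hypothetical.

```python
from itertools import product


def maj3(a, b, c):
    """3-input majority function."""
    return 1 if a + b + c >= 2 else 0


def alu_network(a, b, c):
    """Three majority gates plus inversions (modeled as complements)."""
    m1 = maj3(a, b, c)           # full-adder carry
    m2 = maj3(a, b, 1 - c)
    out = maj3(1 - m1, m2, c)    # 3-input XOR, i.e. full-adder sum
    return m1, m2, out


for a, b, c in product((0, 1), repeat=3):
    m1, _, out = alu_network(a, b, c)
    assert out == a ^ b ^ c
    assert 2 * m1 + out == a + b + c   # (carry, sum) encodes the arithmetic sum

# With C held at 0 the same network yields XOR2 on Out and AND2 on M1.
assert alu_network(1, 0, 0)[2] == 1 and alu_network(1, 1, 0)[0] == 1
```

Holding one input at a constant, or inverting an input through the current polarity, selects a different function from the same hardware, which is the kind of reconfiguration summarized in Table 5.1.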

![ALU Circuit Implementation](image)

Figure 5.1 – ALU circuit implementation.

Table 5.1 shows the configured functions with this circuit.

<table>
<thead>
<tr>
<th>Function</th>
<th>Input</th>
<th>Output</th>
</tr>
</thead>
<tbody>
<tr>
<td>XOR3</td>
<td>A=0, B=0, C=0</td>
<td>I_{inj1}, I_{inj2}, I_{inj3}</td>
</tr>
<tr>
<td>XNOR3</td>
<td>A=0, B=0, C=0</td>
<td>I_{inj1}, I_{inj2}, I_{inj3}</td>
</tr>
<tr>
<td>XOR2</td>
<td>A=0, B=0, C=0</td>
<td>I_{inj1}, I_{inj2}, I_{inj3}</td>
</tr>
<tr>
<td>Full-adder</td>
<td>A=0, B=0, C=0</td>
<td>I_{inj1}, I_{inj2}, I_{inj3}</td>
</tr>
<tr>
<td>Subtractor</td>
<td>A=0, B=0, C=0</td>
<td>I_{inj1}, I_{inj2}, I_{inj3}</td>
</tr>
<tr>
<td>Increment</td>
<td>A=0, B=0, C=0</td>
<td>I_{inj1}, I_{inj2}, I_{inj3}</td>
</tr>
<tr>
<td>Decrement</td>
<td>A=0, B=0, C=0</td>
<td>I_{inj1}, I_{inj2}, I_{inj3}</td>
</tr>
<tr>
<td>AB</td>
<td>A=0, B=0, C=0</td>
<td>I_{inj1}, I_{inj2}, I_{inj3}</td>
</tr>
<tr>
<td>A̅B</td>
<td>A=0, B=0, C=0</td>
<td>I_{inj1}, I_{inj2}, I_{inj3}</td>
</tr>
<tr>
<td>A+B</td>
<td>A=0, B=0, C=0</td>
<td>I_{inj1}, I_{inj2}, I_{inj3}</td>
</tr>
<tr>
<td>A̅B</td>
<td>A=0, B=0, C=0</td>
<td>I_{inj1}, I_{inj2}, I_{inj3}</td>
</tr>
<tr>
<td>M1</td>
<td>A=0, B=0, C=0</td>
<td>I_{inj1}, I_{inj2}, I_{inj3}</td>
</tr>
<tr>
<td>M2</td>
<td>A=0, B=0, C=0</td>
<td>I_{inj1}, I_{inj2}, I_{inj3}</td>
</tr>
<tr>
<td>Out</td>
<td>A=0, B=0, C=0</td>
<td>I_{inj1}, I_{inj2}, I_{inj3}</td>
</tr>
</tbody>
</table>

Table 5.1 – Integrated functions configurations of ALU.

### 5.1.2 ASL-based pipelining

Pipelining is one way to improve the overall performance of a system. The pipeline design technique decomposes a sequential process into several subprocesses, called stages or segments. A stage performs a particular function and produces an intermediate result. It consists of an input latch, also called a register or buffer, followed by a processing circuit. A clock signal is connected to each input latch. At each clock pulse, every stage transfers its intermediate result to the input latch of the next stage. In this way, the final result is produced after the input data have passed through the entire pipeline, completing one stage per clock pulse. The period of the clock pulse should be large enough to provide sufficient time for a signal to traverse through the slowest stage. Hence, the pipelining allows the simultaneous execution of several stages, exploiting the parallelism at the instruction level by overlapping the execution process of instructions. This subsection gives the design of sequential circuits: latch and flip-flop, and explains the pipeline diagram of ASL-based circuits.

Unlike in CMOS-based circuits, each terminal of an ASL-based circuit is connected to a clock signal through the injection current, and no constant supply current is needed (see Fig. 5.2 (a)). As illustrated in Fig. 5.2 (a) and in the timing diagram of Fig. 5.2 (b), the XORrep circuit is implemented using two clocks (CLK1 and CLK2 for stage 1 and stage 2 respectively). Within each clock cycle, the injection current can take one of two phases (positive or negative) at the MTJ terminal in order to implement the inverter and buffer functions. During CLK1, the injection current $I_{inj1}$, with its two phases, is injected into the 6 inputs of the two AND gates. The injected spin currents propagate to M1 and M2, where they are detected during CLK2. For this purpose, the current $I_{inj2}$ is injected into In1, M1, and M2 of stage 2, thus transmitting the spin current to the Out MTJ. The clock signals and the injection current phases are provided by CMOS auxiliary circuits that we do not discuss in this manuscript.

Figure 5.2 – ASL-based circuit clocking. (a) 3-input majority gate-based XOR/XNOR2/3 circuit with clocked injection signals; (b) Clocked signals: CLK1 and CLK2 are connected to stage 1 and stage 2 respectively. The injection current for each stage can have two phases: positive and negative amplitudes to configure the inverter and buffer function respectively. (c) Activity diagram of 2-stage sequential XOR/XNOR2/3 circuit without pipelining.

Following this approach, the simplest “pipelining” of this circuit divides it into two stages, as shown in Fig. 5.2: stage 1 with two MAJ3 gates and stage 2 with one MAJ3 gate. The corresponding activity diagram is shown in Fig. 5.2 (c). However, the circuit is sequential rather than pipelined. Indeed, if the majority gates of stage 1 are written and calculated in clock cycle C2, partial results are transported to the majority gate of stage 2 immediately. This causes errors, since the majority gate in stage 2 has not finished its calculation. Hence, the XOR/XNOR2/3 gate exhibits a latency of 2 clock cycles, where one clock cycle is the maximum latency of the two stages, and the throughput is still \(0.5/ClockCycle\). This “pipelining” implementation thus does not improve the performance compared with a non-pipelined implementation.

Pipelining is achieved by introducing MTJs between the stages in order to implement latches, as illustrated in Fig. 5.3 (a). The data are transmitted from stage 1 to the latches when an injection current is applied to the outputs of the majority gates in stage 1. Similarly, the data stored in the latches are transmitted to the majority gate in stage 2 when an injection current is applied. Hence, the injection currents applied to the MTJ latches act as a triggering clock. This prevents the inputs of the majority gate in stage 2 from varying during the computation of stage 1. The pipelining diagram of the pipelined circuit is presented in Fig. 5.3 (b). The duration of the clock cycle is \(Max(T_{MAJ3S1}, T_{MAJ3S2}) + T_{latch}\), where \(T_{latch}\) is the delay to write the MTJ latch state and \(T_{MAJ3S1/S2}\) are the latencies of the majority gates in stage 1 and stage 2. The latency of this pipelined circuit is 2 clock cycles, and the throughput is \(1/ClockCycle\), which is improved at the cost of area and energy.
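The timing relations for the two-stage circuit, without and with latches, can be written as a small Python sketch. The stage and latch delays used below are hypothetical placeholder values in nanoseconds, not simulation results, and the function names are illustrative only.

```python
def sequential_metrics(t_stage1, t_stage2):
    """Two-stage circuit without latches: one result every two clock cycles."""
    cycle = max(t_stage1, t_stage2)
    return {"clock_cycle": cycle, "latency": 2 * cycle, "throughput": 0.5 / cycle}


def pipelined_metrics(t_stage1, t_stage2, t_latch):
    """Two-stage circuit with MTJ latches: one result every clock cycle."""
    cycle = max(t_stage1, t_stage2) + t_latch
    return {"clock_cycle": cycle, "latency": 2 * cycle, "throughput": 1.0 / cycle}


# Hypothetical delays (ns), chosen only to exercise the formulas.
seq = sequential_metrics(1.0, 0.8)
pipe = pipelined_metrics(1.0, 0.8, 0.25)
print(seq)
print(pipe)
```

Under these relations the pipelined version has the higher throughput whenever $T_{latch}$ is shorter than the slowest stage, while its latency increases by $2\,T_{latch}$.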

![Diagram](image)

**Figure 5.3 – Pipelined XOR/XNOR2/3 circuit.** (a) 2-stage pipelined circuit by adding MTJs as latches between stages. (b) Activity diagram of the pipelined circuit.

### 5.1.3 Interconnection issues

Chapter 4 presents a circuit design methodology and a cell library for system design and evaluation. However, this methodology alone cannot be used for system design because it does not account for interconnections or for the large scale of a system. In this subsection, we present the buffer count and device count calculations for system design based on the cell-library approach.

#### 5.1.3.1 Buffer count

As the spin current attenuates quickly in the channel with \(e^{-t/\lambda_f}\), buffers must be added in a large-scale circuit to guarantee spin current transfer over long distances. The total number of buffers needed in a random logic block, given in [236], is expressed as follows:

\[
\text{Buffer count} = \sum_{i=1}^{n} \left( S_i + S_{i+1} \right)
\]
\[ m = \int_{l=1}^{\sqrt{N}} [l \cdot L_p/L_{crit}] \, i(l) \, dl \]
\[ L_{crit} = \lambda_s N \log \left( \frac{1 + \sqrt{1 - r^2}}{r} \right) \]

where \( L_p \) is the gate pitch; \( L_{crit} \) is the critical length beyond which buffers must be inserted along the diffusion interconnect, which depends on the channel spin diffusion length \( \lambda_s N \) and on \( r \) (the ratio of the critical spin current \( I_{so} \) needed at the receiver for switching to the input spin current \( I_{sinj} \) at the driver); \( i(l) \) is the interconnection distribution density, which is defined as [261]:

**Region I**: \( 1 \leq l \leq \sqrt{N} \)
\[ i(l) = \frac{\alpha_{inter} k}{2} \Gamma \left( \frac{l^3}{3} - 2\sqrt{N}l^2 + 2Nl \right) l^{2p-4} \]
**Region II**: \( \sqrt{N} \leq l \leq 2\sqrt{N} \)
\[ i(l) = \frac{\alpha_{inter} k}{6} \Gamma \left( 2\sqrt{N} - l \right)^{3} l^{2p-4} \]
\[ \alpha_{inter} = \frac{f.o.}{f.o. + 1} \]
\[ \Gamma = \frac{2N(1 - N^{p-1})}{-N^p \frac{1+2p-2^{2p-1}}{p(2p-1)(p-1)(2p-3)} - \frac{1}{6p} + \frac{2\sqrt{N}}{2p-1} - \frac{N}{p-1}} \]

where \( l \) is the interconnect length in units of gate pitches, \( N \) is the number of logic gates, \( p \) is Rent’s exponent, \( k \) is the average number of terminals per gate, \( \alpha_{inter} \) is the fraction of the on-chip terminals that are sink terminals and is related to the average fan-out \( f.o. \), and \( \Gamma \) is a normalization factor.
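The role of \( \Gamma \) as a normalization factor can be checked numerically. The sketch below assumes the standard form of this stochastic wire-length distribution, in which the polynomial factor is multiplied by \( l^{2p-4} \) in both regions and the total number of interconnects equals \( \alpha_{inter} k N (1 - N^{p-1}) \). The parameter values (\( N = 10^4 \), \( p = 0.6 \), \( k = 4 \), fan-out 3) are hypothetical, as are all function names.

```python
import math


def gamma_factor(n_gates, p):
    """Normalization factor of the interconnect length distribution."""
    numerator = 2.0 * n_gates * (1.0 - n_gates ** (p - 1.0))
    denominator = (
        -(n_gates ** p) * (1.0 + 2.0 * p - 2.0 ** (2.0 * p - 1.0))
        / (p * (2.0 * p - 1.0) * (p - 1.0) * (2.0 * p - 3.0))
        - 1.0 / (6.0 * p)
        + 2.0 * math.sqrt(n_gates) / (2.0 * p - 1.0)
        - n_gates / (p - 1.0)
    )
    return numerator / denominator


def make_density(n_gates, p, k, fan_out):
    """Return i(l) for the given Rent parameters."""
    alpha = fan_out / (fan_out + 1.0)
    gamma = gamma_factor(n_gates, p)
    root_n = math.sqrt(n_gates)

    def density(l):
        if 1.0 <= l < root_n:  # Region I
            poly = l ** 3 / 3.0 - 2.0 * root_n * l ** 2 + 2.0 * n_gates * l
            return alpha * k / 2.0 * gamma * poly * l ** (2.0 * p - 4.0)
        if root_n <= l <= 2.0 * root_n:  # Region II
            return alpha * k / 6.0 * gamma * (2.0 * root_n - l) ** 3 * l ** (2.0 * p - 4.0)
        return 0.0

    return density


def integrate(f, a, b, steps=200_000):
    """Composite trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for j in range(1, steps):
        total += f(a + j * h)
    return total * h


N, P, K, FAN_OUT = 10_000, 0.6, 4.0, 3.0  # hypothetical values
total = integrate(make_density(N, P, K, FAN_OUT), 1.0, 2.0 * math.sqrt(N))
expected = FAN_OUT / (FAN_OUT + 1.0) * K * N * (1.0 - N ** (P - 1.0))
print(total, expected)
```

The integral of \( i(l) \) over both regions matches the expected interconnect count to within the integration error, which is a useful sanity check before using \( i(l) \) in a buffer count estimate.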

As shown in Eqs. (5.1)-(5.4), the buffer count depends on the buffer channel length. A longer channel reduces the buffer count, but a higher charge current is needed to compensate for the spin diffusion loss in the channel. Hence, a compromise on the channel length needs to be found to optimize the system performance.
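The trade-off can be made concrete by evaluating the critical length for a few values of \( r \). The sketch below assumes that the logarithm in the expression of \( L_{crit} \) is the natural logarithm, in which case the expression equals \( \lambda \, \mathrm{arccosh}(1/r) \). The spin diffusion length of 1 (arbitrary length unit) and the function name are hypothetical.

```python
import math


def critical_length(spin_diffusion_length, r):
    """L_crit = lambda * ln((1 + sqrt(1 - r^2)) / r), for 0 < r <= 1."""
    if not 0.0 < r <= 1.0:
        raise ValueError("r must lie in (0, 1]")
    return spin_diffusion_length * math.log((1.0 + math.sqrt(1.0 - r * r)) / r)


# A smaller ratio r (stronger injected spin current relative to the
# switching threshold) allows a longer channel before a buffer is needed.
for r in (0.9, 0.5, 0.1):
    print(r, critical_length(1.0, r))
```

A lower \( r \) lengthens the channel that can be driven without a buffer, but it requires a larger injected current, which is the compromise described above.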

#### 5.1.3.2 Device count

The system design uses the cell-library approach: a complex integrated circuit is replaced by the corresponding basic logic circuits presented in Chapter 4. Table 5.2 shows the device count of basic logic circuits in CMOS technology and with cascaded ASL devices [52] (counting only the intermediate and fixed magnets). Based on this table, the circuits of a system are replaced with ASL-based circuits, and the device count equals the sum of the replaced circuit counts and the primary input device count.

We estimate the primary input device count by using Rent’s rule [261], which relates the number of input/output terminals \( T \) to the number of gates \( N \):
\[ T = kN^p \]

where \( k \) is the average number of terminals per gate and \( p \) is Rent’s exponent, which reflects the connectivity of the gates. These two numbers are empirical constants.
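As a worked example of Rent’s rule, the short Python sketch below evaluates the terminal count for a block of gates. The values of \( k \), \( p \) and \( N \) are hypothetical and serve only to show the scaling; the function name is illustrative.

```python
def rent_terminals(num_gates, k, p):
    """Rent's rule: number of I/O terminals T = k * N**p."""
    return k * num_gates ** p


# Hypothetical constants: k = 4 terminals per gate, Rent exponent p = 0.5.
print(rent_terminals(10_000, 4.0, 0.5))

# Quadrupling the gate count only doubles the terminal count when p = 0.5.
print(rent_terminals(40_000, 4.0, 0.5) / rent_terminals(10_000, 4.0, 0.5))
```

Because \( p < 1 \), the number of primary inputs grows more slowly than the number of gates, so their contribution to the total device count shrinks for larger blocks.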

The ASL implementation is based on the majority function instead of the AOI graphs used for CMOS technology, and further optimization of the circuits in terms of size and depth is possible. For example, with delay-oriented optimization [158, 259], the device count can be reduced by 18% on average. The system design and evaluation use this cell library together with the logic optimization approaches.
Table 5.2 – Device count comparison between CMOS and ASL.

<table>
<thead>
<tr>
<th>Function</th>
<th>CMOS</th>
<th>ASL</th>
</tr>
</thead>
<tbody>
<tr>
<td>Inverter</td>
<td>2</td>
<td>1</td>
</tr>
<tr>
<td>Buffer</td>
<td>4</td>
<td>1</td>
</tr>
<tr>
<td>2-input AND</td>
<td>6</td>
<td>2</td>
</tr>
<tr>
<td>2-input OR</td>
<td>6</td>
<td>2</td>
</tr>
<tr>
<td>2-input NAND</td>
<td>4</td>
<td>2</td>
</tr>
<tr>
<td>2-input NOR</td>
<td>4</td>
<td>2</td>
</tr>
<tr>
<td>2-input XOR</td>
<td>6</td>
<td>3</td>
</tr>
<tr>
<td>2-input XNOR</td>
<td>8</td>
<td>3</td>
</tr>
<tr>
<td>3-input AND</td>
<td>8</td>
<td>3</td>
</tr>
<tr>
<td>3-input OR</td>
<td>8</td>
<td>3</td>
</tr>
<tr>
<td>3-input NAND</td>
<td>6</td>
<td>3</td>
</tr>
<tr>
<td>3-input NOR</td>
<td>6</td>
<td>3</td>
</tr>
<tr>
<td>3-input XOR</td>
<td>20</td>
<td>2</td>
</tr>
<tr>
<td>3-input XNOR</td>
<td>22</td>
<td>2</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Function</th>
<th>CMOS</th>
<th>ASL</th>
</tr>
</thead>
<tbody>
<tr>
<td>2-input MUX</td>
<td>16</td>
<td>4</td>
</tr>
<tr>
<td>4-input MUX</td>
<td>24</td>
<td>18</td>
</tr>
<tr>
<td>1-to-2 DEMUX</td>
<td>14</td>
<td>4</td>
</tr>
<tr>
<td>1-to-4 DEMUX</td>
<td>28</td>
<td>12</td>
</tr>
<tr>
<td>1-to-8 DEMUX</td>
<td>52</td>
<td>40</td>
</tr>
<tr>
<td>Full-adder</td>
<td>28</td>
<td>3</td>
</tr>
<tr>
<td>4-bit adder</td>
<td>112</td>
<td>12</td>
</tr>
<tr>
<td>Full-subtractor</td>
<td>34</td>
<td>2</td>
</tr>
<tr>
<td>1-bit comparator</td>
<td>22</td>
<td>4</td>
</tr>
<tr>
<td>2-bit comparator</td>
<td>30</td>
<td>30</td>
</tr>
<tr>
<td>4-bit comparator</td>
<td>98</td>
<td>40</td>
</tr>
<tr>
<td>4-input encoder</td>
<td>30</td>
<td>11</td>
</tr>
<tr>
<td>2-to-4 decoder</td>
<td>28</td>
<td>8</td>
</tr>
<tr>
<td>3-to-8 decoder</td>
<td>52</td>
<td>24</td>
</tr>
</tbody>
</table>
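The cell-library counting described above can be illustrated with a short Python sketch. It copies only a few rows of Table 5.2; the function name `device_count` and the netlist representation (a dictionary mapping a cell name to its number of instances) are assumptions made for this illustration and are not part of the thesis tooling.

```python
# Device counts copied from Table 5.2 (only a few rows are reproduced here).
CMOS_COUNT = {"Inverter": 2, "2-input NAND": 4, "Full-adder": 28}
ASL_COUNT = {"Inverter": 1, "2-input NAND": 2, "Full-adder": 3}


def device_count(netlist, library, primary_inputs=0):
    """Sum the library device counts over all cells of a netlist.

    netlist: dict mapping a cell name to its number of instances (hypothetical format).
    library: dict mapping a cell name to its device count (one column of Table 5.2).
    primary_inputs: number of primary input devices added on top of the cells.
    """
    return sum(library[cell] * n for cell, n in netlist.items()) + primary_inputs


# A 4-bit ripple-carry adder built from four full-adders.
four_bit_adder = {"Full-adder": 4}
print(device_count(four_bit_adder, CMOS_COUNT))  # 112
print(device_count(four_bit_adder, ASL_COUNT))   # 12
```

Both results agree with the "4-bit adder" row of Table 5.2, which suggests that this row was obtained by the same replacement of full-adders.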

5.2 Computing Circuits/Systems Evaluation

5.2.1 Convolution circuit

5.2.1.1 Design

Convolution is an important operation in signal processing and analysis. If the impulse response of a system is known, convolution gives the output of the system for any arbitrary input signal. 2D convolution operates along both the horizontal and the vertical direction of a two-dimensional spatial domain. The output $y[m, n]$ of a linear, time-invariant system can be written as the convolution of the input signal $x[m, n]$ with the impulse response $h[m, n]$:

$$y[m, n] = x[m, n] * h[m, n] = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} x[i, j] \cdot h[m - i, n - j] \quad (5.6)$$

From Equation 5.6, the convolution function consists of additions and multiplications. Hence, we implemented the convolution circuit with adders and multipliers, as shown in Fig. 5.4 (a). With this circuit, the convolution is divided into 9 steps. In each step, the input samples $x[m, n]$ are transferred to the convolution circuit in raster scan order, whereas the impulse response values $h[m, n]$ are initially written to the multipliers. The multipliers perform $16 \times 16$ bit multiplications, outputting nine 32-bit values at the same time. These values are transferred to the adder tree, which generates a 32-bit value that is stored in the output registers. After nine steps, nine outputs $y[m, n]$ of the system are obtained.
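For finite-size signals, the double sum of Equation 5.6 reduces to a few nested loops. The following Python sketch is a purely software illustration of that sum (it does not model the ASL circuit of Fig. 5.4); the function name `convolve2d_full` and the "full" output size are choices made for this example.

```python
def convolve2d_full(x, h):
    """Direct evaluation of Eq. 5.6 for finite 2D lists x and h.

    Samples outside the stored ranges are treated as zero, so the output has
    (rows_x + rows_h - 1) x (cols_x + cols_h - 1) entries.
    """
    rows_x, cols_x = len(x), len(x[0])
    rows_h, cols_h = len(h), len(h[0])
    y = [[0] * (cols_x + cols_h - 1) for _ in range(rows_x + rows_h - 1)]
    for i in range(rows_x):
        for j in range(cols_x):
            for u in range(rows_h):
                for v in range(cols_h):
                    # Term x[i, j] * h[m - i, n - j] with m = i + u and n = j + v.
                    y[i + u][j + v] += x[i][j] * h[u][v]
    return y


x = [[1, 2], [3, 4]]
h = [[1, 1], [1, 1]]
print(convolve2d_full(x, h))  # [[1, 3, 2], [4, 10, 6], [3, 7, 4]]
```

Each output sample is a sum of products, which is why the hardware version only needs multipliers followed by an adder tree.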

**Adder tree**  To efficiently add the partial outputs of the multipliers, the convolution circuit uses the adder tree shown in Fig. 5.5. It is implemented using eight 32-bit ripple-carry adders.

**Multiplier**  Fig. 5.6 illustrates the 16-bit multiplier implementation using $16 \times 16$ AND gates and 15 16-bit adders.
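The structure of this multiplier can be mimicked in software. The sketch below is a behavioral illustration only: it forms the $16 \times 16$ partial-product bits with bitwise AND operations and then sums the 16 shifted rows, counting the AND operations and additions along the way. The function name `array_multiply` is an assumption of this example, and Python integer additions stand in for the 16-bit hardware adders.

```python
def array_multiply(a, b, width=16):
    """Multiply two unsigned `width`-bit integers the way an array multiplier does.

    Returns (product, number of AND operations, number of row additions).
    """
    mask = (1 << width) - 1
    if not (0 <= a <= mask and 0 <= b <= mask):
        raise ValueError("operands must fit in `width` bits")

    and_gates = 0
    rows = []
    for i in range(width):
        b_i = (b >> i) & 1
        row = 0
        for k in range(width):
            # One AND gate per partial-product bit a_k AND b_i.
            row |= (((a >> k) & 1) & b_i) << k
            and_gates += 1
        rows.append(row << i)  # row i is weighted by 2**i

    product = rows[0]
    additions = 0
    for row in rows[1:]:
        product += row  # one adder per additional row
        additions += 1
    return product, and_gates, additions


print(array_multiply(3, 5))  # (15, 256, 15)
```

For `width=16` the counters give 256 AND operations and 15 additions, matching the gate and adder counts quoted for Fig. 5.6.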

To improve the throughput, the convolution circuit is pipelined into 2 stages: a first stage with the multipliers and MTJ latches, and a second stage with the adder tree and the registers.

Figure 5.4 – 2D 16-bit $3 \times 3$ convolution implementation. (a) General convolution circuit; (b) 2-stage pipelined convolution circuit; (c) Activity diagram of a 2-stage pipelined convolution circuit.

Fig. 5.4(b) and (c) respectively illustrate the 2-stage pipelining of the convolution circuit and its activity diagram. The clock cycle in this case is $T_{clock} = \max(T_{adder}, T_{multiplier}) + T_{write}$, where $T_{write}$ is the delay of the MTJ latches (writing the MTJ states) and $T_{adder}$ and $T_{multiplier}$ are the latencies of the adder tree and the multiplier operations. The latency of this pipelined convolution circuit is 2 clock cycles, and the throughput is $1/T_{clock}$, which is improved at the cost of area.

5.2.1.2 Performance

**Non-pipelined circuit** The 16/32-bit serial adder is implemented using 16/32 full-adders, and each full-adder is implemented using an XOR$_{rep}$ structure and an inverter. Hence, as detailed in Table 5.3, the area, energy consumption and delay of the 16/32-bit adder are 2.6/5.1 $\mu m^2$, 4.4/8.8 nJ and 15.3/29.5 ns. The 16-bit multiplier is implemented using 15 16-bit adders and 256 AND gates. Its area, delay and energy consumption are 73 $\mu m^2$, 231 ns and 86.2 nJ, respectively. The area of the 2D convolution circuit is 1048 $\mu m^2$, which includes an area overhead factor of 1.5 applied to the required multipliers and adders. The delay and energy consumption are 349 ns and 846 nJ, respectively.

![Figure 5.5 – 32-bit adder tree.](image)
Figure 5.6 – 16-bit array multiplier implemented with AND gates and 16-bit adders.
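The figures quoted for the non-pipelined convolution circuit can be cross-checked from the block-level values. The Python sketch below assumes nine multipliers feeding eight 32-bit adders arranged as a balanced binary tree, so that the critical path crosses one multiplier and $\lceil \log_2 9 \rceil = 4$ adder levels; the tree depth is an inference made for this example rather than a value stated explicitly in the text.

```python
import math

# Block-level values quoted in the text (non-pipelined entries of Table 5.3).
MULT16 = {"area_um2": 73.0, "delay_ns": 231.0, "energy_nj": 86.2}
ADD32 = {"area_um2": 5.1, "delay_ns": 29.5, "energy_nj": 8.8}

N_MULT = 9            # one multiplier per kernel coefficient (3 x 3)
N_ADD = 8             # adders needed to sum nine values
AREA_OVERHEAD = 1.5   # area overhead factor quoted in the text

# Assumed balanced binary adder tree: depth = ceil(log2(number of inputs)).
tree_depth = math.ceil(math.log2(N_MULT))

area = AREA_OVERHEAD * (N_MULT * MULT16["area_um2"] + N_ADD * ADD32["area_um2"])
delay = MULT16["delay_ns"] + tree_depth * ADD32["delay_ns"]
energy = N_MULT * MULT16["energy_nj"] + N_ADD * ADD32["energy_nj"]

print(tree_depth, round(area, 1), delay, round(energy, 1))  # 4 1046.7 349.0 846.2
```

Under these assumptions the delay and energy reproduce the quoted 349 ns and 846 nJ, and the area comes within about 0.1% of the quoted 1048 $\mu m^2$.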

**Pipelined circuit** The clock speed of the 2-stage pipelined convolution circuit (Fig. 5.4 (b)) depends on the slower stage (231 ns for the multiplier, compared with 118 ns for the adder tree). Taking into account the latency of the MTJ latches (buffers), the total latency of the circuit is thus 462.6 ns. However, since the latency of the multiplier is approximately twice that of the adder tree, latches can be introduced to increase the clock speed (e.g. a 16-bit multiplier can be pipelined into 2 stages). Moreover, finer-grained pipelining can also be investigated based on the basic circuit performance in Table 4.21 in Chapter 4. As illustrated in Table 5.3, a significant improvement of the throughput can be achieved.
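The timing relations of the 2-stage pipeline can be written as a small helper. In the Python sketch below, the stage delays are the 231 ns and 118 ns quoted above, while the latch write time of 0.3 ns is not given in the text: it is back-computed here from the quoted total latency (462.6 ns / 2 - 231 ns) and should be read as an assumption. The function name `pipeline_metrics` is also illustrative.

```python
def pipeline_metrics(stage_delays_ns, t_write_ns):
    """Return (clock cycle in ns, latency in ns, throughput in results per second).

    The clock cycle is set by the slowest stage plus the latch write time,
    the latency is one clock cycle per stage, and one result leaves the
    pipeline every clock cycle.
    """
    clock_ns = max(stage_delays_ns) + t_write_ns
    latency_ns = len(stage_delays_ns) * clock_ns
    throughput_per_s = 1e9 / clock_ns
    return clock_ns, latency_ns, throughput_per_s


# Multiplier stage and adder-tree stage; T_write = 0.3 ns is an inferred value.
clock, latency, throughput = pipeline_metrics([231.0, 118.0], t_write_ns=0.3)
print(round(clock, 1), round(latency, 1), round(throughput / 1e7, 2))  # 231.3 462.6 0.43
```

The resulting throughput of about $0.43 \times 10^7$ s$^{-1}$ agrees with the 2-level convolution entry of Table 5.3, and the same relation (number of levels divided by total latency) appears to reproduce the other throughput entries of that table.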

### 5.2.2 Intel i7 System

#### 5.2.2.1 Design

To explore the interconnection distribution in a system, we evaluated an Intel i7 Haswell system, whose specifications are listed in Table 5.4.

It is composed of $2.6 \times 10^9$ transistors, of which about $0.42 \times 10^9$ are used for caches while the remaining $2.18 \times 10^9$ are used for logic circuits. The methodology we propose in Section 4.2 is applied as follows:

**Step 1:** Input parameters are those listed in Table 3.1 in Section 3.2.2, and we target a 25 MHz operating frequency. Assuming 20 logic gates in a single pipeline stage, the switching time of an ASL device is estimated as $1/(25\,\text{MHz} \times 20) = 2$ ns, i.e. the 40 ns clock period divided among the 20 gates. In order to ensure that most of the ASL devices are used for logic functions, we limit the buffer count for the interconnection to half of the total device count. Under these two constraints, the power consumption of the system (calculated as the switching energy of a single device $\times$ frequency $\times$ device count) is optimized.
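The two quantities used in Step 1 are simple enough to express as one-line helpers. The Python sketch below uses the 25 MHz frequency and the 20 gates per stage from the text; the switching energy and device count passed to `system_power` are hypothetical placeholders chosen only to show the units, not values from the thesis.

```python
def switching_time_budget(frequency_hz, gates_per_stage):
    """Time available per gate when one clock period is shared by all gates of a stage."""
    return 1.0 / (frequency_hz * gates_per_stage)


def system_power(switching_energy_j, frequency_hz, device_count):
    """Power estimate: switching energy of one device x frequency x device count."""
    return switching_energy_j * frequency_hz * device_count


t_switch = switching_time_budget(25e6, 20)
print(round(t_switch * 1e9, 3))  # 2.0 (ns)

# Hypothetical numbers for illustration only: 1 fJ per switching event, 1e9 devices.
print(round(system_power(1e-15, 25e6, 1e9), 3))  # 25.0 (W)
```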

**Step 2:** Since the system synthesis relies on the circuit library, we replace CMOS-based
Table 5.3 – Convolution Circuit Implementation Results.

<table>
<thead>
<tr>
<th></th>
<th>Area ($\mu m^2$)</th>
<th>delay (ns)</th>
<th>energy (nJ)</th>
<th>Throughput ($\times 10^7$ s$^{-1}$)</th>
</tr>
</thead>
<tbody>
<tr>
<td>16-bit adder</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>NP</td>
<td>2.6</td>
<td>15.3</td>
<td>4.4</td>
<td>6.5</td>
</tr>
<tr>
<td>2-levels</td>
<td>3.2</td>
<td>17.1</td>
<td>5.4</td>
<td>11.7</td>
</tr>
<tr>
<td>4-levels</td>
<td>4.5</td>
<td>20</td>
<td>7.5</td>
<td>20</td>
</tr>
<tr>
<td>8-levels</td>
<td>7</td>
<td>25.9</td>
<td>11.7</td>
<td>30.9</td>
</tr>
<tr>
<td>16-levels</td>
<td>12.2</td>
<td>37.6</td>
<td>20</td>
<td>42.5</td>
</tr>
<tr>
<td>32-bit adder</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>NP</td>
<td>5.1</td>
<td>29.5</td>
<td>8.8</td>
<td>3.4</td>
</tr>
<tr>
<td>2-levels</td>
<td>5.8</td>
<td>31.3</td>
<td>9.8</td>
<td>6.4</td>
</tr>
<tr>
<td>4-levels</td>
<td>7</td>
<td>34.2</td>
<td>11.9</td>
<td>11.7</td>
</tr>
<tr>
<td>8-levels</td>
<td>9.6</td>
<td>40.1</td>
<td>16</td>
<td>20</td>
</tr>
<tr>
<td>16-levels</td>
<td>14.7</td>
<td>51.8</td>
<td>24.4</td>
<td>30.9</td>
</tr>
<tr>
<td>32-levels</td>
<td>25.6</td>
<td>75.3</td>
<td>41</td>
<td>42.5</td>
</tr>
<tr>
<td>16-bit multiplier</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>NP</td>
<td>73</td>
<td>231</td>
<td>86.2</td>
<td>0.43</td>
</tr>
<tr>
<td>2-levels</td>
<td>73.3</td>
<td>246.2</td>
<td>87.2</td>
<td>0.81</td>
</tr>
<tr>
<td>4-levels</td>
<td>73.9</td>
<td>246.8</td>
<td>89.3</td>
<td>1.6</td>
</tr>
<tr>
<td>8-levels</td>
<td>75.2</td>
<td>247.9</td>
<td>93.5</td>
<td>3.2</td>
</tr>
<tr>
<td>16-levels</td>
<td>78.1</td>
<td>250.2</td>
<td>102.8</td>
<td>6.4</td>
</tr>
<tr>
<td>Convolution circuit</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>NP</td>
<td>1048</td>
<td>349</td>
<td>846</td>
<td>0.29</td>
</tr>
<tr>
<td>2-levels</td>
<td>1053.8</td>
<td>462.6</td>
<td>864.7</td>
<td>0.43</td>
</tr>
<tr>
<td>3-levels</td>
<td>1059.5</td>
<td>355.2</td>
<td>883.4</td>
<td>0.84</td>
</tr>
</tbody>
</table>

NP: non-pipelined.

Table 5.4 – Intel i7 Haswell Architecture Characteristics.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Channel length</td>
<td>22 nm</td>
</tr>
<tr>
<td>Die size</td>
<td>$177 \text{ mm}^2$</td>
</tr>
<tr>
<td>Transistor count</td>
<td>$2.6 \times 10^9$</td>
</tr>
<tr>
<td>Power</td>
<td>84 W @3.4 GHz</td>
</tr>
<tr>
<td># of cores</td>
<td>4</td>
</tr>
<tr>
<td>Cache</td>
<td>L1 64 KB per core</td>
</tr>
<tr>
<td></td>
<td>L2 256 KB per core</td>
</tr>
<tr>
<td></td>
<td>L3 8 MB</td>
</tr>
</tbody>
</table>

functions with majority gates. We consider only NAND gates for the system implementation, since they can be used to implement any Boolean function. Since the device count of the ASL-based NAND gate is half that of its CMOS counterpart (Table 5.2 in subsection 5.1.3.2), we estimate that $1.09 \times 10^9$ (i.e. $2.18 \times 10^9/2$) ASL devices are needed to implement the system. The number of primary inputs is estimated as 3746, based on Rent’s rule with $k = 2.09$ and $p = 0.36$, as suggested in [261] for the Intel microprocessor family. With the delay-oriented optimization enabled by the majority function [158], the device count can be reduced to $894 \times 10^6$ (i.e. $447 \times 10^6$ NAND gates). It is worth mentioning that we do not consider the control circuits and clocking circuits in this evaluation.
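The estimates of Step 2 follow from Rent's rule, $T = kN^p$, and from the 18% average reduction attributed to delay-oriented optimization. The Python sketch below reproduces the arithmetic with the values quoted in the text; the function name `rent_terminals` and the variable names are assumptions of this example.

```python
def rent_terminals(n_gates, k, p):
    """Rent's rule: number of terminals T = k * N**p."""
    return k * n_gates ** p


logic_transistors = 2.18e9
asl_devices = logic_transistors / 2          # ASL NAND uses half the devices of CMOS NAND
primary_inputs = rent_terminals(asl_devices, k=2.09, p=0.36)

optimized_devices = asl_devices * (1 - 0.18)  # 18% average reduction [158]
nand_gates = optimized_devices / 2            # two ASL devices per NAND gate (Table 5.2)

print(f"{asl_devices:.3g} devices, about {primary_inputs:.0f} primary inputs")
print(f"{optimized_devices:.4g} devices after optimization, {nand_gates:.4g} NAND gates")
```

With these inputs the rule gives roughly 3746 primary inputs, and the optimized counts come out near $894 \times 10^6$ devices and $447 \times 10^6$ NAND gates.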

**Step 3**: The aim of this step is to optimize the system performance. Since only NAND gates and buffers are used, the optimization targets these two circuits. The performance evaluation of the NAND gate follows the method described in Section 4.3.1.2. The buffer count and its injection current are analysed for power optimization using Algorithm 1.

**Step 4**: The optimized system (i.e. including NAND count, buffer count, and their injection currents) is implemented and evaluated.

### 5.2.2.2 Performance

According to the system design described in Section 5.2.2.1, we assume an Intel system composed only of buffer and NAND gates. The system optimization is thus carried out by
Algorithm 1 Power optimization of inserted buffer.

Require:
initial parameters: device parameters, e.g. \( \lambda_N \), thermal factor \( \Delta \), \( P_{F/C} \), \( \alpha \), \( W \); interconnection parameters, e.g. \( N \), \( p \); constraints: delay \( \Delta t \) and the maximum ratio \( \eta \) of buffer count to logic gate count

Ensure:
optimized buffer count \( c_{bufferop} \), power \( P_{op} \), injection current \( I_{injop} \), channel length \( L_{Nbufferop} \)

1: initialize the best-so-far values with placeholders: power \( P_{op} = 100000 \) and injection current \( I_{injop} = 1 \)
2: calculate the interconnection distribution
3: set the minimum and maximum channel lengths \( L_{Nmin} \) and \( L_{Nmax} \), and the step size \( L_{Niter} \)
4: set the candidate channel lengths \( L_{Nbuffer} \in [L_{Nmin} : L_{Niter} : L_{Nmax}] \) and calculate the number of iterations \( N_{iter} \)
5: for \( i = 1; i \leq N_{iter}; i + + \) do
6: find the injection current \( I_{inj} \) that meets the delay constraint \( \Delta t \) for the \( i \)-th channel length
7: calculate the buffer count \( c_{buffer} \)
8: if \( c_{buffer} < \eta \times N \) (buffer count constraint) then
9: calculate the power \( p_i \)
10: if \( p_i < P_{op} \) then
11: update the optimized parameters: \( c_{bufferop} \), \( P_{op} \), \( I_{injop} \), \( L_{Nbufferop} \)
12: end if
13: end if
14: end for

improving the buffer and NAND gate implementations. This is achieved by exploring i) the FM switching current threshold (which depends on the device width \( W \), the thermal factor \( \Delta \) and the damping factor \( \alpha \)) and ii) the spin detection efficiency (which depends on the spin diffusion length \( \lambda_{sN} \) and the spin polarizations \( P_F \) and \( P_C \)).
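The search performed by Algorithm 1 is a one-dimensional sweep over candidate buffer channel lengths. The Python sketch below shows that control flow only. The three model functions (`toy_current`, `toy_buffer_count`, `toy_power`) are placeholders invented for this example and do not represent the ASL compact model; in the thesis, these quantities come from the device model and the interconnection distribution. The sketch also uses `None` instead of a large initial power value as the "nothing found yet" marker.

```python
import math
from typing import NamedTuple


class BufferOptimum(NamedTuple):
    channel_length: float
    injection_current: float
    buffer_count: float
    power: float


def optimize_buffers(lengths, find_current, count_buffers, power_of, n_gates, eta):
    """Sweep the candidate channel lengths and keep the lowest-power feasible one."""
    best = None
    for length in lengths:
        current = find_current(length)          # step 6: current meeting the delay constraint
        count = count_buffers(length)           # step 7: buffer count for this length
        if count >= eta * n_gates:              # step 8: buffer count constraint violated
            continue
        power = power_of(current, count)        # step 9
        if best is None or power < best.power:  # steps 10-11
            best = BufferOptimum(length, current, count, power)
    return best


# Placeholder models (NOT from the thesis), in arbitrary units.
def toy_current(length, spin_diffusion_length=1.0):
    # Longer channels lose more spin signal, so more current is needed.
    return math.exp(length / spin_diffusion_length)


def toy_buffer_count(length, total_wire_length=100.0):
    # Longer channels need fewer buffers to cover the same wiring.
    return total_wire_length / length


def toy_power(current, count):
    return count * current ** 2


candidate_lengths = [0.5 * i for i in range(1, 11)]  # 0.5, 1.0, ..., 5.0
best = optimize_buffers(candidate_lengths, toy_current, toy_buffer_count,
                        toy_power, n_gates=100, eta=0.5)
print(best.channel_length, best.buffer_count)  # 2.5 40.0
```

With these toy models the unconstrained power minimum lies at the shortest length, but the buffer count constraint rules out every length up to 2.0, so the sweep settles on 2.5. This mirrors the trade-off described in the text between buffer count and injection current.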

Table 5.5 – ASL Versus CMOS Power Comparison.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>CMOS</th>
<th>ASL</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology node</td>
<td>22 nm</td>
<td>48 nm</td>
</tr>
<tr>
<td>NAND count</td>
<td>( 2.18 \times 10^6 )</td>
<td>( 0.447 \times 10^7 )</td>
</tr>
<tr>
<td>(buffer count)</td>
<td>( (8.9 \times 10^6) )</td>
<td>( (2.45 \times 10^6) )</td>
</tr>
<tr>
<td>( \lambda_N )</td>
<td>( 0.06 \mu A )</td>
<td>( 0.06 \mu A )</td>
</tr>
<tr>
<td>( (NAND/buffer) )</td>
<td>( 0.5 \mu A/\mu m )</td>
<td>( 0.5 \mu A/\mu m )</td>
</tr>
<tr>
<td>Channel</td>
<td>( L_N )</td>
<td>( 10 \mu m )</td>
</tr>
<tr>
<td>(NAND/buffer)</td>
<td>( 50 \mu m )</td>
<td>( 15 \mu m )</td>
</tr>
<tr>
<td>Magnet</td>
<td>( P_{F/C} )</td>
<td>( 0.5 )</td>
</tr>
<tr>
<td>( \alpha )</td>
<td>( 0.027 )</td>
<td>( 0.027 )</td>
</tr>
<tr>
<td>Power (135 MHz)</td>
<td>( 4.4 \times 10^4 ) W</td>
<td>( 5.8 \times 10^4 ) W</td>
</tr>
<tr>
<td>(NAND/buffer)</td>
<td>( 6877 ) W</td>
<td>( 517.7 ) W</td>
</tr>
</tbody>
</table>

Results from Subsection 4.3.1.2 show that the NAND gate with three channels without junction is the best design option. Moreover, a shorter channel length leads to a lower spin attenuation, i.e. a lower injection current is needed. Considering the dipolar coupling, we set the channel length to 50 nm (resp. 15 nm) for the \( W = 40 \) nm (resp. 5 nm) technology node. Table 5.5 gives the results assuming a 2 ns delay constraint.

As previously explained, the buffer count depends on the channel length. While a longer channel reduces the buffer count, a higher charge current is needed to compensate for the spin diffusion loss in the channel. Hence, the buffer channel length and the corresponding injection current need to be optimized under the 2 ns delay constraint and the buffer count constraint. Table 5.5 shows the optimized channel length and buffer count for the set of parameters we assume.

Results show that an ASL-based system implemented using existing fabrication and material technologies consumes much more power than the CMOS-based system. However, future improvements in the fabrication process and material discoveries could allow the ASL-based system to outperform the CMOS-based system.
5.3 Summary

In this chapter, we presented the system design procedure, which takes into account the gate interconnection distribution and the buffer count and uses the cell-library approach, with the examples of a convolution circuit and an Intel i7 system. The performance of these circuits and systems was evaluated. Results indicate that future improvements in the fabrication process and material technologies of ASL circuits could create opportunities to outperform CMOS implementations.

The pipelining scheme of ASL-based circuits was discussed with the example of the convolution circuit, and the performance of different pipelining levels was evaluated. Finer-grained pipelining is possible for further performance improvement. The reconfigurability of ASL-based circuits, which is related to the injection current polarities/values and the control terminal states, was discussed with an ALU circuit. System architectures that integrate the reconfigurability of ASL devices can be explored further for new computing paradigms.
Chapter 6

Conclusions and Perspectives

The global objective of this work was to develop an ASL-based circuit/system design methodology, to evaluate the performance of the resulting circuits and systems, and to compare it with standard benchmarks. This chapter summarizes the overall contributions and results of this work, and discusses future opportunities to pursue it.

6.1 Conclusions

6.1.1 Global conclusions

This thesis aims to build an integrated framework for designing and evaluating ASL-based circuits and systems, from device modeling and layout to system evaluation. ASL research is still in its infancy, and most experiments focus on proving the spin injection/detection phenomena and on enhancing the injection efficiency rather than on circuit/system design (Chapter 1). The prospect of low-power system applications motivates a complete framework for the ASL device, from the device level to the system level. Hence, the main contributions of this thesis are an ASL compact model and a circuit/system design methodology. We first investigated the basic structure and principles of the ASL device (Chapter 2): the MTJ and the spin injection/detection model. By exploring the physical models of the ASL device, we developed a compact model, programmed in Verilog-A on Cadence, that allows hierarchical circuit design. Validated against selected ASL experimental results, this compact model can in principle be used to design and evaluate arbitrary circuits (Chapter 3). We then used this compact model to design general combinational circuits following the circuit/system design methodology we developed (Chapter 4). Circuits were implemented and evaluated based on the compact model, and a circuit library was developed for system design and evaluation (Chapter 5). The reconfigurability and the pipelining of ASL-based circuits were analyzed and different systems were evaluated. Results indicate that future improvements in the fabrication process and material technologies of ASL circuits could create opportunities to outperform CMOS implementations.

6.1.2 Device level

To allow circuit implementation and evaluation, we developed a compact model of the ASL device based on the physical models and the experimental results. This compact model integrates the TMR effect to describe the MTJ resistance, the STT effect to define the switching threshold current and the switching time, and the spin injection/accumulation/detection effects. Moreover, we also consider the spin diffusion effect in the channel through the diffusion delay and the channel breakdown current calculations. The scaling effects are considered in the circuit evaluation.

The compact model is divided into several blocks and programmed with Verilog-A on Cadence, which allows cross-layer optimization of ASL-based circuits and eases the design of
hierarchical, complex circuits. By comparing with experimental results, this compact model is validated and single device simulation validates the functional behavior of the ASL device.

Moreover, in this manuscript we also derived the spin injection/accumulation expressions (detected voltage, non-local resistance, injection/detection efficiency) for the ASL device used here (with an insulator underneath the MTJ and asymmetric FM-N interfaces), which enables the discussion of the ASL experimental phenomena.

6.1.3 Circuit level

At the circuit level, we developed a design methodology that takes into account the channel distributions, the gate interconnection and the injection current variation caused by spin diffusion. Once the circuit function is defined, one of two synthesis methods is used depending on the circuit size: the “truth table” method for small circuits and the “replacement” method for integrated circuits. With given material parameters and circuit specifications/constraints, the other variables (magnet dimensions, channel lengths, and injection currents) are specified and optimized, followed by the circuit implementation and verification based on the compact model.

Based on this methodology, we have implemented the combinational circuits: basic logic circuits, arithmetic logical functions, data transmission functions and code converters. Through SPICE simulations, the functional behaviors of these circuits are validated; their performances are evaluated and a performance library is established for system design and evaluation.

6.1.4 System level

System design uses the cell-library approach, based on the performance library we developed at the circuit level. Interconnection issues, including the buffer count and the interconnection distribution, are considered in the system design. A convolution circuit and an Intel i7 system were developed based on the “replacement” method, with the logic functions replaced by ASL-based circuits and the circuits optimized using majority Boolean algebra. Their performance was evaluated using the circuits benchmarked at the circuit level. Results indicate that future improvements in the fabrication process and material technologies of ASL circuits could create opportunities to outperform CMOS implementations.

Moreover, we also discussed the reconfigurability of ASL-based circuits, which results from the injection current polarities/values, the control terminal states and the terminal weights that stem from the majority principle. The reconfigurability of different circuits was explored: the configurations and the corresponding functions are listed. Two ALU circuits were designed to exploit this reconfigurability.

To improve performance, we explored the pipelining scheme of ASL-based circuits. Each magnet in an ASL-based circuit uses a separate clock signal, and the circuit is pipelined with MTJs inserted as latches between the stages. Different pipelining levels of the convolution circuit were discussed based on the basic circuit benchmark.

6.2 Perspectives

The main contributions of this manuscript are the development of an ASL compact model and of a circuit/system design methodology. As an emerging technology, the ASL device is still in its infancy. Although we have explored the ASL device from the device level to the system level, several points could further improve ASL applications.

6.2.1 Modeling

Our compact model integrates the necessary effects: STT, TMR, the spin injection/detection effects, the channel breakdown current and the spin diffusion delay. However, the parameter variations caused by thermal effects are not considered; specific models are needed to describe the trends of these variations according to the experimental results. Moreover, the width of the MTJ model used is in the range of 25 nm to 40 nm. Smaller MTJ dimensions are necessary to improve the device performance. Hence, an MTJ physical and compact model of sub-nanometer dimensions needs to be developed. For the channel used in spin injection/detection, we do not consider the edge effect, which influences the spin diffusion. Hence, a more sophisticated model could be developed by taking into account the above-mentioned effects, in order to evaluate the device and circuit performance more precisely.

6.2.2 Circuit Layout

At the circuit level, we have developed a circuit design methodology and designed several combinational circuits. Through implementation and simulation, the performance of these circuits was evaluated for defined dimensions (channel lengths, MTJ width and length). However, to evaluate a circuit precisely, a real layout should be produced, and a layout method taking into account placement, timing, etc., should be developed. Moreover, the performance evaluation does not consider the CMOS auxiliary circuits, e.g. for the power supply, which can be included in future evaluations.

The synthesis methods used for circuit design are the “truth table” method for small circuits and “replacement” method for complex/integrated circuits. However, the “replacement” method is still based on the CMOS design. A new synthesis method for complex circuits should be developed considering the majority principle and the special properties (e.g. reconfigurability) of ASL device.

6.2.3 System evaluation and application

At the system level, we used the convolution circuit and the Intel i7 system as examples to present the system evaluation method and the pipelining of ASL-based circuits/systems. However, the system design is based on the “replacement” method, which is not perfectly suited to ASL-based systems, as mentioned in the previous subsection. Moreover, the calculation of the interconnection distribution follows the empirical equations of CMOS technology. Interconnection distribution equations appropriate for ASL-based systems remain to be found.

We have discussed the pipelining of ASL-based circuits, and finer-grain pipelining could be discussed in the future. The reconfigurable property of ASL-based circuits could be exploited to design more complex circuits.

In a system evaluation, both memory and logic should be taken into account, and we should consider the special property that an ASL-based device can realize both non-volatile memory and logic functions. This property helps overcome the communication bottleneck between logic and memory and encourages the exploration of new computing architectures.

Besides logic-in-memory and analog computing, another direction for the ASL device is neuromorphic computing. The superposition of spin currents and the threshold for state switching show its potential for neuromorphic architectures. As synapses, multi-level MTJs can be used to store the integrated weights. Combined with its reconfigurability, the ASL device could be used to efficiently implement a neuromorphic architecture.

Appendix A ASL Performance Equations Derivation

The ASL logic device, which has an asymmetric structure, is presented in Fig. 1. A tunnel barrier $C1$ is placed between the ferromagnet $FM1$ and the channel $N$ to reduce the mismatch problem and the reciprocity of the ASL device, while a single interface $C2$ is formed between the ferromagnet $FM2$ and the channel $N$. The insulator layer beneath the FM prevents the current from flowing in the other direction. A ground lead $G$ is placed to enhance the non-reciprocity of the ASL device.

![Figure 1 – Asymmetric ASL structure. (a) Non-local geometry for spin injection and detection. (b) Cross view of the non-local geometry.](image)

We suppose that the quantities labeled $N$ vary along $x$, while those of $FM1$ and $FM2$ vary along $z$, as indicated in Fig. 1. The whole current is injected into the channel through the tunnel barrier (right side), and no current is injected into the other side of $FM1$.

The boundary conditions for the spin quasichemical potential $\mu_s$ at infinities are:

$$\mu_{sF1}(-\infty) = \mu_{sF2}(\infty) = \mu_G(\infty) = 0 \quad (1)$$

Let us consider the injector and the detector separately.

**Spin injector: $FM1/N$ junction**

The distribution of the spin current is shown in Fig. 1 (b). In $FM1$, using Eq. 2.11, the spin currents at the contact ($z = 0$) and through the contact $C1$ are:

$$j_{sF1}(0) = jP_{F1} + \frac{1}{R_{F1}} \mu_{sF1}(0)$$

$$j_{sC1} = jP_{C1} + \frac{1}{R_{C1}} [\mu_{sN}(0) - \mu_{sF1}(0)] \quad (2)$$

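where $P_{F1/C1}$ is the conductivity spin polarization of $F1$/$C1$, expressed as $P = \frac{\sigma_\uparrow - \sigma_\downarrow}{\sigma_\uparrow + \sigma_\downarrow} = \frac{\sigma_s}{\sigma}$ (with $\sigma = \sigma_\uparrow + \sigma_\downarrow$ and $\sigma_s = \sigma_\uparrow - \sigma_\downarrow$), and $R_{F1/C1}$ is the spin resistance of $F1$/$C1$, expressed as $\frac{\sigma}{4\sigma_\uparrow\sigma_\downarrow}$.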

The spin currents at the start of the channel ($x = 0$) and in the ground lead are:

$$j_{sN}(0) = \frac{1}{R_N} \left[ -\mu_{sN}(0)\coth(L_N/\lambda_{sN}) + \frac{\mu_{sN}(L_N)}{\sinh(L_N/\lambda_{sN})} \right]$$

$$j_G(0) = \frac{1}{R_G} \mu_{sN}(0) \quad (3)$$
where $R_N$/$R_G$ is the spin resistance of the channel/ground lead, with $R_{N(G)} = \lambda_{sN(G)}/\sigma_{N(G)}$, $L_N$ is the channel length, $\mu_{sN}(0)$ and $\mu_{sN}(L_N)$ are the spin quasichemical potentials at $x = 0$ and $x = L_N$, and $\lambda_{sN}$ is the spin diffusion length of the channel.

The continuity of the spin current at the contact requires that:

$$j_{sN}(0+) = j_{sN}(0-) + j_{sC1} = j_{sN}(0-) + j_{sF1} \quad (4)$$

Using the above algebraic system, supposing the ground and the channel have the same material, we find that:

$$\mu_{sN}(0) \left[ \frac{R_N}{R_{C1} + R_{F1}} + \frac{\exp(L_N/\lambda_{sN})}{\sinh(L_N/\lambda_{sN})} \right] - \mu_{sN}(L_N) \frac{1}{\sinh(L_N/\lambda_{sN})} = -jR_N \frac{P_{C1}R_{C1} + P_{F1}R_{F1}}{R_{C1} + R_{F1}} \quad (5)$$

**Spin detector: FM2/N junction**

Similar to the injector, we obtain for the spin currents in the ferromagnet FM2, the interface C2 and the channel N:

$$j_{sF2}(0) = P_{F2}\,j_{channel} - \frac{1}{R_{F2}} \mu_{sF2}(0)$$

$$j_{sC2}(0) = j_{channel}P_{C2} + \frac{1}{R_{C2}} \left[ \mu_{sF2}(0) - \mu_{sN}(L_N) \right] \quad (6)$$

$$j_{sN}(L_N) = \frac{1}{R_N} \left[ -\frac{\mu_{sN}(0)}{\sinh(L_N/\lambda_{sN})} + \mu_{sN}(L_N) \coth(L_N/\lambda_{sN}) \right]$$

where $j_{channel}$ is the charge current flowing into the channel.

By applying the continuity condition of the spin currents in the detector, $j_{sF2}(0) = j_{sC2}(0) = j_{sN}(L_N)$, we get another relationship between the quasichemical potential at $x = 0$ and $x = L_N$:

$$\frac{1}{R_N \sinh(L_N/\lambda_{sN})} \mu_{sN}(0) + \frac{P_{F2}R_{F2} + P_{C2}R_{C2}}{R_{F2} + R_{C2}} j_{channel} = \left( \frac{\coth(L_N/\lambda_{sN})}{R_N} + \frac{1}{R_{F2} + R_{C2}} \right) \mu_{sN}(L_N) \quad (7)$$

From Eqs. 5 and 7, we can calculate the quasichemical potentials at $x = 0$ and $x = L_N$:

$$\mu_{sN}(L_N) = \frac{A}{B}$$

$$\mu_{sN}(0) = \sinh(L_N/\lambda_{sN}) \left[ \frac{R_N}{R_{C2} + R_{F2}} + \coth(L_N/\lambda_{sN}) \right] \mu_{sN}(L_N) - j_{channel} \sinh(L_N/\lambda_{sN}) R_N \frac{P_{C2}R_{C2} + P_{F2}R_{F2}}{R_{C2} + R_{F2}}$$

$$A = -jR_N \frac{P_{C1}R_{C1} + P_{F1}R_{F1}}{R_{C1} + R_{F1}} + j_{channel} R_N^2 \sinh(L_N/\lambda_{sN}) \frac{P_{C2}R_{C2} + P_{F2}R_{F2}}{(R_{C2} + R_{F2})(R_{C1} + R_{F1})} + j_{channel} R_N e^{L_N/\lambda_{sN}} \frac{P_{C2}R_{C2} + P_{F2}R_{F2}}{R_{C2} + R_{F2}}$$

$$B = e^{L_N/\lambda_{sN}} \coth(L_N/\lambda_{sN}) + \frac{R_N \cosh(L_N/\lambda_{sN})}{R_{C1} + R_{F1}} + \frac{e^{L_N/\lambda_{sN}} R_N}{R_{C2} + R_{F2}} + \frac{R_N^2 \sinh(L_N/\lambda_{sN})}{(R_{C1} + R_{F1})(R_{C2} + R_{F2})} - \frac{1}{\sinh(L_N/\lambda_{sN})} \quad (8)$$
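Since Eqs. 5 and 7 form a $2 \times 2$ linear system in $\mu_{sN}(0)$ and $\mu_{sN}(L_N)$, the closed form can be checked numerically. The Python sketch below solves the system directly with Cramer's rule and compares the result with $\mu_{sN}(L_N) = A/B$, where $A$ and $B$ are written as obtained by substituting Eq. 7 into Eq. 5. All numerical values are arbitrary test inputs, and the shorthand names (`r1` for $R_{C1} + R_{F1}$, `p1` for the effective injector polarization, and so on) are assumptions of this example.

```python
import math


def solve_direct(r_n, x, r1, r2, p1, p2, j, j_ch):
    """Solve Eqs. 5 and 7 for (mu0, muL) with Cramer's rule; x = L_N / lambda_sN."""
    s, c, e = math.sinh(x), 1.0 / math.tanh(x), math.exp(x)
    # Eq. 5:  a11*mu0 + a12*muL = b1
    a11, a12, b1 = r_n / r1 + e / s, -1.0 / s, -j * r_n * p1
    # Eq. 7:  a21*mu0 + a22*muL = b2
    a21, a22, b2 = 1.0 / (r_n * s), -(c / r_n + 1.0 / r2), -p2 * j_ch
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


def solve_closed_form(r_n, x, r1, r2, p1, p2, j, j_ch):
    """Closed form mu(L_N) = A / B, then mu(0) from Eq. 7."""
    s, c, e = math.sinh(x), 1.0 / math.tanh(x), math.exp(x)
    a = -j * r_n * p1 + j_ch * r_n ** 2 * s * p2 / r1 + j_ch * r_n * e * p2
    b = (e * c + r_n * math.cosh(x) / r1 + e * r_n / r2
         + r_n ** 2 * s / (r1 * r2) - 1.0 / s)
    mu_l = a / b
    mu_0 = s * (r_n / r2 + c) * mu_l - j_ch * s * r_n * p2
    return mu_0, mu_l


# Arbitrary test values: p1 and p2 stand for (P_C R_C + P_F R_F) / (R_C + R_F).
params = dict(r_n=2.0, x=0.5, r1=10.0, r2=1.0, p1=0.6, p2=0.4, j=1.0, j_ch=0.2)
direct = solve_direct(**params)
closed = solve_closed_form(**params)
print(all(math.isclose(d, q, rel_tol=1e-9) for d, q in zip(direct, closed)))  # True
```

Agreement between the two routes for several parameter sets is a quick way to catch sign or bracket errors when the expressions are transcribed into a compact model.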

The detected voltage reflects how much spin polarization is generated and transported to the detector. It is defined as:

$$V_{\text{det}} = \mu_N(\infty) - \mu_{F2}(\infty) \quad (9)$$

Based on the spin-charge coupling equation $\Delta \mu = j\mathbb{R} - P_\sigma \Delta \mu_s$ deduced from Eq. 3.7, $V_{\text{det}}$ is:

$$V_{\text{det}} = -j_{\text{channel}}(\mathbb{R}_C + \mathbb{R}_F) - j_2 \frac{R_{C2}R_{F2}}{R_{C2} + R_{F2}} (P_{C2} - P_{F2})^2 - \frac{P_{C2}R_{C2} + P_{F2}R_{F2}}{R_{C2} + R_{F2}} \mu_{sN}(L_N) \quad (10)$$

The non-local resistance $R_{NL}$ is:

$$R_{NL} = \frac{V_{\text{det}}}{j} \quad (11)$$

and the non-local resistance difference $\Delta R_{NL}$ between the antiparallel (AP) and parallel (P) configurations is:

$$\Delta R_{NL} = \frac{V_{\text{det},AP} - V_{\text{det},P}}{j} \approx 2R_{NL} \quad (12)$$

The spin injection efficiency $P_{\text{inj}}$ gives the fraction of the injected charge current that enters the channel as spin current. Based on Eq. 2, the injected spin current and the spin injection efficiency are given as:

$$I_{\text{sinj}} = j\, \frac{P_{C1}R_{C1} + P_{F1}R_{F1} + \mu_{sN}(0)/j}{R_{C1} + R_{F1}}$$

$$P_{\text{inj}} = \frac{I_{\text{sinj}}}{j} \quad (13)$$

The ASL efficiency $P_{\text{eff}}$ gives the fraction of the injected charge current that reaches the detector as spin current to switch the MTJ state. The spin detection current and the spin detection efficiency are given as:

$$I_{\text{det}} = j_{\text{channel}} \frac{P_{C2}R_{C2} + P_{F2}R_{F2}}{R_{C2} + R_{F2}} - \frac{\mu_{sN}(L_N)}{R_{C2} + R_{F2}}$$

$$P_{\text{eff}} = \frac{I_{\text{det}}}{j} \quad (14)$$

The above equations can be simplified depending on the case considered:

- Whether a charge current flows in the channel ($j_{\text{channel}} \neq 0$) or not ($j_{\text{channel}} = 0$).

- Whether the contact is transparent ($R_{ci} \ll R_N$) or a tunnel barrier ($R_{ci} \gg R_{F1}$).

- Whether the contact is spin polarized ($P_{Ci} \neq 0$) or unpolarized ($R_C = \mathbb{R}_C$).
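As an illustration of the first case, the sketch below evaluates the detection efficiency when no charge current flows in the channel ($j_{\text{channel}} = 0$). It assumes that Eq. 8 then reduces to $\mu_{sN}(L_N) = -jR_N P_1/B$, with $P_1$ the resistance-weighted injector polarization, and that the detected spin current is $-\mu_{sN}(L_N)/(R_{C2} + R_{F2})$. The function name and the numerical values are hypothetical and in arbitrary units.

```python
import math


def detection_efficiency(x, R_N, R_C1, R_F1, P_C1, P_F1, R_C2, R_F2):
    """P_eff = I_det / j for j_channel = 0, with x = L_N / lambda_sN."""
    s, c, e = math.sinh(x), math.cosh(x), math.exp(x)
    P1 = (P_C1 * R_C1 + P_F1 * R_F1) / (R_C1 + R_F1)
    # B as written in Eq. 8
    B = (e * c / s
         + R_N * c / (R_C1 + R_F1)
         + e * R_N / (R_C2 + R_F2)
         + R_N ** 2 * s / ((R_C1 + R_F1) * (R_C2 + R_F2))
         - 1.0 / s)
    mu_L_over_j = -R_N * P1 / B            # Eq. 8 with j_channel = 0
    return -mu_L_over_j / (R_C2 + R_F2)    # detected spin current per unit j


# Assumed example values: the efficiency drops as the channel gets longer.
for x in (0.1, 0.5, 1.0, 2.0):
    print(x, detection_efficiency(x, 2.0, 1.0, 0.5, 0.6, 0.4, 0.8, 0.3))
```

Under these assumptions the efficiency is positive and decreases monotonically with $L_N/\lambda_{sN}$, because every term of $B$ grows with the channel length.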
Appendix B Source Code of ASL Compact Model

Input ferromagnetic model

`include "constants.vams"
`include "disciplines.vams"

module model_ferromagnet_res(inc,ins,outc,outs);

inout inc,ins,outc,outs;

electrical inc,ins;
electrical outc,outs;

//parameters related to the magnetic layer

parameter real rho=2.6e-6; //resistivity of CoFeB (magnet layer material)
parameter real thick=1.3e-9; //magnet thickness
parameter real P=0.5; //magnet conductivity polarization
parameter real lamda=2e-10; //spin diffusion length of magnetic layer
parameter real w=40e-9; //magnet width
parameter real l=40e-9; //magnet length

real area; //cross area (to calculate the magnetic layer resistance)
real R_real; //electrical resistance of FM
real R_eff; //effective (spin) resistance of FM

analog
begin

area=w*l;
R_real=rho*thick/area;
R_eff=rho*lamda/area/P;

I(inc,outc)<+1/R_real*V(inc,outc)+1/R_eff*V(ins,outs);
I(ins,outs)<+P/R_real*V(inc,outc)+1/(P*R_eff)*V(ins,outs);

end
endmodule
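To give an order of magnitude for the two resistances used by this module, the following Python sketch reproduces the same arithmetic with the default parameter values listed above. It is only a numerical illustration; the function names are hypothetical and not part of the compact model.

```python
def ferromagnet_resistances(rho=2.6e-6, thick=1.3e-9, P=0.5,
                            lamda=2e-10, w=40e-9, l=40e-9):
    """Return (R_real, R_eff) as computed in model_ferromagnet_res."""
    area = w * l
    R_real = rho * thick / area       # electrical resistance of the layer
    R_eff = rho * lamda / area / P    # effective (spin) resistance
    return R_real, R_eff


def branch_currents(v_charge, v_spin, P=0.5, **kwargs):
    """Charge and spin branch currents for given charge and spin voltages."""
    R_real, R_eff = ferromagnet_resistances(P=P, **kwargs)
    i_charge = v_charge / R_real + v_spin / R_eff
    i_spin = P * v_charge / R_real + v_spin / (P * R_eff)
    return i_charge, i_spin


R_real, R_eff = ferromagnet_resistances()
print(R_real, R_eff)               # roughly 2.11 ohm and 0.65 ohm
print(branch_currents(1e-3, 0.0))  # spin current is P times the charge current
```

With no spin voltage across the layer, the spin current is simply the charge current scaled by the polarization `P`, which is the expected behaviour of a ferromagnetic polarizer.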

Output ferromagnetic model

//units in SI
//the stochastic effect is not included

`include "constants.vams"
`include "disciplines.vams"

`define sqrt(x) pow((x), 0.5)

//Constant definition in SI

//elementary charge
`define q 1.6e-19

//Bohr magneton in SI
`define ub 9.27e-24

//Boltzmann constant
`define kB 1.38e-23

//Electron mass
`define m 9.10e-31

//Euler’s constant
`define C 0.577

//vacuum permeability [H/m]
`define u0 1.2566e-6

module Model_MTJ_FM_OUT(inc,ins,outc,outs,state,T1,T2,Sout);
inout inc,ins,outc,outs;
inout T1,T2;
output Sout,state;

electrical inc,ins;
electrical outc,outs;
voltage state;
voltage Sout;
electrical T1,T2;

//------------------------Parameter Definition------------------------

//parameters related to the magnetic layer

parameter real rho=2.6e-6; //resistivity of magnetic free layer
parameter real thick=1.3e-9; //magnet thickness
parameter real P=0.5; //magnet conductivity polarization
parameter real lambda=2e-10; //spin diffusion length of magnetic layer
parameter real w=40e-9; //magnet width
parameter real l=40e-9; //magnet length

//parameters related to the dynamic property

parameter real alpha=0.027; //Gilbert damping coefficient
parameter real gamma=1.76e11; //Gyromagnetic constant [rad/s/T]
parameter real Hk=2.7e5; //Out of plane magnetic anisotropy
parameter real Ms=1.194e6; //Saturation magnetization of the free layer

parameter real PhiBas=0.4; //the energy barrier height for MgO in electron-volt
parameter real RA=5; //Resistance area product in ohm.um2
//Voltage bias when the TMR(real) is 1/2TMR(0) in Volt,
//experimental value with MgO barrier
parameter real Vh=0.5;
parameter real TMR=0.7; //TMR(0) with zero bias voltage

//other parameters

parameter real Pwidth=10e-3; //Current pulse width in second
parameter real T=300; //room temperature in Kelvin

//////////////////////Parameter Definition////////////////////////////////////////

//////////////////////Current calculation intermediate variable///////////////////

real area; //cross section of MTJ free layer to calculate the resistance
real R_real; //electrical resistance of FM
real R_eff; //effective (spin) resistance of FM

//////////////////////Dynamic calculation intermediate variable////////////////////

real istate; //output state of the MTJ
real duration; //average switching time
real Isdet; //detection current
real Teta; //factor to calculate the duration
real PAP; //to present the state; 0 parallel, 1 antiparallel
real PolaP,PolaAP;
real IcP,IcAP,IC;
real Em,EE;
real FA; //Factor for calculating the resistance based on RA
real Vb; //V(T1,T2)
real R0; //Resistance of MTJ when bias voltage = 0 V
real Rp,Rap; //Resistance of P and AP states
real TMR_real; //TMR to calculate the P and AP state
real Id;
//representing the MTJ output state, output state after the read current
real tstate,isout;

analog
begin

//--------------------Detection current calculation--------------------
area=`M_PI*(w/2)*(w/2);
R_real=rho*thick/area;
//R_eff=rho*lambda/area/(1-P*P);
R_eff=rho*lambda/area/P;

I(inc,outc)<+1/R_real*V(inc,outc)+1/R_eff*V(ins,outs);
I(ins,outs)<+P/R_real*V(inc,outc)+1/(P*R_eff)*V(ins,outs);

//-------------------State and duration calculation------------------

FA=3322.53/RA; //initialization of resistance factor according to RA product

//resistance model of Zhang Yue, from the paper "Dynamic compact model of
//thermally assisted switching magnetic tunnel junctions"
R0=(thick*1e10/(FA*sqrt(PhiBas)*area*1e12))*exp(1.025*thick*1e10*sqrt(PhiBas));
Em=`u0*Ms*thick*area*Hk/2; //energy barrier
EE=Em/`kB/T; //energy barrier in units of kB*T

//-------------------change state depending on the detection current----------------
Isdet=I(ins,outs);

if (Isdet>0) //anti-parallel
begin
PAP=1;
Teta=`M_PI;

//Polarization factor, anti-parallel state
PolaAP=1/(-4+(`sqrt(P)+1/`sqrt(P))*(`sqrt(P)+1/`sqrt(P))
*(`sqrt(P)+1/`sqrt(P))*((3+cos(Teta))/4))-(P/(2*(1+P*P*cos(Teta))));

IcAP=alpha*gamma*`q*`u0*Ms*thick*area*Hk/(`ub*PolaAP); //critical current

$strobe("IcAP=%f%%",IcAP);

if(abs(Isdet)>=abs(IcAP))
begin
duration=(`C+ln(`M_PI*`M_PI*EE/4))*`q*Ms*area*thick*(1+P*P)
/(2*`ub*P*(abs(Isdet)-abs(IcAP)));

istate=1.0;

end

end

if (Isdet<0) //parallel
begin
PAP=0;
Teta=1;

//Polarization factor, parallel state
PolaP=1/(-4+(`sqrt(P)+1/`sqrt(P))*(`sqrt(P)+1/`sqrt(P))
*(`sqrt(P)+1/`sqrt(P))*((3+cos(Teta))/4))+(P/(2*(1+P*P*cos(Teta))));

IcP=alpha*gamma*`q*`u0*Ms*thick*area*Hk/(`ub*PolaP); //critical current

if(abs(Isdet)>=abs(IcP))
begin

duration=(`C+ln(`M_PI*`M_PI*EE/4))*`q*Ms*area*thick*(1+P*P)
/(2*`ub*P*(abs(Isdet)-abs(IcP)));

istate=0.0;

end

end

V(state)<+transition(istate,duration,1e-12);
V(Sout)<+isout;
I(T1,T2)<+Id;

end

endmodule
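The energy barrier and the switching duration used in this module can be evaluated outside the simulator as a sanity check. The Python sketch below assumes that the barrier is the product $\mu_0 M_s H_k V/2$ and that the duration follows the expression coded above; the function names and the example overdrive currents are hypothetical.

```python
import math

# Constants as defined in the Verilog-A listing (SI units)
Q = 1.6e-19        # elementary charge
UB = 9.27e-24      # Bohr magneton
KB = 1.38e-23      # Boltzmann constant
C_EULER = 0.577    # Euler's constant
U0 = 1.2566e-6     # vacuum permeability


def thermal_factor(Ms=1.194e6, Hk=2.7e5, thick=1.3e-9, w=40e-9, T=300.0):
    """Energy barrier Em divided by kB*T (variable EE in the listing)."""
    area = math.pi * (w / 2) ** 2
    Em = U0 * Ms * thick * area * Hk / 2
    return Em / (KB * T)


def switching_duration(overdrive, P=0.5, Ms=1.194e6, thick=1.3e-9, w=40e-9):
    """Average switching time for overdrive = |Isdet| - |Ic| > 0, in seconds."""
    area = math.pi * (w / 2) ** 2
    EE = thermal_factor(Ms=Ms, thick=thick, w=w)
    prefactor = C_EULER + math.log(math.pi ** 2 * EE / 4)
    return prefactor * Q * Ms * area * thick * (1 + P * P) / (2 * UB * P * overdrive)


print(thermal_factor())            # close to 80 with the default parameters
print(switching_duration(100e-6))  # a few nanoseconds for a 100 uA overdrive
```

The duration is inversely proportional to the overdrive current, so doubling the margin above the critical current halves the average switching time in this model.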

Tunnel barrier model

`include "constants.vams"
`include "disciplines.vams"

module model_tunnel_barrier_res(inc,ins,outc,outs);

inout inc,ins,outc,outs;

electrical inc,ins,outc,outs;

parameter real Pt=0.5;  //spin conductance polarisation, different with PFM
parameter real w=40e-9;  //tunnel width
parameter real l=40e-9;  //tunnel length

//tunnel barrier
parameter real RA=1e-10;  //tunnel resistance area product

real area;
real R_real;

analog
begin

area=w*l;

R_real=RA/area;  //tunnel barrier resistance calculation

I(inc, outc)<+1/R_real*V(inc, outc)+1/R_real*Pt*V(ins, outs);
I(ins, outs)<+1/R_real*Pt*V(inc, outc)+1/R_real*V(ins, outs);
end
endmodule

Interface model

`include "constants.vams"
`include "disciplines.vams"

/*--------Elementary Charge-------------*/
`define e 1.6e-19

module model_Interface(inc, ins, outc, outs);
inout inc, ins, outc, outs;
electrical inc, ins, outc, outs;

parameter real Pt=0.5;  //spin conductance polarisation
parameter real w=40e-9;  //tunnel width
parameter real l=40e-9;  //tunnel length

//simple interface
parameter real h=6.626e-34;
parameter real kf=1.36e10;

real area;
real R_real;

//number of conducting modes at the interface,
//for interface resistance calculation
real modes;
analog
begin

area=w*l;

modes=kf*kf/2/`M_PI;

R_real=h/(`e*`e*modes*area);  //FM/NM interface resistance

I(inc, outc)<+1/R_real*V(inc, outc)+1/R_real*Pt*V(ins, outs);
I(ins, outs)<+1/R_real*Pt*V(inc, outc)+1/R_real*V(ins, outs);

end
endmodule

Channel model

Channel shunt model

`include "constants.vams"
`include "disciplines.vams"

module model_channel_res_gs(in, out);
inout in, out;
electrical in, out;

parameter real lamda=1e-6; //spin diffusion length
parameter real w=40e-9; //channel width
parameter real l=100e-9; //channel length

//choose one case depending on the channel material: metal or semiconductor

//for graphene, considering the breakdown current
parameter real rho=1/(0.35e-3); //channel sheet resistance: ohm
parameter real Jbr=20e3; //breakdown current density
parameter real thick=1; //for graphene

real area;
real R_real; //electrical resistance
real R_eff; //effective spin resistance
real I_max; //tolerable maximum current

analog
begin

area=thick*w; //for general case
R_real=rho*l/area;
R_eff=rho*lamda/area;
I_max=Jbr*w;
I(in, out)<+ V(in, out)/R_eff/sinh(l/lamda)*(cosh(l/lamda)-1);

//how to add/verify the breakdown current
if (I(in, out)>I_max)
$strobe("Warning: channel breakdown");
end
endmodule

Channel series model

`include "constants.vams"
`include "disciplines.vams"

module model_channel_res(inc,ins,outc,outs);
inout inc,ins,outc,outs;
electrical inc,ins;
electrical outc,outs;

parameter real lamda=1e-6; //spin diffusion length
parameter real w=40e-9; //channel width
parameter real l=100e-9; //channel length

//choose one case depending on the channel material

//for graphene, considering the breakdown current
parameter real rho=1/(0.35e-3); //channel sheet resistance
parameter real Jbr=20e3; //breakdown current density
parameter real thick=1; //for graphene

real area;
real R_real; //electrical resistance
real R_eff; //effective spin resistance
real I_max; //tolerable maximum current

analog
begin

area=thick*w; //for general case

R_real=rho*l/area;
R_eff=rho*lamda/area;

I_max=Jbr*w;
I(inc,outc)<+ V(inc,outc)/R_real;
I(ins,outs)<+ V(ins,outs)/R_eff/sinh(l/lamda);

//how to add/verify the breakdown current
if (I(ins,outs)>I_max)
$strobe("Warning: channel breakdown");
end
endmodule
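The shunt and series channel modules together form the π-network that represents spin diffusion along the channel. The Python sketch below evaluates both conductances with the default graphene parameters and checks them against the $\coth$ and $1/\sinh$ terms of the analytical model; the function name is hypothetical and the code only illustrates the arithmetic of the two modules.

```python
import math


def channel_pi_network(rho=1 / 0.35e-3, lamda=1e-6, w=40e-9, l=100e-9, thick=1.0):
    """Return (R_real, g_series, g_shunt, I_max) for the channel models."""
    area = thick * w                 # thick = 1 for graphene (sheet resistance)
    R_real = rho * l / area          # charge resistance of the channel
    R_eff = rho * lamda / area       # effective spin resistance
    x = l / lamda
    g_series = 1.0 / (R_eff * math.sinh(x))                    # series spin branch
    g_shunt = (math.cosh(x) - 1.0) / (R_eff * math.sinh(x))    # shunt spin branch
    I_max = 20e3 * w                 # breakdown current, Jbr * w with Jbr = 20e3
    return R_real, g_series, g_shunt, I_max


R_real, g_series, g_shunt, I_max = channel_pi_network()
R_eff = (1 / 0.35e-3) * 1e-6 / 40e-9
x = 100e-9 / 1e-6
# Series plus shunt conductance equals coth(x)/R_eff, and the shunt branch
# alone equals tanh(x/2)/R_eff: spin relaxation vanishes for a short channel.
print(g_series + g_shunt, 1.0 / (R_eff * math.tanh(x)))
print(g_shunt, math.tanh(x / 2) / R_eff)
print(R_real, I_max)
```

For a channel much shorter than the spin diffusion length the shunt conductance tends to zero, so almost all of the injected spin current reaches the detector side of the network.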

Ground model

`include "constants.vams"
`include "disciplines.vams"

module model_ground(inc,ins,outc,outs);
  inout inc,ins,outc,outs;

  electrical inc,ins;
  electrical outc,outs;

  parameter real rho=1/(0.35e-3); //channel resistivity of graphene
  parameter real l=1e-6; //ground length
  parameter real w=40e-9; //ground width
  parameter real t=40e-9; //ground thickness
  parameter real lamda=1e-6; //ground material spin diffusion length

  real area;
  real R_real;
  real R_eff;

  analog
    begin

      //area=thick*w;
      area=w*t;

      R_real=rho*l/area; //general case
      R_eff=rho*lamda/area; //general case

      //R_real=rho*l/w;
      //R_eff=rho*lamda/w; //graphene

      I(inc,outc)<+ V(inc,outc)/R_real;
      I(ins,outs)<+ V(ins,outs)/R_eff;
      //I(ins,outs)<+ V(ins,outs)/R_eff/Pg;

    end

endmodule
List of Figures

1.1 Hierarchical organization and opportunities for CMOS and emerging technologies [5].

2.1 Two Magnetic Tunnel Junction (MTJ) states with different resistances based on the Tunnel Magnetoresistance (TMR) effect: Parallel ($R_P$, state “0”) and Antiparallel ($R_{AP}$, state “1”). If the current flows from the free layer to the pinned layer and is larger than the critical current $I_c$, the state is switched to Parallel; otherwise, the state is switched to Antiparallel.

2.2 A schematic of the tunneling process in an MTJ; the electron spin orientation is preserved while traveling from one FM layer to the other. (a) Parallel configuration; (b) Antiparallel configuration.

2.3 Schematic illustration of the Spin Transfer Torque (STT) effect in a magnetic nano-pillar consisting of two Ferromagnetic (FM) layers (FM1/2) sandwiching a non-magnetic layer (NM).

2.4 Two different structures of 2-bit Multi-layer MTJ (ML MTJ): (a) parallel ML MTJ; (b) series ML MTJ.

2.5 Schematic of 1T1R memory cell [198].

2.6 Schematic of Spin-MTJ based Non-Volatile Flip-Flop [203].

2.7 Neuromorphic architecture based on “STT-Neuron” [29].

2.8 Spin valve structure. (a) Non-local spin valve; (b) local spin valve; (c) schematic of All Spin Logic with perpendicular MTJs.

2.9 ASL working flow. Step 1: MTJ states written with an applied voltage/current $I_{write}$; Step 2: spin current injected, with a charge current $I_{inj}$ injected and polarized through the MTJ free layer into the channel; Step 3: output MTJ state switched by the injected and diffusive spin current; MTJ state read with an applied voltage/current $I_{read}$.

2.10 (a) Non-local geometry for spin injection and detection. (b) Cross view of the non-local geometry.

2.11 (a) Spin-circuit representation of a non-local ASL device. (b) Electrical circuit representation of an ASL inverter.

2.12 (a) All Spin Logic (ASL) based inverter and buffer. The function depends on the polarity of the injected current. (b) All Spin Logic based AND/OR logic circuit; the function depends on the state of the F terminal, using three injected currents with the same polarity. If the magnetization orientations of F are parallel, this circuit realizes the AND function; otherwise, it realizes the OR function.

2.13 (a) Device structure for bipolar spin neuron based ASL; (b) device structure for unipolar spin neuron based ASL.

2.14 Neural network based on Domain Wall Magnet (DWM) and MTJ. Spin currents through the DWM synapses are gathered underneath the MTJ neuron via the channel.

3.1 Asymmetric ASL device with perpendicular MTJ and its compact model. (a) ASL device with the asymmetric structure: $L_N$, $W$ and $L_F$ are the channel length, MTJ width and MTJ length. Two MTJs are used as the memories and their free layers form the injector/detector with the channel; a tunnel barrier is placed only between the injection free layer and the channel, which forms an asymmetric structure; an insulator is placed underneath the MTJ to prevent the current from flowing into another channel; a ground lead is placed near the injector to guarantee the non-reciprocity of the circuit. (b) MTJ switching with different current polarities. (c) Spin-circuit model of the basic ASL device. Each block is a $\pi$-network and corresponds to a component in (a).

3.2 Hierarchy of the developed ASL device model.

3.3 Different model blocks of an inverter/buffer based on the ASL device.

3.4 (a) Comparison of simulation and characterization results of $\Delta R_{NL}$ for channels implemented with Mg [115] and Cu [105]. (b) Comparison of the spin resistance difference $\Delta R_{NL}$ of the trilayer-graphene/MgO/Py junction between the compact model and the experimental result.

3.5 Performance dependence on the parameters of the ASL device.

3.6 Channel breakdown current density $J_{BR}$ according to channel length $L_N$ and channel width $W$.

3.7 Performance dependence on channel width $W$ and inter-dependence of STT parameters. (a) Delay dependence on the width, with the other parameters constant; inset shows the dependence of the thermal factor $\Delta$ and the critical current $I_0$ on the device width; (b) critical current and delay dependence on the device width, with the thermal factor $\Delta$ fixed at 80 by changing the thickness of the free layer $t_F$; inset shows the corresponding thickness of the free layer and $K_{eff}$ for different widths; (c) critical current and delay dependence on the device width, with the thermal factor $\Delta$ fixed at 40 by changing the thickness of the free layer $t_F$; inset shows the corresponding thickness of the free layer and $K_{eff}$ for different widths; (d) delay dependence on the damping factor $\alpha$.

3.8 (a) Spin injection efficiency $P_{eff}$ vs. resistance area product of the tunnel barrier $RAC$ in a symmetric structure, with tunnel barriers added in both the injector and the detector. (b) Spin injection efficiency $P_{eff}$ vs. resistance area product of the tunnel barrier $RAC$ of the injector in an asymmetric structure, with the tunnel barrier added only in the injector. (c) Delay and energy dependence on the tunnel resistance area product $RAC$; an optimal $RAC$ minimizes the energy, in this case $RAC = 4 \times 10^{-11}\ \Omega\,\mathrm{m}^2$. (d) Spin injection efficiency increases with the ground resistance while the resistances of the other parts are constant.

3.9 Delay and channel spin current $I_{spinj}$ according to the injection current $I_{inj}$ and channel length $L_N$. For each channel length, the breakdown current is labeled on the $I_{spinj}$ curves. The following gives i) the maximum injection current, ii) the corresponding spin injection current and iii) the delay according to the channel length: (1.9 mA, 803 $\mu$A, 0.292 ns) for 100 nm, (1.587 mA, 581 $\mu$A, 0.4164 ns) for 200 nm, (1.565 mA, 509 $\mu$A, 0.5039 ns) for 300 nm, (1.63 mA, 478 $\mu$A, 0.5586 ns) for 400 nm, (1.72 mA, 463 $\mu$A, 0.6108 ns) for 500 nm. Inset gives the spin diffusion delay $t_{diff}$ according to $L_N$.

3.10 (a) Delay dependence on the spin diffusion length of the channel $\lambda_{sN}$ for different spin polarizations of the tunnel resistance $P_C$; inset shows the dependence of the spin detection efficiency on $\lambda_{sN}$ for different values of $P_C$; (b) delay dependence on the channel length $L_N$ for different spin polarizations of the tunnel barrier $P_C$; inset shows the dependence of the spin detection efficiency on $L_N$ for different values of $P_C$.

3.11 Simulation of ASL based inverter/buffer. $V_{\text{write}}$, $S_{\text{in}}$, $I_{\text{inj}}$ and $S_{\text{out}}$ are the writing voltage, input state, injection current and output state in Fig. 3.3. $I_{\text{inj}}$ and $I_{\text{det}}$ are the injected and detected spin currents, corresponding to outs of the Injector and ins of the Detector in Fig. 3.3.

4.1 5-input majority gate with inputs $In1$, $In2$, $In3$, $In4$, $In5$, output $Out$ and its symbol representation.

4.2 XOR2/3 synthesized based on the replacement method.

4.3 Circuit design methodology based on the ASL device.

4.4 Basic logic circuits and combinational logic circuits.

4.5 (a) Inverter/buffer architecture, with $In$ as the input and $Out$ as the output. A positive current, flowing from the MTJ free layer to the channel, induces an opposite spin magnetization orientation and realizes the inversion; conversely, a negative current realizes the buffer function. (b) Vertical view of the architecture with the channel length. (c) Functional symbol of the inverter/buffer, with $In$ as the input, $Out$ as the output and $I_{\text{inj}}$ as the control signal.

4.6 Simulation of ASL based inverter/buffer with maximum injection current $I_{\text{inj}} = 1.9$ mA.

4.7 Simulation of ASL based inverter/buffer with injection current $I_{\text{inj}} = 697$ $\mu$A.

4.8 (a) 2-bit AND/OR/NAND/NOR architecture, with $In1$ and $In2$ as the inputs, $F$ as the control terminal and $Out$ as the output. Different injection current polarities and $F$ states lead to different functions. (b) Vertical view of the architecture with the channel lengths $L1$ and $L2$. (c) Functional symbol of the 2-bit AND/OR/NAND/NOR. (d) Spin injection efficiency $P_{\text{eff}}$ vs. channel distribution of this architecture.

4.9 Function simulation of 2-bit AND/OR/NAND/NOR.

4.10 (a) 3-bit AND/OR/NAND/NOR architecture, with $In1$, $In2$ and $In3$ as the inputs, $F1$ and $F2$ as the control terminals and $Out$ as the output. Different injection current polarities and $F$ states lead to different functions. (b) Vertical view of the architecture with the channel lengths $L1$ and $L2$. (c) Functional symbol of the 3-bit AND/OR/NAND/NOR. (d) Spin injection efficiency $P_{\text{eff}}$ vs. channel distribution of this architecture.

4.11 Function simulation of 3-bit AND/OR/NAND/NOR.

4.12 (a) “Truth table” method based XOR/XNOR2/3 circuit $XOR_{\text{TT}}$: $In1/2/3$ as three inputs, $M1$ as the intermediate terminal, Out as the final output. (b) “Replacement” method based XOR/XNOR2/3 circuit $XOR_{\text{rep}}$: $I_{\text{inj}1} = -I_{\text{inj}2}$. This architecture can also realize the full-adder and full-subtractor functions with the other two intermediate outputs: $M1'$ as the inversion of the output carry of the full-adder, $M2$ as the output borrow of the full-subtractor. (c) Functional symbol of the XOR/XNOR2/3 circuit: $In1/2/3$ as three inputs, Out as the final output for XOR/XNOR2/3, $M1$ as the output carry for the full-adder, $M2$ as the output borrow for the full-subtractor, which is only available with the architecture in (b).

4.13 Function simulation of $XOR_{\text{TT}}$ circuit.

4.14 4-bit adder implementations. (a) Series implementation; (b) parallel implementation; (c) 4-bit adder functional symbol.

4.15 Function simulation of 4-bit series adder.
4.16 4-bit subtractor implementation. (a) Architecture of a series 4-bit subtractor with 4 full-subtractors. The output borrow of the previous stage is the input borrow of the next stage. (b) Functional symbol of the series 4-bit subtractor: \( A_3A_2A_1A_0 \) are the minuend and \( B_3B_2B_1B_0 \) are the subtrahend; \( B_{in} \) is the input borrow; \( D_3D_2D_1D_0 \) is the output difference; \( B_n\) is the output borrow.

4.17 Function simulation of 4-bit Subtractor with full-subtractor. 

4.18 1-bit comparator implementation. (a). Architecture of 1-bit comparator: \( A/B \) as inputs, \( L(A < B), E(A = B) \) and \( H(A > B) \) as three outputs, \( I_{inj1/2} \) as two different injection current sources, where \( I_{inj1} = -I_{inj2} \); (b). Functional symbol of 1-bit comparator.

4.19 Function simulation of 1-bit comparator. \( V_{write/read} \) is the writing/reading source of MTJs; \( I_{inj1/2} \) is the corresponding injection current; \( A/B \) are the input states; \( L/E/H \) are the output states; \( F \) is the control state of the 3-input majority gate, which is 0 in this case to realize the AND function. 

4.20 2-bit comparator implementation. (a) Majority gates implementation of 2-bit comparator by using 3-input majority gates, 5-input majority gates, and XOR/XNOR gates; (b) Functional symbol of 2-bit comparator, with inputs \( A1A0 \) and \( B1B0 \), outputs \( L/E/H \) and five injection current sources: \( I_{inj-4} \).

4.21 Function simulation of 2-bit comparator. \( V_{write/read} \) are the writing/reading voltage source of MTJs; \( I_{inj} \) is the injection current value; Input states are expressed as \( A1/A0/B1/B0 \) and output states are expressed as \( L/E/H \).

4.22 Function simulation of 4-bit comparator. \( A = A_3A_2A_1A_0 \) and \( B = B_3B_2B_1B_0 \) are the input states; \( L/E/H \) are the output states; \( Z = 1 \) is the control state value for the OR function in a 3-input majority gate and \( U = 0 \) is the control state value for the AND function in a 3-input majority gate.

4.23 (a) Implementation of the 4-bit array multiplier, with three 4-bit adders and sixteen AND gates: \( A = A_3A_2A_1A_0 \) and \( B = B_3B_2B_1B_0 \) as the multiplicand and multiplier, \( M7M6M5M4M3M2M1M0 \) as the output, \( C_{in} \) as the input carry. (b) Functional symbol of the 4-bit array multiplier.

4.24 4-bit multiplier simulation results with 4-bit serial adder.

4.25 Proposed ASL devices based ALUs: (a) \( ALU_{LCA} \) requires three 5-inputs majority gates and 11 control signals for currents/voltages (S1-S11); The “A” and “B” in the figure mean the states of these terminals are written by the same write voltage sources, i.e. “\( V_{writeA} \)” and “\( V_{writeB} \)”, respectively, whereas the injection current polarity is specified to each terminal, “2M1” means the injection current for terminal “M1” is doubled; and (c) \( ALU_{MG} \) is implemented using 14 5-inputs majority gates and one control signal of the current (S1); The terminals with the same symbol are connected to a same writing voltage source, e.g. all “A” to “\( V_{writeA} \)”: Green and violet lines for the terminals connect to the corresponding injection currents; (b) and (d) are the corresponding functional symbol of two ALUs. Symbol # indicates that no current is injected in the terminal.

4.26 \( ALU_{MG} \) simulation results for full-adder, half adder, AND3/OR3 and multiplexer functions.

4.27 (a). 2-to-1 multiplexer architecture: \( In1/2 \) as two inputs, \( S_0 \) as select signal, \( Z = 1 \) and \( U = 0 \) are the control signals, \( Q \) as the output, \( M1 \) as the intermediate state; the “2” after the \( In1 \) and \( M1 \) means the weights of terminals are twice the others. (b). Vertical view of the 2-to-1 multiplexer, with channel distributions: \( L1-L4 \). (c). functional symbol of 2-to-1 multiplexer, with three different injection current sources, where \( I_{inj1} = -I_{inj2} \).

4.28 Function simulation of 2-to-1 multiplexer: \( V_{write/read} \) as MTJ writing/reading signal, \( I_{inj1-3} \) as the injection current signals, \( S_0 \) as the select signal, \( In1/2 \) as the inputs signals, \( U/Z \) as the control signals and \( Q \) as the output signal.
4.29 (a) Implementation of 4-to-1 multiplexer with 3/5-input majority gates: \( S1/2 \) as two select signals, \( A/B/C/D \) as four input signals, \( Q \) as output, \( I_{\text{inj}1/2/3} \) as injection current sources, where \( I_{\text{inj}1} = I_{\text{inj}2} \). (b) Functional symbol of 4-to-1 multiplexer.

4.30 Simulation of 4-bit multiplexer. \( U = 0 (Z = 1) \) is the control signal state of the 3/5-input majority gate to realize the AND/OR function.

4.31 1-to-2 bit demultiplexer. (a) Implemented architecture of the 1-to-2 demultiplexer with two 3-input majority gates (AND function with control terminal \( F = 0 \)); two injection current signals \( I_{\text{inj}1} = -I_{\text{inj}2} \). (b) Functional symbol of the 1-to-2 demultiplexer.

4.32 Function simulation of 1-to-2 bit demultiplexer: \( V_{\text{write/read}} \) as the writing/reading voltage sources to write/read the MTJ states, \( In \) as the input state, \( S \) as the select signal state, \( Y0/Y1 \) as the output states, \( U = 0 \) as the control signal state to realize the AND function in a 3-input majority gate.

4.33 1-to-4 bit demultiplexer. (a) Implementation of the 1-to-4 demultiplexer, with four AND gates (5-input majority gate with control state \( F = 0 \)): \( In \) as input, \( S0/1 \) as select signals, \( Y1/2/3/4 \) as output signals, \( I_{\text{inj}1/2} \) as injection current signals with \( I_{\text{inj}1} = -I_{\text{inj}2} \). (b) Functional symbol of the 1-to-4 demultiplexer, with \( I_{\text{inj}1/2} \) as two injection current signals.

4.34 Function simulation of 1-to-4 bit demultiplexer. \( V_{\text{write/read}} \) are the writing/reading signals of MTJ states; \( In \) is the input signal; \( S0/1 \) are the select signals; \( Y0/1/2/3 \) are the output signals; \( U = 0 \) is the control signal of the 5-input majority gate which realizes the AND function.

4.35 Function simulation of 1-to-8 bit demultiplexer. \( V_{\text{write}} \) is the writing signal of MTJ states; \( In \) is the input signal; \( S0/1/2 \) are the select signals; \( Y0/1/2/3 \) are the output signals; \( U = 0 \) is the control signal of the 3/5-input majority gates which realize the AND function.

4.36 Block diagram of 4-input binary encoder: \( W0/1/2/3 \) are four inputs, \( Y0/1 \) are two outputs.

4.37 4-input priority encoder. (a) Implementation of the encoder: \( D0 - 3 \) as input signals, \( Y0 - 3 \) as output signals, AND/OR functions realized with the control terminal \( F = 0/1 \), \( I_{\text{inj}1/2/3/4/5} \) as five different injection current signals where \( I_{\text{inj}1} = -I_{\text{inj}2} \). (b) Functional symbol of the 4-input priority encoder.

4.38 Function simulation of 4-input priority encoder. \( V_{\text{write/read}} \) are the writing/reading signals of MTJ state; \( D0 - 3 \) are the input signals; \( Y0/1 \) and \( V \) are the output signals; \( U/Z \) are the control states to realize the AND/OR functions in a majority gate. \( I_{\text{inj}1-5} \) are the different injection currents.

4.39 2-to-4 bit decoder. (a) Implementation of 2-to-4 bit decoder with four AND gates (realized with the control state \( F = 0 \) in a 3-input majority gate): \( A \) and \( B \) as inputs, \( Q0-3 \) as outputs; the inversions are realized with a positive injection current \( I_{\text{inj}2} \), \( I_{\text{inj}1} = -I_{\text{inj}2} \). (b) Functional symbol of the 2-to-4 bit decoder.

4.40 Function simulation of 2-to-4 bit decoder. \( V_{\text{write/read}} \) are the writing/reading voltage signals of MTJ states; \( A \) and \( B \) are the input states; \( Q0-3 \) are the output states; \( I_{\text{inj}1/2} \) are the injection currents; \( U = 0 \) is the control state of the 3-input majority gate to realize the AND function.

4.41 3-to-8 bit decoder. (a) Implementation of a 3-to-8 binary decoder with eight AND gates (realized by setting the control state \( F = 0 \) of a 5-input majority gate). \( A/B/C \) as three inputs, \( Y0-7 \) as eight outputs, \( I_{\text{inj}1/2} \) are two injection currents where \( I_{\text{inj}1} = -I_{\text{inj}2} \). (b) Functional symbol of the designed 3-to-8 binary decoder.

4.42 Function simulation of 3-to-8 bit decoder. $V_{\text{write/read}}$ are the writing/reading signals of MTJ states; $A/B/C$ are the three input states; $Y_0-7$ are the eight output states; $I_{\text{inj1/2}}$ are the two injection currents. $U = 0$ is the control state which realizes the AND function in a majority gate. ................. 87

4.43 7-segment display elements for all numbers. Each number corresponds to a set of illuminated segments. ................................. 87

4.44 7-segment display decoder. $A/B/C/D$ as four inputs, $a-g$ as seven outputs; $F = 0/1$ to realize the AND/OR function in a majority gate. (b) Functional symbol of the designed BCD to 7-segment display decoder. ................. 89

4.45 Function simulation of 7 segment encoder. $V_{\text{write/read}}$ are the writing/reading signals of MTJ states; $A/B/C/D$ are the input states; $a-g$ are the seven output states; $U = 0/Z = 1$ are the control states to realize the AND/OR function in a majority gate. ................................. 89

5.1 ALU circuit implementation. ........................................ 94

5.2 ASL-based circuit clocking. (a) 3-input majority gate-based XOR/XNOR2/3 circuit with clocked injection signals; (b) Clocked signals: CLK1 and CLK2 are connected to stage 1 and stage 2 respectively. The injection current for each stage can have two phases: positive and negative amplitudes to configure the inverter and buffer function respectively. (c) Activity diagram of 2-stage sequential XOR/XNOR2/3 circuit without pipelining. ................. 95

5.3 Pipelined XOR/XNOR2/3 circuit. (a) 2-stage pipelined circuit by adding MTJs as latches between stages. (b) Activity diagram of the pipelined circuit. .................................................. 96

5.4 2D 16-bit 3×3 convolution implementation. (a) General convolution circuit; (b) 2-stage pipelined convolution circuit; (c) Activity diagram of a 2-stage pipelined convolution circuit. .................................................. 99

5.5 32-bit adder tree. ..................................................... 99

5.6 16-bit array multiplier implemented with AND gates and 16-bit adders. ... 100

1 Asymmetric ASL structure. (a) Non-local geometry for spin injection and detection. (b) Cross view of the non-local geometry.

2 (a) Schematic of an ASL device with perpendicular MTJs, based on the non-local spin injection model. (b) The two MTJ states, with different resistances due to tunnel magnetoresistance (TMR): Parallel ($R_P$, state “0”) and Anti-parallel ($R_{AP}$, state “1”). If the current flows from the free layer to the fixed layer and exceeds the critical current $I_c$, the state switches to the Parallel configuration; in the opposite case, the state switches to Anti-parallel. (c) Spin circuit model of the ASL device. Each block is a π network and corresponds to a component shown in (a).

3 Block models of an inverter/buffer based on an ASL device.

4 Dependence between performance and ASL device parameters.

5 Design method for ASL-based circuits.

6 Implementation of fundamental circuits, with \( L_i \) denoting the channel lengths. (a) Inverter/buffer functions configured by the polarity of the injection current; (b) AND/OR/NAND/NOR functions realized by a 3-input majority gate, configured by the polarity of the injection current and the state of the control input F; the influence of the bend from In1/F to P is not considered; (c) \( XOR_{TT} \) architecture of XOR2/3 circuits based on the “truth table” method, which can be used for full-adder computation, with M1 as the carry output; (d) Another XOR2/3 circuit architecture, \( XOR_{rep} \), based on the “replacement” method and composed of three 3-input majority gates. This architecture can be used for both full-adder and full-subtractor computation, where M1 is the carry output and F is the sum or difference output.

7 Timing and pipelining of ASL circuits. (a) 3-input XOR/XNOR2/3 circuit with clocked injection signals; (b) Clock signals: CLK1 and CLK2 are connected to stage 1 and stage 2 respectively. The injection current of each stage can have two phases, with positive and negative amplitudes configuring the inverter and buffer functions respectively. (c) 2-stage pipelined circuit obtained by adding MTJs as latches between stages. (d) Activity diagram of the pipelined circuit.
List of Tables

2.1 ASL compact modeling comparison
3.1 ASL device parameters
4.1 The truth table of XOR3 function
4.2 Transformed lpsd truth table of the full-adder
4.3 Unitized truth table for $C_{out}$
4.4 Unitized truth table for $Sum$
4.5 Reduced Unitized truth table for $Sum$
4.6 Reconfigurable Functions Based on inverter/buffer architecture
4.7 ASL channel distribution and injection current parameters
4.8 Reconfigurable Functions Based on AND/OR/NAND/NOR2 architecture
4.9 ASL channel distribution and injection current parameters
4.10 Reconfigurable Functions Based on 5-input majority gate architecture
4.11 ASL channel distribution and injection current parameters
4.12 Reconfigurable functions based on $XOR_{TT}$ structure in Fig. 4.12 (a)
4.13 $XOR_{TT}$ structure channel distribution and injection current parameters
4.14 Integrated functions configurations of $ALU_{LCA}$
4.15 Integrated functions configurations of $ALU_{MG}$
4.16 $ALU_{LCA}$ and $ALU_{MG}$ performance comparison
4.17 Reconfigurable Functions Based on MUX Structure synthesized with “truth table” method
4.18 Truth table of 4-input binary encoder
4.19 Truth table of 4-input priority encoder
4.20 Truth table of BCD decoder
4.21 Basic circuits benchmarking
4.22 Performance of CMOS-based logic circuits at 25C, 1V type process with 40 nm CMOS [260]

5.1 Integrated functions configurations of ALU
5.2 Device Count Comparison Between CMOS and ASL
5.3 Convolution Circuit Implementation Results
5.4 Intel i7 Haswell Architecture Characteristics
5.5 ASL Versus CMOS Power Comparison

1 Evaluation of basic ASL circuits
2 Reconfigurable functions based on a 3-input majority gate
3 Device count comparison between CMOS and ASL
4 Power comparison between CMOS and ASL for the Intel i7 system
# List of Acronyms

<table>
<thead>
<tr>
<th>Acronym</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ALU</td>
<td>Arithmetic Logic Unit</td>
</tr>
<tr>
<td>AOI</td>
<td>AND/OR/Inverter</td>
</tr>
<tr>
<td>AP</td>
<td>Anti-parallel</td>
</tr>
<tr>
<td>ASL</td>
<td>All-Spin Logic</td>
</tr>
<tr>
<td>ASLC</td>
<td>ASL with Clock</td>
</tr>
<tr>
<td>ASLCB</td>
<td>ASL with Clock with Biaxial anisotropy</td>
</tr>
<tr>
<td>ASLNC</td>
<td>ASL with No Clock</td>
</tr>
<tr>
<td>BCD</td>
<td>Binary Coded Decimal</td>
</tr>
<tr>
<td>BER</td>
<td>Bit Error Rate</td>
</tr>
<tr>
<td>CIP</td>
<td>Current In Plane</td>
</tr>
<tr>
<td>CMOS</td>
<td>Complementary Metal Oxide Semiconductor</td>
</tr>
<tr>
<td>CNN</td>
<td>Convolutional Neural Network</td>
</tr>
<tr>
<td>CPP</td>
<td>Current Perpendicular to Plane</td>
</tr>
<tr>
<td>DCT</td>
<td>Discrete Cosine Transform</td>
</tr>
<tr>
<td>DOS</td>
<td>Density of States</td>
</tr>
<tr>
<td>DW</td>
<td>Domain Wall</td>
</tr>
<tr>
<td>EDP</td>
<td>Energy Delay Product</td>
</tr>
<tr>
<td>FeFET</td>
<td>Ferroelectric Field-Effect Transistor</td>
</tr>
<tr>
<td>FET</td>
<td>Field-Effect Transistor</td>
</tr>
<tr>
<td>FM</td>
<td>Ferromagnetic</td>
</tr>
<tr>
<td>FPGA</td>
<td>Field-Programmable Gate Array</td>
</tr>
<tr>
<td>G-ASLG</td>
<td>Graphene based All Spin Logic Gate</td>
</tr>
<tr>
<td>HRS</td>
<td>High Resistance State</td>
</tr>
<tr>
<td>ITRS</td>
<td>International Technology Roadmap for Semiconductors</td>
</tr>
<tr>
<td>LLG</td>
<td>Landau-Lifschitz-Gilbert</td>
</tr>
<tr>
<td>lpsd</td>
<td>Logically Passive Self Dual</td>
</tr>
<tr>
<td>LRS</td>
<td>Low Resistance State</td>
</tr>
</tbody>
</table>
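<table>
<thead>
<tr>
<th>Acronym</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>LUT</td>
<td>LookUp Table</td>
</tr>
<tr>
<td>MFP</td>
<td>Mean Free Path</td>
</tr>
<tr>
<td>ML</td>
<td>Multi-Level</td>
</tr>
<tr>
<td>MRAM</td>
<td>Magnetoresistive Random Access Memory</td>
</tr>
<tr>
<td>MTJ</td>
<td>Magnetic Tunnel Junction</td>
</tr>
<tr>
<td>MUX</td>
<td>Multiplexer</td>
</tr>
<tr>
<td>P</td>
<td>Parallel</td>
</tr>
<tr>
<td>RS</td>
<td>Resistive Switching</td>
</tr>
<tr>
<td>RAM</td>
<td>Random-Access Memory</td>
</tr>
<tr>
<td>ReRAM</td>
<td>Resistive Random-Access Memory</td>
</tr>
<tr>
<td>SCE</td>
<td>Short Channel Effect</td>
</tr>
<tr>
<td>Spin-FET</td>
<td>Spin Field-Effect Transistor</td>
</tr>
<tr>
<td>STO</td>
<td>Spin Torque Oscillator</td>
</tr>
<tr>
<td>STT</td>
<td>Spin-Transfer Torque</td>
</tr>
<tr>
<td>SWD</td>
<td>Spin Wave Device</td>
</tr>
<tr>
<td>TMR</td>
<td>Tunnel Magnetoresistance</td>
</tr>
</tbody>
</table>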
List of Publications

Journals


International Conferences with publications


Workshop and others

Chapter 1 General introduction

CMOS technology has contributed considerably to the growth of the semiconductor industry over the past decades. However, as transistor scaling continues in the 21st century, CMOS technology will face major challenges that will slow the growth of the semiconductor industry, according to the ITRS [2], which brings together leaders in semiconductor research and industry. To address this problem, researchers are turning their attention to "Beyond-CMOS" technologies such as spintronic devices, which exploit the spin of electrons. One of the most important of these devices is the magnetic tunnel junction (MTJ), which can store binary data based on tunnel magnetoresistance (TMR) [6, 31, 63]. The MTJ can be used as a non-volatile memory or combined with CMOS to form hybrid circuits. However, it is difficult to limit the energy consumption caused by the frequent conversion between spin and charge. Moreover, from a design-method point of view, hybrid circuits still follow the same design methodology as CMOS circuits. Therefore, to take better advantage of spintronic devices, All-Spin Logic (ASL) devices have been proposed. They use pure spin current to carry information, thereby reducing the energy consumption caused by charge-to-spin conversion. It has been argued that ASL devices could potentially serve as particularly powerful switches that can be used in hybrid applications and open up new design paradigms. This motivates us to study ASL at three levels: device, circuit and system.

At the device level, an electrical model is needed to explore the possibilities of ASL devices in circuits and systems, so as to bridge the gap between system-level application requirements and device-level fabrication. This model must be accurate, to estimate and evaluate device performance; scalable, to study the design of complex/hierarchical circuits; and generic, in line with standardized CMOS-based design techniques. Following these requirements, we develop an electrical model described in Verilog-A under Cadence. Divided into six blocks, this model allows each block to be designed independently and facilitates hierarchical circuit design. It integrates the spin-transfer torque (STT) effect, the TMR effect, the spin injection/diffusion/accumulation effects and the channel breakdown effect, which makes it possible to explore performance trade-offs and helps the designer avoid hardware damage. Validated against experimental results, this model is used to implement and evaluate circuits and systems.

At the circuit level, the majority operation on which the ASL device relies leads to a design method entirely different from that of CMOS. It is therefore necessary to develop a circuit design methodology that takes the circuit layout into account as much as possible. In this thesis, we have developed such a methodology, which can synthesize circuits from majority functions, explore the sizing parameters (device size, interconnects and injection currents) according to the design constraints, and optimize performance for given materials.

At the system level, the main objective is to evaluate the potential of ASL devices in complex applications and to explore their use in a new computing paradigm based on their unique properties. This thesis uses a cell-library approach to model and evaluate ASL systems while accounting for interconnect issues. A pipelined version of the ASL system is discussed as a way to improve performance. In addition, the reconfigurability of ASL devices, based on the polarities and amplitudes of the injection current and on the states of the control inputs, is explored; it can be exploited in future digital applications.
Chapter 2 State of the art

Figure 2(a) shows an ASL device made of two MTJs acting as memories and a channel that transports the information, combining the MTJ with the non-local spin injection model [102]. This chapter presents the state of the art of MTJ and ASL devices: their principles, development and applications.

Figure 2 – (a) Schematic of an ASL device with perpendicular MTJs, based on the non-local spin injection model. (b) The two MTJ states, with different resistances due to tunnel magnetoresistance (TMR): Parallel (R_P, state “0”) and Anti-parallel (R_AP, state “1”). If the current flows from the free layer to the fixed layer and exceeds the critical current I_c, the state switches to the Parallel configuration; in the opposite case, the state switches to Anti-parallel. (c) Spin circuit model of the ASL device. Each block is a π network and corresponds to a component shown in (a).

An MTJ (see Fig. 2(b)) consists of an insulating layer sandwiched between two ferromagnetic (FM) layers. One is magnetically pinned and is called the fixed layer; the other, called the free layer, has a magnetization that can be switched by a magnetic field or by a current above the critical current I_c through the spin-transfer torque (STT) effect (LLG equation [184]). Depending on the relative magnetization orientation of the two FM layers, i.e. P or AP, an MTJ has two resistance states, R_P or R_AP (R_P < R_AP); the relative difference between them is characterized by the TMR ratio.
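
For reference, the TMR ratio is commonly defined from the two resistance states as shown here. This is the usual textbook definition, stated as an assumption because the summary does not write it out:

\[
\mathrm{TMR} = \frac{R_{AP} - R_{P}}{R_{P}}
\qquad\Longleftrightarrow\qquad
R_{AP} = R_{P}\,(1 + \mathrm{TMR})
\]

With this convention, a junction whose anti-parallel resistance is twice its parallel resistance has a TMR ratio of 100 %.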

Thanks to their non-volatility, MTJs open a new path toward next-generation memories and logic circuits. MTJ-based memory, namely MRAM, has so far been widely explored and commercially produced. Moreover, combined with CMOS, the MTJ can be used in hybrid circuits and provides enhanced features such as instant on/off and improved radiation immunity.

The non-local spin injection model consists of two ferromagnetic layers (the MTJ free layers in ASL) and a channel. When an injection current is applied to the input ferromagnetic layer, a spin current is polarized in the channel and carries the magnetization orientation of the free layer. This spin current diffuses toward the output ferromagnetic layer and switches its magnetization through the STT effect [202]. Reference [148] presents the principle of the non-local spin injection model and gives the relations between currents and voltages, accounting for the injection/accumulation/diffusion effects. We can use it to compute the spin-polarized currents, voltages, resistances, etc.

As presented above, the spin current polarized by the injected current flows to the output and switches the magnetization orientation of the ferromagnetic layer there. Injection efficiency is therefore one of the most important criteria affecting device performance. One way to improve it is to search for new materials and improve material properties (e.g. the spin diffusion length λ_N and the polarization P). Another is to study the ASL structure, inserting a tunnel barrier between the ferromagnetic layer and the channel to solve the resistance mismatch problem. In this thesis, given the generality and flexibility of this structure for improving spin injection efficiency, we adopt the ASL device shown in Figure 2 for generic compact modeling.

Chapter 3 Compact modeling of ASL

A compact model is needed to bridge the gap between device fabrication and system-level application requirements. In particular, accurate simulation of spin injection/detection effects is needed to estimate the switching time and the spin diffusion delay from material properties. The models should also be generic, so that fabrication-related device parameters such as channel lengths and MTJ sizes can be explored. Such exploration should make it possible to study performance trade-offs and should also help designers avoid hardware damage. Finally, a scalable approach is mandatory for studying the design of complex hierarchical circuits. Note that, to be adopted by the design community, the approach should comply with standardized CMOS-based design techniques and be implemented in an existing commercial environment. Accurate, generic, scalable and easy-to-use models, i.e. complete and compact models, are therefore required.

This chapter presents the physical models of an ASL device: the spin-transfer torque (STT) model, the TMR model, the spin injection/diffusion/accumulation model, the channel breakdown current density model and the scaling effects. Based on these physical models, a compact model is developed and written in Verilog-A on the Cadence platform. It is divided into six parts: Injector, Detector, Contact C (tunnel barrier or simple transparent FM-N contact), channel N and the path to ground G. This division allows independent design and optimization of circuits based on the ASL device and facilitates hierarchical circuit design.

Figure 3 – Block models of an inverter/buffer based on an ASL device.

Figure 3 shows a simple ASL device with the six independent blocks of the developed model. These blocks are detailed below:

- **“Injector”** integrates a TMR model, an STT model and a spin injection model. The state of the MTJ is set by the voltage source \( V_{\text{write}} \) connected to terminals “T1” and “T2”. The MTJ state is output on terminal \( S_{in} \), taking the switching delay into account. The output is represented as a voltage signal: \( V = 0\,\text{V} \) and \( V = 1\,\text{V} \) correspond to the parallel and anti-parallel states respectively. Once the MTJ state has been set, an injection current \( I_{inj} \) is injected into the channel from the MTJ free layer through terminal “\( I_{inj} \)”. This produces a charge current “\( outc \)” and a spin current “\( outs \)”.

- **“C”** is the contact model, which can be implemented with or without a tunnel barrier (TB). The two input terminals “\( inc \)” and “\( ins \)” carry the input charge and spin currents. Terminals “\( outc \)” and “\( outs \)” carry the output charge and spin currents.

- **“G”** and **“N”** are the ground and channel models respectively. Part of the charge and spin currents delivered by the contact flows to ground, while the rest flows into the channel, where it propagates until it reaches a detector.

- **“Detector”** is the block that switches an MTJ state according to the current flowing through a contact. Above a threshold current, terminal “\( State \)” is switched to 0 V (parallel) or 1 V (anti-parallel) depending on the polarity of the injection current and on the input MTJ state. The state can be read by applying a voltage source \( V_{\text{read}} \) to terminals “T1” and “T2” and is sent to terminal “\( S_{out} \)”.
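
The switching rule of the Detector block can be illustrated with a small behavioural sketch. It is not the Verilog-A model of the thesis: the function name, the threshold value and the use of the signed injection current as the switching quantity are assumptions made for illustration. The polarity convention (positive current inverts, negative current buffers) follows the figure captions of the thesis.

```python
def detector_next_state(input_state: int, current_state: int,
                        i_inj: float, i_threshold: float) -> int:
    """Return the detector MTJ state: 0 = parallel (0 V), 1 = anti-parallel (1 V).

    input_state   -- state of the injector MTJ (0 or 1)
    current_state -- state the detector MTJ holds before the pulse
    i_inj         -- signed injection current in amperes
    i_threshold   -- hypothetical switching threshold in amperes
    """
    if abs(i_inj) < i_threshold:
        # Below threshold the detector keeps its previous state.
        return current_state
    if i_inj > 0:
        # Assumed convention: positive injection current acts as an inverter.
        return 1 - input_state
    # Assumed convention: negative injection current acts as a buffer.
    return input_state


# Hypothetical values: 700 uA pulse against a 400 uA threshold.
print(detector_next_state(input_state=1, current_state=0,
                          i_inj=700e-6, i_threshold=400e-6))
```

The sketch ignores the switching delay and the read path through T1/T2, which the compact model handles separately.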

The compact model is validated by comparison with three experimental results obtained with three different channel materials: Py [115], Mg [105] and graphene [124]. The performance of the ASL device is first analyzed using the spin injection/detection expressions developed for the device considered. These expressions can also be used to discuss phenomena observed experimentally in ASL devices. The dependence of the performance criteria on the device parameters is summarized in Figure 4.

**Figure 4** – Dependence between performance and ASL device parameters

Simulations based on the compact model are carried out for the performance analysis, organized around three sets of parameters:

- The overall device width \( W \), related to the scaling effect, is one of the most important parameters of an ASL device.

- STT model parameters: the damping factor \( \alpha \) and the thermal stability factor \( \Delta \), which determine the critical switching current \( I_{0} \) and hence the switching delay \( t \).

- Spin injection/detection model parameters: the channel length \( L_{N} \), the channel spin diffusion length \( \lambda_{N} \), the contact polarization \( P_{C} \) and the contact resistance-area product \( RA_{C} \). Within the spin injection/detection model, these parameters set the spin injection/detection efficiency \( P_{inj/eff} \). For a given injection current \( I_{inj} \), the detection current \( I_{det} \) that contributes to the spin-transfer torque depends on \( P_{inj/eff} \) and, together with the critical current \( I_{0} \), determines the switching delay \( t \). Moreover, according to the energy equation \( E = I_{inj}^{2} R t \), the injection current \( I_{inj} \), the device resistance \( R \) and the pulse duration \( t \) determine the energy consumption.
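
As a rough numerical illustration of the energy relation in the last item, the sketch below assumes the usual Joule form, energy equal to the square of the injection current times the device resistance times the pulse duration. The function name and the operating point are hypothetical and are not taken from the thesis.

```python
def switching_energy(i_inj: float, resistance: float, pulse: float) -> float:
    """Joule energy E = I_inj^2 * R * t, all quantities in SI units."""
    return i_inj ** 2 * resistance * pulse


# Hypothetical operating point: 700 uA through 100 ohm for 1 ns.
energy = switching_energy(i_inj=700e-6, resistance=100.0, pulse=1e-9)
print(f"{energy:.3e} J")

# Because the current enters squared, halving it at a fixed pulse length
# divides the energy by four; in practice a smaller current also lengthens
# the switching delay, which is the trade-off discussed in the text.
ratio = switching_energy(350e-6, 100.0, 1e-9) / energy
print(f"energy ratio at half current: {ratio:.2f}")
```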

Beyond these performance criteria, our model also includes the channel breakdown current density \( J_{BR} \), which is related to the channel width \( W \) and the channel length \( L_{N} \). For a given current density \( J_{BR} \) and given device parameters, there is a maximum injection current \( I_{inj} \) that must not be exceeded to avoid damaging the channel.

The results show that a smaller device width \( W \), together with a smaller thermal stability factor \( \Delta \) and a smaller damping factor \( \alpha \), leads to a smaller critical current \( I_0 \) and hence a shorter switching delay. A shorter channel length \( L_N \), a longer channel spin diffusion length \( \lambda_N \) and a larger tunnel conductance polarization \( P_C \) lead to a higher injection/detection efficiency and hence better performance (energy consumption). In addition, to optimize performance in an asymmetric ASL structure, a trade-off must be found by simulation for the resistance-area product \( RA_C \) of the tunnel barrier.

In conclusion, the compact model developed in this chapter enables hierarchical circuit design, and the performance analysis provides a basis for circuit optimization.

Chapter 4 Design and simulation of ASL-based circuits

With the ASL compact model developed in Chapter 3, we explore the ASL device at the circuit level, which in turn enables further exploration at the system level. Unlike charge-based CMOS technology, new circuits and architectures are needed to account for spin-based physical phenomena. This is a difficult task because of the many physical parameters to consider and the lack of tools. In this manuscript, we propose a methodology for designing ASL circuits according to the physical properties of the materials used, taking into account the issues of multiple channels, gate-to-gate interconnection and the injection required to compensate for spin diffusion. Figure 5 shows the resulting methodology, which has 4 successive steps.


Figure 5 – Design method for ASL-based circuits.

Step 1 specifies the circuit and system characteristics and constraints, as well as the material parameters.

Since the ASL device operates on the majority principle, step 2 synthesizes the circuit from majority functions using one of two synthesis methods: i) the “truth table” method synthesizes a simple circuit from its truth table; ii) the “replacement” method synthesizes a complex circuit by replacing its Boolean function with fundamental majority-based logic functions.

Step 3 explores the circuit parameters (MTJ sizes, channel length) and the injection currents so as to meet the system constraints. The circuit function is validated and the performance is optimized using the ASL compact model. The parameters are then exported to step 4 for implementation.
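
To make the majority-based synthesis of step 2 concrete, the sketch below builds a full adder from three 3-input majority gates and inverters. It uses a standard majority-logic identity and is offered as an illustration only; it is not claimed to be the exact wiring of the thesis circuits, and the function names are hypothetical.

```python
from itertools import product


def maj3(a: int, b: int, c: int) -> int:
    """3-input majority: 1 when at least two inputs are 1."""
    return int(a + b + c >= 2)


def full_adder_majority(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder from three majority gates; returns (sum, carry_out)."""
    m1 = maj3(a, b, cin)          # carry-out
    m2 = maj3(a, b, 1 - cin)      # majority with inverted carry-in
    s = maj3(1 - m1, m2, cin)     # equals a XOR b XOR cin
    return s, m1


# Exhaustive check against the arithmetic definition of a full adder.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder_majority(a, b, cin)
    assert s == a ^ b ^ cin
    assert 2 * cout + s == a + b + cin
print("full adder verified for all 8 input combinations")
```

In an ASL implementation the two inversions would be obtained from the polarity of the injection current rather than from separate inverter gates.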

Figure 6 – Implementation of fundamental circuits, with $L_i$ denoting the channel lengths. (a) Inverter/buffer functions configured by the polarity of the injection current; (b) AND/OR/NAND/NOR functions realized by a 3-input majority gate, configured by the polarity of the injection current and the state of the control input $F$; the influence of the bend from In1/F to P is not considered; (c) $XOR_{TT}$ architecture of XOR2/3 circuits based on the “truth table” method, which can be used for full-adder computation, with M1 as the carry output; (d) Another XOR2/3 circuit architecture, $XOR_{rep}$, based on the “replacement” method and composed of three 3-input majority gates. This architecture can be used for both full-adder and full-subtractor computation, where M1 is the carry output and F is the sum or difference output.

Based on this methodology, we implemented the fundamental logic circuits (Fig. 6) and combinational circuits. Their architectures are presented along with their dimensional parameters. Their functional behavior is simulated and verified with the ASL compact model. In addition, their performance, including delay, energy, energy-delay product (EDP) and throughput under delay-optimized and energy-optimized approaches (Table 1), is evaluated to support higher-level circuit and system evaluations. Comparisons with 45 nm CMOS circuits [260] are also analyzed.
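
The derived metrics can be recomputed from the raw figures. The sketch below takes the area, delay and energy of the first three rows of Table 1 (delay-optimized columns) and computes the EDP as delay times energy. The throughput formula, the reciprocal of delay times area, is an inference: it reproduces the tabulated throughput values for these three rows, but the summary does not state it explicitly. The function names are hypothetical.

```python
def edp(delay_ns: float, energy: float) -> float:
    """Energy-delay product, in the table's own units."""
    return delay_ns * energy


def throughput_per_area(delay_ns: float, area_um2: float) -> float:
    """Assumed definition: operations per ns per um^2."""
    return 1.0 / (delay_ns * area_um2)


# (area in um^2, delay in ns, energy as tabulated) from Table 1.
rows = {
    "inverter": (0.04, 0.29, 0.065),
    "buffer": (0.04, 0.886, 0.0807),
    "AND2": (0.08, 1.659, 0.0644),
}

for name, (area, delay, energy) in rows.items():
    print(f"{name}: EDP={edp(delay, energy):.4f}, "
          f"throughput={throughput_per_area(delay, area):.2f}")
```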

In conclusion, this chapter presents an ASL circuit design methodology with which basic logic circuits and combinational circuits are designed, implemented and validated using the developed ASL compact model, forming a library. Their performance is compared with one another and with CMOS technology. ASL circuits have a longer delay and higher energy consumption, mainly because of MTJ switching. Performance can be improved through device scaling and materials research, following the performance analysis of Chapter 3.

The ASL device has a unique property, reconfigurability, which can be used for new circuit and architecture designs. The reconfigurability of ASL circuits depends on i) the states of the control inputs, ii) the polarity of the injection current (positive or negative) and iii) the input weights set by the injection current amplitudes, the channel length and the terminal dimensions. Table 2 shows the functions obtained with a 3-input majority gate. Depending on the polarity of the injection current and the state of the control input $F$, this circuit can realize the AND/NAND/OR/NOR2 functions.
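
The reconfigurability of the 3-input majority gate can be checked with a few lines of logic. In this sketch the control input F selects AND (F = 0) or OR (F = 1), as in the figure captions of the thesis, and a boolean flag stands in for the injection current polarity that inverts the output. The function and parameter names are hypothetical.

```python
from itertools import product


def maj3(a: int, b: int, c: int) -> int:
    """3-input majority: 1 when at least two inputs are 1."""
    return int(a + b + c >= 2)


def reconfigurable_gate(a: int, b: int, f: int, invert: bool) -> int:
    """Majority gate with control input f and optional output inversion.

    f = 0 gives AND, f = 1 gives OR; invert=True models the injection
    current polarity that turns them into NAND and NOR.
    """
    out = maj3(a, b, f)
    return 1 - out if invert else out


for a, b in product((0, 1), repeat=2):
    assert reconfigurable_gate(a, b, 0, False) == (a & b)       # AND2
    assert reconfigurable_gate(a, b, 1, False) == (a | b)       # OR2
    assert reconfigurable_gate(a, b, 0, True) == 1 - (a & b)    # NAND2
    assert reconfigurable_gate(a, b, 1, True) == 1 - (a | b)    # NOR2
print("AND/OR/NAND/NOR2 obtained from one majority gate")
```

The third source of reconfigurability listed above, input weighting through current amplitude and channel length, is not captured by this unweighted majority function.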

Chapter 5 System-level modeling and evaluation

A partir de la méthode de conception, les circuits logiques et combinatoires fondamentaux sont conçus, mis en œuvre et évalués, ce qui permet de concevoir et d’évaluer des systèmes. Au niveau système, jusqu’à
Table 1 – Evaluation of basic ASL circuits.

<table>
<thead>
<tr>
<th>Function</th>
<th>Area ($\mu m^2$)</th>
<th>$I_{inj}$ ($\mu A$), delay-optimized</th>
<th>Delay (ns)</th>
<th>Energy (nJ)</th>
<th>EDP</th>
<th>Throughput ($\frac{1}{\mu m^2 \cdot ns}$)</th>
<th>$I_{inj}$ ($\mu A$), energy-optimized</th>
<th>Delay (ns)</th>
<th>Energy (nJ)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Inverter</td>
<td>0.04</td>
<td>$1.9 \times 10^3$</td>
<td>0.29</td>
<td>0.065</td>
<td>0.019</td>
<td>86.21</td>
<td>410</td>
<td>2.285</td>
<td>0.024</td>
</tr>
<tr>
<td>Buffer</td>
<td>0.04</td>
<td>700</td>
<td>0.886</td>
<td>0.0807</td>
<td>0.0715</td>
<td>28.22</td>
<td>450</td>
<td>1.82</td>
<td>0.069</td>
</tr>
<tr>
<td>AND2</td>
<td>0.08</td>
<td>455</td>
<td>1.659</td>
<td>0.0644</td>
<td>0.107</td>
<td>7.534</td>
<td>455</td>
<td>1.659</td>
<td>0.0644</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Function</th>
<th>Variant</th>
<th>Area ($\mu m^2$)</th>
<th>$I_{inj}$ ($\mu A$)</th>
<th>Delay (ns)</th>
<th>Energy (nJ)</th>
<th>EDP</th>
<th>Throughput ($\frac{1}{\mu m^2 \cdot ns}$)</th>
<th>$I_{inj}$ ($\mu A$), energy-optimized</th>
</tr>
</thead>
<tbody>
<tr>
<td>Adder</td>
<td>rep</td>
<td>0.12</td>
<td>700</td>
<td>1.772</td>
<td>0.244</td>
<td>0.433</td>
<td>4.703</td>
<td>450</td>
</tr>
<tr>
<td>Subtractor</td>
<td>1-bit</td>
<td>0.16</td>
<td>700/750</td>
<td>1.77</td>
<td>0.274</td>
<td>0.485</td>
<td>4.703</td>
<td>450/455</td>
</tr>
<tr>
<td>4-bit</td>
<td>N/A</td>
<td>7.088</td>
<td>1.095</td>
<td>7.76</td>
<td>0.294</td>
<td>N/A</td>
<td>14.56</td>
<td>0.925</td>
</tr>
<tr>
<td>Counter</td>
<td>1-bit</td>
<td>0.16</td>
<td>700/475</td>
<td>1.772</td>
<td>0.319</td>
<td>0.506</td>
<td>3.527</td>
<td>450</td>
</tr>
<tr>
<td>2-bit</td>
<td>0.92</td>
<td>600/455</td>
<td>3.318</td>
<td>1.038</td>
<td>3.443</td>
<td>0.328</td>
<td>600/455</td>
<td>3.318</td>
</tr>
<tr>
<td>4-bit</td>
<td>1.24</td>
<td>700/450</td>
<td>6.202</td>
<td>1.713</td>
<td>10.62</td>
<td>0.13</td>
<td>450</td>
<td>12.74</td>
</tr>
<tr>
<td>Multiplier</td>
<td>4-bit</td>
<td>2.56</td>
<td>700</td>
<td>22.15</td>
<td>4.483</td>
<td>99.3</td>
<td>0.0176</td>
<td>450</td>
</tr>
<tr>
<td>ALU</td>
<td>MG</td>
<td>0.2</td>
<td>450/455</td>
<td>3.318</td>
<td>0.21</td>
<td>0.609</td>
<td>1.507</td>
<td>450/455</td>
</tr>
<tr>
<td>LG</td>
<td>N/A</td>
<td>19.14</td>
<td>0.923</td>
<td>17.66</td>
<td>0.05</td>
<td>N/A</td>
<td>20.07</td>
<td>0.911</td>
</tr>
<tr>
<td>Multiplexer</td>
<td>2-to-1</td>
<td>0.16</td>
<td>455</td>
<td>3.318</td>
<td>0.129</td>
<td>0.427</td>
<td>1.883</td>
<td>455</td>
</tr>
<tr>
<td>4-to-1</td>
<td>0.44</td>
<td>450/455</td>
<td>3.43</td>
<td>0.5</td>
<td>1.715</td>
<td>0.662</td>
<td>450/455</td>
<td>5.3</td>
</tr>
<tr>
<td>Demultiplexer</td>
<td>1-to-2</td>
<td>0.08</td>
<td>700</td>
<td>0.886</td>
<td>0.161</td>
<td>0.143</td>
<td>14.11</td>
<td>450</td>
</tr>
<tr>
<td>1-to-4</td>
<td>0.32</td>
<td>455</td>
<td>1.659</td>
<td>0.258</td>
<td>0.427</td>
<td>1.88</td>
<td>55</td>
<td>1.659</td>
</tr>
<tr>
<td>1-to-8</td>
<td>0.96</td>
<td>700</td>
<td>1.77</td>
<td>0.242</td>
<td>0.429</td>
<td>0.588</td>
<td>450/455</td>
<td>3.48</td>
</tr>
<tr>
<td>4-input priority encoder</td>
<td>N/A</td>
<td>2.545</td>
<td>0.36</td>
<td>0.915</td>
<td>1.403</td>
<td>N/A</td>
<td>3.48</td>
<td>0.336</td>
</tr>
<tr>
<td>Decoder</td>
<td>2-to-4</td>
<td>0.16</td>
<td>700</td>
<td>0.886</td>
<td>0.323</td>
<td>0.286</td>
<td>7.054</td>
<td>450</td>
</tr>
<tr>
<td>3-to-8</td>
<td>0.64</td>
<td>455</td>
<td>1.659</td>
<td>0.515</td>
<td>0.855</td>
<td>0.942</td>
<td>455</td>
<td>1.659</td>
</tr>
<tr>
<td>7-segment display</td>
<td>1.36</td>
<td>N/A</td>
<td>4.204</td>
<td>2.137</td>
<td>8.986</td>
<td>0.175</td>
<td>N/A</td>
<td>5.462</td>
</tr>
</tbody>
</table>

now, because of the specific nature of ASL devices, no efficient design and evaluation method has existed. In this manuscript, we use the “replacement” method, which replaces the original Boolean functions with functions built on the majority function. Table 3 compares the number of devices required by several circuits in CMOS and ASL technologies. From this table, we estimate the types and numbers of circuits in a system and evaluate system performance by evaluating each circuit separately.

System evaluation also takes the interconnections between gates into account. Because the spin current decays rapidly along the channel, as $e^{-L_N/\lambda_N}$ (with $L_N$ the channel length and $\lambda_N$ the spin diffusion length), buffers must be inserted to carry the spin signal over long distances. The buffers required in a statistical model of a logic block are given in [236], with the interconnect distributions taken from [261]. The number of inserted buffers depends on the buffer channel length. A longer channel reduces the number of buffers, but a higher charge current is then needed to compensate for the spin diffusion losses in the channel. A trade-off on the channel length must therefore be found to optimize system performance. Once the numbers of the different circuits are known, system performance can be evaluated from the evaluation of the individual elements. In this thesis, we evaluate the performance of three complex circuits: a DCT circuit, a convolution circuit, and an

Table 2 – Reconfigurable functions based on a 3-input majority gate.

<table>
<thead>
<tr>
<th>Function</th>
<th>$F$</th>
<th>$I_{inj}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>AND2</td>
<td>1</td>
<td>N</td>
</tr>
<tr>
<td>OR2</td>
<td>0</td>
<td>N</td>
</tr>
<tr>
<td>NAND2</td>
<td>1</td>
<td>P</td>
</tr>
<tr>
<td>NOR2</td>
<td>0</td>
<td>P</td>
</tr>
</tbody>
</table>
Intel i7 microprocessor.
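
The trade-off between buffer count and injection current described above can be sketched numerically. This is a minimal illustration for a single interconnect of fixed length, not the statistical interconnect model of [236] and [261]. The function names and all numerical values are assumptions chosen for illustration, and the current scaling simply inverts the $e^{-L_N/\lambda_N}$ decay.

```python
import math


def spin_attenuation(channel_nm: float, lambda_n_nm: float) -> float:
    """Fraction of the spin current left after a channel of the given length."""
    return math.exp(-channel_nm / lambda_n_nm)


def buffers_needed(interconnect_nm: float, buffer_channel_nm: float) -> int:
    """Buffers required so that no channel segment exceeds buffer_channel_nm."""
    segments = math.ceil(interconnect_nm / buffer_channel_nm)
    return max(segments - 1, 0)


def relative_injection_current(buffer_channel_nm: float,
                               lambda_n_nm: float) -> float:
    """Factor by which the injected current must grow to offset the decay."""
    return 1.0 / spin_attenuation(buffer_channel_nm, lambda_n_nm)


# Illustrative values only (assumptions, not thesis results).
LAMBDA_N_NM = 1000.0      # assumed spin diffusion length
INTERCONNECT_NM = 2000.0  # assumed gate-to-gate distance

for channel_nm in (50.0, 200.0, 500.0, 1000.0):
    n_buf = buffers_needed(INTERCONNECT_NM, channel_nm)
    factor = relative_injection_current(channel_nm, LAMBDA_N_NM)
    print(f"L_N = {channel_nm:6.0f} nm: {n_buf:2d} buffers, current x{factor:.2f}")
```

Longer buffer channels reduce the buffer count while raising the current factor, which is the trade-off the channel length optimization has to resolve.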

Table 4 shows the evaluation results for the Intel i7 microprocessor, modeled as consisting only of buffers and NAND gates. System optimization therefore comes down to improving the implementation of the buffers and NAND2 gates. It relies on the optimization presented in Chapter 3, which explores i) the FM switching current threshold (which depends on the device width $W$, the thermal factor $\Delta$, and the damping factor $\alpha$) and ii) the spin transport capability (the channel spin diffusion length $\lambda_N$ and the spin polarizations $P_F$ and $P_C$).

The results of Chapter 3 show that the NAND gate with three channels and no junction is the best design option. In addition, a shorter channel reduces spin attenuation, so a lower injection current is required. Taking dipolar coupling into account, we set the channel length to 50 nm (resp. 15 nm) for the $W = 40$ nm (resp. 5 nm) technology node.

As explained above, the number of inserted buffers depends on the channel length. An optimized channel length, together with the corresponding injection current, must be determined subject to the delay constraint (2 ns) and the buffer-count constraint (the number of buffers must be less than half of the total number of devices used).

The results show that an ASL system implemented with existing fabrication technologies and materials consumes far more energy than the equivalent CMOS system. However, with future improvements in the fabrication process and new materials discoveries, ASL systems can be expected to eventually outperform CMOS.

Table 3 – Comparison of device counts between CMOS and ASL.

<table>
<thead>
<tr>
<th>Function</th>
<th>CMOS</th>
<th>ASL</th>
</tr>
</thead>
<tbody>
<tr>
<td>Inverter</td>
<td>2</td>
<td>1</td>
</tr>
<tr>
<td>Buffer</td>
<td>4</td>
<td>1</td>
</tr>
<tr>
<td>2-input AND</td>
<td>6</td>
<td>2</td>
</tr>
<tr>
<td>2-input OR</td>
<td>6</td>
<td>2</td>
</tr>
<tr>
<td>2-input NAND</td>
<td>4</td>
<td>2</td>
</tr>
<tr>
<td>2-input NOR</td>
<td>4</td>
<td>2</td>
</tr>
<tr>
<td>2-input XOR</td>
<td>6</td>
<td>3</td>
</tr>
<tr>
<td>2-input XNOR</td>
<td>8</td>
<td>3</td>
</tr>
<tr>
<td>3-input AND</td>
<td>8</td>
<td>3</td>
</tr>
<tr>
<td>3-input NAND</td>
<td>6</td>
<td>3</td>
</tr>
<tr>
<td>3-input NOR</td>
<td>6</td>
<td>3</td>
</tr>
<tr>
<td>3-input XOR</td>
<td>20</td>
<td>2</td>
</tr>
<tr>
<td>3-input XNOR</td>
<td>22</td>
<td>2</td>
</tr>
</tbody>
</table>
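
The device-count comparison of Table 3 lends itself to a small tally of the kind used in the "replacement" method: given the gate mix of a circuit, sum the per-gate device counts for each technology. The sketch below copies the per-gate counts from Table 3; the `total_devices` helper and the full-adder gate mix (two XOR2, two AND2, one OR2) are illustrative assumptions, not a circuit taken from the thesis.

```python
# (CMOS devices, ASL devices) per gate, copied from Table 3.
DEVICE_COUNT = {
    "INV": (2, 1),
    "BUF": (4, 1),
    "AND2": (6, 2),
    "OR2": (6, 2),
    "NAND2": (4, 2),
    "NOR2": (4, 2),
    "XOR2": (6, 3),
    "XNOR2": (8, 3),
    "AND3": (8, 3),
    "NAND3": (6, 3),
    "NOR3": (6, 3),
    "XOR3": (20, 2),
    "XNOR3": (22, 2),
}


def total_devices(gate_counts: dict) -> tuple:
    """Return (CMOS total, ASL total) for a mapping gate name -> instances."""
    cmos = sum(DEVICE_COUNT[gate][0] * n for gate, n in gate_counts.items())
    asl = sum(DEVICE_COUNT[gate][1] * n for gate, n in gate_counts.items())
    return cmos, asl


# Hypothetical gate mix of a textbook 1-bit full adder (assumption).
full_adder = {"XOR2": 2, "AND2": 2, "OR2": 1}
cmos_total, asl_total = total_devices(full_adder)
print(f"CMOS devices: {cmos_total}, ASL devices: {asl_total}")
```

A direct gate-by-gate replacement like this ignores the buffers that ASL interconnects require, so it gives only the logic-device part of the count.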

Table 4 – Power comparison between CMOS and ASL for the Intel i7 system.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>CMOS</th>
<th>40 nm</th>
<th>5 nm</th>
<th>5 nm</th>
<th>5 nm</th>
<th>5 nm</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>22 nm</td>
<td>0.447 × 10^7</td>
<td>0.447 × 10^7</td>
<td>0.447 × 10^7</td>
<td>0.447 × 10^7</td>
<td>0.447 × 10^7</td>
</tr>
<tr>
<td># NAND (# buffer)</td>
<td>500 µA/182 µA</td>
<td>596 µA/181 µA</td>
<td>51 µA/61.6 µA</td>
<td>13 µA/2.4 µA</td>
<td>1.7 µA/1.83 µA</td>
<td></td>
</tr>
<tr>
<td>I_{M/N}</td>
<td>1 µm</td>
<td>10µm</td>
<td>10µm</td>
<td>10µm</td>
<td>10µm</td>
<td></td>
</tr>
<tr>
<td>(NAND/buffer)</td>
<td>50 nm/15 nm</td>
<td>50 nm/15 nm</td>
<td>15 nm/15 nm</td>
<td>15 nm/15 nm</td>
<td>15 nm/15 nm</td>
<td></td>
</tr>
<tr>
<td>Δ</td>
<td>0.5</td>
<td>0.5</td>
<td>0.5</td>
<td>0.8</td>
<td>0.8</td>
<td></td>
</tr>
<tr>
<td>P_F/C</td>
<td>69</td>
<td>69</td>
<td>69</td>
<td>69</td>
<td>69</td>
<td></td>
</tr>
<tr>
<td>(NAND/buffer)</td>
<td>0.27</td>
<td>0.27</td>
<td>0.007</td>
<td>0.007</td>
<td>0.007</td>
<td></td>
</tr>
<tr>
<td>Power (625 MHz)</td>
<td>4.6 W/6877 W</td>
<td>5.8 × 10^4 W/517.7 W</td>
<td>4.6 × 10^5 W/36.4 W</td>
<td>0.154 W/0.065 W</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

To improve the throughput of ASL circuits, their operation in pipeline mode is discussed. Unlike CMOS circuits, each input of an ASL circuit is connected to a clock signal through the injection current, and no constant supply current is required. As illustrated in the timing diagram of Figure 7 (a) and (b), the XOR circuit is implemented with two clocks (CLK1 and CLK2 for stages 1 and 2, respectively). Each clock cycle can contain two injection current phases (positive and negative), applied to the MTJ input to implement the inverter and buffer functions. While CLK1 is active, the injection current $I_{inj}$ (with its two phases) is injected into the 6 inputs of the two AND gates. The injected spin currents propagate to $M1$ and $M2$, where they are detected during CLK2. To this end, the current $I_{inj2}$ is injected into the inputs $In1$, $M1$, and $M2$ of stage 2, thereby transmitting the spin current to the output MTJ $Out$. The clock signals and the injection current phases are provided by auxiliary CMOS circuits, which are not discussed in this manuscript.

A pipeline is built by inserting MTJs between stages to act as latches, as illustrated in Figure 7 (c). Data are transferred from the inputs to the latches when an injection current is applied. Likewise, the data stored in the latches are transferred to $M1/M2$ when an injection current is applied. The injection currents applied to the MTJ latches therefore act as a clock trigger, which holds the inputs of stage 2 while stage 1 is computing. The pipeline diagram of the circuit is shown in Figure 7 (d). The clock cycle is $\max(T_{MAJ_{1st\,stage}}, T_{MAJ_{2nd\,stage}}) + T_{write}$, where $T_{write}$ is the write delay of the latches and $T_{MAJ_{1st\,stage}}$ and $T_{MAJ_{2nd\,stage}}$ are the latencies of the 3-input majority gate in stage 1 and stage 2. The latency of this pipelined circuit is 2 clock cycles and its throughput is $1/ClockCycle$. This implementation therefore improves throughput at the expense of area.
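
The timing relations of the 2-stage pipeline can be checked with a short numerical sketch. The function names and the delay values are illustrative assumptions (they are not measured results). The non-pipelined reference assumes that a new input is accepted only after both stages have finished, which is a simplification made here for comparison.

```python
def clock_cycle(t_stage1_ns: float, t_stage2_ns: float, t_write_ns: float) -> float:
    """Clock cycle: slowest majority-gate stage plus the latch write delay."""
    return max(t_stage1_ns, t_stage2_ns) + t_write_ns


def pipeline_latency(cycle_ns: float, stages: int = 2) -> float:
    """Latency of the pipelined circuit: one clock cycle per stage."""
    return stages * cycle_ns


def throughput(cycle_ns: float) -> float:
    """Results per nanosecond once the pipeline is full."""
    return 1.0 / cycle_ns


# Illustrative delays in ns (assumptions, not thesis data).
T_STAGE1, T_STAGE2, T_WRITE = 1.5, 2.0, 0.5

cycle = clock_cycle(T_STAGE1, T_STAGE2, T_WRITE)
unpipelined_period = T_STAGE1 + T_STAGE2  # assumed reference without latches

print(f"clock cycle          : {cycle:.2f} ns")
print(f"pipelined latency    : {pipeline_latency(cycle):.2f} ns")
print(f"pipelined throughput : {throughput(cycle):.3f} results/ns")
print(f"reference throughput : {1.0 / unpipelined_period:.3f} results/ns")
```

With these assumed values the pipelined circuit has a longer latency than the reference but a higher throughput, at the cost of the extra MTJ latches.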

![Figure 7](image_url)

**Figure 7** – Timing diagram and pipelining of ASL circuits. (a) 3-input XOR/XNOR2/3 circuit with clocked injection signals; (b) clock signals: CLK1 and CLK2 are connected to stage 1 and stage 2, respectively. The injection current of each stage can have two phases, with positive and negative amplitudes that configure the inverter and the buffer function, respectively. (c) 2-stage pipelined circuit obtained by adding MTJs as latches between the stages. (d) Activity diagram of the pipelined circuit.

**Conclusions and perspectives**

This thesis aims to propose a unified framework for designing and evaluating ASL circuits and systems, from device modeling and layout to system evaluation. Research on ASL circuits is still in its infancy, and most experiments focus on demonstrating spin injection/detection phenomena and improving injection efficiency rather than on circuit/system design (Chapter 1). Achieving low power consumption in a system application requires a global framework for ASL devices, from the device level to the system level. In this thesis, our main contributions are therefore a compact ASL model and a circuit/system design methodology. We first study the structure and basic principle of the ASL device (Chapter 2): the MTJ and its spin injection/detection model. By exploring the physical models of ASL, we developed a compact model, written in Verilog-A under Cadence, that supports hierarchical circuit design. Validated against experimental results from ASL devices, this compact model can be used to design and theoretically evaluate arbitrary circuits (Chapter 3). We then use this compact model to design common combinational circuits, following the circuit/system design method developed in Chapter 4. The circuits are implemented and evaluated with the compact model. A circuit library is developed for system design and evaluation (Chapter 5). Reconfigurability and pipelined ASL circuits are analyzed, and several systems are evaluated. The results indicate that future improvements in the fabrication process and materials technology could allow ASL circuits to outperform their CMOS implementations.

Although we have explored the ASL device from the device level to the system level, several points could still improve ASL applications.

Our compact model includes the effects needed for computation. However, parameter variations caused by thermal effects are not taken into account; they call for dedicated models that capture the variation trends on the basis of experimental results. In addition, the width of the MTJ model used is between 25 nm and 40 nm. Smaller MTJ dimensions are needed to improve device performance, so a physical, compact MTJ model for sub-nanometer dimensions must be developed. For the channel in spin injection/detection, we do not consider the edge effect, which influences spin diffusion. A more sophisticated model accounting for these effects could therefore be developed to evaluate device and circuit performance accurately.

To evaluate a circuit accurately, its layout should be drawn, and a mask layout method that takes placement, ordering, etc. into account should be developed. Moreover, our performance evaluation does not consider the auxiliary CMOS circuits used for the power supply, which will have to be included in a future evaluation.

At the circuit level, the synthesis method used in this manuscript is still inspired by CMOS technology. A new synthesis method for complex circuits should be developed, taking into account the majority principle and the specific properties of ASL devices.

At the system level, we discussed pipelined ASL circuits. Finer-grained pipelining could be investigated in the future. The reconfigurability of ASL circuits could also be exploited to design more complex circuits.

The ASL device also opens another line of study: it can be used to build logic-in-memory circuits (with the MTJ as non-volatile memory) as well as analog and neuromorphic circuits. The superposition of spin currents and the threshold for state switching point to possible neuromorphic architectures. Multi-level MTJs can serve as synapses that store the weights. Combined with its reconfigurability, the ASL device can be used to implement a neuromorphic architecture efficiently.
Title: Compact modeling and circuit design based on spin injection

Keywords: All spin logic, compact modeling, design methodology, pipelining, reconfigurability

Abstract: CMOS technology has profoundly shaped the development of the semiconductor industry. However, as the technology node scales down, CMOS faces significant challenges from leakage power and short-channel effects. To cope with these problems, researchers have turned to spintronics in recent years, given its potential for smaller devices and lower-power operation. The magnetic tunnel junction (MTJ) is one of the most important spintronic devices; it can store binary data based on the tunnel magnetoresistance (TMR) effect. Besides non-volatile memory, the MTJ can also be combined with, or replace, CMOS circuits to form hybrid circuits with the potential for low power consumption and high speed. However, frequent spin-charge conversion in a hybrid circuit may cause large power consumption, which diminishes the advantage of hybrid circuits. The all spin logic (ASL) concept, which uses a pure spin current to carry information, was therefore proposed to reduce the number of charge-spin conversions and thus the power consumption. Designing ASL-based circuits raises numerous challenges related to the heterogeneity they introduce and the large design space to explore. Hence, this thesis focuses on bridging the gap between application requirements at the system level and fabrication at the device level.

At the device level, we developed a compact model integrating STT, TMR, the spin injection/accumulation effects, the channel breakdown current, and the spin diffusion delay. Validated against experimental results, this model allows fabrication-related device parameters such as channel lengths and MTJ sizes to be explored, and helps designers avoid device damage. Moreover, the model is written in Verilog-A on Cadence and divided into several blocks (injector, detector, channel, and contact devices), which allows independent design and cross-layer optimization of ASL-based circuits and eases the design of hierarchical, complex circuits. Furthermore, the spin injection/accumulation expressions for the ASL device used are derived, making it possible to discuss the experimental phenomena observed in ASL devices.

At the circuit level, we developed a circuit/system design methodology that takes into account the channel distribution, the gate interconnection, and the different injection current ratios caused by spin diffusion. Given circuit/system specifications and constraints, the Boolean functions of a circuit are synthesized with the developed synthesis method, and fabrication-level parameters (channel lengths, MTJ sizes) are specified. Using this methodology, the basic combinational circuits that form a circuit library are designed and evaluated with the compact model.

At the system level, a convolution circuit and an Intel i7 system are evaluated, exploring interconnection issues: the interconnect distribution between gates and the number of inserted buffers. With theoretical parameters, the results show that ASL-based circuits/systems can outperform their CMOS-based counterparts. Moreover, a pipelining scheme for ASL-based circuits is discussed, with MTJs inserted as latches between stages. The reconfigurability of ASL-based circuits, which results from the injection current polarities/values and the control terminal states, is also discussed through the reconfigurable exploration of basic logic circuits.