Clockless Spin-based Look-Up Tables with Wide Read Margin
Soheil Salehi, Ramtin Zand, Ronald F. DeMara
Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL, 32816 USA
ABSTRACT
In this paper, we develop a 6-input fracturable non-volatile Clockless LUT (C-LUT) using spin Hall effect (SHE)-based Magnetic Tunnel Junctions (MTJs) and provide a detailed comparison between the SHE-MTJ-based C-LUT and the Spin Transfer Torque (STT)-MTJ-based C-LUT. The proposed C-LUT offers an attractive alternative for implementing combinational logic as well as sequential logic versus previous spin-based LUT designs in the literature. Foremost, the C-LUT eliminates the sense amplifier typically employed by using a differential-polarity dual-MTJ design, as opposed to a static reference-resistance MTJ. This realizes a much wider read margin, and Monte Carlo simulation of the proposed fracturable C-LUT indicates no read or write errors in the presence of a variety of process variation scenarios involving MOS transistors as well as MTJs. Additionally, simulation results indicate that the proposed C-LUT reduces the standby power dissipation by 5 . .
CCS CONCEPTS
• Hardware → Spintronics and magnetic technologies; Emerging architectures; Asynchronous circuits; Combinational circuits; Programmable logic elements; Process, voltage and temperature variations;
KEYWORDS
Reconfigurable Logic, Fracturable LUT, Magnetic Tunnel Junction, Spin-based Memory Cell, Spin Hall Effect, Spin Transfer Torque.
Flexibility and runtime adaptability are two of the main motivations for the wide adoption of reconfigurable fabrics. Among the most commonly used reconfigurable fabrics, Field Programmable Gate Arrays (FPGAs) have been the primary focus due to their flexibility, which allows realization of logic elements at medium and fine granularities while incurring low non-recurring engineering costs and enabling rapid deployment to market. Additionally, FPGAs have been researched as a promising platform that can be utilized effectively to increase reliability in the presence of process-voltage-temperature variation [1]. The main challenge of static random access memory (SRAM)-based FPGAs is the increased area and power consumption required to achieve a flexible design. The main components of FPGAs are Look-Up Tables (LUTs) and switch boxes, which consist mainly of SRAM cells [6]. However, SRAM-based LUTs incur limitations such as high static power, volatility, and low logic density.

Innovations using emerging devices within FPGAs have been sought to overcome the limitations of SRAM-based FPGAs. High-endurance non-volatile spin-based LUTs have been studied in the literature as promising alternatives to SRAM-based LUTs, Flash-based LUTs, and other state-of-the-art emerging LUTs such as resistive random access memory (RRAM)-based LUTs and phase change memory (PCM)-based LUTs [2, 4, 10–12, 14]. Spin-based devices offer non-volatility, near-zero static power, high endurance, and high integration density [9, 13]. The spin-based LUTs presented in the literature [2, 4, 10–12, 14] require separate read and write operations as well as a clock, which makes these LUTs suitable candidates for sequential logic operations. However, the main challenge that has not been addressed in the literature is providing a spin-based LUT design for combinational logic operation without the need for a clock. Additionally, the spin-based LUTs proposed in the literature fail to maintain a wide sense margin and high reliability without incurring significant area and power dissipation overheads [2, 4, 10–12, 14]. In this paper, to address the aforementioned challenges, we develop a clockless 6-input fracturable non-volatile Combinational LUT (C-LUT) with wide read margin using spin Hall effect (SHE)-based Magnetic Tunnel Junctions (MTJs) and provide a detailed comparison between the SHE-MRAM and Spin Transfer Torque (STT)-MRAM C-LUTs. Additionally, we provide a detailed analysis of the reliability of our proposed C-LUT in the presence of Process Variation (PV).
The primary goal of using LUTs in reconfigurable fabrics is to implement combinational logic. Generally, $M$-input Boolean functions are implemented using LUTs, which can be regarded as a memory with $2^M$ memory cells. The inputs are assigned using a select tree, which is constructed with Pass Transistors and Transmission Gates (TGs) [15]. Most contemporary FPGAs utilize fracturable 6-input LUTs in order to implement either one 6-input Boolean function or two 5-input Boolean functions [7]. Fig. 1(a) depicts our proposed 6-input fracturable SHE-MRAM C-LUT and Fig. 1(b) illustrates the 6-input fracturable STT-MRAM C-LUT. In Fig. 1(a) and Fig. 1(b), red indicates the write path and black indicates the read path. When the WWL and $\overline{WWL}$ signals are asserted, the Write TGs of each memory cell, TGW1 and TGW2, turn on, and using the Bit Lines, $BL_i$, and Source Lines, $SL_i$, we write into both MTJs in each memory cell, $MTJ_i$ and $\overline{MTJ_i}$, so that they hold complementary values. If $MTJ_i$ is in the P state then $\overline{MTJ_i}$ will be in the AP state, and vice versa. This results in a wide read margin during the read operation.

After the termination of the write operation, in order to read the data stored in the MTJs, the RWL and $\overline{RWL}$ signals are enabled, which activates the Read TGs of each memory cell, TGR. During the read operation, the PR and NR transistors are turned on when RWL and $\overline{RWL}$ are asserted, which provides the read path from VDD to GND. The source of PR, which is a PMOS transistor, is connected to VDD to provide a strong one, and the source of NR, which is an NMOS transistor, is connected to GND to provide a strong zero. The resistance difference between $MTJ_i$ and $\overline{MTJ_i}$ forms a voltage divider, and the divided voltage can be observed at the $D_i$ nodes shown in Fig. 1(a) and Fig. 1(b). According to the select tree input signals, shown as A, B, C, D, E, and F in Fig. 1, the voltage on the selected $D_i$ node is amplified using two inverters to generate the required output. Since the values stored in the $MTJ_i$ and $\overline{MTJ_i}$ devices are complementary, using one MTJ device to retain the data value and the other as the reference value results in a wide read margin from AP to P [8], which we leverage herein to increase the reliability of the read operation.

Figure 1: The circuit-level diagram of the proposed 6-input fracturable Combinational Look-Up Table (C-LUT) using (a) SHE-MTJ devices and (b) STT-MTJ devices.

Table 1: Comparison between SRAM-LUT and MRAM-LUT.
                              Power (µW)                  Delay
                          Read     Write    Standby   Read     Write
SRAM LUT      Logic “0”   2.58     28.4     1.5       30 ps    20 ps
              Logic “1”   7.55     27.7     1.85      30 ps    20 ps
              Average     5.06     25.08    1.67      30 ps    20 ps
MRAM C-LUT    Logic “0”   14.38    81.16    0.31      20 ps    2 ns
              Logic “1”   19.91    81.25    0.31      60 ps    2 ns
              Average     17.15    81.18    0.31      40 ps    2 ns

Table 2: Area and Energy Consumption comparison between SRAM LUT and MRAM C-LUT.
Features                             SRAM LUT    MRAM C-LUT
Device Count      Storage Cells      384 MOS     128 MTJ
                  Write/Control      384 MOS     256 × (1)
                  Read               261 MOS     267 MOS
                  Total              1029 MOS    1547 MOS + 128 MTJ
Average Energy    Read               2.53 fJ     8.58 fJ
Consumption       Write              14 fJ       162.36 fJ
(1) Write transistors are 4× larger than minimum feature size.

In the proposed C-LUT design there is no need for an external clock or a large sense amplifier circuit. Furthermore, the proposed fracturable C-LUT can perform as a single 6-input LUT or two 5-input LUTs. The operation mode of the proposed LUT is controlled using the S5 and S6 signals. If the S5 signal is enabled and S6 is disabled, then the C-LUT operates as two 5-input LUTs and the outputs of the C-LUT are OUT0 and OUT2. On the other hand, if the S5 signal is disabled and the S6 signal is enabled, then the C-LUT operates as a 6-input LUT and OUT1 is the C-LUT’s output. The proposed fracturable C-LUT provides significantly higher functional flexibility at the expense of slightly more power consumption, as studied in Section 3.
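The two operating modes described above can be illustrated with a small behavioral model. This is only a functional sketch, not the circuit itself: the class name `FracturableLUT6`, the bit ordering (A as least significant select bit, F as most significant), and the mapping of the lower and upper 32 cells to OUT0 and OUT2 are assumptions made for illustration and are not specified in the text.

```python
from typing import Dict, Sequence


class FracturableLUT6:
    """Behavioral model of a 6-input fracturable LUT with 2**6 = 64 cells.

    Assumptions (not taken from the paper): A is the least significant
    select bit, F the most significant; in two-LUT mode the lower 32 cells
    drive OUT0 and the upper 32 cells drive OUT2, sharing inputs A..E.
    """

    def __init__(self, config_bits: Sequence[int]) -> None:
        if len(config_bits) != 64 or any(b not in (0, 1) for b in config_bits):
            raise ValueError("expected 64 configuration bits, each 0 or 1")
        self.bits = list(config_bits)

    def read(self, a: int, b: int, c: int, d: int, e: int, f: int,
             s5: int, s6: int) -> Dict[str, int]:
        # Index formed by the five shared inputs A..E.
        low = a | (b << 1) | (c << 2) | (d << 3) | (e << 4)
        if s5 and not s6:
            # Two independent 5-input functions of A..E.
            return {"OUT0": self.bits[low], "OUT2": self.bits[32 + low]}
        if s6 and not s5:
            # One 6-input function; F selects between the two halves.
            return {"OUT1": self.bits[(f << 5) | low]}
        raise ValueError("exactly one of S5 and S6 must be enabled")


if __name__ == "__main__":
    # Configure the table as a 6-input OR: only address 0 stores a 0.
    lut = FracturableLUT6([0] + [1] * 63)
    print(lut.read(0, 0, 0, 0, 0, 0, s5=0, s6=1))
    print(lut.read(1, 1, 1, 1, 1, 1, s5=0, s6=1))
```

In this model each configuration bit stands for one complementary MTJ pair, so the 64 entries correspond to the 128 MTJs listed in Table 2.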
Herein, we use the HSPICE circuit simulator to validate the functionality of the proposed C-LUT using 45 nm CMOS technology and the STT-MRAM model developed by Kim et al. [5]. Figures 2(a) and 2(b) show the transient response of the C-LUT implementing a 6-input OR operation for ABCDEF = “000000” and ABCDEF = “111111” input signals, respectively. In order to generate the current required for a write delay of less than 2 ns, the write transistors are required to be enlarged 4-fold. As shown, the HSPICE simulations verify the correct functionality of our proposed C-LUT.

Figure 2: Transient response of C-LUT implementing 6-input OR operation for (a) ABCDEF = “000000” input signal, and (b) ABCDEF = “111111” input signal.

Table 1 lists comparison results between the SRAM-LUT and the proposed C-LUT in terms of power consumption and delay. The results show more than 80% standby power reduction at the cost of increased write power, which can be tolerated due to the infrequent occurrence of write operations in LUTs. There are three energy profiles in FPGA LUT circuits: (1) read energy consumed during normal FPGA operation, (2) standby energy for the LUTs that are not on the active datapath, which can constitute a significant portion of the FPGA fabric, and (3) write energy that is consumed during the LUTs’ configuration operation, which occurs rarely. Table 2 provides an area and energy consumption comparison between SRAM-LUT and C-LUT. As listed, the structure of a 6-input MRAM-based C-LUT requires 1,547 MOS transistors plus 128 MTJs, which can be fabricated on top of the CMOS transistors incurring low area overhead, while the conventional 6-input SRAM-LUT includes 1,029 MOS transistors. This results in an area overhead of roughly 50% for C-LUT compared to SRAM-LUT, which is primarily induced by the write circuits. Thus, innovations are sought to reduce the area and energy consumption of the MRAM cell’s write circuit to mitigate these issues.

Recently, SHE-MRAM cells have attracted considerable attention as an alternative to conventional STT-MRAMs. Herein, we have used the SHE-MRAM device model proposed by Camsari et al. [3] to realize a circuit-level simulation of our SHE-MRAM C-LUT. The results show that a TG-based write circuit with minimum-sized MOS transistors can produce the write current amplitude required for switching the SHE-MRAM’s state in less than 2 ns. Thus, Table 3 provides an iso-delay comparison between STT-MRAM and SHE-MRAM C-LUTs in terms of device count and write energy. As listed, the SHE-MRAM C-LUT can achieve more than 49% area reduction, while realizing comparable write energy consumption. Moreover, the SHE-MRAM C-LUT achieves at least 24% device count reduction compared to SRAM-LUT.

Table 3: Iso-Delay Area and Write Energy Consumption comparison between STT-MRAM and SHE-MRAM C-LUTs.
Features                             STT-MRAM C-LUT        SHE-MRAM C-LUT
Device Count      Storage Cells      128 MTJ               128 MTJ
                  Write/Control      (256 × (1)            (2)
                  Read               267 MOS               267 MOS
                  Total              1547 MOS + 128 MTJ    779 MOS + 128 MTJ
Average Write Energy per Cell        162.3 fJ              175.5 fJ
(1) Write transistors are 4× larger than minimum feature size.
(2) Write transistors with minimum feature size are used.

Furthermore, to analyze the reliability of the read and write operations of the proposed C-LUT, Monte Carlo (MC) simulation is performed to cover a wide range of PV scenarios that may occur in the fabricated device. The MC simulation is performed with 1,000 instances considering the effects of PV on the CMOS peripheral circuit and the MTJs. In particular, a variation of 10% for the MTJs’ dimensions along with 10% variation on the threshold voltage and 1% variation on transistor dimensions are assessed. Fig. 3(a) depicts the distribution of the switching times for $T_{P-AP}$ and $T_{AP-P}$, Fig. 3(b) illustrates the distribution of MTJ resistances in the $R_{AP}$ and $R_P$ states, and Fig. 3(c) shows the distribution of read, $I_{READ}$, and write, $I_{Write}$, currents for the 1,000 MC instances. According to the MC simulation results, C-LUT provides reliable write performance resulting in less than 0 . , 000 error-free MC instances. In particular, results of the MC simulation show that the switching time for P−AP is 1 . AP−P is 1 . . , 000 error-free MC simulation results. Furthermore, our proposed C-LUT does not suffer from read disturbance due to the small read current compared to the write current, as shown in Fig. 3(c). According to our MC simulation results, the read current is 38 . µA on average, which is significantly lower than the write current that is 71 . µA on average.
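To make the link between the complementary MTJ pair and the read margin concrete, the following is a deliberately simplified Monte Carlo sketch of the resistive voltage divider at node $D_i$. It is not the HSPICE flow used above. Every numerical value in it is an assumption chosen for illustration (supply voltage, nominal $R_P$ and $R_{AP}$, a Gaussian spread applied directly to resistance, an inverter threshold of VDD/2, and which MTJ sits on the VDD side), and transistor on-resistances are ignored.

```python
import random

VDD = 1.0          # supply voltage in volts (assumed, not from the paper)
R_P = 3000.0       # nominal parallel-state resistance in ohms (assumed)
R_AP = 6000.0      # nominal anti-parallel resistance in ohms (assumed)
SIGMA = 0.10 / 3   # 10% treated as a 3-sigma spread on resistance (assumed)


def divider_voltage(r_top: float, r_bottom: float, vdd: float = VDD) -> float:
    """Voltage at the midpoint of two series resistors between VDD and GND."""
    return vdd * r_bottom / (r_top + r_bottom)


def sample(nominal: float, rng: random.Random) -> float:
    """Draw one resistance value with a Gaussian relative variation."""
    return nominal * (1.0 + rng.gauss(0.0, SIGMA))


def monte_carlo(instances: int = 1000, seed: int = 0):
    """Return (read_errors, worst_margin) over the sampled instances."""
    rng = random.Random(seed)
    errors = 0
    worst_margin = float("inf")
    for _ in range(instances):
        # Assumed convention: P on the VDD side and AP on the GND side
        # pulls the node high; the opposite arrangement pulls it low.
        v_high = divider_voltage(sample(R_P, rng), sample(R_AP, rng))
        v_low = divider_voltage(sample(R_AP, rng), sample(R_P, rng))
        # An ideal inverter switching at VDD/2 is assumed as the detector.
        if v_high <= VDD / 2 or v_low >= VDD / 2:
            errors += 1
        worst_margin = min(worst_margin, v_high - v_low)
    return errors, worst_margin


if __name__ == "__main__":
    errs, margin = monte_carlo()
    print(f"read errors: {errs}, worst-case margin: {margin:.3f} V")
```

Because the two devices always sit on opposite sides of the inverter threshold, the node voltage moves between roughly VDD/3 and 2·VDD/3 under these assumed values, which is the qualitative reason a differential pair tolerates variation better than a comparison against a fixed reference.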
Figure 3: Simulation results of 1,000 MC instances for (a) $T_{P-AP}$ and $T_{AP-P}$ switching times, (b) $R_{AP}$ and $R_P$ resistance states, and (c) read, $I_{READ}$, and write, $I_{Write}$, currents.
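The percentage figures quoted in this section can be cross-checked against the values listed in Tables 1 to 3. The short script below is a sketch of that arithmetic; the helper name `reduction_pct` is introduced here for illustration, and the comparison uses the MOS totals only, on the assumption that the MTJs stacked above the CMOS layer are not counted toward area.

```python
def reduction_pct(baseline: float, new: float) -> float:
    """Percentage by which `new` is smaller than `baseline`."""
    return 100.0 * (baseline - new) / baseline


# Average standby power from Table 1, in microwatts.
standby_reduction = reduction_pct(1.67, 0.31)

# Total MOS counts from Tables 2 and 3.
SRAM_MOS, STT_MOS, SHE_MOS = 1029, 1547, 779

# Overhead of the STT-MRAM C-LUT relative to the SRAM-LUT.
stt_overhead = 100.0 * (STT_MOS - SRAM_MOS) / SRAM_MOS
# Savings of the SHE-MRAM C-LUT relative to STT-MRAM and to SRAM.
she_vs_stt = reduction_pct(STT_MOS, SHE_MOS)
she_vs_sram = reduction_pct(SRAM_MOS, SHE_MOS)

if __name__ == "__main__":
    print(f"standby power reduction vs SRAM: {standby_reduction:.1f}%")
    print(f"STT C-LUT area overhead vs SRAM: {stt_overhead:.1f}%")
    print(f"SHE C-LUT reduction vs STT:      {she_vs_stt:.1f}%")
    print(f"SHE C-LUT reduction vs SRAM:     {she_vs_sram:.1f}%")
```

Under these inputs the four results come out at about 81.4%, 50.3%, 49.6%, and 24.3%, consistent with the "more than 80%", "roughly 50%", "more than 49%", and "at least 24%" statements in the text.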
To overcome conventional SRAM-LUT limitations such as high static power, volatility, and low logic density, we have proposed a novel LUT design using spin-based devices. The proposed C-LUT is a clockless design and a suitable candidate for combinational logic, which can also be combined with a flip-flop circuit to implement sequential logic. According to our simulation results, the standby power dissipation of the proposed C-LUT is 0 . µW, which is reduced by 5 . < .
ACKNOWLEDGEMENT
This work was supported in part by the National Science Foundation (NSF) through ECCS-1810256.
REFERENCES
[1] Rawad Al-Haddad, Rashad S. Oreifej, Ramtin Zand, Abdel Ejnioui, and Ronald F. DeMara. 2015. Adaptive Mitigation of Radiation-Induced Errors and TDDB in Reconfigurable Logic Fabrics. In . IEEE, 23–32. https://doi.org/10.1109/NATW.2015.14
[2] Aliyar Attaran, Tyler David Sheaves, Praveen Kumar Mugula, and Hamid Mahmoodi. 2018. Static Design of Spin Transfer Torques Magnetic Look Up Tables for ASIC Designs. In Proceedings of the 2018 on Great Lakes Symposium on VLSI - GLSVLSI ’18. ACM Press, New York, New York, USA, 507–510. https://doi.org/10.1145/3194554.3194651
[3] Kerem Yunus Camsari, Samiran Ganguly, and Supriyo Datta. 2015. Modular approach to spintronics. Scientific Reports 5, 1 (9 2015), 10571. https://doi.org/10.1038/srep10571
[4] Kejie Huang, Yajun Ha, Rong Zhao, Akash Kumar, and Yong Lian. 2014. A Low Active Leakage and High Reliability Phase Change Memory (PCM) Based Non-Volatile FPGA Storage Element. IEEE Transactions on Circuits and Systems I: Regular Papers 61, 9 (9 2014), 2605–2613. https://doi.org/10.1109/TCSI.2014.2312499
[5] Jongyeon Kim, An Chen, Behtash Behin-Aein, Saurabh Kumar, Jian-Ping Wang, and Chris H. Kim. 2015. A technology-agnostic MTJ SPICE model with user-defined dimensions for STT-MRAM scalability studies. In . IEEE, 1–4. https://doi.org/10.1109/CICC.2015.7338407
[6] Ian Kuon, Russell Tessier, and Jonathan Rose. 2008. FPGA architecture: Survey and challenges. Foundations and Trends in Electronic Design Automation
. IEEE, 342–345. https://doi.org/10.1109/ICCD.2018.00058
[9] Soheil Salehi, Deliang Fan, and Ronald F. DeMara. 2017. Survey of STT-MRAM Cell Design Strategies: Taxonomy and Sense Amplifier Tradeoffs for Resiliency. ACM Journal on Emerging Technologies in Computing Systems 13, 3 (2017), 1–16. https://doi.org/10.1145/2997650
[10] Daisuke Suzuki and Takahiro Hanyu. 2019. Design of a highly reliable, high-speed MTJ-based lookup table circuit using fractured logic-in-memory structure. Japanese Journal of Applied Physics 58, SB (2 2019), SBBB10. https://doi.org/10.7567/1347-4065/aafd98
[11] Daisuke Suzuki, Yuhui Lin, Masanori Natsui, and Takahiro Hanyu. 2013. A 71%-Area-Reduced Six-Input Nonvolatile Lookup-Table Circuit Using a Three-Terminal Magnetic-Tunnel-Junction-Based Single-Ended Structure. Japanese Journal of Applied Physics 52, 4S (4 2013), 04CM04. https://doi.org/10.7567/JJAP.52.04CM04
[12] Xifan Tang, Gain Kim, Pierre-Emmanuel Gaillardon, and Giovanni De Micheli. 2016. A Study on the Programming Structures for RRAM-Based FPGA Architectures. IEEE Transactions on Circuits and Systems I: Regular Papers 63, 4 (4 2016), 503–516. https://doi.org/10.1109/TCSI.2016.2528079
[13] Hiroaki Yoda, Hideyuki Sugiyama, Tomoaki Inokuchi, Yuushi Kato, Yuichi Ohsawa, Keiko Abe, Naoharu Shimomura, Yoshiaki Saito, Satoshi Shirotori, Katsuhiko Koui, Buyandalai Altansargai, Souichi Oikawa, Mariko Shimizu, Mizue Ishikawa, Kazutaka Ikegami, Yuuzo Kamiguchi, Shinobu Fujita, and Atsushi Kurobe. 2017. High-Speed Voltage-Control Spintronics Memory (High-Speed VoCSM). In . IEEE, 1–4. https://doi.org/10.1109/IMW.2017.7939085
[14] Ramtin Zand and Ronald F. DeMara. 2017. Radiation-hardened MRAM-based LUT for non-volatile FPGA soft error mitigation with multi-node upset tolerance. Journal of Physics D: Applied Physics 50, 50 (12 2017), 505002. https://doi.org/10.1088/1361-6463/aa9781
[15] Ramtin Zand, Arman Roohi, Soheil Salehi, and Ronald F. DeMara. 2016. Scalable Adaptive Spintronic Reconfigurable Logic Using Area-Matched MTJ Design.