DMR-based Technique for Fault Tolerant AES S-box Architecture



Mahdi Taheri, Saeideh Sheikhpour, Mohammad Saeed Ansari and Ali Mahani
Department of Electrical Engineering, Shahid Bahonar University, Kerman, Iran
Eideticom Computational Storage, Calgary, AB, Canada

Abstract — This paper presents a high-throughput fault-resilient hardware implementation of the AES S-box, called HFS-box. We propose deeply pipelining the S-box at the gate level and using a novel DMR technique for fault correction. The proposed fault-resilient technique is based on fault correction in a DMR implementation (FC-DMR) of each S-box, combined with a temporal redundancy technique. If a transient natural or even malicious fault is detected in any pipeline stage, the corresponding error signal becomes high and, as a result, the control unit holds the output of our proposed DMR voter until the fault effect disappears. The proposed low-cost HFS-box tolerates transient faults of any duration while imposing a low area overhead (137%) and a low throughput degradation (11.3%) on the original implementation.

Keywords — Fault-Resilient, AES, S-box, High-Throughput.

I. INTRODUCTION

Dependable applications, such as secure information systems, remote security services and online banking, play an important role in our daily lives. Secure storage and communication are critical requirements of these applications. Nowadays, cryptography is extensively used in dependable applications to meet these requirements and, in consequence, to prevent unauthorized access to secure information. Another important requirement of dependable applications is reliability. Therefore, in many cases, a fault-resilient approach is incorporated into the original hardware implementation [1]. The Advanced Encryption Standard (AES) [2] was standardized by the National Institute of Standards and Technology (NIST) in 2001, following a selection process that NIST started in 1997. Since then, AES has been one of the most widely used symmetric cryptographic algorithms. Many hardware implementations of AES with different characteristics have been proposed [3-6], each suited to applications with different constraints. Recently, many fault injection attacks on AES have been proposed [7-9]. In a fault attack, the attacker injects malicious faults into the VLSI design of a cryptographic primitive to extract secret information (e.g., the cryptographic key). On the other hand, with transistor downscaling, lower supply voltages, higher operating frequencies and the resulting reduction of noise margins, VLSI designs are becoming more and more sensitive to random faults [10]. Random faults in VLSI designs can be grouped into transient and permanent faults. To thwart the effect of random and/or malicious faults, various fault-resilient hardware implementations of AES have been proposed [11-14]. AES includes four basic operations, named SubByte, ShiftRows, MixColumns and AddRoundKey. The SubByte operation is implemented in hardware with 16 S-boxes, each of which is a nonlinear mapping that replaces one byte of the state array with another byte.

The S-boxes also occupy much of the total AES hardware area [15]. Integrating their hardware implementation with an efficient fault-resilient scheme is therefore crucial for making AES robust to random and/or malicious faults. There are many online error detection schemes for the SubByte implementation of AES, see for example [16-17]. Only a few previous studies have addressed fault correction: most consider only the detection task, so their solutions require extra corrective operations. In [18], a hybrid redundancy that combines hardware redundancy and time redundancy is proposed for fault correction in the S-box; that architecture can tolerate single faults. It is worth noting that the fault-tolerant S-box in [18] provides a high level of reliability against natural faults arising from the nature of electronic devices, not against the malicious faults of fault attacks. The main aim of the present paper is to propose a high-throughput fault-resilient hardware implementation of the AES S-box. We propose a correction scheme at the hardware level so that the circuit frequency is not significantly affected. A high-speed design is considered: we exploit the gate-level implementation of the S-box, which allows pipelining to speed up the hardware implementation of the SubByte operation of AES. The proposed technique is also applicable to any generic block cipher. The main contributions of this paper are as follows:

- We present a high-throughput and lightweight gate-level S-box implementation for high-speed AES encryption.

- We propose a fault-resilient technique, FC-DMR, for real-time applications that cannot tolerate long running times and require high-speed processing. The proposed technique can generally be used in all digital functional units.

- We design a new DMR voter that is composed of standard library components and can be implemented on any digital platform, such as FPGA and ASIC.

Fig. 1. Composite field based S-box architecture.
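To make equations (1)-(3) of Section II concrete, the following Python sketch computes the S-box directly as an inversion in GF(2^8) followed by the FIPS 197 affine map, and separately checks the composite-field inversion formulas of equations (2)-(3). It is a minimal reference model under stated assumptions, not the gate-level design of Fig. 1: the 4-bit subfield is taken as GF(2^4) in polynomial basis modulo x^4 + x + 1 (isomorphic to, but not the same representation as, the paper's GF((2^2)^2)), λ is simply the first value that makes y^2 + y + λ irreducible, and the transformation matrices δ and δ^-1 are not modelled. All function names are hypothetical.

```python
# Reference model for equations (1)-(3); a sketch, not the paper's circuit.
# Assumptions: subfield GF(2^4) modulo x^4 + x + 1 (instead of the paper's
# GF(((2^2)^2)^2) tower), lambda chosen by search, delta matrices not modelled.

AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, the AES field polynomial


def gf256_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return r


def gf256_inv(x: int) -> int:
    """x^-1 = x^254 in GF(2^8); 0 is mapped to 0 as in AES."""
    r = 1
    for _ in range(254):
        r = gf256_mul(r, x)
    return r


def affine(v: int) -> int:
    """Affine map of equation (1): y_i = v_i ^ v_(i+4) ^ v_(i+5) ^ v_(i+6) ^ v_(i+7) ^ b_i."""
    out = 0
    for i in range(8):
        bit = 0
        for k in (0, 4, 5, 6, 7):
            bit ^= (v >> ((i + k) % 8)) & 1
        out |= bit << i
    return out ^ 0x63  # b = 0x63


def sbox(x: int) -> int:
    """y = A x^-1 + b, equation (1)."""
    return affine(gf256_inv(x))


def gf16_mul(a: int, b: int) -> int:
    """Multiply two nibbles in GF(2^4) modulo x^4 + x + 1 (assumed subfield)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x10:
            a ^= 0x13
        b >>= 1
    return r


def gf16_inv(a: int) -> int:
    """Inverse in GF(2^4) by exhaustive search; 0 is mapped to 0."""
    return next((c for c in range(1, 16) if gf16_mul(a, c) == 1), 0)


# Smallest lambda for which y^2 + y + lambda has no root in GF(2^4), i.e. is irreducible.
LAM = next(lam for lam in range(1, 16) if all(gf16_mul(t, t) ^ t ^ lam for t in range(16)))


def comp_mul(a, b):
    """Multiply (a_h*y + a_l) by (b_h*y + b_l) using y^2 = y + lambda."""
    ah, al = a
    bh, bl = b
    hh = gf16_mul(ah, bh)
    return (hh ^ gf16_mul(ah, bl) ^ gf16_mul(al, bh), gf16_mul(hh, LAM) ^ gf16_mul(al, bl))


def comp_inv(xi):
    """Equations (2)-(3): sigma_h = D^-1 xi_h, sigma_l = D^-1 (xi_h + xi_l)."""
    xh, xl = xi
    # D = (xi_h + xi_l) xi_l + xi_h^2 lambda
    d = gf16_mul(xh ^ xl, xl) ^ gf16_mul(gf16_mul(xh, xh), LAM)
    d_inv = gf16_inv(d)
    return (gf16_mul(d_inv, xh), gf16_mul(d_inv, xh ^ xl))


if __name__ == "__main__":
    print(hex(sbox(0x00)), hex(sbox(0x53)))  # 0x63 0xed
    nonzero = [(h, l) for h in range(16) for l in range(16) if (h, l) != (0, 0)]
    print(all(comp_mul(x, comp_inv(x)) == (0, 1) for x in nonzero))  # True
```

Because the validity of equations (2)-(3) depends only on the top-level polynomial y^2 + y + λ, the same check holds for any representation of the 4-bit subfield; linking the two halves of the sketch into a composite-field S-box would additionally require the paper's δ and δ^-1 matrices.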

The rest of this paper is organized as follows. Section II presents a brief background on the S-box of the AES algorithm and its implementation. Section III presents the proposed fault-resilient technique (FC-DMR) together with our DMR voter model, and describes the HFS-box architecture. Section IV evaluates the area, frequency and throughput of the proposed architecture. Finally, Section V concludes the paper.

II. S-BOX IMPLEMENTATION

In this section, we describe the S-box operation and the architecture used to implement it. We employ the composite-field S-box architecture proposed in [19]. The S-box operation, which is generally considered the most resource-consuming AES operation, is a nonlinear mapping applied to each byte of the state array. This mapping consists of finding the multiplicative inverse in $GF(2^8)$, i.e., $x^{-1} \in GF(2^8)$, followed by an affine transformation. In other words, if $y = SB(x)$ with $x, y \in GF(2^8)$, then

$$
y = A x^{-1} + b =
\begin{bmatrix}
1&0&0&0&1&1&1&1\\
1&1&0&0&0&1&1&1\\
1&1&1&0&0&0&1&1\\
1&1&1&1&0&0&0&1\\
1&1&1&1&1&0&0&0\\
0&1&1&1&1&1&0&0\\
0&0&1&1&1&1&1&0\\
0&0&0&1&1&1&1&1
\end{bmatrix} x^{-1} +
\begin{bmatrix}1\\1\\0\\0\\0\\1\\1\\0\end{bmatrix}
\qquad (1)
$$

where the bits of $x^{-1}$ and $y$ are ordered from the least significant bit (top row) to the most significant bit (bottom row), so that $b$ corresponds to the byte 0x63 [2].

Since computing the multiplicative inverse directly in $GF(2^8)$ is costly, inversion in composite fields is preferred [20], which leads to lower complexity and a smaller implementation area. The S-box implementation using composite fields and a polynomial basis is illustrated in Fig. 1. As shown in this figure, the 8-bit input of the multiplicative inversion, i.e., $X = \sum_{i=0}^{7} \alpha_i x^i$ in the binary field $GF(2^8)$ with bits $\alpha_i \in GF(2)$, is mapped by the transformation matrix $\delta$ to the composite field $GF(((2^2)^2)^2)$. In turn, the output of the multiplicative inversion is mapped from the composite field back to the binary field $GF(2^8)$ by the inverse transformation matrix $\delta^{-1}$ to obtain $X^{-1}$. The hierarchical composite-field decompositions $GF(((2^2)^2)^2) \rightarrow GF((2^2)^2)$, $GF((2^2)^2) \rightarrow GF(2^2)$ and $GF(2^2) \rightarrow GF(2)$ are built with the irreducible polynomials $x^2 + x + \lambda$, $x^2 + x + \varphi$ and $x^2 + x + 1$, respectively. As shown in Fig. 1, the S-box output $Y$ is obtained by applying the affine transformation after the inverse transformation $\delta^{-1}$ [19].

The S-box is composed of multiplications, a squaring and an inversion, all over $GF((2^2)^2)$. Besides these arithmetic blocks, the S-box includes modulo-2 additions realized by XOR gates (see Fig. 1). Accordingly, the output of the multiplicative inversion can be formulated as follows:

$$ \sigma_h = \left( (\xi_h + \xi_l)\,\xi_l + \xi_h^2 \lambda \right)^{-1} \xi_h \qquad (2) $$

$$ \sigma_l = \left( (\xi_h + \xi_l)\,\xi_l + \xi_h^2 \lambda \right)^{-1} (\xi_h + \xi_l) \qquad (3) $$

where $\xi$ and $\sigma$ are the input and output of the multiplicative inversion, respectively, and the subscripts $h$ and $l$ denote their high and low halves in $GF((2^2)^2)$.

III. PROPOSED FAULT CORRECTION STRUCTURE (FC-DMR)

A. FC-DMR

We propose a correction technique for a DMR implementation of a digital circuit (FC-DMR), depicted in Fig. 2. The proposed FC-DMR protects both the combinational and the sequential parts of a digital circuit in each pipeline stage. Fig. 2 depicts an example pipeline stage i of the target circuit. As depicted in this figure, FC-DMR consists of the following elements:

- Pipeline Logic i (original): the part of the system's combinational logic that processes data in the original mode in the i-th pipeline stage.
- Pipeline Logic i (redundant): a redundant copy of the original i-th pipeline logic that processes data in the redundant mode in the i-th pipeline stage.
- Register stage i: the sequential part of the i-th pipeline stage, which includes the DMR registers and two DMR voters that preserve the correct state in the presence of a fault.
- DU: the fault detection unit, implemented as a comparator, which provides the error signal err_i indicating any difference between the DMR registers of the i-th pipeline stage.
- CU: the control unit, which produces Err, a general error signal indicating that a fault has been detected in the system (in any pipeline stage).

Fig. 2. Proposed fault correction technique in DMR implementation (FC-DMR).
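The detection path formed by the DU and CU elements listed above can be sketched in a few lines of Python. This is a behavioural sketch under assumptions: 8-bit pipeline registers, a comparator modelled as an OR-reduction of the bitwise XOR of the two DMR registers, and err_i and Err modelled as active-high booleans; the function names are hypothetical.

```python
# Behavioural sketch of the FC-DMR detection path (DU comparators and CU).
from typing import List, Tuple


def du_err(reg_original: int, reg_redundant: int) -> bool:
    """DU_i / comparator CMP_i: True if any bit of the two DMR registers differs."""
    return (reg_original ^ reg_redundant) != 0


def cu_err(stage_registers: List[Tuple[int, int]]) -> bool:
    """CU: general error signal Err, raised when any stage reports err_i."""
    return any(du_err(orig, red) for orig, red in stage_registers)


if __name__ == "__main__":
    # Five stages; a transient upset flips bit 1 of the redundant register of stage 2.
    regs = [(0x3A, 0x3A), (0x91, 0x91), (0x4C, 0x4E), (0x07, 0x07), (0xED, 0xED)]
    print([du_err(a, b) for a, b in regs])  # [False, False, True, False, False]
    print(cu_err(regs))  # True
```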

The input of each pipeline stage is processed by the pipeline logic and by its redundant copy. The corresponding outputs of the original and redundant pipeline logic are stored in the register stage, i.e., in the original and redundant pipeline registers, respectively. If the two registers hold identical contents, no fault is detected; otherwise, the output of comparator CMP_i in DU_i, i.e., err_i, is activated. Two DMR voters are employed to protect both the combinational and the sequential parts of the system. The proposed technique can correct any transient fault that occurs in a single S-box. When a fault is detected in any component of a pipeline stage, whether in the logic, in the pipeline registers or in the DU, the CU resets its output Err and thereby prevents the incorrect state from being loaded onto the outputs of the DMR voters. Hence, the pipeline logic keeps processing the previous correct state until the fault effect disappears, after which the next correct state is processed normally. This solution may add a small delay to the critical path because of the comparison and voting.

B. Proposed voter

The employed voter performs the two tasks of a voter in a DMR scheme: it holds the previous state when it faces a mismatch, and it updates its output when both modules produce the same value.

When the outputs of the two replicas differ, which means that an error has occurred, the voter holds its previous value until the outputs of the two replicas agree again. In addition, our design includes a delay module that is used when the comparator detects a mismatch: the delay gives the mismatch time to act on the enable signals. These enables keep the internal wires from forwarding faulty signals to the voter's output, so the pipeline stage remains unchanged until the correct value is obtained and the sequence of our pipeline design is unaffected.
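The voter behaviour described above can be summarised by a small cycle-level Python model. It is a behavioural sketch only, assuming one vote per clock cycle and byte-wide values; it does not reproduce the gate-level circuit, the delay module or the enable signals of Fig. 3, and the class and function names are hypothetical.

```python
# Behavioural sketch of the DMR voter: pass on agreement, hold on mismatch.
from typing import Iterable, List, Tuple


class DMRVoter:
    """Hypothetical model; one vote per clock cycle, byte-wide values."""

    def __init__(self, reset_value: int = 0) -> None:
        self.out = reset_value  # last value on which both replicas agreed

    def clock(self, a: int, b: int) -> int:
        """Update the output only when the two replica outputs are equal."""
        if a == b:
            self.out = a
        return self.out


def run(pairs: Iterable[Tuple[int, int]]) -> List[int]:
    """Feed (replica A, replica B) pairs cycle by cycle; return the voter outputs."""
    voter = DMRVoter()
    return [voter.clock(a, b) for a, b in pairs]


if __name__ == "__main__":
    # Replica B is upset for two cycles while the held input is recomputed.
    pairs = [(0x63, 0x63), (0x7C, 0xFC), (0x7C, 0xFC), (0x7C, 0x7C), (0x77, 0x77)]
    print([hex(v) for v in run(pairs)])  # ['0x63', '0x63', '0x63', '0x7c', '0x77']
```

Unlike a TMR majority voter, this voter never chooses between disagreeing replicas; it relies on the fault being transient, so that recomputing the held state eventually yields two matching outputs.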

Fig. 3. Proposed voter at the gate level.

C. HFS-box

The main contribution of this paper is a high-throughput fault-resilient hardware implementation of the S-box. We propose a fully pipelined implementation of the composite-field S-box, which shortens the critical path of the circuit. This allows us to increase the clock frequency of the proposed design and makes it suitable for high-speed applications. The proposed pipelined S-box is depicted in Fig. 4, where the inserted pipeline registers are shown as dotted lines. As depicted in this figure, the S-box architecture of Fig. 1 is divided into 5 stages, and the pipeline registers are placed so that the critical path is optimally partitioned. This architecture is integrated with the proposed FC-DMR to tolerate any transient fault in both the combinational and the sequential parts of any pipeline stage of a single S-box; the result is named HFS-box. In HFS-box, each DMR implementation of the pipeline logic lies between two register stages that check for fault occurrence, as depicted in Fig. 2.

Fig. 4. The architecture of the S-box with 5-stage pipeline.
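The following Python sketch models the timing of a 5-stage DMR-protected pipeline in which the CU holds every voter while any stage reports a mismatch. It is a cycle-level sketch under assumptions: the five stage functions are hypothetical placeholders (the actual partition of Fig. 4 is not reproduced), registers and voters are merged into a single update per cycle, a transient fault is modelled as a bit flip in the redundant copy of one stage, and faults on stages holding no data are ignored. Function names are hypothetical.

```python
# Cycle-level sketch of a 5-stage FC-DMR pipeline with a global hold.
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical placeholders for the five pipeline partitions of Fig. 4.
STAGES: List[Callable[[int], int]] = [
    lambda x: (x * 3) & 0xFF,
    lambda x: x ^ 0x5A,
    lambda x: ((x << 1) | (x >> 7)) & 0xFF,
    lambda x: (x + 17) & 0xFF,
    lambda x: x ^ (x >> 4),
]


def reference(x: int) -> int:
    """Fault-free result of passing x through all stages."""
    for stage in STAGES:
        x = stage(x)
    return x


def simulate(inputs: List[int], faults: Set[Tuple[int, int]]) -> Tuple[List[int], int]:
    """Return (outputs, cycles); faults holds (cycle, stage) pairs in which
    the redundant copy of that stage produces a wrong value."""
    voted: List[Optional[int]] = [None] * len(STAGES)  # voter outputs, None = bubble
    pending = list(inputs)
    outputs: List[int] = []
    cycle = 0
    while len(outputs) < len(inputs):
        stage_in = [pending[0] if pending else None] + voted[:-1]
        reg_o: List[Optional[int]] = []
        reg_r: List[Optional[int]] = []
        for i, v in enumerate(stage_in):
            if v is None:
                reg_o.append(None)
                reg_r.append(None)
            else:
                reg_o.append(STAGES[i](v))
                reg_r.append(STAGES[i](v) ^ (0x01 if (cycle, i) in faults else 0))
        err = any(a != b for a, b in zip(reg_o, reg_r))  # DU comparators ORed by the CU
        if not err:  # voters load the agreed state and the pipeline advances
            voted = reg_o
            if pending:
                pending.pop(0)
            last = voted[-1]
            if last is not None:
                outputs.append(last)
        # on err, every voter holds its output and no new input is accepted
        cycle += 1
    return outputs, cycle


if __name__ == "__main__":
    data = list(range(0, 40, 3))
    clean, t_clean = simulate(data, set())
    faulty, t_faulty = simulate(data, {(3, 2), (4, 2)})  # 2-cycle fault in stage 2
    print(clean == faulty == [reference(x) for x in data], t_clean, t_faulty)  # True 18 20
```

Under this model a transient fault lasting d cycles delays the output stream by exactly d cycles without corrupting it, and once the pipeline is full one byte leaves per cycle, so throughput is roughly 8 bits times f_max (compare the 492 MHz and 3936 Mbps reported for HFS-box in Table 1).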

IV. IMPLEMENTATION RESULTS

To evaluate the proposed HFS-box, we compare it with TMR and TTR implementations of the S-box, which are traditional fault-tolerant structures with high fault correction capability. We report synthesis results for TSMC 180 nm CMOS technology. Verilog is used as the design-entry language and Synopsys Design Compiler (DC) as the synthesis tool. Note that an 8-bit SubByte operation is considered, so each structure requires a single S-box.
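The derived entries of Table 1 can be reproduced from its raw area and frequency values with equation (4). The Python sketch below assumes, because every throughput entry in Table 1 is consistent with it, that throughput equals 8 bits times f_max divided by the number of sequential computations per byte (3 for TTR, 1 otherwise); this relation is inferred from the table rather than stated in the paper, and the names are hypothetical.

```python
# Recomputes the derived columns of Table 1 from its raw values.
# Assumption inferred from Table 1: throughput = 8 bits * f_max / passes,
# where passes = sequential computations per byte (3 for TTR, otherwise 1).
RESULTS = {
    # design: (area in GE, f_max in MHz, passes)
    "Original": (212.42, 555, 1),
    "TMR": (673.31, 525, 1),
    "TTR": (279.02, 519, 3),
    "HFS-box": (503.46, 492, 1),
}


def throughput_mbps(f_mhz: float, passes: int, width_bits: int = 8) -> float:
    """Mbit/s when one width_bits-wide result is produced every `passes` cycles."""
    return width_bits * f_mhz / passes


def overhead(c_ft: float, c_o: float) -> float:
    """Equation (4), in percent: (C_FT - C_O) / C_O."""
    return 100.0 * (c_ft - c_o) / c_o


area_o, f_o, passes_o = RESULTS["Original"]
tp_o = throughput_mbps(f_o, passes_o)
for name, (area, f, passes) in RESULTS.items():
    tp = throughput_mbps(f, passes)
    print(f"{name:8s} area {overhead(area, area_o):7.2f}%  "
          f"freq {overhead(f, f_o):6.2f}%  "
          f"throughput {tp:5.0f} Mbps ({overhead(tp, tp_o):6.2f}%)")
```

The printed percentages match Table 1 once truncated to the precision shown there; for example, the TMR area overhead computes to about 216.97% and is listed as 216.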

Table 1. Area, maximum frequency and throughput results.

| Design metric | Original | TMR | TTR | HFS-box |
|---|---|---|---|---|
| Area (GE) | 212.42 | 673.31 | 279.02 | 503.46 |
| Area overhead (%) | - | 216 | 31.35 | 137 |
| Frequency (MHz) | 555 | 525 | 519 | 492 |
| Frequency overhead (%) | - | -5.4 | -6.4 | -11.3 |
| Throughput (Mbps) | 4440 | 4200 | 1384 | 3936 |
| Throughput overhead (%) | - | -5.4 | -68.8 | -11.3 |
| Fault tolerance: transient | | | | |
| Fault tolerance: permanent | | | | |
| Security against fault attack | | | | |

In this section, the ASIC implementation results of all fault-tolerant S-box implementations are reported and compared. The design metrics considered are area, frequency and throughput, together with their overheads. Table 1 presents the implementation results of all fault-resilient designs; the overheads in this table are calculated with equation (4).

$$ \mathrm{Overhead} = \frac{C_{FT} - C_{O}}{C_{O}} \qquad (4) $$

where $C_{O}$ is the cost of the original implementation (area, frequency, throughput, etc.) and $C_{FT}$ is the cost of the fault-tolerant implementation. TTR has the lowest area overhead (its area is 44.5% and 58.54% smaller than that of HFS-box and TMR, respectively), but also the lowest throughput (64.83% and 67.04% lower than that of HFS-box and TMR, respectively). HFS-box requires about 503 NAND gate equivalents (GE). Its area overhead is higher than that of TTR, but it is still much smaller than TMR (25.22% less area). Although TMR achieves the highest throughput among the fault-resilient architectures, its security and reliability against fault attacks are lower than those of HFS-box, and it imposes a much larger area overhead on the original S-box. The proposed low-cost HFS-box can thus continue its proper operation without a considerable negative impact on system speed and without any traditional recovery scheme. It is a suitable fault-tolerant technique for resource-constrained applications that require a high level of security.

V. CONCLUSION

In this paper, we proposed HFS-box, a lightweight, high-throughput and fault-resilient architecture for the composite-field S-box of AES, which occupies the largest area in AES. The proposed fault-resilient technique is based on fault correction in a DMR implementation (FC-DMR) combined with a temporal redundancy technique. It is able to correct transient faults that may occur in the S-box naturally or maliciously. Our solution is valid for any digital circuit implementation (especially block cipher hardware implementations) with different levels of pipelining. HFS-box uses 5 pipeline stages to meet the speed and throughput requirements of real-time applications; the pipeline registers are inserted at optimal positions in the S-box architecture. Furthermore, we introduced a DMR voter compatible with FC-DMR. The proposed HFS-box and two well-known methods with high fault-tolerance capability, i.e., TMR and TTR, have been implemented on ASIC using TSMC 180 nm CMOS technology, and their area, frequency and throughput have been derived and reported. The synthesis results show that HFS-box incurs a low area overhead (137%) and a low throughput degradation (11.3%) relative to the original implementation, balancing the high area cost of TMR against the low throughput of TTR.

REFERENCES

[1] S. Patranabis and D. Mukhopadhyay, Fault Tolerant Architectures for Cryptography and Hardware Security. Berlin: Springer, 2018.

[2] National Institute of Standards and Technology, "Announcing the Advanced Encryption Standard (AES)," FIPS 197, Nov. 2001.

[3] D. Bui, D. Puschini, S. Bacles-Min, E. Beigné and X. Tran, "AES Datapath Optimization Strategies for Low-Power Low-Energy Multisecurity-Level Internet-of-Things Applications," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 12, pp. 3281-3290, Dec. 2017.

[4] D.-S. Kundi, A. Aziz and N. Ikram, "A high performance ST-Box based unified AES encryption/decryption architecture on FPGA," Microprocessors and Microsystems, vol. 41, no. 1, pp. 37-46, 2015.

[5] S. S. Priya, P. Karthigaikumar, N. M. Siva Mangai and P. K. Gaurav Das, "An Efficient Hardware Architecture for High Throughput AES Encryptor Using MUX Based Sub Pipelined S-Box," Wireless Personal Communications, vol. 94, no. 4, pp. 2259-2273, 2017.

[6] S. Shanthi Rekha and P. Saravanan, "Low-Cost AES-128 Implementation for Edge Devices in IoT Applications," Journal of Circuits, Systems and Computers, vol. 28, no. 4, p. 1950062, 2019.

[7] E. Biham and A. Shamir, "Differential Fault Analysis of Secret Key Cryptosystems," in Proc. Advances in Cryptology (CRYPTO '97), pp. 513-525, 1997.

[8] T. Fuhr, E. Jaulmes, V. Lomné and A. Thillard, "Fault Attacks on AES with Faulty Ciphertexts Only," in 2013 Workshop on Fault Diagnosis and Tolerance in Cryptography, Santa Barbara, CA, 2013, pp. 108-118.

[9] S. S. Mukherjee, J. Emer and S. K. Reinhardt, "The soft error problem: an architectural perspective," in 11th International Symposium on High-Performance Computer Architecture, San Francisco, CA, USA, 2005, pp. 243-247.

[11] X. Guo and R. Karri, "Recomputing with Permuted Operands: A Concurrent Error Detection Approach," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 32, no. 10, pp. 1595-1608, Oct. 2013.

[12] M. Mozaffari-Kermani and A. Reyhani-Masoleh, "Concurrent structure-independent fault detection schemes for the advanced encryption standard," IEEE Transactions on Computers, vol. 59, pp. 608-622, 2010.

[13] H. Mestiri, F. Kahri, B. Bouallegue and M. Machhout, "A high-speed AES design resistant to fault injection attacks," Microprocessors and Microsystems (Elsevier), vol. 41, pp. 47-55, 2016.

[14] M. Bedoui, H. Mestiri, B. Bouallegue, M. Marzougui, M. Qayyum and M. Machhout, "An improved and efficient countermeasure against fault attacks for AES," in 2017 2nd International Conference on Anti-Cyber Crimes (ICACC), Abha, 2017, pp. 209-212.

[15] S. Morioka and A. Satoh, "An optimized S-box circuit architecture for low power AES design," in Cryptographic Hardware and Embedded Systems (CHES 2002), pp. 271-295, 2003.

[16] M. Mozaffari-Kermani and A. Reyhani-Masoleh, "A Lightweight High-Performance Fault Detection Scheme for the Advanced Encryption Standard Using Composite Fields," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 1, pp. 85-91, Jan. 2011.

[17] M. Mozaffari-Kermani and A. Reyhani-Masoleh, "Fault Detection Structures of the S-boxes and the Inverse S-boxes for the Advanced Encryption Standard," Journal of Electronic Testing, no. 4, pp. 225-245, Aug. 2009.

[18] T. An, L. A. B. Naviner and M. Philippe, "A low cost reliable architecture for S-Boxes in AES processors," in 2013 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS), pp. 155-160, 2013.

[19]