Towards Lightweight Error Detection Schemes for Implementations of MixColumns in Lightweight Cryptography
arXiv [cs.CR], April
Anita Aghaie, Mehran Mozaffari Kermani, Reza Azarderakhsh
Abstract: In this paper, considering lightweight cryptography, we present a comparative realization of MDS matrices used in the VLSI implementations of lightweight cryptography. We verify the MixColumn/MixNibble transformation using MDS matrices and propose reliability approaches for thwarting natural and malicious faults. A further contribution of this work is to consider not only linear error detecting codes but also recomputation mechanisms as well as the adoption of fault space transformation (FST) for lightweight cryptographic algorithms. Our intention in this paper is to propose reliability and error detection mechanisms (through linear codes, recomputations, and FST adopted for lightweight cryptography) so that error detection schemes are considered beforehand, at design time, taking such algorithmic security into account. We also posit that the MDS matrices applied in the MixColumn (or MixNibble) transformation to protect ciphers against linear and differential attacks should be incorporated in the cipher design in order to reduce the overhead of the applied error detection schemes. Finally, we present a comparative implementation framework on ASIC to benchmark the VLSI hardware implementations presented in this paper.
I. INTRODUCTION
Research on error detection for primitives in the hardware VLSI structures of cryptographic algorithms has been the center of attention in prior work [1]-[11]. In addition, cipher designers construct the MixColumn transformation as a linear diffusion layer with maximum branch number, realized through maximum distance separable (MDS) matrices. We also note that the MDS matrices applied in the MixColumn (or MixNibble) transformation to protect ciphers against linear and differential attacks should be incorporated in the cipher design in order to reduce the overhead of the applied error detection schemes.

To motivate the urgency of embedding error detection as part of the design cycle, we briefly go over the complications of adopting the fault space transformation (FST) method for lightweight ciphers (this motivates the criteria proposed in the following sections, which utilize MDS matrices as the mapping function). This method, which suggests a generic FST mapping for data storage during the encryption/decryption operations of AES-like ciphers, increases the security of expensive redundancy countermeasures against fault attacks [6]. FST is utilized to make "fault collision" difficult during attacks on the classic redundancy-based countermeasures, i.e., the adversary would have difficulty injecting the same fault in the storage registers of both the original and the spatially/temporally redundant structures. The countermeasures based on recomputations could fail to detect the occurrence of a fault as long as the adversary could inject the same fault in both the original and the redundant computations (a biased fault model makes this easier). The countermeasures based on recomputations can be used in conjunction with encoding schemes which nullify the effect of the bias in the fault model by FST, thwarting both of these attack schemes.

To investigate the importance of using/reusing MDS matrices in lightweight block ciphers, we have applied this method to the KLEIN cipher. Implementing the "naive" spatial redundancy of KLEIN requires 232 occupied slices on Virtex-7 (xc7vx330t), which is a high area overhead. Moreover, we have implemented the spatial redundancy with the FST method, which applies MixNibble as the mapping function $W$ in the redundant computation and uses InvMixNibble as its inverse $W^{-1}$. We note that although KLEIN allows two types of decryption, through (a) using the encryption transformations together with modes of operation and (b) reverse transformations, this might not be the case for other lightweight block ciphers, which adds to the complications of using FST in lightweight cryptography. Our implementations show that a higher area, i.e., 239 occupied slices, is obtained, as expected, due to the $W$ function. Applying the pipeline method, which utilizes the MixNibble operation as the $W$ function, improves the metrics of the spatial redundancy scheme, occupying just 235 slices on Virtex-7.

Our intention in this paper is to propose reliability and error detection mechanisms (through linear codes, recomputations, and FST adopted for lightweight cryptography) so that error detection schemes are considered beforehand, at design time, taking such algorithmic security into account. The MDS matrices applied in the MixColumn (or MixNibble) transformation to protect ciphers against linear and differential attacks should be incorporated in the cipher design in order to reduce the overhead of the applied error detection schemes.

Anita Aghaie is with the Embedded Security Group, Horst Görtz Institute for IT-Security, Ruhr-Universität Bochum, 44780 Bochum, Germany. M. Mozaffari Kermani is with the Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA. R. Azarderakhsh is with the Department of Computer and Electrical Engineering and Computer Science and is an I-SENSE Fellow, Florida Atlantic University, Boca Raton, FL, USA.

[Figure 1: block diagram. Blocks: Input/Intermediate Inputs; Original Computation followed by Register Update; Redundant Computation with Transformation $W$, Register Update, and Inverse Transformation $W^{-1}$; Error output. $W$ and $W^{-1}$ are applied through MDS matrices.]
Fig. 1. The FST approach with MDS matrices mapping function.
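The fault-collision argument behind Fig. 1 can be illustrated with a minimal Python sketch. It is not the KLEIN/MixNibble implementation evaluated above: as an assumption made only for illustration, the mapping $W$ is the binary involutive matrix circ(0, 1, 1, 1) acting on four nibbles (so $W^{-1} = W$), and the names `w_map`, `xor_states`, `naive_detects`, and `fst_detects` are hypothetical.

```python
def w_map(state):
    """Toy FST mapping W (an assumption, not MixNibble).

    Multiplies four nibbles by circ(0, 1, 1, 1) over GF(2): each output
    nibble is the XOR of the other three inputs, i.e. (XOR of all four)
    XOR (own input). The map is its own inverse.
    """
    total = 0
    for nibble in state:
        total ^= nibble
    return [total ^ nibble for nibble in state]


def xor_states(a, b):
    """Nibble-wise XOR of two states (models injecting fault b into a)."""
    return [x ^ y for x, y in zip(a, b)]


def naive_detects(state, fault):
    """Plain duplication: both registers hold the state and get the same fault."""
    original = xor_states(state, fault)
    redundant = xor_states(state, fault)
    return original != redundant  # never True: the identical faults collide


def fst_detects(state, fault):
    """FST: the redundant register holds W(state); W^-1 is applied before comparing."""
    original = xor_states(state, fault)
    redundant = xor_states(w_map(state), fault)
    return w_map(redundant) != original  # W^-1 == W for this toy map


state = [0x3, 0xA, 0x7, 0xC]
single_nibble_fault = [0x5, 0x0, 0x0, 0x0]
print(naive_detects(state, single_nibble_fault))  # False
print(fst_detects(state, single_nibble_fault))    # True
```

In this toy map, a fault whose nibbles XOR to zero, e.g., `[0x5, 0x5, 0x0, 0x0]`, is a fixed point of $W$ and still collides, which hints at why the choice of the mapping function matters.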
Algorithm 1 $m \times m$ MDS matrix design criteria.
Inputs: $A : \mathbb{F}_2^m \to \mathbb{F}_2^m$; $X = (x_1, \ldots, x_n) \in (\mathbb{F}_2^m)^n$.
Outputs: Criteria of MDS matrix.
1. Define a linear diffusion $L(X) = \left( \sum_{i=1}^{n} L_{1,i}(x_i), \ldots, \sum_{i=1}^{n} L_{n,i}(x_i) \right)$, for $1 \le i, j \le n$; if $L \circ L(X) = X$, then it is involutory.
2. Define the bundle weight of $X$: $\omega_b(X) = |\{ x_i : x_i \ne 0,\ 1 \le i \le n \}|$.
3. Define the branch number of $L$: $N = \min \{ \omega_b(X) + \omega_b(L(X)) \mid X \in (\mathbb{F}_2^m)^n,\ X \ne 0 \}$; if $N = n + 1$, then it is an MDS matrix.

II. ERROR DETECTION OF VLSI ARCHITECTURES FOR MIXCOLUMN
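The criteria of Algorithm 1 can be checked by brute force for small matrices. The following Python sketch is only an illustration under stated assumptions: nibbles are elements of GF($2^4$) reduced by $x^4 + x + 1$, the two test matrices are the LED matrix and the Midori matrix $M_C$ discussed in this section, and all function names are hypothetical.

```python
from itertools import product


def gf16_mul(a, b):
    """Multiply two nibbles in GF(2^4) modulo x^4 + x + 1 (0x13)."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return result


# Precomputed multiplication table to keep the exhaustive search fast.
MUL = [[gf16_mul(a, b) for b in range(16)] for a in range(16)]


def apply_matrix(matrix, column):
    """Linear diffusion L(X): matrix-vector product over GF(2^4)."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, x in zip(row, column):
            acc ^= MUL[coeff][x]
        out.append(acc)
    return out


def bundle_weight(column):
    """Number of nonzero nibbles (step 2 of Algorithm 1)."""
    return sum(1 for x in column if x != 0)


def branch_number(matrix):
    """Minimum of w_b(X) + w_b(L(X)) over all nonzero X (step 3)."""
    n = len(matrix)
    return min(
        bundle_weight(x) + bundle_weight(apply_matrix(matrix, x))
        for x in product(range(16), repeat=n)
        if any(x)
    )


def is_involutory(matrix):
    """Check L(L(e)) == e on the unit vectors; enough because L is linear."""
    n = len(matrix)
    for i in range(n):
        unit = [1 if j == i else 0 for j in range(n)]
        if apply_matrix(matrix, apply_matrix(matrix, unit)) != unit:
            return False
    return True


LED_M = [[0x4, 0x1, 0x2, 0x2], [0x8, 0x6, 0x5, 0x6],
         [0xB, 0xE, 0xA, 0x9], [0x2, 0x2, 0xF, 0xB]]
MIDORI_MC = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]

results = {}
for name, m in (("LED", LED_M), ("Midori M_C", MIDORI_MC)):
    n_branch = branch_number(m)
    results[name] = (n_branch, n_branch == len(m) + 1, is_involutory(m))
    print(name, results[name])
```

Each tuple holds the branch number, whether it equals $n + 1$ (the MDS criterion), and whether the matrix is involutory. The exhaustive search is feasible here only because a column has $16^4$ possible values.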
MixColumn plays a significant role as the linear diffusion layer in the encryption and decryption operations over finite fields. Although there is a wide range of categories of MDS matrices that can be applied in MixColumn, such as circulant, Hadamard, Cauchy, and Hadamard-Cauchy, an efficient MDS matrix should be chosen carefully in terms of low-cost hardware area, high diffusion speed, and low-latency implementation. One of the common methods to construct lightweight MDS matrices, e.g., circulant ones, is to start from a sparse and compact implementation and then compose it several times, which yields similar rows in the matrices and reduces the hardware implementation cost (the number of XOR gates, for instance), as in the PHOTON hash functions [12].

The design criteria of MDS matrices, e.g., based on a low Hamming weight polynomial, generate a wide pool of involutory and non-involutory MDS matrices. Moreover, the security of these MDS matrices should be considered carefully during the design phase to improve the security levels.

All types of MDS matrices offer optimal linear diffusion and thus provide a proper linear part, i.e., the MixColumn operation in block ciphers and hash functions; in general, however, a compact criterion for deciding which matrix is better may not be achievable. The criteria in [13] potentially lead to a low number of gates in hardware implementations and a small amount of memory usage. The $m \times m$ MDS matrix design criteria are presented in Algorithm 1.

For each of the MDS matrices, we present the multiplication and reduction operations with irreducible polynomials in order to count the number of XOR gates. The number of XOR gates for a number of lightweight block ciphers is presented in Table I, which also shows the overhead percentages (this table is presented at the end of this section).

The first cipher, Midori64, can utilize three $4 \times 4$ MDS matrices for the MixColumn transformation.
We investigate two of them, i.e., the non-involutive MDS matrix ($M_B$) and the involutive almost-MDS matrix ($M_C$). The former is the same as the MDS matrix applied in KLEIN, and the latter, $M_C$, is shown below. Let us denote the input state of MixColumn as $A$ and the output state as $R$. Then, we have the following:

$$R = M_C \times A \implies \begin{bmatrix} r_0 & r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 & r_7 \\ r_8 & r_9 & r_{10} & r_{11} \\ r_{12} & r_{13} & r_{14} & r_{15} \end{bmatrix} = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \times \begin{bmatrix} a_0 & a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 & a_7 \\ a_8 & a_9 & a_{10} & a_{11} \\ a_{12} & a_{13} & a_{14} & a_{15} \end{bmatrix}. \tag{1}$$

According to (1), three input elements of each column are modulo-2 added to generate each element of the output matrix ($R$), so each output column needs eight XOR gates per bit position. Because the entries of $M_C$ are 0 or 1, the predicted signatures of one column cost twelve XOR gates for the cumulative column signature (three nibble-wide additions) and eight XOR gates for the interleaved cumulative column signature (two nibble-wide additions), as shown below.

$$r_0 = a_4 + a_8 + a_{12}, \quad r_4 = a_0 + a_8 + a_{12}, \quad r_8 = a_0 + a_4 + a_{12}, \quad r_{12} = a_0 + a_4 + a_8. \tag{2}$$

Let us modulo-2 add the first column of the output state matrix to derive the cumulative column signature-based scheme:

$$r_0 + r_4 + r_8 + r_{12} = (0+1+1+1)\,a_0 + (1+0+1+1)\,a_4 + (1+1+0+1)\,a_8 + (1+1+1+0)\,a_{12} = a_0 + a_4 + a_8 + a_{12}. \tag{3}$$

For the interleaved cumulative column signature, let us modulo-2 add the two even-row elements of the output state, i.e., rows 0 and 2, and the two odd-row elements, i.e., rows 1 and 3:

$$r_0 + r_8 = (0+1)\,a_0 + (1+1)\,a_4 + (1+0)\,a_8 + (1+1)\,a_{12} = a_0 + a_8, \tag{4}$$

$$r_4 + r_{12} = (1+1)\,a_0 + (0+1)\,a_4 + (1+1)\,a_8 + (1+0)\,a_{12} = a_4 + a_{12}. \tag{5}$$

As shown in Table I, Midori64 with $M_C$ needs 128 XOR gates for MixColumn; the predicted cumulative column signatures add 48 XOR gates (176 in total), and the predicted interleaved cumulative column signatures add 32 XOR gates (160 in total). The XOR gates that compute the actual signatures from the outputs are used similarly in all of the MixColumn transformations, so we do not count them in the table.

The last cipher whose details we present, for the sake of brevity, is LED, which applies a hardware-friendly MDS matrix for the MixColumn transformation, given by:

$$M = \begin{bmatrix} 4 & 1 & 2 & 2 \\ 8 & 6 & 5 & 6 \\ B & E & A & 9 \\ 2 & 2 & F & B \end{bmatrix}. \tag{6}$$

Each entry in the input and the output state matrices is a four-bit nibble. As a case study, to compute $r_0$ (the first element of the resultant matrix $R$), denoting the bits of the elements of $A$ as $a_{ij}$ for the $i$th bit of the $j$th element (with $j$ written in hexadecimal, so that $a_{3c}$ is bit 3 of $a_{12}$), we have:

$$r_0 = 4 \cdot a_0 + a_4 + 2 \cdot a_8 + 2 \cdot a_{12} = x^2 \cdot a_0 + a_4 + x \cdot a_8 + x \cdot a_{12}, \tag{7}$$

$$r_0 = x^2 [a_{30}x^3 + a_{20}x^2 + a_{10}x + a_{00}] + [a_{34}x^3 + a_{24}x^2 + a_{14}x + a_{04}] + x [a_{38}x^3 + a_{28}x^2 + a_{18}x + a_{08}] + x [a_{3c}x^3 + a_{2c}x^2 + a_{1c}x + a_{0c}]. \tag{8}$$

Using the irreducible polynomial $x^4 + x + 1$ for the reductions, one can derive the following for $r_0$:

$$r_0 = x^3 [a_{2c} + a_{28} + a_{34} + a_{10}] + x^2 [a_{30} + a_{00} + a_{24} + a_{18} + a_{1c}] + x [a_{30} + a_{20} + a_{14} + a_{38} + a_{08} + a_{3c} + a_{0c}] + 1 \cdot [a_{20} + a_{04} + a_{3c} + a_{38}]. \tag{9}$$

Let us derive the formulae for just one column signature of MixColumn by modulo-2 adding the first column entries $r_0$, $r_4$, $r_8$, and $r_{12}$. One can derive:

$$r_4 = 8 \cdot a_0 + 6 \cdot a_4 + 5 \cdot a_8 + 6 \cdot a_{12} = x^3 \cdot a_0 + (x^2 + x) \cdot a_4 + (x^2 + 1) \cdot a_8 + (x^2 + x) \cdot a_{12}, \tag{10}$$

$$r_8 = B \cdot a_0 + E \cdot a_4 + A \cdot a_8 + 9 \cdot a_{12} = (x^3 + x + 1) \cdot a_0 + (x^3 + x^2 + x) \cdot a_4 + (x^3 + x) \cdot a_8 + (x^3 + 1) \cdot a_{12}, \tag{11}$$

$$r_{12} = 2 \cdot a_0 + 2 \cdot a_4 + F \cdot a_8 + B \cdot a_{12} = x \cdot a_0 + x \cdot a_4 + (x^3 + x^2 + x + 1) \cdot a_8 + (x^3 + x + 1) \cdot a_{12}. \tag{12}$$

For the cumulative column signature-based scheme, we modulo-2 add the first column entries of matrix $R$ to derive the following signature $\hat{P}$:

$$r_0 + r_4 + r_8 + r_{12} = (4 + 8 + B + 2) \cdot a_0 + (1 + 6 + E + 2) \cdot a_4 + (2 + 5 + A + F) \cdot a_8 + (2 + 6 + 9 + B) \cdot a_{12} = 5 \cdot a_0 + B \cdot a_4 + 2 \cdot a_8 + 6 \cdot a_{12}. \tag{13}$$

After the reduction, one can derive the formula below as the final form:

$$x^3 [a_{30} + a_{10} + a_{24} + a_{04} + a_{2c} + a_{1c} + a_{28}] + x^2 [a_{30} + a_{20} + a_{00} + a_{34} + a_{14} + a_{3c} + a_{1c} + a_{0c} + a_{18}] + x [a_{30} + a_{20} + a_{10} + a_{34} + a_{24} + a_{04} + a_{38} + a_{2c} + a_{08} + a_{0c}] + 1 \cdot [a_{20} + a_{00} + a_{3c} + a_{34} + a_{14} + a_{04} + a_{38} + a_{2c}]. \tag{14}$$

This can be generalized to the other columns, and thus we have the following for the second to the fourth columns:

$$r_1 + r_5 + r_9 + r_{13} = 5 \cdot a_1 + B \cdot a_5 + 2 \cdot a_9 + 6 \cdot a_{13}, \tag{15}$$

$$r_2 + r_6 + r_{10} + r_{14} = 5 \cdot a_2 + B \cdot a_6 + 2 \cdot a_{10} + 6 \cdot a_{14}, \tag{16}$$

$$r_3 + r_7 + r_{11} + r_{15} = 5 \cdot a_3 + B \cdot a_7 + 2 \cdot a_{11} + 6 \cdot a_{15}. \tag{17}$$

In the following, the other signature-based scheme (interleaved cumulative column signature) is derived by modulo-2 adding the even-row elements with each other and the odd-row elements with each other:

$$r_0 + r_8 = F \cdot a_0 + F \cdot a_4 + 8 \cdot a_8 + B \cdot a_{12}, \tag{18}$$

$$r_4 + r_{12} = A \cdot a_0 + 4 \cdot a_4 + A \cdot a_8 + D \cdot a_{12}. \tag{19}$$

According to these formulae, we are able to count the number of utilized XOR gates and the cumulative column signature and interleaved cumulative column signature overheads for LED, which are presented in Table I. As mentioned above, by default, each cipher additionally needs the XOR gates that compute the actual cumulative column signature and interleaved cumulative column signature from the outputs; these are omitted for the sake of brevity.

Finally, as mentioned before, we summarize the number of XOR gates for all the mentioned ciphers in the MixColumn transformation in Table I, in which the overhead percentages are presented. As with the S-boxes, we have not used sub-expression sharing for deriving the numbers in Table I; nonetheless, sub-expression sharing can be used to reduce the number of gates at the expense of possibly high fan-outs (which might not be tolerated in some cases, requiring repeaters to resolve the problem).

TABLE I
NUMBER OF GATES NEEDED FOR THE MIXCOLUMN TRANSFORMATION AND DERIVING THE PREDICTED SIGNATURES IN DIFFERENT LIGHTWEIGHT BLOCK CIPHERS

Block cipher          MixCol. XOR    CCS XOR         Inter. CCS XOR
Midori64 ($M_C$)      128            176 (37.50%)    160 (25%)
Midori64 ($M_B$)      256            304 (18.75%)    416 (62.50%)
LED                   444            564 (27.02%)    672 (51.35%)
KLEIN (two-nibble)    256            304 (18.75%)    416 (62.50%)
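The predicted signatures for LED in (13), (18), and (19) can be cross-checked numerically. The Python sketch below is an illustration under assumptions, not the synthesized architecture: a column is passed as the list $[a_0, a_4, a_8, a_{12}]$, the field is GF($2^4$) with $x^4 + x + 1$, and the helper names are hypothetical. It derives the signature coefficients by XOR-ing rows of the LED matrix and compares the predicted signatures with those computed from the MixColumn output.

```python
def gf16_mul(a, b):
    """Multiply two nibbles in GF(2^4) modulo x^4 + x + 1 (0x13)."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return result


LED_M = [[0x4, 0x1, 0x2, 0x2], [0x8, 0x6, 0x5, 0x6],
         [0xB, 0xE, 0xA, 0x9], [0x2, 0x2, 0xF, 0xB]]


def xor_all(values):
    """Modulo-2 sum of an iterable of nibbles."""
    acc = 0
    for v in values:
        acc ^= v
    return acc


def mix_column(column):
    """One LED MixColumn output column [r_0, r_4, r_8, r_12]."""
    return [xor_all(gf16_mul(c, a) for c, a in zip(row, column)) for row in LED_M]


def signature_coeffs(rows):
    """XOR the selected matrix rows entry-wise to get the predicted-signature coefficients."""
    return [xor_all(LED_M[i][j] for i in rows) for j in range(4)]


CCS = signature_coeffs([0, 1, 2, 3])   # coefficients of (13)
ICCS_EVEN = signature_coeffs([0, 2])   # coefficients of (18)
ICCS_ODD = signature_coeffs([1, 3])    # coefficients of (19)


def predicted(coeffs, column_in):
    """Predicted signature computed from the MixColumn input."""
    return xor_all(gf16_mul(c, a) for c, a in zip(coeffs, column_in))


def check(column_in, column_out):
    """Return (ccs_ok, iccs_ok); False means an error is flagged."""
    ccs_ok = predicted(CCS, column_in) == xor_all(column_out)
    even_ok = predicted(ICCS_EVEN, column_in) == column_out[0] ^ column_out[2]
    odd_ok = predicted(ICCS_ODD, column_in) == column_out[1] ^ column_out[3]
    return ccs_ok, even_ok and odd_ok


column_in = [0x1, 0x2, 0x3, 0x4]
good = mix_column(column_in)
one_fault = [good[0], good[1] ^ 0x8, good[2], good[3]]
two_equal_faults = [good[0] ^ 0x8, good[1] ^ 0x8, good[2], good[3]]
print([hex(c) for c in CCS])
print(check(column_in, good), check(column_in, one_fault),
      check(column_in, two_equal_faults))
```

In this sketch, two identical faults in adjacent rows cancel in the single cumulative signature but are flagged by the interleaved one, since they fall into different (even/odd) sums.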
Table I and the MixColumn transformations that we have not considered for the sake of brevity motivate the urgency of considering the overheads beforehand, perhaps as a design factor (we note that other error detection schemes can be considered and the ones provided here are just a subset).

As with the S-boxes, we present two metrics for analyzing the results in Table I. The first one is the overhead of our error detection schemes shown in Table I. As seen in this table, Midori64 with the $M_C$ matrix has the lowest-cost MixColumn implementation; nevertheless, the overhead of the cumulative column signature-based scheme for this matrix is higher than for the other ones. Comparing the percentage overheads in Table I shows how different they can be with respect to error detection using the cumulative column signature (and other schemes for error detection). The second metric is the total number of gates (the original and the add-on detection), e.g., the number of logic gates used for $M_B$ is not as low as that for $M_C$.

Finally, through ASIC synthesis and for two select constructions, we present in Table II the areas for the MixColumn transformations of Midori ($M_C$) and LED. The benchmarking is done for the error detection architectures using the TSMC 65-nm library and Synopsys Design Compiler. As with the S-boxes, in order to make the area results meaningful when switching technologies on ASIC, we have provided the NAND-gate equivalency (gate equivalents: GE). This is performed using the area of a NAND gate in the utilized TSMC 65-nm CMOS library, which is 1.41 µm$^2$. The results are shown in Table II, where the overheads are presented in parentheses (the contrast between this table and Table I is due to the optimizations performed in Design Compiler, noting that we have not performed sub-expression sharing in Table I).

TABLE II
ASIC TSMC 65-NM SYNTHESIS RESULTS FOR TWO SELECT MIXCOLUMN TRANSFORMATIONS AND THEIR ERROR DETECTION MECHANISMS (CUMULATIVE COLUMN SIGNATURES: CCS)

Block cipher      GE of architectures
                  MixCol.    CCS            Inter. CCS
Midori, $M_C$     272        330 (21.3%)    317 (16.5%)
LED               575        746 (29.7%)    765 (33.0%)
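The two metrics can be reproduced with simple arithmetic. The Python sketch below recomputes the overhead percentages from the gate counts of Table I and the GE figures of Table II, and shows the area-to-GE conversion with the 1.41 µm$^2$ NAND area; the function and variable names are hypothetical, and the raw synthesis areas are not given in the text, so the conversion is shown only in the forward direction on an assumed input.

```python
NAND2_AREA_UM2 = 1.41  # NAND gate area in the TSMC 65-nm library, as stated in the text


def overhead_percent(base, protected):
    """Overhead of the protected design relative to the unprotected one."""
    return 100.0 * (protected - base) / base


def gate_equivalents(area_um2, nand_area_um2=NAND2_AREA_UM2):
    """Convert a synthesized area in square micrometres to NAND gate equivalents."""
    return area_um2 / nand_area_um2


# (MixColumn, with CCS, with interleaved CCS)
TABLE_I_XOR = {
    "Midori64 (M_C)": (128, 176, 160),
    "Midori64 (M_B)": (256, 304, 416),
    "LED": (444, 564, 672),
    "KLEIN (two-nibble)": (256, 304, 416),
}
TABLE_II_GE = {
    "Midori, M_C": (272, 330, 317),
    "LED": (575, 746, 765),
}

for title, table in (("Table I (XOR gates)", TABLE_I_XOR), ("Table II (GE)", TABLE_II_GE)):
    print(title)
    for name, (base, ccs, iccs) in table.items():
        print(f"  {name}: CCS {overhead_percent(base, ccs):.1f}%, "
              f"Inter. CCS {overhead_percent(base, iccs):.1f}%, "
              f"total {ccs} / {iccs}")

# Assumed example input: an area of 272 * 1.41 square micrometres corresponds to 272 GE.
example_ge = gate_equivalents(272 * NAND2_AREA_UM2)
```

The first metric is the printed percentage; the second is the total count (original plus add-on detection), which is why a design with a low relative overhead can still be the larger one in absolute terms.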
The aforementioned metrics/overheads are some of the possible indications that give designers the criteria required to predict low-cost MixColumn implementations for lightweight ciphers.

III. CONCLUSIONS
In this paper, we evaluated the hardware complexities of the MixColumn transformations in order to propose a framework for designing lightweight and effective countermeasures against intentional and natural faults in crypto-architectures and to present the respective motivations. One can also base the criteria (depending on the objectives) on other performance and implementation metrics, e.g., delay (frequency, throughput, efficiency), power consumption, and energy. Although we chose the MixColumn transformations due to their importance, other less costly transformations can be considered. The results of our VLSI implementations on the ASIC platform show the diversity of MixColumn in lightweight cryptography, calling for efficient approaches for error detection. Finally, one could consider a subset of fault attacks, differential fault intensity analysis (DFIA), see, for instance, [14], [15], [16], which combines differential power analysis with fault injection principles to obtain biased fault models (multi-byte faults cannot be used practically for attacking time redundancy countermeasure implementations, and single-byte fault models are the only viable option for the attackers).

REFERENCES
[1] M. Mozaffari Kermani and A. Reyhani-Masoleh, "Reliable hardware architectures for the third-round SHA-3 finalist Grostl benchmarked on FPGA platform," in Proc. DFT, pp. 325-331, 2011.
[2] M. Mozaffari Kermani and A. Reyhani-Masoleh, "Fault detection structures of the S-boxes and the inverse S-boxes for the Advanced Encryption Standard," J. Electronic Testing, vol. 25, no. 4-5, pp. 225-245, 2009.
[3] M. Mozaffari Kermani and A. Reyhani-Masoleh, "A lightweight high-performance fault detection scheme for the Advanced Encryption Standard using composite fields," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 19, no. 1, pp. 85-91, Jan. 2011.
[4] X. Guo, D. Mukhopadhyay, C. Jin, and R. Karri, "Security analysis of concurrent error detection against differential fault analysis," J. Cryptographic Engineering, vol. 5, no. 3, pp. 153-169, 2015.
[5] M. Mozaffari Kermani, R. Azarderakhsh, and A. Aghaie, "Fault detection architectures for post-quantum cryptographic stateless hash-based secure signatures benchmarked on ASIC," ACM Trans. Embedded Computing Syst. (special issue on Embedded Device Forensics and Security: State of the Art Advances), vol. 16, no. 2, pp. 59:1-19, Dec. 2016.
[6] S. Patranabis, A. Chakraborty, D. Mukhopadhyay, and P. P. Chakrabarti, "Fault space transformation: A generic approach to counter differential fault analysis and differential fault intensity analysis on AES-like block ciphers," IEEE Trans. Information Forensics and Security, vol. 12, no. 5, May 2017.
[7] M. Mozaffari Kermani, R. Azarderakhsh, C. Lee, and S. Bayat-Sarmadi, "Reliable concurrent error detection architectures for extended Euclidean-based division over $GF(2^m)$," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 22, no. 5, pp. 995-1003, May 2014.
[8] M. Mozaffari Kermani, R. Azarderakhsh, and A. Aghaie, "Reliable and error detection architectures of Pomaranch for false-alarm-sensitive cryptographic applications," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 23, no. 12, pp. 2804-2812, Dec. 2015.
[9] M. Mozaffari Kermani, K. Tian, R. Azarderakhsh, and S. Bayat-Sarmadi, "Fault-resilient lightweight cryptographic block ciphers for secure embedded systems," IEEE Embedded Systems Letters, vol. 6, no. 4, pp. 89-92, Dec. 2014.
[10] S. Bayat-Sarmadi, M. Mozaffari Kermani, and A. Reyhani-Masoleh, "Efficient and concurrent reliable realization of the secure cryptographic SHA-3 algorithm," IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 33, no. 7, pp. 1105-1109, Jul. 2014.
[11] M. Mozaffari Kermani and A. Reyhani-Masoleh, "A high-performance fault diagnosis approach for the AES SubBytes utilizing mixed bases," in Proc. IEEE Workshop Fault Diagnosis and Tolerance in Cryptography (FDTC), pp. 80-87, Nara, Japan, Sep. 2011.
[12] J. Guo, T. Peyrin, and A. Poschmann, "The PHOTON family of lightweight hash functions," in Proc. Annual Cryptology Conf., Springer, Aug. 2011, pp. 222-239.
[13] D. Augot and M. Finiasz, "Direct construction of recursive MDS diffusion layers using shortened BCH codes," in Proc. Int. Workshop Fast Software Encryption, [...]
[14] [...] Embedded Systems Letters, vol. 8, no. 2, pp. 33-36, 2016.
[15] N. Farhady Ghalaty, B. Yuce, M. M. I. Taha, and P. Schaumont, "Differential fault intensity analysis," in Proc. FDTC, [...]
[16] [...] Proc. COSADE, 2015, pp. 189-203.