Lightweight Hardware Architectures for Efficient Secure Hash Functions ECHO and Fugue
Mehran Mozaffari Kermani, Reza Azarderakhsh, Siavash Bayat-Sarmadi
Abstract: In cryptographic engineering, extensive attention has been devoted to ameliorating the performance and security of the algorithms within. Nonetheless, in the state-of-the-art, the approaches for increasing the reliability of the efficient hash functions ECHO and Fugue have not been presented to date. We propose efficient fault detection schemes by presenting closed formulations for the predicted signatures of different transformations in these algorithms. These signatures are derived to achieve low overhead for the specific transformations and can be tailored to include byte/word-wide predicted signatures. Through simulations, we show that the proposed fault detection schemes are highly capable of detecting natural hardware failures and are capable of deteriorating the effectiveness of malicious fault attacks. The proposed reliable hardware architectures are implemented on the application-specific integrated circuit (ASIC) platform using a 65-nm standard technology to benchmark their hardware and timing characteristics. The results of our simulations and implementations show very high error coverage with acceptable overhead for the proposed schemes.
I. INTRODUCTION
Cryptographic hash functions take arbitrary-length inputs and generate fixed-length outputs. The output of a hash function is then utilized to provide authentication and integrity for the transferred data. In this paper, due to the efficiency of the algorithms ECHO [1] and Fugue [2] (which has been improved to Fugue 2.0), and the fact that these are inspired by the widely-utilized Advanced Encryption Standard (AES), we present their respective fault detection schemes. These AES-inspired hash functions (which have been part of the NIST competition) have received much attention in the literature. For instance, in [3] and [4], differential and side-channel analysis attacks for ECHO are presented. Moreover, much effort has been put into developing high-performance and efficient hardware implementations of these algorithms, see, for instance, [5], [6], and [7]. As discussed in [8], one important feature of these hash functions is that one can share some resources between the AES and these hash algorithms. Thus, low-complexity implementations are achieved.

Fault attacks pose serious threats to the implementations of the crypto-algorithms. Therefore, many fault detection schemes have been proposed to date for cryptographic and arithmetic entities, see, for instance, [9], [10], [11], [12], [13], [14], [15], [16], [17], and [18] for some examples. Nonetheless, to the best of our knowledge, the schemes for increasing the reliability of these algorithms have not been presented in the open literature. Effective fault detection schemes with minimal overhead on these algorithms are essential for achieving reliable hardware architectures.

The summary of our contributions is presented in the following.
• We have obtained new formulations for the predicted signatures of different transformations for hash algorithms, i.e., ECHO [1] and Fugue [2]. The presented closed formulations are used for proposing high-performance and effective fault detection schemes.
• Our simulation results show high fault detection capability for the proposed schemes for all the algorithms. This makes the proposed architectures reliable in practice.
• We have used ASIC implementations to benchmark the hardware and timing characteristics of the proposed schemes. The high efficiency of the proposed schemes makes the proposed architectures suitable for high-performance applications.

Mehran Mozaffari Kermani is with Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, Email: [email protected]. Reza Azarderakhsh is with Department of ECE and Computer Science, Florida Atlantic University, Boca Raton, FL 14623, Email: [email protected]. Siavash Bayat-Sarmadi is with Department of Computer Engineering, Sharif University of Technology, Tehran, Iran, Email: [email protected].

II. PRELIMINARIES
ECHO (presented by Benadjila et al.) [1] supports any hash output of length from 128 to 512 bits. The hash function ECHO takes a message and a salt as input. Although the output can be of any length from 128 to 512 bits, the four outputs for the NIST competition were 224, 256, 384, and 512 bits. The ECHO algorithm with an output size ($H_{size}$) of at most 256 bits, i.e., $128 \le H_{size} \le 256$, uses the compression function called Compress512. However, for $257 \le H_{size} \le 512$, the compression function is called Compress1024, which is very similar to Compress512 [1]. More details are presented throughout the paper as needed.

In what follows, we explain the hash function Fugue (presented by IBM) [2]. Fugue-256 generates a 256-bit output $H$ for the message $M$, which is split into 32-bit blocks $m_i$, $1 \le i \le t$. The chaining value of Fugue-256 (denoted by $h$) is also split into 32-bit blocks denoted by $S_i$, $0 \le i \le 29$. The following transformation sequence is used for updating $h$ from $m_i$: TIX, ROR3, CMIX, SMIX, ROR3, CMIX, and SMIX (called one round $R$). The sequence ROR3, CMIX, SMIX is called a sub-round. Therefore, a round $R$ consists of the TIX transformation followed by two sub-rounds [2]. More details are presented throughout the paper as needed.

[Figure 1: chain of $t$ Compress512 blocks. Each block takes the counter $C_i$, the salt, the four previous chaining values, and twelve message blocks as a $16 \times 128$-bit state and applies BIG.SubWords, BIG.ShiftRows, BIG.MixColumns, and BIG.Final; the last output is truncated ($T$) to give $h$.]
Fig. 1. The ECHO algorithm for $128 \le H_{size} \le 256$ [1].

III. THE PROPOSED FAULT DIAGNOSIS APPROACHES
In what follows, for each of the algorithms presented in this paper, we propose respective fault detection schemes.
A. ECHO
An overview of the ECHO algorithm for $128 \le H_{size} \le 256$ including the Compress512 functions is presented in Fig. 1. As seen in Fig. 1, each of the $t$ Compress512 functions gets the 128-bit salt, a $4 \times 4$ state of 128-bit entries, and the counter $C_i$, $1 \le i \le t$ (used to count the number of message bits being hashed). The first column of the state consists of four 128-bit values which construct the chaining variable of the previous Compress512, i.e., $V_{i-1} = (v_{i-1}^0, v_{i-1}^1, v_{i-1}^2, v_{i-1}^3)$, $1 \le i \le t$. The other three columns include the 128-bit blocks of the input message. Therefore, in total, there are $16 \times 128$ bits in the state. Each of the $t$ Compress512 functions consists of four different transformations, i.e., BIG.SubWords, BIG.ShiftRows, BIG.MixColumns, and BIG.Final.

Each BIG.SubWords contains two AES rounds. The first transformation, SubBytes, which includes 16 S-boxes, is the only nonlinear AES transformation. In the AES S-box, the irreducible polynomial $M(x) = x^8 + x^4 + x^3 + x + 1$ is used to construct the binary field $GF(2^8)$. Let $X \in GF(2^8)$ and $Y \in GF(2^8)$ be the 8-bit input and output of each S-box, respectively. Then, the S-box consists of a multiplicative inversion, i.e., $X^{-1} \in GF(2^8)$, followed by an affine transformation to obtain $Y \in GF(2^8)$. Look-up tables (LUTs) and composite fields are used to implement the S-boxes; polynomial basis, normal basis, mixed basis, and redundant basis are among the approaches for the low-area composite-field variant [19], [20], [21], [22]. In general, with composite field realizations, a transformation matrix first transforms a field element in the binary field $GF(2^8)$ to the corresponding representation in the composite fields $GF(2^8)/GF(((2^2)^2)^2)$. Then, a multiplicative inversion consisting of composite field operations in the sub-field $GF((2^2)^2)$ is performed. Finally, through an inverse transformation matrix, the inverted output is obtained. A number of research works address error detection of the S-boxes; for the sake of brevity, we do not discuss them.
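The S-box construction described above (multiplicative inversion in $GF(2^8)$ followed by an affine transformation) can be sketched in software. The following Python sketch is only a direct-field reference model, not the composite-field hardware realization discussed in the text; the helper names (`gf_mul`, `gf_inv`, `affine`, `sbox`) are illustrative, and the affine constant $\{63\}_h$ is assumed from the AES standard.

```python
# Reference model of the AES S-box: inversion in GF(2^8), then the affine map.
# Helper names are illustrative; this is not the composite-field datapath.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo M(x) = x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:       # reduce as soon as degree 8 appears
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(x):
    """Multiplicative inverse computed as x^254; zero maps to zero."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, x)
    return result

def affine(b):
    """AES affine transformation over GF(2) with constant {63}_h (assumed)."""
    out = 0
    for i in range(8):
        bit = (b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8))
        bit ^= (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (0x63 >> i)
        out |= (bit & 1) << i
    return out

def sbox(x):
    """S-box output Y for the 8-bit input X."""
    return affine(gf_inv(x))
```

For example, `sbox(0x00)` evaluates to `0x63`, the first entry of the standard AES S-box table. A model of this kind is typically useful as a golden reference when checking a composite-field implementation and its predicted signatures byte by byte.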
The next transformation used in BIG.SubWords of ECHO is ShiftRows, whose fault detection is straightforward and is realized by re-wiring. Moreover, for the two final linear transformations, i.e., MixColumns and AddRoundKey, the 32-bit error indication flag $E_c = \sum_{r=0}^{3} (in_{r,c} + k_{r,c} + out_{r,c})$, $0 \le c \le 3$, can be used. It is noted that $in_{r,c}$, $k_{r,c}$, and $out_{r,c}$ are the input to MixColumns, the round key, and the output of AddRoundKey, respectively. In the fault-free case, every $E_c$ is zero because the entries in each column of the fixed MixColumns matrix add up to $\{01\}_h$. This error indication flag can be compressed so that an $n$-bit, $1 \le n \le 32$, error indication flag for these two transformations is achieved. Finally, after two rounds of the AES, the output of BIG.SubWords is derived.

Fault detection for the next transformation in ECHO, BIG.ShiftRows, is by permutation. The last transformation in BIG.Round, i.e., BIG.MixColumns, is an expansion of MixColumns of the AES. Specifically, the output state of BIG.SubWords (input state of BIG.MixColumns) is arranged as a 4-row, 64-column matrix. Then, each $4 \times 4$ sub-matrix is multiplied by the fixed MixColumns matrix. Therefore, we obtain the error indication flags of the BIG.MixColumns (B.MC) transformation for the sub-matrices $j$, $0 \le j \le 15$, as follows
$$E_c^j(\mathrm{B.MC}) = \sum_{r=0}^{3} (in_{r,c} + out_{r,c}), \quad 4j \le c \le 4j + 3, \tag{1}$$
where in the sub-matrices, $in_{r,c}$ and $out_{r,c}$ are the input and output of BIG.MixColumns, respectively, for which $0 \le r \le 3$ and $0 \le c \le 63$.

Finally, the BIG.Final transformation is performed as the last transformation in each Compress512 (see Fig. 1) of ECHO. This transformation includes modulo-2 addition of the input state of the Compress512 and the output state of the eighth BIG.MixColumns. We present the following lemma for obtaining the predicted parities of this transformation.

Lemma 1:
Let $M_i^j$ be the 128-bit message blocks and $A_i^j$ be the 128-bit outputs of the eighth BIG.MixColumns of the $i$th Compress512 in Fig. 1. In addition, let $v_{i-1}^j$, $0 \le j \le 3$, be the previous chaining values. Then, the predicted parities of $v_i^j$, $0 \le j \le 3$ (the current chaining values), after performing the BIG.Final transformation are obtained as
$$\hat{P}(v_i^j) = \sum_{j} P(v_{i-1}^j + A_i^j) + \sum_{j} P(M_i^j), \tag{2}$$
where each sum runs over the block index $j$.

Proof. According to [1], we have $v_i^j = \sum_{j} v_{i-1}^j + \sum_{j} A_i^j + \sum_{j} M_i^j$. Since the parity function $P$ is linear over modulo-2 addition, for the predicted parity we reach $\hat{P}(v_i^j) = \sum_{j} P(v_{i-1}^j) + \sum_{j} P(A_i^j) + \sum_{j} P(M_i^j)$, and after rearranging, the proof is complete.

One can also obtain multiple parities for $v_i^j$ by applying the parity derivation function ($P$) to selected bits of the arguments $v_{i-1}^j + A_i^j$ and $M_i^j$.

B. Fugue
To propose a fault detection scheme for Fugue, we observe that the Fugue transformations can be divided into three types. The first type is the rotation transformations, i.e., ROR3, ROR14, and ROR15. The second category contains the two linear transformations TIX and CMIX. Finally, the last one is the nonlinear transformation SMIX.

Each Fugue round has the following sequence: TIX, ROR3, CMIX, SMIX, ROR3, CMIX, and SMIX. First, we propose the following theorem for the first three transformations TIX, ROR3, and CMIX in the round sequence. Then, we propose the fault detection scheme for the nonlinear transformation SMIX.
Theorem 1:
Let $\sigma_{S_i} = \sum_{i=0}^{29} S_i$ be the 32-bit result of modulo-2 additions of $S_i$, $0 \le i \le 29$ (called word-wide signature). Then, the predicted word-wide signature of the transformation sequence TIX, ROR3, and CMIX ($\hat{\sigma}_{TRC}$) in the Fugue round is obtained as
$$\hat{\sigma}_{TRC} = \sigma_{S_i} + S_{24}. \tag{3}$$

Proof. For TIX, the following substitutions are performed: $S_{10} \leftarrow S_{10} + S_0$, $S_0 \leftarrow m_i$, $S_8 \leftarrow S_8 + m_i$, and $S_1 \leftarrow S_1 + S_{24}$. Therefore, we have $\hat{\sigma}_{TIX} = \sigma_{S_i} + S_{10} + S_{10} + S_0 + S_0 + m_i + S_8 + S_8 + m_i + S_1 + S_1 + S_{24} = \sigma_{S_i} + S_{24}$. The ROR3 transformation, which just rotates the state three positions to the right, does not change $\hat{\sigma}_{TIX} = \sigma_{S_i} + S_{24}$. Moreover, for CMIX, we have $S_0 \leftarrow S_0 + S_4$, $S_1 \leftarrow S_1 + S_5$, $S_2 \leftarrow S_2 + S_6$, $S_{15} \leftarrow S_{15} + S_4$, $S_{16} \leftarrow S_{16} + S_5$, and $S_{17} \leftarrow S_{17} + S_6$. Consequently, we reach $\hat{\sigma}_{CMIX} = \sigma_{S_i} + S_0 + S_0 + S_4 + S_1 + S_1 + S_5 + S_2 + S_2 + S_6 + S_{15} + S_{15} + S_4 + S_{16} + S_{16} + S_5 + S_{17} + S_{17} + S_6 = \sigma_{S_i}$, i.e., CMIX leaves the signature unchanged. Therefore, one reaches $\hat{\sigma}_{TRC} = \sigma_{S_i} + S_{24}$ and the proof is complete.

The nonlinear transformation SMIX in Fugue consists of two functions. The first one is the S-box substitution and the second one is the linear Super-Mix function. The Super-Mix function consists of multiplication of $S_0$-$S_3$ (as a 16-byte input vector) with the following $16 \times 16$ matrix $N$ with hexadecimal entries to derive a 16-byte output
$$N = [\,16 \times 16 \text{ matrix of hexadecimal entries, see [2]}\,]. \tag{4}$$
We propose the following theorem for the predicted parity of the Super-Mix function.

Theorem 2:
Let $I_i \in GF(2^8)$ and $O_i \in GF(2^8)$, $0 \le i \le 15$, be the 16-byte input and output of the Super-Mix function in Fugue, respectively. Then, the predicted parity for this function, i.e., $\hat{P}_{SM}$, is derived as follows (we note that parity is just an example and any other detecting codes can be utilized)
$$\hat{P}_{SM} = \{03\}_h (I_0 + I_5 + I_{10} + I_{15}), \tag{5}$$
where the multiplication is performed using the irreducible polynomial $M(x) = x^8 + x^4 + x^3 + x + 1$.

Proof. We add the elements of the columns of $N$ to reach the predicted parity $\hat{P}_{SM}$. Adding the elements in any column except columns 0, 5, 10, and 15 results in zero. For instance, if one adds the seven nonzero elements in column 1 of $N$ (modulo 2), the result is zero. For columns 0, 5, 10, and 15, the addition of the elements results in $\{01\}_h + \{01\}_h + \{04\}_h + \{07\}_h = \{04\}_h + \{07\}_h = \{03\}_h$, and this completes the proof.

We note that the multiplication with $\{03\}_h = \{01\}_h + \{02\}_h$ is derived by the addition of $I_0 + I_5 + I_{10} + I_{15}$ with $x(I_0 + I_5 + I_{10} + I_{15}) \bmod M(x)$.

IV. SIMULATION RESULTS AND ASIC IMPLEMENTATIONS
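Before the fault-injection results, the closed forms of Theorems 1 and 2 in Section III-B can be sanity-checked in software. The following Python sketch is only an illustration under stated assumptions: the TIX and CMIX word indices are taken from the Fugue-256 specification [2] as we understand it, and the Super-Mix map is written as $U \mapsto M U + \mathrm{diag}(\sum_{j \ne i} U_{i,j}) M^T$ with a circulant $4 \times 4$ matrix $M$ whose first row is (1, 4, 7, 1), which is also an assumption about [2] rather than something stated here. The final byte rotation of Super-Mix is omitted because it cannot change the modulo-2 sum of the output bytes. All function names are hypothetical.

```python
import random

def signature(values):
    """Modulo-2 sum (XOR) of all words or bytes in an iterable."""
    acc = 0
    for v in values:
        acc ^= v
    return acc

def tix(s, m):
    """TIX for Fugue-256 (indices assumed from [2]); m is the message word."""
    s = list(s)
    s[10] ^= s[0]
    s[0] = m
    s[8] ^= s[0]
    s[1] ^= s[24]
    return s

def ror3(s):
    """ROR3: rotate the 30-word state three positions to the right."""
    return s[-3:] + s[:-3]

def cmix(s):
    """CMIX for Fugue-256 (indices assumed from [2])."""
    s = list(s)
    for dst, src in ((0, 4), (1, 5), (2, 6), (15, 4), (16, 5), (17, 6)):
        s[dst] ^= s[src]
    return s

def gf_mul(a, b):
    """Byte multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

# Assumed circulant Super-Mix base matrix; every column adds up to {03}_h.
M = ((1, 4, 7, 1), (1, 1, 4, 7), (7, 1, 1, 4), (4, 7, 1, 1))

def super_mix(inp):
    """Assumed Super-Mix on 16 bytes stored column by column: inp[4*j + i] = U[i][j]."""
    u = [[inp[4 * j + i] for j in range(4)] for i in range(4)]
    # d[i] is the modulo-2 sum of the off-diagonal bytes of row i.
    d = [signature(u[i][j] for j in range(4) if j != i) for i in range(4)]
    out = []
    for k in range(4):              # output column
        for i in range(4):          # output row
            v = gf_mul(M[k][i], d[i])
            for r in range(4):
                v ^= gf_mul(M[i][r], u[r][k])
            out.append(v)
    return out

def predicted_parity(inp):
    """Closed form of Theorem 2: {03}_h times the sum of the diagonal bytes."""
    return gf_mul(0x03, inp[0] ^ inp[5] ^ inp[10] ^ inp[15])

def check_theorems(trials=200, seed=1):
    """Compare both closed forms against direct computation on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = [rng.getrandbits(32) for _ in range(30)]
        m = rng.getrandbits(32)
        if signature(cmix(ror3(tix(s, m)))) != signature(s) ^ s[24]:
            return False
        inp = [rng.getrandbits(8) for _ in range(16)]
        if signature(super_mix(inp)) != predicted_parity(inp):
            return False
    return True
```

Calling `check_theorems()` compares the directly computed signatures with the predicted ones on random states. In hardware, the same comparison is what the error indication flag performs concurrently: the predicted value is derived from the inputs with a handful of XOR gates and checked against the signature of the actual outputs.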
The proposed error detection architectures have been simulated after injecting faults. The proposed architectures have the capability of detecting both permanent and transient faults (this covers both natural and malicious faults). In this paper, we use the stuck-at error model. The objective in using this model is to cover the malicious errors injected by attackers to break the algorithm (by injecting one or more incorrect bits) and to detect natural errors caused by bit flips. The stuck-at error forces one bit (for the single stuck-at error model) or multiple bits (for the multiple stuck-at error model) to be stuck at logic one or zero. This makes the resulting value independent of the error-free intended value.

In fault attacks, single error injection is the ideal case for gaining the maximum information. Nevertheless, due to technological constraints, a more realistic error model is to inject multiple errors. Therefore, for covering both natural errors and fault attacks, multiple errors need to be considered. The proposed diagnosis schemes in this paper are independent of the life-time of errors. Therefore, both permanent and transient stuck-at errors lead to the same error coverage. We also note that intelligent attackers do not get confined to just multiple stuck-at faults and thus the ability to detect single faults is important.

The fault model used to test the proposed architectures is created using external-feedback linear-feedback shift registers (LFSRs) to generate pseudo-random fault vectors that can flip random bits in the output of the gates at random intervals. For the architectures presented, we have injected up to 80,000 faults and recorded the number of errors. We have also used the redundant-basis S-boxes in composite field where applicable. Moreover, the false alarm ratios are derived.
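The fault-injection flow described above can be illustrated in software. The following Python sketch is a simplified stand-in for the gate-level campaign, not a reproduction of it: it assumes a 16-bit external-feedback (Fibonacci) LFSR with the commonly used taps 16, 14, 13, and 11, selects a pseudo-random output bit of one AES MixColumns column, applies both stuck-at-0 and stuck-at-1 to that site, and checks the result with a byte-wide column signature (modulo-2 sum of the input and output bytes). The LFSR width, the seed, the choice of fault sites, and all function names are illustrative assumptions.

```python
def lfsr16(state):
    """One step of a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (assumed)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def xtime(a):
    """Multiply a byte by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def mix_column(col):
    """AES MixColumns applied to one 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3,
        xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3),
    ]

def flag(col_in, col_out):
    """Byte-wide error indication flag; zero when the column is fault-free."""
    f = 0
    for b in col_in + col_out:
        f ^= b
    return f

def campaign(n_sites, seed=0xACE1):
    """Inject stuck-at-0 and stuck-at-1 at n_sites LFSR-selected output bits."""
    state = seed
    injected = errors = detected = 0
    for _ in range(n_sites):
        draws = []
        for _ in range(5):
            state = lfsr16(state)
            draws.append(state)
        col_in = [d & 0xFF for d in draws[:4]]     # pseudo-random input column
        good = mix_column(col_in)
        byte_idx = draws[4] & 3                    # which output byte is hit
        bit = (draws[4] >> 2) & 7                  # which bit of that byte
        for stuck_value in (0, 1):
            faulty = list(good)
            cleared = faulty[byte_idx] & ~(1 << bit) & 0xFF
            faulty[byte_idx] = cleared | (stuck_value << bit)
            injected += 1
            if faulty != good:                     # the fault caused an error
                errors += 1
                if flag(col_in, faulty) != 0:
                    detected += 1
    return injected, errors, detected
```

At every site exactly one of the two stuck-at values differs from the fault-free bit, so `campaign(1000)` injects 2,000 faults of which 1,000 become errors, and each of these single-bit errors changes the signature and is flagged. Multiple simultaneous faults can cancel in a single signature, which is one reason the schemes in this paper allow several byte/word-wide signatures to be combined.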
The error coverage in all the cases is more than 99% (and for the case of single stuck-at faults, 100% if we harden the error indication flag comparison units), with a relatively low ratio of false alarms, i.e., 0.1%-0.3%. As we inject more faults, the difference between the error detection results is comparatively small, showing the relatively high accuracy of the results.

Through ASIC and for the constructions of the algorithms in 256-bit form, we also present the performance and implementation metrics of the presented constructions. The benchmarking is performed for the error detection architectures using the TSMC 65-nm library and Synopsys Design Compiler (shown in Table I for area, frequency, throughput, and efficiency [throughput over GE]). We note that in Table I, in order to make the area results meaningful when switching technologies, we have also provided the NAND-gate equivalency (gate equivalents: GE). This is performed using the area of a NAND gate in the utilized TSMC 65-nm CMOS library, which is 1.41 µm². The results presented in Table I show acceptable overhead (degradation) for performance and implementation metrics. We also note that the utilized platform is merely for benchmark and we expect similar results on field-programmable gate arrays (FPGAs) or different ASIC libraries.

TABLE I
BENCHMARK FOR THE PROPOSED ERROR DETECTION SCHEMES FOR THE HASH ALGORITHMS ON ASIC (65 NM TSMC)

| Algorithm       | Block (bits) | Area [GE]       | Frequency [MHz] | Throughput [Gbps] | Efficiency [kbps/GE] |
| ECHO-256        | 1,536        | 145,912         | 389             | 6.48              | 44.40                |
| Proposed scheme |              | 187,098 (28%)   | 370 (4.9%)      | 6.18 (4.6%)       | 33.03 (25.6%)        |
| Fugue-256       | 32           | 49,040          | 547             | 8.77              | 178.8                |
| Proposed scheme |              | 57,900 (18.1%)  | 519 (5.1%)      | 8.33 (5.1%)       | 141.1 (21.1%)        |

Values in parentheses give the overhead (degradation) of the proposed scheme relative to the original design.

V. CONCLUSIONS
In this paper, we have proposed efficient fault detection schemes by presenting closed formulations for the predicted signatures of different transformations in the hash algorithms ECHO and Fugue. These signatures are derived to achieve low overhead for the specific transformations and can be tailored to include byte/word-wide predicted signatures. Through simulations, we have shown that the proposed fault detection schemes are highly capable of detecting natural hardware failures and are capable of deteriorating the effectiveness of malicious fault attacks. The proposed reliable hardware architectures have also been implemented on the ASIC platform using a 65-nm standard technology to benchmark their hardware and timing characteristics. The high efficiency of the proposed schemes makes the proposed reliable architectures suitable for high-performance applications.

REFERENCES
[1] R. Benadjila, O. Billet, H. Gilbert, G. Macario-Rat, T. Peyrin, M. Robshaw, and Y. Seurin, “ECHO hash function,” available: http://crypto.rd.francetelecom.com/echo/, accessed March 2018.
[2] S. Halevi, W. E. Hall, and C. S. Jutla, “The hash function Fugue,” Cryptology ePrint Archive, IACR, https://eprint.iacr.org/2014/423.pdf, 2014, accessed March 2018.
[3] T. Peyrin, “Improved differential attacks for ECHO and Grøstl,” Cryptology ePrint Archive, Report 2010/223, 2010.
[4] O. Benoît and T. Peyrin, “Side-channel analysis of six SHA-3 candidates,” in Proc. CHES, 2010, pp. 140-157.
[5] J.-L. Beuchat, E. Okamoto, and T. Yamazaki, “A compact FPGA implementation of the SHA-3 candidate ECHO,” Cryptology ePrint Archive, Report 2010/364, 2010.
[6] K. Gaj, E. Homsirikamol, and M. Rogawski, “Fair and comprehensive methodology for comparing hardware performance of fourteen round two SHA-3 candidates using FPGAs,” in Proc. CHES, 2010, pp. 264-278.
[7] S. Tillich, M. Feldhofer, M. Kirschbaum, T. Plos, J.-M. Schmidt, and A. Szekely, “Uniform evaluation of hardware implementations of the round-two SHA-3 candidates,” The Second SHA-3 Candidate Conference, Aug. 2010.
[8] K. Järvinen, “Sharing resources between AES and the SHA-3 second round candidates Fugue and Grøstl,” The Second SHA-3 Candidate Conference, Aug. 2010.
[9] M. Mozaffari Kermani and A. Reyhani-Masoleh, “Concurrent Structure-Independent Fault Detection Schemes for the Advanced Encryption Standard,” IEEE Trans. Computers, vol. 59, no. 5, pp. 608-622, May 2010 (special issue on System Level Design of Reliable Architectures).
[10] M. Mozaffari Kermani and A. Reyhani-Masoleh, “A High-Performance Fault Diagnosis Approach for the AES SubBytes Utilizing Mixed Bases,” in Proc. IEEE Workshop Fault Diagnosis and Tolerance in Cryptography (FDTC), pp. 80-87, Nara, Japan, Sep. 2011.
[11] M. Mozaffari Kermani and A. Reyhani-Masoleh, “A Lightweight High-Performance Fault Detection Scheme for the Advanced Encryption Standard Using Composite Fields,” IEEE Trans. Very Large Scale Integrated (VLSI) Systems, vol. 19, no. 1, pp. 85-91, Jan. 2011.
[12] M. Mozaffari Kermani, V. Singh, and R. Azarderakhsh, “Reliable low-latency Viterbi algorithm architectures benchmarked on ASIC and FPGA,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 64, no. 1, pp. 208-216, 2017.
[13] M. Mozaffari Kermani, R. Azarderakhsh, and A. Aghaie, “Fault detection architectures for post-quantum cryptographic stateless hash-based secure signatures benchmarked on ASIC,” ACM Trans. Embedded Computing Syst. (special issue on Embedded Device Forensics and Security: State of the Art Advances), vol. 16, no. 2, pp. 59:1-19, Dec. 2016.
[14] S. Patranabis, A. Chakraborty, D. Mukhopadhyay, and P. P. Chakrabarti, “Fault space transformation: A generic approach to counter differential fault analysis and differential fault intensity analysis on AES-like block ciphers,” IEEE Trans. Information Forensics and Security, vol. 12, no. 5, May 2017.
[15] M. Mozaffari Kermani, R. Azarderakhsh, C. Lee, and S. Bayat-Sarmadi, “Reliable concurrent error detection architectures for extended Euclidean-based division over $GF(2^m)$,” IEEE Trans. Very Large Scale Integrated (VLSI) Systems, vol. 22, no. 5, pp. 995-1003, May 2014.
[16] M. Mozaffari Kermani, R. Azarderakhsh, and A. Aghaie, “Reliable and error detection architectures of Pomaranch for false-alarm-sensitive cryptographic applications,” IEEE Trans. Very Large Scale Integrated (VLSI) Systems, vol. 23, no. 12, pp. 2804-2812, Dec. 2015.
[17] M. Mozaffari Kermani, K. Tian, R. Azarderakhsh, and S. Bayat-Sarmadi, “Fault-resilient lightweight cryptographic block ciphers for secure embedded systems,” IEEE Embedded Systems, vol. 6, no. 4, pp. 89-92, Dec. 2014.
[18] S. Bayat-Sarmadi, M. Mozaffari Kermani, and A. Reyhani-Masoleh, “Efficient and concurrent reliable realization of the secure cryptographic SHA-3 algorithm,” IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 33, no. 7, pp. 1105-1109, Jul. 2014.
[19] A. Hodjat and I. Verbauwhede, “Area-Throughput Trade-Offs for Fully Pipelined 30 to 70 Gbits/s AES Processors,” IEEE Trans. Computers, vol. 55, no. 4, pp. 366-372, April 2006.
[20] A. Satoh, S. Morioka, K. Takano, and S. Munetoh, “A compact Rijndael hardware architecture with S-Box optimization,” in Proc. ASIACRYPT, 2001, pp. 239-254.
[21] D. Canright, “A very compact S-Box for AES,” in