Efficient Parallel Verification of Galois Field Multipliers

Abstract

Galois field (GF) arithmetic is used to implement critical arithmetic components in communication and security-related hardware, and verification of such components is of prime importance. Current techniques for formally verifying such components are based on computer algebra methods that proved successful in verification of integer arithmetic circuits. However, these methods are sequential in nature and do not offer any parallelism. This paper presents an algebraic functional verification technique of gate-level $GF(2^m)$ multipliers, in which verification is performed in bit-parallel fashion. The method is based on extracting a unique polynomial in Galois field of each output bit independently. We demonstrate that this method is able to verify an n-bit GF multiplier in n threads. Experiments performed on pre- and post-synthesized Mastrovito and Montgomery multipliers show high efficiency up to 571 bits.


arXiv preprint [cs.SC]

Cunxi Yu, Maciej Ciesielski
ECE Department, University of Massachusetts, Amherst


Keywords: Formal verification; Galois field arithmetic circuits; computer algebra.

I. INTRODUCTION

Galois field (GF) arithmetic is used to implement critical arithmetic components in communication and security-related hardware. It has been extensively applied in many digital signal processing and security applications, such as Elliptic Curve Cryptography (ECC), Advanced Encryption Standard (AES), and others. Multiplication is one of the most heavily used Galois field computations, and it is a high-complexity operation. Specifically, in cryptography systems, the size of Galois field circuits can be very large. Therefore, developing general formal analysis techniques for Galois field arithmetic HW/SW implementations becomes critical. Contemporary formal techniques, such as Binary Decision Diagrams (BDDs), Boolean Satisfiability (SAT), Satisfiability Modulo Theories (SMT), etc., are not directly applicable to either the verification or reverse engineering of Galois field arithmetic. The limitations of these techniques when applied to Galois field arithmetic have been addressed in [1].

The most successful techniques for verifying arithmetic circuits use computer algebra techniques with polynomial representations [1][2][3][4]. The verification problem is typically formulated as proving that the implementation satisfies the specification. This is accomplished by performing a series of divisions of the specification polynomial $F$ by the implementation polynomials $B = \{f_1, \dots, f_s\}$, representing components that implement the circuit. The technique based on Gröbner basis demonstrated that this approach can efficiently reduce the complexity of the verification problem to membership testing of the specification polynomial in the ideals [1][3]. This technique has been applied successfully to large Galois field arithmetic circuits [1]. Symbolic computer algebra methods have been used to derive word-level operations for GF circuits and integer arithmetic circuits to improve the verification performance [5][6]. A different approach to arithmetic verification of synthesized gate-level circuits has been proposed in [4]. This method uses algebraic rewriting of the polynomials at the primary outputs to extract the specification as a polynomial at the primary inputs.

However, a common limitation of all these works is that they are not applicable to parallel verification. This is because the verification problem based on computer algebra techniques expresses the specification as a polynomial in all output bits. In this approach, the polynomial division can be done only in a single thread. In principle, multiple specifications (called output signature in [4]) can be generated by splitting the output signature. However, we examined this method and found the performance to be very poor.
The reason is that the technique of [4] needs to rewrite the entire output signature in all the output bits to benefit from large monomial cancellations during rewriting. In other works [1][5], the verification problem of post-synthesized Galois field multipliers has not been addressed.

In this work, we extend the verification technique of [4] to verification of Galois field multipliers, while applying bit-level parallelism. Specifically:

• We propose an algorithm for Galois field arithmetic verification, which significantly reduces the internal expression size during algebraic rewriting.
• We evaluate our approach using benchmarks used in [1][5], including Mastrovito and Montgomery multipliers, up to 571 bits. The results show that the efficiency of our approach surpasses that of [1] and [5].
• We demonstrate that the verification of an n-bit Galois field multiplier can be accomplished ideally in n parallel threads. In this work, we set the number of threads to 5, 10, 20, and 30. We also analyze the efficiency of our parallel approach by studying the tradeoff between CPU runtime and memory usage.
• We address the verification of synthesized Galois field multipliers, while previous work dealt only with the verification of structural representations (arithmetic netlists) prior to synthesis.

II. BACKGROUND

Different variants of canonical, graph-based representations have been proposed for arithmetic circuit verification, including Binary Decision Diagrams (BDDs) [7], Binary Moment Diagrams (BMDs) [8], Taylor Expansion Diagrams (TED) [9], and other hybrid diagrams. While the canonical diagrams have been used extensively in logic synthesis, high-level synthesis and verification, their application to verifying large arithmetic circuits remains limited by the prohibitively high memory requirement for complex arithmetic circuits [4][1]. Alternatively, arithmetic verification problems can be modeled and solved using Boolean satisfiability (SAT) or satisfiability modulo theories (SMT). However, it has been demonstrated that these techniques cannot efficiently solve the verification problem of large arithmetic circuits [1][10]. Another class of solvers includes Theorem Provers, deductive systems for proving that an implementation satisfies the specification, using mathematical reasoning. However, this technique requires manual guidance, which makes it difficult to apply automatically.

A. Computer Algebra Approaches

The most advanced techniques that have the potential to solve arithmetic verification problems are those based on symbolic Computer Algebra. These methods model the arithmetic circuit specification and its hardware implementation as polynomials [1][2][4][5][11].

The verification goal is to prove that the implementation satisfies the specification by performing a series of divisions of the specification polynomial $F$ by the implementation polynomials $B = \{f_1, \dots, f_s\}$, representing components that implement the circuit. The polynomials $f_1, \dots, f_s$ are called the bases, or generators, of the ideal $J$. Given a set $f_1, \dots, f_s$ of generators of $J$, the set of all simultaneous solutions to the system of equations $f_1(x_1, \dots, x_n) = 0; \dots; f_s(x_1, \dots, x_n) = 0$ is called a variety $V(J)$. The verification problem is then formulated as testing if the specification $F$ vanishes on $V(J)$. In some cases, the test can be simplified to checking if $F \in J$, which is known in computer algebra as ideal membership testing [1].

There are two basic techniques to reduce polynomial $F$ modulo $B$. A standard procedure to test if $F \in J$ is to divide polynomial $F$ by the elements of $B$: $f_1, \dots, f_s$, one by one. The goal is to cancel, at each iteration, the leading term of $F$ using one of the leading terms of $f_1, \dots, f_s$. If the remainder of the division is $r = 0$, then $F$ vanishes on $V(J)$, proving that the implementation satisfies the specification. However, if $r \neq 0$, such a conclusion cannot be made: $B$ may not be sufficient to reduce $F$ to 0, and yet the circuit may be correct. To check if $F$ is reducible to zero, a canonical set of generators, $G = \{g_1, \dots, g_t\}$, called a Gröbner basis, is needed. This technique has been successfully applied to Galois field arithmetic [1] and integer arithmetic circuits [3].

A different approach has been proposed in [4][12][6][13][14], where a gate-level network is described by a system of equations and proved by backward rewriting. Starting with the known output signature (polynomial) in primary output variables, it rewrites the signature from the primary outputs to the primary inputs, to extract an arithmetic function (specification).

Verification work specific to Galois field arithmetic has been presented in [1][5]. These works provide significant improvement compared to other techniques, since their formulations rely on certain simplifying properties of Galois fields during polynomial reductions. Specifically, the problem reduces to ideal membership testing over a larger ideal that includes $J_0 = \langle x^2 - x \rangle$ in $\mathbb{F}_2$. In this paper, we provide a comparison between this technique and our approach.

B. Galois Field Multiplication

Galois field (GF) is a number system with a finite number of elements and two main arithmetic operations, addition and multiplication; other operations can be derived from those two [15]. A Galois field with $p$ elements is denoted as $GF(p)$. The most widely used finite fields are Prime Fields and Extension Fields, and particularly binary extension fields. A prime field, denoted $GF(p)$, is a finite field consisting of a finite number of integers $\{0, 1, \dots, p-1\}$, where $p$ is a prime number, with addition and multiplication performed modulo $p$. A binary extension field, denoted $GF(2^m)$ (or $\mathbb{F}_{2^m}$), is a finite field with $2^m$ elements; unlike in prime fields, however, the operations in extension fields are not computed modulo $2^m$. Instead, in one possible representation (called polynomial basis), each element of $GF(2^m)$ is a polynomial with $m$ terms and coefficients in $GF(2)$. Addition of field elements is the usual addition of polynomials, with coefficient arithmetic performed modulo 2. For example, a 2-bit vector $A = \{a_0, a_1\}$ in $GF(2^2)$ is $A(x) = a_0 + a_1 x$, where $a_i \in GF(2) = \{0, 1\}$. Multiplication of field elements is performed modulo an irreducible polynomial $P(x)$ of degree $m$ with coefficients in $GF(2)$. For example, $P(x) = x^2 + x + 1$ is an irreducible polynomial in $GF(2^2)$. The irreducible polynomial $P(x)$ is analogous to the prime number $p$ in prime fields $GF(p)$. Extension fields are used in many cryptography applications, such as AES and ECC. In this work, we focus on the verification problem of $GF(2^m)$ multipliers.

An example of multiplication in $GF(2^2)$ is shown in Figure 1. The left part of the figure shows a standard 2-bit integer multiplication with four output bits. To represent the result in $GF(2^m)$, which for $m = 2$ can contain only two bits, the upper bits $r_2$ and $r_3$ are reduced in $GF(2^2)$. The result of such a reduction is shown in the right part of the figure. The input and output operands are represented using polynomials $A(x)$, $B(x)$ and $Z(x)$. The functions of $s_0$, $s_1$ and $s_2$ are represented using polynomials in $GF(2)$: $s_0 = a_0 b_0$, $s_1 = a_1 b_0 + a_0 b_1$, and $s_2 = a_1 b_1$. Hence, $z_0 = s_0 + s_2$ and $z_1 = s_1 + s_2$. As a result, the coefficients of the multiplication are: $z_0 = a_0 b_0 + a_1 b_1$, $z_1 = a_0 b_1 + a_1 b_0 + a_1 b_1$. In digital circuits, partial products can be implemented using AND gates, and addition modulo 2 using XOR gates. Note that, unlike in integer multiplication, in $GF(2^m)$ circuits there is no carry out to the next bit. For this reason, as we can see in Figure 1, the function of each output bit is computed independently of other bits.

[Figure 1: partial-product arrays for a) integer multiplication and b) $GF(2^2)$ multiplication]
Fig. 1: 2-bit multiplication: a) standard integer multiplication with 4-bit result; b) multiplication in $GF(2^2)$ with $A(x) = a_0 + a_1 x$, $B(x) = b_0 + b_1 x$ and result $Z(x) = z_0 + z_1 x \equiv A(x) \cdot B(x) \bmod P(x)$; irreducible polynomial $P(x) = x^2 + x + 1$.

C. Function Extraction

Function extraction is an arithmetic verification method proposed in [4] for integer arithmetic circuits, in $\mathbb{Z}_{2^m}$. It extracts a unique bit-level polynomial function implemented by the circuit directly from its gate-level implementation. Extraction is done by backward rewriting, i.e., transforming the polynomial representing the encoding of the primary outputs (called the output signature) into a polynomial at the primary inputs (the input signature). This technique has been successfully applied to large integer arithmetic circuits, such as 512-bit integer multipliers. However, it cannot be directly applied to large GF multipliers because of the exponential number of intermediate polynomial terms before cancellations during rewriting. Fortunately, arithmetic $GF(2^m)$ circuits offer an inherent parallelism which can be exploited in backward rewriting. In the rest of the paper, we show how to apply such parallel rewriting in $GF(2^m)$ circuits while avoiding the memory explosion experienced in integer arithmetic circuits.

III. PRELIMINARIES

A. Computer Algebraic model

The circuit is modeled as a network of logic elements of arbitrary complexity, including basic logic gates (AND, OR, XOR, INV) and complex standard cell gates (AOI, OAI, etc.) obtained by synthesis and technology mapping. Instead of modeling Boolean operators using pseudo-Boolean equations, we use the algebraic models in $GF(2)$, i.e. modulo 2. For example, the pseudo-Boolean model XOR$(a, b) = a + b - 2ab$ is reduced to $(a + b - 2ab) \bmod 2 = (a + b) \bmod 2$. The following algebraic equations are used to describe basic logic gates in $GF(2^m)$, according to [1] (for polynomials in $GF(2)$, "+" is computed modulo 2):

$$
\begin{aligned}
\neg a &= 1 + a \\
a \wedge b &= a \cdot b \\
a \vee b &= a + b + a \cdot b \\
a \oplus b &= a + b
\end{aligned}
\tag{1}
$$

B. Outline of the Approach

Similarly to the work of [4], the computed function of the circuit is specified by two polynomials. The output signature of a $GF(2^m)$ multiplier is $Sig_{out} = \sum_{i=0}^{m-1} z_i x^i$, with $z_i \in GF(2)$. The input signature of a $GF(2^m)$ multiplier is $Sig_{in} = \sum_{i=0}^{m-1} P_i x^i$, with coefficients $P_i \in GF(2)$ being sums of product terms, with the addition operation performed modulo 2 (e.g. $(a_0 b_0 + a_1 b_1) \bmod 2$). For a $GF(2^m)$ multiplier, if the irreducible polynomial $P(x)$ is provided, $Sig_{in}$ is known. Our goal is to transform the output signature, $Sig_{out}$, using the polynomial representation of the internal logic elements, into the input signature $Sig_{in}$ in $GF(2^m)$. The goal of the verification problem is then to check if $Sig_{in} = Sig_{out}$, with $Sig_{out}$ expressed in the primary inputs.
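As a minimal sketch of why $Sig_{in}$ is known once $P(x)$ is given, the following Python snippet derives each coefficient $P_i$ as a set of product terms $a_j b_k$. The function names `input_signature` and `show`, and the encoding of a product term as an index pair `(j, k)`, are illustrative assumptions and are not part of the tool described in this paper.

```python
from itertools import product

def input_signature(m, p_exponents):
    """Return [P_0, ..., P_{m-1}], where P_i is the set of index pairs (j, k)
    whose product terms a_j*b_k are XOR-ed to form output bit z_i.

    p_exponents lists the exponents of the nonzero terms of the irreducible
    polynomial P(x), e.g. (2, 1, 0) for x^2 + x + 1.
    """
    assert max(p_exponents) == m, "P(x) must have degree m"
    low = {e for e in p_exponents if e != m}  # x^m = sum of these powers (mod P)

    # red[d] = set of exponents e < m such that x^d = sum of x^e (mod P(x)).
    red = {d: {d} for d in range(m)}
    for d in range(m, 2 * m - 1):
        acc = set()
        for e in red[d - 1]:          # x^d = x * x^(d-1)
            acc ^= {e + 1} if e + 1 < m else low
        red[d] = acc

    coeffs = [set() for _ in range(m)]
    for j, k in product(range(m), repeat=2):
        for e in red[j + k]:          # a_j*b_k*x^(j+k) reduced modulo P(x)
            coeffs[e] ^= {(j, k)}     # symmetric difference = addition in GF(2)
    return coeffs

def show(coeffs):
    """Format the signature as text, one line per output bit."""
    return [f"z{i} = " + " + ".join(f"a{j}b{k}" for j, k in sorted(c))
            for i, c in enumerate(coeffs)]

if __name__ == "__main__":
    for line in show(input_signature(2, (2, 1, 0))):
        print(line)
```

For $m = 2$ and $P(x) = x^2 + x + 1$ the sketch yields $z_0 = a_0 b_0 + a_1 b_1$ and $z_1 = a_0 b_1 + a_1 b_0 + a_1 b_1$, which agrees with the coefficients derived in the 2-bit example of Section II-B. The same function can be called with other values of $m$ and other irreducible polynomials.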

Theorem 1: Given a combinational $GF(2^m)$ arithmetic circuit, composed of logic gates described by algebraic expressions (Eq. 1), the input signature $Sig_{in}$ computed by backward rewriting is unique and correctly represents the function implemented by the circuit in $GF(2^m)$.

Proof: The proof of correctness relies on the fact that each transformation step (rewriting iteration) is correct. That is, each internal signal is represented by an algebraic expression, which always evaluates to a correct value in $GF(2^m)$. This is guaranteed by the correctness of the algebraic model in Eq. (1), which can be proved easily by inspection. For example, the algebraic expression of XOR$(a, b)$ in $\mathbb{Z}_{2^m}$ is $a + b - 2ab$. When implemented in $GF(2^m)$, the coefficients in the expression must be in $GF(2)$. Hence, XOR$(a, b)$ in $GF(2^m)$ is represented by $a + b$. The proof of uniqueness is done by induction on $i$, the step of transforming polynomial $F_i$ into $F_{i+1}$. A detailed induction proof for expressions in $\mathbb{Z}_{2^m}$ is provided in [4]. □

Theorem 2: Let the number of logic elements (polynomials) in a $GF(2^m)$ multiplier be $n$. Over its iterations, the backward rewriting process generates $n$ internal expressions, $F_0, F_1, \dots, F_{n-1}$, such that every expression $F_i \in GF(2^m)$.

Proof: Assuming that $F_0 = Sig_{out}$ and each $F_i \in GF(2^m)$, we prove that $F_{i+1} \in GF(2^m)$. Each variable in $F_i$ represents the output of some logic gate. During the rewriting process, this variable is substituted by a corresponding polynomial in Eq. (1). According to Theorem 1, the resulting polynomial $F_{i+1}$ correctly represents the function, so $F_{i+1} \in GF(2^m)$. □

Theorems 1 and 2, together with the algebraic model in Eq. (1), provide the basis for polynomial reduction in backward rewriting in this work. This is described by Algorithm 1. Our method takes the gate-level netlist of a $GF(2^m)$ multiplier as input and first converts each logic gate into equations using Eq. (1). The output signature $Sig_{out}$ is required to initialize the backward rewriting. The rewriting process starts with $F_0 = Sig_{out}$, and ends when all the variables in $F_i$ are primary inputs. This is done by rewriting the polynomials representing logic elements in the netlist in a topological order [4]. Each iteration includes two steps: Step 1) substitute the variable of the gate output using the expression in the inputs of the gate (Eq. 1), and name the new expression $F_{i+1}$ (lines 3 - 6); Step 2) simplify the new expression by removing all the monomials (including constants) that evaluate to 0 in $GF(2)$ (line 3 and lines 7 - 10). The algorithm outputs the function of the design in $GF(2^m)$ after $n$ iterations, where $n$ is the number of gates in the netlist. The final expression $F_n$ can be used for functional verification, by checking if it matches the expected input signature (if provided).

Algorithm 1: Backward Rewriting in $GF(2^m)$
Input: Gate-level netlist of $GF(2^m)$ multiplier
Input: Output signature $Sig_{out}$, and (optionally) input signature $Sig_{in}$
Output: GF function of the design, and answer whether $Sig_{out} == Sig_{in}$
 1: $P = \{p_1, p_2, \dots, p_n\}$: polynomials representing gate-level netlist
 2: $F_0 = Sig_{out}$
 3: for each polynomial $p_i \in P$ do
 4:   for output variable $v$ of $p_i$ in $F_i$ do
 5:     replace every variable $v$ in $F_i$ by the expression of $p_i$
 6:     $F_i \to F_{i+1}$
 7:     for each element/monomial $M$ in $F_{i+1}$ do
 8:       if the coefficient of $M$ % 2 == 0, or $M$ is constant and $M$ % 2 == 0, then
 9:         remove $M$ from $F_{i+1}$
10:       end if
11:     end for
12:   end for
13: end for
14: return $F_n$ and $F_n =? Sig_{in}$

[Figure 2: gate-level schematic with gates G1 to G8, inputs $a_0, a_1, b_0, b_1$ and outputs $z_0, z_1$]
Fig. 2: The gate-level netlist of a post-synthesized and mapped 2-bit multiplier over $GF(2^2)$. The irreducible polynomial is $P(x) = x^2 + x + 1$.

Example 1 (Figure 2): We illustrate our method using a post-synthesized 2-bit multiplier in $GF(2^2)$, shown in Figure 2. The irreducible polynomial is $P(x) = x^2 + x + 1$. The output signature is $Sig_{out} = z_0 + z_1 x$, and the input signature is $Sig_{in} = (a_0 b_0 + a_1 b_1) + (a_0 b_1 + a_1 b_0 + a_1 b_1) x$. First, $F_0 = Sig_{out}$ is transformed into $F_1$ using the polynomial of gate G7, which expresses $z_1$ as the sum of two internal signals; the expression is simplified to $F_1 = z_0 + x(n_i + n_j)$, where $n_i$ and $n_j$ denote the two inputs of G7. Then, the polynomials $F_{i+1}$ are successively derived from $F_i$ and checked for a possible reduction. The first reduction happens when $F_4$ is transformed into $F_5$, where an internal signal (the output of gate G4) is replaced by a polynomial of the form $(1 + a_i b_j)$. After simplification, a monomial $2x$ is identified and removed from $F_5$ since 2 % 2 = 0. Similar reductions are applied during the transformations $F_6 \to F_7$ and $F_7 \to F_8$. Finally, the function of the design is extracted by Algorithm 1.
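The rewriting loop of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' C++ implementation: the wire names `n1` to `n6` and the gate connectivity in `NETLIST` are a hypothetical netlist chosen to implement the same 2-bit function (the exact wiring of Figure 2 is not assumed), NAND is modeled as $1 + a \cdot b$ by combining the NOT and AND models of Eq. (1), and Boolean signals are treated as idempotent ($a \cdot a = a$).

```python
from collections import defaultdict

ONE = frozenset()  # the constant monomial 1

# Gate polynomials over GF(2), following Eq. (1); NAND is derived as NOT(AND).
def XOR(a, b):  return [frozenset({a}), frozenset({b})]      # a + b
def INV(a):     return [ONE, frozenset({a})]                 # 1 + a
def NAND(a, b): return [ONE, frozenset({a, b})]              # 1 + a*b

# Hypothetical 2-bit netlist (an assumption), listed in rewriting order,
# i.e. from the primary outputs toward the primary inputs.
NETLIST = [
    ("G7", "z1", XOR("n5", "n6")),
    ("G6", "z0", XOR("n1", "n2")),
    ("G5", "n5", XOR("n2", "n3")),
    ("G8", "n6", INV("n4")),
    ("G4", "n4", NAND("a1", "b0")),
    ("G3", "n3", NAND("a0", "b1")),
    ("G2", "n2", NAND("a1", "b1")),
    ("G1", "n1", NAND("a0", "b0")),
]

def rewrite(F, var, gate_poly):
    """One iteration: substitute `var` in F, then drop even coefficients."""
    G = defaultdict(int)
    for (mono, deg), coef in F.items():
        if var in mono:
            rest = mono - {var}
            for term in gate_poly:
                G[(rest | term, deg)] += coef   # set union: a*a = a
        else:
            G[(mono, deg)] += coef
    removed = {key: c for key, c in G.items() if c % 2 == 0}
    kept = {key: 1 for key, c in G.items() if c % 2 == 1}
    return kept, removed

def backward_rewrite(sig_out, netlist):
    """Rewrite sig_out through all gates; also log gates where terms cancel."""
    F, log = dict(sig_out), []
    for gate, var, poly in netlist:
        F, removed = rewrite(F, var, poly)
        if removed:
            log.append(gate)
    return F, log

# Sig_out = z0 + z1*x, stored as {(monomial, degree of x): coefficient}.
SIG_OUT = {(frozenset({"z0"}), 0): 1, (frozenset({"z1"}), 1): 1}
SIG_IN = {
    (frozenset({"a0", "b0"}), 0): 1, (frozenset({"a1", "b1"}), 0): 1,
    (frozenset({"a0", "b1"}), 1): 1, (frozenset({"a1", "b0"}), 1): 1,
    (frozenset({"a1", "b1"}), 1): 1,
}

F_n, eliminations = backward_rewrite(SIG_OUT, NETLIST)
print(F_n == SIG_IN, eliminations)
```

With this assumed netlist, the extracted polynomial equals $Sig_{in}$ and cancellations are logged at gates G4, G2 and G1. Keeping the degree of $x$ in the dictionary key means that terms attached to different powers of $x$ never share a key and therefore never cancel each other.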
A complete rewriting process is shown in Figure 3. We can see that $F_8 = Sig_{in}$, which indicates that the circuit indeed implements the $GF(2^2)$ multiplication with $P(x) = x^2 + x + 1$.

| Step | Rewritten gate | Eliminated term |
|---|---|---|
| $F_0 = z_0 + x z_1$ | ($Sig_{out}$) | - |
| $F_1$ | G7 | - |
| $F_2$ | G6 | - |
| $F_3$ | G5 | - |
| $F_4$ | G8 | - |
| $F_5$ | G4 | $2x$ |
| $F_6$ | G3 | - |
| $F_7$ | G2 | $2x$ |
| $F_8 = a_0 b_0 + a_1 b_1 + x(a_0 b_1 + a_1 b_0 + a_1 b_1)$ | G1 | $2$ |

Fig. 3: Function extraction of a 2-bit GF multiplier shown in Figure 2 using backward rewriting from PO to PI.

An important observation is that the potential reductions take place only within the expression associated with the same power of $x$ in $Sig_{out}$ (which is a polynomial in $x$). In other words, the reductions happen in the logic cone of every output bit independently of other bits, regardless of logic sharing between the cones. For example, the reductions in $F_5$ and $F_7$ are extracted from output $z_1$ only. Similarly, in $F_8$, the reduction is from $z_0$.

Theorem 3: Given a $GF(2^m)$ multiplier with $Sig_{out} = F_0 = z_0 x^0 + z_1 x^1 + \dots + z_m x^m$, and $F_i = E_0 x^0 + E_1 x^1 + \dots + E_m x^m$, where $E_i$ is an algebraic expression in $GF(2)$ obtained during rewriting. Then, the polynomial reduction is possible only within a single expression $E_i$, for $i = 1, 2, \dots, m$.

Proof: Consider a polynomial $E_i x^{n_i} + E_k x^{n_k}$, where $E_i$ and $E_k$ are simplified in $GF(2)$. That is, $E_i = (e_i^1 + e_i^2 + \dots)$ and $E_k = (e_k^1 + e_k^2 + \dots)$. After simplifying each of the two polynomials, there are no common monomials between $E_i x^{n_i}$ and $E_k x^{n_k}$. This is because $e_i^l x^{n_i} \neq e_k^j x^{n_k}$ for any pairs $(i, k)$ and $(l, j)$. □

IV. IMPLEMENTATION

[Figure 4: flow from the gate-level netlist, through "Netlist to Equations" and the splitting of $Sig_{out}$ into $Sig_{out} = z_0, z_1, z_2, \dots, z_m$ handled by threads 1 to m, to "Compute final function" and "Return $F_n$"]
Fig. 4: Overview of parallel verification of GF multipliers.

This section describes the implementation of our parallel verification method for Galois field multipliers. The overview of the proposed technique is shown in Figure 4. Our approach takes the gate-level netlist as input, and outputs the extracted function of the design. It includes four steps:

1) Convert the gate-level netlist into algebraic equations. During this step, the gate-level netlist is translated into algebraic equations based on Eq. (1). The equations are levelized in topological order, to be rewritten by backward rewriting in the next step.
2) Split the output signature of the $GF(2^m)$ multiplier into $m$ polynomials with $Sig_{out_i} = z_i$. These new signatures are represented by $m$ equation files.
3) Split the function of $m$ output bits into $m$ separate functions, each to be processed by a separate thread using Algorithm 1. In contrast to the work of [4], the internal expression of each output bit does not offer any polynomial reduction (monomial cancellations) for other bits.
4) Compute the final function of the multiplier. Once the algebraic expression of each output bit in $GF(2)$ is computed, our method computes the final function by assembling the per-bit results of the rewriting process in step 3 into a single signature.
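The four steps above can be sketched as follows, with one rewriting job per output bit. This is a hedged illustration, not the paper's C++ tool: the netlist, the wire names and the helper names `extract_bit` and `extract_parallel` are assumptions, and each per-bit expression is stored as a set of monomials so that addition in $GF(2)$ becomes a symmetric difference. In CPython, threads mainly illustrate the independence of the jobs; they are not expected to reproduce the speedups reported in this paper.

```python
from concurrent.futures import ThreadPoolExecutor

ONE = frozenset()  # the constant monomial 1

# Step 1: hypothetical netlist already converted to equations (Eq. (1)) and
# listed in rewriting order, as (gate output, list of monomials).
NETLIST = [
    ("z1", [frozenset({"n5"}), frozenset({"n6"})]),   # XOR
    ("z0", [frozenset({"n1"}), frozenset({"n2"})]),   # XOR
    ("n5", [frozenset({"n2"}), frozenset({"n3"})]),   # XOR
    ("n6", [ONE, frozenset({"n4"})]),                 # INV
    ("n4", [ONE, frozenset({"a1", "b0"})]),           # NAND = 1 + a*b
    ("n3", [ONE, frozenset({"a0", "b1"})]),           # NAND
    ("n2", [ONE, frozenset({"a1", "b1"})]),           # NAND
    ("n1", [ONE, frozenset({"a0", "b0"})]),           # NAND
]

def extract_bit(out_bit):
    """Steps 2-3: rewrite the signature Sig_out_i = z_i of one output bit."""
    expr = {frozenset({out_bit})}
    for var, poly in NETLIST:
        new = set()
        for mono in expr:
            if var in mono:
                for term in poly:
                    new ^= {(mono - {var}) | term}   # pairs cancel modulo 2
            else:
                new ^= {mono}
        expr = new
    return expr

def extract_parallel(m, threads):
    """Step 4: run one job per output bit and collect results by bit index."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(extract_bit, [f"z{i}" for i in range(m)]))

sig_in = extract_parallel(m=2, threads=2)
for i, expr in enumerate(sig_in):
    print(f"x^{i}:", sorted("".join(sorted(mono)) for mono in expr))
```

Because no job reads the intermediate expression of another job, the number of worker threads can be chosen freely; the list returned by `extract_parallel` holds the coefficient of $x^i$ at index `i`.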

| Rewritten gate | Eliminated in thread $Sig_{out_0} = z_0$ | Eliminated in thread $Sig_{out_1} = x \cdot z_1$ |
|---|---|---|
| G7 | - | - |
| G6 | - | - |
| G5 | - | - |
| G8 | - | - |
| G4 | - | $2x$ |
| G3 | - | - |
| G2 | - | $2x$ |
| G1 | $2$ | - |

Merged result: $Sig_{in} = a_0 b_0 + a_1 b_1 + x(a_0 b_1 + a_1 b_0 + a_1 b_1)$

Fig. 5: Parallel extraction of a 2-bit GF multiplier shown in Figure 2.

Example 2 (Figure 5): We illustrate our parallel extraction method using the 2-bit multiplier in $GF(2^2)$ in Figure 2. The output signature $Sig_{out} = z_0 + z_1 x$ is split into two signatures, $Sig_{out_0} = z_0$ and $Sig_{out_1} = z_1$. Then, the rewriting process is applied to $Sig_{out_0}$ and $Sig_{out_1}$ in parallel. When $Sig_{out_0}$ and $Sig_{out_1}$ have been successfully extracted, the two signatures are merged as $Sig_{out_0} + Sig_{out_1} x$, resulting in the polynomial $Sig_{in}$. In Figure 3, we can see that elimination happens three times ($F_5$, $F_7$, and $F_8$). According to Theorem 3, we know that the elimination happens within each element in $GF(2^n)$. In Figure 5, one elimination in $Sig_{out_0}$ and two eliminations in $Sig_{out_1}$ have been done independently, as shown earlier (Figure 3).

V. RESULTS

The verification technique described in this paper was implemented in C++. It performs backward rewriting with variable substitution and polynomial reductions in Galois field, using the approach discussed in Sections III and IV. The program was tested on a number of combinational gate-level $GF(2^m)$ multipliers taken from [1], including Montgomery multipliers [16] and Mastrovito multipliers [17]. The bit-width of the multipliers varies from 32 to 571 bits. The experiments of verifying Galois field multipliers using SAT, SMT, ABC [18] and Singular [19] have been presented in [1] and [5]. They show that the rewriting technique performs significantly better than other techniques. Hence, in this work, we only compare our approach to those of [1] and [5]. Specifically, we compare our approach to the tool described in [5] on the same benchmark set. Our experiments were conducted on a PC with Intel(R) Xeon CPU E5-2420 v2 2.20 GHz x12 with 32 GB memory. As described in the next section, our technique is able to verify Galois field multipliers in multiple threads (up to 30 using our platform). In each thread, Algorithm 1 is applied to a single output bit. The number of threads is given as input to the tool.

A. Evaluation of Our Approach

The experimental results of our approach and the comparison against [5] are shown in Table I for gate-level Mastrovito multipliers with bit-width varying from 32 to 571 bits. These multipliers are directly mapped using ABC [18] without any optimization. The largest circuit includes more than 1.6 million gates. This is also the number of polynomial equations and the number of rewriting iterations (see Section III). The results generated by the tool presented in [5] are shown in columns 3 and 4. We performed four different series of experiments, with a different number of threads, T = 5, 10, 20, and 30. The runtime results are shown in columns 5 to 8 and memory usage in column 9. The timeout limit (TO) was set to 12 hours and the memory limit (MO) to 32 GB.

| Op size | # gates | [5] runtime (s) | [5] mem | T=5 (s) | T=10 (s) | T=20 (s) | T=30 (s) | Mem, T=1* |
|---|---|---|---|---|---|---|---|---|
| 32 | 5,482 | 0.83 | 3 | 1.90 | 1.54 | 0.95 | 1.09 | 10 MB |
| 48 | 12,228 | 8.39 | 13 | 5.73 | 3.36 | 2.83 | 2.27 | 21 MB |
| 64 | 21,814 | 28.90 | 21 | 11.08 | 7.88 | 6.87 | 6.74 | 37 MB |
| 96 | 51,412 | 195.2 | 45 | 38.14 | 26.69 | 20.19 | 22.66 | 84 MB |
| 128 | 93,996 | 924.3 | 91 | 91.67 | 62.68 | 54.99 | 56.76 | 152 MB |
| 163 | 153,245 | 3546 | 161 | 192.6 | 137.5 | 120.7 | 113.1 | 248 MB |
| 233 | 167,803 | 4933 | 168 | 294.1 | 212.7 | 180.1 | 170.6 | 270 MB |
| 283 | 399,688 | 30358 | 380 | 890.7 | 606.5 | 549.7 | 529.8 | 642 MB |
| 571 | 1,628,170 | TO | - | 7980 | 5038 | MO | MO | |

TABLE I: Results of verifying Mastrovito multipliers using our parallel approach. T is the number of threads. TO = Time out of 12 hours. MO = Memory out of 32 GB. (* T=1 shows the maximum memory usage of each thread.)

| Op size | # gates | [5] runtime (s) | [5] mem | T=5 (s) | T=10 (s) | T=20 (s) | T=30 (s) | Mem, T=1* |
|---|---|---|---|---|---|---|---|---|
| 32 | 4,352 | 1.98 | 3 | 3.49 | 2.16 | 1.31 | 2.08 | 8 MB |
| 48 | 9,602 | 14.19 | 13 | 17.71 | 10.67 | 9.16 | 6.01 | 16 MB |
| 64 | 16,898 | 63.48 | 21 | 44.86 | 30.57 | 28.3 | 27.22 | 27 MB |
| 96 | 37,634 | 554.6 | 45 | 234.3 | 157.8 | 133.1 | 142.3 | 59 MB |
| 128 | 66,562 | 1924 | 68 | 208.9 | 121.3 | 115.8 | 110.4 | 95 MB |
| 163 | 107,582 | 12063 | 101 | 1615.7 | 1172.3 | 1094.9 | 1008.1 | 161 MB |
| 233 | 219,022 | TO | 168 | 722.3 | 564.8 | 457.7 | 479.8 | 301 MB |
| 283 | 322,622 | TO | 380 | 19745 | 17640 | 15300 | 14820 | 488 MB |

TABLE II: Results of verifying Montgomery multipliers using our parallel approach. T is the number of threads. TO = Time out of 12 hours. MO = Memory out of 32 GB. (* T=1 shows the maximum memory usage of each thread.)

The experimental results show that our approach provides on average 26.2x, 37.8x, 42.7x, and 44.3x speedup, for T = 5, 10, 20, and 30 threads, respectively. Our approach can verify multipliers up to 571 bits wide in 1.5 hours, while that of [5] fails after 12 hours.

Note that the reported memory usage of our approach is the maximum memory usage per thread. This means that our tool experiences maximum memory usage with all T threads running in the process; in this case, the memory usage is $T \cdot Mem$. This is why the 571-bit Mastrovito multipliers could be successfully verified with T = 5 and 10, but failed with T = 20 and 30 threads. For example, the peak memory usage of the 571-bit Mastrovito multiplier with T = 20 is 2.6 × 20 = 52 GB, which exceeds the available memory limit.

We also tested Montgomery multipliers with bit-width varying from 32 to 283 bits. These experiments are different from those in [5]. In our work, we first flatten the Montgomery multipliers before applying our verification technique. That is, we assume that only the positions of the primary inputs and outputs are known, without the knowledge of the internal structure or clear boundaries of the blocks inside the design. The results are shown in Table II. For 32- to 163-bit Montgomery multipliers, our approach provides on average a 9.2x, 15.9x, 16.6x, and 17.4x speedup, for T = 5, 10, 20, and 30, respectively. Notice that [5] cannot verify the flattened Montgomery multipliers of 233 bits and larger in 12 hours.

In Table II, we observe that CPU runtime for verifying a 163-bit multiplier is greater than that of a 233-bit multiplier. This is because the computation complexity depends not only on the bit-width of the multipliers, but also on the irreducible polynomial $P(x)$.

To analyze this dependency, we studied the effects of $P(x)$ on 4-bit multiplications implemented using different irreducible polynomials. The results are reported in Figure 6. We can see that when $P_1(x) = x^4 + x^3 + 1$, the longest logic paths, for $z_3$ and $z_0$, include ten and seven products that need to be generated using XORs, respectively. However, when $P_2(x) = x^4 + x + 1$, the two longest paths, $z_1$ and $z_2$, have only seven and six products. This means that the $GF(2^4)$ multiplication requires more XOR operations using $P_1(x)$ than using $P_2(x)$. In other words, the gate-level implementation of the multiplier implemented using $P_1(x)$ has more gates compared to $P_2(x)$. In conclusion, we can see that the irreducible polynomial $P(x)$ has a significant impact on both the design cost and the verification cost of $GF(2^m)$ multipliers.

[Figure 6: partial-product arrays and reduction of $s_0, \dots, s_6$ into $z_0, \dots, z_3$ for $P_1(x) = x^4 + x^3 + 1$ and $P_2(x) = x^4 + x + 1$]
Fig. 6: Analysis of the computation complexity of Galois field multipliers with different irreducible polynomials using two 4-bit GF multiplications, which are implemented using $x^4 + x^3 + 1$ and $x^4 + x + 1$.

B. Runtime and Memory Tradeoff

In this section, we discuss the tradeoff between runtime and memory usage of our approach. The plots in Figure 7 show how the average runtime and memory usage change with the number of threads. The vertical axis on the left is CPU runtime (in seconds), and on the right is memory usage (MB). The horizontal axis represents the number of threads T, ranging from 5 to 30. The runtime is significantly improved for T between 5 and 15. However, there is not much speedup when T is greater than 20, most likely due to the memory management synchronization overhead between the threads. Based on the experiments with Mastrovito multipliers (Table I), our approach is limited by the memory usage when the size of the multiplier and T are large. In our work, T = 20 seems to be the best choice. Obviously, the best T varies across platforms, depending on the number of cores and the memory.

[Figure 7: average runtime (sec) and average memory usage (MB) versus number of threads; series Mas-runtime, Mas-memory, Mont-runtime, Mont-memory]
Fig. 7: Runtime and memory usage of our parallel verification approach as a function of number of threads T.

C. Verification of Synthesized GF Multipliers

In [10], the authors conclude that highly bit-optimized integer arithmetic circuits are harder to verify than their original, pre-synthesized netlists. This is because the efficiency of the rewriting technique relies on the amount of cancellations between the different terms of the polynomial, and the cancellations strongly depend on the order in which signals are rewritten. A good ordering of signals is difficult to achieve in highly bit-optimized circuits.

In order to see the effect of synthesis on parallel verification of GF circuits, we applied our approach to post-synthesized Galois field multipliers with operands up to 409 bits (571-bit multipliers could not be synthesized in a reasonable time). We synthesized Mastrovito and Montgomery multipliers using the ABC tool [18]. We repeatedly used the commands resyn2 and dch until ABC could not reduce the number of levels or the number of nodes any more. The synthesized multipliers were mapped using a 14nm technology library. The verification experiments shown in Table III are performed by our tool with T = 20 threads. Our tool was able to verify both 409-bit Mastrovito and Montgomery multipliers within just 13 minutes.

We observe that the Galois field multipliers are much easier to verify after optimization. For example, the verification of a 283-bit Montgomery multiplier takes 15,300 seconds when T = 20. After optimization, the runtime was just 169.2 seconds, which is 90x faster than verifying the original implementation. The memory usage is also reduced from 488 MB to 194 MB. In summary, in contrast to [10], the bit-level optimization actually reduces the complexity of the backward rewriting process. This is because extracting the function of an output bit of a GF multiplier depends only on the logic cone of this bit and does not require logic from other bits to be simplified (see Theorem 3). Hence, the complexity of function extraction is naturally reduced if the logic cone is minimized.

VI. CONCLUSION

In this paper, we present an algebraic functional verification technique for gate-level GF($2^m$) multipliers, in which verification is performed in bit-parallel fashion. The method is based on extracting a unique polynomial in Galois field of each output bit independently. We demonstrate that this method is able to verify an n-bit GF multiplier in n threads, and we apply it to pre- and post-synthesized Mastrovito and Montgomery multipliers up to 571 bits. The results show that our parallel approach gives average speedups of 44× and 17× compared to the best existing algorithm. In addition, we analyze the effects of the irreducible polynomial and of synthesis on the verification of GF($2^m$) multipliers.

Footnote: "dch" is the most efficient bit-optimization function in ABC.

Op size   Mastrovito            Montgomery
          Runtime    Mem        Runtime    Mem
64        4.25 s     21 MB      15.3 s     38 MB
96        10.9 s     44 MB      40.5 s     54 MB
128       28.9 s     77 MB      27.1 s     78 MB
163       62.3 s     123 MB     205.2 s    153 MB
233       134.8 s    201 MB     141.4 s    199 MB
283       168.4 s    198 MB     169.2 s    194 MB
409       775.6 s    635 MB     750.6 s    597 MB

TABLE III: Runtime and memory usage of synthesized Mastrovito and Montgomery multipliers (T = 20).

Acknowledgment:

This work was supported by an award from the National Science Foundation, No. CCF-1319496 and No. CCF-1617708.

REFERENCES

[1] J. Lv, P. Kalla, and F. Enescu, “Efficient Gröbner Basis Reductions for Formal Verification of Galois Field Arithmetic Circuits,” IEEE Trans. on CAD, vol. 32, no. 9, pp. 1409–1420, September 2013.
[2] E. Pavlenko, M. Wedler, D. Stoffel, W. Kunz et al., “STABLE: A new QF-BV SMT solver for hard verification problems combining Boolean reasoning with computer algebra,” in DATE, 2011, pp. 155–160.
[3] A. Sayed-Ahmed, D. Große, U. Kühne, M. Soeken, and R. Drechsler, “Formal verification of integer multipliers by combining Gröbner basis with logic reduction,” in DATE'16, 2016, pp. 1–6.
[4] M. Ciesielski, C. Yu, W. Brown, D. Liu, and A. Rossi, “Verification of Gate-level Arithmetic Circuits by Function Extraction,” ACM, 2015, pp. 52–57.
[5] T. Pruss, P. Kalla, and F. Enescu, “Equivalence Verification of Large Galois Field Arithmetic Circuits using Word-Level Abstraction via Gröbner Bases,” in DAC'14, 2014, pp. 1–6.
[6] C. Yu and M. J. Ciesielski, “Automatic word-level abstraction of datapath,” in ISCAS'16, 2016, pp. 1718–1721.
[7] R. E. Bryant, “Graph-based algorithms for Boolean function manipulation,” IEEE Trans. on Computers, vol. 100, no. 8, pp. 677–691, 1986.
[8] R. E. Bryant and Y. Chen, “Verification of arithmetic circuits with binary moment diagrams,” in Proceedings of the 32nd Conference on Design Automation, San Francisco, California, USA, Moscone Center, June 12-16, 1995, pp. 535–541.
[9] M. Ciesielski, P. Kalla, and S. Askar, “Taylor Expansion Diagrams: A Canonical Representation for Verification of Data Flow Designs,” IEEE Trans. on Computers, vol. 55, no. 9, pp. 1188–1201, Sept. 2006.
[10] C. Yu, W. Brown, D. Liu, A. Rossi, and M. J. Ciesielski, “Formal verification of arithmetic circuits using function extraction,” IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 35, no. 12, pp. 2131–2142, 2016.
[11] O. Wienand, M. Wedler, D. Stoffel, W. Kunz, and G.-M. Greuel, “An Algebraic Approach for Proving Data Correctness in Arithmetic Data Paths,” CAV, pp. 473–486, July 2008.
[12] C. Yu, W. Brown, and M. J. Ciesielski, “Verification of arithmetic datapath designs using word-level approach - A case study,” 2015, pp. 1862–1865.
[13] S. Ghandali, C. Yu, D. Liu, B. Walter, and M. Ciesielski, “Logic Debugging of Arithmetic Circuits,” in IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, 2015, pp. 113–118.
[14] C. Yu and M. Ciesielski, “Formal Verification using Don’t-care and Vanishing Polynomials,” IEEE, 2016, pp. 284–289.
[15] C. Paar and J. Pelzl, Understanding cryptography: a textbook for students and practitioners. Springer Science & Business Media, 2009.
[16] C. K. Koc and T. Acar, “Montgomery multiplication in GF($2^k$),” Designs, Codes and Cryptography, vol. 14, no. 1, pp. 57–69, 1998.
[17] B. Sunar and Ç. K. Koç, “Mastrovito multiplier for all trinomials,” Computers, IEEE Transactions on, vol. 48, no. 5, pp. 522–527, 1999.
[18] A. Mishchenko et al., “ABC: A system for sequential synthesis and verification,” 2007.
[19] W. Decker, G.-M. Greuel, G. Pfister, and H. Schönemann, “S