Cumulant Expansion of Mutual Information for Quantifying Leakage of a Protected Secret
Olivier Rioul∗, Wei Cheng∗, and Sylvain Guilley†∗
∗ LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France, fi[email protected]
† Secure-IC S.A.S., Tour Montparnasse, Paris, France, [email protected]
Abstract—The information leakage of a cryptographic implementation with a given degree of protection is evaluated in a typical situation where the signal-to-noise ratio is small. This is solved by expanding Kullback-Leibler divergence, entropy, and mutual information in terms of moments/cumulants.
I. INTRODUCTION
Consider the following threat model in any secrecy or privacy problem where the adversary guesses a secret (cryptographic key, password, identifier, etc.), modeled as a discrete random variable X, using some observation of a side channel (power consumption, electromagnetic emanation, acoustic noise, timing, etc.), modeled as a real-valued random variable Y. In side-channel applications targeting cryptographic implementations, the observation is generally made by some noisy measurement of a sensitive variable Z, an unknown (possibly randomized) function of the secret X which depends on the implementation. The noise is often modeled as Gaussian N ∼ N(0, σ_N^2) independent of (X, Z), and the observed Y = Z + N is the output of an AWGN channel. We are interested in how mutual information

  I(X; Y) = h(Z + N) − h(Z + N | X)    (1)

decreases as the noise power σ_N^2 increases, that is, in a typical small signal-to-noise scenario. The aim is to provide a theoretical leakage quantification as a dependency metric between the secret X and the attacker's observation Y. This is particularly interesting for the designer who needs to evaluate the robustness of a given implementation to side-channel attacks.

In practice, the cipher algorithm is protected by some masking scheme in such a way that leakage is perfectly balanced at all orders k < K:

  E(Z^k | X) = E(Z^k) a.s. (k = 1, 2, ..., K − 1).    (2)

Expanding powers Y^k = (Z + N)^k and using the fact that N is independent of X, it follows by induction that

  E(Y^k | X) = E(Y^k) a.s. (k = 1, 2, ..., K − 1).    (3)

The order K is referred to as the high-order correlation immunity (HCI) order by Carlet et al. [1]. It corresponds to the smallest moment of leakage that may depend on the secret.
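The balance condition (2) can be made concrete on a minimal example. The sketch below (our illustrative toy, not the scheme evaluated in Section IV) enumerates first-order Boolean masking of a 4-bit secret X with a uniform mask M under a Hamming-weight leakage Z = w_H(X ⊕ M) + w_H(M), and checks that the first conditional moment is balanced while the second already depends on the secret, i.e. K = 2:

```python
def hw(v):
    # Hamming weight: number of nonzero bits of v
    return bin(v).count("1")

# First-order Boolean masking of a 4-bit secret X with a uniform mask M:
# the device leaks Z = w_H(X xor M) + w_H(M) (Hamming-weight model).
S1, S2 = {}, {}
for x in range(16):
    zs = [hw(x ^ m) + hw(m) for m in range(16)]
    S1[x] = sum(zs)                  # 16 * E(Z   | X = x)
    S2[x] = sum(z * z for z in zs)   # 16 * E(Z^2 | X = x)

# First-order moments are perfectly balanced, E(Z | X) = E(Z) a.s., ...
assert len(set(S1.values())) == 1
# ... but the second-order moment already depends on the secret: HCI order K = 2.
assert len(set(S2.values())) > 1
```

Consequently, by (3), any first-order correlation attack on Y fails for this toy scheme, while a second-order attack may succeed.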
As a result, any attack from the observation Y based on a correlation analysis of degree k < K necessarily fails; K is the minimal attack order that can succeed. The question now becomes: How does mutual information I(X;Y) capture the fact that the Kth-order conditional moment m_K(Y | X = x) = E(Y^K | X = x) depends on x when the noise increases? Carlet et al.'s statement [1] is that I(X;Y) is asymptotically O(σ_N^{−2K}) as σ_N → ∞. This was taken as a fundamental result in the field of side-channel analysis. It was leveraged to illustrate the strength of leakage squeezing [2, Fig. 4], to compare different countermeasures [3], [4], and was extended in [5] to the case of a code-based masking implementation, where countermeasures can reduce mutual information by increasing the dual distance of the code and reducing its kissing number.

Carlet et al.'s derivation [1], however, is based on Cardoso's small cumulant approximation [6, Eq. (41)], which in fact replaces Kullback-Leibler divergence by its quadratic approximation [6, Eq. (29)]. As shown in this paper, this results in a problematic expansion of mutual information [1, Eq. (6)], which may yield ambiguous results. We make the appropriate corrections and find the asymptotic equivalent of I(X;Y) up to K = 6. Higher protection orders K > 6 are rare in practice and involve cross-terms which make the asymptotic equivalent harder to find. Our main result is then the following.

Theorem 1:
Let X, Z be (discrete or continuous) real-valued random variables satisfying (2) at orders k = 1, 2, ..., K − 1 but not at order K (i.e., with at least one value x such that E(Z^K | X = x) ≠ E(Z^K)). Then if 3 ≤ K ≤ 6, the following asymptotic equivalence holds as σ_N → ∞:

  I(X; Y) ∼ Var(E(Z^K | X)) / (2 · K! · (σ_N^2 + σ_Z^2)^K)    (4)

where σ_Z^2 = Var(Z) denotes the variance of Z and Var(E(Z^K|X)) denotes the inter-class variance.

Our strategy to prove Theorem 1 is to rewrite mutual information in terms of non-Gaussianity terms:

  I(X; Y) = h(Y*|X) − h(Y|X) − (h(Y*) − h(Y)) = D(Y‖Y*|X) − D(Y‖Y*)    (5)

where Y* is a Gaussian random variable independent of X (hence h(Y*|X) = h(Y*)) with the same first and second order moments as Y. We then go beyond the quadratic cumulant approximation of Cardoso [6, Eq. (29)] and investigate how the Kullback-Leibler divergences D(Y‖Y*) and D(Y‖Y*|X) behave as σ increases, using a Gram-Charlier expansion [7] in terms of a sequence of "modified moments". This will fill the gap in proving Carlet et al.'s main result [1, Thm. 1], while also giving the asymptotic equivalent for 3 ≤ K ≤ 6. As we shall see, some annoying cross-terms prevent any straightforward generalization for K > 6.

Throughout we use natural logarithms so that informational quantities are expressed in nats.

The remainder of the paper is organized as follows. Section II reviews a kind of Gram-Charlier expansion and derives the corresponding non-Gaussianity expansions. Section III gives the resulting expansions of mutual information and explains why the extension of (4) to K > 6 is problematic. Numerical validation is carried out in Section IV on a practical code-based masking scheme in AES with a Hamming weight leakage model. Section V concludes.

II. CUMULANT EXPANSION OF NON-GAUSSIANITY
Non-Gaussianity D(Y‖Y*) = h(Y*) − h(Y) is a nonnegative quantity which vanishes if and only if Y is Gaussian. For notational convenience write µ = µ_Y and σ = σ_Y. Because Y and Y* share the same mean µ and variance σ^2, it is convenient to write their densities in the form (1/σ) f((y − µ)/σ) and (1/σ) g((y − µ)/σ), respectively, where f and g are standardized densities (in particular g = N(0, 1)). Since Kullback-Leibler divergence is invariant by invertible transformations, one has

  D(Y‖Y*) = D((Y − µ)/σ ‖ (Y* − µ)/σ) = D(f‖g) = ∫ f log(f/g).    (6)

A. Density Expansion

As σ_N increases, σ = √(σ_N^2 + σ_Z^2) → ∞ but the high-order cumulants κ_3, κ_4, ..., κ_K of Y remain bounded. In fact for k ≥ 3, κ_k = κ_k(Y) = κ_k(Z) + κ_k(N) = κ_k(Z) are kept constant. On the other hand, since Y* is Gaussian, all its high-order cumulants are zero. This, as we show in the next Lemma, can be used to show that the Gaussian noise N dominates in Y = Z + N so that f will approach the Gaussian g:

Lemma 1 (Gram-Charlier Expansion):

  f(x)/g(x) = 1 + Σ_{k=3}^{K} (m̃_k / (k! σ^k)) H_k(x) + o(σ^{−K})    (7)

where H_k is the kth Hermite polynomial (H_3(x) = x^3 − 3x, H_4(x) = x^4 − 6x^2 + 3, H_5(x) = x^5 − 10x^3 + 15x, etc.) and where the "modified moments" m̃_k satisfy the recursion

  m̃_k = κ_k + Σ_{j=3}^{k−3} C(k−1, j) m̃_j κ_{k−j}.    (8)

The modified moments are computed exactly as the genuine moments m_k are computed from the cumulants κ_k using Smith's formula [8], except that κ_1 and κ_2 are absent. Thus m̃_1 = m̃_2 = 0, m̃_3 = κ_3, m̃_4 = κ_4, m̃_5 = κ_5, m̃_6 = κ_6 + 10κ_3^2, m̃_7 = κ_7 + 35κ_3κ_4, etc. Notice that modified moments, like high-order cumulants, are bounded as σ → ∞.

Proof:
By definition of cumulants, the characteristic function φ_Y(t) = E(e^{itY}) of Y can be factorized as

  φ_Y(t) = φ_{Y*}(t) e^{ψ(t)}    (9)

where φ_{Y*}(t) = e^{iµt − σ^2 t^2/2} is the characteristic function of Y* ∼ N(µ, σ^2) and ψ(t) = Σ_{k=3}^{K} κ_k (it)^k / k! + o(t^K). Taking the exponential, we expand exp ψ(t) = 1 + Σ_{k=3}^{K} m̃_k (it)^k / k! + o(t^K). The coefficients m̃_k can be found by Taylor's formula and Leibniz's rule: i^k m̃_k = (e^ψ)^{(k)}(0) = (ψ′ e^ψ)^{(k−1)}(0) = Σ_j C(k−1, j) (e^ψ)^{(j)}(0) ψ^{(k−j)}(0), which simplifies to (8). Now (9) becomes

  φ_Y(t) = (1 + Σ_{k=3}^{K} (m̃_k / k!) (it)^k) φ_{Y*}(t) + o(t^K) φ_{Y*}(t).    (10)

Taking the inverse Fourier transform gives the density of Y:

  (1/σ) f((y − µ)/σ) = (1 + Σ_{k=3}^{K} (m̃_k / k!) (−d/dy)^k) (1/σ) g((y − µ)/σ) + R(y)

where we have used that multiplication by (−it) in the Fourier domain (characteristic function) corresponds to differentiation. Now, by the defining property of Hermite polynomials,

  (−d/dy)^k g((y − µ)/σ) = (1/σ^k) H_k((y − µ)/σ) · g((y − µ)/σ).

The o(t^K) term in (10) having at most polynomial growth at infinity, we can apply Watson's lemma [9, Chap. 2] to the remainder term R(y), which gives R(y) = o(σ^{−K}) (with at most polynomial growth in y at infinity). Letting x = (y − µ)/σ and dividing by g(x) > 0 gives the announced expansion.

Remark 1:
Contrary to what seems to be a popular belief in the literature (see e.g., [6]), the coefficients multiplying the Hermite polynomials in the Gram-Charlier expansion (7) are not just cumulants κ_k, but "modified moments" m̃_k, which differ from cumulants as soon as k ≥ 6.

B. Divergence Expansion

Theorem 2:
The expansion of divergence in powers of 1/σ is of the form

  D(f‖g) = Σ_{k=3}^{K} c_k / (2 · k! · σ^{2k}) + o(σ^{−2K})    (11)

where c_k = m̃_k^2 + other terms of the form α_m m̃_{k_1} m̃_{k_2} ··· m̃_{k_m} with m ≥ 3 and k_1 + k_2 + ··· + k_m = 2k.

Proof:
Using (7) in the form f/g = 1 + h where h = Σ_{k=3}^{K} (m̃_k / (k! σ^k)) H_k(x) + o(σ^{−K}), we proceed to expand D(f‖g) = ∫ g (1 + h) log(1 + h), where (1 + h) log(1 + h) = h + h^2/2 − h^3/6 + h^4/12 + ··· + o(h^K). Substituting gives

  D(f‖g) = ∫gh + (1/2)∫gh^2 − (1/6)∫gh^3 + (1/12)∫gh^4 + ··· + o(∫gh^K).    (12)

By the orthogonality property of Hermite polynomials

  ∫ g H_k H_l = k! δ_kl,    (13)

one has ∫gH_k = ∫gH_k H_0 = 0 (k > 0), hence ∫gh = 0. Moreover, by orthogonality, ∫gh^2 = Σ_{k=3}^{K} (m̃_k / (k! σ^k))^2 k! + o(σ^{−2K}) = Σ_{k=3}^{K} m̃_k^2 / (k! σ^{2k}) + o(σ^{−2K}). Thus the quadratic part (1/2)∫gh^2 accounts for the m̃_k^2 / (2 · k! · σ^{2k}) terms (k ≥ 3). The expansion of each higher-order term ∫gh^m (m ≥ 3) involves terms of the form (m̃_{k_1} m̃_{k_2} ··· m̃_{k_m} / σ^{k_1 + k_2 + ··· + k_m}) ∫ g H_{k_1} H_{k_2} ··· H_{k_m}. Since each Hermite polynomial H_k has the same parity as its degree k, all such terms vanish when k_1 + k_2 + ··· + k_m is odd. Hence there remain only terms in 1/σ^{2k} as stated.

Remark 2:
The asymptotic D(f‖g) = (1/2)∫gh^2 + o(∫gh^2) was already proved in [10, Lemma 1].

C. First Few Terms in the Divergence Expansion
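The computations of this subsection rest on exact Gaussian integrals of products of Hermite polynomials. Thanks to the moment formula E[G^n] = (n − 1)!! for even n (G standard normal), all of them reduce to integer arithmetic, so both the orthogonality relation (13) and the special values quoted below can be checked mechanically. A minimal sketch (helper names ours):

```python
def he(k):
    # Coefficients (lowest degree first) of the probabilists' Hermite
    # polynomial He_k, from He_{n+1}(x) = x He_n(x) - n He_{n-1}(x).
    a, b = [1], [0, 1]
    if k == 0:
        return a
    for n in range(1, k):
        c = [0] + b
        for i, coef in enumerate(a):
            c[i] -= n * coef
        a, b = b, c
    return b

def pmul(p, q):
    # Product of two polynomials given as coefficient lists.
    r = [0] * (len(p) + len(q) - 1)
    for i, ci in enumerate(p):
        for j, cj in enumerate(q):
            r[i + j] += ci * cj
    return r

def egauss(p):
    # E[p(G)] for G ~ N(0,1): E[G^n] = (n-1)!! for even n, 0 for odd n.
    total = 0
    for n, c in enumerate(p):
        if n % 2 == 0:
            df = 1
            for i in range(1, n, 2):
                df *= i
            total += c * df
    return total

# Orthogonality (13): the integral of g H_k H_l equals k! delta_kl ...
assert egauss(pmul(he(3), he(3))) == 6 and egauss(pmul(he(3), he(5))) == 0
# ... and a special value used below: the integral of g H_3^2 H_4.
assert egauss(pmul(pmul(he(3), he(3)), he(4))) == 216
```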
We can carry out the computations up to K = 6. The cubic and quartic terms can be evaluated at first orders using the special values [11, §6.8]: ∫gH_3^2 H_4 = 216, ∫gH_4^3 = 1728, ∫gH_3 H_4 H_5 = 1440, ∫gH_3^2 H_6 = 720, and ∫gH_3^4 = 3348, plus the fact that all terms in odd powers of 1/σ are zero (since they involve integrals ∫gH_k H_l H_m = 0 when k + l + m is odd). After some calculation we obtain

  ∫gh^3 = 648 m̃_3^2 m̃_4 / (3!·3!·4!·σ^{10}) + 1728 m̃_4^3 / ((4!)^3 σ^{12}) + 8640 m̃_3 m̃_4 m̃_5 / (3!·4!·5!·σ^{12}) + 2160 m̃_3^2 m̃_6 / (3!·3!·6!·σ^{12}) + O(σ^{−14})

and

  ∫gh^4 = 3348 m̃_3^4 / ((3!)^4 σ^{12}) + O(σ^{−14}).

Putting all pieces together and expressing modified moments in terms of cumulants, we obtain

  D(f‖g) = m̃_3^2/(12σ^6) + m̃_4^2/(48σ^8) + m̃_5^2/(240σ^{10}) − m̃_3^2 m̃_4/(8σ^{10}) + m̃_6^2/(1440σ^{12}) − m̃_4^3/(48σ^{12}) − m̃_3 m̃_4 m̃_5/(12σ^{12}) − m̃_3^2 m̃_6/(72σ^{12}) + 31 m̃_3^4/(144σ^{12}) + O(σ^{−14})
    = κ_3^2/(12σ^6) + κ_4^2/(48σ^8) − κ_3^2 κ_4/(8σ^{10}) + κ_5^2/(240σ^{10}) + 7κ_3^4/(48σ^{12}) − κ_4^3/(48σ^{12}) − κ_3 κ_4 κ_5/(12σ^{12}) + κ_6^2/(1440σ^{12}) + O(σ^{−14}).    (14)

Remark 3:
In order to check the validity of (14), we can recover a known expression in a different model. Instead of having Y = Z + N, suppose that Y = Y_1 + Y_2 + ··· + Y_n where the Y_i's are i.i.d. with mean µ, variance σ^2, and high-order cumulants κ_3, κ_4, .... The previous expansions can be used by replacing σ by √n·σ and κ_k by n·κ_k, and letting n → +∞. The Gram-Charlier expansion, re-ordered in powers of 1/√n, becomes the Edgeworth expansion

  f/g = 1 + (κ_3/(3!σ^3 √n)) H_3 + (κ_4/(4!σ^4 n)) H_4 + (κ_3^2/(72σ^6 n)) H_6 + (κ_5/(5!σ^5 n√n)) H_5 + (κ_3 κ_4/(144σ^7 n√n)) H_7 + (κ_3^3/(1296σ^9 n√n)) H_9 + O(n^{−2}).    (15)

It is easily seen that all O(σ^{−14}) terms in (14) are then necessarily O(n^{−3}). Four terms out of the eight in (14) are also O(n^{−3}), and there remains

  D(f‖g) = κ_3^2/(12 n σ^6) + κ_4^2/(48 n^2 σ^8) − κ_3^2 κ_4/(8 n^2 σ^{10}) + 7κ_3^4/(48 n^2 σ^{12}) + O(n^{−3})    (16)

which is exactly the result of Comon [12, Thm 14] for his "negentropy" D(f‖g) = h(g) − h(f) = (1/2) log(2πeσ^2) − h(f).

Remark 4:
The expansion (14) contrasts with Cardoso's small cumulant approximation to the Kullback-Leibler divergence [6, Eq. (41)], which in our setting would read

  κ_3^2/(12σ^6) + κ_4^2/(48σ^8) + κ_5^2/(240σ^{10}) + κ_6^2/(1440σ^{12}) + ···

The difference with (14) is due to two facts: (a) as already noticed in Remark 1, the coefficients of the Gram-Charlier expansion (7) are not the cumulants κ_k for k ≥ 6, but the modified moments m̃_k, which differ from cumulants as soon as k ≥ 6; (b) Cardoso's derivation only takes the quadratic approximation (1/2)∫gh^2 of divergence into account, ignoring higher-order terms such as ∫gh^3 = O(σ^{−10}).

While (a) and (b) have no effect on the first two terms D(f‖g) = κ_3^2/(12σ^6) + κ_4^2/(48σ^8) + O(σ^{−10}), both result in annoying higher-order cross-terms in the genuine expression (14) which do not appear in [6]. Because of this, derivations based on [6, Eq. (41)], particularly the main result of [1], become questionable as soon as O(σ^{−10}) terms are considered.

Remark 5:
Since D(f‖g) = D(Y‖Y*) = h(Y*) − h(Y) = (1/2) log(2πeσ^2) − h(Y), we have the following expansion of (differential) entropy:

  h(Y) = (1/2) log(2πeσ^2) − κ_3^2/(12σ^6) − κ_4^2/(48σ^8) + κ_3^2 κ_4/(8σ^{10}) − κ_5^2/(240σ^{10}) − 7κ_3^4/(48σ^{12}) + κ_4^3/(48σ^{12}) + κ_3 κ_4 κ_5/(12σ^{12}) − κ_6^2/(1440σ^{12}) + O(σ^{−14}).    (17)

III. CUMULANT EXPANSION OF MUTUAL INFORMATION
A. Mutual Information Expansion
We now apply the expansion (14) to both terms D(Y‖Y*) and D(Y‖Y*|X) in (5). To simplify the derivation we assume that the (K−1)th order protection (2) holds at least for the first two moments (hence K ≥ 3): µ = µ_Y = µ_{Y|X=x} and σ^2 = σ_Y^2 = σ_{Y|X=x}^2 for all x. We can, therefore, apply (14) to D(Y‖Y*) and to D(Y‖Y*|X=x) for a given secret value x, and then take the expectation over X. Letting κ_k(Z) = κ_k(Y) and κ_k(Z|X=x) = κ_k(Y|X=x) (k ≥ 3) be the high-order cumulants of Z and Z|X=x, respectively, we readily obtain

  I(X;Y) = [E κ_3^2(Z|X) − κ_3^2(Z)]/(12σ^6) + [E κ_4^2(Z|X) − κ_4^2(Z)]/(48σ^8)
    − [E(κ_3^2(Z|X) κ_4(Z|X)) − κ_3^2(Z) κ_4(Z)]/(8σ^{10}) + [E κ_5^2(Z|X) − κ_5^2(Z)]/(240σ^{10})
    + 7[E κ_3^4(Z|X) − κ_3^4(Z)]/(48σ^{12}) − [E κ_4^3(Z|X) − κ_4^3(Z)]/(48σ^{12})
    − [E(κ_3(Z|X) κ_4(Z|X) κ_5(Z|X)) − κ_3(Z) κ_4(Z) κ_5(Z)]/(12σ^{12}) + [E κ_6^2(Z|X) − κ_6^2(Z)]/(1440σ^{12}) + O(σ^{−14}).    (18)

Remark 6: This contrasts with the high-order expansion of mutual information in [1, Eq. (6)], which reads

  I(X;Y) = E(κ_3(Z|X) − κ_3(Z))^2/(12σ^6) + E(κ_4(Z|X) − κ_4(Z))^2/(48σ^8) + E(κ_5(Z|X) − κ_5(Z))^2/(240σ^{10}) + E(κ_6(Z|X) − κ_6(Z))^2/(1440σ^{12}) + O(σ^{−14}).

The difference with (18) is due to three facts: (a) and (b) leading to annoying cross-terms in the non-Gaussianity expansion, as explained in Remark 4; (c) terms of the form E κ_k^2(Z|X) − κ_k^2(Z) can be written as variances

  E κ_k^2(Z|X) − κ_k^2(Z) = E(κ_k(Z|X) − κ_k(Z))^2    (19)

only under the condition that κ_k(Z) = E κ_k(Z|X).
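The role of this condition can be verified in exact rational arithmetic on a toy pair (X, Z): two equiprobable classes with matched mean and variance but distinct third moments (the distributions below are our illustrative assumptions):

```python
from fractions import Fraction as Fr
from math import comb

def raw_moments(dist, n):
    # Raw moments m_1..m_n of a finite distribution {value: probability}.
    return [sum(p * Fr(v) ** k for v, p in dist.items()) for k in range(1, n + 1)]

def cumulants(m):
    # kappa_n = m_n - sum_{k<n} C(n-1, k-1) kappa_k m_{n-k}
    kap = []
    for n in range(1, len(m) + 1):
        kap.append(m[n - 1] - sum(comb(n - 1, k - 1) * kap[k - 1] * m[n - k - 1]
                                  for k in range(1, n)))
    return kap

# Two equiprobable classes with equal mean (0) and variance (2),
# but different third moments (0 vs 2):
d0 = {-2: Fr(1, 4), 0: Fr(1, 2), 2: Fr(1, 4)}
d1 = {-1: Fr(2, 3), 2: Fr(1, 3)}
marginal = {v: (d0.get(v, 0) + d1.get(v, 0)) / 2 for v in set(d0) | set(d1)}

kz = cumulants(raw_moments(marginal, 6))        # kappa_k(Z)
k0 = cumulants(raw_moments(d0, 6))              # kappa_k(Z | X = 0)
k1 = cumulants(raw_moments(d1, 6))              # kappa_k(Z | X = 1)
e_cond = [(a + b) / 2 for a, b in zip(k0, k1)]  # E kappa_k(Z | X)

# kappa_k(Z) = E kappa_k(Z|X) holds for k = 3, 4, 5 ...
assert all(kz[k - 1] == e_cond[k - 1] for k in (3, 4, 5))
# ... but fails at k = 6, by exactly -10 * Var(m_3(Z|X)): the -10 m_3^2 term.
var_m3 = (Fr(0 - 1) ** 2 + Fr(2 - 1) ** 2) / 2
assert e_cond[5] - kz[5] == -10 * var_m3
```

Here the k = 6 discrepancy equals −10·Var(m_3(Z|X)), matching the −10 m_3^2 term discussed next.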
This condition indeed holds for k = 3, 4, 5 under the above assumptions because of the well-known expressions of κ_3, κ_4, and κ_5 in terms of the moments m_1, m_2, m_3, m_4, m_5, where the quantities m_1(Z|X=x) = E(Z|X=x) = E(Z) = m_1(Z) and m_2(Z|X=x) = E(Z^2|X=x) = E(Z^2) = m_2(Z) do not depend on X = x, and where m_k(Z) = E(Z^k) = E E(Z^k|X) = E m_k(Z|X). However, the condition κ_k(Z) = E κ_k(Z|X) is no longer satisfied for k = 6 because of the −10 m_3^2 term in the expression

  κ_6 = m_6 − 6m_5 m_1 − 15m_4 m_2 + 30m_4 m_1^2 − 10m_3^2 + 120m_3 m_2 m_1 − 120m_3 m_1^3 + 30m_2^3 − 270m_2^2 m_1^2 + 360m_2 m_1^4 − 120m_1^6.

While (a), (b), and (c) have no effect on the first two terms I(X;Y) = E(κ_3(Z|X) − κ_3(Z))^2/(12σ^6) + E(κ_4(Z|X) − κ_4(Z))^2/(48σ^8) + O(σ^{−10}), they result in annoying higher-order cross-terms in the genuine expression (18) which do not appear in [1].

Proof of the main Theorem 1:
The HCI condition (2) states that m_k(Z|X) = m_k(Z) a.s. for k < K. Now, from the well-known formulas expressing cumulants in terms of moments, one has κ_k(Z|X) = m_k(Z|X) + lower-order terms in m_1(Z|X) = m_1(Z), ..., m_{k−1}(Z|X) = m_{k−1}(Z). It follows that κ_k(Z|X) = κ_k(Z) a.s. for k < K, while for k = K we have κ_K(Z|X) − m_K(Z|X) = κ_K(Z) − m_K(Z). Thus κ_K(Z|X) − κ_K(Z) = m_K(Z|X) − m_K(Z), and in particular E κ_K(Z|X) − κ_K(Z) = E m_K(Z|X) − m_K(Z) = E E(Z^K|X) − E(Z^K) = 0. Therefore, we can write E κ_K^2(Z|X) − κ_K^2(Z) = Var(κ_K(Z|X)) = E(κ_K(Z|X) − κ_K(Z))^2 = E(m_K(Z|X) − m_K(Z))^2 = Var(m_K(Z|X)) = Var(E(Z^K|X)), which is nonzero since E(Z^K|X) is not constant a.s.

By examination of (18) when K ≤ 6, it is easily seen that

  I(X;Y) = [E κ_K^2(Z|X) − κ_K^2(Z)]/(2·K!·σ^{2K}) + o(σ^{−2K}) = Var(E(Z^K|X))/(2·K!·σ^{2K}) + o(σ^{−2K})    (20)

where σ^2 = σ_Y^2 = σ_N^2 + σ_Z^2.

Remark 7:
What makes the proof of Theorem 1 work is that in (14), all terms in 1/σ^{2k} (k = 3, 4, 5, 6) involve only cumulants of order ≤ k.

This property, however, does not generalize to higher orders. In fact, by Theorem 2, there is at least one additional term in 1/σ^{14} of the form α m̃_3^2 m̃_8 (since 3 + 3 + 8 = 14), which will contribute a term α κ_3^2(Z)(E κ_8(Z|X) − κ_8(Z))/σ^{14} in addition to the [E κ_7^2(Z|X) − κ_7^2(Z)]/(10080 σ^{14}) of (20). Assuming κ_3(Z) ≠ 0, we still have I(X;Y) = O(σ^{−2K}) for K = 7 but with a different asymptotic equivalent.

Furthermore, again assuming κ_3(Z) ≠ 0, for K = 8 the term α κ_3^2(Z)(E κ_8(Z|X) − κ_8(Z))/σ^{14} still contributes to mutual information, so that in this case it is no longer true that I(X;Y) = O(σ^{−2K}): we have I(X;Y) = O(σ^{−14}) instead of I(X;Y) = O(σ^{−16}).

In general, for higher orders, the terms α_m m̃_{k_1} m̃_{k_2} ··· m̃_{k_m} (m ≥ 3, k_i ≥ 3, k_1 + k_2 + ··· + k_m = 2k) of Theorem 2 will not contribute to I(X;Y) only when all k_i are necessarily < K. Since the maximum possible k_i is 2k − 6 (for m = 3, the other two k_i's being equal to 3), we must have (2) satisfied at least at order 2k − 6 to ensure that I(X;Y) = O(σ^{−2(k+1)}). Therefore, for K ≥ 7, I(X;Y) = O(σ^{−2K}) requires an HCI of at least 2K − 6.

In practice, such extremely high-order protection (K = 5, 6, 7, ...) is unthinkable for all implementations. Hence Theorem 1 applies to all cases of interest. In the following section we illustrate this using a code-based masked implementation for K ≤ 4.

IV. NUMERICAL SIMULATIONS
Consider an advanced encryption standard (AES [13]) block cipher, which takes as input a plaintext of 16 bytes and outputs a ciphertext of the same size. The attacker is able to monitor inputs and outputs, but does not know the secret key. In such a cryptographic algorithm, it is practically impossible to deduce the secret from inputs and outputs: all the security relies on the secrecy of the key, in keeping with Kerckhoffs's principle [14] (a.k.a. Shannon's maxim [15]).

Fig. 1. Public information (plaintext and ciphertext) available to an attacker and first-round key-dependent intermediate (discrete) value, put in front of corresponding side-channel execution traces (analog) observed by the attacker.
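The countermeasure evaluated in this section, a two-share masking (X ⊕ (C ⊗ M), M) leaking the sum of the Hamming weights of its shares (described next), can be simulated exactly by enumerating the 16 masks. A minimal sketch for X ∈ F_16 (irreducible polynomial α^4 + α + 1 as in the paper's footnote; helper names and the probing bound max_k = 6 are ours) recovering the HCI order K and V_K = Var(E(Z^K|X)) for each C:

```python
from fractions import Fraction as Fr

def gf16_mul(a, b):
    # Multiplication in F_16 = F_2[x]/(x^4 + x + 1): carry-less product,
    # then reduction modulo the irreducible polynomial 0b10011.
    r = 0
    for i in range(4):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(7, 3, -1):
        if (r >> i) & 1:
            r ^= 0b10011 << (i - 4)
    return r

def hci(c, max_k=6):
    # Exact conditional moments E(Z^k | X = x) of the two-share leakage
    # Z = w_H(X xor (C x M)) + w_H(M), with M uniform on F_16; returns the
    # smallest unbalanced order K and the inter-class variance Var(E(Z^K|X)).
    hw = lambda v: bin(v).count("1")
    cond = [[Fr(sum((hw(x ^ gf16_mul(c, m)) + hw(m)) ** k for m in range(16)), 16)
             for k in range(1, max_k + 1)] for x in range(16)]
    for k in range(1, max_k + 1):
        col = [row[k - 1] for row in cond]
        if len(set(col)) > 1:
            mean = sum(col) / 16
            return k, sum((v - mean) ** 2 for v in col) / 16
    return None, None  # balanced at all probed orders

# Plain Boolean masking (C = 1) has HCI order K = 2 with V_2 = 1 ...
assert hci(1) == (2, 1)
# ... while the first-order moment is always balanced, and some C do better:
ks = [hci(c)[0] for c in range(1, 16)]
assert all(k is None or k >= 2 for k in ks)
assert any(k is None or k >= 3 for k in ks)
```

For C = 1 (plain Boolean masking) this yields K = 2 with V_2 = 1, while suitable choices of C push the first secret-dependent moment to higher orders, which is the effect exploited in Table I.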
Side-channel attacks consist in measuring the power consumption [16] or electromagnetic (EM) waves [17] produced during the execution of the AES algorithm. As shown in Fig. 1, the attacker measures waveforms corresponding to the side-channel emanation of the AES computation. Such side information (repeatedly collected many times) is correlated to the secret key, and the attacker tries to exploit it in order to validate assumptions on small chunks of the key. In Fig. 1, the reference 128-bit key is the one taken as an example in the NIST specification [13, Appendix A], and the guessed values are those of the first round of AES, which consists in the application of AES SubBytes to the plaintext XORed with the key. The measured waveforms are time series of power or EM emanations, which depend on the plaintext (or equivalently, on the ciphertext, since encryption is symmetric). Some specific samples depend on small chunks of the plaintext/ciphertext and of the secret key, and are used by the attacker to assess hypotheses on the key.

We consider a practical case where the block cipher algorithm is protected by a masking scheme [18]. The order of protection rarely exceeds K = 4. Specifically, we target a two-share masking scheme [5] in which the key chunk X ∈ F_q is encoded as (X ⊕ (C ⊗ M), M), using an independent uniformly distributed mask M ∈ F_q and a nonzero constant C ∈ F_q. Here ⊕ and ⊗ denote the addition and multiplication, respectively, in a finite field F_q = F_16 or F_256.

The Hamming weight model with additive Gaussian noise is commonly used in side-channel analysis. The leaked sensitive variable is modeled as Z = w_H(X ⊕ (C ⊗ M)) + w_H(M), where w_H(·) denotes the Hamming weight (number of nonzero bits), and Y = Z + N where N ∼ N(0, σ_N^2). As demonstrated in [5] and shown in Table I, both the HCI order K and Var(E(Z^K|X)) change with different choices of C. We can, therefore, validate Theorem 1 in multiple cases.
TABLE I. Different K and V_k = Var(E(Z^k|X)) (k = 1, ..., K) obtained by using different C (in decimal representation), for X ∈ F_16 and X ∈ F_256.

The numerical results for mutual information are shown in Fig. 2 in log-log scale, where a slope of −2K indicates I(X;Y) ∼ Cst · σ_N^{−2K}. We observe that the first nonzero-order term of the expansion of mutual information dominates when the noise level σ_N is high enough. Overall, Theorem 1 gives an accurate approximation of mutual information.

V. CONCLUSION
In this paper, we presented a cumulant-based expansion of Kullback-Leibler divergence and mutual information with application to side-channel analysis. We fixed the mathematical issue that existed in the literature and proposed a rigorous proof for the main result of [1] in most cases of interest.

The irreducible polynomials used in this paper are α^4 + α + 1 for F_16 and α^8 + α^4 + α^3 + α^2 + 1 for F_256.

Fig. 2. Numerical validation of Theorem 1 by taking different C, therefore different K and Var(E(Z^K|X)): (a) X ∈ F_16; (b) X ∈ F_256.

REFERENCES

[1] C. Carlet, J.-L. Danger, S. Guilley, H. Maghrebi, and E. Prouff, "Achieving side-channel high-order correlation immunity with leakage squeezing,"
J. Cryptographic Engineering, vol. 4, no. 2, pp. 107–121, 2014.
[2] C. Carlet, J. Danger, S. Guilley, and H. Maghrebi, "Leakage squeezing: Optimal implementation and security evaluation," J. Mathematical Cryptology, vol. 8, no. 3, pp. 249–295, 2014. [Online]. Available: http://dx.doi.org/10.1515/jmc-2012-0018
[3] A. Duc, S. Faust, and F. Standaert, "Making Masking Security Proofs Concrete - Or How to Evaluate the Security of Any Leaking Device," in
Advances in Cryptology - EUROCRYPT 2015 - 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Sofia, Bulgaria, April 26-30, 2015, Proceedings, Part I, ser. Lecture Notes in Computer Science, E. Oswald and M. Fischlin, Eds., vol. 9056. Springer, 2015, pp. 401–429. [Online]. Available: http://dx.doi.org/10.1007/978-3-662-46800-5_16
[4] V. Grosso and F. Standaert, "Masking Proofs Are Tight and How to Exploit it in Security Evaluations," in Advances in Cryptology - EUROCRYPT 2018 - 37th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Tel Aviv, Israel, April 29 - May 3, 2018, Proceedings, Part II, ser. Lecture Notes in Computer Science, J. B. Nielsen and V. Rijmen, Eds., vol. 10821. Springer, 2018, pp. 385–412. [Online]. Available: https://doi.org/10.1007/978-3-319-78375-8_13
[5] W. Cheng, S. Guilley, C. Carlet, S. Mesnager, and J. Danger, "Optimizing Inner Product Masking Scheme by a Coding Theory Approach,"
IEEE Trans. Inf. Forensics Secur., vol. 16, pp. 220–235, 2021. [Online]. Available: https://doi.org/10.1109/TIFS.2020.3009609
[6] J. Cardoso, "Dependence, correlation and Gaussianity in independent component analysis," J. Mach. Learn. Res., vol. 4, pp. 1177–1203, 2003. [Online]. Available: http://jmlr.org/papers/v4/cardoso03a.html
[7] A. Hald, "The early history of the cumulants and the Gram-Charlier series,"
International Statistical Review, vol. 68, no. 2, pp. 137–153, 2000.
[8] P. J. Smith, "A recursive formulation of the old problem of obtaining moments from cumulants and vice versa," The American Statistician, vol. 49, no. 2, pp. 217–218, 1995.
[9] P. D. Miller, Applied Asymptotic Analysis. American Mathematical Soc., 2006, vol. 75.
[10] E. Abbe and L. Zheng, "A coordinate system for Gaussian networks,"
IEEE Trans. Inf. Theory, vol. 58, no. 2, pp. 721–733, 2012. [Online]. Available: https://doi.org/10.1109/TIT.2011.2169536
[11] G. E. Andrews, R. Askey, and R. Roy, Special Functions. Cambridge University Press, 1999.
[12] P. Comon, "Independent component analysis, a new concept?" Signal Process., vol. 36, no. 3, pp. 287–314, 1994. [Online]. Available: https://doi.org/10.1016/0165-1684(94)90029-9
[13] NIST/ITL/CSD, "Advanced Encryption Standard (AES). FIPS PUB 197," Nov 2001, http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf (also ISO/IEC 18033-3:2010).
[14] A. Kerckhoffs, "La cryptographie militaire (2),"
Journal des sciences militaires, vol. 9, pp. 161–191, February 1883, http://en.wikipedia.org/wiki/Kerckhoffs_law.
[15] C. E. Shannon, "Communication theory of secrecy systems," Bell System Technical Journal, vol. 28, pp. 656–715, October 1949. [Online]. Available: https://doi.org/10.1002%2Fj.1538-7305.1949.tb00928.x
[16] P. C. Kocher, J. Jaffe, and B. Jun, "Differential Power Analysis," in Proceedings of CRYPTO'99, ser. LNCS, vol. 1666. Springer-Verlag, 1999, pp. 388–397.
[17] K. Gandolfi, C. Mourtel, and F. Olivier, "Electromagnetic analysis: Concrete results," in Proceedings of the Third International Workshop on Cryptographic Hardware and Embedded Systems, ser. CHES '01. London, UK: Springer-Verlag, 2001, pp. 251–261. [Online]. Available: http://dl.acm.org/citation.cfm?id=648254.752700
[18] E. Prouff and M. Rivain, "Masking against Side-Channel Attacks: A Formal Security Proof," in Advances in Cryptology - EUROCRYPT 2013, ser. Lecture Notes in Computer Science, vol. 7881. Springer, 2013, pp. 142–159.