Probability Analysis and Comparison of Well-Known Integer Factorization Algorithms
arXiv: [math.GM], Jan
Duggirala Meher Krishna
Gayatri Vidya Parishad College of Engineering (Autonomous), Madhurawada, VISAKHAPATNAM – 530 048, Andhra Pradesh, India
E-mail: [email protected]
and
Duggirala Ravi
Gayatri Vidya Parishad College of Engineering (Autonomous), Madhurawada, VISAKHAPATNAM – 530 048, Andhra Pradesh, India
E-mail: [email protected]; [email protected]; duggirala.ravi@rediffmail.com; [email protected]
Abstract
Two prominent methods for integer factorization are those based on the general integer sieve and on elliptic curves. The general integer sieve method can be specialized to the quadratic integer sieve method. In this paper, a probability analysis for the success of these methods is described, under some reasonable conditions. The estimates presented are specialized for elliptic curve factorization. These methods are compared through heuristic estimates. It is shown that the elliptic curve method is a probabilistic polynomial time algorithm, under the assumption of a uniform probability distribution for the arising group orders, and that it is clearly more likely to succeed, and asymptotically faster.
Keywords:
Integers; Prime numbers; Unique factorization theorem; General integer sieve; Elliptic curve method.
In this paper, the success probabilities for two prominent methods, viz., the general integer sieve method and the elliptic curve method, are presented. The estimates are specialized for the elliptic curve factorization algorithm. The random variables studied are (1) the number generated by exponentiating a chosen fixed base random number to various random integer exponents, for the general integer sieve method, and (2) the group orders of the elliptic curve groups, with restriction to mod $p$, for each (as yet unknown) prime factor $p$ of the integer modulus to be factored. The common assumption in our estimates is that the probabilistic events arising from the consideration of different smaller prime numbers being factors of any particular realization (sample) of the random variable are mutually independent. With the assumption of independence of events corresponding to divisibility by different smaller prime numbers, the probabilities of success are shown to be fairly optimistic. The general integer sieve needs the random base point to be a group generator (primitive in this sense), which may be difficult to ensure. The merits of the elliptic curve method are highlighted, with a caution concerning the widths of the intervals of the possible group orders. Nevertheless, the estimated probabilities of success do not depend too heavily on this fact, as they are applicable to random samples from any arbitrary interval of considerable width, for asymptotic analysis.

Let $\mathbb{Z}$ be the ring of integers, and $\mathbb{N}$ be the set of positive integers. Let $N$ be a very large positive integer to be factored, and let $\mathbb{Z}_N$ be the ring of integers with arithmetic operations taken mod $N$. Let $L_{\min}, L_{\max} \in \mathbb{Z}$ be such that $L_{\min} < L_{\max}$ and $L_{\max} - L_{\min}$ is very large. The consecutive prime numbers are listed in ascending order as follows: $2 = q_1, q_2, q_3, \ldots$, so that $q_i$ is the $i$-th prime number, for $i \in \mathbb{N}$. Let $k$ be a small positive integer, but still large enough that the asymptotic estimates hold good, and let $n$ be the largest positive integer such that $q_n < \max\{|L_{\min}|, |L_{\max}|\}$. Let $X$ be a random variable taking integer values in the interval $I = [L_{\min}, L_{\max}]$, with uniform probability distribution.

Proposition 1
In the notation just discussed, the probability $\pi_X(z)$ of the event that a sample of the random variable $X$ is divisible by a positive integer $z \ge 2$ is approximately $\frac{1}{z}$, and more precisely the following bounds hold good:

$$\frac{1}{z} - \frac{1}{L_{\max} - L_{\min}} \;\le\; \pi_X(z) \;\le\; \frac{1}{z} + \frac{1}{L_{\max} - L_{\min}} \qquad (1)$$

Proof.
For every positive integer $z \ge 2$, the number of integer multiples of $z$ in $I$ is between $\left(\frac{L_{\max} - L_{\min}}{z} - 1\right)$ and $\left(\frac{L_{\max} - L_{\min}}{z} + 1\right)$. Thus, the probability that a random sample of $X$ is divisible by $z$ is between $\frac{1}{z} - \frac{1}{L_{\max} - L_{\min}}$ and $\frac{1}{z} + \frac{1}{L_{\max} - L_{\min}}$, which justifies the assumptions, with appropriate choices of $z$. $\square$

The joint consideration of the divergence of $\sum_i \frac{1}{q_i}$ and the convergence of $\sum_i \frac{1}{q_i^2}$ necessitates taking product spaces. Moreover, the estimates are presented only for the elliptic curve factorization algorithm.

2.1 Success of Elliptic Curve Factorization

Let $r = \left\lceil \frac{\log(N)}{\log(q_k)} \right\rceil$, where the choice of $k$, the number of smaller prime factors to be used, is assumed to be considerably larger than 2, such as about 1000. Actually, $k$ can run into tens of thousands, for practical purposes, and is constrained by the condition that $q_k^r \ge N$. If $q_k$ is too small, then $r$ can be so large that the estimated failure probabilities may become irrelevant. Let $C_l(\mathbb{Z}_N)$ be elliptic curves, defined over $\mathbb{Z}_N$, for $1 \le l \le r$. Let $p$ be a large but unknown prime factor of $N$, such that $p \le \sqrt{N}$, and let $C_l(\mathbb{Z}_p)$ be the corresponding elliptic curves restricted to $\mathbb{Z}_p$, for $1 \le l \le r$. The group order of $C_l(\mathbb{Z}_p)$ is $p + 1 - a_l$, where $-2\sqrt{p} \le a_l \le 2\sqrt{p}$, by the Hasse-Weil bounds for the elliptic curve group orders. The probability distribution of the group order $p + 1 - t$ of $C(\mathbb{Z}_p)$, as obtained by taking the mod $p$ restriction of a randomly generated elliptic curve $C(\mathbb{Z}_N)$, is assumed to be uniform over the interval $I = [(\sqrt{p} - 1)^2, (\sqrt{p} + 1)^2]$.

Proposition 2
Let $C_l(\mathbb{Z}_N)$, for $1 \le l \le r + 2$, be any $(r + 2)$ independent samples of the elliptic curves, and $p$ be a fixed (though unknown yet) prime factor of $N$, such that $p \le \sqrt{N}$. Let $E_{k+1}$ be the random event that each of the $(r + 2)$ group orders $p + 1 - a_l$ of the elliptic curves $C_l(\mathbb{Z}_p)$, for $1 \le l \le r + 2$, is divisible by a prime factor at least as large as $q_{k+1}$, where the prime number $p$ is assumed to be such that $p \mid N$ and $p \ge q_{k+1}$. Then,

$$\Pr(E_{k+1}) \;\le\; \frac{(r+2)(r+1) + 8}{2 \times 4 \times (q_{k+1} - 1)} + O\!\left(\frac{(r+2)(r+1)}{8} \times \frac{\log(\log(p))}{\sqrt{p}}\right).$$

Further, if the approximation $q_i \approx i \log(i)$, for sufficiently large positive integer $i$, is permitted, then

$$\Pr(E_{k+1}) \;\le\; \frac{(r+2)(r+1) + 8}{2 \times 4 \times k \times (\log(k+1))^2} + O\!\left(\frac{(r+2)(r+1)}{8} \times \frac{\log(\log(p))}{\sqrt{p}}\right).$$

Proof.
Before proceeding with the proof, a justification for the validity of the approximation in the last part is as follows: by the prime number theorem, $i \approx \frac{q_i}{\log(q_i)} < \frac{q_i}{\log(i)}$, and $q_i$ is likely to be larger than $i \log(i)$. It may also be noticed that $\frac{(r+2)(r+1)+8}{8\,k\,(\log(k+1))^2} \approx \frac{(r+2)(r+1)+8}{8\,q_k \log(k+1)}$.

The random event $E_{k+1}$ in the statement is broken up into the following two parts: $E_{k+1} \subseteq E_{k+1,1} \cup E_{k+1,2}$, where

1. $E_{k+1,1}$ is the event that there are distinct prime numbers $q_{i_l} \ge q_{k+1}$, for $1 \le l \le r + 2$, such that $q_{i_l} \mid (p + 1 - a_l)$ and $q_{i_l} \nmid (p + 1 - a_{l'})$, for $l' \ne l$ and $1 \le l, l' \le r + 2$, and
2. $E_{k+1,2}$ is the event that there is a prime number $q_i \ge q_{k+1}$, such that $q_i \mid (p + 1 - a_l)$ and $q_i \mid (p + 1 - a_{l'})$, for two indexes $l$ and $l'$, $l' \ne l$, where $1 \le l, l' \le r + 2$.

The two events listed above are not mutually exclusive, but an upper bound for the sum of their probabilities is found, as an estimate for the upper bound of the probability of the event in the statement.

Part (1). For the event $E_{k+1,1}$, it is observed that, from the simultaneous congruence relations $p + 1 \equiv a_l \bmod q_{i_l}$, for $1 \le l \le r + 2$, the fixed number $p + 1$ can be recovered by the Chinese remainder theorem. The mapping $a_l \mapsto a_l \bmod q_{i_l}$, for $1 \le l \le r + 2$, induces the homomorphism $(a_1, \cdots, a_{r+2}) \mapsto (a_1 \bmod q_{i_1}, \cdots, a_{r+2} \bmod q_{i_{r+2}})$, which preserves the algebraic structure. In the proof, it is assumed that the probability distributions remain uniform under the mapping $a_l \mapsto a_l \bmod q_{i_l}$, for $1 \le l \le r + 2$, with restriction on the domain of possible values of $(a_1 \bmod q_{i_1}, \cdots, a_{r+2} \bmod q_{i_{r+2}})$.

By the mutual independence of $a_l$, for $1 \le l \le r + 2$, there are at least $4^{r+2} \prod_{l=1}^{r+2} \sqrt{q_{i_l}}$ many possibilities, in all, for the set of possible realizations $(a_1 \bmod q_{i_1}, \cdots, a_{r+2} \bmod q_{i_{r+2}})$, after taking into account the restriction that $|a_l| \le 2\sqrt{p}$. The fixed number $p + 1$ must belong to the set of positive integers that can be reconstructed from some realization of $(a_1 \bmod q_{i_1}, \cdots, a_{r+2} \bmod q_{i_{r+2}})$, with $p$ constrained to be a prime number. Now, the number of possibilities for the realizations of $(a_1 \bmod q_{i_1}, \cdots, a_{r+2} \bmod q_{i_{r+2}})$ that could result in the reconstruction of $p + 1$, with $p$ restricted to be a prime number at most $\sqrt{N}$ (or of bit size at most $\frac{\log_2(N)}{2}$), is smaller than $\prod_{l=1}^{r} \sqrt{q_{i_l}}$, because $\left(\sqrt{q_k}\right)^r \ge \sqrt{N} > \frac{p+1}{2}$. Thus, $\Pr(E_{k+1,1}) \le \frac{1}{\sqrt{q_{i_{r+1}}\, q_{i_{r+2}}}} \le \frac{1}{q_{k+1}}$. A justification for this approach is given in a separate paragraph following the proof of the second part.

Part (2).
For the event $E_{k+1,2}$, a slightly weaker proof is given in this paragraph, and a more accurate proof is given in the correction part below. The event that a prime number $q_i \ge q_{k+1}$ divides the group orders of both $C_l(\mathbb{Z}_N)$ and $C_{l'}(\mathbb{Z}_N)$, for some $l$ and $l'$, $l \ne l'$ and $1 \le l, l' \le r + 2$, occurs with probability $\frac{(r+2)(r+1)}{2\,q_i^2}$, for any $i$, where $i \ge k + 1$. This probability also accounts for the possibility that $q_i \mid p + 1 - a_l$ and $q_i \mid p + 1 - a_{l'}$, in case $a_l = a_{l'}$, but $l \ne l'$, where $1 \le l, l' \le r + 2$, for some prime number $p \mid N$ and $p \ge q_{k+1}$. However, there are at least four possibilities that $q_i$ divides either component of the pairs $(p + 1 - a_l,\ p + 1 - a_{l'})$, $(p' + 1 - a'_l,\ p + 1 - a_{l'})$, $(p + 1 - a_l,\ p' + 1 - a'_{l'})$ and $(p' + 1 - a'_l,\ p' + 1 - a'_{l'})$, for two distinct prime factors $p$ and $p'$ of the composite number $N$, of which only one possibility is taken into account, for a fixed $p$. Thus, a multiplier of at most the fraction $\frac{1}{4}$ must be applied. Now,

$$\sum_{i \ge k+1} \frac{1}{q_i^2} \;<\; \sum_{i \ge k+1} \left[\frac{1}{q_i - 1} - \frac{1}{q_i}\right] \;<\; \frac{1}{q_{k+1} - 1}.$$

The result follows by adding it to the probability bound in the first part.

If the approximation $q_i \approx i \log(i)$ is permitted, the probability bound in the second part is as follows:

$$\sum_{i \ge k+1} \frac{1}{q_i^2} \;\approx\; \sum_{i \ge k+1} \frac{1}{i^2 (\log(i))^2} \;<\; \frac{1}{(\log(k+1))^2} \sum_{i \ge k+1} \frac{1}{i^2} \;<\; \frac{1}{(\log(k+1))^2} \sum_{i \ge k+1} \left[\frac{1}{i - 1} - \frac{1}{i}\right] \;=\; \frac{1}{k\,(\log(k+1))^2}. \qquad \square$$

In the following, a justification for the upper bound for $\Pr(E_{k+1,1})$ and a small correction to the upper bound for $\Pr(E_{k+1,2})$, assuming that $N$ is a random integer modulus of a prescribed bit size, are given.

Justification for Upper Bound for $\Pr(E_{k+1,1})$. Conditional and joint probabilities over the possible random modulus integer $N$, of bit size equal to a prescribed parameter ($\lceil \log_2(N) \rceil$), for independent realizations of the tuples $(a_1, \ldots, a_{r+2})$, with appropriate restrictions on the domains of possible values, are taken into consideration. Let the sequences $(i_1, \ldots, i_{r+2})$, for $i_l \ne i_{l'}$ and $k + 1 \le i_l, i_{l'} \le n$, where $1 \le l, l' \le r + 2$, $l \ne l'$ and $n$ is the largest positive integer such that $q_n \le (N^{1/4} + 1)^2$, be enumerated in some particular total order, denoted by $\prec$. Let $X_{(i_1, \ldots, i_{r+2})}$ be the event that the group order of $C_l(\mathbb{Z}_N)$ is divisible by $q_{i_l}$, for $1 \le l \le r + 2$, over all possible integer moduli of bit size ($\lceil \log_2(N) \rceil$), excluding the events $X_{(j_1, \ldots, j_{r+2})}$, for $(j_1, \ldots, j_{r+2}) \prec (i_1, \ldots, i_{r+2})$, if any. Now

$$
\begin{aligned}
\Pr(E_{k+1,1}) \;\le\; & \sum_{(i_1, \ldots, i_{r+2})} \Big[ \Pr\big(X_{(i_1, \ldots, i_{r+2})}\big) \times \Pr\big(\text{the event that } p \text{ is a large prime number of bit size at most } \tfrac{\log_2(N)}{2}, \text{ such that, for every } l,\ q_{i_l} \mid p + 1 - a_l, \\
& \qquad \text{and for some } l',\ q_{j_{l'}} \nmid p + 1 - a_{l'}, \text{ whenever } (j_1, \ldots, j_{r+2}) \prec (i_1, \ldots, i_{r+2}), \text{ where } 1 \le l, l' \le r + 2 \big) \Big] \\
\le\; & \sum_{(i_1, \ldots, i_{r+2})} \Big[ \Pr\big(X_{(i_1, \ldots, i_{r+2})}\big) \times \Pr\big(\text{the event that } p \text{ is a large prime number of bit size at most } \tfrac{\log_2(N)}{2}, \text{ such that, for every } l,\ q_{i_l} \mid p + 1 - a_l, \text{ where } 1 \le l \le r + 2 \big) \Big] \\
\le\; & \sum_{(i_1, \ldots, i_{r+2})} \Pr\big(X_{(i_1, \ldots, i_{r+2})}\big) \times \frac{1}{q_{k+1}} \;\le\; \frac{1}{q_{k+1}}.
\end{aligned}
$$

Small Correction of Upper Bound for $\Pr(E_{k+1,2})$. Taking the upper estimate $\frac{1}{q_i} + \frac{1}{4\sqrt{p}}$ in place of $\frac{1}{q_i}$, for $k + 1 \le i \le n$, the following is obtained:

$$\Pr(E_{k+1,2}) \;\le\; \sum_{i=k+1}^{n} \left(\frac{1}{q_i} + \frac{1}{4\sqrt{p}}\right)^2 \;=\; \sum_{i=k+1}^{n} \left(\frac{1}{q_i^2} + \frac{1}{2\,q_i \sqrt{p}} + \frac{1}{16\,p}\right),$$

where $n$ is constrained to be the largest positive integer such that $q_n$ may possibly divide both $p + 1 - a$ and $p + 1 - a'$, for some $-2\sqrt{p} \le a, a' \le 2\sqrt{p}$. Since $\gcd(p + 1 - a,\ p + 1 - a')$ must divide $|a - a'| \le 4\sqrt{p}$, it may be assumed that $n \le \frac{4\sqrt{p}}{\log(4\sqrt{p})}$, when $a \ne a'$. The terms accrued from

1. the sum $\frac{1}{2\sqrt{p}} \sum_{i=k+1}^{n} \frac{1}{q_i}$, which can be replaced with $\frac{\log(\log(q_n))}{2\sqrt{p}} \approx \frac{\log(\log(4\sqrt{p} + 1))}{2\sqrt{p}}$;
2. the event that $a = a'$, which has probability approximately $\frac{1}{4\sqrt{p}}$, for independent samples $a$ and $a'$, assuming values from the interval $[-2\sqrt{p},\ 2\sqrt{p}]$; and
3. the sum $\sum_{i=k+1}^{n} \frac{1}{16\,p}$, which can be replaced with $\frac{4\sqrt{p}}{16\,p \log(4\sqrt{p})} = \frac{1}{4\sqrt{p}\,\log(4\sqrt{p})}$

are insignificant for large $p$. In the statement of the proposition, the effect of the correction terms is reflected in the addend $O\!\left(\frac{(r+2)(r+1)}{8} \times \frac{\log(\log(p))}{\sqrt{p}}\right)$.

The methods for justification and correction terms are similar to a priori and a posteriori estimation of the probabilities. To be more explicit, the probability that a random prime $p$ is a factor of the random modulus $N$, where $N$ satisfies the requirements specified by $X_{(i_1, \ldots, i_{r+2})}$, with the specified bit size $\log_2(N)$ being a fixed number, assuming uniform likelihood among all such prime numbers that may arise, is estimated and shown to be upper bounded by $\frac{1}{q_{k+1}}$. If we were to take $\frac{1}{p}$ for the probability distribution of this event, we would, actually, get an even smaller upper bound for $\Pr(E_{k+1,1})$. This indirect approach is necessitated by the difficulties arising out of the need to deal with the principle of inclusion-and-exclusion in the estimation of the probability of a union of events, from the probabilities of independent individual atomic events. For instance, if $\Pr(E_{k+1})$ is replaced with something like $\frac{\sum_{i=k+1}^{n} \frac{1}{q_i}}{\sum_{i=1}^{n} \frac{1}{q_i}}$, for some large enough $n$, the resulting failure probability may become totally unrealistic. If the hyperelliptic curve method can be adapted for factorization, the success probability may hopefully become better.

Let $N$ be a large composite positive integer, and $g \in \mathbb{Z}_N^*$, where $\mathbb{Z}_N^*$ is the group of invertible elements mod $N$, with respect to multiplication mod $N$. For a randomly chosen $t \in \mathbb{Z}_N$, estimates for the probability of the event that every prime factor of $g^t \bmod N$ is at most $q_k$ remain elusive. The operational theory of the general integer sieve method is described below.

Let $d_j$ be the discrete logarithm of $q_j$, assuming that $q_j$ belongs to the cyclic subgroup generated by $g$, for $1 \le j \le k$. After collecting a sufficient number of samples, a system of linear equations of the form $\sum_{j=1}^{k} \nu_{i,j}\, d_j \equiv t_i \bmod \phi(N)$ is formed, for $1 \le i \le k$, where $\phi(N)$ is the Euler function of $N$, which is the group order of $\mathbb{Z}_N^*$. Any such relation arises as a result of the factorization $g^{t_i} \bmod N = \prod_{j=1}^{k} q_j^{\nu_{i,j}}$, for some random samples $t_i$, for $1 \le i \le k$.

From every new relation $\sum_{j=1}^{k} \nu_{k+l,j}\, d_j \equiv t_{k+l} \bmod \phi(N)$, a vector, consisting of integers $\tau_{k+l,i}$, $1 \le i \le k$, as components, may hopefully be found, such that $\sum_{i=1}^{k} \tau_{k+l,i}\, \nu_{i,j} \equiv 0 \bmod \phi(N)$, for $l = 1, 2, 3, \ldots$. Some of the relations may be redundant, leading to trivial relations. In fact, if two linearly independent relations $\sum_{j=1}^{k} \nu_{i,j}\, d_j \equiv t_i \bmod \phi(N)$, for $i = 1$ and $2$, are obtained, then a linear relation of the form $\sum_{j=1}^{k} c_j d_j \equiv 0 \bmod \phi(N)$, for some integers $c_j$, $1 \le j \le k$, not all 0, can be found.
In addition, if $\rho \mid c_j$, $1 \le j \le k$, for some integer $\rho \ge 2$, then a relation of the form $h^{\rho} = 1 \bmod N$, for some $h \in \mathbb{Z}_N^*$, can be found. Linear relations, like $\sum_{j=1}^{k} c_j d_j \equiv 0 \bmod \phi(N)$, are called trivial, if it so happens that $\sum_{j=1}^{k} c_j d_j = 0$, even without applying mod $\phi(N)$. For the quadratic integer sieve, mod 2 (i.e., $\rho = 2$) is taken, with a view to improving the efficiency, because if $g^{2t} = 1 \bmod N$, for some integer $t$, then, with $h = g^{t}$, $(h - 1)$ and $(h + 1)$ may yield nontrivial factors of $N$ by gcd.

The estimation of the probability of generating a linear relation in $d_j$, for $1 \le j \le k$, does not carry over from the elliptic curve method to the general integer sieve, as the term $(p + 1)$ plays a pivotal role in our estimation of the error probabilities of the elliptic curve factorization method. As for the primitiveness of the chosen base element $g$, it may be observed that the cardinality of $\mathbb{Z}_N^*$ is $\phi(N)$, and among the elements of $\mathbb{Z}_N^*$, there are about $\phi(\phi(N))$ elements that can be primitive (group generator) elements. For multiple base elements, the primitiveness constraint may be overcome, but the probability of generating a linear relation is less clearly understood. Subsequently, the merits of the elliptic curve factorization method are described.

Merits of Elliptic Curve Factorization
1. the method is a probabilistic polynomial time algorithm under the assumption of uniform probability of the group orders for a random modulus of given size;
2. the space requirement is quite small, compared to the integer sieve method;
3. if at least one sample of $k$-smooth group order is realized, then the factorization produces a result; and
4. it is not necessary to assume that the initial random point for any selected curve is a group generator.

However, diligence must be exercised while exponentiating by a prime number $q_i$, in that the exponentiation may be conducted at most $\frac{\log(N)}{2 \log(q_i)}$ times, for every positive integer $i \le k$. The number of curve samples also plays an important role; the samples must be taken in parallel, for each exponentiation by $q_i$, $1 \le i \le k$.

The probability analysis for the elliptic curve factorization is presented. The method is shown to be a probabilistic polynomial time algorithm, under reasonable assumptions on the probability distribution of the group orders that arise, when restriction to a fixed (but unknown) smaller prime factor of the modulus integer to be factored is taken. The integer modulus to be factored is treated as a random variable of fixed size, because it is an input to the factorization algorithm. The analysis takes into account the a priori and a posteriori probabilities. The probability of successful factorization is fairly optimistic.
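As an illustration of the elliptic curve procedure analyzed in this paper, the following Python sketch multiplies a random point by each small prime $q_i$ at most $\lfloor \log(N) / (2 \log(q_i)) \rfloor$ times and reports a factor when a point addition fails mod $N$. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `ecm_factor`, `smooth_bound` and `curves` are hypothetical, curves are taken in affine short Weierstrass form $y^2 = x^3 + ax + b$ with $b$ implied by the random point, and the curves are tried one after another rather than in parallel.

```python
import math
import random


class FactorFound(Exception):
    """Raised when a denominator is not invertible mod n."""
    def __init__(self, factor):
        self.factor = factor


def small_primes(bound):
    """Return all primes <= bound (bound >= 2) by the sieve of Eratosthenes."""
    flags = bytearray([1]) * (bound + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(bound) + 1):
        if flags[i]:
            flags[i * i::i] = bytearray(len(range(i * i, bound + 1, i)))
    return [i for i, flag in enumerate(flags) if flag]


def ec_add(P, Q, a, n):
    """Add points on y^2 = x^3 + a*x + b over Z_n; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % n == 0:
        return None
    if P == Q:
        num, den = (3 * x1 * x1 + a) % n, (2 * y1) % n
    else:
        num, den = (y2 - y1) % n, (x2 - x1) % n
    g = math.gcd(den, n)
    if g != 1:
        raise FactorFound(g)  # g may equal n, which the caller treats as a failure
    lam = num * pow(den, -1, n) % n  # modular inverse needs Python 3.8+
    x3 = (lam * lam - x1 - x2) % n
    return (x3, (lam * (x1 - x3) - y1) % n)


def ec_mul(m, P, a, n):
    """Compute m*P by double-and-add."""
    R = None
    while m:
        if m & 1:
            R = ec_add(R, P, a, n)
        P = ec_add(P, P, a, n)
        m >>= 1
    return R


def ecm_factor(n, smooth_bound=1000, curves=50, rng=random):
    """Try to find a nontrivial factor of n; return None if every curve fails."""
    primes = small_primes(smooth_bound)
    for _ in range(curves):
        x, y, a = (rng.randrange(n) for _ in range(3))
        P = (x, y)  # b = y^2 - x^3 - a*x mod n is implied and never needed
        try:
            for q in primes:
                # Cap on repeated multiplication by q: floor(log N / (2 log q)), at least 1.
                reps = max(1, int(math.log(n) / (2 * math.log(q))))
                for _ in range(reps):
                    P = ec_mul(q, P, a, n)
                    if P is None:
                        break
                if P is None:
                    break
        except FactorFound as hit:
            if 1 < hit.factor < n:
                return hit.factor
    return None
```

A call such as `ecm_factor(1000003 * 1000033, smooth_bound=2000, curves=200, rng=random.Random(1))` is expected to return a nontrivial divisor with high probability, although any single curve may fail, in line with the probabilistic nature of the method.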