arXiv [math.NT]
Number Field Sieve with provable complexity
Barry van Leeuwen
Supervisor: Dr. A.R. Booker
Chair: Dr. T. Dokchitser (University of Bristol)
Examiners: Dr. J. Bober (University of Bristol), Dr. S. Siksek (University of Warwick)
A dissertation submitted to the
University of Bristol in accordance with the requirements for award of the degree of
Master of Science by Research in Mathematics at the
Faculty of Science
School of Mathematics
July 14, 2020
Word count: 28557
Abstract
In this thesis we give an in-depth introduction to the General Number Field Sieve, as it was used by Buhler, Lenstra, and Pomerance [17], before looking at one of the modern developments of this algorithm: a randomized version with provable complexity. This version was posited in 2017 by Lee and Venkatesan [14], and will be preceded by ample material from both algebraic and analytic number theory, Galois theory, and probability theory.
Dedication and Acknowledgements
I want to thank Dr. Andrew Booker, who as my supervisor managed to find what I needed even though it may not have been what I wanted. I also want to thank Dr. James Milne, Dr. Florian Bouyer, and Dr. Lynne Walling for providing some of the material used. I also want to thank Dr. Dan Fretwell, who helped me find my footing when I just started (which feels very long ago) and my mother, Cokky van Leeuwen, who with her sorcery managed to find typos that I could not.
To my wife, Sarah van Leeuwen, who with her continued support and motivation made possible what I thought impossible.
Declaration
I declare that the work in this dissertation was carried out in accordance with the requirements of the University’s Regulations and Code of Practice for Research Degree Programmes and that it has not been submitted for any other academic award. Except where indicated by specific reference in the text, the work is the candidate’s own work. Work done in collaboration with, or with the assistance of, others, is indicated as such. Any views expressed in the dissertation are those of the author.
Signed: Barry van Leeuwen
Date: July 14, 2020
Contents
Abstract
Contents
Notation
1 Introduction
2 Algebraic fundamentals
[...] $\mathbb{Z}$
3.2.3 Sieving over $\mathbb{Z}[\alpha]$
3.3 Dealing with obstructions
3.4 Computational efficiency
[...] $L$-functions
4.2.3 Dedekind zeta functions and Siegel zeroes
4.2.4 Siegel zeroes
4.3 Arithmetic progressions
4.3.1 Extensions and Refinements
4.3.2 Smooth numbers on Arithmetic Progressions
4.4 Probability measures and moments
References
Notation
A brief introduction to some of the notation used. Most of the notation listed will be formally introduced, but this can be used as a reference guide.
$R[\alpha]$ - A ring extended by an element $\alpha$
$F(\alpha)$ - A field extended by an element $\alpha$
$\mathcal{O}_K$ - The ring of algebraic integers in a field extension $K$
$\mathrm{f.f.}(R)$ - The field of fractions generated by a ring $R$
$\mathrm{irr}_F(a)$ - The minimal polynomial of $a$ over $F$
$[a] \in B$ - The coset of $a$ in structure $B$
$\langle a \rangle \subset B$ - The ideal generated by $a$ in structure $B$
$L(s, \chi)$ - Dirichlet L-function with character $\chi$ and exponent $s$
$a \sim B$ - Draw $a$ according to distribution $B$
$\pi(X)$ - The number of primes below $X$
In this paper we use Bachmann–Landau Big-O notation extended by Knuth Big-Ω notation, such that for functions $f(x)$ and $g(x)$ and $N \in \mathbb{N}$:
$f(x) = o(g(x))$ - $\forall k \in \mathbb{R}_{>0}\ \exists N$ such that $\forall x \ge N$: $|f(x)| \le k \cdot g(x)$
$f(x) = O(g(x))$ - $\exists k \in \mathbb{R}_{>0}\ \exists N$ such that $\forall x \ge N$: $|f(x)| \le k \cdot g(x)$
$f(x) = \omega(g(x))$ - $\forall k \in \mathbb{R}_{>0}\ \exists N$ such that $\forall x \ge N$: $|f(x)| \ge k \cdot g(x)$
$f(x) = \Omega(g(x))$ - $\exists k \in \mathbb{R}_{>0}\ \exists N$ such that $\forall x \ge N$: $|f(x)| \ge k \cdot g(x)$
1 Introduction
In 1988 Pollard introduced a brand new factorization algorithm: the Number Field Sieve (NFS). This saw a first implementation in 1994 by Lenstra et al. and was used to factor the ninth Fermat number, $F_9$.
The grandeur of this algorithm was in the conjectured complexity of
$$L_n\left(\frac{1}{3}, \sqrt[3]{\frac{64}{9}} + o(1)\right),$$
where $n \in \mathbb{N}$ and for $a, b, x \in \mathbb{R}$:
$$L_x(a, b) = \exp\left(b (\log x)^{a} (\log\log x)^{1-a}\right).$$
This was a major improvement on the best algorithms at the time, such as the quadratic sieve, which were of complexity $L_n(\frac{1}{2}, b)$ for some constant $b$.
Despite abundant heuristics supporting the bound and copious research poured into a proof, it was very difficult to establish the complexity of the two major parts of the sieve:
1. Using the concept of smoothness to find a factor base.
2. Reducing this factor base to a factorization of a given $n \in \mathbb{N}$.
In chapter 3 we will look at the General Number Field Sieve (GNFS) as it was explicitly described in [17] and at how the sieve is structured.
It wasn’t until 2017 that Lee and Venkatesan [14] used a probabilistic and combinatorial approach to alter the algorithm in such a way that they managed to prove the following ([14], theorems 2.1 & 2.3):
Theorem 1.1. There is a randomised variant of the Number Field Sieve which for each $n$ finds congruences of squares $x^2 \equiv y^2 \bmod n$ in expected time
$$L_n\left(\frac{1}{3}, \sqrt[3]{\frac{64}{9}} + o(1)\right).$$
Remark. Note that no claims are made regarding the factorization of $n$. This theorem only proves that there exists an algorithm that carries out the sieving process up to finding a congruence of squares, i.e. a pair $(x, y)$ for which $x^2 \equiv y^2 \bmod n$, with non-zero probability and within the complexity bound given in the theorem above.
This version of the algorithm will be studied extensively in chapter 5, which finishes by proving the theorem above.
To do this we need a strong basis in analytic number theory, which we will introduce in chapter 4 along with a review of some probability theory. In chapter 2 we will recall the necessary concepts from algebraic number theory and Galois theory, but for those familiar with these subjects this chapter can be skipped.
2 Algebraic fundamentals
The reason the General Number Field Sieve is so effective is that it uses strong algebraic constructions to get a factorization, which allows for far fewer computations to be made to get an effective result. The discussion starts by considering these constructions and concepts, such as field extensions, in all generality, but will swiftly collapse down to the specific cases needed for this sieve. We begin by introducing some notation that we will use throughout.
Definition 2.1.
• Let $K$ and $L$ be fields such that $K \subset L$; then $L$ is called a field extension of $K$ and is denoted $L/K$.
• $K(\alpha)$ is the smallest field extension of $K$ that contains $\alpha$. If $\alpha \in K$ then $K(\alpha) = K$.
• $K[x]$ is the polynomial extension of $K$, such that
$$K[x] = \Big\{ \sum_{i=0}^{\infty} a_i x^i \;\Big|\; a_i \in K \text{ for all } i \in \mathbb{Z}_{\ge 0}, \text{ with only finitely many } a_i \ne 0 \Big\}.$$
Remark.
To prevent confusion, $\subset$ and $\supset$ are proper sub- and super-constructions, while $\subseteq$ and $\supseteq$ can have equality.
A field extension $L/K$ can be seen as a $K$-vector space, and when considered as such it becomes a natural next step to think about the dimension of $L$ as a $K$-vector space. This is called the degree and is denoted $[L : K] = \dim_K(L)$. If there are multiple extensions, $L \supseteq M \supseteq K$, called a tower of extensions, then the degrees of these extensions are very well behaved.
Theorem 2.2. Tower Law
Let $K_n \supseteq K_{n-1} \supseteq \ldots \supseteq K_0$ be fields. Then
$$[K_n : K_0] = [K_n : K_{n-1}] \cdots [K_1 : K_0].$$
If $[L : K] < \infty$ the extension is called finite, else it is infinite. To use the sieve it is important to understand when such an extension is finite and when it isn’t. Let $f(x)$ be an irreducible monic univariate polynomial with coefficients in $K$ of degree $\deg(f) = d$. As $f(x)$ is irreducible it has no roots in $K$ (for $d \ge 2$), but it might have roots over other fields that contain $K$.
Definition 2.3.
An irreducible polynomial $f$ over a field $K$ is said to be separable over $K$ if it has no multiple zeros in a splitting field. That is, there exists a field $L \supset K$ such that
$$f(x) = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_d)$$
in $L$, where $f(\alpha_i) = 0$ and the $\alpha_i$ are pairwise distinct.
This definition is not extremely helpful and in practice it will be more useful to consider [[4], proposition 9.14] instead:
Proposition 2.4. If $K$ is a subfield of $\mathbb{C}$ then every irreducible polynomial over $K$ is separable.
By the fundamental theorem of algebra it is known that a polynomial $f(x)$ with $\deg(f) = d$ has exactly $d$ roots in $\mathbb{C}$, counted with multiplicity. These roots are not exclusive, and a root $\alpha$ is the root of many polynomials. However there is a unique monic polynomial of lowest degree that is of specific importance:
Definition 2.5.
Let $f \in K[x]$ be a polynomial with coefficients in a field $K$ for which $\alpha$ is a root. Then $f$ is the minimal polynomial of $\alpha$ over $K$ if
• $f$ is monic.
• $f$ is irreducible.
• $\forall g \in K[x]$ such that $g(\alpha) = 0$, $f$ divides $g$.
This polynomial is unique up to units.
Throughout this text we adopt the convention of denoting the minimal polynomial of $\alpha$ over $K$ by $\mathrm{irr}_K(\alpha)$. The minimal polynomial gives rise to the following equivalences, which are vital in the background throughout.
Theorem 2.6. Let $f \in K[x]$ with $\deg(f) = d$ be a monic irreducible univariate polynomial with root $\alpha$. Let $\langle f(x) \rangle$ denote the ideal generated by $f(x)$. Then the following are equivalent:
1. $f(x)$ is the minimal polynomial for $\alpha$ in $K[x]$; $f(x) = \mathrm{irr}_K(\alpha)$.
2. $K(\alpha) \cong K[x] / \langle f(x) \rangle$.
3. $K(\alpha)$ is a field.
4. $[K(\alpha) : K] = d$ and $\{1, \alpha, \alpha^2, \ldots, \alpha^{d-1}\}$ is a basis for $K(\alpha)$ over $K$.
The above theorem reveals a condition under which the extension $K(\alpha)$ is a finite degree extension of $K$. This condition is called algebraicity, and for an extension $L/K$ an element $\alpha \in L$ is called algebraic over $K$ exactly if it is a root of a non-zero polynomial $f(x) \in K[x]$.
Remark.
There are two special cases to consider:
• A complex number, $\alpha \in \mathbb{C}$, that is the root of a monic polynomial with coefficients in $\mathbb{Z}$ is called an algebraic integer.
• A complex number, $\alpha \in \mathbb{C}$, that is the root of a non-zero polynomial with coefficients in $\mathbb{Q}$ is called an algebraic number.
It turns out that the structure of the algebraic numbers is very simple to describe.
Theorem 2.7.
Every algebraic number is of the form $\frac{\alpha}{b}$ where $\alpha$ is an algebraic integer and $b \in \mathbb{Z} \setminus \{0\}$.
As just discussed: if $\mathrm{irr}_K(\alpha)$ has degree $d$, then it is only natural to ask about the other roots of $\mathrm{irr}_K(\alpha)$. These are called the conjugates of $\alpha$ and have some interesting properties.
Definition 2.8.
Let $\alpha \in \mathbb{C}$ be algebraic over $K \subseteq \mathbb{C}$. The conjugates of $\alpha$, $\alpha = \alpha_1, \ldots, \alpha_d$, are the roots of $\mathrm{irr}_K(\alpha)$.
As $K \subseteq \mathbb{C}$, proposition 2.4 shows that any irreducible polynomial over $K$ is separable, hence the $\alpha_i$ are distinct:
$$\mathrm{irr}_K(\alpha) = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_d).$$
Using the conjugates, [[1], theorem 5.6.1] proves that we only need to consider extensions by a single algebraic number:
Theorem 2.9.
Let $K \subseteq \mathbb{C}$ and let $\alpha, \beta \in \mathbb{C}$ be algebraic over $K$. Then there exists $\gamma \in \mathbb{C}$ algebraic over $K$ such that
$$K(\alpha, \beta) = K(\gamma).$$
This can easily be extended to any finite extension $K(\alpha_1, \ldots, \alpha_n)$ by using the fact that $K(\alpha_1, \alpha_2) = (K(\alpha_1))(\alpha_2)$.
To work effectively with the general number field sieve there are some restrictions we place on our extensions. The first restriction is that they be finite. Hence, from now on, every extension $L/K \subseteq \mathbb{C}$ is assumed finite. Special focus will be on algebraic number fields:
Definition 2.10.
Let $K \subseteq \mathbb{C}$; then $K$ is an algebraic number field if $K = \mathbb{Q}(\alpha_1, \ldots, \alpha_n)$ such that for all $i \in \{1, \ldots, n\}$, $\alpha_i$ is an algebraic number.
By theorem 2.9 we can write $K = \mathbb{Q}(\alpha)$ for some algebraic number $\alpha \in \mathbb{C}$. Moreover we know that any algebraic number can be written as $\alpha = \frac{\beta}{b}$ where $\beta$ is an algebraic integer and $b \in \mathbb{Z} \setminus \{0\}$, but $\mathbb{Q}(\frac{\beta}{b}) = \mathbb{Q}(\beta)$ as $b \in \mathbb{Q}$, so it is safe to assume that $\alpha$ is in fact an algebraic integer.
Definition 2.11.
Let $\mathbb{A}$ be the set of all algebraic integers and let $K$ be an algebraic number field. Then the set $\mathcal{O}_K = \mathbb{A} \cap K$ is called the ring of algebraic integers of $K$.
As the name suggests, $\mathcal{O}_K$ is a ring. In fact, by [[1], theorem 6.1.4 and 6.1.6],
Theorem 2.12.
Let $K$ be an algebraic number field; then $\mathcal{O}_K$ is integrally closed, i.e.
1. $\mathcal{O}_K$ is an integral domain.
2. Every $\alpha \in K$ that is integral over $\mathcal{O}_K$ satisfies $\alpha \in \mathcal{O}_K$.
In general $\mathcal{O}_K$ is surprisingly well behaved, allowing for some interesting properties, including that its field of fractions is $K$:
$$\mathrm{f.f.}(\mathcal{O}_K) = \Big\{ \frac{\alpha}{\beta} \;\Big|\; \alpha, \beta \in \mathcal{O}_K,\ \beta \ne 0 \Big\} = K.$$
Another important property is [[1], theorem 6.5.3]:
Theorem 2.13.
Let $K$ be an algebraic number field. Then $\mathcal{O}_K$ is a Noetherian domain.
Remark. A terribly inconvenient, yet common, convention is that the ideals $I \subseteq \mathcal{O}_K$ are just called ideals of $K$. Despite the confusion this might cause, it will not pose a large problem and will be clear from context, as in practice $K$ will be a number field and hence, being a field, only has trivial ideals.
It is nearly by definition that a maximal ideal of an integral domain is a prime ideal, but in general the converse is not true. However, in the case of $\mathcal{O}_K$ for an algebraic number field $K$, the converse does hold for non-zero prime ideals.
Theorem 2.14.
Let $K = \mathbb{Q}(\alpha)$ be an algebraic number field. Let $P \subseteq \mathcal{O}_K$ be a non-zero prime ideal of $\mathcal{O}_K$; then $P$ is a maximal ideal of $\mathcal{O}_K$.
Now we have shown that $\mathcal{O}_K$ is integrally closed, is a Noetherian domain, and that each prime ideal of $\mathcal{O}_K$ is a maximal ideal. These three properties define an important class of domains, which are very useful for the General Number Field Sieve.
Definition 2.15. An integral domain $D$ that satisfies the following properties:
• $D$ is a Noetherian domain.
• $D$ is integrally closed.
• Each prime ideal of $D$ is a maximal ideal.
is called a Dedekind domain.
Dedekind domains admit a very interesting property, as they generalize the idea of unique factorization into primes, which we know by the fundamental theorem of arithmetic holds for all $z \in \mathbb{Z}$ up to units, $\pm 1$. In Dedekind domains this holds as well, however we must use the equivalent construction for number fields.
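To see why the statement that follows is made for ideals rather than for elements, it may help to look at the standard example $\mathbb{Z}[\sqrt{-5}]$, which is not taken from the text. The sketch below assumes the usual facts that $\mathbb{Z}[\sqrt{-5}]$ is the ring of integers of $\mathbb{Q}(\sqrt{-5})$ and that $N(a + b\sqrt{-5}) = a^2 + 5b^2$ is multiplicative (the field norm is introduced later in this chapter). The helper names `mul`, `norm` and `elements_of_norm` are illustrative only.

```python
# Elements a + b*sqrt(-5) of Z[sqrt(-5)] are stored as integer pairs (a, b).

def mul(u, v):
    """Multiply two elements, using sqrt(-5)**2 = -5."""
    a, b = u
    c, d = v
    return (a * c - 5 * b * d, a * d + b * c)

def norm(u):
    """Norm N(a + b*sqrt(-5)) = a**2 + 5*b**2."""
    a, b = u
    return a * a + 5 * b * b

def elements_of_norm(m, bound=10):
    """All (a, b) with |a|, |b| <= bound whose norm equals m."""
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if norm((a, b)) == m]

two, three = (2, 0), (3, 0)
u, v = (1, 1), (1, -1)          # 1 + sqrt(-5) and 1 - sqrt(-5)

# Two factorizations of the element 6.
print(mul(two, three), mul(u, v))

# The four factors have norms 4, 9, 6, 6. A proper factor of any of them
# would need norm 2 or 3, and no such element exists.
print([norm(z) for z in (two, three, u, v)])
print(elements_of_norm(2), elements_of_norm(3))
```

Both products equal 6, and the empty searches for norm 2 and norm 3 show that all four factors are irreducible, so 6 has two genuinely different factorizations into irreducible elements. Passing to ideals is what restores uniqueness.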
Theorem 2.16.
Let $D$ be a Dedekind domain. Any non-trivial integral ideal is a product of prime ideals and this factorization is unique up to order, i.e. for $A \subset D$ an ideal with factorizations
$$A = \prod_{i \in I} P_i^{e_i} = \prod_{j \in J} Q_j^{f_j},$$
with prime ideals $P_i$, $Q_j$, there exists a bijection $\sigma : I \to J$ such that $P_i = Q_{\sigma(i)}$ and $e_i = f_{\sigma(i)}$.
To prove this we need some auxiliary results:
Proposition 2.17.
Let $D$ be a Dedekind domain. Then every non-zero ideal $I \subseteq D$ contains a product of one or more prime ideals.
Definition 2.18.
Let $D$ be an integral domain and let $K$ be the quotient field of $D$. For each prime ideal $P$ of $D$ we define the set $\bar{P}$ by
$$\bar{P} = \{ \alpha \in K : \alpha P \subseteq D \}.$$
This set is called a fractional ideal of $D$.
Proposition 2.19. Let $D$ be a Dedekind domain. Let $P$ be a prime ideal of $D$. Then
$$P \bar{P} = D.$$
With these facts recorded we proceed with the proof of theorem 2.16:
Proof.
This proof comes in two parts: existence and uniqueness.
Existence: Let $D$ be a Dedekind domain and suppose, for a contradiction, that there exists a non-trivial ideal of $D$ that is not a product of prime ideals. As $D$ is a Dedekind domain, it is Noetherian, and so by the maximality principle there is a non-trivial ideal $A$ of $D$ that is maximal with respect to the property of not being a product of prime ideals. Then by proposition 2.17 there exist prime ideals $P_1, \ldots, P_k$ of $D$ such that
$$P_1 \cdots P_k \subseteq A.$$
Now let $k$ be the smallest positive integer for which such a product exists. If $k = 1$ then $P_1 \subseteq A \subset D$ and $P_1$ is a prime ideal, hence maximal, so $A = P_1$. This contradicts our assumption that $A$ was not a product of prime ideals. Hence $k \ge 2$. By proposition 2.19 we have that $\bar{P}_1 P_1 = D$ and so
$$\bar{P}_1 P_1 P_2 \cdots P_k = D P_2 \cdots P_k = P_2 \cdots P_k.$$
Hence we have that $\bar{P}_1 A \supseteq P_2 \cdots P_k$. Moreover, since $D \subseteq \bar{P}_1$, we also have that $A \subseteq \bar{P}_1 A$.
Now assume that $A = \bar{P}_1 A$; then $A \supseteq P_2 \cdots P_k$, which contradicts the minimality of $k$. Now assume $A \subset \bar{P}_1 A$; then as the latter is an ideal of $D$, by the maximality property of $A$ we get
$$\bar{P}_1 A = Q_2 \cdots Q_l$$
for some prime ideals $Q_j$, $j \in \{2, \ldots, l\}$. But then
$$A = AD = A \bar{P}_1 P_1 = P_1 Q_2 \cdots Q_l,$$
which contradicts the way $A$ was chosen. Hence every non-trivial ideal of $D$ is a product of prime ideals.
Uniqueness: Suppose now that there exists an ideal $A$ of $D$, maximal among the ideals with more than one factorization, such that
$$A = P_1 \cdots P_k = Q_1 \cdots Q_l.$$
By the maximality principle this is a choice we can make. As $P_1 \cdots P_k \subseteq Q_1$ and $Q_1$ is a prime ideal, we have $P_i \subseteq Q_1$ for some $i \in \{1, \ldots, k\}$. Relabelling $P_i$ as $P_1$ we get $Q_1 = P_1$, as $P_1$ is a prime, hence maximal, ideal of $D$. Thus
$$\bar{Q}_1 Q_1 Q_2 \cdots Q_l = A \bar{Q}_1 = A \bar{P}_1 = \bar{P}_1 P_1 \cdots P_k = P_2 \cdots P_k.$$
Now assume that $A = A \bar{P}_1$; then $A \bar{P}_1 P_1 = A P_1$ and so $A = A P_1$. Define $\bar{A} = \bar{P}_1 \cdots \bar{P}_k$. Then
$$A \bar{A} = P_1 \cdots P_k \bar{P}_1 \cdots \bar{P}_k = D,$$
and so
$$D = \bar{A} A = \bar{A} A P_1 = P_1,$$
but this contradicts the primality of $P_1$, since a prime ideal is proper, so our equality assumption was faulty. Hence $A \subset A \bar{P}_1$. By the maximality of $A$, the ideal $A \bar{P}_1$ has exactly one factorization as a product of prime ideals. Thus we deduce that $k - 1 = l - 1$, so $k = l$. Relabelling gives $P_i = Q_i$ for all $i \in \{1, \ldots, k\}$, proving that the decomposition into prime ideals is unique.
2.3 Galois theory and the Frobenius element
In this section we give a very concise introduction to a few ideas from Galois theory. If you have a basic understanding of finite Galois theory then there is no loss in simply skipping this section.
Definition 2.20. Let $L/K$ be number fields. Then $L$ is said to be normal over $K$ if every irreducible polynomial $f \in K[x]$ that has a zero in $L$ splits over $L$.
Normality is easily checked under certain conditions:
Definition 2.21. Let $L/K$ be number fields. Then $L$ is a splitting field over $K$ for the polynomial $f \in K[x]$ if:
• $f$ splits over $L$.
• There is no $M$, $L/M/K$, such that $M \ne L$ and $f$ splits over $M$.
Theorem 2.22. A field extension $L/K$ is normal and finite if and only if $L$ is a splitting field for some polynomial $f \in K[x]$.
We can also speak of separable extensions.
Definition 2.23. Let $K$ be a number field. Then a polynomial $f \in K[x]$ is said to be separable if it splits into distinct linear factors over a splitting field.
Definition 2.24. Let $L/K$ be number fields. Then $L$ is said to be a separable extension of $K$ if for every $\alpha \in L$, $\mathrm{irr}_K(\alpha)$ is separable over $K$.
Remark. A finite, normal, and separable extension of a number field is also called a Galois extension.
For separable extensions we have an equivalence allowing us to reduce to one irreducible polynomial:
Proposition 2.25. For an extension $L/K$ the following are equivalent:
1. $L$ is a Galois extension of $K$.
2. $L$ is the splitting field of a separable polynomial $f \in K[x]$.
With this in place we will now start talking about automorphisms. First we define the Galois group:
Definition 2.26.
Let $L/K$ be number fields. Then the Galois group of $L$ over $K$ is defined as the group of $K$-automorphisms, i.e. the group of automorphisms $\pi : L \to L$ that fix every element of $K$. The group operation is composition of maps and the group is denoted $\mathrm{Gal}(L/K)$.
The existence of such a group is trivial, as the identity function on $L$ is a $K$-automorphism and so will always lie in the Galois group. We make a distinction between real and complex $K$-automorphisms: a $K$-automorphism $\sigma \in \mathrm{Gal}(L/K)$ is real if $\sigma(L) \subseteq \mathbb{R}$ and complex otherwise.
Definition 2.27. Let $L/K$ be number fields with Galois group $\mathrm{Gal}(L/K)$ such that $|\mathrm{Gal}(L/K)| = n$. Then the signature of $\mathrm{Gal}(L/K)$ is $\mathrm{sign}(\mathrm{Gal}(L/K)) = (r, s)$, where $r$ is the number of real $K$-automorphisms and $s$ is half the number of complex $K$-automorphisms in $\mathrm{Gal}(L/K)$, such that $n = r + 2s$.
There is one element of the Galois group that is particularly interesting, which is the Frobenius element. The idea of this comes from the Frobenius automorphism, which we recall for good measure.
Definition 2.28. If $K$ is a field of characteristic $p > 0$, the map $\varphi : K \to K$ defined by $k \mapsto k^p$ is called the Frobenius monomorphism of $K$. When $K$ is finite, $\varphi$ is called the Frobenius automorphism.
To be able to define the Frobenius elements over extensions we have a little setting up to do. For this let $L/K$ be a Galois extension with Galois group $\mathrm{Gal}(L/K)$. Let $\mathfrak{P} \subseteq \mathcal{O}_L$ be a prime ideal lying over $\mathfrak{p} \subseteq \mathcal{O}_K$. Then we can define the decomposition group $D(\mathfrak{P} | \mathfrak{p})$ as the set of automorphisms in $\mathrm{Gal}(L/K)$ which fix $\mathfrak{P}$:
$$D(\mathfrak{P} | \mathfrak{p}) = \{ \sigma \in \mathrm{Gal}(L/K) \mid \sigma(\mathfrak{P}) = \mathfrak{P} \}.$$
Definition 2.29.
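Let $L/K$ be a Galois extension with Galois group $\mathrm{Gal}(L/K)$, and let $\mathfrak{P} \subseteq \mathcal{O}_L$ be a prime ideal unramified in $L/K$ lying over $\mathfrak{p} \subseteq \mathcal{O}_K$. Then the Frobenius element, $\mathrm{Frob}_{\mathfrak{P}}$, is the element of $D(\mathfrak{P} | \mathfrak{p})$ that acts as the Frobenius automorphism on the residue field, i.e. the unique $\mathrm{Frob}_{\mathfrak{P}}$ such that
• $\mathrm{Frob}_{\mathfrak{P}} \in D(\mathfrak{P} | \mathfrak{p})$.
• For all $\alpha \in \mathcal{O}_L$, $\mathrm{Frob}_{\mathfrak{P}}(\alpha) \equiv \alpha^q \bmod \mathfrak{P}$, where $q = |\mathcal{O}_K / \mathfrak{p}|$.
This Frobenius element for $\mathfrak{P}$ is uniquely determined, but the Frobenius elements of different primes are not unrelated. For example, if $\mathfrak{P}$ and $\mathfrak{P}'$ both lie over $\mathfrak{p} \subset \mathcal{O}_K$ then their Frobenius elements are conjugate. This means that each prime ideal in $\mathcal{O}_K$ gives rise to a conjugacy class of Frobenius elements in $\mathrm{Gal}(L/K)$.
Proposition 2.30.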
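Let $L/K$ be a Galois extension with Galois group $\mathrm{Gal}(L/K)$, and let $\mathfrak{P}, \mathfrak{P}' \subseteq \mathcal{O}_L$ be prime ideals unramified in $L/K$, both lying over $\mathfrak{p} \subseteq \mathcal{O}_K$. Then there exists $\sigma \in \mathrm{Gal}(L/K)$ such that
$$\mathrm{Frob}_{\mathfrak{P}} = \sigma\, \mathrm{Frob}_{\mathfrak{P}'}\, \sigma^{-1}.$$
Proof. Let $\alpha \in \mathcal{O}_L$ and let $\sigma \in \mathrm{Gal}(L/K)$. Then
$$\sigma\, \mathrm{Frob}_{\mathfrak{P}'}\, \sigma^{-1}(\alpha) = \sigma\big( (\sigma^{-1}(\alpha))^q + a \big)$$
for some $a \in \mathfrak{P}'$, and
$$\sigma\big( (\sigma^{-1}(\alpha))^q + a \big) = \alpha^q + \sigma(a) \equiv \alpha^q \bmod \sigma\mathfrak{P}'.$$
Hence $\sigma\, \mathrm{Frob}_{\mathfrak{P}'}\, \sigma^{-1}$ is the Frobenius element of $\sigma\mathfrak{P}'$. As $\mathrm{Gal}(L/K)$ acts transitively on the prime ideals lying over $\mathfrak{p}$, we can choose $\sigma$ with $\sigma\mathfrak{P}' = \mathfrak{P}$, which gives the claim.
This is all we need for now, but in chapter 5 we will expand our theory a little more.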
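As a concrete illustration of the Frobenius element, consider the extension $\mathbb{Q}(i)/\mathbb{Q}$, which is not worked out in the text. Its Galois group consists of the identity and complex conjugation, and for an odd prime $p$ the residue field of $\mathbb{Q}$ at $p$ has $q = p$ elements. The sketch below, whose function names are illustrative, computes $z \mapsto z^p$ on $\mathbb{Z}[i]/p\mathbb{Z}[i]$ and reports which of the two automorphisms it agrees with. Working modulo $p\mathbb{Z}[i]$ is a simplifying assumption: a congruence modulo $p\mathbb{Z}[i]$ implies the same congruence modulo any prime ideal lying over $p$.

```python
def gauss_mul(u, v, p):
    """Multiply a+bi and c+di in Z[i], reducing coefficients modulo p."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def gauss_pow(u, e, p):
    """Square-and-multiply exponentiation in Z[i]/(p)."""
    result = (1, 0)
    base = (u[0] % p, u[1] % p)
    while e > 0:
        if e & 1:
            result = gauss_mul(result, base, p)
        base = gauss_mul(base, base, p)
        e >>= 1
    return result

def frobenius_action(p):
    """Describe how z -> z**p acts on Z[i]/(p) for an odd prime p."""
    elements = [(a, b) for a in range(p) for b in range(p)]
    powers = {z: gauss_pow(z, p, p) for z in elements}
    if all(powers[z] == z for z in elements):
        return "identity"
    if all(powers[(a, b)] == (a, (-b) % p) for (a, b) in elements):
        return "conjugation"
    return "neither"

for p in (5, 7, 13, 19):
    print(p, frobenius_action(p))
```

For the primes tested, the Frobenius element is the identity when $p \equiv 1 \bmod 4$ and complex conjugation when $p \equiv 3 \bmod 4$, which matches the fact that $(a + bi)^p \equiv a + b\, i^p \bmod p$.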
Let $K = \mathbb{Q}(\alpha)$ be a number field with Galois group $\mathrm{Gal}(K/\mathbb{Q})$. Let $\mathcal{O}_K$ be the ring of algebraic integers of $K$. It will prove useful in what is to come that we can relate the elements of a number field to rational numbers in $\mathbb{Q}$. There are two natural maps to consider:
Definition 2.31.
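Let $L/K$ be an extension of number fields with $[L : K] = n$. Let $\sigma_1, \ldots, \sigma_n$ be the $K$-embeddings of $L$ into $\mathbb{C}$. For any $\alpha \in L$ we define the following maps:
1. The trace of an element $\alpha$:
$$\mathrm{Tr}_{L/K} : L \to \mathbb{C}, \quad \alpha \mapsto \sum_{i=1}^{n} \sigma_i(\alpha).$$
2. The field norm of an element $\alpha$:
$$N_{L/K} : L \to \mathbb{C}, \quad \alpha \mapsto \prod_{i=1}^{n} \sigma_i(\alpha),$$
which equals the product of the distinct $K$-conjugates of $\alpha$ raised to the power $[L : K(\alpha)]$.
Remark.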
If it is clear from context what field we are taking the norm over we will often just write $N_{L/K} = N$.
We will mostly be concerned with Galois extensions, which makes the field norm far simpler. For this consider $L/K$ with Galois group $\mathrm{Gal}(L/K)$; then for any $\alpha \in L$ the norm becomes
$$N_{L/K}(\alpha) = \prod_{\sigma \in \mathrm{Gal}(L/K)} \sigma(\alpha).$$
These two maps have a few important properties we will make use of throughout.
Proposition 2.32.
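1. $N_{L/K}$ is a multiplicative and $\mathrm{Tr}_{L/K}$ an additive homomorphism from $L \to K$.
2. If $\alpha \in \mathcal{O}_L$ then $N_{L/K}(\alpha) \in \mathcal{O}_K$ and $\mathrm{Tr}_{L/K}(\alpha) \in \mathcal{O}_K$.
3. Let $\alpha, \beta \in \mathcal{O}_L$ such that $\alpha \mid \beta$; then $N_{L/K}(\alpha) \mid N_{L/K}(\beta)$.
4. If $k \in K$, then $N_{L/K}(k) = k^n$ and $\mathrm{Tr}_{L/K}(k) = nk$, where $[L : K] = n$.
Remark. Note that (2) implies that for all $\alpha \in \mathcal{O}_L$:
$$\alpha \in \mathcal{O}_L^{*} \Leftrightarrow N_{L/\mathbb{Q}}(\alpha) = \pm 1,$$
as $\mathcal{O}_{\mathbb{Q}} = \mathbb{Z}$ and $\mathbb{Z}^{*} = \{\pm 1\}$.
Definition 2.33.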
Let $K$ be a number field of finite degree. Let $\mathcal{O}_K$ be the ring of algebraic integers of $K$. Let $I \subset \mathcal{O}_K$ be a non-trivial ideal of $\mathcal{O}_K$. Then
$$N(I) = |\mathcal{O}_K / I|.$$
However, these definitions are not completely independent from each other, as they are related for principal ideals. In fact, for principal ideals the field norm and the ideal norm coincide.
Theorem 2.34.
Let $K$ be a number field of finite degree. Let $I = \langle \alpha \rangle \subset \mathcal{O}_K$. Then
$$|\mathcal{O}_K / I| = N(I) = N(\langle \alpha \rangle) = |N(\alpha)|.$$
It will not come as a surprise that the ideal norm is also a homomorphism, such that for every two non-zero integral ideals $A, B \subseteq \mathcal{O}_K$, $N(AB) = N(A) N(B)$. This is absolutely not trivial, but intuitive enough that we leave the proof to [[1], theorem 9.3.2]. Now we have a definition for the ideal norm and an explicit, and reasonably simple, way to compute it for principal ideals. (There are multiple ways to define the norm of an ideal. The definition we adopt has the benefit of being completely algebraic, which is why we use it.) Up until now we do not have a general way to compute the ideal norm, but we can get closer by recalling theorem 2.16: as the ideal norm is a homomorphism, we just have to consider how to compute the ideal norm for prime ideals.
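The agreement of the ideal norm and the field norm on principal ideals can be checked by brute force in a small case. The sketch below is an illustration only and works in $\mathbb{Z}[i]$, assumed here to be the ring of integers of $\mathbb{Q}(i)$ with field norm $N(a + bi) = a^2 + b^2$. It counts the residue classes of $\mathbb{Z}[i]$ modulo $\langle \alpha \rangle$ directly, using that $N(\alpha) \in \langle \alpha \rangle$, so every class has a representative with both coordinates in $[0, N(\alpha))$. All function names are hypothetical.

```python
def gauss_norm(alpha):
    """Field norm of a + bi over Q."""
    a, b = alpha
    return a * a + b * b

def congruent(z, w, alpha):
    """True if z - w lies in the ideal generated by alpha in Z[i]."""
    a, b = alpha
    x, y = z[0] - w[0], z[1] - w[1]
    n = gauss_norm(alpha)
    # (x + yi) / (a + bi) = (x + yi)(a - bi) / n, so alpha divides x + yi
    # exactly when both coordinates of (x + yi)(a - bi) are multiples of n.
    re, im = x * a + y * b, y * a - x * b
    return re % n == 0 and im % n == 0

def count_residue_classes(alpha):
    """Count |Z[i] / <alpha>| by collecting pairwise incongruent elements."""
    n = gauss_norm(alpha)
    reps = []
    for x in range(n):
        for y in range(n):
            z = (x, y)
            if not any(congruent(z, r, alpha) for r in reps):
                reps.append(z)
    return len(reps)

for alpha in [(1, 1), (2, 3), (3, 0)]:
    print(alpha, gauss_norm(alpha), count_residue_classes(alpha))
```

In each case the number of residue classes equals the field norm of the generator, as the theorem for principal ideals predicts.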
Theorem 2.35.
Let $K$ be an algebraic number field. Let $P \subset \mathcal{O}_K$ be a prime ideal. Then there exists a unique prime $p \in \mathbb{Z}$ such that
$$P \mid \langle p \rangle.$$
As this $p \in \mathbb{Z}$ is unique we say $P$ lies over $p$, or equivalently $p$ lies below $P$. This has an immediate effect on the ideal norm, as can be seen in the following theorem:
Theorem 2.36.
Let $K$ be an algebraic number field with $[K : \mathbb{Q}] = n$. Let $P \subset \mathcal{O}_K$ be a prime ideal lying over $p \in \mathbb{Z}$. Then
$$N(P) = p^f$$
for some integer $f \in \{1, \ldots, n\}$.
Proof. As $P$ lies above $p$ we have that $P \mid \langle p \rangle$. Hence $\langle p \rangle = PQ$ for some integral ideal $Q \subseteq \mathcal{O}_K$. As the norm is multiplicative we have that
$$N(\langle p \rangle) = N(P) N(Q).$$
As the $K$-conjugates of $p$ consist of $p$ repeated $n$ times, we have $N(p) = p^n$, and so
$$N(\langle p \rangle) = |N(p)| = N(P) N(Q) = p^n.$$
Hence $N(P)$ divides $p^n$, and $N(P) > 1$ as $P$ is a proper ideal.
The $f$ in the theorem is called the inertial degree and is used to define one of the most interesting objects in number theory, the ideal class group, which we will define momentarily. First we need some preliminary definitions and properties.
Definition 2.37. Let $K$ be an algebraic number field with $[K : \mathbb{Q}] = n$. Let $p \in \mathbb{Z}$ be prime such that
$$\langle p \rangle = \prod_{i=1}^{g} P_i^{e_i}.$$
Then $g$ is called the decomposition number, $e_i$ is the ramification index for $P_i$, and the following properties hold:
1. The inertia degree, $f_i$, can be computed as $f_i = [\mathcal{O}_K / P_i : \mathbb{Z} / \langle p \rangle]$.
2. $p$ is said to be ramified in $K$ if, for some $i$, $e_i > 1$.
3. $p$ is said to be unramified in $K$ if for all $i$: $e_i = 1$.
4. The following equation holds:
$$\sum_{i=1}^{g} e_i f_i = n.$$
5. $p$ is said to be (totally) split if for all $i$: $e_i = f_i = 1$, equivalently $g = n$.
Remark.
This can also be defined generally for extensions $L/K$, and follows naturally from the above definition. For our purposes the above will suffice however.
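The quantities $g$, $e_i$ and $f_i$ can be computed by hand for $K = \mathbb{Q}(i)$, where $n = 2$. The sketch below relies on a standard fact that is not stated in the text (the Dedekind–Kummer theorem): since $\mathcal{O}_K = \mathbb{Z}[i] \cong \mathbb{Z}[x]/\langle x^2 + 1 \rangle$, the factorization of $\langle p \rangle$ mirrors the factorization of $x^2 + 1$ modulo $p$. The function name `splitting_type` and the dictionary layout are illustrative choices.

```python
def splitting_type(p):
    """Classify the prime p in Z[i] from the roots of x^2 + 1 modulo p."""
    roots = [r for r in range(p) if (r * r + 1) % p == 0]
    if len(roots) == 2:
        # Two distinct linear factors: <p> = P1 * P2.
        return {"g": 2, "e": [1, 1], "f": [1, 1], "type": "split"}
    if len(roots) == 1:
        # A double root: <p> = P^2.
        return {"g": 1, "e": [2], "f": [1], "type": "ramified"}
    # x^2 + 1 stays irreducible: <p> is itself prime with residue degree 2.
    return {"g": 1, "e": [1], "f": [2], "type": "inert"}

for p in (2, 3, 5, 7, 13):
    info = splitting_type(p)
    # Item 4 of the definition: the sum of e_i * f_i equals n = 2.
    total = sum(e * f for e, f in zip(info["e"], info["f"]))
    print(p, info["type"], total)
```

Among the primes tested only $p = 2$ ramifies, the primes $5$ and $13$ split totally, and $3$ and $7$ remain prime, with $\sum e_i f_i = 2$ in every case.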
We now define the concept of the (absolute) discriminant of a number field and immediately apply it to a theorem by Dedekind:
Definition 2.38.
Let $K$ be a number field of finite degree with ring of algebraic integers $\mathcal{O}_K$. Let $\{a_1, \ldots, a_n\}$ be an integral basis for $\mathcal{O}_K$ and let $\{\sigma_1, \ldots, \sigma_n\}$ be the set of embeddings of $K$ into $\mathbb{C}$. Then the (absolute) discriminant of $K$ is
$$\Delta_K = \det \begin{pmatrix} \sigma_1(a_1) & \cdots & \sigma_1(a_n) \\ \vdots & \ddots & \vdots \\ \sigma_n(a_1) & \cdots & \sigma_n(a_n) \end{pmatrix}^{2}.$$
Theorem 2.39.
Let $K$ be an algebraic number field. Then the prime $p \in \mathbb{Z}$ ramifies in $K$ if and only if $p \mid \Delta_K$.
Now we define the ideal class group:
Definition 2.40. Let $K$ be an algebraic number field and let $A, B \subset \mathcal{O}_K$ be non-zero ideals. Then we say that $A \sim B$, that is they are equivalent, if there exist $\alpha, \beta \in \mathcal{O}_K \setminus \{0\}$ such that $\langle \beta \rangle A = \langle \alpha \rangle B$. The equivalence classes under $\sim$ are called ideal classes and the set of ideal classes of $K$, denoted $\mathrm{Cl}_K$, is called the ideal class group. Moreover, the cardinality of the ideal class group, $|\mathrm{Cl}_K| = h_K$, is called the class number of $K$.
It is well known that $h_K$ is finite and that $\mathrm{Cl}_K$ is an abelian group for $K$ an algebraic number field. More can be said, especially about the generation of $\mathrm{Cl}_K$, and we will record these facts in quick succession.
Theorem 2.41.
Let $I \subseteq \mathcal{O}_K$ be a non-zero ideal. Then there exists a non-zero $\alpha \in I$ with
$$\left| N_{K/\mathbb{Q}}(\alpha) \right| \le c_K N(I),$$
for
$$c_K = \left( \frac{4}{\pi} \right)^{r} \frac{n!}{n^n} \sqrt{|\Delta_K|},$$
where $2r = |\{ \sigma_i \mid \sigma_i \text{ a } \mathbb{Q}\text{-embedding},\ \sigma_i(K) \not\subseteq \mathbb{R} \}|$.
Remark. $c_K$ is called the Minkowski bound.
Theorem 2.42. Any ideal class $c \in \mathrm{Cl}_K$ contains an ideal $I$ such that $N(I) \le c_K$.
Theorem 2.43. Let $K$ be a number field. Then $\mathrm{Cl}_K$ is generated by the classes of the prime ideals $[P]$ satisfying $N(P) \le c_K$. In particular, any such $P$ must lie above a prime $p \le c_K$.
In the last section we defined the absolute discriminant. It will not surprise the reader that there is also a relative discriminant. To define it we need to consider the following:
Definition 2.44.
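Let $L/K$ be number fields with rings of integers $\mathcal{O}_L$ and $\mathcal{O}_K$ respectively. The fractional ideal
$$\mathfrak{C}_{\mathcal{O}_L | \mathcal{O}_K} = \{ x \in L \mid \mathrm{Tr}(x \mathcal{O}_L) \subset \mathcal{O}_K \}$$
is called Dedekind’s complementary module. Its inverse,
$$\mathfrak{D}_{\mathcal{O}_L | \mathcal{O}_K} = \mathfrak{C}_{\mathcal{O}_L | \mathcal{O}_K}^{-1},$$
is called the different of $\mathcal{O}_L / \mathcal{O}_K$.
Remark. With the usual abuse of notation we will write $\mathfrak{D}_{L/K}$ for the different $\mathfrak{D}_{\mathcal{O}_L | \mathcal{O}_K}$.
By definition Dedekind’s complementary module contains $\mathcal{O}_L$ and so $\mathfrak{D}_{L/K}$ is in fact an integral ideal. The different is well-behaved in towers of number fields: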
Proposition 2.45.
For a tower of fields $L/M/K/\mathbb{Q}$: $\mathfrak{D}_{L/K} = \mathfrak{D}_{L/M} \mathfrak{D}_{M/K}$.
This different ideal, or simply ‘different’, is key in the definition of the relative discriminant:
Definition 2.46.
Let $L/K$ be number fields. The relative discriminant $\Delta_{L/K}$ is the ideal of $\mathcal{O}_K$ defined by the relative norm of the different:
$$\Delta_{L/K} = N_{L/K}(\mathfrak{D}_{L/K}).$$
This allows us to define a transitivity condition for the discriminant.
Lemma 2.47.
Let $L/M/K$ be a tower of number fields. Then
$$\Delta_{L/K} = \Delta_{M/K}^{[L : M]} N_{M/K}(\Delta_{L/M}).$$
Proof. Applying to $\mathfrak{D}_{L/K} = \mathfrak{D}_{L/M} \mathfrak{D}_{M/K}$ the norm $N_{L/K} = N_{M/K} \circ N_{L/M}$ we obtain
$$\Delta_{L/K} = N_{M/K}(\Delta_{L/M})\, N_{M/K}\big(\mathfrak{D}_{M/K}^{[L : M]}\big) = N_{M/K}(\Delta_{L/M})\, \Delta_{M/K}^{[L : M]}.$$
2.5 Dirichlet’s Unit Theorem
In this last preliminary section we want to get an idea of how $\mathcal{O}_K$ looks. Specifically, we want to get an idea of the units in $\mathcal{O}_K$. Note that the units of a ring form a group, called the unit group and denoted $\mathcal{O}_K^{\times}$.
Definition 2.48.
Let $n \in \mathbb{N}$ and $\zeta_n = e^{2\pi i / n}$. Then $\zeta_n$ is an $n$th root of unity and the number field $\mathbb{Q}(\zeta_n)$ is called the $n$th cyclotomic field.
These cyclotomic fields are in many ways the easiest number fields to work with because they behave especially nicely when $n$ is an odd prime. To emphasize this let $p \in \mathbb{Z}$ be an odd prime for the remainder of the section.
Theorem 2.49.
Let $K = \mathbb{Q}(\zeta_p)$; then $\mathcal{O}_K = \mathbb{Z}[\zeta_p]$.
With that little note we return to what is at hand. Let $\mu(K)$ be the group of roots of unity contained in a number field $K$. Note that $\mu(K)$ is a finite cyclic group under multiplication. The following proposition is then immediate:
Proposition 2.50.
Let $K$ be a number field. Then $\mu(K) \subset \mathcal{O}_K^{\times}$.
The question now becomes: what are the other elements in $\mathcal{O}_K^{\times}$? Recall from section 2.3 the signature of a field, $\mathrm{sign}(K) = (r, s)$, where $r$ is the number of real $\mathbb{Q}$-embeddings and $2s$ the number of complex $\mathbb{Q}$-embeddings.
Theorem 2.51. Let $K$ be a number field. There exists a map $l : \mathcal{O}_K^{\times} \to \mathbb{R}^{r+s-1}$ such that $\ker(l)$ is finite and cyclic, and $\ker(l) \cong \mu(K)$. Moreover
$$l(\mathcal{O}_K^{\times}) \cong \mathbb{Z}^{r+s-1}.$$
This immediately leads us to the concluding theorem:
Theorem 2.52. Dirichlet’s Unit Theorem
Let $K$ be a number field and let $\mathrm{sign}(K) = (r, s)$. Then
$$\mathcal{O}_K^{\times} \cong \mu(K) \times \mathbb{Z}^{r+s-1}.$$
Proof.
From theorem 2.51 we have that there exists a map $l$ such that $\ker(l) \cong \mu(K)$, hence by the isomorphism theorems
$$\mathcal{O}_K^{\times} / \mu(K) = \mathcal{O}_K^{\times} / \ker(l) \cong \mathrm{im}(l) \cong \mathbb{Z}^{r+s-1},$$
and as $\mu(K)$ is an abelian group and $\mathbb{Z}^{r+s-1}$ is free we get that $\mathcal{O}_K^{\times} \cong \mu(K) \times \mathbb{Z}^{r+s-1}$.
3 The General Number Field Sieve
For the entirety of this section let $n \in \mathbb{N}$ be a given integer that we wish to factor. Without loss of generality assume that $n$ is odd, else we could simply consider $m \in \mathbb{N}$ where $n = 2^s m$, and that $n$ is composite, else a polynomial-time primality test, such as the AKS Primality Test [18], will show that $n$ does not need factoring. The General Number Field Sieve (GNFS) then uses a congruence of squares modulo $n$ to perform integer factorization.

Assume that $(x, y)$ is a pair that admits a congruence of squares modulo $n$, so that
$$x^2 \equiv y^2 \bmod n, \quad \text{where } x \neq \pm y.$$
Then we can use $(x, y)$ to look for a non-trivial factorization through the identity $x^2 - y^2 = (x + y)(x - y)$. We compute $\gcd(x + y, n) = d_1$ and $\gcd(x - y, n) = d_2$, in the hope that this gives a non-trivial factorization, that is $d_1, d_2 \neq 1$ or $n$.

Assume that we are able to produce random integers $x$ and $y$ such that the above congruence holds. Then for the hardest case, $n = pq$ semiprime, Table 1 shows an exhaustive truth table:

Table 1: Factorization cases for $n = pq$

  p | x+y   p | x-y   q | x+y   q | x-y   gcd(x+y, n)   gcd(x-y, n)   Factor?
     T         T         T         T           n             n           F
     T         T         T         F           n             p           T
     T         T         F         T           p             n           T
     T         F         T         T           n             q           T
     T         F         T         F           n             1           F
     T         F         F         T           p             q           T
     F         T         T         T           q             n           T
     F         T         T         F           q             p           T
     F         T         F         T           1             n           F

It can be seen that we only have failure if $p$ and $q$ satisfy the same divisibility conditions on $x + y$ and $x - y$, because in that case $x \equiv \pm y \bmod n$. If that is not the case then we always find a non-trivial factorization. How the NFS operates to find such congruent pairs $(x, y)$ is the subject of this section.

3.1 Introducing the algorithm

The Number Field Sieve is so named because of a sieving process happening both over $\mathbb{Z}$ and over a ring $\mathbb{Z}[\alpha]$, which are contained in the field $\mathbb{Q}$ and the number field $\mathbb{Q}(\alpha)$ respectively. In this section we describe how this is done; the formal definitions and rigor follow in Section 3.3. The following commuting diagram captures the whole process:
$$\begin{array}{ccc}
\mathbb{Z}[x] & \xrightarrow{\;x \mapsto m\;} & \mathbb{Z} \\
\big\downarrow{\scriptstyle \bmod f} & & \big\downarrow{\scriptstyle \bmod n} \\
\mathbb{Z}[x]/(f) \cong \mathbb{Z}[\alpha] & \xrightarrow{\;x \mapsto m \bmod n\;} & \mathbb{Z}/n\mathbb{Z}
\end{array}$$

Before we can discuss how the NFS obtains a congruence we introduce the concept of smooth numbers.
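The congruence-of-squares step described at the start of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_with_squares` and the toy values $n = 91$, $x = 10$, $y = 3$ are hypothetical and do not come from the thesis.

```python
from math import gcd


def split_with_squares(x: int, y: int, n: int) -> list:
    """Try to split n using a congruence x^2 = y^2 (mod n).

    Returns the sorted list of non-trivial divisors found (possibly empty).
    """
    if (x * x - y * y) % n != 0:
        raise ValueError("x^2 and y^2 are not congruent modulo n")
    d1 = gcd(x + y, n)
    d2 = gcd(x - y, n)
    # Keep only divisors that are neither 1 nor n (the 'T' rows of Table 1).
    return sorted({d for d in (d1, d2) if 1 < d < n})


# Toy case: n = 91 = 7 * 13 and 10^2 = 100 = 9 = 3^2 (mod 91).
print(split_with_squares(10, 3, 91))   # [7, 13]

# Failure case: x = y (mod n), one of the 'F' rows of Table 1.
print(split_with_squares(94, 3, 91))   # []
```

The second call shows why the condition on $x$ and $y$ matters: when $x \equiv \pm y \bmod n$ both gcds are trivial and nothing is learned.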
Definition 3.1. Let $z \in \mathbb{Z}$ and let $B \in \mathbb{N}$. Then $z$ is said to be $B$-smooth if in the prime factorization of $z$,
$$z = \pm\prod_{i=1}^{n} p_i^{e_i},$$
it holds that $p_i \le B$ for all $i = 1, \ldots, n$.

It is clear that this definition is insufficient for $\mathcal{O}_{\mathbb{Q}(\alpha)}$, as this ring does not solely consist of integers; however, that is easily amended.

For the sake of exposition we will consider $\mathcal{O}_{\mathbb{Q}(\alpha)} = \mathbb{Z}[\alpha]$. This is rarely the case for number fields in general, as it only happens for monogenic fields, but in Section 3.3 we will see that this assumption can be dealt with algebraically, so we only need to concern ourselves with this case. Let $a - b\alpha \in \mathbb{Z}[\alpha]$ and let $\langle a - b\alpha \rangle$ be the ideal generated by it. As $\mathbb{Q}(\alpha)$ is an algebraic number field, $\mathbb{Z}[\alpha]$ is a Dedekind domain, hence any ideal splits uniquely into a product of prime ideals:
$$\langle a - b\alpha \rangle = \prod_{i=1}^{n} \mathfrak{P}_i^{\eta_i}.$$
Moreover, by Theorem 2.36 we know that any prime ideal of $\mathbb{Z}[\alpha]$ evaluates to a prime power under the norm map. This allows us to define an analogue of smoothness over $\mathbb{Z}$:

Definition 3.2. Let $a - b\alpha \in \mathbb{Z}[\alpha]$ and let $B \in \mathbb{N}$. Consider $\langle a - b\alpha \rangle \subseteq \mathbb{Z}[\alpha]$ with factorization into prime ideals
$$\langle a - b\alpha \rangle = \prod_{i=1}^{n} \mathfrak{P}_i^{\eta_i}.$$
Then $a - b\alpha$ is said to be $B$-smooth if for all $N(\mathfrak{P}_i) = p_i^{f_i}$ it holds that $p_i \le B$, where $N : \mathbb{Q}(\alpha) \to \mathbb{Q}$ is the field norm.

Using these definitions we can, for some fixed $m$ to be defined later, define sets $S_{\mathbb{Z}}$ and $S_{\mathbb{Z}[\alpha]}$ with smoothness bounds $B$ and $B'$ respectively, as follows:
$$S_{\mathbb{Z}} = \{(a, b) \in \mathbb{Z}^2 \mid \gcd(a, b) = 1,\ a - bm \text{ is } B\text{-smooth as in Definition 3.1}\},$$
$$S_{\mathbb{Z}[\alpha]} = \{(a, b) \in \mathbb{Z}^2 \mid \gcd(a, b) = 1,\ a - b\alpha \text{ is } B'\text{-smooth as in Definition 3.2}\}.$$
The bulk of the work in the GNFS goes into determining these sets, and we will go into much more detail in the next section. If we succeed in finding a set $S \subseteq S_{\mathbb{Z}} \cap S_{\mathbb{Z}[\alpha]}$ such that $|S| \ge \pi(B) + \pi(B') + 1$, then the GNFS uses the remainder of its computation time to find a set $T \subset S$ such that
$$\prod_{(a,b) \in T} (a - bm) \text{ is a square in } \mathbb{Z}, \qquad (1)$$
$$\prod_{(a,b) \in T} (a - b\alpha) \text{ is a square in } \mathbb{Z}[\alpha]. \qquad (2)$$
It is not a certainty that a set $T$ fulfilling these equations can be found. If it cannot, the GNFS can be restarted with larger smoothness bounds $B$ and $B'$. For now assume we have found such a set $T$. From here completion of the algorithm is nearly trivial:

First we observe that $f'(m)^2 \prod_{(a,b) \in T} (a - bm)$ must give a square in $\mathbb{Z}$, $z^2$ say, and $f'(\alpha)^2 \prod_{(a,b) \in T} (a - b\alpha)$ must give a square in $\mathbb{Z}[\alpha]$, $\xi^2$ say, for which there exist elementary methods to find the square roots $z$ and $\xi$.

Remark.
We have multiplied equations (1) and (2) by the square of the derivative of a polynomial at $m$ and $\alpha$ respectively. This is to ensure that $z \in \mathbb{Z}$ and $\xi \in \mathbb{Z}[\alpha]$. A more thorough discussion of this follows in 3.3.

From here we compute $\gcd(z - N(\xi), n)$ and $\gcd(z + N(\xi), n)$. If either is non-trivial, then the result is a non-trivial factor of $n$. Else we have to conclude failure and start again with different parameters.

This describes the algorithm in broad strokes. Now it is time to introduce some mathematical rigor.

The first step of the NFS is to choose a $d \in \mathbb{N}$, $d \neq 1$, and define $m = \lfloor n^{1/d} \rfloor$ such that $\gcd(m, n) = 1$. This choice of $m$ gives rise to a polynomial $f \in \mathbb{Z}[x]$ such that $n \mid f(m)$. Over the years there have been many attempts at making the choice of polynomial as effective as possible, but the simplest and effectively cheapest way to do this is to use the "base-$m$" method (by "cheapest" we mean computationally most effective). For each integer $m \in \mathbb{Z}$ we can write $n$ as a linear combination of powers of $m$ such that
$$n = \sum_{i=0}^{d} c_i m^i.$$
Then we can define a polynomial $f \in \mathbb{Z}[x]$ such that
$$f(x) = \sum_{i=0}^{d} c_i x^i,$$
where the $c_i \in \{0, \ldots, m - 1\}$ are the digits of the base-$m$ expansion of $n$. This guarantees that $n \mid f(m)$, in fact $f(m) = n$, and $\deg(f) = d$. By [[17], prop. 3.2] the leading coefficient is $c_d = 1$, so that $f \in \mathbb{Z}[x]$ is in fact monic.

Remark. Note that as $|c_i| \in \{0, \ldots, m - 1\}$, hence $c_i < m \le n^{1/d}$, we can see that the discriminant of $f$ satisfies
$$|\Delta(f)| < d^{2d} n^{2 - 3/d}.$$

We may also assume that $f \in \mathbb{Z}[x]$ is irreducible. To see this assume, to the contrary, that $f \in \mathbb{Z}[x]$ is reducible. This means that there exist non-trivial polynomials $g, h \in \mathbb{Z}[x]$ such that
$$f(x) = g(x) h(x).$$
Evaluating at $m$ gives $n = f(m) = g(m) h(m)$, so a simple computation gives a non-trivial factorization, as $g(x)$ and $h(x)$ are assumed to be non-trivial. As we may now assume $f(x)$ to be monic and irreducible we have shown the following:

Proposition 3.3. Let $f(x)$ be a univariate, monic, irreducible polynomial in $\mathbb{Z}[x]$ with $\deg(f) = d$ and with roots $\alpha_1, \ldots, \alpha_d$, not necessarily in $\mathbb{Z}$. Then for all $i = 1, \ldots, d$,
$$f(x) = \operatorname{irr}_{\mathbb{Z}}(\alpha_i).$$

Note that by assumption $\deg(f) = d > 1$. As $f$ is an irreducible polynomial in $\mathbb{Z}[x]$, none of the roots of $f$ lie in $\mathbb{Z}$, for else $f$ would split off a linear factor over $\mathbb{Z}$. Any such root, $\alpha$ say, lies in $\mathbb{C}$, and as $f(\alpha) = 0$ with $f$ monic this means that $\alpha$ is an algebraic integer. Moreover, by Theorem 2.6 we have that $\mathbb{Q}(\alpha)$ is a field and has basis $\{1, \alpha, \ldots, \alpha^{d-1}\}$ over $\mathbb{Q}$.

3.2.2 Sieving over $\mathbb{Z}$

Now that we have defined a polynomial $f \in \mathbb{Z}[x]$ and a field extension $\mathbb{Q}(\alpha)$ we can start sieving for pairs $(a, b)$ such that equations (1) and (2) hold. For the sake of exposition we will focus on the mathematical aspects and therefore consider the pairs $(a, b)$ over $\mathbb{Z}$ and $\mathbb{Z}[\alpha]$ separately.

Definition 3.4.
Let $u \in \mathbb{Z}$, then the set of integer pairs defined by
$$U = \{(a, b) \in \mathbb{Z}^2 \mid |a| \le u,\ 0 < b < u,\ \gcd(a, b) = 1\}$$
is called the universe of sieving.

Now let $u$ be a large integer dependent on the integer $n$ to be factored and define
$$\mathcal{B} = \{p \in \mathbb{Z} \mid p \text{ prime},\ p \le B\} \cup \{\pm 1\}$$
as the rational factor base of the sieve, fully dependent on the chosen smoothness bound $B$. (The rational factor base normally only contains the primes $p \le B$. In our case, however, appending $\pm 1$ lets us record the sign when $a - bm < 0$.) Then there is a standard procedure to work through to fill up the set $S_{\mathbb{Z}}$:

Algorithm 1 Procedure to populate $S_{\mathbb{Z}}$
Input: Universe $U$ of $(a, b)$ pairs, smoothness bound $B$
Output: Set $S_{\mathbb{Z}} = \{(a, b) \in U \mid a - bm \text{ is } B\text{-smooth},\ \gcd(a, b) = 1\}$
  Initialize an array comprised of $a - bm$ for all $(a, b) \in U$
  for each element in the array do
    compute the set $P_{(a,b)}$ of primes $p \le B$ such that $a \equiv bm \bmod p$
    for each $p \in P_{(a,b)}$ do
      divide $a - bm$ by the maximal power of $p$, such that $p$ does not divide the quotient, and replace $a - bm$ by this quotient
    end for
    if the element in the array is $\pm 1$ and $\gcd(a, b) = 1$ then
      add $(a, b)$ to $S_{\mathbb{Z}}$
    end if
  end for

We will accept that this algorithm terminates, by choosing $B$ large enough, and returns the set $S_{\mathbb{Z}}$ such that $|S_{\mathbb{Z}}| \ge \pi(B) + 1$. As we have recorded the vectors $\{e_p(a - bm)\}_{p \le B}$ of $p$-exponents for all primes $p \le B$ for each $a - bm$, we now define $M \in M_{|S_{\mathbb{Z}}| \times \pi(B)}(\mathbb{Z})$ given by: for all $(a_i, b_i) \in S_{\mathbb{Z}}$ and all $p_j \in \mathcal{B}$,
$$M = \begin{pmatrix}
\operatorname{sign}(a_1 - b_1 m) & e_{p_1}(a_1 - b_1 m) & \cdots & e_{p_{\pi(B)}}(a_1 - b_1 m) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{sign}(a_{|S_{\mathbb{Z}}|} - b_{|S_{\mathbb{Z}}|} m) & e_{p_1}(a_{|S_{\mathbb{Z}}|} - b_{|S_{\mathbb{Z}}|} m) & \cdots & e_{p_{\pi(B)}}(a_{|S_{\mathbb{Z}}|} - b_{|S_{\mathbb{Z}}|} m)
\end{pmatrix},$$
where the sign bit is defined as
$$\operatorname{sign}(a_i - b_i m) = \begin{cases} 1 & a_i - b_i m < 0, \\ 0 & \text{otherwise.} \end{cases}$$

To achieve the situation in which equation (1) holds we now want to find a dependent subset of $S_{\mathbb{Z}}$. For this we first consider a single value $a - bm$ with $(a, b) \in S_{\mathbb{Z}}$; it is easily observed that for this to be a square in $\mathbb{Z}$ we must have
$$x^2 = a - bm = \prod_{p \in \mathcal{B}} p^{e(p)} = \Big( \prod_{p \in \mathcal{B}} p^{e(p)/2} \Big)^2.$$
Hence
$$x = \pm \prod_{p \in \mathcal{B}} p^{e(p)/2}.$$
So it suffices for us to look for a dependent subset of $S_{\mathbb{Z}}$ over $\mathbb{F}_2$, the finite field of characteristic 2. Hence we define
$$M_2 = M \bmod 2 = (m_{i,j} \bmod 2)_{i \in \{1, \ldots, |S_{\mathbb{Z}}|\},\ j \in \{1, \ldots, \pi(B)\}}.$$
Let $f(a_i - b_i m)$ be the $i$th row of $M_2$. As $|S_{\mathbb{Z}}| > \pi(B)$ there is a linearly dependent subset of rows in $M_2$, and hence there is a non-trivial solution $T_{\mathbb{Z}} \subset S_{\mathbb{Z}}$ to the linear equation
$$\sum_{(a,b) \in T_{\mathbb{Z}}} f(a - bm) = 0.$$
With this set we obtain (1):
$$\prod_{(a,b) \in T_{\mathbb{Z}}} (a - bm) \text{ is a square in } \mathbb{Z}.$$

3.2.3 Sieving over $\mathbb{Z}[\alpha]$

To sieve over $\mathbb{Z}[\alpha]$ we will attempt a construction similar to the process used for $\mathbb{Z}$, but to do this we will first have to deal with a few of the obstructions that arise. First recall from Theorem 2.36 that for a prime ideal $\mathfrak{P}$ we have that
$$N(\mathfrak{P}) = p^f,$$
where $p \in \mathbb{Z}$ is prime and $f \in \mathbb{N}$. Then $p$ is called the prime lying below $\mathfrak{P}$ and $f$ is called the inertia degree of $\mathfrak{P}$. Moreover, a prime ideal for which $f = 1$ is called a first degree prime. For a first degree prime ideal $\mathfrak{P}$ the index is $[\mathbb{Z}[\alpha] : \mathfrak{P}] = p$, which gives rise to the isomorphism $\mathbb{Z}[\alpha]/\mathfrak{P} \cong \mathbb{Z}/p\mathbb{Z}$, hence $\mathbb{Z}[\alpha]/\mathfrak{P}$ is a field. This gives us a direct link between $\mathbb{Z}[\alpha]$ and $\mathbb{Z}/p\mathbb{Z}$:

Theorem 3.5.
Let $f(x)$ be a monic irreducible polynomial with coefficients in $\mathbb{Z}$. Let $\alpha$ be a root of $f(x)$. Then there is a bijective correspondence between the set $\mathcal{P}$ of first degree prime ideals of $\mathbb{Z}[\alpha]$ and the set
$$\{(p, m) : p \text{ prime},\ m \in \mathbb{Z}/p\mathbb{Z},\ f(m) \equiv 0 \bmod p\}.$$

Proof. Let $\mathfrak{p}$ be a first degree prime ideal of $\mathbb{Z}[\alpha]$. Then $[\mathbb{Z}[\alpha] : \mathfrak{p}] = p$ for some prime integer $p$, so that $\mathbb{Z}[\alpha]/\mathfrak{p} \cong \mathbb{Z}/p\mathbb{Z}$. There is a canonical ring epimorphism $\theta : \mathbb{Z}[\alpha] \to \mathbb{Z}[\alpha]/\mathfrak{p}$ such that $\ker(\theta) = \mathfrak{p}$, hence for $z \in \mathfrak{p}$ we have $p \mid \theta(z)$; moreover for $n \in \mathbb{Z}$ such that $p \mid n$ there is a $z \in \mathfrak{p}$ with $\theta(z) = n$. As $\theta$ is a homomorphism, $\theta(1) = 1$ and so $\theta(n) \equiv n \bmod p$ for all $n \in \mathbb{Z}$.

Now let $m = \theta(\alpha) \in \mathbb{Z}/p\mathbb{Z}$. If $f(x) = \sum_{i=0}^{d} a_i x^i$ with $a_d = 1$ and $a_i \in \mathbb{Z}$, then $\theta(f(\alpha)) \equiv 0 \bmod p$ as $f(\alpha) = 0$, and so
$$0 \equiv \theta(f(\alpha)) \equiv \theta\Big( \sum_{i=0}^{d} a_i \alpha^i \Big) \equiv \sum_{i=0}^{d} a_i \theta(\alpha)^i \equiv \sum_{i=0}^{d} a_i m^i \equiv f(m) \bmod p,$$
hence $f(m) \equiv 0 \bmod p$ and $\mathfrak{p}$ determines the unique pair $(p, m)$.

Conversely, let $p$ be a prime integer and $m \in \mathbb{Z}/p\mathbb{Z}$ with $f(m) \equiv 0 \bmod p$. Then there is a natural ring epimorphism $\theta : \mathbb{Z}[\alpha] \to \mathbb{Z}/p\mathbb{Z}$ that maps polynomials in $\alpha$ to polynomials in $m$. In particular $\theta(a) \equiv a \bmod p$ for all $a \in \mathbb{Z}$ and $\theta(\alpha) \equiv m \bmod p$. Let $\mathfrak{p} = \ker(\theta)$, so that $\mathfrak{p}$ is an ideal of $\mathbb{Z}[\alpha]$. Since $\theta$ is surjective, $\mathbb{Z}[\alpha]/\mathfrak{p} \cong \mathbb{Z}/p\mathbb{Z}$ and so $[\mathbb{Z}[\alpha] : \mathfrak{p}] = p$. This implication is unique, hence we have the two unique implications
$$(p, m) \hookrightarrow \mathfrak{p} \hookrightarrow (p, m),$$
proving the theorem.

Hence finding these first degree prime ideals is equivalent to finding roots mod $p$ of the minimal polynomial $f$ of $\alpha$. Finding roots of polynomials over finite fields has a well documented background [5], for example Berlekamp's algorithm or Cantor-Zassenhaus. We may therefore assume that finding first degree prime ideals is easy, and therefore assume we can find sufficiently many first degree prime ideals.

Now that we have restricted our prime ideals we may generalize the smoothness test to $\mathbb{Z}[\alpha]$. Buhler et al. suggested the following algorithm in [17], for a smoothness bound $B' \in \mathbb{Z}$, for now further undefined, $S_{\mathbb{Z}[\alpha]}$ as defined above, and the polynomial $f$ with root $\alpha$.
(By "easy" we mean effective, in the sense of not adding to our overall complexity, as the size of $\deg(f)$ is far smaller than the size of $n$.)

Algorithm 2 Procedure to populate $S_{\mathbb{Z}[\alpha]}$
Input: Universe $U$, polynomial $f$, smoothness bound $B'$
Output: Set $S_{\mathbb{Z}[\alpha]} = \{(a, b) \mid a - b\alpha \text{ is } B'\text{-smooth},\ \gcd(a, b) = 1\}$
  for each prime $p \le B'$ do
    for each $(a, b) \in U$ such that $b \not\equiv 0 \bmod p$ do
      initialize an array populated by $N(a - b\alpha)$
    end for
    for each $r \in \mathbb{Z}/p\mathbb{Z}$ do
      compute the set $R(p) = \{r \mid f(r) \equiv 0 \bmod p\}$
    end for
    for each $r \in R(p)$ do
      if $a \equiv br \bmod p$ then
        retrieve $N(a - b\alpha)$
        divide $N(a - b\alpha)$ by the maximal power of $p$, such that $p$ does not divide the quotient, and replace $N(a - b\alpha)$ by this quotient
      end if
    end for
  end for
  for each element in the array corresponding to the pair $(a, b)$ do
    if the element is $\pm 1$ then
      add the pair $(a, b)$ to $S_{\mathbb{Z}[\alpha]}$
    end if
  end for

Remark. It is clear that if $b \equiv 0 \bmod p$ then there are no integers $a$ with $(a, b) \in U$ and $N(\langle a - b\alpha \rangle) = N(a - b\alpha) \equiv 0 \bmod p$.

What is left to us is to find the square in $\mathbb{Z}[\alpha]$ that we wish to use. However, following the same procedure as for the rational sieve would leave us not with (2), but with
$$N\Big( \prod_{(a,b) \in S_{\mathbb{Z}[\alpha]}} (a - b\alpha) \Big) \text{ is a square in } \mathbb{Z}.$$
This is clearly a necessary condition for (2), but it is not sufficient. To combat this obstruction we recall the pairs $(p, R(p))$ as defined in algorithm 3.2.3 and define the following:

Proposition 3.6.
Let $a, b \in \mathbb{Z}$, $\gcd(a, b) = 1$, and $p \in \mathbb{Z}$ prime. Let $r \in R(p)$. Then the function
$$e_{(p,r)}(a - b\alpha) = \begin{cases} \operatorname{ord}_p(N(a - b\alpha)) & a - br \equiv 0 \bmod p, \\ 0 & \text{otherwise,} \end{cases}$$
where for $z \in \mathbb{Z}$ we set $\operatorname{ord}_p(z) = f$ if $p^f \,\|\, z$, is well defined. Moreover
$$N(a - b\alpha) = \prod_{(p,r)} p^{e_{(p,r)}(a - b\alpha)}.$$

To see how this can be used to produce the square in $\mathbb{Z}[\alpha]$ we need the last theorem of this section.

Theorem 3.7. Let $S'$ be a finite set of coprime integer pairs $(a, b)$ fulfilling equation (2). Then for each prime number $p$ and each $r \in R(p)$ we have
$$\sum_{(a,b) \in S'} e_{(p,r)}(a - b\alpha) \equiv 0 \bmod 2.$$

Proof. For each prime ideal $\mathfrak{P}$ of $\mathbb{Z}[\alpha]$ define the group homomorphism $\varphi_{\mathfrak{P}} : \mathbb{Q}(\alpha)^* \to \mathbb{Z}$ such that
1. $\varphi_{\mathfrak{P}}(\beta) \ge 0$ for all $\beta \in \mathbb{Z}[\alpha]$, $\beta \neq 0$;
2. if $\beta \in \mathbb{Z}[\alpha]$, $\beta \neq 0$, then $\varphi_{\mathfrak{P}}(\beta) > 0$ if and only if $\beta \in \mathfrak{P}$;
3. for each $\beta \in \mathbb{Q}(\alpha)^*$ one has $\varphi_{\mathfrak{P}}(\beta) = 0$ for all but finitely many $\mathfrak{P}$, and
$$|N(\beta)| = \prod_{\mathfrak{P}} N(\mathfrak{P})^{\varphi_{\mathfrak{P}}(\beta)}.$$
Then $\varphi_{\mathfrak{P}}$ is a $\mathfrak{P}$-adic valuation from a multiplicative group of units to an additive group of integers. Hence, for $\gcd(a, b) = 1$, if $\mathfrak{P}$ is not a first degree prime then $\varphi_{\mathfrak{P}}(a - b\alpha) = 0$, and if $\mathfrak{P}$ corresponds to the pair $(p, r)$ as in Theorem 3.5 then $\varphi_{\mathfrak{P}}(a - b\alpha) = e_{(p,r)}(a - b\alpha)$.

Now let $\mathfrak{P}_i$ be a first degree prime ideal of $\mathbb{Z}[\alpha]$ and let
$$\prod_{(a,b) \in S'} (a - b\alpha) = \xi^2.$$
Then, as $\varphi_{\mathfrak{P}_i}$ is a homomorphism, we get that
$$\sum_{(a,b) \in S'} e_{(p,r)}(a - b\alpha) = \sum_{(a,b) \in S'} \varphi_{\mathfrak{P}_i}(a - b\alpha) = \varphi_{\mathfrak{P}_i}\Big( \prod_{(a,b) \in S'} (a - b\alpha) \Big) = \varphi_{\mathfrak{P}_i}(\xi^2) = 2\varphi_{\mathfrak{P}_i}(\xi) \equiv 0 \bmod 2.$$

If our assumptions hold, we can now use this outcome to run a process similar to the one for the rational factor base, find the final set $S_{\mathbb{Z}[\alpha]}$, and use this to explicitly define the set $S = S_{\mathbb{Z}} \cap S_{\mathbb{Z}[\alpha]}$. However, these assumptions have not been inconsequential and will all be explained in the following section. If we suspend our disbelief for a moment longer we can finish the algorithm.

Assume that we have found $(z^2, \xi^2) \in \mathbb{Z} \times \mathbb{Z}[\alpha]$ such that they fulfil equations (1) and (2). Then all that remains is to find the square roots. First consider the rational case, so $z^2$ and $\mathbb{Z}$. By the sieving process we know the prime factorization of $z^2$:
$$z^2 = \prod_{p_i \in \mathbb{Z}} p_i^{e_i},$$
and then it is trivial to find $z$:
$$z = \prod_{p_i \in \mathbb{Z}} p_i^{e_i/2}.$$
In the algebraic case, i.e. $\xi^2$ and $\mathbb{Z}[\alpha]$, there is a bigger challenge, as there is no simple way to factorize $\xi^2$. One naive approach would be to compute the root of $x^2 - \xi^2$, but this is usually not efficiently achievable.
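As an aside on the rational case above, halving the recorded exponents can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name `sqrt_from_exponents`, the dictionary representation of the exponent vector, and the optional reduction modulo `n` are all assumptions made for the example.

```python
def sqrt_from_exponents(exponents: dict, n: int = 0) -> int:
    """Return z with z^2 = prod(p^e), given the recorded exponents {p: e}.

    All exponents must be even. If n > 0 the result is reduced modulo n,
    which is all that the final gcd computation needs.
    """
    z = 1
    for p, e in exponents.items():
        if e % 2 != 0:
            raise ValueError(f"exponent of {p} is odd, product is not a square")
        # Halve the exponent; use modular exponentiation when a modulus is given.
        z = z * (pow(p, e // 2, n) if n else p ** (e // 2))
        if n:
            z %= n
    return z


# z^2 = 2^4 * 3^2 * 7^2 = 7056, so z = 2^2 * 3 * 7 = 84.
print(sqrt_from_exponents({2: 4, 3: 2, 7: 2}))       # 84
print(sqrt_from_exponents({2: 4, 3: 2, 7: 2}, 55))   # 84 mod 55 = 29
```

No such shortcut is available on the algebraic side, which is why a different approach is needed there.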
We instead take a theoretical approach, with an eye on computational efficiency, and let $q$ be an odd prime. If $f \bmod q$ is irreducible in $\mathbb{F}_q[x]$, then $\mathbb{Z}[\alpha]/q\mathbb{Z}[\alpha] \cong \mathbb{F}_q[x]/(f \bmod q)$. It can be shown that with significant probability there exists an odd $q$ such that $f \bmod q$ is irreducible. To see this consider

Theorem 3.8. Let $f \in \mathbb{Z}[x]$ be an irreducible polynomial of degree $d$, $d > 1$. Then the density, inside the set of all prime numbers, of the set of prime numbers $q$ for which $f \bmod q$ factors in $\mathbb{F}_q[x]$ into distinct irreducible non-linear factors exists and is at least $\frac{1}{d}$.

To prove this we need the following proposition:

Proposition 3.9. Let $G$ be a finite group that acts transitively on a finite set $\Omega$ with $|\Omega| = d > 1$. Then there are at least $\frac{|G|}{d}$ elements of $G$ that act without fixed points on $\Omega$.

Now we prove the theorem.

Proof. Let $\Gamma = \operatorname{Gal}(f/\mathbb{Q})$, viewed as a permutation group of the set $A = \{\alpha_1, \ldots, \alpha_d\}$ of roots of $f$. For each prime number $q$ that does not divide the discriminant of $f$, there is a Frobenius element $\sigma_q \in \Gamma$ with the property that the degrees of the irreducible factors of $f \bmod q$ are the same as the lengths of the cycles of the permutation $\sigma_q$. Hence, we are interested in those $q$ for which $\sigma_q$ acts without fixed points on $A$. By the Chebotarev Density Theorem, 4.31, for every subset $C \subset \Gamma$ that is closed under conjugation, the set of prime numbers $q$ with $\sigma_q \in C$ has density $\frac{|C|}{|\Gamma|}$. The theorem follows from Proposition 3.9.

So, except for the extremely small probability that we cannot choose any suitable $f$ for a specific $n$, we can assume that there exists a $q$ such that $f \bmod q$ is irreducible in $\mathbb{F}_q[x]$. The following theorem completes the square root finding process.

Theorem 3.10.
Let $\mathfrak{q} = q\mathbb{Z}[\alpha]$ be an ideal of $\mathbb{Z}[\alpha]$. Then there exists a $\delta \in \mathbb{Z}[\alpha]$ such that $\delta^2 \xi^2 \equiv 1 \bmod \mathfrak{q}$.

Proof. As $\mathbb{Z}[\alpha]/q\mathbb{Z}[\alpha] \cong \mathbb{F}_q[x]/(f \bmod q)$ we know that $|\mathbb{Z}[\alpha]/q\mathbb{Z}[\alpha]| = q^d$, where $d = \deg(f)$. Now consider
$$I = q\mathbb{Z}[\alpha] = \Big\{ \sum_{i=0}^{d-1} a_i \alpha^i : q \mid a_i \Big\},$$
which is a degree $d$ prime ideal in $\mathbb{Z}[\alpha]$. As $f \bmod q$ is assumed irreducible it follows that $f'(\alpha) \notin I$, and for each $(a, b) \in S_{\mathbb{Z}[\alpha]}$ we have that $a - b\alpha \notin I$ since $\gcd(a, b) = 1$. Therefore $\xi^2 = f'(\alpha)^2 \prod_{(a,b) \in S_{\mathbb{Z}[\alpha]}} (a - b\alpha) \notin I$.

Using Berlekamp's algorithm, [5], we find an element $\delta \bmod I$ such that $\delta^2 \xi^2 \equiv 1 \bmod I$, completing the proof.

Hence there exists an element $\delta$ that is the inverse of a square root of $\xi^2$ modulo $\mathfrak{q}$. Now we can apply Newton–Raphson iteration ([5],[17]) to find approximations such that
$$\delta_j \equiv \frac{\delta_{j-1}(3 - \delta_{j-1}^2 \xi^2)}{2} \bmod (q\mathbb{Z}[\alpha]).$$
In a finite number of steps we will find a $\gamma$ such that
$$\gamma \equiv \delta_j \xi^2 \bmod (q\mathbb{Z}[\alpha]),$$
where $\gamma^2 = \xi^2$ in $\mathbb{Z}[\alpha]$. To complete the algorithm we compute $\gcd(z - N(\xi), n)$ and $\gcd(z + N(\xi), n)$ to find a factorization as described in Section 3.1.

3.3 Dealing with obstructions

So far the technique we have been trying to describe can be captured in the following two bi-implications:
$$\text{Equation (1)} \Leftrightarrow \prod_{(a,b) \in T} (a - bm) \text{ has non-negative even exponents at all primes } p \le B, \qquad (3)$$
$$\text{Equation (2)} \Leftrightarrow \prod_{(a,b) \in T} (a - b\alpha) \text{ has even exponents at all prime ideals } \mathfrak{P} \subset \mathbb{Z}[\alpha]. \qquad (4)$$
It is clear that the first of these bi-implications holds, as we can find the root by simply dividing all exponents by 2. The second is not completely clear, however. One of the first assumptions we made was that $\mathcal{O}_{\mathbb{Q}(\alpha)} = \mathbb{Z}[\alpha]$, but this is rarely the case. In fact this was believed to be true until well into the 19th century and even featured in (eventually disproven) proposed proofs of Fermat's Last Theorem [6]. This assumption leads to four obstructions that have to be mitigated. For this let $\omega = \prod_{(a,b) \in T} (a - b\alpha)$.

1. $\mathbb{Z}[\alpha]$ is not necessarily $\mathcal{O}_{\mathbb{Q}(\alpha)}$. At best we know that $\mathbb{Z}[\alpha] \subseteq \mathcal{O}_{\mathbb{Q}(\alpha)}$, hence we cannot assume that $\mathbb{Z}[\alpha]$ is a Dedekind domain, so $\omega\mathcal{O}_{\mathbb{Q}(\alpha)}$ might not be the square of an ideal in $\mathbb{Z}[\alpha]$.
2. If $\omega\mathcal{O}_{\mathbb{Q}(\alpha)}$ is the square of some ideal $I$, it is not certain that $I$ is a principal ideal.
3. If $\omega\mathcal{O}_{\mathbb{Q}(\alpha)}$ is the square of some principal ideal $\gamma\mathcal{O}_{\mathbb{Q}(\alpha)}$, it is not certain that $\omega = \gamma^2$, as $\omega$ agrees with $\gamma^2$ only up to units of $\mathcal{O}_{\mathbb{Q}(\alpha)}$.
4. Even if $\omega = \gamma^2$, we are not assured that $\gamma \in \mathbb{Z}[\alpha]$.

Clearly these are obstructions that break the number field sieve. To deal with obstruction 4 recall the standard fact that for any $\theta \in \mathcal{O}_{\mathbb{Q}(\alpha)}$ and $f(x) = \operatorname{irr}_{\mathbb{Q}}(\alpha)$ we have that
$$f'(\alpha) \cdot \theta \in \mathbb{Z}[\alpha].$$
So we simply multiply equation (2) by $f'(\alpha)^2$ and equation (1) by $f'(m)^2$. The only condition we must impose is that $\gcd(f'(m), n) = 1$, or else the resulting element given by equation (1) may not be invertible; however, this is simply checked, and if it is not the case then we have found a factorization of $n$, so we may simply assume that $f'(m)$ and $n$ are coprime.

Now we attempt to subvert obstruction 1, which consequently allows us to negate both obstructions 2 and 3 as well. To do this we will consider the following chain:
$$V \supset V_1 \supset V_2 \supset V_3 = V \cap \big( \mathbb{Q}(\alpha)^\times \big)^2,$$
where $V$ is the group of elements of $\mathbb{Q}(\alpha)^\times$ with even exponents, i.e. $e_{(p,r)}(v) \equiv 0 \bmod 2$ for all $v \in V$. This is a group with the following subgroups:
$$V_1 = \{ v \in V \mid v\mathcal{O}_{\mathbb{Q}(\alpha)} = I^2 \text{ for some ideal } I \subset \mathcal{O}_{\mathbb{Q}(\alpha)} \},$$
$$V_2 = \{ v \in V \mid v\mathcal{O}_{\mathbb{Q}(\alpha)} = \langle \vartheta \rangle^2 \text{ for some } \langle \vartheta \rangle \subset \mathcal{O}_{\mathbb{Q}(\alpha)} \},$$
$$V_3 = V \cap \big( \mathbb{Q}(\alpha)^\times \big)^2.$$
We attempt to bound the index $[V : V_3]$. Considered as a $\mathbb{Z}$-module, $\mathcal{O}_{\mathbb{Q}(\alpha)}$ is free of rank $d = \deg(f)$ and by definition $\mathbb{Z}[\alpha] \subset \mathcal{O}_{\mathbb{Q}(\alpha)}$. It is a well-known identity in algebraic number theory that
$$\Delta(f) = \big[ \mathcal{O}_{\mathbb{Q}(\alpha)} : \mathbb{Z}[\alpha] \big]^2 \cdot \Delta,$$
and as $[\mathbb{Q}(\alpha) : \mathbb{Q}] > 1$, so that $|\Delta| > 1$, the index $\big[ \mathcal{O}_{\mathbb{Q}(\alpha)} : \mathbb{Z}[\alpha] \big]$ is bounded by $\sqrt{|\Delta(f)|}$.

Remark.
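Recall that $\Delta$ is the discriminant of the number field as in Definition 2.38.

As we have factorization into prime ideals in $\mathcal{O}_{\mathbb{Q}(\alpha)}$, there is a bijection between the prime ideals $\mathfrak{Q}$ of $\mathcal{O}_{\mathbb{Q}(\alpha)}$ coprime to $\big[ \mathcal{O}_{\mathbb{Q}(\alpha)} : \mathbb{Z}[\alpha] \big]$ and the prime ideals of $\mathbb{Z}[\alpha]$ coprime to this index. In fact, if we only consider the prime ideals $\mathfrak{P} \subset \mathbb{Z}[\alpha]$ we get an isomorphism of local rings:
$$\mathbb{Z}[\alpha]_{\mathfrak{P}} \to \big( \mathcal{O}_{\mathbb{Q}(\alpha)} \big)_{\mathfrak{Q}}, \qquad \mathfrak{P} = \mathfrak{Q} \cap \mathbb{Z}[\alpha].$$
This allows us to generalize the definition of the map $e_{(p,r)}$ of Proposition 3.6.

Proposition 3.11. Let $\mathfrak{P}$ be a prime ideal of $\mathbb{Z}[\alpha]$. Then
$$e_{\mathfrak{P}}(a - b\alpha) = \sum_{\mathfrak{Q} \supset \mathfrak{P}} f(\mathfrak{Q}/\mathfrak{P})\, e_{(q,r)}(a - b\alpha),$$
where $\mathfrak{Q}$ lies over a prime $q \in \mathbb{Z}$.

In the case that $\mathfrak{P}$ does not divide the index $\big[ \mathcal{O}_{\mathbb{Q}(\alpha)} : \mathbb{Z}[\alpha] \big]$, we have $e_{\mathfrak{P}}(a - b\alpha) = e_{(q,r)}(a - b\alpha)$, as there will only be one $\mathfrak{Q}$ such that $\mathfrak{P} \subset \mathfrak{Q}$, and $f(\mathfrak{Q}/\mathfrak{P}) = 1$.

Now consider the following map:
$$V/V_1 \to \bigoplus_{\mathfrak{Q} \,\mid\, [\mathcal{O}_{\mathbb{Q}(\alpha)} : \mathbb{Z}[\alpha]]} \mathbb{Z}/2\mathbb{Z}, \qquad x \mapsto (e_{\mathfrak{Q}}(x) \bmod 2)_{\mathfrak{Q}}.$$
This is an injective homomorphism, hence $V/V_1$ is an $\mathbb{F}_2$-vector space of dimension bounded by $\big| \{ \mathfrak{Q} : \mathfrak{Q} \mid \big[ \mathcal{O}_{\mathbb{Q}(\alpha)} : \mathbb{Z}[\alpha] \big] \} \big|$. As the index is bounded by $\sqrt{|\Delta(f)|}$, the number of rational primes dividing the index is no more than $\frac{\log |\Delta(f)|}{2 \log 2}$, and for each of these primes there are at most $\deg(f)$ prime ideals $\mathfrak{Q} \subset \mathcal{O}_{\mathbb{Q}(\alpha)}$ that divide it, so we obtain
$$\dim_{\mathbb{F}_2}(V/V_1) \le \frac{d \log |\Delta(f)|}{2 \log 2}.$$
This inequality is the first step to resolving the first obstruction. Next we will bound the dimension $\dim_{\mathbb{F}_2}(V_1/V_2)$ and finally $\dim_{\mathbb{F}_2}(V_2/V_3)$ to obtain our final result.

The class group $\mathrm{Cl}_{\mathbb{Q}(\alpha)}$ is a finite abelian group and its order $h$ is, by [13], bounded by
$$h < \sqrt{|\Delta(f)|} \cdot \frac{\big( d - 1 + \tfrac{1}{2} \log |\Delta(f)| \big)^{d-1}}{(d-1)!}.$$
Define the map $\kappa : V_1 \to \mathrm{Cl}_{\mathbb{Q}(\alpha)}$ by $x \mapsto [I]$ where $x\mathcal{O}_{\mathbb{Q}(\alpha)} = I^2$. By definition $\ker(\kappa) = V_2$, hence
$$\dim_{\mathbb{F}_2}(V_1/V_2) \le \frac{\log h}{\log 2}.$$
Note that $V_2$ consists of elements that are squares in $\mathbb{Q}(\alpha)^\times$ up to units in $\mathcal{O}_{\mathbb{Q}(\alpha)}$, hence by [[13], 8.3]
$$|V_2/V_3| \le \Big| \mathcal{O}^*_{\mathbb{Q}(\alpha)} / \big( \mathcal{O}^*_{\mathbb{Q}(\alpha)} \big)^2 \Big| \le 2^d.$$
This leads to the following theorem: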
Theorem 3.12. Let $V$ be as above and suppose that $n > d^{2d^2} > 1$. Then the subgroup $V_3 = V \cap (\mathbb{Q}(\alpha)^\times)^2$ of squares in $V$ satisfies
$$\dim_{\mathbb{F}_2}(V/V_3) \le \frac{\log(n)}{\log 2}.$$

This shows that the general number field sieve algorithm is certain to generate an element in $V$; however, we are not yet sure that this element is also a square, i.e. lies in $V_3$. To obtain this we introduce the concept of quadratic characters. Let us start by describing the idea over $\mathbb{Z}$. Let $x, y \in \mathbb{Z}$; then $x$ is a quadratic residue modulo a prime $p$ if
$$x \equiv y^2 \bmod p.$$
This is easily tested by the Legendre symbol:
$$\Big( \frac{x}{p} \Big) \equiv x^{\frac{p-1}{2}} \equiv (y^2)^{\frac{p-1}{2}} = y^{p-1} \equiv 1 \bmod p.$$
Now assume that $x$ is a square; then it is also a square modulo $p$ for every prime $p$ such that $p \nmid x$. This means that for all such primes $p \in \mathbb{Z}$:
$$\Big( \frac{x}{p} \Big) = 1.$$
So if there is at least one $p$ for which $\big( \frac{x}{p} \big) = -1$, then $x$ is not a square modulo $p$ and by extension not a square. This generalizes well to $\mathbb{Q}(\alpha)$.

Let $\mathfrak{P}$ be a first degree prime ideal of $\mathbb{Z}[\alpha]$ lying over $p$ and let $(p, R(p))$ be the set as defined in algorithm 3.2.3. As there exists a ring homomorphism $\pi : \mathbb{Z}[\alpha] \to \mathbb{Z}/p\mathbb{Z}$ with $\ker(\pi) = \mathfrak{P}$ by definition, this gives rise to a Legendre symbol
$$\Big( \frac{\cdot}{\mathfrak{P}} \Big) : \mathbb{Z}[\alpha] \to \mathbb{Z}/p\mathbb{Z} \to \{\pm 1, 0\},$$
such that for an $x \in \mathbb{Z}[\alpha]$ that is a non-square modulo $\mathfrak{P}$ we have $\big( \frac{x}{\mathfrak{P}} \big) = -1$. Restricting to Legendre symbols coming from $\mathfrak{P}$ over a prime $p \in \mathbb{Z}$ such that $p > B'$ we avoid the character value 0, and as a consequence of Chebotarev's Density Theorem the Legendre symbols coming from such $\mathfrak{P}$ are equidistributed over the space of homomorphisms from $V/V_3$ to $\{\pm 1\}$, denoted $\operatorname{Hom}(V/V_3, \{\pm 1\})$.

Now let $\chi_{\mathfrak{P}} = \big( \frac{\cdot}{\mathfrak{P}} \big)$ and consider the following lemma.

Lemma 3.13. Let $k, r$ be non-negative integers, and let $E$ be a $k$-dimensional $\mathbb{F}_2$-vector space. Then the probability that $k + r$ elements drawn uniformly at random from $E$ form a spanning set for $E$ is at least $1 - 2^{-r}$.

The equidistribution over $\operatorname{Hom}(V/V_3, \{\pm 1\})$, together with the bound on the dimension of $V/V_3$ as an $\mathbb{F}_2$-vector space from Theorem 3.12, makes it overwhelmingly likely that a set of $B'' = \big\lfloor \frac{3 \log(n)}{\log 2} \big\rfloor$ quadratic characters spans the homomorphism space. If they do, then the converse to the final theorem of this section shows that if $\beta \in \mathbb{Z}[\alpha] \setminus \{0\}$ satisfies $\chi_{\mathfrak{Q}}(\beta) = 1$ for all first degree primes $\mathfrak{Q}$ with $2\beta \notin \mathfrak{Q}$, then $\beta$ is in fact a square in $\mathbb{Z}[\alpha]$. There can, however, be a finite number of exceptions, which is explored in greater detail in [17].

Theorem 3.14.
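Let $S$ be a finite set of integer pairs $(a, b)$ with $\gcd(a, b) = 1$ such that, for some $\gamma \in \mathbb{Q}(\alpha)$,
$$\omega = \prod_{(a,b) \in S} (a - b\alpha) = \gamma^2.$$
Moreover let $\mathfrak{Q}$ be a first degree prime ideal corresponding to the pair $(q, s)$ that does not divide $\langle a - b\alpha \rangle$ for any pair $(a, b) \in S$ and for which $f'(s) \not\equiv 0 \bmod q$. Then we have
$$\prod_{(a,b) \in S} \Big( \frac{a - bs}{q} \Big) = 1.$$
See [17] for details.

Proof. Let $\varphi : \mathbb{Z}[\alpha] \to \mathbb{Z}/q\mathbb{Z}$ be the ring homomorphism given by $\varphi(\alpha) = s \bmod q$ and let $\ker(\varphi) = \mathfrak{Q}$. Then $\mathfrak{Q}$ is a first degree prime ideal corresponding to the pair $(q, s)$. Let $\chi_{\mathfrak{Q}} : \mathbb{Z}[\alpha] \setminus \mathfrak{Q} \to \{\pm 1\}$ and let $\psi_{\mathfrak{Q}} : \mathbb{Z}[\alpha] \setminus \mathfrak{Q} \to (\mathbb{Z}/q\mathbb{Z}) \setminus \{0\}$ be induced by $\varphi$, such that $\chi_{\mathfrak{Q}} = \operatorname{Leg}_q \circ \psi_{\mathfrak{Q}}$, where $\operatorname{Leg}_q : (\mathbb{Z}/q\mathbb{Z}) \setminus \{0\} \to \{\pm 1\}$ is the Legendre symbol. Then
$$\chi_{\mathfrak{Q}}(a - b\alpha) = \Big( \frac{a - bs}{q} \Big).$$
As discussed, let
$$\xi^2 = f'(\alpha)^2 \prod_{(a,b) \in S} (a - b\alpha)$$
for some $\xi \in \mathbb{Z}[\alpha]$. By the hypothesis the factors on the right are not in $\mathfrak{Q}$, so we have that $\xi^2 \notin \mathfrak{Q}$. However,
$$\chi_{\mathfrak{Q}}(\xi^2) = 1, \qquad \chi_{\mathfrak{Q}}(f'(\alpha)^2) = 1,$$
so
$$\chi_{\mathfrak{Q}}\Big( \prod_{(a,b) \in S} (a - b\alpha) \Big) = \prod_{(a,b) \in S} \Big( \frac{a - bs}{q} \Big) = 1.$$
This completes the proof.

So now we can be almost certain that we can find a square in $\mathbb{Z}[\alpha]$ and have therefore mitigated all obstructions. The only question left is: what is the complexity?

3.4 Computational efficiency

Let us record the number field sieve algorithm step-by-step: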
Algorithm 3 General Number Field Sieve
Input: $n \in \mathbb{N}$, $n$ odd composite, degree $d > 1$, universe $U$, smoothness bounds $B$ and $B'$
Output: A factor of $n$, or $\neg$ for failure.
  Choose $m = \lfloor n^{1/d} \rfloor$.
  Define $f(x) \in \mathbb{Z}[x]$ as the polynomial with coefficients corresponding to the base-$m$ expansion of $n$.
  for each $(a, b) \in U$ do
    Run algorithm 3.2.2 to find $S_{\mathbb{Z}}$
    Run algorithm 3.2.3 to find $S_{\mathbb{Z}[\alpha]}$
    Use the techniques from section 3.3 to find a quadratic character base $S_Q$ such that $|S_Q| = B''$.
  end for
  if $\big| S_{\mathbb{Z}} \cap S_{\mathbb{Z}[\alpha]} \big| > |B| + |B'| + 1$ then
    Use a reduction algorithm over $\mathbb{F}_2$ to find a dependent subset $S \subset S_{\mathbb{Z}} \cap S_{\mathbb{Z}[\alpha]}$
    Compute $z^2 = \prod_{(a,b) \in S} (a - bm)$ and use its known prime factorization to find $z$
    Compute $\zeta^2 = \prod_{(a,b) \in S} (a - b\alpha)$ and use Newton–Raphson iteration to find $\zeta$
    if $z \in \mathbb{Z}$ and $\zeta \in \mathbb{Z}[\alpha]$ then
      return $\gcd(z - \zeta, n)$, $\gcd(z + \zeta, n)$
    else
      find a new dependent subset $S$. If all such sets produce no result return $\neg$
    end if
  else
    return $\neg$
  end if

It is generally considered difficult to do a complexity analysis of the number field sieve as it is stated in all its generality. Already when it was analysed in [17] there was a conjectured complexity of
$$L_n\Big( \frac{1}{3}, \sqrt[3]{\frac{64}{9}} + o(1) \Big),$$
and over the 30 years that followed the conjectured complexity has held up heuristically. There are, however, some things that we are able to say. Following [13] we are able to carefully choose $f$, and thereby $\deg(f)$, to optimize the smoothness bound $B^* = \max\{B, B'\}$ and the parameter $u$ for the universe of sieving $U$. The "basic" cost of the algorithm is computed to be $u^{2 + o(1)} + y^{2 + o(1)}$.

First we will address step 6. As the binary matrix that is formed will be sparse, i.e. there will be significantly more zeroes than ones, we can use fast algorithms to find dependencies. Two of these extremely fast algorithms are Block–Lanczos ([19],[20]) and Block–Wiedemann [21], which have similar complexity. The size of these matrices is at most $B^* \times B^*$, and by choosing $\log B^* \approx \log u$ we balance the contributions of $U$ and the matrix reduction.

Looking at the sieving procedure we can see that a pair $(a, b)$ is $B^*$-smooth if $(a - bm) N(a - b\alpha)$ is $B^*$-smooth, and thus, using that the size of $m$ is approximately $\sqrt[d]{n}$, we may bound this product by
$$u \sqrt[d]{n}\, (d + 1)\, u^d \sqrt[d]{n} \approx n^{2/d} u^{d+1},$$
by choosing $d$ optimally, namely
$$d \approx \Big( \frac{3 \log n}{\log \log n} \Big)^{1/3}.$$
The probability that a number $x$, of the right size, is smooth can be approximated by $r^{-r}$ where $r = \frac{\log x}{\log y}$. In order to maximize this probability we choose
$$r = \frac{\log x}{\log u} \approx \frac{d'^2}{d'} + d' + 1, \quad \text{for } d' = \sqrt{\frac{2 \log n}{\log u}}.$$
This means we get a dependency in the matrix if we have at least $y \approx u^2 r^{-r}$ pairs $(a, b)$. As we have assumed $\log y \approx \log u$, taking logarithms we get $\log u \approx r \log r$, equivalently $r \approx \frac{\log u}{\log \log u}$. Combining this with $r \approx 2d'$ and taking powers:
$$\frac{\log^3 u}{(\log \log u)^2} \approx 8 \log(n).$$
As we have, for $a, s, t \in \mathbb{R}$, that $s = t (\log t)^a$ gives $t = (1 + o(1))\, s (\log s)^{-a}$ as $t$ goes to infinity, we get that
$$\log y \approx \log u \approx (8 \log n)^{1/3} \Big( \frac{1}{3} \log \log n \Big)^{2/3} = \Big( \frac{8}{9} \Big)^{1/3} (\log n)^{1/3} (\log \log n)^{2/3}.$$
The final GCD computation, which even with the simplest implementation, Euclid's algorithm, has a complexity of at most $O(\log n)$, can be completely disregarded in the complexity analysis.

Hence, with this choice of basic parameters $u$, $y$ and $d \approx d' \approx \big( \frac{3 \log n}{\log \log n} \big)^{1/3}$ we get the conjectured runtime of $L_n\big( \frac{1}{3}, \sqrt[3]{\frac{64}{9}} + o(1) \big)$. However, this is fully heuristic, and very little can be proven rigorously for this particular form of the number field sieve, even with modern techniques.

4 Preparing for randomness
Before now we have been able to suffice with elementary and algebraic number theory and someGalois theory. To introduce randomness we will be drawing strongly on analytic number theory.It was Riemann who really gave analytic number theory a kickstart by introducing the Riemannzeta function.
Remark. Unless otherwise noted, $p, q \in \mathbb{Z}$ are prime numbers.

Definition 4.1. Let $s \in \mathbb{C}$ and $n \in \mathbb{N}$, then the Riemann zeta function is defined as
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^{s}}.$$
This converges for all $s$ such that $\mathrm{Re}(s) > 1$, has an analytic continuation to $\mathbb{C}$ except for a simple pole at $s = 1$, and admits a functional equation
$$\pi^{-s/2}\,\Gamma\left(\frac{s}{2}\right)\zeta(s) = \pi^{-(1-s)/2}\,\Gamma\left(\frac{1-s}{2}\right)\zeta(1-s),$$
where $\Gamma(s) = \int_{0}^{\infty} e^{-t}t^{s-1}\,dt$. Moreover, for $\mathrm{Re}(s) > 1$, $\zeta(s)$ admits an Euler product:
$$\zeta(s) = \prod_{p}\left(1 - p^{-s}\right)^{-1}.$$

Remark. The proofs of all the properties listed in the definition can be found in any undergraduate analytic number theory book. We suggest [8] or [10].

One of the first results of analytic number theory is the prime number theorem.
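As a quick numerical illustration of what the prime number theorem asserts, the sketch below counts primes with a sieve of Eratosthenes and compares $\pi(x)$ with $x/\log x$ and with a crude midpoint-rule approximation of $\mathrm{Li}(x) = \int_2^x dt/\log t$. The function names and the choice of quadrature are assumptions made purely for illustration; they are not taken from [8] or [10].

```python
import math

def prime_count(x):
    """Return pi(x), the number of primes p <= x, via the sieve of Eratosthenes."""
    if x < 2:
        return 0
    sieve = bytearray([1]) * (x + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            # Cross out p*p, p*p + p, ... (same-length slice assignment).
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(sieve)

def li(x, steps=100_000):
    """Approximate Li(x) = integral from 2 to x of dt / log t (midpoint rule)."""
    h = (x - 2) / steps
    return h * sum(1.0 / math.log(2 + (k + 0.5) * h) for k in range(steps))

if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5, 10**6):
        print(x, prime_count(x), round(x / math.log(x)), round(li(x)))
```

In such a table one would expect $\mathrm{Li}(x)$ to track $\pi(x)$ noticeably more closely than $x/\log x$ does, in line with the error terms in the theorem.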
Theorem 4.2. Let $\pi(x) = \sum_{p \le x} 1$. Then there exists a constant $a > 0$ such that
$$\pi(x) = \mathrm{Li}(x) + O\left(\frac{x}{\log x}\exp\left(-a\sqrt{\log x}\right)\right) = \frac{x}{\log x}(1 + o(1)),$$
where
$$\mathrm{Li}(x) = \int_{2}^{x} \frac{1}{\log t}\,dt = \frac{x}{\log x}\left(1 + O\left(\frac{1}{\log x}\right)\right)$$
is the logarithmic integral.

One of the most influential open questions in mathematics is related to the zeta function: the Riemann Hypothesis.

Proposition 4.3. [Riemann Hypothesis] Let $\zeta(s)$ be the Riemann zeta function. For $\mathrm{Re}(s) > 0$, if $\zeta(s) = 0$ then $\mathrm{Re}(s) = \frac{1}{2}$.

This hypothesis has been so extensively tested that it is often accepted as true in the mathematical community. As a consequence there is a whole section of number theory that is considered 'conditional', and which will become unconditional the moment that the Riemann Hypothesis is proven. There is a very interesting result by Hardy, proven in 1914, which we record for good measure, [theorem 14.8, [8]].

Theorem 4.4. [Hardy's Theorem] There exist infinitely many $t \in \mathbb{R}$ such that
$$\zeta\left(\tfrac{1}{2} + it\right) = 0.$$

However, for the Riemann Hypothesis to be true this is not enough, as we want all the zeroes in the critical strip to lie on the critical line $\mathrm{Re}(s) = \frac{1}{2}$. This is still one of the major research areas in analytic number theory, but there are some results. The first was by Selberg in 1942, who showed that a non-zero fraction of the zeroes lies on the critical line. This was later improved by Levinson, who showed in 1974-1975 that at least 34.7% of the zeroes lie on this line, and this was improved to 40% by Conrey in 1989. Since then there have been, to my knowledge, no significant improvements, leaving this topic wide open for future research.

4.2 Generalizing the Riemann zeta function
The Riemann zeta function is not the only function of its kind. The idea of the Riemann zeta function was generalized in two directions, each with its own implications. Dirichlet introduced a generalization through the use of multiplicative character functions and Dirichlet series to define a whole family of meromorphic analytic arithmetic functions. Dedekind, on the other hand, chose the algebraic route and defined the Dedekind zeta functions, which are defined over number fields. For the former we shall introduce Dirichlet characters, which extend the idea behind quadratic characters.
Definition 4.5. Let $G$ be a group. A character on $G$ is a group homomorphism $\chi : G \to \mathbb{C}^{\times}$, and the set of characters of $G$ is denoted $\hat{G}$.

Generally characters possess a few properties:

Proposition 4.6. Let $\chi$ be a $G$-character, then:
1. As $\chi$ is a homomorphism: $\forall g, g' \in G : \chi(gg') = \chi(g)\chi(g')$.
2. $\forall G, \forall \chi : \chi(e_G) = 1$.
3. $\forall G\ \exists \chi_0$ s.t. $\forall g \in G : \chi_0(g) = 1$. This is called the trivial or principal character.

These properties suggest that $\hat{G}$ might have a group structure, and this is true in the finite case:

Lemma 4.7. If $G$ is a finite group then so is $\hat{G}$.

There is one more property to consider for general characters before we can introduce Dirichlet characters, and that is orthogonality:

Definition 4.8. Let $G$ be a finite group with set (and as $G$ is finite: group) of characters $\hat{G}$. Then $G$ is said to have the orthogonality property if the following two equations hold:
$$\sum_{g \in G} \chi(g) = \begin{cases} |G|, & \chi = \chi_0 \\ 0, & \chi \neq \chi_0 \end{cases}$$
$$\sum_{\chi \in \hat{G}} \chi(g) = \begin{cases} |G|, & g = e_G \\ 0, & g \neq e_G \end{cases}$$

Remark. For the sake of clarity: note that the first sum requires a fixed character and sums over all elements, while the second requires a fixed element and sums over all characters of $G$.

It is an elementary result from representation theory that the orthogonality property holds for all finite abelian groups, and in particular for finite cyclic groups, which is of importance as we shall now see:

Definition 4.9.
1. Let $q \in \mathbb{N}$. Then a Dirichlet character mod $q$ is a character of the form $\chi : (\mathbb{Z}/q\mathbb{Z})^{\times} \to \mathbb{C}^{\times}$.
2. Let $\chi$ be a Dirichlet character mod $q$. We extend $\chi$ to an arithmetic function $\chi : \mathbb{Z} \to \mathbb{C}$ as
$$\chi(n) = \begin{cases} \chi(n \bmod q), & \gcd(n, q) = 1 \\ 0, & \gcd(n, q) > 1. \end{cases}$$

Moreover a Dirichlet character mod $q$ has the orthogonality property:
$$\sum_{\substack{n = 1 \\ \gcd(n, q) = 1}}^{q} \chi(n) = \begin{cases} \phi(q), & \chi = \chi_0 \\ 0, & \chi \neq \chi_0, \end{cases}$$
and
$$\sum_{\chi \bmod q} \chi(n) = \begin{cases} \phi(q), & n \equiv 1 \bmod q \\ 0, & \text{otherwise.} \end{cases}$$

As the Dirichlet characters are defined over $\mathbb{Z}/q\mathbb{Z}$, and we will often consider $q$ a prime, we have a vested interest in how characters of cyclic groups behave. It turns out that these are rather well-behaved:

Theorem 4.10. Assume that $G$ is a finite cyclic group such that $|G| = n$ and $\langle a \rangle = G$. Then:
1. $\hat{G}$ has exactly $n$ elements:
$$\chi_k(a^{m}) = \zeta_n^{km}, \quad k \in \{1, \ldots, n\},$$
where $\zeta_n = e^{2\pi i/n}$ is a primitive $n$-th root of unity.
2. $G$ has the orthogonality property (either as defined in Definition 4.8 or as a Dirichlet character).
3. $\hat{G}$ is cyclic with generator $\chi_1$, hence $G \cong \hat{G}$.

Proof. Let $\chi \in \hat{G}$. Since $\chi(a)^{n} = \chi(a^{n}) = \chi(e_G) = 1$, we have $\chi(a) = \zeta_n^{k}$ for some $k \in \{1, \ldots, n\}$. Hence
$$\chi(a^{m}) = \chi(a)^{m} = \zeta_n^{km},$$
proving part (1). By (1) $\hat{G}$ is cyclic and generated by $\chi_1$, so $G \cong \hat{G}$ as required. To see that $G$ has orthogonality of characters, first note that the trivial character $\chi_0 = \chi_n$ satisfies the definition of orthogonality. To illustrate we show it according to Definition 4.8. Note that
$$\sum_{g \in G} \chi_0(g) = |G|$$
is trivial. Hence assume that $k \in \{1, \ldots, n-1\}$. Then
$$\sum_{g \in G} \chi_k(g) = \sum_{m=0}^{n-1} \chi_k(a^{m}) = \sum_{m=0}^{n-1} \zeta_n^{km} = \frac{1 - \zeta_n^{kn}}{1 - \zeta_n^{k}} = 0.$$
By the duality of this identity the identity for $\hat{G}$ holds too, hence we have shown (2).

If $q$ is not prime we will still be dealing with a special group, as $(\mathbb{Z}/q\mathbb{Z})^{\times}$ is still abelian and hence can be written as a direct product of cyclic groups. Therefore we can use Theorem 4.10 to say the following:

Lemma 4.11.
Let $G_1, G_2$ be finite cyclic groups and let $G \cong G_1 \times G_2$. Let $\chi_i \in \hat{G}_i$ for $i = 1, 2$ and define $\chi : G \to \mathbb{C}^{\times}$ via $\chi(g_1, g_2) = \chi_1(g_1)\chi_2(g_2)$. This is a character. Conversely, if $\chi \in \hat{G}$ then there exists a unique choice of $\chi_1 \in \hat{G}_1$ and $\chi_2 \in \hat{G}_2$ such that $\chi(g) = \chi_1(g_1)\chi_2(g_2)$. Furthermore $\hat{G} \cong \hat{G}_1 \times \hat{G}_2$ and $G$ has orthogonality of characters, and if $G$ is finite abelian then $G \cong \hat{G}$.

It is clear that if $q$ is an odd prime, then we can define
$$\chi(n) = \left(\frac{n}{q}\right),$$
where the right hand side is the Legendre symbol. This is a character, and it is easy to show that this is, in fact, a Dirichlet character. In this sense Dirichlet characters are a generalization of quadratic characters, which will be extremely useful when we randomize the number field sieve. Another very useful property is that, by definition, Dirichlet characters are (quasi-)periodic.

Definition 4.12. Let $\chi$ be a Dirichlet character mod $q$. We say that $d \in \mathbb{N}$ is a quasi-period if for all $m \equiv n \bmod d$ such that $\gcd(mn, q) = 1$: $\chi(m) = \chi(n)$. Moreover the least such $d$ is called the conductor.

It can be checked from the definition that if there are two quasi-periods $d_1, d_2$ then there is a third: $\gcd(d_1, d_2)$. As $q$ is always a quasi-period, the conductor of a Dirichlet character mod $q$ is at most $q$. We can however say something much stronger about the conductor of $\chi$:

Proposition 4.13. Let $\chi$ be a Dirichlet character mod $q$ with conductor $d$. Then $d \mid q$.

Proof. Let $g = \gcd(d, q)$. Suppose $m \equiv n \bmod g$ and $\gcd(mn, q) = 1$. By Euclid's algorithm there exist $x, y \in \mathbb{Z}$ such that $m - n = dx + qy$, thus
$$\chi(m) = \chi(m - qy) = \chi(dx + n) = \chi(n).$$
Hence $g$ is a quasi-period of $\chi$. As $d$ is the conductor, i.e. the least quasi-period, and $g \le d$, we must have $g = d$. Since $g = \gcd(d, q)$ divides $q$, we have that $d \mid q$.

As per usual we have now found a way to consider characters "prime":

Definition 4.14. A Dirichlet character modulo $q$ is called primitive if it has conductor $q$.

Remark. Convention dictates that the trivial character $\chi_0$ is not primitive.

This idea of primitive characters is especially useful when working with induced characters. Taking inspiration from the notation used in representation theory, let $d \mid q$, let $\chi^{*}$ be a character mod $d$, and define
$$\chi(n) = \begin{cases} \chi^{*}(n), & \gcd(n, q) = 1 \\ 0, & \text{otherwise.} \end{cases}$$
Then $\chi(n)$ is a multiplicative character and has period $q$, so it is a Dirichlet character mod $q$. This is called the Dirichlet character mod $q$ induced by a Dirichlet character mod $d$, and we will denote it $\mathrm{Ind}_d^q(\chi^{*})$. We conclude with the following theorem about induced Dirichlet characters:

Theorem 4.15.
Let $\chi$ be a Dirichlet character mod $q$ with conductor $d$. Then there exists a unique character $\chi^{*}$ mod $d$ such that
$$\chi(n) = \mathrm{Ind}_d^q(\chi^{*}(n)).$$

L-functions

The definition of Dirichlet characters leads directly to the definition of Dirichlet $L$-functions:

Definition 4.16. Let $q \in \mathbb{N}$ and let $\chi$ be a Dirichlet character modulo $q$. Then the Dirichlet $L$-function associated to $\chi$, denoted $L(s, \chi)$, is defined as
$$L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^{s}},$$
for $\mathrm{Re}(s) > 1$.

As $\chi$ is a completely multiplicative character function we can immediately conclude that $L(s, \chi)$ is absolutely convergent for $\mathrm{Re}(s) > 1$ and admits the Euler product
$$L(s, \chi) = \prod_{p}\left(1 - \frac{\chi(p)}{p^{s}}\right)^{-1},$$
where $p$ is prime. The behaviour of the trivial character and the non-trivial characters is quite different. In the case of the trivial character we can rewrite the Euler product:
$$L(s, \chi_0) = \prod_{p \mid q}\left(1 - p^{-s}\right)\zeta(s).$$
This means $L(s, \chi_0)$ behaves like $\zeta(s)$ and allows for a meromorphic continuation with a single simple pole at $s = 1$. When $\chi$ is not trivial the behaviour changes radically.

Theorem 4.17. Let $q \in \mathbb{N}$ and let $\chi$ be a Dirichlet character modulo $q$ such that $\chi \neq \chi_0$. Then $L(s, \chi)$ converges and is holomorphic for $\mathrm{Re}(s) > 0$.

Remark. $\mathrm{Re}(s) = 0$ is the abscissa of convergence for $L(s, \chi)$, but despite the convergence we cannot extend the definition of the Euler product of $L(s, \chi)$ to $\mathrm{Re}(s) < 1$.

As $L(s, \chi)$ converges at $\mathrm{Re}(s) = 1$ in this case, one might be interested in what these values actually are. This has been a problem for analytic number theorists for quite some time, but we can say the following:

Theorem 4.18. Let $q \in \mathbb{N}$ and let $\chi$ be a Dirichlet character mod $q$. Then
$$\prod_{\chi \neq \chi_0} L(1, \chi) \neq 0.$$

This is absolutely not trivial, and a particularly concise proof is presented in [[8], theorem 4.9]. The first of two jewels in this is the following theorem by Dirichlet.
Theorem 4.19. Let $q \in \mathbb{N}$ and let $a \in \mathbb{Z}$ such that $\gcd(a, q) = 1$. Then there are infinitely many primes $p \equiv a \bmod q$.

Remark. Although we do not give the proof explicitly, the important step is to show that
$$\sum_{p \equiv a \bmod q} \frac{1}{p}$$
diverges. The theorem follows immediately.

It might not surprise that with the generalization of the Riemann zeta function there follows a generalization of the Riemann Hypothesis:

Proposition 4.20. [Generalized Riemann Hypothesis] Let $L(s, \chi)$ be a Dirichlet $L$-function with $\chi$ a primitive character mod $q$, $q > 1$. Then for $\mathrm{Re}(s) \in (0, 1)$, if $L(s, \chi) = 0$ then $\mathrm{Re}(s) = \frac{1}{2}$.

We will come back to Dirichlet $L$-functions when we adapt the prime number theorem to arithmetic progressions in Section 4.3.

Now we consider the other generalization of the zeta function, though it can in this case very accurately be called an extension.
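Before turning to that extension, the orthogonality relations stated for Dirichlet characters can be checked numerically for a small modulus. The sketch below assumes that $q$ is prime, so that $(\mathbb{Z}/q\mathbb{Z})^{\times}$ is cyclic and the characters can be written down from a generator exactly as in Theorem 4.10. The helper names are hypothetical and the brute-force search for a generator is only sensible for tiny $q$.

```python
import cmath

def primitive_root(q):
    """Smallest generator of (Z/qZ)^x for a prime q (brute force)."""
    for g in range(1, q):
        if len({pow(g, m, q) for m in range(q - 1)}) == q - 1:
            return g
    raise ValueError("no generator found; q is assumed to be prime")

def dirichlet_characters(q):
    """All q-1 Dirichlet characters mod a prime q, each as a dict n -> chi(n)."""
    g = primitive_root(q)
    n = q - 1                               # order of the cyclic group
    zeta = cmath.exp(2j * cmath.pi / n)     # primitive n-th root of unity
    chars = []
    for k in range(n):                      # k = 0 gives the trivial character
        chi = {pow(g, m, q): zeta ** (k * m) for m in range(n)}
        chi[0] = 0                          # chi(n) = 0 when gcd(n, q) > 1
        chars.append(chi)
    return chars

def character_sum(chi):
    """Sum of chi(n) over one full period n = 0, ..., q-1."""
    return sum(chi.values())

def dual_sum(chars, n):
    """Sum of chi(n) over all characters chi mod q, for 0 <= n < q."""
    return sum(chi[n] for chi in chars)
```

For $q = 7$, for example, `character_sum` should return approximately $\phi(7) = 6$ for the trivial character and approximately $0$ for the others, up to floating-point error.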
Definition 4.21. Let $K$ be an algebraic number field and let $I \subseteq \mathcal{O}_K$ be a non-zero ideal. Then the Dedekind zeta function over $K$ is defined as
$$\zeta_K(s) = \sum_{I \subseteq \mathcal{O}_K} \frac{1}{(N_K(I))^{s}}.$$
The Dedekind zeta function is, for $\mathrm{Re}(s) > 1$, absolutely convergent and admits an Euler product. Let $P \subseteq \mathcal{O}_K$ be a prime ideal, then:
$$\zeta_K(s) = \prod_{P \subseteq \mathcal{O}_K}\left(1 - N_K(P)^{-s}\right)^{-1}.$$
It was proven by Hecke that the Dedekind zeta function admits a meromorphic continuation to the whole complex plane, with a simple pole at $s = 1$, and also admits a functional equation, so it is clear that in many ways the Dedekind zeta function behaves like the Riemann zeta function. This similarity is so close that it culminates in the following theorem:

Theorem 4.22. Landau, 1903 - Prime Ideal Theorem. Let $K$ be an algebraic number field, let $P \subseteq \mathcal{O}_K$ denote a prime ideal, and define
$$\pi_K(x) = |\{P \subseteq \mathcal{O}_K \mid N(P) \le x\}|.$$
Then for $x \to \infty$,
$$\pi_K(x) \asymp \frac{x}{\log x}.$$

For our purposes it is sufficient to realize that the Dedekind zeta function over $K$ behaves a lot like the Riemann zeta function, and that the same language can be used to talk about the Dedekind zeta function as we use for the Riemann zeta function. In fact, there is also a Riemann Hypothesis for the Dedekind zeta functions:

Proposition 4.23. [Extended Riemann Hypothesis] Let $K$ be a number field. For $\mathrm{Re}(s) \in (0, 1)$, if $\zeta_K(s) = 0$, then $\mathrm{Re}(s) = \frac{1}{2}$.

Remark.
As the statements are analogous for the Dirichlet $L$-functions and the Dedekind zeta functions, it is not uncommon to see the Extended Riemann Hypothesis called the Generalized Riemann Hypothesis for number fields. We will in this paper refer to both as the Generalized Riemann Hypothesis, GRH, and let context determine which version we are referring to.

One of the main problems that we encounter when we consider both Dirichlet $L$-functions and Dedekind zeta functions is that we cannot generalize the zero-free region. In particular, for the Riemann zeta function we were able to extend the zero-free region just past the $\mathrm{Re}(s) = 1$ line, and this may not be true for Dedekind zeta functions and Dirichlet $L$-functions. In fact:

Theorem 4.24. There is a constant $c > 0$ such that if $L(\sigma + it, \chi) = 0$ for some primitive complex Dirichlet character $\chi$ mod $q$ then
$$\sigma < 1 - \frac{c}{\log\left(q(|t| + 2)\right)}.$$
If $\chi$ is a real primitive character then this holds for all zeroes of $L(s, \chi)$ with at most one exception. The exceptional zero, if it exists, is real and simple.

Later we will see a result by Landau, 5.29, which develops this idea. This theorem is formulated for Dirichlet $L$-functions, but a similar result exists for Dedekind zeta functions. We therefore introduce the following concept:

Definition 4.25. Let $L(s, \chi)$ be a Dirichlet $L$-function for a Dirichlet character $\chi$ modulo $q$. Then a hypothetical value $1 - \nu$, for $\nu \in \left(0, \frac{1}{2}\right)$, such that
$$L(1 - \nu, \chi) = 0$$
is called a Siegel zero of $L(s, \chi)$. Equivalently, for a number field $K$, if there exists a $\nu \in \left(0, \frac{1}{2}\right)$ such that
$$\zeta_K(1 - \nu) = 0,$$
then $1 - \nu$ is called a Siegel zero of $\zeta_K(s)$.

The culminating theorem from Siegel on 'his' zeroes was

Theorem 4.26. Siegel's theorem. For any $\epsilon > 0$ there exists a positive constant $c(\epsilon)$ such that, if $\chi$ is a real primitive character mod $q$, then
$$L(1, \chi) > c(\epsilon)\,q^{-\epsilon}.$$

Even in the definition we allude to these zeroes as hypothetical, as the assumption of the (Generalized) Riemann Hypothesis has as an immediate consequence that such zeroes do not exist. For the purposes of this paper, [14] does not assume the Riemann Hypothesis in any way, hence we must always concern ourselves with possible Siegel zeroes.
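To make the quantity appearing in Siegel's theorem concrete, the following sketch approximates $L(1, \chi)$ for the real primitive character given by the Legendre symbol modulo an odd prime $q$, by summing the (conditionally convergent) series over whole periods. The truncation point and function names are arbitrary choices made for illustration. The sketch only shows positivity for a few small moduli and says nothing about the ineffective constant $c(\epsilon)$.

```python
def legendre(n, q):
    """Legendre symbol (n/q) for an odd prime q, via Euler's criterion."""
    r = pow(n % q, (q - 1) // 2, q)     # r is 0, 1 or q-1
    return -1 if r == q - 1 else r

def L1_legendre(q, periods=20_000):
    """Partial sum of sum_{n>=1} (n/q)/n over `periods` full periods of length q."""
    chi = [legendre(n, q) for n in range(q)]
    total = 0.0
    for n in range(1, periods * q + 1):
        total += chi[n % q] / n
    return total

if __name__ == "__main__":
    for q in (3, 5, 7, 11, 13):
        print(q, L1_legendre(q))
```

Because the character sums to zero over each period, truncating after whole periods leaves an error of order $1/\text{periods}$, which is enough for a qualitative check.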
Now we turn to the next topic, which is a main factor in the Randomized Number Field Sieve: smooth numbers on arithmetic progressions. To be able to do this we first need to know a little bit about prime numbers on arithmetic progressions and their distribution. First we note that it would be beneficial for us to have a prime counting function for arithmetic progressions. This turns out to be a simple modification:

Definition 4.27. For $\gcd(a, q) = 1$ the prime counting function for arithmetic progressions is defined as
$$\pi_{(a,q)}(x) = \pi(x, a, q) = \sum_{\substack{p \le x \\ p \equiv a \bmod q}} 1.$$

The first question that comes to mind is one that was answered by Dirichlet: are there infinitely many primes of that form? This is the case, as we stated in Theorem 4.19:

Theorem 4.28. Let $x \in \mathbb{R}$ and $a, q \in \mathbb{Z}$ such that $\gcd(a, q) = 1$. Then there are infinitely many primes of the form $p \equiv a \bmod q$.

This immediately shows that, just like $\pi(x)$, the function $\pi(x, a, q)$ diverges. Therefore, we infer, there must be something interesting to say about the asymptotic behaviour of the prime counting function. Not only is there something interesting to say, there is a whole lot to say as well:

Theorem 4.29.
Let $x \in \mathbb{R}$, $a, q \in \mathbb{Z}$ such that $\gcd(a, q) = 1$, and let $\phi$ be Euler's totient function. Then
$$\pi(x, a, q) \sim \frac{x}{\phi(q)\log x}.$$
This immediately shows that the density is independent of $a$ as long as $\gcd(a, q) = 1$.

Before we start to talk about smooth numbers we want to consider if anything better than Dirichlet's theorem can be said, and we can if we allow for an ineffective bound. For this we define
$$\psi(x, a, q) = \sum_{\substack{n \le x \\ n \equiv a \bmod q}} \Lambda(n),$$
where $\Lambda(n)$ is the von Mangoldt function:
$$\Lambda(n) = \begin{cases} \log p, & \exists k \in \mathbb{N} : n = p^{k} \\ 0, & \text{otherwise.} \end{cases}$$

Theorem 4.30. Siegel-Walfisz theorem. Let $\psi(x, a, q)$ and $\Lambda(n)$ be as defined above. Given $q \in \mathbb{N}$ there exists a positive constant $c(q)$ such that
$$\psi(x, a, q) = \frac{x}{\phi(q)} + O\left(x\exp\left(-c(q)\sqrt{\log x}\right)\right).$$

The reason this is ineffective is that it is based on Siegel's theorem, 4.26, and this theorem gives no way to compute $c(q)$. Another way to go about this is looking at average cases. This probabilistic approach will be explored later and will lead to an effective constant and a far stronger bound.

It may not come as a surprise that the idea of prime numbers along arithmetic progressions can be expanded to number fields, as we have done with the Prime Ideal Theorem. For this we first consider what an 'arithmetic progression' is on a number field. This does not come intuitively, but the answer lies in Galois theory. Let $L/K$ be a Galois extension with Galois group $G = \mathrm{Gal}(L/K)$. As $L/K$ is a Galois extension the Frobenius element, $\mathrm{Frob}_P$, defines a conjugacy class
$$C = \left\{ \mathrm{Frob}_Q \mid Q \subset L \text{ s.t. } Q \text{ is a prime ideal and } Q \mid P \right\}.$$
It can be shown that this abides by the rules of modular arithmetic and therefore can be used as an extension of the idea of arithmetic progressions. Using this we get the following result:

Theorem 4.31. Chebotarev Density Theorem. Let $L/K$ be Galois and let $P \subseteq \mathcal{O}_K$ be a prime ideal. Moreover let $C \subseteq G$ be the conjugacy class defined above. Then
$$\{ P \mid P \nmid \Delta_{L/K},\ \mathrm{Frob}_P \in C \}$$
has density $\frac{|C|}{|G|}$.

This has long been the best result available, but over the years it has been strengthened both under the GRH assumption and without. Later we will see a version which will be of particular interest to us, as it is both unconditional and has effectively defined constants.

Lastly, before we move on to a different topic, we discuss how smooth numbers behave on arithmetic progressions. First we need some definitions:

Definition 4.32.
For x, y, a, q ∈ N and a Dirichlet character χ , we define:Ψ( x, y ) = |{ z ∈ N | z < x, z is y -smooth }| , Ψ r ( x, y ) = |{ z ∈ N | z < x, z is y -smooth , gcd( z, r ) = 1 }| , Ψ( x, y, a, r ) = |{ z ∈ N | z < x, z is y -smooth , z ≡ a mod r }| , Ψ( x, y, χ ) = X z Conjecture 4.33. Let A be a given positive real number. Let y and q be large with q ≤ y A .Then as log x log y → ∞ we have Ψ( x, y, a, q ) ∼ φ ( q ) Ψ q ( x, y ) . Soundararajan proved in [29] that the above holds for A ≥ √ e − ǫ given a certain bound on y . The latter restriction was later removed by Harper, [30]. We will see more of Harper’s worklater as we use these results.We now introduce some bounds based on Hildebrand and Tenenbaum’s work, which will bepresented proof-less here. For those especially interested in this we suggest [25],[26],[22]. Wewill avidly be working with these sets later on, but to do so we will first need a few resultsstarting with [[14], fact 3.11]: Proposition 4.34. Let ǫ > ≤ u ≤ (1 − ǫ ) log x log log x Then:Ψ (cid:16) x, x u (cid:17) = x exp (cid:18) − u (cid:18) log u + log log u − u − u + O ǫ (cid:18) log log u log u (cid:19)(cid:19)(cid:19) . A very rough result that we use is a direct result of this factPage 66 of 114 orollary 4.35. Fix 0 < a < b ≤ 1. Then uniformly in c, d > ρ ( L x ( b, d ) , L x ( a, c )) = L x (cid:18) b − a, d ( b − a ) c (cid:19) − o (1) . Proof. Let u = log L x ( b, d )log L x ( a, c ) = dc (cid:18) log x log log x (cid:19) b − a . Then u → ∞ and u = log (cid:16) log x log log x (cid:17) . Hence ρ ( L x ( b, d ) , L x ( a, c )) = exp( − (1 + o (1)) u log u )= exp (cid:18) − (1 + o (1)) d ( b − a ) c log b − a ( x )(log log x ) b − a (cid:19) = L x (cid:18) b − a, d ( b − a ) c (cid:19) − o (1) . If we are substantially more careful however there are tighter results, whose proofs go beyondthe scope of this paper, by Hildebrand and Tenenbaum. These results allow short intervals tobe sieved for smooth numbers effectively: Theorem 4.36. 
Fix any ǫ > 0. For any x ≥ 3, log x ≥ log y ≥ (log log x ) + ǫ , z ≤ y , thefollowing estimate holds uniformly:Ψ (cid:0) x (cid:0) z − (cid:1) , y (cid:1) − Ψ( x, y ) = Ψ( x, y ) z (cid:18) O (cid:18) log( u + 1)log y (cid:19)(cid:19) . Theorem 4.37. For any x, y , we set u = log x log y . There exists a saddle-point, α = α ( x, y ), suchthat for any 1 ≤ c ≤ y :Ψ( cx, y ) = Ψ( x, y ) c α ( x,y ) (cid:18) O (cid:18) u + log yy (cid:19)(cid:19) , withPage 67 of 114 ( x, y ) = log (cid:16) y log x + 1 (cid:17) log y (cid:18) O c (cid:18) log log( y + 1)log y (cid:19)(cid:19) . Theorem 4.38. Let c > n ∈ N such that ω ( n ) is the number of primefactors of n (without multiplicity). Let n be a y -smooth number with 2 ≤ y ≤ x such that ω ( n ) ≤ y c (log(1+ u )) − . Then:Ψ n ( x, y ) = φ ( n ) n Ψ( x, y ) (cid:18) O (cid:18) log(1 + u ) log(1 + ω ( n ))log y (cid:19)(cid:19) . This final theorem induces the following corollary which will prove crucial to us: Corollary 4.39. Take c > ω as in 4.38. Let 2 ≤ y ≤ x andwith ω ( n ) ≤ y c (log(1+ u )) − . Then:Ψ n ( x, y ) = φ ( n ) n Ψ( x, y ) (cid:18) O c (cid:18) log(1 + u ) log(1 + ω ( n ))log( y ) (cid:19)(cid:19) (cid:18) O (cid:18) ω ( n ) y (cid:19)(cid:19) . Proof. Let n = sr for s y-smooth and r with no prime factor less than y . Then Ψ s ( x, y ) =Ψ r ( x, y ) , φ ( n ) = φ ( r ) φ ( s ) and, for p prime, φ ( r ) r − = Q p | r (1 − p − ) = 1 + O ( ω ( n ) y − ), whichimplies the given bound. The Randomized Number Field Sieve is a probabilistic algorithm and makes extensive use ofthe discrete uniform distribution. Define for x ∈ [ a, b ] and for y ∈ S , where S is a finite set, thediscrete uniform distribution as P ( x ) = 1 b − a + 1 , or P ( y ) = 1 | S | This is in many ways one of the simplest distributions to work with and overall we will onlyneed a limited amount of probability theory. There is however a need to understand how adistribution can be used to define a measure. 
For this recall the following definition:

Definition 4.40. A set function $\mu : F \to \mathbb{R}$, for a field $F$, is a probability measure if it satisfies these conditions:
1. $0 \le \mu(A) \le 1$ for $A \subseteq F$.
2. $\mu(\emptyset) = 0$, $\mu(F) = 1$.
3. If $A_1, A_2, \ldots$ is a disjoint sequence of $F$-sets such that $\bigcup_{i=1}^{\infty} A_i \in F$, then
$$\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mu(A_i).$$

Remark. Note that if $\mu$ is a probability measure then the support of $\mu$ is any set $A \subset F$ for which $\mu(A) = 1$. It is clear that this always exists by the second condition.

We can confirm that for any finite set $S$ we can define a uniform measure. To see this let $\mu$ be the uniform distribution on some finite set $S = \{s_1, \ldots, s_n\}$ such that $|S| = n$. Then for any $V \subset S$ with order $|V|$ we have that
$$\mu(V) = \frac{|V|}{|S|}.$$
In particular we have that
$$\mu(\emptyset) = 0, \quad \mu(S) = 1, \quad \text{and} \quad 0 \le \mu(V) \le 1.$$
Lastly, for any collection of disjoint subsets $V_1, \ldots, V_m \subset S$:
$$\mu\left(\bigcup_{i=1}^{\infty} V_i\right) = \mu\left(\bigcup_{i=1}^{m} V_i\right) = \frac{\sum_{i=1}^{m} |V_i|}{|S|} = \sum_{i=1}^{m} \frac{|V_i|}{|S|} = \sum_{i=1}^{m} \mu(V_i).$$
Similarly we can confirm that for any continuous interval $[a, b]$ we can define a uniform measure.

Definition 4.41. For any finite set $S$, we denote the uniform measure over $S$ by $U(S)$.

One of the things we are going to use our uniform measure on are the zero-centered half-open integer intervals of length $L$, which we will denote as follows:
$$I(L) = \left[-\tfrac{1}{2}L, \tfrac{1}{2}L\right) \cap \mathbb{Z}.$$
We note here, for good measure, that this is in fact a finite set. In the section above we already discussed the Dirichlet convolution of arithmetic functions, but there is also a convolution of measures. This is normally defined as an integral, but we are interested in the discrete uniform measures as defined above. This allows us to restate the convolution of measures as follows:

Definition 4.42. For any two measures $\mu, \nu$ over an additive group $G$ we define the convolution of measures as
$$(\mu \star \nu)(x) = \sum_{y \in G} \mu(y)\,\nu(x - y).$$

Remark.
Note that it is convention to denote the convolution of measures by ∗ , but to be distinctwe will denote the convolution of measures by ⋆ and the Dirichlet convolution by ∗ .One of the big concepts in the RNFS is the avoidance of the General Riemann Hypothesis,and for this we need to consider the moments of a probability distribution. We will recognizethe first two moments, mean and variance, of the discrete uniform distribution: Definition 4.43. Let U be a discrete uniform distribution with support S , then the first moment,or mean, is defined as E ( U ) = X s ∈ S P U ( s ) . Remark. By convention we denote subscripted to the probability, P , the fixed variables. In thiscase we fix the distribution, but this is usually dropped when the distribution is clear fromcontext. Definition 4.44. Let U be a discrete uniform distribution with support S , then the secondmoment, or variance, of x ∈ S is defined asVar( x ) = E ( x − E ( x )) . Page 70 of 114he idea of the RNFS is to remove the dependence of the analysis of number field sieve-typealgorithms on the second moment, for which certain bounds on the complexity exist. To do thiswe consider a probabilistic technique which Lee and Venkatesan, in [14], have dubbed stochasticdeepening . The core idea is as follows: Lemma 4.45. Let x be a random variable with E ( x ) = µ . Given there exists a K ≥ ≤ x ≤ Kµ uniformly, then there exists i ∈ { , . . . , ⌈ log K ⌉} such that: P (cid:18) x ≥ i µ ⌈ log K ⌉ (cid:19) ≥ i +1 . This lemma states that for non-negative random variables which are not too erratic, thereis a substantial set where the value is large and whose contribution to the mean is large. Thisallows us to provide a search algorithm whose run times are shown to be near optimal withoutestablishing variance bounds.The explicit statement of such a search algorithm is not too important for us, as we are analysingthe theoretical complexity, so we will not define such an algorithm explicitly. 
It will howeverbecome important in the proof of one of the key theorems, 5.2.Page 71 of 114 The Randomized Number Field Sieve Recalling the GNFS algorithm, algorithm 3, we will focus in this section on giving a descriptionof the differences between the sieve from chapter 3 and a randomized version with provable com-plexity. It is well known that the Number Field Sieve’s runtime is dominated by two problems:1. Finding a set S of sufficient ( a, b )-pairs such that a − bm is B -smooth and a − bα is B ′ -smooth.2. Collapsing these ( a, b )-pairs to find a subset T ⊆ S such that a congruence of squaresmodulo the to-factor integer n arises.In this version of the Number Field Sieve these two steps are randomized cleverly to produce aprovable complexity. To do this there is one more change that needs to be made, and that is thechoice of polynomial. It will be shown that by randomizing the choice of polynomial a rigorousbound can be obtained for the search. To start we consider the constants δ, κ, σ, β, β ′ under theconditions that κ > δ − , σ > max ( β, β ′ ) + δ − β (1 + o (1)) + σδ + κ β ′ (1 + o (1)) , (5) δ − < σδ + κ , (6)and fix the smoothness bounds B = L n (cid:18) , β (cid:19) , B ′ = L n (cid:18) , β ′ (cid:19) . (7)The culminating theorem of the randomized number field sieve is as follows:Page 72 of 114 heorem 5.1. Let δ, κ, σ, β, β ′ be constants as defined in (1) and (2). Then for any n , therandomized number field sieve runs in expected time L n (cid:18) , r 649 + o (1) (cid:19) , and produces a pair x, y with x ≡ y mod n .It is important to stress that this is a NFS-style algorithm, so we can not be sure that thecongruence found does not give a trivial factorization. 
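The remark about trivial factorizations can be made concrete. Given a congruence of squares $x^2 \equiv y^2 \bmod n$, one computes $\gcd(x - y, n)$, and this is a proper factor of $n$ unless $x \equiv \pm y \bmod n$. The sketch below is a generic illustration with a toy modulus; the function name is hypothetical and the code is not part of the algorithm of [14].

```python
from math import gcd

def factor_from_squares(x, y, n):
    """Given x^2 = y^2 (mod n), return a proper factor of n, or None if trivial."""
    if (x * x - y * y) % n != 0:
        raise ValueError("x^2 and y^2 are not congruent modulo n")
    g = gcd(x - y, n)
    # g == 1 or g == n happens exactly when x = +-y (mod n): no information.
    return g if 1 < g < n else None

if __name__ == "__main__":
    print(factor_from_squares(10, 3, 91))   # 10^2 - 3^2 = 91
    print(factor_from_squares(88, 3, 91))   # 88 = -3 (mod 91): trivial case
```

In practice a trivial outcome simply means that another dependency from the linear algebra step has to be tried.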
The algorithm is started in the same wayas the GNFS by finding a polynomial, however in this randomized version we will insist that thepolynomial is a bivariate homogeneous monic irreducible polynomial of bounded degree, that is: f ( x, y ) ∈ Z [ x, y ] : f ( x, y ) = ˆ f m,n ( x, y ) + R ( x, y ) , where ˆ f m,n ( x, y ) is the polynomial given by n and m , such that ˆ f m,n ( m, 1) is the base- m expansionof n and R ( x, y ) = d − X i =0 c i ( x − ym ) x d − i − y i , such that every c i is determined uniformly at random and deg( f ) = d = δ q log n log log n , d odd.Moreover we bound the choice of m by m d ≤ n < m d . Remark. The harshness of these bounds are all necessary to show a provable complexity, so wewill often refer back to this introduction as “the defined parameters” without restating the exactparameters again. Remark. We will, like [14], freely switch between considering the polynomial f ( x, y ) as a bivariatepolynomial and it’s monovariate equivalent f ( x ) = f ( x, Theorem 5.2. Take δ, κ, σ, β, and β ′ with the defined parameters. For any n , the randomizednumber field sieve can almost surely find an irreducible polynomial f of degree d and height atPage 73 of 114ost L n ( ), with α a root of f , n | f ( m ), and L n ( 13 , max( β, β ′ ) + o (1))distinct pairs a < | b | ≤ L n ( , σ ) such that ( a − bm ) is B -smooth and a − bα is B ′ smooth, inexpected time at most L n ( , λ ) for any λ > max( β, β ′ ) + δ − β (1 + o (1)) + σδ + κ β ′ (1 + o (1)) . In particular, the probability that the randomized number field sieve fails to produce such a setis bounded above by L n ( , κ − δ − ) − o (1) .The key purpose for the randomization of the polynomial selection process has to do withthe sieving process over the number field, as it will cause the norm of a pair, N ( a − bα ), tobecome a random variable in Z . This allows the consideration of smoothness over Z and Z [ α ] tobe completely independent. 
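The shape of the randomized polynomial can be illustrated with a small sketch. It builds $\hat{f}_{m,n}$ from the base-$m$ digits of $n$, adds $R(x, y) = (x - ym)\sum_{i=0}^{d-1} c_i x^{d-i-1} y^{i}$ with uniformly random $c_i$, and evaluates $f = \hat{f}_{m,n} + R$. The exact index ranges in $R$, the toy parameter sizes, and all function names are assumptions made for illustration; in particular the coefficient range is far smaller than the one required in [14]. Since $R(m, 1) = 0$, the sketch can be used to check that $f(m, 1) = n$ and that $f(a, b) \equiv \hat{f}_{m,n}(a, b) \bmod (a - bm)$.

```python
import random

def integer_root(n, d):
    """Largest integer m with m**d <= n (binary search, n >= 1)."""
    lo, hi = 1, 1 << (n.bit_length() // d + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid**d <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def base_m_digits(n, m, d):
    """Coefficients a_0..a_d with n = sum a_i * m**i (a_d absorbs the remainder)."""
    digits = []
    for _ in range(d):
        n, r = divmod(n, m)
        digits.append(r)
    digits.append(n)
    return digits

def f_hat(digits, x, y):
    """Homogenized base-m polynomial: sum a_i * x**i * y**(d-i)."""
    d = len(digits) - 1
    return sum(a * x**i * y ** (d - i) for i, a in enumerate(digits))

def f_random(digits, c, m, x, y):
    """f = f_hat + R with R = (x - y*m) * sum c_i * x**(d-i-1) * y**i."""
    d = len(digits) - 1
    phi = sum(ci * x ** (d - i - 1) * y**i for i, ci in enumerate(c))
    return f_hat(digits, x, y) + (x - y * m) * phi

def sample_polynomial(n, d, coeff_bound, rng=random):
    """Pick m = floor(n**(1/d)) and uniform c_i in [-coeff_bound, coeff_bound)."""
    m = integer_root(n, d)
    digits = base_m_digits(n, m, d)
    c = [rng.randrange(-coeff_bound, coeff_bound) for _ in range(d)]
    return m, digits, c
```

The polynomial is evaluated directly from its defining formula rather than expanded into coefficients, which keeps the two properties above easy to verify for any choice of the $c_i$.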
We will explain this idea in more detail when we look at the sievingprocess over the number field.This also leads us to question what the quadratic characters will then look like, and this isthe second part where the RNFS differs significantly from the GNFS. Where in the GNFS wedefined the quadratic characters as the Legendre symbols (cid:0) rp (cid:1) modulo a a prime p , we nowhave to account for our randomization and instead will have to choose maps from Z [ α ] into F p k stochastically and close to uniformly across all k log( p ) < L n ( ). This exponentially increasesthe sizes of the fields, but we will see they are necessary for the unconditional equidistributionof the characters. Once the sieving process is finished we are, with the exception of a set of bad f , guaranteed a reduction to a congruence of squares: Theorem 5.3. Let B, B ′ be with the defined parameters. Let f be irreducible of degree d andheight at most L n ( , κ ), and let α be a root of f . Then for all but a L n ( , κ − δ − (1 + o (1))) − Page 74 of 114raction of the set of f , if we are given L n (cid:18) , max( β, β ′ ) (cid:19) Ω (log log n )pairs a < b ≤ L n ( ) such that a − mb is B -smooth and a − bα is B ′ -smooth, we can find acongruence of squares modulo n in expected time at most L n (cid:18) , (cid:18) δ , β, β ′ (cid:19)(cid:19) o (1) . Let’s dive into the details. As we mentioned the algorithm is a GNFS based algorithm with two major differences: Howthe polynomial is chosen and how the quadratic characters are chosen. To start we will look athow the polynomial is chosen, and consequences related to that, before proving theorem 5.2. Todo this let β, β ′ , κ, σ, δ, B , and B ′ be as defined in (5), (6), and (7).To begin we make a few restrictions: Definition 5.4. Let X be the set of tuples ( f, m, n, a, b ) such that the following conditions hold:1. m ∈ Z and m ∈ h − d L n (cid:0) , δ − (cid:1) , L n (cid:0) , δ − (cid:1)i . 
f ∈ Z [ x, y ], deg( f ) = d = δ q log n log log n , 2 ∤ d , with coefficients bounded by L (cid:0) , κ (cid:1) (1 + o (1)).Moreover let the coefficients { c i } i ∈ I be drawn uniformly at random such that c i ∈ I (cid:18) L n (cid:18) , κ − δ − (cid:19)(cid:19) . a, b ∈ Z , ≤ a < | b | ∈ (cid:2) L n (cid:0) , σ (cid:1)(cid:3) , with a − bm B -smooth and f ( a, b ) B ′ -smooth.Moreover let X f,m,n = { ( a, b ) ∈ Z : ( f, m, n, a, b ) ∈ X } . The first thing we note from thisPage 75 of 114efinition and the definition of our parameters is that f ( x, y ) is not generated uniformly atrandom as it has a component solely determined by m and n , ˆ f m,n ( x, y ), which is a problem aswe try to obtain that f ( a, b ) is as likely to be B ′ -smooth as a uniformly random integer of thesame size. To mitigate this we shall use the observation that c i is much larger than m to provethat the randomized part of f ( x, y ) dominates and that f ( a, b ) therefore can be considered as auniformly distributed along any arithmetic progression of common difference ( a − mb ).Once we have achieved this we shall show that for most B -smooth moduli a − mb , the B ′ -smooth numbers are approximately uniformly distributed modulo ( a − mb ). It is only then thatwe can show that f ( a, b ) is as likely to be B ′ -smooth as a random integer of the same size.Once we have shown this we shall venture to prove that there are sufficient ( a, b )-pairs such that a − mb is B -smooth and f ( a, b ) is B ′ -smooth, which will lead to a proof of theorem 5.2. A keyin this is the observation that f ( a, b ) lies on the arithmetic progression given by (cid:26) ˆ f m,n ( a, b ) + ( a − mb ) z : | z | ≤ dL n (cid:18) , κ − δ − (cid:19) b d (cid:27) . Continuing our exposition using the definitions and parameters we have set up until now werecall the following: ∀ i ∈ { , . . . , d − } : ( c i ) ∼ µ = U I (cid:18) L n (cid:18) , κ − δ − (cid:19)(cid:19) d ! 
and denote c = ( c i ) to be the coefficient vector of f .Note that for f ( a, b ) to be B ′ -smooth it is necessary for a − bm to be B -smooth, howeverthis is not something we can simply assume. Therefore we consider the definition of f ( x, y ) andnote that f ( a, b ) ≡ ˆ f m,n ( a, b ) mod a − bm. Page 76 of 114s gcd( a, b ) d | f ( a, b ) ˆ f m,n ( a, b ) we have that gcd( a, b ) d ( a − bm ) | R ( a, b ). Now let a and b beuniform in their ranges, then we can give an explicit description of the probability that a − mb is B -smooth: Proposition 5.5. Fix b in its interval. If a, m are uniformly random, then: P a,m ( a − bm is B -smooth) = L n (cid:18) , δ − β (1 + o (1)) (cid:19) − . Proof. We fix b . Note that a is uniformly random on an interval of length b , and m is uniformlyrandom over an interval of length comparable to its largest value. In particular: a − bm ∼ U (cid:20) − bL n (cid:18) , δ − (cid:19) , − b (cid:18) − d L n (cid:18) , δ − (cid:19) + 1 (cid:19)(cid:19) = U (cid:2) − x (cid:0) z − (cid:1) , − x (cid:1) , for x = L n (cid:0) , δ − (1 + o (1)) (cid:1) and z ≈ d − O ( d − ). Note that d = log (cid:16) B (cid:17) , and thatlog log B = O (log log x ). Hence from theorem 4.36 the number of smooth values of a − mb is:Ψ( x, B ) z (cid:18) O (cid:18) log( u + 1)log B (cid:19)(cid:19) . Since the range of values is of length xz , P a,m ( a − bm is B -smooth) = ρ ( x, B ) (cid:18) O (cid:18) log( u + 1)log B (cid:19)(cid:19) . Recall that B = L n (cid:0) , β (cid:1) and x = L n (cid:0) , δ − (cid:1) − . Furthermore, note that log u < log log n = o (log B ). Hence by corollary 4.35 ρ L n (cid:18) , δ − (cid:19) o (1) , B ! = L n (cid:18) , δ − β (cid:19) − o (1) . Page 77 of 114bsorbing the multiplicative 1 + o (1) into the o (1) error term in the exponent to obtain P a,m ( a − bm is B -smooth) = L n (cid:18) , δ − β (1 + o (1)) (cid:19) − . 
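The quantities appearing in proposition 5.5 can be explored numerically on toy sizes. The following is a minimal sketch, not part of the algorithm itself: it assumes the standard convention $L_n(a, c) = \exp\left(c (\log n)^a (\log\log n)^{1-a}\right)$, and the helper names `L`, `is_smooth` and `smooth_fraction` are hypothetical names chosen for illustration. Smoothness is tested by naive trial division, which is only practical for very small inputs.

```python
import math


def L(n, a, c):
    """Assumed convention: L_n(a, c) = exp(c * (log n)^a * (log log n)^(1 - a))."""
    ln = math.log(n)
    return math.exp(c * ln ** a * math.log(ln) ** (1 - a))


def is_smooth(z, B):
    """Return True if every prime factor of |z| is at most B (naive trial division)."""
    z = abs(z)
    if z == 0:
        return False
    p = 2
    while p <= B and z > 1:
        # Dividing by composite p is harmless: its prime factors are already removed.
        while z % p == 0:
            z //= p
        p += 1
    return z == 1


def smooth_fraction(x, B):
    """Empirical density of B-smooth integers in [1, x]."""
    return sum(is_smooth(z, B) for z in range(1, x + 1)) / x
```

For example, `smooth_fraction(10**5, 50)` gives an empirical counterpart of the density $\rho(x, B)$ used in the proof, which can be compared against the heuristic $u^{-u}$ with $u = \log x / \log B$.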
Now we can fix the residue of $f(a, b) \bmod a - mb$ and consider only how the randomness of our coefficients $c_i$ affects the polynomial. We will therefore also want to be explicit about the coefficient vector, $c = (c_i)_{i \in \{0, \dots, d-1\}}$, of $f(x, y)$; hence, when we need to be explicit, we will write $f(x, y) = f_c(x, y)$.

Now we are set up to show that $f(a, b)$ can be considered as uniformly distributed along progressions of common difference $(a - mb)$.

Lemma 5.6. Let $a < b$, with $\gcd(a, b) = 1$, and define $\varphi = \varphi_{a,b} : \mathbb{Z}^d \to \mathbb{Z}$ by
\[ \varphi((v_0, \dots, v_{d-1})) = \sum_{i=0}^{d-1} v_i a^{d-i-1} b^i. \]
There exists a set $S \subseteq I\left( L_n\left(\tfrac13, \sigma\right) \right)^d$ such that $\varphi$ bijects $S$ and $I\left( b^{d-1} \right)$.

Proof. For each $i \ge 0$, we claim that for any $|t| \le b^i + a^{i+1}$ there exists a representation
\[ t = a^i x_0 + a^{i-1} b x_1 + \dots + b^i x_i, \]
with $|x_0|, \dots, |x_i| \le a + b$. We proceed by induction on the number of terms. If $i = 0$ then the conclusion is trivial, as $|t| \le b^0 + a = a + 1$ and so $t = a^0 x_0 = x_0$ with $|x_0| \le a + b$, since $a + 1 \le a + b$. Now let $i > 0$; then we may choose $|y| < a$ such that $\left| t - y a^i \right| \le b^i$. Letting $z \in \{0, \dots, b-1\}$ be such that $z a^i \equiv t - y a^i \bmod b$, we can set $x_0 = y + z$, and as $|y| < a$ and $|z| < b$ it follows that $|x_0| \le a + b$. By construction $b \mid t - x_0 a^i$ and
\[ \left| \frac{t - x_0 a^i}{b} \right| = \left| \frac{(t - y a^i) - z a^i}{b} \right| \le b^{i-1} + a^i. \]
Hence by the induction hypothesis such a representation exists for every $t$ with $|t| \le b^i + a^{i+1}$.

Now we show the existence of $S$ directly. Let $t \in I(b^{d-1})$; then $|t| \le b^{d-1}$ and so the conditions of the above hold with $i = d - 1$. Hence there exists a sequence $(x_i)_{i \in \{0, \dots, d-1\}}$ such that
\[ \sum_{i=0}^{d-1} x_i a^{d-i-1} b^i = t, \]
with $|x_i| < a + b$ for all $i$.
Hence the vector given by the sequence, $x_t = (x_i)$, fulfills $\varphi(x_t) = t$. Moreover, $x_t \in I(2(a + b))^d$ and by Defn 2.4.3 this means $x_t \in I\left( L_n\left(\tfrac13, \sigma\right) \right)^d$. Hence, taking $S = \{ x_t : t \in I\left( b^{d-1} \right) \}$ gives the construction of a single vector $x_t$ for each $t$, hence $\varphi$ is bijective on $S$.

Remark. Note that we have presupposed that $\gcd(a, b) = 1$, but it is a simple exercise to show that this is no real restriction. By the definition of $(a, b) \in X_{f,m,n}$ we know that $a - mb$ is $B$-smooth, hence $\gcd(a, b)$ is $B$-smooth. Since we will only be interested in looking at $B$-smoothness to show the uniform randomness of $f(x, y)$, we can assume without loss of generality that $\gcd(a, b) = 1$, as division of two $B$-smooth numbers preserves $B$-smoothness.

We can now reformulate $f(x, y)$ using the map $\varphi$. For this consider $m, n$ fixed and observe that:
\[ f(a, b) = f_c(a, b) = f_{c'}(a, b) + (a - mb)\, \varphi_{(a,b)}(c - c'). \]
This motivates the following:

Definition 5.7. Let $S$ be the set given by lemma 5.6. For any $l \le b^{d-1}$, we define a set $S_l$ and a measure $\nu_l$ as follows:
\[ S_l = \{ v \in S \mid \varphi(v) \in I(l) \}, \qquad \nu_l = U(S_l). \]
In particular, $\nu_l$ gives a uniformly random element of $S$ whose image under $\varphi$ is in $I(l)$. It is clear that if $v \sim \nu_l$, then we can write
\[ f_v(a, b) \sim f(a, b) + (a - mb)\, U(I(l)). \]
In other words, the measures $\nu_l$ with support $S_l$ give us additive alterations to $c$ which change $f(a, b)$ additively by $a - mb$ times a uniformly random value. This means that if $f$ is indeed uniformly random, then we can replace the randomness of $c$ over cosets of $S_l$ with randomness of $f_c(a, b)$ over arithmetic progressions.

Definition 5.8. For $\bar{\mu} : X \to \mathbb{R}^+$ a measure and $F : X \to Y$ a function, we define the measure $F\bar{\mu} : Y \to \mathbb{R}^+$ given by
\[ F\bar{\mu}(y) = \bar{\mu}\left( \{ F^{-1}(y) \} \right) = \sum_{x : F(x) = y} \bar{\mu}(x). \]

Remark. It will be left to the reader to prove that this is in fact a measure.
Moreover note thatif F is bijective then F ¯ µ ( y ) = ¯ µ ( x ), for some unique x ∈ X .Now consider c ∼ ν l such that we can write f c ( a, b ) ∼ f ( a, b ) + ( a − mb ) U ( I ( l )) . As µ is a uniform distribution on a cube of side 2 L n (cid:0) , κ − δ − (cid:1) and we have a bijective func-tion ϕ : S → I ( b d − ) we can now show that ϕ µ is actually close to a convolution of uniformdistributions on intervals. In fact, by convolving with several distributions, ν l i , we can treateach coefficient in f as if it were independent and uniformly random. This is made rigorous inthe following proposition: Page 80 of 114 roposition 5.9. Fix a, b . There is a distribution ϑ such that ϑ is the convolution of uniformdistribution on intervals of lengths L n (cid:0) , κ − δ − (cid:1) a i b d − − i for i ∈ { , . . . , d − } with k ϕ µ − ϑ k = O L n (cid:18) , ( κ − δ − )(1 + o (1)) (cid:19) − ! , and | E ( ϑ ) | ≤ d − X i =0 a i b d − i − . Proof. Recall that the convolution of distributions, given by ⋆ , is defined by Definition 4.42 anddefine ν = µ ⋆ (cid:2) ⋆ d − i =0 ν a i b d − i − (cid:3) . From lemma 5.6, the support of ν a i b d − i − is contained in a cube of side 4 L n (cid:0) , σ (cid:1) . Hencethe support of P of ⋆ d − i =0 ν a i b d − i − is contained in a cube of side 4 dL n (cid:0) , σ (cid:1) . When k x k ∞ Recall from Definition 4.32Ψ( x, y, r, a ) = |{ z ∈ N : z < x, z is y -smooth , z ≡ a mod r }| and Ψ r ( x, y ) = |{ z ∈ N : z < x, z is y -smooth , gcd( z, r ) = 1 }| . We will often supress the ǫ as we are only interested in an error exponent up to o (1). 
Moreover, we will say $r$ is $B$-bad for $F$ if it is not $B$-good for $F$, and that $r$ is $B$-good (dropping the $F$) if $r$ is $B$-good for all $F \in \left[ L_n\left(\tfrac23\right) \omega^{-1}, L_n\left(\tfrac23\right) \right]$ for $\omega = L_n\left(\tfrac13\right)$.

With this notion we can start constraining the behaviour of $f(a, b) \bmod a - mb$. For this we will need to characterise the moduli for which the smooth numbers are uniformly distributed across their residue classes. Once we have that, it does not matter which residue class $f_c(a, b)$ lies in, as it does not affect its probability of being smooth as $c$ varies.

From Section 4.3 we have a result of [30] proving that, under some conditions, $y$-smooth numbers $\le x$ are uniformly distributed over the $\phi(q)$ residue classes $a \bmod q$ with $\gcd(a, q) = 1$. However, the conditions prevent us from concluding uniform distribution directly; hence if we can show that the distribution of $B'$-good numbers is close enough to uniform, assuming $a - mb$ is $B$-smooth, to be considered as such, then we are done. To do this we consider a Bombieri–Vinogradov style theorem proposed by Harper, [23]:

Theorem 5.11. Let $c$ and $K$ be fixed and effective constants. Then for any $\log^K F < B < F$ with $u = \frac{\log F}{\log B}$, and $Q \le \sqrt{\Psi(F, B)}$:
\[ \sum_{r \le Q} \max_{(s, r) = 1} \left| \Psi(F, B, r, s) - \frac{\Psi_r(F, B)}{\phi(r)} \right| \ll \Psi(F, B) \left( e^{-\frac{cu}{\log^2(u)}} + B^{-c} \right) + Q \sqrt{\Psi(F, B)} \log^{7/2} F, \]
with an implied effective constant $C = C(c, K)$.

Remark. We remark that the definition of “effective” differs per application. For our purpose we may simply claim that $c < K$ are chosen fittingly for the $n$ that is to be factored.

If we consider
\[ Q_{\max} = \max_{b, m} |a + bm| = L_n\left(\tfrac23, \delta^{-1}(1 + o(1))\right), \]
we can recast our question as the equivalent problem of bounding the probability that a $B$-smooth modulus less than $Q_{\max}$ is $B'$-bad.
This would immediately follow from showing that the number of $B'$-bad moduli below the $Q$-bound is much smaller than $\Psi(Q_{\max}, B)$. We will, for our specific needs, bound the sum in theorem 5.11, using that the common difference in the arithmetic progression is known to be $y$-smooth. For this let $q = a - mb$, then:

Proposition 5.12. Let $\epsilon > 0$. There exist constants $K, c$ such that for any $\log^K x < y < x$ with $u = \frac{\log x}{\log y}$, $x^{\epsilon} \le Q \le \sqrt{\Psi(x, y)}$ and $\omega = \omega(1)$ with $\omega = y^{O(1)}$:
\[ \sum_{\substack{r \in [Q\omega^{-1}, Q] \\ r \text{ is } y\text{-smooth}}} \max_{\gcd(a, r) = 1} \left| \Psi(x, y, r, a) - \frac{\Psi_r(x, y)}{\phi(r)} \right| \ll \Psi(x, y)\, \rho(Q, y) \left( e^{-\frac{cu}{\log^2 u}} + y^{-c} \right) + Q \sqrt{\Psi(x, y)} \log^{7/2} x, \]
for some effective implied constant.

Remark. The proof of this is mostly an exercise in restating Harper's [23] and is therefore omitted. The full proof can be found in [14].

Now we return to what we set out to prove: for most $B$-smooth moduli $a - mb$, the $B'$-smooth numbers are uniformly distributed. This follows immediately from the following two propositions:

Proposition 5.13. Fix $a, b, m, n$ in their intervals and let $f$ be uniformly random as before. Then:
\[ P_f\left( f(a, b) \text{ is } B'\text{-smooth} \mid (a - mb) \text{ is } B'\text{-good} \right) = L_n\left(\tfrac13, \frac{\kappa + \sigma\delta}{3\beta'}(1 + o(1))\right)^{-1}. \]

Proof. Let $a - bm = r$. By proposition 5.9:
\[ P_{n,f}\left( f_{n,m}(a, b) \text{ is } B'\text{-smooth} \right) = P_{n,c}\left( \hat{f}_{m,n}(a, b) + r \varphi_{(a,b)}(c) \text{ is } B'\text{-smooth} \right) \]
\[ = P_{n,\vartheta}\left( \hat{f}_{m,n}(a, b) + r\vartheta \text{ is } B'\text{-smooth} \right) + O\left( L_n\left(\tfrac23, (\kappa - \delta^{-1})(1 + o(1))\right)^{-1} \right). \]
By the definition of the parameters we have $\kappa > \delta^{-1}$ and therefore $\kappa - \delta^{-1} > 0$. Now recall, from proposition 5.9, that $\vartheta$ has $|E(\vartheta)| \le \sum_{i=0}^{d-1} a^i b^{d-i-1}$ and is sampled according to the convolution of uniform measures on intervals of length $L_n\left(\tfrac23, \kappa - \delta^{-1}\right) a^i b^{d-i-1}$ for $i = 0, \dots, d-1$.
Hence ϑ is unimodal with mode at some M satisfying | M | ≤ d − X i =0 a i b d − i − < db d − i − = L n (cid:18) , σδ (1 + o (1)) (cid:19) . Now define F max = L n (cid:18) , κ − δ − (cid:19) ( a − mb ) d − X i =0 a i b d − i − , then the support of ϑ is contained in (cid:2) M − F max | r | − , M + F max | r | − (cid:3) . We choose an ω = L n (cid:0) , o (1) (cid:1) , such that ω → ∞ , and set Y = L n (cid:16) e , κ − δ − (cid:17) b d − ω − = L n (cid:18) , κ − δ − + σδ − o (1) (cid:19) . Page 85 of 114ow we define a measure ϑ ′ to be ϑ ′ ( x ) = ϑ (max( x, Y )) x ≥ ϑ (min( x, − Y )) x < . Then k ϑ ′ − ϑ k ≤ O z ∼ ϑ ( | z | < Y ) ≤ Y (cid:18) L n (cid:18) , κ − δ − (cid:19) b d − (cid:19) − = 2 ω − , from the definition of Y . Note that Y is much larger than M and so ϑ ′ is monotone decreasingaway from 0; hence there are non-negative weights W y for y ∈ Z , with W y = 0 for | y | > F max | r | − such that: ϑ ′ = X y ≥ Y W y U ([0 , y ]) + W − y U ([ − y, , and (cid:12)(cid:12)(cid:12) − P y W y (cid:12)(cid:12)(cid:12) ≤ ω − . Hence we have: P f ( f m,n ( a, b ) is B ′ -smooth) = O ( ω ) − + F max | r | − X y = Y W y P (cid:16) ˆ f m,n ( a, b ) + r U ([0 , y ]) is B ′ -smooth (cid:17) + W − y P (cid:16) ˆ f m,n ( a, b ) + r U ([ − y, B ′ -smooth (cid:17) . We note that O ( ω − ) = L n (cid:0) , o (1) (cid:1) − terms can be absorbed into our o (1) terms, and so itsuffices to show that for any fixed, B ′ -good r and any y ∈ (cid:2) Y, F max | r | − (cid:3) : P (cid:16) ˆ f m,n ( a, b ) + r U ([0 , y ]) is B ′ -smooth (cid:17) = L n (cid:18) , κ + σδ β ′ (cid:19) − o (1) , P (cid:16) ˆ f m,n ( a, b ) + r U ([ − y, B ′ -smooth (cid:17) = L n (cid:18) , κ + σδ β ′ (cid:19) − o (1) . Since (cid:12)(cid:12)(cid:12) ˆ f m,n ( a, b ) (cid:12)(cid:12)(cid:12) ≤ ˆ F max := Y L n (cid:0) (cid:1) − , we can absorb the probability that the value on the leftis negative or positive (respectively) in the above two equations. 
From the definition of B ′ -goodPage 86 of 114nd corollary 4.39 , for any x ∈ h | r | Y − ˆ F max , F max + ˆ F max i :Ψ( x, B ′ , r, s ) = Φ r ( x, B ′ ) φ ( r ) L n (cid:18) , o (1) (cid:19) = Ψ( x, B ′ ) r L n (cid:18) , o (1) (cid:19) , and so to finish the estimate we observe that for any x ∈ h | r | Y − ˆ F max , F max + ˆ F max i : ρ ( x, B ′ ) = ρ (cid:18) L n (cid:18) , κ + σδ (cid:19) , B ′ (cid:19) = L n (cid:18) , κ + σδ β ′ (cid:19) − o (1) . Now there is only one thing left to do: Show that a − bm is B ′ -good significantly often when a − bm is B -smooth, which we have assumed from the start and have a significant probabilityfor from proposition 5.5. Proposition 5.14. Fix any b . Then P a,m ( a − mb is B ′ -good | a − mb is B -smooth) = 1 − o (1) . Proof. We begin by bounding the number of moduli which are F -bad for some F ∈ " F max L n (cid:18) (cid:19) − , F max . We fix ω = B ′ for concreteness. Observe that Ψ( F, B ′ ) = F L n (cid:0) (cid:1) − . Since L n (cid:0) (cid:1) = ω (cid:0) L n (cid:0) (cid:1)(cid:1) .Q ≤ p Ψ ( F, B ′ ) L n (cid:18) , ǫ (cid:19) − . Furthermore, for any K fixed, ω (cid:0) log K F (cid:1) = B ′ = o (cid:16) F F (cid:17) . Hence we can apply proposition5.12. Suppose that a modulus r is B -smooth and also B ′ -bad for F . Then for some residue a Page 87 of 114ith gcd ( a, r ) = 1, the contribution to the LHS of proposition 5.12 for this r is at leastΨ( F, B ′ ) φ ( r ) (1 + o (1)) = Ψ( F, B ′ ) r ≥ Ψ( F, B ′ ) Q , where for the first equality we use corollary 4.39, noting that B ≤ B ′ so r is B ′ -smooth, u < log log n and the number of divisors of r sis bounded by log r so that the multiplicative error is1 + o (1). 
Now X r ∈ [ Q max ω − ,Q max ] r is y -smooth max gcd( a,r )=1 (cid:12)(cid:12)(cid:12)(cid:12) Ψ( F, B ′ , r, a ) − Ψ r ( F, B ′ ) φ ( r ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ C Ψ( F, B ′ ) ρ ( Q max , B ′ ) (cid:18) e − cu ′ log2 u + B ′− c (cid:19) + Q max p Ψ( F, B ′ ) log / F = CF ρ ( F, B ′ ) ρ ( Q max , B ′ ) (cid:18) e − cu ′ log2 u + B ′− c (cid:19) + Q max F ρ ( F c, B ′ ) log / F. First, we observe that F and Q max are L n (23), whilst B ′ is L n (cid:0) (cid:1) . Hence both densities ρ ( Q max , B ′ ) and ρ ( F, B ′ ) are L n (cid:0) (cid:1) − . From the definition of Q max and F max we have Q max = L n (cid:18) , δ − (1 + o (1) (cid:19) , F max = L n (cid:18) , ( κ + σδ ) (1 + o (1) (cid:19) , and from the parameters defined in equation (2) we have 2 δ − < κ + δσ . Hence F Q − = L n (cid:0) (cid:1) .Since up to order L n (cid:0) (cid:1) o (1) , the first term is F and the second is Q max F , we deduce that the firstterm dominates the second. If r is B ′ -bad for F it contributes at least Ψ( F, B ′ ) Q − (1 + o (1))to the sum on the left hand side. Hence the number of moduli which are in [ Q max ω − , Q max ],are B -smooth and B ′ -bad for F is at most Q max Ψ( F, B ′ ) Ψ( F, B ′ ) ρ ( Q max , B ′ ) ( C + o (1)) (cid:18) e − cu ′ log2 u ′ + B ′− c (cid:19) , = Ψ ( Q max , B ′ ) ( C + o (1)) (cid:18) e − cu ′ log2 u ′ + B ′− c (cid:19) . Page 88 of 114f a modulus is B ′ -bad near F max , it must be B ′ -bad for some F ∈ (cid:26) F max L n (cid:18) (cid:19) − , F max (cid:27) ∪ (cid:26) i : 2 i ∈ (cid:20) F max L n (cid:18) (cid:19) , F max (cid:21) (cid:27) , which is a set of logarithmic size. We can absorb a logarithmic factor into the constants c, C , sothe number of moduli which are in [ Q max ω − , Q max ], are B -smooth and B ′ -bad is at most:Ψ( Q max , B ′ ) ( C + o (1)) (cid:18) e − cu ′ log2 u ′ + B ′− c (cid:19) = o (Ψ ( Q max B ′ )) . 
Hence even assuming that every $B$-smooth number below $Q_{\max}\omega^{-1}$ is $B'$-bad gives:
\[ P_{a,m}\left( a - mb \text{ is } B'\text{-good} \mid a - mb \text{ is } B\text{-smooth} \right) \ge 1 - \frac{\Psi(Q_{\max}\omega^{-1}, B') + o(\Psi(Q_{\max}, B'))}{\Psi(Q_{\max}, B')} = 1 - o(1). \]

Hence the probability that a $B$-smooth modulus is $B'$-good approaches 1, and the probability that $f(a, b)$ is $B'$-smooth, given that $a - mb$ is a $B$-smooth modulus, is comparable to the probability that a random integer of similar size is $B'$-smooth under the same conditions. Hence we can now prove theorem 5.2.

Now that we have shown that the changes caused by randomization are controllable, we are able to conclude that they are also sufficient to give a provable complexity. For this let us quickly reflect on our choice compared to the general number field sieve. Consider $(f, m, n, a, b) \in X$ and let $\alpha \in \mathbb{C}$ be a root of $f$ in its first coordinate, $f(\alpha, 1) = 0$. Then there is a ring homomorphism
\[ \mathbb{Z}[\alpha] \to \mathbb{Z}/n\mathbb{Z}, \qquad (a - b\alpha) \mapsto (a - bm) \bmod n. \]
From the field norm we also have a multiplicative map from $\mathcal{O}_{\mathbb{Q}(\alpha)} \to \mathbb{Z}$, as we would expect from the general number field sieve. Moreover, denoting by $f_d$ the leading coefficient of $f$, the field norm is guaranteed to be in $\frac{1}{f_d}\mathbb{Z}$ for any element of $\mathbb{Z} + \alpha\mathbb{Z}$. Hence the only thing that we cannot be sure about is that $f$ is irreducible, as $f$ is drawn uniformly at random. If $f$ is irreducible then we can apply the strategy from the general number field sieve to obtain a congruence of squares modulo $n$.

Lemma 5.15.
\[ P(f \text{ is reducible}) \le L_n\left(\tfrac23, \frac{\kappa - \delta^{-1} + o(1)}{3}\right)^{-1}. \]

Proof. Fix $m, n$ and let $H = 2 L_n\left(\tfrac23, \kappa - \delta^{-1}\right)$ be the range of each coefficient of the random part of our polynomial $f$. Note that a polynomial over $\mathbb{Z}$ that is reducible is also reducible modulo every prime. Hence if we bound the number of reducible polynomials over $\mathbb{F}_p$ for each prime $p$ and bound how often a polynomial is reducible modulo several primes $p$, we can get good bounds on the number of irreducible polynomials $f$.
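The reduction used at the start of this proof (reducible over $\mathbb{Z}$ implies reducible modulo every prime not dividing the leading coefficient, so irreducibility modulo a single such prime certifies irreducibility over $\mathbb{Z}$) can be checked on small examples. The sketch below is illustrative only and is not the counting argument of the proof: it uses brute-force trial division over $\mathbb{F}_p$, which is feasible only for tiny $p$ and small degree, and the function names are hypothetical. Polynomials are given as coefficient lists, highest degree first.

```python
from itertools import product


def poly_mod(num, den, p):
    """Remainder of num divided by the monic polynomial den over F_p."""
    num = [c % p for c in num]
    while len(num) >= len(den):
        lead = num[0]
        if lead:
            # Subtract lead * den, aligned with the leading term of num.
            for i, c in enumerate(den):
                num[i] = (num[i] - lead * c) % p
        num.pop(0)  # the leading coefficient is now zero
    return num


def is_irreducible_mod_p(f, p):
    """Brute force: f is irreducible over F_p iff no monic polynomial of
    degree 1..deg(f)//2 divides it. Assumes deg(f) >= 1 and that the
    leading coefficient of f is nonzero modulo p."""
    d = len(f) - 1
    for k in range(1, d // 2 + 1):
        for tail in product(range(p), repeat=k):
            g = [1, *tail]
            if not any(poly_mod(f, g, p)):
                return False
    return True
```

As a caution, the converse of the implication fails: $x^4 + 1$ is irreducible over $\mathbb{Z}$ but `is_irreducible_mod_p([1, 0, 0, 0, 1], p)` is false for every prime $p$, which is one reason a sieve over several primes is needed rather than a test at a single prime.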
We count the reducible polynomials $f$ with the Turán sieve, [11]. Let $\mathcal{A} = \{ (c_{d-1}, \dots, c_0) \in \mathbb{Z}^d, |c_i| < H \}$, which we identify with the set of polynomials $f(x, y) = \hat{f}_{m,n}(x, y) + R(x, y)$, where $f$ and $\hat{f}_{m,n}$ are both homogeneous of degree $d$ and $(c_i)$ are the coefficients of $R$. For any prime $r$, let $\mathcal{A}_r$ be the subset of $\mathcal{A}$ corresponding to irreducible polynomials mod $r$. Note that for any $f$ to correspond to $g$ mod $r$ we must have $(x - my) \mid \hat{f}_{m,n} - g \in \mathbb{F}_r[x, y]$, or equivalently $g(m, 1) \equiv n \bmod r$. We do not insist that $g$ is monic, although any irreducible $g$ must be a scalar multiple of a monic irreducible. To estimate the number of irreducibles we accept the following fact, [[7], Chapter 2]:

For any $0 < i < r$, the number of monic irreducible $g$ of degree $D$ such that $g(m) \equiv i \bmod r$ is $\frac{r^{D-1}}{D(D-1)} + O\left(r^{D/2}\right)$.

Note that for any $g$ over $\mathbb{F}_r$ such that $g(m) \equiv n$, with $r \ll \sqrt{H}$, there are
\[ \left( \frac{H}{r} + O(1) \right)^d = \left( \frac{H}{r} \right)^d + O\left( \left( \frac{H}{r} \right)^{d-1} \right) \]
polynomials lying over $g$ in $\mathcal{A}$, and none if $g(m) \not\equiv n \bmod r$. Hence by a union bound over the irreducibles mod $r$:
\[ |\mathcal{A}_r| \le \frac{H^d}{d(d-1)} + O\left( \frac{H^d}{r^{d-1}} \right) + O\left( H^{d-1} r \right), \]
\[ |\mathcal{A}_r \cap \mathcal{A}_{r'}| \le \frac{H^d}{d(d-1)} + O\left( \frac{H^d}{r^{d-1}} \right) + O\left( \frac{H^d}{r'^{\,d-1}} \right) + O\left( H^{d-1} r r' \right). \]
From the Turán sieve, considering all primes $r \le p$ for any $p$ much smaller than $\sqrt{H}$, the number of reducible polynomials $f$ is much smaller than $H^d p^{-1} \log p + H^{d-1} p^2$, which for $p \sim H^{1/3} \log^{1/3} H$ is $H^{d - 1/3} \log H$. The number of potential $f$ for fixed $n, m$ is $H^d$, and so the probability that $f$ is reducible is at most
\[ H^{-1/3} \log H = L_n\left(\tfrac23, \frac{\kappa - \delta^{-1} + o(1)}{3}\right)^{-1}. \]

Remark. If $f$ is reducible we immediately assume that the algorithm fails; hence when sampling at most $L_n\left(\tfrac13\right)$ polynomials, the probability that any of them is reducible is $o(1)$.

Now assume that $f$ is irreducible.
Then the following theorem gives us the last ingredient we need to prove theorem 5.2:

Theorem 5.16. With $\beta = \beta', \delta, \sigma, \kappa$ chosen subject to (5), (6), (7), and Definition 5.4:
\[ E_{m,f}\left( |X_{f,m,n}| \right) \ge L_n\left(\tfrac13, \tau\right), \]
with
\[ \tau = 2\sigma - \frac{\delta^{-1}}{3\beta}(1 + o(1)) - \frac{\sigma\delta + \kappa}{3\beta'}(1 + o(1)). \]

Proof. As proposition 5.5 and proposition 5.14 randomise over $a, m$ for any fixed $b$, and uniformly over $n, f$, we have for any $b, n, f$:
\[ P_{a,m}\left( a - bm \text{ is } B\text{-smooth and } B'\text{-good} \right) = L_n\left(\tfrac13, \frac{\delta^{-1}}{3\beta}(1 + o(1))\right)^{-1}. \]
Since proposition 5.13 randomizes over $f$ for any fixed $a, b, m$, we have for each fixed $b$ that
\[ P_{a,m,f}\left( a - bm \text{ is } B\text{-smooth and } B'\text{-good},\ f(a, b) \text{ is } B'\text{-smooth} \right) = L_n\left(\tfrac13, \frac{\delta^{-1}}{3\beta} + \frac{\kappa + \sigma\delta}{3\beta'}\right)^{-1 + o(1)}, \]
as multiplicative factors of $1 + o(1)$ may be absorbed into the $o(1)$ in the exponent of the $L_n\left(\tfrac13\right)$ terms. Summing over the $L_n\left(\tfrac13, \sigma\right)$ choices of $a$ for each $b$:
\[ E_{m,f}\left( |X_{f,m,n}| \right) = \sum_{a,b} P_{f,m,n}\left( (f, n, m, a, b) \in X \right) \]
\[ = L_n\left(\tfrac13, \sigma\right) \sum_b P_{f,m,n,a}\left( (a - bm) \text{ is } B\text{-smooth} \wedge f(a, b) \text{ is } B'\text{-smooth} \right) \]
\[ \ge L_n\left(\tfrac13, \sigma\right) \sum_b P_{f,m,n,a}\left( (a - bm) \text{ is } B\text{-smooth} \wedge (a - bm) \text{ is } B'\text{-good} \wedge f(a, b) \text{ is } B'\text{-smooth} \right) \]
\[ \ge L_n\left(\tfrac13, 2\sigma - \frac{\delta^{-1}}{3\beta}(1 + o(1)) - \frac{\sigma\delta + \kappa}{3\beta'}(1 + o(1))\right). \]

Proof of theorem 5.2. Let $\tau = 2\sigma - \frac{\delta^{-1}}{3\beta} - \frac{\sigma\delta + \kappa}{3\beta'}$, and note that:
\[ \lambda \ge \max(\beta, \beta') + \frac{\delta^{-1}(1 + o(1))}{3\beta} + \frac{(\sigma\delta + \kappa)(1 + o(1))}{3\beta'} = \max(\beta, \beta') + 2\sigma - \tau + o(1). \]
For any fixed pair $(m, f)$, we can use the hyper-elliptic curve method to examine any pair $(a, b)$ for suitable smoothness of $a - mb$ and $f(a, b)$ in $\max(B, B')^{o(1)}$ time. Hence we can determine whether a pair $(a, b)$ is in $X_{f,m,n}$ in time $L_n\left(\tfrac13, o(1)\right)$.

By lemma 5.15 the probability that $f$ is reducible is at most $L_n\left(\tfrac23\right)^{-1}$, and we have an unconditional uniform bound $|X_{f,m,n}| \le L_n\left(\tfrac13, 2\sigma\right)$.
Hence from theorem 5.16 we deduce that
\[ E_{f,m}\left( |X_{f,m,n}| \mid f \text{ irreducible} \right) \ge L_n\left(\tfrac13, \tau + o(1)\right). \]
By using lemma 4.45 we can use stochastic deepening to consider $|X_{f,m,n}|$ as a random variable of $(f, m)$, with $K \le L_n\left(\tfrac13, 2\sigma - \tau\right)$. Hence for some $j \le \lceil \log K \rceil = O\left( \log^{1/3} n\, (\log\log n)^{2/3} \right)$, we have
\[ P_{f,m}\left( |X_{f,m,n}| \ge 2^j L_n\left(\tfrac13, \tau + o(1)\right) \right) > 2^{-j}, \]
absorbing logarithmic terms where appropriate.

Now consider the following: to find a collection of pairs the algorithm iterates through each $i \in \{0, \dots, \lceil \log K \rceil\}$, and for each $i$ generates $2^i$ pairs $(f, m)$, and for each pair $(f, m)$ generates $2^{-i} L_n\left(\tfrac13, \max(\beta, \beta') + 2\sigma - \tau + o(1)\right)$ pairs $(a, b)$ and tests for smoothness of $a - mb$ and $f(a, b)$. So if $|X_{f,m,n}| > 2^i L_n\left(\tfrac13, \tau + o(1)\right)$ then the algorithm finds $L_n\left(\tfrac13, \max(\beta, \beta') + o(1)\right)$ pairs as required. Furthermore if
\[ P_{f,m}\left( |X_{f,m,n}| \ge 2^i L_n\left(\tfrac13, \tau + o(1)\right) \right) > 2^{-i}, \]
then with constant probability at least one of the pairs $(m, f)$ satisfies this condition.

Since the time taken for a single $i$ is $L_n\left(\tfrac13, \max(\beta, \beta') + 2\sigma - \tau + o(1)\right)$ we can absorb the logarithmic number of iterations into the $o(1)$ term. Hence, iterating at most a logarithmic number of times reduces the probability of failure to $L_n\left(\tfrac23, \kappa - \delta^{-1}\right)$. Hence the expected time taken to complete the algorithm is
\[ L_n\left(\tfrac13, \max(\beta, \beta') + 2\sigma - \tau + o(1)\right). \]

5.3 Algebraic obstructions Vol. 2: The congruence

Let $S$ be the set of $(a, b)$ pairs such that $a - mb$ is $B$-smooth and $f(a, b)$ is $B'$-smooth. The previous section provides a way to find a sufficient number of $(a, b)$ pairs in a sufficiently short time, which we will now assume has been done. Identically to the General Number Field Sieve algorithm we now start on the next procedure: sieving for the congruence.
In Chapter 3 we were interested in finding a subset $S_i$ such that
\[ f'(m)^2 \prod_{(a,b) \in S_i} (a - bm) \text{ is a square in } \mathbb{Z}, \]
\[ f'(\alpha)^2 \prod_{(a,b) \in S_i} (a - b\alpha) \text{ is a square in } \mathbb{Z}[\alpha]. \]
In the RNFS there will be a similar approach, but recall that we used particularly broad observations to avoid having to work with the ring of algebraic integers $\mathcal{O}_{\mathbb{Q}(\alpha)}$, whose structure might be unknown or especially difficult. In this section we will not only tackle the procedure to find these squares, but in particular how we can avoid algebraic obstructions similar to those we saw in section 3.3.

One of the first things to observe is that, given a set $S$ such that $a - bm$ is $B$-smooth for every $(a, b)$-pair, there is no reason for us not to use the standard method from the general number field sieve to find a square. Given an element $z \in \mathbb{Z}$ with prime factorization $\prod_{i=1}^{d} p_i^{\eta_i}$ it is easy to check whether it is a square by ensuring that $\forall i \in \{1, \dots, d\} : 2 \mid \eta_i$. So given the factorizations of $a - mb$ for $B + 1$ $(a, b)$-pairs we can find a dependent subset in the same way we did with the general number field sieve in Step 3.

Once again it would be optimal if we could approach the $\mathbb{Z}[\alpha]$ problem in the same way, but as we saw in Chapter 3 this requires some justification. This justification is only made more complex as we are now dealing with a uniformly random polynomial. We start in much the same fashion as we did with the general number field sieve by defining a set of pairs $(p, r)$ such that $f(r, 1) \equiv 0 \bmod p$ for prime $p$ and $r \in \{0, \dots, p-1\}$. Recall that the pairs $(p, r)$ such that $p$ is prime, $0 < r < p$ coprime to $p$, and $p \mid f(r, 1)$ are in direct correspondence with the first degree prime ideals $P \subseteq \mathcal{O}_{\mathbb{Q}(\alpha)}$ such that $P \mid \langle p \rangle$, $N(P) = p$. Then from proposition 3.8 we get that the following function is well defined:
\[ e_{(p,r)}(a - b\alpha) = \begin{cases} \operatorname{ord}_p\left( N(a - b\alpha) \right) & a - br \equiv 0 \bmod p, \\ 0 & \text{otherwise}. \end{cases} \]
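The exponent map $e_{(p,r)}$ can be computed directly from the pair $(a, b)$. The following is a minimal sketch under two assumptions that are not taken from the text: the norm $N(a - b\alpha)$ is replaced by the homogeneous value $f(a, b)$ (the two agree up to the leading coefficient $f_d$, so this is only faithful when $p \nmid f_d$), and the valuation is taken at $p$. The function names are hypothetical and the coefficient list is ordered from the $x^d$ term down.

```python
def homogeneous_value(coeffs, a, b):
    """Evaluate f(a, b) = sum_i coeffs[i] * a^(d - i) * b^i."""
    d = len(coeffs) - 1
    return sum(c * a ** (d - i) * b ** i for i, c in enumerate(coeffs))


def ord_p(z, p):
    """Exponent of the prime p in the nonzero integer z."""
    if z == 0:
        raise ValueError("ord_p is undefined for 0")
    z, k = abs(z), 0
    while z % p == 0:
        z //= p
        k += 1
    return k


def e_pr(coeffs, p, r, a, b):
    """e_(p,r)(a - b*alpha): ord_p of f(a, b) if a - b*r = 0 mod p, else 0.
    Assumption: f(a, b) stands in for the norm N(a - b*alpha)."""
    if (a - b * r) % p != 0:
        return 0
    return ord_p(homogeneous_value(coeffs, a, b), p)
```

With $f = x^2 + y^2$ and $p = 5$, the roots of $f(x, 1)$ modulo 5 are $r = 2$ and $r = 3$; for $(a, b) = (3, 1)$ the value $f(3, 1) = 10$ is divisible by 5, and the whole exponent is attributed to the pair $(5, 3)$ while `e_pr([1, 0, 1], 5, 2, 3, 1)` is 0. This is how the pairs $(p, r)$ separate the different first degree primes above $p$.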
From the proof of theorem 3.7 we know that this map is well defined from Q ( α ) × → Z . Hence,as we assumed that we have found sufficient ( a, b )-pairs we can use the same sieving techniquesas in Section 3 to find a subset S such that P ( α ) = Q ( a,b ) ∈ S ( a − bα ) ∈ Z [ α ]. Moreover we canagain be sure that e r,s ( P ( α )) ≡ P ( α ) is a square in Z [ α ]. To give a three line summary of our technique: we will apply the pigeonhole principle toa set H , to show that for a randomized field with a stochastic collection of characters with largeconductor the number of ways in which an element might appear square and not be square islimited. Now define the set we eluded to above: H = { z ∈ Q ( α ) × : ∀ s < r, e ( p,r ) ( z ) ≡ } / { z : z ∈ Q ( α ) × } . Because we attempt to use the pigeon hole principle we start by considering the size of this set,in fact a near-immediate result is the bounded dimension of H as a vector space:Page 95 of 114 roposition 5.17. H is an F vector space of dimension at most( δκ + o (1)) log ( n ) + δ κ n )) (log log( n )) . The proof of this follows, bar a few modifications, from the argument in [[17], theorem 6.7].The full proof can be found in [[14], p. 129-131], but we will accept this as a fact. To see that H is a vector space we only need to observe that Q ( α ) × is commutative, hence every elementof h ∈ H can be represented as a coset [ h ] = h · { z : z ∈ Q ( α ) × } . This means that for any h ∈ H , h is the identity since it is equivalent to h · { z : z ∈ ( Q ( α ) × } and h ∈ Q ( α ) × .The next step is to find a significant amount of quadratic characters. For this let P ⊂ O Q ( α ) bea prime ideal, as N ( P ) = p k and O Q ( α ) is a Dedekind Domain we have that O Q ( α ) / P = F p k . Hence we can identify P with a degree k monic irreducible polynomial p P over F r . 
We can say more: given an irreducible polynomial $g$ of degree $k$ over $\mathbb{F}_p$, if $\gcd(f, g) = 1$ then the quotient $\mathbb{Q}(\alpha)/\langle g(\alpha) \rangle$ sends everything to 0, hence $\langle g(\alpha) \rangle$ is not prime. So we may assume $\gcd(f, g) > 1$. As $g$ is irreducible over $\mathbb{F}_p$, that means that $g$ is one of the irreducible factors of $f \bmod \langle p \rangle$.

In what follows we will regularly shift between the following three equivalent concepts:
1. Prime ideal $P \subseteq \mathcal{O}_{\mathbb{Q}(\alpha)}$.
2. The irreducible polynomial divisor $p_P$ of $f(x, 1) \bmod p$.
3. Pair $(p, r)$, $p$ prime and $r$ s.t. $p_P(r, 1) \equiv 0 \bmod p$ for $p_P$ defined over $\mathbb{F}_{p^k}$.

The following definition extends the Legendre symbol.

Definition 5.18. Let $K$ be a field and let $f(x) \in K[x]$ be a polynomial. Let $p(x) \in K[x]$ be an irreducible polynomial, $p(x) \nmid f(x)$. Then the polynomial Legendre symbol is defined as follows:
\[ \left( \frac{f(x)}{p(x)} \right)_L = \begin{cases} 1, & \exists g(x) \in K[x] \text{ s.t. } f(x) \equiv g(x)^2 \bmod p(x), \\ -1, & \nexists g(x) \in K[x] \text{ s.t. } f(x) \equiv g(x)^2 \bmod p(x). \end{cases} \]
This gives a very natural extension to a Dirichlet character, $\chi_P$, defined by the following:
\[ \chi_P : \mathcal{O}_{\mathbb{Q}(\alpha)} \to \mathbb{F}_{p^k}, \qquad \langle a - b\alpha \rangle \mapsto \left( \frac{a - bx}{p_P} \right)_L \qquad (8) \]
All that remains now is finding $P$, for which we will seek to factorize $f(x, 1) \bmod p$ and look at the irreducible divisors. Given a set $\mathcal{F}$ of these characters $\chi_P = \chi_{(p,r)}$, we can define the map $\Psi_{\mathcal{F}} : H \to \mathbb{F}_2^{|\mathcal{F}|}$ sending $x$ to the tuple $\left( \chi_{(p,r)}(x) \mid \chi_{(p,r)} \in \mathcal{F} \right)$. We will rely on the random production of such a set $\mathcal{F}$ to show that this almost surely makes $\ker(\Psi_{\mathcal{F}})$ small.

Lemma 5.19. There is a sampleable distribution $\Upsilon$ for pairs $(p, r)$ such that $\chi_{(p,r)}$ is a character as in (8), such that for all but $\log\log n$ of the $h \in H$, considering $\chi_{(p,r)}$ as a map from $H$ to $\mathbb{F}_2$:
\[ P_{\Upsilon}\left( \chi_{(p,r)}(h) = -1 \right) \ge \frac{1 + o(1)}{2}. \]
Sampling according to $\Upsilon$ takes at most $L_n\left(\tfrac13, c\right)$ time for a constant $c$ to be defined. Furthermore, each character $\chi_{(p,r)}$ can be evaluated in time at most $L_n\left(\tfrac13, c\right)$.
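To make the evaluation of such a character concrete, the sketch below treats only the simplest case $k = 1$, where $p_P = x - r$ is linear and the symbol in (8) reduces to the ordinary Legendre symbol of $a - br$ modulo an odd prime $p$. This is an illustrative assumption: the higher degree characters of the lemma would need arithmetic in $\mathbb{F}_p[x]/(p_P)$, which is not shown, and the function names are hypothetical.

```python
def legendre(z, p):
    """Legendre symbol (z/p) for an odd prime p, via Euler's criterion."""
    s = pow(z % p, (p - 1) // 2, p)  # s is 0, 1 or p - 1
    return -1 if s == p - 1 else s


def chi(p, r, a, b):
    """Degree-one character chi_(p,r) evaluated at a - b*alpha.
    Assumption: k = 1, so the character is the Legendre symbol of a - b*r."""
    return legendre(a - b * r, p)
```

Euler's criterion costs one modular exponentiation, which is polynomial in $\log p$; for a degree $k$ prime the analogous exponentiation by $(p^k - 1)/2$ is polynomial in $k \log p$, which is the quantity bounded in the lemma.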
To start assume that we have the finite-degree tower of number fields L ⊃ K ⊃ Q , where L/K is a Galois extension with Galois group Gal( L/K ). Recall that ∆ L , ∆ K are the discriminants of L and K respectively, and let d L/K = [ L : K ], d L = [ L : Q ] and d K = [ K : Q ]. Let P ⊆ K be aprime ideal and let Q = { Q ⊆ L | Q lies over P } , then from proposition 2.30 we know that theFrobenius elements of all Q ∈ Q are conjugate, which allows us to give the following definition: Implicitly we are assuming F ∼ = C by the map b ( − b Page 97 of 114 efinition 5.20. Let L ⊃ K ⊃ Q be a finite degree tower of number fields with L/K Galois.Let P ⊂ K be a prime ideal which is unramified in L and let Frob P be the Frobenius elementof P in Gal( L/K ). Then the Artin symbol, h L/K P i , is defined as the conjugacy class of theFrobenius automorphisms of L/K corresponding to the prime ideals Q ⊂ L dividing P : (cid:20) L/K P (cid:21) = { Frob Q ∈ Gal( L/K ) | Q | P } . Remark. It is a common abuse of notation to write Frob P for any of the elements in h L/K P i andjust consider the element well defined up to conjugacy.This allows us to define the set π C ( x ) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:26) P | P ⊂ K prime , N K/ Q ( P ) < x, (cid:20) L/K P (cid:21) ∈ C (cid:27)(cid:12)(cid:12)(cid:12)(cid:12) , where C ⊆ Gal( L/K ) s.t. for all g ∈ Gal( L/K ): gCg − = C . Now let K = Q ( α ) and let h ∈ O K be an element of minimal norm representing a non-trivial element of H . Let L = K ( √ h ), thenby definition [ L : K ] = 2, d K = d , d L = 2 d and Gal( L/K ) = C . Since Gal( L/K ) is abelian, wehave that h L/K P i only contains one element. Hence the value h L/K P i corresponds exactly to theaction of χ P on h .As h has minimal norm it fulfills the Minkowski bound, 2.41, hence: N K/ Q ( h ) ≤ c K/ Q = p ∆ K (cid:18) n (cid:19) d d ! d d = n δκ (1+ o (1) . By construction the different is generated by 2 h , and so is an integral ideal. 
As the relative discriminant is the norm of the different we get that:
\[
\Delta_{L/\mathbb{Q}} \leq N_{K/\mathbb{Q}}(2h)\,\Delta_{K/\mathbb{Q}}^{2} \leq n^{\delta\kappa(5+o(1))}.
\]
We now need an improved version of Chebotarev's Density Theorem, which we alluded to before.

Theorem 5.21 (Unconditional Effective Chebotarev Density Theorem). Let $L/K/\mathbb{Q}$ be a tower of number fields with $L/K$ Galois and $G = \mathrm{Gal}(L/K)$. Let $C \subseteq G$ be a union of conjugacy classes, so that $gCg^{-1} = C$ for all $g \in G$. Let $|\bar{C}|$ be the total number of conjugacy classes in $G$. Lastly, let $1 - \nu$ be a Siegel zero of $\zeta_L$ if it exists and $0$ otherwise. Then there exists $c > 0$ such that, for $x \geq d_L \log(\Delta_L)$:
\[
\left|\pi_C(x) - \frac{|C|}{|G|}\mathrm{Li}(x)\right| \leq \frac{|C|}{|G|}\mathrm{Li}\left(x^{1-\nu}\right) + O\left(x\,|\bar{C}|\exp\left(-c\sqrt{\frac{\log x}{d_L}}\right)\right).
\]

Applying the unconditional effective Chebotarev Density Theorem to the degree 2 extension $L/K$, we get for $P$ chosen uniformly at random with $N(P) \leq x$:
\[
\left|\mathbb{P}(\chi_P(h) = 1) - \frac{1}{2}\right| < x^{-\nu(1+o(1))} + O\left(x\exp\left(-c\sqrt{\frac{\log x}{d_L}}\right)\right),
\]
where $1 - \nu$ is a possible Siegel zero of the Dedekind zeta function $\zeta_L$ over $L$. Ensuring that $\mathbb{P}(\chi_P(h) = 1) = \frac{1}{2} + o(1)$ requires us to insist that $\log(x) = \omega\left(d_L(\log\log x)^2\right)$ and, if $\zeta_L$ has a Siegel zero, $\log(x) = \omega(\nu^{-1})$.

Note that, for each non-trivial coset, we can choose a representative $h \in \mathcal{O}_K$ of minimal norm and then run through the previous paragraph for each of these. Although the element $h$ is different every time, we get a similar result each time, dependent on $h$. Hence we make the following definition.

Definition 5.22. For a field $K$ and $h$ a minimal norm representative of an element of $H$, we define $L_h = K(\sqrt{h})$. For $\epsilon > 0$ we define
\[
E_{K,\epsilon} = \left\{ h \cdot \{z^2 \mid z \in K^\times\} \in H \ \text{ s.t. } \ \exists\, \nu : \zeta_{L_h}(1 - \nu) = 0,\ \nu^{-1} > L_n\left(\tfrac{1}{3}, \epsilon\right)\right\}.
\]
It is clear that if $h \in K^\times$ and $k \in K^\times$ represent the same coset in $H$, then $L_h = L_k$.
To justify this we only have to note that representing the same coset means that $h$ and $k$ differ by a square: $h = kz^2$, hence $\sqrt{h} = \sqrt{kz^2} = z\sqrt{k}$. Since $z \in K^\times$, adjoining $\sqrt{h}$ or $\sqrt{k}$ to $K$ gives the same field, and so $L_h = L_k$.

The exceptional set $E_{K,\epsilon}$ is the subset of $H$ which cannot be reliably distinguished from $0$ by characters induced by primes of size $e^{L_n(1/3,\, \epsilon)}$. This means that if a Siegel zero of the form specified exists, then almost every prime ideal of this size induces a character which vanishes for some element of $H$. However, we can limit the size of the exceptional set.

Lemma 5.23. Suppose that $K = \mathbb{Q}(\alpha)$ is a number field where $\alpha$ is a root of an irreducible $f = \hat{f} + (x - m)R$, where $R$ is uniformly random. Then for $\epsilon = \left(\frac{1}{3} + o(1)\right)\delta$,
\[
\mathbb{P}_f\left(|E_{K,\epsilon}| > \frac{4}{3}\log\log n\right) \leq L_n\left(\frac{2}{3},\ \frac{\kappa - \delta^{-1}}{3}(1 + o(1))\right)^{-1}.
\]

For now we accept this lemma as fact, as this will allow us to prove lemma 5.19. The proof of this lemma is based on the sparseness of Siegel zeros for Dedekind zeta functions defined over $L_h$ and will be shown in Section 5.3.3. Accepting the lemma, we know there exists an $x$ such that $\log(x) = \omega\left(d_L(\log\log x)^2\right)$ and, if there is a Siegel zero for $\zeta_{L_h}$, $\log(x) = \omega(\nu^{-1})$ for all but $\log\log n$ of the $h \in H$. Moreover, outside the small fraction of polynomials $f$ excluded by lemma 5.23, we have $\log x < L_n\left(\frac{1}{3}, \epsilon\right)$. Choosing to accept failure on all the excluded polynomials still guarantees that the algorithm fails with probability $o(1)$.

Now we turn to the sampleable distribution $\Upsilon$. Recall that any prime ideal $P$ with $N(P) < x$ is guaranteed to divide a prime $p$ with $p < x$. Moreover, if $P$ is of degree $k$, then $p < \sqrt[k]{x}$. Equivalently, a $k$-th degree prime ideal dividing $p$ corresponds to a simple $k$-th degree divisor of $f \bmod p$. The following algorithm samples $\Upsilon$ in such a way that it outputs ideals of which all but a small fraction are prime.
Algorithm 4 Sampling $\Upsilon$ for (prime) ideals.
Input: A polynomial $f$ of degree $d$
Output: A pair $(p, r)$
  Uniformly at random sample $k \in \mathbb{Z}/d\mathbb{Z}$
  Uniformly at random sample $p \in \left(x^{(k+1)^{-1}}, x^{k^{-1}}\right)$
  Test for primality of $p$ with the Miller–Rabin test
  if $p$ is prime then
    Factor $f \bmod p$ and find the irreducible and unrepeated factors $r_i$ of degree $< k$
    if $|\{r_i\}| \geq j$ then
      Uniformly at random sample one $r_i$ and define $r = r_i$
    else
      Sample a new $k$ and repeat the algorithm
    end if
  end if

Remark. We will now make a few remarks regarding the runtime of the algorithm. First note that, to obtain the required runtime of the sieve, this algorithm has to run within the time bound claimed in lemma 5.19. Let us see how we obtain this:
• The runtime bound immediately constrains $\log x$ (recall that $\log x < L_n\left(\frac{1}{3}, \epsilon\right)$). This means we cannot use deterministic primality tests, such as the AKS primality test we used for the GNFS. In particular, we note that discarding a composite $p$ using the Miller–Rabin primality test takes $O(\log^2(x)\log\log x)$.
• The Miller–Rabin primality test discards a composite $p$ with high probability, and a uniformly sampled $p$ is prime with probability $\Omega((\log x)^{-1})$, so any $p$ that passes the test here is prime with probability $1 - o(1)$.
• Factoring $f \bmod p$ is possible probabilistically in time $O\left((d\log x)^3\right)$.
• If $p$ is composite, but slips through the cracks of the Miller–Rabin primality test, then the factorisation of $f \bmod p$ may fail. If it does not fail, then this $p$ still induces a quadratic character, which therefore vanishes on the squares. Since we obtain at most one character from each $p$ and are guaranteed to find a character if $k = d$ and $p$ is prime, the fraction of characters which are not induced by primes is $o(1)$.

To finish the proof we need to show that this algorithm is sufficiently fast and that characters can be evaluated quickly.
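The sampling loop of Algorithm 4 can be illustrated with a minimal Python sketch. This is not the thesis's implementation: as a simplifying assumption it only handles degree-one ideals, where the irreducible factor is $x - r$ with $f(r) \equiv 0 \bmod p$, and it replaces the polynomial factorisation of $f \bmod p$ by a brute-force root search that is only sensible for small $p$. The function names and the interval bounds `lo`, `hi` are hypothetical.

```python
import random

def is_probable_prime(n, rounds=20, rng=random):
    """Miller-Rabin test: False means n is composite, True means n is probably prime."""
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:          # write n - 1 = d * 2^s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False       # a witnesses that n is composite
    return True

def evaluate(coeffs, t, p):
    """Horner evaluation of a polynomial (lowest-degree coefficient first) mod p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * t + c) % p
    return acc

def simple_roots_mod_p(f, p):
    """Roots r of f mod p with f'(r) != 0, i.e. unrepeated linear factors x - r.
    Brute force over all residues: an assumption standing in for factoring f mod p."""
    df = [i * c for i, c in enumerate(f)][1:]
    return [t for t in range(p) if evaluate(f, t, p) == 0 and evaluate(df, t, p) != 0]

def sample_degree_one_pair(f, lo, hi, rng, max_tries=10000):
    """Sample (p, r): p a probable prime in [lo, hi), r a simple root of f mod p."""
    for _ in range(max_tries):
        p = rng.randrange(lo, hi)
        if not is_probable_prime(p, rng=rng):
            continue           # discard composites, as in Algorithm 4
        roots = simple_roots_mod_p(f, p)
        if roots:
            return p, rng.choice(roots)
    raise RuntimeError("no pair found within max_tries")
```

For example, with $f = x^3 - 2$ given as `[-2, 0, 0, 1]`, calling `sample_degree_one_pair(f, 100, 1000, random.Random(1))` returns a pair $(p, r)$ with $r^3 \equiv 2 \bmod p$.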
Moreover, we once again have to show that these characters are sufficiently uniformly distributed that the bounds on $\mathbb{P}(\chi_P = 1)$ hold.

Proposition 5.24. The expected time taken to sample $(p, r) \sim \Upsilon$ as above is at most
\[
L_n\left(\frac{1}{3}, (4 + o(1))\epsilon\right).
\]

Proof. We noted in our remark that each attempt from the start of the algorithm takes time $O\left((d\log x)^3\right)$, as the factorization of $f \bmod p$ is the slowest step. We are guaranteed to find a factor if our degree bound $k$ is $d$, which happens with probability $\frac{1}{d}$; the integer $p$ is prime with probability $\Omega\left(\frac{1}{\log x}\right)$; and we successfully take an ideal in the final step with probability $\Omega\left(\frac{1}{d}\right)$ if $k = d$. Hence the number of attempts needed to output a prime is bounded in expectation by $O(d^2\log x)$. Hence the time taken to find an ideal is bounded in expectation by
\[
O\left(d^5\log^4(x)\right) = L_n\left(\frac{1}{3}, (4 + o(1))\epsilon\right).
\]

Proposition 5.25. For any fixed $h$, $\mathbb{P}_{\Upsilon}\left(\chi_{(p,r)}(h) = -1\right) \geq \frac{1}{2d}(1 + o(1))$.

Proof. The distribution of primes $P$ generated is uniform over $P \bmod \langle p\rangle$ for $p \in \left(x^{(k+1)^{-1}}, x^{k^{-1}}\right)$ of degree at most $k$ by the Prime Ideal Theorem, 4.22. This property also holds for a uniform distribution over primes of norm $\leq x$ by an adaptation of Dirichlet's theorem on primes in arithmetic progressions, 4.28. Thus the difference between $\Upsilon$ and a uniform distribution over primes of norm $\leq x$ is the distribution of the degree of these primes. The probability that $\Upsilon$ samples $P$ with $N(P) \leq x$ and $P \mid \langle p\rangle$ for $p$ in each of these intervals is $\frac{1}{d}$. Hence $\Upsilon$ pointwise dominates $d^{-1}$ times the uniform distribution over all primes of norm below $x$. Therefore
\[
\mathbb{P}_{\Upsilon}\left(\chi_{(p,r)}(h) = -1\right) \geq \frac{1}{d}\,\mathbb{P}_{N(P) \leq x}\left(\chi_P(h) = -1\right) = \frac{1}{2d}(1 + o(1)).
\]

The final proposition immediately completes the proof of lemma 5.19. It is, however, quite technical, as the logarithms we are interested in are quite large. We therefore have to track the arithmetic carefully.

Proposition 5.26.
Evaluating the characters $\chi_{(p,r)}$ associated with the ideal $P$ sampled as above on a term $a - b\alpha$ takes time at most $L_n\left(\frac{1}{3}, (2 + o(1))\epsilon\right)$.

Proof. Let $p = 2$, then $\chi_{(p,r)} = 1$. Hence we may assume $p > 2$. For any polynomial $P \in \mathbb{F}_p[x]$ let $|P| = p^{\deg(P)}$. Recall
\[
\chi_{(p,r)}(a - b\alpha) = \left(\frac{a - bx}{r(x)}\right)_L.
\]
For any constant $c$ the computation reduces to finding a Legendre symbol mod $p$:
\[
\left(\frac{c}{P}\right)_L = c^{\frac{|P|-1}{2}} = \left(\frac{c}{p}\right)^{\frac{p^k - 1}{p - 1}} = \left(\frac{c}{p}\right)^{k}.
\]
From [7] we call attention to the law of quadratic reciprocity for function fields by noting that, for any two relatively prime monic irreducible polynomials $P, Q$ over $\mathbb{F}_p$:
\[
\left(\frac{P}{Q}\right)\left(\frac{Q}{P}\right) = (-1)^{\frac{|P|-1}{2}\cdot\frac{|Q|-1}{2}}.
\]
Hence
\begin{align*}
\chi_{(p,r)}(a - b\alpha) &= \left(\frac{-b}{p}\right)^{k}\left(\frac{x - ab^{-1}}{r(x)}\right)_L \\
&= \left(\frac{-b}{p}\right)^{k}(-1)^{\frac{p-1}{2}\cdot\frac{p^{\deg(r)}-1}{2}}\left(\frac{r(x)}{x - ab^{-1}}\right) \\
&= \left(\frac{-b}{p}\right)^{k}(-1)^{\frac{p-1}{2}\cdot\frac{p^{\deg(r)}-1}{2}}\left(\frac{r(ab^{-1})}{x - ab^{-1}}\right) \\
&= \left(\frac{-b}{p}\right)^{k}(-1)^{\frac{p-1}{2}\cdot\frac{p^{\deg(r)}-1}{2}}\left(\frac{r(ab^{-1})}{p}\right).
\end{align*}
The parities of $\frac{p-1}{2}$ and $\frac{p^k-1}{2}$ are easily computed. Hence to compute $\chi_{(p,r)}(a - b\alpha)$ it suffices to compute $r(ab^{-1})$ and two Legendre symbols modulo $p$, in $O(\log p)$ additions or subtractions of numbers of size at most $p$.

To compute $b^{-1} \bmod p$ requires the extended Euclidean algorithm to be run, which requires $O(\log p)$ additions of numbers of size at most $p$.
To compute $ab^{-1} \bmod p$ requires one multiplication.
To compute $r(ab^{-1}) \bmod p$ requires at most $O(d)$ additions and multiplications modulo $p$.

Addition or subtraction of numbers of size $p$ or modulo $p$ takes $O(\log p)$ steps, while multiplication modulo $p$ takes $O\left(\log^2(p)\right)$ steps by iterative addition and doubling. Hence the computation in total requires time
\[
O\left(d\log^2(p)\right) = O\left(d^{-1}\log^2(x)\right) = L_n\left(\frac{1}{3}, (2 + o(1))\epsilon\right).
\]
To complete the proof of the lemma we now simply take $c = (4 + o(1))\epsilon$.
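The reduction in the proof above can be checked numerically on small examples. The following Python sketch, under the assumptions that $r$ is monic and irreducible over $\mathbb{F}_p$, that $p$ is odd, and that $b \not\equiv 0 \bmod p$, compares the direct Euler-criterion evaluation of $\left(\frac{a - bx}{r(x)}\right)_L$ in $\mathbb{F}_p[x]/(r)$ with the shortcut via two Legendre symbols mod $p$. The function names are hypothetical; polynomials are coefficient lists with the lowest degree first. The sign factor is written as $(-1)^{k(p-1)/2}$, which has the same parity as the exponent in the proof.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def poly_mulmod(u, v, r, p):
    """Product u*v in F_p[x]/(r) for monic r; returns a list of length deg(r)."""
    k = len(r) - 1
    prod = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            prod[i + j] = (prod[i + j] + ui * vj) % p
    # Eliminate every term of degree >= k by subtracting a multiple of r.
    for deg in range(len(prod) - 1, k - 1, -1):
        c = prod[deg]
        if c:
            for j in range(k + 1):
                prod[deg - k + j] = (prod[deg - k + j] - c * r[j]) % p
    prod = prod[:k]
    return prod + [0] * (k - len(prod))

def chi_direct(a, b, r, p):
    """((a - b x)/r(x))_L computed as (a - b x)^((p^k - 1)/2) in F_p[x]/(r)."""
    k = len(r) - 1
    one = [1] + [0] * (k - 1)
    base = poly_mulmod([a % p, (-b) % p], [1], r, p)
    result, e = one, (p ** k - 1) // 2
    while e:                       # square-and-multiply
        if e & 1:
            result = poly_mulmod(result, base, r, p)
        base = poly_mulmod(base, base, r, p)
        e >>= 1
    if result == one:
        return 1
    if result == [p - 1] + [0] * (k - 1):
        return -1
    raise ValueError("a - b*x is not invertible modulo r")

def chi_fast(a, b, r, p):
    """Shortcut: (-b/p)^k * (-1)^(k(p-1)/2) * (r(a b^-1)/p)."""
    k = len(r) - 1
    t = a * pow(b, -1, p) % p      # a * b^{-1} mod p (Python 3.8+)
    r_at_t = 0
    for coeff in reversed(r):      # Horner evaluation of r at t
        r_at_t = (r_at_t * t + coeff) % p
    sign = -1 if (k * (p - 1) // 2) % 2 else 1
    return legendre(-b, p) ** k * sign * legendre(r_at_t, p)
```

The direct evaluation needs $O(k\log p)$ multiplications in $\mathbb{F}_p[x]/(r)$, while the shortcut needs one modular inverse, one Horner evaluation of $r$ and two Legendre symbols, which is the saving the proposition exploits.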
As we remarked earlier, this proof is dependent on the sparseness of Siegel zeroes of Dedekind zeta functions. To start we need the following proposition, [[14], fact 6.22 and corollary 6.23], which we will state without proof.

Proposition 5.27. Let $K$ be a number field and let $\bar{K}$ be the normal closure of $K$. Let $c(K) = 4$ if $K/\mathbb{Q}$ is normal and $c(K) = 4[\bar{K} : \mathbb{Q}]$ otherwise. Assume that there is a real Siegel zero of $\zeta_K$, denoted $1 - \nu$, such that
\[
1 - \left(c(K)\log|\Delta_K|\right)^{-1} \leq 1 - \nu \leq 1.
\]
Then there is a quadratic field $F \subset K$ such that $\zeta_F(1 - \nu) = 0$.

Now we start the proof of lemma 5.23. To do this we may assume that $f(x) \in \mathbb{Q}(\alpha)[x]$ is an irreducible polynomial, as the probability that $f$ is reducible can be absorbed in the error term by 5.15.

Proposition 5.28. If $\bar{L}_h \supset L_h$ is the normal closure of $L_h = K(\sqrt{h})$, then $[\bar{L}_h : \mathbb{Q}] \leq 2^d d!$.

Proof. Let $\bar{K}$ be the splitting field of $K$. By construction, $\bar{K}/\mathbb{Q}$ is normal, so in particular it is Galois, and $[\bar{K} : \mathbb{Q}] \leq d!$. Let $G = \mathrm{Gal}(\bar{K}/\mathbb{Q})$. Given $h \in \mathcal{O}_K$, let $O_h$ be the orbit of $h$ under the action of $G$. Then $|O_h| \leq d$. We adjoin square roots of each element of $O_h$ to $\bar{K}$ to obtain a field $L$. Then $[L : \mathbb{Q}] \leq 2^d d!$.

Degree 2 extensions are normal, the compositum of normal extensions is normal, and $L/\bar{K}$ is a compositum of at most $d$ degree 2 extensions; hence the extensions $L/\bar{K}$ and $\bar{K}/\mathbb{Q}$ are normal. As any $\sigma \in \mathrm{Aut}_{\mathbb{Q}}(\bar{K})$ acts on $O_h$ as a permutation, we can extend $\sigma$ to an element of $\mathrm{Aut}_{\mathbb{Q}}(L)$. So $L/\mathbb{Q}$ is normal and $\bar{L}_h \subseteq L$, which proves the claim.

Hence from proposition 5.27, if $\nu^{-1} > 2^{d+2}d!\log\Delta_{L_h/\mathbb{Q}}$, then $1 - \nu$ must be a zero of the zeta function of some quadratic field $F_h = \mathbb{Q}(s_h) \subseteq L_h$. By assumption $2 \nmid [K : \mathbb{Q}]$, hence $K$ has no quadratic subfields, which means that $F_h \cap K = \mathbb{Q}$ and so $F_h$ is the only quadratic subfield of $L_h$.
Moreover, $L_h$ is the smallest field containing both $F_h$ and $K$, and since the $h \in H$ are not related by squares of $K$ it holds that $L_h$ does not contain any $h' \in H$ from another class. This assures that the $L_h$ produced as $h$ varies are all distinct fields, and so all $s_h$ must be distinct.

By the transitivity of the discriminant, lemma 2.47, we have, for the towers of number fields $L_h/F_h/\mathbb{Q}$ and $L_h/K/\mathbb{Q}$, that
\[
\Delta_{F_h/\mathbb{Q}}^{d}\, N_{F_h/\mathbb{Q}}\left(\Delta_{L_h/F_h}\right) = \Delta_{K/\mathbb{Q}}^{2}\, N_{K/\mathbb{Q}}\left(\Delta_{L_h/K}\right).
\]
As $\Delta_{L_h/K}$ is the relative discriminant, and hence an ideal, we can use the Minkowski Bound:
\[
N_{K/\mathbb{Q}}\left(\Delta_{L_h/K}\right) \leq \sqrt{\Delta_{K/\mathbb{Q}}}\left(\frac{4}{\pi}\right)^{d}\left(\frac{d!}{d^{d}}\right) \leq \sqrt{\Delta_{K/\mathbb{Q}}}.
\]
Combining this with the bounds on $\Delta_{K/\mathbb{Q}}$ and $\Delta_{L_h/\mathbb{Q}}$ obtained above, we can conclude that
\[
\Delta_{F_h/\mathbb{Q}} = O\left(L_n\left(\ ,\ \tfrac{5}{4}\delta\kappa\right)\right).
\]
By the contribution of the Euler product of the Dedekind zeta function:
\[
\zeta_{F_h/\mathbb{Q}}(s) = \zeta(s)\, L\left(s, \left(\frac{\Delta_{F_h/\mathbb{Q}}}{\cdot}\right)\right),
\]
and by reciprocity $j \mapsto \left(\frac{\Delta_{F_h/\mathbb{Q}}}{j}\right)$ is a Dirichlet character modulo $\Delta_{F_h/\mathbb{Q}}$.

Now we invoke [[14], fact 6.24]:

Proposition 5.29. There is an effective constant $c > 0$ such that, for characters $\chi_r, \chi_{r'}$ of moduli $r, r'$ respectively with $\chi_r\chi_{r'}$ non-principal, the product of Dirichlet $L$-functions $L(s, \chi_r)L(s, \chi_{r'})$ has at most one real zero in $\left(1 - \frac{c}{\log rr'}, 1\right)$.

So if there are two characters modulo $q$ and $q'$ respectively, there is at most one $L$-function with a zero $1 - \nu$ and $\nu^{-1} > c\log qq'$ for some effective $c$. It immediately follows that there is at most one character with modulus in $[q, q^e]$ with a zero at $1 - \nu$ and $\nu^{-1} > (e + 1)c\log q$.

Since $\Delta_{F_h/\mathbb{Q}} < L_n\left(\ ,\ \left(\frac{5}{4} + o(1)\right)\delta\kappa\right)$ and $\Delta_{F_h/\mathbb{Q}} \in \mathbb{Z}$, it follows that all possible discriminants can be covered by $\log\log n$ ranges of the form $[x, x^e]$.
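The covering count can be made concrete with a small numerical sketch. It assumes the standard definition $L_n(\alpha, c) = \exp\left(c(\log n)^{\alpha}(\log\log n)^{1-\alpha}\right)$, and the values $\alpha = 2/3$, $c = 1$ and a 1024-bit $n$ in the usage note are illustrative assumptions, not values taken from the text. Starting from $x_0 = e$, the $j$-th range $[x, x^e]$ ends at $\exp(e^j)$, so reaching a bound $X$ needs about $\log\log X$ ranges.

```python
import math

def log_L(ln_n, alpha, c):
    """Natural log of L_n(alpha, c) = exp(c * (ln n)^alpha * (ln ln n)^(1 - alpha))."""
    return c * ln_n ** alpha * math.log(ln_n) ** (1 - alpha)

def ranges_needed(ln_bound):
    """Number of consecutive ranges [x, x^e], starting at x = e,
    needed before the upper end reaches exp(ln_bound)."""
    count, ln_top = 0, 1.0         # ln_top is the log of the current upper end
    while ln_top < ln_bound:
        ln_top *= math.e           # moving from x to x^e multiplies ln x by e
        count += 1
    return count
```

For a 1024-bit $n$, `ranges_needed(log_L(1024 * math.log(2), 2 / 3, 1.0))` stays below $\lceil\log\log n\rceil = 7$, in line with the $\log\log n$ count used in the argument.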
Hence there are at most $\log\log n$ exceptional characters with exceptional zeroes such that
\[
\nu^{-1} > (e + 1)\,c\,\log\left(L_n\left(\ ,\ \left(\tfrac{5}{4} + o(1)\right)\delta\kappa\right)\right),
\]
a quantity of size $\delta\kappa$ times powers of $\log n$ and $\log\log n$, as required. This bound is far weaker than the required bound of $\nu^{-1} > 2^{d+2}d!\log\Delta_{L_h/\mathbb{Q}}$, and so there are at most $\log\log n$ extensions $L_h/\mathbb{Q}$ with exceptional zeroes and $\nu^{-1} > 2^{d+2}d!\log\Delta_{L_h/\mathbb{Q}}$. Finally:
\[
2^{d+2}d!\log\Delta_{L_h/\mathbb{Q}} \leq d^{d(1+o(1))}\log^{O(1)}(n) = L_n\left(\frac{1}{3}, \frac{\delta}{3}(1 + o(1))\right).
\]
This completes the proof of the lemma.

5.3.4 Proof of theorem 5.3

We have now removed all the obstructions, so we can prove theorem 5.3. Recall that for a set of characters $\mathcal{F}$ we define the map
\[
\Psi_{\mathcal{F}} : H \to \mathbb{F}_2^{|\mathcal{F}|}, \qquad x \mapsto \left(\chi_{(p,r)}(x) : \chi_{(p,r)} \in \mathcal{F}\right).
\]
With the work we have done up to now we are ready to produce our linear map $\Psi_{\mathcal{F}}$ with small kernel, and thus to produce a congruence of squares. Due to the size of some of the numbers involved, we track the arithmetic very closely. First, using lemma 5.19, we sample
\[
4d\left(\delta\kappa\log n + \frac{\delta\kappa\log(n)}{\log\log n}\right)
\]
pairs $(p_i, r_i)$ from $\Upsilon$ independently.

Note that this sample is of size $o\left(\log^2(n)\right) = L_n\left(\frac{1}{3}, o(1)\right)$ and each individual sample takes time at most $L_n\left(\frac{1}{3}, c\right)$, so we can produce the complete sample in time $L_n\left(\frac{1}{3}, c + o(1)\right)$. After this process we have
\[
M = 1 + B + dB' + 4d\left(\delta\kappa\log n + \frac{\delta\kappa\log(n)}{\log\log n}\right)
\]
pairs from Section 5.2.1 together with the sample from $\Upsilon$. Note that $M = L_n\left(\frac{1}{3}, \max(\beta, \beta')\right)^{1+o(1)}$. For each of these we need to evaluate each of our characters, and as each character evaluates in time $L_n\left(\frac{1}{3}, c\right)$, this process takes time $L_n\left(\frac{1}{3}, c + \max(\beta, \beta')\right)^{1+o(1)}$.

Fix some $h \in H \setminus \{0\}$ such that $h$ is not in the exceptional set of size $\log\log n$.
Each map $\chi_{(p_i, r_i)}$ is independent and induces a map in $\mathrm{Hom}(H, \mathbb{F}_2)$ such that
\[
\mathbb{P}\left(h \notin \ker\left(\chi_{(p_i, r_i)}\right)\right) \geq \frac{1 + o(1)}{2d}.
\]
As a corollary it follows that:
\[
\mathbb{P}\left(h \in \ker(\Psi_{\mathcal{F}})\right) \leq \left(1 - \frac{1 + o(1)}{2d}\right)^{M - (1 + B + dB')} \leq |H|^{-1-o(1)}.
\]
Hence, by the union bound over the non-trivial elements of $H$, the probability that any of these non-exceptional and non-zero elements is in the kernel is $o(1)$. Hence with high probability the kernel of $\Psi_{\mathcal{F}}$ has size at most $\log\log n$.

From here we proceed as normal. With the $M$ pairs representing linear polynomials we use a fast kernel-finding algorithm for sparse matrices, such as Block–Wiedemann, [21], to find a suitable subset $S_i$ to construct a polynomial $P_i$ in time
\[
O(M^2) = L_n\left(\frac{1}{3}, 2\max(\beta, \beta')(1 + o(1))\right),
\]
such that $P_i(m)$ is a square in $\mathbb{Z}$ and $P_i(\alpha)$ is a square in $\mathcal{O}_{\mathbb{Q}(\alpha)}$ multiplied by one of at most $\log\log n$ elements of $H$. Repeating the algorithm $l = \log\log n$ times to generate polynomials $P_1, \ldots, P_l$, we are able to guarantee that for some $i < j$, $P_i$ and $P_j$ lie over the same element $h$, hence $P_iP_j$ is a square in $\mathcal{O}_{\mathbb{Q}(\alpha)}$. In what follows we consider the $\binom{l}{2} \sim (\log\log n)^2$ polynomials separately.

Now if $\gamma \in \mathcal{O}_{\mathbb{Q}(\alpha)}$ and $\gamma^2 \in \mathbb{Z}[\alpha]$, then $\gamma \cdot f'(\alpha) \in \mathbb{Z}[\alpha]$. Let $S = S_i \,\Delta\, S_j$ and fix the polynomial
\[
P = \left[\frac{\partial f}{\partial x}(x, 1)\right]^2 \prod_{(a,b) \in S}(a - bx),
\]
so that
\[
u^2 = \left[\frac{\partial f}{\partial x}(x, y)\right]^2(m, 1)\prod_{(a,b) \in S}(a - mb)
\]
is a square in $\mathbb{Z}$. Hence $u$ can be found by taking the product modulo $n$, over all $p < B$, of $p$ raised to half the total order of $p$ in the terms $(a - mb)$ for $(a, b) \in S$, and multiplying by $f'(m, 1)$. The reason for computing the square root in this fashion is to ensure that we need only $M\log n$ additions and divisions and at most $M\log n$ modular multiplications to compute $u \bmod n$ from the exponents. This ensures polynomial running time.
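The rational square root described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the procedure of the thesis: the terms $a - mb$ are factored by plain trial division over a given factor base, the factor $f'(m, 1)$ is left out, and the function names are hypothetical.

```python
def exponent_vector(value, primes):
    """Exponents of the factor-base primes in |value|; value must be smooth and non-zero."""
    if value == 0:
        raise ValueError("zero has no factorisation")
    exps, rest = {}, abs(value)
    for p in primes:
        while rest % p == 0:
            rest //= p
            exps[p] = exps.get(p, 0) + 1
    if rest != 1:
        raise ValueError("value is not smooth over the factor base")
    return exps

def sqrt_of_square_product_mod(values, primes, n):
    """Given smooth integers whose product is a perfect square, return the
    square root of that product modulo n without ever forming the product."""
    total, negatives = {}, 0
    for v in values:
        if v < 0:
            negatives += 1
        for p, e in exponent_vector(v, primes).items():
            total[p] = total.get(p, 0) + e
    if negatives % 2 or any(e % 2 for e in total.values()):
        raise ValueError("product is not a perfect square")
    u = 1
    for p, e in total.items():
        u = u * pow(p, e // 2, n) % n   # p raised to half its total order, mod n
    return u
```

For instance, `sqrt_of_square_product_mod([6, 10, 15], [2, 3, 5], 77)` works from the exponent vectors of $6 \cdot 10 \cdot 15 = 900 = 30^2$ and returns $30$. Only the exponents are halved, so all intermediate numbers stay below $n$.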
(Footnote: This technique differs minimally from the technique for $\mathbb{Z}$ in the GNFS.)

Similarly, for at least one of the $\binom{l}{2}$ polynomials considered, there exists $v \in \mathbb{Z}[\alpha]$ such that
\[
v^2 = \left[\frac{\partial f}{\partial x}(x, y)\right]^2(\alpha, 1)\prod_{(a,b) \in S}(a - b\alpha).
\]
Using [24] we can compute square roots in the number field to find $v(m, 1) \bmod n$ in time $O(M)$. We abuse notation and write $v(m)$ for the element of $\mathbb{Z}/n\mathbb{Z}$ obtained by substituting $m$ for $\alpha$. By the definition of $f$ we have $f(m, 1) = n$ and so:
\begin{align*}
v(m)^2 \bmod f(m, 1) &= \left(\left[\frac{\partial f}{\partial x}(x, y)\right]^2(\alpha, 1)\prod_{(a,b) \in S}(a - b\alpha) \bmod f(\alpha, 1)\right)(m) \\
&= \left[\frac{\partial f}{\partial x}(x, y)\right]^2(m, 1)\prod_{(a,b) \in S}(a - mb) \bmod f(m, 1) \\
&= u^2 \bmod n,
\end{align*}
and so we have constructed a congruence of squares in time
\[
L_n\left(\frac{1}{3}, \max\left(2\max(\beta, \beta'),\ \max(\beta, \beta') + c,\ c\right)\right)^{1+o(1)}.
\]
As $c \leq \left(\frac{4}{3} + o(1)\right)\delta$, as we can insist that we have at most $\log\log n$ exceptional values of $h$, and as our $f$ lies off a set of probability at most $L_n\left(\frac{2}{3}, \frac{\kappa - \delta^{-1}}{3}(1 + o(1))\right)^{-1}$, the run time bound is as claimed, finishing the proof.

The computational efficiency boils down to proving theorem 5.1. For this we fix $n, \beta, \beta', \sigma, \delta$, and $\kappa$ satisfying the conditions of equations 5 and 6. Then by theorem 5.3 we can extract a congruence of squares mod $n$ from $L_n\left(\frac{1}{3}, \max(\beta, \beta') + o(1)\right)$ pairs $(a, b) \in X_{f,m,n}$ for a fixed $(m, f)$ in expected time
\[
L_n\left(\frac{1}{3}, \max\left(\frac{4\delta}{3}, 2\beta, 2\beta'\right)(1 + o(1))\right).
\]
Theorem 5.2 tells us that a fixed $(m, f)$ and this many pairs $(a, b) \in X_{f,m,n}$ will be found in expected time
\[
L_n\left(\frac{1}{3}, \max(\beta, \beta') + \frac{\delta^{-1}}{3\beta}(1 + o(1)) + \frac{\sigma\delta + \kappa}{3\beta'}(1 + o(1))\right).
\]
Hence we can run the RNFS to obtain a congruence of squares mod $n$ with the expected time bounded by
\[
L_n\left(\frac{1}{3}, \lambda(1 + o(1))\right), \qquad \lambda = \max\left(\max\left(\frac{4\delta}{3}, 2\beta, 2\beta'\right),\ \max(\beta, \beta') + \left(\frac{\delta^{-1}}{3\beta} + \frac{\kappa + \sigma\delta}{3\beta'}\right)\right).
\]
Having chosen $\beta, \beta', \sigma, \delta, \kappa$ satisfying the conditions of 5 and 6, we can optimize the constants. Note that increasing the lesser of $\beta$ and $\beta'$ cannot increase $\lambda$ or cause the conditions on the constants to be violated, so we can assume $\beta = \beta'$. We then compute:
\[
2\sigma \geq \lambda \geq \min_{\beta,\delta}\left(\beta + \frac{2\delta^{-1} + \sigma\delta + o(1)}{3\beta}\right) \geq \min_{\beta}\left(\beta + \frac{2\sqrt{2\sigma} + o(1)}{3\beta}\right) \geq 2\sqrt{\frac{2\sqrt{2\sigma}}{3}} + o(1).
\]
Fix any $\epsilon > 0$ with $\epsilon = o(1)$. If we take $\beta = \beta' = \sigma = \frac{2}{3}\delta = \sqrt[3]{\frac{8}{9}} + \epsilon$ and $\kappa = \sqrt[3]{\frac{1}{3}} + \epsilon$, then the above are all equalities. Furthermore all the conditions are satisfied, giving $\lambda = 2\sqrt[3]{\frac{8}{9}} + o(1)$. This proves a runtime of
\[
L_n\left(\frac{1}{3}, \sqrt[3]{\frac{64}{9}} + o(1)\right).
\]

As a final note we want to briefly discuss some of the ways in which the Generalized Riemann Hypothesis could affect the analysis performed on the randomized number field sieve. Until now we have not made any assumption regarding the GRH and only limited assumptions on the heuristics, but if we were to accept the GRH then much of our discussion would become a lot simpler.

One of the main reasons it becomes simpler is that the Generalized Riemann Hypothesis allows us to assume that there are no Siegel zeroes. This makes the discussion around algebraic obstructions a lot simpler. For example, we obtain the following:

Proposition 5.30. Conditional on GRH, for $\epsilon = o(\delta)$ a negative power of $\log n$,
\[
\mathbb{P}_f\left(|E_{K,\epsilon}| > 0\right) = 0.
\]
This follows automatically from our discussion of the size of this set and the fact that there are no Siegel zeroes. This makes lemma 5.23 vacuous and allows us to sharpen the $L_n\left(\frac{1}{3}, c\right)$ bounds in lemma 5.19 to bounds polynomial in $\log n$.

The GRH also allows us to have far tighter effective bounds on the prime numbers in arithmetic progressions.
Without going into details, it would allow us to avoid the use of the adapted Bombieri–Vinogradov theorems and use an effective version of Chebotarev's Density Theorem which is completely dependent on the GRH holding.

In a discussion like this it could easily be assumed that the GRH holds, but that would make the whole discussion conditional, and the beauty of this result is that we make no grand assumptions whatsoever. To obtain a complexity that is equivalent to the heuristic complexity without relying on any conditions makes this the best possible outcome.

Does this mean we are done with the number field sieve after thirty years? No, definitely not. Over the years many different versions have found their way into mathematics and cryptography to solve problems other than factorization. For example the Tower Number Field Sieve, an adapted algorithm to solve Discrete Logarithm Problems ([27], [28]), has been worked on for many years, and it will be interesting to see if a similar randomization leads to a provable complexity for that as well.

References

Books
[1] S. Alaca, K.S. Williams, Introductory Algebraic Number Theory, Cambridge University Press, First edition, 2004.
[2] R. Crandall, C. Pomerance, Prime numbers, a computational perspective, Springer Verlag, First edition - corrected printing, 2002.
[3] A.K. Lenstra, H.W. Lenstra Jr., Development of the Number Field Sieve, Springer Verlag, 1993. Which is the source used for the papers [15], [16] cited below.
[4] I. Stewart, Galois Theory, CRC Press, Fourth edition, 2015.
[5] H. Cohen, A Course in Computational Algebraic Number Theory, Springer Verlag, Fourth Edition, 2000.
[6] L. C. Washington, Introduction to Cyclotomic Fields, Springer Verlag, Second edition, 1997.
[7] M. Rosen, Number Theory in Function Fields, Springer Verlag, 2002.
[8] H.L. Montgomery, R.C. Vaughan, Multiplicative Number Theory 1: Classical Theory, Cambridge University Press, 2010.
[9] J. Milne, Class Field Theory
[10] Multiplicative Number Theory, Springer Verlag, Second edition, 1980.
[11] G. Greaves, Sieve in Number Theory, Springer Verlag, First Edition, 2001.

Papers
[12] M.A. Morrison, J. Brillhart, A Method of Factoring and the Factorization of $F_7$, Mathematics of Computation, American Mathematical Society, Volume 29, 1975.
[13] P. Stevenhagen, The number field sieve, Algorithmic Number Theory, MSRI Publications, Volume 44, 2008.
[14] J.D. Lee, R. Venkatesan, Rigorous analysis of a randomised number field sieve, Journal of Number Theory, Volume 187, 2018.
[15] J.M. Pollard, Factoring with cubic integers, as printed in [3], Springer Verlag, 1993.
[16] A.K. Lenstra, H.W. Lenstra Jr., M.S. Manasse, J.M. Pollard, The Number Field Sieve, as printed in [3], Springer Verlag, 1993.
[17] J.P. Buhler, H.W. Lenstra Jr., C. Pomerance, Factoring integers with the number field sieve, as printed in [3], Springer Verlag, 1993.
[18] A. Granville, It is easy to determine whether a given integer is prime, Bull. Amer. Math. Soc., Volume 42, 2005.
[19] D. Coppersmith, Solving linear equations over GF(2): Block–Lanczos Algorithm, Linear Algebra Applications, 1991.
[20] P. Montgomery, A Block–Lanczos Algorithm for Finding Dependencies over GF(2), Adv. in Cryptology - Eurocrypt '95, 1995.
[21] D. Wiedemann, Solving sparse linear equations over finite fields, IEEE Trans. Inform. Theory, Volume 32, 1986.
[22] A. Hildebrand, G. Tenenbaum, Integers without large prime factors, Jour. Th. des Nom. de Bordeaux, Volume 5, 1993.
[23] A.J. Harper, Bombieri–Vinogradov and Barban-Davenport-Halberstam type theorems for smooth numbers, arXiv:1208.5992 [math.NT].
[24] E. Thomé, Square root algorithms for the number field sieve, Proc. of the 4th Int. Workshop on Arith. in Finite Fields, Springer, 2012.
[25] A. Hildebrand, On the number of positive integers ≤ x and free of prime factors > y, J. Number Theory, 1986, Volume 22, p. 289–307.
[26] A. Hildebrand, G. Tenenbaum, On integers free of large prime factors, Trans. Amer. Math. Soc., 1986, Volume 296, p. 265–290.
[27] O. Schirokauer, The impact of the number field sieve on the Discrete Logarithm Problem, Algorithmic Number Theory, MSRI Publications, 2008, Volume 44.
[28] R. Barbulescu, et al., The Tower Number Field Sieve, Advances in Cryptology, ASIACRYPT 2015, 2015, Volume 9453.
[29] K. Soundararajan, The Distribution of Smooth Numbers in Arithmetic Progressions, CRM Proceedings and Lecture notes, American Mathematical Society, 2008, Volume 46, p. 115–128.
[30] A. Harper,