The average number of integral points on elliptic curves is bounded
arXiv [math.NT], Feb.
LEVENT ALPOGE
ABSTRACT. We prove that, when elliptic curves $E/\mathbb{Q}$ are ordered by height, the average number of integral points $|E(\mathbb{Z})|$ is bounded. In fact it is less than an explicit absolute constant, and at most a smaller explicit constant on the minimalist conjecture. By "$E(\mathbb{Z})$" we mean the integral points on the corresponding quasiminimal Weierstrass model $E_{A,B}\colon y^2 = x^3 + Ax + B$ with which one computes the naïve height. The methods combine ideas from work of Silverman, Helfgott, and Helfgott-Venkatesh with work of Bhargava-Shankar and a careful analysis of local heights for "most" elliptic curves. The same methods work to bound integral points on average over the families $y^2 = x^3 + B$, $y^2 = x^3 + Ax$, and $y^2 = x^3 - D^2x$.

CONTENTS
1. Introduction
2. Acknowledgements
3. Notation, previous results, and outline of the argument
3.1. Notation
3.2. Previous results
3.3. Detailed sketch of proof of boundedness
4. Proof of Theorem 2
4.1. Restricting to a subfamily and handling small points
4.2. Local heights and a gap principle
4.3. Decomposing the set of integral points into classes: I–IV
4.4. I is small: multiples of rational points are rarely integral
4.5. II is small: integral points repel in the Mordell-Weil lattice
4.6. III is small and IV is empty: an explicit bivariate Roth's Lemma
4.7. Conclusion of proof
5. Proof of Theorem 1 and its corollaries
5.1. $y^2 = x^3 + B$
5.2. $y^2 = x^3 + Ax$
5.3. $y^2 = x^3 - D^2x$

Mathematics Subject Classification.
Key words and phrases. integral points, elliptic curves, Siegel's theorem, Mumford gap principle, spherical codes.
1. INTRODUCTION
The question of counting the integral solutions to an equation of shape $y^2 = x^3 + Ax + B$ goes back at least to Fermat. Considering this question for specific A and B, he developed his method of descent; for example, one of his challenge problems to the English was to find all integral solutions to $y^2 = x^3 - 2$. Fermat also applied this method to show that certain such equations have no nontrivial rational solutions. The famous case is $y^2 = x^3 - x$, which shows that 1 is not the area of a right triangle with rational sides. This led to the question of counting the rational solutions to such equations as well.

This last question has seen great progress. Certainly the number of solutions is either infinite or finite, and density considerations ([15]) imply that 0% of curves with finitely many rational points have any at all. Recent work of Bhargava-Shankar [5] and Bhargava-Skinner-Zhang [9] implies that, in fact, both possibilities (infinitely many and none at all) occur with positive probability. This agrees with the expectation, derived from the Birch and Swinnerton-Dyer conjecture, that each possibility occurs with probability one half (the "minimalist conjecture" of Goldfeld and Katz-Sarnak).

Progress has also been made for equations of shape $y^2 = f(x)$ with $f \in \mathbb{Z}[x]$ of fixed odd degree $2g + 1 > 3$. Here, by Faltings's theorem, there cannot be infinitely many solutions, and indeed one expects none with probability 1. In fact Poonen-Stoll [31], building on work of Bhargava-Gross [3], proved that such a curve has no rational solutions with probability at least $1 - (12g + 20)2^{-g}$, which is quite close to 1 for g very large.

But the analogous question for integral points on elliptic curves does not yield to these methods. By a theorem of Siegel there are only finitely many solutions to $y^2 = x^3 + Ax + B$ if A and B are such that the discriminant of the cubic, $-4A^3 - 27B^2$, is nonzero, so that the equation defines an elliptic curve.
We are therefore in a situation like that of Poonen-Stoll and Bhargava-Gross, and similarly we expect to have no integral solutions with probability 1. But despite the expected paucity of curves with integral points, it was not known until now whether the average number of integral points on elliptic curves is bounded. In this paper we show that it is indeed bounded, and in fact by an explicit constant.

Let us now be more precise. An elliptic curve $E/\mathbb{Q}$ has a unique Weierstrass model of the form $E_{A,B}\colon y^2 = x^3 + Ax + B$, where $A, B \in \mathbb{Z}$ satisfy $p^4 \mid A \Rightarrow p^6 \nmid B$ for every prime p, and $-4A^3 - 27B^2 \neq 0$. Given a Weierstrass model, we define the set of integral points on the curve as
$$E_{A,B}(\mathbb{Z}) := \{(x, y) \in \mathbb{Z}^2 \mid y^2 = x^3 + Ax + B\},$$
and write $|E_{A,B}(\mathbb{Z})|$ for its cardinality. To produce probabilistic statements, we need a notion of density. We write $H(E_{A,B}) := \max(4|A|^3, B^2)$ for the naïve height of $E_{A,B}$. Note that our normalization is slightly different from that of Bhargava-Shankar. Given a family $\mathcal{F}$ of elliptic curves and a function f on this family, we define
$$\mathrm{Avg}_{E\in\mathcal{F}^{\le T}}(f(E)) := \frac{\sum_{E\in\mathcal{F},\ H(E)\le T} f(E)}{\sum_{E\in\mathcal{F},\ H(E)\le T} 1}.$$

Indeed, this expectation dates back at least to 1986: see page 269 of the first edition of Silverman's Arithmetic of Elliptic Curves [35].
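The definitions above can be made concrete with a small brute-force search. The Python sketch below is our own illustration, not code from the paper, and the function names are hypothetical. It covers four things:

- the quasiminimality condition;
- the naïve height $\max(4|A|^3, B^2)$, in the normalization stated above;
- the set of curves with height at most T;
- the empirical average of $\mathrm{Avg}$.

Note that `integral_points` only searches $|x| \le$ `x_bound`, so it returns a subset of $E_{A,B}(\mathbb{Z})$. Siegel's theorem guarantees that $E_{A,B}(\mathbb{Z})$ is finite, but a provably complete search would need an effective height bound.

```python
from math import isqrt


def is_quasiminimal(A: int, B: int) -> bool:
    """Nonzero discriminant and no d > 1 with d^4 | A and d^6 | B."""
    if 4 * A**3 + 27 * B**2 == 0:
        return False
    d = 2
    # A d > 1 with d^6 | B (B != 0) satisfies d^6 <= |B|; if B == 0, then d^4 <= |A|.
    while (B != 0 and d**6 <= abs(B)) or (B == 0 and d**4 <= abs(A)):
        if A % d**4 == 0 and B % d**6 == 0:
            return False
        d += 1
    return True


def naive_height(A: int, B: int) -> int:
    return max(4 * abs(A) ** 3, B**2)


def integral_points(A: int, B: int, x_bound: int) -> list:
    """Integral points (x, y) on y^2 = x^3 + Ax + B with |x| <= x_bound."""
    points = []
    for x in range(-x_bound, x_bound + 1):
        rhs = x**3 + A * x + B
        if rhs < 0:
            continue
        y = isqrt(rhs)
        if y * y == rhs:
            points.append((x, y))
            if y:
                points.append((x, -y))
    return points


def family(T: int) -> list:
    """All quasiminimal pairs (A, B) with naive height <= T."""
    a_max = int((T / 4) ** (1 / 3)) + 1  # generous bound; exact filtering below
    b_max = isqrt(T)
    return [(A, B)
            for A in range(-a_max, a_max + 1)
            for B in range(-b_max, b_max + 1)
            if naive_height(A, B) <= T and is_quasiminimal(A, B)]


def average_points(T: int, x_bound: int) -> float:
    """Average over the family of the number of integral points with |x| <= x_bound."""
    curves = family(T)
    return sum(len(integral_points(A, B, x_bound)) for A, B in curves) / len(curves)


if __name__ == "__main__":
    print(integral_points(0, -2, 100))  # Fermat's equation: [(3, 5), (3, -5)]
```

Counting `family(T)` also shows the growth $|\mathcal{F}^{\le T}| \asymp T^{5/6}$, since $|A| \ll T^{1/3}$ and $|B| \ll T^{1/2}$.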
Thus, for instance, Bhargava-Shankar [5] have shown that $\limsup_{T\to\infty} \mathrm{Avg}_{E\in\mathcal{F}^{\le T}}(5^{\mathrm{rank}(E)}) \le 6$ for $\mathcal{F}$ the family of all elliptic curves.

We now fix notation for four families:

- $\mathcal{F}_{\mathrm{universal}}$ is the family of all elliptic curves;
- $\mathcal{F}_{A=0}$ is the family of Mordell curves $y^2 = x^3 + B$ (B sixth-power free);
- $\mathcal{F}_{B=0}$ is the family of curves $y^2 = x^3 + Ax$ (A fourth-power free);
- $\mathcal{F}_{\mathrm{congruent}}$ is the family of congruent number curves $y^2 = x^3 - D^2x$ (D squarefree).

With this notation in hand, we may state our main result.

Theorem 1.
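Let $k \ge 1$. Let $\mathcal{F} = \mathcal{F}_{\mathrm{universal}}$, $\mathcal{F}_{A=0}$, $\mathcal{F}_{B=0}$, or $\mathcal{F}_{\mathrm{congruent}}$. Then
$$\limsup_{T\to\infty} \mathrm{Avg}_{E\in\mathcal{F}^{\le T}}(|E(\mathbb{Z})|^k) \le O(1)^k \cdot \limsup_{T\to\infty} \mathrm{Avg}_{E\in\mathcal{F}^{\le T}}(3^{k\cdot\mathrm{rank}(E)}),$$
where the implied constant is effective and absolute.

Work of Bhargava-Shankar [5] implies that, for $\mathcal{F} = \mathcal{F}_{\mathrm{universal}}$,
$$\limsup_{T\to\infty} \mathrm{Avg}_{E\in\mathcal{F}^{\le T}_{\mathrm{universal}}}(5^{\mathrm{rank}(E)}) \le 6.$$
Hence the right-hand side of the theorem is $\ll 1$ when $k = 1$, and indeed whenever $k \le \frac{\log 5}{\log 3} = 1.4649\ldots$, since then $3^{k\cdot\mathrm{rank}(E)} \le 5^{\mathrm{rank}(E)}$. (Hence, e.g., the proportion of curves with at least n integral points is $o(n^{-1.4649\ldots})$.) For this family we optimize our bound to get:

Theorem 2.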
When all elliptic curves $E/\mathbb{Q}$ are ordered by height, the average number of integral points $|E(\mathbb{Z})|$ is less than an explicit absolute constant. That is, $\limsup_{T\to\infty} \mathrm{Avg}_{E\in\mathcal{F}^{\le T}_{\mathrm{universal}}}(|E(\mathbb{Z})|)$ is finite and explicitly bounded. Moreover, if the minimalist conjecture is true, this bound may be replaced by a smaller explicit constant.

In Section 5.3 we describe how to extend work of Heath-Brown in [17] to prove that, for $\mathcal{F} = \mathcal{F}_{\mathrm{congruent}}$,
$$\limsup_{T\to\infty} \mathrm{Avg}_{E\in\mathcal{F}^{\le T}_{\mathrm{congruent}}}(k^{\mathrm{rank}(E)}) \ll O(1)^{(\log k)^2}.$$
From this it follows that:
Corollary 3.
When the congruent number curves $E\colon y^2 = x^3 - D^2x$ ($D \in \mathbb{Z}^+$ squarefree) are ordered by height, the k-th moment $|E(\mathbb{Z})|^k$ of the number of integral points is bounded above by $O(1)^{k^2}$, where the implied constant is effective and absolute. In particular, the proportion of curves with at least n integral points decays like $n^{-\Omega(\log n)}$.

Here by the "minimalist conjecture" we mean two statements. The first is that the ranks of elliptic curves in $\mathcal{F}^{\le T}_{\mathrm{universal}}$ are distributed 50/50 between 0 and 1 in the limit $T \to \infty$. The second is the same statement for the subfamily of $(A, B) \not\equiv (2, 2) \pmod 3$. Otherwise the conditional constant in Theorem 2 should be replaced by another constant, still smaller than the unconditional one. This is because we cannot rule out the possibility that almost every rank one curve in the subfamily $(A, B) \not\equiv (2, 2) \pmod 3$ has an integral generator.
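The passage from a moment bound to a tail bound in Corollary 3 is Markov's inequality followed by an optimization over k. Below is a short derivation, where we write $C > 1$ for the absolute constant in the bound $O(1)^{k^2}$; C is our notation, not the paper's. Probabilities are understood as limiting proportions of curves ordered by height.

```latex
\begin{aligned}
\Pr\big(|E(\mathbb{Z})| \ge n\big)
  &\le \frac{\operatorname{Avg}\big(|E(\mathbb{Z})|^k\big)}{n^k}
   \le C^{k^2} n^{-k} = \exp\big(k^2 \log C - k \log n\big), \\
k = \frac{\log n}{2\log C}
  &\quad\Longrightarrow\quad
  \Pr\big(|E(\mathbb{Z})| \ge n\big) \le \exp\Big(-\frac{(\log n)^2}{4\log C}\Big)
  = n^{-\frac{\log n}{4\log C}}.
\end{aligned}
```

Rounding k to an integer only changes the constant in the exponent, which gives the stated decay $n^{-\Omega(\log n)}$. For the universal family, only moments with $k \le \log 5/\log 3$ are available from Theorem 1. The same inequality then yields only polynomial decay, of order $n^{-\log 5/\log 3}$.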
In Section 5.2 we describe how to extend work of Kane [24] and Kane-Thorne [25] to the family $\mathcal{F} = \mathcal{F}_{B=0}$. We prove that there is a very large subfamily $\tilde{\mathcal{F}}_{B=0} \subseteq \mathcal{F}_{B=0}$ (we will quantify this in the proof) for which
$$\mathrm{Avg}_{E\in\tilde{\mathcal{F}}^{\le T}_{B=0}}(k^{\mathrm{rank}(E)}) \ll O(1)^{(\log k)^2}.$$
From this it will follow that:
Corollary 4.
When the curves $E\colon y^2 = x^3 + Ax$ ($A \in \mathbb{Z}^+$ fourth-power free) are ordered by height, the k-th moment $|E(\mathbb{Z})|^k$ of the number of integral points is bounded above by $O(1)^{k^2}$, where the implied constant is effective and absolute. In particular, the proportion of curves with at least n integral points decays like $n^{-\Omega(\log n)}$.

The subfamily $\tilde{\mathcal{F}}_{B=0}$ is essentially determined by three conditions on A:

- A is almost squarefree;
- the number of prime factors of A is bounded above by a large constant times $\log\log A$ (the expected number);
- A is not a multiple of a modulus supporting a character with a problematic Siegel zero.

Finally, in the case $\mathcal{F} = \mathcal{F}_{A=0}$, work of Ruth [32] bounds the average of $|\mathrm{Sel}(E)|$, but a bound on the average of $3^{\mathrm{rank}(E)}$ is not yet known.

However, to show boundedness of the average of $|E(\mathbb{Z})|$ over this family, one may proceed as follows.

Integral points on $y^2 = x^3 + B$ give solutions to $4a^3 + 27b^2 = 108B$ via $a := -3x$, $b := 2y$. Hence they also give binary cubic forms $x^3 + axy^2 + by^3$ with discriminant $-108B$. By Davenport-Heilbronn, the number of binary cubic forms $f(x, y)$ with discriminant $|\Delta| \ll X$, taken up to $\mathrm{GL}_2(\mathbb{Z})$-equivalence, is $\ll X$ (see e.g. Theorem 5 of [8]). It therefore suffices to show that each equivalence class contains $\ll 1$ forms of shape $x^3 + axy^2 + by^3$.

If an equivalence class has no such forms, then we are done. Otherwise, we need only check that $\ll 1$ elements $\gamma \in \mathrm{GL}_2(\mathbb{Z})$ take $x^3 + axy^2 + by^3$ to a form of shape $x^3 + a'xy^2 + b'y^3$. Given such a $\gamma =: \begin{pmatrix} p & q \\ r & s \end{pmatrix}$, we have $(f\circ\gamma)(1, 0) = 1$, so that $f(p, r) = 1$. Moreover, upon imposing $ps - qr = \pm 1$, the condition that the $x^2y$ term vanish gives a cubic or linear equation in q, depending on whether or not $r = 0$. Hence the number of such γ is at most six times the number of solutions of $f(p, r) = 1$ with $p, r \in \mathbb{Z}$. By Thue's theorem, in the strengthened form of e.g. Bennett [2] (who gives an explicit uniform upper bound), this number is uniformly bounded, completing the argument.

Having stated our main results, let us now detail the organization of the paper.

- In Section 3 we set notation and state previous results towards these theorems. We then give a detailed argument towards Theorem 2, leaving inessential details to references to Section 4 along the way, and prove boundedness by O(1) rather than by an explicit constant. We do this because the length of the argument in Section 4 potentially obscures the main ideas, which are already present in the proof of boundedness.
- In Section 4 we prove Theorem 2, leaving the discussion of the optimization of our bounds to Appendix A.
- In Section 5 we prove Theorem 1 for the remaining three families by following the general method used to prove Theorem 2. We also prove Corollaries 3 and 4 by adapting the methods of Heath-Brown and Kane-Thorne to control the sizes of Selmer groups in these families.
- Finally, in Appendix A we provide details of the optimization for Theorem 2.

2. ACKNOWLEDGEMENTS

Theorem 2, without the explicit constant, was the subject of my senior thesis at Harvard, supervised by Jacob Tsimerman. I would like to thank him and Arul Shankar for suggesting the problem and for all their patience. I would also like to thank Henry Cohn for his help with bounds on spherical codes and for allowing me to use a program he wrote to optimize linear programming upper bounds for codes on $\mathbb{RP}^n$. I would further like to thank Manjul Bhargava, Peter Bruin, Noam Elkies, John Cremona, Roger Heath-Brown, Harald Helfgott, Emmanuel Kowalski, Barry Mazur, Joseph Silverman, Katherine Stange, and Yukihiro Uchida for answering questions related to this work. Finally, I would like to thank Michael Stoll for pointing out a mistake in a previous version of this paper.

3. NOTATION, PREVIOUS RESULTS, AND OUTLINE OF THE ARGUMENT

3.1. Notation.
Let us now set notation.

Asymptotic notation. By $f \ll_\theta g$ we mean that there exists a constant $C_\theta > 0$ depending only on θ such that $|f| \le C_\theta|g|$ pointwise. If θ is omitted (i.e. we write $f \ll g$), then the implied constant is absolute. By $f \asymp_\theta g$ we mean $f \gg_\theta g$ and $f \ll_\theta g$. By $O_\theta(g)$ we mean a quantity which is $\ll_\theta g$, and by $\Omega_\theta(g)$ a quantity which is $\gg_\theta g$. By $o(1)$ we mean a quantity that tends to 0 in the relevant limit (which will always be unambiguous). By $f = o(g)$ we mean $f = o(1)\cdot g$, and by $f \sim g$ we mean $f = (1 + o(1))g$.

Arithmetic notation. We write $(a, b)$ for the greatest common divisor of two integers $a, b \in \mathbb{Z}$, $\omega(n)$ for the number of prime factors of n, and $v_p$ for the p-adic valuation. We write $|\cdot|_v$ for the absolute value at a place v of a number field K, normalized so that the product formula holds.

Heights. We write $h(x)$ for the absolute Weil height of $x \in \overline{\mathbb{Q}}$, i.e.,
$$h(x) := \sum_w \frac{[K_w : \mathbb{Q}_v]}{[K : \mathbb{Q}]}\log^+|x|_w,$$
the sum taken over all places w of a number field $K \ni x$, with $v := w|_{\mathbb{Q}}$ and $\log^+(a) := \max(\log a, 0)$. Similarly, $H(x) := \exp(h(x))$ denotes the multiplicative Weil height of x. For $\frac ab \in \mathbb{Q}$ in lowest terms, $H(\frac ab) = \max(|a|, |b|)$. Given a rational point $P = (x, y)$ on $E_{A,B}\colon y^2 = x^3 + Ax + B$, $h(P)$ and $H(P)$ denote $h(x)$ and $H(x)$, respectively. We write
$$\hat h(P) := \lim_{n\to\infty}\frac{h(2^nP)}{4^n}$$
for the canonical height of P, with Néron local heights $\hat\lambda_v$ such that $\sum_v \hat\lambda_v = \hat h$. We similarly write $\lambda_v(\cdot) := \log^+|\cdot|_v$.

Discriminant and conductor. By $\Delta$ or $\Delta_{A,B}$ we mean $-16(4A^3 + 27B^2)$, the discriminant of $E_{A,B}$. We write $N_{A,B}$ for the conductor of $E_{A,B}$, defined by $N_{A,B} = \prod_{p\mid\Delta} p^{e_p}$. Here $e_p = 1$ if $E_{A,B}$ has multiplicative reduction at p; otherwise $e_p \ge 2$, with equality if $p \neq 2, 3$. The definitions of $e_2$ and $e_3$ are more complicated, but we will only use that $e_2 \le 8$ and $e_3 \le 5$.
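For $q = a/b \in \mathbb{Q}$ the sum over places defining $h$ collapses to $\log\max(|a|, |b|)$. The archimedean place contributes $\log^+|a/b|$, and each prime $p \mid b$ contributes $v_p(b)\log p$. The Python sketch below is our own illustration, with hypothetical function names. It computes $H(q)$ directly and recomputes $h(q)$ place by place, so the two formulas can be compared. Recall that $h(P)$ means $h(x(P))$.

```python
from fractions import Fraction
from math import log


def mult_height(q) -> int:
    """H(a/b) = max(|a|, |b|) for a/b in lowest terms (Fraction reduces automatically)."""
    q = Fraction(q)
    return max(abs(q.numerator), q.denominator)


def height_from_places(q) -> float:
    """h(q) as the sum over all places v of Q of log^+ |q|_v."""
    q = Fraction(q)
    if q == 0:
        return 0.0
    # Archimedean place: log^+ |a/b|.
    total = max(log(abs(q.numerator)) - log(q.denominator), 0.0)
    # Only primes dividing the reduced denominator have |q|_p > 1.
    d, p = q.denominator, 2
    while d > 1:
        while d % p == 0:
            d //= p
            total += log(p)  # accumulates v_p(denominator) * log p
        p += 1
    return total


if __name__ == "__main__":
    print(mult_height(Fraction(6, 4)))  # 6/4 = 3/2, so H = 3
```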
By $\psi_n(P)$ we mean the n-th division polynomial of $E_{A,B}$. It has zeroes at the nonidentity n-torsion points and is homogeneous of degree $n^2 - 1$ when x is given degree 2, y degree 3, A degree 4, and B degree 6. Multiplication by n is then given by
$$nP = \left(x(P) - \frac{\psi_{n-1}(P)\,\psi_{n+1}(P)}{\psi_n(P)^2},\ \frac{\psi_{2n}(P)}{2\psi_n(P)^4}\right).$$
In general, $\psi_{2n+1}(P)$ is a polynomial in x, A, B of degree $2n^2 + 2n$ in x, with leading coefficient (in x) equal to $2n + 1$. Similarly, $\psi_{2n}(P)$ is y times a polynomial in x, A, B of degree $2n^2 - 2$ in x, with leading coefficient (in x) equal to $2n$. By homogeneity, neither of these polynomials has a term of degree one less than its degree in x (i.e., they are of the form $c_dx^d + c_{d-2}x^{d-2} + \cdots + c_0$). Finally, we abuse the word "average" to mean "limsup of the average" throughout.

3.2. Previous results.
Now fix A and B for which $\Delta_{A,B} \neq 0$. The first general result bounding integral points on the curve $E_{A,B}$ is Siegel's famous finiteness theorem:
Theorem 5 (Siegel). $E_{A,B}(\mathbb{Z})$ is finite.

Next Baker, as an application of his theory of linear forms in logarithms, gave an effective upper bound on the heights of the integral points on $E_{A,B}$:

Theorem 6 (Baker, [1]). Write $H := H(E_{A,B})$. Let $P \in E_{A,B}(\mathbb{Z})$. Then $|x(P)| \le \exp\big((10^6H)^{10^6}\big)$.

This of course gives a bound on the number of integral points on $E_{A,B}$. As with Roth's theorem in Diophantine approximation, effectively bounding the number of solutions is much easier than effectively bounding their heights. Indeed, Siegel's argument was already effective for the number of solutions, and Silverman and Hindry-Silverman were the first to use it to give an explicit upper bound. They obtained:
Theorem 7 (Silverman, [33]). $|E_{A,B}(\mathbb{Z})| \ll O(1)^{\mathrm{rank}(E_{A,B}) + \omega(\Delta)}$.

In fact, one can further reduce $\omega(\Delta)$ to $\omega(\Delta_{ss})$, the number of primes of semistable bad reduction.

Theorem 8 (Hindry-Silverman, [22]). $|E_{A,B}(\mathbb{Z})| \ll O(1)^{\mathrm{rank}(E_{A,B}) + \sigma_{E_{A,B}}}$, where $\sigma_{E_{A,B}} := \frac{\log|\Delta_{A,B}|}{\log N_{A,B}}$ is the Szpiro ratio of $E_{A,B}$ (here $N_{A,B}$ is the conductor of $E_{A,B}$).

Conjecturally the Szpiro ratio is at most $6 + o(1)$; this is equivalent to the ABC conjecture. In any case, the implied constants in both theorems are enormous. Even using recent improvements to the arguments in Hindry-Silverman (namely, Petsche's [30] improved lower bound on the canonical height of a nontorsion rational point on $E_{A,B}$), one cannot reduce them below this order of magnitude. On the other hand, it is quite easy to show that most curves have Szpiro ratio at most some absolute constant, so one might think that this makes the second bound amenable to averaging.

But showing that the average of $C^{\mathrm{rank}(E_{A,B})}$ is finite, for C as large as these constants, is far out of the reach of current techniques. Recent spectacular results of Bhargava-Shankar (which will feature centrally in this argument) have proven that the average of $\mathrm{rank}(E_{A,B})$ is finite, and indeed explicitly bounded. This is the extent of current techniques. Specifically, Bhargava-Shankar have shown:

Theorem 9 (Bhargava-Shankar, [6, 7, 4, 5]). Let n = 2, 3, 4, or 5. Then, when all elliptic curves $E/\mathbb{Q}$ are ordered by height, the average size of the n-Selmer group $\mathrm{Sel}_n(E)$ is $\sigma(n)$, the sum of divisors of n.

Heath-Brown [19] has proved, assuming the Grand Riemann Hypothesis and the Birch and Swinnerton-Dyer conjecture, that the proportion of curves with rank R is $\ll R^{-\Omega(R)}$. Hence under these hypotheses we may average $C^{\mathrm{rank}(E_{A,B})}$, and our result follows by combining three ingredients:

- this theorem of Heath-Brown;
- the work of Hindry-Silverman, for curves of nonnegligible conductor;
- the pointwise bound of Helfgott-Venkatesh (stated below), for curves of negligible conductor.
Note that $n^{\mathrm{rank}(E)} \le |\mathrm{Sel}_n(E)|$ via Galois cohomology, whence the average of $n^{\mathrm{rank}(E)}$ is at most $\sigma(n)$ for $n \le 5$.

Another result crucial to us is the pointwise bound of Helfgott-Venkatesh:

Theorem 10 (Helfgott-Venkatesh, [21]). $|E_{A,B}(\mathbb{Z})| \ll O(1)^{\omega(\Delta)}\cdot(\log|\Delta|)^2\cdot 1.33^{\mathrm{rank}(E_{A,B})}$.

From this it follows that (see Lemma 14):
Corollary 11.
$\mathrm{Avg}_{E\in\mathcal{F}^{\le T}_{\mathrm{universal}}}(|E(\mathbb{Z})|) \ll_\epsilon T^\epsilon$.

To the author's knowledge, this is the best result in this direction derivable directly from the literature, up to a potentially small improvement (e.g. $\exp\left(O\left(\frac{\log T}{\log\log T}\right)\right)$ instead of $T^\epsilon$). A result of this sort lets us restrict our attention to subfamilies of density $1 - T^{-\Omega(1)}$, which will be quite useful in what follows.

3.3. Detailed sketch of proof of boundedness.
Let us now give an argument proving Theorem 2 without an explicit constant. (To obtain the explicit constant we will have to be much more careful.)

Sketch of proof that $\limsup_{T\to\infty}\mathrm{Avg}_{E\in\mathcal{F}^{\le T}_{\mathrm{universal}}}(|E(\mathbb{Z})|) < \infty$. The first thing to note is that $|\mathcal{F}^{\le T}_{\mathrm{universal}}| \asymp T^{5/6}$. Indeed, the bound $H(E_{A,B}) \ll T$ is equivalent to the bounds $A \ll T^{1/3}$ and $B \ll T^{1/2}$. By Corollary 11, we may restrict to any subfamily of density at least $1 - T^{-\Omega(1)}$. Fix a $\delta > 0$. We restrict to the subfamily $\mathcal{F}^* \subseteq \mathcal{F}_{\mathrm{universal}}$ with:
• $|A| \gg T^{1/3-\delta}$ and $|B| \gg T^{1/2-\delta}$.
• $(A, B) \le T^\delta$.
• $\prod_{v_p(\Delta)\ge 2} p^{v_p(\Delta)} \le T^\delta$.

On this subfamily we break the integral points into three classes, $E(\mathbb{Z}) = E(\mathbb{Z})_{\mathrm{small}} \cup E(\mathbb{Z})_{\mathrm{medium}} \cup E(\mathbb{Z})_{\mathrm{large}}$, where
$$E(\mathbb{Z})_{\mathrm{small}} := \{P \in E(\mathbb{Z}) \mid h(P) \le (5/6 - \delta)\log T\},$$
$$E(\mathbb{Z})_{\mathrm{medium}} := \{P \in E(\mathbb{Z}) \mid (5/6 - \delta)\log T < h(P) \le \delta^{-1}\log T\},$$
$$E(\mathbb{Z})_{\mathrm{large}} := \{P \in E(\mathbb{Z}) \mid \delta^{-1}\log T < h(P)\}.$$
We call these the "small", "medium", and "large" ranges, respectively.

By explicit counting, we obtain the bound $\sum_{A\ll T^{1/3},\ B\ll T^{1/2}} |E_{A,B}(\mathbb{Z})_{\mathrm{small}}| \ll T^{5/6-\delta}$. Therefore the small range does not contribute to the average.

There has been extensive work by Heath-Brown [18], Bombieri-Pila [10], and others on bounding the number of rational points of small height, but this does not improve the above bound. See Lemma 14. To see that this subfamily has the desired density, see Lemma 15. See the proof of the second part of Lemma 16.
To bound the points in the medium range, we prove a gap principle, analogous to the Mumford gap principle for rational points on higher genus curves. It seems to have first appeared in work of Silverman [33] and Helfgott [20].

Lemma 12 (Helfgott-Mumford gap principle). Let $P, R \in E(\mathbb{Z})_{\mathrm{medium}} \cup E(\mathbb{Z})_{\mathrm{large}}$ be distinct. Let $\theta_{P,R}$ be the angle between them in the Mordell-Weil lattice $E(\mathbb{Q})/\mathrm{tors} \subseteq E(\mathbb{Q})\otimes_{\mathbb{Z}}\mathbb{R}$ (with respect to the canonical height). Then
$$\cos\theta_{P,R} \le \frac12\max\left(\sqrt{\frac{h(P)}{h(R)}}, \sqrt{\frac{h(R)}{h(P)}}\right) + O(\delta).$$

Consider the map $P \mapsto \frac{1}{\sqrt{\hat h(P)}}\otimes P$ to unit vectors. Through it, the number of points $P \in E(\mathbb{Z})_{\mathrm{medium}}$ with canonical height in the range $[X, (1 + \delta)X]$ is $\ll A(\mathrm{rank}(E), \theta)$. Here $\theta = \frac\pi3 - O(\delta)$, and $A(n, \theta)$ is the maximal number of unit vectors in $\mathbb{R}^n$ with pairwise angles at least θ. Bounding this quantity is a well-known problem in the theory of sphere packing. We need an upper bound for large n, and one is provided by the work of Kabatiansky-Levenshtein:

Theorem 13 (Kabatiansky-Levenshtein, [23]).
$$A(n, \theta) \ll \exp\left(n\cdot\left[\frac{1 + \sin\theta}{2\sin\theta}\log\left(\frac{1 + \sin\theta}{2\sin\theta}\right) - \frac{1 - \sin\theta}{2\sin\theta}\log\left(\frac{1 - \sin\theta}{2\sin\theta}\right) + o(1)\right]\right).$$

For $\theta = \frac\pi3 - O(\delta)$, this gives $A(n, \theta) \ll 1.33^n$ once $\delta \ll 1$. Therefore the number of integral points with canonical height in the interval $[X, (1 + \delta)X]$ is $\ll 1.33^{\mathrm{rank}(E)}$. Since $E(\mathbb{Z})_{\mathrm{medium}}$ can be covered by $O(\delta^{-1}\log(\delta^{-1}))$ such intervals, we obtain the bound
$$|E(\mathbb{Z})_{\mathrm{medium}}| \ll \delta^{-1}\log(\delta^{-1})\cdot 1.33^{\mathrm{rank}(E)}.$$
By Bhargava-Shankar, the average of $1.33^{\mathrm{rank}(E)}$ (which is at most $5^{\mathrm{rank}(E)}$) is bounded over this family. So the medium range contributes $O(\delta^{-1}\log(\delta^{-1}))$ to the average.

Finally, we turn to the large range. The claim is that each coset of $E(\mathbb{Q})/3E(\mathbb{Q})$ contains $O(\delta^{-1}\log(\delta^{-1})\cdot 1.33^{\mathrm{rank}(E)})$ points of $E(\mathbb{Z})_{\mathrm{large}}$. To see this, let R be an element of $E(\mathbb{Z})_{\mathrm{large}}$ of minimal height in its coset modulo 3. By the same argument as for the medium range, there are $O(\delta^{-1}\log(\delta^{-1})\cdot 1.33^{\mathrm{rank}(E)})$ integral points P with $h(P) < \delta^{-1}h(R)$. For the points $P \equiv R \pmod 3$ with $h(P) \ge \delta^{-1}h(R)$, we write $P =: 3Q + R$ with $Q \in E(\mathbb{Q})$. Since P is very close to ∞ in the Archimedean topology, 3Q must be very close to −R. That is, $x(Q)$ must be very close to some $x(\tilde R) \in \overline{\mathbb{Q}}$ solving $x(3\tilde R) = x(R)$.
After making this precise, we find that
$$-\frac92 + O(\delta) \ge \frac{\log|x(Q) - x(\tilde R)|}{h(Q)}$$
for some such $\tilde R$.

The difficulty in proving this lies in handling the error term. That requires a careful estimate of the difference between the Weil and canonical heights on this curve. This is the reason for restricting to the subfamily $\mathcal{F}^*$: there the difference between the two heights is much better controlled. See Lemma 19. See (4.5) and take $C, D \gg \delta^{-1}$.

Therefore $|x(Q) - x(\tilde R)| \le H(Q)^{-9/2 + O(\delta)}$, so $x(Q)$ is a Roth-type approximation to $x(\tilde R)$. Moreover, since $h(x(Q)) = h(Q) \gg \delta^{-1}h(\tilde R) = \delta^{-1}h(x(\tilde R))$, $x(Q)$ is a "large" rational approximation in the sense of Bombieri-Gubler [11]. As they prove, there are only O(1) such approximations once $\delta^{-1} \gg 1$.

Therefore each coset modulo 3 contributes at most $O(\delta^{-1}\log(\delta^{-1})\cdot 1.33^{\mathrm{rank}(E)})$ to $|E(\mathbb{Z})_{\mathrm{large}}|$. Summing over the $3^{\mathrm{rank}(E)}$ cosets, we obtain
$$|E(\mathbb{Z})_{\mathrm{large}}| \ll \delta^{-1}\log(\delta^{-1})\cdot(3\cdot 1.33)^{\mathrm{rank}(E)}.$$
Again by Bhargava-Shankar, the average of $(3\cdot 1.33)^{\mathrm{rank}(E)} \le 5^{\mathrm{rank}(E)}$ is bounded. So the large range contributes $O(\delta^{-1}\log(\delta^{-1}))$ to the average.

In sum, the average is at most $O(\delta^{-1}\log(\delta^{-1}))$ for any sufficiently small $\delta \ll 1$. Choosing such a $\delta \asymp 1$ then gives the result. □

Having sketched a proof of the weaker theorem that the limsup of the average is bounded, let us now give the full proof of Theorem 2.

4. PROOF OF THEOREM 2

We give the argument in the case k = 1. Only a few modifications are required for general k, and they are all clear.

Proof of Theorem 2.
As noted, $|\mathcal{F}^{\le T}_{\mathrm{universal}}| \asymp T^{5/6}$. To obtain a good estimate on the difference between the Weil height and the canonical height, we restrict to a subfamily $\mathcal{F}^* \subseteq \mathcal{F}^{\le T}_{\mathrm{universal}}$ that omits a set of density $O(T^{-c})$ for some $c > 0$. The following lemma shows that we may do this.

4.1. Restricting to a subfamily and handling small points.

Lemma 14.
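Let $\mathcal{G} \subseteq \mathcal{F}^{\le T}_{\mathrm{universal}}$. Then
$$\sum_{E\in\mathcal{G}} |E(\mathbb{Z})| \ll |\mathcal{F}^{\le T}_{\mathrm{universal}}|\cdot\left(\frac{|\mathcal{G}|}{|\mathcal{F}^{\le T}_{\mathrm{universal}}|}\right)^{\Omega(1)}\cdot\exp\left(O\left(\frac{\log T}{\log\log T}\right)\right).$$

Proof.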
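By Hölder's inequality, it suffices to show that, for some fixed exponent $p > 1$,
$$\mathrm{Avg}_{E\in\mathcal{F}^{\le T}_{\mathrm{universal}}}(|E(\mathbb{Z})|^p) \ll \exp\left(O\left(\frac{\log T}{\log\log T}\right)\right).$$
By Helfgott-Venkatesh (Theorem 10), we have
$$\sum_{E\in\mathcal{F}^{\le T}_{\mathrm{universal}}} |E(\mathbb{Z})|^p \ll (\log T)^{2p}\cdot\sum_{E\in\mathcal{F}^{\le T}_{\mathrm{universal}}} O(1)^{\omega(\Delta_E)}\cdot 1.33^{p\cdot\mathrm{rank}(E)}.$$
To conclude, we apply the crude bound $\omega(n) \ll \frac{\log n}{\log\log n}$ together with Bhargava-Shankar, choosing p so that $1.33^p \le 5$. □

See e.g. their (6.23).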
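The sphere-packing input in the sketch of Section 3.3 can be checked numerically. The Python sketch below is our own illustration; the function name `kl_growth_base` is hypothetical. It evaluates the bracketed exponent of Theorem 13, the Kabatiansky-Levenshtein bound, and returns the growth base b with $A(n, \theta) \le b^{n(1 + o(1))}$. The o(1) term is ignored, so the value says nothing about small n.

At $\theta = \pi/3$, the angle produced by the gap principle of Lemma 12, the base is about 1.3208. It stays below 1.33 for θ slightly smaller than $\pi/3$, which is why $\delta \ll 1$ is needed. Three times the base, accounting for the $3^{\mathrm{rank}(E)}$ cosets modulo $3E(\mathbb{Q})$, stays below 5. Averages of $5^{\mathrm{rank}(E)}$ are controlled by Bhargava-Shankar.

```python
from math import exp, log, pi, sin


def kl_growth_base(theta: float) -> float:
    """Base b(theta) in the Kabatiansky-Levenshtein bound
    A(n, theta) <= b(theta)^(n (1 + o(1))), for 0 < theta < pi/2.
    The o(1) term is ignored: this is only meaningful as n -> infinity."""
    s = sin(theta)
    u = (1 + s) / (2 * s)
    v = (1 - s) / (2 * s)
    return exp(u * log(u) - v * log(v))


if __name__ == "__main__":
    b = kl_growth_base(pi / 3)
    print(f"b(pi/3)     = {b:.4f}")
    print(f"3 * b(pi/3) = {3 * b:.4f}")
```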
Fix a $\delta > 0$ to be chosen later; we will take $\delta \asymp 1$, independent of T. We first apply Lemma 14 to restrict to the subfamily $\mathcal{F}^\bullet \subseteq \mathcal{F}^{\le T}_{\mathrm{universal}}$ defined by the conditions:
(1) $|A| \ge T^{1/3-\delta}$.
(2) $|B| \ge T^{1/2-\delta}$, and B is not a square.
(3) $(A, B) \le T^\delta$.
(4) $|\Delta| \ge T^{1-\delta}$.
(5) $\prod_{p^2\mid\Delta} p^{v_p(\Delta)} \le T^\delta$.

To see that we may do this, we prove:

Lemma 15.
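Let $\mathcal{G}$ be the complement of $\mathcal{F}^\bullet$ in $\mathcal{F}^{\le T}_{\mathrm{universal}}$. Then
$$\frac{|\mathcal{G}|}{|\mathcal{F}^{\le T}_{\mathrm{universal}}|} \ll T^{-\Omega(\delta)}.$$

Proof.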
It suffices to impose the conditions one by one and check that each step throws out a subset of density $\ll T^{-\Omega(\delta)}$. For the first and second conditions this is immediate.

For the third condition, the number of $A \ll T^{1/3}$, $B \ll T^{1/2}$ with $(A, B) > T^\delta$ is at most $\ll \sum_{d > T^\delta}\frac{T^{1/3}}{d}\cdot\frac{T^{1/2}}{d} \ll T^{5/6-\delta}$.

For the fourth condition, the number of $A, B$ with $|\Delta_{A,B}| < T^{1-\delta}$ is $\ll \sum_{B\ll T^{1/2}} T^{1/3-\delta} \ll T^{5/6-\delta}$.

Finally, the fifth condition (given the other four) requires a longer argument. Our strategy is as follows:

- show that we may take the radical of $\Delta$ to be reasonably large;
- establish that we may take $\Delta$ to have no nonnegligible square divisors;
- bound the "nonsquarefree part" of $\Delta$ in terms of square divisors of $\Delta$ only, which forces it to be small.

We first show that we may assume that the conductor of $E_{A,B}$ is at least $T^{c_1}$, for a suitable absolute constant $0 < c_1 < 0.68$. By Theorem 4.5 of Helfgott-Venkatesh [21], the number of curves of conductor N is $\ll N^{0.2238}$. Therefore the number of $(A, B)$ with conductor at most $T^{c_1}$ is $\ll T^{1.2238\,c_1}$. Since $1.2238\,c_1 < 5/6$, this is $\ll T^{5/6 - \Omega(1)}$, giving the claim.

Now note that, for $p \neq 2, 3$, $E_{A,B}$ has additive reduction at p if and only if $p \mid (A, B)$. Therefore
$$N_{A,B} \ll \prod_{p\neq 2,3,\ p\mid\Delta} p\cdot\prod_{p\neq 2,3,\ p\mid(A,B)} p \ll \mathrm{rad}(\Delta)\cdot T^\delta,$$
where $\mathrm{rad}(n) := \prod_{p\mid n} p$ is the radical of n. Hence $\mathrm{rad}(\Delta) \gg T^{c_1-\delta}$.

Let us now show that we may assume the following: if $n^2 \mid \Delta$ and $n \ll T^{c_2}$, for a suitable absolute constant $c_2 > 0$, then $n \le T^\delta$. To see this, note that $n^2 \mid \Delta$ implies $-4A^3 \equiv 27B^2 \pmod{n^2}$. The first claim is that, for fixed A, the number of $B \ll T^{1/2}$ solving this congruence is
$$\ll O(1)^{\omega(n)}\cdot(A^{3/2}, n)\cdot\left(1 + \frac{T^{1/2}}{n^2}\right),$$
where $(x^\alpha, y^\beta) := \prod_{p\mid(x,y)} p^{\min(\alpha v_p(x),\,\beta v_p(y))}$. We count solutions prime by prime:

- At a prime power $p^e$ with $p > 3$, Hensel's lemma shows that $-4A^3/27$ has at most $2p^{v_p(A^3)/2}$ square roots modulo $p^e$.
- At $p = 3$ the count is $\ll 3^{v_3(A^3)/2}$ for the same reason, though with a different implied constant. Similarly, at $p = 2$ it is $\ll 2^{v_2(A^3)/2}$.
- If $v_p(A^3) \ge e$, then the number of solutions for B is instead at most $\mathrm{const}\cdot p^{e/2}$, with const $\ll 1$, and const $= 1$ if $p > 3$.

Therefore the number of solutions modulo m is
$$\ll O(1)^{\omega(m)}\cdot\prod_{p\mid(A,m)} p^{\min(v_p(A^3),\,v_p(m))/2}.$$
Hence the number of $B \ll T^{1/2}$ such that $n^2 \mid \Delta_{A,B}$ is $\ll O(1)^{\omega(n)}\cdot(A^{3/2}, n)\cdot\left(1 + \frac{T^{1/2}}{n^2}\right)$. It follows that the number of $A \ll T^{1/3}$, $B \ll T^{1/2}$ for which some n satisfies $n^2 \mid \Delta$ and $T^\delta < n \ll T^{c_2}$ is at most
$$\sum_{A\ll T^{1/3}}\ \sum_{B\ll T^{1/2}}\ \sum_{T^\delta < n \ll T^{c_2}} \cdots$$

Lemma 16. Let $\mathcal{G}$ be the complement of $\mathcal{F}^*$ in $\mathcal{F}^\bullet$. Then $\frac{|\mathcal{G}|}{|\mathcal{F}^\bullet|} \ll T^{-\Omega(\delta)}$.

Proof. Theorem 1.1 in Harron-Snowden [15] allows us to impose the first condition.

For the second condition, we count the $A \ll T^{1/3}$, $B \ll T^{1/2}$ for which there is at least one integral point $P \in E_{A,B}(\mathbb{Z})$ with $h(P) \le (5/6 - \delta)\log T$. Their number is at most
$$|\{(x, y, A, B) \in \mathbb{Z}^4 : |x| \le T^{5/6-\delta},\ A \ll T^{1/3},\ B \ll T^{1/2},\ y^2 = x^3 + Ax + B\}|$$
$$= |\{(x, y, A, B) : |x| \le T^{1/6},\ A \ll T^{1/3},\ B \ll T^{1/2},\ y^2 = x^3 + Ax + B\}| + \sum_{T^{1/6} < |x| \ll T^{5/6-\delta}} |\{(y, A, B) : A \ll T^{1/3},\ B \ll T^{1/2},\ y^2 = x^3 + Ax + B\}|.$$

To bound the first term, note that $(x, y, A)$ determines $B = y^2 - x^3 - Ax$. Moreover $y^2 \ll |x|^3 + T^{1/3}|x| + T^{1/2} \ll T^{1/2}$, so that $y \ll T^{1/4}$. Therefore the number of $(x, y, A, B)$ is at most $\ll T^{1/6}\cdot T^{1/4}\cdot T^{1/3} = T^{0.75}$.

For the second term, note that in this range $|y^2 - x^3| \ll T^{1/3}|x| + T^{1/2} \ll T^{1/3}|x|$, whence $|y| \asymp |x|^{3/2}$. Suppose $(y, A, B)$ and $(y', A', B')$ both lie in the solution set, with $y, y' > 0$ without loss of generality. Then $y^2 - y'^2 = x(A - A') + (B - B')$, so that
$$|y - y'| \ll \frac{T^{1/3}|x| + T^{1/2}}{|x|^{3/2}} \ll \frac{T^{1/3}}{|x|^{1/2}}.$$
Therefore the number of y for which some A, B make $(x, y, A, B)$ a solution is $\ll 1 + \frac{T^{1/3}}{|x|^{1/2}}$.

Next, fix x and y. If $(x, y, A, B)$ and $(x, y, A', B')$ are both solutions, then $(A - A')x = B' - B$, so that $|A - A'| \ll \frac{T^{1/2}}{|x|}$. Hence the number of A for which some B makes $(x, y, A, B)$ a solution is $\ll 1 + \frac{T^{1/2}}{|x|}$.
Putting these together, the second term is bounded above by
$$\sum_{T^{1/6} < |x| \ll T^{5/6-\delta}} |\{(y, A, B) : A \ll T^{1/3},\ B \ll T^{1/2},\ y^2 = x^3 + Ax + B\}| \ll \sum_{T^{1/6} < |x| \ll T^{5/6-\delta}}\left(1 + \frac{T^{1/3}}{|x|^{1/2}}\right)\left(1 + \frac{T^{1/2}}{|x|}\right) \ll T^{5/6-\delta},$$
as desired.

Finally, for the third condition, we count as above the $A \ll T^{1/3}$, $B \ll T^{1/2}$ for which there is at least one rational point $Q = \left(\frac{x}{d^2}, \frac{y}{d^3}\right) \in E_{A,B}(\mathbb{Q})$ with $h(Q) \le \left(\frac1{12} - \delta\right)\log T$. Their number is at most
$$|\{(x, y, d, A, B) : y^2 = x^3 + Ad^4x + Bd^6,\ |x| \le T^{1/12-\delta},\ |d| \le T^{1/24-\delta/2},\ A \ll T^{1/3},\ B \ll T^{1/2}\}|.$$
If $(x, y, d, A, B)$ is a solution, then $y^2 \ll T^{1/2}d^6$. Moreover, $(x, y, d, A)$ determines B. Hence this count is at most
$$\ll T^{1/12-\delta}\cdot\left(T^{1/4}\cdot T^{1/8-3\delta/2}\right)\cdot T^{1/24-\delta/2}\cdot T^{1/3} = T^{5/6-3\delta},$$
and we are done. □

4.2. Local heights and a gap principle. We restricted to this subfamily in order to give a very strong estimate on the difference between the Weil and canonical heights on its curves. Specifically:

Lemma 17. Let $E \in \mathcal{F}^*$. Let $h$ and $\hat h$ be the Weil and canonical heights on $E_{A,B}$, respectively. Let $Q \in E(\mathbb{Q})$. Then
$$\hat h(Q) - h(Q) = \log^+|\Delta^{-1/6}x(Q)| + \tfrac16\log|\Delta| - \log^+|x(Q)| + O(\delta\log T).$$
In particular, $h(Q) \le \hat h(Q) + O(\delta\log T)$. If moreover $|x(Q)| \ge |\Delta|^{1/6}$, then $\hat h(Q) - h(Q) = O(\delta\log T)$.

Proof. Write $\hat h - h = \sum_v(\hat\lambda_v - \lambda_v)$, where $\lambda_v := \log^+|\cdot|_v$, the $\hat\lambda_v$ are the Néron local heights, and v runs over the places of $\mathbb{Q}$. At a prime $p \neq 2, 3$ of good reduction, the local heights are equal by e.g. Theorem 4.1 in [34]. At a prime p of additive reduction (so $p \mid (A, B)$), or at p = 2 or 3, the same theorem gives $0 \le \hat\lambda_p - \lambda_p \le -\frac16\log|\Delta|_p$.

Note: our normalization differs from Silverman's by a factor of 2.

Since $p \mid (A, B)$ implies $p^2 \mid \Delta$, we see that
$$\prod_{p\mid(A,B)} p^{v_p(\Delta)} \le \prod_{p^2\mid\Delta} p^{v_p(\Delta)} \ll T^\delta.$$
Hence the sum of these contributions satisfies $0 \le \sum_{p\mid(A,B)}(\hat\lambda_p - \lambda_p) \ll \delta\log T$.
At a prime $p \neq 2, 3$ of multiplicative reduction, we have $v_p(\Delta) = 1$, whence $\alpha = 0$ in Lang's notation. So by Chapter III, Theorem 5.1 of [26], $\hat\lambda_p - \lambda_p = -\frac16\log|\Delta|_p$. Finally, at the infinite place, since $j(E_{A,B}) \ll T^{O(\delta)}$, combining Proposition 5.4 and (31) of [34] gives
$$\hat\lambda_\infty(Q) - \lambda_\infty(Q) = \log^+|\Delta^{-1/6}x(Q)| - \log^+|x(Q)| + O(\delta\log T).$$
Summing these contributions and using the product formula gives the result. □

Since the Weil and canonical heights are so close, we can bound the angle between two integral points by proving the corresponding bound with Weil heights in place of canonical heights. Specifically:

Lemma 18 (Helfgott-Mumford gap principle, cf. [20]). Let $E \in \mathcal{F}^*$. Let $P \neq R \in E(\mathbb{Z})$ with $h(P) \ge h(R)$. (Recall that automatically $h(P), h(R) > (5/6 - \delta)\log T$.) Then
$$\hat h(P + R) \le 2h(P) + h(R) + O(\delta\log T).$$

Proof. Write $P =: (X, Y)$ and $R =: (x, y)$ with $|X| \ge |x|$. Since $|X|, |x| \ge T^{5/6-\delta}$, we have $|Y| \sim |X|^{3/2}$ and $|y| \sim |x|^{3/2}$. Now
$$x(P + R) = \left(\frac{Y - y}{X - x}\right)^2 - X - x = \frac{X^2x + Xx^2 - 2Yy + A(X + x) + 2B}{(X - x)^2}.$$
By hypothesis, the numerator has absolute value $\ll |X|^2|x|$ and the denominator has absolute value $\ll |X|^2$. Cancelling common factors only makes the numerator and denominator smaller, so $h(P + R) \le 2h(P) + h(R) + O(1)$. If $|x(P + R)| \ge |\Delta|^{1/6}$, this completes the proof by Lemma 17.

Otherwise, write $x(P + R) = \frac WZ$ in lowest terms. Since $|\Delta| = T^{1+O(\delta)}$ on $\mathcal{F}^*$, we have
$$\hat h(P + R) = h(P + R) + \tfrac16\log|\Delta| - \log^+|x(P + R)| + O(\delta\log T)$$
$$= \max(\log|W|, \log|Z|) + \tfrac16\log T - \max(\log|W| - \log|Z|, 0) + O(\delta\log T) = \tfrac16\log T + \log|Z| + O(\delta\log T).$$
As we saw, $|Z| \ll |X|^2$, so $\hat h(P + R) \le \frac16\log T + 2h(P) + O(\delta\log T)$. Since $h(R) > (5/6 - \delta)\log T \ge \frac16\log T$, this finishes the proof.
(cid:3) This results in a lower bound on the angle of integral points close in absolute value: HE AVERAGE NUMBER OF INTEGRAL POINTS ON ELLIPTIC CURVES IS BOUNDED 15 Lemma 19. Let E ∈ F ∗ . Let P = R ∈ E ( Z ) . Let θ P,R be the angle between P and R in theEuclidean space E ( Q ) ⊗ Z R . Then: cos θ P,R ≤ 12 max s h ( P ) h ( R ) , s h ( R ) h ( P ) ! + O ( δ ) . Proof. By definition, cos θ P,R = ˆ h ( P + R ) − ˆ h ( P ) − ˆ h ( R )2 q ˆ h ( P )ˆ h ( R ) . By Lemma 17 and the fact that h ( P ) , h ( R ) > (5 − δ ) log T , we find that cos θ P,R = ˆ h ( P + R ) − h ( P ) − h ( R )2 p h ( P ) h ( R ) + O ( δ ) . Applying Lemma 18 then concludes the argument. (cid:3) Decomposing the set of integral points into classes: I – IV . Fix now a parameter D > . We will take D to be ≪ in the end. Let ˜ D := D + √ D + 42 , so that ˜ D ( ˜ D − = 1 D . (4.1)Fix E ∈ F ∗ . Let r := rank( E ) . Note that we may assume r > since E ( Q ) tors = 0 and so | E ( Z ) | = 0 if r = 0 . So choose P , . . . , P r ∈ E ( Q ) such that P = 0 has minimal canonicalheight (recall that E has no rational torsion) and P i has minimal canonical height amongpoints not inside span Z ( P , . . . , P i − ) + 3 E ( Q ) . Note that since ˆ h ( P i ± P j ) ≥ ˆ h ( P max( i,j ) ) it follows that |h P i , P j i| ≤ ˆ h ( P min( i,j ) )2 . It follows that, for any ǫ i = ± , ˆ h k X i =1 ǫ i P i ! ≤ k X i =1 ( k − i + 1)ˆ h ( P i ) . (4.2)Next note that P , . . . , P r is an F -basis for E ( Q ) / E ( Q ) . Given Q ∈ E ( Q ) , write i ( Q ) :=min { i | Q ∈ span Z ( P , . . . , P i ) + 3 E ( Q ) } — i.e., i ( Q ) is the least i for which Q is congruent toan element of the Z -span of P , . . . , P i modulo . (Note that i = 0 implies Q is a multiple of .) Write H := max (cid:16) (5 − O ( δ )) log T, ˆ h ( P ) (cid:17) , where, say, the implied constant is larger than one plus twice the implied constants inLemma 17, and H i := max (cid:16) ˆ h ( P i ) , ˜ D · H i − (cid:17) . 
For instance, the condition h ( P ) > ˜ D · H i implies h ( P ) > ˜ D i − j +1) ˆ h ( P j ) for every j ≤ i , and it alsoimplies h ( P ) > ˜ D i · (5 − O ( δ )) log T . Then if r > write E ( Z ) = 3 E ( Q ) ∩ E ( Z ) ∪ r [ i =1 { P ∈ E ( Z ) , H i ≤ ˆ h ( P ) ≤ ˜ D · H i }∪ r [ i =1 { P ∈ E ( Z ) , i ( P ) = i, ˆ h ( P ) > ˜ D · H i } =: I ∪ r [ i =1 II ( i ) D ∪ r [ i =1 III ( i ) D , (Note that our notation I , II , III is slightly different from the outline, since we have alreadygotten rid of “small” points.)In words, what we have done is broken E ( Z ) into multiples of rational points (whichwill be easy to handle) , points of “medium” height in their respective cosets, and thenpoints of “large” height in their respective cosets. (The curves with points of small heighthave already been thrown out.) Note that this decomposition is complete because if P ∈ E ( Z ) lies outside the union, then i ( P ) =: i > and ˆ h ( P ) ≤ ˜ D H i , so ˆ h ( P ) < H i . There-fore, since i ( P ) = i , we must have H i = ˜ D H i − by minimality of P i . Thus ˆ P < H i − .Proceeding inductively, we eventually find that ˆ P < (5 − O ( δ )) log T , contradicting h ( P ) > (5 − δ ) log T combined with Lemma 17.Let us further write III ( i ) D = [ ~a ∈{− , , } i : a i > { P ∈ III ( i ) D , P ≡ i X j =1 a j P j (mod 3) } =: [ ~a ∈{− , , } i : a i > III ( i,~a ) D . In words, we are breaking the points of “large” height into their congruence classes modulo . (Since we will be counting points and their negatives together below, we have forced a i > rather than a i = 0 .)Given ~a ∈ {− , , } i with a i = 0 , we will write R ~a := P ij =1 a j P j . Let us furtherbreak III ( i,~a ) D into a set we will show is empty and a set to which we can apply Roth-like In the rank case all points are multiples of a rational point, so in some sense “ E ( Z ) =: I ” would beconsistent notation here, but we have not bothered because it would be unnecessarily confusing. 
HE AVERAGE NUMBER OF INTEGRAL POINTS ON ELLIPTIC CURVES IS BOUNDED 17 techniques. Specifically, write III ( i,~a ) D = n P ∈ III ( i,~a ) D : ∃ ! Q ∈ E ( Q ) : P = 3 Q + R ~a ; ∀ ˜ R ∈ E ( Q ) with R = − R ~a , we have | x ( Q ) − x ( ˜ R ) | > 12 min R ′ = − R ~a , ˜ R ′ = ˜ R | x ( Q ) − x ( ˜ R ′ ) | (cid:27) ∪ [ ˜ R ∈ E ( Q ):3 ˜ R = − R ~a n P ∈ III ( i,~a ) D : ∃ ! Q ∈ E ( Q ) : P = 3 Q + R ~a ; | x ( Q ) − x ( ˜ R ) | ≤ 12 min R ′ = − R ~a , ˜ R ′ = ˜ R | x ( Q ) − x ( ˜ R ′ ) | (cid:27) =: IV i,~aD ∪ [ R = − R ~a III ( i,~a, ˜ R ) D . In words, we have written P ∈ III ( i,~a ) D as P = 3 Q + R ~a , and split the points up based onthe element of the nine-element set − R that Q is close to. IV ( i,~a ) D is the set of points with Q not close to any point in − R , which will be empty once D is sufficiently large. (This isbecause x ( P ) is large, so P is close to the origin, so that Q is close to such a solution.)4.4. I is small: multiples of rational points are rarely integral. Let us now begin bound-ing the sizes of each of the sets I , . . . , IV . The sets I and II D require almost no work. Thefollowing lemma expresses the fact that rational points rarely have integral multiples: inthe rank one case, at worst one has the generator and its negative as integral points (via thetheory of lower bounds on linear forms in elliptic logarithms), and in the higher rank caseno triple of a rational point is integral on a curve in our family. Lemma 20. Let E ∈ F ∗ . Then: | E ( Z ) | ≤ when r = 1 , and I = ∅ otherwise. Before we prove this lemma, we will prove a preparatory lemma on the coefficients of thedivision polynomials of E . Recall that the denominator of the multiplication-by- n map, ψ n ( P ) , is homogeneous in x, A, B of degree n − with the usual grading. Write ψ n ( P ) =: X ~f ∈ N : f x +2 f A +3 f B = n − c ~f · x f x A f A B f B , with c ~f ∈ Z . The claim is that these c ~f do not grow too fast as f x decreases. More precisely, Lemma 21. 
c ~f ≪ n O (1) O (1) (log n ) · ( n − − f x ) . It is a theorem of Lang that c ~f ≪ O (1) n in general (which is only weaker for f x ≥ (1 − o (1)) n ), but this is not enough for our purposes. Proof of Lemma 21. Write ψ n ( P ) =: y − n mod 2 X ~f ∈ N : f x +2 f A +3 f B =2 j n − k C ~f · x f x A f A B f B . We will show that C ~f ≪ n O (1) O (1) (log n ) · (cid:16) j n − k − f x (cid:17) , from which the bound for c ~f follows. That is, we will show that there are absolute constants K , K , and K such that, for all ~f , C ~f ≤ K n K K (log n ) · (cid:16) j n − k − f x (cid:17) . (4.3)First choose K > so large that for n ≤ the bound | C ~f | ≤ K holds. Take K = 1 .Take K so large that K n K +10 < K log (1 . · log n for all n > . The bound will thenfollow by induction. Specifically, recall the recursive formulas for the division polynomials:for odd indices, ψ m +1 = ψ m +2 ψ m − ψ m − ψ m +1 , and, for even indices, ψ m = (cid:18) ψ m y (cid:19) (cid:0) ψ m +2 ψ m − − ψ m − ψ m +1 (cid:1) . So suppose we have proved (4.3) for all n ′ < n . From the recursions and induction itfollows immediately that the leading coefficient of ψ n is n , which satisfies the claimedbound since K > , K = 1 . Hence we may assume f x < (cid:22) n − (cid:23) . For n of the form n =: 4 m + 1 , using the recursive formula, we find that ψ m +1 = − ψ m − ψ m +1 + (cid:18) ψ m +2 y (cid:19) (cid:18) ψ m y (cid:19) ( x + Ax + B ) . Expanding and applying the induction hypothesis, we find that the coefficient of x f x A f A B f B in ψ m +1 is, in absolute value, at most a sum of at most n terms (corresponding to decom-positions ~f = ~e + · · · + ~e ), each at most K n K K log (2 m +2) (8 m +4 m − f x )3 . But log (2 m + 2) ≤ log n − log (1 . , so that log (2 m + 2) ≤ (log n ) − log (1 . · log n. Inserting this into the inequality and using f x < j n − k , we find that | C ~f | ≤ K n K K (log n ) (cid:16) j n − k − f x (cid:17) h K n K +6 K − log (1 . 
· log n i , and the factor in brackets is smaller than by hypothesis. For n not congruent to mod the argument is exactly the same, using the other recursive relation when n is even. (cid:3) This finishes our preparations. Let us now prove Lemma 20. Proof of Lemma 20. For the first bound, note that if nP is integral for some n ≥ , then P must be integral. To see this, write P = (cid:0) xd , yd (cid:1) in lowest terms and suppose d > . Thensince x ( nP ) = xψ n ( P ) − ψ n +1 ( P ) ψ n − ( P ) ψ n ( P ) HE AVERAGE NUMBER OF INTEGRAL POINTS ON ELLIPTIC CURVES IS BOUNDED 19 is the quotient of two homogeneous polynomials of degree n and n − , respectively(again, x, y, A, B are given degrees , , , and , respectively) with the numerator havingleading term x n , we see that, on clearing denominators, x ( nP ) = x n + ( ∈ d Z )( ∈ d Z ) , which is not an integer since ( x, d ) = 1 by hypothesis.So if P = P is not integral we are done for the rank case. If it is integral, then the claimis that none of its multiples nP , n > , are also integral. Indeed, since P is integral, we findthat h ( P ) > (5 − δ ) log T since E ∈ F ∗ .Let us first show that nP is not integral for < n ≪ O (1) √ log T . Of course it sufficesto show that the denominator d n in lowest terms of x ( nP ) is larger than for these n . ButLemma 29 of [36] (or, equivalently, Proposition 4.2.3 in [28]) allows us to do this. Indeed,we find that log ( d n ) ≥ log ( ψ n ( P ) ) − n | ∆ | ≥ log ( ψ n ( P ) ) − n T − O (1) . By Lemma 21, the coefficient of x k is at most ≪ n O (1) (cid:16) O (1) (log n ) T (cid:17) n − − k . Hence since | x ( P ) | ≥ T − δ is much larger than T , we find that ψ n ( P ) is dominated by itstop term. Specifically, for n ≪ O (1) √ log T , ψ n ( P ) ≥ | x ( P ) | n − (cid:16) − n O (1) O (1) (log n ) T − Ω(1) (cid:17) ≫ | x ( P ) | n − , so that log ( d n ) ≥ ( n − h ( P ) − n T − O (1) ≥ (9 − O ( δ )) log T − O (1) , which is positive. 
Thus d n > and so x ( nP ) is not integral for n ≪ O (1) √ log T . This in factcompletes the first estimate since it shows that no integral point is thrice a rational pointin general as well (for this application we could simply use Lang’s coefficient bound, ofcourse).Thus it remains to show that nP is not integral for n ≫ O (1) √ log T . This will follow fromDavid’s bounds on linear forms in elliptic logarithms — in fact we will show that nP is notintegral for n ≫ log T √ log log T log log log T . To do this we apply the Corollary of equation(26) in [14]. Let us translate their notation into ours. Recall that, for us, r = 1 , so that their C ≪ . Moreover, since our curves have no torsion, their g = 1 . Their N is our n . Their µ ∞ = log max( | A | , | B | ) ≤ log T + O (1) .They define the real period ω to be ω := 2 Z ∞ ρ dx √ x + Ax + B , where ρ ∈ R is the largest real solution of ρ + Aρ + B = 0 . Let us show that T − ≪ ω ≪ T − + O ( δ ) . Let ρ ′ , ρ ′′ ∈ C be the other two roots. Since A and B satisfy T − O ( δ ) ≪ | A | , | B | ≪ T, it follows by the reverse triangle inequality that the same bounds hold for | ρ | , | ρ ′ | , and | ρ ′′ | .Now the integral over [10 T, ∞ ) is ≍ T − since x + Ax + B ≫ | x | there. Hence, since theintegrand is positive, the lower bound on ω follows. For the upper bound, we split intocases. If ρ ′ , ρ ′′ are not real, then Re ( ρ ′ ) = Re ( ρ ′′ ) = − ρ and Im ( ρ ′ ) = − Im ( ρ ′′ ) = ρ ′ − ρ ′′ . Inthis case, on ( ρ, T ) x + Ax + B ≫ ( x − ρ ) | ρ ′ − ρ ′′ | . If ρ ′ , ρ ′′ are real, then on ( ρ, T ) x + Ax + B = ( x − ρ )( x − ρ ′ )( x − ρ ′′ ) ≥ ( x − ρ )( ρ − ρ ′ )( ρ − ρ ′′ ) . Since the discriminant of x + Ax + B is ≫ T − O ( δ ) , applying Mahler’s bound on the bottomof page 261 in [29], in both cases it follows that x + Ax + B ≫ ( x − ρ ) · T − O ( δ ) on the interval. 
Hence the integral over the interval is Z Tρ dx √ x + Ax + B ≪ T − O ( δ ) Z Tρ dx √ x − ρ ≪ T − + O ( δ ) , completing the argument.It follows that their c ′ ≫ T − O ( δ ) . Note also that their h ≪ log T . The bound | ρ | , | ρ ′ | , | ρ ′′ | ≪ T implies that their ξ ≪ T . Finally, we turn to the expression π | u | ω Im ( τ ) defining their log V .Since we may take τ in the classical fundamental domain for SL ( Z ) acting on the upperhalf plane, we have Im ( τ ) ≫ . Now, u , the elliptic logarithm of our P = P =: ( ξ, η ) ,satisfies u = 1 ω Z ∞ ξ dx √ x + Ax + B ≪ ξ − T + O ( δ ) . Thus π | u | ω Im ( τ ) ≪ | u | ω ≪ | x ( P ) | − T O ( δ ) . But | x ( P ) | ≫ T − δ implies that this is ≪ T − O ( δ ) . Therefore their log V satisfies log V ≪ ˆ h ( P ) . Finally, their λ = ˆ h ( P ) in the rank one case.This completes the translation of their notation. Their Corollary now reads (since cer-tainly any integral point P ′ satisfies the hypothesis of their Proposition, which is x ( P ′ ) ≫ T — x ( P ′ ) is positive since x ( P ′ ) + Ax ( P ′ ) + B is): Corollary 22 (Cf. equation (26) of [14].) . For E ∈ F ∗ of rank one and generator P = P , if nP is integral and n ≫ , then n ≪ (log T ) log n (log log n ) . It follows that, if nP is integral, then n ≪ log T √ log log T log log log T . Since we havealready shown that if n > then n ≫ O (1) √ log T , this completes the argument. (cid:3) Note that we have now completely handled the cases of rank( E ) = 0 or . Hence fromnow on we may assume rank( E ) ≥ . HE AVERAGE NUMBER OF INTEGRAL POINTS ON ELLIPTIC CURVES IS BOUNDED 21 II is small: integral points repel in the Mordell-Weil lattice. Let < J < be aparameter which we will choose at the end ( J will depend on r for r ≪ ). Write J =:2 cos θ . We encode the fact that integral points repel in the Mordell-Weil lattice with thefollowing lemma. Lemma 23. | II ( i ) D | ≤ & log ˜ D log J ' max S ⊆ RP r − : ∀ v = w ∈ S, |h v,w i|≤ cos θ + O ( δ ) | S | . 
We will bound the maximum occurring in this bound with a bound on codes in RP n vialinear programming techniques for n ≪ and a simpleminded volume estimate for n ≫ . Proof. It suffices to prove that the number of points with height in an interval [ m, M ] is ≤ & log (cid:0) Mm (cid:1) J ' max S ⊆ RP r − : ∀ v = w ∈ S, |h v,w i|≤ cos θ + O ( δ ) | S | . To see this, note that [ m, M ] ⊆ & log ( Mm ) J ' [ i =1 [ m ( J ) i , m ( J ) i +1 ] , so that it suffices to prove this bound for an interval of the form [ m, J m ] . But now if h ( R ) ≤ h ( P ) ≤ J h ( R ) , then by Lemma 18, cos θ P,R ≤ J O ( δ ) = cos θ + O ( δ ) . Therefore the map { P ∈ E ( Z ) : h ( P ) ∈ [ m, J m ] } / ± → RP r − via ± P 7→ {± P ⊗ √ ˆ h ( P ) } (the projection to RP r − of the nonzero point P ∈ R r ∼ = E ( Q ) ⊗ Z R ) is injective (since cos θ P,R < if P = R once δ ≪ J ). Moreover the image satisfies the condition that forevery v = w in the image, |h v, w i| = cos θ v,w ≤ cos θ + O ( δ ) , as desired. This completes theproof of the second bound. (cid:3) III is small and IV is empty: an explicit bivariate Roth’s Lemma. For III ( i,~a, ˜ R ) D and IV ( i,~a ) D we will follow Siegel’s proof of his finiteness theorem. Write C := (5 − δ ) ˜ D , so thatfor every P ∈ III ( i,~a ) D we have h ( P ) > C log T . Note also that Lemma 24. Let P ∈ III ( i,~a ) D . Then: h ( R ~a ) , ˆ h ( R ~a ) ≤ (cid:18) D + O ( δ ) (cid:19) h ( P ) . Proof. Observe that ˆ h i X j =1 a j P j ≤ i X j =1 ( i − j + 1)ˆ h ( P i ) ≤ i X j =1 i − j + 1˜ D i − j +1) ˆ h ( P )= i X j =1 j ˜ D − j (1 + O ( δ )) h ( P ) , where the first step follows from (4.2) and the second follows from the definition of III ( i ) D ,plus Lemma 17. But in general k X ℓ =1 ℓx − ℓ ≤ x ( x − , so that we find that h ( R ~a ) ≤ (cid:18) D + O ( δ ) (cid:19) h ( P ) by (4.1) and Lemma 17, as desired. (cid:3) Having established this estimate, let us now prove: Lemma 25. Let P ∈ III ( i,~a ) D . 
Then: log Q R = − R ~a (cid:12)(cid:12)(cid:12) x ( Q ) − x ( ˜ R ) (cid:12)(cid:12)(cid:12) − h ( P ) ≥ − max (cid:18) 19 log Th ( P ) , D (cid:19) − O ( δ ) , and (cid:0) D − + O ( δ ) (cid:1) − ≤ h ( P )9 h ( Q ) ≤ (cid:0) − D − − O ( δ ) (cid:1) − . Proof. Observe that 12 = log | x ( P ) | h ( P )= log | x (3 Q + R ~a ) | h (3 Q + R ~a ) . (4.4)Let us examine the numerator and denominator of this expression. HE AVERAGE NUMBER OF INTEGRAL POINTS ON ELLIPTIC CURVES IS BOUNDED 23 First, the denominator. Note that p h ( P ) = p h (3 Q + R ~a ) ≥ q ˆ h (3 Q + R ~a ) − O ( δ log T )= q ˆ h (3 Q + R ~a ) (1 − O ( δ )) ≥ (cid:18) q ˆ h ( Q ) − q ˆ h ( R ~a ) (cid:19) (1 − O ( δ )) ≥ p h ( Q ) − p h ( P ) D ! (1 − O ( δ )) , where we have used the triangle inequality for p ˆ h and Lemma 24.Therefore p h ( P ) ≥ p h ( Q ) (cid:0) D − + O ( δ ) (cid:1) − . The same argument works to prove that p h ( Q ) ≥ p h ( P ) (cid:0) − D − − O ( δ ) (cid:1) as well. This proves the second statement of the Lemma.Now we move to the numerator in (4.4). Observe that x (3 Q ) = x ( Q ) ψ ( Q ) − ψ ( Q ) ψ ( Q ) ψ ( Q ) . Note also that Y R = − R ~a ( x ( Q ) − x ( ˜ R )) = ψ ( Q ) (cid:18) x ( Q ) − ψ ( Q ) ψ ( Q ) ψ ( Q ) − x ( R ~a ) (cid:19) = ψ ( Q ) ( x (3 Q ) − x ( R ~a )) , since both are polynomials in x ( Q ) of degree with leading coefficient and roots exactlyat x ( Q ) = x ( ˜ R ) for some ˜ R with R = − R ~a . But then, since in general x ( W + Z ) = ( y ( W ) − y ( Z )) − ( x ( W ) + x ( Z ))( x ( W ) − x ( Z )) ( x ( W ) − x ( Z )) , we have that x (3 Q + R ~a ) · Y R = − R ~a ( x ( Q ) − x ( ˜ R )) = ψ ( Q ) (cid:0) ( y (3 Q ) − y ( R ~a )) − ( x (3 Q ) + x ( R ~a ))( x (3 Q ) − x ( R ~a )) (cid:1) . Now from the equation y = x + Ax + B , we find that y ≪ ( | x | + T ) in general. Also, | x ( R ~a ) | ≤ exp( h ( R ~a )) ≤ exp (cid:18) h ( P ) D (1 + O ( δ )) (cid:19) = | x ( P ) | D + O ( δ ) . 
Therefore, as we saw in the proof of Lemma 18, by writing out the numerator and de-nominator, | x (3 Q ) | = | x ( P − R ~a ) | ≪ | x ( R ~a ) | since x ( R ~a ) is much smaller than x ( P ) inabsolute value. But if | x ( Q ) | ≥ T , then | x (3 Q ) | ≫ | x ( Q ) | since it is a quotient oftwo polynomials dominated by their leading terms. Therefore we find that in general | x ( Q ) | , | x (3 Q ) | ≪ | x ( R ~a ) | + T and so | y ( Q ) | , | y (3 Q ) | ≪ ( | x ( R ~a ) | + T ) .Therefore (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) x (3 Q + R ~a ) · Y R = − R ~a ( x ( Q ) − x ( ˜ R )) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12) ψ ( Q ) (cid:0) ( y (3 Q ) − y ( R ~a )) − ( x (3 Q ) + x ( R ~a ))( x (3 Q ) − x ( R ~a )) (cid:1)(cid:12)(cid:12) ≪ ( | x ( R ~a ) | + T ) ≪ max (cid:16) T , | x ( P ) | D + O ( δ ) (cid:17) . Written another way, log | x (3 Q + R ~a ) | ≤ log Y R = − R ~a (cid:12)(cid:12)(cid:12) x ( Q ) − x ( ˜ R ) (cid:12)(cid:12)(cid:12) − + max (cid:18) 19 log T, h ( P ) D + O ( δ ) (cid:19) + O (1) . Therefore, returning to (4.4), we find that: ≤ log Q R = − R ~a (cid:12)(cid:12)(cid:12) x ( Q ) − x ( ˜ R ) (cid:12)(cid:12)(cid:12) − h ( P ) + max (cid:18) 19 log Th ( P ) , D (cid:19) + O ( δ ) . This completes the proof. (cid:3) Let us now show that, once D is suitably chosen, IV ( i,~a ) D is empty. (Recall that C =(5 − δ ) ˜ D .) Lemma 26. Suppose C + 72 D + max (cid:18) C , D (cid:19) < . Then IV ( i,~a ) D = ∅ .Proof. Suppose P ∈ IV ( i,~a ) D . Then, by definition, Y R = − R ~a (cid:12)(cid:12)(cid:12) x ( Q ) − x ( ˜ R ) (cid:12)(cid:12)(cid:12) − ≪ min ˜ R = ˜ R ′ , R =3 ˜ R ′ = − R ~a | x ( ˜ R ) − x ( ˜ R ′ ) | − . Now, as we saw in the previous lemma, as polynomials in x ( Q ) , Y R = − R ~a ( x ( Q ) − x ( ˜ R )) = ψ ( Q ) x ( Q ) − ψ ( Q ) ψ ( Q ) − ψ ( Q ) x ( R ~a ) . This is homogeneous of degree in x ( Q ) , x ( R ) , A, B when the variables are given degrees , , , , respectively. 
Therefore the coefficients of x ( Q ) in the first two terms (namely, ψ ( Q ) x ( Q ) − ψ ( Q ) ψ ( Q ) ) are bounded in absolute value by ≪ T . Thus the polynomialhas na¨ıve height, in the sense of Bugeaud and Mignotte [12], at most T + h ( R ~a ) . To seethis, clear the denominator of x ( R ~a ) so that the polynomial is an integral polynomial and HE AVERAGE NUMBER OF INTEGRAL POINTS ON ELLIPTIC CURVES IS BOUNDED 25 then the estimate is clear. Therefore by the estimate on page 262 of Mahler [29], we findthat min ˜ R = ˜ R ′ , R =3 ˜ R ′ = − R ~a | x ( ˜ R ) − x ( ˜ R ′ ) | ≫ T − H ( R ~a ) − , and hence that Y R = − R ~a log (cid:12)(cid:12)(cid:12) x ( Q ) − x ( ˜ R ) (cid:12)(cid:12)(cid:12) − ≤ 576 log T + 72 h ( R ~a ) + O (1) . Therefore by Lemma 25 it follows that ≤ 576 log T + 72 h ( R ~a ) + O (1) h ( P ) + max (cid:18) 19 log Th ( P ) , D (cid:19) + O ( δ ) . Applying Lemma 24, we see that ≤ 576 log Th ( P ) + 72 D + max (cid:18) 19 log Th ( P ) , D (cid:19) + O ( δ ) . The desired contradiction now follows (once δ ≪ ) by using the inequality h ( P ) >C log T . (cid:3) Finally we will bound the size of III ( i,~a, ˜ R ) D for D suitably chosen. The idea here is that,roughly, we have obtained the inequality log Q R = − R ~a (cid:12)(cid:12)(cid:12) x ( Q ) − x ( ˜ R ) (cid:12)(cid:12)(cid:12) − h ( Q ) ≥ . , and now Q is very close to some ˜ R . Therefore, roughly, this tells us that | x ( Q ) − x ( ˜ R ) | ≤ H ( Q ) − . , and so x ( Q ) is a Roth-type rational approximation to x ( ˜ R ) . But Roth’s theoremrequires many such rational approximations to reach a contradiction, and hence providesa poor bound on their number for our purposes. In fact x ( Q ) is also a Siegel-type rationalapproximation, in the sense that x ( ˜ R ) is of degree over Q , and q x ( ˜ R ) = √ 18 =4 . ... < . . 
Moreover x ( Q ) has very large height compared to x ( ˜ R ) , so if we are verycareful with how we prove Siegel’s theorem on Diophantine approximation (namely, viaRoth’s lemma for bivariate polynomials), we will be able to conclude.So let c < be another parameter (which we will choose such that − c ≫ ). Given c and D , we may bound the size of III ( i,~a, ˜ R ) D as follows. Lemma 27. Suppose s ∈ Z + is such that √ c − κ − s ! κ − κ − s ( D − (cid:18) κ + 1( c − − (cid:19) > . Then | III ( i,~a, ˜ R ) D | ≤ s .Proof. Let P ∈ III ( i,~a, ˜ R ) D . Note that, for all ˜ R ′ = ˜ R such that R ′ = − R ~a , | x ( Q ) − x ( ˜ R ′ ) | > | x ( ˜ R ) − x ( ˜ R ′ ) | by the triangle inequality. Therefore Y R ′ = − R ~a , ˜ R ′ = ˜ R (cid:12)(cid:12)(cid:12) x ( Q ) − x ( ˜ R ′ ) (cid:12)(cid:12)(cid:12) ≫ Y R ′ = − R ~a , ˜ R ′ = ˜ R | x ( ˜ R ) − x ( ˜ R ′ ) | . By a bound of Mahler (the last line on page 262 of [29]), Y R ′ = − R ~a , ˜ R ′ = ˜ R | x ( ˜ R ) − x ( ˜ R ′ ) | ≫ T − H ( R ~a ) − . Hence, by Lemma 25, log (cid:12)(cid:12)(cid:12) x ( Q ) − x ( ˜ R ) (cid:12)(cid:12)(cid:12) − h ( P ) ≥ − max (cid:18) C , D (cid:19) − C − D + O ( δ ) . Next, applying the second part of Lemma 25, we therefore find that log (cid:12)(cid:12)(cid:12) x ( Q ) − x ( ˜ R ) (cid:12)(cid:12)(cid:12) − h ( Q ) ≥ (cid:18) − max (cid:18) C , D (cid:19) − C − D + O ( δ ) (cid:19) (cid:0) D − + O ( δ ) (cid:1) − . (4.5)Write κ := (cid:18) − max (cid:18) C , D (cid:19) − C − D (cid:19) (cid:0) D − (cid:1) − + O ( δ ) . Then | x ( Q ) − x ( ˜ R ) | ≤ H ( Q ) − κ . That is, x ( Q ) ∈ Q is a rational approximation to x ( ˜ R ) ∈ Q with exponent κ . Moreover, h ( Q ) ≥ h ( P ) (cid:0) − D − − O ( δ ) (cid:1) ≥ ( D − − O ( δ )) h ( R ~a ) ≥ ( D − − O ( δ )) ˆ h ( ˜ R ) ≥ ( D − − O ( δ )) h ( ˜ R ) (4.6)so that x ( Q ) is a “large” rational approximation of x ( ˜ R ) as well. 
To bound the numberof these, we will run through the usual argument for Siegel’s theorem on Diophantineapproximation via Roth’s lemma, except we will be explicit and careful in our bounds.Write α := x ( ˜ R ) (whence deg α ≤ and | α | ≪ T + | x ( R ~a ) | ) and let us suppose therewere s + 1 such approximations — i.e. λ i = λ j satisfying:(1) λ i ∈ Q ,(2) | λ i | ≪ T + | x ( R ~a ) | ,(3) | λ i − α | ≤ H ( λ i ) − κ ,(4) h ( λ i ) ≥ ( D − − O ( δ )) h ( α ) ,(5) h ( λ i ) ≥ C (1 − D − − O ( δ )) log T .Let us also suppose, without loss of generality, that H ( λ s +1 ) ≥ H ( λ s − ) ≥ · · · ≥ H ( λ ) .Note that, by rationality of the β i we have that H ( λ i − ) H ( λ i ) ≤ | λ i − − λ i | ≤ H ( λ i − ) − κ . Hence H ( λ i ) ≥ H ( λ i − ) κ − — i.e., h ( λ i ) ≥ ( κ − h ( λ i − ) + O (1) . HE AVERAGE NUMBER OF INTEGRAL POINTS ON ELLIPTIC CURVES IS BOUNDED 27 Therefore h ( λ s +1 ) ≥ ( κ − s h ( λ ) + O ( s ) . Hence λ s +1 and λ are very far apart in height, and it is these rational approximations thatwe will use. We will write β := λ s +1 and β := λ .Now let d > d ∈ Z + be such that (cid:12)(cid:12)(cid:12)(cid:12) d d − h ( β ) h ( β ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ d . We will take d , d → ∞ at the end of the argument, so any error terms suppressed byfactors of d or d will be negligible.Let t := c √ and let deg α · K := deg α · d d · t · (cid:0) t − ( d − + d − ) (cid:1) = d d c (cid:18) cd √ cd √ (cid:19) ≤ d d ( c + O ( δ )) once d , d ≫ c,δ . An application of Siegel’s lemma gives us the following: Claim 28. There is a nonzero p ∈ Z [ x, y ] such that ( ∂ kx ∂ ℓy p )( α, α ) = 0 for all nonnegative integers k, ℓ with kd + ℓd ≤ t, and such that H ( p ) ≤ O ( H ( ˜ R )) d d c − − − O ( δ ) . Proof of Claim. We apply Siegel’s lemma in the form of Bombieri-Gubler Lemma 2.9.1 [11].Indeed, we are imposing the conditions P ≤ i ≤ d , ≤ j ≤ d a ij α i + j − k − ℓ (cid:0) d k (cid:1)(cid:0) d k (cid:1) = 0 on thecoefficients a ij ∈ Z of P . 
But recall that we have the relation den · α deg α = f ( α ) with f ( z ) := den · z deg α − den · g ( z ) , and g ( z ) ∈ Q [ z ] the minimal polynomial of α (here den is the least positive integer suchthat den · g ∈ Z [ z ] ). Multiplying our relations through by den d + d − deg α +1 and repeatedlyapplying this relation reduces us to forcing deg α times as many conditions (but now withintegral coefficients) for each condition with coefficients in Q ( α ) . Importantly, since thecoefficients of f ( z ) ∈ Z [ z ] are all of absolute value at most O ( H ( α )) and we apply therelation ≤ d + d times, the resulting linear conditions on a ij have coefficients bounded inabsolute value by ≪ O (1) d + d H ( ˜ R ) d + d , where we get an O (1) d + d H ( ˜ R ) d + d from the α i + j − k − ℓ terms, and an O (1) d + d from thebinomial coefficients and the sum. Of course infinitely many such d and d exist if h ( β ) h ( β ) is irrational, but since we do not require ( d , d ) = 1 ,such d , d exist in case the ratio of heights is rational as well! To conclude, we note that the number of variables a ij is ( d + 1)( d + 1) , and the numberof equations is deg α · (cid:12)(cid:12)(cid:12) { kd + ℓd ≤ t } (cid:12)(cid:12)(cid:12) , which is at most deg α · K by Bombieri-Gubler page158 [11]. Now apply Lemma 2.9.1 of [11]. (cid:3) So let p be such a polynomial. Following Bombieri-Gubler, we define the index of van-ishing of a polynomial q ∈ Z [ x, y ] at a point ( ξ , ξ ) to be ind( q, ~ξ ) := min (cid:26) kd + ℓd : k, ℓ ≥ , ( ∂ kx ∂ ℓy q )( ξ , ξ ) = 0 (cid:27) . As Bombieri-Gubler note, ind( · , ~ξ ) is a non-Archimedean valuation on Z [ x, y ] , and ind( ∂ ax ∂ by q, ~ξ ) ≥ ind( q, ~ξ ) − ad − bd . By construction ind( p, ( α, α )) ≥ t . To show that ind( p, ( β , β )) is small, we will use animproved bivariate form of Roth’s lemma. Specifically, we will prove: Claim 29. ind( p, ~β ) ≤ d d + (1 + d d )( c − − D − + O ( δ ) . 
We will simply follow Bombieri-Gubler and be more careful in the bivariate case. Proof of Claim. Write U ( x ) := det X ≤ k ≤ d (cid:18) ki (cid:19) a kj x k − i ≤ i,j ≤ d . (4.7)Note that U ( x ) = det ∂ ix ∂ jy pi ! j ! ! ≤ i,j ≤ d (4.8)as polynomials in Z [ x, y ] , since the latter is simply U ( x ) times det (cid:0)(cid:0) ji (cid:1) y j − i (cid:1) ≤ i,j ≤ d = 1 .But (4.8) is proportional to the Wronskian of p , whence it does not vanish identically as apolynomial in x, y (i.e., in x ) by Wronski’s theorem.Now, by expanding out the determinant in (4.7) as a sum over permutations, we findthat deg U ≤ d + ( d − 1) + · · · + ( d − d ) = ( d + 1) (cid:18) d − d (cid:19) . Also, by examining the absolute value of the coefficients of U via the same sum over per-mutations, we find that H ( U ) ≤ O (1) d d H ( p ) d +1 ≤ O (1) d d H ( ˜ R ) ( d d d c − − − O ( δ ) . But now for a univariate polynomial f ( x ) ∈ Z [ x ] , ( qx − p ) k | f ( x ) (which implies H ( f ) ≥ H (cid:16) pq (cid:17) k ) if f vanishes to order k at pq . Hence H ( U ) ≥ H ( β ) d ind( W,~β ) − , HE AVERAGE NUMBER OF INTEGRAL POINTS ON ELLIPTIC CURVES IS BOUNDED 29 or, written another way, ind( W, ~β ) ≤ h ( U ) d h ( β ) + O (cid:18) d (cid:19) . But, applying the fact that ind( · , ~β ) is a non-Archimedean valuation, ind( W, ~β ) = ind( U, ~β ) ≥ min σ ∈ S d d X a =0 ind( ∂ ax ∂ σ ( a ) y p, ~β ) . But ind( ∂ ax ∂ σ ( a ) y p, ~β ) ≥ max (cid:18) ind( p, ~β ) − ad , (cid:19) − σ ( a ) d , so that this sum is simply − d ( d + 1)2 d + X ≤ a ≤ min( d ,d · ind( p,~β )) ind( p, ~β ) − ad = − d ( d + 1)2 d + ( d + 1) (cid:16) ind( p, ~β ) − d d (cid:17) ind( p, ~β ) > d d , (cid:16)j d ind( p, ~β ) k + 1 (cid:17) · ind( p, ~β ) − ⌊ d ind( p,~β ) ⌋ ( ⌊ d ind( p,~β ) ⌋ +1)2 d ind( p, ~β ) ≤ d d . In the first case we derive the inequality ind( p, ~β ) ≤ d d + (1 + d d ) · h ( ˜ R ) h ( β ) c − − − O ( δ ) + O ( δ ) . 
In the second case we start with the inequality ind( p, ~β ) ≤ d d anyway.Therefore ind( p, ~β ) ≤ d d + (1 + d d ) · h ( ˜ R ) h ( β ) c − − − O ( δ ) + O ( δ ) . Recall that h ( β ) = h ( Q ) ≥ ( D − − O ( δ )) h ( ˜ R ) by (4.6), so that our bound reads ind( p, ~β ) ≤ d d + (1 + d d )( c − − D − + O ( δ ) , as desired. (cid:3) Therefore there are a, b such that ( ∂ ax ∂ by p )( β , β ) = 0 and ad + bd ≤ d d + (1 + d d )( c − − D − + O ( δ ) . Let now q ( x, y ) := ( ∂ ax ∂ by p )( x, y ) a ! b ! ∈ Z [ x, y ] . Notice that H ( q ) ≤ O (1) d + d H ( p ) ≤ O ( H ( α )) d d c − − − O ( δ ) as well. Moreover ind( q, ( α, α )) ≥ ind( p, ( α, α )) − ad − bd ≥ t − d d − (1 + d d )( c − − D − + O ( δ ) . Let now k ∗ , ℓ ∗ ≥ be such that k ∗ − d + ℓ ∗ d , k ∗ d + ℓ ∗ − d ≤ ind( q, ( α, α )) but k ∗ d + ℓ ∗ d > ind( q, ( α, α )) . Then observe that q ( x, y ) = Z xα · · · Z w k ∗− α Z yα · · · Z z ℓ ∗− α ( ∂ k ∗ x ∂ ℓ ∗ y q )( w k ∗ , z ℓ ∗ ) dw · · · dw k ∗ dz · · · dz ℓ ∗ . Therefore | q ( x, y ) | ≤ | x − α | k ∗ | y − α | ℓ ∗ sup ( w,z ) ∈ [ α,x ] × [ α,y ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( ∂ k ∗ x ∂ ℓ ∗ y q )( w, z ) k ∗ ! ℓ ∗ ! (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) . Hence q ( β , β ) = 0 is bounded above in absolute value by | q ( β , β ) | ≤ H ( β ) − κk ∗ H ( β ) − κℓ ∗ O ( H ( α )) d d c − − − O ( δ ) O ( T + | x ( R ~α ) | ) d + d . But it is also a nonzero rational with denominator at most H ( β ) d H ( β ) d , so that | q ( β , β ) | ≥ H ( β ) − d H ( β ) − d . Therefore (using d h ( β ) = d h ( β ) + O (cid:16) h ( β ) d (cid:17) ) we have derived the inequality − d h ( β ) ≤ − κd h ( β ) (cid:18) k ∗ d + ℓ ∗ d (cid:19) + ( d + d ) h ( α ) c − − − O ( δ ) + ( d + d ) max(log T, log | x ( R ~α ) | ) + O ( d + d ) . Using k ∗ d + ℓ ∗ d > √ c − d d − (1 + d d )( c − − D − + O ( δ ) and dividing through by d h ( β ) , we find that √ c − d d ! κ − (1 + d d )( c − − D − ! ( κ + 1) − d d )( D − < O ( δ ) . 
Finally, recall that d d = h ( β ) h ( β ) + O (cid:16) d (cid:17) ≤ ( κ − − s + O ( δ ) . Inserting this into the boundwe get that √ c − κ − s ! κ − κ − s ( D − (cid:18) κ + 1( c − − (cid:19) < O ( δ ) . This contradicts the hypothesis once δ ≪ c,D , and so we are done. (cid:3) Conclusion of proof. Summarizing, we have proved: Proposition 30. Let c < , D > , ˜ D := D + √ D +42 , C := 5 ˜ D , and s ∈ Z + be such that C + 72 D + max (cid:18) C , D (cid:19) < and √ c − κ − s ! κ − κ − s ( D − (cid:18) κ + 1( c − − (cid:19) > , HE AVERAGE NUMBER OF INTEGRAL POINTS ON ELLIPTIC CURVES IS BOUNDED 31 where κ := (cid:18) − max (cid:18) C , D (cid:19) − C − D (cid:19) (cid:0) D − (cid:1) − . Let δ ≪ c,D . Let T ≫ c,D,δ . Let < J < . Let E ∈ F ∗ . Then:(1) If rank( E ) = 0 then | E ( Z ) | = 0 .(2) If rank( E ) = 1 then | E ( Z ) | ≤ .(3) If rank( E ) = r > then: | E ( Z ) | ≤ r & log ˜ D log J ' · max S ⊆ RP r − : ∀ v = w ∈ S, |h v,w i|≤ J + O ( δ ) | S | + 9 s (3 r − . Note that, of course, this implies that if the density of curves with ranks and are both , then lim sup T →∞ Avg F ≤ T universal ( | E ( Z ) | ) ≤ , as claimed. (To see this, the only questionis the contribution from the density zero higher-rank curves. To bound this, use the propo-sition and the Kabatiansky-Levenshtein bound (Theorem 13) and then combine H ¨older’sinequality with Bhargava-Shankar as usual.)In any case, the details of the optimization procedure given this bound are given inthe appendix since the rest of the argument is unrelated to Diophantine geometry. Thiscompletes the argument. (cid:3) 5. P ROOF OF T HEOREM AND ITS COROLLARIES To get inexplicit bounds we may simply follow the general procedure of the proof ofTheorem 2. 
On examination, to prove Theorem 1 for a family F, it is clear that the only estimates required are:
(1) an estimate on small points: lim sup_{T→∞} Avg_{F^{≤T}}(|{P ∈ E(Z) : h(P) ≤ C log T + O(1)}|) ≪ 1, and
(2) a repulsion estimate on larger points: if P ≠ R ∈ E(Z) with h(P), h(R) ≥ C log T + O(1) and h(R) ≤ h(P) ≤ (1 + Ω(1)) h(R), then cos θ_{P,R} ≤ . ,
where the . has come from the Kabatiansky-Levenshtein bound (Theorem 13). Specifically, the solution to

$$\exp\left( \frac{1+\sin\theta}{2\sin\theta} \log\left(\frac{1+\sin\theta}{2\sin\theta}\right) - \frac{1-\sin\theta}{2\sin\theta} \log\left(\frac{1-\sin\theta}{2\sin\theta}\right) \right) = 3$$

has cos θ = 0.… .

From there one bounds the small points by the first estimate, the medium points by projecting those in an interval of shape [X, (1 + Ω(1))X] to the unit sphere and applying Kabatiansky-Levenshtein, and the large points by using Siegel's argument, exactly as we did in the proof of Theorem 2. So to prove Theorem 1 we will provide exactly these ingredients. Since the families will be getting thinner and thinner (from ≍ T for F^{≤T}_universal to ≍ T for F^{≤T}_{A=0} to ≍ T for F^{≤T}_{B=0} to ≍ T for F^{≤T}_congruent), our constants C in the small-point estimates will get worse and worse (in fact we will always have C = log|F| / log T), which will lead us to be a bit cleverer with our repulsion estimates each time. (Footnote: However, for congruent number curves, Le Boudec [27] has obtained a bound with C = 2, which is much stronger than the C = 1 we get with our methods.) Note that the main issue in establishing the repulsion estimate is that the discriminants of the curves in these families are nowhere near squarefree, so the methods that allowed us to treat the canonical and Weil heights as roughly the same in the proof of Theorem 2 do not apply here.

y² = x³ + B.

Proof of Theorem 1 for F_{A=0}. Of course |F^{≤T}_{A=0}| ≍ T.
To count points with |x| ≤ T, note that y ≪ T and that x and y determine B. Therefore the number of solutions (x, y, B) with |B| ≪ T and |x| ≤ T is at most the number of pairs with |x| ≤ T and |y| ≪ T, which is ≪ T . .

Otherwise, |y| ≍ |x|. But now, given an x, if (x, y, B) and (x, y′, B′) are both solutions and (without loss of generality) y, y′ > 0, then y² − y′² = B − B′, whence |y − y′| ≪ T |x|. Hence, given an x with T ≤ |x| ≪ T, the number of y such that |x³ − y²| =: |B| ≪ T is ≪ T |x|. Therefore, taking these together, the number of solutions (x, y, B) with |x| ≪ T is

≪ T . + Σ_{T ≪ |x| ≪ T} T |x| ≪ T.

This contributes ≪ 1 to the average.

So we have proved the first necessary result. For the second, again restrict (by Helfgott-Venkatesh, Hölder, and now Fouvry [13] instead of Bhargava-Shankar) to the subfamily with the largest square divisor of B at most ≪ T^δ and with |∆| ≍ |B| ≫ T − δ. Now j(E_{0,B}) = 0, so that, by Lang [26] (Chapter III, Section 4), at primes p > such that v_p(B) = 1,

λ_p(Q) − λ̂_p(Q) = log⁺|x(Q)|_p − ⅓ log⁺|B⁻¹ x(Q)³|_p,

where λ_p and λ̂_p are the local heights for h and ĥ, respectively, and we have written Q for a rational point on E. (Note that Lang's normalizations are different from ours by a factor of .)

Now this expression for λ_p − λ̂_p is ⅓ log|B|_p unless v_p(x(Q)) ≥ ⅓ v_p(B) = ⅓, i.e., unless v_p(x(Q)) ≥ 1. But y(Q)² = x(Q)³ + B, and v_p(x(Q)) ≥ 1, v_p(B) = 1 imply v_p(y(Q)) ≥ 1, so v_p(B) ≥ 2, a contradiction. Thus this expression is always equal to ⅓ log|B|_p when v_p(B) = 1. (Footnote: As a side note, one could also proceed by noting that the curves in each of these families are all twists of one another, and then estimating the effects of twisting on the heights precisely. This reduces to a roughly similar computation, though we proceed via local heights in order to also introduce the idea of establishing repulsion between P and R for integral points P and R.) (Footnote: In fact, at least for the families y² = x³ + Ax and y² = x³ − D²x, since we have such good control on the ranks of the curves in these families, one could also simply apply the theorem of Hindry-Silverman (Theorem 8) after throwing out those curves with large Szpiro ratio, but the implied constants would be tremendous.)

For primes such that p | B (or p = 2, ), Lang also proves that |λ_p(x(Q)) − λ̂_p(x(Q))| ≪ − log|∆|_p. Thus the sum over these primes is O(δ log T).

Finally, for the infinite prime, Lang proves that λ_∞(Q) − λ̂_∞(Q) = log⁺|x(Q)| − log⁺(|∆| − |x(Q)|). Therefore, exactly as before,

ĥ(Q) − h(Q) = log⁺(|∆| − |x(Q)|) + ⅙ log⁺|∆| − log⁺|x(Q)| + O(δ log T)

by the product formula. Therefore we may simply repeat the proof of Lemma 18 verbatim. This completes the ingredients necessary for this family. ∎

y² = x³ + Ax.

Now let us move to the family y² = x³ + Ax.

Proof of Theorem 1 for F_{B=0}. The family is of size |F^{≤T}_{B=0}| ≍ T. Fixing y, since x and x² + A are both divisors of y², the number of pairs (x, A) such that (x, y, A) is a solution is at most the number of pairs of divisors (d₁, d₂) of y² with d₁d₂ = y², which is τ(y²) ≪_ε y^ε. Therefore the number of (x, y, A) such that |x| ≤ T − ε and |A| ≪ T is at most (since |y| ≪ T − ε in this case)

≪ Σ_{|y| ≪ T − ε} y^ε ≪ T − ε.

For |x| ≥ T − ε (so that |y| ≍ |x|), fix x and note that if (x, y, A) and (x, y′, A′) are both solutions and y, y′ > 0 without loss of generality, then y² − y′² = x(A − A′), so that |y − y′| ≪ T |x|. Thus all the |y| live in an interval of length ≪ T |x|.
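The divisor step used above for y² = x³ + Ax can be checked by brute force. For fixed y ≠ 0 the equation reads y² = x(x² + A), so x must divide y², and conversely each divisor x of y² (of either sign) determines exactly one integer A = y²/x − x². The sketch below, with helper names of our own choosing, enumerates these pairs and confirms there are exactly 2τ(y²) of them before any size condition on A is imposed.

```python
def solutions_for_fixed_y(y: int) -> list[tuple[int, int]]:
    """All integer pairs (x, A) with y^2 = x^3 + A*x, for a fixed y != 0.

    Since y^2 = x * (x^2 + A), x must divide y^2, and then A is forced:
    A = y^2 / x - x^2.
    """
    n = y * y
    pairs = []
    for d in range(1, n + 1):
        if n % d == 0:
            for x in (d, -d):
                pairs.append((x, n // x - x * x))  # exact division since x | n
    return pairs


def num_divisors(n: int) -> int:
    """tau(n), the number of positive divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


if __name__ == "__main__":
    y = 12
    sols = solutions_for_fixed_y(y)
    print(len(sols), 2 * num_divisors(y * y))  # 30 30
```

Bounding τ(y²) ≪_ε y^ε then gives the count used for the small values of |x|.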
Note also that y² ≡ 0 (mod x), which has ∏_{p|x} p^{⌊v_p(x)/2⌋} solutions modulo x. Therefore the number of y given x is at most ≪ T · ∏_{p|x} p^{⌊v_p(x)/2⌋} / |x|. Thus, taking these together, the number of solutions (x, y, A) with |x| ≪ T and |A| ≪ T is at most

≪ T − ε + Σ_{T − ε ≪ |x| ≪ T} T · ∏_{p|x} p^{⌊v_p(x)/2⌋} / |x| ≪ T.

Thus we have counted small points. For the second ingredient, we would again (with more difficulty) be able to prove a repulsion bound in terms of h(P) and h(R), but estimating the error in this bound for points of small height would give us serious difficulty. Moreover, these methods would not work for the next case, where we restrict to square A. So we introduce another idea.

First restrict to A that have largest square divisor at most T^δ and such that |A| ≥ T − δ. Note that j(E_{A,0}) = 1728 ∈ Z, so that again Lang applies, whence once p > and v_p(A) = 1 the difference of local Weil and canonical heights is

λ_p(Q) − λ̂_p(Q) = log⁺|x(Q)|_p − ½ log⁺|A⁻¹ x(Q)²|_p,

which is ½ log|A|_p unless v_p(x(Q)) ≥ ½ v_p(A) = ½, i.e., unless v_p(x(Q)) ≥ 1. In this case the expression is 0, which we will write as ½ log|A|_p − ½ log|A|_p. Again, at p = 2, or p such that p | A, the difference of local heights is ≪ − log|∆|_p. At the infinite place, as before, the contribution to the difference is log⁺|x(Q)| − log⁺(|A − | · |x(Q)|). Therefore we have found that (applying the product formula as before)

ĥ(Q) − h(Q) = log⁺(|∆| − · |x(Q)|) + ⅙ log|∆| − log⁺|x(Q)| + ½ Σ_{p ∥ A, v_p(x(Q)) ≥ 1} log|A|_p + O(δ log T).   (5.1)

Since the log|A|_p terms are simply −log p, this gives us a way of getting an upper bound on ĥ:

ĥ(Q) − h(Q) ≤ log⁺(|∆| − · |x(Q)|) + ⅙ log|∆| − log⁺|x(Q)| + O(δ log T).

Now for the new idea. Let P ≠ ±R ∈ E(Z) with h(P) ≥ h(R) ≥ T.
Write instead

$$\cos\theta_{P,R} = \frac{\hat h(2P + 2R) - \hat h(2P) - \hat h(2R)}{2\sqrt{\hat h(2P)\,\hat h(2R)}}.$$

From the above we have the upper bound ĥ(2P + 2R) ≤ h(2P + 2R) + O(δ log T). (Footnote: To do this, see Lemma 32.)

Moreover, writing P =: (x, y), if p ∥ A and v_p(x(2P)) ≥ 1, then since

$$x(2P) = \left(\frac{3x^2 + A}{2y}\right)^2 - 2x = \frac{x^4 - 2Ax^2 + A^2}{4y^2},$$

we see that p | x. But then p | y since y² = x³ + Ax. Hence p² | y². Since v_p(x(2P)) ≥ 1, we see that p³ | x⁴ − 2Ax² + A², whence p³ | A², which is to say p² | A, a contradiction. The same holds for R, so that we have found (by (5.1)) that ĥ(2P) = h(2P) + O(δ log T) and ĥ(2R) = h(2R) + O(δ log T). Also note that h(2P) ≤ 4h(P) + O(1), since the expression for x(2P) has numerator at most O(x⁴) and denominator at most O(x³), and upon cancelling common terms these estimates still hold.

Finally, let us write out x(2P + 2R) in terms of x(2P) and x(2R). Write P =: (α/β, α̃/β) and R =: (α′/β′, α̃′/β′). Recall that |x(2P)| ≍ |x(P)|, and similarly for R, since |x(P)|, |x(R)| ≥ T. Thus certainly |α| ≥ |β|, and similarly for R, so that H(P) = |α| and H(R) = |α′|. Moreover, for the same reason, |y(2P)| ≍ |x(2P)|, so that |α̃| ≍ |α| = H(α), and similarly for R.

Now

$$x(2P + 2R) = \frac{x(2P)^2\, x(2R) + x(2P)\, x(2R)^2 - 2\, y(2P)\, y(2R) + A\, x(2P) + A\, x(2R)}{(x(2P) - x(2R))^2},$$

which in terms of α, β, α′, β′ reads (α α′ β′ + αα′ β − 2 α̃ α̃′ ββ′ + Aαβ β′ + Aα′ β β′) / (α − α′). By using the first expression and the fact that |x(2P)| ≍ |x(P)| (and similarly for R), it follows that the first term in the numerator is the largest (up to O(1)) among those in the numerator or denominator, since h(P) ≥ h(R).
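The coordinate formulas in this step can be checked with exact rational arithmetic. The sketch below, which uses only Python's `fractions` module and helper names of our own, implements chord-and-tangent addition on y² = x³ + Ax and compares it with the closed forms x(2P) = (x² − A)²/(4y²) and x(P + R) = (x(P)²x(R) + x(P)x(R)² − 2y(P)y(R) + A x(P) + A x(R))/(x(P) − x(R))². The sign of the cross term is the chord-and-tangent one; replacing R by −R flips it, and since the height estimate only uses the sizes of the individual terms, either sign leads to the same bound. The test points (2, 2) on y² = x³ − 2x and (−3, 9) on y² = x³ − 36x are our own choices.

```python
from fractions import Fraction


def add(P, R, A):
    """Chord-and-tangent sum of affine points on y^2 = x^3 + A*x (assumes R != -P)."""
    (x1, y1), (x2, y2) = P, R
    if P == R:
        lam = (3 * x1 * x1 + A) / (2 * y1)  # tangent slope
    else:
        lam = (y2 - y1) / (x2 - x1)  # chord slope
    x3 = lam * lam - x1 - x2
    return (x3, lam * (x1 - x3) - y1)


def x_double(P, A):
    """Closed form x(2P) = (x^2 - A)^2 / (4 y^2) on y^2 = x^3 + A*x."""
    x, y = P
    return (x * x - A) ** 2 / (4 * y * y)


def x_sum(P, R, A):
    """Closed form for x(P + R) when x(P) != x(R)."""
    (x1, y1), (x2, y2) = P, R
    num = x1 * x1 * x2 + x1 * x2 * x2 - 2 * y1 * y2 + A * x1 + A * x2
    return num / (x1 - x2) ** 2


if __name__ == "__main__":
    A = Fraction(-2)
    P = (Fraction(2), Fraction(2))  # 2^2 = 2^3 - 2*2
    twoP = add(P, P, A)
    print(twoP)  # (Fraction(9, 4), Fraction(-21, 8))
```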
Therefore

H(2P + 2R) ≪ |α|² |α′| |β′|² = H(2P)² H(2R)² / |x(2R)| ≍ H(2P)² H(2R)² / |x(R)| = H(2P)² H(2R)² / H(R),

which is to say h(2P + 2R) ≤ 2h(2P) + 2h(2R) − h(R) + O(1). Since h(R) ≥ ¼ h(2R) − O(1), this reduces to h(2P + 2R) ≤ 2h(2P) + (7/4) h(2R) + O(1). Therefore, putting these together and arguing as in Lemma 18, we find that

$$\cos\theta_{P,R} = \frac{\hat h(2P+2R) - \hat h(2P) - \hat h(2R)}{2\sqrt{\hat h(2P)\,\hat h(2R)}} \le \frac{\hat h(2P+2R) - h(2P) - h(2R) + O(\delta \log T)}{2\sqrt{h(2P)\, h(2R)}} \le \frac12\sqrt{\frac{h(2P)}{h(2R)}} + \frac38\sqrt{\frac{h(2R)}{h(2P)}} + \frac{O(\delta \log T)}{2\sqrt{h(2P)\, h(2R)}}.$$

Now suppose we could show that any nontrivial (i.e., with x ≠ 0) rational point on y² = x³ + Ax must have height at least c log T for some c ≫ 1, a (very small) positive constant. Then this upper bound would read

$$\cos\theta_{P,R} \le \frac12\sqrt{\frac{h(2P)}{h(2R)}} + \frac38\sqrt{\frac{h(2R)}{h(2P)}} + O(\delta).$$

Hence this would complete the proof of the second necessary ingredient, since ½ + ⅜ = ⅞ = 0.875 < 0.… . This is because the number of points P with h(P) ≥ T + O(1) and h(2P) ∈ [X, (1 + γ)X] is then ≪ γ − · rank(E) once γ ≪ 1. Hence, since h(2P) ≥ c log T ≫ log T, the number of points P with h(P) ≥ T + O(1) and h(2P) ≤ M log T is ≪ log(M) · rank(E). But h(P) ≥ ¼ h(2P) − O(1), so the number of points with h(P) ≤ M log T is in fact also ≪ log(M) · rank(E), which is all we need to conclude the argument.

Thus it suffices to show that the smallest nontrivial rational point has height at least c log T for some positive c ≫ 1. Actually, it suffices to do this for a large enough subfamily of curves, by the usual Hölder, Helfgott-Venkatesh, and then Bhargava-Shankar-type procedure.
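The comparison that closes this argument can be reproduced numerically. Below, f(θ) is the Kabatiansky-Levenshtein exponent ((1 + sin θ)/(2 sin θ)) log((1 + sin θ)/(2 sin θ)) − ((1 − sin θ)/(2 sin θ)) log((1 − sin θ)/(2 sin θ)); the sketch solves exp(f(θ)) = 3 (the equation stated with the required estimates at the start of this section) by bisection, assuming f is decreasing on (0, π/2), and checks that ½√t + ⅜/√t stays below the resulting cos θ for t = h(2P)/h(2R) in [1, 1.1]. The coefficient ⅜ is legible in the source, ½ follows from the bound h(2P + 2R) ≤ 2h(2P) + (7/4)h(2R) + O(1), and the window [1, 1.1] is an arbitrary illustrative stand-in for the (1 + Ω(1)) range. The threshold value it prints is our own computation, not a figure quoted from the source.

```python
import math


def kl_exponent(theta: float) -> float:
    """Kabatiansky-Levenshtein exponent f(theta): a code on S^{r-1} with minimal
    angle theta has at most exp(r * f(theta) * (1 + o(1))) points."""
    s = math.sin(theta)
    a = (1 + s) / (2 * s)
    b = (1 - s) / (2 * s)
    b_term = b * math.log(b) if b > 0 else 0.0  # b -> 0 as theta -> pi/2
    return a * math.log(a) - b_term


def kl_threshold_cos(base: float = 3.0, tol: float = 1e-12) -> float:
    """cos(theta*) where exp(f(theta*)) = base, found by bisection on (0, pi/2)."""
    target = math.log(base)
    lo, hi = 1e-9, math.pi / 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_exponent(mid) > target:
            lo = mid  # bound still exceeds base**r: the angle must be wider
        else:
            hi = mid
    return math.cos(0.5 * (lo + hi))


def repulsion_bound(t: float) -> float:
    """1/2 sqrt(t) + 3/8 sqrt(1/t), the bound on cos(theta_{P,R}) with
    t = h(2P)/h(2R) >= 1 and the O(delta) term dropped."""
    return 0.5 * math.sqrt(t) + 0.375 / math.sqrt(t)


if __name__ == "__main__":
    c_star = kl_threshold_cos(3.0)
    print(f"KL threshold cos(theta*) = {c_star:.4f}")
    print(f"bound at t = 1: {repulsion_bound(1.0):.4f}")  # 0.8750
```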
We will show that the density of curves with a nontrivial rational point of multiplicative height smaller than T =: T c is T^{−Ω(1)}.

Now if (m/n², m′/n³) is a point on y² = x³ + Ax with |m| ≤ T c, |n| ≤ T c, then (m, m′) is an integral point on y² = x³ + An⁴x with |m| ≤ T c. Note that |m′| ≪ T c. The number of such pairs (m, m′) is at most T c. Moreover, since (m, m′) determine A and n (up to sign), A being fourth-power free by minimality, we see that the number of A such that E_{A,0} has a nontrivial rational point of height at most T c is at most the number of such rational points on an E_{A,0} for some A, which is at most the number of (m, m′) pairs, which is at most T c. Thus the density is T − c, which is of the desired shape.

This completes the argument. ∎

Having proven this, let us now explain how to use the methods of Kane [24] and Kane-Thorne [25] to deduce Corollary 4. We will freely use their notation throughout, and for ease of reading one should at least go through their arguments to understand the effects of our modifications. (Footnote: Again, see Lemma 32 for details.)

Proof of Corollary 4. Let us quickly show that to control an average of, e.g., k · rank(E) it suffices to control moments of Selmer groups on the curves. Let φ_A : E_{A,0} → E_{−4A,0} be the 2-isogenies on the curves, and let Sel^{φ_A}(E_{A,0}) be the associated Selmer groups. Note that the isogeny dual to φ_A is simply φ_{−4A}. Hence φ_{−4A} ∘ φ_A = 2, multiplication by 2 on E_{A,0}. The following lemma (combined with Cauchy-Schwarz) shows that to control the average of k · rank(E) ≤ |Sel(E)| k it is enough to control the moments of |Sel^{φ_A}(E_{A,0})|.

Lemma 31. Let E →^α E′ →^β E″ be a sequence of isogenies between elliptic curves over Q. Then |Sel^{β∘α}(E)| ≤ |Sel^α(E)| · |Sel^β(E′)|.

Proof.
Consider the long exact sequence in Galois cohomology associated to 0 → ker α → ker(β ∘ α) → ker β → 0. It induces a sequence Sel^α(E) → Sel^{β∘α}(E) → Sel^β(E′) which is exact at the middle term. (Surjection onto the kernel follows from exactness on H¹ and the fact that the left-hand map is induced by the identity map E → E, so only locally trivial classes map to one another.) The result follows. ∎

Hence we will concentrate on bounding moments of |Sel^{φ_A}(E_{A,0})|, as Kane-Thorne do. The next claim is that for this family we may improve Lemma 14 to:

Lemma 32. Let G ⊆ F^{≤T}_{B=0}. Then, for all ε > 0,

Σ_{E∈G} |E(Z)| ≪ |F^{≤T}_{B=0}| · (|G| / |F^{≤T}_{B=0}|)^{Ω(1)} · (log T)^{O(1)}.

Proof. The only change in the proof of Lemma 14 is that ω(∆) is replaced by ω(A), and now we may use the bound rank(E_{A,0}) ≪ ω(A) as well (this comes from a descent by 2-isogeny: see Proposition 4.9 in Chapter X, Section 4 of [35]). Instead of using the bound ω(∆) ≪ log T / log log T, we use Σ_{n ≤ X} O(1)^{ω(n)} ≪ X (log X)^{O(1)}. ∎

Hence we may restrict to a subfamily of density 1 − O((log T)^{−M}) once M ≫ 1. Hence we may further impose the restriction that ω(A) ≤ M log log A, for some sufficiently large constant M, on our curves (on top of the usual restriction that A have non-squarefree part at most T^δ), since the number of n ≤ X with m prime factors is at most ≪ X log X · (log log X + O(1)) m m!.

Moreover, suppose there is a real character χ of modulus D ≪ T with L(s, χ) having a real zero β_χ with 1 − β_χ ≤ (log T) δ. Then since (by Siegel's theorem on Siegel zeroes) 1 − β_χ ≫_ε D^{−ε} for all ε > 0, we find that D ≫_δ (log T)^{M+1}, for instance.
Hence once T ≫_δ 1 (with ineffective implied constant) we may remove all A divisible by D as well. As Kane notes on page 17 of [24], this implies 1 − β_χ ≫ (log T) − for any real zeroes β_χ of L(s, χ) with χ of modulus not divisible by D and at most T.

Call the resulting subfamily F̃_{B=0} ⊆ F_{B=0}. Let us now indicate the necessary changes to Kane's argument in [24] in order to get a bound of shape

lim sup_{T→∞} Avg_{E ∈ F̃^{≤T}_{B=0}}(k rank(E)) ≪ O(1) (log k).

We first fix a positive integer F ≤ T^δ such that p | F ⇒ p² | F for all primes p > 2, and restrict our attention to the subfamily of D with F = 2^{v₂(D)} sq(D) := 2^{v₂(D)} ∏_{p² | D, p > 2} p^{v_p(D)}. The claim is that the restrictions log log N < n < N may be replaced by n < M log log N, where M is the sufficiently large constant arising in the definition of F̃_{B=0}.

To prove this, we change the following in Kane's argument. In Proposition 11 we replace O(N / √(log log N)) by max_{ñ ≤ n} π_ñ(N), where π_ñ(N) is the number of integers in [1, N] with exactly ñ prime factors. This improves Lemma 17 to a bound of shape

≪ max_{ñ ≤ n} π_ñ(N) · ((O(log log B) / L)^k + ⋯).

In the proof of Proposition 9 we instead obtain a bound of shape

≪ O(1)^k · (max_{ñ ≤ n} π_ñ(N)) · ((ε log log N / n)^k + (log N)^{−C}).

If n ≫ log log log N and N ≫_{c,k} 1, then this is ≪ N · c^m, as in Kane. If n ≪ log log log N, then

max_{ñ ≤ n} π_ñ(N) = π_n(N) ≍ (log log N) n n! · N / log N ≪ N / log N · O(1) (log log log N).

Hence the resulting bound in this case is

≪ N / log N · O(1) (log log log N) ((log log N / n) k + 1) ≪ N / log N · O(1) (log log log N),   (5.2)

since k ≤ n ≪ log log log N. This is again ≪ N · c^n ≪ N · c^m once N ≫_c 1, since N · c^n ≫ N (log log N)^{−O(log c)}.

Thus we have the necessary improvement to Kane's Proposition 9 to feed into the analysis in Kane-Thorne.
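The proof of Lemma 32 and the restriction ω(A) ≤ M log log A rest on standard facts about prime-factor counts: the average bound Σ_{n≤X} O(1)^{ω(n)} ≪ X (log X)^{O(1)}, and the counts π_m(X) of n ≤ X with exactly m prime factors. The sieve below, an illustration with helper names of our own, tabulates ω(n) (distinct prime factors) and lets one watch the base-2 case, where Σ_{n≤X} 2^{ω(n)} counts squarefree divisors and is classically asymptotic to (6/π²) X log X.

```python
import math
from collections import Counter


def omega_sieve(X: int) -> list[int]:
    """omega[n] = number of distinct prime factors of n, for 0 <= n <= X."""
    omega = [0] * (X + 1)
    for p in range(2, X + 1):
        if omega[p] == 0:  # no smaller prime divides p, so p is prime
            for m in range(p, X + 1, p):
                omega[m] += 1
    return omega


def sum_pow_omega(omega: list[int], k: int) -> int:
    """sum_{1 <= n <= X} k^{omega(n)}, with X = len(omega) - 1."""
    return sum(k ** w for w in omega[1:])


def omega_distribution(omega: list[int]) -> Counter:
    """m -> #{1 <= n <= X : omega(n) = m}, the counts pi_m(X)."""
    return Counter(omega[1:])


if __name__ == "__main__":
    X = 10 ** 5
    om = omega_sieve(X)
    S = sum_pow_omega(om, 2)
    print(S / (X * math.log(X)), 6 / math.pi ** 2)  # ratio vs. leading constant
    print(sorted(omega_distribution(om).items()))
```

The ratio approaches 6/π² only slowly, because of a secondary term of size X; the point is simply that the sum stays within a power of log X of X.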
As they note, the contribution of terms with m > 0 is (once N ≫_k 1 and, e.g., c = 2 − k −)

≪_k N − kn Σ_{m=1}^{n} (n choose m)(2 k − n − m km c m ω(F) ≤ N · (1 − − k −) n ω(F) ≤ N · (log N)^{−Ω(2^{−k})} · ω(F)

if n ≫ log log log N. When n ≪ log log log N we use the stronger bound in (5.2) to obtain

≪_k N / log N O(1) (log log log N) − kn Σ_{m=1}^{n} (n choose m)(2 k − n − m km ω(F) ≤ N / log N O(1) (log log log N) O(1) kn ω(F) ≪ N · (log N) − · ω(F)

once N ≫_k 1. So we may ignore the terms with m > 0.

Also, as in Kane-Thorne, the sum over terms with m = 0 is

≪ O(1)^k · O(1)^{ω(F)} · |{|x| ≤ N : F | x, ω(x) = n, 2^{v₂(x)} sq(x) = F}|,

where sq(x) is the "odd squarefull" part of x: sq(x) = ∏_{p² | x, p > 2} p^{v_p(x)}. Summing over all n ≪ log log N, we find that the sum of k · rank(E) over those E with 2^{v₂(D)} ∏_{p² | D, p > 2} p^{v_p(D)} = F is

≪ O(1)^k · O(1)^{ω(F)} · |{|x| ≤ N : F | x, 2^{v₂(x)} sq(x) = F}| ≪ O(1)^k · O(1)^{ω(F)} / F · |{|x| ≤ N : 2^{v₂(x)} sq(x) ≤ T^δ}|,

whence the contribution to the average of those D with "even/squarefull part" F is ≪ O(1)^k · O(1)^{ω(F)} / F.

Summing over F ≤ T^δ such that p | F ⇒ p² | F for all p > 2 gives the result. Indeed,

Σ_{F ≤ T^δ : F odd squarefull} O(1)^{ω(F)} / F ≪ 1. ∎

y² = x³ − D²x.

Finally, we handle the congruent number curves.

Proof of Theorem 1 for F_congruent. The family is of size |F^{≤T}_congruent| ≍ T. First, the small points. We will in fact drop the restriction that D be squarefree when counting the small points, since it will not be necessary, but we may, and will, assume |D| ≥ T − δ. Fix x ≠ 0. Break up the set of solutions (x, y, D) with y, D > 0 and D ≠ ±x (without loss of generality) into two classes: those with |D − |x|| ≤ T |x| and those with |D − |x|| > T. Let (y, D), (y′, D′) be two solutions.
As usual, by taking differences, |y − y′| ≪ |D − D′| |x| |y|. Now |x(x − D)(x + D)| ≫ |x| |D| |D − |x||, so |y| ≫ |x| |D| |D − |x||. Thus |y − y′| ≪ |D − D′| |x| |D − |x|| |D|.

Hence if (x, y, D) and (x, y′, D′) are solutions of the first class, then D and D′ are close, so that |y − y′| ≪ T |x|. If (x, y, D) and (x, y′, D′) are solutions of the second class and D is maximal among all such solutions, then |y − y′| ≪ D |x| T − |x| − ≪ T |x|. Thus in general |y − y′| ≪ T |x|. Now also y² ≡ 0 (mod x), which has ∏_{p|x} p^{⌊v_p(x)/2⌋} solutions modulo x. Therefore, since the y for which there exists a D making (x, y, D) a solution all lie in at most four intervals (depending on sign and class) of length at most ≪ T |x|, and since (x, y) determine ±D, we find that there are at most

≪ T ∏_{p|x} p^{⌊v_p(x)/2⌋} / |x|

solutions with fixed x. Therefore the number of (x, y, D) with |x| ≤ T and |D| ≪ T is at most

≪ Σ_{|x| ≤ T} T ∏_{p|x} p^{⌊v_p(x)/2⌋} / |x|.

But the Dirichlet series

Σ_{n ≥ 1} ∏_{p|n} p^{⌊v_p(n)/2⌋} / n s + = ∏_p (1 + p − s − + p − s − + p − s − + ⋯) = ∏_p p − s − − p − s − = ζ(s + ) ζ(s + ) / ζ(s + )

has its rightmost pole at s = , of order two. Thus

Σ_{n ≪ T} ∏_{p|n} p^{⌊v_p(n)/2⌋} / n ≪ T log T,

whence

Σ_{|x| ≤ T} T ∏_{p|x} p^{⌊v_p(x)/2⌋} / |x| ≪ T + T log T,

which finishes the small-point counting. Now for the repulsion estimate.
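Before turning to the repulsion estimate, two facts used in the small-point count can be checked directly. Write g(n) := ∏_{p|n} p^{⌊v_p(n)/2⌋}, the largest d with d² | n. First, y² ≡ 0 (mod n) has exactly g(n) solutions modulo n. Second, writing the Dirichlet series as Σ g(n) n^{−w} (a normalization of our own, which may differ from the printed exponent by a shift), each Euler factor is Σ_v p^{⌊v/2⌋ − vw} = (1 + p^{−w})/(1 − p^{1−2w}), so the series equals ζ(w)ζ(2w − 1)/ζ(2w) for w > 1, and the simple poles of ζ(w) and ζ(2w − 1) at w = 1 combine into a double pole. The sketch spot-checks both facts; the crude `zeta` routine is ours and only meant for arguments well above 1.

```python
import math


def g(n: int) -> int:
    """prod_{p | n} p^{floor(v_p(n)/2)}, i.e. the largest d with d^2 | n."""
    d = math.isqrt(n)
    while n % (d * d):
        d -= 1
    return d


def residues_with_square_zero(n: int) -> int:
    """Brute force: #{y mod n : y^2 = 0 (mod n)}."""
    return sum(1 for y in range(n) if (y * y) % n == 0)


def euler_factor_series(p: int, w: float, terms: int = 60) -> float:
    """Truncation of sum_{v >= 0} p^{floor(v/2)} * p^{-v w}."""
    return sum(p ** (v // 2) * p ** (-v * w) for v in range(terms))


def euler_factor_closed(p: int, w: float) -> float:
    """Closed form (1 + p^{-w}) / (1 - p^{1-2w}), valid for w > 1."""
    return (1 + p ** (-w)) / (1 - p ** (1 - 2 * w))


def zeta(s: float, N: int = 100_000) -> float:
    """Crude zeta(s) for s > 1: partial sum plus the integral tail N^{1-s}/(s-1)."""
    return sum(n ** (-s) for n in range(1, N + 1)) + N ** (1 - s) / (s - 1)


if __name__ == "__main__":
    w = 3.0
    partial = sum(g(n) * n ** (-w) for n in range(1, 2001))
    closed = zeta(w) * zeta(2 * w - 1) / zeta(2 * w)
    print(partial, closed)
```

Since the admissible y fall into g(x) residue classes modulo x, an interval of length L contains at most g(x)(L/|x| + 1) of them, which is how the residue count enters the bounds above.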
The argument is exactly the same as in the case y² = x³ + Ax; the only difference is that at the beginning of the argument we derive v_p(x(Q)) ≥ v_p(D) rather than v_p(A), but we only use the consequence that this implies v_p(x(Q)) ≥ 1. The rest goes through completely, so that it suffices to show that on a density 1 − T^{−Ω(1)} subfamily there are no nontrivial rational points of height smaller than c log T for some (small) positive constant c, by the same argument as in the case y² = x³ + Ax. We will again take c := . (Footnote: In fact, by using Proposition 1 in [27], we may count small points of height ≪ T (log T)^{−O(1)} instead of ≪ T!)

But, as before, a rational point (m/n², m′/n³) with |m| ≤ T c, |n| ≤ T c on y² = x³ − D²x corresponds to the integral point (m, m′) on y² = x³ − (Dn²)²x. Write D̃ := Dn²; note that the information of D̃ is equivalent to that of (D, n), since D is taken to be squarefree. Note also that in this case |m′| ≍ |D̃| |x|.

Now fix m. From the same argument as in the small-point counting above (as we noted, we didn't need D squarefree), we find that the number of (m′, D̃) is at most

≪ T c − ε ∏_{p|m} p^{⌊v_p(m)/2⌋} / |m|.

Summing this up to |m| ≤ T c gives a bound on the number of very small rational points on these curves of ≪ T c + T + c − ε, which completes the argument. ∎

To deduce Corollary 3 we will have a slightly easier time than we did for Corollary 4, since Heath-Brown's methods in [17] control the moments of rank(E) over the family quite well. Again, we use his notation freely throughout, and urge the reader to go through the original argument to understand our modifications.

Proof of Corollary 3. Theorem 1 of Heath-Brown [17] gives us the claimed bound

lim sup_{T→∞} Avg_{E ∈ F^{≤T}_{congruent, odd}}(k rank(E)) ≪ O(1) (log k)

over the subfamily F_{congruent, odd} ⊆ F_congruent of curves y² = x³ − D²x with D odd.
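The rescaling used here (and, with A in place of −D², in the case y² = x³ + Ax) can be checked on a concrete point: if (m/n², m′/n³) lies on y² = x³ + Ax, then clearing denominators gives m′² = m³ + (An⁴)m, and for A = −D² this is the curve y² = x³ − (Dn²)²x. The sketch below is ours; the example point (25/4, −35/8) = 2·(−3, 9) on y² = x³ − 36x (the congruent number D = 6) was computed by hand.

```python
import math
from fractions import Fraction


def to_integral_point(P, A):
    """Rational point (m/n^2, m'/n^3) on y^2 = x^3 + A*x  ->  integral point
    (m, m') on y^2 = x^3 + (A n^4) x.  Returns (m, m', A * n**4)."""
    x, y = P
    n = math.isqrt(x.denominator)
    # On an integral Weierstrass model the denominators are n^2 and n^3.
    assert n * n == x.denominator and y.denominator == n ** 3
    return x.numerator, y.numerator, A * n ** 4


if __name__ == "__main__":
    # Congruent number curve with D = 6, i.e. A = -36.
    m, m2, A_scaled = to_integral_point((Fraction(25, 4), Fraction(-35, 8)), -36)
    print(m, m2, A_scaled)  # 25 -35 -576, and -576 = -(6 * 2**2)**2
```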
But extending this to D ≡ (recall D is restricted to be squarefree) is no problem, since we only need an upper bound on the average of (in Heath-Brown's notation) k · s(D) of shape O(1)^k, where s(D) is the 2-Selmer rank of y² = x³ − D²x. Specifically, for these D Heath-Brown's quadratic form P controlling the appearance of a Legendre symbol does not change; in fact we need only change R, which does not affect the shape of the upper bound.

Let us indicate the necessary changes in the argument. Lemma 1 of [17] changes into an upper bound of shape (here D = 2^{v₂(D)} · D_odd):

s(D) ≤ Σ_{D_odd = ∏_{≤ i ≤ , ≤ j ≤ , i ≠ j} D_ij} ( − α )( β ) ∏_{i=1} − ω(D_i) ∏_{≤ j ≤ , j ≠ i} − ω(D_ij) ∏_{k ≠ i,j} ∏_ℓ ( D_kℓ / D_ij ) · [ ( D D D D D D ) + ( D D D D D D ) + ( D D D D D D ) + ( D D D D D D ) ].

The only changes required to obtain this bound are that in [16] Heath-Brown chooses a (unique) representative of a point P ∈ E(Q)/tors with |x|₂ = 1 and x > 0; instead one has to change the 2-adic condition to |x|₂ = |D|₂. Also, instead of worrying about the condition for local solubility of the equations resulting from the 2-descent at p = 2 (which Heath-Brown handles by a trick reducing to Hilbert's reciprocity law), we may simply drop the condition, since we are only concerned with an upper bound on s(D). The rest of the argument proceeds in exactly the same way, except that we trivially bound the sum remaining in Section 5 ("the leading terms") of [17]. This completes the proof. ∎

APPENDIX A. OPTIMIZING THE BOUND FOR THEOREM 2

J by O(δ) for computational purposes below):

Proposition 33. Let c < , D > , D̃ := D + √ D +42, C := 5 D̃, and s ∈ Z⁺ be such that C + 72 D + max(C, D) < and √c − κ − s ! κ − κ − s (D − (κ + 1(c − −) > , where κ := (− max(C, D) − C − D)(D −) −.
Let δ ≪_{c,D} 1. Let T ≫_{c,D,δ} 1. Let < J < . Let E ∈ F*. Then:
(1) If rank(E) = 0 then |E(Z)| = 0.
(2) If rank(E) = 1 then |E(Z)| ≤ .
(3) If rank(E) = r > , then:
|E(Z)| ≤ r ⌈log D̃ log J + O(δ)⌉ · max_{S ⊆ RP^{r−1} : ∀ v ≠ w ∈ S, |⟨v, w⟩| ≤ J} |S| + 9 s (3 r − .

The first question is how to get an explicit bound on max_{S ⊆ RP^{r−1} : ∀ v ≠ w ∈ S, |⟨v, w⟩| ≤ J} |S| for r very large. (Kabatiansky-Levenshtein gives an asymptotic, but this is not enough.) Since we can take r extremely large (e.g., r ≥ ) and Bhargava-Shankar guarantee that the proportion of curves with rank at least r is ≪ − r, the following simple-minded estimate will suffice.

Lemma 34. Let θ > 0, let r ≥ 3, and let S ⊆ S^{r−1} be such that θ_{v,w} ≥ θ for every v ≠ w ∈ S. Then

$$|S| \le \sqrt{12 r}\, \sin\left(\frac{\theta}{2}\right)^{1-r} \cos\left(\frac{\theta}{2}\right)^{-1}.$$

Proof. Note that balls of radius θ/2 (in the spherical distance) about the points of S do not intersect. Thus vol(S^{r−1}) ≥ |S| · vol(B_{θ/2}((1, 0, …, 0))). But the ball of radius θ/2 about (1, 0, …, 0) is the spherical cap x₁ ≥ cos(θ/2). The surface area of such a cap is

$$\frac12 \operatorname{vol}(S^{r-1})\, I_{\sin^2(\theta/2)}\left(\frac{r-1}{2}, \frac12\right),$$

where I_x(a, b) is the regularized incomplete beta function. But

$$I_x(a, b) = \frac{x^a (1-x)^b}{a\, B(a, b)} \left(1 + \sum_{n \ge 0} \frac{B(a+1, n+1)}{B(a+b, n+1)}\, x^{n+1}\right),$$

where B(w, z) is the usual beta function. Thus in particular I_x(a, b) ≥ x^a (1 − x)^b / (a B(a, b)), so that we have found:

$$|S| \le \frac{(r-1)\, B\left(\frac{r-1}{2}, \frac12\right)}{\sin^{r-1}\left(\frac{\theta}{2}\right) \cos\left(\frac{\theta}{2}\right)} \le r \sqrt{\pi} \cdot \frac{\Gamma\left(\frac{r-1}{2}\right)}{\Gamma\left(\frac{r}{2}\right)} \cdot \sin^{1-r}\left(\frac{\theta}{2}\right) \cos\left(\frac{\theta}{2}\right)^{-1}.$$

Therefore it suffices to show that

$$\frac{\Gamma\left(\frac{r-1}{2}\right)}{\Gamma\left(\frac{r}{2}\right)} \le \frac{\sqrt{12}}{\sqrt{\pi r}}.$$

But this follows via induction (with equality at r = 3).
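The packing argument in this proof can be spot-checked numerically. The sketch computes the exact cap-area bound |S| ≤ 2/I_{sin²(θ/2)}((r − 1)/2, ½), evaluating the regularized incomplete beta function by a midpoint rule (our choice of quadrature, adequate because a ≥ 1 and x < 1 keep the integrand bounded), and compares it with the simplified bound (r − 1) B((r − 1)/2, ½) sin^{1−r}(θ/2) cos^{−1}(θ/2) obtained from I_x(a, b) ≥ x^a (1 − x)^b/(a B(a, b)). It also tabulates √r · Γ((r − 1)/2)/Γ(r/2), whose largest value over r ≥ 3 occurs at r = 3, matching the equality case of the final step. All function names are ours.

```python
import math


def beta(a: float, b: float) -> float:
    """Euler beta function B(a, b) via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def reg_inc_beta(x: float, a: float, b: float, steps: int = 20_000) -> float:
    """I_x(a, b) by the midpoint rule on int_0^x t^(a-1) (1-t)^(b-1) dt."""
    h = x / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += t ** (a - 1) * (1 - t) ** (b - 1)
    return total * h / beta(a, b)


def cap_bound(r: int, theta: float) -> float:
    """area(S^{r-1}) / area(cap of angular radius theta/2)."""
    x = math.sin(theta / 2) ** 2
    return 2.0 / reg_inc_beta(x, (r - 1) / 2, 0.5)


def simplified_bound(r: int, theta: float) -> float:
    """(r-1) B((r-1)/2, 1/2) / (sin^{r-1}(theta/2) cos(theta/2))."""
    s, c = math.sin(theta / 2), math.cos(theta / 2)
    return (r - 1) * beta((r - 1) / 2, 0.5) / (s ** (r - 1) * c)


def gamma_ratio_scaled(r: int) -> float:
    """sqrt(r) * Gamma((r-1)/2) / Gamma(r/2), the quantity in the last step."""
    return math.sqrt(r) * math.exp(math.lgamma((r - 1) / 2) - math.lgamma(r / 2))


if __name__ == "__main__":
    for r in (3, 5, 10):
        print(r, cap_bound(r, 1.0), simplified_bound(r, 1.0))
```

For r = 3 the cap bound reduces to the familiar 2/(1 − cos(θ/2)) on the ordinary sphere, which the test below uses as a check on the quadrature.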
∎

Note that this implies that

max_{S ⊆ RP^{r−1} : ∀ v ≠ w ∈ S, |⟨v, w⟩| ≤ J} |S| ≤ √(12r) sin(θ/2)^{1−r} cos(θ/2)^{−1} = √(12r) ( − J ) − r ( 12 + J ) −,

where as usual we have written J = 2 cos θ.

Now notice that we have left the r = 2 case on its own. This is because in this case the unit sphere is simply the circle and we can give a very good estimate for the maximum (the idea is the same):

Lemma 35. Let S ⊆ RP¹ be such that θ_{v,w} ≥ θ for every v ≠ w ∈ S. Then |S| ≤ ⌊π/θ⌋.

Proof. Let T := {φ ∈ [0, π) : e^{iφ} ∈ π⁻¹(S)}, where π : S¹ → RP¹ is the projection. (Note that |T| = |S|.) Without loss of generality 0 ∈ T. Then the union

⋃_{0 ≠ t ∈ T} (t − θ/2, t + θ/2) ∪ (π − θ/2, π) ∪ (0, θ/2)

is disjoint. On taking measures we find the desired inequality. ∎

Now for ≤ r ≤ we use a program written by Henry Cohn to find optimal linear programming bounds on these maxima. This allows us to compile a table of bounds for given θ ranging from slightly larger than to slightly smaller than π. Then for each fixed r we choose c, D, s, J making the upper bound on |E(Z)| as small as possible. (Footnote: We end up simply choosing c = 0. , D = 612. , s = 3 and instead only optimizing J for each ≤ r ≤ .) This choice of J corresponds to a θ via J = 2 cos θ, and one needs only check the sphere-packing upper bound we use with rigorous arithmetic for this J.

In any case, what is left is simply a Mathematica calculation, and the relevant Mathematica document used to optimize the bound has been included. As a final note, observe that if (A, B) ≡ (2, 2) (mod 3), then E_{A,B}(Z) = ∅: since x³ ≡ x (mod 3), the right-hand side of y² = x³ + Ax + B is then ≡ 3x + 2 ≡ 2 (mod 3), which is not a square modulo 3. Thus we may restrict to the subfamily G of (A, B) not congruent to (2, 2) modulo 3. Inside this subfamily, we use the methods of Bhargava-Shankar (and Bhargava-Skinner-Zhang) to compute lower bounds for the proportions of curves with rank , , and either or . For reference, denote by F̃, . . .
, F̃, F̃+, F̃−, and F̃ the subfamilies of curves with (A, B) ≢ (2, 2) (mod 3) corresponding to the large families F, . . . , F, F+, F−, and F constructed in [5]. Then F̃, F̃, F̃ have unchanged densities, and F̃ has density μ(F) − ≥ . (we are lucky because the local root number at 3 does not vary when v₃(A) = v₃(B) = 0). Here we have written μ to mean the density of a subfamily (where the ambient family is understood). This results in lower bounds of μ(F̃+) ≥ . and μ(F̃−) ≥ . . Therefore the union of these families has density μ(F̃) ≥ . . Following Bhargava-Skinner-Zhang, this results in a proportion of at least . of curves in G having rank . Following Bhargava-Shankar, this also results in a proportion of at least . of curves having rank , and at least . having rank either or . Since G has density in F_universal, we in effect gain a factor of (as well as slightly more from the improved lower bounds on rank ≤ curves) due to these considerations. The remaining optimization is in the Mathematica file.

REFERENCES

[1] A. Baker. Linear forms in the logarithms of algebraic numbers. I, II, III. Mathematika 13 (1966), 204–216; ibid. 14 (1967), 102–107; ibid. 14 (1967), 220–228.
[2] Michael A. Bennett. On the representation of unity by binary cubic forms. Trans. Amer. Math. Soc., 353(4):1507–1534 (electronic), 2001.
[3] Manjul Bhargava and Benedict H. Gross. The average size of the 2-Selmer group of Jacobians of hyperelliptic curves having a rational Weierstrass point. See http://arxiv.org/abs/1208.1007, preprint (2013).
[4] Manjul Bhargava and Arul Shankar. The average number of elements in the 4-Selmer groups of elliptic curves is 7. See http://arxiv.org/abs/1312.7333, preprint (2013).
[5] Manjul Bhargava and Arul Shankar. The average size of the 5-Selmer group of elliptic curves is 6, and the average rank is less than 1. See http://arxiv.org/abs/1312.7859, preprint (2013).
[6] Manjul Bhargava and Arul Shankar. Binary quartic forms having bounded invariants, and the boundedness of the average rank of elliptic curves. See http://arxiv.org/abs/1006.1002, preprint (2010).
[7] Manjul Bhargava and Arul Shankar. Ternary cubic forms having bounded invariants, and the existence of a positive proportion of elliptic curves having rank 0. See http://arxiv.org/abs/1007.0052, preprint (2010).
[8] Manjul Bhargava, Arul Shankar, and Jacob Tsimerman. On the Davenport-Heilbronn theorems and second order terms. Invent. Math., 193(2):439–499, 2013.
[9] Manjul Bhargava, Christopher Skinner, and Wei Zhang. A majority of elliptic curves over Q satisfy the Birch and Swinnerton-Dyer conjecture. See http://arxiv.org/abs/1407.1826, preprint (2014).
[10] E. Bombieri and J. Pila. The number of integral points on arcs and ovals. Duke Math. J., 59(2):337–357, 1989.
[11] Enrico Bombieri and Walter Gubler. Heights in Diophantine geometry, volume 4 of New Mathematical Monographs. Cambridge University Press, Cambridge, 2006.
[12] Yann Bugeaud and Maurice Mignotte. Polynomial root separation. Int. J. Number Theory, 6(3):587–602, 2010.
[13] É. Fouvry. Sur le comportement en moyenne du rang des courbes y² = x³ + k. In Séminaire de Théorie des Nombres, Paris, 1990–91, volume 108 of Progr. Math., pages 61–84. Birkhäuser Boston, Boston, MA, 1993.
[14] J. Gebel, A. Petho, and H. G. Zimmer. Computing integral points on elliptic curves. Acta Arith., 68(2):171–192, 1994.
[15] Robert Harron and Andrew Snowden. Counting elliptic curves with prescribed torsion. See http://arxiv.org/abs/1311.4920, preprint (2013).
[16] D. R. Heath-Brown. The size of Selmer groups for the congruent number problem. Invent. Math., 111(1):171–195, 1993.
[17] D. R. Heath-Brown. The size of Selmer groups for the congruent number problem. II. Invent. Math., 118(2):331–370, 1994. With an appendix by P. Monsky.
[18] D. R. Heath-Brown. The density of rational points on curves and surfaces. Ann. of Math. (2), 155(2):553–595, 2002.
[19] D. R. Heath-Brown. The average analytic rank of elliptic curves. Duke Math. J., 122(3):591–623, 2004.
[20] H. A. Helfgott. On the square-free sieve. Acta Arith., 115(4):349–402, 2004.
[21] H. A. Helfgott and A. Venkatesh. Integral points on elliptic curves and 3-torsion in class groups. J. Amer. Math. Soc., 19(3):527–550 (electronic), 2006.
[22] M. Hindry and J. H. Silverman. The canonical height and integral points on elliptic curves. Invent. Math., 93(2):419–450, 1988.
[23] G. A. Kabatjanskiĭ and V. I. Levenšteĭn. Bounds for packings on the sphere and in space. Problemy Peredači Informacii, 14(1):3–25, 1978.
[24] Daniel Kane. On the ranks of the 2-Selmer groups of twists of a given elliptic curve. Algebra Number Theory, 7(5):1253–1279, 2013.
[25] Daniel Kane and Jack Thorne. On the φ-Selmer groups of the elliptic curves y² = x³ − dx. See http://math.harvard.edu/ thorne/phi-selmer.pdf, preprint (2013).
[26] Serge Lang. Elliptic curves: Diophantine analysis, volume 231 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin-New York, 1978.
[27] P. Le Boudec. Linear growth for certain elliptic fibrations. Int. Math. Res. Not. (to appear).
[28] Valéry Mahé. Prime power terms in elliptic divisibility sequences. Math. Comp., 83(288):1951–1991, 2014.
[29] K. Mahler. An inequality for the discriminant of a polynomial. Michigan Math. J., 11:257–262, 1964.
[30] Clayton Petsche. Small rational points on elliptic curves over number fields. New York J. Math., 12:257–268 (electronic), 2006.
[31] Bjorn Poonen and Michael Stoll. Most odd degree hyperelliptic curves have only one rational point. Ann. of Math. (2), 180(3):1137–1166, 2014.
[32] Sam Ruth. A bound on the average rank of j-invariant zero elliptic curves. Princeton PhD Thesis (2014).
[33] Joseph H. Silverman. A quantitative version of Siegel's theorem: integral points on elliptic curves and Catalan curves. J. Reine Angew. Math., 378:60–100, 1987.
[34] Joseph H. Silverman. The difference between the Weil height and the canonical height on elliptic curves. Math. Comp., 55(192):723–743, 1990.
[35] Joseph H. Silverman. The arithmetic of elliptic curves, volume 106 of Graduate Texts in Mathematics. Springer, Dordrecht, second edition, 2009.
[36] Katherine E. Stange. Integral points on elliptic curves and explicit valuations of division polynomials. See http://arxiv.org/pdf/1108.3051v4.pdf, to appear in the Canadian Journal of Mathematics.

E-mail address: [email protected]
CHURCHILL COLLEGE, UNIVERSITY OF CAMBRIDGE, C