Probabilistic Szpiro, Baby Szpiro, and Explicit Szpiro from Mochizuki's Corollary 3.12

Abstract

In \cite{Dupuy2020a} we gave explicit formulas for the "indeterminacies" Ind1, Ind2, Ind3 in Mochizuki's Inequality, as well as a new presentation of initial theta data. In the present paper we use these explicit formulas, together with our probabilistic formulation of \cite[Corollary 3.12]{IUT3}, to derive variants of Szpiro's inequality (in the spirit of \cite{IUT4}). In particular, for an elliptic curve in initial theta data we show how to derive uniform Szpiro (with explicit numerical constants). The inequalities we get are strictly weaker than \cite[Theorem 1.10]{IUT4}, but the proofs are more transparent, modifiable, and user friendly. All of these inequalities are derived from a probabilistic version of \cite[Corollary 3.12]{IUT3}, formulated in \cite{Dupuy2020a}, based on the notion of random measurable sets.



TAYLOR DUPUY AND ANTON HILADO

Contents

1. Introduction
2. Explicit Computations in Tensor Packets
3. Conductors, Minimal Discriminants, and Ramification
4. Estimates on p-adic Logarithms
5. Archimedean Logarithms
6. Upper Bounds on Hulls
7. Probabilistic Versions of the Mochizuki and Szpiro Inequalities
8. Deriving Explicit Constants For Szpiro's Inequality From Mochizuki's Inequality
References

1. Introduction

Date: April 30, 2020.

In [DH20b] we gave a probabilistic interpretation of Mochizuki's Inequality (Corollary 3.12 of [Moc15a]). In the present paper we perform some explicit computations using this inequality to derive three inequalities which we will call "Probabilistic Szpiro", "Baby Szpiro", and "Explicit Szpiro". All of these inequalities depend on the hypothesis that the elliptic curve is in "initial theta data built from the field of moduli", on [Moc15a, Corollary 3.12], and on some assumed behavior at the archimedean place stated in Claim 5.0.1.

In order to state our results we need to talk about initial theta data. As formulated in §…, initial theta data is a tuple
\[ (\overline{F}/F,\ E_F,\ l,\ \underline{M},\ \underline{V},\ V^{\mathrm{bad}}_{\mathrm{mod}},\ \underline{\epsilon}) \]
surrounding an elliptic curve $E = E_F$ over a field $F$ satisfying various conditions which are not important for the purposes of the introduction (the curious reader should consult [DH20b, §…]). Here $l$ is a fixed prime and, in the present paper, the choices of $\underline{M}$ and $\underline{\epsilon}$ will be irrelevant. We will discuss the sets of places $\underline{V}$ and $V^{\mathrm{bad}}_{\mathrm{mod}}$ momentarily. This requires some setup.

In order to define the sets of places $V^{\mathrm{bad}}_{\mathrm{mod}}$ and $\underline{V}$ we need to introduce the fields $F_0$ and $K$ to which they belong. The field $F_0$ is the field of moduli of the elliptic curve, defined by
\[ F_0 := \mathbb{Q}(j_E); \]
in Mochizuki's notation this is $F_{\mathrm{mod}}$. The field $K$ is the $l$-division field of $F$ given by
\[ K := F(E[l]), \tag{1.1} \]
obtained by adjoining the $l$-torsion of $E(\overline{F})$ to $F$. In this paper, for any field $L$ we will let $V(L)$ denote the collection of places of $L$, and for any non-archimedean place $v \in V(L)$ we will let $\kappa(v)$ denote the residue field and $L_v$ denote the completion of $L$ at $v$.

We now come to the definitions of $V^{\mathrm{bad}}_{\mathrm{mod}}$ and $\underline{V}$. First, $V^{\mathrm{bad}}_{\mathrm{mod}} \subset V(F_0)$ is a non-empty set of bad multiplicative places over the field of moduli: for every elliptic curve $E_0$ over $F_0$ such that $E \cong E_0 \otimes_{F_0} F$, if $v \in V^{\mathrm{bad}}_{\mathrm{mod}}$, then $E_0$ has multiplicative reduction at $v$. Next, the set $\underline{V} \subset V(K)$ is a set that maps bijectively to $V(F_0)$ under the natural map $V(K) \to V(F_0)$.

We will be using these quantities momentarily, but first we need to describe a special type of initial theta data that will be used in the course of this manuscript. For computational purposes one can always take an elliptic curve over its field of moduli (satisfying some mild conditions) and base change this field to a larger field to obtain some curve that can be put in initial theta data. We call such theta data "built from the field of moduli". The precise definition is below.

Definition 1.0.1.

Let $E/F$ be an elliptic curve inside initial theta data $(\overline{F}/F, E_F, l, \underline{M}, \underline{V}, V^{\mathrm{bad}}_{\mathrm{mod}}, \underline{\epsilon})$. We will say that such a tuple is built from the field of moduli provided $E = E_0 \otimes_{F_0} F$ where $F_0 = \mathbb{Q}(j_E)$ is the field of moduli of $E$, $E_0$ is a model of $E$ over $F_0$, $F := F_0(\sqrt{-1}, E_0[30])$, and $V^{\mathrm{bad}}_{\mathrm{mod}} \subset V(F_0)$ is the full set of places of multiplicative reduction.

We will often use the notation
\[ d_0 := [F_0 : \mathbb{Q}]. \]
The constants in our Szpiro-like inequalities will depend on this degree. In stating our results we recall from [DH20b] that, for a rational prime $p$, the set $V(F_0)_p$ is given the structure of a probability space where $\Pr : V(F_0)_p \to [0,1]$ is defined by
\[ \Pr(v) := \frac{[F_{0,v} : \mathbb{Q}_p]}{[F_0 : \mathbb{Q}_p]}. \]
The definition of initial theta data in §… […] Using $\underline{V}$ one can define some interesting probabilistic quantities which appear in our Probabilistic Szpiro inequality and give a good sense of the types of things that Mochizuki's inequality "knows about".

Definition 1.0.2.
(1) The probability that $v \in \underline{V}_p$ is unramified is
\[ P_{\mathrm{unr},p} = \sum_{w \in \{ v \in V(F_0)_p \,:\, e(v/p) = 1 \}} \Pr(w). \tag{1.2} \]
(2) The average ramification degree of $v \in \underline{V}_p$ is defined to be
\[ e_p = \mathbb{E}(e(v/p)). \tag{1.3} \]
(3) The average different of $\underline{V}/\mathbb{Q}$ is defined to be
\[ \operatorname{Diff}(\underline{V}/\mathbb{Q}) = \prod_p p^{\operatorname{diff}_p}. \tag{1.4} \]
In (1.4) we have $\operatorname{diff}_p = \log_p(\mathbb{E}(p^{\operatorname{diff}(v/p)}))$ and $\operatorname{diff}(v/p) = \operatorname{ord}_p(\operatorname{Diff}(K_v/\mathbb{Q}_p))$; here $\operatorname{Diff}(K_v/\mathbb{Q}_p)$ is the different of $K_v$ over $\mathbb{Q}_p$.

Using these quantities we can now state the Probabilistic Szpiro.

Theorem 1.0.3 (Probabilistic Szpiro). Assume [Moc15a, Corollary 3.12] and Claim 5.0.1. For any elliptic curve

E/F in initial theta data ( F /F, E F , l, M , V , V badmod , ǫ ) built from theﬁeld of moduli we have

16 + ε l ln | ∆ min E/F | [ F : Q ] ≤ ln Diﬀ( V / Q ) + X p ln( e p ) + A l,V (1.5) where A l,V = ln( π ) + X p (1 − P l +1 / ,p ) (cid:18) ln( b p ) + 5 l + 4 (cid:19) , and b p = 1 / exp(1) ln( p ) , and ε l = 24( l + 3) / ( l + l − . From the Probabilistic Szpiro Inequality we can derive a “Baby Szpiro” Inequality. Thisinequality only depends on discriminant and degree of the division ﬁeld K . This inequalitycan be derived quickly dispensing with a discussion of ramiﬁcation of the mod l Galoisrepresentation and its relation to the conductor (which is reviewed in § I in a ring of integers R we let | I | denote the absolute norm. Theorem 1.0.4 (Baby Szpiro) . Assume [Moc15a, Corollary 3.12] and Claim 5.0.1. Thenfor any elliptic curve

E/F in initial theta data ( F /F, E F , l, M , V , V badmod , ǫ ) built from the ﬁeldof moduli we have

16 + ε l ln | ∆ min E/F | [ F : Q ] ≤ ln([ K : Q ] / ) ln( | Disc( K/ Q ) | / ) + ln( π ) . (1.6) In the above formula ε l = (24 l + 72) / ( l + l − and K = Q ( j E , E [30 l ] , √− . n § Theorem 1.0.5 (Explicit Szpiro) . Assume [Moc15a, Corollary 3.12] and Claim 5.0.1. If

E/F is an elliptic curve in initial theta data ( F /F, E F , l, M , V , V badmod , ǫ ) built from the ﬁeldof moduli then | ∆ min E/F | ≤ e A d l + B d ( | Cond(

E/F ) | · | Disc( F/ Q ) | ) ε l , (1.7) where A = 84372107405 ,B = 316495 ,ε l = 96 ( l + 3) / ( l + l − . All of these computations follow the same pattern. First, one computes an upper boundfor the so-called hull of the multiradial representation (c.f. [DH20b, § § § Acknowledgements.

This article is very much indebted to many previous expositions of IUT including (but not limited to) [Fes15, Hos18, Ked15, Hos15, Sti15, Mok15, Moc17, Yam17, Hos17, Tan18, SS17]. The first author also greatly benefitted from conversations with many other mathematicians and would especially like to thank Yuichiro Hoshi for helpful discussions regarding Kummer theory and his patience during discussions of the theta link and Mochizuki's comparison; Kirti Joshi for discussions on deformation theory in the context of IUT; Kiran Kedlaya for productive discussions on Frobenioids, tempered fundamental groups, and global aspects of IUT; Emmanuel Lepage for helpful discussions on the p-adic logarithm, initial theta data, aut-holomorphic spaces, the log-Kummer correspondence, theta functions and their functional equations, tempered fundamental groups, log-structures, cyclotomic synchronization, reconstruction of fundamental groups, reconstruction of decomposition groups, the "multiradial representation of the theta pilot object", the third indeterminacy, the second indeterminacy, discussions on Hodge theaters, labels, and kappa coric functions, and discussions on local class field theory; Shinichi Mochizuki for his patience in clarifying many aspects of his theory (these include discussions regarding the relationship between IUT and Hodge-Arakelov theory, especially the role of "global multiplicative subspaces" in IUT, discussions on technical hypotheses in initial theta data, discussions on Theorem 3.11 and "(abc)-modules", discussions on mono-theta environments and the interior and exterior cyclotomes, discussions of the behavior of various objects with respect to automorphisms and providing comments on treatment of log-links and the use of polyisomorphisms, discussions on indeterminacies and the multiradial representation, discussions of the theta link, discussions on various incarnations of Arakelov divisors, and discussions on cyclotomic synchronization); Chung Pang Mok for productive discussions on the p-adic logarithm, anabelian evaluation, indeterminacies, the theta link, and Hodge theaters; Thomas Scanlon for discussions regarding interpretations and infinitary logic as applied to IUT and anabelian geometry. We apologize if we have forgotten anybody.

The authors also benefitted from the existence of the following workshops: the 2015 Oxford workshop funded by the Clay Mathematics Institute and the EPSRC programme grant Symmetries and Correspondences; the 2017 Kyoto IUT Summit workshop funded by RIMS and EPSRC; the Vermont workshop in 2017, entitled Kummer Classes and Anabelian Geometry, funded by the NSF DMS-1519977 and Symmetries and Correspondences; the 2018 Vermont Workshop on Witt Vectors, Deformations and Absolute Geometry funded by NSF DMS-1801012.

The first author was partially supported by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC Grant agreement no. 291111/MODAG while working on this project.

The research discussed in the present paper profited enormously from the generous support of the International Joint Usage/Research Center (iJU/RC) located at Kyoto University's Research Institute for Mathematical Sciences (RIMS) as well as the Preparatory Center for Research in Next-Generation Geometry located at RIMS.

2. Explicit Computations in Tensor Packets

The entire purpose of this section is Theorem 2.8.1, and the entire purpose of Theorem 2.8.1 is the hull computation in §6. At the end of the day the differents appearing in these sections are what give rise to the conductor term in Szpiro inequalities (in conjunction with the material in §…).

Let $K_1, \ldots, K_m$ be finite extensions of $\mathbb{Q}_p$. Let $L = K_1 \otimes \cdots \otimes K_m$. We are interested in the difference between the $\mathbb{Z}_p$-lattices $\mathcal{O}_{K_1} \otimes \cdots \otimes \mathcal{O}_{K_m} \subset \mathcal{O}_L$. It turns out that the index of $\mathcal{O}_{K_1} \otimes \cdots \otimes \mathcal{O}_{K_m}$ in $\mathcal{O}_L$ is related to the differents of $K_i/\mathbb{Q}_p$, which we describe in the subsequent subsections. Again, this is needed for the hull computations.

2.1. Integral Closures.

For a reduced ring $T$, the total ring of fractions is defined by $\kappa(T) = S_{\max}^{-1} T$ where $S_{\max}$ is the multiplicatively closed set of non-zero divisors. We let $\operatorname{Int}_{\kappa(T)}(T)$ denote the integral closure of $T$ in $\kappa(T)$.

Let $K_1, \ldots, K_m$ be finite extensions of $\mathbb{Q}_p$ and consider the special case $T = \mathcal{O}_{K_1} \otimes \cdots \otimes \mathcal{O}_{K_m}$ (where tensor products are taken over $\mathbb{Z}_p$). It turns out that $\kappa(T) = K_1 \otimes \cdots \otimes K_m$ (where tensor products are over $\mathbb{Q}_p$). Let $L = \kappa(T)$. We can use the Chinese Remainder Theorem to write $L \cong \bigoplus_{j=1}^{r} L_j$ where each $L_j$ is a finite extension of $\mathbb{Q}_p$. Define $\mathcal{O}_L = \bigoplus_{j=1}^{r} \mathcal{O}_{L_j}$. It also turns out that $\operatorname{Int}_L(T) = \mathcal{O}_L$.

Footnote: A lattice of a $\mathbb{Q}_p$-vector space $V$ is a $\mathbb{Z}_p$-submodule $V_0 \subset V$ which is free of rank $\dim(V)$ and whose $\mathbb{Q}_p$-span is all of $V$.

Footnote: In explicit cases one can actually proceed differently, say, using conductors in the sense of commutative algebra.

2.2. Differents and Discriminants.

For a discussion on differents and discriminants of fields we refer the reader to [Neu99, III.2] or [Sut15] or [Con]. A very comprehensive review of differents in great generality can be found in [Aut19, 0DW4] and the references therein.

For $A \supset B$ a finite extension of rings we define the different ideal to be
\[ \operatorname{Diff}(A/B) := \operatorname{ann}_A(\Omega_{A/B}). \]
Here $\Omega_{A/B}$ is the module of Kähler differentials. When $L/L_0$ is an extension of fields we use the notation $\operatorname{Diff}(L/L_0) := \operatorname{Diff}(\mathcal{O}_L/\mathcal{O}_{L_0})$. For an extension of number fields $L/L_0$ the different ideal can be computed "as a product" of local differents (see [Neu99] for details). The different also behaves well in towers.

If $K$ is a finite extension of $\mathbb{Q}_p$ with residue field $k$ then $\mathcal{O}_K$ can be written as
\[ \mathcal{O}_K = W(k)[x]/(f(x)), \]
where $f(x)$ is an Eisenstein polynomial of degree $e$ (also $e$ is the ramification degree of $K/\mathbb{Q}_p$) and $W(k)$ is the full ring of $p$-typical Witt vectors of $k$. We write $\pi$ for the image of $x$ in $\mathcal{O}_K$, which is a uniformizer. As $\operatorname{Diff}(W(k)/\mathbb{Z}_p) = 1$ (this extension is unramified), to compute $\operatorname{Diff}(\mathcal{O}_K/\mathbb{Z}_p)$ it remains to compute $\operatorname{Diff}(\mathcal{O}_K/W(k))$. From the formula $\Omega_{\mathcal{O}_K/W(k)} = (\mathcal{O}_K \cdot dx)/(\mathcal{O}_K \cdot df)$ and $d(f(x)) = f'(x)\,dx$ we find that
\[ \operatorname{Diff}(\mathcal{O}_K/\mathbb{Z}_p) = (f'(\pi)). \tag{2.1} \]
We can do some more computations to get some useful information. We find $f'(\pi) = e\pi^{e-1} + \cdots$ where all of the terms have distinct valuation and the leading term of $f'(\pi)$ has minimal valuation (this is due to the Eisenstein-ness hypothesis). This gives the formula $\operatorname{Diff}(K/\mathbb{Q}_p) = (e\pi^{e-1})$ from which we compute
\[ \operatorname{ord}_p(\operatorname{Diff}(K/\mathbb{Q}_p)) = \operatorname{ord}_p(e) + (e-1)\operatorname{ord}_p(\pi) = \operatorname{ord}_p(e) + (e-1)\frac{1}{e}. \]
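As a sanity check on the last formula, the following minimal Python sketch evaluates $\operatorname{ord}_p(e) + (e-1)/e$ with exact rational arithmetic. The function names `ord_p` and `diff_exponent` are ours and purely illustrative; the sketch only evaluates the displayed expression, and it assumes that `e` is the ramification degree of $K/\mathbb{Q}_p$.

```python
from fractions import Fraction


def ord_p(n: int, p: int) -> int:
    """Return the p-adic valuation of a nonzero integer n."""
    if n == 0:
        raise ValueError("ord_p(0) is infinite")
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def diff_exponent(p: int, e: int) -> Fraction:
    """Evaluate ord_p(e) + (e - 1)/e, the expression displayed in the text.

    Assumption: e >= 1 is the ramification degree of K over Q_p.
    """
    return Fraction(ord_p(e, p)) + Fraction(e - 1, e)


if __name__ == "__main__":
    print(diff_exponent(5, 2))  # tamely ramified: p = 5, e = 2
    print(diff_exponent(3, 3))  # wildly ramified: p = 3, e = 3
```

For example, for $K = \mathbb{Q}_3(\sqrt[3]{3})$ with Eisenstein polynomial $f(x) = x^3 - 3$ we have $f'(\pi) = 3\pi^2$ exactly, and the sketch returns $1 + 2/3 = 5/3$.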

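The discriminant of an extension of fields $L/L_0$ is then defined to be the ideal-norm of the different:
\[ \operatorname{Disc}(L/L_0) = N_{L/L_0}(\operatorname{Diff}(L/L_0)) \lhd \mathcal{O}_{L_0}. \]
We remark that $\prod_p p^{\operatorname{ord}_p \operatorname{Diff}(L/\mathbb{Q})} = |\operatorname{Disc}(L/\mathbb{Q})|^{1/[L:\mathbb{Q}]}$. This is helpful when thinking about (say) (1.5). In later sections we will make use of the notation $\operatorname{diff}(K/\mathbb{Q}_p) = \operatorname{ord}_p \operatorname{Diff}(K/\mathbb{Q}_p)$.

2.3. Explicit Chinese Remainder Formulas.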

Fix a field $K_0$ and an algebraic closure $\overline{K}_0$. Let $K_1, \ldots, K_m$ be finite extensions of $K_0$ sitting inside the common algebraic closure. The isomorphism of rings
\[ K_1 \otimes \cdots \otimes K_m \xrightarrow{\ \varphi\ } \bigoplus_{\psi \in \Phi} L_\psi \tag{2.2} \]
will play an important role for us. (Footnote: see [Neu99].) We describe its ingredients:
• $\Phi \subset \bigoplus_{i=1}^{m} \operatorname{Hom}(K_i, \overline{K}_0)$ is a complete system of representatives under the equivalence relation $(\psi_1, \ldots, \psi_m) \sim (\sigma\psi_1, \ldots, \sigma\psi_m)$ for $\sigma \in G(\overline{K}_0/K_0)$.
• For $\psi = (\psi_1, \ldots, \psi_m) \in \Phi$ we let $L_\psi$ be the compositum $L_\psi = \psi_1(K_1) \cdots \psi_m(K_m) \subset \overline{K}_0$.
• The isomorphism $\varphi$ is defined via extending linearly the map $\varphi(a_1 \otimes \cdots \otimes a_m) = (\varphi_\psi(a_1 \otimes \cdots \otimes a_m))_{\psi \in \Phi}$, where $\varphi_\psi(a_1 \otimes \cdots \otimes a_m) = \psi_1(a_1) \cdots \psi_m(a_m)$.

We prove (2.2) is an isomorphism. To see this we first note that $\operatorname{Spec}(K_1 \otimes \cdots \otimes K_m) = \prod_{i=1}^{m} \operatorname{Spec}(K_i)$, so the scheme is zero dimensional (and the spectrum of a product of fields). Each maximal ideal in the tensor product is the kernel of some map $\varphi : K_1 \otimes \cdots \otimes K_m \to \overline{K}_0$. Two such maps have the same kernel if and only if they differ by an automorphism of $\overline{K}_0$. This explains the bijection between maximal ideals of the tensor product and $(\bigoplus \operatorname{Hom}(K_i, \overline{K}_0))/\sim$. Also, using the composition $K_i \to K_1 \otimes \cdots \otimes K_m \to \overline{K}_0$ we see that any $\varphi$ induces $\psi_i : K_i \to \overline{K}_0$, and we witness $\varphi$ as having the special form $\varphi(a_1 \otimes \cdots \otimes a_m) = \psi_1(a_1) \cdots \psi_m(a_m)$.

2.4. Field Embeddings vs Choices of Roots.

Let $K/K_0$ be a finite field extension. Write this as a primitive extension with $K = K_0(\alpha)$ and let $f(x)$ be the minimal polynomial of $\alpha$. Using this notation we can write down a bijection
\[ \operatorname{Hom}_{K_0}(K, \overline{K}_0) \xrightarrow{\ \sim\ } \{ \beta \in \overline{K}_0 : f(\beta) = 0 \}, \qquad \psi \mapsto \psi(\alpha). \]
Now, let $\Phi \subset \bigoplus_{i=1}^{m} \operatorname{Hom}_{K_0}(K_i, \overline{K}_0)$ be a complete system of representatives for the equivalence relation $\sim$. Let $K_i = K_0(\alpha_i)$ where $\alpha_i$ has minimal polynomial $f_i(x)$. We can modify any $\psi = (\psi_1, \ldots, \psi_m)$ by some $\sigma \in G(\overline{K}_0/K_0)$ with $\sigma\psi_1 = \operatorname{id}_{K_1}$ so that
\[ (\psi_1, \psi_2, \ldots, \psi_m) \sim (\operatorname{id}_{K_1}, \psi_2', \ldots, \psi_m'). \]
Such choices of $\psi$ will be called normalized (for $K_1$), and a collection of embeddings $\Phi$ will be called normalized if each element is normalized.

Note that normalized $\Phi$ are in bijection with tuples of roots of the corresponding minimal polynomials:
\[ \Phi \xrightarrow{\ \sim\ } \{ \vec{\alpha}' = (\alpha_2', \ldots, \alpha_m') : f_2(\alpha_2') = 0, \ldots, f_m(\alpha_m') = 0 \}, \]
\[ (\operatorname{id}_{K_1}, \psi_2, \ldots, \psi_m) = (\psi_1, \psi_2, \ldots, \psi_m) \mapsto (\psi_2(\alpha_2), \ldots, \psi_m(\alpha_m)). \]
We record that the inverse map is given by $\vec{\alpha}' \mapsto \psi_{\vec{\alpha}'}$, where the components of $\psi_{\vec{\alpha}'}$ are the field embeddings uniquely determined by where they send the specified primitive element. We will make use of this correspondence frequently.

2.5. Notation for Quotients.

For a quotient of a polynomial ring $R[x_1, \ldots, x_n]/I$ we will sometimes use the notation $\bar{x}_1, \ldots, \bar{x}_n$ to denote the images of $x_1, \ldots, x_n$ in the quotient.

2.6. Decomposition Comparisons.

Given fields $K_i = K_0(\alpha_i)$ with minimal polynomials $f_i(x)$ for $i = 1, \ldots, m$ and $\Phi$ a $K_1$-normalized system of embeddings, we are interested in a description of the isomorphism (2.2) under the image of the base-change functor $\overline{K}_0 \otimes_{K_1} -$. This description will be used later in relating the tensor product of rings of integers to the ring of integers of tensor products.

First, we observe that $K_1 \otimes \cdots \otimes K_m \cong K_1[x_2, \ldots, x_m]/(f_2, \ldots, f_m)$. This gives
\begin{align*}
\overline{K}_0 \otimes_{K_1} (K_1 \otimes \cdots \otimes K_m)
&\cong \overline{K}_0 \otimes_{K_1} K_1[x_2, \ldots, x_m]/(f_2, \ldots, f_m) \\
&\cong \overline{K}_0[x_2, \ldots, x_m]/(f_2, \ldots, f_m) \\
&\cong \bigoplus_{\vec{\alpha}'} \overline{K}_0[x_2, \ldots, x_m]/(x_2 - \alpha_2', \ldots, x_m - \alpha_m').
\end{align*}
Hence the isomorphism
\[ \overline{K}_0 \otimes_{K_1} \varphi : \bigoplus_{\vec{\alpha}'} \overline{K}_0[x_2, \ldots, x_m]/(x_2 - \alpha_2', \ldots, x_m - \alpha_m') \to \overline{K}_0 \otimes_{K_1} \bigoplus_{\psi \in \Phi} L_\psi \]
is now seen to be given by
\[ (f(\bar{x}_2, \ldots, \bar{x}_m))_{\vec{\alpha}'} \mapsto (f(\alpha_2', \ldots, \alpha_m'))_{\psi_{\vec{\alpha}'}}. \]
The point here is that base changing to the algebraic closure splits fields, and this allows us to work with roots of polynomials.

2.7. Idempotents and Differents.

Let $K_1, \ldots, K_m$ be finite extensions of a field $K_0$ with $K_i = K_0(\alpha_i)$ and minimal polynomials $f_i$. If $K_1$ contains the Galois closures of $K_2, \ldots, K_m$ then the idempotents of $K_1 \otimes \cdots \otimes K_m$ have the form
\[ g_{j_2, \ldots, j_m} = \prod_{i=2}^{m} \frac{f_i(\bar{x}_i)}{(\bar{x}_i - \alpha_{i,j_i})} \frac{1}{f_i'(\alpha_{i,j_i})}. \tag{2.3} \]
Here $K_1 \otimes \cdots \otimes K_m = K_1[x_2, \ldots, x_m]/(f_2, \ldots, f_m)$ and $f_i(x) = (x - \alpha_{i,1})(x - \alpha_{i,2}) \cdots (x - \alpha_{i,n_i})$. Alternatively, we can fix some $\psi := (\psi_1, \ldots, \psi_m)$, a tuple of embeddings $\psi_i : K_i \to \overline{K}_0$, and write
\[ g_\psi = \prod_{i=2}^{m} \frac{f_i(\bar{x}_i)}{(\bar{x}_i - \psi_i(\alpha_i))} \frac{1}{f_i'(\psi_i(\alpha_i))}. \]

Proof. We know that
\[ \overline{K}_0[x_2, \ldots, x_m]/(f_2, \ldots, f_m) = \bigoplus_{\vec{\alpha}' = (\alpha_2', \ldots, \alpha_m')} \overline{K}_0[x_2, \ldots, x_m]/(x_2 - \alpha_2', \ldots, x_m - \alpha_m'). \]
To find the idempotents in this decomposition is the same as solving for $g_{\vec{\alpha}'}$ such that
\[ g_{\vec{\alpha}'} \equiv 1 \bmod (x_2 - \alpha_2', \ldots, x_m - \alpha_m'), \qquad g_{\vec{\alpha}'} \equiv 0 \bmod (x_2 - \beta_2', \ldots, x_m - \beta_m') \ \text{ for } \vec{\beta}' \neq \vec{\alpha}'. \]
Since $f_i(x)/(x - \alpha_i') \to f_i'(\alpha_i')$ as $x \to \alpha_i'$ by L'Hôpital's rule (which by universality of the computation holds algebraically), the element
\[ \tilde{g}_i(x) = \frac{f_i(x)}{(x - \alpha_i')} \frac{1}{f_i'(\alpha_i')} \]
has $\tilde{g}_i(\alpha_i') = 1$ and $\tilde{g}_i(\beta_i') = 0$ for $\beta_i' \neq \alpha_i'$. To obtain our result we just take the product of the $\tilde{g}_i$ as in the statement of the result. □

The relation between idempotents and differents now appears clear via formulas (2.1) in §… […]

2.8. Rings of Integers of Tensor Products vs Tensor Products of Rings of Integers [Moc15b, Theorem 1.1].

We now give the comparison of $T = \bigotimes_{i=1}^{m} \mathcal{O}_{K_i}$ and $\mathcal{O}_L$. Here $\mathcal{O}_L = \bigoplus_{\psi \in \Phi} \mathcal{O}_{L_\psi}$ where $L = K_1 \otimes \cdots \otimes K_m = \bigoplus_{\psi \in \Phi} L_\psi$. We remind ourselves that $\mathcal{O}_L$ is a $T$-algebra. Here $\varphi : T \to \mathcal{O}_L$ is given by (extending linearly)
\[ \varphi(a_1 \otimes \cdots \otimes a_m) = (\psi_1(a_1) \cdots \psi_m(a_m))_{\psi \in \Phi}. \]
For future reference we will let $\varphi_\psi$ denote the component of $\varphi$ in the $\psi$th factor. Explicitly, $\varphi_\psi(a_1 \otimes \cdots \otimes a_m) = \psi_1(a_1) \cdots \psi_m(a_m)$ if $\psi = (\psi_1, \ldots, \psi_m)$.

Theorem 2.8.1. Let $K_1, \ldots, K_m$ be finite extensions of $\mathbb{Q}_p$ sitting in a fixed algebraic closure. Let $T = \mathcal{O}_{K_1} \otimes \cdots \otimes \mathcal{O}_{K_m}$. Let $L = \kappa(T) = K_1 \otimes \cdots \otimes K_m$. Let $k_i$ for $i = 1, \ldots, m$ denote the respective residue fields of $K_i$. If $\beta = 1 \otimes f_2'(\alpha_2) \otimes \cdots \otimes f_m'(\alpha_m)$, where $\mathcal{O}_{K_i} = W(k_i)[\alpha_i]$ with Eisenstein polynomial $f_i(x) \in W(k_i)[x]$, then $\beta \in (T :_L \mathcal{O}_L)$. That is, $\beta \cdot \mathcal{O}_L \subset T$.

Proof. In what follows we let $\overline{\mathbb{Z}}_p$ denote the integral closure of $\mathbb{Z}_p$ in $\overline{\mathbb{Q}}_p$. In view of faithful flatness [AM69, Chapter 3, exercises 16, 17] it is enough to show
\[ \overline{\mathbb{Z}}_p \otimes_{\mathcal{O}_{K_1}} (\beta \cdot \mathcal{O}_L) \subset \overline{\mathbb{Z}}_p \otimes_{\mathcal{O}_{K_1}} T. \]
We will use the notation $\overline{\mathcal{O}}_L := \overline{\mathbb{Z}}_p \otimes_{\mathcal{O}_{K_1}} \mathcal{O}_L$ and $\overline{T} := \overline{\mathbb{Z}}_p \otimes_{\mathcal{O}_{K_1}} T$. Using our embedding decomposition we have $\overline{\mathbb{Z}}_p \otimes_{\mathcal{O}_{K_1}} (\beta \mathcal{O}_L) = \overline{\beta} \cdot \overline{\mathcal{O}}_L$ where $\overline{\beta} = \sum_{\psi \in \Phi} \varphi_\psi(\beta) g_\psi$. Here we note that
\[ \varphi_\psi(\beta) = \varphi_\psi(1 \otimes f_2'(\alpha_2) \otimes \cdots \otimes f_m'(\alpha_m)) = f_2'(\psi_2(\alpha_2)) \cdots f_m'(\psi_m(\alpha_m)), \]
for $\psi = (\psi_1, \ldots, \psi_m) \in \Phi$ (here we take $\Phi$ to be $K_1$-normalized).

We now use that the idempotents are given by
\[ g_\psi = \prod_{i=2}^{m} \frac{f_i(\bar{x}_i)}{(\bar{x}_i - \psi_i(\alpha_i))} \frac{1}{f_i'(\psi_i(\alpha_i))} \in \frac{1}{\varphi_\psi(\beta)} \overline{\mathbb{Z}}_p[\bar{x}_2, \ldots, \bar{x}_m] = \frac{1}{\varphi_\psi(\beta)} \overline{T}. \]
Now we just check: if $x \in \overline{\mathcal{O}}_L$, it has the form $x = \sum_{\psi \in \Phi} x_\psi g_\psi$ for some $x_\psi \in \overline{\mathbb{Z}}_p$. We have
\[ \overline{\beta} \cdot x = \Big( \sum_{\psi \in \Phi} \varphi_\psi(\beta) g_\psi \Big) \Big( \sum_{\xi \in \Phi} x_\xi g_\xi \Big) = \sum_{\psi} \varphi_\psi(\beta) x_\psi g_\psi \in \overline{T}. \]
The second equality follows from orthogonality of idempotents, and the last membership statement follows from the fact that $\varphi_\psi(\beta) g_\psi \in \overline{T}$. □

Remark. The proof of Theorem 2.8.1 has nothing to do with $K_1$. We can choose some $K_i$ which makes the inclusion tightest.

3. Conductors, Minimal Discriminants, and Ramification

This section contains definitions and facts about bad reduction, minimal discriminants, and Galois theory necessary for our applications. Readers just interested in Probabilistic Szpiro or Baby Szpiro (§7) may skip the last two subsections and proceed directly to §6. For a quick reading, readers may wish to skip to §…, §…, §….

3.1. Inertia/Decomposition Sequences.

Recall that for a finite extension $K$ of $\mathbb{Q}_p$ we have an extension of topological groups
\[ 1 \to I_K \to G_K \to G_k \to 1, \]
where $I_K$ is the inertia group and $k$ is the residue field. If $L$ is a global field, $v \in V(L)$ is non-archimedean, and $\bar{v} \mid v$ is a place of $\overline{L}$, we have
\[ 1 \to I(\bar{v}/v) \to D(\bar{v}/v) \to G(\kappa(\bar{v})/\kappa(v)) \to 1, \]
where $D(\bar{v}/v) = \operatorname{Stab}_{G_L}(\bar{v}) \cong G_{L_v}$ is the decomposition group.

3.2. Unramified and Ramified Representations.

Let $L$ be a finite extension of $\mathbb{Q}_p$. If $X$ is an object in a category, a representation $\rho : G_L \to \operatorname{Aut}(X)$ is unramified if and only if $\rho(I_L) = 1$. We may speak of $X$ being unramified, where the representation is understood (usually torsion points of an elliptic curve).

Let $L$ be a global field. Given $\rho : G_L \to \operatorname{Aut}(X)$ we say that $\rho$ is unramified at $v$ if and only if $\rho|_{G_v}$ is unramified. In either of these cases, if a representation is not unramified it is called ramified.

3.3. Good and Bad Reduction.

Let $K$ be a finite extension of $\mathbb{Q}_p$. Let $R = \mathcal{O}_K$ be its ring of integers and let $k$ be its residue field. Let $A_K$ be an abelian variety over $K$. We recall that $A_K$ has good reduction if and only if there exists an abelian scheme $A$ over $R$ whose generic fiber is isomorphic to $A_K$. This is equivalent to the special fiber of the Néron model being an abelian variety. Given an abelian variety $A_L$ over a global field $L$, we say that $A_L$ has good reduction at $v$ if and only if $A_{L_v}$ does.

3.4. Division Fields and Galois Representations.

Let $A$ be an abelian variety over a field $L$. Let $m$ be an integer. We will abuse notation and let $A[m]$ denote both the group scheme of $m$-torsion points and the $G_L$-module given by taking $\overline{L}$-points of this group scheme.

Assume now that $L$ is a number field. We will let $L_l = L(A[l])$. We remark that $L_l$ may be defined by literally adjoining the coordinates of torsion points in some model and that this field extension is independent of the model. If we fix an algebraic closure $\overline{L}$ we also have $L_l \cong \overline{L}^{\ker \rho_l}$. We also note that $G(L_l/L) \cong \operatorname{im}(\rho_l : G_L \to \operatorname{Aut}(A[l]))$.

Consider now the Tate module $T_l A = \varprojlim A[l^n]$ in the category of Galois modules. We let $\rho_{l^\infty} : G_L \to \operatorname{Aut} T_l A$ denote the action in the underlying representation. When it is necessary to specify the abelian variety we use $\rho_{l^\infty, A}$.

Serre's surjectivity theorem says that for an elliptic curve without complex multiplication, $\rho_l$ is surjective for all but finitely many $l$. This implies that for $l$ sufficiently large $\operatorname{im}(\rho_l) \cong \operatorname{GL}_2(\mathbb{F}_l)$. We make this remark because the initial theta data hypotheses of [DH20b, §5] require $\rho_l(G_F) \supset \operatorname{SL}_2(\mathbb{F}_l)$; Serre's Conjecture says this is generically true.

3.5. Minimal Discriminants and Tate Parameters.

We suppose that $E$ is an elliptic curve over a number field $F$ sitting in initial theta data. We will assume that it is semi-stable (all bad places are places with multiplicative reduction). Note that if it is not semi-stable one can make a finite change of base such that all places of the new field above a place of additive reduction in the old field are places of good reduction. Under any base change, places in the new field over places of multiplicative reduction in the old field still have multiplicative reduction (hence the word semi-stable).

In the case that $E$ is an elliptic curve over $L$, a finite extension of $\mathbb{Q}_p$, by [Sil13, Ch V, Lemma 5.1] if $|j_E|_p > 1$ (… $E$ having multiplicative reduction) there exists a Tate parameter $q = q_E \in L$ and an isomorphism of elliptic curves $u : E \to E_q$ defined over $L$. Here $E_q$ is the Tate curve, which admits a Tate uniformization. Note that this implies that all elliptic curves without potential good reduction have a unique Tate parameter at bad places. In fact: $q_E \in \mathbb{Q}_p(j_E)$ if $L$ is a finite extension of $\mathbb{Q}_p$.

The following describes the relationship between the minimal discriminant and the Tate parameter.

Lemma 3.5.1. If $E$ is an elliptic curve over $L$, a complete discretely valued field with valuation $v$, then
(1) If $E$ has multiplicative reduction then $\operatorname{ord}_v(\Delta^{\min}) = \operatorname{ord}_v(q_E)$, where $\Delta^{\min}$ is the minimal discriminant of $E/L$.
(2) All Tate curves $E_q$ are minimal Weierstrass models.

Proof. The proof of the first assertion follows from Ogg's Formula. This formula states
\[ c = \operatorname{ord}(\Delta^{\min}) + 1 - m. \]
Here $c$ is the local conductor exponent, $\Delta^{\min}$ is the minimal discriminant, and $m$ is the number of irreducible components of $\mathcal{E}_s$, the special fiber of the Néron model of $E$ for $R = \mathcal{O}_L$. Since our elliptic curve has multiplicative reduction this implies $c = 1$, which implies $m = \operatorname{ord}_v(\Delta^{\min})$. Now we have
\[ \mathcal{E}_s/\mathcal{E}_s^0 \cong E_q(L)/(E_q)_0(L) \cong (L^\times/q^{\mathbb{Z}})/(R^\times/q^{\mathbb{Z}}) \xrightarrow{\ \operatorname{ord}_v\ } \mathbb{Z}/\operatorname{ord}_v(q). \]
The first equality is the Kodaira-Néron Theorem, where $\mathcal{E}_s$ denotes the special fiber of the Néron model and the superscript zero denotes the connected component of the identity. Also $(E_q)_0(L)$ is the kernel of specialization. The second equality follows from Tate uniformization and the last equality follows from taking valuations. From this the equality follows (see [Sil09, Appendix C]).

We now prove $\Delta_{E_q}$ is minimal. We know that
\[ \Delta_{E_q} = q \prod_{n \geq 1} (1 - q^n)^{24}. \]
This shows $\operatorname{ord}_v(\Delta_{E_q}) = \operatorname{ord}_v(q)$, and since $\operatorname{ord}_v(q) = \operatorname{ord}_v(\Delta^{\min}_{E_q})$ from the first assertion of the lemma we are done. □

Footnote (to the remark on Serre's Conjecture in §3.4): It is conjectured by Serre that for every number field $L$ there exists some $l_{\max}$ such that for every elliptic curve and all $l \geq l_{\max}$ we have $\operatorname{im}(\rho_{E,l}) = \operatorname{GL}_2(\mathbb{F}_l)$. In the case $L = \mathbb{Q}$ it is further conjectured that $l_{\max} = 37$.

3.6. Minimal Discriminants and Base Change.

The following describes how minimal discriminants behave under base change.

Lemma 3.6.1. Let $K/F$ be a finite extension of number fields. If $E/F$ is a semi-stable elliptic curve then
\[ [K:F] \ln|\Delta^{\min}_{E/F}| = \ln|\Delta^{\min}_{E_K/K}|. \]

Proof. The proof is a computation:
\begin{align*}
\ln|\Delta^{\min}_{E_K/K}|
&= \sum_{w \in V(K)} \operatorname{ord}_w(\Delta^{\min}_{E_K/K})\, f(w/p_w) \ln(p_w) \\
&= \sum_{w \in V(K)} \operatorname{ord}_w(q_w)\, f(w/p_w) \ln(p_w) \\
&= \sum_{v \in V(F)} \sum_{w \in V(K)_v} e(w/v) \operatorname{ord}_v(q_v)\, f(w/v) f(v/p_v) \ln(p_v) \\
&= \sum_{v \in V(F)} \Big( \sum_{w \mid v} [K_w : F_v] \Big) \operatorname{ord}_v(q_v)\, f(v/p_v) \ln(p_v) \\
&= [K:F] \sum_{v \in V(F)} \operatorname{ord}_v(q_v)\, f(v/p_v) \ln(p_v) = [K:F] \ln|\Delta^{\min}_{E/F}|.
\end{align*}
□

3.7. Normalized Arakelov Degrees.

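For a number field $L$ and an Arakelov divisor $D \in \widehat{\operatorname{Div}}(L)$ the normalized Arakelov degree is defined by
\[ \underline{\widehat{\deg}}_L(D) = \frac{\widehat{\deg}_L(D)}{[L:\mathbb{Q}]}. \]
For $v \in V(L)$ with $[v] \in \widehat{\operatorname{Div}}(L)$, degrees are normalized so that $\widehat{\deg}([v]) = \ln|Nv| = f_v \ln(p_v)$, where $p_v$ is the characteristic of $\kappa(v)$ and $f_v$ is the inertia degree. We use the property that the normalized degree is invariant under pullback: if $f : V(L') \to V(L)$ is the natural map associated to an extension of number fields $L \subset L'$ and $D \in \widehat{\operatorname{Div}}(L)$ then
\[ \underline{\widehat{\deg}}_L(D) = \underline{\widehat{\deg}}_{L'}(f^* D). \]
We record that $f^*[v] = \sum_{w \mid v} e(w/v)[w]$.

3.8. q and Theta pilots.

Fix initial theta data $(\overline{F}/F, E_F, l, \underline{M}, \underline{V}, V^{\mathrm{bad}}_{\mathrm{mod}}, \underline{\epsilon})$. Furthermore, suppose it is built from the field of moduli so that $V^{\mathrm{bad}}_{\mathrm{mod}}$ contains all the semi-stable places of bad reduction.

Definition 3.8.1. The $q$-pilot divisor of this data is then
\[ P_q = \sum_{v \in V^{\mathrm{bad}}_{\mathrm{mod}}} \operatorname{ord}_v(q_v^{1/2l})[v] \in \widehat{\operatorname{Div}}(F)_{\mathbb{Q}}. \tag{3.1} \]
The $q$-pilot is related to the minimal discriminant of an elliptic curve by the following formula:
\[ \underline{\widehat{\deg}}_F(P_q) = \frac{1}{2l} \frac{\ln|\Delta^{\min}_{E/F}|}{[F:\mathbb{Q}]}. \tag{3.2} \]
To see this we perform a simple computation:
\begin{align*}
\underline{\widehat{\deg}}_F\Big( \sum_{v \text{ bad}} \operatorname{ord}_v(q_v)[v] \Big)
&= \underline{\widehat{\deg}}_K\Big( \sum_{v \in V^{\mathrm{bad}}_{\mathrm{mod}}} \operatorname{ord}_v(q_v) \sum_{w \in V(K)_v} e(w/v)[w] \Big) \\
&= \underline{\widehat{\deg}}_K\Big( \sum_{v \in V^{\mathrm{bad}}_{\mathrm{mod}}} \sum_{w \in V(K)_v} \operatorname{ord}_w(q_v)[w] \Big) \\
&= \underline{\widehat{\deg}}_K\Big( \sum_{w \text{ bad}} \operatorname{ord}_w(q_w)[w] \Big) \\
&= \underline{\widehat{\deg}}_K\Big( \sum_{w \text{ bad}} \operatorname{ord}_w(\Delta^{\min}_{E_K/K})[w] \Big) \\
&= \frac{\ln|\Delta^{\min}_{E_K/K}|}{[K:\mathbb{Q}]} = \frac{\ln|\Delta^{\min}_{E/F}|}{[F:\mathbb{Q}]}.
\end{align*}
The last equality is Lemma 3.6.1.

We now discuss Theta pilots.

Definition 3.8.2. The theta pilot divisor is a tuple $P_\Theta = (P_{\Theta,j})_{j=1}^{(l-1)/2} \in \widehat{\operatorname{Div}}_{\mathrm{lgp}}(F)_{\mathbb{Q}}^{(l-1)/2}$ where
\[ P_{\Theta,j} = \sum_{v \in V(F)^{\mathrm{bad}}} \operatorname{ord}_v(q_v^{j^2/2l})[v] \in \widehat{\operatorname{Div}}(F)_{\mathbb{Q}}. \]
The relationship between the theta and $q$-pilots is given by
\[ \underline{\widehat{\deg}}_{\mathrm{lgp},F}(P_\Theta) = \frac{l(l+1)}{12}\, \underline{\widehat{\deg}}_F(P_q). \tag{3.3} \]
This formula is derived by a simple computation:
\begin{align*}
\underline{\widehat{\deg}}_{\mathrm{lgp},F}(P_\Theta)
&= \frac{2}{l-1} \sum_{j=1}^{(l-1)/2} \underline{\widehat{\deg}}_F(P_{\Theta,j}) \\
&= \frac{2}{l-1} \sum_{j=1}^{(l-1)/2} j^2\, \underline{\widehat{\deg}}_F(P_q) \\
&= \frac{l(l+1)}{12}\, \underline{\widehat{\deg}}_F(P_q).
\end{align*}

Remark.
(1) The assertion of [SS17, pg 10] is that (3.3) is the only relation between the $q$-pilot and $\Theta$-pilot degrees. The assertion of [Moc18, C14] is that [SS17, pg 10] is not what occurs in [Moc15a]. The reasoning of [SS17, pg 10] is something like what follows:
(a) The $\Theta^{\times\mu}_{\mathrm{LGP}}$-link in [Moc15a] is a polyisomorphism between $\mathcal{F}^{\Vdash\blacktriangleright\times\mu}$-strips ${}^{0,0}\mathcal{F}^{\Vdash\blacktriangleright\times\mu}_{\mathrm{LGP}}$ and ${}^{1,0}\mathcal{F}^{\Vdash\blacktriangleright\times\mu}_{\Delta}$.
(b) Within these objects there are two global realified Frobenioids ${}^{0,0}\mathcal{C}^{\Vdash}_{\mathrm{LGP}}$ and ${}^{1,0}\mathcal{C}^{\Vdash}_{\Delta}$. Also there exist objects ${}^{0,0}P^{\Vdash}_{\Theta} \in {}^{0,0}\mathcal{C}^{\Vdash}_{\mathrm{LGP}}$ and ${}^{1,0}P^{\Vdash}_{q} \in {}^{1,0}\mathcal{C}^{\Vdash}_{\Delta}$, called the (0,0) theta pilot object and the (1,0) $q$ pilot object respectively, and the theta link $\Theta^{\times\mu}_{\mathrm{LGP}}$ is such that $\Theta^{\times\mu}_{\mathrm{LGP}}({}^{0,0}P^{\Vdash}_{\Theta}) = {}^{1,0}P^{\Vdash}_{q}$.
(c) To each such global realified Frobenioid $\mathcal{C}^{\Vdash}$ we can associate a one dimensional real vector space $\operatorname{Pic}(\mathcal{C}^{\Vdash})$. Also, to any object $P^{\Vdash} \in \mathcal{C}^{\Vdash}$ there is an associated degree $\deg_{\mathcal{C}^{\Vdash}}(P^{\Vdash}) \in \operatorname{Pic}(\mathcal{C}^{\Vdash})$.
(d) Any isomorphism between ${}^{0,0}\mathcal{C}^{\Vdash}_{\mathrm{LGP}}$ and ${}^{1,0}\mathcal{C}^{\Vdash}_{\Delta}$ induces an isomorphism between $\operatorname{Pic}({}^{0,0}\mathcal{C}^{\Vdash}_{\mathrm{LGP}})$ and $\operatorname{Pic}({}^{1,0}\mathcal{C}^{\Vdash}_{\Delta})$.
(e) An identification one can make is to fix isomorphisms
\[ \alpha : \operatorname{Pic}({}^{0,0}\mathcal{C}^{\Vdash}_{\mathrm{LGP}}) \to \mathbb{R}, \qquad \beta : \operatorname{Pic}({}^{1,0}\mathcal{C}^{\Vdash}_{\Delta}) \to \mathbb{R} \]
specified by extending linearly
\[ \alpha(\deg_{{}^{0,0}\mathcal{C}^{\Vdash}_{\mathrm{LGP}}}({}^{0,0}P^{\Vdash}_{\Theta})) = \underline{\widehat{\deg}}_{\mathrm{lgp}}(P_\Theta), \qquad \beta(\deg_{{}^{1,0}\mathcal{C}^{\Vdash}_{\Delta}}({}^{1,0}P^{\Vdash}_{q})) = \underline{\widehat{\deg}}(P_q), \]
where the degrees on the left hand side are as in the present subsection ([SS17, 2.1.6] calls this the canonical trivialization).
(f) The authors of the present article, Scholze-Stix, and Mochizuki all agree that the items above lead to a contradiction. Stripping away the abstraction, these assertions are tautologically equivalent to $\underline{\widehat{\deg}}_{\mathrm{lgp},F}(P_\Theta) = (l(l+1))/12 \cdot \underline{\widehat{\deg}}_F(P_q)$ and $\underline{\widehat{\deg}}_{\mathrm{lgp},F}(P_\Theta) = \underline{\widehat{\deg}}_F(P_q)$. This clearly gives a contradiction.
(g) It is our understanding that no such $\alpha$ map is specified in IUT, meaning that commutativity of the diagram consisting of the map induced by $\Theta^{\times\mu}_{\mathrm{LGP}}$, $\alpha$, and $\beta$ is not asserted.
(2) We would like to point out that the diagram on page 10 of [SS17] is very similar to the diagram on §… […].

We note that there is also the review [Rob 3] which some may find interesting.

3.9. Néron-Ogg-Shafarevich: Conductors and Good Reduction.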

The following theorem of Serre and Tate, which they call the Néron-Ogg-Shafarevich Criterion, tells us how ramification of an $l$-power Tate module is related to the reduction geometry of the Néron model of the corresponding abelian variety.

Theorem 3.9.1 ([ST68]). Let $A$ be an abelian variety over a local field $L$ of residue characteristic $p$. The following are equivalent.
(1) For all $m \in \mathbb{N}$ with $(m,p) = 1$, $A[m]$ is unramified.
(2) There exists a rational prime $l$ such that $l \neq p$ and $T_l(A)$ is unramified.
(3) There exist infinitely many $m$ with $(m,p) = 1$ such that $A[m]$ is unramified.
(4) $A$ has good reduction.

We apply this in subsequent sections to get information about the behavior of ramification degrees in our computations. We can apply this theorem to get a criterion relating the conductor of an abelian variety to the discriminant of an associated $l$-division field.

Theorem 3.9.2.

Let $A$ be an abelian variety over a number field $L$. Let $l$ be a rational prime. Let $L_l = L(A[l])$. Let $w$ be a non-archimedean place of $L_l$ coprime to $l$ and $\operatorname{char}(\kappa(w)) = p$. The following holds:
\[
e(w/p) > 1 \iff w \mid l \ \text{ or } \ w \mid \operatorname{Cond}(A/L) \ \text{ or } \ w \mid \operatorname{Diff}(L/\mathbb{Q}).
\]

Proof. Suppose that $e(w/p) > 1$. Since $e(w/p) = e(w/v)e(v/p)$, where $v \in V(L)$ is the image of $w \in V(L_l)$ under the natural map $V(L_l) \to V(L)$, we must have $e(w/v) > 1$ or $e(v/p) > 1$. If $e(v/p) > 1$ then $v \mid \operatorname{Diff}(L/\mathbb{Q})$, which implies $w \mid \operatorname{Diff}(L/\mathbb{Q})$. If $e(w/v) > 1$ then $I_{w/v} \neq 1$ since $\# I_{w/v} = e(w/v)$. Since $\rho_l : G(L_l/L) \to \operatorname{Aut}(A[l])$ is injective and $T_l$ has $A[l]$ as a quotient, we know that $\rho_l(I_{w/v}) \neq 1$ and hence $T_l A$ is ramified. This implies $v \mid \operatorname{Cond}(A/L)$. The final option is $w \mid l$.

Conversely suppose $w \mid l$ or $w \mid \operatorname{Cond}(A/L)$ or $w \mid \operatorname{Diff}(L/\mathbb{Q})$. If $w \mid l$ then since $L_l \supset \mathbb{Q}(\zeta_l)$ we have $e(w/l) \geq l - 1$. If $w \mid \operatorname{Diff}(L/\mathbb{Q})$ then by definition $e(v/p) > 1$. If $w \mid \operatorname{Cond}(A/L)$ then $v \mid \operatorname{Cond}(A/L)$ since $L_l/L$ is Galois. We know that
\[
v \mid \operatorname{Cond}(A/L) \iff c_v \neq 0 \iff v \text{ is ramified} \iff I_{w/v} \neq 1.
\]
This proves the result. Above, $c_v = \operatorname{ord}_v(\operatorname{Cond}(A/L))$. $\square$

4. Estimates on $p$-adic Logarithms

The material in this section is applied in §??. Throughout, $\log$ denotes the $p$-adic logarithm, $\ln$ denotes the real valued natural logarithm, and $\log_p$ denotes the real valued base $p$ logarithm. We refer the reader to [Rob00] for a quick review of elementary properties of the $p$-adic logarithm. See also [DH20a, §??].

4.1. Notation.

4.1.1. Let $\mathbb{C}_p$ be the $p$-adic completion of $\overline{\mathbb{Q}}_p$ and let $\operatorname{ord}_p$ be the unique extension of the valuation on $\mathbb{Q}_p$ to $\mathbb{C}_p$ with $\operatorname{ord}_p(p) = 1$. We normalize the $p$-adic absolute value by $|x|_p = p^{-\operatorname{ord}_p(x)}$. If $K$ is a local field with uniformizer $\pi_K$ we let $\operatorname{ord}_K$ denote the valuation normalized by $\operatorname{ord}_K(\pi_K) = 1$. In the case that $L$ is a global field and $v \in V(L)$ is a non-archimedean place, we let $\operatorname{ord}_v = \operatorname{ord}_{L_v}$ denote the normalized valuation on $L_v$.

4.1.2. Let $K_1/K_0$ be a finite extension of non-archimedean fields of residue characteristic $p$. We will let $e(K_1/K_0)$ denote the ramification degree of the extension. We will say $e(K_1/K_0)$ is small provided $e(K_1/K_0) < p - 1$. Note that small implies tame. If $L' \supset L$ is an extension of number fields and $v' \mid v$ are places of the respective number fields we let $e(v'/v) := e(L'_{v'}/L_v)$. If $L$ is a number field, we say that a non-archimedean place $v$ of $L$ is small if $L_v/\mathbb{Q}_p$ is small.

4.1.3. For a $p$-adic field $K$, $a \in K$ and $r \geq 0$ we denote the disc of radius $r$ by
\[
D_K(a, r) = \{ x \in K : |x - a|_p \leq r \}.
\]
Similarly if $L = \bigoplus_{j=1}^m L_j$ is a finite direct sum of $p$-adic fields, $\vec{a} = (a_1, \ldots, a_m) \in L$ and $\vec{r} = (r_1, \ldots, r_m)$ is a vector of non-negative real numbers then we will denote the polydisc of polyradius $\vec{r}$ by
\[
D_L(\vec{a}, \vec{r}) = \{ (x_1, \ldots, x_m) \in L : |x_1 - a_1|_p \leq r_1 \text{ and } \cdots \text{ and } |x_m - a_m|_p \leq r_m \}.
\]
When writing $D_L(0, R)$ where $R \in \mathbb{R}$ we will understand this to mean $D_L(0, (R, R, \ldots, R))$.

4.2. Estimates on The Size of The $p$-Adic Logarithm. We begin by estimating the size of the $p$-adic logarithm (c.f. [Moc15b, Prop 1.2]).

Lemma 4.2.1 (Crude Estimate). Let $a \in \mathbb{C}_p$ with $\operatorname{ord}_p(a) > 0$. We have
\[
|\log(1 + a)|_p < \frac{c_p}{\operatorname{ord}_p(a)},
\]
where $c_p = (\exp(1)\ln(p))^{-1}$ and $\exp(1) = 2.71\ldots$ is the base of the natural log.

Proof. To get an upper bound on $|-\log(1 - a)|_p = \bigl|\sum_{n \geq 1} \frac{a^n}{n}\bigr|_p$ for $|a|_p < 1$ it suffices to bound the maximum of $|a^n/n|_p$. Equivalently, we can compute the minimum of $\operatorname{ord}_p(a^n/n)$. We find these lower bounds by using
\[
\operatorname{ord}_p(a^n/n) = n \operatorname{ord}_p(a) - \operatorname{ord}_p(n) \geq n \operatorname{ord}_p(a) - \log_p(n),
\]
and minimizing the function $f(x) = xc - \log_p(x)$, where $c = \operatorname{ord}_p(a)$. The function has a global minimum at $x_0 = 1/(c\ln(p))$, which gives
\[
f(x) \geq f(x_0) = \frac{1}{\ln(p)} + \log_p(c\ln(p)).
\]
Converting this lower bound on the order to an upper bound on the $p$-adic absolute value gives our result. $\square$

Remark. One can also minimize the function $f(x) = p^x c - x$, which gives a bound on $|\log(1 + a)|_p$ in terms of $|a|_p$ and $\operatorname{ord}_p(a)$ with a different constant $b_p$. This is not of any use to us.

The application of Lemma 4.2.1 gives an upper bound on the smallest radius $r$ such that $\log(O_K^\times) \subset D_K(0, r)$ where $K$ is a finite extension of $\mathbb{Q}_p$.
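The minimization in the proof of Lemma 4.2.1 can be sanity-checked numerically. The following is a minimal sketch, not part of the original argument: the helper names (`ord_p_int`, `crude_lower_bound`, `min_term_order`, `c_p`) are hypothetical, and the check only samples finitely many $n$. It compares the smallest sampled value of $n\,\operatorname{ord}_p(a) - \operatorname{ord}_p(n)$ with the claimed lower bound, and confirms that converting that bound to an absolute value gives $c_p/\operatorname{ord}_p(a)$.

```python
import math

def ord_p_int(n, p):
    """p-adic valuation of a positive integer n."""
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def crude_lower_bound(p, c):
    """Claimed lower bound 1/ln(p) + log_p(c ln(p)) for ord_p(a^n / n), where c = ord_p(a) > 0."""
    return 1.0 / math.log(p) + math.log(c * math.log(p), p)

def min_term_order(p, c, n_max=5000):
    """Smallest value of n*c - ord_p(n) over the sampled range 1 <= n <= n_max."""
    return min(n * c - ord_p_int(n, p) for n in range(1, n_max + 1))

def c_p(p):
    """The constant c_p = 1 / (e * ln p) from Lemma 4.2.1."""
    return 1.0 / (math.e * math.log(p))
```

For instance, with $p = 2$ and $\operatorname{ord}_p(a) = 0.1$ the sampled minimum is attained at $n = 16$ and sits just above the continuous bound, which suggests the estimate is reasonably tight when $\operatorname{ord}_p(a)$ is small.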
With knowledge that $e(K/\mathbb{Q}_p)$ is small we can do much better. We state these results and omit the proofs.

Lemma 4.2.3. Let $K/\mathbb{Q}_p$ be a finite extension.
(1) With no assumptions on the ramification of $K/\mathbb{Q}_p$ we have $\log(O_K^\times) \subset D_K\bigl(0, \frac{e(K/\mathbb{Q}_p)}{\ln(p)\exp(1)}\bigr)$.
(2) If $e(K/\mathbb{Q}_p) < p - 1$ then $\log(O_K^\times) = \pi O_K$ where $\pi$ is the uniformizer of $K$.

Remark. In [Moc15b, Prop 1.2] Mochizuki proves $\log(O_K^\times) \subset p^{-b} O_K$ where
\[
b = \left\lfloor \frac{\ln\bigl(\frac{p\, e(K/\mathbb{Q}_p)}{p-1}\bigr)}{\ln(p)} \right\rfloor - \frac{1}{e(K/\mathbb{Q}_p)}.
\]
As far as usability goes, the formula in Lemma 4.2.3, while weaker, seems to be easier to understand.

4.3. $p$-Adic Log Shells. The present section collects and reformulates some of the material in [Moc15c].

Definition 4.3.1. Let $K/\mathbb{Q}_p$ be a finite extension. The log-shell of $K$ is the $\mathbb{Z}_p$-submodule of $K$ defined by
\[
I_K = \frac{1}{2p}\log(O_K^\times).
\]

Lemma 4.3.2 (Upper Semi-Compatibility). $I_K$ contains both $O_K$ and $\log(O_K^\times)$.

Footnote (to the proof of Lemma 4.2.1): One could have also used $\operatorname{ord}_p(a^n/n) = p^m \operatorname{ord}_p(a) - m$ along the sequence $n = p^m$. This will give different, less useful bounds. See the remark following that proof.

Proof. It is clear that $\log(O_K^\times) \subset I_K$. For the other containment, since $|2p|_p < r_p$ we have $\log(1 + 2pO_K) = 2pO_K$, because $\operatorname{ord}_p(2p) > 1/(p-1)$. Hence
\[
I_K \supset \frac{1}{2p}\log(1 + 2pO_K) = \frac{1}{2p}(2pO_K) = O_K. \qquad \square
\]

Remark (on the module structure of $\log(O_K^\times)$). For $K/\mathbb{Q}_p$ a finite extension we note that $\log(O_K^\times)$ has the structure of an $O_K$-module very rarely. In order for $\log(O_K^\times)$ to be an $O_K$-module we need
\[
a\log(b) = \log(b^a)
\]
for $a \in O_K$ and $b \in O_K^\times$. This in turn depends on the convergence of
\[
b^a = \sum_{n=0}^{\infty} \frac{a(a-1)\cdots(a-n+1)}{n!}(b-1)^n.
\]
We will not pursue this here, as estimates will not be needed. On the other hand we do observe that $\log(O_K^\times)$ is always a $\mathbb{Z}_p$-module for exactly the same reason.

5. Archimedean Logarithms

In order for our estimates to be complete we require definitions and estimates for $\operatorname{hull}(U_\Theta)$ at the Archimedean factor $L_\infty$. For $\vec{v} = (v_0, \ldots, v_j) \in V(F)^{j+1}_\infty$ we will let $H_{\vec{v}}$ denote the component of $\operatorname{hull}(U_\Theta)$ in $K_{v_0} \otimes \cdots \otimes K_{v_j}$ (since $\sqrt{-1} \in K$ we know that $K_v \cong \mathbb{C}$ for each $v \in \underline{V}$).

Claim 5.0.1. If $\vec{v} \in V(F)^{j+1}_\infty$ then $H_{\vec{v}} \subset D_{L_{\vec{v}}}(0; R_{\vec{v}})$ where $\ln(R_{\vec{v}}) = (j+1)\ln(\pi)$.

We do not develop the theory necessary to discuss this bound as this requires an Archimedean theory parallel to the $p$-adic theory in [?]. A full anabelian treatment requires so-called aut-holomorphic spaces. The starting place is [Moc15a, Definition 1.1]. The claim above can be found in [Moc15b, Proposition 1.5, Proof of Theorem 1.10, step vii].

6. Upper Bounds on Hulls

We now come to the section of the paper which contains the first major computation. Fix initial theta data $(\overline{F}/F, E_F, l, \underline{M}, \underline{V}, V^{\mathrm{bad}}_{\mathrm{mod}}, \underline{\epsilon})$. In this section our goal is to find, for each prime $p$ and each $\vec{v} \in \coprod_{j=1}^{(l-1)/2} V(F)^{j+1}_p$, the smallest polydisc $D_{L_{\vec{v}}}(0, R_{\vec{v}})$ such that the component of the multiradial representation at $\vec{v}$ is contained in this polydisc. The smallest possible polydisc here is called the hull.

6.1. Hulls. If $L = \bigoplus_{j=1}^m L_j$ is a finite direct sum of $p$-adic fields and $\Omega \subset L$ then let us define
\[
R_i(\Omega) = \max\{ |x_i|_p : (x_1, \ldots, x_m) \in \Omega \},
\]
and define the polyradius of $\Omega$ to be
\[
\vec{R}(\Omega) = (R_1(\Omega), \ldots, R_m(\Omega)).
\]
Define the hull of $\Omega$ to be the smallest polydisc containing $\Omega$:
\[
\operatorname{hull}(\Omega) = D_L(0, \vec{R}(\Omega)).
\]
It is easy to check that if $\alpha = (\alpha_1, \ldots, \alpha_m) \in L$ then
\[
\vec{R}(\alpha \cdot \Omega) = (|\alpha_1|_p R_1(\Omega), \ldots, |\alpha_m|_p R_m(\Omega)).
\]
Also, given a collection of compact regions $\Omega_i \subset L$ where $i = 1, 2, \ldots$, then for each $j$ where $1 \leq j \leq m$ we have
\[
R_j\Bigl(\bigcup_{i=1}^{\infty} \Omega_i\Bigr) = \sup\{ R_j(\Omega_i) : i \geq 1 \}.
\]
Note that the right hand side of the above equality is possibly infinite.

We now state some basic properties of hulls. For $A, B \subset L$ we will write
\[
A \subset_\sim B \iff \operatorname{hull}(A) \subset \operatorname{hull}(B).
\]
Note that $A \subset_\sim B$ if there exists some $\mathbb{Q}_p$-linear transformation $T : L \to L$ with $|\det(T)|_p = 1$ and $T(A) \subset B$ (such a $T$ could be multiplication by a unit of $L$, for example). Also if $\Omega \subset L$ and $a \in K_m$ (which we view as acting on $L$ via multiplication on the $m$th tensor factor) then $a^{\mathbb{N}} \cdot \Omega \subset_\sim a \cdot \Omega$. To see that $\operatorname{hull}(a^{\mathbb{N}} \cdot \Omega) \subset \operatorname{hull}(a \cdot \Omega)$, we observe that $a \in K_m$ acts on each direct summand of $L$ by $\psi_j(a)$, where we have written $L = \bigoplus_j L_{\psi_j}$ using the Chinese Remainder formulas developed in §2. This gives $R_j(a^{\mathbb{N}} \cdot \Omega) = \sup\{ R_j(a^n \cdot \Omega) : n \geq 1 \} = |a|_p R_j(\Omega)$. This implies $R_j(a^{\mathbb{N}} \cdot \Omega) \leq R_j(a \cdot \Omega)$ and hence $\operatorname{hull}(a^{\mathbb{N}} \cdot \Omega) \subset \operatorname{hull}(a \cdot \Omega)$.

6.2. Worst Case Scenario.

We now give a toy version of our computation of the hull bound associated to a tuple $\vec{v} \in V(F)^{j+1}$. Here we make assumptions on ramification of our fields.

Let $K_1, \ldots, K_m$ be finite extensions of $\mathbb{Q}_p$ (in our actual application $m$ will be $j + 1$). Let $a \in K_m$ with $|a|_p < 1$. Let $L = \bigotimes_{i=1}^m K_i \cong \bigoplus_{j=1}^r L_j$, where the factors of the right hand side come from the Chinese Remainder Theorem as in §2. Let $I = \bigotimes_{i=1}^m I_{K_i}$ be the tensor product of log-shells and let $\operatorname{Aut}(L : I)$ denote the collection of $\mathbb{Q}_p$-vector space automorphisms of $L$ obtained by extending $\mathbb{Q}_p$-linearly $\mathbb{Z}_p$-lattice automorphisms of $I$. These automorphisms are stand-ins for ind1 and ind2 in our actual applications (see [DH20b, §4] for definitions). (Footnote: The only reason this subsection can't directly be applied is because the actual ind1 has some permutations among different tensor product factors of $\mathbb{A}_{\underline{V},p}^{\otimes j+1}$. The permutation of these factors does not appear in this example.)

This subsection gives a bound on the hull of the "multiradial representation"
\[
U = \operatorname{hull}\Bigl(\operatorname{Aut}_{\mathbb{Q}_p}\bigl(L : \bigotimes_{i=1}^m \log(O_{K_i}^\times)\bigr) \cdot \bigl(O_L^{\mathrm{ind3}(a)}\bigr)\Bigr).
\]
This region is a stand-in for the random measurable set $U^{(j)} \subset \mathbb{A}_{\underline{V},p}^{\otimes j+1}$ of the hull of the coarse multiradial representation of the Theta pilot region (see [DH20b, §4] and the next section). (Footnote: In [DH20b] we used different notation for what we are now denoting $U$.) We prove
\[
\operatorname{hull}\bigl(\operatorname{Aut}(L : I) \cdot (O_L^{\mathrm{ind3}(a)})\bigr) \subset D_L(0; R) \tag{6.1}
\]
where the radius $R$ is given by
\[
\ln(R) = -\lfloor \operatorname{ord}_p(a) + \|\mathrm{diff}\|_\infty - \|\mathrm{diff}\|_1 \rfloor \ln(p) + m\ln(c_p) + \sum_{i=1}^m \ln(e(K_i/\mathbb{Q}_p)). \tag{6.2}
\]
The constant $c_p \in \mathbb{R}$ and the vector $\mathrm{diff} \in \mathbb{R}^m$ are given by
\[
c_p = \frac{1}{\exp(1)\ln(p)}, \qquad \mathrm{diff} = (\mathrm{diff}(K_1/\mathbb{Q}_p), \ldots, \mathrm{diff}(K_m/\mathbb{Q}_p)).
\]
To obtain this radius we compute. We have labeled each line in the computation below and give the justification for each step in the itemized list following the displayed equations.
\begin{align*}
& \operatorname{Aut}(L : I)\Bigl(a^{\mathbb{N}} \cdot \bigl(O_L \cup \bigotimes_{i=1}^m \log(O_{K_i}^\times)\bigr)\Bigr) \tag{6.3} \\
\subset_\sim{} & \operatorname{Aut}(L : I)\Bigl(a \cdot \beta^{-1}\bigl(\bigotimes_{i=1}^m O_{K_i} \cup \bigotimes_{i=1}^m \log(O_{K_i}^\times)\bigr)\Bigr) \tag{6.4} \\
\subset_\sim{} & \operatorname{Aut}(L : I)\bigl(a\beta^{-1} I\bigr) \tag{6.5} \\
\subset_\sim{} & \operatorname{Aut}(L : I)\bigl(p^{\lfloor \operatorname{ord}_p(a) - \operatorname{ord}_p(\beta) \rfloor} I\bigr) \tag{6.6} \\
={} & p^{\lfloor \operatorname{ord}_p(a) - \operatorname{ord}_p(\beta) \rfloor} I \tag{6.7} \\
\subset_\sim{} & p^{\lfloor \operatorname{ord}_p(a) - \operatorname{ord}_p(\beta) \rfloor} D_L\Bigl(0, \Bigl(\frac{p}{\exp(1)\ln(p)}\Bigr)^m \prod_{i=1}^m e(K_i/\mathbb{Q}_p)\Bigr) \tag{6.8}
\end{align*}
Since $\operatorname{hull}(D_L(0, R)) = D_L(0, R)$ for all radii $R > 0$ we get
\[
\operatorname{hull}(U) \subset D_L\bigl(0,\ e(K_1/\mathbb{Q}_p) \cdots e(K_m/\mathbb{Q}_p)\, |p|_p\, p^{-\lfloor \operatorname{ord}_p(a) + \|\mathrm{diff}\|_\infty - \|\mathrm{diff}\|_1 \rfloor}\bigr).
\]
Here are the justifications for each step:
• (6.1) to (6.3): Uses the main result concerning ind3 in [?].
• (6.4): Uses the theory of §2. In particular there exists some $\beta = (\beta_1, \ldots, \beta_r) \in \bigoplus_{j=1}^r L_j = L$ such that $\beta O_L = \bigoplus_{j=1}^r \beta_j O_{L_j} \subset \bigotimes_{i=1}^m O_{K_i}$, where $\operatorname{ord}_p(\beta_j) = \|\mathrm{diff}\|_1 - \|\mathrm{diff}\|_\infty$ for each $j$ with $1 \leq j \leq r$.
• (6.5): First we are using the "upper semi-compatibility" of $I_K$, namely that for a finite extension $K$ of $\mathbb{Q}_p$ we have $\log(O_K^\times), O_K \subset I_K$. We use this fact tensor factor by tensor factor. Also, since the factors of $\beta$ all have large order, multiplication by $\beta^{-1}$ will increase the size of the hull.
• (6.6): We are using the general fact that if $A$ is a region and $|a_1|_p < |a_2|_p$ then $a_1 A \subset_\sim a_2 A$.
• (6.7): This uses that $\operatorname{Aut}(L : I)$ is by definition $\mathbb{Q}_p$-linear and fixes $I$ as a set.
• (6.8): We are applying the results of Lemma 4.2.3(1).

Remark. One can break this inclusion down in some alternative ways. Here we highlight some areas for improvement. We do not pursue these here.
(1) Alternative to (6.3): For bounding $\operatorname{Aut}(L : I)\bigl(a \cdot (O_L \cup \bigotimes_{i=1}^m \log(O_{K_i}^\times))\bigr)$ one could write out an explicit $\mathbb{Z}_p$-basis for $a \cdot O_L$ and explicitly compute the action by $\operatorname{Aut}(L : I)$.

(2) Alternative to (6.4): One could attempt to compute the index of $\bigotimes_{i=1}^m O_{K_i}$ in $O_L$. This seems practical to do in specific toy cases, but the size of the division fields may be prohibitive in actual applications. It seems conceivable that other invariants around these inclusions can be used to write down more precise results.
(3) (6.5): One could attempt to find a smaller region here containing the two sets. Are log-shells optimal? Maybe, maybe not.
(4) (6.8): We can go beyond the worst case scenario and make additional considerations about the ramification of the fields to improve bounds on $I$. This includes applying the second part of Lemma 4.2.3 (which is applicable most of the time). In fact, for all but finitely many places $v \in V(F)$ we have $I_v = O_{K_v}$.

6.3. Actual Scenario.

Fix initial theta data $(\overline{F}/F, E_F, l, \underline{M}, \underline{V}, V^{\mathrm{bad}}_{\mathrm{mod}}, \underline{\epsilon})$ built from the field of moduli. In what follows $\mathbb{A}_{\underline{V}} = \prod_{v \in V(F)} K_v$ denotes the "fake adeles" from [DH20b, §??]. The region of interest is
\[
U^{(j)}_p \subset \mathbb{A}_{\underline{V},p}^{\otimes j+1} =: L^{(j)}_p,
\]
where $U^{(j)}_p$ is of the form
\[
U^{(j)}_p = \mathrm{ind2}\bigl(\mathrm{ind1}\bigl(O^{\mathrm{ind3}(\vec{a}_j)}_{L^{(j)}_p}\bigr)\bigr).
\]
Here we have made the following notational conventions:
\[
O_{L^{(j)}_p} = \bigoplus_{v \mid p} O_{L^{(j)}_v}, \qquad
O_{L^{(j)}_v} = \operatorname{Peel}^j_v(O_{L^{(j)}_v}), \qquad
O^{\mathrm{ind3}(\vec{a}_j)}_{L^{(j)}_p} = \bigoplus_{v \mid p} O^{\mathrm{ind3}(a_{j,v})}_{L^{(j)}_v},
\]
and we have let $\vec{a}_j = (a_{j,v})_{v \in V(F)}$ where
\[
a_{j,v} = \begin{cases} q_v^{j^2/2l}, & v \text{ bad multiplicative}, \\ 1, & \text{else}. \end{cases}
\]
All of this of course depends on a choice of initial theta data. The peel decomposition $\operatorname{Peel}^j_v(O_{L^{(j)}_v})$ is described in [DH20b, §??].

Let $\vec{v} = (v_0, \ldots, v_j) \in \underline{V}^{j+1}_p$. We say that $\vec{v}$ is small if every $e(v_i/p)$ is small for $0 \leq i \leq j$. Similarly we say that $\vec{v}$ is unramified if $v_i$ is unramified for each $i$ where $0 \leq i \leq j$. We will also let $L_{\vec{v}} = K_{v_0} \otimes \cdots \otimes K_{v_j}$, where the tensor products are over $\mathbb{Q}_p$.

Lemma 6.3.1. In the notation of this subsection, we have
\[
\operatorname{hull}(U^{(j)}_p) \subset \prod_{\vec{v} \in \underline{V}^{j+1}_p} D_{L_{\vec{v}}}(0, R_{\vec{v}})
\]
where
\[
\ln(R_{\vec{v}}) = \begin{cases}
0, & \vec{v} \text{ unramified and } p \nmid \infty, \\
-\lfloor \operatorname{ord}_p(a_{j,v}) - \operatorname{ord}_p(\beta_{\vec{v}}) \rfloor \ln(p), & \vec{v} \text{ small and } p \nmid \infty, \\
-\lfloor \operatorname{ord}_p(a_{j,v}) - \operatorname{ord}_p(\beta_{\vec{v}}) \rfloor \ln(p) + (j+1)\ln(b_p) + \sum_{i=0}^j \ln(e(v_i/p)), & p \nmid \infty \text{ and } \vec{v} \text{ general}, \\
(j+1)\ln(\pi), & p \mid \infty.
\end{cases}
\]

Proof. There are three points of departure from the computation in §6.2: the values of $\beta$ and $a$, the improvement of log-bounds, and the inclusion of the archimedean place. In the case that $\vec{v}$ is unramified we know that $a_{j,v_j} = 1$ by Néron-Ogg-Shafarevich. In the case that $\vec{v}$ is small, we apply the bounds from Lemma 4.2.3(2). In the archimedean case we apply Claim 5.0.1. $\square$

7. Probabilistic Versions of the Mochizuki and Szpiro Inequalities

Throughout this section we fix initial theta data $(\overline{F}/F, E_F, l, \underline{M}, \underline{V}, V^{\mathrm{bad}}_{\mathrm{mod}}, \underline{\epsilon})$ built from the field of moduli $F_0 = \mathbb{Q}(j_E)$.

7.1. Probability Spaces. Fix a rational prime $p$. Recall that, as in the introduction, we give $\coprod_{j=1}^{(l-1)/2} V(F)^{j+1}_p$ the structure of a finite probability space where $(v_0, v_1, \ldots, v_j) \in \coprod_{j=1}^{(l-1)/2} V(F)^{j+1}_p$ is assigned probability
\[
\Pr((v_0, v_1, \ldots, v_j)) = \frac{2}{l-1} \cdot \frac{[K_{v_0} : \mathbb{Q}_p][K_{v_1} : \mathbb{Q}_p] \cdots [K_{v_j} : \mathbb{Q}_p]}{[F : \mathbb{Q}]^{j+1}}.
\]
The space $\coprod_{j=1}^{(l-1)/2} V(F)^{j+1}_p$ can be viewed as a uniform independent disjoint union of probability spaces $V(F)^{j+1}_p$. For a random variable $X(\vec{v})$ that depends on $\vec{v} = (v_0, v_1, \ldots, v_j) \in \coprod_{j=1}^{(l-1)/2} V(F)^{j+1}_p$ we can view the expectation of $X$ as an "iterated expectation", by first computing the expectation as we vary over $(v_0, \ldots, v_j) \in V(F)^{j+1}_p$ for a fixed $j$ and then computing the expectation of these expectations as we vary uniformly over $j$. In what follows $E_p$ will denote this iterated expectation:
\[
E_p(X(\vec{v})) = E\bigl(E(X(\vec{v}) : \vec{v} \in V(F)^{j+1}_p) : 1 \leq j \leq (l-1)/2\bigr).
\]
Note that the colons here do not denote conditional probabilities.

7.2. Jensen's Inequality.

Jensen's inequality states that for a convex function $g(x)$ and a random variable $X$ we have
\[
g(E(X)) \leq E(g(X)).
\]
The inequality goes the other way for concave functions, and one can test for convexity using the second derivative test: a twice differentiable function of a real variable $g(x)$ is convex if and only if $g''(x) \geq 0$. In particular $g(x) = \exp(x)$ is a convex function and $g(x) = \ln(x)$ is concave. This allows us to say that
\[
\exp(E(\ln(X))) \leq E(X) \leq \ln(E(\exp(X))). \tag{7.1}
\]

7.3. Random Variables Pulled-back from a Projection. Let $S$ be a discrete probability space. Let $(X_1, \ldots, X_n)$ be a random variable on $S^n$. If $f(X_1, \ldots, X_n)$ only depends on $X_n$ (i.e. $f(X_1, \ldots, X_n) = g(X_n)$ for some function $g$ of a single variable) then the expected value of $f(X_1, \ldots, X_n)$ can be computed by just varying over what the function depends on. In symbols:
\[
E(f(X_1, \ldots, X_n)) = E(g(X)).
\]
It is also elementary to check that
\[
E(g(X_1) g(X_2) \cdots g(X_n)) = E(g(X))^n.
\]

7.4. Measures. For $L$ a direct sum of $p$-adic fields, we will often make use of the formula
\[
\ln\mu_L(D_L(0, R)) \leq \ln(R).
\]
Here, for a finite dimensional vector space $V$ and a measurable set $A \subset V$, the left-hand side is read as a single normalized symbol defined by
\[
\ln\mu_V(A) := \ln(\mu_V(A))/\dim(V).
\]

7.5. Probabilistic Mochizuki.

Using the probabilistic formalism developed in [DH20b, §??] we have the following.

Theorem 7.5.1 (Tautological Probabilistic Inequality). For $\vec{v} \in \underline{V}^{j+1}_p$ let
\[
R^\circ_{\vec{v}} = \inf\{ R \in \mathbb{R} : U_{\vec{v}} \subset D_{L_{\vec{v}}}(0, R) \}, \tag{7.2}
\]
where $U_{\vec{v}}$ is the component of the multiradial representation in $L_{\vec{v}}$. Assuming [Moc15a, Corollary 3.12] we have
\[
-\widehat{\deg}_F(P_q) \leq \sum_{p \in V(\mathbb{Q})} E_p(\ln R^\circ_{\vec{v}}). \tag{7.3}
\]

The radius $R_{\vec{v}}$ in Lemma 6.3.1 gives an estimate on $R^\circ_{\vec{v}}$ in (7.3), giving
\[
-\widehat{\deg}_F(P_q) \leq \sum_{p \in V(\mathbb{Q})} E_p(\ln R_{\vec{v}}). \tag{7.4}
\]
The rest of this subsection is devoted to estimating $\ln R_{\vec{v}}$ (so we will be deriving, in effect, estimates of estimates).

Remark. The computation of $R_{\vec{v}}$ is not optimal. It can be improved upon by readers in general or in special cases. It is unclear how far off $R^\circ_{\vec{v}}$ is from $R_{\vec{v}}$. It would be interesting to develop a table of $R^\circ_{\vec{v}}$ in some numerical examples (if the computations involving the division fields are not prohibitively hard).

The reader should compare what follows to [Moc15b, Proof of Theorem 1.10]. Fix $p \in V(\mathbb{Q})$. We have
\[
E_p(\ln R_{\vec{v}}) \leq \mathrm{I}_p + \mathrm{II}_p + \mathrm{III}_p + \mathrm{IV}_p + \mathrm{V}_p \tag{7.5}
\]
where
\begin{align*}
\mathrm{I}_p &= -E_p\bigl(\operatorname{ord}_p(q_{v_j}^{j^2/2l})\bigr)\ln(p), \\
\mathrm{II}_p &= E_p\bigl(\|\mathrm{diff}_{\vec{v}}\|_1 - \|\mathrm{diff}_{\vec{v}}\|_\infty\bigr)\ln(p), \\
\mathrm{III}_p &= E_p\bigl(1_{\mathrm{ram}}(\vec{v})\bigr), \\
\mathrm{IV}_p &= E_p\bigl((j+1)\ln(b_p)\,1_{\mathrm{ram}}(\vec{v})\bigr), \\
\mathrm{V}_p &= E_p\Bigl(\sum_{i=0}^j \ln(e(v_i/p))\Bigr).
\end{align*}
In the above formulas for $\mathrm{III}_p$ and $\mathrm{IV}_p$ the function $1_{\mathrm{ram}}(\vec{v})$ is the function which is 0 if $\vec{v}$ is unramified and 1 if $\vec{v}$ is ramified. We will denote the sums over $p$ of $\mathrm{I}_p$, $\mathrm{II}_p$, $\mathrm{III}_p$, $\mathrm{IV}_p$, and $\mathrm{V}_p$ by I, II, III, IV, and V respectively.
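The estimates of $\mathrm{II}_p$ and $\mathrm{V}_p$ that follow rest on two tools from §7.2 and §7.3: Jensen's inequality for the concave function $\ln$, and the factorization of the expectation of a product over independent coordinates. The sketch below illustrates both on a toy example. The list `places` is hypothetical data (weights standing in for the normalized local degrees, paired with made-up ramification indices), not values taken from any actual theta data.

```python
import itertools
import math

# Hypothetical places above p: (probability weight, ramification index e(v/p)).
# The weights play the role of normalized local degrees and sum to 1.
places = [(0.5, 1), (0.25, 2), (0.25, 4)]

def expectation(f, j):
    """Expectation of f(e(v_0/p), ..., e(v_j/p)) over independent tuples (v_0, ..., v_j)."""
    total = 0.0
    for tup in itertools.product(places, repeat=j + 1):
        prob = math.prod(w for w, _ in tup)
        total += prob * f([e for _, e in tup])
    return total

# Average ramification index e_p = E(e(v/p)).
e_p = sum(w * e for w, e in places)

def mean_log_ramification(j):
    """E( sum_i ln e(v_i/p) ), the quantity bounded in the computation of V_p."""
    return expectation(lambda es: sum(math.log(e) for e in es), j)

def mean_product(j):
    """E( prod_i e(v_i/p) ), which factors as e_p^(j+1) by independence."""
    return expectation(lambda es: math.prod(es), j)
```

In this toy example `mean_log_ramification(j)` stays below `(j + 1) * math.log(e_p)` for every `j`, which is the shape of the bound used for $\mathrm{V}_p$.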

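Remark. At this stage we can already see Mochizuki's inequality as stated in [Fes15, §??]: one combines $-\widehat{\deg}(P_q) \leq \log\nu_L(\operatorname{hull}(U))$ and $\log\nu_L(\operatorname{hull}(U)) \leq a(l) - b(l)\,\widehat{\deg}(P_q)$ to get $(b(l) - 1)\,\widehat{\deg}(P_q) \leq a(l)$, which gives
\[
\widehat{\deg}(P_q) \leq \frac{a(l)}{b(l) - 1}.
\]
[SS17, Claim 5] follows this style. Further approximate computations can be found at [Hos17, slide 17] (adapted in [SS17, §??]).

7.6. Computation of $\mathrm{I}_p$. We have
\begin{align*}
E\bigl(\operatorname{ord}_p(q_{v_j}^{j^2/2l}) : \vec{v} \in V(F)^{j+1}_p\bigr)\ln(p)
&= E\bigl(\operatorname{ord}_p(q_v^{j^2/2l}) : v \in V(F)_p\bigr)\ln(p) \\
&= \sum_{v \mid p} \frac{[F_v : \mathbb{Q}_p]}{[F : \mathbb{Q}]} \operatorname{ord}_p(q_v^{j^2/2l})\ln(p) \\
&= \frac{1}{[F : \mathbb{Q}]} \sum_{v \in V(F)_p} e(v/p) \operatorname{ord}_p(q_v^{j^2/2l}) f(v/p) \ln(p) \\
&= \widehat{\deg}\Bigl(\sum_{v \in V(F)^{\mathrm{bad}}_p} \operatorname{ord}_v(q_v^{j^2/2l})[v]\Bigr).
\end{align*}
Hence, keeping track of the sign in the definition of $\mathrm{I}_p$ and averaging over $j$,
\[
\mathrm{I} = \sum_p \mathrm{I}_p = -\sum_p E\Bigl(\widehat{\deg}\Bigl(\sum_{v \in V(F)^{\mathrm{bad}}_p} \operatorname{ord}_v(q_v^{j^2/2l})[v]\Bigr) : 1 \leq j \leq (l-1)/2\Bigr) = -\widehat{\deg}_{\mathrm{lgp},F}(P_\Theta). \tag{7.6}
\]

Footnote (to the definition of $1_{\mathrm{ram}}$): We say a tuple $(v_0, \ldots, v_j)$ is ramified if there exists some $i$ with $0 \leq i \leq j$ such that $e(v_i/p) > 1$. If a tuple is not ramified it is called unramified.

7.7. Computation of $\mathrm{II}_p$. In what follows we make use of the average different order of $\underline{V}$ over $p$, which is defined to be the quantity
\[
\mathrm{diff}_p := \log_p\bigl(E(p^{\mathrm{diff}(v/p)})\bigr). \tag{7.7}
\]
We will prove
\[
\mathrm{II}_p \leq \frac{l+1}{4}\,\mathrm{diff}_p \ln(p). \tag{7.8}
\]
Note that if we define the average different for $\underline{V}$ by $\operatorname{Diff}(\underline{V}/\mathbb{Q}) = \prod_p p^{\mathrm{diff}_p}$ we get
\[
\mathrm{II} \leq \frac{l+1}{4}\ln\operatorname{Diff}(\underline{V}/\mathbb{Q}). \tag{7.9}
\]
Before establishing (7.8) it is convenient to make the following Lemma.

Lemma 7.7.1.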

For $\vec{v} \in V(F)^n_p$ let
\[
\mathrm{diff}_{\vec{v}} = \mathrm{diff}_{(v_1, \ldots, v_n)} = (\mathrm{diff}(v_1/p), \ldots, \mathrm{diff}(v_n/p)).
\]
For $\vec{v} \in V(F)^n_p$ the following inequalities hold.
(1) $\|\mathrm{diff}_{\vec{v}}\|_1 - \|\mathrm{diff}_{\vec{v}}\|_\infty \leq \frac{n-1}{n}\|\mathrm{diff}_{\vec{v}}\|_1$.
(2) $E(\|\mathrm{diff}_{(v_1, \ldots, v_n)}\|_1) \leq n\,\mathrm{diff}_p$.
The subscripts $1$ and $\infty$ denote the usual $l^1$ and $l^\infty$ norms for vectors in $\mathbb{R}^n$.

Proof. (1) The proof is a fortiori. For positive real numbers $a_1, \ldots, a_n$ we have
\[
n\Bigl(\sum_{i=1}^n a_i - \max_{1 \leq i \leq n} a_i\Bigr) = n\sum_{i=1}^n a_i - n\max_{1 \leq i \leq n} a_i \leq n\sum_{i=1}^n a_i - \sum_{i=1}^n a_i = (n-1)\sum_{i=1}^n a_i.
\]
This proves $\|\vec{a}\|_1 - \|\vec{a}\|_\infty \leq \frac{n-1}{n}\|\vec{a}\|_1$, if we let $\vec{a} = (a_1, \ldots, a_n)$.

(2) We will apply Jensen's inequality to turn an expectation of a sum $E(\mathrm{diff}(v_1/p) + \cdots + \mathrm{diff}(v_n/p))$ into (the log of) an expectation of a product $E(p^{\mathrm{diff}(v_1/p)} \cdots p^{\mathrm{diff}(v_n/p)})$. Now that this is a product of random variables the expectation factors, namely,
\[
E(p^{\mathrm{diff}(v_1/p)} \cdots p^{\mathrm{diff}(v_n/p)}) = E(p^{\mathrm{diff}(v/p)})^n.
\]
This shows
\[
E(\|\mathrm{diff}_{(v_1, \ldots, v_n)}\|_1) \leq n\log_p E(p^{\mathrm{diff}(v/p)}),
\]
which is our desired result. $\square$

We now prove our desired formulas:
\begin{align*}
E\bigl(\|\mathrm{diff}_{(v_0, \ldots, v_j)}\|_1 - \|\mathrm{diff}_{(v_0, \ldots, v_j)}\|_\infty\bigr)
&\leq \frac{j}{j+1} E\bigl(\|\mathrm{diff}_{(v_0, \ldots, v_j)}\|_1\bigr) \\
&\leq \frac{j}{j+1}\bigl((j+1)\,\mathrm{diff}_p\bigr) = j\,\mathrm{diff}_p.
\end{align*}
The first line follows from Lemma 7.7.1(1) and the second line follows from Lemma 7.7.1(2) (which is an application of Jensen's inequality together with the way expectations of products of random variables behave). It remains to compute the expectation of these over $j \in \{1, \ldots, (l-1)/2\}$. We have
\[
E_p\bigl(\|\mathrm{diff}_{(v_0, \ldots, v_j)}\|_1 - \|\mathrm{diff}_{(v_0, \ldots, v_j)}\|_\infty\bigr) \leq E(j\,\mathrm{diff}_p) = \Bigl(\frac{2}{l-1}\sum_{j=1}^{(l-1)/2} j\Bigr)\mathrm{diff}_p = \frac{l+1}{4}\,\mathrm{diff}_p,
\]
which gives our result.

7.8. Computation of $\mathrm{III}_p$. In what follows we will make use of the probability of a place $v \in \underline{V}_p$ to be unramified. In a formula this probability is defined by
\[
P_{\mathrm{unr},p} = 1 - E(1_{\mathrm{ram}}(v) : v \in V(F)_p). \tag{7.10}
\]
Also recall that a tuple $\vec{v} = (v_0, \ldots, v_j) \in \underline{V}^{j+1}_p$ is unramified if and only if each $v_i$ is unramified for $0 \leq i \leq j$; this means that $E(1_{\mathrm{ram}}(v_0, \ldots, v_j)) = 1 - P^{j+1}_{\mathrm{unr},p}$. This then gives
\[
\mathrm{III}_p = E_p(1_{\mathrm{ram}}(v_0, \ldots, v_j)) = \frac{2}{l-1}\sum_{j=1}^{(l-1)/2}\bigl(1 - P^{j+1}_{\mathrm{unr},p}\bigr) = 1 - \frac{2}{l-1}\sum_{j=1}^{(l-1)/2} P^{j+1}_{\mathrm{unr},p}.
\]
As the smallest of the $P^{j+1}_{\mathrm{unr},p}$ is $P^{(l+1)/2}_{\mathrm{unr},p}$ we get the following inequality:
\[
\mathrm{III}_p \leq 1 - P^{(l+1)/2}_{\mathrm{unr},p}. \tag{7.11}
\]

7.9. Computation of $\mathrm{IV}_p$. We will prove
\[
\mathrm{IV}_p \leq \frac{l+5}{4}\ln(b_p)\bigl(1 - P^{(l+1)/2}_{\mathrm{unr},p}\bigr). \tag{7.12}
\]
Using identical reasoning to §7.8 we find $E((j+1)\ln(b_p)1_{\mathrm{ram}}(\vec{v})) = (j+1)\ln(b_p)(1 - P^{j+1}_{\mathrm{unr},p})$. This gives
\[
\mathrm{IV}_p = \ln(b_p)\Bigl(\frac{2}{l-1}\sum_{j=1}^{(l-1)/2}(j+1)\bigl(1 - P^{j+1}_{\mathrm{unr},p}\bigr)\Bigr) \leq \ln(b_p)\bigl(1 - P^{(l+1)/2}_{\mathrm{unr},p}\bigr)\Bigl(\frac{l+5}{4}\Bigr).
\]

7.10. Computation of $\mathrm{V}_p$. It will be convenient to define $e_p$, the average ramification index of $\underline{V}$ over $p$. In notation it is defined by
\[
e_p := E(e(v/p) : v \in V(F)_p). \tag{7.13}
\]
We now compute $\mathrm{V}_p$: we have
\begin{align*}
E\Bigl(\sum_{i=0}^j \ln(e(v_i/p))\Bigr) &= E\Bigl(\ln\Bigl(\prod_{i=0}^j e(v_i/p)\Bigr)\Bigr) \\
&\leq \ln\Bigl(E\Bigl(\prod_{i=0}^j e(v_i/p)\Bigr)\Bigr) \\
&\leq \ln\bigl(E(e(v/p))^{j+1}\bigr) = (j+1)\ln(e_p).
\end{align*}
The first to second line is an application of Jensen's inequality and the second to third line uses that, for independent random variables, the expectation of the product is the product of the expectations. We then can compute the second expectation by computing the uniform average of $j + 1$ over $\{1, \ldots, (l-1)/2\}$. This gives
\[
\mathrm{V}_p \leq \frac{l+5}{4}\ln(e_p). \tag{7.14}
\]

7.11. Archimedean Contribution. Here we only have to deal with log-shells. Applying Claim 5.0.1 we have
\[
E_\infty = \frac{2}{l-1}\sum_{j=1}^{(l-1)/2}(j+1)\ln(\pi) = \frac{l+5}{4}\ln(\pi).
\]

7.12. Probabilistic Szpiro.

We now combine the results of the previous subsections. Theveriﬁcation of the following identities requires some careful bookkeeping.
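As an aid to that bookkeeping, the following sketch checks in exact rational arithmetic the averages over $j$ that produce the coefficients $(l+1)/4$, $(l+5)/4$ and $l(l+1)/12$, together with the identity that produces $6 + \varepsilon_l$. The function names (`avg_over_j`, `epsilon`, `check`) are hypothetical, and the check is restricted to odd $l \geq 5$ so that $l^2 + l - 12 \neq 0$.

```python
from fractions import Fraction

def avg_over_j(l, f):
    """Uniform average of f(j) over j = 1, ..., (l-1)/2, for odd l."""
    m = (l - 1) // 2
    return sum(Fraction(f(j)) for j in range(1, m + 1)) / m

def epsilon(l):
    """epsilon_l = (24 l + 72) / (l^2 + l - 12), defined for l >= 5."""
    return Fraction(24 * l + 72, l * l + l - 12)

def check(l):
    """Verify the averaging identities and the 6 + epsilon_l identity for one l."""
    assert avg_over_j(l, lambda j: j) == Fraction(l + 1, 4)
    assert avg_over_j(l, lambda j: j + 1) == Fraction(l + 5, 4)
    assert avg_over_j(l, lambda j: j * j) == Fraction(l * (l + 1), 12)
    # Coefficient of ln|Delta| / [F:Q] after dividing through by (l+5)/4.
    coeff = (Fraction(l * (l + 1), 12) - 1) * Fraction(1, 2 * l) * Fraction(4, l + 5)
    assert coeff == 1 / (6 + epsilon(l))
    return True
```

For example, `check(5)` passes with `epsilon(5) == Fraction(32, 3)`, so the coefficient on the left-hand side is $3/50$ when $l = 5$.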

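Theorem 7.12.1 (Probabilistic Szpiro). Assume [Moc15a, Corollary 3.12] and Claim 5.0.1. Then for any elliptic curve $E/F$ in initial theta data $(\overline{F}/F, E_F, l, \underline{M}, \underline{V}, V^{\mathrm{bad}}_{\mathrm{mod}}, \underline{\epsilon})$ built over the field of moduli we have
\[
\frac{1}{6 + \varepsilon_l}\,\frac{\ln|\Delta^{\min}_{E/F}|}{[F : \mathbb{Q}]} \leq \ln\operatorname{Diff}(\underline{V}) + \sum_p \ln(e_p) + A_{l,\underline{V}} \tag{7.15}
\]
where
\[
A_{l,\underline{V}} = \ln(\pi) + \sum_p \bigl(1 - P^{(l+1)/2}_{\mathrm{unr},p}\bigr)\Bigl(\ln(b_p) + \frac{4}{l+5}\Bigr),
\]
and $b_p = 1/(\exp(1)\ln(p))$, and $\varepsilon_l = 24(l+3)/(l^2 + l - 12)$.

Proof. For the most part, this is just a combination of the bounds on I, II, III, IV and V given by equations (7.6), (7.8), (7.11), (7.12), and (7.14). The most interesting aspect of this computation is the appearance of the $6 + \varepsilon_l$. From the Tautological Probabilistic Inequality we get
\begin{align*}
-\widehat{\deg}_F(P_q) \leq{} & -\widehat{\deg}_{\mathrm{lgp},F}(P_\Theta) + \frac{l+1}{4}\ln\operatorname{Diff} \\
& + \sum_p \bigl(1 - P^{(l+1)/2}_{\mathrm{unr},p}\bigr)\Bigl(1 + \frac{l+5}{4}\ln(b_p)\Bigr) + \frac{l+5}{4}\sum_p \ln(e_p) + \frac{l+5}{4}\ln(\pi).
\end{align*}
Using that $\widehat{\deg}_{\mathrm{lgp},F}(P_\Theta) = \frac{(l+1)l}{12}\,\widehat{\deg}(P_q)$ and $\widehat{\deg}(P_q) = \ln|\Delta^{\min}_{E/F}| / (2l[F : \mathbb{Q}])$ we get
\begin{align*}
\Bigl(\frac{(l+1)l}{12} - 1\Bigr)\frac{1}{2l}\,\frac{\ln|\Delta^{\min}_{E/F}|}{[F : \mathbb{Q}]} \leq{} & \frac{l+1}{4}\ln\operatorname{Diff} + \sum_p \bigl(1 - P^{(l+1)/2}_{\mathrm{unr},p}\bigr)\Bigl(1 + \frac{l+5}{4}\ln(b_p)\Bigr) \\
& + \frac{l+5}{4}\sum_p \ln(e_p) + \frac{l+5}{4}\ln(\pi).
\end{align*}
We now divide both sides by $(l+5)/4$ and use
\[
\Bigl(\frac{l(l+1)}{12} - 1\Bigr)\Bigl(\frac{1}{2l}\Bigr)\Bigl(\frac{4}{l+5}\Bigr) = \frac{1}{6 + \varepsilon_l}
\quad\text{where}\quad
\varepsilon_l = \frac{24l + 72}{l^2 + l - 12}.
\]
This proves the assertion that $\varepsilon_l = O(1/l)$ as $l \to \infty$. Finally, putting everything together and using $(l+1)/(l+5) \leq 1$ on the different term, we get
\[
\frac{1}{6 + \varepsilon_l}\,\frac{\ln|\Delta^{\min}_{E/F}|}{[F : \mathbb{Q}]} \leq \ln\operatorname{Diff} + \sum_p \ln(e_p) + A_{l,\underline{V}}. \tag{7.16}
\]
Here $A_{l,\underline{V}}$ is as described in the statement of the theorem. $\square$

7.13. Baby Szpiro.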

To demonstrate the utility of Probabilistic Szpiro we give a "Baby" Szpiro inequality.

Theorem 7.13.1 (Baby Szpiro) . Assume [Moc15a, Corollary 3.12] and Claim 5.0.1. Foran elliptic curve E over a ﬁeld F sitting in initial theta data ( F /F, E F , l, M , V , V badmod , ǫ ) builtfrom the ﬁeld of moduli we have

16 + ε l ln | ∆ min E/F | [ F : Q ] ≤ ln([ K : Q ] / ) ln( | Disc( K/ Q ) | / ) + ln( π ) , (7.17) here ε l = (24 l + 72) / ( l + l − . roof of Baby Szpiro. This is just a simple application of the Probabilistic Szpiro for thetadata (Theorem 7.12.1) using elementary bounds for the right hand side. We useln(Diﬀ) ≤ ln(rad | Disc( K/ Q ) | · [ K : Q ]) , (7.18) X p ln e p ≤ ln([ K : Q ]) ω ( | Disc( K/ Q ) | ) , (7.19) X p (cid:16) − P l +14 unr ,p (cid:17) (cid:18) ln( b p ) + 4 l + 1 (cid:19) ≤ ln | Disc( K/ Q ) | , (7.20)where rad( N ) = Q p | N p and ω ( N ) = P d | n Thesetogether with the bounds ω ( N ) ≤ ln( N ) / ln ( N ) give that the right hand side of the secondprobabilistic Szpiro is less thanln( Dd ) + ln( D ) ln( d ) + ln( D )where D = | Disc( K/ Q ) | and d = [ K : Q ]. This simpliﬁes toln( Dd ) + ln( D ) ln( d ) + ln( D ) ≤ (ln( D ) + 2)(ln( d ) + 2) . Since

D, d ≥ SL ( F l ) ≥ a ∈ Q such that ln( x a ) ≥ ln( x ) + 2. Solvingthe inequality gives a ≥ / ln( x ) + 1 and since2ln( x ) + 1 ≤ ≤ D ) + 2)(ln( d ) + 2) ≤ ln( D / ) ln( d / ) which proves the result. (cid:3) Deriving Explicit Constants For Szpiro’s Inequality From Mochizuki’sInequality

In order to get strong uniform versions of Szpiro’s inequality from Mochizuki’s inequalityone needs to do some careful ramiﬁcation analysis based on the N´eron-Ogg-ShafarevichCriterion ( § | Disc( K/ Q ) | in terms of d , l, | Disc( F/ Q ) | .Here are the questions we needs to answer: What makes a place w ∈ V ( K ) p ramify? Whatis the maximum possible ramiﬁcation index e ( w/p ) as we vary over w ∈ V ( K ) p ? Does thesize of p matter? We answer all of these questions in the subsequent section and apply theseresults to get our version of uniform Szpiro with exponent 24. The inequality (7.18) is an application of the bounds on the diﬀerent order given in § p diﬀ p = E ( p diﬀ( v/p ) ) = X v | p [ F ,v : Q p ][ F : Q ] p diﬀ( v/p ) ≤ X v | p [ F ,v : Q p ][ F : Q ] p − e ( v/p ) +ord p e ( v/p ) ≤ p · p ord p [ K : Q ] . We then have Diﬀ(

V / Q ) ≤ Y p p · p ord p [ K : Q ] = rad( | Disc( K/ Q ) | )[ K : Q ] , which gives the result. .1. Ramiﬁcation Analysis.

The following Lemma answers the question about the maximal ramification index.

Lemma 8.1.1. For every place $w \in V(K)$ we have $e(w/p) \leq B_{l,d}$ where $B_{l,d} = 276480\, l^4 d$ and $d = [F_0 : \mathbb{Q}]$.

Proof. Fix $w \in V(K)$. We consider the successive extensions
\[
K \supset F = F'(E[15]) \supset F' = F_0(E[2], \sqrt{-1}) \supset F_0 \supset \mathbb{Q}.
\]
We label the various images of $w$ under the induced maps on places as follows:
\[
\begin{array}{ccccccccc}
V(K) & \to & V(F) & \to & V(F') & \to & V(F_0) & \to & V(\mathbb{Q}) \\
w & \mapsto & v & \mapsto & v' & \mapsto & v_0 & \mapsto & p.
\end{array}
\]
In this notation we have
\[
e(w/p) = e(w/v)\,e(v/v')\,e(v'/v_0)\,e(v_0/p) \leq l^4 \cdot 23040 \cdot 12 \cdot [F_0 : \mathbb{Q}] = 276480\, l^4 d =: B_{l,d}.
\]
We explain these inequalities: each of the extensions (other than $F_0 \supset \mathbb{Q}$) is Galois and we have $G(K/F) \subset \mathrm{GL}_2(\mathbb{F}_l)$, $G(F/F') \subset \mathrm{GL}_2(\mathbb{Z}/15)$, and $\#G(F'/F_0) \leq 2 \cdot \#\mathrm{GL}_2(\mathbb{F}_2) = 12$. Knowing that $\#\mathrm{GL}_2(\mathbb{F}_q) = q(q-1)(q^2-1)$ and plugging in explicit values gives the result. $\square$
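The constant 276480 in Lemma 8.1.1 can be recovered from the order formula for $\mathrm{GL}_2(\mathbb{F}_q)$. The sketch below shows one factorization consistent with the stated constant; the split into a factor for $\mathrm{GL}_2(\mathbb{Z}/15)$ and a factor $2 \cdot \#\mathrm{GL}_2(\mathbb{F}_2)$ for $F'$ over the field of moduli is an assumption read off from the tower in the proof, and the names `gl2_order`, `constant` and `B` are hypothetical.

```python
def gl2_order(q):
    """Order of GL_2(F_q) for a prime power q: q (q - 1) (q^2 - 1)."""
    return q * (q - 1) * (q * q - 1)

# Chinese Remainder Theorem: Z/15 = F_3 x F_5, so the orders multiply.
gl2_mod_15 = gl2_order(3) * gl2_order(5)

# Assumed bound for the degree of F' = F_0(E[2], sqrt(-1)) over F_0.
deg_f_prime = 2 * gl2_order(2)

# Numerical constant in B_{l,d}.
constant = gl2_mod_15 * deg_f_prime

def B(l, d):
    """B_{l,d} = constant * l^4 * d, using the crude bound #GL_2(F_l) <= l^4."""
    return constant * l ** 4 * d
```

The bound $\#\mathrm{GL}_2(\mathbb{F}_l) \leq l^4$ used for the top layer is crude: the exact order is $l(l-1)(l^2-1)$, so a slightly smaller constant is available if one keeps the exact value.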

As a Corollary we get the following.

Lemma 8.1.2. If $p > B_{l,d}$ then $e(w/p) < p - 1$.

Note that this implies the ramification of $K/\mathbb{Q}$ is small for all but finitely many places. The upshot of most ramification being small (Lemma 8.1.2) is that it allows us to apply our "trivial bounds" on the $p$-adic logarithm (Lemma 4.2.3) at all but finitely many places. The sum over $p$ in the proof of explicit Szpiro can be broken down into three cases as shown in Figure 8.1.

[Figure: the primes $p$ divided at $l + 1$ and at $B_{l,d} = 276480\, l^4 d$. For $p \leq l$ the ramification $e(w/p)$ may be wild, for $l + 1 < p < B_{l,d}$ it is tame, and for $p > B_{l,d}$ it is small.]

Figure 1. A breakdown of the ramification of a tuple $\vec{v} = (v_0, v_1, \ldots, v_j) \in \underline{V}^{j+1}_p$.

Footnote (to the tower of fields in the proof of Lemma 8.1.1): These come from the hypotheses of "initial theta data built from the field of moduli".

8.2. Explicit Szpiro.

In the remainder of the paper we derive the following version ofSzpiro’s inequality from (7.4).

Theorem 8.2.1.

Assume [Moc15a, Corollary 3.12] and Claim 5.0.1. If

E/F is an ellipticcurve in initial theta data ( F /F, E F , l, M , V , V badmod , ǫ ) built from the ﬁeld of moduli then | ∆ min E/F | ≤ e A d l + B d ( | Cond(

E/F ) | · | Disc( F/ Q ) | ) ε l , (8.1) where A = 84372107405 , B = 316495 and ε l = (96 ( l + 3)) / ( l + l − . Let B = B l,d = 2 d ( Z / l ). In what follows let E p be the expected value ofln µ ~v ( I ~v ) + k diﬀ ~v k − k diﬀ ~v k ∞ + 1 ram ( ~v ) (8.2)over ~v = ( v , . . . , v j ) ∈ ` ( l − / j =1 V ( F ) j +1 p . Above, I ~v denotes the hull of the tensor productof log-shells for ~v ∈ ` ( l − / j =1 V j +1 p . We compute E p for a given by breaking p into the cases • inﬁnite: p = ∞• large: p > B • small: p ≤ B Also, within each case we break (8.2) into three subcomputations:ln µ v ( I v ) | {z } I + k diﬀ ~v k − k diﬀ ~v k ∞ | {z } II + 1 ram ( ~v ) | {z } III . We then put these estimates together to get our results.8.3.

Computation at Infinite Places.

Over the infinite places we have
\[
E_\infty \leq \frac{l+5}{4}\ln(\pi). \tag{8.3}
\]

Proof. At the infinite prime $\mathrm{II}_\infty = \mathrm{III}_\infty = 0$. The number $\ln(\pi)(l+5)/4$ is the average of $(j+1)\ln(\pi)$ over $j$, which comes from Claim 5.0.1. $\square$

8.4. Computation at Large Places.

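Over the large places we have
\[
\sum_{p > B,\ p \ne \infty} E_p \le \frac{l+5}{4} \sum_{p > B,\ p \,\mid\, |\mathrm{Disc}(K/\mathbb{Q})|} \ln(p), \tag{8.4}
\]
which we can further estimate using
\[
\sum_{p > B,\ p \,\mid\, |\mathrm{Disc}(K/\mathbb{Q})|} \ln(p) \le 2 \left( \frac{\ln |\mathrm{Disc}(F/\mathbb{Q})|}{[F:\mathbb{Q}]} + \frac{\ln |\mathrm{Cond}(E/F)|}{[F:\mathbb{Q}]} \right).
\]
We give a proof of these two claims.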
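The coefficient $(l+5)/4$ in (8.3) and (8.4), and the coefficient $(l+1)/4$ that appears for the different term, can be read as averages over the index $j = 1, \ldots, (l-1)/2$. The following is a minimal sketch of that bookkeeping in exact rational arithmetic. It assumes equal weights on the values of $j$; the function name `average_over_j` is hypothetical and only serves this illustration.

```python
from fractions import Fraction


def average_over_j(l, term):
    """Average term(j) over j = 1, ..., (l - 1)/2.

    Assumption: every j carries the same weight. `l` is an odd prime >= 5.
    """
    n = (l - 1) // 2
    return sum(Fraction(term(j)) for j in range(1, n + 1)) / n


if __name__ == "__main__":
    for l in (5, 7, 11, 13, 101):
        # The average of j + 1 gives the coefficient of the log-shell term.
        avg_log_shell = average_over_j(l, lambda j: j + 1)
        # The average of j gives the coefficient of the different term.
        avg_different = average_over_j(l, lambda j: j)
        print(l, avg_log_shell == Fraction(l + 5, 4), avg_different == Fraction(l + 1, 4))
```

Working with `Fraction` rather than floats keeps the comparison exact, so a mismatch would signal a bookkeeping error and not rounding.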

Proof. By Lemma 8.1.2, for every $p > B_{l,d}$ and every place $v \in V_p$ we have $e(v/p) < p - 1$. This leads to improvements in both the log-shell bounds $\mathrm{I}_p$ and the different bounds $\mathrm{II}_p$. From the estimates on log-shells we know that for $\vec{v} \in V_p^{j+1}$
\[
I_{\vec{v}} \subset D_{\vec{v}}\left(0;\ p^{\,j+1-\sum_{i=0}^{j} 1/e(v_i/p)}\right) \subset D_{\vec{v}}(0;\ p^{j+1}).
\]
This implies $\ln \mu_{\vec{v}}(I_{\vec{v}}) \le (j+1)\ln(p)$ and hence
\[
E_p(\ln \mu_{\vec{v}}(I_{\vec{v}})) \le \frac{l+5}{4} \ln(p)\, 1_{\mathrm{ram}}(p).
\]
For the different term $\mathrm{II}_p$ we have
\[
E_p(\|\mathrm{diff}_{\vec{v}}\|_1 - \|\mathrm{diff}_{\vec{v}}\|_\infty) \le \frac{l+1}{4}\, \mathrm{diff}_p \ln(p),
\]
where $\mathrm{diff}_p = \log_p\left(E(p^{\mathrm{diff}(v/p)} : v \in V_p)\right)$. Using tameness, we have $\mathrm{diff}(v/p) = 1 - 1/e(v/p) \le 1$, so $E(p^{\mathrm{diff}(v/p)}) \le E(p) = p$. This gives
\[
\mathrm{diff}_p \le 1_{\mathrm{ram}}(p) =
\begin{cases}
0, & \forall v \mid p,\ e(v/p) = 1, \\
1, & \exists v \mid p,\ e(v/p) > 1.
\end{cases}
\]
Hence
\[
E_p(\|\mathrm{diff}_{\vec{v}}\|_1 - \|\mathrm{diff}_{\vec{v}}\|_\infty) \le \frac{l+1}{4}\, 1_{\mathrm{ram}}(p) \ln(p).
\]
Finally we estimate the third term $\mathrm{III}_p$:
\[
E_p(1_{\mathrm{ram}}(\vec{v})) \le \left(1 - P_{\mathrm{unr},p}^{\frac{l+1}{4}}\right) \le 1_{\mathrm{ram}}(p).
\]
Putting the estimates for $\mathrm{I}_p$, $\mathrm{II}_p$, and $\mathrm{III}_p$ together in the case that $p > B$ we get
\[
E_p \le \frac{l+5}{4}\ln(p)\,1_{\mathrm{ram}}(p) + \frac{l+1}{4}\ln(p)\,1_{\mathrm{ram}}(p) + 1_{\mathrm{ram}}(p)
\le \left(\frac{l+5}{4} + \frac{l+5}{4}\right)\ln(p)\,1_{\mathrm{ram}}(p)
= \frac{l+5}{2}\ln(p)\,1_{\mathrm{ram}}(p).
\]
To finish our result we use the Lemma just outside this proof environment. □

Lemma 8.4.1.
\[
\sum_{p \,\mid\, |\mathrm{Disc}(K/\mathbb{Q})|,\ p > B} \ln(p) \le 2 \left( \frac{\ln |\mathrm{Disc}(F/\mathbb{Q})|}{[F:\mathbb{Q}]} + \frac{\ln |\mathrm{Cond}(E/F)|}{[F:\mathbb{Q}]} \right) \tag{8.5}
\]

We have decided to label this theorem because it is a critical juncture where discriminants for $K$ meet conductors using Néron-Ogg-Shafarevich. This seems to be the critical step in relating the two.

Proof. The hard part of this formula is not getting too greedy, it seems. For $p > B$ we know that
\[
p \mid |\mathrm{Disc}(K/\mathbb{Q})| \iff p \mid |\mathrm{Disc}(F/\mathbb{Q})| \ \text{ or } \ p \mid |\mathrm{Cond}(E/F)|.
\]
It therefore suffices to bound each $\ln(p)$ by the $p$-part of the right hand side of (8.5), that is, by
\[
2 \left( \frac{-\ln |\mathrm{Disc}(F/\mathbb{Q})|_p}{[F:\mathbb{Q}]} + \frac{-\ln |\mathrm{Cond}(E/F)|_p}{[F:\mathbb{Q}]} \right).
\]
Note that we are using $p$-adic absolute values to take the $p$-parts of these integers. We observe that
\[
-\ln |\mathrm{Disc}(F/\mathbb{Q})|_p = \sum_{w \in V(F)_p} f(w/p)\, d(w/p) \ln(p), \qquad
-\ln |\mathrm{Cond}(E/F)|_p = \sum_{w \in V(F)_p} f(w/p)\, c_E(w) \ln(p),
\]
where we have used
\[
\mathrm{Cond}(E/F) = \prod_w P_w^{c_E(w)}, \qquad \mathrm{Disc}(F/\mathbb{Q}) = \prod_w P_w^{d(w/p_w)}.
\]
Since the ramification is tame when $p_w > B$, we have $d(w/p_w) = e(w/p_w) - 1$. Hence, it is enough to show that for each $p \mid |\mathrm{Disc}(K/\mathbb{Q})|$
\[
\frac{2 \sum_{w \mid p} \left( f(w/p)(e(w/p) - 1) + f(w/p)\, c_E(w) \right)}{[F:\mathbb{Q}]} \ge 1. \tag{8.6}
\]
Using that $p > B$ and $2(e(w/p) - 1) \ge e(w/p)$, together with the fact that $\sum_{w \in V(F)_p} f(w/p)\, e(w/p) = [F:\mathbb{Q}]$, we get
\[
\text{LHS of (8.6)} \ge \frac{[F:\mathbb{Q}] + 2\sum_{w \in V(F)_p} f(w/p)\, c_E(w)}{[F:\mathbb{Q}]}
= 1 + \frac{2 \sum_{w \in V(F)_p} f(w/p)\, c_E(w)}{[F:\mathbb{Q}]}.
\]
This proves the result. We note that it is strictly greater than one since the initial theta data hypothesis says that there is a non-empty set of primes in $V^{\mathrm{bad}}_{\mathrm{mod}}$ of bad reduction. □

8.5. Computation at Small Places.

Over the small places we have
\[
\sum_{p \le B} E_p \le (l+3) \ln(B)\, \pi(B) \asymp l^5 d. \tag{8.7}
\]
Here, for $f(x)$ and $g(x)$ positive functions of a single real variable, we write $f(x) \asymp g(x)$ as $x \to \infty$ if and only if $f(x) = O(g(x))$ and $g(x) = O(f(x))$ as $x \to \infty$.

Proof. In the situation where $p \le B_{l,d}$ we have worse bounds for $E_p$. We will not care so much about these bounds as they turn into the constant which appears in Szpiro's inequality. On some level, of course, we do care because we would like better constants. This is secondary to achieving some Szpiro though.

In the first term $\mathrm{I}_p$ we use
\[
I_{\vec{v}} \subset D_{\vec{v}}\left(0;\ p^{j+1} \prod_{i=0}^{j} \frac{e(v_i/p)}{\ln(p)\exp(1)}\right) \subset D_{\vec{v}}(0;\ p^{j+1} B^{j+1}).
\]
This gives
\[
\ln \mu_{\vec{v}}(I_{\vec{v}}) \le \log_p(p^{j+1} B^{j+1}) \ln(p) = (j+1) \ln(pB),
\]
which in turn gives (for $p$ ramified)
\[
E_p(\ln \mu_{\vec{v}}(I_{\vec{v}})) \le E((j+1)\ln(pB)) = \frac{l+5}{4} \ln(pB) \le \frac{l+5}{2} \ln(B),
\]
where the last inequality used $p \le B$.

For term $\mathrm{II}_p$ involving differents, we have
\[
E_p(\|\mathrm{diff}_{\vec{v}}\|_1 - \|\mathrm{diff}_{\vec{v}}\|_\infty) = \frac{l+1}{4}\, \mathrm{diff}_p \ln(p).
\]
Since $\mathrm{diff}(v/p) \le 1 - 1/e(v/p) + \mathrm{ord}_p(e(v/p))$ we get $\mathrm{diff}(v/p) \le 1 + \mathrm{ord}_p([K:\mathbb{Q}])$, which proves
\[
\mathrm{diff}_p \le \log_p\left(E(p^{1 + \mathrm{ord}_p([K:\mathbb{Q}])})\right) = \log_p\left(p^{1 + \mathrm{ord}_p([K:\mathbb{Q}])}\right) = 1 + \mathrm{ord}_p([K:\mathbb{Q}]).
\]
Hence we have
\[
E_p(\|\mathrm{diff}_{\vec{v}}\|_1 - \|\mathrm{diff}_{\vec{v}}\|_\infty) \le \frac{l+1}{4} \left(1 + \mathrm{ord}_p([K:\mathbb{Q}])\right) \ln(p) \le \frac{l+1}{2} \ln(B).
\]
Finally in term $\mathrm{III}_p$ we have
\[
E_p(1_{\mathrm{ram}}(\vec{v})) \le \left(1 - P_{\mathrm{unr},p}^{\frac{l+1}{4}}\right) \le 1_{\mathrm{ram}}(p).
\]
Putting the estimates for $\mathrm{I}_p$, $\mathrm{II}_p$, and $\mathrm{III}_p$ together we get
\[
\sum_{p \le B} E_p \le \sum_{p \le B} \left[ \frac{l+5}{2}\ln(B) + \frac{l+1}{2}\ln(B) + 1 \right] \le \left((l+3)\ln(B) + 1\right)\pi(B) \le (l+3)\ln(B)\,\pi(B).
\]
This gives our main result. The asymptotic is then derived by using bounds on the prime counting function $\pi(x)$. One such bound is Dusart's bound [Dus18], which states that, for $x$ above an explicit threshold and with an explicit numerical constant $c > 0$,
\[
\pi(x) \le \frac{x}{\ln(x)} \left(1 + \frac{c}{\ln(x)}\right). \tag{8.8}
\]
This then shows, using $B = 276480\, l^4 d$, that
\[
(l+3)\ln(B)\,\pi(B) \le (l+3)\, B \left(1 + \frac{c}{\ln(B)}\right) \asymp l^5 d
\]
as $l \to \infty$. □

Remark. Using a slightly better form of Dusart's bound gives a $1/\ln(B)$ correction term.

8.6. Proof of Explicit Szpiro.

We work from
\[
\widehat{\deg}_{\mathrm{lgp}}(P_\Theta) - \widehat{\deg}(P_q) \le \sum_p E_p. \tag{8.9}
\]
The left hand side of (8.9) becomes
\[
\left(\frac{l(l+1)}{12} - 1\right)\left(\frac{1}{2l}\right) \frac{\ln |\Delta^{\min}_{E/F}|}{[F:\mathbb{Q}]},
\]
and the right hand side of (8.9) becomes
\[
\sum_p E_p = \sum_{p \le B} E_p + \sum_{p > B} E_p + E_\infty
\le (l+3)\ln(B)\,\pi(B) + \frac{l+5}{2} \cdot \left( \frac{\ln |\mathrm{Disc}(F/\mathbb{Q})|}{[F:\mathbb{Q}]} + \frac{\ln |\mathrm{Cond}(E/F)|}{[F:\mathbb{Q}]} \right) + \left(\frac{l+5}{4}\right)\ln(\pi).
\]
We now divide both sides by $(l+5)$. The coefficient of the left hand side becomes
\[
\left(\frac{l(l+1)}{12} - 1\right)\left(\frac{1}{2l}\right)\left(\frac{1}{l+5}\right) = \frac{l^2 + l - 12}{24\, l\, (l+5)} =: \frac{1}{24 + \varepsilon_l},
\]
where solving for $\varepsilon_l$ gives
\[
\varepsilon_l = \frac{96(l+3)}{l^2 + l - 12}.
\]
After also multiplying through by $[F:\mathbb{Q}]$ we have
\[
\frac{1}{24 + \varepsilon_l} \ln |\Delta^{\min}_{E/F}| \le \left[\ln(B)\,\pi(B) + \ln(\pi)\right] [F:\mathbb{Q}] + \ln |\mathrm{Disc}(F/\mathbb{Q})| + \ln |\mathrm{Cond}(E/F)|. \tag{8.10}
\]
Finally, using $d_0 = 276480$ (the upper bound on $[F : F_0]$) so that $B = l^4 d_0 d$, we get
\[
\left[\ln(B)\,\pi(B) + \ln(\pi)\right] [F:\mathbb{Q}] \le \left( l^4 d_0 d \left(1 + \frac{c}{\ln(d_0)}\right) + \ln(\pi) \right) d_0 d \le A_0 d^2 l^4 + B_0 d,
\]
where $c$ is an explicit numerical constant coming from Dusart's bound, $A_0 = 84372107405$, and $B_0 = 316495$. This gives our result after rewriting (8.10) multiplicatively with new bounds.

References

[AM69] Michael Atiyah and Ian Macdonald,

Introduction to commutative algebra, Addison Wesley, 1969.
[Aut19] Stacks Project Authors, Stacks project, 2019.
[Con] Keith Conrad, Differents, Notes of course, available on-line.
[DH20a] Taylor Dupuy and Anton Hilado, Log-Kummer Correspondences and Mochizuki's Third Indeterminacy, pre-print (2020).
[DH20b] Taylor Dupuy and Anton Hilado, The Statement of Mochizuki's Corollary 3.12, Initial Theta Data, and the First Two Indeterminacies.
[Dus18] Pierre Dusart, Explicit estimates of some functions over primes, Ramanujan J. (2018), no. 1, 227–251. MR 3745073
[Fes15] Ivan Fesenko, Arithmetic deformation theory via arithmetic fundamental groups and nonarchimedean theta-functions, notes on the work of Shinichi Mochizuki, Eur. J. Math. (2015), no. 3, 405–440. MR 3401899
[Hos15] Yuichiro Hoshi, IUT Hodge-Arakelov-theoretic evaluation, 2015.
[Hos17] Yuichiro Hoshi, [IUTchIII-IV] from the point of view of mono-anabelian transport, 2017.
[Hos18] Yuichiro Hoshi, Introduction to mono-anabelian geometry, 2018.
[Ked15] Kiran Kedlaya, Etale theta function, 2015.
[Moc15a] Shinichi Mochizuki, Inter-universal Teichmüller theory III: Canonical splittings of the log-theta-lattice, RIMS preprint (2015).
[Moc15b] Shinichi Mochizuki, Inter-universal Teichmüller theory IV: log-volume computations and set-theoretic foundations, RIMS preprint (2015).
[Moc15c] Shinichi Mochizuki, Topics in absolute anabelian geometry III: global reconstruction algorithms, J. Math. Sci. Univ. Tokyo (2015), no. 4, 939–1156. MR 3445958
[Moc17] Shinichi Mochizuki, The mathematics of mutually alien copies: From Gaussian integrals to inter-universal Teichmuller theory, 2017.
[Moc18] Shinichi Mochizuki, Comments on the manuscript (2018-08 version) by Scholze-Stix concerning inter-universal Teichmuller theory (iutch), 2018.
[Mok15] Chung Pang Mok, Notes on Hodge theaters (for the 2015 Oxford workshop), Handwritten Notes, 2015.
[Neu99] Jürgen Neukirch, Algebraic number theory, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 322, Springer-Verlag, Berlin, 1999, Translated from the 1992 German original and with a note by Norbert Schappacher, With a foreword by G. Harder. MR 1697859
[Rob00] Alain M. Robert, A course in p-adic analysis, Graduate Texts in Mathematics, vol. 198, Springer-Verlag, New York, 2000. MR 1760253
[Rob 3] David Roberts, A crisis of identification, Inference Review (2019 in Volume 4, Issue 3).
[Sil09] Joseph H Silverman, The arithmetic of elliptic curves, vol. 106, Springer Science & Business Media, 2009.
[Sil13] Joseph H Silverman, Advanced topics in the arithmetic of elliptic curves, vol. 151, Springer Science & Business Media, 2013.
[SS17] Peter Scholze and Jakob Stix, Why abc is still a conjecture, 2017.
[ST68] Jean-Pierre Serre and John Tate, Good reduction of abelian varieties, Ann. of Math. (2) (1968), 492–517. MR 0236190
[Sti15] Jakob Stix, Reconstruction of fields using Belyi cuspidalization, 2015.
[Sut15] Drew Sutherland, Notes for 18.785 - number theory i, MIT course notes (2015).
[Tan18] Fucheng Tan, Note on IUT, 2018.
[Yam17] Go Yamashita, A proof of the ABC conjecture after Mochizuki, RIMS preprint (2017).
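
Numerical cross-check for the proof of Theorem 8.2.1. The algebra that defines $\varepsilon_l$ in Section 8.6 can be verified in exact rational arithmetic. The sketch below assumes the reading $\left(\frac{l(l+1)}{12} - 1\right)\frac{1}{2l}\frac{1}{l+5} = \frac{1}{24 + \varepsilon_l}$ with $\varepsilon_l = \frac{96(l+3)}{l^2 + l - 12}$, and it also checks that the constant $316495$ is what one gets by rounding $\ln(\pi) \cdot 276480$ up to an integer. That second interpretation is an assumption about how the constant arises, not a statement taken from the text. All function names are hypothetical.

```python
from fractions import Fraction
import math


def lhs_coefficient(l):
    """Coefficient of ln|Delta_min| after dividing by (l + 5) (assumed form)."""
    return (Fraction(l * (l + 1), 12) - 1) * Fraction(1, 2 * l) * Fraction(1, l + 5)


def epsilon(l):
    """epsilon_l = 96 (l + 3) / (l^2 + l - 12); requires l > 3."""
    return Fraction(96 * (l + 3), l * l + l - 12)


def identity_holds(l):
    """Check that the coefficient equals 1 / (24 + epsilon_l) exactly."""
    return lhs_coefficient(l) == 1 / (24 + epsilon(l))


def rounded_log_pi_constant(d0=276480):
    """ln(pi) * d0 rounded up; d0 = 276480 is the degree bound used in the proof."""
    return math.ceil(math.log(math.pi) * d0)


if __name__ == "__main__":
    for l in (5, 7, 11, 13, 17, 101):
        print(l, epsilon(l), identity_holds(l))
    print(rounded_log_pi_constant())
```

Since $\varepsilon_l$ decreases to $0$ as $l$ grows, the exponent $24 + \varepsilon_l$ in (8.1) approaches $24$ from above for large $l$.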