Linear equations over multiplicative groups, recurrences, and mixing I
H. Derksen* and D. Masser
Abstract.
Let $K$ be a field of positive characteristic. When $V$ is a linear variety in $K^n$ and $G$ is a finitely generated subgroup of $K^*$, we show how to compute the set $V \cap G^n$ effectively using heights. We calculate all the estimates explicitly. A special case provides the effective solution of the $S$-unit equation in $n$ variables.
1. Introduction.
In 2004 the second author published a paper [Mass] about linear equations over multiplicative groups in positive characteristic. This was specifically aimed at an application to a problem about mixing for dynamical systems of algebraic origin, and, as a result about linear equations, it lacked some of the simplicity of the classical results in zero characteristic. A new feature was the appearance of $n -$ […] $n$ is the number of variables.

In 2007 the first author published a paper [D] about recurrences in positive characteristic. He proved an analogue of the famous Skolem-Lech-Mahler Theorem in zero characteristic. A new feature was the appearance of integer sequences involving combinations of $d -$ […] $d$ is the order of the recurrence.

It turns out that these two new features are identical. In positive characteristic the vanishing of a recurrence with $d$ terms can be regarded as a linear equation in $d -$ […] $n -$ […] $d -$ […] $K$ we write $K^*$ for the multiplicative group of all non-zero elements of $K$. For any subgroup $G$ of $K^*$ and a positive integer $n$ it makes sense to write $\mathbf{P}_n(G)$ for the set of points in projective space defined over $G$.

Theorem A (Evertse [E], van der Poorten-Schlickewei [PS]).
Let $K$ be a field of zero characteristic, and for $n \geq 2$ let $a_0, \ldots, a_n$ be non-zero elements of $K$. Then for any finitely generated subgroup $G$ of $K^*$ the equation
$$a_0X_0 + a_1X_1 + \cdots + a_nX_n = 0 \tag{1.1}$$
has only finitely many solutions $(X_0, X_1, \ldots, X_n)$ in $\mathbf{P}_n(G)$ which satisfy
$$\sum_{i \in I} a_iX_i \neq 0 \tag{1.2}$$
for every non-empty proper subset $I$ of $\{0, 1, \ldots, n\}$.

We should point out that this remains true even when $G$ is not finitely generated but has finite $\mathbf{Q}$-dimension. See also a recent paper [EZ] of Evertse and Zannier for an interesting function field version.

Theorem A is false in positive characteristic $p$; for example in inhomogeneous form for $n = 2$ the equation
$$x + y = 1 \tag{1.3}$$
has a solution $x = t$, $y = 1 - t$ over the group $G$ in $K = \mathbf{F}_p(t)$ generated by $t$, $1 - t$; and so thanks to Frobenius infinitely many solutions
$$x = t^{p^e}, \quad y = 1 - t^{p^e} = (1 - t)^{p^e} \quad (e = 0, 1, 2, \ldots) \tag{1.4}$$
[…] $H$ defined by equation (1.1) to proper linear subvarieties defined by the vanishing of the left-hand sides in (1.2).

We can iterate this descent by introducing special varieties $T$ defined solely by binary equations of the shape $X_i = aX_j$ ($i \neq j$, $a \neq 0$). For example $T$ could be a single point or, when there are no equations at all, the full $\mathbf{P}_n$. We could call such varieties linear cosets or just cosets. This word has a group-theoretical connotation, and indeed $T$ above is a translate of a group subvariety of the multiplicative group $\mathbf{G}_m^n$ in $\mathbf{P}_n$. Conversely it is not difficult to see that every linear translate of a group subvariety of $\mathbf{G}_m^n$ is a coset in our sense (see for example Lemma 9.4 p.76 of [BMZ]). But we will in this paper make no use of these remarks or indeed hardly any further reference to group varieties.

Anyway, it is easily seen that the complete descent yields a finite collection of cosets $T$, each contained in the original $H$, such that the full solution set $H(G) = H \cap \mathbf{P}_n(G)$ coincides with the union of all $T(G) = T \cap \mathbf{P}_n(G)$.
This is a little closer to the more general context of Mordell-Lang (see below). No further descent from $T(G)$ in terms of proper subvarieties is possible; by way of compensation it is very simple to describe $T(G)$ explicitly (see for example the discussion towards the end of section 12).

In positive characteristic we can establish a descent step similar to Theorem A, but it may involve Frobenius as in (1.4). This less simple situation makes the iteration more problematic, and for this reason it is clearer to present our result as a descent now from an arbitrary linear variety $V$ to proper linear subvarieties.

However the Frobenius does not always generate infinitely many solutions. It does above for $x + y = 1$, and also for
$$t^mx + y = 1 \tag{1.5}$$
[…] $t^mx$; this is because $t$ lies in $G$. The situation is slightly more subtle for (1.5) over the group $G_l$ generated by $t^l$ and $1 - t$; the above solution of (1.3) certainly leads to solutions
$$x = t^{-m}t^{p^e}, \quad y = (1 - t)^{p^e} \quad (e = 0, 1, 2, \ldots), \tag{1.6}$$
but these are not over $G_l$ unless $p^e \equiv m \bmod l$. This can however happen for infinitely many $e$ but not necessarily all $e$ in (1.6). This time $t$ may not lie in $G_l$ but some positive power does. Finally the equation $(1 + t)x + y = 1$ has a solution $x = 1 - t$, $y = t^2$ over $G$, but the use of Frobenius will bring in an extra $1 + t$, no positive power of which is in $G$ (provided $p \neq 2$).

These considerations lead naturally to the radical $\sqrt{G} = \sqrt[K]{G}$ for general $G$ in general $K^*$. For us this remains in $K$; thus it is the set of $\gamma$ in $K$ for which there exists a positive integer $s$ such that $\gamma^s$ lies in $G$. Usually $K$ will be finitely generated over its prime field, and then it is well-known that the finite generation of $G$ is equivalent to that of $\sqrt{G}$. We also see the need for some concept of isotriviality, already present in diophantine geometry at least since Néron's 1952 proof of the relative Mordell-Weil Theorem and Manin's 1963 proof of the relative Mordell Conjecture.
In our linear context the appropriate refinement is $G$-isotriviality, introduced by Voloch [V] for $n = 2$.

Namely, let $K$ be a field of positive characteristic $p$, and for $n \geq 2$ let $V$ be a linear variety in $\mathbf{P}_n$ defined over $K$. We say that $V$ is $G$-isotrivial if there is an automorphism $\psi$ of $\mathbf{P}_n(K)$, defined by
$$\psi(X_0, \ldots, X_n) = (g_0X_0, \ldots, g_nX_n) \tag{1.7}$$
for $g_0, \ldots, g_n$ in $G$, such that $\psi(V)$ is defined over the algebraic closure $\overline{\mathbf{F}_p}$. Such a $\psi$ could be called a $G$-automorphism. Let us write $\mathbf{F}_K$ for $\overline{\mathbf{F}_p} \cap K$; then of course $\psi(V)$ is defined over $\mathbf{F}_K$. So $\psi(V)$ is defined over some $\mathbf{F}_q$; and now a point $w$ on $V$ defined over $G$ gives $\psi(w)$ on $\psi(V)$ which by Frobenius leads to points $\psi(w)^{q^e}$ $(e = 0, 1, 2, \ldots)$ on $\psi(V)$ and so
$$\psi^{-1}(\psi(w)^{q^e}) \quad (e = 0, 1, 2, \ldots) \tag{1.8}$$
on $V$, all still defined over $G$.

Of course points over $G$ are nothing other than zero-dimensional $G$-isotrivial varieties.

Here is a preliminary version of our main descent step on linear equations. For $V$ as above write $V(G) = V \cap \mathbf{P}_n(G)$ for the set of points of $V$ defined over $G$. But it is clearer first to consider points over the radical $\sqrt{G}$.

Descent Step over $\sqrt{G}$. Let $K$ be a field of positive characteristic, and suppose that the positive-dimensional linear variety $V$ defined over $K$ is not a coset. Suppose also that $\sqrt{G}$ in $K$ is finitely generated. Then there is an effectively computable finite collection $\mathcal{W}$ of proper $\sqrt{G}$-isotrivial linear subvarieties $W$ of $V$, also defined over $K$, with the following property.

(a) If $V$ is not $\sqrt{G}$-isotrivial, then
$$V(\sqrt{G}) = \bigcup_{W \in \mathcal{W}} W(\sqrt{G}).$$

(b) If $V$ is $\sqrt{G}$-isotrivial and $\psi(V)$ is defined over $\mathbf{F}_q$, then
$$V(\sqrt{G}) = \psi^{-1}\left(\bigcup_{W \in \mathcal{W}} \bigcup_{e=0}^{\infty} \left(\psi(W)(\sqrt{G})\right)^{q^e}\right).$$
Thus (a) says that the points of $V(\sqrt{G})$ are not Zariski-dense in $V$; and (b) says that the points on $V(\sqrt{G})$ like (1.8), which can be dense, at least arise from a set of $w$ which is not dense.

Part (a) was essentially proved for $n = 2$ as Theorem 1 by Voloch [V] (p.196), and his Theorem 2 (p.198) even covers the more general case of finite $\mathbf{Q}$-dimension; here one gets the finiteness of the solution set. A forerunner of part (b) for $n = 2$ can be seen in Mason [Maso] (pp. 107,108). The main result of [Mass] is restricted to a single equation (1.1) and is expressed in terms of a concept of "broad" set; as we do not need this result here (or even the concept) we refrain from quoting it. However these authors do not discuss the effectivity in our sense (see the discussion below).

A simple example of (b) in inhomogeneous form is (1.3); this represents a line $L$, clearly isotrivial and even trivial in that we can take $\psi$ as the identity automorphism. When $G$ is generated by $t$ and $1 - t$ in $K = \mathbf{F}_p(t)$, then $\sqrt{G}$ is obtained by adding the elements of $\mathbf{F}_p^*$ as generators. Leitner [Le] has found that for $p \geq$ […] $p + 4$ points $W$, six of which are like $w = (t, 1 - t)$ in (1.4) and the remaining $p - 2$ are $w = (x, 1 - x)$ for $x = 2, 3, \ldots, p - 1$.

[…] $V(\sqrt{G})$. In the analogous characterization of $V(G)$ there is no longer a clear separation of cases. In fact it can happen in case (b) above that the actions of Frobenius through $q^e$ can get truncated, so that each $e$ remains bounded; but then it is easy to reduce this to case (a). A simple example is (1.5) for $m = 1$ in the group $G = G_l$ above for $l = p$, when the solutions (1.6) are over $G$ only when $e = 0$. Here is a general statement.

Descent Step over $G$. Let $K$ be a field of positive characteristic, and suppose that the positive-dimensional linear variety $V$ defined over $K$ is not a coset. Suppose also that $\sqrt{G}$ in $K$ is finitely generated.
Then there is an effectively computable finite collection $\mathcal{W}$ of proper $\sqrt{G}$-isotrivial linear subvarieties $W$ of $V$, also defined over $K$, such that either
$$V(G) = \bigcup_{W \in \mathcal{W}} W(G)$$
or
$$V(G) = \psi^{-1}\left(\bigcup_{W \in \mathcal{W}} \bigcup_{e=0}^{\infty} \left(\psi(W)(G)\right)^{q^e}\right)$$
for some $q$ and some $\sqrt{G}$-automorphism $\psi$ with $\psi(V)$ defined over $\mathbf{F}_q$.

It may be instructive here to consider the inhomogeneous example
$$x + y - z = 1 \tag{1.9}$$
over the group $G$ in $K = \mathbf{F}_p(t)$ generated by $t$, $1 - t$. Now (1.9) represents a plane $P$, also isotrivial and even trivial. Leitner [Le] has found that for $p \geq$ […] $W$ and 8 points $W$. For example the line defined by
$$tx + y = 1, \quad z = (1 - t)x \tag{1.10}$$
[…] $x = z$, $y = 1$. And so is the point $x = t$, $y = \frac{1 - t}{t}$, $z = \frac{(1 - t)^2}{t}$.

We can easily iterate the descent from (1.10). This is isotrivial via the automorphism $\psi$ taking $x, y, z$ to $\tilde{x} = tx$, $\tilde{y} = y$, $\tilde{z} = \frac{t}{1 - t}z$, when the equations become $\tilde{x} + \tilde{y} = 1$, $\tilde{z} = \tilde{x}$. Now (1.4) (with $e$ replaced by $f$) on (1.3) lead to the points $w = (x, y, z)$ of $W(G)$ with
$$x = t^{p^f - 1}, \quad y = (1 - t)^{p^f}, \quad z = t^{p^f - 1}(1 - t) \quad (f = 0, 1, 2, \ldots).$$
Then from (1.8) (with $q = p$ and the identity automorphism) we get the points
$$x = t^{(q-1)r}, \quad y = (1 - t)^{qr}, \quad z = t^{(q-1)r}(1 - t)^r \tag{1.11}$$
of $P(G)$; here $q = p^f$ and $r = p^e$ now indicate independently varying powers of $p$. This is precisely the example in [Mass] (p.202).

With the help of a suitable notation we can after all do the complete descent, also for linear varieties that are cosets; then the latter arise solely as obstacles. Denote by $\varphi = \varphi_q$ the Frobenius with $\varphi(x) = x^q$. Let $\psi_1, \ldots, \psi_h$ be projective automorphisms. Then we imitate commutator brackets by defining the operator
$$[\psi_1, \ldots, \psi_h] = [\psi_1, \ldots, \psi_h]_q = \bigcup_{e_1=0}^{\infty} \cdots \bigcup_{e_h=0}^{\infty} (\psi_1^{-1}\varphi^{e_1}\psi_1) \cdots (\psi_h^{-1}\varphi^{e_h}\psi_h), \tag{1.12}$$
[…] $h = 0$. This formally resembles Definition 7.7 of [D] (p.208).

Theorem 1.
Let $K$ be a field of positive characteristic $p$, let $V$ be an arbitrary linear variety defined over $K$, and suppose that $\sqrt{G}$ in $K$ is finitely generated. Then there is a power $q$ of $p$ such that $V(G)$ is an effectively computable finite union of sets $[\psi_1, \ldots, \psi_h]_q T(G)$ with $\sqrt{G}$-automorphisms $\psi_1, \ldots, \psi_h$ $(0 \leq h \leq n - 1)$, and cosets $T$ contained in $V$.

Here we see quite clearly the $n -$ […] $x_1 + x_2 - x_3 - \cdots - x_n = 1$ generalizes (1.3) and (1.9), and it can be used to show that the upper bound $n - 1$ […] $d -$ […] $e_1 = 1$ in (1.12) and all other zero, we see that $\psi_1^{q-1}$ is a $G$-automorphism. Similarly for $\psi_2^{q-1}, \ldots, \psi_h^{q-1}$. However it may not always be possible to choose $\psi_1, \ldots, \psi_h$ as $G$-automorphisms. This we also prove in section 13.

We can also symmetrize the sets in Theorem 1. We explain this with the points (1.11) on $P$ defined by (1.9). They can be written as
$$x = t^{s-r}, \quad y = (1 - t)^s, \quad z = t^{s-r}(1 - t)^r \tag{1.13}$$
for $s = qr$. Here there is asymmetry because apparently $r$ divides $s$. However (1.13) has a meaning for any independent positive powers $r, s$ of $p$; and it is easily checked that the resulting points remain on $P$.

To formulate this in general we introduce another bracket notation more related to the group law. For points $\pi_0, \pi_1, \ldots, \pi_h$ we define the set
$$(\pi_0, \pi_1, \ldots, \pi_h) = (\pi_0, \pi_1, \ldots, \pi_h)_q = \pi_0 \bigcup_{l_1=0}^{\infty} \cdots \bigcup_{l_h=0}^{\infty} (\varphi^{l_1}\pi_1) \cdots (\varphi^{l_h}\pi_h), \tag{1.14}$$
[…] $\pi_0$ itself if $h = 0$. We introduce more special varieties $S$ defined solely by binary equations of the shape $X_i = X_j$. For example $S$ could be the single point with all coordinates equal or the full $\mathbf{P}_n$. We could call such varieties linear subgroups or just subgroups. As above it is not difficult to see that they are precisely the linear group subvarieties of $\mathbf{G}_m^n$, but again we don't need to know this.

Theorem 2.
Let $K$ be a field of positive characteristic $p$, let $V$ be an arbitrary linear variety defined over $K$, and suppose that $\sqrt{G}$ in $K$ is finitely generated. Then there is a power $q$ of $p$ such that $V(G)$ is an effectively computable finite union of sets $(\pi_0, \pi_1, \ldots, \pi_h)_q S(G)$ with points $\pi_0, \pi_1, \ldots, \pi_h$ $(0 \leq h \leq n - 1)$ defined over $\sqrt{G}$ and subgroups $S$.

As in Theorem 1, the upper bound $n - 1$ […] $\pi_1^{q-1}, \pi_2^{q-1}, \ldots, \pi_h^{q-1}$ (as well as the product $\pi_0\pi_1 \cdots \pi_h$) are defined over $G$. However this may not always be true of $\pi_0, \pi_1, \ldots, \pi_h$, as we shall also prove in section 13.

When $V$ is a hyperplane defined by (1.1) we can even descend to points, provided we restrict to (1.2) in the style of Theorem A.

Theorem 3.
Let $K$ be a field of positive characteristic $p$, let $H$ be defined by
$$a_0X_0 + a_1X_1 + \cdots + a_nX_n = 0$$
for non-zero $a_0, a_1, \ldots, a_n$ in $K$, and write $H^*(G)$ for the set of points in $\mathbf{P}_n(G)$ satisfying
$$\sum_{i \in I} a_iX_i \neq 0$$
for every non-empty proper subset $I$ of $\{0, 1, \ldots, n\}$. Suppose that $\sqrt{G}$ in $K$ is finitely generated. Then there is a power $q$ of $p$ such that $H^*(G)$ is contained both in (1) an effectively computable finite union of sets $[\psi_1, \ldots, \psi_h]_q\{\tau\}$ in $H(G)$ with $\sqrt{G}$-automorphisms $\psi_1, \ldots, \psi_h$ $(0 \leq h \leq n - 1)$ and points $\tau$, and in (2) an effectively computable finite union of sets $(\pi_0, \pi_1, \ldots, \pi_h)_q$ in $H(G)$ with points $\pi_0, \pi_1, \ldots, \pi_h$ $(0 \leq h \leq n - 1)$.

We do not prove it here, but in this situation $H^*(G)$ is precisely a finite union of $[\psi_1, \ldots, \psi_h]_q\{\tau\}$. However there seems to be a strange asymmetry between the asymmetric part (1) and the symmetric part (2). Namely it seems improbable that $H^*(G)$ is precisely a finite union of $(\pi_0, \pi_1, \ldots, \pi_h)_q$. For example, the point (1.13) on $H$ defined by (1.9) is in $H^*(G)$ except for $r = s$, which disturbs the independence of $r$ and $s$.

Apart from the work [V] already mentioned, there are other results of this kind, now in the more general context of Mordell-Lang for arbitrary varieties $V$ inside arbitrary semiabelian varieties $S$. Typically here one intersects $V$ with a finitely generated subgroup $\Gamma$ of $S$; however in the present paper with $S = \mathbf{G}_m^n$ we have for simplicity restricted $\Gamma$ to a Cartesian product $G^n$.

Thus the main result Theorem A (p.104 - see also p.109) of Abramovich and Voloch [AV] almost implies part (a) of our Descent Step over $\sqrt{G}$, except that they assume that $V$ is not $K^*$-isotrivial and they have no information about $W$ which would ensure linearity in our situation. The main result Theorem 1.1 (p.667) of Hrushovski's well-known paper [Hr] gives a similar implication. The restriction to our (a) corresponds to their restriction to the non-isotrivial case.
Again these authors do not discuss the effectivity in our sense.

After earlier work by Scanlon, the isotrivial case was treated by Moosa and Scanlon. Their Theorem B (p.477) of [MS2] implies that our $V(G)$ is what they call an $F$-set (see also [MS1]). Indeed in our situation and notation an $F$-set is nothing other than a finite union of $(\pi_0, \pi_1, \ldots, \pi_h)_q A(G)$ with $\pi_0\pi_1 \cdots \pi_h$ and $\pi_1^{q-1}, \pi_2^{q-1}, \ldots, \pi_h^{q-1}$ defined over $G$ and an algebraic subgroup $A$. However they do not prove the bound $h \leq n - 1$ […] $A$ which would imply that it is linear because our $V$ is. Their ideas were developed by Ghioca [Gh], who in addition extended the results to Drinfeld modules. See also the work [GM] of Ghioca and Moosa on division groups. Again there is no mention of effectivity.

Now let us discuss this effectivity, a key aspect of the present paper.

It is well-known that Theorem A (in zero characteristic) is semieffective in the sense that effective and even explicit upper bounds for the number of solutions of (1.1) subject to (1.2) can be found. However it is not fully effective in the sense that no upper bounds are known for the size of the solutions, even in very simple cases like $K = \mathbf{Q}$ and $G$ generated by 3, 5, 7; and it is even unknown how to find all the finitely many non-negative integers $a, b, c$ satisfying an equation like
$$3^a + 5^b - 7^c = 1.$$

Out of the works in positive characteristic quoted above, only two discuss effectivity, and then only semieffectivity in the sense above. Voloch [V] in the theorems mentioned above gives explicit upper bounds for the cardinality of $V(G)$ for $n = 2$ in case (a) of Theorem 1; these are uniform in the sense that they are independent of $V$ and further they depend on $G$ only with regard to its rank. A similarly uniform bound is given as Theorem 6.1 (p.687) by Hrushovski [Hr] for $V$ in an abelian variety; however as it stands it is not completely explicit due to the use of non-standard analysis.
These bounds are in9ine with the well-known estimates in zero characteristic - see for example Theorem 1.1 of[ESS] (p.808).By contrast our results above are fully effective. This should be no surprise; forexample it is rather easy by differentiating to find all non-negative integers a, b, c with(3 + t ) a + (5 + t ) b − (7 + t ) c = 1in any fixed K = F p ( t ). We shall work out explicit bounds, at first for the Descent Stepover √ G , where the exponents appearing can be reasonably small; and then for the DescentStep over G and Theorems 1,2 and 3. It would then be a straightforward matter to deducebounds for the various cardinalities involved; but more work may be needed to make theseuniform in the sense above.In fact the size bounds cannot be uniform in this sense. For example from the non-isotrivial equation x + ay = 1 with a = − t m (1 − t ) m ( m = p e ) over the group generated by t and 1 − t in F p ( t ), with solution x = t m , y = (1 − t ) m , we can easily show that the size ofsolutions for fixed G must depend on V . Similarly the isotrivial equation x + y = 1 overthe group generated by t m and (1 − t ) m in F p ( t ), with the same solution, demonstratesthat the size of solutions for fixed V must depend on more than just the rank of G .Because all our varieties are linear, we can measure them in a traditional way in termsof certain heights on the Grassmannian. We will show for example in the Descent Stepover √ G that h ( W ) ≤ Ch ( V ) n (1 . W is no longer required to be √ G -isotrivial, where C depends only on K, n and G . Ifwe insist on W being √ G -isotrivial, then the exponent is not so small. The well-knownNorthcott Property of heights often implies that the set of W in (1.15) is finite and easilyeffectively computable.Perhaps since the results in zero characteristic are not effective, there is no traditionabout measuring the groups Γ, even in S = G n m . Because our Γ = G n , it is here possibleto use a basis-free notion of regulator R ( G ). 
We will show that the bounds, at least when G = √ G , are all of polynomial growth in R ( G ). For example in (1.15) we get C ≤ cR ( G ) n +2 again if W is no longer required to be √ G -isotrivial, where c now depends only on K, n and the rank r of G . In fact here c = 8 n d (10 n ( n + r ) n + r ) ) n +1 d depending only mildly on K ; for example d = 1 if K is a field of rational functionsin several independent variables over a finite field.However we did find it a small surprise to discover that when G = √ G the smallestbounds can be exponential in R ( G ). A hint of this can be seen from the above discussionof (1.5) and G l . For example the simplest solution of the equation t x + y = 1with x, y in the group generated by t and 1 − t in F ( t ) is x = ( t ) , y = (1 − t ) ; (1 . √
3. For an explanation see the end of section 11.

In section 12 we estimate the heights (in a natural sense) of all the quantities occurring in our Theorems. The bounds are polynomial in $h(V)$ and $R(G)$ if $G = \sqrt{G}$; but otherwise they may involve an extra, possibly unavoidable, exponential dependence on $R(G)$. Here too there is a Northcott Property to ensure effectivity.

At first sight it may seem that the methods of [Mass] and [D] are unrelated. But there are close connections, and we give some hints of this in our exposition. Here we mention just that [Mass] works with derivatives and [D] works with $p$-automata and "free Frobenius splitting". For example over $\mathbf{F}_p(t)$, [Mass] (p.196) has $\delta_i = (\frac{d}{dt})^i$ $(i = 0, \ldots, p - 1)$ while [D] (p.198) splits $\mathbf{F}_p(t)$ into a direct sum of one-dimensional $\mathbf{F}_p(t^p)$-subspaces $V_i$ $(i = 0, \ldots, p - 1)$ and considers the associated projections $\lambda_i$. In the natural case $V_i = t^i\mathbf{F}_p(t^p)$ one checks easily that the vectors $(\delta_0, t\delta_1, \ldots, t^{p-1}\delta_{p-1})$ and $(\lambda_0, \lambda_1, \ldots, \lambda_{p-1})$ are connected via an invertible matrix over $\mathbf{F}_p$. So in some sense differentiating is equivalent to projecting. We can also quote Hrushovski [Hr] (p.669) "Distinguishing a basis for $K/K^p$ has the effect of fixing also a stack of Hasse derivations." As a matter of fact we do not use Hasse derivations in this paper (see the remarks at the end of section 5).

Here is a brief section-by-section account of what follows.

We begin in section 2 by explaining heights. Then in section 3 we introduce derivations, and we use all this to give preliminary effective versions of the two main technical results of [Mass] about dependence over the field of differential constants.

In section 4 we explain regulators, and in section 5 we use these to refine the work of section 3.

Then section 6 contains a technical result which enables us to identify isotriviality, and in section 7 we record some observations about automorphisms and heights of varieties $V$. We are now in a position in section 8 to make effective the main argument of [Mass] yielding the subvarieties $W$, at least for points over $\sqrt{G}$ and when $V$ is either a hyperplane or trivial. We treat general $V$ in section 9 but omitting the isotriviality of the $W$. This omission is then remedied in section 10 with a simple inductive argument, and in section 11 we show how to treat points over $G$. We can then in section 12 prove effective versions of our Descent Steps and Theorems.

Finally in section 13, as already mentioned, we show that various aspects of our results cannot be further improved.

We would also like to draw attention to a very recent manuscript [AB] of Adamczewski and Bell for further work in the context of $p$-automata; in particular this covers also equations (1.1) and recurrences.
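The Frobenius families (1.4) and (1.11) from the Introduction can be checked by direct computation. The following is only an illustrative sketch in Python, not part of the paper's method: polynomials in $\mathbf{F}_p[t]$ are stored as coefficient lists (lowest degree first), and the helper names `add`, `sub`, `mul`, `power`, `check_1_4` and `check_1_11` are our own hypothetical choices.

```python
# Polynomials over F_p are lists of coefficients, lowest degree first.
# All function names here are illustrative assumptions, not from the paper.

def trim(a):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    a = list(a)
    while a and a[-1] == 0:
        a.pop()
    return a

def add(a, b, p):
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return trim([(x + y) % p for x, y in zip(a, b)])

def sub(a, b, p):
    return add(a, [(-y) % p for y in b], p)

def mul(a, b, p):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return trim(out)

def power(a, n, p):
    """Square-and-multiply exponentiation in F_p[t]."""
    result, base = [1], trim(a)
    while n:
        if n & 1:
            result = mul(result, base, p)
        base = mul(base, base, p)
        n >>= 1
    return result

def check_1_4(p, e):
    """(1.4): x = t^(p^e), y = (1 - t)^(p^e) should satisfy x + y = 1."""
    t, one_minus_t = [0, 1], [1, p - 1]
    x = power(t, p ** e, p)
    y = power(one_minus_t, p ** e, p)
    return add(x, y, p) == [1]

def check_1_11(p, f, e):
    """(1.11): with q = p^f and r = p^e the point should satisfy x + y - z = 1."""
    q, r = p ** f, p ** e
    t, one_minus_t = [0, 1], [1, p - 1]
    x = power(t, (q - 1) * r, p)
    y = power(one_minus_t, q * r, p)
    z = mul(x, power(one_minus_t, r, p), p)
    return sub(add(x, y, p), z, p) == [1]
```

Calling `check_1_4(3, 2)` or `check_1_11(3, 2, 1)` exercises the two families for small parameters. Replacing the exponent in `check_1_4` by a number that is not a power of $p$ makes the identity fail, which is the point of the Frobenius phenomenon.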
2. Heights.
The Theorems above for arbitrary fields can easily be reduced to the case when the field is finitely generated over its ground field $\mathbf{F}_p$ (see section 12 below). In general let $K$ be finitely generated over a subfield $k$ in any characteristic. We shall define heights on $K$ relative to $k$; thus we suppose that $K$ is a transcendental extension of $k$. Here we do not know any basis-free notion of height, and thus we choose a transcendence basis $B$ of $K$ over $k$ with elements $t_1, \ldots, t_b$ regarded as independent variables over $k$. The height $\tilde{h}(a) = \tilde{h}_B(a)$ of an element $a \neq 0$ of $k[B] = k[t_1, \ldots, t_b]$ will be its total degree $\deg a$ regarded as a polynomial; also $\tilde{h}(0) = 0$. The height can be extended to an element $x$ of the quotient field $k(B) = k(t_1, \ldots, t_b)$ by writing $x = a_1/a_0$ for coprime polynomials $a_0, a_1$ in $k[B]$ and defining
$$\tilde{h}(x) = \tilde{h}_B(x) = \max\{\deg a_0, \deg a_1\}. \tag{2.1}$$
We now extend this to $K$. This is a standard matter using valuations.

There is a valuation on $k[B]$ corresponding to total degree and defined by $|a|_\infty = \exp(\deg a)$ $(a \neq 0)$; and of course $|0|_\infty = 0$. This extends at once to $k(B)$ by multiplicativity. And for every irreducible $p$ in $k[B]$ there is a valuation defined on $k[B]$ by $|a|_p = \exp(-\omega_p(a)\deg p)$ $(a \neq 0)$, where $\omega_p(a)$ is the exact power of $p$ dividing $a$; and again $|0|_p = 0$. And it too extends to $k(B)$ by multiplicativity. Using $v$ to run over $\infty$ and all $p$, we have the product formula $\prod_v |x|_v = 1$ $(x \neq 0)$ and the height formula $\tilde{h}(x) = \log \prod_v \max\{1, |x|_v\}$.

Now $K$ is a finite extension of $k(B)$, say of degree $d$. Thus each valuation $v$ has finitely many extensions $w$ to $K$, written $w \mid v$. In fact
$$|x|_w = |N(x)|_v^{1/d_w}, \tag{2.2}$$
where $N$ is the norm from the completion $K_w$ to the completion $k(B)_v$ and $d_w$ is the relative degree. We also have $\sum_{w \mid v} d_w = d$. Now the product formula
$$\prod_w |x|_w^{d_w} = 1 \quad (x \neq 0)$$
holds. Further the formula
$$\tilde{h}(x) = \frac{1}{d}\log \prod_w \max\{1, |x|_w^{d_w}\}$$
extends the height $\tilde{h} = \tilde{h}_B$ to an absolute height on $K$. For all this see [La2] (pp.1-19) or [BG] (pp.1-10).

Actually for convenience in estimating we will use from now on the relative height
$$h(x) = h_B(x) = d\tilde{h}(x) \geq 0.$$
This can be calculated directly from the minimum polynomial in the following extension of (2.1).
Lemma 2.1.
Suppose $x$ in $K$ satisfies an equation $A(x) = 0$ with $A(t) = a_0t^e + \cdots + a_e$ for $a_0, \ldots, a_e$ in $k[B]$ and $A(t)$ irreducible over $k[B]$. Then
$$eh(x) = d\max\{\deg a_0, \ldots, \deg a_e\}.$$

Proof. Over a splitting field $L$ we have $A(t) = a_0(t - x_1) \cdots (t - x_e)$, and we can extend, keeping the same notation, all the valuations to $L$. Then Gauss's Lemma gives
$$\max\{|a_0|_w, \ldots, |a_e|_w\} = |a_0|_w \max\{1, |x_1|_w\} \cdots \max\{1, |x_e|_w\}.$$
If $w$ does not divide $\infty$ then the left-hand side is 1 because $a_0, \ldots, a_e$ are coprime; and otherwise they are all $\max\{|a_0|_\infty, \ldots, |a_e|_\infty\}$. Taking the product with exponents $d_w$ and then taking logarithms gives on the left-hand side $d\max\{\deg a_0, \ldots, \deg a_e\}$ and on the right-hand side $h(x_1) + \cdots + h(x_e)$. This last is just $eh(x)$ because $x_1, \ldots, x_e$ are conjugate over $k(B)$.

An immediate consequence of Lemma 2.1 is the Northcott Property; namely that for any $H$ there are at most finitely many $x$ in $K$ with $h(x) \leq H$.

We will also need the standard extensions to vectors. So for $x_1, \ldots, x_l$ in $K$ we define
$$h(x_1, \ldots, x_l) = \log \prod_w \max\{1, |x_1|_w^{d_w}, \ldots, |x_l|_w^{d_w}\}.$$
For example $h(a_0, \ldots, a_e)$ in the situation of Lemma 2.1 is just $d\max\{\deg a_0, \ldots, \deg a_e\}$. The Northcott Property extends at once to $K^l$.
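In the simplest case $K = k(B) = \mathbf{F}_p(t)$ (so $b = 1$ and $d = 1$) the height (2.1) is just the larger of the two degrees after cancelling common factors. The following Python sketch computes it under these simplifying assumptions; the coefficient-list representation and the names `normalize`, `divmod_poly`, `gcd_poly` and `height` are ours and purely illustrative. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
# Height (2.1) of a rational function num/den in F_p(t), assuming d = 1.
# Polynomials are coefficient lists, lowest degree first (an assumed encoding).

def normalize(a, p):
    """Reduce coefficients mod p and drop trailing zeros."""
    a = [c % p for c in a]
    while a and a[-1] == 0:
        a.pop()
    return a

def divmod_poly(a, b, p):
    """Quotient and remainder of a by a non-zero b in F_p[t]."""
    a, b = normalize(a, p), normalize(b, p)
    inv_lead = pow(b[-1], -1, p)  # inverse of the leading coefficient mod p
    quot = [0] * max(len(a) - len(b) + 1, 0)
    rem = a[:]
    while len(rem) >= len(b):
        shift = len(rem) - len(b)
        coeff = (rem[-1] * inv_lead) % p
        quot[shift] = coeff
        for i, y in enumerate(b):
            rem[shift + i] = (rem[shift + i] - coeff * y) % p
        rem = normalize(rem, p)
    return normalize(quot, p), rem

def gcd_poly(a, b, p):
    """Euclidean algorithm in F_p[t]; the result is defined up to a unit."""
    a, b = normalize(a, p), normalize(b, p)
    while b:
        _, r = divmod_poly(a, b, p)
        a, b = b, r
    return a

def height(num, den, p):
    """max(deg a0, deg a1) for num/den written in lowest terms a1/a0."""
    num, den = normalize(num, p), normalize(den, p)
    if not den:
        raise ZeroDivisionError("denominator is zero in F_p[t]")
    if not num:
        return 0  # the height of 0 is 0 by convention
    g = gcd_poly(num, den, p)
    num_red, _ = divmod_poly(num, g, p)
    den_red, _ = divmod_poly(den, g, p)
    return max(len(num_red) - 1, len(den_red) - 1)
```

For instance `height([-1, 0, 1], [-1, 1], 5)` treats $(t^2 - 1)/(t - 1) = t + 1$ and so returns the degree after cancellation rather than before. Since only finitely many coprime pairs of bounded degree exist over a finite field, this also makes the Northcott Property visible in this special case.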
3. Dependence with heights.
Given $K$ finitely generated and transcendental over $k$, there is always a separable transcendence basis $B = (t_1, \ldots, t_b)$; this means that $K$ is separable over $k(B)$. As above write $d = [K : k(B)]$. On $k[B]$ we have the standard derivations $\frac{\partial}{\partial t_1}, \ldots, \frac{\partial}{\partial t_b}$, which extend in the obvious way to $k(B)$. And by separability they extend uniquely to $K$. For all this see [La1] (pp.183-184). For an integer $i \geq 0$ define $\mathcal{D}(i)$ as the set of operators
$$D = \left(\frac{\partial}{\partial t_1}\right)^{i_1} \cdots \left(\frac{\partial}{\partial t_b}\right)^{i_b}$$
as $i_1, \ldots, i_b$ run over all non-negative integers with $i_1 + \cdots + i_b \leq i$. This is not quite the same as [Mass] (p.196), where we had $i \geq 1$ and $i_1 + \cdots + i_b < i$.

It will be convenient for later calculations to define a quantity $h(x; i)$ as follows. We order in some way the operators $D_1, \ldots, D_l$ of $\mathcal{D}(i)$, and we define for $x \neq 0$
$$h(x; i) = h_B(x; i) = h\left(\frac{D_1x}{x}, \ldots, \frac{D_lx}{x}\right)$$
of course independent of the ordering.

The next result is an explicit version of Lemma 3 of [Mass] (p.195) however without reference to any group $G$. We write $C$ for the field of differential constants in $K$. For zero characteristic this is $k$, but for positive characteristic $p$ it is the set of $p$th powers of elements of $K$.

Lemma 3.1.
For $m \geq 1$ suppose $c_1, \ldots, c_m$ are in $C$ and $x_1, \ldots, x_m$ are in $K^*$ with
$$c_1x_1 + \cdots + c_mx_m = 1. \tag{3.1}$$
Then either

(a) $h(c_1x_1, \ldots, c_mx_m) \leq (m + 1)\left(h(x_1; m - 1) + \cdots + h(x_m; m - 1)\right)$

or

(b) $x_1, \ldots, x_m$ are linearly dependent over $C$.

Proof. If (b) does not hold, then the theory of the generalized Wronskian (see for example [La2] p.174) shows that we may find operators $D_i$ in $\mathcal{D}(i)$ $(i = 0, \ldots, m - 1)$ such that the matrix with entries $D_ix_j$ $(i = 0, \ldots, m - 1;\ j = 1, \ldots, m)$ is non-singular. Applying them to (3.1) we get
$$\sum_{j=1}^{m} \frac{D_ix_j}{x_j}(c_jx_j) = D_i(1) \quad (i = 0, \ldots, m - 1).$$
These can be solved by Cramer's Rule to get $c_jx_j = w_j/w$ $(j = 1, \ldots, m)$, where $w \neq 0$ is the determinant of the matrix with entries $\frac{D_ix_j}{x_j}$ $(i = 0, \ldots, m - 1;\ j = 1, \ldots, m)$. Noting that this determinant is multilinear in the columns, we find that
$$h(w) \leq h(x_1; m - 1) + \cdots + h(x_m; m - 1)$$
and similarly for $h(w_j)$ $(j = 1, \ldots, m)$. We conclude that $h(c_1x_1, \ldots, c_mx_m) = h\left(\frac{w_1}{w}, \ldots, \frac{w_m}{w}\right)$ is at most
$$h(w) + h(w_1) + \cdots + h(w_m) \leq (m + 1)\left(h(x_1; m - 1) + \cdots + h(x_m; m - 1)\right)$$
[…] $G$.

Lemma 3.2.
For $m \geq 1$ suppose $x_0, x_1, \ldots, x_m$ are in $K^*$ and linearly dependent over $C$ but $x_1, \ldots, x_m$ are linearly independent over $C$. Then there is a relation
$$c_1x_1 + \cdots + c_mx_m = x_0 \tag{3.2}$$
with $c_1, \ldots, c_m$ in $C$ and
$$h\left(\frac{c_1x_1}{x_0}, \ldots, \frac{c_mx_m}{x_0}\right) \leq (m + 1)\left(h\left(\frac{x_1}{x_0}; m - 1\right) + \cdots + h\left(\frac{x_m}{x_0}; m - 1\right)\right).$$

Proof. There is certainly a relation (3.2) with $c_1, \ldots, c_m$ in $C$, and we apply Lemma 3.1 to the quotients $\frac{x_1}{x_0}, \ldots, \frac{x_m}{x_0}$. As $x_1, \ldots, x_m$ are linearly independent over $C$, the conclusion (b) cannot hold. Now conclusion (a) is just what we need, and this completes the proof.

In section 5 we shall prove versions of Lemmas 3.1 and 3.2 that are uniform for $x_0, x_1, \ldots, x_m$ in a finitely generated group $G$ as in [Mass]. By way of preparation, the next result illustrates the logarithmic nature of the quantities $h(\,\cdot\,; i)$.

Lemma 3.3. For any $x \neq 0$, $y \neq 0$ in $K$ and any integers $i \geq 0$, $e \geq 0$ we have $h(xy; i) \leq h(x; i) + h(y; i)$ and $h(x^e; i) \leq ih(x; i)$.

Proof. Let $D$ be in $\mathcal{D}(i)$. By distributing operators over the factors of $xy$ as in Leibniz, we see that $\frac{D(xy)}{xy}$ is a sum with generalized binomial coefficients of products $\frac{E(x)}{x}\frac{F(y)}{y}$ with operators $E, F$ also in $\mathcal{D}(i)$. Taking $D = D_1, \ldots, D_l$ as in the definition of $h(xy; i)$, we deduce the first inequality of the present lemma by standard height calculations.

When $e$ is a positive integer, a similar argument shows that $\frac{D(x^e)}{x^e}$ is a sum with generalized binomial coefficients of products $\frac{E_1(x)}{x} \cdots \frac{E_e(x)}{x}$ with operators $E_1, \ldots, E_e$ also in $\mathcal{D}(i)$. Here $E_1 \cdots E_e = D$, so that there are at most $i$ terms not equal to 1 in this product. Thus $\frac{D(x^e)}{x^e}$ is a polynomial of total degree at most $i$ in the $\frac{E(x)}{x}$ for $E$ in $\mathcal{D}(i)$. The second inequality now follows in a similar way, at least for $e \geq 1$. The result is trivial for $e = 0$.

Lemma 3.4.
For any $x \ne 0$ in $K$ and any integer $i \ge 0$ we have $h(x; i) \le 4idh(x)$.

Proof. This is trivial for $i = 0$, so we assume from now on $i \ge 1$. We have an equation $A(x) = 0$ as in Lemma 2.1, of degree $e \le d$. Denote by $A'(t)$ the derivative with respect to $t$. Pick any $D$ in $\mathcal{D}(i)$. We claim that $B_i = (A'(x))^{2i-1} Dx$ is a polynomial in $x$ and various derivatives $Da$ of various coefficients $a$ of $A$, with coefficients in $k$ and of degree at most $(2i-1)(e-1)+1$ in $x$ and of total degree at most $2i-1$ in the $Da$. We prove this by induction on $i$.

When $i = 1$ we have for example $D = \partial/\partial t_1 = \partial$ (say), and applying this to $A(x) = 0$ yields $B_1 = -\sum_{j=0}^{e} (\partial a_{e-j}) x^j$ for which the claim is clear.

Assuming $Dx = B_i/(A'(x))^{2i-1}$ with $B_i$ as above, we do the induction step by applying one more operator, again say $\partial/\partial t_1 = \partial$. We get
$$(A'(x))^{2i}\,\partial Dx = A'(x)\,\partial B_i - (2i-1)\,B_i\,\partial(A'(x)).$$
Here $\partial B_i$ involves $x$ to degree at most $(2i-1)(e-1)+1$ and also $x$ to degree at most $(2i-1)(e-1)$ multiplied by $\partial x = B_1/A'(x)$, together with $Da$ to total degree at most $2i-1$. Similarly $\partial(A'(x))$ involves $x$ to degree at most $e-1$ and also $x$ to degree at most $e-2$ (if $e \ne 1$) multiplied by $\partial x = B_1/A'(x)$, together with $Da$ to total degree at most 1. Multiplying by $A'(x)$ we get $(A'(x))^{2i+1}\,\partial Dx$ involving $x$ to degree at most
$$e-1+\max\{(2i-1)(e-1)+1+(e-1),\ (2i-1)(e-1)+e\} = (2(i+1)-1)(e-1)+1,$$
and the total degree in the $Da$ is at most $(2i-1)+1+1 = 2(i+1)-1$. This proves the claim in general.

There follows at once the estimate
$$\log|B_i|_w \le ((2i-1)(e-1)+1)\log\max\{1, |x|_w\}$$
for any $w$ not dividing $\infty$; and otherwise we get an extra term $(2i-1)\max\{\deg a_0, \ldots, \deg a_e\}$. The same estimates also hold for $\log|C|_w$ where $C = x(A'(x))^{2i-1}$.

Now write $B_{ij}$ for the $B_i$ corresponding to the operators $D_j$ $(j = 1, \ldots, l)$ of $\mathcal{D}(i)$, so that $D_j x/x = B_{ij}/C$. Then
$$h\left(\frac{D_1 x}{x}, \ldots, \frac{D_l x}{x}\right) = \sum_w d_w \max\{\log|B_{i1}|_w, \ldots, \log|B_{il}|_w, \log|C|_w\}$$
which is at most
$$((2i-1)(e-1)+1)\,h(x) + (2i-1)\,d\max\{\deg a_0, \ldots, \deg a_e\}.$$
Finally by Lemma 2.1 this is at most
$$((2i-1)(e-1)+1)\,h(x) + (2i-1)\,e\,h(x) \le 4ieh(x) \le 4idh(x)$$
as required. This completes the proof of the present lemma.

In view of our consistent use of the relative height (as opposed to the absolute height), the factor $d$ here looks like a normalization error. However it cannot be avoided, as the example $x = \left(\frac{t+1}{t}\right)^{1/d}$ $(t = t_1)$ in $K = k(t)(x) = k(x)$ shows. One finds that the rational function $\frac{1}{x}\frac{\partial^i x}{\partial t^i}$ has denominator $(t(t+1))^i$. So its height is at least $2id = 2idh(x)$, which shows also that our dependence on $i$ is not too bad. Perhaps even the factor 4 essentially cannot be avoided.
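The degree bookkeeping in the induction for Lemma 3.4, and the last numerical inequality of its proof, can be checked mechanically. The following Python sketch is only an illustration: the function names are ours and do not come from the paper, and it tests the two identities on a finite range of $i$ and $e$ rather than proving them.

```python
def x_degree_bound(i, e):
    """Claimed bound (2i-1)(e-1)+1 for the degree of B_i in x."""
    return (2 * i - 1) * (e - 1) + 1


def induction_step(i, e):
    """Degree bound for B_{i+1} produced by the induction step of the proof."""
    first = x_degree_bound(i, e) + (e - 1)   # first entry of the maximum
    second = (2 * i - 1) * (e - 1) + e       # second entry of the maximum
    return (e - 1) + max(first, second)


def final_coefficient(i, e):
    """Coefficient of h(x) obtained after applying Lemma 2.1."""
    return x_degree_bound(i, e) + (2 * i - 1) * e


def check(max_i=30, max_e=30):
    """True when both identities hold for 1 <= i <= max_i, 1 <= e <= max_e."""
    for i in range(1, max_i + 1):
        for e in range(1, max_e + 1):
            if induction_step(i, e) != x_degree_bound(i + 1, e):
                return False
            if final_coefficient(i, e) > 4 * i * e:
                return False
    return True
```

Calling `check()` confirms, on the tested range, that the induction step reproduces the bound with $i$ replaced by $i+1$ and that the coefficient of $h(x)$ never exceeds $4ie$.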
4. Regulators.
Let $K$ be finitely generated and transcendental over $k$ as in the preceding section, and let $B$ be a transcendence basis. Let $G$ be a subgroup of $K^*$ finitely generated modulo $k^*$; that is, $G/(G \cap k^*)$ is finitely generated. We show here how to define a regulator $R(G) = R_B(G)$.

For all $w$ except finitely many we have $|g|_w = 1$ for every $g$ in $G$. Pick a set of $N \ge 1$ valuations $w$ containing these exceptions, and let $\mathcal{L}$ be the map from $G$ into $\mathbf{R}^N$ whose typical coordinate is $d_w \log|g|_w$. In fact by (2.2) $\mathcal{L}(G)$ lies in $\mathbf{Z}^N$ and is therefore discrete. Thus it is a (full) lattice in the real subspace it generates, whose dimension is the rank $r$ of $G/(G \cap k^*)$. If $r \ge 1$ we define $R(G) = R_B(G) = \det \mathcal{L}(G) \ge 1$; if $r = 0$ we define $R(G) = 1$. This does not quite coincide with the standard definition for the unit group in algebraic number theory, because the latter is obtained by a projection to one dimension lower. But they are equal up to a constant factor.

The following example will be quoted later. With $K = \mathbf{F}_p(t)$ (and the obvious $B$) and $G_l$ generated by $t^l$ and $1 - t$ we have $N = 3$ corresponding to valuations at $t = 0, 1, \infty$; and so vectors $(-l, 0, l)$ and $(0, -1, 1)$ giving $R_B(G_l) = l\sqrt{3}$.

Lemma 4.1.
Let $G, G'$ in $K^*$ be finitely generated modulo $k^*$ with $G$ of finite index in $G'$. Then
$$R(G) = \frac{[G' : G]}{[G' \cap k^* : G \cap k^*]}\,R(G') = [G'/(G' \cap k^*) : G/(G \cap k^*)]\,R(G'),$$
where we identify $G/(G \cap k^*)$ as a subgroup of $G'/(G' \cap k^*)$.

Proof. The quotients $G/(G \cap k^*)$, $G'/(G' \cap k^*)$ are torsion-free, both with the same rank, say $r$. If $r = 0$ the lemma is trivial. Otherwise using elementary divisors we can find generators $\gamma_1, \ldots, \gamma_r$ of $G'/(G' \cap k^*)$ and positive integers $d_1, \ldots, d_r$ such that $\gamma_1^{d_1}, \ldots, \gamma_r^{d_r}$ generate $G/(G \cap k^*)$. Then the relationship between $\mathcal{L}(G')$ and $\mathcal{L}(G)$ is clear, and the lemma follows.

Lemma 4.2.
Let $G$ in $K^*$ be finitely generated modulo $k^*$, let $x$ be in $K^*$, and let $G'$ be the group generated by $x$ and the elements of $G$. Then $R(G') \le 2h(x)R(G)$.

Proof. It is geometrically clear that if $\Lambda$ is any lattice in euclidean space, then $\det(\Lambda + \mathbf{Z}v) \le \det(\Lambda)\,|v|$ for the length $|v|$, at least if $v$ is not in the space spanned by $\Lambda$. But this continues to hold for all $v$ provided only $|v| \ge 1$ and $\Lambda + \mathbf{Z}v$ remains discrete. In particular it holds for $\Lambda = \mathcal{L}(G)$ and $v = \mathcal{L}(x)$. We conclude $R(G') \le |\mathcal{L}(x)|\,R(G)$. Finally we have by definition and the product formula
$$h(x) = \sum_w \max\{0, m_w\} = \frac{1}{2}\sum_w |m_w| \qquad (4.1)$$
for $m_w = d_w \log|x|_w$. And
$$|\mathcal{L}(x)|^2 = \sum_w m_w^2 \le \Big(\sum_w |m_w|\Big)^2 = 4(h(x))^2.$$
The lemma follows.

We can recover a basis from the regulator as follows.
Lemma 4.3. Let $G$ be a subgroup of $K^*$ finitely generated modulo $k^*$ with $G/(G \cap k^*)$ of rank $r \ge 1$. Then there are $g_1, \ldots, g_r$ in $G$ generating $G/(G \cap k^*)$, with
$$h(g_1) \cdots h(g_r) \le \frac{1}{r}\,\delta(r)\,R(G)^2$$
for $\delta(r) = r^{3r}$.

Proof. By Minkowski's Second Theorem (see for example [Ca] Theorem V p.218) there are $\tilde g_1, \ldots, \tilde g_r$ in $G$ multiplicatively independent modulo $k^*$, with
$$|\mathcal{L}(\tilde g_1)| \cdots |\mathcal{L}(\tilde g_r)| \le \frac{2^r}{V(r)}\det\mathcal{L}(G) = \frac{2^r}{V(r)}\,R(G) \qquad (4.2)$$
for the volume $V(r)$ of the unit ball in $\mathbf{R}^r$. By geometry $V(r) \ge (2/\sqrt{r})^r$. We get a basis in the standard way using the argument of Mahler-Weyl (see for example [Ca] Lemma 8 p.135); there results
$$|\mathcal{L}(g_i)| \le \max\{1, i/2\}\,|\mathcal{L}(\tilde g_i)| \quad (i = 1, \ldots, r),$$
and so $2^r/V(r)$ in (4.2) gets replaced by
$$\frac{r!}{2^{r-1}}\,r^{r/2} \le \frac{r^{3r/2}}{2^{r-1}}.$$
Now (4.1) gives
$$h(g) = \sum_w \max\{0, m_w\} = \frac{1}{2}\sum_w |m_w|$$
for $m_w = d_w \log|g|_w$ in $\mathbf{Z}$. And $|m| \le m^2$ for any $m$ in $\mathbf{Z}$, so we get
$$h(g) \le \frac{1}{2}\sum_w m_w^2 = \frac{1}{2}\,|\mathcal{L}(g)|^2.$$
Therefore
$$h(g_1) \cdots h(g_r) \le \frac{r^{3r}}{2^{3r-2}}\,R(G)^2 < \frac{1}{r}\,\delta(r)\,R(G)^2$$
as desired.

In view of (4.2) it seems a pity that the square of the regulator appears in Lemma 4.3. But it cannot be avoided. For example let $\alpha_1, \ldots, \alpha_l, \beta_1, \ldots, \beta_l$ be different constants in $k$, and consider $G$ generated by
$$g = \frac{(t - \alpha_1) \cdots (t - \alpha_l)}{(t - \beta_1) \cdots (t - \beta_l)}$$
in $K = k(t)$. Then $R(G) = \sqrt{2l}$. The only possibilities for $g_1$ are $\gamma g^{\pm 1}$ with $\gamma$ constant. But then $h(g_1) = l$, so any bound $h(g_1) \le \delta(1)R(G)$ is impossible.

This leads to the following uniform version of Lemma 3.4 when $x$ lies in $G$. Write $G_k$ for the group generated by the elements of $G$ and $k^*$.

Lemma 4.4.
Let $G$ be a subgroup of $K^*$ finitely generated modulo $k^*$ with $G/(G \cap k^*)$ of rank $r \ge 1$. Then for any $g$ in $G$ we have $h(g; i) \le 4i^2 d\,\delta(r)\,R(G)^2$. Further for any positive integer $l$ there is $g_0$ in $G_k$ and $g'$ in $G$ with $g = g_0 g'^{\,l}$ and $h(g_0) \le l\,\delta(r)\,R(G)^2$.

Proof. Choose basis elements $g_1, \ldots, g_r$ according to Lemma 4.3, and write $g = c\,g_1^{e_1} \cdots g_r^{e_r}$ for rational integers $e_1, \ldots, e_r$ and $c$ in $k^*$. Replacing some of the $g_j$ by their inverses, we can assume that all $e_j \ge 0$; this does not affect the estimate in Lemma 4.3. Then by Lemma 3.3
$$h(g; i) = h(g_1^{e_1} \cdots g_r^{e_r}; i) \le h(g_1^{e_1}; i) + \cdots + h(g_r^{e_r}; i) \le i\,(h(g_1; i) + \cdots + h(g_r; i)).$$
This in turn by Lemma 3.4 is at most
$$4i^2 d\,(h(g_1) + \cdots + h(g_r)) \le 4i^2 d\,r\,h(g_1) \cdots h(g_r) \le 4i^2 d\,\delta(r)\,R(G)^2. \qquad (4.3)$$
For the second part we write $e_j = f_j + l e'_j$ with $0 \le f_j < l$ $(j = 1, \ldots, r)$ (compare also [D] p.197), taking $g_0 = c\,g_1^{f_1} \cdots g_r^{f_r}$, $g' = g_1^{e'_1} \cdots g_r^{e'_r}$ and using the inequality in (4.3).

The final result of this section will lead easily to a quantitative version of Lemma 2 of [Mass] (p.193), such as those mentioned in [Mass] (pp. 194, 195). However it involves better constants and is no longer restricted to positive characteristic. It is here, by the way, that the radical $\sqrt{G}$ makes its essential appearance in the whole story.

Lemma 4.5.
Suppose that $x, y$ are in $K^*$ with $x$ not in $\sqrt{G_k}$ and $y^q x$ in $G$ for some positive integer $q$. Then $q \le 2h(x)R(G)$.

Proof. Let $G'$ be the group generated by $x$ and the elements of $G$, and let $G''$ be the group generated by $y$ and the elements of $G$, so that $G'$ lies in $G''$. Since $x$ is not in $\sqrt{G}$, it is easy to see that the index $[G'' : G'] = q$. Since $x$ is not even in $\sqrt{G_k}$, it is even easier to see that $G \cap k^* = G' \cap k^* = G'' \cap k^*$. Thus by Lemma 4.1 we have $R(G') = qR(G'') \ge q$. On the other hand $R(G') \le 2h(x)R(G)$ by Lemma 4.2, and the result follows.
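The regulator of this section is the determinant of the lattice $\mathcal{L}(G)$, that is, the square root of the Gram determinant of any basis. As a purely illustrative sketch (the function names are ours), the following Python code computes $R(G)^2$ exactly for the two examples above. We assume the sign convention that the coordinates of $t^l$ and $1 - t$ at $t = 0, 1, \infty$ are $(-l, 0, l)$ and $(0, -1, 1)$; the opposite convention gives the same regulator.

```python
from fractions import Fraction


def gram_determinant(vectors):
    """Exact determinant of the Gram matrix of the given integer vectors."""
    r = len(vectors)
    gram = [[Fraction(sum(a * b for a, b in zip(u, v))) for v in vectors]
            for u in vectors]
    det = Fraction(1)
    for col in range(r):
        # Find a non-zero pivot in this column (Gaussian elimination).
        pivot = next((row for row in range(col, r) if gram[row][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            gram[col], gram[pivot] = gram[pivot], gram[col]
            det = -det
        det *= gram[col][col]
        for row in range(col + 1, r):
            factor = gram[row][col] / gram[col][col]
            for k in range(col, r):
                gram[row][k] -= factor * gram[col][k]
    return det


def regulator_squared(basis):
    """R(G)^2 when the rows form a basis of the lattice L(G)."""
    return gram_determinant(basis)


def height(vector):
    """h(x) as the sum of the positive coordinates m_w, as in (4.1)."""
    return sum(m for m in vector if m > 0)
```

For $G_l$ with $l = 3$ the call `regulator_squared([(-3, 0, 3), (0, -1, 1)])` gives $27 = 3l^2$, in agreement with $R_B(G_l) = l\sqrt{3}$. For the example after Lemma 4.3 the single vector has $l$ coordinates $1$ and $l$ coordinates $-1$, so that $R(G)^2 = 2l$ while the height is $l$.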
5. Dependence with regulators.
Let $K$ be finitely generated and transcendental over $k$ as in the preceding sections, and let $B$ be a transcendence basis, now assumed separable, with elements $t_1, \ldots, t_b$. We continue to abbreviate the height $h_B$ as $h$, and again we write $C$ for the field of differential constants of $K$.

The following result eliminates the height functions $h(x; m-1)$ from Lemma 3.1, thereby providing a more useful explicit version of Lemma 3 of [Mass].
Lemma 5.1.
Let $G$ in $K^*$ be finitely generated of rank $r \ge 1$ modulo $k^*$, and for m ≥ suppose $c_1, \ldots, c_m$ are in $C$ and $g_1, \ldots, g_m$ are in $G$ with $c_1 g_1 + \cdots + c_m g_m = 1$. Then either

(a) $h(c_1 g_1, \ldots, c_m g_m) \le$ m dδ ( r ) R ( G )

or

(b) $g_1, \ldots, g_m$ are linearly dependent over $C$.

Proof. Just use Lemma 3.1 together with the inequalities
$$h(g; m-1) \le 4(m-1)^2 d\,\delta(r)\,R(G)^2 \qquad (5.1)$$
for $g = g_1, \ldots, g_m$.

Similarly we deduce a more useful explicit version of Lemma 4 of [Mass].

Lemma 5.2.
Let $G$ in $K^*$ be finitely generated of rank $r \ge 1$ modulo $k^*$, and for m ≥ suppose $g_0, g_1, \ldots, g_m$ are in $G$ and linearly dependent over $C$ but $g_1, \ldots, g_m$ are linearly independent over $C$. Then there is a relation $c_1 g_1 + \cdots + c_m g_m = g_0$ with $c_1, \ldots, c_m$ in $C$ and

$h\left(\frac{c_1 g_1}{g_0}, \ldots, \frac{c_m g_m}{g_0}\right) \le$ m dδ ( r ) R ( G ) .

Proof. Just use Lemma 3.2 and (5.1), this time with $g = \frac{g_1}{g_0}, \ldots, \frac{g_m}{g_0}$.

We have followed the proof in [Mass] quite closely. It would have been nice to see the well-known number $\frac{m(m-1)}{2}$ in place of 4 m, and also some notion of genus and $S$-units as in various formulations of abc matters over function fields. But despite the considerations of Chapter 14 of [BG] in zero characteristic and those of Hsia and Wang [HW] for arbitrary characteristic we have been unable to supply a satisfactory version. The results of [HW] are especially interesting in their use of divided derivatives or hyperderivations, which for example in characteristic $p$ leads to linear dependence over the field of $p^e$th powers, not just over $C$ with $e = 1$. If this could be done in our situation, then it would probably lead to simplifications in the rest of our proof, and possibly to the elimination of the Proposition in section 8. But it seems that the results of [HW] cannot be directly applied to our Lemma 5.1, due to the presence of $c_1, \ldots, c_m$ whose heights cannot be controlled.
6. Isotriviality.
We take a well-earned break from estimating. From now on $K$ will have positive characteristic $p$ (actually this assumption is not really needed until section 8), and, as in section 1, we write $\mathbf{F}_K$ for $\overline{\mathbf{F}_p} \cap K$. This field plays the role of $k$ in sections 2, 3, 4, 5.

Suppose $n \ge m \ge 1$. For $a(i, j)$ in $K$ the normalized equations
$$X_i = a(i, 0)X_0 + \cdots + a(i, m-1)X_{m-1} = \sum_{j=0}^{m-1} a(i, j)X_j \quad (i = m, m+1, \ldots, n) \qquad (6.1)$$
define in $\mathbf{P}^n$ a linear variety $V$ of dimension $m - 1$. When $G$ is a subgroup of $K^*$, we need some conditions which ensure that $V$ is $G$-isotrivial.

Now any $G$-automorphism taking $(X_0, \ldots, X_n)$ to $(g_0 X_0, \ldots, g_n X_n)$ leads after renormalization to new coefficients $\frac{g_i}{g_j}a(i, j)$. If the new forms are defined over $\mathbf{F}_K$, then every non-zero $a(i, j)$ has the shape $\frac{g_j}{g_i}\alpha(i, j)$ for non-zero $\alpha(i, j)$ in $\mathbf{F}_K$. In particular each equation in (6.1) defines a $G$-isotrivial variety. But also each quotient
$$\frac{a(i_1, j_1)\,a(i_2, j_2)\,a(i_3, j_3) \cdots a(i_{k-1}, j_{k-1})\,a(i_k, j_k)}{a(i_1, j_2)\,a(i_2, j_3)\,a(i_3, j_4) \cdots a(i_{k-1}, j_k)\,a(i_k, j_1)} \quad (k = 2, \ldots, n+1) \qquad (6.2)$$
lies in $\mathbf{F}_K$. The following result gives a converse statement which guarantees that the equations (6.1) become defined over $\mathbf{F}_K$ after applying a suitable $G$-automorphism and renormalizing. In particular it guarantees that $V$ is $G$-isotrivial; but without the need to recombine the equations.

Lemma 6.1.
Suppose that each equation in (6.1) defines a $G$-isotrivial variety, and that each quotient (6.2) lies in $\mathbf{F}_K$ provided everything in the numerator and denominator is non-zero. Then $V$ is $G$-isotrivial.

Proof. We argue by induction on the number $n - m + 1 \ge 1$ of equations. If $n - m + 1 = 1$ then the result is obvious without using (6.2). Suppose we have done it for $n - m \ge 1$ equations, say the first $n - m$ in (6.1), and let us add another equation, namely the last one in (6.1).

Restricting to $i < n$ and the appropriate $j$ in (6.2), we see from the induction hypothesis that a suitable $G$-automorphism trivializes the first $n - m$ equations, without bothering about $X_n$. This means that we can assume that all $a(i, j) \ne 0$ $(i < n)$ are in $\mathbf{F}_K$; while the isotriviality of the last equation means that all $a(n, j) \ne 0$ are in $G$. We now want to trivialize the last equation.

How can we trivialize a given coefficient $a(n, j) \ne 0$ in the last equation? If all $a(i, j) = 0$ $(i < n)$, so that the first $n - m$ equations did not involve $X_j$, then we can simply replace $X_j$ by $a(n, j)X_j$ and this will not change the first $n - m$ equations. We do this for all such $j$.

If there is only a single $j$ with some $a(i, j) \ne 0$ $(i < n)$, then we can still replace $X_j$ by $a(n, j)X_j$; but we then have to correct the new coefficients $\frac{a(i, j)}{a(n, j)} \ne 0$ of $X_j$ in the $i$th equation by replacing $X_i$ by $a(n, j)X_i$ $(i = m, \ldots, n-1)$. In general there may be several such $j$. Call these “bad”.

Now we say for different $j, j'$ in the set $\{0, \ldots, m-1\}$ that $j \sim j'$ if there is $i < n$ with
$$a(i, j)\,a(i, j') \ne 0 \qquad (6.3)$$
(so that $j, j'$ are both bad). This relation is symmetric but probably not transitive. We can extend it via reflexivity and transitivity to a genuine equivalence relation on the bad elements of $\{0, \ldots, m-1\}$, which we then denote by $\simeq$.

We assume for the moment that there is a single equivalence class: any two $j, j'$ are related.
Let $j, j'$ be different bad elements, so that $a(i, j) \ne 0$, $a(i', j') \ne 0$ for some $i, i' < n$. From our equivalence class assumption $j \simeq j'$. Suppose that
$$j = j_1 \sim j_2 \sim \cdots \sim j_{k-1} \sim j_k = j',$$
where of course we can take $2 \le k \le n + 1$. Then we get from (6.3)
$$a(i_1, j_1)\,a(i_1, j_2) \ne 0, \quad a(i_2, j_2)\,a(i_2, j_3) \ne 0, \quad \ldots, \quad a(i_{k-1}, j_{k-1})\,a(i_{k-1}, j_k) \ne 0$$
for some $i_1, i_2, \ldots, i_{k-1} < n$. We use (6.2) with $i_k = n$ to see that
$$\frac{a(i_1, j_1)\,a(i_2, j_2)\,a(i_3, j_3) \cdots a(i_{k-1}, j_{k-1})\,a(n, j')}{a(i_1, j_2)\,a(i_2, j_3)\,a(i_3, j_4) \cdots a(i_{k-1}, j_k)\,a(n, j)}$$
lies in $\mathbf{F}_K$. However the first $k - 1$ terms in the numerator and in the denominator lie in $\mathbf{F}_K$, because we trivialized the first $n - m$ equations. Consequently $\frac{a(n, j')}{a(n, j)}$ lies in $\mathbf{F}_K$.

Thus we have shown that all $a(n, j)$ for bad $j$ are multiples of a single one, call it $g$, by elements of $\mathbf{F}_K$. Now they can be simultaneously trivialized on replacing $X_j$ by $gX_j$. Again we must correct the new coefficients $\frac{a(i, j)}{g} \ne 0$ of $X_j$ in the $i$th equation by replacing $X_i$ by $gX_i$ $(i = m, \ldots, n-1)$.

What if there are several equivalence classes of bad elements in $\{0, \ldots, m-1\}$? Say there are $h \ge 2$ classes $J_1, \ldots, J_h$. Let $I_1$ be the set of $i$ in $\{m, \ldots, n-1\}$ for which there is $j$ in $J_1$ with $a(i, j) \ne 0$; and similarly for $I_2, \ldots, I_h$. Then $I_1, I_2, \ldots, I_h$ are disjoint, because for example with any $j_1$ in $J_1$ and any $j_2$ in $J_2$ there can be no $i$ with $a(i, j_1)\,a(i, j_2) \ne 0$, else by (6.3) we would have $j_1 \sim j_2$. (If one wishes, one can convert the matrix of the first $n - m$ equations into a block matrix using row and column permutations.) The argument above, using $i_1, \ldots, i_{k-1}$ in $I_1$, shows that all non-zero $a(n, j)$ $(j \in J_1)$ are multiples of a single one, call it $g_1$, by elements of $\mathbf{F}_K$. Similarly we get $g_2, \ldots, g_h$. Now we can trivialize the last row as follows. We replace the $X_j$ $(j \in J_1)$ by $g_1 X_j$ and we correct the effect by replacing $X_i$ by $g_1 X_i$ $(i \in I_1)$. Similarly using $g_2, \ldots, g_h$ we trivialize the remaining coefficients. This completes the proof.
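The combinatorial part of the proof of Lemma 6.1 only uses which coefficients $a(i, j)$ with $i < n$ are non-zero. The following Python sketch, with names of our own choosing, computes the equivalence classes of bad indices and the corresponding row sets from such a support pattern by a small union-find. It is meant as an illustration of the relation (6.3) and its transitive closure, not as part of the argument; rows are numbered from $0$ here rather than from $m$.

```python
def bad_index_classes(support):
    """Equivalence classes of bad column indices.

    support[i][j] is True when a(i, j) != 0, for the rows i < n only.
    Two columns j, j' are linked when some row has both entries non-zero;
    the classes are those of the transitive closure of this relation.
    """
    columns = len(support[0]) if support else 0
    bad = [j for j in range(columns) if any(row[j] for row in support)]
    parent = {j: j for j in bad}

    def find(j):
        # Follow parent links to the class representative, halving the path.
        while parent[j] != j:
            parent[j] = parent[parent[j]]
            j = parent[j]
        return j

    for row in support:
        present = [j for j in range(columns) if row[j]]
        for j in present[1:]:
            parent[find(j)] = find(present[0])

    classes = {}
    for j in bad:
        classes.setdefault(find(j), []).append(j)
    return sorted(classes.values())


def row_sets(support, classes):
    """For each class J, the rows i having a non-zero entry in some column of J."""
    return [sorted(i for i, row in enumerate(support) if any(row[j] for j in cls))
            for cls in classes]
```

For instance, with three rows whose supports are the column sets $\{0, 1\}$, $\{1, 2\}$ and $\{3\}$ among five columns, the classes are $\{0, 1, 2\}$ and $\{3\}$, the last column is not bad, and the row sets $\{0, 1\}$ and $\{2\}$ are disjoint as in the proof.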
7. Automorphisms.
As above let $K$ be a field, finitely generated and transcendental over $\mathbf{F}_p$, with $G$ a subgroup of $K^*$. Suppose a linear variety in $\mathbf{P}^n$ is defined over $K$ and $G$-isotrivial. Then by definition there is a $G$-automorphism $\psi$ taking it to something defined over $\mathbf{F}_K = \overline{\mathbf{F}_p} \cap K$. To make our Theorems 1, 2 and 3 fully effective we have to estimate this $\psi$; indeed after doing the whole descent to single points using Theorem 1, for example, it is mainly $G$-automorphisms that are left.

Now it is convenient to use the projective height $h_P = h_{PB}$ defined on $\mathbf{P}^{l-1}(K)$ by
$$h_P(x_1, \ldots, x_l) = \log\prod_w \max\{|x_1|_w^{d_w}, \ldots, |x_l|_w^{d_w}\}.$$
This yields at once a height $h(\psi)$ of a $G$-automorphism $\psi$, defined by (1.7), as
$$h(\psi) = h_P(g_0, \ldots, g_n).$$
Also if $V$ is linear in $\mathbf{P}^n$ defined over $K$, it yields a height $h(V)$ in the standard way via the Grassmannian coordinates of $V$; see for example [S] (p.28), which however is in the context of number fields with euclidean norms at the archimedean valuations. Here we have no archimedean valuations, so the norm problem is irrelevant. If $m - 1 \ge 0$ is the dimension of $V$, then its Grassmannians $A(I)$ correspond to subsets $I$ of $\{0, \ldots, n\}$ with cardinality $n - m + 1 \le n$. The Northcott Property extends at once to this height. Also for $\psi$ in (1.7) the Grassmannians of $\psi(V)$ are the $\frac{A(I)}{g(I)}$, where $g(I) = \prod_{i \in I} g_i$. It follows easily that
$$h(\psi(V)) \le h(V) + nh(\psi), \qquad h(\psi^{-1}) \le nh(\psi). \qquad (7.1)$$
Next let $W$ be another linear variety in $\mathbf{P}^n$, also over $K$.

Lemma 7.1. If $V \cap W$ is non-empty then we have $h(V \cap W) \le h(V) + h(W)$. If further $X_{n-1} \ne 0$ on $V$ and the equations of $V$ do not involve $X_n$, and $W$ is defined by $X_n = aX_{n-1}$ then $h(V \cap W) \ge \max\{h(V), h(W)\}$.

Proof. The upper bound may be compared with the inequality $h(V \cap W) + h(V + W) \le h(V) + h(W)$ due independently to Struppeck-Vaaler [SV] (Theorem 1 p.493) and Schmidt [S] (Lemma 8A p.28). These are proved over number fields; however it is easily checked that the proof in [S] remains valid with trivial modifications. Already a special case was noted by Thunder [T] whose Lemma 5 (p.157) implies $h(V + W) \le h(V) + h(W)$ over function fields of a single variable provided $V \cap W$ is empty.

Regarding the lower bound, let $A(I)$ be the Grassmannians of $V$. Then it is easy to verify that the Grassmannians of $V \cap W$ consist of the $A(I)$ together with the $aA(J)$ for $J$ not containing $n - 1$. There follows $h(V \cap W) \ge h(V)$ at once. Also $X_{n-1} \ne 0$ on $V$ means that at least one $A = A(J)$ is non-zero (see for example Theorem 1 of [HP] p.298), so we get also $h(V \cap W) \ge h_P(A, aA) = h(a) = h(W)$. This completes the proof.

It is the following result which enables $\psi$ to be estimated in the Descent Steps.

Lemma 7.2.
Suppose that $V$ is defined over $K$ and is $G$-isotrivial. Then there is a $G$-automorphism $\psi$ with $\psi(V)$ defined over $\mathbf{F}_K$ and $h(\psi) \le n!\,h(V)$.

Proof. Suppose $\dim V = m - 1$ with Grassmannians $A(I)$; then as noted above the Grassmannians of $\psi(V)$ are the $\frac{A(I)}{g(I)}$, where $g(I) = \prod_{i \in I} g_i$. If $\psi(V)$ is defined over $\mathbf{F}_K$ then these have the shape $\lambda\alpha(I)$ for $\lambda$ in $K^*$ and $\alpha(I)$ in $\mathbf{F}_K$. Thus we have $A(I) = \lambda\alpha(I)g(I)$ for all $I$; but we can restrict to the set $\mathcal{I}$ of all $I$ with $A(I) \ne 0$ (and so $\alpha(I) \ne 0$). We can eliminate the $\lambda$ by fixing $I_0$ in $\mathcal{I}$; this gives
$$\frac{g(I)}{g(I_0)} = \frac{A(I)}{A(I_0)}\,\frac{\alpha(I_0)}{\alpha(I)} \quad (I \in \mathcal{I}). \qquad (7.2)$$
Conversely, if these equations hold then $\psi(V)$ is defined over $\mathbf{F}_K$.

To solve (7.2) for $g_0, \ldots, g_n$ we divide the numerator and denominator of the left-hand side by $g_0^{n-m+1}$ and write it as $\left(\frac{g_1}{g_0}\right)^{a(I,1)} \cdots \left(\frac{g_n}{g_0}\right)^{a(I,n)}$ for integers $a(I, i)$ which are $0, 1, -1$. If the vectors $a(I)$ $(I \in \mathcal{I})$ with coordinates $a(I, i)$ $(i = 1, \ldots, n)$ have full rank $n$ then we can solve (7.2) by choosing $a(I_1), \ldots, a(I_n)$ linearly independent and then solving the subset of (7.2) with $I = I_1, \ldots, I_n$. A multiplicative form of Cramer's Rule gives
$$\left(\frac{g_i}{g_0}\right)^b = Q_1^{b_{i1}} \cdots Q_n^{b_{in}}, \qquad Q_j = \frac{A(I_j)}{A(I_0)}\,\frac{\alpha(I_0)}{\alpha(I_j)} \quad (j = 1, \ldots, n)$$
with integers $b \ne 0$ and $b_{ij}$. These $b_{ij}$ are minors of a matrix with entries $0, 1, -1$, so $|b_{ij}| \le (n-1)!$. Thus
$$|b|\,h\left(\frac{g_1}{g_0}, \ldots, \frac{g_n}{g_0}\right) \le \max_{i = 1, \ldots, n}\{|b_{i1}| + \cdots + |b_{in}|\}\,h(Q_1, \ldots, Q_n).$$
The height on the left is $h(\psi)$ and that on the right at most $h(V)$. The result follows at once, at least under our assumption that the $a(I)$ $(I \in \mathcal{I})$ have full rank $n$.

If this assumption does not hold, then we simply increase the rank by successively adjoining unit vectors $e_k$ until the rank becomes $n$; this amounts to the addition of equations $\frac{g_k}{g_0} = 1$. Now we take a subset of $n$ independent equations and solve again with Cramer. The resulting estimates are certainly no larger than before, and this completes the proof.
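The scaling rule for Grassmannians under a $G$-automorphism, used in (7.1) and in the proof of Lemma 7.2, can be illustrated numerically. In the following Python sketch we assume, as a working convention of our own, that the Grassmannian coordinates $A(I)$ are the maximal minors of a matrix of $n - m + 1$ defining equations, indexed by column subsets $I$; this matches the cardinality $n - m + 1$ above. Rational numbers stand in for elements of $K$, and all function names are ours.

```python
from fractions import Fraction
from itertools import combinations, permutations


def determinant(matrix):
    """Exact determinant by the Leibniz formula (adequate for small matrices)."""
    size = len(matrix)
    total = Fraction(0)
    for perm in permutations(range(size)):
        inversions = sum(1 for a in range(size) for b in range(a + 1, size)
                         if perm[a] > perm[b])
        term = Fraction(-1 if inversions % 2 else 1)
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total


def maximal_minors(equations):
    """Map each column subset I to the minor A(I) of the equation matrix."""
    rows = len(equations)
    cols = len(equations[0])
    return {I: determinant([[Fraction(row[j]) for j in I] for row in equations])
            for I in combinations(range(cols), rows)}


def apply_automorphism(equations, g):
    """Equations of psi(V) when psi sends X_i to g_i * X_i: divide column i by g_i."""
    return [[Fraction(entry) / Fraction(g[i]) for i, entry in enumerate(row)]
            for row in equations]
```

With these conventions every minor of `apply_automorphism(equations, g)` on the columns $I$ equals the corresponding minor of `equations` divided by $\prod_{i \in I} g_i$, which is the statement that the Grassmannians of $\psi(V)$ are the $A(I)/g(I)$.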
8. A proposition.
This, the main result of the present section, is a first step in the proof of the Descent Step over $\sqrt{G}$, with $V$ in $\mathbf{P}^n$ $(n \ge 2)$ either a hyperplane or defined over a finite field. We continue with our assumption that $K$ is finitely generated over $\mathbf{F}_p$; thus $\mathbf{F}_K = \overline{\mathbf{F}_p} \cap K$ is a finite field. Let $G$ in $K^*$ be finitely generated of rank $r \ge 1$ modulo $\mathbf{F}_K^*$; now we may write without confusion simply that $G$ is finitely generated. It is known that the radical $\sqrt{G}$, which by definition lies still in $K$, is also finitely generated (see for example [Mass] p.195), also clearly of rank $r$ over $\mathbf{F}_K^*$. For the moment we work exclusively with this radical. We further assume that $K$ is transcendental over $\mathbf{F}_p$ and we choose any separable transcendence basis $B$; then we are free to apply the results of sections 3, 4 and 5 about heights $h = h_B$ and regulators $R = R_B$.

We say that $V$ is transversal if every coordinate $X_i$ $(i = 0, \ldots, n)$ actually occurs in the defining equations. This property is independent of the choice of equations. Its purpose is to prevent “free variables” as in (1.1) with $a_i = 0$.

Transversality is a harmless restriction because we could overcome it simply by working in lower dimensions. Clearly every linear subvariety of a transversal variety is also transversal. Also a transversal variety must be proper (i.e. not the full $\mathbf{P}^n$).

We recall the function $\delta$ from Lemma 4.3.

Proposition. Let $V$ be a transversal linear subvariety of $\mathbf{P}^n$ defined over $K$, and suppose either that $V$ has dimension $n - 1$ or that $V$ is defined over some $\mathbf{F}_q$. Suppose also that $V$ is not contained in any coset $T \ne \mathbf{P}^n$.
Let $\pi$ be any point of $V(\sqrt{G})$.

If $V$ has dimension $n - 1$, then either

(i) there is a proper linear subvariety $W$ of $V$, also defined over $K$, with $h(W) \le$ n n dδ ( n + r ) h ( V ) n R ( √ G ) , such that $\pi$ lies in $W(\sqrt{G})$,

or

(ii) there is a $\sqrt{G}$-automorphism $\psi$ with $h(\psi) \le np\,\delta(n + r)\,R(\sqrt{G})^2$, a point $\pi'$ and a linear subvariety $V'$ of $\mathbf{P}^n$ such that $\pi = \psi(\pi'^p)$ and $V = \psi(V'^p)$.

If $V$ is defined over $\mathbf{F}_q$, then either

(i) there is a proper linear subvariety $W$ of $V$, also defined over $K$, with $h(W) \le$ n n dδ ( n + r ) R ( √ G ) , such that $\pi$ lies in $W(\sqrt{G})$,

or

(iii) there is a point $\pi'$ in $\mathbf{P}^n(\sqrt{G})$ with $\pi = \pi'^p$.

Proof. Suppose first that $V$ has dimension $n - 1$. Then we just have to follow the arguments of the proof of Lemma 5 of [Mass] (p.197). Because these arguments are expressed in terms of “broad sets” and this notion is no longer appropriate, we write out all the details.

Because $V$ is transversal, we may work affinely with a point $\pi = (x_1, \ldots, x_n)$ satisfying a single equation
$$a_1 x_1 + \cdots + a_n x_n = 1. \qquad (8.1)$$
We write $C$ for the field of $p$th powers in $K$, and consider
$$s = \dim_C(Ca_1 x_1 + \cdots + Ca_n x_n),$$
so that $1 \le s \le n$.

First suppose that $s = n$. Then we apply Lemma 5.1 with $k = \mathbf{F}_K$, $m = n$ and $c_1 = \cdots = c_m = 1$ and $g_1 = a_1 x_1, \ldots, g_m = a_m x_m$. So the group must be enlarged by adjoining $a_1, \ldots, a_n$ to $\sqrt{G}$, becoming of rank at most $n + r$. The enlarged regulator $R$ can be estimated by Lemma 4.2, and we find
$$R \le 2^n h(a_1) \cdots h(a_n)\,R(\sqrt{G}) \le 2^n h(V)^n R(\sqrt{G}). \qquad (8.2)$$
The conclusion (b) of Lemma 5.1 is ruled out by $s = n$; and the conclusion (a) shows that

$h(a_1 x_1, \ldots, a_n x_n) \le$ n dδ ( n + r ) R .

It follows that $h(\pi) = h(x_1, \ldots, x_n)$ is at most

4 n dδ ( n + r ) R + $h(a_1^{-1}, \ldots, a_n^{-1}) \le$ n dδ ( n + r ) R + $nh(V)$

and so from (8.2) we deduce

$h(\pi) \le$ n n dδ ( n + r ) h ( V ) n R ( √ G ) + $nh(V) \le$ n n dδ ( n + r ) h ( V ) n R ( √ G ) . (8.3)

So we can take $W = \{\pi\}$ for (i) of the Proposition; and here $h(W) = h(\pi)$ is bounded as in (8.3).

Next suppose that $1 < s < n$. By means of a permutation we can assume that $g_1 = a_1 x_1, \ldots, g_s = a_s x_s$ are linearly independent over $C$. Take any $k$ with $s + 1 \le k \le n$; then we can apply Lemma 5.2 with $m = s$ and $g_0 = a_k x_k$, $\sqrt{G}$ being enlarged as above. We find relations
$$\sum_{j=1}^{s} c_{kj}\,a_j x_j = a_k x_k \quad (k = s+1, \ldots, n) \qquad (8.4)$$
with $c_{kj}$ in $C$, and the quotients
$$f_{kj} = \frac{c_{kj}\,a_j x_j}{a_k x_k} \quad (j = 1, \ldots, s;\ k = s+1, \ldots, n) \qquad (8.5)$$
satisfy

$h(f_{k1}, \ldots, f_{ks}) \le$ s dδ ( n + r ) R $(k = s+1, \ldots, n)$. (8.6)

We now eliminate the $a_k x_k$ $(k = s+1, \ldots, n)$ in (8.1). We find
$$c_1 a_1 x_1 + \cdots + c_s a_s x_s = 1 \qquad (8.7)$$
with
$$c_j = 1 + \sum_{k=s+1}^{n} c_{kj} \quad (j = 1, \ldots, s) \qquad (8.8)$$
in $C$.

Next apply Lemma 5.1 with $m = s$ to (8.7) and $g_j = a_j x_j$ $(j = 1, \ldots, s)$ also in the enlarged $\sqrt{G}$. Again conclusion (b) is impossible. It follows that the
$$f_j = c_j a_j x_j \quad (j = 1, \ldots, s) \qquad (8.9)$$
satisfy

$h(f_1, \ldots, f_s) \le$ s dδ ( n + r ) R . (8.10)

Thus in (8.5) certain quotients $x_j/x_k$ are bounded modulo $C$, whereas in (8.9) certain $x_j$ themselves are bounded modulo $C$. We can eliminate $C$ by substituting (8.8) into (8.9) and using (8.5) to get
$$f_j = a_j x_j + \sum_{k=s+1}^{n} f_{kj}\,a_k x_k \quad (j = 1, \ldots, s). \qquad (8.11)$$
Because $a_j \ne 0$ $(j = 1, \ldots, s)$ these express the fact that $\pi = (x_1, \ldots, x_n)$ lies on a linear variety $V'$ of dimension $n - s$; and because $s \ne 1$ this dimension is strictly less than the dimension $n - 1$ of $V$. So we can take $W$ as the intersection of $V'$ with $V$. This is in fact $V'$ because if we add up all the above equations (8.11) and use (8.4), (8.5), (8.7), (8.9), then we end up with (8.1).

Now we have to estimate the height of (8.11). In the corresponding matrix, every column has by (8.6) and (8.10) height at most 4 s dδ ( n + r ) R + $h(V)$, which as above in (8.3) we can estimate by

B = 8 n n dδ ( n + r ) h ( V ) n R ( √ G ) .

It follows that

$h(W) \le sB \le$ n n dδ ( n + r ) h ( V ) n R ( √ G ) .

This too settles (i) of the Proposition.

Finally suppose $s = 1$. This means that $a_1 x_1, \ldots, a_n x_n$ are in $C$. By Lemma 4.4 with $l = p$ we can write $x_j = g_j x_j'^{\,p}$ with $g_j, x'_j$ in $\sqrt{G}$ $(j = 1, \ldots, n)$ and
$$h(g_j) \le p\,\delta(r)\,R(\sqrt{G})^2 \le p\,\delta(n + r)\,R(\sqrt{G})^2 \quad (j = 1, \ldots, n).$$
Then $a_j g_j$ is in $C$ so has the form $a_j'^{\,p}$ $(j = 1, \ldots, n)$. Finally
$$1 = a_1 x_1 + \cdots + a_n x_n = a_1'^{\,p} x_1'^{\,p} + \cdots + a_n'^{\,p} x_n'^{\,p} = (a'_1 x'_1 + \cdots + a'_n x'_n)^p,$$
and this gives part (ii) of the Proposition, with $\psi$ as in (1.7) above for $g_0 = 1$, $\pi' = (x'_1, \ldots, x'_n)$, and $V'$ defined by (8.1) above with the new coefficients $a'_1, \ldots, a'_n$.

This proves the Proposition when $V$ has dimension $n - 1$. Incidentally when the coefficients in (8.1) are in some $\mathbf{F}_q$, then the argument for $s = 1$ shows that $x_1, \ldots, x_n$ are in $C$. So they are $p$-th powers $x_1'^{\,p}, \ldots, x_n'^{\,p}$; and clearly $x'_1, \ldots, x'_n$ are in $\sqrt{G}$. Thus we get the conclusion (iii) of the Proposition when $V$ has dimension $n - 1$. And the case $s \ne 1$ leads of course to (i). So it remains only to treat $V$ of dimension $m - 1 < n - 1$ defined over $\mathbf{F}_q$.

This we do by expressing the affine equations of $V$ in triangular form, which after a permutation we can suppose are
$$x_i = a_{i0} + a_{i1} x_1 + \cdots + a_{i,m-1} x_{m-1} \quad (i = m, m+1, \ldots, n) \qquad (8.12)$$
with $a_{ij}$ in $\mathbf{F}_q$. This gives $V = V_m \cap \cdots \cap V_n$ for the varieties defined individually by each equation.

Consider the first equation. There may be some zero coefficients $a_{mj}$, but not all are zero, because $V(\sqrt{G})$ is non-empty. In fact at least two are non-zero otherwise $V$ would be contained in a coset $T \ne \mathbf{P}^n$ contrary to our assumption. We can thus regard $V_m$ as a transversal variety of codimension 1 in some projective space of dimension at least 2 and at most $m < n$. Applying the Proposition for the cases already proved, we get two possibilities (i), (iii). If (i) holds for $V_m$, then we get a proper subvariety $W_m$ of $V_m$ with

$h(W_m) \le$ n n dδ ( n + r ) R ( √ G ) . (8.13)

Now $W_m$ intersects the remaining intersection $U_m = \bigcap_{i \ne m} V_i$ in a proper subspace of $V = V_m \cap U_m$. For example the triangular nature of (8.12) makes it clear that $x_{m+1}, \ldots, x_n$ are determined by $x_1, \ldots, x_{m-1}$ on $U_m$, and then that $x_m$ is determined by $x_1, \ldots, x_{m-1}$ on $W_m$ in $V_m$; but also some non-zero polynomial of degree at most 1 in $x_1, \ldots, x_{m-1}$ must vanish on $W_m$. So $W = W_m \cap U_m$ has dimension strictly less than $m - 1$. By Lemma 7.1 we have $h(W) \le h(W_m)$. So by (8.13) we get (i) of the Proposition for the original $V$. But what happens if (iii) holds for $V_m$?

This means that all the $x_j$ actually occurring in the first equation of (8.12) are $p$-th powers, which certainly goes some way in the direction of (iii) for $V$. But then we can try the second equation instead. Either we get a $W$ as above, or all the $x_j$ actually occurring in the second equation of (8.12) are $p$-th powers. And so on. In the end, we either get $W$ or that all the $x_j$ actually occurring in all the equations (8.12) are $p$-th powers. Because $V$ is transversal this does give the full (iii) for $V$; and so completes the proof of the Proposition.
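The case $s = 1$ in the proof of the Proposition rests on the additivity of the $p$-th power map in characteristic $p$, the same fact that produces the infinitely many solutions $x = t^{p^e}$, $y = (1 - t)^{p^e}$ of $x + y = 1$ in section 1. The following Python sketch checks this in the simplest setting, polynomials in one variable over $\mathbf{F}_p$ represented as coefficient lists. The helper names are ours, and the code is an illustration rather than part of the argument.

```python
from math import comb


def middle_binomials_vanish(p):
    """True when p divides C(p, k) for every 0 < k < p."""
    return all(comb(p, k) % p == 0 for k in range(1, p))


def poly_mul_mod(f, g, p):
    """Product of two polynomials (coefficient lists, lowest degree first) modulo p."""
    product = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            product[i + j] = (product[i + j] + a * b) % p
    return product


def poly_pow_mod(f, exponent, p):
    """f raised to a non-negative integer power, coefficients reduced modulo p."""
    result = [1]
    for _ in range(exponent):
        result = poly_mul_mod(result, f, p)
    return result
```

For a prime $p$ the middle binomial coefficients vanish modulo $p$, so `poly_pow_mod([1, -1], p, p)` has only the constant term $1$ and the leading term $-1$ (stored as $p - 1$), reflecting $(1 - t)^p = 1 - t^p$. For a composite modulus such as $4$ the first check fails.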
9. The main estimate.
This is a quantitative version of our Descent Step over $\sqrt{G}$ without the requirement that the subvarieties $W$ are isotrivial. This leads to a relatively small exponent attached to the height $h(V)$. As before $n \ge 2$, and we continue with our assumption that $K$ is finitely generated and transcendental over $\mathbf{F}_p$, with separable transcendence basis $B$ and $\mathbf{F}_K = \overline{\mathbf{F}_p} \cap K$; further $G$ is finitely generated of rank $r \ge 1$ modulo $\mathbf{F}_K^*$.

Main Estimate. Let $V$ be a positive-dimensional linear subvariety of $\mathbf{P}^n$ defined over $K$ but not a coset.

(a) If $V$ is not $\sqrt{G}$-isotrivial, then
$$V(\sqrt{G}) = \bigcup_{W \in \mathcal{W}} W(\sqrt{G})$$
for a finite set $\mathcal{W}$ of proper linear subvarieties $W$ of $V$, also defined over $K$ and with

$h(W) \le$ n d (10 n δ ( n + r )) n +1 h ( V ) n R ( √ G ) n +2 .

(b) If $V$ is $\sqrt{G}$-isotrivial and $\psi(V)$ is defined over $\mathbf{F}_q$, then
$$V(\sqrt{G}) = \psi^{-1}\left(\bigcup_{W \in \mathcal{W}}\ \bigcup_{e=0}^{\infty} \big(\psi(W)(\sqrt{G})\big)^{q^e}\right)$$
for a finite set $\mathcal{W}$ of proper linear subvarieties $W$ of $V$, also defined over $K$ and with

$h(\psi(W)) \le$ n n ( q/p ) dδ ( n + r ) R ( √ G ) .

Proof.
We prove this first when $V$ is transversal and not contained in any coset $T \ne \mathbf{P}^n$. We start with $\sqrt{G}$-isotrivial $V$. Because we estimate $h(\psi(W))$ and not $h(W)$, it clearly suffices to assume that $\psi$ is the identity, so that $V$ is defined over $\mathbf{F}_q$. Take arbitrary $\pi$ in $V(\sqrt{G})$ not in $V(\mathbf{F}_K)$. Then either (i) or (iii) of the Proposition holds.

If (i) holds, then (b) looks good with $e = 0$ (and $\psi$ the identity); at least $\pi$ lies in some $W(\sqrt{G})$ for a proper subvariety $W$ of $V$, defined over $K$, with

$h(W) \le$ n n dδ ( n + r ) R ( √ G ) . (9.1)

What if (iii) holds? Now any $a$ in $\mathbf{F}_q$ has a unique $p$th root $a^{1/p}$ in $\mathbf{F}_q$, which is also a conjugate of $a$ over $\mathbf{F}_p$. We get a new point $\pi'$ in $V'(\sqrt{G})$, also not in $V'(\mathbf{F}_K)$, for a new variety $V'$ in $\mathbf{P}^n$ which is a conjugate of $V$. The new variety has the same dimension as $V$, and is also defined over $\mathbf{F}_q$. So we can repeat the process, and again we get either (i) or (iii) of the Proposition.

If (i) holds, then $\pi'$ lies in some $W'(\sqrt{G})$ again with $W'$ over $K$ and $h(W')$ bounded as in (9.1). So $\pi$ lies in $(W'(\sqrt{G}))^p$ as in (b) with $e = 1$.

Or if (iii) holds, then we get a new point $\pi''$ in $V''(\sqrt{G})$ for a new conjugate $V''$ of $V$ in $\mathbf{P}^n$.

And so on, in a manner similar to the looping in the $p$-automata of [D] section 4. Because $\pi$ was not in $V(\mathbf{F}_K)$, this procedure must eventually stop at some proper subvariety $W^{(L)}$ over $K$ of $V^{(L)}$ (here the number $L$ of repetitions might depend on $\pi$). Now the original point $\pi$ lies in $(W^{(L)}(\sqrt{G}))^{p^L}$ with $h(W^{(L)})$ bounded as in (9.1).

Because $\pi$ was arbitrary in $V(\sqrt{G})$ not in the finite set $V(\mathbf{F}_K)$, the conclusion so far is
$$V(\sqrt{G}) \subseteq \bigcup_{W \in \mathcal{W}}\ \bigcup_{L=0}^{\infty} (W(\sqrt{G}))^{p^L}$$
for a set $\mathcal{W}$ of proper subvarieties $W$ of conjugates of $V$ defined over $K$ and satisfying (9.1); here we may have to include single points $W$ with $h(W) = 0$. To get equality we write $q = p^f$ and $L = fe + l$ for $e \ge 0$ and $0 \le l \le f - 1$; this gives
$$V(\sqrt{G}) \subseteq \bigcup_{\tilde W \in \tilde{\mathcal{W}}}\ \bigcup_{e=0}^{\infty} (\tilde W(\sqrt{G}))^{q^e}$$
with a new collection $\tilde{\mathcal{W}}$ of proper subvarieties $\tilde W = W^{p^l}$ of conjugates of $V$ with

$h(\tilde W) = p^l h(W) \le$ n n ( q/p ) dδ ( n + r ) R ( √ G ) .

Finally by intersecting each $\tilde W$ with $V = V^q$ we can assume that each $\tilde W$ is a proper subvariety of $V$ itself in the above, without increasing the height further. Because $V$ is defined over $\mathbf{F}_q$, the $(\tilde W(\sqrt{G}))^{q^e}$ now lie in $(V(\sqrt{G}))^{q^e} = V(\sqrt{G})$, and so at last the two sides are equal. Now we have the desired (b); of course the finiteness of the collection of $\tilde W$ follows from the Northcott Property already noted in section 7. This settles the case of transversal $\sqrt{G}$-isotrivial $V$ not contained in a proper coset.

Henceforth (until further notice) we will assume that $V$ is not $\sqrt{G}$-isotrivial (and still transversal not contained in a proper coset).

Suppose first that $V$ is a hyperplane. Take arbitrary $\pi$ in $V(\sqrt{G})$. Then either (i) or (ii) of the Proposition holds. We regard this dichotomy as the starting stage $l = 1$.

If (i) holds, then as before (a) of the Main Estimate looks good; at least $\pi$ lies in some $W(\sqrt{G})$ for a proper subvariety $W$ of $V$, defined over $K$, with
$$h(W) \le C\,h(V)^{2n} \qquad (9.2)$$
for

C = 8 n n dδ ( n + r ) R ( √ G ) . (9.3)

What if (ii) holds? We get a new point $\pi'$ in $V'(\sqrt{G})$ for a new variety $V'$ in $\mathbf{P}^n$ with
$$\pi = \psi(\pi'^p), \qquad V = \psi(V'^p). \qquad (9.4)$$
Here $\psi$ is a $\sqrt{G}$-automorphism with
$$h(\psi) \le pB \qquad (9.5)$$
for
$$B = n\,\delta(n + r)\,R(\sqrt{G})^2. \qquad (9.6)$$
Now $V'$ is also a hyperplane, and also not $\sqrt{G}$-isotrivial. So we can repeat the process, and again we get either (i) or (ii) of the Proposition. This dichotomy is the next stage $l = 2$.

If (i) holds, then $\pi'$ lies in some $W'(\sqrt{G})$. So $\pi$ lies in $W(\sqrt{G})$ for $W = \psi(W'^p)$, almost as good as above, except that $h(W)$ could be larger than before. We take care of this later.

Or if (ii) holds, then we get a new point $\pi''$ in $V''(\sqrt{G})$ for a new variety $V''$ in $\mathbf{P}^n$.

And so on.
At stage l we get either π ( l − in a proper subvariety W ( l − of V ( l − with h ( W ( l − ) ≤ Ch ( V ( l − ) n (9 . π ( l ) in V ( l ) ( √ G ) for a new variety V ( l ) with π ( l − = ψ ( l − (( π ( l ) ) p ) , V ( l − = ψ ( l − (( V ( l ) ) p ) . (9 . h ( ψ ( l − ) ≤ pB. (9 . V is not √ G -isotrivial,and after a certain number L of repetitions which this time is independent of π . Actuallylet us define the integer L ≥ p L ≤ h ( V ) R ( √ G ) < p L +1 . (9 . V = ψ l (( V ( l ) ) p l ) with the √ G -automorphism ψ l = ψψ ′ p · · · ( ψ ( l − ) p l − . (9 . V in the affine form (8.1), we know that some coefficient x = a j = 0does not lie in √ G , and x = gy p l for some g in √ G and some y in K . We can now applyLemma 4.5, because √ G k there is just √ G . We conclude that p l ≤ h ( x ) R ( √ G ) ≤ h ( V ) R ( √ G ) .
34n view of (9.10) this means that ( ii ) cannot hold for l = L + 1. Thus there is some L with 0 ≤ L ≤ L such that ( ii ) holds at stages l = 1 , . . . , L (at least if L ≥ i ) holds at stage l = L + 1. We conclude that π ( L ) lies in W ( L ) , and from (9.7) h ( W ( L ) ) ≤ Ch ( V ( L ) ) n . (9 . π = ψ L (( π ( L ) ) p L ) lies in W = ψ L (( W ( L ) ) p L ). By (7.1) and (9.11) we get h ( W ) ≤ p L h ( W ( L ) ) + nh ( ψ L ) ≤ p L h ( W ( L ) ) + n (cid:16) h ( ψ ) + ph ( ψ ′ ) + · · · + p L − h ( ψ ( L − (cid:17) , which using (9.9) and (9.12) yields h ( W ) ≤ Cp L h ( V ( L ) ) n + 2 np L B ≤ C ( p L h ( V ( L ) ) n + 2 np L B. (9 . h ( V ( L ) ) we use (7.1), (9.8) and (9.9) to get ph ( V ( l ) ) = h (( ψ ( l − ) − V ( l − ) ≤ h ( V ( l − ) + n h ( ψ ( l − ) ≤ h ( V ( l − ) + n pB. If L ≥ p l − and sum from l = 1 to l = L , getting p L h ( V ( L ) ) ≤ h ( V ) + 2 n p L B (which holds also if L = 0). Inserting this into (9.13) we get h ( W ) ≤ C (cid:0) h ( V ) + 2 n p L B (cid:1) n + 2 np L B ≤ C (cid:0) h ( V ) + 2 n p L B (cid:1) n , and then using (9.6) and (9.10) with L ≤ L we find h ( W ) ≤ Ch ( V ) n (cid:16) n δ ( n + r ) R ( √ G ) (cid:17) n ≤ Ch ( V ) n (cid:16) n δ ( n + r ) R ( √ G ) (cid:17) n From (9.3) we get finally h ( W ) ≤ C ′ h ( V ) n R ( √ G ) n +2 (9 . C ′ = 16 n n dδ ( n + r ) (cid:0) n δ ( n + r ) (cid:1) n ≤ n d (cid:0) n δ ( n + r ) (cid:1) n +1 . Because π was arbitrary, the conclusion so far is V ( √ G ) ⊆ [ W ∈W W ( √ G )for a finite collection W of proper subvarieties W of V satisfying (9.14). But then the twosides are of course equal. This settles the Main Estimate for transversal hyperplanes V that are not √ G -isotrivial and not contained in a proper coset.35ext suppose that V , still not √ G -isotrivial (and still transversal not contained in aproper coset), has dimension m − m < n . So after a permutation of variablesit can be defined by equations (6.1). 
Each of these equations defines a hyperplane V i , sothat V = V m ∩ · · · ∩ V n .We claim that we can assume that all non-zero a ( i, j ) lie in √ G . Otherwise forexample V m is transversal and not √ G -isotrivial in the projective space with coordinates X j corresponding to j = m and the j with a ( m, j ) = 0. Since no X m − aX j ( m = j, a = 0)vanishes on V , this projective space has dimension at least 2. So then we could applythe hyperplane result (9.14) to deduce that all solutions lie in a finite union of propersubspaces W m of this V m with h ( W m ) ≤ C ′ h ( V m ) n R ( √ G ) n +2 . But as in the affine situation just after (8.13), it can be seen that W m intersects the remain-ing intersection U m = T i = m V i in a proper subspace of V = V m ∩ U m . For example the tri-angular nature of (6.1) makes it clear that X m +1 , . . . , X n are determined by X , . . . , X m − on U m , and then that X m is determined by X , . . . , X m − on W m in V m ; but also somenon-zero linear form in X , . . . , X m − must vanish on W m . Therefore W = W m ∩ U m hasdimension strictly less than m −
1. So we are indeed in a proper subspace as required by(a) of the Main Estimate. Further W = W m ∩ V and so h ( W ) ≤ h ( W m ) + h ( V ) by Lemma7.1; moreover h ( V m ) ≤ h ( V ) because the a ( m, j ) are themselves among the Grassmanniancoordinates of V . We end up with (9.14) with say an extra factor 2.So indeed from now on we can assume that all non-zero a ( i, j ) in (6.1) lie in √ G . Thismeans that we are set up to apply Lemma 6.1. We will see that the effect is to pass toa proper subvariety of at least one of V m , . . . , V n despite their being separately isotrivial.As V is not √ G -isotrivial by assumption, we find some quotient (6.2), say Q , not lyingin F K . Let π = ( ξ , . . . , ξ n ) be any point of V ( √ G ). For a typical factor a ( i,j ) a ( i,j ′ ) in Q weapply part ( b ) of the Main Estimate in lower dimensions to V i , with ψ i determined by 1and the non-zero a ( i, j ). So here q = p . We find finitely many proper subspaces W i of V i such that ψ i ( V i ( √ G )) lies in the union of the S ∞ e =0 ( ψ i ( W i )( √ G )) p e , with h ( ψ i ( W i )) ≤ n n dδ ( n + r ) R ( √ G ) (9 . p ). In particular, writing π i for the projection of π to the lowerdimensional space, we have equations ψ i ( π i ) = σ q i i (9 . σ i in some ψ i ( W i ) and some power q i of p . Thus a ( i,j ) ξ j a ( i,j ′ ) ξ j ′ = η q i for certain η = η ( i, j, j ′ )in K ∗ . Multiplying all these over the factors in (6.2) we find Q = η q · · · η q k k for certain η , . . . , η k in K ∗ . Because the fixed Q is not in F K , this forces q = min { q , . . . , q k } to bebounded above by some quantity depending only on V . In fact h ( Q ) ≥ q , but on the otherhand from (6.2) we see that h ( Q ) ≤ ( n + 1) h ( V ). Thus q ≤ ( n + 1) h ( V ) . (9 . q = q i . Now (9.16) says that π i and so π lies in the variety U = ψ − i ( ψ i ( W i )) q of dimension strictly less than the dimension of V i . This intersects V i in a proper subvariety W ′ i of V i . 
Once more this W ′ i intersects the remaining intersection T i ′ = i V i ′ in a proper subvariety W of V . As for heights, we have W = W ′ i ∩ V so h ( W ) ≤ h ( W ′ i ) + h ( V ). Also h ( W ′ i ) ≤ h ( U ) + h ( V i ) ≤ h ( U ) + h ( V ), and also h ( U ) ≤ qh ( ψ i ( W i )) + nh ( ψ − i ) ≤ qh ( ψ i ( W i )) + n h ( V i )because of the definition of ψ i . Putting these together and using (9.15),(9.17) we concludethat h ( W ) ≤ n ( n + n + 3)4 n dδ ( n + r ) h ( V ) R ( √ G ) . This is much smaller than (9.14), and so we have completed the proof of the Main Estimatewhen V is transversal and not contained in a proper coset. In case (a) we have reached sofar the bound h ( W ) ≤ Ah ( V ) n R n +2 with R = R ( √ G ) and A = 4 n d (10 n δ ( n + r )) n +1 due to the extra factor 2 encountered after establishing (9.14).To treat the more general situation when V is transversal and not itself a coset,we use induction on n ≥
2, and we will obtain in case (a) the slightly weaker result h ( W ) ≤ Ah ( V ) n R n +2 + nh ( V ). This leads at once to the bound given in the MainEstimate.If n = 2 then there is a single equation a X + a X + a X = 0, and transversalityimplies all a i = 0. Thus no X i − aX j ( i = j, a = 0) vanishes on V , and we are done. Thuswe can suppose that n ≥ X n − aX n − ( a = 0) vanishes on V . In the remaining equations for V we may eliminate X n to obtain a linear variety ˜ V in P n − . This ˜ V cannot be a coset otherwise V would be. Also ˜ V certainly involves thevariables X , . . . , X n − and so is transversal in P ˜ n for ˜ n = n − n = n −
1. Here ˜ n ≥ n = 3; but in that case if ˜ V is not transversal in P then V would be defined by37quations X = aX and b X + b X = 0 so would be a coset. Thus we can assume that˜ V is transversal in P ˜ n with ˜ n ≥ V is not √ G -isotrivial as in (a). Then ˜ V cannot be √ G -isotrivialotherwise we could transform X n to make V isotrivial. Thus by induction the MainEstimate holds for ˜ V . It is now relatively straightforward to deduce the Main Estimatefor V . Thus by case (a) for ˜ V we get˜ V ( √ G ) = [ ˜ W ∈ ˜ W ˜ W ( √ G ) (9 . W of proper linear subvarieties ˜ W of ˜ V , also defined over K and with h ( ˜ W ) ≤ Ah ( ˜ V ) n R n +2 + ( n − h ( ˜ V ). Now we will check that (a) for V follows with W defined by the equations of ˜ W together with X n = aX n − . First the upper bound ofLemma 7.1 gives h ( W ) ≤ h ( ˜ W ) + h ( a ) ≤ Ah ( ˜ V ) n R n +2 + ( n − h ( ˜ V ) + h ( a ) . (9 . X n − = 0 on ˜ V , else (9.18) would be empty; and so the lower bound ofLemma 7.1 gives h ( V ) ≥ max { h ( ˜ V ) , h ( a ) } . Therefore (9.19) implies h ( W ) ≤ Ah ( V ) n R n +2 + nh ( V )as required.And in case (b) for √ G -isotrivial V (assuming as above that ψ is the identity) wesee that ˜ V is √ G -isotrivial and a lies in F q . We get (b) for V from (b) for ˜ V using theanalogue ˜ V ( √ G ) = S ˜ W ∈ ˜ W S ∞ e =0 ( ˜ W ( √ G )) q e of (9.18) with as above W defined by theequations of ˜ W together with X n = aX n − ; now h ( W ) ≤ h ( ˜ W ).What if V is not transversal (and of course still not a coset)? Then it is transversal(and still not a coset) in some projective subspace of dimension n ′ ≤ n −
1. Here n ′ ≥ n ′ now lead immediately to the same cases in $\mathbf{P}^n$; we have merely ignored n − n′ projective variables that were never in the equations anyway.

This finally finishes the proof of the Main Estimate.

In view of the fact that the estimate in case (a) is independent of the characteristic p, it may seem a nuisance that the estimate in case (b) depends on p. But actually this is unavoidable, and there are even examples to show that the full q/p is needed. To see this, take any power q > p, and define $K = \mathbf{F}_q(t)$ with $G = \sqrt{G}$ generated by $t$, $1 - t$ and a generator $\zeta$ of $\mathbf{F}_q^*$. Here we have $r = 2$, $R(\sqrt{G}) = \sqrt{3}$, $d = 1$. The affine equations
$$x + y = 1, \qquad x + \zeta z = 1$$
give rise to a $\sqrt{G}$-isotrivial line V (with $h(V) = 0$ and $\psi$ the identity), and an upper bound B in (b) would mean that all solutions over $\sqrt{G}$ are given by $w, w^q, w^{q^2}, \ldots$ for some w with $h(w) \le B$. Thus every solution $\pi$ would have either $h(\pi) \le B$ or $h(\pi) \ge q$. But
$$\pi = (x, y, z) = \left( (1 - t)^{q/p},\ t^{q/p},\ \frac{t^{q/p}}{\zeta} \right)$$
is a solution with $h(\pi) = q/p$. It follows that $B \ge q/p$.
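The example above rests on the Frobenius identity $(1-t)^{p^e} + t^{p^e} = 1$ in $\mathbf{F}_p[t]$, which is what makes $((1-t)^{q/p}, t^{q/p})$ a solution of $x + y = 1$. The following is a minimal sketch checking that identity by direct polynomial arithmetic. The helper names (`poly_mul`, `poly_pow`, `poly_add`, `frobenius_solution`) are hypothetical and not from the paper, and only the prime field $\mathbf{F}_p$ is modelled, not $\mathbf{F}_q$ or the third coordinate involving $\zeta$.

```python
def poly_mul(a, b, p):
    """Multiply polynomials over F_p (coefficient lists, lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out


def poly_pow(a, k, p):
    """Raise a polynomial over F_p to the k-th power by repeated squaring."""
    result = [1]
    base = a[:]
    while k:
        if k & 1:
            result = poly_mul(result, base, p)
        base = poly_mul(base, base, p)
        k >>= 1
    return result


def poly_add(a, b, p):
    """Add polynomials over F_p and strip trailing zero coefficients."""
    n = max(len(a), len(b))
    out = [((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p
           for i in range(n)]
    while len(out) > 1 and out[-1] == 0:
        out.pop()
    return out


def frobenius_solution(p, e):
    """Return (x, y) = ((1 - t)^(p^e), t^(p^e)) as polynomials over F_p."""
    k = p ** e
    x = poly_pow([1, p - 1], k, p)  # p - 1 represents -1 in F_p
    y = [0] * k + [1]
    return x, y


if __name__ == "__main__":
    x, y = frobenius_solution(3, 2)
    # x + y should be the constant polynomial 1
    print(poly_add(x, y, 3))
```

With q/p playing the role of $p^e$, the degree of each coordinate is q/p, which is the size of the solution that any bound B in (b) has to accommodate.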
10. Isotrivial W. We show here how to ensure that all the subvarieties W in the Main Estimate can be made $\sqrt{G}$-isotrivial, at the expense of enlarging the exponents in the upper bounds for their heights. To simplify the various expressions we abbreviate the factors in case (a) of the Main Estimate by ∆ = ∆( n, r, d ) = 8 n d (10 n δ ( n + r )) n +1 ≥ , h = h ( V ) , R = R ( √ G ) , (10.1) and in case (b) by Ψ = Ψ( n, r, d, p, q ) = 8 n n ( q/p ) dδ ( n + r ) ≥ . (10.2) We also write ρ ( m ) = ρ n ( m ) = (2 n ) m − n − , η ( m ) = η n ( m ) = (2 n ) m ( m = 1 , 2 , . . . ).

Main Estimate for isotrivial W. Let V be a linear subvariety of $\mathbf{P}^n$ defined over K but not a coset, with dimension $m - 1 \ge 1$.

(a) If V is not $\sqrt{G}$-isotrivial, then
$$V(\sqrt{G}) = \bigcup_{W \in \mathcal{W}} W(\sqrt{G})$$
for a finite set $\mathcal{W}$ of proper linear $\sqrt{G}$-isotrivial subvarieties W of V, also defined over K and with
$$h(W) \le (\Delta R^{n+2})^{\rho(m)} h^{\eta(m)}. \qquad (10.3)$$

(b) If V is $\sqrt{G}$-isotrivial and $\psi(V)$ is defined over $\mathbf{F}_q$, then
$$V(\sqrt{G}) = \psi^{-1}\left( \bigcup_{W \in \mathcal{W}} \bigcup_{e=0}^{\infty} \bigl(\psi(W)(\sqrt{G})\bigr)^{q^e} \right)$$
for a finite set $\mathcal{W}$ of proper linear $\sqrt{G}$-isotrivial subvarieties W of V, also defined over K and with
$$h(\psi(W)) \le (\Delta R^{n+2})^{\rho(m-1)} (\Psi R)^{\eta(m-1)}.$$

Proof.
We start with case (a), and now we can write the bound as h ( W ) ≤ ∆ h n R n +2 (10 . W not necessarily √ G -isotrivial. We show by induction on the dimension m − ≥ V that the increased bound h ( ˜ W ) ≤ (∆ R n +2 ) ρ ( m ) h η ( m ) (10 . W are √ G -isotrivial.When m = 2 then the W are points and so automatically √ G -isotrivial as long as W ( √ G ) is non-empty.When m ≥ W is not √ G -isotrivial. We observe that such a W cannot be a coset T . For the latter is defined by finitely many X i = a ij X j ( a ij = 0),and if T ( √ G ) is non-empty then clearly each a ij lies in √ G . But now it is easy to see that T is √ G -isotrivial after all. For example we can rewrite the equations as a i X i = a j X j with a i , a j in √ G . Then we can set up an equivalence relation on { , , . . . , n } characterized bythe equivalence of such i, j . And now we need change only the variables in the equivalenceclasses of cardinality at least 2 in order to trivialize T .So by induction each of these W satisfies W ( √ G ) = [ ˜ W ∈ ˜ W ˜ W ( √ G )40ith √ G -isotrivial ˜ W such that h ( ˜ W ) ≤ (∆ R n +2 ) ρ ( m − h ( W ) η ( m − . Therefore all we have to do is substitute (10.4) into this. We find the upper bound (10.5)because ρ ( m −
1) + η(m − 1) = ρ(m), nη(m − 1) = η(m).

For case (b) we write the bound as
$$h(\psi(W)) \le \Psi R \qquad (10.6)$$
for W not necessarily $\sqrt{G}$-isotrivial. If some W is not $\sqrt{G}$-isotrivial, then neither is $\psi(W)$, and we can write
$$\psi(W)(\sqrt{G}) = \bigcup_{W^* \in \mathcal{W}^*} W^*(\sqrt{G})$$
with $\sqrt{G}$-isotrivial $W^*$ such that
$$h(W^*) \le (\Delta R^{n+2})^{\rho(m-1)} h(\psi(W))^{\eta(m-1)}. \qquad (10.7)$$
Then the increased bound
$$h(\psi(\widetilde{W})) \le (\Delta R^{n+2})^{\rho(m-1)} (\Psi R)^{\eta(m-1)} \qquad (10.8)$$
holds for the varieties $\widetilde{W} = \psi^{-1}(W^*)$, which are $\sqrt{G}$-isotrivial. In fact just as above, all we have to do is substitute (10.6) into (10.7), and we find at once (10.8). This completes the proof.
11. Points over G. We show here how to replace $V(\sqrt{G})$ and $W(\sqrt{G})$ in the Main Estimate by $V(G)$ and $W(G)$ at the expense of worsening the dependence on the regulator. However we no longer insist that the W are isotrivial. If needed, this could be secured just by repeating the arguments of the previous section. We retain the notations (10.1), (10.2) from that section. Of course n ≥ 2, and we continue with our assumption that K is finitely generated over $\mathbf{F}_p$, with $\mathbf{F}_K = \overline{\mathbf{F}_p} \cap K$; further G is finitely generated of rank r ≥ F ∗ K .

Main Estimate for points over G. There is a positive integer $f = f_K(G) \le [\sqrt{G} : G]$, depending only on K and G, with the following property. Let V be a positive-dimensional linear subvariety of $\mathbf{P}^n$ defined over K but not a coset.

(a) If V is not $\sqrt{G}$-isotrivial, then
$$V(G) = \bigcup_{W \in \mathcal{W}} W(G)$$
for a finite set $\mathcal{W}$ of proper linear subvarieties W of V, also defined over K and with $h(W) \le \Delta h^{n} R(\sqrt{G})^{n+2}$.

(b) If V is $\sqrt{G}$-isotrivial and $\psi(V)$ is defined over $\mathbf{F}_q$, then either

(ba) we have
$$V(G) = \bigcup_{W \in \mathcal{W}} W(G)$$
for a finite set $\mathcal{W}$ of proper linear subvarieties W of V, also defined over K and with $h(\psi(W)) \le |\mathbf{F}_K| \Psi R(G)$,

or (bb) we have
$$V(G) = \psi^{-1}\left( \bigcup_{W \in \mathcal{W}} \bigcup_{e=0}^{\infty} \bigl(\psi(W)(G)\bigr)^{q^{fe}} \right) \qquad (11.1)$$
for a finite set $\mathcal{W}$ of proper linear subvarieties W of V, also defined over K and with
$$h(\psi(W)) \le q^{f} |\mathbf{F}_K| \Psi R(G). \qquad (11.2)$$

In the following lemma $\varphi$ is the Euler function.

Lemma 11.1.
For a given power $Q > 1$ of a prime P consider a finite collection of congruence equations
$$L Q^{e} \equiv M \bmod N \qquad (11.3)$$
with N taken from a finite set $\mathcal{N}$ of positive integers and L, M taken from $\mathbf{Z}$. Suppose that the set of solutions $e \ge 0$ is non-empty. Then if there is some $M \ne 0$ with $\operatorname{ord}_P M < \operatorname{ord}_P N$ this set is

(a) finite with $Q^{e} \le \max_{N \in \mathcal{N}} N$,

and otherwise

(b) a finite union of arithmetic progressions $e = e_0, e_0 + f, e_0 + 2f, \ldots$ with $f = \prod_{N \in \mathcal{N}} \varphi(N)$ and $Q^{e_0} < Q^{f} \max_{N \in \mathcal{N}} N$.

Proof. Suppose first that there is some $M \ne 0$ with $\operatorname{ord}_P M < \operatorname{ord}_P N$. Then the corresponding $L \ne 0$, and we get
$$e \operatorname{ord}_P Q \le \operatorname{ord}_P L Q^{e} = \operatorname{ord}_P M < \operatorname{ord}_P N$$
giving case (a).

Thus we can assume that $\operatorname{ord}_P M \ge \operatorname{ord}_P N$ whenever $M \ne 0$. We proceed to verify case (b). Now the congruences (11.3) can be split into congruences modulo powers of P and congruences modulo powers $\tilde{P}^{m}$ of other primes $\tilde{P} \ne P$.

The former congruences, if any, will be satisfied as soon as e is sufficiently large. Indeed they amount to $L Q^{e} \equiv 0 \bmod P^{\operatorname{ord}_P N}$ and so conditions $e \ge \lambda$ for various real $\lambda \le \operatorname{ord}_P N / \operatorname{ord}_P Q$; that is, $Q^{\lambda} \le P^{\operatorname{ord}_P N} \le N$. Thus together they give a single condition $e \ge \Lambda$ for some real $\Lambda$ with $Q^{\Lambda} \le \max_{N \in \mathcal{N}} N$.

We note that whether e satisfies the other congruences depends only on its congruence class modulo f. For if $\tilde{P}^{m}$ divides some N then $\varphi(\tilde{P}^{m})$ divides $\varphi(N)$ which divides f, and so $Q^{f} \equiv 1 \bmod \tilde{P}^{m}$.

Thus the solutions e satisfy $e \ge \Lambda$ and also must lie in a finite number of arithmetic progressions modulo f. If $e_0$ is the smallest member of one of these progressions with $e_0 \ge \Lambda$, then $e_0 - f < \Lambda$ and this leads to case (b), thereby completing the proof.

We can now start on the proof of the Main Estimate for points over G.

Suppose first that V is not $\sqrt{G}$-isotrivial. Then (a) of the Main Estimate gives
$$V(\sqrt{G}) = \bigcup_{W \in \mathcal{W}} W(\sqrt{G})$$
for W satisfying (10.4). Now we can descend to G simply by intersecting with $\mathbf{P}^n(G)$.

Next suppose that V is $\sqrt{G}$-isotrivial and $\psi(V)$ is defined over $\mathbf{F}_q$. Using elementary divisors we can find generators $\gamma_1, \ldots, \gamma_r$ of $\sqrt{G}$ modulo constants and positive integers $d_1, \ldots, d_r$ such that $\gamma_1^{d_1}, \ldots, \gamma_r^{d_r}$ generate G modulo constants. The constants can be taken care of with an extra $\gamma_0$ generating $\sqrt{G} \cap \mathbf{F}_K$ and $\gamma_0^{d_0}$ generating $G \cap \mathbf{F}_K$; here $d_0$ divides the order of $\gamma_0$ as a root of unity. Thus
$$[\sqrt{G} : G] = d_0 d_1 \cdots d_r. \qquad (11.4)$$
We write $\psi(X_0, \ldots, X_n) = (\psi_0 X_0, \ldots, \psi_n X_n)$ with
$$\psi_i = \gamma_0^{a_{0i}} \gamma_1^{a_{1i}} \cdots \gamma_r^{a_{ri}} \quad (i = 0, \ldots, n) \qquad (11.5)$$
in $\sqrt{G}$. Now (b) of the Main Estimate gives
$$V(\sqrt{G}) = \psi^{-1}\left( \bigcup_{W \in \mathcal{W}} \bigcup_{e=0}^{\infty} \bigl(\psi(W)(\sqrt{G})\bigr)^{q^{e}} \right) \qquad (11.6)$$
for W satisfying (10.6). But we can no longer descend to G simply by intersecting with $\mathbf{P}^n(G)$.

Consider a point $\pi = (\pi_0, \ldots, \pi_n)$ of $V(G)$. By (11.6) there is a point $\sigma = (\sigma_0, \ldots, \sigma_n)$ in some $W(\sqrt{G})$ and some $e \ge 0$ with $\pi = \psi^{-1}(\psi(\sigma))^{q^{e}}$. As in (11.5) we write
$$\sigma_i = \gamma_0^{b_{0i}} \gamma_1^{b_{1i}} \cdots \gamma_r^{b_{ri}} \quad (i = 0, \ldots, n); \qquad (11.7)$$
and $\pi$ is over G and so
$$\pi_i = \gamma_0^{c_{0i} d_0} \gamma_1^{c_{1i} d_1} \cdots \gamma_r^{c_{ri} d_r} \quad (i = 0, \ldots, n).$$
Equating exponents we find a system of congruences
$$(a_{ji} + b_{ji}) q^{e} \equiv a_{ji} \bmod d_j \quad (i = 0, \ldots, n;\ j = 0, 1, \ldots, r) \qquad (11.8)$$
for e, depending on $\sigma$. We can apply Lemma 11.1, and the argument splits into two according to the conclusion. As the $b_{ji}$ in (11.7) appear only in the coefficients L, the splitting is independent of $\sigma$.

Suppose first that Lemma 11.1(a) holds. Then
$$q^{e} \le \max\{d_0, d_1, \ldots, d_r\} \le d_0 d_1 \cdots d_r = [\sqrt{G} : G]. \qquad (11.9)$$
Thus $\pi$ lies in the finitely many $\widetilde{W} = \psi^{-1}(\psi(W))^{q^{e}}$, which we can put together into a set $\widetilde{\mathcal{W}}$, and then we have shown that
$$V(G) \subseteq \bigcup_{\widetilde{W} \in \widetilde{\mathcal{W}}} \widetilde{W}(\sqrt{G}).$$
Intersecting with $\mathbf{P}^n(G)$ gives the same inclusion but with $\widetilde{W}(G)$ on the right-hand side. On the other hand
$$\widetilde{W} = \psi^{-1}(\psi(W))^{q^{e}} \subseteq \psi^{-1}(\psi(V))^{q^{e}} = \psi^{-1}(\psi(V)) = V$$
because $\psi(V)$ is defined over $\mathbf{F}_q$. Thus we conclude
$$V(G) = \bigcup_{\widetilde{W} \in \widetilde{\mathcal{W}}} \widetilde{W}(G)$$
as in (ba) of the Main Estimate for points over G. But now from (11.9) and (10.6) the heights satisfy
$$h(\psi(\widetilde{W})) = q^{e} h(\psi(W)) \le d_0 d_1 \cdots d_r \Psi R(\sqrt{G}).$$
Using Lemma 4.1 we see that $R(G) = d_1 \cdots d_r R(\sqrt{G})$, and so we can absorb some terms into the regulator to get
$$h(\psi(\widetilde{W})) \le d_0 \Psi R(G) \le |\mathbf{F}_K| \Psi R(G). \qquad (11.10)$$

Now suppose that Lemma 11.1(b) holds. Then $e = e_0 + f\tilde{e}$ with $\tilde{e} \ge 0$ and $q^{e_0}$ bounded as in (11.9) but with an extra $q^{f}$. In particular taking $\tilde{e} = 0$ we get a solution of (11.8) and this means that $\tilde{\sigma} = \psi^{-1}(\psi(\sigma))^{q^{e_0}}$ is also defined over G. It lies in
$$\widetilde{W} = \psi^{-1}(\psi(W))^{q^{e_0}} \qquad (11.11)$$
and so in $\widetilde{W}(G)$. We also have $\psi(\pi) = (\psi(\sigma))^{q^{e}} = (\psi(\tilde{\sigma}))^{\tilde{q}^{\tilde{e}}}$ for $\tilde{q} = q^{f}$. Thus we conclude
$$V(G) \subseteq \psi^{-1}\left( \bigcup_{\widetilde{W} \in \widetilde{\mathcal{W}}} \bigcup_{\tilde{e}=0}^{\infty} \bigl(\psi(\widetilde{W})(G)\bigr)^{\tilde{q}^{\tilde{e}}} \right) \qquad (11.12)$$
for the finite set $\widetilde{\mathcal{W}}$ of $\widetilde{W}$ in (11.11). On the other hand
$$\psi(\widetilde{W})^{\tilde{q}^{\tilde{e}}} = (\psi(W))^{q^{e_0} \tilde{q}^{\tilde{e}}} \subseteq (\psi(V))^{q^{e_0} \tilde{q}^{\tilde{e}}} = \psi(V)$$
again because $\psi(V)$ is defined over $\mathbf{F}_q$. Thus we conclude equality in (11.12).

Finally we calculate that $h(\psi(\widetilde{W})) = q^{e_0} h(\psi(W))$ is bounded above by
$$q^{f} \max\{d_0, d_1, \ldots, d_r\} \Psi R(\sqrt{G}) \le q^{f} |\mathbf{F}_K| \Psi R(G) \qquad (11.13)$$
as in (11.10). And $f = \varphi(d_0)\varphi(d_1) \cdots \varphi(d_r)$ depends only on K and G with
$$f \le d_0 d_1 \cdots d_r = [\sqrt{G} : G].$$
This completes the proof of (bb); and so the Main Estimate for points over G is proved.

In (11.13) the term $q^{f}$ cannot be so easily absorbed into the regulator without introducing an exponential dependence on $R(G)$. Let us discuss some aspects of this.

When $G = \sqrt{G}$ then $f = 1$ in (bb) and we are more or less back to (b) of the Main Estimate. But in general we need the extra f in (11.1). The following example shows that it sometimes must be almost as large as $[\sqrt{G} : G]$.

We go back to the equation $t^{m} x + y = 1$ of (1.5) over $K = \mathbf{F}_p(t)$, with $n = 2$. It is to be solved in the group $G = G_l$ generated by $t^{l}$ and $1 - t$, so that $r = 2$. Here $\sqrt{G}$ is generated by $t$ and $1 - t$ together with a generator $\zeta$ of $\mathbf{F}_p^*$. The equation defines a $\sqrt{G}$-isotrivial line V with $\psi(x, y) = (t^{m} x, y) = (\tilde{x}, \tilde{y})$, so that $\widetilde{V} = \psi(V)$ is defined by $\tilde{x} + \tilde{y} = 1$, with $q = p$. Now Leitner [Le] has found all points on $\widetilde{V}(\sqrt{G})$.
If p is odd there are $p - 2$ points over $\mathbf{F}_p$ together with six infinite families
$$(\tilde{x}, \tilde{y}) = (\tilde{x}_0^{\,p^{\tilde{e}}}, \tilde{y}_0^{\,p^{\tilde{e}}}) \quad (\tilde{e} = 0, 1, \ldots),$$
where $(\tilde{x}_0, \tilde{y}_0)$ are given by
$$(t, 1 - t),\quad (1 - t, t),\quad \left(\frac{1}{t}, -\frac{1-t}{t}\right),\quad \left(-\frac{1-t}{t}, \frac{1}{t}\right),\quad \left(\frac{1}{1-t}, -\frac{t}{1-t}\right),\quad \left(-\frac{t}{1-t}, \frac{1}{1-t}\right).$$
The $(x, y) = \psi^{-1}(\tilde{x}, \tilde{y}) = (t^{-m}\tilde{x}, \tilde{y})$ are all the points on $V(\sqrt{G})$. Choosing m not divisible by l, we see that none of the constant points give rise to points of $V(G)$. Similarly for the second family above. And the same is true of the last four families above, simply because of the minus signs. However the first family gives $(t^{-m} t^{p^{\tilde{e}}}, (1 - t)^{p^{\tilde{e}}})$, which is in G if and only if
$$p^{\tilde{e}} \equiv m \bmod l. \qquad (11.14)$$

According to Artin's Conjecture for p, there are infinitely many primes l for which p is a primitive root modulo l. And Heath-Brown's Corollary 2 of [He] (p.27) implies that this is true for at least one of $p = 3, 5, 7$. We can choose m with $1 \le m < l$ with $p^{l-2} \equiv m \bmod l$. Now (11.14) implies $\tilde{e} \equiv l - 2 \bmod l - 1$, so $\tilde{e} = l - 2 + (l - 1)e$ $(e = 0, 1, \ldots)$. Thus the surviving points on $V(G)$ are just the
$$\pi = \psi^{-1}(\psi(W))^{p^{(l-1)e}} \quad (e = 0, 1, \ldots) \qquad (11.15)$$
with W as the single point $(t^{-m} t^{p^{l-2}}, (1 - t)^{p^{l-2}})$. This makes it clear that $f \ge l - 1$, almost as big as $[\sqrt{G} : G] = (p - 1)l$ for fixed p.

We could also see this from (11.2). For as $R(G) = l\sqrt{3}$, it implies that there would be a point $\pi$ on $V(G)$ with $h(\psi(\pi)) \le c p^{f} l$ for c absolute. But the point (11.15) has $y = \tilde{y} = (1 - t)^{p^{l-2} p^{(l-1)e}}$ so
$$h(\psi(\pi)) \ge p^{l-2} p^{(l-1)e} \ge p^{l-2}. \qquad (11.16)$$
Letting $l \to \infty$, we deduce $f \ge l - c' \log l$, also almost as big as $[\sqrt{G} : G] = (p - 1)l$.

Less precisely, there can be no estimate
$$h(\psi(W)) \le C(n, r, K)\,(h(V) + R(G))^{\kappa}$$
replacing (11.2) which is polynomial in $h(V)$ and $R(G)$ for fixed n, r, K. For this would give a point with $h(\psi(\pi)) \le c''(m + l)^{\kappa} \le c''' l^{\kappa}$, contradicting (11.16). Similarly one sees that if the dependence on $h(V)$ is polynomial, then the dependence on $R(G)$ must be exponential. This explains the large solutions like (1.16), with $p = 2$, $l = 83$, $m = 42$.
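The closing numerical case $p = 2$, $l = 83$, $m = 42$ can be checked directly against condition (11.14). The sketch below is only an illustration under the assumptions stated in the text (l prime, p a primitive root modulo l, m chosen as $p^{l-2} \bmod l$). The function names `multiplicative_order` and `admissible_exponents` are hypothetical, and the search is brute force rather than anything used in the paper.

```python
def multiplicative_order(a, n):
    """Smallest k >= 1 with a^k = 1 (mod n); assumes gcd(a, n) == 1."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k


def admissible_exponents(p, m, l, bound):
    """All e in [0, bound) with p^e = m (mod l), found by direct search."""
    return [e for e in range(bound) if pow(p, e, l) == m % l]


p, l = 2, 83
m = pow(p, l - 2, l)                      # m = p^(l-2) mod l, as in the text
order = multiplicative_order(p, l)        # equals l - 1 when p is a primitive root
exps = admissible_exponents(p, m, l, 3 * (l - 1))

if __name__ == "__main__":
    print("m =", m)
    print("order of p modulo l =", order)
    print("exponents below 3(l-1):", exps)
```

Since the order of p modulo l is $l - 1$, the admissible exponents form a single arithmetic progression with difference $l - 1$ starting at $l - 2$, which is the mechanism forcing $f \ge l - 1$ in the example.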
12. Proof of Descent Steps and Theorems.
In the Descent Steps the variety V is certainly defined over a finitely generated transcendental extension K of F p , and nowwe can choose any separable transcendence basis to obtain a height function. Now theDescent Step over √ G follows from the Main Estimate for isotrivial W . And the DescentStep over G follows, at least without the assumption that the W are √ G -isotrivial, fromthe Main Estimate for points over G . This assumption can be removed by induction justas in section 10 (without bothering about estimates): any W that is not √ G -isotrivial canbe replaced by a finite union of √ G -isotrivial varieties.To prove Theorem 1 we may assume that V has positive dimension. We apply theMain Estimate for points over G repeatedly, taking always q = | F K | f K ( G ) for safety. With V = V , an arbitrary point π of V ( G ) is either a point of W ( G ) for finitely many W in V with dim W ≤ dim V −
1, or a point ψ − ϕ e ψ ( π ) for π in V ( G ) for finitely many V V with dim V ≤ dim V − e ≥
0, with ψ ( V ) defined over F K . Then weargue similarly with π ; and so on. After at most dim V ≤ n − T = V h , and only finitely many ψ , . . . , ψ h turn up on the way, leading to expressions asin (1.12) and thereby establishing Theorem 1.For later use we note that not just the varieties T but also the whole unions [ ψ , . . . , ψ h ] T lie in the variety V . Why is this? Well, a typical point of the union has the shape π = ( ψ − ϕ e ψ ) · · · ( ψ − h ϕ e h ψ h )( τ ) for some e , . . . , e h and some τ in T . The descent forTheorem 1 provides linear varieties V = V , V , . . . , V h = T . Now clearly τ lies in T inside V h − , so ψ − h ϕ e h ψ h ( τ ) lies in ψ − h ϕ e h ψ h ( V h − ) = ψ − h ψ h ( V h − ) = V h − inside V h − . In the same way ( ψ − h − ϕ e h − ψ h − )( ψ − h ϕ e h ψ h )( τ ) lies in V h − inside V h − .Continuing backwards we see that π = ( ψ − ϕ e ψ ) · · · ( ψ − h ϕ e h ψ h )( τ ) lies in V .We leave it to the reader to check, by a straightforward induction argument like thatin section 10 and also using Lemma 7.2, that for Theorem 1 one can takemax { h ( ψ ) , . . . , h ( ψ h ) , h ( T ) } ≤ (2 q ∆ R ( G ) n +2 ) ρ ( m ) h ( V ) η ( m ) (12 . R ( G ) and h ( V ); however,as we noted, an exponential dependence on R ( G ) may be hiding in q = | F K | f K ( G ) .For the symmetrization argument in the proof of Theorem 2 we need a version ofLemma 8.1 (p. 209) of [D], partly removed from its recurrence context. Lemma 12.1.
For $m \ge 1$ and $x_1, \ldots, x_m, y_1, \ldots, y_m$ in K suppose that
$$x_1 y_1^{q^{l}} + \cdots + x_m y_m^{q^{l}} = 0 \qquad (12.2)$$
for all large l. Then this holds for all $l \ge 0$.

Proof. The proof will be by induction on m, the case $m = 1$ being trivial. For the induction step we can clearly assume that $x_1, \ldots, x_m$ are non-zero. Now we note that (12.2) for any m consecutive integers $l = g, g + 1, \ldots, g + m - 1$ implies the linear dependence of $y_1, \ldots, y_m$ over $\mathbf{F}_q$. For if we regard these as linear equations for $x_1, \ldots, x_m$, the underlying determinant is the $q^{g}$ power of that with entries $y_i^{q^{j-1}}$ $(i, j = 1, \ldots, m)$, and it is well-known that the latter, a so-called Moore determinant, is up to a constant the product of the $\beta_1 y_1 + \cdots + \beta_m y_m$ taken over all $(\beta_1, \ldots, \beta_m)$ in $\mathbf{P}^{m-1}(\mathbf{F}_q)$ (see for example [Go] Corollary 1.3.7 p.8). Thus after permuting we can suppose that $y_m = \alpha_1 y_1 + \cdots + \alpha_{m-1} y_{m-1}$ for $\alpha_1, \ldots, \alpha_{m-1}$ in $\mathbf{F}_q$. Substituting into (12.2) gives
$$(x_1 + \alpha_1 x_m) y_1^{q^{l}} + \cdots + (x_{m-1} + \alpha_{m-1} x_m) y_{m-1}^{q^{l}} = 0,$$
which therefore also holds for all large l. By the induction hypothesis we conclude that this holds for all l ≥
0, which leads back to (12.2) for all l ≥ ψ , . . . , ψ h ] T ( G ) coming from Theorem 1. Fix τ in T ( G ); then T = τ S for a linear subgroup S .We argue first on the geometric level. According to (1.12) a typical point of [ ψ , . . . , ψ h ] T has the shape ψ q − ψ q q − q ψ q q q − q q · · · ψ q ··· q h − q ··· q h − h ( τ σ ) q ··· q h with q i = q e i ( i = 1 , . . . , h ) and σ in S ; here we are regarding the ψ i ( i = 1 , . . . , h ) asmultiplication by points instead of automorphisms. This expression can be written as π π q π q q · · · π q ··· q h − h − π q ··· q h h σ q ··· q h (12 . π = ψ − , π = ψ − ψ , . . . , π h − = ψ − h ψ h − , π h = ψ h τ . (12 . q l i = q · · · q i ( i = 1 , . . . , h ) we certainly get a point of ( π , π , . . . , π h ) S according to (1.14); but at the moment we have asymmetry l ≤ · · · ≤ l h . We eliminatethe inequalities here as in [D] (p.212).Let us start with the last inequality. We can write (12.3) as ξη q l with ξ and η independent of l = l h . We already remarked that [ ψ , . . . , ψ h ] T lies in V , so (12.3) does.Thus for each linear form L defining V we have L ( ξη q l ) = 0 for all l , . . . , l h − , l with0 ≤ l ≤ · · · ≤ l h − ≤ l . Fixing l , . . . , l h − , we see from Lemma 12.1 that this equationfor all large l implies the same equation for all l ≥
0. Thus the inequality l h − ≤ l h has indeed been eliminated. Similar arguments work for the other conditions, as is clearfrom the arguments of [D] (p.212) after equation (22). For example, the next step fixes l , . . . , l h − , l h but not l = l h − .Looking back at (12.3), we have therefore proved that all the points π π r π r · · · π r h − h − π r h h σ r h (12 . V , where the integers r i = q l i ( i = 1 , . . . , h ) now range independently over all positiveintegral powers of q . This is the required symmetrization at the geometric level.It actually shows that the entire ( π , π , . . . , π h ) S lies in V . For a typical point of theformer has the shape π π r π r · · · π r h − h − π r h h ˜ σ (12 . σ in S . And there is σ in S with σ r h = ˜ σ . This could be interpreted as somethingabout the divisibility of group varieties; but for us it is just a simple consequence of thefact that S is defined by equations X i = X j . And now (12.6) and (12.5) are equal.At the arithmetic level we claim that ( π , π , . . . , π h ) S ( G ) lies in V ( G ). In fact everypoint π = π π r π r · · · π r h − h − π r h h (12 . r ≤ · · · ≤ r h has the shape (12.3) (with all coordinates of σ equal to 1).It therefore lies in [ ψ , . . . , ψ h ] T ( G ) which is in turn contained in V ( G ). In particular π lies in P n ( G ). But why does it continue to lie in P n ( G ) when the asymmetry is lifted?Well, we can take r = · · · = r h = 1 in (12.7) to see that the product π π · · · π h (12 . P n ( G ). Then taking r = · · · = r h − = 1 , r h = q we can deduce that π q − h liesin P n ( G ). And taking r = · · · = r h − = 1 , r h − = r h = q we deduce that π q − h − lies in P n ( G ). And so on, until we see that all of π q − , . . . , π q − h (12 . P n ( G ) (this was already remarked in section 1).And now if r , . . . 
, r h are arbitrary integral powers of q in (12.7) we can write π = ( π π · · · π h ) π r − · · · π r h − h to see from (12.8) and (12.9) that indeed π lies in P n ( G ).Now any point of ( π , π , . . . , π h ) S ( G ) by (12.5) has the form πσ r h with π as aboveand σ in S ( G ). It follows that ( π , π , . . . , π h ) S ( G ) lies in V ( G ) as claimed.On the other hand, taking all coordinates of σ as 1 in (12.3) shows that [ ψ , . . . , ψ h ] { τ } lies in ( π , π , . . . , π h ) S ( G ). As we could have fixed τ arbitrarily in T ( G ), we see that[ ψ , . . . , ψ h ] T ( G ) lies in ( π , π , . . . , π h ) S ( G ).50t follows that V ( G ) is indeed the union of the ( π , π , . . . , π h ) S ( G ), which completesthe proof of Theorem 2. We note for later use the fact, already observed, that each( π , π , . . . , π h ) S is contained in V .Here too we leave it to the reader to check using (12.1) that for Theorem 2 one cantake max { h ( π ) , h ( π ) , . . . , h ( π h ) } ≤ ( n + 1)(2 q ∆ R ( G ) n +2 ) ρ ( m ) h ( V ) η ( m ) . (12 . T ( G ) contains τ with h ( τ ) ≤ h ( T ).To prove part (1) of Theorem 3 we start from Theorem 1 with V = H . We first claimthat if some π in H ( G ) lies in some [ ψ , . . . , ψ h ] T ( G ) with T not a single point then some(1.2) fails for π . To see this, note that if T is not a single point, then there is a partitionof { , , . . . , n } into proper subsets I, J, . . . such that T is defined by the proportionality ofthe homogeneous coordinates X i ( i ∈ I ), X j ( j ∈ J ), and so on. We may suppose that I contains 0 and that the equations corresponding to I are g i X = g X i for i in I . Considerthe point τ I in P n whose coordinates X i = g i for i in I but with all other coordinates zero.It also lies in T .Now π = ( ψ − ϕ e ψ ) · · · ( ψ − h ϕ e h ψ h )( τ ) for some e , . . . , e h and some τ in T . From ourremark following the proof of Theorem 1, we see that π I = ( ψ − ϕ e ψ ) · · · ( ψ − h ϕ e h ψ h )( τ I )lies in H . 
Now $\tau$ and $\tau_I$ have the same coordinates $X_i$ $(i \in I)$. It follows that $\pi$ and $\pi_I$ have the same coordinates $X_i$ $(i \in I)$. Since the other coordinates of $\pi_I$ are zero, this means that (1.2) fails for $\pi$ as claimed.

Therefore $H^*(G)$ is contained in a finite union of sets $[\psi_1, \dots, \psi_h]\{\tau\}$. And each of these lies in $H(G)$. This proves part (1) of Theorem 3.

Part (2) follows in a similar way with the help of the remark after the proof of Theorem 2, with $\pi = \pi_0(\varphi^{l_1}\pi_1) \cdots (\varphi^{l_h}\pi_h)\sigma$ and $\pi_I = \pi_0(\varphi^{l_1}\pi_1) \cdots (\varphi^{l_h}\pi_h)\sigma_I$ for $\sigma_I$ defined by $X_i = 1$ for $i$ in $I$ but with all other coordinates zero. This shows that we can restrict to single points $S$, and the proof is finished as above. We have therefore proved all of Theorem 3.

It is easy to deduce explicit estimates for Theorem 3 as for Theorems 1 and 2. One obtains at once (12.1) (with $T$ replaced by $\tau$) and (12.10).
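As a worked illustration of the step in the proof of Theorem 2 that passes from (12.8) and (12.9) to arbitrary exponents, the following display spells out why each factor $\pi_i^{r_i-1}$ stays in $\mathbf{P}_n(G)$. It assumes, as in (12.5), that $r_i = q^{l_i}$ with an integer $l_i \ge 0$; the geometric-sum identity is the standard one and is supplied here only for the reader's convenience.

```latex
\begin{align*}
r_i - 1 &= q^{l_i} - 1 = (q-1)\bigl(1 + q + \cdots + q^{l_i-1}\bigr),\\
\pi_i^{\,r_i-1} &= \bigl(\pi_i^{\,q-1}\bigr)^{1 + q + \cdots + q^{l_i-1}} \in \mathbf{P}_n(G)
\quad\text{by (12.9)},\\
\pi &= (\pi_0\pi_1\cdots\pi_h)\,\pi_1^{\,r_1-1}\cdots\pi_h^{\,r_h-1} \in \mathbf{P}_n(G)
\quad\text{by (12.8)}.
\end{align*}
```

For $l_i = 0$ the sum is empty and the factor is trivial, so no ordering of the $r_i$ is needed.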
13. Limitation results.
We show here that for each $n \ge 2$ the bound $h \le n-1$ cannot always be improved, and that for $p > 2$ the $\psi_1, \dots, \psi_h$ in Theorem 1 and the $\pi_0, \pi_1, \dots, \pi_h$ in Theorem 2 cannot always be chosen over $G$.

We start with $h \le n-1$. Because Theorem 1 directly implies Theorem 2 and then Theorem 3, it will suffice to prove the analogous statements for Theorem 3. Also we have seen that each $[\psi_1, \dots, \psi_h]\{\tau\}$ in Theorem 3(1) is contained in some $(\pi_0, \pi_1, \dots, \pi_h)$ in Theorem 3(2). So it is enough to treat Theorem 3(2).

This we do with the affine hyperplane
$$x_1 + x_2 - x_3 - \cdots - x_n = 1. \tag{13.1}$$
For a prime $p$ let $R = R_p$ be the set of points $(1, r_1, \dots, r_{n-1})$ as the integers $r_1, \dots, r_{n-1}$ run through all powers of $p$ satisfying the asymmetry conditions that $r_i$ divides $r_{i+1}$ $(i = 1, \dots, n-2)$ and also the extra conditions
$$r_{n-1} \ne r_{n-2},\ r_{n-2} + r_{n-3},\ \dots,\ r_{n-2} + r_{n-3} + \cdots + r_1. \tag{13.2}$$

Lemma 13.1.
The set $R$ does not lie in a finite union of proper subgroups of $\mathbf{Z}^n$.

Proof. We can actually disregard (13.2) because their failure would just add more to the finite union of proper subgroups. Now the falsity of the lemma would lead to an equation
$$F(p^{e_1}, \dots, p^{e_{n-1}}) = 0 \tag{13.3}$$
for all integers $e_1, \dots, e_{n-1} \ge 0$, where $F(y_1, \dots, y_{n-1})$ is a finite product of polynomials
$$A = a_0 + a_1 y_1 + a_2 y_1 y_2 + \cdots + a_{n-1} y_1 y_2 \cdots y_{n-1}$$
corresponding to the proper subgroups of $\mathbf{Z}^n$ perpendicular to $(a_0, \dots, a_{n-1}) \ne 0$. It is clear that each $A \ne 0$ and so $F \ne 0$. On the other hand it is easy to see that the points in (13.3) are Zariski-dense in $\mathbf{R}^{n-1}$. This contradiction proves the lemma.

Take as usual $K = \mathbf{F}_p(t)$ and $G$ generated by $t$ and $1-t$. We proceed to exhibit many points on $H^*(G)$ with $H$ defined by (13.1).

For integral powers $q_1, \dots, q_{n-1}$ of $p$ define
$$r_1 = q_{n-1},\quad r_2 = q_{n-1}q_{n-2},\quad \dots,\quad r_{n-1} = q_{n-1} \cdots q_1$$
and
$$d_1 = r_{n-1} - r_{n-2} - \cdots - r_2 - r_1,\quad d_2 = r_{n-1} - r_{n-2} - \cdots - r_2,$$
down to $d_{n-2} = r_{n-1} - r_{n-2}$ and $d_{n-1} = r_{n-1}$. Then with
$$x_1 = t^{d_1},\quad x_2 = 1 - t^{d_{n-1}},\quad x_3 = t^{d_{n-2}} - t^{d_{n-1}},\quad \dots,\quad x_n = t^{d_1} - t^{d_2} \tag{13.4}$$
the point $\xi = (x_1, \dots, x_n)$ lies in $H$. It is in $H(G)$ because
$$x_2 = 1 - t^{r_{n-1}} = (1-t)^{r_{n-1}},\qquad x_3 = t^{d_{n-2}}(1 - t^{r_{n-2}}) = t^{d_{n-2}}(1-t)^{r_{n-2}},$$
and so on.

This also leads to a multiplicative representation
$$\xi = \xi_1^{r_1} \cdots \xi_{n-1}^{r_{n-1}} \tag{13.5}$$
with
$$\xi_1 = \left(\tfrac{1}{t}, 1, 1, 1, 1, \dots, 1, 1, \tfrac{1-t}{t}\right),$$
$$\xi_2 = \left(\tfrac{1}{t}, 1, 1, 1, 1, \dots, 1, \tfrac{1-t}{t}, \tfrac{1}{t}\right),$$
$$\xi_3 = \left(\tfrac{1}{t}, 1, 1, 1, 1, \dots, \tfrac{1-t}{t}, \tfrac{1}{t}, \tfrac{1}{t}\right),$$
down to
$$\xi_{n-2} = \left(\tfrac{1}{t}, 1, \tfrac{1-t}{t}, \tfrac{1}{t}, \tfrac{1}{t}, \dots, \tfrac{1}{t}, \tfrac{1}{t}, \tfrac{1}{t}\right),$$
but
$$\xi_{n-1} = (t, 1-t, t, t, t, \dots, t, t, t).$$
We can quickly check that $\xi_1, \dots, \xi_{n-1}$ are multiplicatively independent. Namely, a relation
$$\xi_1^{a_1} \cdots \xi_{n-1}^{a_{n-1}} = (1, 1, 1, 1, 1, \dots, 1, 1, 1)$$
forces $a_{n-1} = 0$ on examining the second components, then $a_{n-2} = 0$ from the third components, and so on down to $a_1 = 0$.

The case $n = 3$ with $q_1 = q$, $q_2 = r$ is of course (1.11) or (1.13).

We can see that (13.4) lies in $H^*(G)$ provided $(1, r_1, \dots, r_{n-1})$ lies in $R$. For the various exponents of $t$ clearly satisfy $d_{n-1} > d_{n-2} > \cdots > d_2 > d_1$. There is one more exponent 0; but $d_{n-1} \ne 0$ and from the definition of $R$ we also have $d_{n-2} \ne 0, \dots, d_1 \ne 0$. Thus the exponents $d_{n-1}, \dots, d_1, 0$ are all different, and this excludes a vanishing proper subsum among $x_1, x_2, -x_3, \dots, -x_n$ (in fact each of $d_{n-2} = 0, \dots, d_1 = 0$ does lead to a vanishing subsum).
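The construction (13.4) can be checked mechanically for small parameters. The sketch below is only an illustration and not part of the argument: it represents Laurent polynomials in $t$ over $\mathbf{F}_p$ as sparse dictionaries (exponent to coefficient), builds $r_i$, $d_j$ and $x_k$ from chosen powers $q_1, \dots, q_{n-1}$, and tests both the hyperplane equation (13.1) and the factorizations $x_2 = (1-t)^{r_{n-1}}$, $x_k = t^{d_{n-k+1}}(1-t)^{r_{n-k+1}}$ that place the point in $H(G)$. The function names and the sample values $p = 3$, $(q_1, q_2, q_3) = (3, 1, 9)$ are assumptions chosen for the example.

```python
from itertools import product

def add(a, b, p, sign=1):
    """Return a + sign*b for sparse Laurent polynomials {exponent: coeff} over F_p."""
    out = dict(a)
    for e, c in b.items():
        v = (out.get(e, 0) + sign * c) % p
        if v:
            out[e] = v
        else:
            out.pop(e, None)
    return out

def mul(a, b, p):
    """Product of two sparse Laurent polynomials over F_p."""
    out = {}
    for (e1, c1), (e2, c2) in product(a.items(), b.items()):
        v = (out.get(e1 + e2, 0) + c1 * c2) % p
        if v:
            out[e1 + e2] = v
        else:
            out.pop(e1 + e2, None)
    return out

def power(a, k, p):
    """k-th power by repeated multiplication (k >= 0); fine for small k."""
    result = {0: 1}
    for _ in range(k):
        result = mul(result, a, p)
    return result

def build_point(p, qs):
    """qs = [q_1, ..., q_{n-1}], powers of p. Returns 1-based lists r, d, x."""
    n = len(qs) + 1
    r = [0] * n                      # r[i] = r_i, index 0 unused
    acc = 1
    for i in range(1, n):
        acc *= qs[n - 1 - i]         # multiply in q_{n-i}
        r[i] = acc
    d = [0] * n                      # d[j] = r_{n-1} - r_{n-2} - ... - r_j
    for j in range(1, n):
        d[j] = r[n - 1] - sum(r[j:n - 1])
    x = [None] * (n + 1)
    x[1] = {d[1]: 1}
    x[2] = add({0: 1}, {d[n - 1]: 1}, p, sign=-1)
    for k in range(3, n + 1):
        x[k] = add({d[n - k + 1]: 1}, {d[n - k + 2]: 1}, p, sign=-1)
    return r, d, x

def on_hyperplane(p, x):
    """Check x_1 + x_2 - x_3 - ... - x_n = 1, i.e. equation (13.1)."""
    total = add(x[1], x[2], p)
    for xk in x[3:]:
        total = add(total, xk, p, sign=-1)
    return total == {0: 1}

def in_group(p, r, d, x):
    """Check the factorizations of x_2, ..., x_n into powers of t and 1 - t."""
    n = len(r)
    one_minus_t = {0: 1, 1: (-1) % p}
    ok = x[2] == power(one_minus_t, r[n - 1], p)
    for k in range(3, n + 1):
        j = n - k + 1
        ok = ok and x[k] == mul({d[j]: 1}, power(one_minus_t, r[j], p), p)
    return ok

r, d, x = build_point(3, [3, 1, 9])
print(on_hyperplane(3, x), in_group(3, r, d, x))  # prints: True True
```

With these sample values one gets $(r_1, r_2, r_3) = (9, 9, 27)$ and $(d_1, d_2, d_3) = (9, 18, 27)$, so the conditions (13.2) hold as well. The check of membership in $G$ rests on the Frobenius identity $(1-t)^r = 1 - t^r$ for $r$ a power of $p$, which the code verifies by brute-force expansion rather than assuming.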
We already remarked that (1.13) is in $H^*$ as long as $r \ne s$, that is $q \ne 1$, that is $r_2 \ne r_1$ as in (13.2).

Now we can prove as promised that $H^*(G)$ does not lie in a finite union of sets
$$\Pi = (\pi_0, \pi_1, \dots, \pi_h)_q = \bigcup_{l_1=0}^{\infty} \cdots \bigcup_{l_h=0}^{\infty} \pi_0 \pi_1^{q^{l_1}} \cdots \pi_h^{q^{l_h}} \tag{13.6}$$
for some $q$ and points $\pi_0, \pi_1, \dots, \pi_h$ with $h < n-1$. The idea is to note that each $\Pi$ lies in a coset of $\mathbf{G}_m^n$ of dimension at most $h \le n-2$; whereas the points (13.5) have rank $n-1$. So suppose that $H^*(G)$ does lie in such a finite union and we shall reach a contradiction.

Now for each element of $R$ the corresponding (13.5) lies in $H^*(G)$ so in some $\Pi$. This provides a partition of $R$ into a finite union of subsets $R_\Pi$. By Lemma 13.1 we will be through if we can prove that each $R_\Pi$ lies in a proper subgroup of $\mathbf{Z}^n$.

Suppose for some $\Pi$ we are lucky in the sense that the corresponding $\pi_0$ in (13.6) is multiplicatively independent of $\xi_1, \dots, \xi_{n-1}$. The corresponding
$$\pi_0^{-1}\xi = \pi_0^{-1}\xi_1^{r_1} \cdots \xi_{n-1}^{r_{n-1}}$$
all lie in the group generated by $\pi_1, \dots, \pi_h$, and so the multiplicative rank of the various $\pi_0^{-1}\xi$ is at most $h \le n-2$. Since $\pi_0^{-1}, \xi_1, \dots, \xi_{n-1}$ are independent, it follows that the set $R_\Pi$ cannot contain $n$ (or even $n-1$) independent elements. So it must indeed lie in a proper subgroup of $\mathbf{Z}^n$.

In fact we are not so likely to be that lucky, and it is more probable that there is a relation
$$\pi_0^{a} = \xi_1^{a_1} \cdots \xi_{n-1}^{a_{n-1}}$$
with $a \ne 0$. Now the
$$\pi_0^{-a}\xi^{a} = \xi_1^{ar_1 - a_1} \cdots \xi_{n-1}^{ar_{n-1} - a_{n-1}}$$
still lie in a group of rank at most $n-2$. Since $\xi_1, \dots, \xi_{n-1}$ are independent, we deduce as above that the set of all $(ar_1 - a_1, \dots, ar_{n-1} - a_{n-1})$ lies in a proper subgroup of $\mathbf{Z}^{n-1}$. And this implies as above that $R_\Pi$ lies in a proper subgroup of $\mathbf{Z}^n$.

That finishes the proof of the first limitation result. We could also have argued with a symmetrized version of $R$; then the $A$ in the proof of Lemma 13.1 could be taken more simply as $a_0 + a_1 y_1 + a_2 y_2 + \cdots + a_{n-1} y_{n-1}$.

We can use similar arguments to prove the second limitation result concerning non-definability over $G$. Because the $[\psi_1, \dots, \psi_h]T(G)$ in Theorem 1 lead to $(\pi_0, \pi_1, \dots, \pi_h)$ in Theorem 2 with (12.4) for $\tau$ in $T(G)$, it will again suffice to check the matter for Theorem 3(2).

This we do with the affine line $H$ defined by $tx + y = 1$ also over $K = \mathbf{F}_p(t)$, now with $G$ generated by $t^{p-1}$ and $1-t$. It is the example treated at the end of section 11 with $m = 1$ and $l = p-1$. We need another simple observation.
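The observation in question, stated next as Lemma 13.2, concerns sums of three powers of $p$, and it lends itself to a quick brute-force sanity check before reading the proof. The sketch below is an illustration only: it assumes non-negative exponents bounded by a hypothetical parameter `max_exp`, groups all unordered triples of powers of $p$ by their sum, and reports any sum reached by two different triples. A finite search of this kind proves nothing in general; it merely confirms the statement in a small range and exhibits the failure for $p = 2$.

```python
from itertools import combinations_with_replacement

def counterexamples(p, max_exp):
    """Groups of distinct unordered triples of powers of p (exponents
    0..max_exp) that share the same sum; empty if no collision exists."""
    powers = [p ** e for e in range(max_exp + 1)]
    by_sum = {}
    # combinations_with_replacement yields each multiset once, in sorted order
    for triple in combinations_with_replacement(powers, 3):
        by_sum.setdefault(sum(triple), []).append(triple)
    return [triples for triples in by_sum.values() if len(triples) > 1]

for p in (3, 5, 7):
    print(p, counterexamples(p, 6) == [])
print(2, [(1, 1, 4), (2, 2, 2)] in counterexamples(2, 3))
```

For the odd primes tested no collision appears, while for $p = 2$ the collision $1 + 1 + 4 = 2 + 2 + 2$ is found.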
Lemma 13.2.
For an odd prime $p$ suppose that
$$q_1 + q_2 + q_3 = \tilde q_1 + \tilde q_2 + \tilde q_3 \tag{13.7}$$
for integral powers $q_1, q_2, q_3, \tilde q_1, \tilde q_2, \tilde q_3$ of $p$. Then $\tilde q_1, \tilde q_2, \tilde q_3$ are a permutation of $q_1, q_2, q_3$.

Proof. If $q_1, q_2, q_3$ are all different then the left-hand side of (13.7) has just three ones in its expansion to base $p$. So also the right-hand side; which means that $\tilde q_1, \tilde q_2, \tilde q_3$ are also all different. The result in this case is now clear (even for $p = 2$). If say $q_1 = q_2 \ne q_3$ then we get a one and a two in the expansion because $p \ne 2$; so after a permutation $\tilde q_1 = \tilde q_2 \ne \tilde q_3$ too, and the result is still clear. Similarly if $q_1 = q_2 = q_3$ as long as $p \ne 3$. This last case can also be checked directly when $p = 3$ and this proves the lemma; however the example $1 + 1 + 4 = 2 + 2 + 2$ shows that $p = 2$ is not to be saved.

Now the analysis in section 11 before the primitive root business shows easily that the points of $H^*(G) = H(G)$ are given by
$$x = t^{r-1},\quad y = (1-t)^{r} \qquad (r = 1, p, p^2, \dots). \tag{13.8}$$
Thus $(x, y) = \xi_0\xi_1^{r}$ for $\xi_0 = (t^{-1}, 1)$ and $\xi_1 = (t, 1-t)$. Assume $p \ne 2$. If $H^*(G)$ were contained in a finite union of
$$\Pi = (\pi_0, \pi_1)_q = \bigcup_{l=0}^{\infty} \pi_0\pi_1^{q^{l}}$$
for some $q$ and some $\pi_0, \pi_1$ over $G$, then one of these $\Pi$ would certainly contain at least three different points (13.8). This gives equations
$$\xi_0\xi_1^{r} = \pi_0\pi_1^{s},\quad \xi_0\xi_1^{r'} = \pi_0\pi_1^{s'},\quad \xi_0\xi_1^{r''} = \pi_0\pi_1^{s''} \tag{13.9}$$
for powers $r < r' < r''$ of $p$ and powers $s, s', s''$ of $q$. Eliminating $\pi_0, \pi_1$ leads to
$$(\xi_0\xi_1^{r})^{s'-s''}(\xi_0\xi_1^{r'})^{s''-s}(\xi_0\xi_1^{r''})^{s-s'} = 1;$$
that is, $\xi_1^{a} = 1$ for
$$a = r(s'-s'') + r'(s''-s) + r''(s-s').$$
So $a = 0$; that is,
$$rs' + r's'' + r''s = rs'' + r's + r''s'.$$
Lemma 13.2 shows in particular that $rs'$ is one of the terms on the right. But which one? Certainly $rs' \ne r''s'$. And $rs' \ne rs''$, else $s' = s''$ and (13.9) would imply $r' = r''$. It follows that $rs' = r's$. But now eliminating $\xi_0$ and $\pi_0$ from the first two equations in (13.9) leads to $\xi_1^{r'-r} = \pi_1^{s'-s}$. Raising this to the power $r$ and using $rs' = r's$ gives $(\xi_1^{r}\pi_1^{-s})^{r'-r} = 1$, so the coordinates of $\xi_1^{r}\pi_1^{-s}$ are roots of unity in $K$ and hence lie in $\mathbf{F}_p$. Thus by the first equation in (13.9) there would be $\alpha, \beta$ in $\mathbf{F}_p$ with $(\alpha t^{-1}, \beta) = (\alpha, \beta)\xi_0 = \pi_0$; however this is impossible because $\alpha t^{-1}$ is not in $G$ if $p \ne 2$.

References

[AB] B. Adamczewski and J.P. Bell,
On vanishing coefficients of algebraic power series over fields of positive characteristic, Manuscript 2010.
[AV] D. Abramovich and F. Voloch, Toward a proof of the Mordell-Lang conjecture in characteristic p, International Math. Research Notices (1992), 103-115.
[BG] E. Bombieri and W. Gubler, Heights in diophantine geometry, New Mathematical Monographs, Cambridge 2006.
[BMZ] E. Bombieri, D. Masser and U. Zannier, Intersecting a plane with algebraic subgroups of multiplicative groups, Ann. Scuola Norm. Sup. Pisa Cl. Sci. (5) VII (2008), 51-80.
[Ca] J.W.S. Cassels, An introduction to the geometry of numbers, Classics in Math., Springer 1971.
[CMP] L. Cerlienco, M. Mignotte and F. Piras, Suites récurrentes linéaires: propriétés algébriques et arithmétiques, L'Enseignement Mathématique (1987), 67-108.
[D] H. Derksen, A Skolem-Mahler-Lech theorem in positive characteristic and finite automata, Invent. Math. (2007), 175-224.
[E] J.-H. Evertse, On sums of S-units and linear recurrences, Compositio Mathematica (1984), 225-244.
[ESS] J.-H. Evertse, H.P. Schlickewei and W.M. Schmidt, Linear equations in variables which lie in a multiplicative group, Annals of Math. (2002), 807-836.
[EZ] J.-H. Evertse and U. Zannier, Linear equations with unknowns from a multiplicative group in a function field, Acta Arith. (2008), 159-170.
[Gh] D. Ghioca, The isotrivial case in the Mordell-Lang theorem, Trans. Amer. Math. Soc. (2008), 3839-3856.
[GM] D. Ghioca and R. Moosa, Division points on subvarieties of isotrivial semiabelian varieties, Internat. Math. Res. Notices, 2006, Article ID 65437, 1-23.
[Go] D. Goss, Basic structures of function field arithmetic, Ergebnisse der Math., Springer 1996.
[He] D.R. Heath-Brown, Artin's conjecture for primitive roots, Quart. J. Math. Oxford Ser. (2) (1986), 27-38.
[HP] W.V.D. Hodge and D. Pedoe, Methods of algebraic geometry I, Cambridge 1968.
[Hr] E. Hrushovski, The Mordell-Lang conjecture for function fields, J. Amer. Math. Soc. (1996), 667-690.
[HW] L.-C. Hsia and J. T.-Y. Wang, The ABC theorem for higher-dimensional function fields, Trans. Amer. Math. Soc. (2003), 2871-2887.
[La1] S. Lang, Introduction to algebraic geometry, Addison-Wesley 1973.
[La2] S. Lang, Fundamentals of diophantine geometry, Springer 1983.
[Le] D. Leitner, Linear equations in positive characteristic, Master Thesis, University of Basel 2008.
[Maso] R.C. Mason, Diophantine equations over function fields, London Math. Soc. Lecture Notes, Cambridge 1984.
[Mass] D. Masser, Mixing and linear equations over groups in positive characteristic, Israel J. Math. (2004), 189-204.
[MS1] R. Moosa and T. Scanlon, The Mordell-Lang conjecture in positive characteristic revisited, Model theory and applications, Quaderni di matematica (eds. L. Belair, Z. Chatzidakis, P. D'Aquino, D. Marker, M. Otero, F. Point, and A. Wilkie), Dipartimento di Matematica Seconda Università di Napoli (2002) pp. 273-296.
[MS2] R. Moosa and T. Scanlon, F-structures and integral points on semiabelian varieties over finite fields, American Journal of Mathematics (2004), 473-522.
[PS] A.J. van der Poorten and H.P. Schlickewei, Additive relations in fields, Journal of the Australian Mathematical Society A 51 (1991), 154-170.
[S] W.M. Schmidt, Diophantine approximations and diophantine equations, Lecture Notes in Math., Springer 1991.
[SV] T. Struppeck and J.D. Vaaler, Inequalities for heights of algebraic subspaces and the Thue-Siegel principle, in Analytic Number Theory (Allerton Park 1989), Progress in Math., Birkhäuser Boston 1990 (pp. 493-528).
[T] J. Thunder, Siegel's lemma for function fields, Michigan Math. J. (1995), 147-162.
[V] J.F. Voloch, The equation ax + by = 1 in characteristic p, J. Number Theory (1998), 195-200.

H. Derksen: Department of Mathematics, University of Michigan, East Hall 530 Church Street, Ann Arbor, Michigan 48104, U.S.A.

D. Masser: Mathematisches Institut, Universität Basel, Rheinsprung 21, 4051 Basel, Switzerland (