Linear equations over multiplicative groups, recurrences, and mixing I

Abstract

Let K be a field of positive characteristic. When V is a linear variety in K^n and G is a finitely generated subgroup of K^*, we show how to compute the intersection of V and G^n effectively using heights. We calculate all the estimates explicitly. A special case provides the effective solution of the S-unit equation in n variables.


arXiv [math.NT], October

H. Derksen* and D. Masser


1. Introduction.

In 2004 the second author published a paper [Mass] about linear equations over multiplicative groups in positive characteristic. This was specifically aimed at an application to a problem about mixing for dynamical systems of algebraic origin, and, as a result about linear equations, it lacked some of the simplicity of the classical results in zero characteristic. A new feature was the appearance of $n-$…, where $n$ is the number of variables.

In 2007 the first author published a paper [D] about recurrences in positive characteristic. He proved an analogue of the famous Skolem-Lech-Mahler Theorem in zero characteristic. A new feature was the appearance of integer sequences involving combinations of $d-$…, where $d$ is the order of the recurrence.

It turns out that these two new features are identical. In positive characteristic the vanishing of a recurrence with $d$ terms can be regarded as a linear equation in $d-$… $n-$… $d-$… $K$ we write $K^*$ for the multiplicative group of all non-zero elements of $K$. For any subgroup $G$ of $K^*$ and a positive integer $n$ it makes sense to write $\mathbb{P}_n(G)$ for the set of points in projective space defined over $G$.

Theorem A (Evertse [E], van der Poorten-Schlickewei [PS]).

Let $K$ be a field of zero characteristic, and for $n \ge 1$ let $a_0, \dots, a_n$ be non-zero elements of $K$. Then for any finitely generated subgroup $G$ of $K^*$ the equation
$$a_0X_0 + a_1X_1 + \cdots + a_nX_n = 0 \tag{1.1}$$
has only finitely many solutions $(X_0, X_1, \dots, X_n)$ in $\mathbb{P}_n(G)$ which satisfy
$$\sum_{i \in I} a_iX_i \neq 0 \tag{1.2}$$
for every non-empty proper subset $I$ of $\{0, 1, \dots, n\}$.

We should point out that this remains true even when $G$ is not finitely generated but has finite $\mathbb{Q}$-dimension. See also a recent paper [EZ] of Evertse and Zannier for an interesting function field version.

Theorem A is false in positive characteristic $p$; for example in inhomogeneous form for $n = 2$ the equation
$$x + y = 1 \tag{1.3}$$
has the solution $x = t$, $y = 1 - t$ over the group $G$ in $K = \mathbb{F}_p(t)$ generated by $t, 1 - t$; and so, thanks to Frobenius, infinitely many solutions
$$x = t^{p^e}, \quad y = 1 - t^{p^e} = (1 - t)^{p^e} \quad (e = 0, 1, 2, \dots). \tag{1.4}$$
… $H$ defined by equation (1.1) to proper linear subvarieties defined by the vanishing of the left-hand sides in (1.2). We can iterate this descent by introducing special varieties $T$ defined solely by binary equations of the shape $X_i = aX_j$ ($i \neq j$, $a \neq 0$). For example $T$ could be a single point or, when there are no equations at all, the full $\mathbb{P}_n$. We could call such varieties linear cosets or just cosets. This word has a group-theoretical connotation, and indeed $T$ above is a translate of a group subvariety of the multiplicative group $\mathbb{G}_m^n$ in $\mathbb{P}_n$. Conversely it is not difficult to see that every linear translate of a group subvariety of $\mathbb{G}_m^n$ is a coset in our sense (see for example Lemma 9.4 p.76 of [BMZ]). But we will in this paper make no use of these remarks or indeed hardly any further reference to group varieties. Anyway, it is easily seen that the complete descent yields a finite collection of cosets $T$, each contained in the original $H$, such that the full solution set $H(G) = H \cap \mathbb{P}_n(G)$ coincides with the union of all $T(G) = T \cap \mathbb{P}_n(G)$.
This is a little closer to the more general context of Mordell-Lang (see below). No further descent from $T(G)$ in terms of proper subvarieties is possible; by way of compensation it is very simple to describe $T(G)$ explicitly (see for example the discussion towards the end of section 12).

In positive characteristic we can establish a descent step similar to Theorem A, but it may involve Frobenius as in (1.4). This less simple situation makes the iteration more problematic, and for this reason it is clearer to present our result as a descent now from an arbitrary linear variety $V$ to proper linear subvarieties.

However the Frobenius does not always generate infinitely many solutions. It does above for $x + y = 1$, and also for
$$t^m x + y = 1 \tag{1.5}$$
… $t^m x$; this is because $t$ lies in $G$. The situation is slightly more subtle for (1.5) over the group $G_l$ generated by $t^l$ and $1 - t$; the above solution of (1.3) certainly leads to solutions
$$x = t^{-m}t^{p^e}, \quad y = (1 - t)^{p^e} \quad (e = 0, 1, 2, \dots), \tag{1.6}$$
… $G_l$ unless $p^e \equiv m \bmod l$. This can however happen for infinitely many $e$ but not necessarily all $e$ in (1.6). This time $t$ may not lie in $G_l$ but some positive power does. Finally the equation $(1 + t)x + y = 1$ has a solution $x = 1 - t$, $y = t^2$ over $G$, but the use of Frobenius will bring in an extra $1 + t$, no positive power of which is in $G$ (provided $p \neq 2$).

These considerations lead naturally to the radical $\sqrt{G} = \sqrt[K]{G}$ for general $G$ in general $K^*$. For us this remains in $K$; thus it is the set of $\gamma$ in $K$ for which there exists a positive integer $s$ such that $\gamma^s$ lies in $G$. Usually $K$ will be finitely generated over its prime field, and then it is well-known that the finite generation of $G$ is equivalent to that of $\sqrt{G}$. We also see the need for some concept of isotriviality, already present in diophantine geometry at least since Néron's 1952 proof of the relative Mordell-Weil Theorem and Manin's 1963 proof of the relative Mordell Conjecture.
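As a quick sanity check of (1.4) and (1.6), the following Python sketch represents polynomials over $\mathbb{F}_p$ as coefficient lists (lowest degree first); all function names are hypothetical helpers, not taken from the paper. It verifies that $(1-t)^n = 1 - t^n$ in $\mathbb{F}_p[t]$ when $n$ is a power of $p$ (so that (1.4) solves (1.3)) and that this can fail for other $n$. It also lists the exponents $e$ for which the points (1.6) lie over $G_l$, assuming that $t^k \in G_l$ exactly when $l \mid k$ (which rests on $t$ and $1-t$ being multiplicatively independent).

```python
# Polynomials over F_p are coefficient lists, lowest degree first.

def poly_mul(a, b, p):
    """Product of two polynomials with coefficients reduced mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def poly_pow(a, n, p):
    """a**n in F_p[t] by repeated multiplication (fine for small n)."""
    result = [1]
    for _ in range(n):
        result = poly_mul(result, a, p)
    return result

def is_one_minus_t_to_n(n, p):
    """True when (1 - t)^n == 1 - t^n in F_p[t], i.e. (t^n, (1-t)^n) solves x + y = 1."""
    target = [1] + [0] * (n - 1) + [p - 1]          # coefficients of 1 - t^n
    return poly_pow([1, p - 1], n, p) == target     # [1, p-1] encodes 1 - t

def solutions_over_G_l(p, m, l, e_max):
    """Exponents e <= e_max for which (1.6) lies over G_l = <t^l, 1-t>: p^e = m (mod l)."""
    return [e for e in range(e_max + 1) if (pow(p, e) - m) % l == 0]
```

For example with $p = 3$, $m = 1$, $l = 4$ only even $e$ survive, while with $l = p$ and $m = 1$ only $e = 0$ does.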
In our linear context the appropriate refinement is $G$-isotriviality, introduced by Voloch [V] for $n = 2$. Namely, let $K$ be a field of positive characteristic $p$, and for $n \ge 1$ let $V$ be a linear variety in $\mathbb{P}_n$ defined over $K$. We say that $V$ is $G$-isotrivial if there is an automorphism $\psi$ of $\mathbb{P}_n(K)$, defined by
$$\psi(X_0, \dots, X_n) = (g_0X_0, \dots, g_nX_n) \tag{1.7}$$
for $g_0, \dots, g_n$ in $G$, such that $\psi(V)$ is defined over the algebraic closure $\overline{\mathbb{F}}_p$. Such a $\psi$ could be called a $G$-automorphism. Let us write $\mathbb{F}_K$ for $\overline{\mathbb{F}}_p \cap K$; then of course $\psi(V)$ is defined over $\mathbb{F}_K$. So $\psi(V)$ is defined over some $\mathbb{F}_q$; and now a point $w$ on $V$ defined over $G$ gives $\psi(w)$ on $\psi(V)$, which by Frobenius leads to points $\psi(w)^{q^e}$ ($e = 0, 1, 2, \dots$) on $\psi(V)$ and so
$$\psi^{-1}\bigl(\psi(w)^{q^e}\bigr) \quad (e = 0, 1, 2, \dots) \tag{1.8}$$
on $V$, all still defined over $G$.

Of course points over $G$ are nothing other than zero-dimensional $G$-isotrivial varieties. Here is a preliminary version of our main descent step on linear equations. For $V$ as above write $V(G) = V \cap \mathbb{P}_n(G)$ for the set of points of $V$ defined over $G$. But it is clearer first to consider points over the radical $\sqrt{G}$.

Descent Step over $\sqrt{G}$. Let $K$ be a field of positive characteristic, and suppose that the positive-dimensional linear variety $V$ defined over $K$ is not a coset. Suppose also that $\sqrt{G}$ in $K$ is finitely generated. Then there is an effectively computable finite collection $\mathcal{W}$ of proper $\sqrt{G}$-isotrivial linear subvarieties $W$ of $V$, also defined over $K$, with the following property.

(a) If $V$ is not $\sqrt{G}$-isotrivial, then
$$V(\sqrt{G}) = \bigcup_{W \in \mathcal{W}} W(\sqrt{G}).$$

(b) If $V$ is $\sqrt{G}$-isotrivial and $\psi(V)$ is defined over $\mathbb{F}_q$, then
$$V(\sqrt{G}) = \psi^{-1}\Bigl(\bigcup_{W \in \mathcal{W}} \bigcup_{e=0}^{\infty} \bigl(\psi(W)(\sqrt{G})\bigr)^{q^e}\Bigr).$$
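To see (1.8) at work on the example (1.5), here is a minimal Python sketch, assuming $G$ is generated by $t$ and $1-t$ in $\mathbb{F}_p(t)$ so that each element of $G$ can be stored as an exponent pair $(a, b)$ for $t^a(1-t)^b$ (constants from $\sqrt{G}$ are ignored). The $G$-automorphism $\psi(x, y) = (t^m x, y)$ takes (1.5) to (1.3), which is defined over $\mathbb{F}_p$, and the solution $w = (t^{1-m}, 1-t)$ of (1.5) then yields exactly the points (1.6) with $q = p$. The helper names are hypothetical.

```python
# Elements of G = <t, 1 - t> (no constants) are exponent pairs (a, b) meaning t^a (1 - t)^b;
# a point (x, y) of the affine line is a pair of such pairs.

def power(u, n):
    """u^n: multiply both exponents by n."""
    return (u[0] * n, u[1] * n)

def times(u, v):
    """Product of two elements of G."""
    return (u[0] + v[0], u[1] + v[1])

def inverse(u):
    return (-u[0], -u[1])

def frobenius_orbit_point(w, g, q, e):
    """psi^{-1}(psi(w)^(q^e)) as in (1.8), where psi multiplies coordinate i by g[i]."""
    n = q ** e
    return tuple(times(inverse(gi), power(times(gi, wi), n)) for wi, gi in zip(w, g))

def points_from_1_5(m, p, e_max):
    """Apply (1.8) to w = (t^(1-m), 1 - t), a solution of t^m x + y = 1, with psi(x, y) = (t^m x, y)."""
    w = ((1 - m, 0), (0, 1))
    g = ((m, 0), (0, 0))
    return [frobenius_orbit_point(w, g, p, e) for e in range(e_max + 1)]
```

Each output point has $t$-exponent $p^e - m$ in the first coordinate, so whether it lies over $G_l$ is precisely the congruence $p^e \equiv m \bmod l$ discussed above.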
Thus (a) says that the points of $V(\sqrt{G})$ are not Zariski-dense in $V$; and (b) says that the points on $V(\sqrt{G})$ like (1.8), which can be dense, at least arise from a set of $w$ which is not dense.

Part (a) was essentially proved for $n = 2$ as Theorem 1 by Voloch [V] (p.196), and his Theorem 2 (p.198) even covers the more general case of finite $\mathbb{Q}$-dimension; here one gets the finiteness of the solution set. A forerunner of part (b) for $n = 2$ can be seen in Mason [Maso] (pp. 107, 108). The main result of [Mass] is restricted to a single equation (1.1) and is expressed in terms of a concept of "broad" set; as we do not need this result here (or even the concept) we refrain from quoting it. However these authors do not discuss the effectivity in our sense (see the discussion below).

A simple example of (b) in inhomogeneous form is (1.3); this represents a line $L$, clearly isotrivial and even trivial in that we can take $\psi$ as the identity automorphism. When $G$ is generated by $t$ and $1 - t$ in $K = \mathbb{F}_p(t)$, then $\sqrt{G}$ is obtained by adding the elements of $\mathbb{F}_p^*$ as generators. Leitner [Le] has found that for $p \ge$ … $p + 4$ points $W$, six of which are like $w = (t, 1 - t)$ in (1.4) and the remaining $p - 2$ … $w = (x, 1 - x)$ for $x = 2, 3, \dots, p - 1$ … $V(\sqrt{G})$.

In the analogous characterization of $V(G)$ there is no longer a clear separation of cases. In fact it can happen in case (b) above that the actions of Frobenius through $q^e$ get truncated, so that each $e$ remains bounded; but then it is easy to reduce this to case (a). A simple example is (1.5) for $m = 1$ in the group $G = G_l$ above for $l = p$, when the solutions (1.6) are over $G$ only when $e = 0$. Here is a general statement.

Descent Step over $G$. Let $K$ be a field of positive characteristic, and suppose that the positive-dimensional linear variety $V$ defined over $K$ is not a coset. Suppose also that $\sqrt{G}$ in $K$ is finitely generated.
Then there is an effectively computable finite collection $\mathcal{W}$ of proper $\sqrt{G}$-isotrivial linear subvarieties $W$ of $V$, also defined over $K$, such that either
$$V(G) = \bigcup_{W \in \mathcal{W}} W(G)$$
or
$$V(G) = \psi^{-1}\Bigl(\bigcup_{W \in \mathcal{W}} \bigcup_{e=0}^{\infty} \bigl(\psi(W)(G)\bigr)^{q^e}\Bigr)$$
for some $q$ and some $\sqrt{G}$-automorphism $\psi$ with $\psi(V)$ defined over $\mathbb{F}_q$.

It may be instructive here to consider the inhomogeneous example
$$x + y - z = 1 \tag{1.9}$$
over the group $G$ in $K = \mathbb{F}_p(t)$ generated by $t, 1 - t$. Now (1.9) represents a plane $P$, also isotrivial and even trivial. Leitner [Le] has found that for $p \ge$ … $W$ and 8 points $W$. For example the line defined by
$$tx + y = 1, \quad z = (1 - t)x \tag{1.10}$$
… $x = z$, $y = 1$. And so is the point $x = t$, $y = \frac{1-t}{t}$, $z = \frac{(1-t)^2}{t}$.

We can easily iterate the descent from (1.10). This is isotrivial via the automorphism $\psi$ taking $x, y, z$ to $\tilde{x} = tx$, $\tilde{y} = y$, $\tilde{z} = \frac{t}{1-t}z$, when the equations become $\tilde{x} + \tilde{y} = 1$, $\tilde{z} = \tilde{x}$. Now (1.4) (with $e$ replaced by $f$) on (1.3) leads to the points $w = (x, y, z)$ of $W(G)$ with
$$x = t^{p^f - 1}, \quad y = (1 - t)^{p^f}, \quad z = t^{p^f - 1}(1 - t) \quad (f = 0, 1, 2, \dots).$$
Then from (1.8) (with $q = p$ and the identity automorphism) we get the points
$$x = t^{(q-1)r}, \quad y = (1 - t)^{qr}, \quad z = t^{(q-1)r}(1 - t)^r \tag{1.11}$$
on $P(G)$; here $q = p^f$ and $r = p^e$ now indicate independently varying powers of $p$. This is precisely the example in [Mass] (p.202).

With the help of a suitable notation we can after all do the complete descent, also for linear varieties that are cosets; then the latter arise solely as obstacles. Denote by $\varphi = \varphi_q$ the Frobenius with $\varphi(x) = x^q$. Let $\psi_1, \dots, \psi_h$ be projective automorphisms. Then we imitate commutator brackets by defining the operator
$$[\psi_1, \dots, \psi_h] = [\psi_1, \dots, \psi_h]_q = \bigcup_{e_1=0}^{\infty} \cdots \bigcup_{e_h=0}^{\infty} (\psi_1^{-1}\varphi^{e_1}\psi_1) \cdots (\psi_h^{-1}\varphi^{e_h}\psi_h), \tag{1.12}$$
… $h = 0$. This formally resembles Definition 7.7 of [D] (p.208).

Theorem 1.

Let $K$ be a field of positive characteristic $p$, let $V$ be an arbitrary linear variety defined over $K$, and suppose that $\sqrt{G}$ in $K$ is finitely generated. Then there is a power $q$ of $p$ such that $V(G)$ is an effectively computable finite union of sets $[\psi_1, \dots, \psi_h]_q T(G)$ with $\sqrt{G}$-automorphisms $\psi_1, \dots, \psi_h$ ($0 \le h \le n - 1$), and cosets $T$ contained in $V$.

Here we see quite clearly the $n-$… $x_1 + x_2 - x_3 - \cdots - x_n = 1$ generalizes (1.3) and (1.9), and it can be used to show that the upper bound $n - 1$ … $d-$… $e_1 = 1$ in (1.12) and all other zero, we see that $\psi_1^{q-1}$ is a $G$-automorphism. Similarly for $\psi_2^{q-1}, \dots, \psi_h^{q-1}$. However it may not always be possible to choose $\psi_1, \dots, \psi_h$ as $G$-automorphisms. This we also prove in section 13.

We can also symmetrize the sets in Theorem 1. We explain this with the points (1.11) on $P$ defined by (1.9). They can be written as
$$x = t^{s-r}, \quad y = (1 - t)^s, \quad z = t^{s-r}(1 - t)^r \tag{1.13}$$
with $s = qr$. Here there is asymmetry because apparently $r$ divides $s$. However (1.13) has a meaning for any independent positive powers $r, s$ of $p$; and it is easily checked that the resulting points remain on $P$.

To formulate this in general we introduce another bracket notation more related to the group law. For points $\pi_0, \pi_1, \dots, \pi_h$ we define the set
$$(\pi_0, \pi_1, \dots, \pi_h) = (\pi_0, \pi_1, \dots, \pi_h)_q = \pi_0 \bigcup_{l_1=0}^{\infty} \cdots \bigcup_{l_h=0}^{\infty} (\varphi^{l_1}\pi_1) \cdots (\varphi^{l_h}\pi_h), \tag{1.14}$$
… $\pi_0$ itself if $h = 0$. We introduce more special varieties $S$ defined solely by binary equations of the shape $X_i = X_j$. For example $S$ could be the single point with all coordinates equal or the full $\mathbb{P}_n$. We could call such varieties linear subgroups or just subgroups. As above it is not difficult to see that they are precisely the linear group subvarieties of $\mathbb{G}_m^n$, but again we don't need to know this.

Theorem 2.

Let $K$ be a field of positive characteristic $p$, let $V$ be an arbitrary linear variety defined over $K$, and suppose that $\sqrt{G}$ in $K$ is finitely generated. Then there is a power $q$ of $p$ such that $V(G)$ is an effectively computable finite union of sets $(\pi_0, \pi_1, \dots, \pi_h)_q S(G)$ with points $\pi_0, \pi_1, \dots, \pi_h$ ($0 \le h \le n - 1$) defined over $\sqrt{G}$ and subgroups $S$.

As in Theorem 1, the upper bound $n - 1$ … $\pi_1^{q-1}, \pi_2^{q-1}, \dots, \pi_h^{q-1}$ (as well as the product $\pi_0\pi_1\cdots\pi_h$) are defined over $G$. However this may not always be true of $\pi_0, \pi_1, \dots, \pi_h$, as we shall also prove in section 13.

When $V$ is a hyperplane defined by (1.1) we can even descend to points, provided we restrict to (1.2) in the style of Theorem A.

Theorem 3.

Let $K$ be a field of positive characteristic $p$, let $H$ be defined by $a_0X_0 + a_1X_1 + \cdots + a_nX_n = 0$ for non-zero $a_0, a_1, \dots, a_n$ in $K$, and write $H^*(G)$ for the set of points in $\mathbb{P}_n(G)$ satisfying $\sum_{i \in I} a_iX_i \neq 0$ for every non-empty proper subset $I$ of $\{0, 1, \dots, n\}$. Suppose that $\sqrt{G}$ in $K$ is finitely generated. Then there is a power $q$ of $p$ such that $H^*(G)$ is contained both in (1) an effectively computable finite union of sets $[\psi_1, \dots, \psi_h]_q\{\tau\}$ in $H(G)$ with $\sqrt{G}$-automorphisms $\psi_1, \dots, \psi_h$ ($0 \le h \le n - 1$) and points $\tau$, and in (2) an effectively computable finite union of sets $(\pi_0, \pi_1, \dots, \pi_h)_q$ in $H(G)$ with points $\pi_0, \pi_1, \dots, \pi_h$ ($0 \le h \le n - 1$).

We do not prove it here, but in this situation $H^*(G)$ is precisely a finite union of $[\psi_1, \dots, \psi_h]_q\{\tau\}$. However there seems to be a strange asymmetry between the asymmetric part (1) and the symmetric part (2). Namely it seems improbable that $H^*(G)$ is precisely a finite union of $(\pi_0, \pi_1, \dots, \pi_h)_q$. For example, the point (1.13) on $H$ defined by (1.9) is in $H^*(G)$ except for $r = s$, which disturbs the independence of $r$ and $s$.

Apart from the work [V] already mentioned, there are other results of this kind, now in the more general context of Mordell-Lang for arbitrary varieties $V$ inside arbitrary semiabelian varieties $S$. Typically here one intersects $V$ with a finitely generated subgroup $\Gamma$ of $S$; however in the present paper with $S = \mathbb{G}_m^n$ we have for simplicity restricted $\Gamma$ to a Cartesian product $G^n$.

Thus the main result Theorem A (p.104; see also p.109) of Abramovich and Voloch [AV] almost implies part (a) of our Descent Step over $\sqrt{G}$, except that they assume that $V$ is not $K^*$-isotrivial and they have no information about $W$ which would ensure linearity in our situation. The main result Theorem 1.1 (p.667) of Hrushovsky's well-known paper [Hr] gives a similar implication. The restriction to our (a) corresponds to their restriction to the non-isotrivial case.
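The claim that the symmetrized points (1.13) remain on $P$ for independent powers $r, s$ of $p$, and the way they fit the bracket (1.14), can be checked mechanically. In the hypothetical Python sketch below, points of $P(G)$ are stored as exponent pairs over $t$ and $1-t$. The decomposition $\pi_0 = (1, 1, 1)$, $\pi_1 = (t^{-1}, 1, t^{-1}(1-t))$, $\pi_2 = (t, 1-t, t)$ with $q = p$ is our own worked choice (it has $h = 2 \le n - 1$ for $n = 3$), and membership of $P$ is tested in $\mathbb{F}_p[t]$ after multiplying $x + y - z = 1$ through by a power of $t$. The brackets are truncated to finitely many exponents, so this is an illustration rather than a proof.

```python
from itertools import product

# Points (x, y, z) of P(G), G = <t, 1 - t> in F_p(t); each coordinate t^a (1 - t)^b
# is stored as the exponent pair (a, b). All names here are illustrative.

def frob(point, k, p):
    """phi^k with q = p: raise each coordinate to the power p^k."""
    return tuple((a * p**k, b * p**k) for a, b in point)

def times(u, v):
    """Coordinatewise product of two points (the group law)."""
    return tuple((a1 + a2, b1 + b2) for (a1, b1), (a2, b2) in zip(u, v))

def bracket(pi0, pis, p, bound):
    """Truncation of (pi_0, pi_1, ..., pi_h)_q from (1.14): q = p and 0 <= l_j < bound."""
    out = set()
    for ls in product(range(bound), repeat=len(pis)):
        pt = pi0
        for pi, l in zip(pis, ls):
            pt = times(pt, frob(pi, l, p))
        out.add(pt)
    return out

def point_113(r, s):
    """The point (1.13): x = t^(s-r), y = (1-t)^s, z = t^(s-r) (1-t)^r."""
    return ((s - r, 0), (0, s), (s - r, r))

# Our decomposition of (1.13): pi_0 = (1, 1, 1), pi_1 = (1/t, 1, (1-t)/t), pi_2 = (t, 1-t, t).
PI0 = ((0, 0), (0, 0), (0, 0))
PI1 = ((-1, 0), (0, 0), (-1, 1))
PI2 = ((1, 0), (0, 1), (1, 0))

def poly_mul(a, b, p):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def monomial(a, b, p):
    """t^a (1 - t)^b in F_p[t] as a coefficient list, for a, b >= 0."""
    poly = [0] * a + [1]
    for _ in range(b):
        poly = poly_mul(poly, [1, p - 1], p)
    return poly

def on_plane(point, p):
    """Test x + y - z = 1, i.e. (1.9), after multiplying through by t^shift."""
    shift = max(0, -min(a for a, _ in point))
    polys = [monomial(a + shift, b, p) for a, b in point] + [monomial(shift, 0, p)]
    n = max(len(q) for q in polys)
    x, y, z, rhs = (q + [0] * (n - len(q)) for q in polys)
    return [(xi + yi - zi) % p for xi, yi, zi in zip(x, y, z)] == rhs
```

The identity behind the check is $(1-t)^r = 1 - t^r$ for $r$ a power of $p$, which gives $x - z = t^{s-r}t^r = t^s$ and hence $x + y - z = t^s + (1-t)^s = 1$. With $r = 2$ and $p = 3$ the same test fails, showing that the exponents really must be powers of $p$.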
Again these authors do not discuss the effectivity in our sense.

After earlier work by Scanlon, the isotrivial case was treated by Moosa and Scanlon. Their Theorem B (p.477) of [MS2] implies that our $V(G)$ is what they call an $F$-set (see also [MS1]). Indeed in our situation and notation an $F$-set is nothing other than a finite union of $(\pi_0, \pi_1, \dots, \pi_h)_q A(G)$ with $\pi_0\pi_1\cdots\pi_h$ and $\pi_1^{q-1}, \dots, \pi_h^{q-1}$ defined over $G$ and an algebraic subgroup $A$. However they do not prove the bound $h \le n - 1$ … $A$ which would imply that it is linear because our $V$ is. Their ideas were developed by Ghioca [Gh], who in addition extended the results to Drinfeld modules. See also the work [GM] of Ghioca and Moosa on division groups. Again there is no mention of effectivity.

Now let us discuss this effectivity, a key aspect of the present paper.

It is well-known that Theorem A (in zero characteristic) is semieffective in the sense that effective and even explicit upper bounds for the number of solutions of (1.1) subject to (1.2) can be found. However it is not fully effective in the sense that no upper bounds are known for the size of the solutions, even in very simple cases like $K = \mathbb{Q}$ and $G$ generated by 3, 5, 7; and it is even unknown how to find all the finitely many non-negative integers $a, b, c$ satisfying an equation like
$$3^a + 5^b - 7^c = 1.$$

Out of the works in positive characteristic quoted above, only two discuss effectivity, and then only semieffectivity in the sense above. Voloch [V] in the theorems mentioned above gives explicit upper bounds for the cardinality of $V(G)$ for $n = 2$ in case (a) of Theorem 1; these are uniform in the sense that they are independent of $V$ and further they depend on $G$ only with regard to its rank. A similarly uniform bound is given as Theorem 6.1 (p.687) by Hrushovsky [Hr] for $V$ in an abelian variety; however as it stands it is not completely explicit due to the use of non-standard analysis.
These bounds are in line with the well-known estimates in zero characteristic - see for example Theorem 1.1 of [ESS] (p.808).

By contrast our results above are fully effective. This should be no surprise; for example it is rather easy by differentiating to find all non-negative integers $a, b, c$ with
$$(3 + t)^a + (5 + t)^b - (7 + t)^c = 1$$
in any fixed $K = \mathbb{F}_p(t)$. We shall work out explicit bounds, at first for the Descent Step over $\sqrt{G}$, where the exponents appearing can be reasonably small; and then for the Descent Step over $G$ and Theorems 1, 2 and 3. It would then be a straightforward matter to deduce bounds for the various cardinalities involved; but more work may be needed to make these uniform in the sense above.

In fact the size bounds cannot be uniform in this sense. For example from the non-isotrivial equation $x + ay = 1$ with $a = \frac{1 - t^m}{(1 - t)^m}$ ($m \neq p^e$) over the group generated by $t$ and $1 - t$ in $\mathbb{F}_p(t)$, with solution $x = t^m$, $y = (1 - t)^m$, we can easily show that the size of solutions for fixed $G$ must depend on $V$. Similarly the isotrivial equation $x + y = 1$ over the group generated by $t^m$ and $(1 - t)^m$ in $\mathbb{F}_p(t)$, with the same solution, demonstrates that the size of solutions for fixed $V$ must depend on more than just the rank of $G$.

Because all our varieties are linear, we can measure them in a traditional way in terms of certain heights on the Grassmannian. We will show for example in the Descent Step over $\sqrt{G}$ that
$$h(W) \le C\,h(V)^n \tag{1.15}$$
if $W$ is no longer required to be $\sqrt{G}$-isotrivial, where $C$ depends only on $K$, $n$ and $G$. If we insist on $W$ being $\sqrt{G}$-isotrivial, then the exponent is not so small. The well-known Northcott Property of heights often implies that the set of $W$ in (1.15) is finite and easily effectively computable.

Perhaps since the results in zero characteristic are not effective, there is no tradition about measuring the groups $\Gamma$, even in $S = \mathbb{G}_m^n$. Because our $\Gamma = G^n$, it is here possible to use a basis-free notion of regulator $R(G)$.
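The contrast between semieffective and fully effective results can be made concrete with the equation $3^a + 5^b - 7^c = 1$ mentioned above. The hypothetical brute-force search below (a Python sketch, not a method from the paper) finds the solutions $(0, 0, 0)$ and $(1, 1, 1)$ for exponents up to 6, but nothing in such a search certifies that no larger solutions exist; that would require exactly the kind of upper bound on the size of solutions that is unavailable in zero characteristic.

```python
def small_solutions(bound):
    """All (a, b, c) with 0 <= a, b, c <= bound and 3^a + 5^b - 7^c == 1.

    A finite search only: it cannot rule out solutions outside the box.
    """
    return [(a, b, c)
            for a in range(bound + 1)
            for b in range(bound + 1)
            for c in range(bound + 1)
            if 3**a + 5**b - 7**c == 1]
```

For instance `small_solutions(6)` returns `[(0, 0, 0), (1, 1, 1)]`, the second being $3 + 5 - 7 = 1$.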
We will show that the bounds, at least when $G = \sqrt{G}$, are all of polynomial growth in $R(G)$. For example in (1.15) we get
$$C \le c\,R(G)^{n+2}$$
again if $W$ is no longer required to be $\sqrt{G}$-isotrivial, where $c$ now depends only on $K$, $n$ and the rank $r$ of $G$. In fact here $c = \dots$ is explicit in $n$, $r$ and $d$, with $d$ depending only mildly on $K$; for example $d = 1$ if $K$ is a field of rational functions in several independent variables over a finite field.

However we did find it a small surprise to discover that when $G \neq \sqrt{G}$ the smallest bounds can be exponential in $R(G)$. A hint of this can be seen from the above discussion of (1.5) and $G_l$. For example the simplest solution of the equation $t\,x + y = 1$ … with $x, y$ in the group generated by $t$ … and $1 - t$ in $\mathbb{F}_{\dots}(t)$ is
$$x = (t)^{\dots}, \quad y = (1 - t)^{\dots}; \tag{1.16}$$
… $\sqrt{3}$. For an explanation see the end of section 11.

In section 12 we estimate the heights (in a natural sense) of all the quantities occurring in our Theorems. The bounds are polynomial in $h(V)$ and $R(G)$ if $G = \sqrt{G}$; but otherwise they may involve an extra, possibly unavoidable, exponential dependence on $R(G)$. Here too there is a Northcott Property to ensure effectivity.

At first sight it may seem that the methods of [Mass] and [D] are unrelated. But there are close connections, and we give some hints of this in our exposition. Here we mention just that [Mass] works with derivatives and [D] works with $p$-automata and "free Frobenius splitting". For example over $\mathbb{F}_p(t)$, [Mass] (p.196) has $\delta_i = (d/dt)^i$ ($i = 0, \dots, p - 1$) while [D] (p.198) splits $\mathbb{F}_p(t)$ into a direct sum of one-dimensional $\mathbb{F}_p(t^p)$-subspaces $V_i$ ($i = 0, \dots, p - 1$) and considers the associated projections $\lambda_i$. In the natural case $V_i = t^i\mathbb{F}_p(t^p)$ one checks easily that the vectors $(\delta_0, t\delta_1, \dots, t^{p-1}\delta_{p-1})$ and $(\lambda_0, \lambda_1, \dots, \lambda_{p-1})$ are connected via an invertible matrix over $\mathbb{F}_p$. So in some sense differentiating is equivalent to projecting. We can also quote Hrushovsky [Hr] (p.669): "Distinguishing a basis for $K/K^p$ has the effect of fixing also a stack of Hasse derivations." As a matter of fact we do not use Hasse derivations in this paper (see the remarks at the end of section 5).

Here is a brief section-by-section account of what follows. We begin in section 2 by explaining heights. Then in section 3 we introduce derivations, and we use all this to give preliminary effective versions of the two main technical results of [Mass] about dependence over the field of differential constants. In section 4 we explain regulators, and in section 5 we use these to refine the work of section 3. Then section 6 contains a technical result which enables us to identify isotriviality, and in section 7 we record some observations about automorphisms and heights of varieties $V$. We are now in a position in section 8 to make effective the main argument of [Mass] yielding the subvarieties $W$, at least for points over $\sqrt{G}$ and when $V$ is either a hyperplane or trivial. We treat general $V$ in section 9 but omitting the isotriviality of the $W$. This omission is then remedied in section 10 with a simple inductive argument, and in section 11 we show how to treat points over $G$. We can then in section 12 prove effective versions of our Descent Steps and Theorems. Finally in section 13, as already mentioned, we show that various aspects of our results cannot be further improved.

We would also like to draw attention to a very recent manuscript [AB] of Adamczewski and Bell for further work in the context of $p$-automata; in particular this covers also equations (1.1) and recurrences.
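One way to see the invertible matrix relating $(\delta_0, t\delta_1, \dots, t^{p-1}\delta_{p-1})$ and $(\lambda_0, \dots, \lambda_{p-1})$ is the following short derivation of our own. For a monomial, $t^i\delta_i t^k = (k)_i t^k$ with the falling factorial $(k)_i = k(k-1)\cdots(k-i+1)$, and $(k)_i \bmod p$ depends only on $k \bmod p$; hence $t^i\delta_i = \sum_j (j)_i\lambda_j$ with $M_{ij} = (j)_i \bmod p$, an upper triangular matrix with diagonal entries $i! \not\equiv 0 \pmod p$. The Python sketch below (hypothetical helper names, restricted to polynomials in $\mathbb{F}_p[t]$ rather than all of $\mathbb{F}_p(t)$, and assuming $p$ prime) verifies this identity on sample polynomials and computes the rank of $M$ over $\mathbb{F}_p$.

```python
def falling(k, i):
    """Falling factorial k (k-1) ... (k-i+1); it is 0 when 0 <= k < i."""
    out = 1
    for j in range(i):
        out *= k - j
    return out

def t_i_delta_i(f, i, p):
    """t^i (d/dt)^i f for f in F_p[t]: the coefficient of t^k is multiplied by (k)_i."""
    return [(c * falling(k, i)) % p for k, c in enumerate(f)]

def lam(f, j, p):
    """Projection lambda_j onto V_j = t^j F_p(t^p): keep the monomials t^k with k = j mod p."""
    return [c % p if k % p == j else 0 for k, c in enumerate(f)]

def transition_matrix(p):
    """M[i][j] = (j)_i mod p, so that t^i delta_i = sum_j M[i][j] lambda_j."""
    return [[falling(j, i) % p for j in range(p)] for i in range(p)]

def relation_holds(f, p):
    """Check t^i delta_i f == sum_j M[i][j] lambda_j f for every i = 0, ..., p-1."""
    M = transition_matrix(p)
    for i in range(p):
        combo = [0] * len(f)
        for j in range(p):
            combo = [(x + M[i][j] * y) % p for x, y in zip(combo, lam(f, j, p))]
        if combo != t_i_delta_i(f, i, p):
            return False
    return True

def rank_mod_p(rows, p):
    """Rank over F_p (p prime) by Gaussian elimination."""
    rows = [r[:] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)          # inverse via Fermat's little theorem
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank
```

For $p = 3$ the matrix is `[[1, 1, 1], [0, 1, 2], [0, 0, 2]]`, and in general its rank is $p$.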

2. Heights.

The Theorems above for arbitrary fields can easily be reduced to the case when the field is finitely generated over its ground field $\mathbb{F}_p$ (see section 12 below). In general let $K$ be finitely generated over a subfield $k$ in any characteristic. We shall define heights on $K$ relative to $k$; thus we suppose that $K$ is a transcendental extension of $k$. Here we do not know any basis-free notion of height, and thus we choose a transcendence basis $B$ of $K$ over $k$ with elements $t_1, \dots, t_b$ regarded as independent variables over $k$. The height $\tilde{h}(a) = \tilde{h}_B(a)$ of an element $a \neq 0$ of $k[B] = k[t_1, \dots, t_b]$ will be its total degree $\deg a$ regarded as a polynomial; also $\tilde{h}(0) = 0$. The height can be extended to an element $x$ of the quotient field $k(B) = k(t_1, \dots, t_b)$ by writing $x = a_1/a_2$ for coprime polynomials $a_1, a_2$ in $k[B]$ and defining
$$\tilde{h}(x) = \tilde{h}_B(x) = \max\{\deg a_1, \deg a_2\}. \tag{2.1}$$
… $K$. This is a standard matter using valuations.

There is a valuation on $k[B]$ corresponding to total degree and defined by $|a|_\infty = \exp(\deg a)$ ($a \neq 0$); and of course $|0|_\infty = 0$. This extends at once to $k(B)$ by multiplicativity. And for every irreducible $\mathfrak{p}$ in $k[B]$ there is a valuation defined on $k[B]$ by
$$|a|_{\mathfrak{p}} = \exp(-\omega_{\mathfrak{p}}(a)\deg\mathfrak{p}) \quad (a \neq 0),$$
where $\omega_{\mathfrak{p}}(a)$ is the exact power of $\mathfrak{p}$ dividing $a$; and again $|0|_{\mathfrak{p}} = 0$. And it too extends to $k(B)$ by multiplicativity. Using $v$ to run over $\infty$ and all $\mathfrak{p}$, we have the product formula $\prod_v |x|_v = 1$ ($x \neq 0$) and the height formula
$$\tilde{h}(x) = \log\prod_v \max\{1, |x|_v\}.$$

Now $K$ is a finite extension of $k(B)$, say of degree $d$. Thus each valuation $v$ has finitely many extensions $w$ to $K$, written $w \mid v$. In fact
$$|x|_w = |N(x)|_v^{1/d_w}, \tag{2.2}$$
where $N$ is the norm from $K_w$ to the completion $k(B)_v$ and $d_w$ is the relative degree. We also have $\sum_{w \mid v} d_w = d$. Now the product formula
$$\prod_w |x|_w^{d_w} = 1 \quad (x \neq 0)$$
holds. Further the formula
$$\tilde{h}(x) = \frac{1}{d}\log\prod_w \max\{1, |x|_w^{d_w}\}$$
extends the height $\tilde{h} = \tilde{h}_B$ to an absolute height on $K$.
For all this see [La2] (pp.1-19) or [BG] (pp.1-10). Actually for convenience in estimating we will use from now on the relative height
$$h(x) = h_B(x) = d\,\tilde{h}(x) \ge 0.$$
This can be calculated directly from the minimum polynomial in the following extension of (2.1).

Lemma 2.1.

Suppose $x$ in $K$ satisfies an equation $A(x) = 0$ with $A(t) = a_0t^e + \cdots + a_e$ for $a_0, \dots, a_e$ in $k[B]$ and $A(t)$ irreducible over $k[B]$. Then
$$e\,h(x) = d\max\{\deg a_0, \dots, \deg a_e\}.$$

Proof.

Over a splitting field $L$ we have $A(t) = a_0(t - x_1)\cdots(t - x_e)$, and we can extend, keeping the same notation, all the valuations to $L$. Then Gauss's Lemma gives
$$\max\{|a_0|_w, \dots, |a_e|_w\} = |a_0|_w \max\{1, |x_1|_w\}\cdots\max\{1, |x_e|_w\}.$$
If $w$ does not divide $\infty$ then the left-hand side is 1 because $a_0, \dots, a_e$ are coprime; and otherwise it is $\max\{|a_0|_\infty, \dots, |a_e|_\infty\}$. Taking the product with exponents $d_w$ and then taking logarithms gives on the left-hand side $d\max\{\deg a_0, \dots, \deg a_e\}$ and on the right-hand side $h(x_1) + \cdots + h(x_e)$ (the factors $|a_0|_w$ contribute nothing, by the product formula). This last is just $e\,h(x)$ because $x_1, \dots, x_e$ are conjugate over $k(B)$.

An immediate consequence of Lemma 2.1 is the Northcott Property; namely that for any $H$ there are at most finitely many $x$ in $K$ with $h(x) \le H$.

We will also need the standard extensions to vectors. So for $x_1, \dots, x_l$ in $K$ we define
$$h(x_1, \dots, x_l) = \log\prod_w \max\{1, |x_1|_w^{d_w}, \dots, |x_l|_w^{d_w}\}.$$
For example $h(a_0, \dots, a_e)$ in the situation of Lemma 2.1 is just $d\max\{\deg a_0, \dots, \deg a_e\}$. The Northcott Property extends at once to $K^l$.
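For one variable ($b = 1$, $B = \{t\}$, $d = 1$) the valuation-theoretic description of $\tilde{h}$ can be compared with (2.1) directly. The Python sketch below is a hypothetical helper that assumes the factorization of $x$ into distinct monic irreducibles $\mathfrak{p}$ is already known, and uses only the degrees and exponents of the factors. It checks the product formula $\sum_v \log|x|_v = 0$ and that $\log\prod_v\max\{1, |x|_v\}$ equals $\max\{\deg a_1, \deg a_2\}$, which is also what Lemma 2.1 gives for the minimum polynomial $a_2X - a_1$ with $e = d = 1$.

```python
def height_from_places(factors):
    """Height of x in k(t) computed place by place.

    factors: list of (deg_p, n_p), one pair per distinct monic irreducible p dividing x,
    where n_p != 0 is the exponent of p in x; the leading constant is ignored.
    Returns (log|x|_inf, sum of log|x|_p over finite p, log prod_v max{1, |x|_v}).
    """
    deg_num = sum(d * n for d, n in factors if n > 0)
    deg_den = sum(-d * n for d, n in factors if n < 0)
    log_inf = deg_num - deg_den                 # log|x|_inf = deg a_1 - deg a_2
    log_finite = [-n * d for d, n in factors]   # log|x|_p = -omega_p(x) deg p
    height = max(0, log_inf) + sum(max(0, v) for v in log_finite)
    return log_inf, sum(log_finite), height

def height_from_degrees(factors):
    """(2.1): the larger of the degrees of the coprime numerator and denominator."""
    deg_num = sum(d * n for d, n in factors if n > 0)
    deg_den = sum(-d * n for d, n in factors if n < 0)
    return max(deg_num, deg_den)
```

For $x = (t^2 + 1)^3/(t(t+1)^2)$ over $\mathbb{Q}$, encoded as `[(2, 3), (1, -1), (1, -2)]`, the place at infinity contributes 3, the finite places contribute $-3$ in total (product formula), and the height is $6 = \max\{6, 3\}$.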

3. Dependence with heights.

Given $K$ finitely generated and transcendental over $k$, there is always a separable transcendence basis $B = (t_1, \dots, t_b)$; this means that $K$ is separable over $k(B)$. As above write $d = [K : k(B)]$. On $k[B]$ we have the standard derivations $\partial/\partial t_1, \dots, \partial/\partial t_b$, which extend in the obvious way to $k(B)$. And by separability they extend uniquely to $K$. For all this see [La1] (pp.183-184). For an integer $i \ge 0$ we define $\mathcal{D}(i)$ as the set of operators
$$D = \Bigl(\frac{\partial}{\partial t_1}\Bigr)^{i_1}\cdots\Bigl(\frac{\partial}{\partial t_b}\Bigr)^{i_b}$$
as $i_1, \dots, i_b$ run over all non-negative integers with $i_1 + \cdots + i_b \le i$. This is not quite the same as [Mass] (p.196), where we had $i \ge 1$ and $i_1 + \cdots + i_b < i$.

It will be convenient for later calculations to define a quantity $h(x; i)$ as follows. We order in some way the operators $D_1, \dots, D_l$ of $\mathcal{D}(i)$, and we define for $x \neq 0$
$$h(x; i) = h_B(x; i) = h\Bigl(\frac{D_1x}{x}, \dots, \frac{D_lx}{x}\Bigr),$$
of course independent of the ordering.

The next result is an explicit version of Lemma 3 of [Mass] (p.195), however without reference to any group $G$. We write $C$ for the field of differential constants in $K$. For zero characteristic this is $k$, but for positive characteristic $p$ it is the set of $p$th powers of elements of $K$.

Lemma 3.1.

For $m \ge 1$ suppose $c_1, \dots, c_m$ are in $C$ and $x_1, \dots, x_m$ are in $K^*$ with
$$c_1x_1 + \cdots + c_mx_m = 1. \tag{3.1}$$
Then either

(a) $h(c_1x_1, \dots, c_mx_m) \le (m + 1)\bigl(h(x_1; m - 1) + \cdots + h(x_m; m - 1)\bigr)$, or

(b) $x_1, \dots, x_m$ are linearly dependent over $C$.

Proof. If (b) does not hold, then the theory of the generalized Wronskian (see for example [La2] p.174) shows that we may find operators $D_i$ in $\mathcal{D}(i)$ ($i = 0, \dots, m - 1$) such that the matrix with entries $D_ix_j$ ($i = 0, \dots, m - 1$; $j = 1, \dots, m$) is non-singular. Applying them to (3.1), and using that each $c_j$ is a constant, we get
$$\sum_{j=1}^{m} \frac{D_ix_j}{x_j}(c_jx_j) = D_i(1) \quad (i = 0, \dots, m - 1).$$
These can be solved by Cramer's Rule to get $c_jx_j = w_j/w$ ($j = 1, \dots, m$), where $w \neq 0$ is the determinant of the matrix with entries $D_ix_j/x_j$ ($i = 0, \dots, m - 1$; $j = 1, \dots, m$). Noting that this determinant is multilinear in the columns, we find that
$$h(w) \le h(x_1; m - 1) + \cdots + h(x_m; m - 1),$$
and similarly for $h(w_j)$ ($j = 1, \dots, m$). We conclude that $h(c_1x_1, \dots, c_mx_m) = h(w_1/w, \dots, w_m/w)$ is at most
$$h(w) + h(w_1) + \cdots + h(w_m) \le (m + 1)\bigl(h(x_1; m - 1) + \cdots + h(x_m; m - 1)\bigr).$$
… $G$.

Lemma 3.2.

For $m \ge 1$ suppose $x_0, x_1, \dots, x_m$ are in $K^*$ and linearly dependent over $C$ but $x_1, \dots, x_m$ are linearly independent over $C$. Then there is a relation
$$c_1x_1 + \cdots + c_mx_m = x_0 \tag{3.2}$$
with $c_1, \dots, c_m$ in $C$ and
$$h\Bigl(\frac{c_1x_1}{x_0}, \dots, \frac{c_mx_m}{x_0}\Bigr) \le (m + 1)\Bigl(h\Bigl(\frac{x_1}{x_0}; m - 1\Bigr) + \cdots + h\Bigl(\frac{x_m}{x_0}; m - 1\Bigr)\Bigr).$$

Proof.

There is certainly a relation (3.2) with $c_1, \dots, c_m$ in $C$, and we apply Lemma 3.1 to the quotients $x_1/x_0, \dots, x_m/x_0$. As $x_1, \dots, x_m$ are linearly independent over $C$, the conclusion (b) cannot hold. Now conclusion (a) is just what we need, and this completes the proof.

In section 5 we shall prove versions of Lemmas 3.1 and 3.2 that are uniform for $x_0, x_1, \dots, x_m$ in a finitely generated group $G$ as in [Mass]. By way of preparation, the next result illustrates the logarithmic nature of the quantities $h(\,\cdot\,; i)$.

Lemma 3.3. For any $x \neq 0$, $y \neq 0$ in $K$ and any integers $i \ge 0$, $e \ge 0$ we have $h(xy; i) \le h(x; i) + h(y; i)$ and $h(x^e; i) \le i\,h(x; i)$.

Proof. Let $D$ be in $\mathcal{D}(i)$. By distributing operators over the factors of $xy$ as in Leibniz, we see that $D(xy)/(xy)$ is a sum with generalized binomial coefficients of products $\frac{E(x)}{x}\frac{F(y)}{y}$ with operators $E, F$ also in $\mathcal{D}(i)$. Taking $D = D_1, \dots, D_l$ as in the definition of $h(xy; i)$, we deduce the first inequality of the present lemma by standard height calculations.

When $e$ is a positive integer, a similar argument shows that $D(x^e)/x^e$ is a sum with generalized binomial coefficients of products $\frac{E_1(x)}{x}\cdots\frac{E_e(x)}{x}$ with operators $E_1, \dots, E_e$ also in $\mathcal{D}(i)$. Here $E_1\cdots E_e = D$, so that there are at most $i$ terms not equal to 1 in this product. Thus $D(x^e)/x^e$ is a polynomial of total degree at most $i$ in the $E(x)/x$ for $E$ in $\mathcal{D}(i)$. The second inequality now follows in a similar way, at least for $e \ge 1$. The result is trivial for $e = 0$.

Lemma 3.4.

For any $x \neq 0$ in $K$ and any integer $i \ge 0$ we have $h(x; i) \le 4id\,h(x)$.

Proof.

This is trivial for i = 0, so we assume from now on i ≥

1. We have an equation A ( x ) = 0 as in Lemma 2.1, of degree e ≤ d . Denote by A ′ ( t ) the derivative with respectto t . Pick any D in D ( i ). We claim that B i = ( A ′ ( x )) i − Dx is a polynomial in x andvarious derivatives D a of various coeﬃcients a of A , with coeﬃcients in k and of degreeat most (2 i − e −

1) + 1 in x and of total degree at most 2 i − D a . We provethis by induction on i .When i = 1 we have for example D = ∂∂t = ∂ (say), and applying this to A ( x ) = 0yields B = − P ej =0 ( ∂a e − j ) x j for which the claim is clear.Assuming Dx = B i ( A ′ ( x )) i − with B i as above, we do the induction step by applyingone more operator, again say ∂∂t = ∂ . We get( A ′ ( x )) i ∂Dx = A ′ ( x ) ∂B i − (2 i − B i ∂ ( A ′ ( x )) . Here ∂B i involves x to degree at most (2 i − e −

1) + 1 and also x to degree at most(2 i − e −

1) multiplied by ∂x = B A ′ ( x ) , together with D a to total degree at most 2 i − ∂ ( A ′ ( x )) involves x to degree at most e − x to degree at most e − e = 1) multiplied by ∂x = B A ′ ( x ) , together with D a to total degree at most 1. Multiplyingby A ′ ( x ) we get ( A ′ ( x )) i +1 ∂Dx involving x to degree at most e − { (2 i − e −

1) + 1 + ( e − , (2 i − e −

1) + e } = (2( i + 1) − e −

1) + 1 , D a is at most (2 i −

1) + 1 + 1 = 2( i + 1) −

1. This proves the claim ingeneral.There follows at once the estimatelog | B i | w ≤ ((2 i − e −

1) + 1) log max { , | x | w } for any w not dividing ∞ ; and otherwise we get an extra term (2 i −

1) max { deg a , . . . , deg a e } .The same estimates also hold for log | C | w where C = x ( A ′ ( x )) i − .Now write B ij for the B i corresponding to the operators D j ( j = 1 , . . . , l ) of D ( i ), sothat D j xx = B ij C . Then h (cid:18) D xx , . . . , D l xx (cid:19) = X w d w max { log | B i | w , . . . , log | B il | w , log | C | w } which is at most((2 i − e −

1) + 1) h ( x ) + (2 i − d max { deg a , . . . , deg a e } . Finally by Lemma 2.1 this is at most((2 i − e −

1) + 1) h ( x ) + (2 i − eh ( x ) ≤ ieh ( x ) ≤ idh ( x )as required. This completes the proof of the present lemma.In view of our consistent use of the relative height (as opposed to the absolute height),the factor d here looks like a normalization error. However it cannot be avoided, as theexample x = ( t +1 t ) /d ( t = t ) in K = k ( t )( x ) = k ( x ) shows. One ﬁnds that the rationalfunction x ∂ i x∂t i has denominator ( t ( t + 1)) i . So its height is at least 2 id = 2 idh ( x ), whichshows also that our dependence on i is not too bad. Perhaps even the factor 4 essentiallycannot be avoided.
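The induction in the proof of Lemma 3.4 rests on exact degree bookkeeping. The sketch below re-checks that arithmetic. It is a sanity check on the exponents only, not an implementation of the operators in $\mathcal D(i)$. `deg_B(i, e)` is a hypothetical helper standing for the claimed $x$-degree bound $(2i-1)(e-1)+1$ of $B_i$. Each entry of `terms` is one of the products that appear after clearing the denominator of $\partial x=B_1/A'(x)$.

```python
# Sanity check of the degree bookkeeping in the proof of Lemma 3.4 (illustrative only).

def deg_B(i: int, e: int) -> int:
    """Claimed bound on the x-degree of B_i = (A'(x))^(2i-1) * Dx."""
    return (2 * i - 1) * (e - 1) + 1


def induction_step_degree(i: int, e: int) -> int:
    """x-degree of (A'(x))^(2i+1) * dDx, read off term by term."""
    terms = [
        2 * (e - 1) + deg_B(i, e),            # A'(x)^2 times the part of dB_i free of dx
        (e - 1) + (2 * i - 1) * (e - 1) + e,  # A'(x) * x^((2i-1)(e-1)) * B_1
        (e - 1) + deg_B(i, e) + (e - 1),      # A'(x) * B_i * (part of d(A'(x)) free of dx)
    ]
    if e >= 2:  # the x^(e-2) * dx part of d(A'(x)) is absent when e = 1
        terms.append(deg_B(i, e) + (e - 2) + e)  # B_i * x^(e-2) * B_1
    return max(terms)


def final_bound_holds(i: int, e: int) -> bool:
    """((2i-1)(e-1)+1) + (2i-1)e <= 4ie, the last inequality of the proof."""
    return deg_B(i, e) + (2 * i - 1) * e <= 4 * i * e


if __name__ == "__main__":
    print(all(induction_step_degree(i, e) == deg_B(i + 1, e)
              for i in range(1, 40) for e in range(1, 40)))
    print(all(final_bound_holds(i, e) for i in range(1, 40) for e in range(1, 40)))
```

All four products have the same degree $(2i+1)(e-1)+1$. This is why the claim reproduces itself with $i$ replaced by $i+1$.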

4. Regulators.

Let $K$ be finitely generated and transcendental over $k$ as in the preceding section, and let $B$ be a transcendence basis. Let $G$ be a subgroup of $K^*$ finitely generated modulo $k^*$; that is, $G/(G\cap k^*)$ is finitely generated. We show here how to define a regulator $R(G)=R_B(G)$.

For all $w$ except finitely many we have $|g|_w=1$ for every $g$ in $G$. Pick a set of $N\ge1$ valuations $w$ containing these finitely many exceptions, and consider the map $\mathcal L$ from $G$ into $\mathbb R^N$ whose typical coordinate is $d_w\log|g|_w$. In fact by (2.2) $\mathcal L(G)$ lies in $\mathbb Z^N$ and is therefore discrete. Thus it is a (full) lattice in the real subspace it generates, whose dimension is the rank $r$ of $G/(G\cap k^*)$. If $r\ge1$ we define $R(G)=R_B(G)=\det\mathcal L(G)\ge1$, and if $r=0$ we define $R(G)=1$. This does not quite coincide with the standard definition for the unit group in algebraic number theory, because the latter is obtained by a projection to one dimension lower. But they are equal up to a constant factor.

The following example will be quoted later. Take $K=\mathbb F_p(t)$ (with the obvious $B$) and $G_l$ generated by $t^l$ and $1-t$. Then $N=3$, corresponding to the valuations at $t=0,1,\infty$, and we get the vectors $(-l,0,l)$ and $(0,-1,1)$. This gives $R_B(G_l)=l\sqrt3$.

Lemma 4.1.

Let $G,G'$ in $K^*$ be finitely generated modulo $k^*$, with $G$ of finite index in $G'$. Then
$$R(G)=\frac{[G':G]}{[G'\cap k^*:G\cap k^*]}R(G')=[G'/(G'\cap k^*):G/(G\cap k^*)]\,R(G'),$$
where we identify $G/(G\cap k^*)$ with a subgroup of $G'/(G'\cap k^*)$.

Proof. The quotients $G/(G\cap k^*)$ and $G'/(G'\cap k^*)$ are torsion-free and have the same rank, say $r$. If $r=0$ the lemma is trivial. Otherwise, using elementary divisors, we can find generators $\gamma_1,\dots,\gamma_r$ of $G'/(G'\cap k^*)$ and positive integers $d_1,\dots,d_r$ such that $\gamma_1^{d_1},\dots,\gamma_r^{d_r}$ generate $G/(G\cap k^*)$. Then the relationship between $\mathcal L(G')$ and $\mathcal L(G)$ is clear, and the lemma follows.

Lemma 4.2.

Let $G$ in $K^*$ be finitely generated modulo $k^*$, let $x$ be in $K^*$, and let $G'$ be the group generated by $x$ and the elements of $G$. Then $R(G')\le2h(x)R(G)$.

Proof. It is geometrically clear that if $\Lambda$ is any lattice in euclidean space, then $\det(\Lambda+\mathbb Zv)\le\det(\Lambda)\,|v|$ for the length $|v|$, at least if $v$ is not in the space spanned by $\Lambda$. But this continues to hold for all $v$, provided only that $|v|\ge1$ and $\Lambda+\mathbb Zv$ remains discrete. In particular it holds for $\Lambda=\mathcal L(G)$ and $v=\mathcal L(x)$. We conclude that $R(G')\le|\mathcal L(x)|R(G)$. Finally we have, by definition and the product formula,
$$h(x)=\sum_w\max\{0,m_w\}=\frac12\sum_w|m_w|\qquad(4.1)$$
for $m_w=d_w\log|x|_w$. And
$$|\mathcal L(x)|^2=\sum_wm_w^2\le\Big(\sum_w|m_w|\Big)^2=4(h(x))^2.$$
The lemma follows.

We can recover a basis from the regulator as follows.

Lemma 4.3.

Let $G$ be a subgroup of $K^*$ finitely generated modulo $k^*$, with $G/(G\cap k^*)$ of rank $r\ge1$. Then there are $g_1,\dots,g_r$ in $G$ generating $G/(G\cap k^*)$ with
$$h(g_1)\cdots h(g_r)\le\frac1r\,\delta(r)R(G)^2,\qquad\text{where }\delta(r)=r^{3r}.$$

Proof. By Minkowski's Second Theorem (see for example [Ca] Theorem V p.218) there are $\tilde g_1,\dots,\tilde g_r$ in $G$, multiplicatively independent modulo $k^*$, with
$$|\mathcal L(\tilde g_1)|\cdots|\mathcal L(\tilde g_r)|\le\frac{2^r}{V(r)}\det\mathcal L(G)=\frac{2^r}{V(r)}R(G)\qquad(4.2)$$
for the volume $V(r)$ of the unit ball in $\mathbb R^r$. By geometry $V(r)\ge(2/\sqrt r)^r$. We get a basis in the standard way, using the argument of Mahler-Weyl (see for example [Ca] Lemma 8 p.135). There results
$$|\mathcal L(g_i)|\le\max\{1,i/2\}\,|\mathcal L(\tilde g_i)|\qquad(i=1,\dots,r),$$
and so $2^r/V(r)$ in (4.2) gets replaced by $\frac{r!}{2^{r-1}}r^{r/2}\le\frac{r^{3r/2}}{2^{r-1}}$. Now (4.1) gives
$$h(g)=\sum_w\max\{0,m_w\}=\frac12\sum_w|m_w|$$
for $m_w=d_w\log|g|_w$ in $\mathbb Z$. Since $|m|\le m^2$ for any $m$ in $\mathbb Z$, we get
$$h(g)\le\frac12\sum_wm_w^2=\frac12|\mathcal L(g)|^2.$$
Therefore
$$h(g_1)\cdots h(g_r)\le\frac{r^{3r}}{2^{3r-2}}R(G)^2<\frac1r\,\delta(r)R(G)^2,$$
as desired.

In view of (4.2) it seems a pity that the square of the regulator appears in Lemma 4.3. But it cannot be avoided. For example, let $\alpha_1,\dots,\alpha_l,\beta_1,\dots,\beta_l$ be different constants in $k$, and consider $G$ generated by
$$g=\frac{(t-\alpha_1)\cdots(t-\alpha_l)}{(t-\beta_1)\cdots(t-\beta_l)}$$
in $K=k(t)$. Then $R(G)=\sqrt{2l}$. The only possibilities for $g_1$ are $\gamma g^{\pm1}$ with $\gamma$ constant. But then $h(g_1)=l$, so any bound $h(g_1)\le\delta(1)R(G)$ is impossible (indeed it fails for every $l\ge3$).

This leads to the following uniform version of Lemma 3.4 when $x$ lies in $G$. Write $G_k$ for the group generated by the elements of $G$ and $k^*$.

Lemma 4.4.

Let $G$ be a subgroup of $K^*$ finitely generated modulo $k^*$, with $G/(G\cap k^*)$ of rank $r\ge1$. Then for any $g$ in $G$ we have $h(g;i)\le4i^2d\,\delta(r)R(G)^2$. Further, for any positive integer $l$ there are $g_0$ in $G_k$ and $g'$ in $G$ with $g=g_0g'^{\,l}$ and $h(g_0)\le l\,\delta(r)R(G)^2$.

Proof.

Choose basis elements $g_1,\dots,g_r$ according to Lemma 4.3, and write $g=cg_1^{e_1}\cdots g_r^{e_r}$ for rational integers $e_1,\dots,e_r$ and $c$ in $k^*$. Replacing some of the $g_j$ by their inverses, we can assume that all $e_j\ge0$; this does not affect the estimate in Lemma 4.3. The constant $c$ does not change $h(\,\cdot\,;i)$, so by Lemma 3.3
$$h(g;i)=h(g_1^{e_1}\cdots g_r^{e_r};i)\le h(g_1^{e_1};i)+\cdots+h(g_r^{e_r};i)\le i\big(h(g_1;i)+\cdots+h(g_r;i)\big).$$
This in turn, by Lemma 3.4, is at most
$$4i^2d\big(h(g_1)+\cdots+h(g_r)\big)\le4i^2dr\,h(g_1)\cdots h(g_r)\le4i^2d\,\delta(r)R(G)^2.\qquad(4.3)$$
Here the middle step uses the fact that each $h(g_j)$ is a positive integer. For the second part we write $e_j=f_j+le'_j$ with $0\le f_j<l$ ($j=1,\dots,r$) (compare also [D] p.197). We take $g_0=cg_1^{f_1}\cdots g_r^{f_r}$ and $g'=g_1^{e'_1}\cdots g_r^{e'_r}$, and use the inequality in (4.3).

The final result of this section will lead easily to a quantitative version of Lemma 2 of [Mass] (p.193), such as those mentioned in [Mass] (pp. 194, 195). However, it involves better constants and is no longer restricted to positive characteristic. It is here, by the way, that the radical $\sqrt G$ makes its essential appearance in the whole story.

Lemma 4.5.

Suppose that $x,y$ are in $K^*$, with $x$ not in $\sqrt{G_k}$ and $y^qx$ in $G$ for some positive integer $q$. Then $q\le2h(x)R(G)$.

Proof. Let $G'$ be the group generated by $x$ and the elements of $G$, and let $G''$ be the group generated by $y$ and the elements of $G$, so that $G'$ lies in $G''$. Since $x$ is not in $\sqrt G$, it is easy to see that the index $[G'':G']$ equals $q$. Since $x$ is not even in $\sqrt{G_k}$, it is even easier to see that $G\cap k^*=G'\cap k^*=G''\cap k^*$. Thus by Lemma 4.1 we have $R(G')=qR(G'')\ge q$. On the other hand $R(G')\le2h(x)R(G)$ by Lemma 4.2, and the result follows.
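The regulator is the covolume of the lattice $\mathcal L(G)$. Given the valuation vectors of a basis of $G/(G\cap k^*)$, it can therefore be computed as the square root of a Gram determinant. The sketch below assumes these vectors are already known as integer tuples. It re-checks the example $R_B(G_l)=l\sqrt3$, the example $R(G)=\sqrt{2l}$ with $h(g)=l$ from the remark after Lemma 4.3, the index relation of Lemma 4.1, and the inequality $|\mathcal L(x)|^2\le4h(x)^2$ used for Lemma 4.2. The function names are illustrative. The code works with $R^2$ so that the arithmetic stays exact.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n, sign, result = len(a), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return sign * result


def regulator_squared(basis_vectors):
    """R(G)^2 = det of the Gram matrix of the vectors L(g_1), ..., L(g_r)."""
    gram = [[sum(x * y for x, y in zip(u, v)) for v in basis_vectors] for u in basis_vectors]
    return det(gram)


def height(vector):
    """h(x) = sum of the positive coordinates m_w (= half the l1-norm by the product formula)."""
    return sum(m for m in vector if m > 0)


if __name__ == "__main__":
    l = 5
    print(regulator_squared([(-l, 0, l), (0, -1, 1)]))  # 3 * l**2, i.e. R = l * sqrt(3)
```

Doubling both generators multiplies $R^2$ by 16, so $R$ is multiplied by $4=[G':G]$, as Lemma 4.1 predicts.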

5. Dependence with regulators.

Let $K$ be finitely generated and transcendental over $k$ as in the preceding sections, and let $B$ be a transcendence basis, now assumed separable, with elements $t_1,\dots,t_b$. We continue to abbreviate the height $h_B$ as $h$, and again we write $C$ for the field of differential constants of $K$.

The following result eliminates the height functions $h(x;m-1)$ from Lemma 3.1. This gives a more useful explicit version of Lemma 3 of [Mass].

Lemma 5.1.

Let $G$ in $K^*$ be finitely generated of rank $r\ge1$ modulo $k^*$. For $m\ge1$, suppose that $c_1,\dots,c_m$ are in $C$ and $g_1,\dots,g_m$ are in $G$ with $c_1g_1+\cdots+c_mg_m=1$. Then either
(a) h(c_1 g_1, . . . , c_m g_m) ≤ m dδ(r) R(G), or
(b) $g_1,\dots,g_m$ are linearly dependent over $C$.

Proof. Just use Lemma 3.1 together with the inequalities
$$h(g;m-1)\le4(m-1)^2d\,\delta(r)R(G)^2\qquad(5.1)$$
from Lemma 4.4, for $g=g_1,\dots,g_m$.

Similarly we deduce a more useful explicit version of Lemma 4 of [Mass].

Lemma 5.2.

Let $G$ in $K^*$ be finitely generated of rank $r\ge1$ modulo $k^*$. For $m\ge1$, suppose that $g_0,g_1,\dots,g_m$ are in $G$ and linearly dependent over $C$, but $g_1,\dots,g_m$ are linearly independent over $C$. Then there is a relation $c_1g_1+\cdots+c_mg_m=g_0$ with $c_1,\dots,c_m$ in $C$ and
h(c_1 g_1/g_0, . . . , c_m g_m/g_0) ≤ m dδ(r) R(G).

Proof. Just use Lemma 3.2 and (5.1), this time with $g=g_1/g_0,\dots,g_m/g_0$.

We have followed the proof in [Mass] quite closely. It would have been nice to see the well-known number $m(m-1)/2$ in place of the factor depending on $m$ in Lemmas 5.1 and 5.2. It would also have been nice to have some notion of genus and $S$-units, as in various formulations of abc matters over function fields. But despite the considerations of Chapter 14 of [BG] in zero characteristic, and those of Hsia and Wang [HW] for arbitrary characteristic, we have been unable to supply a satisfactory version. The results of [HW] are especially interesting in their use of divided derivatives, or hyperderivations. For example, in characteristic $p$ these lead to linear dependence over the field of $p^e$th powers, not just over $C$ with $e=1$. If this could be done in our situation, then it would probably lead to simplifications in the rest of our proof, and possibly to the elimination of the Proposition in section 8. But it seems that the results of [HW] cannot be directly applied to our Lemma 5.1, because of the presence of $c_1,\dots,c_m$, whose heights cannot be controlled.
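Alternative (b) in Lemmas 5.1 and 5.2 is linear dependence over the field $C$ of differential constants. We assume, without the text confirming it, that the classical Wronskian is the prototype of the operators $\mathcal D(i)$ behind Lemmas 3.1 and 3.2. In one variable and characteristic 0, dependence over constants is detected by the vanishing of that Wronskian. The sketch below is illustrative only. It works with polynomials over $\mathbb Q$ in a single variable $t$, not with the several-variable operators $\mathcal D(i)$ or characteristic $p$, and all names are hypothetical helpers.

```python
from fractions import Fraction

# Polynomials over Q as coefficient lists: p[k] is the coefficient of t**k.


def trim(p):
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[k] if k < len(p) else 0) + (q[k] if k < len(q) else 0) for k in range(n)])


def scale(c, p):
    return trim([c * x for x in p])


def mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def deriv(p):
    return trim([k * p[k] for k in range(1, len(p))]) if len(p) > 1 else [Fraction(0)]


def det(m):
    """Determinant of a square matrix of polynomials (Laplace expansion along row 0)."""
    if len(m) == 1:
        return m[0][0]
    total = [Fraction(0)]
    for c in range(len(m)):
        sub = [row[:c] + row[c + 1:] for row in m[1:]]
        term = mul(m[0][c], det(sub))
        total = add(total, term if c % 2 == 0 else scale(-1, term))
    return total


def wronskian(fs):
    """det(f_j^(k)) for k = 0..m-1, j = 1..m."""
    rows, cur = [], [trim([Fraction(x) for x in f]) for f in fs]
    for _ in range(len(fs)):
        rows.append(cur)
        cur = [deriv(f) for f in cur]
    return det(rows)


def dependent_over_constants(fs):
    return all(c == 0 for c in wronskian(fs))
```

For instance, $1,t,t^2$ give the Wronskian $2\ne0$, whereas $t$ and $2t$ give $0$.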

6. Isotriviality.

We take a well-earned break from estimating. From now on $K$ will have positive characteristic $p$ (actually this assumption is not really needed until section 8), and, as in section 1, we write $F_K$ for $\overline{\mathbb F}_p\cap K$. This field plays the role of $k$ in sections 2, 3, 4 and 5.

Suppose $n\ge m\ge1$. For $a(i,j)$ in $K$ the normalized equations
$$X_i=a(i,0)X_0+\cdots+a(i,m-1)X_{m-1}=\sum_{j=0}^{m-1}a(i,j)X_j\qquad(i=m,m+1,\dots,n)\qquad(6.1)$$
define in $\mathbb P^n$ a linear variety $V$ of dimension $m-1$. When $G$ is a subgroup of $K^*$, we need some conditions which ensure that $V$ is $G$-isotrivial.

Now any $G$-automorphism taking $(X_0,\dots,X_n)$ to $(g_0X_0,\dots,g_nX_n)$ leads, after renormalization, to new coefficients $\frac{g_i}{g_j}a(i,j)$. If the new forms are defined over $F_K$, then every non-zero $a(i,j)$ has the shape $\frac{g_j}{g_i}\alpha(i,j)$ for non-zero $\alpha(i,j)$ in $F_K$. In particular each equation in (6.1) defines a $G$-isotrivial variety. But also each quotient
$$\frac{a(i_1,j_1)\,a(i_2,j_2)\,a(i_3,j_3)\cdots a(i_{k-1},j_{k-1})\,a(i_k,j_k)}{a(i_1,j_2)\,a(i_2,j_3)\,a(i_3,j_4)\cdots a(i_{k-1},j_k)\,a(i_k,j_1)}\qquad(k=2,\dots,n+1)\qquad(6.2)$$
lies in $F_K$ whenever all its factors are non-zero. The following result gives a converse statement. It guarantees that the equations (6.1) become defined over $F_K$ after applying a suitable $G$-automorphism and renormalizing. In particular it guarantees that $V$ is $G$-isotrivial, without any need to recombine the equations.

Lemma 6.1.

Suppose that each equation in (6.1) defines a $G$-isotrivial variety, and that each quotient (6.2) lies in $F_K$ provided everything in the numerator and denominator is non-zero. Then $V$ is $G$-isotrivial.

Proof. We argue by induction on the number $n-m+1\ge1$ of equations. If $n-m+1=1$ then the result is obvious without using (6.2). Suppose we have done it for $n-m\ge1$ equations, namely the first $n-m$ in (6.1), and let us add another equation, namely the last one in (6.1).

Restricting to $i<n$ and the appropriate $j$ in (6.2), we see from the induction hypothesis that a suitable $G$-automorphism trivializes the first $n-m$ equations, without bothering about $X_n$. This means that we can assume that all $a(i,j)\ne0$ ($i<n$) are in $F_K$. The isotriviality of the last equation means that all $a(n,j)\ne0$ are in $G$. We now want to trivialize the last equation.

How can we trivialize a given coefficient $a(n,j)\ne0$ in the last equation? Suppose first that all $a(i,j)=0$ ($i<n$), so that the first $n-m$ equations do not involve $X_j$. Then we can simply replace $X_j$ by $a(n,j)X_j$, and this will not change the first $n-m$ equations. We do this for all such $j$.

If there is only a single $j$ with some $a(i,j)\ne0$ ($i<n$), then we can still replace $X_j$ by $a(n,j)X_j$. But then we have to correct the new coefficients $\frac{a(i,j)}{a(n,j)}\ne0$ of $X_j$ in the $i$th equation, by replacing $X_i$ by $a(n,j)X_i$ for these $i$ in $\{m,\dots,n-1\}$. In general there may be several such $j$; call these "bad".

Now, for different $j,j'$ in the set $\{0,\dots,m-1\}$, we say that $j\sim j'$ if there is $i<n$ with
$$a(i,j)\,a(i,j')\ne0\qquad(6.3)$$
(so that $j,j'$ are both bad). This relation is symmetric but probably not transitive. We can extend it via reflexivity and transitivity to a genuine equivalence relation on the bad elements of $\{0,\dots,m-1\}$, which we then denote by $\simeq$.

We assume for the moment that there is a single equivalence class, that is, any two $j,j'$ are related.
Let $j,j'$ be different bad elements, so that $a(i,j)\ne0$ and $a(i',j')\ne0$ for some $i,i'<n$. By our equivalence class assumption, $j\simeq j'$. Suppose that
$$j=j_1\sim j_2\sim\cdots\sim j_{k-1}\sim j_k=j',$$
where of course we can take $2\le k\le n+1$. Then we get from (6.3)
$$a(i_1,j_1)a(i_1,j_2)\ne0,\quad a(i_2,j_2)a(i_2,j_3)\ne0,\quad\dots,\quad a(i_{k-1},j_{k-1})a(i_{k-1},j_k)\ne0$$
for some $i_1,i_2,\dots,i_{k-1}<n$. We use (6.2) with $i_k=n$ to see that
$$\frac{a(i_1,j_1)\,a(i_2,j_2)\cdots a(i_{k-1},j_{k-1})\,a(n,j')}{a(i_1,j_2)\,a(i_2,j_3)\cdots a(i_{k-1},j_k)\,a(n,j)}$$
lies in $F_K$. However, the first $k-1$ quotients $a(i_s,j_s)/a(i_s,j_{s+1})$ lie in $F_K$, because we trivialized the first $n-m$ equations. Consequently $a(n,j')/a(n,j)$ lies in $F_K$.

Thus we have shown that all $a(n,j)$ for bad $j$ are multiples of a single one, call it $g$, by elements of $F_K$. Now they can be simultaneously trivialized on replacing $X_j$ by $gX_j$. Again we must correct the new coefficients $\frac{a(i,j)}{g}\ne0$ of $X_j$ in the $i$th equation, by replacing $X_i$ by $gX_i$ ($i=m,\dots,n-1$).

What if there are several equivalence classes of bad elements in $\{0,\dots,m-1\}$? Say there are $h\ge2$ of them, $J_1,\dots,J_h$. Let $I_1$ be the set of $i$ in $\{m,\dots,n-1\}$ for which there is $j$ in $J_1$ with $a(i,j)\ne0$, and similarly for $I_2,\dots,I_h$. Then $I_1,I_2,\dots,I_h$ are disjoint. For example, with any $j_1$ in $J_1$ and any $j_2$ in $J_2$ there can be no $i$ with $a(i,j_1)a(i,j_2)\ne0$, else by (6.3) we would have $j_1\sim j_2$. (If one wishes, one can convert the matrix of the first $n-m$ equations into a block matrix using row and column permutations.) The argument above, using $i_1,\dots,i_{k-1}$ in $I_1$, shows that all non-zero $a(n,j)$ ($j\in J_1$) are multiples of a single one, call it $g_1$, by elements of $F_K$. Similarly we get $g_2,\dots,g_h$. Now we can trivialize the last row as follows. We replace the $X_j$ ($j\in J_1$) by $g_1X_j$, and we correct the effect by replacing $X_i$ by $g_1X_i$ ($i\in I_1$). Similarly, using $g_2,\dots,g_h$, we trivialize the remaining coefficients. This completes the proof.
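The combinatorial heart of the proof of Lemma 6.1 has three parts. Bad columns are those met by some earlier row. They are grouped into the classes $J_1,\dots,J_h$ of $\simeq$, and the classes determine disjoint row sets $I_1,\dots,I_h$. This can be sketched with a union-find structure. The sketch assumes the first $n-m$ equations are given as a list of rows, indexed from 0 (standing for $i=m,\dots,n-1$), over columns $0,\dots,m-1$. Only whether an entry is zero matters. `bad_column_blocks` is a hypothetical helper, not something from the text.

```python
def bad_column_blocks(rows):
    """Return [(J, I), ...]: the classes J of bad columns under ~ (extended to an
    equivalence) and, for each, the rows I having a non-zero entry in some j in J."""
    m = len(rows[0])
    parent = list(range(m))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    # j ~ j' whenever some row has a(i, j) * a(i, j') != 0, as in (6.3)
    for row in rows:
        nonzero = [j for j in range(m) if row[j] != 0]
        for j in nonzero[1:]:
            union(nonzero[0], j)

    bad = [j for j in range(m) if any(row[j] != 0 for row in rows)]
    classes = {}
    for j in bad:
        classes.setdefault(find(j), []).append(j)

    return [(J, [i for i, row in enumerate(rows) if any(row[j] != 0 for j in J)])
            for J in classes.values()]
```

The row sets produced in this way are automatically disjoint. A row meeting two classes would have merged them, which is exactly the disjointness argument in the proof.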

7. Automorphisms.

As above, let $K$ be a field, finitely generated and transcendental over $\mathbb F_p$, with $G$ a subgroup of $K^*$. Suppose a linear variety in $\mathbb P^n$ is defined over $K$ and is $G$-isotrivial. Then by definition there is a $G$-automorphism $\psi$ taking it to something defined over $F_K=\overline{\mathbb F}_p\cap K$. To make our Theorems 1, 2 and 3 fully effective we have to estimate this $\psi$. Indeed, after doing the whole descent to single points using Theorem 1, for example, it is mainly $G$-automorphisms that are left.

Now it is convenient to use the projective height $h_P=h_P^B$ defined on $\mathbb P^{l-1}(K)$ by
$$h_P(x_1,\dots,x_l)=\log\prod_w\max\{|x_1|_w^{d_w},\dots,|x_l|_w^{d_w}\}.$$
This yields at once a height $h(\psi)$ of a $G$-automorphism $\psi$, defined by (1.7), as $h(\psi)=h_P(g_0,\dots,g_n)$. Also, if $V$ is linear in $\mathbb P^n$ and defined over $K$, it yields a height $h(V)$ in the standard way via the Grassmannian coordinates of $V$. See for example [S] (p.28), which however is in the context of number fields with euclidean norms at the archimedean valuations. Here we have no archimedean valuations, so the norm problem is irrelevant. If $m-1\ge0$ is the dimension of $V$, then its Grassmannians $A(I)$ correspond to subsets $I$ of $\{0,\dots,n\}$ with cardinality $n-m+1\le n$. The Northcott Property extends at once to this height. Also, for $\psi$ in (1.7), the Grassmannians of $\psi(V)$ are the $A(I)/g(I)$, where $g(I)=\prod_{i\in I}g_i$. It follows easily that
$$h(\psi(V))\le h(V)+nh(\psi),\qquad h(\psi^{-1})\le nh(\psi).\qquad(7.1)$$
We shall also need another linear variety $W$, also defined over $K$.

Lemma 7.1. If $V\cap W$ is non-empty then $h(V\cap W)\le h(V)+h(W)$. If further $X_{n-1}\ne0$ on $V$, the equations of $V$ do not involve $X_n$, and $W$ is defined by $X_n=aX_{n-1}$, then $h(V\cap W)\ge\max\{h(V),h(W)\}$.

Proof. The upper bound may be compared with the inequality $h(V\cap W)+h(V+W)\le h(V)+h(W)$, due independently to Struppeck-Vaaler [SV] (Theorem 1 p.493) and Schmidt [S] (Lemma 8A p.28).
These are proved over number fields; however, it is easily checked that the proof in [S] remains valid with trivial modifications. A special case had already been noted by Thunder [T], whose Lemma 5 (p.157) implies $h(V+W)\le h(V)+h(W)$ over function fields of a single variable, provided $V\cap W$ is empty.

Regarding the lower bound, let $A(I)$ be the Grassmannians of $V$. Then it is easy to verify that the Grassmannians of $V\cap W$ consist of the $A(I)$ together with the $aA(J)$ for $J$ not containing $n-1$. There follows $h(V\cap W)\ge h(V)$ at once. Also, $X_{n-1}\ne0$ on $V$ means that at least one $A=A(J)$ is non-zero (see for example Theorem 1 of [HP] p.298), so we also get $h(V\cap W)\ge h_P(A,aA)=h(a)=h(W)$. This completes the proof.

It is the following result that enables $\psi$ to be estimated in the Descent Steps.

Lemma 7.2.

Suppose that $V$ is defined over $K$ and is $G$-isotrivial. Then there is a $G$-automorphism $\psi$ with $\psi(V)$ defined over $F_K$ and $h(\psi)\le n!\,h(V)$.

Proof. Suppose $\dim V=m-1$, and let $V$ have Grassmannians $A(I)$. Then, as noted above, the Grassmannians of $\psi(V)$ are the $A(I)/g(I)$, where $g(I)=\prod_{i\in I}g_i$. If $\psi(V)$ is defined over $F_K$ then these have the shape $\lambda\alpha(I)$ for $\lambda$ in $K^*$ and $\alpha(I)$ in $F_K$. Thus we have $A(I)=\lambda\alpha(I)g(I)$ for all $I$. We can restrict to the set $\mathcal I$ of all $I$ with $A(I)\ne0$ (and so $\alpha(I)\ne0$). We can eliminate $\lambda$ by fixing $I_0$ in $\mathcal I$; this gives
$$\frac{g(I)}{g(I_0)}=\frac{A(I)}{A(I_0)}\,\frac{\alpha(I_0)}{\alpha(I)}\qquad(I\in\mathcal I).\qquad(7.2)$$
Conversely, any solution of (7.2) with all $\alpha(I)$ in $F_K$ gives a $\psi$ such that $\psi(V)$ is defined over $F_K$.

To solve (7.2) for $g_0,\dots,g_n$ we divide the numerator and denominator of the left-hand side by $g_0^{n-m+1}$. This lets us write it as $(g_1/g_0)^{a(I,1)}\cdots(g_n/g_0)^{a(I,n)}$ for integers $a(I,i)$ which are $0$, $1$ or $-1$. Suppose the vectors $a(I)$ ($I\in\mathcal I$) with coordinates $a(I,i)$ ($i=1,\dots,n$) have full rank $n$. Then we can solve (7.2) by choosing $a(I_1),\dots,a(I_n)$ linearly independent, and then solving the subset of (7.2) with $I=I_1,\dots,I_n$. A multiplicative form of Cramer's Rule gives
$$\Big(\frac{g_i}{g_0}\Big)^b=Q_1^{b_{i1}}\cdots Q_n^{b_{in}},\qquad Q_j=\frac{A(I_j)}{A(I_0)}\,\frac{\alpha(I_0)}{\alpha(I_j)}\quad(j=1,\dots,n),$$
with integers $b\ne0$ and $b_{ij}$. These $b_{ij}$ are minors of a matrix with entries $0,1,-1$, so that $|b_{ij}|\le(n-1)!$, and
$$|b|\,h\Big(\frac{g_1}{g_0},\dots,\frac{g_n}{g_0}\Big)\le\max_{i=1,\dots,n}\{|b_{i1}|+\cdots+|b_{in}|\}\,h(Q_1,\dots,Q_n).$$
The height on the left is $h(\psi)$, and the height on the right is at most $h(V)$. Since $|b|\ge1$ and each row sum is at most $n(n-1)!=n!$, the result follows at once, at least under our assumption that the $a(I)$ ($I\in\mathcal I$) have full rank $n$.

If this assumption does not hold, then we simply increase the rank by successively adjoining unit vectors $e_k$ until the rank becomes $n$. This amounts to adding the equations $g_k/g_0=1$. Now we take a subset of $n$ independent equations and solve again with Cramer's Rule. The resulting estimates are certainly no larger than before, and this completes the proof.
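Taking exponents (or the valuation vectors $\mathcal L(\cdot)$) turns the multiplicative system (7.2) into an integer linear system. The "multiplicative form of Cramer's Rule" is then ordinary Cramer's Rule: $b=\det$ and $(b_{ij})$ is the adjugate. The sketch below checks this identity. It also checks, over all $3\times3$ matrices with entries in $\{0,1,-1\}$, the two bounds used in the proof: every entry of the adjugate is at most $(n-1)!$, and every row sum of absolute values is at most $n!$. The helper names are illustrative.

```python
from itertools import permutations, product
from math import factorial


def det(m):
    """Leibniz formula; adequate for the small sizes used here."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for x in range(n) for y in range(x + 1, n) if perm[x] > perm[y])
        term = -1 if inversions % 2 else 1
        for r in range(n):
            term *= m[r][perm[r]]
        total += term
    return total


def minor(m, r, c):
    return [row[:c] + row[c + 1:] for k, row in enumerate(m) if k != r]


def adjugate(m):
    """adj[i][j] = (-1)^(i+j) * det(m without row j and column i)."""
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, j, i)) for j in range(n)] for i in range(n)]


def cramer_exponents(a, q):
    """For an integer system a @ u = q, return b = det(a) and the vector b * u = adj(a) @ q."""
    adj = adjugate(a)
    return det(a), [sum(adj[i][j] * q[j] for j in range(len(q))) for i in range(len(q))]


def bounds_hold(n):
    """Check |b_ij| <= (n-1)! and max_i sum_j |b_ij| <= n! for all {0, 1, -1} matrices."""
    for entries in product((-1, 0, 1), repeat=n * n):
        a = [list(entries[r * n:(r + 1) * n]) for r in range(n)]
        adj = adjugate(a)
        if any(abs(x) > factorial(n - 1) for row in adj for x in row):
            return False
        if max(sum(abs(x) for x in row) for row in adj) > factorial(n):
            return False
    return True
```

For example, with rows $(1,1,0),(0,1,1),(1,0,1)$ one gets $b=2$, and $b$ times the exponent vector is recovered exactly.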

8. A proposition.

This, the main result of the present section, is a first step in the proof of the Descent Step over $\sqrt G$, with $V$ in $\mathbb P^n$ ($n\ge2$) either a hyperplane or defined over a finite field. We continue with our assumption that $K$ is finitely generated over $\mathbb F_p$, so that $F_K=\overline{\mathbb F}_p\cap K$ is a finite field. Let $G$ in $K^*$ be finitely generated of rank $r\ge1$ over $F_K^*$; now we may write without confusion simply that $G$ is finitely generated. It is known that the radical $\sqrt G$, which by definition still lies in $K$, is also finitely generated (see for example [Mass] p.195), and it clearly also has rank $r$ over $F_K^*$. For the moment we work exclusively with this radical. We further assume that $K$ is transcendental over $\mathbb F_p$, and we choose any separable transcendence basis $B$. Then we are free to apply the results of sections 3, 4 and 5 about heights $h=h_B$ and regulators $R=R_B$.

We say that $V$ is transversal if every coordinate $X_i$ ($i=0,\dots,n$) actually occurs in the defining equations. This property is independent of the choice of equations. Its purpose is to prevent "free variables" as in (1.1) with $a_i=0$.

Transversality is a harmless restriction, because we could overcome it simply by working in lower dimensions. Clearly every linear subvariety of a transversal variety is also transversal. Also, a transversal variety must be proper (i.e. not the full $\mathbb P^n$).

We recall the function $\delta$ from Lemma 4.3.

Proposition. Let $V$ be a transversal linear subvariety of $\mathbb P^n$ defined over $K$, and suppose either that $V$ has dimension $n-1$ or that $V$ is defined over some $\mathbb F_q$. Suppose also that $V$ is not contained in any coset $T\ne\mathbb P^n$.
Let $\pi$ be any point of $V(\sqrt G)$.

If $V$ has dimension $n-1$, then either
(i) there is a proper linear subvariety $W$ of $V$, also defined over $K$, with h(W) ≤ n n dδ(n+r) h(V) n R(√G), such that $\pi$ lies in $W(\sqrt G)$,
or
(ii) there is a $\sqrt G$-automorphism $\psi$ with $h(\psi)\le np\,\delta(n+r)R(\sqrt G)^2$, a point $\pi'$ and a linear subvariety $V'$ of $\mathbb P^n$ such that $\pi=\psi(\pi'^{\,p})$ and $V=\psi(V'^{\,p})$.

If $V$ is defined over $\mathbb F_q$, then either
(i) there is a proper linear subvariety $W$ of $V$, also defined over $K$, with h(W) ≤ n n dδ(n+r) R(√G), such that $\pi$ lies in $W(\sqrt G)$,
or
(iii) there is a point $\pi'$ in $\mathbb P^n(\sqrt G)$ with $\pi=\pi'^{\,p}$.

Proof. Suppose first that $V$ has dimension $n-1$. Then we just have to follow the arguments of the proof of Lemma 5 of [Mass] (p.197). These arguments are expressed in terms of "broad sets", and this notion is no longer appropriate, so we write out all the details.

Because $V$ is transversal, we may work affinely with a point $\pi=(x_1,\dots,x_n)$ satisfying a single equation
$$a_1x_1+\cdots+a_nx_n=1\qquad(8.1)$$
with $a_1,\dots,a_n$ in $K^*$. We write $C$ for the field of $p$th powers in $K$, and consider
$$s=\dim_C(Ca_1x_1+\cdots+Ca_nx_n),$$
so that $1\le s\le n$.

First suppose that $s=n$. Then we apply Lemma 5.1 with $k=F_K$, $m=n$, $c_1=\cdots=c_m=1$ and $g_1=a_1x_1,\dots,g_m=a_mx_m$. So the group must be enlarged by adjoining $a_1,\dots,a_n$ to $\sqrt G$, and it then has rank at most $n+r$. The enlarged regulator $R$ can be estimated by Lemma 4.2, and we find
$$R\le2^nh(a_1)\cdots h(a_n)R(\sqrt G)\le2^nh(V)^nR(\sqrt G).\qquad(8.2)$$
Conclusion (b) of Lemma 5.1 is ruled out by $s=n$, and conclusion (a) shows that h(a_1x_1, . . . , a_nx_n) ≤ n dδ(n+r) R. It follows that $h(\pi)=h(x_1,\dots,x_n)$ is at most
4 n dδ(n+r) R + h(a_1^{-1}, . . . , a_n^{-1}) ≤ 4 n dδ(n+r) R + nh(V),
and so from (8.2) we deduce
h(π) ≤ n n dδ(n+r) h(V) n R(√G) + nh(V) ≤ n n dδ(n+r) h(V) n R(√G).  (8.3)
We take $W=\{\pi\}$ for (i) of the Proposition, and for these $h(W)=h(\pi)$ is bounded as in (8.3).

Next suppose that $1<s<n$. By means of a permutation we can assume that $g_1=a_1x_1,\dots,g_s=a_sx_s$ are linearly independent over $C$. Take any $k$ with $s+1\le k\le n$. Then we can apply Lemma 5.2 with $m=s$ and $g_0=a_kx_k$, with $\sqrt G$ enlarged as above. We find relations
$$\sum_{j=1}^sc_{kj}a_jx_j=a_kx_k\qquad(k=s+1,\dots,n)\qquad(8.4)$$
with $c_{kj}$ in $C$. The quotients
$$f_{kj}=\frac{c_{kj}a_jx_j}{a_kx_k}\qquad(j=1,\dots,s;\ k=s+1,\dots,n)\qquad(8.5)$$
satisfy
h(f_{k1}, . . . , f_{ks}) ≤ s dδ(n+r) R  (k = s+1, . . . , n).  (8.6)
Now substitute for the $a_kx_k$ ($k=s+1,\dots,n$) in (8.1). We find
$$c_1a_1x_1+\cdots+c_sa_sx_s=1,\qquad(8.7)$$
where the
$$c_j=1+\sum_{k=s+1}^nc_{kj}\qquad(j=1,\dots,s)\qquad(8.8)$$
lie in $C$.

Next apply Lemma 5.1 with $m=s$ to (8.7) and $g_j=a_jx_j$ ($j=1,\dots,s$), also in the enlarged $\sqrt G$. Again conclusion (b) is impossible. It follows that the
$$f_j=c_ja_jx_j\qquad(j=1,\dots,s)\qquad(8.9)$$
satisfy
h(f_1, . . . , f_s) ≤ s dδ(n+r) R.  (8.10)
In (8.6) certain quotients $x_j/x_k$ are bounded modulo $C$, whereas in (8.10) certain $x_j$ themselves are bounded modulo $C$. We can eliminate $C$ by substituting (8.8) into (8.9) and using (8.5). This gives
$$f_j=a_jx_j+\sum_{k=s+1}^nf_{kj}a_kx_k\qquad(j=1,\dots,s).\qquad(8.11)$$
Because $a_j\ne0$ ($j=1,\dots,s$), these equations express the fact that $\pi=(x_1,\dots,x_n)$ lies on a linear variety $V'$ of dimension $n-s$. Because $s\ne1$, this dimension is strictly less than the dimension $n-1$ of $V$. So we can take $W$ as the intersection of $V'$ with $V$. This is in fact $V'$ itself: if we add up all the equations (8.11) and use (8.4), (8.5), (8.7) and (8.9), we end up with (8.1).

Now we have to estimate the height of (8.11). In the corresponding matrix, every column has, by (8.6) and (8.10), height at most 4 s dδ(n+r) R + h(V). As above in (8.3), we can estimate this by B = 8 n n dδ(n+r) h(V) n R(√G). It follows that h(W) ≤ sB ≤ n n dδ(n+r) h(V) n R(√G). This too settles (i) of the Proposition.

Finally suppose $s=1$. Since $a_1x_1+\cdots+a_nx_n=1$, this means that $a_1x_1,\dots,a_nx_n$ are all in $C$. By Lemma 4.4 with $l=p$ we can write $x_j=g_jx_j'^{\,p}$ with $g_j,x_j'$ in $\sqrt G$ ($j=1,\dots,n$) and
$$h(g_j)\le p\,\delta(r)R(\sqrt G)^2\le p\,\delta(n+r)R(\sqrt G)^2\qquad(j=1,\dots,n).$$
Then $a_jg_j$ is in $C$, so it has the form $a_j'^{\,p}$ ($j=1,\dots,n$). Finally
$$1=a_1x_1+\cdots+a_nx_n=a_1'^{\,p}x_1'^{\,p}+\cdots+a_n'^{\,p}x_n'^{\,p}=(a_1'x_1'+\cdots+a_n'x_n')^p.$$
This gives part (ii) of the Proposition, with $\psi$ as in (1.7) above for $g_0=1$, with $\pi'=(x_1',\dots,x_n')$, and with $V'$ defined by (8.1) above using the new coefficients $a_1',\dots,a_n'$.

This proves the Proposition when $V$ has dimension $n-1$. Incidentally, when the coefficients in (8.1) are in some $\mathbb F_q$, the argument for $s=1$ shows that $x_1,\dots,x_n$ are in $C$. So they are $p$th powers $x_1'^{\,p},\dots,x_n'^{\,p}$, and clearly $x_1',\dots,x_n'$ are in $\sqrt G$. Thus we get conclusion (iii) of the Proposition when $V$ has dimension $n-1$. The case $s\ne1$ leads, of course, to (i). So it remains only to treat $V$ of dimension $m-1<n-1$ defined over $\mathbb F_q$.

We do this by expressing the affine equations of $V$ in triangular form. After a permutation we can suppose these are
$$x_i=a_{i0}+a_{i1}x_1+\cdots+a_{i,m-1}x_{m-1}\qquad(i=m,m+1,\dots,n)\qquad(8.12)$$
with $a_{ij}$ in $\mathbb F_q$. This gives $V=V_m\cap\cdots\cap V_n$ for the varieties defined individually by each equation.

Consider the first equation. Some coefficients $a_{mj}$ may be zero, but not all of them are zero, because $V(\sqrt G)$ is non-empty. In fact at least two are non-zero, otherwise $V$ would be contained in a coset $T\ne\mathbb P^n$, contrary to our assumption. We can thus regard $V_m$ as a transversal variety of codimension 1 in some projective space of dimension at least 2 and at most $m<n$. Applying the Proposition in the cases already proved, we get two possibilities, (i) and (iii). If (i) holds for $V_m$, then we get a proper subvariety $W_m$ of $V_m$ with
h(W_m) ≤ n n dδ(n+r) R(√G).  (8.13)
Now $W_m$ intersects the remaining intersection $U_m=\bigcap_{i\ne m}V_i$ in a proper subspace of $V=V_m\cap U_m$. For example, the triangular nature of (8.12) makes it clear that $x_{m+1},\dots,x_n$ are determined by $x_1,\dots,x_{m-1}$ on $U_m$. It also makes clear that $x_m$ is determined by $x_1,\dots,x_{m-1}$ on $W_m$ in $V_m$. But in addition some non-zero polynomial of degree at most 1 in $x_1,\dots,x_{m-1}$ must vanish on $W_m$. So $W=W_m\cap U_m$ has dimension strictly less than $m-1$. By Lemma 7.1 we have $h(W)\le h(W_m)$, since $h(U_m)=0$ ($U_m$ being defined over $\mathbb F_q$). So by (8.13) we get (i) of the Proposition for the original $V$. But what happens if (iii) holds for $V_m$?

Then all the $x_j$ actually occurring in the first equation of (8.12) are $p$th powers, which certainly goes some way towards (iii) for $V$. But then we can try the second equation instead. Either we get a $W$ as above, or all the $x_j$ actually occurring in the second equation of (8.12) are $p$th powers. And so on. In the end, either we get $W$, or all the $x_j$ actually occurring in all the equations (8.12) are $p$th powers. Because $V$ is transversal, this does give the full (iii) for $V$, and so completes the proof of the Proposition.
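The case $s=1$ uses two facts. The first is that Frobenius is additive in characteristic $p$, so $\sum a_j'^{\,p}x_j'^{\,p}=(\sum a_j'x_j')^p$. The second is splitting exponents modulo $p$, as in Lemma 4.4 with $l=p$: writing $e_j=f_j+pe_j'$ with $0\le f_j<p$. The sketch below illustrates both in the simplest model, polynomials over $\mathbb F_p$ stored as coefficient lists. This model is an assumption made for illustration, not the general field $K$ of the text. Python's `divmod` already returns a remainder in $[0,p)$ for negative exponents, so no generators need to be inverted.

```python
# Polynomials over F_p as coefficient lists: p[k] is the coefficient of t**k.

def trim(poly):
    poly = list(poly)
    while len(poly) > 1 and poly[-1] == 0:
        poly.pop()
    return poly


def poly_add(f, g, mod):
    n = max(len(f), len(g))
    return trim([((f[k] if k < len(f) else 0) + (g[k] if k < len(g) else 0)) % mod
                 for k in range(n)])


def poly_mul(f, g, mod):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % mod
    return trim(out)


def poly_pow(f, e, mod):
    result = [1]
    for _ in range(e):
        result = poly_mul(result, f, mod)
    return result


def split_exponents(exponents, l):
    """e_j = f_j + l * e'_j with 0 <= f_j < l, as in the second part of Lemma 4.4."""
    pairs = [divmod(e, l) for e in exponents]
    return [r for _, r in pairs], [q for q, _ in pairs]
```

For example, over $\mathbb F_3$ one finds $(1+2t+t^3)^3=1+2t^3+t^9$. In general $f(t)^p=f(t^p)$ when the coefficients lie in $\mathbb F_p$.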

9. The main estimate.

This is a quantitative version of our Descent Step over $\sqrt G$, without the requirement that the subvarieties $W$ are isotrivial. It leads to a relatively small exponent attached to the height $h(V)$. As before $n\ge2$, and we continue with our assumption that $K$ is finitely generated and transcendental over $\mathbb F_p$, with separable transcendence basis $B$ and $F_K=\overline{\mathbb F}_p\cap K$. Further, $G$ is finitely generated of rank $r\ge1$ over $F_K^*$.

Main Estimate. Let $V$ be a positive-dimensional linear subvariety of $\mathbb P^n$ defined over $K$ but not a coset.
(a) If $V$ is not $\sqrt G$-isotrivial, then
$$V(\sqrt G)=\bigcup_{W\in\mathcal W}W(\sqrt G)$$
for a finite set $\mathcal W$ of proper linear subvarieties $W$ of $V$, also defined over $K$, with h(W) ≤ n d (10 n δ(n+r)) n+1 h(V) n R(√G) n+2.
(b) If $V$ is $\sqrt G$-isotrivial and $\psi(V)$ is defined over $\mathbb F_q$, then
$$V(\sqrt G)=\psi^{-1}\Big(\bigcup_{W\in\mathcal W}\bigcup_{e=0}^\infty\big(\psi(W)(\sqrt G)\big)^{q^e}\Big)$$
for a finite set $\mathcal W$ of proper linear subvarieties $W$ of $V$, also defined over $K$, with h(ψ(W)) ≤ n n (q/p) dδ(n+r) R(√G).

Proof.

We prove this first when $V$ is transversal and not contained in any coset $T\ne\mathbb P^n$. We start with $\sqrt G$-isotrivial $V$. Because we estimate $h(\psi(W))$ and not $h(W)$, it clearly suffices to assume that $\psi$ is the identity, so that $V$ is defined over $\mathbb F_q$. Take an arbitrary $\pi$ in $V(\sqrt G)$ not in $V(F_K)$. Then either (i) or (iii) of the Proposition holds.

If (i) holds, then (b) looks good with $e=0$ (and $\psi$ the identity): at least $\pi$ lies in some $W(\sqrt G)$ for a proper subvariety $W$ of $V$, defined over $K$, with
h(W) ≤ n n dδ(n+r) R(√G).  (9.1)

But what if (iii) holds? Now any $a$ in $\mathbb F_q$ has a unique $p$th root $a^{1/p}$ in $\mathbb F_q$, which is also a conjugate of $a$ over $\mathbb F_p$. We get a new point $\pi'$ in $V'(\sqrt G)$, also not in $V'(F_K)$, for a new variety $V'$ in $\mathbb P^n$ which is a conjugate of $V$. The new variety has the same dimension as $V$, and is also defined over $\mathbb F_q$. So we can repeat the process, and again we get either (i) or (iii) of the Proposition.

If (i) holds, then $\pi'$ lies in some $W'(\sqrt G)$, again with $W'$ over $K$ and $h(W')$ bounded as in (9.1). So $\pi$ lies in $(W'(\sqrt G))^p$, as in (b) with $e=1$.

Or, if (iii) holds, then we get a new point $\pi''$ in $V''(\sqrt G)$ for a new conjugate $V''$ of $V$ in $\mathbb P^n$. And so on, in a manner similar to the looping in the $p$-automata of [D] section 4. Because $\pi$ was not in $V(F_K)$, this procedure must eventually stop at some proper subvariety $W^{(L)}$ over $K$ of $V^{(L)}$ (here the number $L$ of repetitions might depend on $\pi$). Now the original point $\pi$ lies in $(W^{(L)}(\sqrt G))^{p^L}$, with $h(W^{(L)})$ bounded as in (9.1).

Because $\pi$ was arbitrary in $V(\sqrt G)$ outside the finite set $V(F_K)$, the conclusion so far is
$$V(\sqrt G)\subseteq\bigcup_{W\in\mathcal W}\bigcup_{L=0}^\infty\big(W(\sqrt G)\big)^{p^L}$$
for a collection $\mathcal W$ of proper subvarieties $W$ of conjugates of $V$, defined over $K$ and satisfying (9.1). Here we may have to include single points $W$ with $h(W)=0$. To get equality we write $q=p^f$ and $L=fe+l$ with $e\ge0$ and $0\le l\le f-1$. This gives
$$V(\sqrt G)\subseteq\bigcup_{\tilde W\in\tilde{\mathcal W}}\bigcup_{e=0}^\infty\big(\tilde W(\sqrt G)\big)^{q^e}$$
with a new collection $\tilde{\mathcal W}$ of proper subvarieties $\tilde W=W^{p^l}$ of conjugates of $V$, with (since $p^l\le q/p$)
h(W̃) = p^l h(W) ≤ n n (q/p) dδ(n+r) R(√G).
Finally, by intersecting each $\tilde W$ with $V=V^q$, we can assume that each $\tilde W$ above is a proper subvariety of $V$ itself, without increasing the height further. Because $V$ is defined over $\mathbb F_q$, the $(\tilde W(\sqrt G))^{q^e}$ now lie in $(V(\sqrt G))^{q^e}\subseteq V(\sqrt G)$, and so at last the two sides are equal. Now we have the desired (b). The finiteness of the collection of $\tilde W$ follows, of course, from the Northcott Property already noted in section 7. This settles the case of transversal $\sqrt G$-isotrivial $V$ not contained in a proper coset.

Henceforth (until further notice) we assume that $V$ is not $\sqrt G$-isotrivial (and is still transversal and not contained in a proper coset).

Suppose first that $V$ is a hyperplane. Take an arbitrary $\pi$ in $V(\sqrt G)$. Then either (i) or (ii) of the Proposition holds. We regard this dichotomy as the starting stage $l=1$.

If (i) holds, then, as before, (a) of the Main Estimate looks good: at least $\pi$ lies in some $W(\sqrt G)$ for a proper subvariety $W$ of $V$, defined over $K$, with
h(W) ≤ Ch(V) n  (9.2)
for
C = 8 n n dδ(n+r) R(√G).  (9.3)

But what if (ii) holds? We get a new point $\pi'$ in $V'(\sqrt G)$ for a new variety $V'$ in $\mathbb P^n$ with
$$\pi=\psi(\pi'^{\,p}),\qquad V=\psi(V'^{\,p}),\qquad(9.4)$$
where $\psi$ is a $\sqrt G$-automorphism with
$$h(\psi)\le pB\qquad(9.5)$$
for
$$B=n\,\delta(n+r)R(\sqrt G)^2.\qquad(9.6)$$
Now $V'$ is also a hyperplane, and is also not $\sqrt G$-isotrivial. So we can repeat the process, and again we get either (i) or (ii) of the Proposition. This dichotomy is the next stage $l=2$.

If (i) holds, then $\pi'$ lies in some $W'(\sqrt G)$. So $\pi$ lies in $W(\sqrt G)$ for $W=\psi(W'^{\,p})$. This is almost as good as above, except that $h(W)$ could be larger than before. We take care of this later.

Or, if (ii) holds, then we get a new point $\pi''$ in $V''(\sqrt G)$ for a new variety $V''$ in $\mathbb P^n$. And so on.
At stage l we get either π^{(l−1)} in a proper subvariety W^{(l−1)} of V^{(l−1)} with

h(W^{(l−1)}) ≤ C h(V^{(l−1)})^n (9.7)

or π^{(l)} in V^{(l)}(√G) for a new variety V^{(l)} with

π^{(l−1)} = ψ^{(l−1)}((π^{(l)})^p), V^{(l−1)} = ψ^{(l−1)}((V^{(l)})^p), (9.8)

where

h(ψ^{(l−1)}) ≤ pB. (9.9)

This process must stop, because V is not √G-isotrivial, and it does so after a certain number of repetitions which this time is independent of π. Precisely, let us define the integer ℒ ≥ 0 by

p^ℒ ≤ h(V) R(√G) < p^{ℒ+1}. (9.10)

If (ii) holds at stages 1, …, l, then iterating (9.8) gives V = ψ_l((V^{(l)})^{p^l}) with the √G-automorphism

ψ_l = ψ ψ′^p ⋯ (ψ^{(l−1)})^{p^{l−1}}. (9.11)

Writing V in the affine form (8.1), we know that some coefficient x = a_j ≠ 0 does not lie in √G, and x = g y^{p^l} for some g in √G and some y in K. We can now apply Lemma 4.5, because the √G_k there is just √G. We conclude that p^l ≤ h(x) R(√G) ≤ h(V) R(√G).

In view of (9.10) this means that (ii) cannot hold for l = ℒ + 1. Thus there is some L with 0 ≤ L ≤ ℒ such that (ii) holds at stages l = 1, …, L (at least if L ≥ 1) and (i) holds at stage l = L + 1. We conclude that π^{(L)} lies in W^{(L)}, and from (9.7)

h(W^{(L)}) ≤ C h(V^{(L)})^n. (9.12)

Then π = ψ_L((π^{(L)})^{p^L}) lies in W = ψ_L((W^{(L)})^{p^L}). By (7.1) and (9.11) we get

h(W) ≤ p^L h(W^{(L)}) + n h(ψ_L) ≤ p^L h(W^{(L)}) + n(h(ψ) + p h(ψ′) + ⋯ + p^{L−1} h(ψ^{(L−1)})),

which, using (9.9) and (9.12), yields

h(W) ≤ C p^L h(V^{(L)})^n + 2n p^L B ≤ C(p^L h(V^{(L)}))^n + 2n p^L B. (9.13)

To estimate h(V^{(L)}) we use (7.1), (9.8) and (9.9) to get

p h(V^{(l)}) = h((ψ^{(l−1)})^{−1} V^{(l−1)}) ≤ h(V^{(l−1)}) + n h(ψ^{(l−1)}) ≤ h(V^{(l−1)}) + n pB.

If L ≥ 1, we multiply by p^{l−1} and sum from l = 1 to l = L, getting p^L h(V^{(L)}) ≤ h(V) + 2 n p^L B (which holds also if L = 0). Inserting this into (9.13) we get

h(W) ≤ C(h(V) + 2 n p^L B)^n + 2n p^L B ≤ C(h(V) + 2 n p^L B)^n,

and then, using (9.6) and (9.10) with L ≤ ℒ, we find

h(W) ≤ C h(V)^n (n δ(n + r)R(√G))^n ≤ C h(V)^n (n δ(n + r)R(√G))^n.

From (9.3) we finally get

h(W) ≤ C′ h(V)^n R(√G)^{n+2} (9.14)

with C′ = 16 n n dδ(n + r)(n δ(n + r))^n ≤ n d(n δ(n + r))^{n+1}. Because π was arbitrary, the conclusion so far is

V(√G) ⊆ ⋃_{W∈𝒲} W(√G)

for a finite collection 𝒲 of proper subvarieties W of V satisfying (9.14). But then the two sides are of course equal. This settles the Main Estimate for transversal hyperplanes V that are not √G-isotrivial and not contained in a proper coset.

Next suppose that V, still not √G-isotrivial (and still transversal and not contained in a proper coset), has dimension m − 1 with m < n. After a permutation of variables it can be defined by equations (6.1).
Each of these equations defines a hyperplane V_i, so that V = V_m ∩ ⋯ ∩ V_n.

We claim that we can assume that all non-zero a(i, j) lie in √G. Otherwise, for example, V_m is transversal and not √G-isotrivial in the projective space with coordinates X_j corresponding to j = m and the j with a(m, j) ≠ 0. Since no X_m − aX_j (m ≠ j, a ≠ 0) vanishes on V, this projective space has dimension at least 2. So we could then apply the hyperplane result (9.14) to deduce that all solutions lie in a finite union of proper subspaces W_m of this V_m with

h(W_m) ≤ C′ h(V_m)^n R(√G)^{n+2}.

But as in the affine situation just after (8.13), it can be seen that W_m intersects the remaining intersection U_m = ⋂_{i≠m} V_i in a proper subspace of V = V_m ∩ U_m. For example, the triangular nature of (6.1) makes it clear that X_{m+1}, …, X_n are determined by X_0, …, X_{m−1} on U_m, and then that X_m is determined by X_0, …, X_{m−1} on W_m in V_m. But also some non-zero linear form in X_0, …, X_{m−1} must vanish on W_m. Therefore W = W_m ∩ U_m has dimension strictly less than m − 1. So we are indeed in a proper subspace, as required by (a) of the Main Estimate. Further W = W_m ∩ V, and so h(W) ≤ h(W_m) + h(V) by Lemma 7.1. Moreover h(V_m) ≤ h(V), because the a(m, j) are themselves among the Grassmannian coordinates of V. We end up with (9.14) with, say, an extra factor 2.

So from now on we can indeed assume that all non-zero a(i, j) in (6.1) lie in √G. This means that we are set up to apply Lemma 6.1. We will see that the effect is to pass to a proper subvariety of at least one of V_m, …, V_n, even though they are separately isotrivial. As V is not √G-isotrivial by assumption, we find some quotient (6.2), say Q, not lying in F_K. Let π = (ξ_0, …, ξ_n) be any point of V(√G). For a typical factor a(i, j)/a(i, j′) in Q we apply part (b) of the Main Estimate in lower dimensions to V_i, with ψ_i determined by 1 and the non-zero a(i, j). So here q = p. We find finitely many proper subspaces W_i of V_i such that ψ_i(V_i(√G)) lies in the union of the ⋃_{e=0}^{∞} (ψ_i(W_i)(√G))^{p^e}, with

h(ψ_i(W_i)) ≤ n n dδ(n + r)R(√G) (9.15)

(as q = p). In particular, writing π_i for the projection of π to the lower-dimensional space, we have equations

ψ_i(π_i) = σ_i^{q_i} (9.16)

for σ_i in some ψ_i(W_i) and some power q_i of p. Thus

a(i, j)ξ_j / (a(i, j′)ξ_{j′}) = η^{q_i}

for certain η = η(i, j, j′) in K*. Multiplying all these over the factors in (6.2), we find Q = η_1^{q_1} ⋯ η_k^{q_k} for certain η_1, …, η_k in K*. Because the fixed Q is not in F_K, this forces q = min{q_1, …, q_k} to be bounded above by some quantity depending only on V. In fact h(Q) ≥ q, but on the other hand from (6.2) we see that h(Q) ≤ (n + 1)h(V). Thus

q ≤ (n + 1)h(V). (9.17)

Choose i with q = q_i. Now (9.16) says that π_i, and so π, lies in the variety U = ψ_i^{−1}((ψ_i(W_i))^q), of dimension strictly less than the dimension of V_i. This intersects V_i in a proper subvariety W′_i of V_i.
Once more this W′_i intersects the remaining intersection ⋂_{i′≠i} V_{i′} in a proper subvariety W of V. As for heights, we have W = W′_i ∩ V, so h(W) ≤ h(W′_i) + h(V). Also h(W′_i) ≤ h(U) + h(V_i) ≤ h(U) + h(V), and also

h(U) ≤ q h(ψ_i(W_i)) + n h(ψ_i^{−1}) ≤ q h(ψ_i(W_i)) + n h(V_i),

because of the definition of ψ_i. Putting these together and using (9.15) and (9.17), we conclude that

h(W) ≤ n(n + n + 3)4 n dδ(n + r) h(V) R(√G).

This is much smaller than (9.14), and so we have completed the proof of the Main Estimate when V is transversal and not contained in a proper coset. In case (a) we have so far reached the bound h(W) ≤ A h(V)^n R^{n+2}, with R = R(√G) and A = 4 n d(10 n δ(n + r))^{n+1}, owing to the extra factor 2 encountered after establishing (9.14).

To treat the more general situation when V is transversal and not itself a coset, we use induction on n ≥ 2, and we will obtain in case (a) the slightly weaker result h(W) ≤ A h(V)^n R^{n+2} + n h(V). This leads at once to the bound given in the Main Estimate.

If n = 2 then there is a single equation a_0X_0 + a_1X_1 + a_2X_2 = 0, and transversality implies that all a_i ≠ 0. Thus no X_i − aX_j (i ≠ j, a ≠ 0) vanishes on V, and we are done. So we can suppose that n ≥ 3 and that some X_n − aX_{n−1} (a ≠ 0) vanishes on V. In the remaining equations for V we may eliminate X_n to obtain a linear variety Ṽ in P^{n−1}. This Ṽ cannot be a coset, otherwise V would be. Also Ṽ certainly involves the variables X_0, …, X_{n−2}, and so is transversal in P^ñ for ñ = n − 2 or ñ = n − 1. Here ñ ≥ 2 except possibly when n = 3. But in that case, if Ṽ is not transversal in P^2, then V would be defined by equations X_3 = aX_2 and b_0X_0 + b_1X_1 = 0, and so would be a coset. Thus we can assume that Ṽ is transversal in P^ñ with ñ ≥ 2.

Suppose that V is not √G-isotrivial, as in (a). Then Ṽ cannot be √G-isotrivial, otherwise we could transform X_n to make V isotrivial. Thus, by induction, the Main Estimate holds for Ṽ. It is now relatively straightforward to deduce the Main Estimate for V. By case (a) for Ṽ we get

Ṽ(√G) = ⋃_{W̃∈𝒲̃} W̃(√G) (9.18)

for a finite collection 𝒲̃ of proper linear subvarieties W̃ of Ṽ, also defined over K and with h(W̃) ≤ A h(Ṽ)^n R^{n+2} + (n − 1)h(Ṽ). Now we will check that (a) for V follows, with W defined by the equations of W̃ together with X_n = aX_{n−1}. First, the upper bound of Lemma 7.1 gives

h(W) ≤ h(W̃) + h(a) ≤ A h(Ṽ)^n R^{n+2} + (n − 1)h(Ṽ) + h(a). (9.19)

Next, X_{n−1} ≠ 0 on Ṽ, else (9.18) would be empty; and so the lower bound of Lemma 7.1 gives h(V) ≥ max{h(Ṽ), h(a)}. Therefore (9.19) implies h(W) ≤ A h(V)^n R^{n+2} + n h(V), as required.

In case (b), for √G-isotrivial V (assuming as above that ψ is the identity), we see that Ṽ is √G-isotrivial and that a lies in F_q. We get (b) for V from (b) for Ṽ using the analogue

Ṽ(√G) = ⋃_{W̃∈𝒲̃} ⋃_{e=0}^{∞} (W̃(√G))^{q^e}

of (9.18), with W defined as above by the equations of W̃ together with X_n = aX_{n−1}; now h(W) ≤ h(W̃).

What if V is not transversal (and of course still not a coset)? Then it is transversal (and still not a coset) in some projective subspace of dimension n′ ≤ n − 1. Here n′ ≥ 2, and the cases for n′ now lead immediately to the same cases in P^n; we have merely ignored n − n′ projective variables that were never in the equations anyway.

This finally finishes the proof of the Main Estimate.

Since the estimate in case (a) is independent of the characteristic p, it may seem a nuisance that the estimate in case (b) depends on p. But this is actually unavoidable, and there are even examples to show that the full q/p is needed. To see this, take any power q > p, and define K = F_q(t), with G = √G generated by t, 1 − t and a generator ζ of F_q^*. Here we have r = 2, R(√G) = √3 and d = 1. The affine equations

x + y = 1, x + ζz = 1

give rise to a √G-isotrivial line V (with h(V) = 0 and ψ the identity). An upper bound B in (b) would mean that all solutions over √G are given by w, w^q, w^{q^2}, … for some w with h(w) ≤ B. Thus every solution π would have either h(π) ≤ B or h(π) ≥ q. But

π = (x, y, z) = ((1 − t)^{q/p}, t^{q/p}, t^{q/p}/ζ)

is a solution with h(π) = q/p. It follows that B ≥ q/p.
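The example rests on the fact that raising to a power k of p is additive in characteristic p, so (1 − t)^k + t^k = (1 − t + t)^k = 1 for k = q/p. The second equation reduces to the same identity, because ζz = t^{q/p}. The minimal sketch below is an illustration only and is not taken from the paper. It checks the identity coefficientwise in F_p[t], storing polynomials as coefficient lists; the function name `frobenius_sum_is_one` is hypothetical. The case k = 6, p = 3 shows that the identity fails when k is not a power of p.

```python
from math import comb


def frobenius_sum_is_one(k: int, p: int) -> bool:
    """Check (1 - t)^k + t^k == 1 in F_p[t] by comparing coefficients."""
    # Coefficient of t^i in (1 - t)^k is C(k, i) * (-1)^i, reduced mod p.
    coeffs = [comb(k, i) * (-1) ** i % p for i in range(k + 1)]
    coeffs[k] = (coeffs[k] + 1) % p  # add the term t^k
    return coeffs == [1] + [0] * k


if __name__ == "__main__":
    for p, k in [(2, 4), (3, 9), (5, 25)]:
        print(p, k, frobenius_sum_is_one(k, p))  # k a power of p
    print(3, 6, frobenius_sum_is_one(6, 3))      # 6 is not a power of 3
```

By Lucas's theorem, all middle binomial coefficients of a p-power vanish mod p, which is what the check detects.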

10. Isotrivial W. We show here how to ensure that all the subvarieties W in the Main Estimate can be made √G-isotrivial, at the expense of enlarging the exponents in the upper bounds for their heights. To simplify the various expressions, we abbreviate the factors in case (a) of the Main Estimate by

Δ = Δ(n, r, d) = 8 n d(10 n δ(n + r))^{n+1} ≥ 1, h = h(V), R = R(√G), (10.1)

and in case (b) by

Ψ = Ψ(n, r, d, p, q) = 8 n n (q/p) dδ(n + r) ≥ 1. (10.2)

We also write

ρ(m) = ρ_n(m) = ((2n)^m − 1)/(2n − 1), η(m) = η_n(m) = (2n)^m (m = 1, 2, …).

Main Estimate for isotrivial W. Let V be a linear subvariety of P^n defined over K but not a coset, with dimension m − 1 ≥ 1.
(a) If V is not √G-isotrivial, then V(√G) = ⋃_{W∈𝒲} W(√G) for a finite set 𝒲 of proper linear √G-isotrivial subvarieties W of V, also defined over K and with

h(W) ≤ (ΔR^{n+2})^{ρ(m)} h^{η(m)}. (10.3)

(b) If V is √G-isotrivial and ψ(V) is defined over F_q, then V(√G) = ψ^{−1}(⋃_{W∈𝒲} ⋃_{e=0}^{∞} (ψ(W)(√G))^{q^e}) for a finite set 𝒲 of proper linear √G-isotrivial subvarieties W of V, also defined over K and with h(ψ(W)) ≤ (ΔR^{n+2})^{ρ(m−1)}(ΨR)^{η(m−1)}.

Proof.

We start with case (a), and we can now write the bound as

h(W) ≤ Δ h^n R^{n+2} (10.4)

for W not necessarily √G-isotrivial. We show by induction on the dimension m − 1 ≥ 1 of V that the increased bound

h(W̃) ≤ (ΔR^{n+2})^{ρ(m)} h^{η(m)} (10.5)

can be attained with subvarieties W̃ that are all √G-isotrivial.

When m = 2, the W are points and so are automatically √G-isotrivial as long as W(√G) is non-empty.

When m ≥ 3, suppose that some W is not √G-isotrivial. We observe that such a W cannot be a coset T. For the latter is defined by finitely many X_i = a_{ij}X_j (a_{ij} ≠ 0), and if T(√G) is non-empty then clearly each a_{ij} lies in √G. But then it is easy to see that T is √G-isotrivial after all. For example, we can rewrite the equations as a_iX_i = a_jX_j with a_i, a_j in √G. Then we can set up an equivalence relation on {0, 1, …, n} characterized by the equivalence of such i, j. Now we need only change the variables in the equivalence classes of cardinality at least 2 in order to trivialize T.

So by induction each of these W satisfies

W(√G) = ⋃_{W̃∈𝒲̃} W̃(√G)

with √G-isotrivial W̃ such that h(W̃) ≤ (ΔR^{n+2})^{ρ(m−1)} h(W)^{η(m−1)}. Therefore all we have to do is substitute (10.4) into this. We find the upper bound (10.5) because

ρ(m − 1) + η(m − 1) = ρ(m), 2nη(m − 1) = η(m).

For case (b) we write the bound as

h(ψ(W)) ≤ ΨR (10.6)

for W not necessarily √G-isotrivial. If some W is not √G-isotrivial, then neither is ψ(W), and we can write

ψ(W)(√G) = ⋃_{W*∈𝒲*} W*(√G)

with √G-isotrivial W* such that

h(W*) ≤ (ΔR^{n+2})^{ρ(m−1)} h(ψ(W))^{η(m−1)}. (10.7)

This yields

h(ψ(W̃)) ≤ (ΔR^{n+2})^{ρ(m−1)}(ΨR)^{η(m−1)}, (10.8)

where the W̃ = ψ^{−1}(W*) are √G-isotrivial. In fact, just as above, all we have to do is substitute (10.6) into (10.7), and we find (10.8) at once. This completes the proof.

11. Points over G. We show here how to replace V(√G) and W(√G) in the Main Estimate by V(G) and W(G), at the expense of worsening the dependence on the regulator. However, we no longer insist that the W are isotrivial; if needed, this could be secured just by repeating the arguments of the previous section. We retain the notations (10.1) and (10.2) from that section. Of course n ≥ 2. We continue with our assumption that K is finitely generated over F_p, with F_K = F̄_p ∩ K. Further, G is finitely generated of rank r ≥ 1 modulo F_K^*.

Main Estimate for points over G. There is a positive integer f = f_K(G) ≤ [√G : G], depending only on K and G, with the following property. Let V be a positive-dimensional linear subvariety of P^n defined over K but not a coset.
(a) If V is not √G-isotrivial, then V(G) = ⋃_{W∈𝒲} W(G) for a finite set 𝒲 of proper linear subvarieties W of V, also defined over K and with h(W) ≤ Δ h^n R(√G)^{n+2}.
(b) If V is √G-isotrivial and ψ(V) is defined over F_q, then either
(ba) we have V(G) = ⋃_{W∈𝒲} W(G) for a finite set 𝒲 of proper linear subvarieties W of V, also defined over K and with h(ψ(W)) ≤ |F_K| Ψ R(G), or
(bb) we have

V(G) = ψ^{−1}(⋃_{W∈𝒲} ⋃_{e=0}^{∞} (ψ(W)(G))^{q^{fe}}) (11.1)

for a finite set 𝒲 of proper linear subvarieties W of V, also defined over K and with

h(ψ(W)) ≤ q^f |F_K| Ψ R(G). (11.2)

In the following lemma φ denotes the Euler function.

Lemma 11.1. For a given power Q > 1 of a prime P, consider a finite collection of congruence equations

L Q^e ≡ M mod N (11.3)

with N taken from a finite set 𝒩 of positive integers and L, M taken from Z. Suppose that the set of solutions e ≥ 0 is non-empty. Then, if there is some M ≠ 0 with ord_P M < ord_P N, this set is
(a) finite, with Q^e ≤ max_{N∈𝒩} N,
and otherwise it is
(b) a finite union of arithmetic progressions e = e_0, e_0 + f, e_0 + 2f, … with f = ∏_{N∈𝒩} φ(N) and Q^{e_0} < Q^f max_{N∈𝒩} N.

Proof. Suppose first that there is some M ≠ 0 with ord_P M < ord_P N. Then the corresponding L ≠ 0, and we get e ord_P Q ≤ ord_P(LQ^e) = ord_P M < ord_P N, giving case (a).

Thus we can assume that ord_P M ≥ ord_P N whenever M ≠ 0. We proceed to verify case (b). The congruences (11.3) can be split into congruences modulo powers of P and congruences modulo powers P̃^m of other primes P̃ ≠ P. The former congruences, if any, will be satisfied as soon as e is sufficiently large. Indeed, they amount to LQ^e ≡ 0 mod P^{ord_P N}, and so to conditions e ≥ λ for various real λ ≤ ord_P N / ord_P Q; that is, Q^λ ≤ P^{ord_P N} ≤ N. Together they give a single condition e ≥ Λ for some real Λ with Q^Λ ≤ max_{N∈𝒩} N.

We note that whether e satisfies the other congruences depends only on its congruence class modulo f. For if P̃^m divides some N, then φ(P̃^m) divides φ(N), which divides f, and so Q^f ≡ 1 mod P̃^m.

Thus the solutions e satisfy e ≥ Λ and must also lie in a finite number of arithmetic progressions modulo f. If e_0 is the smallest member of one of these progressions with e_0 ≥ Λ, then e_0 − f < Λ. This leads to case (b), completing the proof.

We can now start on the proof of the Main Estimate for points over G.

Suppose first that V is not √G-isotrivial. Then (a) of the Main Estimate gives V(√G) = ⋃_{W∈𝒲} W(√G) for W satisfying (10.4). Now we can descend to G simply by intersecting with P^n(G).

Next suppose that V is √G-isotrivial and ψ(V) is defined over F_q. Using elementary divisors, we can find generators γ_1, …, γ_r of √G modulo constants and positive integers d_1, …, d_r such that γ_1^{d_1}, …, γ_r^{d_r} generate G modulo constants. The constants can be taken care of with an extra γ_0 generating √G ∩ F_K and γ_0^{d_0} generating G ∩ F_K; here d_0 divides the order of γ_0 as a root of unity. Thus

[√G : G] = d_0 d_1 ⋯ d_r. (11.4)

We write ψ(X_0, …, X_n) = (ψ_0X_0, …, ψ_nX_n) with

ψ_i = γ_0^{a_{0i}} γ_1^{a_{1i}} ⋯ γ_r^{a_{ri}} (i = 0, …, n) (11.5)

in √G. Now (b) of the Main Estimate gives

V(√G) = ψ^{−1}(⋃_{W∈𝒲} ⋃_{e=0}^{∞} (ψ(W)(√G))^{q^e}) (11.6)

for W satisfying (10.6). But we can no longer descend to G simply by intersecting with P^n(G).

Consider a point π = (π_0, …, π_n) of V(G). By (11.6) there is a point σ = (σ_0, …, σ_n) in some W(√G) and some e ≥ 0 with π = ψ^{−1}(ψ(σ))^{q^e}. As in (11.5) we write

σ_i = γ_0^{b_{0i}} γ_1^{b_{1i}} ⋯ γ_r^{b_{ri}} (i = 0, …, n); (11.7)

but π is over G, and so

π_i = γ_0^{c_{0i}d_0} γ_1^{c_{1i}d_1} ⋯ γ_r^{c_{ri}d_r} (i = 0, …, n).

Equating exponents, we find a system of congruences

(a_{ji} + b_{ji}) q^e ≡ a_{ji} mod d_j (i = 0, …, n; j = 0, 1, …, r), (11.8)

with coefficients depending on σ. We can apply Lemma 11.1, and the argument splits into two according to its conclusion. As the b_{ji} in (11.7) appear only in the coefficients L, the splitting is independent of σ.

Suppose first that Lemma 11.1(a) holds. Then

q^e ≤ max{d_0, d_1, …, d_r} ≤ d_0 d_1 ⋯ d_r = [√G : G], (11.9)

and π lies in one of the finitely many W̃ = ψ^{−1}(ψ(W))^{q^e}. We can put these together into a set 𝒲̃, and we have then shown that

V(G) ⊆ ⋃_{W̃∈𝒲̃} W̃(√G).

Intersecting with P^n(G) gives the same inclusion, but with W̃(G) on the right-hand side. On the other hand,

W̃ = ψ^{−1}(ψ(W))^{q^e} ⊆ ψ^{−1}(ψ(V))^{q^e} = ψ^{−1}(ψ(V)) = V,

because ψ(V) is defined over F_q. Thus we conclude that V(G) = ⋃_{W̃∈𝒲̃} W̃(G), as in (ba) of the Main Estimate for points over G. But now, from (11.9) and (10.6), the heights satisfy

h(ψ(W̃)) = q^e h(ψ(W)) ≤ d_0 d_1 ⋯ d_r Ψ R(√G).
Using Lemma 4.1 we see that R(G) = d_1 ⋯ d_r R(√G), and so we can absorb some terms into the regulator to get

h(ψ(W̃)) ≤ d_0 Ψ R(G) ≤ |F_K| Ψ R(G). (11.10)

Suppose now that Lemma 11.1(b) holds. Then e = e_0 + fẽ with ẽ ≥ 0 and e_0 bounded as in (11.9), but with an extra factor q^f. In particular, taking ẽ = 0 we get a solution of (11.8), and this means that σ̃ = ψ^{−1}(ψ(σ))^{q^{e_0}} is also defined over G. It lies in

W̃ = ψ^{−1}(ψ(W))^{q^{e_0}} (11.11)

and so in W̃(G). We also have ψ(π) = (ψ(σ))^{q^e} = (ψ(σ̃))^{q̃^{ẽ}} for q̃ = q^f. Thus we conclude that

V(G) ⊆ ψ^{−1}(⋃_{W̃∈𝒲̃} ⋃_{ẽ=0}^{∞} (ψ(W̃)(G))^{q̃^{ẽ}}) (11.12)

for the set 𝒲̃ of the W̃ in (11.11). On the other hand,

ψ(W̃)^{q̃^{ẽ}} = (ψ(W))^{q^{e_0} q̃^{ẽ}} ⊆ (ψ(V))^{q^{e_0} q̃^{ẽ}} = ψ(V),

again because ψ(V) is defined over F_q. Thus we have equality in (11.12).

Finally, we calculate that h(ψ(W̃)) = q^{e_0} h(ψ(W)) is bounded above by

q^f max{d_0, d_1, …, d_r} Ψ R(√G) ≤ q^f |F_K| Ψ R(G), (11.13)

where f = φ(d_0)φ(d_1) ⋯ φ(d_r) depends only on K and G, with f ≤ d_0 d_1 ⋯ d_r = [√G : G]. This completes the proof of (bb), and so the Main Estimate for points over G is proved.

In (11.13) the term q^f cannot easily be absorbed into the regulator without introducing an exponential dependence on R(G). Let us discuss some aspects of this.

When G = √G, we have f = 1 in (bb) and we are more or less back to (b) of the Main Estimate. But in general we need the extra f in (11.1). The following example shows that it must sometimes be almost as large as [√G : G].

We go back to the equation t^m x + y = 1 of (1.5) over K = F_p(t), with n = 2. It is to be solved in the group G = G_l generated by t^l and 1 − t, so that r = 2. Here √G is generated by t and 1 − t together with a generator ζ of F_p^*. The equation defines a √G-isotrivial line V with ψ(x, y) = (t^m x, y) = (x̃, ỹ), so that Ṽ = ψ(V) is defined by x̃ + ỹ = 1, with q = p. Leitner [Le] has found all points on Ṽ(√G).
If p is odd, there are p − 2 constant points over F_p, together with six infinite families

(x̃, ỹ) = (x̃_0^{p^ẽ}, ỹ_0^{p^ẽ}) (ẽ = 0, 1, 2, …),

where (x̃_0, ỹ_0) is one of

(t, 1 − t), (1 − t, t), (1/t, −(1 − t)/t), (−(1 − t)/t, 1/t), (1/(1 − t), −t/(1 − t)), (−t/(1 − t), 1/(1 − t)).

The (x, y) = ψ^{−1}(x̃, ỹ) = (t^{−m}x̃, ỹ) are all the points on V(√G). Choosing m not divisible by l, we see that none of the constant points gives rise to points of V(G). The same holds for the second family above, and for the last four families, simply because of the minus signs. However, the first family gives (t^{−m} t^{p^ẽ}, (1 − t)^{p^ẽ}), which is in G if and only if

p^ẽ ≡ m mod l. (11.14)

Artin's Conjecture predicts that, for any prime p, there are infinitely many primes l for which p is a primitive root modulo l. Heath-Brown's Corollary 2 of [He] (p.27) implies that this is true for at least one of p = 3, 5, 7. For such l we can choose m with 1 ≤ m < l and p^{l−2} ≡ m mod l. Now (11.14) implies ẽ ≡ l − 2 mod l − 1, so ẽ = l − 2 + (l − 1)e (e = 0, 1, 2, …). Thus the surviving points on V(G) are just the

π = ψ^{−1}(ψ(W))^{p^{(l−1)e}} (e = 0, 1, 2, …), (11.15)

with W the single point (t^{−m} t^{p^{l−2}}, (1 − t)^{p^{l−2}}). This makes it clear that f ≥ l − 1, almost as big as [√G : G] = (p − 1)l for fixed p.

We could also see this from (11.2). Since R(G) = l√3, it implies that there would be a point π on V(G) with h(ψ(π)) ≤ c p^f l for an absolute constant c. But the point (11.15) has y = ỹ = (1 − t)^{p^{l−2} p^{(l−1)e}}, so

h(ψ(π)) ≥ p^{l−2} p^{(l−1)e} ≥ p^{l−2}. (11.16)

Letting l → ∞, we deduce f ≥ l − c′ log l, again almost as big as [√G : G] = (p − 1)l.

Less precisely, there can be no estimate

h(ψ(W)) ≤ C(n, r, K)(h(V) + R(G))^κ

replacing (11.2) that is polynomial in h(V) and R(G) for fixed n, r, K. For this would give a point with h(ψ(π)) ≤ c″(m + l)^κ ≤ c‴ l^κ, contradicting (11.16). Similarly, one sees that if the dependence on h(V) is polynomial, then the dependence on R(G) must be exponential. This explains large solutions like (1.16), with p = 2, l = 83, m = 42.
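The congruence (11.14) is a single instance of the system (11.3) in Lemma 11.1, with Q = p, L = 1, M = m and N = l. Since l is coprime to p, we are in case (b), with period f = φ(l) = l − 1. The sketch below is a brute-force illustration under these assumptions, not the method of the paper, and all function names are hypothetical. It lists the solutions e of a small system in a finite range and computes the period ∏ φ(N) from Lemma 11.1(b). It also confirms that m = 42 corresponds to p^{l−2} mod l for p = 2, l = 83.

```python
from math import prod


def euler_phi(n: int) -> int:
    """Euler's totient function by trial division."""
    result, m, d = n, n, 2
    while d * d <= m:
        if m % d == 0:
            while m % d == 0:
                m //= d
            result -= result // d
        d += 1
    if m > 1:
        result -= result // m
    return result


def solutions(Q: int, system, limit: int) -> list:
    """All e in [0, limit) with L * Q**e ≡ M (mod N) for every (L, M, N)."""
    return [e for e in range(limit)
            if all((L * pow(Q, e, N) - M) % N == 0 for L, M, N in system)]


def period_bound(system) -> int:
    """f = product of phi(N) over the distinct moduli N, as in Lemma 11.1(b)."""
    return prod(euler_phi(N) for N in {N for _, _, N in system})


if __name__ == "__main__":
    # Lemma 11.1(a): 2**e ≡ 2 (mod 8) with ord_2 M = 1 < ord_2 N = 3.
    print(solutions(2, [(1, 2, 8)], 50))
    # (11.14) with p = 3, l = 7 (3 is a primitive root mod 7), m = p**(l-2) mod l.
    p, l = 3, 7
    m = pow(p, l - 2, l)
    print(m, solutions(p, [(1, m, l)], 40), period_bound([(1, m, l)]))
    # The data p = 2, l = 83, m = 42 quoted for (1.16).
    print(pow(2, 81, 83), solutions(2, [(1, 42, 83)], 200))
```

The first call shows the finite case (a), where only e = 1 survives. The other two calls show a single progression ẽ ≡ l − 2 mod l − 1, with period l − 1.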

12. Proof of Descent Steps and Theorems.

In the Descent Steps, the variety V is certainly defined over a finitely generated transcendental extension K of F_p, and now we can choose any separable transcendence basis to obtain a height function. The Descent Step over √G follows from the Main Estimate for isotrivial W. The Descent Step over G follows from the Main Estimate for points over G, at least without the assumption that the W are √G-isotrivial. This assumption can be removed by induction just as in section 10 (without bothering about estimates): any W that is not √G-isotrivial can be replaced by a finite union of √G-isotrivial varieties.

To prove Theorem 1 we may assume that V has positive dimension. We apply the Main Estimate for points over G repeatedly, always taking q = |F_K|^{f_K(G)} for safety. With V_0 = V, an arbitrary point π_0 of V_0(G) is either a point of W(G) for finitely many W in V_0 with dim W ≤ dim V_0 − 1, or a point ψ_1^{−1}φ^{e_1}ψ_1(π_1). In the latter case π_1 lies in V_1(G) for finitely many V_1 in V_0 with dim V_1 ≤ dim V_0 − 1, e_1 ≥ 0, and ψ_1(V_0) defined over F_K. Then we argue similarly with π_1, and so on. After at most dim V ≤ n − 1 steps we arrive at a coset T = V_h. Only finitely many ψ_1, …, ψ_h turn up on the way, leading to expressions as in (1.12) and thereby establishing Theorem 1.

For later use we note that not just the varieties T, but also the whole unions [ψ_1, …, ψ_h]T, lie in the variety V. Why is this? A typical point of the union has the shape π = (ψ_1^{−1}φ^{e_1}ψ_1) ⋯ (ψ_h^{−1}φ^{e_h}ψ_h)(τ) for some e_1, …, e_h and some τ in T. The descent for Theorem 1 provides linear varieties V = V_0, V_1, …, V_h = T. Clearly τ lies in T inside V_{h−1}, so ψ_h^{−1}φ^{e_h}ψ_h(τ) lies in ψ_h^{−1}φ^{e_h}ψ_h(V_{h−1}) = ψ_h^{−1}ψ_h(V_{h−1}) = V_{h−1}, inside V_{h−2}. In the same way (ψ_{h−1}^{−1}φ^{e_{h−1}}ψ_{h−1})(ψ_h^{−1}φ^{e_h}ψ_h)(τ) lies in V_{h−2}, inside V_{h−3}. Continuing backwards, we see that π = (ψ_1^{−1}φ^{e_1}ψ_1) ⋯ (ψ_h^{−1}φ^{e_h}ψ_h)(τ) lies in V_0.

We leave it to the reader to check, by a straightforward induction argument like that in section 10 and also using Lemma 7.2, that for Theorem 1 one can take

max{h(ψ_1), …, h(ψ_h), h(T)} ≤ (2qΔR(G)^{n+2})^{ρ(m)} h(V)^{η(m)}. (12.1)

This is polynomial in R(G) and h(V). However, as we noted, an exponential dependence on R(G) may be hiding in q = |F_K|^{f_K(G)}.

For the symmetrization argument in the proof of Theorem 2, we need a version of Lemma 8.1 (p. 209) of [D], partly removed from its recurrence context.

Lemma 12.1.

For m ≥ 1 and x_1, …, x_m, y_1, …, y_m in K, suppose that

x_1 y_1^{q^l} + ⋯ + x_m y_m^{q^l} = 0 (12.2)

for all large l. Then this holds for all l ≥ 0.

Proof. The proof will be by induction on m, the case m = 1 being trivial. For the induction step we can clearly assume that x_1, …, x_m are non-zero. We note that (12.2) for any m consecutive integers l = g, g + 1, …, g + m − 1 implies the linear dependence of y_1, …, y_m over F_q. For if we regard these as linear equations for x_1, …, x_m, the underlying determinant is the q^g-th power of the determinant with entries y_i^{q^{j−1}} (i, j = 1, …, m). It is well known that the latter, a so-called Moore determinant, is up to a constant the product of the β_1y_1 + ⋯ + β_my_m taken over all (β_1, …, β_m) in P^{m−1}(F_q) (see for example [Go] Corollary 1.3.7 p.8). Thus, after permuting, we can suppose that y_m = α_1y_1 + ⋯ + α_{m−1}y_{m−1} for α_1, …, α_{m−1} in F_q. Substituting into (12.2) gives

(x_1 + α_1x_m)y_1^{q^l} + ⋯ + (x_{m−1} + α_{m−1}x_m)y_{m−1}^{q^l} = 0,

which therefore also holds for all large l. By the induction hypothesis, this holds for all l ≥ 0, which leads back to (12.2) for all l ≥ 0.

To prove Theorem 2, take any [ψ_1, …, ψ_h]T(G) coming from Theorem 1. Fix τ in T(G); then T = τS for a linear subgroup S.

We argue first on the geometric level. According to (1.12), a typical point of [ψ_1, …, ψ_h]T has the shape

ψ_1^{q_1−1} ψ_2^{q_1q_2−q_1} ψ_3^{q_1q_2q_3−q_1q_2} ⋯ ψ_h^{q_1⋯q_h−q_1⋯q_{h−1}} (τσ)^{q_1⋯q_h}

with q_i = q^{e_i} (i = 1, …, h) and σ in S. Here we are regarding the ψ_i (i = 1, …, h) as multiplication by points instead of automorphisms. This expression can be written as

π_0 π_1^{q_1} π_2^{q_1q_2} ⋯ π_{h−1}^{q_1⋯q_{h−1}} π_h^{q_1⋯q_h} σ^{q_1⋯q_h} (12.3)

with

π_0 = ψ_1^{−1}, π_1 = ψ_2^{−1}ψ_1, …, π_{h−1} = ψ_h^{−1}ψ_{h−1}, π_h = ψ_hτ. (12.4)

Writing q^{l_i} = q_1 ⋯ q_i (i = 1, …, h), we certainly get a point of (π_0, π_1, …, π_h)S according to (1.14). But at the moment we have the asymmetry l_1 ≤ ⋯ ≤ l_h. We eliminate the inequalities here as in [D] (p.212).

Let us start with the last inequality. We can write (12.3) as ξη^{q^l}, with ξ and η independent of l = l_h. We have already remarked that [ψ_1, …, ψ_h]T lies in V, so (12.3) does too. Thus for each linear form L defining V we have L(ξη^{q^l}) = 0 for all l_1, …, l_{h−1}, l with 0 ≤ l_1 ≤ ⋯ ≤ l_{h−1} ≤ l. Fixing l_1, …, l_{h−1}, we see from Lemma 12.1 that this equation for all large l implies the same equation for all l ≥ 0. Thus the inequality l_{h−1} ≤ l_h has indeed been eliminated. Similar arguments work for the other conditions, as is clear from the arguments of [D] (p.212) after equation (22). For example, the next step fixes l_1, …, l_{h−2}, l_h, but not l = l_{h−1}.

Looking back at (12.3), we have therefore proved that all the points

π_0 π_1^{r_1} π_2^{r_2} ⋯ π_{h−1}^{r_{h−1}} π_h^{r_h} σ^{r_h} (12.5)

lie in V, where the integers r_i = q^{l_i} (i = 1, …, h) now range independently over all integral powers of q. This is the required symmetrization at the geometric level.

It actually shows that the entire (π_0, π_1, …, π_h)S lies in V. For a typical point of the former has the shape

π_0 π_1^{r_1} π_2^{r_2} ⋯ π_{h−1}^{r_{h−1}} π_h^{r_h} σ̃ (12.6)

for σ̃ in S, and there is σ in S with σ^{r_h} = σ̃. This could be interpreted as a statement about the divisibility of group varieties. For us it is just a simple consequence of the fact that S is defined by equations X_i = X_j. Now (12.6) and (12.5) are equal.

At the arithmetic level, we claim that (π_0, π_1, …, π_h)S(G) lies in V(G). In fact every point

π = π_0 π_1^{r_1} π_2^{r_2} ⋯ π_{h−1}^{r_{h−1}} π_h^{r_h} (12.7)

with r_1 ≤ ⋯ ≤ r_h has the shape (12.3) (with all coordinates of σ equal to 1). It therefore lies in [ψ_1, …, ψ_h]T(G), which is in turn contained in V(G). In particular, π lies in P^n(G). But why does it continue to lie in P^n(G) when the asymmetry is lifted? We can take r_1 = ⋯ = r_h = 1 in (12.7) to see that the product

π_0 π_1 ⋯ π_h (12.8)

lies in P^n(G). Then, taking r_1 = ⋯ = r_{h−1} = 1 and r_h = q, we can deduce that π_h^{q−1} lies in P^n(G). Taking r_1 = ⋯ = r_{h−2} = 1 and r_{h−1} = r_h = q, we deduce that π_{h−1}^{q−1} lies in P^n(G). And so on, until we see that all of

π_1^{q−1}, …, π_h^{q−1} (12.9)

lie in P^n(G) (this was already remarked in section 1).

Now, if r_1, …, r_h are arbitrary integral powers of q in (12.7), we can write π = (π_0π_1 ⋯ π_h)π_1^{r_1−1} ⋯ π_h^{r_h−1}. Since each r_i − 1 is divisible by q − 1, (12.8) and (12.9) show that π indeed lies in P^n(G).

By (12.5), any point of (π_0, π_1, …, π_h)S(G) has the form πσ^{r_h}, with π as above and σ in S(G). It follows that (π_0, π_1, …, π_h)S(G) lies in V(G), as claimed.

On the other hand, taking all coordinates of σ as 1 in (12.3) shows that [ψ_1, …, ψ_h]{τ} lies in (π_0, π_1, …, π_h)S(G). As we could have fixed τ arbitrarily in T(G), we see that [ψ_1, …, ψ_h]T(G) lies in (π_0, π_1, …, π_h)S(G).

It follows that V(G) is indeed the union of the (π_0, π_1, …, π_h)S(G), which completes the proof of Theorem 2. We note for later use the fact, already observed, that each (π_0, π_1, …, π_h)S is contained in V.

Here too we leave it to the reader to check, using (12.1), that for Theorem 2 one can take

max{h(π_0), h(π_1), …, h(π_h)} ≤ (n + 1)(2qΔR(G)^{n+2})^{ρ(m)} h(V)^{η(m)}, (12.10)

using also that T(G) contains some τ with h(τ) ≤ h(T).

To prove part (1) of Theorem 3, we start from Theorem 1 with V = H. We first claim that if some π in H(G) lies in some [ψ_1, …, ψ_h]T(G) with T not a single point, then some (1.2) fails for π. To see this, note that if T is not a single point, then there is a partition of {0, 1, …, n} into proper subsets I, J, … such that T is defined by the proportionality of the homogeneous coordinates X_i (i ∈ I), of the X_j (j ∈ J), and so on. We may suppose that I contains 0 and that the equations corresponding to I are g_iX_0 = g_0X_i for i in I. Consider the point τ_I in P^n with coordinates X_i = g_i for i in I and all other coordinates zero. It also lies in T.

Now π = (ψ_1^{−1}φ^{e_1}ψ_1) ⋯ (ψ_h^{−1}φ^{e_h}ψ_h)(τ) for some e_1, …, e_h and some τ in T. From our remark following the proof of Theorem 1, we see that π_I = (ψ_1^{−1}φ^{e_1}ψ_1) ⋯ (ψ_h^{−1}φ^{e_h}ψ_h)(τ_I) lies in H.
Now τ and τ_I have the same coordinates X_i (i ∈ I). It follows that π and π_I have the same coordinates X_i (i ∈ I). Since the other coordinates of π_I are zero, this means that (1.2) fails for π, as claimed.

Therefore H*(G) is contained in a finite union of sets [ψ_1, …, ψ_h]{τ}, and each of these lies in H(G). This proves part (1) of Theorem 3.

Part (2) follows in a similar way with the help of the remark after the proof of Theorem 2. Here π = π_0(φ^{l_1}π_1) ⋯ (φ^{l_h}π_h)σ and π_I = π_0(φ^{l_1}π_1) ⋯ (φ^{l_h}π_h)σ_I, where σ_I is defined by X_i = 1 for i in I and all other coordinates zero. This shows that we can restrict to single points S, and the proof is finished as above. We have therefore proved all of Theorem 3.

It is easy to deduce explicit estimates for Theorem 3 as for Theorems 1 and 2. One obtains at once (12.1) (with T replaced by τ) and (12.10).
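The proof of Lemma 12.1 uses the Moore-determinant fact that det(y_i^{q^{j−1}}) is, up to a constant, the product of the linear forms β_1y_1 + ⋯ + β_my_m over P^{m−1}(F_q). The sketch below checks this polynomial identity in F_q[y_1, …, y_m] for small m and q. It is only an illustration and is not part of the paper's argument. It assumes q = p prime, so that arithmetic in F_q is just arithmetic mod p. All function names are hypothetical, and polynomials are stored as dictionaries from exponent tuples to coefficients.

```python
from itertools import permutations, product


def poly_mul(f, g, p):
    """Multiply polynomials stored as {exponent tuple: coefficient mod p}."""
    h = {}
    for ef, cf in f.items():
        for eg, cg in g.items():
            e = tuple(a + b for a, b in zip(ef, eg))
            h[e] = (h.get(e, 0) + cf * cg) % p
    return {e: c for e, c in h.items() if c}


def perm_sign(perm):
    """Sign of a permutation of range(len(perm)), by sorting with transpositions."""
    perm, sign = list(perm), 1
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]
            sign = -sign
    return sign


def moore_det(m, q):
    """det(y_i^(q^(j-1))) over F_q (q prime), expanded by the Leibniz formula."""
    det = {}
    for perm in permutations(range(m)):
        e = tuple(q ** perm[i] for i in range(m))  # row i contributes y_i^(q^perm[i])
        det[e] = (det.get(e, 0) + perm_sign(perm)) % q
    return {e: c for e, c in det.items() if c}


def product_of_forms(m, q):
    """Product of beta_1 y_1 + ... + beta_m y_m, one factor per point of
    P^{m-1}(F_q), with representatives whose last non-zero entry is 1."""
    result = {(0,) * m: 1}
    for beta in product(range(q), repeat=m):
        nonzero = [b for b in beta if b]
        if not nonzero or nonzero[-1] != 1:
            continue
        form = {tuple(int(k == j) for k in range(m)): b
                for j, b in enumerate(beta) if b}
        result = poly_mul(result, form, q)
    return result


def proportionality_constant(f, g, p):
    """Return c with f == c * g (mod p), or None if no such c exists."""
    if not f or set(f) != set(g):
        return None
    e = next(iter(f))
    c = f[e] * pow(g[e], -1, p) % p
    return c if all(f[k] == c * g[k] % p for k in f) else None


if __name__ == "__main__":
    for m, q in [(2, 2), (2, 3), (2, 5), (3, 2), (3, 3)]:
        det, prod_forms = moore_det(m, q), product_of_forms(m, q)
        print(m, q, proportionality_constant(det, prod_forms, q))
```

Rescaling the chosen representatives multiplies the product by a non-zero scalar. That is why the statement is only "up to a constant", and why the script reports the constant rather than assuming its value.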

13. Limitation results.

We show here that for each $n \ge$ […] the restriction $h \le n-1$ cannot be improved to $h \le n-2$, and that for each $p > 2$ the $\psi_1,\ldots,\psi_h$ in Theorem 1 and the $\pi_0,\pi_1,\ldots,\pi_h$ in Theorem 2 cannot always be chosen over $G$.

We start with $h \le n-1$. Because Theorem 1 directly implies Theorem 2 and then Theorem 3, it will suffice to prove the analogous statements for Theorem 3. Also, we have seen that each $[\psi_1,\ldots,\psi_h]\{\tau\}$ in Theorem 3(1) is contained in some $(\pi_0,\pi_1,\ldots,\pi_h)$ in Theorem 3(2). So it is enough to treat Theorem 3(2).

This we do with the affine hyperplane
$$x_1 + x_2 - x_3 - \cdots - x_n = 1. \tag{13.1}$$
For each prime $p$ let $R = R_p$ be the set of points $(1, r_1, \ldots, r_{n-1})$ as the integers $r_1,\ldots,r_{n-1}$ run through all powers of $p$ satisfying the asymmetry conditions that $r_i$ divides $r_{i+1}$ ($i = 1,\ldots,n-2$) and also the extra conditions
$$r_{n-1} \ne r_{n-2},\ r_{n-2}+r_{n-3},\ \ldots,\ r_{n-2}+r_{n-3}+\cdots+r_1. \tag{13.2}$$

Lemma 13.1.

The set $R$ does not lie in a finite union of proper subgroups of $\mathbb{Z}^n$.

Proof. We can actually disregard (13.2), because the failure of these conditions would just add more to the finite union of proper subgroups. Now the falsity of the lemma would lead to an equation
$$F(p^{e_1},\ldots,p^{e_{n-1}}) = 0 \tag{13.3}$$
for all non-negative integers $e_1,\ldots,e_{n-1}$, where $F(y_1,\ldots,y_{n-1})$ is a finite product of polynomials
$$A = a_0 + a_1y_1 + a_2y_1y_2 + \cdots + a_{n-1}y_1y_2\cdots y_{n-1}$$
corresponding to the proper subgroups of $\mathbb{Z}^n$ perpendicular to $(a_0,\ldots,a_{n-1}) \ne 0$. It is clear that each $A \ne 0$ and so $F \ne 0$. On the other hand, it is easy to see that the points $(p^{e_1},\ldots,p^{e_{n-1}})$ in (13.3) are Zariski-dense in $\mathbb{R}^{n-1}$. This contradiction proves the lemma.

Take as usual $K = \mathbb{F}_p(t)$ and $G$ generated by $t$ and $1-t$. We proceed to exhibit many points on $H^*(G)$ with $H$ defined by (13.1).

For integral powers $q_1,\ldots,q_{n-1}$ of $p$ define
$$r_1 = q_{n-1},\quad r_2 = q_{n-1}q_{n-2},\quad \ldots,\quad r_{n-1} = q_{n-1}\cdots q_1$$
and
$$d_0 = r_{n-1} - r_{n-2} - \cdots - r_2 - r_1,\qquad d_1 = r_{n-1} - r_{n-2} - \cdots - r_2,$$
down to $d_{n-3} = r_{n-1} - r_{n-2}$ and $d_{n-2} = r_{n-1}$. Then
$$x_1 = t^{d_0},\quad x_2 = 1 - t^{d_{n-2}},\quad x_3 = t^{d_{n-3}} - t^{d_{n-2}},\quad \ldots,\quad x_n = t^{d_0} - t^{d_1}, \tag{13.4}$$
and the point $\xi = (x_1,\ldots,x_n)$ lies in $H$, since the left-hand side of (13.1) telescopes to 1. It is in $H(G)$ because $x_2 = 1 - t^{r_{n-1}} = (1-t)^{r_{n-1}}$, $x_3 = t^{d_{n-3}}(1 - t^{r_{n-2}}) = t^{d_{n-3}}(1-t)^{r_{n-2}}$, and so on.

This also leads to a multiplicative representation
$$\xi = \xi_1^{r_1}\cdots\xi_{n-1}^{r_{n-1}} \tag{13.5}$$
with
$$\xi_1 = \left(\tfrac{1}{t}, 1, 1, 1, 1, \ldots, 1, 1, \tfrac{1-t}{t}\right),\quad \xi_2 = \left(\tfrac{1}{t}, 1, 1, 1, 1, \ldots, 1, \tfrac{1-t}{t}, \tfrac{1}{t}\right),\quad \xi_3 = \left(\tfrac{1}{t}, 1, 1, 1, 1, \ldots, \tfrac{1-t}{t}, \tfrac{1}{t}, \tfrac{1}{t}\right)$$
down to
$$\xi_{n-2} = \left(\tfrac{1}{t}, 1, \tfrac{1-t}{t}, \tfrac{1}{t}, \tfrac{1}{t}, \ldots, \tfrac{1}{t}, \tfrac{1}{t}, \tfrac{1}{t}\right),$$
but
$$\xi_{n-1} = (t, 1-t, t, t, t, \ldots, t, t, t).$$
We can quickly check that $\xi_1,\ldots,\xi_{n-1}$ are multiplicatively independent. Namely, a relation $\xi_1^{a_1}\cdots\xi_{n-1}^{a_{n-1}} = (1,1,\ldots,1)$ implies $a_{n-1} = 0$ on examining the second components, then $a_{n-2} = 0$ from the third components, and so on down to $a_1 = 0$.

The case $n = 3$ with $q_1 = q$, $q_2 = r$ is of course (1.11) or (1.13).

We can see that (13.4) lies in $H^*(G)$ provided $(1, r_1, \ldots, r_{n-1})$ lies in $R$. For the various exponents of $t$ clearly satisfy $d_{n-2} > d_{n-3} > \cdots > d_1 > d_0$. There is one more exponent, 0; but $d_{n-2} \ne 0$, and from the definition of $R$ we also have $d_{n-3} \ne 0, \ldots, d_0 \ne 0$. Thus the exponents $d_{n-2},\ldots,d_0, 0$ are all distinct, and so no subsum of $x_1, x_2, -x_3, \ldots, -x_n$ vanishes (in fact each of $d_{n-3} = 0, \ldots, d_0 = 0$ does lead to a vanishing subsum). We already remarked that (1.13) is in $H^*$ as long as $r \ne s$, that is $q \ne 1$, that is $r_2 \ne r_1$ as in (13.2).

Now we can prove, as promised, that $H^*(G)$ does not lie in a finite union of sets
$$\Pi = (\pi_0,\pi_1,\ldots,\pi_h)_q = \bigcup_{l_1=0}^{\infty}\cdots\bigcup_{l_h=0}^{\infty}\pi_0\pi_1^{q^{l_1}}\cdots\pi_h^{q^{l_h}}, \tag{13.6}$$
where $q$ and the points $\pi_0,\pi_1,\ldots,\pi_h$ are arbitrary with $h < n-1$. The idea is to note that each $\Pi$ lies in a coset in $\mathbb{G}_m^n$ of dimension at most $h \le n-2$, whereas the points (13.5) have rank $n-1$. Suppose, then, that $H^*(G)$ does lie in such a finite union; we shall reach a contradiction.

Now for each element of $R$ the corresponding (13.5) lies in $H^*(G)$ and so in some $\Pi$. This provides a partition of $R$ into a finite union of subsets $R_\Pi$. By Lemma 13.1 we will be through if we can prove that each $R_\Pi$ lies in a proper subgroup of $\mathbb{Z}^n$.

Suppose for some $\Pi$ we are lucky, in the sense that the corresponding $\pi_0$ in (13.6) is multiplicatively independent of $\xi_1,\ldots,\xi_{n-1}$. The corresponding $\pi_0^{-1}\xi = \pi_0^{-1}\xi_1^{r_1}\cdots\xi_{n-1}^{r_{n-1}}$ all lie in the group generated by $\pi_1,\ldots,\pi_h$, and so the multiplicative rank of the various $\pi_0^{-1}\xi$ is at most $h \le n-2$. Since $\pi_0^{-1},\xi_1,\ldots,\xi_{n-1}$ are independent, it follows that the set $R_\Pi$ cannot contain $n$ (or even $n-1$) independent elements. So it must indeed lie in a proper subgroup of $\mathbb{Z}^n$.

In fact we are not so likely to be that lucky, and it is more probable that there is a relation $\pi_0^a = \xi_1^{a_1}\cdots\xi_{n-1}^{a_{n-1}}$ with $a \ne 0$. Now the $\pi_0^{-a}\xi^a = \xi_1^{ar_1-a_1}\cdots\xi_{n-1}^{ar_{n-1}-a_{n-1}}$ still lie in a group of rank at most $n-2$. Since $\xi_1,\ldots,\xi_{n-1}$ are independent, we deduce as above that the set of all $(ar_1-a_1,\ldots,ar_{n-1}-a_{n-1})$ lies in a proper subgroup of $\mathbb{Z}^{n-1}$, and this implies as above that $R_\Pi$ lies in a proper subgroup of $\mathbb{Z}^n$.

That finishes the proof of the first limitation result. We could also have argued with a symmetrized version of $R$; then the $A$ in the proof of Lemma 13.1 could be taken more simply as $a_0 + a_1y_1 + a_2y_2 + \cdots + a_{n-1}y_{n-1}$.

We can use similar arguments to prove the second limitation result, concerning non-definability over $G$. Because the $[\psi_1,\ldots,\psi_h]T(G)$ in Theorem 1 lead to $(\pi_0,\pi_1,\ldots,\pi_h)$ in Theorem 2 with (12.4) for $\tau$ in $T(G)$, it will again suffice to check the matter for Theorem 3(2).

This we do with the affine line $H$ defined by $tx + y = 1$, also over $K = \mathbb{F}_p(t)$, now with $G$ generated by $t^{p-1}$ and $1-t$. It is the example treated at the end of section 11 with $m = 1$ and $l = p-1$. We need another simple observation.

Lemma 13.2.

For an odd prime $p$ suppose that
$$q_1 + q_2 + q_3 = \tilde q_1 + \tilde q_2 + \tilde q_3 \tag{13.7}$$
for integral powers $q_1, q_2, q_3, \tilde q_1, \tilde q_2, \tilde q_3$ of $p$. Then $\tilde q_1, \tilde q_2, \tilde q_3$ are a permutation of $q_1, q_2, q_3$.

Proof. If $q_1, q_2, q_3$ are all different, then the left-hand side of (13.7) has just three ones in its expansion to base $p$. So also does the right-hand side, which means that $\tilde q_1, \tilde q_2, \tilde q_3$ are also all different. The result in this case is now clear (even for $p = 2$). If, say, $q_1 = q_2 \ne q_3$, then we get a one and a two in the expansion because $p \ne 2$; so after a permutation $\tilde q_1 = \tilde q_2 \ne \tilde q_3$ too, and the result is still clear. Similarly if $q_1 = q_2 = q_3$, as long as $p \ne 3$. This last case can also be checked directly when $p = 3$, and this proves the lemma; however, the example $1 + 1 + 4 = 2 + 2 + 2$ shows that $p = 2$ is not to be saved.

Now the analysis in section 11 before the primitive root business shows easily that the points of $H^*(G) = H(G)$ are given by
$$x = t^{r-1},\quad y = (1-t)^r \qquad (r = 1, p, p^2, \ldots). \tag{13.8}$$
Thus $(x,y) = \xi_0\xi_1^r$ for $\xi_0 = (t^{-1}, 1)$ and $\xi_1 = (t, 1-t)$. Assume $p \ne 2$. If $H^*(G)$ were contained in a finite union of
$$\Pi = (\pi_0,\pi_1)_q = \bigcup_{l=0}^{\infty}\pi_0\pi_1^{q^l}$$
for some $q$ and some $\pi_0, \pi_1$ defined over $G$, then one of these $\Pi$ would certainly contain at least three different points (13.8). This gives equations
$$\xi_0\xi_1^{r} = \pi_0\pi_1^{s},\qquad \xi_0\xi_1^{r'} = \pi_0\pi_1^{s'},\qquad \xi_0\xi_1^{r''} = \pi_0\pi_1^{s''} \tag{13.9}$$
for powers $r < r' < r''$ of $p$ and powers $s, s', s''$ of $q$. Eliminating $\pi_0, \pi_1$ leads to
$$(\xi_0\xi_1^{r})^{s'-s''}(\xi_0\xi_1^{r'})^{s''-s}(\xi_0\xi_1^{r''})^{s-s'} = 1;$$
that is, $\xi_1^a = 1$ for $a = r(s'-s'') + r'(s''-s) + r''(s-s')$. So $a = 0$; that is,
$$rs' + r's'' + r''s = rs'' + r's + r''s'.$$
Lemma 13.2 shows in particular that $rs'$ is one of the terms on the right. But which one? Certainly $rs' \ne r''s'$. And $rs' \ne rs''$, else $s' = s''$ and (13.9) would imply $r' = r''$. It follows that $rs' = r's$. But now eliminating $\xi_0$ and $\pi_0$ from the first two equations in (13.9) leads to $\xi_1^{r'-r} = \pi_1^{s'-s}$. Since $rs' = r's$ gives $s' - s = (r'-r)s/r$, this says that $(\xi_1^r\pi_1^{-s})^{(r'-r)/r} = 1$, so $\xi_1^r\pi_1^{-s}$ has finite order. Thus there would be $\alpha, \beta$ in $\mathbb{F}_p$ with $(\alpha t^{-1}, \beta) = (\alpha, \beta)\xi_0 = \pi_0$; however, this is impossible because $\alpha t^{-1}$ is not in $G$ if $p \ne 2$.

References

[AB] B. Adamczewski and J.P. Bell,
On vanishing coefficients of algebraic power series over fields of positive characteristic, Manuscript 2010.

[AV] D. Abramovich and F. Voloch, Toward a proof of the Mordell-Lang conjecture in characteristic p, International Math. Research Notices (1992), 103-115.

[BG] E. Bombieri and W. Gubler, Heights in diophantine geometry, New Mathematical Monographs, Cambridge 2006.

[BMZ] E. Bombieri, D. Masser and U. Zannier, Intersecting a plane with algebraic subgroups of multiplicative groups, Ann. Scuola Norm. Sup. Pisa Cl. Sci. (5) VII (2008), 51-80.

[Ca] J.W.S. Cassels, An introduction to the geometry of numbers, Classics in Math., Springer 1971.

[CMP] L. Cerlienco, M. Mignotte and F. Piras, Suites récurrentes linéaires: propriétés algébriques et arithmétiques, L'Enseignement Mathématique (1987), 67-108.

[D] H. Derksen, A Skolem-Mahler-Lech theorem in positive characteristic and finite automata, Invent. Math. (2007), 175-224.

[E] J.-H. Evertse, On sums of S-units and linear recurrences, Compositio Mathematica (1984), 225-244.

[ESS] J.-H. Evertse, H.P. Schlickewei and W.M. Schmidt, Linear equations in variables which lie in a multiplicative group, Annals of Math. (2002), 807-836.

[EZ] J.-H. Evertse and U. Zannier, Linear equations with unknowns from a multiplicative group in a function field, Acta Arith. (2008), 159-170.

[Gh] D. Ghioca, The isotrivial case in the Mordell-Lang theorem, Trans. Amer. Math. Soc. (2008), 3839-3856.

[GM] D. Ghioca and R. Moosa, Division points on subvarieties of isotrivial semiabelian varieties, Internat. Math. Res. Notices 2006, Article ID 65437, 1-23.

[Go] D. Goss, Basic structures of function field arithmetic, Ergebnisse der Math., Springer 1996.

[He] D.R. Heath-Brown, Artin's conjecture for primitive roots, Quart. J. Math. Oxford Ser. (2) (1986), 27-38.

[HP] W.V.D. Hodge and D. Pedoe, Methods of algebraic geometry I, Cambridge 1968.

[Hr] E. Hrushovski, The Mordell-Lang conjecture for function fields, J. Amer. Math. Soc. (1996), 667-690.

[HW] L.-C. Hsia and J. T.-Y. Wang, The ABC theorem for higher-dimensional function fields, Trans. Amer. Math. Soc. (2003), 2871-2887.

[La1] S. Lang, Introduction to algebraic geometry, Addison-Wesley 1973.

[La2] S. Lang, Fundamentals of diophantine geometry, Springer 1983.

[Le] D. Leitner, Linear equations in positive characteristic, Master Thesis, University of Basel 2008.

[Maso] R.C. Mason, Diophantine equations over function fields, London Math. Soc. Lecture Notes, Cambridge 1984.

[Mass] D. Masser, Mixing and linear equations over groups in positive characteristic, Israel J. Math. (2004), 189-204.

[MS1] R. Moosa and T. Scanlon, The Mordell-Lang conjecture in positive characteristic revisited, in: Model theory and applications, Quaderni di matematica (eds. L. Belair, Z. Chatzidakis, P. D'Aquino, D. Marker, M. Otero, F. Point and A. Wilkie), Dipartimento di Matematica, Seconda Università di Napoli (2002), 273-296.

[MS2] R. Moosa and T. Scanlon, F-structures and integral points on semiabelian varieties over finite fields, American Journal of Mathematics (2004), 473-522.

[PS] A.J. van der Poorten and H.P. Schlickewei, Additive relations in fields, Journal of the Australian Mathematical Society A 51 (1991), 154-170.

[S] W.M. Schmidt, Diophantine approximations and diophantine equations, Lecture Notes in Math., Springer 1991.

[SV] T. Struppeck and J.D. Vaaler, Inequalities for heights of algebraic subspaces and the Thue-Siegel principle, in: Analytic Number Theory (Allerton Park 1989), Progress in Math., Birkhäuser Boston 1990, 493-528.

[T] J. Thunder, Siegel's lemma for function fields, Michigan Math. J. (1995), 147-162.

[V] J.F. Voloch, The equation ax + by = 1 in characteristic p, J. Number Theory (1998), 195-200.

H. Derksen: Department of Mathematics, University of Michigan, East Hall, 530 Church Street, Ann Arbor, Michigan 48104, U.S.A.

D. Masser: Mathematisches Institut, Universität Basel, Rheinsprung 21, 4051 Basel, Switzerland (