Counting points of bounded height in monoid orbits

Abstract

Given a set of endomorphisms on $\mathbb{P}^N$, we establish an upper bound on the number of points of bounded height in the associated monoid orbits. Moreover, we give a more refined estimate with an associated lower bound when the monoid is free. Finally, we show that most sets of rational functions in one variable satisfy these more refined bounds.


arXiv [math.NT]

WADE HINDES (with appendix by UMBERTO ZANNIER)

Introduction

Let $H$ be the absolute multiplicative Weil height on $\mathbb{P}^N(\overline{\mathbb{Q}})$ and let $K$ be a number field. Then given a subset $X \subseteq \mathbb{P}^N(\overline{\mathbb{Q}})$ of interest in some context, the growth rate of the number of $K$-points in $X$ of bounded height,

$$X(K,B) := \#\{Q \in X \cap \mathbb{P}^N(K) : H(Q) \le B\},$$

is known to encode interesting invariants of $X$ and $K$. For instance, if $X = \mathbb{P}^N(K)$, then $X(K,B) \sim C_{K,N}\, B^{(N+1)[K:\mathbb{Q}]}$, where $C_{K,N}$ depends on the regulator, class group, etc. of $K$. If $X$ is an abelian variety, then $X(K,B) \sim C_{K,X} \log(B)^{r/2}$, where $r$ is the rank of the Mordell-Weil group $X(K)$. If $X$ is a smooth curve of genus at least 2, then $X(K,B) \sim C_{K,X}$. More generally, if $X$ is a thin set, i.e., a proper Zariski closed subset or the image of some generically finite morphism of degree at least two, then Theorem 3 in [22, §13.1] implies that

$$X(K,B) \ll B^{(N+1/2)[K:\mathbb{Q}]} \log(B). \tag{1}$$

Likewise there are a few height-counting results in arithmetic dynamics, where orbits play the role of $X$; see [2, 13, 16, 25, 26] for examples on Markoff varieties, K3 surfaces, and projective space. For instance, suppose that $\phi$ is a dominant rational self-map of $\mathbb{P}^N$ with dynamical degree $\delta_\phi > 1$. Then, if $P \in \mathbb{P}^N(K)$ is a point such that the orbit $\mathrm{Orb}_\phi(P) = \{\phi^n(P)\}_{n \ge 0}$ is Zariski dense, the Kawaguchi-Silverman Conjecture predicts that

$$\#\{Q \in \mathrm{Orb}_\phi(P) : H(Q) \le B\} \sim \log(\delta_\phi)^{-1} \log\log(B); \tag{2}$$

see [16] for the relevant definitions and background. Of course, this asymptotic is known in the case of morphisms, when $\deg(\phi) = \delta_\phi > 1$ and $P$ is not preperiodic. Similarly, if $S = \{\phi_1, \dots, \phi_s\}$ is a set of endomorphisms of degree at least two equipped with a probability measure $\nu$, then for almost every sequence $\gamma$ of elements of $S$, we have the analogous asymptotic to (2) for random orbits:

$$\#\{Q \in \mathrm{Orb}_\gamma(P) : H(Q) \le B\} \sim \log(\delta_{S,\nu})^{-1} \log\log(B), \qquad \delta_{S,\nu} = \prod_{\phi \in S} \deg(\phi)^{\nu(\phi)}. \tag{3}$$
Here, the bound holds for all $P$ with large enough height; see [13, Corollary 1.3] for details.

In this paper, we study the problem of counting points of bounded height in monoid (or semigroup) orbits in $\mathbb{P}^N$, that is, counting all of the points of bounded height obtained by applying all possible compositions of maps within a fixed set $S$ to a given initial point $P$; compare to [2, 26]. Intuitively, one expects that if the maps in $S$ are related in some way (for instance, if they commute), then this should cut down the number of possible points in the associated orbits. However, for most $S$ we expect to see no relations (free monoids), and with this in mind, we have the following result; here and throughout, $M_S$ denotes the monoid generated under composition by a set $S$ of endomorphisms of $\mathbb{P}^N$ defined over $\overline{\mathbb{Q}}$.

Mathematics Subject Classification: Primary: 37P15, 37P05. Secondary: 11G50, 11D45.

Theorem 1.1.

Let $S = \{\phi_1, \dots, \phi_s\}$ be a set of endomorphisms on $\mathbb{P}^N(\overline{\mathbb{Q}})$ with distinct degrees all at least two. If $M_S$ is free, then for all $\epsilon > 0$ there exists an effectively computable positive constant $b = b(S, \epsilon)$ and a constant $B_S$ depending only on $S$ such that

$$(\log B)^{b} \ll \#\{f \in M_S : H(f(P)) \le B\} \ll (\log B)^{b+\epsilon}$$

holds for all $P \in \mathbb{P}^N(\overline{\mathbb{Q}})$ with $H(P) > B_S$. Moreover, the implicit constants and error terms depend on $P$ and are effectively computable if $B_S$ is.

Remark 1. When $S = \{\phi_1, \phi_2\}$ generates a free monoid with $\deg(\phi_1) = 2$ and $\deg(\phi_2) = 3$, we give explicit computations for the bounds in Theorem 1.1 in Example 1 below.

In particular, we can use the upper bound in Theorem 1.1 on the number of functions in the free case to give an upper bound on the number of points of bounded height in arbitrary dynamical orbits; compare to (1), to [2, Theorem 4.15], and to the asymptotic for abelian varieties above. In what follows, $\mathrm{Orb}_S(P) = \{f(P) : f \in M_S\}$ denotes the total orbit of $P$ under the monoid $M_S$.

Corollary 1.2.

Let $S = \{\phi_1, \dots, \phi_s\}$ be a set of endomorphisms on $\mathbb{P}^N(\overline{\mathbb{Q}})$ all of degree at least two (and distinct if $s \ge 2$). Then there exists an effectively computable positive constant $b$ and a constant $B_S$ depending only on $S$ such that

$$\#\{Q \in \mathrm{Orb}_S(P) : H(Q) \le B\} \ll (\log B)^{b}$$

holds for all $P \in \mathbb{P}^N(\overline{\mathbb{Q}})$ with $H(P) > B_S$.

Remark 2. Although we expect that $\log(B)^b$ is also a lower bound for some choice of $b$ and most $S$ (see Conjecture 1.3 and Theorem 1.6 below), we note that it is only an upper bound in general, even for $s \ge 2$. For instance, if $M_S$ is a free commutative monoid (e.g., if $S$ is a certain set of monic power maps), then the asymptotic height growth rate in orbits will be a constant times $\log\log(B)$; see [13, §5] for details. This matches the case of a single map (also a commutative monoid); see also (2) and (3) above.

Motivated by the upper and lower bounds in Theorem 1.1, we conjecture the following exact asymptotic for the number of points (not functions) of bounded height in total orbits associated to free monoids:

Conjecture 1.3.

Let $S = \{\phi_1, \dots, \phi_s\}$ be a set of endomorphisms on $\mathbb{P}^N(\overline{\mathbb{Q}})$ with distinct degrees all at least two. If $M_S$ is free, then there exist constants $a_P = a(S, P)$ and $b = b(S)$ such that

$$\lim_{B \to \infty} \frac{\#\{Q \in \mathrm{Orb}_S(P) : H(Q) \le B\}}{(\log B)^{b}} = a_P$$

holds for all sufficiently generic $P \in \mathbb{P}^N(\overline{\mathbb{Q}})$ (i.e., all $P \in \mathbb{P}^N(\overline{\mathbb{Q}})$ outside of the union of a proper Zariski closed subset and a set of points of bounded height).

Remark 3. Hence, we expect most monoid orbits in $\mathbb{P}^N$ to exhibit height growth similar to that of orbits on Markoff varieties [26], orbits on K3 surfaces in $\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1$ given by $(2,2,2)$-forms [2, Theorem 4.5], and Mordell-Weil groups of abelian varieties. However, in these cases the relevant monoids (or the underlying varieties themselves) form groups, and there is less need to distinguish between counting functions and points. For instance, if there are inverses in $M_S$, distinct functions that agree at a point determine a non-trivial fixed point, and these fixed points can typically be controlled. On the other hand, in the case of abelian varieties (where one considers the monoid generated by multiplication maps), distinct functions that agree at a point determine a torsion point. Thus this situation may be avoided by throwing away a set of bounded height.

To motivate our conjecture, we restrict our attention to morphisms of $\mathbb{P}^1$. To state our results in this setting, recall that $w \in \mathbb{P}^1(\mathbb{C})$ is called a critical value of $\phi \in \mathbb{C}(x)$ if $\phi^{-1}(w)$ contains fewer than $\deg(\phi)$ elements. Likewise, we call a critical value $w$ of $\phi$ simple if $\phi^{-1}(w)$ contains exactly $\deg(\phi) - 1$ points. In particular, we have the corresponding notions for sets:

Deﬁnition 1.4.

Let $S = \{\phi_1, \dots, \phi_s\}$ be a set of rational maps on $\mathbb{P}^1$ and let $C_{\phi_i}$ denote the set of critical values of $\phi_i$. Then $S$ is called critically separate if $C_{\phi_i} \cap C_{\phi_j} = \emptyset$ for all $i \ne j$. Moreover, $S$ is called critically simple if every critical value of every $\phi \in S$ is simple.

As evidence for Conjecture 1.3 above, we establish the following weak version for generic sets of rational maps on $\mathbb{P}^1$. In particular, we are able to count points instead of just functions.

Theorem 1.5.

Let $S = \{\phi_1, \dots, \phi_s\}$ be a set of rational maps on $\mathbb{P}^1(\overline{\mathbb{Q}})$ with distinct degrees all at least four. If $S$ is critically separate and critically simple, then $M_S$ is a free monoid and for all $\epsilon > 0$ there exists an effectively computable positive constant $b = b(S, \epsilon)$ and a constant $B_S$ depending only on $S$ such that

$$(\log B)^{b} \ll \#\{Q \in \mathrm{Orb}_S(P) : H(Q) \le B\} \ll (\log B)^{b+\epsilon}$$

holds for all $P \in \mathbb{P}^1(\overline{\mathbb{Q}})$ with $H(P) > B_S$.

Finally, since the theorem above does not directly apply to sets of polynomials, we give a different proof in this case which works quite generically. In what follows, for $m \ge 2$ the polynomials $Z_m = x^m$ are called cyclic polynomials and the polynomials $T_m$ satisfying $T_m(x + x^{-1}) = x^m + x^{-m}$ are called Chebychev polynomials (of the first kind).
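The defining identity for $T_m$ can be checked with exact rational arithmetic. The following is a minimal sketch, not taken from the paper: it assumes the standard three-term recurrence $T_0 = 2$, $T_1 = y$, $T_{k+1} = y\,T_k - T_{k-1}$ for this normalization, and the function names are hypothetical.

```python
from fractions import Fraction


def dickson_chebyshev(m, y):
    """Evaluate T_m(y), normalized so that T_m(x + 1/x) = x^m + x^(-m).

    Assumed recurrence: T_0 = 2, T_1 = y, T_{k+1} = y*T_k - T_{k-1}.
    """
    if m == 0:
        return Fraction(2)
    prev, cur = Fraction(2), Fraction(y)
    for _ in range(m - 1):
        prev, cur = cur, y * cur - prev
    return cur


def check_identity(m, x):
    """Return True if T_m(x + 1/x) == x^m + x^(-m) for a nonzero rational x."""
    x = Fraction(x)
    return dickson_chebyshev(m, x + 1 / x) == x**m + x**(-m)
```

For example, `check_identity(5, Fraction(3, 2))` compares both sides exactly, with no floating-point error involved.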

Theorem 1.6.

Let $S = \{\phi_1, \dots, \phi_s\}$ be a set of polynomials defined over $\overline{\mathbb{Q}}$, and let $a_i x^{d_i}$ denote the leading term of $\phi_i$. Suppose that $S$ satisfies the following conditions:

(1) The set of degrees $\{d_1, \dots, d_s\}$ is a multiplicatively independent set in $\mathbb{Z}$.

(2) The set of leading coefficients $\{a_1, \dots, a_s\}$ is a multiplicatively independent set in $\overline{\mathbb{Q}}^{*}$.

(3) Each $\phi \in S$ is not of the form $F \circ E \circ L$ for some polynomial $F \in \overline{\mathbb{Q}}[x]$, some cyclic or Chebychev polynomial $E$, and some linear $L \in \overline{\mathbb{Q}}[x]$.

Then $M_S$ is a free monoid and for all $\epsilon > 0$ there exists an effectively computable positive constant $b = b(S, \epsilon)$ and a constant $B_S$ depending only on $S$ such that

$$(\log B)^{b} \ll \#\{Q \in \mathrm{Orb}_S(P) : H(Q) \le B\} \ll (\log B)^{b+\epsilon}$$

holds for all $P \in \mathbb{P}^1(\overline{\mathbb{Q}})$ with $H(P) > B_S$.

We briefly outline the proofs of our results in dimension one above. The first step is to show that $M_S$ is free. For rational functions, this follows from the genus calculations in [20] and Picard's theorem. For polynomials, conditions (1) and (2) of Theorem 1.6 imply that $M_S$ is free; see Theorem 4.2 below. In particular, Theorem 1.1 implies the desired growth rate on the number of functions $f \in M_S$ with $H(f(P)) \le B$ in both cases. On the other hand, for rational functions the genus calculations in [20], Faltings' Theorem, and Tate's telescoping Lemma 2.2 imply that

$$\#\{f \in M_S : f(P) = Q\} \tag{4}$$

is uniformly bounded for all $Q \in \mathrm{Orb}_S(P)$ of sufficiently large height; see Lemma 4.10 below. Likewise, the same property holds for polynomials by condition (3) of Theorem 1.6 and the integral point classification theorems in [3] and the Appendix. From here, the desired estimate for orbits (for both rational and polynomial functions) follows from Theorem 1.1 and the uniform bounds on (4). We note that it is possible that the full classification theorems in [1, 3] can be used to strengthen the statement of Theorem 1.6, without reference to leading terms and degrees. However, we have endeavored to give as self-contained and broadly applicable a statement as possible.

Acknowledgements:

We thank Yuri Bilu, Andrew Bridy, Alexander Evetts, Joseph Silverman, and Umberto Zannier for discussions related to this paper. We also thank the authors of [14]; Lemma 3.2 in their paper inspired the proof of Theorem 4.2. Finally, we are especially grateful to Umberto Zannier (again) for including the appendix to this paper.

Auxiliary results

To count points of bounded height in orbits, we recall some basic facts about heights and generating functions. However, as motivation for what is to come, we begin with a brief sketch of the proof of Theorem 1.1, an important ingredient for all other results in this paper. The basic idea, consistent with our earlier work on orbits attached to sequences in [10, 11], is that the logarithmic height of a point $f(P) \in \mathrm{Orb}_S(P)$ is roughly determined by the size of $\deg(f)$, as long as the initial point $P$ is sufficiently generic; see Lemma 2.2 below. With this in mind, to count the number of functions $f \in M_S$ with $\log H(f(P)) \le B$, we should in some sense simply be counting the number of $f$'s of bounded degree. In particular, when $M_S$ is a free monoid, we can relate the number of $f \in M_S$ with bounded degree to the number of restricted integer compositions of bounded size, once we approximate $\log\deg(\phi)$ for all $\phi \in S$ by rational numbers. Finally, we use generating functions (and the location of their poles via Lemma 2.6 and Lemma 2.5 below) to estimate the number of restricted integer compositions of bounded size. These facts together imply Theorem 1.1. With this sketch in place, we move on and review some basic facts about heights.

Remark 4. Since multiplicative heights tend to grow exponentially when evaluating functions, it is convenient to use the logarithmic height $h = \log \circ H$ (instead of $H$) to state certain height estimates in dynamics. However, since height-counting on varieties is usually done with multiplicative heights, we convert back to $H$ at the end of the proof of Theorem 1.1, to be consistent with similar results in the literature.

Suppose that $\phi : \mathbb{P}^N(\overline{\mathbb{Q}}) \to \mathbb{P}^N(\overline{\mathbb{Q}})$ is a morphism defined over $\overline{\mathbb{Q}}$ of degree $d_\phi$. Then it is well known that

$$h(\phi(P)) = d_\phi\, h(P) + O_\phi(1) \quad \text{for all } P \in \mathbb{P}^N(\overline{\mathbb{Q}}); \tag{5}$$

see, for instance, [24, Theorem 3.11].
With this in mind, we let

$$C(\phi) := \sup_{P \in \mathbb{P}^N(\overline{\mathbb{Q}})} \bigl| h(\phi(P)) - d_\phi\, h(P) \bigr| \tag{6}$$

be the smallest constant needed for the bound in (5). Then, in order to control height growth rates when composing arbitrary elements of a set of endomorphisms, we define the following fundamental notion; compare to [10, 11, 15].

Definition 2.1.

A set $S$ of endomorphisms of $\mathbb{P}^N(\overline{\mathbb{Q}})$ is called height controlled if the following properties hold:

(1) $d_S := \inf\{d_\phi : \phi \in S\}$ is at least 2.

(2) $C_S := \sup\{C(\phi) : \phi \in S\}$ is finite.

Remark 5. We note first that any finite set of morphisms of degree at least 2 is height controlled. To construct infinite collections, let $T$ be any non-constant set of maps on $\mathbb{P}^1$ and let $S_T = \{\phi \circ x^d : \phi \in T,\ d \ge 2\}$. Then $S_T$ is height controlled and infinite; a similar construction works for $\mathbb{P}^N$ in any dimension.

Remark 6. Although the results in this paper are for finite $S$, we include the notion of height controlled sets to motivate future work. For instance, many of the tools used below (canonical heights, generating functions, etc.) work perfectly well for infinite sets. However, the generating functions that appear in this case are not rational, which adds some subtlety.

As in the case of iterating a single function, it is Tate's telescoping Lemma (generalized below) that allows us to transfer information back and forth between heights and degrees; for a proof, see [10, Lemma 2.1].

Lemma 2.2.

Let $S$ be a height controlled set of endomorphisms of $\mathbb{P}^N(\overline{\mathbb{Q}})$, and let $d_S$ and $C_S$ be the corresponding height controlling constants. Then for all $f \in M_S$,

$$\left| \frac{h(f(Q))}{\deg(f)} - h(Q) \right| \le \frac{C_S}{d_S - 1} \quad \text{for all } Q \in \mathbb{P}^N(\overline{\mathbb{Q}}).$$
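The telescoping bound can be observed numerically on a toy example. The sketch below is illustrative only and is not from the paper: it assumes the two polynomial maps $x^2 + 1$ and $x^3 + 2$ on $\mathbb{P}^1(\mathbb{Q})$ (hypothetical choices), for which a direct hand estimate on rational points gives $|h(\phi(Q)) - d_\phi h(Q)| \le \log 3$, so that Lemma 2.2 predicts a gap of at most $\log(3)/(2-1)$ along rational orbits.

```python
import math
from fractions import Fraction
from itertools import product


def log_height(q):
    """Logarithmic Weil height of a rational number a/b in lowest terms."""
    return math.log(max(abs(q.numerator), abs(q.denominator)))


# Two illustrative polynomial maps on P^1(Q), stored as (degree, function).
# These specific maps are assumptions made for the demonstration.
MAPS = {
    "phi": (2, lambda x: x * x + 1),
    "psi": (3, lambda x: x**3 + 2),
}


def telescoping_gap(word, q):
    """Return |h(f(q))/deg(f) - h(q)| for the composition f named by `word`."""
    point, degree = q, 1
    for name in reversed(word):  # the rightmost map is applied first
        d, f = MAPS[name]
        point, degree = f(point), degree * d
    return abs(log_height(point) / degree - log_height(q))


def max_gap(q, max_length):
    """Largest gap over all nonempty words of length at most `max_length`."""
    return max(
        telescoping_gap(word, q)
        for n in range(1, max_length + 1)
        for word in product(MAPS, repeat=n)
    )
```

Calling `max_gap(Fraction(3, 2), 4)` evaluates all 30 words of length at most four; exact `Fraction` arithmetic keeps the orbit points in lowest terms, so only the final logarithms are floating-point.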

Now that we have a tool to pass from functions yielding a bounded height relation to functions of bounded degree (via Lemma 2.2), we next relate counting functions of bounded degree to counting restricted integer compositions; this is essentially achieved by the fact that $\log\deg(F \circ G) = \log\deg(F) + \log\deg(G)$ for all endomorphisms $F$ and $G$. However, to make this idea precise, we briefly discuss integer compositions, a classical object of study in combinatorics. For more details, see [6, §I.3.1].

Let $T \subseteq \mathbb{N}_{>0}$ be a collection of positive integers (not necessarily finite). Then a restricted composition of an integer $n$ with summands in $T$ (or a $T$-restricted composition of $n$) is an ordered collection of elements in $T$ whose sum is $n$. For instance, […] and […] are two different restricted compositions of […] for the set $T = \{[\ldots]\}$. Given $n$, let $f^T_n$ be the number of distinct ways of writing $n$ as a composition with summands (parts) in $T$. Then to give an asymptotic for $f^T_n$, one can try to understand the ordinary generating function $f^T(z) = \sum_n f^T_n z^n$. In particular, if in addition $f^T(z)$ is a rational or meromorphic function, then the radius of convergence of the generating function, determined by the poles of $f^T(z)$, can be used to deduce an asymptotic for $f^T_n$. Luckily, the generating functions for restricted compositions are particularly simple rational functions; see Proposition I.1 in [6].

Proposition 2.3.

The ordinary generating function of the number of compositions having summands restricted to a set $T \subseteq \mathbb{N}_{>0}$ is given by

$$f^T(z) = \frac{1}{1 - \sum_{n \in T} z^n}.$$

As mentioned above, once we have an expression for $f^T(z)$ as a rational function, we can use the poles of $f^T(z)$ to estimate the $f^T_n$. Specifically, we have the following theorem, a simple consequence of partial fractions and Newton expansion. In what follows, if $F(z) = \sum_n a_n z^n$ is a power series expansion about $z = 0$ for a meromorphic function $F$, then we use the notation $[z^n]F(z) = a_n$ to extract coefficients.

Theorem 2.4 (Expansion of rational functions). If $F(z)$ is a rational function that is analytic at zero and has poles at points $\alpha_1, \alpha_2, \dots, \alpha_m$, then its coefficients (as a power series about 0) are a sum of exponential-polynomials: there exist $m$ polynomials $\{\Pi_j(x)\}_{j=1}^{m}$ such that for $n$ larger than some fixed $n_0$,

$$[z^n]F(z) = \sum_{j=1}^{m} \Pi_j(n)\, \alpha_j^{-n}.$$

Furthermore, the degree of $\Pi_j$ is equal to the order of the pole of $F$ at $\alpha_j$ minus one.

In particular, after combining Proposition 2.3 and Theorem 2.4, we see that to obtain an asymptotic formula for the number of integer compositions whose parts are restricted to the set $\{n_1, \dots, n_s\}$, we must control the roots of smallest modulus of $g(z) = 1 - (z^{n_1} + \cdots + z^{n_s})$. With this in mind, we have the following elementary lemma.

Lemma 2.5.

Let $n_1, n_2, \dots, n_s$ be positive integers satisfying $\gcd(n_1, n_2, \dots, n_s) = 1$. Then the polynomial $g(z) = 1 - (z^{n_1} + z^{n_2} + \cdots + z^{n_s})$ has a unique complex root $\alpha$ of smallest modulus. Moreover, $\alpha$ is the unique positive real root of $g$, and $\alpha$ has multiplicity one.

Proof. We first show that any positive real root $\alpha$ of $g$ is a root of smallest modulus for $g$ (clearly $g$ has a positive root by the Intermediate Value Theorem). This is a simple consequence of Rouché's Theorem: let $r < \alpha$, let $p(z) = -1 - z^{n_1} - \cdots - z^{n_s}$, and let $q(z) = 2$. Then for all $|z| = r$, we have that

$$|p(z)| = |-1 - z^{n_1} - \cdots - z^{n_s}| \le 1 + |z|^{n_1} + \cdots + |z|^{n_s} = 1 + r^{n_1} + \cdots + r^{n_s} < 1 + \alpha^{n_1} + \cdots + \alpha^{n_s} = 2 - (1 - \alpha^{n_1} - \cdots - \alpha^{n_s}) = 2 = |q(z)|$$

by the triangle inequality and since $\alpha$ is a root of $g$. In particular, $p$ and $q$ are holomorphic functions on the disc $D_r$ of radius $r$ such that $|p(z)| < |q(z)|$ on the boundary $\partial D_r$. Hence, Rouché's Theorem implies that $q$ and $q + p = g$ have the same number of roots inside $D_r$. Therefore, $g$ has no complex roots in $D_r$, and $\alpha$ is a root of smallest modulus for $g$. On the other hand, it is clear that $g$ restricted to the positive real numbers is strictly decreasing.

Hence, $g$ has only one positive real root. Likewise, it is easy to see that $g'(\alpha) < 0$ (since $\alpha$ is positive). Hence, $\alpha$ must be a root of multiplicity one for $g$.

We next show that $\alpha$ is the unique complex root of $g$ of smallest modulus. This portion of the proof of Lemma 2.5 follows from results and arguments in [6, IV.6], namely the "Daffodil Lemma" [6, Lemma IV.1] and the proof of [6, Proposition IV.3] on the commensurability of dominant directions for rational generating functions arising from regular languages. To see this, suppose that $\zeta = \alpha e^{i\theta}$ is another root of smallest modulus of $g$. Let $f(z) = z^{n_1} + \cdots + z^{n_s}$, so that $\zeta$ satisfies $|f(\zeta)| = |1| = 1 = f(\alpha) = f(|\zeta|)$. In particular, [6, Lemma IV.1] implies that $\theta = 2\pi r/p$ for some integers $0 \le r < p$ with $\gcd(r, p) = 1$ (when $r \ne 0$). Moreover, $f$ admits $p$ as a span; see [6, Definition IV.5]. In particular (since $f$ admits $p$ as a span), $f(z) = z^a h(z^p)$ for some polynomial $h$ and some non-negative integer $a$. Note also that $\gcd(a, p) = 1$, since $\gcd(n_1, \dots, n_s) = 1$ by assumption. On the other hand,

$$f(\zeta) = \zeta^a h(\zeta^p) = (\alpha e^{2\pi i r/p})^a\, h\bigl((\alpha e^{2\pi i r/p})^p\bigr) = e^{2\pi i a r/p}\, \alpha^a h(\alpha^p) = e^{2\pi i a r/p} f(\alpha) = e^{2\pi i a r/p}.$$

Hence, $ar/p \in \mathbb{Z}$. But this is impossible unless $r = 0$, since $\gcd(ar, p) = 1$ otherwise. In particular, $\zeta = \alpha$ and $\alpha$ is the unique complex root of $g$ of smallest modulus as claimed. $\square$

Lastly, we include a technical result that allows us to approximate the number of bounded compositions whose parts are restricted to the set of non-integers $\{\log\deg(\phi_1), \dots, \log\deg(\phi_s)\}$, a task that is equivalent to counting the number of functions in $M_S$ of bounded degree, by integer compositions whose parts satisfy the gcd condition needed to apply Lemma 2.5.

Lemma 2.6.

Let $c_1 < c_2 < \cdots < c_s$ be distinct positive real numbers. Then for all $\delta > 0$ there exist positive integers $n_1, \dots, n_s$, $m_1, \dots, m_s$ and $u$ such that the following conditions hold:

(1) $c_i - \delta \le \dfrac{n_i}{u} < c_i < \dfrac{m_i}{u} \le c_i + \delta$.

(2) $\gcd(n_1, \dots, n_s) = 1 = \gcd(m_1, \dots, m_s)$.

Remark 7. In particular, we may assume that $n_1 < \cdots < n_s$ and $m_1 < \cdots < m_s$ by choosing $\delta$ sufficiently small.

Proof.

Clearly integers $n_1, \dots, n_s$, $m_1, \dots, m_s$ and $u$ satisfying condition (1) of Lemma 2.6 exist. Therefore, to find integers satisfying both (1) and (2), we choose integers satisfying (1) and deform them to ensure that both conditions hold. Specifically, fix an integer $r > 0$, let $v = (n_1 \cdots n_s\, m_1 \cdots m_s\, u)^r$, and define a new list as follows:

$$n_1' = n_1 v + 1, \quad n_i' = n_i v, \quad m_1' = m_1 v + 1, \quad m_i' = m_i v, \quad u' = u v \quad \text{for all } i \ne 1. \tag{7}$$

In particular, we note that $\dfrac{n_i'}{u'} = \dfrac{n_i}{u}$ and $\dfrac{m_i'}{u'} = \dfrac{m_i}{u}$ for all $i \ge 2$, and that

$$\frac{n_1'}{u'} = \frac{n_1}{u} + \frac{1}{uv} \quad \text{and} \quad \frac{m_1'}{u'} = \frac{m_1}{u} + \frac{1}{uv}.$$

Therefore, we may certainly choose $r$ sufficiently large so that $n_1', \dots, n_s'$, $m_1', \dots, m_s'$ and $u'$ satisfy condition (1), since the original sequence does. On the other hand, it is easy to see that $\gcd(n_1', n_i') = 1$ and $\gcd(m_1', m_i') = 1$ for all $i \ge 2$ by construction. For instance, suppose that $p$ is a prime such that $p \mid n_1'$ and $p \mid n_i'$ for some $i \ge 2$. Then since $p \mid n_i'$, we see that $p \mid v$ or $p \mid n_i$. But if $p \mid v$, then $p \mid n_1 v$ and $p \mid n_1'$. In particular, $p \mid (n_1' - n_1 v) = 1$ by (7), a contradiction. Likewise, if $p \mid n_i$, then $p \mid v$ by definition of $v$. Therefore, we may repeat the argument above to reach a contradiction. Similarly, the fact that $\gcd(m_1', m_i') = 1$ holds for all $i \ge 2$ follows mutatis mutandis. In particular, we see that both conditions (1) and (2) of Lemma 2.6 hold for the new list $n_1', \dots, n_s'$, $m_1', \dots, m_s'$ and $u'$, which completes the proof. $\square$

Height-counting in orbits

With the necessary background in place, we are ready to prove the bounds on the number of functions $f \in M_S$ yielding a bounded height relation from the Introduction.

(Proof of Theorem 1.1). Let $S = \{\phi_1, \dots, \phi_s\}$ be a finite set of endomorphisms on $\mathbb{P}^N$ all of degree at least 2, and suppose that the monoid $M_S$ generated by $S$ under composition is free. We begin by defining some lengths on $M_S$, which we then relate to integer compositions.

Given any vector $\mathbf{v} = (v_1, \dots, v_s) \in \mathbb{R}^s_{>0}$ of positive real weights, we define $l_{S,\mathbf{v}}(\phi_i) = v_i$ for $\phi_i \in S$ and extend $l_{S,\mathbf{v}}$ to all $f \in M_S$ by

$$l_{S,\mathbf{v}}(f) = \sum_{j=1}^{n} l_{S,\mathbf{v}}(\theta_j), \quad \text{where } f = \theta_1 \circ \theta_2 \circ \cdots \circ \theta_n \text{ for some } \theta_j \in S. \tag{8}$$

Remark 8. Note that since $S$ is a free basis of $M_S$ there is a unique way to write $f$ as a composition of elements of $S$. In particular, $l_{S,\mathbf{v}}$ is a well-defined function. Alternatively, in the non-free case one can define $l_{S,\mathbf{v}}(f)$ by taking an inf over the possible expressions in (8).

On the other hand, since $M_S$ is a set of functions there is a natural choice of weighting given by $\mathbf{c} = (c_1, \dots, c_s)$ where $c_i = \log\deg(\phi_i)$; moreover, we assume $c_1 < c_2 < \cdots < c_s$. In particular, it follows from the fact that $\deg(F \circ G) = \deg(F) \cdot \deg(G)$ for morphisms that

$$l_{S,\mathbf{c}}(f) = \log\deg(f) \quad \text{for all } f \in M_S, \tag{9}$$

independent of the generating set. However, non-integer weights (like logs of integers) appear sparingly in the literature, and so we approximate the growth rate of $l_{S,\mathbf{c}}$ (which relates to the growth rate of heights in orbits via Tate's telescoping argument) using integer weights.

To wit, choose positive integers $n_1, \dots, n_s$, $m_1, \dots, m_s$ and $u$ depending on $\delta$ as in Lemma 2.6 and Remark 7. Then it follows by construction that $u^{-1}\, l_{S,\mathbf{n}}(f) \le l_{S,\mathbf{c}}(f) \le u^{-1}\, l_{S,\mathbf{m}}(f)$ for all $f \in M_S$. Hence,

$$\{f \in M_S : l_{S,\mathbf{m}}(f) \le uB\} \subseteq \{f \in M_S : l_{S,\mathbf{c}}(f) \le B\} \subseteq \{f \in M_S : l_{S,\mathbf{n}}(f) \le uB\} \tag{10}$$

holds for all positive $B$; here $\mathbf{n} = (n_1, \dots, n_s)$ and $\mathbf{m} = (m_1, \dots, m_s)$. Now given a positive integer $n$ we define

$$L_n := \#\{f \in M_S : l_{S,\mathbf{m}}(f) = n\} \quad \text{and} \quad U_n := \#\{f \in M_S : l_{S,\mathbf{n}}(f) = n\}. \tag{11}$$

In particular, since $\mathbf{n}$ and $\mathbf{m}$ are integer weight vectors, it follows from (9), (10) and (11) that

$$\sum_{n=0}^{[uB]} L_n \le \#\{f \in M_S : \log\deg(f) \le B\} \le \sum_{n=0}^{[uB]} U_n. \tag{12}$$

Here $[uB]$ denotes the nearest integer to $uB$.
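The sandwich in (10) and (12) can be tested on a small case. The following is a sketch under stated assumptions, not the paper's computation: it takes two generators of degrees 2 and 3 in a free monoid, the hypothetical choice $u = 7$, $\mathbf{n} = (4, 7)$, $\mathbf{m} = (5, 8)$ (so that $n_i/u < \log d_i < m_i/u$ and both gcd's are 1), and uses $\lfloor uB \rfloor$ as the summation limit, which is what the inclusions in (10) give directly for integer weights.

```python
import math
from math import comb


def composition_counts(parts, limit):
    """Return [f_0, ..., f_limit], where f_n counts compositions of n with parts in `parts`."""
    counts = [0] * (limit + 1)
    counts[0] = 1  # the empty composition (the identity of the monoid)
    for n in range(1, limit + 1):
        counts[n] = sum(counts[n - t] for t in parts if t <= n)
    return counts


def words_of_degree_at_most(bound):
    """Count words in the free monoid on two letters of degrees 2 and 3 with degree <= bound."""
    total, a = 0, 0
    while 2**a <= bound:
        b = 0
        while 2**a * 3**b <= bound:
            total += comb(a + b, a)  # words with a letters of degree 2 and b of degree 3
            b += 1
        a += 1
    return total


def sandwich(bound, u=7, n=(4, 7), m=(5, 8)):
    """Return (lower, exact, upper) for #{f : log deg(f) <= log(bound)} as in (12)."""
    limit = math.floor(u * math.log(bound))
    lower = sum(composition_counts(m, limit))
    upper = sum(composition_counts(n, limit))
    return lower, words_of_degree_at_most(bound), upper
```

For instance, `sandwich(1000)` returns the two composition sums together with the exact number of words of degree at most 1000; tightening the approximation (larger $u$) narrows the gap between the outer values.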
On the other hand, since $S$ generates $M_S$ as a free monoid, we can identify $M_S$ with the set of finite sequences of elements of $S$. In particular, $L_n$ (respectively $U_n$) represents the number of ways of writing $n$ as the sum of a sequence of elements in $\{m_1, \dots, m_s\}$ (respectively in $\{n_1, \dots, n_s\}$). Such sequences have been extensively studied in combinatorics [6, §I.3.1] and are called restricted integer compositions. Specifically, generating functions for these compositions are known; see Proposition 2.3 above. In particular,

$$L_n = [z^n]\, \frac{1}{1 - (z^{m_1} + \cdots + z^{m_s})} \quad \text{and} \quad U_n = [z^n]\, \frac{1}{1 - (z^{n_1} + \cdots + z^{n_s})}. \tag{13}$$

As a reminder, $[z^n]F(z)$ denotes the operation of extracting the coefficient of $z^n$ in the formal power series $F(z) = \sum f_n z^n$; see [6, p.19]. On the other hand, since $\gcd(n_1, \dots, n_s) = 1$ and $\gcd(m_1, \dots, m_s) = 1$ by construction, Lemma 2.5 implies that both of the rational functions in (13) have unique poles of smallest modulus (and these poles are positive real numbers of multiplicity one). Let $\alpha_1, \dots, \alpha_{r_1}$ be the roots of $g_{\mathbf{n}}(z) = 1 - (z^{n_1} + \cdots + z^{n_s})$ arranged in increasing order of modulus and let $\beta_1, \dots, \beta_{r_2}$ be the roots of $g_{\mathbf{m}}(z) = 1 - (z^{m_1} + \cdots + z^{m_s})$ arranged in increasing order of modulus. Then Theorem 2.4 and (13) together imply that

$$L_n = \kappa_1 \beta_1^{-n} + p_2(n)\beta_2^{-n} + \cdots + p_{r_2}(n)\beta_{r_2}^{-n} \quad \text{and} \quad U_n = \tau_1 \alpha_1^{-n} + q_2(n)\alpha_2^{-n} + \cdots + q_{r_1}(n)\alpha_{r_1}^{-n} \tag{14}$$

for some constants $\kappa_1$ and $\tau_1$ and some polynomials $p_i, q_j \in \mathbb{C}[z]$. Explicitly,

$$\kappa_1 = \frac{-1}{\beta_1\, g_{\mathbf{m}}'(\beta_1)} \quad \text{and} \quad \tau_1 = \frac{-1}{\alpha_1\, g_{\mathbf{n}}'(\alpha_1)}. \tag{15}$$

Here we use the residue method for extracting partial fraction coefficients and Newton's expansion; see the proof of [6, Theorem IV.9]. Moreover, the expressions in (14) and (15) hold simultaneously for all $n > n_0$ for some constant $n_0 \in \mathbb{N}$. In particular, by summing (14) and using the triangle inequality (for both sums and differences) we see that

$$\kappa_2 \beta_1^{-m} - \kappa_3\, m^{r_3} |\beta_2|^{-m} - \kappa_4 \le \sum_{n=0}^{m} L_n \quad \text{and} \quad \sum_{n=0}^{m} U_n \le \tau_2 \alpha_1^{-m} + \tau_3\, m^{r_4} |\alpha_2|^{-m} + \tau_4 \tag{16}$$

holds for all $m$ sufficiently large. Again, in the interest of being as explicit as possible (at least for the main terms), we have that

$$\kappa_2 = \frac{\kappa_1\, \beta_1^{-1}}{\beta_1^{-1} - 1} = \frac{-1}{\beta_1 (1 - \beta_1)\, g_{\mathbf{m}}'(\beta_1)} \quad \text{and} \quad \tau_2 = \frac{\tau_1\, \alpha_1^{-1}}{\alpha_1^{-1} - 1} = \frac{-1}{\alpha_1 (1 - \alpha_1)\, g_{\mathbf{n}}'(\alpha_1)}, \tag{17}$$

obtained by summing the corresponding geometric series. Moreover, $r_3$ (respectively $r_4$) is the maximum of the multiplicities of the roots of $g_{\mathbf{m}}$ (respectively $g_{\mathbf{n}}$) minus one. Hence, after taking $m = [Bu]$, combining (12) and (16), and absorbing $u$ into the relevant constants, we see that

$$\kappa_5 C_1^{B} - \kappa_6 B^{r_3} C_2^{B} - \kappa_7 \le \#\{f \in M_S : \log\deg(f) \le B\} \le \tau_5 C_3^{B} + \tau_6 B^{r_4} C_4^{B} + \tau_7 \tag{18}$$

holds for all $B$ sufficiently large; here we use also that $Bu - 1 \le [Bu] \le Bu + 1$, so that (some) of the relevant constants are given explicitly by

$$C_1 = \frac{1}{\beta_1^{u}}, \quad \kappa_5 = \kappa_2 \beta_1 = \frac{1}{(\beta_1 - 1)\, g_{\mathbf{m}}'(\beta_1)}, \quad C_2 = \frac{1}{|\beta_2|^{u}}, \qquad C_3 = \frac{1}{\alpha_1^{u}}, \quad \tau_5 = \frac{\tau_2}{\alpha_1} = \frac{1}{\alpha_1^{2} (\alpha_1 - 1)\, g_{\mathbf{n}}'(\alpha_1)}, \quad C_4 = \frac{1}{|\alpha_2|^{u}}. \tag{19}$$

We note in particular that $C_1 > C_2$ and $C_3 > C_4$, since $\beta_1 < |\beta_2|$ and $\alpha_1 < |\alpha_2|$ by construction. Now suppose that $P \in \mathbb{P}^N(\overline{\mathbb{Q}})$ is such that $h(P) > b_S := C_S/(d_S - 1)$, where $C_S$ and $d_S$ are the constants from Definition 2.1 above. Then, Tate's telescoping Lemma 2.2 implies that

$$\deg(f)\,(h(P) - b_S) \le h(f(P)) \le \deg(f)\,(h(P) + b_S).$$

Therefore, for all $B$ we have the subset relations:

$$\Bigl\{f \in M_S : \log\deg(f) \le \log\Bigl(\frac{B}{h(P) + b_S}\Bigr)\Bigr\} \subseteq \bigl\{f \in M_S : h(f(P)) \le B\bigr\} \subseteq \Bigl\{f \in M_S : \log\deg(f) \le \log\Bigl(\frac{B}{h(P) - b_S}\Bigr)\Bigr\}. \tag{20}$$
In particular, if we replace $B$ with $\log(B/(h(P) + b_S))$ on the left side of (18), replace $B$ with $\log(B/(h(P) - b_S))$ on the right side of (18), and apply the change of base formulas for logarithms, then we deduce from (18) and (20) that

$$\kappa_5 \Bigl(\frac{B}{h(P) + b_S}\Bigr)^{\log(C_1)} - \kappa_6 \log\Bigl(\frac{B}{h(P) + b_S}\Bigr)^{r_3} \Bigl(\frac{B}{h(P) + b_S}\Bigr)^{\log(C_2)} - \kappa_7 \le \#\bigl\{f \in M_S : h(f(P)) \le B\bigr\} \le \tau_5 \Bigl(\frac{B}{h(P) - b_S}\Bigr)^{\log(C_3)} + \tau_6 \log\Bigl(\frac{B}{h(P) - b_S}\Bigr)^{r_4} \Bigl(\frac{B}{h(P) - b_S}\Bigr)^{\log(C_4)} + \tau_7 \tag{21}$$

holds for all $B$ sufficiently large and all initial points $P$ such that $h(P) > b_S$. Moreover, since most height counting problems on varieties are stated in terms of multiplicative heights, we replace $B$ with $\log B$ in (21) to obtain

$$\Bigl(\frac{\kappa_5}{(h(P) + b_S)^{\log(C_1)}}\Bigr) \log(B)^{\log(C_1)} - \Bigl(\frac{\kappa_6}{(h(P) + b_S)^{\log(C_2)}}\Bigr) \log\Bigl(\frac{\log B}{h(P) + b_S}\Bigr)^{r_3} \log(B)^{\log(C_2)} - \kappa_7 \le \#\bigl\{f \in M_S : H(f(P)) \le B\bigr\} \le \Bigl(\frac{\tau_5}{(h(P) - b_S)^{\log(C_3)}}\Bigr) \log(B)^{\log(C_3)} + \Bigl(\frac{\tau_6}{(h(P) - b_S)^{\log(C_4)}}\Bigr) \log\Bigl(\frac{\log B}{h(P) - b_S}\Bigr)^{r_4} \log(B)^{\log(C_4)} + \tau_7. \tag{22}$$

Hence, after renaming the constants above, we see that there exist positive constants $a_1(S, P, \delta)$, $a_2(S, P, \delta)$, $b_1(S, \delta)$, $b_2(S, \delta)$ and $B_S := e^{b_S}$ such that

$$a_1 (\log B)^{b_1} + o\bigl((\log B)^{b_1}\bigr) \le \#\{f \in M_S : H(f(P)) \le B\} \le a_2 (\log B)^{b_2} + o\bigl((\log B)^{b_2}\bigr) \tag{23}$$

holds for all $P \in \mathbb{P}^N(\overline{\mathbb{Q}})$ with $H(P) > B_S$. Moreover, $b_1$ and $b_2$ depend only on the set $S$ and $\delta$, and $a_1$ and $a_2$ (and the lower order terms) depend on $S$, $\delta$ and $P$. Specifically, (15), (17) and (19) together imply

$$a_1 = \frac{1}{(\beta_1 - 1)\, g_{\mathbf{m}}'(\beta_1)\, \log\bigl(B_S H(P)\bigr)^{\log(\beta_1^{-u})}}, \quad b_1 = \log(\beta_1^{-u}), \qquad a_2 = \frac{1}{\alpha_1^{2} (\alpha_1 - 1)\, g_{\mathbf{n}}'(\alpha_1)\, \log\Bigl(\frac{H(P)}{B_S}\Bigr)^{\log(\alpha_1^{-u})}}, \quad b_2 = \log(\alpha_1^{-u}). \tag{24}$$
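The exponents $b_1 = \log(\beta_1^{-u})$ and $b_2 = \log(\alpha_1^{-u})$ in (24) are easy to approximate numerically. The sketch below is an illustration under stated assumptions rather than the computation used in the paper (which cites Magma): it assumes degrees 2 and 3, two hand-picked weight choices satisfying condition (1) of Lemma 2.6 with coprime parts, and plain bisection on $(0, 1)$, where $g$ is strictly decreasing; the function names are hypothetical.

```python
import math


def positive_root(parts, tol=1e-14):
    """Approximate the unique root in (0, 1) of g(z) = 1 - sum(z**t for t in parts)."""
    def g(z):
        return 1 - sum(z**t for t in parts)

    lo, hi = 0.0, 1.0  # g(0) = 1 > 0 and g(1) = 1 - len(parts) < 0 for two or more parts
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exponents(u, n, m):
    """Return (b1, b2) = (-u*log(beta_1), -u*log(alpha_1)) for weight vectors n and m."""
    alpha = positive_root(n)  # smallest root of g_n
    beta = positive_root(m)   # smallest root of g_m
    return -u * math.log(beta), -u * math.log(alpha)


# Assumed approximations of (log 2, log 3): n_i/u < log(d_i) < m_i/u.
coarse = exponents(7, (4, 7), (5, 8))
fine = exponents(100, (69, 109), (70, 111))
```

Comparing `coarse` and `fine` shows the interval $[b_1, b_2]$ shrinking as the approximation improves, in line with the claim that $b_2 - b_1$ can be made arbitrarily small by letting $\delta$ go to zero.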
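Moreover, since roots of polynomials can be approximated to any accuracy effectively, $b_1$ and $b_2$ can be computed effectively (also integers as in Lemma 2.6 can be produced effectively for all $\delta$). Therefore, to complete the proof of Theorem 1.1, we need only show that the difference $b_2 - b_1 > 0$ can be made arbitrarily small (by letting $\delta$ go to zero); see (31) below. Then we set $b = b_1$ and $b_2 = b + \epsilon$ to deduce the claim in Theorem 1.1.

To do this, we use the Mean Value Theorem applied to the functions $f(x) = -g_{\mathbf{m}}(x)$ and $h(x) = u\log(x)$ on the interval $[\alpha_1, \beta_1]$. With this in mind, we begin with a few estimates, all of which follow easily from part (1) of Lemma 2.6:

$$\frac{\delta}{c_1} < \frac{\delta u}{n_1} \le \frac{\delta}{c_1 - \delta}, \qquad 1 < \frac{m_1}{n_1} \le \frac{c_1 + \delta}{c_1 - \delta}, \qquad \frac{u}{m_1} < \frac{1}{c_1}. \tag{25}$$

To simplify the expressions that follow, let $\alpha = \alpha_1$ and $\beta = \beta_1$. Then since $n_1 \le n_i$ and $0 < \alpha < 1$, we see that $1 = \sum_{i=1}^{s} \alpha^{n_i} \le s\,\alpha^{n_1}$. Therefore,

$$\Bigl(\frac{1}{s}\Bigr)^{\frac{1}{n_1}} \le \alpha. \tag{26}$$

In particular, (25) and (26) together imply the following lower bound on the derivative:

$$f'(\alpha) = m_s \alpha^{m_s - 1} + \cdots + m_1 \alpha^{m_1 - 1} \ge m_1 \alpha^{m_1 - 1} \ge m_1 \alpha^{m_1} \ge m_1 \Bigl(\frac{1}{s}\Bigr)^{\frac{m_1}{n_1}} \ge m_1 \Bigl(\frac{1}{s}\Bigr)^{\frac{c_1 + \delta}{c_1 - \delta}}. \tag{27}$$

Similarly, (25) and (26) together imply that

$$f(\alpha) = \alpha^{(\frac{m_s}{u} - \frac{n_s}{u})u} \cdot \alpha^{n_s} + \cdots + \alpha^{(\frac{m_1}{u} - \frac{n_1}{u})u} \cdot \alpha^{n_1} - 1 \ge \alpha^{\delta u} \cdot \alpha^{n_s} + \cdots + \alpha^{\delta u} \cdot \alpha^{n_1} - 1 = \alpha^{\delta u}(\alpha^{n_s} + \cdots + \alpha^{n_1}) - 1 = \alpha^{\delta u} - 1 \ge \Bigl(\frac{1}{s}\Bigr)^{\frac{\delta u}{n_1}} - 1 \ge \Bigl(\frac{1}{s}\Bigr)^{\frac{\delta}{c_1 - \delta}} - 1.$$

Here, we use also that $0 \le \frac{m_i}{u} - \frac{n_i}{u} \le \delta$ by construction; see Lemma 2.6 part (1). In particular, we deduce the following key upper bound:

$$-f(\alpha) \le 1 - \Bigl(\frac{1}{s}\Bigr)^{\frac{\delta}{c_1 - \delta}}. \tag{28}$$

We are now ready to apply the Mean Value Theorem to $f(x)$ on $[\alpha, \beta]$. Specifically,

$$m_1 \Bigl(\frac{1}{s}\Bigr)^{\frac{c_1 + \delta}{c_1 - \delta}} \le f'(\alpha) = \min_{\alpha \le x \le \beta} f'(x) \le \frac{f(\beta) - f(\alpha)}{\beta - \alpha} = \frac{-f(\alpha)}{\beta - \alpha} \le \frac{1 - (\frac{1}{s})^{\frac{\delta}{c_1 - \delta}}}{\beta - \alpha}$$

follows from (27), (28), and the Mean Value Theorem. Therefore, we have the estimate:

$$0 \le \beta - \alpha \le \frac{1 - (\frac{1}{s})^{\frac{\delta}{c_1 - \delta}}}{m_1 (\frac{1}{s})^{\frac{c_1 + \delta}{c_1 - \delta}}}. \tag{29}$$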
Likewise, the Mean Value Theorem for h ( x ) = u log( x ) on [ α, β ] , (26), and the fact that n > together yield(30) ≤ h ( β ) − h ( α ) β − α ≤ max α ≤ x ≤ β h ′ ( x ) = h ′ ( α ) = uα − ≤ su. Hence, after combining (24),(25), (29) and (30), we deduce that ≤ b − b = h ( β ) − h ( α ) ≤ su · − ( s ) δc − δ m ( s ) c δc − δ = s · um · − ( s ) δc − δ ( s ) c δc − δ ≤ sc · − ( s ) δc − δ ( s ) c δc − δ (31)However, the upper bound in (31) goes to zero as δ goes to zero. Therefore, the exponents b and b in (23) can be made arbitrarily close. (cid:3) Remark . If S has only two maps ( s = 2 ), then the trinomials g n ( z ) = 1 − z n − z n and g m ( z ) = 1 − z m − z m must have non-zero discriminant (in fact, here we need only that n = n and m = m , making no assumptions on gcd’s); this fact follows easily from thediscriminant formula in [9, Theorem 4]. In particular, r and r from (18) and (22) must bezero. Hence, we obtain simpler bounds for the number of functions of bounded degree (hence,also for the number of points of bounded height in orbits). For instance, κ C B − κ C B − κ ≤ { f ∈ M S : log deg( f ) ≤ B } ≤ τ C B + τ C B + τ holds for all B suﬃciently large. Example . In particular, if S = { φ , φ } with deg( φ ) = 2 and deg( φ ) = 3 , then we use thecrude approximations < log(2) < and < log(3) < as inputs to Lemma 2.6 to obtain some explicit bounds for Theorem 1.1. Speciﬁcally, . B S H ( P )) . ! log( B ) . + o (cid:0) log( B ) . (cid:1) ≤ { f ∈ M S : H ( f ( P )) ≤ B }≤ . (cid:0) H ( P ) BS (cid:1) . ! log( B ) . + o (cid:0) log( B ) . (cid:1) holds for all P ∈ P N ( Q ) of suﬃciently large height; here we use (19), (22) and Magma [5] toapproximate roots of polynomials.Lastly, we can use the bounds in Theorem 1.1 on the number of functions in free monoidssatisfying a bounded height relation to give an upper bound on the number of points ofbounded height in arbitrary monoid orbits. (Proof of Corollary 1.2).

Let $S = \{\phi_1, \dots, \phi_s\}$ be a set of endomorphisms all of degree at least $2$. If $s = 1$ (i.e., $S = \{\phi\}$ contains just one map), then one may use the canonical height [24, §3.4] associated to $\phi$ to reach the desired bound. Namely, the fact that $|\hat{h}_\phi - h| \le c_\phi$ and that $\hat{h}_\phi(\phi^n(P)) = d_\phi^n\, \hat{h}_\phi(P)$ together imply that
$$\left\{ n : n \le \log_{d_\phi}\!\left(\frac{\log(B) - c_\phi}{\hat{h}_\phi(P)}\right) \right\} \subseteq \{ Q \in \mathrm{Orb}_\phi(P) : H(Q) \le B \} \subseteq \left\{ n : n \le \log_{d_\phi}\!\left(\frac{\log(B) + c_\phi}{\hat{h}_\phi(P)}\right) \right\}$$
for all non-preperiodic $P$ (here each index $n$ is identified with the point $\phi^n(P)$). On the other hand, if $P$ is preperiodic, then $\mathrm{Orb}_\phi(P)$ is finite. In particular, the number of points with (multiplicative) height at most $B$ is certainly bounded above by a constant times $\log\log(B) \ll \log(B)$ as claimed; hence, $b = 1$ in this case.
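As a rough numerical illustration of the $s = 1$ case, the following Python sketch evaluates the two index cutoffs appearing in the inclusions above. The function name `orbit_count_bounds`, the convention that indices start at $n = 0$, and the sample values of $d_\phi$, $c_\phi$ and $\hat{h}_\phi(P)$ are assumptions made only for this example; the sketch takes $\log(B)$ as input rather than $B$.

```python
import math

def orbit_count_bounds(log_B, d, c, h_hat):
    """Return (lower, upper): the number of indices n >= 0 satisfying
    n <= log_d((log_B - c) / h_hat) and n <= log_d((log_B + c) / h_hat).

    log_B : logarithm of the multiplicative height bound B
    d     : degree of the map (assumed >= 2)
    c     : a bound for |canonical height - Weil height| (assumed known)
    h_hat : canonical height of a non-preperiodic starting point (> 0)
    """
    def count(numerator):
        # No index qualifies when the argument of log_d is below 1.
        if numerator < h_hat:
            return 0
        cutoff = math.log(numerator / h_hat) / math.log(d)
        return math.floor(cutoff) + 1  # indices 0, 1, ..., floor(cutoff)

    return count(log_B - c), count(log_B + c)

# Hypothetical sample values: d = 2, c = 1, canonical height 1, log(B) = 100.
lower, upper = orbit_count_bounds(log_B=100.0, d=2, c=1.0, h_hat=1.0)
```

Both counts differ by a bounded amount and grow like $\log\log(B)/\log(d_\phi)$, which is the source of the bound $\log\log(B) \ll \log(B)$ used above.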

Now assume that s ≥ , and let F S be the free monoid generated by S under concatenation.Then, given a word w = θ . . . θ n ∈ F S , we can deﬁne an action of w on P N ( Q ) via w · P = θ ◦ · · · ◦ θ n ( P ) . Likewise, we deﬁne the degree of w to be deg( θ ◦ · · · ◦ θ n ) . In particular, (bycounting words of bounded degree) it is straightforward to see that we can replace M S with F S in the proof of Theorem 1.1 and deduce that a log( B ) b + o (cid:0) log( B ) b (cid:1) ≤ { w ∈ F S : H ( w · P ) ≤ B } ≤ a log( B ) b + o (cid:0) log( B ) b (cid:1) for some constants a ( P ) , a ( P ) , b and b (whenever H ( P ) > B S , as before); here we canchoose δ = 0 . , small enough to separate logs of distinct integers (see Remark 7). In particular,since every point Q ∈ Orb S ( P ) is of the form Q = w · P for some w ∈ F S , we have that { Q ∈ Orb S ( P ) : H ( Q ) ≤ B } ≤ { w ∈ F S : H ( w · P ) ≤ B } ≤ a log( B ) b + o (cid:0) log( B ) b (cid:1) . Therefore, the number of points in

Orb S ( P ) with height at most B is ≪ log( B ) b . Concretely,by choosing δ = 0 . we get the crude bound b ≤ c − . log( s ) from (24) and (26). (cid:3) Remark . It is likely that the statement and proof of Theorem 1.1 hold for height controlledsets of simultaneously polarizable maps on any projective variety. The main arithmetic in-gredient, Tate’s telescoping Lemma 2.2, works perfectly well with this level of generality; see[10, Lemma 2.1]. Moreover, the other components of the proof (generating functions anddiophantine approximation of degrees) don’t depend on P N .4. Monoid orbits in dimension one

In this section, we prove Theorems 1.5 and 1.6 on monoid orbits over $\mathbb{P}^1$. To do this, we first show that the relevant sets of maps generate free monoids under composition. For critically separate and simple sets of rational maps, this follows directly from the main results of [20].

Theorem 4.1.

Let S = { φ , . . . , φ s } be a set of rational maps on P ( C ) all of degree at leastfour. If S is critically separate and critically simple, then M S is a free.Proof. Suppose that f = θ ◦ · · · ◦ θ n = τ ◦ · · · ◦ τ m = g for some θ i , τ j ∈ S . Without lossof generality, we may assume that n ≥ m . Clearly if n = m = 1 , then θ = τ and there isnothing to prove. Therefore, we may assume that n ≥ m > . Write f = θ ◦ · · · ◦ θ n and g = τ ◦ · · · ◦ τ m so that θ ( f ) = τ ( g ) . However, since f and g are non-constant and S iscritically separate, [20, Theorem 1.1] implies that θ = τ . Likewise since S is critically simpleand deg( θ ) ≥ , we see that f = g by [20, Theorem 1.3]. Repeating the argument abovenow for f and g (instead of f and f ), we see that θ = τ and θ ◦ · · ·◦ θ n = τ ◦ · · ·◦ τ m . Wecan clearly keep going to deduce that θ i = τ i for all ≤ i ≤ m . Finally, by equating degreesgiven by the original relation f = g , we see that deg( θ n − m ) · · · deg( θ n ) = 1 , a contractionunless n = m . This completes the proof that M S is free. (cid:3) Next we show that polynomial sets with multiplicatively independent degrees and leadingcoeﬃcients generate free monoids under composition. This is perhaps known to the experts.However, without a reference, we include a proof for completeness. Our argument is inspiredby the proof of [14, Lemma 3.2].
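The notion of multiplicatively independent degrees mentioned above can be checked mechanically. The sketch below is only an illustration under stated assumptions: the function names are ours, the inputs are integers at least $2$, and it uses the standard fact that such integers are multiplicatively independent exactly when their vectors of prime exponents are linearly independent over $\mathbb{Q}$.

```python
from fractions import Fraction

def prime_exponents(n):
    """Return {prime: exponent} for an integer n >= 2 (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def multiplicatively_independent(degrees):
    """True if the prime-exponent vectors of the given integers (each >= 2)
    are linearly independent over Q, checked by Gaussian elimination."""
    factorizations = [prime_exponents(d) for d in degrees]
    primes = sorted({p for f in factorizations for p in f})
    rows = [[Fraction(f.get(p, 0)) for p in primes] for f in factorizations]
    rank = 0
    for col in range(len(primes)):
        # Find a row at or below position `rank` with a nonzero entry here.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank == len(rows)
```

For instance, the degrees $\{2, 3\}$ pass this test, while $\{2, 4\}$ do not; the latter reflects the relation $x^2 \circ x^2 = x^4$, which shows that the monoid generated by $x^2$ and $x^4$ is not free.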

Theorem 4.2.

Let $S = \{\phi_1, \dots, \phi_s\}$ be a set of polynomials defined over a field $K$ of characteristic zero, and let $a_i x^{d_i}$ denote the leading term of $\phi_i$. If $\{d_1, \dots, d_s\}$ is a multiplicatively independent set in $\mathbb{Z}$ and $\{a_1, \dots, a_s\}$ is a multiplicatively independent set in $K^*$, then $M_S$ is a free monoid.

Proof. As the statement of the theorem suggests, it suffices to study the monoid generated by the leading terms in $S$. To make this statement precise, we note the following lemma:

Lemma 4.3.

Let $S = \{\phi_1, \dots, \phi_s\}$ be a set of polynomials defined over a field $K$, let $a_i x^{d_i}$ denote the leading term of $\phi_i$, and let $S' = \{a_1 x^{d_1}, \dots, a_s x^{d_s}\}$. If $M_{S'}$ is a free monoid, then $M_S$ is a free monoid.

Proof.

This statement is a simple consequence of the fact that $\mathrm{lt}(f \circ g) = \mathrm{lt}(f) \circ \mathrm{lt}(g)$ for all $f, g \in K[x]$; here $\mathrm{lt}(\cdot)$ denotes the leading term of a polynomial. To see this, suppose that $M_{S'}$ is a free monoid and that there is some relation
$$\theta_1 \circ \theta_2 \circ \cdots \circ \theta_n = \tau_1 \circ \tau_2 \circ \cdots \circ \tau_m \tag{32}$$
for some $\theta_i, \tau_j \in S$. Then, in particular, we have an equality of leading terms,
$$\mathrm{lt}(\theta_1) \circ \mathrm{lt}(\theta_2) \circ \cdots \circ \mathrm{lt}(\theta_n) = \mathrm{lt}(\tau_1) \circ \mathrm{lt}(\tau_2) \circ \cdots \circ \mathrm{lt}(\tau_m).$$
But this is a relation in $M_{S'}$, which is free on the letters in $S'$. Therefore, $n = m$ and $\mathrm{lt}(\theta_i) = \mathrm{lt}(\tau_i)$. However, again since $M_{S'}$ is free, $\mathrm{lt}(\theta_i) = \mathrm{lt}(\tau_i)$ implies that $\theta_i = \tau_i$. Hence the relation in (32) is a trivial one. $\square$

Now back to the proof of Theorem 4.2. In particular, in light of Lemma 4.3, we may assume that $S = \{\phi_1, \dots, \phi_s\}$ is a set of monomials with $\phi_i = a_i x^{d_i}$, that $\{d_1, \dots, d_s\}$ is a multiplicatively independent set in $\mathbb{Z}$, and that $\{a_1, \dots, a_s\}$ is a multiplicatively independent set in $K^*$. Now, given $F = \theta_1 \circ \cdots \circ \theta_n \in M_S$ and $\phi \in S$, we define
$$e_\phi(F) = \#\{\, j \mid \theta_j = \phi \,\}$$
to be the number of $\phi$'s appearing in the string defining $F$ (strictly speaking this is an abuse of notation; $e_\phi$ is a function on words). In particular, if there is a relation $F = G$ for some $F, G \in M_S$, then we see that
$$d_1^{e_{\phi_1}(F)} \cdots d_s^{e_{\phi_s}(F)} = \deg(F) = \deg(G) = d_1^{e_{\phi_1}(G)} \cdots d_s^{e_{\phi_s}(G)}. \tag{33}$$
However, the $d_i$'s are multiplicatively independent by assumption, so that $e_{\phi_i}(F) = e_{\phi_i}(G)$ for all $i$. In particular, the strings defining $F$ and $G$ have the same length (i.e., the total number of letters from $S$) equal to $n = \sum_i e_{\phi_i}(F)$. Hence,
$$F = \theta_1 \circ \cdots \circ \theta_n = \tau_1 \circ \cdots \circ \tau_n = G \quad \text{for some } \theta_i, \tau_i \in S. \tag{34}$$
Moreover, $e_{\phi_i}(F) = e_{\phi_i}(G)$ for all $i$. From here, we will show that $\theta_i = \tau_i$ by induction on the length $n$. The $n = 1$ case is clear.
For $n > 1$, if (34) holds then
$$F' \circ \theta = F = G = G' \circ \tau \tag{35}$$
for some $\theta, \tau \in S$ and some monomials $F'$ and $G'$ given by strings of length $n - 1$ of elements of $S$. We proceed in cases.

Case (1):

Suppose that $\theta = \tau$, and write $\theta = a x^d$, $F' = a_{F'} x^{\deg(F')}$ and $G' = a_{G'} x^{\deg(F')}$. Here we use that $\deg(F) = \deg(G)$ and $\theta = \tau$, so that $\deg(F') = \deg(G')$. Therefore, (35) becomes
$$a_{F'}\, a^{\deg(F')}\, x^{d \deg(F')} = a_{G'}\, a^{\deg(F')}\, x^{d \deg(F')},$$
and we deduce that $a_{F'} = a_{G'}$. However, then $F' = a_{F'} x^{\deg(F')} = a_{G'} x^{\deg(F')} = G'$ and $F', G' \in M_S$ are polynomials obtained by composing strings of elements of $S$ of length $n - 1$. In particular, we may deduce that $\theta_i = \tau_i$ for all $i < n$ by induction. On the other hand, $\theta_n = \theta = \tau = \tau_n$ by construction. Therefore, $\theta_i = \tau_i$ for all $i \le n$ as claimed.

Case (2):

Suppose that $\theta \ne \tau$. We fix some notation. Given a string $\theta_1 \dots \theta_m$ of elements of $S$, write
$$f = \theta_1 \circ \cdots \circ \theta_m = a_f\, x^{\deg(f)} = (a_1^{n_1} \cdots a_s^{n_s})\, x^{\deg(f)}.$$
Then define the $a_i$-degree of $f$ (or more accurately, the $a_i$-degree of the corresponding string) to be $\deg_{a_i}(f) = n_i$. Note that this construction is well-defined since the leading coefficient $a_f$ is in the (multiplicative) semigroup generated by the $a_i$'s and the $a_i$'s are multiplicatively independent by assumption. Now write $\theta = a x^d$. Then we will show that $\deg_a(F) \ne \deg_a(G)$, a contradiction, using (35), the fact that $\theta \ne \tau$, and the following elementary observations about $a$-degrees:

Lemma 4.4.

Let $S$ be as in Theorem 4.2 and let $\theta = a x^d \in S$. Then the following statements hold:

(1) If $f_1, f_2, g \in M_S$, $\deg_a(f_1) \le \deg_a(f_2)$, and $\deg(f_1) \le \deg(f_2)$, then $\deg_a(f_1 \circ g) \le \deg_a(f_2 \circ g)$.

(2) Let $f \in M_S$, and suppose that $e_\theta(f) = e \ge 1$. Then
$$\deg_a(f) \le \frac{d^e - 1}{d - 1} \cdot \frac{\deg(f)}{d^e}.$$
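To make the $a$-degree bookkeeping concrete, here is a small Python sketch that tracks the pair $(\deg_a, \deg)$ of a word and tests part (2) of Lemma 4.4 on all short words in two letters. It is only a sanity check under stated assumptions: the function names and the sample degrees $d = 2$ and $3$ are hypothetical, the second letter is assumed to have $a$-degree $0$, and the code relies on the rule $\deg_a(f \circ g) = \deg_a(f) + \deg(f)\deg_a(g)$, which follows from $(a_f x^m) \circ (a_g x^n) = a_f a_g^m x^{mn}$.

```python
from itertools import product

def compose(f, g):
    """Compose monomials f and g, each stored as (a_degree, degree).

    Uses (a_f x^m) o (a_g x^n) = a_f * a_g^m * x^(m*n), so that
    deg_a(f o g) = deg_a(f) + deg(f) * deg_a(g).
    """
    (a_f, m), (a_g, n) = f, g
    return (a_f + m * a_g, m * n)

def monomial_of_word(word):
    """Return (a_degree, degree) of theta_1 o ... o theta_n for a word."""
    result = (0, 1)  # the identity map x
    for letter in word:
        result = compose(result, letter)
    return result

def lemma_4_4_part_2_holds(d, k, max_length):
    """Check deg_a(f) <= (d^e - 1)/(d - 1) * deg(f)/d^e for every word f of
    length <= max_length in theta = a*x^d and a second letter of degree k
    whose a-degree is 0, where e >= 1 counts the occurrences of theta."""
    theta, tau = (1, d), (0, k)
    for length in range(1, max_length + 1):
        for word in product((theta, tau), repeat=length):
            e = word.count(theta)
            if e == 0:
                continue
            a_deg, deg = monomial_of_word(word)
            # Cross-multiplied form of the inequality, in exact integers.
            if a_deg * (d - 1) * d**e > (d**e - 1) * deg:
                return False
    return True
```

Equality in the bound occurs, for example, for words of the form $g \circ \theta^e$ with $\deg_a(g) = 0$, which is the extremal shape that the proof of the lemma reduces to.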

We grant Lemma 4.4 for now and return to the proof later. To see that $\deg_a(F) \ne \deg_a(G)$ in Case 2, let $e = e_\theta(F) = e_\theta(G)$ be the number of $\theta$'s appearing in the strings defining $F$ and $G$. Then, writing $F = F' \circ \theta$ as in (35), we see that
$$\deg_a(F) = \deg_a(F') + \deg(F') \deg_a(\theta) = \deg_a(F') + \deg(F') \ge \deg(F'). \tag{36}$$
On the other hand, Lemma 4.4 part (2) applied to $f = G'$ implies that
$$\deg_a(G) = \deg_a(G') + \deg(G') \deg_a(\tau) = \deg_a(G') \le \frac{d^e - 1}{d - 1} \cdot \frac{\deg(G')}{d^e}. \tag{37}$$
Here we use that $G = G' \circ \tau$ and that $\deg_a(\tau) = 0$, since $\theta \ne \tau$ and the leading coefficients of the elements in $S$ are multiplicatively independent. Therefore, if $\deg_a(F) = \deg_a(G)$, then (35), (36), (37) together imply that
$$\begin{aligned}
\deg(\tau) \deg(F) = \deg(\tau)\, d \deg(F') &\le \deg(\tau)\, d \deg_a(F) = \deg(\tau)\, d \deg_a(G) \\
&\le \frac{d^e - 1}{d - 1} \cdot \frac{\deg(\tau) \deg(G')}{d^{e-1}} \le \frac{d^e - 1}{d - 1} \cdot \frac{\deg(G)}{d^{e-1}} \\
&< \frac{d^e}{d - 1} \cdot \frac{\deg(G)}{d^{e-1}} = \frac{d}{d - 1} \deg(G).
\end{aligned} \tag{38}$$
However, $F = G$ so that $\deg(F) = \deg(G)$. In particular, (38) implies that
$$2 \le \deg(\tau) < \frac{d}{d - 1} \le 2,$$
a contradiction. Therefore, $\deg_a(F) \ne \deg_a(G)$ and Case 2 is incompatible with (35). Therefore, any relation in $M_S$ must be of the form in Case 1. However, since we have settled Theorem 4.2 in this case by induction, $M_S$ is a free monoid as claimed. $\square$

We now include a proof of Lemma 4.4 regarding $a$-degrees.

Proof (Lemma 4.4). The first statement is a simple consequence of the definition of $a$-degrees. Suppose that $f_1, f_2, g \in M_S$, that $\deg_a(f_1) \le \deg_a(f_2)$, and that $\deg(f_1) \le \deg(f_2)$. Then
$$\deg_a(f_1 \circ g) = \deg_a(f_1) + \deg_a(g) \cdot \deg(f_1) \le \deg_a(f_2) + \deg_a(g) \cdot \deg(f_2) = \deg_a(f_2 \circ g)$$
as claimed. For the second statement, let $f \in M_S$ and suppose that $e_\theta(f) = e \ge 1$. Then, we may write
$$f = g_{t+1} \circ \theta^{r_t} \circ g_t \circ \cdots \circ g_2 \circ \theta^{r_1} \circ g_1 \tag{39}$$
for some $g_i \in M_S$ with $\deg_a(g_i) = 0$, some $t \ge 1$, and some $r_i \ge 1$ with $\sum_{i=1}^{t} r_i = e$.
We will show by induction on $t$ that
$$\deg_a(f) \le \deg_a(g_{t+1} \circ g_t \circ \cdots \circ g_1 \circ \theta^e), \tag{40}$$
from which statement (2) of the Lemma easily follows. If $t = 1$, then
$$\deg_a(g_2 \circ \theta^{r_1} \circ g_1) = \deg(g_2) \deg_a(\theta^{r_1}) \le \deg(g_2) \deg(g_1) \deg_a(\theta^{r_1}) \le \deg_a(g_2 \circ g_1 \circ \theta^{r_1}).$$
Here we use that $\deg_a(g_i) = 0$. On the other hand, assume that $t > 1$ and that (40) is true for polynomials of the form in (39) with $t - 1$ appearances of substrings of the form $\theta^{r_i}$. Then given $f$ as in (39), let $f_1 = g_{t+1} \circ \theta^{r_t} \circ g_t \circ \theta^{r_{t-1}} \circ \cdots \circ g_2$, let $f_2 = g_{t+1} \circ \cdots \circ g_2 \circ \theta^{r}$ where $r = \sum_{i=2}^{t} r_i$, and let $g = \theta^{r_1} \circ g_1$. Then $f = f_1 \circ g$ and $\deg(f_1) = \deg(f_2)$. Hence, part 1 of Lemma 4.4 and the induction hypothesis together imply that
$$\deg_a(f) = \deg_a(f_1 \circ g) \le \deg_a(f_2 \circ g) = \deg_a\big((g_{t+1} \circ \cdots \circ g_2) \circ \theta^e \circ g_1\big). \tag{41}$$
On the other hand, letting $g' = g_{t+1} \circ \cdots \circ g_2$, we see that the $t = 1$ case above applied to $g' \circ \theta^e \circ g_1$ in place of $f$ implies that
$$\deg_a\big((g_{t+1} \circ \cdots \circ g_2) \circ \theta^e \circ g_1\big) \le \deg_a\big((g_{t+1} \circ \cdots \circ g_1) \circ \theta^e\big). \tag{42}$$
Therefore after combining (41) and (42), we establish (40) as claimed. Finally, the bound in part 2 of Lemma 4.4 follows easily from (40), the fact that $\deg_a(\theta^e) = d^{e-1} + \cdots + d + 1 = (d^e - 1)/(d - 1)$, and that $\deg(g_{t+1} \circ \cdots \circ g_1) = \deg(f)/d^e$. $\square$

We are nearly ready to prove Theorems 1.5 and 1.6, versions of Conjecture 1.3 in dimension one, for some fairly general sets of maps. However, to complete the main remaining step (i.e., to pass from counting functions to counting points), we need to show that $f(P) = g(P)$ occurs rarely for $f, g \in M_S$ and $P$ of large enough height. This is largely achieved by ensuring that the sets of rational (or integral) points on the curves
$$C_i : \frac{\phi_i(x) - \phi_i(y)}{x - y} = 0 \quad \text{and} \quad C_{j,k} : \phi_j(x) = \phi_k(y) \ \text{ for } j \ne k \tag{43}$$
are finite. For critically separate and simple sets of rational maps this follows from the genus calculations in [20] and Faltings' theorem:

Proposition 4.5.

Let S = { φ , . . . , φ s } be a set of rational maps on P ( Q ) all of degree atleast . If S is critically separate and critically simple, then the curves in (43) have at mostﬁnitely many rational points over any number ﬁeld.Proof. Since S is critically separate, [20, Proposition 3.1] implies that each C j,k is an irre-ducible curve for all j = k . Likewise, it is shown on [20, p208] that the genus of C j,k isgiven by (deg( φ j ) − φ k ) − ≥ . Hence, the C j,k have at most ﬁnitely many rationalpoints over any number ﬁeld by Faltings’ theorem. Likewise, [20, Corollary 3.6] implies thateach C i is an irreducible curve. Moreover, it is shown on [20, p210] that the genus of C i is (deg( φ i ) − ≥ . Hence, the C i also have at most ﬁnitely many rational points over anynumber ﬁeld by Faltings’ theorem. (cid:3) For the sets of polynomials in Theorem 1.6, it suﬃces for our purposes to show that thecurves in (43) have only ﬁnitely many integral points (as opposed to rational points). To dothis, we need the integral point classiﬁcation theorems in [3] and the Appendix 5. To putthese results in context, we ﬁrst recall the deﬁnition of Siegel factors and Siegel’s integralpoint theorem.

Deﬁnition 4.6. A Siegel polynomial over a ﬁeld K is an absolutely irreducible polynomial Φ( x, y ) ∈ K [ x, y ] for which the curve Φ( x, y ) = 0 has genus zero and has at most two pointsat inﬁnity. A Siegel factor of a polynomial Ψ( x, y ) ∈ K [ x, y ] is a factor of Ψ which is a Siegelpolynomial over K .The following result explains the relevance of Siegel factors in this context and is one of themost important results in arithmetic geometry; see Theorems 8.2.4 and 8.5.1 in [18]. Theorem 4.7 (Siegel) . Let R be a ﬁnitely generated integral domain of characteristic zero,let K be the ﬁeld of fractions of R , and let Φ( x, y ) ∈ K [ x, y ] . Then there are only ﬁnitelymany pairs ( x, y ) ∈ R × R for which Φ( x, y ) = 0 unless Φ( x, y ) has a Siegel factor over K .Remark . Clearly if K is a number ﬁeld (viewed inside the complex numbers) and Φ( x, y ) has no Siegel factors over C , then Φ( x, y ) has no Siegel factors over K . Therefore, to provethat the equation Φ( x, y ) = 0 has only ﬁnitely many solutions ( x, y ) in some ring of S -integers R ⊂ K , it suﬃces to show that Φ( x, y ) has no Siegel factors over C .To use Siegel’s integral point theorem to show that f ( P ) = g ( P ) occurs infrequently for f, g ∈ M S and P of suﬃciently large height (see Lemma 4.10 for a precise statement), we needthe following theorem of Bilu and Tichy [3, Theorem 10.1], which classiﬁes the polynomials Φ( x, y ) = F ( x ) − G ( y ) having a Siegel factor. OUNTING POINTS OF BOUNDED HEIGHT IN MONOID ORBITS 15

Theorem 4.8.

For non-constant

F, G ∈ C [ x ] , if F ( x ) − G ( y ) has a Siegel Factor in C [ x, y ] then F = E ◦ F ◦ µ and G = E ◦ G ◦ ν , where E, µ, ν ∈ C [ x ] with deg( µ ) = deg( ν ) = 1 andeither ( F , G ) or ( G , F ) is one of the following pairs (here m, n ≥ and p ∈ C [ x ] K { } ): (a) (cid:0) x m , x r p ( x ) m (cid:1) , where r ∈ N is coprime to m; (b) (cid:0) x , ( x + 1) p ( x ) (cid:1) ; (c) (cid:0) T m , T n (cid:1) with gcd( m, n ) = 1 ; (d) (cid:0) T m , − T n (cid:1) with gcd( m, n ) > ; (e) (cid:0) ( x − , x − x (cid:1) .Remark . Technically, the statement above is a simpliﬁed version of [3, Theorem 10.1] takenfrom [8, Corollary 2.7]. For a more detailed description of the classiﬁcation of pairs ( F, G ) such that F ( x ) − G ( y ) has a Siegel factor (with the relevant ﬁelds of deﬁnition taken intoaccount), see [3].In particular, condition (3) of Theorem 1.6 implies that the aﬃne curves C i,j : φ i ( x ) = φ j ( y ) for i = j have only ﬁnitely many integral points. Here we use Theorem 4.7, Remark 11, and Theorem4.8: the pairs (a)-(d) in Theorem 4.8 are ruled out by condition (3) by examining ﬁrst coor-dinates only (all cyclic or Chebychev polynomials). Likewise, ( x − = F ◦ E ◦ L , where F ( x ) = ( x − , E ( x ) = x is cyclic, and L ( x ) = x . Hence, the pair in (e) is also ruled outby condition (3). Similarly, condition (3) implies that the aﬃne curves C i : φ i ( x ) − φ i ( y ) x − y = 0 have only ﬁnitely many integral points. Here we use Theorem 4.7 and Theorem 2 in theAppendix; Zannier has shown that such curves have at least points at inﬁnity over C andthus cannot have a Siegel factor over any number ﬁeld. In particular, we are now ready toprove our orbit counts for P from the Introduction. (Proof of Theorems 1.5 and 1.6). Suppose that S is a critically separate and critically simpleset of rational functions or that S is a set of polynomials satisfying conditions (1)-(3) ofTheorem 1.6. 
Then in particular, M S is free by Proposition 4.5 in the rational function case,and M S is free by Theorem 4.2 in the polynomial case. Hence, Theorem 1.1 implies that thenumber of functions f ∈ M S satisfying H ( f ( P )) ≤ B , has the desired growth rate (in eithercase), whenever P has large enough height.To pass from functions to points, we need to control when f ( P ) = g ( P ) is possible for f, g ∈ M S . With this in mind, let R P ⊂ K be a ring of S -integers in some number ﬁeld K (not the same S as the set of functions) containing P and the coeﬃcients of the maps in S .Then deﬁne the quantities κ P := max n h ( x ) : ( x, y ) ∈ C i ( K ) or ( x, y ) ∈ C j,k ( K ) for some y ∈ K and some i, j, k o . in the rational function case and κ P := max n h ( x ) : ( x, y ) ∈ C i ( R P ) or ( x, y ) ∈ C j,k ( R P ) for some y ∈ R P and some i, j, k o . in the polynomial case. Then in either case, κ P is ﬁnite by Proposition 4.5, Theorem 4.7,Remark 11, Theorem 4.8, and Theorem 1 in the Appendix. Now given f = θ ◦ θ ◦· · ·◦ θ n ∈ M S ,deﬁne the length of f to be ℓ ( f ) = n ; note that this quantity is well-deﬁned since M S is free.Moreover letting v = (1 , . . . , , we see that ℓ = ℓ S, v in our earlier notation. Next, recall theconstant b S given by b S = C S / ( d S − , where C S and d S are the height constants in Deﬁnition2.1 above. Then, Tate’s telescoping Lemma 2.2 implies that if h ( ρ ( P )) ≤ κ P for some P with h ( P ) > b S and some ρ ∈ M S , then(44) ℓ ( ρ ) b S ≤ deg( ρ )( h ( P ) − b S ) ≤ h ( ρ ( P )) ≤ κ P . Hence, the length of such ρ is bounded; speciﬁcally, ℓ ( ρ ) ≤ max (cid:8) , ⌈ log ( κ P /b S ) ⌉ (cid:9) := r P ,from which we deduce the following fact. Lemma 4.9.

Suppose that S satisﬁes the conditions of Theorems 1.5 or 1.6 and let ρ ∈ M S .If ℓ ( ρ ) > r P , h ( P ) > b S , and θ ( ρ ( P )) = τ ( P ′ ) for some P ′ ∈ R P and some θ, τ ∈ S , then θ = τ and ρ ( P ) = P ′ . In particular, this allows us to control the number of functions in M S that can agree at P . Lemma 4.10.

Suppose that S satisﬁes the conditions of Theorems 1.5 or 1.6 and h ( P ) > b S .Then there is a constant t P,S depending only on P and S such that (cid:8) f ∈ M S : f ( P ) = Q (cid:9) ≤ t P,S holds for all but ﬁnitely many Q ∈ Orb S ( P ) .Proof. Let d S = max { deg( φ ) : φ ∈ S } and suppose that Q ∈ Orb S ( P ) satisﬁes h ( Q ) > d Sr P +1 ( h ( P ) + b S ) , true of all but ﬁnitely many Q by Northcott’s Theorem; each Q ∈ Orb S ( P ) ⊆ P ( K ) byconstruction of K . Then, it follows from Tate’s telescoping Lemma 2.2 that ℓ ( f ) > r P + 1 forall f ∈ M S with f ( P ) = Q : otherwise, h ( Q ) = h ( f ( P )) ≤ deg( f )( h ( P ) + b S ) ≤ d Sℓ ( f ) ( h ( P ) + b S ) ≤ d Sr P +1 ( h ( P ) + b S ) , a contradiction. In particular, each function taking the value of Q at P has length strictlylarger than r P +1. Now, let f Q ∈ M S be a function of smallest length taking the value of Q at P . Then ℓ ( f Q ) > r p + 1 and we may write f Q = τ ◦ · · · ◦ τ m ◦ ρ Q form some τ i ∈ S , some m ≥ , and some ρ Q ∈ M S of length r P + 1 . Likewise, for any other f ∈ M S with f ( P ) = Q ,we may write f = θ ,f ◦ · · · ◦ θ m,f ◦ q f ◦ ρ f for some θ i,f ∈ S , some q f ∈ M S , and some ρ f ∈ M S of length r P + 1 ; here we use the minimality of the length of f Q . Then f ( P ) = f Q ( P ) implies:(45) θ ,f ◦ · · · ◦ θ m,f ◦ q f ◦ ρ f ( P ) = τ ◦ · · · ◦ τ m ◦ ρ Q ( P ) . Now for all ≤ i ≤ m , let ρ i = θ i +1 ,f ◦ · · · ◦ θ m,f ◦ q f ◦ ρ f and P ′ i = τ i +1 ◦ · · · ◦ τ m ◦ ρ Q ( P ) . Inparticular, (45) becomes θ ,f ( ρ ( P )) = τ ( P ′ ) . On the other hand, P ′ i ∈ R P by deﬁnition of R P and ℓ ( ρ i ) ≥ ℓ ( ρ f ) = r P + 1 > r P for all i .Hence, Lemma 4.9 applied to ρ = ρ , P ′ = P ′ , θ = θ ,f , and τ = τ implies that θ ,f = τ and ρ ( P ) = P ′ . Therefore, θ ,f ◦ · · · ◦ θ m,f ◦ q f ◦ ρ f ( P ) = τ ◦ · · · ◦ τ m ◦ ρ Q ( P ) . Repeating the same argument, this time with ρ = ρ , P ′ = P ′ , etc., we see that Lemma 4.9implies that θ ,f = τ and ρ ( P ) = P ′ . 
We can clearly continue this argument ( m -times) andobtain that(46) q f ◦ ρ f ( P ) = ρ Q ( P ) and θ i,f = τ i for all ≤ i ≤ m. On the other hand, Tate’s Telescoping Lemma 2.2 and the fact that h ( P ) > b S imply thelower bound(47) r P +1 b S ≤ deg( ρ f )( h ( P ) − b s ) ≤ h ( ρ f ( P )) . Likewise, we have the upper bound(48) h ( ρ Q ( P )) ≤ deg( ρ Q )( h ( P ) + b S ) ≤ d Sr P +1 ( h ( P ) + b S ) . Hence, after combining (46), (47) and (48) with Lemma 2.2 applied to the map q f , we seethat deg( q f )(2 r P +1 − b S ≤ deg( q f )( h ( ρ f ( P )) − b S ) ≤ h ( q f ◦ ρ f ( P )) = h ( ρ Q ( P )) ≤ d Sr P +1 ( h ( P ) + b S ) . In particular, dividing both sides of the inequality above by (2 r P +1 − b S , we deduce that(49) ℓ ( q f ) ≤ deg( q f ) ≤ d Sr P +1 ( h ( P ) + b S )(2 r P +1 − b S . Hence the length of q f is bounded. But S is a ﬁnite set of maps, so the number of possible q f ’sis ﬁnite. Likewise, the length of ρ f is r P + 1 is bounded, and so there are only ﬁnitely manypossible ρ f ’s. In summation, we have shown that if f ∈ M S is any function with f ( P ) = Q , OUNTING POINTS OF BOUNDED HEIGHT IN MONOID ORBITS 17 then f = τ ◦ · · · ◦ τ m ◦ q f ◦ ρ f such that: the τ i are ﬁxed, and the number of possible q f ’s and ρ f ’s are bounded independently of Q . Speciﬁcally, we have that (cid:8) f ∈ M S : f ( P ) = Q (cid:9) ≤ s log l d SrP +1( h ( P )+ bS )(2 rP +1 − bS m + r P +1 holds for all Q ∈ Orb S ( P ) with h ( Q ) > d Sr P +1 ( h ( P ) + b S ) , which proves the claim. (cid:3) We now ﬁnish the proof of Theorems 1.5 and 1.6. Note that Lemma 4.10 implies that: t − P,S · (cid:8) f ∈ M S : H ( f ( P )) ≤ B (cid:9) + O (1) ≤ (cid:8) Q ∈ Orb S ( P ) : H ( Q ) ≤ B (cid:9) ≤ (cid:8) f ∈ M S : H ( f ( P )) ≤ B (cid:9) holds for all B suﬃciently large and all P such that H ( P ) > e b S . 
Moreover, combining thebounds above with Theorem 1.1, we see that for all ǫ > there exists an eﬀectively computablepositive constant b = b ( S, ǫ ) such that (log B ) b ≪ { Q ∈ Orb S ( P ) : H ( Q ) ≤ B } ≪ (log B ) b + ǫ as desired. (cid:3) In higher dimensions, it is possible that one can attack Conjecture 1.3 in a similar mannerto that above, provided that one can give a reasonable condition ensuring that the set ofrational/integral points on the variety V f,g := { ( P, Q ) ∈ P N × P N : f ( P ) = g ( Q ) } is not Zariski dense (for all distinct f, g ∈ M S of some ﬁxed length). To do this, it is likelynecessary to assume the Bombieri-Lang Conjecture.Likewise (although most sets generate free monoids), it would be interesting to study theheight growth rates in monoid orbits which are not free (or free commutative). As a test case,one might consider the following example from [14, Remark 1.5]: let ω be a primitive cuberoot of unity and let F ( x ) = x and G ( x ) = ωx . Then the monoid generated by S = { F, G } has three independent relations: F = G , F ◦ G = G ◦ F , and G ◦ F ◦ G = F ◦ G ◦ F .5. Appendix: integral points on curves f ( X ) − f ( Y ) X − Y (by Umberto Zannier) Let f ∈ C [ X ] be a polynomial of degree d ≥ and let O be a ﬁnitely generated subring of C . For the sequel we put(50) F ( X, Y ) = f ( X ) − f ( Y ) X − Y .
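For readers who wish to experiment with (50), the following Python sketch expands $F(X, Y)$ from the coefficients of $f$ using the identity $(X^k - Y^k)/(X - Y) = \sum_{i=0}^{k-1} X^i Y^{k-1-i}$. The function names and the sample cubic $f(X) = X^3 - 3X$ are assumptions chosen only for illustration.

```python
def divided_difference(coeffs):
    """Given coeffs[k] = coefficient of X^k in f, return F(X, Y) as a dict
    {(i, j): c} meaning c * X^i * Y^j, where F = (f(X) - f(Y)) / (X - Y)."""
    F = {}
    for k, c in enumerate(coeffs):
        if c == 0:
            continue
        # (X^k - Y^k) / (X - Y) = sum over i of X^i * Y^(k-1-i), 0 <= i < k.
        for i in range(k):
            key = (i, k - 1 - i)
            F[key] = F.get(key, 0) + c
    return F

def evaluate_bivariate(F, x, y):
    """Evaluate a polynomial stored as {(i, j): c} at the point (x, y)."""
    return sum(c * x**i * y**j for (i, j), c in F.items())

def evaluate_univariate(coeffs, x):
    """Evaluate f at x, with coeffs[k] the coefficient of X^k."""
    return sum(c * x**k for k, c in enumerate(coeffs))

# Hypothetical sample: f(X) = X^3 - 3X, so F(X, Y) = X^2 + XY + Y^2 - 3.
sample_f = [0, -3, 0, 1]
sample_F = divided_difference(sample_f)
```

In this sample the polynomial $F$ has total degree $d - 1 = 2$, and one can confirm at integer points that $(X - Y)\,F(X, Y) = f(X) - f(Y)$.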

Recall also that the cyclic polynomial of degree $n$ is simply $X^n$, and the Chebyshev polynomial of degree $n$ is the unique polynomial $T_n$ satisfying the identity $T_n(Z + Z^{-1}) = Z^n + Z^{-n}$. The purpose of the present Appendix is to prove the following:

Theorem 1.

Assume that the plane curve defined by $F(X, Y)$ has infinitely many points in $O$. Then there are an integer $n > 1$ and polynomials $g, l \in \mathbb{C}[X]$, with $\deg l = 1$, such that $f = g \circ S_n \circ l$, where $S_n$ is either the cyclic or the Chebyshev polynomial of degree $n$.

Remark. Note that the result has an easy converse, as soon as we allow some freedom on $O$, as we now illustrate:

(i) If $S_n(X) = X^n$ (after applying $l^{-1}$) we obtain factors $X - \zeta Y$ ($\zeta^n = 1$, $\zeta \ne 1$) for our polynomial $F(X, Y)$, i.e. components of the curve which are lines defined over $\mathbb{Q}(\zeta)$. Therefore we obtain infinitely many points in $O$ as soon as $O$ contains $\zeta$ (and the coefficients of $l$).

(ii) In the case $S_n = T_n$, from the defining property of $T_n$ we easily obtain (well-known) factors of $T_n(X) - T_n(Y)$ given by $X^2 - (\zeta + \zeta^{-1}) XY + Y^2 + (\zeta - \zeta^{-1})^2$, for $\zeta \ne \pm 1$ an $n$-th root of unity. On setting $Y = W + W^{-1}$, this quadratic in turn factors as $(X - \zeta W - \zeta^{-1} W^{-1})(X - \zeta^{-1} W - \zeta W^{-1})$. Hence, if we let $w$ take values in $O^*$ (which may well be infinite) and set $X = x = \zeta w + \zeta^{-1} w^{-1}$ we obtain again an infinity of points in $O$. We also obtain similarly quadratic factors of $T_n(X) + T_n(Y)$, which are relevant when $g(X) = h(X^2)$ is even. These factors divide also $T_{2n}(X) - T_{2n}(Y)$, since $T_{2n} = T_2 \circ T_n = T_n^2 - 2$.

In the next version of the result, i.e. Theorem 2 below, we shall add a further conclusion which implies that all but finitely many integral points arise in this way.

As to the theorem, we recall at once that in virtue of Siegel's Theorem (extended suitably to finitely generated subrings) an irreducible affine curve can have infinitely many (integral) points defined over $O$ only if
(i) it has genus $0$ and
(ii) it has at most two points at infinity.
See [4], or [17], or [23].
The crucial case is the original Siegel’s 1929 version over Z , asextended later by Mahler to the rings of S -integers in a number ﬁeld.Thus the problem is to investigate when the (possibly reducible) curve deﬁned by F ( X, Y ) has a component satisfying these ‘Siegel conditions’ (which cannot generally be improved).This leads in the ﬁrst place to the need to establish when the deﬁning polynomial F canbe reducible . If f is indecomposable (i.e. not of the shape g ◦ h for polynomials g, h of degree > ) then the correct condition was found by Fried [7]: namely, F is irreducible unless f ( X ) is either a cyclic or a Chebyshev polynomial up to a linear change of variable , which of coursecorresponds to our conclusion. (See also Schinzel’s book [21], especially 1.5, where ﬁelds ofdeﬁnitions are considered as well, which instead we disregard here.) An application of Fried’sresult would then directly yield the present theorem in the indecomposable cases.However, if f is decomposable then certainly F ( X, Y ) is anyway reducible, and the issueleads to more delicate problems concerning the nature of the irreducible factors. In the paper[1] a laborious classiﬁcation is obtained for all the cases when there is a factor deﬁning a curveof genus . The results of [1] depend on some ﬁnite-group theory, which is used to an evenmuch heavier extent in Mueller’s paper [19], which again obtains certain complete laboriousclassiﬁcations relevant for suitable applications of Siegel’s theorem.An applications of [1] would suﬃce for the present purposes of proving Theorem 1, evenforgetting about Siegel’s condition (ii). 
But in fact it turns out that adding such conditionnot only makes the former (i) automatic, but also leads to a much simpler and self-containedelementary proof, which can be hopefully useful for some readers and for other applications.Moreover this proof yields with little eﬀort a slightly more precise conclusion, as in the lastphrase of the statement below (which, as in the Remark above, allows to describe all butﬁnitely many integral points).To present such a proof is the scope of this Appendix. By the remarks above, for Theorem1 it will suﬃce to prove the following result (even disregarding the last conclusion): Theorem 2.

Assume that the polynomial $F(X,Y)$ has an irreducible factor $\Phi$ defining a curve with at most two points at infinity (in a closure in $\mathbb{P}^2$). Then $\deg \Phi \le 2$ and there are an integer $n > 1$ and polynomials $g, l \in \mathbb{C}[X]$, with $\deg l = 1$, such that $f = g \circ S_n \circ l$, where $S_n$ is the cyclic (if $\deg \Phi = 1$) or the Chebyshev (if $\deg \Phi = 2$) polynomial of degree $n$.

If $\deg \Phi = 1$, then $\Phi$ divides $l(X)^n - l(Y)^n$. If $\deg \Phi = 2$, then $\Phi$ is symmetric and either it divides $S_n(l(X)) - S_n(l(Y))$, or $g$ is even and $\Phi$ divides $S_n(l(X)) + S_n(l(Y))$.

(Footnote: By points at infinity we mean the missing points with respect to a projective closure of the curve. This number may increase by passing to a smooth model, but the theorem applies to any model.)

Proof. To start with, we normalize $f$ by assuming it is monic and with vanishing second coefficient: $f(X) = X^d + f_2 X^{d-2} + \dots + f_d$, $f_i \in \mathbb{C}$. This does not affect the results on taking into account the linear polynomial $l(X)$ in the statement.

Our affine (possibly reducible) curve $C_F : F(X,Y) = 0$ has degree $d-1$. Note that the points at infinity in $\mathbb{P}^2$ of (the closure of) this curve are given in homogeneous coordinates $(x:y:z)$ by $z = 0$, $x^d = y^d$, $x \neq y$, so they form a set of $d-1$ pairwise distinct points. (Footnote: They are smooth points, which simplifies things as we do not need to refer to smooth models.)

Let $\Phi(X,Y) \in \mathbb{C}[X,Y]$ be an irreducible factor of $F(X,Y)$, defining an irreducible curve $C_\Phi$ with at most two points at infinity. The homogeneous part of $\Phi$ of highest degree must be a factor of $(X^d - Y^d)/(X - Y)$, and the points at infinity correspond to linear factors of this homogeneous part. Since this has no multiple factors, we deduce that $C_\Phi$ has $\deg \Phi$ points at infinity. Hence, if $C_\Phi$ satisfies Siegel's condition (ii), we must have $\deg \Phi \le 2$.

From these considerations it also follows that we may assume that $\Phi$ is monic in $Y$.

Suppose first that $\deg \Phi = 1$, so $\Phi(X,Y) = Y - aX - b$; hence we must have $f(aX + b) = f(X)$ identically. Since however $f$ has vanishing second coefficient, this entails $b = 0$, hence $f(aX) = f(X)$. We already know that $a$ is a $d$-th root of unity, $a \neq 1$. If $n$ is the exact order of $a$, then $n > 1$ divides $d$ and $f$ must be a polynomial in $X^n$, i.e. $f(X) = g(X^n)$, and we fall into one of the cases of the conclusion.

Note that $Y - aX$ indeed divides $X^n - Y^n$, so the last assertion holds as well.

Suppose now that $\deg \Phi = 2$. The two points at infinity of $C_\Phi$ correspond to two Puiseux expansions

$$Y = P_\pm(X) := a_\pm X + b_{0\pm} + b_{1\pm} X^{-1} + \dots$$

in descending powers of $X$, where the $b_{i\pm}$ are complex numbers and $a_\pm$ are two distinct $d$-th roots of $1$, both different from $1$.

We have $\Phi(X, P_\pm(X)) = 0$, hence $F(X, P_\pm(X)) = 0$, so $f(X) = f(P_\pm(X))$ identically. As before, since $f$ has vanishing second coefficient this yields $b_{0\pm} = 0$. We may write

$$\Phi(X,Y) = (Y - a_+ X)(Y - a_- X) + L(X,Y) - k,$$

where $L$ is linear homogeneous and $k \in \mathbb{C}$. We have that $P_\pm(X) - a_\pm X = O(X^{-1})$, in the sense that it is a Puiseux series where no non-negative power of $X$ appears. Since $\Phi(X, P_\pm(X)) = 0$ we get that $L(X, P_\pm(X)) = O(1)$ for both choices of the sign. But then, since $a_\pm$ are distinct this implies $L = 0$, and since $\Phi$ is irreducible we have $k \neq 0$. Hence, setting $s := a_+ + a_-$, $p := a_+ a_-$, we have $pk \neq 0$ and

$$\Phi(X,Y) = (Y - a_+ X)(Y - a_- X) - k = Y^2 - sXY + pX^2 - k.$$

Let now $x$ be a variable over $\mathbb{C}$ and let $y$ be a solution of $\Phi(x,y) = 0$ in an extension of $\mathbb{C}(x)$, so $\mathcal{F} := \mathbb{C}(x,y)$ is the function field of $C_\Phi$.
Note that $\mathcal{F}$ (the function field $\mathbb{C}(x,y)$ of $C_\Phi$) is a quadratic extension of both $\mathbb{C}(x)$ and $\mathbb{C}(y)$; looking at the equation we find that the Galois groups are generated respectively by the automorphisms $\sigma, \tau$ of $\mathcal{F}$ (of order $2$) given by

$$\sigma(x) = x, \quad \sigma(y) = sx - y, \qquad \tau(x) = \left(\frac{s}{p}\right) y - x, \quad \tau(y) = y.$$

It will be notationally convenient to have another expression for $\mathcal{F}$. Define the linear forms $Z_\pm := Y - a_\pm X$, so $\Phi = Z_+ Z_- - k$. Letting $z_\pm = y - a_\pm x$ we thus have $z_+ z_- = k$ and

$$x = \frac{z_+ - z_-}{a_- - a_+} = \gamma (z_+ - z_-), \qquad y = \gamma (a_- z_+ - a_+ z_-),$$

where we have put $\gamma := (a_- - a_+)^{-1}$. So in particular we have $\mathcal{F} = \mathbb{C}(z_+)$, and by an easy computation one finds that the above automorphisms are expressed by

$$\sigma(z_+) = -z_- = \alpha z_+^{-1}, \qquad \tau(z_+) = -\frac{a_+}{a_-} z_- = \beta z_+^{-1}, \tag{51}$$

where $\alpha = -k$, $\beta = -k a_+ / a_-$.

Now, since $\Phi(x,y) = 0$ we have $F(x,y) = 0$, whence $f(x) = f(y)$, so the field $K := \mathbb{C}(x) \cap \mathbb{C}(y)$ contains $\mathbb{C}(f(x))$ and thus the degree $[\mathcal{F} : K]$ is finite. The field $K$ is left fixed by both $\sigma, \tau$, and thus by the group $G$ that they generate inside $\mathrm{Aut}(\mathcal{F}/\mathbb{C}) = \mathrm{PGL}_2(\mathbb{C})$. By basic Galois theory the fixed field of $G$ is actually precisely the intersection $\mathbb{C}(x) \cap \mathbb{C}(y) = K$.

We have $\sigma(\tau(z_+)) = (\beta/\alpha) z_+$, hence $\beta/\alpha = a_+/a_-$ is a root of unity of a certain order $n$: actually, we already knew that $a_+, a_-$ are $d$-th roots of unity, and they are distinct, so $n > 1$ is a divisor of $d$.

The group $G$ is generated by $\sigma$ and $\xi := \sigma\tau$. On looking at the action on $z_+$ it is now easily seen that $\sigma^{-1} \xi \sigma = \xi^{-1}$, so $G$ is a dihedral group of order $2n$.

Now, the rational function of $z_+$ given by $w := z_+^n + \alpha^n z_+^{-n}$, of degree $2n$, is plainly invariant under both $\sigma$ and $\xi$, hence under $G$. Again by simple Galois theory, we have $\mathbb{C}(w) = K$. Therefore $f(x)$, which lies in $K$, is a rational function of $w$, $f(x) = g(w)$. (On comparing degrees we find $\deg g = d/n$.) Recall that $x = \gamma(z_+ - z_-) = \gamma(z_+ + (-k) z_+^{-1}) = \gamma(z_+ + \alpha z_+^{-1})$.
Hence $x$ has only the poles $z_+ = 0, \infty$, and the same holds for $f(x)$ (as functions of $z_+$). It follows at once that $g$ must be a polynomial, of degree $d/n$.

The proof is now easily completed by a simple change of variables. We have $w \in K \subset \mathbb{C}(x)$, so we may write $w = S(x)$ with $S$ a rational function of degree $n$, which as above must be a polynomial.

Set $z = \delta z_+$ where $\delta^2 \alpha = 1$. Hence $x = \gamma \delta^{-1} (z + z^{-1})$. Also, $w = \delta^{-n} (z^n + z^{-n})$. Hence $\delta^n S(\gamma \delta^{-1} (z + z^{-1})) = z^n + z^{-n}$, and by uniqueness it follows that $\delta^n S(\gamma \delta^{-1} X) = T_n(X)$ is the Chebyshev polynomial of degree $n$. Hence in conclusion we find

$$f(X) = g(\delta^{-n} T_n(\gamma^{-1} \delta X)),$$

as required.

To check the last assertion, for notational simplification we slightly change conventions and replace $g(\delta^{-n} X)$ with $g(X)$ and $f(X)$ with $f(\gamma \delta^{-1} X)$, so as to suppose $f(X) = g(T_n(X))$. In the above notation, $x$ becomes $z + z^{-1}$ and $y = a_- z + a_+ z^{-1}$. (Note that these substitutions leave unchanged the set $\{a_+, a_-\}$.)

Also, let $\mu^2 = a_+/a_-$, so $\mu^n =: \epsilon \in \{\pm 1\}$. We have $y = \mu a_- ((z/\mu) + (z/\mu)^{-1})$, so $T_n((\mu a_-)^{-1} y) = \epsilon (z^n + z^{-n}) = \epsilon T_n(x)$. Hence, setting $\nu := (\mu a_-)^{-1}$, we have

$$T_n(\nu y) = \epsilon T_n(x), \qquad g(T_n(y)) = f(y) = f(x) = g(T_n(x)) = g(\epsilon T_n(\nu y)).$$

Denoting $b := \deg g = d/n$, we then deduce that $\deg\big(T_n(y)^b - (\epsilon T_n(\nu y))^b\big) \le (b-1)n$. But on factoring the left side and noting that all factors but at most one have degree $\ge n$, this implies that in fact one of the factors is constant, hence

$$T_n(y) = \theta \epsilon T_n(\nu y) + c, \qquad g(\theta X + c) = g(X), \tag{52}$$

for some $b$-th root of unity $\theta$. Note that all of these equalities hold identically.

Now, the Chebyshev polynomial $T_n(X)$ starts with $X^n - n X^{n-2} + \dots$, whence the first of the equations gives $\theta \epsilon \nu^n = \nu^2 = 1$. Also, if $n$ is odd then $T_n(0) = 0$, whence $c = 0$; if $n$ is even then $\nu^n = 1$, so $\theta \epsilon = 1$, and again setting $y = 0$ we find $c = 0$ anyway.
Conversely, if these equalities hold it is easy to check that the equation holds, since $T_n$ has the same parity as $n$. So we may suppose in the sequel that $\theta \epsilon \nu^n = \nu^2 = 1$ and that $g(\theta X) = g(X)$.

Now, consider again the equation $T_n(\nu y) = \epsilon T_n(x)$, i.e. $\nu^n T_n(y) = \epsilon T_n(x)$.

If $\nu^n = \epsilon$ we have $T_n(x) = T_n(y)$, so $\Phi(X,Y)$ divides $(T_n(X) - T_n(Y))/(X - Y)$, and we are in the first case of the conclusion.

If $\nu^n \neq \epsilon$, then $T_n(x) = -T_n(y)$, hence $\Phi(X,Y)$ divides $T_n(X) + T_n(Y)$. Also, we have already observed that $\theta = \epsilon \nu^n$, which in this case equals $-1$, so $g$ is an even polynomial by the second equation in (52) (since $c = 0$), again as in the sought conclusion.

Finally, from the above equations we derive

$$p = a_+ a_- = (a_+/a_-)(a_-)^2 = (\mu a_-)^2 = \nu^{-2} = 1,$$

hence $\Phi(X,Y)$ is symmetric.

This concludes the proof of Theorem 2. $\square$

Remark. Actually, the proof yields some small supplementary information on the structure of the factors (which however can be deduced independently a posteriori).

We also note that the last conclusion could have been stated as follows: if $n$ is maximal such that the decomposition holds, then the quadratic factor anyway divides $S_n(l(X)) - S_n(l(Y))$. Indeed, if $g$ is even, then since $T_2(X) = X^2 - 2$, $g$ can be written as $h \circ T_2$, and now we use that $T_2 \circ T_n = T_{2n}$ (well known and easy to deduce). This argument is fairly standard.
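The Chebyshev case of Theorem 2 can be sanity-checked numerically in the simplest instance. The following is a minimal sketch, not part of the proof, under assumptions chosen purely for illustration: $f = T_n$ (so $g(X) = X$ and $l(X) = X$), $a_- = \zeta$ and $a_+ = \zeta^{-1}$ for an $n$-th root of unity $\zeta$ with $\zeta^2 \neq 1$, and the normalization $x = z + z^{-1}$, $y = a_- z + a_+ z^{-1}$ used at the end of the proof. In this normalization $z_+ z_- = -(a_- - a_+)^2$, which is the value of $k$ used below. The function names (`chebyshev`, `slopes`, `curve_point`, `quadratic_factor`) are hypothetical helpers introduced for this sketch.

```python
import cmath


def chebyshev(n, x):
    """Normalized Chebyshev polynomial T_n, with T_n(z + 1/z) = z^n + z^(-n).

    Evaluated with the recurrence T_0 = 2, T_1 = x, T_{m+1} = x*T_m - T_{m-1}.
    """
    prev, cur = 2, x
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, x * cur - prev
    return cur


def slopes(n, idx):
    """Return (a_plus, a_minus) = (1/zeta, zeta) for zeta = exp(2*pi*i*idx/n).

    Illustrative choice (an assumption of this sketch): zeta^2 != 1 is needed
    so that a_plus and a_minus are distinct.
    """
    zeta = cmath.exp(2j * cmath.pi * idx / n)
    return 1 / zeta, zeta


def curve_point(n, idx, z):
    """Point (x, y) = (z + 1/z, a_minus*z + a_plus/z) on the curve Phi = 0."""
    a_plus, a_minus = slopes(n, idx)
    return z + 1 / z, a_minus * z + a_plus / z


def quadratic_factor(n, idx, x, y):
    """Evaluate Phi(x, y) = y^2 - s*x*y + p*x^2 - k with k = z_plus * z_minus."""
    a_plus, a_minus = slopes(n, idx)
    s = a_plus + a_minus
    p = a_plus * a_minus          # equals 1 here, so Phi is symmetric
    k = -(a_minus - a_plus) ** 2  # z_plus * z_minus in this normalization
    return y * y - s * x * y + p * x * x - k


if __name__ == "__main__":
    n, idx, z = 5, 1, 0.7 + 0.4j
    x, y = curve_point(n, idx, z)
    # Each printed residual should be at the level of floating-point rounding.
    print("T_n(z + 1/z) - (z^n + z^-n):", abs(chebyshev(n, x) - (z**n + z**-n)))
    print("T_n(x) - T_n(y):            ", abs(chebyshev(n, x) - chebyshev(n, y)))
    print("Phi(x, y):                  ", abs(quadratic_factor(n, idx, x, y)))
    print("T_2(T_n(x)) - T_2n(x):      ", abs(chebyshev(2, chebyshev(n, x)) - chebyshev(2 * n, x)))
```

Floating-point residuals are used instead of symbolic algebra only to keep the sketch dependency-free; a computer algebra system would give exact confirmation of the same identities.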


References

[1] R. Avanzi and U. Zannier, The equation f(X) = f(Y) in rational functions X = X(t), Y = Y(t), Compositio Mathematica K3 surfaces in P × P × P, Math. Ann. f(x) = g(y), Acta Arithmetica, Cambridge University Press 2006.
[5] W. Bosma, J. Cannon, and C. Playoust, The Magma algebra system I: The user language, Journal of Symbolic Computation
Cambridge University Press, 2009.
[7] M. Fried, On a conjecture of Schur, Michigan Math. J., 17 (1970): 41-50.
[8] D. Ghioca, T. J. Tucker, and M. E. Zieve, Linear relations between polynomial orbits, Duke Mathematical Journal
Linear algebra and its applications
J. Number Theory, 201 (2019): 228-256.
[11] W. Hindes, Dynamical and arithmetic degrees for random iterations of maps on projective space, preprint, arXiv:1904.04709.
[12] W. Hindes, Finite orbit points for sets of quadratic polynomials, Int. J. Number Theory, 15.8 (2019): 1693-1719.
[13] W. Hindes, Dynamical height growth: left, right, and total orbits, submitted, arXiv:2002.09798.
[14] Z. Jiang and M. Zieve, Functional equations in polynomials, REU project.
[15] S. Kawaguchi, Canonical heights for random iterations in certain varieties, Int. Math. Res. Not., Article ID rnm023, 2007.
[16] S. Kawaguchi and J. H. Silverman, On the dynamical and arithmetic degrees of rational self-maps of algebraic varieties, Journal für die reine und angewandte Mathematik (Crelles Journal) 2016.713 (2016): 21-48.
[17] S. Lang, Diophantine Geometry, Springer-Verlag, 1982.
[18] S. Lang, Fundamentals of Diophantine geometry, Springer Science & Business Media, 2013.
[19] P. Mueller, Permutation groups with a cyclic two-orbits subgroup and monodromy groups of Laurent polynomials, Ann. Sc. Norm. Super. Pisa Cl. Sci. (5) 12 (2013), no. 2, 369-398.
[20] F. Pakovich, Algebraic curves P(x) − Q(y) = 0 and functional equations, Complex Variables and Elliptic Equations 56.1-4 (2011): 199-213.
[21] A. Schinzel, Polynomials with special regard to reducibility, Cambridge Univ. Press, 2000.
[22] J-P. Serre, Lectures on the Mordell-Weil theorem, Aspects of Mathematics, Friedr. Vieweg & Sohn, Braunschweig, third edition, 1997. Translated from the French and edited by Martin Brown from notes by Michel Waldschmidt, with a foreword by Brown and Serre.
[23] J-P. Serre, Lectures on the Mordell-Weil Theorem, 2nd Ed., Vieweg, 1990.
[24] J. Silverman, The Arithmetic of Dynamical Systems, Vol. 241, Springer GTM, 2007.
[25] J. Silverman, Rational points on K3 surfaces: a new canonical height, Inventiones mathematicae
Mathematics of Computation 39.160 (1982): 709-723.