Multiple ergodic averages for variable polynomials

Abstract

In this paper we study multiple ergodic averages for "good" variable polynomials. In particular, under an additional assumption, we show that these averages converge to the expected limit, making progress related to an open problem posted by Frantzikinakis (Problem 10, "Some open problems on multiple ergodic averages. Bulletin of the Hellenic Mathematical Society. 60 (2016), 41-90"). Corresponding averages along prime numbers are studied too. These general convergence results imply various variable extensions of classical recurrence, combinatorial and number theoretical results which are presented as well.


ANDREAS KOUTSOGIANNIS


Dedicated to the loving memory of Aris Deligiannis, a great mentor.

Contents

1. Introduction
Notation
2. General results and applications
2.1. Single sequence
2.2. Multiple sequences
2.3. Convergence along primes
2.4. Recurrence along shifted primes
3. The ℓ = 1 case
4. Some background material
4.1. Factors
4.2. Nilmanifolds
5. Finding the characteristic factor
5.1. The base case
5.2. The general case
6. Equidistribution
7. From natural to prime numbers
8. Proof of main results
8.1. Averages along natural numbers
8.2. Averages along prime numbers
8.3. Closing comments and problems
Acknowledgements
References

Mathematics Subject Classification. Primary: 37A45; Secondary: 37A05, 05D10, 11B25, 11B83.

Key words and phrases. Variable polynomial sequences, multiple ergodic averages, multiple recurrence, characteristic factors, equidistribution, nilmanifolds, prime numbers, sublinear functions.

1. Introduction

The study of multiple ergodic averages along polynomials dates back to 1977, when Furstenberg, exploiting the ($L^2$) limiting behavior, as $N \to \infty$, of
\[ \frac{1}{N}\sum_{n=1}^{N} T^{n} f_1 \cdot T^{2n} f_2 \cdot \ldots \cdot T^{\ell n} f_\ell, \tag{1} \]
where $\ell \in \mathbb{N}$, $(X, \mathcal{B}, \mu, T)$ is a measure preserving system, and $f_1, \ldots, f_\ell \in L^\infty(\mu)$, provided (in [20]) a purely ergodic theoretic proof of Szemerédi's theorem (i.e., every subset of natural numbers of positive upper density contains arbitrarily long arithmetic progressions; this result can be immediately obtained by combining Theorem 2.3 with Theorem 2.5 below).

It was Bergelson who first visualized the iterates $n, 2n, \ldots, \ell n$ in (1) as "distinct enough" polynomials and studied (initially in [4]), via the use of van der Corput's lemma, which is a crucial tool in "reducing the complexity" of the iterates (see Lemma 5.5 below for a variable variation of it), averages of the form
\[ \frac{1}{N}\sum_{n=1}^{N} T^{p_1(n)} f_1 \cdot \ldots \cdot T^{p_\ell(n)} f_\ell, \tag{2} \]
for essentially distinct integer polynomials $p_1, \ldots, p_\ell$, which, eventually, led to multidimensional polynomial extensions of Szemerédi's theorem (see [7]).

Bergelson and Leibman conjectured (in [6]) that multiple ergodic averages of the form
\[ \frac{1}{N}\sum_{n=1}^{N} T_1^{p_1(n)} f_1 \cdot \ldots \cdot T_\ell^{p_\ell(n)} f_\ell, \tag{3} \]
in any system, for multiple commuting $T_i$'s (i.e., $T_i T_j = T_j T_i$ for all $i, j$) and arbitrary integer polynomials $p_i$, always have a limit (as $N \to \infty$). This conjecture was answered in the positive by Walsh (in [36]), who showed it in greater generality, namely, for products of transformations (i.e., $\mathbb{Z}^\ell$-actions) which generate a nilpotent group, averaging along Følner sequences in $\mathbb{Z}$. Alas, no specific expression of the limit was provided by the method.

One of the questions that one is called upon to answer is under which conditions on the polynomials and/or the system we can explicitly find the limit of the aforementioned expressions. In this direction we have only a few recent results; we present some of them here.
For a weakly mixing $T$, Furstenberg ([20]) showed that (1), and later Bergelson ([4]) that (2), converges to $\prod_{i=1}^{\ell} \int f_i \, d\mu$; we will refer to this limit as the "expected" one. Frantzikinakis and Kra (in [19]) proved that for a totally ergodic $T$ (i.e., $T^n$ is ergodic for all $n \in \mathbb{N}$) and independent integer polynomials $p_1, \ldots, p_\ell$ (i.e., every non-trivial linear combination of the $p_i$'s with scalars from $\mathbb{Z}$ is non-constant), (2) converges to the expected limit as well. For the multiple transformations case, when all $T_i$'s are weakly mixing and the polynomials are of positive distinct degrees, Chu-Frantzikinakis-Host (in [8]) showed that (3) converges to the expected limit too, a result that we extended in [30] for iterates of the form $([p_i(n)])_n$, for $p_i \in \mathbb{R}[t]$, where $[\cdot]$ denotes the floor function.

Host and Kra (in [26]), developing the theory of characteristic factors, obtained an explicit expression of the limit of (1) in a general system.

Footnotes. All the limits in this article are taken with respect to the $L^2$ norm, unless otherwise stated. A measure preserving system $(X, \mathcal{B}, \mu, T)$ means that $T \colon X \to X$ is an invertible measure preserving transformation on the probability space $(X, \mathcal{B}, \mu)$. For a set $A \subseteq \mathbb{N}$ we define its upper density, $\bar{d}(A)$, as $\bar{d}(A) := \limsup_{N \to \infty} \frac{|A \cap \{1, \ldots, N\}|}{N}$. A $p \in \mathbb{Q}[t]$ is called an integer polynomial if $p(\mathbb{Z}) \subseteq \mathbb{Z}$. The non-constant integer polynomials $p_1, \ldots, p_\ell$, $\ell \in \mathbb{N}$, are called essentially distinct if $p_i - p_j$ is non-constant for all $i \neq j$. A Følner sequence in $\mathbb{Z}$ is a sequence $(\Phi_n)_n$ of finite subsets of $\mathbb{Z}$ such that for any $m \in \mathbb{Z}$ we have $\lim_{n \to \infty} \frac{|(\Phi_n + m) \cap \Phi_n|}{|\Phi_n|} = 1$. A special such sequence, that we will deal with, is $\Phi_n = \{1, \ldots, n\}$, $n \in \mathbb{N}$. $T$ is ergodic if $T^{-1}A = A$ implies $\mu(A) \in \{0, 1\}$. $T$ is weakly mixing if $T \times T$ is ergodic.
Analogously to that, Austin (in [1, 2]), studying (3), found precise characteristic factors for some specific cases of quadratic polynomials for $\ell = 2$ (and linear polynomials for $\ell = 3$). Together with Donoso and Sun, exploiting a result by Tao and Ziegler ([35]) on concatenation of factors, we studied (in [10]) expressions (even more general than (3), namely for $\mathbb{Z}^\ell$-actions) with essentially distinct integer polynomials (of multiple variables).

Showing that the characteristic factor coincides with the nilfactor of the system and exploiting the equidistribution property of the corresponding polynomial sequence in nilmanifolds, Frantzikinakis proved (in [13]) that the expression
\[ \frac{1}{N}\sum_{n=1}^{N} T^{[p(n)]} f_1 \cdot T^{2[p(n)]} f_2 \cdot \ldots \cdot T^{\ell[p(n)]} f_\ell, \tag{4} \]
where $p \in \mathbb{R}[t]$ with $p(t) \neq c q(t) + d$, $c, d \in \mathbb{R}$, $q \in \mathbb{Q}[t]$, has, in any system, the same limit (as $N \to \infty$) as (1), obtaining a refinement of Szemerédi's theorem. Generalizing the condition to multiple polynomials, following Frantzikinakis' approach, we showed with Karageorgos (in [27]) that for strongly independent real polynomials $p_1, \ldots, p_\ell$ (i.e., any non-trivial linear combination of the $p_i$'s with scalars from $\mathbb{R}$ has at least one non-constant irrational coefficient) the expression
\[ \frac{1}{N}\sum_{n=1}^{N} T^{[p_1(n)]} f_1 \cdot \ldots \cdot T^{[p_\ell(n)]} f_\ell \tag{5} \]
has the expected limit.

Generalizing the previous definition for sequences of real variable polynomials, i.e., of the form $(p_N)_N \subseteq \mathbb{R}[t]$, we have the following:

Definition 1.1 ([14]). The sequence $(p_N)_N$, where $p_N \in \mathbb{R}[t]$, $N \in \mathbb{N}$, is good if the polynomials have bounded degree and for every non-zero $\alpha \in \mathbb{R}$ we have
\[ \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} e^{i p_N(n) \alpha} = 0. \tag{6} \]

Footnotes. Letting $\lceil x \rceil$ and $[[x]]$ denote the ceiling and the closest integer functions respectively, using the relations $\lceil x \rceil = -[-x]$ and $[[x]] = [x + 1/2]$, we see that the result on (4) remains true if the $[\cdot]$'s are individually and independently replaced by any rounding function. Similar results for Hardy field and tempered functions can be found in [11, 13] and [31] respectively. By "expected limit" for (5) we mean, as it was stated before, in case $T$ is ergodic, that the limit is equal to $\prod_{i=1}^{\ell} \int f_i \, d\mu$, whereas, in the general case, it is equal to $\prod_{i=1}^{\ell} \mathbb{E}(f_i | \mathcal{I}(T))$, where $\mathbb{E}(f_i | \mathcal{I}(T))$ is the conditional expectation of $f_i$ with respect to the $\sigma$-algebra of the $T$-invariant sets. Actually, in (5) one can use any combination of the rounding functions $[\cdot]$ and $[[\cdot]]$, or even replace each $[\cdot]$ by $\lceil \cdot \rceil$.
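The goodness condition (6) can be probed numerically. The following is a minimal sketch, assuming the illustrative family $p_N(n) = n/N^a$ with $0 < a < 1$; the function name `exponential_average` and the parameter values are hypothetical choices made here, not part of the paper. It evaluates the modulus of the average in (6) for a fixed non-zero $\alpha$ and lets one watch it shrink as $N$ grows.

```python
import cmath


def exponential_average(N: int, a: float, alpha: float) -> float:
    """Return |(1/N) * sum_{n=1}^{N} exp(i * alpha * p_N(n))| for p_N(n) = n / N**a.

    This is the modulus of the average appearing in condition (6),
    specialised to an illustrative (assumed) family of variable polynomials.
    """
    scale = alpha / N ** a  # alpha times the linear coefficient of p_N
    total = sum(cmath.exp(1j * scale * n) for n in range(1, N + 1))
    return abs(total) / N


if __name__ == "__main__":
    # For this family the sum is geometric, so the modulus is at most
    # 1 / (N * |sin(alpha / (2 * N**a))|), which tends to 0 when 0 < a < 1.
    for N in (10**3, 10**4, 10**5):
        print(N, exponential_average(N, a=0.5, alpha=1.0))
```

Since the sum is geometric here, the decay can also be checked by hand; the numerical version is only meant as a template for families where no closed form is available.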

The sequence of $\ell$-tuples of variable polynomials $(p_{1,N}, \ldots, p_{\ell,N})_N$, where $p_{i,N} \in \mathbb{R}[t]$, $N \in \mathbb{N}$, $1 \leq i \leq \ell$, is good if every non-trivial linear combination of the sequences $(p_{1,N})_N, \ldots, (p_{\ell,N})_N$ is good. For this class of polynomial sequences, Frantzikinakis stated the following problem:

Problem 1 (Problem 10, [14]). Let $(p_{1,N}, \ldots, p_{\ell,N})_N$ be a good $\ell$-tuple of variable polynomials. Then, for every ergodic system $(X, \mathcal{B}, \mu, T)$ and functions $f_1, \ldots, f_\ell \in L^\infty(\mu)$, we have
\[ \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} T^{[p_{1,N}(n)]} f_1 \cdot \ldots \cdot T^{[p_{\ell,N}(n)]} f_\ell = \prod_{i=1}^{\ell} \int f_i \, d\mu, \]
where the convergence takes place in $L^2(\mu)$.

Remark 1.2 ([14]). If $p_N(t) = \sum_{k=1}^{d} c_{k,N} t^k$, $c_{k,N} \in \mathbb{R}$, $N \in \mathbb{N}$, then $(p_N)_N$ is good iff for any $a \in \mathbb{R} \setminus \{0\}$ we have
\[ \lim_{N \to \infty} N^j \| c_{j,N} a \| = \infty \quad \text{for some } 1 \leq j \leq d, \tag{7} \]
where $\| \cdot \|$ denotes the distance to the closest integer, i.e., $\| x \| := d(x, \mathbb{Z})$.

As it is stated in [14], Problem 1 is interesting even in the following special cases (for a proof that the following sequences are good, see Lemma 8.1):

Example 1 ([14]). For $\ell = 2$, the pair $(p_{1,N}, p_{2,N})_N$, where $p_{1,N}(n) = n/N^a$, $p_{2,N}(n) = n/N^b$, $N, n \in \mathbb{N}$, $0 < a < b < 1$, is good.

Example 2 ([14]). For $\ell \in \mathbb{N}$, the $\ell$-tuple $(p_{1,N}, \ldots, p_{\ell,N})_N$, where $p_{i,N}(n) = n^i/N^a$, $1 \leq i \leq \ell$, $N, n \in \mathbb{N}$, $0 < a < 1$, is good.

Since (4) has the same limit as (1) for $p \in \mathbb{R}[t]$ with $p(t) \neq c q(t) + d$, $c, d \in \mathbb{R}$, $q \in \mathbb{Q}[t]$, which follows from [13, Theorem 2.2], one naturally states the following problem, which is the good-variable-polynomial version of Frantzikinakis' result:

Problem 2. Let $(p_N)_N$ be a good polynomial sequence. Then, for every $\ell \in \mathbb{N}$, system $(X, \mathcal{B}, \mu, T)$ and $f_1, \ldots, f_\ell \in L^\infty(\mu)$, we have
\[ \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} T^{[p_N(n)]} f_1 \cdot T^{2[p_N(n)]} f_2 \cdot \ldots \cdot T^{\ell[p_N(n)]} f_\ell = \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} T^{n} f_1 \cdot T^{2n} f_2 \cdot \ldots \cdot T^{\ell n} f_\ell, \]
where the convergence takes place in $L^2(\mu)$.

Footnote. Notice that the definition of a good $\ell$-tuple, via Weyl's equidistribution theorem, for polynomials independent of $N$, characterizes the "strong independence" notion from [27].

After stating Problem 1, Frantzikinakis mentions that its $\ell = 1$ case (which also coincides with the $\ell = 1$ case of Problem 2) can be obtained by using the spectral theorem. As this case can be proved without postulating any additional assumptions on the coefficients of the good variable polynomial sequence $(p_N)_N$, and it reflects its strong equidistribution behavior, we present, for reasons of completeness, its proof. In particular, in Section 3 we show:

Theorem 1.3. If $(p_N)_N$ is a good sequence of polynomials, $(X, \mathcal{B}, \mu, T)$ an ergodic system and $f \in L^2(\mu)$, then
\[ \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} T^{[p_N(n)]} f = \int f \, d\mu, \]
where the convergence takes place in $L^2(\mu)$.

For general $\ell \in \mathbb{N}$, we also make progress towards the validation of both Problems 1 and 2. In particular, under some additional assumptions on the coefficients of the good variable polynomials, we show two general results, Theorem 2.1 and Theorem 2.2 (see next section). In this introductory section, we will present an easier application of each of them, which still covers both Examples 1 and 2.

To this end, we first recall the set of sublinear logarithmico-exponential Hardy field functions (of polynomial degree $0$) which converge to ($\pm$) infinity:
\[ \mathcal{SLE} := \{ g \in \mathcal{LE} : 1 \prec g(x) \prec x \}. \]
Next, we define an appropriate set of functions, from which we will get our variable sequences of coefficients:
\[ \mathcal{C} := \left\{ \sum_{i=1}^{l} \frac{\rho_i}{g_i} : \rho_i \in \mathbb{R},\ g_i \in \mathcal{SLE},\ 1 \leq i \leq l,\ l \in \mathbb{N}, \text{ with } g_l \prec \ldots \prec g_1 \right\}. \]
Extending the definition from [27], we say that the sequence of $\ell$-tuples of variable polynomials $(p_{1,N}, \ldots, p_{\ell,N})_N$, where for each $1 \leq i \leq \ell$, $p_{i,N}$ has the form
\[ p_{i,N}(n) = a_{i,d_i,N} n^{d_i} + \cdots + a_{i,1,N} n + a_{i,0,N}, \tag{8} \]
with $(a_{i,0,N})_N$ bounded, and $a_{i,j,\cdot} \in \mathcal{C}$, $1 \leq j \leq d_i$, is strongly independent if for any $(\lambda_1, \ldots, \lambda_\ell) \in \mathbb{R}^\ell \setminus \{\vec{0}\}$ we have that $\sum_{i=1}^{\ell} \lambda_i p_{i,N}(n)$ is a non-constant polynomial in $n$.

For example, the following tuple of variable polynomials

√ N / + 1log N · log log N ! n − N n + 1 , N / n , √ N e/ − N π/ log / N ! n ! N

is strongly independent.

Footnotes. In Theorem 1.3 one can of course use $\lceil \cdot \rceil$ or $[[\cdot]]$ instead of $[\cdot]$. Let $R$ be the collection of equivalence classes of real valued functions defined on some halfline $(c, \infty)$, $c \geq 0$, where two functions that agree eventually are identified. These equivalence classes are called germs of functions. A Hardy field is a subfield of the ring $(R, +, \cdot)$ that is closed under differentiation. Here, we use the word function when we refer to elements of $R$ (understanding that all the operations defined and statements made for elements of $R$ are considered only for sufficiently large values of $x \in \mathbb{R}$). We say that $g$ is a logarithmico-exponential Hardy field function, and we write $g \in \mathcal{LE}$, if it belongs to a Hardy field of real valued functions and it is defined on some $(c, +\infty)$, $c \geq 0$, by a finite combination of symbols $+, -, \times, \div, \sqrt[n]{\cdot}, \exp, \log$ acting on the real variable $x$ and on real constants (for more on Hardy field functions, and in particular on logarithmico-exponential ones, see [11] and [13]). We write $g_1 \prec g_2$ if $|g_2(x)| / |g_1(x)| \to \infty$ as $x \to \infty$.

For Problem 1, i.e., multiple variable polynomial sequences, we have:

Theorem 1.4. For $\ell \in \mathbb{N}$, let $(p_{1,N}, \ldots, p_{\ell,N})_N$ be a strongly independent $\ell$-tuple of polynomials of the form (8). Then, for every ergodic system $(X, \mathcal{B}, \mu, T)$ and $f_1, \ldots, f_\ell \in L^\infty(\mu)$, we have
\[ \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} T^{[p_{1,N}(n)]} f_1 \cdot \ldots \cdot T^{[p_{\ell,N}(n)]} f_\ell = \prod_{i=1}^{\ell} \int f_i \, d\mu, \]
where the convergence takes place in $L^2(\mu)$.

It is true that strongly independent sequences of variable polynomials as in (8) are good (for this, see Lemma 8.1). It is also easy to check that the sequences $(p_{1,N}, p_{2,N})_N$, with $p_{1,N}(n) = n/N^a$, $p_{2,N}(n) = n/N^b$, $0 < a < b < 1$, and $(q_{1,N}, \ldots, q_{\ell,N})_N$, with $q_{i,N}(n) = n^i/N^a$, $1 \leq i \leq \ell$, and $0 < a < 1$, are strongly independent (here $g_1(N) := N^a \prec g_2(N) := N^b \in \mathcal{SLE}$), hence Theorem 1.4 indeed covers both Examples 1 and 2.

For Problem 2, i.e., a single variable polynomial sequence, we have the following:

Theorem 1.5. Let $(p_N)_N$ be a non-constant polynomial sequence of the form (8). Then, for every $\ell \in \mathbb{N}$, system $(X, \mathcal{B}, \mu, T)$ and $f_1, \ldots, f_\ell \in L^\infty(\mu)$, we have
\[ \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} T^{[p_N(n)]} f_1 \cdot T^{2[p_N(n)]} f_2 \cdot \ldots \cdot T^{\ell[p_N(n)]} f_\ell = \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} T^{n} f_1 \cdot T^{2n} f_2 \cdot \ldots \cdot T^{\ell n} f_\ell, \]
where the convergence takes place in $L^2(\mu)$.

Following arguments from [18] and [29], we get the results corresponding to Theorems 1.4 and 1.5 along prime numbers (see also the more general Theorems 2.16 and 2.17, and Subsection 2.4 for recurrence results along shifted primes). To the best of our knowledge, these are the very first results in the literature of this nature (i.e., for variable iterates).
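To see what Theorem 1.4 asserts in a concrete system, one can take an irrational rotation $Tx = x + \beta \pmod 1$ on the circle (an ergodic system) and characters $f_j(x) = e^{2\pi i k_j x}$ with $k_j \neq 0$, so that the expected limit $\prod_j \int f_j \, d\mu$ is $0$. For the pair of Example 1, the average is then a constant multiple of a single character, and its $L^2$ norm is the modulus of a plain exponential sum. The sketch below computes that modulus; the rotation, the characters, the function name `rotation_average_norm` and all parameter values are assumptions made for illustration, and the code is a sanity check rather than a proof.

```python
import cmath
import math


def rotation_average_norm(N: int, a: float, b: float, beta: float,
                          k1: int = 1, k2: int = 1) -> float:
    """L^2 norm of (1/N) * sum_n T^{[p_1(n)]} f_1 * T^{[p_2(n)]} f_2.

    Assumed setting: T x = x + beta (mod 1), f_j(x) = exp(2*pi*i*k_j*x),
    p_1(n) = n / N**a and p_2(n) = n / N**b.  Since
    T^m f_j(x) = f_j(x) * exp(2*pi*i*k_j*m*beta), the average equals
    exp(2*pi*i*(k1 + k2)*x) times a constant, whose modulus is returned.
    """
    M1, M2 = N ** a, N ** b
    total = 0j
    for n in range(1, N + 1):
        m1 = math.floor(n / M1)  # [p_1(n)]
        m2 = math.floor(n / M2)  # [p_2(n)]
        total += cmath.exp(2j * math.pi * beta * (k1 * m1 + k2 * m2))
    return abs(total) / N


if __name__ == "__main__":
    for N in (10**3, 10**4, 10**5):
        print(N, rotation_average_norm(N, a=0.2, b=0.8, beta=math.sqrt(2)))
```

The printed values should be small compared with $1$ (the trivial bound), in line with the expected limit $0$; nothing here says anything about rates of convergence in general systems.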

Theorem 1.6. For $\ell \in \mathbb{N}$, let $(q_{1,N}, \ldots, q_{\ell,N})_N$ be a strongly independent $\ell$-tuple of polynomials of the form (8). Then, for every ergodic system $(X, \mathcal{B}, \mu, T)$ and $f_1, \ldots, f_\ell \in L^\infty(\mu)$, we have
\[ \lim_{N \to \infty} \frac{1}{\pi(N)}\sum_{p \in \mathbb{P} \cap [1,N]} T^{[q_{1,N}(p)]} f_1 \cdot \ldots \cdot T^{[q_{\ell,N}(p)]} f_\ell = \prod_{i=1}^{\ell} \int f_i \, d\mu, \]
where the convergence takes place in $L^2(\mu)$ and $\pi(N) = |\mathbb{P} \cap [1, N]|$ denotes the number of primes up to $N$.

Theorem 1.7. Let $(q_N)_N$ be a non-constant polynomial sequence of the form (8). Then, for every $\ell \in \mathbb{N}$, system $(X, \mathcal{B}, \mu, T)$ and $f_1, \ldots, f_\ell \in L^\infty(\mu)$, we have
\[ \lim_{N \to \infty} \frac{1}{\pi(N)}\sum_{p \in \mathbb{P} \cap [1,N]} T^{[q_N(p)]} f_1 \cdot T^{2[q_N(p)]} f_2 \cdot \ldots \cdot T^{\ell[q_N(p)]} f_\ell = \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \prod_{i=1}^{\ell} T^{in} f_i, \]
where the convergence takes place in $L^2(\mu)$.

As we previously highlighted, very few convergence results for ergodic averages with polynomial iterates, in which we can explicitly find the limit, exist. Generally, results for variable polynomials are extremely sparse. We conclude this introduction by mentioning some of them. Kifer (in [28]) studied multiple averages for variable polynomials of the form $p_{i,N}(n) = p_i(n) + q_i(N)$, with $p_i$'s essentially distinct and $q_i(\mathbb{Z}) \subseteq \mathbb{Z}$ for $T$ weakly mixing, and, for more general polynomials, for $T$ strongly mixing "enough". Finally, Frantzikinakis (in [13]) found characteristic factors for averages with variable polynomial iterates $p_{i,N}$ whose leading coefficients are independent of $N$. It is the arguments from this last part that we will adapt, in order to find characteristic factors for our averages as well, which is one of the two main ingredients of the proof (the second one being the equidistribution of particular sequences, for which we adapt arguments from [12]).

Notation. With $\mathbb{P}$, $\mathbb{N} = \{1, 2, \ldots\}$, $\mathbb{Z}$, $\mathbb{Q}$, and $\mathbb{R}$ we denote the sets of prime, natural, integer, rational and real numbers respectively. For a measurable function $f$ on a measure space $X$ with a transformation $T \colon X \to X$, we denote with $Tf$ the composition $f \circ T$. For $s \in \mathbb{N}$, $\mathbb{T}^s = \mathbb{R}^s / \mathbb{Z}^s$ denotes the $s$ dimensional torus, and $(a(n))_n$ denotes a sequence indexed over the natural numbers (i.e., $(a(n))_{n \in \mathbb{N}}$). Finally, for two non-negative quantities $a, b$ we write $a \ll b$ if there exists a positive constant $C$ such that $a \leq Cb$.

2. General results and applications

In this section we will state our most general results together with some applications of them, which are variable extensions of classical results. For the proofs of these implications, we follow [11] and [27], adapting the corresponding arguments to the variable polynomial case.

We first cover Problem 1 for a subclass of good polynomial sequences:

Theorem 2.1. For $\ell \in \mathbb{N}$, let $(p_{1,N}, \ldots, p_{\ell,N})_N$ be a good and super nice $\ell$-tuple of polynomials. Then, for every ergodic system $(X, \mathcal{B}, \mu, T)$ and $f_1, \ldots, f_\ell \in L^\infty(\mu)$, we have
\[ \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} T^{[p_{1,N}(n)]} f_1 \cdot \ldots \cdot T^{[p_{\ell,N}(n)]} f_\ell = \prod_{i=1}^{\ell} \int f_i \, d\mu, \tag{9} \]
where the convergence takes place in $L^2(\mu)$.

For a single good variable polynomial sequence, we also cover the following case of Problem 2:

Theorem 2.2. Let $(p_N)_N \subseteq \mathbb{R}[t]$ be a good polynomial sequence such that, for all $\ell \in \mathbb{N}$, $(p_N, 2p_N, \ldots, \ell p_N)_N$ is super nice. Then, for every $\ell \in \mathbb{N}$, system $(X, \mathcal{B}, \mu, T)$ and $f_1, \ldots, f_\ell \in L^\infty(\mu)$, we have
\[ \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} T^{[p_N(n)]} f_1 \cdot T^{2[p_N(n)]} f_2 \cdot \ldots \cdot T^{\ell[p_N(n)]} f_\ell = \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \prod_{i=1}^{\ell} T^{in} f_i, \tag{10} \]
where the convergence takes place in $L^2(\mu)$.

We will show that Theorem 2.1 implies Theorem 1.4 (resp. Theorem 2.2 implies Theorem 1.5), hence it covers Examples 1 and 2, and that it also holds for any polynomial family $\{p_1, \ldots, p_\ell\}$ (resp. $\{p, 2p, \ldots, \ell p\}$ for Theorem 2.2) which is independent of $N$ and whose non-trivial linear combinations satisfy (6). In particular it implies [27, Theorem 2.1] for strongly independent polynomials (and of course the same is true for Theorem 2.2 for a single polynomial $p \in \mathbb{R}[t]$, any non-zero scalar multiple of which has at least one non-constant irrational coefficient).

Footnote. The "super niceness" property is a technical one which is defined in Definition 5.12 (via Definition 5.6).

The approach we follow to show these results is similar to the one in [13] and [27], with a few extra twists as the variable case is trickier to deal with. Namely, one has to find the characteristic factor of (9) and (10) (which turns out in both cases to be equal to the nilfactor of the system; see Propositions 5.14 and 5.15) and show some equidistribution results in nilmanifolds (mainly see Lemma 6.5 and Theorem 6.2). The "super niceness" property will be introduced so we can deal with the characteristic factor part, while the "goodness" property implies the equidistribution one.

As it was mentioned in the previous section, the ergodicity assumption in Theorem 2.1 can be dropped. Hence, our main theorems hold for any system. The strong nature of these results is also reflected in the fact that they have immediate recurrence and combinatorial implications.

To have some specific example in mind, one can imagine the following results under the assumptions of Theorems 1.4 and 1.5, i.e., for variable polynomials of the form (8).

2.1. Single sequence. We first deal with a single variable polynomial sequence, assuming the validity of Theorem 2.2.

2.1.1. Recurrence. It is Furstenberg's Multiple Recurrence Theorem that will help us obtain recurrence results:

Theorem 2.3 (Furstenberg Multiple Recurrence Theorem, [20]). Let $(X, \mathcal{B}, \mu, T)$ be a system. Then, for any $\ell \in \mathbb{N}$ and $A \in \mathcal{B}$ with $\mu(A) > 0$, we have
\[ \liminf_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \mu\left( A \cap T^{-n}A \cap T^{-2n}A \cap \ldots \cap T^{-\ell n}A \right) > 0. \]

Theorem 2.2, via Theorem 2.3, implies the following:

Corollary 2.4. Let $(p_N)_N \subseteq \mathbb{R}[t]$ be as in Theorem 2.2. Then, for every $\ell \in \mathbb{N}$, system $(X, \mathcal{B}, \mu, T)$, and $A \in \mathcal{B}$ with $\mu(A) > 0$, we have
\[ \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \mu\left( A \cap T^{-[p_N(n)]}A \cap T^{-2[p_N(n)]}A \cap \ldots \cap T^{-\ell[p_N(n)]}A \right) > 0. \]

Proof. Using Theorem 2.2 with $f_i = \mathbf{1}_A$, $1 \leq i \leq \ell$ (i.e., the characteristic function of $A$), together with Theorem 2.3, implies
\[ \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \mu\left( \bigcap_{i=0}^{\ell} T^{-i[p_N(n)]}A \right) = \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \mu\left( \bigcap_{i=0}^{\ell} T^{-in}A \right) > 0, \]
which is exactly what we wanted to show. □

Footnotes. The claims made after Theorem 2.2 will be justified later via the van der Corput operation, defined in Subsection 5.2. Also, both Theorems 2.1 and 2.2 remain true under the use of any combination of the rounding functions $[\cdot]$ and $[[\cdot]]$, or even if we replace each $[\cdot]$ by $\lceil \cdot \rceil$, as addition by $1/2$ or multiplication by $-1$ respectively (using in the second case $T^{-1}$ instead of $T$) does not alter the properties of the (variable) polynomials. When the ergodicity assumption in Theorem 2.1 is dropped, the limit is equal to $\prod_{i=1}^{\ell} \mathbb{E}(f_i | \mathcal{I}(T))$. Indeed, if $\mu = \int \mu_t \, d\lambda(t)$ denotes the ergodic decomposition of $\mu$, it suffices to show that if $\mathbb{E}(f_i | \mathcal{I}(T)) = 0$ for some $i$ then the averages converge to $0$. Since $\mathbb{E}(f_i | \mathcal{I}(T)) = 0$, we have that $\int f_i \, d\mu_t = 0$ for $\lambda$-a.e. $t$. By (9), we have that the averages go to $0$ in $L^2(\mu_t)$ for $\lambda$-a.e. $t$, hence the limit is equal to $0$ in $L^2(\mu)$. The $\liminf$ in Theorem 2.3, as we mentioned before, is actually a limit.

2.1.2. Combinatorics. Via Furstenberg's Correspondence Principle, one gets combinatorial results from recurrence ones. We present here a reformulation of this principle from [3].

Theorem 2.5 (Furstenberg Correspondence Principle, [20], [3]). Let $E$ be a subset of integers. There exists a system $(X, \mathcal{B}, \mu, T)$ and a set $A \in \mathcal{B}$ with $\mu(A) = \bar{d}(E)$ such that
\[ \bar{d}\left( E \cap (E - n_1) \cap \ldots \cap (E - n_\ell) \right) \geq \mu\left( A \cap T^{-n_1}A \cap \ldots \cap T^{-n_\ell}A \right) \tag{11} \]
for every $\ell \in \mathbb{N}$ and $n_1, \ldots, n_\ell \in \mathbb{Z}$.

Using Corollary 2.4 and Theorem 2.5, we have the following variable combinatorial result:

Corollary 2.6. Let $(p_N)_N \subseteq \mathbb{R}[t]$ be as in Theorem 2.2. Then, for every $\ell \in \mathbb{N}$ and $E \subseteq \mathbb{N}$ with $\bar{d}(E) > 0$, we have
\[ \liminf_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \bar{d}\left( E \cap (E - [p_N(n)]) \cap (E - 2[p_N(n)]) \cap \ldots \cap (E - \ell[p_N(n)]) \right) > 0. \]

Proof. For $E \subseteq \mathbb{N}$ with $\bar{d}(E) > 0$, let $(X, \mathcal{B}, \mu, T)$ be a system and $A \in \mathcal{B}$ with $\mu(A) = \bar{d}(E) > 0$ that satisfy (11). Using Corollary 2.4 we get
\[ \liminf_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \bar{d}\left( \bigcap_{i=0}^{\ell} (E - i[p_N(n)]) \right) \geq \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \mu\left( \bigcap_{i=0}^{\ell} T^{-i[p_N(n)]}A \right) > 0, \]
as was to be shown. □

Hence, we immediately get the following refinement of Szemerédi's theorem:

Corollary 2.7. Let $(p_N)_N \subseteq \mathbb{R}[t]$ be as in Theorem 2.2. Then, for every $\ell \in \mathbb{N}$, any $E \subseteq \mathbb{N}$ with $\bar{d}(E) > 0$ contains arithmetic progressions of the form
\[ \{ m,\ m + [p_N(n)],\ m + 2[p_N(n)],\ \ldots,\ m + \ell[p_N(n)] \}, \]
for some $m \in \mathbb{Z}$, $N \in \mathbb{N}$, and $1 \leq n \leq N$, with $[p_N(n)] \neq 0$.

2.2. Multiple sequences.

Analogously to the previous results, assuming the validity of Theorem 2.1, we have various implications for multiple variable polynomial sequences.

2.2.1. Recurrence. A first recurrence result is the following (we skip the proof as the argument is the same as the one in [12, Theorem 2.8]):

Theorem 2.8. For $\ell \in \mathbb{N}$, let $(p_{1,N}, \ldots, p_{\ell,N})_N$ be as in Theorem 2.1. If $(X, \mathcal{B}, \mu, T)$ is a system and $A_0, A_1, \ldots, A_\ell \in \mathcal{B}$ are such that
\[ \mu\left( A_0 \cap T^{k_1}A_1 \cap \ldots \cap T^{k_\ell}A_\ell \right) = \alpha > 0 \]
for some $k_1, \ldots, k_\ell \in \mathbb{Z}$, then
\[ \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \mu\left( A_0 \cap T^{-[p_{1,N}(n)]}A_1 \cap \ldots \cap T^{-[p_{\ell,N}(n)]}A_\ell \right) \geq \alpha^{\ell+1}. \]

Setting $A_i = A$ and $k_i = 0$ we immediately get the following (optimal lower bound):

Corollary 2.9. For $\ell \in \mathbb{N}$, let $(p_{1,N}, \ldots, p_{\ell,N})_N$ be as in Theorem 2.1. Then, for every system $(X, \mathcal{B}, \mu, T)$ and $A \in \mathcal{B}$, we have
\[ \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \mu\left( A \cap T^{-[p_{1,N}(n)]}A \cap \ldots \cap T^{-[p_{\ell,N}(n)]}A \right) \geq (\mu(A))^{\ell+1}. \]

2.2.2. Combinatorics.

Theorem 2.8, via [15, Proposition 3.3], which is a variant of Theorem 2.5 for several sets, implies the following (we are skipping the routine details):

Theorem 2.10. For $\ell \in \mathbb{N}$, let $(p_{1,N}, \ldots, p_{\ell,N})_N$ be as in Theorem 2.1. If $E_0, E_1, \ldots, E_\ell \subseteq \mathbb{N}$ are such that
\[ \bar{d}\left( E_0 \cap (E_1 + k_1) \cap \ldots \cap (E_\ell + k_\ell) \right) = \alpha > 0 \]
for some $k_1, \ldots, k_\ell \in \mathbb{Z}$, then
\[ \liminf_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \bar{d}\left( E_0 \cap (E_1 - [p_{1,N}(n)]) \cap \ldots \cap (E_\ell - [p_{\ell,N}(n)]) \right) \geq \alpha^{\ell+1}. \]

Setting $E_i = E$ and $k_i = 0$ in the previous result, we get:

Corollary 2.11. For $\ell \in \mathbb{N}$, let $(p_{1,N}, \ldots, p_{\ell,N})_N$ be as in Theorem 2.1. Then, for every $E \subseteq \mathbb{N}$, we have
\[ \liminf_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \bar{d}\left( E \cap (E - [p_{1,N}(n)]) \cap \ldots \cap (E - [p_{\ell,N}(n)]) \right) \geq (\bar{d}(E))^{\ell+1}. \]

So, we immediately obtain the following combinatorial result:

Corollary 2.12. For $\ell \in \mathbb{N}$, let $(p_{1,N}, \ldots, p_{\ell,N})_N$ be as in Theorem 2.1. Then any $E \subseteq \mathbb{N}$ with $\bar{d}(E) > 0$ contains arithmetic configurations of the form
\[ \{ m,\ m + [p_{1,N}(n)],\ m + [p_{2,N}(n)],\ \ldots,\ m + [p_{\ell,N}(n)] \}, \]
for some $m \in \mathbb{Z}$, $N \in \mathbb{N}$, and $1 \leq n \leq N$, with $[p_{i,N}(n)] \neq 0$ for all $1 \leq i \leq \ell$.

Applying Theorem 2.10 to syndetic sets $E_0, E_1, \ldots, E_\ell$ and $\alpha = \left( \prod_{i=1}^{\ell} r_i \right)^{-1}$, where $r_i$ is the syndeticity constant of $E_i$, $1 \leq i \leq \ell$, we have:

Corollary 2.13. For $\ell \in \mathbb{N}$, let $(p_{1,N}, \ldots, p_{\ell,N})_N$ be as in Theorem 2.1. If $E_0, E_1, \ldots, E_\ell \subseteq \mathbb{N}$ are syndetic sets, then there exist $m \in \mathbb{Z}$, $N \in \mathbb{N}$, and $1 \leq n \leq N$, with $[p_{i,N}(n)] \neq 0$ for all $1 \leq i \leq \ell$, such that
\[ m \in E_0,\ m + [p_{1,N}(n)] \in E_1,\ \ldots,\ m + [p_{\ell,N}(n)] \in E_\ell. \]

The previous result, setting $E_i = c_i E$, $0 \leq i \leq \ell$, where $E \subseteq \mathbb{N}$ is a syndetic set and $c_0, c_1, \ldots, c_\ell \in \mathbb{N}$, implies that we can find $x_0, x_1, \ldots, x_\ell \in E$, $N \in \mathbb{N}$, and $1 \leq n \leq N$, that solve the following system of equations:
\[ \begin{aligned} c_1 x_1 - c_0 x_0 &= [p_{1,N}(n)] \\ c_2 x_2 - c_0 x_0 &= [p_{2,N}(n)] \\ &\ \ \vdots \\ c_\ell x_\ell - c_0 x_0 &= [p_{\ell,N}(n)]. \end{aligned} \]

Footnotes. A set $E \subseteq \mathbb{N}$ is called syndetic if finitely many translations of it cover $\mathbb{N}$. The cardinality of such a set of translations is a syndeticity constant of $E$. Here $cE := \{ cn : n \in E \}$.

2.2.3. Topological dynamics.

Let $(X, T)$ be a (topological) dynamical system, where $(X, d)$ is a compact metric space and $T \colon X \to X$ an invertible continuous transformation. Suppose $T$ is minimal (i.e., $\overline{\{ T^n x : n \in \mathbb{N} \}} = X$ for all $x \in X$, hence, for every $x \in X$ and non-empty open set $U$ the set $\{ n \in \mathbb{N} : T^n x \in U \}$ is syndetic). There exists a $T$-invariant Borel measure which gives positive value to every non-empty open set. So, due to syndeticity, for every $x \in X$ and every non-empty open set $U$ we have
\[ \liminf_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \mathbf{1}_U(T^n x) > 0 \tag{12} \]
(this limit actually exists). Since $\mathbb{E}(f_i | \mathcal{I}(T)) = \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} T^n f_i$, combining (12) with the result from Theorem 2.1, we get for almost every $x \in X$ (and hence for a dense set) and every $U_1, \ldots, U_\ell$ from a given countable basis of non-empty open sets that
\[ \limsup_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \mathbf{1}_{U_1}(T^{[p_{1,N}(n)]} x) \cdot \ldots \cdot \mathbf{1}_{U_\ell}(T^{[p_{\ell,N}(n)]} x) > 0. \tag{13} \]
Using this we get:

Theorem 2.14. For $\ell \in \mathbb{N}$, let $(p_{1,N}, \ldots, p_{\ell,N})_N$ be as in Theorem 2.1. If $(X, T)$ is a minimal dynamical system, then, for a residual and $T$-invariant set of $x \in X$, we have
\[ \overline{\left\{ (T^{[p_{1,N}(n)]} x, \ldots, T^{[p_{\ell,N}(n)]} x) : N \in \mathbb{N},\ 1 \leq n \leq N \right\}} = X \times \cdots \times X. \tag{14} \]

Proof. Relation (13) immediately implies that the set of points that satisfy (14), say $R$, is dense. To see that $R$ is $G_\delta$, take $\ell = 1$ (the general case is analogous). Then
\[ R = \left\{ x \in X : \forall\, m, r \in \mathbb{N},\ \exists\, N \in \mathbb{N} \text{ and } 1 \leq n \leq N \text{ with } T^{[p_{1,N}(n)]} x \in B(x_m, 1/r) \right\}, \]
where $\{ x_m : m \in \mathbb{N} \}$ is a countable, dense subset of $X$ and $B(x_m, 1/r)$ denotes the open ball centered at $x_m$ with radius $1/r$. The claim now follows since
\[ R = \bigcap_{m, r \in \mathbb{N}}\ \bigcup_{N \in \mathbb{N},\ 1 \leq n \leq N} T^{-[p_{1,N}(n)]} B(x_m, 1/r). \]
By the fact that
\[ (T^{[p_{1,N}(n)]}(Tx), \ldots, T^{[p_{\ell,N}(n)]}(Tx)) = (T \times \ldots \times T)(T^{[p_{1,N}(n)]} x, \ldots, T^{[p_{\ell,N}(n)]} x) \]
we also get the $T$-invariance of $R$. □

Using Zorn's lemma, we know that every dynamical system has a minimal subsystem. Using this and Theorem 2.14 we get:

Corollary 2.15. For $\ell \in \mathbb{N}$, let $(p_{1,N}, \ldots, p_{\ell,N})_N$ be as in Theorem 2.1. If $(X, T)$ is a dynamical system, then, for a non-empty and $T$-invariant set of $x \in X$, we have
\[ \overline{\left\{ (T^{[p_{1,N}(n)]} x, \ldots, T^{[p_{\ell,N}(n)]} x) : N \in \mathbb{N},\ 1 \leq n \leq N \right\}} = \overline{\{ T^n x : n \in \mathbb{N} \}} \times \cdots \times \overline{\{ T^n x : n \in \mathbb{N} \}}. \]

2.3. Convergence along primes.

For averages along prime numbers, we will also show some results that are more general than Theorems 1.6 and 1.7. Once again, for ease, one can imagine the following results under the assumptions of Theorems 1.6 and 1.7.
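To make the shape of the prime averages concrete, the following sketch builds $\mathbb{P} \cap [1, N]$ with a sieve and evaluates the modulus of $\frac{1}{\pi(N)} \sum_{p \in \mathbb{P} \cap [1,N]} e^{2\pi i k \beta [q_N(p)]}$, which is what the $\ell = 1$ prime average reduces to for a circle rotation by $\beta$ and the character $e^{2\pi i k x}$. The choice $q_N(p) = p/N^a$, the rotation, and the helper names `primes_up_to` and `prime_rotation_average` are assumptions made here for illustration only; the code makes no claim about how fast, or in which systems, such averages converge.

```python
import cmath
import math


def primes_up_to(N: int) -> list:
    """Return the list of primes in [1, N] using the sieve of Eratosthenes."""
    if N < 2:
        return []
    sieve = bytearray([1]) * (N + 1)
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for p in range(2, math.isqrt(N) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting from p*p.
            sieve[p * p::p] = bytearray(len(range(p * p, N + 1, p)))
    return [n for n in range(2, N + 1) if sieve[n]]


def prime_rotation_average(N: int, a: float, beta: float, k: int = 1) -> float:
    """Modulus of (1/pi(N)) * sum over primes p <= N of exp(2*pi*i*k*beta*[p / N**a])."""
    primes = primes_up_to(N)
    M = N ** a
    total = sum(cmath.exp(2j * math.pi * k * beta * math.floor(p / M))
                for p in primes)
    return abs(total) / len(primes)


if __name__ == "__main__":
    for N in (10**3, 10**4, 10**5):
        print(N, len(primes_up_to(N)), prime_rotation_average(N, 0.5, math.sqrt(2)))
```

The normalisation by `len(primes)` plays the role of $\pi(N)$ in the statements that follow.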

Theorem 2.16. For $\ell \in \mathbb{N}$, let $(q_{1,N}, \ldots, q_{\ell,N})_N$ be a sequence of $\ell$-tuples of polynomials such that, for all $W \in \mathbb{N}$ and $1 \leq r \leq W$, the sequence $(q_{W,r,1,N}, \ldots, q_{W,r,\ell,N})_N$ is good and super nice, where $q_{W,r,i,N}(n) = q_{i,N}(Wn + r)$, $1 \leq i \leq \ell$. Then, for every ergodic system $(X, \mathcal{B}, \mu, T)$ and $f_1, \ldots, f_\ell \in L^\infty(\mu)$, we have
\[ \lim_{N \to \infty} \frac{1}{\pi(N)}\sum_{p \in \mathbb{P} \cap [1,N]} T^{[q_{1,N}(p)]} f_1 \cdot \ldots \cdot T^{[q_{\ell,N}(p)]} f_\ell = \prod_{i=1}^{\ell} \int f_i \, d\mu, \]
where the convergence takes place in $L^2(\mu)$.

Theorem 2.17. Let $(q_N)_N \subseteq \mathbb{R}[t]$ be a polynomial sequence such that, for every $W \in \mathbb{N}$, $1 \leq r \leq W$, the sequence $(q_{W,r,N})_N$ is good and, for all $\ell \in \mathbb{N}$, $(q_{W,r,N}, 2q_{W,r,N}, \ldots, \ell q_{W,r,N})_N$ is super nice, where $q_{W,r,N}(n) = q_N(Wn + r)$. Then, for every $\ell \in \mathbb{N}$, system $(X, \mathcal{B}, \mu, T)$, and $f_1, \ldots, f_\ell \in L^\infty(\mu)$, we have
\[ \lim_{N \to \infty} \frac{1}{\pi(N)}\sum_{p \in \mathbb{P} \cap [1,N]} T^{[q_N(p)]} f_1 \cdot T^{2[q_N(p)]} f_2 \cdot \ldots \cdot T^{\ell[q_N(p)]} f_\ell = \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} \prod_{i=1}^{\ell} T^{in} f_i, \]
where the convergence takes place in $L^2(\mu)$.

Theorems 2.16 and 2.17 imply Theorems 1.6 and 1.7 but also [27, Theorem 2.14] and [27, Theorem 2.12], respectively, which are about constant (with respect to $N$) polynomial sequences.

As with Theorem 2.1, we can drop the ergodicity assumption in Theorem 2.16 as well. Assuming the validity of Theorems 2.16 and 2.17 we present some applications.

For a single sequence, Theorem 2.17, via Theorem 2.3, implies the following:

Corollary 2.18. Let $(q_N)_N \subseteq \mathbb{R}[t]$ be as in Theorem 2.17. Then, for every $\ell \in \mathbb{N}$, system $(X, \mathcal{B}, \mu, T)$, and $A \in \mathcal{B}$ with $\mu(A) > 0$, we have
\[ \lim_{N \to \infty} \frac{1}{\pi(N)}\sum_{p \in \mathbb{P} \cap [1,N]} \mu\left( A \cap T^{-[q_N(p)]}A \cap T^{-2[q_N(p)]}A \cap \ldots \cap T^{-\ell[q_N(p)]}A \right) > 0. \]

Footnotes. In Theorem 2.16, letting $W = r = 1$, it is easy to see that the condition implies that $(q_{1,N}, \ldots, q_{\ell,N})_N$ is good and super nice as well. As in the previous result, in Theorem 2.17, for $W = r = 1$, we have that $(q_N)_N$ is good and, for all $\ell \in \mathbb{N}$, $(q_N, 2q_N, \ldots, \ell q_N)_N$ is super nice.

Corollary 2.18 immediately implies the following Szemerédi-type theorem:

Corollary 2.19.

Let $(q_N)_N\subseteq\mathbb{R}[t]$ be as in Theorem 2.17. Then, for every $\ell\in\mathbb{N}$, any $E\subseteq\mathbb{N}$ with $\bar{d}(E)>0$ contains arithmetic progressions of the form
$$\{m,\ m+[q_N(p)],\ m+2[q_N(p)],\ \ldots,\ m+\ell[q_N(p)]\},$$
for some $m\in\mathbb{Z}$, $N\in\mathbb{N}$, and $p\in\mathbb{P}\cap[1,N]$, with $[q_N(p)]\neq0$.

For multiple sequences, Theorem 2.16 implies:

Corollary 2.20.

For $\ell\in\mathbb{N}$, let $(q_{1,N},\ldots,q_{\ell,N})_N$ be as in Theorem 2.16. Then, for every system $(X,\mathcal{B},\mu,T)$ and $A\in\mathcal{B}$, we have
$$\lim_{N\to\infty}\frac{1}{\pi(N)}\sum_{p\in\mathbb{P}\cap[1,N]}\mu\Big(A\cap T^{-[q_{1,N}(p)]}A\cap\ldots\cap T^{-[q_{\ell,N}(p)]}A\Big)\ge(\mu(A))^{\ell+1}.$$

Via the combinatorial result corresponding to Corollary 2.20, which we get by using the correspondence principle, we finally obtain:

Corollary 2.21.

For $\ell\in\mathbb{N}$, let $(q_{1,N},\ldots,q_{\ell,N})_N$ be as in Theorem 2.16. Then any $E\subseteq\mathbb{N}$ with $\bar{d}(E)>0$ contains arithmetic configurations of the form
$$\{m,\ m+[q_{1,N}(p)],\ m+[q_{2,N}(p)],\ \ldots,\ m+[q_{\ell,N}(p)]\},$$
for some $m\in\mathbb{Z}$, $N\in\mathbb{N}$, and $p\in\mathbb{P}\cap[1,N]$, with $[q_{i,N}(p)]\neq0$ for all $1\le i\le\ell$.

2.4. Recurrence along shifted primes.

Our method also implies recurrence results along shifted primes. More specifically, we get the following:

Theorem 2.22.

Let $(q_N)_N$ be a polynomial sequence such that, for every $W\in\mathbb{N}$, the sequence $(q_{W,N})_N$ is good and, for all $\ell\in\mathbb{N}$, $(q_{W,N},2q_{W,N},\ldots,\ell q_{W,N})_N$ is super nice, where $q_{W,N}(n)=q_N(Wn)$. Then, for every $\ell\in\mathbb{N}$, system $(X,\mathcal{B},\mu,T)$, and $A\in\mathcal{B}$ with $\mu(A)>0$, the set
$$\bigcup_{N\in\mathbb{N}}\Big\{n\in[1,N]:\ \mu\Big(A\cap T^{-[q_N(n)]}A\cap T^{-2[q_N(n)]}A\cap\ldots\cap T^{-\ell[q_N(n)]}A\Big)>0\Big\}$$
has non-empty intersection with $\mathbb{P}-1$ and $\mathbb{P}+1$.

Theorem 2.23.

For $\ell\in\mathbb{N}$, let $(q_{1,N},\ldots,q_{\ell,N})_N$ be a sequence of $\ell$-tuples of polynomials such that, for all $W\in\mathbb{N}$, the sequence $(q_{W,1,N},\ldots,q_{W,\ell,N})_N$ is good and super nice, where $q_{W,i,N}(n)=q_{i,N}(Wn)$, $1\le i\le\ell$. Then, for every system $(X,\mathcal{B},\mu,T)$ and $A\in\mathcal{B}$ with $\mu(A)>0$, the set
$$\bigcup_{N\in\mathbb{N}}\Big\{n\in[1,N]:\ \mu\Big(A\cap T^{-[q_{1,N}(n)]}A\cap\ldots\cap T^{-[q_{\ell,N}(n)]}A\Big)>0\Big\}$$
has non-empty intersection with $\mathbb{P}-1$ and $\mathbb{P}+1$.

Via Furstenberg's correspondence principle the previous results imply:

Corollary 2.24.

Under the assumptions of Theorem 2.22, for every $\ell\in\mathbb{N}$ and $E\subseteq\mathbb{N}$ with $\bar{d}(E)>0$, we have that the set
$$\bigcup_{N\in\mathbb{N}}\Big\{n\in[1,N]:\ \bar{d}\big(E\cap(E-[q_N(n)])\cap(E-2[q_N(n)])\cap\ldots\cap(E-\ell[q_N(n)])\big)>0\Big\}$$
has non-empty intersection with $\mathbb{P}-1$ and $\mathbb{P}+1$.

Corollary 2.25.

Under the assumptions of Theorem 2.23, for any $E\subseteq\mathbb{N}$ with $\bar{d}(E)>0$, we have that the set
$$\bigcup_{N\in\mathbb{N}}\Big\{n\in[1,N]:\ \bar{d}\big(E\cap(E-[q_{1,N}(n)])\cap\ldots\cap(E-[q_{\ell,N}(n)])\big)>0\Big\}$$
has non-empty intersection with $\mathbb{P}-1$ and $\mathbb{P}+1$.

3. The $\ell=1$ case

This section is dedicated to proving Theorem 1.3. Anyone familiar with Weyl's equidistribution theorem can see that there is a strong equidistribution nature behind the definition of good sequences of variable polynomials, i.e., property (6). We start with some intermediate lemmas.

Lemma 3.1.

Let $(p_N)_N$ be a good sequence of polynomials. Then, for every complex-valued continuous function $f$ on $\mathbb{R}$ with period $1$, we have
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}f(p_N(n))=\int_0^1 f(x)\,dx.\tag{15}$$

Proof.

We follow the arguments of the proof of [33, Theorem 2.1, Page 7]. For $\varepsilon>0$, by the Stone-Weierstrass theorem, we find a trigonometric polynomial $q$ (i.e., $q(x)$ is a finite linear combination of functions of the form $e^{2\pi ihx}$, $h\in\mathbb{Z}$, with complex coefficients) such that
$$\sup_{0\le x\le1}|f(x)-q(x)|<\varepsilon.$$
We have
$$\Big|\int_0^1 f(x)\,dx-\frac{1}{N}\sum_{n=1}^{N}f(p_N(n))\Big|\le\Big|\int_0^1(f(x)-q(x))\,dx\Big|+\Big|\int_0^1 q(x)\,dx-\frac{1}{N}\sum_{n=1}^{N}q(p_N(n))\Big|+\Big|\frac{1}{N}\sum_{n=1}^{N}\big(q(p_N(n))-f(p_N(n))\big)\Big|<2\varepsilon+\Big|\int_0^1 q(x)\,dx-\frac{1}{N}\sum_{n=1}^{N}q(p_N(n))\Big|,$$
where the last quantity goes to $0$ as $N\to\infty$ by the goodness property of $(p_N)_N$. The conclusion follows as $\varepsilon$ was chosen to be arbitrarily small. □

Using a standard argument, the previous statement can be upgraded from continuous to Riemann integrable functions (Lemma 3.2).

Footnote. We chose to present this proof at this point, before going deeper into characteristic factors and equidistribution of (variable) sequences in general nilmanifolds, as we will only make use of the definition of a good polynomial sequence and Herglotz's theorem.

Footnote. Note though that (6) is strictly stronger compared to the one from Weyl's criterion, as, for example, ( √ n ) n is equidistributed but ( n = 1 / √ √ n )) n is not.

Footnote. The following lemmas, in the case of polynomial sequences that are independent of $N$, are characterizations of the equidistribution notion.
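The limit in (15) can be sanity-checked numerically. The following is a minimal sketch, not part of the argument: the sequence $p_N(n)=n/N^{1/2}$ (the shape of Example 1 with exponent $1/2$) and the test function $f(x)=\cos^2(2\pi x)$, whose integral over $[0,1]$ equals $1/2$, are illustrative choices made here, and the helper names are hypothetical. For this particular sequence the exponential sums $\sum_{n\le N}e^{2\pi ihn/\sqrt{N}}$ are geometric, so one can check by hand that their averages tend to $0$ for every fixed non-zero integer $h$.

```python
import math

def variable_average(f, p, N):
    """Return (1/N) * sum_{n=1}^{N} f(p(N, n)) for a variable sequence p."""
    return sum(f(p(N, n)) for n in range(1, N + 1)) / N

def p_example(N, n, a=0.5):
    # Hypothetical test sequence p_N(n) = n / N**a with 0 < a < 1.
    return n / N**a

def f_example(x):
    # Continuous and 1-periodic; its integral over [0, 1] equals 1/2.
    return math.cos(2 * math.pi * x) ** 2

if __name__ == "__main__":
    for N in (10**3, 10**4, 10**5):
        # The printed averages should approach 0.5 as N grows.
        print(N, variable_average(f_example, p_example, N))
```

The averaging is done with a plain Python sum to keep the sketch dependency-free; for much larger $N$ a vectorised version would be preferable.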

Lemma 3.2.

Let $(p_N)_N$ be a good sequence of polynomials. The conclusion of Lemma 3.1 holds for every complex-valued Riemann integrable function $f$ on $\mathbb{R}$ with period $1$.

Proof.

Without loss of generality assume that $f$ takes real values (otherwise we use the conclusion for the real and imaginary parts of $f$). We follow [33, Theorem 1.1, Page 2]. Given $\varepsilon>0$, let $f_1,f_2$ be continuous functions with $f_1(0)=f_1(1)$, $f_2(0)=f_2(1)$, $f_1(x)\le f(x)\le f_2(x)$ for all $x\in[0,1)$ and $\int_0^1(f_2(x)-f_1(x))\,dx<\varepsilon$. Using Lemma 3.1 for both $f_1,f_2$ we get
$$\int_0^1 f(x)\,dx-\varepsilon\le\int_0^1 f_1(x)\,dx=\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}f_1(p_N(n))\le\liminf_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}f(p_N(n))\le\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}f(p_N(n))\le\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}f_2(p_N(n))=\int_0^1 f_2(x)\,dx\le\int_0^1 f(x)\,dx+\varepsilon.$$
The conclusion follows as $\varepsilon$ was chosen to be arbitrarily small. □

Noting that for a good sequence of polynomials $(p_N)_N$ and $\gamma\in\mathbb{R}\setminus\mathbb{Q}$ we have that $((h_1\gamma+h_2)p_N)_N$ is good for all $(h_1,h_2)\in\mathbb{Z}^2\setminus\{(0,0)\}$, we have the following:

Lemma 3.3.

Let $(p_N)_N$ be a good sequence of polynomials. Then, for every complex-valued continuous periodic mod 1 function $f$ on $\mathbb{R}^2$ and $\gamma\in\mathbb{R}\setminus\mathbb{Q}$, we have
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}f(p_N(n)\gamma,p_N(n))=\int_0^1\!\!\int_0^1 f(x,y)\,dx\,dy.\tag{16}$$

Proof.

All continuous complex-valued periodic mod 1 (on each coordinate) functions on $\mathbb{R}^2$ can be approximated (with respect to the uniform norm) by finite linear combinations with complex coefficients of functions $e^{2\pi i(h_1x+h_2y)}$, $(h_1,h_2)\in\mathbb{Z}^2$. The result now follows, as in the proof of Lemma 3.1, as (6) implies
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}e^{2\pi i(h_1\gamma+h_2)p_N(n)}=0\quad\text{for all }(h_1,h_2)\in\mathbb{Z}^2\setminus\{(0,0)\}.$$
□

Analogously to the proof of Lemma 3.2, using the previous result, we get:

Lemma 3.4.

Let $(p_N)_N$ be a good sequence of polynomials. Then, for every complex-valued Riemann integrable periodic mod 1 function $f$ on $\mathbb{R}^2$ and $\gamma\in\mathbb{R}\setminus\mathbb{Q}$, we have
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}f(p_N(n)\gamma,p_N(n))=\int_0^1\!\!\int_0^1 f(x,y)\,dx\,dy.$$

The next result is the last ingredient that we need in order to prove Theorem 1.3:

Lemma 3.5. If $(p_N)_N$ is a good sequence of polynomials and $\gamma\in\mathbb{R}\setminus\mathbb{Z}$, then
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}e^{2\pi i[p_N(n)]\gamma}=0.$$

Proof.

We split the proof into two cases.

Case 1: $\gamma\in\mathbb{Q}\setminus\mathbb{Z}$. For $m\in\mathbb{N}$, $m>1$, it suffices to show, for all $1\le h\le m-1$, that
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}e^{2\pi i[p_N(n)]\frac{h}{m}}=0.$$
For $0\le j\le m-1$, let $A(j,m,N)$ denote the number of terms $p_N(1),\ldots,p_N(N)$ that satisfy $[p_N(n)]\equiv j\ (\mathrm{mod}\ m)$. This is equivalent to $j/m\le\{p_N(n)/m\}<(j+1)/m$, where $\{x\}=x-[x]$ denotes the fractional part of the real number $x$. Hence, since $(p_N/m)_N$ is good, using Lemma 3.2 for the characteristic function of the interval $[j/m,(j+1)/m)$, we get
$$\lim_{N\to\infty}\frac{A(j,m,N)}{N}=\frac{1}{m}.$$
This last relation, together with
$$\sum_{h=1}^{m-1}\Big|\frac{1}{N}\sum_{n=1}^{N}e^{2\pi i[p_N(n)]\frac{h}{m}}\Big|^2=m\sum_{j=0}^{m-1}\Big(\frac{A(j,m,N)}{N}-\frac{1}{m}\Big)^2$$
(see [33, Exercise 1.5, Page 318]), implies the claim.

Case 2: $\gamma\in\mathbb{R}\setminus\mathbb{Q}$. We write
$$\frac{1}{N}\sum_{n=1}^{N}e^{2\pi i[p_N(n)]\gamma}=\frac{1}{N}\sum_{n=1}^{N}e^{2\pi i(p_N(n)-\{p_N(n)\})\gamma}=\frac{1}{N}\sum_{n=1}^{N}f(p_N(n)\gamma,p_N(n)),$$
where $f(x,y)=e^{2\pi i(x-\{y\}\gamma)}$ is a complex-valued Riemann integrable periodic mod 1 function on $\mathbb{R}^2$. By Lemma 3.4 we have that
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}e^{2\pi i[p_N(n)]\gamma}=\int_0^1\!\!\int_0^1 f(x,y)\,dx\,dy=0,$$
as was to be shown. □

We are now ready to prove Theorem 1.3 using a standard argument.
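The conclusion of Lemma 3.5 can also be observed numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: it uses the hypothetical test sequence $p_N(n)=n/N^{1/2}$, takes $[x]$ to be the floor of $x$, and evaluates the averages for one rational and one irrational value of $\gamma$; the function names are made up for this example.

```python
import cmath
import math

def bracket_exponential_average(p, gamma, N):
    """Return (1/N) * sum_{n=1}^{N} exp(2*pi*i*[p(N, n)]*gamma), with [x] = floor(x)."""
    total = sum(
        cmath.exp(2j * math.pi * math.floor(p(N, n)) * gamma)
        for n in range(1, N + 1)
    )
    return total / N

def p_example(N, n, a=0.5):
    # Hypothetical test sequence p_N(n) = n / N**a with 0 < a < 1.
    return n / N**a

if __name__ == "__main__":
    for gamma in (0.5, math.sqrt(2)):
        for N in (10**3, 10**4, 10**5):
            # The modulus of the average should decay towards 0 as N grows.
            print(gamma, N, abs(bracket_exponential_average(p_example, gamma, N)))
```

For this sequence the integer parts $[p_N(n)]$ are constant on blocks of length about $\sqrt{N}$, so the sum reduces to a short geometric sum plus a boundary error, which is why the decay is already visible for moderate $N$.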

Proof of Theorem 1.3.

We can assume without loss of generality that $\int f\,d\mu=0$. With Herglotz's theorem we have
$$\Big\|\frac{1}{N}\sum_{n=1}^{N}T^{[p_N(n)]}f\Big\|_2^2=\frac{1}{N^2}\sum_{n,m=1}^{N}\int T^{[p_N(n)]}f\cdot\overline{T^{[p_N(m)]}f}\,d\mu=\frac{1}{N^2}\sum_{n,m=1}^{N}\int\bar{f}\cdot T^{[p_N(n)]-[p_N(m)]}f\,d\mu=\frac{1}{N^2}\sum_{n,m=1}^{N}\int e^{2\pi i([p_N(n)]-[p_N(m)])\gamma}\,d\nu_f(\gamma)=\int\Big|\frac{1}{N}\sum_{n=1}^{N}e^{2\pi i[p_N(n)]\gamma}\Big|^2d\nu_f(\gamma),$$
where $\nu_f$ is the spectral measure with $\nu_f(\{0\})=0$. The conclusion now follows, by the bounded convergence theorem, since Lemma 3.5 implies that $\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}e^{2\pi i[p_N(n)]\gamma}=0$ for all $\gamma\in(0,1)$. □

Remark 3.6.

One can obtain numerous applications of Theorem 1.3, analogous to the ones in Section 2 (in the prime number cases one has to postulate some additional "good" assumptions for the sequences but no "super nice" ones). As these implications are clear from our arguments, we leave their formulations and proofs to the interested reader.

4. Some background material

In this section we list some material that will be used for the multiple average case.

4.1.

Factors. A homomorphism from a system $(X,\mathcal{B},\mu,T)$ onto a system $(Y,\mathcal{Y},\nu,S)$ is a measurable map $\pi:X'\to Y'$, where $X'$ is a $T$-invariant subset of $X$ and $Y'$ is an $S$-invariant subset of $Y$, both of full measure, such that $\mu\circ\pi^{-1}=\nu$ and $S\circ\pi(x)=\pi\circ T(x)$ for $x\in X'$. When we have such a homomorphism we say that the system $(Y,\mathcal{Y},\nu,S)$ is a factor of the system $(X,\mathcal{B},\mu,T)$. If the factor map $\pi:X'\to Y'$ can be chosen to be injective, then we say that the systems $(X,\mathcal{B},\mu,T)$ and $(Y,\mathcal{Y},\nu,S)$ are isomorphic. A factor can also be characterised by $\pi^{-1}(\mathcal{Y})$, which is a $T$-invariant sub-$\sigma$-algebra of $\mathcal{B}$, and, conversely, any $T$-invariant sub-$\sigma$-algebra of $\mathcal{B}$ defines a factor. By abusing the terminology, we denote by the same letter the $\sigma$-algebra $\mathcal{Y}$ and its inverse image by $\pi$, so, if $(Y,\mathcal{Y},\nu,S)$ is a factor of $(X,\mathcal{B},\mu,T)$, we think of $\mathcal{Y}$ as a sub-$\sigma$-algebra of $\mathcal{B}$.

4.1.1. Seminorms.

We follow [26] and [8] for the inductive definition of the seminorms $|||\cdot|||_k$. More specifically, the definition that we use here follows from [26] (in the ergodic case), [8] (in the general case) and the use of von Neumann's ergodic theorem.

Let $(X,\mathcal{B},\mu,T)$ be a system and $f\in L^\infty(\mu)$. We define inductively the seminorms $|||f|||_{k,\mu,T}$ (or just $|||f|||_k$ if there is no confusion) as follows. For $k=1$ we set
$$|||f|||_1:=\|\mathbb{E}(f|\mathcal{I}(T))\|_2,$$
where $\mathcal{I}(T)$ is the $\sigma$-algebra of $T$-invariant sets and $\mathbb{E}(f|\mathcal{I}(T))$ the conditional expectation of $f$ with respect to $\mathcal{I}(T)$, satisfying $\int\mathbb{E}(f|\mathcal{I}(T))\,d\mu=\int f\,d\mu$ and $T\,\mathbb{E}(f|\mathcal{I}(T))=\mathbb{E}(Tf|\mathcal{I}(T))$. For $k\ge1$ we let
$$|||f|||_{k+1}^{2^{k+1}}:=\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}|||\bar{f}\cdot T^nf|||_k^{2^k}.$$
All these limits exist and $|||\cdot|||_k$ define seminorms on $L^\infty(\mu)$ ([26]). Also, we remark that for all $k\in\mathbb{N}$ we have $|||f|||_k\le|||f|||_{k+1}$ and $|||f\otimes\bar{f}|||_{k,\mu\times\mu,T\times T}\le|||f|||_{k+1,\mu,T}^2$.

4.1.2. Nilfactors.

Using the seminorms we defined above we can construct factors $\mathcal{Z}_k=\mathcal{Z}_k(T)$ of $X$ characterized by:
$$\text{for }f\in L^\infty(\mu),\quad\mathbb{E}(f|\mathcal{Z}_{k-1})=0\ \text{ if and only if }\ |||f|||_k=0.$$
The following profound fact from [26] shows that for every $k\in\mathbb{N}$ the factor $\mathcal{Z}_k$ has a purely algebraic structure; in particular, for all practical purposes we can assume that it is a $k$-step nilsystem (see Subsection 4.2 below for the definitions):

Theorem 4.1 (Host & Kra, [26]). Let $(X,\mathcal{B},\mu,T)$ be an ergodic system and $k\in\mathbb{N}$. Then the factor $\mathcal{Z}_k(T)$ is an inverse limit of $k$-step nilsystems.

Because of this result (also known as the "Structure Theorem") we call $\mathcal{Z}_k$ the $k$-step nilfactor of the system. The smallest factor that is an extension of all finite step nilfactors is denoted by $\mathcal{Z}=\mathcal{Z}(T)$, meaning $\mathcal{Z}=\bigvee_{k\in\mathbb{N}}\mathcal{Z}_k$, and is called the nilfactor of the system. The nilfactor $\mathcal{Z}$ is of particular interest because it controls the limiting behaviour in $L^2(\mu)$ of the averages in (9) and (10).

4.2. Nilmanifolds.

Let $G$ be a $k$-step nilpotent Lie group, meaning $G_{k+1}=\{e\}$ for some $k\in\mathbb{N}$, where $G_k=[G,G_{k-1}]$ denotes the $k$-th commutator subgroup, and $\Gamma$ a discrete cocompact subgroup of $G$. The compact homogeneous space $X=G/\Gamma$ is called a $k$-step nilmanifold (or nilmanifold). The group $G$ acts on $G/\Gamma$ by left translations, where the translation by an element $b\in G$ is given by $T_b(g\Gamma)=(bg)\Gamma$. We denote by $m_X$ the normalized Haar measure on $X$, i.e., the unique probability measure that is invariant under the action of $G$, and by $\mathcal{G}/\Gamma$ the Borel $\sigma$-algebra of $G/\Gamma$. If $b\in G$, we call the system $(G/\Gamma,\mathcal{G}/\Gamma,m_X,T_b)$ a $k$-step nilsystem (or nilsystem) and the elements of $G$ nilrotations.

4.2.1. Equidistribution.

For a connected and simply connected Lie group $G$, let $\exp:\mathfrak{g}\to G$ be the exponential map, where $\mathfrak{g}$ is the Lie algebra of $G$. For $b\in G$ and $s\in\mathbb{R}$ we define the element $b^s$ of $G$ as follows: if $X\in\mathfrak{g}$ is such that $\exp(X)=b$, then $b^s=\exp(sX)$ (this is well defined since under the aforementioned assumptions $\exp$ is a bijection).

If $(a(n))_n$ is a sequence of real numbers and $X=G/\Gamma$ is a nilmanifold with $G$ connected and simply connected, we say that the sequence $(b^{a(n)}x)_n$, $b\in G$, is equidistributed in a subnilmanifold $Y$ of $X$ if for every $F\in C(X)$ we have
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}F(b^{a(n)}x)=\int F\,dm_Y.\tag{17}$$
A nilrotation $b\in G$ is ergodic (or acts ergodically) on $X$ if the sequence $(b^n\Gamma)_n$ is dense in $X$. If $b\in G$ is ergodic, then for every $x\in X$ the sequence $(b^nx)_n$ is equidistributed in $X$ (this is a non-trivial fact which follows by unique ergodicity). The orbit closure $\overline{(b^n\Gamma)_n}$ of $b\in G$ has the structure of a nilmanifold with $(b^n\Gamma)_n$ being equidistributed in it. Analogously, if $G$ is connected and simply connected, then $\overline{(b^s\Gamma)_{s\in\mathbb{R}}}$ is a nilmanifold with $(b^s\Gamma)_{s\in\mathbb{R}}$ being equidistributed in it.

Footnote (to Theorem 4.1). By an inverse limit we mean that there exist $T$-invariant sub-$\sigma$-algebras $\mathcal{Z}_{k,i}$, $i\in\mathbb{N}$, of $\mathcal{B}$ such that $\mathcal{Z}_k=\bigcup_{i\in\mathbb{N}}\mathcal{Z}_{k,i}$ and, for every $i\in\mathbb{N}$, the factors induced by the $\sigma$-algebras $\mathcal{Z}_{k,i}$ are isomorphic to $k$-step nilsystems.

Footnote (to the definition of equidistribution). If $(a(n))_n\subseteq\mathbb{Z}$, we can drop the assumptions that $G$ is connected and simply connected.

4.2.2. Change of base point formula.

Let $X=G/\Gamma$ be a nilmanifold. As mentioned before, for every $b\in G$ the sequence $(b^n\Gamma)_n$ is equidistributed in $X_b=\overline{\{b^n\Gamma:n\in\mathbb{N}\}}$. Using the identity $b^ng=g(g^{-1}bg)^n$ we see that the nil-orbit $(b^ng\Gamma)_n$ is equidistributed in the set $gX_{g^{-1}bg}$. A similar formula holds when $G$ is connected and simply connected, where we replace the integer parameter $n$ with the real parameter $s$ and the nilmanifold $X_b$ with $Y_b=\overline{\{b^s\Gamma:s\in\mathbb{R}\}}$.

4.2.3. Lifting argument.

Given a topological group $G$, we denote the connected component of its identity element, $e$, by $G_0$. In several instances it will be convenient for us to assume that a nilmanifold has a representation $G/\Gamma$ with $G$ connected and simply connected (to this end, one can follow for example [34]). Since all our results deal with the action on $X$ of finitely many elements of $G$, we can and will assume that the discrete group $G/G_0$ is finitely generated. In this case one can show that $X=G/\Gamma$ is isomorphic to a sub-nilmanifold of a nilmanifold $\tilde{X}=\tilde{G}/\tilde{\Gamma}$, where $\tilde{G}$ is a connected and simply connected nilpotent Lie group, with all translations from $G$ "represented" in $\tilde{G}$. We caution the reader that such a construction is only helpful when our working assumptions impose no restrictions on a nilrotation. Any assumption made about $b\in G$, which acts on a nilmanifold $X$, is typically lost when passing to the lifted nilmanifold $\tilde{X}$.

5. Finding the characteristic factor

In this technical section we find characteristic factors for the required expressions that appear in Theorems 2.1 and 2.2. Actually, in both cases, we will show that the nilfactor is the characteristic one (Proposition 5.14 and Proposition 5.15 respectively).

We first start with the degree $1$ case and then move on to the general one. At this point we recall (adapted to our study) the notion of a characteristic factor:

Definition 5.1. For $\ell\in\mathbb{N}$ let $(X,\mathcal{B},\mu,T)$ be a system. The sub-$\sigma$-algebra $\mathcal{Y}$ of $\mathcal{B}$ is a characteristic factor for the variable tuple of integer-valued sequences $(a_{1,N},\ldots,a_{\ell,N})_N$ if it is $T$-invariant and
$$\lim_{N\to\infty}\Big\|\frac{1}{N}\sum_{n=1}^{N}T^{a_{1,N}(n)}f_1\cdot\ldots\cdot T^{a_{\ell,N}(n)}f_\ell-\frac{1}{N}\sum_{n=1}^{N}T^{a_{1,N}(n)}\tilde{f}_1\cdot\ldots\cdot T^{a_{\ell,N}(n)}\tilde{f}_\ell\Big\|_2=0,$$
for all $f_i\in L^\infty(\mu)$, where $\tilde{f}_i=\mathbb{E}(f_i|\mathcal{Y})$, $1\le i\le\ell$.

Footnote (to Subsection 4.2.3). Practically, the lifting argument means that for every $F\in C(X)$, $b\in G$ and $x\in X$, there exist $\tilde{F}\in C(\tilde{X})$, $\tilde{b}\in\tilde{G}$ and $\tilde{x}\in\tilde{X}$, such that $F(b^nx)=\tilde{F}(\tilde{b}^n\tilde{x})$ for every $n\in\mathbb{N}$.

Footnote (to Definition 5.1). Equivalently, $\lim_{N\to\infty}\big\|\frac{1}{N}\sum_{n=1}^{N}T^{a_{1,N}(n)}f_1\cdot\ldots\cdot T^{a_{\ell,N}(n)}f_\ell\big\|_2=0$ if $\mathbb{E}(f_i|\mathcal{Y})=0$ for some $1\le i\le\ell$.

5.1. The base case.

The following crucial lemma, which can be understood as a "change of variables" procedure, will be used in the base $\ell=1$ case for $\deg p_N=1$, i.e., $p_N(n)=a_Nn+b_N$. We will assume that $(b_N)_N$ is bounded, so, as such error terms do not affect our averages, we mainly have to deal with the expression $\frac{1}{N}\sum_{n=1}^{N}T^{[a_Nn]}f$.

Lemma 5.2.

Let $(a_N)_N\subseteq(0,+\infty)$ be bounded with $(a_N\cdot N)_N$ tending increasingly to $\infty$. For any sequence $(c_N(n))_{n,N}\subseteq[0,\infty)$ we have
$$\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}c_{[a_NN]}([a_Nn])\ll\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}c_N(n).$$

Proof.

For a fixed $N\in\mathbb{N}$, since $(a_N)_N$ is bounded, we have the relation
$$\frac{1}{N}\sum_{n=1}^{N}c_{[a_NN]}([a_Nn])\le\Big(\Big[\frac{1}{a_N}\Big]+1\Big)\cdot a_N\cdot\frac{[a_NN]}{a_NN}\cdot\frac{1}{[a_NN]}\sum_{n=0}^{[a_NN]}c_{[a_NN]}(n).$$
Since $a_NN\to\infty$, we get that $[a_NN]/(a_NN)\to1$. Finally, using yet again that $(a_N)_N$ is bounded, we have
$$\Big(\Big[\frac{1}{a_N}\Big]+1\Big)\cdot a_N\le a_N+1\ll1.$$
The result now follows by taking $\limsup$. □

Using Lemma 5.2, following the argument of [13, Lemma 5.2] we get:

Lemma 5.3.

Let $(p_N)_N$ be a sequence of polynomials of degree $1$ of the form $p_N(n)=a_Nn+b_N$, $n,N\in\mathbb{N}$, where $(a_N)_N$, $(b_N)_N$ are bounded with $(a_N)_N\subseteq(0,+\infty)$ and $(a_N\cdot N)_N$ tending increasingly to $\infty$. Then, for any system $(X,\mathcal{B},\mu,T)$ and $f\in L^\infty(\mu)$, we have
$$\limsup_{N\to\infty}\sup_{\|f_0\|_\infty\le1}\frac{1}{N}\sum_{n=1}^{N}\Big|\int f_0\cdot T^{[p_N(n)]}f\,d\mu\Big|\ll|||f|||_2.\tag{18}$$

Proof.

For every $N\in\mathbb{N}$ we choose functions $f_{0,N}$ with $\|f_{0,N}\|_\infty\le1$ so that the corresponding average in (18) is $1/N$ close to its $\sup_{\|f_0\|_\infty\le1}$. To show (18), it suffices to show that
$$\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\Big|\int f_{0,[a_NN]}\cdot T^{[p_N(n)]}f\,d\mu\Big|\ll|||f|||_2.\tag{19}$$
We write
$$[p_N(n)]=[a_Nn+b_N]=[a_Nn]+[b_N]+e(n,N),\quad e(n,N)\in\{0,1\}.$$
If $([b_N]+e(n,N))_{n,N}$ takes values in the finite set of integers $E$, we have that
$$\frac{1}{N}\sum_{n=1}^{N}\Big|\int f_{0,[a_NN]}\cdot T^{[p_N(n)]}f\,d\mu\Big|\ll\max_{e\in E}\frac{1}{N}\sum_{n=1}^{N}\Big|\int f_{0,[a_NN]}\cdot T^{[a_Nn]+e}f\,d\mu\Big|.$$

Taking squares and using the Cauchy-Schwarz inequality, the square of the right-hand side of the previous relation is bounded by
$$\max_{e\in E}\frac{1}{N}\sum_{n=1}^{N}\Big|\int f_{0,[a_NN]}\cdot T^{[a_Nn]+e}f\,d\mu\Big|^2=\max_{e\in E}\frac{1}{N}\sum_{n=1}^{N}\int F_{0,[a_NN]}\cdot S^{[a_Nn]+e}F\,d\tilde{\mu},$$
where $S=T\times T$, $F_{0,[a_NN]}=f_{0,[a_NN]}\otimes\bar{f}_{0,[a_NN]}$, $F=f\otimes\bar{f}$, and $\tilde{\mu}=\mu\times\mu$. For every $e\in E$, using Lemma 5.2, the average on the right-hand side of the previous relation is bounded by a constant multiple of
$$\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\int F_{0,N}\cdot S^{n+e}F\,d\tilde{\mu}\le\limsup_{N\to\infty}\Big\|\frac{1}{N}\sum_{n=1}^{N}S^{n+e}F\Big\|_{L^2(\tilde{\mu})},$$
where the last inequality follows by Cauchy-Schwarz and the fact that $\|F_{0,N}\|_\infty\le1$. Using von Neumann's mean ergodic theorem, the last term is equal to
$$\|\mathbb{E}(S^eF|\mathcal{I}(S))\|_{L^2(\tilde{\mu})}=\|\mathbb{E}(F|\mathcal{I}(S))\|_{L^2(\tilde{\mu})}\le|||f|||_2^2,$$
where we used the fact that $S$ is measure preserving and the definition of the seminorms $|||\cdot|||$. Relation (19) now follows by removing the squares. □

Remark 5.4.

Lemma 5.3 also holds for $(a_N)_N\subseteq(-\infty,0)$ with $(a_N\cdot N)_N$ tending decreasingly to $-\infty$. Indeed, in this case we write
$$[p_N(n)]=-[-a_Nn]+[b_N]+e(n,N),\quad e(n,N)\in\{-1,0\},$$
so
$$\frac{1}{N}\sum_{n=1}^{N}\Big|\int f_0\cdot T^{[p_N(n)]}f\,d\mu\Big|\ll\max_{e\in E}\frac{1}{N}\sum_{n=1}^{N}\Big|\int f_0\cdot T^{-[-a_Nn]}(T^ef)\,d\mu\Big|,$$
where $E$ is a finite subset of integers. Since $-a_N>0$ and $(-a_N\cdot N)_N$ tends increasingly to $\infty$, we get the conclusion by the previous lemma (working with $T^{-1}$ instead of $T$).

To extend Lemma 5.3 to multiple terms, we use the following variant of the classical van der Corput trick, the main tool for reducing the complexity of our iterates (see the next subsection for more details):

Lemma 5.5 (Lemma 4.6, [13]). Let $(v_{N,n})_{N,n}$ be a bounded sequence in a Hilbert space. Then
$$\limsup_{N\to\infty}\Big\|\frac{1}{N}\sum_{n=1}^{N}v_{N,n}\Big\|^2\le4\limsup_{H\to\infty}\frac{1}{H}\sum_{h=1}^{H}\limsup_{N\to\infty}\Big|\frac{1}{N}\sum_{n=1}^{N}\langle v_{N,n+h},v_{N,n}\rangle\Big|.$$

We will now demonstrate the main idea behind the generalization of Lemma 5.3, for which we follow [13, Proposition 5.3, Case 1]. In that statement, to show
$$\limsup_{N\to\infty}\sup_{\|f_0\|_\infty,\|f_i\|_\infty\le1}\frac{1}{N}\sum_{n=1}^{N}\Big|\int f_0\cdot T^{[a_1n]}f_1\cdot T^{[a_2n]}f_2\,d\mu\Big|\ll|||f_j|||_4,$$
where $(i,j)=(1,2)$ or $(2,1)$, one uses Lemma 5.5, composes with, say, $T^{-[a_1n]}$, and gets the terms (notice that we keep the $h$-term in the first difference even though it is bounded)
$$[a_1(n+h)]-[a_1n]\approx[a_1h],\quad[a_2(n+h)]-[a_1n]\approx[(a_2-a_1)n],\quad\text{and}\quad[a_2n]-[a_1n]\approx[(a_2-a_1)n],$$
so, after grouping the last two terms together, using the first one as constant (since it only depends on $h$, the average along which is taken at the very end), one can use the base $\ell=1$ case; this is actually how the inductive step works in the proof of the general $\ell\in\mathbb{N}$ case.

Footnote (to Remark 5.4). We note that the seminorms that are taken with respect to the commuting transformations $T$ or $T^{-1}$ coincide, since in Theorems 2.1 and 2.2 we are working under the ergodicity assumption.

The variable case is more complicated to deal with. We demonstrate the main idea behind it by considering Example 1, i.e., $p_{1,N}(n)=a_{1,N}n$ and $p_{2,N}(n)=a_{2,N}n$, where $a_{1,N}=1/N^a$ and $a_{2,N}=1/N^b$ for $0<a<b<1$. The previous approach cannot be imitated, as, for example,
$$[a_{1,N}(n+h)]-[a_{1,N}n]\approx[a_{1,N}h]$$
is in general a variable term and we cannot proceed with the same argument. What we do instead is to transform the iterates in the initial sum to the following:
$$(0,[a_{1,N}n],[a_{2,N}n])\approx\Big(0,[a_{1,N}n],\Big[\frac{a_{2,N}}{a_{1,N}}[a_{1,N}n]\Big]\Big)\ \xrightarrow[\text{change of variables}]{\text{Lemma 5.2}}\ \Big(0,n,\Big[\frac{a_{2,N}}{a_{1,N}}n\Big]\Big),$$
and then we use Lemma 5.5 to bound, eventually, everything by $|||f_1|||_4$. (To use Lemma 5.2 note the crucial fact that $(a_{2,N}/a_{1,N})_N$ is bounded.) Additionally, to bound our expression by $|||f_2|||_4$, the previous argument needs an additional twist to work since the quantity $(a_{1,N}/a_{2,N})_N$ is unbounded. What we do in this case is to compose with $T^{-[a_{1,N}n]}$ to get
$$(0,[a_{1,N}n],[a_{2,N}n])\approx([-a_{1,N}n],[(a_{2,N}-a_{1,N})n],0)\approx\Big(\Big[\frac{-a_{1,N}}{a_{2,N}-a_{1,N}}[(a_{2,N}-a_{1,N})n]\Big],[(a_{2,N}-a_{1,N})n],0\Big)\ \xrightarrow[\text{variables}]{\text{change of}}\ \Big(\Big[\frac{-a_{1,N}}{a_{2,N}-a_{1,N}}n\Big],n,0\Big).$$
As $(a_{1,N}/(a_{2,N}-a_{1,N}))_N$ is bounded, we can now finish the argument as before.

Footnote. Here, by "$\approx$", we mean "modulo bounded error terms".

The previous discussion naturally leads to the following assumption on the leading coefficients of the linear (variable) polynomials:

Definition 5.6.

A sequence of real numbers $(a_N)_N$ has the $R_1$-property if
(i) it is bounded; and
(ii) $(a_N)_N\subseteq(0,+\infty)$ or $(-\infty,0)$ and $(|a_N|\cdot N)_N$ tends increasingly to $+\infty$.

For $\ell\in\mathbb{N}$ we say that the sequences of real numbers $\{(a_{i,N})_N:1\le i\le\ell\}$ have the $R_\ell$-property if for all $1\le i\le\ell$:
(i) $(a_{i,N})_N$ has the $R_1$-property; and
(ii) at least one of the following three properties holds:
(a) there exists $1\le j_0\neq i\le\ell$ such that the sequences
$$\Big\{\Big(\frac{a_{j_0,N}-a_{j,N}}{a_{i,N}}\Big)_N:\ 1\le j\neq j_0\le\ell\Big\}$$
have the $R_{\ell-1}$-property.
(b) there exists $1\le j_0\neq i\le\ell$ such that the sequence $(a_{i,N}-a_{j_0,N})_N$ has the $R_1$-property and the sequences
$$\Big\{\Big(\frac{a_{j,N}}{a_{i,N}-a_{j_0,N}}\Big)_N:\ 1\le j\neq j_0\le\ell\Big\}$$
have the $R_{\ell-1}$-property.
(c) there exist $1\le j_0\neq i\le\ell$ such that the sequence $(a_{i,N}-a_{j_0,N})_N$ has the $R_1$-property and $1\le k_0\neq j_0,i\le\ell$ such that the sequences
$$\Big\{\Big(\frac{-a_{k_0,N}}{a_{i,N}-a_{j_0,N}}\Big)_N,\ \Big(\frac{a_{j,N}-a_{k_0,N}}{a_{i,N}-a_{j_0,N}}\Big)_N:\ 1\le j\neq k_0,j_0\le\ell\Big\}$$
have the $R_{\ell-1}$-property.

Remark 5.7.

The polynomial family of Example 1, i.e., $p_{1,N}(n)=n/N^a$, $p_{2,N}(n)=n/N^b$, $n,N\in\mathbb{N}$, where $0<a<b<1$, has the $R_2$-property.

Indeed, skipping the trivial calculations, both sequences $(1/N^a)_N$, $(1/N^b)_N$ have the $R_1$-property and for $i=1$ we have the (ii)(a) case, while for $i=2$ the (ii)(b) case.

We are now ready to extend Lemma 5.3 to multiple terms along polynomials of degree $1$, following the main idea of [13, Proposition 5.3, Case 1]:

Proposition 5.8.

Let $(p_{1,N})_N,\ldots,(p_{\ell,N})_N$ be polynomial sequences of degree $1$ of the form $p_{i,N}(n)=a_{i,N}n+b_{i,N}$, $n,N\in\mathbb{N}$, $1\le i\le\ell$, where the sequences $(a_{i,N})_N$, $1\le i\le\ell$, have the $R_\ell$-property and $(b_{i,N})_N$, $1\le i\le\ell$, are bounded. Then, for every $f_1\in L^\infty(\mu)$, we have
$$\limsup_{N\to\infty}\sup_{\|f_0\|_\infty,\|f_2\|_\infty,\ldots,\|f_\ell\|_\infty\le1}\frac{1}{N}\sum_{n=1}^{N}\Big|\int f_0\cdot T^{[p_{1,N}(n)]}f_1\cdot\ldots\cdot T^{[p_{\ell,N}(n)]}f_\ell\,d\mu\Big|\ll|||f_1|||_{2\ell}.\tag{20}$$

Proof.

We use induction on $\ell$. The base case, $\ell=1$, follows from Lemma 5.3. We assume that $\ell\ge2$ and that the statement holds for $\ell-1$.

Case 1:

For $i=1$, property (ii)(a) from Definition 5.6 holds. We have
$$\frac{1}{N}\sum_{n=1}^{N}\Big|\int f_0\cdot T^{[a_{1,N}n+b_{1,N}]}f_1\cdot\ldots\cdot T^{[a_{\ell,N}n+b_{\ell,N}]}f_\ell\,d\mu\Big|=\frac{1}{N}\sum_{n=1}^{N}\Big|\int f_0\cdot T^{[a_{1,N}n]+e_1(n,N)}f_1\cdot\prod_{i=2}^{\ell}T^{\big[\frac{a_{i,N}}{a_{1,N}}[a_{1,N}n]\big]+e_i(n,N)}f_i\,d\mu\Big|$$
$$\ll\max_{e_1,\ldots,e_\ell\in E}\frac{1}{N}\sum_{n=1}^{N}\Big|\int f_0\cdot T^{[a_{1,N}n]}(T^{e_1}f_1)\cdot\prod_{i=2}^{\ell}T^{\big[\frac{a_{i,N}}{a_{1,N}}[a_{1,N}n]\big]}(T^{e_i}f_i)\,d\mu\Big|,\tag{21}$$
where $E$ is a finite subset of integers (the error terms $e_i(n,N)$, as $(b_{i,N})_N$ and $(a_{i,N}/a_{1,N})_N$ are bounded for $1\le i\le\ell$, take finitely many values).

Footnote (to (20)). The implicit constant depends on the bounds of the coefficients and the number of transformations $\ell$. Note that, because of symmetry, we also have the respective estimates for $2\le i\le\ell$ with $f_i$ in place of $f_1$.

For every $N\in\mathbb{N}$ we now choose functions $f_{i,N}$ with $\|f_{i,N}\|_\infty\le1$ for $i\in\{0,2,\ldots,\ell\}$, so that the last relation in (21) is $1/N$ close to the corresponding $\sup_{\|f_0\|_\infty,\|f_2\|_\infty,\ldots,\|f_\ell\|_\infty\le1}$. Using the Cauchy-Schwarz inequality and that $|a_{1,N}|\cdot N\to\infty$, we have that (20) follows if we show, for each choice of $e_1,\ldots,e_\ell\in E$, that
$$\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\Big|\int f_{0,[a_{1,N}N]}\cdot T^{[a_{1,N}n]}(T^{e_1}f_1)\cdot\prod_{i=2}^{\ell}T^{\big[\frac{a_{i,N}}{a_{1,N}}[a_{1,N}n]\big]}(T^{e_i}f_{i,[a_{1,N}N]})\,d\mu\Big|^2\tag{22}$$
is bounded above by a constant multiple of $|||f_1|||_{2\ell}^2$. Using Lemma 5.2 it suffices to show
$$\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\Big|\int f_{0,N}\cdot T^{n}(T^{e_1}f_1)\cdot\prod_{i=2}^{\ell}T^{\big[\frac{a_{i,N}}{a_{1,N}}n\big]}(T^{e_i}f_{i,N})\,d\mu\Big|^2\ll|||f_1|||_{2\ell}^2.\tag{23}$$
The left-hand side of (23) is equal to
$$A:=\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\int F_{0,N}\cdot S^{n}(S^{e_1}F_1)\cdot\prod_{i=2}^{\ell}S^{\big[\frac{a_{i,N}}{a_{1,N}}n\big]}(S^{e_i}F_{i,N})\,d\tilde{\mu},$$
where $S=T\times T$, $F_1=f_1\otimes\bar{f}_1$, $F_{i,N}=f_{i,N}\otimes\bar{f}_{i,N}$, $i=0,2,\ldots,\ell$, and $\tilde{\mu}=\mu\times\mu$. Using Cauchy-Schwarz, and then Lemma 5.5, we have that
$$|A|^2\ll\limsup_{H\to\infty}\frac{1}{H}\sum_{h=1}^{H}A_h,$$
where
$$A_h:=\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\Big|\int S^{n+h}(S^{e_1}F_1)\cdot\prod_{i=2}^{\ell}S^{\big[\frac{a_{i,N}}{a_{1,N}}(n+h)\big]}(S^{e_i}F_{i,N})\cdot S^{n}(S^{e_1}F_1)\cdot\prod_{i=2}^{\ell}S^{\big[\frac{a_{i,N}}{a_{1,N}}n\big]}(S^{e_i}F_{i,N})\,d\tilde{\mu}\Big|.$$
Factoring out the term $S^n$ we get
$$A_h=\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\Big|\int S^{h}(S^{e_1}F_1)\cdot S^{e_1}F_1\cdot\prod_{i=2}^{\ell}S^{\big[\big(\frac{a_{i,N}}{a_{1,N}}-1\big)n\big]+\big[\frac{a_{i,N}}{a_{1,N}}h\big]+e_i(n,h,N)}(S^{e_i}F_{i,N})\cdot\prod_{i=2}^{\ell}S^{\big[\big(\frac{a_{i,N}}{a_{1,N}}-1\big)n\big]+\tilde{e}_i(n,h,N)}(S^{e_i}F_{i,N})\,d\tilde{\mu}\Big|$$
$$=\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\Big|\int F_{1,h}\cdot\prod_{i=2}^{\ell}S^{\big[\big(\frac{a_{i,N}}{a_{1,N}}-1\big)n\big]}F_{i,h,n,N}\,d\tilde{\mu}\Big|,\tag{24}$$
where $F_{1,h}=S^{h}(S^{e_1}F_1)\cdot S^{e_1}F_1$ and
$$F_{i,h,n,N}=S^{\big[\frac{a_{i,N}}{a_{1,N}}h\big]+e_i(n,h,N)}(S^{e_i}F_{i,N})\cdot S^{\tilde{e}_i(n,h,N)}(S^{e_i}F_{i,N}),\quad2\le i\le\ell.$$
Using the hypothesis, for $i=1$, there exists $2\le j_0\le\ell$ such that the sequences
$$\Big\{\Big(\frac{a_{j_0,N}-a_{j,N}}{a_{1,N}}\Big)_N:\ 1\le j\neq j_0\le\ell\Big\}$$
have the $R_{\ell-1}$-property. Factoring out $S^{\big[\big(\frac{a_{j_0,N}}{a_{1,N}}-1\big)n\big]}$ in the previous relation we have that
$$A_h=\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\Big|\int F_{1,h}\cdot\prod_{i=2}^{\ell}S^{\big[\big(\frac{a_{i,N}}{a_{1,N}}-1\big)n\big]}F_{i,h,n,N}\,d\tilde{\mu}\Big|=\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\Big|\int F_{j_0,h,n,N}\cdot S^{\big[-\big(\frac{a_{j_0,N}}{a_{1,N}}-1\big)n\big]+e'_1(n,N)}F_{1,h}\cdot\prod_{2\le i\neq j_0\le\ell}S^{\big[-\big(\frac{a_{j_0,N}-a_{i,N}}{a_{1,N}}\big)n\big]}\tilde{F}_{i,h,n,N}\,d\tilde{\mu}\Big|,$$
where $\tilde{F}_{i,h,n,N}=S^{e'_i(n,N)}F_{i,h,n,N}$ for some error terms $e'_i(n,N)\in\{0,1\}$.

As we previously highlighted, for every fixed $N$, we can partition the set of integers so that $e'_1(n,N)$ is constant. So, fixing $e'_1\in\{0,1\}$, using the induction hypothesis, we have
$$A_h\ll\limsup_{N\to\infty}\sup_{\|F_0\|_\infty,\|F_2\|_\infty,\ldots,\|F_\ell\|_\infty\le1}\frac{1}{N}\sum_{n=1}^{N}\Big|\int F_0\cdot S^{\big[-\big(\frac{a_{j_0,N}}{a_{1,N}}-1\big)n\big]}(S^{e'_1}F_{1,h})\cdot\prod_{2\le i\neq j_0\le\ell}S^{\big[-\big(\frac{a_{j_0,N}-a_{i,N}}{a_{1,N}}\big)n\big]}F_i\,d\tilde{\mu}\Big|$$
$$\ll|||S^{e'_1}F_{1,h}|||_{2\ell-2}=|||F_{1,h}|||_{2\ell-2}=|||S^{h}(S^{e_1}F_1)\cdot S^{e_1}F_1|||_{2\ell-2}=|||(T^{h+e_1}f_1\cdot T^{e_1}f_1)\otimes\overline{(T^{h+e_1}f_1\cdot T^{e_1}f_1)}|||_{2\ell-2}\le|||T^{h+e_1}f_1\cdot T^{e_1}f_1|||_{2\ell-1}^2.$$
So, using the Hölder inequality and the definition of the seminorms $|||\cdot|||$, we have
$$|A|^2\ll\limsup_{H\to\infty}\frac{1}{H}\sum_{h=1}^{H}A_h\ll\limsup_{H\to\infty}\frac{1}{H}\sum_{h=1}^{H}|||T^{h+e_1}f_1\cdot T^{e_1}f_1|||_{2\ell-1}^2\le\Big(\limsup_{H\to\infty}\frac{1}{H}\sum_{h=1}^{H}|||T^{h+e_1}f_1\cdot T^{e_1}f_1|||_{2\ell-1}^{2^{2\ell-1}}\Big)^{1/2^{2\ell-2}}=|||T^{e_1}f_1|||_{2\ell}^4=|||f_1|||_{2\ell}^4,$$
hence (22) is bounded above by a constant multiple of $|||f_1|||_{2\ell}^2$, as was to be shown.

Cases 2 & 3:

For i = 1 , we either have (ii) (b) or (ii) (c) in Deﬁnition 5.6.Here we will skip the details already outlined in Case 1. If ≤ j ≤ ℓ is the oneguaranteed by Deﬁnition 5.6, the averaging term in the last part of (21) will become(setting, without loss, e i = 0 ) f j · T [( a ,N − a j ,N ) n ] f · T (cid:20) − aj ,Na ,N − aj ,N [( a ,N − a j ,N ) n ] (cid:21) f · Y ≤ j = j ≤ ℓ T (cid:20) aj,N − aj ,Na ,N − aj ,N [( a ,N − a j ,N ) n ] (cid:21) f j and the one in (24) Z F ,h · S (cid:20)(cid:18) − aj ,Na ,N − aj ,N − (cid:19) n (cid:21) F ,h,n,N · Y ≤ j = j ≤ ℓ S (cid:20)(cid:18) aj,N − aj ,Na ,N − aj ,N − (cid:19) n (cid:21) F i,h,n,N d ˜ µ. Factoring out S (cid:20)(cid:18) − aj ,Na ,N − aj ,N − (cid:19) n (cid:21) = S (cid:20) a ,Na ,N − aj ,N n (cid:21) (for Case 2) and S (cid:20)(cid:18) ak ,N − aj ,Na ,N − aj ,N − (cid:19) n (cid:21) = S (cid:20) ak ,N − a ,Na ,N − aj ,N n (cid:21) (for Case 3–where ≤ k = j ≤ ℓ is the one guaranteed by Deﬁnition 5.6),we can continue (using the induction hypothesis) and ﬁnish the argument as in Case 1.The proof of the statement is now complete. (cid:3) Remark 5.9.

To the best of our knowledge, when we deal with norm convergence of averages with (non-variable) polynomial iterates, we can always replace the conventional Cesàro averages, i.e., $\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}$, with uniform ones, i.e., $\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}$. Our method though, exactly because of the choice of functions $f_{i,[a_{1,N}N]}$ (to go from Equation (18) to (19) and from (21) to (22)), cannot give the corresponding uniform results.

5.2. The general case.

We start by recalling (see, for example, [4] and [13]) the definition of the degree and type of a polynomial family that we will adapt in our study:

Deﬁnition 5.10.

For $\ell \in \mathbb{N}$ let $\mathcal{P} = \{p_1, \ldots, p_\ell\}$ be a family of non-constant real polynomials. We denote with $\deg(\mathcal{P})$ the maximum degree of the $p_i$'s and we call it degree of $\mathcal{P}$. If $w_i$ denotes the number of distinct leading coefficients of polynomials from $\mathcal{P}$ of degree $i$ and $d = \deg(\mathcal{P})$, then the vector $(d, w_d, \ldots, w_1)$ is the type of $\mathcal{P}$. We order all the possible type vectors lexicographically. (I.e., $(d, w_d, \ldots, w_1) > (d', w'_{d'}, \ldots, w'_1)$ iff, reading from left to right, the first instance where the two vectors disagree the coordinate of the first vector is greater than that of the second one.)

In order to reduce the complexity (i.e., the type) of a polynomial family, one has to use the classic PET induction (i.e., Polynomial Exhaustion Technique), which was introduced in [4].

At this point we remind the reader that the real polynomials $p_1, \ldots, p_\ell$ are called essentially distinct if they are, together with their pairwise differences, non-constant. Given such a family of polynomials $\mathcal{P} = \{p_1, \ldots, p_\ell\}$, $p \in \mathcal{P}$ and $h \in \mathbb{N}$, the van der Corput operation (vdC-operation), acting on $\mathcal{P}$, gives the family
$$\mathcal{P}(p,h) := \{p_1(t+h) - p(t), \ldots, p_\ell(t+h) - p(t),\ p_1(t) - p(t), \ldots, p_\ell(t) - p(t)\},$$
whereby we remove all the terms that are bounded and we group the ones of degree $1$ with bounded difference (i.e., of the same leading coefficient), thus obtaining a new family of essentially distinct polynomials. (This is justified with the use of the Cauchy-Schwarz inequality.)

(Notice that if $\mathcal{P}$ lists the polynomial iterates in the expression $T^{[p_1(n)]}f_1 \cdot \ldots \cdot T^{[p_\ell(n)]}f_\ell$, then $\mathcal{P}(p,h)$ lists the respective iterates (modulo error terms) after using Lemma 5.5 and factoring out the iterate $-p(n)$.)

The following lemma acknowledges that there exists a choice of a polynomial in a family of essentially distinct polynomials, via which the vdC-operation reduces its type:

Lemma 5.11 (Lemma 4.5, [13]). Let $\ell \in \mathbb{N}$ and $\mathcal{P} = \{p_1, \ldots, p_\ell\}$ be a family of essentially distinct polynomials with $\deg(\mathcal{P}) = \deg(p_1) \geq 2$. Then there exists $p \in \mathcal{P}$ (of minimum degree in the polynomial family) such that for every large $h$ the family $\mathcal{P}(p,h)$ has type smaller than that of $\mathcal{P}$, and $\deg(\mathcal{P}(p,h)) = \deg(p_1(t+h) - p(t))$.

What is crucial for us is that every decreasing sequence of types is eventually (after finitely many steps) stationary and that, by using the previous lemma, there is a point that all the polynomials have degree $1$ (hence, one is able to use the base case, finishing the inductive scheme). Also, by its definition, the vdC-operation preserves the essential distinctness property.

Switching to variable polynomials, notice that, for all $N \in \mathbb{N}$, the type in Examples 1 and 2 is (1 , and $(\ell, 1, \ldots, 1)$ respectively, i.e., both independent of $N$. This is not always the case though.

Indeed, consider the pairs $(p_{1,N}, p_{2,N})_N$ and $(q_{1,N}, q_{2,N})_N$, where, for $N \in \mathbb{N}$ we have p ,N ( t ) = (1 /N ) t + √ t + t , and p ,N ( t ) = (( − N /N ) t + √ t − t and q ,N ( t ) = (1 /N ) t + √ t + t , and q ,N ( t ) = (1 /N ) t + √ t − t. In the first case the type of the pair is $(4,1,0,0,0)$ for $N$ even and $(4,2,0,0,0)$ otherwise, while the type of the second one equals to $(4,1,0,0,0)$ for $N = 1$ and $(4,2,0,0,0)$ for $N > 1$, i.e., has the property that it eventually (for $N > 1$) becomes constant.

We will deal with sequences of families of real polynomials, $(\mathcal{P}_N)_N$, where $\mathcal{P}_N = \{p_{1,N}, \ldots, p_{\ell,N}\}$, $N \in \mathbb{N}$, that, for large $N$, has type independent of $N$, to be able to use the facts that we just mentioned.

Next we will define the subclass of variable polynomials that we will deal with.

Definition 5.12.

For $\ell \in \mathbb{N}$ let $(\mathcal{P}_N)_N = (p_{1,N}, \ldots, p_{\ell,N})_N$ be a sequence of $\ell$-tuples of real polynomials with bounded coefficients. (Abusing the notation we will refer to such sequences as (variable) sequences of $\ell$-tuples of real polynomials and we will denote them with $(\mathcal{P}_N)_N = (p_{1,N}, \ldots, p_{\ell,N})_N$.) We say that $(\mathcal{P}_N)_N$ is super nice if, for every (large enough) $N \in \mathbb{N}$:

(i) the polynomials $p_{i,N}$ and, for all $i \neq j$, $p_{i,N} - p_{j,N}$ are non-constant and their degrees are independent of $N$;

(ii) after performing, if needed, (finitely many) vdC-operations to $(\mathcal{P}_N)_N$ to end up only with polynomials of degree $1$, say $k \equiv k((\mathcal{P}_N)_N)$ many, the leading coefficients (for large enough $h_i$'s from the vdC-operations) have the $R_k$-property; and

(ii)$'$ if $\deg((p_{i_0,N})_N) = \deg((\mathcal{P}_N)_N)$, then (ii) holds for the polynomial sequence
$$(\mathcal{P}'_N)_N := (p_{1,N} - p_{i_0,N}, \ldots, p_{i_0-1,N} - p_{i_0,N},\ -p_{i_0,N},\ p_{i_0+1,N} - p_{i_0,N}, \ldots, p_{\ell,N} - p_{i_0,N})_N.$$

(From now on whatever we write about variable polynomials we understand it for "large" enough $N$.)

(It is not clear whether (ii) implies (ii)$'$. Indeed, consider for example the polynomials $p_{1,N}(n) = -a_N n^2 - b_N n$, $p_{2,N}(n) = (a_N - b_N)n$ and $(\mathcal{P}_N)_N = (p_{1,N}, p_{2,N})_N$. After performing two vdC-operations we get the triple $\{-2a_N(h'+h)n,\ -2a_N hn,\ -2a_N h'n\}$, while for $(\mathcal{P}'_N)_N = (-p_{1,N}, p_{2,N} - p_{1,N})_N$, after a single vdC-operation we get $\{(a_N - b_N)n,\ 2a_N hn,\ ((2h+1)a_N - b_N)n\}$. So, in the second case we have to impose assumptions on both $(a_N)_N$, $(b_N)_N$, while in the first one only on $(a_N)_N$.)

In the following remark we list interesting facts on the super niceness property, some of which will be used in what follows.

Remark 5.13. (1)

The degree and type of every super nice sequence, together with theinteger k in ( ii ) (and, analogously, in ( ii ) ′ as well), are independent of N. (2) Every independent of N family P = { p , . . . , p ℓ } of essentially distinct polynomialsis super nice. (This shows that the set of super nice families of non-variable polynomialsis not empty.) Indeed, since ( i ) is immediate, we are showing ( ii ) ( ( ii ) ′ follows with exactly the verysame argument). As it was mentioned before, the vdC-operation preserves the essential dis-tinctness property, hence, all the k linear polynomial will have distinct leading coeﬃcients,which, as they are independent of N, will have the R k -property. (3) The set of super nice variable polynomial sequences is non-empty. Actually, the ℓ -tuple ( p ,N , . . . , p ℓ,N ) N , where p i,N ( n ) = n i /N a , ≤ i ≤ ℓ, N, n ∈ N , and < a < , fromExample 2, is super nice (see Lemma 8.2 below for a more general statement). Indeed, as the variable part of the coeﬃcients of the polynomials, after applying vdC-operations, is the same for all terms (and equal to /N a ), at each step we have that theratios of the coeﬃcients are independent of N , hence we have all the properties. (4) Even though the number k of degree terms (that appears in ( ii ) and ( ii ) ′ ) is not apriori known, when we have a single variable polynomial sequence p N ( n ) = a d,N n d + . . . + a ,N n + a ,N , where ( a d,N ) N has the R -property and all ( a i,N ) N , ≤ i ≤ d, are bounded, we have thatfor all ℓ ∈ N , ( P N ) N = ( p N , p N , . . . , ℓp N ) N is super nice (without having to know thecorresponding k ). Given (i) is immediate, it suﬃces to show (ii) ((ii) ′ follows with exactly the sameargument–the important part here is that ( P N ) N consists of distinct non-zero multiples ofthe same polynomial). We start with ( ℓp N ( n ) , . . . 
, p N ( n )) (we write it in this order forconvenience) and use the vdC-operation which leads to diﬀerences of polynomials (deriva-tives). Factoring out p N ( n ) we get ( ℓ − p N ( n + h ) + p N ( n + h ) − p N ( n ) , . . . , p N ( n + h ) − p N ( n ) , ( ℓ − p N ( n ) , . . . , p N ( n ) . In the next iteration of the vdC-operation we factor out p N ( n + h ) − p N ( n ) , and then p N ( n ) − ( p N ( n + h ) − p N ( n )) = p N ( n + h ) (i.e., polynomials of minimum degree at eachstep), reducing by the appearance of each p N in the corresponding expression. Continuingthe procedure, we eventually arrive at, say k many, degree iterates with distinct leadingcoeﬃcients (by the essential distinctness property of the initial polynomials), which are allmultiples of d ! · a d,N . As all the iterated ratios of these coeﬃcients are independent of a d,N and non-zero, we get that they satisfy the R k -property, completing the claim. (5) If ( P N ) N is super nice, then ( P ′ N ) N is super nice too. Note that these diﬀerences lead to reduction of degrees, as for example ∆ ( p ( n ); h ) := p ( n + h ) − p ( n ) , and ∆ ( p ( n ); h , h ) := ∆ (∆ ( p ( n ); h ); h ) = p ( n + h + h ) − p ( n + h ) − ( p ( n + h ) − p ( n )) reduce thedegree of p by and respectively. ULTIPLE ERGODIC AVERAGES FOR VARIABLE POLYNOMIALS 29

Looking at Property ( i ) for ( P N ) N , we have that each polynomial (sequence) in ( P ′ N ) N is non-constant and has degree independent of N, equal to deg( p i ,N − p ,N ) . If we let q i,N := ( p i,N − p i ,N i = i − p i ,N i = i , then, we have q i,N − q j,N =  p i,N − p j,N i, j = i p i,N j = i − p j,N i = i , and (i) follows for ( P ′ N ) N . ( ii ) and ( ii ) ′ follow by the fact that (( P ′ ) ′ N ) N = ( P N ) N . (6) Property ( i ) is invariant under the vdC-operation. Indeed, if p i ,N is the polynomial guaranteed by Lemma 5.11, then we have the iterates: p ,N ( n + h ) − p i ,N ( n ) , . . . , p ℓ,N ( n + h ) − p i ,N ( n ) , and p ,N ( n ) − p i ,N ( n ) , . . . , p i − ,N ( n ) − p i ,N ( n ) , p i +1 ,N ( n ) − p i ,N ( n ) , . . . , p ℓ,N ( n ) − p i ,N ( n ) . The degrees of these polynomials satisfy deg( p i,N ( n + h ) − p i ,N ( n )) = ( deg( p i,N ( n ) − p i ,N ( n )) i = i deg( p i ,N ) − i = i . For the pairwise diﬀerences part, for i = j, we have p i,N ( n + h ) − p i ,N ( n ) − ( p j,N ( n + h ) − p i ,N ( n )) = p i,N ( n + h ) − p j,N ( n + h ) ,p i,N ( n ) − p i ,N ( n ) − ( p j,N ( n ) − p i ,N ( n )) = p i,N ( n ) − p j,N ( n ) , and, ﬁnally, p i,N ( n + h ) − p i ,N ( n ) − ( p j,N ( n ) − p i ,N ( n )) = p i,N ( n + h ) − p j,N ( n ) , so, everything follows by Property (i) for ( P N ) N . Notice that Remark 5.13 (4) implies that Theorem 1.5, via Theorem 2.2, holds for alarger class of variable polynomial sequences; even with coeﬃcients that oscillate.A real-valued function g which is continuously diﬀerentiable on [ c, ∞ ) , where c ≥ , is called Fejér if the following hold: g ′ ( x ) tends monotonically to as x → ∞ ; and lim x →∞ x | g ′ ( x ) | = ∞ . Any such function is eventually monotonic and satisﬁes the growthconditions log x ≺ g ( x ) ≺ x, hence (1 /g ( N )) N has the R -property. So, modulo thegoodness property, Theorem 2.2 will also hold for polynomial sequences of the form: √ N / (2 + cos √ log N ) n + p ,N ( n ) ! 
N , or (cid:18) N / (1 /

10 + sin log N ) n + p ,N ( n ) (cid:19) N , where p ,N , p ,N are polynomials of degrees less than and respectively with boundedcoeﬃcients, while the functions g ( x ) = x / (2+cos √ log x ) , g ( x ) = x / (1 / x ) are Fejér but, because of oscillation, don’t belong to SLE . So, for sequences ( P N ) N with degree ≥ , the vdC-operation preserves the super niceness property. Notice that the vdC-operation will remove the p i ,N ( n + h ) − p i ,N ( n ) iterate in case deg( p i ,N ) = 1 . Recall here that if, in the case where i = j, it happens deg( p i,N ) = 1 , then deg( p i ,N ) = 1 (asa non-constant polynomial of minimum degree in ( P N ) N ), so the vdC-operation will group the terms p i,N ( n + h ) − p i ,N ( n ) and p i,N ( n ) − p i ,N ( n ) together, being of degree with bounded diﬀerence. For a deep study of averages with general sublinear iterates one is referred to [9], while for moregeneral such functions, i.e., tempered ones, to [5] and [31].
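To make the type vector of Definition 5.10 and the vdC-operation concrete, the following minimal Python sketch computes both for polynomials with rational coefficients. It is only an illustration under stated assumptions: polynomials are stored as coefficient lists `[c_0, c_1, ...]`, the helper names (`degree`, `family_type`, `shift`, `sub`, `vdc`) are hypothetical and not taken from the paper, and "bounded terms" are simply the constant ones. On the toy family $\{t^2, t\}$ the type drops from $(2,1,1)$ to $(2,1,0)$ and then to $(1,3)$, with slopes $2h$, $2h'$ and $2(h+h')$ after the second step.

```python
from fractions import Fraction
from math import comb

def degree(p):
    """Degree of a polynomial stored as [c_0, c_1, ...]; -1 for the zero polynomial."""
    for i in range(len(p) - 1, -1, -1):
        if p[i] != 0:
            return i
    return -1

def family_type(family):
    """Type (d, w_d, ..., w_1): w_i counts the distinct leading coefficients in degree i."""
    d = max(degree(p) for p in family)
    leading = {i: set() for i in range(1, d + 1)}
    for p in family:
        k = degree(p)
        if k >= 1:
            leading[k].add(p[k])
    return (d,) + tuple(len(leading[i]) for i in range(d, 0, -1))

def shift(p, h):
    """Coefficients of t -> p(t + h), expanded with the binomial theorem."""
    out = [Fraction(0)] * len(p)
    for k, c in enumerate(p):
        for j in range(k + 1):
            out[j] += c * comb(k, j) * Fraction(h) ** (k - j)
    return out

def sub(p, q):
    """Coefficients of p - q."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

def vdc(family, p, h):
    """Toy vdC-operation: form P(p, h), drop constant terms, group equal-slope linear terms."""
    candidates = [sub(shift(q, h), p) for q in family] + [sub(q, p) for q in family]
    result, slopes = [], set()
    for q in candidates:
        k = degree(q)
        if k < 1:
            continue            # bounded (constant) term: removed
        if k == 1:
            if q[1] in slopes:
                continue        # same leading coefficient: grouped with an earlier term
            slopes.add(q[1])
        result.append(q)
    return result

P = [[0, 0, 1], [0, 1]]          # the family {t^2, t}
Q = vdc(P, P[1], 3)              # vdC-operation with p(t) = t and h = 3
R = vdc(Q, Q[1], 5)              # second step with h' = 5
print(family_type(P), family_type(Q), family_type(R))  # (2, 1, 1) (2, 1, 0) (1, 3)
```

Python compares tuples lexicographically, so `family_type(Q) < family_type(P)` matches the ordering of types used in the text.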

The following result shows that the nilfactor $\mathcal{Z}$ is characteristic for a super nice collection of polynomial sequences $(p_{1,N}, \ldots, p_{\ell,N})_N$.

Proposition 5.14.

For $\ell \in \mathbb{N}$ let $(p_{1,N}, \ldots, p_{\ell,N})_N$ be a super nice sequence of polynomials, $(X, \mathcal{B}, \mu, T)$ a system, and suppose that at least one of the functions $f_1, \ldots, f_\ell \in L^\infty(\mu)$ is orthogonal to the nilfactor $\mathcal{Z}$. Then, we have
$$\lim_{N\to\infty} \left\| \frac{1}{N} \sum_{n=1}^{N} T^{[p_{1,N}(n)]} f_1 \cdot \ldots \cdot T^{[p_{\ell,N}(n)]} f_\ell \right\|_2 = 0. \tag{25}$$

Proof.

We assume without loss of generality that $f_1$ is orthogonal to $\mathcal{Z}$. As in [13, Lemma 4.7], to show (25), it suffices to show:
$$\lim_{N\to\infty} \sup_{\|f_0\|_\infty, \|f_2\|_\infty, \ldots, \|f_\ell\|_\infty \leq 1} \frac{1}{N} \sum_{n=1}^{N} \left| \int f_0 \cdot T^{[p_{1,N}(n)]} f_1 \cdot \ldots \cdot T^{[p_{\ell,N}(n)]} f_\ell \, d\mu \right| = 0. \tag{26}$$

We claim next that we can further assume that $\deg(p_{1,N}) = \deg(\mathcal{P}_N)$. If this is not the case and $\deg(p_{1,N}) < \deg(p_{i_0,N}) = \deg(\mathcal{P}_N)$, then, factoring out $T^{[p_{i_0,N}(n)]}$, (26) becomes
$$\lim_{N\to\infty} \sup_{\|f_0\|_\infty, \|f_2\|_\infty, \ldots, \|f_\ell\|_\infty \leq 1} \frac{1}{N} \sum_{n=1}^{N} \left| \int f_{i_0} \cdot T^{[-p_{i_0,N}(n)]} f_{0,n,N} \cdot \prod_{1 \leq i \neq i_0 \leq \ell} T^{[p_{i,N}(n) - p_{i_0,N}(n)]} f_{i,n,N} \, d\mu \right| = 0,$$
where $f_{i,n,N} = T^{e_i(n,N)} f_i$ for some error terms $e_i(n,N) \in \{0,1\}$. So, it suffices to show, for any choice of $e_1 \in \{0,1\}$, that
$$\lim_{N\to\infty} \sup_{\|f_0\|_\infty, \|f_2\|_\infty, \ldots, \|f_\ell\|_\infty \leq 1} \frac{1}{N} \sum_{n=1}^{N} \left| \int f_{i_0} \cdot T^{[-p_{i_0,N}(n)]} f_0 \cdot T^{[p_{1,N}(n) - p_{i_0,N}(n)]} (T^{e_1} f_1) \cdot \prod_{2 \leq i \neq i_0 \leq \ell} T^{[p_{i,N}(n) - p_{i_0,N}(n)]} f_i \, d\mu \right| = 0.$$
With Remark 5.13 (5), we have that the polynomial family $(p_{i_0,N} - p_{1,N}, \ldots, p_{i_0,N} - p_{i_0-1,N},\ p_{i_0,N},\ p_{i_0,N} - p_{i_0+1,N}, \ldots, p_{i_0,N} - p_{\ell,N})_N$ is super nice with degree $= \deg(p_{i_0,N} - p_{1,N})$, from which the claim follows.

In case all the $p_{i,N}$'s are of degree $1$, the result follows from Proposition 5.8.

Assume now that $\deg(p_{1,N}) \geq 2$. We will use induction on the type of the polynomial family of $\ell$-tuple of sequences.

For every $N \in \mathbb{N}$ we choose functions $f_{i,N}$ with $\|f_{i,N}\|_\infty \leq 1$ for $i \in \{0, 2, \ldots, \ell\}$, so that the average in (26) is $1/N$ close to the corresponding $\sup_{\|f_0\|_\infty, \|f_2\|_\infty, \ldots, \|f_\ell\|_\infty \leq 1}$. Using the Cauchy-Schwarz inequality, we have that (26) follows if we show
$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \left| \int f_{0,N} \cdot T^{[p_{1,N}(n)]} f_1 \cdot \prod_{i=2}^{\ell} T^{[p_{i,N}(n)]} f_{i,N} \, d\mu \right|^2 = 0. \tag{27}$$

Setting $S = T \times T$, $F_1 = f_1 \otimes \bar{f}_1$, $F_{i,N} = f_{i,N} \otimes \bar{f}_{i,N}$, $i = 0, 2, \ldots, \ell$, and $\tilde{\mu} = \mu \times \mu$, (27) can be rewritten as
$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \int F_{0,N} \cdot S^{[p_{1,N}(n)]} F_1 \cdot \prod_{i=2}^{\ell} S^{[p_{i,N}(n)]} F_{i,N} \, d\tilde{\mu} = 0.$$
So, after using Cauchy-Schwarz, it suffices to show
$$\lim_{N\to\infty} \left\| \frac{1}{N} \sum_{n=1}^{N} S^{[p_{1,N}(n)]} F_1 \cdot \prod_{i=2}^{\ell} S^{[p_{i,N}(n)]} F_{i,N} \right\|_{L^2(\tilde{\mu})} = 0. \tag{28}$$
Using Lemma 5.5, (28) follows if, for large enough $h$, we have
$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \left| \int S^{[p_{1,N}(n+h)]} F_1 \cdot \prod_{i=2}^{\ell} S^{[p_{i,N}(n+h)]} F_{i,N} \cdot S^{[p_{1,N}(n)]} F_1 \cdot \prod_{i=2}^{\ell} S^{[p_{i,N}(n)]} F_{i,N} \, d\tilde{\mu} \right| = 0.$$
Picking $p_{j_0,N}$ as guaranteed by Lemma 5.11 (note that since the degrees of the $p_{i,N}$'s are fixed, the choice of $j_0$ is independent of $N$), factoring out $S^{[p_{j_0,N}(n)]}$, the previous relation becomes
$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \left| \int F_{j_0,N} \cdot S^{[p_{1,N}(n+h) - p_{j_0,N}(n)] + e_1(n,h,N)} F_1 \cdot \prod_{i=2}^{\ell} S^{[p_{i,N}(n+h) - p_{j_0,N}(n)] + e_i(n,h,N)} F_{i,N} \cdot S^{[p_{1,N}(n) - p_{j_0,N}(n)] + \tilde{e}_1(n,N)} F_1 \cdot \prod_{2 \leq i \neq j_0 \leq \ell} S^{[p_{i,N}(n) - p_{j_0,N}(n)] + \tilde{e}_i(n,N)} F_{i,N} \, d\tilde{\mu} \right| = 0, \tag{29}$$
for some error terms $e_i(n,h,N), \tilde{e}_i(n,N) \in \{0,1\}$.

Next, we group the degree $1$ iterates. More specifically, if $\deg(p_{i,N}) = 1$ for some $2 \leq i \leq \ell$ (recall that $\deg(p_{1,N}) \geq 2$), then $p_{i,N}(n+h) = p_{i,N}(n) + c_{i,N} h$, so $[p_{i,N}(n+h)] = [p_{i,N}(n)] + [c_{i,N} h] + e'_i(n,h,N)$, for some error terms in $\{0,1\}$. Hence, in such a case, we have
$$S^{[p_{i,N}(n+h) - p_{j_0,N}(n)] + e_i(n,h,N)} F_{i,N} \cdot S^{[p_{i,N}(n) - p_{j_0,N}(n)] + \tilde{e}_i(n,h,N)} F_{i,N} = S^{[p_{i,N}(n) - p_{j_0,N}(n)]} \left( S^{[c_{i,N} h] + e_i(n,h,N) + e'_i(n,h,N)} F_{i,N} \cdot S^{\tilde{e}_i(n,h,N)} F_{i,N} \right),$$
and we treat this product as one iterate. After this grouping, assuming that the remaining terms in (29) are $r$ many, it suffices to show, for large enough $h$, and every choice of $e_1 \in \{0,1\}$ that
$$\lim_{N\to\infty} \sup_{\|F_0\|_\infty, \|F_2\|_\infty, \ldots, \|F_r\|_\infty \leq 1} \left| \frac{1}{N} \sum_{n=1}^{N} \int F_0 \cdot S^{[p_{1,h,N}(n)]} (S^{e_1} F_1) \cdot \prod_{i=2}^{r} S^{[p_{i,h,N}(n)]} F_i \, d\tilde{\mu} \right| = 0, \tag{30}$$
where the polynomial sequences $(p_{i,h,N})_N$ form the polynomial family $(\mathcal{P}_N(p_{j_0,N}, h))_N$, with $p_{1,h,N}(n) = p_{1,N}(n+h) - p_{j_0,N}(n)$ and $\deg(p_{1,h,N}) = \deg(\mathcal{P}_N(p_{j_0,N}, h))$.

(30) has the same form as (26), and its polynomial family has type strictly less than that of the family in (26) (from Lemma 5.11). Using Remark 5.13 (6), we are done by induction. $\square$

For the expression of Theorem 2.2, using Proposition 5.14, we get the following result:

Proposition 5.15.

For $\ell \in \mathbb{N}$ let $(p_N, 2p_N, \ldots, \ell p_N)_N$ be a super nice sequence of polynomials, $(X, \mathcal{B}, \mu, T)$ a system, and suppose that at least one of the functions $f_1, \ldots, f_\ell \in L^\infty(\mu)$ is orthogonal to the nilfactor $\mathcal{Z}$. Then, we have
$$\lim_{N\to\infty} \left\| \frac{1}{N} \sum_{n=1}^{N} T^{[p_N(n)]} f_1 \cdot T^{2[p_N(n)]} f_2 \cdot \ldots \cdot T^{\ell[p_N(n)]} f_\ell \right\|_2 = 0. \tag{31}$$

Proof.

We can rewrite (31) as
$$\lim_{N\to\infty} \left\| \frac{1}{N} \sum_{n=1}^{N} T^{[p_N(n)]} f_1 \cdot T^{[2p_N(n)] + e_{2,N}(n)} f_2 \cdot \ldots \cdot T^{[\ell p_N(n)] + e_{\ell,N}(n)} f_\ell \right\|_2 = 0,$$
for some bounded error terms $e_{i,N}(n) \in \{-i, \ldots, -1, 0\}$, $2 \leq i \leq \ell$. The result now follows by the proof of Proposition 5.14 as $(p_N, 2p_N, \ldots, \ell p_N)_N$ is a super nice sequence. $\square$

6. Equidistribution

In order to prove our main equidistribution result (Theorem 6.2), we start with some definitions and facts, following [12] (see [12, Subsubsection 2.3.2] for more details).

If $G$ is a nilpotent group, then a sequence $g : \mathbb{N} \to G$ of the form $g(n) = b_1^{p_1(n)} \cdots b_k^{p_k(n)}$, where $b_i \in G$, and $p_i$ are integer polynomials, is called a polynomial sequence in $G$. If the maximum degree of the $p_i$'s is at most $d$ we say that the degree of $g(n)$ is at most $d$.

Given a nilmanifold $X = G/\Gamma$ the horizontal torus is defined to be the compact abelian group $Z = G/([G,G]\Gamma)$. If $X$ is connected, then $Z$ is isomorphic to some finite dimensional torus $\mathbb{T}^s$. A horizontal character $\chi : G \to \mathbb{C}$ is a continuous homomorphism that satisfies $\chi(g\gamma) = \chi(g)$ for every $\gamma \in \Gamma$ and can be thought of as a character of $\mathbb{T}^s$, in which case there exists a unique $\kappa \in \mathbb{Z}^s$ such that $\chi(t\mathbb{Z}^s) = e(\kappa \cdot t)$, where "$\cdot$" denotes the inner product operation, and $e(x) := e^{2\pi i x}$.

Let $p : \mathbb{Z} \to \mathbb{R}$ be a polynomial sequence of degree $d$ of the form $p(n) = \sum_{i=0}^{d} a_i n^i$, where $a_i \in \mathbb{R}$, $0 \leq i \leq d$. Recalling that $\|\cdot\| = d(\cdot, \mathbb{Z})$, we define the smoothness norm
$$\|e(p(n))\|_{C^\infty[N]} := \max_{1 \leq i \leq d} (N^i \|a_i\|). \tag{32}$$

Given $N \in \mathbb{N}$, a finite sequence $(g(n)\Gamma)_{1 \leq n \leq N}$ is said to be $\delta$-equidistributed in $X$, if
$$\left| \frac{1}{N} \sum_{n=1}^{N} F(g(n)\Gamma) - \int_X F \, dm_X \right| \leq \delta \|F\|_{\mathrm{Lip}(X)}$$
for every Lipschitz function $F : X \to \mathbb{C}$, where
$$\|F\|_{\mathrm{Lip}(X)} = \|F\|_\infty + \sup_{x,y \in X,\, x \neq y} \frac{|F(x) - F(y)|}{d_X(x,y)}$$
for some appropriate metric $d_X$.

At this point we quote [12, Theorem 2.9], a direct consequence of [22, Theorem 2.9]:

Theorem 6.1 (Green & Tao, [22]). Let $X = G/\Gamma$ be a nilmanifold with $G$ connected and simply connected, and $d \in \mathbb{N}$. Then for every small enough $\delta > 0$ there exists a positive constant $M \equiv M(X, d, \delta)$ with the following property: For every $N \in \mathbb{N}$, if $g : \mathbb{Z} \to G$ is a polynomial sequence of degree at most $d$ such that the finite sequence $(g(n)\Gamma)_{1 \leq n \leq N}$ is not $\delta$-equidistributed, then for some non-trivial horizontal character $\chi$ with $\|\chi\| \leq M$ we have
$$\|\chi(g(n))\|_{C^\infty[N]} \leq M$$
($\chi$ here is thought of as a character of the horizontal torus $Z = \mathbb{T}^s$ and $g(n)$ as a polynomial sequence in $\mathbb{T}^s$).

Adapting the notion of equidistribution of a sequence in a nilmanifold (recall (17)) to our case, abusing the notation, we say that $(b^{a_N(n)} x)_{1 \leq n \leq N}$, where $(a_N(n))_{1 \leq n \leq N}$ is a variable sequence of real numbers and $X = G/\Gamma$ is a nilmanifold with $G$ connected and simply connected, is equidistributed in a subnilmanifold $Y$ of $X$, if for every $F \in C(X)$ we have
$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} F(b^{a_N(n)} x) = \int F \, dm_Y.$$

In order for us to prove Theorems 2.1 and 2.2, we prove the following equidistribution theorem, which is the main result of this section:

Theorem 6.2.

Let $(p_{1,N}, \ldots, p_{\ell,N})_N$ be a good sequence of $\ell$-tuples of polynomials.

(i) If $X_i = G_i/\Gamma_i$, $1 \leq i \leq \ell$, are nilmanifolds with $G_i$ connected and simply connected, then for every $b_i \in G_i$ and $x_i \in X_i$ the sequence
$$(b_1^{p_{1,N}(n)} x_1, \ldots, b_\ell^{p_{\ell,N}(n)} x_\ell)_{1 \leq n \leq N}$$
is equidistributed in the nilmanifold $\overline{(b_1^s x_1)}_{s \in \mathbb{R}} \times \cdots \times \overline{(b_\ell^s x_\ell)}_{s \in \mathbb{R}}$.

(ii) If $X_i = G_i/\Gamma_i$, $1 \leq i \leq \ell$, are nilmanifolds, then for every $b_i \in G_i$ and $x_i \in X_i$ the sequence
$$(b_1^{[p_{1,N}(n)]} x_1, \ldots, b_\ell^{[p_{\ell,N}(n)]} x_\ell)_{1 \leq n \leq N}$$
is equidistributed in the nilmanifold $\overline{(b_1^n x_1)}_n \times \cdots \times \overline{(b_\ell^n x_\ell)}_n$.

(There is an alternative, and equivalent, definition for the smoothness norm (32) (see [22, Definition 2.7]). We can rewrite $p(n) = \sum_{i=0}^{d} a'_i \binom{n}{i}$ and define $\|e(p(n))\|'_{C^\infty[N]} := \max_{1 \leq i \leq d} (N^i \|a'_i\|)$. By ([16, Section 5]), there are positive constants $c \equiv c(d)$ and $C \equiv C(d)$ with $c \|e(p(n))\|'_{C^\infty[N]} \leq \|e(p(n))\|_{C^\infty[N]} \leq C \|e(p(n))\|'_{C^\infty[N]}$.)

Remark 6.3.

In order to prove Theorem 6.2, we can assume that $X_1 = \ldots = X_\ell = X$. Indeed, in the general case we consider the nilmanifold $\tilde{X} = X_1 \times \cdots \times X_\ell$. Then $\tilde{X} = \tilde{G}/\tilde{\Gamma}$, where $\tilde{G} = G_1 \times \cdots \times G_\ell$ is connected and simply connected and $\tilde{\Gamma} = \Gamma_1 \times \cdots \times \Gamma_\ell$ is a discrete cocompact subgroup of $\tilde{G}$. Each $b_i$ can be considered as an element of $\tilde{G}$ and each $x_i$ as an element of $\tilde{X}$. Changing the base point we can also assume that $x = \Gamma$.

The following lemma, which is analogous to [12, Lemma 5.1], shows that Part (ii) of the previous result follows from Part (i).

Lemma 6.4.

Let $\ell \in \mathbb{N}$ and $(a_{1,N}, \ldots, a_{\ell,N})_N$ be a sequence of $\ell$-tuples of real numbers. Suppose that for every nilmanifold $X = G/\Gamma$, with $G$ connected and simply connected, and every $b_1, \ldots, b_\ell \in G$ the sequence
$$(b_1^{a_{1,N}(n)} \Gamma, \ldots, b_\ell^{a_{\ell,N}(n)} \Gamma)_{1 \leq n \leq N}$$
is equidistributed in the nilmanifold $\overline{(b_1^s \Gamma)}_{s \in \mathbb{R}} \times \cdots \times \overline{(b_\ell^s \Gamma)}_{s \in \mathbb{R}}$. Then, for every nilmanifold $X = G/\Gamma$, $b_1, \ldots, b_\ell \in G$ and $x_1, \ldots, x_\ell \in X$, the sequence
$$(b_1^{[a_{1,N}(n)]} x_1, \ldots, b_\ell^{[a_{\ell,N}(n)]} x_\ell)_{1 \leq n \leq N}$$
is equidistributed in the nilmanifold $\overline{(b_1^n x_1)}_n \times \cdots \times \overline{(b_\ell^n x_\ell)}_n$.

Sketch of the proof.

Following [12, Lemma 4.1], we show the ℓ = 1 case, as the general onefollows by straightforward modiﬁcations.Let X = G/ Γ be a nilmanifold, b ∈ G and x ∈ X. Using some standard reductions(namely, the lifting argument and the change of base point formula), we can and willassume that G is connected and simply connected and that x = Γ . Letting X b := ( b n Γ) n and m X b the corresponding normalized Haar measure, we willshow that for every F ∈ C ( X ) we have(33) lim N →∞ N N X n =1 F ( b [ a N ( n )] Γ) = Z X b F dm X b . Using our assumption for the case ˜ X := ˜ G/ ˜Γ , where ˜ G := R × G is connected and simplyconnected, ˜Γ := Z × Γ and ˜ b := (1 , b ) , for every H ∈ C ( ˜ X ) we have(34) lim N →∞ N N X n =1 H (˜ b a N ( n ) ˜Γ) = Z ˜ X ˜ b H dm ˜ X ˜ b , ULTIPLE ERGODIC AVERAGES FOR VARIABLE POLYNOMIALS 35 where ˜ X ˜ b := ( s Z , b s Γ) s ∈ R and m ˜ X ˜ b the corresponding normalized Haar measure. Let F ∈ C ( X ) , and deﬁne ˜ F : ˜ X → C with ˜ F ( t Z , g Γ) := F ( b −{ t } g Γ) . While ˜ F maybe discontinuous, for every < δ < / there exists ˜ F δ ∈ C ( ˜ X ) that equals to ˜ F on ˜ X δ = I δ × X, where I δ = { t Z : k t k ≥ δ } , and it is uniformly bounded by k F k ∞ . Since ˜ b a N ( n ) = ( a N ( n ) , b a N ( n ) ) , our assumption implies (see also the argument that weused to get Lemma 3.2 from Lemma 3.1) that a N ( n ) Z ∈ I δ , and so ˜ b a N ( n ) ˜Γ ∈ ˜ X δ , for a setof n ’s with density − δ. So, lim sup N →∞ N ∞ X n =1 | ˜ F (˜ b a N ( n ) ˜Γ) − ˜ F δ (˜ b a N ( n ) ˜Γ) | ≤ δ k F k ∞ , hence, since (34) holds for every ˜ F δ , it also holds for ˜ F .
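As a numerical aside on the set $I_\delta$ used above, the following Python sketch estimates the proportion of terms of an equidistributed sequence that fall in $I_\delta = \{t\mathbb{Z} : \|t\| \geq \delta\}$ and compares it with the Lebesgue measure $1 - 2\delta$ of that set. The sequence $n\sqrt{2}$ is only a stand-in chosen for illustration (it is equidistributed mod 1 by Weyl's theorem); it is not one of the variable sequences $a_N(n)$ of the paper, and the function names are hypothetical.

```python
import math

def dist_to_nearest_int(t):
    """||t|| = d(t, Z), the distance from t to the nearest integer."""
    return abs(t - round(t))

def proportion_in_I_delta(seq, delta):
    """Fraction of the terms t of seq with ||t|| >= delta."""
    seq = list(seq)
    return sum(1 for t in seq if dist_to_nearest_int(t) >= delta) / len(seq)

N, delta = 100_000, 0.1
seq = [n * math.sqrt(2) for n in range(1, N + 1)]  # stand-in equidistributed sequence
print(proportion_in_I_delta(seq, delta), 1 - 2 * delta)

# The identity behind the definition of F-tilde: t - {t} = [t],
# so that b^{-{t}} b^{t} = b^{[t]}.
t = 7.3
frac = t - math.floor(t)
print(math.isclose(t - frac, math.floor(t)))
```

The same check with a smaller `delta` shows the excluded proportion shrinking linearly in `delta`, which is what makes the approximation of the discontinuous function by continuous ones harmless in the limit.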

The map ( s Z , g Γ) b −{ s } g Γ sends ˜ X ˜ b onto X b . Deﬁning the measure m on X b as Z X b F dm := Z ˜ X ˜ b F ( b −{ s } g Γ) dm ˜ X ˜ b ( s Z , g Γ) , we have (see [12, Lemma 4.1]) that m = m X b . Thus, since ˜ F (˜ b a N ( n ) ˜Γ) = ˜ F ( a N ( n ) Z , b a N ( n ) Γ) = F ( b −{ a N ( n ) } b a N ( n ) Γ) = F ( b [ a N ( n )] Γ) , using (34) for the function ˜ F , we get lim N →∞ N N X n =1 F ( b [ a N ( n )] Γ) = Z ˜ X ˜ b ˜ F dm ˜ X ˜ b = Z ˜ X ˜ b F ( b −{ s } g Γ) dm ˜ X ˜ b ( s Z , g Γ) = Z X b F dm X b , so we have (33). (cid:3) Recalling that a sequence of ℓ -tuples of variable polynomials ( p ,N , . . . , p ℓ,N ) N is good ifevery non-trivial linear combination of ( p ,N ) N , . . . , ( p ℓ,N ) N is good, we have the following: Lemma 6.5.

Let ( p ,N , . . . , p ℓ,N ) N be a good sequence of ℓ -tuples of polynomials, X i = G i / Γ i nilmanifolds, with G i connected and simply connected, and suppose that b i ∈ G i acts ergodically on X i , ≤ i ≤ ℓ . Then the sequence ( b p ,N ( n )1 Γ , . . . , b p ℓ,N ( n ) ℓ Γ ℓ ) ≤ n ≤ N isequidistributed in X × · · · × X ℓ .Proof. We follow [12, Lemma 5.3]. As the general case is similar, we assume that X = . . . = X ℓ = X. Arguing by contradiction, we will also assume that for some δ > , ( b p ,N ( n )1 Γ , . . . , b p ℓ,N ( n ) ℓ Γ) ≤ n ≤ N is not δ -equidistributed in X ℓ . If p i,N ( t ) = P d i k =0 c i,k,N t k , then b p i,N ( n ) i = b i, ,N · b ni, ,N · · · b n di i,d i ,N , where b i,j,N = b c i,j,N , ≤ j ≤ d i , ≤ i ≤ ℓ, so, for every N ∈ N , ( b p ,N ( n )1 , . . . , b p ℓ,N ( n ) ℓ ) n isa polynomial sequence in G ℓ . Here we adapt the notation z Z which is more convenient than z ( mod . By this we mean lim N →∞ |{ ≤ n ≤ N : a N ( n ) Z ∈ I δ }| N = 1 − δ. Applying Theorem 6.1, we have a constant M ≡ M ( δ, X, d , . . . , d ℓ ) and a horizontalcharacter χ of X ℓ with k χ k ≤ M such that (cid:13)(cid:13)(cid:13) χ ( b p ,N ( n )1 , . . . , b p ℓ,N ( n ) ℓ ) (cid:13)(cid:13)(cid:13) C ∞ [ N ] ≤ M. Let π ( b i ) = ( β i, Z , . . . , β i,s Z ) , ≤ i ≤ ℓ, where β i,j ∈ R , be the projection of b i onthe horizontal torus T s (this s is bounded by the dimension of X ). Using the ergodicityassumption on the b i ’s, for all ≤ i ≤ ℓ, the set { , β i, , . . . , β i,s } consists of rationallyindependent elements. For t ∈ R we have π ( b ti ) = ( t ˜ β i, Z , . . . , t ˜ β i,s Z ) for some ˜ β i,j ∈ R with ˜ β i,j Z = β i,j Z , so, we have that χ ( b p ,N ( n )1 , . . . , b p ℓ,N ( n ) ℓ ) = e  ℓ X i =1 p i,N ( n ) s X j =1 λ i,j ˜ β i,j  for some λ i ∈ Z with | λ i | ≤ M. 
If, for n ∈ N , we set p N ( n ) := ℓ X i =1 p i,N ( n ) s X j =1 λ i,j ˜ β i,j = d X k =0 c k,N n k , we have that the sequence ( p N ) N , being a non-trivial (as χ is non-trivial and ˜ β i,j ’s arerationally independent) linear combination of the ( p i,N ) N ’s, is good. Combining the lastthree relations, we have a contradiction to (7), as M ≥ k e ( p N ( n )) k C ∞ [ N ] ≥ max ≤ j ≤ d (cid:0) N j k c j,N k (cid:1) . So, the sequence ( b p ,N ( n )1 Γ , . . . , b p ℓ,N ( n ) ℓ Γ) ≤ n ≤ N is δ -equidistributed for all δ > , hence itis equidistributed. (cid:3) The last ingredient in proving Part (i) of Theorem 6.2 is the following lemma:

Lemma 6.6 (Lemma 5.2, [12]). Let $X = G/\Gamma$ be a nilmanifold with $G$ connected and simply connected. Then, for every $b_1, \ldots, b_\ell \in G$, there exists an $s_0 \in \mathbb{R}$ such that for all $1 \leq i \leq \ell$ the element $b_i^{s_0}$ acts ergodically on the nilmanifold $\overline{(b_i^s \Gamma)}_{s \in \mathbb{R}}$.

We are now ready to prove Theorem 6.2.

Proof of Theorem 6.2.

Using Lemma 6.4 we see that Part (ii) of Theorem 6.2 follows from Part (i). To establish Part (i) let $b_1, \ldots, b_\ell \in G$. By Lemma 6.6 there exists a non-zero $s_0 \in \mathbb{R}$ such that for every $1 \leq i \leq \ell$ the element $b_i^{s_0}$ acts ergodically on the nilmanifold $\overline{(b_i^s \Gamma)}_{s \in \mathbb{R}}$. Using Lemma 6.5 for the elements $b_i^{s_0}$ and the polynomials $p_{i,N}/s_0$ (which are still forming a good sequence of $\ell$-tuples of polynomials) we get that the sequence
$$(b_1^{p_{1,N}(n)} \Gamma, \ldots, b_\ell^{p_{\ell,N}(n)} \Gamma)_{1 \leq n \leq N}$$
is equidistributed in the nilmanifold $\overline{(b_1^s \Gamma)}_{s \in \mathbb{R}} \times \cdots \times \overline{(b_\ell^s \Gamma)}_{s \in \mathbb{R}}$, hence we get the conclusion. $\square$

(Note here, regarding the proof of Lemma 6.5, that, for all $1 \leq i \leq \ell$, the $\tilde{\beta}_{i,j}$'s are also rationally independent.)

7. From natural to prime numbers

In this section we will provide the steps we need to follow in order to study averages along prime numbers. To do so, we follow [17], [18] and [29].

To state the first lemma which compares averages along primes to averages along natural numbers, we need to recall some standard notation: We start with the definition of the von Mangoldt function, $\Lambda : \mathbb{N} \to \mathbb{R}$, where
$$\Lambda(n) = \begin{cases} \log(p), & \text{if } n = p^k \text{ for some } p \in \mathbb{P} \text{ and some } k \in \mathbb{N}, \\ 0, & \text{elsewhere.} \end{cases}$$
Instead of $\Lambda$, it is more convenient for us to deal with the function $\Lambda' : \mathbb{N} \to \mathbb{R}$, where
$$\Lambda'(n) = \mathbf{1}_{\mathbb{P}}(n) \cdot \Lambda(n) = \mathbf{1}_{\mathbb{P}}(n) \cdot \log(n).$$

The argument of the following lemma, using the fact that the sequence is bounded, is exactly the same as the one in [17, Lemma 1].
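As a quick numerical illustration of the comparison between averages along primes and $\Lambda'$-weighted averages along the naturals, the following Python sketch implements $\Lambda$ directly from its definition and compares $\frac{1}{\pi(N)}\sum_{p \leq N} a(p)$ with $\frac{1}{N}\sum_{n \leq N} \Lambda'(n) a(n)$ for a bounded test sequence. The function names and the test sequences are assumptions made for the example; this is a sanity check at one value of $N$, not a proof, and the difference is only expected to be small, not zero.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def von_mangoldt(n):
    """Lambda(n) = log p if n = p^k for a prime p and k >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:               # p is the smallest prime factor of n
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)               # no factor up to sqrt(n): n is prime

def compare_averages(a, N):
    """Return (average of a over primes <= N, (1/N) * sum of Lambda'(n) a(n) for n <= N)."""
    primes = primes_up_to(N)
    prime_avg = sum(a(p) for p in primes) / len(primes)
    # Lambda'(n) = log(n) on primes and 0 elsewhere, so only primes contribute.
    weighted_avg = sum(math.log(p) * a(p) for p in primes) / N
    return prime_avg, weighted_avg

for a in (lambda n: 1.0, lambda n: (-1.0) ** n):
    prime_avg, weighted_avg = compare_averages(a, 100_000)
    print(prime_avg, weighted_avg, abs(prime_avg - weighted_avg))
```

The gap between the two averages decays slowly in $N$, which is consistent with an $o_N(1)$ statement rather than a quantitative rate.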

Lemma 7.1 (Lemma 1, [17]). If $(a_N(n))_{N,n} \subseteq \mathbb{C}$ is a bounded sequence, then
$$\left| \frac{1}{\pi(N)} \sum_{p \in \mathbb{P} \cap [1,N]} a_N(p) - \frac{1}{N} \sum_{n=1}^{N} \Lambda'(n) \cdot a_N(n) \right| = o_N(1).$$

Next, we recall the definition of Gowers norms.

Definition 7.2. If $a : \mathbb{Z}_N \to \mathbb{C}$, we inductively define:
$$\|a\|_{U^1(\mathbb{Z}_N)} = \left| \mathbb{E}_{n \in \mathbb{Z}_N} a(n) \right|; \quad \text{and} \quad \|a\|_{U^{d+1}(\mathbb{Z}_N)} = \left( \mathbb{E}_{h \in \mathbb{Z}_N} \|a_h \cdot \bar{a}\|^{2^d}_{U^d(\mathbb{Z}_N)} \right)^{1/2^{d+1}},$$
where $a_h(n) = a(n+h)$.

As Gowers showed in [21], $\|\cdot\|_{U^d(\mathbb{Z}_N)}$ defines a norm on $\mathbb{Z}_N$ for $d \geq 2$.

In what follows, we will use the, standard by now, "W-trick". For $w > 2$, let
$$W = \prod_{p \in \mathbb{P} \cap [1, w-1]} p$$
be the product of primes bounded above by $w$. For $r \in \mathbb{N}$, let
$$\Lambda'_{w,r}(n) = \frac{\phi(W)}{W} \cdot \Lambda'(Wn + r),$$
where $\phi$ is the Euler function, be the modified von Mangoldt function.

The next result shows the Gowers uniformity of the modified von Mangoldt function and can be derived by [23], [24] and [25]. We present here a formulation of it from [18].

Theorem 7.3 (Theorem 7.2, [23]). For every $d \in \mathbb{N}$ we have that
$$\lim_{w\to\infty} \left( \lim_{N\to\infty} \left( \max_{1 \leq r \leq W,\ (r,W)=1} \left\| (\Lambda'_{w,r} - 1) \cdot \mathbf{1}_{[1,N]} \right\|_{U^d(\mathbb{Z}_{dN})} \right) \right) = 0.$$

(Regarding $\Lambda'$: the difference of the averages $\frac{1}{N} \sum_{n=1}^{N} (\Lambda(n) - \Lambda'(n))$ is of the order of $1/\sqrt{N}$. In Lemma 7.1, by $o_N(1)$ we mean that the quantity goes to $0$ as $N \to \infty$.)

The following uniformity estimate, which can be shown by iterating a variant of van der Corput's lemma, is an important step towards our results.
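To make the inductive definition of the Gowers norms concrete, the following Python sketch evaluates $\|a\|_{U^d(\mathbb{Z}_N)}$ by direct recursion for a function given as a list of $N$ complex values. The function name `gowers_norm` and the test functions are assumptions made for the example, and the recursion costs on the order of $N^d$ operations, so it is only meant for very small $N$. For $N = 7$ it illustrates three standard facts: a non-trivial character has $U^1$ norm $0$ but $U^2$ norm $1$, and the quadratic phase $e(n^2/N)$ has $U^2$ norm $N^{-1/4}$ but $U^3$ norm $1$.

```python
import cmath

def gowers_norm(a, d):
    """U^d(Z_N) norm of a list a of N complex numbers, via the inductive definition."""
    N = len(a)
    if d == 1:
        return abs(sum(a) / N)
    total = 0.0
    for h in range(N):
        # multiplicative derivative: n -> a(n + h) * conj(a(n)), indices mod N
        shifted = [a[(n + h) % N] * a[n].conjugate() for n in range(N)]
        total += gowers_norm(shifted, d - 1) ** (2 ** (d - 1))
    return (total / N) ** (1.0 / 2 ** d)

def e(x):
    """e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * cmath.pi * x)

N = 7
linear = [e(n / N) for n in range(N)]         # a non-trivial character of Z_N
quadratic = [e(n * n / N) for n in range(N)]  # a quadratic phase

print(gowers_norm(linear, 1), gowers_norm(linear, 2))
print(gowers_norm(quadratic, 2), N ** -0.25, gowers_norm(quadratic, 3))
```

The pattern (phases of degree $k$ are invisible to $U^k$ but fully detected by $U^{k+1}$) is the reason a uniformity norm of sufficiently high degree is needed to control averages with polynomial iterates.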

Lemma 7.4 (Lemma 3.5, [18]). Let $\ell, m\in\mathbb{N}$, $(X,\mathcal{B},\mu,T_1,\ldots,T_\ell)$ be a system, $q_{i,j}\in\mathbb{Z}[t]$ polynomials, $1\le i\le\ell$, $1\le j\le m$, $f_1,\ldots,f_m\in L^\infty(\mu)$ and $a\colon\mathbb{N}\to\mathbb{C}$ be a sequence satisfying $a(n)/n^c\to 0$ for every $c>0$. Then there exists $d\in\mathbb{N}$, depending only on the maximum degree of the polynomials $q_{i,j}$ and the integers $\ell, m$, and a constant $C_d$ depending on $d$, such that
\[
\left\| \frac{1}{N}\sum_{n=1}^{N} a(n)\cdot\prod_{j=1}^{m}\Big(\prod_{i=1}^{\ell} T_i^{q_{i,j}(n)}\Big) f_j \right\|_{L^2(\mu)} \le C_d\left( \left\| a\cdot 1_{[1,N]} \right\|_{U_d(\mathbb{Z}_{dN})} + o_N(1) \right).
\]
Furthermore, the constant $C_d$ is independent of the sequence $(a(n))_n$ and the $o_N(1)$ term depends only on the integer $d$ and on the sequence $(a(n))_n$.

The crucial fact of the previous result for our study is that, because of the dependence of the constants, it can also be used for variable polynomials of bounded degree, i.e., the ones that we deal with, of the form
\[
q_N(n) = a_{d,N}n^d + \ldots + a_{1,N}n + a_{0,N}.
\]

Let $r\in\mathbb{N}$ and $(X,\mathcal{B},\mu)$ be a probability space. We call a jointly measurable family $(T_t)_{t\in\mathbb{R}^r}$ of measure preserving transformations $T_t\colon X\to X$ a measure preserving flow, if for all $s,t\in\mathbb{R}^r$ we have $T_{s+t} = T_s\circ T_t$.

We will show an estimate similar to [29, Theorem 3.1], via the use of a multidimensional variant of the special flow above a system under the constant ceiling function $1$:

Theorem 7.5. For $\ell, m\in\mathbb{N}$, let $(X,\mathcal{B},\mu,T_1,\ldots,T_\ell)$ be a system, $(q_{i,j,N})_N\subseteq\mathbb{R}[t]$ good polynomial sequences, $1\le i\le\ell$, $1\le j\le m$, and $f_1,\ldots,f_m\in L^\infty(\mu)$. For $N\in\mathbb{N}$, for the functions
\[
b_N(n) := \Big(\prod_{i=1}^{\ell} T_i^{[q_{i,1,N}(n)]}\Big) f_1\cdot\ldots\cdot\Big(\prod_{i=1}^{\ell} T_i^{[q_{i,m,N}(n)]}\Big) f_m
\]
there exists $d\in\mathbb{N}$, depending only on the maximum degree of the polynomials $q_{i,j,N}$ and the integers $\ell$ and $m$, such that for every $0<\delta<1$ there exists a constant $C_{d,\delta}$ depending on $d$ and $\delta$, such that
\[
\left\| \frac{1}{N}\sum_{n=1}^{N} (\Lambda'_{w,r}(n)-1)\, b_N(n) \right\|_{L^2(\mu)} \le C_{d,\delta}\left( \left\| (\Lambda'_{w,r}-1)\cdot 1_{[1,N]} \right\|_{U_d(\mathbb{Z}_{dN})} + o_N(1) \right) + c_\delta\,(1+o_{N\to\infty;w}(1)),
\]
for all $r\in\mathbb{N}$, where $c_\delta\to 0$ as $\delta\to 0^+$ and the term $o_N(1)$ depends on the integer $d$.

Proof.

We follow [29, Theorem 3.1]. Let $0<\delta<1$ and $w, r\in\mathbb{N}$. For the given transformations on $X$, we define the $\mathbb{R}^{\ell m}$-action $\prod_{i=1}^{\ell} T_{i,s_{i,1}}\cdot\ldots\cdot\prod_{i=1}^{\ell} T_{i,s_{i,m}}$ on the probability space $Y := X\times[0,1)^{\ell m}$, endowed with the measure $\nu := \mu\times\lambda^{\ell m}$ ($\lambda$ is the Lebesgue measure on $[0,1)$), by
\[
\prod_{j=1}^{m}\prod_{i=1}^{\ell} T_{i,s_{i,j}}(x, a_{1,1},\ldots,a_{\ell,1}, a_{1,2},\ldots,a_{\ell,2},\ldots,a_{1,m},\ldots,a_{\ell,m}) =
\]
\[
\Big( \prod_{j=1}^{m}\prod_{i=1}^{\ell} T_i^{[s_{i,j}+a_{i,j}]}x,\ \{s_{1,1}+a_{1,1}\},\ldots,\{s_{\ell,1}+a_{\ell,1}\},\ldots,\{s_{1,m}+a_{1,m}\},\ldots,\{s_{\ell,m}+a_{\ell,m}\} \Big).
\]
One can routinely verify that the above action defines a measure preserving flow on the product probability space $Y$.

If $f_1,\ldots,f_m$ are bounded functions on $X$, we define the $Y$-extensions of $f_j$, setting for every element $(a_{1,1},\ldots,a_{\ell,1},a_{1,2},\ldots,a_{\ell,2},\ldots,a_{1,m},\ldots,a_{\ell,m})\in[0,1)^{\ell m}$:
\[
\hat f_j(x, a_{1,1},\ldots,a_{\ell,1},a_{1,2},\ldots,a_{\ell,2},\ldots,a_{1,m},\ldots,a_{\ell,m}) = f_j(x),\quad 1\le j\le m;
\]
and
\[
\hat f_0(x, a_{1,1},\ldots,a_{\ell,1},a_{1,2},\ldots,a_{\ell,m}) = 1_{[0,\delta]^{\ell m}}(a_{1,1},\ldots,a_{\ell,1},a_{1,2},\ldots,a_{\ell,m}).
\]
For every $1\le n\le N$, let
\[
\tilde b_N(n) := \hat f_0\cdot\Big(\prod_{j=1}^{m}\prod_{i=1}^{\ell} T_{i,\delta_{j1}\cdot q_{i,1,N}(n)}\Big)\hat f_1\cdot\ldots\cdot\Big(\prod_{j=1}^{m}\prod_{i=1}^{\ell} T_{i,\delta_{jm}\cdot q_{i,m,N}(n)}\Big)\hat f_m;
\]
and for every $x\in X$
\[
b'_N(n)(x) := \int_{[0,1)^{\ell m}} \tilde b_N(n)(x, a_{1,1},\ldots,a_{\ell,1},a_{1,2},\ldots,a_{\ell,2},\ldots,a_{1,m},\ldots,a_{\ell,m})\, d\lambda^{\ell m},
\]
where the integration is with respect to the variables $a_{i,j}$.

By using the triangle and the Cauchy-Schwarz inequality, if $a(n) = \Lambda'_{w,r}(n)-1$, we have
\[
\delta^{\ell m}\left\| \frac{1}{N}\sum_{n=1}^{N} a(n)\, b_N(n) \right\|_{L^2(\mu)} \le \left\| \frac{1}{N}\sum_{n=1}^{N} a(n)\cdot(\delta^{\ell m} b_N(n) - b'_N(n)) \right\|_{L^2(\mu)} + \left\| \frac{1}{N}\sum_{n=1}^{N} a(n)\,\tilde b_N(n) \right\|_{L^2(\nu)}.
\]
Using Lemma 7.4, we find an integer $d\in\mathbb{N}$, depending only on the maximum degree of the polynomials $q_{i,j,N}$ and the integers $\ell, m$, and a constant $C_d$ depending on $d$, such that
\[
\left\| \frac{1}{N}\sum_{n=1}^{N} a(n)\,\tilde b_N(n) \right\|_{L^2(\nu)} \le C_d\left( \left\| a\cdot 1_{[1,N]} \right\|_{U_d(\mathbb{Z}_{dN})} + o_N(1) \right),
\]
where the $o_N(1)$ term depends only on the integer $d$ and the sequence $(a(n))_n$.

Next we study the term $\left\| \frac{1}{N}\sum_{n=1}^{N} a(n)\cdot(\delta^{\ell m} b_N(n) - b'_N(n)) \right\|_{L^2(\mu)}$. For every $x\in X$ and $1\le n\le N$, we have
\[
\left| \delta^{\ell m} b_N(n)(x) - b'_N(n)(x) \right| = \left| \int_{[0,\delta]^{\ell m}} \Big( \prod_{j=1}^{m} f_j\Big(\prod_{i=1}^{\ell} T_i^{[q_{i,j,N}(n)]}x\Big) - \prod_{j=1}^{m} f_j\Big(\prod_{i=1}^{\ell} T_i^{[q_{i,j,N}(n)+a_{i,j}]}x\Big) \Big)\, d\lambda^{\ell m} \right|.
\]
Since all the relevant $a_{i,j}$ in the integrand are less than or equal to $\delta$, if $\{q_{i,j,N}(n)\}<1-\delta$, we have $T_i^{[q_{i,j,N}(n)+a_{i,j}]} = T_i^{[q_{i,j,N}(n)]}$ for all $1\le i\le\ell$, $1\le j\le m$. We deal with the case where $\{q_{i,j,N}(n)\}\ge 1-\delta$. For every $1\le i\le\ell$, $1\le j\le m$, let
\[
E^{i,j}_{\delta,N} := \{ 1\le n\le N :\ \{q_{i,j,N}(n)\}\in[1-\delta,1) \}.
\]
Then, by using the fact that
\[
1_{E^{1,1}_{\delta,N}\cup\ldots\cup E^{1,m}_{\delta,N}\cup E^{2,1}_{\delta,N}\cup\ldots\cup E^{\ell,m}_{\delta,N}} \le \sum_{(i,j)\in[1,\ell]\times[1,m]} 1_{E^{i,j}_{\delta,N}}
\]
and that $1_{E^{i,j}_{\delta,N}}(n) = 1_{[1-\delta,1)}(\{q_{i,j,N}(n)\})$, for every $x\in X$ we have (assuming, as we may, that $\|f_j\|_\infty\le 1$ for all $j$)
\[
\left| \delta^{\ell m} b_N(n)(x) - b'_N(n)(x) \right| \le 2\delta^{\ell m}\cdot\sum_{(i,j)\in[1,\ell]\times[1,m]} 1_{[1-\delta,1)}(\{q_{i,j,N}(n)\}).
\]
So, recalling that $a(n) = \Lambda'_{w,r}(n)-1$,
\[
\frac{1}{N}\sum_{n=1}^{N} |a(n)|\cdot 1_{[1-\delta,1)}(\{q_{i,j,N}(n)\}) \le \frac{1}{N}\sum_{n=1}^{N} \Lambda'_{w,r}(n)\cdot 1_{[1-\delta,1)}(\{q_{i,j,N}(n)\}) + \frac{|E^{i,j}_{\delta,N}|}{N}.
\]
Since each $(q_{i,j,N})_N$ is good, for large $N$ and small enough $\delta$, the term $\frac{|E^{i,j}_{\delta,N}|}{N}$ (and the sum of finitely many terms of this form) is as small as we want.

It remains to show that the term $\frac{1}{N}\sum_{n=1}^{N}\Lambda'_{w,r}(n)\cdot 1_{[1-\delta,1)}(\{q_{i,j,N}(n)\})$ goes to zero as $N\to\infty$, then $w\to\infty$ and finally $\delta\to 0^+$. To this end, it suffices to show that $\frac{1}{N}\sum_{n=1}^{N}\Lambda'_{w,r}(n)\, e^{2\pi i k q_{i,j,N}(n)}\to 0$ as $N\to\infty$ and then $w\to\infty$ for all $k\in\mathbb{Z}\setminus\{0\}$. Writing
\[
\frac{1}{N}\sum_{n=1}^{N}\Lambda'_{w,r}(n)\, e^{2\pi i k q_{i,j,N}(n)} = \frac{1}{N}\sum_{n=1}^{N}(\Lambda'_{w,r}(n)-1)\, e^{2\pi i k q_{i,j,N}(n)} + \frac{1}{N}\sum_{n=1}^{N} e^{2\pi i k q_{i,j,N}(n)},
\]
we have that the second term goes to $0$ as $N\to\infty$ by (6). The first term goes to zero as $N\to\infty$ and then $w\to\infty$ as in [18]. The result now follows. $\square$

Footnote: By a system $(X,\mathcal{B},\mu,T_1,\ldots,T_\ell)$ we mean that $T_1,\ldots,T_\ell\colon X\to X$ are invertible, commuting, measure preserving transformations on the probability space $(X,\mathcal{B},\mu)$.

Footnote: The quantity $o_{N\to\infty;w}(1)$ goes to $0$ as $N\to\infty$ and then $w\to\infty$.
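The estimates above compare the weight $\Lambda'_{w,r}$ with the constant $1$. As a purely numerical illustration of why this normalisation is natural, the following sketch computes the average of $\Lambda'_{w,r}(n)$ over $1\le n\le N$, which should be close to $1$ when $(r,W)=1$. The helper names `prime_flags` and `w_trick_average` are hypothetical, and the parameters $w=6$ (so $W=30$) and $N=20000$ are arbitrary small choices, not values from the text.

```python
from math import log


def prime_flags(n):
    """Sieve of Eratosthenes: flags[k] == 1 exactly when k is prime (n >= 2)."""
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return flags


def w_trick_average(w, r, N):
    """(1/N) * sum_{n=1}^{N} Lambda'_{w,r}(n), where
    Lambda'_{w,r}(n) = (phi(W)/W) * Lambda'(W*n + r) and
    W is the product of the primes in [1, w-1] (assumes w >= 3).
    """
    small_flags = prime_flags(w - 1)
    W, phi_W = 1, 1
    for p in range(2, w):
        if small_flags[p]:
            W *= p
            phi_W *= p - 1  # Euler's function is multiplicative; W is squarefree
    flags = prime_flags(W * N + r)
    total = sum(
        log(W * n + r) for n in range(1, N + 1) if flags[W * n + r]
    )
    return (phi_W / W) * total / N


if __name__ == "__main__":
    for r in (1, 7, 11, 13, 17, 19, 23, 29):
        print(r, w_trick_average(6, r, 20000))
```

The factor $\phi(W)/W$ compensates for the fact that the primes are spread over only $\phi(W)$ of the $W$ residue classes modulo $W$; this heuristic is the prime number theorem in arithmetic progressions, and it says nothing about the much stronger Gowers uniformity of Theorem 7.3.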

Footnote: The reduction to exponential sums follows by the fact that if $f$ is Riemann integrable on $[0,1)$ with $\int_{[0,1)} f(x)\,dx = c$, then, for every $\varepsilon>0$, we can find trigonometric polynomials $q_1, q_2$, with no constant terms, with $q_1(t)+c-\varepsilon\le f(t)\le q_2(t)+c+\varepsilon$. We use this for the function $f = 1_{[1-\delta,1)}$.

Footnote: For the first term, one can imitate the proof of [18, Lemma 3.5], since, after finitely many iterates of the van der Corput lemma, the polynomial $q_{i,j,N}$ becomes constant.

The following implication of Theorems 7.5 and 7.3 is the main result that we will use to obtain the convergence and recurrence results along primes, stated in Section 2.

Proposition 7.6. For $\ell, m\in\mathbb{N}$, let $(X,\mathcal{B},\mu,T_1,\ldots,T_\ell)$ be a system, $(p_{i,j,N})_N$ polynomials such that $(p_{W,r,i,j,N})_N$ is good for every $W\in\mathbb{N}$, $1\le r\le W$, where $p_{W,r,i,j,N}(n) = p_{i,j,N}(Wn+r)$, $1\le i\le\ell$, $1\le j\le m$, and $f_1,\ldots,f_m\in L^\infty(\mu)$. Then,
\[
\max_{1\le r\le W,\ (r,W)=1} \left\| \frac{1}{N}\sum_{n=1}^{N}(\Lambda'_{w,r}(n)-1)\cdot\prod_{j=1}^{m}\Big(\prod_{i=1}^{\ell} T_i^{[p_{i,j,N}(Wn+r)]}\Big) f_j \right\|_{L^2(\mu)} = o_{N\to\infty;w}(1).
\]

Proof. Using Theorem 7.5 for the polynomials $p_{W,r,i,j,N}$, we get that for every $0<\delta<1$ there exist $d\in\mathbb{N}$, depending only on the maximum degree of the polynomials $p_{i,j,N}$ and the integers $\ell$ and $m$, and a constant $C_{d,\delta}$ depending on $d$ and $\delta$, such that
\[
\max_{1\le r\le W,\ (r,W)=1} \left\| \frac{1}{N}\sum_{n=1}^{N}(\Lambda'_{w,r}(n)-1)\cdot\prod_{j=1}^{m}\Big(\prod_{i=1}^{\ell} T_i^{[p_{i,j,N}(Wn+r)]}\Big) f_j \right\|_{L^2(\mu)}
\]
\[
\le C_{d,\delta}\left( \max_{1\le r\le W,\ (r,W)=1} \left\| (\Lambda'_{w,r}-1)\cdot 1_{[1,N]} \right\|_{U_d(\mathbb{Z}_{dN})} + o_N(1) \right) + c_\delta\,(1+o_{N\to\infty;w}(1)),
\]
where $c_\delta\to 0$ as $\delta\to 0^+$. Taking first $N\to\infty$ and then $w\to\infty$ in this expression, by Theorem 7.3, we have that the required limit is bounded above by $c_\delta$. Taking $\delta\to 0^+$, we get the result. $\square$

8. Proof of main results

In this final section we prove our main results.

8.1. Averages along natural numbers. We will show Theorems 2.1 and 2.2. First we show that the polynomial sequences from Theorems 1.4 and 1.5 are good and super nice.

We remind the reader at this point that every Hardy field function is eventually monotone and that limits of ratios of such functions always exist (in the extended real line). Recall that we write $g_1\prec g_2$ if $|g_2(x)|/|g_1(x)|\to\infty$ as $x\to\infty$. We also introduce some new notation: we write $g_1\sim g_2$ in case $|g_2(x)|/|g_1(x)|$ converges to a non-zero real number as $x\to\infty$, and $g_1\lesssim g_2$ if either $g_1\prec g_2$ or $g_1\sim g_2$.

Lemma 8.1.

The polynomial sequences from Theorems 1.4 and 1.5 are good.

Proof. Let $\lambda_1 p_{1,N}+\ldots+\lambda_\ell p_{\ell,N}$ be a non-trivial linear combination of strongly independent variable polynomials as in (8). In case this combination is a polynomial of degree $1$, factoring its constant term out, without loss of generality, we can assume that it is of the form $h(N)n$, where $h(N)\sim 1/g(N)$, with $1\prec g(N)\prec N$ (hence $h(N)\to 0$ and $|h(N)|N\to\infty$ monotonically as $N\to\infty$). For any $\alpha\ne 0$, we have that
\[
\frac{1}{N}\sum_{n=0}^{N-1} e^{i\alpha h(N)n} = \frac{1}{N}\cdot\frac{1-e^{i\alpha h(N)N}}{1-e^{i\alpha h(N)}} = \frac{h(N)}{1-e^{i\alpha h(N)}}\cdot\frac{1-e^{i\alpha h(N)N}}{h(N)N} \to \frac{i}{\alpha}\cdot 0 = 0,
\]
as $N\to\infty$.

In case the combination is a polynomial of degree $d$, after using Lemma 5.5 $(d-1)$ times, we get a polynomial of degree $1$, hence the result follows from the previous step. $\square$

Recall that when we want to check that a $k$-tuple, for $k>1$, has the $R_k$-property, we have to check that for every $1\le i\le k$ the corresponding (according to Definition 5.6) $(k-1)$-tuple has the $R_{k-1}$-property. If a $(k-1)$-tuple corresponds to the index $i_0$, we say that it is descending from the $i_0$ term of the previous step.

Lemma 8.2.

The polynomial sequences from Theorems 1.4 and 1.5 are super nice.

Proof. For a single polynomial sequence as in (8), the result follows immediately from Remark 5.13 (4) and the properties of Hardy field functions.

For multiple sequences, (i) follows by the form (8) that the variable polynomial sequences have. As $(P_N)_N$ and $(P'_N)_N$ consist of polynomials of the same form, (ii) and (ii)$'$ will both follow by the same argument.

Assuming that we have $k$ many essentially distinct terms of the form $a_{i,N}n$, $1\le i\le k$, we have $a_{i,\cdot}\in\mathcal{C}$ and $a_{k,N}\lesssim\ldots\lesssim a_{2,N}\lesssim a_{1,N}$. We present an algorithmic way of finding the coefficients in order to check that they satisfy the $R_k$-property:

Step 1: For $i=1$ we pick $j_0=k$ (i.e., the largest index). In this case we will show that we have (ii) (a) (of Definition 5.6). The coefficients become:
\[
\frac{a_{k,N}-a_{j,N}}{a_{1,N}} \sim \frac{a_{j,N}}{a_{1,N}},\quad 1\le j\le k-1.
\]
For $i>1$, we pick $j_0=1$ (i.e., the smallest index). In this case we will show that we have (ii) (b). The coefficients become:
\[
\frac{a_{j,N}}{a_{i,N}-a_{1,N}} \sim \frac{a_{j,N}}{a_{1,N}},\quad 1\le j\le k.
\]

Step $\lambda$: After we order them from largest to smallest growth, we denote the $j$-th term at the $\lambda$-th step with $a_{\lambda,j,N}$. We have two cases:

• The sequence of coefficients is descending from the $i_0=1$ term of the $(\lambda-1)$-th step.

For $i=1$ we pick $j_0=k-\lambda+1$ and show (ii) (a) (always for the $i=1$ case we pick the largest index $j_0$ and show (ii) (a)). For $1\le j\le j_0-1$, we have
\[
\frac{a_{\lambda,j_0,N}-a_{\lambda,j,N}}{a_{\lambda,1,N}} = \frac{a_{\lambda-1,j,N}-a_{\lambda-1,j_0,N}}{a_{\lambda-1,j_0+1,N}-a_{\lambda-1,1,N}} \sim \frac{a_{\lambda-1,j,N}}{a_{\lambda-1,1,N}},
\]
where the numerator comes from the difference $(a_{\lambda-1,j_0+1,N}-a_{\lambda-1,j_0,N})-(a_{\lambda-1,j_0+1,N}-a_{\lambda-1,j,N})$, and the (common) denominators are canceled.

For $i>1$ we pick $j_0=1$ and show (ii) (b) (always for the $i>1$ case we pick $j_0=1$ and show (ii) (b)). For $1\le j\le k-\lambda+1$ we have
\[
\frac{a_{\lambda,j,N}}{a_{\lambda,i,N}-a_{\lambda,1,N}} = \frac{a_{\lambda-1,k-\lambda+2,N}-a_{\lambda-1,j,N}}{a_{\lambda-1,1,N}-a_{\lambda-1,i,N}} \sim \frac{a_{\lambda-1,j,N}}{a_{\lambda-1,1,N}},
\]
where the denominator comes from the difference $(a_{\lambda-1,k-\lambda+2,N}-a_{\lambda-1,i,N})-(a_{\lambda-1,k-\lambda+2,N}-a_{\lambda-1,1,N})$, and, as in the previous case, the (common) denominators are canceled.

• The sequence of coefficients is descending from the $i_0>1$ term of the $(\lambda-1)$-th step.

For $i=1$, we choose $j_0=k-\lambda+1$. For all $1\le j\le j_0-1$ we have:
\[
\frac{a_{\lambda,j_0,N}-a_{\lambda,j,N}}{a_{\lambda,1,N}} = \frac{a_{\lambda-1,j_0+1,N}-a_{\lambda-1,j+1,N}}{a_{\lambda-1,1,N}} \sim \frac{a_{\lambda-1,j+1,N}}{a_{\lambda-1,1,N}}.
\]
For $i>1$, we choose $j_0=1$. For all $1\le j\le k-\lambda+1$, we have
\[
\frac{a_{\lambda,j,N}}{a_{\lambda,i,N}-a_{\lambda,1,N}} = \frac{a_{\lambda-1,j+1,N}}{a_{\lambda-1,i+1,N}-a_{\lambda-1,1,N}} \sim \frac{a_{\lambda-1,j+1,N}}{a_{\lambda-1,1,N}}.
\]

Note that each of the aforementioned terms, at each step, is (up to a sign) of the form
\[
\frac{a_{i,N}-a_{j,N}}{a_{s,N}-a_{t,N}},\ s>i>j\ge t,\quad\text{or}\quad \frac{a_{i,N}-a_{j,N}}{a_{t,N}},\ i>j\ge t,\quad\text{or}\quad \frac{a_{j,N}}{a_{i,N}-a_{t,N}},\ t<\min\{i,j\},
\]
i.e., combinations of terms from the initial sequence (because of the cancellations that we mentioned before) which are all $\sim$ to $a_{j,N}/a_{t,N}$. The claim now follows by the properties of elements from $\mathcal{LE}$, as each coefficient is a logarithmico-exponential Hardy function, hence eventually monotone, which is either $\sim 1$ or $\sim 1/g(N)$ with $1\prec g(N)\prec N$ by the construction. $\square$

Footnote: The ordering above is possible because vdC-operations preserve the essential distinctness property of the polynomials and at each step the coefficient functions belong to $\mathcal{C}$.

We now prove Theorem 2.1 (which of course implies Theorem 1.4):

Proof of Theorem 2.1. We start by using Proposition 5.14 in order to get that the nilfactor $\mathcal{Z}$ is characteristic for the multiple average in (9). Via Theorem 4.1 we can assume without loss of generality that our system is an inverse limit of nilsystems. By a standard approximation argument, we can further assume that our system is a nilsystem.

Let $(X = G/\Gamma, \mathcal{G}/\Gamma, m_X, T_b)$ be a nilsystem, where $b\in G$ is ergodic, and $F_1,\ldots,F_\ell\in L^\infty(m_X)$. Our objective now is to show that if $(p_{1,N},\ldots,p_{\ell,N})_N$ is a super nice sequence of $\ell$-tuples of polynomials, then
\[
\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} F_1(b^{[p_{1,N}(n)]}x)\cdot\ldots\cdot F_\ell(b^{[p_{\ell,N}(n)]}x) = \int F_1\, dm_X\cdot\ldots\cdot\int F_\ell\, dm_X, \tag{35}
\]
where the convergence takes place in $L^2(m_X)$. By density, we can assume that the functions $F_1,\ldots,F_\ell$ are continuous. Then, applying Theorem 6.2 to the nilmanifold $X^\ell$, the nilrotation $\tilde b = (b,\ldots,b)\in G^\ell$, the point $\tilde x = (x,\ldots,x)\in X^\ell$, and the continuous function $\tilde F(x_1,\ldots,x_\ell) = F_1(x_1)\cdot\ldots\cdot F_\ell(x_\ell)$, we get that
\[
\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\tilde F(b^{[p_{1,N}(n)]}x,\ldots,b^{[p_{\ell,N}(n)]}x) = \int\tilde F\, dm_{X^\ell};
\]
which gives the desired limit in (35), completing the proof. $\square$

Next, we show Theorem 2.2 (which in turn implies Theorem 1.5):

Proof of Theorem 2.2. From Proposition 5.15 the nilfactor $\mathcal{Z}$ is characteristic for the multiple ergodic averages in (10). Via Theorem 4.1, using the ergodic decomposition, we can assume without loss of generality that our system is an inverse limit of nilsystems. As in the previous proof, we can further assume without loss of generality that our system is a nilsystem.

Let $(X = G/\Gamma, \mathcal{G}/\Gamma, m_X, T_b)$ be a nilsystem and $F_1,\ldots,F_\ell\in L^\infty(m_X)$. Our objective now is to show that if $(p_N)_N\subseteq\mathbb{R}[t]$ is a good polynomial sequence with $(p_N, 2p_N,\ldots,\ell p_N)_N$ being super nice, then the limit
\[
\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} F_1(b^{[p_N(n)]}x)\cdot F_2(b^{2[p_N(n)]}x)\cdot\ldots\cdot F_\ell(b^{\ell[p_N(n)]}x) \tag{36}
\]
exists in $L^2(m_X)$ and is equal to the limit
\[
\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} F_1(b^{n}x)\cdot F_2(b^{2n}x)\cdot\ldots\cdot F_\ell(b^{\ell n}x). \tag{37}
\]
Note that by density, we can assume that every $F_i$ is continuous. Then, for every $x\in X$, applying Theorem 6.2 (for the single good polynomial sequence $(p_N)_N$) to the nilmanifold $X^\ell$, the nilrotation $\tilde b = (b, b^2,\ldots,b^\ell)$, the point $\tilde x = (x,x,\ldots,x)$, and the continuous function $\tilde F(x_1,\ldots,x_\ell) = F_1(x_1)\cdot F_2(x_2)\cdot\ldots\cdot F_\ell(x_\ell)$, as the sequences $(\tilde b^{[p_N(n)]}\tilde x)_n$ and $(\tilde b^{n}\tilde x)_n$ are equidistributed in the nilmanifold $\overline{(\tilde b^{n}\tilde x)_n}$, we get that the limit $\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\tilde F(\tilde b^{[p_N(n)]}\tilde x)$ exists and
\[
\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\tilde F(\tilde b^{[p_N(n)]}\tilde x) = \lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\tilde F(\tilde b^{n}\tilde x).
\]
This implies that the limits in (36) and (37) exist for every $x\in X$ and are equal, completing the proof. $\square$

8.2. Averages along prime numbers.

We will now show Theorems 2.17 and 2.16, which imply Theorems 1.7 and 1.6 (it is easy to see that polynomials of the form (8) satisfy the required assumptions of these results). We will also show Theorems 2.22 and 2.23. To do so we follow the arguments from [18] (see also [29] and [27]).

Proof of Theorem 2.17. By Lemma 7.1 it suffices to show that the sequence
\[
A(N) := \frac{1}{N}\sum_{n=1}^{N}\Lambda'(n)\cdot T^{[q_N(n)]}f_1\cdot T^{2[q_N(n)]}f_2\cdot\ldots\cdot T^{\ell[q_N(n)]}f_\ell
\]
converges in $L^2(\mu)$ to the same limit as the sequence $\frac{1}{N}\sum_{n=1}^{N}\prod_{i=1}^{\ell}T^{in}f_i$ as $N\to\infty$. For $w$ (which gives a corresponding $W$), $r\in\mathbb{N}$, we define
\[
B_{w,r}(N) := \frac{1}{N}\sum_{n=1}^{N} T^{[q_N(Wn+r)]}f_1\cdot T^{2[q_N(Wn+r)]}f_2\cdot\ldots\cdot T^{\ell[q_N(Wn+r)]}f_\ell.
\]
For any $\varepsilon>0$, using Proposition 7.6 with $m=\ell$, $T_i=T$, $1\le i\le\ell$, and
\[
p_{i,j,N} = \begin{cases} 0, & \text{if } i\le\ell-j,\\ q_N, & \text{elsewhere,} \end{cases}
\]
for sufficiently large $N$ and some $w_0$ we have
\[
\left\| A(W_0N) - \frac{1}{\phi(W_0)}\sum_{1\le r\le W_0,\ (r,W_0)=1} B_{w_0,r}(N) \right\|_{L^2(\mu)} < \varepsilon.
\]
By the assumption we have that $(q_{W,r,N})_N$, where $q_{W,r,N}(n) = q_N(Wn+r)$, is good and $(q_{W,r,N}, 2q_{W,r,N},\ldots,\ell q_{W,r,N})_N$ is super nice.

By Theorem 2.2, we have that for any $1\le r\le W_0$ the sequence $(B_{w_0,r}(N))_N$ converges, in $L^2(\mu)$, to the same limit as the sequence $\frac{1}{N}\sum_{n=1}^{N}\prod_{i=1}^{\ell}T^{in}f_i$, and since
\[
\lim_{N\to\infty}\| A(W_0N+r) - A(W_0N) \|_{L^2(\mu)} = 0
\]
for every $1\le r\le W_0$, we get the result. $\square$

Proof of Theorem 2.16.

The argument is analogous to the one in the previous proof. We define
\[
A(N) := \frac{1}{N}\sum_{n=1}^{N}\Lambda'(n)\cdot T^{[q_{1,N}(n)]}f_1\cdot\ldots\cdot T^{[q_{\ell,N}(n)]}f_\ell
\]
and, for $w, r\in\mathbb{N}$,
\[
B_{w,r}(N) := \frac{1}{N}\sum_{n=1}^{N} T^{[q_{1,N}(Wn+r)]}f_1\cdot\ldots\cdot T^{[q_{\ell,N}(Wn+r)]}f_\ell.
\]
We use Proposition 7.6 with $m=\ell$, $T_i=T$, $1\le i\le\ell$,
\[
p_{i,j,N} = \begin{cases} 0, & \text{if } i\ne j,\\ q_{i,N}, & \text{if } i=j, \end{cases}
\]
and we note that the family $(q_{W,r,1,N},\ldots,q_{W,r,\ell,N})_N$ is good and super nice, where $q_{W,r,i,N}(n) = q_{i,N}(Wn+r)$. The result now follows similarly to the previous one since, by Theorem 2.1, we have that for any $1\le r\le W_0$ the sequence $(B_{w_0,r}(N))_N$ converges, in $L^2(\mu)$, to $\prod_{i=1}^{\ell}\int f_i\, d\mu$. $\square$

We now prove both Theorems 2.22 and 2.23 for the set $\mathbb{P}-1$; the corresponding results for $\mathbb{P}+1$ follow by the same arguments.

Proof of Theorem 2.22.

Using Proposition 7.6 with $m=\ell$, $T_i=T$, $1\le i\le\ell$, $r=1$ and
\[
p_{i,j,N}(n) = \begin{cases} 0, & \text{if } i\le\ell-j,\\ q_N(n-1), & \text{elsewhere,} \end{cases}
\]
together with Corollary 2.4, we have, for sufficiently large $w\in\mathbb{N}$, that
\[
\liminf_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\Lambda'_{w,1}(n)\cdot\mu\Big( A\cap\bigcap_{i=1}^{\ell} T^{-i[q_N(Wn)]}A \Big) > 0,
\]
from which we get the required non-empty intersection with $\mathbb{P}-1$. $\square$

Proof of Theorem 2.23.

The proof is analogous to the one of Theorem 2.22. More specifically, we use Proposition 7.6 with $m=\ell$, $T_i=T$, $1\le i\le\ell$, $r=1$ and
\[
p_{i,j,N}(n) = \begin{cases} 0, & \text{if } i\ne j,\\ q_{i,N}(n-1), & \text{if } i=j, \end{cases}
\]
and Corollary 2.9 instead of Corollary 2.4 to get, for some sufficiently large $w\in\mathbb{N}$, that
\[
\liminf_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\Lambda'_{w,1}(n)\cdot\mu\Big( A\cap T^{-[q_{1,N}(Wn)]}A\cap\ldots\cap T^{-[q_{\ell,N}(Wn)]}A \Big) > 0,
\]
from which we get the result. $\square$

8.3. Closing comments and problems.

In the generality in which it is stated, Problem 1 (i.e., [14, Problem 10]) remains open, except in the $\ell=1$ case. In this article, we first showed that the nilfactor of a system is characteristic for the corresponding sequence of iterates under the additional super niceness assumption. Second, for the equidistribution part, we showed that the goodness property alone is enough to obtain what we wanted in full generality. This comes as no surprise though for, as we have already mentioned, the goodness property is a strong equidistribution notion. Hence, to completely resolve the problem, one has to show the following:

Problem 3. For $\ell\in\mathbb{N}$ let $(P_N)_N = (p_{1,N},\ldots,p_{\ell,N})_N$ be a good sequence of $\ell$-tuples of polynomials. Then, for every system, its nilfactor $\mathcal{Z}$ is characteristic for $(P_N)_N$.

Analogously, to verify Problem 2, it suffices to show:

Problem 4. Let $(p_N)_N$ be a good sequence of polynomials. Then, for every $\ell\in\mathbb{N}$ and every system, its nilfactor $\mathcal{Z}$ is characteristic for $(P_N)_N = (p_N, 2p_N,\ldots,\ell p_N)_N$.

The resolution of these problems will of course lead to stronger variable-iterate results along prime (and shifted prime) numbers.

As in our results we have convergence to the expected limit, it is reasonable, given the recent developments (see [32]), to study the corresponding pointwise results along natural numbers. So, we naturally close this article with the following problem:

Problem 5. Find classes of good variable polynomial iterates (e.g., the ones in Theorems 2.1 and 2.2) for which we have the corresponding pointwise convergence results.

Acknowledgements. Thanks go to D. Karageorgos with whom I started discussing the problem; N. Frantzikinakis for his constant support and fruitful discussions during the writing of this article; and N. Kotsonis for his detailed corrections on the text.

Footnote (to Problem 5): Of course someone can start by studying the pointwise convergence for the special cases of averages with iterates from Examples 1 and 2.

References

[1] T. Austin. Pleasant extensions retaining algebraic structure, I. J. Anal. Math. (2015), 1–36.
[2] T. Austin. Pleasant extensions retaining algebraic structure, II. J. Anal. Math. (2015), 1–111.
[3] V. Bergelson. Ergodic Ramsey theory. Logic and combinatorics (Arcata, Calif., 1985), 63–87, Contemp. Math., Amer. Math. Soc., Providence, RI, 1987.
[4] V. Bergelson. Weakly mixing PET. Ergodic Theory Dynam. Systems (1987), no. 3, 337–349.
[5] V. Bergelson, I. Håland-Knutson. Weakly mixing implies mixing of higher orders along tempered functions. Ergodic Theory Dynam. Systems (2009), no. 5, 1375–1416.
[6] V. Bergelson, A. Leibman. A nilpotent Roth theorem. Invent. Math. (2002), 429–470.
[7] V. Bergelson, A. Leibman. Polynomial extensions of van der Waerden's and Szemerédi's Theorems. Journal of the A. M. S. (1996), 725–753.
[8] Q. Chu, N. Frantzikinakis, B. Host. Ergodic averages of commuting transformations with distinct degree polynomial iterates. Proc. of the London Math. Society. (3), (2011), 801–842.
[9] S. Donoso, A. Koutsogiannis, W. Sun. Pointwise multiple averages for sublinear functions. Ergodic Theory Dynam. Systems (2020), 1594–1618.
[10] S. Donoso, A. Koutsogiannis, W. Sun. Seminorms for multiple averages along polynomials and applications to joint ergodicity. To appear in J. d'Analyse Math.
[11] N. Frantzikinakis. A multidimensional Szemerédi theorem for Hardy sequences of different growth. Tran. of the A. M. S., no. 8, (2015), 5653–5692.
[12] N. Frantzikinakis. Equidistribution of sparse sequences on nilmanifolds. J. d'Analyse Math. (2009), 353–395.
[13] N. Frantzikinakis. Multiple recurrence and convergence for Hardy sequences of polynomial growth. J. d'Analyse Math. (2010), 79–135.
[14] N. Frantzikinakis. Some open problems on multiple ergodic averages. Bulletin of the Hellenic Mathematical Society. (2016), 41–90.
[15] N. Frantzikinakis. An averaged Chowla and Elliott conjecture along independent polynomials. International Mathematics Research Notices (2018), no. 12, 3721–3743.
[16] N. Frantzikinakis, B. Host. Higher order Fourier analysis of multiplicative functions and applications. Journal of the A. M. S., no. 1, (2017), 67–157.
[17] N. Frantzikinakis, B. Host, B. Kra. Multiple recurrence and convergence for sequences related to the prime numbers. J. Reine Angew. Math. (2007), 131–144.
[18] N. Frantzikinakis, B. Host, B. Kra. The polynomial multidimensional Szemerédi theorem along shifted primes. Israel J. Math. (2013), no. 1, 331–348.
[19] N. Frantzikinakis, B. Kra. Polynomial averages converge to the product of integrals. Israel J. Math. 148 (2005), 267–276.
[20] H. Furstenberg. Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions. J. d'Analyse Math. (1977), 204–256.
[21] W. Gowers. A new proof of Szemerédi's theorem. Geom. Funct. Anal. (2001), 465–588.
[22] B. Green, T. Tao. The quantitative behaviour of polynomial orbits on nilmanifolds. Annals of Math. (2012), 465–540.
[23] B. Green, T. Tao. Linear equations in primes. Annals of Math. (2010), 1753–1850.
[24] B. Green, T. Tao. The Möbius function is strongly orthogonal to nilsequences. Annals of Math. (2012), no. 2, 541–566.
[25] B. Green, T. Tao, T. Ziegler. An inverse theorem for the Gowers $U^{s+1}[N]$-norm. Annals of Math. (2012), no. 2, 1231–1372.
[26] B. Host, B. Kra. Nonconventional ergodic averages and nilmanifolds. Annals of Math. (2005), no. 1, 397–488.
[27] D. Karageorgos, A. Koutsogiannis. Integer part independent polynomial averages and applications along primes. Studia Mathematica (2019), no. 3, 233–257.
[28] Y. Kifer. Ergodic theorems for nonconventional arrays and an extension of the Szemerédi theorem. Discrete & Continuous Dynamical Systems (2018), no. 6, 2687–2716.
[29] A. Koutsogiannis. Closest integer polynomial multiple recurrence along shifted primes. Ergodic Theory Dynam. Systems (2018), no. 2, 666–685.
[30] A. Koutsogiannis. Integer part polynomial correlation sequences. Ergodic Theory Dynam. Systems (2018), no. 4, 1525–1542.
[31] A. Koutsogiannis. Multiple ergodic averages for tempered functions. To appear in Discrete & Continuous Dynamical Systems.
[32] B. Krause, M. Mirek, T. Tao. Pointwise ergodic theorems for non-conventional bilinear polynomial averages. Preprint, arXiv:2008.00857.
[33] L. Kuipers, H. Niederreiter. Uniform distribution of sequences. Pure and Applied Mathematics. Wiley-Interscience, New York-London-Sydney, 1974.
[34] A. Leibman. Pointwise convergence of ergodic averages for polynomial sequences of translations on a nilmanifold. Ergodic Theory Dynam. Systems (2005), no. 1, 201–213.
[35] T. Tao, T. Ziegler. Concatenation Theorems for anti-Gowers-uniform functions and Host-Kra characteristic factors. Discrete Anal. Annals of Math. (2012), no. 3, 1667–1688.

(Andreas Koutsogiannis)

The Ohio State University, Department of mathematics, Columbus,Ohio, USA
