Convergence of Polynomial Ergodic Averages of Several Variables for some Commuting Transformations

Abstract

Let $(X, \mathcal{B}, \mu)$ be a probability space and let $T_1, \ldots, T_l$ be $l$ commuting invertible measure preserving transformations of $X$. We show that if $T_1^{c_1} \cdots T_l^{c_l}$ is ergodic for each $(c_1, \ldots, c_l) \neq (0, \ldots, 0)$, then the averages $\frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \prod_{i=1}^{r} T_1^{p_{i1}(u)} \cdots T_l^{p_{il}(u)} f_i$ converge in $L^2(\mu)$ for all polynomials $p_{ij} : \mathbb{Z}^d \to \mathbb{Z}$, all $f_i \in L^\infty(\mu)$, and all Følner sequences $\{\Phi_N\}_{N=1}^{\infty}$ in $\mathbb{Z}^d$.


arXiv [math.DS], Jun

MICHAEL C. R. JOHNSON
Department of Mathematics
Northwestern University
Evanston, IL 60201

1. Introduction

In 1996, Bergelson and Leibman proved the following generalization of Furstenberg's Multiple Recurrence Theorem [Fu1], corresponding to the multidimensional polynomial version of Szemerédi's theorem.

Theorem 1.1. [BL]

Let $(X, \mathcal{B}, \mu)$ be a probability space, let $T_1, \ldots, T_l$ be commuting invertible measure preserving transformations of $X$, let $p_{ij} : \mathbb{Z} \to \mathbb{Z}$ be polynomials satisfying $p_{ij}(0) = 0$ for all $1 \le i \le r$, $1 \le j \le l$, and let $A \in \mathcal{B}$ with $\mu(A) > 0$. Then

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu\Big( \bigcap_{i=1}^{r} T_1^{-p_{i1}(n)} \cdots T_l^{-p_{il}(n)} A \Big) > 0.$$

Furstenberg's theorem corresponds to the case that $p_{ij}(n) = n$ for $i = j$, $p_{ij}(n) = 0$ for $i \neq j$, and each $T_i = T^i$. In this linear case, Host and Kra [HK1] showed that the lim inf is in fact a limit. Host and Kra [HK2] and Leibman [Le2] proved convergence in the polynomial case assuming all $T_i = T$. It is natural to ask whether the general commuting averages for polynomials in Theorem 1.1 converge.

Definition 1.2.

We say $(T_1, \ldots, T_l)$ is a totally ergodic generating set of invertible measure preserving transformations of $X$ if $T_1^{c_1} T_2^{c_2} \cdots T_l^{c_l}$ is ergodic for any choice of $(c_1, \ldots, c_l) \neq (0, \ldots, 0)$.

We note that if $(T_1, \ldots, T_l)$ is a totally ergodic generating set of invertible measure preserving transformations of a non-trivial probability space $(X, \mathcal{B}, \mu)$, then the associated group of transformations generated by $T_1, \ldots, T_l$ is a free abelian group with $l$ generators. We show that given a totally ergodic generating set of transformations, we obtain convergence in $L^2(\mu)$ for the averages in Theorem 1.1. We prove a statement replacing indicator functions with arbitrary functions in $L^\infty(\mu)$.

Theorem 1.3.

Let $(X, \mathcal{B}, \mu)$ be a probability space, let $(T_1, \ldots, T_l)$ be a totally ergodic generating set of commuting invertible measure preserving transformations of $X$, and let $p_{ij} : \mathbb{Z}^d \to \mathbb{Z}$ for $1 \le i \le r$, $1 \le j \le l$ be polynomials. For any $f_1, \ldots, f_r \in L^\infty(\mu)$ and any Følner sequence $\{\Phi_N\}_{N=1}^{\infty}$ in $\mathbb{Z}^d$, the averages

$$\frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \prod_{i=1}^{r} f_i\big( T_1^{p_{i1}(u)} \cdots T_l^{p_{il}(u)} x \big) \tag{1}$$

converge in $L^2(\mu)$ as $N \to \infty$.

Without the assumption that $(T_1, \ldots, T_l)$ form a totally ergodic generating set, convergence for the averages in (1) remains open and is only known in the linear case. Frantzikinakis and Kra [FrK] showed that given $p_{ij}(n) = n$ for $i = j$ and $p_{ij}(n) = 0$ for $i \neq j$, if we assume that $T_i$ is ergodic for each $i \in \{1, \ldots, l\}$ and $T_i T_j^{-1}$ is ergodic for all $i \neq j$, we obtain convergence in $L^2(\mu)$. Tao [Ta] recently proved convergence in $L^2(\mu)$ for the general linear case without the ergodicity assumptions needed in [FrK].

In previous results, convergence was often shown by proving that the averages in (1) do not change when each function is replaced with its conditional expectation on a certain characteristic factor, namely an inverse limit of nilsystems. This characteristic factor is then shown to have algebraic structure for which convergence is known. We define these terms precisely in the section below. To prove our theorem, we combine this technique with a modified version of PET-induction as introduced by Bergelson [Be].

2. Preliminaries

For simplicity, we assume all functions are real valued. All theorems and definitions hold for complex valued functions with obvious minor modifications. Throughout, we use the notation $Tf = f(T)$.

2.1. Nilsystems.

Definition 2.1.

Let $G$ be a $k$-step nilpotent Lie group, let $\Gamma$ be a discrete cocompact subgroup of $G$, let $X = G/\Gamma$, and let $\mathcal{B}$ be the Borel $\sigma$-algebra associated to $X$. For each $g \in G$, let $T_g : G/\Gamma \to G/\Gamma$ be defined by $T_g(x\Gamma) = gx\Gamma$, and let $\mu$ be Haar measure, the unique measure on $(X, \mathcal{B})$ invariant under left translations by elements in $G$. We call $(X, \mathcal{B}, \mu, (T_g, g \in G))$ a nilsystem.

Definition 2.2.

A sequence of finite subsets $\{\Phi_N\}_{N=1}^{\infty}$ of a countable, discrete group $G$ is a Følner sequence if for all $g \in G$,

$$\lim_{N \to \infty} \frac{|g\Phi_N \,\triangle\, \Phi_N|}{|\Phi_N|} = 0,$$

where $\triangle$ is the symmetric difference operation.

Ergodic averages in nilsystems have been well studied. We make use of the following theorem of Leibman:

Theorem 2.3. [Le1]

Let $(X, \mathcal{B}, \mu, (T_g, g \in G))$ be a nilsystem with $X = G/\Gamma$, $g_1, \ldots, g_l \in G$, and $p_1, \ldots, p_l : \mathbb{Z}^d \to \mathbb{Z}$ be polynomials. Then for any $f \in C(X)$ and any Følner sequence $\{\Phi_N\}_{N=1}^{\infty}$ in $\mathbb{Z}^d$, the averages

$$\frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} T_{g_1}^{p_1(u)} \cdots T_{g_l}^{p_l(u)} f$$

converge pointwise as $N \to \infty$.

Corollary 2.4.

Let $(X, \mathcal{B}, \mu, (T_g, g \in G))$ be a nilsystem with $X = G/\Gamma$, $g_1, \ldots, g_l \in G$, and $p_{ij} : \mathbb{Z}^d \to \mathbb{Z}$ for $1 \le i \le r$, $1 \le j \le l$ be polynomials. Then for any $f_1, \ldots, f_r \in L^\infty(\mu)$ and any Følner sequence $\{\Phi_N\}_{N=1}^{\infty}$ in $\mathbb{Z}^d$, the averages

$$\frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \prod_{i=1}^{r} T_{g_1}^{p_{i1}(u)} \cdots T_{g_l}^{p_{il}(u)} f_i$$

converge in $L^2(\mu)$ as $N \to \infty$.

Proof. We apply Theorem 2.3 to $X^r$, with transformations $\hat{T}_{ij} : X^r \to X^r$ for $1 \le i \le r$, $1 \le j \le l$ defined by

$$\hat{T}_{ij}(x_1, x_2, \ldots, x_r) = (x_1, \ldots, x_{i-1}, T_{g_j}(x_i), x_{i+1}, \ldots, x_r).$$

Using polynomials $p_{ij} : \mathbb{Z}^d \to \mathbb{Z}$ for $1 \le i \le r$, $1 \le j \le l$ and $f = f_1 \otimes \cdots \otimes f_r$, we get

$$\hat{T}_{11}^{p_{11}(u)} \cdots \hat{T}_{1l}^{p_{1l}(u)} \cdots \hat{T}_{r1}^{p_{r1}(u)} \cdots \hat{T}_{rl}^{p_{rl}(u)} f = \prod_{i=1}^{r} T_{g_1}^{p_{i1}(u)} \cdots T_{g_l}^{p_{il}(u)} f_i.$$

Theorem 2.3 guarantees the required averages converge pointwise for each $f_1, \ldots, f_r \in C(X)$. Using the density of $C(X)$ in $L^2(\mu)$, $L^2(\mu)$ convergence follows for arbitrary $f_1, \ldots, f_r \in L^\infty(\mu)$. □

2.2. The Host-Kra seminorms $|||\cdot|||_k$.

We briefly review the construction of the Host-Kra seminorms on $L^\infty(\mu)$ from [HK1]. As our setting deals with multiple commuting transformations, we must specify which transformation is used. In this section, $T$ is an ergodic measure preserving transformation of $(X, \mathcal{B}, \mu)$.

For each $k \ge 0$, we define a measure $\mu_T^{[k]}$ on $X^{[k]} = X^{2^k}$, invariant under $T^{[k]} = T \times \cdots \times T$ ($2^k$ times). Set $\mu_T^{[0]} = \mu$. For $k \ge 0$, let $\mathcal{I}_T^{[k]}$ be the $\sigma$-algebra of $T^{[k]}$-invariant subsets of $X^{[k]}$. Then define $\mu_T^{[k+1]} = \mu_T^{[k]} \times_{\mathcal{I}_T^{[k]}} \mu_T^{[k]}$ to be the relatively independent square of $\mu_T^{[k]}$ over $\mathcal{I}_T^{[k]}$. This means that for $F, G \in L^\infty(\mu^{[k]})$,

$$\int_{X^{[k+1]}} F(x') G(x'') \, d\mu_T^{[k+1]}(x', x'') = \int_{X^{[k]}} E(F \mid \mathcal{I}_T^{[k]}) \, E(G \mid \mathcal{I}_T^{[k]}) \, d\mu_T^{[k]},$$

where $E(\cdot \mid \cdot)$ is the conditional expectation operation.

Using these measures, define

$$|||f|||_{k,T}^{2^k} = \int_{X^{[k]}} \prod_{j=0}^{2^k - 1} f(x_j) \, d\mu_T^{[k]}(x)$$

for a bounded function $f \in L^\infty(\mu)$ and $k \ge 1$. It is shown in [HK1] that for every $k \ge 1$, $|||\cdot|||_{k,T}$ is a seminorm on $L^\infty(\mu)$. Also, for $f \in L^\infty(\mu)$, we have $|||f|||_{1,T} = |\int f \, d\mu|$ and for every $k \ge 1$,

$$|||f|||_{k,T} \le |||f|||_{k+1,T} \le \|f\|_{L^\infty(\mu)}.$$

2.3. The Host-Kra factors $Z_k(X)$.

We now define an increasing sequence of factors $\{Z_k(X, T) : k \ge 0\}$ as constructed in [HK1]. Let $\mathcal{Z}_k(X, T)$ be the $T$-invariant sub-$\sigma$-algebra characterized by the following property: for every $f \in L^\infty(\mu)$, $E(f \mid \mathcal{Z}_k(X, T)) = 0$ if and only if $|||f|||_{k+1,T} = 0$. We define $Z_k(X, T)$ to be the factor of $X$ associated to the sub-$\sigma$-algebra $\mathcal{Z}_k$. Thus $Z_0(X, T)$ is the trivial factor and $Z_1(X, T)$ is the Kronecker factor.
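As a sanity check on the seminorm properties listed above, the seminorms can be computed exactly in a toy model. The sketch below assumes the finite system $X = \mathbb{Z}/n\mathbb{Z}$ with uniform measure and the rotation $Tx = x + 1$ (which is ergodic), and it assumes the standard recursive identity expressing $|||f|||_{k+1}^{2^{k+1}}$ as the average over $h$ of $|||f \cdot T^h f|||_k^{2^k}$, which is not restated in this paper. The function names are illustrative only.

```python
from fractions import Fraction


def rotate(f, h):
    """Return the function x -> f(x + h) on Z/nZ, stored as a list."""
    n = len(f)
    return [f[(x + h) % n] for x in range(n)]


def seminorm_power(f, k):
    """Return |||f|||_k ** (2 ** k) exactly for the rotation x -> x + 1 on Z/nZ.

    Assumes real-valued f and the recursion
    |||f|||_{k+1}^(2^(k+1)) = average over h of |||f * T^h f|||_k^(2^k),
    started from |||f|||_1^2 = (integral of f)^2.
    """
    n = len(f)
    if k == 1:
        mean = sum(f, Fraction(0)) / n
        return mean * mean
    total = Fraction(0)
    for h in range(n):
        product = [a * b for a, b in zip(f, rotate(f, h))]
        total += seminorm_power(product, k - 1)
    return total / n


def seminorm(f, k):
    """Return |||f|||_k as a float."""
    return float(seminorm_power(f, k)) ** (1.0 / 2 ** k)


if __name__ == "__main__":
    f = [1, -1, 1, 1, -1, 1, -1, -1]  # mean zero, sup norm 1
    print([seminorm(f, k) for k in (1, 2, 3)])
```

In this model the printed values should be non-decreasing in $k$ and bounded by the sup norm of $f$, matching the chain of inequalities stated for $|||\cdot|||_{k,T}$.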

A priori, these constructions depend on the transformation $T$. However, the following observation of Frantzikinakis and Kra shows that, given basic assumptions, none of the previous constructions depend on the transformation $T$.

Proposition 2.5. [FrK]

Assume that $T$ and $S$ are ergodic commuting invertible measure preserving transformations of a space $(X, \mathcal{B}, \mu)$. Then for all $k \ge 1$ and all $f \in L^\infty(\mu)$, $|||f|||_{k,T} = |||f|||_{k,S}$ and $Z_k(X, T) = Z_k(X, S)$.

Thus we discard $T$ from our notation.

Definition 2.6.

We call a probability space $(X, \mathcal{B}, \mu)$ with $l$ invertible commuting measure preserving transformations $T_1, \ldots, T_l$ an (invertible commuting measure preserving) system. If $(T_1, \ldots, T_l)$ is also a totally ergodic generating set, then we call it a freely generated totally ergodic system (with generators $(T_1, \ldots, T_l)$). We denote it as $(X, \mathcal{B}, \mu, (T_1, \ldots, T_l))$. A system $(X, \mathcal{B}, \mu, (T_1, \ldots, T_l))$ is an inverse limit of systems $(X, \mathcal{B}_i, \mu_i, (T_1, \ldots, T_l))$ if each $\mathcal{B}_i \subset \mathcal{B}_{i+1}$ and $\mathcal{B} = \bigvee_{i=1}^{\infty} \mathcal{B}_i$ up to sets of measure zero.

The main result of the Host-Kra theory is that each of the factors $(Z_k, T_i)$ is isomorphic to an inverse limit of $k$-step nilsystems. However, such an isomorphism a priori depends on the transformation $T_i$. (Note that by Proposition 2.5, $Z_k(X, T_i)$ does not depend on $i$.) Frantzikinakis and Kra [FrK] deal specifically with this technicality. We say that a system $(X, \mathcal{B}, \mu, (T_1, \ldots, T_l))$ has order $k$ if $X = Z_k(X)$.

Theorem 2.7. [FrK]

Any system $(X, \mathcal{B}, \mu, (T_1, \ldots, T_l))$ of order $k$ is an inverse limit of a sequence of systems $(X, \mathcal{B}_i, \mu_i, (T_1, \ldots, T_l))$, each arising from $k$-step nilsystems, where $X = G_i/\Gamma_i$ and each transformation $T_1, \ldots, T_l$ is a left translation of $G_i/\Gamma_i$ by an element in $G_i$.

By combining Theorem 2.7 and Corollary 2.4, Theorem 1.3 is proved in the case that $X = Z_k(X)$ for some $k$.

2.4. Characteristic factors and ED-sets.

Definition 2.8.

We say a sub-$\sigma$-algebra $\mathcal{X} \subseteq \mathcal{B}$ is a characteristic factor for $L^2(\mu)$-convergence of the averages

$$\frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \prod_{i=1}^{r} T_1^{p_{i1}(u)} \cdots T_l^{p_{il}(u)} f_i \tag{1}$$

if $\mathcal{X}$ is $T_j$ invariant for all $1 \le j \le l$ and the averages in (1) converge to 0 in $L^2(\mu)$ for any Følner sequence $\{\Phi_N\}_{N=1}^{\infty}$ in $\mathbb{Z}^d$ whenever $E(f_i \mid \mathcal{X}) = 0$ for some $1 \le i \le r$.

Using the multilinearity of our averages in (1), it only remains to show that for some $k \in \mathbb{N}$, $Z_k(X)$ is a characteristic factor.

To simplify future arguments, we require that our set of polynomials have a property related to being essentially distinct, as defined in [Le2].

Definition 2.9.

We say the set of polynomials $P = \{p_{ij} : \mathbb{Z}^d \to \mathbb{Z}$ for $1 \le i \le r$, $1 \le j \le l\}$ is an ED-set if all of the following hold:

(1) Each $p_{ij}$ in $P$ is not equal to a nonzero constant.
(2) No two polynomials in $P$ differ by a nonzero constant.
(3) For each $i = 1, \ldots, r$, there is some $j \in \{1, \ldots, l\}$ where $p_{ij}$ is nonzero.
(4) For each distinct pair $i_1, i_2 \in \{1, \ldots, r\}$, there is some $j \in \{1, \ldots, l\}$ where $p_{i_1 j} \neq p_{i_2 j}$.

Conditions (1) and (2) are related to the polynomials being essentially distinct. When $P$ is viewed as an $r \times l$ matrix whose entries are polynomials, condition (3) requires that $P$ contains no rows of all zeros, and condition (4) requires that $P$ does not have identical rows.

We note that Theorem 1.3 is trivially true if all the polynomials are identically zero. By replacing each $f_i$ with $T_1^{c_1} \cdots T_l^{c_l} f_i$ for some $c_1, \ldots, c_l \in \mathbb{Z}$, we may assume that our set of polynomials satisfies conditions (1) and (2). When $T_1^{p_{i1}} \cdots T_l^{p_{il}} f_i = f_i$, we factor $f_i$ out of our average. Thus, we further assume our polynomials satisfy condition (3). By writing $T_1 \cdots T_l f \cdot T_1 \cdots T_l g$ as $T_1 \cdots T_l (fg)$ we may assume that our set of polynomials also satisfies condition (4), and hence is an ED-set. Thus the main theorem is a consequence of the following:

Proposition 2.10.

Let $(X, \mathcal{B}, \mu, (T_1, \ldots, T_l))$ be a freely generated totally ergodic system and $P = \{p_{ij} : \mathbb{Z}^d \to \mathbb{Z}$ for $1 \le i \le r$, $1 \le j \le l\}$ be an ED-set of polynomials. Then there exists $k \in \mathbb{N}$ such that for any $f_1, \ldots, f_r \in L^\infty(\mu)$ with $|||f_m|||_k = 0$ for some $1 \le m \le r$, we have

$$\limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \Big( \prod_{i=1}^{r} T_1^{p_{i1}(u)} T_2^{p_{i2}(u)} \cdots T_l^{p_{il}(u)} f_i \Big) \Big\|_{L^2(\mu)} = 0$$

for any Følner sequence $\{\Phi_N\}_{N=1}^{\infty}$ in $\mathbb{Z}^d$.

We note that the above integer $k$ depends only on the set of polynomials $P$ and not on the system $(X, \mathcal{B}, \mu, (T_1, \ldots, T_l))$ or the dimension $d$. By relabeling our polynomials and functions, we need only prove Proposition 2.10 in the case that $|||f_1|||_k = 0$ for some $k \in \mathbb{N}$.

3. Linear case

To prove Proposition 2.10, we use PET-induction as introduced by Bergelson in [Be]. In this section we prove the base case of the induction.

Proposition 3.1.

Let $(X, \mathcal{B}, \mu, (T_1, \ldots, T_l))$ be a freely generated totally ergodic system and $P = \{p_{ij} : \mathbb{Z}^d \to \mathbb{Z}$ for $1 \le i \le r$, $1 \le j \le l\}$ be an ED-set of linear functions. Then there exists a constant $C > 0$, dependent only on the set of polynomials, such that

$$\limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \Big( \prod_{i=1}^{r} T_1^{p_{i1}(u)} T_2^{p_{i2}(u)} \cdots T_l^{p_{il}(u)} f_i \Big) \Big\|_{L^2(\mu)} \le C \min_{1 \le i \le r} |||f_i|||_{r+1}$$

for any $f_1, \ldots, f_r \in L^\infty(\mu)$ with $\|f_i\|_{L^\infty(\mu)} \le 1$ and any Følner sequence $\{\Phi_N\}_{N=1}^{\infty}$ in $\mathbb{Z}^d$.

As a corollary, we get that $Z_r(X)$ is characteristic for the averages in (1) when each of the polynomials in $P$ is linear. We use the following version of the van der Corput lemma in the inductive process to reduce each average to a previous step.

Lemma 3.2. [BMZ]

Let $\{g_u\}_{u \in G}$ be a bounded family of elements of a Hilbert space $H$ indexed by elements of a finitely generated abelian group $G$ and let $\{\Phi_N\}_{N=1}^{\infty}$ be a Følner sequence in $G$.

(1) For any finite set $F \subseteq G$,

$$\limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} g_u \Big\|^2 \le \limsup_{N \to \infty} \frac{1}{|F|^2} \sum_{v, w \in F} \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \langle g_{u+v}, g_{u+w} \rangle.$$

(2) There exists a Følner sequence $\{\Theta_M\}_{M=1}^{\infty}$ in $G^3$ such that

$$\limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} g_u \Big\|^2 \le \limsup_{M \to \infty} \frac{1}{|\Theta_M|} \sum_{(u, v, w) \in \Theta_M} \langle g_{u+v}, g_{u+w} \rangle.$$

Leibman proved the following lemma in his proof of convergence for a single transformation [Le2]. We likewise use his lemma to prove the linear case for multiple commuting transformations.
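Before moving on, the mechanism behind Lemma 3.2(1) can be seen in a finite model where the averaging set is a whole cyclic group, so that no boundary terms appear. The sketch below is an illustration under that assumption (vectors indexed by $\mathbb{Z}/N\mathbb{Z}$ with index arithmetic mod $N$), not the lemma of [BMZ] itself. In this setting the inequality holds exactly for every finite set of shifts, because the mean of the $g_u$ equals the mean of their $F$-averages and the squared norm is convex. The function names are hypothetical.

```python
import random


def inner(a, b):
    """Euclidean inner product of two real vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def van_der_corput_sides(g, shifts):
    """Return (lhs, rhs) for vectors g[0..N-1] indexed by Z/NZ.

    lhs = || (1/N) sum_u g_u ||^2
    rhs = (1/|F|^2) sum_{v,w in F} (1/N) sum_u <g_{u+v}, g_{u+w}>
    with indices taken mod N (a cyclic stand-in for the Folner set).
    """
    n = len(g)
    dim = len(g[0])
    mean = [sum(vec[c] for vec in g) / n for c in range(dim)]
    lhs = inner(mean, mean)
    rhs = 0.0
    for v in shifts:
        for w in shifts:
            rhs += sum(inner(g[(u + v) % n], g[(u + w) % n]) for u in range(n)) / n
    rhs /= len(shifts) ** 2
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    g = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(12)]
    lhs, rhs = van_der_corput_sides(g, shifts=[0, 1, 2, 5])
    print(lhs <= rhs + 1e-12)
```

For a genuine Følner sequence the shifted sums differ from the unshifted ones by boundary terms of relative size tending to zero, which is why the lemma is stated with a lim sup.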

Lemma 3.3. [Le2]

(1) Let $p_i : \mathbb{Z}^d \to \mathbb{Z}$ be nonconstant linear functions for each $i = 1, \ldots, l$. There exists a constant $C$ such that for any $f \in L^\infty(\mu)$ and any Følner sequence $\{\Phi_N\}_{N=1}^{\infty}$ in $\mathbb{Z}^d$,

$$\lim_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} T_1^{p_1(u)} \cdots T_l^{p_l(u)} f \Big\|_{L^2(\mu)} \le C \, |||f|||_2.$$

(2) Let $p_i : \mathbb{Z}^d \to \mathbb{Z}$ be nonconstant linear functions for each $i = 1, \ldots, l$. There exists a constant $C$ such that for any $f \in L^\infty(\mu)$ and any Følner sequence $\{\Phi_N\}_{N=1}^{\infty}$ in $\mathbb{Z}^d$,

$$\lim_{N \to \infty} \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} |||f \cdot T_1^{p_1(u)} \cdots T_l^{p_l(u)} f|||_k^{2^k} \le C \, |||f|||_{k+1}^{2^{k+1}}.$$

We note here that Lemma 3.3 is similar to Lemmas 7 and 8 in [Le2] but with multiple commuting transformations. The only step needed to alter his proof is to show that our average also converges to the conditional expectation of $f$ onto the appropriate sub-$\sigma$-algebra. But this follows from classical results on convergence for amenable group actions.

Proof of Proposition 3.1.

To simplify notation, we write $T_1^{p_{i1}(u)} \cdots T_l^{p_{il}(u)}$ as $S_{p_i(u)}$. Since each $p_{ij}$ is a linear polynomial, we have $S_{p_i(u)} S_{p_i(v)} = S_{p_i(u+v)}$.

We proceed by induction on $r$. For $r = 1$, we are done by Lemma 3.3. Assume the proposition holds for $r - 1$. Let $f_1, \ldots, f_r$ be essentially bounded functions on $X$ with $\|f_i\|_{L^\infty(\mu)} \le 1$ for $1 \le i \le r$, and let $\{\Phi_N\}_{N=1}^{\infty}$ be a Følner sequence in $\mathbb{Z}^d$. By applying Lemma 3.2 to $g_u = S_{p_1(u)} f_1 \cdots S_{p_r(u)} f_r$, for any finite $F \subseteq \mathbb{Z}^d$, we get

$$\limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \prod_{i=1}^{r} S_{p_i(u)} f_i \Big\|_{L^2(\mu)}^2$$

$$\le \limsup_{N \to \infty} \frac{1}{|F|^2} \sum_{v, w \in F} \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \int_X \prod_{i=1}^{r} S_{p_i(u+v)} f_i \cdot \prod_{i=1}^{r} S_{p_i(u+w)} f_i \, d\mu$$

$$= \limsup_{N \to \infty} \frac{1}{|F|^2} \sum_{v, w \in F} \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \int_X \prod_{i=1}^{r-1} S_{p_i(u)} S_{p_r(u)}^{-1} \big( S_{p_i(v)} f_i \cdot S_{p_i(w)} f_i \big) \cdot \big( S_{p_r(v)} f_r \cdot S_{p_r(w)} f_r \big) \, d\mu$$

$$\le \frac{1}{|F|^2} \sum_{v, w \in F} \limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \prod_{i=1}^{r-1} S_{(p_i - p_r)(u)} \big( S_{p_i(v)} f_i \cdot S_{p_i(w)} f_i \big) \Big\|_{L^2(\mu)}.$$

Since $P$ is an ED-set, so is the family $\{(p_{ij} - p_{rj}) : \mathbb{Z}^d \to \mathbb{Z}$ for $1 \le i \le r - 1$, $1 \le j \le l\}$. By the induction hypothesis, there exists a constant $C$, independent of $f_1, \ldots, f_r$ and $\{\Phi_N\}_{N=1}^{\infty}$, such that

$$\limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \prod_{i=1}^{r-1} S_{(p_i - p_r)(u)} \big( S_{p_i(v)} f_i \cdot S_{p_i(w)} f_i \big) \Big\|_{L^2(\mu)} \le C \, ||| S_{p_i(v)} f_i \cdot S_{p_i(w)} f_i |||_r$$

for all $(v, w) \in \mathbb{Z}^{2d}$ and $i \in \{1, \ldots, r\}$. Thus for any finite set $F \subset \mathbb{Z}^d$ and $i \in \{1, \ldots, r\}$,

$$\limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \prod_{i=1}^{r} S_{p_i(u)} f_i \Big\|_{L^2(\mu)} \le \Big( \frac{C}{|F|^2} \sum_{v, w \in F} ||| S_{p_i(v)} f_i \cdot S_{p_i(w)} f_i |||_r \Big)^{1/2} \le C^{1/2} \Big( \frac{1}{|F|^2} \sum_{v, w \in F} ||| f_i \cdot S_{p_i(w - v)} f_i |||_r^{2^r} \Big)^{(1/2)^{r+1}}.$$

Let $\{\Psi_N\}_{N=1}^{\infty}$ be any Følner sequence in $\mathbb{Z}^d$. Thus $\{\Psi_N \times \Psi_N\}_{N=1}^{\infty}$ is a Følner sequence in $\mathbb{Z}^{2d}$. By Lemma 3.3 we have for each $i \in \{1, \ldots, r\}$

$$\limsup_{M \to \infty} \frac{1}{|\Psi_M|^2} \sum_{v, w \in \Psi_M} ||| f_i \cdot S_{p_i(w - v)} f_i |||_r^{2^r} \le c \, |||f_i|||_{r+1}^{2^{r+1}}$$

with $c$ independent of $f_i$. By replacing $F$ with $\Psi_N$ for each $N \in \mathbb{N}$, we get

$$\limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \prod_{i=1}^{r} S_{p_i(u)} f_i \Big\|_{L^2(\mu)} \le C^{1/2} c^{(1/2)^{r+1}} \min_{1 \le i \le r} |||f_i|||_{r+1}. \qquad \Box$$

4. Non-linear Case

We now deal with the inductive step. A set of polynomials $P = \{p_{ij} : 1 \le i \le r, 1 \le j \le l\}$ where each $p_{ij} : \mathbb{Z}^d \to \mathbb{Z}$ is called an (integer) polynomial family. We view $P$ as an $r \times l$ matrix whose entries are the polynomials $p_{ij}$. We define the degree of a family $P$ by $\deg(P) = \max\{\deg(p_{ij}) : p_{ij} \in P\}$.

Let $D \in \mathbb{N}$. We define the column degree of a polynomial family $P$ with $\deg(P) \le D$ to be the vector $C(P) = (c_1, \ldots, c_D)$ where $c_i$ is the number of columns whose maximal degree is $i$.

We say that two polynomials $p, q$ are equivalent if $\deg(p) = \deg(q)$ and $\deg(p - q) < \deg(p)$. Thus any collection of polynomials can be partitioned into equivalence classes. We define the degree of an equivalence class of polynomials to be equal to the degree of any of its representatives.

For a family $P$ with $\deg(P) \le D$, we define the column weight of a column $j$ to be the vector $w_j(P) = (w_{1j}, \ldots, w_{Dj})$ with each $w_{ij}$ equal to the number of equivalence classes in $P$ of degree $i$ in column $j$. Given two vectors $v = (v_1, \ldots, v_D)$, $v' = (v'_1, \ldots, v'_D)$, we say $v < v'$ if there exists $n_0$ such that $v_{n_0} < v'_{n_0}$ and for each $n > n_0$, $v_n = v'_n$. Thus the set of weights and the set of column degrees become well ordered sets.

For each polynomial family $P$ with $\deg(P) \le D$, we define the subweight of $P$ to be the matrix $w(P) = [w_1(P) \ldots w_l(P)]$ whose columns are the corresponding column weights of $P$. Due to the fact that our polynomial family may have many polynomial entries that are zero, we must modify the PET-induction scheme from that of [Le2]. We introduce the following notation to record the position of such zeros in $P$. Let

$I_1 = \{i \in \{1, \ldots, r\} : p_{ij} = 0$ for all $j = 1, \ldots, l\}$,
$I_2 = \{i \in \{1, \ldots, r\} : \deg(p_{ij}) \le 1$ for all $j = 1, \ldots, l\} \setminus I_1$, and
$I_3 = \{1, \ldots, r\} \setminus (I_1 \cup I_2)$.

When $P$ is an ED-set, $I_1$ is empty, $I_2$ records which nonzero rows contain only polynomials with degree at most 1, while $I_3$ records which rows contain a polynomial of degree at least 2. Define $H_0(P) = I_2 \cup I_3$ and inductively define

$$H_j(P) = \{i \in \{1, \ldots, r\} : p_{ij} = 0\} \cap H_{j-1}(P)$$

for $1 \le j \le l - 1$. We omit $P$ from the notation when there is no confusion about which family we are dealing with. Thus, $H_j$ records which non-identically zero rows have zeros in columns $1, \ldots, j$. Pick $j_0$ to be the smallest $j \ge 1$ such that $H_j = \emptyset$. In the case that column 1 has no zero entries, we note that $j_0 = 1$.

For each polynomial family $P$ and integer $a = 1, \ldots, l$, we define the sub-polynomial family

$$P^a = \{p_{ij} : i \in H_{a-1}(P), \ a \le j \le l\}.$$

We note that the entries in the first column of $P^a$ are precisely the entries of column $a$ of $P$ from nonzero rows whose polynomials are all identically zero in columns $1, \ldots, a - 1$. We note that when $P$ is an ED-set, $P^1 = P$.

For each polynomial family $P$ with $\deg(P) \le D$, we define the weight of $P$ to be the ordered set of matrices $W(P) = \{w(P^1), \ldots, w(P^l)\}$. Given two polynomial families $P$ and $Q$ where $\deg(P), \deg(Q) \le D$, we say that $W(Q) < W(P)$ if there exist $J, A \in \{1, \ldots, l\}$ such that $w_J(Q^A) < w_J(P^A)$, but $w_J(Q^a) = w_J(P^a)$ for all $1 \le a < A$ and $w_j(Q^a) = w_j(P^a)$ for all $1 \le j < J$ and $a = 1, \ldots, l$.

Example.

Let P =  n n n n

00 2 n n  . We see that P is an ED-set,and H ( P ) = { , } . Thus P = (cid:18) n n n (cid:19) . Since H ( P ) = ∅ , P is ONVERGENCE OF SEVERAL COMMUTING POLYNOMIAL AVERAGES 11 the empty family. Therefore w ( P ) = (cid:20) (cid:21) , w ( P ) = (cid:20) (cid:21) , and w ( P ) = (cid:20) (cid:21) . Let Q =  n − n + 1 − n + 1 n + 1 n + 2 n + 1 − n + 1 n + 10 − n n − n + 1 3 n + 30 n + 2 n + 1 3 n + 3  .Q is also an ED-set, and we have H ( Q ) = { , , } . So, Q =  − n n − n + 1 3 n + 3 n + 2 n + 1 3 n + 3  . Since H ( Q ) = ∅ , Q is the empty family. Therefore w ( Q ) = (cid:20) (cid:21) , w ( Q ) = (cid:20) (cid:21) , and w ( Q ) = (cid:20) (cid:21) .We note that w ( P ) = w ( Q ). However, since w ( Q ) = w ( P ) but w ( Q ) < w ( P ), we have W ( Q ) < W ( P ). We have implicitly chosen D = 2 in this example. As long as D is at least as large as the degree ofall polynomial families under consideration, it will not aﬀect whether W ( Q ) < W ( P ).A polynomial family P = { p ij } is said to be standard if it is anED-set and deg( p j ) = deg( P ) for some 1 ≤ j ≤ l . We now stateProposition 2.10 in the case that P is standard. Proposition 4.1.

Let $(X, \mathcal{B}, \mu, (T_1, \ldots, T_l))$ be a freely generated totally ergodic system and $P = \{p_{ij} : 1 \le i \le r, 1 \le j \le l\}$ be a standard polynomial family. Then there exists $k \in \mathbb{N}$ such that for any $f_1, \ldots, f_r \in L^\infty(\mu)$ with $|||f_1|||_k = 0$, we have

$$\limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \Big( \prod_{i=1}^{r} T_1^{p_{i1}(u)} T_2^{p_{i2}(u)} \cdots T_l^{p_{il}(u)} f_i \Big) \Big\|_{L^2(\mu)} = 0$$

for any Følner sequence $\{\Phi_N\}_{N=1}^{\infty}$ in $\mathbb{Z}^d$.

To prove Proposition 4.1, we construct a new polynomial family $Q$ that controls the above averages, where $W(Q) < W(P)$. This process is a modified version of the PET-induction process used in [Le2] for a single transformation.

4.1. Inductive Polynomial Families.

We begin by saying that a certain property holds for almost all $v \in \mathbb{Z}^d$ if the set of elements for which the property does not hold is contained in a set of zero density with respect to any Følner sequence in $\mathbb{Z}^d$. To show a property holds for almost all $v \in \mathbb{Z}^d$, we use the fact that the set of zeros of a nontrivial polynomial has zero density with respect to any Følner sequence in $\mathbb{Z}^d$.

Given any standard polynomial family $P$ with $\deg(P) \ge 2$ and $\deg(p_{11}) = \deg(P)$, for each $(v, w) \in \mathbb{Z}^{2d}$ we construct a new family $P_{v,w}$ as follows. We first select an appropriate row $i_0$ in $P$, so that $P_{v,w}$ is standard for almost all $(v, w) \in \mathbb{Z}^{2d}$ and $W(P_{v,w}) < W(P)$. We split into the following five cases.

• Case 1: $H_1 = \emptyset$ and some $p_{i1}$ is not equivalent to $p_{11}$. Choose the smallest $i_0$ so that $p_{i_0 1}$ has minimal degree over all $p_{i1}$ that are not equivalent to $p_{11}$.
• Case 2: $H_1 = \emptyset$, all $p_{i1}$ are equivalent to $p_{11}$, and there is some $i, j$ where $p_{ij}$ is not equivalent to $p_{1j}$ and the degree of either $p_{ij}$ or $p_{1j}$ equals $\deg(P)$. Choose $i_0$ to be the smallest such $i$ where $p_{ij}$ is not equivalent to $p_{1j}$ and the degree of either $p_{ij}$ or $p_{1j}$ equals $\deg(P)$.
• Case 3: $H_1 = \emptyset$, all $p_{i1}$ are equivalent to $p_{11}$, and for all $j$ either $p_{ij}$ is equivalent to $p_{1j}$ for all $i = 1, \ldots, r$ or $\deg(p_{ij}) < \deg(P)$ for all $i = 1, \ldots, r$. Choose $i_0 = 1$.
• Case 4: $H_1 \neq \emptyset$, and some $p_{i j_0}$ is not equivalent to $p_{i' j_0}$ for $i, i' \in H_{j_0 - 1}$. Choose $i_0$ to be the smallest $i \in H_{j_0 - 1}$ where $p_{i_0 j_0}$ has minimal degree over all $p_{i j_0}$ that are not equivalent to $p$.
• Case 5: $H_1 \neq \emptyset$ and all $p_{i j_0}$ are equivalent to $p_{i' j_0}$ for $i, i' \in H_{j_0 - 1}$. Choose $i_0 = \min H_{j_0 - 1}$.

In our construction, we must treat polynomials in $P$ with degree 1 differently than those of greater degree. For all $(v, w) \in \mathbb{Z}^{2d}$, set

$$z_{ij} = \begin{cases} w & \text{if } \deg(p_{ij}) = 1, \\ v & \text{otherwise.} \end{cases}$$

For a fixed $(v, w) \in \mathbb{Z}^{2d}$, $p_{ij}(u + z_{ij})$ equals $p_{ij}(u + v)$ or $p_{ij}(u + w)$, depending only on the degree of $p_{ij}$. Thus we view $p_{ij}(u + z_{ij})$ and $p_{ij}(u + w)$ as polynomials in $u$.
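The shifted polynomials $p_{ij}(u + v)$ and $p_{ij}(u + w)$ used in this construction can be explored in one variable. The sketch below assumes $d = 1$ and stores a polynomial as its list of integer coefficients with the constant term first; the helper names are illustrative and the convention that the zero polynomial has degree $-1$ is a choice made for this sketch. It checks two facts the construction relies on: a shift does not change the equivalence class of a polynomial of degree at least 1, and in one variable the difference $p(u + v) - p(u + w)$ drops exactly one degree when $v \neq w$.

```python
from math import comb


def trim(p):
    """Drop trailing zero coefficients (constant term is stored first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def degree(p):
    """Degree of a coefficient list; the zero polynomial gets -1 in this sketch."""
    return len(trim(p)) - 1


def shift(p, c):
    """Coefficients of the polynomial u -> p(u + c), via the binomial theorem."""
    out = [0] * len(p)
    for k, a in enumerate(p):
        for j in range(k + 1):
            out[j] += a * comb(k, j) * c ** (k - j)
    return trim(out)


def subtract(p, q):
    """Coefficients of p - q."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim([a - b for a, b in zip(p, q)])


def equivalent(p, q):
    """Same degree, and the difference has strictly smaller degree."""
    return degree(p) == degree(q) and degree(subtract(p, q)) < degree(p)


if __name__ == "__main__":
    p = [0, 1, 3]  # 3u^2 + u
    print(shift(p, 2), equivalent(p, shift(p, 2)))
    print(degree(subtract(shift(p, 2), shift(p, 5))))
```

In several variables the degree drop can fail on the zero set of a nontrivial polynomial in $(v, w)$, which is why the text works with almost all $(v, w)$.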
Given $(v, w) \in \mathbb{Z}^{2d}$, we define the new polynomial family

$$\bar{P}_{v,w} = \{p_{ij}(u + z_{ij}), \ p_{ij}(u + w) : i \in I_3, \ j = 1, \ldots, l\} \cup \{p_{ij}(u + w) : i \in I_2, \ j = 1, \ldots, l\}.$$

We relabel the family

$$\bar{P}_{v,w} = \{q_{v,w,h,j} : 1 \le h \le s, \ 1 \le j \le l\}$$

in the following manner. We label each row $p_{i1}(u + z_{i1}), \ldots, p_{il}(u + z_{il})$ and $p_{i1}(u + w), \ldots, p_{il}(u + w)$ as $q_{v,w,h,1}(u), \ldots, q_{v,w,h,l}(u)$ for some unique $1 \le h \le s$, where $p_{1j}(u + z_{1j}) = q_{v,w,1,j}(u)$ and $p_{i_0 j}(u + w) = q_{v,w,s,j}(u)$.

Since for each vector $(v, w)$ in $\mathbb{Z}^{2d}$, $p_{ij}(u + v)$, $p_{ij}(u + w)$, and $p_{ij}(u)$ are all equivalent, $\bar{P}_{v,w}$ and $P$ have identical column degrees, and $w_j(P) = w_j(\bar{P}_{v,w})$ for all $1 \le j \le l$ and $(v, w) \in \mathbb{Z}^{2d}$. By construction, the first row of $\bar{P}_{v,w}$ also contains a polynomial of maximal degree, and it is easy to check that $\bar{P}_{v,w}$ is an ED-set for each $(v, w)$ outside a set of zeros of finitely many polynomials. Hence, $\bar{P}_{v,w}$ is a standard polynomial family for almost all $(v, w) \in \mathbb{Z}^{2d}$.

Next, for each $(v, w) \in \mathbb{Z}^{2d}$ we define the new family

$$P_{v,w} = \{q_{v,w,h,j} - q_{v,w,s,j} : 1 \le h \le s - 1, \ 1 \le j \le l\}.$$

Example.

For P in our previous example on page 10, case (4) appliesand i = 2. It is easy to check that Q = P v,w with ( v, w ) = ( − , Lemma 4.2.

For each standard polynomial family $P$ where $\deg(P) \ge 2$ and $\deg(p_{11}) = \deg(P)$, $P_{v,w}$ is standard for almost all choices of $(v, w) \in \mathbb{Z}^{2d}$. Moreover, $C(P_{v,w}) \le C(P)$, and $\deg(P_{v,w})$ equals $\deg(P)$ or $\deg(P) - 1$.

Proof. Since each entry in $P_{v,w}$ is constructed by subtracting two polynomials from the same column of $\bar{P}_{v,w}$, the maximum degree in each column of $P_{v,w}$ cannot increase. Therefore $C(P_{v,w}) \le C(P)$ and $\deg(P_{v,w}) \le \deg(P)$. It is easy to check that $P_{v,w}$ is an ED-set whenever $\bar{P}_{v,w}$ is. We now show that the first row in $P_{v,w}$ contains a polynomial of maximal degree. We split into the five cases used to define $i_0$.

In Cases 1, 4, and 5, $p_{i_0 1}$ is not equivalent to $p_{11}$. When $p_{i_0 1}$ is not equivalent to $p_{11}$,

$$\deg(P_{v,w}) \ge \deg(q_{v,w,1,1} - q_{v,w,s,1}) = \deg(p_{11}) \ge \deg(P_{v,w}).$$

Thus, $\deg(q_{v,w,1,1} - q_{v,w,s,1}) = \deg(P_{v,w})$ and the first row in $P_{v,w}$ contains a polynomial of maximal degree.

In Case 2, $p_{i_0 j}$ is not equivalent to $p_{1j}$ for some $1 \le j \le l$ and the degree of either $p_{i_0 j}$ or $p_{1j}$ equals $\deg(P)$. So,

$$\deg(P_{v,w}) \ge \deg(q_{v,w,1,j} - q_{v,w,s,j}) = \deg(P) \ge \deg(P_{v,w}).$$

Thus, $\deg(q_{v,w,1,j} - q_{v,w,s,j}) = \deg(P_{v,w})$ and the first row in $P_{v,w}$ contains a polynomial of maximal degree.

In Case 3, all $p_{i1}$ are equivalent to $p_{11}$, and $i_0 = 1$. Thus,

$$\deg(q_{v,w,1,1} - q_{v,w,s,1}) = \deg\big(p_{11}(u + v) - p_{11}(u + w)\big) = \deg(P) - 1$$

for almost all $(v, w) \in \mathbb{Z}^{2d}$, since $\deg(p_{11}) \ge 2$. Let $j \in \{1, \ldots, l\}$. Then either $p_{ij}$ is equivalent to $p_{1j}$ for all $i = 1, \ldots, r$ or $\deg(p_{ij}) < \deg(P)$ for all $i = 1, \ldots, r$. When $p_{ij}$ is equivalent to $p_{1j}$, then

$$\deg(q_{v,w,h,j} - q_{v,w,s,j}) < \deg(p_{1j}) \le \deg(P).$$

When $\deg(p_{ij}) < \deg(P)$, $\deg(q_{v,w,h,j} - q_{v,w,s,j}) < \deg(P)$. Thus, all polynomials in $P_{v,w}$ have degree less than or equal to $\deg(P) - 1$, and for almost all $(v, w) \in \mathbb{Z}^{2d}$, $\deg(q_{v,w,1,1} - q_{v,w,s,1}) = \deg(P) - 1$. Therefore the first row in $P_{v,w}$ contains a polynomial of maximal degree.

In each case, the first row in $P_{v,w}$ contains a polynomial of maximal degree, and $\deg(P_{v,w})$ equals $\deg(P)$ in Cases 1, 2, 4, 5 and equals $\deg(P) - 1$ in Case 3. □

4.2. Reduction of Weight.

We now show that the above construc-tion leads to a reduction in the weights of our polynomial families.
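To make the bookkeeping concrete before the proof, the column weight of a single column and the ordering on weight vectors can be sketched as follows. This is an illustration under simplifying assumptions, not the construction of $P_{v,w}$ itself: polynomials are in one variable ($d = 1$) and stored as integer coefficient lists with the constant term first, so two polynomials of degree at least 1 are equivalent exactly when they share degree and leading coefficient. All function names are hypothetical.

```python
def trim(p):
    """Drop trailing zero coefficients (constant term is stored first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def degree(p):
    """Degree of a coefficient list; the zero polynomial gets -1 in this sketch."""
    return len(trim(p)) - 1


def subtract(p, q):
    """Coefficients of p - q."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim([a - b for a, b in zip(p, q)])


def column_weight(column, max_degree):
    """Return (w_1, ..., w_D): the number of equivalence classes of each degree.

    In one variable a class of degree >= 1 is determined by the pair
    (degree, leading coefficient); zero and constant entries are not counted.
    """
    classes = set()
    for p in column:
        d = degree(p)
        if d >= 1:
            classes.add((d, trim(p)[d]))
    return tuple(sum(1 for d, _ in classes if d == i) for i in range(1, max_degree + 1))


def weight_less(v, w):
    """v < w when they differ and v is smaller at the largest index where they differ."""
    return tuple(reversed(v)) < tuple(reversed(w))


if __name__ == "__main__":
    # Column entries u^2, u^2 + u, 2u^2, and a pivot entry u of minimal degree.
    column = [[0, 0, 1], [0, 1, 1], [0, 0, 2], [0, 1]]
    pivot = [0, 1]
    reduced = [subtract(p, pivot) for p in column[:-1]]
    before, after = column_weight(column, 2), column_weight(reduced, 2)
    print(before, after, weight_less(after, before))
```

Subtracting a pivot of minimal degree leaves the classes of higher degree intact and removes or lowers the class of the pivot, which is the pattern the proof below tracks column by column.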

Proposition 4.3.

For each $(v, w) \in \mathbb{Z}^{2d}$ and each standard polynomial family $P$ where $\deg(p_{11}) = \deg(P) \ge 2$, we have $W(P_{v,w}) < W(P)$.

Proof. We show that $W(P_{v,w}) < W(P)$ for each of our five cases used to define $i_0$. In Cases 1, 2, and 3, $p_{i_0 1}$ has minimal degree over all $p_{i1}$. For all $(v, w)$, the equivalence classes and their degrees in each column remain the same in $\bar{P}_{v,w}$ as in $P$. Thus, $w(P) = w(\bar{P}_{v,w})$. Column 1 of $P_{v,w}$ is comprised of polynomials $q_{v,w,h,1} - q_{v,w,s,1}$, where $q_{v,w,s,1}$ has minimal degree over all $q_{v,w,h,1}$. We now consider each equivalence class in column 1 of $\bar{P}_{v,w}$ as we pass from $\bar{P}_{v,w}$ to $P_{v,w}$. Each distinct equivalence class in column 1 of $\bar{P}_{v,w}$ not containing $q_{v,w,s,1}$ remains a distinct equivalence class of the same degree in column 1 of $P_{v,w}$. The equivalence class in column 1 containing $q_{v,w,s,1}$ splits into possibly several equivalence classes of lower degree. Thus, $w(P_{v,w})$

1. Thus, w ( P a ) counts the equivalence classes of polynomials from only thoserows of column a in P whose entries are zero in columns 1 , . . . , a − ≤ a ≤ j . If the h th row of ¯ P v,w has zeros in columns1 , . . . , a −

1, then q v,w,h,a = p ia ( u + v ) or p ia ( u + w ) where p ia ( u ) is apolynomial in P with i ∈ H a − . Moreover, for each i ∈ H a − , there issome row h of ¯ P v,w with zeros in columns 1 , . . . , a − q v,w,h,a = p ia ( u + w ). Thus, the equivalence classes in ¯ P v,w from only those rowsof column a whose entries are zero in columns 1 , . . . , a − P from only those rows of column a whoseentries are zero in columns 1 , . . . , a −

1. Thus, w ( ¯ P av,w ) = w ( P a ).Since i ∈ H j − , q v,w,s,j = 0 for all j = 1 , . . . , j −

1. So, for all j = 1 , . . . , j − q v,w,h,j − q v,w,s,j = q v,w,h,j . Thus the rows in ¯ P v,w (except the last) with zeros in columns 1 , . . . , a −

1, are the same asthe rows in P v,w with zeros in columns 1 , . . . , a − a < j , we have q v,w,s,j = 0, for all j = 1 , . . . , a . So theequivalence classes and their degrees in only those rows of column a whose entries are zero in columns 1 , . . . , a − P v,w and P v,w . Therefore, w ( P av,w ) = w ( ¯ P av,w ) = w ( P a ).When a = j , we have q v,w,s,a = 0. However, q v,w,s,a has minimaldegree over all q v,w,h,a where q v,w,h,a = 0 for all j = 1 , . . . , a −

1. Asbefore, each distinct equivalence class of such polynomials in column a of ¯ P v,w not containing q v,w,s,a , remains a distinct equivalence class ofthe same degree in column a of P v,w . The equivalence class in column a containing q v,w,s,a splits into possibly several equivalence classes of lowerdegree. Therefore, w ( P av,w ) < w ( P a ). Since, w ( P av,w ) = w ( P a ) for a = 1 , . . . , j − w ( P j v,w ) = w ( P j ), W ( P v,w ) < W ( P ). (cid:3) PET-Induction.

Proof of Proposition 4.1.

Let $P = \{p_{ij} : 1 \le i \le r,\ 1 \le j \le l\}$ be a standard polynomial family. For polynomial families of degree 1, the result is given by Proposition 3.1. Suppose $\deg(P) \ge 2$. Since $P$ is standard, by relabeling the transformations, we may assume that $\deg(p_{11}) = \deg(P)$. There are only finitely many column degrees $C(Q) < C(P)$ and weights $W(Q) < W(P)$ that correspond to families $Q = \{q_{ij} : 1 \le i \le s,\ 1 \le j \le l\}$ where $1 \le s \le r$ and $C(Q) \le C(P)$. Thus, we state our PET-induction hypothesis as follows. We assume that for all $1 \le s \le r$ there exists $k \in \mathbb{N}$ such that for all standard polynomial families $Q = \{q_{ij} : 1 \le i \le s,\ 1 \le j \le l\}$ where $C(Q) < C(P)$, or where $C(Q) \le C(P)$, $\deg(q_{11}) = \deg(Q)$, and $W(Q) < W(P)$, we have
\[
\limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \Big( \prod_{i=1}^{s} T_1^{q_{i1}(u)} \cdots T_l^{q_{il}(u)} b_i \Big) \Big\|_{L^2(\mu)} = 0,
\]
for any $b_1, \dots, b_r \in L^\infty(\mu)$ with $|||b_1|||_k = 0$, and for each Følner sequence $\{\Phi_N\}_{N=1}^{\infty}$ in $\mathbb{Z}^d$.

Now let $f_1, \dots, f_r \in L^\infty(\mu)$ where $|||f_1|||_k = 0$, and let $\{\Phi_N\}_{N=1}^{\infty}$ be a Følner sequence in $\mathbb{Z}^d$. Without loss of generality we may assume that $\|f_i\|_{L^\infty(\mu)} \le 1$ for all $1 \le i \le r$. By replacing each $f_i$ with $T_1^{c_1} \cdots T_l^{c_l} f_i$ for some $c_1, \dots, c_l \in \mathbb{Z}$, we may assume that each $p_{ij}$ has zero constant term. In particular, each polynomial in $P$ whose degree is 1 is linear. By Lemma 3.2 and the Cauchy-Schwarz inequality we have, for any finite set $F \subset \mathbb{Z}^d$,
\begin{align*}
&\limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \Big( \prod_{i=1}^{r} T_1^{p_{i1}(u)} \cdots T_l^{p_{il}(u)} f_i \Big) \Big\|_{L^2(\mu)}^2 \\
&\quad \le \limsup_{N \to \infty} \frac{1}{|F|^2} \sum_{v,w \in F} \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \int_X \prod_{i=1}^{r} T_1^{p_{i1}(u+v)} \cdots T_l^{p_{il}(u+v)} f_i \cdot \prod_{i=1}^{r} T_1^{p_{i1}(u+w)} \cdots T_l^{p_{il}(u+w)} f_i \, d\mu \\
&\quad \le \limsup_{N \to \infty} \frac{1}{|F|^2} \sum_{v,w \in F} \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \int_X \prod_{h=1}^{s} T_1^{q_{v,w,h,1}(u)} \cdots T_l^{q_{v,w,h,l}(u)} b_{v,w,h} \, d\mu \\
&\quad \le \frac{1}{|F|^2} \sum_{v,w \in F} \limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \prod_{h=1}^{s-1} T_1^{(q_{v,w,h,1} - q_{v,w,s,1})(u)} \cdots T_l^{(q_{v,w,h,l} - q_{v,w,s,l})(u)} b_{v,w,h} \Big\|_{L^2(\mu)}
\end{align*}
for $(v,w) \in \mathbb{Z}^{2d}$, where the $b_{v,w,h}$ represent any of the following bounded functions:

• $T_1^{p_{i1}(v - z_{i1})} \cdots T_l^{p_{il}(v - z_{il})} f_i$ for $i \in I_1$,
• $f_i \cdot T_1^{p_{i1}(v) - p_{i1}(w)} \cdots T_l^{p_{il}(v) - p_{il}(w)} f_i$ for $i \in I_2$.

Since $P$ has degree at least 2, $1 \in I_1$ and $b_{v,w,1} = T_1^{t_1} \cdots T_l^{t_l} f_1$ for some $t_1, \dots, t_l \in \mathbb{Z}$. Thus, for all $k \in \mathbb{N}$ and all $(v,w) \in \mathbb{Z}^{2d}$, $|||b_{v,w,1}|||_k = |||f_1|||_k$. Moreover, $P_{v,w} = \{q_{v,w,h,j} - q_{v,w,s,j} : 1 \le h \le s-1,\ 1 \le j \le l\}$ is a standard polynomial family where $1 \le s-1 \le r$ and $W(P_{v,w}) < W(P)$ for almost all $(v,w) \in \mathbb{Z}^{2d}$. We note that whenever $\deg(q_{v,w,1,1} - q_{v,w,s,1}) < \deg(P_{v,w})$, $C(P_{v,w}) < C(P)$. By the PET-induction hypothesis, for almost all choices of $(v,w) \in \mathbb{Z}^{2d}$, we have
\[
\limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \prod_{h=1}^{s-1} T_1^{(q_{v,w,h,1} - q_{v,w,s,1})(u)} \cdots T_l^{(q_{v,w,h,l} - q_{v,w,s,l})(u)} b_{v,w,h} \Big\|_{L^2(\mu)} = 0.
\]
For all other choices of $(v,w) \in \mathbb{Z}^{2d}$, the above average is bounded above by 1. Therefore,
\begin{align*}
&\limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \Big( \prod_{i=1}^{r} T_1^{p_{i1}(u)} \cdots T_l^{p_{il}(u)} f_i \Big) \Big\|_{L^2(\mu)}^2 \\
&\quad \le \inf_{F} \frac{1}{|F|^2} \sum_{v,w \in F} \limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \prod_{h=1}^{s-1} T_1^{(q_{v,w,h,1} - q_{v,w,s,1})(u)} \cdots T_l^{(q_{v,w,h,l} - q_{v,w,s,l})(u)} b_{v,w,h} \Big\|_{L^2(\mu)} = 0,
\end{align*}
where the infimum is taken over all finite subsets of $\mathbb{Z}^d$. □

Reduction to the standard case.
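The splitting $p_{ij}(u+v) = p_{ij}(u+z_{ij}) + p_{ij}(v-z_{ij})$ used in the proof that follows can be checked directly. The definition of $z_{ij}$ (page 12) is not restated in this section, so the lines below are only a sketch under the assumption that $z_{ij} = w$ when $p_{ij}$ is linear and $z_{ij} = v$ otherwise, together with the standing assumption that every $p_{ij}$ has zero constant term.

```latex
% Assumption (not restated in this section): z_{ij} = w if p_{ij} is linear,
% and z_{ij} = v otherwise; every p_{ij} satisfies p_{ij}(\mathbf{0}) = 0.
\begin{align*}
\text{$p_{ij}$ linear:} \quad
  & p_{ij}(u + z_{ij}) + p_{ij}(v - z_{ij})
    = p_{ij}(u + w) + p_{ij}(v - w)
    = p_{ij}(u + v), \\
\text{otherwise:} \quad
  & p_{ij}(u + z_{ij}) + p_{ij}(v - z_{ij})
    = p_{ij}(u + v) + p_{ij}(\mathbf{0})
    = p_{ij}(u + v).
\end{align*}
```

The first line uses only additivity of a linear polynomial with zero constant term; the second uses only $p_{ij}(\mathbf{0}) = 0$, which is why a nonlinear $p_{ij}$ must be paired with $z_{ij} = v$ under this reading.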

Proof of Proposition 2.10.

We now reduce the general case to one involving standard systems. Let $P = \{p_{ij} : 1 \le i \le r,\ 1 \le j \le l\}$ be a (nonstandard) ED-set of polynomials of degree less than $b$, let $f_1, \dots, f_r \in L^\infty(\mu)$, and let $\{\Phi_N\}_{N=1}^{\infty}$ be a Følner sequence in $\mathbb{Z}^d$. Once again, we assume that each polynomial in $P$ has zero constant term. In other words, $p_{ij}(\mathbf{0}) = 0$ for each polynomial $p_{ij}$ in $P$, where $\mathbf{0}$ is the zero vector in $\mathbb{Z}^d$. Thus, we have $p_{ij}(u+v) = p_{ij}(u+z_{ij}) + p_{ij}(v-z_{ij})$ for each polynomial in $P$, where $z_{ij}$ is defined as on page 12. By Lemma 3.2, there exists a Følner sequence $\{\Theta_N\}_{N=1}^{\infty}$ in $\mathbb{Z}^{3d}$ such that
\begin{align*}
&\limsup_{N \to \infty} \Big\| \frac{1}{|\Phi_N|} \sum_{u \in \Phi_N} \Big( \prod_{i=1}^{r} T_1^{p_{i1}(u)} \cdots T_l^{p_{il}(u)} f_i \Big) \Big\|_{L^2(\mu)}^2 \\
&\quad \le \limsup_{M \to \infty} \frac{1}{|\Theta_M|} \sum_{(u,v,w) \in \Theta_M} \int_X \prod_{i=1}^{r} T_1^{p_{i1}(u+v) + q(u)} \cdots T_l^{p_{il}(u+v)} f_i \cdot \prod_{i=1}^{r} T_1^{p_{i1}(u+w) + q(u)} \cdots T_l^{p_{il}(u+w)} f_i \, d\mu \\
&\quad \le \limsup_{M \to \infty} \Big\| \frac{1}{|\Theta_M|} \sum_{(u,v,w) \in \Theta_M} \prod_{i=1}^{r} T_1^{p_{i1}(u+z_{i1}) + q(u)} \cdots T_l^{p_{il}(u+z_{il})} \big( T_1^{p_{i1}(v-z_{i1})} \cdots T_l^{p_{il}(v-z_{il})} f_i \big) \cdot \prod_{i=1}^{r} T_1^{p_{i1}(u+w) + q(u)} \cdots T_l^{p_{il}(u+w)} f_i \Big\|_{L^2(\mu)},
\end{align*}
where $q : \mathbb{Z}^d \to \mathbb{Z}$ is any polynomial of degree $b$. Whether $z_{ij}$ equals $v$ or $w$ is determined only by the degree of $p_{ij}$, so each polynomial below is really only a polynomial in $u, v, w$. Thus the set
\[
\{ p_{i1}(u+z_{i1}) + q(u),\ p_{i1}(u+w) + q(u),\ p_{ij}(u+z_{ij}),\ p_{ij}(u+w) : 1 \le i \le r,\ 2 \le j \le l \}
\]
of polynomials from $\mathbb{Z}^{3d} \to \mathbb{Z}$ is a standard family of degree $b$. Thus there exists $k \in \mathbb{N}$ (that depends only on the original polynomial family $P$) such that
\[
\limsup_{M \to \infty} \Big\| \frac{1}{|\Theta_M|} \sum_{(u,v,w) \in \Theta_M} \prod_{i=1}^{r} T_1^{p_{i1}(u+z_{i1}) + q(u)} \cdots T_l^{p_{il}(u+z_{il})} \big( T_1^{p_{i1}(v-z_{i1})} \cdots T_l^{p_{il}(v-z_{il})} f_i \big) \cdot \prod_{i=1}^{r} T_1^{p_{i1}(u+w) + q(u)} \cdots T_l^{p_{il}(u+w)} f_i \Big\|_{L^2(\mu)} = 0.
\]
□

References

[Be] Bergelson, V. Weakly mixing PET. Erg. Th. and Dyn. Sys. (1987), 337-349.
[BL] Bergelson, V., Leibman, A. Polynomial extensions of van der Waerden's and Szemerédi's theorems. J. Amer. Math. Soc. (1996), 725-753.
[BMZ] Bergelson, V., McCutcheon, R., Zhang, Q. A Roth theorem for amenable groups. Amer. J. Math. (1997), 1173-1211.
[CL] Conze, J.P., Lesigne, E. Théorèmes ergodiques pour des mesures diagonales. Bull. Soc. Math. France (1984), 143-175.
[FrK] Frantzikinakis, N., Kra, B. Convergence of multiple ergodic averages for some commuting transformations. Erg. Th. and Dyn. Sys. (2005), 799-809.
[Fu1] Furstenberg, H. Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions. J. d'Analyse Math. (1977), 204-256.
[Fu2] Furstenberg, H. Recurrence in ergodic theory and combinatorial number theory. Princeton University Press, Princeton, NJ (1981).
[HK1] Host, B., Kra, B. Nonconventional ergodic averages and nilmanifolds. Annals of Math. (2005), 397-488.
[HK2] Host, B., Kra, B. Convergence of polynomial ergodic averages. Isr. J. Math. (2005), 1-19.
[Le1] Leibman, A. Pointwise convergence of ergodic averages for polynomial actions of $\mathbb{Z}^d$ by translation on a nilmanifold. Erg. Th. and Dyn. Sys. (2005), no. 1, 215-225.
[Le2] Leibman, A. Convergence of multiple ergodic averages along polynomials of several variables. Isr. J. Math. (2005), 303-315.
[Ta] Tao, T. Norm convergence of multiple ergodic averages for commuting transformations. Erg. Th. and Dyn. Sys.