Nonconventional ergodic averages and multiple recurrence for von Neumann dynamical systems

Abstract

The Furstenberg recurrence theorem (or equivalently, Szemerédi's theorem) can be formulated in the language of von Neumann algebras as follows: given an integer $k\geq 2$, an abelian finite von Neumann algebra $(\mathcal{M},\tau)$ with an automorphism $\alpha:\mathcal{M}\to\mathcal{M}$, and a non-negative $a\in\mathcal{M}$ with $\tau(a)>0$, one has $\liminf_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\operatorname{Re}\tau(a\,\alpha^{n}(a)\cdots\alpha^{(k-1)n}(a))>0$; a subsequent result of Host and Kra shows that this limit exists. In particular, $\operatorname{Re}\tau(a\,\alpha^{n}(a)\cdots\alpha^{(k-1)n}(a))>0$ for all $n$ in a set of positive density. From the von Neumann algebra perspective, it is thus natural to ask to what extent these results remain true when the abelian hypothesis is dropped. All three claims hold for $k=2$, and we show in this paper that all three claims hold for all $k$ when the von Neumann algebra is asymptotically abelian, and that the last two claims hold for $k=3$ when the von Neumann algebra is ergodic. However, we show that the first claim can fail for $k=3$ even with ergodicity, the second claim can fail for $k\geq 4$ even assuming ergodicity, and the third claim can fail for $k=3$ without ergodicity, or for odd $k\geq 5$ assuming ergodicity. The second claim remains open for non-ergodic systems with $k=3$, and the third claim remains open for ergodic systems with $k=4$.


TIM AUSTIN, TANJA EISNER, AND TERENCE TAO

Contents

1. Introduction
   1.1. Multiple recurrence
   1.2. Non-commutative analogues
   1.3. Positive results
   1.4. Negative results
2. Counterexamples
   2.1. Non-convergence for k ≥ 4
   2.2. Negative averages for k = 3
   2.3. Negative trace for k = 5
3. Inclusions of finite von Neumann dynamical systems
4. The case of asymptotically abelian systems
5. Triple averages for non-asymptotically-abelian systems
6. Closing remarks
Appendix A. An application of the van der Corput lemma
Appendix B. A group theory construction
   B.1. Applications
References

T.A. is supported by a fellowship from Microsoft Corporation. T.E. is supported by the European Social Fund and by the Ministry of Science, Research and the Arts Baden-Württemberg. T.T. is supported by NSF grant DMS-0649473 and a grant from the MacArthur Foundation.

1. Introduction

1.1. Multiple recurrence.

Let $(X,\mathcal{X},\mu)$ be a probability space, and let $T:X\to X$ be a measure-preserving invertible transformation on $X$ (i.e. $T$ and $T^{-1}$ are both measurable, and $\mu(T(A))=\mu(A)$ for all measurable $A$). From the mean ergodic theorem we know that for any $f\in L^\infty(X)$, the averages $\frac{1}{N}\sum_{n=1}^{N}f\circ T^{-n}$ converge in (say) $L^2(X)$ norm, which implies in particular that the averages $\frac{1}{N}\sum_{n=1}^{N}\int_X f_0\,(f_1\circ T^{-n})\,d\mu$ converge for all $f_0,f_1\in L^\infty(X)$. (The minus sign here is not of particular significance, other than to conform to some minor notational conventions, and can be ignored in the sequel if desired.) Furthermore, if $f_0=f_1=f$ is non-negative with positive mean $\int_X f\,d\mu>0$, then the Poincaré recurrence theorem implies that this latter limit is strictly positive. In particular, this implies that the mean $\int_X f\,(f\circ T^{-n})\,d\mu$ is positive for all natural numbers $n$ in a set $E\subset\mathbb{N}$ of positive (lower) density (which means that $\liminf_{N\to\infty}\frac{1}{N}|\{1\le n\le N: n\in E\}|>0$).

Theorem 1.1 (Abelian multiple recurrence). Let $(X,\mathcal{X},\mu)$ be a probability space, let $k\ge 2$ be an integer, and let $T:X\to X$ be a measure-preserving invertible transformation.
• (Convergence in norm) For any $f_1,\dots,f_{k-1}\in L^\infty(X)$, the averages
$$\frac{1}{N}\sum_{n=1}^{N}(f_1\circ T^{-n})\cdots(f_{k-1}\circ T^{-(k-1)n})$$
converge in $L^2(X)$ norm as $N\to\infty$.
• (Weak convergence) For any $f_0,f_1,\dots,f_{k-1}\in L^\infty(X)$, the averages
$$\frac{1}{N}\sum_{n=1}^{N}\int_X f_0\,(f_1\circ T^{-n})\cdots(f_{k-1}\circ T^{-(k-1)n})\,d\mu$$
converge as $N\to\infty$.

• (Recurrence on average) For any non-negative $f\in L^\infty(X)$ with $\int_X f\,d\mu>0$, one has
$$\liminf_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\int_X f\,(f\circ T^{-n})\cdots(f\circ T^{-(k-1)n})\,d\mu>0. \tag{1}$$
• (Recurrence on a dense set) For any non-negative $f\in L^\infty(X)$ with $\int_X f\,d\mu>0$, one has
$$\int_X f\,(f\circ T^{-n})\cdots(f\circ T^{-(k-1)n})\,d\mu>c>0 \tag{2}$$
for some $c>0$ and all $n$ in a set of natural numbers of positive lower density.

We have called this result the "abelian" multiple recurrence theorem to emphasise the abelian nature of the algebra $L^\infty(X)$.

Remarks. Clearly, convergence in norm implies weak convergence; also, as the quantities in (2) are bounded and non-negative, recurrence on average implies recurrence on a dense set. Using the weak convergence result, the limit inferior in (1) can be replaced with a limit, but we have retained the limit inferior in order to keep the two claims logically independent of each other. As mentioned earlier, the $k=2$ cases of Theorem 1.1 follow from classical ergodic theorems. Furstenberg [15] established recurrence on average (and hence recurrence on a dense set) for all $k$, and observed that this result was equivalent (by what is now known as the Furstenberg correspondence principle) to Szemerédi's famous theorem [35] on arithmetic progressions, thus providing an important new proof of that theorem. Convergence in norm (and hence in mean) was established for $k=3$ by Furstenberg [15], for $k=4$ by Conze and Lesigne [8], [9], [10] (assuming total ergodicity) and by Host and Kra [22] (in general), for $k=5$ in some cases by Ziegler [40], and for all $k$ by Host and Kra [23] (and subsequently also by Ziegler [41]). See [28] for a survey of these results, and their relation to other topics such as dynamics of nilsequences, and arithmetic progressions in number-theoretic sets such as the primes. ◁

There is also a multidimensional generalisation of the above results to multiple commuting shifts:

Theorem 1.2 (Abelian multidimensional multiple recurrence). Let $(X,\mathcal{X},\mu)$ be a probability space, let $k\ge 2$ be an integer, and let $T_0,\dots,T_{k-1}:X\to X$ be a commuting system of measure-preserving invertible transformations.
• (Convergence in norm) For any $f_1,\dots,f_{k-1}\in L^\infty(X)$, the averages
$$\frac{1}{N}\sum_{n=1}^{N}\big((f_1\circ T_1^{-n})\cdots(f_{k-1}\circ T_{k-1}^{-n})\big)\circ T_0^{n}$$
converge in $L^2(X)$ norm.
• (Weak convergence) For any $f_0,f_1,\dots,f_{k-1}\in L^\infty(X)$, the averages
$$\frac{1}{N}\sum_{n=1}^{N}\int_X (f_0\circ T_0^{-n})(f_1\circ T_1^{-n})\cdots(f_{k-1}\circ T_{k-1}^{-n})\,d\mu$$
converge.
• (Recurrence on average) For any non-negative $f\in L^\infty(X)$ with $\int_X f\,d\mu>0$, one has
$$\liminf_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\int_X (f\circ T_0^{-n})(f\circ T_1^{-n})\cdots(f\circ T_{k-1}^{-n})\,d\mu>0. \tag{3}$$
• (Recurrence on a dense set) For any non-negative $f\in L^\infty(X)$ with $\int_X f\,d\mu>0$, one has
$$\int_X (f\circ T_0^{-n})(f\circ T_1^{-n})\cdots(f\circ T_{k-1}^{-n})\,d\mu>c>0 \tag{4}$$
for some $c>0$ and all $n$ in a set of natural numbers of positive lower density.

Of course, Theorem 1.1 is the special case of Theorem 1.2 when $T_i:=T^i$. It is often customary to normalise $T_0$ to be the identity transformation (by replacing each of the $T_i$ with $T_0^{-1}T_i$).

Remarks. The $k=2$ case is again classical. Recurrence on average (and hence on a dense set) in this theorem was established for all $k$ by Furstenberg and Katznelson [16], which by the Furstenberg correspondence principle implies a multidimensional version of Szemerédi's theorem, a combinatorial proof of which in full generality has only been obtained relatively recently in [30] and [20]. Convergence in norm (and weak convergence) was established for $k=3$ in [8], for some special cases of $k=4$ in [39], for all $k$ assuming total ergodicity in [14], and for all $k$ unconditionally in [36] (with subsequent proofs in [37], [1], [21]). The results can fail if the shifts $T_0,\dots,T_{k-1}$ do not commute [5].
Note that non-commutativity of the shifts should not be confused with the non-commutativity of the underlying algebra, which is the focus of the current paper. ◁

1.2. Non-commutative analogues.

From the perspective of the theory of von Neumann algebras, the space $L^\infty(X)$ appearing in the above theorems can be interpreted as an abelian von Neumann algebra, with a finite trace $\tau(f):=\int_X f\,d\mu$, and with an automorphism $T:L^\infty(X)\to L^\infty(X)$ defined by $Tf:=f\circ T^{-1}$. It is then natural to ask whether the above results can be extended to non-abelian settings. More precisely, we recall the following definitions.

Definition 1.3 (Non-commutative systems). A finite von Neumann algebra is a pair $(\mathcal{M},\tau)$, where $\mathcal{M}$ is a von Neumann algebra (i.e. an algebra of bounded operators on a separable complex Hilbert space that contains the identity, is closed under adjoints, and is closed in the weak operator topology), and $\tau:\mathcal{M}\to\mathbb{C}$ is a finite faithful trace (i.e. a linear map with $\tau(a^*)=\overline{\tau(a)}$, $\tau(ab)=\tau(ba)$, and $\tau(a^*a)\ge 0$ for all $a,b\in\mathcal{M}$, with $\tau(a^*a)=0$ if and only if $a=0$, and $\tau(1)=1$). The operator norm of an element $a\in\mathcal{M}$ is denoted $\|a\|$. We say that an element $a\in\mathcal{M}$ is non-negative if one has $a=b^*b$ for some $b\in\mathcal{M}$. An element $a\in\mathcal{M}$ is central if one has $ab=ba$ for all $b\in\mathcal{M}$. The set of all central elements is denoted $Z(\mathcal{M})$ and referred to as the centre of $\mathcal{M}$; the algebra $\mathcal{M}$ is abelian if $Z(\mathcal{M})=\mathcal{M}$. A shift $\alpha$ on a finite von Neumann algebra $(\mathcal{M},\tau)$ is a trace-preserving $*$-automorphism, i.e. $\alpha$ is an algebra isomorphism such that $\alpha(a^*)=\alpha(a)^*$ and $\tau(\alpha(a))=\tau(a)$ for all $a\in\mathcal{M}$. We say that the shift is ergodic if the invariant algebra $\{a\in\mathcal{M}:\alpha(a)=a\}$ consists only of the constants $\mathbb{C}$. We refer to the triple $(\mathcal{M},\tau,\alpha)$ as a von Neumann $\mathbb{Z}$-system, or a von Neumann dynamical system. More generally, if $\alpha_0,\dots,\alpha_{k-1}$ are $k$ commuting shifts on $\mathcal{M}$, we refer to $(\mathcal{M},\tau,\alpha_0,\dots,\alpha_{k-1})$ as a von Neumann $\mathbb{Z}^k$-system.

(In our applications, the hypothesis of separability can be omitted, since one can always pass to the separable subalgebra generated by a finite collection $a_0,\dots,a_{k-1}$ of elements and their shifts if desired.)

It is easy to verify that if $(X,\mathcal{X},\mu)$ is a (classical) probability space with a shift $T:X\to X$, then $(L^\infty(X),\int_X\cdot\,d\mu,\circ T^{-1})$ is an (abelian example of a) von Neumann dynamical system, and more generally if $T_0,\dots,T_{k-1}:X\to X$ are commuting shifts, then $(L^\infty(X),\int_X\cdot\,d\mu,\circ T_0^{-1},\dots,\circ T_{k-1}^{-1})$ is an abelian example of a von Neumann $\mathbb{Z}^k$-system. In fact, all abelian von Neumann dynamical systems arise (up to isomorphism of the algebras) as such examples; see Kadison and Ringrose [26, Chapter 5].

A finite von Neumann algebra $(\mathcal{M},\tau)$ gives rise to an inner product $\langle a,b\rangle:=\tau(a^*b)$ on $\mathcal{M}$; the properties of the trace ensure that this inner product is positive definite. (We use the convention that the inner product is conjugate linear in the first coordinate.) The Hilbert space completion of $\mathcal{M}$ with respect to this inner product will be referred to as $L^2(\tau)$. Note that $\alpha$ extends to a unitary transformation on $L^2(\tau)$. In the abelian case when $\mathcal{M}=L^\infty(X,\mathcal{X},\mu)$, $L^2(\tau)$ can be canonically identified with $L^2(X,\mathcal{X},\mu)$.

Inspired by Theorems 1.1 and 1.2, we now make the following definitions:

Definition 1.4 (Non-commutative recurrence and convergence). Let $k\ge 2$ be an integer, $(\mathcal{M},\tau,\alpha)$ be a von Neumann dynamical system, and $(\mathcal{M},\tau,\alpha_0,\dots,\alpha_{k-1})$ be a von Neumann $\mathbb{Z}^k$-system.
• We say that $(\mathcal{M},\tau,\alpha)$ enjoys order $k$ convergence in norm if for any $a_1,\dots,a_{k-1}\in\mathcal{M}$, the averages
$$\frac{1}{N}\sum_{n=1}^{N}(\alpha^n(a_1))(\alpha^{2n}(a_2))\cdots(\alpha^{(k-1)n}(a_{k-1}))$$
converge in $L^2(\tau)$ as $N\to\infty$.
• We say that $(\mathcal{M},\tau,\alpha)$ enjoys order $k$ weak convergence if for any $a_0,a_1,\dots,a_{k-1}\in\mathcal{M}$, the averages
$$\frac{1}{N}\sum_{n=1}^{N}\tau\big(a_0(\alpha^n(a_1))(\alpha^{2n}(a_2))\cdots(\alpha^{(k-1)n}(a_{k-1}))\big)$$
converge as $N\to\infty$.
• We say that $(\mathcal{M},\tau,\alpha)$ enjoys order $k$ recurrence on average if for any non-negative $a\in\mathcal{M}$ with $\tau(a)>0$ one has
$$\liminf_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\operatorname{Re}\tau\big(a(\alpha^n(a))(\alpha^{2n}(a))\cdots(\alpha^{(k-1)n}(a))\big)>0. \tag{5}$$
• We say that $(\mathcal{M},\tau,\alpha)$ enjoys order $k$ recurrence on a dense set if for any non-negative $a\in\mathcal{M}$ with $\tau(a)>0$ one has
$$\operatorname{Re}\tau\big(a(\alpha^n(a))(\alpha^{2n}(a))\cdots(\alpha^{(k-1)n}(a))\big)>c>0 \tag{6}$$
for some $c>0$ and all $n$ in a set of natural numbers of positive lower density.
• We say that $(\mathcal{M},\tau,\alpha_0,\dots,\alpha_{k-1})$ enjoys convergence in norm if for any $a_1,\dots,a_{k-1}\in\mathcal{M}$, the averages
$$\frac{1}{N}\sum_{n=1}^{N}\alpha_0^{-n}\big((\alpha_1^n(a_1))(\alpha_2^n(a_2))\cdots(\alpha_{k-1}^n(a_{k-1}))\big)$$
converge in $L^2(\tau)$ as $N\to\infty$.
• We say that $(\mathcal{M},\tau,\alpha_0,\dots,\alpha_{k-1})$ enjoys weak convergence if for any $a_0,a_1,\dots,a_{k-1}\in\mathcal{M}$, the averages
$$\frac{1}{N}\sum_{n=1}^{N}\tau\big((\alpha_0^n(a_0))(\alpha_1^n(a_1))(\alpha_2^n(a_2))\cdots(\alpha_{k-1}^n(a_{k-1}))\big)$$
converge as $N\to\infty$.
• We say that $(\mathcal{M},\tau,\alpha_0,\dots,\alpha_{k-1})$ enjoys recurrence on average if for any non-negative $a\in\mathcal{M}$ with $\tau(a)>0$ one has
$$\liminf_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\operatorname{Re}\tau\big((\alpha_0^n(a))(\alpha_1^n(a))\cdots(\alpha_{k-1}^n(a))\big)>0. \tag{7}$$
• We say that $(\mathcal{M},\tau,\alpha_0,\dots,\alpha_{k-1})$ enjoys recurrence on a dense set if for any non-negative $a\in\mathcal{M}$ with $\tau(a)>0$ one has
$$\operatorname{Re}\tau\big((\alpha_0^n(a))(\alpha_1^n(a))\cdots(\alpha_{k-1}^n(a))\big)>c>0 \tag{8}$$
for some $c>0$ and all $n$ in a set of natural numbers of positive lower density.

Remark. As before, we may normalise $\alpha_0$ to be the identity. Of course, the first four properties here are nothing more than the specialisations of the last four to the case $\alpha_i=\alpha^i$ for $0\le i\le k-1$. The real part is needed in (5), (6), (7), (8) because there is no necessity for the traces here to be real-valued (the difficulty being that the product of two non-negative elements of a non-abelian von Neumann algebra need not remain non-negative). In the case of (5), one can omit the real part by taking averages from $-N$ to $N$, since one has the symmetry
$$\overline{\tau\big(a(\alpha^n(a))\cdots(\alpha^{(k-1)n}(a))\big)}=\tau\big((a(\alpha^n(a))\cdots(\alpha^{(k-1)n}(a)))^*\big)=\tau\big((\alpha^{(k-1)n}(a))\cdots(\alpha^n(a))\,a\big)=\tau\big(a(\alpha^{-n}(a))\cdots(\alpha^{-(k-1)n}(a))\big)$$
for any self-adjoint $a$.

Note however that it is quite possible for the expressions in (6) and (8) to be negative even when $a$ is non-negative. Because of this, while recurrence on average still implies recurrence on a dense set, the converse is not true; one can have recurrence on a dense set but end up with a zero or even negative average due to the presence of large negative values of (6) or (8). We will see examples of this later in this paper. ◁
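The failure of positivity described in this remark already occurs in a two-dimensional toy system, which is not one of the paper's examples. The sketch below assumes $\mathcal{M}=M_2(\mathbb{C})$ with the normalised trace $\tau=\operatorname{tr}/2$, and a hypothetical shift $\alpha$ given by conjugation by the rotation through $\pi/3$; the element $a$ is the projection onto the first coordinate axis, so $a\ge 0$ and $\tau(a)=1/2$. It evaluates the $k=2$ quantity $\tau(a\,\alpha(a))=1/8$ and the $k=3$ quantity $\tau(a\,\alpha(a)\,\alpha^2(a))=-1/16$. Only real matrices occur, so plain nested lists suffice.

```python
import math

def matmul(x, y):
    """Product of two 2x2 real matrices stored as nested lists."""
    return [[sum(x[i][m] * y[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def tau(x):
    """Normalised trace on M_2, so that tau(identity) = 1."""
    return (x[0][0] + x[1][1]) / 2

def alpha_power(a, n, theta=math.pi / 3):
    """alpha^n(a) = R^n a R^{-n}, with R the rotation through theta (R^{-1} = R^T)."""
    c, s = math.cos(n * theta), math.sin(n * theta)
    rot, rot_inv = [[c, -s], [s, c]], [[c, s], [-s, c]]
    return matmul(matmul(rot, a), rot_inv)

# a = projection onto the x-axis: non-negative, with tau(a) = 1/2.
a = [[1.0, 0.0], [0.0, 0.0]]

pair = tau(matmul(a, alpha_power(a, 1)))                                # k = 2 term, n = 1
triple = tau(matmul(matmul(a, alpha_power(a, 1)), alpha_power(a, 2)))   # k = 3 term, n = 1
print(round(pair, 6), round(triple, 6))  # 0.125 -0.0625
```

The pair value is non-negative, as it must be, since $\tau(bc)\ge 0$ whenever $b,c$ are non-negative; the triple value is negative because $a\,\alpha(a)$ is no longer a non-negative element.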

Remark. As mentioned earlier, the Furstenberg correspondence principle equates recurrence results with combinatorial statements (such as Szemerédi's theorem) which can be formulated in a purely finitary fashion. However, we do not know whether the same is true for non-commutative recurrence results. Formulating a finitary statement that would imply recurrence results for some non-abelian von Neumann dynamical system probably requires some quite strong approximate embeddability of the system into finite-dimensional matrix algebras with approximate shifts, together with a recurrence assertion for such finite-dimensional systems in which the various parameters may all be chosen independent of the dimension. Since many of the results we prove below in the infinitary setting are negative anyway, we will not pursue this issue here. ◁

The study of these properties (and related topics) for von Neumann dynamical systems has been pursued by Niculescu, Ströh and Zsidó [31], Duvenhage [11], Beyers, Duvenhage and Ströh [6], and Fidaleo [13]. A variant of these questions, in which one averages over a higher-dimensional range of shifts, was also studied in [12]. In this paper we shall develop further positive and negative results regarding these properties, which we now present.

1.3. Positive results.

We first remark that when $k=2$, all systems enjoy norm and weak convergence, as well as recurrence on average and on a dense set, thanks to the ergodic theorem for von Neumann algebras (see e.g. [29, Section 9.1]). Indeed, from that theorem, we know that for any von Neumann dynamical system $(\mathcal{M},\tau,\alpha)$ and $a\in\mathcal{M}$, the averages $\frac{1}{N}\sum_{n=1}^{N}\alpha^n(a)$ converge in $L^2(\tau)$ to the orthogonal projection $Pa$ of $a$ to the invariant space $L^2(\tau)^\alpha:=\{f\in L^2(\tau):\alpha(f)=f\}$, giving the convergence results. If $a$ is non-negative with $\tau(a)>0$, this projection has a positive inner product with $a$ (indeed $\langle a,Pa\rangle=\|Pa\|_{L^2(\tau)}^2\ge|\langle 1,Pa\rangle|^2=\tau(a)^2>0$, since $1$ is invariant), giving the recurrence results.

Now we consider the cases $k\ge 3$. We have already seen from Theorems 1.1 and 1.2 that we have convergence and recurrence in those abelian systems arising from ergodic theory, and have recalled above that in fact these include all examples (up to isomorphism).
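The $k=2$ argument can be watched in a small example. The sketch below is an illustrative assumption, not an example from the paper: it again takes $\mathcal{M}=M_2(\mathbb{C})$ with $\tau=\operatorname{tr}/2$ and a hypothetical shift $\alpha$ given by conjugation by the rotation through $\pi/3$. For the projection $a$ onto the first axis, the averages $\frac{1}{N}\sum_{n=1}^{N}\alpha^n(a)$ approach $\frac12\cdot 1$, which here is the orthogonal projection of $a$ onto the invariant space (spanned by the identity and the rotation generator, along which $a$ has no component), and the inner product $\langle a,\tfrac12\cdot 1\rangle=\tau(a)/2=1/4$ is positive, as the recurrence argument requires.

```python
import math

def matmul(x, y):
    """Product of two 2x2 real matrices stored as nested lists."""
    return [[sum(x[i][m] * y[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def tau(x):
    """Normalised trace on M_2."""
    return (x[0][0] + x[1][1]) / 2

def alpha_power(a, n, theta=math.pi / 3):
    """alpha^n(a) = R^n a R^{-n} for the rotation R through theta."""
    c, s = math.cos(n * theta), math.sin(n * theta)
    return matmul(matmul([[c, -s], [s, c]], a), [[c, s], [-s, c]])

def ergodic_average(a, N):
    """Entrywise (1/N) * sum_{n=1}^{N} alpha^n(a)."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for n in range(1, N + 1):
        term = alpha_power(a, n)
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return [[total[i][j] / N for j in range(2)] for i in range(2)]

a = [[1.0, 0.0], [0.0, 0.0]]      # non-negative, tau(a) = 1/2
avg = ergodic_average(a, 3000)    # close to (1/2) * identity
inner = tau(matmul(a, avg))       # <a, avg> = tau(a^* avg), with a self-adjoint
print(round(inner, 6))            # 0.25
```

Because this shift is periodic (of period 3 on $a$), averaging over a multiple of 3 terms reproduces the limit up to rounding error.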

Proposition 1.5.

Let $k\ge 2$. If $(\mathcal{M},\tau,\alpha)$ is an abelian von Neumann dynamical system, then $(\mathcal{M},\tau,\alpha)$ enjoys weak convergence and convergence in norm, and recurrence on average and on a dense set. More generally, if $(\mathcal{M},\tau,\alpha_0,\dots,\alpha_{k-1})$ is an abelian von Neumann $\mathbb{Z}^k$-system, then this $\mathbb{Z}^k$-system enjoys weak convergence and convergence in norm, and recurrence on average and on a dense set.

We now generalise these results to the wider class of asymptotically abelian systems.

Definition 1.6 (Asymptotic abelianness). A von Neumann dynamical system $(\mathcal{M},\tau,\alpha)$ is asymptotically abelian if one has
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\|[\alpha^n(a),b]\|_{L^2(\tau)}=0$$
for all $a,b\in\mathcal{M}$, where $[a,b]:=ab-ba$ is the commutator.

Remark. In previous literature such as [6], a stronger version of asymptotic abelianness is assumed, in which the $L^2(\tau)$ norm is replaced by the operator norm. Variants of this type of "topological asymptotic abelianness", and their relationship with non-commutative topological weak mixing, have also been considered in [27]. ◁

Our work also singles out this case as special, since the assumption of asymptotic abelianness seems to be essential for the correct working of some of the chief technical tools taken from the commutative setting (particularly the van der Corput estimate). In the previous works [31], [6], [11], convergence and recurrence were shown for all orders $k$ for asymptotically abelian systems under some additional assumptions such as weak mixing or compactness. Our first main result shows that in fact all asymptotically abelian systems enjoy convergence and recurrence.

Theorem 1.7.

Let $k\ge 2$. If $(\mathcal{M},\tau,\alpha)$ is an asymptotically abelian von Neumann dynamical system, then $(\mathcal{M},\tau,\alpha)$ enjoys weak convergence and convergence in norm, and recurrence on average and on a dense set. More generally, if $(\mathcal{M},\tau,\alpha_0,\dots,\alpha_{k-1})$ is a von Neumann $\mathbb{Z}^k$-system, and the shifts $\alpha_i\alpha_j^{-1}$ for $i\neq j$ are each individually asymptotically abelian, then this $\mathbb{Z}^k$-system enjoys weak convergence and convergence in norm, and recurrence on average and on a dense set.

Theorem 1.7 is deduced from the genuinely abelian case (Proposition 1.5) using two results. The first is essentially from [6] or [11], which considered the model case $\alpha_i=\alpha^i$; for the sake of completeness, we present a proof in Appendix A.

Theorem 1.8 (Multiple ergodic averages for relatively weakly mixing extensions). Let $(\mathcal{M},\tau,\alpha_0,\dots,\alpha_{k-1})$ be a von Neumann $\mathbb{Z}^k$-system, and let $\mathcal{N}$ be a von Neumann subalgebra of $\mathcal{M}$ which is invariant under all of the $\alpha_i$. If for any distinct $0\le i,j\le k-1$ the shift $\alpha_i\alpha_j^{-1}$ is asymptotically abelian and weakly mixing relative to $\mathcal{N}$, then for all $a_1,\dots,a_{k-1}\in\mathcal{M}$ the associated multiple ergodic averages satisfy
$$\Big\|\frac{1}{N}\sum_{n=1}^{N}\alpha_0^{-n}\prod_{i=1}^{k-1}\alpha_i^n(a_i)-\frac{1}{N}\sum_{n=1}^{N}\alpha_0^{-n}\prod_{i=1}^{k-1}\alpha_i^n(E_{\mathcal{N}}(a_i))\Big\|_{L^2(\tau)}\to 0$$
as $N\to\infty$, where $E_{\mathcal{N}}:\mathcal{M}\to\mathcal{N}$ is the conditional expectation constructed from $\tau$, and the products are taken from left to right.

We will recall the notions of relative weak mixing and conditional expectation in Section 3.

The second result, which is new and may have other applications elsewhere, can be viewed as a partial analogue of the Furstenberg-Zimmer structure theorem [17] for asymptotically abelian systems.
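The Cesàro average in Definition 1.6 can be computed directly in finite dimensions. The sketch below is a hypothetical toy, not taken from the paper: $\mathcal{M}=M_2(\mathbb{C})$, $\tau=\operatorname{tr}/2$, and $\alpha$ is conjugation by the rotation through $\pi/3$. With $a=b$ the projection onto the first axis, the averages of $\|[\alpha^n(a),b]\|_{L^2(\tau)}$ settle at $\sqrt3/6\neq 0$, so this system is not asymptotically abelian; with $b$ central (here the identity) every commutator vanishes.

```python
import math

def matmul(x, y):
    """Product of two 2x2 real matrices stored as nested lists."""
    return [[sum(x[i][m] * y[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def alpha_power(a, n, theta=math.pi / 3):
    """alpha^n(a) = R^n a R^{-n} for the rotation R through theta."""
    c, s = math.cos(n * theta), math.sin(n * theta)
    return matmul(matmul([[c, -s], [s, c]], a), [[c, s], [-s, c]])

def commutator(x, y):
    """[x, y] = xy - yx."""
    xy, yx = matmul(x, y), matmul(y, x)
    return [[xy[i][j] - yx[i][j] for j in range(2)] for i in range(2)]

def l2_norm(x):
    """||x||_{L^2(tau)} = sqrt(tau(x^* x)) = sqrt(sum of x_ij^2 / 2) for real x and tau = tr/2."""
    return math.sqrt(sum(x[i][j] ** 2 for i in range(2) for j in range(2)) / 2)

def cesaro_commutator(a, b, N):
    """(1/N) * sum_{n=1}^{N} ||[alpha^n(a), b]||_{L^2(tau)}, as in Definition 1.6."""
    return sum(l2_norm(commutator(alpha_power(a, n), b)) for n in range(1, N + 1)) / N

a = [[1.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(round(cesaro_commutator(a, a, 3000), 6))         # 0.288675, i.e. sqrt(3)/6
print(round(cesaro_commutator(a, identity, 3000), 6))  # 0.0
```

Any shift on a finite-dimensional non-abelian algebra that is periodic, as here, behaves in this way; the asymptotically abelian systems of interest are genuinely infinite-dimensional.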

Theorem 1.9 (Structure theorem for asymptotically abelian systems). If $(\mathcal{M},\tau,\alpha)$ is an asymptotically abelian von Neumann dynamical system, then $\alpha$ is weakly mixing relative to the centre $Z(\mathcal{M})\subset\mathcal{M}$.

Remark. In the case when $\mathcal{M}$ is a factor (i.e. the centre is trivial), results of this nature (with a slightly different notion of mixing, and of asymptotic abelianness) were established in [7, Example 4.3.24].

These results quickly imply Theorem 1.7. Indeed, when studying (for instance) convergence in norm for a $\mathbb{Z}^k$-system, one can use Theorem 1.9 followed by Theorem 1.8 to replace each of the $a_1,\dots,a_{k-1}$ by their conditional expectations $E_{Z(\mathcal{M})}(a_1),\dots,E_{Z(\mathcal{M})}(a_{k-1})$ without any effect on the convergence, at which point one can apply Proposition 1.5. (Note that the centre $Z(\mathcal{M})$ does not depend on which shift $\alpha_i^{-1}\alpha_j$ one is analysing.) The other claims are similar (using Lemma 3.1 to ensure that if $a$ is non-negative with positive trace, then so is the conditional expectation $E_{Z(\mathcal{M})}(a)$).

Remark. The above arguments in fact show a more quantitative statement: if $a$ is non-negative with $\|a\|\le 1$ and $\tau(a)\ge\delta$ for some $0<\delta\le 1$, then one has the same lower bound $c(k,\delta)>0$ for the averages in (5) as is available in (1) for functions $f$ with $\|f\|_{L^\infty(X)}\le 1$ and $\int_X f\,d\mu\ge\delta$ (in particular, one could insert the bound of Gowers [19]). Similar remarks apply to multiple commuting shifts. We leave the details to the reader. ◁

The proof of Theorem 1.9, given in Section 3 below, rests on non-commutative versions of several of the steps on the way to the Furstenberg-Zimmer structure theorem in the commutative world of ergodic theory [15, 43, 42]. In particular, it rests on a version of the dichotomy between relatively weakly mixing inclusions and those containing a relatively isometric subinclusion, well known in ergodic theory from the work of Furstenberg [15] and Zimmer [43, 42], and already generalized to the non-commutative world by Popa in [32] for applications to the study of superrigidity phenomena.

If $(\mathcal{M},\tau,\alpha)$ is not asymptotically abelian, then matters are rather more complicated, with positive results only obtaining under additional restrictions. For $k=3$ and for ergodic shifts, we have a positive result, established in Section 5:

Theorem 1.10. If $k=3$ and $(\mathcal{M},\tau,\alpha)$ is an ergodic von Neumann dynamical system, then one has weak convergence and convergence in norm, as well as recurrence on a dense set.

We remark that the weak convergence result was previously established in [13].

1.4. Negative results.

Recurrence on average has been omitted from Theorem 1.10 because it can fail in this setting:

Theorem 1.11. Let $k=3$. Then there exists an ergodic von Neumann dynamical system $(\mathcal{M},\tau,\alpha)$ for which recurrence on average fails. (In fact one can make the average (5) strictly negative.)

We establish this in Section 2.2. The main tool is a sophisticated version of the Behrend set construction, combined with the crossed product construction.

When one drops the ergodicity assumption, one also loses recurrence on a dense set:

Theorem 1.12. Let $k=3$. Then there exists a von Neumann dynamical system $(\mathcal{M},\tau,\alpha)$ for which recurrence on a dense set fails. (In fact one can make the means (6) equal to a negative constant for all non-zero $n$.)

We establish this in Section 2.2 also. This result is simpler to prove than Theorem 1.11, and uses the original Behrend set construction together with crossed product constructions.

One also loses recurrence on a dense set for larger $k$ even when ergodicity is assumed:

Theorem 1.13.

Let $k\ge 5$ be odd. Then there exists an ergodic von Neumann dynamical system $(\mathcal{M},\tau,\alpha)$ for which recurrence on a dense set fails. (In fact one can make the means (6) equal to a negative constant for all non-zero $n$.)

We establish this in Section 2.3. This result uses a counterexample of Bergelson, Host, Kra, and Ruzsa [4], combined with a group theoretic construction. The restriction to odd $k$ is mostly technical and can almost certainly be removed; however, we are unable to decide whether Theorem 1.13 can be extended to the $k=4$ case, because it was shown in [4] that the $k=5$ counterexample in that paper cannot be replicated for $k=4$.

For convergence, we have counterexamples for $k\ge 4$:

Theorem 1.14.

Let $k\ge 4$. Then there exists an ergodic von Neumann dynamical system $(\mathcal{M},\tau,\alpha)$ for which weak convergence and convergence in norm fail.

We establish this in Section 2.1. The main tool is a group theoretic construction.

The above counterexamples were for the single shift case, but of course they are also counterexamples to the more general situation of multiple commuting shifts. We summarise the positive and negative results (in the single shift case) in Table 1. We note in particular that the following questions remain open:

Problem 1.15. If $k=3$, does weak or norm convergence hold for non-ergodic von Neumann dynamical systems $(\mathcal{M},\tau,\alpha)$?

Problem 1.16. If $k=3$, does weak or norm convergence hold for von Neumann $\mathbb{Z}^3$-systems $(\mathcal{M},\tau,\alpha_0,\alpha_1,\alpha_2)$ (possibly after imposing suitable ergodicity hypotheses)?

Problem 1.17. If $k=4$ (or more generally if $k\ge 4$ is even), does recurrence on a dense set hold for ergodic von Neumann dynamical systems $(\mathcal{M},\tau,\alpha)$?

We present some remarks on the first two problems in Section 6.

(On the role of ergodicity: in the commutative case, an easy application of the ergodic decomposition allows one to recover the non-ergodic case of the recurrence and convergence results from the ergodic case. Unfortunately, in the non-commutative case, the ergodic decomposition is only available when the invariant subalgebra $\{a\in\mathcal{M}:\alpha(a)=a\}$ is central, which is the case for asymptotically abelian systems but not in general.)

Table 1. Positive and negative results for non-commutative convergence and recurrence of a single shift for various values of $k$, and for various assumptions of ergodicity. The entries marked "No?" would be expected to have a negative answer if one adopts the principle that recurrence results which fail for one value of $k$ should also fail for higher values of $k$.

Case                  | Conv. norm? | Conv. mean (weak)? | Recur. avg.? | Recur. dense?
k = 2                 | Yes         | Yes                | Yes          | Yes
k = 3, erg.           | Yes         | Yes                | No           | Yes
k = 3, non-erg.       | ???         | ???                | No           | No
k ≥ 4, even, erg.     | No          | No                 | No?          | ???
k ≥ 4, even, non-erg. | No          | No                 | No?          | No?
k ≥ 5, odd, erg.      | No          | No                 | No           | No
k ≥ 5, odd, non-erg.  | No          | No                 | No           | No

Notational remark.

Unfortunately this paper stands between two quite unrelated uses of the word 'factor', one from operator algebras and one from ergodic theory. In the hope that it may be of interest to operator algebraists, we have deferred to their usage (even though the true notion of a factor due to Murray and von Neumann is actually not essential to our work), and will refer throughout to inclusions of von Neumann algebras, even in the commutative setting where these can be identified with ergodic-theoretic 'factors'. ◁

Acknowledgements.

Our thanks go to Sorin Popa for several helpful discussions, Francesco Fidaleo and David Kerr for references, and to Ezra Getzler for explaining Grothendieck's interpretation of a group via its sheaf of flat connections. The authors are indebted to the anonymous referee for careful comments and suggestions.

Brown University and Universität Tübingen and University of California, Los Angeles.

2. Counterexamples

In this section we construct various counterexamples of von Neumann systems $(\mathcal{M},\tau,\alpha)$ which will demonstrate the negative results in Theorems 1.11-1.14. The material in this section is independent of the positive results in the rest of the paper, but may provide some cautionary intuition to keep in mind when reading the proofs of those results.

2.1. Non-convergence for $k\ge 4$.

We first show that convergence results fail for $k\ge 4$, even if one assumes ergodicity. In fact the divergence is so bad that it is essentially arbitrary:

Theorem 2.1 (No convergence for $k\ge 4$). Let $k\ge 4$ be an integer, and let $A\subset\mathbb{Z}$ be a set. Then there exist an ergodic von Neumann system $(\mathcal{M},\tau,\alpha)$ and elements $a_0,\dots,a_{k-1}\in\mathcal{M}$ such that
$$\tau\big(a_0\,\alpha^n(a_1)\cdots\alpha^{(k-1)n}(a_{k-1})\big)=1_A(n)$$
for all integers $n$.

It is clear that this implies Theorem 1.14 by choosing $A$ appropriately, for instance so that the densities $\frac{1}{N}|A\cap\{1,\dots,N\}|$ do not converge (and noting that failure of weak convergence implies failure of convergence in norm, by Cauchy-Schwarz applied in the contrapositive).

Proof.

It will suffice to verify the $k=4$ case, as the higher cases follow by setting $a_j=1$ for $j\ge 4$. We will need a group $G$ with four distinguished elements $e_0,e_1,e_2,e_3$ and an automorphism $T:G\to G$ such that $T^k$ has no fixed points other than the identity for all $k\neq 0$, and such that
$$e_0(T^re_1)(T^{2r}e_2)(T^{3r}e_3)=\mathrm{id}$$
holds for all $r\in A$ and fails for all $r\in\mathbb{Z}\setminus A$. The construction of such a group is somewhat non-trivial and is deferred to Appendix B, and in particular to Proposition B.8.

The group algebra $\mathbb{C}G$ of formal finite linear combinations of group elements of $G$ acts (on the left) on the Hilbert space $\ell^2(G)$ in the obvious manner (arising from convolution on $G$), and can thus be viewed as a subspace of the von Neumann algebra $B(\ell^2(G))$ (note that all the elements of $G$ become unitary in this perspective). We can place a finite faithful trace $\tau$ on $\mathbb{C}G$ by declaring the identity element to have trace 1, and all other elements of $G$ to have trace zero. If we then define $\mathcal{M}$ to be the closure of $\mathbb{C}G$ in the weak operator topology of $B(\ell^2(G))$, we obtain a finite von Neumann algebra, known as the group von Neumann algebra $LG$ of $G$.

The automorphism $T$ leads to an algebra isomorphism $\alpha$ of $\mathbb{C}G$, which then easily extends to a shift $\alpha$ on $\mathcal{M}=LG$. Because none of the powers of $T$ have any non-trivial fixed points, the orbit of any non-identity group element contains no repetitions, and so one can easily establish that $\alpha^n f$ converges weakly to $\tau(f)$ as $n\to\infty$ for every $f\in\mathbb{C}G$, and hence by approximation that the unitary operator on $\ell^2(G)$ associated to $\alpha$ has no fixed points outside $\mathbb{C}\delta_{\mathrm{id}}$. This implies that $(\mathcal{M},\tau,\alpha)$ is ergodic: given $a\in\mathcal{M}$ for which $\alpha(a)=a$ and $\tau(a)=0$, it follows that $a(\delta_{\mathrm{id}})\in\ell^2(G)$ is a fixed point for the action of $T$ on $\ell^2(G)$, which must therefore equal $\tau(a)\delta_{\mathrm{id}}=0$, and hence $\tau(a^*a)=\|a(\delta_{\mathrm{id}})\|^2=0$ and so $a=0$, by the faithfulness of $\tau$. If we now set $a_j=e_j$ for $j=0,1,2,3$, then $a_0\alpha^n(a_1)\alpha^{2n}(a_2)\alpha^{3n}(a_3)$ is the group element $e_0(T^ne_1)(T^{2n}e_2)(T^{3n}e_3)$, whose trace is 1 if it is the identity and 0 otherwise, that is, $1_A(n)$. □
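The algebraic bookkeeping in this proof can be checked on a finite group. The sketch below is an assumption-laden toy: it uses the symmetric group $S_3$ (permutations of $\{0,1,2\}$) in place of the infinite group $G$ of Proposition B.8, stores elements of $\mathbb{C}G$ as dictionaries of coefficients, takes $\tau$ to be the coefficient of the identity, and uses a hypothetical inner automorphism (conjugation by a fixed permutation) as the shift. It checks $\tau(xy)=\tau(yx)$, $\tau(x^*x)=\sum_g|c_g|^2$, invariance of $\tau$ under the shift, and that $\tau(gh)$ is 1 or 0 according to whether $gh$ is the identity, which is the mechanism producing $1_A(n)$ above. A finite group cannot carry an automorphism all of whose non-zero powers are fixed-point-free, so ergodicity is not modelled here.

```python
from itertools import permutations

# Toy stand-in for G: the symmetric group S_3, elements stored as tuples g with g[i] = g(i).
G = list(permutations(range(3)))
IDENTITY = (0, 1, 2)

def compose(g, h):
    """Group product (g h)(i) = g(h(i))."""
    return tuple(g[h[i]] for i in range(3))

def inverse(g):
    """Inverse permutation."""
    inv = [0] * 3
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def mul(x, y):
    """Product in C G: convolution of finitely supported coefficient dictionaries."""
    out = {}
    for g, cg in x.items():
        for h, ch in y.items():
            gh = compose(g, h)
            out[gh] = out.get(gh, 0) + cg * ch
    return out

def star(x):
    """Adjoint: (sum c_g g)^* = sum conj(c_g) g^{-1}, since group elements act unitarily."""
    return {inverse(g): complex(c).conjugate() for g, c in x.items()}

def tau(x):
    """Trace: the coefficient of the identity element."""
    return x.get(IDENTITY, 0)

def shift(x, s):
    """Shift induced by the automorphism g -> s g s^{-1} (an inner automorphism)."""
    return {compose(compose(s, g), inverse(s)): c for g, c in x.items()}

x = {G[1]: 2, G[3]: 1j}   # 2*(a transposition) + i*(a 3-cycle)
y = {G[1]: 1, G[4]: -1}   # G[4] is the inverse of the 3-cycle G[3]
print(tau(mul(x, y)), tau(mul(y, x)), tau(mul(star(x), x)))
```

Dictionaries keep only finitely many non-zero coefficients, mirroring the fact that $\mathbb{C}G$ consists of finite linear combinations before the weak-operator closure is taken.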
Remark. An inspection of the proofs of Theorem 2.1 and Proposition B.8 shows that the expression $a_0\alpha^n(a_1)\alpha^{2n}(a_2)\alpha^{3n}(a_3)$ can more generally be replaced by $\alpha^{c_0n}(a_0)\alpha^{c_1n}(a_1)\alpha^{c_2n}(a_2)\alpha^{c_3n}(a_3)$ whenever $c_0,c_1,c_2,c_3$ are integers with $c_i\neq c_{i+1}$ for all $i=0,1,2,3$ (with the convention $c_4=c_0$). Thus for instance one can construct von Neumann systems for which $\tau(a_0(\alpha^n(a_1))a_2\alpha^n(a_3))=1_A(n)$ for an arbitrary set $A$. We omit the details. ◁

Remark. The examples of non-convergence given above are not self-adjoint or positive, and the $a_i$ are not equal to each other. However, it is not hard to modify the examples to give an example of a positive $a_i=a$ for which the averages $\frac{1}{N}\sum_{n=1}^{N}\tau(a\alpha^n(a)\alpha^{2n}(a)\alpha^{3n}(a))$ do not converge. Indeed, one can repeat the above construction with
$$a:=\mathrm{id}+\frac{1}{100}\sum_{i=0}^{3}(e_i+e_i^*);$$
this is easily seen to be positive and self-adjoint, and a modification of the above computations then shows that
$$\tau(a\alpha^n(a)\alpha^{2n}(a)\alpha^{3n}(a))=1+\frac{2}{100^4}1_A(n)$$
for all $n$, which is enough to ensure divergence by choosing $A$ appropriately. We leave the details to the reader. ◁

Remark. The group $G$ constructed here can easily be shown to have infinite conjugacy classes (by the same methods used to prove Proposition B.8). This implies that the group von Neumann algebra $LG$ is a factor. We refer to Kadison and Ringrose [26, Theorem 6.7.5] for details. ◁

2.2. Negative averages for $k=3$.

We now show the negativity of various triple averages. The main tool is the following Behrend-type construction of a set which avoids progressions of length three, but contains many "hexagons":

[Figure 1: A hexagon, with vertices $x$, $x+h$, $x+k$, $x+k+2h$, $x+2k+h$, $x+2k+2h$. Note the absence of arithmetic progressions of length three.]

Lemma 2.2 (Behrend-type example). Let $\varepsilon > 0$. Then for all sufficiently large $d$, there exists a subset $F$ of $\mathbb{Z}/d\mathbb{Z}$ such that $|F| \ge d^{1-\varepsilon}$, but $F$ contains no non-trivial arithmetic progressions of length three; thus $n, n+r, n+2r \in F$ can only occur if $r = 0$. On the other hand, the set
$$\{(x,h,k) \in (\mathbb{Z}/d\mathbb{Z})^3 : x,\ x+h,\ x+k,\ x+k+2h,\ x+2k+h,\ x+2k+2h \in F\}$$
of "hexagons" in $F$ has cardinality at least $d^{3-\varepsilon}$.

We remark that the first part of the lemma already follows directly from the work of Behrend [2] or the earlier work of Salem and Spencer [33]. The claim about hexagons will be needed in the proof of Theorem 2.6 below, but is not needed for the simpler results in Corollary 2.4 or Theorem 2.5.
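The random-sign argument in the proof of Theorem 2.6 below depends on knowing exactly when the six vertices of a hexagon in $\mathbb{Z}/d\mathbb{Z}$ coincide in pairs. The following brute-force sketch (Python, standard library only; the function names are our own and purely illustrative) checks that for odd $d$ this happens precisely when $h = 0$, $k = 0$ or $h + k = 0$, and that an even modulus admits further collapses:

```python
from collections import Counter


def hexagon_offsets(h, k, d):
    """Offsets (mod d) of the vertices x, x+h, x+k, x+k+2h, x+2k+h, x+2k+2h from x."""
    return [0, h % d, k % d, (k + 2 * h) % d, (2 * k + h) % d, (2 * k + 2 * h) % d]


def vertices_pair_up(h, k, d):
    """True if every vertex value occurs an even number of times, so that a product
    of independent random signs attached to the six vertices is identically 1."""
    counts = Counter(hexagon_offsets(h, k, d))
    return all(m % 2 == 0 for m in counts.values())


def collapsing_pairs(d):
    """All (h, k) in (Z/dZ)^2 whose hexagon collapses in pairs."""
    return {(h, k) for h in range(d) for k in range(d) if vertices_pair_up(h, k, d)}


def expected_pairs(d):
    """The three collapse directions h = 0, k = 0, h + k = 0 (mod d)."""
    return {(h, k) for h in range(d) for k in range(d)
            if h == 0 or k == 0 or (h + k) % d == 0}


for d in (3, 5, 7, 9, 11, 15):
    assert collapsing_pairs(d) == expected_pairs(d)

# With an even modulus there are extra collapses, e.g. (h, k) = (2, 1) for d = 4.
print(sorted(collapsing_pairs(4) - expected_pairs(4)))
```

Each of the three collapsed configurations contains a three-term progression ($x, x+k, x+2k$; $x, x+h, x+2h$; or $x-h, x, x+h$), which is why only the terms with $h = k = 0$ survive once $F$ is progression-free.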

Proof.

Let $R$ be a large multiple of 400 (depending on $\varepsilon$). We claim that for $n$ a large enough multiple of 4 (depending on $R$), the set $\{-R, \ldots, R\}^n \subset \mathbb{Z}^n$ contains a subset $E$ of cardinality $|E| \ge e^{-O(n)} R^n$ (where the implied constant in the $O()$ notation is absolute), which contains at least $e^{-O(n)} R^{3n}$ hexagons $\{x, x+h, x+k, x+k+2h, x+2k+h, x+2k+2h\}$ but contains no arithmetic progressions of length three. Choosing $d$ sufficiently large, letting $n$ be the largest integer such that $(10R)^n \le d$, and then embedding $\{-R, \ldots, R\}^n$ in $\mathbb{Z}/d\mathbb{Z}$ using base $10R$ (say), as in the work of Behrend or Salem-Spencer, this claim will imply the lemma (choosing $R$ sufficiently large depending on $\varepsilon$).

It remains to establish the claim. From the classical results on the Waring problem (see e.g. [38]), we know that every large integer $N$ has $\sim N^{(k-2)/2}$ representations as the sum of $k$ squares for $k$ large enough (one can for instance take $k = 5$, but for our purposes any fixed $k$ will suffice). Using this, we see that for any fixed $\delta \in (0,1)$, every integer $r$ such that $\delta R^2 n \le r \le R^2 n$ (say) will have at least $(c_\delta R)^{n - C_\delta}$ representations as the sum of $n$ squares of integers less than $R$, where $c_\delta, C_\delta > 0$ depend only on $\delta$. In other words, the sphere
$$E_r := \{x \in \{-R, \ldots, R\}^n : |x|^2 = r\}$$
has cardinality at least $(c_\delta R)^{n - C_\delta}$. On the other hand, such spheres have no non-trivial progressions of length three. Thus, by the pigeonhole principle, it will suffice (for $n$ large enough) to show that there are at least $e^{-O(n)} R^{3n}$ hexagons $\{x, x+h, x+k, x+k+2h, x+2k+h, x+2k+2h\}$ in $\{-R, \ldots, R\}^n$ such that
$$|x|^2 = |x+h|^2 = |x+k|^2 = |x+k+2h|^2 = |x+2k+h|^2 = |x+2k+2h|^2 \le R^2 n \tag{9}$$
(note that the case when $|x|^2 \le \delta R^2 n$ for sufficiently small $\delta$ can be eliminated by crude estimates).

To count the solutions to (9), we perform some elementary changes of variable to replace the constraints in (9) with simpler constraints.
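The two algebraic steps used in the rest of this proof (the change of variables from (10) to (9), and the tensor power trick from (11) to (10)) can be sanity-checked on a small hand-picked example. The sketch below (Python, standard library only; the vectors `a0`, `b0`, `c0` are our own choice, not taken from the paper) takes one triple obeying (11) with $h_1 = h_2 = h_3 = 1$, concatenates four copies with the sign pattern of the tensor power trick, checks (10), and then checks that $x := a - 2b$, $h := b + c$, $k := b - c$ give six distinct points on one sphere with no three-term progression among them:

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def lin(u, v, s=1):
    """Coordinatewise u + s*v."""
    return [p + s * q for p, q in zip(u, v)]


def tensor_trick(triples):
    """Concatenate four triples (a_i, b_i, c_i): signs (+,+,-,-) on b and (+,-,+,-) on c."""
    (a1, b1, c1), (a2, b2, c2), (a3, b3, c3), (a4, b4, c4) = triples

    def neg(v):
        return [-q for q in v]

    return (a1 + a2 + a3 + a4,
            b1 + b2 + neg(b3) + neg(b4),
            c1 + neg(c2) + c3 + neg(c4))


def hexagon(a, b, c):
    """Vertices x, x+h, x+k, x+k+2h, x+2k+h, x+2k+2h for x = a-2b, h = b+c, k = b-c."""
    x, h, k = lin(a, b, -2), lin(b, c), lin(b, c, -1)
    steps = [(0, 0), (1, 0), (0, 1), (2, 1), (1, 2), (2, 2)]
    return [lin(lin(x, h, s), k, t) for s, t in steps]


def has_nontrivial_3ap(points):
    """True if p + r == 2q for some points p != r (q is then the midpoint)."""
    return any(lin(p, r) == [2 * qi for qi in q]
               for p in points for q in points for r in points if p != r)


# One triple obeying (11) with h1 = h2 = h3 = 1 and c.c = 3 b.b.
a0, b0, c0 = [1, 0, 0, 1], [1, 0, 0, 0], [1, 1, 1, 0]
assert (dot(a0, b0), dot(b0, c0), dot(c0, a0)) == (1, 1, 1)
assert dot(c0, c0) == 3 * dot(b0, b0)

a, b, c = tensor_trick([(a0, b0, c0)] * 4)
assert dot(a, b) == dot(b, c) == dot(c, a) == 0 and dot(c, c) == 3 * dot(b, b)  # (10)

verts = hexagon(a, b, c)
radii = {dot(v, v) for v in verts}
print(radii)  # {24}, i.e. the single squared radius |a|^2 + 4|b|^2
```

The vertices are $a \pm 2b$ and $a \pm b \pm c$, so orthogonality together with $c \cdot c = 3\,b \cdot b$ is exactly what puts all six on a common sphere.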
We begin by observing that if $a, b, c \in \{-R/10, \ldots, R/10\}^n$ are such that
$$a \cdot b = b \cdot c = c \cdot a = 0; \qquad c \cdot c = 3\, b \cdot b \tag{10}$$
then $x := a - 2b$, $h := b + c$, $k := b - c$ can be verified to be a solution to (9) (the six vertices are then $a \pm 2b$ and $a \pm b \pm c$, all of squared norm $|a|^2 + 4|b|^2$), with the map $(a,b,c) \mapsto (x,h,k)$ being injective, so it suffices to show that there are at least $e^{-O(n)} R^{3n}$ triples $(a,b,c)$ with the above properties.

For reasons that will become clearer later, we will initially work in dimension $n/4$ rather than $n$. Using the Waring problem results as before, we can find at least $e^{-O(n)} R^{3n/4}$ triples $a, b, c \in \{-R/10, \ldots, R/10\}^{n/4}$ such that
$$c \cdot c = 3\, b \cdot b.$$
This is one of the four constraints required for (10). To obtain the remaining constraints, we use a pigeonholing trick followed by a tensor power trick. Firstly, observe that whenever $a, b, c \in \{-R/10, \ldots, R/10\}^{n/4}$, the quantities $a \cdot b$, $b \cdot c$, $c \cdot a$ are of order $O(R^2 n) \le e^{O(n)}$. Applying the pigeonhole principle, one can thus find $h_1, h_2, h_3 = O(R^2 n)$ such that there are $e^{-O(n)} R^{3n/4}$ triples $a, b, c \in \{-R/10, \ldots, R/10\}^{n/4}$ with
$$a \cdot b = h_1; \quad b \cdot c = h_2; \quad c \cdot a = h_3; \quad c \cdot c = 3\, b \cdot b. \tag{11}$$
This is an inhomogeneous version of (10) (at dimension $n/4$ rather than $n$), with the zero coefficients replaced by more general coefficients $h_1, h_2, h_3$. To eliminate these coefficients we use a tensor power trick. Let $S \subset (\{-R/10, \ldots, R/10\}^{n/4})^3$ be the set of all triples $(a,b,c)$ obeying (11). We then observe that if $(a_i, b_i, c_i) \in S$ for $i = 1, 2, 3, 4$, then the vectors $a, b, c \in \mathbb{Z}^n$ defined by
$$a := (a_1, a_2, a_3, a_4); \quad b := (b_1, b_2, -b_3, -b_4); \quad c := (c_1, -c_2, c_3, -c_4)$$
solve (10). The map from the $(a_i, b_i, c_i)_{i=1}^{4}$ to $(a,b,c)$ is an injection from $S^4$ to the solution set of (10), and so we obtain at least $|S|^4 \ge e^{-O(n)} R^{3n}$ solutions to (10), as required. □

This leads to a useful matrix counterexample:

Lemma 2.3 (Restricted third moment can be negative). There exists a positive semi-definite Hermitian matrix $(A(j,k))_{1 \le j,k \le d}$ for which the quantity
$$\sum_{n, r \in \mathbb{Z}/d\mathbb{Z}} A(n, n+r)\, A(n+r, n+2r)\, A(n+2r, n) \tag{12}$$
is negative, where we extend $A(i,j)$ periodically in both variables with period $d$.

Proof. We will take $d$ to be a multiple of 3, and $A(j,k)$ to take the form
$$A(j,k) := 1_E(j)\, 1_E(k) + 1_E(j)\,\omega^{-j}\, 1_E(k)\,\omega^{k}$$
where $E \subset \mathbb{Z}/d\mathbb{Z}$ is a set to be determined later, and $\omega := e^{2\pi i/3}$ is a cube root of unity. The matrix $(A(j,k))_{1 \le j,k \le d}$ is then the sum of two positive semi-definite matrices of rank one (namely $vv^*$ and $ww^*$, where $v(j) := 1_E(j)$ and $w(j) := 1_E(j)\,\omega^{-j}$), and is thus positive semi-definite and Hermitian. The expression (12) can be expanded as
$$\sum_{\substack{n, r \in \mathbb{Z}/d\mathbb{Z}:\\ n,\, n+r,\, n+2r \in E}} (1 + \omega^{r})(1 + \omega^{r})(1 + \omega^{-2r}).$$
Since $\omega^{-2r} = \omega^{r}$ and $1 + \omega + \omega^2 = 0$, the summand equals $(1 + \omega^r)^3$, which is $8$ when $r$ is divisible by 3, and $-1$ otherwise. Thus it suffices to find $E$ such that the set
$$\{(n,r) \in (\mathbb{Z}/d\mathbb{Z})^2 : n, n+r, n+2r \in E;\ r \not\equiv 0 \bmod 3\}$$
is more than eight times larger than the set
$$\{(n,r) \in (\mathbb{Z}/d\mathbb{Z})^2 : n, n+r, n+2r \in E;\ r \equiv 0 \bmod 3\};$$
in other words, the length three arithmetic progressions in $E$ with spacing not divisible by 3 need to overwhelm the length three progressions with spacing divisible by 3.

To do this, we use Lemma 2.2 to obtain a subset $F \subset \{1, \ldots, [d/3]\}$ of cardinality $|F| \ge d^{0.9}$ which contains no arithmetic progressions of length three. We then pick three shifts $h_0, h_1, h_2 \in \{1, \ldots, d/3\}$ independently and uniformly at random, and consider the set
$$E := \{3(f + h_i) + i : i = 0, 1, 2;\ f \in F\}$$
consisting of three randomly shifted, dilated copies of $F$. By construction, the only length three progressions in $E$ with spacing divisible by 3 are the trivial progressions $n, n, n$ with $r = 0$, so the total number of such progressions is at most $d$.
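The sign computation above can be checked numerically. The following sketch (Python, standard library only; `matrix_A`, `expr12`, `progression_count` and the random set `E` are illustrative choices of ours) builds $A$ from a set $E \subset \mathbb{Z}/d\mathbb{Z}$ with $3 \mid d$, compares (12) with the weighted progression count ($+8$ for spacing divisible by 3, $-1$ otherwise), and also confirms that the period average of the matrix traces $\tau(a\,\alpha^n(a)\,\alpha^{2n}(a))$ used in Corollary 2.4 below equals $\frac{1}{d}$ times (12):

```python
import cmath
import random


def matrix_A(E, d):
    """A(j,k) = 1_E(j)1_E(k) + 1_E(j) w^{-j} 1_E(k) w^{k}, with w = exp(2 pi i / 3)."""
    w = cmath.exp(2j * cmath.pi / 3)
    return [[int(j in E and k in E) * (1 + w ** (k - j)) for k in range(d)]
            for j in range(d)]


def expr12(A, d):
    """The sum (12) over n, r in Z/dZ."""
    return sum(A[n][(n + r) % d] * A[(n + r) % d][(n + 2 * r) % d] * A[(n + 2 * r) % d][n]
               for n in range(d) for r in range(d))


def progression_count(E, d):
    """+8 per progression n, n+r, n+2r in E with 3 | r, and -1 per progression otherwise."""
    return sum((8 if r % 3 == 0 else -1)
               for n in range(d) for r in range(d)
               if n in E and (n + r) % d in E and (n + 2 * r) % d in E)


def shift(B, d, n):
    """alpha^n(B): multiply entry (j, k) by exp(2 pi i n (j - k) / d)."""
    z = cmath.exp(2j * cmath.pi / d)
    return [[z ** (n * (j - k)) * B[j][k] for k in range(d)] for j in range(d)]


def matmul(X, Y):
    size = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(size)) for j in range(size)]
            for i in range(size)]


def corollary_trace(A, d, n):
    """tau(a alpha^n(a) alpha^{2n}(a)) with the normalised trace on d x d matrices."""
    P = matmul(matmul(A, shift(A, d, n)), shift(A, d, 2 * n))
    return sum(P[i][i] for i in range(d)) / d


d = 6
rng = random.Random(1)
E = {x for x in range(d) if rng.random() < 0.6}
A = matrix_A(E, d)
print(expr12(A, d), progression_count(E, d))
avg = sum(corollary_trace(A, d, n) for n in range(d)) / d
print(avg, expr12(A, d) / d)
```

For $E = \mathbb{Z}/6\mathbb{Z}$ the count gives $8 \cdot 12 - 24 = 72 > 0$, which illustrates why the construction above has to suppress progressions with spacing divisible by 3.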
On the other hand, for any fixed $f_0, f_1, f_2 \in F$, the numbers $3(f_i + h_i) + i$ for $i = 0, 1, 2$ have a probability of at least $3/d$ of forming an arithmetic progression with spacing not divisible by 3, due to the random nature of the $h_i$. Thus the expected value of the total number of such progressions is at least $(d^{0.9})^3 \times 3/d = 3d^{1.7}$. For $d$ large enough this exceeds $8d$, which gives the claim. □

This already gives a simple example of negative averages for non-ergodic systems:

Corollary 2.4 (Negative average for non-ergodic system). There exists a finite von Neumann algebra $(\mathcal{M}, \tau)$ with a shift $\alpha$, and a non-negative element $a \in \mathcal{M}$, such that $\frac{1}{2N+1}\sum_{n=-N}^{N} \tau(a\,\alpha^n(a)\,\alpha^{2n}(a))$ converges to a negative number.

Proof. Let $a = (A(j,k))_{1 \le j,k \le d}$ be as in Lemma 2.3. We let $\mathcal{M}$ be the von Neumann algebra of complex $d \times d$ matrices with the normalised trace $\tau$, and with the shift
$$\alpha\big((B(j,k))_{1 \le j,k \le d}\big) := \big(e^{2\pi i (j-k)/d}\, B(j,k)\big)_{1 \le j,k \le d}.$$
This is easily verified to be a shift (it is conjugation by the diagonal unitary with entries $e^{2\pi i j/d}$). We see that
$$\tau(a\,\alpha^n(a)\,\alpha^{2n}(a)) = \frac{1}{d}\sum_{j,k,l \in \mathbb{Z}/d\mathbb{Z}} e^{2\pi i n(k+l-2j)/d}\, A(j,k)\, A(k,l)\, A(l,j).$$
This expression is periodic in $n$ with period $d$, and has average
$$\frac{1}{d}\sum_{l, r \in \mathbb{Z}/d\mathbb{Z}} A(l, l+r)\, A(l+r, l+2r)\, A(l+2r, l),$$
and the claim then follows from Lemma 2.3. □

This shows that recurrence on average for $k = 3$ can fail for non-ergodic systems. However, this is not yet enough to establish either Theorem 1.11 or Theorem 1.12. To obtain these stronger results we must introduce the crossed product construction in von Neumann algebras. For a comprehensive introduction to this concept, see [26, Chapter 13]. We shall just recall the key properties of this construction that we need here.

Suppose we have a finite von Neumann algebra $(\mathcal{M}, \tau)$, and an action $U$ of a (discrete) group $G$ on $\mathcal{M}$; thus for each $g \in G$ we have a shift $U(g): \mathcal{M} \to \mathcal{M}$ such that $U(g)U(h) = U(gh)$ for all $g, h \in G$, with $U(\mathrm{id})$ being the identity. Then there exists a crossed product $(\mathcal{M} \rtimes_U G, \tau)$ which contains both the original space $(\mathcal{M}, \tau)$ and the group algebra $\mathbb{C}G$ as subalgebras. Furthermore, in this crossed product we have
$$U(g)a = g a g^{-1} \tag{13}$$
for all $a \in \mathcal{M}$ and $g \in G$, and
$$\tau(ga) = \tau(ag) = 0$$
for all $a \in \mathcal{M}$ and $g \in G$ with $g$ not equal to the identity. Finally, the span of the elements $ag$ for $a \in \mathcal{M}$ and $g \in G$ is dense in $\mathcal{M} \rtimes_U G$.

Remark.
The exact construction of the crossed product is not relevant for our applications, but for the convenience of the reader we sketch one such construction here. We first form the Hilbert space
$$\mathfrak{h} := \ell^2(G, L^2(\tau)) = \bigoplus_{g \in G} L^2(\tau)$$
consisting of tuples $(x_g)_{g \in G}$ in $L^2(\tau)$. This space has an action of $\mathcal{M}$ defined by
$$a(x_g)_{g \in G} := \big((U(g^{-1})a)\, x_g\big)_{g \in G}$$
for $a \in \mathcal{M}$, and an action of $G$ (and hence of $\mathbb{C}G$) defined by
$$h(x_g)_{g \in G} := (x_{h^{-1}g})_{g \in G}.$$
One can verify that these actions combine to an action on $\mathfrak{h}$ of the twisted convolution algebra $\ell^1(G, \mathcal{M})$, defined as the space of formal sums $\sum_{h \in G} h a_h$ with $\sum_{h \in G} \|a_h\| < \infty$, subject to the relations (13). We define a trace on such sums by the formula $\tau(\sum_{h \in G} h a_h) := \tau(a_{\mathrm{id}})$. One can then show that this extends to a finite trace on the weak operator topology closure of $\ell^1(G, \mathcal{M})$, viewed as a subset of $B(\mathfrak{h})$; this closure can then be denoted $\mathcal{M} \rtimes_U G$. In other words, $\mathcal{M} \rtimes_U G$ is constructed as the von Neumann algebra generated by the action of $\mathcal{M}$ and $G$ on $\mathfrak{h}$.

Example. The group von Neumann algebra $LG$ can be viewed as $\mathbb{C} \rtimes G$, where $G$ acts trivially on the one-dimensional von Neumann algebra $\mathbb{C}$.

We can now get a stronger version of Corollary 2.4:

Theorem 2.5 (Negative trace for non-ergodic system). There exists a von Neumann dynamical system $(\mathcal{M}, \tau, \alpha)$ and a non-negative element $a \in \mathcal{M}$, such that $\tau(a\,\alpha^n(a)\,\alpha^{2n}(a))$ is negative (and independent of $n$) for all non-zero $n$. In particular, Theorem 1.12 holds.

Proof. Let $(\mathcal{M}', \tau, \beta)$ be a von Neumann dynamical system to be chosen later. Using the crossed product construction, we can build an extension $\mathcal{M} := \mathcal{M}' \rtimes_U \mathbb{Z}^2$ of $\mathcal{M}'$ generated by $\mathcal{M}'$ and two commuting unitary elements $u, m$, such that
$$m a m^{-1} = \beta(a) \tag{14}$$
and $u a u^{-1} = a$ for all $a \in \mathcal{M}'$. In particular, the element $u$ is central. It is then easy to see that we can build a shift $\alpha$ on $\mathcal{M}$ for which
$$\alpha(a) = a; \quad \alpha(u) = u; \quad \alpha(m) = mu$$
for all $a \in \mathcal{M}'$, since the action of the group $\mathbb{Z}^2$ generated by $m$ and $u$ on $\mathcal{M}'$ is unchanged when one replaces $m$ by $mu$. [Footnote: To build $\alpha$ explicitly, we can view $\mathcal{M}$ as an algebra of operators on the Hilbert space $\mathfrak{h} := \bigoplus_{(j,k) \in \mathbb{Z}^2} L^2(\tau)$ as per Remark 2.4, and let $\alpha$ be the conjugation $a \mapsto WaW^*$ by the unitary operator $W: \mathfrak{h} \to \mathfrak{h}$ defined by $W(x_{(j,k)})_{(j,k) \in \mathbb{Z}^2} := (x_{(j,k-j)})_{(j,k) \in \mathbb{Z}^2}$.]

Now let $a \in \mathcal{M}$ be an element of the form
$$a = \Big(\sum_{i \in \mathbb{Z}} f_i m^i\Big)\Big(\sum_{i \in \mathbb{Z}} f_i m^i\Big)^*$$
where $f_i \in \mathcal{M}'$, and only finitely many of the $f_i$ are non-zero. This is clearly non-negative, and can be simplified by (14) to the power series
$$a = \sum_{h \in \mathbb{Z}} g_h m^h$$
where the $g_h \in \mathcal{M}'$ are the twisted autocorrelations of the $f_j$,
$$g_h = \sum_{j \in \mathbb{Z}} f_{j+h}\, \beta^h(f_j^*).$$
Let $n$ be non-zero. The expression $\tau(a\,\alpha^n(a)\,\alpha^{2n}(a))$ can be expanded as
$$\sum_{h_1, h_2, h_3 \in \mathbb{Z}} \tau\big(g_{h_1} m^{h_1}\, g_{h_2} (m u^{n})^{h_2}\, g_{h_3} (m u^{2n})^{h_3}\big).$$
The net power of the central element $u$ here is $n(h_2 + 2h_3)$, and the net power of $m$ is $h_1 + h_2 + h_3$. Thus we see that the trace vanishes unless $h_2 + 2h_3 = h_1 + h_2 + h_3 = 0$, or equivalently unless $(h_1, h_2, h_3) = (h, -2h, h)$ for some $h$.
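For the reader's convenience, here is the computation behind the next step, written out for the surviving triple $(h_1, h_2, h_3) = (h, -2h, h)$. It is our own expansion of the argument and uses only the centrality of $u$, the commutation of $m$ and $u$, and (14) in the form $m^{h} b\, m^{-h} = \beta^{h}(b)$ for $b \in \mathcal{M}'$:

```latex
\begin{aligned}
g_h\, m^{h}\, g_{-2h}\, (m u^{n})^{-2h}\, g_h\, (m u^{2n})^{h}
  &= g_h\, m^{h}\, g_{-2h}\, m^{-2h}\, g_h\, m^{h}\, u^{-2hn + 2hn}
  && \text{($u$ is central and commutes with $m$)}\\
  &= g_h\, \beta^{h}(g_{-2h})\, m^{h}\, m^{-2h}\, g_h\, m^{h}
  && \text{(by (14))}\\
  &= g_h\, \beta^{h}(g_{-2h})\, \big(m^{-h}\, g_h\, m^{h}\big)
   = g_h\, \beta^{h}(g_{-2h})\, \beta^{-h}(g_h).
\end{aligned}
```

Taking the trace and summing over $h$ gives exactly the expression displayed next; the parameter $n$ has dropped out because the powers of $u$ cancel.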
Performing this substitution and using (14), we simplify this expression to
$$\sum_{h \in \mathbb{Z}} \tau\big(g_h\, \beta^h(g_{-2h})\, \beta^{-h}(g_h)\big). \tag{15}$$
In particular, this expression is now manifestly independent of $n \neq 0$.

We now select $\mathcal{M}'$ to be the commutative von Neumann system $L^\infty(\mathbb{Z}/d\mathbb{Z})$ with the shift $\beta(f)(x) := f(x+1)$ and the normalised trace. Thus the $g_h$ and $f_h$ are now complex-valued functions on $\mathbb{Z}/d\mathbb{Z}$, and the above expression can be expanded explicitly as
$$\frac{1}{d}\sum_{x \in \mathbb{Z}/d\mathbb{Z}} \sum_{h \in \mathbb{Z}} g_h(x)\, g_{-2h}(x+h)\, g_h(x-h).$$
Meanwhile, the $g_h(x)$ by definition can be written as
$$g_h(x) = \sum_{j \in \mathbb{Z}} f_{j+h}(x)\, \overline{f_j(x+h)}.$$
We pick a large number $N$ to be chosen later, and set
$$f_j(x) := b(x, x+j)\, 1_{1 \le j \le Nd}$$
where $b: \mathbb{Z}/d\mathbb{Z} \times \mathbb{Z}/d\mathbb{Z} \to \mathbb{C}$ (that is, a function periodic of period $d$ in both variables) is to be chosen later. Then we can compute
$$g_h(x) = \Big(1 - \frac{|h|}{dN}\Big)_+ N\, A(x, x+h) + O(1)$$
where
$$A(x,y) := \sum_{z \in \mathbb{Z}/d\mathbb{Z}} b(x,z)\, \overline{b(y,z)} \tag{16}$$
and $O(1)$ denotes a quantity that can depend on $d$ (and $b$) but is uniformly bounded in $N$. The expression (15) can then be computed to be
$$\frac{C N^4}{d}\sum_{x, h \in \mathbb{Z}/d\mathbb{Z}} A(x, x+h)\, A(x+h, x-h)\, A(x-h, x) + O(N^3)$$
where $C > 0$ is the absolute constant
$$C := \int_{\mathbb{R}} (1 - |h|)_+^2\, (1 - 2|h|)_+\, dh.$$
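The asymptotic formula for $g_h(x)$ is easy to test numerically. In the following sketch (Python, standard library only; the random choice of $b$ and the values of $d$ and $N$ are ours) we compute $g_h(x)$ directly from its definition and compare it with the main term $(1 - |h|/(dN))_+ N A(x, x+h)$. The discrepancy stays below $2d \max|b|^2$ however large $N$ is, which is the $O(1)$ error claimed above: the admissible $j$ form a run of $Nd - |h|$ consecutive integers, so each residue $z$ is hit either $\lfloor (Nd-|h|)/d \rfloor$ times or once more.

```python
import random


def g(b, d, N, h, x):
    """g_h(x) = sum_j f_{j+h}(x) * conj(f_j(x+h)) with f_j(x) = b(x, x+j) 1_{1 <= j <= Nd}."""
    total = 0
    for j in range(1, N * d + 1):
        if 1 <= j + h <= N * d:
            total += b[x % d][(x + j + h) % d] * b[(x + h) % d][(x + h + j) % d].conjugate()
    return total


def A(b, d, x, y):
    """A(x, y) = sum_z b(x, z) * conj(b(y, z)), as in (16)."""
    return sum(b[x % d][z] * b[y % d][z].conjugate() for z in range(d))


def max_error(b, d, N):
    """Largest |g_h(x) - (1 - |h|/(dN))_+ N A(x, x+h)| over x mod d and all relevant h."""
    worst = 0.0
    for h in range(-N * d - 1, N * d + 2):
        for x in range(d):
            main = max(0.0, 1 - abs(h) / (d * N)) * N * A(b, d, x, x + h)
            worst = max(worst, abs(g(b, d, N, h, x) - main))
    return worst


rng = random.Random(0)
d = 3
b = [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(d)] for _ in range(d)]
bound = 2 * d * max(abs(v) ** 2 for row in b for v in row)
for N in (5, 20, 60):
    print(N, max_error(b, d, N), bound)
```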

By the substitution $x = m + r$, $h = r$, we can re-express this as
$$\frac{C N^4}{d}\sum_{m, r \in \mathbb{Z}/d\mathbb{Z}} A(m, m+r)\, A(m+r, m+2r)\, A(m+2r, m) + O(N^3). \tag{17}$$
Now, let $d$ and $A(j,k)$ be as in Lemma 2.3. By the spectral theorem (which in particular allows one to construct self-adjoint square roots of positive semi-definite matrices), we can find $b(x,y)$ so that (16) holds. The main term in (17) is then negative, and the claim follows by choosing $N$ large enough depending on all other parameters. □

Of course, by Theorem 1.10, one cannot have such a result when the underlying shift $\alpha$ is ergodic. On the other hand, one can extend Corollary 2.4 to the ergodic case:

Theorem 2.6.

There exists an ergodic von Neumann system $(\mathcal{M}, \tau, \alpha)$ and a non-negative element $a \in \mathcal{M}$, such that $\frac{1}{2N+1}\sum_{n=-N}^{N} \tau(a\,\alpha^n(a)\,\alpha^{2n}(a))$ converges to a negative number. In particular, Theorem 1.11 holds.

Proof. Let $d$ be a large odd number, and let $u := e^{2\pi i/d}$ be a primitive $d$-th root of unity. We will let $\mathcal{M}$ be a completion of the non-commutative torus. This is obtained by first forming the $C^*$-algebra generated by two unitary generators $e_1, e_2$ obeying the commutation relation
$$e_1 e_2 = u\, e_2 e_1$$
and with all of the expressions $e_1^j e_2^k$ having zero trace unless $j = k = 0$, in which case the trace is $1$; and then completing in the weak operator topology resulting from the Gel'fand-Naimark-Segal representation on $L^2(\tau)$. One can represent this finite von Neumann algebra more explicitly by letting $e_1, e_2$ act on $L^2((\mathbb{R}/\mathbb{Z})^2)$ by the maps $e_1 f(x,y) := e^{2\pi i x} f(x,y)$ and $e_2 f(x,y) := e^{2\pi i y} f(x - 1/d, y)$ (with this sign convention one indeed has $e_1 e_2 = u\, e_2 e_1$), with the trace $\tau$ given by $\tau(a) = \langle \Omega, a\Omega \rangle_{L^2((\mathbb{R}/\mathbb{Z})^2)}$, where $\Omega \equiv 1$ is the constant function on $(\mathbb{R}/\mathbb{Z})^2$.

We let $\theta_1, \theta_2 \in S^1$ be generic unit phases, and then define the shift $\alpha$ on $\mathcal{M}$ by setting
$$\alpha(e_1) := \theta_1 e_1; \quad \alpha(e_2) := \theta_2 e_2.$$
It is easy to see that this is a shift. If $\theta_1, \theta_2$ are generic (so that $\theta_1^j \theta_2^k$ is not a root of unity for any $(j,k) \neq (0,0)$), then this shift is ergodic (as can be seen by considering its action on the monomials $e_1^j e_2^k$, and then arguing as in the proof of Theorem 2.1 using the faithfulness of $\tau$).

We set $a := gg^*$, where $g$ is an element of the form
$$g := \sum_{k=1}^{M} \sum_{h \in \mathbb{Z}} c_h\, e_1^h e_2^k,$$
$M$ is a large number (much larger than $d$) to be chosen later, and the $c_h$ are complex numbers to be chosen later, all but finitely many of which are zero. Clearly $a$ is non-negative. A computation shows that
$$a = \sum_{h,k \in \mathbb{Z}} c_{h,k}\, e_1^h e_2^k$$

where
$$c_{h,k} := M\Big(1 - \frac{|k|}{M}\Big)_+ \sum_{l \in \mathbb{Z}} c_{l+h}\, \overline{c_l}\, u^{kl}. \tag{18}$$
Since
$$\alpha^n(a) = \sum_{h,k \in \mathbb{Z}} c_{h,k}\, \theta_1^{hn} \theta_2^{kn}\, e_1^h e_2^k,$$
some Fourier analysis and the genericity of $\theta_1, \theta_2$ show that the expression
$$\frac{1}{2N+1}\sum_{n=-N}^{N} \tau(a\,\alpha^n(a)\,\alpha^{2n}(a))$$
converges as $N \to \infty$ to the expression
$$\sum_{h,k} c_{h,k}\, c_{-2h,-2k}\, c_{h,k}\, \tau\big(e_1^h e_2^k\, e_1^{-2h} e_2^{-2k}\, e_1^h e_2^k\big).$$
The trace here simplifies to $u^{3hk}$. Inserting (18), we can expand this expression as
$$M^3 \sum_{h,k,l_1,l_2,l_3 \in \mathbb{Z}} \phi(k/M)\, c_{l_1+h}\overline{c_{l_1}}\, c_{l_2-2h}\overline{c_{l_2}}\, c_{l_3+h}\overline{c_{l_3}}\, u^{kl_1 - 2kl_2 + kl_3 + 3hk} \tag{19}$$
where $\phi(x) := (1 - |x|)_+^2 (1 - 2|x|)_+$. By Poisson summation, the expression
$$\sum_k \phi(k/M)\, u^{kl_1 - 2kl_2 + kl_3 + 3hk}$$
can be computed to be $M\int_{\mathbb{R}} \phi(x)\,dx + O(1)$ if $l_1 - 2l_2 + l_3 + 3h$ is divisible by $d$, and $O(1)$ otherwise, where $O(1)$ denotes a quantity that can depend on $d$ but is bounded uniformly in $M$. If we then assume that the $c_h$ vanish for $h$ outside of $\{1, \ldots, M\}$ and are bounded uniformly in $M$, we can thus expand (19) as
$$C M^4 \sum_{\substack{h, l_1, l_2, l_3 \in \mathbb{Z}:\\ d \,\mid\, l_1 - 2l_2 + l_3 + 3h}} c_{l_1+h}\overline{c_{l_1}}\, c_{l_2-2h}\overline{c_{l_2}}\, c_{l_3+h}\overline{c_{l_3}} + O(M^7)$$
for some absolute constant $C > 0$. If we then set $c_h := b(h)\, 1_{[1,M]}(h)$, where $b: \mathbb{Z} \to \mathbb{C}$ is a periodic function with period $d$ and independent of $M$ to be chosen later, we can express this as
$$C_d M^8 \sum_{\substack{h, l_1, l_2, l_3 \in \mathbb{Z}/d\mathbb{Z}:\\ l_1 - 2l_2 + l_3 + 3h = 0}} b(l_1+h)\overline{b(l_1)}\, b(l_2-2h)\overline{b(l_2)}\, b(l_3+h)\overline{b(l_3)} + O(M^7)$$
for some $C_d > 0$ depending on $d$ but independent of $M$. Making the substitution $l_1 = x$, $l_2 = x + k + 2h$, $l_3 = x + 2k + h$, we see that we will be done as soon as we are able to find $d, b$ for which the expression
$$X := \sum_{x,h,k \in \mathbb{Z}/d\mathbb{Z}} \overline{b(x)}\, b(x+h)\, b(x+k)\, \overline{b(x+k+2h)}\, \overline{b(x+2k+h)}\, b(x+2k+2h)$$
is negative.

To do this, we again appeal to Lemma 2.2 to find a set $F \subset \mathbb{Z}/d\mathbb{Z}$ of size at least $d^{0.99}$
(assuming $d$ large enough), which contains no arithmetic progressions of length three, but contains at least $d^{2.99}$ hexagons $x, x+h, x+k, x+k+2h, x+2k+h, x+2k+2h$. We then set
$$b(x) := \epsilon_x\, 1_F(x)$$
where the $\epsilon_x = \pm 1$ are independent random signs, so that $X$ is now the random variable
$$X = \sum_{\substack{x,h,k:\ x,\ x+h,\ x+k,\\ x+k+2h,\ x+2k+h,\ x+2k+2h \in F}} \epsilon_x\, \epsilon_{x+h}\, \epsilon_{x+k}\, \epsilon_{x+2h+k}\, \epsilon_{x+h+2k}\, \epsilon_{x+2h+2k}.$$
We will show (for $d$ large enough) that the standard deviation of $X$ exceeds its expectation, which shows that there exists a choice of signs for which $X$ is negative.

We first compute the expectation of $X$. The only summands with non-zero expectation occur when all the signs cancel, that is, when the six vertices coincide in pairs; since $d$ is odd, this only occurs when $h = 0$, $k = 0$ or $h + k = 0$, as can be seen by an inspection of the ways to collapse the hexagon in Figure 1. In these three cases the hexagon contains the progression $x, x+k, x+2k$, respectively $x, x+h, x+2h$, respectively $x-h, x, x+h$. But as $F$ contains no non-trivial arithmetic progressions, there are no summands for which exactly one of $h, k, h+k$ is zero, so we are left only with the $h = k = 0$ terms, of which there are at most $d$. Thus the expectation of $X$ is at most $d$.

Now we compute the variance. There are at least $d^{2.99}$ hexagons in $F$, and all but $O(d^2)$ of them are non-degenerate in the sense that the six vertices of the hexagon are all distinct. The summands in $X$ corresponding to non-degenerate hexagons have variance $1$, and the correlation between any two summands in $X$ is either zero or positive (the latter occurs when two summands are permutations of each other). Thus the variance of $X$ is $\gg d^{2.99}$, so the standard deviation is $\gg d^{1.495}$, and the claim follows. □

Negative trace for $k = 5$.

Now we show that negative traces can occur even in the ergodic case when $k = 5$.

Theorem 2.7.

There exists an ergodic von Neumann dynamical system $(\mathcal{M}, \tau, \alpha)$ and a non-negative element $a \in \mathcal{M}$, such that $\tau(a\,\alpha^n(a)\,\alpha^{2n}(a)\,\alpha^{3n}(a)\,\alpha^{4n}(a))$ is negative for every non-zero $n$.

This establishes the $k = 5$ case of Theorem 1.13. A similar argument holds for all larger odd values of $k$, which we leave to the interested reader; we restrict here to the case $k = 5$ simply for ease of notation.

To prove this theorem, our starting point is the following result of Bergelson, Host, Kra, and Ruzsa [4]:

Theorem 2.8.

For any $\delta > 0$, there exists a measure-preserving system $(X, \mathcal{X}, \mu, S)$ and a measurable set $A \subset X$ with $0 < \mu(A) < \delta$ such that
$$\mu(A \cap S^n(A) \cap S^{2n}(A) \cap S^{3n}(A) \cap S^{4n}(A)) \le \mu(A)^{11}$$
(say) and
$$\mu(A \cap S^n(A)) = \mu(A)^2 \tag{20}$$
for every non-zero integer $n$.

Proof. This follows from [4, Theorem 1.3] (see also the remark immediately below that theorem). The property (20) is not explicitly stated in that theorem, but follows from the construction in [4, Section 2.3] (the system $X$ is a torus $(\mathbb{R}/\mathbb{Z})^2$ with the skew shift $S: (x,y) \mapsto (x + \alpha, y + 2x + \alpha)$, and the set $A$ has the special form $A = (\mathbb{R}/\mathbb{Z}) \times B$ for some set $B$). □

We apply this theorem for some sufficiently small $\delta$ (to be chosen later) to obtain $X, \mu, S, A$ with the above properties. We will combine this with the group $G$, the automorphism $T$, and the elements $e_0, e_1, e_2, e_3, e_4$ arising from Proposition B.9 as follows.

First, we create the product space $L^\infty(X^G, d\mu^G)$, which is generated (up to negligible sets) by the tensor products $\bigotimes_{g \in G} f_g$, where $f_g \in L^\infty(X, d\mu)$ is equal to $1$ for all but finitely many $g$. This product has a unitary, trace-preserving action $U$ of $G$, defined by
$$U(h) \bigotimes_{g \in G} f_g := \bigotimes_{g \in G} f_{h^{-1}g}.$$
We can therefore create the crossed product $\mathcal{M} := L^\infty(X^G, d\mu^G) \rtimes_U G$. Note that if we embed $L^\infty(X, \mu)$ into $L^\infty(X^G, d\mu^G)$ by using the identity component of $X^G$, we have
$$\bigotimes_{g \in G} f_g = \prod_{g \in G} U(g) f_g \tag{21}$$
(note that the $U(g) f_g$ necessarily commute with each other). We define a shift $\alpha$ on $\mathcal{M}$ by requiring that
$$\alpha\Big(\bigotimes_{g \in G} f_g\Big) = \bigotimes_{g \in G} S(f_{T^{-1}g}) \quad\text{and}\quad \alpha(g) = Tg;$$
one can check that this is indeed a well-defined shift on $\mathcal{M}$.

We claim that $\alpha$ is ergodic.
Indeed, if $a \in \mathcal{M}$ is of the form $a = fg$ for some $f \in L^\infty(X^G, d\mu^G)$ and $g \in G$ not equal to the identity, then as the powers of $T$ have no non-trivial fixed points, the orbit $T^n g$ escapes to infinity, and the orbit $\alpha^n(a)$ converges weakly to zero. Meanwhile, if $g$ is the identity, then it is classical that the Bernoulli system $G \curvearrowright L^\infty(X^G, d\mu^G)$ is ergodic, and so the ergodic theorem applies to $a$ in this case. Putting the two facts together and arguing as for the ergodicity in Theorem 2.1 yields the ergodicity of $\alpha$.

Note that $1_A$ lies in $L^\infty(X, d\mu)$, and can thus be identified with an element of $\mathcal{M}$ by the previous embedding. We set
$$a := \sum_{i=0}^{4} 1_A \cdot (2 - e_i - e_i^{-1}) \cdot 1_A.$$
Clearly $a$ is non-negative, since $2 - e_i - e_i^{-1} = (1 - e_i)(1 - e_i)^*$ for the unitary $e_i$. Now let $n$ be non-zero, and consider the expression
$$\tau(a\,\alpha^n(a)\,\alpha^{2n}(a)\,\alpha^{3n}(a)\,\alpha^{4n}(a)). \tag{22}$$
Expanding out $a$, we obtain a linear combination of terms of the form
$$\tau\big(1_A\, g_0\, 1_A\;\, 1_{S^n(A)} (T^n g_1) 1_{S^n(A)}\;\, 1_{S^{2n}(A)} (T^{2n} g_2) 1_{S^{2n}(A)}\;\, 1_{S^{3n}(A)} (T^{3n} g_3) 1_{S^{3n}(A)}\;\, 1_{S^{4n}(A)} (T^{4n} g_4) 1_{S^{4n}(A)}\big)$$
where $g_0, g_1, g_2, g_3, g_4 \in \{\mathrm{id}, e_0, e_1, e_2, e_3, e_4, e_0^{-1}, e_1^{-1}, e_2^{-1}, e_3^{-1}, e_4^{-1}\}$. This trace vanishes unless
$$g_0\, (T^n g_1)\, (T^{2n} g_2)\, (T^{3n} g_3)\, (T^{4n} g_4) = \mathrm{id}. \tag{23}$$
By Proposition B.9, we conclude that $g_0, \ldots, g_4$ are either all equal to the identity, or are a permutation of $e_0, \ldots, e_4$, or are a permutation of $e_0^{-1}, \ldots, e_4^{-1}$. In the latter two cases, the contribution to (22) is either zero or negative (being the negative of the trace of a product of several non-negative elements in a commutative von Neumann algebra). Here we are using the fact that $5$ is odd. Discarding all of these contributions except the one where $g_i = e_i$ for all $i$ (which has a non-trivial contribution thanks to Proposition B.9), we conclude that (22) is at most
$$10^5\, \tau\big(1_A 1_{S^n(A)} 1_{S^{2n}(A)} 1_{S^{3n}(A)} 1_{S^{4n}(A)}\big) - \tau\big(1_A e_0 1_A\;\, 1_{S^n(A)} (T^n e_1) 1_{S^n(A)}\;\, 1_{S^{2n}(A)} (T^{2n} e_2) 1_{S^{2n}(A)}\;\, 1_{S^{3n}(A)} (T^{3n} e_3) 1_{S^{3n}(A)}\;\, 1_{S^{4n}(A)} (T^{4n} e_4) 1_{S^{4n}(A)}\big).$$
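The evaluation of the second expression relies on property (20) of the skew shift in Theorem 2.8. That property can be illustrated in a finite model: replace $\mathbb{R}/\mathbb{Z}$ by $\mathbb{Z}/q\mathbb{Z}$ for an odd prime $q$, and the rotation by an integer step `a`. This discretisation is our own illustration, not the construction of [4], and the function names are hypothetical. For $A = (\mathbb{Z}/q\mathbb{Z}) \times B$ and $0 < n < q$ one gets $\mu(A \cap S^{-n}(A)) = \mu(A)^2$ exactly, because the second coordinate of $S^n(x,y)$ is $y + 2nx + n^2 a$ and $x \mapsto 2nx$ is a bijection mod $q$. The sketch below iterates $S$ directly rather than using that closed form:

```python
def skew_shift(point, a, q):
    """Finite analogue of S(x, y) = (x + a, y + 2x + a) on (Z/qZ)^2."""
    x, y = point
    return ((x + a) % q, (y + 2 * x + a) % q)


def power(point, n, a, q):
    """Apply the skew shift n >= 0 times."""
    for _ in range(n):
        point = skew_shift(point, a, q)
    return point


def overlap(B, n, a, q):
    """#{p in A : S^n(p) in A} for A = (Z/qZ) x B, i.e. q^2 * mu(A cap S^{-n}(A))."""
    return sum(1 for x in range(q) for y in B if power((x, y), n, a, q)[1] in B)


q, a, B = 11, 3, {0, 1, 2, 5}
for n in range(1, q):
    # mu(A cap S^{-n}(A)) = (|B|/q)^2 = mu(A)^2
    assert overlap(B, n, a, q) == len(B) ** 2
print(overlap(B, q, a, q))  # 44: S^q is the identity in this finite model
```

The failure at $n = q$ is an artefact of the finite model. On the torus the same computation works for every $n \neq 0$, since $x \mapsto 2nx$ is measure-preserving on $\mathbb{R}/\mathbb{Z}$.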
By Theorem 2.8, the first expression is at most $10^5 \mu(A)^{11}$. Now consider the second expression. By Proposition B.9, the partial products $e_0\, (T^n e_1) \cdots (T^{in} e_i)$ for $i = 0, 1, 2, 3$ are distinct from each other and from the identity. Hence, after moving the group elements to the right using (21), the five pairs of indicator functions lie in five distinct coordinates of $X^G$, and the second expression factorizes as
$$\mu(S^{4n}(A) \cap A)\, \mu(A \cap S^n(A))\, \mu(S^n(A) \cap S^{2n}(A))\, \mu(S^{2n}(A) \cap S^{3n}(A))\, \mu(S^{3n}(A) \cap S^{4n}(A)),$$
which by (20) is equal to $\mu(A)^{10}$. Thus the expression (22) is at most $10^5 \mu(A)^{11} - \mu(A)^{10}$, which is negative if the upper bound $\delta$ for $\mu(A)$ is chosen to be sufficiently small.

This concludes the proof of Theorem 2.7.

Remark. Given that the counterexample in Theorem 2.8 can be extended to any $k \ge 5$, it seems reasonable to expect that Theorem 1.13 can be extended to all $k \ge 5$ (including even $k$), though we have not pursued this issue. On the other hand, the analogue of Theorem 2.8 fails for $k = 4$, as was shown in [4]. Because of this, the $k = 4$ case of Theorem 1.13 remains open; the construction given here does not work, but it is possible that some other construction would suffice instead.

3. Inclusions of finite von Neumann dynamical systems

In this section we quickly recall some fairly well-known constructions relating to von Neumann dynamical systems and their basic properties, culminating in a treatment of Popa's noncommutative version of the Furstenberg-Zimmer dichotomy from [32]. This material will be needed to establish the structure theorem (Theorem 1.9).

Let $(\mathcal{M}, \tau)$ be a finite von Neumann algebra. As noted in the introduction, we can embed $\mathcal{M}$ into a Hilbert space $L^2(\tau)$. In order to distinguish the algebra structure from the Hilbert space structure, we shall refer in this section to the embedded copy of an element $a \in \mathcal{M}$ of the algebra in $L^2(\tau)$ as $\hat{a}$ rather than $a$; thus for instance $\widehat{\mathcal{M}} = \{\hat{a} : a \in \mathcal{M}\}$ is a dense subspace of $L^2(\tau)$. [Footnote: It is tempting to ignore these distinctions and identify $\widehat{\mathcal{M}}$ with $\mathcal{M}$. While this is normally quite a harmless identification, we will take some care here because we will be studying the bimodule action of $\mathcal{M}$ on $L^2(\tau)$, and keeping track of this action can become notationally confusing if the algebra elements are identified with the vectors that they act on.]

Clearly, $L^2(\tau)$ has the structure of an $\mathcal{M}$-bimodule, formed by extending the regular bimodule structure on $\mathcal{M}$ by density; the left representation is, of course, the classical Gel'fand-Naimark-Segal representation associated to $\tau$. When it is necessary to denote the copy of $\mathcal{M}$ in $B(L^2(\tau))$ consisting of the members of $\mathcal{M}$ acting by multiplication on the left (respectively, right), we will denote this algebra by $\mathcal{M}_{\mathrm{left}}$ (respectively, $\mathcal{M}_{\mathrm{right}}$).

The space $L^2(\tau)$ contains a distinguished vector $\hat{1}$ (the image of the identity of $\mathcal{M}$), with the property that $a\hat{1} = \hat{1}a = \hat{a}$ for all $a \in \mathcal{M}$. This vector will play a prominent role in the rest of this section.

Now let $(\mathcal{N}, \tau|_{\mathcal{N}})$ be a von Neumann subalgebra of $(\mathcal{M}, \tau)$ (with the inherited trace).
Then we can canonically identify $L^2(\tau|_{\mathcal{N}})$ with the closed subspace $\overline{\{\hat{b} : b \in \mathcal{N}\}} = \overline{\mathcal{N}\hat{1}} = \overline{\widehat{\mathcal{N}}}$ of $L^2(\tau)$ in the obvious manner.

We will make use of certain well-known properties of these constructs, which we merely recall here. A clear account of all of them can be found in [24, Chapters 1, 3].

First, it is important that there is a simple necessary and sufficient condition for a vector $\xi \in L^2(\tau)$ to lie in the dense subspace $\widehat{\mathcal{M}}$: this is so if and only if the linear operator
$$\widehat{\mathcal{M}} \to L^2(\tau): \hat{x} \mapsto x\xi$$
is bounded for the norm $\|\cdot\|_{L^2(\tau)}$, and so extends by continuity to a bounded operator $L^2(\tau) \to L^2(\tau)$. The necessity of this condition is clear, and its sufficiency requires just a little argument using the fact that for a finite von Neumann algebra $(\mathcal{M}, \tau)$ we have $\mathcal{M}_{\mathrm{right}} = \mathcal{M}''_{\mathrm{right}}$ and $\mathcal{M}_{\mathrm{left}} = \mathcal{M}''_{\mathrm{left}}$; see [24, Theorem 1.2.4].

A simple application of this condition now shows that the orthogonal projection $e_{\mathcal{N}}: L^2(\tau) \to \overline{\mathcal{N}\hat{1}}$ maps $\widehat{\mathcal{M}}$ into $\widehat{\mathcal{N}}$, and so defines also a linear operator $E_{\mathcal{N}}: \mathcal{M} \to \mathcal{N}$. Indeed, for $a \in \mathcal{M}$ we need only to show that the map
$$\widehat{\mathcal{M}} \to L^2(\tau): \hat{x} \mapsto x\, e_{\mathcal{N}}(\hat{a})$$
is bounded for the norm $\|\cdot\|_{L^2(\tau)}$. Since $\mathcal{N}$ is also a von Neumann algebra and $e_{\mathcal{N}}(\hat{a}) \in \overline{\mathcal{N}\hat{1}} \cong L^2(\tau|_{\mathcal{N}})$, it actually suffices to check this for $x \in \mathcal{N}$. However, since $\overline{\mathcal{N}\hat{1}}$ is an $(\mathcal{N}, \mathcal{N})$-sub-bimodule, left multiplication by $x$ commutes with $e_{\mathcal{N}}$, and so we have
$$\|x\, e_{\mathcal{N}}(\hat{a})\|_{L^2(\tau)} = \|e_{\mathcal{N}}(x\hat{a})\|_{L^2(\tau)} \le \|\widehat{xa}\|_{L^2(\tau)} \le \|a\|\, \|\hat{x}\|_{L^2(\tau)},$$
as required.

The linear operator $E_{\mathcal{N}}$ is referred to as the conditional expectation of $\mathcal{M}$ onto $\mathcal{N}$ associated to $\tau$, and it has the following readily-verified properties:

Lemma 3.1 (Properties of conditional expectation).
For all $a \in \mathcal{M}$, the operator $E_{\mathcal{N}}$ satisfies:
• (Idempotence) $E_{\mathcal{N}}(E_{\mathcal{N}}(a)) = E_{\mathcal{N}}(a)$;
• (Contractivity) $\|E_{\mathcal{N}}(a)\| \le \|a\|$;
• (Trace-preservation) $\tau|_{\mathcal{N}}(E_{\mathcal{N}}(a)) = \tau(a)$;
• (Positivity) $E_{\mathcal{N}}(a^*a) \ge 0$ (as a member of $\mathcal{N}$); and
• (Relation with $e_{\mathcal{N}}$) for all $\xi \in L^2(\tau)$, one has $e_{\mathcal{N}}(a(e_{\mathcal{N}}(\xi))) = E_{\mathcal{N}}(a)(e_{\mathcal{N}}(\xi)) = e_{\mathcal{N}}(E_{\mathcal{N}}(a)(\xi))$.

Example. If $\mathcal{M} = L^\infty(X, \mathcal{X}, \mu)$ for some probability measure $\mu$ with the usual trace, and $(Y, \mathcal{Y}, \nu)$ is a factor space of $(X, \mathcal{X}, \mu)$ with a measurable factor map $\pi: X \to Y$ that pushes $\mu$ forward to $\nu$, then $L^\infty(Y, \mathcal{Y}, \nu)$ can be identified with a subalgebra of $\mathcal{M}$, and the conditional expectation map becomes its classical counterpart from probability theory.

Together with $\mathcal{M}$, the orthogonal projection $e_{\mathcal{N}}$ now generates in $B(L^2(\tau))$ a larger von Neumann algebra $\langle \mathcal{M}, e_{\mathcal{N}} \rangle \supseteq \mathcal{M}$. In general $\langle \mathcal{M}, e_{\mathcal{N}} \rangle$ is no longer a finite von Neumann algebra, but it does contain the dense $*$-subalgebra $\mathcal{A} := \mathrm{lin}(\mathcal{M} \cup \{x e_{\mathcal{N}} y : x, y \in \mathcal{M}\})$, on which we define the lifted trace $\bar{\tau}: \mathcal{A} \to \mathbb{C}$ by specifying $\bar{\tau}(x e_{\mathcal{N}} y) = \tau(xy)$. By choosing an orthonormal basis for $L^2(\tau)$ relative to the right action of $\mathcal{N}$, and consequently realizing $\langle \mathcal{M}, e_{\mathcal{N}} \rangle$ as an amplification of $\mathcal{N}$, this linear map is seen to be non-negative and faithful, and hence defines a semifinite normal faithful $[0, +\infty]$-valued trace (which we still denote by $\bar{\tau}$) on the cone $\langle \mathcal{M}, e_{\mathcal{N}} \rangle_+$ of non-negative (and self-adjoint) elements of $\langle \mathcal{M}, e_{\mathcal{N}} \rangle$. This shows that the algebra $\langle \mathcal{M}, e_{\mathcal{N}} \rangle$ is semifinite (that is, any positive element of it may be approximated from below by positive elements of finite $\bar{\tau}$-trace). We will not spell out these standard manipulations here (see, for instance, [32, Section 1.5]), but we will invoke a notion of orthonormal basis for right-$\mathcal{N}$-submodules of $L^2(\tau)$ shortly.
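Both the conditional expectation and the lifted trace can be made concrete in finite dimensions. Take $\mathcal{M} = M_d(\mathbb{C})$ with its normalised trace, so that $L^2(\tau)$ is $M_d(\mathbb{C})$ itself with $\langle \xi, \eta \rangle = \tau(\xi^*\eta)$.

- If $\mathcal{N}$ is the diagonal subalgebra, both $e_{\mathcal{N}}$ and $E_{\mathcal{N}}$ simply keep the diagonal.
- If $\mathcal{N} = \mathbb{C}1$, then $e_{\mathcal{N}}(\eta) = \tau(\eta)\hat{1}$, and the ordinary trace on $B(L^2(\tau))$ of $\xi \mapsto x\,e_{\mathcal{N}}(y\xi)$ (computed in the orthonormal basis $\{\sqrt{d}\,E_{ij}\}$) equals $\tau(xy)$, matching $\bar{\tau}(x e_{\mathcal{N}} y)$.

The sketch below checks both numerically (Python, standard library only). These examples and all names are our own illustrations. Contractivity is checked only in the $L^2(\tau)$ norm, which is weaker than the operator-norm statement of Lemma 3.1.

```python
import random


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][m] * y[m][j] for m in range(n)) for j in range(n)] for i in range(n)]


def adjoint(x):
    n = len(x)
    return [[x[j][i].conjugate() for j in range(n)] for i in range(n)]


def tau(x):
    """Normalised trace on M_d(C)."""
    return sum(x[i][i] for i in range(len(x))) / len(x)


def inner(u, v):
    """Inner product <u, v> = tau(u^* v) of L^2(tau), identified with M_d(C)."""
    n = len(u)
    return sum(u[i][j].conjugate() * v[i][j] for i in range(n) for j in range(n)) / n


def E_diag(x):
    """E_N for N = diagonal matrices: keep the diagonal (e_N acts on L^2(tau) the same way)."""
    n = len(x)
    return [[x[i][j] if i == j else 0j for j in range(n)] for i in range(n)]


def lifted_trace_scalar(x, y):
    """Trace on B(L^2(tau)) of xi -> x e_N(y xi) for N = C1, where e_N(eta) = tau(eta) 1,
    computed in the orthonormal basis sqrt(d) E_ij of L^2(tau)."""
    n = len(x)
    total = 0j
    for i in range(n):
        for j in range(n):
            xi = [[complex(n ** 0.5) if (r, c) == (i, j) else 0j for c in range(n)]
                  for r in range(n)]
            s = tau(matmul(y, xi))
            total += inner(xi, [[s * x[r][c] for c in range(n)] for r in range(n)])
    return total


def close(x, y, tol=1e-9):
    return all(abs(p - q) < tol for rx, ry in zip(x, y) for p, q in zip(rx, ry))


def random_matrix(rng, n):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(n)]


rng = random.Random(2)
d = 3
a, xi, y = random_matrix(rng, d), random_matrix(rng, d), random_matrix(rng, d)

# Lemma 3.1 for N = diagonal matrices.
assert close(E_diag(E_diag(a)), E_diag(a))                            # idempotence
assert abs(tau(E_diag(a)) - tau(a)) < 1e-9                            # trace preservation
pos = E_diag(matmul(adjoint(a), a))
assert all(abs(pos[i][i].imag) < 1e-9 and pos[i][i].real >= 0 for i in range(d))  # positivity
assert inner(E_diag(a), E_diag(a)).real <= inner(a, a).real + 1e-9    # L^2(tau) contractivity
lhs = E_diag(matmul(a, E_diag(xi)))                                   # e_N(a e_N(xi))
assert close(lhs, matmul(E_diag(a), E_diag(xi)))                      # = E_N(a) e_N(xi)
assert close(lhs, E_diag(matmul(E_diag(a), xi)))                      # = e_N(E_N(a) xi)

# Lifted trace for N = C1: bar-tau(x e_N y) = tau(xy).
print(lifted_trace_scalar(a, y), tau(matmul(a, y)))
```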
Remark. In case $\mathcal{N} \subset \mathcal{M}$ is a finite-index inclusion of finite $\mathrm{II}_1$ factors, we find that $\langle \mathcal{M}, e_{\mathcal{N}} \rangle$ is also a finite $\mathrm{II}_1$ factor. Writing $\mathcal{M}_1$ for this factor, it follows that the above construction may be repeated with the inclusion $\mathcal{M} \hookrightarrow \mathcal{M}_1$ in place of $\mathcal{N} \hookrightarrow \mathcal{M}$, and indeed that it may be iterated to form an infinite tower of $\mathrm{II}_1$ factors
$$\mathcal{N} \subset \mathcal{M} \subset \mathcal{M}_1 \subset \mathcal{M}_2 \subset \cdots.$$
This is Jones' basic construction; it underlies his famous work [25] on the possible values of the index $[\mathcal{M} : \mathcal{N}]$, and also several more recent developments. Once again we refer the reader to [24] for a thorough account of its importance, and numerous further references. However, since the construction of this whole infinite tower is special to the case of $\mathrm{II}_1$ factors, we will not focus on it further here.

It is easy to check that the right action of any $n \in \mathcal{N}$ commutes with any $x e_{\mathcal{N}} y$, and hence with any member of $\langle \mathcal{M}, e_{\mathcal{N}} \rangle$, and in fact it can be shown that $\langle \mathcal{M}, e_{\mathcal{N}} \rangle' = \mathcal{N}_{\mathrm{right}}$, and hence that $\mathcal{N}'_{\mathrm{right}} = \langle \mathcal{M}, e_{\mathcal{N}} \rangle'' = \langle \mathcal{M}, e_{\mathcal{N}} \rangle$: firstly, if $A \in B(L^2(\tau))$ commutes with every $b \in \mathcal{M}_{\mathrm{left}}$ then it must be the right action of some $a \in \mathcal{M}$, and now if also $e_{\mathcal{N}}(\hat{a}) = \hat{a}$ then we must in fact have $a \in \mathcal{N}$ (see Proposition 3.1.2 in [24]). Let us record the following immediate but important consequence of this for our later work:

Lemma 3.2. If $V \le L^2(\tau)$ is a closed right-$\mathcal{N}$-submodule, then the orthogonal projection $P_V: L^2(\tau) \to V$ is a member of $\langle \mathcal{M}, e_{\mathcal{N}} \rangle$.
□

Using $\bar\tau$ we can also define an alternative completion of $\mathcal A = \mathrm{lin}\,\mathcal M e_{\mathcal N}\mathcal M$ for each $p \in [1,\infty)$ by setting $\|A\|_{p,\bar\tau} := \bar\tau\big((A^*A)^{p/2}\big)^{1/p}$ for $A \in \mathcal A$ (where as usual the power $(A^*A)^{p/2}$ is defined using spectral theory for the self-adjoint operator $A^*A$, and the non-negativity of $\bar\tau$ is used to show that $\bar\tau((A^*A)^{p/2})$ is finite even when $p/2$ is not an integer). We denote the resulting completion by $L^p(\bar\tau)$; it is a Hilbert space when $p = 2$. In general elements of $L^p(\bar\tau)$ do not correspond to elements of $\langle\mathcal M, e_{\mathcal N}\rangle$, but they do give possibly unbounded but closable operators that are weakly approximable by members of this algebra, and which are therefore affiliated to $\mathcal N_{\mathrm{right}}'$. If $A \in L^p(\bar\tau)$ is such an operator that is self-adjoint, then it admits a spectral decomposition
$$A = \int_{\mathbb R} s\,P(\mathrm ds)$$
for some spectral measure $P$ on $\mathbb R$ taking values in the projections of $\langle\mathcal M, e_{\mathcal N}\rangle \cap L^1(\bar\tau)$, of possibly unbounded support in $\mathbb R$, but for which
$$\|A\|_{p,\bar\tau}^p = \int_{\mathbb R} |s|^p\,\bar\tau P(\mathrm ds) < \infty.$$
If $V$ is as in Lemma 3.2, then we say that $P_V$ has finite lifted trace if it corresponds to a member of $\langle\mathcal M, e_{\mathcal N}\rangle \cap L^1(\bar\tau)$, that is, if $\bar\tau(P_V) < \infty$.

Now let us introduce some dynamics. Suppose that $\alpha$ is a shift on $\mathcal M$ which restricts to a shift on $\mathcal N$. Then, as mentioned in the introduction, $\alpha$ induces a unitary operator acting on $L^2(\tau)$, which we shall distinguish from $\alpha$ by writing it as $U_\alpha$; thus for instance $U_\alpha\hat a = U_\alpha(a\hat 1) = \alpha(a)\hat 1 = \widehat{\alpha(a)}$ for all $a \in \mathcal M$. It is clear that $\mathcal N\hat 1$ is invariant under $U_\alpha$ and $U_\alpha^{-1}$, so that $U_\alpha$ commutes with $e_{\mathcal N}$. Also, conjugation by $U_\alpha$ agrees with the action of $\alpha$ on $\mathcal M$, thus $U_\alpha a U_\alpha^{-1}\xi = \alpha(a)\xi$ for all $a \in \mathcal M$ and $\xi \in L^2(\tau)$. Thus, conjugation by $U_\alpha$ extends the action of $\alpha$ to $\langle\mathcal M, e_{\mathcal N}\rangle$.

The following special class of one-sided submodules of $L^2(\tau)$ appears here almost exactly as in the commutative setting.

Definition 3.3 (Finite-rank modules). A left- (respectively, right-) $\mathcal N$-submodule $V$ of $L^2(\tau)$ has finite rank if there are some $\xi_1, \xi_2, \dots, \xi_r \in V$ such that $V = \overline{\sum_{i=1}^r \mathcal N\xi_i}$ (respectively, $V = \overline{\sum_{i=1}^r \xi_i\mathcal N}$), and the numerical value of its rank is the least $r$ for which this is possible.

Proposition 3.4 (Relativized Gram-Schmidt procedure). If $V \le L^2(\tau)$ is a $U_\alpha$-invariant right-$\mathcal N$-submodule of finite rank $r$, then there are $\xi_1, \xi_2, \dots, \xi_r \in L^2(\tau)$ such that
• the subspaces $\overline{\xi_i\mathcal N} \le L^2(\tau)$ are pairwise orthogonal; and
• $V = \overline{\sum_{i=1}^r \xi_i\mathcal N}$.

Proof. This uses a relativized Gram-Schmidt argument much as in the commutative setting (see e.g. [18, Lemma 9.4]). We proceed by induction on $r$. If $V$ has rank 1 then the result is immediate from the definition, so let us suppose that it has rank $r+1$ for some $r \ge 1$. Then given a representation
$$V = \overline{\sum_{i=1}^{r+1} \xi_i^\circ\mathcal N},$$
we know that any member of $V$ may be approximated in $\|\cdot\|_{L^2(\tau)}$ by expressions of the form $\xi_1^\circ n_1 + \dots + \xi_{r+1}^\circ n_{r+1}$ for $n_1, n_2, \dots, n_{r+1} \in \mathcal N$. This, in turn, may be rewritten as
$$(\xi_1^\perp n_1 + \dots + \xi_r^\perp n_r) + \big((\xi_1^\circ - \xi_1^\perp)n_1 + \dots + (\xi_r^\circ - \xi_r^\perp)n_r\big) + \xi_{r+1}^\circ n_{r+1},$$
where for each $i \le r$ we have decomposed $\xi_i^\circ$ into its component $\xi_i^\perp$ orthogonal to $\overline{\xi_{r+1}^\circ\mathcal N}$ and the remainder $\xi_i^\circ - \xi_i^\perp \in \overline{\xi_{r+1}^\circ\mathcal N}$. Since $\overline{\xi_{r+1}^\circ\mathcal N}$ is a right-$\mathcal N$-submodule, it follows that the second and third inner sums in the above decomposition both lie in $\overline{\xi_{r+1}^\circ\mathcal N}$, and now since $(\xi_{r+1}^\circ\mathcal N)^\perp$ is also a right-$\mathcal N$-submodule, we have in fact shown that $V = V_1 + \overline{\xi_{r+1}^\circ\mathcal N}$, where $V_1 := \overline{\sum_{i=1}^r \xi_i^\perp\mathcal N}$ is a rank-$r$ right-$\mathcal N$-submodule that is orthogonal to $\xi_{r+1}^\circ\mathcal N$. Applying the inductive hypothesis to $V_1$ now completes the proof. □

The following definition is also drawn from the commutative world. This notion has previously been extended to the setting of non-commutative algebras by Popa in [32], who discusses several other aspects and equivalent conditions in that paper. (See also [31], [11], [6] for an analysis of the absolute analogue of weak mixing, in which the subalgebra $\mathcal N$ is the trivial algebra $\mathbb C$.)

Definition 3.5 (Relative weak mixing). If $(\mathcal M, \tau, \alpha)$ is a von Neumann dynamical system and $\mathcal N \subset \mathcal M$ is an $\alpha$-invariant von Neumann subalgebra, then $\alpha$ is weakly mixing relative to $\mathcal N$ if for any $a \in \mathcal M \cap \mathcal N^\perp$ we have
$$\frac1N\sum_{n=1}^N \|E_{\mathcal N}(a^*\alpha^n(a))\|_{L^2(\tau)}^2 \to 0 \quad\text{as } N \to \infty.$$

The basic inverse theorem that we need, extending the idea of Furstenberg and Zimmer to the non-commutative context, is contained in the following proposition, which essentially re-proves part of [32, Lemma 2.10]:

Proposition 3.6 (Lack of weak mixing implies finite trace submodule). If $\alpha$ is not weakly mixing relative to $\mathcal N$, then there is a $U_\alpha$-invariant right-$\mathcal N$-submodule $V \le L^2(\tau) \ominus \overline{\mathcal N\hat 1}$ such that $P_V$ has finite lifted trace.

Proof. Suppose that $a \in \mathcal M \cap \mathcal N^\perp$ is such that
$$\frac1N\sum_{n=1}^N \|E_{\mathcal N}(a^*\alpha^n(a))\|_{L^2(\tau)}^2 \not\to 0.$$
Define $b := a e_{\mathcal N} a^* \in \langle\mathcal M, e_{\mathcal N}\rangle$, and now observe (using the cyclic permutability of $\bar\tau$ and the identity $e_{\mathcal N} m e_{\mathcal N} = E_{\mathcal N}(m) e_{\mathcal N}$) that for any $n \in \mathbb N$ we have
$$\bar\tau\big(b(U_\alpha^n b U_\alpha^{-n})\big) = \bar\tau\big(a e_{\mathcal N} a^* U_\alpha^n(a e_{\mathcal N} a^*)U_\alpha^{-n}\big) = \bar\tau\big(a e_{\mathcal N} a^*\alpha^n(a) e_{\mathcal N}\alpha^n(a)^*\big) = \bar\tau\big(E_{\mathcal N}(a^*\alpha^n(a)) e_{\mathcal N}\alpha^n(a)^* a\big) = \|E_{\mathcal N}(a^*\alpha^n(a))\|_{L^2(\tau)}^2.$$
Averaging in $n$, it follows that
$$\bar\tau\Big(b\,\frac1N\sum_{n=1}^N \alpha^n(b)\Big) \to \langle b, \tilde b\rangle_{\bar\tau} \neq 0,$$
where $\tilde b$ is the limit of the ergodic averages $\frac1N\sum_{n=1}^N \alpha^n(b)$ in the Hilbertian completion $L^2(\bar\tau)$, which is therefore invariant under the further extension of the unitary operator $U_\alpha$ to this Hilbert space.

This new element $\tilde b$ need not, in general, correspond to a member of $\langle\mathcal M, e_{\mathcal N}\rangle$ (it is easily seen to do so in the commutative setting, but for special reasons); however, as a $\|\cdot\|_{2,\bar\tau}$-limit of members of $\langle\mathcal M, e_{\mathcal N}\rangle = \mathcal N_{\mathrm{right}}'$ it can always be identified with a closed operator on $L^2(\tau)$ that is affiliated with $\mathcal N_{\mathrm{right}}'$ (that is, it commutes with the right action of $\mathcal N$), and as such it admits a spectral decomposition
$$\tilde b = \int_0^\infty s\,P(\mathrm ds)$$
for some resolution of the identity $P$ on $[0,\infty)$ whose contributing spectral projections lie in $\langle\mathcal M, e_{\mathcal N}\rangle$, and for which
$$\int_0^\infty s^2\,\bar\tau(P(\mathrm ds)) = \|\tilde b\|_{2,\bar\tau}^2 < \infty.$$
Hence $\bar\tau(P(I)) < \infty$ for any Borel subset $I \subseteq (0,\infty)$ bounded away from 0.
Now choosing any such subset $I$ for which $P(I) \neq 0$ gives an orthogonal projection $P(I) \in \langle\mathcal M, e_{\mathcal N}\rangle$ of finite lifted trace that is $U_\alpha$-invariant, commutes with the right $\mathcal N$-action because it lies in $\langle\mathcal M, e_{\mathcal N}\rangle$, and moreover has image orthogonal to $\widehat{\mathcal N}$ because we initially chose $a \in \mathcal M \cap \mathcal N^\perp$, so that $b$ (and hence $\tilde b$) takes values in the orthogonal complement of this subspace. □

Remark. The above implication can in fact be reversed, and these conditions shown to be equivalent to a number of others; see [32, Lemma 2.10] for a more complete picture. ◁
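The trace identity at the heart of the proof of Proposition 3.6 can be checked directly in finite dimensions. The sketch below is our own toy model, assuming NumPy. It takes $\mathcal M = M_d(\mathbb C)$ with $\tau = \mathrm{Tr}/d$, $\mathcal N$ the diagonal subalgebra, and $\alpha = \mathrm{Ad}(P)$ for a cyclic permutation matrix $P$ (which preserves $\tau$ and $\mathcal N$). It identifies $L^2(\tau)$ with $\mathbb C^{d^2}$ via column-major vectorization. In this model the lifted trace is $\mathrm{Tr}/d$ on $\langle\mathcal M, e_{\mathcal N}\rangle$; the code checks $\bar\tau(x e_{\mathcal N} y) = \tau(xy)$ on one random pair. It then compares $\bar\tau(b\,U_\alpha^k b\,U_\alpha^{-k})$ with $\|E_{\mathcal N}(a^*\alpha^k(a))\|^2_{L^2(\tau)}$ for $b = a e_{\mathcal N} a^*$.

```python
import numpy as np

d = 3
rng = np.random.default_rng(1)
I = np.eye(d)

def tau(x):
    return np.trace(x) / d

def E_N(x):
    return np.diag(np.diag(x))

# Shift alpha = Ad(P), P a cyclic permutation matrix: preserves tau and maps N into N.
P = np.roll(I, 1, axis=0)

def alpha(x, k=1):
    Pk = np.linalg.matrix_power(P, k)
    return Pk @ x @ Pk.T

# Operators on L^2(tau) = C^{d*d} with column-major vec: vec(A X B) = (B^T kron A) vec(X).
def left(x):
    return np.kron(I, x)                    # xi -> x xi

U = np.kron(P, P)                            # U_alpha: xi -> P xi P^T
diag_idx = [i + i * d for i in range(d)]
e_N = np.zeros((d * d, d * d))
e_N[diag_idx, diag_idx] = 1.0                # projection onto the diagonal matrices

def lifted_trace(T):
    # In this toy model Tr/d restricts to the lifted trace on <M, e_N>.
    return np.trace(T) / d

x = rng.standard_normal((d, d))
y = rng.standard_normal((d, d))
trace_check = np.isclose(lifted_trace(left(x) @ e_N @ left(y)), tau(x @ y))

a = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
a = a - E_N(a)                               # a in M ∩ N^perp
b = left(a) @ e_N @ left(a.conj().T)         # b = a e_N a^*

pairs = []
for k in range(1, 5):
    Uk = np.linalg.matrix_power(U, k)
    lhs = lifted_trace(b @ Uk @ b @ Uk.conj().T)   # U is a real permutation, so U^{-k} = (U^k)^T
    Ek = E_N(a.conj().T @ alpha(a, k))
    rhs = tau(Ek.conj().T @ Ek).real              # ||E_N(a^* alpha^k(a))||^2 in L^2(tau)
    pairs.append((lhs, rhs))
    print(k, round(lhs.real, 10), round(rhs, 10))
```

Of course every finite-dimensional system of this kind is periodic, so it is never relatively weakly mixing. The point is only to see the identity $e_{\mathcal N} m e_{\mathcal N} = E_{\mathcal N}(m) e_{\mathcal N}$ and the lifted trace at work on concrete matrices.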

In the next section we will push the above results a little further under the additional assumption that the subalgebra $\mathcal N$ is central, leading to the proof of Theorem 1.9.

4. The case of asymptotically abelian systems

We now specialize to the case of an asymptotically abelian system, with the crucial additional assumption that the subalgebra $\mathcal N$ is central.

Lemma 4.1.

Suppose that $(\mathcal M, \tau, \alpha)$ is a von Neumann dynamical system, $\mathcal N \subset \mathcal M$ is an $\alpha$-invariant central von Neumann subalgebra and $V \le L^2(\tau)$ is a $U_\alpha$-invariant right-$\mathcal N$-submodule of finite lifted trace. Then for any $\varepsilon > 0$ there is a further $U_\alpha$-invariant right-$\mathcal N$-submodule $V_1 \le V$ such that
• $\bar\tau(P_V - P_{V_1}) < \varepsilon$;
• $V_1$ has finite rank, say $r$;
• there are an orthogonal right-$\mathcal N$-basis $\xi_1, \xi_2, \dots, \xi_r$ of $V_1$ and a unitary matrix of unitary operators $U = (u_{ji})_{1\le i,j\le r} \in \mathrm U_{r\times r}(\mathcal N)$ such that
$$U_\alpha(\xi_i) = \sum_{j=1}^r \xi_j u_{ji} \qquad \forall\, i = 1, 2, \dots, r.$$
We refer to $U$ as the cocycle representing the action of $U_\alpha$ on the basis elements $\xi_i$.

Proof. We will prove this by invoking the representation of $\mathcal N$ on $L^2(\tau)$ as a direct integral coming from spectral theory. By the classical theory of direct integrals (see, for instance, [26, Chapter 14]), we can select
• a standard Borel probability space $(Y, \nu)$;
• a Borel partition $Y = \bigcup_{n\ge1} Y_n \cup Y_\infty$;
• a collection of Hilbert spaces $H_n$ for $n \in \{1, 2, \dots, \infty\}$ with $\dim(H_n) = n$; and
• a unitary equivalence
$$\Phi: L^2(\tau) \to H := \int_Y^\oplus H_y\,\nu(\mathrm dy),$$
where we define $H_y$ to be $H_n$ when $y \in Y_n$,
such that $\mathcal N$ (acting on the right or left, since these agree for a central subalgebra of $\mathcal M$) is identified with the algebra of functions $L^\infty(\nu)$ acting by pointwise multiplication. Explicitly, if we denote elements of $H$ as measurable sections $v: Y \to \coprod_{y\in Y} H_y$, then $f \in L^\infty(\nu)$ acts on $H$ by
$$M_f(v)(y) := f(y)v(y).$$
Moreover, in order to accommodate $\Phi(\mathcal N\hat 1)$ we select a measurable section $v_0 \in H$ with $\|v_0(y)\|_{H_y} \equiv 1$, and now $\Phi(\mathcal N\hat 1) = \{y \mapsto f(y)v_0(y) : f \in L^\infty(\nu)\}$, so that the orthogonal projection $\Phi e_{\mathcal N}\Phi^{-1}$ acts by
$$\Phi e_{\mathcal N}\Phi^{-1}(v)(y) := \langle v_0(y), v(y)\rangle_{H_y}\cdot v_0(y).$$
The larger algebra $\mathcal M_{\mathrm{right}}$ is identified under $\Phi$ with a direct integral
$$\int_Y^\oplus \mathcal M_y\,\nu(\mathrm dy),$$
so that elements of $\Phi(\mathcal M)$ are expressed as measurable sections $T: Y \to \coprod_{y\in Y} B(H_y)$ acting by
$$Tv(y) := T(y)(v(y))$$
and such that $T(y) \in \mathcal M_y$ $\nu$-almost surely, where $(\mathcal M_y)_{y\in Y}$ is a measurable field of finite von Neumann subalgebras of $B(H_y)$ for each of which the state $\mathcal M_y \to \mathbb C: T \mapsto \langle v_0(y), T(v_0(y))\rangle_{H_y}$ is a faithful finite trace; overall we have
$$\tau(a) = \langle\hat 1, a\hat 1\rangle = \int_Y \langle v_0(y), \Phi(a)(y)(v_0(y))\rangle_{H_y}\,\nu(\mathrm dy)$$
for $a \in \mathcal M$, and so in particular if $n \in \mathcal N$ then $\Phi(n) \in L^\infty(\nu)$ and $\tau(n) = \int \Phi(n)\,\mathrm d\nu$.

Given these data, for $a, b \in \mathcal M$ we can compute that
$$\Phi(a e_{\mathcal N} b)\Phi^{-1}v(y) = \langle\Phi(b^*)(y)(v_0(y)), v(y)\rangle\cdot\Phi(a)(y)(v_0(y))$$
and
$$\bar\tau(a e_{\mathcal N} b) = \tau(ab) = \int_Y \langle v_0(y), \Phi(ab)(y)(v_0(y))\rangle_{H_y}\,\nu(\mathrm dy) = \int_Y \langle\Phi(a^*)(y)(v_0(y)), \Phi(b)(y)(v_0(y))\rangle_{H_y}\,\nu(\mathrm dy) = \int_Y \mathrm{tr}\big(\Phi(a e_{\mathcal N} b)\Phi^{-1}|_{H_y}\big)\,\nu(\mathrm dy).$$
In this representation an $\mathcal N$-submodule $V \le L^2(\tau)$ corresponds to a subspace $\Phi(V) \le H$ of the form $\int_Y^\oplus V_y\,\nu(\mathrm dy)$ for some measurable subfield of Hilbert spaces $V_y \le H_y$, and the above calculation now shows that
$$\bar\tau(P_V) = \int_Y \dim(V_y)\,\nu(\mathrm dy),$$
so $P_V$ has finite lifted trace if and only if the function $y \mapsto \dim(V_y)$ is $\nu$-integrable.

We can enhance this picture further by noting that since $\alpha$ preserves $\mathcal N$ it must correspond to some $\nu$-preserving transformation $S \curvearrowright Y$, and that since it also preserves $\mathcal M$ and extends to a unitary operator on $L^2(\tau)$ it must also preserve each of the cells $Y_n$.
Similarly, since $V$ is $U_\alpha$-invariant, the transformation $S$ must preserve the function $y \mapsto \dim(V_y)$. It follows that the unitary operator $\Phi U_\alpha\Phi^{-1}$ on $H$ is actually given by a measurable section of unitary operators $\Psi: Y \to \coprod_{y\in Y}\mathrm U(H_y)$ such that
$$\Phi U_\alpha\Phi^{-1}v(y) = \Psi(y)\big(v(S^{-1}y)\big).$$
Now, since $y \mapsto \dim(V_y)$ is $\nu$-integrable, for sufficiently large $r$ we have
$$\int_{\{y\in Y:\,\dim(V_y) > r\}} \dim(V_y)\,\nu(\mathrm dy) < \varepsilon.$$
Define
$$W := \int^\oplus_{\{y\in Y:\,\dim(V_y)\le r\}} V_y\,\nu(\mathrm dy) \;\oplus\; \int^\oplus_{\{y\in Y:\,\dim(V_y) > r\}} \{0\}\,\nu(\mathrm dy)$$
and $V_1 := \Phi^{-1}(W)$. Clearly $V_1$ is still a right-$\mathcal N$-submodule that is $U_\alpha$-invariant, and it also has rank at most $r$ (it suffices to prove this for $W$, for which it follows by a relativized Gram-Schmidt construction of a fibrewise-orthonormal basis exactly as in the setting of commutative ergodic theory; see for instance [18, Lemma 9.4]). Also, we have
$$\bar\tau(P_V - P_{V_1}) = \int_{\{y\in Y:\,\dim(V_y) > r\}} \dim(V_y)\,\nu(\mathrm dy) < \varepsilon.$$
Finally, the selection of unitaries $\Psi$ must preserve the field of subspaces $V_y$ above the $S$-invariant set $\{y \in Y : \dim(V_y) = s\}$ for each $s \le r$. Choosing an abstract $d$-dimensional Euclidean space $W_d$ for each $d \le r$ and adjusting each fibre of $W$ by a unitary in order to identify each $V_y$ for which $\dim(V_y) \le r$ with $W_{\dim(V_y)}$, we obtain a new representation of $V_1$ as a right-$\mathcal N$-submodule using these fibres $W_d$, so that the action of $U_\alpha$ is now described by a measurable family of unitaries $\Psi'(y) \in \mathrm U(W_{\dim(V_y)})$. Picking an orthonormal basis for each $W_d$, writing these unitary operators as unitary matrices in terms of these bases, noting that their individual entries are now identified with elements of $L^\infty(\nu) = \Phi(\mathcal N)$, and carrying everything back to $L^2(\tau)$ using $\Phi^{-1}$ gives the desired expression for $U_\alpha$. □

Remark.
Frustratingly, both the fact that a $U_\alpha$-invariant $V$ of finite lifted trace may be approximated by a $U_\alpha$-invariant $V_1$ of finite rank, and the fact that given such a module of finite rank the action of $U_\alpha$ on it may be described by a unitary element in $\mathrm U(M_{r\times r}(\mathcal N))$, seem to be difficult to prove without the assumption that $\mathcal N$ is central and the resulting representation of the action of $\mathcal N$ on $L^2(\tau)$ as the multiplication action of some $L^\infty(\nu)$ on a measurable field of Hilbert spaces. It would be interesting to settle this issue more generally:

Question. Do these conclusions hold for a finite-lifted-trace invariant submodule corresponding to an arbitrary inclusion of finite von Neumann algebras with a trace-preserving automorphism? ◁
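The static part of the proof of Lemma 4.1 (fibrewise Gram-Schmidt, the formula $\bar\tau(P_V) = \int_Y \dim(V_y)\,\nu(\mathrm dy)$, and the choice of truncation rank $r$) can be mimicked on a finite probability space. The following is a minimal sketch under simplifying assumptions of our own: $Y$ has five points with hypothetical weights $\nu$, every fibre is $H_y = \mathbb C^D$, and the dynamics ($S$ and $\Psi$) is not modelled. It assumes NumPy.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 4                                            # fibre dimension (H_y = C^D for every y)
nu = np.array([0.4, 0.3, 0.2, 0.07, 0.03])       # weights of a finite probability space Y
ranks = [1, 2, 2, 3, 4]                          # intended dim V_y on each fibre

# Four generating sections xi_1..xi_4 : Y -> C^D whose fibres span subspaces of the given ranks.
gens = np.stack([rng.standard_normal((4, r)) @ rng.standard_normal((r, D)) for r in ranks])

def fibrewise_gram_schmidt(vectors, tol=1e-10):
    """Orthonormal basis (as rows) of span(vectors) in one fibre, by Gram-Schmidt."""
    basis = []
    for v in vectors:
        w = v.astype(complex)
        for e in basis:
            w = w - np.vdot(e, w) * e
        norm = np.linalg.norm(w)
        if norm > tol:
            basis.append(w / norm)
    return np.array(basis).reshape(len(basis), vectors.shape[1])

bases = [fibrewise_gram_schmidt(g) for g in gens]
dims = np.array([len(B) for B in bases])

# Lifted trace of P_V: integral of dim V_y against nu.
lifted_trace = float(nu @ dims)

def truncation_rank(nu, dims, eps):
    """Least r with sum over {dim V_y > r} of nu(y) * dim V_y below eps."""
    for r in range(dims.max() + 1):
        tail = float(nu[dims > r] @ dims[dims > r])
        if tail < eps:
            return r, tail

r, tail = truncation_rank(nu, dims, eps=0.15)

# Sections eta_1..eta_r spanning the truncated module W: keep V_y when dim V_y <= r, else 0.
eta = np.zeros((len(nu), r, D), dtype=complex)
for y, B in enumerate(bases):
    if len(B) <= r:
        eta[y, : len(B)] = B

# The submodules eta_i N are pairwise orthogonal iff <eta_i(y), eta_j(y)> = 0 for all y.
gram = np.einsum("yid,yjd->yij", eta.conj(), eta)
offdiag_ok = np.allclose(gram[:, ~np.eye(r, dtype=bool)], 0)
print(dims, lifted_trace, r, tail, offdiag_ok)
```

Here the discarded fibre (the one of dimension 4) carries lifted trace $0.03 \cdot 4 = 0.12 < \varepsilon = 0.15$, which plays the role of $\bar\tau(P_V - P_{V_1})$.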

Before moving on let us quickly note an important difference from the setting of abelian von Neumann algebras.

Example. If $\mathcal M$ is abelian, then from commutative ergodic theory it is well known that all the intermediate $U_\alpha$-invariant submodules $V \le L^2(\tau)$ that have finite rank over $\mathcal N$ together generate an intermediate subalgebra between $\mathcal N$ and $\mathcal M$, and that this then corresponds to an intermediate measure-preserving system. We will see shortly that an analogous conclusion can sometimes be recovered in the asymptotically abelian setting, but it is certainly not true for general finite-rank submodules, even when the smaller algebra $\mathcal N$ is abelian.

Consider, for example, the inclusion $i: L\mathbb Z \cong L^\infty(m_{\mathbb T}) \hookrightarrow L\mathbb F_2$ corresponding to the embedding of $\mathbb Z$ as the cyclic subgroup $a^{\mathbb Z}$ of the free group $\mathbb F_2 = \langle a, b\rangle$. Here $LG$ is the group von Neumann algebra of $G$, defined in Section 2.1. In this case we can identify $L^2(\tau)$ with $\ell^2(\mathbb F_2)$ and $L^2(\tau|_{\mathcal N})$ with the closed subspace spanned by $\{\xi_{a^n}\}_{n\in\mathbb Z}$. Now define $\alpha \in \mathrm{Aut}\,L\mathbb F_2$ simply by lifting the group automorphism of $\mathbb F_2$ that fixes $a$ and maps $b \mapsto ba$. Now the subspace $V := \overline{\mathrm{lin}}\{\xi_{ba^n} : n \in \mathbb Z\} \le \ell^2(\mathbb F_2)$ is a $U_\alpha$-invariant right $\mathcal N$-module of rank one which is orthogonal to $L^2(\tau|_{\mathcal N})$. On the other hand, although $\xi_b \in \widehat{\mathcal M} \cap V$, we have $\alpha^m(\xi_b^2) = \alpha^m(\xi_b)^2 = \xi_{ba^mba^m}$ for $m \in \mathbb Z$, and it is easy to see that these elements of $\widehat{\mathcal M}$ do not remain within any finite-rank right-$\mathcal N$-submodule.

It is true that if $L^2(\tau) \ominus L^2(\tau|_{\mathcal N})$ contains a finite-rank right-$\mathcal N$-submodule $V$, then it also contains a finite-rank left-$\mathcal N$-module in the form of $J(V)$, where $J$ is the modular conjugation on $L^2(\tau)$, defined by extending the conjugation map $a \mapsto a^*$ on $\mathcal M \equiv \widehat{\mathcal M}$ by density. The point is that it can happen that $J(V) \perp V$, and that all elements of $J(V)$ are weakly mixed by $U_\alpha$: it is the right-module $V$, and no other, that serves as the obstruction to overall relative weak mixing coming from Theorem 1.8. ◁

We now introduce a useful technical concept.

Definition 4.3 (Central vectors). A vector $\xi \in L^2(\tau)$ is central if $m\xi = \xi m$ for all $m \in \mathcal M$.

Lemma 4.4 (No non-obvious central vectors). The closure $\overline{Z(\mathcal M)\hat 1}$ of $\widehat{Z(\mathcal M)}$ in $L^2(\tau)$ is equal to the set of all central vectors in $L^2(\tau)$.

Proof. Suppose that $\xi \in L^2(\tau)$ is central. Define $a_\xi: \mathcal M\hat 1 \to L^2(\tau)$ by $a_\xi(m\hat 1) := \xi m$. This is a densely defined linear operator on $L^2(\tau)$, and it is closable: if $m_n\hat 1 = \hat m_n \to 0$ in $\|\cdot\|_{L^2(\tau)}$ for some sequence $(m_n)_{n\ge1}$ in $\mathcal M$ and also $\xi m_n \to \xi'$ in $\|\cdot\|_{L^2(\tau)}$, then we have
$$\langle m'\hat 1, \xi'\rangle = \lim_{n\to\infty}\langle m'\hat 1, \xi m_n\rangle = \lim_{n\to\infty}\langle\widehat{m_n^*}, (m')^*\xi\rangle = 0$$
for every $m' \in \mathcal M$, and so in fact we must have $\xi' = 0$. Also, we clearly have
$$a_\xi(m\hat 1) = a_\xi(\hat m) = \xi m = m\xi = (a_\xi(\hat 1))m = m(a_\xi(\hat 1))$$
for all $m \in \mathcal M$, so $a_\xi$ is affiliated with both the right and left actions of $\mathcal M$ on $L^2(\tau)$. The same therefore holds for $a_\xi + a_\xi^*$ and $\mathrm i(a_\xi - a_\xi^*)$, and now these are self-adjoint, so each of them may be expressed as an unbounded spectral integral all of whose contributing spectral projections must lie in $\mathcal M'_{\mathrm{left}} \cap \mathcal M'_{\mathrm{right}} = Z(\mathcal M)$. Therefore, approximating $a_\xi = \tfrac12(a_\xi + a_\xi^*) + \tfrac12(a_\xi - a_\xi^*)$ by a sum of two large but bounded integrals with respect to the respective resolutions of the identity, we obtain a sequence of elements $a_n \in Z(\mathcal M)$ such that $a_n \to a_\xi$ pointwise on $\mathrm{dom}(\mathrm{clos}(a_\xi)) \supseteq \mathcal M\hat 1$, and hence such that $a_n\hat 1 \to \xi$ in $\|\cdot\|_{L^2(\tau)}$. Hence $\xi \in \overline{Z(\mathcal M)\hat 1}$. □

Proposition 4.5. If $(\mathcal M, \tau, \alpha)$ is an asymptotically abelian von Neumann dynamical system, $\mathcal N$ is a shift-invariant central von Neumann subalgebra, and $V \le L^2(\tau)$ is an $\alpha$-invariant right-$\mathcal N$-submodule having finite lifted trace, then all elements of $V$ are central vectors.

Proof. Clearly it will suffice to prove this for all finite-rank approximants $V_1$ to $V$ as given by Lemma 4.1. Thus we may assume that $V$ actually has finite rank. Let $\xi_1, \xi_2, \dots, \xi_r$ and $U = (u_{ji})_{1\le i,j\le r} \in M_{r\times r}(\mathcal N)$ be as given by the third part of that lemma.

Since $\alpha$ is asymptotically abelian, we have for any $\hat a \in \mathcal M\hat 1$ and $b \in \mathcal M$ that
$$\frac1N\sum_{n=1}^N \|bU_\alpha^n(\hat a) - U_\alpha^n(\hat a)b\|_{L^2(\tau)} = \frac1N\sum_{n=1}^N \|b\alpha^n(a) - \alpha^n(a)b\|_{L^2(\tau)} \to 0.$$
Approximating an arbitrary $\xi \in L^2(\tau)$ by elements of $\mathcal M\hat 1$, it follows that for each fixed $b \in \mathcal M$ and $\xi \in L^2(\tau)$ we have
$$\lim_{N\to\infty}\frac1N\sum_{n=1}^N \|bU_\alpha^n(\xi) - U_\alpha^n(\xi)b\|_{L^2(\tau)} = 0.$$

On the other hand, we know that
$$U_\alpha(\xi_i) = \sum_{j=1}^r \xi_j u_{ji} \qquad \forall\, i = 1, 2, \dots, r,$$
and so, writing $U^n = (u^{(n)}_{ji})_{1\le i,j\le r}$, we have
$$U_\alpha^{-n}(\xi_i) = \sum_{j=1}^r \xi_j u^{(-n)}_{ji} \;\Longrightarrow\; \xi_i = \sum_{j=1}^r U_\alpha^n(\xi_j)\,\alpha^n(u^{(-n)}_{ji}) \qquad \forall\, i = 1, 2, \dots, r.$$
Clearly each $u^{(-n)}_{ji}$ is still a unitary, and so from this, averaging in $n$ and the centrality of $\mathcal N$ we obtain
$$\|b\xi_i - \xi_i b\|_{L^2(\tau)} = \Big\|\frac1N\sum_{n=1}^N\Big(\sum_{j=1}^r bU_\alpha^n(\xi_j)\alpha^n(u^{(-n)}_{ji}) - \sum_{j=1}^r U_\alpha^n(\xi_j)\alpha^n(u^{(-n)}_{ji})\,b\Big)\Big\|_{L^2(\tau)} = \Big\|\frac1N\sum_{n=1}^N\sum_{j=1}^r\big(bU_\alpha^n(\xi_j) - U_\alpha^n(\xi_j)b\big)\alpha^n(u^{(-n)}_{ji})\Big\|_{L^2(\tau)} \le \sum_{j=1}^r\frac1N\sum_{n=1}^N \|bU_\alpha^n(\xi_j) - U_\alpha^n(\xi_j)b\|_{L^2(\tau)},$$
and now since each of the summands in $j$ tends to 0 as $N \to \infty$, it follows that we must in fact have $b\xi_i = \xi_i b$ for every $i \le r$, and hence (taking $\mathcal N$-linear combinations, which have central coefficients, and then a completion) that all vectors in $V$ are central, as required. □

Let us note explicitly the following simple corollary of the above result.

Corollary 4.6. If $(\mathcal M, \tau, \alpha)$ is an asymptotically abelian von Neumann dynamical system, then the subalgebra $\mathcal M^\alpha := \{a \in \mathcal M : \alpha(a) = a\}$ of individually $\alpha$-invariant elements is central.

Proof. Of course, if $\alpha(a) = a$ then $\mathrm{lin}\{\hat a\}$ is a rank-one $\alpha$-invariant submodule of $L^2(\tau)$ for the trivial central subalgebra $\mathcal N := \mathbb C1$, and the claim follows from Proposition 4.5. This claim can also be easily verified directly from the definition of asymptotic abelianness. □

Finally we can use the above results to prove Theorem 1.9.

Proof of Theorem 1.9. Suppose, for the sake of contradiction, that $\alpha$ were not weakly mixing relative to $Z(\mathcal M) \subset \mathcal M$. Then Proposition 3.6 gives a non-trivial $U_\alpha$-invariant right-$Z(\mathcal M)$-submodule $V \le L^2(\tau) \ominus \overline{Z(\mathcal M)\hat 1}$ such that $P_V$ has finite lifted trace, and by Proposition 4.5, $V$ must consist of central vectors. However, Lemma 4.4 now gives $V \le \overline{Z(\mathcal M)\hat 1}$, contradicting our assumption that $V \perp Z(\mathcal M)\hat 1$. □

Note that for the results in this section it suffices to assume that for every $a \in \mathcal M$ there exists a sequence $\{n_j\}$ such that $\lim_{j\to\infty}\|[\alpha^{n_j}(a), b]\|_{L^2(\tau)} = 0$ for every $b \in \mathcal M$. We do not know whether this condition is strictly weaker than asymptotic abelianness.

Remark. A variant of Theorem 1.9 can also be deduced from the results in [31] (more specifically, Theorem 4.2 and Proposition 5.5 of that paper); we thank the anonymous referee for pointing out this fact. More specifically, the result is that if $\alpha$ is an automorphism of a finite von Neumann algebra $\mathcal M$ that leaves invariant a faithful normal trace $\tau$, and $E_\tau$ is the conditional expectation to the factor $\mathcal M_r := \overline{\mathrm{lin}}^{\mathrm{wot}}\{a \in \mathcal M : \alpha(a) = \lambda a \text{ for some } \lambda \in \mathbb T\}$, then for any $a, b \in \mathcal M$ one has
$$\lim_{N\to\infty}\frac1N\sum_{n=1}^N \big|\langle E_\tau(a^*\alpha^n(a)) - E_\tau(a)^*\alpha^n(E_\tau(a)), b\rangle_{L^2(\tau)}\big| = 0;$$
in particular, for $n$ going to infinity along a density one set of integers, the expression $E_\tau(a^*\alpha^n(a)) - E_\tau(a)^*\alpha^n(E_\tau(a))$ converges to zero in the weak operator topology. This property is weaker than the relative weak mixing property with respect to this factor (which one does not expect to hold in general, even in the abelian case), but on the other hand does not require any hypothesis of asymptotic abelianness.

5. Triple averages for non-asymptotically-abelian systems

The use to which we put relative weak mixing in the preceding section is very special to asymptotically abelian systems: in general there seems to be no way to track the error term resulting from the rearrangement at the heart of the proof of Theorem 1.8 without this assumption. However, in the special case of triple averages this problem does simplify somewhat, provided we assume instead that our system $(\mathcal M, \tau, \alpha)$ is ergodic, so that $\mathcal M^\alpha = \mathbb C1$. In this case we will be able to obtain convergence weakly and in norm, as well as recurrence on a dense set (Theorem 1.10).

This assumption is not so innocuous as might be expected from its analogue in the world of commutative ergodic theory. In that setting it is possible quite generally to decompose a system (that is, more precisely, to decompose its invariant measure) into ergodic components, and then many assertions about the whole system, including multiple recurrence and the convergence of multiple averages, follow if they can be proved for each ergodic component separately. However, in the setting of a general von Neumann dynamical system, this decomposition is available only if $\mathcal M^\alpha$ is central in $\mathcal M$; otherwise the automorphism $\alpha$ can exhibit genuinely new phenomena precisely by virtue of having the nontrivial fixed subalgebra $\mathcal M^\alpha$ to 'move around'. This was already seen in the failure of recurrence on a dense set when the ergodicity hypothesis is dropped (Theorem 1.12).

The key to the convergence of triple averages is the following decomposition, similar to the commutative case, first established (in a slightly more general setting) in [31] (more specifically, from Theorem 4.2 and Proposition 5.5 in that paper); for the convenience of the reader we give a short proof of that decomposition here. Note that the result does not require ergodicity of the system. We remark that a closely related decomposition was also used in [13].

Proposition 5.1 (Decomposition of von Neumann dynamical systems) [31]. Let $(\mathcal M, \tau, \alpha)$ be a von Neumann dynamical system. Then one has the orthogonal decomposition $\mathcal M = \mathcal M_r \oplus \mathcal M_s$, where
$$\mathcal M_r := \overline{\mathrm{lin}}^{\mathrm{wot}}\{a \in \mathcal M : \alpha(a) = \lambda a \text{ for some } \lambda \in \mathbb T\}$$
and
$$\mathcal M_s := \Big\{a \in \mathcal M : \lim_{N\to\infty}\frac1N\sum_{n=1}^N |\tau(b\,\alpha^n(a))| = 0 \text{ for every } b \in \mathcal M\Big\},$$
i.e., $\mathcal M_r$ is the von Neumann subalgebra spanned by the eigenvectors of $\alpha$ and $\mathcal M_s$ is the subspace of the elements of $\mathcal M$ that are weakly mixed by $\alpha$. The corresponding projection onto $\mathcal M_r$ is the conditional expectation of $\mathcal M$ onto $\mathcal M_r$, and in particular preserves positivity.

Proof. Since the extension $U_\alpha$ of $\alpha$ to $L^2(\tau)$ is a unitary operator, the Jacobs–Glicksberg–de Leeuw decomposition holds for $U_\alpha$ (see e.g. [29, Section 2.4]), i.e., $L^2(\tau) = L^2_r(\tau) \oplus L^2_s(\tau)$, where the reversible part $L^2_r(\tau)$ is defined as
$$L^2_r(\tau) = \overline{\mathrm{lin}}\{x : U_\alpha(x) = \lambda x \text{ for some } \lambda \in \mathbb T\}$$
and the stable part $L^2_s(\tau)$ is defined as the space of all $x \in L^2(\tau)$ such that
$$\lim_{N\to\infty}\frac1N\sum_{n=1}^N |\langle U_\alpha^n(x), y\rangle| = 0 \quad\text{for every } y \in L^2(\tau).$$
Moreover, this decomposition is orthogonal since $U_\alpha$ is unitary. Note that we do not need here the Jacobs–Glicksberg–de Leeuw decomposition in full generality, but only its version for unitary operators, which can also be proved via the spectral theorem.

By a result of Størmer [34], the eigenvectors of $U_\alpha$ belong to $\mathcal M$. We thus have $\mathcal M_r = \mathcal M \cap L^2_r(\tau)$ and $\mathcal M_s = \mathcal M \cap L^2_s(\tau)$. The fact that the weak operator closure and the closure in the $L^2(\tau)$-topology coincide for self-adjoint subalgebras implies the second formula for $\mathcal M_r$, and thus $\mathcal M_r$ is a von Neumann subalgebra of $\mathcal M$. The conditional expectation now maps $\mathcal M$ onto $\mathcal M_r$, ensuring the orthogonal decomposition $\mathcal M = \mathcal M_r \oplus \mathcal M_s$. □

In the remainder of this section we assume our system is ergodic.
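A finite-dimensional toy model helps to see what the reversible part does. This sketch is our own illustration (the eigenvalue angles are arbitrary choices) and assumes NumPy. For a unitary on $\mathbb C^d$ every vector lies in the reversible part, and the von Neumann mean ergodic theorem says that the Cesàro averages $\frac1N\sum_{n=1}^N U^n x$ converge to the projection of $x$ onto $\ker(U - I)$. Eigen-components with eigenvalue $\lambda \neq 1$ average out. This is the mechanism used in the $\mathcal M_r$ case of Proposition 5.2, with $U$ replaced by $\lambda U_\alpha^2$. The weakly mixing part $L^2_s(\tau)$ is invisible here, since it can only be nonzero in infinite dimensions.

```python
import numpy as np

# Diagonal unitary U on C^3 with eigenvalues exp(2*pi*i*theta_k).
thetas = np.array([0.0, np.sqrt(2) - 1.0, 1.0 / 3.0])
x = np.ones(3, dtype=complex)

def cesaro_average(thetas, x, N):
    """(1/N) * sum_{n=1}^N U^n x for U = diag(exp(2 pi i theta_k))."""
    n = np.arange(1, N + 1)[:, None]
    return np.exp(2j * np.pi * n * thetas[None, :]).mean(axis=0) * x

# Projection of x onto the fixed space ker(U - I): keep only the eigenvalue-1 components.
fixed_projection = np.where(np.isclose(thetas % 1.0, 0.0), 1.0, 0.0) * x
avg = cesaro_average(thetas, x, N=30000)
print(np.round(avg, 6), fixed_projection)
```

The component with $\theta = 1/3$ cancels exactly over full periods, while the one with $\theta = \sqrt 2 - 1$ decays like $1/N$.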

Proposition 5.2 (Convergence of triple averages). Let $(\mathcal M, \tau, \alpha)$ be an ergodic von Neumann dynamical system. Then the averages
$$\frac1N\sum_{n=1}^N \alpha^n(a)\,\alpha^{2n}(b) \tag{24}$$
converge in $\|\cdot\|_{L^2(\tau)}$ as $N \to \infty$ for every $a, b \in \mathcal M$.

Proof.

By the above proposition, it suffices to assume that $a$ and $b$ each belong to $\mathcal M_r$ or $\mathcal M_s$. Suppose first that $a \in \mathcal M_r$, and fix $b$. The operators $S_N$ given by
$$S_N x = \frac1N\sum_{n=1}^N \alpha^n(x)\,\alpha^{2n}(b)$$
are linear and bounded on $\mathcal M$ for the norm $\|\cdot\|_{L^2(\tau)}$, uniformly in $N$ (indeed $\|S_N x\|_{L^2(\tau)} \le \|b\|\,\|x\|_{L^2(\tau)}$), so we may assume that $\alpha(a) = \lambda a$ for some $\lambda \in \mathbb T$. Then
$$S_N a = \frac1N\sum_{n=1}^N a\,(\lambda\alpha^2)^n(b),$$
which converges in $L^2(\tau)$ by the mean ergodic theorem.

Suppose now that $a \in \mathcal M_s$. We show that the desired limit is zero. Consider $u_n := \alpha^n(a)\,\alpha^{2n}(b)\hat 1$. Then
$$\langle u_n, u_{n+j}\rangle = \tau\big(\alpha^{2n}(b^*)\,\alpha^n(a^*)\,\alpha^{n+j}(a)\,\alpha^{2n+2j}(b)\big) = \tau\big(\alpha^n(b^*)\,a^*\alpha^j(a)\,\alpha^{n+2j}(b)\big) = \tau\big(a^*\alpha^j(a)\,\alpha^n(\alpha^{2j}(b)b^*)\big).$$
The ergodicity of the system implies
$$\gamma_j := \lim_{N\to\infty}\Big|\frac1N\sum_{n=1}^N \langle u_n, u_{n+j}\rangle\Big| = \Big|\tau\Big(a^*\alpha^j(a)\lim_{N\to\infty}\frac1N\sum_{n=1}^N \alpha^n(\alpha^{2j}(b)b^*)\Big)\Big| = |\tau(a^*\alpha^j(a))|\cdot|\tau(\alpha^{2j}(b)b^*)|.$$
Since $a \in \mathcal M_s$ and the quantities $\tau(\alpha^{2j}(b)b^*)$ are bounded in $j$, we have
$$\lim_{N\to\infty}\frac1N\sum_{j=1}^N \gamma_j = 0,$$
and therefore by the classical van der Corput lemma for Hilbert spaces (see e.g. [15] or [3]), we have $\lim_{N\to\infty}\frac1N\sum_{n=1}^N u_n = 0$. □

Remarks. (1) For compact non-ergodic systems the averages (24) converge as well, since $\mathcal M = \mathcal M_r$ in this case; this was also observed in [6].
(2) As in the commutative case we see that the Kronecker subalgebra $\mathcal M_r$ is characteristic for (24), i.e., the limit of the averages in (24) does not change if we replace $a$ by $E_{\mathcal M_r}a$ and $b$ by $E_{\mathcal M_r}b$. ◁

As was shown in Corollary 2.4, one cannot expect the limit
$$\lim_{N\to\infty}\frac1N\sum_{n=1}^N \tau\big(a\,\alpha^n(a)\,\alpha^{2n}(a)\big)$$
to be positive for every positive $a$. However, a modification extending [6, Theorem 5.13] is still true.

Proposition 5.3.

For an ergodic von Neumann dynamical system $(\mathcal M, \tau, \alpha)$, one has
$$\liminf_{N\to\infty}\frac1N\sum_{n=1}^N \big(\mathrm{Re}\,\tau(a\,\alpha^n(a)\,\alpha^{2n}(a))\big)_+ > 0$$
for every $0 < a \in \mathcal M$. In particular, one has recurrence on a dense set.

Proof.

Decompose $a = b + c$ with $b \in \mathcal M_r$ and $c \in \mathcal M_s$ as in Proposition 5.1, with $b > 0$ (the projection onto $\mathcal M_r$ is a conditional expectation, so it preserves positivity and the trace). We first show that there exist a compact abelian group $G$, an open set $U \subset G$ and $g \in G$ such that for the 1-step Bohr set $K_U := \{n \in \mathbb N : g^n \in U\}$ one has
$$\mathrm{Re}\,\tau\big(b\,\alpha^n(b)\,\alpha^{2n}(b)\big) > \frac{\tau(b^3)}2 > 0 \quad\text{for all } n \in K_U. \tag{25}$$
Take $\varepsilon := \tau(b^3)/(18\|b\|^2)$. Since $b \in \mathcal M_r$, we find $k \in \mathbb N$, $\lambda_1, \dots, \lambda_k \in \mathbb T$ and $b_1, \dots, b_k \in \mathcal M \setminus \{0\}$ such that $\alpha(b_j) = \lambda_j b_j$ for every $j = 1, \dots, k$ and $\|b - (b_1 + \dots + b_k)\|_{L^2(\tau)} < \varepsilon$. Set now $G := \mathbb T^k$, $g := (\lambda_1, \dots, \lambda_k)$ and $U := U_{\varepsilon/(k\max_j\|b_j\|_{L^2(\tau)})}(1) \subset \mathbb T^k$, where $U_\delta(1)$ denotes the set of $z \in \mathbb T^k$ with $|z_j - 1| < \delta$ for all $j$. Observe that for every $n$ such that $g^n \in U$, we have $|\lambda_j^n - 1| < \varepsilon/(k\max_j\|b_j\|_{L^2(\tau)})$ for every $j = 1, \dots, k$, and therefore
$$\|\alpha^n(b) - b\|_{L^2(\tau)} \le \|\alpha^n(b_1 + \dots + b_k) - (b_1 + \dots + b_k)\|_{L^2(\tau)} + 2\|b_1 + \dots + b_k - b\|_{L^2(\tau)} \le \max_j\|b_j\|_{L^2(\tau)}\big(|\lambda_1^n - 1| + \dots + |\lambda_k^n - 1|\big) + 2\varepsilon < \max_j\|b_j\|_{L^2(\tau)}\,\frac{k\varepsilon}{k\max_j\|b_j\|_{L^2(\tau)}} + 2\varepsilon = 3\varepsilon.$$
So we have by the Cauchy-Schwarz inequality
$$\big|\tau(b\,\alpha^n(b)\,\alpha^{2n}(b)) - \tau(b^3)\big| \le \big|\tau\big(b\,\alpha^n(b)(\alpha^{2n}(b) - b)\big)\big| + \big|\tau\big(b(\alpha^n(b) - b)b\big)\big| \le \|b\|^2\big(\|\alpha^{2n}(b) - b\|_{L^2(\tau)} + \|\alpha^n(b) - b\|_{L^2(\tau)}\big) \le 3\|b\|^2\|\alpha^n(b) - b\|_{L^2(\tau)} < 9\|b\|^2\varepsilon = \frac{\tau(b^3)}2,$$
and (25) is proved.

Take now $V := U_{\varepsilon/(2k\max_j\|b_j\|_{L^2(\tau)})}(1) \subset U$ and a continuous function $f: G \to [0,1]$ with $\mathbf 1_V \le f \le \mathbf 1_U$. Then by (25) $\mathrm{Re}\,\tau(b\,\alpha^n(b)\,\alpha^{2n}(b))$ is positive whenever $f(g^n) \neq 0$, and therefore
$$\liminf_{N\to\infty}\frac1N\sum_{n=1}^N f(g^n)\,\mathrm{Re}\,\tau\big(b\,\alpha^n(b)\,\alpha^{2n}(b)\big) \ge \liminf_{N\to\infty}\frac1N\sum_{n=1}^N \mathbf 1_V(g^n)\,\mathrm{Re}\,\tau\big(b\,\alpha^n(b)\,\alpha^{2n}(b)\big).$$
Since the set $K_V := \{n \in \mathbb N : g^n \in V\} \subset K_U$ is syndetic (i.e. has bounded gaps) in $\mathbb N$, this implies by (25)
$$\liminf_{N\to\infty}\frac1N\sum_{n=1}^N f(g^n)\,\mathrm{Re}\,\tau\big(b\,\alpha^n(b)\,\alpha^{2n}(b)\big) > 0. \tag{26}$$
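To get a feel for the 1-step Bohr sets $K_U$ used above, here is a quick numerical look. It is our own illustration: the rotation numbers $\sqrt2, \sqrt3$ and the window $\eta$ are arbitrary choices, and it assumes NumPy. Writing $\lambda_j = e^{2\pi i\theta_j}$, one has $|\lambda_j^n - 1| = 2\sin(\pi\|n\theta_j\|)$, where $\|t\|$ is the distance to the nearest integer. So a neighbourhood of $1$ in $\mathbb T^k$ corresponds to a condition $\|n\theta_j\| < \eta$ for all $j$. By equidistribution the density of such a set should be close to $(2\eta)^k$. The largest gap over a finite window only suggests syndeticity; it does not prove it.

```python
import numpy as np

def bohr_set(thetas, eta, N):
    """K = {1 <= n <= N : ||n theta_j|| < eta for every j}, ||t|| = distance to nearest integer.

    With lambda_j = exp(2 pi i theta_j) one has |lambda_j^n - 1| = 2 sin(pi ||n theta_j||),
    so K is the Bohr set of a neighbourhood of 1 for the rotation g = (lambda_1, ..., lambda_k).
    """
    n = np.arange(1, N + 1)[:, None]
    frac = (n * np.asarray(thetas, dtype=float)[None, :]) % 1.0
    dist = np.minimum(frac, 1.0 - frac)
    return np.nonzero(np.all(dist < eta, axis=1))[0] + 1

N = 200_000
K = bohr_set([np.sqrt(2), np.sqrt(3)], eta=0.05, N=N)
density = K.size / N                 # expected to be close to (2 * 0.05)**2 = 0.01
max_gap = int(np.diff(K).max())      # finite-window gap, only suggestive of syndeticity
print(K[:5], density, max_gap)
```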
Next, we show that
$$\|\cdot\|_{L^2(\tau)}\text{-}\lim_{N\to\infty}\frac1N\sum_{n=1}^N f(g^n)\,\alpha^n(b)\,\alpha^{2n}(c) = 0. \tag{27}$$
To do this, we first consider a character $\gamma \in \widehat G$ and define $u_n := \gamma(g^n)\,\alpha^n(b)\,\alpha^{2n}(c)\hat 1$. We have
$$\langle u_n, u_{n+j}\rangle = \overline{\gamma(g^n)}\,\gamma(g^{n+j})\,\tau\big(\alpha^{2n}(c^*)\,\alpha^n(b^*)\,\alpha^{n+j}(b)\,\alpha^{2n+2j}(c)\big) = \gamma(g^j)\,\tau\big(\alpha^n(c^*)\,b^*\alpha^j(b)\,\alpha^{n+2j}(c)\big) = \gamma(g^j)\,\tau\big(b^*\alpha^j(b)\,\alpha^n(\alpha^{2j}(c)c^*)\big).$$
By ergodicity of $\alpha$,
$$\gamma_j := \lim_{N\to\infty}\Big|\frac1N\sum_{n=1}^N \langle u_n, u_{n+j}\rangle\Big| = \Big|\gamma(g^j)\,\tau\Big(b^*\alpha^j(b)\lim_{N\to\infty}\frac1N\sum_{n=1}^N \alpha^n(\alpha^{2j}(c)c^*)\Big)\Big| = |\tau(b^*\alpha^j(b))|\cdot|\tau(\alpha^{2j}(c)c^*)|,$$
and the assumption $c \in \mathcal M_s$ implies $\lim_{N\to\infty}\frac1N\sum_{j=1}^N \gamma_j = 0$. By the van der Corput estimate we thus have
$$\lim_{N\to\infty}\frac1N\sum_{n=1}^N u_n = \lim_{N\to\infty}\frac1N\sum_{n=1}^N \gamma(g^n)\,\alpha^n(b)\,\alpha^{2n}(c)\hat 1 = 0.$$
Since the characters form a total set in $C(G)$ and the operators
$$S_N f := \frac1N\sum_{n=1}^N f(g^n)\,\alpha^n(b)\,\alpha^{2n}(c)$$
are uniformly bounded on $C(G)$, (27) is proved. Analogously one also has
$$\|\cdot\|_{L^2(\tau)}\text{-}\lim_{N\to\infty}\frac1N\sum_{n=1}^N f(g^n)\,\alpha^n(c)\,\alpha^{2n}(b) = \|\cdot\|_{L^2(\tau)}\text{-}\lim_{N\to\infty}\frac1N\sum_{n=1}^N f(g^n)\,\alpha^n(c)\,\alpha^{2n}(c) = 0.$$
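For bookkeeping, the eight terms produced by inserting $a = b + c$ into the trilinear expression can be sorted as follows. This only restates the case analysis carried out in the rest of the proof:
$$\tau\big(a\,\alpha^n(a)\,\alpha^{2n}(a)\big) = \sum_{x,y,z\in\{b,c\}} \tau\big(x\,\alpha^n(y)\,\alpha^{2n}(z)\big),$$
where
• $(x,y,z) = (b,b,b)$ is the main term, controlled by (26);
• $(c,b,b)$, $(b,b,c)$, $(b,c,b)$ vanish for every $n$, by the orthogonality of $\mathcal M_r$ and $\mathcal M_s$;
• $(c,b,c)$, $(c,c,b)$, $(c,c,c)$, $(b,c,c)$ have $f(g^n)$-weighted Cesàro averages tending to $0$, by (27), its analogues, and the Cauchy-Schwarz inequality.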
The Cauchy-Schwarz inequality now implies that
$$\limsup_{N\to\infty}\Big|\frac1N\sum_{n=1}^N f(g^n)\,\tau\big(c\,\alpha^n(b)\,\alpha^{2n}(c)\big)\Big| = \limsup_{N\to\infty}\Big|\tau\Big(c\,\frac1N\sum_{n=1}^N f(g^n)\,\alpha^n(b)\,\alpha^{2n}(c)\Big)\Big| \le \|c\|_{L^2(\tau)}\limsup_{N\to\infty}\Big\|\frac1N\sum_{n=1}^N f(g^n)\,\alpha^n(b)\,\alpha^{2n}(c)\Big\|_{L^2(\tau)} = 0,$$
and analogously for the Cesàro sums of $f(g^n)\,\tau(c\,\alpha^n(c)\,\alpha^{2n}(b))$, $f(g^n)\,\tau(c\,\alpha^n(c)\,\alpha^{2n}(c))$ and $f(g^n)\,\tau(b\,\alpha^n(c)\,\alpha^{2n}(c))$, while
$$\tau\big(c\,\alpha^n(b)\,\alpha^{2n}(b)\big) = \tau\big(b\,\alpha^n(b)\,\alpha^{2n}(c)\big) = \tau\big(b\,\alpha^n(c)\,\alpha^{2n}(b)\big) = 0$$
follows from the orthogonality of $\mathcal M_r$ and $\mathcal M_s$ and the fact that $\mathcal M_r$ is an $\alpha$-invariant self-adjoint subalgebra of $\mathcal M$.
Combining this with (26), we obtain by the linearity of $\tau$
$$\liminf_{N\to\infty}\frac1N\sum_{n=1}^N \big(\mathrm{Re}\,\tau(a\,\alpha^n(a)\,\alpha^{2n}(a))\big)_+ \ge \liminf_{N\to\infty}\frac1N\sum_{n=1}^N f(g^n)\big(\mathrm{Re}\,\tau(a\,\alpha^n(a)\,\alpha^{2n}(a))\big)_+ = \liminf_{N\to\infty}\frac1N\sum_{n=1}^N f(g^n)\big(\mathrm{Re}\,\tau(b\,\alpha^n(b)\,\alpha^{2n}(b))\big)_+ > 0.$$
□

Closing remarks

We present some remarks concerning Problem 1.15. By Theorem 1.10, we have a positive answer to this question when the invariant algebra $\mathcal{M}^\alpha$ is trivial. One can also extend these arguments to cover the case when the invariant algebra $\mathcal{M}^\alpha$ is central, by representing $\mathcal{M}$ as a direct integral over $\mathcal{M}^\alpha$; see Kadison–Ringrose [26, Chapter 14].

It is clear that if the answer to Problem 1.16 is always positive, then the same is true for Problem 1.15. What is less obvious is that the converse is true: if the answer to Problem 1.15 is always positive, then so is the answer to Problem 1.16. To see this, let $(\mathcal{M},\tau)$ be a finite von Neumann algebra with two commuting shifts $\alpha_1,\alpha_2$. We then form the infinite tensor product $\mathcal{M}^{\mathbb{Z}} := \bigotimes_{n\in\mathbb{Z}}\mathcal{M}$, which is another finite von Neumann algebra and contains an embedded copy of $\mathcal{M}$ via the $0$ coordinate of $\mathbb{Z}$. Next, let $G$ be the free abelian group on two generators $e,f$, and let $U$ be the action of $G$ on $\mathcal{M}^{\mathbb{Z}}$ defined by
$$U(e)\bigotimes_{n\in\mathbb{Z}}a_n := \bigotimes_{n\in\mathbb{Z}}\alpha_1^2\alpha_2^{-1}(a_n) \quad\text{and}\quad U(f)\bigotimes_{n\in\mathbb{Z}}a_n := \bigotimes_{n\in\mathbb{Z}}a_{n-1}$$
for all $a_n\in\mathcal{M}$ with all but finitely many $a_n$ equal to $1$. If we define a shift $\alpha'$ on $\mathcal{M}^{\mathbb{Z}}$ by the formula
$$\alpha'\bigotimes_{n\in\mathbb{Z}}a_n := \bigotimes_{n\in\mathbb{Z}}\alpha_1^{2n+1}\alpha_2^{-n}(a_n),$$
we then observe the identities
$$\alpha'\,U(e)\,(\alpha')^{-1} = U(e) \quad\text{and}\quad \alpha'\,U(f)\,(\alpha')^{-1} = U(fe)$$
(here we use the hypothesis that $\alpha_1,\alpha_2$ commute). Because of this, we can define a shift $\alpha$ on the crossed product $\mathcal{M}^{\mathbb{Z}}\rtimes_U G$ by declaring $\alpha$ to equal $\alpha'$ on $\mathcal{M}^{\mathbb{Z}}$, and $\alpha(e) := e$, $\alpha(f) := fe$. If $a_1,a_2$ lie in $\mathcal{M}^{\mathbb{Z}}$, we observe that
$$\alpha^n(a_1f)\,\alpha^{2n}(f^{-1}a_2f) = (\alpha')^n(a_1)\,\big((\alpha')^{2n}U(e)^{-n}(a_2)\big)\,f.$$
If we assume that $a_1,a_2$ in fact lie in $\mathcal{M}$, we can simplify this as $\alpha_1^n(a_1)\,\alpha_2^n(a_2)\,f$.
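The conjugation identities for $\alpha'$ and the simplification on the $0$ coordinate reduce to exponent bookkeeping, because commuting automorphisms compose like vectors in $\mathbb{Z}^2$. The Python check below assumes that $U(e)$ acts on every coordinate by $\alpha_1^2\alpha_2^{-1}$ and that $\alpha'$ acts on coordinate $n$ by $\alpha_1^{2n+1}\alpha_2^{-n}$; the function names are hypothetical and only the exponents are tracked.

```python
# Exponent bookkeeping for commuting automorphisms: alpha_1^p alpha_2^q <-> (p, q) in Z^2.
U_E = (2, -1)  # assumption: U(e) acts on every coordinate by alpha_1^2 alpha_2^{-1}

def alpha_prime(n):
    """Assumed exponent of alpha' on coordinate n: alpha_1^{2n+1} alpha_2^{-n}."""
    return (2 * n + 1, -n)

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scale(k, u):
    return (k * u[0], k * u[1])

def conj_Uf(n):
    """Exponent by which alpha' U(f) alpha'^{-1} carries coordinate n-1 into coordinate n:
    undo alpha' on coordinate n-1, shift by U(f), then apply alpha' on coordinate n."""
    return add(alpha_prime(n), scale(-1, alpha_prime(n - 1)))

def check(nmax=50):
    for n in range(-nmax, nmax + 1):
        # alpha' U(f) alpha'^{-1} = U(fe): shift, then alpha_1^2 alpha_2^{-1} everywhere
        assert conj_Uf(n) == U_E
        # on coordinate 0: (alpha')^n = alpha_1^n
        assert scale(n, alpha_prime(0)) == (n, 0)
        # on coordinate 0: (alpha')^{2n} U(e)^{-n} = alpha_2^n
        assert add(scale(2 * n, alpha_prime(0)), scale(-n, U_E)) == (0, n)
    return True

print(check())  # True
```

The identity $\alpha'U(e)(\alpha')^{-1} = U(e)$ holds automatically in this abelian bookkeeping, so it is not tested separately.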
Thus, if we assume that Problem 1.15 has an affirmative answer for the system $\mathcal{M}^{\mathbb{Z}}\rtimes_U G$, we see that the averages of $\alpha_1^n(a_1)\alpha_2^n(a_2)f$ (and hence of $\alpha_1^n(a_1)\alpha_2^n(a_2)$) converge for arbitrary $a_1,a_2\in\mathcal{M}$; from this one easily deduces (after dividing $n$ into even and odd classes) that Problem 1.16 has an affirmative answer for the system $\mathcal{M}$. In particular, we see that the task of establishing Problem 1.15 in the affirmative for arbitrary von Neumann dynamical systems is at least as hard as that of achieving convergence for two commuting shifts in the abelian case, a result first obtained in [8].

One can also cover some other (non-ergodic, non-abelian) cases of Problem 1.15 by ad hoc methods. Suppose for instance that $\mathcal{M}$ is a group von Neumann algebra $LG$, with shift $\alpha$ induced by an automorphism $\alpha: G\to G$ of the group. Then one can answer Problem 1.15 affirmatively as follows. Firstly, by density and linearity we may assume that $a_1,a_2$ are themselves group elements: $a_1 = g_1\in G$, $a_2 = g_2\in G$. We then see that the means of $\alpha^n(g_1)\alpha^{2n}(g_2)$ will converge to zero unless there exists a group element $g$ for which
$$\alpha^n(g_1)\,\alpha^{2n}(g_2) = g \tag{28}$$
for all $n$ in a set of positive upper density. But such sets contain non-trivial parallelograms $n, n+h, n+k, n+h+k$ with $h,k>0$. Applying (28) for $n$ and $n+h$ and rearranging, one obtains
$$\alpha^n\big(g_2\,\alpha^{2h}(g_2^{-1})\big) = g_1^{-1}\alpha^h(g_1).$$
Similarly, applying (28) for $n+k$ and $n+h+k$, one has
$$\alpha^{n+k}\big(g_2\,\alpha^{2h}(g_2^{-1})\big) = g_1^{-1}\alpha^h(g_1).$$
Writing $u := g_1^{-1}\alpha^h(g_1)$, one thus has $\alpha^h(g_1) = g_1u$ and $\alpha^k(u) = u$. If we then write
$$v := g_1^{-1}\alpha^{hk}(g_1) = u\,\alpha^h(u)\cdots\alpha^{(k-1)h}(u),$$
we see that $\alpha^{hkn}(g_1) = g_1v^n$ for all $n$, and $\alpha^{hk}(v) = v$. Thus we have
$$\alpha^{hkn+j}(g_1)\,\alpha^{2hkn+2j}(g_2) = \alpha^j\big(g_1\,v^n\,(\alpha^{2hk})^n(\alpha^j(g_2))\big)$$
for any $n,j$. The means of this in $n$ converge in $L^2(\tau)$ by the mean ergodic theorem (since $\alpha^{2hk}(v) = v$, the map $x\mapsto v\,\alpha^{2hk}(x)$ is a unitary on $L^2(\tau)$ whose $n$-th power is $x\mapsto v^n(\alpha^{2hk})^n(x)$). Summing over all $0\le j<hk$ we obtain weak convergence, thus answering Problem 1.15 affirmatively in this case. The same type of argument also lets one deal with crossed products of abelian systems by groups, in which the shift acts as an automorphism on the group; we omit the details.

Finally, we remark that the results on asymptotically abelian systems, while stated for $\mathbb{Z}^k$-systems, should in fact be valid for any commuting action of a general locally compact second countable (lcsc) abelian group.

Appendix A. An application of the van der Corput lemma

The purpose of this appendix is to establish Theorem 1.8. Our arguments follow [31, Proposition 7.4, Theorem 7.5] closely (see also [6, Proposition 4.4] for another adaptation of the same argument). We may normalise $\alpha_0$ to be the identity.

We induct on $k\ge 2$. When $k = 2$ we know from the usual mean ergodic theorem for von Neumann algebras (see e.g. [29, Section 9.1]) that
$$\frac1N\sum_{n=1}^N\alpha_1^n(a_1)\to E_{\mathcal{M}^{\alpha_1}}(a_1) \quad\text{in } \|\cdot\|_{L^2(\tau)},$$
and since $\mathcal{M}^{\alpha_1}\subseteq\mathcal{N}$ by the relative weak mixing assumption, we also have
$$\frac1N\sum_{n=1}^N\alpha_1^n(E_{\mathcal{N}}(a_1))\to E_{\mathcal{M}^{\alpha_1}}(E_{\mathcal{N}}(a_1)) = E_{\mathcal{M}^{\alpha_1}}(a_1) \quad\text{in } \|\cdot\|_{L^2(\tau)},$$
so combining these conclusions gives the result.

Now suppose that $k\ge 3$ and that the claim has already been established for $\ell<k$ automorphisms. By decomposing each $a_i$ as $(a_i - E_{\mathcal{N}}(a_i)) + E_{\mathcal{N}}(a_i)$ and expanding out the expression $\prod_{i=1}^{k-1}\alpha_i^n(a_i)$, we find that it suffices to show that for any $i\le k-1$,
$$a_i\perp\mathcal{N} \implies \frac1N\sum_{n=1}^N\prod_{i=1}^{k-1}\alpha_i^n(a_i)\to 0 \quad\text{in } \|\cdot\|_{L^2(\tau)};$$
let us argue the case $i = 1$, the others following at once by symmetry.

By the Hilbert-space-valued version of the classical van der Corput estimate (see, for instance, [15] or [3]) this will follow if we show that
$$\frac1H\sum_{h=1}^H\Big|\frac1N\sum_{n=1}^N\Big\langle\prod_{i=1}^{k-1}\alpha_i^{n+h}(a_i),\,\prod_{i=1}^{k-1}\alpha_i^n(a_i)\Big\rangle_\tau\Big| = \frac1H\sum_{h=1}^H\Big|\frac1N\sum_{n=1}^N\tau\big(\alpha_{k-1}^n(\alpha_{k-1}^h(a_{k-1}^*))\cdots\alpha_1^n(\alpha_1^h(a_1^*))\cdot\alpha_1^n(a_1)\cdots\alpha_{k-1}^n(a_{k-1})\big)\Big|\to 0$$
as $N\to\infty$ and then $H\to\infty$.

Let us now set $b_i := \alpha_i^n(\alpha_i^h(a_i^*))$ and $c_i := \alpha_i^n(a_i)$ to lighten notation.
Having done so, we now set ourselves up for applying the asymptotic abelianness property by observing that
$$\begin{aligned}
b_{k-1}b_{k-2}b_{k-3}\cdots c_1c_2\cdots &= (b_{k-2}b_{k-1}b_{k-3}\cdots c_1c_2\cdots) + ([b_{k-1},b_{k-2}]\,b_{k-3}\cdots c_1c_2\cdots)\\
&= (b_{k-2}b_{k-3}b_{k-1}b_{k-4}\cdots c_1c_2\cdots) + (b_{k-2}[b_{k-1},b_{k-3}]\,b_{k-4}\cdots c_1c_2\cdots) + ([b_{k-1},b_{k-2}]\,b_{k-3}b_{k-4}\cdots c_1c_2\cdots)\\
&\;\;\vdots\\
&= b_{k-2}b_{k-3}\cdots b_1c_1c_2\cdots c_{k-2}\,(b_{k-1}c_{k-1}) + \sum_{j=1}^{k-2}x_j[b_{k-1},b_j]\,y_j + \sum_{j=1}^{k-2}u_j[b_{k-1},c_j]\,v_j,
\end{aligned}$$
where each $x_j,y_j,u_j$ and $v_j$ for $1\le j\le k-2$ is a product of elements from $\{b_i,c_i : i\le k-1\}$. Importantly, there is some $M>0$ such that $\|x_j\|,\|y_j\|,\|u_j\|,\|v_j\|\le M$ for all $j\le k-2$, not depending on $n$ or $h$, while on the other hand for any $j\le k-2$ we have
$$[b_{k-1},b_j] = \big[\alpha_{k-1}^n(\alpha_{k-1}^h(a_{k-1}^*)),\,\alpha_j^n(\alpha_j^h(a_j^*))\big],$$
and hence overall (using that $\alpha_{k-1}^n$ preserves $\|\cdot\|_{L^2(\tau)}$ and commutes with $\alpha_j$)
$$\frac1N\sum_{n=1}^N\Big\|\sum_{j=1}^{k-2}x_j[b_{k-1},b_j]\,y_j\Big\|_{L^2(\tau)} \le M^2\sum_{j=1}^{k-2}\frac1N\sum_{n=1}^N\|[b_{k-1},b_j]\|_{L^2(\tau)} = M^2\sum_{j=1}^{k-2}\frac1N\sum_{n=1}^N\big\|[\alpha_{k-1}^h(a_{k-1}^*),\,(\alpha_{k-1}^{-1}\alpha_j)^n(\alpha_j^h(a_j^*))]\big\|_{L^2(\tau)}\to 0$$
as $N\to\infty$, by the asymptotic abelianness of $\alpha_{k-1}^{-1}\alpha_j$. The same reasoning applies to the term $\sum_{j=1}^{k-2}u_j[b_{k-1},c_j]\,v_j$, and applies again to show that in the scalar average of interest to us we may also commute $b_{k-2}$ from the left-hand end of our product over to be immediately on the left of $c_{k-2}$, then move $b_{k-3}$ to $c_{k-3}$, and so on. Overall, this shows that
$$\begin{aligned}
&\frac1H\sum_{h=1}^H\Big|\frac1N\sum_{n=1}^N\tau\big(\alpha_{k-1}^n(\alpha_{k-1}^h(a_{k-1}^*))\cdots\alpha_1^n(\alpha_1^h(a_1^*))\cdot\alpha_1^n(a_1)\cdots\alpha_{k-1}^n(a_{k-1})\big)\Big|\\
&\sim\frac1H\sum_{h=1}^H\Big|\frac1N\sum_{n=1}^N\tau\big(\alpha_1^n(\alpha_1^h(a_1^*)a_1)\cdots\alpha_{k-1}^n(\alpha_{k-1}^h(a_{k-1}^*)a_{k-1})\big)\Big|\\
&=\frac1H\sum_{h=1}^H\Big|\frac1N\sum_{n=1}^N\tau\big(\alpha_1^h(a_1^*)a_1\cdot(\alpha_2\alpha_1^{-1})^n(\alpha_2^h(a_2^*)a_2)\cdots(\alpha_{k-1}\alpha_1^{-1})^n(\alpha_{k-1}^h(a_{k-1}^*)a_{k-1})\big)\Big|\\
&=\frac1H\sum_{h=1}^H\Big|\tau\Big(\alpha_1^h(a_1^*)a_1\cdot\Big(\frac1N\sum_{n=1}^N(\alpha_2\alpha_1^{-1})^n(\alpha_2^h(a_2^*)a_2)\cdots(\alpha_{k-1}\alpha_1^{-1})^n(\alpha_{k-1}^h(a_{k-1}^*)a_{k-1})\Big)\Big)\Big|
\end{aligned}$$
as $N\to\infty$ and then $H\to\infty$, where $\sim$ means that the difference of the two sides tends to zero in this limit.
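The expansion above is a telescoping identity: moving a factor $b$ to the right past $y_1,\dots,y_m$ produces exactly one commutator term $y_1\cdots y_{j-1}[b,y_j]\,y_{j+1}\cdots y_m$ per factor passed. The Python sketch below checks this identity exactly on random integer matrices, which stand in for the operators $b_i,c_i$; this is an illustrative assumption, and no asymptotic abelianness is modelled. The helper names are hypothetical.

```python
import random

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def prod(ms, n):
    """Ordered product of a list of n x n matrices (the identity if the list is empty)."""
    out = [[int(i == j) for j in range(n)] for i in range(n)]
    for m in ms:
        out = matmul(out, m)
    return out

def add(x, y):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]

def commutator(x, y):
    xy, yx = matmul(x, y), matmul(y, x)
    return [[p - q for p, q in zip(r, s)] for r, s in zip(xy, yx)]

def move_right(b, ys, n):
    """Split b y_1 ... y_m into y_1 ... y_m b plus the commutator terms
    y_1 ... y_{j-1} [b, y_j] y_{j+1} ... y_m, one for each j."""
    main = prod(ys + [b], n)
    errors = [prod(ys[:j] + [commutator(b, ys[j])] + ys[j + 1:], n)
              for j in range(len(ys))]
    return main, errors

def check(n=3, m=5, seed=1):
    rng = random.Random(seed)

    def rand_matrix():
        return [[rng.randint(-4, 4) for _ in range(n)] for _ in range(n)]

    b, ys = rand_matrix(), [rand_matrix() for _ in range(m)]
    main, errors = move_right(b, ys, n)
    total = main
    for e in errors:
        total = add(total, e)
    return total == prod([b] + ys, n)

print(check())  # True
```

In the proof the error terms are then controlled by $\|x_j[b,y_j]y_j'\|_{L^2(\tau)}\le M^2\|[b,y_j]\|_{L^2(\tau)}$, which is where asymptotic abelianness enters.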
However, we now notice that the inner average of operators with respect to $N$ here is precisely of the form hypothesized by the theorem, but involving only the $k-1$ automorphisms $\alpha_j\alpha_1^{-1}$, $j = 1,2,\dots,k-1$ (the first of which is the identity). By the induction hypothesis, the last expression is therefore asymptotically equal to
$$\frac1H\sum_{h=1}^H\Big|\tau\Big(\alpha_1^h(a_1^*)a_1\cdot\Big(\frac1N\sum_{n=1}^N(\alpha_2\alpha_1^{-1})^n(E_{\mathcal{N}}(\alpha_2^h(a_2^*)a_2))\cdots(\alpha_{k-1}\alpha_1^{-1})^n(E_{\mathcal{N}}(\alpha_{k-1}^h(a_{k-1}^*)a_{k-1}))\Big)\Big)\Big| = \frac1H\sum_{h=1}^H\Big|\tau\Big(E_{\mathcal{N}}(\alpha_1^h(a_1^*)a_1)\cdot\Big(\frac1N\sum_{n=1}^N(\alpha_2\alpha_1^{-1})^n(E_{\mathcal{N}}(\alpha_2^h(a_2^*)a_2))\cdots(\alpha_{k-1}\alpha_1^{-1})^n(E_{\mathcal{N}}(\alpha_{k-1}^h(a_{k-1}^*)a_{k-1}))\Big)\Big)\Big|,$$
where the last equality holds because the operator average in the inner brackets now lies in $\mathcal{N}$, so we may apply the usual identity for conditional expectations $\tau(aE_{\mathcal{N}}(b)) = \tau(E_{\mathcal{N}}(aE_{\mathcal{N}}(b))) = \tau(E_{\mathcal{N}}(a)E_{\mathcal{N}}(b))$. Writing
$$s_N := \frac1N\sum_{n=1}^N(\alpha_2\alpha_1^{-1})^n(E_{\mathcal{N}}(\alpha_2^h(a_2^*)a_2))\cdots(\alpha_{k-1}\alpha_1^{-1})^n(E_{\mathcal{N}}(\alpha_{k-1}^h(a_{k-1}^*)a_{k-1})),$$
we see that $\|s_N\|\le C$ for some fixed $C$ and all $N\in\mathbb{N}$, and now combining this bound with the Cauchy–Schwarz inequality we obtain
$$\frac1H\sum_{h=1}^H\big|\tau\big(E_{\mathcal{N}}(\alpha_1^h(a_1^*)a_1)\cdot s_N\big)\big| = \frac1H\sum_{h=1}^H\big|\big\langle\widehat{s_N^*},\,\widehat{E_{\mathcal{N}}(\alpha_1^h(a_1^*)a_1)}\big\rangle_{L^2(\tau)}\big| \le \frac1H\sum_{h=1}^H C\cdot\|E_{\mathcal{N}}(\alpha_1^h(a_1^*)a_1)\|_{L^2(\tau)}.$$
Finally, this tends to $0$ as $H\to\infty$ by our assumption that $a_1\perp\mathcal{N}$ and the relative weak mixing hypothesis. This completes the proof of Theorem 1.8.

Appendix B. A group theory construction

The purpose of this appendix is to describe explicitly a certain type of group, which we shall term a square group, generated by relations involving quadruples of generators. In particular, we will be able to solve the equality problem for such groups. Our arguments here are motivated by an observation of Grothendieck that groups can be identified with the sheaf of their flat connections on simplicial complexes, and experts will be able to detect the ideas of sheaf theory lurking beneath the surface of the material here, although we will not use that theory explicitly.

Definition B.1 (Square groups). A square base $\square = (H\cup V,\square)$ consists of the following data:
• A set $H\cup V$ of generators, partitioned into a subset $H$ of horizontal generators and a subset $V$ of vertical generators;
• A set $\square\subset(H\times V\times H\times V)\cup(V\times H\times V\times H)$ of quadruples $(e_1,e_2,e_3,e_4)$ of alternating orientation (thus if $e_1$ is horizontal then $e_2$ must be vertical, and so forth).
Furthermore, we require the following two axioms on the set $\square$:
• (Cyclic symmetry) If $(e_1,e_2,e_3,e_4)\in\square$, then $(e_2,e_3,e_4,e_1)\in\square$.
• (Unique continuation) If $e_1,e_2\in H\cup V$, then there is at most one quadruple $(e_1,e_2,e_3,e_4)\in\square$ with first two components $e_1$ and $e_2$.
If $\square$ is a square base, we define the square group $G_\square$ associated to that base to be the group generated by the generators $H\cup V$, subject to the relations $e_1e_2e_3e_4 = \mathrm{id}$ for all $(e_1,e_2,e_3,e_4)\in\square$. We define the alphabet of the square base (or square group) to be the set $H\cup V\cup H^{-1}\cup V^{-1}$ consisting of the horizontal and vertical generators and their formal inverses.

To describe square groups explicitly, we shall need some notation of a combinatorial and geometric nature. Let $\mathbb{N} := \{0,1,2,\dots\}$ denote the natural numbers.

Definition B.2 (Monotone paths and regions). A monotone path is a finite path in the discrete quadrant $\mathbb{N}^2$ from $(0,0)$ to some endpoint $(n,m)$ that consists only of rightward edges $(i,j)\to(i+1,j)$ and upward edges $(i,j)\to(i,j+1)$ (in particular, the path will have length $n+m$). Given a monotone path $\gamma$ from $(0,0)$ to $(n,m)$, the shadow of $\gamma$ is defined to be the set of all pairs $(i,j)\in\mathbb{N}^2$ such that $(i,j')\in\gamma$ for some $j'\ge j$. We say that one monotone path $\gamma'$ lies above another monotone path $\gamma$ with the same endpoint $(n,m)$ if the shadow of $\gamma'$ contains the shadow of $\gamma$. In such cases, we refer to the set-theoretic difference between the two shadows as a monotone region from $(0,0)$ to $(n,m)$, with $\gamma'$ and $\gamma$ referred to as the upper boundary and lower boundary of the region respectively.

Figure 2. A monotone region, bounded above and below by two monotone paths. Note the horizontal and vertical convexity of the monotone region.

We will also consider a monotone path as a degenerate example of a monotone region. Monotone regions are horizontally and vertically convex: if the two endpoints of a horizontal or vertical line segment in $\mathbb{N}^2$ lie in a monotone region, then so does the interior of that segment.

Definition B.3 (Flat connections). Fix a square base $\square$, and let $\Omega\subset\mathbb{N}^2$ be a set. A connection $\Gamma$ on $\Omega$ is an assignment $\Gamma((i,j)\to(i+1,j))\in H\cup H^{-1}$ of a horizontal element of the alphabet to every horizontal edge with $(i,j),(i+1,j)\in\Omega$, and an assignment $\Gamma((i,j)\to(i,j+1))\in V\cup V^{-1}$ of a vertical element of the alphabet to every vertical edge with $(i,j),(i,j+1)\in\Omega$. We adopt the convention that $\Gamma((i+1,j)\to(i,j)) := \Gamma((i,j)\to(i+1,j))^{-1}$ and $\Gamma((i,j+1)\to(i,j)) := \Gamma((i,j)\to(i,j+1))^{-1}$, where of course $(e^{-1})^{-1} := e$ for $e\in H\cup V$. We say that the connection $\Gamma$ is flat if for every square $(i,j),(i+1,j),(i,j+1),(i+1,j+1)$ in $\Omega$, there exists an oriented loop $f_1,f_2,f_3,f_4$ of horizontal and vertical edges around the square (in either orientation) such that $(\Gamma(f_1),\Gamma(f_2),\Gamma(f_3),\Gamma(f_4))\in\square$.
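A minimal Python sketch of the path and region notions of Definition B.2, encoding a monotone path as a string of rightward (`R`) and upward (`U`) steps. The encoding and function names are ours. The region is taken here to be the lattice points lying weakly between the two boundary paths, so that both boundaries belong to it; this is our reading, chosen to match how regions and their boundary paths are used later (for instance in Figure 3).

```python
def vertices(path):
    """Vertices of a monotone path from (0, 0) encoded as a string of 'R'/'U' steps."""
    i = j = 0
    pts = [(0, 0)]
    for step in path:
        if step == "R":
            i += 1
        elif step == "U":
            j += 1
        else:
            raise ValueError("a monotone path uses only 'R' and 'U' steps")
        pts.append((i, j))
    return pts

def column_ranges(path):
    """Map each column i to the lowest and highest j with (i, j) on the path."""
    ranges = {}
    for i, j in vertices(path):
        lo, hi = ranges.get(i, (j, j))
        ranges[i] = (min(lo, j), max(hi, j))
    return ranges

def shadow(path):
    """All (i, j) in N^2 such that (i, j') lies on the path for some j' >= j."""
    return {(i, j) for i, (_, hi) in column_ranges(path).items() for j in range(hi + 1)}

def lies_above(upper, lower):
    """Same endpoint, and the shadow of `upper` contains the shadow of `lower`."""
    return vertices(upper)[-1] == vertices(lower)[-1] and shadow(upper) >= shadow(lower)

def region(upper, lower):
    """Lattice points weakly between the two paths (both boundaries included)."""
    if not lies_above(upper, lower):
        raise ValueError("`upper` must lie above `lower`")
    up, low = column_ranges(upper), column_ranges(lower)
    return {(i, j) for i in up for j in range(low[i][0], up[i][1] + 1)}

print(sorted(region("URUR", "RURU")))
# [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (2, 2)]
```

With upper boundary `URUR` and lower boundary `RURU` the sketch returns a seven-point region of the same shape as the one in Figure 3, and a path taken as both boundaries gives back its own vertex set (the degenerate case).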

Figure 3. A monotone region $\{A,B,C,D,E,F,G\}$ (with $A = (0,0)$, $B = (0,1)$, etc.) and a connection on it with values $a,b,c,d,e,f,g,h\in G_\square$; thus for instance $\Gamma(B\to C) = b$ and $\Gamma(C\to B) = b^{-1}$. If for instance $(a,b,g^{-1},h^{-1})$ and $(f,e,d^{-1},c^{-1})$ are in $\square$, then this connection is flat.

A flat connection on a monotone region from $(0,0)$ to $(n,m)$ is said to be maximal if it cannot be extended to any strictly larger monotone region with the same endpoints. It is reduced if there does not exist a triple $(i,j),(i+1,j),(i+2,j)$ or $(i,j),(i,j+1),(i,j+2)$ in $\Omega$ such that
$$\Gamma((i,j)\to(i+1,j))\,\Gamma((i+1,j)\to(i+2,j)) = \mathrm{id} \quad\text{or}\quad \Gamma((i,j)\to(i,j+1))\,\Gamma((i,j+1)\to(i,j+2)) = \mathrm{id}.$$
In the degenerate case when $\Omega$ is just a monotone path, every connection is automatically flat, as there are no squares.

Let $\Gamma$ be a flat connection on a monotone region $\Omega$. Then one can integrate this connection to produce a map $\Phi_\Gamma:\Omega\to G_\square$ by setting $\Phi_\Gamma(0,0) := \mathrm{id}$ and $\Phi_\Gamma(v) := \Phi_\Gamma(u)\,\Gamma(u\to v)$ for all horizontal and vertical edges $u\to v$ in $\Omega$. From the flatness of $\Gamma$ and the "connected" nature of $\Omega$ it is easy to see that $\Phi_\Gamma$ exists and is unique. In particular, we can define the definite integral $|\Gamma|$ of $\Gamma$ to be the group element $|\Gamma| := \Phi_\Gamma(n,m)$, where $(n,m)$ is the endpoint of $\Omega$.

Example B.1. The definite integral of the flat connection in Figure 3 is equal to $abcd = abfe = hgcd = hgfe$. ◇

Observe that every group element $g$ in $G_\square$ can arise as a definite integral of some flat connection, simply by expressing $g$ as a word in the alphabet $H\cup V\cup H^{-1}\cup V^{-1}$ and creating an associated monotone path and connection for that word. Later on we shall see that the definite integral provides a one-to-one correspondence between group elements and maximal reduced flat connections (Corollary B.7). We have the following fundamental facts:

Lemma B.4. Let $\square$ be a square base, and let $(n,m)\in\mathbb{N}^2$.
• (Unique continuation) If $\Omega$ is a monotone region from $(0,0)$ to $(n,m)$, and $\gamma$ is a monotone path from $(0,0)$ to $(n,m)$ in $\Omega$, then any flat connection on $\Omega$ is uniquely determined by its restriction to $\gamma$. In other words, if $\Gamma,\Gamma'$ are two flat connections on $\Omega$ that agree on $\gamma$, then they agree on all of $\Omega$.
• (Maximality) If $\Omega$ is a monotone region from $(0,0)$ to $(n,m)$, and $\Gamma$ is a flat connection on $\Omega$, then there exists a unique extension of $\Gamma$ to a maximal flat connection on a monotone region $\overline{\Omega}$ from $(0,0)$ to $(n,m)$ containing $\Omega$.

Proof. We first establish unique continuation. This is best explained visually. The key observation is that if two flat connections on a square agree on two adjacent sides of the square, then they must agree on the whole square. This is ultimately a consequence of the unique continuation property of the square base $\square$, and can be verified by a routine case check. Thus, if $\Gamma,\Gamma'$ are two flat connections on $\Omega$ that agree on $\gamma$, they also agree on any perturbation of $\gamma$ in $\Omega$ formed by taking an adjacent pair of horizontal and vertical edges in $\gamma$ and "popping" them, that is, replacing them by the other two edges of the square that they form; note that this retains the property of being a monotone path. One can check that after a sufficient number of upward and downward "popping" operations one can cover the upper and lower boundaries of $\Omega$, and everything in between, and the claim follows.

Example B.2. We continue working with Figure 3. Suppose two flat connections $\Gamma,\Gamma'$ on the indicated region agree on the upper boundary $ABCDE$, with the indicated connection values $a,b,c,d$. By unique continuation of $\square$, the only possible values available for $\Gamma,\Gamma'$ on the remaining two edges $CF$, $FE$ of the square $CDEF$ are $f$ and $e$. Thus we may "pop" the upper square and obtain that $\Gamma,\Gamma'$ also agree on the monotone path $ABCFE$. After popping the lower square also, we obtain that $\Gamma,\Gamma'$ agree on the entire monotone region. ◇

To prove the second claim, we simply observe that if $\Gamma$ can be extended to two monotone regions $\Omega_1,\Omega_2$ containing $\Omega$, then by unique continuation the two extensions agree on the intersection $\Omega_1\cap\Omega_2$ (which is also a monotone region), and can thus be glued to form a flat connection on the union $\Omega_1\cup\Omega_2$ (which is also a monotone region¹). Since there are only finitely many monotone regions from $(0,0)$ to $(n,m)$, the claim then follows from the greedy algorithm. □
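The flatness condition, Example B.1 and the unique continuation step of Example B.2 can be checked mechanically on Figure 3. The Python sketch below assumes the vertex coordinates $A=(0,0)$, $B=(0,1)$, $C=(1,1)$, $D=(1,2)$, $E=(2,2)$, $F=(2,1)$, $G=(1,0)$ (inferred from the figure description and Example B.3, not stated in full in the text), encodes an alphabet symbol as a pair `(name, ±1)`, and takes as square base the cyclic rotations of $(a,b,g^{-1},h^{-1})$ and $(f,e,d^{-1},c^{-1})$. All names are hypothetical.

```python
def inv(s):
    """Formal inverse of an alphabet symbol (name, sign)."""
    name, sign = s
    return (name, -sign)

def pos(name):
    return (name, 1)

def neg(name):
    return (name, -1)

def rotations(t):
    return [t[k:] + t[:k] for k in range(4)]

def square_base(quadruples):
    """Close a list of quadruples under cyclic rotation (the cyclic symmetry axiom)."""
    return {r for t in quadruples for r in rotations(tuple(t))}

# Hypothetical coordinates for the vertices of Figure 3.
A, B, C, D, E, F, G = (0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 1), (1, 0)
REGION = {A, B, C, D, E, F, G}
EDGES = {(A, B): pos("a"), (B, C): pos("b"), (C, D): pos("c"), (D, E): pos("d"),
         (A, G): pos("h"), (G, C): pos("g"), (C, F): pos("f"), (F, E): pos("e")}
BOX = square_base([(pos("a"), pos("b"), neg("g"), neg("h")),
                   (pos("f"), pos("e"), neg("d"), neg("c"))])

def label(u, v):
    """Gamma(u -> v), with the convention Gamma(v -> u) = Gamma(u -> v)^{-1}."""
    return EDGES[(u, v)] if (u, v) in EDGES else inv(EDGES[(v, u)])

def is_flat():
    """Every unit square inside REGION has an oriented boundary loop with labels in BOX."""
    for (i, j) in REGION:
        corners = [(i, j), (i + 1, j), (i + 1, j + 1), (i, j + 1)]
        if not all(p in REGION for p in corners):
            continue
        found = False
        for loop in (corners, corners[::-1]):  # both orientations
            labels = tuple(label(loop[k], loop[(k + 1) % 4]) for k in range(4))
            found = found or any(r in BOX for r in rotations(labels))
        if not found:
            return False
    return True

def path_words(start=A, end=E):
    """Words along all monotone paths from start to end inside REGION
    (every label met is a generator, so the names suffice)."""
    if start == end:
        return [""]
    words = []
    for di, dj in ((1, 0), (0, 1)):
        nxt = (start[0] + di, start[1] + dj)
        if nxt in REGION:
            words += [label(start, nxt)[0] + w for w in path_words(nxt, end)]
    return words

def complete(x1, x2):
    """Unique continuation: the (x3, x4) with (x1, x2, x3, x4) in BOX, or None."""
    matches = [t[2:] for t in BOX if t[:2] == (x1, x2)]
    assert len(matches) <= 1, "unique continuation axiom violated"
    return matches[0] if matches else None

print(is_flat(), sorted(path_words()))   # True ['abcd', 'abfe', 'hgcd', 'hgfe']
print(complete(neg("d"), neg("c")))      # (('f', 1), ('e', 1))
```

The last line mirrors Example B.2: reading the square $CDEF$ from $E$ as $d^{-1}, c^{-1}$ forces the remaining edges $C\to F$ and $F\to E$ to carry $f$ and $e$.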

Now we need a fundamental definition.

Definition B.5 (Concatenation). Let $\Gamma$ be a maximal reduced flat connection on some monotone region $\Omega$ from $(0,0)$ to $(n,m)$, and let $x\in H\cup V\cup H^{-1}\cup V^{-1}$ be a symbol in the alphabet. We define the concatenation $\Gamma\cdot x$ of $\Gamma$ with $x$ to be the maximal flat connection $\Gamma' = \Gamma\cdot x$ on a monotone region $\Omega'$ from $(0,0)$ to $(n',m')$ generated by the following rules.
• (Collapse) If $x$ is horizontal (i.e. $x\in H\cup H^{-1}$), $(n-1,m)$ lies in $\Omega$, and $\Gamma((n-1,m)\to(n,m)) = x^{-1}$, then one sets $(n',m') := (n-1,m)$, sets $\Omega'$ to be the restriction of $\Omega$ to the region $\{(i,j)\in\mathbb{N}^2 : i\le n-1\}$ (i.e. one deletes the rightmost column of $\Omega$), and sets $\Gamma'$ to be the restriction of $\Gamma$ to $\Omega'$.
• (Extension) If $x$ is horizontal, and either $(n-1,m)$ lies outside of $\Omega$ or $\Gamma((n-1,m)\to(n,m))\neq x^{-1}$, then one sets $(n',m') := (n+1,m)$, and extends $\Gamma$ to $\Omega\cup\{(n+1,m)\}$ by setting $\Gamma((n,m)\to(n+1,m)) := x$; note that this is still flat because it does not create any squares. One then extends $\Gamma$ further by the second part of Lemma B.4 to create the maximal flat connection $\Gamma'$ on $\Omega'$ that extends $\Gamma$.
• If $x$ is vertical instead of horizontal, one follows the analogue of the above rules but with the roles of $n$ and $m$ reversed.

¹ One way to see this is to rotate the plane by 45 degrees, so that monotone paths become graphs of discrete Lipschitz functions with Lipschitz constant 1, and monotone regions become the regions between two such functions.

Example B.3. Imagine one concatenated a horizontal symbol $x$ to the flat connection in Figure 3, which we shall assume to be maximal reduced. If $x$ is not equal to $d^{-1}$, then the concatenated connection would extend one unit to the right of $E$ to the endpoint $(3,2)$, and possibly further to the right of the edge $EF$ if there is an appropriate tuple in $\square$ to achieve this extension. If instead $x$ were equal to $d^{-1}$, then the connection would collapse to the region $\{A,B,C,D,G\}$, so that the endpoint is now $D = (1,2)$. ◇

The importance of this definition lies in the fact that it gives a representation of $G_\square$:

Lemma B.6. Let $\square$ be a square base, and let $\Gamma$ be a maximal reduced flat connection.
• (Preservation of reducibility) For any $x\in H\cup V\cup H^{-1}\cup V^{-1}$, $\Gamma\cdot x$ is reduced.
• (Invertibility) For any $x\in H\cup V\cup H^{-1}\cup V^{-1}$, one has $(\Gamma\cdot x)\cdot x^{-1} = \Gamma$.
• (Square relations) For any $(e_1,e_2,e_3,e_4)\in\square$, one has $(((\Gamma\cdot e_1)\cdot e_2)\cdot e_3)\cdot e_4 = \Gamma$.
In particular, the group $G_\square$ acts on the space $O$ of maximal reduced flat connections in a unique manner, sending $\Gamma$ to $\Gamma\cdot g$ for any $\Gamma\in O$ and $g\in G_\square$.

Proof. We begin with the preservation of reducibility claim. If $\Gamma\cdot x$ is formed by collapsing $\Gamma$, the claim is clear, so suppose instead that $\Gamma\cdot x$ is formed by extension. By symmetry we may assume that $x$ is horizontal. Let $(n,m)$ denote the endpoint of $\Gamma$, and let $\Omega'$ be the domain of $\Gamma\cdot x$ (which then has endpoint $(n+1,m)$). Assume for contradiction that $\Gamma\cdot x$ is not reduced. Since $\Gamma$ was reduced, there are only two possibilities: either one has a vertical degeneracy
$$\Gamma((n+1,j)\to(n+1,j+1))\,\Gamma((n+1,j+1)\to(n+1,j+2)) = \mathrm{id} \tag{29}$$
for some $(n+1,j),(n+1,j+1),(n+1,j+2)\in\Omega'$, or else one has a horizontal degeneracy
$$\Gamma((n-1,j)\to(n,j))\,\Gamma((n,j)\to(n+1,j)) = \mathrm{id} \tag{30}$$
for some $(n-1,j),(n,j),(n+1,j)\in\Omega'$.

Suppose first that one has a vertical degeneracy (29). Consider the restrictions $\Gamma_1,\Gamma_2$ of the connection $\Gamma$ to the adjacent squares $((n,j),(n,j+1),(n+1,j),(n+1,j+1))$ and $((n,j+1),(n,j+2),(n+1,j+1),(n+1,j+2))$. By construction, $\Gamma_1,\Gamma_2$ agree on their common edge $(n,j+1)\to(n+1,j+1)$, and $\Gamma_1((n+1,j+1)\to(n+1,j))$ is equal to $\Gamma_2((n+1,j+1)\to(n+1,j+2))$. By the unique continuation property of $\square$, this implies that $\Gamma_1$ and $\Gamma_2$ are reflections of each other, and in particular that $\Gamma_1((n,j+1)\to(n,j))$ is equal to $\Gamma_2((n,j+1)\to(n,j+2))$. But this implies that $\Gamma$ is not reduced, a contradiction.

Now suppose instead that one has a horizontal degeneracy (30). From Definition B.5 we know that $j$ cannot equal $m$, otherwise we would have collapsed rather than extended $\Gamma$. Let $0\le j<m$ be the largest $j$ for which (30) holds. By repeating the argument in the previous paragraph, we see that the restrictions of $\Gamma$ to the adjacent squares $\{(n-1,j),(n,j),(n-1,j+1),(n,j+1)\}$ and $\{(n,j),(n+1,j),(n,j+1),(n+1,j+1)\}$ are reflections of each other, which implies that (30) also holds for $j+1$, contradicting the maximality of $j$. This establishes the preservation of reducibility.

Now we establish invertibility. Again, by symmetry we may assume that $x$ is horizontal. If $\Gamma\cdot x$ is a (horizontal) extension of $\Gamma$, then it is easy to see from Definition B.5 that $(\Gamma\cdot x)\cdot x^{-1}$ will be the (horizontal) collapse of $\Gamma\cdot x$, which is $\Gamma$. Conversely, if $\Gamma\cdot x$ is the (horizontal) collapse of $\Gamma$, then $(\Gamma\cdot x)\cdot x^{-1}$ will be the (horizontal) extension (because $\Gamma$ was reduced), which will equal $\Gamma$ again (by uniqueness of maximal extension).

Finally, we establish the square relations. From cyclic symmetry and invertibility we may assume that $e_1,e_3$ are horizontal and $e_2,e_4$ are vertical. From invertibility again, it suffices to show that
$$(\Gamma\cdot e_1)\cdot e_2 = (\Gamma\cdot e_4^{-1})\cdot e_3^{-1}$$
for any maximal reduced flat connection $\Gamma$. We use $(n,m)$ to denote the endpoint of $\Gamma$.

We divide into four cases. Suppose first that $\Gamma\cdot e_1$ is an extension of $\Gamma$, and that $(\Gamma\cdot e_1)\cdot e_2$ is an extension of $\Gamma\cdot e_1$. Then we claim that $\Gamma\cdot e_4^{-1}$ is an extension of $\Gamma$. For if this were not the case, then $\Gamma((n,m-1)\to(n,m))$ must equal $e_4$; but then, as $(\Gamma\cdot e_1)((n,m)\to(n+1,m))$ equals $e_1$ by construction, the domain of $\Gamma\cdot e_1$ must include the square $(n,m-1),(n,m),(n+1,m-1),(n+1,m)$ with $(\Gamma\cdot e_1)((n+1,m-1)\to(n+1,m)) = e_2^{-1}$, causing $(\Gamma\cdot e_1)\cdot e_2$ to be a collapse rather than an extension, a contradiction. Thus $\Gamma\cdot e_4^{-1}$ extends $\Gamma$. A similar argument shows that $(\Gamma\cdot e_4^{-1})\cdot e_3^{-1}$ extends $\Gamma\cdot e_4^{-1}$ (otherwise $\Gamma((n-1,m)\to(n,m))$ would equal $e_1^{-1}$, causing $\Gamma\cdot e_1$ to be a collapse rather than an extension). It is then easy to verify that $(\Gamma\cdot e_4^{-1})\cdot e_3^{-1}$ and $(\Gamma\cdot e_1)\cdot e_2$ are the same (since they glue together to form a flat connection on the domain of $\Gamma$ and on the square $(n,m),(n+1,m),(n,m+1),(n+1,m+1)$).

Now suppose that $\Gamma\cdot e_1$ is an extension of $\Gamma$, but that $(\Gamma\cdot e_1)\cdot e_2$ is a collapse of $\Gamma\cdot e_1$. Arguing as before, we conclude that $\Gamma((n,m-1)\to(n,m))$ equals $e_4$, and so $\Gamma\cdot e_4^{-1}$ is a collapse of $\Gamma$; similarly, $(\Gamma\cdot e_4^{-1})\cdot e_3^{-1}$ cannot be a collapse of $\Gamma\cdot e_4^{-1}$ (this would force $\Gamma\cdot e_1$ to be a collapse also) and so is an extension. It is again easy to verify that $(\Gamma\cdot e_4^{-1})\cdot e_3^{-1}$ and $(\Gamma\cdot e_1)\cdot e_2$ are the same.

The remaining two cases (when $\Gamma\cdot e_1$ is a collapse of $\Gamma$, and $(\Gamma\cdot e_1)\cdot e_2$ is either an extension or a collapse of $\Gamma\cdot e_1$) are similar to the preceding two, and are left to the reader. □
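Definition B.5 and Lemma B.6 are easiest to see in the degenerate case of an empty set of square relations, where $G_\square$ is the free group on $H\cup V$. A flat connection can then never contain a unit square, the domains stay monotone paths, and concatenation reduces to the familiar stack-based free reduction. The Python sketch below works only in this degenerate case (the general case needs the square-completion step of the maximal extension, which is not implemented); the symbol encoding and function names are hypothetical. It checks preservation of reducedness, invertibility, and the identity $\Gamma = \emptyset\cdot|\Gamma|$ used for Corollary B.7, on all words of length at most 4 in one horizontal and one vertical generator.

```python
from itertools import product

# Degenerate case: no square relations. A symbol is (name, sign, kind), kind "H" or "V".
H_GEN = ("h", 1, "H")
V_GEN = ("v", 1, "V")

def inverse(x):
    name, sign, kind = x
    return (name, -sign, kind)

def concat(path, x):
    """Gamma . x when the domain of Gamma is a reduced monotone path (tuple of symbols)."""
    if path and path[-1] == inverse(x):
        return path[:-1]  # collapse: the last edge has the same direction and equals x^{-1}
    return path + (x,)    # extension: one new edge, rightwards (H) or upwards (V)

def act(path, word):
    for x in word:
        path = concat(path, x)
    return path

def endpoint(path):
    n = sum(1 for x in path if x[2] == "H")
    return (n, len(path) - n)

def is_reduced(path):
    """No two consecutive (hence collinear) edges whose labels multiply to id."""
    return all(q != inverse(p) for p, q in zip(path, path[1:]))

def check(max_len=4):
    alphabet = [H_GEN, inverse(H_GEN), V_GEN, inverse(V_GEN)]
    for length in range(max_len + 1):
        for word in product(alphabet, repeat=length):
            p = act((), word)
            assert is_reduced(p)                     # preservation of reducedness
            assert act((), p) == p                   # Gamma = (empty) . |Gamma|
            for x in alphabet:
                assert act(p, [x, inverse(x)]) == p  # invertibility
    return True

print(check())                                                # True
print(endpoint(act((), [H_GEN, V_GEN, inverse(H_GEN)])))      # (2, 1)
```

In the last example the final $h^{-1}$ extends rather than collapses, because the preceding edge is vertical, exactly as in the (Extension) rule.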

This gives us a satisfactory explicit description of a square group:

Corollary B.7.

Let $\square$ be a square base. Then the definite integral map $\Gamma\mapsto|\Gamma|$ is a bijection from $O$ to $G_\square$; thus every group element has a unique representation as the definite integral of a maximal reduced flat connection.

Proof. The surjectivity of this map was already established in the discussion after Definition B.3, so it suffices to establish injectivity. We will do this via the identity $\Gamma = \emptyset\cdot|\Gamma|$ for all $\Gamma\in O$, where $\emptyset$ is the trivial flat connection over the monotone region $\{(0,0)\}$ from $(0,0)$ to $(0,0)$. This identity shows that $\Gamma$ can be recovered from $|\Gamma|$, demonstrating injectivity.

Let $\Omega$ be the domain of $\Gamma$, which by definition is a monotone region from $(0,0)$ to some point $(n,m)$. Let $\gamma$ be some monotone path in $\Omega$ from $(0,0)$ to $(n,m)$ (e.g. one could take $\gamma$ to be the upper or lower boundary of $\Omega$). We label the vertices of $\gamma$ in order as $(0,0) = (i_0,j_0),(i_1,j_1),\dots,(i_{n+m},j_{n+m}) = (n,m)$. From the definition of $|\Gamma|$, we see that
$$|\Gamma| = \Gamma((i_0,j_0)\to(i_1,j_1))\,\Gamma((i_1,j_1)\to(i_2,j_2))\cdots\Gamma((i_{n+m-1},j_{n+m-1})\to(i_{n+m},j_{n+m})).$$
For each $0\le k\le n+m$, let $\Omega_k$ be the portion of $\Omega$ which lies in the region $\{(i,j) : i\le i_k,\ j\le j_k\}$; thus $\Omega_k$ is a monotone region from $(0,0)$ to $(i_k,j_k)$ which is increasing in $k$. Let $\Gamma_k$ be the restriction of $\Gamma$ to $\Omega_k$. As $\Gamma$ was maximal and reduced, each of the $\Gamma_k$ is also. Since $\Gamma_{n+m} = \Gamma$, it will suffice to establish that
$$\Gamma_k = \emptyset\cdot\Gamma((i_0,j_0)\to(i_1,j_1))\,\Gamma((i_1,j_1)\to(i_2,j_2))\cdots\Gamma((i_{k-1},j_{k-1})\to(i_k,j_k))$$
for all $0\le k\le n+m$. But this is easily established by induction (the reduced nature of the $\Gamma_k$ is necessary to avoid the collapse case in Definition B.5). □

As a consequence of this corollary, we can distinguish any two elements of $G_\square$ from each other as long as we can express them as the definite integrals of distinct maximal reduced flat connections.

B.1. Applications.

We now specialise the above abstract group-theoretic machinery to the application at hand. We begin with a proposition which will be used to show non-convergence of quadruple recurrence (Theorem 2.1).

Proposition B.8 (Independence of AP4 relations). Let $A\subset\mathbb{Z}$ be a (possibly infinite) set of integers. Then there exist a group $G$ with elements $e_0,e_1,e_2,e_3$, together with an automorphism $T: G\to G$, such that for $r\in\mathbb{N}$, the relation
$$e_0\,(T^re_1)\,(T^{2r}e_2)\,(T^{3r}e_3) = \mathrm{id} \tag{31}$$
holds if and only if $r\in A$. Furthermore, no power $T^k$ of $T$ with $k\neq 0$ has any fixed points other than the identity element $\mathrm{id}$.

Remark B.4. Informally, this proposition asserts that the algebraic relations (31) for various $r\in\mathbb{Z}$ are independent of each other. In contrast, with progressions of length three (i.e. in the case $k = 3$) the analogous relations are highly degenerate. Indeed, suppose that
$$e_0\,(T^re_1)\,(T^{2r}e_2) = \mathrm{id} \tag{32}$$
for all $r\in A$. Then if $r,r+h$ lie in $A$, we have
$$e_0\,(T^re_1)\,(T^{2r}e_2) = e_0\,(T^rT^he_1)\,(T^{2r}T^{2h}e_2),$$
which we can rearrange as
$$(T^he_1^{-1})\,e_1 = T^r\big((T^{2h}e_2)\,e_2^{-1}\big).$$
If $r,r+h,r',r'+h$ lie in $A$, we thus have
$$T^r\big((T^{2h}e_2)\,e_2^{-1}\big) = T^{r'}\big((T^{2h}e_2)\,e_2^{-1}\big).$$
Assuming that $T^{r'-r}$ has no non-trivial fixed points, we conclude that $(T^{2h}e_2)\,e_2^{-1}$ is the identity; assuming that $T^{2h}$ has no non-trivial fixed points either, we conclude that $e_2$ is the identity. Similar arguments can be used to show that $e_1$ and then $e_0$ are also the identity. Thus the relations (32) and the no-fixed-points hypothesis lead to a total collapse of the group generated by $e_0,e_1,e_2$ as soon as $A$ contains even a single non-trivial parallelogram $r,r+h,r',r'+h$. (A variant of this argument also shows that if (32) is obeyed for $r$ and $r+h$, then it is also obeyed for $r+2h$, even without the fixed point hypothesis.) This algebraic distinction between triple recurrence and quadruple recurrence can be viewed as the primary reason why recurrence and convergence results continue to hold for triple products, but not for quadruple products, even under the assumption of ergodicity (which is reflected here in the no-fixed-points assumption). ◇

Proof.

We let $G$ be the group generated by the generators $e_{i,n}$ for $i = 0,1,2,3$ and $n\in\mathbb{Z}$, subject to the relations
$$e_{0,n}\,e_{1,n+r}\,e_{2,n+2r}\,e_{3,n+3r} = \mathrm{id}$$
for all $n\in\mathbb{Z}$ and $r\in A$. As the set of such relations is invariant under the shift $e_{i,n}\mapsto e_{i,n+1}$, we can define an automorphism $T: G\to G$ by setting $Te_{i,n} := e_{i,n+1}$. If we then set $e_i := e_{i,0}$, it is clear that (31) holds for all $r\in A$.

To see that (31) fails for $r\notin A$, we observe that $G$ can be viewed as a square group, with horizontal generators $\{e_{i,n} : i = 0,2,\ n\in\mathbb{Z}\}$, vertical generators $\{e_{i,n} : i = 1,3,\ n\in\mathbb{Z}\}$, and square relations $\square$ consisting of $(e_{0,n},e_{1,n+r},e_{2,n+2r},e_{3,n+3r})$ and its cyclic permutations for all $n\in\mathbb{Z}$ and $r\in A$; note that the crucial unique continuation property follows from the basic observation that an arithmetic progression is determined by any two of its elements ("two points determine a line"). If $n\in\mathbb{Z}$ and $r\notin A$, one sees that the connection on the path of length four from $(0,0)$ to $(2,2)$ associated to the word $e_{0,n}e_{1,n+r}e_{2,n+2r}e_{3,n+3r}$ is already a maximal reduced flat connection (as none of the three squares that share two edges with the path can be completed to a square from $\square$), and so by Corollary B.7 its definite integral $e_{0,n}e_{1,n+r}e_{2,n+2r}e_{3,n+3r}$ is not equal to the identity, as required.

Finally, to show that $T^k$ has no non-trivial fixed points, one simply observes that $T^k$ will shift any non-trivial maximal reduced flat connection to a different maximal reduced flat connection, and then invokes Corollary B.7 again. □

Next, we establish a variant that is useful for showing negative averages for quintuple recurrence (Theorem 2.7).
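Before turning to that variant, note that the unique continuation property invoked in the proof of Proposition B.8 ("two points determine a line") is easy to test by machine on a finite window of indices. In the Python sketch below the generator $e_{i,n}$ is encoded as the pair `(i, n)`; the set $A = \{1,2,5\}$, the window $-20\le n\le 20$ and the function names are illustrative assumptions. It also confirms that for $r = 3\notin A$ none of the three corners of the path for $e_{0,0}e_{1,3}e_{2,6}e_{3,9}$ can be completed to a square.

```python
from collections import defaultdict

def ap4_square_base(A, window):
    """Quadruples (e_{0,n}, e_{1,n+r}, e_{2,n+2r}, e_{3,n+3r}) and their cyclic rotations
    for r in A and n in a finite window; the generator e_{i,n} is encoded as (i, n)."""
    base = set()
    for r in A:
        for n in window:
            quad = ((0, n), (1, n + r), (2, n + 2 * r), (3, n + 3 * r))
            for k in range(4):
                base.add(quad[k:] + quad[:k])
    return base

def has_unique_continuation(base):
    """At most one quadruple with any given first two components."""
    completions = defaultdict(set)
    for q in base:
        completions[q[:2]].add(q[2:])
    return all(len(c) == 1 for c in completions.values())

def alternates(base):
    """e_{0,n}, e_{2,n} are horizontal and e_{1,n}, e_{3,n} vertical, so indices alternate in parity."""
    return all(q[k][0] % 2 != q[k + 1][0] % 2 for q in base for k in range(3))

def completes(base, x, y):
    """Can the corner formed by consecutive edges labelled x, y be filled by a square?"""
    return any(q[:2] == (x, y) for q in base)

A = {1, 2, 5}
base = ap4_square_base(A, range(-20, 21))
r, n = 3, 0  # r is not in A
word = [(0, n), (1, n + r), (2, n + 2 * r), (3, n + 3 * r)]
print(has_unique_continuation(base), alternates(base))          # True True
print([completes(base, x, y) for x, y in zip(word, word[1:])])  # [False, False, False]
```

Any two consecutive entries of a quadruple pin down both the common difference and the starting index, which is why each key in `completions` has exactly one completion.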


Proposition B.9 (Independence of AP5 relations). There exists a group $G$ with distinct elements $e_0, e_1, e_2, e_3, e_4$, together with an automorphism $T: G \to G$, such that the relation

$$e_0 (T^r e_1)(T^{2r} e_2)(T^{3r} e_3)(T^{4r} e_4) = \mathrm{id} \tag{33}$$

holds for all $r \in \mathbb{Z}$. Furthermore, no power $T^k$ of $T$ with $k \neq 0$ has any fixed points other than the identity element $\mathrm{id}$. Finally, if $r \in \mathbb{Z}$ is nonzero, and $g_0, g_1, g_2, g_3, g_4 \in \{\mathrm{id}, e_0, e_1, e_2, e_3, e_4, e_0^{-1}, e_1^{-1}, e_2^{-1}, e_3^{-1}, e_4^{-1}\}$ are such that

$$g_0 (T^r g_1)(T^{2r} g_2)(T^{3r} g_3)(T^{4r} g_4) = \mathrm{id}, \tag{34}$$

then $g_0, g_1, g_2, g_3, g_4$ are either all equal to the identity, or are a permutation of $\{e_0, e_1, e_2, e_3, e_4\}$ or of $\{e_0^{-1}, e_1^{-1}, e_2^{-1}, e_3^{-1}, e_4^{-1}\}$.

Proof. For each $i = 0, 1, 2, 3, 4$, we define $G^{(i)}$ to be the group generated by the generators $e^{(i)}_{j,n}$ for $j \in \{0,1,2,3,4\} \setminus \{i\}$ and $n \in \mathbb{Z}$, subject to the relations

$$e^{(i)}_{0,n} e^{(i)}_{1,n+r} e^{(i)}_{2,n+2r} e^{(i)}_{3,n+3r} e^{(i)}_{4,n+4r} = \mathrm{id} \tag{35}$$

for all $n, r \in \mathbb{Z}$, with the convention that $e^{(i)}_{i,n} = \mathrm{id}$ for all $n$. This group has an automorphism $T^{(i)}: G^{(i)} \to G^{(i)}$ that maps $e^{(i)}_{j,n}$ to $e^{(i)}_{j,n+1}$ for all $j$ and $n$.

We now set $G$ to be the product group $G := G^{(0)} \times G^{(1)} \times \dots \times G^{(4)}$, and set $e_j := (e^{(0)}_{j,0}, e^{(1)}_{j,0}, \dots, e^{(4)}_{j,0})$ for $j = 0, 1, 2, 3, 4$. We also set $T(g^{(0)}, g^{(1)}, \dots, g^{(4)}) := (T^{(0)} g^{(0)}, T^{(1)} g^{(1)}, \dots, T^{(4)} g^{(4)})$, so that $T$ is an automorphism of $G$. By construction, (33) holds. Also, by the arguments in the proof of Proposition B.8, no non-zero power of $T^{(i)}$ has any non-trivial fixed points, and so the same is true of $T$.

Now we establish the final claim of the proposition. Suppose $g_0, \dots, g_4$ obey the stated properties. For each $i = 0, 1, 2, 3, 4$, let $g^{(i)}_j$ be the $G^{(i)}$ component of $g_j$ for $j = 0, 1, 2, 3, 4$, so that

$$g^{(i)}_0 \big((T^{(i)})^r g^{(i)}_1\big)\big((T^{(i)})^{2r} g^{(i)}_2\big)\big((T^{(i)})^{3r} g^{(i)}_3\big)\big((T^{(i)})^{4r} g^{(i)}_4\big) = \mathrm{id}. \tag{36}$$

From the construction of $G^{(i)}$, we see that for any distinct $j, k \in \{0,1,2,3,4\} \setminus \{i\}$, there is a homomorphism $\phi^{(i)}_{j,k}: G^{(i)} \to \mathbb{Z}$ to the additive group $\mathbb{Z}$ that maps $e^{(i)}_{j,n}$ to $+1$, $e^{(i)}_{k,n}$ to $-1$, and all other $e^{(i)}_{l,n}$ to zero, for $n \in \mathbb{Z}$ and $l \in \{0,1,2,3,4\} \setminus \{i,j,k\}$. (These requirements are compatible with the defining relations (35).) This homomorphism is $T^{(i)}$-invariant. Applying it to (36), we obtain

$$\sum_{l=0}^{4} \phi^{(i)}_{j,k}\big(g^{(i)}_l\big) = 0.$$

In other words, the number of times $g_l$ ($l = 0, \dots, 4$) equals $e_j$, minus the number of times it equals $e_j^{-1}$, is equal to the number of times $g_l$ equals $e_k$, minus the number of times it equals $e_k^{-1}$. Letting $j, k, i$ vary, we see that this number is independent of $j$. It cannot exceed 1 in magnitude, since a common net count $c$ for all five indices $j$ requires at least $5|c|$ of the five entries $g_0, \dots, g_4$. If it is equal to $+1$ or $-1$, then $g_0, \dots, g_4$ is a permutation of $\{e_0, \dots, e_4\}$ or of $\{e_0^{-1}, \dots, e_4^{-1}\}$, respectively. (This argument also ensures that $e_0, \dots, e_4$ are distinct.)

The remaining possibility to eliminate is when this number is zero, so that each $e_j$ occurs among $g_0, \dots, g_4$ as often as $e_j^{-1}$. Suppose for instance that $g_0, \dots, g_4$ contains one occurrence each of $e_0, e_0^{-1}, e_1, e_1^{-1}$. Apply (36) with $i = 4$ (say), and then apply the homomorphism $G^{(4)} \to \mathbb{Z}$ that maps $e^{(4)}_{0,n}$ to $0$, $e^{(4)}_{1,n}$ to $n$, $e^{(4)}_{2,n}$ to $-2n$, and $e^{(4)}_{3,n}$ to $n$. (Here we use the identity $(n+r) - 2(n+2r) + (n+3r) = 0$ to ensure consistency with (35).) The left-hand side of (36) is then sent to $(a-b)r \neq 0$, where $a \neq b$ are the positions of $e_1$ and $e_1^{-1}$, and we obtain a contradiction. The same argument applies if $g_0, \dots, g_4$ contains any other combination of one or two distinct pairs $e_j, e_j^{-1}$. The remaining case to eliminate is when $g_0, \dots, g_4$ contains $e_j$ and $e_j^{-1}$ twice each for some $j$, say $j = 0$. Applying (36) with $i = 4$ again, we can use Corollary B.7 to contradict (36), as the left-hand side is a definite integral of a maximal flat connection on a horizontal path of length four. The same reasoning applies for other values of $j$, and the claim follows. □

References

[1] T. Austin. On the norm convergence of nonconventional ergodic averages. To appear, Ergodic Theory Dynam. Systems, 2008.
[2] F. A. Behrend. On sets of integers which contain no three terms in arithmetical progression. Proc. Nat. Acad. Sci. U. S. A., 32:331–332, 1946.
[3] V. Bergelson. Weakly mixing PET. Ergodic Theory Dynam. Systems, 7(3):337–349, 1987.
[4] V. Bergelson, B. Host, and B. Kra. Multiple recurrence and nilsequences. Invent. Math., 160(2):261–303, 2005. With an appendix by Imre Ruzsa.
[5] V. Bergelson and A. Leibman. Failure of the Roth theorem for solvable groups of exponential growth. Ergodic Theory Dynam. Systems, 24(1):45–53, 2004.
[6] C. Beyers, R. Duvenhage, and A. Ströh. The Szemerédi property in ergodic W*-dynamical systems. Preprint, available online at arXiv.org: 0709.1557, 2007.
[7] O. Bratteli and D. W. Robinson. Operator algebras and quantum statistical mechanics. 1. Texts and Monographs in Physics. Springer-Verlag, New York, second edition, 1987. C*- and W*-algebras, symmetry groups, decomposition of states.
[8] J.-P. Conze and E. Lesigne. Théorèmes ergodiques pour des mesures diagonales. Bull. Soc. Math. France, 112(2):143–175, 1984.
[9] J.-P. Conze and E. Lesigne. Sur un théorème ergodique pour des mesures diagonales. C. R. Acad. Sci. Paris Sér. I Math., 306(12):491–493, 1988.
[10] J.-P. Conze and E. Lesigne. Sur un théorème ergodique pour des mesures diagonales. In Probabilités, volume 1987 of Publ. Inst. Rech. Math. Rennes, pages 1–31. Univ. Rennes I, Rennes, 1988.
[11] R. Duvenhage. Bergelson's Theorem for weakly mixing C*-dynamical systems. Studia Math., 192:235–257, 2009.
[12] F. Fidaleo. On the entangled ergodic theorem. Infin. Dimens. Anal. Quantum Probab. Relat. Top., 10(1):67–77, 2007.
[13] F. Fidaleo. An ergodic theorem for quantum diagonal measures. Infin. Dimens. Anal. Quantum Probab. Relat. Top., 12(2):307–320, 2009.
[14] N. Frantzikinakis and B. Kra. Convergence of multiple ergodic averages for some commuting transformations. Ergodic Theory Dynam. Systems, 25(3):799–809, 2005.
[15] H. Furstenberg. Ergodic behaviour of diagonal measures and a theorem of Szemerédi on arithmetic progressions. J. d'Analyse Math., 31:204–256, 1977.
[16] H. Furstenberg and Y. Katznelson. An ergodic Szemerédi Theorem for commuting transformations. J. d'Analyse Math., 34:275–291, 1978.
[17] H. Furstenberg, Y. Katznelson, and D. Ornstein. The ergodic theoretical proof of Szemerédi's theorem. Bull. Amer. Math. Soc. (N.S.), 7(3):527–552, 1982.
[18] E. Glasner. Ergodic Theory via Joinings. American Mathematical Society, Providence, 2003.

[19] W. T. Gowers. A new proof of Szemerédi's theorem. Geom. Funct. Anal., 11(3):465–588, 2001.
[20] W. T. Gowers. Quasirandomness, counting and regularity for 3-uniform hypergraphs. Combin. Probab. Comput., 15(1-2):143–184, 2006.
[21] B. Host. Ergodic seminorms for commuting transformations and applications. Preprint, available online at arXiv.org: 0811.3703.
[22] B. Host and B. Kra. Convergence of Conze-Lesigne averages. Ergodic Theory Dynam. Systems, 21(2):493–509, 2001.
[23] B. Host and B. Kra. Nonconventional ergodic averages and nilmanifolds. Ann. Math., 161(1):397–488, 2005.
[24] V. Jones and V. S. Sunder. Introduction to subfactors, volume 234 of London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge, 1997.
[25] V. F. R. Jones. Index for subfactors. Invent. Math., 72(1):1–25, 1983.
[26] R. Kadison and J. Ringrose. Fundamentals of Operator Algebras, volume 1 & 2. American Mathematical Society, Providence, 1997.
[27] D. Kerr and H. Li.
[28] B. Kra. From combinatorics to ergodic theory and back again. In International Congress of Mathematicians. Vol. III, pages 57–76. Eur. Math. Soc., Zürich, 2006.
[29] U. Krengel. Ergodic Theorems. de Gruyter, Berlin, 1985.
[30] B. Nagle, V. Rödl, and M. Schacht. The counting lemma for regular k-uniform hypergraphs. Random Structures and Algorithms, to appear.
[31] C. P. Niculescu, A. Ströh, and L. Zsidó. Noncommutative extensions of classical and multiple recurrence theorems. J. Operator Theory, 50(1):3–52, 2003.
[32] S. Popa. Cocycle and orbit equivalence superrigidity for malleable actions of w-rigid groups. Invent. Math., 170(2):243–295, 2007.
[33] R. Salem and D. C. Spencer. On sets of integers which contain no three terms in arithmetical progression. Proc. Nat. Acad. Sci. U. S. A., 28:561–563, 1942.
[34] E. Størmer. Spectra of ergodic transformations. J. Functional Analysis, 15:202–215, 1974.
[35] E. Szemerédi. On sets of integers containing no k elements in arithmetic progression. Acta Arith., 27:199–245, 1975.
[36] T. Tao. Norm convergence of multiple ergodic averages for commuting transformations. Ergodic Theory and Dynamical Systems, 28:657–688, 2008.
[37] H. P. Towsner. Convergence of Diagonal Ergodic Averages. Preprint, available online at arXiv.org: 0711.1180, 2007.
[38] R. C. Vaughan. The Hardy-Littlewood method, volume 125 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge, second edition, 1997.
[39] Q. Zhang. On convergence of the averages $(1/N) \sum_{n=1}^{N} f(R^n x) f(S^n x) f(T^n x)$. Monatsh. Math., 122(3):275–300, 1996.
[40] T. Ziegler. A non-conventional ergodic theorem for a nilsystem. Ergodic Theory Dynam. Systems, 25(4):1357–1370, 2005.
[41] T. Ziegler. Universal characteristic factors and Furstenberg averages. J. Amer. Math. Soc., 20(1):53–97 (electronic), 2007.
[42] R. J. Zimmer. Ergodic actions with generalized discrete spectrum. Illinois J. Math., 20(4):555–588, 1976.
[43] R. J. Zimmer. Extensions of ergodic group actions.