A Multidimensional Szemerédi Theorem in the primes

Abstract

Let $A$ be a subset of positive relative upper density of $\mathbb{P}^d$, the $d$-tuples of primes. We prove that $A$ contains an affine copy of any finite set $F\subseteq\mathbb{Z}^d$, which provides a natural multi-dimensional extension of the theorem of Green and Tao on the existence of long arithmetic progressions in the primes. The proof uses the hypergraph approach by assigning a pseudo-random weight system to the pattern $F$ on a $(d+1)$-partite hypergraph; a novel feature is that the hypergraph is no longer uniform, with weights attached to lower dimensional edges. Then, instead of using a transference principle, we proceed by extending the proof of the so-called hypergraph removal lemma to our setting, relying only on the linear forms condition of Green and Tao.


arXiv [math.NT]

BRIAN COOK, ÁKOS MAGYAR, TATCHAI TITICHETRAKUN


1. INTRODUCTION

1.1. Background.

A celebrated theorem in additive combinatorics due to Green and Tao [7] establishes the existence of arbitrarily long arithmetic progressions in the primes. It is proved that if $A$ is a subset of the primes of positive relative upper density then $A$ necessarily contains infinitely many affine copies of any finite set of integers. As such, it might be viewed as a relative version of Szemerédi's theorem [17] on the existence of long arithmetic progressions in dense subsets of the integers.

Another fundamental result in this area is the multi-dimensional extension of Szemerédi's theorem originally proved by Furstenberg and Katznelson [3]. It states that if $A \subseteq \mathbb{Z}^d$ is of positive upper density then $A$ contains an affine copy of any finite set $F \subseteq \mathbb{Z}^d$. The proof in [3] uses ergodic methods; however, a more recent combinatorial approach was developed by Gowers [5] and also independently by Nagle, Rödl and Schacht [14].

It is natural to ask whether a multi-dimensional extension of the result of Green and Tao, or alternatively a relative version of the Furstenberg-Katznelson theorem, can be established. In fact, this question was raised already in [18], where the existence of arbitrary constellations among the Gaussian primes was shown. A partial result was obtained earlier by the first two authors [2], where it was proved that relatively dense subsets of $\mathbb{P}^d$ contain an affine copy of any finite set $F \subseteq \mathbb{Z}^d$ which is in general position, in the sense that each coordinate hyperplane contains at most one point of $F$.

A common feature of the above mentioned results is that they use an embedding of the underlying sets (the primes or the Gaussian primes) into a set which is sparse but sufficiently random with respect to the pattern $F$.
In our case, when the set $F$ is not in general position (the simplest example being a 2-dimensional corner), this does not seem possible, due to the extra correlations arising from the direct product structure. For example, if 3 vertices of a rectangle with sides parallel to the coordinate axes are in $\mathbb{P}^2$, then the fourth vertex is necessarily in $\mathbb{P}^2$, a type of self-correlation not present in the one dimensional case or the Gaussian primes.

Another approach, already partly used in [18], is to establish a hypergraph removal lemma [5], [14] for sparse uniform hypergraphs, or alternatively for hypergraphs with weights attached to the faces. This approach has been utilized by the second and third authors [13] to show the existence of $d$-dimensional corners (simplices with edges parallel to the coordinate axes) in dense subsets of $\mathbb{P}^d$. Recently a proof based on hypergraph theory, using only the linear forms conditions, has been obtained in [1], covering both the original Green-Tao theorem and the case of the Gaussian primes.

In all of the above approaches the crucial point is to prove a removal lemma for weighted (or sparse) uniform hypergraphs, using transference arguments to remove the weights from the hyperedges. In contrast, for a general constellation in $\mathbb{P}^d$ the hypergraph approach leads to a weighted closed hypergraph with weights attached possibly to any lower dimensional edge, and the usual transference principles do not apply. Our approach is different: we do not try to remove the weights and thereby reduce the problem to previously known results, but instead extend the proof of the hypergraph regularity and removal lemmas directly to the weighted setting, which might be of independent interest.
In this respect our argument is essentially self-contained, relying only on results from sieve theory, namely on the so-called linear forms conditions [7].

Simultaneously with our original work on this problem, the existence of arbitrary constellations in relatively dense subsets of $\mathbb{P}^d$ has also been shown by Tao and Ziegler [20], using an entirely different method based on an infinite number of linear forms conditions to obtain a weighted version of the Furstenberg correspondence principle, and a short, elegant proof by Fox and Zhao [4] has been obtained afterwards using sampling arguments. Both of the above proofs however rely on the full force of the results of Green, Tao and Ziegler developed in [8],[9],[10] for the study of the asymptotic number of prime solutions of systems of linear equations. As such the methods of [20] and [4] do not provide bounds, while from our approach one can extract quantitative statements. The bounds, though recursive, are rather poor (of iterated tower-exponential type) and we do not attempt to calculate them explicitly here. Also, as we rely only on sieve techniques our approach is somewhat flexible, i.e. it might not be hard to modify it to count the number of small copies of a finite set $F$, of size $N^{\varepsilon}$, in a set $A \subseteq [1,N]^d \cap \mathbb{P}^d$ of positive relative density.

1.2. Main results.

Let us recall that a set $A \subseteq \mathbb{P}^d$ is of positive relative upper density if
$$\limsup_{N\to\infty} \frac{|A \cap \mathbb{P}_N^d|}{|\mathbb{P}_N|^d} > 0,$$
where $\mathbb{P}_N$ denotes the set of primes up to $N$, and $|A|$ stands for the cardinality of a set $A$. If $F \subseteq \mathbb{Z}^d$ is a finite set, we say that a set $F'$ is an affine copy of $F$, or alternatively that $F'$ is a constellation defined by $F$, if
$$F' = x + t\cdot F = \{x + ty;\ y \in F\}.$$
We call $F'$ non-trivial if $t \neq 0$. Our main result is the following.

Theorem 1.1. If $A$ is a subset of $\mathbb{P}^d$ of positive upper relative density, then $A$ contains infinitely many non-trivial affine copies of any finite set $F \subseteq \mathbb{Z}^d$.

Note that it is enough to show that the set $A$ contains at least one non-trivial affine copy of $F$, as deleting such a copy from $A$ will not affect its relative density. Also, replacing the set $F$ by $F' = F \cup (-F)$ one can require that the dilation parameter $t$ is positive. By lifting the problem to a higher number of dimensions, it is easy to see that one can assume that $F$ forms the vertices of a $d$-dimensional simplex. Indeed, let $F = \{0, x_1, \dots, x_k\}$, choose a set of $k$ linearly independent vectors $\{y_1, \dots, y_k\} \subseteq \mathbb{Z}^k$, and define the set
$$\Delta := \{0, (x_1,y_1), \dots, (x_k,y_k), z_{k+1}, \dots, z_{k+d}\} \subseteq \mathbb{Z}^{k+d}$$
such that the vectors of $\Delta \setminus \{0\}$ form a basis of $\mathbb{R}^{k+d}$. If the set $A' = A \times \mathbb{P}^k$ contains an affine copy of $\Delta$ then clearly $A$ contains an affine copy of the set $\pi(\Delta) \supseteq F$, where $\pi : \mathbb{R}^d \times \mathbb{R}^k \to \mathbb{R}^d$ is the natural orthogonal projection.

In the case when $\Delta \subseteq \mathbb{Z}^d$ is a $d$-dimensional simplex, we prove a quantitative version of Theorem 1.1. To formulate it we define the quantity
$$l(\Delta) := \sum_{i=1}^{d} |\pi_i(\Delta)|, \tag{1.2.1}$$
$\pi_i : \mathbb{R}^d \to \mathbb{R}$ being the orthogonal projection to the $i$-th coordinate axis.

Theorem 1.2. Let $\alpha > 0$ and let $\Delta \subseteq \mathbb{Z}^d$ be a $d$-dimensional simplex. There exists a constant $c(\alpha,\Delta) > 0$ such that for any $N > 1$ and any set $A \subseteq \mathbb{P}_N^d$ such that $|A| \ge \alpha |\mathbb{P}_N|^d$, the set $A$ contains at least
$$c(\alpha,\Delta)\, N^{d+1} (\log N)^{-l(\Delta)}$$
affine copies of the simplex $\Delta$.

Note that in Theorem 1.2 we do not require the copies of $\Delta$ to be non-trivial; thus, without loss of generality, $N$ can be assumed to be sufficiently large with respect to $\alpha$ and $\Delta$. It is clear that Theorem 1.2 implies Theorem 1.1, as the number of trivial copies of $\Delta$ in $A$ is at most $N^d (\log N)^{-d}$.

To see why the above lower bound is meaningful, note that there are $\approx N^{d+1}$ affine copies of $\Delta$ in $[1,N]^d$, and for a fixed $i$ the probability that all the $i$-th coordinates of an affine copy $\Delta'$ are primes is roughly $(\log N)^{-|\pi_i(\Delta)|}$. Thus if the prime tuples behave randomly, the probability that $\Delta' \subseteq \mathbb{P}^d$ is about $(\log N)^{-l(\Delta)}$.

In the contrapositive, Theorem 1.2 states that if a set $A \subseteq \mathbb{P}_N^d$ contains at most $\delta N^{d+1} (\log N)^{-l(\Delta)}$ affine copies of $\Delta$, then its relative density is at most $\epsilon$, where $\epsilon = \epsilon(\delta)$ is a quantity such that $\epsilon(\delta) \to 0$ as $\delta \to 0$. As for a number of similar results [7], [18], [2], [20], to prove this one formulates a statement involving a pseudo-random measure $\nu = \nu^{(N)} : [1,N] \to \mathbb{R}_+$.

1.3. The Green-Tao measure and the linear forms condition.

Let us recall the pseudo-random measure $\nu$ introduced by Green and Tao, and (a slight variant of) the so-called linear forms condition, see [7], Sec. 9.

Let $\omega$ be a sufficiently large number and let $W = \prod_{p \le \omega} p$ be the product of primes up to $\omega$. For given $b$ relatively prime to $W$ define the modified von Mangoldt function $\bar\Lambda_b : \mathbb{Z} \to \mathbb{R}_{\ge 0}$ by
$$\bar\Lambda_b(n) = \begin{cases} \dfrac{\phi(W)}{W} \log(Wn+b) & \text{if } Wn+b \text{ is a prime,} \\ 0 & \text{otherwise.} \end{cases}$$
Here $\phi$ is the Euler function. Note that by Dirichlet's theorem on the distribution of primes in residue classes one has that $\sum_{n \le N} \bar\Lambda_b(n) = N(1+o(1))$. A crucial fact is that the function $\bar\Lambda_b$ is majorized by divisor sums closely related to the so-called Goldston-Yıldırım divisor sum [7], [11]
$$\Lambda_R(n) = \sum_{d \mid n,\ d \le R} \mu(d) \log(R/d),$$
$\mu$ being the Möbius function and $R = N^{d^{-1}2^{-d-5}}$. Indeed, for given small parameters $0 < \varepsilon_1 < \varepsilon_2 < 1$ (whose values will be specified later), recall the Green-Tao measure
$$\nu_b(n) = \begin{cases} \dfrac{\phi(W)}{W}\, \dfrac{\Lambda_R(Wn+b)^2}{\log R} & \text{if } \varepsilon_1 N \le n \le \varepsilon_2 N, \\ 1 & \text{otherwise.} \end{cases}$$
Clearly $\nu_b(n) \ge 0$ for all $n$, and it is easy to see that
$$\nu_b(n) \ge d^{-1} 2^{-d-6}\, \bar\Lambda_b(n) \tag{1.3.1}$$
for all $\varepsilon_1 N \le n \le \varepsilon_2 N$, for $N$ sufficiently large. Indeed, this is trivial unless $Wn+b$ is a prime, and in that case, since $\varepsilon_1 N > R$, we have $\Lambda_R(Wn+b) = \log R \ge d^{-1}2^{-d-6} \log N$. Note that the measure $\nu$ in fact depends on $N$; however, following [7], we do not indicate this explicitly.

Let us briefly recall the pseudo-randomness property of the measures $\nu_b$, the so-called linear forms condition, which we will need in the proof. This is a slight modification of the formulation given in [7]; however, the proof works without any changes.
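The inequality (1.3.1) can be checked in a few lines. The following is a sketch only: the symbol $\theta$ is introduced here for illustration, writing $R = N^{\theta}$ for the exponent fixed above, and we use that $W$ is fixed while $N \to \infty$.

```latex
% Sketch, assuming R = N^{\theta} and that p = Wn + b is a prime with p > R.
% The only divisor d of p with d <= R is d = 1, so
\Lambda_R(p) = \mu(1)\,\log(R/1) = \log R = \theta \log N .
% Hence, for \varepsilon_1 N \le n \le \varepsilon_2 N,
\nu_b(n) = \frac{\phi(W)}{W}\,\frac{\Lambda_R(p)^2}{\log R}
         = \frac{\phi(W)}{W}\,\theta \log N ,
\qquad
\bar\Lambda_b(n) = \frac{\phi(W)}{W}\,\log p
         \le \frac{\phi(W)}{W}\,\log(WN + W) .
% Since W is fixed, \log(WN + W) \le 2 \log N once N is large, which gives
\nu_b(n) \;\ge\; \frac{\theta}{2}\,\bar\Lambda_b(n) .
```

In particular the constant in (1.3.1) is half of the exponent defining $R$, which is why the two constants differ by a factor of $2$.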

Theorem A (Linear forms condition, [7]). Let $N$, $W$ and the measures $\nu_b$ be as above, and let $m_0, t_0, k_0 \in \mathbb{N}$ be small parameters. Then the following holds. For given $m \le m_0$ and $t \le t_0$, suppose that $\{l_{i,j}\}_{1 \le i \le m,\ 1 \le j \le t}$ are arbitrary integers at most $k_0$ in absolute value, and that $\{b_i\}$ are arbitrary numbers relatively prime to $W$. If the linear forms
$$L_i(x) = \sum_{j=1}^{t} l_{i,j}\, x_j$$
are non-zero and pairwise linearly independent over the rationals, then
$$\mathbb{E}\Big( \prod_{i=1}^{m} \nu_{b_i}(L_i(x));\ x \in \mathbb{Z}_N^t \Big) = 1 + o_{N,W\to\infty;\, m_0,t_0,k_0}(1), \tag{1.3.2}$$
where the $o(1)$ term is independent of the choice of the $b_i$'s.

In the above formula the linear forms $L_i(x)$ are considered as acting on $(\mathbb{Z}/N\mathbb{Z})^t$, and the error term $o_{N,W\to\infty;\, m_0,t_0,k_0}(1)$ denotes a quantity that tends to $0$ as both $N \to \infty$ and $W \to \infty$, for any fixed choice of $m_0, t_0, k_0$. In our context it is important to let $W = \prod_{p \le \omega} p$ be independent of $N$ in order to obtain the quantitative lower bound in Theorem 1.2; see also the remarks in [7] (Sec. 11). As all error terms in (1.3.2) are independent of the choice of the $b_i$'s, we will write $\nu$ for $\nu_{b_i}$ for simplicity of notation.

With the aid of this measure, we define the weight of a finite set $S \subseteq \mathbb{Z}^d$ as
$$w(S) := \prod_{i=1}^{d} \prod_{y \in \pi_i(S)} \nu(y), \tag{1.3.3}$$
where $\pi_i(S)$ is the canonical projection of $S$ to the $i$-th coordinate axis. If $S = \{x\}$ we will write $w(x) := w(\{x\}) = \prod_{i=1}^{d} \nu(x_i)$. The point is that if $Wx + b \in \mathbb{P}_N^d$ (and $x \in [\varepsilon_1 N, \varepsilon_2 N]^d$), then
$$w(x) \approx (\log N)^{d}. \tag{1.3.4}$$
The implicit constant depends only on $d$ and $W$, which we will choose sufficiently large but independent of $N$. Moreover, for $\Delta \subseteq [\varepsilon_1 N, \varepsilon_2 N]^d$ such that $W\Delta + b \subseteq A \subseteq \mathbb{P}_N^d$ one has
$$w(\Delta) \approx (\log N)^{l(\Delta)}. \tag{1.3.5}$$
Thus, identifying $[1,N]$ with $\mathbb{Z}_N = \mathbb{Z}/N\mathbb{Z}$, it is easy to show (see Sec. 5) that Theorem 1.2 follows from the following.

Theorem 1.3. Let $\Delta = \{v_0, \dots, v_d\} \subseteq \mathbb{Z}^d$ be a $d$-dimensional simplex and let $\delta > 0$. Let $N$ be a large prime and let $A \subseteq \mathbb{Z}_N^d$ satisfy
$$\mathbb{E}_{x \in \mathbb{Z}_N^d,\ t \in \mathbb{Z}_N} \Big( \prod_{i=0}^{d} \mathbf{1}_A(x + t v_i) \Big)\, w(x + t\Delta) \le \delta. \tag{1.3.6}$$
Then there exists $\epsilon = \epsilon(\delta)$ such that
$$\mathbb{E}_{x \in \mathbb{Z}_N^d}\, \mathbf{1}_A(x)\, w(x) \le \epsilon(\delta) + o_{N,W\to\infty;\,\Delta}(1).$$
Moreover $\epsilon(\delta) \to 0$ as $\delta \to 0$.

We describe below some of the key elements of the proof. The details are given in the remaining sections.
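As a quick illustration of the weight (1.3.3) and of the quantity $l(\Delta)$ from (1.2.1), consider a two-dimensional corner. The specific simplex below is our own choice and serves only as an example.

```latex
% Example (our choice): the corner \Delta = \{(0,0),(1,0),(0,1)\} \subseteq \mathbb{Z}^2.
\pi_1(\Delta) = \{0,1\}, \qquad \pi_2(\Delta) = \{0,1\}, \qquad
l(\Delta) = |\pi_1(\Delta)| + |\pi_2(\Delta)| = 4 .
% An affine copy x + t\Delta with t \neq 0 has first coordinates \{x_1, x_1 + t\}
% and second coordinates \{x_2, x_2 + t\}, so by (1.3.3)
w(x + t\Delta) = \nu(x_1)\,\nu(x_1 + t)\,\nu(x_2)\,\nu(x_2 + t) .
% For t = 0 the copy is the single point x and
w(x) = \nu(x_1)\,\nu(x_2) .
% By contrast, three points in general position in \mathbb{Z}^2 have three distinct
% coordinates in each direction, so l = 6 and the weight has six factors.
```

The corner has only four factors rather than six because its points share coordinates, which is exactly the situation where the simplex is not in general position.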


1.4. A Removal Lemma for weighted hypergraph systems.

We will use the construction of a weighted hypergraph associated to a set $A \subseteq \mathbb{Z}_N^d$ and a simplex $\Delta = \{v_0, \dots, v_d\}$ given in [18].

Definition 1.1 (Hypergraph system). Let $J = \{0, 1, \dots, d\}$, $\mathcal{H} := \{e : e \subseteq J\}$, and for a set $e \in \mathcal{H}$ let $V_e = \mathbb{Z}_N^e = \prod_{j \in e} \mathbb{Z}_N$. Identify $V_e$ with the subspace of elements $x = (x_0, \dots, x_d) \in V_J$ such that $x_j = 0$ for all $j \notin e$, and let $\pi_e : V_J \to V_e$ denote the natural projection. For $e = \{j\}$ we write $V_j := V_{\{j\}}$, and for a given $H \subseteq \mathcal{H}$ we will call the quadruplet $(J, V_J, H, d)$ a hypergraph system.

From a graph theoretical point of view we can think of a point $x_e$ ($e \in \mathcal{H}$, $|e| = d$) as a $d$-simplex with vertices $\{x_j : j \in e\}$. A set $G_e \subseteq V_e$ may then be viewed as a $d$-regular $d$-partite hypergraph with vertex sets $V_j$ ($j \in e$). Similarly a point $x \in V_J$ represents a $(d+1)$-simplex with faces $x_e := (x_j)_{j \in e}$ for $e \in \mathcal{H}_d := \{e \subseteq J,\ |e| = d\}$.

For a given $e \subseteq J$ define the $\sigma$-algebra $\mathcal{A}_e = \{\pi_e^{-1}(F) : F \subseteq V_e\}$, which will play an important role in the proof of the removal lemma. For a given set $A \subseteq \mathbb{Z}_N^d$ and for $e = J \setminus \{j\}$, let
$$E_e = \Big\{ x \in V_J :\ \sum_{i=0}^{d} x_i (v_i - v_j) \in A \Big\}. \tag{1.4.1}$$
Note that $E_e \in \mathcal{A}_e$, as the expression in (1.4.1) is independent of the coordinate $x_j$.

Definition 1.2 (Weighted system). We now define a family of functions $\nu_e : V_J \to \mathbb{R}_+$, $\mu_e : V_J \to \mathbb{R}_+$. For $e \in \mathcal{H}_d$, $e = J \setminus \{j\}$ and $1 \le k \le d$, define
$$L_e^k(x) = \sum_{i=0}^{d} x_i (v_i^k - v_j^k), \tag{1.4.2}$$
where $v_i^k$ denotes the $k$-th coordinate of the vector $v_i$. We partition the family of forms $\mathcal{L} := \{L_e^k;\ |e| = d,\ 1 \le k \le d\}$ according to which coordinates they depend on. For this we define the support of a linear form $L(x) = \sum_{k=0}^{d} a_k x_k$ as $\mathrm{supp}(L) = \{k : a_k \neq 0\}$. For a given $e \subseteq J$, define
$$\nu_e(x) = \prod_{L \in \mathcal{L},\ \mathrm{supp}(L) = e} \nu(L(x)), \qquad \mu_e(x) = \prod_{L \in \mathcal{L},\ \mathrm{supp}(L) \subseteq e} \nu(L(x)), \tag{1.4.3}$$
with the convention that $\nu_e \equiv 1$ if $\{L;\ \mathrm{supp}(L) = e\} = \emptyset$.
Note that if $\Delta = \{v_0, \dots, v_d\}$ is in general position, that is if $v_i^k \neq v_j^k$ for all $i \neq j$ and all $k$, then $\mathrm{supp}(L_e^k) = e$ for all $e \in \mathcal{H}_d$, hence
$$\mu_e(x) = \nu_e(x) = \prod_{k=1}^{d} \nu(L_e^k(x)).$$
In general, we have $\mu_e(x) = \prod_{f \subseteq e} \nu_f(x)$ and also $\mu_e(x) = \mu_e(\pi_e(x))$, that is, $\mu_e$ is constant along the fibers of the projection $\pi_e$. We will refer to the functions $\nu_e$ and $\mu_e$ as weights and measures respectively. To emphasize this point of view we will often use the integral notation and write
$$\int_{V_J} F(x)\, d\mu_e(x) := \mathbb{E}_{x \in V_J} F(x)\, \mu_e(x), \qquad \int_{V_e} F_e(x)\, d\mu_e(x) := \mathbb{E}_{x \in V_e} F_e(x)\, \mu_e(x),$$
for functions $F : V_J \to \mathbb{R}$ and $F_e : V_e \to \mathbb{R}$. Thus we may think of $\mu_e$ as a measure on $V_J$ or on the subspace $V_e$; the exact interpretation will be clear from the context. Note that it follows easily from the linear forms condition that $\mu_e(V_e) = \int_{V_e} d\mu_e = 1 + o_{N,W\to\infty}(1)$ (similarly $\mu_e(V_J) = 1 + o_{N,W\to\infty}(1)$), see Lemma 2.1.
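To make Definition 1.2 concrete, the forms and weights can be written out for a two-dimensional corner. This is a worked sketch; the choice of $v_0, v_1, v_2$ below is ours and is not taken from the text.

```latex
% Example (our choice): d = 2, v_0 = (0,0), v_1 = (1,0), v_2 = (0,1), J = \{0,1,2\}.
% Forms L_e^k(x) = \sum_i x_i (v_i^k - v_j^k) for e = J \setminus \{j\}:
\begin{array}{lll}
e = \{1,2\}\ (j = 0): & L_e^1(x) = x_1,        & L_e^2(x) = x_2, \\
e = \{0,2\}\ (j = 1): & L_e^1(x) = -x_0 - x_2, & L_e^2(x) = x_2, \\
e = \{0,1\}\ (j = 2): & L_e^1(x) = x_1,        & L_e^2(x) = -x_0 - x_1 .
\end{array}
% The family consists of four distinct forms,
\mathcal{L} = \{\, x_1,\ x_2,\ -x_0 - x_2,\ -x_0 - x_1 \,\},
% with supports \{1\}, \{2\}, \{0,2\}, \{0,1\} respectively. Hence
\nu_{\{1\}} = \nu(x_1), \quad \nu_{\{2\}} = \nu(x_2), \quad
\nu_{\{0,2\}} = \nu(-x_0 - x_2), \quad \nu_{\{0,1\}} = \nu(-x_0 - x_1), \quad
\nu_{\{0\}} \equiv \nu_{\{1,2\}} \equiv 1,
% and the measures on the three edges are
\mu_{\{1,2\}} = \nu(x_1)\,\nu(x_2), \qquad
\mu_{\{0,2\}} = \nu(x_2)\,\nu(-x_0 - x_2), \qquad
\mu_{\{0,1\}} = \nu(x_1)\,\nu(-x_0 - x_1) .
```

Here the vertices $\{1\}$ and $\{2\}$ carry non-trivial weights while the edge $\{1,2\}$ carries none, which illustrates how weights land on lower dimensional edges when the simplex is not in general position.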

Let us now observe some properties of the family of linear forms $\mathcal{L}$ which will play a crucial role in the proof. If $e = J \setminus \{j\}$ and $e' = J \setminus \{j'\}$, then $\mathrm{supp}(L_{e'}^k) \subseteq e$ if and only if $v_j^k = v_{j'}^k$, and that is equivalent to $L_{e'}^k = L_e^k$. We call such a family $\mathcal{L}$ well-defined. Since for a given $e \in \mathcal{H}_d$ the forms $\{L_e^k,\ 1 \le k \le d\}$ are linearly independent, any two distinct forms of the family $\mathcal{L}$ are linearly independent. We will refer to such families of forms as being pairwise linearly independent. Also let $M = \{x \in V_J :\ x_0 + \dots + x_d = 0\}$. Then for any $x \in M$, $L_e^k(x) = L_{e'}^k(x)$ for all $e, e' \in \mathcal{H}_d$ and all $k$. We call a family of linear forms $\mathcal{L} = \{L_e^k;\ e \in \mathcal{H}_d,\ 1 \le k \le s\}$ satisfying this property symmetric.

To see how the weighted hypergraph $\{\nu_e\}_{e \in \mathcal{H}}$ is related to our problem we follow [18] to parameterize affine copies of $\Delta$. Define the map $\Phi : \mathbb{Z}_N^{d+1} \to \mathbb{Z}_N^{d+1}$ by
$$\Phi(x) = \Big( \sum_{i=0}^{d} x_i v_i,\ -\sum_{i=0}^{d} x_i \Big) := (y, t). \tag{1.4.4}$$
By (1.4.1) and (1.4.4) we have that $x \in E_e$ for $e = J \setminus \{j\}$ if and only if $y + t v_j \in A$; thus $x \in \bigcap_{e \in \mathcal{H}_d} E_e$ exactly when $y + t\Delta \subseteq A$. Since $\Phi$ is one to one, as we assume that $\{v_1 - v_0, \dots, v_d - v_0\}$ is a linearly independent family of vectors, this gives a parametrization of all affine copies of $\Delta$ contained in $A$ (mod $N$). Also, for $e = J \setminus \{j\}$,
$$L_e^k(x) = \sum_{i=0}^{d} x_i (v_i^k - v_j^k) = \pi_k(y + t v_j), \tag{1.4.5}$$
where $\pi_k$ is the orthogonal projection to the $k$-th coordinate axis. This implies that
$$\mu_e(x) = \prod_{\mathrm{supp}(L) \subseteq e} \nu(L(x)) = \prod_{k=1}^{d} \nu(L_e^k(x)) = w(y + t v_j), \tag{1.4.6}$$
and also
$$\mu_J(x) = \prod_{L \in \mathcal{L}} \nu(L(x)) = w(y + t\Delta). \tag{1.4.7}$$
Thus the assumption (1.3.6) in Theorem 1.3 translates to
$$\mathbb{E}_{x \in V_J} \prod_{e \in \mathcal{H}_d} \mathbf{1}_{E_e}(x)\, \mu_J(x) = \mathbb{E}_{(y,t) \in \mathbb{Z}_N^{d+1}} \Big( \prod_{j=0}^{d} \mathbf{1}_A(y + t v_j) \Big)\, w(y + t\Delta) \le \delta. \tag{1.4.8}$$
On the other hand, recall that $M = \{x \in V_J :\ x_0 + \dots + x_d = 0\}$; then $x \in M \cap \bigcap_{e \in \mathcal{H}_d} E_e$ if and only if $\Phi(x) = (y, 0)$ with $y \in A$, thus by (1.4.4) and (1.4.6)
$$\mathbb{E}_{y \in \mathbb{Z}_N^d}\, \mathbf{1}_A(y)\, w(y) = \mathbb{E}_{x \in M} \prod_{e \in \mathcal{H}_d} \mathbf{1}_{E_e}(x)\, \mu_{e'}(x) \tag{1.4.9}$$
for any fixed $e' \in \mathcal{H}_d$. Thus it is easy to see that Theorem 1.3 follows from a removal lemma for weighted hypergraphs, which we first recall in the unweighted case (where $\nu_f \equiv 1$ for all $f$). See also [18], [5], [14].

Theorem B (Simplex Removal Lemma, [19]). Let $E_e \in \mathcal{A}_e$ be given for $e \in \mathcal{H}_d$, and let $\delta > 0$. Also let $\mu_J$ and $\mu_e$ denote the normalized counting measures on $V_J$ and $V_e$. There exists $\epsilon = \epsilon(\delta) > 0$, and for every index set $e \in \mathcal{H}_d$ there exists a set $E'_e \in \mathcal{A}_e$, such that the following holds. If
$$\int_{V_J} \prod_{e \in \mathcal{H}_d} \mathbf{1}_{E_e}(x_e)\, d\mu_J(x) \le \delta,$$
then
$$\prod_{e \in \mathcal{H}_d} \mathbf{1}_{E'_e}(x_e) = 0 \ \text{ for all } x \in V_J, \qquad \int_{V_e} \mathbf{1}_{E_e \setminus E'_e}(x)\, d\mu_e(x) \le \epsilon(\delta),$$
and $\epsilon(\delta) \to 0$ as $\delta \to 0$.
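For orientation, in the case $d = 2$ Theorem B is the classical triangle removal lemma for tripartite graphs. The following restatement is a sketch in the notation above; the shorthand $E_{ij}$ for $E_{\{i,j\}}$ is ours.

```latex
% Case d = 2: J = \{0,1,2\}, \mathcal{H}_2 = \{\{0,1\},\{0,2\},\{1,2\}\}, V_j = \mathbb{Z}_N.
% Each E_{ij} is (the lift to V_J of) an edge set between V_i and V_j, and a point
% x = (x_0, x_1, x_2) lying in all three sets is a triangle. The hypothesis
\mathbb{E}_{x_0, x_1, x_2 \in \mathbb{Z}_N}\,
  \mathbf{1}_{E_{01}}(x_0, x_1)\, \mathbf{1}_{E_{02}}(x_0, x_2)\, \mathbf{1}_{E_{12}}(x_1, x_2)
  \le \delta
% says that there are at most \delta N^3 triangles. The conclusion provides sets E'_{ij} with
\mathbf{1}_{E'_{01}}(x_0, x_1)\, \mathbf{1}_{E'_{02}}(x_0, x_2)\, \mathbf{1}_{E'_{12}}(x_1, x_2) = 0
  \quad \text{for all } x \in V_J,
\qquad
|E_{ij} \setminus E'_{ij}| \le \epsilon(\delta)\, N^2 ,
% that is, deleting at most \epsilon(\delta) N^2 edges from each of the three graphs
% leaves a configuration with no triangles.
```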
Naturally one would like to extend Theorem B to families of measures $\{\mu_e\}_{e \in \mathcal{H}_d}$ in the weighted case, as that would easily imply Theorem 1.3 and hence our main result Theorem 1.2. The reason why this seems difficult is the existence of weights $\nu_e$ on lower dimensional edges $|e| < d$ when the configuration $\Delta$ is not in general position. Removing these weights does not seem amenable to the known "transference arguments" developed in [18], [1], [6], [15]. What we prove instead is that the removal lemma extends to a family of measures $\tilde\mu_e$ which are sufficiently small perturbations of the measures $\mu_e$ with respect to a given family of functions $g_e : V_e \to \mathbb{R}$.

Theorem 1.4 (Weighted Simplex Removal Lemma). Let $\{\nu_e\}_{e \subseteq J}$, $\{\mu_e\}_{e \subseteq J}$ be a system of weights and measures associated to a well-defined, pairwise linearly independent, and symmetric family of linear forms $\mathcal{L}$ as defined in (1.4.3). Let $E_e \in \mathcal{A}_e$ and $g_e : V_e \to [0,1]$ be given for $e \in \mathcal{H}_d$. Then for a given $\delta > 0$ there exists an $\epsilon = \epsilon(\delta) > 0$ such that the following holds. If
$$\mathbb{E}_{x \in V_J} \prod_{e \in \mathcal{H}_d} \mathbf{1}_{E_e}(x)\, \mu_J(x) \le \delta \tag{1.4.10}$$
then there exists a well-defined and symmetric family of linear forms $\tilde{\mathcal{L}} = \{\tilde L_e^k;\ e \in \mathcal{H}_d,\ 1 \le k \le d\}$ such that the associated system of weights and measures $\{\tilde\nu_e\}_{e \subseteq J}$, $\{\tilde\mu_e\}_{e \subseteq J}$ satisfy
$$\mathbb{E}_{x \in V_J} \prod_{e \in \mathcal{H}_d} \mathbf{1}_{E_e}(x)\, \tilde\mu_J(x) = \mathbb{E}_{x \in V_J} \prod_{e \in \mathcal{H}_d} \mathbf{1}_{E_e}(x)\, \mu_J(x) + o_{N,W\to\infty}(1) \tag{1.4.11}$$
and for all $e \in \mathcal{H}_d$
$$\mathbb{E}_{x \in V_e}\, g_e(x)\, \tilde\mu_e(x) = \mathbb{E}_{x \in V_e}\, g_e(x)\, \mu_e(x) + o_{N,W\to\infty}(1). \tag{1.4.12}$$
In addition there exist sets $E'_e \in \mathcal{A}_e$ such that
$$\bigcap_{e \in \mathcal{H}_d} (E_e \cap E'_e) = \emptyset \tag{1.4.13}$$
and for all $e \in \mathcal{H}_d$ we have
$$\mathbb{E}_{x \in V_e}\, \mathbf{1}_{E_e \setminus E'_e}(x)\, \tilde\mu_e(x) \le \epsilon(\delta) + o_{N,W\to\infty}(1). \tag{1.4.14}$$
Moreover, we also have that
$$\epsilon(\delta) \to 0 \ \text{ as } \delta \to 0. \tag{1.4.15}$$

It seems possible to formulate the properties of the weight system $\{\nu_e\}_{e \subseteq J}$ so that Theorem 1.4 holds without referring to an underlying system of linear forms $\mathcal{L}$. For that one would need to formulate a "linear forms" condition for weighted hypergraphs similar to [18], at an order depending on $\delta$. We will not pursue this approach here.

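Proof [Theorem 1.4 implies Theorem 1.3]. By assumption (1.3.6) in Theorem 1.3 and by (1.4.7),
$$\mathbb{E}_{x \in V_J} \prod_{e \in \mathcal{H}_d} \mathbf{1}_{E_e}(x)\, \mu_J(x) \le \delta.$$
For a given $e' \in \mathcal{H}_d$ define the function $g_{e'} : V_{e'} \to [0,1]$ as follows. Let $\phi_{e'} : V_{e'} \to M$ be the inverse of the projection map $\pi_{e'} : V_J \to V_{e'}$ restricted to $M$, and for $y \in V_{e'}$ let
$$g_{e'}(y) := \prod_{e \in \mathcal{H}_d} \mathbf{1}_{E_e}(\phi_{e'}(y)).$$
Applying Theorem 1.4 to the system of weights $\{\nu_e\}$ and functions $\{g_e\}$ gives a system of measures $\tilde\mu_e$ and sets $E'_e \in \mathcal{A}_e$ satisfying (1.4.11)-(1.4.15). By (1.4.4) we have that $x \in M \cap \bigcap_{e \in \mathcal{H}_d} E_e$ if and only if $\Phi(x) = (y, 0)$ with $y \in A$. Moreover, in that case $w(y) = \mu_e(x)$ for all $e \in \mathcal{H}_d$ by (1.4.6); thus for any given $e' \in \mathcal{H}_d$
$$\begin{aligned}
\mathbb{E}_{y \in \mathbb{Z}_N^d}\, \mathbf{1}_A(y)\, w(y)
 &= \mathbb{E}_{x \in M} \prod_{e \in \mathcal{H}_d} \mathbf{1}_{E_e}(x)\, \mu_{e'}(x)
  = \mathbb{E}_{z \in V_{e'}}\, g_{e'}(z)\, \mu_{e'}(z) \\
 &= \mathbb{E}_{z \in V_{e'}}\, g_{e'}(z)\, \tilde\mu_{e'}(z) + o_{N,W\to\infty}(1) \\
 &= \mathbb{E}_{x \in M} \prod_{e \in \mathcal{H}_d} \mathbf{1}_{E_e}(x)\, \tilde\mu_{e'}(x) + o_{N,W\to\infty}(1).
\end{aligned}$$
By (1.4.13), $\prod_{e \in \mathcal{H}_d} \mathbf{1}_{E_e} \le \sum_{e \in \mathcal{H}_d} \mathbf{1}_{E_e \setminus E'_e}$. Then the symmetry of the measures $\tilde\mu_e$ (i.e. the fact that $\tilde\mu_e(x) = \tilde\mu_{e'}(x)$ for $x \in M$), (1.4.14), and the fact that $\mathbf{1}_{E_e \setminus E'_e}$ is constant on the fibers $\pi_e^{-1}(x)$ imply
$$\begin{aligned}
\mathbb{E}_{x \in M} \prod_{e \in \mathcal{H}_d} \mathbf{1}_{E_e}(x)\, \tilde\mu_{e'}(x)
 &\le \sum_{e \in \mathcal{H}_d} \mathbb{E}_{x \in M}\, \mathbf{1}_{E_e \setminus E'_e}(x)\, \tilde\mu_{e'}(x) \\
 &= \sum_{e \in \mathcal{H}_d} \mathbb{E}_{x \in V_e}\, \mathbf{1}_{E_e \setminus E'_e}(x)\, \tilde\mu_e(x)
  \le (d+1)\, \epsilon(\delta) + o_{N,W\to\infty}(1).
\end{aligned}$$
Choosing $N, W$ sufficiently large with respect to $\delta$ gives
$$\mathbb{E}_{y \in \mathbb{Z}_N^d}\, \mathbf{1}_A(y)\, w(y) \le \epsilon'(\delta),$$
with, say, $\epsilon'(\delta) := (d+2)\, \epsilon(\delta)$. $\square$

1.5. Weighted box norms and hypergraph regularity.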

The known proofs of the Simplex Removal Lemma rely on the so-called Hypergraph Regularity Lemma and the associated Counting Lemma [19],[5],[14], and in particular on the notion of a regular or pseudo-random hypergraph. This can be defined in different ways; we use a variant of Gowers's box norms [5] adapted to our setting.

Let $e \in \mathcal{H}_d$ be fixed. For a given $\omega \in \{0,1\}^e$ (i.e. $\omega : e \to \{0,1\}$), define the orthogonal projection $\omega_e : V_e \times V_e \to V_e$ by
$$\omega_e(x_e, q_e)_i = \begin{cases} x_i & \text{if } \omega_i = 0, \\ q_i & \text{if } \omega_i = 1, \end{cases} \tag{1.5.1}$$
for $i \in e$, and the weighted box norm of a function $F : V_e \to \mathbb{R}$, using the notation $x_f := \pi_f(x)$ for $f \subseteq J$, as
$$\|F\|_{\Box_\nu^e}^{2^d} = \mathbb{E}_{x, q \in V_e} \prod_{\omega \in \{0,1\}^e} F(\omega_e(x, q)) \prod_{f \subseteq e}\ \prod_{\omega \in \{0,1\}^f} \nu_f(\omega_f(x_f, q_f)). \tag{1.5.2}$$
Note that if $\nu_f \equiv 1$ for all $f \subseteq e$, then $\|F\|_{\Box_\nu^e} = \|F\|_{\Box^d}$ is the usual box norm.

Example 1. Let $e = (0,1)$ and $F : V_0 \times V_1 \to \mathbb{R}$. Then
$$\begin{aligned}
\|F\|_{\Box_\nu^e}^{4} = \mathbb{E}_{x_0, q_0 \in V_0,\ x_1, q_1 \in V_1}\
 & F(x_0,x_1)\, F(x_0,q_1)\, F(q_0,x_1)\, F(q_0,q_1) \\
 & \times\ \nu_e(x_0,x_1)\, \nu_e(x_0,q_1)\, \nu_e(q_0,x_1)\, \nu_e(q_0,q_1)\ \nu_0(x_0)\, \nu_0(q_0)\, \nu_1(x_1)\, \nu_1(q_1).
\end{aligned}$$

The points $\omega_e(x,q)$ and $\omega_f(x_f,q_f)$ may be viewed as the faces and edges of a $d$-dimensional octahedron $K_d$ with vertices $\{x_j, q_j;\ j \in e\}$. The inner product in (1.5.2) represents the total weight of the octahedron, obtained by multiplying the weights of all edges and vertices. The box norm itself is the weighted average of $F$ over all embeddings of the hypergraph $K_d$.

It is not hard to see that the $\Box_\nu$-norm is indeed a norm (for $d \ge 2$) and that an appropriate version of the Gowers-Cauchy-Schwarz inequality holds (see the Appendix). The importance of this norm is that it controls weighted averages over $(d+1)$-dimensional simplices, something which plays an important role in proving the Counting Lemma. More precisely one has the following.

Proposition 1.1 (Weighted von Neumann inequality). Let $F_e : V_e \to \mathbb{R}$ be given functions such that $|F_e| \le 1$ for each $e \in \mathcal{H}_d$. Then there is an absolute constant $C$ such that
$$\Big| \mathbb{E}_{x \in V_J} \prod_{e \in \mathcal{H}_d} F_e(\pi_e(x))\, \mu_J(x) \Big| \le C \min_{e \in \mathcal{H}_d} \|F_e\|_{\Box_\nu^e} + o_{N,W\to\infty}(1). \tag{1.5.3}$$

The $\Box_\nu$-norm has also been defined and studied in [8], see Appendix B-C there, where various forms of von Neumann type inequalities have been shown. In fact it is not hard to adapt the arguments given there to prove Proposition 1.1; however, as our setting is somewhat different, we will include a proof in an appendix. The above inequality motivates the following.

Definition 1.3.

Let $e \in \mathcal{H}_d$ and $\varepsilon > 0$ be fixed and let $G_e \subseteq V_e$ be a $d$-regular hypergraph. We say that $G_e$ is $\varepsilon$-regular with respect to the weight system $\{\nu_f\}_{f \subseteq e}$ if
$$\|\mathbf{1}_{G_e} - \mu_e(G_e)\, \mathbf{1}_{V_e}\|_{\Box_\nu^e} \le \varepsilon. \tag{1.5.4}$$

It is easy to see from Proposition 1.1 that if the sets $E_e \in \mathcal{A}_e$ are $c\varepsilon$-regular for all $e \in \mathcal{H}_d$ (with a sufficiently small constant $c > 0$), then Theorem 1.4 holds with $\{\tilde\mu_e\} = \{\mu_e\}$. Indeed, writing $G_e = \pi_e(E_e)$, $\mathbf{1}_{G_e} = \mu_e(G_e)\, \mathbf{1}_{V_e} + F_e$, and substituting this decomposition into the left side of (1.4.10), we get $2^{d+1} - 1$ error terms, each of which is bounded by $c'\varepsilon$ (for some small absolute constant $c' > 0$, as long as $N$ and $W$ are sufficiently large with respect to $\varepsilon$), and a main term of the form $\prod_{e \in \mathcal{H}_d} \mu_e(G_e)$, which by the assumption of Theorem 1.4 should be less than, say, $2\varepsilon$. This implies that $\mathbb{E}_{x \in V_e} \mathbf{1}_{G_e}(x)\, \mu_e(x) = \mu_e(G_e) \le \delta$ for $\delta = (2\varepsilon)^{1/(d+1)}$, for at least one $e \in \mathcal{H}_d$. Thus the sets $E'_e := \emptyset$, $E'_{e'} := E_{e'}$ ($e' \neq e$) satisfy the conclusion of Theorem 1.4.

Of course in general the hypergraphs $G_e = \pi_e(E_e)$ are not sufficiently regular, and the bulk of our argument is to obtain a "Regularity Lemma" in our weighted setting. This roughly says that one can partition the sets $G_e$ into sufficiently regular hypergraphs with respect to a system of measures $\tilde\mu_e$ which are small perturbations of the initial measures $\mu_e$. Our proof is based on the iterative process described in [19]; however, we need to modify the entire argument because of the presence of weights on the lower dimensional edges. During the process we construct increasing families of weight systems $\{\nu_{q,e}\}_{e \in \bar{\mathcal{H}},\ q \in \Omega}$ which, for most values of the parameter $q$, give rise to small perturbations of the initial weight system $\{\nu_e\}_{e \in \bar{\mathcal{H}}}$.

Let us sketch below how the weights $\nu_{q,e}$ and the associated measures $\mu_{q,e}$ arise in the special case $d = 2$, $\nu_1 \equiv \nu_2 \equiv 1$. Assume that there is an edge $e$, say $e = (1,2)$, such that the graph $G_e = \pi_e(E_e)$ is not $\varepsilon$-regular.
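This means
$$\|F\|_{\Box_\nu^e} \ge \varepsilon, \tag{1.5.5}$$
where $F = \mathbf{1}_{G_e} - \mu_e(G_e)\, \mathbf{1}_{V_e}$. In view of definition (1.5.2), we may write
$$\|F\|_{\Box_\nu^e}^{4} = \int_{V_e} \int_{V_e} F(x)\, u_q^1(x_1)\, u_q^2(x_2)\, \nu_e(x_1,q_2)\, \nu_e(q_1,x_2)\, d\mu_e(x)\, d\mu_e(q) \ge \varepsilon^4, \tag{1.5.6}$$
where $x = (x_1,x_2)$, $q = (q_1,q_2)$, $u_q^1(x_1) = F(x_1,q_2)$, and $u_q^2(x_2) = F(q_1,x_2)\, F(q_1,q_2)$. If one defines the measures $\mu_{q,e}$, depending on the parameter $q$, by
$$\mu_{q,e}(x) := \nu_e(x_1,q_2)\, \nu_e(q_1,x_2)\, \mu_e(x),$$
then the inner expression in (1.5.6) can be viewed as the inner product
$$\Gamma(q) := \langle F,\ u_q^1 \cdot u_q^2 \rangle_{\mu_{q,e}} = \int_{V_e} F(x)\, u_q^1(x_1)\, u_q^2(x_2)\, d\mu_{q,e}(x) \tag{1.5.7}$$
on the Hilbert space $L^2(V_e, \mu_{q,e})$. Thus (1.5.6) translates to $\mathbb{E}_{q \in V_e} \Gamma(q)\, \mu_e(q) \ge \varepsilon^4$, while using the linear forms condition it is easy to see that $\mathbb{E}_{q \in V_e} \Gamma(q)^2\, \mu_e(q) \lesssim 1$; thus
$$\Gamma(q) \gtrsim \varepsilon^4 \quad \text{for } q \in \Omega, \tag{1.5.8}$$
for a set $\Omega \subseteq V_e$ of measure $\mu_e(\Omega) \gtrsim \varepsilon^8$. As the functions $u_q^i$ are bounded, without loss of generality we may assume that they are indicator functions of sets $U_q^i \subseteq V_i$. Let $\mathcal{B}_q = \mathcal{B}_q^1 \vee \mathcal{B}_q^2$ denote the $\sigma$-algebra generated by the sets $\pi_i^{-1}(U_q^i)$ ($i = 1,2$) on $V_e$, and let $\mathbb{E}_{\mu_{q,e}}(\mathbf{1}_{G_e} \mid \mathcal{B}_q)$ be the conditional expectation of $\mathbf{1}_{G_e}$ with respect to this $\sigma$-algebra and the measure $\mu_{q,e}$. Then, as $u_q^1 u_q^2$ is measurable with respect to $\mathcal{B}_q$, we have
$$\langle \mathbf{1}_{G_e} - \mathbb{E}_{\mu_{q,e}}(\mathbf{1}_{G_e} \mid \mathcal{B}_q),\ u_q^1 u_q^2 \rangle_{\mu_{q,e}} = 0.$$
This together with (1.5.7) and (1.5.8) implies that for $q \in \Omega$ we have
$$\langle \mathbb{E}_{\mu_{q,e}}(\mathbf{1}_{G_e} \mid \mathcal{B}_q) - \mathbb{E}_{\mu_e}(\mathbf{1}_{G_e} \mid \mathcal{B}_0),\ u_q^1 u_q^2 \rangle_{\mu_{q,e}} \gtrsim \varepsilon^4,$$
where $\mathcal{B}_0 = \{V_e, \emptyset\}$ is the trivial $\sigma$-algebra, and $\mathbb{E}_{\mu_e}(\mathbf{1}_{G_e} \mid \mathcal{B}_0) = \mu_e(G_e)\, \mathbf{1}_{V_e}$. Then by the Cauchy-Schwarz inequality we arrive at
$$\|\mathbb{E}_{\mu_{q,e}}(\mathbf{1}_{G_e} \mid \mathcal{B}_q) - \mathbb{E}_{\mu_e}(\mathbf{1}_{G_e} \mid \mathcal{B}_0)\|_{L^2(\mu_{q,e})}^2 \gtrsim \varepsilon^8. \tag{1.5.9}$$
Note that, by the Pythagorean theorem, if the second term on the left side were a conditional expectation with respect to the measure $\mu_{q,e}$, then one would obtain an "energy increment"
$$\|\mathbb{E}_{\mu_{q,e}}(\mathbf{1}_{G_e} \mid \mathcal{B}_q) - \mathbb{E}_{\mu_{q,e}}(\mathbf{1}_{G_e} \mid \mathcal{B}_0)\|_{L^2(\mu_{q,e})}^2 = \|\mathbb{E}_{\mu_{q,e}}(\mathbf{1}_{G_e} \mid \mathcal{B}_q)\|_{L^2(\mu_{q,e})}^2 - \|\mathbb{E}_{\mu_{q,e}}(\mathbf{1}_{G_e} \mid \mathcal{B}_0)\|_{L^2(\mu_{q,e})}^2 \gtrsim \varepsilon^8.$$
To overcome this "discrepancy" we show, using the linear forms condition, that for a given $B \subseteq V_e$ one has
$$\mathbb{E}_{q \in V_e}\, |\mu_{q,e}(B) - \mu_e(B)|^2\, \mu_e(q) = o_{N,W\to\infty}(1),$$
so that $\mu_{q,e}(B)$ is close to $\mu_e(B)$ for almost every $q \in V_e$. This in turn implies that
$$\|\mathbb{E}_{\mu_{q,e}}(\mathbf{1}_{G_e} \mid \mathcal{B}_0) - \mathbb{E}_{\mu_e}(\mathbf{1}_{G_e} \mid \mathcal{B}_0)\|_{L^2(\mu_{q,e})} = o_{N,W\to\infty}(1)$$
and
$$\|\mathbb{E}_{\mu_e}(\mathbf{1}_{G_e} \mid \mathcal{B}_0)\|_{L^2(\mu_e)} = \|\mathbb{E}_{\mu_{q,e}}(\mathbf{1}_{G_e} \mid \mathcal{B}_0)\|_{L^2(\mu_{q,e})} + o_{N,W\to\infty}(1).$$

Though our exposition later is self-contained, some familiarity with standard notions and arguments, such as conditional expectation and energy increment, discussed for example in [19], may be helpful here.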
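The Pythagorean identity behind the energy increment can be written out as follows. This is the standard fact for nested $\sigma$-algebras with respect to a single measure, given here as a sketch; the shorthand $\mu$, $G_0$ and $G_q$ is ours.

```latex
% Setting: \mathcal{B}_0 \subseteq \mathcal{B}_q and one fixed measure \mu = \mu_{q,e}. Write
G_0 = \mathbb{E}_{\mu}(\mathbf{1}_{G_e} \mid \mathcal{B}_0), \qquad
G_q = \mathbb{E}_{\mu}(\mathbf{1}_{G_e} \mid \mathcal{B}_q).
% By the tower property \mathbb{E}_{\mu}(G_q \mid \mathcal{B}_0) = G_0, so G_q - G_0 is
% orthogonal in L^2(\mu) to every \mathcal{B}_0-measurable function, in particular
\langle G_q - G_0,\ G_0 \rangle_{\mu} = 0 .
% Expanding \|G_q\|^2 = \|G_0 + (G_q - G_0)\|^2 therefore gives
\|G_q\|_{L^2(\mu)}^2 = \|G_0\|_{L^2(\mu)}^2 + \|G_q - G_0\|_{L^2(\mu)}^2 .
```

The identity requires both conditional expectations to be taken with respect to the same measure, which is why the comparison between $\mu_{q,e}$ and $\mu_e$ has to be handled separately.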


Then from (1.5.9) we have, for almost every $q \in \Omega$, that
$$\|\mathbb{E}_{\mu_{q,e}}(\mathbf{1}_{G_e} \mid \mathcal{B}_q)\|_{L^2(\mu_{q,e})}^2 \ge \|\mathbb{E}_{\mu_e}(\mathbf{1}_{G_e} \mid \mathcal{B}_0)\|_{L^2(\mu_e)}^2 + c\, \varepsilon^8. \tag{1.5.10}$$
If $F : V \to \mathbb{R}$ is a function and $(V, \mathcal{B}, \mu)$ is a measure space, the quantity $\|\mathbb{E}_{\mu}(F \mid \mathcal{B})\|_{L^2(\mu)}^2$ is sometimes referred to as the "energy" of the function $F$ with respect to the measure space $(V, \mathcal{B}, \mu)$. So (1.5.10) says that if $G_e$ is not $\varepsilon$-uniform with respect to the initial measure space $(V_e, \mathcal{B}_0, \mu_e)$, then its energy increases by a fixed amount when passing to the measure spaces $(V_e, \mathcal{B}_q, \mu_{q,e})$ for (almost) every $q \in \Omega$. One can iterate this argument to arrive at a family of measure spaces $(V_e, \mathcal{B}_{q,e}, \mu_{q,e})_{e \in \mathcal{H}_d,\ q \in \Omega}$ such that the atoms $G_{q,e} \in \mathcal{B}_{q,e}$ become sufficiently uniform, thus obtaining a parametric version of the so-called Koopman-von Neumann decomposition, see [19]. This can be further iterated to eventually obtain a regularity lemma. Note that the number of linear forms defining the measures $\mu_{q,e}$ increases at each step of the iteration, causing the linear forms condition to be used at a level depending eventually on the relative density of the set $A$ and not just on the dimension $d$.

1.6. Outline of the paper.

In Section 2 we describe the type of parametric weight systems $\{\nu_{q,f}\}_{f \in \mathcal{H},\ q \in Z}$ that we encounter later on. Here we also discuss their basic properties such as stability and symmetry. In Section 3 we introduce the energy increment argument for parametric systems and prove a regularity lemma. Section 4 is devoted to proving the counting and removal lemmas. Many of our arguments in Section 3 and Section 4 may be viewed as extensions of those in [19]. In the last section we obtain our main results stated in the introduction. The basic properties of weighted box norms are discussed in an Appendix.

As for our notation, most of our variables are of vector type, although we do not emphasize this. We think of the initial data $\Delta = \{v_0, \dots, v_d\}$ as being fixed throughout, and do not indicate the dependence on the various elements of $\Delta$. For example we write $Y = O(X)$ or $Y \lesssim X$ if $Y \le C X$ for some constant $C > 0$ depending only on the vectors $v_i$ or the dimension $d$. If $y_1, \dots, y_s$ are additional parameters we write $O_{y_1,\dots,y_s}(X)$ for a quantity $Y$ bounded by $C(y_1,\dots,y_s)\, X$, or equivalently $Y \lesssim_{y_1,\dots,y_s} X$.

We will use the linear forms condition throughout the paper, giving rise to error terms which tend to $0$ as both $N \to \infty$ and $W \to \infty$ for any fixed choice of the parameters $y_1, \dots, y_s$ on which they may depend. The standard notation for such terms would be $o_{N,W\to\infty;\, y_1,\dots,y_s}(1)$, which for simplicity we will write as $o_{y_1,\dots,y_s}(1)$. Finally, as all estimates in the linear forms condition involving the weights $\nu_b$ are independent of the choice of $b$, we write in certain places $\nu = \nu_b$ in order to simplify the notation.

2. BASIC PROPERTIES OF PARAMETRIC WEIGHT SYSTEMS AND THEIR EXTENSIONS

In this section we define the type of parametric systems and associated families of measures that we encounter later, and discuss their basic properties such as stability and symmetry. We also discuss the extensions of such systems which arise in our induction process.

2.1. Parametric weight systems and stability properties.

Recall the family of measures $\{\mu_e\}_{e\in\mathcal{H}}$ constructed in (1.4.1),
$$\mu_e(x)=\prod_{L\in\mathcal{L},\ \mathrm{supp}(L)\subseteq e}\nu(L(x)),$$
where the family $\mathcal{L}$ defined in (1.4.1) consists of pairwise linearly independent forms. The following statement is based on the linear forms condition and is a prototype of many of the arguments in this section.

Lemma 2.1. For all $e\in\mathcal{H}$ we have
$$\mu_e(V_e)=1+o(1).\tag{2.1.1}$$
Moreover, if $g:V_e\to[-1,1]$ then
$$\mathbb{E}_{x_e\in V_e}\,g(x_e)\,\mu_e(x_e)=\mathbb{E}_{x\in V_J}\,g(\pi_e(x))\,\mu_J(x)+o(1),$$
or equivalently
$$\int_{V_e}g\,d\mu_e=\int_{V_J}(g\circ\pi_e)\,d\mu_J+o(1).\tag{2.1.2}$$

Proof. Note that the linear forms appearing on the right side of
$$\mu_e(V_e)=\mathbb{E}_{x\in V_e}\prod_{\mathrm{supp}(L)\subseteq e}\nu(L(x))$$
are pairwise linearly independent, and as they are supported on $e$ they remain pairwise independent when restricted to $V_e$. Thus (2.1.1) follows from the linear forms condition.

To show (2.1.2), let $e'=J\setminus e$ and write $x=(x_e,x_{e'})$ with $x_e=\pi_e(x)$, $x_{e'}=\pi_{e'}(x)$. Then
$$E:=\mathbb{E}_{x\in V_J}(g\circ\pi_e)(x)\,\mu_J(x)-\mathbb{E}_{x_e\in V_e}\,g(x_e)\,\mu_e(x_e)=\mathbb{E}_{x_e\in V_e}\,g(x_e)\,\mu_e(x_e)\,\mathbb{E}_{x_{e'}\in V_{e'}}\big(w(x_e,x_{e'})-1\big),$$
where $w(x_e,x_{e'})=\prod_{f\nsubseteq e}\nu_f(x_{e\cap f},x_{e'\cap f})$.

By (2.1.1) we have $\mu_e(V_e)\lesssim 1$, and then by the Cauchy-Schwarz inequality
$$|E|^2\lesssim\mathbb{E}_{x_e\in V_e}\,\mathbb{E}_{x_{e'},y_{e'}\in V_{e'}}\big(w(x_e,x_{e'})-1\big)\big(w(x_e,y_{e'})-1\big)\,\mu_e(x_e).$$
Expanding the product, the right hand side is a signed combination of four terms, each of which equals $1+o(1)$ by the linear forms condition; the main terms cancel, and (2.1.2) follows. Indeed, the linear forms appearing in the definition of $\mu_e(x_e)$ depend only on the variables $x_j$ for $j\in e$ and are pairwise linearly independent. All linear forms involved in $w(x_e,x_{e'})$ depend also on some of the variables $x_j$, $j\in e'$, while the ones in $w(x_e,y_{e'})$ depend on some of the variables $y_j$, $j\in e'$; hence these forms depend on different sets of variables. Thus the forms appearing in the expression $\mu_e(x_e)\,w(x_e,x_{e'})\,w(x_e,y_{e'})$ are pairwise linearly independent, and the linear forms condition applies. Note that the estimate is independent of the function $g$. $\square$

This allows us to identify a set $G_e\subseteq V_e$ with its lift $\pi_e^{-1}(G_e)\subseteq V_J$, changing its measure only by a negligible amount:
$$\mu_J\big(\pi_e^{-1}(G_e)\big)=\mu_e(G_e)+o(1).\tag{2.1.3}$$

Next we define weight systems and associated families of measures depending on parameters. Let $\mathcal{L}_q:=(L_1(q,x),\dots,L_s(q,x))$ be a family of linear forms with integer coefficients depending on the parameters $q\in\mathbb{Z}^R$ and the variables $x\in\mathbb{Z}^D$.
We call the family pairwise linearly independent if no two forms in the family are rational multiples of each other. If $N$ is a prime that is sufficiently large with respect to the coefficients of the linear forms $L_i(q,x)$, then the forms remain pairwise linearly independent when considered as forms over $Z\times V$, where $Z=\mathbb{Z}_N^R$ and $V=\mathbb{Z}_N^D$. We refer to the set $Z=\mathbb{Z}_N^R$ as the parameter space of the family $\mathcal{L}_q$. As our arguments will involve averaging over the parameter space $Z$, we call the family $\mathcal{L}_q$ well-defined if there is a measure $\psi$ on $Z$ given by
$$\int_Z g(q)\,d\psi(q)=\mathbb{E}_{q\in Z}\,g(q)\,\psi(q),\qquad\psi(q)=\prod_{i=1}^{t}\nu(Y_i(q)),\tag{2.1.4}$$
for a family of pairwise linearly independent linear forms $Y_i$ defined over $Z$, and if all forms $L_i(q,x)$ depend on some of the $x$-variables.

If $V=V_J$ then we define an associated system of weights $\{\nu_{q,e}\}_{q\in Z,\,e\in\mathcal{H}}$ and measures $\{\mu_{q,e}\}_{q\in Z,\,e\in\mathcal{H}}$ as follows. For a form $L_k(q,x)=\sum_i b_iq_i+\sum_j a_jx_j$ define its $x$-support as $\mathrm{supp}_x(L_k)=\{j\in J;\ a_j\neq 0\}$. For $e\subseteq J$ and $q\in Z$, let
$$\nu_{q,e}(x):=\prod_{L\in\mathcal{L}_q,\ \mathrm{supp}_x(L)=e}\nu(L(q,x)),\qquad\mu_{q,e}(x):=\prod_{L\in\mathcal{L}_q,\ \mathrm{supp}_x(L)\subseteq e}\nu(L(q,x)).\tag{2.1.5}$$
We use the convention that $\nu_{q,e}\equiv 1$ if there is no form $L\in\mathcal{L}_q$ such that $\mathrm{supp}_x(L)=e$. Note that the $x$-support partitions the family of forms $\mathcal{L}_q$ independently of the parameters $q$; thus for given $e\in\mathcal{H}$
$$\mu_{q,e}(x)=\prod_{f\subseteq e}\nu_{q,f}(x)\quad\text{for all } q\in Z.$$

A crucial observation is that many of the properties of the measure system $\{\mu_e\}$ still hold for well-defined measure systems $\{\mu_{q,f}\}$ for almost every value of the parameter $q\in Z$. In order to formulate such statements we say that the family $\mathcal{L}_q$ has complexity at most $K$ if the dimension of the space $Z$, the number of linear forms $L_j(q,x)$, $Y_l(q)$, and the magnitude of their coefficients are all bounded by $K$. This quantity controls the dependence of the error terms in applications of the linear forms condition.
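To make the definitions in (2.1.5) concrete, here is a small worked example. The specific family of forms below is purely illustrative and is an assumption made for this sketch (it does not appear in the paper): we take $J=\{0,1,2\}$, a single parameter $q\in Z=\mathbb{Z}_N$, and four hypothetical forms $L_1,\dots,L_4$.

```latex
% Illustrative (hypothetical) family, not taken from the paper:
%   L_1(q,x) = x_1 - x_2,   L_2(q,x) = x_1 - x_2 + q,
%   L_3(q,x) = x_1 + q,     L_4(q,x) = x_2.
% No two of these forms are rational multiples of each other, and each
% depends on at least one x-variable.
\begin{align*}
\mathrm{supp}_x(L_1) &= \mathrm{supp}_x(L_2) = \{1,2\}, &
\mathrm{supp}_x(L_3) &= \{1\}, &
\mathrm{supp}_x(L_4) &= \{2\}.
\end{align*}
% Weights: one factor for each form with exactly the given x-support.
\begin{align*}
\nu_{q,\{1,2\}}(x) &= \nu(x_1 - x_2)\,\nu(x_1 - x_2 + q), &
\nu_{q,\{1\}}(x) &= \nu(x_1 + q), &
\nu_{q,\{2\}}(x) &= \nu(x_2),
\end{align*}
% and \nu_{q,f} \equiv 1 for every other f \subseteq J (no form has x-support f).
% Measures: product of the weights over all f contained in e.
\begin{align*}
\mu_{q,\{1\}}(x)   &= \nu(x_1 + q), \\
\mu_{q,\{0,1\}}(x) &= \nu_{q,\{0\}}(x)\,\nu_{q,\{1\}}(x)\,\nu_{q,\{0,1\}}(x) = \nu(x_1 + q), \\
\mu_{q,\{1,2\}}(x) &= \nu_{q,\{1\}}(x)\,\nu_{q,\{2\}}(x)\,\nu_{q,\{1,2\}}(x)
                    = \nu(x_1 + q)\,\nu(x_2)\,\nu(x_1 - x_2)\,\nu(x_1 - x_2 + q).
\end{align*}
```

In this sketch the partition of the forms by $x$-support does not change with $q$, which is exactly why the factorization $\mu_{q,e}=\prod_{f\subseteq e}\nu_{q,f}$ holds for every value of the parameter.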
We have the following analogue of Lemma 2.1.

Lemma 2.2. Let $\{\mu_{q,e}\}_{e\in\mathcal{H},\,q\in Z}$ be a well-defined parametric measure system of complexity at most $K$. For every $e\in\mathcal{H}$ there is a set $\mathcal{E}_e\subseteq Z$ such that $\psi(\mathcal{E}_e)=o_K(1)$, and for every $q\notin\mathcal{E}_e$
$$\mu_{q,e}(V_e)=1+o_K(1).\tag{2.1.6}$$
Moreover, for every $e\in\mathcal{H}$ there is a set $\mathcal{E}_e\subseteq Z$ of measure $\psi(\mathcal{E}_e)=o_K(1)$ such that the following holds. For any function $g:Z\times V_e\to[-1,1]$ and for every $q\notin\mathcal{E}_e$ one has the estimate
$$\int_{V_e}g(q,x_e)\,d\mu_{q,e}(x_e)=\int_{V_J}g(q,\pi_e(x))\,d\mu_{q,J}(x)+o_K(1).\tag{2.1.7}$$

Proof. To prove (2.1.6) consider the quantity
$$\Lambda_e:=\int_Z|\mu_{q,e}(V_e)-1|^2\,d\psi(q)=\int_Z\mathbb{E}_{x_e,y_e}\Big(\prod_{\mathrm{supp}_x(L)\subseteq e}\nu(L(q,x_e))-1\Big)\Big(\prod_{\mathrm{supp}_x(L)\subseteq e}\nu(L(q,y_e))-1\Big)\,d\psi(q).$$
The above expression is a signed combination of four terms, and the family of linear forms $\{Y_k(q),\,L_i(q,x_e),\,L_j(q,y_e)\}$ is pairwise linearly independent in the $(q,x_e,y_e)$ variables by our assumptions. Applying the linear forms condition shows that each term equals $1+o_K(1)$, so the main terms cancel and $\Lambda_e=o_K(1)$; then (2.1.6) follows from Chebyshev's inequality.

Now let $e'=J\setminus e$ and write $x=(x_e,x_{e'})$. Arguing as in Lemma 2.1 we have
$$\Lambda(q,e,g):=\big|\mathbb{E}_{x\in V_J}\,g(q,\pi_e(x))\,\mu_{q,J}(x)-\mathbb{E}_{x_e\in V_e}\,g(q,x_e)\,\mu_{q,e}(x_e)\big|=\big|\mathbb{E}_{x_e\in V_e}\,g(q,x_e)\,\mu_{q,e}(x_e)\,\mathbb{E}_{x_{e'}\in V_{e'}}\big(w_q(x_e,x_{e'})-1\big)\big|$$
$$\le\mathbb{E}_{x_e\in V_e}\,\mu_{q,e}(x_e)\,\big|\mathbb{E}_{x_{e'}\in V_{e'}}\big(w_q(x_e,x_{e'})-1\big)\big|,$$
where $w_q(x_e,x_{e'})=\prod_{f\nsubseteq e}\nu_{q,f}(x_{e\cap f},x_{e'\cap f})$.

Notice that the right hand side of the above inequality is independent of the function $g$; if we denote it by $\Lambda(q,e)$ then (2.1.7) follows from the estimate $\int_Z\Lambda(q,e)\,d\psi(q)=o_K(1)$. By the linear forms condition,
$$\int_Z\int_{V_e}d\mu_{q,e}(x_e)\,d\psi(q)=1+o_K(1)\le 2$$
for $N$ sufficiently large with respect to $K$. Then by the Cauchy-Schwarz inequality one has
$$\Big(\int_Z\Lambda(q,e)\,d\psi(q)\Big)^2\lesssim\int_Z\int_{V_e}\mathbb{E}_{x_{e'},y_{e'}\in V_{e'}}\big(w_q(x_e,x_{e'})-1\big)\big(w_q(x_e,y_{e'})-1\big)\,d\mu_{q,e}(x_e)\,d\psi(q).$$
This is again a signed combination of four terms, each equal to $1+o_K(1)$, so the whole expression is $o_K(1)$. Indeed, the linear forms defining $\psi$ depend only on the variables $q$, while the ones defining $\mu_{q,e}$ depend also on the $x_e$ variables. On the other hand, all linear forms appearing in the weight functions $w_q(x_e,x_{e'})$ (respectively, $w_q(x_e,y_{e'})$) depend on the $x_{e'}$ (respectively, $y_{e'}$) variables as well. Thus the family of all linear forms in the above expressions is pairwise linearly independent in the $(q,x_e,x_{e'},y_{e'})$ variables. $\square$

2.2. Extension of parametric systems.

During our iteration process we will encounter extensions of parametric families of forms depending on more and more parameters. Roughly speaking, one extends a family by adding new parameters together with new forms depending also on the new parameters. More precisely, let $\mathcal{L}^1_{q_1}=\{L^1_1(q_1,x),\dots,L^1_{s_1}(q_1,x)\}$ and $\mathcal{L}^2_{q_2}=\{L^2_1(q_2,x),\dots,L^2_{s_2}(q_2,x)\}$ be two pairwise linearly independent families of linear forms defined on the parameter spaces $Z_1=\mathbb{Z}_N^{k_1}$ and $Z_2=\mathbb{Z}_N^{k_2}$. Let $\psi_1$ and $\psi_2$ be measures on $Z_1$ and $Z_2$ defined by the families of linear forms $\{Y^1_1(q_1),\dots,Y^1_{t_1}(q_1)\}$ and $\{Y^2_1(q_2),\dots,Y^2_{t_2}(q_2)\}$.

Definition 2.1. We say that the family $\mathcal{L}^2_{q_2}$ is an extension of the family $\mathcal{L}^1_{q_1}$ if $Z_1\le Z_2$ and the following holds. The family of forms $L^2_i(q_2,x)$, $Y^2_j(q_2)$ which depend only on the variables $q_1=\pi(q_2)$ is exactly the family of forms $L^1_i(q_1,x)$, $Y^1_j(q_1)$, where $\pi:Z_2\to Z_1$ is the natural orthogonal projection. If $V=V_J$ let $\mu^1:=\{\mu_{q_1,e}\}_{q_1\in Z_1,\,e\in\mathcal{H}}$ and $\mu^2:=\{\mu_{q_2,f}\}_{q_2\in Z_2,\,f\in\mathcal{H}}$ be the associated measure systems as defined in (2.1.5). We then say that the measure system $\mu^2$ is an extension of the system $\mu^1$.

Let us make a few immediate observations. Writing $Z_2=Z_1\times Z$, $Z=\mathbb{Z}_N^{k}$ and $q_2=(q_1,q)$, we have
$$\psi_2(q_1,q)=\psi_1(q_1)\cdot\varphi(q_1,q),\tag{2.2.1}$$
where $\varphi(q_1,q)=\prod_{i=1}^{t}\nu(Y_i(q_1,q))$. The linear forms $Y_i(q_1,q)$ defining $\varphi(q_1,q)$ depend on some of the variables of $q=(q_i)_{1\le i\le k}$ and are pairwise linearly independent. Similarly, one may write for any $e\in\mathcal{H}$
$$\mu_{(q_1,q),e}(x_e)=\mu_{q_1,e}(x_e)\,w_e(q_1,q,x_e),\tag{2.2.2}$$
where the linear forms $L_j(q_1,q,x_e)$ defining the function $w_e(q_1,q,x_e)$ depend on (some of) the variables $q$ as well as on (all of) the variables $x_e$.

In the special case when $\mathcal{L}=(L_1(x),\dots,L_s(x))$ is a family of linear forms, a parametric family $\mathcal{L}_q$ is called an extension of $\mathcal{L}$ if the set of forms in $\mathcal{L}_q$ which are independent of $q$ is exactly the family $\mathcal{L}$. Similarly, the associated system of weights $\{\nu_{q,e}\}$ and measures $\{\mu_{q,e}\}$ is referred to as an extension of $\{\nu_e\}$ and $\{\mu_e\}$.

Lemma 2.3.

Let { µ f } f ∈H be a well deﬁned measure system, and let { µ q,f } q ∈ Z,f ∈H be a well-deﬁnedparametric extension of { µ f } f ∈H of complexity at most K . Then for any f ∈ H and for any function g : V f → [ − , there is a set E g,f ⊆ Z of measure ψ ( E g,f ) = o K (1) , so that for all q / ∈ E g,f Z V f g dµ q,f − Z V f g dµ f = o K (1) . (2.2.3) Similarly if { µ q ,f } f ∈H ,q ∈ Z is a well-deﬁned parametric system and if { µ q ,f } f ∈H ,q ∈ Z is an extension ofcomplexity at most K , then to any function g : Z × V f → [ − , there exists a set E g,f ⊆ Z of measure ψ ( E g,f ) = o K (1) , such that for all q = ( q , q ) / ∈ E g,f Z V f g ( q , x ) dµ q ,f ( x ) − Z V f g ( q , x ) dµ q ,f ( x ) = o K (1) . (2.2.4) MULTIDIMENSIONAL SZEMER ´EDI THEOREM IN THE PRIMES 15

Proof. As µ q,f = µ f ( x f )w f ( q, x f ) , the left side of (2.2.3) may be written as Λ f,g ( q ) := Z V f g ( x )(w f ( q, x ) − dµ f ( x ) . Consider Λ f,g := Z Z | Λ f,g ( q ) | dψ ( q ) . Using the Cauchy-Schwartz inequality we estimate Λ f,g = Z Z Z V f Z V f (w f ( q, x ) − f ( q, y ) − g ( x ) g ( y ) dµ f ( x ) dµ f ( y ) dψ ( q ) ≤ Z V f Z V f (cid:12)(cid:12)(cid:12)(cid:12)Z Z (w f ( q, x ) − f ( q, y ) − dψ ( q ) (cid:12)(cid:12)(cid:12)(cid:12) dµ f ( x ) dµ f ( y ) . Now the Cauchy-Schwartz inequality and (2.1.1) gives | Λ f,g | . Z V f Z V f Z Z Z Z (w f ( q, x ) − f ( q, y ) − ×× (w f ( p, x ) − f ( p, y ) − dµ f ( x ) dµ f ( y ) dψ ( q ) dψ ( p ) . This last expression is a combination of 16 terms where each term is o K (1) by the linear form conditions.Indeed the linear forms which can appear in any of these terms are Y i ( q ) , Y i ( p ) , L i ( x ) , L i ( y ) , L i ( q, x ) , L i ( q, y ) , L i ( p, x ) , L i ( p, y ) . Note that the last 4 terms depend on both sets of variables (for example L i ( q, x ) depends both on q ∈ Z and on x ∈ V f ), and hence the family of these forms are pairwise linearlyindependent in the ( q, p, x, y ) variables. This Proves (2.2.3).The proof of (2.2.4) is essentially the same. Set Λ f,g ( q ) := Z V f g ( q , x ) dµ q ,f ( x ) − Z V f g ( q , x ) dµ q ,f ( x ) and Λ f,g := Z Z | Λ f,g ( q ) | dψ ( q ) . Write Z = Z × Z , where Z = Z kN , and q = ( q , q ) for q ∈ Z . By (2.2.1) we estimate as above Λ f,g . Z V f Z V f Z Z dψ ( q ) dµ q ,f ( x ) dµ q ,f ( y ) | E q ∈ Z (w f ( q , q, x ) − f ( q , q, y ) − ϕ ( q , q ) | . The linear forms condition gives Z V f Z V f Z Z dψ ( q ) dµ q ,f ( x ) dµ q ,f ( y ) = 1 + o K (1) , so then we have | Λ f,g | . Z V f Z V f Z Z E p,q ∈ Z (w f ( q , q, x ) − f ( q , q, y ) − ×× (w f ( q , p, x ) − f ( q , p, y ) − ϕ ( q , q ) ϕ ( q , p ) dψ ( q ) dµ q ,f ( x ) dµ q ,f ( y ) . The point is that any linear form L if ( q , q, x ) depends both on the variables q and x . 
Thus again the left sideis a combination of 16 terms, each being o K (1) by the linear forms condition as all the linear formsinvolved in any of these expressions are pairwise linearly independent in the ( x, y, q , q, p ) variables. (cid:3) Lemma 2.3 is an example of what we refer to as a stability property. it means that the extension measures µ ( q ,q ) ,f are small perturbations of the measures µ q ,f with respect to quantities which are independent of q .As a ﬁrst application of this principle we show that the weighted box norms, deﬁned in (1.4.2), remainessentially unchanged under parametric extensions of the weight systems deﬁning the norms. Let L q be apairwise linearly independent family of forms deﬁned on the parameter space ( Z , ψ ) and let { ν q ,e } bethe associated system of weights.Let g : Z × V e → R be a function and let e ∈ H , | e | = d ′ . For a given q ∈ Z recall the box norm of g q ( x ) = g ( q , x ) (cid:13)(cid:13) g q (cid:13)(cid:13) d ′ (cid:3) νq ,e = E p,x ∈ V e Y ω e ∈{ , } e g ( q , ω e ( p, x )) Y f ⊆ e Y ω f ∈{ , } f ν q,f ( ω f ( p f , x f )) , (2.2.5)where x f = π f ( x ) , p f = π f ( p ) , π f : V e → V f being the natural projection. The inner product on the rightside of (2.2.5) is deﬁned by the parametric family of forms ˜ L q = [ f ⊆ e { L ( q , ω f ( p f , x f )); L ∈ L q , supp x ( L ) = f, ω f ∈ { , } f } . (2.2.6)It is easy to see that this is a pairwise linearly independent family of forms deﬁned over Z × V ( V = V e × V e ) . Indeed, if we’d have that L ′ ( q , ω ′ f ′ ( p f ′ , x f ′ )) = λL ( q , ω f ( p f , x f )) , (2.2.7)then restriction both forms to the subspace { p = x } would imply that L ′ ( q , x f ′ ) = λL ( q , x f ) and hence f ′ = supp x ( L ′ ) = supp x ( L ) = f . 
Then, as L and L ′ depend exactly variables x j for j ∈ f , for (2.2.7) tohold, we should have ω ′ f = ω f and L = L ′ .If { ˜ µ q ,f } q ∈ Z ,f ⊆ e denotes the associated system of measures and G ( q , p, x ) := Y ω ∈{ , } e g ( q , ω e ( p, x )) , (2.2.8)then for given q ∈ Z (cid:13)(cid:13) g q (cid:13)(cid:13) d ′ (cid:3) νq ,e = E p,x ∈ V e G q ( p, x ) ˜ µ q ,e ( p, x ) . (2.2.9)Now, if L q is a well-deﬁned parametric extension of L q then (2.2.6) yields to a well-deﬁned parametricextension ˜ L q of the family ˜ L q . Then by Lemma 2.3, and the simple observation that | a d ′ − b d ′ | ≤ ε implies | a − b | ≤ ε − d ′ for a, b ≥ , we obtain Lemma 2.4.

Let { ν q ,f } f ∈H ,q ∈ Z be a parametric weight system with a well-deﬁned extension { ν q ,f } f ∈H ,q ∈ Z of complexity at most K . Then to any e ∈ H and to any function g : Z × V e → [ − , there exists a set E = E ( g, e ) ∈ Z of measure ψ ( E ) = o K (1) such that for all q = ( q , p ) / ∈ E (cid:13)(cid:13) g q (cid:13)(cid:13) (cid:3) νq ,e = (cid:13)(cid:13) g q (cid:13)(cid:13) (cid:3) νq ,e + o K (1) . (2.2.10)Let ( V, B , µ ) be a measure space and let g : V → R be a function. An important construction, the so-calledconditional expectation function is deﬁned as E µ ( g |B )( x ) = 1 µ ( B ( x )) E y ∈ V B ( x ) ( y ) g ( y ) dµ ( y ) = 1 µ ( B ( x )) Z B ( x ) g ( y ) dµ ( y ) , MULTIDIMENSIONAL SZEMER ´EDI THEOREM IN THE PRIMES 17 where B ( x ) ∈ B is the atom containing x . If µ ( B ( x )) = 0 then we set E µ ( g |B )( x ) = 1 .The complexity of the σ -algebra B , denoted by compl( B ), is deﬁned as the minimum number of elementsof B which generates B . Note that the number of atoms of B is at most compl ( B ) . Next we compare theconditional expectation functions of parametric systems. Lemma 2.5.

Let ( µ q ,f ) q ∈ Z ,f ∈H be a well-deﬁned parametric measure system with a well-deﬁned exten-sion ( µ q ,f ) q ∈ Z ,f ∈H of complexity at most K . For q ∈ Z and e ∈ H , let B q ,e be a σ − algebra on V e such that compl( B q ,e ) ≤ M for some ﬁxed number M . For any function g : Z × V e → [ − , there existsa set E = E ( B , g ) ⊆ Z of measure ψ ( E ) = o M,K (1) such that for any q = ( q , q ) / ∈ E (1) we have (cid:13)(cid:13) E µ q ,e ( g q |B q ,e ) − E µ q ,e ( g q |B q ,e ) (cid:13)(cid:13) L ( µ q ,e ) = o M,K (1) (2.2.11)(2) and (cid:13)(cid:13) E µ q ,e ( g q |B q ,e ) (cid:13)(cid:13) L ( µ q ,e ) = (cid:13)(cid:13) E µ q ,e ( g q |B q ,e ) (cid:13)(cid:13) L ( µ q ,e ) + o M,K (1) . (2.2.12) Proof.

Let m = 2 M and enumerate the atoms of B q ,e as B q , ..., B mq , allowing some of them to possiblybe empty. For a ﬁxed ≤ i ≤ m deﬁne the functions b i ( q , x ) = B iq ( x ) = ( if x ∈ B iq otherwiseand for q = ( q , q ) ∈ Z deﬁne the quantities µ i ( q , g ) := Z V e g ( q , x ) b i ( q , x ) dµ q ,e ( x ) , µ i ( q ) := µ q ,e ( B iq ) ,µ i ( q , g ) := Z V e g ( q , x ) b i ( q , x ) dµ q ,e ( x ) , µ i ( q ) := µ q ,e ( B iq ) By Lemma 2.3 we have that µ i ( q , g ) = µ i ( q , g ) + o K (1) , µ i ( q ) = µ i ( q ) + o K (1) (2.2.13)for all q / ∈ E i where E i ⊆ Z is a set of ψ - measure o K (1) . Let E = S mi =1 E i then ψ ( E ) = o K ,M (1) . The left hand side of (2.2.11) takes the form m X i =1 (cid:18) µ i ( q , g ) µ i ( q ) − µ i ( q , g ) µ i ( q ) (cid:19) µ i ( q ) , (2.2.14)with the convention that if µ i ( q ) = 0 or µ i ( q ) = 0 then µ i ( q , g ) /µ i ( q ) := 1 or µ i ( q , g ) /µ i ( q ) := 1 . If q = ( q , q ) / ∈ E then by (2.2.13) ε := m X i =1 (cid:18) | µ i ( q , g ) − µ i ( q , g ) | + | µ i ( q ) − µ i ( q ) | (cid:19) = o K ,M (1) (2.2.15)Now if µ i ( q ) ≤ ε / then µ i ( q ) ≤ ε / by (2.2.13), hence the total contribution of such terms isbounded by m ε / = o K ,M (1) . If µ i ( q ) ≥ ε / then µ i ( q ) ≥ ε / , we have the estimate (cid:12)(cid:12)(cid:12)(cid:12) µ i ( q , g ) µ i ( q ) − µ i ( q , g ) µ i ( q ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ ε ( N )2 ε ( N ) / ≤ ε / = o K ,M (1) , This proves (2.2.11). 
The proof of inequality (2.2.12) proceeds the same way, here one needs to estimatethe quantity m X i =1 (cid:12)(cid:12)(cid:12)(cid:12) µ i ( q , g ) µ i ( q ) − µ i ( q , g ) µ i ( q ) (cid:12)(cid:12)(cid:12)(cid:12) = m X i =1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:18) µ i ( q , g ) µ i ( q ) (cid:19) µ i ( q ) − (cid:18) µ i ( q , g ) µ i ( q ) (cid:19) µ i ( q ) (cid:12)(cid:12)(cid:12)(cid:12) (2.2.16)If µ i ( q ) ≤ ε / then µ i ( q ) ≤ ε / for q = ( q , q ) / ∈ E , thus the contribution of such terms to the rightside of (2.2.16) is trivially estimated by m ε / = o M,K (1) The rest of the terms are bounded by ε / and (2.2.12) follows. (cid:3) We also need an analogue of the above result when the k · k L ( µ q,e ) norm is replaced by the more compli-cated k · k (cid:3) νq,e norms. Lemma 2.6.

Let { ν q ,f } f ∈H ,q ∈ Z be a well-deﬁned extension of the parametric weight system { ν q ,f } f ∈H ,q ∈ Z ,of complexity at most K . For q ∈ Z and e ∈ H , let B q ,e be a σ -algebra of complexity at most M , forsome ﬁxed constant M > . Then k E ν q ,e ( g q |B q ,e ) − E ν q ,e ( g q |B q ,e ) k (cid:3) νq ,e = o M,K (1) , (2.2.17) for all q = ( q , q ) / ∈ E , where E = E ( g, B ) ⊆ Z is a set of measure ψ ( E ) = o M,K (1) . Proof.

First we show that for any family of sets A = ( A q ) q ∈ Z , A q ⊆ V e there is a set E = E ( g, A ) ofmeasure ψ ( E ) = o K (1) such that for all q = ( q , q ) / ∈ E we have k A q k d (cid:3) νq ,e ≤ µ q ,e ( A q ) + o K (1) . (2.2.18)To see this, ﬁrst note that for q = ( q , q ) ∈ Z one has k A q k d (cid:3) νq ,e ≤ E x,p ∈ V e A q ( x ) µ q ,e ( x ) Y f ⊆ e Y ω f =0 ν q ,f ( ω f ( p f , x f ))= µ q ,e ( A q ) + E ( q ) , with E ( q ) ≤ E x ∈ V e µ q ,e ( x ) | E p ∈ V e ( W ( q , p, x ) − | , where W ( q , p, x ) = Y f ⊆ e Y ω f =0 ν q ,f ( ω f ( p f , x f )) . Arguing as in Lemma 2.3, we see that E q ∈ Z E x,p,p ′ ∈ V e ψ ( q ) dµ q ,e ( x ) ( W ( q , p, x ) − W ( q , p ′ , x ) −

1) = o M,K (1) and (2.2.18) follows.Now let { B iq } mi =1 ( m = 2 M ) be the atoms of B q ,e and deﬁne the quantities µ i ( q , g ) , µ i ( q ) , µ i ( q , g ) , µ i ( q ) as in Lemma 2.4. The expression in (2.2.11) is then estimated (cid:13)(cid:13)(cid:13)(cid:13) m X i =1 (cid:18) µ i ( q , g ) µ i ( q ) − µ i ( q , g ) µ i ( q ) (cid:19) B iq (cid:13)(cid:13)(cid:13)(cid:13) (cid:3) νq ,e ≤ m X i =1 (cid:12)(cid:12)(cid:12)(cid:12) µ i ( q , g ) µ i ( q ) − µ i ( q , g ) µ i ( q ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:13)(cid:13) B iq (cid:13)(cid:13) (cid:3) νq ,e . m X i =1 (cid:12)(cid:12)(cid:12)(cid:12) µ i ( q , g ) µ i ( q ) − µ i ( q , g ) µ i ( q ) (cid:12)(cid:12)(cid:12)(cid:12) µ q ,e ( B iq ) − d + o M,K (1) , for q = ( q , q ) / ∈ E , where E = E ( B q ,e , g ) is a set of measure o M,K (1) . MULTIDIMENSIONAL SZEMER ´EDI THEOREM IN THE PRIMES 19

Using the facts that µ i ( q , g ) = µ i ( q , g ) + o K (1) and µ i ( q ) = µ i ( q ) + o K (1) outside a set of measure o M,K (1) , and m X i =1 µ q ,e ( B iq ) = µ q ,e ( V e ) = 1 + o K (1) , it follows that the above expression is o M,K (1) by arguing as in Lemma 2.4. This completes the proof. (cid:3) Symmetric extensions.

We will also need our parametric families of forms to be symmetric in order to apply Theorem 1.3; we define this notion as follows. For each $e\in\mathcal{H}_d$ let $\mathcal{L}_{q,e}=\{L^1_e(q,x),\dots,L^s_e(q,x)\}$ be a pairwise linearly independent family of linear forms defined on $V=V_J$, depending on parameters $q\in Z$, such that $\mathrm{supp}_x(L^j_e)\subseteq e$. We say that the family of forms $\mathcal{L}_q=\bigcup_{e\in\mathcal{H}_d}\mathcal{L}_{q,e}$ is symmetric if $L^j_e(q,x)=L^j_{e'}(q,x)$ for all $q\in Z$, $x\in M=\{x:\ x_0+\dots+x_d=0\}$, $e,e'\in\mathcal{H}_d$ and $1\le j\le s$. Note that our initial family of forms defined in (1.4.2) has this property.

It is not hard to see that to a given family of forms $\mathcal{L}_{q,e}$, for a fixed $e\in\mathcal{H}_d$, there is a unique symmetric family of forms $\mathcal{L}_q$ such that $\mathcal{L}_{q,e}=\{L\in\mathcal{L}_q;\ \mathrm{supp}_x(L)\subseteq e\}$. Indeed, if $\mathcal{L}_q$ is such a family, then for $e'\in\mathcal{H}_d$, $q\in Z$ and $x\in V_J$
$$L^j_{e'}(q,x)=L^j_{e'}(q,\pi_{e'}(x))=L^j_{e'}(q,\phi_{e'}\circ\pi_{e'}(x))=L^j_e(q,\phi_{e'}\circ\pi_{e'}(x)),\tag{2.3.1}$$
where $\phi_{e'}:V_{e'}\to M$ is the inverse of the projection $\pi_{e'}$ restricted to $M$. This shows the uniqueness of the family $\mathcal{L}_q$. Conversely, define $L^j_{e'}(q,x)$ by the above equality; then it is clear that $\mathrm{supp}_x(L^j_{e'})\subseteq e'$. Moreover, if $x\in M$ then $x=\phi_{e'}\circ\pi_{e'}(x)$, hence $L^j_{e'}(q,x)=L^j_e(q,x)$. Also, if $\mathrm{supp}_x(L^j_{e'})\subseteq e$ then for all $q\in Z$ and $x\in V_J$
$$L^j_{e'}(q,x)=L^j_{e'}(q,\phi_e\circ\pi_e(x))=L^j_e(q,\phi_e\circ\pi_e(x))=L^j_e(q,x).$$
This shows that all forms in $\mathcal{L}_q$ which depend only on the variables $x_e$ are the forms of $\mathcal{L}_{q,e}$. Finally, if $\mathcal{L}_{q,e}$ is a pairwise linearly independent family then so is $\mathcal{L}_q$, as linearly dependent forms must depend on the same set of variables. We will refer to the family of forms $\mathcal{L}_q$ as the symmetrization of the family $\mathcal{L}_{q,e}$.
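The symmetrization construction can be followed by hand in the smallest nontrivial case. The following sketch assumes $d=2$, $J=\{0,1,2\}$, a single parameter $q$, and one hypothetical form $L_e$ on the edge $e=\{1,2\}$; these choices are illustrative assumptions and are not taken from the paper.

```latex
% Assumed setup (illustrative): d = 2, J = {0,1,2},
%   M = { x : x_0 + x_1 + x_2 = 0 },  e = {1,2},
%   L_e(q,x) = x_1 + 2 x_2 + q   (hypothetical form with x-support e).
%
% Edge e' = {0,1}: the inverse of \pi_{e'} restricted to M is
%   \phi_{e'}(x_0,x_1) = (x_0,\, x_1,\, -x_0-x_1).
\begin{align*}
L_{e'}(q,x) &:= L_e\big(q,\phi_{e'}\circ\pi_{e'}(x)\big)
             = x_1 + 2(-x_0 - x_1) + q
             = -2x_0 - x_1 + q .
\end{align*}
% Edge e'' = {0,2}: here \phi_{e''}(x_0,x_2) = (x_0,\, -x_0-x_2,\, x_2).
\begin{align*}
L_{e''}(q,x) &:= L_e\big(q,\phi_{e''}\circ\pi_{e''}(x)\big)
              = (-x_0 - x_2) + 2x_2 + q
              = -x_0 + x_2 + q .
\end{align*}
% Check on M: substituting x_0 = -x_1 - x_2 gives
\begin{align*}
L_{e'}(q,x)  &= -2(-x_1 - x_2) - x_1 + q = x_1 + 2x_2 + q = L_e(q,x), \\
L_{e''}(q,x) &= -(-x_1 - x_2) + x_2 + q  = x_1 + 2x_2 + q = L_e(q,x).
\end{align*}
```

In this example the three forms have $x$-supports $\{1,2\}$, $\{0,1\}$ and $\{0,2\}$ respectively, they agree on $M$, and no two of them are rational multiples of each other, in line with the properties claimed for the symmetrization.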
If f ∈ H for some edge | f | = d ′ ≤ d and L q,f is a family of forms deﬁned on V f then the above construction can beapplied to obtain a symmetric family L q simply by choosing an e ∈ H d such that f ⊆ e and considering L q,f as a family of forms on V e . Note that the construction is independent of the choice of e ⊇ f , as if f ⊆ e ′ as well then L je = L je ′ for all ≤ j ≤ s .In the next section, following [19] we will start an energy increment process to obtain a regularity lemmafor weighted hypergraphs. At each stage we will pass to an extension of a symmetric, well-deﬁned andpairwise independent parametric family L q deﬁned for q ∈ Z as follows. We choose an edge e ∈ H andconsider the extension of the family L q,e as given in (2.2.6), that is replacing the forms L j ( q, x f ) with theforms L j ( q, ω f ( p f , x f )) , ω f ∈ { , } f . This gives an extension ˜ L q,p,f deﬁned on the parameter space ( q, p ) ∈ Z × V f , which we symmetrize to obtain a new symmetric, well-deﬁned and pairwise independentfamily ˜ L q,p . The ﬁrst step of this process was described in the introduction in the special case e = (1 , .3. R EGULARIZATION OF P ARAMETRIC S YSTEMS

A Koopman-von Neumann type decomposition.

Let e ⊆ J and let B f be a σ -algebra on V f for f ∈ ∂e , where ∂e = { f ⊆ e ; | f | = | e | − } denotes the boundary of the edge e . Let B := W f ⊆ ∂e B f be the σ -algebra generated by the sets π − ef ( B f ) where π ef : V e → V f is the canonical projection. The atoms of B are the sets G = T f ⊆ ∂e π − ef ( G f ) with G f being an atom of B f , which may be interpreted as the collectionof simplices x ∈ V e whose faces x f are in G f for all f ∈ ∂e . The starting point of the proof of the Regularity Lemma, given in [19], is to show that if a set G e ⊆ V e isnot sufﬁciently regular with respect to B , that is if (cid:13)(cid:13) G e − E ( G e | _ f ∈ ∂e B f ) (cid:13)(cid:13) (cid:3) ≥ η, (3.1.1)then there exist σ -algebras B ′ f ⊇ B f for f ∈ ∂e , such that k E ( G e | _ f ∈ ∂e B ′ f ) (cid:13)(cid:13) L ≥ (cid:13)(cid:13) E ( G e | _ f ∈ ∂e B f ) (cid:13)(cid:13) L + cη . The quantity k E ( G |B ) k L is referred to as the energy (or index ) of the set G with respect to the σ -algebra B , thus the above inequality means that the energy of the set G e is increased by cη by reﬁning the σ -algebras B f . In addition the complexity of the σ -algebras B ′ f , denoted by compl ( B ′ f ) and deﬁned as theminimal number of sets generating the σ -algebra, is at most 1 larger than that of B f .In our settings for a given e ⊆ J we will have a parametric system of weights { ν q,f } q ∈ Z,f ⊆ e and measures { µ q,f } q ∈ Z,f ⊆ e associated to a well-deﬁned, pairwise linearly independent family of of forms L q deﬁned on Z × V e , as given in (2.1.5). For simplicity we will refer to such systems of weights and measures as being well-deﬁned . Lemma 3.1.

For given e ⊆ J , | e | = d ′ , let { µ q,f } q ∈ Z,f ⊆ e be a well-deﬁned family of measures of complex-ity at most K . For q ∈ Z let G q,e ⊆ V e and {B q,f } f ∈ ∂e be a σ -algebra on V f .Assume (cid:13)(cid:13) G q,e − E µ q,e ( G q,e | _ f ∈ ∂e B q,f ) (cid:13)(cid:13) d ′ (cid:3) νq,e ≥ η, (3.1.2) for some η > and for each q ∈ Ω , where Ω ⊆ Z is a set of measure ψ (Ω) ≥ c > .Then for N, W sufﬁciently large with respect to the parameters c , η , there exists a well-deﬁned extension { µ q ′ ,f } q ′ ∈ Z ′ ,f ⊆ e of the system { µ q,f } of complexity K ′ = O ( K ) , and a set Ω ′ ⊆ Ω × V e ⊆ Z ′ = Z × V e such that the following hold. (1) We have ψ ′ (Ω ′ ) ≥ − c η , (3.1.3) where ψ ′ is the measure on the parameter space Z ′ . (2) For all q ′ = ( q, p ) ∈ Z ′ and f ∈ ∂e there is a σ − algebra B q ′ ,f ⊇ B q,f of complexitycompl ( B q ′ ,f ) ≤ compl ( B q,f ) + 1 . (3.1.4)(3) For all q ′ = ( q, p ) ∈ Ω ′ , one has (cid:13)(cid:13) E µ q ′ ,e ( G q,e | _ f ∈ ∂e B q ′ ,f ) (cid:13)(cid:13) L ( µ q ′ ,e ) ≥ (cid:13)(cid:13) E µ q,e ( G q,e | _ f ∈ ∂e B q,f ) (cid:13)(cid:13) L ( µ q,e ) + 2 − η , (3.1.5)(4) and µ q ′ ,e ( V e ) ≤ . (3.1.6)The meaning of the above lemma is that if there is a large “bad ” set Ω of parameters q for which the set G q,e is not sufﬁciently uniform with respect to the σ -algebra W f ∈ ∂e B q,f , then its energy will increase by aﬁxed amount when passing to a well deﬁned extension {B q ′ ,f } , { µ q ′ ,e } , for all q ′ = ( q, p ) ∈ Ω ′ . MULTIDIMENSIONAL SZEMER ´EDI THEOREM IN THE PRIMES 21

Proof.

Let g q := G q,e − E µ q,e ( G q,e | _ f ∈ ∂e B q,f ) . (3.1.7)Then by (2.2.5) we have for each q ∈ Ω (cid:13)(cid:13) g q (cid:13)(cid:13) d ′ (cid:3) νq,e = Z V e h g q , Y f ∈ ∂e u q,p,f i µ ( q,p ) ,e dµ q,e ( p ) ≥ η, (3.1.8)where u q,p,f : V e → [ − , are functions, and { µ ( q,p ) ,e } ( q,p ) ∈ Z ′ is the family of measures µ ( q,p ) ,e ( x ) = Y f ⊆ e Y ω f ∈{ , } f ω f =0 ν q,f ( ω f ( p f , x f )) . As explained after (2.1.5) the measures µ ( q,p ) ,e are deﬁned by a pairwise independent family of forms L ( q,p ) ,e depending on the parameters ( q, p ) ∈ Z × V e , which is a well-deﬁned extension of the family L q,e deﬁn-ing the measures µ q,e . It is clear from (3.1.8) that the measure ψ ′ on Z ′ has the form ψ ′ ( q, p ) = µ q,e ( p ) ψ ( q ) .For q ′ = ( q, p ) , let Γ( q, p ) := h g q , Y f ∈ ∂e u q,p,f i µ q,p,f (3.1.9)We show that there is a set Ω ′ ⊆ Ω × V e of measure ψ ′ (Ω ′ ) ≥ − c η , (3.1.10)such that for every ( q, p ) ∈ Ω ′ one has Γ( q, p ) ≥ η . (3.1.11)By Lemma 2.2 we have that µ q,e ( V e ) = 1 + o K (1) ≤ for q / ∈ E where E ⊆ Ω is a set of measure ψ ( E ) = o K (1) . Thus for q ∈ Ω \E = Ω we have by (3.1.8) that Z V e { Γ( q,p ) ≥ η/ } Γ( q, p ) dµ q,e ( p ) ≥ η , (3.1.12)where by (3.1.8) and (3.1.9) we have Γ( q, p ) = Z V e g q ( x ) Y f ∈ ∂e u q,f w q,p ( x ) dµ q,e ( x ) The function w q,p ( x ) is the product of weight functions of the form ν ( L ( q, p, x )) depending on both p and x . Thus, using the bounds | g q | ≤ , | u q,p,f | ≤ , one has Z Z Z V e | Γ( q, p ) | dµ q,e ( p ) dψ ( q ) ≤ Z Z Z V e Z V e Z V e w q,p ( x )w q,p ( x ′ ) dµ q,e ( x ) dµ q,e ( x ′ ) dµ q,e ( p ) dψ ( q ) (3.1.13) = 1 + o K (1) ≤ by the linear forms condition, as the factors in the product depend on different sets of variables. Let Ω ′ := { ( q, p ) ∈ Ω × V e ; Γ( q, p ) ≥ η/ } . Thus by (3.1.12), (3.1.13) and the Cauchy-Schwartz inequality c η ≤ Z Ω ′ Γ( q, p ) dµ q,e ( p ) dψ ( q ) ! 
≤ Z Ω ′ Γ( q, p ) dµ q,e ( p ) dψ ( q ) ψ ′ (Ω ′ ) ≤ ψ ′ (Ω ′ ) . This shows ψ ′ (Ω ′ ) ≥ − c η as claimed.Since | u q ′ ,f | ≤ , decomposing of each function u q ′ ,f into its positive and negative parts yields that h g q , Y f ∈ ∂e v q ′ ,f i µ q ′ ,e ≥ − η (3.1.14)for some functions v q ′ ,f : V f → [0 , . For a given f ∈ ∂e and some ≤ t f ≤ , let U q ′ ,t f := { x f ∈ V f : v q ′ ,f ( x f ) ≥ t f } be the level set of the functions v q ′ ,f . Then v q ′ ,f ( x f ) = R U q ′ ,tf ( x f ) dt f , and for each term in (3.1.14) wehave Z · · · Z h g q , Y f ∈ ∂e U q ′ ,tf i µ q ′ ,e dt ≥ − η, where t = ( t f ) f ∈ ∂e . Accordingly the integrand must be at least − d − η for some value of the parameter t .Fix such a t = ( t f ) and write U q ′ ,f for U q ′ ,t f for simplicity of notation. For q ′ = ( q, p ) ∈ Ω ′ , deﬁne B q ′ ,f tobe the σ − algebra generated by B q,f , and the U q ′ ,t f . For q ′ / ∈ Ω ′ , set B q ′ ,f = B q,f . The function Q f ∈ ∂e U q ′ ,f is constant on the atoms of the σ − algebra W f ∈ ∂e B q ′ ,f , and therefore we havefor q ′ ∈ Ω ′ h G q,e − E µ q ′ ,e ( G q,e | _ f ∈ ∂e B q ′ ,f ) , Y f ∈ ∂e U q ′ ,f i µ q ′ ,e = 0 for q ′ ∈ Ω ′ . Hence, by (3.1.7) and (3.1.14) it follows that h E µ q ′ ,e ( G q,e | _ f ∈ ∂e B q ′ ,f ) − E µ q,e ( G q,e | _ f ∈ ∂e B q,f ) , Y f ∈ ∂e U q ′ ,f i µ q ′ ,e ≥ − η (3.1.15)By Lemma 2.2 there is a set E ⊆ Z ′ such that ψ ′ ( E ) = o K (1) and (cid:13)(cid:13) Y f ∈ ∂e U q ′ ,f (cid:13)(cid:13) L ( µ q ′ ,e ) ≤ µ q ′ ,e ( V e ) / = 1 + o K (1) ≤ for q ′ ∈ Ω ′ \E =: Ω ′ . Then by the Cauchy-Schwartz inequality, (cid:13)(cid:13) E µ q ′ ,e ( G q,e | _ f ∈ ∂e B q ′ ,f ) − E µ q,e ( G q,e | _ f ∈ ∂e B q,f ) (cid:13)(cid:13) L ( µ q ′ ,e ) ≥ − η, for q ′ ∈ Ω ′ . 
By Lemma 2.6 there is an exceptional set E ⊆ Z ′ of measure ψ ′ ( E ) = o K,M (1) such that for q ′ = ( q, p ) ∈ Ω ′ := Ω ′ \E we have (cid:13)(cid:13) E µ q ′ ,e ( G q,e | _ f ∈ ∂e B q ′ ,f ) − E µ q ′ ,e ( G q,e | _ f ∈ ∂e B q,f ) (cid:13)(cid:13) L ( µ q ′ ,e ) ≥ − η − o K,M (1) ≥ − η. (3.1.16)Since B q,f ⊆ B q ′ ,f , for q ′ = ( q, p ) , (3.1.16) is equivalent to (cid:13)(cid:13) E µ q ′ ,e ( G q,e | _ f ∈ ∂e B q ′ ,f ) (cid:13)(cid:13) L ( µ q ′ ,e ) − (cid:13)(cid:13) E µ q ′ ,e ( G q,e | _ f ∈ ∂e B q,f ) (cid:13)(cid:13) L ( µ q ′ ,e ) ≥ − η . (3.1.17)Finally, by a further invocation of Lemma 2.6 there is a set E ⊆ Z ′ of measure ψ ′ ( E ) = o K,M (1) such thatfor q ′ ∈ Ω ′ := Ω ′ \E we have (for N, W sufﬁciently large)

MULTIDIMENSIONAL SZEMER ´EDI THEOREM IN THE PRIMES 23 (cid:13)(cid:13) E µ q ′ ,e ( G q,e | _ f ∈ ∂e B q ′ ,f ) (cid:13)(cid:13) L ( µ q ′ ,e ) − (cid:13)(cid:13) E µ q,e ( G q,e | _ f ∈ ∂e B q,f ) (cid:13)(cid:13) L ( µ q,e ) ≥ − η . (3.1.18)This proves the lemma choosing Ω ′ = Ω ′ . (cid:3) Iterating the above lemma leads to a parametric family of σ − algebras and measures such that the sets G q,e become sufﬁciently uniform with respect to them. The associated decomposition of their indicatorfunctions is sometimes referred to as a Koopman-von Neumann type decomposition [19]. We will replacesets G e ⊆ V e by σ -algebras B e on V e for e ∈ H d ′ and for that it is useful to deﬁne the total energy of thefamily {B e } e ∈H d ′ with respect to a family of lower order σ -algebras {B f } f ∈H d ′− and a family of measures { µ e } e ∈H d ′ as X e ∈H d ′ ,G e ∈B e (cid:13)(cid:13) E µ e ( G e | _ f ∈ ∂e B f ) (cid:13)(cid:13) L ( µ e ) . (3.1.19)Assuming the measures µ e are normalized i.e. µ e ( V e ) = 1 + o (1) ≤ , a crude upper bound for the totalenergy is d +1 M = O M (1) , where M is the complexity of the σ -algebras B e . Lemma 3.2 (Koopman-von Neumann decomposition) . Let { µ q,f } q ∈ Z,f ∈H be a well-deﬁned, symmetricfamily of measures of complexity at most K . Let ≤ d ′ ≤ d , and let {B q,e } q ∈ Z,e ∈H d ′ and {B q,f } q ∈ Z,f ∈H d ′− be families of σ -algebras of complexity at most M d ′ and M d ′ − . Finally let Ω ⊆ Z with ψ (Ω) ≥ c > ,and let δ > be a constant.Then for N, W sufﬁciently large with respect to the constants δ, c , M d ′ , M d ′ − and K , there exists a well-deﬁned, symmetric extension { µ q ′ ,f } q ′ ∈ Z ′ ,f ∈H of the system { µ q,f } of complexity at most K ′ = O M d ′ ,K, δ (1) and a family of σ -algebras {B q ′ ,f } q ′ ∈ Z ′ ,f ∈H d ′− such that the following hold. (1) For all q ′ = ( q, p ) ∈ Z ′ and f ∈ H d ′ − we have B q,f ⊆ B q ′ ,f , compl ( B q ′ ,f ) ≤ compl ( B q,f ) + O M d ′ , δ (1) . 
(3.1.20)(2) There exists a set Ω ′ ⊆ Ω × V ⊆ Z ′ of measure ψ ′ (Ω ′ ) ≥ c ( c , δ, M d ′ ) > such that for all q ′ = ( q, p ) ∈ Ω ′ and for all G q,e ∈ B q,e one has (cid:13)(cid:13) G q,e − E µ q ′ ,e ( G q,e | _ f ∈ ∂e B q ′ ,f ) (cid:13)(cid:13) (cid:3) νq ′ , e ≤ δ. (3.1.21) and (cid:13)(cid:13) E µ q ′ ,e ( G q,e | _ f ∈ ∂e B q,f ) (cid:13)(cid:13) L ( µ q ′ ,e ) = (cid:13)(cid:13) E µ q,e ( G q,e | _ f ∈ ∂e B q,f ) (cid:13)(cid:13) L ( µ q,e ) + o M d ′ ,K, δ (1) , (3.1.22) Proof.

Initially set Z ′ = Z , then (3.1.20) and (3.1.22) trivially holds for q ′ = q . If there is a set Ω ⊆ Ω ofmeasure ψ (Ω ) ≥ c such that inequality (3.1.21) holds for all q ∈ Ω and G q,e ∈ B q,e then the conclusionsof the lemma hold for the initial system of measures and σ -algebras { µ q,f } , {B q,f } and the set Ω . Other-wise there is a set Ω ⊆ Ω of measure ψ (Ω ) ≥ c such that for each q ∈ Ω there is an e ∈ H d ′ and aset G q,e ∈ B q,e for which the inequality (3.1.21) fails. By the pigeonholing we may assume that e ∈ H d ′ isindependent of q . Then by Lemma 3.1, with η := δ d ′ , there is a well-deﬁned extension { µ q ′ ,f } q ′ ∈ Z ′ ,f ⊆ e , afamily of σ -algebras {B q ′ ,f } q ′ ∈ Z ′ ,f ⊆ e and a set Ω ′ ⊆ Ω for which (3.1.3)-(3.1.5) hold. Let { µ q ′ ,f } q ′ ∈ Z ′ ,f ∈H be the symmetrization of the system { µ q ′ ,f } q ′ ∈ Z ′ ,f ⊆ e as described in section 2.3, and set B q ′ ,f := B q,f for q ′ / ∈ Ω ′ or f * e . By Lemma 2.5 one may remove a set E of measure ψ ′ ( E ) = o M d ′ ,K (1) such that for all q ′ ∈ Ω ′ \E , (3.1.20) and (3.1.22) hold for the extended system, whose total energy is at least − δ d ′ largerthan that of the initial system { µ q,f } q ∈ Z,f ∈H .Based on the above argument we perform the following iteration. Let { µ q ′ ,f } q ′ ∈ Z ′ ,f ∈H be a well-deﬁned,symmetric extension of the initial system { µ q,f } q ∈ Z,f ∈H , {B q ′ ,f } q ′ ∈ Z ′ ,f ∈H d ′− be a family of σ -algebrasand let Ω ′ ⊆ Ω × V ′ ⊆ Z ′ for which (3.1.20) and (3.1.22) hold. 
If there is a set Ω ′ ⊆ Ω ′ of measure ψ ′ (Ω ′ ) ≥ ψ (Ω ′ ) / such that for all q ∈ Ω ′ , e ∈ H d ′ and G q,e ∈ B q,e inequality (3.1.21) holds, then thesystem { µ q ′ ,f } , {B q ′ ,f } together with the set Ω ′ satisﬁes the conclusions of the lemma.Otherwise there is a well-deﬁned, symmetric extension { µ q ′′ ,f } q ′′ ∈ Z ′′ ,f ∈H together with a family of σ -algebras {B q ′′ ,f } q ′′ ∈ Z ′′ ,f ∈H d ′− and a set Ω ′′ ⊆ Ω ′ × Z d ′ N such that for all q ′′ ∈ Ω ′′ inequalities (3.1.20) and(3.1.22) hold, and total energy of the system ( µ q ′′ ,e , B q,e , B q ′′ ,f ) is at least − δ d ′ larger than that of thesystem ( µ q ′ ,e , B q,e , B q ′ ,f ) . Set Z ′ := Z , µ q ′ ,e := µ q ′′ ,e and B q ′ ,f := B q ′′ ,f . By (3.1.19) the iteration process must stop in O M d ′ ,δ (1) steps and the system obtained satisﬁes (3.1.20)-(3.1.22). (cid:3) Hypergraph regularity Lemmas.

The shortcoming of Lemma 3.2 is that the complexity of the σ -algebras B q,f might be very large with respect to the parameter δ , which measures the uniformity of thegraphs G q,e . This issue can be taken care of with an iteration process using Lemma 3.2 repeatedly, alongthe lines it was done in [19]. In the weighted settings we have to pass to a new system of weights andmeasures at each iteration and have to exploit the stability properties of well-deﬁned extensions to show thatthe iteration process terminates. Lemma 3.3 (Preliminary regularity lemma.) . Let ≤ d ′ ≤ d and M d ′ > be a constant. Let { µ q,f } q ∈ Z,f ∈H be a well-deﬁned, symmetric family of measures of complexity at most K , and ≤ d ′ ≤ d and {B q,e } q ∈ Z,e ∈H d ′ be a family of σ − algebras on V e so that for all q ∈ Z, e ∈ H d ′ compl ( B q,e ) ≤ M d ′ . (3.2.1) Let ε > and F : R + → R + be a non-negative, increasing function, possibly depending on ε and Ω ⊆ Z be a set of measure ψ (Ω) ≥ c > . If N, W is sufﬁciently large with respect to the parameters ε, c , M d ′ , K , and F , then there exists a well-deﬁned, symmetric extension { µ q,f } q ∈ Z of complexity at most O K,M d ′ ,F, ε (1) , and families of σ -algebras B q,f ⊆ B ′ q,f deﬁned for q ∈ Z, f ∈ H d − and a set Ω ⊆ Z such that the following holds. (1) We have that Ω ⊆ Ω × V ⊆ Z = Z × V where V = Z kN of dimension k = O M d ′ ,F, ε (1) . Moreover ψ (Ω) ≥ c ( c , F, M d ′ , ε ) > . (3.2.2)(2) There is a constant M d ′ − = O M d ′ ,F,ε (1) such that for all q ∈ Z and f ∈ H d ′ − we havecompl ( B q,f ) ≤ M d ′ − . (3.2.3)(3) For all q = ( q, p ) ∈ Ω , e ∈ H d ′ and G q,e ∈ B q,e , we have (cid:13)(cid:13) E µ q,e ( G q,e | _ f ∈ ∂e B ′ q,f ) − E µ q,e ( G q,e | _ f ∈ ∂e B q,f ) (cid:13)(cid:13) L ( µ q ,e ) ≤ ε (3.2.4) MULTIDIMENSIONAL SZEMER ´EDI THEOREM IN THE PRIMES 25 and (cid:13)(cid:13) G q,e − E µ q,e ( G q,e | _ f ∈ ∂e B ′ q,f ) (cid:13)(cid:13) (cid:3) νq,e ≤ F ( M d ′ − ) . (3.2.5) Proof.

Let { µ q ′ ,f } q ′ ∈ Z ′ , f ∈H be a well-deﬁned, symmetric extension of the initial system { µ q,f } deﬁnedon a parameter space Z ′ = Z × V ′ of complexity at most K ′ . Also for q ′ ∈ Z ′ and f ∈ H d ′ − let {B q ′ ,f } q ′ ∈ Z ′ ,f ∈H d ′− be a family of σ -algebras of complexity at most M d ′ − . Set B q ′ ,e := B q,e for q ′ = ( q, p ) ∈ Z ′ , e ∈ H d ′ , and apply Lemma 3.2 to the system ( µ q ′ ,e , B q ′ ,e , B q ′ ,f ) , with δ = F ( M d ′ − ) − .This generates a well-deﬁned, symmetric extension { µ q,f } q ∈ Z,f ∈H and a family of σ -algebras {B ′ q,f } q ∈ Z,f ∈H d ′− and a set Ω ⊆ Z . Set B q,f := B q ′ ,f for q = ( q ′ , p ) ∈ Z , f ∈ H d ′ − . The new system ( µ q,f , B q,e , B ′ q,f ) satisﬁes (3.2.2)-(3.2.3) and (3.2.5) as long as the parameters K ′ , M d ′ − are of magnitude O K,M d ′ ,F, ε (1) . There are two possibilities. • Case 1:

There exists a set Ω ⊆ Ω of measure ψ (Ω ) ≥ ψ (Ω) / such that (3.2.5) holds for all q ∈ Ω . In this case the conclusions of the lemma hold for the system ( µ q,e , B q,e , B ′ q,f ) and the set Ω . • Case 2:

There is a set Ω ⊆ Ω of measure ψ (Ω ) ≥ ψ (Ω) so that inequality (3.2.5) fails for all q ∈ Ω . Then, thanks to the stability condition (3.1.22) and the fact that B q ′ ,f = B q,f ⊆ B ′ q,f , wehave for q ∈ Ω , q ′ = π ′ ( q ) , and q = π ( q ) that X e,G q,e (cid:13)(cid:13) E µ q,e ( G q,e | _ f ∈ ∂e B ′ q,f ) (cid:13)(cid:13) L µq,e − X e,G q,e (cid:13)(cid:13) E µ q ′ ,e ( G q,e | _ f ∈ ∂e B q ′ ,f ) (cid:13)(cid:13) L µq ′ ,e ≥ X e,G q,e ( (cid:13)(cid:13) E µ q,e ( G q,e | _ f ∈ ∂e B ′ q,f ) (cid:13)(cid:13) L µq,e − (cid:13)(cid:13) E µ q,e ( G q,e | _ f ∈ ∂e B q ′ ,f ) (cid:13)(cid:13) L µq,e ) − o M d ′ ,K ′ ,F (1)= X e,G q,e (cid:13)(cid:13) E µ q,e ( G q,e | _ f ∈ ∂e B ′ q,f ) − E µ q,e ( G q,e | _ f ∈ ∂e B q ′ ,f ) (cid:13)(cid:13) L µq,e − o M d ′ ,K ′ ,F (1) ≥ ε − o M d ′ ,K ′ ,F (1) , (3.2.6)where the summation is taken over all e ∈ H d ′ and G q,e ∈ B q,e .Thus, for sufﬁciently large N, W , we have for all q = ( q, p ) ∈ Ω that the total energy of the system ( µ q,f , B q,e , B ′ q,f ) is at least ε larger than that of the system ( µ q ′ ,f , B q ′ ,e , B q ′ ,f ) . In this case, set Z ′ := Z, Ω ′ := Ω , µ q ′ ,f := µ q,f , and B q ′ ,f := B ′ q,f and repeat the above argument. Starting withthe original system µ q,f , B q,e and σ -algebras, B q,f = {∅ , V f } ) , the iteration process must stop in at most ε − ( Md ′ )+1 d +1 = O M d ′ ,ε (1) steps, generating a system ( µ q,f , B q,e , B ′ q,f ) which satisﬁes the conclusionsof the lemma. (cid:3) This lemma is more widely applicable than Lemma 3.2 as the uniformity of the hypergraphs G q,e with re-spect to the (ﬁne) σ − algebras B ′ q,e can be chosen to be arbitrarily small with respect to the complexity ofthe (coarse) σ − algebras B q,e , while the approximations E µ q,e ( G q,e | W B ′ q,e ) and E µ q,e ( G q,e | W B q,e ) stayvery close in L ( µ q,e ) . 
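The passage from a lower bound on the difference of two conditional expectations to an increment of the total energy, as in (3.1.16)-(3.1.17) and (3.2.6), rests on the Pythagorean identity for nested σ-algebras. The following Python sketch checks this identity exactly on a toy finite measure space. The ground set, the weights `mu`, the function `g` and the two partitions are illustrative assumptions and are not objects taken from the argument above.

```python
from fractions import Fraction

def cond_exp(g, mu, partition):
    """Conditional expectation of g, under the weights mu, with respect to
    the sigma-algebra whose atoms are the blocks of `partition`."""
    out = {}
    for atom in partition:
        mass = sum(mu[x] for x in atom)
        avg = sum(g[x] * mu[x] for x in atom) / mass if mass else Fraction(0)
        for x in atom:
            out[x] = avg
    return out

def sq_norm(h, mu):
    """Squared L^2(mu) norm of h."""
    return sum(h[x] ** 2 * mu[x] for x in mu)

# Hypothetical toy data: six points, a probability measure, an indicator function.
points = range(6)
mu = {x: Fraction(w, 12) for x, w in zip(points, [1, 3, 2, 2, 1, 3])}
g = {x: Fraction(v) for x, v in zip(points, [1, 0, 1, 1, 0, 0])}

coarse = [[0, 1, 2], [3, 4, 5]]      # atoms of the coarse sigma-algebra
fine = [[0, 1], [2], [3], [4, 5]]    # a refinement of the coarse atoms

e_coarse = cond_exp(g, mu, coarse)
e_fine = cond_exp(g, mu, fine)
diff = {x: e_fine[x] - e_coarse[x] for x in points}

# Energy increment versus squared norm of the difference.
energy_gain = sq_norm(e_fine, mu) - sq_norm(e_coarse, mu)
diff_energy = sq_norm(diff, mu)
```

In this toy case both `energy_gain` and `diff_energy` equal 25/144, so a lower bound on the norm of the difference translates directly into an energy increment, which is what forces the iteration to stop.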
In order to obtain a counting and a removal lemma starting from a given measure system { µ q,e } and σ -algebras {B q,e } we need to regularize the elements of the σ -algebras B q,e for all e ∈ H with respect to the lower order σ -algebras W f ∈ ∂e B q,f . This is done by applying Lemma 3.3 inductively, and provides theﬁnal form of the regularity lemma we need. Let us call a function F : R + → R + a growth function if it iscontinuous, increasing, and satisﬁes F ( x ) ≥ x for x ≥ . Theorem 3.1. [Regularity lemma.] Let ≤ d ′ ≤ d and M d ′ > be a constant. Let { µ q,f } q ∈ Z,f ∈H be awell-deﬁned, symmetric family of measures of complexity at most K , and ≤ d ′ ≤ d and {B q,e } q ∈ Z,e ∈H d ′ be a family of σ − algebras on V e so that for all q ∈ Z, e ∈ H d ′ compl ( B q,e ) ≤ M d ′ . (3.2.7) Let F : R + → R + be a growth function, and Ω ⊆ Z be a set of measure ψ (Ω) ≥ c > . If N, W is sufﬁciently large with respect to the parameters c , M d ′ , K , and F , then there exists a well-deﬁned, symmetric extension { µ q,f } q ∈ Z,f ∈H of complexity at most O K,M d ′ ,F (1) , and families of σ -algebras B q,f ⊆ B ′ q,f deﬁned for q ∈ Z, f ∈ H d − and a set Ω ⊆ Z such that the following holds. (1) We have that Ω ⊆ Ω × V ⊆ Z = Z × V where V = Z kN of dimension k = O M d ′ ,F (1) . Moreover ψ (Ω) ≥ c ( c , F, M d ′ ) > . (3.2.8)(2) There exist numbers M d ′ < F ( M d ′ ) ≤ M d ′ − < F ( M d ′ − ) ≤ · · · ≤ M < F ( M ) ≤ M = O M d ′ ,F (1) (3.2.9) such that for all ≤ j < d ′ , f ∈ H j , and q ∈ Z ,compl ( B ′ q,f ) ≤ M j . (3.2.10)(3) For all ≤ j ≤ d ′ , e ∈ H j , q = ( q, p ) ∈ Ω , and G q,e ∈ B q,e (with B q,e := B q,e , if j = d ′ ), onehas (cid:13)(cid:13) E µ q,e ( G q,e | _ f ∈ ∂e B ′ q,f ) − E µ q,e ( G q,e | _ f ∈ ∂e B q,f ) (cid:13)(cid:13) L ( µ q,e ) ≤ F ( M j ) (3.2.11) and (cid:13)(cid:13) G q,e − E µ q,e ( G q,e | _ f ∈ ∂e B ′ q,f ) (cid:13)(cid:13) (cid:3) νq.e ≤ F ( M ) . (3.2.12) Proof.

We proceed by an induction on d ′ . If d ′ = 1 the statement follows from Lemma 3.3 with ε = F ( M ) ,so assume that d ′ ≥ and the theorem holds for d ′ − . Apply Lemma 3.3 with a growth function F ∗ ≥ F (to be speciﬁed later) and with ε = F ∗ ( M d ′ ) . This gives a well-deﬁned, symmetric extension { µ q ′ ,f } and afamily of σ -algebras B q ′ ,f ⊆ B ′ q ′ ,f deﬁned on a parameter space Z ′ = Z × V , such that (cid:13)(cid:13) E µ q ′ ,e ( G q ′ ,e | _ f ∈ ∂e B ′ q ′ ,f ) − E µ q ′ ,e ( G q ′ ,e | _ f ∈ ∂e B q ′ ,f ) (cid:13)(cid:13) L ( µ q ′ ,e ) ≤ F ∗ ( M d ′ ) (3.2.13)and (cid:13)(cid:13) G q ′ ,e − E µ q ′ ,e ( G q ′ ,e | _ f ∈ ∂e B ′ q ′ ,f ) (cid:13)(cid:13) (cid:3) νq ′ ,e ≤ F ∗ ( M d ′ − ) , (3.2.14)hold for all q ′ = ( q, p ) ∈ Ω ′ , e ∈ H d ′ , and G q ′ ,e ∈ B q ′ ,e = B q,e , where Ω ′ ⊆ Ω × V ⊆ Z ′ is a set ofmeasure ψ ′ (Ω ′ ) ≥ c ( c , F, M d ′ ) > . MULTIDIMENSIONAL SZEMER ´EDI THEOREM IN THE PRIMES 27

Applying the induction hypothesis to the system { µ q ′ ,f } q ′ ∈ Z ′ ,f ∈H , {B q ′ ,f } q ′ ∈ Z ′ ,f ∈H d ′− , the growth func-tion F , and the set Ω ′ , one obtains an extension { µ q,f } q ∈ Z,f ∈H and families of σ − algebras {B q,f ⊆B ′ q,f } q ∈ Z, f ∈H j such that (3.2.10) - (3.2.12) hold for j < d ′ − , with constants M d ′ − < F ( M d ′ − ) ≤ · · · ≤ M < F ( M ) = O M d ′− ,F (1) . (3.2.15)For q = ( q ′ , p ) ∈ Z , f ∈ H d ′ − set B q,f := B q ′ ,f , and B ′ q,f := B ′ q ′ ,f . We show that inequalities (3.2.11)and (3.2.12) hold for j = d ′ . Indeed, by the stability property (2.2.12), one has (cid:13)(cid:13) E µ q,e ( G q ′ ,e | _ f ∈ ∂e B ′ q ′ ,f ) − E µ q,e ( G q ′ ,e | _ f ∈ ∂e B q ′ ,f ) (cid:13)(cid:13) L ( µ q,e ) = (cid:13)(cid:13) E µ q ′ ,e ( G q ′ ,e | _ f ∈ ∂e B ′ q ′ ,f ) − E µ q ′ ,e ( G q ′ ,e | _ f ∈ ∂e B q ′ ,f ) (cid:13)(cid:13) L ( µ q ′ ,e ) + o K,M d ′ ,F,F ∗ (1) ≤ F ∗ ( M d ′ ) + o K,M d ′ ,F,F ∗ (1) , (3.2.16)for all q = ( q ′ , p ) ∈ Ω \E , e ∈ H d ′ , and G q ′ ,e ∈ B q ′ ,e . Here E ⊆ Ω is a set of measure ψ ( E ) = o K,M d ′ ,F,F ∗ (1) . Similarly using the stability properties (2.2.10) and (2.2.17) of the box norms, we have (cid:13)(cid:13) G q ′ ,e − E µ q,e ( G q ′ ,e | _ f ∈ ∂e B ′ q ′ ,f ) (cid:13)(cid:13) (cid:3) νq,e = (cid:13)(cid:13) G q ′ ,e − E µ q ′ ,e ( G q ′ ,e | _ f ∈ ∂e B ′ q ′ ,f ) (cid:13)(cid:13) (cid:3) νq,e + o K,M d ′ ,F,F ∗ (1)= (cid:13)(cid:13) G q ′ ,e − E µ q ′ ,e ( G q,e | _ f ∈ ∂e B q ′ ,f ) (cid:13)(cid:13) (cid:3) νq ′ ,e + o K,M d ′ ,F,F ∗ (1) ≤ F ∗ ( M d ′ − ) + o K,M d ′ ,F,F ∗ (1) , (3.2.17)for all q = ( q ′ , p ) ∈ Ω \E , e ∈ H d ′ and A q ′ ,e ∈ B q ′ ,e = B q,e , where E ⊆ Ω is a set of measure ψ ( E ) = o K,M d ′ ,F,F ∗ (1) .With F ( M ) = O M d ′− ,F (1) , we have that F ( M ) ≤ C ( M d ′ − , F ) =: F ∗ ( M d ′ − ) for a sufﬁcientlyrapidly growing function F ∗ depending only on F . 
Assuming $N, W$ are sufficiently large with respect to $M_{d'}$ and $K$, inequalities (3.2.11) and (3.2.12) for $j = d'$ and for $q \in \Omega$ outside the exceptional sets of (3.2.16) and (3.2.17) follow from (3.2.13) and (3.2.14). The rest of the conclusions of the theorem are clear from the construction. □

4. COUNTING AND THE REMOVAL LEMMAS.

4.1. The Removal Lemma.

In this section we formulate a so-called counting lemma and show how it implies Theorem 1.4. Our arguments closely follow those in [19] and are straightforward adaptations of them to the weighted setting; for the sake of completeness we include the details.

For $e \in \mathcal{H}_d$ let $G_e \subseteq V_e$ be a hypergraph, and let $\mathcal{B}_e = \{G_e, G_e^C, \emptyset, V_e\}$ be the σ-algebra generated by it. Let $\{\nu_e\}_{e\in\mathcal{H}}$ and $\{\mu_e\}_{e\in\mathcal{H}}$ be the weights and measures associated to a well-defined, symmetric family of forms $\mathcal{L} = \{L_e^k;\ e \in \mathcal{H}_d,\ 1 \le k \le d\}$. Take $M_d > 0$ and a growth function $F: \mathbb{R}_+ \to \mathbb{R}_+$, to be determined later, and apply Theorem 3.1 with $d' = d$ to obtain a well-defined, symmetric parametric extension $\{\mu_{q,e}\}_{q\in Z, e\in\mathcal{H}}$ together with σ-algebras $\mathcal{B}_{q,e} \subseteq \mathcal{B}'_{q,e}$ and a set $\Omega \subseteq Z$ such that (3.2.8)-(3.2.12) hold. The family $\{\nu_e\}$ can be considered as a parametric family of weights in a trivial way, setting $Z = \Omega = \{0\}$ and $\psi(0) = 1$. Note that the complexity of the system, as well as that of the σ-algebras, is $O_{M_d,F}(1)$. We consider the system of measures $\mu_{q,e}$ and the σ-algebras $\mathcal{B}_{q,e}$, $\mathcal{B}'_{q,e}$ fixed for the rest of this section.

It will be convenient to define all our σ-algebras on the same space $V_J$ and eventually replace the ensemble of measures $\{\mu_{q,e}\}_{e\in\mathcal{H}}$ with the measure $\mu_q := \mu_{q,J} = \prod_{f\in\mathcal{H}} \nu_{q,f}$. Thanks to the stability conditions (2.1.6)-(2.1.7) this can be done at essentially no cost. Indeed, for any $e \in \mathcal{H}$ there is an exceptional set $\mathcal{E}_e \subseteq \Omega$ of measure $\psi(\mathcal{E}_e) = o_{M_d,F}(1)$ such that for any family of sets $G_{q,e} \subseteq V_e$ we have
$$\mu_q(\pi_e^{-1}(G_{q,e})) = \mu_{q,e}(G_{q,e}) + o_{M_d,F}(1), \tag{4.1.1}$$
uniformly for $q \in \Omega\setminus\mathcal{E}_e$. Let $\mathcal{E} = \bigcup_{e\in\mathcal{H}} \mathcal{E}_e$ and $\Omega' := \Omega\setminus\mathcal{E}$; then (4.1.1) means that for any set $A_{q,e} \in \mathcal{A}_e$ one has $\mu_q(A_{q,e}) = \mu_{q,e}(\pi_e(A_{q,e})) + o_{M_d,F}(1)$ uniformly for $q \in \Omega'$.
We will write µ q,e ( A q,e ) = R V e A q,e ( x e ) dµ q,e ( x e ) for simplicity of notations.Deﬁne the σ -algebras B q,e := π − e ( B q,e ) , B ′ q,e := π − e ( B ′ q,e ) on V J , and note that B q,e = B e for e ∈ H d asthe initial σ -algebras B e are not altered in Theorem 3.1. Let B q := W e ∈H B q,e be the σ -algebra generated bythe algebras B q,e , and deﬁne similarly the σ -algebra B ′ q . The atoms of B q are of the form A q = T e ∈H A q,e where A q,e is an atom of B q,e . In particular if E e ∈ B e then T e ∈H d E e is the union of the atoms of B q .The so-called counting lemma [19], [5], [14], gives an approximate formula for the measure of “most”atoms A q and as consequence it shows that their measure is bounded below by a positive constant depend-ing only on the initial data F and M d . If, as in Theorem 1.4, one assumes that the measure of T e ∈H d E e is sufﬁciently small then it cannot contain most of the atoms thus removing the exceptional atoms from thesets E e , the intersection of the remaining sets becomes empty, leading to a proof of Theorem 1.4.To make this heuristic precise let us start by deﬁning the relative density δ q,e ( A | B ) := µ q,e ( A ∩ B ) /µ q ( B ) for A, B ∈ B q,e , with the convention that δ q,e ( A | B ) := 1 if µ q ( B ) = 0 . Deﬁnition 4.1.

Let A q = ∩ e ∈H A q,e be an atom of B q . We say that the atom A q is regular if the followinghold. (1) For all atoms A q,e δ q,e ( A q,e (cid:12)(cid:12) \ f ∈ ∂e A q,f ) ≥ F ( M j ) , (4.1.2)(2) Moreover Z V e (cid:12)(cid:12) E µ q ( A q,e | _ f ∈ ∂e B ′ q,f ) − E µ q ( A q,e | _ f ∈ ∂e B q,f ) (cid:12)(cid:12) Y f ( e A q,f dµ q,e ≤ F ( M j ) Z V J Y f ( e A q,f dµ q,e . (4.1.3)This roughly means that all atoms A q,e are both somewhat large and regular on the intersection of the lowerorder atoms A q,f , ( f ∈ ∂e ). Note that if | e | = 1 then ∂e = ∅ and by convention we deﬁne T f ∈ ∂e A q,f = V J ,and the left side of (4.1.2) becomes µ q,e ( A q,e ) . Proposition 4.1. [Counting lemma] There is a set

$\mathcal{E} \subseteq \Omega$ of measure $\psi(\mathcal{E}) = o_{N,W\to\infty;\,M_d,F}(1)$ such that if $q \in \Omega\setminus\mathcal{E}$ and if $A_q = \bigcap_{e\in\mathcal{H}} A_{q,e} \in \bigvee_{e\in\mathcal{H}} \mathcal{B}_{q,e}$ is a regular atom, then
$$\mu_q(A_q) = (1 + o_{M_d\to\infty}(1)) \prod_{e\in\mathcal{H}} \delta_{q,e}\Big(A_{q,e}\,\Big|\,\bigcap_{f\in\partial e} A_{q,f}\Big) + O_{M}\Big(\frac{1}{F(M)}\Big) + o_{N,W\to\infty;\,M_d,F}(1). \tag{4.1.4}$$
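In the simplest unweighted situation, the main term of (4.1.4) says that the measure of an atom is roughly the product of the densities of its constituent edges. A minimal Python sketch of this heuristic follows, assuming a random tripartite graph as a stand-in for a regular atom. The class size `n`, the density `p`, the seed and all function names are hypothetical choices made for the illustration, and the weighted setting of Proposition 4.1 is not modelled.

```python
import random

def random_bipartite(n, p, rng):
    """n x n 0/1 adjacency matrix with independent edges of probability p."""
    return [[rng.random() < p for _ in range(n)] for _ in range(n)]

def edge_density(e):
    """Fraction of pairs that are edges."""
    n = len(e)
    return sum(map(sum, e)) / n ** 2

def triangle_density(e12, e13, e23):
    """Fraction of triples (x, y, z) spanning a triangle across the three parts."""
    n = len(e12)
    count = 0
    for x in range(n):
        for y in range(n):
            if not e12[x][y]:
                continue
            for z in range(n):
                if e13[x][z] and e23[y][z]:
                    count += 1
    return count / n ** 3

rng = random.Random(0)          # fixed seed so the run is reproducible
n, p = 40, 0.5
e12, e13, e23 = (random_bipartite(n, p, rng) for _ in range(3))

# Counting-lemma heuristic: triangle density is close to the product of edge densities.
predicted = edge_density(e12) * edge_density(e13) * edge_density(e23)
observed = triangle_density(e12, e13, e23)
```

For a random graph the two quantities `predicted` and `observed` should differ only by a small error, which plays the role of the error terms in (4.1.4).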

Next, following [19], we show that the total measure of irregular atoms is small. For any atom A q,e ∈ B q,e ,let B q,e,A q,e be the union of all sets of the form T f ( e A q,f for which (4.1.2) or (4.1.3) fails. Note that if anatom A q = T e ∈H A q,e is irregular then A q ⊆ A q,e ∩ B q,e,A q,e for some e ∈ H . We claim that µ q ( A q,e ∩ B q,e,A q,e ) . F ( M j ) (4.1.5)for q / ∈ E , where E ⊆ Ω is a set of measure ψ ( E ) = o M d ,F (1) . To see this, note that the measure µ q can be replaced by the measure µ q,e as they differ by a negligible quantity on sets which belong to A e . Weestimate ﬁrst the contribution of those sets T f ( e A q,f to the left side of (4.1.5) for which (4.1.2) fails. Thisquantity is bounded by X { A q,f } f ∈ ∂e , (4.1.2) fails µ q,e ( A q,e ∩ \ f ∈ ∂e A q,f ) ≤ F ( M j ) X { A q,f } f ∈ ∂e µ q,e ( \ f ∈ ∂e A q,f ) ≤ F ( M j ) µ q,e ( V e ) . F ( M j ) , as the summation is taken over the disjoint atoms of the σ -algebra W f ∈ ∂e B q,f .Similarly, one estimates the total contribution of the disjoint atoms T f ( e A q,f for which (4.1.3) fails asfollows. X { A q,f } f ⊆ e , (4.1.3) fails µ q,e ( \ f ( e A q,f ) ≤ F ( M j ) Z V e | E µ q,e ( A q,e | _ f ∈ ∂e B ′ q,f ) − E µ q,e ( A q,e | _ f ∈ ∂e B q,f ) | dµ q,e ≤ F ( M j ) 1 F ( M j ) = 1 F ( M j ) . Since the sets A q,e ∩ B q,e,A q,e contain all irregular atoms, and for given e ∈ H j the number of all atoms ofthe σ -algebra B q,e is at most Mj , one estimates the total measure of all irregular atoms as d X j =1 X e ∈H j X A q,e ∈B q,e µ q ( A q,e ∩ B q,e,A q,e ) ≤ d X j =1 (cid:18) dj (cid:19) Mj F ( M j ) ≤ p log F ( M d ) ≤ − Md (4.1.6)if, say F ( M ) ≥ Md + d . This shows, choosing M d sufﬁciently large, that most atoms are regular.Another fact we need is that the measure of regular atoms is not too small. 
Indeed by (4.1.2), (4.1.4),we have that for q ∈ Ω and a regular atom A q = ∩ e ∈H A q,e ,µ q ( A q ) ≥ Y j ≤ d Y e ∈H j F ( M j ) / − O d,M (cid:18) F ( M ) (cid:19) + o M d ,F (1) ≥ F ( M ) > , (4.1.7)as long as F is sufﬁciently rapid growing and M d is sufﬁciently large with respect to d. It is clear from(3.2.9) that F ( M ) ≤ F ∗ ( M d ) for a function F ∗ depending only on F and M d .After these preparations, assuming the validity of Proposition 4.1, it is easy to obtain the Proof of Theorem 1.4.

Let δ > , E e ∈ A e and g e : V e → [0 , for e ∈ H d be given. Let E ⊆ Ω be aset of measure ψ ( E ) = o M d ,F (1) so that (4.1.1), (4.1.6) and (4.1.7) hold for q ∈ Ω / E . Also by (2.2.4)conditions (1.4.11)-(1.4.12) hold for ˜ µ J := µ q,J and ˜ µ e := µ q,e ( e ∈ H d ) , (4.1.8)for q / ∈ E , for a set E ⊆ Ω be a set of measure ψ ( E ) = o M d ,F (1) .Now ﬁx q / ∈ E ∪ E and deﬁne ˜ µ J and ˜ µ e for e ∈ H d as is (4.1.8). We claim that this system of mea-sures satisfy the conclusions of the theorem. By construction the system is symmetric so it remains toconstruct the sets E ′ e and show (1.4.13)-(1.4.15) hold. For given e ∈ H deﬁne the sets E ′ q,e = V J \ ( B q,e,A e ∪ [ f ( e, A q,f ( A q,f ∩ B q,f,A q,f )) , (4.1.9)where A q,f ranges over the atoms of B q,f . As we have B q,e = B e , which is generated by a single set E e , if T e ∈H d E e contains an atom A q = T f ∈H A q,f then A q,e = E e for e ∈ H d . If such an atom would be regularthen by (1.4.10) its measure would satisfy F ∗ ( M d ) ≤ ˜ µ J ( \ e ∈H d E e ) = µ J ( \ e ∈H d E e ) + o M d ,F (1) < δ. Choosing M d to be the largest positive integer so that F ∗ ( M d ) ≤ (2 δ ) − we see that T e ∈H d E e containsonly irregular atoms. From (4.1.9) and (4.1.6) we have ˜ µ J ( E e \ E ′ q,e ) = ˜ µ J ( [ f ⊆ e , A q,f ( A q,f ∩ B q,f,A q,f )) ≤ − Md . (4.1.10)Also, all irregular atoms A q = T f ∈H A q,f ⊆ T e ∈H d E e are contained in one of the sets E e \ E ′ q,e, thus \ e ∈H d ( E e ∩ E ′ q,e ) = ∅ . Finally, choosing ε := 2 − Md , (1.4.14) holds by (4.1.10). Moreover δ → implies M d → ∞ and hence ε → showing the validity of (1.4.15). This proves Theorem 1.4. (cid:3) Proof of Proposition 4.1.

The proof proceeds by induction and uses the Cauchy-Schwarz inequality, which doubles certain sets of variables. As a consequence, we need a generalization of Proposition 4.1, which requires the following definition.

Definition 4.2 (Weighted hypergraph bundles over $\mathcal{H}$). Let $K$ be a finite set together with a map $\pi: K \to J$, called the projection map of the bundle. Let $\mathcal{G}_K$ be the set of edges $g \subseteq K$ such that $\pi$ is injective on $g$ and $\pi(g) \in \mathcal{H}$. For any $g \in \mathcal{G}_K$, write
$$V_g := V_{\pi(g)} = \prod_{k\in g} V_{\pi(k)},$$
and define the weights and measures $\nu_{q,g}, \mu_{q,g}: V_g \to \mathbb{R}_+$ as
$$\nu_{q,g}(x_g) := \nu_{q,\pi(g)}(x_g), \qquad \mu_{q,g}(x_g) := \prod_{g'\subseteq g} \nu_{q,g'}(x_{g'}).$$
The total measure $\mu_{q,K}$ on $V_K$ is given by
$$\mu_{q,K}(x) = \prod_{g\in\mathcal{G}} \nu_{q,g}(x_g).$$
A hypergraph $\mathcal{G} \subseteq \mathcal{G}_K$ which is closed in the sense that $\partial g \subseteq \mathcal{G}$ for every $g \in \mathcal{G}$, together with the spaces $V_g$ and the weight functions $\nu_{q,g}$ for $g \in \mathcal{G}$, is called a weighted hypergraph bundle over $\mathcal{H}$. The quantity $d' = \sup_{g\in\mathcal{G}} |g|$ is called the order of $\mathcal{G}$.

Note that the underlying linear forms defining the weight system $\{\nu_{q,g}\}_{q\in Z,\, g\in\mathcal{G}_K}$,
$$\bar{L}_g(q, x_g) = L_{\pi(g)}(q, x_g), \qquad \mathrm{supp}_x(L_{\pi(g)}) = \pi(g),$$
are pairwise linearly independent. Indeed, if $g \neq g'$ they depend on different sets of variables, and for a fixed set of variables they are the same as the forms $L(q, x_g)$. In effect, we sample a number of variables from each space $V_j$ and evaluate the forms $L(q, x)$ in the new variables. For example, if we have $x_1, x_1' \in V_1$ and $x_2, x_2' \in V_2$, then to the edge $(1,2) \in \mathcal{H}$ there correspond the edges $(1,2)$, $(1,2')$, $(1',2)$ and $(1',2')$ in $\mathcal{G}$, and to every linear form $L(q, x_1, x_2)$ there also correspond the forms $L(q, x_1, x_2')$, $L(q, x_1', x_2)$ and $L(q, x_1', x_2')$ defining the weights on the appropriate edges.

Proposition 4.2. [Generalized Counting Lemma] Let

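$\mathcal{G} \subseteq \mathcal{G}_K$ be a closed hypergraph bundle over $\mathcal{H}$ with the projection map $\pi: K \to J$, and let $d' := \sup_{g\in\mathcal{G}} |g|$ be the order of $\mathcal{G}$. Then, for $F$ growing sufficiently rapidly with respect to $d$ and $K$, there exists a set $\mathcal{E} \subseteq \Omega$ of measure $\psi(\mathcal{E}) = o_{N\to\infty;\,M_d,K,F}(1)$ such that for $q \in \Omega\setminus\mathcal{E}$ we have
$$\int_{V_K} \prod_{g\in\mathcal{G}} 1_{A_{q,\pi(g)}}(x_g)\, d\mu_{q,K}(x) = (1 + o_{M_d\to\infty,K}(1)) \prod_{g\in\mathcal{G}} \delta_{q,\pi(g)}\Big(A_{q,\pi(g)}\,\Big|\,\bigcap_{f\in\partial\pi(g)} A_{q,f}\Big) + O_{K,M}\Big(\frac{1}{F(M)}\Big) + o_{N\to\infty,K,M_d}(1). \tag{4.2.1}$$
Note that Proposition 4.1 is the special case when $\mathcal{G} = \mathcal{H}$ and $\pi$ is the identity map.

Proof.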

We use a double induction. First we induct on d ′ , the order of G (note that d ′ ≤ d ), and then, ﬁxing K and π , we induct on the number of edges r := |{ g ∈ G : | g | = d ′ }| .To start, assume that d ′ = r = 1 , so that G = { k } and j = π ( k ) ∈ J. The left hand side of (4.2.1)becomes Z V k A q,j ( x k ) dµ q,k ( x k ) = Z V j A q,j ( x j ) dµ q,j ( x j ) = δ q,j ( A q,j ) . Let { A q,e } e ∈H be a regular collection of atoms for q ∈ Ω , and deﬁne the functions b q,e , c q,e : V e → R for e ∈ H by b q,e := E µ q,e ( A q,e | _ f ∈ ∂e B ′ q,f ) − E µ q,e ( A q,e | _ f ∈ ∂e B q,f ) (4.2.2) c q,e := A q,e − E µ q,e ( A q,e | _ f ∈ ∂e B ′ q,f ) (4.2.3) and introduce the shorthand notation δ q,e = δ q,e ( A q,e | \ f ∈ ∂e A q,f ) . Note that if x ∈ A q,e T f ∈ ∂e A q,f then δ q,e = E µ q,e ( A e | _ f ∈ ∂e B q,f )( x e ) , (4.2.4)and thus one has the decomposition A q,e ( x e ) = δ q,e + b q,e ( x e ) + c q,e ( x e ) (4.2.5)on the set T f ∈ ∂e A q,f . Let g ∈ G such that | g | = d ′ and use (4.2.5) to write Y g ∈G A q,π ( g ) ( x g ) = ( δ q,π ( g ) + b q,π ( g ) ( x g ) + c q,π ( g ) ( x g )) Y g ∈G\{ g } A q,π ( g ) ( x g ) . Consider the contribution of the terms separately Z V K Y g ∈G A q,π ( g ) ( x g ) dµ q,K ( x )= Z V K ( δ q,π ( g ) + b q,π ( g ) ( x g ) + c q,π ( g ) ( x g )) Y g ∈G\{ g } A q,π ( g ) ( x g ) dµ q,K ( x )= M q + E q + E q (4.2.6)For main term M q , by the second induction hypothesis we have M q = δ q,π ( g ) Z V K Y g ∈G\{ g } A q,π ( g ) ( x g ) dµ q,K ( x )= δ q,π ( g ) (1 + o M d →∞ (1)) Y g ∈G\ g δ q,π ( g ) + O K,M ( 1 F ( M ) ) + o N,W →∞ ; K,M d (1) , and hence M q agrees with the right side of (4.2.1). We continue to estimate the second error term by E q = Z V K c q,π ( g ) ( x g ) Y g ∈G\{ g } A q,π ( g ) ( x g ) dµ q ( x ) = E x ∈ V K ( c q,π ( g ) ν q,g )( x g ) Y g ∈G\{ g } A q,π ( g ) ν q,g ( x g )= E x ∈ V K Y | g | = d ′ ,g ∈G f q,g ( x g ) Y g ′ ∈G , | g ′ |
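The doubling of variables encoded by Definition 4.2 can be made concrete by enumerating the edge set of a small bundle. The sketch below is only an illustration: the vertex labels, the helper name `bundle_edges`, and the choice of $\mathcal{H}$ as all nonempty subsets of $J = \{1, 2\}$ are assumptions made for this example.

```python
from itertools import combinations

def bundle_edges(K, pi, H):
    """Return all subsets g of K on which pi is injective and pi(g) lies in H."""
    edges = []
    for r in range(1, len(K) + 1):
        for g in combinations(K, r):
            image = frozenset(pi[k] for k in g)
            # pi is injective on g exactly when no two elements share an image
            if len(image) == len(g) and image in H:
                edges.append(frozenset(g))
    return edges

# Hypothetical bundle over J = {1, 2}: every vertex of J is doubled.
K = ["1", "1'", "2", "2'"]
pi = {"1": 1, "1'": 1, "2": 2, "2'": 2}
H = {frozenset({1}), frozenset({2}), frozenset({1, 2})}

G_K = bundle_edges(K, pi, H)
# Edges of top order: the four lifts of the edge (1, 2) of H.
top_edges = sorted(tuple(sorted(g)) for g in G_K if len(g) == 2)
```

Here `top_edges` consists of the four lifts $(1,2)$, $(1,2')$, $(1',2)$, $(1',2')$ of the edge $(1,2)$, while pairs such as $\{1, 1'\}$ are excluded because the projection is not injective on them.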

5. PROOF OF THE MAIN RESULTS

In this section we finish the proof of our main result, Theorem 1.2. Since we have already shown the validity of Theorem 1.4, and hence that of Theorem 1.3 by the argument in the introduction, it remains to show that counting affine copies of $\Delta$ in a set $A \subseteq \mathbb{Z}_N^d$ with weights $w$ translates to counting copies in $A \subseteq \mathbb{P}^d$ of relative density $\alpha > 0$. This is standard; we include the details for the sake of completeness, using the arguments given in [2].

First, let us identify $[1, N]^d$ with $\mathbb{Z}_N^d$ and recall that constellations in $\mathbb{Z}_N^d$ defined by the simplex $\Delta$ which are contained in a box $B \subseteq [1, N]^d$ of size $\varepsilon N$ are in fact genuine constellations contained in $B$. Note that we can assume that the simplex $\Delta$ is primitive in the sense that $t\Delta \not\subseteq \mathbb{Z}^d$ for any $0 < t < 1$, as any simplex is a dilate of a primitive one. For any simplex $\Delta \subseteq \mathbb{Z}^d$ there exists a constant $\tau(\Delta) > 0$ depending only on $\Delta$ such that the following holds.

Lemma 5.1. [2]

Let ∆ ⊆ Z d be a primitive simplex. Then there is constant < ε < τ (∆) so that thefollowing holds.Let N be sufﬁciently large, and let B = I d be a box of size εN contained in [1 , N ] d ≃ Z dN . If thereexist x ∈ Z dN and ≤ t < N such that x ∈ B and x + t ∆ ⊆ B as a subset on Z dN , then either x + t ∆ ⊆ B or x + ( t − N )∆ ⊆ B , also as a subset of Z d .Proof [Theorem 1.3 implies Theorem 1.2]Let N, W be sufﬁciently large positive integers and assume that | A | ≥ α | P N | d for a set A ⊆ P dN . By thepigeonhole principle choose b = ( b j ) ≤ j ≤ d so that b j is relative prime to W for each j , and | A ∩ (( W Z ) d + b ) | ≥ α N d (log N ) d φ ( W ) d , (5.1)where φ is the Euler totient function. Set N := N/W and A := { n ∈ [1 , N ] d ; W n + b ∈ A } .Choose ε > so that ε < τ (∆) . By the Prime Number Theorem there is a prime N ′ so that ε N ′ = N (1 + o N →∞ (1)) , thus we have | A ∩ [1 , ε N ′ ] d | ≥ α ε d N ′ ) d W d (log N ′ ) d φ ( W ) d . (5.2)By Dirichlet’s theorem on primes in arithmetic progressions the number of n ∈ [1 , N ′ ] d \ [ ε N ′ , N ′ ] d forwhich W n + b ∈ P d is of O ( ε N ′ d W d (log N ′ ) d φ ( W ) d ) , thus (5.2) holds for the set A ′ := A ∩ [ ε N ′ , ε N ′ ] d aswell, if ε ≤ c d ε d α for a small enough constant c d > .If x ∈ A ′ then ε N ′ ≤ x i ≤ ε N ′ and W x i + b i ∈ P for ≤ i ≤ d , thus by the deﬁnition of theGreen-Tao measure ν b : [1 , N ′ ] → R + given in Section 1.3, we have w ( x ) = d Y i =1 ν b i ( x i ) ≥ c d (cid:18) φ ( W ) log NW (cid:19) d . (5.3)as log N ′ − log N assuming N sufﬁciently large with respect to W . Thus E x ∈ Z dN ′ A ′ ( x ) w ( x ) ≥ c d ε d α (5.4)for some constant c d > . Applying the contrapositive of Theorem (1.3) for the set A ′ with ε := c d ε d α gives E x ∈ Z dN ′ , t ∈ Z N ′ (cid:18) d Y j =0 A ′ ( x + tv j ) (cid:19) w ( x + t ∆) ≥ δ (5.5)with a constant δ = δ ( α, ∆) > depending only on α and the simplex ∆ = { v , . . . , v d } . 
Similarly as in(5.3) w ( x + t ∆) ≤ C d (cid:18) φ ( W ) log NW (cid:19) l (∆) , (5.6)since all coordinates of x + t ∆ are primes, bigger then R . Thus the number of copies ∆ ′ = x + t ∆ which arecontained in A ′ as a subset of Z dN ′ is at least c N d +1 (log N ) − l (∆) , for some constant c = c ( α, ∆ , W ) > depending only on the initial data α , ∆ and the number W . Since A ′ ⊆ [ ε N ′ , ε N ′ ] d , by Lemma 5.1 atleast half of the simplices ∆ ′ are contained in A ′ as a subset of Z d , and then the simplices ∆ ′′ := W ∆ ′ + b are contained in A .Now choose W = W ( α, ∆) large enough so that Theorem 1.3 holds for all sufﬁciently large N , and then A contain at least c ′ ( α, ∆) N d +1 (log N ) − l (∆) similar copies of ∆ for some constant c ′ ( α, ∆) > dependingonly on α and the simplex ∆ . This proves Theorem 1.2 (cid:3) A PPENDIX

A. B

ASIC PROPERTIES OF WEIGHTED BOX NORMS

In this appendix we describe some basic facts about the weighted version of Gowers’s box norms deﬁnedin (1.5.2) for functions F : V e → R . These norms have also been deﬁned in [8], Appendix B, and in factall the properties we prove here, including Proposition 1.1, can be deduced from the arguments given there.However as our settings is slightly different, we include the proofs below.We will assume e = { , . . . , d } =: [ d ] , and V = V [ d ] = Z dN without loss of generality. To show thatthese are indeed norms (for d ≥ ) let us deﬁne a multilinear form referred to as the weighted Gowers’sinner product. Let F ω : V e → R for ω ∈ { , } e , be a given family of functions and deﬁne D F ω , ω ∈ { , } d E (cid:3) ν := E x [ d ] ,y [ d ] ∈ V Y ω ∈{ , } d F ω ( ω ( x [ d ] , y [ d ] )) Y | I |

We will use the Cauchy-Schwarz inequality several times, together with the linear forms condition.

$$\big\langle F_\omega ;\ \omega \in \{0,1\}^d \big\rangle_{\Box^d_\nu} = \mathbb{E}_{x_{[2,d]},\, y_{[2,d]}} \bigg[ \bigg( \prod_{|I|\, \cdots}$$

$$\cdots\ G,\ \omega = 1 \quad \le \sum_{\omega \in \{0,1\}^{2^d}} \|h_{\omega_1}\|_{\Box^d_\nu} \cdots \|h_{\omega_{2^d}}\|_{\Box^d_\nu} = \big( \|F\|_{\Box^d_\nu} + \|G\|_{\Box^d_\nu} \big)^{2^d}.$$

Also it follows directly from the definition that $\|\lambda F\|^{2^d}_{\Box^d_\nu} = \lambda^{2^d} \|F\|^{2^d}_{\Box^d_\nu}$, hence $\|\lambda F\|_{\Box^d_\nu} = |\lambda|\, \|F\|_{\Box^d_\nu}$. □

Proof of Proposition 1.1.

Let $\mathcal{H}' = \{ f \in \mathcal{H} ;\ |f| < d \}$, and write the left side of (1.5.3) as

$$E = \mathbb{E}_{x \in V_J} \prod_{e \in \mathcal{H}_d} F_e(x_e) \prod_{f \in \mathcal{H}'} \nu_f(x_f).$$

Fix $e = [d]$ and write $e_j := [d+1] \setminus \{j\}$ for the rest of the faces. The idea is to apply the Cauchy-Schwarz inequality successively in the $x_1, x_2, \dots, x_d$ variables to eliminate the functions $F_{e_1} \le \nu_{e_1}, \dots, F_{e_d} \le \nu_{e_d}$, using the linear forms condition at each step. Using $F_{e_1} \le \nu_{e_1}$ we have

$$|E| \le \mathbb{E}_{x_2, \dots, x_{d+1}}\, \nu_{e_1}(x_{e_1}) \prod_{1 \notin f \in \mathcal{H}'} \nu_f(x_f)\ \bigg| \mathbb{E}_{x_1} \prod_{j=2}^{d+1} F_{e_j}(x_{e_j}) \prod_{1 \in f \in \mathcal{H}'} \nu_f(x_f) \bigg|.$$

By the linear forms condition $\mathbb{E}_{x_2, \dots, x_{d+1}}\, \nu_{e_1}(x_{e_1}) \prod_{1 \notin f \in \mathcal{H}'} \nu_f(x_f) = 1 + o_{N \to \infty}(1)$, thus by the Cauchy-Schwarz inequality

$$\begin{aligned} E^2 \lesssim{} & \mathbb{E}_{x_2, \dots, x_{d+1}}\, \nu_{e_1}(x_{e_1}) \prod_{1 \notin f \in \mathcal{H}'} \nu_f(x_f)\ \mathbb{E}_{x_1, y_1} \prod_{j=2}^{d+1} F_{e_j}(x_1, x_{e_j \setminus \{1\}})\, F_{e_j}(y_1, x_{e_j \setminus \{1\}}) \\ & \times \prod_{1 \in f \in \mathcal{H}'} \nu_f(y_1, x_{f \setminus \{1\}})\, \nu_f(x_1, x_{f \setminus \{1\}}). \end{aligned} \tag{A.1}$$

Note that what happened is that we have replaced the function $F_{e_1}$ by the measure $\nu_{e_1}$, doubled the variable $x_1$ to the pair of variables $(x_1, y_1)$, and also doubled each factor of the form $G_e(x_e)$ (which is either $F_e(x_e)$ or $\nu_e(x_e)$, for $e \in \mathcal{H}$) depending on the $x_1$ variable. To keep track of these changes as we continue with the rest of the variables, let us introduce some notation. Let $g \subseteq [d]$ and for a function $G_e(x_e)$ define

$$G^*_e(x_{e \cap g}, y_{e \cap g}, x_{e \setminus g}) := \prod_{\omega_e \in \{0,1\}^{e \cap g}} G_e\big(\omega_e(x_{e \cap g}, y_{e \cap g}),\, x_{e \setminus g}\big). \tag{A.2}$$

We claim that after applying the Cauchy-Schwarz inequality in the $x_1, \dots, x_i$ variables we have, with $g = [i]$,

$$E^{2^i} \lesssim \mathbb{E}_{x_{[i]},\, y_{[i]},\, x_{J \setminus [i]}} \prod_{j \le i} \nu^*_{e_j}(x_{[i] \cap e_j}, y_{[i] \cap e_j}, x_{e_j \setminus [i]}) \prod_{j > i} F^*_{e_j}(x_{[i] \cap e_j}, y_{[i] \cap e_j}, x_{e_j \setminus [i]}) \tag{A.3}$$

$$\times \prod_{f \in \mathcal{H}'} \nu^*_f(x_{f \cap [i]}, y_{f \cap [i]}, x_{f \setminus [i]}). \tag{A.4}$$

For $i = 1$ this can be seen from (A.1). Note that the linear forms appearing in any of these factors are pairwise linearly independent as our system is well-defined.
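The effect of the successive Cauchy-Schwarz steps is easiest to see in the unweighted model case. The sketch below assumes $\nu \equiv 1$ and $d = 2$, so that there are three variables and three faces, and the argument reduces to the classical inequality $|\mathbb{E}_{x,y,z} F(x,y) G(y,z) H(x,z)| \le \|F\|_{\Box^2}$ for $|G|, |H| \le 1$, with no $o(1)$ error. The modulus, the random test functions, and the helper names `box_norm`, `trilinear_form` and `random_function` are illustrative assumptions and do not come from the paper; the weighted statement proved here additionally needs the linear forms condition.

```python
import random


def box_norm(F, N):
    """Unweighted box norm on Z_N x Z_N.

    Uses ||F||^4 = E_{x,x'} ( E_y F(x,y) F(x',y) )^2, i.e. the variable x is
    doubled to the pair (x, x') exactly as in the doubling step of the proof.
    """
    total = 0.0
    for x in range(N):
        for x2 in range(N):
            inner = sum(F[x][y] * F[x2][y] for y in range(N)) / N
            total += inner * inner
    return (total / N**2) ** 0.25


def trilinear_form(F, G, H, N):
    """Lambda = E_{x,y,z} F(x,y) G(y,z) H(x,z)."""
    s = sum(
        F[x][y] * G[y][z] * H[x][z]
        for x in range(N)
        for y in range(N)
        for z in range(N)
    )
    return s / N**3


def random_function(N, rng):
    """A random function Z_N x Z_N -> [-1, 1], stored as a nested list."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(N)]


if __name__ == "__main__":
    rng = random.Random(0)
    N = 7
    F, G, H = (random_function(N, rng) for _ in range(3))
    lam = trilinear_form(F, G, H, N)
    print(abs(lam) <= box_norm(F, N) + 1e-12)
```

The same helper also illustrates the homogeneity of the norm: scaling `F` by a constant scales `box_norm(F, N)` by its absolute value.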
Assuming it holds for $i$, separating the factors independent of the $x_{i+1}$ variable, replacing the function $F_{e_{i+1}}$ with $\nu_{e_{i+1}}$, and applying the Cauchy-Schwarz inequality, we double the variable $x_{i+1}$ to the pair $(x_{i+1}, y_{i+1})$ and each factor $G^*_e(x_{e \cap [i]}, y_{e \cap [i]}, x_{e \setminus [i]})$ depending on it, to obtain the factor $G^*_e(x_{e \cap [i+1]}, y_{e \cap [i+1]}, x_{e \setminus [i+1]})$; thus the formula holds for $i + 1$. After finishing this process we have by (A.2) and (A.3)

$$E^{2^d} \lesssim \mathbb{E}_{x_{[d]},\, y_{[d]}} \prod_{\omega \in \{0,1\}^d} F_e\big(\omega(x_{[d]}, y_{[d]})\big) \prod_{f \subseteq [d],\ f \ne e}\ \prod_{\omega_f \in \{0,1\}^f} \nu_f\big(\omega_f(x_f, y_f)\big)\ \mathcal{W}(x_{[d]}, y_{[d]}),$$

where

$$\mathcal{W}(x_{[d]}, y_{[d]}) = \mathbb{E}_{x_{d+1}} \prod_{d+1 \in e \in \mathcal{H}}\ \prod_{\omega_e \in \{0,1\}^{e \cap [d]}} \nu_e\big(\omega_e(x_{e \cap [d]}, y_{e \cap [d]}),\, x_{e \setminus [d]}\big).$$

Thus, as $F_e \le \nu_e$, to prove (1.5.3) it is enough to show that

$$\mathbb{E}_{x_{[d]},\, y_{[d]}} \prod_{f \subseteq [d]}\ \prod_{\omega_f \in \{0,1\}^f} \nu_f\big(\omega_f(x_f, y_f)\big)\, \big| \mathcal{W}(x_{[d]}, y_{[d]}) - 1 \big| = o_{N \to \infty}(1).$$

This, similarly as in [7], can be done with one more application of the Cauchy-Schwarz inequality, leading to 4 terms involving the "big" weight functions $\mathcal{W}$ and $\mathcal{W}^2$. Each term is however $1 + o_{N \to \infty}(1)$ by the linear forms condition, as the underlying linear forms are pairwise linearly independent, so the terms cancel up to $o_{N \to \infty}(1)$. Indeed the forms $L_f(\omega_f(x_f, y_f))$ are pairwise independent for $f \subseteq [d]$, and depend on a different set of variables than the forms $L_e(\omega_e(x_{e \cap [d]}, y_{e \cap [d]}), x_{e \setminus [d]})$ for $e \not\subseteq [d]$ defining the weight function $\mathcal{W}$. The new forms appearing in $\mathcal{W}^2$ are copies of the forms in $\mathcal{W}$ with the $x_{d+1}$ variable replaced by a new variable $y_{d+1}$, hence are independent of each other and the rest of the forms. This proves the proposition. □

Acknowledgements.

We would like to thank Terence Tao for some helpful correspondence during this research.

REFERENCES

[1] D. Conlon, J. Fox, Y. Zhao, A relative Szemerédi theorem, Geometric and Functional Analysis, to appear.
[2] B. Cook, A. Magyar, Constellations in P^d, International Mathematics Research Notices 2012.12 (2012), 2794-2816.
[3] H. Furstenberg, Y. Katznelson, An ergodic Szemerédi theorem for commuting transformations, J. Analyse Math. 31 (1978), 275-291.
[4] J. Fox, Y. Zhao, A short proof of the multidimensional Szemerédi theorem in the primes, American Journal of Mathematics, to appear.
[5] W.T. Gowers, Hypergraph regularity and the multidimensional Szemerédi theorem, Annals of Math. 166/3 (2007), 897-946.
[6] W.T. Gowers, Decompositions, approximate structure, transference, and the Hahn-Banach theorem, Bull. London Math. Soc. 42 (4) (2010), 573-606.
[7] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, Annals of Math. 167 (2008), 481-547.
[8] B. Green and T. Tao, Linear equations in primes, Annals of Math. (2) 171.3 (2010), 1753-1850.
[9] B. Green, T. Tao, The Möbius function is asymptotically orthogonal to nilsequences, Annals of Math. 175 (2012), 541-566.
[10] B. Green, T. Tao, T. Ziegler, An inverse theorem for the Gowers U^{s+1}[N] norm, Annals of Math. 176 (2012), no. 2, 1231-1372.
[11] D. Goldston, C. Yildirim, Higher correlations of divisor sums related to primes I: triple correlations, Integers: Electronic Journal of Combinatorial Number Theory 3 (2003), 1-66.
[12] D. Goldston, C. Yildirim, Higher correlations of divisor sums related to primes III: small gaps between primes, Proc. London Math. Soc. 95 (2007), 653-686.
[13] A. Magyar, T. Titichetrakun, Corners in dense subsets of P^d, preprint.
[14] B. Nagle, V. Rödl, M. Schacht, The counting lemma for regular k-uniform hypergraphs, Random Structures and Algorithms 28(2) (2006), 113-179.
[15] O. Reingold, L. Trevisan, M. Tulsiani, S. Vadhan, Dense subsets of pseudorandom sets, Electronic Colloquium on Computational Complexity, Report TR08-045 (2008).
[16] J. Solymosi, Note on a generalization of Roth's theorem, Discrete and Computational Geometry, Algorithms Combin. 25 (2003), 825-827.
[17] E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Acta Arith. 27 (1975), 299-345.
[18] T. Tao, The Gaussian primes contain arbitrarily shaped constellations, J. Analyse Math. 99/1 (2006), 109-176.
[19] T. Tao, A variant of the hypergraph removal lemma, Journal of Combinatorial Theory, Series A 113.7 (2006), 1257-1280.
[20] T. Tao and T. Z