A Removal Lemma for Ordered Hypergraphs

Abstract

We prove a removal lemma for induced ordered hypergraphs, simultaneously generalizing Alon–Ben-Eliezer–Fischer's removal lemma for ordered graphs and the induced hypergraph removal lemma. That is, we show that if an ordered hypergraph (V, G, <) has few induced copies of a small ordered hypergraph (W, H, ≺), then there is a small modification G′ so that (V, G′, <) has no induced copies of (W, H, ≺). (Note that we do not need to modify the ordering <.) We give our proof in the setting of an ultraproduct (that is, a Keisler graded probability space), where we can give an abstract formulation of hypergraph removal in terms of sequences of σ-algebras. We then show that ordered hypergraphs can be viewed as hypergraphs where we view the intervals as an additional notion of a "very structured" set. Along the way we give an explicit construction of the bijection between the ultraproduct limit object and the corresponding hypergraphon.



HENRY TOWSNER

1. Introduction

In this paper, we will show a removal lemma for ordered hypergraphs: a simultaneous generalization of the removal lemma for ordered graphs [2, 3] and for hypergraphs [15, 22, 23].

As in similar results, the methods naturally generalize to finite colorings of $k$-tuples ("hypermatrices over a finite alphabet"). Therefore, in full generality, our main result is the following.

Corollary 5.6. Let $\epsilon > 0$ be given and let $\Sigma$ be a finite alphabet. There is a $\delta > 0$ so that whenever $(\Omega, <)$ is an ordered set and $\rho : \binom{\Omega}{k} \to \Sigma$, there is a $\rho' : \binom{\Omega}{k} \to \Sigma$ such that:
• $|\{\vec{x} \in \binom{\Omega}{k} \mid \rho(\vec{x}) \neq \rho'(\vec{x})\}| < \epsilon |\Omega|^k$, and
• for each ordered set $(W, \prec)$ with $|W| < 1/\epsilon$ and each coloring $c : \binom{W}{k} \to \Sigma$, either:
  – $(\Omega, \rho', <)$ contains no copies of $(W, c, \prec)$ (that is, there are no order-preserving functions $\pi : W \to \Omega$ such that $\rho'(\pi(\vec{x})) = c(\vec{x})$ for all $\vec{x} \in \binom{W}{k}$), or
  – $(\Omega, \rho, <)$ contains many copies of $(W, c, \prec)$ (that is, the set of order-preserving functions $\pi : W \to \Omega$ such that $\rho(\pi(\vec{x})) = c(\vec{x})$ for all $\vec{x} \in \binom{W}{k}$ has size at least $\delta |\Omega|^{|W|}$).

Date: January 26, 2021. Partially supported by NSF grant DMS-1600263.
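To make the notion of a copy in Corollary 5.6 concrete, the following Python sketch counts order-preserving copies by brute force in a small finite example. It is an illustration only and not part of the paper's argument: the name `count_copies`, the encoding of colorings as dictionaries keyed by increasing tuples, and the choice $W = \{0, \ldots, |W|-1\}$ with its usual order are all assumptions made here.

```python
from itertools import combinations


def count_copies(omega, rho, w_size, c, k):
    """Count order-preserving copies of (W, c) in (omega, rho).

    omega  : list of vertices, listed in increasing order for <
    rho    : dict sending each increasing k-tuple from omega to a color
    w_size : |W|, where W = {0, ..., w_size - 1} with the usual order (assumption)
    c      : dict sending each increasing k-tuple from W to a color
    """
    copies = 0
    # An order-preserving injection W -> omega is determined by its image,
    # listed in increasing order.
    for image in combinations(omega, w_size):
        if all(rho[tuple(image[i] for i in e)] == c[e]
               for e in combinations(range(w_size), k)):
            copies += 1
    return copies


# Hypothetical example: color a pair 1 when its endpoints are consecutive, else 0.
omega = list(range(5))
rho = {(x, y): int(y - x == 1) for x, y in combinations(omega, 2)}

path = {(0, 1): 1, (1, 2): 1, (0, 2): 0}      # realized by triples (a, a+1, a+2)
twisted = {(0, 1): 0, (0, 2): 1, (1, 2): 1}   # same unordered pattern, other order

print(count_copies(omega, rho, 3, path, 2))     # 3
print(count_copies(omega, rho, 3, twisted, 2))  # 0
```

Both patterns are paths on three vertices when the order is forgotten, but only the first occurs as an ordered copy. Dividing a count by $|\Omega|^{|W|}$ (here 125) gives the quantity that the corollary compares with $\delta$.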

Coregliano and Razborov have recently [8] shown a result that could also plausibly be called ordered hypergraph removal. Their removal involves modifying the entire structure; that is, one replaces $(\Omega, <, \rho)$ with $(\Omega, <', \rho')$, whereas the result here only modifies $\rho$. Their argument is quite general, applying to a wide range of structures. By contrast, our result is narrower, though we discuss at the end how the arguments might be generalized.

Our approach is to restate the usual proof of hypergraph removal in a sufficiently general way that the proof of ordered hypergraph removal falls out without much change. We will consider $k$-graphs which have a sequence of notions of "structured sets". In the usual graph removal lemma, this sequence has length 1: the only kind of structured set is the rectangles (that is, sets of edges of the form $A \times B$ for sets $A$ and $B$).

In the hypergraph removal lemma for $k$-graphs, the sequence of notions of structure has length $k-1$: the first, most general tier of structured sets are cylinder sets generated by $(k-1)$-tuples, the next tier is cylinder sets generated by $(k-2)$-tuples, and so on, down to the boxes (sets of the form $\prod_{i \le k} A_i$, which are exactly the cylinder sets generated by 1-tuples).

Meanwhile, in the ordered graph case, the sequence of notions of structure has length 2: the more general tier of structured set is the rectangles $A \times B$ where $A, B$ are arbitrary sets of vertices, while the second, more restrictive tier of structure is sets of the form $I \times J$ where $I, J$ must be intervals in the ordering.

Once we have set up this general framework, ordered hypergraph removal will fall out almost instantly from the proof of hypergraph removal: we will use a sequence of notions of structure of length $k$, beginning with the cylinder sets generated by $(k-1)$-tuples, continuing down to boxes $\prod_{i \le k} A_i$ where the $A_i$ are arbitrary, and then adding an additional notion of structured set given by boxes of the form $\prod_{i \le k} I_i$ where the $I_i$ are intervals.

The idea that Szemerédi's regularity lemma and its generalizations can be viewed in terms of nested notions of structure is present, for instance, in [25, 26], which describe Szemerédi's regularity lemma in terms of conditional expectation.

Working with multiple layers of structure typically requires fairly complicated dependencies to correctly express bounds in the finite setting, so it is convenient to pass to an infinitary, analytic setting where we can "let $\epsilon$ equal 0"; that is, where some of the bounds will disappear into a measure-theoretic limit object.

There are two main approaches to representing the notion of structure in such a formalism. To be explicit, consider the case of a 3-graph; in the finite setting, we have a large vertex set $V$ and consider some symmetric set $H \subseteq V^3$. In one approach to limit objects, the graphon approach (e.g. [20]), the limit object is an uncountable space $\Lambda$ and a measurable function $f : \Lambda^6 \to [0,1]$. The first three coordinates correspond to the vertices of $V$, while the additional three coordinates correspond to the pairs of coordinates.
More generally, if we began with sets $H \subseteq V^k$, the limit object would involve functions on $\Lambda^{2^k - 2}$. (The value $2^k - 2$ is the number of subsets of $\{1, 2, \ldots, k\}$ except for the empty set and the whole set, which represents a "purely quasirandom" component which is omitted from the limit object.)

Footnote (on the empty set): In a third, related setting, arrays of exchangeable random variables [1, 9, 16], the coordinate corresponding to the empty set is typically included, but easily eliminated because an exchangeable array of random variables is a combination of dissociated arrays, where the dissociated arrays are precisely those where the coordinate corresponding to the empty set can be ignored. A natural generalization of a $k$-hypergraphon would be to add the coordinate corresponding to the empty set; this would roughly represent an ensemble of $k$-hypergraphons rather than a single such object.

Footnote (on the whole set): The coordinate corresponding to the full set is related to why we end up with a function rather than a set: we could work with a set $F \subseteq \Lambda^{2^k - 1}$, and think of $f(\vec{\omega}) = \int \chi_F(\vec{\omega}, u)\, du$ where $u$ is the extra coordinate.

Analogously, an ordered graph is a symmetric $H \subseteq V^2$ where $V$ is an ordered set. An orderon [4] (the graphon-like limit object corresponding to an ordered graph) is then a function from $\Lambda^4$ to $[0,1]$: elements of $V$ are analogous to pairs from $\Lambda$, where the first component represents the information about the ordering and the second component the additional information present in the vertex which is not explained by its position in the ordering. (Permutons [17] and latinons [12], limit objects for permutations and Latin squares respectively, similarly acquire "extra" coordinates in this way.)

It is both an advantage and a disadvantage of this representation that it fully separates out the interactions between these coordinates: the coordinates are combined as a familiar product measure space (it is common to take $\Lambda = [0,1]$, giving the space $[0,1]^n$ for some $n$) and one can use standard results (for instance, the Lebesgue density theorem, as in [10]) on the space. However, because of this de-association of the coordinates, it is difficult to interpret what the higher order coordinates "mean" in a general way: when we represent a 3-graph with a function $f(x, y, z, u, v, w)$, it is difficult to say concretely what a particular value of $u$ means. (Indeed, it is artificial, and a bit misleading, to represent these objects as powers of a single space: for instance, there is no reason to think we can swap the $z$ and $u$ coordinates in a meaningful way.)

Here we prefer a different approach to the limit object where the limit objects have a more familiar form: our version of a limit of 3-graphs will be a subset of $\Omega^3$ for an uncountable set $\Omega$, and our version of an ordered graph will be a subset of $\Omega^2$ where $\Omega$ is an ordered set. The price is that we must work with a Keisler graded probability space (see Section 2.2). This means that the measurable sets are more complicated: in addition to the sets of pairs given by the standard product measure construction, there are typically additional measurable sets of pairs which include the quasirandom sets.

In this setting, different kinds of structure are identified by looking at sub-$\sigma$-algebras of measurable sets, which represent notions of structure [28, 29]. For instance, when we consider a 3-graph $H \subseteq \Omega^3$, we have a collection $\mathcal{B}_2$ of measurable subsets of $\Omega^2$, containing all the information we need about the first two coordinates, but there is also a product $\sigma$-algebra, $\mathcal{B}_{2,1} \subseteq \mathcal{B}_2$, which is the collection of measurable sets generated by rectangles. So the first two coordinates of $\Lambda^6$ correspond to determining information about sets in $\mathcal{B}_{2,1}$, while the fourth coordinate of $\Lambda^6$ corresponds to information about the quasirandom elements of $\mathcal{B}_2$.

Formally, the two approaches are linked by a suitable map $\pi : \Omega^3 \to \Lambda^6$ where, for instance, $\pi^{-1}(C \times \Lambda^4)$ must give a set of the form $A \times \Omega$ where $A$ is $\mathcal{B}_{2,1}$-measurable, while $\pi^{-1}(\Lambda^3 \times B \times \Lambda^2)$ must give a set of the form $A \times \Omega$ where $A$ is quasirandom. (In fact, our approach to the proof will lead us to construct something close to an explicit version of this map.)

Similarly, when we have an ordering on $\Omega$ we have the collection $\mathcal{B}_1$ of all measurable subsets of $\Omega$, and also a sub-$\sigma$-algebra $\mathcal{B}_<$ which is generated by the intervals. In this setting, we will be able to prove:

Theorem 5.5. Let $\Sigma$ be a finite alphabet and let $(\Omega, <)$ be a set together with a Keisler graded probability space on $\Omega$ such that $<$ is measurable, and suppose $\rho : \binom{\Omega}{k} \to \Sigma$ is measurable. For each $w$ and each $\epsilon > 0$, there is a $\rho' : \binom{\Omega}{k} \to \Sigma$ such that
• $\mu(\{\vec{x} \mid \rho(\vec{x}) \neq \rho'(\vec{x})\}) < \epsilon$, and
• for each ordered set $(W, \prec)$ and coloring $c : \binom{W}{k} \to \Sigma$, either:
  – $(\Omega, \rho', <)$ contains no copies of $(W, c, \prec)$ (that is, there are no order-preserving functions $\pi : W \to \Omega$ such that $\rho'(\pi(\vec{x})) = c(\vec{x})$ for all $\vec{x} \in \binom{W}{k}$), or
  – $(\Omega, \rho, <)$ contains many copies of $(W, c, \prec)$ (that is, the set of order-preserving functions $\pi : W \to \Omega$ such that $\rho(\pi(\vec{x})) = c(\vec{x})$ for all $\vec{x} \in \binom{W}{k}$ has positive measure).

Corollary 5.6 follows immediately by a standard ultraproduct argument as described in [13].

Footnote: The association between these approaches is quite systematic. For instance, the fact that orderons are functions with domain $\Lambda^4$ is essentially telling us that when we look at the Keisler graded probability space, we should be paying attention to a particular sub-$\sigma$-algebra of $\mathcal{B}_1$, namely $\mathcal{B}_<$.

The main technique in our proof will be reproducing, in this setting, something like the Lebesgue density theorem: a way of defining a notion of density in this setting so that almost every point is dense.

The use of infinitary and measure-theoretic arguments in proofs like this is superficial: rather than interpreting the proof of Theorem 5.5 as actually involving infinite sets, one can interpret these as abbreviations for complicated statements about finite sets. In particular, although the proof does not explicitly give bounds on the relationship between $\delta$ and $\epsilon$ in Theorem 5.5, it is routine (though quite tedious) to translate the proof into an explicit combinatorial one with bounds.
(See [5, 27] for formal descriptions of how this may be done in general.)

The proof here uses a familiar structure, so we can already tell roughly what upper bound it gets: the "unwound" proof of Theorem 5.5 for ordered $k$-graphs goes through a result similar to the regularity lemma for $(k+1)$-graphs. This means the upper bound for removal of ordered $k$-graphs would be on the order of the $(k+2)$-th function in the fast-growing hierarchy. (Recall that the second function in this family is roughly exponential, and later functions are obtained by iterating the previous function, so the 3rd function is iterated exponentiation, the 4th function is the "wowzer" function obtained by iterating the iterated exponential, and so on.) For hypergraph regularity, these bounds are known [21] to be tight. (For graph removal, better bounds [11] can be obtained by avoiding the regularity lemma. It seems likely that similar methods can produce at least some improvement on the upper bound for hypergraph and ordered hypergraph removal.)

2. Preliminaries

2.1. Cylinder Intersection Sets.

One of the central ideas in the proof of graph removal is approximating a graph $(V, E)$ using sets of vertices. In modern presentations, this idea is usually expressed using some form of the Szemerédi regularity lemma: we find a partition $V = \bigcup_{i \le n} V_i$ so that most of the bipartite graphs $(V_i, V_j, E \cap (V_i \times V_j))$ have a quasi-randomness property. To turn this into a proof of graph removal, one of the key points is that almost all of the edges in $E$ will belong to bipartite graphs $(V_i, V_j, E \cap (V_i \times V_j))$ where the quasi-randomness property holds and the density $\frac{|E \cap (V_i \times V_j)|}{|V_i \times V_j|} \ge \epsilon > 0$. Letting $P$ be the set of pairs $(i, j)$ where this density is bounded away from 0, we end up considering a related graph, $E^+ = E \cap \bigcup_{(i,j) \in P} (V_i \times V_j)$, the edges which are near many other edges.

In order to generalize this to $k$-graphs, we need to generalize the idea of a product set to higher arity. The right notion is a cylinder intersection set: a cylinder intersection set of $k$-tuples is a collection of $k$-tuples defined by restricting the sets that certain $r$-sub-tuples can belong to for $r < k$. For instance, if $k = 3$ and $r = 2$, the prototypical cylinder intersection set is a set of the form
$$\{(x, y, z) \mid (x, y) \in A,\ (x, z) \in B \text{ and } (y, z) \in C\}.$$
We can see that a product is just a cylinder intersection set where we only consider sub-tuples with $r = 1$.
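The prototypical cylinder intersection set for $k = 3$, $r = 2$ can be enumerated directly on a small finite set. The Python sketch below is an illustration under assumptions made here (the name `cylinder_intersection`, the four-point vertex set, and the particular sets are all hypothetical): it shows that taking $A = B = C$ to be a symmetric edge relation yields the triangles, and that rectangular components give back an ordinary product set.

```python
from itertools import product


def cylinder_intersection(omega, A, B, C):
    """Triples (x, y, z) with (x, y) in A, (x, z) in B and (y, z) in C."""
    return {(x, y, z) for x, y, z in product(omega, repeat=3)
            if (x, y) in A and (x, z) in B and (y, z) in C}


omega = range(4)

# r = 2 with A = B = C = E: the cylinder intersection set is the set of triangles.
edges = {(0, 1), (1, 2), (0, 2), (2, 3)}
E = edges | {(y, x) for x, y in edges}          # symmetric edge relation
triangles = cylinder_intersection(omega, E, E, E)

# Rectangles as components: the result is the product set P x Q x R.
P, Q, R = {0, 1}, {1, 2}, {3}
box = cylinder_intersection(omega, set(product(P, Q)), set(product(P, R)),
                            set(product(Q, R)))

print(len(triangles))                   # 6
print(box == set(product(P, Q, R)))     # True
```

The six triples are the orderings of the single triangle on $\{0, 1, 2\}$; the edge $\{2, 3\}$ contributes nothing because it lies in no triangle.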

Since we will be considering cylinder intersection sets extensively, and since it turns out that we can view graph homomorphisms as themselves being cylinder intersection sets, it will be convenient to introduce some uniform notation.

We are interested in a situation where we have a finite set of points, say $W$, with some structure (a $k$-graph or an ordering) and are interested in "copies" inside some other set $V$. For this purpose, it is useful to work with "$W$-tuples".

Definition 2.1. When $W$ is a finite set, a $W$-tuple from $\Omega$ is a function from $W$ to $\Omega$. We write $\Omega^W$ for the set of $W$-tuples.

When $k$ is a non-negative integer, we write $[k]$ for the set $\{1, 2, \ldots, k\}$. These definitions equate a $[k]$-tuple with a $k$-tuple and $\Omega^{[k]}$ with $\Omega^k$, so we can view this as an extension of the usual notation for tuples.

Definition 2.2. When $\vec{x}_W \in \Omega^W$ is a $W$-tuple and $e \subseteq W$, we write $\vec{x}_e \in \Omega^e$ for the $e$-tuple $\vec{x}_e = \vec{x}_W \upharpoonright e$.

When $e = \{i\}$ is a singleton, we can abbreviate $x_i = \vec{x}_{\{i\}} = \vec{x}_W(i)$ to recover the usual notation for tuples.

Definition 2.3. We write $\binom{W}{k}$ for the collection of subsets of $W$ of size $k$. We write $\binom{W}{\le k}$ for $\bigcup_{i \le k} \binom{W}{i}$, the collection of subsets of $W$ of size $\le k$, and ...

When $H = (W, F)$ is a finite $k$-graph and $G = (\Omega, E)$ is a $k$-graph, we define the copies of $H$ in $G$, written $T_H(G)$, to be $T_F(\{E\})$. That is, the copies of $(W, F)$ in $(\Omega, E)$ are the tuples $\vec{x}_W$ such that, for every $e \in F$, $\vec{x}_e \in E$.

Definition 2.5. When $H = (W, F)$ is a finite $k$-graph and $G = (\Omega, E)$ with $E \subseteq \Omega^k$ is a $k$-graph, we define the induced copies of $H$ in $G$, written $T^{ind}_H(G)$, to be $T_{\binom{W}{k}}(\{A_e\})$ where
$$A_e = \begin{cases} E & \text{if } e \in F, \\ \Omega^k \setminus E & \text{otherwise.} \end{cases}$$
That is, the induced copies of $(W, F)$ in $(\Omega, E)$ are the tuples $\vec{x}_W$ such that, for each $e \in \binom{W}{k}$, $e \in F$ if and only if $\vec{x}_e \in E$.

A few other kinds of cylinder intersection sets will be needed along the way. Another case we will see is when $S = \binom{[k]}{\le k}$ or ...

Suppose $\rho : \binom{\Omega}{k} \to \Sigma$ and $c : \binom{W}{k} \to \Sigma$ are colorings. The copies of $(W, c)$ in $(\Omega, \rho)$, written $T_{W,c}(\Omega, \rho)$, are $T_{\binom{W}{k}}(\{A_e\})$ where $A_e = \{\vec{x}_e \mid \rho(\vec{x}_e) = c(e)\}$.

An induced copy of $(W, F)$ in $(\Omega, E)$ is precisely a copy of $(W, \chi_F)$ in $(\Omega, \chi_E)$ where the characteristic functions $\chi_F, \chi_E$ are viewed as colorings where $\Sigma = \{0, 1\}$.

2.2. Measure Spaces.

It will be convenient for us to prove our results in an infinitary setting where we can use some measure theoretic ideas. A Keisler graded probability space consists of a set $\Omega$ and, for each $k$, a measure $\mu_k$ on subsets of $\Omega^k$.

When $\Omega$ is finite, the natural choice is to take each $\mu_k$ to be the normalized counting measure, $\mu_k(S) = \frac{|S|}{|\Omega|^k}$, on subsets of $\Omega^k$. When $\Omega$ is infinite, we need to fix $\sigma$-algebras of measurable sets and add some conditions to ensure that the measures are compatible with each other.

Definition 2.7. A Keisler graded probability space on $\Omega$ is a collection of probability measure spaces, $(\Omega^k, \mathcal{B}_k, \mu_k)$, for each $k \in \mathbb{N}$ so that:
• whenever $\pi : [1, k] \to [1, k]$ is a permutation and $B \in \mathcal{B}_k$, we have $B^\pi = \{(x_{\pi(1)}, \ldots, x_{\pi(k)}) \mid (x_1, \ldots, x_k) \in B\} \in \mathcal{B}_k$ and $\mu_k(B^\pi) = \mu_k(B)$,
• if $B \in \mathcal{B}_k$ and $C \in \mathcal{B}_r$ then $B \times C \in \mathcal{B}_{k+r}$,
• whenever $B \in \mathcal{B}_{k+r}$, the set of $(x_1, \ldots, x_r)$ such that $B_{x_1, \ldots, x_r} = \{(x_{r+1}, \ldots, x_{k+r}) \mid (x_1, \ldots, x_{k+r}) \in B\} \in \mathcal{B}_k$ is a set in $\mathcal{B}_r$ of measure 1 and
$$\mu_{k+r}(B) = \int \mu_k(B_{x_1, \ldots, x_r})\, d\mu_r.$$

We say $\{(\Omega^k, \mathcal{B}_k, \mu_k)\}_{k \in \mathbb{N}}$ is atomless if, for every $x \in \Omega$, $\mu_1(\{x\}) = 0$.

Atomless Keisler graded probability spaces are the setting obtained by taking the limit of the counting measures as the size of $\Omega$ approaches infinity (made precise by using an ultraproduct). As a result, for many purposes one can simply pretend that an atomless Keisler graded probability space is finite with $|\Omega|$ very, very large.

The special case where, for each $k$, $\mathcal{B}_k$ is equal to the product $\sigma$-algebra $\mathcal{B}_1^k$ is the most familiar example, but in general a Keisler graded probability space may have additional measurable sets which do not belong to the product $\sigma$-algebra. These additional sets precisely correspond to the quasirandom graphs and hypergraphs [28].

More generally, we take $\mathcal{B}_I$ to be a $\sigma$-algebra on $\Omega^I$, obtained from $\mathcal{B}_{|I|}$ in the natural way by choosing any bijection between $I$ and $\{1, 2, \ldots, |I|\}$, and we have a corresponding measure $\mu_I$ on $\mathcal{B}_I$. Since $\mathcal{B}_{|I|}$ and $\mu_{|I|}$ are symmetric, $\mathcal{B}_I$ and $\mu_I$ do not depend on the choice of bijection.

The $\sigma$-algebra $\mathcal{B}_k$ of all measurable sets has canonical sub-$\sigma$-algebras generated by cylinder intersection sets which are, in general, proper; these correspond exactly to the "non-quasirandom" sets (for various notions of quasirandomness).

Definition 2.8. When $r < k$, $\mathcal{B}_{k,r}$ is the sub-$\sigma$-algebra of $\mathcal{B}_k$ generated by all $([k], \binom{[k]}{r})$-cylinder intersection sets where all components are elements of $\mathcal{B}_r$.

More generally, when $\mathcal{D}$ is a sub-algebra of $\mathcal{B}_r$, we write $K_{k,r}(\mathcal{D})$ for the sub-$\sigma$-algebra of $\mathcal{B}_k$ generated by all $([k], \binom{[k]}{r})$-cylinder intersection sets where all components are elements of $\mathcal{D}$.

We say $\{(\Omega^k, \mathcal{B}_k, \mu_k)\}_{k \in \mathbb{N}}$ is countably approximated if, for each $k$, there is a countable algebra of sets $\mathcal{B}^0_k \subseteq \mathcal{B}_k$ such that:
• $K_{k,r}(\mathcal{B}^0_r) \subseteq \mathcal{B}^0_k$ for all $r < k$,
• the algebras $\mathcal{B}^0_k$ are symmetric,
• whenever $B \in \mathcal{B}^0_k$, $r < k$, and $q \in \mathbb{Q} \cap (0, 1)$, there is a $D \in \mathcal{B}^0_r$ with $\{\vec{x} \in \Omega^r \mid \mu(B_{\vec{x}}) < q\} \subseteq D \subseteq \{\vec{x} \in \Omega^r \mid \mu(B_{\vec{x}}) \le q\}$.

Ultraproducts of graphs are countably approximated, using the definable sets (in a large enough language) as the approximating sets. (It turns out that we cannot quite expect to exactly close the algebras under level sets; see [13] for more on the approach here.)

We can think of $\mathcal{B}_{k,r}$ as being the sets of $k$-tuples which are "explained by" properties of $r$-tuples.

Since $\mathcal{B}_{k,r}$ is symmetric, we can also define $\mathcal{B}_{W,r}$ in the natural way: equivalently, as the image of $\mathcal{B}_{|W|,r}$ under any bijection of $W$ with $\{1, \ldots, |W|\}$, or as the $\sigma$-algebra generated by $(W, \binom{W}{r})$-cylinder intersection sets where the component $A_e$ belongs to $\mathcal{B}_e$.

Definition 2.9. We define $t_S(\{A_e\}_{e \in S}) = \mu(T_S(\{A_e\}_{e \in S}))$. More generally, we define
$$t_S(\{f_e\}_{e \in S}) = \int \prod_{e \in S} f_e(\vec{x}_e)\, d\mu.$$
We similarly define $t_H(G) = \mu(T_H(G))$, $t^{ind}_H(G) = \mu(T^{ind}_H(G))$, and $t_{W,c}(\Omega, \rho) = \mu(T_{W,c}(\Omega, \rho))$.

There will be no confusion between these related definitions, since $t_S(\{A_e\}) = t_S(\{\chi_{A_e}\})$.

3. Removal and Induced Removal for Graphs

In this section we prove graph removal, using this as a vehicle to introduce our notation and approach and prove some lemmas we will need for the more general results in later sections.


3.1. Neighborhoods and Points of Density.

We would like to work with points of density of measurable functions, that is, points which behave like limits of the nearby points. One difficulty is that a general probability measure space may not have a natural basis like the open balls. We will fix this by brute force: we simply pick, more or less arbitrarily, a family of neighborhoods around each point which suffice for our purposes.

More precisely, we will have, for each point $x \in \Omega$, a sequence $\mathcal{N}_j(x)$ of neighborhoods such that $\mu(\mathcal{N}_j(x)) \to 0$. (The analogous arrangement in the Lebesgue measure would take $\mathcal{N}_j(x) = B_{1/j}(x)$.)

Since we will need it later and the definition is the same, we will define a system of neighborhoods around tuples in $\Omega^r$ as well.

Definition 3.1. When $\mathcal{D}$ is a countable collection of subsets of $\Omega^r$, a system of neighborhoods in $\mathcal{D}$ is a sequence of partitions, $\mathcal{N} = \{\mathcal{N}_j\}$, such that:
• each $\mathcal{N}_j$ is a finite partition of $\Omega^r$,
• when $i < j$, $\mathcal{N}_j$ refines $\mathcal{N}_i$,
• for every set $A \in \mathcal{D}$, there is a $j$ so that $A$ differs by measure 0 from a union of elements of $\mathcal{N}_j$,
• $\lim_{j \to \infty} \max_{P \in \mathcal{N}_j} \mu(P) = 0$.
We call $r$ the arity of $\mathcal{N}$.

We write $\mathcal{N}^\sigma$ for the $\sigma$-algebra generated by all sets in $\bigcup_j \mathcal{N}_j$.

Since we will be working with partitions frequently, we introduce some notation.

Definition 3.2. When $\mathcal{N}_j$ is a partition of $\Omega^r$ and $\vec{x} \in \Omega^r$ is a point, we write $\mathcal{N}_j(\vec{x})$ for the unique set $P \in \mathcal{N}_j$ such that $\vec{x} \in P$.

We should think of $\mathcal{N}$ as being a schema giving, for each tuple $\vec{x}_W$ and each number $j$, a set $\mathcal{N}_j(\vec{x}_W)$ which is the "ball around the tuple $\vec{x}_W$".

We want to lift partitions of $\Omega^r$ to partitions of $\Omega^k$ with $r < k$ in the obvious way: each part of the partition of $\Omega^k$ is a cylinder intersection set coming from our partition of $\Omega^r$.

Definition 3.3. When $\mathcal{N}_j$ is a partition of $\Omega^r$, $r \le |W|$ and $\vec{x}_W \in \Omega^W$, we write $\mathcal{N}_j(\vec{x}_W)$ for the $\binom{W}{r}$-cylinder intersection set $T_{\binom{W}{r}}(\{\mathcal{N}_j(\vec{x}_e)\}_{e \in \binom{W}{r}})$.

For instance, when $\mathcal{N}$ has arity 1, it induces partitions of $\Omega^2$ into sets of the form $P \times Q$ where $P, Q \in \mathcal{N}_j$.

A system of neighborhoods $\mathcal{N} = \{\mathcal{N}_j\}$ gives us a natural way to define density.

Footnote (on fixing neighborhoods by brute force): An alternative method, which plays a central role in the "graphon" approach to limit graphs [19], is to use the fact that every probability measure space is, in a suitable way, equivalent to the Lebesgue measure on the unit interval, and then use the usual notion of a point of density. This is used, for instance, in [10] to prove hypergraph regularity.
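As a toy model of Definitions 3.1 to 3.3, one can take $\Omega = [0, 1)$ with Lebesgue measure and let $\mathcal{N}_j$ be the partition into dyadic intervals of length $2^{-j}$. This is an assumption made purely for illustration (the spaces in the paper are ultraproducts, not the unit interval), and the function names below are hypothetical. The Python sketch shows the arity 1 lifting of Definition 3.3, where the neighborhood of a tuple is a product of one-dimensional cells.

```python
from fractions import Fraction
from math import floor


def cell(j, x):
    """Index i of the dyadic interval [i / 2**j, (i + 1) / 2**j) containing x."""
    return floor(x * 2**j)


def lifted_cell(j, point):
    """N_j of a tuple when the arity is 1: a product of one-dimensional cells."""
    return tuple(cell(j, x) for x in point)


def cell_measure(j, width):
    """Lebesgue measure of a lifted cell of N_j inside [0, 1)**width."""
    return Fraction(1, 2**(j * width))


x = (Fraction(1, 3), Fraction(4, 5))
print(lifted_cell(2, x))      # (1, 3)
print(lifted_cell(3, x))      # (2, 6)
print(cell_measure(3, 2))     # 1/64
```

Halving each index at level 3 recovers the level 2 cell, which is the refinement condition, and the cell measure $2^{-jk}$ tends to 0 as $j$ grows, which is the last condition of Definition 3.1.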

Definition 3.4. Let $\mathcal{N}$ be a system of neighborhoods of arity $r$. Given $k \ge r$ and $f : \Omega^k \to [0, 1]$, we define
$$f^j_{\mathcal{N}}(\vec{x}) = \frac{1}{\mu(\mathcal{N}_j(\vec{x}))} \int_{\mathcal{N}_j(\vec{x})} f(\vec{y})\, d\mu(\vec{y})$$
whenever $\mu(\mathcal{N}_j(\vec{x})) > 0$, and
$$f^+_{\mathcal{N}}(\vec{x}) = \lim_{j \to \infty} f^j_{\mathcal{N}}(\vec{x})$$
wherever each $f^j_{\mathcal{N}}$ is defined and this limit exists. We call $\vec{x}$ a point of density for $f$ in $\mathcal{N}$ if $f^+_{\mathcal{N}}(\vec{x})$ exists and
$$\lim_{j \to \infty} \frac{1}{\mu(\mathcal{N}_j(\vec{x}))} \int_{\mathcal{N}_j(\vec{x})} |f^+_{\mathcal{N}}(\vec{y}) - f^+_{\mathcal{N}}(\vec{x})|\, d\mu(\vec{y}) = 0.$$
When $A \subseteq \Omega^k$, we call $\vec{x}$ a point of density for $A$ in $\mathcal{N}$ if $\vec{x}$ is a point of density for $\chi_A$.

One might have expected the definition of a point of density to be simply that
$$\lim_{j \to \infty} \frac{1}{\mu(\mathcal{N}_j(\vec{x}))} \int_{\mathcal{N}_j(\vec{x})} |f(\vec{y}) - f(\vec{x})|\, d\mu(\vec{y}) = 0.$$
But take $E$ to be a quasi-random graph and let $f = \chi_E$ and $\mathcal{N}$ a system of neighborhoods of arity 1; in this case, every positive measure neighborhood $\mathcal{N}_j(x_1, x_2) = \mathcal{N}_j(x_1) \times \mathcal{N}_j(x_2)$ has the property that half its points belong to $E$ and half do not belong to $E$, so there would be no points of density at all.

This is the fundamental difference from classical Lebesgue measure: because we are working in a Keisler graded probability space with quasi-random elements, we cannot expect most points in the graph to be near other points in the graph. However we will see that we can expect most points to have a well-defined density, and to be near other points with a similar density.

We will usually want, not just any point of density of $f$, but one where the density $f^+_{\mathcal{N}}$ is positive.

Definition 3.5. We say $x$ is a positive point of density of $f$ if $x$ is a point of density of $f$ with $f^+_{\mathcal{N}}(x) > 0$. When $E$ is a set, a positive point of density of $E$ is a positive point of density of $\chi_E$.

When $r = 1$ (that is, when $\mathcal{N}$ consists of sets of points) there are no particular symmetry issues. In particular, when $f$ is a symmetric function (for instance, the characteristic function of a graph or hypergraph), every permutation of a point of density is also a point of density. When $r > 1$, we have to worry about whether the neighborhoods themselves are symmetric.
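The averages $f^j_{\mathcal{N}}$ of Definition 3.4 can be computed exactly in a finite model. The Python sketch below assumes, for illustration only, that $\Omega = \{0, \ldots, 7\}$ carries the normalized counting measure and that $\mathcal{N}_j$ splits $\Omega$ into $2^j$ consecutive blocks; the names `block` and `local_density` are hypothetical. On a finite set the cells cannot shrink to measure 0, so only $j \le 3$ makes sense here and no genuine limit is taken. The parity pattern is not quasi-random; it is only a crude stand-in that has density $1/2$ in every cell at the scales shown.

```python
from fractions import Fraction
from itertools import product

N = 8  # |Omega|; Omega = {0, ..., 7} with normalized counting measure


def block(j, x):
    """The part of N_j containing x: one of 2**j consecutive blocks (2**j <= N)."""
    size = N // 2**j
    start = (x // size) * size
    return range(start, start + size)


def local_density(f, j, point):
    """f^j_N(point): the average of f over the lifted neighborhood of point."""
    cells = [block(j, x) for x in point]
    values = [f(*ys) for ys in product(*cells)]
    return Fraction(sum(values), len(values))


half_graph = lambda x, y: int(x < y)
parity = lambda x, y: int((x + y) % 2 == 0)

print(local_density(half_graph, 1, (1, 6)))  # 1
print(local_density(half_graph, 1, (1, 2)))  # 3/8
print(local_density(parity, 2, (3, 5)))      # 1/2
```

For the parity pattern every point has value 0 or 1 while its cell average is $1/2$, which mirrors why the definition of a point of density compares values of $f^+_{\mathcal{N}}$ rather than values of $f$.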

Lemma 3.6. If $f$ is symmetric, each $\mathcal{N}_j$ is symmetric (that is, each permutation of a set in $\mathcal{N}_j$ is also in $\mathcal{N}_j$), and $\vec{x}$ is a point of density for $f$ in $\mathcal{N}$ then each permutation of $\vec{x}$ is also a point of density.

In general, $f^+_{\mathcal{N}}$ is $\mathbb{E}(f \mid K_{k,r}(\mathcal{N}^\sigma))$. More precisely, $\mathbb{E}(f \mid K_{k,r}(\mathcal{N}^\sigma))$ is only defined up to the $L^2$ norm, so $f^+_{\mathcal{N}}$ is a natural representative of $\mathbb{E}(f \mid K_{k,r}(\mathcal{N}^\sigma))$.

Lemma 3.7. For any measurable function $f : \Omega^k \to [0, 1]$, $f^+_{\mathcal{N}}$ is defined almost everywhere, $f^+_{\mathcal{N}} = \mathbb{E}(f \mid K_{k,r}(\mathcal{N}^\sigma))$, almost every $x$ is a point of density of $f$, and almost every point $x$ with $f(x) > 0$ is a positive point of density of $f$.

Proof. We first show that the functions $f^j_{\mathcal{N}}$ converge in the $L^2$ norm to $\mathbb{E}(f \mid K_{k,r}(\mathcal{N}^\sigma))$. For any $\epsilon > 0$, we may choose $j_0$ large enough that $\|\mathbb{E}(f \mid K_{k,r}(\mathcal{N}_{j_0})) - \mathbb{E}(f \mid K_{k,r}(\mathcal{N}^\sigma))\|_{L^2} < \epsilon$. Then whenever $j \ge j_0$, $\mathcal{N}_j$ refines $\mathcal{N}_{j_0}$, so also $\|f^j_{\mathcal{N}} - \mathbb{E}(f \mid K_{k,r}(\mathcal{N}^\sigma))\|_{L^2} < \epsilon$.

To see that the pointwise limit is defined almost everywhere and that almost every point is a point of density, consider any $\epsilon > 0$ and any $\alpha < \beta$. Let $g = \mathbb{E}(f \mid K_{k,r}(\mathcal{N}^\sigma))$. Choose $j_0$ large enough so that there is a set $S \in K_{k,r}(\mathcal{N}_{j_0})$ so that $\mu(S \triangle \{\vec{x} \mid g(\vec{x}) \le \alpha\}) < \frac{\beta - \alpha}{1 - \alpha} \epsilon$.

Consider all rectangles $R$ from $\bigcup_j \mathcal{N}_j$ which are contained in $S$ and such that the average of $f$ on $R$ is $\ge \beta$. Since $\int_R g\, d\mu \ge \beta \mu(R)$ and $g \le 1$, we must have $\mu(\{\vec{x} \in R \mid g(\vec{x}) > \alpha\}) \ge \frac{\beta - \alpha}{1 - \alpha} \mu(R)$, and therefore $\mu(R) < \epsilon$. Therefore, once $j \ge j_0$, except for a set of measure $\epsilon$, if $g(\vec{x}) \le \alpha$ then for all $j \ge j_0$, $f^j_{\mathcal{N}}(\vec{x}) \le \alpha$ as well. So the set of points with $g(\vec{x}) \le \alpha$ but $\limsup f^j_{\mathcal{N}}(\vec{x}) > \alpha$ has measure $< \epsilon$. Dually, we can show that the set of points with $g(\vec{x}) \ge \beta$ but $\liminf f^j_{\mathcal{N}}(\vec{x}) < \beta$ has measure $< \epsilon$. Since this holds for all $\alpha, \beta$ and all $\epsilon$, for almost all $\vec{x}$ we have $f^+_{\mathcal{N}}(\vec{x}) = \lim f^j_{\mathcal{N}}(\vec{x}) = g(\vec{x})$.

By the same argument, for any $\alpha$ and any $\delta > 0$, if $f^+_{\mathcal{N}}(\vec{x}) \le \alpha$ then, except for a set of measure $< \epsilon$, for all sufficiently large $j$ we have $f^j_{\mathcal{N}}(\vec{x}) \le \alpha + \delta$, and therefore since $\frac{1}{\mu(\mathcal{N}_j(\vec{x}))} \int_{\mathcal{N}_j(\vec{x})} f\, d\mu = \frac{1}{\mu(\mathcal{N}_j(\vec{x}))} \int_{\mathcal{N}_j(\vec{x})} f^+_{\mathcal{N}}\, d\mu$, the set of $\vec{y} \in \mathcal{N}_j(\vec{x})$ with $f^+_{\mathcal{N}}(\vec{y}) \ge \alpha + \delta$ is small. So almost every $\vec{x}$ is a point of density.

Finally, to see that almost every point with $f(x) > 0$ has $f^+_{\mathcal{N}}(x) > 0$, let $Z$ be the set of points where $f^+_{\mathcal{N}}(\vec{x}) = 0$. Since $f^+_{\mathcal{N}}$ is $K_{k,r}(\mathcal{N})$-measurable, $Z$ belongs to the completion of $K_{k,r}(\mathcal{N})$, so $0 = \int_Z f^+_{\mathcal{N}}\, d\mu = \int_Z f\, d\mu$, so the set of $\vec{x} \in Z$ where $f(\vec{x}) > 0$ has measure 0. □

3.2. Counting and Graph Removal.

The next fact we need is that the quantity $t_S(\{f_e\}_{e \in S})$ depends only on the "non-random" part of the $f_e$. In its simplest form, this says that if $E$ is a graph, $t_S(E) = t_S(\mathbb{E}(\chi_E \mid \mathcal{B}_{2,1}))$; that is, we can replace the graph $E$ with the function $\mathbb{E}(\chi_E \mid \mathcal{B}_{2,1})$ measuring the density of $E$ when counting graph densities.

Footnote: In the graphon approach, this fact plays a central role: the object $\mathbb{E}(\chi_E \mid \mathcal{B}_{2,1})$ is the graphon, and the basic theorems establish that for things like counting graph densities, this is all that is needed.

We will state this fact in a very general way which will continue to serve us as we deal with $k$-graphs.

Lemma 3.8. Let $\{f_e\}_{e \in S}$ be given and, for each $e$, let $\mathcal{D}_e$ be a $\sigma$-algebra of sets of $r$-tuples such that, for every $e_0 \in S$, $|e_0| \ge r$ and either:
• $f_{e_0}$ is $\mathcal{D}_{e_0}$-measurable, or
• for every $e \in S \setminus \{e_0\}$ and each fixed $\vec{x}_{e \setminus e_0}$, the function $\vec{x}_{e_0} \mapsto f_e(\vec{x}_e)$ is $\mathcal{D}_{e_0}$-measurable.
For each $e$, let $f'_e = \mathbb{E}(f_e \mid \mathcal{D}_e)$. Then $t_S(\{f_e\}) = t_S(\{f'_e\})$.

The general form allows the case where $S$ contains tuples of different sizes, and replaces $\mathcal{B}_{2,1}$ with a more general $\sigma$-algebra which may depend on the coordinate $e$; most commonly, we will have $\mathcal{D}_e = K_{e,r}(\mathcal{D})$ for a fixed $\sigma$-algebra $\mathcal{D}$.

We need some requirement that $\mathcal{D}_e$ is large enough. For example, when we turn to 3-graphs, we might initially try $\mathcal{D}_e = \mathcal{B}_{3,1}$ while $S \subseteq \binom{W}{3}$. Working only with $\mathcal{B}_{3,1}$ amounts to working with weak hypergraph regularity [6], which is known to suffice when $S$ is linear, that is, when $|e \cap e'| \le 1$ for distinct $e, e' \in S$ [7, 18]. But when the elements of $S$ can overlap more generally, we need to work with a larger $\sigma$-algebra, for instance $\mathcal{B}_{3,2}$. This is precisely what the second case of the lemma requires: that the "overlaps" with the other functions are already measurable with respect to $\mathcal{D}_e$.

Proof. We show by induction on $|T|$, where $T \subseteq S$, that
$$t_S(\{f_e\}) = \int \prod_{e \in T} f'_e(\vec{x}_e) \prod_{e \in S \setminus T} f_e(\vec{x}_e)\, d\mu.$$
When $T = S$, this gives the desired claim. When $T = \emptyset$, the statement is trivial.

Suppose the inductive hypothesis holds for $T$ and that $e_0 \in S \setminus T$. Then we have
$$t_S(\{f_e\}) = \int f_{e_0}(\vec{x}_{e_0}) \prod_{e \in T} f'_e(\vec{x}_e) \prod_{e \in S \setminus (T \cup \{e_0\})} f_e(\vec{x}_e)\, d\mu.$$
For a fixed $\vec{x}_{W \setminus e_0}$, consider the function
$$h(\vec{x}_{e_0}) = \prod_{e \in T} f'_e(\vec{x}_e) \prod_{e \in S \setminus (T \cup \{e_0\})} f_e(\vec{x}_e).$$
Each term in the product is $\mathcal{D}_{e_0}$-measurable, so $h$ is $\mathcal{D}_{e_0}$-measurable as well. Therefore
$$t_S(\{f_e\}) = \int \mathbb{E}(f_{e_0} \mid \mathcal{D}_{e_0})(\vec{x}_{e_0}) \prod_{e \in T} f'_e(\vec{x}_e) \prod_{e \in S \setminus (T \cup \{e_0\})} f_e(\vec{x}_e)\, d\mu = \int f'_{e_0}(\vec{x}_{e_0}) \prod_{e' \in T} f'_{e'}(\vec{x}_{e'}) \prod_{e' \in S \setminus (T \cup \{e_0\})} f_{e'}(\vec{x}_{e'})\, d\mu,$$
which gives the inductive claim. □
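Lemma 3.8 can be checked exactly in a small finite model. In the Python sketch below, which is an illustration under assumptions made here and not the paper's construction, $\Omega = \{0, 1, 2, 3\}$ carries the normalized counting measure, $\mathcal{D}$ is generated by products of the blocks $\{0, 1\}$ and $\{2, 3\}$, and $S$ is the triangle on $W = \{0, 1, 2\}$; the names `blk`, `t_triangle` and `cond_exp` are hypothetical. When the other two functions are constant on cells, replacing $f_{\{0,1\}}$ by its conditional expectation leaves $t_S$ unchanged; when they are not, the hypothesis of the lemma fails and so can the conclusion.

```python
from fractions import Fraction
from itertools import product

N = 4  # Omega = {0, 1, 2, 3}, normalized counting measure


def blk(x):
    """Index of the block of the partition {0, 1}, {2, 3} containing x."""
    return x // 2


def t_triangle(f01, f02, f12):
    """t_S for S = {{0,1}, {0,2}, {1,2}}: the average of the product."""
    total = sum(f01(x, y) * f02(x, z) * f12(y, z)
                for x, y, z in product(range(N), repeat=3))
    return Fraction(total, N**3)


def cond_exp(f):
    """E(f | D) for D generated by products of blocks: average f on each cell."""
    def g(x, y):
        cell = [(u, v) for u, v in product(range(N), repeat=2)
                if blk(u) == blk(x) and blk(v) == blk(y)]
        return Fraction(sum(f(u, v) for u, v in cell), len(cell))
    return g


f01 = lambda x, y: (3 * x + y * y) % 5          # arbitrary
f02 = lambda x, z: 1 + blk(x) + 2 * blk(z)      # constant on cells
f12 = lambda y, z: 2 - blk(y) * blk(z)          # constant on cells
same = t_triangle(f01, f02, f12) == t_triangle(cond_exp(f01), f02, f12)

parity = lambda x, y: int((x + y) % 2 == 0)     # not constant on cells
before = t_triangle(parity, parity, parity)
after = t_triangle(cond_exp(parity), parity, parity)
print(same, before, after)   # True 1/4 1/8
```

The second computation is the finite shadow of the remark about overlaps: the parity pattern has conditional expectation $1/2$ on every cell, yet its triangle density is $1/4$ rather than the $1/8$ obtained after averaging one factor.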

The next result should be seen as our version of the graph counting lemma.Typically, a graph counting lemma would say something like the following:Suppose S ⊆ (cid:0) W (cid:1) and that for each w ∈ W , we have a set P w ⊆ Ω such that, for each pair ( w, w ′ ) ∈ S , µ ( P w × P w ′ ) R P w × P w ′ f { w,w ′ } dµ >ǫ , and also f { w,w ′ } is suitably quasi-random between P w and P w ′ . Then t S ( { f e } ) > P w approaches 0: instead of sets P w , we will be able to work with individualpoints x w (and, therfore, suﬃciently small neighborhoods N j ( x w )). Therequirement that f { w,w ′ } be suitably quasi-random becomes the requirementthat ( x w , x w ′ ) be a points of density, and the requirement that f { w,w ′ } havepositive density becomes the requirement that f + { w,w ′ } ( x w , x w ′ ) > N be a system of neighborhoods with arity 1—since thehypergraph version requires more work. Theorem 3.9.

Let $W$ be a finite set, let $\mathcal{N}$ be a system of neighborhoods of arity 1, and let $S$ be a collection of subsets of $W$. Suppose that, for each $e\in S$, either:
• $f_e$ is $K_{e,1}(\mathcal{N})$-measurable, or
• for every $e'\in S\setminus\{e\}$, the function $\vec x_e\mapsto\int f_{e'}(\vec x_{e\cup e'})\,d\mu(\vec x_{e'\setminus e})$ is $K_{e,1}(\mathcal{N})$-measurable.
Suppose that $\vec x_W\in T_S(\{f_e\})$ is such that, for each $e\in S$, $\vec x_e$ is a positive point of density of $f_e$. Then $t_S(\{f_e\})>0$.

The basic idea of the proof is that we may "blow up" each individual point $x_w$ into a small ball $N_j(x_w)$, and then use the fact that each $\vec x_e$ is a point of density of $f_e$ to find many copies of $W$ between these small balls.

Proof.

Choose some ǫ ≤ min e ∈ S f + e ( ~x e ).Since each ~x e is a point of density, we may choose some j large enoughthat, for each e ∈ S ,1 µ ( N j ( ~x e )) µ ( { ~y e ∈ N j ( ~x e ) | f + e ( ~y e ) ≥ ǫ/ } ) ≥ − | S | . Therefore also 1 µ ( N j ( ~x W )) µ ( { ~y W ∈ N j ( ~x W ) | f + e ( ~y e ) ≥ ǫ/ } ) ≥ − | S | . Note that this depends on the fact that the arity of N is 1, because thisensures that N j ( ~x W ) = N j ( ~x e ) × N j ( ~x W \ e ).Therefore γ = µ ( { ~y W ∈ N j ( ~x W ) | there is some e ∈ S such that f + e ( ~y e ) < ǫ/ } ) < | S | | S | = 1 , so, using Lemma 3.8 (with D e = B , for all e ), t S ( { f e } ) = t S ( { f + e } ) ≥ ǫ | S | | S | (1 − γ ) > . (cid:3) Theorem 3.10 (Graph Removal) . Suppose H = ( W, F ) is a ﬁnite graphand G = (Ω , E ) is a graph with a countably approximated atomless Keislergraded probability space on Ω with E ∈ B . If t H ( G ) = 0 then there is asymmetric E ′ ⊆ E such that E \ E ′ is a measure set contained in anintersection of sets in B and, taking G ′ = (Ω , E ′ ) , T H ( G ′ ) = ∅ .Proof. Choose N so that every set in B is a ﬁnite union of sets in S j N j .Let E ′ ⊆ E consist of the positive points of density of E . By Lemma 3.7, µ ( E \ E ′ ) = 0. If T H ( E ′ ) = ∅ then any ~x W ∈ T H ( E ′ ) satisﬁes the conditionsof the previous lemma, and so t H ( E ) > E ∈ B and E \ E ′ is contained in an intersection of ﬁnite unions ofrectangles from B , E \ E ′ is contained in an intersection of sets in B . (cid:3) Corollary 3.11.

For every finite graph $H=(W,F)$ and every $\epsilon>0$ there is a $\delta>0$ so that whenever $G=(V,E)$ is a graph with $t_H(G)<\delta$, there is a symmetric $E'\subseteq E$ with $|E\setminus E'|<\epsilon|V|^2$ such that, taking $G'=(V,E')$, $T_H(G')=\emptyset$.

Sketch. The proof is standard (see [13]), but we include the outline here. Suppose the statement were false, so let $H=(W,F)$ and $\epsilon>0$ be a counterexample: for each $n$, there is a $G_n=(V_n,E_n)$ with $t_H(G_n)<1/n$, but so that no symmetric $E'\subseteq E_n$ with $|E_n\setminus E'|<\epsilon|V_n|^2$ is $H$-free. Note that $|V_n|\to\infty$ (otherwise $t_H(G_n)<1/n$ implies $T_H(G_n)=\emptyset$ for $n$ large enough).

Let $(\Omega,E)$ be an ultraproduct of the sequence $G_n$. Take the Keisler graded probability space generated by the definable sets, with the Loeb measure. Let $E'$ be given by Theorem 3.10. Then $E\setminus E'$ is contained in an intersection of definable sets, so choosing some definable set $Z_m$ large enough, $E\setminus Z_m$ is $H$-free and $Z_m$ has measure $<\epsilon$. By the Łoś Theorem, for infinitely many $n$, $(V_n,E_n\setminus Z_m)$ is also $H$-free and $Z_m$ has measure $<\epsilon$. (Here, by $Z_m$, we mean the interpretation of the definable set $Z_m$ in the structure $G_n$.) But this contradicts the choice of the $G_n$. □

Induced Graph Removal.

When we prove induced graph removal,we have a new issue to deal with: when ~x is not a point of density, we cannotsimply exclude the point from E , because, by doing so, we might end upcreating an induced copy ~x where one of the non-edges of ~x is an elementwe removed from E .Instead, we adopt a more complicated strategy. We choose j large, sothat N j gives a partition of Ω into very small pieces. We will then choose,from each element P of N j , a representative a P ∈ P , uniformly at random.Since we are only choosing ﬁnitely many such elements, with probability 1, REMOVAL LEMMA FOR ORDERED HYPERGRAPHS 15 all the pairs ( a P , a P ′ ) with P = P are points of density. We then modify E to match ( a P , a P ′ ) on P × P ′ ; that is, we deﬁne a new graph E ′ : if( a P , a P ′ ) ∈ E then we place all of P × P ′ in E ′ , while if ( a P , a P ′ ) E thenwe exclude all of P × P ′ from E ′ . If we choose j large enough, we will beable to show that, for most choices of the representatives a P , µ ( E △ E ′ ) issmall.This leaves us with a new problem: what to do with the “diagonal com-ponents” P × P . When j is large, these diagonals have small measure, so wecan put them in or out of E ′ as convenient. On the other hand, we cannotguarantee that ( a P , a P ) is a point of density. Theorem 3.12 (Induced Graph Removal) . Suppose H = ( W, F ) is a ﬁnitegraph and G = (Ω , E ) is a graph with a countably approximated atomlessKeisler graded probability space on Ω with E ∈ B . For each ǫ > there is asymmetric E ′ ∈ B such that µ ( E ′ △ E ) < ǫ and for any H with t indH ( E ) = 0 , T indH ( E ′ ) = ∅ .Proof. Let f = χ E . Choose j large enough that the set of pairs ( x , x ) forwhich χ jE has not converged to within ǫ/ ǫ/

3, and so that P P ∈N j µ ( P × P ) < ǫ/ (cid:0) Ω2 (cid:1) into three sets: E = { ( x, y ) | f + ( x, y ) = 1 } , E = { ( x, y ) | f + ( x, y ) = 0 } , and E / = { ( x, y ) | < f + ( x, y ) < } . (Thereis also a set of measure 0 where f + ( x, y ) is undeﬁned.) We may think of theseas the interior of E , the interior of the complement of E , and a boundaryof points near both E and the complement of E .Suppose that, for each P ∈ N j with µ ( P ) >

0, we choose a point a P ∈ P uniformly at random. Then, with positive probability: • the set of points contained in P × P ′ with P = P ′ and such that | f + ( a P , a P ′ ) − f j ( a P , a P ′ ) | ≥ ǫ/ ǫ/ • each ( a P , a P ′ ) is a point of density for each of E , E , E / and is apositive point of density for the set it belongs to.Next we prepare to deal with elements of the sets P × P . What we wantto do is choose many points near each a P ; when we choose one point b P,i near a P and one point b P ′ ,j near a P ′ with P = P ′ , we can ensure, with highprobability, that ( b P,i , b P ′ ,j ) is similar to ( a P , a ′ P ). When we take two pointsnear the same a P , b P,i and b P,j with i = j , we have no control over whathappens. However, by applying Ramsey’s Theorem (many times), we canat least ensure that the behavior does not depend on the particular choiceof i and j .Formally, we will choose these points by applying our counting lemmato a suitable graph. We may let A = { a P | P ∈ N j , µ ( P ) > } . Forany d , let us consider the colored d -blowup of A , which we deﬁne to be the { , / , } -colored graph ( A d , c d ) where: • A d = A × [ d ], • dom( c d ) = { (( a, i ) , ( a ′ , j )) | a = a ′ } , • when a = a ′ and ( a, a ′ ) ∈ E z , c d (( a, i ) , ( a ′ , j )) = z .Observe that Theorem 3.9 applies to ( A d , c d ), so t ( A d ,c d ) ( { E z } z ∈{ , / , } ) > v : A → { , , / } , the v -homogeneous completion of ( A d , c d ) is thecolored graph ( A d , c vd ) where c d ⊆ c vd and, for i = j , c vd (( a, i ) , ( a, j )) = v ( a ).Take m suﬃciently large and consider any copy ~b A m of the colored m -blowup of A in (Ω , E ). 
(This means that for each pair (( a, i ) , ( a ′ , j )) ∈ (cid:0) A d (cid:1) with a = a ′ , ( b ( a,i ) , b ( a,j ) ) ∈ E z if and only if ( a, a ′ ) ∈ E z , and we makeno commitments about which of the three sets (( a, i ) , ( a ′ , i )) belongs to.)Applying Ramsey’s Theorem once for each a ∈ A , there is a v and a sub-copy ~b A d of ~b A m which is a copy of ( A d , c vd ).Since there are only ﬁnitely many v , this means that for each d there some v so that t ( A d ,c vd ) ( E ) >

0. Furthermore, if d < d ′ , we have t ( A d ′ ,c vd ′ ) ( E ) ≤ t ( A d ,c vd ) ( E ). Therefore there must be some v so that, for all d , t ( A d ,c vd ) ( E ) > P ∈ N j has measure 0. Todeal with this, we assign to every element P ∈ N j a corresponding element Q P ∈ N j , and we will always treat elements of P as if they were really in Q P . For any P ∈ N j with µ ( P ) = 0, choose some Q P ∈ N j with µ ( Q P ) > µ ( P ) >

0, take Q P = P . So for almost every point, Q P = P , butthere are a measure 0 set of exceptional points where Q P = P .We deﬁne E ′ as follows: • for P = P ′ , if ( a Q P , a Q P ′ ) ∈ E , let E ′ ∩ ( P × P ′ ) = ∅ , • for P = P ′ , if ( a Q P , a Q P ′ ) ∈ E , let P × P ′ ⊆ E ′ , • for P = P ′ , if ( a Q P , a Q P ′ ) ∈ E / , let E ′ ∩ ( P × P ′ ) = E ∩ ( P × P ′ ), • if v ( a Q P ) = 1 then ( P × P ) ⊆ E ′ , • if v ( a Q P ) = 0 then E ′ ∩ ( P × P ) = ∅ , • if v ( a Q P ) = 1 / E ′ ∩ ( P × P ) = E ∩ ( P × P ).Consider any graph H = ( W, F ) such that T indH ( E ′ ) = ∅ . Take any ~x W ∈ T indH ( E ′ ). For each w ∈ W , let P w ∈ N j = Q N j ( x w ). Note that we mayhave P w = P w ′ even when w = w ′ , so ﬁx an ordering W = { w , . . . , w | W | } .We have t ( A | W | ,c v | W | ) ( { E z } z ∈{ , , / } ) >

0, so we may choose a copy ~y A | W | where all pairs are points of positive density for E if they belong to E ∪ E / and for E if they belong to E ∪ E / .Take ~z w i = ~y ( a Pwi ,i ) . For each pair w i = w j , observe that ( z w i , z w j ) is apositive point of density for E if ( w i , w j ) ∈ F and for E if ( w i , w j ) F —tosee this, suppose ( w i , w j ) ∈ F (the case where ( w i , w j ) F is symmetric): • if P w i = P w j then, since ( x w i , x w j ) ∈ E ′ , we have ( a P wi , a P wj ) E ,so ( y a Pwi ,i , y a Pwj ,j ) ∈ E ∪ E / and is therefore a positive point ofdensity for E , We could have tweaked our deﬁnition of a partition to avoid this case, but when wego on to hypergraphs, this case will be unavoidable, and the exceptional points will havesmall but positive measure.

REMOVAL LEMMA FOR ORDERED HYPERGRAPHS 17 • if P w i = P w j then, since ( x w i , x w j ) ∈ E ′ , we have v ( a P wi ) = 0, soagain ( y a Pwi ,i , y a Pwi ,j ) ∈ E ∪ E / , and is therefore a positive pointof density for E .Therefore we may apply Theorem 3.9 to ~z w i to show that t H ( E ) > µ ( E △ E ′ ) < ǫ . Observe that of ( x w , x w ′ ) ∈ E △ E ′ then, letting P = N j ( x w ) and P ′ = N j ( x w ′ ), one of the following holds:(1) P = P ′ ,(2) µ ( P ) = 0 or µ ( P ′ ) = 0,(3) | f + ( a P , a P ′ ) − f j ( a P , a P ′ ) | > ǫ/ f j ( a P , a P ′ ) ≥ − ǫ/ x w , x w ′ ) E , or(5) f j ( a P , a P ′ ) < ǫ/ x w , x w ′ ) ∈ E .The ﬁrst case accounts for measure at most ǫ/

3, the second case for measure0, the third case for measure at most ǫ/

3, and the last two can each accountfor at most an ǫ/ P × P ′ , so at most ǫ/ µ ( E △ E ′ ) < ǫ . (cid:3) Corollary 3.13.

For every finite graph $H=(W,F)$ and every $\epsilon>0$ there is a $\delta>0$ so that whenever $G=(V,E)$ is a graph with $t^{ind}_H(G)<\delta$, there is a symmetric $E'$ with $|E\,\triangle\,E'|<\epsilon|V|^2$ such that, taking $G'=(V,E')$, $T^{ind}_H(G')=\emptyset$.

4. Hypergraphs

4.1. Sequences of Neighborhoods.

In order to extend the arguments above to hypergraphs, we need to deal with an additional complication. When $G=(\Omega,E)$ and $H=(W,F)$ are graphs and we consider the product $t_H(G)=\int\prod_{(w,w')\in F}\chi_E(x_w,x_{w'})\,d\mu$, the distinct terms in the product only overlap on a single coordinate. The crucial step is that in Lemma 3.8, when we look at a single edge $e_0=(w_0,w_0')\in F$, the "overlaps" with other edges in $F\setminus\{e_0\}$ share at most one coordinate, and are therefore $\mathcal{B}_{2,1}$-measurable. This means that we are able to use Lemma 3.8 (in the proof of Theorem 3.9) to replace $E$ with $E(\chi_E\mid\mathcal{B}_{2,1})$.

When $G=(\Omega,E)$ and $H=(W,F)$ are 3-graphs, however, the product $t_H(G)=\int\prod_{(w,w',w'')\in F}\chi_E(x_w,x_{w'},x_{w''})\,d\mu$ has terms which can share two coordinates. If we try to carry out a proof analogous to the proof of Theorem 3.9, we are only able to reduce $E$ to $E(\chi_E\mid\mathcal{B}_{3,2})$. $E(\chi_E\mid\mathcal{B}_{3,2})$, however, is "graph-like": it is described in terms of two coordinates at a time, like a graph.

This leads us to an iterated process where, at each step, we reduce the number of coordinates by one. This means we need to consider, not a single system of neighborhoods, but a sequence of them: we will have a sequence of systems of neighborhoods, $\mathcal{N}_d,\ldots,\mathcal{N}_1$, and we will consider not just the neighborhoods $\mathcal{N}^j_d(\vec x)$, but how these neighborhoods sit in the neighborhood $\mathcal{N}^j_d(\vec x)\cap\mathcal{N}^{j'}_{d-1}(\vec x)$ with $j'\gg j$, and so on.

In this section we will set up all the general machinery. For concreteness, we'll focus on the case needed to prove induced hypergraph removal, which means we will focus on the case where $\vec x$ is a $k$-tuple and we consider systems of neighborhoods $\mathcal{N}_{k-1},\ldots,\mathcal{N}_1$ where $\mathcal{N}_i$ has arity $i$. We will refer to this, throughout this section, as the standard example.
In particular, note thatthis example illustrates that in the intersection N jd ( ~x ) ∩ N j ′ d − ( ~x ), the set N jd ( ~x ) is “more complicated” (for example, it is deﬁned using sets of arity d )while the set N j ′ d − ( ~x ) is “ﬁner” (since j ′ ≫ j , we are working with a muchﬁner partition of Ω d − ). So we are looking at neighborhoods which use “somehigh complexity information and a lot of low complexity information”.We will nonetheless work, where possible, with general systems of neigh-borhoods, since this is the case we will use in the next section. (In the nextsection, N i +1 will have arity i , and N will consist only of intervals.)For this purpose, we identify the property we need a sequence of systemsof neighborhoods to have to be workable. (For instance, when k >

3, wecannot use a sequence of neighborhoods of arity k − N i +1 is “not too much more complicated” than N i , and it should be relatedto the “computability of overlaps” clause from Theorem 3.8. The generalproperty we need is given by the following deﬁnition. Deﬁnition 4.1. If D is a σ -algebra of sets of s tuples, C is a σ -algebra ofsets of r -tuples, and s ≤ r , we say C is properly aligned in D if, for any C ∈ C and any c with 1 ≤ c ≤ r , the function f ( x , . . . , x r ) = Z χ C ( y , . . . , y c , x c +1 , . . . , x r ) dµ is K r,s ( D )-measurable.We say a sequence of σ -algebras D d , . . . , D where D i is a σ -algebra ofsets of r i -tuples, is properly aligned if: • r = 1, • r i ≤ r i +1 for each i < d , and • D i +1 is properly aligned in D i for each i < d .Of course, the standard example itself is properly aligned. Lemma 4.2.

The sequence of $\sigma$-algebras $\mathcal{B}_d,\mathcal{B}_{d-1},\ldots,\mathcal{B}_1$ is properly aligned.

Proof. Since $r_i=i$, the first two conditions are immediate. If $C\in\mathcal{B}_{i+1}$ then the function $f(x_1,\ldots,x_{i+1})=\int\chi_C(y_1,\ldots,y_c,x_{c+1},\ldots,x_{i+1})\,d\mu$ depends only on $(x_{c+1},\ldots,x_{i+1})$, and is therefore $K_{i+1,i+1-c}(\mathcal{B}_{i-c})\subseteq\mathcal{B}_{i+1,i}(\mathcal{B}_i)$-measurable. □

When dealing with graphs, although we stated things in terms of tuples $\vec x_W$, we were really interested in the collection of infinitesimal neighborhoods $\{\lim_{j\to\infty}N_j(x_w)\}_{w\in W}$. In the graph setting, however, we could ignore the distinction between a point and its infinitesimal neighborhood.
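To make the picture of shrinking neighborhoods concrete, here is a toy model. It is an assumption made purely for illustration and not the paper's setting: $\Omega=[0,1)$ with Lebesgue measure, and $N_j(x)$ taken to be the dyadic interval of length $2^{-j}$ containing $x$. The function names are hypothetical. The sketch computes the local densities of the set $A=[0,1/3)$ inside $N_j(x)$, which play the role of the averages $f^j(x)$ for $f=\chi_A$.

```python
from fractions import Fraction
from math import floor


def neighborhood(x, j):
    """Dyadic interval [k/2^j, (k+1)/2^j) containing x: a stand-in for N_j(x)."""
    k = floor(x * 2 ** j)
    return Fraction(k, 2 ** j), Fraction(k + 1, 2 ** j)


def local_density(a, b, x, j):
    """Fraction of N_j(x) covered by the interval [a, b)."""
    lo, hi = neighborhood(x, j)
    overlap = max(Fraction(0), min(hi, b) - max(lo, a))
    return overlap / (hi - lo)


# The set A = [0, 1/3) and two sample points, one inside A and one outside.
A = (Fraction(0), Fraction(1, 3))
inside = Fraction(1, 4)
outside = Fraction(1, 2)

# Local densities along the nested neighborhoods N_0(x), N_1(x), ..., N_7(x).
inside_densities = [local_density(*A, inside, j) for j in range(8)]
outside_densities = [local_density(*A, outside, j) for j in range(8)]
```

In this toy model the densities around the point inside $A$ fluctuate at coarse scales and then stay at 1, while those around the point outside $A$ drop to 0. This is the behaviour that "point of density" is meant to capture; the sequence of intervals $N_0(x)\supseteq N_1(x)\supseteq\cdots$ itself is what the text calls the infinitesimal neighborhood of $x$.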

For hypergraphs, though, we need to consider multiple layers of infinitesimal neighborhoods: in the standard example, a pair $(x_1,x_2)$ has a pair of infinitesimal neighborhoods $\lim_{j\to\infty}(\mathcal{N}^j_1(x_1),\mathcal{N}^j_1(x_2))$ and then an infinitesimal neighborhood of pairs $\lim_{j\to\infty}\mathcal{N}^j_2(x_1,x_2)$. The problem is that specifying an actual tuple of points pins down all these infinitesimal neighborhoods simultaneously. But there could be distinct pairs $(x_1,x_2),(y_1,y_2)$ with $(\mathcal{N}^j_1(x_1),\mathcal{N}^j_1(x_2))=(\mathcal{N}^j_1(y_1),\mathcal{N}^j_1(y_2))$ for all $j$, but $\mathcal{N}^j_2(x_1,x_2)\ne\mathcal{N}^j_2(y_1,y_2)$ for some $j$; that is, a pair of infinitesimal neighborhoods of points might (and, in general, does) partition into many neighborhoods of pairs.

We therefore need to separate these notions properly. We borrow model-theoretic terminology, referring to infinitesimal neighborhoods as types.

Definition 4.3.

When N is a system of neighborhoods with arity r , an N -type is a decreasing sequence P ⊇ P ⊇ · · · with each P j ∈ N j a non-emptyset. When p = { P j } is a type, we write p ( j ) = P j . For any ~x ∈ Ω r , we write tp N ( ~x ) = {N j ( ~x ) } .There are two diﬀerent perspectives on types which it will be useful tokeep in mind below. The simpler perspective is that a type is, essentially, a G δ -set (more precisely, a distinguished presentation of a G δ -set): the typeis giving us the set of points T j P j , and dealing with N ( x ) rather than x isa way of “zooming out” from x to all the points inﬁnitesimally close to it.In particular, if we ﬁx two N -types N ( x ) , N ( x ), we are ﬁxing twosets, and so the product N ( x ) × N ( x ) is itself a rectangle. Althoughthis rectangle has measure 0, we can hope that it behaves like a limit of thepositive measure rectangles N j ( x ) × N j ( x ). For instance, if E is a randomgraph, we might expect that N j ( x ) × N j ( x ) contains both pairs belongingto E and pairs not belonging to E . Indeed, we will see that almost all pointsbelong to types which do behave like the limit of the positive measure typesthat approximate them.There is a technical subtlety: perhaps there is a failure of compactnessand the intersection T P j happens to be empty, even though each ﬁnite in-tersection is non-empty. In practice, we always care more about the approxi-mations to the set than the actual intersection: it the intersection happenedto be empty, we could always ﬁll in a point inside it. Indeed, ultraproductsare saturated which, in particular, ensures that each type is non-empty.This suggests the second perspective: we can think of the types themselvesas being points, in a diﬀerent but related space. That is, instead of workingwith the space Ω of points, we can work with a space Ω where an elementof Ω is a N -type, and we have a measurable function tp : Ω → Ω . 
Wewill not explicitly use this second perspective, but it may be useful to keepin mind. This second perspective also an explicit connection to the graphon-based approachesto regularity, as in [10, 24]. These approaches avoid the use of a Keisler graded probability

When $\vec x_W$ is a tuple, we want to consider the $\mathcal{N}$-type of $\vec x_W$, by which we mean the $\mathcal{N}$-types of all size-$r$ subsets of $W$. Slightly more generally, if $S\subseteq\binom{W}{r}$, we need to consider the collection of $\mathcal{N}$-types precisely for those $e\in S$. (The case we will need this for is that, if $x_w=x_{w'}$, we will want to ignore those $e\in\binom{W}{r}$ which contain both $w$ and $w'$.)

Definition 4.4.

When r ≤ | W | is the arity of N and S ⊆ (cid:0) Wr (cid:1) , an N - S -typeis a tuple ~p S = { ~p e } e ∈ S such that for each e ∈ S , ~p e is an N -type and, foreach j , ~p S ( j ) = T S ( { ~p e ( j ) } ) is non-empty.For any point ~x W , letting S = R ( ~x W ), there is a corresponding N - S -type tp ( ~x W ) given by ( tp N ( ~x W )) e = tp N ( ~x e ).The only case we will need is where S = (cid:0) Wr (cid:1) \ R ( ~x W ) (or an analogreplacing ~x W with N -types). Since tuples with repeated coordinates are anexceptional case with measure 0, they will not be needed until we deal withinduced hypergraph removal.Note that f + N ( ~x ) depends only on the type of ~x , not on the particularpoint, and so being a point of density is a property of the type: if ~x is apoint of density for f in N then so is every ~x in tp N ( ~x ).Finally, we need our most general deﬁnition: we have a sequence of sys-tems of neighborhoods N d , . . . , N and want to consider the N i -type of a ~x W -tuple for all i simultaneously. Deﬁnition 4.5.

When ~p = { ~p ,w } w ∈ W is a N - (cid:0) W (cid:1) -type, we write R r ( ~p ),the tuples of length r with repeated elements for the set of tuples e ∈ (cid:0) Wr (cid:1) such that there are w, w ′ ∈ e with w = w ′ and p ,w = p ,w ′ . When there areno repeated types, we will write R ( ~p ) = ∅ (omitting the subscript r ).When N d , . . . , N is a sequence of systems of neighborhoods for each i , a N d , . . . , N -type is a set ~p W = { ~p i,e } i ≤ d such where ~p = { ~p ,w } w ∈ W is an N - (cid:0) W (cid:1) -type and for i > ~p i = { ~p i,e } e ∈ ( Wri ) \R ei ( ~p ) is a N i -( (cid:0) Wr i (cid:1) \ R r i ( ~p ))-type.For any point ~x W , we write tp N d ,..., N ( ~x W ) for the type given by ( tp N d ,..., N ( ~x W )) i = tp N i ( ~x W ).This deﬁnition really is what we should expect: a N d , . . . , N -type ~p as-signs, for each i ≤ d and each r i -sub-tuple e without repeated N -types,an N i -type ~p i,e . The tuples with repeated types are omitted because thosetuples concentrate on diagonals, and have to be handled diﬀerently. space by taking our spaces Ω r with r > , they use a ternary product Ω ,where the ﬁrst two components represent copies of Ω while the third contains the part of Ω which is not measurable with respect to B , . Types give an alternate construction of this:we can see that the map tp : Ω → Ω given by tp ( x, y ) = ( N ( x ) , N ( y )) is inadequate—for instance, if E is a random graph on Ω, there is no E ∗ ⊆ Ω with E = tp − ( E ∗ ). Instead,the correct map is tp : Ω → Ω × Ω , where Ω the space of N -types; Ω is a Keislergraded probability space, but Ω × Ω is an ordinary measure-theoretic product. REMOVAL LEMMA FOR ORDERED HYPERGRAPHS 21

This is precisely a description of an infinitesimal complex: we wish to consider a subset of $\Omega^W$ where, for each $e\in\binom{W}{r_i}$, we restrict ourselves to the set of $\vec x_W$ such that $\vec x_e\in\vec p_{i,e}$.

The additional subtlety is that when we have a repeated type $\vec p_{1,w}=\vec p_{1,w'}$, we don't want to consider more complicated types containing more than one of them. (This is a technical point involving how we handle repeated vertices in the proof of induced hypergraph counting, but for now, observe that if $\vec p_{1,w}=\vec p_{1,w'}$ then we should expect a type $\vec p_{2,\{w,w'\}}$ containing both to concentrate on a diagonal; since the diagonal has measure 0, this means our type concentrates on a set of measure 0, which obstructs our ability to prove a counting lemma.)

4.2. Dense Types.

We have already noted that being a point of density is really a property of the type, not of the point. For completeness, we restate the definition in terms of types. Recall that, for any $\mathcal{N}$-type $p$ and any integer $j$, $p(j)$ is a set in $\mathcal{N}_j$ approximating $p$.

Definition 4.6.

Let N be a nested system of neighborhoods. Let f : Ω W → [0 ,

1] be given. For any N -type p such that each p ( j ) has positive measure,we deﬁne f j N ( p ) = 1 µ ( p ( j )) Z p ( j ) f ( ~x ) dµ and f + N ( p ) = lim j →∞ f j N ( p ) . We say an N - S -type p is a dense type for f if f + N ( p ) exists andlim j →∞ µ ( p ( j )) Z p ( j ) | f + N ( y ) − f + N ( p ) | dµ ( y ) = 0 . We say p is a positive dense type for f if p is a dense type for f and f + N ( p ) > f . That means that when we havea ~p e , we need { ~p e ′ } e ′ ( e to be a dense type for each (or at least most) of thesets ~p e ( j ). In order to make the inductive step work, we need to demandthat f be dense at ~p in a slightly stronger way.We need to relativize the conditional expectation. We take a σ -algebra D ,a function f , and a set B which we should think of as being more complicatedthan those in D (for instance, we might have D = B , and B ∈ B \B , ), and we want to deﬁne the conditional expectation of f “around theset B ”. We will write this E ( f y B | D ), which will be precisely the D -measurable information with the property that, when given B , we canreconstruct E ( f χ B | D ). Deﬁnition 4.7.

Let f be a function, P a set, and D a σ -algebra. The weighted projection E ( f y P | D ) is deﬁned to be the unique (up to L -norm) function with domain { ~x | E ( χ P | D )( ~x ) > } such that E ( f y P | D )( ~x ) = E ( f χ P | D )( ~x ) E ( χ P | D )( ~x ) . Note that E ( f y P | D ) is, as the notation suggests, measurable withrespect to D . The main fact we will need is the following. Lemma 4.8. If g is D -measurable then Z f χ P g dµ = Z E ( f y P | D ) χ P g dµ. Proof. Z f χ P g dµ = Z E ( f χ P | D ) g dµ = Z E ( f y P | D ) E ( χ P | D ) g dµ = Z E ( f y P | D ) χ P g dµ. (cid:3) Deﬁnition 4.9.

Given f : Ω W → [0 , P ⊆ Ω W , and two systems ofneighborhoods N d , N d − , we deﬁne f + y P N d , N d − = E ( { ~y | f + N d ( ~y ) > } y P | K W,r d − ( N σd − ) } ) . This obscure deﬁnition is justiﬁed by its crucial appearance in Lemma4.12 below. In practice, P will have the form T ( Wrd )( { ~p e ( j ) } ) for some N d -type ~p , so we will have partitioned Ω W into sets P of this form and then wecan think of the functions f + y P N d , N d − as being a “partition of unity” appliedto the function E ( { y | f + N d ( ~y ) > } | K W,r d − ( N σd − ) } ). Deﬁnition 4.10.

For each i ≤ d , let N i be a system of neighborhoods andlet f : Ω W → R . We say a N d , . . . , N -type ~p W with R ( ~p ) = ∅ is a densetype for f in N d , . . . , N if: • ~p d is a dense type for f (as an N d -type) • for all j and each e ∈ (cid:0) Wr d (cid:1) , { ~p i,e ′ } i − E } in N d − , . . . , N . REMOVAL LEMMA FOR ORDERED HYPERGRAPHS 23

If, additionally, f + N d ( ~p ) >

0, we say ~p is a positive dense type for f in N d , . . . , N . Lemma 4.11.

For any measurable f : Ω k → [0 , and almost every x , tp N d ,..., N ( x ) is a dense type for f , and for almost every x with f ( x ) > , tp N d ,..., N ( x ) is a positive dense type for f .Proof. The set of x so that R ( tp N d ,..., N ( x )) = ∅ has measure 0, so we mayignore these points.We now proceed by induction on d . tp ( x ) N d is a dense type for f in N d exactly when x is, and we have already shown that the set of x such that x is dense point for f has measure 1.For each P ∈ S j N d ( j ), by the inductive hypothesis the set of x suchthat tp N d − ,..., N ( x ) is a dense type for P has measure 1. Since there arecountably many elements in S j N d ( j ), the set of of x so that tp N d − ,..., N ( x )is a dense type for all of them also has measure 1.Also, for each P ∈ S j N d ( j ), the set of x ∈ P such that tp N d − ,..., N ( x ) isnot a positive dense type for P has measure 0, and so again, except on a setof measure 0, tp N d − ,..., N ( x ) will be a positive dense type for N jd ( x ).It remains to show that, for each δ > E , the set of pointsfailing the fourth condition above with E has measure < δ .Let δ, E be given. Let A + = { ~y | f + N d ( ~y ) > } . By choosing j suﬃcientlylarge, we can arrange that A + is contained, except for a set of measure < δ/

2, in elements P ∈ K W,r d ( N σd ( j )) such that µ ( A + ∩ P ) µ ( P ) > − δ E .Within any such P ,1 − δ < µ ( P ) µ ( A + ∩ P )= 1 µ ( P ) Z χ A + χ P dµ = 1 µ ( P ) Z E ( χ A + y P | K W,d − ( N σd − )) χ P dµ = 1 µ ( P ) Z f + y P N d , N d − χ P dµ and therefore µ ( { ~y ∈ P | f + y P N d , N d − ( ~y ) > − E } ) µ ( P ) > − δ S E = { ~y | f + y P N d , N d − ( ~y ) > − E } . Inductively, the set of ~x ∈ S E ∩ P such that tp N d − ,..., N ( ~x ) is not a positive dense type for S E has measure atmost δµ ( P ) /

2, and therefore the set of ~x ∈ S E such that tp N d − ,..., N ( ~x ) isnot a positive dense type for S E has measure at most δ . (cid:3) Counting and Removal.

The next result is the analog of the hypergraph counting lemma. We suppose we have a configuration $\{f_e\}_{e\in S}$ with $S$ a set of subsets of $W$, and we have points $\vec x_W$ with $f_e(\vec x_e)>0$ for each $e$. If, for each $e$, $tp_{\mathcal{N}_d,\ldots,\mathcal{N}_1}(\vec x_e)$ is a positive dense type for $f_e$, then actually we can expand this single point into a set of points of positive measure, showing that $t_S(\{f_e\})>0$.

Theorem 4.12.

Let W be a ﬁnite set, let N d , . . . , N be a properly alignedsequence of systems of neighborhoods so that N i is a nested system of neigh-borhoods with arity r i , let S be a set of subsets of W , and suppose that ~p W is a N d , . . . , N -type such that: • for each e ∈ S , the restriction ~p W is a positive dense type for f e , and • for each e ∈ S , either: – f e is K e,r d ( N σd ) -measurable, or – for every e ′ ∈ S \ { e } , the function ~x e R f e ′ ( ~x e ∪ e ′ ) dµ ( ~x e ′ \ e ) is K e,r d ( N σd ) -measurable.Then t S ( { f e } ) > .Proof. We proceed by induction on d . When d = 1, this is exactly Theorem3.9.So suppose d >

1. By Lemma 3.8 with D e = K e,r d ( N σd ) for all e , wehave t S ( { f e } ) = t S ( { ( f e ) + N d } ). Since ~p e is a positive dense type of f e , also( f e ) + N d ( ~p e ) > e . Let A + e = { ~y e | ( f e ) + N d ( ~y e ) > } . It suﬃces toshow that t S ( { χ A + e } ) > j suﬃciently large. For each e ∈ S , let A ♭e = { ~z | ( χ A + e ) + y T ( erd ) ( { ~p d,e ′ ( j ) } ) N d , N d − ( ~z ) > − | S | + 1 } , so { ~p i,e } i µ ( T S ( { A ♭e } ) ∩ T ( Wrd )( { ~p d,e ( j )) } ) > . Consider some ~y W ∈ h T S ( { A ♭e } ) \ T S ( { A + e } ) i ∩ T ( Wrd )( ~p d,e ( j )). There mustbe some e ∈ S such that ~y e ∈ A ♭e \ A + e . For each e ∈ S , µ ( { ~y W ∈ T ( Wrd )( ~p d,e ( j )) | ~y e ∈ T S ( { A ♭e } ) \ A + e } )= Z χ A ♭e (1 − χ A + e ) Y e ∈ ( Wrd ) χ ~p d,e ( j ) · Y e ∈ S \{ e } χ A ♭e dµ = Z χ A ♭e (1 − χ A + e ) Y e ′ ∈ ( e rd ) χ ~p d,e ′ ( j ) · Y e ∈ S \{ e } χ A ♭e Y e ∈ ( Wrd ) \ ( e rd ) χ ~p d,e ( j ) dµ REMOVAL LEMMA FOR ORDERED HYPERGRAPHS 25 = Z χ A ♭e (1 − ( χ A + e ) + y T ( e rd ) ( { ~p d,e ′ } ) ) Y e ′ ∈ ( e rd ) χ ~p d,e ′ ( j ) · Y e ∈ S \{ e } χ A ♭e Y e ∈ ( Wrd ) \ ( e rd ) χ ~p d,e ( j ) dµ< | S | + 1 µ ( T S ( { A ♭e } ) ∩ T ( Wrd )( { ~p d,e ( j ) } )) . Therefore µ ( h T S ( { A ♭e } ) \ T S ( { A + e } ) i ∩ T ( Wrd )( { ~p d,e ( j ) } ))) < | S || S | + 1 µ ( T S ( { A ♭e } ) ∩ T ( Wrd )( { ~p d,e ( j ) } ))) , so t S ( { A + e } ) ≥ µ ( T S ( { A ♭e } ) ∩ ~p d ( j )) > | S | + 1 µ ( T S ( { A ♭e } ) ∩ T ( Wrd )( { ~p d,e ( j ) } ))) > . (cid:3) Theorem 4.13.

Suppose $H=(W,F)$ is a finite $k$-graph and $G=(\Omega,E)$ is a $k$-graph with a countably approximated atomless Keisler graded probability space on $\Omega$ with $E\in\mathcal{B}_k$. If $t_H(G)=0$ then there is a symmetric $E'\subseteq E$ such that $E\setminus E'$ is a measure 0 set contained in an intersection of sets in $\mathcal{B}_k$ and, taking $G'=(\Omega,E')$, $T_H(G')=\emptyset$.

Proof. Nearly identical to the proof of Graph Removal, Theorem 3.10. Let $\mathcal{N}_{k-1},\ldots,\mathcal{N}_1$ be a sequence of systems of neighborhoods for the standard example. Then let $E'\subseteq E$ consist of the points in $E$ whose type is positive dense for $\chi_E$. If $T_H(E')\ne\emptyset$ then any $\vec x_W\in T_H(E')$ satisfies the conditions of Theorem 4.12, and therefore $t_H(E)>0$, contradicting $t_H(G)=0$. □

Corollary 4.14.

For every finite $k$-graph $H=(W,F)$ and every $\epsilon>0$ there is a $\delta>0$ so that whenever $G=(V,E)$ is a $k$-graph with $t_H(G)<\delta$, there is a symmetric $E'\subseteq E$ with $|E\setminus E'|<\epsilon|V|^k$ such that, taking $G'=(V,E')$, $T_H(G')=\emptyset$.

4.4. Conditioning on Sets of Measure 0. Before going on, it will be convenient to consider the notion of picking a type "uniformly at random". The natural way to pick a random type is to pick a random point $\vec x$ and take $tp_{\mathcal{N}_d,\ldots,\mathcal{N}_1}(\vec x)$. (Note that, with probability 1, such a type has no repeated $\mathcal{N}_1$-types, so we do not need to worry about that complication here.)

However, what we will need later is to first pick $\mathcal{N}_1$-types, then the $\mathcal{N}_2$-type, and so on, and we will need to describe what it means to pick an $\mathcal{N}_2$-type randomly among the extensions of a given $\mathcal{N}_1$-type.

Because the types represent sets of measure 0, we do not generally expect to be able to make sense of the probability of an event conditioned on being in a type $\vec p$. However, because these events are intersections of a well-defined family of positive measure events, we can make sense of conditioning on them as long as the right limits converge.

Say we have two systems of neighborhoods, $\mathcal{N}$ and $\mathcal{M}$, of arity $r\le s$, respectively. (For instance, $\mathcal{M}=\mathcal{N}_{i+1}$ while $\mathcal{N}=\mathcal{N}_i$.) Then almost every $\mathcal{N}$-$s$-type $\vec p$ is a dense type for every $P$ in every $\mathcal{M}(c)$, since there are only countably many such sets.

For instance (in the standard example) choosing $\mathcal{N}_1$-types $p$ and $q$ gives us a measure 0 rectangle $p\times q$; despite being measure 0, for almost all $p$ and $q$ we can make sense of choosing a pair $(x,y)\in p\times q$ randomly and taking $tp_{\mathcal{N}_2}(x,y)$: the probability that $tp_{\mathcal{N}_2}(x,y)(j)=P$ is precisely the density of $P$ in the type $p\times q$.

So we may choose an $\mathcal{N}_d,\ldots,\mathcal{N}_1$-type by first choosing $\mathcal{N}_1$-types randomly, and then inductively using this process to choose the $\mathcal{N}_2$-types, then the $\mathcal{N}_3$-types, and so on.

The only thing we need to check is that this gives the same distribution as if we had simply chosen the type of a random point.

Lemma 4.15.

The inductive method of choosing $\mathcal{N}_d, \ldots, \mathcal{N}_1$-types has the same distribution as choosing $\mathrm{tp}_{\mathcal{N}_d, \ldots, \mathcal{N}_1}(\vec{x})$ for a uniformly chosen $\vec{x}$.

Proof. By induction on $i \le d$. For $i = 1$ these distributions have the same definition. Suppose the claim holds for $i$. The probability that we choose a $\mathcal{N}_{i+1}$-type $\vec{p}_{i+1}$ with $\vec{p}_{i+1}(j) = P$ is the integral, over all choices of $\vec{p}_i$, of $(\chi_P)^+_{\mathcal{N}_i}(\vec{p}_i)$. By the inductive hypothesis, this is the integral over a random $\vec{x}$ of $(\chi_P)^+_{\mathcal{N}_i}(\mathrm{tp}_{\mathcal{N}_i}(\vec{x}))$, which is equal to $\mu(P)$. □

Induced Hypergraph Removal.

To prove induced hypergraph removal, we need a hypergraph counting lemma that allows repeated elements in tuples. To do that, we need to generalize the notion of a likely configuration.

Suppose we have a tuple of points $\vec{a}_W$ where some of the points are repeated. (Really, this should be talking about types rather than points, but we can more or less equate $a_w$ with $\mathrm{tp}_{\mathcal{N}_1}(a_w)$, and this will be clearer without the added abstraction of talking about types.) Once again, we want to be able to "wiggle the points" so that we can replace them with nearby points to which we will be able to apply Theorem 4.12. The complication is that now, in addition to the types of the points themselves, we need to be concerned with the types of the tuples they belong to: if we "wiggle" $a_1$, this also affects the neighborhood of $\mathrm{tp}_{\mathcal{N}_2}(a_1, a_2)$. Indeed, we can't really "wiggle" $a_1$ while holding $\mathrm{tp}_{\mathcal{N}_2}(a_1, a_2)$ constant, because $\mathrm{tp}_{\mathcal{N}_2}(a_1, a_2)$ completely determines $\mathrm{tp}_{\mathcal{N}_1}(a_1)$.

So we have to wait until later in the counting process: the proof of Theorem 4.12 inductively reduces a hypergraph counting problem to a problem about counting graphs. In particular, prior to the last step of that process, we replace $\mathrm{tp}_{\mathcal{N}_2}(a_1, a_2)$ with a positive measure approximation to it. (Specifically, the set $A^\flat$, which we construct in that proof.) At that point, we can safely "wiggle" $a_1$, since we can just promise to remain within various sets which have positive measure.

With that in mind, we can prove our infinitary version of removal. We state it in a general form, allowing a coloring $\rho$ of $k$-tuples and showing that, with a small change to $\rho$, we can simultaneously remove all copies of small colorings $(W, c)$ which appear with 0 density in $(\Omega, \rho)$. As always, the induced hypergraph case is when $\Sigma = \{0, 1\}$.

Theorem 4.16.

Let $\Sigma$ be a finite set and let $(\Omega, \rho)$ be a coloring with a countably approximated atomless Keisler graded probability space on $\Omega$ and let $\rho : \binom{\Omega}{k} \to \Sigma$ be such that each $\rho^{-1}(\sigma) \in \mathcal{B}$. For each $\epsilon > 0$ there is a $\rho' : \binom{\Omega}{k} \to \Sigma$ such that
$$\mu(\{\vec{x} \mid \rho(\vec{x}) \neq \rho'(\vec{x})\}) < \epsilon,$$
each $(\rho')^{-1}(\sigma) \in \mathcal{B}$, and, for all $(W, c)$, if $t_{W,c}(\Omega, \rho) = 0$ then $T_{W,c}(\Omega, \rho') = \emptyset$.

Proof. To emphasize the way the proof generalizes, we will state it in terms of a general sequence of systems of neighborhoods, $\mathcal{N}_{d-1}, \ldots, \mathcal{N}_1$. For this precise result, we can take $d = k$ and $r_i = i$ for all $i < d$, but nothing changes if we consider a longer sequence of systems of neighborhoods, and we will need this case in the next section.

General Setup: For each $\sigma \in \Sigma$, let $P_\sigma = \{\vec{y} \mid \rho(\vec{y}) = \sigma\}$ and $f_\sigma = \chi_{P_\sigma}$, so $T_{W,c}(\Omega, \rho) = T_{\binom{W}{k}}(\{P_{\sigma(e)}\}_{e \in \binom{W}{k}})$. Let $\mathcal{P} = \{P_\sigma \mid \sigma \in \Sigma\}$, which is a partition of $k$-tuples from $\Omega$. Choose a sequence $c_{d-1} < \cdots < c_1$ where each $c_j$ is sufficiently large relative to the sizes of $|\mathcal{N}^{c_{j'}}_{j'}|$ for $j' > j$ and so $c_1$ is large enough that the set of $k$-tuples with more than one point in the same element of $\mathcal{N}^{c_1}_1$ has measure $< \epsilon/$[…]. We treat $\mathcal{P}$ as being analogous to $\mathcal{N}_d$; in particular, we define $r_d = k$.

Our plan is this. We have partitioned $\Omega^{r_{d-1}}$ into the elements of $\mathcal{N}^{c_{d-1}}_{d-1}$, then partitioned $\Omega^{r_{d-2}}$ into the elements of $\mathcal{N}^{c_{d-2}}_{d-2}$, which are much smaller, and so on. By analogy to the proof of Theorem 3.12, we will want to consider the partition of $\Omega^k$ into sets of the form $T_{\bigcup_i \ldots}$ […] $P \times P$. Here, analogously, we have to deal separately with the case where $\vec{P}_{\{i\}} = \vec{P}_{\{j\}}$ for $i \neq j$, that is, the case where two of the singleton components of $\vec{P}$ are the same element of $\mathcal{N}^{c_1}_1$. As in Theorem 3.12, these components account for a small amount of measure, so we have a great deal of freedom in how we define $\rho'$ on them. We address this later, after dealing with the other components.
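The requirement in the setup that $k$-tuples with two coordinates in the same cell of the finest singleton partition carry little measure is a birthday-type estimate. The following is only a minimal sketch under an assumption not made in the text: the cells are taken to be $n$ sets of equal measure, so the colliding tuples have measure $1 - \prod_{i<k}(1 - i/n) \le \binom{k}{2}/n$. The names `collision_measure` and `cells_needed` are hypothetical and are not notation from the paper.

```python
from fractions import Fraction
from math import comb


def collision_measure(n_cells: int, k: int) -> Fraction:
    """Measure of k-tuples having two coordinates in the same one of
    n_cells equal-measure cells (assumption: all cells have measure 1/n)."""
    distinct = Fraction(1)
    for i in range(k):
        # The next coordinate must avoid the i cells already occupied.
        distinct *= Fraction(max(n_cells - i, 0), n_cells)
    return 1 - distinct


def cells_needed(k: int, bound: Fraction) -> int:
    """Smallest number of equal cells whose collision measure is < bound."""
    if bound <= 0:
        raise ValueError("bound must be positive")
    n = 1
    while collision_measure(n, k) >= bound:
        n += 1
    return n


if __name__ == "__main__":
    k, bound = 3, Fraction(1, 10)
    n = cells_needed(k, bound)
    # Exact collision measure next to the cruder union bound C(k,2)/n.
    print(n, collision_measure(n, k), Fraction(comb(k, 2), n))
```

Exact rational arithmetic is used so that the strict inequality is tested without floating-point rounding. In the proof the cells need not have equal measure; the sketch only indicates why refining the partition far enough makes the exceptional tuples negligible.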
The second complication is that when we choose $\vec{x}^{\vec{P}}$, we need the choices to be "coherent": if we have two components $\vec{P}$ and $\vec{Q}$ such that $\vec{P}_{\{i\}} = \vec{Q}_{\{j\}}$, we need $\mathrm{tp}_{\mathcal{N}_1}(\vec{x}^{\vec{P}}_i) = \mathrm{tp}_{\mathcal{N}_1}(\vec{x}^{\vec{Q}}_j)$. More generally, if there is an $e_0$ such that $P_e = Q_e$ for $e \subseteq e_0$, we need $\mathrm{tp}_{\mathcal{N}_{|e|}}(\vec{x}^{\vec{P}}_e) = \mathrm{tp}_{\mathcal{N}_{|e|}}(\vec{x}^{\vec{Q}}_e)$ for $e \subseteq e_0$. (Really, we need something slightly more general: if there is a bijection $\pi : [k] \to [k]$ and an $e_0$ such that $P_e = Q_{\pi(e)}$ for $e \subseteq e_0$ then we should have $\mathrm{tp}_{\mathcal{N}_{|e|}}(\vec{x}^{\vec{P}}_e) = \mathrm{tp}_{\mathcal{N}_{|e|}}(\vec{x}^{\vec{Q}}_{\pi(e)})$.)

This is why we introduced the alternate method of selecting types in the previous section. For each $P \in \mathcal{N}^{c_1}_1$ with positive measure, we will choose a type $p_P \subseteq P$. (That is, so that $p_P(c_1) = P$.) Then we will turn to pairs: given a triple $\vec{P} = \{P_{1,2}, P_1, P_2\}$ we will take $\vec{p}^{\vec{P}}_1 = p_{P_1}$, $\vec{p}^{\vec{P}}_2 = p_{P_2}$, and as long as $P_{1,2}$ has positive density at $p_{P_1} \times p_{P_2}$, we will choose a $\vec{p}^{\vec{P}}_{1,2} \subseteq P_{1,2} \cap (p_{P_1} \times p_{P_2})$.

Configurations: Let us make all this precise. As the discussion above suggests, we will want to work inductively, starting with the partition of $\Omega$ and working our way up to the partition of $\Omega^k$. We will call the components of these partitions configurations. For $j \le d$, let us define a $\le j$-configuration to be a collection $\{P_e\}_{e \in \bigcup_{j' \le \max\{j, d-1\}} \binom{[r_j]}{r_{j'}}}$ such that for each $e \in \bigcup_{j' \le \max\{j, d-1\}} \binom{[r_j]}{r_{j'}}$, $P_e \in \mathcal{N}^{c_{|e|}}_{|e|}$.
(Note that $\le d$-configurations are slightly disanalogous, since they do not have a "top level" component from $\mathcal{P}$.)

It will be convenient to restrict $\le j$-configurations in the natural way: if $\vec{P} = \{P_e\}_{e \in \bigcup_{j' \le \max\{j, d-1\}} \binom{[r_j]}{r_{j'}}}$ is a $\le j$-configuration, $j_0 < j$, and $e_0 \in \binom{[r_j]}{r_{j_0}}$, we can define $\vec{P} \restriction e_0$ to be the $\le j_0$-configuration $\{P_e\}_{e \in \bigcup_{j' \le j_0} \binom{e_0}{r_{j'}}}$.

For each $j \le d$, the $\le j$-configurations partition $\Omega^{r_j}$ into sets of the form $T_{\bigcup_{j' \le \max\{j, d-1\}} \binom{[r_j]}{r_{j'}}}(\{P_e\}_{e \in \bigcup_{j' \le \max\{j, d-1\}} \binom{[r_j]}{r_{j'}}})$.

We say a configuration $\{P_e\}$ has distinct singletons if whenever $i, i' \in [j]$ with $i \neq i'$, $P_{\{i\}} \neq P_{\{i'\}}$. We will first deal with the configurations with distinct singletons, and then deal with the remaining configurations. Note that the configurations without distinct singletons account for a small amount of measure.

We ultimately want to associate each $\le j$-configuration $\vec{P}$ with a type $\vec{p}^{\vec{P}}$, which we do by induction on $j$. However there is a technical issue we must address first. If $\mu(T_{j' \le \max\{j, d-1\}}(\{P_e\})) = 0$ then we cannot choose a random type refining this configuration. Slightly more generally, if $j > 1$, the types already chosen at level $j - 1$ may result in $P_{[j]}$ having density 0 at the corresponding type of $j$-tuples, which will also leave us unable to continue the process. In both these cases, we will consider $\vec{P}$ a defective configuration. We therefore assign, to each configuration $\vec{P}$, an associated configuration $\vec{Q}^{\vec{P}}$, which is always non-defective. In most cases, we will have $\vec{Q}^{\vec{P}} = \vec{P}$, but when $\vec{P}$ is defective, $\vec{Q}^{\vec{P}}$ will be a different configuration. (As this name suggests, only a small amount of measure will be contained in defective configurations.) We will then choose the type $\vec{p}^{\vec{P}}$ to refine $\vec{Q}^{\vec{P}}$.

Choosing representative types: By induction on $j$, for each $j < d$ and each $\le d$-configuration $\vec{P} = \{P_e\}$ with distinct singletons we will choose a $\le j$-configuration $\vec{Q}^{\vec{P}}$ and a $\mathcal{N}_j, \ldots, \mathcal{N}_1$-type $\vec{p}^{\vec{P}}$.

Once we have completed this definition for $j$, it is natural to define $\mathcal{N}_j, \ldots, \mathcal{N}_1$-$[r_{j'}]$-types for any $\le j'$-configuration: when $\vec{P}$ is a $\le j'$-configuration with $j' > j$, we define $\vec{p}^{\vec{P} \restriction j}$ to be the type with $\vec{p}^{\vec{P} \restriction j}_e = \vec{p}^{\vec{P} \restriction e}_e$ for all $e \in \bigcup_{j_0 \le j} \binom{[j']}{j_0}$.

We inductively arrange that:
(1) the choices are cumulative: for $j' < j$ and $e \in \binom{[j]}{r_{j'}}$, $\vec{Q}^{\vec{P}}_e = \vec{Q}^{\vec{P} \restriction e}_e$ and $\vec{p}^{\vec{P}}_e = \vec{p}^{\vec{P} \restriction e}_e$,
(2) the choices are symmetric: if $\pi : [j] \to [j]$ is a permutation then, for each $\vec{P}$, $(\vec{Q}^{\vec{P}^\pi})^{\pi^{-1}} = \vec{Q}^{\vec{P}}$ and $(\vec{p}^{\vec{P}^\pi})^{\pi^{-1}} = \vec{p}^{\vec{P}}$,
(3) the types $\vec{p}^{\vec{P}}$ refine the configurations: for each $\vec{P}$, each $j' \le j$, and each $e \in \binom{[j]}{r_{j'}}$, $\vec{p}^{\vec{P}}(c_{j'}) = \vec{Q}^{\vec{P}}_e$,
(4) $\vec{p}^{\vec{P}}$ is a positive dense type for $\vec{Q}^{\vec{P}}_{[j]}$,
(5) for every $j' \in (j, d]$ and every $\le j'$-configuration $\vec{P}$ with distinct singletons, the type $\vec{p}^{\vec{P} \restriction j}$ is a dense type for $\vec{P}_{[j']}$,
(6) few points belong to configurations represented by types which make the sets in the configuration very sparse: the set of points belonging to $\le d$-configurations $\vec{P}$ with distinct singletons such that, for some $e \in \bigcup_{j' \in (j, d)} \binom{[d]}{r_{j'}}$, the density of $P_e$ at the $\mathcal{N}_j, \ldots, \mathcal{N}_1$-$e$-type $\vec{p}^{\vec{P} \restriction e}$ is $< \epsilon^k$, has measure at most $\epsilon$.

We will describe a random construction of the entire sequence and then argue that, with positive probability, we can find a choice satisfying these conditions.

For $j = 1$, a $\le 1$-configuration is just a single set $P \in \mathcal{N}^{c_1}_1$. If $\mu(P) = 0$, we take $Q^P$ to be some set in $\mathcal{N}^{c_1}_1$ with positive measure, otherwise we take $Q^P = P$. We then take $\vec{p}^P$ to be $\mathrm{tp}_{\mathcal{N}_1}(x)$ for a randomly chosen element $x \in Q^P$.

Suppose we have completed the construction for $j$. Consider the equivalence classes consisting of $\le j+1$-configurations $\vec{P}$ where we identify configurations under permutations of $[j+1]$; we will choose a single representative from each such equivalence class.

We can choose $\vec{Q}^{\vec{P}}$ and $\vec{p}^{\vec{P}}$ as follows. We first consider the non-defective configurations.
Consider a $\le j+1$-configuration $\vec{P}$ with distinct singletons such that:
(1) for each $e \subsetneq [r_{j+1}]$, we have $\vec{Q}^{\vec{P}}_e = \vec{P}_e$, and
(2) $\chi^+_{\vec{P}_{[r_{j+1}]}}(\{\vec{p}^{\vec{P}}_e\}_{e \subsetneq [r_{j+1}]}) > 0$ (the values $\chi^+_{\vec{P}_{[r_{j+1}]}}(\{\vec{p}^{\vec{P}}_e\}_{e \subsetneq [r_{j+1}]})$ exist because we will arrange for (5) to hold).

Then we set $\vec{Q}^{\vec{P}}_{[r_{j+1}]} = \vec{P}_{[r_{j+1}]}$ and choose $\vec{p}^{\vec{P}}_{[r_{j+1}]}$ to be a random refinement of $\vec{Q}^{\vec{P}}_{[r_{j+1}]}$ at $\{\vec{p}^{\vec{P}}_e\}_{e \subsetneq [r_{j+1}]}$, as described in the previous subsection. (The choice of the $\vec{p}^{\vec{P}}$ is the only non-deterministic part of the construction.)

Consider a defective configuration $\vec{P}$ such that for each $e \subsetneq [r_{j+1}]$, we have $\vec{Q}^{\vec{P}}_e = \vec{P}_e$ but $\chi^+_{\vec{P}_{[r_{j+1}]}}(\{\vec{p}^{\vec{P}}_e\}_{e \subsetneq [r_{j+1}]}) = 0$. (That is, $\vec{P} \restriction j$ was fine, but the top level component $\vec{P}_{[r_{j+1}]}$ makes it defective.) Since $\{\vec{p}^{\vec{P}}_e\}_{e \subsetneq [r_{j+1}]}$ is a dense type for all $P \in \mathcal{N}^{c_{j+1}}_{j+1}$ and these sets are a partition, there is some $P \in \mathcal{N}^{c_{j+1}}_{j+1}$ with $\chi^+_P(\{\vec{p}^{\vec{P}}_e\}_{e \subsetneq [r_{j+1}]}) > 0$, and we take $\vec{Q}^{\vec{P}}_{[r_{j+1}]} = P$ and $\vec{p}^{\vec{P}}_{[r_{j+1}]} = \vec{p}^{\vec{Q}^{\vec{P}}}_{[r_{j+1}]}$.

Finally, consider a defective configuration such that, for some $e \subsetneq [r_{j+1}]$, $\vec{Q}^{\vec{P}}_e \neq \vec{P}_e$. Then we wish to simply follow along with the "corrected" configuration: define $\vec{P}'$ by $\vec{P}'_e = \vec{Q}^{\vec{P}}_e$ for $e \subsetneq [j+1]$ and $\vec{P}'_{[r_{j+1}]} = \vec{P}_{[r_{j+1}]}$, and set $\vec{Q}^{\vec{P}}_{[r_{j+1}]} = \vec{Q}^{\vec{P}'}_{[r_{j+1}]}$ (which was already defined in one of the previous cases) and $\vec{p}^{\vec{P}}_{[r_{j+1}]} = \vec{p}^{\vec{P}'}_{[r_{j+1}]}$.

To satisfy symmetry, we define $\vec{Q}$ and $\vec{p}$ for permutations of $\vec{P}$ in the unique way determined by symmetry. Note that we need to use the fact that $\vec{P}$ has distinct singletons to make sure that no permutation other than the identity maps $\vec{P}$ to itself, so the symmetry requirement imposes no further restrictions on our choices.

We need to check that, with positive probability, the choice of the $\vec{p}^{\vec{P}}$ satisfies the six conditions above. The first three follow immediately from the construction.

The fourth and fifth properties hold with probability 1, so we can certainly choose the types $\vec{p}^{\vec{P}}$ to satisfy these properties.

For the sixth property, note that the $\le k$-configurations $\vec{P}$ such that there exists an $e \subseteq [k]$ and a $j < |e|$ such that the density of $P_e$ in $T_{\bigcup_{j'} \binom{[k]}{j'}}(\vec{P})$ is $< \epsilon^k$ have measure at most $\epsilon$. Consider a $\le k$-configuration such that this does not happen. As we observed in the previous section, the corresponding $\mathcal{N}_j, \ldots, \mathcal{N}_1$-$e$-type $\vec{p}^{\vec{P}}$ is chosen with the same distribution as choosing the type of a random point in $T_{\bigcup_{j'} \binom{[k]}{j'}}(\vec{P})$. By our choice of $c_j$, for each $\le k$-configuration $\vec{P}$, the probability that there is any $e$, $j$ so that $P_e$ has density 0 in $\vec{p}^{\vec{P} \restriction j}$ is at most $\epsilon/3$. By averaging over all $\le k$-configurations (weighted by their size), there is positive probability that we choose the $\vec{p}^{\vec{P}}$ so that the set of $\le k$-configurations failing the condition in (6) has measure at most $\epsilon$.

This last condition implies that most points belong to non-defective configurations: the only way there is an $e$ with $\vec{P}_e \neq \vec{Q}^{\vec{P}}_e$ is if there is an $e$ so that $P_e$ has density 0 in the corresponding type of lower arity, which means all such configurations are contained in the set of exceptional configurations.

Defining $\rho'$ for most tuples: At this point, we have done enough to define $\rho'$ on $\le k$-configurations $\vec{P}$ with distinct singletons.

Let $\Xi$ be the set of non-empty subsets of $\Sigma$, and for each $\le k$-configuration $\vec{P}$ with distinct singletons, let $\xi(\vec{P}) = \{\sigma \in \Sigma \mid (f_\sigma)^+_{\mathcal{N}_{k-1}}(\vec{p}^{\vec{P}}) > 0\}$. (Since the $f_\sigma$ add to 1, $\xi(\vec{P})$ is always non-empty, and therefore in $\Xi$.) We will define $\rho'$ on $T_{\bigcup_{j'} \binom{[k]}{j'}}(\vec{P})$ by setting
$$\rho'(\vec{x}) = \begin{cases} \rho(\vec{x}) & \text{if } \rho(\vec{x}) \in \xi(\vec{P}), \\ \text{some } \sigma \in \xi(\vec{P}) & \text{otherwise.} \end{cases}$$

Note that if $\rho(\vec{x}) \neq \rho'(\vec{x})$, we must have one of:
• the $\le k$-configuration containing $\vec{x}$ does not have distinct singletons,
• the $\le k$-configuration containing $\vec{x}$ is defective, or
• $\rho(x) \notin \xi(\vec{P})$.

The first two conditions account for points of measure at most $\epsilon/$[…]. If $\rho(x) \notin \xi(\vec{P})$ then we have $(f_{\rho(x)})^+_{\mathcal{N}_{k-1}}(\vec{p}^{\vec{P}}) = 0$. Since the $\vec{p}^{\vec{P}}$ are distributed uniformly at random, except on a set of configurations of measure at most $\epsilon/6$, $(f_{\rho(x)})^+_{\mathcal{N}_{k-1}}(\vec{p}^{\vec{P}}) = 0$ implies that the set of points in $\vec{P}$ with color $\rho(x)$ has measure at most $(\epsilon/$[…]$)\,\mu(T_{\bigcup_{j'} \binom{[k]}{j'}}(\vec{P}))$. Therefore (regardless of how we define $\rho'$ on the $\le k$-configurations which do not have distinct singletons),
$$\mu(\{\vec{x} \mid \rho(\vec{x}) \neq \rho'(\vec{x})\}) < \epsilon.$$

Dealing with tuples without distinct singletons: Next, again as in Theorem 3.12, we need to decide what to do with the configurations which do not have distinct singletons.

To motivate the construction, it is useful to look at how we will use our definition. Suppose that, after finishing the definition of $\rho'$, we have some $\vec{x}_W \in T_{W,c}(\Omega, \rho')$. Then $\vec{x}_W$ induces some maps into our partition: for $j < d$ we can take $\theta_j : \binom{W}{r_j} \to \mathcal{N}^{c_j}_j$ given by $\theta_j(e) = \mathcal{N}^{c_j}_j(\vec{x}_e)$. Then for $j \le d$, whenever $e \in \binom{W}{r_j}$ and $\theta_1$ is injective on $e$, we can define $\Theta(e)$ to be the $\le d$-configuration $\{\theta_j(e')\}_{e' \in \bigcup_{j'} \ldots}$ […] $\binom{W}{k}$ are mapped. Let us consider tuples $(W, \zeta, \iota, \{\theta_j\})$ where $(W, \zeta, \{\theta_j\})$ is a strict blow up and $\iota : (\binom{W}{k} \setminus \mathrm{dom}(\zeta)) \to \Sigma$. (We make the choice here to have $\zeta$ take values in $\Xi$ while $\iota$ only takes values in $\Sigma$; it would cause no harm, except perhaps additional complication, to instead let $\iota$ be $\Xi$-valued as well.)

We want to consider tuples $(W, \zeta, \iota, \{\theta_j\})$ where $\iota$ is "homogeneous", in the sense that $\iota(e)$ only depends on the configuration $e$ is mapped to. To make this precise, let us say $\vec{P} = \{P_e\}$ is a $\le k$-configuration with repeated singletons if each $P_e \in \mathcal{N}^{c_{|e|}}_{|e|}$ and $P_e$ is defined for all $e$ such that, for $i \neq i'$ in $e$, $P_i \neq P_{i'}$. (That is, when $e$ contains repeated elements of $\mathcal{N}^{c_1}_1$, we simply do not define $P_e$.) Let us define $Z$ to be the set of $\le k$-configurations with repeated singletons.

We can extend the definition of $\Theta$ to those $e \in \binom{W}{r_j}$ where $\theta_1$ is not injective on $e$ by defining $\Theta(e) \in Z$ to be $\{\theta_j(e')\}_{e' \in \bigcup_{j'} \ldots}$ […] Given $\nu : Z \to \Sigma$, let us say $(W, \zeta, \iota, \{\theta_j\})$ is $\nu$-homogeneous if, for all $e \in \binom{W}{k} \setminus \mathrm{dom}(\zeta)$, $\iota(e) = \nu(\Theta(e))$.

Given $\vec{x}_W \in \Omega^W$, we can of course induce a function $\iota : \binom{W}{k} \to \Sigma$ by taking $\iota(e) = \rho(\vec{x}_e)$. So what we need to do is find such $\vec{x}_W$ which are homogeneous.

Let us say $(W, \zeta, \{\theta_j\})$ has size at least $d$ if for every non-defective $\le k$-configuration $\vec{P}$ with distinct singletons, there are at least $d$ $k$-tuples $e \in \binom{W}{k}$ with $\Theta(e) = \vec{P}$.

Observe that, for every $(W, \zeta, \{\theta_j\})$, there is some $d$ so that whenever $(W', \zeta', \{\theta'_j\})$ has size at least $d$, there is an embedding $\pi : W \to W'$ so that, for all $e$, $\theta_j(e) = \theta'_j(\pi(e))$. Furthermore, for every $d$, there is an $m$ so that for any $(W, \zeta, \iota, \{\theta_j\})$ with size at least $m$, there is a $W_0 \subseteq W$ so that $(W_0, \zeta, \iota, \{\theta_j\})$ has size at least $d$ and is homogeneous.

So, for each $d$, we can take this large enough $m$ and fix a $(W, \zeta, \{\theta_j\})$ of size at least $m$. We have $t_{W,\zeta}(\Omega, \xi) > 0$ and, for each $\vec{x}_W \in T_{W,\zeta}(\Omega, \xi)$, we have a $W_0 \subseteq W$ of size at least $d$ so that $\vec{x}_{W_0}$ is homogeneous (that is, there is a $\nu : Z \to \Sigma$ so that, for $e \in \binom{W_0}{k}$, $\mathcal{P}(\vec{x}_e) = \nu(\Theta(e))$; equivalently, $(W_0, \zeta, \iota, \{\theta_j\})$ is $\nu$-homogeneous, where $\iota$ is induced by $\vec{x}_{W_0}$). Since there are finitely many $\nu$, there must be some $\nu$ which we obtain for a set of $\vec{x}_W$ of positive measure. Such a $\nu$ exists for every $m$, so there is some $\nu$ which works for arbitrarily large $m$.
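The homogenization step is a Ramsey-type statement: once a structure is large enough, a homogeneous substructure of a prescribed size must appear. The proof needs a version adapted to blow ups, which is not reproduced here. As a much simpler stand-in, the sketch below checks by brute force the classical instance that every 2-coloring of the pairs from 6 points contains a monochromatic triple, while 5 points do not suffice. The function names are hypothetical, and this is ordinary graph Ramsey theory rather than the construction used in the proof.

```python
from itertools import combinations, product


def has_mono_triangle(n: int, coloring: dict) -> bool:
    """True if some triple a < b < c has all three pairs the same color."""
    return any(
        coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )


def every_coloring_has_mono_triangle(n: int) -> bool:
    """Exhaustively check all 2-colorings of the pairs from range(n)."""
    edges = list(combinations(range(n), 2))
    for colors in product((0, 1), repeat=len(edges)):
        if not has_mono_triangle(n, dict(zip(edges, colors))):
            return False  # found a coloring with no homogeneous triple
    return True


if __name__ == "__main__":
    print(every_coloring_has_mono_triangle(5))
    print(every_coloring_has_mono_triangle(6))
```

The search over 6 points visits $2^{15}$ colorings, which is small enough to run directly; the analogous statement for larger homogeneous sets holds but is out of reach of this kind of exhaustive check.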

We pick such a $\nu$ and use it to complete the definition of $\rho'$: when $\vec{x} \in T_{\bigcup_{j'} \binom{[k]}{j'}}(\vec{P})$ for some $\vec{P} \in Z$, we set $\rho'(\vec{x}) = \nu(\vec{P})$.

Checking that removal holds: All that remains is to show that whenever $T_{W,c}(\Omega, \rho') \neq \emptyset$ we have $t_{W,c}(\Omega, \rho) > 0$. Consider some $W$ and $(W, c)$ so that $T_{W,c}(\Omega, \rho') \neq \emptyset$. Choose some $\vec{x}_W \in T_{W,c}(\Omega, \rho')$.

From $\vec{x}_W$ we can read off $(W, \zeta, \{\theta_j\})$ by setting, for $e \in \binom{W}{r_j}$, $\theta_j(e) = \vec{Q}^{\{\mathcal{N}^{c_{j'}}_{j'}(\vec{x}_{e'})\}_{e' \in \bigcup_{j' \le j} \binom{e}{r_{j'}}}}$ and $\zeta(e) = \xi(\vec{x}_e)$. There is $d$ so that $(W, \zeta, \{\theta_j\})$ will embed in any blow up of the partition of size at least $d$. Take $(W', \zeta', \{\theta'_j\})$ of size at least $m$ where $m$ is large enough; then $t_{W',\zeta'}(\Omega, \rho) > 0$ and, by the choice of $\nu$, there is a positive measure of $\vec{y}_{W'} \in T_{W',\zeta'}(\Omega, \rho)$ such that there is a $W_0 \subseteq W'$ so that $(W_0, \zeta', \{\theta'_j\})$ has size at least $d$ and $\vec{y}_{W_0}$ is $\nu$-homogeneous.

So consider one of these $\vec{y}_{W_0}$ where, for all $e \in \binom{W_0}{k}$, $\vec{y}_e$ is a point of density for each $f_\sigma$ and a positive point of density for $f_{\mathcal{P}(\vec{y}_e)}$. Fix an embedding $\pi : W \to W_0$. We claim that, for each $e \in \binom{W}{k}$, $f^+_{c(e)}(\vec{y}_{\pi(e)}) > 0$.

First suppose $\Theta(e) \notin Z$, so $\Theta(e)$ is a non-defective $\le k$-configuration with distinct singletons. Then $c(e) = \rho'(\vec{x}_e)$ and, by the definition of $\rho'$, $\rho'(\vec{x}_e) \in \xi(\vec{p}^{\Theta(e)}) = \zeta(e) = \zeta'(\pi(e)) = \xi(\vec{y}_{\pi(e)})$. Therefore $c(e) \in \xi(\vec{y}_{\pi(e)})$, so $f^+_{c(e)}(\vec{y}_{\pi(e)}) > 0$.

Next suppose $\Theta(e) \in Z$. Again $c(e) = \rho'(\vec{x}_e)$ and, by the definition of $\rho'$, $\rho'(\vec{x}_e) = \nu(\Theta(e))$. Since $\vec{y}_{W_0}$ is $\nu$-homogeneous, we have $\rho(\vec{y}_{\pi(e)}) = \nu(\Theta(e))$.

So we can apply Theorem 4.12 to $\vec{y}_{\pi(W)}$, showing that $t_{W,c}(\Omega, \rho) > 0$. □

Ordered Hypergraphs

The work of the previous section applies, with only minimal changes, to ordered hypergraphs.
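To fix ideas about what an induced ordered copy is, here is a minimal finite sketch for the graph case $k = 2$. It rests on several assumptions that are not taken from the text: the vertex set is `range(n)` with its natural order, the measure is uniform, the coloring is given as a function on increasing pairs, and the density is normalized by $n^{|W|}$. The function names are hypothetical.

```python
from itertools import combinations


def ordered_induced_copies(n, color, w, pattern):
    """Count increasing w-tuples x_0 < ... < x_{w-1} in range(n) such that
    color(x_a, x_b) == pattern[(a, b)] for every pair a < b."""
    count = 0
    for xs in combinations(range(n), w):  # combinations yields increasing tuples
        if all(color(xs[a], xs[b]) == pattern[(a, b)]
               for a, b in combinations(range(w), 2)):
            count += 1
    return count


def ordered_density(n, color, w, pattern):
    """Fraction of all w-tuples from range(n) forming an ordered induced copy."""
    return ordered_induced_copies(n, color, w, pattern) / n ** w


if __name__ == "__main__":
    # Host: the path 0-1-2-3, colored 1 on edges and 0 on non-edges.
    path = lambda x, y: 1 if y - x == 1 else 0
    monotone_path = {(0, 1): 1, (1, 2): 1, (0, 2): 0}
    center_first = {(0, 1): 1, (0, 2): 1, (1, 2): 0}
    print(ordered_induced_copies(4, path, 3, monotone_path))  # 2
    print(ordered_induced_copies(4, path, 3, center_first))   # 0
```

Both patterns are the same three-vertex path as unordered graphs, yet only the monotone ordering occurs in this host. This is why the ordering has to be tracked throughout the removal argument rather than added afterwards.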

Deﬁnition 5.1.

When (Ω , < ) is a linearly ordered set and ( W, ≺ ) is a ﬁnitelinear order, we write O

For each $k$, $\mathcal{B}_{k,<}$ is the sub-$\sigma$-algebra of $\mathcal{B}_k$ generated by all products $\prod_{i \le k} I_i$ where each $I_i$ is an interval in $<$.

Note that, by definition, $\mathcal{B}_{k,<} \subseteq \mathcal{B}_{k,1}$.

Lemma 5.3. If $\{(\Omega^k, \mathcal{B}_k, \mu_k)\}_{k \in \mathbb{N}}$ […] then $\{(x, y) \mid x < y\} \in \mathcal{B}_{2,<}$.

Proof. We show that, for any $\epsilon > 0$, we may approximate $\{(x, y) \mid x < y\}$ to within $\epsilon$. Given $\epsilon > 0$, write $\Omega = \bigcup_i \ldots$ […] $\{(x, y) \mid x < y\} \in \mathcal{B}_{2,<}$. □

In general, this means that when $(W, \prec)$ is a partially ordered set, the set of $\vec{x}_W$ which respect the partial ordering is $\mathcal{B}_{W,<}$-measurable, since it is an intersection of sets of the form $\{x_W \mid x_w < x_{w'}\}$.

Theorem 5.4.

Let $(W, \prec)$ be a partially ordered finite set, let $\mathcal{N}_d, \ldots, \mathcal{N}_1, \mathcal{N}_{1,<}$ be a properly aligned sequence of systems of neighborhoods so that $\mathcal{N}_i$ is a nested system of neighborhoods with arity $r_i$, let $S$ be a set of subsets of $W$, and suppose that $\vec{p}_W$ is a $\mathcal{N}_d, \ldots, \mathcal{N}_1, \mathcal{N}_{1,<}$-type such that:
• when $w \prec w'$, there is an $i$ so that $\vec{p}_w(i) < \vec{p}_{w'}(i')$,
• for each $e \in S$, the restriction of $\vec{p}_W$ to $e$ is a positive dense type for $f_e$, and
• for each $e \in S$, either:
 – $f_e$ is $K_{e,r_d}(\mathcal{N}_d)$-measurable, or
 – for every $e' \in S \setminus \{e\}$, the function $\vec{x}_e \mapsto \int f_{e'}(\vec{x}_{e \cup e'}) \, d\mu(\vec{x}_{e' \setminus e})$ is $K_{e,r_d}(\mathcal{N}_d)$-measurable.

Then $t_{S,\prec}(\{f_e\}) > 0$.

Proof. We proceed by induction on $d$. When $d = 0$ (that is, we are considering a $\mathcal{N}_{1,<}$-type), the proof is similar to Theorem 3.9, taking care to respect the ordering.

Choose some $\epsilon \le \min_{e \in S} f^+_e(\vec{x}_e)$.

Since each $\vec{p}_e$ is a dense type, we may choose some $j$ large enough that, for each $e \in S$,
$$\frac{1}{\mu(\vec{p}_e(j))} \mu(\{\vec{y}_e \in \vec{p}_e(j) \mid f^+_e(\vec{y}_e) \ge \epsilon/2\}) \ge 1 - \frac{1}{|S|}.$$

Consider $\prod_{w \in W} p_w(j)$. This is a product of intervals and, when $j$ is large enough, the map $w \mapsto p_w(j)$ is order preserving. Therefore
$$\frac{1}{\mu(\vec{p}_W(j))} \mu(\{\vec{y}_W \in \vec{p}_W(j) \mid f^+_e(\vec{y}_e) \ge \epsilon/2 \text{ and, whenever } w \prec w', \ y_w < y_{w'}\}) \ge 1 - \frac{1}{|S|}.$$

Therefore
$$\gamma = \mu(\{\vec{y}_W \in \vec{p}_W(j) \mid \text{there is some } e \in S \text{ such that } f^+_e(\vec{y}_e) < \epsilon/2\}) < \frac{|S|}{|S|} = 1,$$
so, using Lemma 3.8,
$$t_{S,\prec}(\{f_e\}) = t_{S,\prec}(\{f^+_e\}) \ge \frac{\epsilon^{|S|}}{2^{|S|}} (1 - \gamma) > 0.$$

The argument from Theorem 4.12 applies unchanged for the inductive case since the set of ordered tuples is $\mathcal{N}_i$-measurable for all $i$. □

Theorem 5.5 (Ordered Hypergraph Removal). Let $\Sigma$ be a finite set and let $(\Omega, \rho, <)$ be given along with a countably approximated atomless Keisler graded probability space on $\Omega$ with $\rho : \binom{\Omega}{k} \to \Sigma$ such that each $\rho^{-1}(\sigma) \in \mathcal{B}$ and a dense collection of intervals of $<$ is in $\mathcal{B}$. For each $\epsilon > 0$ there is a $\rho' : \binom{\Omega}{k} \to \Sigma$ such that
$$\mu(\{\vec{x} \in \Omega^k \mid \rho(\vec{x}) \neq \rho'(\vec{x})\}) < \epsilon,$$
each $(\rho')^{-1}(\sigma) \in \mathcal{B}$, and, for all $(W, c, \prec)$, if $t_{W,c,\prec}(\Omega, \rho) = 0$ then $T_{W,c,\prec}(\Omega, \rho') = \emptyset$.

Proof. The proof is largely unchanged from the proof of Theorem 4.16, using the sequence of systems of neighborhoods $\mathcal{N}_{k-1}, \mathcal{N}_{k-2}, \ldots, \mathcal{N}_1, \mathcal{N}_{1,<}$, so we take $d = k + 1$, and using Theorem 5.4.

The only further step that needs to be checked carefully is the homogenization step when dealing with tuples without distinct singletons. Our definition of a blow up $(W, \zeta, \{\theta_j\})$ is unchanged; note that a partial ordering of $W$, on pairs where $\theta$ is injective, can be inferred from the assignment $\theta$.
Our homogeneous blow ups $(W, \zeta, \iota, \{\theta_j\}, \prec)$ are defined to have total orderings where $\prec$ is consistent with $\theta$. The crucial point is that the Ramsey-type property still holds: for every $d$, there is an $m$ so that for any $(W, \zeta, \iota, \{\theta_j\}, \prec)$ of size at least $m$, there is a $W_0 \subseteq W$ so that $(W_0, \zeta, \iota, \{\theta_j\}, \prec)$ has size at least $d$ and is homogeneous. The ordering is no obstacle to obtaining this by the usual Ramsey theoretic arguments, and the rest of the proof is unchanged. □

Corollary 5.6 (Ordered hypergraph removal lemma). For every finite set $\Sigma$, every $\epsilon > 0$, and every $\Sigma$-colored $(W, \prec, c)$, there is a $\delta > 0$ so that for any $(\Omega, <, \rho)$ with $t_{W,\prec,c}(\Omega, <, \rho) < \delta$ there is a $\rho'$ with $|\{\vec{x} \in \Omega^k \mid \rho(\vec{x}) \neq \rho'(\vec{x})\}| < \epsilon |\Omega|^k$ such that $T_{W,\prec,c}(\Omega, <, \rho') = \emptyset$.

Corollary 5.7 (Infinite ordered hypergraph removal lemma). For every finite set $\Sigma$, every $\epsilon > 0$, and every family $\mathcal{F}$ of finite $\Sigma$-colored ordered hypergraphs, there are $\delta > 0$ and a bound $M$ so that for any $(\Omega, <, \rho)$, if, for every $(W, \prec, c) \in \mathcal{F}$ with $|W| \le M$ we have $t_{W,\prec,c}(\Omega, <, \rho) < \delta$, then there is a $\rho'$ with $|\{\vec{x} \in \Omega^k \mid \rho(\vec{x}) \neq \rho'(\vec{x})\}| < \epsilon |\Omega|^k$ such that for every $(W, \prec, c) \in \mathcal{F}$, $T_{W,\prec,c}(\Omega, <, \rho') = \emptyset$.

Further Directions

We have not attempted to identify the correct common generalization of Theorems 4.16 and 5.5 to give a general theorem saying that certain structures can be removed while preserving some fixed structure. Such a theorem must make some promise about the measurability of the fixed structure, and additionally place some sort of Ramsey-type condition on it.

There are other examples in the literature where some distinguished family of sets analogous to $\mathcal{B}_{<}$ is of particular interest. In particular, [14] considers a computational setting; translated into our framework here, we add the assumption that the points of $\Omega$ are understood to have a structure like binary sequences $2^\Lambda$, embodied in a distinguished family of sets $\mathcal{B}_{1,c}$ which consists of those sets
$$[s] = \{\omega \mid \forall \lambda \in \mathrm{dom}(s) \ s(\lambda) = \omega(\lambda)\}$$
where $s$ is a partial function with finite domain from $\Lambda$ to $\{0, 1\}$. That is, the distinguished sets are those in which a finite number of coordinates have been fixed. The regularity lemma they prove is precisely the one corresponding to the sequence of $\sigma$-algebras $\mathcal{B}_1, \mathcal{B}_{1,c}$; extending the removal lemma to this setting (or to longer sequences $\mathcal{B}_d, \ldots, \mathcal{B}_1, \mathcal{B}_{1,c}$) would require identifying interesting structures to be the fixed part (analogous to the ordering) which are $\mathcal{B}_{1,c}$-measurable; that is, the relation symbols in this structure would have to have the property that they can be calculated on all but measure $\epsilon$ points while examining the input at only finitely many points in $\Lambda$.

References

[1] D. J. Aldous. "Representations for partially exchangeable arrays of random variables". In:

J. Multivariate Anal. ISSN: 0047-259X.
[2] N. Alon and O. Ben-Eliezer. "Efficient removal lemmas for matrices". In: Order. ISSN: 0167-8094.
[3] N. Alon, O. Ben-Eliezer, and E. Fischer. "Testing hereditary properties of ordered graphs and matrices". IEEE Computer Soc., Los Alamitos, CA, 2017, pp. 848–858.
[4] O. Ben-Eliezer et al. Limits of Ordered Graphs and their Applications. 2018. eprint: arXiv:1811.02023.
[5] B. van den Berg, E. Briseid, and P. Safarik. "A functional interpretation for nonstandard arithmetic". In: Ann. Pure Appl. Logic. ISSN: 0168-0072.
[6] F. R. K. Chung. "Regularity lemmas for hypergraphs and quasi-randomness". In: Random Structures Algorithms. ISSN: 1042-9832.
[7] D. Conlon et al. "Weak quasi-randomness for uniform hypergraphs". In: Random Structures Algorithms. ISSN: 1042-9832.
[8] L. N. Coregliano and A. A. Razborov. "Semantic limits of dense combinatorial objects". In: Uspekhi Mat. Nauk. ISSN: 0042-1316.
[9] P. Diaconis and S. Janson. "Graph limits and exchangeable random graphs". In: Rend. Mat. Appl. (7). ISSN: 1120-7183.
[10] G. Elek and B. Szegedy. "A measure-theoretic approach to the theory of dense hypergraphs". In: Adv. Math. ISSN: 0001-8708.
[11] J. Fox. "A new proof of the graph removal lemma". In: Ann. of Math. (2). ISSN: 0003-486X.
[12] F. Garbe et al. Limits of Latin squares. 2020. eprint: arXiv:2010.07854.
[13] I. Goldbring and H. Towsner. "An approximate logic for measures". English. In: Israel Journal of Mathematics. ISSN: 0021-2172.
[14] M. Göös, T. Pitassi, and T. Watson. "Query-to-communication lifting for BPP". In: SIAM J. Comput. ISSN: 0097-5397.
[15] W. T. Gowers. "Hypergraph regularity and the multidimensional Szemerédi theorem". In: Ann. of Math. (2). ISSN: 0003-486X.
[16] D. Hoover. Relations on Probability Spaces and Arrays of Random Variables. Preprint. Institute for Advanced Study, Princeton, NJ, 1979.
[17] C. Hoppen et al. "Limits of permutation sequences". In: J. Combin. Theory Ser. B. ISSN: 0095-8956.
[18] Y. Kohayakawa et al. "Weak hypergraph regularity and linear hypergraphs". In: J. Combin. Theory Ser. B. ISSN: 0095-8956.
[19] L. Lovász. Large networks and graph limits. Vol. 60. American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, RI, 2012, pp. xiv+475. ISBN: 978-0-8218-9085-1.
[20] L. Lovász and B. Szegedy. "Limits of dense graph sequences". In: J. Combin. Theory Ser. B. ISSN: 0095-8956.
[21] G. Moshkovitz and A. Shapira. "A tight bound for hypergraph regularity". In: Geom. Funct. Anal. ISSN: 1016-443X.
[22] B. Nagle, V. Rödl, and M. Schacht. "The counting lemma for regular k-uniform hypergraphs". In: Random Structures Algorithms. ISSN: 1042-9832.
[23] V. Rödl and J. Skokan. "Regularity lemma for k-uniform hypergraphs". In: Random Structures Algorithms. ISSN: 1042-9832.
[24] "Semantic Limits of Dense Combinatorial Objects". In: ArXiv abs/1910.08797 (2019).
[25] T. Tao. "A variant of the hypergraph removal lemma". In: J. Combin. Theory Ser. A. ISSN: 0097-3165.
[26] T. Tao. "Szemerédi's regularity lemma revisited". In: Contrib. Discrete Math. ISSN: 1715-0868.
[27] H. Towsner. What do ultraproducts remember about the original structures? Draft. Apr. 2018.
[28] H. Towsner. "σ-algebras for quasirandom hypergraphs". In: Random Structures Algorithms. ISSN: 1042-9832.
[29] H. Towsner. "An analytic approach to sparse hypergraphs: hypergraph removal". In: Discrete Analysis (2018).

Department of Mathematics, University of Pennsylvania, 209 South 33rd Street, Philadelphia, PA 19104-6395, USA
