A REMOVAL LEMMA FOR ORDERED HYPERGRAPHS
HENRY TOWSNER
Abstract.
We prove a removal lemma for induced ordered hypergraphs, simultaneously generalizing Alon–Ben-Eliezer–Fischer’s removal lemma for ordered graphs and the induced hypergraph removal lemma. That is, we show that if an ordered hypergraph $(V, G, <)$ has few induced copies of a small ordered hypergraph $(W, H, \prec)$ then there is a small modification $G'$ so that $(V, G', <)$ has no induced copies of $(W, H, \prec)$. (Note that we do not need to modify the ordering $<$.)

We give our proof in the setting of an ultraproduct (that is, a Keisler graded probability space), where we can give an abstract formulation of hypergraph removal in terms of sequences of $\sigma$-algebras. We then show that ordered hypergraphs can be viewed as hypergraphs where we view the intervals as an additional notion of a “very structured” set. Along the way we give an explicit construction of the bijection between the ultraproduct limit object and the corresponding hypergraphon.

1. Introduction
In this paper, we will show a removal lemma for ordered hypergraphs, a simultaneous generalization of the removal lemma for ordered graphs [2, 3] and for hypergraphs [15, 22, 23].

As in similar results, the methods naturally generalize to finite colorings of $k$-tuples (“hypermatrices over a finite alphabet”). Therefore, in full generality, our main result is the following.

Corollary 5.6.
Let $\epsilon > 0$ be given and let $\Sigma$ be a finite alphabet. There is a $\delta > 0$ so that whenever $(\Omega, <)$ is an ordered set and $\rho : \binom{\Omega}{k} \to \Sigma$, there is a $\rho' : \binom{\Omega}{k} \to \Sigma$ such that:
• $|\{\vec{x} \in \binom{\Omega}{k} \mid \rho(\vec{x}) \neq \rho'(\vec{x})\}| < \epsilon |\Omega|^k$, and
• for each ordered set $(W, \prec)$ with $|W| < 1/\epsilon$ and each coloring $c : \binom{W}{k} \to \Sigma$, either:
  – $(\Omega, \rho', <)$ contains no copies of $(W, c, \prec)$ (that is, there are no order-preserving functions $\pi : W \to \Omega$ such that $\rho'(\pi(\vec{x})) = c(\vec{x})$ for all $\vec{x} \in \binom{W}{k}$), or
  – $(\Omega, \rho, <)$ contains many copies of $(W, c, \prec)$ (that is, the set of order-preserving functions $\pi : W \to \Omega$ such that $\rho(\pi(\vec{x})) = c(\vec{x})$ for all $\vec{x} \in \binom{W}{k}$ has size at least $\delta |\Omega|^{|W|}$).

Date: January 26, 2021. Partially supported by NSF grant DMS-1600263.
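As a concrete reading of the notion of a copy used in Corollary 5.6, the following Python sketch counts copies of a small colored ordered structure by brute force. The function name `count_ordered_copies`, the encoding of $W$ as $\{0, \ldots, |W|-1\}$ with its usual order, and the reading of "order-preserving" as strictly increasing (hence injective) are assumptions of this illustration, not notation from the paper.

```python
from itertools import combinations

def count_ordered_copies(omega, rho, w_size, c, k):
    """Count strictly increasing maps pi: W -> omega with rho(pi(e)) == c[e].

    omega  : iterable of comparable points (the ordered set)
    rho    : function from an increasing k-tuple of points to a color
    w_size : |W|, with W encoded as 0..w_size-1 (hypothetical encoding)
    c      : dict from increasing k-tuples of indices of W to colors
    """
    points = sorted(omega)
    count = 0
    # An increasing w_size-tuple of points is the image of exactly one
    # strictly order-preserving map from W.
    for image in combinations(points, w_size):
        if all(rho(tuple(image[i] for i in e)) == c[e]
               for e in combinations(range(w_size), k)):
            count += 1
    return count

# Example: color a pair by the parity of its sum, and look for triples
# whose three pairs are colored 1, 0, 1.
rho = lambda e: (e[0] + e[1]) % 2
c = {(0, 1): 1, (0, 2): 0, (1, 2): 1}
copies = count_ordered_copies(range(5), rho, 3, c, 2)
```

Dividing `copies` by $|\Omega|^{|W|}$ gives the quantity that the corollary compares against $\delta$.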
Coregliano and Razborov have recently [8] shown a result that could also plausibly be called ordered hypergraph removal. Their removal involves modifying the entire structure: one replaces $(\Omega, <, \rho)$ with $(\Omega, <', \rho')$, whereas the result here only modifies $\rho$. Their argument is quite general, applying to a wide range of structures. By contrast, our result is narrower, though we discuss at the end how the arguments might be generalized.

Our approach is to restate the usual proof of hypergraph removal in a sufficiently general way that the proof of ordered hypergraph removal falls out without much change. We will consider $k$-graphs which have a sequence of notions of “structured sets”. In the usual graph removal lemma, this sequence would have length 1: the only kind of structured set is the rectangles (that is, sets of edges of the form $A \times B$ for sets $A$ and $B$).

In the hypergraph removal lemma for $k$-graphs, the sequence of notions of structure has length $k-1$: the first, most general tier of structured sets are cylinder sets generated by $(k-1)$-tuples, the next tier is generated by $(k-2)$-tuples, and so on (the last tier being the boxes $\prod_{i \le k} A_i$, which are exactly the cylinder sets generated by 1-tuples).

Meanwhile, in the ordered graph case, the sequence of notions of structure has length 2: the more general tier of structured set is the rectangles $A \times B$ where $A, B$ are arbitrary sets of vertices, while the second, more restrictive tier of structure is sets of the form $I \times J$ where $I, J$ must be intervals in the ordering.

Once we have set up this general framework, ordered hypergraph removal will fall out almost instantly from the proof of hypergraph removal: we will use a sequence of notions of structure of length $k$, beginning with the cylinder sets generated by $(k-1)$-tuples, continuing down to boxes $\prod_{i \le k} A_i$ where the $A_i$ are arbitrary, and then adding an additional notion of structured set given by boxes of the form $\prod_{i \le k} I_i$ where the $I_i$ are intervals.

The idea that Szemerédi’s regularity lemma and its generalizations can be viewed in terms of nested notions of structure is present, for instance, in [25, 26], which describe Szemerédi’s regularity lemma in terms of conditional expectation.

Working with multiple layers of structure typically requires fairly complicated dependencies to correctly express bounds in the finite setting, so it is convenient to pass to an infinitary, analytic setting where we can “let $\epsilon$ equal 0”, that is, where some of the bounds will disappear into a measure-theoretic limit object.

There are two main approaches to representing the notion of structure in such a formalism. To be explicit, consider the case of a 3-graph; in the finite setting, we have a large vertex set $V$ and consider some symmetric set $H \subseteq V^3$. In one approach to limit objects, the graphon approach (e.g. [20]), the limit object is an uncountable space $\Lambda$ and a measurable function $f : \Lambda^6 \to [0,1]$. The first three coordinates correspond to the vertices of $V$, while the additional three coordinates correspond to the pairs of coordinates.
More generally, if we began with sets $H \subseteq V^k$, the limit object would involve functions on $\Lambda^{2^k-2}$. (The value $2^k - 2$ is the number of subsets of $\{1, 2, \ldots, k\}$ except for the empty set and the whole set, which represents a “purely quasirandom” component which is omitted from the limit object.)

Analogously, an ordered graph is a symmetric $H \subseteq V^2$ where $V$ is an ordered set. An orderon [4] (the graphon-like limit object corresponding to an ordered graph) is then a function from $\Lambda^4$ to $[0,1]$: elements of $V$ are analogous to pairs from $\Lambda$, where the first component represents the information about the ordering and the second component the additional information which is present in the vertex and which is not explained by its position in the ordering. (Permutons [17] and latinons [12], limit objects for permutations and Latin squares respectively, similarly acquire “extra” coordinates in this way.)

It is both an advantage and a disadvantage of this representation that it fully separates out the interactions between these coordinates: the coordinates are combined as a familiar product measure space (it is common to take $\Lambda = [0,1]$, so that the domain is $[0,1]^n$ for some $n$) and one can use standard results (for instance, the Lebesgue density theorem, as in [10]) on the space. However, because of this de-association of the coordinates, it is difficult to interpret what the higher order coordinates “mean” in a general way: when we represent a 3-graph with a function $f(x, y, z, u, v, w)$, it is difficult to say concretely what a particular value of $u$ means. (Indeed, it is artificial, and a bit misleading, to represent these objects as powers of a single space: for instance, there is no reason to think we can swap the $z$ and $u$ coordinates in a meaningful way.)

Here we prefer a different approach to the limit object where the limit objects have a more familiar form: our version of a limit of 3-graphs will be a subset of $\Omega^3$ for an uncountable set $\Omega$, and our version of an ordered graph will be a subset of $\Omega^2$ where $\Omega$ is an ordered set.
The price is that we must work with a Keisler graded probability space (see Section 2.2). This means that the measurable sets are more complicated: in addition to the sets of pairs given by the standard product measure construction, there are typically additional measurable sets of pairs which include the quasirandom sets.

Footnotes to the discussion of hypergraphons above:
• In a third, related, setting, arrays of exchangeable random variables [1, 9, 16], the coordinate corresponding to the empty set is typically included, but easily eliminated because an exchangeable array of random variables is a combination of dissociated arrays, where the dissociated arrays are precisely those where the coordinate corresponding to the empty set can be ignored. A natural generalization of a $k$-hypergraphon would be to add the coordinate corresponding to the empty set; this would roughly represent an ensemble of $k$-hypergraphons rather than a single such object.
• The coordinate corresponding to the full set is related to why we end up with a function rather than a set: we could work with a set $F \subseteq \Lambda^{2^k-1}$, and think of $f(\vec{\omega}) = \int \chi_F(\vec{\omega}, u)\,du$ where $u$ is the extra coordinate.

In this setting, different kinds of structure are identified by looking at sub-$\sigma$-algebras of measurable sets, which represent notions of structure [28, 29]. For instance, when we consider a 3-graph $H \subseteq \Omega^3$, we have a collection $\mathcal{B}_2$ of measurable subsets of $\Omega^2$, containing all the information we need about the first two coordinates, but there is also a product $\sigma$-algebra, $\mathcal{B}_{2,1} \subseteq \mathcal{B}_2$, which is the collection of measurable sets generated by rectangles. So the first two coordinates of $\Lambda^6$ correspond to determining information about sets in $\mathcal{B}_{2,1}$, while the fourth coordinate of $\Lambda^6$ corresponds to information about the quasirandom elements of $\mathcal{B}_2$.

Formally, the two approaches are linked by a suitable map $\pi : \Omega^3 \to \Lambda^6$ where, for instance, $\pi^{-1}(C \times \Lambda^4)$ must give a set of the form $A \times \Omega$ where $A$ is $\mathcal{B}_{2,1}$-measurable, while $\pi^{-1}(\Lambda^3 \times B \times \Lambda^2)$ must give a set of the form $A \times \Omega$ where $A$ is quasirandom. (In fact, our approach to the proof will lead us to construct something close to an explicit version of this map.)

Similarly, when we have an ordering on $\Omega$ we have both the collection $\mathcal{B}_1$ of all measurable subsets of $\Omega$, and also a sub-$\sigma$-algebra $\mathcal{B}_<$ which is generated by the intervals. In this setting, we will be able to prove:
Theorem 5.5.
Let $\Sigma$ be a finite alphabet and let $(\Omega, <)$ be a set together with a Keisler graded probability space on $\Omega$ such that $<$ is measurable, and suppose $\rho : \binom{\Omega}{k} \to \Sigma$ is measurable. For each $w$ and each $\epsilon > 0$, there is a $\rho' : \binom{\Omega}{k} \to \Sigma$ such that
• $\mu(\{\vec{x} \mid \rho(\vec{x}) \neq \rho'(\vec{x})\}) < \epsilon$, and
• for each ordered set $(W, \prec)$ and coloring $c : \binom{W}{k} \to \Sigma$, either:
  – $(\Omega, \rho', <)$ contains no copies of $(W, c, \prec)$ (that is, there are no order-preserving functions $\pi : W \to \Omega$ such that $\rho'(\pi(\vec{x})) = c(\vec{x})$ for all $\vec{x} \in \binom{W}{k}$), or
  – $(\Omega, \rho, <)$ contains many copies of $(W, c, \prec)$ (that is, the set of order-preserving functions $\pi : W \to \Omega$ such that $\rho(\pi(\vec{x})) = c(\vec{x})$ for all $\vec{x} \in \binom{W}{k}$ has positive measure).

Footnote (on the two approaches to limit objects): The association between these approaches is quite systematic. For instance, the fact that orderons are functions with domain $\Lambda^4$ is essentially telling us that when we look at the Keisler graded probability space, we should be paying attention to a particular sub-$\sigma$-algebra of $\mathcal{B}_1$, namely $\mathcal{B}_<$.

Corollary 5.6 follows immediately by a standard ultraproduct argument as described in [13].

The main technique in our proof will be reproducing, in this setting, something like the Lebesgue density theorem: a way of defining a notion of density in this setting so that almost every point is dense.

The use of infinitary and measure-theoretic arguments in proofs like this is superficial: rather than interpreting the proof of Theorem 5.5 as actually involving infinite sets, one can interpret these as abbreviations for complicated statements about finite sets. In particular, although the proof does not explicitly give bounds on the relationship between $\delta$ and $\epsilon$ in Theorem 5.5, it is routine (though quite tedious) to translate the proof into an explicit combinatorial one with bounds.
(See [5, 27] for formal descriptions of how this may be done in general.)

The proof here uses a familiar structure, so we can already tell roughly what upper bound it gets: the “unwound” proof of Theorem 5.5 for ordered $k$-graphs goes through a result similar to the regularity lemma for $(k+1)$-graphs. This means the upper bound for removal of ordered $k$-graphs would be on the order of the $(k+2)$-th function in the fast-growing hierarchy. (Recall that the second function in this family is roughly exponential, and later functions are obtained by iterating the previous function, so the 3rd function is iterated exponentiation, the 4th function is the “wowzer” function obtained by iterating the iterated exponential, and so on.) For hypergraph regularity, these bounds are known [21] to be tight. (For graph removal, better bounds [11] can be obtained by avoiding the regularity lemma. It seems likely that similar methods can produce at least some improvement on the upper bound for hypergraph and ordered hypergraph removal.)

2. Preliminaries
2.1. Cylinder Intersection Sets.
One of the central ideas in the proof of graph removal is approximating a graph $(V, E)$ using sets of vertices. In modern presentations, this idea is usually expressed using some form of the Szemerédi regularity lemma: we find a partition $V = \bigcup_{i \le n} V_i$ so that most of the bipartite graphs $(V_i, V_j, E \cap (V_i \times V_j))$ have a quasi-randomness property.

To turn this into a proof of graph removal, one of the key points is that almost all of the edges in $E$ will belong to bipartite graphs $(V_i, V_j, E \cap (V_i \times V_j))$ where the quasi-randomness property holds and the density $\frac{|E \cap (V_i \times V_j)|}{|V_i \times V_j|} \ge \epsilon > 0$. If we let $P$ be the set of pairs $(i, j)$ where this density is bounded away from 0, we end up considering a related graph, $E^+ = E \cap \left(\bigcup_{(i,j) \in P} V_i \times V_j\right)$, the edges which are near many other edges.

In order to generalize this to $k$-graphs, we need to generalize the idea of a product set to higher arity. The right notion is a cylinder intersection set: a cylinder intersection set of $k$-tuples is a collection of $k$-tuples defined by restricting the sets that certain $r$-sub-tuples can belong to for $r < k$. For instance, if $k = 3$ and $r = 2$, the prototypical cylinder intersection set is a set of the form
$$\{(x, y, z) \mid (x, y) \in A,\ (x, z) \in B \text{ and } (y, z) \in C\}.$$
We can see that a product is just a cylinder intersection set where we only consider sub-tuples with $r = 1$.
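The two cases just described, the $k = 3$, $r = 2$ prototype and an ordinary product as the $r = 1$ case, can be sketched over a finite ground set as follows. The function name `cylinder_intersection` and the encoding of each constraint by a tuple of coordinate positions are hypothetical choices made for this illustration.

```python
from itertools import product

def cylinder_intersection(omega, k, constraints):
    """Return the k-tuples x over omega whose sub-tuples satisfy every constraint.

    constraints maps a tuple of coordinate positions e (for example (0, 2))
    to a set A_e of len(e)-tuples; x is kept when the restriction of x to
    the positions in e lies in A_e, for every e.
    """
    return {x for x in product(omega, repeat=k)
            if all(tuple(x[i] for i in e) in A_e
                   for e, A_e in constraints.items())}

omega = {0, 1, 2}

# k = 3, r = 2: {(x, y, z) | (x, y) in A, (x, z) in B, (y, z) in C}
A = {(0, 1), (1, 2)}
B = {(0, 2), (1, 0)}
C = {(1, 2), (2, 0)}
triples = cylinder_intersection(omega, 3, {(0, 1): A, (0, 2): B, (1, 2): C})

# r = 1: constraints on single coordinates give an ordinary product set.
box = cylinder_intersection(omega, 3, {(0,): {(0,), (1,)}, (1,): {(2,)}, (2,): {(0,)}})
```

Here `triples` contains exactly the triples whose three pairs land in `A`, `B`, `C` respectively, and `box` is the product $\{0,1\} \times \{2\} \times \{0\}$.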
Since we will be considering cylinder intersection sets extensively, and since it turns out that we can view graph homomorphisms as themselves being cylinder intersection sets, it will be convenient to introduce some uniform notation.

We are interested in a situation where we have a finite set of points, say $W$, with some structure (a $k$-graph or an ordering) and are interested in “copies” inside some other set $V$. For this purpose, it is useful to work with “$W$-tuples”.

Definition 2.1.
When $W$ is a finite set, a $W$-tuple from $\Omega$ is a function from $W$ to $\Omega$. We write $\Omega^W$ for the set of $W$-tuples.

When $k$ is a non-negative integer, we write $[k]$ for the set $\{1, 2, \ldots, k\}$. These definitions equate a $[k]$-tuple with a $k$-tuple and $\Omega^{[k]}$ with $\Omega^k$, so we can view this as an extension of the usual notation for tuples.

Definition 2.2.
When $\vec{x}_W \in \Omega^W$ is a $W$-tuple and $e \subseteq W$, we write $\vec{x}_e \in \Omega^e$ for the $e$-tuple $\vec{x}_e = \vec{x}_W \upharpoonright e$.

When $e = \{i\}$ is a singleton, we can abbreviate $x_i = \vec{x}_{\{i\}} = \vec{x}_W(i)$ to recover the usual notation for tuples.

Definition 2.3.
We write $\binom{W}{k}$ for the collection of subsets of $W$ of size $k$. We write $\binom{W}{\le k}$ for $\bigcup_{i \le k} \binom{W}{i}$, the collection of subsets of $W$ of size $\le k$, and $\binom{W}{<k}$ for the collection of subsets of $W$ of size $< k$. […]

When $H = (W, F)$ is a finite $k$-graph and $G = (\Omega, E)$ is a $k$-graph, we define the copies of $H$ in $G$, written $T_H(G)$, to be $T_F(\{E\})$.

That is, the copies of $(W, F)$ in $(\Omega, E)$ are the tuples $\vec{x}_W$ such that, for every $e \in F$, $\vec{x}_e \in E$.

Definition 2.5. When $H = (W, F)$ is a finite $k$-graph and $E \subseteq \Omega^k$ is a $k$-graph, we define the induced copies of $H$ in $G = (\Omega, E)$, written $T^{ind}_H(G)$, to be $T_{\binom{W}{k}}(\{A_e\})$ where
$$A_e = \begin{cases} E & \text{if } e \in F, \\ \Omega^k \setminus E & \text{otherwise.} \end{cases}$$

That is, the induced copies of $(W, F)$ in $(\Omega, E)$ are the tuples $\vec{x}_W$ such that, for each $e \in \binom{W}{k}$, $e \in F$ if and only if $\vec{x}_e \in E$.

A few other kinds of cylinder intersection sets will be needed along the way. Another case we will see is when $S = \binom{[k]}{\le k}$ or $S = \binom{[k]}{<k}$. […]

Suppose $\rho : \binom{\Omega}{k} \to \Sigma$ and $c : \binom{W}{k} \to \Sigma$ are colorings. The copies of $(W, c)$ in $(\Omega, \rho)$, written $T_{W,c}(\Omega, \rho)$, are $T_{\binom{W}{k}}(\{A_e\})$ where $A_e = \{\vec{x}_e \mid \rho(\vec{x}_e) = c(e)\}$.

An induced copy of $(W, F)$ in $(\Omega, E)$ is precisely a copy of $(W, \chi_F)$ in $(\Omega, \chi_E)$ where the characteristic functions $\chi_F, \chi_E$ are viewed as colorings where $\Sigma = \{0, 1\}$.

2.2. Measure Spaces. It will be convenient for us to prove our results in an infinitary setting where we can use some measure theoretic ideas. A Keisler graded probability space consists of a set $\Omega$ and, for each $k$, a measure $\mu_k$ on subsets of $\Omega^k$.

When $\Omega$ is finite, the natural choice is to take each $\mu_k$ to be the normalized counting measure, $\mu_k(S) = \frac{|S|}{|\Omega|^k}$, on subsets of $\Omega^k$. When $\Omega$ is infinite, we need to fix $\sigma$-algebras of measurable sets and add some conditions to ensure that the measures are compatible with each other.

Definition 2.7.
A Keisler graded probability space on $\Omega$ is a collection of probability measure spaces, $(\Omega^k, \mathcal{B}_k, \mu_k)$, for each $k \in \mathbb{N}$ so that:
• whenever $\pi : [1, k] \to [1, k]$ is a permutation and $B \in \mathcal{B}_k$, we have $B^\pi = \{(x_{\pi(1)}, \ldots, x_{\pi(k)}) \mid (x_1, \ldots, x_k) \in B\} \in \mathcal{B}_k$ and $\mu_k(B^\pi) = \mu_k(B)$,
• if $B \in \mathcal{B}_k$ and $C \in \mathcal{B}_r$ then $B \times C \in \mathcal{B}_{k+r}$,
• whenever $B \in \mathcal{B}_{k+r}$, the set of $(x_1, \ldots, x_r)$ such that $B_{x_1,\ldots,x_r} = \{(x_{r+1}, \ldots, x_{k+r}) \mid (x_1, \ldots, x_{k+r}) \in B\} \in \mathcal{B}_k$ is a set in $\mathcal{B}_r$ of measure 1 and
$$\mu_{k+r}(B) = \int \mu_k(B_{x_1,\ldots,x_r})\,d\mu_r.$$

We say $\{(\Omega^k, \mathcal{B}_k, \mu_k)\}_{k \in \mathbb{N}}$ is atomless if, for every $x \in \Omega$, $\mu_1(\{x\}) = 0$.

Atomless Keisler graded probability spaces are the setting obtained by taking the limit of the counting measures as the size of $\Omega$ approaches infinity (made precise by using an ultraproduct). As a result, for many purposes one can simply pretend that an atomless Keisler graded probability space is finite with $|\Omega|$ very, very large.

The special case where, for each $k$, $\mathcal{B}_k$ is equal to the product $\sigma$-algebra $\mathcal{B}_1^k$ is the most familiar example, but in general a Keisler graded probability space may have additional measurable sets which do not belong to the product $\sigma$-algebra. These additional sets precisely correspond to the quasirandom graphs and hypergraphs [28].

More generally, we take $\mathcal{B}_I$ to be a measure space on $\Omega^I$, obtained from $\mathcal{B}_{|I|}$ in the natural way by choosing any bijection between $I$ and $\{1, 2, \ldots, |I|\}$, and we have a corresponding measure $\mu_I$ on $\mathcal{B}_I$. Since $\mathcal{B}_{|I|}$ and $\mu_{|I|}$ are symmetric, $\mathcal{B}_I$ and $\mu_I$ do not depend on the choice of bijection.

The $\sigma$-algebra $\mathcal{B}_k$ of all measurable sets has canonical sub-$\sigma$-algebras generated by cylinder intersection sets which are, in general, proper; these correspond exactly to the “non-quasirandom” sets (for various notions of quasirandomness).

Definition 2.8.
When $r < k$, $\mathcal{B}_{k,r}$ is the sub-$\sigma$-algebra of $\mathcal{B}_k$ generated by all $([k], \binom{[k]}{r})$-cylinder intersection sets where all components are elements of $\mathcal{B}_r$.

More generally, when $\mathcal{D}$ is a sub-algebra of $\mathcal{B}_r$, we write $\mathcal{K}_{k,r}(\mathcal{D})$ for the sub-$\sigma$-algebra of $\mathcal{B}_k$ generated by all $([k], \binom{[k]}{r})$-cylinder intersection sets where all components are elements of $\mathcal{D}$.

We say $\{(\Omega^k, \mathcal{B}_k, \mu_k)\}_{k \in \mathbb{N}}$ is countably approximated if, for each $k$, there is a countable algebra of sets $\mathcal{B}^0_k \subseteq \mathcal{B}_k$ such that:
• $\mathcal{K}_{k,r}(\mathcal{B}^0_r) \subseteq \mathcal{B}_k$ for all $r < k$,
• the algebras $\mathcal{B}^0_k$ are symmetric,
• whenever $B \in \mathcal{B}^0_k$, $r < k$, and $q \in \mathbb{Q} \cap (0, 1)$, there is a $D \in \mathcal{B}^0_r$ with
$$\{\vec{x} \in \Omega^r \mid \mu(B_{\vec{x}}) < q\} \subseteq D \subseteq \{\vec{x} \in \Omega^r \mid \mu(B_{\vec{x}}) \le q\}.$$

Ultraproducts of graphs are countably approximated, using the definable sets (in a large enough language) as the approximating sets. (It turns out that we cannot quite expect to exactly close the algebras under level sets; see [13] for more on the approach here.)

We can think of $\mathcal{B}_{k,r}$ as being the sets of $k$-tuples which are “explained by” properties of $r$-tuples.

Since $\mathcal{B}_{k,r}$ is symmetric, we can also define $\mathcal{B}_{W,r}$ in the natural way: equivalently, as the image of $\mathcal{B}_{|W|,r}$ under any bijection of $W$ with $\{1, \ldots, |W|\}$, or as the $\sigma$-algebra generated by $(W, \binom{W}{r})$-cylinder intersection sets where the component $A_e$ belongs to $\mathcal{B}_e$.

Definition 2.9. We define $t_S(\{A_e\}_{e \in S}) = \mu(T_S(\{A_e\}_{e \in S}))$. More generally, we define
$$t_S(\{f_e\}_{e \in S}) = \int \prod_{e \in S} f_e(\vec{x}_e)\,d\mu.$$
We similarly define $t_H(G) = \mu(T_H(G))$, $t^{ind}_H(G) = \mu(T^{ind}_H(G))$, and $t_{W,c}(\Omega, \rho) = \mu(T_{W,c}(\Omega, \rho))$.

There will be no confusion between these related definitions, since $t_S(\{A_e\}) = t_S(\{\chi_{A_e}\})$.

3. Removal and Induced Removal for Graphs

In this section we prove graph removal, using this as a vehicle to introduce our notation and approach and prove some lemmas we will need for the more general results in later sections.
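For a finite graph with the normalized counting measure, the density $t_H(G)$ of Definition 2.9 is simply the fraction of $W$-tuples that are copies of $H$. The sketch below computes it by enumeration; the name `hom_density`, the encoding of $W$ as $\{0, \ldots, |W|-1\}$, and the representation of $E$ as a symmetric set of ordered pairs are assumptions of this example.

```python
from fractions import Fraction
from itertools import product

def hom_density(vertices, E, w_size, F):
    """Fraction of w_size-tuples x with (x[i], x[j]) in E for every (i, j) in F.

    E is a symmetric set of ordered pairs; F lists the edges of H on 0..w_size-1.
    """
    vertices = list(vertices)
    hits = sum(1 for x in product(vertices, repeat=w_size)
               if all((x[i], x[j]) in E for i, j in F))
    return Fraction(hits, len(vertices) ** w_size)

# G is the 4-cycle 0-1-2-3-0, stored as a symmetric edge relation.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
E = set(cycle) | {(b, a) for a, b in cycle}

edge_density = hom_density(range(4), E, 2, [(0, 1)])
path_density = hom_density(range(4), E, 3, [(0, 1), (1, 2)])
triangle_density = hom_density(range(4), E, 3, [(0, 1), (1, 2), (0, 2)])
```

Since the 4-cycle is bipartite it contains no triangle, so `triangle_density` is zero; this is the degenerate case $t_H(G) = 0$ that the graph removal theorem in this section addresses.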
3.1. Neighborhoods and Points of Density. We would like to work with points of density of measurable functions, that is, points which behave like limits of the nearby points. One difficulty is that a general probability measure space may not have a natural basis like the open balls. We will fix this by brute force: we simply pick, more or less arbitrarily, a family of neighborhoods around each point which suffice for our purposes.

More precisely, we will have, for each point $x \in \Omega$, a sequence $\mathcal{N}_j(x)$ of neighborhoods such that $\mu(\mathcal{N}_j(x)) \to 0$. (The analogous arrangement in the Lebesgue measure would take $\mathcal{N}_j(x) = B_{1/j}(x)$.)

Since we will need it later and the definition is the same, we will define a system of neighborhoods around tuples in $\Omega^r$ as well.

Definition 3.1. When $\mathcal{D}$ is a countable collection of subsets of $\Omega^r$, a system of neighborhoods in $\mathcal{D}$ is a sequence of partitions, $\mathcal{N} = \{\mathcal{N}_j\}$ such that:
• each $\mathcal{N}_j$ is a finite partition of $\Omega^r$,
• when $i < j$, $\mathcal{N}_j$ refines $\mathcal{N}_i$,
• for every set $A \in \mathcal{D}$, there is a $j$ so that $A$ differs by measure 0 from a union of elements of $\mathcal{N}_j$,
• $\lim_{j \to \infty} \max_{P \in \mathcal{N}_j} \mu(P) = 0$.
We call $r$ the arity of $\mathcal{N}$.

We write $\mathcal{N}_\sigma$ for the $\sigma$-algebra generated by all sets in $\bigcup_j \mathcal{N}_j$.

Since we will be working with partitions frequently, we introduce some notation.

Definition 3.2. When $\mathcal{N}_j$ is a partition of $\Omega^r$ and $\vec{x} \in \Omega^r$ is a point, we write $\mathcal{N}_j(\vec{x})$ for the unique set $P \in \mathcal{N}_j$ such that $\vec{x} \in P$.

We should think of $\mathcal{N}$ as being a schema giving, for each tuple $\vec{x}_W$ and each number $j$, a set $\mathcal{N}_j(\vec{x}_W)$ which is the “ball around the tuple $\vec{x}_W$”.

We want to lift partitions of $\Omega^r$ to partitions of $\Omega^k$ with $r < k$ in the obvious way: each piece of the partition of $\Omega^k$ is a cylinder intersection set coming from our partition of $\Omega^r$.

Definition 3.3.
When $\mathcal{N}_j$ is a partition of $\Omega^r$, $r \le |W|$ and $\vec{x}_W \in \Omega^W$, we write $\mathcal{N}_j(\vec{x}_W)$ for the $\binom{W}{r}$-cylinder intersection set $T_{\binom{W}{r}}(\{\mathcal{N}_j(\vec{x}_e)\}_{e \in \binom{W}{r}})$.

For instance, when $\mathcal{N}$ has arity 1, it induces partitions of $\Omega^2$ into sets of the form $P \times Q$ where $P, Q \in \mathcal{N}_j$.

A system of neighborhoods $\mathcal{N} = \{\mathcal{N}_j\}$ gives us a natural way to define density.

Footnote: An alternative method, which plays a central role in the “graphon” approach to limit graphs [19], is to use the fact that every probability measure space is, in a suitable way, equivalent to the Lebesgue measure on the unit interval, and then use the usual notion of a point of density. This is used, for instance, in [10] to prove hypergraph regularity.

Definition 3.4. Let $\mathcal{N}$ be a system of neighborhoods of arity $r$. Given $k \ge r$ and $f : \Omega^k \to [0, 1]$, we define
$$f^j_{\mathcal{N}}(\vec{x}) = \frac{1}{\mu(\mathcal{N}_j(\vec{x}))} \int_{\mathcal{N}_j(\vec{x})} f(\vec{y})\,d\mu(\vec{y})$$
whenever $\mu(\mathcal{N}_j(\vec{x})) > 0$, and
$$f^+_{\mathcal{N}}(\vec{x}) = \lim_{j \to \infty} f^j_{\mathcal{N}}(\vec{x})$$
wherever each $f^j_{\mathcal{N}}$ is defined and this limit exists. We call $\vec{x}$ a point of density for $f$ in $\mathcal{N}$ if $f^+_{\mathcal{N}}(\vec{x})$ exists and
$$\lim_{j \to \infty} \frac{1}{\mu(\mathcal{N}_j(\vec{x}))} \int_{\mathcal{N}_j(\vec{x})} |f^+_{\mathcal{N}}(\vec{y}) - f^+_{\mathcal{N}}(\vec{x})|\,d\mu(\vec{y}) = 0.$$
When $A \subseteq \Omega^k$, we call $\vec{x}$ a point of density for $A$ in $\mathcal{N}$ if $\vec{x}$ is a point of density for $\chi_A$.

One might have expected the definition of a point of density to be simply that
$$\lim_{j \to \infty} \frac{1}{\mu(\mathcal{N}_j(\vec{x}))} \int_{\mathcal{N}_j(\vec{x})} |f(\vec{y}) - f(\vec{x})|\,d\mu(\vec{y}) = 0.$$
But take $E$ to be a quasi-random graph and let $f = \chi_E$ and $\mathcal{N}$ a system of neighborhoods of arity 1; in this case, every positive measure neighborhood $\mathcal{N}_j(x_1, x_2) = \mathcal{N}_j(x_1) \times \mathcal{N}_j(x_2)$ has the property that half its points belong to $E$ and half do not belong to $E$, so there would be no points of density at all.

This is the fundamental difference from classical Lebesgue measure: because we are working in a Keisler graded probability space with quasi-random elements, we cannot expect most points in the graph to be near other points in the graph.
However we will see that we can expect most points to have a well-defined density, and to be near other points with a similar density.

We will usually want, not just any point of density of $f$, but one where the density $f^+_{\mathcal{N}}$ is positive.

Definition 3.5. We say $x$ is a positive point of density of $f$ if $x$ is a point of density of $f$ with $f^+_{\mathcal{N}}(x) > 0$. When $E$ is a set, a positive point of density of $E$ is a positive point of density of $\chi_E$.

When $r = 1$ (that is, when $\mathcal{N}$ consists of sets of points) there are no particular symmetry issues. In particular, when $f$ is a symmetric function (for instance, the characteristic function of a graph or hypergraph), every permutation of a point of density is also a point of density. When $r > 1$, we have to worry about whether the neighborhoods themselves are symmetric.

Lemma 3.6. If $f$ is symmetric, each $\mathcal{N}_j$ is symmetric (that is, each permutation of a set in $\mathcal{N}_j$ is also in $\mathcal{N}_j$), and $\vec{x}$ is a point of density for $f$ in $\mathcal{N}$ then each permutation of $\vec{x}$ is also a point of density.

In general, $f^+_{\mathcal{N}}$ is $\mathbb{E}(f \mid \mathcal{K}_{k,r}(\mathcal{N}_\sigma))$. More precisely, $\mathbb{E}(f \mid \mathcal{K}_{k,r}(\mathcal{N}_\sigma))$ is only defined up to the $L^2$ norm, so $f^+_{\mathcal{N}}$ is a natural representative of $\mathbb{E}(f \mid \mathcal{K}_{k,r}(\mathcal{N}_\sigma))$.

Lemma 3.7. For any measurable function $f : \Omega^k \to [0, 1]$, $f^+_{\mathcal{N}}$ is defined almost everywhere, $f^+_{\mathcal{N}} = \mathbb{E}(f \mid \mathcal{K}_{k,r}(\mathcal{N}_\sigma))$, almost every $x$ is a point of density of $f$, and almost every point $x$ with $f(x) > 0$ is a positive point of density of $f$.

Proof. We first show that the functions $f^j_{\mathcal{N}}$ converge in the $L^2$ norm to $\mathbb{E}(f \mid \mathcal{K}_{k,r}(\mathcal{N}_\sigma))$. For any $\epsilon > 0$, we may choose $j_0$ large enough that $\|\mathbb{E}(f \mid \mathcal{K}_{k,r}(\mathcal{N}_{j_0})) - \mathbb{E}(f \mid \mathcal{K}_{k,r}(\mathcal{N}_\sigma))\|_{L^2} < \epsilon$. Then whenever $j \ge j_0$, $\mathcal{N}_j$ refines $Q$ up to measure 0, so also $\|f^j_{\mathcal{N}} - \mathbb{E}(f \mid \mathcal{K}_{k,r}(\mathcal{N}_\sigma))\|_{L^2} < \epsilon$.

To see that the pointwise limit is defined almost everywhere and that almost every point is a point of density, consider any $\epsilon > 0$ and any $\alpha < \beta$. Let $g = \mathbb{E}(f \mid \mathcal{K}_{k,r}(\mathcal{N}_\sigma))$.
Choose $j_0$ large enough so that there is a set $S \in \mathcal{K}_{k,r}(\mathcal{N}_{j_0})$ so that $\mu(S \,\triangle\, \{\vec{x} \mid g(\vec{x}) \le \alpha\}) < \frac{\beta - \alpha}{1 - \alpha}\epsilon$.

Consider all rectangles $R$ from $\bigcup_j \mathcal{N}_j$ which are contained in $S$ and such that the average of $f$ on $R$ is $\ge \beta$. Since $\int_R g\,d\mu \ge \beta\mu(R)$ and $g \le 1$, we must have $\mu(\{\vec{x} \in R \mid g(\vec{x}) > \alpha\}) \ge \frac{\beta - \alpha}{1 - \alpha}\mu(R)$, and therefore $\mu(R) < \epsilon$. Therefore, once $j \ge j_0$, except for a set of measure $\epsilon$, if $g(\vec{x}) \le \alpha$ then for all $j \ge j_0$, $f^j_{\mathcal{N}}(\vec{x}) \le \alpha$ as well. So the set of points with $g(\vec{x}) \le \alpha$ but $\limsup f^j_{\mathcal{N}}(\vec{x}) > \alpha$ has measure $< \epsilon$. Dually, we can show that the set of points with $g(\vec{x}) \ge \beta$ but $\liminf f^j_{\mathcal{N}}(\vec{x}) < \beta$ has measure $< \epsilon$. Since this holds for all $\alpha, \beta$ and all $\epsilon$, for almost all $\vec{x}$ we have $f^+_{\mathcal{N}}(\vec{x}) = \lim f^j_{\mathcal{N}}(\vec{x}) = g(\vec{x})$.

By the same argument, for any $\alpha$ and any $\delta > 0$, if $f^+_{\mathcal{N}}(\vec{x}) \le \alpha$ then, except for a set of measure $< \epsilon$, for all sufficiently large $j$ we have $f^j_{\mathcal{N}}(\vec{x}) \le \alpha + \delta$, and therefore, since
$$\frac{1}{\mu(\mathcal{N}_j(\vec{x}))} \int_{\mathcal{N}_j(\vec{x})} f(\vec{y})\,d\mu(\vec{y}) = \frac{1}{\mu(\mathcal{N}_j(\vec{x}))} \int_{\mathcal{N}_j(\vec{x})} f^+_{\mathcal{N}}(\vec{y})\,d\mu(\vec{y}),$$
the set of $\vec{y} \in \mathcal{N}_j(\vec{x})$ with $f^+_{\mathcal{N}}(\vec{y}) \ge \alpha + \delta$ is small. So almost every $\vec{x}$ is a point of density.

Finally, to see that almost every point with $f(x) > 0$ has $f^+_{\mathcal{N}}(x) > 0$, let $Z$ be the set of points where $f^+_{\mathcal{N}}(\vec{x}) = 0$. Since $f^+_{\mathcal{N}}$ is $\mathcal{K}_{k,r}(\mathcal{N}_\sigma)$-measurable, $Z$ belongs to the completion of $\mathcal{K}_{k,r}(\mathcal{N}_\sigma)$, so $0 = \int_Z f^+_{\mathcal{N}}\,d\mu = \int_Z f\,d\mu$, so the set of $\vec{x} \in Z$ where $f(\vec{x}) > 0$ has measure 0. □

3.2. Counting and Graph Removal. The next fact we need is that the quantity $t_S(\{f_e\}_{e \in S})$ depends only on the “non-random” part of the $f_e$. In its simplest form, this says that if $E$ is a graph, $t_S(E) = t_S(\mathbb{E}(\chi_E \mid \mathcal{B}_{2,1}))$; that is, we can replace the graph $E$ with the function $\mathbb{E}(\chi_E \mid \mathcal{B}_{2,1})$ measuring the density of $E$ when counting graph densities.

Footnote: In the graphon approach, this fact plays a central role: the object $\mathbb{E}(\chi_E \mid \mathcal{B}_{2,1})$ is the graphon, and the basic theorems establish that for things like counting graph densities, this is all that is needed.
We will state this fact in a very general way which will continue to serve us as we deal with $k$-graphs.

Lemma 3.8. Let $\{f_e\}_{e \in S}$ be given and, for each $e$, let $\mathcal{D}_e$ be a $\sigma$-algebra of sets of $r$-tuples such that, for every $e_0 \in S$, $|e_0| \ge r$ and either:
• $f_{e_0}$ is $\mathcal{D}_{e_0}$-measurable, or
• for every $e \in S \setminus \{e_0\}$ and each fixed $\vec{x}_{e \setminus e_0}$, the function $\vec{x}_{e_0} \mapsto f_e(\vec{x}_e)$ is $\mathcal{D}_{e_0}$-measurable.
For each $e$, let $f'_e = \mathbb{E}(f_e \mid \mathcal{D}_e)$. Then $t_S(\{f_e\}) = t_S(\{f'_e\})$.

The general form allows the case where $S$ contains tuples of different sizes, and replaces $\mathcal{B}_{2,1}$ with a more general $\sigma$-algebra which may depend on the coordinate $e$; most commonly, we will have $\mathcal{D}_e = \mathcal{K}_{e,r}(\mathcal{D})$ for a fixed $\sigma$-algebra $\mathcal{D}$.

We need some requirement that $\mathcal{D}_e$ is large enough. For example, when we turn to 3-graphs, we might initially try $\mathcal{D}_e = \mathcal{B}_{3,1}$ while $S \subseteq \binom{W}{3}$. Working only with $\mathcal{B}_{3,1}$ amounts to working with weak hypergraph regularity [6], which is known to suffice when $S$ is linear, that is, when $|e \cap e'| \le 1$ for distinct $e, e' \in S$ [7, 18]. But when the elements of $S$ can overlap more generally, we need to work with a larger $\sigma$-algebra, for instance $\mathcal{B}_{3,2}$. This is precisely what the second case of the lemma requires: that the “overlaps” with the other functions are already measurable with respect to $\mathcal{D}_e$.

Proof. We show by induction on $|T|$, where $T \subseteq S$, that
$$t_S(\{f_e\}) = \int \prod_{e \in T} f'_e(\vec{x}_e) \prod_{e \in S \setminus T} f_e(\vec{x}_e)\,d\mu.$$
When $T = S$, this gives the desired claim.

When $T = \emptyset$, the statement is trivial.

Suppose the inductive hypothesis holds for $T$ and that $e_0 \in S \setminus T$. Then we have
$$t_S(\{f_e\}) = \int f_{e_0}(\vec{x}_{e_0}) \prod_{e \in T} f'_e(\vec{x}_e) \prod_{e \in S \setminus (T \cup \{e_0\})} f_e(\vec{x}_e)\,d\mu.$$
For a fixed $\vec{x}_{W \setminus e_0}$, consider the function
$$h(\vec{x}_{e_0}) = \prod_{e \in T} f'_e(\vec{x}_e) \prod_{e \in S \setminus (T \cup \{e_0\})} f_e(\vec{x}_e).$$
Each term in the product is $\mathcal{D}_{e_0}$-measurable, so $h$ is $\mathcal{D}_{e_0}$-measurable as well. Therefore
$$t_S(\{f_e\}) = \int \mathbb{E}(f_{e_0} \mid \mathcal{D}_{e_0})(\vec{x}_{e_0}) \prod_{e \in T} f'_e(\vec{x}_e) \prod_{e \in S \setminus (T \cup \{e_0\})} f_e(\vec{x}_e)\,d\mu = \int f'_{e_0}(\vec{x}_{e_0}) \prod_{e' \in T} f'_{e'}(\vec{x}_{e'}) \prod_{e' \in S \setminus (T \cup \{e_0\})} f_{e'}(\vec{x}_{e'})\,d\mu,$$
which gives the inductive claim. □

The next result should be seen as our version of the graph counting lemma. Typically, a graph counting lemma would say something like the following:

Suppose $S \subseteq \binom{W}{2}$ and that for each $w \in W$, we have a set $P_w \subseteq \Omega$ such that, for each pair $(w, w') \in S$, $\frac{1}{\mu(P_w \times P_{w'})} \int_{P_w \times P_{w'}} f_{\{w,w'\}}\,d\mu > \epsilon$, and also $f_{\{w,w'\}}$ is suitably quasi-random between $P_w$ and $P_{w'}$. Then $t_S(\{f_e\}) > 0$.

Our version is the limit of this statement as the measure of the sets $P_w$ approaches 0: instead of sets $P_w$, we will be able to work with individual points $x_w$ (and, therefore, sufficiently small neighborhoods $\mathcal{N}_j(x_w)$). The requirement that $f_{\{w,w'\}}$ be suitably quasi-random becomes the requirement that $(x_w, x_{w'})$ be a point of density, and the requirement that $f_{\{w,w'\}}$ have positive density becomes the requirement that $f^+_{\{w,w'\}}(x_w, x_{w'}) > 0$. For now we require that $\mathcal{N}$ be a system of neighborhoods with arity 1, since the hypergraph version requires more work.

Theorem 3.9. Let $W$ be a finite set, let $\mathcal{N}$ be a system of neighborhoods of arity 1, and let $S$ be a collection of subsets of $W$. Suppose that, for each $e \in S$, either:
• $f_e$ is $\mathcal{K}_{e,1}(\mathcal{N})$-measurable, or
• for every $e' \in S \setminus \{e\}$, the function $\vec{x}_e \mapsto \int f_{e'}(\vec{x}_{e \cup e'})\,d\mu(\vec{x}_{e' \setminus e})$ is $\mathcal{K}_{e,1}(\mathcal{N})$-measurable.
Suppose that $\vec{x}_W \in T_S(\{f_e\})$ is such that, for each $e \in S$, $\vec{x}_e$ is a positive point of density of $f_e$. Then $t_S(\{f_e\}) > 0$.

The basic idea of the proof is that we may “blow up” each individual point $x_w$ into a small ball $\mathcal{N}_j(x_w)$, and then use the fact that each $\vec{x}_e$ is a point of density of $f_e$ to find many copies of $W$ between these small balls.

Proof.
Choose some $\epsilon \le \min_{e \in S} f^+_e(\vec{x}_e)$.

Since each $\vec{x}_e$ is a point of density, we may choose some $j$ large enough that, for each $e \in S$,
\[ \frac{1}{\mu(\mathcal{N}^j(\vec{x}_e))} \mu(\{\vec{y}_e \in \mathcal{N}^j(\vec{x}_e) \mid f^+_e(\vec{y}_e) \ge \epsilon/2\}) \ge 1 - \frac{1}{|S|}. \]
Therefore also
\[ \frac{1}{\mu(\mathcal{N}^j(\vec{x}_W))} \mu(\{\vec{y}_W \in \mathcal{N}^j(\vec{x}_W) \mid f^+_e(\vec{y}_e) \ge \epsilon/2\}) \ge 1 - \frac{1}{|S|}. \]
Note that this depends on the fact that the arity of $\mathcal{N}$ is 1, because this ensures that $\mathcal{N}^j(\vec{x}_W) = \mathcal{N}^j(\vec{x}_e) \times \mathcal{N}^j(\vec{x}_{W \setminus e})$.

Therefore
\[ \gamma = \mu(\{\vec{y}_W \in \mathcal{N}^j(\vec{x}_W) \mid \text{there is some } e \in S \text{ such that } f^+_e(\vec{y}_e) < \epsilon/2\}) < |S| \cdot \frac{1}{|S|} = 1, \]
so, using Lemma 3.8 (with $\mathcal{D}_e = \mathcal{B}_{2,1}$ for all $e$),
\[ t_S(\{f_e\}) = t_S(\{f^+_e\}) \ge \frac{\epsilon^{|S|}}{2^{|S|}} (1 - \gamma) > 0. \]
□

Theorem 3.10 (Graph Removal). Suppose $H = (W, F)$ is a finite graph and $G = (\Omega, E)$ is a graph with a countably approximated atomless Keisler graded probability space on $\Omega$ with $E \in \mathcal{B}_2$. If $t_H(G) = 0$ then there is a symmetric $E' \subseteq E$ such that $E \setminus E'$ is a measure 0 set contained in an intersection of sets in $\mathcal{B}^0_2$ and, taking $G' = (\Omega, E')$, $T_H(G') = \emptyset$.

Proof. Choose $\mathcal{N}$ so that every set in $\mathcal{B}^0_1$ is a finite union of sets in $\bigcup_j \mathcal{N}^j$. Let $E' \subseteq E$ consist of the positive points of density of $E$. By Lemma 3.7, $\mu(E \setminus E') = 0$. If $T_H(E') \ne \emptyset$ then any $\vec{x}_W \in T_H(E')$ satisfies the conditions of Theorem 3.9, and so $t_H(E) > 0$.

Since $E \in \mathcal{B}_2$ and $E \setminus E'$ is contained in an intersection of finite unions of rectangles from $\mathcal{B}^0_1$, $E \setminus E'$ is contained in an intersection of sets in $\mathcal{B}^0_2$. □

Corollary 3.11. For every finite graph $H = (W, F)$ and every $\epsilon > 0$ there is a $\delta > 0$ so that whenever $G = (V, E)$ is a graph with $t_H(G) < \delta$, there is a symmetric $E' \subseteq E$ with $|E \setminus E'| < \epsilon |V|^2$ such that, taking $G' = (V, E')$, $T_H(G') = \emptyset$.

Sketch. 
The proof is standard (see [13]), but we include the outline here.

Suppose the statement were false, so let $H = (W, F)$ and $\epsilon > 0$ be a counterexample: for each $n$, there is a $G_n = (V_n, E_n)$ with $t_H(G_n) < 1/n$, but so that no symmetric $E' \subseteq E_n$ with $|E_n \setminus E'| < \epsilon |V_n|^2$ is $H$-free. Note that $|V_n| \to \infty$ (otherwise $t_H(G_n) < 1/n$ implies $T_H(G_n) = \emptyset$ for $n$ large enough).

Let $(\Omega, E)$ be an ultraproduct of the sequence $G_n$. Take the Keisler graded probability space generated by the definable sets, with the Loeb measure. Let $E'$ be given by Theorem 3.10. Then $E \setminus E'$ is contained in an intersection of definable sets, so choosing some definable set $Z_m$ with $m$ large enough, $E \setminus Z_m$ is $H$-free and $Z_m$ has measure $< \epsilon$. By the Łoś Theorem, for infinitely many $n$, $(V_n, E_n \setminus Z_m)$ is also $H$-free and $Z_m$ has measure $< \epsilon$. (Here, by $Z_m$, we mean the interpretation of the definable set $Z_m$ in the structure $G_n$.) But this contradicts the choice of the $G_n$. □

Induced Graph Removal. When we prove induced graph removal, we have a new issue to deal with: when $\vec{x}$ is not a point of density, we cannot simply exclude the point from $E$, because, by doing so, we might end up creating an induced copy $\vec{x}$ where one of the non-edges of $\vec{x}$ is an element we removed from $E$.

Instead, we adopt a more complicated strategy. We choose $j$ large, so that $\mathcal{N}^j$ gives a partition of $\Omega$ into very small pieces. We will then choose, from each element $P$ of $\mathcal{N}^j$, a representative $a_P \in P$, uniformly at random. Since we are only choosing finitely many such elements, with probability 1, all the pairs $(a_P, a_{P'})$ with $P \ne P'$ are points of density. We then modify $E$ to match $(a_P, a_{P'})$ on $P \times P'$; that is, we define a new graph $E'$: if $(a_P, a_{P'}) \in E$ then we place all of $P \times P'$ in $E'$, while if $(a_P, a_{P'}) \notin E$ then we exclude all of $P \times P'$ from $E'$.
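As a finite illustration of the modification just described, the following is a minimal sketch under stated assumptions, not the construction used in the proof. It assumes a finite graph given as a set of ordered pairs, a partition of the vertices into blocks, and one representative per block; the text chooses representatives at random, whereas here they are passed in explicitly. The names `snap_to_representatives`, `edges`, `parts` and `reps` are hypothetical.

```python
from itertools import product


def snap_to_representatives(edges, parts, reps):
    """Build E' by copying the status of each representative pair to its block.

    edges: set of ordered pairs (u, v), assumed symmetric.
    parts: list of blocks (lists of vertices) partitioning the vertex set.
    reps:  reps[i] is the chosen representative of parts[i].
    """
    new_edges = set()
    for i, j in product(range(len(parts)), repeat=2):
        if i == j:
            # Diagonal blocks P x P are skipped here; the text treats
            # them separately.
            continue
        if (reps[i], reps[j]) in edges:
            # The representative pair is an edge: put all of P x P' into E'.
            new_edges.update(product(parts[i], parts[j]))
        # Otherwise all of P x P' is left out of E'.
    return new_edges
```

In this toy version the result depends heavily on which representatives are picked, which is why the argument needs the representatives to be points of density and the partition to be fine.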
If we choose j large enough, we will beable to show that, for most choices of the representatives a P , µ ( E △ E ′ ) issmall.This leaves us with a new problem: what to do with the “diagonal com-ponents” P × P . When j is large, these diagonals have small measure, so wecan put them in or out of E ′ as convenient. On the other hand, we cannotguarantee that ( a P , a P ) is a point of density. Theorem 3.12 (Induced Graph Removal) . Suppose H = ( W, F ) is a finitegraph and G = (Ω , E ) is a graph with a countably approximated atomlessKeisler graded probability space on Ω with E ∈ B . For each ǫ > there is asymmetric E ′ ∈ B such that µ ( E ′ △ E ) < ǫ and for any H with t indH ( E ) = 0 , T indH ( E ′ ) = ∅ .Proof. Let f = χ E . Choose j large enough that the set of pairs ( x , x ) forwhich χ jE has not converged to within ǫ/ ǫ/ 3, and so that P P ∈N j µ ( P × P ) < ǫ/ (cid:0) Ω2 (cid:1) into three sets: E = { ( x, y ) | f + ( x, y ) = 1 } , E = { ( x, y ) | f + ( x, y ) = 0 } , and E / = { ( x, y ) | < f + ( x, y ) < } . (Thereis also a set of measure 0 where f + ( x, y ) is undefined.) We may think of theseas the interior of E , the interior of the complement of E , and a boundaryof points near both E and the complement of E .Suppose that, for each P ∈ N j with µ ( P ) > 0, we choose a point a P ∈ P uniformly at random. Then, with positive probability: • the set of points contained in P × P ′ with P = P ′ and such that | f + ( a P , a P ′ ) − f j ( a P , a P ′ ) | ≥ ǫ/ ǫ/ • each ( a P , a P ′ ) is a point of density for each of E , E , E / and is apositive point of density for the set it belongs to.Next we prepare to deal with elements of the sets P × P . What we wantto do is choose many points near each a P ; when we choose one point b P,i near a P and one point b P ′ ,j near a P ′ with P = P ′ , we can ensure, with highprobability, that ( b P,i , b P ′ ,j ) is similar to ( a P , a ′ P ). 
When we take two pointsnear the same a P , b P,i and b P,j with i = j , we have no control over whathappens. However, by applying Ramsey’s Theorem (many times), we canat least ensure that the behavior does not depend on the particular choiceof i and j .Formally, we will choose these points by applying our counting lemmato a suitable graph. We may let A = { a P | P ∈ N j , µ ( P ) > } . Forany d , let us consider the colored d -blowup of A , which we define to be the { , / , } -colored graph ( A d , c d ) where: • A d = A × [ d ], • dom( c d ) = { (( a, i ) , ( a ′ , j )) | a = a ′ } , • when a = a ′ and ( a, a ′ ) ∈ E z , c d (( a, i ) , ( a ′ , j )) = z .Observe that Theorem 3.9 applies to ( A d , c d ), so t ( A d ,c d ) ( { E z } z ∈{ , / , } ) > v : A → { , , / } , the v -homogeneous completion of ( A d , c d ) is thecolored graph ( A d , c vd ) where c d ⊆ c vd and, for i = j , c vd (( a, i ) , ( a, j )) = v ( a ).Take m sufficiently large and consider any copy ~b A m of the colored m -blowup of A in (Ω , E ). (This means that for each pair (( a, i ) , ( a ′ , j )) ∈ (cid:0) A d (cid:1) with a = a ′ , ( b ( a,i ) , b ( a,j ) ) ∈ E z if and only if ( a, a ′ ) ∈ E z , and we makeno commitments about which of the three sets (( a, i ) , ( a ′ , i )) belongs to.)Applying Ramsey’s Theorem once for each a ∈ A , there is a v and a sub-copy ~b A d of ~b A m which is a copy of ( A d , c vd ).Since there are only finitely many v , this means that for each d there some v so that t ( A d ,c vd ) ( E ) > 0. Furthermore, if d < d ′ , we have t ( A d ′ ,c vd ′ ) ( E ) ≤ t ( A d ,c vd ) ( E ). Therefore there must be some v so that, for all d , t ( A d ,c vd ) ( E ) > P ∈ N j has measure 0. Todeal with this, we assign to every element P ∈ N j a corresponding element Q P ∈ N j , and we will always treat elements of P as if they were really in Q P . For any P ∈ N j with µ ( P ) = 0, choose some Q P ∈ N j with µ ( Q P ) > µ ( P ) > 0, take Q P = P . 
So for almost every point, Q P = P , butthere are a measure 0 set of exceptional points where Q P = P .We define E ′ as follows: • for P = P ′ , if ( a Q P , a Q P ′ ) ∈ E , let E ′ ∩ ( P × P ′ ) = ∅ , • for P = P ′ , if ( a Q P , a Q P ′ ) ∈ E , let P × P ′ ⊆ E ′ , • for P = P ′ , if ( a Q P , a Q P ′ ) ∈ E / , let E ′ ∩ ( P × P ′ ) = E ∩ ( P × P ′ ), • if v ( a Q P ) = 1 then ( P × P ) ⊆ E ′ , • if v ( a Q P ) = 0 then E ′ ∩ ( P × P ) = ∅ , • if v ( a Q P ) = 1 / E ′ ∩ ( P × P ) = E ∩ ( P × P ).Consider any graph H = ( W, F ) such that T indH ( E ′ ) = ∅ . Take any ~x W ∈ T indH ( E ′ ). For each w ∈ W , let P w ∈ N j = Q N j ( x w ). Note that we mayhave P w = P w ′ even when w = w ′ , so fix an ordering W = { w , . . . , w | W | } .We have t ( A | W | ,c v | W | ) ( { E z } z ∈{ , , / } ) > 0, so we may choose a copy ~y A | W | where all pairs are points of positive density for E if they belong to E ∪ E / and for E if they belong to E ∪ E / .Take ~z w i = ~y ( a Pwi ,i ) . For each pair w i = w j , observe that ( z w i , z w j ) is apositive point of density for E if ( w i , w j ) ∈ F and for E if ( w i , w j ) F —tosee this, suppose ( w i , w j ) ∈ F (the case where ( w i , w j ) F is symmetric): • if P w i = P w j then, since ( x w i , x w j ) ∈ E ′ , we have ( a P wi , a P wj ) E ,so ( y a Pwi ,i , y a Pwj ,j ) ∈ E ∪ E / and is therefore a positive point ofdensity for E , We could have tweaked our definition of a partition to avoid this case, but when wego on to hypergraphs, this case will be unavoidable, and the exceptional points will havesmall but positive measure. REMOVAL LEMMA FOR ORDERED HYPERGRAPHS 17 • if P w i = P w j then, since ( x w i , x w j ) ∈ E ′ , we have v ( a P wi ) = 0, soagain ( y a Pwi ,i , y a Pwi ,j ) ∈ E ∪ E / , and is therefore a positive pointof density for E .Therefore we may apply Theorem 3.9 to ~z w i to show that t H ( E ) > µ ( E △ E ′ ) < ǫ . 
Observe that of ( x w , x w ′ ) ∈ E △ E ′ then, letting P = N j ( x w ) and P ′ = N j ( x w ′ ), one of the following holds:(1) P = P ′ ,(2) µ ( P ) = 0 or µ ( P ′ ) = 0,(3) | f + ( a P , a P ′ ) − f j ( a P , a P ′ ) | > ǫ/ f j ( a P , a P ′ ) ≥ − ǫ/ x w , x w ′ ) E , or(5) f j ( a P , a P ′ ) < ǫ/ x w , x w ′ ) ∈ E .The first case accounts for measure at most ǫ/ 3, the second case for measure0, the third case for measure at most ǫ/ 3, and the last two can each accountfor at most an ǫ/ P × P ′ , so at most ǫ/ µ ( E △ E ′ ) < ǫ . (cid:3) Corollary 3.13. For every finite graph H = ( W, F ) and every ǫ > thereis a δ > so that whenever G = ( V, E ) is a graph with t indH ( G ) < δ , thereis a symmetric E ′ ⊆ E with | E \ E ′ | < ǫ | V | such that, taking G ′ = ( V, E ′ ) , T indH ( G ′ ) = ∅ . Hypergraphs Sequences of Neighborhoods. In order to extend the argumentsabove to hypergraphs, we need to deal with an additional complication.When G = (Ω , E ) and H = ( W, F ) are graphs and we consider the product t H ( G ) = R Q ( w,w ′ ) ∈ F χ E ( x w , x w ′ ) dµ , the distict terms in the product onlyoverlap on a single coordinate. The crucial step is that in Lemma 3.8, whenwe look at a single edge e = ( w , w ′ ) ∈ F , the “overlaps” with other edgesin F \ { e } share at most one coordinate, and are therefore B , -measurable.This means that we are able to use Lemma 3.8 (in the proof of Theorem 3.9)to replace E with E ( χ E | B , ).When G = (Ω , E ) and H = ( W, F ) are 3-graphs, however, the product t H ( G ) = R Q ( w,w ′ ,w ′′ ) ∈ F χ E ( x w , x w ′ , x x ′′ ) dµ has terms which can share twocoordinates. If we try to carry out a proof analogous to the proof of Theorem3.9, we are only able to reduce E to E ( χ E | B , ). E ( χ E | B , ), however,is “graph-like”—it is described in terms of two coordinates at a time, like agraph.This leads us to an iterated process where, at each step, we reduce thenumber of coordinates by one. 
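The contrast between graphs and 3-graphs drawn above comes down to how many coordinates two distinct edges of $H$ can share. The following minimal sketch computes that quantity for a finite edge list; `max_overlap` is a hypothetical helper introduced only for illustration, and reading its value as the arity of the overlaps is an informal gloss on the discussion above rather than a statement from the paper.

```python
from itertools import combinations


def max_overlap(edges):
    """Largest number of vertices shared by two distinct edges (0 if fewer than two edges)."""
    return max(
        (len(set(e) & set(f)) for e, f in combinations(edges, 2)),
        default=0,
    )


# A triangle: any two edges share exactly one vertex.
triangle = [(1, 2), (2, 3), (1, 3)]

# The complete 3-graph on four vertices: any two edges share two vertices.
tetrahedron = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
```

For `triangle` the overlaps are single coordinates, matching the graph case; for `tetrahedron` they are pairs, which is the situation that forces the iterated process.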
This means we need to consider, not a single system of neighborhoods, but a sequence of them: we will have a sequence of systems of neighborhoods, $\mathcal{N}_d, \ldots, \mathcal{N}_1$, and we will consider not just the neighborhoods $\mathcal{N}^j_d(\vec{x})$, but how these neighborhoods sit in the neighborhood $\mathcal{N}^j_d(\vec{x}) \cap \mathcal{N}^{j'}_{d-1}(\vec{x})$ with $j' \gg j$, and so on.

In this section we will set up all the general machinery. For concreteness, we'll focus on the case needed to prove induced hypergraph removal, which means we will focus on the case where $\vec{x}$ is a $k$-tuple and we consider systems of neighborhoods $\mathcal{N}_{k-1}, \ldots, \mathcal{N}_1$ where $\mathcal{N}_i$ has arity $i$. We will refer to this, throughout this section, as the standard example. In particular, note that this example illustrates that in the intersection $\mathcal{N}^j_d(\vec{x}) \cap \mathcal{N}^{j'}_{d-1}(\vec{x})$, the set $\mathcal{N}^j_d(\vec{x})$ is "more complicated" (for example, it is defined using sets of arity $d$) while the set $\mathcal{N}^{j'}_{d-1}(\vec{x})$ is "finer" (since $j' \gg j$, we are working with a much finer partition of $\Omega^{d-1}$). So we are looking at neighborhoods which use "some high complexity information and a lot of low complexity information".

We will nonetheless work, where possible, with general systems of neighborhoods, since this is the case we will use in the next section. (In the next section, $\mathcal{N}_{i+1}$ will have arity $i$, and $\mathcal{N}_1$ will consist only of intervals.)

For this purpose, we identify the property we need a sequence of systems of neighborhoods to have to be workable. (For instance, when $k > 3$, not every sequence of neighborhoods starting at arity $k-1$ will do.) The requirement is that $\mathcal{N}_{i+1}$ is "not too much more complicated" than $\mathcal{N}_i$, and it should be related to the "computability of overlaps" clause from Lemma 3.8. The general property we need is given by the following definition.

Definition 4.1. If $\mathcal{D}$ is a $\sigma$-algebra of sets of $s$-tuples, $\mathcal{C}$ is a $\sigma$-algebra of sets of $r$-tuples, and $s \le r$, we say $\mathcal{C}$ is properly aligned in $\mathcal{D}$ if, for any $C \in \mathcal{C}$ and any $c$ with $1 \le c \le r$, the function
\[ f(x_1, \ldots, x_r) = \int \chi_C(y_1, \ldots, y_c, x_{c+1}, \ldots, x_r) \, d\mu \]
is $K_{r,s}(\mathcal{D})$-measurable.

We say a sequence of $\sigma$-algebras $\mathcal{D}_d, \ldots, \mathcal{D}_1$, where $\mathcal{D}_i$ is a $\sigma$-algebra of sets of $r_i$-tuples, is properly aligned if:
• $r_1 = 1$,
• $r_i \le r_{i+1}$ for each $i < d$, and
• $\mathcal{D}_{i+1}$ is properly aligned in $\mathcal{D}_i$ for each $i < d$.

Of course, the standard example itself is properly aligned.

Lemma 4.2. The sequence of $\sigma$-algebras $\mathcal{B}_d, \mathcal{B}_{d-1}, \ldots, \mathcal{B}_1$ is properly aligned.

Proof. Since $r_i = i$, the first two conditions are immediate. If $C \in \mathcal{B}_{i+1}$ then the function $f(x_1, \ldots, x_{i+1}) = \int \chi_C(y_1, \ldots, y_c, x_{c+1}, \ldots, x_{i+1}) \, d\mu$ depends only on $(x_{c+1}, \ldots, x_{i+1})$, and is therefore $K_{i+1,i+1-c}(\mathcal{B}_{i+1-c}) \subseteq K_{i+1,i}(\mathcal{B}_i)$-measurable. □

When dealing with graphs, although we stated things in terms of tuples $\vec{x}_W$, we were really interested in the collection of infinitesimal neighborhoods $\{\lim_{j \to \infty} \mathcal{N}^j(x_w)\}_{w \in W}$. In the graph setting, however, we could ignore the distinction between a point and its infinitesimal neighborhood.

For hypergraphs, though, we need to consider multiple layers of infinitesimal neighborhoods: in the standard example, a pair $(x_1, x_2)$ has a pair of infinitesimal neighborhoods $\lim_{j \to \infty} (\mathcal{N}^j_1(x_1), \mathcal{N}^j_1(x_2))$ and then an infinitesimal neighborhood of pairs $\lim_{j \to \infty} \mathcal{N}^j_2(x_1, x_2)$. The problem is that specifying an actual tuple of points pins down all these infinitesimal neighborhoods simultaneously. But there could be distinct pairs $(x_1, x_2), (y_1, y_2)$ with $(\mathcal{N}^j_1(x_1), \mathcal{N}^j_1(x_2)) = (\mathcal{N}^j_1(y_1), \mathcal{N}^j_1(y_2))$ for all $j$, but $\mathcal{N}^j_2(x_1, x_2) \ne \mathcal{N}^j_2(y_1, y_2)$ for some $j$; that is, a pair of infinitesimal neighborhoods of points might (and, in general, does) partition into many neighborhoods of pairs.

Now, however, we need to separate these notions properly. We borrow model-theoretic terminology, referring to infinitesimal neighborhoods as types.

Definition 4.3. 
When $\mathcal{N}$ is a system of neighborhoods with arity $r$, an $\mathcal{N}$-type is a decreasing sequence of non-empty sets $P_j \supseteq P_{j+1} \supseteq \cdots$ with each $P_j \in \mathcal{N}^j$. When $p = \{P_j\}$ is a type, we write $p(j) = P_j$. For any $\vec{x} \in \Omega^r$, we write $tp_{\mathcal{N}}(\vec{x}) = \{\mathcal{N}^j(\vec{x})\}$.

There are two different perspectives on types which it will be useful to keep in mind below. The simpler perspective is that a type is, essentially, a $G_\delta$-set (more precisely, a distinguished presentation of a $G_\delta$-set): the type is giving us the set of points $\bigcap_j P_j$, and dealing with $\mathcal{N}(x)$ rather than $x$ is a way of "zooming out" from $x$ to all the points infinitesimally close to it.

In particular, if we fix two $\mathcal{N}_1$-types $\mathcal{N}_1(x_1), \mathcal{N}_1(x_2)$, we are fixing two sets, and so the product $\mathcal{N}_1(x_1) \times \mathcal{N}_1(x_2)$ is itself a rectangle. Although this rectangle has measure 0, we can hope that it behaves like a limit of the positive measure rectangles $\mathcal{N}^j_1(x_1) \times \mathcal{N}^j_1(x_2)$. For instance, if $E$ is a random graph, we might expect that $\mathcal{N}^j_1(x_1) \times \mathcal{N}^j_1(x_2)$ contains both pairs belonging to $E$ and pairs not belonging to $E$. Indeed, we will see that almost all points belong to types which do behave like the limit of the positive measure types that approximate them.

There is a technical subtlety: perhaps there is a failure of compactness and the intersection $\bigcap_j P_j$ happens to be empty, even though each finite intersection is non-empty. In practice, we always care more about the approximations to the set than the actual intersection: if the intersection happened to be empty, we could always fill in a point inside it. Indeed, ultraproducts are saturated which, in particular, ensures that each type is non-empty.

This suggests the second perspective: we can think of the types themselves as being points, in a different but related space. That is, instead of working with the space $\Omega$ of points, we can work with a space whose elements are the $\mathcal{N}$-types, together with a measurable function $tp$ from $\Omega$ to this space.
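To make Definition 4.3 concrete, here is a minimal sketch under a strong simplifying assumption: the space is $[0, 1)$ and the level-$j$ neighborhoods are the dyadic intervals of length $2^{-j}$. This toy system of arity 1 is an illustration only and is not the system of neighborhoods used in the paper; the names `neighborhood` and `type_prefix` are hypothetical.

```python
from fractions import Fraction


def neighborhood(x, j):
    """Return the dyadic interval [k/2^j, (k+1)/2^j) containing x, for x in [0, 1)."""
    k = int(x * 2**j)  # truncation equals floor because x >= 0
    return (Fraction(k, 2**j), Fraction(k + 1, 2**j))


def type_prefix(x, depth):
    """First `depth` levels of the type of x: a decreasing sequence of intervals."""
    return [neighborhood(x, j) for j in range(depth)]


x = Fraction(1, 3)
y = Fraction(3, 8)
```

In this toy system `x` and `y` lie in the same neighborhood at levels 0, 1 and 2 and are separated at level 3, so their types differ even though several finite approximations agree.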
Wewill not explicitly use this second perspective, but it may be useful to keepin mind. This second perspective also an explicit connection to the graphon-based approachesto regularity, as in [10, 24]. These approaches avoid the use of a Keisler graded probability When ~x W is a tuple, we want to consider the N -type of ~x W , by whichwe mean the N -types of all size r subsets of W . Slightly more generally, if S ⊆ (cid:0) Wr (cid:1) , we need to consider the collection of N -types precisely for those e ∈ S . (The case we will need this for is that, if x w = x w ′ , we will want toignore those e ∈ (cid:0) Wr (cid:1) which contain both x w and x w ′ .) Definition 4.4. When r ≤ | W | is the arity of N and S ⊆ (cid:0) Wr (cid:1) , an N - S -typeis a tuple ~p S = { ~p e } e ∈ S such that for each e ∈ S , ~p e is an N -type and, foreach j , ~p S ( j ) = T S ( { ~p e ( j ) } ) is non-empty.For any point ~x W , letting S = R ( ~x W ), there is a corresponding N - S -type tp ( ~x W ) given by ( tp N ( ~x W )) e = tp N ( ~x e ).The only case we will need is where S = (cid:0) Wr (cid:1) \ R ( ~x W ) (or an analogreplacing ~x W with N -types). Since tuples with repeated coordinates are anexceptional case with measure 0, they will not be needed until we deal withinduced hypergraph removal.Note that f + N ( ~x ) depends only on the type of ~x , not on the particularpoint, and so being a point of density is a property of the type: if ~x is apoint of density for f in N then so is every ~x in tp N ( ~x ).Finally, we need our most general definition: we have a sequence of sys-tems of neighborhoods N d , . . . , N and want to consider the N i -type of a ~x W -tuple for all i simultaneously. Definition 4.5. When ~p = { ~p ,w } w ∈ W is a N - (cid:0) W (cid:1) -type, we write R r ( ~p ),the tuples of length r with repeated elements for the set of tuples e ∈ (cid:0) Wr (cid:1) such that there are w, w ′ ∈ e with w = w ′ and p ,w = p ,w ′ . 
When there areno repeated types, we will write R ( ~p ) = ∅ (omitting the subscript r ).When N d , . . . , N is a sequence of systems of neighborhoods for each i , a N d , . . . , N -type is a set ~p W = { ~p i,e } i ≤ d such where ~p = { ~p ,w } w ∈ W is an N - (cid:0) W (cid:1) -type and for i > ~p i = { ~p i,e } e ∈ ( Wri ) \R ei ( ~p ) is a N i -( (cid:0) Wr i (cid:1) \ R r i ( ~p ))-type.For any point ~x W , we write tp N d ,..., N ( ~x W ) for the type given by ( tp N d ,..., N ( ~x W )) i = tp N i ( ~x W ).This definition really is what we should expect: a N d , . . . , N -type ~p as-signs, for each i ≤ d and each r i -sub-tuple e without repeated N -types,an N i -type ~p i,e . The tuples with repeated types are omitted because thosetuples concentrate on diagonals, and have to be handled differently. space by taking our spaces Ω r with r > , they use a ternary product Ω ,where the first two components represent copies of Ω while the third contains the part of Ω which is not measurable with respect to B , . Types give an alternate construction of this:we can see that the map tp : Ω → Ω given by tp ( x, y ) = ( N ( x ) , N ( y )) is inadequate—for instance, if E is a random graph on Ω, there is no E ∗ ⊆ Ω with E = tp − ( E ∗ ). Instead,the correct map is tp : Ω → Ω × Ω , where Ω the space of N -types; Ω is a Keislergraded probability space, but Ω × Ω is an ordinary measure-theoretic product. REMOVAL LEMMA FOR ORDERED HYPERGRAPHS 21 This is precisely a description of an infinitesimal complex: we wish toconsider a subset of Ω W where, for each e ∈ (cid:0) Wr i (cid:1) , we restrict ourselves tothe set of ~x W such that ~x e ∈ ~p i,e .The additional subtlety is that when when we have a repeated tuple ~p ,w = ~p ,w ′ , we don’t want to consider more complicated types contain-ing more than one of them. 
(This is a technical point involving how we handle repeated vertices in the proof of induced hypergraph counting, but for now, observe that if $\vec{p}_{1,w} = \vec{p}_{1,w'}$ then we should expect a type of higher arity containing both $w$ and $w'$ to concentrate on a diagonal; since the diagonal has measure 0, this means our type concentrates on a set of measure 0, which obstructs our ability to prove a counting lemma.)

4.2. Dense Types. We have already noted that being a point of density is really a property of the type, not the point. For completeness, we restate the definition in terms of types. Recall that, for any $\mathcal{N}$-type $p$ and any integer $j$, $p(j)$ is a set in $\mathcal{N}^j$ approximating $p$.

Definition 4.6. Let $\mathcal{N}$ be a nested system of neighborhoods. Let $f : \Omega^W \to [0, 1]$ be given. For any $\mathcal{N}$-type $p$ such that each $p(j)$ has positive measure, we define
\[ f^j_{\mathcal{N}}(p) = \frac{1}{\mu(p(j))} \int_{p(j)} f(\vec{x}) \, d\mu \]
and
\[ f^+_{\mathcal{N}}(p) = \lim_{j \to \infty} f^j_{\mathcal{N}}(p). \]
We say an $\mathcal{N}$-$S$-type $p$ is a dense type for $f$ if $f^+_{\mathcal{N}}(p)$ exists and
\[ \lim_{j \to \infty} \frac{1}{\mu(p(j))} \int_{p(j)} |f^+_{\mathcal{N}}(y) - f^+_{\mathcal{N}}(p)| \, d\mu(y) = 0. \]
We say $p$ is a positive dense type for $f$ if $p$ is a dense type for $f$ and $f^+_{\mathcal{N}}(p) > 0$.

For hypergraphs, when we have a $\vec{p}_e$, we need $\{\vec{p}_{e'}\}_{e' \subsetneq e}$ to be a dense type for each (or at least most) of the sets $\vec{p}_e(j)$. In order to make the inductive step work, we need to demand that $f$ be dense at $\vec{p}$ in a slightly stronger way.

We need to relativize the conditional expectation. We take a $\sigma$-algebra $\mathcal{D}$, a function $f$, and a set $B$ which we should think of as being more complicated than those in $\mathcal{D}$ (for instance, we might have $\mathcal{D} = \mathcal{B}_{2,1}$ and $B \in \mathcal{B}_2 \setminus \mathcal{B}_{2,1}$), and we want to define the conditional expectation of $f$ "around the set $B$". We will write this $\mathbb{E}(f \lrcorner B \mid \mathcal{D})$, which will be precisely the $\mathcal{D}$-measurable information with the property that, when given $B$, we can reconstruct $\mathbb{E}(f \chi_B \mid \mathcal{D})$.

Definition 4.7. Let $f$ be a function, $P$ a set, and $\mathcal{D}$ a $\sigma$-algebra.
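The weighted projection $\mathbb{E}(f \lrcorner P \mid \mathcal{D})$ is defined to be the unique (up to almost-everywhere equality) function with domain $\{\vec{x} \mid \mathbb{E}(\chi_P \mid \mathcal{D})(\vec{x}) > 0\}$ such that
\[ \mathbb{E}(f \lrcorner P \mid \mathcal{D})(\vec{x}) = \frac{\mathbb{E}(f \chi_P \mid \mathcal{D})(\vec{x})}{\mathbb{E}(\chi_P \mid \mathcal{D})(\vec{x})}. \]
Note that $\mathbb{E}(f \lrcorner P \mid \mathcal{D})$ is, as the notation suggests, measurable with respect to $\mathcal{D}$. The main fact we will need is the following.

Lemma 4.8. If $g$ is $\mathcal{D}$-measurable then
\[ \int f \chi_P g \, d\mu = \int \mathbb{E}(f \lrcorner P \mid \mathcal{D}) \chi_P g \, d\mu. \]

Proof.
\[ \int f \chi_P g \, d\mu = \int \mathbb{E}(f \chi_P \mid \mathcal{D}) g \, d\mu = \int \mathbb{E}(f \lrcorner P \mid \mathcal{D}) \, \mathbb{E}(\chi_P \mid \mathcal{D}) g \, d\mu = \int \mathbb{E}(f \lrcorner P \mid \mathcal{D}) \chi_P g \, d\mu. \]
□

Definition 4.9. Given $f : \Omega^W \to [0, 1]$, $P \subseteq \Omega^W$, and two systems of neighborhoods $\mathcal{N}_d, \mathcal{N}_{d-1}$, we define
\[ f^{+ \lrcorner P}_{\mathcal{N}_d, \mathcal{N}_{d-1}} = \mathbb{E}(\chi_{\{\vec{y} \mid f^+_{\mathcal{N}_d}(\vec{y}) > 0\}} \lrcorner P \mid K_{W, r_{d-1}}(\mathcal{N}^\sigma_{d-1})). \]

This obscure definition is justified by its crucial appearance in Theorem 4.12 below. In practice, $P$ will have the form $T_{\binom{W}{r_d}}(\{\vec{p}_e(j)\})$ for some $\mathcal{N}_d$-type $\vec{p}$, so we will have partitioned $\Omega^W$ into sets $P$ of this form and then we can think of the functions $f^{+ \lrcorner P}_{\mathcal{N}_d, \mathcal{N}_{d-1}}$ as being a "partition of unity" applied to the function $\mathbb{E}(\chi_{\{\vec{y} \mid f^+_{\mathcal{N}_d}(\vec{y}) > 0\}} \mid K_{W, r_{d-1}}(\mathcal{N}^\sigma_{d-1}))$.

Definition 4.10. For each $i \le d$, let $\mathcal{N}_i$ be a system of neighborhoods and let $f : \Omega^W \to \mathbb{R}$. We say a $\mathcal{N}_d, \ldots, \mathcal{N}_1$-type $\vec{p}_W$ with $\mathcal{R}(\vec{p}) = \emptyset$ is a dense type for $f$ in $\mathcal{N}_d, \ldots, \mathcal{N}_1$ if:
• $\vec{p}_d$ is a dense type for $f$ (as an $\mathcal{N}_d$-type),
• for all $j$ and each $e \in \binom{W}{r_d}$, $\{\vec{p}_{i,e'}\}_{i \ldots}$ …
If, additionally, $f^+_{\mathcal{N}_d}(\vec{p}) > 0$, we say $\vec{p}$ is a positive dense type for $f$ in $\mathcal{N}_d, \ldots, \mathcal{N}_1$.

Lemma 4.11. For any measurable $f : \Omega^k \to [0, 1]$ and almost every $x$, $tp_{\mathcal{N}_d, \ldots, \mathcal{N}_1}(x)$ is a dense type for $f$, and for almost every $x$ with $f(x) > 0$, $tp_{\mathcal{N}_d, \ldots, \mathcal{N}_1}(x)$ is a positive dense type for $f$.

Proof. The set of $x$ so that $\mathcal{R}(tp_{\mathcal{N}_d, \ldots, \mathcal{N}_1}(x)) \ne \emptyset$ has measure 0, so we may ignore these points.

We now proceed by induction on $d$.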
tp ( x ) N d is a dense type for f in N d exactly when x is, and we have already shown that the set of x such that x is dense point for f has measure 1.For each P ∈ S j N d ( j ), by the inductive hypothesis the set of x suchthat tp N d − ,..., N ( x ) is a dense type for P has measure 1. Since there arecountably many elements in S j N d ( j ), the set of of x so that tp N d − ,..., N ( x )is a dense type for all of them also has measure 1.Also, for each P ∈ S j N d ( j ), the set of x ∈ P such that tp N d − ,..., N ( x ) isnot a positive dense type for P has measure 0, and so again, except on a setof measure 0, tp N d − ,..., N ( x ) will be a positive dense type for N jd ( x ).It remains to show that, for each δ > E , the set of pointsfailing the fourth condition above with E has measure < δ .Let δ, E be given. Let A + = { ~y | f + N d ( ~y ) > } . By choosing j sufficientlylarge, we can arrange that A + is contained, except for a set of measure < δ/ 2, in elements P ∈ K W,r d ( N σd ( j )) such that µ ( A + ∩ P ) µ ( P ) > − δ E .Within any such P ,1 − δ < µ ( P ) µ ( A + ∩ P )= 1 µ ( P ) Z χ A + χ P dµ = 1 µ ( P ) Z E ( χ A + y P | K W,d − ( N σd − )) χ P dµ = 1 µ ( P ) Z f + y P N d , N d − χ P dµ and therefore µ ( { ~y ∈ P | f + y P N d , N d − ( ~y ) > − E } ) µ ( P ) > − δ S E = { ~y | f + y P N d , N d − ( ~y ) > − E } . Inductively, the set of ~x ∈ S E ∩ P such that tp N d − ,..., N ( ~x ) is not a positive dense type for S E has measure atmost δµ ( P ) / 2, and therefore the set of ~x ∈ S E such that tp N d − ,..., N ( ~x ) isnot a positive dense type for S E has measure at most δ . (cid:3) Counting and Removal. The next result is the analog of the hyper-graph counting lemma. 
We suppose we have a configuration { f e } e ∈ S with S a set of subsets of W , and we have points ~x W with f e ( ~x e ) > e , tp N d ,..., N ( ~x e ) is a positivedense type for f e , then actually we can expand this single point into a setof points of positive measure, showing that t S ( { f e } ) > Theorem 4.12. Let W be a finite set, let N d , . . . , N be a properly alignedsequence of systems of neighborhoods so that N i is a nested system of neigh-borhoods with arity r i , let S be a set of subsets of W , and suppose that ~p W is a N d , . . . , N -type such that: • for each e ∈ S , the restriction ~p W is a positive dense type for f e , and • for each e ∈ S , either: – f e is K e,r d ( N σd ) -measurable, or – for every e ′ ∈ S \ { e } , the function ~x e R f e ′ ( ~x e ∪ e ′ ) dµ ( ~x e ′ \ e ) is K e,r d ( N σd ) -measurable.Then t S ( { f e } ) > .Proof. We proceed by induction on d . When d = 1, this is exactly Theorem3.9.So suppose d > 1. By Lemma 3.8 with D e = K e,r d ( N σd ) for all e , wehave t S ( { f e } ) = t S ( { ( f e ) + N d } ). Since ~p e is a positive dense type of f e , also( f e ) + N d ( ~p e ) > e . Let A + e = { ~y e | ( f e ) + N d ( ~y e ) > } . It suffices toshow that t S ( { χ A + e } ) > j sufficiently large. For each e ∈ S , let A ♭e = { ~z | ( χ A + e ) + y T ( erd ) ( { ~p d,e ′ ( j ) } ) N d , N d − ( ~z ) > − | S | + 1 } , so { ~p i,e } i Suppose H = ( W, F ) is a finite k -graph and G = (Ω , E ) isa k -graph with a countably approximated atomless Keisler graded probabilityspace on Ω with E ∈ B k . If t H ( G ) = 0 then there is a symmetric E ′ ⊆ E such that E \ E ′ is a measure set contained in an intersection of sets in B k and, taking G ′ = (Ω , E ′ ) , T H ( G ′ ) = ∅ .Proof. Nearly identical to the proof of Graph Removal, Theorem 3.10. Let N k − , . . . , N be a sequence of systems of neighborhoods for the main exam-ple. Then let E ′ ⊆ E consist of the points in E whose type is positive densefor χ E . 
If $T_H(E') \ne \emptyset$ then any $\vec{x}_W \in T_H(E')$ satisfies the conditions of Theorem 4.12, and therefore $t_H(E) > 0$. □

Corollary 4.14. For every finite $k$-graph $H = (W, F)$ and every $\epsilon > 0$ there is a $\delta > 0$ so that whenever $G = (V, E)$ is a $k$-graph with $t_H(G) < \delta$, there is a symmetric $E' \subseteq E$ with $|E \setminus E'| < \epsilon |V|^k$ such that, taking $G' = (V, E')$, $T_H(G') = \emptyset$.

Conditioning on Sets of Measure 0. Before going on, it will be convenient to consider the notion of picking a type "uniformly at random". The natural way to pick a random type is to pick a random point $\vec{x}$ and take $tp_{\mathcal{N}_d, \ldots, \mathcal{N}_1}(\vec{x})$. (Note that, with probability 1, such a type has no repeated $\mathcal{N}_1$-types, so we do not need to worry about that complication here.)

However, what we will need later is to first pick $\mathcal{N}_1$-types, then the $\mathcal{N}_2$-type, and so on, and we will need to describe what it means to pick a $\mathcal{N}_2$-type randomly among the extensions of a given $\mathcal{N}_1$-type.

Because the types represent sets of measure 0, we do not generally expect to be able to make sense of the probability of an event conditioned on being in a type $\vec{p}$. However, because these events are intersections of a well-defined family of positive measure events, we can make sense of conditioning on them as long as the right limits converge.

Say we have two systems of neighborhoods, $\mathcal{N}$ and $\mathcal{M}$, of arity $r \le s$, respectively. (For instance, $\mathcal{M} = \mathcal{N}_{i+1}$ while $\mathcal{N} = \mathcal{N}_i$.) Then almost every $\mathcal{N}$-$s$-type $\vec{p}$ is a dense type for every $P$ in every $\mathcal{M}(c)$, since there are only countably many such sets.

For instance (in the standard example) choosing $\mathcal{N}_1$-types $p$ and $q$ gives us a measure 0 rectangle $p \times q$; despite being measure 0, for almost all $p$ and $q$ we can make sense of choosing a pair $(x, y) \in p \times q$ randomly and taking $tp_{\mathcal{N}_2}(x, y)$: the probability that $tp_{\mathcal{N}_2}(x, y)(j) = P$ is precisely the density of $P$ in the type $p \times q$.

So we may choose a $\mathcal{N}_d, \ldots, \mathcal{N}_1$-type by first choosing $\mathcal{N}_1$-types randomly, and then inductively using this process to choose the $\mathcal{N}_2$-types, then the $\mathcal{N}_3$-types, and so on.

The only thing we need to check is that this gives the same distribution as if we had simply chosen the type of a random point.

Lemma 4.15. The inductive method of choosing $\mathcal{N}_d, \ldots, \mathcal{N}_1$-types has the same distribution as choosing $tp_{\mathcal{N}_d, \ldots, \mathcal{N}_1}(\vec{x})$ for a uniformly chosen $\vec{x}$.

Proof. By induction on $i \le d$. For $i = 1$ these distributions have the same definition. Suppose the claim holds for $i$. The probability that we choose a $\mathcal{N}_{i+1}$-type $\vec{p}_{i+1}$ with $\vec{p}_{i+1}(j) = P$ is the integral, over all choices of $\vec{p}_i$, of $(\chi_P)^+_{\mathcal{N}_i}(\vec{p}_i)$. By the inductive hypothesis, this is the integral over a random $\vec{x}$ of $(\chi_P)^+_{\mathcal{N}_i}(tp_{\mathcal{N}_i}(\vec{x}))$, which is equal to $\mu(P)$. □

Induced Hypergraph Removal. To prove induced hypergraph removal, we need a hypergraph counting lemma that allows repeated elements in tuples. To do that, we need to generalize the notion of a likely configuration.

Suppose we have a tuple of points $\vec{a}_W$ where some of the points are repeated. Once again, we want to be able to "wiggle the points" so that we can replace them with nearby points which we will be able to apply Theorem 4.12 to. The complication is that now, in addition to the types of the points themselves, we need to be concerned with the types of the tuples they belong to: if we "wiggle" $a_1$, this also affects the neighborhood of $tp_{\mathcal{N}_2}(a_1, a_2)$. Indeed, we can't really "wiggle" $a_1$ while holding $tp_{\mathcal{N}_2}(a_1, a_2)$ constant, because $tp_{\mathcal{N}_2}(a_1, a_2)$ completely determines $tp_{\mathcal{N}_1}(a_1)$.

So we have to wait until later in the counting process: the proof of Theorem 4.12 inductively reduces a hypergraph counting problem to a problem about counting graphs. In particular, prior to the last step of that process, we replace $tp_{\mathcal{N}_2}(a_1, a_2)$ with a positive measure approximation to it. (Specifically, the set $A^\flat$, which we construct in that proof.)
At that point, we can safely “wiggle” $a$, since we can just promise to remain within various sets which have positive measure.

With that in mind, we can prove our infinitary version of removal. We state it in a general form, allowing a coloring $\rho$ of $k$-tuples and showing that, with a small change to $\rho$, we can simultaneously remove all copies of small colorings $(W, c)$ which appear with 0 density in $(\Omega, \rho)$. As always, the induced hypergraph case is when $\Sigma = \{0, 1\}$.

Footnote (on tuples $\vec a_W$ with repeated points): Really, this should be talking about types rather than points, but we can more or less equate $a_w$ with $\mathrm{tp}_{\mathcal N}(a_w)$, and this will be clearer without the added abstraction of talking about types.

Theorem 4.16. Let $\Sigma$ be a finite set and let $(\Omega, \rho)$ be a coloring with a countably approximated atomless Keisler graded probability space on $\Omega$ and let $\rho : \binom{\Omega}{k} \to \Sigma$ be such that each $\rho^{-1}(\sigma) \in \mathcal B$. For each $\epsilon > 0$ there is a $\rho' : \binom{\Omega}{k} \to \Sigma$ such that $\mu(\{\vec x \mid \rho(\vec x) \neq \rho'(\vec x)\}) < \epsilon$, each $(\rho')^{-1}(\sigma) \in \mathcal B$, and, for all $(W, c)$, if $t_{W,c}(\Omega, \rho) = 0$ then $T_{W,c}(\Omega, \rho') = \emptyset$.

Proof. To emphasize the way the proof generalizes, we will state it in terms of a general sequence of systems of neighborhoods, $\mathcal N_{d-1}, \ldots, \mathcal N_1$. For this precise result, we can take $d = k$ and $r_i = i$ for all $i < d$, but nothing changes if we consider a longer sequence of systems of neighborhoods, and we will need this case in the next section.

General Setup: For each $\sigma \in \Sigma$, let $P_\sigma = \{\vec y \mid \rho(\vec y) = \sigma\}$ and $f_\sigma = \chi_{P_\sigma}$, so $T_{W,c}(\Omega, \rho) = T_{\binom{W}{k}}(\{P_{\sigma(e)}\}_{e \in \binom{W}{k}})$. Let $\mathcal P = \{P_\sigma \mid \sigma \in \Sigma\}$, which is a partition of $k$-tuples from $\Omega$. Choose a sequence $c_{d-1} < \cdots < c_1$ where each $c_j$ is sufficiently large relative to the sizes of $|\mathcal N_{j'}^{c_{j'}}|$ for $j' > j$ and so $c_1$ is large enough that the set of $k$-tuples with more than one point in the same element of $\mathcal N_1^{c_1}$ has measure $< \epsilon/$ $\mathcal P$ as being analogous to $\mathcal N_d$; in particular, we define $r_d = k$.

Our plan is this.
We have partitioned $\Omega^{r_{d-1}}$ into the elements of $\mathcal N_{d-1}^{c_{d-1}}$, then partitioned $\Omega^{r_{d-2}}$ into the elements of $\mathcal N_{d-2}^{c_{d-2}}$, which are much smaller, and so on. By analogy to the proof of Theorem 3.12, we will want to consider the partition of $\Omega^k$ into sets of the form T U i 0, and we take $\vec Q^{\vec P}_{[r_{j+1}]} = P$ and $\vec p^{\vec P}_{[r_{j+1}]} = \vec p^{\vec Q^{\vec P}}_{[r_{j+1}]}$.

Finally, consider a defective configuration such that, for some $e \subsetneq [r_{j+1}]$, $\vec Q^{\vec P}_e \neq \vec P_e$. Then we wish to simply follow along with the “corrected” configuration: define $\vec P'$ by $\vec P'_e = \vec Q^{\vec P}_e$ for $e \subsetneq [j+1]$ and $\vec P'_{[r_{j+1}]} = \vec P_{[r_{j+1}]}$, and set $\vec Q^{\vec P}_{[r_{j+1}]} = \vec Q^{\vec P'}_{[r_{j+1}]}$ (which was already defined in one of the previous cases) and $\vec p^{\vec P}_{[r_{j+1}]} = \vec p^{\vec P'}_{[r_{j+1}]}$.

To satisfy symmetry, we define $\vec Q$ and $\vec p$ for permutations of $\vec P$ in the unique way determined by symmetry. Note that we need to use the fact that $\vec P$ has distinct singletons to make sure that no permutation other than the identity maps $\vec P$ to itself, so the symmetry requirement imposes no further restrictions on our choices.

We need to check that, with positive probability, the choice of the $\vec p^{\vec P}$ satisfies the six conditions above. The first three follow immediately from the construction.

The fourth and fifth properties hold with probability 1, so we can certainly choose the types $\vec p^{\vec P}$ to satisfy these properties.

For the sixth property, note that the $\leq k$-configurations $\vec P$ such that there exists an $e \subseteq k$ and a $j < |e|$ such that the density of $P_e$ in $T_{\bigcup_{j'} \binom{[k]}{j'}}(\vec P)$ is $< \epsilon k$ have measure at most $\epsilon$. Consider a $\leq k$-configuration such that this does not happen. As we observed in the previous section, the corresponding $\mathcal N_j, \ldots, \mathcal N_1$-$e$-type $\vec p^{\vec P}$ is chosen with the same distribution as choosing the type of a random point in $T_{\bigcup_{j'} \binom{[k]}{j'}}(\vec P)$. By our choice of $c_j$, for each $\leq k$-configuration $\vec P$, the probability that there is any $e$, $j$ so that $P_e$ has density 0 in $\vec p^{\vec P} \upharpoonright j$ is at most $\epsilon/3$.
By averaging over all $\leq k$-configurations (weighted by their size), there is positive probability that we choose the $\vec p^{\vec P}$ so that the set of $\leq k$-configurations failing the condition in (6) has measure at most $\epsilon$. This last condition implies that most points belong to non-defective configurations: the only way there is an $e$ with $\vec P_e \neq \vec Q^{\vec P}_e$ is if there is an $e$ so that $P_e$ has density 0 in the corresponding type of lower arity, which means all such configurations are contained in the set of exceptional configurations.

Defining $\rho'$ for most tuples: At this point, we have done enough to define $\rho'$ on $\leq k$-configurations $\vec P$ with distinct singletons.

Let $\Xi$ be the set of non-empty subsets of $\Sigma$, and for each $\leq k$-configuration $\vec P$ with distinct singletons, let $\xi(\vec P) = \{\sigma \in \Sigma \mid (f_\sigma)^+_{\mathcal N_{k-1}}(\vec p^{\vec P}) > 0\}$. (Since the $f_\sigma$ add to 1, $\xi(\vec P)$ is always non-empty, and therefore in $\Xi$.) We will define $\rho'$ on $T_{\bigcup_{j'} \binom{[k]}{j'}}(\vec P)$ by setting
$$\rho'(\vec x) = \begin{cases} \rho(\vec x) & \text{if } \rho(\vec x) \in \xi(\vec P), \\ \text{some } \sigma \in \xi(\vec P) & \text{otherwise.} \end{cases}$$

Note that if $\rho(\vec x) \neq \rho'(\vec x)$, we must have one of:
• the $\leq k$-configuration containing $\vec x$ does not have distinct singletons,
• the $\leq k$-configuration containing $\vec x$ is defective, or
• $\rho(x) \notin \xi(\vec P)$.

The first two conditions account for points of measure at most $\epsilon/$. If $\rho(x) \notin \xi(\vec P)$ then we have $(f_{\rho(x)})^+_{\mathcal N_{k-1}}(\vec p^{\vec P}) = 0$. Since the $\vec p^{\vec P}$ are distributed uniformly at random, except on a set of configurations of measure at most $\epsilon/6$, $(f_{\rho(x)})^+_{\mathcal N_{k-1}}(\vec p^{\vec P}) = 0$ implies that the set of points in $\vec P$ with color $\rho(x)$ has measure at most $(\epsilon/\ )\,\mu(T_{\bigcup_{j'} \binom{[k]}{j'}}(\vec P))$. Therefore (regardless of how we define $\rho'$ on the $\leq k$-configurations which do not have distinct singletons), $\mu(\{\vec x \mid \rho(\vec x) \neq \rho'(\vec x)\}) < \epsilon$.
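As a purely illustrative finite analogue of the recoloring rule above (not the measure-theoretic construction itself), the following Python sketch treats each configuration as a finite cell of tuples, takes the allowed color set to be the colors whose density in the cell exceeds a threshold, and recolors every tuple whose color falls outside that set. The names `recolor`, `cells`, and `threshold` are hypothetical, and the density threshold is an assumption standing in for positivity of $(f_\sigma)^+$ at a randomly chosen type.

```python
from collections import Counter


def recolor(cells, threshold=0.0):
    """Toy finite analogue of the recoloring step (an assumption-laden sketch).

    cells: dict mapping a cell label to a list of colors, one per tuple.
    Returns (new_cells, changed), where changed counts recolored tuples.
    """
    new_cells = {}
    changed = 0
    for label, colors in cells.items():
        if not colors:
            # Nothing to recolor in an empty cell.
            new_cells[label] = []
            continue
        counts = Counter(colors)
        total = len(colors)
        # Allowed colors: those with density above the threshold in this cell.
        allowed = {c for c, n in counts.items() if n / total > threshold}
        if not allowed:
            # Keep the allowed set non-empty, mirroring that xi is never empty.
            allowed = {counts.most_common(1)[0][0]}
        # "Some sigma in xi": pick one allowed color deterministically.
        fallback = max(allowed, key=lambda c: (counts[c], str(c)))
        recolored = []
        for c in colors:
            if c in allowed:
                recolored.append(c)
            else:
                recolored.append(fallback)
                changed += 1
        new_cells[label] = recolored
    return new_cells, changed


if __name__ == "__main__":
    cells = {"A": ["r"] * 9 + ["b"], "B": ["r", "b"] * 5}
    new_cells, changed = recolor(cells, threshold=0.2)
    print(changed, new_cells["A"])
```

In this toy run only the single rare color in cell `"A"` is changed, which mirrors the point of the measure estimate: tuples are recolored only where their color is negligible within their cell, so the total change stays small.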
Dealing with tuples with distinct singletons: Next, again as in Lemma 3.12, we need to decide what to do with the configurations which do not have distinct singletons.

To motivate the construction, it is useful to look at how we will use our definition. Suppose that, after finishing the definition of $\rho'$, we have some $\vec x_W \in T_{W,c}(\Omega, \rho')$. Then $\vec x_W$ induces some maps into our partition: for $j < d$ we can take $\theta_j : \binom{W}{r_j} \to \mathcal N_j^{c_j}$ given by $\theta_j(e) = \mathcal N_j^{c_j}(\vec x_e)$. Then for $j \leq d$, whenever $e \in \binom{W}{r_j}$ and $\theta$ is injective on $e$, we can define $\Theta(e)$ to be the $\leq d$-configuration { θ j ( e ′ ) } e ′ ∈ S j ′ Σ, let us say $(W, \zeta, \iota, \{\theta_j\})$ is $\nu$-homogeneous if, for all $e \in \binom{W}{k} \setminus \mathrm{dom}(\zeta)$, $\iota(e) = \nu(\Theta(e))$.

Given $\vec x_W \in \Omega^W$, we can of course induce a function $\iota : \binom{W}{k} \to \Sigma$ by taking $\iota(e) = \rho(\vec x_e)$. So what we need to do is find such $\vec x_W$ which are homogeneous.

Let us say $(W, \zeta, \{\theta_j\})$ has size at least $d$ if for every non-defective $\leq k$-configuration $\vec P$ with distinct singletons, there are at least $d$ $k$-tuples in $\binom{W}{k}$ with $\Theta(e) = \vec P$.

Observe that, for every $(W, \zeta, \{\theta_j\})$, there is some $d$ so that whenever $(W', \zeta', \{\theta'_j\})$ has size at least $d$, there is an embedding $\pi : W \to W'$ so that, for all $e$, $\theta_j(e) = \theta'_j(\pi(e))$. Furthermore, for every $d$, there is an $m$ so that for any $(W, \zeta, \iota, \{\theta_j\})$ with size at least $m$, there is a $W_0 \subseteq W$ so that $(W_0, \zeta, \iota, \{\theta_j\})$ has size at least $d$ and is homogeneous.

So, for each $d$, we can take this large enough $m$ and fix a $(W, \zeta, \{\theta_j\})$ of size at least $m$. We have $t_{W,\zeta}(\Omega, \xi) >$ $\vec x_W \in T_{W,\zeta}(\Omega, \xi)$, we have a $W_0 \subseteq W$ of size at least $d$ so that $\vec x_{W_0}$ is homogeneous (that is, there is a $\nu : Z \to \Sigma$ so that, for $e \in \binom{W_0}{k}$, $P(\vec x_e) = \nu(\Theta(e))$; equivalently, $(W_0, \zeta, \iota, \{\theta_j\})$ is $\nu$-homogeneous, where $\iota$ is induced by $\vec x_{W_0}$).
Since there are finitely many $\nu$, there must be some $\nu$ which we obtain for a set of $\vec x_W$ of positive measure. Such a $\nu$ exists for every $m$, so there is some $\nu$ which works for arbitrarily large $m$.

We pick such a $\nu$ and use it to complete the definition of $\rho'$: when $\vec x \in T_{\bigcup_{j'} \binom{[k]}{j'}}(\vec P)$ for some $\vec P \in Z$, we set $\rho'(\vec x) = \nu(\vec P)$.

Checking that removal holds: All that remains is to show that whenever $T_{W,c}(\Omega, \rho') \neq \emptyset$ we have $t_{W,c}(\Omega, \rho) > 0$. Fix $W$ and $(W, c)$ so that $T_{W,c}(\Omega, \rho') \neq \emptyset$. Choose some $\vec x_W \in T_{W,c}(\Omega, \rho')$.

From $\vec x_W$ we can read off $(W, \zeta, \{\theta_j\})$ by setting, for $e \in \binom{W}{r_j}$, $\theta_j(e) = \vec Q^{\{\mathcal N_{j'}^{c_{j'}}(\vec x_{e'})\}_{e' \in \bigcup_{j' \leq j} \binom{e}{r_{j'}}}}$ and $\zeta(e) = \xi(\vec x_e)$. There is a $d$ so that $(W, \zeta, \{\theta_j\})$ will embed in any blow up of the partition of size at least $d$. Take $(W', \zeta', \{\theta'_j\})$ of size at least $m$ where $m$ is large enough, $t_{W',\zeta'}(\Omega, \rho) >$ $\nu$, there is a positive measure of $\vec y_{W'} \in T_{W',\zeta'}(\Omega, \rho)$ such that there is a $W_0 \subseteq W'$ so that $(W_0, \zeta', \{\theta'_j\})$ has size at least $d$ and $\vec y_{W_0}$ is $\nu$-homogeneous.

So consider one of these $\vec y_{W_0}$ where, for all $e \in \binom{W_0}{k}$, $\vec y_e$ is a point of density for each $f_\sigma$ and a positive point of density for $f_{P(\vec y_e)}$. Fix an embedding $\pi : W \to W_0$. We claim that, for each $e \in \binom{W}{k}$, $f^+_{c(e)}(\vec y_{\pi(e)}) > 0$.

First, suppose $\Theta(e) \notin Z$, so $\Theta(e)$ is a non-defective $\leq k$-configuration with distinct singletons. Then $c(e) = \rho'(\vec x_e)$ and, by the definition of $\rho'$, $\rho'(\vec x_e) \in \xi(\vec p^{\Theta(e)}) = \zeta(e) = \zeta'(\pi(e)) = \xi(\vec y_{\pi(e)})$. Therefore $c(e) \in \xi(\vec y_{\pi(e)})$, so $f^+_{c(e)}(\vec y_{\pi(e)}) > 0$.

Next, suppose $\Theta(e) \in Z$. Again $c(e) = \rho'(\vec x_e)$ and, by the definition of $\rho'$, $\rho'(\vec x_e) = \nu(\Theta(e))$.
Since $\vec y_{W_0}$ is $\nu$-homogeneous, we have $\rho(\vec y_{\pi(e)}) = \nu(\Theta(e))$.

So we can apply Theorem 4.12 to $\vec y_{\pi(W)}$, showing that $t_{W,c}(\Omega, \rho) > 0$. $\square$

Ordered Hypergraphs

The work of the previous section applies, with only minimal changes, to ordered hypergraphs.

Definition 5.1. When $(\Omega, <)$ is a linearly ordered set and $(W, \prec)$ is a finite linear order, we write O For each $k$, $\mathcal B_{k,<}$ is the sub-$\sigma$-algebra of $\mathcal B_k$ generated by all products $\prod_{i \leq k} I_i$ where each $I_i$ is an interval in $<$.

Note that, by definition, $\mathcal B_{k,<} \subseteq \mathcal B_{k,1}$.

Lemma 5.3. If $\{(\Omega^k, \mathcal B_k, \mu_k)\}_{k \in \mathbb N}$ then $\{(x, y) \mid x < y\} \in \mathcal B_{2,<}$.

Proof. We show that, for any $\epsilon > 0$, we may approximate $\{(x, y) \mid x < y\}$ to within $\epsilon$. Given $\epsilon > 0$, write $\Omega = \bigcup_i$

Let $(W, \prec)$ be a partially ordered finite set, let $\mathcal N_d, \ldots, \mathcal N_1, \mathcal N_{0,<}$ be a properly aligned sequence of systems of neighborhoods so that $\mathcal N_i$ is a nested system of neighborhoods with arity $r_i$, let $S$ be a set of subsets of $W$, and suppose that $\vec p_W$ is a $\mathcal N_d, \ldots, \mathcal N_1, \mathcal N_{0,<}$-type such that:
• when $w \prec w'$, there is an $i$ so that $\vec p_w(i) < \vec p_{w'}(i')$,
• for each $e \in S$, the restriction $\vec p_W$ is a positive dense type for $f_e$, and
• for each $e \in S$, either:
  – $f_e$ is $\mathcal K_{e,r_d}(\mathcal N_d)$-measurable, or
  – for every $e' \in S \setminus \{e\}$, the function $\vec x_e \mapsto \int f_{e'}(\vec x_{e \cup e'})\, d\mu(\vec x_{e' \setminus e})$ is $\mathcal K_{e,r_d}(\mathcal N_d)$-measurable.
Then $t_{S,\prec}(\{f_e\}) > 0$.

Proof. We proceed by induction on $d$. When $d = 0$ (that is, we are considering a $\mathcal N_{0,<}$-type), the proof is similar to Theorem 3.9, taking care to respect the ordering.

Choose some $\epsilon \leq \min_{e \in S} f^+_e(\vec x_e)$.

Since each $\vec p_e$ is a dense type, we may choose some $j$ large enough that, for each $e \in S$, 1 µ ( ~p e ( j )) µ ( { ~y e ∈ ~p e ( j ) | f + e ( ~y e ) ≥ ǫ/ } ) ≥ − | S | .

Consider $\prod_{w \in W} p_w(j)$. This is a product of intervals and, when $j$ is large enough, the map $w \mapsto p_w(j)$ is order preserving.
Therefore 1 µ ( ~p W ( j )) µ ( { ~y W ∈ ~p W ( j ) | f + e ( ~y e ) ≥ ǫ/ w ≺ w ′ , y w < y w ′ } ) ≥ − | S | . Therefore γ = µ ( { ~y W ∈ ~p W ( j ) | there is some e ∈ S such that f + e ( ~y e ) < ǫ/ } ) < | S | | S | = 1 , so, using Lemma 3.8, t S, ≺ ( { f e } ) = t S, ≺ ( { f + e } ) ≥ ǫ | S | | S | (1 − γ ) > .

The argument from Theorem 4.12 applies unchanged for the inductive case since the set of ordered tuples is $\mathcal N_i$-measurable for all $i$. $\square$

Theorem 5.5 (Ordered Hypergraph Removal). Let $\Sigma$ be a finite set and let $(\Omega, \rho, <)$ be given along with a countably approximated atomless Keisler graded probability space on $\Omega$ with $\rho : \binom{\Omega}{k} \to \Sigma$ such that each $\rho^{-1}(\sigma) \in \mathcal B$ and a dense collection of intervals of $<$ is in $\mathcal B$. For each $\epsilon > 0$ there is a $\rho' : \binom{\Omega}{k} \to \Sigma$ such that $\mu(\{\vec x \in \Omega^k \mid \rho(\vec x) \neq \rho'(\vec x)\}) < \epsilon$, each $(\rho')^{-1}(\sigma) \in \mathcal B$, and, for all $(W, c, \prec)$, if $t_{W,c,\prec}(\Omega, \rho) = 0$ then $T_{W,c,\prec}(\Omega, \rho') = \emptyset$.

Proof. The proof is largely unchanged from the proof of Theorem 4.16, using the sequence of systems of neighborhoods $\mathcal N_{k-1}, \mathcal N_{k-2}, \ldots, \mathcal N_1, \mathcal N_{0,<}$, so we take $d = k + 1$, and using Theorem 5.4.

The only further step that needs to be checked carefully is the homogenization step when dealing with tuples with distinct singletons. Our definition of a blow up $(W, \zeta, \{\theta_j\})$ is unchanged; note that a partial ordering of $W$, on pairs where $\theta$ is injective, can be inferred from the assignment $\theta$. Our homogeneous blowups $(W, \zeta, \iota, \{\theta_j\}, \prec)$ are defined to have total orderings where $\prec$ is consistent with $\theta$. The crucial point is that the Ramsey-type property still holds: for every $d$, there is an $m$ so that for any $(W, \zeta, \iota, \{\theta_j\}, \prec)$, there is a $W_0 \subseteq W$ so that $(W_0, \zeta, \iota, \{\theta_j\}, \prec)$ is homogeneous. The ordering is no obstacle to obtaining this by the usual Ramsey theoretic arguments, and the rest of the proof is unchanged. $\square$

Corollary 5.6 (Ordered hypergraph removal lemma).
For every finite set $\Sigma$, every $\epsilon > 0$, and every $\Sigma$-colored $(W, \prec, c)$, there is a $\delta > 0$ so that for any $(\Omega, <, \rho)$ with $t_{W,\prec,c}(\Omega, <, \rho) < \delta$ there is a $\rho'$ with $|\{\vec x \in \Omega^k \mid \rho(\vec x) \neq \rho'(\vec x)\}| < \epsilon |\Omega|^k$ such that $T_{W,\prec,c}(\Omega, <, \rho') = \emptyset$.

Corollary 5.7 (Infinite ordered hypergraph removal lemma). For every finite set $\Sigma$, every $\epsilon > 0$, and every family $\mathcal F$ of finite $\Sigma$-colored ordered hypergraphs, there are $\delta > 0$ and a bound $M$ so that for any $(\Omega, <, \rho)$, if, for every $(W, \prec, c) \in \mathcal F$ with $|W| \leq M$ we have $t_{W,\prec,c}(\Omega, <, \rho) < \delta$, then there is a $\rho'$ with $|\{\vec x \in \Omega^k \mid \rho(\vec x) \neq \rho'(\vec x)\}| < \epsilon |\Omega|^k$ such that for every $(W, \prec, c) \in \mathcal F$, $T_{W,\prec,c}(\Omega, <, \rho') = \emptyset$.

Further Directions

We have not attempted to identify the correct common generalization of Theorems 4.16 and 5.5 to give a general theorem saying that certain structures can be removed while preserving some fixed structure. Such a theorem must make some promise about the measurability of the fixed structure, and additionally place some sort of Ramsey-type condition on it.

There are other examples in the literature where some distinguished family of sets analogous to $\mathcal B_{<}$ is of particular interest. In particular, [14] considers a computational setting; translated into our framework here, we add the assumption that the points of $\Omega$ are understood to have a structure like binary sequences $2^\Lambda$, embodied in a distinguished family of sets $\mathcal B_{1,c}$ which consists of those sets
$$[s] = \{\omega \mid \forall \lambda \in \mathrm{dom}(s)\ s(\lambda) = \omega(\lambda)\}$$
where $s$ is a partial function with finite domain from $\Lambda$ to $\{0, 1\}$. That is, the distinguished sets are those in which a finite number of coordinates have been fixed. The regularity lemma they prove is precisely the one corresponding to the sequence of $\sigma$-algebras $\mathcal B$, $\mathcal B_{1,c}$; extending the removal lemma to this setting (or to longer sequences $\mathcal B_d, \ldots$
$, \mathcal B, \mathcal B_{1,c}$) would require identifying interesting structures to be the fixed part (analogous to the ordering) which are $\mathcal B_{1,c}$-measurable; that is, the relation symbols in this structure would have to have the property that they can be calculated on all but measure $\epsilon$ points while examining the input at only finitely many points in $\Lambda$.

References

[1] D. J. Aldous. “Representations for partially exchangeable arrays of random variables”. In: J. Multivariate Anal. ISSN: 0047-259X (cit. on p. 3).
[2] N. Alon and O. Ben-Eliezer. “Efficient removal lemmas for matrices”. In: Order. ISSN: 0167-8094 (cit. on p. 1).
[3] N. Alon, O. Ben-Eliezer, and E. Fischer. “Testing hereditary properties of ordered graphs and matrices”. IEEE Computer Soc., Los Alamitos, CA, 2017, pp. 848–858 (cit. on p. 1).
[4] O. Ben-Eliezer et al. Limits of Ordered Graphs and their Applications. 2018. eprint: arXiv:1811.02023 (cit. on p. 3).
[5] B. van den Berg, E. Briseid, and P. Safarik. “A functional interpretation for nonstandard arithmetic”. In: Ann. Pure Appl. Logic. ISSN: 0168-0072 (cit. on p. 5).
[6] F. R. K. Chung. “Regularity lemmas for hypergraphs and quasi-randomness”. In: Random Structures Algorithms. ISSN: 1042-9832 (cit. on p. 12).
[7] D. Conlon et al. “Weak quasi-randomness for uniform hypergraphs”. In: Random Structures Algorithms. ISSN: 1042-9832 (cit. on p. 12).
[8] L. N. Coregliano and A. A. Razborov. “Semantic limits of dense combinatorial objects”. In: Uspekhi Mat. Nauk. ISSN: 0042-1316 (cit. on pp. 2, 34).
[9] P. Diaconis and S. Janson. “Graph limits and exchangeable random graphs”. In: Rend. Mat. Appl. (7). ISSN: 1120-7183 (cit. on p. 3).
[10] G. Elek and B. Szegedy. “A measure-theoretic approach to the theory of dense hypergraphs”. In: Adv. Math. ISSN: 0001-8708 (cit. on pp. 3, 9, 19).
[11] J. Fox. “A new proof of the graph removal lemma”. In: Ann. of Math. (2). ISSN: 0003-486X (cit. on p. 5).
[12] F. Garbe et al. Limits of Latin squares. 2020. eprint: arXiv:2010.07854 (cit.
on p. 3).
[13] I. Goldbring and H. Towsner. “An approximate logic for measures”. English. In: Israel Journal of Mathematics. ISSN: 0021-2172 (cit. on pp. 4, 8, 14).
[14] M. Göös, T. Pitassi, and T. Watson. “Query-to-communication lifting for BPP”. In: SIAM J. Comput. ISSN: 0097-5397 (cit. on p. 36).
[15] W. T. Gowers. “Hypergraph regularity and the multidimensional Szemerédi theorem”. In: Ann. of Math. (2). ISSN: 0003-486X (cit. on p. 1).
[16] D. Hoover. Relations on Probability Spaces and Arrays of Random Variables. Preprint. Institute for Advanced Study, Princeton, NJ, 1979 (cit. on p. 3).
[17] C. Hoppen et al. “Limits of permutation sequences”. In: J. Combin. Theory Ser. B. ISSN: 0095-8956 (cit. on p. 3).
[18] Y. Kohayakawa et al. “Weak hypergraph regularity and linear hypergraphs”. In: J. Combin. Theory Ser. B. ISSN: 0095-8956 (cit. on p. 12).
[19] L. Lovász. Large networks and graph limits. Vol. 60. American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, RI, 2012, pp. xiv+475. ISBN: 978-0-8218-9085-1 (cit. on p. 9).
[20] L. Lovász and B. Szegedy. “Limits of dense graph sequences”. In: J. Combin. Theory Ser. B. ISSN: 0095-8956 (cit. on p. 2).
[21] G. Moshkovitz and A. Shapira. “A tight bound for hypergraph regularity”. In: Geom. Funct. Anal. ISSN: 1016-443X (cit. on p. 5).
[22] B. Nagle, V. Rödl, and M. Schacht. “The counting lemma for regular $k$-uniform hypergraphs”. In: Random Structures Algorithms. ISSN: 1042-9832 (cit. on p. 1).
[23] V. Rödl and J. Skokan. “Regularity lemma for $k$-uniform hypergraphs”. In: Random Structures Algorithms. ISSN: 1042-9832 (cit. on p. 1).
[24] “Semantic Limits of Dense Combinatorial Objects”. In: ArXiv abs/1910.08797 (2019) (cit. on p. 19).
[25] T. Tao. “A variant of the hypergraph removal lemma”. In: J. Combin. Theory Ser. A. ISSN: 0097-3165 (cit. on p. 2).
[26] T. Tao. “Szemerédi’s regularity lemma revisited”. In: Contrib. Discrete Math. ISSN: 1715-0868 (cit. on p. 2).
[27] H. Towsner.
What do ultraproducts remember about the original structures? Draft. Apr. 2018 (cit. on p. 5).
[28] H. Towsner. “$\sigma$-algebras for quasirandom hypergraphs”. In: Random Structures Algorithms. ISSN: 1042-9832 (cit. on pp. 4, 7).
[29] H. Towsner. “An analytic approach to sparse hypergraphs: hypergraph removal”. In: Discrete Analysis (2018) (cit. on p. 4).

Department of Mathematics, University of Pennsylvania, 209 South 33rd Street, Philadelphia, PA 19104-6395, USA
Email address: [email protected]