Hypercontractivity on the symmetric group
Yuval Filmus*, Guy Kindler†, Noam Lifshitz‡, Dor Minzer§

Abstract
The hypercontractive inequality is a fundamental result in analysis, with many applications throughout discrete mathematics, theoretical computer science, combinatorics and more. So far, variants of this inequality have been proved mainly for product spaces, which raises the question of whether analogous results hold over non-product domains. We consider the symmetric group, $S_n$, one of the most basic non-product domains, and establish hypercontractive inequalities on it. Our inequalities are most effective for the class of global functions on $S_n$, which are functions whose $2$-norm remains small when restricting $O(1)$ coordinates of the input, and assert that low-degree, global functions have small $q$-norms, for $q > 2$.

As applications, we show:

1. An analog of the level-$d$ inequality on the hypercube, asserting that the mass of a global function on low degrees is very small. We also show how to use this inequality to bound the size of global, product-free sets in the alternating group $A_n$.

2. Isoperimetric inequalities on the transposition Cayley graph of $S_n$ for global functions, that are analogous to the KKL theorem and to the small-set expansion property in the Boolean hypercube.

3. Hypercontractive inequalities on the multi-slice, and stability versions of the Kruskal–Katona Theorem in some regimes of parameters.

1 Introduction

The hypercontractive inequality is a fundamental result in analysis that allows one to compare various norms of low-degree functions over a given domain. A notorious example is the Boolean hypercube $\{0,1\}^n$ equipped with the uniform measure, in which case the inequality states that for any function $f\colon \{0,1\}^n \to \mathbb{R}$ of degree at most $d$, one has that $\|f\|_q \le \sqrt{q-1}^{\,d}\,\|f\|_2$ for any $q > 2$. (Here and throughout the paper, we use expectation norms, $\|f\|_q = \mathbb{E}_x[|f(x)|^q]^{1/q}$, where the input distribution is clear from context, uniform in this case.)
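As an illustration (ours, not from the paper), the hypercube inequality can be checked numerically on a small instance; the degree-1 function below is an arbitrary choice for the sketch.

```python
from itertools import product

# Hedged sketch: verify ||f||_q <= sqrt(q-1)^d * ||f||_2 for a degree-1
# function on {0,1}^4, using expectation norms under the uniform measure.
n, d, q = 4, 1, 4
points = list(product([0, 1], repeat=n))

def f(x):
    # a linear (degree-1) form: sum of +-1 coordinates
    return sum(1 if b else -1 for b in x)

def norm(p):
    return (sum(abs(f(x)) ** p for x in points) / len(points)) ** (1 / p)

lhs, rhs = norm(q), (q - 1) ** (d / 2) * norm(2)
assert lhs <= rhs + 1e-9
print(f"||f||_4 = {lhs:.4f} <= sqrt(3) * ||f||_2 = {rhs:.4f}")
```

For this particular $f$, $\|f\|_2 = 2$ and $\|f\|_4 = 40^{1/4} \approx 2.51$, comfortably below the bound $\sqrt{3}\cdot 2 \approx 3.46$.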
While the inequality may appear technical and mysterious at first sight, it has proven itself remarkably useful, and lies at the heart of numerous important results, e.g. [15, 11, 2, 23]. While the hypercontractive inequality holds for general product spaces, in some important cases it is very weak quantitatively. Such cases include the $p$-biased cube for $p = o(1)$, the multi-cube $[m]^n$ for $m = \omega(1)$, and the bilinear graph (closely related to the Grassmann graph). This quantitative deficiency causes various analytical and combinatorial problems on these domains to be considerably more challenging, and indeed much less is known there (and what is known is considerably more difficult to prove, see for example [12]).

*Department of Computer Science, Technion, Israel. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 802020-ERC-HARMONIC. †Einstein Institute of Mathematics, Hebrew University of Jerusalem. ‡Einstein Institute of Mathematics, Hebrew University of Jerusalem. §Department of Mathematics, Massachusetts Institute of Technology.

1.1 Global hypercontractivity

Recently, initially motivated by the study of PCPs (probabilistically checkable proofs) and later by sharp-threshold results, variants of the hypercontractive inequality have been established in such domains [20, 17, 18]. In these variants, one states an inequality that holds for all functions, but is only meaningful for a special (important) class of functions, called global functions. Informally, a function $f$ on a given product domain $\Omega = \Omega_1 \times \cdots \times \Omega_n$ is called global if its $2$-norm, as well as the $2$-norms of all its restrictions, are all small. This makes these variants applicable in cases that were previously out of reach, leading to new results, but at the same time harder to apply, since one has to make sure it is applied to a global function to get a meaningful bound (see [17, 22, 18] for example applications).
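To make the notion concrete, here is a small illustrative computation of our own (the functions chosen are arbitrary): on the hypercube, the AND indicator has a restriction whose $2$-norm jumps substantially, while a parity keeps the same $2$-norm under every restriction.

```python
from itertools import product

# Sketch: restriction 2-norms on {0,1}^4 under the uniform measure.
n = 4
points = list(product([0, 1], repeat=n))

def restricted_norm2(f, fixed):
    """2-norm of f restricted to {x : x[i] = v for (i, v) in fixed}."""
    dom = [x for x in points if all(x[i] == v for i, v in fixed.items())]
    return (sum(f(x) ** 2 for x in dom) / len(dom)) ** 0.5

AND = lambda x: 1.0 if all(x) else 0.0
parity = lambda x: (-1) ** sum(x)            # |parity| = 1 everywhere

# AND is not global: fixing two coordinates doubles its 2-norm.
assert abs(restricted_norm2(AND, {}) - 0.25) < 1e-9          # 2^{-n/2}
assert abs(restricted_norm2(AND, {0: 1, 1: 1}) - 0.5) < 1e-9
# parity is global: every restriction has 2-norm exactly 1.
assert all(abs(restricted_norm2(parity, {0: a, 1: b}) - 1.0) < 1e-9
           for a in (0, 1) for b in (0, 1))
```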
It is worth noting that these variants are in fact generalizations of the standard hypercontractive inequality, since one can easily show that in domains such as the Boolean hypercube, all low-degree functions are global. By now, there are various proofs of the above mentioned results: (1) a proof by reduction to the Boolean hypercube, (2) a direct proof by expanding $\|f\|_q^q$ (for even $q$), (3) an inductive proof on $n$. All of these proofs use the product structure of the domain very strongly, and therefore it is unclear how to adapt them beyond the realm of product spaces.
Significant challenges arise when trying to analyze non-product spaces. The simplest examples of such spaces are the slice and multi-slice, and the symmetric group. The classical hypercontractive inequality is equivalent to another inequality, the log-Sobolev inequality. Sharp log-Sobolev inequalities were proven for the symmetric group and the slice by Lee and Yau [21], and for the multi-slice by Salez [26] (improving on earlier work of Filmus, O'Donnell and Wu [10]). While such log-Sobolev inequalities are useful for balanced slices and multi-slices, their usefulness for domains such as the symmetric group is limited, due to the similarity between $S_n$ and $[n]^n$. For this reason, Diaconis and Shahshahani [4] resorted to representation-theoretic techniques in their analysis of the convergence of Markov chains on $S_n$. We rectify this issue in a different way, by extending global hypercontractivity to $S_n$.

The main goal of this paper is to study the symmetric group $S_n$, which is probably the most fundamental non-product domain. Throughout this paper, we will consider $S_n$ as a probability space equipped with the uniform measure, and use expectation norms, as well as the corresponding expectation inner product, according to the uniform measure. We will think of $S_n$ as a subset of $[n]^n$, and thereby for $\pi \in S_n$ refer to $\pi(1)$ as "the first coordinate of the input". To state our main results, we begin with defining the notion of globalness on $S_n$. Given $f\colon S_n \to \mathbb{R}$ and a subset $T \subseteq [n] \times [n]$ of the form $\{(i_1, j_1), \ldots, (i_t, j_t)\}$, where all of the $i$'s are distinct and all of the $j$'s are distinct, we denote by $S_n^T$ the set of permutations $\pi \in S_n$ respecting $T$ (i.e.
such that π ( i ℓ ) = j ℓ for all We remark that this requirement can often be greatly relaxed: (1) it is often enough to only consider restrictions that fix O (1) of the coordinates of the input, and (2) it is often enough that there are “very few” restrictions that have large -norm, for anappropriate notion of “very few”. This inductive proof is actually much trickier than the textbook proof of the hypercontractive inequality over the Boolean cube.The reason is that the statement of the result itself does not tensorize, thus one has to come up with an alternative, slightly stronger,statement, that does tensorize = 1 , . . . , t ), sometimes known as a double coset (and corresponding to the notion of link in complexes).We denote by f → T : S Tn → R the restriction of f to S Tn , and equip S Tn with the uniform measure. Definition 1.1.
A function $f\colon S_n \to \mathbb{R}$ is called $\varepsilon$-global with constant $C$ if for any consistent $T$, it holds that $\|f_{\to T}\|_2 \le C^{|T|}\varepsilon$.

Our basic hypercontractive inequality is concerned with a Markov operator $T(\rho)$ that may at first not seem very natural. We defer the precise development and motivation for $T(\rho)$ to Section 1.5; for now, we encourage the reader to think of it as averaging after a long random walk on the transpositions graph, say of length $\Theta(n)$.

Theorem 1.2.
For an even $q \in \mathbb{N}$ and $C > 0$, there is $\rho > 0$ and an operator $T(\rho)\colon L^2(S_n) \to L^2(S_n)$ satisfying:

1. If $f\colon S_n \to \mathbb{R}$ is $\varepsilon$-global with constant $C$, then $\|T(\rho) f\|_q \le \varepsilon^{(q-2)/q}\, \|f\|_2^{2/q}$.

2. There is an absolute constant $K$, such that for all $d \in \mathbb{N}$ satisfying $d \le \sqrt{\log n}/K$, it holds that the eigenvalues of $T(\rho)$ corresponding to degree-$d$ functions are at least $\rho^{K \cdot d}$.

As is often the case, once one has a hypercontractive inequality involving a noise operator whose eigenvalues are well-understood, one can state a hypercontractive inequality for low-degree functions. For us, however, it will be important to relax the notion of globalness appropriately, and we therefore consider the notion of bounded globalness.
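For intuition about walks on the transpositions graph, one step can be computed exactly on a small symmetric group. The sketch below is ours (it is not the paper's operator $T(\rho)$): it averages a function over all single-transposition moves and checks two basic Markov-operator properties.

```python
from itertools import permutations, combinations

n = 4
perms = list(permutations(range(n)))

def compose(p, t):
    # (p ∘ t)(i) = p(t(i))
    return tuple(p[t[i]] for i in range(n))

taus = []
for a, b in combinations(range(n), 2):
    t = list(range(n)); t[a], t[b] = t[b], t[a]
    taus.append(tuple(t))

def walk_step(f):
    """Exact averaging over one random transposition: (Tf)(pi) = E_tau f(pi∘tau)."""
    return {p: sum(f[compose(p, t)] for t in taus) / len(taus) for p in perms}

f = {p: 1.0 if p[0] == 0 else 0.0 for p in perms}   # indicator of pi(1) = 1
g = walk_step(f)
mean = lambda h: sum(h.values()) / len(perms)
var = lambda h: mean({p: (h[p] - mean(h)) ** 2 for p in perms})
assert abs(mean(g) - mean(f)) < 1e-12   # averaging preserves the mean
assert var(g) <= var(f)                 # and contracts the variance
```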
Definition 1.3.
A function $f\colon S_n \to \mathbb{R}$ is called $(d, \varepsilon)$-global if for any consistent $T$ of size at most $d$, it holds that $\|f_{\to T}\|_2 \le \varepsilon$.

A natural example of $(d, \varepsilon)$-global functions is the low-degree part of $f$, denoted by $f^{\le d}$, which is the degree-$d$ function which is closest to $f$ in $L^2$-norm. Here, a function has degree $d$ if it can be written as a linear combination of indicators of sets $S_n^T$ for $|T| \le d$. Naively, one may expect such a connection to trivially hold (by Parseval); the issue is that restrictions and degree-truncations do not commute as well as in product spaces, so such naive arguments fail. Nevertheless, we show that such a connection indeed holds. With Definition 1.3 in hand, we can now state our hypercontractive inequality for low-degree functions.

Theorem 1.4.
There exists $K > 0$ such that the following holds. Let $q \in \mathbb{N}$ be even, and let $n > q^{K \cdot d}$. If $f$ is a $(2d, \varepsilon)$-global function of degree $d$, then $\|f\|_q \le q^{O(d)}\, \varepsilon^{(q-2)/q}\, \|f\|_2^{2/q}$.

Remark 1.5.
The focus of the current paper is on the case that $n$ is very large in comparison to the degree $d$, and therefore the technical conditions imposed on $n$ in Theorems 1.2 and 1.4 will hold for us. It would be interesting to relax or even remove these conditions altogether, and we leave further investigation to future works.

We present some applications of Theorem 1.2 and Theorem 1.4, as outlined below.

[Footnote: Formally, our applications only require that the eigenvalues corresponding to low-degree functions are bounded away from $0$ (given that $n$ is large enough in comparison to the degree of $f$), which will be the case.]

1.4.1 The level-$d$ inequality

Our first application is concerned with the weight a global function has on its low degrees, which is an analog of the classical level-$d$ inequality on the Boolean hypercube (e.g. [24, Corollary 9.25]).

Theorem 1.6.
There exists an absolute constant $C > 0$ such that the following holds. Let $d, n \in \mathbb{N}$ and $\varepsilon > 0$ be such that $n > (Cd \log(1/\varepsilon))^{Cd}$. If $f\colon S_n \to \mathbb{Z}$ is $(2d, \varepsilon)$-global, then $\|f^{\le d}\|_2 \le 2^{C \cdot d}\, \varepsilon^2 \log^{C \cdot d}(1/\varepsilon)$.

Theorem 1.6 should be compared to the level-$d$ inequality on the hypercube, which asserts that for any function $f\colon \{0,1\}^n \to \{0,1\}$ with $\mathbb{E}[f] = \delta < 1/2$ we have that
$$\|f^{\le d}\|_2^2 \le \delta^2 \left(\frac{10 \log(1/\delta)}{d}\right)^{d}, \qquad \text{for all } d \le 2\log(1/\delta).$$
(Quantitatively, the parameter $\delta$ should be compared to $\varepsilon^2$ in Theorem 1.6, due to normalization.) Note that it may be the case that $\varepsilon$ in Theorem 1.6 is much larger than $\|f\|_2^{1/2}$, and then Theorem 1.6 becomes trivial. Fortunately, we can prove a stronger version of Theorem 1.6 for functions $f$ whose $2$-norm is not exponentially small, which actually follows relatively easily from Theorem 1.6.

Theorem 1.7.
There exists an absolute constant $C > 0$ such that the following holds. Let $d, n \in \mathbb{N}$, $\varepsilon > 0$ be parameters and let $f\colon S_n \to \mathbb{Z}$ be a $(2d, \varepsilon)$-global function. If $n > (Cd \log(1/\|f\|_2))^{C \cdot d}$, then $\|f^{\le d}\|_2 \le 2^{C \cdot d}\, \|f\|_2\, \varepsilon \log^{C \cdot d}(1/\|f\|_2)$.

On the proof of the level-$d$ inequality. In contrast to the case of the hypercube, Theorem 1.6 does not immediately follow from Theorem 1.2 or Theorem 1.4, and requires more work, as we explain below. Recall that one proof of the level-$d$ inequality on the hypercube proceeds, using hypercontractivity, as
$$\|f^{\le d}\|_2^2 = \langle f^{\le d}, f\rangle \le \|f^{\le d}\|_q\, \|f\|_{q/(q-1)} \le \sqrt{q-1}^{\,d}\, \|f^{\le d}\|_2\, \|f\|_{q/(q-1)},$$
choosing a suitable $q$, and rearranging. Our hypercontractive inequality does not allow us to make the final transition, and instead only tells us that $\|f^{\le d}\|_q \le O_{d,q}\big(\varepsilon^{(q-2)/q}\, \|f^{\le d}\|_2^{2/q}\big)$. Executing this plan only implies, at best, the quantitatively weaker statement that $\|f^{\le d}\|_2 \le \varepsilon^{3/2} \log^{O_d(1)}(1/\varepsilon)$. Here, the difference between $\varepsilon^{3/2}$ and $\varepsilon^2$ is often crucial, because such results are often only useful for very small $\varepsilon$ anyway.

To explain how we circumvent this issue, note first that the source of the inefficiency is that we used the fact that $f^{\le d}$ is $(2d, \varepsilon)$-global, but the reality could be that it is much more global than that (for example, the statement itself asserts a much stronger bound on the $2$-norm of $f^{\le d}$). To exploit this point, let us consider the restriction that maximizes the $2$-norm of $f^{\le d}$. The most optimistic case would be that the globalness of $f^{\le d}$ is achieved already by the function itself, which would say that $f^{\le d}$ is $(2d, O_d(\|f^{\le d}\|_2))$-global.
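For comparison with the hypercube level-$d$ statement, here is a small numeric illustration of our own: the AND function on $\{0,1\}^6$ has measure $\delta = 2^{-6}$, yet its Fourier weight up to level $1$ is far below $\delta$, consistent with a bound of the form $\delta^2 \cdot \mathrm{polylog}(1/\delta)$.

```python
from itertools import product, combinations
from math import log

n, d = 6, 1
points = list(product([0, 1], repeat=n))
f = {x: 1.0 if all(x) else 0.0 for x in points}    # AND, measure 2^{-n}
delta = sum(f.values()) / len(points)

def chi(S, x):
    return (-1) ** sum(x[i] for i in S)

low_weight = 0.0
for k in range(d + 1):
    for S in combinations(range(n), k):
        fhat = sum(f[x] * chi(S, x) for x in points) / len(points)
        low_weight += fhat ** 2

# every Fourier coefficient of AND has magnitude 2^{-n}, so the weight up to
# level 1 is (n+1) * 4^{-n} -- much smaller than delta = 2^{-n}
assert abs(low_weight - (n + 1) / 4 ** n) < 1e-12
assert low_weight <= 10 * delta ** 2 * log(1 / delta)
```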
In this case, the argument from the hypercube goes through well enough to achieve the desired bound. What if the globalness of $f^{\le d}$ is achieved by a restriction of size $r$ instead? In this case, we show that there is a "derivative" $g$ of $f^{\le d}$ which achieves roughly the same $2$-norm as that restriction of $f^{\le d}$, and taking any further "derivatives" only decreases the $2$-norm of $g$. We show that this implies that $g$ is $(2d, O_d(\|g\|_2))$-global, so we have reached the same situation as before!

The above discussion motivates an inductive approach, and in particular proving the statement for all integer-valued functions (and not only Boolean functions), as stated. This way, we are able to show that for the function $g$ above we have $\|g\|_2 = \tilde{O}_d(\varepsilon)$, which implies that $f^{\le d}$ is $(2d, \tilde{O}_d(\varepsilon))$-global. This is a major improvement over our original knowledge regarding $f^{\le d}$, and in particular it allows us to run the argument from the hypercube (described above) successfully.

[Footnote: Parseval's identity implies that $\sum_d \|f_{=d}\|_2^2 = \|f\|_2^2$, so in particular $\|f^{\le d}\|_2 \le \|f\|_2$.]

[Footnote: We only define the appropriate notion of derivatives we use in Section 4, and for now encourage the reader to think of it as an analogous operation to the discrete derivative in the Boolean hypercube.]

1.4.2 Product-free sets in the alternating group

We say that a family of permutations $F \subseteq S_n$ is product-free if there are no $\pi_1, \pi_2, \pi_3 \in F$ such that $\pi_3 = \pi_2 \circ \pi_1$. What is the size of the largest product-free family $F$? With the formulation above, one can of course take $F$ to be the set of odd permutations, which has size $|S_n|/2$. What happens if we forbid such permutations, i.e.
only consider families of even permutations? Questions of this sort generalize the well-studied problem of finding arithmetic sequences in dense sets. More relevant to us is the work of Gowers [14], which studies this problem for a wide range of groups (referred to therein as "quasi-random groups"), and the work of Eberhard [6], which specialized this question to $A_n$ and improves Gowers' results. More specifically, Gowers' result shows that a product-free $F \subseteq A_n$ has size at most $O\big(n^{-1/3}|A_n|\big)$, and Eberhard's work [6] improves this bound to $|F| = O\!\left(\frac{\log^{c} n}{\sqrt{n}}\, |A_n|\right)$ for an absolute constant $c$. We remark that Eberhard's result is tight up to the polylogarithmic factor, as can be evidenced from the family
$$F = \left\{\pi \in A_n \;\middle|\; \pi(1) \in \{2, \ldots, \sqrt{n}\},\ \pi(\{2, \ldots, \sqrt{n}\}) \subseteq [n] \setminus [\sqrt{n}]\right\}. \tag{1}$$
In this section, we consider the problem of determining the maximal size of a global, product-free set in $A_n$. In particular, we show:

Theorem 1.8.
There exists $N \in \mathbb{N}$ such that the following holds for all $n > N$. For every $C > 0$ there is $K > 0$, such that if $F \subseteq A_n$ is product-free and is $(6, C\sqrt{\delta})$-global, where $\delta = |F|/|A_n|$, then $\delta \le \frac{\log^K n}{n}$.

Remark 1.9.
A few remarks are in order.

1. We note that the above result achieves a stronger bound than the family in (1). There is no contradiction here, of course, since that family is very much not global: restricting to $\pi(1) = 2$ increases the measure of $F$ significantly.

2. The junta method, which can be used to study many problems in extremal combinatorics, often considers the question for global families as a key component. The rough idea is to show that one can approximate a family $F$ by a union of families $\tilde{F}$ that satisfy an appropriate pseudo-randomness condition, such that if $F$ is product-free then so are the families $\tilde{F}$. Furthermore, inside any not-too-small pseudo-random family $\tilde{F}$, one may find a global family $\tilde{F}'$ by making any small restriction that increases the size of the family considerably. Thus, in this way one may hope to reduce the general question to the question on global families (see [18] for example). While at the moment we do not know how to employ the junta method in the case of product-free sets in $A_n$, one may still hope that it is possible, providing some motivation for Theorem 1.8.

3. Our result is in fact more general, and can be used to study a more general set version of this problem; see Corollary 7.9.

4. We suspect that much stronger quantitative bounds should hold for global families; we elaborate on this suspicion in Section 7.2.4.

1.4.3 Isoperimetric inequalities

Using our hypercontractive inequalities we are able to prove several isoperimetric inequalities for global sets. Let $S \subseteq S_n$ be a set, and consider the transpositions random walk $T$ that from a permutation $\pi \in S_n$ moves to $\pi \circ \tau$, where $\tau$ is a randomly chosen transposition. We show that if $S$ is "not too sensitive along any transposition", then the probability to exit $S$ in a random step according to $T$ must be significant, similarly to the classical KKL Theorem on the hypercube [15].
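As a toy illustration of edge boundaries in the transposition graph (our own example, not one from the paper): a "dictator" set in $S_4$ is highly sensitive along the transpositions touching the fixed coordinate, and a uniform element exits it with probability $1/2$ in one step.

```python
from itertools import permutations, combinations

n = 4
perms = list(permutations(range(n)))
taus = []
for a, b in combinations(range(n), 2):
    t = list(range(n)); t[a], t[b] = t[b], t[a]
    taus.append(tuple(t))

S = {p for p in perms if p[0] == 0}        # dictator set: pi(1) = 1
compose = lambda p, t: tuple(p[t[i]] for i in range(n))

# probability that a uniform pi in S exits S after one transposition step
exits = sum(compose(p, t) not in S for p in S for t in taus)
exit_prob = exits / (len(S) * len(taus))
# exactly the 3 transpositions moving coordinate 1 leave S, out of 6
assert abs(exit_prob - 0.5) < 1e-12
```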
The formal statement of this result is given in Theorem 7.13. We are also able to analyze longer random walks according to $T$, of order $\approx n$, and show that one has small-set expansion for global sets. See Theorem 7.12 for a formal statement.

1.4.4 The multi-slice

Our results for $S_n$ imply analogous results in the multi-slice. The deduction is done in a black-box fashion, by a natural correspondence between functions over $S_n$ and over the multi-slice that preserves degrees, globalness, and $L^p$ norms. This allows us to deduce analogs of our results for $S_n$ essentially for free (see Section 7.4), as well as a stability result for the classical Kruskal–Katona Theorem (see Theorem 7.20).

1.4.5 Probabilistically checkable proofs

Our hypercontractive inequality has also been used in the study of Probabilistically Checkable Proofs [3]. More precisely, to study a new hardness conjecture, referred to as the "Rich $2$-to-$1$ Games Conjecture" in [3], and show that if true, it implies Khot's Unique-Games Conjecture [19].
1.5 Our techniques

In this section we outline the techniques used in the proofs of Theorem 1.2 and Theorem 1.4.
Consider two finite probability spaces $X$ and $Y$, and suppose that $\mathcal{C} = (\mathbf{x}, \mathbf{y})$ is a coupling between them (we encourage the reader to think of $X$ as $S_n$, and of $Y$ as a product space in which we already know hypercontractivity to hold). Using the coupling $\mathcal{C}$, we may define the averaging operators $T_{X \to Y}\colon L^2(X) \to L^2(Y)$ and $T_{Y \to X}\colon L^2(Y) \to L^2(X)$ as
$$T_{X \to Y} f(y) = \mathbb{E}_{(\mathbf{x}, \mathbf{y}) \sim \mathcal{C}}[f(\mathbf{x}) \mid \mathbf{y} = y], \qquad T_{Y \to X} f(x) = \mathbb{E}_{(\mathbf{x}, \mathbf{y}) \sim \mathcal{C}}[f(\mathbf{y}) \mid \mathbf{x} = x].$$
It is easily noted, by Jensen's inequality, that each one of the operators $T_{X \to Y}$ and $T_{Y \to X}$ is a contraction with respect to the $L^p$-norm, for any $p \ge 1$. The benefit of considering these operators is that given an operator $T_Y$ with desirable properties (say, it is hypercontractive, i.e. it satisfies $\|T_Y f\|_q \le \|f\|_2$), we may consider the lifted operator on $X$ given by $T_X \stackrel{\mathrm{def}}{=} T_{Y \to X} T_Y T_{X \to Y}$ and hope that it too satisfies some desirable properties. Indeed, it is easy to see that if $T_Y$ is hypercontractive, then $T_X$ is also hypercontractive:
$$\|T_{Y \to X} T_Y T_{X \to Y} f\|_q \le \|T_Y T_{X \to Y} f\|_q \le \|T_{X \to Y} f\|_2 \le \|f\|_2. \tag{2}$$

[Footnote: The formal statement of the result requires an appropriate notion of discrete derivatives which we only give in Section 4.]
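The contraction step used in (2) can be sanity-checked on a toy coupling; the joint distribution and test function below are arbitrary choices of ours.

```python
# A joint distribution on X x Y with X = {0,1,2}, Y = {0,1}; the averaging
# operator T_{X->Y} is a conditional expectation along the coupling.
joint = {(0, 0): 0.20, (0, 1): 0.10, (1, 0): 0.15,
         (1, 1): 0.15, (2, 0): 0.10, (2, 1): 0.30}
X, Y = [0, 1, 2], [0, 1]
muX = {x: sum(joint[x, y] for y in Y) for x in X}
muY = {y: sum(joint[x, y] for x in X) for y in Y}

def T_XtoY(f):
    return {y: sum(joint[x, y] * f[x] for x in X) / muY[y] for y in Y}

def norm(h, mu, q):
    return sum(mu[z] * abs(h[z]) ** q for z in mu) ** (1 / q)

f = {0: 1.3, 1: -0.4, 2: 2.1}      # arbitrary test function on X
for q in (1, 2, 3, 4):
    # Jensen: conditional averaging can only shrink expectation q-norms
    assert norm(T_XtoY(f), muY, q) <= norm(f, muX, q) + 1e-12
```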
We show that the same connection continues to hold for more refined hypercontractive inequalities such as the one given in [17, 18] (and more concretely, Theorem 2.5 below). We note that the proof in this case is slightly more involved. While very elegant and appealing, the above approach can only be used to show hypercontractivity for a very special type of operators such as $T_X$ defined above, and it is not clear if such results are of any use at all. To remedy this situation, we study the effect of this operator in the spectral domain. In particular, we show that the action of this operator on "low-degree functions" is very similar to the effect of the standard noise operator, and thus we are able to deduce a hypercontractive inequality for low-degree functions, as in Theorem 1.4.

Let $L = [n] \times [n]$, and let $m$ be large, depending polynomially on $n$. We will couple $S_n$ and $L^m$, where the idea is to think of each element of $L$ as local information about the coupled permutation $\pi$. That is, the element $(i, j) \in L$ encodes the fact that $\pi$ maps $i$ to $j$.

Our coupling
We say that a set $T = \{(i_1, j_1), \ldots, (i_t, j_t)\} \subseteq L$ of pairs is consistent if there exists a permutation $\pi$ with $\pi(i_k) = j_k$ for each $k \in [t]$, and any such permutation $\pi$ is said to be consistent with $T$. Our coupling between $S_n$ and $L^m$ is the following:

1. Choose an element $x \sim L^m$ uniformly at random.

2. Greedily construct from $x$ a set $T$ of consistent pairs. That is, going from $k = 1$ to $m$, we consider the $k$-th coordinate of $x$, denoted by $(i_k, j_k)$, and check whether adding it to $T$ would keep it consistent. If so, we add $(i_k, j_k)$ to $T$, and otherwise we do not.

3. Choose a permutation $\pi$ consistent with $T$ uniformly at random.

The resulting operator
Finally, we can specify our hypercontractive operator on $S_n$. Let $X = S_n$, $Y = L^m$, and let $T_{X \to Y}, T_{Y \to X}$ be the operators corresponding to the coupling that we have just constructed. Let $T_Y = T_\rho$ be the noise operator on the product space $L^m$, which can be defined in two equivalent ways:

1. Every coordinate is retained with probability $\rho$, and resampled otherwise.

2. The $d$-th Fourier level is multiplied by $\rho^d$.

Then $T(\rho) = T_{Y \to X} T_Y T_{X \to Y}$ is our desired operator on $S_n$. We next explain how to analyze the operator $T_Y$.

Showing that $T_Y$ satisfies refined hypercontractivity

Recall the simplistic argument (2), showing that hypercontractivity of $T_Y$ implies the hypercontractivity of $T_X$. We intend to show, in a similar way, that refined hypercontractivity is also carried over by the coupling. Towards this end, we must show that the notion of globalness is preserved: namely, if $f$ is global, then $g = T_{S_n \to L^m} f$ is also global. This assertion however very much depends on the precise notion of globalness we consider. If we assume that $f$ is $\varepsilon$-global with constant $C$, then it is easy to show that $g$ is also $\varepsilon$-global with constant $C$ (see Proposition 3.1), and the argument goes through smoothly. However, in the case that $f$ is only guaranteed to be $(d, \varepsilon)$-global, things are more interesting, and in this case we are only able to handle $f$'s that are of low degree (this is natural, as we will deal with the low-degree part of $(d, \varepsilon)$-global functions).

A convenient feature of product spaces is that for low-degree functions, the notions of $\varepsilon$-globalness with constant $C$ and of $(D, \delta)$-globalness are equivalent up to small losses in parameters. This allows one to invoke results such as Theorem 2.5 in this case.
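The greedy coupling above can be checked exhaustively on a tiny instance. The sketch below (ours; $n = 3$, $m = 2$, exact rational arithmetic) enumerates all inputs $x \in L^m$, builds the greedy consistent set $T$, splits the mass uniformly among the permutations consistent with $T$, and verifies that the permutation marginal is exactly uniform.

```python
from itertools import product, permutations
from fractions import Fraction

n, m = 3, 2
L = [(i, j) for i in range(n) for j in range(n)]
perms = list(permutations(range(n)))

def greedy_T(x):
    """Greedily keep the pairs of x that stay consistent (distinct i's and j's)."""
    T, used_i, used_j = [], set(), set()
    for i, j in x:
        if i not in used_i and j not in used_j:
            T.append((i, j)); used_i.add(i); used_j.add(j)
    return T

mass = {p: Fraction(0) for p in perms}
for x in product(L, repeat=m):              # enumerate all |L|^m inputs
    T = greedy_T(x)
    consistent = [p for p in perms if all(p[i] == j for i, j in T)]
    for p in consistent:                    # uniform choice among consistent pi
        mass[p] += Fraction(1, len(L) ** m * len(consistent))

assert all(v == Fraction(1, len(perms)) for v in mass.values())
```

The exact uniformity is forced by symmetry: relabeling the $j$-coordinates of $L^m$ by any permutation commutes with the greedy construction.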
While we show that the case of the symmetric group possesses a similar property (at least when $n$ is large enough in comparison to $d$), we are not able to immediately use it. The issue is that even if $f\colon S_n \to \mathbb{R}$ is a function of degree $d$, it may not be the case that $g = T_{S_n \to L^m} f$ is also of low degree.

We circumvent this issue as follows. Suppose $f$ is $(2d, \varepsilon)$-global and of degree $d$. Then, as remarked above, we argue that $f$ is $\varepsilon$-global with some absolute constant $C$, and so it is $(t, C^t \varepsilon)$-global for all $t \in \mathbb{N}$. Thus, $g$ is $(t, C^t \varepsilon)$-global for all $t$. Now, as $g$ is a function over a product space, it is easily seen that the latter implies that a noisy version of $g$ is $\varepsilon$-global with a constant independent of $C$, and thus we are able to invoke Theorem 2.5 on it. Together, this gives $\|T_X f\|_q \le \varepsilon^{(q-2)/q}\, \|f\|_2^{2/q}$ for a suitable choice of $\rho$ (the precise choice arises from Theorem 2.5).

A direct approach

Our second approach to establish hypercontractive inequalities goes via a rather different route. One of the proofs of hypercontractivity in product domains proceeds by finding a convenient orthonormal basis for the space of real-valued functions over $\Omega$ (which in product cases is easy, as the basis tensorizes). This way, proving hypercontractivity amounts to studying moments of these basis functions as well as other forms, which is often not very hard to do due to the simple nature of the basis. When dealing with non-product spaces, such as $S_n$, we do not know how to produce such a convenient orthonormal basis. Nevertheless, our direct approach, presented in Section 6, relies on a representation of a function $f\colon S_n \to \mathbb{R}$ in a canonical form that is almost as good as in product spaces. To construct this representation, we start with obvious spanning sets such as
$$\left\{\ \prod_{\ell=1}^{d} 1_{\pi(i_\ell) = j_\ell} \ \middle|\ |\{i_1, \ldots, i_d\}| = |\{j_1, \ldots, j_d\}| = d\ \right\}.$$
This set contains many redundancies (and thus is not a basis), and we show how to use these to enforce a system of linear constraints on the coefficients of the representation that turn out to be very useful in proving hypercontractive inequalities.
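The redundancies are easy to exhibit concretely (a small check of our own): the indicators $e_{ij} = 1_{\pi(i)=j}$ on $S_3$ satisfy linear relations such as $\sum_j e_{ij} = 1$, and products with a repeated row or column vanish identically.

```python
from itertools import permutations

n = 3
perms = list(permutations(range(n)))
e = lambda i, j: tuple(1.0 if p[i] == j else 0.0 for p in perms)

# row relation: sum_j e_{ij} = 1 pointwise, for every i
for i in range(n):
    sums = [sum(e(i, j)[k] for j in range(n)) for k in range(len(perms))]
    assert all(abs(s - 1.0) < 1e-12 for s in sums)

# products with a repeated row or column vanish identically
prod = lambda u, v: tuple(a * b for a, b in zip(u, v))
assert all(v == 0 for v in prod(e(0, 1), e(0, 2)))   # pi(1) can't take two values
assert all(v == 0 for v in prod(e(0, 1), e(2, 1)))   # two inputs, one output
```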
In Section 2 we present some basic preliminaries. Sections 3, 4 and 5 are devoted to presenting our approach to hypercontractivity via coupling and algebraic arguments, and in Section 6 we present our direct approach. In Sections 7 and 8 we present several consequences of our hypercontractive inequalities: the level-$d$ inequality in Section 8, and the other applications in Section 7.

2 Preliminaries

We think of the product operation in $S_n$ as function composition, and so $(\tau\sigma)(i) = (\tau \circ \sigma)(i) = \tau(\sigma(i))$. Throughout the paper, we consider the space of real-valued functions on $S_n$ equipped with the expectation inner product, denoted by $L^2(S_n)$. Namely, for any $f, g\colon S_n \to \mathbb{R}$ we define $\langle f, g\rangle = \mathbb{E}_{\sigma \in S_n}[f(\sigma) g(\sigma)]$. A basic property of this space is that it is an $S_n$-bimodule, as can be seen by defining the left operation of a permutation $\tau$ on a function $f$ as $\tau f(\sigma) = f(\tau \circ \sigma)$, and the right operation as $f\tau(\sigma) = f(\sigma \circ \tau)$.

We will define the concept of a degree-$d$ function in several equivalent ways. The most standard definition is the one which we already mentioned in the introduction.

Definition 2.1.
Let $T = \{(i_1, j_1), \ldots, (i_t, j_t)\} \subseteq L$ be a set of $t$ consistent pairs, and recall that $S_n^T$ is the set of all permutations $\pi$ such that $\pi(i_k) = j_k$ for all $k \in [t]$. The space $V_d$ consists of all linear combinations of functions of the form $1_T = 1_{S_n^T}$ for $|T| \le d$. We say that a real-valued function on $S_n$ has degree (at most) $d$ if it belongs to $V_d$.

By construction, $V_{d-1} \subseteq V_d$ for all $d \ge 1$. We define the space of functions of pure degree $d$ as $V_{=d} = V_d \cap V_{d-1}^{\perp}$. It is easy to see that $V_n = V_{n-1}$, and so we can decompose the space of all real-valued functions on $S_n$ as follows:
$$\mathbb{R}[S_n] = V_{=0} \oplus V_{=1} \oplus \cdots \oplus V_{=n-1}.$$
We comment that the representation theory of $S_n$ refines this decomposition into a finer one, indexed by partitions $\lambda$ of $n$; the space $V_{=d}$ corresponds to partitions in which the largest part is exactly $n - d$. We may write any function $f\colon S_n \to \mathbb{R}$ in terms of our decomposition uniquely as $\sum_{i=0}^{n-1} f_{=i}$, where $f_{=i} \in V_{=i}$. It will also be convenient for us to have a notation for the projection of $f$ onto $V_d$, which is nothing but $f^{\le d} = f_{=0} + f_{=1} + \cdots + f_{=d}$. We will need an alternative description of $V_d$ in terms of juntas.

Definition 2.2.
Let $A, B \subseteq [n]$. For every $a \in A$ and $b \in B$, let $e_{ab} = 1_{\pi(a) = b}$. We say that a function $f\colon S_n \to \mathbb{R}$ is an $(A, B)$-junta if $f$ can be written as a function of the $e_{ab}$. We denote the space of $(A, B)$-juntas by $V_{A,B}$. A function is a $d$-junta if it is an $(A, B)$-junta for some $|A| = |B| = d$.

Lemma 2.3.
The space $V_{A,B}$ is spanned by the functions $1_T$ for $T \subseteq A \times B$. Consequently, $V_d$ is the span of the $d$-juntas.

Proof. If $A = \{i_1, \ldots, i_d\}$ and $B = \{j_1, \ldots, j_d\}$ then an $(A, B)$-junta $f$ can be written as a function of the $e_{i_s j_t}$, and in particular as a polynomial in these functions. Since $e_{i_s j_t}\, e_{i_s j_{t'}} = 0$ for $t \ne t'$ and $e_{i_s j_t}\, e_{i_{s'} j_t} = 0$ for $s \ne s'$, it follows that $f$ can be written as a linear combination of functions $1_T$ for $T \subseteq A \times B$. Conversely, if $T = \{(a_1, b_1), \ldots, (a_d, b_d)\}$ then $1_T = e_{a_1 b_1} \cdots e_{a_d b_d}$.

To see the truth of the second part of the lemma, notice that if $|A| = |B| = d$ and $T \subseteq A \times B$ then $|T| \le d$, and conversely if $|T| \le d$ then $T \subseteq A \times B$ for some $A, B$ such that $|A| = |B| = d$. We will also need an alternative description of $V_{A,B}$.

Lemma 2.4.
For each $A, B$, the space $V_{A,B}$ consists of all functions $f\colon S_n \to \mathbb{R}$ such that $f = \tau f \sigma$ for all $\sigma$ fixing $A$ pointwise and $\tau$ fixing $B$ pointwise.

Proof. Let $U_{A,B}$ consist of all functions $f$ satisfying the stated condition, i.e., $f(\pi) = f(\tau\pi\sigma)$ whenever $\sigma$ fixes $A$ pointwise and $\tau$ fixes $B$ pointwise. Let $a \in A$ and $b \in B$. If $\sigma$ fixes $a$ and $\tau$ fixes $b$ then $\pi(a) = b$ iff $\tau\pi\sigma(a) = b$, showing that $e_{ab} \in U_{A,B}$. It follows that $V_{A,B} \subseteq U_{A,B}$.

In the other direction, let $f \in U_{A,B}$. Suppose for definiteness that $A = [a]$ and $B = [b]$. Let $\pi$ be a permutation such that $\pi(1) = 1, \ldots, \pi(t) = t$, and $\pi(i) > b$ for $i = t+1, \ldots, a$. Applying a permutation fixing $B$ pointwise on the left, we turn $\pi$ into a permutation $\pi'$ such that $\pi'(1), \ldots, \pi'(a) = 1, \ldots, t, b+1, \ldots, b+(a-t)$. Applying a permutation fixing $A$ pointwise on the right, we turn $\pi'$ into the permutation $1, \ldots, t, b+1, \ldots, b+(a-t), \ldots, n, t+1, \ldots, a$. This shows that if $\pi_1, \pi_2$ are two permutations satisfying $e_{ab}(\pi_1) = e_{ab}(\pi_2)$ for all $a \in A, b \in B$, then we can find permutations $\sigma_1, \sigma_2$ fixing $A$ pointwise and permutations $\tau_1, \tau_2$ fixing $B$ pointwise such that $\tau_1 \pi_1 \sigma_1 = \tau_2 \pi_2 \sigma_2$, and so $f(\pi_1) = f(\pi_2)$. This shows that $f \in V_{A,B}$.

We will make use of the following hypercontractive inequality, essentially due to [18]. For that, we first remark that we consider the natural analog definitions of globalness for product spaces. Namely, for a finite product space $(\Omega, \mu) = (\Omega_1 \times \cdots \times \Omega_m, \mu_1 \times \cdots \times \mu_m)$, we say that $f\colon \Omega \to \mathbb{R}$ is $\varepsilon$-global with constant $C$ if for any $T \subseteq [m]$ and $x \in \prod_{i \in T} \Omega_i$ it holds that $\|f_{T \to x}\|_{2, \mu_x} \le C^{|T|}\varepsilon$, where $\mu_x$ is the distribution $\mu$ conditioned on the coordinates of $T$ being equal to $x$. Similarly, we say that $f$ is $(d, \varepsilon)$-global if for any $|T| \le d$ and $x \in \prod_{i \in T} \Omega_i$ it holds that $\|f_{T \to x}\|_{2, \mu_x} \le \varepsilon$.

Theorem 2.5.
Let $q \in \mathbb{N}$ be even, suppose $f$ is $\varepsilon$-global with constant $C$, and let $\rho \le \frac{1}{\sqrt{q}\,C}$. Then
$$\|T_\rho f\|_q^q \le \varepsilon^{q-2}\, \|f\|_2^2.$$
We remark that Theorem 2.5 was proved in [18] for $q = 4$; however, the proof is essentially the same for all even integers $q$.

3 Hypercontractivity for global functions

In this section we prove the following hypercontractive results for our operator $T(\rho)$, assuming $f$ is global. We begin by proving two simple propositions.

Proposition 3.1.
Suppose $f\colon S_n \to \mathbb{R}$ is $\varepsilon$-global with constant $C$, and let $g = T_{S_n \to L^m} f$. Then $g$ is $\varepsilon$-global with constant $C$.

Proof. Let $S \subseteq [m]$ be a set of size $t$, and let $x = \big((i_k, j_k)\big)_{k \in S} \in L^S$. Let $y \sim L^{[m] \setminus S}$ be chosen uniformly, and let $\sigma$ be the random permutation that our coupling process outputs given $(x, y)$. We have
$$\|g_{S \to x}\|_2^2 = \mathbb{E}_y\big[(\mathbb{E}_\sigma f(\sigma))^2\big] \le \mathbb{E}_\sigma\big[f(\sigma)^2\big]$$
by Cauchy–Schwarz. Next, we consider the values of $\sigma(i_k)$ for $k \in S$, condition on them, and denote $T = \{(i_k, \sigma(i_k))\}$. The conditional distribution of $\sigma$ given $T$ is uniform: by the symmetry of the elements of $[n] \setminus \{i_k \mid k \in S\}$, for any permutation $\pi$ of $[n] \setminus \{i_k \mid k \in S\}$, the permutation $\sigma\pi$ has the same probability as $\sigma$, and the collection $\{\sigma\pi\}$ consists of all permutations satisfying $T$. Hence
$$\mathbb{E}\big[f(\sigma)^2\big] = \mathbb{E}_T\big[\|f_{\to T}\|_2^2\big] \le \max_T \|f_{\to T}\|_2^2 \le C^{2|S|}\varepsilon^2.$$

Fact 3.2.
Suppose that we are given two probability spaces $(X, \mu_X)$, $(Y, \mu_Y)$. Suppose further that for each $x \in X$ we have a distribution $N(x)$ on $Y$, such that if we choose $x \sim \mu_X$ and $y \sim N(x)$, then the marginal distribution of $y$ is $\mu_Y$. Define an operator $T_{Y \to X} \colon L^2(Y) \to L^2(X)$ by setting $T_{Y \to X} f(x) = \mathbb{E}_{y \sim N(x)} f(y)$. Then $\|T_{Y \to X} f\|_q \le \|f\|_q$ for each $q \ge 1$. We can now prove one variant of our hypercontractive inequality for global functions over the symmetric group.
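Fact 3.2 is an instance of Jensen's inequality for averaging operators. The following numerical sketch is our own illustration (not from the paper): an arbitrary small joint distribution stands in for the coupling, and we confirm the contraction $\|T_{Y \to X} f\|_q \le \|f\|_q$ for a few values of $q$.

```python
import random

random.seed(0)
X, Y = range(3), range(4)
# an arbitrary joint distribution mu on X x Y standing in for the coupling
w = {(x, y): random.random() for x in X for y in Y}
total = sum(w.values())
mu = {k: v / total for k, v in w.items()}
mu_X = {x: sum(mu[x, y] for y in Y) for x in X}
mu_Y = {y: sum(mu[x, y] for x in X) for y in Y}

def T(f):
    # averaging operator: T_{Y->X} f(x) = E_{y ~ N(x)} f(y), with N(x) = mu(. | x)
    return {x: sum(mu[x, y] / mu_X[x] * f[y] for y in Y) for x in X}

def norm(g, m, q):
    # expectation q-norm with respect to the probability measure m
    return sum(m[a] * abs(g[a]) ** q for a in m) ** (1.0 / q)

f = {y: random.uniform(-1, 1) for y in Y}
for q in (1, 2, 4):
    assert norm(T(f), mu_X, q) <= norm(f, mu_Y, q) + 1e-12
print("contraction verified")
```

The assertion holds for any joint distribution: by Jensen, $|Tf(x)|^q \le \mathbb{E}_{y \sim N(x)}|f(y)|^q$, and averaging over $x \sim \mu_X$ recovers exactly $\|f\|_q^q$ under $\mu_Y$.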
Theorem 3.3.
Let $q \in \mathbb{N}$ be even, $C, \varepsilon > 0$, and $\rho \le \frac{1}{qC}$. If $f \colon S_n \to \mathbb{R}$ is $\varepsilon$-global with constant $C$, then $\big\|T^{(\rho)} f\big\|_q \le \varepsilon^{(q-2)/q} \|f\|_2^{2/q}$.

Proof. Let $f \colon S_n \to \mathbb{R}$ be $\varepsilon$-global with constant $C$. By Proposition 3.1, the function $g = T_{S_n \to L^m} f$ is also $\varepsilon$-global with constant $C$, and by Fact 3.2 we have
$$\big\|T^{(\rho)} f\big\|_q^q = \|T_{L^m \to S_n} T_\rho g\|_q^q \le \|T_\rho g\|_q^q.$$
Now, by Theorem 2.5 we may upper-bound the last norm by $\varepsilon^{q-2}\|g\|_2^2$, and using Fact 3.2 again we may bound $\|g\|_2 \le \|f\|_2$.

Remark 3.4.
Once the statement has been proved for even $q$, a qualitatively similar statement can be automatically deduced for all $q$, as follows. Fix $q$, and take the smallest even integer $q'$ with $q \le q' \le q+2$. Then for $\rho \le \frac{1}{(q+2)C} \le \frac{1}{q'C}$ we may bound
$$\big\|T^{(\rho)} f\big\|_q \le \big\|T^{(\rho)} f\big\|_{q'} \le \varepsilon^{(q'-2)/q'} \|f\|_2^{2/q'} \le \varepsilon^{q/(q+2)} \|f\|_2^{2/(q+2)},$$
where in the last inequality we used $q' \le q+2$ and $\|f\|_2 \le \varepsilon$.

3.2 Hypercontractivity for low-degree functions

Next, we use Theorem 3.3 to prove our hypercontractive inequality for low-degree functions, which assumes considerably weaker globalness properties of $f$, namely Theorem 1.4. The proof of that theorem makes use of the following key lemmas. The first of them asserts that, just like in the cube, bounded globalness of a low-degree function implies (full) globalness.

Lemma 3.5.
Suppose $n > Cd\log d$ for a sufficiently large constant $C$. Let $f \colon S_n \to \mathbb{R}$ be a $(2d, \varepsilon)$-global function of degree $d$. Then $f$ is $\varepsilon$-global with an absolute constant.

Thus, to deduce Theorem 1.4 from Theorem 3.3, it suffices to show that $f$ may be approximated in $L^q$ by linear combinations of $\big(T^{(\rho)}\big)^i f$ for $i = 1, 2, \ldots$, and this is the content of our second lemma. First, let us introduce some convenient notation. For a polynomial $P(z) = a_0 + a_1 z + \cdots + a_k z^k$, we denote the spectral norm of $P$ by $\|P\| = \sum_{i=0}^{k} |a_i|$. We remark that it is easily seen that $\|P_1 P_2\| \le \|P_1\| \|P_2\|$ for any two polynomials $P_1, P_2$.

Lemma 3.6.
Let $n \ge (Cdq)^{Cd}$ for a sufficiently large constant $C$, and let $\rho = 1/(400Cq)$. Then there exists a polynomial $P$ satisfying $P(0) = 0$ and $\|P\| \le q^{O(d^2)}$, such that
$$\Big\| P\big(T^{(\rho)}\big) f - f \Big\|_q \le \frac{1}{\sqrt{n}} \|f\|_2$$
for every function $f$ of degree at most $d$. We defer the proofs of Lemmas 3.5 and 3.6 to Sections 4 and 5, respectively. In the remainder of this section we derive Theorem 1.4 from them, restated below.
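The argument below uses only two properties of the spectral norm $\|P\| = \sum_i |a_i|$: the triangle inequality and submultiplicativity. The short self-contained check below is our own (the coefficient lists are arbitrary); submultiplicativity is just the triangle inequality applied to the convolution of coefficient sequences.

```python
def polymul(p, q):
    # multiply polynomials given as coefficient lists (lowest degree first)
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def spectral_norm(p):
    # ||P|| = sum of absolute values of the coefficients
    return sum(abs(a) for a in p)

p1, p2 = [1.0, -2.0, 0.5], [3.0, 1.0, -1.0, 0.25]
lhs = spectral_norm(polymul(p1, p2))
rhs = spectral_norm(p1) * spectral_norm(p2)
assert lhs <= rhs + 1e-12  # submultiplicativity: ||P1 P2|| <= ||P1|| ||P2||
print(lhs, rhs)
```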
Theorem 1.4 (Restated) .
There exists
$C > 0$ such that the following holds. Let $q \in \mathbb{N}$ be even and $n \ge q^{C \cdot d}$. If $f$ is a $(2d, \varepsilon)$-global function of degree $d$, then $\|f\|_q \le q^{O(d^2)} \varepsilon^{(q-2)/q} \|f\|_2^{2/q}$.

Proof. Choose $\rho = 1/(400Cq)$, and let $P(z) = \sum_{i=1}^{l} a_i z^i$ be as in Lemma 3.6. Then
$$\|f\|_q \le \Big\|P\big(T^{(\rho)}\big) f\Big\|_q + \frac{1}{\sqrt{n}}\|f\|_2.$$
As for the first term, we have
$$\Bigg\| \sum_{i=1}^{l} a_i \big(T^{(\rho)}\big)^i f \Bigg\|_q \le \sum_{i=1}^{l} |a_i| \Big\|\big(T^{(\rho)}\big)^i f\Big\|_q \le \|P\| \Big\|T^{(\rho)} f\Big\|_q \le q^{O(d^2)} \Big\|T^{(\rho)} f\Big\|_q.$$
To estimate $\big\|T^{(\rho)} f\big\|_q$, note first that by Lemma 3.5, $f$ is $\varepsilon$-global with an absolute constant; thus, given that $C$ is large enough, we may apply Theorem 3.3 to deduce that $\big\|T^{(\rho)} f\big\|_q \le \varepsilon^{(q-2)/q}\|f\|_2^{2/q}$. As $\|f\|_2 \le \varepsilon$, we conclude that
$$\|f\|_q \le q^{O(d^2)} \varepsilon^{(q-2)/q}\|f\|_2^{2/q} + \frac{1}{\sqrt{n}}\|f\|_2 = q^{O(d^2)} \varepsilon^{(q-2)/q}\|f\|_2^{2/q}.$$

Proof of Lemma 3.5
We begin by proving Lemma 3.5. A proof of the corresponding statement in product spaces proceeds by showing that a function is $(d, \varepsilon)$-global if and only if the $2$-norms of derivatives of $f$ of order at most $d$ are small. Since derivatives of $f$ of order higher than $d$ vanish (by degree considerations), they are automatically small. Thus, if $f$ is a $(d, \varepsilon)$-global function of degree $d$, then all derivatives of $f$ have small $2$-norm, and by the reverse relation it follows that $f$ is $\varepsilon$-global for some constant $C$.

Our proof follows a similar high-level idea. The main challenge in the proof is to find an appropriate analog of discrete derivatives from product spaces, which both reduces the degree of the function $f$ and can be related to restrictions of $f$. Towards this end, we make the following key definition.

Definition 4.1.
Let $i_1 \ne i_2 \in [n]$ and $j_1 \ne j_2 \in [n]$.

1. The Laplacian of $f$ along $(i_1, i_2)$ is defined as $\mathrm{L}_{(i_1,i_2)}[f] = f - f^{(i_1 i_2)}$, where we denote by $(i_1 i_2)$ the transposition of $i_1$ and $i_2$.

2. The derivative of $f$ along $(i_1, i_2) \to (j_1, j_2)$ is $(\mathrm{L}_{(i_1,i_2)} f)_{(i_1,i_2) \to (j_1,j_2)}$. More explicitly, it is a function defined on $S_n^{(i_1,j_1),(i_2,j_2)}$ (which is isomorphic to $S_{n-2}$) whose value on $\pi$ is $f(\pi) - f(\pi \circ (i_1 i_2))$.
3. For distinct $i_1, \ldots, i_t$ and distinct $j_1, \ldots, j_t$, denote the ordered set $S = \{(i_1, j_1), \ldots, (i_t, j_t)\}$ and define the Laplacian of $f$ along $S$ as $\mathrm{L}_S[f] = \mathrm{L}_{i_1,j_1} \circ \cdots \circ \mathrm{L}_{i_t,j_t} \circ f$. For $(k_1, \ell_1), \ldots, (k_t, \ell_t)$, the derivative of $f$ along $S \to \{(k_1, \ell_1), \ldots, (k_t, \ell_t)\}$ is
$$\mathrm{D}_{S \to \{(k_1,\ell_1),\ldots,(k_t,\ell_t)\}} f = (\mathrm{L}_S[f])_{\{(i_1,k_1),(j_1,\ell_1),\ldots,(i_t,k_t),(j_t,\ell_t)\}}.$$
We call $\mathrm{D}$ a derivative of order $t$. We also include the case $t = 0$, and call the identity operator a $0$-derivative.

The following two claims show that the definition of derivatives above is good, in the sense that $2$-norms of derivatives relate to globalness, and derivatives indeed reduce the degree of $f$.

Claim 4.2.
Let $t \in \mathbb{N}$, $\varepsilon > 0$, and $f \colon S_n \to \mathbb{R}$.

1. If $f$ is $(2t, \varepsilon)$-global, then for each derivative $\mathrm{D}$ of order $t$ we have $\|\mathrm{D} f\|_2 \le 2^t \varepsilon$.

2. If $t \le n/2$, and for all $\ell \le t$ and every derivative $\mathrm{D}$ of order $\ell$ we have $\|\mathrm{D} f\|_2 \le \varepsilon$, then $f$ is $(t, 2^t \varepsilon)$-global.

Proof. The first item follows immediately by induction on $t$ using the triangle inequality. The rest of the proof is devoted to establishing the second item, also by induction on $t$.

Base cases $t = 0, 1$. The case $t = 0$ is trivial, and we prove the case $t = 1$. Let $i_1, i_2 \in [n]$ be distinct and let $j_1, j_2 \in [n]$ be distinct. Since $\|\mathrm{D}_{(i_1,i_2) \to (j_1,j_2)} f\|_2 \le \varepsilon$, we get from the triangle inequality that
$$\big| \|f_{i_1 \to j_1, i_2 \to j_2}\|_2 - \|f_{i_1 \to j_2, i_2 \to j_1}\|_2 \big| \le \varepsilon. \tag{3}$$
Multiplying (3) by $\|f_{i_1 \to j_1, i_2 \to j_2}\|_2 + \|f_{i_1 \to j_2, i_2 \to j_1}\|_2$ we get that
$$\big| \|f_{i_1 \to j_1, i_2 \to j_2}\|_2^2 - \|f_{i_1 \to j_2, i_2 \to j_1}\|_2^2 \big| \le \varepsilon \big( \|f_{i_1 \to j_1, i_2 \to j_2}\|_2 + \|f_{i_1 \to j_2, i_2 \to j_1}\|_2 \big).$$
Taking an average over $j_2$ and using the triangle inequality on the left-hand side, we get that
$$\big| \|f_{i_1 \to j_1}\|_2^2 - \|f_{i_2 \to j_1}\|_2^2 \big| \le \varepsilon\, \mathbb{E}_{j_2}\big[ \|f_{i_1 \to j_1, i_2 \to j_2}\|_2 + \|f_{i_1 \to j_2, i_2 \to j_1}\|_2 \big].$$
By Cauchy–Schwarz, $\mathbb{E}_{j_2}[\|f_{i_1 \to j_1, i_2 \to j_2}\|_2] \le \mathbb{E}_{j_2}[\|f_{i_1 \to j_1, i_2 \to j_2}\|_2^2]^{1/2} = \|f_{i_1 \to j_1}\|_2$, and similarly for the other term, so we conclude
$$\big| \|f_{i_1 \to j_1}\|_2^2 - \|f_{i_2 \to j_1}\|_2^2 \big| \le \varepsilon \big( \|f_{i_1 \to j_1}\|_2 + \|f_{i_2 \to j_1}\|_2 \big),$$
and dividing both sides of the inequality by $\|f_{i_1 \to j_1}\|_2 + \|f_{i_2 \to j_1}\|_2$ we get
$$\big| \|f_{i_1 \to j_1}\|_2 - \|f_{i_2 \to j_1}\|_2 \big| \le \varepsilon.$$
Since $\mathbb{E}_{i \sim [n]}\big[\|f_{i \to j_1}\|_2^2\big] = \|f\|_2^2 \le \varepsilon^2$, there is $i_0$ such that $\|f_{i_0 \to j_1}\|_2 \le \varepsilon$, and the above inequality implies that $\|f_{i \to j_1}\|_2 \le 2\varepsilon$ for all $i$. This completes the proof for the case $t = 1$.

The inductive step.
Let $t > 1$. We prove that $f$ is $(t, 2^t \varepsilon)$-global, or equivalently that $f_T$ is $(1, 2^t \varepsilon)$-global for all consistent sets $T$ of size $t - 1$. Indeed, fix a consistent $T$ of size $t - 1$.

By the induction hypothesis, $\|f_T\|_2 \le 2^{t-1}\varepsilon$, and the claim would follow from the $t = 1$ case once we show that $\|\mathrm{D} f_{\to T}\|_2 \le 2^{t-1}\varepsilon$ for all order-$1$ derivatives $\mathrm{D} = \mathrm{D}_{(i_1,i_2) \to (j_1,j_2)}$, where $i_1, i_2$ do not appear as the first coordinate of an element of $T$, and $j_1, j_2$ do not appear as a second coordinate of an element of $T$ (we are using here the fact that the case $t = 1$ applies, as $S_n^T$ is isomorphic to $S_{n-|T|}$ as $S_{n-|T|}$-bimodules). Fix such $\mathrm{D}$, and let $g = \mathrm{D}_{(i_1,i_2) \to (j_1,j_2)} f$. By hypothesis, for any derivative $\tilde{\mathrm{D}}$ of order at most $t - 1$ we have $\|\tilde{\mathrm{D}} g\|_2 \le \varepsilon$, hence by the induction hypothesis $\|g_{\to T}\|_2 \le 2^{t-1}\varepsilon$. Since restrictions and derivatives commute, we have $g_{\to T} = \mathrm{D}_{(i_1,i_2) \to (j_1,j_2)} f_{\to T}$, and we conclude that $f_{\to T}$ is $(1, 2^t \varepsilon)$-global, as desired.

Claim 4.3. If $f$ is of degree $d$ and $\mathrm{D}$ is a $t$-derivative, then $\mathrm{D} f$ is of degree at most $d - t$.

Proof. It is sufficient to consider the case $t = 1$ of the claim, as we may apply it repeatedly. By linearity of the derivative $\mathrm{D}$, it is enough to prove the statement when $f = x_{i_1 \to j_1} \cdots x_{i_d \to j_d}$. Now note that the Laplacian $\mathrm{L}_{(k_1, k_2)}$ annihilates $f$ unless $k_1$ is equal to some $i_\ell$, or $k_2$ is equal to some $i_\ell$, or both, and we only have to consider these cases. Each derivative corresponding to the Laplacian $\mathrm{L}_{(k_1,k_2)}$ restricts both the image of $k_1$ and the image of $k_2$, so after applying this restriction on $\mathrm{L}_{(k_1,k_2)} f$ we either get the function $0$, a function of degree $d - 1$, or a function of degree $d - 2$.

We are now ready to prove Lemma 3.5. To prove that $f$ is global, we handle restrictions of size $t \le n/2$ and restrictions of size $t > n/2$ separately, in the following two claims.

Claim 4.4.
Suppose $f \colon S_n \to \mathbb{R}$ is a $(2d, \varepsilon)$-global function of degree $d$. Then $f$ is $(t, 4^t \varepsilon)$-global for each $t \le n/2$.

Proof. By the second item of Claim 4.2, it is enough to show that for each derivative $\mathrm{D}$ of order $\ell \le t$ we have $\|\mathrm{D} f\|_2 \le 2^t \varepsilon$. For $\ell \le d$ this follows from the first item of Claim 4.2, and for $\ell > d$ it follows from Claim 4.3, as then $\mathrm{D} f = 0$.

For $t > n/2$, we use the obvious fact that $f$ is always $(t, \|f\|_\infty)$-global, and upper-bound the infinity norm of $f$ using the following claim.

Claim 4.5.
Let $f$ be a $(2d, \varepsilon)$-global function of degree $d$. Then $\|f\|_\infty \le \sqrt{(6d)!}\, 4^n \varepsilon$.

Proof. We prove the claim by induction on $n$. The case $n = 1$ is obvious, so let $n > 1$. If $d \le n/6$, then by Claim 4.4 we have that $f$ is $(2d, 4^{2d}\varepsilon)$-global, and hence for each set $S$ of size $2d$, the function $f_{\to S}$ is $(2d, 4^{2d}\varepsilon)$-global. Therefore, the induction hypothesis implies that
$$\|f\|_\infty = \max_{S \colon |S| = 2d} \|f_{\to S}\|_\infty \le \sqrt{(6d)!}\, 4^{n - 2d} \cdot 4^{2d}\varepsilon = \sqrt{(6d)!}\, 4^n \varepsilon.$$
Suppose now that $n < 6d$. Then $\|f\|_\infty^2 \le (6d)!\, \|f\|_2^2$, since the probability of each atom in $S_n$ is $1/n! \ge 1/(6d)!$. Hence $\|f\|_\infty \le \sqrt{(6d)!}\, \varepsilon$.

Note that $(6d)! \le 2^n$ given that $C$ is sufficiently large, so for $t > n/2$, Claim 4.5 implies that $f$ is $(t, 2^{n/2} \cdot 4^n \varepsilon)$-global; as $2^{n/2} 4^n = 2^{5n/2} \le 32^t$ for $t > n/2$, $f$ is $(t, 32^t \varepsilon)$-global.

Proof overview.
Our argument first constructs a very strong approximating polynomial in the $L^2$-norm. The approximation will in fact be strong enough to imply, in a black-box way, that it is also an approximating polynomial in $L^q$.

To construct an $L^2$ approximating polynomial, we use spectral considerations. Denote by $\lambda_1, \ldots, \lambda_\ell$ the eigenvalues of $T^{(\rho)}$ on the space of degree-$d$ functions. Note that if $P$ is a polynomial such that $P(\lambda_i) = 1$ for all $i$, then $P\big(T^{(\rho)}\big) f = f$ for all $f$ of degree $d$. However, as $\ell$ may be very large, there may not be a polynomial $P$ with small $\|P\|$ satisfying $P(\lambda_i) = 1$ for all $i$, and to circumvent this issue we must argue that, at least effectively, $\ell$ is small. Indeed, while we do not show that $\ell$ is small, we do show that there are $d + 1$ distinct values, $\lambda_0(\rho), \ldots, \lambda_d(\rho)$, such that each $\lambda_i$ is very close to one of the $\lambda_j(\rho)$'s. This, by interpolation, implies that we may find a low-degree polynomial $P$ such that $P(\lambda_i)$ is very close to $1$ for all $i = 1, \ldots, \ell$. Finally, to argue that $\|P\|$ is small, we show that each $\lambda_i(\rho)$ is bounded away from $0$.

It remains then to establish the claimed properties of the eigenvalues $\lambda_1, \ldots, \lambda_\ell$, and we do so in several steps. We first identify the eigenspaces of $T^{(\rho)}$ among the space of low-degree functions, and show that each one of them contains a junta. Intuitively, for juntas it is much easier to understand the action of $T^{(\rho)}$, since when looking at very few coordinates, $S_n$ looks like a product space. Indeed, using this logic we are able to show that all eigenvalues of $T^{(\rho)}$ on low-degree functions are bounded away from $0$.
To argue that the eigenvalues are concentrated on a few values, we use the fact that, taking symmetry into account, the number of linearly independent juntas is small.

Our proof uses several notations appearing in Section 2.1, including the actions of $S_n$ on functions from the left, $\tau f$, and from the right, $f\sigma$, the level decomposition $V_d$, the spaces $V_{A,B}$, and the concept of a $d$-junta.

$T^{(\rho)}$ commutes with the action of $S_n$ as a bimodule

Lemma 5.1. The operator $T^{(\rho)}$ commutes with the action of $S_n$ as a bimodule.

The proof relies on the following claims.

Claim 5.2. If $\mathrm{T}, \mathrm{S}$ are operators that commute with the action of $S_n$ as a bimodule, then so does $\mathrm{T} \circ \mathrm{S}$.

Proof. We have $\pi_1(\mathrm{T}\mathrm{S}f)\pi_2 = \mathrm{T}(\pi_1 \mathrm{S}f \pi_2) = \mathrm{T}\mathrm{S}(\pi_1 f \pi_2)$.

Let $X$ and $Y$ be $S_n$-bimodules, and consider $X \times Y$ as an $S_n$-bimodule with the operation $\sigma_1(x, y)\sigma_2 = (\sigma_1 x \sigma_2, \sigma_1 y \sigma_2)$. We say that a probability distribution $\mu$ on $X \times Y$ is invariant under the action of $S_n$ on both sides if $\mu(\sigma_1(x, y)\sigma_2) = \mu(x, y)$ for all $x \in X$, $y \in Y$ and $\sigma_1, \sigma_2 \in S_n$.

Claim 5.3.
Let
$X, Y$ be $S_n$-bimodules that are coupled by the probability measure $\mu$, and suppose that $\mu$ is invariant under the action of $S_n$ from both sides. Then the operators $\mathrm{T}_{X \to Y}, \mathrm{T}_{Y \to X}$ commute with the action of $S_n$ from both sides.

Proof. We prove the claim for $\mathrm{T}_{X \to Y}$ (the argument for $\mathrm{T}_{Y \to X}$ is identical). Let $\mu_X, \mu_Y$ be the marginal distributions of $\mu$ on $X$ and on $Y$, and for each $x \in X$ denote by $1_x$ the indicator function of $x$. The set $\{1_x\}_{x \in X}$ is a basis for $L^2(X)$, and so it is enough to show that for all $x$ and $\sigma_1, \sigma_2 \in S_n$ it holds that $\sigma_1(\mathrm{T}_{X \to Y} 1_x)\sigma_2 = \mathrm{T}_{X \to Y}(\sigma_1 1_x \sigma_2)$. As these are two functions over $Y$, it is enough to show that $\langle \sigma_1(\mathrm{T}_{X \to Y} 1_x)\sigma_2, 1_y\rangle = \langle \mathrm{T}_{X \to Y}(\sigma_1 1_x \sigma_2), 1_y\rangle$ for all $y$, since $\{1_y\}_{y \in Y}$ forms a basis for $L^2(Y)$.

Fix $x$ and $y$. Since $\mu$ is invariant under the action of $S_n$ on both sides, it follows that $\mu_Y$ is invariant under the action of $S_n$, so we have
$$\langle \sigma_1(\mathrm{T}_{X \to Y} 1_x)\sigma_2, 1_y \rangle = \big\langle \mathrm{T}_{X \to Y} 1_x,\ \sigma_1^{-1} 1_y \sigma_2^{-1} \big\rangle = \big\langle \mathrm{T}_{X \to Y} 1_x,\ 1_{\sigma_1^{-1} y \sigma_2^{-1}} \big\rangle = \mu\big(x, \sigma_1^{-1} y \sigma_2^{-1}\big),$$
where in the penultimate transition we used the fact that $\sigma_1^{-1} 1_y \sigma_2^{-1} = 1_{\sigma_1^{-1} y \sigma_2^{-1}}$. On the other hand, as the same fact holds for $1_x$, we also have
$$\langle \mathrm{T}_{X \to Y}(\sigma_1 1_x \sigma_2), 1_y \rangle = \big\langle \mathrm{T}_{X \to Y} 1_{\sigma_1^{-1} x \sigma_2^{-1}}, 1_y \big\rangle = \mu\big(\sigma_1^{-1} x \sigma_2^{-1}, y\big).$$
The claim now follows from the fact that $\mu$ is invariant under the action of $S_n$ from both sides.
We let $S_n$ act on $L$ from the right by setting $(i, j)\pi = (\pi(i), j)$ and from the left by setting $\pi(i, j) = (i, \pi(j))$. For a function $f$ on $L^m$ we write $\pi_1 f \pi_2$ for the function $(x_1, \ldots, x_m) \mapsto f(\pi_1 x_1 \pi_2, \ldots, \pi_1 x_m \pi_2)$. By Claim 5.3 the operators $\mathrm{T}_\rho, \mathrm{T}_{S_n \to L^m}, \mathrm{T}_{L^m \to S_n}$ commute with the action of $S_n$ as a bimodule, and therefore so does $T^{(\rho)}$, by Claim 5.2.

$V_{A,B}$ and $V_d$ are invariant under $T^{(\rho)}$

First we show that $V_{A,B}$ is an invariant subspace of $T^{(\rho)}$.

Lemma 5.4.
Let $\mathrm{T}$ be an endomorphism of $L^2(S_n)$ as an $S_n$-bimodule. Then $\mathrm{T} V_{A,B} \subseteq V_{A,B}$. Moreover,
$\mathrm{T} V_d \subseteq V_d$.

Proof. Let $f \in V_{A,B}$. We need to show that $\mathrm{T} f \in V_{A,B}$. Let $\sigma_1 \in S_{[n] \setminus A}$, $\sigma_2 \in S_{[n] \setminus B}$. Then
$$\sigma_2(\mathrm{T} f)\sigma_1 = \mathrm{T}(\sigma_2 f \sigma_1) = \mathrm{T} f,$$
where the first equality uses the fact that $\mathrm{T}$ commutes with the action of $S_n$ from both sides, and the second equality follows from Lemma 2.4. The 'moreover' part follows from Lemma 2.3.

Lemma 5.5.
Let $\lambda$ be an eigenvalue of $T^{(\rho)}$ as an operator from $V_d$ to itself, and let $V_{d,\lambda}$ be the eigenspace corresponding to $\lambda$. Then $V_{d,\lambda}$ contains a $d$-junta.

Proof. Since each space $V_{A,B}$ is $T^{(\rho)}$-invariant, we may decompose each $V_{A,B}$ into eigenspaces $V_{A,B}^{(\lambda)}$. Let
$$V_d^{(\lambda)} = \sum_{|A|, |B| \le d} V_{A,B}^{(\lambda)}.$$
Then for each $\lambda$, $V_d^{(\lambda)}$ is an eigenspace of $T^{(\rho)}$ with eigenvalue $\lambda$, and
$$\sum_\lambda V_d^{(\lambda)} = \sum_{|A|, |B| \le d} V_{A,B} = V_d = \sum_\lambda V_{d,\lambda}.$$
By uniqueness, it follows that $V_{d,\lambda} = V_d^{(\lambda)}$ for all $\lambda$. Fix $\lambda$; then we get that there are $|A|, |B| \le d$ such that $V_{A,B}^{(\lambda)} \subseteq V_{d,\lambda}$ is nonzero, and since any function in $V_{A,B}$ is a $d$-junta by definition, the proof is concluded.

We comment that the representation theory of $S_n$ supplies us with explicit formulas for functions in $V_{d,\lambda}$ (arising in the construction of Specht modules), which can be turned into $d$-juntas by symmetrization. Since we will not need such explicit formulas here, we skip this description.
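Underlying the junta discussion is the characterization of Lemma 2.4: a function in $V_{A,B}$ depends on $\pi$ only through the pattern $(e_{ab}(\pi))_{a \in A, b \in B}$, hence is invariant under $\pi \mapsto \tau\pi\sigma$ for $\sigma$ fixing $A$ and $\tau$ fixing $B$ pointwise. This invariance is easy to confirm by brute force on a small group; the sketch below is our own (the choice $n = 4$, $A = B = \{0, 1\}$ with zero-indexed positions is hypothetical, for illustration only).

```python
from itertools import permutations

n = 4
A = B = (0, 1)  # hypothetical choice of the sets A, B (zero-indexed positions)

def compose(p, q):
    # (p o q)(x) = p(q(x)), permutations represented as tuples
    return tuple(p[q[x]] for x in range(n))

def pattern(pi):
    # the tuple (e_{ab}(pi))_{a in A, b in B}; functions of it lie in U_{A,B}
    return tuple(int(pi[a] == b) for a in A for b in B)

perms = list(permutations(range(n)))
fix_A = [s for s in perms if all(s[a] == a for a in A)]  # sigma fixing A pointwise
fix_B = [t for t in perms if all(t[b] == b for b in B)]  # tau fixing B pointwise

ok = all(pattern(compose(t, compose(pi, s))) == pattern(pi)
         for pi in perms for s in fix_A for t in fix_B)
print(ok)
```

The check passes because $(\tau\pi\sigma)(a) = \tau(\pi(a))$ when $\sigma(a) = a$, and $\tau(\pi(a)) = b$ iff $\pi(a) = b$ when $\tau(b) = b$, exactly the argument showing $e_{ab} \in U_{A,B}$.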
We now move on to the study of the spaces V A,B . These spaces have small dimension and are therefore easyto analyse. We first construct a set { v T } of functions in V A,B that form a nearly-orthonormal basis.
Definition 5.6.
Let $T = \{(i_1, j_1), \ldots, (i_k, j_k)\} \subseteq [d] \times [d]$ be consistent. Let $1_T$ be the indicator function of the permutations $\pi \in S_n$ that satisfy the restrictions given by $T$, i.e. $\pi(i_1) = j_1, \ldots, \pi(i_k) = j_k$. We define $v_T = \frac{1_T}{\|1_T\|_2}$.

Since the spaces $V_{A,B}$ are isomorphic (as $S_{n-d}$-bimodules) for all sets $A, B$ of size $d$, we shall focus on the case where $A = B = [d]$.

Lemma 5.7.
Let $d \le n/2$, and let $T \ne S$ be sets of size $d$. Then $\langle v_T, v_S \rangle \le O\big(\frac{1}{n}\big)$.

Proof. If $T \cup S$ is not consistent, then $1_T 1_S = 0$ and so $\langle v_T, v_S \rangle = 0$. Otherwise, $|T \cup S| \ge d + 1$, and
$$\langle v_T, v_S \rangle = \frac{\mathbb{E}[1_{T \cup S}]}{\|1_T\|_2 \|1_S\|_2} = \frac{(n - |T \cup S|)!}{\sqrt{(n - |T|)!\,(n - |S|)!}} \le \frac{(n - d - 1)!}{(n - d)!} = O\Big(\frac{1}{n}\Big).$$

Proposition 5.8.
There exists an absolute constant $c > 0$ such that for all consistent $T \subseteq L$ we have $\big\langle T^{(\rho)} v_T, v_T \big\rangle \ge (c\rho)^{|T|}$.

Proof. Let $x \sim L^m$, $y \sim N_\rho(x)$, and let $\sigma_x, \sigma_y \in S_n$ be corresponding permutations chosen according to the coupling. We have
$$\big\langle T^{(\rho)} v_T, v_T \big\rangle = \frac{n!}{(n - |T|)!} \big\langle T^{(\rho)} 1_T, 1_T \big\rangle,$$
as $\|1_T\|_2^2 = \frac{(n - |T|)!}{n!}$. We now interpret $\langle T^{(\rho)} 1_T, 1_T \rangle$ as the probability that both $\sigma_x$ and $\sigma_y$ satisfy the restrictions given by $T$. For each ordered subset $S \subseteq [2n]$ of size $|T|$, consider the event $A_S$ that $x_S = y_S = T$, while all the coordinates of the vectors $x_{[2n] \setminus S}, y_{[2n] \setminus S}$ do not contradict $T$ and do not belong to $T$. Then
$$\big\langle T^{(\rho)} 1_T, 1_T \big\rangle \ge \sum_{S \text{ an ordered } |T|\text{-subset of } [2n]} \Pr[A_S].$$
Now the probability that $x_S = T$ is $\big(\frac{1}{n^2}\big)^{|T|}$. Conditioned on $x_S = T$, the probability that $y_S = T$ is at least $\rho^{|T|}$. When we condition on $x_S = y_S = T$, we obtain that the probability that $x_{[2n] \setminus S}$ and $y_{[2n] \setminus S}$ do not involve any coordinate contradicting $T$ or in $T$ is at least $\big(1 - \frac{|T|}{n}\big)^{2n} = 2^{-\Theta(|T|)}$. Hence $\Pr[A_S] \ge \big(\frac{1}{n^2}\big)^{|T|} \Omega(\rho)^{|T|}$. So, wrapping everything up, we obtain that
$$\big\langle T^{(\rho)} v_T, v_T \big\rangle \ge \frac{(2n)!}{(2n - |T|)!} \cdot \frac{n!}{(n - |T|)!} \cdot \frac{1}{n^{2|T|}}\, \Omega(\rho)^{|T|} = \Omega(\rho)^{|T|}.$$

Lemma 5.9.
Let $\rho \in (0, \frac{1}{2}]$. Then for all sets $T \ne S$ of size at most $n/2$ we have $\big\langle T^{(\rho)} v_T, v_S \big\rangle = O\big(\frac{1}{\sqrt{n}}\big)$.

Proof. Suppose without loss of generality that $\|1_T\|_2 \le \|1_S\|_2$, so that $|T| \ge |S|$. Choose $x \sim L^m$, $y \sim N_\rho(x)$, and let $\sigma_x, \sigma_y$ be the corresponding random permutations given by the coupling. We have
$$\big\langle T^{(\rho)} v_T, v_S \big\rangle = \frac{\Pr[1_T(\sigma_x) = 1,\ 1_S(\sigma_y) = 1]}{\sqrt{\mathbb{E}[1_T]\,\mathbb{E}[1_S]}}.$$
As the probability in the numerator is at most $\mathbb{E}[1_T]$, we have
$$\big\langle T^{(\rho)} v_T, v_S \big\rangle \le \sqrt{\frac{\mathbb{E}[1_T]}{\mathbb{E}[1_S]}} = \sqrt{\frac{(n - |T|)!}{(n - |S|)!}},$$
and the lemma follows in the case that $|S| < |T|$.

It remains to prove the lemma when $|S| = |T|$. Let $(i, j) \in S \setminus T$. Note that
$$\Pr[1_T(\sigma_x) = 1,\ 1_S(\sigma_y) = 1] \le \frac{1}{n} \Pr[1_T(\sigma_x) = 1 \mid \sigma_y(i) = j].$$
Let us condition further on $\sigma_x(i)$. Conditioned on $\sigma_x(i) = j$, we have that $\sigma_x$ is a random permutation sending $i$ to $j$, and so $\Pr[1_T(\sigma_x) = 1]$ is either $0$ (if $(i, j)$ contradicts $T$) or $\frac{(n - 1 - |T|)!}{(n - 1)!} = O\big(\|1_T\|_2^2\big)$ (if $(i, j)$ is consistent with $T$). Conditioned on $\sigma_x(i) \ne j$ (and on $\sigma_y(i) = j$), we again obtain that $\sigma_x$ is a random permutation that does not send $i$ to $j$, in which case
$$\Pr[1_T(\sigma_x) = 1] = \frac{(n - |T|)!}{n! - (n - 1)!} = O\big(\|1_T\|_2^2\big)$$
if $(i, j)$ contradicts $T$, and
$$\Pr[1_T(\sigma_x) = 1] = \frac{(n - |T|)! - (n - |T| - 1)!}{n! - (n - 1)!} = O\big(\|1_T\|_2^2\big)$$
if $(i, j)$ is consistent with $T$. As $\|1_T\|_2 = \|1_S\|_2$ in this case, in all cases the numerator is at most $\frac{1}{n} \cdot O\big(\|1_T\|_2^2\big) = O\big(\frac{1}{n}\big)\|1_T\|_2\|1_S\|_2$. This completes the proof of the lemma.

Proposition 5.10.
Let $C$ be a sufficiently large constant. If $n \ge \big(\frac{\rho}{C}\big)^{-d} C^{d}$ and $f$ is a $d$-junta, then $\big\langle T^{(\rho)} f, f \big\rangle \ge \rho^{O(d)} \|f\|_2^2$.

Proof.
Since $\{v_T\}_{T \subseteq [d] \times [d]}$ span the space $V_{[d],[d]}$ of $([d], [d])$-juntas by Lemma 2.3, we may write $f = \sum_T a_T v_T$. Now
$$\big\langle T^{(\rho)} f, f \big\rangle = \sum_T a_T^2 \big\langle T^{(\rho)} v_T, v_T \big\rangle + \sum_{T \ne S} a_T a_S \big\langle T^{(\rho)} v_T, v_S \big\rangle.$$
By Lemma 5.9 we have
$$\Bigg| \sum_{T \ne S} a_T a_S \big\langle T^{(\rho)} v_T, v_S \big\rangle \Bigg| \le O\Bigg(\frac{\sum_{T \ne S} |a_T a_S|}{\sqrt{n}}\Bigg) \le O\Big(\frac{1}{\sqrt{n}}\Big) \Bigg(\sum_T |a_T|\Bigg)^2 \le \frac{d^{O(d)}}{\sqrt{n}} \sum_T a_T^2,$$
where the last inequality is by Cauchy–Schwarz. On the other hand, by Proposition 5.8 we have
$$\sum_T a_T^2 \big\langle T^{(\rho)} v_T, v_T \big\rangle \ge \rho^{O(d)} \sum_T a_T^2.$$
Using a similar calculation, one sees that $\|f\|_2^2 = \Big(1 \pm \frac{d^{O(d)}}{\sqrt{n}}\Big) \sum_T a_T^2$, so we get that
$$\big\langle T^{(\rho)} f, f \big\rangle \ge \Bigg(\rho^{O(d)} - \frac{d^{O(d)}}{\sqrt{n}}\Bigg) \sum_T a_T^2 \ge \rho^{O(d)} \|f\|_2^2.$$

Corollary 5.11.
Let $C$ be a sufficiently large absolute constant. If $n \ge \big(\frac{\rho}{C}\big)^{-d} C^{d}$, then all the eigenvalues of $T^{(\rho)}$ as an operator from $V_d$ to itself are at least $\rho^{O(d)}$.

Proof. By Lemma 5.5, each eigenspace $V_{d,\lambda}$ contains a $d$-junta. Let $f \in V_{d,\lambda}$ be a nonzero $d$-junta. Then, by Proposition 5.10,
$$\lambda = \frac{\big\langle T^{(\rho)} f, f \big\rangle}{\|f\|_2^2} \ge \rho^{O(d)}.$$

5.3 Showing that the eigenvalues of $T^{(\rho)}$ on $V_d$ are concentrated on at most $d + 1$ values

Let $\lambda_i(\rho) = \big\langle T^{(\rho)} v_T, v_T \big\rangle$, where $T$ is a set of size $i$. By symmetry, $\lambda_i(\rho)$ does not depend on the choice of $T$.

Lemma 5.12.
Suppose that $n \ge \big(\frac{\rho}{C}\big)^{-O(d)} C^{d}$. Then each eigenvalue of $T^{(\rho)}$ as an operator on $V_d$ is equal to $\lambda_i(\rho)\big(1 \pm n^{-1/4}\big)$ for some $i \le d$.

Proof. Let $\lambda$ be an eigenvalue of $T^{(\rho)}$, and let $f$ be a corresponding eigenfunction in $V_{[d],[d]}$. Write
$$f = \sum a_S v_S,$$
where the sum is over all $S = \{(i_1, j_1), \ldots, (i_t, j_t)\} \subseteq [d] \times [d]$. Then $T^{(\rho)} f - \lambda f = 0$, but on the other hand for each set $S$ we have
$$\big\langle T^{(\rho)} f - \lambda f, v_S \big\rangle = a_S\Big(\big\langle T^{(\rho)} v_S, v_S \big\rangle - \lambda\Big) \pm \sum_{T \ne S} |a_T|\Big(\big|\big\langle T^{(\rho)} v_T, v_S \big\rangle\big| + |\lambda|\,|\langle v_T, v_S \rangle|\Big) = a_S\big(\lambda_{|S|}(\rho) - \lambda\big) \pm O\Bigg(\frac{\sum_{T \ne S} |a_T|}{\sqrt{n}}\Bigg).$$
Thus, for all $S$ we have that
$$|a_S|\,\big|\lambda_{|S|}(\rho) - \lambda\big| \le O\Bigg(\frac{\sum_{T \ne S} |a_T|}{\sqrt{n}}\Bigg).$$
On the other hand, choosing $S$ that maximizes $|a_S|$, we find that $|a_S| \ge \frac{\sum_{T \ne S} |a_T|}{d^{O(d)}}$, and plugging that into the previous inequality yields that
$$\big|\lambda_{|S|}(\rho) - \lambda\big| \le O\Bigg(\frac{d^{O(d)}}{\sqrt{n}}\Bigg) \le n^{-1/3}.$$
Since $\lambda_{|S|}(\rho) \ge \rho^{O(d)} \ge n^{-1/12}$, provided that $C$ is sufficiently large, it follows that $\big|\lambda_{|S|}(\rho) - \lambda\big| \le n^{-1/4}\,\lambda_{|S|}(\rho)$.

An $L^2$ variant of Lemma 3.6

Lemma 5.13.
Let $n \ge \rho^{-Cd}$ for a sufficiently large constant $C$. There exists a polynomial $P(z) = \sum_{i=1}^{k} a_i z^i$ such that $\|P\| \le \rho^{-O(d^2)}$ and $\big\|P\big(T^{(\rho)}\big) f - f\big\|_2 \le n^{-2d}\|f\|_2$ for every $f \in V_d$.

Proof. Choose $P(z) = 1 - \prod_{i=0}^{d}\big(\lambda_i^{-1} z - 1\big)^{12d}$, where $\lambda_i = \lambda_i(\rho)$; note that $P(0) = 0$. Orthogonally decompose $T^{(\rho)}$ to write $f = \sum_\lambda f_{=\lambda}$, for nonzero orthogonal functions $f_{=\lambda} \in V_d$ satisfying $T^{(\rho)} f_{=\lambda} = \lambda f_{=\lambda}$, and let $g = P\big(T^{(\rho)}\big) f - f$. Then $g = \sum_\lambda (P(\lambda) - 1) f_{=\lambda}$. Therefore
$$\|g\|_2^2 = \sum_\lambda (P(\lambda) - 1)^2 \|f_{=\lambda}\|_2^2 \le \max_\lambda\, (P(\lambda) - 1)^2\, \|f\|_2^2.$$
Suppose the maximum is attained at $\lambda^\star$. By Lemma 5.12, there is $i \le d$ such that $\lambda^\star = \lambda_i(1 \pm n^{-1/4})$, and so
$$\Big|\big(\lambda_i^{-1}\lambda^\star - 1\big)^{12d}\Big| \le n^{-3d}.$$
For any $j \ne i$, we have by Corollary 5.11 that $\lambda_j \ge \rho^{O(d)}$, and so
$$\Big|\big(\lambda_j^{-1}\lambda^\star - 1\big)^{12d}\Big| \le \rho^{-O(d^2)}.$$
Hence
$$|1 - P(\lambda^\star)| \le \rho^{-O(d^2)}\, n^{-3d} \le n^{-2d},$$
where the last inequality follows from the lower bound on $n$. To finish the proof, we must upper-bound $\|P\|$, and this is relatively straightforward:
$$\|P\| \le 1 + \prod_{i=0}^{d} \Big\|\big(\lambda_i^{-1} z - 1\big)^{12d}\Big\| \le 1 + \prod_{i=0}^{d} \big\|\lambda_i^{-1} z - 1\big\|^{12d} = 1 + \prod_{i=0}^{d}\big(1 + \lambda_i^{-1}\big)^{12d},$$
which is at most $\rho^{-O(d^2)}$. In the second inequality, we used the fact that $\|P_1 P_2\| \le \|P_1\| \|P_2\|$.

$L^q$ approximation

To deduce the $L^q$ approximation of the polynomial $P$ from Lemma 5.13, we use the following basic hypercontractive inequality (this bound is oftentimes too weak quantitatively, but it is good enough for us, since we have a very strong $L^2$ approximation).

Lemma 5.14.
Let $C$ be sufficiently large, and let $n \ge (Cq)^{Cd}$. Let $f \colon S_n \to \mathbb{R}$ be a function of degree $d$. Then $\|f\|_q \le q^{O(d)} n^{d} \|f\|_2$.

Proof. Let $\rho = \frac{1}{C'q}$ for a sufficiently large absolute constant $C'$. Decomposing $f$ into $\sum_\lambda f_{=\lambda}$, where $T^{(\rho)} f_{=\lambda} = \lambda f_{=\lambda}$, we may find $g$ of degree $d$ such that $f = T^{(\rho)} g$, namely $g = \sum_\lambda \lambda^{-1} f_{=\lambda}$. By Parseval and Corollary 5.11, we get that $\|g\|_2 \le \rho^{-O(d)}\|f\|_2$. Thus, we have that $\|f\|_q = \big\|T^{(\rho)} g\big\|_q$, and to upper-bound this norm we intend to use Theorem 3.3; for that, we show that $g$ is global with fairly weak parameters.

Let $T \subseteq L$ be consistent of size at most $2d$. Then
$$\|g_{\to T}\|_2^2 = \frac{\mathbb{E}_x\big[g(x)^2 1_T(x)\big]}{\mathbb{E}_x[1_T(x)]} \le \frac{\mathbb{E}_x\big[g(x)^2\big]}{\mathbb{E}_x[1_T(x)]} \le n^{|T|} \|g\|_2^2 \le n^{|T|} \rho^{-O(d)} \|f\|_2^2,$$
and so $g$ is $(2d, \varepsilon)$-global for $\varepsilon = n^{d} \rho^{-O(d)} \|f\|_2$. Lemma 3.5 now implies that $g$ is $\varepsilon$-global with an absolute constant. By the choice of $\rho$, we may now use Theorem 3.3 to deduce that
$$\big\|T^{(\rho)} g\big\|_q \le \varepsilon^{(q-2)/q} \|g\|_2^{2/q} \le \varepsilon = n^{d} \rho^{-O(d)} \|f\|_2 \le n^{d} q^{O(d)} \|f\|_2.$$
Finally, we combine Lemma 5.13 and Lemma 5.14 to deduce the $L^q$ approximating polynomial.

Proof of Lemma 3.6.
Let $f$ be a function of degree $d$. By Lemma 5.13 there exists $P$ with $\|P\| \le \rho^{-O(d^2)}$ and $P(0) = 0$ such that the function $g = P\big(T^{(\rho)}\big) f - f$ satisfies $\|g\|_2 \le n^{-2d}\|f\|_2$. By Lemma 5.14,
$$\|g\|_q \le q^{O(d)} n^{d} \|g\|_2 \le q^{O(d)} n^{-d} \|f\|_2 \le \frac{1}{\sqrt{n}}\|f\|_2,$$
provided that $C$ is sufficiently large, completing the proof.

In this section, we give an alternative proof of a variant of Theorem 1.4. This approach starts by identifying a trivial spanning set of the space $V_t$ of degree-$t$ functions from Definition 2.1.

Notation. For technical reasons, it will be convenient for us to work with ordered sets. We denote by $[n]_t$ the collection of ordered sets of size $t$, which are simply $t$-tuples of distinct elements from $[n]$, but we also allow set operations (such as $\setminus$) on them. We also denote $n_t = |[n]_t| = n(n-1)\cdots(n-t+1)$. For ordered sets $I = \{i_1, \ldots, i_t\}$, $J = \{j_1, \ldots, j_t\}$, we denote by $1_{I \to J}(\pi)$ the indicator of $\pi(i_k) = j_k$ for all $k = 1, \ldots, t$; for convenience, we also denote this event by $\pi(I) = J$.

With the above notation, the following set clearly spans $V_t$, by definition:
$$\{1_{I \to J} \mid |I| = |J| \le t\}. \tag{4}$$
We remark that this set is not a basis, since these functions are linearly dependent. For example, for $t = 1$ we have $\sum_{i=1}^{n} 1_{\pi(1) = i} = 1$. This implies that a function $f \in V_t$ has several different representations as a linear combination of functions from the spanning set (4). The key to our approach is to show that there is a way to canonically choose such a linear combination, which is both unique and works well with computations of high moments.

Definition 6.1.
Let $f \in V_{=t}$, and suppose that $f = \sum_{I,J \in [n]_t} a(I, J) 1_{I \to J}$. We say that this representation is normalized if:

1. For any $r \le t$, $J = \{j_1, \ldots, j_t\}$ and $I = \{i_1, \ldots, i_{r-1}, i_{r+1}, \ldots, i_t\}$, we have that
$$\sum_{i_r \notin I} a(\{i_1, \ldots, i_t\}, J) = 0.$$
2. Analogously, for any $r \le t$, $I = \{i_1, \ldots, i_t\}$ and $J = \{j_1, \ldots, j_{r-1}, j_{r+1}, \ldots, j_t\}$, we have that
$$\sum_{j_r \notin J} a(I, \{j_1, \ldots, j_t\}) = 0.$$
3. Symmetry: for all ordered sets
$I, J$ of size $t$ and $\pi \in S_t$, we have $a(I, J) = a(\pi(I), \pi(J))$.

More loosely, we say that a representation according to the spanning set (4) is normalized if averaging the coefficients according to a single coordinate results in $0$. We also refer to the equalities in Definition 6.1 as "normalizing relations". In this section, we show that a normalized representation always exists, and then show how it is useful in establishing hypercontractive statements similar to Theorem 1.4.

Normalized representations first appear in the context of the slice, due to Dunkl [5], who called normalized representations harmonic functions. See also the monograph of Bannai and Ito [1, III.3] and the papers [8, 9]. Ryan O'Donnell (personal communication) has proposed calling them zero-flux representations.

Lemma 6.2.
Let $t \le n/2$, and let $f \in V_t$. Then we may write $f = h + g$, where $h \in V_{t-1}$ and $g = \sum_{I,J \in [n]_t} a_t(I, J) 1_{I \to J}$ is given by a set of coefficients satisfying the normalizing relations.

Proof. The proof is by induction on $t$.

Fix $t > 0$ and $f \in V_t$. Then we may write $f(\pi) = \sum_{I,J \in [n]_t} a(I, J) 1_{I \to J}(\pi)$, where the coefficients satisfy the symmetry property from Definition 6.1.

Throughout the proof, we will change the coefficients in a sequential process, always maintaining the form $f = h + \sum_{|I| = |J| = t} b(I, J) 1_{I \to J}(\pi)$ for $h \in V_{t-1}$.

Take $r \in [t]$, and for each $I = \{i_1, \ldots, i_t\}$, $J = \{j_1, \ldots, j_t\}$, define the coefficients
$$b(I, J) = a(I, J) - \frac{1}{n - t + 1} \sum_{i \notin I \setminus \{i_r\}} a(\{i_1, \ldots, i_{r-1}, i, i_{r+1}, \ldots, i_t\}, J). \tag{5}$$
In Claim 6.3 below, we prove that after making this change of coefficients, we may still write $f = h + \sum_{|I| = |J| = t} b(I, J) 1_{I \to J}(\pi)$, and that the coefficients $b(I, J)$ satisfy all normalizing relations that the $a(I, J)$ do, as well as the normalizing relation from the first collection in Definition 6.1 for $r$. We repeat this process for all $r \in [t]$.

After this process is done, we have $f = h + \sum_{I,J \in [n]_t} b(I, J) 1_{I \to J}(\pi)$, where the coefficients $b(I, J)$ satisfy the first collection of normalizing relations from Definition 6.1. We can now perform the analogous process on the $J$ part, and by symmetry obtain that after this process, the second collection of normalizing relations in Definition 6.1 holds. One only has to check that this does not destroy the first collection of normalizing relations, which we also prove in Claim 6.3.

Finally, we symmetrize $f$ to ensure that it satisfies the symmetry condition.
To do so, we replace $g = \sum_{I,J \in [n]_t} b(I, J) 1_{I \to J}(\pi)$ with $g' = \frac{1}{t!}\sum_{\pi \in S_t} g^\pi$, where $(1_{I \to J})^\pi = 1_{\pi(I) \to \pi(J)}$ (and the action is extended linearly). It is easy to check that $g = g^\pi$ as functions, and that $g^\pi$ satisfies the two sets of normalizing relations. It follows that so does $g'$, and furthermore, by construction, $g'$ is symmetric.

Claim 6.3.
The change of coefficients (5) has the following properties:

1. The coefficients $b(I, J)$ satisfy the normalizing relation of the first item of Definition 6.1 for $r$.

2. If the coefficients $a(I, J)$ satisfy the normalizing relation of the first item of Definition 6.1 for some $r' \ne r$, then so do the $b(I, J)$.

3. If the coefficients $a(I, J)$ satisfy the normalizing relation of the second item of Definition 6.1 for some $r'$, then so do the $b(I, J)$.

4. We may write $f = h + \sum_{|I| = |J| = t} b(I, J) 1_{I \to J}(\pi)$, where $h \in V_{t-1}$.

Proof. We prove each one of the items separately.
Proof of the first item.
Fix $I = \{i_1,\dots,i_{r-1},i_{r+1},\dots,i_t\}$, $J = \{j_1,\dots,j_t\}$, and calculate:
$$\sum_{i_r\notin I} b(\{i_1,\dots,i_t\},J) = \sum_{i_r\notin I}\left(a(\{i_1,\dots,i_t\},J) - \frac{1}{n-t+1}\sum_{i\notin I} a(\{i_1,\dots,i_{r-1},i,i_{r+1},\dots,i_t\},J)\right) = \sum_{i_r\notin I} a(\{i_1,\dots,i_t\},J) - \frac{1}{n-t+1}\sum_{i_r\notin I}\sum_{i\notin I} a(\{i_1,\dots,i_{r-1},i,i_{r+1},\dots,i_t\},J). \tag{6}$$
As in the second double sum, for each $i_r$ the coefficient $a(\{i_1,\dots,i_{r-1},i_r,i_{r+1},\dots,i_t\},J)$ is counted $n-|I| = n-t+1$ times, we get that the above expression is equal to $0$.

Proof of the second item. Fix $r'\neq r$, and suppose the $a(\cdot,\cdot)$ satisfy the first set of normalizing relations for $r'$. Without loss of generality, assume $r' < r$. Let $I = \{i_1,\dots,i_{r'-1},i_{r'+1},\dots,i_t\}$, $J = \{j_1,\dots,j_t\}$. Below, we let $i, i_{r'}$ be summation indices, and we denote by $I'$ the tuple obtained from $\{i_1,\dots,i_t\}$ by replacing $i_r$ with $i$. Calculating as in (6):
$$\sum_{i_{r'}\notin I} b(\{i_1,\dots,i_t\},J) = \sum_{i_{r'}\notin I}\left(a(\{i_1,\dots,i_t\},J) - \frac{1}{n-t+1}\sum_{i\notin I\setminus\{i_r\}} a(I',J)\right) = \sum_{i_{r'}\notin I} a(\{i_1,\dots,i_t\},J) - \frac{1}{n-t+1}\sum_{i_{r'}\notin I}\,\sum_{i\notin I\setminus\{i_r\}} a(I',J). \tag{7}$$
The first sum is $0$ by the assumption of the second item. For the second sum, we interchange the order of summation to see that it is equal to $\sum_{i\notin I\setminus\{i_r\}}\sum_{i_{r'}\notin I} a(I',J)$, and note that for each $i$, the inner sum is again $0$ by the assumption of the second item.

Proof of the third item.
Fix $r'$, and suppose the $a(\cdot,\cdot)$ satisfy the second set of normalizing relations for $r'$. Fix $I = \{i_1,\dots,i_t\}$, $J = \{j_1,\dots,j_{r'-1},j_{r'+1},\dots,j_t\}$, $I' = \{i_1,\dots,i_{r-1},i,i_{r+1},\dots,i_t\}$, $J' = \{j_1,\dots,j_t\}$, and calculate:
$$\sum_{j_{r'}\notin J} b(I,J') = \sum_{j_{r'}\notin J}\left(a(I,J') - \frac{1}{n-t+1}\sum_{i\notin I\setminus\{i_r\}} a(I',J')\right) = \sum_{j_{r'}\notin J} a(I,J') - \frac{1}{n-t+1}\sum_{i\notin I\setminus\{i_r\}}\,\sum_{j_{r'}\notin J} a(I',J'). \tag{8}$$
Once again, both sums vanish due to the assumption.

Proof of the fourth item.
For $I = \{i_1,\dots,i_t\}$, $J = \{j_1,\dots,j_t\}$, denote
$$c(I,J) = \frac{1}{n-t+1}\sum_{i\notin I\setminus\{i_r\}} a(\{i_1,\dots,i_{r-1},i,i_{r+1},\dots,i_t\},J),$$
so that $a(I,J) = b(I,J) + c(I,J)$. Plugging this into the representation of $f$, we see that it is enough to prove that $h(\pi) = \sum_{I,J} c(I,J)\,\mathbf{1}_{I\to J}(\pi)$ is in $V_{t-1}$. Writing $I' = I\setminus\{i_r\}$, $J' = J\setminus\{j_r\}$ and expanding, we see that
$$h(\pi) = \frac{1}{n-t+1}\sum_{I,J}\mathbf{1}_{I\to J}(\pi)\sum_{i\notin I\setminus\{i_r\}} a(\{i_1,\dots,i_{r-1},i,i_{r+1},\dots,i_t\},J) = \frac{1}{n-t+1}\sum_{I',J'}\ \sum_{i\notin I',\,j_r\notin J'} a(\{i_1,\dots,i_{r-1},i,i_{r+1},\dots,i_t\},J)\sum_{i_r\notin I'}\mathbf{1}_{I\to J}(\pi).$$
Noting that $\sum_{i_r\notin I'}\mathbf{1}_{I\to J}(\pi) = \mathbf{1}_{I'\to J'}(\pi)$ is in the spanning set (4) for $t-1$, the proof is concluded.

Applying Lemma 6.2 iteratively, we may write each $f: S_n\to\mathbb{R}$ of degree at most $d$ as $f = f_0 + \dots + f_d$, where for each $k = 0,1,\dots,d$, the function $f_k$ is in $V_k$ and is given by a list of coefficients satisfying the normalizing relations.

6.2 Usefulness of normalized representations

In this section we establish a claim that demonstrates the usefulness of the normalizing relations. Informally, this claim often serves as a replacement for the orthogonality property that is so useful in product spaces. Formally, it allows us to turn long sums into short sums, and is very helpful in various computations arising in norms of functions on $S_n$ that are given in a normalized representation.

Claim 6.4.
Let $r\in\{0,\dots,d\}$ and $0\le t\le r$. Let $J$ be of size $r$, let $I$ be of size at least $r$, and let $R\subseteq I$ be of size $r-t$. Then
$$\sum_{T\in([n]\setminus I)_t} a_r(R\circ T, J) = (-1)^t\sum_{T\in(I\setminus R)_t} a_r(R\circ T, J).$$

Proof.
By symmetry, it suffices to prove the statement for $R$ that are prefixes of $I$. We prove the claim by induction on $t$. The case $t=0$ is trivial, so assume the claim holds for $t-1$, where $t>0$, and prove it for $t$. The left-hand side is equal to
$$\sum_{\substack{i_1,\dots,i_t\notin I\\ \text{distinct}}} a_r(R\circ(i_1,\dots,i_t),J).$$
For fixed distinct $i_1,\dots,i_{t-1}\notin I$, by the normalizing relations we have that
$$\sum_{i_t\notin I\cup\{i_1,\dots,i_{t-1}\}} a_r(R\circ(i_1,\dots,i_{t-1})\circ(i_t),J) = -\sum_{i_t\in I\setminus R} a_r(R\circ(i_1,\dots,i_{t-1})\circ(i_t),J),$$
hence
$$\sum_{\substack{i_1,\dots,i_t\notin I\\ \text{distinct}}} a_r(R\circ(i_1,\dots,i_t),J) = -\sum_{i_t\in I\setminus R}\ \sum_{\substack{i_1,\dots,i_{t-1}\notin I\cup\{i_t\}\\ \text{distinct}}} a_r(R\circ(i_1,\dots,i_{t-1})\circ(i_t),J).$$
For fixed $i_t\in I\setminus R$, using the induction hypothesis, the inner sum is equal to
$$(-1)^{t-1}\sum_{T\in(I\setminus(R\cup\{i_t\}))_{t-1}} a_r(R\circ T\circ(i_t),J).$$
Plugging that in,
$$\sum_{\substack{i_1,\dots,i_t\notin I\\ \text{distinct}}} a_r(R\circ(i_1,\dots,i_t),J) = (-1)\cdot(-1)^{t-1}\sum_{i_t\in I\setminus R}\ \sum_{T\in(I\setminus(R\cup\{i_t\}))_{t-1}} a_r(R\circ T\circ(i_t),J) = (-1)^t\sum_{T'\in(I\setminus R)_t} a_r(R\circ T',J).$$

Key to the hypercontractive statement proved in this section is an analytic notion of influence. Given a fixed representation of $f$ as $\sum_{k=0}^{n}\sum_{I,J\in[n]_k} a_k(I,J)\,\mathbf{1}_{I\to J}$, where for each $k$ the coefficients $a_k(I,J)$ satisfy the normalizing relations, we define the analytic notion of influences as follows.

Definition 6.5. For
$S,T\subseteq[n]$ of the same size $s$, define
$$I_{S,T}[f] = \sum_{r\ge 0}\ \sum_{\substack{I\in([n]\setminus S)_r\\ J\in([n]\setminus T)_r}}\frac{(r+s)!}{n^{r+s}}\,a(S\circ I, T\circ J)^2.$$
Here, $S\circ I$ denotes the tuple in $[n]_{r+s}$ resulting from appending $I$ at the end of $S$.

Definition 6.6.
A function $f$ is called $\varepsilon$-analytically-global if for all $S,T$ we have $I_{S,T}[f]\le\varepsilon$.

Remark 6.7.
With some work it can be shown that for $d\ll n$, a degree-$d$ function being $\varepsilon$-analytically-global is equivalent to $f$ being $(2d,\delta)$-global in the sense of Definition 1.3, where $\delta = O_d(\varepsilon)$. Thus, at least qualitatively, the hypercontractive statement below is in fact equivalent to Theorem 1.4.

We can now state our variant of the hypercontractive inequality that uses analytic influences.
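To get a feel for how normalizing relations substitute for orthogonality, consider the simplest case of a pure degree-$1$ function $f(\pi) = \sum_i a(i,\pi(i))$: the relations say that all row and column sums of the coefficient matrix vanish, and then $f$ is automatically orthogonal to the constants and satisfies the exact identity $\mathbb{E}[f^2] = \sum_{i,j} a(i,j)^2/(n-1)$. The following brute-force check is our addition (not from the paper) and verifies this over all of $S_5$:

```python
import itertools
import random

n = 5
random.seed(1)

# Random coefficients, projected so that every row sum and every column
# sum vanishes -- the degree-1 normalizing relations.
a = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
row = [sum(a[i]) / n for i in range(n)]
col = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
tot = sum(row) / n
a = [[a[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]

# f(pi) = sum_i a(i, pi(i)) is a degree-1 function on S_n.
vals = [sum(a[i][pi[i]] for i in range(n))
        for pi in itertools.permutations(range(n))]

mean = sum(vals) / len(vals)
second = sum(v * v for v in vals) / len(vals)
frob = sum(a[i][j] ** 2 for i in range(n) for j in range(n))

assert abs(mean) < 1e-9                     # orthogonal to constants
assert abs(second - frob / (n - 1)) < 1e-9  # E[f^2] = ||a||_F^2 / (n - 1)
```

This is exactly the kind of norm computation that Claim 6.4 streamlines for higher degrees.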
Theorem 6.8.
There exists an absolute constant
$C>0$ such that for all $d,n\in\mathbb{N}$ for which $n\ge C\cdot d\log d$, the following holds. If $f\in V_d$ is given by a list of coefficients satisfying the normalizing relations, say $f = \sum_{I,J\in[n]_d} a_d(I,J)\,\mathbf{1}_{I\to J}$, then
$$\mathbb{E}_\pi\!\left[f(\pi)^4\right] \le \sum_{|S|=|T|}\left(\frac{C}{n}\right)^{|S|} I_{S,T}[f]^2.$$

$p$-biased hypercontractivity. The last ingredient we use in our proof is a hypercontractive inequality on the $p$-biased cube from [17]. Let $g:\{0,1\}^m\to\mathbb{R}$ be a degree-$d$ function, where we think of $\{0,1\}^m$ as equipped with the $p$-biased product measure. Then we may write $g$ in the basis of characters, i.e. as a linear combination of $\{\chi_S\}_{S\subseteq[m]}$, where $\chi_S(x) = \prod_{i\in S}\frac{x_i-p}{\sqrt{p(1-p)}}$. This is the $p$-biased Fourier transform of $g$:
$$g(x) = \sum_{S}\hat g(S)\chi_S(x).$$
Next, we define the generalized influences of sets (which are very close in spirit to the analytic notion of influences considered herein). For $T\subseteq[m]$, we denote
$$I_T[g] = \sum_{S\supseteq T}\hat g(S)^2.$$
The following result is an easy consequence of [17, Theorem 3.4] (the deduction from it is done in the same way as the proof of [17, Lemma 3.6]).
Theorem 6.9.
Suppose $g:\{0,1\}^m\to\mathbb{R}$. Then $\|g\|_4^4 \le \sum_{T\subseteq[m]}(3p)^{|T|}\,I_T[g]^2$.

6.4 Proof of Theorem 6.8

Write $f$ according to its normalized representation as $f(\pi) = \sum_{I,J\in[n]_d} a(I,J)\,\mathbf{1}_{I\to J}$. We intend to define a function $g:\{0,1\}^{n\times n}\to\mathbb{R}$ that behaves similarly to $f$, as follows. We think of $\{0,1\}^{n\times n}$ as equipped with the $p$-biased measure for $p = 1/n$, and think of an input $x\in\{0,1\}^{n\times n}$ as a matrix. The rationale is that the bit $x_{i,j}$ being $1$ will encode the fact that $\pi(i)=j$, though we will never actually think about it this way. Thus, we define $g$ as
$$g(x) = \sum_{I,J\in[n]_d}|a(I,J)|\prod_{\ell=1}^{d}\left(x_{I_\ell,J_\ell}-\frac{1}{n}\right).$$
For
$I,J$, we denote by $S_{I,J}\subseteq[n]\times[n]$ the set of coordinates $\{(I_\ell,J_\ell)\mid\ell=1,\dots,d\}$, and note that with this notation,
$$g(x) = \sum_{I,J\in[n]_d}\left(\sqrt{p(1-p)}\right)^{d}|a(I,J)|\,\chi_{S_{I,J}}(x).$$
To complete the proof, we first show (Claim 6.10) that $\|f\|_4\le(1+o(1))\|g\|_4$, and then prove the desired upper bound on the $4$-norm of $g$, using Theorem 6.9.

Claim 6.10. $\|f\|_4\le(1+o(1))\|g\|_4$.

Proof.
Deferred to Section 6.4.1.

We now upper bound $\|g\|_4$. Using Theorem 6.9,
$$\|g\|_4^4 \le \sum_{T\subseteq[n]\times[n]}(3p)^{|T|}\,I_T[g]^2, \tag{9}$$
and the next claim bounds the generalized influences of $g$ by the analytic influences of $f$. For two sets $I=\{i_1,\dots,i_t\}$, $J=\{j_1,\dots,j_t\}$ of the same size, let $S(I,J)=\{(i_1,j_1),\dots,(i_t,j_t)\}\subseteq[n]\times[n]$.

Claim 6.11.
Let $T = S(I',J')$ be such that $I_T[g]\neq 0$. Then $I_T[g]\le I_{I',J'}[f]$.

Proof. Take $T$ in this sum for which $I_T[g]\neq 0$, and denote $t=|T|$. Then $T=\{(i_1,j_1),\dots,(i_t,j_t)\}=S(I',J')$ for $I'=\{i_1,\dots,i_t\}$, $J'=\{j_1,\dots,j_t\}$ that are consistent. For $Q\subseteq[n]\times[n]$ of size $d$ such that $T\subseteq Q$, let $S_{Q,T}=\{(I,J)\mid T\subseteq S(I,J)=Q\}$, and note that by the symmetry normalizing relation, $a(I,J)$ is constant on $(I,J)\in S_{Q,T}$. We thus get
$$I_T[g] = \sum_{Q}\left(\sum_{(I,J)\in S_{Q,T}}\left(\sqrt{p(1-p)}\right)^{d}|a(I,J)|\right)^2 \le d!\,p^d\sum_{Q}\sum_{(I,J)\in S_{Q,T}} a(I,J)^2,$$
where we used the fact that the size of $S_{Q,T}$ is at most $d!$. Rewriting the sum by first choosing the locations of $T$ in $(I,J)$, we get that the last sum is at most
$$\binom{d}{t}\sum_{\substack{I\in([n]\setminus I')_{d-t}\\ J\in([n]\setminus J')_{d-t}}} a(I'\circ I, J'\circ J)^2.$$
Combining all, we get that $I_T[g]\le\sum_{\substack{I\in([n]\setminus I')_{d-t}\\ J\in([n]\setminus J')_{d-t}}}\frac{d!}{n^d}\,a(I'\circ I,J'\circ J)^2 = I_{I',J'}[f]$.

Plugging Claim 6.11 into (9) and using Claim 6.10 finishes the proof of Theorem 6.8.

6.4.1 Proof of Claim 6.10

Let $I^1,\dots,I^4$ and $J^1,\dots,J^4$ be $d$-tuples of distinct indices from $[n]$. Then
$$\mathbb{E}_\pi\!\left[f(\pi)^4\right] = \sum_{\substack{I^1,\dots,I^4\\ J^1,\dots,J^4}} a(I^1,J^1)\cdots a(I^4,J^4)\,\mathbb{E}_\pi\!\left[\mathbf{1}_{\pi(I^1)=J^1}\cdots\mathbf{1}_{\pi(I^4)=J^4}\right].$$
Consider the collection of constraints on $\pi$ in the product of the indicators. For the expectation to be non-zero, the constraints should be consistent, so we only consider such tuples. Let $M$ be the number of different elements that appear in $I^1,\dots,I^4$ (which is at least $d$ and at most $4d$). We partition the outer sum according to $M$, and upper bound the contribution from each $M$ separately. Fix $M$; then, up to a $1+o(1)$ factor, the contribution from it is
$$\frac{1}{n^M}\sum_{\substack{I^1,\dots,I^4,\ J^1,\dots,J^4\\ \text{of type } M}} a(I^1,J^1)\cdots a(I^4,J^4).$$
We would like to further partition this sum according to the pattern in which the $M$ different elements of $I^1,\dots$
$,I^4$ are divided between them (and by consistency, this determines the way the $M$ different elements of $J^1,\dots,J^4$ are divided between them). There are at most $d^{O(d)}$ different such configurations; thus we fix one such configuration and upper bound its contribution (at the end multiplying the bound by $d^{O(d)}$). Thus, we have distinct $i_1,\dots,i_M$ ranging over $[n]$, the coordinates of each $I^r$ are composed of the $i_1,\dots,i_M$ (and similarly the $j_1,\dots,j_M$ and the $J^r$'s), and our sum is
$$\frac{1}{n^M}\sum_{\substack{i_1,\dots,i_M\text{ distinct}\\ j_1,\dots,j_M\text{ distinct}}} a(I^1,J^1)\cdots a(I^4,J^4). \tag{10}$$
We partition the $i_t$'s according to the number of times they occur: let $A_1,\dots,A_4$ be the sets of $i_t$ that appear in $1,2,3$ or $4$ of the $I^r$'s. We note that $i_t$ and $j_t$ appear in the same $I^r$'s and always together (otherwise the constraints would be contradictory), and in particular $i_t\in A_j$ iff $j_t\in A_j$. Also, $M=|A_1|+|A_2|+|A_3|+|A_4|$. We consider contributions from configurations where $A_1=\emptyset$ and $A_1\neq\emptyset$ separately, and to control the latter group we show that the above sum may be upper bounded by $M^{O(M)}$ sums in which $A_1=\emptyset$. To do that, we show how to reduce the size of $A_1$ by allowing more sums, and then apply this iteratively.

Without loss of generality, assume $i_1\in A_1$; then it appears in exactly one of the $I^r$'s, without loss of generality in the last coordinate of $I^4$. We rewrite the sum as
$$\frac{1}{n^M}\sum_{\substack{i_2,\dots,i_M\\ j_2,\dots,j_M}} a(I^1,J^1)\,a(I^2,J^2)\,a(I^3,J^3)\sum_{\substack{i_1\in[n]\setminus\{i_2,\dots,i_M\}\\ j_1\in[n]\setminus\{j_2,\dots,j_M\}}} a(I^4,J^4). \tag{11}$$
Consider the innermost sum. Applying Claim 6.4 twice, we have
$$\sum_{\substack{i_1\in[n]\setminus\{i_2,\dots,i_M\}\\ j_1\in[n]\setminus\{j_2,\dots,j_M\}}} a(I^4,J^4) = \sum_{\substack{i_1\in\{i_2,\dots,i_M\}\setminus I^4\\ j_1\in\{j_2,\dots,j_M\}\setminus J^4}} a(I^4,J^4).$$
Plugging that into (11), we are able to write the sum therein using at most $(M-1)^2$ sums (one for each choice of $i_1\in\{i_2,\dots,i_M\}\setminus I^4$ and $j_1\in\{j_2,\dots,j_M\}\setminus J^4$) over $i_2,\dots,i_M$, $j_2,\dots,j_M$, and thus we have reduced the size of $A_1$ by at least $1$, and have decreased $M$ by at least $1$.
The last bit implies that the original normalizing factor is smaller by a factor of at least $1/n$ than the new one. Iteratively applying this procedure, we end up with $A_1=\emptyset$, and we assume that henceforth. Thus, letting $H$ be the set of consistent $(I^1,\dots,I^4,J^1,\dots,J^4)$ in which each element of $I^1\cup\dots\cup I^4$ appears in at least two of the $I^r$'s, we get that
$$\mathbb{E}_\pi\!\left[f(\pi)^4\right] \le d^{O(d)}\sum_{\substack{I^1,\dots,I^4,\ J^1,\dots,J^4\\ \text{from } H}}|a(I^1,J^1)|\cdots|a(I^4,J^4)|\,\mathbb{E}_\pi\!\left[\mathbf{1}_{\pi(I^1)=J^1}\cdots\mathbf{1}_{\pi(I^4)=J^4}\right] \le (1+o(1))\,d^{O(d)}\sum_{\substack{I^1,\dots,I^4,\ J^1,\dots,J^4\\ \text{from } H}}\frac{|a(I^1,J^1)|\cdots|a(I^4,J^4)|}{n^{|I^1\cup\dots\cup I^4|}}, \tag{12}$$
where in the last inequality we used
$$\mathbb{E}_\pi\!\left[\mathbf{1}_{\pi(I^1)=J^1}\cdots\mathbf{1}_{\pi(I^4)=J^4}\right] = \frac{1}{n(n-1)\cdots(n-|I^1\cup\dots\cup I^4|+1)} \le (1+o(1))\,\frac{1}{n^{|I^1\cup\dots\cup I^4|}}.$$
Next, we lower bound $\|g\|_4$. Expanding as before,
$$\mathbb{E}_x\!\left[g(x)^4\right] = \sum_{\substack{I^1,\dots,I^4\\ J^1,\dots,J^4}}\left(\sqrt{p(1-p)}\right)^{4d}|a(I^1,J^1)|\cdots|a(I^4,J^4)|\,\mathbb{E}_x\!\left[\chi_{S(I^1,J^1)}(x)\cdots\chi_{S(I^4,J^4)}(x)\right].$$
A direct computation shows that the expectation of a normalized $p$-biased bit, i.e. of $\frac{x_{i,j}-p}{\sqrt{p(1-p)}}$, is $0$, the expectation of its square is $1$, the expectation of its third power is $\frac{1-o(1)}{\sqrt{p(1-p)}}$, and the expectation of its fourth power is $\frac{1-o(1)}{p(1-p)}$. This tells us that all summands in the above formula are non-negative, and therefore we can omit all those that correspond to $(I^1,\dots,I^4)$ and $(J^1,\dots,J^4)$ not from $H$, and only decrease the quantity. For $j=2,3,4$, denote by $h_j$ the number of elements that appear in $j$ of the $I^1,\dots,I^4$. Then we get that the inner term is at least
$$(1-o(1))\left(\sqrt{p(1-p)}\right)^{4d-h_3-2h_4}|a(I^1,J^1)|\cdots|a(I^4,J^4)|.$$
Noting that $2h_2+3h_3+4h_4=4d$, we get that $4d-h_3-2h_4 = 2(h_2+h_3+h_4) = 2|I^1\cup\dots\cup I^4|$. Combining everything, we get that
$$\mathbb{E}_x\!\left[g(x)^4\right] \ge (1-o(1))\sum_{\substack{I^1,\dots,I^4,\ J^1,\dots,J^4\\ \text{from } H}}(p(1-p))^{|I^1\cup\dots\cup I^4|}\,|a(I^1,J^1)|\cdots|a(I^4,J^4)| \ge (1-o(1))\sum_{\substack{I^1,\dots,I^4,\ J^1,\dots,J^4\\ \text{from } H}}\frac{|a(I^1,J^1)|\cdots|a(I^4,J^4)|}{n^{|I^1\cup\dots\cup I^4|}}. \tag{13}$$
Combining (12) and (13) shows that $\|f\|_4\le(1+o(1))\|g\|_4$.

6.5 Deducing hypercontractivity for low-degree functions

With Theorem 6.8 in hand, one may deduce the following inequality as an easy corollary.
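The single-bit moment computation used in the proof of Claim 6.10 above is easy to verify exactly. The sketch below is our addition: it checks, in rational arithmetic, the central moments of a Bernoulli($p$) bit underlying the claims $\mathbb{E}[y]=0$, $\mathbb{E}[y^2]=1$, $\mathbb{E}[y^3]=(1-2p)/\sqrt{p(1-p)}$ and $\mathbb{E}[y^4]=((1-p)^3+p^3)/(p(1-p))$ for the normalized bit $y=(x-p)/\sqrt{p(1-p)}$; to avoid square roots we compare against the closed forms multiplied by the appropriate power of $\sigma^2=p(1-p)$.

```python
from fractions import Fraction

def central_moment(p, k):
    # E[(x - p)^k] for x ~ Bernoulli(p), computed exactly.
    return p * (1 - p) ** k + (1 - p) * (-p) ** k

p = Fraction(1, 100)   # p = 1/n with n = 100
s2 = p * (1 - p)       # sigma^2 = p(1 - p)

assert central_moment(p, 1) == 0
assert central_moment(p, 2) == s2
# E[y^3] * sigma^3 = p(1-p)(1-2p), i.e. E[y^3] = (1-2p)/sqrt(p(1-p))
assert central_moment(p, 3) == s2 * (1 - 2 * p)
# E[y^4] * sigma^4 = p(1-p)((1-p)^3 + p^3), i.e. E[y^4] ~ (1-o(1))/(p(1-p))
assert central_moment(p, 4) == s2 * ((1 - p) ** 3 + p ** 3)
```

For $p=1/n$ these give third and fourth moments of order $\sqrt{n}$ and $n$ respectively, as used in the lower bound (13).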
Corollary 6.12.
There exists an absolute constant
$C>0$ such that for all $d,n\in\mathbb{N}$ for which $n\ge C\cdot d\log d$, the following holds. If $f\in V_d(S_n)$ is $\varepsilon$-analytically-global, then $\|f\|_4^4 \le C^{\,d\log d}\,\varepsilon^2$.

Proof. Since the proof is straightforward, we only outline its steps. Writing $f = f_0+\cdots+f_d$ for $f_k\in V_k$ given by normalizing relations, one bounds $\|f\|_4^4\le(d+1)^3\sum_{k=0}^{d}\|f_k\|_4^4$, uses Theorem 6.8 on each $f_k$, and finally uses $I_{I',J'}[f_k]\le I_{I',J'}[f]\le\varepsilon$.

Remark 6.13.
Using the same techniques, one may prove statements analogous to Theorem 6.8 and Corollary 6.12 for all even $q\in\mathbb{N}$.

The first application of our hypercontractive inequality is the following level-$d$ inequality.

Theorem 1.6 (Restated).
There exists an absolute constant
$C>0$ such that the following holds. Let $d,n\in\mathbb{N}$ and $\varepsilon>0$ be such that $n\ge C^{d}\log^{C\cdot d}(1/\varepsilon)$. If $f:S_n\to\{0,1\}$ is $(2d,\varepsilon)$-global, then $\|f^{=d}\|_2^2 \le C^{d}\,\varepsilon^2\log^{C\cdot d}(1/\varepsilon)$.

Proof. Deferred to Section 8.

This result is analogous to the level-$d$ inequality on the Boolean hypercube [24, Corollary 9.25]; however, it is quantitatively weaker because our dependence on $d$ is poorer. For instance, it remains meaningful only for $d\lesssim\sqrt{\log(1/\varepsilon)}$, whereas the original statement on the Boolean hypercube remains effective up to $d\sim\log(1/\varepsilon)$. Still, we show in Section 7.2 that this statement suffices to recover results regarding the size of the largest product-free sets in $S_n$. It would be interesting to prove a quantitatively better version of Theorem 1.6 in terms of $d$, and in particular to determine whether for $d = c\log(1/\varepsilon)$ it holds that $\|f^{=d}\|_2^2 = \varepsilon^{2-o(1)}$ for sufficiently small (but constant) $c>0$.

We remark that once Theorem 1.6 has been established (or, more precisely, the slightly stronger statement in Proposition 8.11), one can strengthen it at the expense of assuming that $n$ is larger, namely establish Theorem 1.7 from the introduction. We defer its proof to Section 8.8.

In this section we prove a strengthening of Theorem 1.8. Conceptually, the proof is very simple. Starting with Gowers' approach, we convert this problem into a problem about independent sets in a Cayley graph associated with $F$, and use a Hoffman-type bound to solve that problem.

Fix a global product-free set $F\subseteq A_n$, and construct the (directed) graph $G_F$ as follows. Its vertex set is $S_n$, and $(\pi,\sigma)$ is an edge if $\pi^{-1}\sigma\in F$. Note that $G_F$ is a Cayley graph, and that if $F$ is product-free, then $F$ is an independent set in $G_F$. Our plan is thus to (1) study the eigenvalues of $G_F$ and prove good upper bounds on them, and then (2) bound the size of $F$ using a Hoffman-type bound.

Let $T_F$ be the adjacency operator of $G_F$, i.e. the random walk that from a vertex $\pi$ transitions to a random neighbour $\sigma$ in $G_F$.
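For small $n$ the operator $T_F$ can be built explicitly, which is a convenient way to sanity-check the basic facts used below. The sketch that follows is our addition, with an arbitrary choice of $F\subseteq A_4$ (nothing here depends on $F$ being product-free): it verifies that the adjoint of $T_F$ with respect to the uniform measure is $T_{F^{-1}}$, and the trace identity $\mathrm{Tr}(T_F^* T_F) = n!/|F| = 1/\delta$ that powers the trace-method bound of Lemma 7.6.

```python
import itertools
import math
import numpy as np

n = 4
perms = list(itertools.permutations(range(n)))
index = {p: i for i, p in enumerate(perms)}

def compose(p, q):   # (p o q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(n))

def inverse(p):
    inv = [0] * n
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

def sign(p):
    s = 1
    for i in range(n):
        for j in range(i + 1, n):
            if p[i] > p[j]:
                s = -s
    return s

# An arbitrary generating set: all non-identity elements of A_4.
F = [p for p in perms if sign(p) == 1 and p != tuple(range(n))]

def transition(gens):
    # (T f)(pi) = E_{a in gens} f(pi a): row pi, column pi*a.
    M = np.zeros((len(perms), len(perms)))
    for i, p in enumerate(perms):
        for a in gens:
            M[i, index[compose(p, a)]] += 1 / len(gens)
    return M

M = transition(F)

# Adjoint w.r.t. the uniform measure is T_{F^{-1}}.
assert np.allclose(M.T, transition([inverse(a) for a in F]))
# Trace identity: Tr(T_F^* T_F) = n!/|F|.
assert np.isclose(np.trace(M.T @ M), math.factorial(n) / len(F))
```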
We may consider the action of $T_F$ on functions $f:S_n\to\mathbb{R}$ as
$$(T_F f)(\pi) = \mathbb{E}_{\sigma:(\pi,\sigma)\text{ is an edge}}[f(\sigma)] = \mathbb{E}_{a\in F}[f(\pi a)].$$
We will next study the eigenspaces and eigenvalues of $T_F$, and for that we need some basic facts regarding the representation theory of $S_n$. We will then study the fraction of edges between any two global sets $A,B$, and Theorem 1.8 will just be the special case that $A=B=F$. Throughout this section, we set $\delta = \frac{|F|}{|S_n|}$.

Representation theory of $S_n$

We will need some basic facts about the representation theory of $S_n$, and our exposition will follow standard textbooks, e.g. [13]. A partition of $[n]$, denoted by $\lambda\vdash n$, is a sequence of integers $\lambda=(\lambda_1,\dots,\lambda_k)$, where $\lambda_1\ge\lambda_2\ge\dots\ge\lambda_k\ge 1$ sum up to $n$. It is well known that partitions index equivalence classes of irreducible representations of $S_n$; thus we may associate with each partition $\lambda$ a character $\chi_\lambda:S_n\to\mathbb{C}$, which in the case of the symmetric group is real-valued. The dimension of $\lambda$ is $\dim(\lambda)=\chi_\lambda(e)$, where $e$ is the identity permutation.

Given a partition $\lambda$, a $\lambda$-tabloid is a partition of $[n]$ into sets $A_1,\dots,A_k$ such that $|A_i|=\lambda_i$. For $\lambda$-tabloids $A=(A_1,\dots,A_k)$ and $B=(B_1,\dots,B_k)$, we define $T_{A,B}=\{\pi\in S_n\mid \pi(A_i)=B_i\ \text{for all } i=1,\dots,k\}$, and refer to any such $T_{A,B}$ as a $\lambda$-coset.

With these notations, we may define the space $V_\lambda(S_n)$, which is the linear span of the indicator functions of all $\lambda$-cosets. We note that $V_\lambda(S_n)$ is clearly a left $S_n$-module, where the action of $S_n$ is given by $\pi f:S_n\to\mathbb{R}$, defined by $\pi f(\sigma)=f(\pi\sigma)$. Next, we need to define an ordering on partitions that will let us further refine the spaces $V_\lambda$.

Definition 7.1.
Let $\lambda=(\lambda_1,\dots,\lambda_k)$, $\mu=(\mu_1,\dots,\mu_s)$ be partitions of $[n]$. We say that $\lambda$ dominates $\mu$, and denote $\lambda\trianglerighteq\mu$, if for all $j$ it holds that $\sum_{i=1}^{j}\lambda_i\ge\sum_{i=1}^{j}\mu_i$.

With this definition, one may easily show that $V_\mu\subseteq V_\lambda$ whenever $\mu\trianglerighteq\lambda$, and furthermore that $V_\mu=V_\lambda$ if and only if $\mu=\lambda$. It thus makes sense to define the spaces
$$V_{=\lambda} = V_\lambda\cap\bigcap_{\mu\rhd\lambda}V_\mu^{\perp}.$$
The spaces $V_{=\lambda}$ are orthogonal and their direct sum is the space of all functions $f:S_n\to\mathbb{R}$, so we may write any function $f:S_n\to\mathbb{R}$ as $f=\sum_{\lambda\vdash n}f^{=\lambda}$ in a unique way.

Definition 7.2.
Let $\lambda=(\lambda_1,\dots,\lambda_k)$ be a partition of $n$. The transpose partition, $\lambda^t$, is $(\mu_1,\dots,\mu_{k'})$, where $k'=\lambda_1$ and $\mu_j=|\{i\mid\lambda_i\ge j\}|$.

Alternatively, if we think of a partition as represented by top-left-justified rows of boxes, then the transpose of a partition is obtained by reflecting the diagram across the main diagonal. For example, $(3,1)^t=(2,1,1)$.

Two partitions will be of special interest to us: $\lambda=(n)$ and its transpose $\lambda^t=(1^n)$. For $\lambda=(n)$, the space $V_{=\lambda}$ consists of the constant functions, and one has $\chi_\lambda=1$. Thus, $f^{=(n)}$ is just the average of $f$, i.e. $\mu(f)\stackrel{\text{def}}{=}\mathbb{E}_\pi[f(\pi)]$. For $\lambda=(1^n)$, the space $V_{=\lambda}$ consists of multiples of the sign function of permutations, $\mathrm{sign}:S_n\to\{-1,1\}$, and $\chi_\lambda=\mathrm{sign}$. One therefore has $f^{=\lambda}=\langle f,\mathrm{sign}\rangle\,\mathrm{sign}$.

For general partitions $\lambda$, it is well known that the dimensions of $\lambda$ and $\lambda^t$ are equal, and one has that $\chi_{\lambda^t}=\mathrm{sign}\cdot\chi_\lambda$. We will need the following statement, which generalizes this correspondence to $f^{=\lambda}$ and $f^{=\lambda^t}$.

Lemma 7.3.
Let $f:S_n\to\mathbb{R}$, and let $\lambda\vdash n$. Then $(f\cdot\mathrm{sign})^{=\lambda}=f^{=\lambda^t}\cdot\mathrm{sign}$.

Proof. The statement follows directly from the inversion formula for $f^{=\lambda}$, which states that
$$f^{=\lambda}(\pi) = \dim(\lambda)\,\mathbb{E}_{\sigma\in S_n}\!\left[f(\sigma)\chi_\lambda(\pi\sigma^{-1})\right].$$
By a change of variables, we see that
$$(f\cdot\mathrm{sign})^{=\lambda}(\pi) = \dim(\lambda)\,\mathbb{E}_{\sigma\in S_n}\!\left[f(\sigma^{-1}\pi)\,\mathrm{sign}(\sigma^{-1}\pi)\,\chi_\lambda(\sigma)\right] = \mathrm{sign}(\pi)\dim(\lambda)\,\mathbb{E}_{\sigma\in S_n}\!\left[f(\sigma^{-1}\pi)\,\mathrm{sign}(\sigma)\,\chi_\lambda(\sigma)\right],$$
where we used the facts that $\mathrm{sign}$ is multiplicative and $\mathrm{sign}(\sigma^{-1})=\mathrm{sign}(\sigma)$. Now, as $\mathrm{sign}(\sigma)\chi_\lambda(\sigma)=\chi_{\lambda^t}(\sigma)$, we get by changing variables again that
$$(f\cdot\mathrm{sign})^{=\lambda}(\pi) = \mathrm{sign}(\pi)\dim(\lambda)\,\mathbb{E}_{\sigma\in S_n}\!\left[f(\sigma)\chi_{\lambda^t}(\pi\sigma^{-1})\right] = \mathrm{sign}(\pi)\dim(\lambda^t)\,\mathbb{E}_{\sigma\in S_n}\!\left[f(\sigma)\chi_{\lambda^t}(\pi\sigma^{-1})\right],$$
which is equal to $\mathrm{sign}(\pi)\,f^{=\lambda^t}(\pi)$ by the inversion formula.

Lastly, we remark that if $\lambda$ is a partition such that $\lambda_1=n-k$, then $V_{=\lambda}\subseteq V_k$. It follows by Parseval that
$$\sum_{\substack{\lambda\vdash n\\ \lambda_1=n-k}}\left\|f^{=\lambda}\right\|_2^2 \le \left\|f^{=k}\right\|_2^2. \tag{14}$$

The operator $T_F^* T_F$

Claim 7.4.
For all $\lambda\vdash n$ we have that $T_F V_{=\lambda}\subseteq V_{=\lambda}$; the same holds for $T_F^*$.

Proof. First, we show that $T_F V_\lambda\subseteq V_\lambda$, and for that it is enough to show that $T_F\mathbf{1}_{T_{A,B}}\in V_\lambda$ for all $\lambda$-tabloids $A=(A_1,\dots,A_k)$ and $B=(B_1,\dots,B_k)$. Fix $a\in F$, and note that $\mathbf{1}_{T_{A,B}}(\sigma a)=\mathbf{1}_{T_{a(A),B}}(\sigma)$, where $a(A)=(a(A_1),\dots,a(A_k))$; so $\mathbf{1}_{T_{A,B}}(\sigma a)$, as a function of $\sigma$, is also an indicator of a $\lambda$-coset. Since $T_F\mathbf{1}_{T_{A,B}}$ is a linear combination of such functions, it follows that $T_F\mathbf{1}_{T_{A,B}}\in V_\lambda$. A similar argument shows that the same holds for the adjoint operator $T_F^*=T_{F^{-1}}$, where $F^{-1}=\{a^{-1}\mid a\in F\}$.

Thus, for $f\in V_{=\lambda}$ we automatically have that $T_F f\in V_\lambda$, and we next show orthogonality to $V_\mu$ for all $\mu\rhd\lambda$. Indeed, let $\mu$ be such a partition and let $g\in V_\mu$; then by the above $T_F^* g\in V_\mu$, and so $\langle T_F f, g\rangle = \langle f, T_F^* g\rangle = 0$, and the proof is complete. The argument for $T_F^*$ is analogous.

Thus, we may find a basis of each $V_{=\lambda}$ consisting of eigenvectors of $T_F^* T_F$. The following claim shows that the multiplicity of each corresponding eigenvalue is at least $\dim(\lambda)$.

Claim 7.5.
Let $f\in V_{=\lambda}(S_n)$ be non-zero. Then $\dim(\mathrm{Span}(\{\pi f\}_{\pi\in S_n}))\ge\dim(\lambda)$.

Proof. Let $\rho$ be the representation of $S_n$ on $V_{=\lambda}$, and denote by $W$ the span of $\{\pi f\}_{\pi\in S_n}$. Note that $W$ is a subspace of $V_{=\lambda}$, and it holds that $(\rho|_W, W)$ is a sub-representation of $\rho$. Since each irreducible representation $V\subseteq V_{=\lambda}$ of $S_n$ has dimension $\dim(\lambda)$, it follows that $\dim(W)\ge\dim(\lambda)$, and we are done.

We can thus use the trace method to bound the magnitude of each eigenvalue.

Lemma 7.6.
Let $f\in V_{=\lambda}$ be an eigenvector of $T_F^* T_F$ with eigenvalue $\alpha_\lambda$. Then $\alpha_\lambda\le\frac{1}{\dim(\lambda)\,\delta}$.

Proof.
By Claim 7.5, we may find a collection of $\dim(\lambda)$ permutations, call it $\Pi$, such that $\{\pi f\}_{\pi\in\Pi}$ is linearly independent. Since $f$ is an eigenvector of $T_F^* T_F$, it follows that each one of the $\pi f$ is an eigenvector with eigenvalue $\alpha_\lambda$. It follows that $\mathrm{Tr}(T_F^* T_F)\ge|\Pi|\,\alpha_\lambda=\dim(\lambda)\,\alpha_\lambda$.

On the other hand, interpreting $\mathrm{Tr}(T_F^* T_F)$ probabilistically as the probability to return to the starting vertex in $2$ steps, summed over all vertices,
$$\mathrm{Tr}(T_F^* T_F) = \sum_{\pi}\Pr_{a_1\in F^{-1},\,a_2\in F}[\pi = \pi a_1 a_2] = n!\,\Pr_{a_1\in F^{-1},\,a_2\in F}\!\left[a_2 = a_1^{-1}\right] = n!\cdot\frac{1}{|F|} = \frac{1}{\delta}.$$
Combining the two bounds on $\mathrm{Tr}(T_F^* T_F)$ completes the proof.

To use this lemma effectively, we have the following bound on $\dim(\lambda)$ that follows from the hook length formula.

Lemma 7.7 (Claim 1, Theorem 19 in [7]). Let $\lambda\vdash n$ be given as $\lambda=(\lambda_1,\dots,\lambda_k)$, and denote $d=\min(n-\lambda_1, k)$.
1. If $\lambda=(n)$, then $\dim(\lambda)=1$.
2. If $d\ge 1$, then $\dim(\lambda)\ge\left(\frac{n}{d\cdot e}\right)^{d}$.
3. If $d\ge n/4$, then $\dim(\lambda)\ge c^n$ for an absolute constant $c>1$.

With the information we have gathered regarding the representation theory of $S_n$ and the eigenvalues of $T_F$, we can use the spectral method to prove lower bounds on $\langle T_F g, h\rangle$ for Boolean functions $g,h$ that are global, as in the following lemma.

Lemma 7.8.
There exists
$C>0$ such that the following holds. Let $n\in\mathbb{N}$ and $\varepsilon>0$ be such that $n\ge\log^{C}(1/\varepsilon)$, and suppose that $g,h:A_n\to\{0,1\}$ are $(6,\varepsilon)$-global. Then
$$\langle T_F g, h\rangle \ge \frac{\mathbb{E}[g]\,\mathbb{E}[h]}{4} - \frac{C\,\varepsilon^2\log^{C}(1/\varepsilon)}{\sqrt{n\delta}} - \frac{C}{\sqrt{n^3\delta}}\sqrt{\mathbb{E}[g]\,\mathbb{E}[h]}.$$

Proof.
Extend $g,h$ to $S_n$ by defining them to be $0$ outside $A_n$. Recall that $T_F$ preserves each $V_{=\lambda}$. Decomposing $g=\sum_{\lambda\vdash n}g^{=\lambda}$, where $g^{=\lambda}\in V_{=\lambda}$, and $h$ similarly, we have by Plancherel that $\langle T_F g,h\rangle=\sum_{\lambda}\langle T_F g^{=\lambda}, h^{=\lambda}\rangle$. For the trivial partition $\lambda=(n)$ we have that $g^{=\lambda}\equiv\mu(g)=\mathbb{E}[g]/2$ and $h^{=\lambda}\equiv\mu(h)=\mathbb{E}[h]/2$. For $\lambda=(1^n)$, since $F\subseteq A_n$ it follows that $T_F\,\mathrm{sign}=\mathrm{sign}$, and so $T_F g^{=\lambda}=\beta_\lambda\,\mathrm{sign}$ and $h^{=\lambda}=\gamma_\lambda\,\mathrm{sign}$ for $\beta_\lambda,\gamma_\lambda\ge 0$, so the term corresponding to this $\lambda$ in the above is non-negative. Thus, denoting $\lambda=(\lambda_1,\dots,\lambda_k)$, we have that
$$\langle T_F g, h\rangle \ge \mu(g)\mu(h) - \sum_{\substack{\lambda\vdash n,\ \lambda\neq(n),(1^n)\\ \lambda_1\ge n-3\ \text{or}\ k\ge n-3}}\left\|T_F g^{=\lambda}\right\|_2\left\|h^{=\lambda}\right\|_2 - \sum_{\substack{\lambda\neq(n),(1^n)\\ \lambda_1\le n-3\ \text{and}\ k\le n-3}}\left\|T_F g^{=\lambda}\right\|_2\left\|h^{=\lambda}\right\|_2. \tag{15}$$

We upper-bound the second and third terms on the right-hand side, from which the lemma follows. We begin with the second term, and handle separately $\lambda$'s such that $\lambda_1\ge n-3$, and $\lambda$'s such that $k\ge n-3$.

$\lambda$'s such that $\lambda\neq(n),(1^n)$ and $\lambda_1\ge n-3$. We first upper bound $\|T_F g^{=\lambda}\|_2$. As $T_F^* T_F$ preserves each space $V_{=\lambda}$ and is symmetric, we may write this space as a direct sum of eigenspaces of $T_F^* T_F$, say $\bigoplus_\theta V_{=\lambda}^{\theta}$. Writing $g^{=\lambda}=\sum_\theta g^{=\lambda,\theta}$, where $g^{=\lambda,\theta}\in V_{=\lambda}^{\theta}$, we have that
$$\left\|T_F g^{=\lambda}\right\|_2^2 = \langle g^{=\lambda}, T_F^* T_F g^{=\lambda}\rangle = \sum_\theta\langle g^{=\lambda,\theta}, T_F^* T_F g^{=\lambda,\theta}\rangle = \sum_\theta\theta\left\|g^{=\lambda,\theta}\right\|_2^2.$$
By Lemma 7.6 we have $\theta\le\frac{1}{\dim(\lambda)\,\delta}$, which by Lemma 7.7 is at most $O\!\left(\frac{1}{n\delta}\right)$. We thus get that
$$\left\|T_F g^{=\lambda}\right\|_2^2 \le O\!\left(\frac{1}{n\delta}\right)\sum_\theta\left\|g^{=\lambda,\theta}\right\|_2^2 \le O\!\left(\frac{1}{n\delta}\right)\left\|g^{=\lambda}\right\|_2^2.$$
Plugging this into the second sum in (15), we get that the contribution from $\lambda$'s such that $\lambda_1\ge n-3$ is at most
$$O\!\left(\frac{1}{\sqrt{n\delta}}\right)\sum_{\substack{\lambda\vdash n,\ \lambda\neq(n),(1^n)\\ \lambda_1\ge n-3}}\left\|g^{=\lambda}\right\|_2\left\|h^{=\lambda}\right\|_2 \le O\!\left(\frac{1}{\sqrt{n\delta}}\right)\left\|g^{\le 3}\right\|_2\left\|h^{\le 3}\right\|_2,$$
where we used Cauchy–Schwarz and (14). By Theorem 1.6, $\|g^{\le 3}\|_2^2,\|h^{\le 3}\|_2^2\le C\,\varepsilon^2\log^{C}(1/\varepsilon)$ for some absolute constant $C$. We thus get that
$$\sum_{\substack{\lambda\vdash n,\ \lambda\neq(n),(1^n)\\ \lambda_1\ge n-3}}\left\|T_F g^{=\lambda}\right\|_2\left\|h^{=\lambda}\right\|_2 \le \frac{C'\,\varepsilon^2\log^{C}(1/\varepsilon)}{\sqrt{n\delta}}.$$

$\lambda$'s such that $k\ge n-3$. The treatment here is nearly identical to the previous case, except that we look at the functions $\tilde g=g\cdot\mathrm{sign}$ and $\tilde h=h\cdot\mathrm{sign}$. First note that the globalness of $g,h$ implies that $\tilde g,\tilde h$ are also global with the same parameters, and since $g,h$ are Boolean, $\tilde g,\tilde h$ are integer-valued. Moreover, by Lemma 7.3 we have that
$$\sum_{\substack{\lambda\vdash n,\ \lambda\neq(n),(1^n)\\ k\ge n-3}}\left\|T_F g^{=\lambda}\right\|_2\left\|h^{=\lambda}\right\|_2 = \sum_{\substack{\lambda\vdash n,\ \lambda\neq(n),(1^n)\\ k\ge n-3}}\left\|T_F\,\tilde g^{=\lambda^t}\right\|_2\left\|\tilde h^{=\lambda^t}\right\|_2 = \sum_{\substack{\lambda\vdash n,\ \lambda\neq(n),(1^n)\\ \lambda_1\ge n-3}}\left\|T_F\,\tilde g^{=\lambda}\right\|_2\left\|\tilde h^{=\lambda}\right\|_2,$$
and from here the argument is identical.

Bounding the third term in (15). Repeating the eigenspace argument from above, for all $\lambda\vdash n$ such that $\lambda_1\le n-3$ and $k\le n-3$ we have
$$\left\|T_F g^{=\lambda}\right\|_2 \le O\!\left(\frac{1}{\sqrt{n^3\delta}}\right)\left\|g^{=\lambda}\right\|_2.$$
Thus, the third sum in (15) is at most
$$O\!\left(\frac{1}{\sqrt{n^3\delta}}\right)\sum_{\lambda\vdash n}\left\|g^{=\lambda}\right\|_2\left\|h^{=\lambda}\right\|_2 \le O\!\left(\frac{1}{\sqrt{n^3\delta}}\right)\|g\|_2\|h\|_2,$$
where we used Cauchy–Schwarz and Parseval.

We can now prove the strengthening of Theorem 1.8, stated below.

Corollary 7.9.
There exists $K\in\mathbb{N}$ such that the following holds for all $\varepsilon>0$ and $n\ge\log^{K}(1/\varepsilon)$. If $A,B\subseteq A_n$ are $(6,\varepsilon)$-global, and
$$\mu(A)\mu(B) \ge K\max\!\left(n^{-3}\delta^{-1},\ (n\delta)^{-1/2}\,\varepsilon^2\log^{K}(1/\varepsilon)\right),$$
then $\langle T_F\mathbf{1}_A,\mathbf{1}_B\rangle \ge \frac{\mu(A)\mu(B)}{2}$.

Proof.
Taking $g=\mathbf{1}_A$, $h=\mathbf{1}_B$, by Lemma 7.8 we have
$$\langle T_F g, h\rangle \ge \mu(A)\mu(B) - \frac{C'\,\varepsilon^2\log^{C'}(1/\varepsilon)}{\sqrt{n\delta}} - \frac{C'}{\sqrt{n^3\delta}}\sqrt{\mu(A)\mu(B)},$$
where $C'$ is an absolute constant. Now the conditions on the parameters imply that each of the last two terms is at most $\mu(A)\mu(B)/4$, so the first term dominates the other two.

We note that Theorem 1.8 immediately follows: there one has $g=h=\mathbf{1}_F$ and $\langle T_F g,h\rangle=0$, so the condition on the parameters must fail, and therefore the lower bound on $\mu(A)\mu(B)$ (which in this case is just $\delta^2$) fails; plugging in $\varepsilon=C\sqrt{\delta}$ and rearranging finishes the proof.

We remark that it is within reason to expect that global, product-free families in $A_n$ must in fact be much smaller. More precisely, one may expect that for all $t\in\mathbb{N}$ there is $j\in\mathbb{N}$ such that for $n\ge n_0(t)$, if $F$ is $(j,O(\sqrt{\delta}))$-global (where $\delta=|F|/|S_n|$), then $\delta\le O_t(n^{-t})$. The bottleneck in our approach comes from the use of the trace method (which does not use the globalness of $F$ at all), and the bounds it gives on the eigenvalues of $T_F^* T_F$ corresponding to low-degree functions: they become meaningless as soon as $\delta\ge 1/n$.

Inspecting the above proof, our approach only requires a super-logarithmic upper bound on the eigenvalues to go through. More precisely, we need that the first few non-trivial eigenvalues of $T_F^* T_F$ are at most $(\log n)^{-K(t)}$, for sufficiently large $K(t)$. We feel that something like this should follow in greater generality from the fact that the set of generators in the Cayley graph, namely $F$, is global. To support this, note that if we were dealing with Abelian groups, then the eigenvalue $\alpha$ of $T_F$ corresponding to a character $\chi$ could be computed as $\alpha=\frac{1}{|F|}\sum_{a\in F}\chi(a)$, which by rewriting is nothing but a normalized Fourier coefficient of $\mathbf{1}_F$, i.e. $\delta^{-1}\widehat{\mathbf{1}_F}(\chi)$, which we expect to be small by the globalness of $F$.

In this section, we consider the operator $\mathrm{T}$, which is the adjacency operator of the transpositions graph.
That is, it is the transition matrix of the (left) Cayley graph $(S_n, A)$, where $A$ is the set of transpositions (and the multiplication happens from the left). We show that for a global set $S$, starting a walk from a vertex in $S$ and performing $\approx cn$ steps according to $\mathrm{T}$ escapes $S$ with probability close to $1$.

Poisson process random walk.
To be more precise, we consider the following random walk: from a permutation $\pi\in S_n$, choose a number $k\sim\mathrm{Poisson}(t)$, take $\tau$ which is a product of $k$ random transpositions, and go to $\sigma=\tau\circ\pi$. We show that starting with a random $\pi\in S$, the probability that we escape $S$, i.e. that $\sigma\notin S$, is close to $1$.

To prove this result, we first note that the distribution of an outgoing neighbour of $\pi$ is exactly $e^{-t(I-\mathrm{T})}\mathbf{1}_\pi$, where $\mathbf{1}_\pi$ is the indicator vector of $\pi$. Therefore, the distribution of $\sigma$, where $\pi\in S$ is random, is $e^{-t(I-\mathrm{T})}\frac{\mathbf{1}_S}{|S|}$, where $\mathbf{1}_S$ is the indicator vector of $S$. Thus, the probability that $\sigma$ is in $S$ (i.e. the probability of the complement event) is
$$\frac{1}{\mu(S)}\left\langle\mathbf{1}_S,\ e^{-t(I-\mathrm{T})}\mathbf{1}_S\right\rangle,$$
where $\mu(S)$ is the measure of $S$. We upper-bound this quantity using spectral considerations. We will only need our hypercontractive inequality and basic knowledge of the eigenvalues of $\mathrm{T}$, which can be found, for example, in [10, Corollary 21]. This is the content of the first three items in the lemma below (we also prove a fourth item, which will be useful for us later on).

Lemma 7.10.
Let $\lambda\in\mathbb{R}$ be an eigenvalue of $T$, and let $f\in V_{=d}(S_n)$ be a corresponding eigenvector.
1. $T\,V_{=d}(S_n) \subseteq V_{=d}(S_n)$.
2. $1 - \frac{2d}{n-1} \le \lambda \le 1 - \frac{d}{n-1}$.
3. If $d \le n/2$, then we have the stronger bound $1 - \frac{2d}{n-1} \le \lambda \le 1 - \left(1 - \frac{d-1}{n}\right)\frac{2d}{n-1}$.
4. If $L$ is a Laplacian of order $1$, then $L$ and $T$ commute. Thus, $T$ commutes with all Laplacians.

Proof. For the first item, we first note that $T$ commutes with the right action of $S_n$ on functions:
$$(T(f^\pi))(\sigma) = \mathbb{E}_{\pi'\text{ a transposition}}\left[f^\pi(\pi'\circ\sigma)\right] = \mathbb{E}_{\pi'\text{ a transposition}}\left[f(\pi'\circ\sigma\circ\pi)\right] = Tf(\sigma\circ\pi) = (Tf)^\pi(\sigma).$$
Also, $T$ is self-adjoint, so $T^*$ also commutes with the action of $S_n$. The first item now follows as in the proof of Claim 7.4. The second and third items are exactly [10, Corollary 21]. For the last item, for any function $f$ and any order-$1$ Laplacian $L = L_{(i,j)}$,
$$TLf = T\left(f - f^{(i,j)}\right) = Tf - T\left(f^{(i,j)}\right) = Tf - (Tf)^{(i,j)} = L(Tf),$$
where in the third transition we used the fact that $T$ commutes with the right action of $S_n$.

We remark that the first item above implies that we may find a basis of the space of real-valued functions consisting of eigenvectors of $T$, where each basis function lies in $V_{=d}(S_n)$ for some $d$. Lastly, we need the following (straightforward) fact.

Fact 7.11. If $f\in V_{=d}(S_n)$ is an eigenvector of $T$ with eigenvalue $\lambda$, then $f$ is an eigenvector of $\mathrm{e}^{-t(I-T)}$ with eigenvalue $\mathrm{e}^{-t(1-\lambda)}$.

Theorem 7.12.
There exists $C>0$ such that the following holds for all $d\in\mathbb{N}$, $t,\varepsilon>0$ and $n\in\mathbb{N}$ such that $n \ge 2^{C\cdot d}\log^{C\cdot d}(1/\varepsilon)$. If $S\subseteq S_n$ is a set of vertices such that $1_S$ is $(2d,\varepsilon)$-global, then
$$\Pr_{\substack{\pi\in S\\ \sigma\sim \mathrm{e}^{-t(I-T)}\pi}}\left[\sigma\notin S\right] \ \ge\ 1 - \left(2^{C\cdot d}\,\varepsilon\log^{C\cdot d}(1/\varepsilon) + \mathrm{e}^{-\frac{(d+1)t}{n-1}}\right).$$

Proof. Consider the complement event that $\sigma\in S$, and note that its probability can be written analytically as $\frac{1}{\mu(S)}\langle 1_S, \mathrm{e}^{-t(I-T)}1_S\rangle$, where $\mu(S)$ is the measure of $S$. Writing $f = 1_S$ and expanding $f = f^{=0} + f^{=1} + \cdots$, we consider each $\mathrm{e}^{-t(I-T)}f^{=j}$ separately. We claim that
$$\left\|\mathrm{e}^{-t(I-T)}f^{=j}\right\|_2 \le \mathrm{e}^{-\frac{jt}{n-1}}\left\|f^{=j}\right\|_2. \qquad (16)$$
Indeed, note that we may write $f^{=j} = \sum_r f_{j,r}$, where the $f_{j,r}\in V_{=j}(S_n)$ are orthogonal eigenvectors of $T$ with eigenvalues $\lambda_{j,r}$, and so by Fact 7.11, $\mathrm{e}^{-t(I-T)}f^{=j} = \sum_r \mathrm{e}^{-t(1-\lambda_{j,r})}f_{j,r}$. By Parseval we deduce that
$$\left\|\mathrm{e}^{-t(I-T)}f^{=j}\right\|_2^2 = \sum_r \mathrm{e}^{-2t(1-\lambda_{j,r})}\|f_{j,r}\|_2^2 \le \max_r \mathrm{e}^{-2t(1-\lambda_{j,r})}\sum_r\|f_{j,r}\|_2^2 = \max_r \mathrm{e}^{-2t(1-\lambda_{j,r})}\left\|f^{=j}\right\|_2^2.$$
Inequality (16) now follows from the second item of Lemma 7.10. We now expand out the expression we have for the probability of the complement event using Plancherel:
$$\frac{1}{\mu(S)}\langle 1_S, \mathrm{e}^{-t(I-T)}1_S\rangle = \frac{1}{\mu(S)}\sum_j \langle f^{=j}, \mathrm{e}^{-t(I-T)}f^{=j}\rangle \le \frac{1}{\mu(S)}\sum_j \left\|f^{=j}\right\|_2\left\|\mathrm{e}^{-t(I-T)}f^{=j}\right\|_2 \le \frac{1}{\mu(S)}\sum_j \mathrm{e}^{-\frac{jt}{n-1}}\left\|f^{=j}\right\|_2^2, \qquad (17)$$
where in the last two transitions we used Cauchy–Schwarz and inequality (16).
Lastly, we bound $\|f^{=j}\|_2^2$. For $j > d$ we have $\sum_{j>d}\|f^{=j}\|_2^2 \le \mu(S)$ by Parseval, and for $j \le d$ we use hypercontractivity. First, bound $\|f^{=j}\|_2 \le 2\|f^{\le j}\|_2$, and note that the function $f^{\le j}$ is $(2j, 2^{O(j)}\varepsilon\log^{O(j)}(1/\varepsilon))$-global by Claim A.1. Thus, taking $q = 2\log(1/\varepsilon)$ and using Hölder's inequality and Theorem 1.4, we get that
$$\left\|f^{\le j}\right\|_2^2 = \langle f, f^{\le j}\rangle \le \|f\|_{q/(q-1)}\left\|f^{\le j}\right\|_q \le \mu(S)^{1-1/q}\cdot 2^{O(j)}\sqrt{q^{O(j)}\,\varepsilon\log^{O(j)}(1/\varepsilon)}\,\left\|f^{\le j}\right\|_2^{2/q}.$$
Rearranging gives $\|f^{\le j}\|_2^2 \le 2^{O(j)}\mu(S)\,\varepsilon\log^{O(j)}(1/\varepsilon)$. Plugging our estimates into (17), we get
$$\frac{1}{\mu(S)}\langle 1_S, \mathrm{e}^{-t(I-T)}1_S\rangle \le \sum_{j=0}^{d} 2^{O(j)}\mathrm{e}^{-\frac{jt}{n-1}}\varepsilon\log^{O(j)}(1/\varepsilon) + \mathrm{e}^{-\frac{(d+1)t}{n-1}} \le 2^{O(d)}\varepsilon\log^{O(d)}(1/\varepsilon) + \mathrm{e}^{-\frac{(d+1)t}{n-1}}.$$

Using exactly the same technique, one can prove a lower bound on the probability of escaping a global set in a single step, as stated below. This result is similar in spirit to a variant of the KKL theorem over the Boolean hypercube [15], and we therefore modify the formulation slightly. Given a function $f\colon S_n\to\mathbb{R}$, we define the influence of coordinate $i\in[n]$ to be $I_i[f] = \mathbb{E}_{j\ne i}\left[\|L_{(i,j)}f\|_2^2\right]$, and define the total influence of $f$ to be $I[f] = I_1[f] + \cdots + I_n[f]$.

Theorem 7.13. There exists $C>0$ such that the following holds for all $d\in\mathbb{N}$ and $n\in\mathbb{N}$ such that $n \ge 2^{C\cdot d}$. Suppose $S\subseteq S_n$ is such that for all derivative operators $D\ne I$ of order at most $d$, it holds that $\|D1_S\|_2^2 \le 2^{-C\cdot d}$. Then $I[1_S] \ge d\cdot\mathrm{var}(1_S)$.

Proof.
Deferred to Appendix A.
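As a sanity check on the eigenvalue bounds of Lemma 7.10, one can verify by brute force that the centered degree-$1$ function $f(\pi) = 1[\pi(1)=1] - 1/n$ is an exact eigenvector of the transposition operator $T$. The following sketch (ours, not from the paper) does this for $n = 4$ with exact rational arithmetic; the eigenvalue comes out to $1 - \tfrac{2}{n-1}$, which sits inside the range stated in the lemma.

```python
from fractions import Fraction
from itertools import combinations, permutations

n = 4
perms = list(permutations(range(n)))
transpositions = list(combinations(range(n), 2))

def T(f):
    # transposition operator: average of f over all left-multiplications by a transposition
    def Tf(pi):
        total = Fraction(0)
        for a, b in transpositions:
            tau_pi = tuple(b if v == a else a if v == b else v for v in pi)
            total += f(tau_pi)
        return total / len(transpositions)
    return Tf

# centered degree-1 "dictator": indicator that pi maps the first point to itself, minus its mean
f = lambda pi: Fraction(int(pi[0] == 0)) - Fraction(1, n)
Tf = T(f)
lam = 1 - Fraction(2, n - 1)  # observed eigenvalue at degree 1
assert all(Tf(pi) == lam * f(pi) for pi in perms)
assert 1 - Fraction(2, n - 1) <= lam <= 1 - Fraction(1, n - 1)  # consistent with Lemma 7.10(2), d = 1
```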
Our hypercontractive inequalities also imply similar hypercontractive inequalities on other non-product domains. One example from [3] is the domain of $2$-to-$1$ maps, i.e. $\left\{\pi\colon[2n]\to[n] \ \middle|\ |\pi^{-1}(i)| = 2\ \forall i\in[n]\right\}$. A more general domain, which we consider below, is the multi-slice.

Definition 7.14.
Let $m,n\in\mathbb{N}$ be such that $n \ge m$, and let $k_1,\ldots,k_m\in\mathbb{N}$ sum up to $n$. The multi-slice $U_{k_1,\ldots,k_m}$ of dimension $n$ consists of all vectors $x\in[m]^n$ that, for each $j\in[m]$, have exactly $k_j$ of their coordinates equal to $j$. We consider the multi-slice as a probability space with the uniform measure.

In exactly the same way one defines the degree decomposition over $S_n$, one may consider the degree decomposition over the multi-slice. A function $f\colon U_{k_1,\ldots,k_m}\to\mathbb{R}$ is said to be a $d$-junta if there are $A\subseteq[n]$ of size at most $d$ and $g\colon[m]^A\to\mathbb{R}$ such that $f(x) = g(x_A)$. We then define the space $V_{\le d}(U_{k_1,\ldots,k_m})$ spanned by $d$-juntas. One may also analogously define globalness of functions over the multi-slice. A $d$-restriction consists of a set $A\subseteq[n]$ of size $d$ and $\alpha\in[m]^A$, and the corresponding restriction is the function $f_{A\to\alpha}(z) = f(x_A = \alpha, x_{\bar A} = z)$ (whose domain is a different multi-slice).

Definition 7.15. We say $f\colon U_{k_1,\ldots,k_m}\to\mathbb{R}$ is $(d,\varepsilon)$-global if for any $d$-restriction $(A,\alpha)$ it holds that $\|f_{A\to\alpha}\|_2^2 \le \varepsilon$.

Our hypercontractive inequality for the multi-slice reads as follows.
Theorem 7.16.
There exists an absolute constant $C>0$ such that the following holds. Let $d,q,n\in\mathbb{N}$ be such that $n \ge q^{C\cdot d}$, and let $f\in V_{\le d}(U_{k_1,\ldots,k_m})$. If $f$ is $(2d,\varepsilon)$-global, then
$$\|f\|_q^q \ \le\ q^{O(dq)}\,\varepsilon^{\frac{q}{2}-1}\,\|f\|_2^2.$$

Proof.
We construct a simple deterministic coupling $C$ between $S_n$ and $U_{k_1,\ldots,k_m}$. Fix a partition of $[n]$ into sets $K_1,\ldots,K_m$ such that $|K_j| = k_j$ for all $j$. Given a permutation $\pi$, we define $C(\pi) = x$ as follows: for all $i\in[n]$ and $j\in[m]$, we set $x_i = j$ if $\pi(i)\in K_j$. Define the mapping $M\colon L^2(U_{k_1,\ldots,k_m})\to L^2(S_n)$ that maps a function $h\colon U_{k_1,\ldots,k_m}\to\mathbb{R}$ to $Mh\colon S_n\to\mathbb{R}$ defined by $(Mh)(\pi) = h(C(\pi))$.

Let $g = Mf$. We claim that $g$ has degree at most $d$ and is global. To see that $g\in V_{\le d}(S_n)$, it is enough to note that the mapping $f\mapsto g$ is linear (which is clear) and maps a $d$-junta to a $d$-junta (which is also straightforward). To see that $g$ is global, let $T = \{(i_1,r_1),\ldots,(i_\ell,r_\ell)\}$ be consistent, and define the $\ell$-restriction $(A,\alpha)$ by $A = \{i_1,\ldots,i_\ell\}$ and $\alpha_{i_s} = j$ if $r_s\in K_j$. Note that the distribution of $x\in U_{k_1,\ldots,k_m}$ conditioned on $x_A = \alpha$ is exactly that of $C(\pi)$ conditioned on $\pi$ respecting $T$, so if $\ell \le 2d$ we get that
$$\|g_T\|_2^2 = \|f_{A\to\alpha}\|_2^2 \le \varepsilon,$$
so $g$ is $(2d,\varepsilon)$-global. The result thus follows from Theorem 1.4 and the fact that $M$ preserves $L_p$ norms for all $p > 0$.

The coupling in the proof of Theorem 7.16 also implies, in the same way, a level-$d$ inequality over $U_{k_1,\ldots,k_m}$ from the corresponding result in $S_n$, Theorem 1.6, as well as isoperimetric inequalities, as we describe next.

Level-$d$ inequality. As on $S_n$, for $f\colon U_{k_1,\ldots,k_m}\to\mathbb{R}$ we let $f^{\le d}$ be the projection of $f$ onto $V_{\le d}(U_{k_1,\ldots,k_m})$. Our level-$d$ inequality for the multi-slice thus reads:

Corollary 7.17.
There exists an absolute constant $C>0$ such that the following holds. Let $d,n\in\mathbb{N}$ and $\varepsilon>0$ be such that $n \ge (Cd\log(1/\varepsilon))^{Cd}$. If $f\colon U_{k_1,\ldots,k_m}\to\{0,1\}$ is $(2d,\varepsilon)$-global, then $\|f^{\le d}\|_2^2 \le 2^{C\cdot d}\varepsilon^2\log^{C\cdot d}(1/\varepsilon)$.

Proof. The proof relies on an additional easy property of the mapping $M$ from the proof of Theorem 7.16. As in $S_n$, we define the space of pure degree-$d$ functions over $U_{k_1,\ldots,k_m}$ as
$$V_{=d}(U_{k_1,\ldots,k_m}) = V_{\le d}(U_{k_1,\ldots,k_m})\cap V_{\le d-1}(U_{k_1,\ldots,k_m})^{\perp},$$
and let $f^{=d}$ be the projection of $f$ onto $V_{=d}(U_{k_1,\ldots,k_m})$. We thus have $f^{\le d} = f^{=0} + f^{=1} + \cdots + f^{=d}$, and so $f^{=d} = f^{\le d} - f^{\le d-1}$.

Write $h_i = Mf^{=i}$, and note that $h_i$ has degree at most $i$. Also, since restrictions of size $r < i$ over $S_n$ are mapped to restrictions of size $r$ over $U_{k_1,\ldots,k_m}$, it follows that $h_i$ is perpendicular to degree-$(i-1)$ functions, and so $h_i\in V_{=i}(S_n)$. By linearity of $M$, $Mf = h_0 + h_1 + \cdots + h_n$, and by uniqueness of the pure degree decomposition it follows that $h_i = (Mf)^{=i}$. We therefore have
$$\left\|f^{\le d}\right\|_2^2 = \sum_{i\le d}\left\|f^{=i}\right\|_2^2 = \sum_{i\le d}\|h_i\|_2^2 = \sum_{i\le d}\left\|(Mf)^{=i}\right\|_2^2 = \left\|(Mf)^{\le d}\right\|_2^2 \le 2^{C\cdot d}\varepsilon^2\log^{C\cdot d}(1/\varepsilon),$$
where the last inequality is by Theorem 1.6.

One can also deduce the obvious analogs of Theorems 7.12 and 7.13 for the multi-slice. Since we use it for our final application, we include here the statement of the analog of Theorem 7.13. For $f\colon U_{k_1,\ldots,k_m}\to\mathbb{R}$, consider the Laplacians $L_{i,j}$ mapping a function $f$ to the function $L_{i,j}f$ defined by $L_{i,j}f(x) = f(x) - f(x^{(i,j)})$, and define $I_i[f] = \mathbb{E}_{j\ne i}\left[\|L_{i,j}f\|_2^2\right]$ and $I[f] = \sum_{i=1}^n I_i[f]$. Similarly to Definition 4.1, we define a derivative of $f$ as a restriction of the corresponding Laplacian, i.e. for $i,j\in[n]$ and $a,b\in[m]$ we define $D_{(i,j)\to(a,b)}f = (L_{i,j}f)_{(i,j)\to(a,b)}$.

Theorem 7.18.
There exists $C>0$ such that the following holds for all $d\in\mathbb{N}$ and $n\in\mathbb{N}$ such that $n \ge 2^{C\cdot d}$. Suppose $S\subseteq U_{k_1,\ldots,k_m}$ is such that for all derivative operators $D\ne I$ of order at most $d$ it holds that $\|D1_S\|_2^2 \le 2^{-C\cdot d}$. Then $I[1_S] \ge d\cdot\mathrm{var}(1_S)$.

We omit the straightforward derivation from Theorem 7.13.

7.5 Stability result for the Kruskal–Katona theorem on the slice
Our final application is the following sharp threshold result for the slice, which can also be seen as a stability version of the Kruskal–Katona theorem (see [25, 16] for other, incomparable stability versions). For a family of subsets $F\subseteq\binom{[n]}{k}$, we denote $\mu(F) = |F|/\binom{n}{k}$, and define the upper shadow of $F$ as
$$F^{\uparrow} = \left\{X\in\binom{[n]}{k+1} \,\middle|\, \exists A\subseteq X,\ A\in F\right\}.$$
The Kruskal–Katona theorem is a basic result in combinatorics that gives a lower bound on the measure of the upper shadow of a family $F$ in terms of the measure of the family itself. Below we state a convenient, simplified version of it due to Lovász, which uses generalized binomial coefficients.

Theorem 7.19.
Let $F\subseteq\binom{[n]}{k}$ and suppose that $|F| = \binom{n}{x}$. Then $|F^{\uparrow}| \ge \binom{n}{x+1}$.

In general, Theorem 7.19 is tight, as can be shown by considering "subcubes", i.e. families of the form $H = \left\{X\in\binom{[n]}{k}\,\middle|\, X\supseteq A\right\}$ for some $A\subseteq[n]$. This raises the question of whether a stronger version of Theorem 7.19 holds for families that are "far from having a structure such as $H$". Alternatively, this question can be viewed as a stability version of Theorem 7.19: must a family for which Theorem 7.19 is almost tight have a structure similar to $H$?

Below, we mainly consider the case $k = o(n)$, and show an improved version of Theorem 7.19 for families that are "far from $H$". To formalize this, we consider the notion of restrictions: for $A\subseteq I\subseteq[n]$, we define
$$F_{I\to A} = \{X\subseteq[n]\setminus I \mid X\cup A\in F\},$$
and also define its measure $\mu(F_{I\to A})$ appropriately. We say a family $F$ is $(d,\varepsilon)$-global if for any $|I|\le d$ and $A\subseteq I$ it holds that $\mu(F_{I\to A})\le\varepsilon$.

Theorem 7.20.
There exists $C>0$ such that the following holds for all $d,n\in\mathbb{N}$ such that $n\ge 2^{C\cdot d}$. Let $F\subseteq\binom{[n]}{k}$, and suppose that $F$ is $(d, 2^{-C\cdot d})$-global. Then $\mu(F^{\uparrow}) \ge \left(1 + \frac{d}{16k}\right)\mu(F)$.

Proof. Let $f = 1_F$, $g = 1_{F^{\uparrow}}$, and consider the operator $M$ that moves from a set $A\subseteq[n]$ of size $k$ to a random set of size $k+1$ containing it. We also consider $M$ as an operator $M\colon L^2\left(\binom{[n]}{k}\right)\to L^2\left(\binom{[n]}{k+1}\right)$, defined by $Mf(B) = \mathbb{E}_{A\subseteq B}[f(A)]$ (this operator is sometimes known as the raising, or up, operator). Note that for all $B\in\binom{[n]}{k+1}$ it holds that $g(B)\,Mf(B) = Mf(B)$, and that the average of $Mf$ is the same as the average of $f$, i.e. $\mu(F)$. Thus,
$$\mu(F)^2 = \langle g, Mf\rangle^2 \le \|g\|_2^2\,\|Mf\|_2^2 = \|g\|_2^2\,\langle f, M^*Mf\rangle.$$
Using the fact that the squared $2$-norm of $g$ is the measure of $F^{\uparrow}$ and rearranging, we get that
$$\mu(F^{\uparrow}) \ \ge\ \frac{\mu(F)^2}{\langle f, M^*Mf\rangle} = \frac{\mu(F)^2}{\Pr_{x\in_R\binom{[n]}{k},\ y\sim M^*Mx}\left[x\in F,\ y\in F\right]}. \qquad (18)$$
We next lower bound $\Pr_{x\in_R\binom{[n]}{k},\ y\sim M^*Mx}[x\in F,\ y\notin F]$, which will give us an upper bound on the denominator. Towards this end, we relate this probability to the total influence of $F$ as defined in Section 7.4.3. Note that the distribution of $y$ conditioned on $x$ is: with probability $\frac{1}{k+1}$ we have $y = x$, and otherwise $y = x^{(i,j)}$, where $i,j$ are random coordinates such that $x_i\ne x_j$. Consider $z\sim Tx$, where $T$ is the operator applying a random transposition; the probability that it interchanges two coordinates $i,j$ with $x_i\ne x_j$ is $k(n-k)/\binom{n}{2}$, and so we get
$$\Pr_{\substack{x\in_R\binom{[n]}{k}\\ y\sim M^*Mx}}[x\in F,\ y\notin F] = \frac{k}{k+1}\cdot\frac{n(n-1)}{2k(n-k)}\,\Pr_{\substack{x\in_R\binom{[n]}{k}\\ y\sim Tx}}[x\in F,\ y\notin F]$$
$$= \frac{k}{k+1}\cdot\frac{n(n-1)}{2k(n-k)}\cdot\frac{1}{2}\,\Pr_{\substack{x\in_R\binom{[n]}{k}\\ y\sim Tx}}[1_F(x)\ne 1_F(y)] = \frac{k}{k+1}\cdot\frac{n(n-1)}{2k(n-k)}\cdot\frac{1}{2n}\,I[1_F] \ \ge\ \frac{1}{8k}\,I[1_F],$$
which is at least $\frac{d}{16k}\mu(f)$ by Theorem 7.18 (and the fact that $\mathrm{var}(f) = \mu(f)(1-\mu(f)) \ge \mu(f)/2$). It follows that the denominator in (18) is at most $\mu(f)\left(1 - \frac{d}{16k}\right)$, and plugging this into (18) we get that
$$\mu(F^{\uparrow}) \ \ge\ \left(1 + \frac{d}{16k}\right)\mu(F).$$

We finish this section by noting that Theorem 7.20 indeed improves on Theorem 7.19 in some range of parameters, namely when $x = \Theta(k)$, $x\le k-1$ and $n\ge 2^{C\cdot k}$. Normalizing the inequality in Theorem 7.19, we get that
$$\mu(F^{\uparrow}) \ \ge\ \frac{\binom{n}{k}}{\binom{n}{k+1}}\cdot\frac{\binom{n}{x+1}}{\binom{n}{x}}\,\mu(F) = \frac{k+1}{n-k}\cdot\frac{n-x}{x+1}\,\mu(F) = \left(1 + \Theta\!\left(\frac{k-x}{k}\right)\right)\mu(F),$$
so it is enough to note that $F$ is $(d, 2^{-C\cdot d})$-global for $d = \left\lfloor\frac{k-x}{2}\right\rfloor$. Indeed, if $|I| = d$ and $A\subseteq I$, then
$$\mu(F_{I\to A}) \le \frac{\binom{n}{x}}{\binom{n-d}{k-|A|}} \le \frac{\binom{n}{x}}{\binom{n-d}{k-d}} = \frac{n(n-1)\cdots(n-x+1)}{(n-d)(n-d-1)\cdots(n-k+1)}\cdot\frac{(k-d)!}{x!} \le k^{d+1}\,\frac{n^x}{n^{k-d}} \le k^{d+1}\,n^{-d},$$
which is at most $2^{-C\cdot d}$ provided that $n$ is large enough.

8 The level-$d$ inequality

The goal of this section is to prove Theorem 1.6.
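Before diving into the proof, the combinatorial objects of Section 7.5 are small enough to experiment with directly. The following sketch (ours, not from the paper) computes an upper shadow by brute force and checks two facts used in the proof of Theorem 7.20: the raising operator $M$ preserves the mean, and the support of $Mf$ is exactly $F^{\uparrow}$. The example family is a "subcube", for which Kruskal–Katona is tight.

```python
from fractions import Fraction
from itertools import combinations

def upper_shadow(family, n):
    # all (k+1)-sets containing some member of the k-uniform family
    k = len(next(iter(family)))
    return {frozenset(X) for X in combinations(range(n), k + 1)
            if any(A <= frozenset(X) for A in family)}

def raise_op(f, n, k):
    # raising ("up") operator: (Mf)(B) = average of f over the k-subsets of B
    def Mf(B):
        subs = [frozenset(A) for A in combinations(sorted(B), k)]
        return sum((f(A) for A in subs), Fraction(0)) / len(subs)
    return Mf

n, k = 6, 3
F = {frozenset(X) for X in combinations(range(n), k) if 0 in X}  # a "subcube"
f = lambda A: Fraction(int(A in F))
Mf = raise_op(f, n, k)
levels = [frozenset(B) for B in combinations(range(n), k + 1)]

mu_F = Fraction(len(F), len(list(combinations(range(n), k))))
assert sum((Mf(B) for B in levels), Fraction(0)) / len(levels) == mu_F  # M preserves the mean
assert {B for B in levels if Mf(B) > 0} == upper_shadow(F, n)           # support of Mf is the upper shadow
```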
Proof overview in an idealized setting.
We first describe the proof idea in an idealized setting in which derivative operators and truncations interact well; by that we mean that if $D$ is an order-$\ell$ derivative and $f$ is a function, then $D(f^{\le d}) = (Df)^{\le d-\ell}$. We remark that this property holds in product spaces, but may fail in non-product domains such as $S_n$.

Adapting the proof of the level-$d$ inequality from the hypercube (using Theorem 1.4 instead of standard hypercontractivity), one may easily establish a weaker version of Theorem 1.6, wherein $\varepsilon^2$ is replaced by $\varepsilon^{3/2}$, as follows. Take $q = \log(1/\varepsilon)$; then
$$\left\|f^{\le d}\right\|_2^2 = \left\langle f^{\le d}, f\right\rangle \le \left\|f^{\le d}\right\|_q\|f\|_{q/(q-1)}.$$
As $f$ is integer-valued, $\|f\|_{q/(q-1)}$ is at most $\|f\|_2^{2(q-1)/q} \le \varepsilon^{(q-1)/q} = O(\varepsilon)$. Using the assumption of our idealized setting and Parseval, we get that for every derivative $D$ of order $\ell$,
$$\left\|D\left(f^{\le d}\right)\right\|_2^2 = \left\|(Df)^{\le d-\ell}\right\|_2^2 \le \|Df\|_2^2.$$
Thus, using the globalness of $f$ and both items of Claim 4.2, we get that $f^{\le d}$ is $(2d, 2^{O(d)}\varepsilon)$-global, and so by Theorem 1.4 we get that $\|f^{\le d}\|_q \le (2q)^{O(d)}\sqrt{\varepsilon}$. All in all, we get $\|f^{\le d}\|_2^2 \le (2q)^{O(d)}\varepsilon^{3/2}$, which falls short of Theorem 1.6 by a factor of $\sqrt{\varepsilon}$.

The quantitative deficiency in this argument stems from the fact that $f^{\le d}$ is in fact much more global than what the simplistic argument above establishes; to show that, we prove things by induction on $d$. This induction is also the reason we have strengthened Theorem 1.6 from the introduction to the statement above.

Returning to the real setting.
To lift the assumption of the ideal setting, we return to discussing restrictions (as opposed to derivatives). Again, we would be in good shape if restrictions commuted with degree truncations, but this again fails, just as for derivatives. Instead, we use the following observation (Claim 8.4). Suppose $k \ge d + \ell + 2$, let $g$ be a function of pure degree $k$, and let $S$ be a restriction of size at most $\ell$. Then the restricted function $g_S$ is perpendicular to all functions of degree at most $k-\ell-1 > d$, and so $(g_S)^{\le d} = 0$; consequently, for a general function $g$, $(g_S)^{\le d} = ((g^{\le k})_S)^{\le d}$.

Note that for $k = d$ this statement would exactly correspond to truncations and restrictions commuting, but the conditions of the statement always require that $k > d$ at the very least. In fact, in our setting we will have $\ell = 2d$, so we will need to use the statement with $k = 3d+2$. Thus, to use this statement effectively we cannot apply it to our original function $f$; instead, we have to find an appropriate choice of $g$ such that $g$ has low degree and $g^{\le d}\approx f^{\le d}$, and moreover such that these remain close under restrictions (so that in particular we preserve globalness). Indeed, we are able to design such a $g$ by applying appropriate sparse linear combinations of powers of the natural transposition operator of $S_n$ to $f$.

The construction of $g$. In this section we construct the function $g$.

Lemma 8.1.
There is an absolute constant $C>0$ such that the following holds. Suppose $n \ge 2^{C\cdot d}$, and let $T$ be the adjacency operator of the transpositions graph (see Section 7.3). There exists a polynomial $P$ with $\|P\|_1 \le 2^{C\cdot d}$ such that
$$\left\|P(T)\left(f^{\le 4d}\right) - f^{\le d}\right\|_2 \ \le\ \left(\frac{2}{n}\right)^{3d}\left\|f^{\le 4d}\right\|_2.$$

Proof. Let
$$Q(z) = \sum_{i=1}^{d}\ \prod_{j\in[4d]\setminus\{i\}}\left(\frac{z^{n/2} - e^{-j}}{e^{-i} - e^{-j}}\right)^{4d},$$
and define $P(z) = 1 - (1 - Q(z))^{4d}$. We first prove the upper bound on $\|P\|_1$; note that
$$\|Q\|_1 \le \sum_{i=1}^{d}\ \prod_{j\in[4d]\setminus\{i\}}\left\|\frac{z^{n/2} - e^{-j}}{e^{-i} - e^{-j}}\right\|_1^{4d} = \sum_{i=1}^{d}\ \prod_{j\in[4d]\setminus\{i\}}\left(\frac{1 + e^{-j}}{|e^{-i} - e^{-j}|}\right)^{4d} = 2^{O(d)},$$
so $\|P\|_1 \le \left(1 + 2^{O(d)}\right)^{4d} = 2^{O(d)}$.

Next, we show that for $g = P(T)(f^{\le 4d})$ it holds that $\|g^{\le d} - f^{\le d}\|_2 \le (2/n)^{3d}\|f^{\le 4d}\|_2$, and we do so by eigenvalue considerations. Let $d < \ell \le 4d$, and let $\lambda$ be an eigenvalue of $T$ corresponding to a function of pure degree $\ell$. Since $\ell \le n/2$, Lemma 7.10 implies that $\lambda = 1 - \frac{2\ell}{n} + O\left(\frac{\ell^2}{n^2}\right)$, and so $\lambda^{n/2} = e^{-\ell} \pm O\left(\frac{d^2}{n}\right)$. Thus, as each one of the products in $Q(\lambda)$ contains a term for $\ell$, we get that
$$|Q(\lambda)| \ \le\ d\cdot\left(\frac{2^{O(d)}}{n}\right)^{4d},$$
so $|P(\lambda)| = 1 - (1 - |Q(\lambda)|)^{4d} \le 4d\,|Q(\lambda)| \le (2/n)^{3d}$. Next, let $\ell \le d$, and let $\lambda$ be an eigenvalue of $T$ corresponding to a function of pure degree $\ell$. As before, $\lambda^{n/2} = e^{-\ell} \pm O(d^2/n)$, but now $Q(\lambda)$ contains one product that omits the term for $\ell$. A direct computation gives
$$Q(\lambda) = \prod_{j\in[4d]\setminus\{\ell\}}\left(\frac{\lambda^{n/2} - e^{-j}}{e^{-\ell} - e^{-j}}\right)^{4d} + d\cdot\left(\frac{2^{O(d)}}{n}\right)^{4d} = \prod_{j\in[4d]\setminus\{\ell\}}\left(1 - O\!\left(\frac{2^{O(d)}}{n}\right)\right) + d\cdot\left(\frac{2^{O(d)}}{n}\right)^{4d},$$
so $Q(\lambda) = 1 - O\left(\frac{2^{O(d)}}{n}\right)$. Thus,
$$|P(\lambda) - 1| = |1 - Q(\lambda)|^{4d} = O\!\left(\frac{2^{O(d)}}{n}\right)^{4d} \le \left(\frac{2}{n}\right)^{3d}.$$
It follows that $g^{\le d} - f^{\le d} = \sum_{\ell=0}^{4d}c_\ell f^{=\ell}$ with $|c_\ell| \le (2/n)^{3d}$, and the result follows from Parseval.

In this section, we study random walks along Cayley graphs on $S_n$. The specific transition operator we will later be concerned with is the transposition operator from Lemma 7.10 and its powers, but we will present things in greater generality.
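Stepping back for a moment, the mechanism behind the polynomial $Q$ of Lemma 8.1 is a (powered-up) Lagrange filter: ignoring the powers and the error terms, it is designed to be $1$ when $z^{n/2}$ hits $e^{-\ell}$ for $\ell \le d$ and $0$ when it hits $e^{-\ell}$ for $d < \ell \le 4d$. The exact-interpolation mechanism can be checked symbolically; the sketch below is ours, with rational stand-ins for the nodes $e^{-j}$ (only distinctness of the nodes matters for the identity being tested).

```python
from fractions import Fraction

def lagrange_filter(nodes, keep):
    # Q(w) = sum over i in `keep` of the Lagrange basis polynomial for node i
    def Q(w):
        total = Fraction(0)
        for i in keep:
            term = Fraction(1)
            for j, node in enumerate(nodes):
                if j != i:
                    term *= (w - node) / (nodes[i] - node)
            total += term
        return total
    return Q

d = 2
nodes = [Fraction(1, 2) ** j for j in range(1, 4 * d + 1)]  # stand-ins for e^{-1}, ..., e^{-4d}
Q = lagrange_filter(nodes, keep=range(d))
for ell, w in enumerate(nodes):
    assert Q(w) == (1 if ell < d else 0)  # 1 on the first d nodes, 0 on the rest
```

In the lemma itself the eigenvalues only land near the nodes, which is why each factor is raised to a high power and the error terms appear.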
Definition 8.2. A Markov chain $\mathcal{M}$ on $S_n$ is called a Cayley random walk if for any $\sigma,\tau,\pi\in S_n$, the transition probability from $\sigma$ to $\tau$ is the same as the transition probability from $\sigma\pi$ to $\tau\pi$.

In other words, a Markov chain $\mathcal{M}$ is Cayley if the transition probability from $\sigma$ to $\tau$ is only a function of $\sigma\tau^{-1}$. We will be interested in the interaction between random walks and restrictions, and towards this end we first establish the following claim, asserting that a Cayley random walk either never transitions between two restrictions $T$ and $T'$, or can always transition between the two.

Claim 8.3.
Suppose $\mathcal{M}$ is a Cayley random walk on $S_n$, let $i_1,\ldots,i_t\in[n]$ be distinct, and let $T = \{(i_1,j_1),\ldots,(i_t,j_t)\}$ and $T' = \{(i_1,j_1'),\ldots,(i_t,j_t')\}$ be consistent sets. Then one of the following two must hold:
1. $\Pr_{u\in S_n^{T'},\ v\sim\mathcal{M}u}\left[v\in S_n^T\right] = 0$.
2. For all $\pi\in S_n^T$, it holds that $\Pr_{u\in S_n^{T'},\ v\sim\mathcal{M}u}\left[v = \pi\right] > 0$.

Proof. If the first item holds then we are done, so assume otherwise. Then there are $u\in S_n^{T'}$ and $v\in S_n^T$ such that $\mathcal{M}$ has positive probability of transitioning from $u$ to $v$. Denoting $\tau = uv^{-1}$, we note that $\tau(j_\ell) = j_\ell'$ for all $\ell = 1,\ldots,t$. Fix $\pi\in S_n^T$. Since $\mathcal{M}$ is a Cayley operator, the transition probability from $\tau\pi$ to $\pi$ is positive, and since $\tau\pi$ is in $S_n^{T'}$, the proof is concluded.

If $\mathcal{M}$ satisfies the second item of the above claim with $T$ and $T'$, we say that $\mathcal{M}$ is compatible with $(T, T')$.

Let $T = \{(i_1,j_1),\ldots,(i_t,j_t)\}$ be consistent. A function $f\in L^2(S_n^T)$ is called a $d$-junta if there is $S\subseteq[n]\setminus\{i_1,\ldots,i_t\}$ of size $d$ such that $f(\pi)$ only depends on $\pi(i)$ for $i\in S$ (we say that $f(\pi)$ only depends on $\pi(S)$). With this definition in hand, we may define the space of degree-$d$ functions on $S_n^T$, denoted $V_{\le d}(S_n^T)$, as the span of all $d$-juntas, and subsequently define projections onto these subspaces. That is, for each $f\in L^2(S_n^T)$ we denote by $f^{\le d}$ the projection of $f$ onto $V_{\le d}(S_n^T)$. Finally, we define the pure degree-$d$ part of $f$ as $f^{=d} = f^{\le d} - f^{\le d-1}$.

We have the following basic property of pure degree-$d$ functions.

Claim 8.4.
Suppose that $f\colon S_n\to\mathbb{R}$ is of pure degree $d$, and let $T$ be a consistent set of size $\ell < d$. Then $f_T$ is orthogonal to all functions in $V_{\le d-1-\ell}(S_n^T)$.

Proof. Clearly, it is enough to show that $f_T$ is orthogonal to all $(d-1-\ell)$-juntas. Fix a $(d-1-\ell)$-junta $g\colon S_n^T\to\mathbb{R}$, and let $h$ be its extension to $S_n$, obtained by setting it to be $0$ outside $S_n^T$. Then $h$ is a $(d-1)$-junta, and so
$$0 = \langle f, h\rangle = \frac{(n-\ell)!}{n!}\,\langle f_T, g\rangle.$$

Any random walk $\mathcal{M}$ on $S_n$ extends to an operator on functions on $S_n$, mapping $f\colon S_n\to\mathbb{R}$ to the function $\mathcal{M}f\colon S_n\to\mathbb{R}$ given by
$$\mathcal{M}f(\pi) = \mathbb{E}_{u\in S_n,\ v\sim\mathcal{M}u}\left[f(u)\mid v=\pi\right].$$
Our main goal in this section is to prove the following statement, which both strengthens and generalizes Proposition 3.1.
Proposition 8.5.
Let $f\colon S_n\to\mathbb{R}$, let $\mathcal{M}$ be a Cayley random walk on $S_n$, let $g = \mathcal{M}f$, and let $T = \{(i_1,j_1),\ldots,(i_t,j_t)\}$ be a consistent set. Then for all $d$,
$$\left\|(g_T)^{\le d}\right\|_2 \ \le\ \max_{\substack{T' = \{(i_1,j_1'),\ldots,(i_t,j_t')\}\\ \mathcal{M}\text{ compatible with }(T,T')}}\left\|(f_{T'})^{\le d}\right\|_2.$$

Let $\mathcal{M}$ be a Cayley random walk and let $T = \{(i_1,j_1),\ldots,(i_t,j_t)\}$ and $T' = \{(i_1,j_1'),\ldots,(i_t,j_t')\}$ be consistent sets such that $\mathcal{M}$ is compatible with $(T,T')$. Put $I = \{(i_1,i_1),\ldots,(i_t,i_t)\}$. Define the operator $\mathcal{M}_{S_n^{T'}\to S_n^{T}}\colon L^2(S_n^{T'})\to L^2(S_n^{T})$ in the following way: given a function $f\in L^2(S_n^{T'})$, we define
$$\mathcal{M}_{S_n^{T'}\to S_n^{T}}f(\pi) = \mathbb{E}_{u\in_R S_n^{T'},\ v\sim\mathcal{M}u}\left[f(u)\mid v=\pi\right].$$
Drawing inspiration from the proof of Proposition 3.1, we study the operator $\mathcal{M}_{S_n^{T'}\to S_n^{T}}$. Since we are also dealing with degree truncations, we have to study their interaction with this operator. Indeed, a key step in the proof is to show that the two commute, in the following sense: for all $d\in\mathbb{N}$ and $f\in L^2(S_n^{T'})$, it holds that
$$\left(\mathcal{M}_{S_n^{T'}\to S_n^{T}}f\right)^{=d} = \mathcal{M}_{S_n^{T'}\to S_n^{T}}\left(f^{=d}\right).$$
Towards this end, we view $L^2(S_n^{T})$ (and similarly $L^2(S_n^{T'})$) as a right $S_n^I$-module using the following operation: a function-permutation pair $(f,\pi)\in L^2(S_n^{T})\times S_n^I$ is mapped to the function $f^\pi\in L^2(S_n^{T})$ defined by $f^\pi(\sigma) = f(\sigma\pi^{-1})$.

Claim 8.6.
With the setup above, $\mathcal{M}_{S_n^{T'}\to S_n^{T}}\colon L^2(S_n^{T'})\to L^2(S_n^{T})$ is a homomorphism of right $S_n^I$-modules.

Proof. The proof is essentially the same as the proof of Lemma 5.1, and is therefore omitted.

It therefore suffices to prove that any such homomorphism commutes with taking the pure degree-$d$ part, which is the content of the following claim.

Claim 8.7.
Let $T, T'$ be consistent as above, and let $A\colon L^2(S_n^{T'})\to L^2(S_n^{T})$ be a homomorphism of right $S_n^I$-modules. Then for all $f\in L^2(S_n^{T'})$ we have $(Af)^{=d} = A\left(f^{=d}\right)$.

Proof. We first claim that $A$ preserves degrees, i.e. $A\,V_{\le d}(S_n^{T'})\subseteq V_{\le d}(S_n^{T})$. To show this, it is enough to note that if $f\in L^2(S_n^{T'})$ is a $d$-junta, then $Af$ is a $d$-junta. Let $f$ be a $d$-junta, and suppose that $S\subseteq[n]$ is a set of size at most $d$ such that $f(\sigma)$ only depends on $\sigma(S)$. Then for any $\pi$ that has $S$ as fixed points, we have $f(\sigma) = f(\sigma\pi^{-1}) = f^\pi(\sigma)$, so $f = f^\pi$. Applying $A$ and using the previous claim, we get $Af = Af^\pi = (Af)^\pi$. This implies that $Af$ is invariant under any permutation that keeps $S$ as fixed points, so it is a junta depending only on $\sigma(S)$.

Let $V_{=d}(S_n^{T'})$ be the space of functions of pure degree $d$, i.e. $V_{\le d}(S_n^{T'})\cap V_{\le d-1}(S_n^{T'})^{\perp}$. We claim that $A$ also preserves pure degrees, i.e. $A\,V_{=d}(S_n^{T'})\subseteq V_{=d}(S_n^{T})$. By the previous paragraph it is enough to show that if $f\in V_{=d}(S_n^{T'})$, then $Af$ is orthogonal to $V_{\le d-1}(S_n^{T})$. Letting $A^*$ be the adjoint operator of $A$, it is easily seen that $A^*\colon L^2(S_n^{T})\to L^2(S_n^{T'})$ is also a homomorphism of right $S_n^I$-modules, and by the previous paragraph $A^*$ preserves degrees. Thus, for any $g\in V_{\le d-1}(S_n^{T})$ we have $A^*g\in V_{\le d-1}(S_n^{T'})$, and so $\langle Af, g\rangle = \langle f, A^*g\rangle = 0$.

We can now prove the statement of the claim. Fix $f\in L^2(S_n^{T'})$ and $d$. By the above, $A\left(f^{=d}\right)\in V_{=d}(S_n^{T})$, and by linearity of $A$ we have $\sum_d A\left(f^{=d}\right) = Af$. The claim follows from the uniqueness of the pure degree decomposition.

We define a transition operator on restrictions as follows. From a restriction $T = \{(i_1,j_1),\ldots,(i_t,j_t)\}$, we sample $T'\sim N(T)$ as follows: take $\pi\in S_n^T$ uniformly, sample $\sigma\sim\mathcal{M}\pi$, and then let $T' = \{(i_1,\sigma(i_1)),\ldots,(i_t,\sigma(i_t))\}$. The following claim is immediate:

Claim 8.8. $(\mathcal{M}f)_T = \mathbb{E}_{T'\sim N(T)}\left[\mathcal{M}_{S_n^{T'}\to S_n^{T}}\,f_{T'}\right]$.

We are now ready to prove Proposition 8.5.
Proof of Proposition 8.5.
By Claim 8.8, we have $g_T = \mathbb{E}_{T'\sim N(T)}\left[\mathcal{M}_{S_n^{T'}\to S_n^{T}}f_{T'}\right]$. Using Claim 8.7 and the linearity of the map $f\mapsto f^{=d}$, we get
$$(g_T)^{=d} = \mathbb{E}_{T'\sim N(T)}\left[\mathcal{M}_{S_n^{T'}\to S_n^{T}}\left((f_{T'})^{=d}\right)\right].$$
Summing this up over degrees and using linearity again, we conclude that
$$(g_T)^{\le d} = \mathbb{E}_{T'\sim N(T)}\left[\mathcal{M}_{S_n^{T'}\to S_n^{T}}\left((f_{T'})^{\le d}\right)\right].$$
Taking norms and using the triangle inequality gives
$$\left\|(g_T)^{\le d}\right\|_2 \le \mathbb{E}_{T'\sim N(T)}\left\|\mathcal{M}_{S_n^{T'}\to S_n^{T}}\left((f_{T'})^{\le d}\right)\right\|_2 \le \max_{T'\colon\ \mathcal{M}\text{ compatible with }(T,T')}\left\|\mathcal{M}_{S_n^{T'}\to S_n^{T}}\left((f_{T'})^{\le d}\right)\right\|_2.$$
The proof is now concluded by appealing to Fact 3.2.

A weak level-$d$ inequality. The last ingredient we will need in the proof of Theorem 1.6 is a weak version of the level-$d$ inequality, which does not take the globalness of $f$ into consideration.

Lemma 8.9.
Let $C$ be sufficiently large, let $n \ge C^{d}\log^{d}(1/\varepsilon)$, and let $f\colon S_n\to\{0,1\}$ satisfy $\|f\|_2^2\le\varepsilon$. Then $\left\|f^{\le d}\right\|_2^2 \le n^{d}\log(1/\varepsilon)^{O(d)}\varepsilon^2$.

Proof. Set $q = \log(1/\varepsilon)$, and assume without loss of generality that $q$ is an even integer (otherwise we may change $q$ by a constant factor to ensure this). Using Hölder's inequality, Lemma 5.14, and the fact that $\|f\|_{q/(q-1)} = O(\varepsilon)$, we obtain
$$\left\|f^{\le d}\right\|_2^2 = \left\langle f^{\le d}, f\right\rangle \le \left\|f^{\le d}\right\|_q\,\|f\|_{q/(q-1)} \le \log(1/\varepsilon)^{O(d)}\,n^{d/2}\,\left\|f^{\le d}\right\|_2\cdot O(\varepsilon),$$
and the lemma follows by rearranging.

Lemma 8.10.
There is $C>0$ such that the following holds for $n\ge 2^{C\cdot d}$. For all derivatives $D$ of order $t\le d$ we have
$$\left\|D\left(f^{\le d}\right)\right\|_2 \ \le\ 2^{O(d)}\max_{D'\text{ a }t\text{-derivative}}\left\|\left(D'f\right)^{\le d-t}\right\|_2 \;+\; 2^{O(d)}\,n^{t-3d}\left\|f^{\le 4d}\right\|_2.$$

Proof. Let $T_{\le d} = P(T)$ be as in Lemma 8.1, and write $f^{\le d} = T_{\le d}\left(f^{\le 4d}\right) + g$, where $\|g\|_2 \le (2/n)^{3d}\left\|f^{\le 4d}\right\|_2$. Let $S$ be a consistent restriction of $t$ coordinates, and let $D$ be a derivative along $S$; then there is $R$ of size $t$ such that $Df = (Lf)_{S\to R}$. By Claim 4.3, the degree of $D(f^{\le d})$ is at most $d-t$, thus
$$D\left(f^{\le d}\right) = \left(D\left(f^{\le d}\right)\right)^{\le d-t}. \qquad (19)$$
We want to compare the right-hand side with $\left(D(T_{\le d}f)\right)^{\le d-t}$, but first we show that in the latter one may truncate all degrees of $f$ higher than $4d$. Note that by Claim 8.7, for each $k > 4d$ the function $T_{\le d}f^{=k}$ has pure degree $k$, so $D\left(T_{\le d}f^{=k}\right)$ is perpendicular to degree-$(k-t-1)$ functions. Since $k-t-1 > d-t$, its level-$(d-t)$ projection is $0$, and so $\left(D(T_{\le d}f)\right)^{\le d-t} = \left(D\left(T_{\le d}f^{\le 4d}\right)\right)^{\le d-t}$. It follows that
$$\left\|D\left(f^{\le d}\right) - \left(D(T_{\le d}f)\right)^{\le d-t}\right\|_2 = \left\|\left(D\left(f^{\le d} - T_{\le d}\left(f^{\le 4d}\right)\right)\right)^{\le d-t}\right\|_2 \le \|Dg\|_2 \le (2n)^{t}\|g\|_2 \le 2^{O(d)}\,n^{t-3d}\left\|f^{\le 4d}\right\|_2. \qquad (20)$$
Our task now is to bound $\left\|\left(D(T_{\le d}f)\right)^{\le d-t}\right\|_2$. Since $T$ commutes with Laplacians, it follows that $T_{\le d}$ also commutes with Laplacians, and so
$$\left(D(T_{\le d}f)\right)^{\le d-t} = \left((L\,T_{\le d}f)_{S\to R}\right)^{\le d-t} = \left((T_{\le d}\,Lf)_{S\to R}\right)^{\le d-t}. \qquad (21)$$
By Proposition 8.5, for all $i$ and $h\colon S_n\to\mathbb{R}$ we have
$$\left\|\left(\left(T^{i}h\right)_S\right)^{\le d-t}\right\|_2 \le \max_{S' = \{(i_1,j_1'),\ldots,(i_t,j_t')\}}\left\|(h_{S'})^{\le d-t}\right\|_2,$$
and so
$$\left\|\left(\left(T_{\le d}h\right)_S\right)^{\le d-t}\right\|_2 \le \|P\|_1\max_{S' = \{(i_1,j_1'),\ldots,(i_t,j_t')\}}\left\|(h_{S'})^{\le d-t}\right\|_2 \le 2^{O(d)}\max_{S'}\left\|(h_{S'})^{\le d-t}\right\|_2.$$
Applying this with $h = Lf$ gives
$$\left\|\left((T_{\le d}\,Lf)_{S\to R}\right)^{\le d-t}\right\|_2 \le 2^{O(d)}\max_{R'}\left\|\left((Lf)_{S\to R'}\right)^{\le d-t}\right\|_2 = 2^{O(d)}\max_{D'}\left\|\left(D'f\right)^{\le d-t}\right\|_2, \qquad (22)$$
where the last transition is by the definition of derivatives. Combining (20), (21), (22) and using the triangle inequality finishes the proof.

Establishing the level-$d$ inequality.
Proposition 8.11.
There exists an absolute constant
$C > 0$ such that the following holds for all $d \in \mathbb{N}$, $\varepsilon > 0$ and $n > C\cdot d\cdot \log^{C\cdot d}(1/\varepsilon)$. Let $f\colon S_n \to \mathbb{Z}$ be a function such that for all $t \le d$ and all $t$-derivatives $D$ we have $\|Df\|_2 \le \varepsilon$. Then
\[
\big\|f^{\le d}\big\|_2 \;\le\; e^{Cd}\,\varepsilon\,\log^{Cd}(1/\varepsilon).
\]

Proof. The proof is by induction on $d$. If $d = 0$, then
\[
\big\|f^{\le 0}\big\|_2 = \big|\mathbb{E}[f(\pi)]\big| \le \mathbb{E}\big[|f(\pi)|\big] \le \mathbb{E}\big[f(\pi)^2\big] = \|f\|_2^2 \le \varepsilon^2 \le \varepsilon,
\]
where in the second transition we used the fact that $f$ is integer-valued, so that $|f| \le f^2$ pointwise.

We now prove the inductive step. Fix $d \ge 1$. Let $t \le d$, and let $D$ be a $t$-derivative. By Lemma 8.10, there is an absolute constant $C_0 > 0$ such that
\[
\big\|D\big(f^{\le d}\big)\big\|_2 \;\le\; e^{C_0 d} \max_{D' \text{ a } t\text{-derivative}} \big\|(D'f)^{\le d-t}\big\|_2 \;+\; n^{-d}\big\|f^{\le d}\big\|_2. \tag{23}
\]
Fix $D'$. The function $D'f$ takes integer values and is defined on a domain that is isomorphic to $S_{n-t}$, so by the induction hypothesis we have
\[
\big\|(D'f)^{\le d-t}\big\|_2 \;\le\; e^{C(d-t)}\,\varepsilon\,\log^{C(d-t)}(1/\varepsilon).
\]
As for $\|f^{\le d}\|_2$, applying Lemma 8.9 we see it is at most $n^{d}\,\varepsilon\,\log^{Cd}(1/\varepsilon)$. Plugging these two estimates into (23), we get that
\[
\big\|D\big(f^{\le d}\big)\big\|_2 \;\le\; e^{Cd}\,\varepsilon\,\log^{Cd}(1/\varepsilon),
\]
provided that $C$ is sufficiently large.

If $\big\|f^{\le d}\big\|_2 \le e^{Cd}\,\varepsilon\,\log^{Cd}(1/\varepsilon)$ we are done, so assume otherwise. We get that $\|D'(f^{\le d})\|_2 \le \|f^{\le d}\|_2$ for all derivatives $D'$ of order at most $d$, and from Claim 4.3, $\|D'(f^{\le d})\|_2 = 0$ for higher-order derivatives. By Claim 4.2, the function $f^{\le d}$ is therefore $(2d,\, 2^{d}\|f^{\le d}\|_2)$-global, and by Lemma 3.5, we get that $f^{\le d}$ is $2^{d}\|f^{\le d}\|_2$-global with constant $2$.
In this case, we apply the standard argument presented in the overview, as follows. Set $q = \log(1/\varepsilon)$, and without loss of generality assume $q$ is an even integer (otherwise we may change $q$ by a constant factor to ensure that). Set $\rho = 1/(10Cq)$. From Lemmas 5.1, 5.4 we have that $T(\rho)$ preserves degrees, and so by Corollary 5.11 we get
\[
\big\|f^{\le d}\big\|_2^2 \;\le\; \rho^{-O(d)}\big\langle T(\rho) f^{\le d}, f^{\le d}\big\rangle
= \rho^{-O(d)}\big\langle T(\rho) f^{\le d}, f\big\rangle
\;\le\; \rho^{-O(d)}\big\|T(\rho) f^{\le d}\big\|_q\,\|f\|_{q/(q-1)},
\]
where we also used Hölder's inequality. By Theorem 3.3, we have $\big\|T(\rho) f^{\le d}\big\|_q \le 2^{d}\big\|f^{\le d}\big\|_2$, and by a direct computation (using that $f$ is integer-valued) $\|f\|_{q/(q-1)} \le \varepsilon^{2(q-1)/q}$. Plugging these two estimates into the inequality above and rearranging yields that
\[
\big\|f^{\le d}\big\|_2 \;\le\; \rho^{-O(d)}\,2^{d}\,\varepsilon^{2(q-1)/q}
\;\le\; (10Cq)^{O(d)}\,\varepsilon
\;\le\; e^{Cd}\,\varepsilon\,\log^{Cd}(1/\varepsilon),
\]
for large enough $C$, where we used that $\varepsilon^{-2/q} = O(1)$ as $q = \log(1/\varepsilon)$.

8.8 Deducing the strong level-d inequality: proof of Theorem 1.7

Let $\delta = 2^{C\cdot d}\,\varepsilon\,\log^{C\cdot d}(1/\varepsilon)$ for a sufficiently large absolute constant $C$. By Claim A.1 we get that $f^{\le d}$ is $\delta$-global with constant $2$. Set $q = \log(1/\|f\|_2)$, and let $\rho = 1/(10Cq)$ be as in Theorem 3.3. From Lemmas 5.1, 5.4 we have that $T(\rho)$ preserves degrees, and so by Corollary 5.11 we get
\[
\big\|f^{\le d}\big\|_2^2 = \big\langle f^{\le d}, f^{\le d}\big\rangle
\;\le\; \rho^{-O(d)}\big\langle f^{\le d}, T(\rho) f^{\le d}\big\rangle
= \rho^{-O(d)}\big\langle f, T(\rho) f^{\le d}\big\rangle
\;\le\; \rho^{-O(d)}\,\|f\|_{q/(q-1)}\,\big\|T(\rho) f^{\le d}\big\|_q.
\]
Using $\|f\|_{q/(q-1)} \le \|f\|_2^{2(q-1)/q} = \|f\|_2^{2}\,\|f\|_2^{-2/q} = O\big(\|f\|_2^{2}\big)$ and Theorem 3.3 to bound $\big\|T(\rho) f^{\le d}\big\|_q \le 2^{d}\delta$, it follows that
\[
\big\|f^{\le d}\big\|_2^2 \;\le\; \rho^{-O(d)}\,\|f\|_2^{2}\,\delta \;\le\; 2^{O(C\cdot d)}\,\|f\|_2^{2}\,\varepsilon\,\log^{O(C\cdot d)}(1/\varepsilon),
\]
where we used $\|f\|_2 \ge \varepsilon$.
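Two elementary steps recur in the arguments above: Hölder's inequality $\langle g, f\rangle \le \|g\|_q\,\|f\|_{q/(q-1)}$, and the bound $\|f\|_{q/(q-1)} \le \|f\|_2^{2(q-1)/q}$ for integer-valued $f$. Both are easy to sanity-check numerically. The snippet below is an illustrative check of ours (random data, expectation norms as used throughout the paper), not part of the proofs.

```python
import numpy as np

rng = np.random.default_rng(1)
N, q = 1000, 4  # q an even integer, as in the proof; conjugate exponent q/(q-1)

def norm(v, p):
    # Expectation p-norm (E|v|^p)^(1/p), matching the paper's convention.
    return float((np.abs(v) ** p).mean() ** (1.0 / p))

f = rng.integers(-3, 4, size=N).astype(float)  # an integer-valued "f"
g = rng.standard_normal(N)                     # a stand-in for T(rho) f^{<=d}

# Hoelder: <g, f> <= ||g||_q * ||f||_{q/(q-1)} (expectation inner product).
assert (g * f).mean() <= norm(g, q) * norm(f, q / (q - 1)) + 1e-12

# Integer-valuedness: |t|^{q/(q-1)} <= t^2 for every integer t (the exponent
# is at most 2), hence ||f||_{q/(q-1)} <= ||f||_2^{2(q-1)/q}.
assert norm(f, q / (q - 1)) <= norm(f, 2) ** (2 * (q - 1) / q) + 1e-12
print("ok")
```

The second assertion is exactly where integer-valuedness enters the proofs: for real-valued $f$ with $|f| < 1$ on part of the domain it can fail, while for integers $|t|^{q/(q-1)} \le t^2$ holds pointwise.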
A Missing proofs
A.1 Globalness of f implies globalness of $f^{\le d}$

Claim A.1.
There exists an absolute constant
$C > 0$ such that the following holds for all $n, d \in \mathbb{N}$ and $\varepsilon > 0$ satisfying $n > C\cdot d\cdot \log^{C\cdot d}(1/\varepsilon)$. Suppose $f\colon S_n \to \mathbb{Z}$ is $(2d, \varepsilon)$-global. Then for all $j \le d$, the function $f^{\le j}$ is:
1. $(2j,\, 2^{O(j)}\,\varepsilon\,\log^{O(j)}(1/\varepsilon))$-global;
2. $2^{O(j)}\,\varepsilon\,\log^{O(j)}(1/\varepsilon)$-global with constant $2$.

Proof. If $j = 0$, then the claim is clear, as $f^{\le j}$ is just the constant $\mathbb{E}[f(\pi)]$, and its absolute value is at most $\|f\|_2 \le \varepsilon$.

Suppose $j \ge 1$, and let $D$ be a derivative of order $r \le j$; then by Claim 4.2 we have $\|Df\|_2 \le 2^{j}\varepsilon$. Therefore, applying Proposition 8.11 to $Df$, we get that
\[
\big\|(Df)^{\le j-r}\big\|_2 \;\le\; 2^{O(j)}\,\varepsilon\,\log^{O(j)}(1/\varepsilon).
\]
Using Lemma 8.10 we get that
\[
\big\|D\big(f^{\le j}\big)\big\|_2 \;\le\; 2^{O(j)} \max_{r\text{-derivative } D'} \big\|(D'f)^{\le j-r}\big\|_2 \;+\; \Big(\frac{1}{n}\Big)^{j}\big\|f^{\le j}\big\|_2 \;\le\; 2^{O(j)}\,\varepsilon\,\log^{O(j)}(1/\varepsilon),
\]
where in the last inequality we used our earlier estimate and Lemma 8.9. For derivatives of order higher than $j$, we have that $D(f^{\le j}) = 0$ from Claim 4.3. Thus, Claim 4.2 implies that $f^{\le j}$ is $(2j,\, 2^{O(j)}\,\varepsilon\,\log^{O(j)}(1/\varepsilon))$-global. The second item immediately follows from Lemma 3.5.

A.2 Proof of Theorem 7.13
Our proof will make use of the following simple fact.
Fact A.2.
Let $g\colon S_n \to \mathbb{R}$.
1. We have the Poincaré inequality: $\mathrm{var}(g) \le \frac{1}{n}\sum_L \|Lg\|_2^2$, where the sum is over all $1$-Laplacians.
2. We have $I[g] = \frac{2}{n}\sum_L \|Lg\|_2^2$, where again the sum is over all $1$-Laplacians.

Proof. The second item is straightforward by the definitions, and we focus on the first one. Let $\tilde{L}g = \mathbb{E}_L[Lg] = (I - T)g$. If $\alpha_{d,r}$ is an eigenvalue of $T$ corresponding to a function from $V_{=d}(S_n)$, then by the second item in Lemma 7.10 we have $\alpha_{d,r} \le 1 - \frac{d}{n-1}$.

Note that we may find an orthonormal basis of $V_{=d}(S_n)$ consisting of eigenvectors of $T$, and therefore we may first write $g = \sum_d g^{=d}$ where $g^{=d} \in V_{=d}(S_n)$, and then further decompose each $g^{=d}$ as $g^{=d} = \sum_{r=0}^{r_d} g^{d,r}$, where the $g^{d,r} \in V_{=d}(S_n)$ are all orthogonal and are eigenvectors of $T$. We thus get
\[
\big\langle g, \tilde{L}g\big\rangle = \sum_d \sum_{r=0}^{r_d} (1 - \alpha_{d,r})\big\|g^{d,r}\big\|_2^2
\;\ge\; \sum_d \sum_{r=0}^{r_d} \frac{d}{n-1}\big\|g^{d,r}\big\|_2^2
= \sum_d \frac{d}{n-1}\big\|g^{=d}\big\|_2^2
\;\ge\; \frac{1}{n-1}\,\mathrm{var}(g). \tag{24}
\]
On the other hand,
\[
\big\langle g, \tilde{L}g\big\rangle
= \mathbb{E}_{\pi}\Big[\mathbb{E}_{\tau \text{ a transposition}}\big[g(\pi)\big(g(\pi) - g(\pi\circ\tau)\big)\big]\Big]
= \frac{1}{2}\,\mathbb{E}_{\tau \text{ a transposition}}\Big[\mathbb{E}_{\pi}\big[\big(g(\pi) - g(\pi\circ\tau)\big)^2\big]\Big],
\]
which is the same as $\frac{1}{n(n-1)}\sum_L \|Lg\|_2^2$. Combining this with the previous lower bound gives the first item.

Proof of Theorem 7.13. Let $f = 1_S$. Then $I[f] = (n-1)\Pr_{\pi\in S_n,\, \sigma\sim T\pi}[f(\pi) \ne f(\sigma)]$, and arithmetizing, we have that it is equal to $2(n-1)\langle f, (I-T)f\rangle$. Thus, writing $f = f^{=0} + f^{=1} + \cdots$, where $f^{=j} \in V_{=j}(S_n)$, we have, as in inequality (24), that
\[
2(n-1)\langle f, (I-T)f\rangle \;\ge\; 2(n-1)\sum_{j=0}^{n} \frac{j}{n-1}\big\|f^{=j}\big\|_2^2 \;\ge\; d\,\big\|f^{>d}\big\|_2^2. \tag{25}
\]
To finish the proof, we show that $\big\|f^{>d}\big\|_2^2 \ge \Omega(\mathrm{var}(f))$. To do that, we upper-bound the weight of $f$ on degrees up to $d$. Let $g = f^{\le d}$.
We intend to bound $\mathrm{var}(g)$ using the Poincaré inequality, namely the first item in Fact A.2. Fix an order-$1$ Laplacian $L$. We have
\[
\|Lg\|_2^2 = \langle Lg, Lf\rangle \le \|Lg\|_4\,\|Lf\|_{4/3}. \tag{26}
\]
As $f$ is Boolean, $Lf$ is $\{-1,0,1\}$-valued and so $\|Lf\|_{4/3} = \|Lf\|_2^{3/2}$, and next we bound $\|Lg\|_4$. Note that
\[
\|Lg\|_4^4 = \mathbb{E}_{\substack{D \text{ an order-1 derivative} \\ \text{consistent with } L}}\big[\|Dg\|_4^4\big], \tag{27}
\]
and we analyze $\|Dg\|_4$ for all such derivatives $D$. For that we use hypercontractivity, and we first have to show that $Dg$ is global.

Fix a $1$-derivative $D$, and set $h = Dg$. By Lemma 8.10 (applied with $\tilde{f} = f - \mathbb{E}[f]$ instead of $f$), we get that for all $r \le d-1$ and all order-$r$ derivatives $D_1$ we have
\[
\|D_1 h\|_2 = \big\|D_1 D\big(\tilde{f}^{\le d}\big)\big\|_2
\;\le\; 2^{O(d)} \max_{\substack{D' \text{ an } r\text{-derivative} \\ D'' \text{ a 1-derivative}}} \big\|\big(D' D'' \tilde{f}\big)^{\le d-r-1}\big\|_2 + n^{-d}\big\|\tilde{f}^{\le d}\big\|_2
\;\le\; 2^{-C\cdot d/2} + n^{-d}\sqrt{\mathrm{var}(f)} \;\overset{\mathrm{def}}{=}\; \delta,
\]
where we used $D' D'' \tilde{f} = D' D'' f$, which by assumption has $2$-norm at most $2^{-C\cdot d}$, and $\big\|\tilde{f}^{\le d}\big\|_2 \le \big\|\tilde{f}\big\|_2 = \sqrt{\mathrm{var}(f)}$. For $r \ge d$, we have by Claim 4.3 that $\|D_1 h\|_2 = 0$. Thus, all derivatives of $h$ have small $2$-norm, and by Claim 4.2 we get that $h$ is $(2d,\, 2^{d}\delta)$-global. Thus, from Theorem 1.4 we have that
\[
\|Dg\|_4 \;\le\; 2^{O(d)}\,\delta^{1/2}\,\|Dg\|_2^{1/2}. \tag{28}
\]
Plugging inequality (28) into (27) yields that
\[
\|Lg\|_4^4 \;\le\; 2^{O(d)}\,\delta^{2}\,\mathbb{E}_{\substack{D \text{ an order-1 derivative} \\ \text{consistent with } L}}\big[\|Dg\|_2^2\big]
= 2^{O(d)}\,\delta^{2}\,\|Lg\|_2^2 \;\le\; 2^{O(d)}\,\delta^{2}\,\|Lf\|_2^2.
\]
Plugging this, and the bound we have on the $4/3$-norm of $Lf$, into (26), we get that
\[
\|Lg\|_2^2 \;\le\; 2^{O(d)}\,\delta^{1/2}\,\|Lf\|_2^2.
\]
Summing this inequality over all $1$-Laplacians and using Fact A.2, we get that
\[
\mathrm{var}(g) \;\le\; \frac{1}{n}\sum_L \|Lg\|_2^2 \;\le\; 2^{O(d)}\,\delta^{1/2}\,\frac{1}{n}\sum_L \|Lf\|_2^2 \;=\; 2^{C_1\cdot d}\,\delta^{1/2}\,I[f]
\]
for some absolute constant $C_1$, and we consider two cases.

The case that $I[f] \le 2^{-C_1\cdot d}\,\delta^{-1/2}\,\mathrm{var}(f)/2$. In this case we get that $\mathrm{var}(g) \le \mathrm{var}(f)/2$, and so
\[
\big\|f^{>d}\big\|_2^2 = \mathrm{var}(f) - \mathrm{var}(g) \ge \mathrm{var}(f)/2.
\]
Plugging this into (25) finishes the proof.

The case that $I[f] > 2^{-C_1\cdot d}\,\delta^{-1/2}\,\mathrm{var}(f)/2$. By the definition of $\delta$ we get that either $I[f] \ge 2^{C\cdot d/4}\,\mathrm{var}(f)$, in which case we are done, or $I[f] \ge 2^{-O(d)}\,n^{d/2}\,\mathrm{var}(f)^{3/4}$, in which case we are done by the lower bound on $n$.
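The Poincaré inequality of Fact A.2 can be verified by brute force on a small symmetric group. The sketch below is our own illustrative code, under conventions we assume here: the $1$-Laplacians are taken to be $L_\tau g(\pi) = g(\pi) - g(\pi\circ\tau)$, one per transposition $\tau$, and all norms are expectation norms. It checks $\mathrm{var}(g) \le \frac{1}{n}\sum_\tau \|L_\tau g\|_2^2$ for a random $g$ on $S_4$.

```python
import itertools

import numpy as np

n = 4
perms = list(itertools.permutations(range(n)))
index = {p: k for k, p in enumerate(perms)}
N = len(perms)

transpositions = [(i, j) for i in range(n) for j in range(i + 1, n)]

def swap(p, t):
    # pi -> pi o tau, where tau exchanges coordinates i and j.
    i, j = t
    q = list(p)
    q[i], q[j] = q[j], q[i]
    return tuple(q)

rng = np.random.default_rng(7)
g = rng.standard_normal(N)

# Sum of ||L_tau g||_2^2 over all transpositions (expectation norms over S_4).
total = 0.0
for t in transpositions:
    Lg = np.array([g[index[p]] - g[index[swap(p, t)]] for p in perms])
    total += float((Lg ** 2).mean())

var_g = float(g.var())
assert var_g <= total / n + 1e-9
print("Poincare check passed: var(g) <= (1/n) * sum of ||L g||^2")
```

The inequality holds for every $g$, not just this sample: the spectral gap of the transposition walk on $S_n$ is at least $1/(n-1)$ (in fact $2/(n-1)$), which is exactly what the proof of Fact A.2 uses.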