An Elementary Exposition of Pisier's Inequality
Siddharth Iyer, Anup Rao, Victor Reis, Thomas Rothvoss, Amir Yehudayoff
Abstract.
Pisier's inequality is central in the study of normed spaces and has important applications in geometry. We provide an elementary proof of this inequality, which avoids some non-constructive steps from previous proofs. Our goal is to make the inequality and its proof more accessible, because we think they will find additional applications. We demonstrate this with a new type of restriction on the Fourier spectrum of bounded functions on the discrete cube.

1. Introduction
The Rademacher projection is a method to linearize functions from the discrete cube $\{\pm 1\}^n$ to the Euclidean space $\mathbb{R}^m$. It is fundamental in the study of normed spaces [8, 1]. Pisier's inequality controls the operator norm of the Rademacher projection [13, 14]. This inequality has several important geometric applications. Most strikingly, if combined with a result of Figiel and Tomczak-Jaegermann [5] it implies the $MM^*$-estimate, which says that in a certain average sense, symmetric convex bodies behave much more like ellipsoids than one could derive from John's classical theorem [6]. The $MM^*$-estimate is, in turn, a central piece in the proof of Milman's QS-theorem [9, 10, 11], which is one of the deepest results in convex geometry.

Pisier's original proof uses complex analysis and interpolation (and provides additional information). Bourgain and Milman found a different and more direct proof [3]. Their proof relies on several deep results, like the Hahn-Banach theorem, the Riesz representation theorem, and Bernstein's theorem from approximation theory.

The purpose of this note is to present an elementary and accessible proof of Pisier's inequality. Our proof is explicit and avoids the non-constructive part in the proof from [3].

1.1. The inequality.
The Rademacher projection is based on Fourier analysis. The starting point is the space of functions from $\{\pm 1\}^n$ to $\mathbb{R}$. The characters form an important (orthonormal) basis for this space. The character that corresponds to the set $S \subseteq [n]$ is the map $\chi_S : \{\pm 1\}^n \to \mathbb{R}$ defined by
\[ \chi_S(x) = \chi_S(x_1, x_2, \ldots, x_n) = \prod_{j \in S} x_j. \]
Every $f : \{\pm 1\}^n \to \mathbb{R}^m$ can be uniquely expressed as
\[ f(x) = \sum_{S \subseteq [n]} \hat{f}(S) \cdot \chi_S(x), \]
where the vectors $\hat{f}(S) \in \mathbb{R}^m$ are the Fourier coefficients of $f$. The linear part of $f$ is
\[ f^{\mathrm{lin}}(x) = \sum_{S \subseteq [n] : |S| = 1} \hat{f}(S) \cdot \chi_S(x) = \sum_{j=1}^{n} \hat{f}(\{j\}) \cdot x_j. \]
The Rademacher projection is the map $f \mapsto f^{\mathrm{lin}}$. Pisier's inequality gives an upper bound on its operator norm.
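As a concrete illustration (our own sketch, not part of the paper's argument), the Fourier coefficients and the linear part can be computed by brute force for small $n$; the function names below are our own.

```python
import itertools

def fourier_coeff(f, S, n):
    # \hat f(S) = E_x[f(x) * chi_S(x)], averaging over the cube {±1}^n
    total = 0.0
    for x in itertools.product([-1, 1], repeat=n):
        chi = 1
        for j in S:
            chi *= x[j]
        total += f(x) * chi
    return total / 2 ** n

def linear_part(f, n, x):
    # f_lin(x) = sum_j \hat f({j}) * x_j
    return sum(fourier_coeff(f, (j,), n) * x[j] for j in range(n))

# example: f(x) = x_0 x_1 + 3 x_2 has linear part 3 x_2
f = lambda x: x[0] * x[1] + 3 * x[2]
print(linear_part(f, 3, (1, 1, 1)))   # 3.0
```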
Theorem 1.1 (Pisier). There is a constant $C > 0$ so that the following holds. Let $\|\cdot\|$ be a norm on $\mathbb{R}^m$. Let $X$ be uniformly distributed in $\{\pm 1\}^n$. Then
\[ \mathbb{E}\big[\|f^{\mathrm{lin}}(X)\|^2\big]^{1/2} \le C \log(m+1) \cdot \mathbb{E}\big[\|f(X)\|^2\big]^{1/2}. \]

The proof of Pisier's inequality from [3] is based on the existence of a function $g : \{\pm 1\}^n \to \mathbb{R}$ that is nearly linear, yet has small $\ell_1$ norm. The existence of $g$ is proved in a non-constructive way. We give an explicit and simple formula for such a function $g$.

When $\|\cdot\|$ is the Euclidean norm, the $C \log(m)$ term can be replaced by 1, because orthogonal projections do not increase the Euclidean norm. Bourgain, however, showed that for general norms the $\log(m)$ factor is necessary [2]. Bourgain's construction is probabilistic. In Section 5, we describe a simple explicit example, also based on Bourgain's idea, showing that a $\frac{\log(m)}{\log\log(m)}$ factor is necessary.

There is a variant of Pisier's inequality for functions $f : \mathbb{R}^n \to \mathbb{R}^m$ where $X$ is Gaussian. While such a variant is useful for applications, it is a statement about an infinite-dimensional vector space of functions, which makes the proof more complicated. However, one can show that the variant on the discrete cube and the variant in Gaussian space are equivalent (see e.g. [1]).

We conclude the introduction with one more application. Fourier analysis of Boolean functions is an important area in computer science and mathematics with many applications (see the textbook [12]). A central goal is to identify properties of the Fourier spectrum of Boolean or bounded functions on the cube; see [4, 7] and references within. Pisier's inequality implies the following restriction on the Fourier spectrum. There is a constant $c > 0$ so that for every $f : \{\pm 1\}^n \to [-1, 1]$,
\[ \log\big(1 + \|\hat{f}\|_0\big) \ge c \cdot \sum_{j=1}^{n} |\hat{f}(\{j\})|, \]
where $\|\hat{f}\|_0$ is the sparsity of $\hat{f}$; i.e., the number of sets $S \subseteq [n]$ so that $\hat{f}(S) \neq 0$. The proof of this inequality and its sharpness can be deduced from Section 5.

2. Preliminaries
Convolution is a powerful tool when there is an underlying group structure. Here the group is the cube $\{\pm 1\}^n$ with the operation $x \odot z = (x_1 z_1, \ldots, x_n z_n)$. The convolution of a (vector-valued) function $f : \{\pm 1\}^n \to \mathbb{R}^m$ and a (scalar-valued) function $g : \{\pm 1\}^n \to \mathbb{R}$ is the function $f * g : \{\pm 1\}^n \to \mathbb{R}^m$ defined by
\[ f * g(x) = \mathbb{E}_Z\big[g(Z) \cdot f(x \odot Z)\big], \]
where $Z$ is uniformly random in $\{\pm 1\}^n$. We list some basic properties of convolution.

Fact 1. If $T : \mathbb{R}^m \to \mathbb{R}^m$ is a linear map then $T(f * g) = T(f) * g$.

Fact 2. $\widehat{f * g}(S) = \hat{g}(S) \cdot \hat{f}(S)$ for every $S \subseteq [n]$.

Proof.
\begin{align*}
f * g(x) &= \mathbb{E}[g(Z) \cdot f(x \odot Z)] \\
&= \mathbb{E}\Big[\sum_S \hat{g}(S) \chi_S(Z) \cdot \sum_T \hat{f}(T) \chi_T(x \odot Z)\Big] \\
&= \mathbb{E}\Big[\sum_S \hat{g}(S) \chi_S(Z) \cdot \sum_T \hat{f}(T) \chi_T(x) \chi_T(Z)\Big] \\
&= \sum_S \hat{g}(S) \hat{f}(S) \chi_S(x),
\end{align*}
where the last equality uses linearity of expectation and the orthonormality of the characters:
\[ \mathbb{E}[\chi_S(Z) \chi_T(Z)] = \begin{cases} 1 & \text{if } S = T, \\ 0 & \text{otherwise.} \end{cases} \qquad \Box \]
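Fact 2 is finitary, so it can be sanity-checked numerically on a small cube; the following brute-force sketch (our own code, not from the paper) compares $\widehat{f * g}(S)$ with $\hat{g}(S) \cdot \hat{f}(S)$ for every $S$.

```python
import itertools

n = 2
cube = list(itertools.product([-1, 1], repeat=n))

def conv(f, g, x):
    # (f*g)(x) = E_Z[g(Z) * f(x ⊙ Z)] with Z uniform on the cube
    return sum(g(z) * f(tuple(a * b for a, b in zip(x, z))) for z in cube) / len(cube)

def fhat(h, S):
    # Fourier coefficient of h at the set S (given as a tuple of indices)
    total = 0.0
    for x in cube:
        chi = 1
        for j in S:
            chi *= x[j]
        total += h(x) * chi
    return total / len(cube)

f = lambda x: 1 + 2 * x[0] - x[0] * x[1]
g = lambda x: x[0] + 0.5 * x[0] * x[1]
h = lambda x: conv(f, g, x)
for S in [(), (0,), (1,), (0, 1)]:
    assert abs(fhat(h, S) - fhat(f, S) * fhat(g, S)) < 1e-9   # Fact 2
```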
Fact 3. For any norm $\|\cdot\|$,
\[ \mathbb{E}\big[\|f * g(X)\|^2\big]^{1/2} \le \mathbb{E}[|g(X)|] \cdot \mathbb{E}\big[\|f(X)\|^2\big]^{1/2}. \]

Proof.
\begin{align*}
\mathbb{E}\big[\|f * g(X)\|^2\big] &= \mathbb{E}_X\Big[\big\|\mathbb{E}_Z[g(Z) \cdot f(X \odot Z)]\big\|^2\Big] \\
&\le \mathbb{E}_X\Big[\big(\mathbb{E}_Z[|g(Z)| \cdot \|f(X \odot Z)\|]\big)^2\Big],
\end{align*}
where the inequality follows from the convexity of the norm $\|\cdot\|$. By the Cauchy-Schwarz inequality, we get
\begin{align*}
&\le \mathbb{E}_X\Big[\mathbb{E}_Z[|g(Z)|] \cdot \mathbb{E}_{Z'}\big[|g(Z')| \cdot \|f(X \odot Z')\|^2\big]\Big] \\
&= \mathbb{E}_Z[|g(Z)|] \cdot \mathbb{E}_{Z'}\Big[|g(Z')| \cdot \mathbb{E}_X\big[\|f(X)\|^2\big]\Big] \\
&= \big(\mathbb{E}_Z[|g(Z)|]\big)^2 \cdot \mathbb{E}_X\big[\|f(X)\|^2\big]. \qquad \Box
\end{align*}
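The following sketch (again our own illustration) checks Fact 3 for a random vector-valued $f$ and scalar $g$, using the $\ell_\infty$ norm on $\mathbb{R}^m$ as an arbitrary test norm.

```python
import itertools, random

random.seed(0)
n, m = 3, 2
cube = list(itertools.product([-1, 1], repeat=n))
fvals = {x: [random.uniform(-1, 1) for _ in range(m)] for x in cube}
gvals = {x: random.uniform(-1, 1) for x in cube}
norm = lambda v: max(abs(c) for c in v)   # the l_inf norm on R^m, as a test norm

def conv_val(x):
    # (f*g)(x) = E_Z[g(Z) f(x ⊙ Z)], computed coordinate-wise
    acc = [0.0] * m
    for z in cube:
        xz = tuple(a * b for a, b in zip(x, z))
        for i in range(m):
            acc[i] += gvals[z] * fvals[xz][i]
    return [a / len(cube) for a in acc]

# left side: E[||f*g(X)||^2]^{1/2}; right side: E[|g|] * E[||f(X)||^2]^{1/2}
lhs = (sum(norm(conv_val(x)) ** 2 for x in cube) / len(cube)) ** 0.5
rhs = (sum(abs(gvals[x]) for x in cube) / len(cube)) * \
      (sum(norm(fvals[x]) ** 2 for x in cube) / len(cube)) ** 0.5
assert lhs <= rhs + 1e-12   # Fact 3 holds for any choice of f and g
```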
3. An Overview of the Proof

The linear part $f^{\mathrm{lin}}$ of $f$ can be expressed as the convolution of $f$ with the linear function $L(x) = \sum_{j=1}^{n} x_j$; see Fact 2. In order to analyze the norm of $f^{\mathrm{lin}} = f * L$, we use an auxiliary function $P$ which serves as a proxy for $L$. We call the function $P$ the linear proxy, and it depends on a parameter $\ell$ that will be set to be $\approx \log(m)$.

Lemma 4.
For every odd $\ell > 1$, there is $P : \{\pm 1\}^n \to \mathbb{R}$ so that the following hold. First, $P$ is close to $L$: for all $S \subseteq [n]$,
\[ |\widehat{P - L}(S)| \le \frac{8\ell}{2^{\ell}}. \]
Second, $P$ has small $\ell_1$ norm:
\[ \mathbb{E}[|P(X)|] \le 8\ell. \]

Let us explain how to prove Pisier's inequality using the linear proxy $P$. The convexity of norms allows us to split the bound into two terms:
\begin{align*}
\mathbb{E}\big[\|f^{\mathrm{lin}}(X)\|^2\big]^{1/2} &= \mathbb{E}\big[\|f * L(X)\|^2\big]^{1/2} \\
&= \mathbb{E}\big[\|f * P(X) + f * (L - P)(X)\|^2\big]^{1/2} \\
&\le \mathbb{E}\big[\|f * P(X)\|^2\big]^{1/2} + \mathbb{E}\big[\|f * (L - P)(X)\|^2\big]^{1/2}.
\end{align*}
Bound each of the two terms separately. To bound the first term, apply Fact 3 and use the choice of $P$:
\[ \mathbb{E}\big[\|f * P(X)\|^2\big]^{1/2} \le \mathbb{E}[|P(Z)|] \cdot \mathbb{E}\big[\|f(X)\|^2\big]^{1/2} \le 8\ell \cdot \mathbb{E}\big[\|f(X)\|^2\big]^{1/2}. \]
To bound the second term, we use John's theorem, which is classical and we do not prove here. John's theorem states that there is an invertible linear map $T : \mathbb{R}^m \to \mathbb{R}^m$ so that for every $x \in \mathbb{R}^m$,
\[ \|T(x)\|_2 \le \|x\| \le \sqrt{m} \cdot \|T(x)\|_2. \]
Using $T$ we can switch between $\|\cdot\|$ and $\|\cdot\|_2$:
\begin{align*}
\mathbb{E}\big[\|f * (L - P)(X)\|^2\big]^{1/2} &\le \sqrt{m} \cdot \mathbb{E}\big[\|T(f * (L - P)(X))\|_2^2\big]^{1/2} \\
&= \sqrt{m} \cdot \mathbb{E}\big[\|T(f) * (L - P)(X)\|_2^2\big]^{1/2} \\
&= \sqrt{m} \cdot \sqrt{\sum_S \|\widehat{T(f)}(S)\|_2^2 \cdot \big(\widehat{L - P}(S)\big)^2} \\
&\le \frac{8\ell\sqrt{m}}{2^{\ell}} \cdot \sqrt{\sum_S \|\widehat{T(f)}(S)\|_2^2} \\
&= \frac{8\ell\sqrt{m}}{2^{\ell}} \cdot \mathbb{E}\big[\|T(f(X))\|_2^2\big]^{1/2} \\
&\le \frac{8\ell\sqrt{m}}{2^{\ell}} \cdot \mathbb{E}\big[\|f(X)\|^2\big]^{1/2}.
\end{align*}
Putting it together,
\[ \mathbb{E}\big[\|f^{\mathrm{lin}}(X)\|^2\big]^{1/2} \le 8\ell \Big(1 + \frac{\sqrt{m}}{2^{\ell}}\Big) \cdot \mathbb{E}\big[\|f(X)\|^2\big]^{1/2}. \]
Setting $\ell$ to be the smallest odd integer that is larger than $\log_2(m)$, the proof is complete.

Remark.
Pisier's inequality is more general than stated in Theorem 1.1. The Banach-Mazur distance of the norm $\|\cdot\|$ from the Euclidean norm $\|\cdot\|_2$ is
\[ D = \inf\{d \in \mathbb{R} : \exists T \in \mathrm{GL}_m \ \forall x \in \mathbb{R}^m \quad \|T(x)\|_2 \le \|x\| \le d \cdot \|T(x)\|_2\}, \]
where $\mathrm{GL}_m$ is the group of invertible linear transformations from $\mathbb{R}^m$ to itself. John's theorem states that always $D \le \sqrt{m}$. The above argument proves that, more generally, we can replace the $C \log(m+1)$ term by $C \log(D+1)$.

4. Constructing the linear proxy
The structure of the linear proxy $P$ we construct is similar to the linear proxy from [3]. However, the existence of the linear proxy in [3] is proved in a non-constructive way. Here we provide a simple and explicit formula for $P$. The main piece in the construction is the following proposition.

Proposition 5.
Let $\ell > 1$ be odd and let
\[ \varphi(\theta) = \frac{2\ell - 1}{\ell} \cdot \frac{\sin(\ell\theta)}{\sin^2(\theta)}. \]
There is a finitely supported distribution on $\theta \in [0, 2\pi)$ such that
\[ \mathbb{E}\big[\varphi(\theta) \cdot \sin^k(\theta)\big] = \begin{cases} 1 & \text{if } k = 1, \\ 0 & \text{if } k = 0, 2, 3, \ldots, \ell, \end{cases} \]
and $\mathbb{E}[|\varphi(\theta)|] \le 4\ell$.

Using the proposition, the linear proxy is defined as
\[ P(x) = 2 \cdot \mathbb{E}_{\theta}\Big[\varphi(\theta) \cdot \prod_{j=1}^{n} \Big(1 + \frac{\sin(\theta)}{2} \cdot x_j\Big)\Big]. \]
The properties of $P$ readily follow. To prove that $P$ is close to linear, open the product and use linearity of expectation:
\[ P(x) = \sum_{S \subseteq [n]} 2 \cdot \mathbb{E}_{\theta}\Big[\varphi(\theta) \cdot \frac{\sin^{|S|}(\theta)}{2^{|S|}}\Big] \cdot \chi_S(x). \]
This is the Fourier representation of $P$. The first property of $\varphi$ implies that $\hat{P}(S) = 0$ when $|S| = 0, 2, 3, \ldots, \ell$, and $\hat{P}(S) = 1$ when $|S| = 1$. When $|S| > \ell$, the second property of $\varphi$ implies
\[ |\hat{P}(S)| \le 2^{1-|S|} \cdot \mathbb{E}[|\varphi(\theta)|] \le \frac{8\ell}{2^{|S|}}. \]
Bound the $\ell_1$ norm of $P$ by
\begin{align*}
\mathbb{E}[|P(X)|] &\le 2 \cdot \mathbb{E}\Big[|\varphi(\theta)| \cdot \Big|\prod_{j=1}^{n} \Big(1 + \frac{\sin(\theta)}{2} \cdot X_j\Big)\Big|\Big] \\
&= 2 \cdot \mathbb{E}\Big[|\varphi(\theta)| \cdot \prod_{j=1}^{n} \Big(1 + \frac{\sin(\theta)}{2} \cdot X_j\Big)\Big] \\
&= 2 \cdot \mathbb{E}[|\varphi(\theta)|] \le 8\ell,
\end{align*}
because $1 + \frac{\sin(\theta)}{2} \cdot X_j \ge 0$, and $\mathbb{E}[X_j] = 0$.

4.1. Construction of $\varphi$. The cancellations below are based on the following simple fact. Let $\Gamma$ denote the $4\ell$ equally spaced angles:
\[ \Gamma = \Big\{0, \frac{2\pi}{4\ell}, 2 \cdot \frac{2\pi}{4\ell}, \ldots, (4\ell - 1) \cdot \frac{2\pi}{4\ell}\Big\}. \]
For any integer $a$, since $\sum_{\theta \in \Gamma} e^{ia\theta} = e^{ia \cdot \frac{2\pi}{4\ell}} \cdot \sum_{\theta \in \Gamma} e^{ia\theta}$, we have
\[ \sum_{\theta \in \Gamma} e^{ia\theta} = \begin{cases} 4\ell & \text{if } a = 0 \bmod 4\ell, \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]
The distribution on $\theta$ is uniform in the set $\Gamma \setminus \{0, \pi\}$. It remains to prove the stated properties of $\varphi$ one-by-one.

For $k = 0$, since $\varphi(\theta) = -\varphi(2\pi - \theta)$,
\[ \mathbb{E}\big[\varphi(\theta) \sin^0(\theta)\big] = 0. \]

For $k = 1$, use the identity $\sin(\theta) = e^{-i\theta} \cdot \frac{e^{2i\theta} - 1}{2i}$:
\begin{align*}
\mathbb{E}[\varphi(\theta) \cdot \sin(\theta)] &= \frac{1}{4\ell - 2} \cdot \frac{2\ell - 1}{\ell} \cdot \sum_{\theta \in \Gamma \setminus \{0, \pi\}} \frac{\sin(\ell\theta)}{\sin(\theta)} \\
&= \frac{1}{2\ell} \cdot \sum_{\theta \in \Gamma \setminus \{0, \pi\}} e^{-i(\ell - 1)\theta} \cdot \frac{e^{2i\ell\theta} - 1}{e^{2i\theta} - 1} \\
&= \frac{1}{2\ell} \cdot \sum_{\theta \in \Gamma \setminus \{0, \pi\}} \big(e^{-i(\ell - 1)\theta} + e^{-i(\ell - 3)\theta} + \ldots + e^{i(\ell - 1)\theta}\big).
\end{align*}
Because $\ell$ is odd, when $\theta \in \{0, \pi\}$, we have $e^{-i(\ell - 1)\theta} + \ldots + e^{i(\ell - 1)\theta} = \ell$. So, using (1), we get
\[ = \frac{1}{2\ell} \cdot \Big(-2\ell + \sum_{\theta \in \Gamma} \big(e^{-i(\ell - 1)\theta} + e^{-i(\ell - 3)\theta} + \ldots + e^{i(\ell - 1)\theta}\big)\Big) = \frac{1}{2\ell} \cdot (-2\ell + 4\ell) = 1. \]
When $1 < k \le \ell$, because $\sin(0) = \sin(\pi) = 0$, we have
\begin{align*}
\mathbb{E}\big[\varphi(\theta) \cdot \sin^k(\theta)\big] &= \frac{1}{4\ell - 2} \cdot \frac{2\ell - 1}{\ell} \cdot \sum_{\theta \in \Gamma \setminus \{0, \pi\}} \sin(\ell\theta) \cdot \sin^{k-2}(\theta) \\
&= \frac{1}{2\ell} \cdot \sum_{\theta \in \Gamma} \sin(\ell\theta) \cdot \sin^{k-2}(\theta) \\
&= \frac{1}{2\ell} \cdot \sum_{\theta \in \Gamma} \Big(\frac{e^{i\ell\theta} - e^{-i\ell\theta}}{2i}\Big) \cdot \Big(\frac{e^{i\theta} - e^{-i\theta}}{2i}\Big)^{k-2} = 0,
\end{align*}
since every phase appearing here after opening the parenthesis is non-zero modulo $4\ell$.

Finally, bound the $\ell_1$ norm of $\varphi$: by the symmetry of $\theta$,
\[ \mathbb{E}[|\varphi(\theta)|] \le 4 \cdot \frac{1}{4\ell - 2} \cdot \frac{2\ell - 1}{\ell} \cdot \sum_{j=1}^{\ell} \Big|\frac{1}{\sin^2(2\pi j / (4\ell))}\Big| \le \frac{2}{\ell} \cdot \sum_{j=1}^{\infty} \Big|\frac{\ell}{j}\Big|^2 \le 4\ell, \]
where we used the inequality $\sin(\gamma) \ge \frac{2\gamma}{\pi}$ for $0 \le \gamma \le \pi/2$.
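Since Proposition 5 is a finite statement for each fixed $\ell$, it can be verified numerically; the sketch below (our own check, not part of the proof) does so for $\ell = 5$.

```python
import math

ell = 5                      # any odd ell > 1 works here
N = 4 * ell                  # Gamma: N equally spaced angles 2*pi*t/N
angles = [2 * math.pi * t / N for t in range(N)]
# the distribution is uniform on Gamma \ {0, pi}, where sin vanishes
support = [th for th in angles if abs(math.sin(th)) > 1e-12]
phi = lambda th: (2 * ell - 1) / ell * math.sin(ell * th) / math.sin(th) ** 2

def moment(k):
    # E[phi(theta) * sin^k(theta)] under the uniform distribution on the support
    return sum(phi(th) * math.sin(th) ** k for th in support) / len(support)

assert abs(moment(1) - 1) < 1e-9                      # k = 1 gives 1
for k in [0] + list(range(2, ell + 1)):
    assert abs(moment(k)) < 1e-9                      # other k <= ell vanish
assert sum(abs(phi(th)) for th in support) / len(support) <= 4 * ell
```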
5. A Lower Bound

Bourgain showed that Pisier's inequality is sharp [2]. His example is non-explicit because it uses the probabilistic method. Here we give a simple and explicit example showing that a loss of $\frac{\log m}{\log\log m}$ is necessary. The main technical ingredient is the following construction:

Theorem 6.
For any $n \in \mathbb{N}$, there is a function $F : \{-1, 1\}^n \to \mathbb{R}$ with the following properties:

(A) $\|F\|_\infty \le O(1)$.
(B) $\hat{F}(\{j\}) = \frac{1}{\sqrt{n}}$ for all $j \in [n]$.
(C) $\|\hat{F}\|_0 \le 2^{O(\sqrt{n} \log(n))}$.

Our example follows the same outline as Bourgain's approach. Bourgain proved a stronger theorem showing that there is a function satisfying (A) and (B) but its Fourier sparsity in (C) is at most $2^{O(\sqrt{n})}$. His construction starts by considering a simple function $H$ satisfying (A) and (B) but not (C). He then carefully uses randomness to eliminate most of the Fourier coefficients in $H$ while maintaining (A) and (B), and improving the sparsity. We observe that it is enough to truncate $H$ to prove the theorem above.

Before proving the theorem, let us see how it yields a limitation to Pisier's inequality. Let $\mathcal{F} = \{S \subseteq [n] : \hat{F}(S) \neq 0\}$, and consider the function $f : \{\pm 1\}^n \to \mathbb{R}^{\mathcal{F}}$ defined by
\[ (f(x))_S = \hat{F}(S) \chi_S(x) \]
for each $S \in \mathcal{F}$. Define a norm on $\mathbb{R}^{\mathcal{F}}$ as follows. Every $v \in \mathbb{R}^{\mathcal{F}}$ corresponds to the function $g = g_v : \{\pm 1\}^n \to \mathbb{R}$ that is defined by $\hat{g}(S) = v_S$. The norm of $v$ is defined to be
\[ \|v\| = \|g\|_\infty = \max\{|g(z)| : z \in \{\pm 1\}^n\}. \]
It follows that for every $x \in \{-1, 1\}^n$,
\[ \|f(x)\| = \|F\|_\infty \le O(1) \]
and that
\[ \|f^{\mathrm{lin}}(x)\| \ge n \cdot \frac{1}{\sqrt{n}} = \sqrt{n} \ge \Omega\Big(\frac{\log |\mathcal{F}|}{\log\log |\mathcal{F}|}\Big). \]
It remains to prove the theorem.
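The key step here, that the sup-norm over the cube of a linear function equals the sum of the absolute values of its coefficients, is easy to confirm by brute force (our own sketch):

```python
import itertools

def sup_norm_of_linear(coeffs):
    # ||g||_inf for g(z) = sum_j c_j z_j, maximizing over z in {±1}^n;
    # the maximum is attained at z_j = sign(c_j), giving sum_j |c_j|
    n = len(coeffs)
    return max(abs(sum(c * z for c, z in zip(coeffs, zs)))
               for zs in itertools.product([-1, 1], repeat=n))

n = 4
coeffs = [1 / n ** 0.5] * n        # the coefficients \hat F({j}) x_j, up to signs
assert abs(sup_norm_of_linear(coeffs) - n ** 0.5) < 1e-9   # equals sqrt(n)
```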
Proof of Theorem 6.
First, we define a function $H : \{-1, 1\}^n \to \mathbb{R}$ by
\[ H(x) := \mathrm{Im}\Big(\prod_{j=1}^{n} \Big(1 + \frac{i}{\sqrt{n}} x_j\Big)\Big) = \sum_{S \subseteq [n]} \mathrm{Im}\Big(\Big(\frac{i}{\sqrt{n}}\Big)^{|S|}\Big) \cdot \chi_S(x), \]
where $\mathrm{Im}$ denotes the imaginary part of a complex number. It follows that
\[ \|H\|_\infty \le \Big|1 + \frac{i}{\sqrt{n}}\Big|^n = \Big(\sqrt{1 + \tfrac{1}{n}}\Big)^n \le 2. \]
It also follows that
\[ \hat{H}(\{j\}) = \frac{1}{\sqrt{n}} \tag{2} \]
for all $j \in [n]$, and $|\hat{H}(S)| \le n^{-|S|/2}$ for all $S \subseteq [n]$.

The function $F$ is obtained from $H$ by truncating the high frequencies. Let
\[ F(x) := \sum_{S \in \mathcal{F}} \hat{H}(S) \cdot \chi_S(x), \]
where $\mathcal{F} := \{S \subseteq [n] : |S| \le 10\sqrt{n}\}$. Property (A) of $F$ can be justified as follows. For every $x \in \{-1, 1\}^n$,
\begin{align*}
|H(x) - F(x)| &= \Big|\sum_{S \subseteq [n]} (\hat{H}(S) - \hat{F}(S)) \cdot \chi_S(x)\Big| \le \sum_{S \notin \mathcal{F}} \underbrace{|\hat{H}(S)|}_{\le n^{-|S|/2}} \cdot \underbrace{|\chi_S(x)|}_{\le 1} \\
&\le \sum_{k > 10\sqrt{n}} \binom{n}{k} n^{-k/2} \le \sum_{k > 10\sqrt{n}} \Big(\frac{e\sqrt{n}}{k}\Big)^k \le 2^{-\Omega(\sqrt{n})}.
\end{align*}
So, indeed $\|F\|_\infty \le \|H\|_\infty + \|H - F\|_\infty \le O(1)$. Property (B) of $F$ holds by (2). Property (C) holds because
\[ \|\hat{F}\|_0 \le |\mathcal{F}| \le 2^{O(\sqrt{n} \log(n))}. \qquad \Box \]
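For small $n$ the properties of $H$ can be checked directly; the following numerical sketch is our own (for such small $n$ the truncation is vacuous, so only $H$ itself is checked).

```python
import itertools, math

n = 6
cube = list(itertools.product([-1, 1], repeat=n))

def H(x):
    # H(x) = Im( prod_j (1 + i * x_j / sqrt(n)) )
    p = complex(1, 0)
    for xj in x:
        p *= complex(1, xj / math.sqrt(n))
    return p.imag

assert max(abs(H(x)) for x in cube) <= 2                  # (A): bounded
hat_j = sum(H(x) * x[0] for x in cube) / len(cube)        # \hat H({1})
assert abs(hat_j - 1 / math.sqrt(n)) < 1e-9               # (B): equals 1/sqrt(n)
```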
Bourgain used random sampling to sparsify the Fourier spectrum of $H$ and get sparsity $2^{O(\sqrt{n})}$. Bourgain used Khinchine's inequality to analyze the sparsity of the random function. One can perform a similar analysis using more standard concentration bounds.

Remark.
Theorem 6 can be proved with $F(x) = T_k\big(\frac{x_1 + \ldots + x_n}{n}\big)$ as well, where $k = \lfloor\sqrt{n}\rfloor$ and $T_k$ is the $k$'th Chebyshev polynomial of the first kind.
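This Chebyshev variant can also be probed numerically; the sketch below (our own check) confirms for $n = 9$ that the function is bounded by 1 and that all singleton Fourier coefficients agree and have magnitude on the order of $1/\sqrt{n}$ (property (B) then holds up to constant factors, or exactly after rescaling).

```python
import itertools, math

def cheb(k, t):
    # T_k via the recurrence T_0 = 1, T_1 = t, T_{k+1} = 2t*T_k - T_{k-1}
    a, b = 1.0, t
    for _ in range(k - 1):
        a, b = b, 2 * t * b - a
    return b if k >= 1 else a

n = 9
k = math.isqrt(n)                  # k = floor(sqrt(n)) = 3
cube = list(itertools.product([-1, 1], repeat=n))
F = lambda x: cheb(k, sum(x) / n)

assert max(abs(F(x)) for x in cube) <= 1 + 1e-12          # (A): |T_k| <= 1 on [-1, 1]
c = sum(F(x) * x[0] for x in cube) / len(cube)            # \hat F({1})
c2 = sum(F(x) * x[1] for x in cube) / len(cube)           # \hat F({2})
assert abs(c - c2) < 1e-12                                # same for every j
assert 0.1 <= abs(c) * math.sqrt(n) <= 3                  # magnitude ~ 1/sqrt(n)
```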
Acknowledgements
We thank Mrigank Arora and Emanuel Milman for useful comments.
References

[1] S. Artstein-Avidan, A. Giannopoulos, and V. D. Milman. Asymptotic Geometric Analysis, Part I, volume 202. American Mathematical Society, 2015.
[2] J. Bourgain. On martingales transforms in finite dimensional lattices with an appendix on the K-convexity constant. Mathematische Nachrichten, 119(1):41–53, 1984.
[3] J. Bourgain and V. D. Milman. New volume ratio properties for convex symmetric bodies in $\mathbb{R}^n$. Inventiones Mathematicae, 88(2):319–340, 1987.
[4] I. Dinur, E. Friedgut, G. Kindler, and R. O'Donnell. On the Fourier tails of bounded functions over the discrete cube. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, pages 437–446, 2006.
[5] T. Figiel and N. Tomczak-Jaegermann. Projections onto Hilbertian subspaces of Banach spaces. Israel Journal of Mathematics, 33(2):155–171, 1979.
[6] F. John. Extremum problems with inequalities as subsidiary conditions. In Studies and Essays Presented to R. Courant on his 60th Birthday, January 8, 1948, pages 187–204. Interscience Publishers, New York, 1948.
[7] N. Keller, E. Mossel, and T. Schlank. A note on the entropy/influence conjecture. Discrete Mathematics, 312(22):3364–3372, 2012.
[8] B. Maurey and G. Pisier. Séries de variables aléatoires vectorielles indépendantes et propriétés géométriques des espaces de Banach. Studia Mathematica, 58(1):45–90, 1976.
[9] V. Milman. Almost Euclidean quotient spaces of subspaces of a finite-dimensional normed space. Proceedings of the American Mathematical Society, 94(3):445–449, 1985.
[10] V. D. Milman. Inégalité de Brunn-Minkowski inverse et applications à la théorie locale des espaces normés. C. R. Acad. Sci. Paris Sér. I Math., 302(1):25–28, 1986.
[11] V. D. Milman. Isomorphic symmetrization and geometric inequalities. In J. Lindenstrauss and V. D. Milman, editors, Geometric Aspects of Functional Analysis, pages 107–131. Springer, Berlin, Heidelberg, 1988.
[12] R. O'Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
[13] G. Pisier. Sur les espaces de Banach K-convexes. Séminaire d'Analyse Fonctionnelle (dit "Maurey-Schwartz"), pages 1–15, 1979.
[14] G. Pisier. Un théorème sur les opérateurs linéaires entre espaces de Banach qui se factorisent par un espace de Hilbert. Annales Scientifiques de l'École Normale Supérieure, volume 13, pages 23–43, 1980.
School of Computer Science, University of Washington
E-mail address: [email protected]

School of Computer Science, University of Washington
E-mail address: [email protected]

School of Computer Science, University of Washington
E-mail address: [email protected]

School of Computer Science, University of Washington
E-mail address: [email protected]

Department of Mathematics, Technion-IIT
E-mail address: