A Parallel Repetition Theorem for the GHZ Game
Justin Holmgren ∗ Ran Raz † August 13, 2020
Abstract
We prove that parallel repetition of the (3-player) GHZ game reduces the value of the game polynomially fast to 0. That is, the value of the GHZ game repeated in parallel t times is at most t^{-Ω(1)}. Previously, only a bound of ≈ 1/α(t), where α is the inverse Ackermann function, was known [Ver96].

The GHZ game was recently identified by Dinur, Harsha, Venkat and Yuen as a multi-player game where all existing techniques for proving strong bounds on the value of the parallel repetition of the game fail. Indeed, to prove our result we use a completely new proof technique. Dinur, Harsha, Venkat and Yuen speculated that progress on bounding the value of the parallel repetition of the GHZ game may lead to further progress on the general question of parallel repetition of multi-player games. They suggested that the strong correlations present in the GHZ question distribution represent the "hardest instance" of the multi-player parallel repetition problem [DHVY17].

Another motivation for studying the parallel repetition of the GHZ game comes from the field of quantum information. The GHZ game, first introduced by Greenberger, Horne and Zeilinger [GHZ89], is a central game in the study of quantum entanglement and has been studied in numerous works. For example, it is used for testing quantum entanglement and for device-independent quantum cryptography. In such applications a game is typically repeated to reduce the probability of error, and hence bounds on the value of the parallel repetition of the game may be useful.

∗ NTT Research. E-mail: [email protected]. Research conducted at Princeton University, supported in part by the Simons Collaboration on Algorithms and Geometry and NSF grant No. CCF-1714779.
† Department of Computer Science, Princeton University. E-mail: [email protected]. Research supported by the Simons Collaboration on Algorithms and Geometry, by a Simons Investigator Award and by the National Science Foundation grants No. CCF-1714779, CCF-2007462.
Contents

A.1 Divergences
A.2 Conditional KL Divergence
A.3 Conditional Statistical Distance
B Fourier Analysis
C Bound on Optimization Problem

1 Introduction
In a k-player game, players are given correlated "questions" (q_1, ..., q_k) sampled from a distribution Q and must produce corresponding "answers" (a_1, ..., a_k) such that (q_1, ..., q_k, a_1, ..., a_k) satisfy a fixed predicate π. Crucially, the players are not allowed to communicate amongst themselves after receiving their questions (but they may agree upon a strategy beforehand). The value of the game is the probability with which the players can win with an optimal strategy. Multi-player games play a central role in theoretical computer science due to their intimate connection with multi-prover interactive proofs (MIPs) [BGKW88] and hardness of approximation [FGL+96].

A basic operation on games is parallel repetition. In the t-wise parallel repetition of a game, question tuples (q_1^(i), ..., q_k^(i)) are sampled independently for i ∈ [t]. The j-th player is given (q_j^(1), ..., q_j^(t)), and is required to produce (a_j^(1), ..., a_j^(t)). The players win if for every i ∈ [t], (a_1^(i), ..., a_k^(i)) is a winning answer for questions (q_1^(i), ..., q_k^(i)). Parallel repetition was first proposed in [FRS94] as an intuitive attempt to reduce the value of a game from ε to ε^t, but in general this is not what happens [For89, Fei91, FV96, Raz11]. The actual effect is far more subtle, and a summary of some of the known results is given in Table 1.

                  Two-player games         Three or more players
  Classical       2^{-Ω(t)} [Raz98]        ≈ 1/α(t) [Ver96]
  Entangled       t^{-Ω(1)} [Yue16]        O(1) (trivial)
  Non-signaling   exp(-Ω(t)) [Hol09]       Ω(1) [HY19]

Table 1: Known bounds on the worst-case (slowest) decay of the value of the t-wise parallel repetition of a non-trivial game. α denotes the inverse Ackermann function.

Much less is known about games with three or more players than about two-player games. Only very weak bounds are known on how t-wise parallel repetition decreases the value of a three-player game (as a function of t).
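To make the definitions concrete, the following sketch brute-forces the value of a toy two-player game and of its 2-wise parallel repetition. The XOR-of-AND predicate used here (a_1 ⊕ a_2 = q_1 ∧ q_2, the classical CHSH predicate) is only an illustrative stand-in and is not one of the games analyzed in this paper.

```python
from itertools import product

def win(x, y, a, b):
    # toy predicate: a XOR b must equal x AND y
    return (a ^ b) == (x & y)

def single_value():
    """Value of the one-shot game: maximize over deterministic strategies."""
    best = 0
    for fa in product((0, 1), repeat=2):          # fa[x] = Alice's answer on question x
        for fb in product((0, 1), repeat=2):      # fb[y] = Bob's answer on question y
            wins = sum(win(x, y, fa[x], fb[y]) for x in (0, 1) for y in (0, 1))
            best = max(best, wins)
    return best / 4

def doubled_value():
    """Value of the 2-wise parallel repetition: answers may depend on BOTH questions."""
    qs = list(product((0, 1), repeat=2))            # a player's pair of questions
    strategies = list(product(range(4), repeat=4))  # an answer pair (encoded 0..3) per question pair
    # pairwin[i][j][a][b]: both coordinates won when Alice holds qs[i], Bob holds qs[j]
    pairwin = [[[[win(xs[0], ys[0], a >> 1, b >> 1) and win(xs[1], ys[1], a & 1, b & 1)
                  for b in range(4)] for a in range(4)] for ys in qs] for xs in qs]
    best = 0
    for sa in strategies:
        # partial[j][b]: number of Alice question-pairs won if Bob holds qs[j] and answers b
        partial = [[sum(pairwin[i][j][sa[i]][b] for i in range(4)) for b in range(4)]
                   for j in range(4)]
        for sb in strategies:
            best = max(best, sum(partial[j][sb[j]] for j in range(4)))
    return best / 16
```

Exhaustive search confirms that the repeated value lies between v² and v, so naive squaring of the value is not what happens in general.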
There is a similar gap in our understanding when players are allowed to share entangled state; in fact, no bounds here are known whatsoever in the three-player case. If players are more generally allowed to use any no-signaling strategy, then there are in fact counterexamples (lower bounds) showing that parallel repetition may utterly fail to reduce the (no-signaling) value of a three-player game.

The GHZ game, which we will denote by G_GHZ, is a three-player game with query distribution Q_GHZ that is uniform on {x ∈ F_2^3 : x_1 + x_2 + x_3 = 0}. To win, players are required on input (x_1, x_2, x_3) to produce (y_1, y_2, y_3) such that y_1 ⊕ y_2 ⊕ y_3 = x_1 ∨ x_2 ∨ x_3. It is easily verified that the value of G_GHZ is 3/4.

Dinur, Harsha, Venkat and Yuen identified the GHZ game as a hard case for existing parallel repetition techniques, writing [DHVY17]:

"We suspect that progress on bounding the value of the parallel repetition of the GHZ game will lead to further progress on the general question."

and

"We believe that the strong correlations present in the GHZ question distribution represent the 'hardest instance' of the multiplayer parallel repetition problem. Existing techniques from the two-player case (which we leverage in this paper) appear to be incapable of analyzing games with question distributions with such strong correlations."

The GHZ game also plays an important role in quantum information theory, and in particular in entanglement testing and device-independent quantum cryptography. Its salient properties are that it is an OR game for which quantum (entangled) players can play perfectly, but classical players can win only with probability strictly less than 1 [MS13]. No such two-player game is known.
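The value 3/4 can be checked exhaustively: there are only 4³ = 64 deterministic product strategies, and the strategy in which every player answers y_i = 1 − x_i wins exactly on the three queries with x_1 ∨ x_2 ∨ x_3 = 1. A minimal sketch:

```python
from itertools import product

# GHZ queries: uniform over {x in F_2^3 : x_1 + x_2 + x_3 = 0}
QUESTIONS = [x for x in product((0, 1), repeat=3) if x[0] ^ x[1] ^ x[2] == 0]

def ghz_value():
    """Classical value of G_GHZ by brute force over deterministic strategies."""
    best = 0
    # a deterministic strategy for player i is a function {0,1} -> {0,1}
    for f1, f2, f3 in product(product((0, 1), repeat=2), repeat=3):
        wins = sum((f1[x1] ^ f2[x2] ^ f3[x3]) == (x1 | x2 | x3)
                   for x1, x2, x3 in QUESTIONS)
        best = max(best, wins)
    return best / len(QUESTIONS)
```

The search returns 3/4; no deterministic (hence no classical) strategy wins all four queries, since summing the four winning constraints over F_2 gives the contradiction 0 = 1.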
Moreover, the GHZ game has the so-called self-testing property: all quantum strategies that achieve value 1 are essentially equivalent. This property is important for entanglement testing and device-independent quantum cryptography.

Prior to our work, the best known parallel repetition bound for the GHZ game was due to Verbitsky [Ver96], who observed a connection between parallel repetition and the density Hales-Jewett theorem from Ramsey theory [FK91]. Using modern quantitative versions of this theorem [Pol12], Verbitsky's result implies a bound of approximately 1/α(t), where α is the inverse Ackermann function. We prove a bound of t^{-Ω(1)}.

To prove our parallel repetition theorem for the GHZ game we show that for an arbitrary strategy, even if we condition on that strategy winning in several coordinates i_1, ..., i_m, there still exists some coordinate in which that strategy loses with significant probability. We consider the finer-grained event that also specifies specific queries and answers in coordinates i_1, ..., i_m, and abstract it out as a sufficiently dense product event E over the three players' inputs.

Given an arbitrary product event E that occurs with sufficiently high probability, we show that some coordinate of ˜P def= P|_E is hard. We do this in three high-level steps:

1. We first prove this for the simpler case in which E is an affine subspace of F_2^{3×n}. In fact, we show in this case that many coordinates of ˜P are hard.

2. We then prove that when E is arbitrary, ˜P can be written as a convex combination of components ˜P|_W, where W is a large affine subspace, with most such components "indistinguishable" from P|_W. Specifically, our main requirement is that for all sufficiently compressing linear functions φ on W, the KL divergence of φ(˜X) from φ(X) is small, where we sample ˜X ← ˜P|_W and X ← P|_W.

3.
With this notion of indistinguishability, we prove that if ˜P|_W is indistinguishable from P|_W, then the GHZ game (or any game with a constant-sized answer alphabet) is roughly as hard in every coordinate with query distribution ˜P|_W as with P|_W.

We conclude that for many coordinates i, there is a significant portion of ˜P for which the i-th coordinate is hard. We emphasize that unlike all previous parallel repetition bounds, our proof does not construct a local embedding of Q_GHZ into ˜P for general E.

Local Embeddability in Affine Subspaces
We first show that if E is any affine subspace of sufficiently low codimension m in F_2^{3×n}, then there exist many coordinates i ∈ [n] for which Q_GHZ is locally embeddable in the i-th coordinate of the conditional distribution ˜P. In fact, it will suffice for us to consider only affine "power" subspaces, i.e. those of the form w + V³ for some linear subspace V ≤ F_2^n and vector w ∈ F_2^{3×n}. Let X^1, ..., X^n ∈ F_2^3 denote the queries in each of the n repetitions.

Our observation is that when E is affine, there exists a subset of coordinates S ⊆ [n] with |S| ≥ n − m such that for every i ∈ S, E depends on (X^{i′})_{i′∈S} only via the differences (X^{i′} − X^i)_{i′∈S\{i}}. Indeed, if E = E_1 × E_2 × E_3 and if each E_j is given by an affine equation (X_j^1, ..., X_j^n)·A = b_j for a sufficiently "skinny" matrix A, then by the pigeonhole principle there must exist two distinct subset row-sums of A with equal values. By considering the symmetric difference of these subsets, and using the fact that we are working over F_2, there is a set S ⊆ [n] such that the S-subset row-sum of A is 0. Thus the value of (X_j^1, ..., X_j^n)·A is unchanged if X_j^i is subtracted from X_j^{i′} for every i′ ∈ S.

As a result, the players can all sample (X^{i′} − X^i)_{i′∈S\{i}} and (X^{i′})_{i′∉S}, which are independent of X^i, using shared randomness. On input X_j^i, the j-th player can locally compute (X_j^{i′})_{i′∈S} from X_j^i and (X^{i′} − X^i)_{i′∈S\{i}}.

Pseudo-Affine Decompositions

At a high level, we next show that if E is an arbitrary product event (with sufficient probability mass) then ˜P has a "pseudo-affine decomposition".
That is, there is a partition Π of (F_2^n)³ into affine subspaces such that if W is a random part of Π (as weighted by ˜P), then any strategy for ˜P|_W can be extended to a strategy for P|_W that is similarly successful in expectation.

To construct Π, we prove the following sufficient conditions for Π to be a pseudo-affine decomposition:

• When W is a random part of Π (as weighted by ˜P), the distributions ˜P|_W and P|_W are indistinguishable to all sufficiently compressing linear distinguishers. That is, if W is an affine shift of V³, then for all subspaces U ≤ V of sufficiently small co-dimension, the distributions ˜P|_W and P|_W are statistically close modulo U.

• Each part W of Π is in fact an affine shift of a product space V³ for some linear space V.

We construct Π satisfying these conditions iteratively. Starting with the singleton partition, as long as a random part W of Π has some subspace U for which ˜P|_W and P|_W are distributed differently mod U, we replace each part W of Π by all the affine shifts of U³ in W. We show that this process cannot be repeated too many times when E has sufficient density.

Pseudorandomness Preserves Hardness
The high-level reason these conditions suffice is that for any strategy f = f_1 × f_2 × f_3, they enable us to refine Π to a partition Π′_f such that when X is sampled from ˜P|_{W′} for a random part W′ in Π′_f, the distribution of f(X) is as if X were sampled uniformly from W′ ∩ E (i.e., with X_1, X_2, and X_3 mutually independent). Moreover, when we construct Π′_f we partition each part W of Π into all affine shifts of some linear space U³, where the co-dimension of U³ in W is not too large. Thus the strategy f on ˜P|_W effectively has the players acting as independent (randomized) functions of their inputs modulo U. Such strategies generalize to P|_W by the first property of pseudo-affine decompositions stated above.

To construct Π′_f, we ensure that f is uncorrelated with every affine function on ˜P|_{W′} when W′ is a random part of Π′_f, and then prove the desired independence by Fourier analysis. We construct Π′_f by iterative refinement of Π. Start by considering a random part W of Π. Whenever f(X) is correlated with an affine F_2-valued function χ, replace W in Π by W ∩ χ^{-1}(0) and W ∩ χ^{-1}(1), and do this in parallel for all parts of Π. We show that this cannot be repeated too many times, and thus we quickly arrive at our desired Π′_f.

3 Preliminaries

In this section we describe some preliminary definitions that are somewhat specific to this work. More standard preliminaries are given in Appendices A and B.
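The character-based refinement used to construct Π′_f in the overview above can be sketched concretely: each part W of a partition of F_2^n is split along the level sets of a linear functional χ. The two functionals below are arbitrary illustrative choices.

```python
from itertools import product

n = 4
SPACE = list(product((0, 1), repeat=n))  # F_2^4

def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % 2

def refine(partition, ell):
    """Replace every part W by W ∩ ell^{-1}(0) and W ∩ ell^{-1}(1)."""
    new_parts = []
    for part in partition:
        for bit in (0, 1):
            piece = [x for x in part if dot(ell, x) == bit]
            if piece:
                new_parts.append(piece)
    return new_parts

partition = [SPACE]                        # start from the singleton partition
for ell in [(1, 0, 1, 0), (0, 1, 1, 1)]:   # two illustrative F_2-valued characters
    partition = refine(partition, ell)

# after two refinements: 4 affine shifts of a codimension-2 subspace,
# which together still cover the whole space
assert len(partition) == 4 and all(len(p) == 4 for p in partition)
assert sorted(x for p in partition for x in p) == sorted(SPACE)
```

Each refinement step halves every part, so the co-dimension of the parts grows by exactly one per character, mirroring the potential argument used later to bound the number of refinement steps.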
Definition 3.1.
For any set S, a partition of S is a pairwise disjoint set of subsets of S whose union is all of S. If Π is a partition of S and x is an element of S, we write Π(x) to denote the (unique) element of Π that contains x.

If U is a linear subspace of V, we write U ≤ V rather than U ⊆ V to emphasize that U is a subspace rather than an unstructured subset.

We crucially rely on the Cauchy-Schwarz inequality:

Definition 3.2 (Inner Product Space). A real inner product space is a vector space V over R together with an operation ⟨·,·⟩ : V × V → R satisfying the following axioms for all x, y, z ∈ V:

• Symmetry: ⟨x, y⟩ = ⟨y, x⟩.
• Linearity in the first argument: ⟨ax + by, z⟩ = a⟨x, z⟩ + b⟨y, z⟩.
• Positive Definiteness: ⟨x, x⟩ > 0 if x ≠ 0.

Theorem 3.3 (Cauchy-Schwarz). In any inner product space, it holds for all vectors u and v that |⟨u, v⟩|² ≤ ⟨u, u⟩ · ⟨v, v⟩.

In parallel repetition we often work with Cartesian product sets of the form X = (X_1 × ··· × X_k)^n. For these sets, we will use superscripts to index the outer product and subscripts to index the inner product. That is, we view elements x of X as tuples (x^1, ..., x^n), where the i-th component of x^j is x_i^j. We will also write x_i to denote the vector (x_i^1, ..., x_i^n). If {E_i ⊆ X_i^n}_{i∈[k]} is a collection of subsets indexed by subscripts, we write E_1 × ··· × E_k or Π_{i∈[k]} E_i to denote the set {x ∈ X : ∀i ∈ [k], x_i ∈ E_i}. Similarly, if Y is a product set (Y_1 × ··· × Y_k)^m, we say f : X → Y is a product function f_1 × ··· × f_k if f(x) = y for y_i = f_i(x_i).

Definition 3.4 (Multi-player Games). A k-player game is a tuple (X, Y, P, W), where X = X_1 × ··· × X_k and Y = Y_1 × ··· × Y_k are finite sets, P is a probability measure on X, and W : X × Y → {0, 1} is a "winning" predicate.

Definition 3.5 (Parallel Repetition).
Given a k-player game G = (X, Y, Q, W), its n-fold parallel repetition, denoted G^n, is defined as the k-player game (X^n, Y^n, Q^n, W^n), where W^n(x, y) def= ∧_{j=1}^n W(x^j, y^j).

Definition 3.6.
The success probability of a function f = f_1 × ··· × f_k : X → Y in a k-player game G = (X, Y, Q, W) is

v[f](G) def= Pr_{x←Q}[ W(x, f(x)) = 1 ].

Definition 3.7.
The value of a k-player game G = (X, Y, Q, W), denoted v(G), is the maximum, over all product functions f = f_1 × ··· × f_k : X → Y, of v[f](G).

Fact 3.8.
Randomized strategies are no better than deterministic strategies.
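Fact 3.8 is the standard averaging argument: a shared-randomness strategy is a mixture of deterministic strategies, and its success probability is the corresponding convex combination, hence at most the best deterministic value. A small numeric check on the GHZ game (the mixture weights below are an arbitrary illustrative choice):

```python
import random
from itertools import product

QUESTIONS = [x for x in product((0, 1), repeat=3) if x[0] ^ x[1] ^ x[2] == 0]
# all 64 deterministic product strategies for the GHZ game
STRATEGIES = list(product(product((0, 1), repeat=2), repeat=3))

def value_of(strategy):
    f1, f2, f3 = strategy
    return sum((f1[x] ^ f2[y] ^ f3[z]) == (x | y | z)
               for x, y, z in QUESTIONS) / len(QUESTIONS)

# a randomized strategy: a probability distribution over deterministic ones
rng = random.Random(0)
weights = [rng.random() for _ in STRATEGIES]
probs = [w / sum(weights) for w in weights]

# its value is the convex combination of the deterministic values ...
randomized_value = sum(p * value_of(s) for p, s in zip(probs, STRATEGIES))
best_deterministic = max(value_of(s) for s in STRATEGIES)
# ... and therefore cannot exceed the best deterministic value
assert randomized_value <= best_deterministic
```

The same averaging applies to any finite game, which is why the value definitions above may restrict attention to deterministic product functions.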
Definition 3.9 (Value in j-th coordinate). If G = (X, Y, Q, W^n) is a game (with a product winning predicate), the value of G in the j-th coordinate, denoted v_j(G), is the value of the game (X, Y, Q, W′), where W′(x, y) = W(x^j, y^j).

Definition 3.10 (Game with Modified Query Distribution). If G = (X, Y, Q, W) is a game, and P is a probability measure on X, we write G|_P to denote the game (X, Y, P, W).

In this section, we give some Fourier-analytic conditions (see Appendix B for the basics of Fourier analysis) that imply independence of random variables under the (parallel repeated) GHZ query distribution. It will be convenient for us to work with probability distributions in terms of their densities (see Appendix A for basic probability definitions and notation).

(Footnote to Definition 3.2: because of symmetry, linearity in the first argument also implies linearity in the second argument, i.e., bilinearity.)

Definition 4.1 (Probability Densities). If P : Ω → R is a probability distribution with support contained in A, then the density of P in A is the function φ : A → R, x ↦ |A| · P(x). If A is unspecified, then by default it is taken to be Ω.

Lemma 4.2.
Let V be a (finite) vector space over F_2, let P be uniform on {x ∈ V³ : x_1 + x_2 + x_3 = 0}, and let U be uniform on V³. For any subset E = E_1 × E_2 × E_3 of V³,

P(E) = Σ_{χ∈V̂} Π_{i∈[3]} 1̂_{E_i}(χ) = U(E) · Σ_{χ∈V̂} Π_{i∈[3]} φ̂_{E_i}(χ),

where φ_{E_i} denotes the density in V of the uniform distribution on E_i.

Proof. Let φ_P denote the density in V³ of P. That is,

φ_P(x_1, x_2, x_3) = |V| if x_1 + x_2 + x_3 = 0, and 0 otherwise.

Then

P(E) = E_{x←V³}[ φ_P(x) · 1_E(x) ] = Σ_{χ∈V̂³} φ̂_P(χ) · 1̂_E(χ).   (Plancherel)   (1)

We now compute φ̂_P(χ) and 1̂_E(χ). We start by noting that the dual group V̂³ is isomorphic to (V̂)³. That is, each character χ ∈ V̂³ is of the form χ(x_1, x_2, x_3) = χ_1(x_1)·χ_2(x_2)·χ_3(x_3) for some (uniquely determined) χ_1, χ_2, χ_3 ∈ V̂, and conversely, each choice of χ_1, χ_2, χ_3 ∈ V̂ gives rise to some χ ∈ V̂³.

The Fourier transform of φ_P is given by

φ̂_P(χ_1, χ_2, χ_3) = 1 if χ_1 = χ_2 = χ_3, and 0 otherwise.   (2)

Because E is a product event, the Fourier transform of 1_E : V³ → {0, 1} is given by

1̂_E(χ_1, χ_2, χ_3) = Π_{i∈[3]} 1̂_{E_i}(χ_i) = U(E) · Π_{i∈[3]} φ̂_{E_i}(χ_i).   (3)

Substituting Eqs. (2) and (3) into Eq. (1) concludes the proof of the lemma.

Corollary 4.3.
With V, P, E, and U as in Lemma 4.2,

| P(E) − U(E) | ≤ Σ_{χ∈V̂\{1}} Π_{i∈[3]} | 1̂_{E_i}(χ) |,

where 1 ∈ V̂ denotes the trivial character.

Proof. For any probability density function φ, we have φ̂(1) = 1, so (by Lemma 4.2)

| P(E) − U(E) | = U(E) · | P(E)/U(E) − 1 | ≤ U(E) · | Σ_{χ∈V̂\{1}} Π_{i∈[3]} φ̂_{E_i}(χ) | ≤ Σ_{χ∈V̂\{1}} Π_{i∈[3]} | 1̂_{E_i}(χ) |.

Lemma 4.4.
Let V be a (finite) vector space over F_2, let P be uniform on {x ∈ V³ : x_1 + x_2 + x_3 = 0}, let U be uniform on V³, and let X = (X_1, X_2, X_3) denote the identity random variable on V³. Let Y_i = Y_i(X_i) be a 𝒴_i-valued random variable for each i ∈ [3], let Y = (Y_1, Y_2, Y_3), and let 𝒴 = 𝒴_1 × 𝒴_2 × 𝒴_3.

Let W be a subspace of V. If for all χ ∈ Ŵ,

E_{(x,y)←P_{X,Y}}[ d_TV( P_{χ(X_1) | X∈x+W³, Y_1=y_1}, U_{χ(X_1) | X∈x+W³} ) ] ≤ ε,   (4)

then

E_{x←P_X}[ d_TV( P_{Y | X∈x+W³}, U_{Y | X∈x+W³} ) ] ≤ ε · √(|𝒴_2| · |𝒴_3|).

Proof.
For x ∈ V³, we will write x̄ to denote the set x + W³. Recall that V/W denotes the set of all cosets {x + W}_{x∈V}. For every i ∈ [3], every x̄_i ∈ V/W, and every y_i ∈ 𝒴_i, define 1_{i,x̄_i,y_i} : x̄_i → {0, 1} to be the indicator for the set Y_i^{-1}(y_i) ∩ x̄_i. Define φ_{i,x̄_i,y_i} to be the density (in x̄_i) of the uniform distribution on Y_i^{-1}(y_i) ∩ x̄_i. That is, φ_{i,x̄_i,y_i} : x̄_i → R is given by

φ_{i,x̄_i,y_i}(x′_i) = |x̄_i| / |Y_i^{-1}(y_i) ∩ x̄_i| if Y_i(x′_i) = y_i, and 0 otherwise.

φ_{i,x̄_i,y_i} is easily seen to be related to 1_{i,x̄_i,y_i} by

1_{i,x̄_i,y_i} = P_{Y_i | X̄_i=x̄_i}(y_i) · φ_{i,x̄_i,y_i}.

With this notation, our assumption that Eq. (4) holds (for all χ ∈ Ŵ) is equivalent to assuming that for all χ ∈ Ŵ \ {1},

E_{(x,y)←P_{X,Y}}[ | φ̂_{1,x̄_1,y_1}(χ) | ] ≤ 2ε.   (5)

This is because for all χ ∈ Ŵ \ {1}, the distribution U_{χ(X_1) | X∈x+W³} is uniform on {±1}.

In general for x ∈ Supp(P_X), we have (by Corollary 4.3) that for any y ∈ 𝒴,

| P_{Y | X∈x+W³}(y) − U_{Y | X∈x+W³}(y) | ≤ Σ_{χ∈Ŵ\{1}} Π_{i∈[3]} | 1̂_{i,x̄_i,y_i}(χ) |   (6)

because:

• the event E = {Y = y} is a product event E_1 × E_2 × E_3, where each E_i = {Y_i = y_i} depends only on X_i, or equivalently on X_i − x_i,

• the distribution P_{X−x | X̄=x̄} is uniform on {(w_1, w_2, w_3) ∈ W³ : w_1 + w_2 + w_3 = 0}, and

• the distribution U_{X−x | X̄=x̄} is uniform on {(w_1, w_2, w_3) ∈ W³}.

(Footnote: with the formalism of random variables as functions on a sample space, we mean that X is the identity function, mapping (x_1, x_2, x_3) to (x_1, x_2, x_3).)
Combining Eq. (6) with the definition of statistical distance, we have

2 · E_{x←P_X}[ d_TV( P_{Y|X∈x+W³}, U_{Y|X∈x+W³} ) ]
= E_{x←P_X} Σ_{y∈𝒴} | P_{Y|X∈x+W³}(y) − U_{Y|X∈x+W³}(y) |
≤ E_{x←P_X} Σ_{y∈𝒴} Σ_{χ≠1} Π_{i∈[3]} | 1̂_{i,x̄_i,y_i}(χ) |
= E_{x←P_X} Σ_{y∈𝒴} Σ_{χ≠1} Π_{i∈{2,3}} √(| 1̂_{1,x̄_1,y_1}(χ) |) · | 1̂_{i,x̄_i,y_i}(χ) |.

Now, we apply Cauchy-Schwarz on the inner product space whose elements are real-valued functions of (x, y, χ), and where the inner product is defined by ⟨f, g⟩ def= E_{x←P_X} Σ_{y∈𝒴} Σ_{χ≠1} f(x, y, χ) · g(x, y, χ). This bounds the above by

√( Π_{i∈{2,3}} E_{x←P_X} Σ_{y∈𝒴} Σ_{χ≠1} | 1̂_{1,x̄_1,y_1}(χ) | · 1̂_{i,x̄_i,y_i}(χ)² ).

By the independence of (X_1, Y_1) and (X_i, Y_i) under P for i ∈ {2, 3}, this is equal to

Π_{i∈{2,3}} √( Σ_{χ≠1} Σ_{y∈𝒴} E_{x←P_X}[ | 1̂_{1,x̄_1,y_1}(χ) | ] · E_{x←P_X}[ 1̂_{i,x̄_i,y_i}(χ)² ] )
= Π_{i∈{2,3}} √( |𝒴_{i′}| · Σ_{χ≠1} Σ_{y_1∈𝒴_1} E_{x←P_X}[ | 1̂_{1,x̄_1,y_1}(χ) | ] · Σ_{y_i∈𝒴_i} E_{x←P_X}[ 1̂_{i,x̄_i,y_i}(χ)² ] ),

where i′ denotes the other element of {2, 3} \ {i} (summing over y_{i′} contributes the factor |𝒴_{i′}|). But the function 1_{1,x̄_1,y_1} is just P_{Y_1|X̄_1=x̄_1}(y_1) · φ_{1,x̄_1,y_1}, so the above is

Π_{i∈{2,3}} √( |𝒴_{i′}| · Σ_{χ≠1} Σ_{y_1∈𝒴_1} E_{x←P_X}[ P_{Y_1|X̄_1=x̄_1}(y_1) · | φ̂_{1,x̄_1,y_1}(χ) | ] · Σ_{y_i∈𝒴_i} E_{x←P_X}[ 1̂_{i,x̄_i,y_i}(χ)² ] )

which by the definition of expectation is

Π_{i∈{2,3}} √( |𝒴_{i′}| · Σ_{χ≠1} E_{(x,y)←P_{X,Y}}[ | φ̂_{1,x̄_1,y_1}(χ) | ] · Σ_{y_i∈𝒴_i} E_{x←P_X}[ 1̂_{i,x̄_i,y_i}(χ)² ] ).
We use Eq. (5) to bound this by

Π_{i∈{2,3}} √( 2ε · |𝒴_{i′}| · Σ_{χ≠1} Σ_{y_i∈𝒴_i} E_{x←P_X}[ 1̂_{i,x̄_i,y_i}(χ)² ] )
≤ Π_{i∈{2,3}} √( 2ε · |𝒴_{i′}| · Σ_{y_i∈𝒴_i} E_{x←P_X}[ E_{x′←x̄_i}[ 1_{i,x̄_i,y_i}(x′)² ] ] )   (Parseval's Theorem)
= Π_{i∈{2,3}} √( 2ε · |𝒴_{i′}| · E_{x←P_X} E_{x′←x̄_i}[ Σ_{y_i∈𝒴_i} 1_{i,x̄_i,y_i}(x′) ] ).

But for y_i ≠ y′_i, the supports of 1_{i,x̄_i,y_i} and 1_{i,x̄_i,y′_i} are disjoint, so the inner sum is at most 1 and this is at most

Π_{i∈{2,3}} √(2ε · |𝒴_{i′}|) = 2ε · √(|𝒴_2| · |𝒴_3|).

Since the quantity we bounded was twice the expected statistical distance, this proves the lemma.

In this section we show that the parallel repeated GHZ query distribution has many coordinates in which the GHZ query distribution can be locally embedded, even conditioned on any affine event of low co-dimension. We first recall the notion of a local embedding.
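As a warm-up for Definition 5.1 below, the unconditioned case ˜P = Q^n admits a trivial local embedding: shared randomness supplies every repetition other than the j-th, and each player places its own question bit in coordinate j. A sketch with n = 2 and j = 0 (0-indexed), verified by exact enumeration:

```python
from itertools import product
from collections import Counter

# GHZ queries: uniform over {x in F_2^3 : x_1 + x_2 + x_3 = 0}
Q = [x for x in product((0, 1), repeat=3) if x[0] ^ x[1] ^ x[2] == 0]

def e(i, q_i, r):
    # player i's question vector: own bit q_i in coordinate 0, shared sample r in coordinate 1
    return (q_i, r[i])

law = Counter()
for q in Q:                       # q <- Q
    for r in Q:                   # r <- R; here R is simply Q, the other coordinate's query
        x = tuple(e(i, q[i], r) for i in range(3))    # x[i] = player i's question vector
        assert tuple(x[i][0] for i in range(3)) == q  # coordinate 0 equals q with probability 1
        law[x] += 1

# the induced law of ~X is exactly uniform on supp(Q^2)
target = Counter(tuple((a[i], b[i]) for i in range(3)) for a in Q for b in Q)
assert law == target
```

The interesting case handled in this section is that the same kind of embedding survives conditioning on a low-codimension affine event.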
Definition 5.1.
Let Σ be a finite set, let k and n be positive integers, let Q be a probability distribution on Σ^k, and let ˜P be a probability distribution on Σ^{k×n}.

We say that Q is locally embeddable in the j-th coordinate of ˜P if there exists a probability distribution R on a set R and functions e_1, ..., e_k : Σ × R → Σ^n such that when sampling q ← Q, r ← R, if ˜X denotes the Σ^{k×n}-valued random variable whose i-th row is e_i(q_i, r)^⊤, then:

1. The probability law of ˜X is exactly ˜P.
2. It holds with probability 1 that ˜X^j = q.

Proposition 5.2.
Let n and m be positive integers with m < n. Let Q denote the GHZ query distribution (uniform on the set Q = {x ∈ F_2³ : x_1 + x_2 + x_3 = 0}), and let W be an affine shift of V³ for a subspace V ≤ F_2^n of codimension m with Q^n(W) > 0.

Then there exist at least n − m distinct values of j ∈ [n] for which Q is locally embeddable in the j-th coordinate of ˜P def= Q^n|_W.

Proof. Suppose otherwise. Without loss of generality, suppose that the coordinates that are not locally embeddable include the first n′ def= m + 1 coordinates (otherwise, V can be permuted to make this so). That is, for each j ∈ [n′], Q is not locally embeddable in the j-th coordinate of ˜P.

Let the defining equations for V be written as V def= {x ∈ F_2^n : x·A = 0} for some choice of A ∈ F_2^{n×m}, and let v ∈ F_2^{3×n} be such that W = v + V³.

Because 2^{n′} > 2^m, the pigeonhole principle implies that there exist two distinct sets S_1, S_2 ⊆ [n′] such that

Σ_{j∈S_1} A_j = Σ_{j∈S_2} A_j,

where A_j denotes the j-th row of A. Thus, there is a non-empty subset S def= S_1 Δ S_2 ⊆ [n′] such that

Σ_{j∈S} A_j = 0.   (7)

Fix some such S. We will show that for any j ∈ S, Q is locally embeddable in the j-th coordinate of ˜P, which is a contradiction. Let X denote the F_2^{3×n}-valued random variable given by the identity function.

Claim 5.3.
For any j ∈ S, the distribution ˜P_{X^j} is identical to Q (i.e., uniformly random on Q).

Proof. Let j ∈ S be given. It suffices to show that for every q, q′ ∈ Q, there is a bijection Φ_{q,q′} : Q^n ∩ W → Q^n ∩ W such that x ∈ Q^n ∩ W satisfies x^j = q if and only if y def= Φ_{q,q′}(x) satisfies y^j = q′. Such a bijection Φ_{q,q′} can be constructed by defining, for all j′ ∈ [n],

Φ_{q,q′}(x)^{j′} = x^{j′} + q′ − q if j′ ∈ S, and x^{j′} otherwise.

Φ_{q,q′} clearly is an injective map from Q^n to Q^n and satisfies Φ_{q,q′}(x)^j = x^j + q′ − q, so the only remaining thing to check is that it indeed maps W into W. This is true because it preserves x_i · A. Indeed, for any i ∈ [3],

Φ_{q,q′}(x)_i · A = x_i · A + Σ_{j′∈S} (q′_i − q_i) · A_{j′} = x_i · A + (q′_i − q_i) · Σ_{j′∈S} A_{j′} = x_i · A   (by Eq. (7)).

For any j ∈ S, let ∆^{(j)} denote the random variable (X^{j′} − X^j)_{j′∈S\{j}}.

Claim 5.4.
For any j ∈ S, it holds in ˜P that (∆^{(j)}, X^{[n]\S}) and X^j are independent.

Proof. Equivalently (using the definition of ˜P), let E denote the event that X ∈ W, i.e., for all i ∈ [3], (X_i − v_i)·A = 0. We need to show that in P, the random variables X^j and (∆^{(j)}, X^{[n]\S}) are conditionally independent given E. To show this, we rely on the following fact:

Fact 5.5. If Y and Z are any independent random variables, and if E is any event that depends only on Z (and occurs with non-zero probability), then Y and Z are conditionally independent given E.

It is clear that X^j and (∆^{(j)}, X^{[n]\S}) are independent in P. It is also the case that E depends only on (∆^{(j)}, X^{[n]\S}): E is defined by the constraint that for all i ∈ [3],

0 = (X_i − v_i)·A = Σ_{j′∈S} (X_i^{j′} − X_i^j − v_i^{j′})·A_{j′} + Σ_{j′∈[n]\S} (X_i^{j′} − v_i^{j′})·A_{j′}   (by Eq. (7))
  = −v_i^j·A_j + Σ_{j′∈S\{j}} (X_i^{j′} − X_i^j − v_i^{j′})·A_{j′} + Σ_{j′∈[n]\S} (X_i^{j′} − v_i^{j′})·A_{j′},

where the middle sum depends only on ∆^{(j)} and the last sum depends only on X^{[n]\S}.
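The claims above can be checked exhaustively on a toy instance. The matrix A, right-hand side b, and set S below are illustrative choices satisfying Eq. (7), not objects taken from the paper:

```python
from itertools import product
from collections import Counter

Q = [x for x in product((0, 1), repeat=3) if x[0] ^ x[1] ^ x[2] == 0]
n = 3
A = [1, 1, 0]        # one F_2 constraint per player; rows 0 and 1 sum to 0, as in Eq. (7)
b = (0, 1, 1)        # per-player right-hand sides, chosen so the support is non-empty
S = [0, 1]

def in_W(x):         # x[j] is the j-th repetition's question triple
    return all(sum(A[j] * x[j][i] for j in range(n)) % 2 == b[i] for i in range(3))

support = [x for x in product(Q, repeat=n) if in_W(x)]
assert len(support) > 0

# Claim 5.3: for j in S, the marginal of coordinate j under ~P is uniform on Q
for j in S:
    counts = Counter(x[j] for x in support)
    assert set(counts) == set(Q) and len(set(counts.values())) == 1

# the bijection Phi_{q,q'} adds q' - q to every coordinate in S and maps Q^n ∩ W onto itself
def phi(x, q, qp):
    s = tuple(u ^ v for u, v in zip(q, qp))       # q' - q over F_2
    return tuple(tuple(c ^ d for c, d in zip(x[j], s)) if j in S else x[j]
                 for j in range(n))

for q in Q:
    for qp in Q:
        assert sorted(phi(x, q, qp) for x in support) == sorted(support)
```

Both properties needed by the proof, the uniform marginal and the support-preserving shift, hold exactly on this instance.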
We now put everything together. Fix any j ∈ S. We construct a local embedding of Q into the j-th coordinate of ˜P. For each i ∈ [3], we define e_i : F_2 × (F_2^{3×n}) → F_2^n such that for each j′ ∈ [n]:

e_i(x, r)^{j′} = x if j′ = j;  x + r_i^{j′} − r_i^j if j′ ∈ S \ {j};  r_i^{j′} if j′ ∉ S.

Define the distribution P^(embed) to be the distribution on x ∈ F_2^{3×n} obtained by independently sampling q ← Q and r ← ˜P, then defining x to be the matrix with rows e_1(q_1, r), e_2(q_2, r), e_3(q_3, r). It clearly holds with probability 1 that q = x^j.

Claim 5.6. P^(embed) ≡ ˜P.

Proof. By definition, it is immediate that:

P^(embed)_{X^j} ≡ ˜P_{X^j}  and  P^(embed)_{∆^{(j)}, X^{[n]\S}} ≡ ˜P_{∆^{(j)}, X^{[n]\S}}.

Finally, X is fully determined by X^j and (∆^{(j)}, X^{[n]\S}), which are independent in both P^(embed) (because q and r are sampled independently in the definition of P^(embed)) and ˜P (by Claim 5.4).

We have constructed an embedding of Q into one of the first n′ coordinates of ˜P, which is the desired contradiction.

In this section we show that if E is an arbitrary event with sufficient probability mass under P = Q^n_GHZ, then ˜P = P|_E can be decomposed into components with affine support that are "similar" to corresponding components of P. We will call such components pseudorandom.

We say that Π is an affine partition of F_2^{3×n} to mean that:

• Each part Π(x) of Π has the form w(x) + V(x)³, where V(x) is a subspace of F_2^n, and

• Each V(x) has the same dimension, which we refer to as the dimension of Π and denote by dim(Π).

The codimension of Π is defined to be n − dim(Π).

Definition 6.1. If W is an affine shift of a vector space V³ (for V ≤ F_2^n), we say that a W-valued random variable X is (m, ε)-close to Y if for all linear functions φ : F_2^n → F_2^m we have d_KL(φ³(X) ‖ φ³(Y)) ≤ ε, where φ³ denotes the function mapping (x_1, x_2, x_3) to (φ(x_1), φ(x_2), φ(x_3)). We write d_m(X ‖ Y) to denote the minimum ε for which X is (m, ε)-close to Y.
We remark that d_m(X ‖ Y) is a non-decreasing function of m.
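Definition 6.1 and this remark can be checked numerically in a stripped-down, single-copy setting (one vector rather than a triple, so φ rather than φ³; the event E below is an arbitrary illustrative choice):

```python
from itertools import product
from math import log

n = 3
SPACE = list(product((0, 1), repeat=n))

def pushforward(dist, M):
    """Law of phi(X) for the linear map phi(x) = M x over F_2 (M given as rows)."""
    out = {}
    for x, p in dist.items():
        y = tuple(sum(r[j] * x[j] for j in range(n)) % 2 for r in M)
        out[y] = out.get(y, 0.0) + p
    return out

def dkl(p, q):
    return sum(pv * log(pv / q[k]) for k, pv in p.items() if pv > 0)

E = [x for x in SPACE if x != (1, 1, 1)]      # a dense illustrative event
P = {x: 1 / len(SPACE) for x in SPACE}        # law of X (uniform)
Pt = {x: 1 / len(E) for x in E}               # law of X conditioned on E

def d_m(m):
    """Max over all linear phi: F_2^n -> F_2^m of KL(phi(X~) || phi(X))."""
    return max(dkl(pushforward(Pt, M), pushforward(P, M))
               for M in product(SPACE, repeat=m))

# monotonicity: any m-row map extends to an (m+1)-row map by appending a zero row
assert 0 <= d_m(1) <= d_m(2)
```

Monotonicity holds because every φ into F_2^m can be padded with a zero row into a map into F_2^{m+1} with the same pushforward divergence.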
Let P denote the distribution Q n GHZ , let X be the identity random variable, let E be an eventwith P ( X ∈ E ) = e − ∆ , and let ˜ P = P (cid:12)(cid:12) ( X ∈ E ) . For any δ > and any m ∈ Z + , there exists an affinepartition Π of F × n , of codimension at most m · ∆ δ , such that: E π ← ˜ P Π( X ) h d m (cid:16) ˜ P X | X ∈ π (cid:13)(cid:13)(cid:13) P X | X ∈ π (cid:17)i ≤ δ. (8)12 roof. We construct the claimed partition iteratively. Start with the trivial n -dimensional affine partitionΠ = { F × n } . Whenever Π i is a partition Π for which Eq. (8) does not hold, there exists a function φ i : F × n → F × m that: • When restricted to any part π of Π i , φ i is of the form φ i,π for some linear function φ i,π : F n → F m ,and • d KL (cid:16) ˜ P φ i ( X ) | Π i ( X ) (cid:13)(cid:13)(cid:13) P φ i ( X ) | Π i ( X ) (cid:17) > δ. (9)Without loss of generality, we additionally assume that each φ i,π is “full rank” when restricted to π .That is, if π is an affine shift of V , where V has dimension k , then the restriction of φ i,π to V is a linearmap of rank min( k, m ). It is clear that any φ i,π may be modified to be full rank without decreasing the KLdivergence of Eq. (9).Then by the chain rule for KL divergences, d KL (cid:16) ˜ P X | Π i ( X ) ,φ i ( X ) (cid:13)(cid:13)(cid:13) P X | Π i ( X ) ,φ i ( X ) (cid:17) < d KL (cid:16) ˜ P X | Π i ( X ) (cid:13)(cid:13)(cid:13) P X | Π i ( X ) (cid:17) − δ. (10)The left-hand side of Eq. (10) is equivalent to d KL (cid:16) ˜ P X | Π i +1 ( X ) (cid:13)(cid:13)(cid:13) P X | Π i +1 ( X ) (cid:17) with Π i +1 = (cid:8) π ∩ { x : φ i ( x ) = z } (cid:9) π ∈ Π i ,z ∈ F × m , which is an affine partition of dimension at least dim(Π) − m .Thus with the non-negative potential functionΦ(Π) def = d KL (cid:16) ˜ P X | Π( X ) (cid:13)(cid:13)(cid:13) P X | Π( X ) (cid:17) , we have Φ(Π i +1 ) < Φ(Π i ) − δ . But Φ(Π ) = − ln ( P ( E )) = ∆, so there must exist i ⋆ ≤ ∆ δ for which Eq. 
(8)holds with Π = Π i ⋆ , which has co-dimension at most m · ∆ δ . Proposition 7.1.
Let
W ⊆ F × n be an affine shift of a linear subspace V and let P be a the uniformdistribution on { w ∈ W : w + w + w = 0 } , which we assume to be non-empty. Let X denote the identityrandom variable, let E = E × E × E be an event with P ( X ∈ E ) = e − ∆ , and define ˜ P def = P (cid:12)(cid:12) ( X ∈ E ) .Suppose that ˜ P X is ( ⌈ δ ⌉ , δ ) -close to P X as in Definition 6.1, for δ satisfying δ ≤ min( ∆ · e − /ǫ , ∆ e , ǫ ) .Then for each j ∈ [ n ] , we have v j ( G n GHZ | ˜ P ) ≤ v j ( G n GHZ | P ) + 2 ǫ .Proof. Fix j ∈ [ n ] to be any coordinate, and let ˜ f = ˜ f × ˜ f × ˜ f : W → F be an arbitrary strategy. Let Y denote ˜ f ( X ). Claim 7.2.
There exists a subspace
U ≤ V of codimension at most ⌈ δ ⌉ such that: • The j th coordinate x j of x ∈ F × n depends only on x + U . • For all χ ∈ ˆ U , E ( x,y ) ← P X,Y h d KL (cid:16) P χ ( X ) | X ∈ x + U ,Y = y (cid:13)(cid:13)(cid:13) U χ ( X ) | X ∈ x + U (cid:17)i ≤ δ, where U denotes the uniform distribution on W . roof. Start with U = { u ∈ V : u j = 0 } (this ensures that any subspace U ≤ U satisfies the first desiredproperty). Define a potential function Z ( U ) def = dim( U ) − E ( x,y ) ← P X,Y (cid:2) H ( X | X ∈ x + U , Y = y ) (cid:3) , which is clearly non-negative. Additionally, Z ( U ) (and in particular Z ( U )) is at most 1 because for anysubspace U ≤ V and any x ∈ V , the entropy chain rule implies E y ← P Y | X ∈ x U (cid:2) H ( X | X ∈ x + U , Y = y ) (cid:3) = H ( X | X ∈ x + U ) − H ( Y | X ∈ x + U ) ≥ dim( U ) − . (in the first step we used the fact that Y is a function of X .For i ≥
1, define χ i ∈ ˆ U i \ { } to maximize b i def = E ( x,y ) ← P X,Y h d KL (cid:16) P χ i ( X ) | X ∈ x + U i ,Y = y (cid:13)(cid:13)(cid:13) U χ i ( X ) | X ∈ x + U i (cid:17)i = E ( x,y ) ← P X,Y h d KL (cid:16) P χ i ( X ) | X ∈ x + U i ,Y = y (cid:13)(cid:13)(cid:13) Unif {± } (cid:17)i = E ( x,y ) ← P X,Y h d KL (cid:16) P χ i ( X ) | X ∈ x + U i ,Y = y (cid:13)(cid:13)(cid:13) Unif {± } (cid:17)i = 1 − E ( x,y ) ← P X,Y h H (cid:0) χ i ( X ) | X ∈ x + U i , Y = y (cid:1)i , and define U i +1 def = { u ∈ U i : χ i ( u ) = 1 } . By the entropy chain rule, we have Z ( U i +1 ) ≤ Z ( U i ) − b i .Since the initial potential is at most 1, and all potentials are at least 0, there must be some i ⋆ ≤ ⌈ δ ⌉ forwhich b i ⋆ ≤ δ . The corresponding U i ⋆ is the desired subspace of V .Now let U be as given by Claim 7.2. By Lemma 4.4, we have E x ← P X d TV (cid:0) P Y | X ∈ x + U , Y i ∈ [3] P Y i | X i ∈ x i + U (cid:1) ≤ √ δ. By assumption of Proposition 7.1 (together with Pinsker’s inequality), P X + U and ˜ P X + U are q δ -closein total variational distance. We thus have that E x ← ˜ P X d TV (cid:0) P Y | X ∈ x + U , Y i ∈ [3] P Y i | X i ∈ x i + U (cid:1) ≤ √ δ, (11)by the general fact that if P and Q are two distributions that are ǫ -close in total variational distance, andif X is a B -bounded random variable, then (cid:12)(cid:12) E P [ X ] − E Q [ X ] (cid:12)(cid:12) ≤ Bǫ .We now obtain a probabilistic lower bound on P ( E | X + U ). We first lower bound its log-expectation: E x ← ˜ P X h − ln P (cid:0) E | X ∈ x + U (cid:1)i = E x ← ˜ P X h d KL (cid:0) ˜ P X | X ∈ x + U k P X | X ∈ x + U (cid:1)i ≤ d KL (cid:0) ˜ P X k P X (cid:1) (Fact A.17) ≤ ∆ . Markov’s inequality then implies that for any τ ,Pr x ← ˜ P X (cid:2) P ( E | X ∈ x + U ) ≤ τ (cid:3) ≤ ∆ln(1 /τ ) . (12)14ombining Eq. (12) with Eq. (11) and Fact A.18, we get E x ← ˜ P X d TV (cid:0) ˜ P Y | X ∈ x + U , Y i ∈ [3] ( P | X i ∈ E i ) Y i | X i ∈ x i + U (cid:1) ≤ ∆ln(1 /τ ) + 4 √ δτ . 
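The right-hand side is a trade-off between a term that improves as τ shrinks and one that blows up; Corollary C.2 (Appendix C) balances the two via the Lambert W function. The following Python sketch (sample values and helper names are ours, purely illustrative) compares the balancing choice of τ against a brute-force grid search:

```python
import math

# Minimize g(tau) = A/ln(1/tau) + B/tau over tau in (0, 1].
# A and B below are arbitrary sample values with A >= e*B, as in Corollary C.2.

def g(tau, A, B):
    return A / math.log(1.0 / tau) + B / tau

def lambert_w(y, iters=50):
    """Newton iteration for the principal branch: solves z * e^z = y (y >= e)."""
    z = math.log(y)
    for _ in range(iters):
        z -= (z * math.exp(z) - y) / ((z + 1.0) * math.exp(z))
    return z

A, B = 10.0, 0.01
# Balancing choice: ln(1/tau) = W(A/B), i.e. A/ln(1/tau) = B/tau.
tau_star = math.exp(-lambert_w(A / B))
grid_min = min(g(10.0 ** (-k / 100.0), A, B) for k in range(1, 1000))
# The balancing point comes within a factor of two of the brute-force minimum.
assert g(tau_star, A, B) <= 2.0 * grid_min
```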
Since this holds for all τ ∈ (0, 1] and because δ ≤ Δ²/(16e²), Corollary C.2 (with A = Δ and B = 4√δ) implies that

E_{x ← P̃_X} d_TV( P̃_{Y | X∈x+U}, ∏_{i∈[3]} (P | X_i∈E_i)_{Y_i | X_i∈x_i+U} ) ≤ 4Δ / ln( Δ/(4√δ) ) ≤ ε,    (13)

where the last inequality follows from our assumption that δ ≤ (Δ²/16)·e^{−8Δ/ε}.

Putting everything together, we have

P̃_{X+U, Y} = P̃_{X+U} · P̃_{Y | X+U} ≈_ε P̃_{X+U} · ∏_{i∈[3]} (P | X_i∈E_i)_{Y_i | X_i+U} ≈_{√(δ/2)} P_{X+U} · ∏_{i∈[3]} (P | X_i∈E_i)_{Y_i | X_i+U},

where ≈ denotes closeness in total variational distance.

But P_{X+U} · ∏_{i∈[3]} (P | X_i∈E_i)_{Y_i | X_i+U} is just the distribution on (x+U, y) obtained by sampling x ← P_X and y ← F(x), where F = F_1 × F_2 × F_3 is the following randomized strategy: on input x_i, F_i uses local randomness to sample and output y_i ← (P | X_i∈E_i)_{Y_i | X_i∈x_i+U}. By Fact 3.8, the probability that W(x_j, y) = 1 (which is well-defined because x_j is a function of x+U) is at most v_j(G^n_GHZ | P). We thus have

v_j[f̃](G^n_GHZ | P̃) = P̃_{X+U, Y}( W(X_j, Y) = 1 ) ≤ v_j(G^n_GHZ | P) + ε + √(δ/2) ≤ v_j(G^n_GHZ | P) + 2ε.

Since this holds for arbitrary f̃, we have v_j(G^n_GHZ | P̃) ≤ v_j(G^n_GHZ | P) + 2ε.

Theorem 8.1. If G = (X, Y, Q, W) denotes the GHZ game, then v(G^n) ≤ n^{−Ω(1)}.

Proof.
Recall that v(G) = 3/4. Let P denote Q^n; that is, P is uniform on {(X_1, X_2, X_3) ∈ F^{3×n} : X_1 + X_2 + X_3 = 0}. Let E = E_1 × E_2 × E_3 be any product event in F^{3×n} with P(E) ≥ e^{−Δ} (where Δ is a parameter we will specify later), and let P̃ denote P | E.

Let δ > 0 be a parameter we will specify later, and let m = ⌈1/δ⌉. Recall our definition of d_m (Definition 6.1). Lemma 6.2 states that there exists an affine partition Π of F^{3×n}, of codimension at most m·Δ/δ, such that:

E_{π ← P̃_{Π(X)}} [ d_m( P̃_{X|X∈π} ‖ P_{X|X∈π} ) ] ≤ δ.

Moreover,

E_{π ← P̃_{Π(X)}} [ d_∞( P̃_{X|X∈π} ‖ P_{X|X∈π} ) ] = d_KL( P̃_{X|Π(X)} ‖ P_{X|Π(X)} ) ≤ d_KL( P̃_X ‖ P_X ) ≤ Δ.

Markov's inequality thus implies that with probability at least 1/3 over π ← P̃_{Π(X)}, it holds that

d_m( P̃_{X|X∈π} ‖ P_{X|X∈π} ) ≤ 3δ  and  d_∞( P̃_{X|X∈π} ‖ P_{X|X∈π} ) ≤ 3Δ.

Call such π pseudorandom, and let R denote the set of pseudorandom π. By Proposition 7.1, for each pseudorandom π we have

v_j( G^n | (P̃|π) ) ≤ v_j( G^n | (P|π) ) + 2ε    (14)

as long as

3δ ≤ min( (9Δ²/16)·e^{−24Δ/ε}, 9Δ²/(16e²), ε²/2 ),    (15)

where ε is a parameter we will specify later.

By Proposition 5.2, for each π ∈ Π (with P(π) > 0) and all but m·Δ/δ values of j ∈ [n], we have v_j( G^n | (P|π) ) = v(G) = 3/4. By averaging, there exists some j⋆ ∈ [n] such that

E_{π ← P̃_{Π(X)} | Π(X)∈R} [ v_{j⋆}( G^n | (P|π) ) ] ≤ mΔ/(nδ) + (1 − mΔ/(nδ))·(3/4),

which is at most 7/8 as long as

δ ≥ 8mΔ/n.    (16)

Putting everything together, we have

v_{j⋆}( G^n | P̃ ) ≤ E_{π ← P̃_{Π(X)}} [ v_{j⋆}( G^n | (P̃|π) ) ]
 ≤ Pr_{π ← P̃_{Π(X)}}[π ∉ R] + Pr_{π ← P̃_{Π(X)}}[π ∈ R] · E_{π ← P̃_{Π(X)} | Π(X)∈R} [ v_{j⋆}( G^n | (P̃|π) ) ]
 ≤ 2/3 + (1/3)·(7/8 + 2ε)
 = 23/24 + 2ε/3,

which is bounded away from 1 for any sufficiently small constant ε > 0. Setting ε to be such a constant, and Δ = c·ln n, δ = n^{−c'}, m = n^{c'} for suitably small constants c, c' > 0, ensures that the constraints (15) and (16) are all satisfied for sufficiently large n.

Applying Lemma 8.2 below with ρ(n) = n^{−c} and this choice of ε completes the proof.

Lemma 8.2 (Parallel Repetition Criterion). Let G = (X, Y, Q, W) be a game, and let P_n denote Q^n. Suppose ρ : Z^+ → R is a function with ρ(n) ≥ e^{−O(n)} and ε > 0 is a constant such that for all E = E_1 × ··· × E_k ⊆ X^n with P_n(E) ≥ ρ(n), there exists j such that v_j( G^n | (P_n|E) ) ≤ 1 − ε. Then v(G^n) ≤ ρ(n)^{Ω(1)}.

Proof.
Fix any f = f_1 × ··· × f_k : X^n → Y^n. Consider the probability space defined by sampling X ← P_n, and let Y = f(X). We define additional random variables J_1, ..., J_n ∈ [n] and Z_1, ..., Z_n ∈ X × Y, where J_1 is an arbitrary fixed value, Z_i := (X_{J_i}, Y_{J_i}) for all i, and J_{i+1} depends deterministically on Z_{≤i} := (Z_1, ..., Z_i) as follows: when Z_{≤i} = z_{≤i}, J_{i+1} is defined to be a value j ∈ [n] that minimizes P_n( W(X_j, Y_j) = 1 | Z_{≤i} = z_{≤i} ).

With these definitions, each event {Z_{≤i} = z_{≤i}} is a product event. In particular, if P_n(Z_{≤i} = z_{≤i}) ≥ ρ(n), then P_n( W(X_{J_{i+1}}, Y_{J_{i+1}}) = 1 | Z_{≤i} = z_{≤i} ) ≤ 1 − ε.

Let Win_i denote the event that W(Z_i) = 1, let Win_{≤i} denote the event Win_1 ∧ ··· ∧ Win_i, and let w_i denote P_n(Win_{≤i}). Since Win_{≤i} is the union of some subset of the |X|^i · |Y|^i disjoint product events {Z_{≤i} = z_{≤i}}, we have

Pr_{z_{≤i} ← (P_n)_{Z_{≤i} | Win_{≤i}}} [ P_n(Z_{≤i} = z_{≤i}) ≥ ρ(n) ] ≥ 1 − |X|^i · |Y|^i · ρ(n) / w_i.

Moreover, for all z_{≤i} for which P_n(Z_{≤i} = z_{≤i}) ≥ ρ(n), we know that P_n(Win_{i+1} | Z_{≤i} = z_{≤i}) ≤ 1 − ε. Thus as long as w_i ≥ 2 · |X|^i · |Y|^i · ρ(n), we have

w_{i+1} = w_i · P_n(Win_{i+1} | Win_{≤i})
 = w_i · E_{z_{≤i} ← (P_n)_{Z_{≤i} | Win_{≤i}}} [ P_n(Win_{i+1} | Z_{≤i} = z_{≤i}) ]
 ≤ w_i · ( Pr_{z_{≤i} ← (P_n)_{Z_{≤i} | Win_{≤i}}} [ P_n(Z_{≤i} = z_{≤i}) < ρ(n) ] + Pr_{z_{≤i} ← (P_n)_{Z_{≤i} | Win_{≤i}}} [ P_n(Z_{≤i} = z_{≤i}) ≥ ρ(n) ] · (1 − ε) )
 ≤ w_i · ( 1/2 + (1/2)·(1 − ε) )
 = w_i · (1 − ε/2).

Iterating this inequality as long as the condition w_i ≥ 2·|X|^i·|Y|^i·ρ(n) is satisfied, we find i⋆ such that

w_{i⋆} ≤ min( 2·|X|^{i⋆}·|Y|^{i⋆}·ρ(n), (1 − ε/2)^{i⋆} ).

This is minimized for i⋆ = Θ(log(1/ρ(n))) or i⋆ = n, and gives v(G^n) ≤ w_{i⋆} ≤ ρ(n)^{Ω(1)}.

A Probability Theory
We recall the notions of probability theory that we will need.
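As a concrete companion to the definitions below, a distribution on a finite set can be represented as a Python dict mapping outcomes to probabilities. The following is a minimal sketch (all helper names are ours, purely illustrative):

```python
# Finite probability distributions as Python dicts: {outcome: probability}.

def is_distribution(p, tol=1e-12):
    """Check P(w) >= 0 for all w and that the probabilities sum to 1."""
    return all(v >= 0 for v in p.values()) and abs(sum(p.values()) - 1.0) < tol

def prob(p, event):
    """P(E) = sum of P(w) over w in the event E (a subset of outcomes)."""
    return sum(p[w] for w in p if w in event)

def condition(p, event):
    """(P | E)(w) = P(w)/P(E) on E, and 0 elsewhere."""
    pe = prob(p, event)
    return {w: (p[w] / pe if w in event else 0.0) for w in p}

p = {"a": 0.5, "b": 0.25, "c": 0.25}
assert is_distribution(p)
assert prob(p, {"a", "b"}) == 0.75
q = condition(p, {"a", "b"})
assert abs(q["a"] - 2 / 3) < 1e-12 and q["c"] == 0.0
```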
Definition A.1. A probability distribution on a finite set Ω is a function P : Ω → R satisfying P(ω) ≥ 0 for all ω ∈ Ω and Σ_{ω∈Ω} P(ω) = 1. We extend the domain of P to subsets of Ω by writing P(E) to denote Σ_{ω∈E} P(ω) for any "event" E ⊆ Ω.

Definition A.2. The support of P : Ω → R is the set {ω ∈ Ω : P(ω) > 0}.

Definition A.3. A Σ-valued random variable on a sample space Ω is a function X : Ω → Σ.

Definition A.4 (Expectations). If P : Ω → R is a probability distribution and X : Ω → R is a random variable, the expectation of X under P, denoted E_P[X], is defined to be Σ_{ω∈Ω} P(ω)·X(ω).

We refer to subsets of Ω as events. We use standard shorthand for denoting events. For instance, if X is a Σ-valued random variable and x ∈ Σ, we write X = x to denote the event {ω ∈ Ω : X(ω) = x}.

Definition A.5 (Indicator Random Variables). For any event E, we write 1_E to denote a random variable defined as 1_E(ω) = 1 if ω ∈ E and 0 otherwise.

Definition A.6 (Independence). Events E_1, ..., E_k ⊆ Ω are said to be independent under a probability distribution P if P(E_1 ∩ ··· ∩ E_k) = ∏_{i∈[k]} P(E_i). Random variables X_1, ..., X_k are said to be independent if the events X_1 = x_1, ..., X_k = x_k are independent for any choice of x_1, ..., x_k.

Definition A.7 (Conditional Probabilities). If P : Ω → R is a probability distribution and E ⊆ Ω is an event with P(E) > 0, then the conditional distribution of P given E is denoted (P|E) : Ω → R and is defined by (P|E)(ω) = P(ω)/P(E) if ω ∈ E and 0 otherwise.

If X is a random variable and P is a probability distribution, we write P_X to denote the induced distribution of X under P. That is, P_X(x) = P(X = x). If E is an event, we write P_{X|E} as shorthand for (P|E)_X.

Definition A.8 (Entropy). If P : Ω → R is a probability distribution, the entropy (in nats) of P is

H(P) := −Σ_{ω∈Ω} P(ω)·ln(P(ω)).

When X is a random variable associated with a probability distribution P, we sometimes write H(X) as shorthand for H(P_X).

Definition A.9 (Conditional Entropy). If P is a probability measure with random variables X and Y, we write

H(P_{X|Y}) := E_{y←P_Y}[ H(P_{X|Y=y}) ].
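Entropy and conditional entropy translate directly into code for a joint distribution given as a dict over pairs (x, y). The sketch below (helper names ours, purely illustrative) also checks the chain rule stated next as Fact A.10:

```python
import math

# Entropy in nats of a distribution given as {outcome: probability}.
def H(p):
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def marginal_y(p):
    """Marginal of the second coordinate of a joint distribution {(x, y): prob}."""
    out = {}
    for (x, y), v in p.items():
        out[y] = out.get(y, 0.0) + v
    return out

def H_X_given_Y(p):
    """H(P_{X|Y}) = E_{y <- P_Y}[ H(P_{X|Y=y}) ]."""
    py = marginal_y(p)
    total = 0.0
    for y, vy in py.items():
        cond = {x: v / vy for (x, yy), v in p.items() if yy == y}
        total += vy * H(cond)
    return total

# A correlated joint distribution on {0,1} x {0,1}.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
# Chain rule (Fact A.10): H(X|Y) = H(X,Y) - H(Y).
assert abs(H_X_given_Y(p) - (H(p) - H(marginal_y(p)))) < 1e-12
```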
Fact A.10 (Chain Rule of Conditional Entropy). For any probability measure P and any random variables X, Y, it holds that

H(P_{X|Y}) = H(P_{X,Y}) − H(P_Y).

A.1 Divergences
Definition A.11 (Total Variational Distance). If P, Q : Ω → R are two probability distributions, then the total variational distance between P and Q, denoted d_TV(P, Q), is

d_TV(P, Q) := max_{E⊆Ω} | P(E) − Q(E) |.

An equivalent definition is

d_TV(P, Q) = (1/2) Σ_{ω∈Ω} | P(ω) − Q(ω) |.

Definition A.12 (Kullback-Leibler (KL) Divergence). If P, Q : Ω → R are probability distributions, the Kullback-Leibler divergence of P from Q is

d_KL(P ‖ Q) := Σ_{ω∈Ω} P(ω) · ln( P(ω)/Q(ω) ),

where terms of the form p·ln(p/0) are treated as 0 if p = 0 and +∞ otherwise, and terms of the form 0·ln(0/q) are treated as 0.

The following relation between total variational distance and Kullback-Leibler divergence, known as Pinsker's inequality, is of fundamental importance.
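Both divergences are easy to compute for dict-represented distributions; the sketch below (helper names ours, purely illustrative) also spot-checks Pinsker's inequality (Theorem A.13) on one example:

```python
import math

# Total variational distance and KL divergence for distributions given as
# dicts over the same finite set.

def d_tv(p, q):
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in set(p) | set(q))

def d_kl(p, q):
    total = 0.0
    for w, pw in p.items():
        if pw == 0:
            continue  # 0 * ln(0/q) is treated as 0
        qw = q.get(w, 0.0)
        if qw == 0:
            return math.inf  # p * ln(p/0) with p > 0 is +infinity
        total += pw * math.log(pw / qw)
    return total

p = {"a": 0.7, "b": 0.2, "c": 0.1}
q = {"a": 0.5, "b": 0.25, "c": 0.25}
# Pinsker's inequality (Theorem A.13): d_TV <= sqrt(d_KL / 2).
assert d_tv(p, q) <= math.sqrt(d_kl(p, q) / 2)
```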
Theorem A.13 (Pinsker’s Inequality) . For any probability distributions
P, Q : Ω → R , it holds that d TV ( P, Q ) ≤ q d KL ( P k Q ) . efinition A.14 (Conditional KL Divergence) . If P, Q : Ω → R are probability distributions and if W , X , Y , and Z are random variables on Ω , we write d KL ( P W | X k Q Y | Z ) def = E x ← P X (cid:2) d KL ( P W | X = x k Q Y | Z = x ) (cid:3) , which is taken to be + ∞ if there exists x with P X ( x ) > but Q Z ( x ) = 0 . KL divergence obeys a chain rule analogous to that for entropy.
Fact A.15 (Chain Rule for KL Divergence). If P, Q : Ω → R are probability distributions and W, X, Y, Z are random variables on Ω, then

d_KL( P_{W,X} ‖ Q_{Y,Z} ) = d_KL( P_X ‖ Q_Z ) + d_KL( P_{W|X} ‖ Q_{Y|Z} ).

A.2 Conditional KL Divergence
Fact A.16. If P : Ω → R is a probability distribution and E ⊆ Ω is an event, then

d_KL( P|E ‖ P ) = ln( 1/P(E) ).

Fact A.17. Let P, Q : Ω → R be probability distributions and let X, Y be random variables on Ω with Y a function of X. Then

d_KL( P_{X|Y} ‖ Q_{X|Y} ) ≤ d_KL( P_X ‖ Q_X ).

Proof. This is well known, but for completeness:

d_KL( P_{X|Y} ‖ Q_{X|Y} ) = d_KL( P_{X,Y} ‖ Q_{X,Y} ) − d_KL( P_Y ‖ Q_Y )   (chain rule)
 = d_KL( P_X ‖ Q_X ) − d_KL( P_Y ‖ Q_Y )   (Y is a function of X)
 ≤ d_KL( P_X ‖ Q_X ).   (non-negativity of KL)

A.3 Conditional Statistical Distance
Fact A.18. Let P, Q : Ω → R be probability distributions, and let E ⊆ Ω be an arbitrary event. Then

d_TV( P|E, Q|E ) ≤ 2 · d_TV(P, Q) / P(E).

Proof. Suppose for the sake of contradiction that for some A ⊆ E, we have

| (P|E)(A) − (Q|E)(A) | > 2 · d_TV(P, Q) / P(E).

Multiplying both sides by P(E), we obtain

| P(A) − P(E)·(Q|E)(A) | > 2 · d_TV(P, Q).

Since |P(E) − Q(E)| ≤ d_TV(P, Q) and (Q|E)(A) ≤ 1, we have

| P(A) − Q(A) | > d_TV(P, Q),

which is a contradiction.

Corollary A.19.
Let P : Ω → R be a probability distribution, let X, Y and Z be random variables on Ω, let E be an event such that Pr_{z←P_Z}[ P(E | Z=z) ≥ δ ] ≥ 1 − τ, and let P̃ denote P|E. Then

E_{z←P_Z}[ d_TV( P̃_{X|Z=z}, P̃_{Y|Z=z} ) ] ≤ τ + 2 · E_{z←P_Z}[ d_TV( P_{X|Z=z}, P_{Y|Z=z} ) ] / δ.

Proof.

E_{z←P_Z}[ d_TV( P̃_{X|Z=z}, P̃_{Y|Z=z} ) ]
 = E_{z←P_Z}[ 1_{P(E|Z=z)<δ} · d_TV( P̃_{X|Z=z}, P̃_{Y|Z=z} ) + 1_{P(E|Z=z)≥δ} · d_TV( P̃_{X|Z=z}, P̃_{Y|Z=z} ) ]
 ≤ τ + E_{z←P_Z}[ 1_{P(E|Z=z)≥δ} · d_TV( P̃_{X|Z=z}, P̃_{Y|Z=z} ) ]
 ≤ τ + E_{z←P_Z}[ 1_{P(E|Z=z)≥δ} · 2 · d_TV( P_{X|Z=z}, P_{Y|Z=z} ) / P(E|Z=z) ]
 ≤ τ + 2 · E_{z←P_Z}[ d_TV( P_{X|Z=z}, P_{Y|Z=z} ) ] / δ.

B Fourier Analysis
For any (finite) vector space V over F, the character group of V, denoted V̂, is the set of group homomorphisms mapping V (viewed as an additive group) to {±1} (viewed as a multiplicative group). Each such homomorphism is called a character of V.

We will distinguish the space of functions mapping V → R from the space of functions mapping V̂ → R, and view them as two different inner product spaces. For functions mapping V → R, we define the inner product

⟨f, g⟩ := E_{x←V}[ f(x) g(x) ],

and for functions mapping V̂ → R, we define the inner product

⟨f̂, ĝ⟩ := Σ_{χ∈V̂} f̂(χ)·ĝ(χ).

If there is danger of ambiguity, we use ⟨·,·⟩^ to denote the latter inner product, and ‖·‖^ to denote its corresponding norm.

Fact B.1.
Given a choice of basis for V, there is a canonical isomorphism between V and V̂. Specifically, if V = F^n, then the characters of V are the functions of the form χ_γ(v) = (−1)^{γ·v} for γ ∈ F^n.

Definition B.2. For any function f : V → R, its Fourier transform is the function f̂ : V̂ → R defined by

f̂(χ) := ⟨f, χ⟩ = E_{x←V}[ f(x) χ(x) ].

One can verify that the characters of V are orthonormal. Together with the assumption that V is finite, we can deduce that f is equal to Σ_{χ∈V̂} f̂(χ)·χ.

Theorem B.3 (Plancherel). For any f, g : V → R, ⟨f, g⟩ = ⟨f̂, ĝ⟩.

An important special case of Plancherel's theorem is Parseval's theorem:
Theorem B.4 (Parseval). For any f : V → R, ‖f‖ = ‖f̂‖.

C Bound on Optimization Problem
Let W : R^+ → R^+ denote the inverse of the function x ↦ x·e^x (W is known in the literature as the (principal branch of the) Lambert W function). We rely on the following theorem:

Theorem C.1 ([HH00, Corollary 2.4]). There exists a constant C (in particular, C = ln(1 + e^{−1}) works) such that for all y ≥ e,

W(y) ≤ ln y − ln ln y + C.

The following corollary is more directly suited to our needs.
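Theorem C.1 is easy to sanity-check numerically: a short Newton iteration inverts z·e^z, and the gap W(y) − (ln y − ln ln y) stays within [0, 1) on sample inputs. The iteration and the crude envelope below are ours, not from [HH00]:

```python
import math

def lambert_w(y, iters=60):
    """Newton iteration for the principal branch: solves z * e^z = y (y >= e)."""
    z = math.log(y)
    for _ in range(iters):
        z -= (z * math.exp(z) - y) / ((z + 1.0) * math.exp(z))
    return z

for y in [math.e, 10.0, 1e3, 1e6, 1e12]:
    w = lambert_w(y)
    assert abs(w * math.exp(w) - y) < 1e-6 * y  # it really inverts z * e^z
    gap = w - (math.log(y) - math.log(math.log(y)))
    assert 0.0 <= gap < 1.0  # crude version of the Theorem C.1 envelope
```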
Corollary C.2. For any A, B > 0 satisfying A ≥ eB,

min_{τ∈(0,1]} ( A/ln(1/τ) + B/τ ) ≤ 4A / ln(A/B).

Proof. The minimum is achieved (up to a factor of two) when A/ln(1/τ) = B/τ, because A/ln(1/τ) is monotonically increasing in τ while B/τ is monotonically decreasing. Making the change of variables z = −ln(τ), this balancing condition is equivalent to z·e^z = A/B, i.e. z = W(A/B). This choice of z (or equivalently τ) gives

A/ln(1/τ) + B/τ = 2A / W(A/B)
 = 2B · (A/B) / W(A/B)
 = 2B · exp( W(A/B) )   (definition of W)
 ≤ 2A·(1 + e^{−1}) / ln(A/B)   (Theorem C.1)
 ≤ 4A / ln(A/B).

References

[BGKW88] Michael Ben-Or, Shafi Goldwasser, Joe Kilian, and Avi Wigderson. Multi-prover interactive proofs: How to remove intractability assumptions. In
STOC, pages 113–131. ACM, 1988.

[BJKS04] Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. J. Comput. Syst. Sci., 68(4):702–732, 2004.

[CHTW04] Richard Cleve, Peter Høyer, Benjamin Toner, and John Watrous. Consequences and limits of nonlocal strategies. In CCC, pages 236–249. IEEE Computer Society, 2004.

[DHVY17] Irit Dinur, Prahladh Harsha, Rakesh Venkat, and Henry Yuen. Multiplayer parallel repetition for expanding games. In ITCS, volume 67 of LIPIcs, pages 37:1–37:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017.

[EPR35] Albert Einstein, Boris Podolsky, and Nathan Rosen. Can quantum-mechanical description of physical reality be considered complete? Physical Review, 47(10):777, 1935.

[Fei91] Uriel Feige. On the success probability of the two provers in one-round proof systems. In Structure in Complexity Theory Conference, pages 116–123. IEEE Computer Society, 1991.

[FGL+91] Uriel Feige, Shafi Goldwasser, László Lovász, Shmuel Safra, and Mario Szegedy. Approximating clique is almost NP-complete (preliminary version). In FOCS, pages 2–12. IEEE Computer Society, 1991.

[FK91] H. Furstenberg and Y. Katznelson. A density version of the Hales-Jewett theorem. Journal d'Analyse Mathématique, 57(1):64–119, December 1991.

[For89] Lance Jeremy Fortnow. Complexity-theoretic aspects of interactive proof systems. PhD thesis, MIT, 1989.

[FRS94] Lance Fortnow, John Rompel, and Michael Sipser. On the power of multi-prover interactive protocols. Theor. Comput. Sci., 134(2):545–557, 1994.

[FV96] Uriel Feige and Oleg Verbitsky. Error reduction by parallel repetition - a negative result. In Steven Homer and Jin-Yi Cai, editors, CCC, pages 70–76. IEEE Computer Society, 1996.

[GHZ89] Daniel M. Greenberger, Michael A. Horne, and Anton Zeilinger. Going Beyond Bell's Theorem, pages 69–72. Springer Netherlands, Dordrecht, 1989.

[HH00] Abdolhossein Hoorfar and Mehdi Hassani. Inequalities on the Lambert W function and hyperpower function. J. Inequal. Pure and Appl. Math, 2000.

[Hol09] Thomas Holenstein. Parallel repetition: Simplification and the no-signaling case. Theory Comput., 5(1):141–172, 2009.

[HY19] Justin Holmgren and Lisa Yang. The parallel repetition of non-signaling games: counterexamples and dichotomy. In STOC, pages 185–192. ACM, 2019.

[MS13] Carl A. Miller and Yaoyun Shi. Optimal robust self-testing by binary nonlocal XOR games. In TQC, volume 22 of LIPIcs, pages 254–262. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2013.

[Pol12] D.H.J. Polymath. A new proof of the density Hales-Jewett theorem. Annals of Mathematics, 175(3):1283–1327, May 2012.

[PRW97] Itzhak Parnafes, Ran Raz, and Avi Wigderson. Direct product results and the GCD problem, in old and new communication models. In Frank Thomson Leighton and Peter W. Shor, editors, STOC, pages 363–372. ACM, 1997.

[Raz98] Ran Raz. A parallel repetition theorem. SIAM J. Comput., 27(3):763–803, 1998.

[Raz11] Ran Raz. A counterexample to strong parallel repetition. SIAM J. Comput., 40(3):771–777, 2011.

[Ver96] Oleg Verbitsky. Towards the parallel repetition conjecture. Theor. Comput. Sci., 157(2):277–282, 1996.

[Yue16] Henry Yuen. A parallel repetition theorem for all entangled games. In ICALP, volume 55 of