A Parallel Repetition Theorem for the GHZ Game
Justin Holmgren ∗ Ran Raz † August 13, 2020
Abstract
We prove that parallel repetition of the (3-player) GHZ game reduces the value of the game polynomially fast to 0. That is, the value of the GHZ game repeated in parallel t times is at most t^{-Ω(1)}. Previously, only a bound of ≈ 1/α(t), where α is the inverse Ackermann function, was known [Ver96].

The GHZ game was recently identified by Dinur, Harsha, Venkat and Yuen as a multi-player game where all existing techniques for proving strong bounds on the value of the parallel repetition of the game fail. Indeed, to prove our result we use a completely new proof technique. Dinur, Harsha, Venkat and Yuen speculated that progress on bounding the value of the parallel repetition of the GHZ game may lead to further progress on the general question of parallel repetition of multi-player games. They suggested that the strong correlations present in the GHZ question distribution represent the "hardest instance" of the multi-player parallel repetition problem [DHVY17].

Another motivation for studying the parallel repetition of the GHZ game comes from the field of quantum information. The GHZ game, first introduced by Greenberger, Horne and Zeilinger [GHZ89], is a central game in the study of quantum entanglement and has been studied in numerous works. For example, it is used for testing quantum entanglement and for device-independent quantum cryptography. In such applications a game is typically repeated to reduce the probability of error, and hence bounds on the value of the parallel repetition of the game may be useful.

∗ NTT Research. E-mail: [email protected]. Research conducted at Princeton University, supported in part by the Simons Collaboration on Algorithms and Geometry and NSF grant No. CCF-1714779.
† Department of Computer Science, Princeton University. E-mail: [email protected]. Research supported by the Simons Collaboration on Algorithms and Geometry, by a Simons Investigator Award and by the National Science Foundation grants No. CCF-1714779, CCF-2007462.
Contents

A.1 Divergences
A.2 Conditional KL Divergence
A.3 Conditional Statistical Distance
B Fourier Analysis
C Bound on Optimization Problem

1 Introduction
In a k-player game, players are given correlated "questions" (q_1, ..., q_k) sampled from a distribution Q and must produce corresponding "answers" (a_1, ..., a_k) such that (q_1, ..., q_k, a_1, ..., a_k) satisfy a fixed predicate π. Crucially, the players are not allowed to communicate amongst themselves after receiving their questions (but they may agree upon a strategy beforehand). The value of the game is the probability with which the players can win with an optimal strategy. Multi-player games play a central role in theoretical computer science due to their intimate connection with multi-prover interactive proofs (MIPs) [BGKW88] and hardness of approximation [FGL+96].

A basic operation on games is parallel repetition. In the t-wise parallel repetition of a game, question tuples (q_1^(i), ..., q_k^(i)) are sampled independently for i ∈ [t]. The j-th player is given (q_j^(1), ..., q_j^(t)), and is required to produce (a_j^(1), ..., a_j^(t)). The players win if for every i ∈ [t], (a_1^(i), ..., a_k^(i)) is a winning answer for questions (q_1^(i), ..., q_k^(i)). Parallel repetition was first proposed in [FRS94] as an intuitive attempt to reduce the value of a game from ε to ε^t, but in general this is not what happens [For89, Fei91, FV96, Raz11]. The actual effect is far more subtle, and a summary of some of the known results is given in Table 1.

                  Two-player games         Three or more players
  Classical       2^{-Ω(t)} [Raz98]        ≈ 1/α(t) [Ver96]
  Entangled       t^{-Ω(1)} [Yue16]        O(1) (trivial)
  Non-signaling   exp(-Ω(t)) [Hol09]       Ω(1) [HY19]

Table 1: Known bounds on the worst-case (slowest) decay of the value of the t-wise parallel repetition of a non-trivial game. α denotes the inverse Ackermann function.

Much less is known about games with three or more players than about two-player games. Only very weak bounds are known on how t-wise parallel repetition decreases the value of a three-player game (as a function of t).
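To make the definitions concrete, the following sketch brute-forces the value of a toy two-player game and of its 2-wise parallel repetition. The XOR-of-AND predicate used here (a_1 ⊕ a_2 = q_1 ∧ q_2, the classical CHSH predicate) is only an illustrative stand-in and is not one of the games analyzed in this paper.

```python
from itertools import product

def win(x, y, a, b):
    # toy predicate: a XOR b must equal x AND y
    return (a ^ b) == (x & y)

def single_value():
    """Value of the one-shot game: maximize over deterministic strategies."""
    best = 0
    for fa in product((0, 1), repeat=2):          # fa[x] = Alice's answer on question x
        for fb in product((0, 1), repeat=2):      # fb[y] = Bob's answer on question y
            wins = sum(win(x, y, fa[x], fb[y]) for x in (0, 1) for y in (0, 1))
            best = max(best, wins)
    return best / 4

def doubled_value():
    """Value of the 2-wise parallel repetition: answers may depend on BOTH questions."""
    qs = list(product((0, 1), repeat=2))            # a player's pair of questions
    strategies = list(product(range(4), repeat=4))  # an answer pair (encoded 0..3) per question pair
    # pairwin[i][j][a][b]: both coordinates won when Alice holds qs[i], Bob holds qs[j]
    pairwin = [[[[win(xs[0], ys[0], a >> 1, b >> 1) and win(xs[1], ys[1], a & 1, b & 1)
                  for b in range(4)] for a in range(4)] for ys in qs] for xs in qs]
    best = 0
    for sa in strategies:
        # partial[j][b]: number of Alice question-pairs won if Bob holds qs[j] and answers b
        partial = [[sum(pairwin[i][j][sa[i]][b] for i in range(4)) for b in range(4)]
                   for j in range(4)]
        for sb in strategies:
            best = max(best, sum(partial[j][sb[j]] for j in range(4)))
    return best / 16
```

Exhaustive search confirms that the repeated value lies between v² and v, so naive squaring of the value is not what happens in general.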
There is a similar gap in our understanding when players are allowed to share entangled state; in fact, no bounds here are known whatsoever in the three-player case. If players are more generally allowed to use any no-signaling strategy, then there are in fact counterexamples (lower bounds) showing that parallel repetition may utterly fail to reduce the (no-signaling) value of a three-player game.

The GHZ game, which we will denote by G_GHZ, is a three-player game with query distribution Q_GHZ that is uniform on {x ∈ F_2^3 : x_1 + x_2 + x_3 = 0}. To win, players are required on input (x_1, x_2, x_3) to produce (y_1, y_2, y_3) such that y_1 ⊕ y_2 ⊕ y_3 = x_1 ∨ x_2 ∨ x_3. It is easily verified that the value of G_GHZ is 3/4.

Dinur, Harsha, Venkat and Yuen identified the GHZ game as a hard case for existing parallel repetition techniques, writing [DHVY17]:

"We suspect that progress on bounding the value of the parallel repetition of the GHZ game will lead to further progress on the general question."

and

"We believe that the strong correlations present in the GHZ question distribution represent the 'hardest instance' of the multiplayer parallel repetition problem. Existing techniques from the two-player case (which we leverage in this paper) appear to be incapable of analyzing games with question distributions with such strong correlations."

The GHZ game also plays an important role in quantum information theory, and in particular in entanglement testing and device-independent quantum cryptography. Its salient properties are that it is an OR game for which quantum (entangled) players can play perfectly, but classical players can win only with probability strictly less than 1 [MS13]. No such two-player game is known.
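The value 3/4 can be checked exhaustively: there are only 4³ = 64 deterministic product strategies, and the strategy in which every player answers y_i = 1 − x_i wins exactly on the three queries with x_1 ∨ x_2 ∨ x_3 = 1. A minimal sketch:

```python
from itertools import product

# GHZ queries: uniform over {x in F_2^3 : x_1 + x_2 + x_3 = 0}
QUESTIONS = [x for x in product((0, 1), repeat=3) if x[0] ^ x[1] ^ x[2] == 0]

def ghz_value():
    """Classical value of G_GHZ by brute force over deterministic strategies."""
    best = 0
    # a deterministic strategy for player i is a function {0,1} -> {0,1}
    for f1, f2, f3 in product(product((0, 1), repeat=2), repeat=3):
        wins = sum((f1[x1] ^ f2[x2] ^ f3[x3]) == (x1 | x2 | x3)
                   for x1, x2, x3 in QUESTIONS)
        best = max(best, wins)
    return best / len(QUESTIONS)
```

The search returns 3/4; no deterministic (hence no classical) strategy wins all four queries, since summing the four winning constraints over F_2 gives the contradiction 0 = 1.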
Moreover, the GHZ game has the so-called self-testing property: all quantum strategies that achieve value 1 are essentially equivalent. This property is important for entanglement testing and device-independent quantum cryptography.

Prior to our work, the best known parallel repetition bound for the GHZ game was due to Verbitsky [Ver96], who observed a connection between parallel repetition and the density Hales-Jewett theorem from Ramsey theory [FK91]. Using modern quantitative versions of this theorem [Pol12], Verbitsky's result implies a bound of approximately 1/α(t), where α is the inverse Ackermann function. We prove a bound of t^{-Ω(1)}.

To prove our parallel repetition theorem for the GHZ game we show that for an arbitrary strategy, even if we condition on that strategy winning in several coordinates i_1, ..., i_m, there still exists some coordinate in which that strategy loses with significant probability. We consider the finer-grained event that also specifies specific queries and answers in coordinates i_1, ..., i_m, and abstract it out as a sufficiently dense product event E over the three players' inputs.

Given an arbitrary product event E that occurs with sufficiently high probability, we show that some coordinate of ˜P def= P|_E is hard. We do this in three high-level steps:

1. We first prove this for the simpler case in which E is an affine subspace of F_2^{3×n}. In fact, we show in this case that many coordinates of ˜P are hard.

2. We then prove that when E is arbitrary, ˜P can be written as a convex combination of components ˜P|_W, where W is a large affine subspace, with most such components "indistinguishable" from P|_W. Specifically, our main requirement is that for all sufficiently compressing linear functions φ on W, the KL divergence of φ(˜X) from φ(X) is small, where we sample ˜X ← ˜P|_W and X ← P|_W.

3.
With this notion of indistinguishability, we prove that if ˜P|_W is indistinguishable from P|_W, then the GHZ game (or any game with a constant-sized answer alphabet) is roughly as hard in every coordinate with query distribution ˜P|_W as with P|_W.

We conclude that for many coordinates i, there is a significant portion of ˜P for which the i-th coordinate is hard. We emphasize that unlike all previous parallel repetition bounds, our proof does not construct a local embedding of Q_GHZ into ˜P for general E.

Local Embeddability in Affine Subspaces
We first show that if E is any affine subspace of sufficiently low codimension m in F_2^{3×n}, then there exist many coordinates i ∈ [n] for which Q_GHZ is locally embeddable in the i-th coordinate of the conditional distribution ˜P. In fact, it will suffice for us to consider only affine "power" subspaces, i.e. those of the form w + V³ for some linear subspace V ≤ F_2^n and vector w ∈ F_2^{3×n}. Let X^1, ..., X^n ∈ F_2^3 denote the queries in each of the n repetitions.

Our observation is that when E is affine, there exists a subset of coordinates S ⊆ [n] with |S| ≥ n − m such that for every i ∈ S, E depends on (X^{i′})_{i′∈S} only via the differences (X^{i′} − X^i)_{i′∈S\{i}}. Indeed, if E = E_1 × E_2 × E_3 and if each E_j is given by an affine equation (X_j^1, ..., X_j^n)·A = b_j for a sufficiently "skinny" matrix A, then by the pigeonhole principle there must exist two distinct subset row-sums of A with equal values. By considering the symmetric difference of these subsets, and using the fact that we are working over F_2, there is a set S ⊆ [n] such that the S-subset row-sum of A is 0. Thus the value of (X_j^1, ..., X_j^n)·A is unchanged if X_j^i is subtracted from X_j^{i′} for every i′ ∈ S.

As a result, the players can all sample (X^{i′} − X^i)_{i′∈S\{i}} and (X^{i′})_{i′∉S}, which are independent of X^i, using shared randomness. On input X_j^i, the j-th player can locally compute (X_j^{i′})_{i′∈S} from X_j^i and (X^{i′} − X^i)_{i′∈S\{i}}.

Pseudo-Affine Decompositions

At a high level, we next show that if E is an arbitrary product event (with sufficient probability mass) then ˜P has a "pseudo-affine decomposition".
That is, there is a partition Π of (F_2^n)³ into affine subspaces such that if W is a random part of Π (as weighted by ˜P), then any strategy for ˜P|_W can be extended to a strategy for P|_W that is similarly successful in expectation.

To construct Π, we prove the following sufficient conditions for Π to be a pseudo-affine decomposition:

• When W is a random part of Π (as weighted by ˜P), the distributions ˜P|_W and P|_W are indistinguishable to all sufficiently compressing linear distinguishers. That is, if W is an affine shift of V³, then for all subspaces U ≤ V of sufficiently small co-dimension, the distributions ˜P|_W and P|_W are statistically close modulo U.

• Each part W of Π is in fact an affine shift of a product space V³ for some linear space V.

We construct Π satisfying these conditions iteratively. Starting with the singleton partition, as long as a random part W of Π has some subspace U for which ˜P|_W and P|_W are distributed differently mod U, we replace each part W of Π by all the affine shifts of U³ in W. We show that this process cannot be repeated too many times when E has sufficient density.

Pseudorandomness Preserves Hardness
The high-level reason these conditions suffice is that for any strategy f = f_1 × f_2 × f_3, they enable us to refine Π to a partition Π′_f such that when X is sampled from ˜P|_{W′} for a random part W′ in Π′_f, the distribution of f(X) is as if X were sampled uniformly from W′ ∩ E (i.e., with X_1, X_2, and X_3 mutually independent). Moreover, when we construct Π′_f we partition each part W of Π into all affine shifts of some linear space U³, where the co-dimension of U³ in W is not too large. Thus the strategy f on ˜P|_W effectively has the players acting as independent (randomized) functions of their inputs modulo U. Such strategies generalize to P|_W by the first property of pseudo-affine decompositions stated above.

To construct Π′_f, we ensure that f is uncorrelated with every affine function on ˜P|_{W′} when W′ is a random part of Π′_f, and then prove the desired independence by Fourier analysis. We construct Π′_f by iterative refinement of Π. Start by considering a random part W of Π. Whenever f(X) is correlated with an affine F_2-valued function χ, replace W in Π by W ∩ χ^{-1}(0) and W ∩ χ^{-1}(1), and do this in parallel for all parts of Π. We show that this cannot be repeated too many times, and thus we quickly arrive at our desired Π′_f.

3 Preliminaries

In this section we describe some preliminary definitions that are somewhat specific to this work. More standard preliminaries are given in Appendices A and B.
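The character-based refinement used to construct Π′_f in the overview above can be sketched concretely: each part W of a partition of F_2^n is split along the level sets of a linear functional χ. The two functionals below are arbitrary illustrative choices.

```python
from itertools import product

n = 4
SPACE = list(product((0, 1), repeat=n))  # F_2^4

def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % 2

def refine(partition, ell):
    """Replace every part W by W ∩ ell^{-1}(0) and W ∩ ell^{-1}(1)."""
    new_parts = []
    for part in partition:
        for bit in (0, 1):
            piece = [x for x in part if dot(ell, x) == bit]
            if piece:
                new_parts.append(piece)
    return new_parts

partition = [SPACE]                        # start from the singleton partition
for ell in [(1, 0, 1, 0), (0, 1, 1, 1)]:   # two illustrative F_2-valued characters
    partition = refine(partition, ell)

# after two refinements: 4 affine shifts of a codimension-2 subspace,
# which together still cover the whole space
assert len(partition) == 4 and all(len(p) == 4 for p in partition)
assert sorted(x for p in partition for x in p) == sorted(SPACE)
```

Each refinement step halves every part, so the co-dimension of the parts grows by exactly one per character, mirroring the potential argument used later to bound the number of refinement steps.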
Definition 3.1.
For any set S, a partition of S is a pairwise disjoint set of subsets of S whose union is all of S. If Π is a partition of S and x is an element of S, we write Π(x) to denote the (unique) element of Π that contains x.

If U is a linear subspace of V, we write U ≤ V rather than U ⊆ V to emphasize that U is a subspace rather than an unstructured subset.

We crucially rely on the Cauchy-Schwarz inequality:

Definition 3.2 (Inner Product Space). A real inner product space is a vector space V over R together with an operation ⟨·,·⟩ : V × V → R satisfying the following axioms for all x, y, z ∈ V:

• Symmetry: ⟨x, y⟩ = ⟨y, x⟩.
• Linearity in the first argument: ⟨ax + by, z⟩ = a⟨x, z⟩ + b⟨y, z⟩.
• Positive Definiteness: ⟨x, x⟩ > 0 if x ≠ 0.

Theorem 3.3 (Cauchy-Schwarz). In any inner product space, it holds for all vectors u and v that |⟨u, v⟩|² ≤ ⟨u, u⟩ · ⟨v, v⟩.

In parallel repetition we often work with Cartesian product sets of the form X = (X_1 × ··· × X_k)^n. For these sets, we will use superscripts to index the outer product and subscripts to index the inner product. That is, we view elements x of X as tuples (x^1, ..., x^n), where the i-th component of x^j is x_i^j. We will also write x_i to denote the vector (x_i^1, ..., x_i^n). If {E_i ⊆ X_i^n}_{i∈[k]} is a collection of subsets indexed by subscripts, we write E_1 × ··· × E_k or Π_{i∈[k]} E_i to denote the set {x ∈ X : ∀i ∈ [k], x_i ∈ E_i}. Similarly, if Y is a product set (Y_1 × ··· × Y_k)^m, we say f : X → Y is a product function f_1 × ··· × f_k if f(x) = y for y_i = f_i(x_i).

Definition 3.4 (Multi-player Games). A k-player game is a tuple (X, Y, P, W), where X = X_1 × ··· × X_k and Y = Y_1 × ··· × Y_k are finite sets, P is a probability measure on X, and W : X × Y → {0, 1} is a "winning" predicate.

Definition 3.5 (Parallel Repetition).
Given a k-player game G = (X, Y, Q, W), its n-fold parallel repetition, denoted G^n, is defined as the k-player game (X^n, Y^n, Q^n, W^n), where W^n(x, y) def= ∧_{j=1}^n W(x^j, y^j).

Definition 3.6.
The success probability of a function f = f_1 × ··· × f_k : X → Y in a k-player game G = (X, Y, Q, W) is

v[f](G) def= Pr_{x←Q}[ W(x, f(x)) = 1 ].

Definition 3.7.
The value of a k-player game G = (X, Y, Q, W), denoted v(G), is the maximum, over all product functions f = f_1 × ··· × f_k : X → Y, of v[f](G).

Fact 3.8.
Randomized strategies are no better than deterministic strategies.
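Fact 3.8 is the standard averaging argument: a shared-randomness strategy is a mixture of deterministic strategies, and its success probability is the corresponding convex combination, hence at most the best deterministic value. A small numeric check on the GHZ game (the mixture weights below are an arbitrary illustrative choice):

```python
import random
from itertools import product

QUESTIONS = [x for x in product((0, 1), repeat=3) if x[0] ^ x[1] ^ x[2] == 0]
# all 64 deterministic product strategies for the GHZ game
STRATEGIES = list(product(product((0, 1), repeat=2), repeat=3))

def value_of(strategy):
    f1, f2, f3 = strategy
    return sum((f1[x] ^ f2[y] ^ f3[z]) == (x | y | z)
               for x, y, z in QUESTIONS) / len(QUESTIONS)

# a randomized strategy: a probability distribution over deterministic ones
rng = random.Random(0)
weights = [rng.random() for _ in STRATEGIES]
probs = [w / sum(weights) for w in weights]

# its value is the convex combination of the deterministic values ...
randomized_value = sum(p * value_of(s) for p, s in zip(probs, STRATEGIES))
best_deterministic = max(value_of(s) for s in STRATEGIES)
# ... and therefore cannot exceed the best deterministic value
assert randomized_value <= best_deterministic
```

The same averaging applies to any finite game, which is why the value definitions above may restrict attention to deterministic product functions.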
Definition 3.9 (Value in j-th coordinate). If G = (X, Y, Q, W^n) is a game (with a product winning predicate), the value of G in the j-th coordinate, denoted v_j(G), is the value of the game (X, Y, Q, W′), where W′(x, y) = W(x^j, y^j).

Definition 3.10 (Game with Modified Query Distribution). If G = (X, Y, Q, W) is a game, and P is a probability measure on X, we write G|_P to denote the game (X, Y, P, W).

In this section, we give some Fourier-analytic conditions (see Appendix B for the basics of Fourier analysis) that imply independence of random variables under the (parallel repeated) GHZ query distribution. It will be convenient for us to work with probability distributions in terms of their densities (see Appendix A for basic probability definitions and notation).

(Footnote to Definition 3.2: because of symmetry, linearity in the first argument also implies linearity in the second argument, i.e., bilinearity.)

Definition 4.1 (Probability Densities). If P : Ω → R is a probability distribution with support contained in A, then the density of P in A is the function φ : A → R, x ↦ |A| · P(x). If A is unspecified, then by default it is taken to be Ω.

Lemma 4.2.
Let V be a (finite) vector space over F_2, let P be uniform on {x ∈ V³ : x_1 + x_2 + x_3 = 0}, and let U be uniform on V³. For any subset E = E_1 × E_2 × E_3 of V³,

P(E) = Σ_{χ∈V̂} Π_{i∈[3]} 1̂_{E_i}(χ) = U(E) · Σ_{χ∈V̂} Π_{i∈[3]} φ̂_{E_i}(χ),

where φ_{E_i} denotes the density in V of the uniform distribution on E_i.

Proof. Let φ_P denote the density in V³ of P. That is,

φ_P(x_1, x_2, x_3) = |V| if x_1 + x_2 + x_3 = 0, and 0 otherwise.

Then

P(E) = E_{x←V³}[ φ_P(x) · 1_E(x) ] = Σ_{χ∈V̂³} φ̂_P(χ) · 1̂_E(χ).   (Plancherel)   (1)

We now compute φ̂_P(χ) and 1̂_E(χ). We start by noting that the dual group V̂³ is isomorphic to (V̂)³. That is, each character χ ∈ V̂³ is of the form χ(x_1, x_2, x_3) = χ_1(x_1)·χ_2(x_2)·χ_3(x_3) for some (uniquely determined) χ_1, χ_2, χ_3 ∈ V̂, and conversely, each choice of χ_1, χ_2, χ_3 ∈ V̂ gives rise to some χ ∈ V̂³.

The Fourier transform of φ_P is given by

φ̂_P(χ_1, χ_2, χ_3) = 1 if χ_1 = χ_2 = χ_3, and 0 otherwise.   (2)

Because E is a product event, the Fourier transform of 1_E : V³ → {0, 1} is given by

1̂_E(χ_1, χ_2, χ_3) = Π_{i∈[3]} 1̂_{E_i}(χ_i) = U(E) · Π_{i∈[3]} φ̂_{E_i}(χ_i).   (3)

Substituting Eqs. (2) and (3) into Eq. (1) concludes the proof of the lemma.

Corollary 4.3.
With V, P, E, and U as in Lemma 4.2,

| P(E) − U(E) | ≤ Σ_{χ∈V̂\{1}} Π_{i∈[3]} | 1̂_{E_i}(χ) |,

where 1 ∈ V̂ denotes the trivial character.

Proof. For any probability density function φ, we have φ̂(1) = 1, so (by Lemma 4.2)

| P(E) − U(E) | = U(E) · | P(E)/U(E) − 1 | ≤ U(E) · | Σ_{χ∈V̂\{1}} Π_{i∈[3]} φ̂_{E_i}(χ) | ≤ Σ_{χ∈V̂\{1}} Π_{i∈[3]} | 1̂_{E_i}(χ) |.

Lemma 4.4.
Let V be a (finite) vector space over F_2, let P be uniform on {x ∈ V³ : x_1 + x_2 + x_3 = 0}, let U be uniform on V³, and let X = (X_1, X_2, X_3) denote the identity random variable on V³. Let Y_i = Y_i(X_i) be a 𝒴_i-valued random variable for each i ∈ [3], let Y = (Y_1, Y_2, Y_3), and let 𝒴 = 𝒴_1 × 𝒴_2 × 𝒴_3.

Let W be a subspace of V. If for all χ ∈ Ŵ,

E_{(x,y)←P_{X,Y}}[ d_TV( P_{χ(X_1) | X∈x+W³, Y_1=y_1}, U_{χ(X_1) | X∈x+W³} ) ] ≤ ε,   (4)

then

E_{x←P_X}[ d_TV( P_{Y | X∈x+W³}, U_{Y | X∈x+W³} ) ] ≤ ε · √(|𝒴_2| · |𝒴_3|).

Proof.
For x ∈ V³, we will write x̄ to denote the set x + W³. Recall that V/W denotes the set of all cosets {x + W}_{x∈V}. For every i ∈ [3], every x̄_i ∈ V/W, and every y_i ∈ 𝒴_i, define 1_{i,x̄_i,y_i} : x̄_i → {0, 1} to be the indicator for the set Y_i^{-1}(y_i) ∩ x̄_i. Define φ_{i,x̄_i,y_i} to be the density (in x̄_i) of the uniform distribution on Y_i^{-1}(y_i) ∩ x̄_i. That is, φ_{i,x̄_i,y_i} : x̄_i → R is given by

φ_{i,x̄_i,y_i}(x′_i) = |x̄_i| / |Y_i^{-1}(y_i) ∩ x̄_i| if Y_i(x′_i) = y_i, and 0 otherwise.

φ_{i,x̄_i,y_i} is easily seen to be related to 1_{i,x̄_i,y_i} by

1_{i,x̄_i,y_i} = P_{Y_i | X̄_i=x̄_i}(y_i) · φ_{i,x̄_i,y_i}.

With this notation, our assumption that Eq. (4) holds (for all χ ∈ Ŵ) is equivalent to assuming that for all χ ∈ Ŵ \ {1},

E_{(x,y)←P_{X,Y}}[ | φ̂_{1,x̄_1,y_1}(χ) | ] ≤ 2ε.   (5)

This is because for all χ ∈ Ŵ \ {1}, the distribution U_{χ(X_1) | X∈x+W³} is uniform on {±1}.

In general for x ∈ Supp(P_X), we have (by Corollary 4.3) that for any y ∈ 𝒴,

| P_{Y | X∈x+W³}(y) − U_{Y | X∈x+W³}(y) | ≤ Σ_{χ∈Ŵ\{1}} Π_{i∈[3]} | 1̂_{i,x̄_i,y_i}(χ) |   (6)

because:

• the event E = {Y = y} is a product event E_1 × E_2 × E_3, where each E_i = {Y_i = y_i} depends only on X_i, or equivalently on X_i − x_i,

• the distribution P_{X−x | X̄=x̄} is uniform on {(w_1, w_2, w_3) ∈ W³ : w_1 + w_2 + w_3 = 0}, and

• the distribution U_{X−x | X̄=x̄} is uniform on {(w_1, w_2, w_3) ∈ W³}.

(Footnote: with the formalism of random variables as functions on a sample space, we mean that X is the identity function, mapping (x_1, x_2, x_3) to (x_1, x_2, x_3).)
Combining Eq. (6) with the definition of statistical distance, we have

2 · E_{x←P_X}[ d_TV( P_{Y|X∈x+W³}, U_{Y|X∈x+W³} ) ]
= E_{x←P_X} Σ_{y∈𝒴} | P_{Y|X∈x+W³}(y) − U_{Y|X∈x+W³}(y) |
≤ E_{x←P_X} Σ_{y∈𝒴} Σ_{χ≠1} Π_{i∈[3]} | 1̂_{i,x̄_i,y_i}(χ) |
= E_{x←P_X} Σ_{y∈𝒴} Σ_{χ≠1} Π_{i∈{2,3}} √(| 1̂_{1,x̄_1,y_1}(χ) |) · | 1̂_{i,x̄_i,y_i}(χ) |.

Now, we apply Cauchy-Schwarz on the inner product space whose elements are real-valued functions of (x, y, χ), and where the inner product is defined by ⟨f, g⟩ def= E_{x←P_X} Σ_{y∈𝒴} Σ_{χ≠1} f(x, y, χ) · g(x, y, χ). This bounds the above by

√( Π_{i∈{2,3}} E_{x←P_X} Σ_{y∈𝒴} Σ_{χ≠1} | 1̂_{1,x̄_1,y_1}(χ) | · 1̂_{i,x̄_i,y_i}(χ)² ).

By the independence of (X_1, Y_1) and (X_i, Y_i) under P for i ∈ {2, 3}, this is equal to

Π_{i∈{2,3}} √( Σ_{χ≠1} Σ_{y∈𝒴} E_{x←P_X}[ | 1̂_{1,x̄_1,y_1}(χ) | ] · E_{x←P_X}[ 1̂_{i,x̄_i,y_i}(χ)² ] )
= Π_{i∈{2,3}} √( |𝒴_{i′}| · Σ_{χ≠1} Σ_{y_1∈𝒴_1} E_{x←P_X}[ | 1̂_{1,x̄_1,y_1}(χ) | ] · Σ_{y_i∈𝒴_i} E_{x←P_X}[ 1̂_{i,x̄_i,y_i}(χ)² ] ),

where i′ denotes the other element of {2, 3} \ {i} (summing over y_{i′} contributes the factor |𝒴_{i′}|). But the function 1_{1,x̄_1,y_1} is just P_{Y_1|X̄_1=x̄_1}(y_1) · φ_{1,x̄_1,y_1}, so the above is

Π_{i∈{2,3}} √( |𝒴_{i′}| · Σ_{χ≠1} Σ_{y_1∈𝒴_1} E_{x←P_X}[ P_{Y_1|X̄_1=x̄_1}(y_1) · | φ̂_{1,x̄_1,y_1}(χ) | ] · Σ_{y_i∈𝒴_i} E_{x←P_X}[ 1̂_{i,x̄_i,y_i}(χ)² ] )

which by the definition of expectation is

Π_{i∈{2,3}} √( |𝒴_{i′}| · Σ_{χ≠1} E_{(x,y)←P_{X,Y}}[ | φ̂_{1,x̄_1,y_1}(χ) | ] · Σ_{y_i∈𝒴_i} E_{x←P_X}[ 1̂_{i,x̄_i,y_i}(χ)² ] ).
We use Eq. (5) to bound this by

Π_{i∈{2,3}} √( 2ε · |𝒴_{i′}| · Σ_{χ≠1} Σ_{y_i∈𝒴_i} E_{x←P_X}[ 1̂_{i,x̄_i,y_i}(χ)² ] )
≤ Π_{i∈{2,3}} √( 2ε · |𝒴_{i′}| · Σ_{y_i∈𝒴_i} E_{x←P_X}[ E_{x′←x̄_i}[ 1_{i,x̄_i,y_i}(x′)² ] ] )   (Parseval's Theorem)
= Π_{i∈{2,3}} √( 2ε · |𝒴_{i′}| · E_{x←P_X} E_{x′←x̄_i}[ Σ_{y_i∈𝒴_i} 1_{i,x̄_i,y_i}(x′) ] ).

But for y_i ≠ y′_i, the supports of 1_{i,x̄_i,y_i} and 1_{i,x̄_i,y′_i} are disjoint, so the inner sum is at most 1 and this is at most

Π_{i∈{2,3}} √(2ε · |𝒴_{i′}|) = 2ε · √(|𝒴_2| · |𝒴_3|).

Since the quantity we bounded was twice the expected statistical distance, this proves the lemma.

In this section we show that the parallel repeated GHZ query distribution has many coordinates in which the GHZ query distribution can be locally embedded, even conditioned on any affine event of low co-dimension. We first recall the notion of a local embedding.
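As a warm-up for Definition 5.1 below, the unconditioned case ˜P = Q^n admits a trivial local embedding: shared randomness supplies every repetition other than the j-th, and each player places its own question bit in coordinate j. A sketch with n = 2 and j = 0 (0-indexed), verified by exact enumeration:

```python
from itertools import product
from collections import Counter

# GHZ queries: uniform over {x in F_2^3 : x_1 + x_2 + x_3 = 0}
Q = [x for x in product((0, 1), repeat=3) if x[0] ^ x[1] ^ x[2] == 0]

def e(i, q_i, r):
    # player i's question vector: own bit q_i in coordinate 0, shared sample r in coordinate 1
    return (q_i, r[i])

law = Counter()
for q in Q:                       # q <- Q
    for r in Q:                   # r <- R; here R is simply Q, the other coordinate's query
        x = tuple(e(i, q[i], r) for i in range(3))    # x[i] = player i's question vector
        assert tuple(x[i][0] for i in range(3)) == q  # coordinate 0 equals q with probability 1
        law[x] += 1

# the induced law of ~X is exactly uniform on supp(Q^2)
target = Counter(tuple((a[i], b[i]) for i in range(3)) for a in Q for b in Q)
assert law == target
```

The interesting case handled in this section is that the same kind of embedding survives conditioning on a low-codimension affine event.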
Definition 5.1.
Let Σ be a finite set, let k and n be positive integers, let Q be a probability distribution on Σ^k, and let ˜P be a probability distribution on Σ^{k×n}.

We say that Q is locally embeddable in the j-th coordinate of ˜P if there exists a probability distribution R on a set R and functions e_1, ..., e_k : Σ × R → Σ^n such that when sampling q ← Q, r ← R, if ˜X denotes the Σ^{k×n}-valued random variable whose i-th row is e_i(q_i, r)^⊤, then:

1. The probability law of ˜X is exactly ˜P.
2. It holds with probability 1 that ˜X^j = q.

Proposition 5.2.
Let n and m be positive integers with m < n. Let Q denote the GHZ query distribution (uniform on the set Q = {x ∈ F_2³ : x_1 + x_2 + x_3 = 0}), and let W be an affine shift of V³ for a subspace V ≤ F_2^n of codimension m with Q^n(W) > 0.

Then there exist at least n − m distinct values of j ∈ [n] for which Q is locally embeddable in the j-th coordinate of ˜P def= Q^n|_W.

Proof. Suppose otherwise. Without loss of generality, suppose that the coordinates that are not locally embeddable include the first n′ def= m + 1 coordinates (otherwise, V can be permuted to make this so). That is, for each j ∈ [n′], Q is not locally embeddable in the j-th coordinate of ˜P.

Let the defining equations for V be written as V def= {x ∈ F_2^n : x·A = 0} for some choice of A ∈ F_2^{n×m}, and let v ∈ F_2^{3×n} be such that W = v + V³.

Because 2^{n′} > 2^m, the pigeonhole principle implies that there exist two distinct sets S_1, S_2 ⊆ [n′] such that

Σ_{j∈S_1} A_j = Σ_{j∈S_2} A_j,

where A_j denotes the j-th row of A. Thus, there is a non-empty subset S def= S_1 Δ S_2 ⊆ [n′] such that

Σ_{j∈S} A_j = 0.   (7)

Fix some such S. We will show that for any j ∈ S, Q is locally embeddable in the j-th coordinate of ˜P, which is a contradiction. Let X denote the F_2^{3×n}-valued random variable given by the identity function.

Claim 5.3.
For any j ∈ S, the distribution ˜P_{X^j} is identical to Q (i.e., uniformly random on Q).

Proof. Let j ∈ S be given. It suffices to show that for every q, q′ ∈ Q, there is a bijection Φ_{q,q′} : Q^n ∩ W → Q^n ∩ W such that x ∈ Q^n ∩ W satisfies x^j = q if and only if y def= Φ_{q,q′}(x) satisfies y^j = q′. Such a bijection Φ_{q,q′} can be constructed by defining, for all j′ ∈ [n],

Φ_{q,q′}(x)^{j′} = x^{j′} + q′ − q if j′ ∈ S, and x^{j′} otherwise.

Φ_{q,q′} clearly is an injective map from Q^n to Q^n and satisfies Φ_{q,q′}(x)^j = x^j + q′ − q, so the only remaining thing to check is that it indeed maps W into W. This is true because it preserves x_i · A. Indeed, for any i ∈ [3],

Φ_{q,q′}(x)_i · A = x_i · A + Σ_{j′∈S} (q′_i − q_i) · A_{j′} = x_i · A + (q′_i − q_i) · Σ_{j′∈S} A_{j′} = x_i · A   (by Eq. (7)).

For any j ∈ S, let ∆^{(j)} denote the random variable (X^{j′} − X^j)_{j′∈S\{j}}.

Claim 5.4.
For any j ∈ S, it holds in ˜P that (∆^{(j)}, X^{[n]\S}) and X^j are independent.

Proof. Equivalently (using the definition of ˜P), let E denote the event that X ∈ W, i.e., for all i ∈ [3], (X_i − v_i)·A = 0. We need to show that in P, the random variables X^j and (∆^{(j)}, X^{[n]\S}) are conditionally independent given E. To show this, we rely on the following fact:

Fact 5.5. If Y and Z are any independent random variables, and if E is any event that depends only on Z (and occurs with non-zero probability), then Y and Z are conditionally independent given E.

It is clear that X^j and (∆^{(j)}, X^{[n]\S}) are independent in P. It is also the case that E depends only on (∆^{(j)}, X^{[n]\S}): E is defined by the constraint that for all i ∈ [3],

0 = (X_i − v_i)·A = Σ_{j′∈S} (X_i^{j′} − X_i^j − v_i^{j′})·A_{j′} + Σ_{j′∈[n]\S} (X_i^{j′} − v_i^{j′})·A_{j′}   (by Eq. (7))
  = −v_i^j·A_j + Σ_{j′∈S\{j}} (X_i^{j′} − X_i^j − v_i^{j′})·A_{j′} + Σ_{j′∈[n]\S} (X_i^{j′} − v_i^{j′})·A_{j′},

where the middle sum depends only on ∆^{(j)} and the last sum depends only on X^{[n]\S}.
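The claims above can be checked exhaustively on a toy instance. The matrix A, right-hand side b, and set S below are illustrative choices satisfying Eq. (7), not objects taken from the paper:

```python
from itertools import product
from collections import Counter

Q = [x for x in product((0, 1), repeat=3) if x[0] ^ x[1] ^ x[2] == 0]
n = 3
A = [1, 1, 0]        # one F_2 constraint per player; rows 0 and 1 sum to 0, as in Eq. (7)
b = (0, 1, 1)        # per-player right-hand sides, chosen so the support is non-empty
S = [0, 1]

def in_W(x):         # x[j] is the j-th repetition's question triple
    return all(sum(A[j] * x[j][i] for j in range(n)) % 2 == b[i] for i in range(3))

support = [x for x in product(Q, repeat=n) if in_W(x)]
assert len(support) > 0

# Claim 5.3: for j in S, the marginal of coordinate j under ~P is uniform on Q
for j in S:
    counts = Counter(x[j] for x in support)
    assert set(counts) == set(Q) and len(set(counts.values())) == 1

# the bijection Phi_{q,q'} adds q' - q to every coordinate in S and maps Q^n ∩ W onto itself
def phi(x, q, qp):
    s = tuple(u ^ v for u, v in zip(q, qp))       # q' - q over F_2
    return tuple(tuple(c ^ d for c, d in zip(x[j], s)) if j in S else x[j]
                 for j in range(n))

for q in Q:
    for qp in Q:
        assert sorted(phi(x, q, qp) for x in support) == sorted(support)
```

Both properties needed by the proof, the uniform marginal and the support-preserving shift, hold exactly on this instance.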
We now put everything together. Fix any j ∈ S. We construct a local embedding of Q into the j-th coordinate of ˜P. For each i ∈ [3], we define e_i : F_2 × (F_2^{3×n}) → F_2^n such that for each j′ ∈ [n]:

e_i(x, r)^{j′} = x if j′ = j;  x + r_i^{j′} − r_i^j if j′ ∈ S \ {j};  r_i^{j′} if j′ ∉ S.

Define the distribution P^(embed) to be the distribution on x ∈ F_2^{3×n} obtained by independently sampling q ← Q and r ← ˜P, then defining x to be the matrix with rows e_1(q_1, r), e_2(q_2, r), e_3(q_3, r). It clearly holds with probability 1 that q = x^j.

Claim 5.6. P^(embed) ≡ ˜P.

Proof. By definition, it is immediate that:

P^(embed)_{X^j} ≡ ˜P_{X^j}  and  P^(embed)_{∆^{(j)}, X^{[n]\S}} ≡ ˜P_{∆^{(j)}, X^{[n]\S}}.

Finally, X is fully determined by X^j and (∆^{(j)}, X^{[n]\S}), which are independent in both P^(embed) (because q and r are sampled independently in the definition of P^(embed)) and ˜P (by Claim 5.4).

We have constructed an embedding of Q into one of the first n′ coordinates of ˜P, which is the desired contradiction.

In this section we show that if E is an arbitrary event with sufficient probability mass under P = Q^n_GHZ, then ˜P = P|_E can be decomposed into components with affine support that are "similar" to corresponding components of P. We will call such components pseudorandom.

We say that Π is an affine partition of F_2^{3×n} to mean that:

• Each part Π(x) of Π has the form w(x) + V(x)³, where V(x) is a subspace of F_2^n, and

• Each V(x) has the same dimension, which we refer to as the dimension of Π and denote by dim(Π).

The codimension of Π is defined to be n − dim(Π).

Definition 6.1. If W is an affine shift of a vector space V³ (for V ≤ F_2^n), we say that a W-valued random variable X is (m, ε)-close to Y if for all linear functions φ : F_2^n → F_2^m we have d_KL(φ³(X) ‖ φ³(Y)) ≤ ε, where φ³ denotes the function mapping (x_1, x_2, x_3) to (φ(x_1), φ(x_2), φ(x_3)). We write d_m(X ‖ Y) to denote the minimum ε for which X is (m, ε)-close to Y.
We remark that d_m(X ‖ Y) is a non-decreasing function of m.
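Definition 6.1 and this remark can be checked numerically in a stripped-down, single-copy setting (one vector rather than a triple, so φ rather than φ³; the event E below is an arbitrary illustrative choice):

```python
from itertools import product
from math import log

n = 3
SPACE = list(product((0, 1), repeat=n))

def pushforward(dist, M):
    """Law of phi(X) for the linear map phi(x) = M x over F_2 (M given as rows)."""
    out = {}
    for x, p in dist.items():
        y = tuple(sum(r[j] * x[j] for j in range(n)) % 2 for r in M)
        out[y] = out.get(y, 0.0) + p
    return out

def dkl(p, q):
    return sum(pv * log(pv / q[k]) for k, pv in p.items() if pv > 0)

E = [x for x in SPACE if x != (1, 1, 1)]      # a dense illustrative event
P = {x: 1 / len(SPACE) for x in SPACE}        # law of X (uniform)
Pt = {x: 1 / len(E) for x in E}               # law of X conditioned on E

def d_m(m):
    """Max over all linear phi: F_2^n -> F_2^m of KL(phi(X~) || phi(X))."""
    return max(dkl(pushforward(Pt, M), pushforward(P, M))
               for M in product(SPACE, repeat=m))

# monotonicity: any m-row map extends to an (m+1)-row map by appending a zero row
assert 0 <= d_m(1) <= d_m(2)
```

Monotonicity holds because every φ into F_2^m can be padded with a zero row into a map into F_2^{m+1} with the same pushforward divergence.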
Let P denote the distribution Q n GHZ , let X be the identity random variable, let E be an eventwith P ( X ∈ E ) = e − ∆ , and let ˜ P = P (cid:12)(cid:12) ( X ∈ E ) . For any δ > and any m ∈ Z + , there exists an affinepartition Π of F × n , of codimension at most m · ∆ δ , such that: E π ← ˜ P Π( X ) h d m (cid:16) ˜ P X | X ∈ π (cid:13)(cid:13)(cid:13) P X | X ∈ π (cid:17)i ≤ δ. (8)12 roof. We construct the claimed partition iteratively. Start with the trivial n -dimensional affine partitionΠ = { F × n } . Whenever Π i is a partition Π for which Eq. (8) does not hold, there exists a function φ i : F × n → F × m that: • When restricted to any part π of Π i , φ i is of the form φ i,π for some linear function φ i,π : F n → F m ,and • d KL (cid:16) ˜ P φ i ( X ) | Π i ( X ) (cid:13)(cid:13)(cid:13) P φ i ( X ) | Π i ( X ) (cid:17) > δ. (9)Without loss of generality, we additionally assume that each φ i,π is “full rank” when restricted to π .That is, if π is an affine shift of V , where V has dimension k , then the restriction of φ i,π to V is a linearmap of rank min( k, m ). It is clear that any φ i,π may be modified to be full rank without decreasing the KLdivergence of Eq. (9).Then by the chain rule for KL divergences, d KL (cid:16) ˜ P X | Π i ( X ) ,φ i ( X ) (cid:13)(cid:13)(cid:13) P X | Π i ( X ) ,φ i ( X ) (cid:17) < d KL (cid:16) ˜ P X | Π i ( X ) (cid:13)(cid:13)(cid:13) P X | Π i ( X ) (cid:17) − δ. (10)The left-hand side of Eq. (10) is equivalent to d KL (cid:16) ˜ P X | Π i +1 ( X ) (cid:13)(cid:13)(cid:13) P X | Π i +1 ( X ) (cid:17) with Π i +1 = (cid:8) π ∩ { x : φ i ( x ) = z } (cid:9) π ∈ Π i ,z ∈ F × m , which is an affine partition of dimension at least dim(Π) − m .Thus with the non-negative potential functionΦ(Π) def = d KL (cid:16) ˜ P X | Π( X ) (cid:13)(cid:13)(cid:13) P X | Π( X ) (cid:17) , we have Φ(Π i +1 ) < Φ(Π i ) − δ . But Φ(Π ) = − ln ( P ( E )) = ∆, so there must exist i ⋆ ≤ ∆ δ for which Eq. 
(8)holds with Π = Π i ⋆ , which has co-dimension at most m · ∆ δ . Proposition 7.1.
Let
W ⊆ F × n be an affine shift of a linear subspace V and let P be a the uniformdistribution on { w ∈ W : w + w + w = 0 } , which we assume to be non-empty. Let X denote the identityrandom variable, let E = E × E × E be an event with P ( X ∈ E ) = e − ∆ , and define ˜ P def = P (cid:12)(cid:12) ( X ∈ E ) .Suppose that ˜ P X is ( ⌈ δ ⌉ , δ ) -close to P X as in Definition 6.1, for δ satisfying δ ≤ min( ∆ · e − /ǫ , ∆ e , ǫ ) .Then for each j ∈ [ n ] , we have v j ( G n GHZ | ˜ P ) ≤ v j ( G n GHZ | P ) + 2 ǫ .Proof. Fix j ∈ [ n ] to be any coordinate, and let ˜ f = ˜ f × ˜ f × ˜ f : W → F be an arbitrary strategy. Let Y denote ˜ f ( X ). Claim 7.2.
There exists a subspace
U ≤ V of codimension at most ⌈ δ ⌉ such that: • The j th coordinate x j of x ∈ F × n depends only on x + U . • For all χ ∈ ˆ U , E ( x,y ) ← P X,Y h d KL (cid:16) P χ ( X ) | X ∈ x + U ,Y = y (cid:13)(cid:13)(cid:13) U χ ( X ) | X ∈ x + U (cid:17)i ≤ δ, where U denotes the uniform distribution on W . roof. Start with U = { u ∈ V : u j = 0 } (this ensures that any subspace U ≤ U satisfies the first desiredproperty). Define a potential function Z ( U ) def = dim( U ) − E ( x,y ) ← P X,Y (cid:2) H ( X | X ∈ x + U , Y = y ) (cid:3) , which is clearly non-negative. Additionally, Z ( U ) (and in particular Z ( U )) is at most 1 because for anysubspace U ≤ V and any x ∈ V , the entropy chain rule implies E y ← P Y | X ∈ x U (cid:2) H ( X | X ∈ x + U , Y = y ) (cid:3) = H ( X | X ∈ x + U ) − H ( Y | X ∈ x + U ) ≥ dim( U ) − . (in the first step we used the fact that Y is a function of X .For i ≥
1, define χ i ∈ ˆ U i \ { } to maximize b i def = E ( x,y ) ← P X,Y h d KL (cid:16) P χ i ( X ) | X ∈ x + U i ,Y = y (cid:13)(cid:13)(cid:13) U χ i ( X ) | X ∈ x + U i (cid:17)i = E ( x,y ) ← P X,Y h d KL (cid:16) P χ i ( X ) | X ∈ x + U i ,Y = y (cid:13)(cid:13)(cid:13) Unif {± } (cid:17)i = E ( x,y ) ← P X,Y h d KL (cid:16) P χ i ( X ) | X ∈ x + U i ,Y = y (cid:13)(cid:13)(cid:13) Unif {± } (cid:17)i = 1 − E ( x,y ) ← P X,Y h H (cid:0) χ i ( X ) | X ∈ x + U i , Y = y (cid:1)i , and define U i +1 def = { u ∈ U i : χ i ( u ) = 1 } . By the entropy chain rule, we have Z ( U i +1 ) ≤ Z ( U i ) − b i .Since the initial potential is at most 1, and all potentials are at least 0, there must be some i ⋆ ≤ ⌈ δ ⌉ forwhich b i ⋆ ≤ δ . The corresponding U i ⋆ is the desired subspace of V .Now let U be as given by Claim 7.2. By Lemma 4.4, we have E x ← P X d TV (cid:0) P Y | X ∈ x + U , Y i ∈ [3] P Y i | X i ∈ x i + U (cid:1) ≤ √ δ. By assumption of Proposition 7.1 (together with Pinsker’s inequality), P X + U and ˜ P X + U are q δ -closein total variational distance. We thus have that E x ← ˜ P X d TV (cid:0) P Y | X ∈ x + U , Y i ∈ [3] P Y i | X i ∈ x i + U (cid:1) ≤ √ δ, (11)by the general fact that if P and Q are two distributions that are ǫ -close in total variational distance, andif X is a B -bounded random variable, then (cid:12)(cid:12) E P [ X ] − E Q [ X ] (cid:12)(cid:12) ≤ Bǫ .We now obtain a probabilistic lower bound on P ( E | X + U ). We first lower bound its log-expectation: E x ← ˜ P X h − ln P (cid:0) E | X ∈ x + U (cid:1)i = E x ← ˜ P X h d KL (cid:0) ˜ P X | X ∈ x + U k P X | X ∈ x + U (cid:1)i ≤ d KL (cid:0) ˜ P X k P X (cid:1) (Fact A.17) ≤ ∆ . Markov’s inequality then implies that for any τ ,Pr x ← ˜ P X (cid:2) P ( E | X ∈ x + U ) ≤ τ (cid:3) ≤ ∆ln(1 /τ ) . (12)14ombining Eq. (12) with Eq. (11) and Fact A.18, we get E x ← ˜ P X d TV (cid:0) ˜ P Y | X ∈ x + U , Y i ∈ [3] ( P | X i ∈ E i ) Y i | X i ∈ x i + U (cid:1) ≤ ∆ln(1 /τ ) + 4 √ δτ . 
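The right-hand side is a trade-off between a term that improves as τ shrinks and one that blows up; Corollary C.2 (Appendix C) balances the two via the Lambert W function. The following Python sketch (sample values and helper names are ours, purely illustrative) compares the balancing choice of τ against a brute-force grid search:

```python
import math

# Minimize g(tau) = A/ln(1/tau) + B/tau over tau in (0, 1].
# A and B below are arbitrary sample values with A >= e*B, as in Corollary C.2.

def g(tau, A, B):
    return A / math.log(1.0 / tau) + B / tau

def lambert_w(y, iters=50):
    """Newton iteration for the principal branch: solves z * e^z = y (y >= e)."""
    z = math.log(y)
    for _ in range(iters):
        z -= (z * math.exp(z) - y) / ((z + 1.0) * math.exp(z))
    return z

A, B = 10.0, 0.01
# Balancing choice: ln(1/tau) = W(A/B), i.e. A/ln(1/tau) = B/tau.
tau_star = math.exp(-lambert_w(A / B))
grid_min = min(g(10.0 ** (-k / 100.0), A, B) for k in range(1, 1000))
# The balancing point comes within a factor of two of the brute-force minimum.
assert g(tau_star, A, B) <= 2.0 * grid_min
```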
Since this holds for all τ ∈ (0, 1] and because δ ≤ Δ²/(16e²), Corollary C.2 (with A = Δ and B = 4√δ) implies that

E_{x ← P̃_X} d_TV( P̃_{Y | X∈x+U}, ∏_{i∈[3]} (P | X_i∈E_i)_{Y_i | X_i∈x_i+U} ) ≤ 4Δ / ln( Δ/(4√δ) ) ≤ ε,    (13)

where the last inequality follows from our assumption that δ ≤ (Δ²/16)·e^{−8Δ/ε}.

Putting everything together, we have

P̃_{X+U, Y} = P̃_{X+U} · P̃_{Y | X+U} ≈_ε P̃_{X+U} · ∏_{i∈[3]} (P | X_i∈E_i)_{Y_i | X_i+U} ≈_{√(δ/2)} P_{X+U} · ∏_{i∈[3]} (P | X_i∈E_i)_{Y_i | X_i+U},

where ≈ denotes closeness in total variational distance.

But P_{X+U} · ∏_{i∈[3]} (P | X_i∈E_i)_{Y_i | X_i+U} is just the distribution on (x+U, y) obtained by sampling x ← P_X and y ← F(x), where F = F_1 × F_2 × F_3 is the following randomized strategy: on input x_i, F_i uses local randomness to sample and output y_i ← (P | X_i∈E_i)_{Y_i | X_i∈x_i+U}. By Fact 3.8, the probability that W(x_j, y) = 1 (which is well-defined because x_j is a function of x+U) is at most v_j(G^n_GHZ | P). We thus have

v_j[f̃](G^n_GHZ | P̃) = P̃_{X+U, Y}( W(X_j, Y) = 1 ) ≤ v_j(G^n_GHZ | P) + ε + √(δ/2) ≤ v_j(G^n_GHZ | P) + 2ε.

Since this holds for arbitrary f̃, we have v_j(G^n_GHZ | P̃) ≤ v_j(G^n_GHZ | P) + 2ε.

Theorem 8.1. If G = (X, Y, Q, W) denotes the GHZ game, then v(G^n) ≤ n^{−Ω(1)}.

Proof.
Recall that v(G) = 3/4. Let P denote Q^n; that is, P is uniform on {(X_1, X_2, X_3) ∈ F^{3×n} : X_1 + X_2 + X_3 = 0}. Let E = E_1 × E_2 × E_3 be any product event in F^{3×n} with P(E) ≥ e^{−Δ} (where Δ is a parameter we will specify later), and let P̃ denote P | E.

Let δ > 0 be a parameter we will specify later, and let m = ⌈1/δ⌉. Recall our definition of d_m (Definition 6.1). Lemma 6.2 states that there exists an affine partition Π of F^{3×n}, of codimension at most m·Δ/δ, such that:

E_{π ← P̃_{Π(X)}} [ d_m( P̃_{X|X∈π} ‖ P_{X|X∈π} ) ] ≤ δ.

Moreover,

E_{π ← P̃_{Π(X)}} [ d_∞( P̃_{X|X∈π} ‖ P_{X|X∈π} ) ] = d_KL( P̃_{X|Π(X)} ‖ P_{X|Π(X)} ) ≤ d_KL( P̃_X ‖ P_X ) ≤ Δ.

Markov's inequality thus implies that with probability at least 1/3 over π ← P̃_{Π(X)}, it holds that

d_m( P̃_{X|X∈π} ‖ P_{X|X∈π} ) ≤ 3δ  and  d_∞( P̃_{X|X∈π} ‖ P_{X|X∈π} ) ≤ 3Δ.

Call such π pseudorandom, and let R denote the set of pseudorandom π. By Proposition 7.1, for each pseudorandom π we have

v_j( G^n | (P̃|π) ) ≤ v_j( G^n | (P|π) ) + 2ε    (14)

as long as

3δ ≤ min( (9Δ²/16)·e^{−24Δ/ε}, 9Δ²/(16e²), ε²/2 ),    (15)

where ε is a parameter we will specify later.

By Proposition 5.2, for each π ∈ Π (with P(π) > 0) and all but m·Δ/δ values of j ∈ [n], we have v_j( G^n | (P|π) ) = v(G) = 3/4. By averaging, there exists some j⋆ ∈ [n] such that

E_{π ← P̃_{Π(X)} | Π(X)∈R} [ v_{j⋆}( G^n | (P|π) ) ] ≤ mΔ/(nδ) + (1 − mΔ/(nδ))·(3/4),

which is at most 7/8 as long as

δ ≥ 8mΔ/n.    (16)

Putting everything together, we have

v_{j⋆}( G^n | P̃ ) ≤ E_{π ← P̃_{Π(X)}} [ v_{j⋆}( G^n | (P̃|π) ) ]
 ≤ Pr_{π ← P̃_{Π(X)}}[π ∉ R] + Pr_{π ← P̃_{Π(X)}}[π ∈ R] · E_{π ← P̃_{Π(X)} | Π(X)∈R} [ v_{j⋆}( G^n | (P̃|π) ) ]
 ≤ 2/3 + (1/3)·(7/8 + 2ε)
 = 23/24 + 2ε/3,

which is bounded away from 1 for any sufficiently small constant ε > 0. Setting ε to be such a constant, and Δ = c·ln n, δ = n^{−c'}, m = n^{c'} for suitably small constants c, c' > 0, ensures that the constraints (15) and (16) are all satisfied for sufficiently large n.

Applying Lemma 8.2 below with ρ(n) = n^{−c} and this choice of ε completes the proof.

Lemma 8.2 (Parallel Repetition Criterion). Let G = (X, Y, Q, W) be a game, and let P_n denote Q^n. Suppose ρ : Z^+ → R is a function with ρ(n) ≥ e^{−O(n)} and ε > 0 is a constant such that for all E = E_1 × ··· × E_k ⊆ X^n with P_n(E) ≥ ρ(n), there exists j such that v_j( G^n | (P_n|E) ) ≤ 1 − ε. Then v(G^n) ≤ ρ(n)^{Ω(1)}.

Proof.
Fix any f = f_1 × ··· × f_k : X^n → Y^n. Consider the probability space defined by sampling X ← P_n, and let Y = f(X). We define additional random variables J_1, ..., J_n ∈ [n] and Z_1, ..., Z_n ∈ X × Y, where J_1 is an arbitrary fixed value, Z_i := (X_{J_i}, Y_{J_i}) for all i, and J_{i+1} depends deterministically on Z_{≤i} := (Z_1, ..., Z_i) as follows: when Z_{≤i} = z_{≤i}, J_{i+1} is defined to be a value j ∈ [n] that minimizes P_n( W(X_j, Y_j) = 1 | Z_{≤i} = z_{≤i} ).

With these definitions, each event {Z_{≤i} = z_{≤i}} is a product event. In particular, if P_n(Z_{≤i} = z_{≤i}) ≥ ρ(n), then P_n( W(X_{J_{i+1}}, Y_{J_{i+1}}) = 1 | Z_{≤i} = z_{≤i} ) ≤ 1 − ε.

Let Win_i denote the event that W(Z_i) = 1, let Win_{≤i} denote the event Win_1 ∧ ··· ∧ Win_i, and let w_i denote P_n(Win_{≤i}). Since Win_{≤i} is the union of some subset of the |X|^i · |Y|^i disjoint product events {Z_{≤i} = z_{≤i}}, we have

Pr_{z_{≤i} ← (P_n)_{Z_{≤i} | Win_{≤i}}} [ P_n(Z_{≤i} = z_{≤i}) ≥ ρ(n) ] ≥ 1 − |X|^i · |Y|^i · ρ(n) / w_i.

Moreover, for all z_{≤i} for which P_n(Z_{≤i} = z_{≤i}) ≥ ρ(n), we know that P_n(Win_{i+1} | Z_{≤i} = z_{≤i}) ≤ 1 − ε. Thus as long as w_i ≥ 2 · |X|^i · |Y|^i · ρ(n), we have

w_{i+1} = w_i · P_n(Win_{i+1} | Win_{≤i})
 = w_i · E_{z_{≤i} ← (P_n)_{Z_{≤i} | Win_{≤i}}} [ P_n(Win_{i+1} | Z_{≤i} = z_{≤i}) ]
 ≤ w_i · ( Pr_{z_{≤i} ← (P_n)_{Z_{≤i} | Win_{≤i}}} [ P_n(Z_{≤i} = z_{≤i}) < ρ(n) ] + Pr_{z_{≤i} ← (P_n)_{Z_{≤i} | Win_{≤i}}} [ P_n(Z_{≤i} = z_{≤i}) ≥ ρ(n) ] · (1 − ε) )
 ≤ w_i · ( 1/2 + (1/2)·(1 − ε) )
 = w_i · (1 − ε/2).

Iterating this inequality as long as the condition w_i ≥ 2·|X|^i·|Y|^i·ρ(n) is satisfied, we find i⋆ such that

w_{i⋆} ≤ min( 2·|X|^{i⋆}·|Y|^{i⋆}·ρ(n), (1 − ε/2)^{i⋆} ).

This is minimized for i⋆ = Θ(log(1/ρ(n))) or i⋆ = n, and gives v(G^n) ≤ w_{i⋆} ≤ ρ(n)^{Ω(1)}.

A Probability Theory
We recall the notions of probability theory that we will need.
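As a concrete companion to the definitions below, a distribution on a finite set can be represented as a Python dict mapping outcomes to probabilities. The following is a minimal sketch (all helper names are ours, purely illustrative):

```python
# Finite probability distributions as Python dicts: {outcome: probability}.

def is_distribution(p, tol=1e-12):
    """Check P(w) >= 0 for all w and that the probabilities sum to 1."""
    return all(v >= 0 for v in p.values()) and abs(sum(p.values()) - 1.0) < tol

def prob(p, event):
    """P(E) = sum of P(w) over w in the event E (a subset of outcomes)."""
    return sum(p[w] for w in p if w in event)

def condition(p, event):
    """(P | E)(w) = P(w)/P(E) on E, and 0 elsewhere."""
    pe = prob(p, event)
    return {w: (p[w] / pe if w in event else 0.0) for w in p}

p = {"a": 0.5, "b": 0.25, "c": 0.25}
assert is_distribution(p)
assert prob(p, {"a", "b"}) == 0.75
q = condition(p, {"a", "b"})
assert abs(q["a"] - 2 / 3) < 1e-12 and q["c"] == 0.0
```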
Definition A.1. A probability distribution on a finite set Ω is a function P : Ω → R satisfying P(ω) ≥ 0 for all ω ∈ Ω and Σ_{ω∈Ω} P(ω) = 1. We extend the domain of P to subsets of Ω by writing P(E) to denote Σ_{ω∈E} P(ω) for any "event" E ⊆ Ω.

Definition A.2. The support of P : Ω → R is the set {ω ∈ Ω : P(ω) > 0}.

Definition A.3. A Σ-valued random variable on a sample space Ω is a function X : Ω → Σ.

Definition A.4 (Expectations). If P : Ω → R is a probability distribution and X : Ω → R is a random variable, the expectation of X under P, denoted E_P[X], is defined to be Σ_{ω∈Ω} P(ω)·X(ω).

We refer to subsets of Ω as events. We use standard shorthand for denoting events. For instance, if X is a Σ-valued random variable and x ∈ Σ, we write X = x to denote the event {ω ∈ Ω : X(ω) = x}.

Definition A.5 (Indicator Random Variables). For any event E, we write 1_E to denote a random variable defined as 1_E(ω) = 1 if ω ∈ E and 0 otherwise.

Definition A.6 (Independence). Events E_1, ..., E_k ⊆ Ω are said to be independent under a probability distribution P if P(E_1 ∩ ··· ∩ E_k) = ∏_{i∈[k]} P(E_i). Random variables X_1, ..., X_k are said to be independent if the events X_1 = x_1, ..., X_k = x_k are independent for any choice of x_1, ..., x_k.

Definition A.7 (Conditional Probabilities). If P : Ω → R is a probability distribution and E ⊆ Ω is an event with P(E) > 0, then the conditional distribution of P given E is denoted (P|E) : Ω → R and is defined by (P|E)(ω) = P(ω)/P(E) if ω ∈ E and 0 otherwise.

If X is a random variable and P is a probability distribution, we write P_X to denote the induced distribution of X under P. That is, P_X(x) = P(X = x). If E is an event, we write P_{X|E} as shorthand for (P|E)_X.

Definition A.8 (Entropy). If P : Ω → R is a probability distribution, the entropy (in nats) of P is

H(P) := −Σ_{ω∈Ω} P(ω)·ln(P(ω)).

When X is a random variable associated with a probability distribution P, we sometimes write H(X) as shorthand for H(P_X).

Definition A.9 (Conditional Entropy). If P is a probability measure with random variables X and Y, we write

H(P_{X|Y}) := E_{y←P_Y}[ H(P_{X|Y=y}) ].
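Entropy and conditional entropy translate directly into code for a joint distribution given as a dict over pairs (x, y). The sketch below (helper names ours, purely illustrative) also checks the chain rule stated next as Fact A.10:

```python
import math

# Entropy in nats of a distribution given as {outcome: probability}.
def H(p):
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def marginal_y(p):
    """Marginal of the second coordinate of a joint distribution {(x, y): prob}."""
    out = {}
    for (x, y), v in p.items():
        out[y] = out.get(y, 0.0) + v
    return out

def H_X_given_Y(p):
    """H(P_{X|Y}) = E_{y <- P_Y}[ H(P_{X|Y=y}) ]."""
    py = marginal_y(p)
    total = 0.0
    for y, vy in py.items():
        cond = {x: v / vy for (x, yy), v in p.items() if yy == y}
        total += vy * H(cond)
    return total

# A correlated joint distribution on {0,1} x {0,1}.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
# Chain rule (Fact A.10): H(X|Y) = H(X,Y) - H(Y).
assert abs(H_X_given_Y(p) - (H(p) - H(marginal_y(p)))) < 1e-12
```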
Fact A.10 (Chain Rule of Conditional Entropy). For any probability measure P and any random variables X, Y, it holds that

H(P_{X|Y}) = H(P_{X,Y}) − H(P_Y).

A.1 Divergences
Definition A.11 (Total Variational Distance). If P, Q : Ω → R are two probability distributions, then the total variational distance between P and Q, denoted d_TV(P, Q), is

d_TV(P, Q) := max_{E⊆Ω} | P(E) − Q(E) |.

An equivalent definition is

d_TV(P, Q) = (1/2) Σ_{ω∈Ω} | P(ω) − Q(ω) |.

Definition A.12 (Kullback-Leibler (KL) Divergence). If P, Q : Ω → R are probability distributions, the Kullback-Leibler divergence of P from Q is

d_KL(P ‖ Q) := Σ_{ω∈Ω} P(ω) · ln( P(ω)/Q(ω) ),

where terms of the form p·ln(p/0) are treated as 0 if p = 0 and +∞ otherwise, and terms of the form 0·ln(0/q) are treated as 0.

The following relation between total variational distance and Kullback-Leibler divergence, known as Pinsker's inequality, is of fundamental importance.
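Both divergences are easy to compute for dict-represented distributions; the sketch below (helper names ours, purely illustrative) also spot-checks Pinsker's inequality (Theorem A.13) on one example:

```python
import math

# Total variational distance and KL divergence for distributions given as
# dicts over the same finite set.

def d_tv(p, q):
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in set(p) | set(q))

def d_kl(p, q):
    total = 0.0
    for w, pw in p.items():
        if pw == 0:
            continue  # 0 * ln(0/q) is treated as 0
        qw = q.get(w, 0.0)
        if qw == 0:
            return math.inf  # p * ln(p/0) with p > 0 is +infinity
        total += pw * math.log(pw / qw)
    return total

p = {"a": 0.7, "b": 0.2, "c": 0.1}
q = {"a": 0.5, "b": 0.25, "c": 0.25}
# Pinsker's inequality (Theorem A.13): d_TV <= sqrt(d_KL / 2).
assert d_tv(p, q) <= math.sqrt(d_kl(p, q) / 2)
```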
Theorem A.13 (Pinsker’s Inequality) . For any probability distributions
P, Q : Ω → R , it holds that d TV ( P, Q ) ≤ q d KL ( P k Q ) . efinition A.14 (Conditional KL Divergence) . If P, Q : Ω → R are probability distributions and if W , X , Y , and Z are random variables on Ω , we write d KL ( P W | X k Q Y | Z ) def = E x ← P X (cid:2) d KL ( P W | X = x k Q Y | Z = x ) (cid:3) , which is taken to be + ∞ if there exists x with P X ( x ) > but Q Z ( x ) = 0 . KL divergence obeys a chain rule analogous to that for entropy.
Fact A.15 (Chain Rule for KL Divergence). If P, Q : Ω → R are probability distributions and W, X, Y, Z are random variables on Ω, then

d_KL( P_{W,X} ‖ Q_{Y,Z} ) = d_KL( P_X ‖ Q_Z ) + d_KL( P_{W|X} ‖ Q_{Y|Z} ).

A.2 Conditional KL Divergence
Fact A.16. If P : Ω → R is a probability distribution and E ⊆ Ω is an event, then

d_KL( P|E ‖ P ) = ln( 1/P(E) ).

Fact A.17. Let P, Q : Ω → R be probability distributions and let X, Y be random variables on Ω with Y a function of X. Then

d_KL( P_{X|Y} ‖ Q_{X|Y} ) ≤ d_KL( P_X ‖ Q_X ).

Proof. This is well known, but for completeness:

d_KL( P_{X|Y} ‖ Q_{X|Y} ) = d_KL( P_{X,Y} ‖ Q_{X,Y} ) − d_KL( P_Y ‖ Q_Y )   (chain rule)
 = d_KL( P_X ‖ Q_X ) − d_KL( P_Y ‖ Q_Y )   (Y is a function of X)
 ≤ d_KL( P_X ‖ Q_X ).   (non-negativity of KL)

A.3 Conditional Statistical Distance
Fact A.18. Let P, Q : Ω → R be probability distributions, and let E ⊆ Ω be an arbitrary event. Then

d_TV( P|E, Q|E ) ≤ 2 · d_TV(P, Q) / P(E).

Proof. Suppose for the sake of contradiction that for some A ⊆ E, we have

| (P|E)(A) − (Q|E)(A) | > 2 · d_TV(P, Q) / P(E).

Multiplying both sides by P(E), we obtain

| P(A) − P(E)·(Q|E)(A) | > 2 · d_TV(P, Q).

Since |P(E) − Q(E)| ≤ d_TV(P, Q) and (Q|E)(A) ≤ 1, we have

| P(A) − Q(A) | > d_TV(P, Q),

which is a contradiction.

Corollary A.19.
Let P : Ω → R be a probability distribution, let X, Y and Z be random variables on Ω, let E be an event such that Pr_{z←P_Z}[ P(E | Z=z) ≥ δ ] ≥ 1 − τ, and let P̃ denote P|E. Then

E_{z←P_Z}[ d_TV( P̃_{X|Z=z}, P̃_{Y|Z=z} ) ] ≤ τ + 2 · E_{z←P_Z}[ d_TV( P_{X|Z=z}, P_{Y|Z=z} ) ] / δ.

Proof.

E_{z←P_Z}[ d_TV( P̃_{X|Z=z}, P̃_{Y|Z=z} ) ]
 = E_{z←P_Z}[ 1_{P(E|Z=z)<δ} · d_TV( P̃_{X|Z=z}, P̃_{Y|Z=z} ) + 1_{P(E|Z=z)≥δ} · d_TV( P̃_{X|Z=z}, P̃_{Y|Z=z} ) ]
 ≤ τ + E_{z←P_Z}[ 1_{P(E|Z=z)≥δ} · d_TV( P̃_{X|Z=z}, P̃_{Y|Z=z} ) ]
 ≤ τ + E_{z←P_Z}[ 1_{P(E|Z=z)≥δ} · 2 · d_TV( P_{X|Z=z}, P_{Y|Z=z} ) / P(E|Z=z) ]
 ≤ τ + 2 · E_{z←P_Z}[ d_TV( P_{X|Z=z}, P_{Y|Z=z} ) ] / δ.

B Fourier Analysis
For any (finite) vector space V over F, the character group of V, denoted V̂, is the set of group homomorphisms mapping V (viewed as an additive group) to {±1} (viewed as a multiplicative group). Each such homomorphism is called a character of V.

We will distinguish the space of functions mapping V → R from the space of functions mapping V̂ → R, and view them as two different inner product spaces. For functions mapping V → R, we define the inner product

⟨f, g⟩ := E_{x←V}[ f(x) g(x) ],

and for functions mapping V̂ → R, we define the inner product

⟨f̂, ĝ⟩ := Σ_{χ∈V̂} f̂(χ)·ĝ(χ).

If there is danger of ambiguity, we use ⟨·,·⟩^ to denote the latter inner product, and ‖·‖^ to denote its corresponding norm.

Fact B.1.
Given a choice of basis for V, there is a canonical isomorphism between V and V̂. Specifically, if V = F^n, then the characters of V are the functions of the form χ_γ(v) = (−1)^{γ·v} for γ ∈ F^n.

Definition B.2. For any function f : V → R, its Fourier transform is the function f̂ : V̂ → R defined by

f̂(χ) := ⟨f, χ⟩ = E_{x←V}[ f(x) χ(x) ].

One can verify that the characters of V are orthonormal. Together with the assumption that V is finite, we can deduce that f is equal to Σ_{χ∈V̂} f̂(χ)·χ.

Theorem B.3 (Plancherel). For any f, g : V → R, ⟨f, g⟩ = ⟨f̂, ĝ⟩.

An important special case of Plancherel's theorem is Parseval's theorem:
Theorem B.4 (Parseval). For any f : V → R, ‖f‖ = ‖f̂‖.

C Bound on Optimization Problem
Let W : R^+ → R^+ denote the inverse of the function x ↦ x·e^x (W is known in the literature as the (principal branch of the) Lambert W function). We rely on the following theorem:

Theorem C.1 ([HH00, Corollary 2.4]). There exists a constant C (in particular, C = ln(1 + e^{−1}) works) such that for all y ≥ e,

W(y) ≤ ln y − ln ln y + C.

The following corollary is more directly suited to our needs.
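Theorem C.1 is easy to sanity-check numerically: a short Newton iteration inverts z·e^z, and the gap W(y) − (ln y − ln ln y) stays within [0, 1) on sample inputs. The iteration and the crude envelope below are ours, not from [HH00]:

```python
import math

def lambert_w(y, iters=60):
    """Newton iteration for the principal branch: solves z * e^z = y (y >= e)."""
    z = math.log(y)
    for _ in range(iters):
        z -= (z * math.exp(z) - y) / ((z + 1.0) * math.exp(z))
    return z

for y in [math.e, 10.0, 1e3, 1e6, 1e12]:
    w = lambert_w(y)
    assert abs(w * math.exp(w) - y) < 1e-6 * y  # it really inverts z * e^z
    gap = w - (math.log(y) - math.log(math.log(y)))
    assert 0.0 <= gap < 1.0  # crude version of the Theorem C.1 envelope
```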
Corollary C.2. For any A, B > 0 satisfying A ≥ eB,

min_{τ∈(0,1]} ( A/ln(1/τ) + B/τ ) ≤ 4A / ln(A/B).

Proof. The minimum is achieved (up to a factor of two) when A/ln(1/τ) = B/τ, because A/ln(1/τ) is monotonically increasing in τ while B/τ is monotonically decreasing. Making the change of variables z = −ln(τ), this balancing condition is equivalent to z·e^z = A/B, i.e. z = W(A/B). This choice of z (or equivalently τ) gives

A/ln(1/τ) + B/τ = 2A / W(A/B)
 = 2B · (A/B) / W(A/B)
 = 2B · exp( W(A/B) )   (definition of W)
 ≤ 2A·(1 + e^{−1}) / ln(A/B)   (Theorem C.1)
 ≤ 4A / ln(A/B).

References

[BGKW88] Michael Ben-Or, Shafi Goldwasser, Joe Kilian, and Avi Wigderson. Multi-prover interactive proofs: How to remove intractability assumptions. In
STOC, pages 113–131. ACM, 1988.

[BJKS04] Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. J. Comput. Syst. Sci., 68(4):702–732, 2004.

[CHTW04] Richard Cleve, Peter Høyer, Benjamin Toner, and John Watrous. Consequences and limits of nonlocal strategies. In CCC, pages 236–249. IEEE Computer Society, 2004.

[DHVY17] Irit Dinur, Prahladh Harsha, Rakesh Venkat, and Henry Yuen. Multiplayer parallel repetition for expanding games. In ITCS, volume 67 of LIPIcs, pages 37:1–37:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017.

[EPR35] Albert Einstein, Boris Podolsky, and Nathan Rosen. Can quantum-mechanical description of physical reality be considered complete? Physical Review, 47(10):777, 1935.

[Fei91] Uriel Feige. On the success probability of the two provers in one-round proof systems. In Structure in Complexity Theory Conference, pages 116–123. IEEE Computer Society, 1991.

[FGL+91] Uriel Feige, Shafi Goldwasser, László Lovász, Shmuel Safra, and Mario Szegedy. Approximating clique is almost NP-complete (preliminary version). In FOCS, pages 2–12. IEEE Computer Society, 1991.

[FK91] H. Furstenberg and Y. Katznelson. A density version of the Hales-Jewett theorem. Journal d'Analyse Mathématique, 57(1):64–119, December 1991.

[For89] Lance Jeremy Fortnow. Complexity-theoretic aspects of interactive proof systems. PhD thesis, MIT, 1989.

[FRS94] Lance Fortnow, John Rompel, and Michael Sipser. On the power of multi-prover interactive protocols. Theor. Comput. Sci., 134(2):545–557, 1994.

[FV96] Uriel Feige and Oleg Verbitsky. Error reduction by parallel repetition - a negative result. In Steven Homer and Jin-Yi Cai, editors, CCC, pages 70–76. IEEE Computer Society, 1996.

[GHZ89] Daniel M. Greenberger, Michael A. Horne, and Anton Zeilinger. Going Beyond Bell's Theorem, pages 69–72. Springer Netherlands, Dordrecht, 1989.

[HH00] Abdolhossein Hoorfar and Mehdi Hassani. Inequalities on the Lambert W function and hyperpower function. J. Inequal. Pure and Appl. Math, 2000.

[Hol09] Thomas Holenstein. Parallel repetition: Simplification and the no-signaling case. Theory Comput., 5(1):141–172, 2009.

[HY19] Justin Holmgren and Lisa Yang. The parallel repetition of non-signaling games: counterexamples and dichotomy. In STOC, pages 185–192. ACM, 2019.

[MS13] Carl A. Miller and Yaoyun Shi. Optimal robust self-testing by binary nonlocal XOR games. In TQC, volume 22 of LIPIcs, pages 254–262. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2013.

[Pol12] D.H.J. Polymath. A new proof of the density Hales-Jewett theorem. Annals of Mathematics, 175(3):1283–1327, May 2012.

[PRW97] Itzhak Parnafes, Ran Raz, and Avi Wigderson. Direct product results and the GCD problem, in old and new communication models. In Frank Thomson Leighton and Peter W. Shor, editors, STOC, pages 363–372. ACM, 1997.

[Raz98] Ran Raz. A parallel repetition theorem. SIAM J. Comput., 27(3):763–803, 1998.

[Raz11] Ran Raz. A counterexample to strong parallel repetition. SIAM J. Comput., 40(3):771–777, 2011.

[Ver96] Oleg Verbitsky. Towards the parallel repetition conjecture. Theor. Comput. Sci., 157(2):277–282, 1996.

[Yue16] Henry Yuen. A parallel repetition theorem for all entangled games. In ICALP, volume 55 of