Classification of the streaming approximability of Boolean CSPs

Chi-Ning Chou*   Alexander Golovnev†   Madhu Sudan‡   Santhoshini Velusamy§

Abstract
A Boolean constraint satisfaction problem (CSP), Max-CSP(f), is a maximization problem specified by a constraint f : {−1, 1}^k → {0, 1}. An instance of the problem consists of m constraint applications on n Boolean variables, where each constraint application applies the constraint to k literals chosen from the n variables and their negations. The goal is to compute the maximum number of constraints that can be satisfied by a Boolean assignment to the n variables. In the (γ, β)-approximation version of the problem, for parameters γ ≥ β ∈ [0, 1], the goal is to distinguish instances where at least a γ fraction of the constraints can be satisfied from instances where at most a β fraction of the constraints can be satisfied.

In this work we completely characterize the approximability of all Boolean CSPs in the streaming model. Specifically, given f, γ and β we show that either (1) the (γ, β)-approximation version of Max-CSP(f) has a probabilistic streaming algorithm using O(log n) space, or (2) for every ε > 0, the (γ − ε, β + ε)-approximation version of Max-CSP(f) requires Ω(√n) space for probabilistic streaming algorithms. Previously such a separation was known only for k = 2. We stress that for k = 2, there are only finitely many distinct problems to consider.

Our positive results show wider applicability of the bias-based algorithms used previously by [GVV17] and [CGV20], by giving a systematic way to explore biases. Our negative results combine the Fourier analytic methods of [KKS15], which we extend to a wider class of CSPs, with a rich collection of reductions among communication complexity problems that lie at the heart of the negative results.

*School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, USA. Supported by NSF awards CCF 1565264 and CNS 1618026. Email: [email protected]
†Department of Computer Science, Georgetown University. Email: [email protected]
‡School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, USA. Supported in part by a Simons Investigator Award and NSF Award CCF 1715187. Email: [email protected]
§School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, USA. Supported in part by a Simons Investigator Award and NSF Award CCF 1715187. Email: [email protected]

1 Introduction
In this paper we give a complete characterization of the approximability of Boolean constraint satisfaction problems (CSPs) described by a single constraint in the streaming setting. We describe the exact class of problems below, and give a brief history of previous work before giving our results.
In this paper we use N to denote the set of natural numbers {1, 2, 3, . . .}. For n ∈ N we use [n] to denote the set {1, 2, . . . , n}. We refer to a variable taking values in {−1, 1} as a Boolean variable. Given a Boolean variable X, we refer to X and −X as the literals associated with X. For vectors a, b ∈ R^n we use a ⊙ b to denote their coordinate-wise product, i.e., if a = (a_1, . . . , a_n) and b = (b_1, . . . , b_n) then a ⊙ b = (a_1 b_1, . . . , a_n b_n).

In this paper, a Boolean CSP is a maximization problem, Max-CSP(f), specified by a single constraint function f : {−1, 1}^k → {0, 1} for some positive integer k. Given n Boolean variables x_1, . . . , x_n, an application of the constraint function f to these variables, which we term simply a constraint, is given by two k-tuples j = (j_1, . . . , j_k) ∈ [n]^k and b = (b_1, . . . , b_k) ∈ {−1, 1}^k, where the j_i's are distinct, and represents the application of the constraint function f to the literals b_1 x_{j_1}, . . . , b_k x_{j_k}. Specifically, an assignment σ = (σ_1, . . . , σ_n) ∈ {−1, 1}^n satisfies a constraint given by (j, b) if f(b_1 σ_{j_1}, . . . , b_k σ_{j_k}) = 1. For a constraint C = (j, b) and assignment σ we use σ|_j as shorthand for (σ_{j_1}, . . . , σ_{j_k}) and C(σ) as shorthand for f(b ⊙ σ|_j) = f(b_1 σ_{j_1}, . . . , b_k σ_{j_k}). An instance Ψ of Max-CSP(f) consists of m constraints C_1, . . . , C_m applied to n variables x_1, . . . , x_n. The value of an assignment σ ∈ {−1, 1}^n on an instance Ψ = (C_1, . . . , C_m), denoted val_Ψ(σ), is the fraction of constraints satisfied by σ, i.e., val_Ψ(σ) = (1/m) Σ_{i∈[m]} C_i(σ). The goal of the exact problem is to compute the maximum, over all assignments, of the value of the assignment on the input instance, i.e., to compute, given Ψ, the quantity val_Ψ = max_{σ∈{−1,1}^n} {val_Ψ(σ)}. (We note that the literature on CSPs has several generalizations: one may allow an entire set of constraint functions, not just a single one. One may restrict the constraint applications to be applicable only to variables and not literals. And finally one can of course consider non-Boolean CSPs. We do not do any of those in this paper, though extending our techniques to classes of functions seems immediately feasible. See more discussion in Section 1.7.)

In this work we consider the approximation version of Max-CSP(f), which we study in terms of "gapped promise problems". Specifically, given 0 ≤ β < γ ≤ 1, the (γ, β)-approximation version of Max-CSP(f), abbreviated (γ, β)-Max-CSP(f), is the task of distinguishing between instances from Γ = {Ψ | opt(Ψ) ≥ γ} and instances from B = {Ψ | opt(Ψ) ≤ β}. It is well known that this distinguishability problem is a refinement of the usual study of approximation, which usually studies the ratio γ/β for tractable versions of (γ, β)-Max-CSP(f). See Proposition 2.10 for a formal statement in the context of streaming approximability of Max-CSP(f) problems.

We consider streaming algorithms that take as input instances Ψ of Max-CSP(f) on n variables and m clauses for m, n ∈ N. Here m and n are given to our algorithms initially, and then the constraints C_1, . . . , C_m arrive one at a time. Our algorithms are allowed to use internal randomness and s bits of space, and they output a single bit at the end. They are said to solve the (γ, β)-approximation problem correctly if they output the correct answer with probability at least 2/3. Our main interest is in the distinction between problems that admit algorithms using space O(poly log n), versus algorithms that require space at least n^ε for some ε > 0. In informal usage we refer to a streaming problem as "easy" if it can be solved with polylogarithmic space (the former setting) and "hard" if it requires polynomial space (the latter setting).
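To make these definitions concrete, the following minimal sketch (our illustration; the function and variable names are ours, not the paper's) evaluates val_Ψ(σ) directly and computes val_Ψ by brute force for small n.

```python
from itertools import product

# A constraint is a pair (j, b): k distinct 1-indexed variable indices j and
# signs b in {-1, +1}^k. An instance is a list of constraints over n variables.

def satisfies(f, constraint, sigma):
    """C(sigma) = f(b_1 * sigma_{j_1}, ..., b_k * sigma_{j_k})."""
    j, b = constraint
    return f(tuple(b_t * sigma[j_t - 1] for j_t, b_t in zip(j, b)))

def value(f, instance, sigma):
    """val_Psi(sigma): the fraction of constraints satisfied by sigma."""
    return sum(satisfies(f, c, sigma) for c in instance) / len(instance)

def opt_value(f, instance, n):
    """val_Psi: maximum over all 2^n assignments (exponential; illustration only)."""
    return max(value(f, instance, s) for s in product((-1, 1), repeat=n))

# Max-2AND (f(a,b) = 1 iff a = b = 1) on the instance {x1 AND x2, x1 AND NOT x3}:
f_2and = lambda a: int(a[0] == 1 and a[1] == 1)
psi = [((1, 2), (1, 1)), ((1, 3), (1, -1))]
print(opt_value(f_2and, psi, n=3))  # 1.0, witnessed by sigma = (1, 1, -1)
```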
To our knowledge, streaming algorithms for Boolean CSPs have not been investigated extensively. On the positive side, it may be surprising that there exists any non-trivial algorithm at all. Here, and later, we describe algorithms solving the (1, ρ(f) − ε)-approximation problem for ε > 0 as trivial, where ρ(f) = 2^{−k} Σ_{a∈{−1,1}^k} f(a) is the fraction of clauses satisfied by a random assignment. Note that the algorithm that always outputs 1 solves the (1, ρ(f) − ε)-approximation problem: every instance Ψ satisfies val_Ψ ≥ ρ(f) (by averaging over a uniformly random assignment), so no instance has value at most ρ(f) − ε.

It turns out that there do exist some non-trivial approximation algorithms for Boolean CSPs. This was established by the work of Guruswami, Velingker, and Velusamy [GVV17] who, in our notation, gave an algorithm for the (γ, γ/2 − ε)-approximation version of Max-2AND, for every γ ∈ [0, 1] (Max-2AND is the Max-CSP(f) problem corresponding to f(a, b) = 1 if a = b = 1 and 0 otherwise). A central ingredient in their algorithm is the ability of streaming algorithms to approximate the ℓ_1 norm of a vector in the turnstile model, which allows them to estimate the "bias" of the n variables (how often they occur positively in constraints, as opposed to negatively). Subsequently, the work of Chou, Golovnev, and Velusamy [CGV20] further established the utility of such algorithms, which we refer to as bias-based algorithms, by giving optimal algorithms for all Boolean CSPs on 2 variables. In particular they give a better (optimal!) analysis of bias-based algorithms for Max-2AND, and show that Max-2SAT also has an optimal algorithm based on bias. We note that Max-2SAT is again not covered by the results of the current paper since it involves two functions, corresponding to clauses of length 1 and clauses of length 2.

On the negative side, the problem that has been explored the most is Max-CUT, or in our language Max-2XOR, which corresponds to f(x, y) = x ⊕ y = (1 − xy)/2. (Strictly speaking this setting does not include Max-CUT itself, which does not allow constraints to be placed on arbitrary literals. Max-2XOR is however very closely related, and in particular is harder than Max-CUT.) Kapralov, Khanna, and Sudan [KKS15] showed that Max-2XOR does not have a (1, 1/2 + ε)-approximation algorithm using o(√n) space. This was subsequently improved upon by Kapralov, Khanna, Sudan, and Velingker [KKSV17], and Kapralov and Krachun [KK19]. The final paper [KK19] completely resolves Max-CUT and Max-2XOR, showing that (1, 1/2 + ε)-approximation for these problems requires Ω(n) space. Turning to other problems, the work of [GVV17] notices that the (1, 1/2 + ε)-inapproximability of Max-2XOR immediately yields an inapproximability result for Max-2AND as well. In [CGV20] more sophisticated reductions are used to improve the inapproximability result for Max-2AND to a (γ, 4γ/9 + ε)-inapproximability for some positive γ, which turns out to be the optimal ratio by their algorithm and analysis. As noted earlier, their work gives optimal algorithms for all functions f : {−1, 1}² → {0, 1}.

Our main theorem is a dichotomy for approximating all Boolean CSPs in the streaming setting.
Theorem 1.1. For every k ∈ N, for every function f : {−1, 1}^k → {0, 1}, and for every 0 ≤ β < γ ≤ 1, at least one of the following always holds:

1. (γ, β)-Max-CSP(f) has an O(log n)-space streaming algorithm.

2. For every ε > 0, (γ − ε, β + ε)-Max-CSP(f) requires Ω(√n) space. If γ = 1, then (1, β + ε)-Max-CSP(f) requires Ω(√n) space.

Furthermore, given the truth-table of f, and γ and β as ℓ-bit rationals (α ∈ R is said to be an ℓ-bit rational if there exist integers −2^ℓ < p, q < 2^ℓ such that α = p/q), it can be decided in polynomial space poly(2^k, ℓ) which one of the two conditions holds.

In analogy with the terminology used in the study of CSP approximation in polynomial time, we define a problem to be "(streaming-)approximation-resistant" if it is hard to beat a random assignment with n^{o(1)} space. Recall ρ(f) denotes the fraction of assignments that satisfy a function f. We say that Max-CSP(f) is streaming-approximation-resistant if, for every ε > 0, (1, ρ + ε)-Max-CSP(f) requires Ω(n^δ) space for some δ > 0. (We suppress the qualifier "streaming-" for most of the paper.) We get the following dichotomy for streaming-approximation-resistance.
Corollary 1.2. For every k ∈ N and for every function f : {−1, 1}^k → {0, 1}, if Max-CSP(f) is streaming-approximation-resistant, then for every ε > 0, the (1, ρ + ε)-approximation version of Max-CSP(f) requires Ω(√n) space. If Max-CSP(f) is not streaming-approximation-resistant, then there exists ε > 0 such that (1 − ε, ρ + ε)-Max-CSP(f) can be solved in logarithmic space. Furthermore, given the truth-table of the function f, there is an algorithm running in space poly(2^k) that decides if Max-CSP(f) is streaming-approximation-resistant or not.

In Section 2.4, we show how to apply our theorem above to get a full characterization of the approximation profile of the Max-2AND problem (i.e., the Max-CSP(f) problem for f(x, y) = 1 if x = y = 1 and 0 otherwise). This reproduces the result shown in [CGV20] while giving a more refined picture of the approximability. See Section 2.4.

We remark that while our dichotomy theorems are in some sense "explicit" (formalized best by the assertion that they can be decided in PSPACE given the truth table of f : {−1, 1}^k → {0, 1}, γ, and β), they do not necessarily resolve questions about the approximation resistance of an infinite family of functions such as the linear threshold functions. But they can be applied to get some uniform classes of results. We mention one below.

Say that a function f supports 1-wise independence if there exists a distribution D supported on the satisfying assignments to f, i.e., on f^{−1}(1) ⊆ {−1, 1}^k, such that its marginals are all uniform, i.e., for every j ∈ [k], we have E_{a∼D}[a_j] = 0. Our main theorem immediately yields the following corollary.

Corollary 1.3. If f : {−1, 1}^k → {0, 1} supports 1-wise independence then Max-CSP(f) is streaming-approximation-resistant.

We also give a (very) partial converse, showing that symmetric functions are approximation-resistant if and only if they support 1-wise independence (see Lemma 2.11). While we do believe that there are other streaming-approximation-resistant problems, we do not know of one (and in particular do not give one in this paper). We discuss this more in the next section.
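The hypothesis of Corollary 1.3 is mechanically checkable: supporting 1-wise independence is a linear feasibility question over distributions on f^{−1}(1). The sketch below (our illustration, not the paper's decision procedure; it assumes scipy is available) phrases it as an LP.

```python
from itertools import product
import numpy as np
from scipy.optimize import linprog

def supports_onewise_independence(f, k):
    """Is there a distribution D on f^{-1}(1) with E_{a~D}[a_j] = 0 for all j?

    Variables: D(a) for each satisfying assignment a. Constraints: D is a
    probability distribution and all k first moments vanish -- pure LP feasibility.
    """
    sat = [a for a in product((-1, 1), repeat=k) if f(a) == 1]
    if not sat:
        return False
    A_eq = np.vstack([np.ones(len(sat)),                # total mass is 1
                      np.array(sat, dtype=float).T])    # k zero-marginal rows
    b_eq = np.concatenate([[1.0], np.zeros(k)])
    res = linprog(c=np.zeros(len(sat)), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * len(sat), method="highs")
    return res.success

# Max-2XOR supports 1-wise independence (uniform on {(1,-1), (-1,1)}); Max-2AND does not.
f_2xor = lambda a: int(a[0] != a[1])
f_2and = lambda a: int(a == (1, 1))
print(supports_onewise_independence(f_2xor, 2))  # True
print(supports_onewise_independence(f_2and, 2))  # False
```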
1.5 Contrast with dichotomies in the polynomial time setting

The literature on dichotomies of Max-CSP(f) problems is vast. One broad family of results here [Sch78, Bul17, Zhu17] considers the exact satisfiability problems (corresponding to distinguishing between instances from {Ψ | opt(Ψ) = 1} and instances from {Ψ | opt(Ψ) < 1}). Another family of results [Rag08, AM09, KTW14] considers the approximation versions of Max-CSP(f) and gets "near dichotomies" along the lines of this paper — i.e., they either show that the (γ, β)-approximation is easy (in polynomial time), or that for every ε > 0 the (γ − ε, β + ε)-approximation version is hard (in some appropriate sense). Our work resembles the latter series of works both in terms of the nature of results obtained, the kinds of characterizations used to describe the "easy" and "hard" classes, and also in the proof approaches (though of course the streaming setting is much easier to analyze, allowing for much simpler proofs overall). We summarize their results, giving comparisons to our theorem, and then describe a principal contrast.

In a seminal work, Raghavendra [Rag08] gave a characterization of the polynomial time approximability of the Max-CSP(f) problems based on the unique games conjecture [Kho02]. Our Theorem 1.1 is analogous to his theorem, though restricted to a single function, with Boolean variables, with the ability to complement variables. A characterization of approximation resistant functions is given by Khot, Tulsiani and Worah [KTW14]. Our Corollary 1.2 is analogous to this. Austrin and Mossel [AM09] show that all functions supporting a pairwise independent distribution are approximation-resistant. Our Corollary 1.3 is analogous to this theorem.

While our results run in parallel to the work on polynomial time approximability, our characterizations are not immediately comparable. Indeed there are some significant differences which we highlight below. Of course there is the obvious difference that our negative results are unconditional (and not predicated on a complexity theoretic assumption like the unique games conjecture or P ≠ NP). But more significantly our characterization is a bit more "explicit" than those of [Rag08] and [KTW14]. In particular the former only shows decidability of the problems which take ε as an input (in addition to γ, β and f) and distinguish (γ, β)-approximable problems from (γ − ε, β + ε)-inapproximable problems. The running time of their decision procedure grows with 1/ε. In contrast our distinguishability separates (γ, β)-approximability from "∀ε > 0, (γ − ε, β + ε)-inapproximability" — so our algorithm does not require ε as an input; it merely takes γ, β and f as input. Indeed this difference is key to the understanding of approximation resistance. Due to the stronger form of our main theorem (Theorem 1.1), our characterization of streaming-approximation-resistance is explicit (decidable in PSPACE), whereas a decidable characterization of approximation-resistance in the polynomial time setting seems to be still open.

Our characterizations also seem to differ from the previous versions in terms of the features being exploited to distinguish the two classes. This leads to some strange gaps in our knowledge. For instance, it would be natural to suspect that (conditional) inapproximability in the polynomial time setting should also lead to (unconditional) inapproximability in the streaming setting. (Of course, if this were false, it would be a breakthrough result giving a polynomial time (even log space) algorithm for the unique games!) But we don't have a formal theorem proving this. One (unfulfilling) consequence of this gap in knowledge is that we do not yet have a streaming-approximation-resistant problem that is not covered by Corollary 1.3. In the polynomial time setting, Potechin [Pot19] gives a balanced linear threshold function that is approximation-resistant. Balanced linear threshold functions do not support 1-wise independence, and so his function would be a good candidate for a streaming-approximation-resistant function that is not covered by Corollary 1.3.
1.6 Overview of our analysis

At the heart of our characterization is a family of algorithms for Max-CSP(f) in the streaming setting. We will describe this family soon, but the main idea of our proof is that if no algorithm in this family solves (γ, β)-Max-CSP(f), then we can extract a single pair of instances, roughly a γ-satisfiable "yes" instance and an at most β-satisfiable "no" instance, that certify this inability. We then show how this pair of instances can be exploited as gadgets in a negative result. Up to this part our approach resembles that in [Rag08] (though of course all the steps are quite different). The main difference is that we are able to use the structure of the algorithm and the lower bound construction to show that we can afford to consider only instances on k variables. (This step involves a non-trivial choice of definitions that we elaborate on shortly.) This bound on the number of variables allows us to get a very "decidable" separation between approximable and inapproximable problems. Specifically we show that the distinction between the approximable setting and the inapproximable one can be expressed by a quantified formula over the reals with a constant number of quantifiers over O(2^k) variables and equations — a problem that is known to be solvable in PSPACE. We give more details below.
Bias-based algorithms. For every λ = (λ_1, . . . , λ_k) ∈ R^k we define the λ-bias measure of an instance Ψ of Max-CSP(f) as follows. Let p_{ij} denote the number of occurrences of the literal x_i as the j-th variable in a constraint, and let n_{ij} denote the same quantity for the literal −x_i. Let bias_{i,j} = (1/m)(p_{ij} − n_{ij}). We define the λ-bias of the i-th variable to be a weighted sum of the bias_{i,j}: bias_λ(Ψ)_i = Σ_{j=1}^{k} λ_j bias_{i,j}. Let the bias vector of the instance Ψ be bias_λ(Ψ) = (bias_λ(Ψ)_1, . . . , bias_λ(Ψ)_n). It turns out that the ability to estimate the ℓ_1 norm of a vector in the "turnstile model" implies that for any given λ vector, we can estimate the ℓ_1 norm of bias_λ(Ψ) (to within a multiplicative factor of (1 ± ε) for arbitrarily small ε > 0) in logarithmic space. We refer to an algorithm that solves (γ, β)-Max-CSP(f) using only an estimate of the ℓ_1 norm of bias_λ(Ψ) (for some λ based on f, γ, β) as a "bias-based algorithm". A priori it is not clear how to choose a λ vector for a given problem. The crux of our analysis is to identify two (bounded, closed) convex sets K^Y_γ, K^N_β ⊆ R^k such that if the two sets are disjoint then the hyperplane separating them gives us the desired λ.

We now give some insight into the sets K^Y_γ and K^N_β. Roughly these sets capture properties of instances of Max-CSP(f) on k variables, say x_1, . . . , x_k. The instances we consider are special in that x_i always appears as the i-th variable in every constraint: the only variability being in whether it appears positively or negatively. The set K^Y_γ consists of the bias vectors of all instances Ψ that have val_Ψ(1^k) ≥ γ, i.e., where the assignment of all 1's satisfies at least a γ fraction of the constraints of Ψ. The set K^N_β is similarly supposed to capture the biases of instances Ψ for which the value is at most β. Determining exactly which assignments achieve this bounded value turns out to be subtle, and we defer describing it here. But given our choice, our analysis roughly works as follows: Given an instance Ψ on n variables, we create a distribution D(Ψ) ∈ Δ({−1, 1}^k) and its projection µ onto R^k such that if Ψ is a YES instance, then µ ends up being in K^Y_γ, while if Ψ is a NO instance, µ ∈ K^N_β. Most crucially, the ℓ_1 norm of bias_λ(Ψ) exactly corresponds to the distance from µ to the hyperplane separating K^Y_γ and K^N_β, which allows us to distinguish the YES and NO cases. Details of the definition of the sets can be found in Section 2 and the analysis of the algorithm can be found in Section 4.

Communication complexity of hidden partitions. Hardness results in streaming are usually obtained by appealing to lower bounds for one-way communication complexity, and our work is no different. The rough idea is to create instances Ψ that are divided into a (large) constant number of sub-instances Ψ_1, . . . , Ψ_T that are on the same set of variables, x_1, . . . , x_n. In YES instances, the sub-instances are chosen so that a planted assignment chosen uniformly satisfies a γ fraction of the constraints. In NO instances, the sub-instances are chosen "randomly" so that no assignment is very likely to satisfy a β fraction of the constraints. The division into sub-instances is used as follows: no two constraints within a sub-instance share variables, so an algorithm with limited memory when facing the stream corresponding to Ψ_t would not really see any interesting patterns locally, and so would need to remember "details" about Ψ_1, . . . , Ψ_{t−1}.
However, and this is where our sets K^Y and K^N come into play, remembering univariate marginals (or how often x_i appeared positively or negatively) would hopefully be of no use, since both the YES and the NO distributions would have exactly the same marginals.

Implementing this reduction to the communication complexity problem is mostly straightforward given previous works. We don't describe the reduction but only the reduced communication problem. We consider a two-player one-way communication problem, which we call the Randomized Mask Detection (RMD) problem, where Alice gets a vector x* ∈ {−1, 1}^n chosen uniformly at random, and Bob gets a random k-uniform hypermatching M with αn hyperedges on [n], along with a vector z ∈ {−1, 1}^{kαn} whose distribution depends on whether we are in the YES case or NO case (here α is some small but positive constant). Specifically, z specifies the x-values of the vertices touched by M, but this information is hidden partially by picking for each edge (independently) a masking vector b and letting z for this edge be the information for x* masked by (xor'ing with) b. See Section 5.2 for a mathematically precise statement. The key difference between the YES instance and the NO instance is the distribution of b: In the YES case, it is chosen according to some distribution D_Y supported on {−1, 1}^k whose marginals are in K^Y_γ; and in the NO case, they come from the distribution D_N whose marginals are in K^N_β. Of course, we apply this reduction only in the setting where the two sets of marginals intersect, so for our purpose we can ignore K^Y_γ and K^N_β, and just consider two arbitrary distributions D_Y and D_N with matching marginals. The technical meat of our negative result is proving that for an arbitrary pair of distributions D_Y and D_N with matching marginals, any one-way communication protocol with o(√n) communication has o(1)-advantage in distinguishing the YES and NO cases. See Theorem 5.3.

The work of Kapralov, Khanna, and Sudan [KKS15] seeds our quest by showing that (D_Y, D_N)-RMD is hard in the special case where D_Y is uniform on {(1, 1), (−1, −1)} and D_N is uniform on {−1, 1}². Strictly speaking their formalism is slightly different — and one in which we are not able to express all our problems, but their proof for this case certainly applies to our formalism. (In order to handle the general Max-CSP problem, in RMD we extend the previous framework with a more detailed encoding of the hypermatching M, and also allow for a general masking vector b. Due to these extensions, we cannot immediately conclude hardness of RMD from previous results, and we prove it from scratch.) The proof of [KKS15] is Fourier analytic, based on prior work of Gavinsky, Kempe, Kerenidis, Raz, and de Wolf [GKK+09]. We extend their proof to {−1, 1}^k for all values of k, and to all distributions D_Y and D_N that have uniform marginals. This is reported in Section 6.

Somewhat to our surprise, we were unable to extend the Fourier analytic proof to the case where D_Y and D_N have arbitrary but matching marginals. To get the full case, we turn to reductions. Specifically we show that while we cannot directly prove the indistinguishability of general D_Y and D_N with matching marginals, we can use the indistinguishability for uniform marginals as a tool (via reductions) to show indistinguishability of some restricted pairs of distributions (D, D′). The key to the final result is that for any pair of distributions D_Y and D_N with matching marginals, there is a path from one to the other of finite length (our upper bound is poly(k!)) such that every adjacent pair of distributions on the path is indistinguishable by our aforementioned reductions for restricted pairs. We remark that while D_Y and D_N are typically chosen to have interesting properties with respect to their value on various assignments, the intermediate distributions may not have any interesting properties for the underlying optimization problem! But the generality of the framework turns out to be a strength in that we can refer to these problems anyway and use their indistinguishability features. The path from D_Y to D_N allows us to use the triangle inequality for indistinguishability to get the final result on indistinguishability of RMD on distributions with matching marginals. Details of this part can be found in Section 7.

1.7 Open problems

Some of the main questions left open in this work are:

1. Can the methods be extended to handle the case where the constraints come from a family of functions, rather than a single function? We believe this should be straightforward to achieve.

2. Can we further extend the results to the setting where the constraints are not placed on literals, but rather only on variables? Such an extension seems to require new ideas beyond those in this paper.

3. Can we extend the results to the non-Boolean setting, i.e., when the variables take on values from an arbitrary finite set, as opposed to {−1, 1}? We stress that both the positive and negative results in this paper exploit restrictions of the Boolean setting! In this direction, Guruswami and Tao [GT19] proved that (1/p + ε)-approximation for the unique games with alphabet size p requires Ω̃(√n) space in the streaming setting.

4. Can the lower bound for the hard problems be improved to linear? Such an improvement was given by Kapralov and Krachun [KK19] for the Max-2LIN problem (Max-CSP(f) where f(x, y) = x ⊕ y) in a technical tour-de-force. Extending this work to other optimization problems seems non-trivially challenging.

5. Finally, our work and all the questions above only consider the setting of single-pass streaming algorithms. Once this is settled, it would make sense to extend the analyses to multi-pass algorithms. While there are several multi-pass streaming algorithms and lower bounds (see, e.g., [Cha20, McG14, GM08] and references therein), we note that Assadi, Kol, Saxena, and Yu [AKSY20] recently suggested a multi-round version of the Boolean Hidden Hypermatching problem that allows one to extend some previous single-pass results (including a lower bound for approximate Max-2LIN) to the multi-pass setting.
Organization of the rest of the paper. In Section 2, we describe our result in detail. In particular we give an explicit criterion to distinguish the easy and hard Max-CSP(f) problems in the streaming setting. Section 3 contains some of the preliminary background used in the rest of the paper. In Section 4, we describe and analyze our algorithm that yields our easiness result. In Section 5, we define the central family of communication problems that lie at the heart of our negative results and prove the negative result for streaming problems assuming the communication problems are hard. In Section 6, we establish the desired lower bounds for a subclass of the problems by Fourier analytic methods. In Section 7, we establish reductions between the communication problems that allow us to extend our negative results to the entire set.

2 Detailed statement of results

We start with some notation needed to state our results. We use R_{≥0} to denote the set of non-negative real numbers. For a finite set Ω, let Δ(Ω) denote the space of all probability distributions over Ω, i.e.,

Δ(Ω) = {D : Ω → R_{≥0} | Σ_{ω∈Ω} D(ω) = 1}.

We view Δ(Ω) as being contained in R^{|Ω|}. We use X ∼ D to denote a random variable drawn from the distribution D.

The main objects that allow us to derive our characterization are the spaces of distributions on constraints that either allow a large number of constraints to be satisfied, or only a few constraints to be satisfied. To see where the distributions come from, note that distributions of constraints over n variables can naturally be identified with instances of the weighted constraint satisfaction problem (where the weight associated with a constraint is simply its probability). In what follows we will consider instances on exactly k variables x_1, . . . , x_k. Furthermore all constraints will use x_i as the i-th variable. Hence, a constraint on k variables is specified by b ∈ {−1, 1}^k, specifying the constraint f(b_1 x_1, . . . , b_k x_k). Thus in what follows we will equate "instances on k variables" with distributions on {−1, 1}^k.

Given 0 ≤ β ≤ γ ≤ 1, we define two sets of such instances. The first set S^Y_γ = S^Y_γ(f) will be instances where at least a γ fraction of the constraints are satisfied by the assignment 1^k. The second set S^N_β = S^N_β(f) is a bit more subtle: it consists of instances where no "independent identical distribution" on the variables satisfies more than a β fraction of the clauses. To elaborate, recall that the only distributions on a single variable taking values in {−1, 1} are the Bernoulli distributions. Let Bern(p) denote the distribution that takes the value 1 with probability p and the value −1 with probability 1 − p. Then an instance belongs to S^N_β if for every p, when (x_1, . . . , x_k) gets a random assignment chosen according to Bern(p)^k, the expected fraction of satisfied clauses is at most β. The following is our formal definition.

Definition 2.1 (Space of Yes/No Distributions). For γ, β ∈ R, we define

S^Y_γ = S^Y_γ(f) = {D_Y ∈ Δ({−1, 1}^k) | E_{b∼D_Y}[f(b)] ≥ γ}

and

S^N_β = S^N_β(f) = {D_N ∈ Δ({−1, 1}^k) | E_{b∼D_N} E_{a∼Bern(p)^k}[f(b ⊙ a)] ≤ β, ∀p ∈ [0, 1]}.

For γ > β the sets S^Y_γ and S^N_β are clearly disjoint. But their marginals, when projected to single coordinates, need not be, and this is the crux of our characterization. In what follows, we define sets K^Y_γ and K^N_β to be the marginals of the distributions in S^Y_γ and S^N_β respectively. For a distribution D ∈ Δ({−1, 1}^k), let µ(D) denote its marginals, i.e., µ(D) = (µ_1, . . . , µ_k) where µ_i = E_{b∼D}[b_i].

Definition 2.2 (Marginals of Yes/No Distributions). For γ, β ∈ R, we define

K^Y_γ = K^Y_γ(f) = {µ(D_Y) | D_Y ∈ S^Y_γ}   and   K^N_β = K^N_β(f) = {µ(D_N) | D_N ∈ S^N_β}.
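To unpack Definition 2.1 numerically, the sketch below (ours; the grid over p is a stand-in for the universal quantifier, so the test is only approximate) computes the marginal vector µ(D) and tests membership of a distribution D in S^N_β.

```python
from itertools import product
import numpy as np

def marginals(D, k):
    """mu(D)_i = E_{b~D}[b_i], for D given as a dict {b: probability}."""
    return [sum(pb * b[i] for b, pb in D.items()) for i in range(k)]

def bern_value(f, D, k, p):
    """E_{b~D} E_{a~Bern(p)^k}[f(b . a)], with a_i = 1 w.p. p and -1 w.p. 1-p."""
    total = 0.0
    for b, pb in D.items():
        for a in product((-1, 1), repeat=k):
            pa = np.prod([p if ai == 1 else 1 - p for ai in a])
            total += pb * pa * f(tuple(bi * ai for bi, ai in zip(b, a)))
    return total

def in_S_N(f, D, k, beta, grid=1001):
    """Approximate membership test for S^N_beta: the expectation is a degree-k
    polynomial in p, checked here on a grid instead of for all p in [0, 1]."""
    return all(bern_value(f, D, k, p) <= beta + 1e-9 for p in np.linspace(0, 1, grid))

# For Max-2AND, the uniform distribution on {-1,1}^2 lies in S^N_{1/4}: a
# uniformly masked AND constraint is satisfied with probability exactly 1/4.
f_2and = lambda a: int(a == (1, 1))
U = {b: 0.25 for b in product((-1, 1), repeat=2)}
print(marginals(U, 2))              # [0.0, 0.0], so 0^2 lies in K^N_{1/4}
print(in_S_N(f_2and, U, 2, 0.25))   # True
```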
With the two definitions above in hand we are ready to describe our characterization of easy vs. hard approximation versions of Max-CSP(f). Our main result, stated formally below, roughly says that the Max-CSP(f) problem is (γ, β)-approximable if and only if the sets K^Y_γ and K^N_β do not intersect.

Theorem 2.3. For every function f : {−1, 1}^k → {0, 1} and for every 0 ≤ β < γ ≤ 1, the following hold:

1. If K^Y_γ(f) ∩ K^N_β(f) = ∅, then (γ, β)-Max-CSP(f) admits a probabilistic streaming algorithm that uses O(log n) space.

2. If K^Y_γ(f) ∩ K^N_β(f) ≠ ∅, then for every ε > 0, the (γ − ε, β + ε)-approximation version of Max-CSP(f) requires Ω(√n) space (the constant hidden in the Ω notation may depend on k and ε). Furthermore, if γ = 1, then (1, β + ε)-Max-CSP(f) requires Ω(√n) space.

Proof of Theorem 2.3. Part (1) of the theorem is restated and proved as Theorem 4.1 in Section 4. Part (2) is proved as Theorem 5.1 in Section 5.4.

We now turn to the implications of this theorem. First, to get Theorem 1.1 from Theorem 2.3, we need to show that the question "Is K^Y_γ ∩ K^N_β = ∅?" can be decided in polynomial space. To this end, we first make the following observation.
Lemma 2.4. For every β, γ ∈ [0, 1], the sets S^Y_γ, S^N_β, K^Y_γ and K^N_β are bounded, closed and convex. Furthermore, the condition K^Y_γ ∩ K^N_β = ∅ can be expressed in the quantified theory of the reals with two quantifier alternations, O(2^k) variables, and polynomials of degree at most k + 1.

Proof. We start by considering the sets S^Y_γ and S^N_β. It is straightforward to see that S^Y_γ is a bounded and convex polytope in R^{2^k}. S^N_β is a bit more subtle due to the universal quantification over p ∈ [0, 1]: for each fixed p its defining condition is a halfspace, so S^N_β is an intersection of halfspaces of R^{2^k} and so is still a bounded, closed, and convex set (though not necessarily a polytope). K^Y_γ (resp. K^N_β) is obtained by a linear projection from R^{2^k} to R^k. So K^Y_γ is a bounded, closed, and convex polytope in R^k, while K^N_β is still a bounded, closed, and convex set.

To get an intersection detection algorithm we use one more property. Note that for variable p, the condition E_{b∼D_N} E_{a∼Bern(p)^k}[f(b ⊙ a)] ≤ β is a polynomial inequality in p of degree at most k, with coefficients that are linear forms in the D_N(b), b ∈ {−1, 1}^k. This allows us to express the condition K^Y_γ ∩ K^N_β ≠ ∅ using the following system of quantified polynomial inequalities:

∃ D_Y, D_N ∈ R^{2^k}, ∀ p ∈ [0, 1] s.t.
   D_Y, D_N are distributions,   (2.5)
   ∀ i ∈ [k], E_{b∼D_Y}[b_i] = E_{b∼D_N}[b_i],   (2.6)
   E_{b∼D_Y}[f(b)] ≥ γ,   (2.7)
   E_{b∼D_N} E_{a∼Bern(p)^k}[f(a ⊙ b)] ≤ β.   (2.8)

Note that Equations (2.5), (2.6) and (2.7) are just linear inequalities in the variables D_Y, D_N and do not depend on p. As noticed above, Equation (2.8) is an inequality in p and D_N, of degree k in p, and 1 in D_N. We thus get that the intersection problem can be expressed in the quantified theory of the reals by an expression with two quantifier alternations, 2^{k+1} variables and O(2^k) polynomial inequalities, with polynomials of degree at most k + 1. (Most of the inequalities are of the form D_Y(b) ≥ 0 or D_N(b) ≥ 0. Only O(k) inequalities are not of that form; and of these, only one is non-linear.)

The quantified theory of the reals is known to be solvable in PSPACE. In particular we may use the following theorem.

Theorem 2.9 ([BPR06, Theorem 14.11, see also Remark 13.10]). The truth of a quantified formula with w quantifier alternations over K variables and polynomial (potentially strict) inequalities can be decided in space K^{O(w)} and time 2^{K^{O(w)}}.

(Specifically, Theorem 14.11 in [BPR06] asserts the time complexity above, and Remark 13.10 yields the space complexity.)

Theorem 1.1 now follows immediately.
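The decision procedure promised by Theorem 2.9 uses exact quantifier elimination; purely as an illustration of the system (2.5)–(2.8), the following sketch (ours, not the paper's algorithm) searches for a witness pair (D_Y, D_N) by linear programming, discretizing the quantifier ∀p ∈ [0, 1] to a finite grid. Feasibility then indicates, up to grid error, that K^Y_γ ∩ K^N_β ≠ ∅.

```python
from itertools import product
import numpy as np
from scipy.optimize import linprog

def bern_weight(a, p):
    """Pr[a] under Bern(p)^k, with a_i = 1 w.p. p."""
    return np.prod([p if ai == 1 else 1 - p for ai in a])

def sets_intersect(f, k, gamma, beta, grid=101):
    """LP relaxation of (2.5)-(2.8); variables are (D_Y, D_N) on {-1,1}^k."""
    cube = list(product((-1, 1), repeat=k))
    m = len(cube)                                    # 2^k
    A_ub, b_ub, A_eq, b_eq = [], [], [], []
    # (2.5): D_Y and D_N are distributions (nonnegativity sits in the bounds).
    A_eq.append(np.concatenate([np.ones(m), np.zeros(m)])); b_eq.append(1.0)
    A_eq.append(np.concatenate([np.zeros(m), np.ones(m)])); b_eq.append(1.0)
    # (2.6): matching marginals E_{D_Y}[b_i] = E_{D_N}[b_i] for all i.
    for i in range(k):
        row = np.concatenate([[b[i] for b in cube], [-b[i] for b in cube]])
        A_eq.append(row); b_eq.append(0.0)
    # (2.7): E_{D_Y}[f(b)] >= gamma.
    A_ub.append(np.concatenate([[-f(b) for b in cube], np.zeros(m)]))
    b_ub.append(-gamma)
    # (2.8): E_{D_N} E_{a~Bern(p)^k}[f(b . a)] <= beta for each grid point p.
    for p in np.linspace(0, 1, grid):
        g = [sum(bern_weight(a, p) * f(tuple(bi * ai for bi, ai in zip(b, a)))
                 for a in cube) for b in cube]
        A_ub.append(np.concatenate([np.zeros(m), g])); b_ub.append(beta)
    res = linprog(np.zeros(2 * m), A_ub=np.array(A_ub), b_ub=b_ub,
                  A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=[(0, None)] * (2 * m), method="highs")
    return res.success

f_2and = lambda a: int(a == (1, 1))
f_2xor = lambda a: int(a[0] != a[1])
print(sets_intersect(f_2and, 2, 1.0, 0.25))  # False: (1, 1/4)-Max-2AND is easy
print(sets_intersect(f_2xor, 2, 1.0, 0.50))  # True: Max-2XOR is approximation-resistant
```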
Proof of Theorem 1.1. Theorem 2.3 asserts that the (γ, β)-approximation version of Max-CSP(f) is easy if and only if K^Y_γ ∩ K^N_β = ∅. Lemma 2.4 asserts that this condition is in turn expressible in the quantified theory of the reals with 2 quantifier alternations. Finally Theorem 2.9 asserts that this can be decided in polynomial space. The theorem follows.

We note that the literature on approximation algorithms usually considers a single-parameter version of the problem. In our context we would say that an algorithm A is an α-approximation algorithm for Max-CSP(f) if for every instance Ψ, we have α · val_Ψ ≤ A(Ψ) ≤ val_Ψ. The following proposition converts our main theorem in terms of this standard notion.
Proposition 2.10. Fix f : {−1, 1}^k → {0, 1} and let K^Y_γ and K^N_β denote the spaces of marginals for this function f. Let

α = inf_{β∈[0,1]} sup_{γ∈(β,1] s.t. K^Y_γ ∩ K^N_β = ∅} {β/γ}.

Then for every ε > 0, there is an (α − ε)-approximation algorithm for Max-CSP(f) that uses O(log n) space. Conversely, every (α + ε)-approximation algorithm for Max-CSP(f) requires Ω(√n) space.

Proof. For the positive result, let τ ≜ ε · ρ(f)/2, where ρ(f) = 2^{−k} Σ_{a∈{−1,1}^k} f(a) is the fraction of clauses satisfied by a random assignment. Let A_τ = {(iτ, jτ) ∈ [0, 1]² | i, j ∈ Z_{≥0}, i > j, K^Y_{iτ} ∩ K^N_{jτ} = ∅}. By Theorem 2.3, for every (γ, β) ∈ A_τ there is an O(log n log(1/τ))-space algorithm for (γ, β)-Max-CSP(f) with error probability at most τ²/10, which we refer to as the (γ, β)-distinguisher below. In the following we consider the case where all O(τ^{−2}) distinguishers output correct answers, which happens with probability at least 2/3.

The O(τ^{−2} log(1/τ) log n) = O(log n) space (α − ε)-approximation algorithm for Max-CSP(f) is the following: On input Ψ, run in parallel all the (γ, β)-distinguishers on Ψ, for every (γ, β) ∈ A_τ. Let

β₀ = arg max_β [∃γ such that the (γ, β)-distinguisher outputs YES on Ψ].

Output β′ = max{ρ(f), β₀}.

We now prove that this is an (α − ε)-approximation algorithm. First note that by the correctness of the distinguishers we have β′ ≤ val_Ψ. Let γ₀ be the smallest multiple of τ satisfying γ₀ ≥ (β₀ + τ)/α. By the definition of α, we have that K^Y_{γ₀} ∩ K^N_{β₀+τ} = ∅. So (γ₀, β₀ + τ) ∈ A_τ, and so the (γ₀, β₀ + τ)-distinguisher must have output NO on Ψ (by the maximality of β₀). By the correctness of this distinguisher we conclude val_Ψ ≤ γ₀ ≤ (β₀ + τ)/α + τ ≤ (β′ + τ)/α + τ. We now verify that (β′ + τ)/α + τ ≤ β′/(α − ε), which gives us the desired approximation guarantee. We have

(β′ + τ)/α + τ ≤ (β′ + 2τ)/α ≤ (β′/α) · (1 + 2τ/ρ(f)) = (β′/α)(1 + ε) ≤ β′/(α(1 − ε)) ≤ β′/(α − ε),

where the first inequality uses α ≤ 1, the second uses β′ ≥ ρ(f), the equality comes from the definition of τ, the next inequality uses (1 + ε)(1 − ε) ≤ 1, and the final one again uses α ≤ 1 (so that α(1 − ε) ≥ α − ε). This concludes the positive result.

The negative result is simpler. Given γ, β with β/γ ≤ α + ε for which (γ, β)-Max-CSP(f) requires Ω(√n) space (such pairs exist by the definition of α together with Theorem 2.3), we can use an (α + ε)-approximation algorithm A to solve (γ, β)-Max-CSP(f), by outputting YES if A(Ψ) ≥ β and NO otherwise.

We now turn to Corollary 1.2. Recall that for a function f : {−1, 1}^k → {0, 1}, we define ρ(f) = 2^{−k} · |{a ∈ {−1, 1}^k : f(a) = 1}| to be the probability that a uniformly random assignment satisfies f. Recall further that f is approximation-resistant if for every ε > 0, the (1, ρ(f) + ε)-approximation version of Max-CSP(f) requires polynomial space.
Proof of Corollary 1.2. By Theorem 2.3 we have that Max-CSP(f) is approximation-resistant if and only if K^Y_1 ∩ K^N_{ρ(f)+ε} ≠ ∅ for every ε > 0. In turn, this is equivalent to saying Max-CSP(f) is approximation-resistant if and only if K^Y_1 ∩ K^N_{ρ(f)} ≠ ∅. If K^Y_1 ∩ K^N_{ρ(f)} = ∅, then by the property that these sets are closed, there must exist ε₀ > 0 such that K^Y_{1−ε₀} ∩ K^N_{ρ(f)+ε₀} = ∅. In turn this implies, again by Theorem 2.3, that the (1 − ε₀, ρ(f) + ε₀)-approximation version of Max-CSP(f) can be solved by a streaming algorithm with O(log n) space. Finally, from Lemma 2.4 and Theorem 2.9, the condition "Is K^Y_1 ∩ K^N_{ρ(f)} = ∅?" can be checked in polynomial space.

To get Corollary 1.3, we perform some basic reasoning about the sets K^Y_1 and K^N_{ρ(f)}.
Proof of Corollary 1.3. We argue that the vector 0^k belongs to both K^Y_1 and K^N_{ρ(f)}. Theorem 2.3 then implies the assertion.

Let D_Y be the distribution proving that f supports a 1-wise independent distribution, i.e., D_Y is supported on f^{−1}(1) and satisfies E_{b∼D_Y}[b_i] = 0 for every i ∈ [k]. It follows that D_Y ∈ S^Y_1 and 0^k ∈ K^Y_1.

Let D_N be the uniform distribution on {−1, 1}^k. Note that for every a ∈ {−1, 1}^k, the vector a ⊙ b is uniformly distributed over {−1, 1}^k if b ∼ D_N. Consequently, for every a we get E_{b∼D_N}[f(b ⊙ a)] = ρ(f), and so for every p ∈ [0, 1],

E_{a∼Bern(p)^k} E_{b∼D_N}[f(b ⊙ a)] = ρ(f).

We conclude that D_N ∈ S^N_{ρ(f)} and so 0^k ∈ K^N_{ρ(f)}.

We conclude from Theorem 2.3 that Max-CSP(f) is not (1, ρ(f) + ε)-approximable for any ε > 0, and so is approximation-resistant.

We illustrate the applicability of our theorem with two examples. The first is the specific function Max-2AND, i.e., Max-CSP(f) for f(a, b) = a ∧ b, i.e., f(a, b) = 1 if and only if a = b = 1.
Example 1 (Max-2AND). For the function f : {−1, 1}² → {0, 1} given by f(1, 1) = 1 and f(a, b) = 0 otherwise, we would like to calculate the quantity inf_β sup_{γ | K^Y_γ ∩ K^N_β = ∅} β/γ. We first note that, due to the symmetry of f, the set K^Y_γ is symmetric, i.e., (µ_1, µ_2) ∈ K^Y_γ ⇔ (µ_2, µ_1) ∈ K^Y_γ, and similarly K^N_β. Further, by convexity of K^Y_γ and K^N_β, there exists a pair (µ_1, µ_2) ∈ K^Y_γ ∩ K^N_β if and only if there exists a µ such that (µ, µ) ∈ K^Y_γ ∩ K^N_β. We now define two functions that will help us answer the question whether such a µ exists. Let

γ(µ) := max_{γ | (µ,µ)∈K^Y_γ} {γ}   and   β(µ) := min_{β | (µ,µ)∈K^N_β} {β}.

Note that K^Y_γ ∩ K^N_β ≠ ∅ if and only if there exists a µ such that γ ≤ γ(µ) and β ≥ β(µ). With some minimal calculations for γ(µ), and some slightly more involved ones for β(µ), we can show

γ(µ) = (1 + |µ|)/2

and

β(µ) = |µ| if |µ| ≥ 1/3, and β(µ) = (1 − |µ|)² / (4(1 − 2|µ|)) otherwise.

With the above in hand we can analyze when K^Y_γ ∩ K^N_β = ∅. First, when γ ≤ 1/2, note that (0, 0) ∈ K^Y_γ and hence K^Y_γ ∩ K^N_β ≠ ∅ for all β ≥ 1/4. When γ > 1/2, we set |µ| = 2γ − 1 (noting that γ(µ) and β(µ) only depend on |µ|) to get

β(µ)|_{|µ|=2γ−1} = (1 − γ)²/(3 − 4γ) if 1/2 ≤ γ < 2/3, and 2γ − 1 if 2/3 ≤ γ.

It follows that α(β) = sup_{γ∈[β,1] | K^Y_γ ∩ K^N_β = ∅} β/γ is minimized at β = 4/15, where it equals α = 4/9, which is consistent with the findings in [CGV20] for the Max-2AND problem. Our more refined analysis also shows that α(β) approaches 1 as β → 1.

[Figure 1: γ, β, and β/γ with respect to µ; β/γ is minimized at β = 4/15. Figure omitted.]

For our second example, we consider symmetric functions.

Lemma 2.11 (1-wise independence implies approximation resistance). For a symmetric function f : {−1, 1}^k → {0, 1}, Max-CSP(f) is approximation-resistant if and only if it supports a 1-wise independent distribution.

Proof. One direction of the implication directly follows from Corollary 1.3. For the other direction, we use Fourier analysis. The necessary definitions are included in Section 3.4. A symmetric function f is given by a set of "levels" L = {ℓ_1, . . . , ℓ_t} ⊆ {−k, . . . , k} such that f(a_1, . . . , a_k) = 1 if and only if ‖a‖ = Σ_{i=1}^{k} a_i ∈ L. If L contains 0, or if L contains both positive and negative elements, then f supports a 1-wise independent distribution. (Indeed, if ℓ_1, ℓ_2 ∈ L, where ℓ_1 < 0 < ℓ_2, then a distribution D that with probability p = ℓ_2/(ℓ_2 − ℓ_1) samples a random a of Hamming weight ‖a‖ = ℓ_1 and with probability 1 − p samples a random a of weight ‖a‖ = ℓ_2 is 1-wise independent and is supported on f^{−1}(1).) So we conclude L contains only positive elements or only negative elements. Without loss of generality we consider the case where L contains only positive elements.

Let ρ = ρ(f). First note that both K^Y_1 and K^N_ρ are symmetric since f is symmetric. Thus, by the convexity of the sets, it suffices to consider vectors of the form µ^k = (µ, µ, . . . , µ) in K^Y_1 and K^N_ρ. Since L contains only positive elements, it follows that for µ^k ∈ K^Y_1, we must have µ > 0. To prove that Max-CSP(f) is not approximation-resistant, it suffices to show that for µ > 0, µ^k is not contained in K^N_ρ. Consider a distribution D ∈ S^N_ρ with µ(D) = µ^k. It can be shown by elementary Fourier analysis that if a ∼ Bern(1/2 + ε)^k and b ∼ D then

E_{b∼D} E_{a∼Bern(1/2+ε)^k}[f(b ⊙ a)] = ρ + Ω(µτε) − O(ε²),

where τ is the sum of the first-level Fourier coefficients of f (i.e., τ = Σ_{‖w‖=1} f̂(w)), and the Ω(·) and O(·) notations hide constants depending on f and D, but not on ε > 0. Due to the symmetry of f, all the first-level Fourier coefficients are equal, and due to the positivity of L, all these coefficients are positive. It follows that for some sufficiently small ε > 0, the expected probability of satisfying a constraint is strictly larger than ρ, thus proving µ^k ∉ K^N_ρ. We conclude K^Y_1 ∩ K^N_ρ = ∅, and so Max-CSP(f) is not approximation-resistant.
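As a numerical sanity check on Example 1 (ours, not from the paper), one can recompute γ(µ) and β(µ) by brute force: distributions on {−1, 1}² with marginals (µ, µ) form a one-parameter family, and grids over that parameter and over p replace the exact optimizations.

```python
import numpy as np

def dist_family(mu, grid=401):
    """All D on {-1,1}^2 with marginals (mu, mu), mu >= 0: writing s = D(+-) = D(-+),
    D(++) = (1+mu)/2 - s and D(--) = (1-mu)/2 - s, with 0 <= s <= (1-mu)/2."""
    for s in np.linspace(0.0, (1 - mu) / 2, grid):
        yield ((1 + mu) / 2 - s, s, s, (1 - mu) / 2 - s)

def value_2and(D, p):
    """E_{b~D} E_{a~Bern(p)^2}[AND(b . a)], with a_i = 1 w.p. p."""
    dpp, dpm, dmp, dmm = D
    return dpp * p * p + (dpm + dmp) * p * (1 - p) + dmm * (1 - p) ** 2

def gamma_beta(mu, grid=401):
    ps = np.linspace(0, 1, grid)
    gamma = max(D[0] for D in dist_family(mu, grid))      # best mass on (1, 1)
    beta = min(max(value_2and(D, p) for p in ps) for D in dist_family(mu, grid))
    return gamma, beta

for mu in (0.0, 0.2, 0.5):
    g_cl = (1 + mu) / 2
    b_cl = mu if mu >= 1 / 3 else (1 - mu) ** 2 / (4 * (1 - 2 * mu))
    print(mu, gamma_beta(mu), (g_cl, b_cl))   # numeric matches the closed forms
# Minimizing beta/gamma over mu yields 4/9, attained at mu = 1/5 (beta = 4/15).
```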
0, the(1 , ρ ( f ) + ε )- Max-CSP ( f ) requires Ω( n δ ) space for some constant δ > The total variation distance between probability distributions plays an important role in our anal-ysis. 16 efinition 3.2 (Total variation distance of discrete random variables) . Let Ω be a finite probabilityspace and X, Y be random variables with support Ω . The total variation distance between X and Y is defined as follows. (cid:107) X − Y (cid:107) tvd := 12 (cid:88) ω ∈ Ω | Pr[ X = ω ] − Pr[ Y = ω ] | . We will use the triangle and data processing inequalities for the total variation distance.
Proposition 3.3 (E.g., [KKS15, Claim 6.5]). For random variables X, Y and W:

• (Triangle inequality) ‖X − Y‖_tvd ≥ ‖X − W‖_tvd − ‖Y − W‖_tvd.

• (Data processing inequality) If W is independent of both X and Y, and f is a function, then ‖f(X, W) − f(Y, W)‖_tvd ≤ ‖X − Y‖_tvd.

We will use the following concentration inequality, which is essentially an Azuma–Hoeffding style inequality for submartingales. The form we use is from [KK19].

Lemma 3.4 ([KK19, Lemma 2.5]). Let X = Σ_{i∈[N]} X_i where the X_i are Bernoulli random variables such that for any k ∈ [N], E[X_k | X_1, . . . , X_{k−1}] ≤ p for some p ∈ (0, 1). Let µ = Np. For any ∆ > 0,

Pr[X ≥ µ + ∆] ≤ exp(−∆² / (2(µ + 2∆))).

We will need the following basic notions from Fourier analysis over the Boolean hypercube (see, for instance, [O'D14]). For a Boolean function f : {−1, 1}^k → R, its Fourier coefficients are defined by

f̂(v) = E_{a∈{−1,1}^k}[f(a) · Π_{i: v_i=1} a_i], where v ∈ {0, 1}^k.

We need the following two important tools.

Lemma 3.5 (Parseval's identity). For every function f : {−1, 1}^k → R,

‖f‖₂² = (1/2^k) Σ_{a∈{−1,1}^k} f(a)² = Σ_{v∈{0,1}^k} f̂(v)².

Note that for every distribution f on {−1, 1}^k, f̂(0^k) = 2^{−k}. For the uniform distribution U on {−1, 1}^k, Û(v) = 0 for every v ≠ 0^k. Thus, by Lemma 3.5, for any distribution f on {−1, 1}^k:

‖f − U‖₂² = Σ_{v∈{0,1}^k} (f̂(v) − Û(v))² = Σ_{v∈{0,1}^k \ {0^k}} f̂(v)².   (3.6)

Next, we will use the following consequence of hypercontractivity for Boolean functions, as given in [GKK+09, Lemma 6], which in turn relies on a lemma from [KKL88].

Lemma 3.7. Let f : {−1, 1}^n → {−1, 0, 1} and A = {a ∈ {−1, 1}^n | f(a) ≠ 0}. If |A| ≥ 2^{n−c} for some c ∈ N, then for every ℓ ∈ {1, . . . , 4c}, we have

(2^n/|A|)² Σ_{v∈{0,1}^n, ‖v‖=ℓ} f̂(v)² ≤ (4√2 c / ℓ)^ℓ.

4 Streaming Algorithm
In this section we give our main algorithmic result — an O(log n)-space streaming algorithm for (γ, β)-Max-CSP(f) if K^Y_γ = K^Y_γ(f) and K^N_β = K^N_β(f) are disjoint. (See Definition 2.2.) We state the main theorem of this section, which simply repeats Part (1) of Theorem 2.3.
Theorem 4.1. For every function f : {−1, 1}^k → {0, 1} and for every 0 ≤ β < γ ≤ 1, if K^Y_γ(f) ∩ K^N_β(f) = ∅, then (γ, β)-Max-CSP(f) admits a probabilistic streaming algorithm that uses O(log n) space and succeeds with probability at least 2/3.

The overview of the algorithm is as follows: We use the separability of K^Y_γ and K^N_β to obtain a hyperplane with normal vector λ that separates the two sets. We then estimate a λ-weighted bias of a given instance Ψ and accept Ψ if this bias falls on the K^Y_γ side of the hyperplane. We note that the bias can be approximated arbitrarily well using well-known ℓ_1-norm approximators in the turnstile model. The bulk of the work is in analyzing the correctness of our algorithm.

We will use the following streaming algorithm for approximating the ℓ_1 norm of a vector.

Proposition 4.2 ([Ind00], [KNW10, Theorem 2.1]). Given a stream S of poly(n) updates (i, v) ∈ [n] × {−M, −(M − 1), . . . , M − 1, M} where M = poly(n), let x_i = Σ_{(i,v)∈S} v for i ∈ [n]. For every ε > 0, there exists a streaming algorithm that uses O(log n) bits of memory and outputs a (1 ± ε)-approximation to the value ‖x‖₁ = Σ_{i∈[n]} |x_i| with probability at least 2/3.

Let us start with the definition of the λ-bias.

Definition 4.3 (Bias (vector)). For λ = (λ_1, . . . , λ_k) ∈ R^k and an instance Ψ = (C_1, . . . , C_m) of Max-CSP(f) where C_i = (j(i), b(i)), we let the λ-bias vector of Ψ, denoted bias_λ(Ψ), be the vector in R^n given by

bias_λ(Ψ)_ℓ = (1/m) · Σ_{i∈[m], t∈[k]: j(i)_t = ℓ} λ_t b_t(i), for ℓ ∈ [n].

The λ-bias of Ψ, denoted B_λ(Ψ), is the ℓ_1 norm of bias_λ(Ψ), i.e., B_λ(Ψ) = Σ_{ℓ=1}^{n} |bias_λ(Ψ)_ℓ|.

By directly applying the known ℓ_1-sketching algorithm (i.e., Proposition 4.2), the following lemma shows that the λ-bias can be estimated in O(log n) space.
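For concreteness, here is a direct, non-streaming computation of the quantities in Definition 4.3 (a sketch of ours; the point of Lemma 4.4 below is that ‖bias_λ(Ψ)‖₁ can be estimated without ever storing this n-dimensional vector).

```python
import numpy as np

def bias_vector(instance, lam, n):
    """bias_lambda(Psi)_l = (1/m) * sum over constraints i and positions t with
    j(i)_t = l of lam_t * b(i)_t. A constraint is ((j_1..j_k), (b_1..b_k)), 1-indexed."""
    m = len(instance)
    v = np.zeros(n)
    for (j, b) in instance:
        for t, (j_t, b_t) in enumerate(zip(j, b)):
            v[j_t - 1] += lam[t] * b_t / m
    return v

def B_lambda(instance, lam, n):
    """B_lambda(Psi): the l1 norm of the bias vector."""
    return np.abs(bias_vector(instance, lam, n)).sum()

# Two AND constraints on 3 variables, x1 AND x2 and x1 AND NOT x3, with lambda = (1, 1):
psi = [((1, 2), (1, 1)), ((1, 3), (1, -1))]
print(bias_vector(psi, (1.0, 1.0), n=3))  # [ 1.   0.5 -0.5]
print(B_lambda(psi, (1.0, 1.0), n=3))     # 2.0
```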
Lemma 4.4. For every vector λ ∈ R^k and ε > 0, there exists an O(log n)-space algorithm A that, on input Ψ, outputs a (1 ± ε)-approximation to B_λ(Ψ), i.e., for every Ψ, (1 − ε)B_λ(Ψ) ≤ A(Ψ) ≤ (1 + ε)B_λ(Ψ), with probability at least 2/3.

Proof. Note that since k and ε are constants with respect to n, we can without loss of generality assume that each entry of λ is an integer and that ε has constant bit complexity. (Concretely, round ε to 2^{−t} where t is the smallest integer such that ε ≥ 2^{−t}. As for λ, let λ_min = min_{j∈[k]} |λ_j| and round it in the same way as we did for ε. Next, for each j ∈ [k], scale and round λ_j to ⌈4λ_j/λ_min⌉. It is not difficult to verify that, after scaling the new λ-bias down by a factor of λ_min/4, it is a (1 ± ε/2)-approximation of the original λ-bias.)

Next, for each i ∈ [m] and t ∈ [k], let (j(i)_t, λ_t b_t(i)) be an update. Since m = poly(n) and k is a constant, we know that there are only poly(n) updates and each update is a constant-size integer. Thus, by Proposition 4.2, there is a streaming algorithm that uses O(log n) bits of memory and outputs a (1 ± ε)-approximation to the value ‖x‖₁, where

x_ℓ = Σ_{i∈[m], t∈[k]: j(i)_t = ℓ} λ_t b_t(i),

with probability at least 2/3. Now, we design a streaming algorithm A which gets this (1 ± ε)-approximation A₀ to the above quantity and also computes the value m. Finally, let A(Ψ) = A₀/m; by Proposition 4.2 and the definition of the λ-bias, we know that A(Ψ) is a (1 ± ε)-approximation to B_λ(Ψ) using O(log n) space, with probability at least 2/3.
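The ℓ_1 estimator of Proposition 4.2 can be prototyped with the Cauchy-sketch idea underlying [Ind00]: project the stream vector onto i.i.d. standard Cauchy directions and report the median absolute coordinate, which concentrates around ‖x‖₁ since Cauchy variables are 1-stable. The sketch below is our own simplification: it stores the projection matrix explicitly and uses real arithmetic, whereas the actual O(log n)-space guarantee requires pseudorandom generation of the entries and bounded precision, as in [Ind00, KNW10].

```python
import numpy as np

rng = np.random.default_rng(0)

class L1Sketch:
    """Estimate ||x||_1 of x in R^n under a stream of updates (i, v): x_i += v."""

    def __init__(self, n, eps=0.1):
        self.r = int(10 / eps ** 2)                   # number of repetitions
        self.proj = rng.standard_cauchy((self.r, n))  # 1-stable projections
        self.y = np.zeros(self.r)

    def update(self, i, v):
        self.y += v * self.proj[:, i]                 # maintain y = proj @ x

    def estimate(self):
        # <c, x> for i.i.d. standard Cauchy c is Cauchy with scale ||x||_1,
        # and the median of |Cauchy(0, s)| equals s.
        return np.median(np.abs(self.y))

x = rng.normal(size=1000)
sk = L1Sketch(n=1000, eps=0.05)
for i, v in enumerate(x):
    sk.update(i, v)
print(sk.estimate(), np.abs(x).sum())  # the two agree up to a few percent
```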
Proposition 4.5 (Separating hyperplane theorem). Let K^Y and K^N be two disjoint nonempty closed convex sets in R^k, at least one of which is compact. Then there exists a nonzero vector λ = (λ_1, . . . , λ_k) and real numbers τ_Y > τ_N such that

∀x ∈ K^Y, ⟨λ, x⟩ ≥ τ_Y   and   ∀x ∈ K^N, ⟨λ, x⟩ ≤ τ_N.

We are now ready to describe our algorithm for (γ, β)-Max-CSP(f).
Algorithm 1: A streaming algorithm for (γ, β)-Max-CSP(f)

Input: Ψ, an instance of Max-CSP(f).

1. Let λ ∈ R^k and τ_N < τ_Y be as given by Proposition 4.5 separating K^Y_γ(f) and K^N_β(f).
2. Let ε = (τ_Y − τ_N)/(2(τ_Y + τ_N)) (so that (1 − ε)τ_Y > (1 + ε)τ_N).
3. Compute B̃, a (1 ± ε)-approximation to B_λ(Ψ), i.e., (1 − ε)B_λ(Ψ) ≤ B̃ ≤ (1 + ε)B_λ(Ψ) with probability at least 2/3.
4. If B̃ ≤ τ_N(1 + ε), output NO; else output YES.

It is clear that the algorithm above runs in O(log n) space (in particular by using Proposition 4.2 for Step 3). We now turn to analyzing the correctness of the algorithm.
Lemma 4.6. Algorithm 1 correctly solves (γ, β)-Max-CSP(f) if K^Y_γ(f) and K^N_β(f) are disjoint. Specifically, for every Ψ, letting τ_Y, τ_N, ε, λ, B̃ be as given in Algorithm 1, we have:

val_Ψ ≥ γ ⇒ B_λ(Ψ) ≥ τ_Y and B̃ > τ_N(1 + ε), and
val_Ψ ≤ β ⇒ B_λ(Ψ) ≤ τ_N and B̃ ≤ τ_N(1 + ε),

provided (1 − ε)B_λ(Ψ) ≤ B̃ ≤ (1 + ε)B_λ(Ψ).

In the rest of this section, we will prove Lemma 4.6. The key to our analysis is a distribution D(Ψ^a) ∈ Δ({−1, 1}^k) that we associate with every instance Ψ and assignment a ∈ {−1, 1}^n to the variables of Ψ. Recall that in Definition 2.2, we defined µ(D) = (µ_1, . . . , µ_k) where µ_i = E_{b∼D}[b_i]. If Ψ is γ-satisfied by an assignment a, we prove that µ(D(Ψ^a)) ∈ K^Y_γ. On the other hand, if Ψ is not β-satisfiable by any assignment, we prove that for every a, µ(D(Ψ^a)) ∈ K^N_β. Finally we also show that the bias B_λ(Ψ) relates to λ(D(Ψ^a)) ≜ ⟨µ(D(Ψ^a)), λ⟩, where the latter quantity is exactly what needs to be computed (by Proposition 4.5) to distinguish the membership of µ(D(Ψ^a)) in K^Y_γ versus the membership in K^N_β.

We start by recalling some notation. For an instance Ψ = (C_1, . . . , C_m) on n variables with C_i = (j(i), b(i)), and an assignment a ∈ {−1, 1}^n, let Ψ^a denote the new instance obtained by flipping the variables according to a. Specifically, Ψ^a = (C^a_1, . . . , C^a_m) where C^a_i = (j(i), a|_{j(i)} ⊙ b(i)). Given an instance Ψ, let D(Ψ) ∈ Δ({−1, 1}^k) be the distribution obtained by sampling a constraint at random from Ψ and outputting the "negation pattern". Formally, to sample a random vector b ∼ D(Ψ), we sample i ∈ [m] uniformly and output b(i) where C_i = (j(i), b(i)).

The next lemma relates the λ-bias vector of Ψ to λ(D(Ψ^a)) and uses this to relate the bias of Ψ to the maximum over a of λ(D(Ψ^a)).
For every vector $\mathbf{a} \in \{-1,1\}^n$, we have $\lambda(\mathcal{D}(\Psi^{\mathbf{a}})) = \langle\mathbf{a}, \mathrm{bias}_\lambda(\Psi)\rangle$. Consequently, $B_\lambda(\Psi) = \max_{\mathbf{a}\in\{-1,1\}^n}\{\lambda(\mathcal{D}(\Psi^{\mathbf{a}}))\}$.

Proof. We start with the first equality. Fix $\mathbf{a} \in \{-1,1\}^n$. We have

$\lambda(\mathcal{D}(\Psi^{\mathbf{a}})) = \langle\mu(\mathcal{D}(\Psi^{\mathbf{a}})), \lambda\rangle$  (by definition of $\lambda(\cdot)$)
$= \mathbb{E}_{\mathbf{y}\sim\mathcal{D}(\Psi^{\mathbf{a}})}[\langle\mathbf{y}, \lambda\rangle]$  (by definition of $\mu(\mathcal{D})$ and linearity of the inner product)
$= \mathbb{E}_{i\sim\mathrm{Unif}([m])}[\langle\mathbf{b}^{\mathbf{a}}(i), \lambda\rangle]$  (by definition of $\mathcal{D}(\Psi^{\mathbf{a}})$)
$= \mathbb{E}_{i\sim\mathrm{Unif}([m])}\big[\sum_{t\in[k]} b^{\mathbf{a}}(i)_t \cdot \lambda_t\big]$  (expanding the inner product)
$= \frac{1}{m}\sum_{i\in[m]}\sum_{\ell\in[n]}\sum_{t\in[k]} \mathbf{1}[j(i)_t = \ell]\cdot\lambda_t\cdot a_\ell\cdot b(i)_t$  (using the definition of $\Psi^{\mathbf{a}}$)
$= \frac{1}{m}\sum_{\ell\in[n]} a_\ell \sum_{t\in[k]}\lambda_t \sum_{i\in[m]} \mathbf{1}[j(i)_t = \ell]\cdot b(i)_t$  (exchanging summations)
$= \sum_{\ell\in[n]} a_\ell\cdot\mathrm{bias}_\lambda(\Psi)_\ell$  (by definition of $\mathrm{bias}_\lambda(\cdot)$)
$= \langle\mathbf{a}, \mathrm{bias}_\lambda(\Psi)\rangle$,

yielding the first equality. The second part is immediate from the observation that for every vector $\mathbf{v} \in \mathbb{R}^n$ we have $\|\mathbf{v}\|_1 = \max_{\mathbf{a}\in\{-1,1\}^n}\langle\mathbf{a},\mathbf{v}\rangle$, and so $B_\lambda(\Psi) = \|\mathrm{bias}_\lambda(\Psi)\|_1 = \max_{\mathbf{a}\in\{-1,1\}^n}\{\langle\mathbf{a}, \mathrm{bias}_\lambda(\Psi)\rangle\} = \max_{\mathbf{a}\in\{-1,1\}^n}\{\lambda(\mathcal{D}(\Psi^{\mathbf{a}}))\}$.

We now turn to connecting $\mathsf{val}_\Psi$ to properties of $\mathcal{D}(\Psi^{\mathbf{a}})$.

Lemma 4.8. For every $\Psi$ and $\mathbf{a}$, if $\mathsf{val}_\Psi(\mathbf{a}) \ge \gamma$ then $\mathcal{D}(\Psi^{\mathbf{a}}) \in S^Y_\gamma$.

Proof. This follows from the fact that $\mathbb{E}_{\mathbf{b}\sim\mathcal{D}(\Psi^{\mathbf{a}})}[f(\mathbf{b})] = \frac{1}{m}\sum_{i\in[m]} f(\mathbf{b}(i)\odot\mathbf{a}|_{\mathbf{j}(i)}) = \frac{1}{m}\sum_{i\in[m]} C_i(\mathbf{a}) = \mathsf{val}_\Psi(\mathbf{a}) \ge \gamma$, implying $\mathcal{D}(\Psi^{\mathbf{a}}) \in S^Y_\gamma$.
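The identity of Lemma 4.7 is easy to check numerically. A minimal sketch, with a random instance in the same (j, b) representation used above (all names here are illustrative assumptions):

```python
import numpy as np
rng = np.random.default_rng(0)

n, m, k = 8, 20, 3
lam = rng.normal(size=k)
# Random instance: each constraint is (j, b), j distinct indices, b in {-1,1}^k.
cons = [(rng.choice(n, size=k, replace=False), rng.choice([-1, 1], size=k))
        for _ in range(m)]

def bias_vec(cons):
    v = np.zeros(n)
    for j, b in cons:
        for t in range(k):
            v[j[t]] += lam[t] * b[t]
    return v / m

a = rng.choice([-1, 1], size=n)                  # an assignment
flipped = [(j, a[j] * b) for j, b in cons]       # the constraints of Psi^a
# lambda(D(Psi^a)) is the average over constraints of <b^a(i), lam>:
lhs = np.mean([np.dot(b, lam) for _, b in flipped])
rhs = np.dot(a, bias_vec(cons))                  # <a, bias_lambda(Psi)>
assert np.isclose(lhs, rhs)                      # Lemma 4.7, first equality
```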
Lemma 4.9. For every $\Psi$, if $\mathsf{val}_\Psi \le \beta$, then for all $\mathbf{a}$ we have $\mathcal{D}(\Psi^{\mathbf{a}}) \in S^N_\beta$.

Proof. We claim that if $\mathsf{val}_\Psi \le \beta$ then $\mathcal{D}(\Psi) \in S^N_\beta$. This suffices to prove the lemma, since for every $\mathbf{a} \in \{-1,1\}^n$ we have $\mathsf{val}_{\Psi^{\mathbf{a}}} = \mathsf{val}_\Psi$. So if $\mathsf{val}_\Psi \le \beta$ then $\mathsf{val}_{\Psi^{\mathbf{a}}} \le \beta$, and by the claim applied to $\Psi^{\mathbf{a}}$ we get $\mathcal{D}(\Psi^{\mathbf{a}}) \in S^N_\beta$.

We prove the contrapositive: we assume $\mathcal{D}(\Psi) \notin S^N_\beta$ and show that this implies $\mathsf{val}_\Psi > \beta$. If $\mathcal{D}(\Psi) \notin S^N_\beta$, then there exists $p \in [0,1]$ such that $\mathbb{E}_{\mathbf{b}\sim\mathcal{D}(\Psi)}\mathbb{E}_{\mathbf{c}\sim\mathrm{Bern}(p)^k}[f(\mathbf{b}\odot\mathbf{c})] > \beta$. But this implies, as we show below, that $\mathbb{E}_{\sigma\sim\mathrm{Bern}(p)^n}[\mathsf{val}_\Psi(\sigma)] > \beta$. We have:

$\mathbb{E}_{\sigma\sim\mathrm{Bern}(p)^n}[\mathsf{val}_\Psi(\sigma)] = \mathbb{E}_{\sigma\sim\mathrm{Bern}(p)^n}\mathbb{E}_{i\sim\mathrm{Unif}([m])}[C_i(\sigma)]$  (by definition of $\Psi$)
$= \mathbb{E}_{\sigma\sim\mathrm{Bern}(p)^n}\mathbb{E}_{i\sim\mathrm{Unif}([m])}[f(\mathbf{b}(i)\odot\sigma|_{\mathbf{j}(i)})]$  (by definition of $C_i$)
$= \mathbb{E}_{i\sim\mathrm{Unif}([m])}\mathbb{E}_{\sigma|_{\mathbf{j}(i)}\sim\mathrm{Bern}(p)^k}[f(\mathbf{b}(i)\odot\sigma|_{\mathbf{j}(i)})]$  (exchanging expectations)
$= \mathbb{E}_{i\sim\mathrm{Unif}([m])}\mathbb{E}_{\mathbf{c}\sim\mathrm{Bern}(p)^k}[f(\mathbf{b}(i)\odot\mathbf{c})]$  (renaming variables)
$= \mathbb{E}_{\mathbf{b}\sim\mathcal{D}(\Psi)}\mathbb{E}_{\mathbf{c}\sim\mathrm{Bern}(p)^k}[f(\mathbf{b}\odot\mathbf{c})]$  (by definition of $\mathcal{D}(\Psi)$)
$> \beta$  (by the contrapositive assumption).

Since $\mathsf{val}_\Psi := \max_\sigma\{\mathsf{val}_\Psi(\sigma)\} \ge \mathbb{E}_{\sigma\sim\mathrm{Bern}(p)^n}[\mathsf{val}_\Psi(\sigma)]$, we get a contradiction to $\mathsf{val}_\Psi \le \beta$. This concludes the proof of the claim and hence of the lemma.

Before turning to the proof of Lemma 4.6, we first do a quick post-analysis of the proof above. The proof above is the key reason why the definition of $S^N_\beta$ is chosen as it is: the fact that there was an i.i.d. distribution, namely $\mathrm{Bern}(p)^k$, according to which a random assignment satisfied the "instance" underlying $\mathcal{D}(\Psi)$ with value more than $\beta$, allowed us to extend this to a (again i.i.d., though this was not necessary) distribution over assignments to $\Psi$ that also achieves value more than $\beta$. Note that the mere existence of an assignment of value greater than $\beta$ for $\mathcal{D}(\Psi)$ would have been insufficient for this step to go through, explaining our choice of the definition of $S^N_\beta$.

We are now ready to prove Lemma 4.6.

Proof of Lemma 4.6.
First let $\mathsf{val}_\Psi \ge \gamma$. Then there exists $\mathbf{a} \in \{-1,1\}^n$ such that $\mathsf{val}_\Psi(\mathbf{a}) \ge \gamma$. By Lemma 4.8, we have $\mathcal{D}(\Psi^{\mathbf{a}}) \in S^Y_\gamma$. By our choice of $\lambda$, we have $\lambda(\mathcal{D}) \ge \tau^Y$ for every $\mathcal{D} \in S^Y_\gamma$, and so in particular $\lambda(\mathcal{D}(\Psi^{\mathbf{a}})) \ge \tau^Y$. By Lemma 4.7, we have $B_\lambda(\Psi) = \max_{\mathbf{c}\in\{-1,1\}^n}\{\lambda(\mathcal{D}(\Psi^{\mathbf{c}}))\}$. Putting these together we get

$B_\lambda(\Psi) = \max_{\mathbf{c}\in\{-1,1\}^n}\{\lambda(\mathcal{D}(\Psi^{\mathbf{c}}))\} \ge \lambda(\mathcal{D}(\Psi^{\mathbf{a}})) \ge \tau^Y.$

Since $\tilde{B} \ge (1-\varepsilon)B_\lambda(\Psi)$, we get $\tilde{B} \ge (1-\varepsilon)\tau^Y > (1+\varepsilon)\tau^N$, where the final inequality holds by our choice of $\varepsilon$.

The case $\mathsf{val}_\Psi \le \beta$ is similar. In this case, by Lemma 4.9, we have $\mathcal{D}(\Psi^{\mathbf{a}}) \in S^N_\beta$ for every $\mathbf{a}$. Now applying Lemma 4.7, we get that for every $\mathbf{a}$, $\langle\mathbf{a}, \mathrm{bias}_\lambda(\Psi)\rangle = \lambda(\mathcal{D}(\Psi^{\mathbf{a}})) \le \tau^N$. We conclude that $B_\lambda(\Psi) = \max_{\mathbf{a}\in\{-1,1\}^n}\{\langle\mathbf{a}, \mathrm{bias}_\lambda(\Psi)\rangle\} \le \tau^N$. Since $\tilde{B} \le (1+\varepsilon)B_\lambda(\Psi)$, we get $\tilde{B} \le (1+\varepsilon)\tau^N$.

We now conclude the section with a formal proof of Theorem 4.1.

Proof of Theorem 4.1.
The desired algorithm is Algorithm 1. Its space complexity is bounded by the space required for Step 3, which by Lemma 4.4 is $O(\log n)$. Assuming Step 3 works correctly, which happens with probability at least $2/3$, Lemma 4.6 shows that the algorithm correctly solves $(\gamma,\beta)$-Max-CSP$(f)$ whenever $K^Y_\gamma(f) \cap K^N_\beta(f) = \emptyset$.

In this section, we prove the following theorem, which is simply a restatement of the "hard" part of Theorem 2.3.
Theorem 5.1. For every function $f: \{-1,1\}^k \to \{0,1\}$ and every $0 \le \beta < \gamma \le 1$, if $K^Y_\gamma(f) \cap K^N_\beta(f) \ne \emptyset$, then for every $\varepsilon > 0$, $(\gamma-\varepsilon, \beta+\varepsilon)$-Max-CSP$(f)$ requires $\Omega(\sqrt{n})$ space. Furthermore, if $\gamma = 1$, then $(1, \beta+\varepsilon)$-Max-CSP$(f)$ requires $\Omega(\sqrt{n})$ space. (The constant hidden in the $\Omega$ notation may depend on $k$ and $\varepsilon$.)
To prove this theorem, we introduce the Randomized Mask Detection (RMD) communication game below. We then state a lower bound for the communication complexity of this game (Theorem 5.3), and use the lower bound to prove Theorem 5.1. The proof of Theorem 5.3 appears in Section 7.
In this section, and in (most of) the rest of this paper, we consider the complexity of 2-player 1-way communication games. Broadly, such games are described by two (parameterized sets of) distributions $\mathcal{Y}$ and $\mathcal{N}$. An instance of the game is a pair $(X, Y)$ drawn either from $\mathcal{Y}$ or from $\mathcal{N}$, with $X$ given as input to Alice and $Y$ to Bob. A (one-way communication) protocol $\Pi = (\Pi_A, \Pi_B)$ is a pair of functions with $\Pi_A(X) \in \{0,1\}^c$ denoting Alice's message to Bob, and $\Pi_B(\Pi_A(X), Y) \in \{\mathrm{YES}, \mathrm{NO}\}$ denoting the protocol's output; we denote this output by $\Pi(X, Y)$. The complexity of the protocol is the parameter $c$ specifying the length of $\Pi_A(X)$ (maximized over all $X$). The advantage of the protocol $\Pi$ is the quantity

$\Big|\Pr_{(X,Y)\sim\mathcal{Y}}[\Pi(X,Y) = \mathrm{YES}] - \Pr_{(X,Y)\sim\mathcal{N}}[\Pi(X,Y) = \mathrm{YES}]\Big|.$

The Randomized Mask Detection (RMD) communication game is an instance of such a communication game.
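For concreteness, here is a toy Monte Carlo estimate of the advantage of a one-way protocol. The two-bit example and all names are illustrative assumptions of ours, not part of the paper:

```python
import numpy as np
rng = np.random.default_rng(1)

def advantage(protocol_a, protocol_b, sample_Y, sample_N, trials=20000):
    """Estimate |Pr_Y[Pi = YES] - Pr_N[Pi = YES]| for a one-way protocol:
    protocol_a maps X to a message, protocol_b maps (message, Y) to True/False."""
    def yes_rate(sampler):
        hits = 0
        for _ in range(trials):
            X, Y = sampler()
            hits += protocol_b(protocol_a(X), Y)
        return hits / trials
    return abs(yes_rate(sample_Y) - yes_rate(sample_N))

# Toy game: X, Y are +/-1 bits; YES instances are correlated, NO independent.
def sample_Y():
    b = rng.choice([-1, 1])
    return b, b
def sample_N():
    return rng.choice([-1, 1]), rng.choice([-1, 1])

# Alice sends her bit; Bob says YES iff the bits agree. Advantage ~ 1/2.
print(advantage(lambda x: x, lambda msg, y: msg == y, sample_Y, sample_N))
```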
Let $n, k \in \mathbb{N}$ and $\alpha \in (0,1)$ with $k \le n$ and $\alpha k \le 1$. Alice receives a private input $x^*$ drawn uniformly at random from $\{-1,1\}^n$, while Bob receives as private input a $k$-uniform hypermatching of size $\alpha n$ and a vector $z \in \{-1,1\}^{\alpha k n}$ of the form $z = (z(1), \ldots, z(\alpha n))$ with $z(i) \in \{-1,1\}^k$ for each $i \in [\alpha n]$. Alice's input $x^*$ encodes a random bipartition of the vertex set according to the signs $\pm 1$. The $k$-uniform hypermatching is encoded by a matrix $M \in \{0,1\}^{\alpha k n \times n}$, where the $(k(i-1)+1)$-th to $(ki)$-th rows encode the $i$-th hyperedge by putting exactly one 1 in each row, in the columns corresponding to its vertices. During the game, Alice sends a message to Bob, and Bob has to discover the hidden structure of the vector $z$. The following definition formally describes the problem.

Definition 5.2 (Randomized Mask Detection (RMD) problem). For $k \in \mathbb{N}$, $\alpha \in (0, 1/k]$, and a pair of distributions $\mathcal{D}_Y, \mathcal{D}_N \in \Delta(\{-1,1\}^k)$, the $(\mathcal{D}_Y, \mathcal{D}_N; \alpha, k)$-RMD problem is the 2-player communication game given by a family of instances $(\mathcal{Y}_n, \mathcal{N}_n)_{n\in\mathbb{N}, n \ge 1/\alpha}$, where for a given $n$, $\mathcal{Y} = \mathcal{Y}_n$ and $\mathcal{N} = \mathcal{N}_n$ are as follows. Both $\mathcal{Y}$ and $\mathcal{N}$ are supported on triples $(x^*, M, z)$, where $x^* \in \{-1,1\}^n$, $M \in \{0,1\}^{k\alpha n \times n}$, and $z \in \{-1,1\}^{k\alpha n}$; here $x^*$ is Alice's input and the pair $(M, z)$ is Bob's input. We now specify the distributions of $x^*$, $M$ and $z$ in $\mathcal{Y}$ and $\mathcal{N}$:

• In both $\mathcal{Y}$ and $\mathcal{N}$, $x^*$ is distributed uniformly over $\{-1,1\}^n$.

• In both $\mathcal{Y}$ and $\mathcal{N}$, the matrix $M \in \{0,1\}^{\alpha k n \times n}$ is chosen uniformly (and independently of $x^*$) among matrices with exactly one 1 per row and at most one 1 per column. (Thus $M$ represents a $k$-hypermatching in which each block of $k$ rows describes a hyperedge.)

• The vector $z$ is obtained by "masking" (i.e., xor-ing) $Mx^*$ with a random vector $b \in \{-1,1\}^{\alpha k n}$ whose distribution differs in $\mathcal{Y}$ and $\mathcal{N}$. Specifically, let $b = (b(1), \ldots, b(\alpha n))$ be sampled from one of the following distributions (independently of $x^*$ and $M$):
  - $\mathcal{Y}$: each $b(i) \in \{-1,1\}^k$ is sampled independently according to $\mathcal{D}_Y$.
  - $\mathcal{N}$: each $b(i) \in \{-1,1\}^k$ is sampled independently according to $\mathcal{D}_N$.
We then set $z = (Mx^*) \odot b$ (recall that $\odot$ denotes the coordinate-wise product).

We will typically suppress $k$ and $\alpha$ from the notation when they are clear from context and simply refer to the $(\mathcal{D}_Y, \mathcal{D}_N)$-RMD problem. We will refer to $n$ as the length parameter, or refer to "instances of length $n$", when the instances are drawn from $\mathcal{Y}_n$ vs. $\mathcal{N}_n$. The goal of a protocol solving RMD is to distinguish the case where the masks are sampled from $\mathcal{D}_Y$ from the case where they are sampled from $\mathcal{D}_N$; advantage measures this distinguishing probability.
We note that our communication game is slightly different from those in previous works. Specifically, the problem studied in [GKK+09, KKS15] is the Boolean Hidden Matching (BHM) problem from [GKK+09], and the works [KKSV17, KK19] study a variant called the Implicit Hidden Partition problem. While these problems are similar, they are less expressive than our formulation, and in particular do not seem to capture all the different Max-CSP$(f)$ problems.

There are two main differences between the previous settings and ours. The first difference is the way the matching matrix $M$ is encoded. In all the previous works, each edge (or hyperedge) is encoded by a single row of $M$ in which the corresponding columns are set to 1, so that $M$ has $\alpha n$ rows. However, it turns out that this encoding hides too much information, and hence we do not know how to reduce the problem to general Max-CSP. We unfold the encoding by using $k$ rows to encode a single $k$-hyperedge (leading to $k\alpha n$ rows in our setting). The second difference is that we allow the masking vector $b$ to be sampled from a more general distribution. This too is for the purpose of establishing a reduction to general Max-CSP. That being said, it is possible to describe some of the previous results in our language: all of these papers consider the complexity of distinguishing the distribution $\mathcal{D}_Y = \mathrm{Unif}(\{(1,1), (-1,-1)\})$ from the distribution $\mathcal{D}_N = \mathrm{Unif}(\{-1,1\}^2)$. This problem is shown to have a communication lower bound of $\Omega(\sqrt{n})$ in [GKK+09], and an $\Omega(n)$ lower bound in [KK19].

Due to the above two differences, it is not clear how to derive communication lower bounds for general $\mathcal{D}_Y$ and $\mathcal{D}_N$ by reduction from the previous works. The main technical contribution of this part of the paper is a communication lower bound for RMD for general $\mathcal{D}_Y$ and $\mathcal{D}_N$. We summarize the result in the following theorem.

Theorem 5.3 (RMD lower bound for distributions with matching marginals). For every $k \in \mathbb{N}$, there exists $\alpha_0 > 0$ such that for every $\alpha \in (0, \alpha_0)$ and $\delta > 0$ the following holds: For every pair of distributions $\mathcal{D}_Y, \mathcal{D}_N \in \Delta(\{-1,1\}^k)$ with $\mu(\mathcal{D}_Y) = \mu(\mathcal{D}_N)$, there exist $\tau > 0$ and $n_0$ such that for every $n \ge n_0$, every protocol for $(\mathcal{D}_Y, \mathcal{D}_N)$-RMD achieving advantage $\delta$ on instances of length $n$ requires $\tau\sqrt{n}$ bits of communication.

We prove Theorem 5.3 in two parts. First, in Section 6, we prove a communication lower bound for the special case where the marginals of $\mathcal{D}_Y$ and $\mathcal{D}_N$ are all zero. While this captures many new cases, it fails to capture the more interesting scenarios (involving non-approximation-resistant problems). To get lower bounds for the general case, we reduce the 0-marginal case to the general case in Section 7.

In the rest of this section, we use Theorem 5.3 to prove the main theorem (Theorem 5.1) of this section. We first perform a standard step of bootstrapping the number of hyperedges (which corresponds to the number of clauses in Max-CSP) in Section 5.2.
Next, we present the reduction to Max-CSP$(f)$ in Section 5.3. Finally, we wrap up the proof of Theorem 5.1 in Section 5.4.

The hardness of RMD suggests a natural path to hardness of Max-CSP$(f)$ problems in the streaming model. Such a reduction would take two distributions $\mathcal{D}_Y \in S^Y_\gamma$ and $\mathcal{D}_N \in S^N_\beta$ with matching marginals, construct the distributions $\mathcal{Y}$ and $\mathcal{N}$ of RMD, and then interpret these distributions (in a natural way) as distributions over instances of Max-CSP$(f)$ that are indistinguishable to small-space algorithms. While the exact details of this "interpretation" need to be spelled out, every step in this path can be achieved. Unfortunately, this does not yet yield any hardness for Max-CSP$(f)$, since the CSPs generated by this reduction would consist of instances that have at most one constraint per variable! Indeed, to go from the hardness of RMD to hardness of CSPs, we need the hardness of distinguishing a $T$-fold concatenation of streams drawn according to $\mathcal{Y}$ from a $T$-fold concatenation of streams drawn according to $\mathcal{N}$. (The concatenation now allows us to appeal to the membership $\mathcal{D}_Y \in S^Y_\gamma$ to conclude that instances $\Psi$ drawn from $\mathcal{Y}^T$ have high $\mathsf{val}_\Psi$, whereas for instances $\Psi$ drawn from $\mathcal{N}^T$, the fact that $\mathcal{D}_N \in S^N_\beta$ will imply that $\mathsf{val}_\Psi$ is low, for a large but constant $T$.)

In what follows, we formally define the $T$-fold concatenated streaming problem associated with $(\mathcal{D}_Y, \mathcal{D}_N)$-RMD, which we call $(\mathcal{D}_Y, \mathcal{D}_N, T)$-streaming-RMD. We then show that this problem remains indistinguishable, which allows us to implement the plan alluded to above. We note that this part of our reduction is standard in prior works; in particular, we follow the presentation in [CGV20].

The general framework defines two distributions $\mathcal{Y}_{\mathrm{stream}}$ and $\mathcal{N}_{\mathrm{stream}}$ over streams. A streaming algorithm ALG processes the streams with space $s$ and is required to output a verdict in $\{\mathrm{YES}, \mathrm{NO}\}$. The advantage of ALG is defined, as usual, to be $|\Pr_{\sigma\sim\mathcal{Y}_{\mathrm{stream}}}[\mathrm{ALG}(\sigma) = \mathrm{YES}] - \Pr_{\sigma\sim\mathcal{N}_{\mathrm{stream}}}[\mathrm{ALG}(\sigma) = \mathrm{YES}]|$.

Definition 5.4 ($(\mathcal{D}_Y, \mathcal{D}_N, T)$-streaming-RMD). For $k, T \in \mathbb{N}$, $\alpha \in (0, 1/k]$, and distributions $\mathcal{D}_Y, \mathcal{D}_N$ over $\{-1,1\}^k$, the streaming problem $(\mathcal{D}_Y, \mathcal{D}_N, T; \alpha, k)$-streaming-RMD is the task of distinguishing, for every $n$, $\sigma \sim \mathcal{Y}_{\mathrm{stream},n}$ from $\sigma \sim \mathcal{N}_{\mathrm{stream},n}$, where for a given length parameter $n$ the distributions $\mathcal{Y}_{\mathrm{stream}} = \mathcal{Y}_{\mathrm{stream},n}$ and $\mathcal{N}_{\mathrm{stream}} = \mathcal{N}_{\mathrm{stream},n}$ are defined as follows:

• Let $\mathcal{Y}$ be the distribution over instances of length $n$, i.e., triples $(x^*, M, z)$, from the definition of $(\mathcal{D}_Y, \mathcal{D}_N)$-RMD. For $x \in \{-1,1\}^n$, let $\mathcal{Y}|_x$ denote the distribution $\mathcal{Y}$ conditioned on $x^* = x$. The stream $\sigma \sim \mathcal{Y}_{\mathrm{stream}}$ is sampled as follows: Sample $x^*$ uniformly from $\{-1,1\}^n$. Let $(M^{(1)}, z^{(1)}), \ldots, (M^{(T)}, z^{(T)})$ be sampled independently according to $\mathcal{Y}|_{x^*}$. Let $\sigma^{(t)}$ be the pair $(M^{(t)}, z^{(t)})$ presented as a stream of hyperedges with labels in $\{-1,1\}^k$. Specifically, for $t \in [T]$ and $i \in [\alpha n]$, let $\sigma^{(t)}(i) = (e_t(i), z^{(t)}(i))$, where $e_t(i)$ is the $i$-th hyperedge of $M^{(t)}$, i.e., $e_t(i) = (j_t(k(i-1)+1), \ldots, j_t(k(i-1)+k))$ and $j_t(\ell)$ is the unique index $j$ such that $M^{(t)}_{\ell,j} = 1$. Finally, we let $\sigma = \sigma^{(1)} \circ \cdots \circ \sigma^{(T)}$ be the concatenation of the $\sigma^{(t)}$'s.

• $\sigma \sim \mathcal{N}_{\mathrm{stream}}$ is sampled similarly, except that we now sample $(M^{(1)}, z^{(1)}), \ldots, (M^{(T)}, z^{(T)})$ independently according to $\mathcal{N}|_{x^*}$, where $\mathcal{N}|_x$ is the distribution $\mathcal{N}$ conditioned on $x^* = x$.

Again, when $\alpha$ and $k$ are clear from context, we suppress them and simply refer to the $(\mathcal{D}_Y, \mathcal{D}_N, T)$-streaming-RMD problem.
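A sketch of the $T$-fold concatenation of Definition 5.4, reusing the same hidden $x^*$ across blocks; the edge-list representation is again an assumption of ours:

```python
import numpy as np
rng = np.random.default_rng(3)

def sample_stream(n, k, alpha, T, D, rng=rng):
    """Sample sigma ~ Y_stream (mask distribution D = D_Y) or
    sigma ~ N_stream (D = D_N) for (D_Y, D_N, T)-streaming-RMD.
    D: dict from +/-1 k-tuples to probabilities.
    Returns x* and the stream: a list of (hyperedge, label) pairs."""
    x_star = rng.choice([-1, 1], size=n)
    patterns, probs = zip(*D.items())
    stream = []
    for _ in range(T):   # each block: a fresh hypermatching on the same x*
        num_edges = int(alpha * n)
        edges = rng.permutation(n)[: num_edges * k].reshape(num_edges, k)
        idx = rng.choice(len(patterns), size=num_edges, p=probs)
        for e, i in zip(edges, idx):
            b = np.array(patterns[i])
            stream.append((tuple(e), tuple(x_star[e] * b)))  # z(i) = x*|_e . b
    return x_star, stream
```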
Lemma 5.5. Let $T, k \in \mathbb{N}$, let $\mathcal{D}_Y, \mathcal{D}_N$ be two distributions over $\{-1,1\}^k$, and let $\alpha \in (0, 1/k]$. Suppose a streaming algorithm ALG solves $(\mathcal{D}_Y, \mathcal{D}_N, T)$-streaming-RMD on instances of length $n$ with advantage $\Delta$ and space $s$. Then there is a one-way protocol for $(\mathcal{D}_Y, \mathcal{D}_N)$-RMD on instances of length $n$ using at most $sT$ bits of communication and achieving advantage at least $\Delta/T$.

The proof of Lemma 5.5 is based on a hybrid argument (e.g., [KKS15, Lemma 6.3]). We provide a proof here based on the proof of [CGV20, Lemma 4.11]. (We note that previous lemmas of this form only considered the case where $\mathcal{D}_N$ is the uniform distribution, and the proofs used some special properties of this setting. Generalizing to arbitrary $\mathcal{D}_N$ requires a little extra care, as we do below.) Later, in Section 5.3, we show a reduction from $(\mathcal{D}_Y, \mathcal{D}_N, T)$-streaming-RMD to Max-CSP$(f)$, thus completing the objective of this section.

Proof of Lemma 5.5.
Note that since we are interested in distributional advantage, we can fix the randomness in ALG so that it becomes a deterministic algorithm; by an averaging argument, the randomness can be chosen so that the advantage does not decrease. Let $\Gamma$ denote the evolution function of ALG as it processes a block of $\alpha n$ hyperedges: if the algorithm is in state $s$ and receives a stream $\sigma$ of length $\alpha n$, then it ends in state $\Gamma(s, \sigma)$. Let $s_0$ denote its initial state.

We consider the following collection of (jointly distributed) random variables. Let $x^* \sim \mathrm{Unif}(\{-1,1\}^n)$. Let $\sigma^{(1)}_Y, \ldots, \sigma^{(T)}_Y \sim \mathcal{Y}|_{x^*}$ be chosen independently conditioned on $x^*$. Similarly, let $\sigma^{(1)}_N, \ldots, \sigma^{(T)}_N \sim \mathcal{N}|_{x^*}$ be chosen independently conditioned on $x^*$. Let $S^Y_t$ denote the state of ALG after processing $\sigma^{(1)}_Y, \ldots, \sigma^{(t)}_Y$, i.e., $S^Y_0 = s_0$ and $S^Y_t = \Gamma(S^Y_{t-1}, \sigma^{(t)}_Y)$. Similarly, let $S^N_t$ denote the state of ALG after processing $\sigma^{(1)}_N, \ldots, \sigma^{(t)}_N$. Let $S^Y_{a:b}$ denote the sequence of states $(S^Y_a, \ldots, S^Y_b)$, and similarly for $S^N_{a:b}$.

Now let $\Delta_t = \|S^Y_{1:t} - S^N_{1:t}\|_{\mathrm{tvd}}$. Observe that $\Delta_0 = 0$ while $\Delta_T \ge \Delta$ (the latter holds since ALG distinguishes the two stream distributions with advantage $\Delta$, and its output is determined by the final state). Thus $\Delta \le \Delta_T - \Delta_0 = \sum_{t=0}^{T-1}(\Delta_{t+1} - \Delta_t)$, and so there exists $t^* \in \{0, 1, \ldots, T-1\}$ such that

$\Delta_{t^*+1} - \Delta_{t^*} = \|S^Y_{1:t^*+1} - S^N_{1:t^*+1}\|_{\mathrm{tvd}} - \|S^Y_{1:t^*} - S^N_{1:t^*}\|_{\mathrm{tvd}} \ge \Delta/T.$

Now consider the random variable $\tilde{S} = \Gamma(S^Y_{t^*}, \sigma^{(t^*+1)}_N)$ (so the previous state is from the YES distribution while the input block is from the NO distribution). We claim below that $\mathbb{E}_{A\sim_d S^Y_{1:t^*}}[\|S^Y_{t^*+1}|_{S^Y_{1:t^*}=A} - \tilde{S}|_{S^Y_{1:t^*}=A}\|_{\mathrm{tvd}}] \ge \Delta_{t^*+1} - \Delta_{t^*}$. Once we have the claim, we show how to get a protocol for $(\mathcal{D}_Y, \mathcal{D}_N)$-RMD with at most $sT$ bits of communication and advantage $\Delta_{t^*+1} - \Delta_{t^*}$, concluding the proof of the lemma.

Claim 5.6. $\mathbb{E}_{A\sim_d S^Y_{1:t^*}}[\|S^Y_{t^*+1}|_{S^Y_{1:t^*}=A} - \tilde{S}|_{S^Y_{1:t^*}=A}\|_{\mathrm{tvd}}] \ge \Delta_{t^*+1} - \Delta_{t^*}$.

Proof. We use the following equivalent definition of total variation distance: random variables $X$ and $Y$ satisfy $\|X - Y\|_{\mathrm{tvd}} \le \tau$ if and only if there exists a coupling distribution $\mathcal{D}_{\mathrm{couple}}$ such that $(\tilde{X}, \tilde{Y}) \sim \mathcal{D}_{\mathrm{couple}}$ satisfies (1) $\tilde{X} \sim_d X$, (2) $\tilde{Y} \sim_d Y$, and (3) $\Pr[\tilde{X} \ne \tilde{Y}] \le \tau$.

Since $\|S^Y_{1:t^*} - S^N_{1:t^*}\|_{\mathrm{tvd}} \le \Delta_{t^*}$, we have a coupling $\mathcal{D}_{\mathrm{couple}}$ such that $(A, B) \sim \mathcal{D}_{\mathrm{couple}}$ satisfies $A \sim_d S^Y_{1:t^*}$, $B \sim_d S^N_{1:t^*}$, and $\Pr[A \ne B] \le \Delta_{t^*}$.

Now assume the claim is false. Then for every $A$ we have a coupling distribution $\mathcal{D}^*_A$ such that for $(X, Y) \sim \mathcal{D}^*_A$ we have $X \sim_d S^Y_{t^*+1}|_{S^Y_{1:t^*}=A}$, $Y \sim_d \tilde{S}|_{S^Y_{1:t^*}=A}$, and $\mathbb{E}_{A\sim_d S^Y_{1:t^*}}[\Pr_{(X,Y)\sim\mathcal{D}^*_A}[X \ne Y]] < \Delta_{t^*+1} - \Delta_{t^*}$.

We now describe a distribution $\tilde{\mathcal{D}}_{\mathrm{couple}}$ coupling $S^Y_{1:t^*+1}$ and $S^N_{1:t^*+1}$ that shows their total variation distance is less than $\Delta_{t^*+1}$, reaching a contradiction. We describe the procedure sampling $(\tilde{A}, \tilde{B}) \sim \tilde{\mathcal{D}}_{\mathrm{couple}}$: We first sample $(A, B) \sim \mathcal{D}_{\mathrm{couple}}$. If $A = B$, we sample $(X, Y) \sim \mathcal{D}^*_A$. Else, we sample $X \sim_d S^Y_{t^*+1}|_{S^Y_{1:t^*}=A}$ and (independently) $Y \sim_d S^N_{t^*+1}|_{S^N_{1:t^*}=B}$. We let $\tilde{A} = (A, X)$ and $\tilde{B} = (B, Y)$. It is easy to verify that $\tilde{A} \sim_d S^Y_{1:t^*+1}$ and $\tilde{B} \sim_d S^N_{1:t^*+1}$. Finally, the probability that $\tilde{A} \ne \tilde{B}$ is bounded by

$\Pr_{(\tilde{A},\tilde{B})\sim\tilde{\mathcal{D}}_{\mathrm{couple}}}[\tilde{A} \ne \tilde{B}] = \Pr[A \ne B] + \Pr[(A = B) \text{ and } X \ne Y] \le \Pr_{(A,B)\sim\mathcal{D}_{\mathrm{couple}}}[A \ne B] + \mathbb{E}_A\big[\Pr_{(X,Y)\sim\mathcal{D}^*_A}[X \ne Y]\big] < \Delta_{t^*} + (\Delta_{t^*+1} - \Delta_{t^*}) = \Delta_{t^*+1},$

which implies $\|S^Y_{1:t^*+1} - S^N_{1:t^*+1}\|_{\mathrm{tvd}} < \Delta_{t^*+1}$ and hence contradicts the definition of $\Delta_{t^*+1}$.

We now show how a protocol for $(\mathcal{D}_Y, \mathcal{D}_N)$-RMD can be designed that achieves advantage at least $\theta = \mathbb{E}_{A\sim_d S^Y_{1:t^*}}[\|S^Y_{t^*+1}|_{S^Y_{1:t^*}=A} - \tilde{S}|_{S^Y_{1:t^*}=A}\|_{\mathrm{tvd}}] \ge \Delta_{t^*+1} - \Delta_{t^*}$, concluding the proof of the lemma. The protocol uses distinguishers $T_A: \{0,1\}^s \to \{0,1\}$ such that $\mathbb{E}_{A, S^Y_{t^*+1}, \tilde{S}}[T_A(S^Y_{t^*+1})] - \mathbb{E}[T_A(\tilde{S})] \ge \theta$, which are guaranteed to exist by the definition of total variation distance.

Our protocol works as follows. Let Alice receive input $x^*$ and Bob receive inputs $(M, z)$ sampled from either $\mathcal{Y}|_{x^*}$ or $\mathcal{N}|_{x^*}$.

1. Alice samples $\sigma^{(1)}, \ldots, \sigma^{(t^*)} \sim \mathcal{Y}|_{x^*}$ independently, computes $A = S^Y_{1:t^*} \in \{0,1\}^{t^* s}$, and sends $A$ to Bob (at most $sT$ bits).

2. Bob extracts $S^Y_{t^*}$ from $A$, computes $\hat{S} = \Gamma(S^Y_{t^*}, \sigma)$, where $\sigma$ is the encoding of $(M, z)$ as a stream, and outputs YES if $T_A(\hat{S}) = 1$ and NO otherwise.

Note that if $(M, z) \sim \mathcal{Y}|_{x^*}$ then $\hat{S} \sim_d S^Y_{t^*+1}|_{S^Y_{1:t^*}=A}$, while if $(M, z) \sim \mathcal{N}|_{x^*}$ then $\hat{S} \sim_d \tilde{S}|_{S^Y_{1:t^*}=A}$. It follows that the advantage of the protocol above is exactly $\mathbb{E}_A[T_A(S^Y_{t^*+1})] - \mathbb{E}_A[T_A(\tilde{S})] \ge \theta \ge \Delta_{t^*+1} - \Delta_{t^*} \ge \Delta/T$. This concludes the proof of the lemma.
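The proof of Claim 5.6 rests on the coupling characterization of total variation distance. A small numerical check of that characterization, for distributions on a common finite set (the example values are ours):

```python
import numpy as np

def tvd(p, q):
    """Total variation distance between two distributions on a finite set,
    given as aligned probability vectors."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * np.abs(p - q).sum()

def optimal_coupling_disagreement(p, q):
    """min Pr[X != Y] over couplings of p and q equals tvd(p, q); an optimal
    coupling puts mass min(p_i, q_i) on the diagonal."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 1.0 - np.minimum(p, q).sum()

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
assert np.isclose(tvd(p, q), optimal_coupling_disagreement(p, q))  # both 0.1
```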
By combining Lemma 5.5 with Theorem 5.3, we immediately obtain the following corollary.
Lemma 5.7. For $k \in \mathbb{N}$, let $\alpha_0(k)$ be as given by Theorem 5.3. Let $T, k \in \mathbb{N}$, $\alpha \in (0, \alpha_0(k))$, and let $\mathcal{D}_Y, \mathcal{D}_N$ be two distributions over $\{-1,1\}^k$ with $\mu(\mathcal{D}_Y) = \mu(\mathcal{D}_N)$. Then every streaming algorithm ALG solving $(\mathcal{D}_Y, \mathcal{D}_N, T)$-streaming-RMD with advantage $1/8$ for all lengths uses space $\Omega(\sqrt{n})$.

Proof. We get the lemma by combining Lemma 5.5 and Theorem 5.3. Let ALG be an algorithm using space $s$ that solves $(\mathcal{D}_Y, \mathcal{D}_N, T)$-streaming-RMD with advantage $1/8$. Then by Lemma 5.5, there exists a one-way protocol for $(\mathcal{D}_Y, \mathcal{D}_N)$-RMD using at most $sT$ bits of communication with advantage at least $1/(8T)$. Applying Theorem 5.3 with $\delta = 1/(8T) > 0$, and recalling that $T$ is a constant, we get that $s = \Omega(\sqrt{n})$.

5.3 Reduction from $(\mathcal{D}_Y, \mathcal{D}_N, T)$-streaming-RMD to approximating Max-CSP$(f)$
We now complete the sequence of reductions from RMD to approximating Max-CSP$(f)$ by reducing streaming-RMD to Max-CSP$(f)$. To this end, note that an instance $\sigma$ of streaming-RMD is a sequence $(\sigma(1), \ldots, \sigma(m))$, where each $\sigma(i) = (\mathbf{j}(i), \mathbf{z}(i))$ with $\mathbf{j}(i) \in [n]^k$ and $\mathbf{z}(i) \in \{-1,1\}^k$, and is thus already syntactically very close to the description of a Max-CSP$(f)$ instance. The only missing ingredient is any reference to the function $f$ itself! Indeed, the reduction from streaming-RMD to Max-CSP$(f)$ consists of just applying the function $f$ to the literals indicated by $\sigma(i)$.

Given an instance $\sigma = (\sigma(1), \ldots, \sigma(m))$ of streaming-RMD, let $\Psi(\sigma)$ denote the instance of Max-CSP$(f)$ on variables $\mathbf{x} = (x_1, \ldots, x_n)$ with constraints $C_1, \ldots, C_m$, where $C_i = \sigma(i) = (\mathbf{j}(i), \mathbf{z}(i))$ is the constraint satisfied if $f(\mathbf{z}(i)\odot\mathbf{x}|_{\mathbf{j}(i)}) = 1$.

In what follows, we show that if $\mathcal{D}_Y \in S^Y_\gamma$, then for every sufficiently large constant $T$ and sufficiently large $n$, if we draw $\sigma \sim \mathcal{Y}_{\mathrm{stream},n}$, then with high probability $\Psi(\sigma)$ has value at least $\gamma - o(1)$. Conversely, if $\mathcal{D}_N \in S^N_\beta$, then for all sufficiently large $n$, if we draw $\sigma \sim \mathcal{N}_{\mathrm{stream},n}$, then with high probability $\Psi(\sigma)$ has value at most $\beta + o(1)$. This essentially completes our reduction to Max-CSP$(f)$.
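A minimal sketch of the map $\sigma \mapsto \Psi(\sigma)$ and of $\mathsf{val}_\Psi$, with a brute-force maximum for tiny $n$; the function f_2and below is merely an illustrative example of a constraint function, not one singled out by the paper:

```python
import numpy as np
from itertools import product

def value(constraints, f, x):
    """val_Psi(x) for Psi(sigma): the fraction of constraints (j, z) with
    f(z . x|_j) = 1, where . is the coordinate-wise product."""
    return np.mean([f(tuple(z[t] * x[j[t]] for t in range(len(j))))
                    for (j, z) in constraints])

def max_value(constraints, f, n):
    """Exact val_Psi by brute force over all 2^n assignments (tiny n only)."""
    return max(value(constraints, f, x)
               for x in product([-1, 1], repeat=n))

# Example constraint function: 2AND on {-1,1}^2 (1 iff both literals are +1).
f_2and = lambda b: int(b[0] == 1 and b[1] == 1)
```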
Lemma 5.8. For every $k \in \mathbb{N}$, $f: \{-1,1\}^k \to \{0,1\}$, $0 \le \beta < \gamma \le 1$, $\varepsilon > 0$, distributions $\mathcal{D}_Y, \mathcal{D}_N \in \Delta(\{-1,1\}^k)$, and $\alpha \in (0, 1/(100k))$, there exists an integer $T_0$ such that for every $T \ge T_0$ the following holds:

1. If $\mathcal{D}_Y \in S^Y_\gamma$, then for every sufficiently large $n$, the $(\mathcal{D}_Y, \mathcal{D}_N, T)$-streaming-RMD YES instance $\sigma \sim \mathcal{Y}_{\mathrm{stream},n}$ satisfies $\Pr[\mathsf{val}_{\Psi(\sigma)} < \gamma - \varepsilon] \le \exp(-n)$.

2. If $\mathcal{D}_N \in S^N_\beta$, then for every sufficiently large $n$, the $(\mathcal{D}_Y, \mathcal{D}_N, T)$-streaming-RMD NO instance $\sigma \sim \mathcal{N}_{\mathrm{stream},n}$ satisfies $\Pr[\mathsf{val}_{\Psi(\sigma)} > \beta + \varepsilon] \le \exp(-n)$.

Furthermore, if $\gamma = 1$, then $\Pr_{\sigma\sim\mathcal{Y}_{\mathrm{stream},n}}[\mathsf{val}_{\Psi(\sigma)} = 1] = 1$.

Proof. Roughly, our proof uses the fact that the definition of $S^Y_\gamma$ is set up so that $\Psi(\sigma)$ achieves value $\gamma$ under the "planted" assignment $x^*$. Similarly, $S^N_\beta$ is set up so that for every assignment the expected value is at most $\beta$.

We recall that the condition $\mathcal{D}_Y \in S^Y_\gamma$ implies that $\mathbb{E}_{\mathbf{a}\sim\mathcal{D}_Y}[f(\mathbf{a})] \ge \gamma$. Now consider a random YES instance $\sigma \sim \mathcal{Y}_{\mathrm{stream},n}$ of $(\mathcal{D}_Y, \mathcal{D}_N, T)$-streaming-RMD and let $x^*$ denote the underlying vector corresponding to this draw. We show that for $\Psi = \Psi(\sigma)$ we have $\mathsf{val}_\Psi(x^*) \ge \gamma - \varepsilon$ with high probability. We consider the constraints given by $\sigma(i)$ one at a time. Let $m = \alpha n T$ denote the total number of constraints of $\Psi$. Let $Z_i = C_i(x^*) = f(\mathbf{z}(i)\odot x^*|_{\mathbf{j}(i)})$ denote the indicator of the event that the $i$-th constraint is satisfied by $x^*$. By construction of $\mathbf{z}(i)$ (from Definition 5.2, passed through Definition 5.4), we have $\mathbf{z}(i) = \mathbf{b}(i)\odot x^*|_{\mathbf{j}(i)}$, where $\mathbf{b}(i) \sim \mathcal{D}_Y$ independently of all other choices. We thus have $Z_i = f(\mathbf{b}(i)\odot x^*|_{\mathbf{j}(i)}\odot x^*|_{\mathbf{j}(i)}) = f(\mathbf{b}(i))$. Thus $Z_i$ is a random variable, chosen independently of $Z_1, \ldots, Z_{i-1}$, with expectation $\mathbb{E}[Z_i \mid Z_1, \ldots, Z_{i-1}] = \mathbb{E}_{\mathbf{b}\sim\mathcal{D}_Y}[f(\mathbf{b})] \ge \gamma$. By applying a concentration bound (Lemma 3.4 suffices, though even simpler Chernoff bounds would do), we get $\Pr_{\sigma\sim\mathcal{Y}_{\mathrm{stream},n}}[\mathsf{val}_{\Psi(\sigma)} = \frac{1}{m}\sum_{i=1}^m Z_i < \gamma - \varepsilon] \le \exp(-\varepsilon^2 m) = \exp(-\varepsilon^2\alpha T n)$. This yields Part (1) of the lemma.

Note that if $\gamma = 1$, then $Z_i = 1$ deterministically for every $i$, and so we get $\mathsf{val}_\Psi = 1$ with probability 1, yielding the furthermore part of the lemma.

We now turn to the analysis of the NO case. Here the condition $\mathcal{D}_N \in S^N_\beta$ implies that for every $p \in [0,1]$, $\mathbb{E}_{\mathbf{b}\sim\mathcal{D}_N}\mathbb{E}_{\mathbf{a}\sim\mathrm{Bern}(p)^k}[f(\mathbf{b}\odot\mathbf{a})] \le \beta$. Now consider any fixed assignment $\nu \in \{-1,1\}^n$. In what follows, we show that for a random NO instance $\sigma \sim \mathcal{N}_{\mathrm{stream},n}$ of $(\mathcal{D}_Y, \mathcal{D}_N, T)$-streaming-RMD, if we let $\Psi = \Psi(\sigma)$, then $\Pr[\mathsf{val}_\Psi(\nu) > \beta + \varepsilon] \le c^{-n}$ for some $c > 2$. This allows us to take a union bound over the $2^n$ possible $\nu$'s to conclude $\Pr[\mathsf{val}_\Psi > \beta + \varepsilon] \le 2^n \cdot c^{-n}$.

We thus turn to analyzing $\mathsf{val}_\Psi(\nu)$. Recall that $\sigma$ is chosen by picking $x^* \in \{-1,1\}^n$ uniformly and then picking the $\sigma(i)$'s based on this choice; but our analysis will work for every choice of $x^* \in \{-1,1\}^n$. Fix such a choice and let $\nu^* = \nu\odot x^*$. Now for $i \in [m]$ (where $m = \alpha n T$), let $Z_i$ denote the indicator of the event that $\nu$ satisfies $C_i$. We have $Z_i = f(\mathbf{b}(i)\odot x^*|_{\mathbf{j}(i)}\odot\nu|_{\mathbf{j}(i)}) = f(\mathbf{b}(i)\odot\nu^*|_{\mathbf{j}(i)})$. We show below that $\mathbb{E}[Z_i \mid Z_1, \ldots, Z_{i-1}] \le \beta + \varepsilon/2$. (This time the $Z_i$'s are dependent, but we will show that the conditioning does not hurt.) This allows us to apply Lemma 3.4 to conclude $\Pr[\mathsf{val}_\Psi(\nu) = \frac{1}{m}\sum_{i=1}^m Z_i > \beta + \varepsilon] \le \exp(-\varepsilon^2\alpha T n) \le c^{-n}$ for any constant $c$ of our choice (for correspondingly large $T$).

We thus turn to the final remaining step, i.e., showing $\mathbb{E}[Z_i \mid Z_1, \ldots, Z_{i-1}] \le \beta + \varepsilon/2$. Note that the only effect of the conditioning on $Z_1, \ldots, Z_{i-1}$ is that it influences the distribution of $\mathbf{j}(i)$. Recall from the construction in Definition 5.4 that $\sigma = \sigma^{(1)}\circ\cdots\circ\sigma^{(T)}$ is a concatenation of $T$ streams that are independent conditioned on $x^*$. Say $\sigma(i)$ belongs to the $t$-th component, i.e., to $\sigma^{(t)}$. Then the only variables that affect $Z_i$ are the $Z_{i'}$'s where $i' < i$ and $i'$ is also part of $\sigma^{(t)}$. This effect is in turn passed through the conditioning of $\mathbf{j}(i)$. Let us fix $\mathbf{j}(i')$ for every $i' < i$ with $i'$ part of $\sigma^{(t)}$. Note that there are at most $\alpha n$ such $i'$'s. Now let $S = [n]\setminus\cup_{i'}\mathbf{j}(i')$ be the remaining vertices. Note that, conditioned on the fixed $\mathbf{j}(i')$'s, $\mathbf{j}(i)$ is a uniformly chosen sequence of $k$ distinct elements of $S$. Since $\alpha < 1/(100k)$, we have $|S| \ge n - k\alpha n \ge 0.99n$. Let $p = p_S$ be the fraction of 1's in $\nu^*|_S$. We have

$\mathbb{E}[Z_i \mid S] \le \mathbb{E}_{\mathbf{j}(i), \mathbf{b}(i)}[f(\mathbf{b}(i)\odot\nu^*|_{\mathbf{j}(i)})] \le \mathbb{E}_{\mathbf{b}(i)\sim\mathcal{D}_N}\mathbb{E}_{\mathbf{a}\sim\mathrm{Bern}(p)^k}[f(\mathbf{b}(i)\odot\mathbf{a})] + k^2/|S| \le \beta + \varepsilon/2,$

where the middle inequality uses the fact that sampling $j(i)_1, \ldots, j(i)_k$ independently from $S$ leads to $\nu^*_{j(i)_1}, \ldots, \nu^*_{j(i)_k}$ that are distributed independently according to $\mathrm{Bern}(p)$, while $j(i)_1, \ldots, j(i)_k$ are distinct with probability at least $1 - k^2/|S|$. This concludes the proof of the lemma.

5.4 Proof of Theorem 5.1

We are now ready to prove Theorem 5.1.
Proof of Theorem 5.1.
Let $\alpha_0 = \alpha_0(k)$ be as given by Theorem 5.3. Given $\varepsilon > 0$, let $\alpha = \alpha_0/2$ and let $T$ be a sufficiently large constant multiple of $1/(\varepsilon^2\alpha_0)$, so that in particular $T \ge T_0$ for the $T_0$ given by Lemma 5.8. Suppose there exists a streaming algorithm ALG that solves $(\gamma-\varepsilon, \beta+\varepsilon)$-Max-CSP$(f)$. Since $K^Y_\gamma(f) \cap K^N_\beta(f) \ne \emptyset$, there exist distributions $\mathcal{D}_Y, \mathcal{D}_N$ such that $\mathbb{E}_{\mathbf{a}\sim\mathcal{D}_Y}[f(\mathbf{a})] \ge \gamma$, $\mathbb{E}_{\mathbf{a}\sim\mathcal{D}_N}\mathbb{E}_{\mathbf{c}\sim\mathrm{Bern}(p)^k}[f(\mathbf{a}\odot\mathbf{c})] \le \beta$ for all $p \in [0,1]$, and $\mu(\mathcal{D}_Y) = \mu(\mathcal{D}_N)$. Let $n$ be sufficiently large and let $\mathcal{Y}_{\mathrm{stream},n}$ and $\mathcal{N}_{\mathrm{stream},n}$ denote the distributions of YES and NO instances of $(\mathcal{D}_Y, \mathcal{D}_N, T)$-streaming-RMD of length $n$. Since $\alpha$ and $T$ satisfy the conditions of Lemma 5.8, we have, for every sufficiently large $n$,

$\Pr_{\sigma\sim\mathcal{Y}_{\mathrm{stream},n}}[\mathsf{val}_{\Psi(\sigma)} < \gamma - \varepsilon] = o(1)$ and $\Pr_{\sigma\sim\mathcal{N}_{\mathrm{stream},n}}[\mathsf{val}_{\Psi(\sigma)} > \beta + \varepsilon] = o(1).$

We conclude that ALG can distinguish YES instances of $(\mathcal{D}_Y, \mathcal{D}_N, T)$-streaming-RMD from NO instances with advantage at least $1/3 - o(1) \ge 1/8$. However, since $\mathcal{D}_Y$, $\mathcal{D}_N$ and $\alpha$ satisfy the conditions of Lemma 5.7 (in particular, $\mu(\mathcal{D}_Y) = \mu(\mathcal{D}_N)$ and $\alpha \in (0, \alpha_0)$), such an algorithm requires space at least $\Omega(\sqrt{n})$. Thus, we conclude that any streaming algorithm solving $(\gamma-\varepsilon, \beta+\varepsilon)$-Max-CSP$(f)$ requires $\Omega(\sqrt{n})$ space.

Finally, note that if $\gamma = 1$, then in Lemma 5.8 we have $\mathsf{val}_\Psi = 1$ with probability one. Repeating the above reasoning with this information shows that $(1, \beta+\varepsilon)$-Max-CSP$(f)$ requires $\Omega(\sqrt{n})$ space.

The goal of this section is to prove the special case of Theorem 5.3 in which the distributions are 1-wise independent, i.e., their marginals are all 0. The main theorem of this section is summarized below.
Theorem 6.1 (Lower bound for 1-wise distributions). For every $k \ge 2$, there exists $\alpha_0 > 0$ such that for every $\alpha \in (0, \alpha_0)$, $\delta \in (0, 1/2)$, and every $\mathcal{D}_Y, \mathcal{D}_N \in \Delta(\{-1,1\}^k)$ with $\mu(\mathcal{D}_Y) = \mu(\mathcal{D}_N) = 0^k$, there exist $\tau > 0$ and $n_0$ such that for every $n \ge n_0$, every protocol for $(\mathcal{D}_Y, \mathcal{D}_N)$-RMD with parameter $\alpha$ that achieves advantage $\delta$ requires at least $\tau\sqrt{n}$ bits of communication on instances of length $n$.
Our proof of Theorem 6.1 follows the methodology of [GKK+09], with minor modifications as required by the RMD formulation. Their proof uses Fourier analysis to reduce the task of proving a communication lower bound to that of proving certain combinatorial identities about randomly chosen matchings. We follow the same approach, which leads us to slightly different conditions about randomly chosen hypermatchings and requires a fresh analysis (though in the end our bounds are qualitatively similar to those in [GKK+09]). The heart of the argument is to show that if Alice's message is short, say $o(\sqrt{n})$ bits, then the posterior distribution of Bob's input $z$ is close to the uniform distribution in total variation distance, contradicting the assumed advantage of the protocol. In Theorem 6.2, we show that this total variation distance is small when Alice's message is a "typical" one, in the sense that the number of Alice's inputs leading to this message is not too small. Immediately after stating Theorem 6.2, we show how to go from the case of typical messages to all messages, and this gives a proof of Theorem 6.1.

For each $k$-uniform hypermatching $M$, distribution $\mathcal{D}$ over $\{-1,1\}^k$, and fixed message of Alice, the posterior distribution function $p_{M,\mathcal{D}}: \{-1,1\}^{\alpha k n} \to [0,1]$ is defined as follows. For each $z \in \{-1,1\}^{\alpha k n}$, let

$p_{M,\mathcal{D}}(z) := \Pr_{x^*\in A,\, b\sim\mathcal{D}^{\alpha n}}[z = (Mx^*)\odot b \mid M, \text{Alice's message}] = \mathbb{E}_{x^*\in A}\,\mathbb{E}_{b\sim\mathcal{D}^{\alpha n}}[\mathbf{1}[z = (Mx^*)\odot b]],$

where $A \subseteq \{-1,1\}^n$ is the set of Alice's inputs that correspond to the message. If the number of bits communicated is at most $c$, then there exists a message such that $A \subseteq \{-1,1\}^n$ and $|A| \ge 2^{n-c}$.

Theorem 6.2. For every $k \in \mathbb{N}$, there exists $\alpha_0 > 0$ such that for every $\alpha \in (0, \alpha_0)$ and $\delta \in (0, 1/2)$, there exists $\tau_0 > 0$ such that the following holds for every sufficiently large $n$. Let $A \subseteq \{-1,1\}^n$ be a set satisfying $|A| \ge 2^{n-\tau_0\sqrt{n}}$, and let $\mathcal{D}$ be a distribution over $\{-1,1\}^k$ satisfying $\mathbb{E}_{\mathbf{a}\sim\mathcal{D}}[a_j] = 0$ for all $j \in [k]$. Then

$\mathbb{E}_M[\|p_{M,\mathcal{D}} - U\|_{\mathrm{tvd}}] \le \delta, \qquad (6.3)$

where $U$ denotes the uniform distribution over $\{-1,1\}^{k\alpha n}$.
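For very small parameters, the posterior $p_{M,\mathcal{D}}$ and its distance to uniform can be computed by brute force. A sketch under our own representation assumptions (edges as $k$-tuples, the message identified with the set $A$); the particular example set $A$ is ours:

```python
import numpy as np
from itertools import product

def posterior(n, edges, D, A):
    """Brute-force p_{M,D}(z) for a fixed hypermatching 'edges' (list of
    k-tuples) and a set A of Alice inputs consistent with her message.
    D: dict from +/-1 k-tuples to probabilities. Returns dict z -> prob."""
    p = {}
    for x in A:                                 # x: +/-1 tuple of length n
        for b_blocks in product(D.keys(), repeat=len(edges)):
            w = np.prod([D[b] for b in b_blocks]) / len(A)
            z = tuple(x[v] * s for e, b in zip(edges, b_blocks)
                      for v, s in zip(e, b))
            p[z] = p.get(z, 0.0) + w
    return p

def tvd_to_uniform(p, dim):
    u = 2.0 ** (-dim)
    # Sum over the support, plus the uniform mass outside the support.
    return 0.5 * (sum(abs(q - u) for q in p.values())
                  + u * (2 ** dim - len(p)))

# n = 4, one edge on vertices (0, 1), masks from Unif{(1,1), (-1,-1)}.
D_Y = {(1, 1): 0.5, (-1, -1): 0.5}
A = [x for x in product([-1, 1], repeat=4) if x[0] == x[1]]  # "x_1 = x_2"
p = posterior(4, [(0, 1)], D_Y, A)
print(tvd_to_uniform(p, 2))   # 0.5: under this message, z always has z_1 = z_2
```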
Assuming Theorem 6.2, we prove Theorem 6.1 below.

Proof of Theorem 6.1.
Let $\delta$ be as in the theorem statement and let $\delta' = \delta/8$. Let $\tau_0$ be the constant given by Theorem 6.2 when invoked with parameters $\alpha$ and $\delta'$. Let $\tau = \tau_0/2$, $c' = \tau_0\sqrt{n}$, and $c = c' - \log(1/\delta')$. Note that for large enough $n$, we have $c \ge \tau\sqrt{n}$. We will prove the theorem for this choice of $\tau$.

The proof is by contradiction. Suppose there exists a protocol for $(\mathcal{D}_Y, \mathcal{D}_N)$-RMD on instances of length $n$ with advantage at least $\delta$ using at most $\tau\sqrt{n}$ (and hence at most $c$) bits of communication. Let $\mathcal{D}_{\mathrm{unif}}$ be the uniform distribution over $\{-1,1\}^k$. By the triangle inequality, there is a protocol for either $(\mathcal{D}_Y, \mathcal{D}_{\mathrm{unif}})$-RMD or $(\mathcal{D}_N, \mathcal{D}_{\mathrm{unif}})$-RMD with advantage at least $\delta/2$ using at most $c$ bits of communication. Without loss of generality, suppose there is a protocol for $(\mathcal{D}_Y, \mathcal{D}_{\mathrm{unif}})$-RMD with advantage at least $\delta/2$; in particular, $\mathbb{E}_M[\|p_{M,\mathcal{D}_Y} - p_{M,\mathcal{D}_{\mathrm{unif}}}\|_{\mathrm{tvd}}] \ge \delta/2$.

Next, by Yao's principle [Yao77], we may assume that the message sent by Alice is deterministic. Namely, the message partitions the set $\{-1,1\}^n$ of $x^*$'s into $2^c$ sets $A_1, A_2, \ldots, A_{2^c}$. Using a simple counting argument, we can show that with probability at least $1 - \delta'$, the message sent by Alice corresponds to a set $A_i \subseteq \{-1,1\}^n$ of size at least $2^{n-c-\log(1/\delta')} \ge 2^{n-c'}$. We call such an event GOOD. That is,

$\mathrm{GOOD} = \bigcup_{i\in[2^c]:\ |A_i| \ge 2^{n-c'}} A_i.$

Now for each $A_i$ with $|A_i| \ge 2^{n-c'}$, we apply Theorem 6.2 with parameters $\alpha$ and $\delta'$, using the fact that masking with uniform $b$ makes $z$ uniform (so $p_{M,\mathcal{D}_{\mathrm{unif}}} = U$), to get

$\mathbb{E}_M[\|p_{M,\mathcal{D}_Y} - p_{M,\mathcal{D}_{\mathrm{unif}}}\|_{\mathrm{tvd}} \mid x^* \in A_i] = \mathbb{E}_M[\|p_{M,\mathcal{D}_Y} - U\|_{\mathrm{tvd}} \mid x^* \in A_i] \le \delta'.$

Now, for $x^* \sim \mathrm{Unif}(\{-1,1\}^n)$, we have

$\mathbb{E}_M[\|p_{M,\mathcal{D}_Y} - U\|_{\mathrm{tvd}}] = \Pr[x^*\in\mathrm{GOOD}]\cdot\mathbb{E}_M[\|p_{M,\mathcal{D}_Y} - U\|_{\mathrm{tvd}} \mid x^*\in\mathrm{GOOD}] + \Pr[x^*\notin\mathrm{GOOD}]\cdot\mathbb{E}_M[\|p_{M,\mathcal{D}_Y} - U\|_{\mathrm{tvd}} \mid x^*\notin\mathrm{GOOD}] \le 1\cdot\delta' + \delta'\cdot 1 < \delta/2,$

contradicting $\mathbb{E}_M[\|p_{M,\mathcal{D}_Y} - U\|_{\mathrm{tvd}}] = \mathbb{E}_M[\|p_{M,\mathcal{D}_Y} - p_{M,\mathcal{D}_{\mathrm{unif}}}\|_{\mathrm{tvd}}] \ge \delta/2$. This completes the proof of Theorem 6.1.

The rest of this section is devoted to the proof of Theorem 6.2. In Section 6.1, we reduce the upper bound in Equation (6.3) to a combinatorial problem. Next, we analyze the combinatorial problem in Section 6.2, and finally complete the proof of Theorem 6.2 in Section 6.3.
Let $A \subseteq \{-1,1\}^n$ be the set of Alice's inputs that correspond to the message. We define $f: \{-1,1\}^n \to \{0,1\}$ to be the indicator function of $A$, i.e., $f(x^*) = 1$ iff $x^* \in A$. In this subsection, we apply Fourier analysis to the left-hand side of Equation (6.3) and obtain an upper bound in terms of a combinatorial quantity relating the random matching to the Fourier coefficients of $f$. The reduction is summarized in the following lemma. In what follows, we write a vector $s \in \{0,1\}^{\alpha k n}$ as a concatenation of $\alpha n$ vectors, i.e., $s = (s(1), \ldots, s(\alpha n))$ where $s(i) \in \{0,1\}^k$, and we use $|s(i)|$ to denote the Hamming weight of $s(i)$.

Lemma 6.4.
Let $A \subseteq \{-1,1\}^n$ and let $f: \{-1,1\}^n \to \{0,1\}$ be its indicator function. Let $k \in \mathbb{N}$ and $\alpha \in (0, 1/k)$. Let $\mathcal{D}$ be a distribution over $\{-1,1\}^k$ such that $\mathbb{E}_{\mathbf{a}\sim\mathcal{D}}[a_j] = 0$ for all $j \in [k]$. For each $\ell \in [n]$, denote by $v_\ell \in \{0,1\}^n$ the vector whose first $\ell$ entries are 1 and whose remaining entries are 0. We have

$\mathbb{E}_M[\|p_{M,\mathcal{D}} - U\|_{\mathrm{tvd}}] \le \frac{2^n}{|A|}\sqrt{\sum_{\ell=2}^{\alpha k n} g(\ell)\cdot\sum_{v\in\{0,1\}^n:\ |v|=\ell}\hat{f}(v)^2},$

where $g(\ell) = \Pr_M\big[\exists s \in \{0,1\}^{\alpha k n}\setminus\{0^{\alpha k n}\},\ |s(i)| \ne 1\ \forall i,\ M^\top s = v_\ell\big]$.

Proof.
By the Cauchy–Schwarz inequality and Equation (3.6),

$\mathbb{E}_M[\|p_{M,\mathcal{D}} - U\|_{\mathrm{tvd}}]^2 \le 2^{\alpha k n}\,\mathbb{E}_M[\|p_{M,\mathcal{D}} - U\|_2^2] = 2^{2\alpha k n}\,\mathbb{E}_M\Big[\sum_{s\in\{0,1\}^{\alpha k n}\setminus\{0^{\alpha k n}\}}\widehat{p_{M,\mathcal{D}}}(s)^2\Big]. \qquad (6.5)$

The following claim shows that the expected sum of the Fourier coefficients (corresponding to nonempty subsets of $[\alpha k n]$) of the posterior distribution $p_{M,\mathcal{D}}$ can be upper bounded by an expected sum of certain Fourier coefficients of the indicator function $f$.

Claim 6.6. $\mathbb{E}_M[\|p_{M,\mathcal{D}} - U\|_{\mathrm{tvd}}] \le \frac{2^n}{|A|}\sqrt{\sum_{s\in\mathrm{GOOD}\setminus\{0^{\alpha k n}\}}\mathbb{E}_M[\hat{f}(M^\top s)^2]}$.

Proof. For every $s \in \{0,1\}^{\alpha k n}\setminus\{0^{\alpha k n}\}$, view $s$ as $\alpha n$ blocks $s(1), \ldots, s(\alpha n) \in \{0,1\}^k$ of length $k$. Observe that

$\widehat{p_{M,\mathcal{D}}}(s) = \frac{1}{2^{\alpha k n}}\sum_{z\in\{-1,1\}^{\alpha k n}} p_{M,\mathcal{D}}(z)\prod_{i\in[\alpha n],\, j\in[k]:\, s(i)_j=1} z(i)_j.$

By substituting $p_{M,\mathcal{D}}(z) = \mathbb{E}_{x^*\in A}\mathbb{E}_{b\sim\mathcal{D}^{\alpha n}}[\mathbf{1}[z = (Mx^*)\odot b]]$, and using the independence of $x^*$ and $b$, this becomes

$\widehat{p_{M,\mathcal{D}}}(s) = \frac{1}{2^{\alpha k n}}\cdot\mathbb{E}_{x^*\in A}\Big[\prod_{i\in[\alpha n],\, j\in[k]:\, s(i)_j=1}(Mx^*)_{i,j}\Big]\cdot\mathbb{E}_{b\sim\mathcal{D}^{\alpha n}}\Big[\prod_{i\in[\alpha n],\, j\in[k]:\, s(i)_j=1} b(i)_j\Big].$

Since $\mathbb{E}_{\mathbf{a}\sim\mathcal{D}}[a_j] = 0$ for all $j \in [k]$, the right-hand side vanishes whenever there exists $i \in [\alpha n]$ with $|s(i)| = 1$. Define $\mathrm{GOOD} := \{s \in \{0,1\}^{\alpha k n} \mid |s(i)| \ne 1\ \forall i\}$. We have

$|\widehat{p_{M,\mathcal{D}}}(s)| \le \frac{1}{2^{\alpha k n}}\cdot\Big|\mathbb{E}_{x^*\in A}\prod_{i,j:\, s(i)_j=1}(Mx^*)_{i,j}\Big|\cdot\mathbf{1}[s\in\mathrm{GOOD}].$

Since each row and each column of $M$ has at most one non-zero entry, we can rewrite the right-hand side as

$\frac{1}{2^{\alpha k n}}\cdot\Big|\mathbb{E}_{x^*\in A}\prod_{i\in[n]:\, (M^\top s)_i=1} x^*_i\Big|\cdot\mathbf{1}[s\in\mathrm{GOOD}].$

Now we relate this quantity to the Fourier coefficients of $f$. Recall that $f$ is the indicator function of the set $A$, and hence for each $v \in \{0,1\}^n$ we have

$\hat{f}(v) = \frac{1}{2^n}\sum_{x^*} f(x^*)\prod_{i\in[n]:\, v_i=1} x^*_i = \frac{1}{2^n}\sum_{x^*\in A}\prod_{i\in[n]:\, v_i=1} x^*_i.$

Thus, the Fourier coefficient of $p_{M,\mathcal{D}}$ corresponding to $s \in \{0,1\}^{\alpha k n}$ can be bounded as follows:

$\widehat{p_{M,\mathcal{D}}}(s)^2 \le \frac{1}{2^{2\alpha k n}}\cdot\frac{2^{2n}}{|A|^2}\,\hat{f}(M^\top s)^2\cdot\mathbf{1}[s\in\mathrm{GOOD}]. \qquad (6.7)$

Plugging Equation (6.7) into Equation (6.5) yields the desired bound, completing the proof of Claim 6.6.

It follows from Claim 6.6, together with the fact that for every matching $M$ the map $s \mapsto M^\top s$ is injective, that the quantity under the square root has the combinatorial form

$\sum_{v\in\{0,1\}^n\setminus\{0^n\}}\Pr_M\big[\exists s\in\mathrm{GOOD}\setminus\{0^{\alpha k n}\},\ M^\top s = v\big]\cdot\hat{f}(v)^2.$

By symmetry, the probability term is the same for any $v$ and $v'$ of the same Hamming weight. For each $\ell \in [n]$, denote $g(\ell) = \Pr_M[\exists s\in\mathrm{GOOD}\setminus\{0^{\alpha k n}\},\ M^\top s = v]$, where $|v| = \ell$. Therefore, the expression simplifies to

$\sum_{\ell\ge 1}^{n} g(\ell)\cdot\sum_{v\in\{0,1\}^n:\ |v|=\ell}\hat{f}(v)^2.$

Note that for $\ell = 1$ or $\ell > \alpha k n$, $g(\ell) = 0$ by definition (every block of $s \in \mathrm{GOOD}$ has weight 0 or at least 2, so $|M^\top s| \in \{2, \ldots, \alpha k n\}$ for $s \ne 0$). Thus, the above expression further simplifies, and we conclude that

$\mathbb{E}_M[\|p_{M,\mathcal{D}} - U\|_{\mathrm{tvd}}] \le \frac{2^n}{|A|}\sqrt{\sum_{\ell=2}^{\alpha k n} g(\ell)\cdot\sum_{v\in\{0,1\}^n:\ |v|=\ell}\hat{f}(v)^2}.$

This completes the proof of Lemma 6.4.
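The Fourier coefficients $\hat{f}(v)$ of an indicator function can be computed with a fast Walsh-Hadamard transform. A self-contained sketch; the indexing convention (bit $i$ of the index is 0 for $x_i = +1$) is our own choice:

```python
import numpy as np

def fourier_coeffs(f_vals):
    """f_hat(v) = 2^{-n} sum_x f(x) prod_{i: v_i=1} x_i via an in-place
    fast Walsh-Hadamard transform. f_vals: the 2^n values of f."""
    a = np.array(f_vals, dtype=float)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            x, y = a[i:i + h].copy(), a[i + h:i + 2 * h].copy()
            a[i:i + h], a[i + h:i + 2 * h] = x + y, x - y
        h *= 2
    return a / len(a)

# Level weights of the indicator of A = {x : x_1 = x_2} for n = 4:
n = 4
f_vals = [float(((i >> 0) & 1) == ((i >> 1) & 1)) for i in range(2 ** n)]
fhat = fourier_coeffs(f_vals)
level = [sum(fhat[v] ** 2 for v in range(2 ** n) if bin(v).count("1") == l)
         for l in range(n + 1)]
print(level)   # weight concentrated on levels 0 and 2, as expected
```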
In this subsection, we upper bound the combinatorial term $g(\ell)$ from Lemma 6.4. The result is summarized in the following lemma.

Lemma 6.8. For every $k$, there exists $\alpha_0 > 0$ such that for every $\alpha \in (0, \alpha_0)$, every $n$, and every $\ell \le n/2$, we have

$g(\ell) = \Pr_M\big[\exists s \ne 0,\ |s(i)| \ne 1\ \forall i,\ M^\top s = v_\ell\big] \le \Big(\frac{\ell}{n}\Big)^{\ell/2}.$

Proof. We set $\alpha_0 = (1/(2e^{3/2}k))^k$, so that $2\alpha_0^{1/k}e^{3/2}k \le 1$. We reformulate our events: instead of fixing $v = v_\ell$ and picking the matching $M$ at random, we note that it is equivalent to fix the matching $M$ and let $v$ be a uniformly random vector of weight $\ell$. We thus let $M$ be the matching $e_1, \ldots, e_{\alpha n}$, where $e_i = \{(i-1)k+1, \ldots, (i-1)k+k\}$. Letting $V$ denote the support of the vector $v$, the event we wish to consider is: "$V \subseteq [k\alpha n]$ and $|V \cap e_i| \ne 1$ for every $i \in [\alpha n]$" (i.e., every edge meets $V$ in zero or at least two vertices).

We bound the probability as follows. Let $T = \{i \in [\alpha n] \mid e_i \cap V \ne \emptyset\}$ denote the set of edges that touch $V$, and let $|T| = t$. Note that $\ell/k \le t \le \ell/2$, where the latter inequality follows from the fact that every nonempty intersection has size at least 2. We pick $V$ by first picking $T$ (there are at most $\binom{\alpha n}{t}$ ways of doing this), and then picking $V$ as a subset of the vertices incident to the edges of $T$ (there are at most $\binom{kt}{\ell}$ ways of doing this). (See Figure 2.) Summing over $t$ and dividing by the total number of choices of $V$ gives the final bound. We give the calculation below, which uses the inequalities $(a/b)^b \le \binom{a}{b} \le (ea/b)^b$:

$\Pr_V[V\subseteq[k\alpha n],\ |V\cap e_i| \ne 1\ \forall i] \le \frac{\sum_{t=\ell/k}^{\ell/2}\binom{\alpha n}{t}\binom{kt}{\ell}}{\binom{n}{\ell}} \le \sum_{t=\ell/k}^{\ell/2}\Big(\frac{e\alpha n}{t}\Big)^t\Big(\frac{ekt}{\ell}\Big)^\ell\Big(\frac{n}{\ell}\Big)^{-\ell} = \sum_{t=\ell/k}^{\ell/2} e^{t+\ell}\alpha^t k^\ell (t/n)^{\ell-t} \le \alpha^{\ell/k} e^{3\ell/2} k^\ell (\ell/n)^{\ell/2}\sum_{t'=0}^{\infty}(\ell/n)^{t'} \le 2\big(\alpha^{1/k}e^{3/2}k\big)^\ell(\ell/n)^{\ell/2} \le \big(2\alpha^{1/k}e^{3/2}k\big)^\ell(\ell/n)^{\ell/2} \le (\ell/n)^{\ell/2},$

where the geometric series is at most 2 since $\ell \le n/2$, and the last inequality uses the choice of $\alpha_0$.

Figure 2: An example with $n = 30$, $k = 3$, $\alpha = 0.5$, and $\ell = 6$. The red triangles denote the edges $e_1, \ldots, e_{\alpha n}$; the blue circles denote the set $V$, and the red circle denotes the set $T$, with $t = 3$. The figure illustrates the over-counting we do in the proof of the lemma: the depicted set $V$ actually intersects one of the edges exactly once, and so should not be counted; our counting nevertheless includes it, since it is contained in the set of vertices spanned by $T$.
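The bound of Lemma 6.8 can be sanity-checked by Monte Carlo for small parameters; the estimator below fixes the matching and samples $V$, exactly as in the proof's reformulation (parameter values are illustrative):

```python
import numpy as np
rng = np.random.default_rng(4)

def g_estimate(n, k, alpha, ell, trials=200000):
    """Monte Carlo estimate of g(ell): the probability that a uniform
    weight-ell set V lies inside the fixed matching's vertex set [k*alpha*n]
    and every edge meets V in 0 or >= 2 vertices (the GOOD condition)."""
    num_edges = int(alpha * n)
    hits = 0
    for _ in range(trials):
        V = rng.choice(n, size=ell, replace=False)
        if V.max() >= k * num_edges:        # V must lie inside the matching
            continue
        counts = np.bincount(V // k, minlength=num_edges)
        if np.all((counts == 0) | (counts >= 2)):
            hits += 1
    return hits / trials

n, k, alpha, ell = 60, 3, 0.2, 4
print(g_estimate(n, k, alpha, ell), (ell / n) ** (ell / 2))  # estimate vs bound
```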
Proof of Theorem 6.2. By Lemma 6.4 and Lemma 6.8, we have

$\mathbb{E}_M[\|p_{M,\mathcal{D}} - U\|_{\mathrm{tvd}}] \le \frac{2^n}{|A|}\sqrt{\sum_{\ell=2}^{\alpha k n}\Big(\frac{\ell}{n}\Big)^{\ell/2}\sum_{v\in\{0,1\}^n:\ |v|=\ell}\hat{f}(v)^2}.$

We use Lemma 3.7 to upper bound the sum of level-$\ell$ Fourier coefficients for small $\ell$, as follows. Let $c = \tau_0\sqrt{n}$, so that $|A| \ge 2^{n-c}$. For $\ell \in [4c]$, we have

$\frac{2^{2n}}{|A|^2}\sum_{v\in\{0,1\}^n:\ |v|=\ell}\hat{f}(v)^2 \le \Big(\frac{2\sqrt{2}\,c}{\ell}\Big)^{\ell}.$

Next, we apply Parseval's inequality (Lemma 3.5) to get $\sum_v\hat{f}(v)^2 \le 1$. Thus,

$\mathbb{E}_M[\|p_{M,\mathcal{D}} - U\|_{\mathrm{tvd}}]^2 \le \sum_{\ell=2}^{4c}\Big(\frac{\ell}{n}\Big)^{\ell/2}\Big(\frac{2\sqrt{2}\,c}{\ell}\Big)^{\ell} + \frac{2^{2n}}{|A|^2}\max_{4c<\ell\le\alpha k n}\Big\{\Big(\frac{\ell}{n}\Big)^{\ell/2}\Big\}.$

The second term on the right-hand side is maximized at $\ell = 4c+1$ (since $(\ell/n)^{\ell/2}$ is decreasing in $\ell$ for $\ell \le n/e$), and hence

$\mathbb{E}_M[\|p_{M,\mathcal{D}} - U\|_{\mathrm{tvd}}]^2 \le \sum_{\ell=2}^{4c}\Big(\frac{8c^2}{\ell\cdot n}\Big)^{\ell/2} + \Big(\frac{64c^2}{n}\Big)^{2c} \le \sum_{\ell=2}^{4c}(2\tau_0)^{\ell} + (8\tau_0)^{4c} \le \delta^2,$

where the final expression determines our choice of $\tau_0$: specifically, we set $\tau_0 = \delta/8$, so that each of the two terms is at most $\delta^2/2$. Taking square roots completes the proof of Theorem 6.2.
In this section we finally prove Theorem 5.3. In other words, we show that for every $\mathcal{D}_Y, \mathcal{D}_N \in \Delta(\{-1,1\}^k)$ with matching marginals, any protocol for $(\mathcal{D}_Y, \mathcal{D}_N)$-RMD with positive advantage requires $\Omega(\sqrt{n})$ bits of communication. We start with an overview.

The first step is to observe that we can prove indistinguishability of some distributions with matching non-zero marginals. For example, given that $\mathcal{D}_1 = \mathrm{Unif}(\{(-1,-1),(1,1)\})$ is indistinguishable from $\mathcal{D}_2 = \mathrm{Unif}(\{-1,1\}^2)$, it can also be shown that the mixture $\mathcal{D}'_1 = \frac{1}{2}\{(1,1)\} + \frac{1}{2}\mathcal{D}_1$ is indistinguishable from $\mathcal{D}'_2 = \frac{1}{2}\{(1,1)\} + \frac{1}{2}\mathcal{D}_2$ (see Lemma 7.7 for a related statement). Note that $\mathcal{D}'_1$ and $\mathcal{D}'_2$ are distributions with non-zero but matching marginals.

The bulk of this section is devoted to proving that for every pair of distributions $\mathcal{D}_Y$ and $\mathcal{D}_N$, we can find a path (a sequence) of intermediate distributions $\mathcal{D}_Y = \mathcal{D}_0, \mathcal{D}_1, \ldots, \mathcal{D}_L = \mathcal{D}_N$ such that adjacent pairs in this sequence are indistinguishable by a "basic" argument, where a basic argument is a combination of an indistinguishability result from Theorem 6.1 and a shifting argument formalized in Lemma 7.7. Our proof comes in the following steps:

1. For every marginal vector $\mu$, we identify a canonical distribution $\mathcal{D}_\mu$ that we use as the endpoint of the path. Thus it suffices to prove that every $\mathcal{D}$ is indistinguishable from $\mathcal{D}_{\mu(\mathcal{D})}$, i.e., that there is a path of finite length from $\mathcal{D}$ to $\mathcal{D}_{\mu(\mathcal{D})}$.

2. We identify a potential $\Phi(\mathcal{D})$ associated with distributions that helps measure progress along a path. Among distributions with marginal $\mu(\mathcal{D})$, this potential is uniquely maximized by $\mathcal{D}_{\mu(\mathcal{D})}$. We show that for every distribution $\mathcal{D}$ that is not canonical, one can take a basic step that increases $\Phi(\mathcal{D})$. Unfortunately, the potential $\Phi$ is real-valued and the increase per step can be arbitrarily small, so we are not yet done.

3. We give a combinatorial proof that there is a path of finite length (some function of $k$) that takes us from an arbitrary distribution to the canonical one.

Putting the three ingredients together, along with a proof that a "basic step" preserves indistinguishability, gives us the final theorem.

We start with the definitions of chains and of the canonical distribution. For a distribution $\mathcal{D} \in \Delta(\{-1,1\}^k)$, its support is the set $\mathrm{supp}(\mathcal{D}) = \{\mathbf{a} \in \{-1,1\}^k \mid \mathcal{D}(\mathbf{a}) > 0\}$. Next, we consider the following partial order on $\{-1,1\}^k$: for vectors $\mathbf{a}, \mathbf{b} \in \{-1,1\}^k$, we write $\mathbf{a} \le \mathbf{b}$ if $a_i \le b_i$ for every $i \in [k]$, and $\mathbf{a} < \mathbf{b}$ if $\mathbf{a} \le \mathbf{b}$ and $\mathbf{a} \ne \mathbf{b}$.

Definition 7.1 (Chain). We refer to a sequence $\mathbf{a}(0) < \mathbf{a}(1) < \cdots < \mathbf{a}(\ell)$, with $\mathbf{a}(i) \in \{-1,1\}^k$ for every $i \in \{0, \ldots, \ell\}$, as a chain of length $\ell$. Note that chains in $\{-1,1\}^k$ have length at most $k$.

Definition 7.2 (Canonical distribution). Given a vector of marginals $\mu = (\mu_1, \ldots, \mu_k) \in [-1,1]^k$, the canonical distribution associated with $\mu$, denoted $\mathcal{D}_\mu$, is defined as follows. Let $\rho: [k] \to [k]$ be a permutation such that $-1 \le \mu_{\rho(1)} \le \cdots \le \mu_{\rho(k)} \le 1$. For $i \in \{0, \ldots, k\}$, let $\mathbf{a}(i) \in \{-1,1\}^k$ be given by $a(i)_j = -1$ if $j \in \{\rho(1), \ldots, \rho(k-i)\}$ and $a(i)_j = 1$ otherwise. (Note that $\mathbf{a}(0) < \cdots < \mathbf{a}(k)$.) Then $\mathcal{D}_\mu(\mathbf{a}(i)) = (\mu_{\rho(k-i+1)} - \mu_{\rho(k-i)})/2$, where we define $\mu_{\rho(0)} = -1$ and $\mu_{\rho(k+1)} = 1$. Finally, $\mathcal{D}_\mu(\mathbf{a}) = 0$ for all $\mathbf{a} \notin \{\mathbf{a}(0), \ldots, \mathbf{a}(k)\}$. It is easy to verify that $\mathcal{D}_\mu$ is indeed a distribution and that it has the desired marginals, i.e., $\mu(\mathcal{D}_\mu) = \mu$.
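A direct implementation of Definition 7.2 (the dictionary representation is our own choice):

```python
import numpy as np

def canonical_distribution(mu):
    """Canonical distribution D_mu on {-1,1}^k: supported on a chain
    a(0) < ... < a(k), with D_mu(a(i)) = (mu_{rho(k-i+1)} - mu_{rho(k-i)})/2,
    where mu_{rho(0)} = -1 and mu_{rho(k+1)} = 1."""
    mu = np.asarray(mu, float)
    k = len(mu)
    rho = np.argsort(mu)              # coordinates ordered by marginal
    ext = np.concatenate([[-1.0], mu[rho], [1.0]])
    D = {}
    for i in range(k + 1):
        a = np.ones(k)
        a[rho[: k - i]] = -1          # -1 on the k-i smallest marginals
        w = (ext[k - i + 1] - ext[k - i]) / 2
        if w > 0:
            D[tuple(a)] = D.get(tuple(a), 0.0) + w
    return D

D = canonical_distribution([0.5, -0.5, 0.0])
marg = sum(np.array(a) * p for a, p in D.items())
assert np.allclose(marg, [0.5, -0.5, 0.0])    # mu(D_mu) = mu
```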
Note that a distribution is canonical if and only if its support is a chain. Furthermore, the canonical distribution is uniquely determined even though $\rho$, and hence the chain $\mathbf{a}(0), \ldots, \mathbf{a}(k)$, may not be uniquely determined. This is so since $\rho$ is non-unique only if $\mu_{\rho(i)} = \mu_{\rho(i+1)}$ for some $i$, in which case $\mathcal{D}_\mu(\mathbf{a}(i)) = 0$, so the non-uniqueness of $\mathbf{a}(i)$ does not affect $\mathcal{D}_\mu$.

Next, we define a potential associated with distributions. For a distribution $\mathcal{D} \in \Delta(\{-1,1\}^k)$, define its potential to be

$\Phi(\mathcal{D}) = \mathbb{E}_{\mathbf{b}\sim\mathcal{D}}\Big[\Big(\sum_{j\in[k]} b_j\Big)^2\Big].$

We will show shortly that $\mathcal{D}_\mu$ is the distribution with maximum potential among all distributions with marginal $\mu$. In the process of showing this, we introduce a "polarization operator" which maps a distribution $\mathcal{D}$ to a new one that increases the potential of typical distributions. Since this operator is also useful in further steps, we start by defining it and analyzing its effect on the potential. Briefly, suppose the support of a distribution contains both $(-1)^i 1^{k-i}$ and $1^i(-1)^{k-i}$. Then the polarization operator moves some of this mass (as much as possible while maintaining the property that the result is a distribution) to the more "polarized" points $(-1)^k$ and $1^k$. The operator is defined more generally, allowing the two starting points to agree on some coordinates. To define it, the following notation will be useful.

For $\mathbf{u}, \mathbf{v} \in \{-1,1\}^k$, let $\mathbf{u}\wedge\mathbf{v} = (\min\{u_1,v_1\}, \ldots, \min\{u_k,v_k\})$ and $\mathbf{u}\vee\mathbf{v} = (\max\{u_1,v_1\}, \ldots, \max\{u_k,v_k\})$. We say $\mathbf{u}$ and $\mathbf{v}$ are incomparable if $\mathbf{u} \not\le \mathbf{v}$ and $\mathbf{v} \not\le \mathbf{u}$. Note that if $\mathbf{u}$ and $\mathbf{v}$ are incomparable, then $\{\mathbf{u},\mathbf{v}\}$ and $\{\mathbf{u}\vee\mathbf{v}, \mathbf{u}\wedge\mathbf{v}\}$ are disjoint. (To see this, suppose $\mathbf{u} = \mathbf{u}\wedge\mathbf{v}$; then $u_j = \min\{u_j, v_j\}$ for all $j \in [k]$ and hence $\mathbf{u} \le \mathbf{v}$, a contradiction. The same analysis works for the other cases.)

Definition 7.3 (Polarization (update) operator). Given a distribution
$\mathcal{D} \in \Delta(\{-1,1\}^k)$ and incomparable elements $\mathbf{u}, \mathbf{v} \in \{-1,1\}^k$, we define the $(\mathbf{u},\mathbf{v})$-polarization of $\mathcal{D}$, denoted $\mathcal{D}_{\mathbf{u},\mathbf{v}}$, to be the distribution given below. Let $\varepsilon = \min\{\mathcal{D}(\mathbf{u}), \mathcal{D}(\mathbf{v})\}$. Then

$\mathcal{D}_{\mathbf{u},\mathbf{v}}(\mathbf{b}) = \mathcal{D}(\mathbf{b}) - \varepsilon$ if $\mathbf{b} \in \{\mathbf{u},\mathbf{v}\}$;  $\mathcal{D}_{\mathbf{u},\mathbf{v}}(\mathbf{b}) = \mathcal{D}(\mathbf{b}) + \varepsilon$ if $\mathbf{b} \in \{\mathbf{u}\vee\mathbf{v}, \mathbf{u}\wedge\mathbf{v}\}$;  $\mathcal{D}_{\mathbf{u},\mathbf{v}}(\mathbf{b}) = \mathcal{D}(\mathbf{b})$ otherwise.

We refer to $\varepsilon(\mathcal{D}, \mathbf{u}, \mathbf{v}) = \min\{\mathcal{D}(\mathbf{u}), \mathcal{D}(\mathbf{v})\}$ as the polarization amount.

It can be verified that the polarization operator preserves the marginals, i.e., $\mu(\mathcal{D}) = \mu(\mathcal{D}_{\mathbf{u},\mathbf{v}})$. Note also that this operator is trivial, i.e., $\mathcal{D}_{\mathbf{u},\mathbf{v}} = \mathcal{D}$, if $\{\mathbf{u},\mathbf{v}\} \not\subseteq \mathrm{supp}(\mathcal{D})$. By correlating the "+1"s and "$-1$"s, the polarization operator makes $\mathcal{D}$ more polarized, in the sense quantified in the following lemma.
Lemma 7.4 (Polarization increases potential). Let $\mathcal{D} \in \Delta(\{-1,1\}^k)$ be a distribution with marginal vector $\mu = \mu(\mathcal{D})$, and let $\mathbf{u}, \mathbf{v} \in \mathrm{supp}(\mathcal{D})$ be incomparable. Then we have

$\Phi(\mathcal{D}_{\mathbf{u},\mathbf{v}}) = \Phi(\mathcal{D}) + 8\varepsilon st,$

where $\varepsilon = \varepsilon(\mathcal{D}, \mathbf{u}, \mathbf{v})$ is the polarization amount, $s = |\{j \in [k] \mid u_j = -v_j = 1\}|$, and $t = |\{j \in [k] \mid u_j = -v_j = -1\}|$. In particular, $\Phi(\mathcal{D}_{\mathbf{u},\mathbf{v}}) > \Phi(\mathcal{D})$.

Proof. We examine the difference $\Phi(\mathcal{D}_{\mathbf{u},\mathbf{v}}) - \Phi(\mathcal{D})$. Writing $\Phi(\mathbf{b}) = (\sum_{j\in[k]} b_j)^2$ and letting $\ell = \sum_{j\in[k]:\ u_j = v_j} u_j$, we have:

$\Phi(\mathcal{D}_{\mathbf{u},\mathbf{v}}) - \Phi(\mathcal{D}) = \sum_{\mathbf{b}\in\{-1,1\}^k}\big(\mathcal{D}_{\mathbf{u},\mathbf{v}}(\mathbf{b}) - \mathcal{D}(\mathbf{b})\big)\cdot\Phi(\mathbf{b}) = \varepsilon\cdot\big(\Phi(\mathbf{u}\wedge\mathbf{v}) + \Phi(\mathbf{u}\vee\mathbf{v}) - \Phi(\mathbf{u}) - \Phi(\mathbf{v})\big) = \varepsilon\cdot\big((\ell+s+t)^2 + (\ell-s-t)^2 - (\ell+s-t)^2 - (\ell-s+t)^2\big) = 8\varepsilon st.$

Finally, note that $s, t > 0$ since $\mathbf{u}$ and $\mathbf{v}$ are incomparable, and $\varepsilon > 0$ since $\mathbf{u}, \mathbf{v} \in \mathrm{supp}(\mathcal{D})$, thus yielding $\Phi(\mathcal{D}_{\mathbf{u},\mathbf{v}}) > \Phi(\mathcal{D})$.
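A small sketch of the polarization operator that checks both the preservation of marginals and the exact potential increase $8\varepsilon st$ of Lemma 7.4 (the dictionary representation and the example pair are ours):

```python
import numpy as np

def polarize(D, u, v):
    """(u,v)-polarization of D: move eps = min(D(u), D(v)) of mass from
    {u, v} to {u_or_v, u_and_v} (coordinate-wise max / min).
    Assumes u and v are incomparable and lie in the support of D."""
    eps = min(D.get(u, 0.0), D.get(v, 0.0))
    up, down = tuple(np.maximum(u, v)), tuple(np.minimum(u, v))
    E = dict(D)
    E[u] -= eps; E[v] -= eps
    E[up] = E.get(up, 0.0) + eps; E[down] = E.get(down, 0.0) + eps
    return {b: p for b, p in E.items() if p > 0}

def potential(D):
    # Phi(D) = E_b (sum_j b_j)^2
    return sum(p * sum(b) ** 2 for b, p in D.items())

def marginals(D):
    return sum(np.array(b) * p for b, p in D.items())

u, v = (1, -1, 1), (-1, 1, 1)          # incomparable; s = t = 1, eps = 0.5
D = {u: 0.5, v: 0.5}
E = polarize(D, u, v)                  # supported on (1,1,1) and (-1,-1,1)
assert np.allclose(marginals(D), marginals(E))           # marginals preserved
assert np.isclose(potential(E) - potential(D), 8 * 0.5)  # 8 * eps * s * t
```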
Lemma 7.5 ($\mathcal{D}_\mu$ maximizes potential). For every distribution $\mathcal{D} \in \Delta(\{-1,1\}^k)$ with $\mu = \mu(\mathcal{D})$, we have $\Phi(\mathcal{D}) \le \Phi(\mathcal{D}_\mu)$. Furthermore, the inequality is strict if $\mathcal{D} \ne \mathcal{D}_\mu$.

Proof. Let $\mathcal{D}^*$ be a distribution with marginal $\mu$ that maximizes $\Phi(\cdot)$. If there exist incomparable $\mathbf{u}, \mathbf{v} \in \mathrm{supp}(\mathcal{D}^*)$, then by Lemma 7.4 we have $\Phi(\mathcal{D}^*) < \Phi(\mathcal{D}^*_{\mathbf{u},\mathbf{v}})$, contradicting the maximality of $\mathcal{D}^*$. It follows that there are no incomparable elements in $\mathrm{supp}(\mathcal{D}^*)$, or in other words, $\mathrm{supp}(\mathcal{D}^*)$ is a chain. We now show that this implies $\mathcal{D}^* = \mathcal{D}_\mu$.

More specifically, we show that any distribution $\mathcal{D}^*$ supported on a chain is uniquely determined by its marginal $\mu$. To see this, let $\rho: [k] \to [k]$ be a bijection such that $\mu_{\rho(j)} \le \mu_{\rho(j+1)}$ for all $j$. Let $\tau_1 < \tau_2 < \cdots < \tau_\ell$ be the attainable values of $\mu$, i.e., $\{\tau \mid \exists j \in [k] \text{ s.t. } \mu_j = \tau\} = \{\tau_1, \ldots, \tau_\ell\}$. For $0 \le i \le \ell$, let $\mathbf{a}(i)$ be given by $a(i)_j = -1$ if $\mu_j \le \tau_{\ell-i}$ and $a(i)_j = 1$ otherwise. Note that $\mathbf{a}(0) < \cdots < \mathbf{a}(\ell)$. It can be verified that $\mathrm{supp}(\mathcal{D}^*) = \{\mathbf{a}(0), \ldots, \mathbf{a}(\ell)\}$ and that $\mathcal{D}^*(\mathbf{a}(i))$ is uniquely determined for all $i$.

Claim 7.6. $\mathrm{supp}(\mathcal{D}^*) = \{\mathbf{a}(0), \ldots, \mathbf{a}(\ell)\}$, and $\mathcal{D}^*(\mathbf{a}(i)) = (\tau_{\ell-i+1} - \tau_{\ell-i})/2$, where $\tau_0 = -1$ and $\tau_{\ell+1} = 1$.

Proof. For the sake of contradiction, assume $\mathrm{supp}(\mathcal{D}^*) = \{\mathbf{a}'(0), \ldots, \mathbf{a}'(\ell')\} \ne \{\mathbf{a}(0), \ldots, \mathbf{a}(\ell)\}$, where $\mathbf{a}'(0) < \mathbf{a}'(1) < \cdots < \mathbf{a}'(\ell')$ is a chain. Let $0 \le i \le \min\{\ell, \ell'\}$ be the smallest $i$ such that $\mathbf{a}(i) \ne \mathbf{a}'(i)$. Consider the following three situations: (i) $\mathbf{a}(i) < \mathbf{a}'(i)$, (ii) $\mathbf{a}(i) > \mathbf{a}'(i)$, and (iii) $\mathbf{a}(i)$ and $\mathbf{a}'(i)$ are incomparable.

For (i) and (iii), by the construction of $\{\mathbf{a}(0), \ldots, \mathbf{a}(\ell)\}$ and the fact that $\{\mathbf{a}'(0), \ldots, \mathbf{a}'(\ell')\}$ is a chain, we have that for each $j, j' \in [k]$ with $\tau_{i-1} < \mu_j, \mu_{j'} \le \tau_i$, it holds that $a'(i')_j = a'(i')_{j'}$ for all $0 \le i' \le \ell'$. This implies that $\mu_j = \mu_{j'}$, which is a contradiction because two distinct attainable values would then lie in the interval $(\tau_{i-1}, \tau_i]$. A similar argument works for situation (ii).

We conclude that $\mathrm{supp}(\mathcal{D}^*) = \{\mathbf{a}(0), \ldots, \mathbf{a}(\ell)\}$. It is then immediate that $\mathcal{D}^*(\mathbf{a}(i))$ is uniquely determined for all $i$, by solving the linear system

$\mu = \big(\mathbf{a}(0)\ \ \mathbf{a}(1)\ \cdots\ \mathbf{a}(\ell)\big)\cdot\big(\mathcal{D}^*(\mathbf{a}(0)),\ \mathcal{D}^*(\mathbf{a}(1)),\ \ldots,\ \mathcal{D}^*(\mathbf{a}(\ell))\big)^{\top},$

where the $\mathbf{a}(i)$'s are the columns of the matrix. Note that by the construction of $\{\mathbf{a}(0), \ldots, \mathbf{a}(\ell)\}$, the matrix has full column rank, and hence there is a unique solution. It can be verified that the solution is given by $\mathcal{D}^*(\mathbf{a}(i)) = (\tau_{\ell-i+1} - \tau_{\ell-i})/2$, where $\tau_0 = -1$ and $\tau_{\ell+1} = 1$.
In summary, $\mathcal{D}^*$ is uniquely determined by $\mu(\mathcal{D})$ and its support is a chain. This implies that $\mathcal{D}^* = \mathcal{D}_\mu$, so $\mathcal{D}_\mu$ is the unique distribution maximizing the potential.

Our next observation is that for every distribution $\mathcal{D}$ with incomparable elements $\mathbf{u}, \mathbf{v}$ in its support, $\mathcal{D}$ is indistinguishable, in the RMD problem, from its $(\mathbf{u},\mathbf{v})$-polarization $\mathcal{D}_{\mathbf{u},\mathbf{v}}$.

Lemma 7.7 (Polarization update preserves indistinguishability). Let $\alpha_0(k)$ be as given in Theorem 6.1. Let $k \in \mathbb{N}$, $\alpha \in (0, \alpha_0)$, $\delta \in (0, 1/2)$. Then for every distribution $\mathcal{D} \in \Delta(\{-1,1\}^k)$ and incomparable $\mathbf{u}, \mathbf{v} \in \mathrm{supp}(\mathcal{D})$, there exist $\tau > 0$ and $n_0$ such that for every $n \ge n_0$, every protocol for $(\mathcal{D}, \mathcal{D}_{\mathbf{u},\mathbf{v}})$-RMD achieving advantage $\delta$ on instances of length $n$ requires $\tau\sqrt{n}$ bits of communication.

We prove Lemma 7.7 by a reduction. We show that there exists a pair of distributions $\mathcal{D}_Y$ and $\mathcal{D}_N$ with zero marginals such that, given a protocol $\Pi$ for $(\mathcal{D}, \mathcal{D}_{\mathbf{u},\mathbf{v}})$-RMD, we can get a protocol $\Pi'$ for $(\mathcal{D}_Y, \mathcal{D}_N)$-RMD. We then use Theorem 6.1 to get a lower bound on the communication of $\Pi'$, and thus of $\Pi$. Specifically, we divide the proof into three steps. In step one, we define $\mathcal{D}_Y$ and $\mathcal{D}_N$ and provide intuition for the reduction. Next, we formally describe the reduction by designing a protocol for $(\mathcal{D}_Y, \mathcal{D}_N)$-RMD from a protocol for $(\mathcal{D}, \mathcal{D}_{\mathbf{u},\mathbf{v}})$-RMD. Finally, we prove the correctness of the reduction and wrap up the proof of Lemma 7.7.

Step 1: The auxiliary distributions $\mathcal{D}_Y$ and $\mathcal{D}_N$. We start by defining $\mathcal{D}_Y$ and $\mathcal{D}_N$. Let $S = \{i \in [k] \mid u_i \ne v_i\}$ and let $k' = |S|$. Without loss of generality, we re-index the coordinates and assume $S = \{1, 2, \ldots, k'\}$. Let $\mathbf{a} = \mathbf{u}|_S$, so that $\mathbf{v}|_S = -\mathbf{a}$. We also let $\tilde{\mathbf{u}} = \mathbf{u}|_{\bar{S}}$ denote the common part of $\mathbf{u}$ and $\mathbf{v}$. Let $\mathcal{D}_Y$ be the uniform distribution over $\{\mathbf{a}, -\mathbf{a}\}$, and let $\mathcal{D}_N$ be the uniform distribution over $\{1^{k'}, (-1)^{k'}\}$. Note that $\mu(\mathcal{D}_Y) = \mu(\mathcal{D}_N) = 0^{k'}$. Let $\mathcal{D}_1 = \mathrm{Unif}(\{\mathbf{u},\mathbf{v}\})$ and $\mathcal{D}_2 = \mathrm{Unif}(\{\mathbf{u}\vee\mathbf{v}, \mathbf{u}\wedge\mathbf{v}\})$. Let $\varepsilon = \varepsilon(\mathcal{D}, \mathbf{u}, \mathbf{v})$ be the polarization amount, and let $\mathcal{D}_0 \in \Delta(\{-1,1\}^k)$ be such that $\mathcal{D} = (1-2\varepsilon)\mathcal{D}_0 + 2\varepsilon\mathcal{D}_1$. Note that $\mathcal{D}_{\mathbf{u},\mathbf{v}} = (1-2\varepsilon)\mathcal{D}_0 + 2\varepsilon\mathcal{D}_2$.

We give an informal idea now, before giving the (potentially notationally complex) details. The rough idea is that Alice and Bob first pad their inputs with many dummy variables (whose values are known to both) and expand the masks from $\mathcal{D}_Y$ (or $\mathcal{D}_N$) into masks from $\mathcal{D}_1$ (respectively $\mathcal{D}_2$). They then augment the sequence of masks from $\alpha' n'$ to $\alpha n = \Omega(\alpha' n'/\varepsilon)$, injecting many random masks from $\mathcal{D}_0$. This gives them an instance of $(\mathcal{D}, \mathcal{D}_{\mathbf{u},\mathbf{v}})$-RMD to solve, for which they use the protocol $\Pi$. It is not too hard to see that all of this can be done locally by Alice and Bob, as we prove formally below.
Step 2: A reduction from (D_Y, D_N)-RMD to (D, D_{u,v})-RMD. Consider a protocol Π = (Π_A, Π_B) for (D, D_{u,v})-RMD with parameter α ≤ 1/(200k) using C(n) bits of communication to achieve an advantage of δ on instances of length n. We let n′ = (2k′ε/k)n, where k′ was chosen in the previous step. We also let α′ = (2k/k′)α, so that α′ ≤ 1/(100k′). We use Π to design a protocol Π′ for (D_Y, D_N)-RMD with parameter α′ achieving advantage at least δ/2 on instances of length n′ with communication C′(n′) = C(n). We conclude by Theorem 6.1 that there exists a constant τ′ such that C(n) ≥ τ′√n′ = τ√n, where τ = τ′√(2εk′/k) > 0.

The protocol Π′ uses shared randomness between Alice and Bob (while we assume Π is deterministic). Let n″ = kn′/k′, so that n = n″/(2ε), and let α″ = α′n′/n″ = 2α. Recall that an instance of (D_Y, D_N)-RMD is determined by a four-tuple (x′, M′, z′, b′) with x′ ∈ {−1,1}^{n′}, M′ ∈ {0,1}^{k′α′n′×n′}, and z′, b′ ∈ {−1,1}^{k′α′n′}, where z′ = M′x′ ⊙ b′. See Figure 3 for a pictorial description.

Figure 3: Pictorial description of (x′, M′, b′, z′).

We give two maps using shared randomness R′ and R″:

(i) From (D_Y, D_N)-RMD to (D_0, D_1)-RMD: (x′, M′, b′, z′, R′) ↦ (x″, M″, b″, z″), where x″ ∈ {−1,1}^{n″}, M″ ∈ {0,1}^{kα″n″×n″}, and b″, z″ ∈ {−1,1}^{kα″n″}.

(ii) From (D_0, D_1)-RMD to (D, D_{u,v})-RMD: (x″, M″, b″, z″, R″) ↦ (x, M, b, z), where x ∈ {−1,1}^n, M ∈ {0,1}^{kαn×n}, and b, z ∈ {−1,1}^{kαn}.

Before describing the two maps, let us first state the desired conditions.

Success conditions for the reduction. (1) The reduction is locally well-defined.
Namely, there exist random strings R′ and R″ so that (i) Alice can get x through the maps (x′, R′) ↦ x″ and (x″, R″) ↦ x, while (ii) Bob can get (M, z) through the maps (M′, z′, R′) ↦ (M″, z″) and (M″, z″, R″) ↦ (M, z).

(2) The reduction is sound and complete.
Namely: (i) z″ = M″x″ ⊙ b″ and z = Mx ⊙ b. (ii) If b′ ∼ D_Y^{α′n′}, then b″ ∼ D_0^{α″n″} and b ∼ D^{αn}; similarly, if b′ ∼ D_N^{α′n′}, then b″ ∼ D_1^{α″n″} and b ∼ D_{u,v}^{αn}. (iii) x″ ∼ Unif({−1,1}^{n″}), x ∼ Unif({−1,1}^n), and M is a uniformly random matrix conditioned on having exactly one "1" per row and at most one "1" per column.

In Claim 7.8 and Claim 7.9 we show that the above conditions hold except for an error event that occurs with tiny (exp(−Ω(n))) probability. For now, let us show that these conditions imply the success of the reduction. Assuming conditions (1) and (2), the rest is simple. Alice computes x from x′, R′, and R″ and sends m = Π_A(x) to Bob, who computes (M, z) from M′, z′, R′, and R″ and outputs Π_B(m, M, z). Conditions (1)-(2), combined with the bound on the error event, imply that if Π has advantage δ, then Π′ has advantage at least δ − exp(−Ω(n)) ≥ δ/2.

Step 3: Specify and analyze the first map.
We now turn to specifying the maps mentioned above and proving that they satisfy conditions (1)-(2). We start with (x′, M′, b′, z′, R′) ↦ (x″, M″, b″, z″). For this part, we let R′ ∼ Unif({−1,1}^{n″−n′}) and set x″ = (x′, R′). To get M″, z″, and b″ we need some more notation. First, note that α′n′ = α″n″ by the choice of parameters. Next, note that M′ can be viewed as the stacking of matrices M′_1, ..., M′_{α′n′} ∈ {0,1}^{k′×n′}. We first extend M′_i by adding all-zero columns at the end to get N″_i ∈ {0,1}^{k′×n″}. We then stack N″_i on top of P″_i ∈ {0,1}^{(k−k′)×n″} to get M″_i, where (P″_i)_{jℓ} = 1 if and only if ℓ = n′ + (i−1)(k−k′) + j. See Figure 5 for a pictorial description of N″_i and P″_i. We let M″ be the stacking of M″_1, ..., M″_{α″n″}. Next we turn to b″. Let b′ = (b′(1), ..., b′(α″n″)). Let ũ = (u_{k′+1}, ..., u_k) denote the common part of u and v. We let b″(i) = (b′(i), ũ) and b″ = (b″(1), ..., b″(α″n″)). Finally, we let z″ = M″x″ ⊙ b″, as required. See Figure 4 for a pictorial description.

Figure 4: Pictorial description of (x″, M″, b″, z″).

Now, we verify that the first map satisfies the success conditions stated above.

Claim 7.8.
The first map in the reduction is locally well-defined, sound, and complete.

Proof. To see that the first map is locally well-defined, note that Alice can compute x″ = (x′, R′) locally. Similarly, Bob can compute M″ locally by construction. As for z″, note that z″ interleaves (in a predetermined order) the bits of z′ and those of (P″_i x″ ⊙ ũ)_{i ∈ [α′n′]}. Furthermore, P″_i x″ depends only on R′ (since the first n′ columns of all the P″_i are zero). Thus Bob can locally compute P″_i x″ for every i, and since ũ is also known, Bob can compute z″ locally.

To see that the first map is sound and complete: (i) z″ = M″x″ ⊙ b″ follows from the construction. As for (ii), for each i ∈ [α′n′] = [α″n″], if b′(i) ∼ D_Y = Unif({a, −a}), then b″(i) ∼ Unif({(a, ũ), (−a, ũ)}). Note that a is chosen to be the uncommon part of u and v, and hence (a, ũ) = u and (−a, ũ) = v. Thus b″(i) ∼ Unif({u, v}) = D_0, as desired. Similarly, one can show that if b′(i) ∼ D_N, then b″(i) ∼ D_1. Finally, we have x″ ∼ Unif({−1,1}^{n″}) by construction, and hence (iii) holds.

This completes the proof of conditions (1)-(2) for the first step of the reduction.

Figure 5: Pictorial description of N″_i, P″_i, N_i, P_i.
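As an illustration of the first map (a sketch under our own representation: each block of rows of M is stored as the tuple of selected variable indices, and the hidden masks ride along only to make property (ii) visible), the padding step can be written as:

    def first_map(x_prime, cons_prime, u_tilde, R_prime):
        # (x', M', b', z') -> (x'', M'', b'', z''): pad x' with shared randomness
        # R' and extend each k'-ary constraint to a k-ary one, routing the k - k'
        # new rows (the matrix P''_i) to fresh dummy variables.
        n_prime, r = len(x_prime), len(u_tilde)    # r = k - k'
        x_pp = tuple(x_prime) + tuple(R_prime)     # x'' = (x', R');
                                                   # needs len(R_prime) >= r * len(cons_prime)
        out = []
        for i, (idx, b, z) in enumerate(cons_prime):   # (indices, mask b'(i), labels z'(i))
            dummies = tuple(n_prime + i * r + j for j in range(r))
            b_pp = tuple(b) + tuple(u_tilde)           # b''(i) = (b'(i), u~)
            z_pp = tuple(z) + tuple(x_pp[d] * s for d, s in zip(dummies, u_tilde))
            out.append((idx + dummies, b_pp, z_pp))    # so z'' = M'' x'' ⊙ b''
        return x_pp, out

As in the proof of Claim 7.8, the dummy part of z″ depends only on R′ and ũ, which is exactly why Bob can produce (M″, z″) without seeing x′.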
We now turn to the second map. Here R″ is composed of several smaller parts, which we introduce now. Let y ∼ Unif({−1,1}^{n−n″}) and w ∼ Bern(2ε)^{αn}. Let Γ ∈ {0,1}^{n×n} be a uniformly random permutation matrix. Let c = (c(1), ..., c((n−n″)/k)), where the c(i) ∼ D̃ are chosen independently. We let R″ = (y, w, Γ, c). Let w(i) = |{j ∈ [i] | w_j = 1}| denote the number of 1's among the first i coordinates of w. If w(αn) ≥ α″n″ or if αn − w(αn) ≥ (n − n″)/k, we declare an error. Note that E[w(αn)] = 2εαn = α″n″/2, so (by a Chernoff bound, and since α ≤ 1/(200k)) both error events occur with probability at most exp(−Ω(n)).

We now define the elements of (x, M, b, z). We set x = Γ(x″, y), so x is a random permutation of the concatenation of x″ and y. Next, let M″ = (M″_1, ..., M″_{α″n″}), where M″_i ∈ {0,1}^{k×n″}. We extend M″_i to N_i ∈ {0,1}^{k×n} by adding all-zero columns to the right. For i ∈ {1, ..., (n−n″)/k}, let P_i ∈ {0,1}^{k×n} be given by (P_i)_{jℓ} = 1 if and only if ℓ = n″ + (i−1)k + j. See Figure 5 for a pictorial description of N_i and P_i. Next we define a matrix M̃ ∈ {0,1}^{kαn×n}, M̃ = (M̃_1, ..., M̃_{αn}), where M̃_i ∈ {0,1}^{k×n} is defined as follows: if w_i = 1, then we let M̃_i = N_{w(i)}; else we let M̃_i = P_{i−w(i)}. Finally, we let M = M̃ · Γ^{−1}. Next we turn to b. Again let b″ = (b″(1), ..., b″(α″n″)). We let b = (b(1), ..., b(αn)), where b(i) is defined as follows: if w_i = 1, then b(i) = b″(w(i)); else b(i) = c(i − w(i)). Finally, z = Mx ⊙ b. See Figure 6 for a pictorial description. This concludes the description of the map, and we turn to analyzing its properties.

Figure 6: Pictorial description of x, w, M, b, z.

Now, we verify that the second map satisfies the success conditions stated above.

Claim 7.9. If w(αn) ≤ α″n″ and αn − w(αn) ≤ (n − n″)/k, then the second map in the reduction is locally well-defined, sound, and complete. In particular, the error event happens with probability at most exp(−Ω(n)) over the randomness of R″.

Proof. To see that the second map is locally well-defined, first note that Alice can compute x = Γ(x″, y) from x″ and the shared randomness R″ locally. As for Bob, note that the maximum index needed into the N_i and b″ (resp. the P_i and c) is at most w(αn) (resp. αn − w(αn)). Namely, if w(αn) ≤ α″n″ and αn − w(αn) ≤ (n − n″)/k, then M and b are well-defined. Also, using a similar argument as in the proof of Claim 7.8, one can verify that M and z can be computed locally from M″, z″, and the shared randomness R″.

To see that the second map is sound and complete: (i) z = Mx ⊙ b follows directly from the construction.
As for (ii), if b′ ∼ D_Y^{α′n′}, from Claim 7.8 we know that b″ ∼ D_0^{α′n′} = Unif({u, v})^{α′n′}. Now, for each i ∈ [αn], b(i) = b″(w(i)) with probability 2ε, and b(i) = c(i − w(i)) with probability 1 − 2ε. As b″(i′) ∼ D_0 for every i′ ∈ [α″n″] and c(i″) ∼ D̃ for every i″ ∈ [(n−n″)/k], we have b(i) ∼ (1 − 2ε)D̃ + 2εD_0 = D, as desired. Similarly, one can show that for every i ∈ [αn], if b′ ∼ D_N^{α′n′}, then b(i) ∼ D_{u,v}. Finally, we have x ∼ Unif({−1,1}^n), and M is a uniformly random matrix with exactly one "1" per row and at most one "1" per column (due to the application of the random permutation Γ) by construction.

This completes the proof of conditions (1)-(2) for the second step of the reduction.
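Continuing the same toy representation (our sketch; sample_D_tilde, a hypothetical sampler returning a k-bit mask distributed as D̃, is an assumption), the second map interleaves the real constraints among dummy ones at Bernoulli(2ε) positions and then relabels the variables by a random permutation:

    import random

    def second_map(x_pp, cons_pp, n, alpha, eps, sample_D_tilde, rng):
        # (x'', M'', b'', z'') -> (x, M, b, z). The error events (running out of
        # real constraints or of fresh variables) are ignored in this sketch.
        n_pp, k = len(x_pp), len(cons_pp[0][0])
        y = tuple(rng.choice((-1, 1)) for _ in range(n - n_pp))
        x_full = tuple(x_pp) + y
        cons, used, fresh = [], 0, n_pp
        for _ in range(int(alpha * n)):
            if rng.random() < 2 * eps and used < len(cons_pp):   # w_i = 1: real
                cons.append(cons_pp[used]); used += 1
            else:                                                # w_i = 0: dummy
                idx = tuple(range(fresh, fresh + k)); fresh += k
                c = sample_D_tilde(rng)                          # mask c(i) ~ D-tilde
                z = tuple(x_full[j] * s for j, s in zip(idx, c))
                cons.append((idx, c, z))
        sigma = list(range(n)); rng.shuffle(sigma)               # the permutation Γ
        x = [0] * n
        for i, xi in enumerate(x_full):
            x[sigma[i]] = xi
        return x, [(tuple(sigma[j] for j in idx), b, z) for idx, b, z in cons]

Each position independently receives a real (padded) constraint with probability 2ε and a fresh dummy constraint with a mask from D̃ otherwise, which is exactly how the mixture (1 − 2ε)D̃ + 2εD_0 = D arises in property (ii). A caller would pass rng = random.Random(seed).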
Step 5: Wrap up the proof of Lemma 7.7.

Proof of Lemma 7.7.
Let us start by setting up the parameters. Given k, α ∈ (0, α(k)), n, D, an incomparable pair u, v ∈ supp(D), and the polarization amount ε = ε(D, u, v), let k′ = |{i ∈ [k] | u_i ≠ v_i}|, n′ = (2k′ε/k)n, α′ = (2k/k′)α, n″ = kn′/k′, α″ = α′n′/n″, and δ′ = δ/2. Consider a protocol Π = (Π_A, Π_B) for (D, D_{u,v})-RMD with advantage δ and at most τ√n bits of communication.

First, observe that n − n″ = (1 − 2ε)n and E[w(αn)] = 2εαn = α″n″/2. As w ∼ Bern(2ε)^{αn}, we have w(αn) ≤ α″n″ and αn − w(αn) ≤ (n − n″)/k with probability at least 1 − exp(−Ω(n)). Thus, combined with Claim 7.8 and Claim 7.9, if (x′, M′, z′) is a Yes (resp. No) instance of (D_Y, D_N)-RMD, then the output of the reduction, i.e., (x, M, z), is a Yes (resp. No) instance of (D, D_{u,v})-RMD with probability at least 1 − exp(−Ω(n)). Moreover, Claim 7.8 and Claim 7.9 also show that the reduction can be implemented locally, and hence Alice and Bob can run the protocol Π on (x, M, z). In particular, Alice and Bob compute x and (M, z) using their inputs and shared randomness, respectively. Then Alice sends m = Π_A(x) to Bob, and Bob outputs Π_B(m, M, z). By the correctness of the reduction as well as that of the protocol, Alice and Bob have advantage at least δ − exp(−Ω(n)) ≥ δ/2 = δ′ in solving (D_Y, D_N)-RMD with at most τ√n = τ√((k/(2k′ε))n′) bits of communication.

Finally, by Theorem 6.1, we know that there exists a constant τ′ > 0 such that any protocol for (D_Y, D_N)-RMD with advantage δ′ requires at least τ′√n′ bits of communication. This implies that τ ≥ τ′√(2k′ε/k). We conclude that any protocol for (D, D_{u,v})-RMD with advantage δ requires at least τ√n bits of communication, where we may take τ = τ′√(2k′ε/k).
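For reference, the parameter bookkeeping of the proof consolidates into the following identities (no new claims; just the choices above combined):

    n″ = kn′/k′ = 2εn,    α″n″ = α′n′ = (2k/k′)α · (2k′ε/k)n = 4εαn,    E[w(αn)] = 2εαn = α″n″/2,

and, for the communication transfer,

    C(n) ≥ τ′√n′ = τ′√((2k′ε/k)n) = τ√n    with    τ = τ′√(2k′ε/k).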
7.3 Finite upper bound on the number of polarization steps

In this section we prove that there is a finite upper bound on the number of polarization steps needed to move from a distribution D ∈ ∆({−1,1}^k) to the canonical distribution with marginal µ(D), i.e., D_{µ(D)}. Together with the indistinguishability result from Lemma 7.7, this allows us to complete the proof of Theorem 5.3 by going from D_Y to D_{µ(D_Y)} = D_{µ(D_N)} and then to D_N, using the triangle inequality for indistinguishability.

In this section we extend our considerations to functions A: {−1,1}^k → R_{≥0}. Let F({−1,1}^k) = {A : {−1,1}^k → R_{≥0}}. For A ∈ F({−1,1}^k), let µ_0(A) = Σ_{a∈{−1,1}^k} A(a). Note that ∆({−1,1}^k) ⊆ F({−1,1}^k), and A ∈ ∆({−1,1}^k) if and only if A ∈ F({−1,1}^k) and µ_0(A) = Σ_{a∈{−1,1}^k} A(a) = 1. We extend the definitions of marginals, support, canonical distribution, potential, and polarization operators to F({−1,1}^k). In particular, we let µ(A) = (µ_0, µ_1, ..., µ_k), where µ_0 = µ_0(A) and µ_j = Σ_{a∈{−1,1}^k} a_j A(a) for j ∈ [k]. We also define the canonical function and polarization operators so as to preserve µ(A). Specifically, given an arbitrary A, let D = µ_0(A)^{−1} · A; note that D ∈ ∆({−1,1}^k). For µ = (µ_0, µ_1, ..., µ_k) ∈ R^{k+1}, we define A_µ = µ_0 · D_{µ′}, where µ′ = (µ_1/µ_0, ..., µ_k/µ_0), to be the canonical function associated with µ. We remark that by Lemma 7.4 and Lemma 7.5, A_{µ(A)} is the unique function such that (i) it has the same marginals as A and (ii) it is supported on a chain.

Definition 7.10 (Polarization length). For a function A ∈ F({−1,1}^k), let N(A) be the smallest t such that there exists a sequence A_0, A_1, ..., A_t with A_0 = A, A_t = A_{µ(A)} canonical, and, for every i ∈ [t], incomparable u_i, v_i ∈ supp(A_{i−1}) such that A_i = (A_{i−1})_{u_i, v_i}. If no such finite sequence exists, then let N(A) be infinite. Let N(k) = sup_{A∈F({−1,1}^k)} {N(A)}. Again, if N(A) = ∞ for some A, or if no finite upper bound exists, N(k) is defined to be ∞.
Note that if D ∈ ∆({−1,1}^k), then so is every element in the sequence, so the polarization-length bound below applies also to distributions. Our main lemma in this subsection is the following:

Lemma 7.11 (A finite upper bound on N(k)). N(k) is finite for every finite k. Specifically, N(k) ≤ (k² + 3)(1 + N(k−1)).
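Although only finiteness is needed below, the recursion is easy to unroll quantitatively: using 1 + N(k) ≤ (k² + 4)(1 + N(k−1)) together with the base case N(2) ≤ 1 (for k = 2 a single polarization update suffices, as in Algorithm 2 below), we get

    1 + N(k) ≤ (k² + 4)(1 + N(k−1)) ≤ ··· ≤ 2 · ∏_{j=3}^{k} (j² + 4),

so N(k) = 2^{O(k log k)}.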
We prove Lemma 7.11 constructively in the following four steps.

Step 1: Description of the algorithm Polarize. Let us start with some notation. For A ∈ F({−1,1}^k), we let A|_{x_ℓ = b} denote the function A restricted to the subcube {−1,1}^{ℓ−1} × {b} × {−1,1}^{k−ℓ}. Note that A restricted to a subcube is effectively a (k−1)-variable function in F({−1,1}^{k−1}).

Algorithm 2 Polarize(·)

Input: A ∈ F({−1,1}^k).
  if k = 2 then
    Output: A_{(−1,1),(1,−1)}.
  (Initialization.) (A_0)|_{x_k=−1} ← Polarize(A|_{x_k=−1}); (A_0)|_{x_k=1} ← Polarize(A|_{x_k=1}); t ← 0.
  Let (−1)^k = a_t(0) < ··· < a_t(k−1) = (1^{k−1}, −1) be a chain supporting (A_t)|_{x_k=−1}.
  Let ((−1)^{k−1}, 1) = b_t(0) < ··· < b_t(k−1) = 1^k be a chain supporting (A_t)|_{x_k=1}.
  while ∃ (i, j) with j < k−1 such that a_t(i) ∨ b_t(j) = 1^k and A_t(a_t(i)), A_t(b_t(j)) > 0 do
    Let (i_t, j_t) be the lexicographically smallest such pair (i, j).
    B_t ← (A_t)_{a_t(i_t), b_t(j_t)}.
    (A_{t+1})|_{x_k=−1} ← Polarize(B_t|_{x_k=−1}); (A_{t+1})|_{x_k=1} ← (B_t)|_{x_k=1}.
    t ← t + 1.
    Let (−1)^k = a_t(0) < ··· < a_t(k−1) = (1^{k−1}, −1) be a chain supporting (A_t)|_{x_k=−1}.
    Let ((−1)^{k−1}, 1) = b_t(0) < ··· < b_t(k−1) = 1^k be a chain supporting (A_t)|_{x_k=1}.
  (Clean-up stage.) Let ℓ ∈ [k] be such that for every a ∈ {−1,1}^k \ {1^k} we have A_t(a) > 0 ⇒ a_ℓ = −1.
  (A_{t+1})|_{x_ℓ=−1} ← Polarize((A_t)|_{x_ℓ=−1}); (A_{t+1})|_{x_ℓ=1} ← (A_t)|_{x_ℓ=1}.
  Output: A_{t+1}.

The goal of the rest of the proof is to show that Algorithm 2 terminates after a finite number of steps and outputs A_{µ(A)}.
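To make Algorithm 2 concrete, here is a compact Python rendering (our sketch, not the authors' code). It simplifies the bookkeeping in two ways that do not affect the invariants proved below: supports are recomputed by sorting (for chain-supported functions, lexicographic order agrees with the chain order), and the polarization amount is min{A(u), A(v)}, so each update removes at least one support point. Termination is exactly what Claim 7.16 establishes.

    def meet_join(u, v):
        return (tuple(min(a, b) for a, b in zip(u, v)),
                tuple(max(a, b) for a, b in zip(u, v)))

    def polarize_pair(A, u, v):
        # One update (A)_{u,v} with polarization amount min{A(u), A(v)}.
        eps = min(A[u], A[v])
        meet, join = meet_join(u, v)
        B = dict(A)
        for p, d in ((u, -eps), (v, -eps), (meet, eps), (join, eps)):
            B[p] = B.get(p, 0.0) + d
        return {p: m for p, m in B.items() if m > 1e-12}

    def restrict(A, ell, b):      # A|_{x_ell = b}, as a (k-1)-variable function
        return {p[:ell] + p[ell + 1:]: m for p, m in A.items() if p[ell] == b}

    def embed(A_sub, ell, b):     # re-insert coordinate ell with value b
        return {p[:ell] + (b,) + p[ell:]: m for p, m in A_sub.items()}

    def polarize(A, k):
        # Recursive sketch of Algorithm 2: returns a chain-supported function
        # with the same marginals as A (up to floating-point tolerance).
        if k <= 1 or not A:
            return dict(A)
        if k == 2:
            u, v = (-1, 1), (1, -1)
            return polarize_pair(A, u, v) if u in A and v in A else dict(A)

        def canon(A, ell, b):     # canonicalize the subcube x_ell = b
            sub = polarize(restrict(A, ell, b), k - 1)
            rest = {p: m for p, m in A.items() if p[ell] != b}
            rest.update(embed(sub, ell, b))
            return rest

        A = canon(canon(A, k - 1, -1), k - 1, 1)
        ones = (1,) * k
        while True:
            lo = sorted(p for p in A if p[-1] == -1)              # chain order
            hi = sorted(p for p in A if p[-1] == 1 and p != ones)
            pair = next(((u, v) for u in lo for v in hi
                         if meet_join(u, v)[1] == ones), None)    # lex smallest
            if pair is None:
                break
            A = canon(polarize_pair(A, *pair), k - 1, -1)
        for ell in range(k):      # clean-up coordinate, guaranteed to exist
            if all(p[ell] == -1 for p in A if p != ones):
                return canon(A, ell, -1)
        return A

For example, polarize({(1,-1,1): 0.5, (-1,1,1): 0.5}, 3) returns mass 0.5 on each of (−1,−1,1) and (1,1,1), which is the canonical A_µ for µ(A) = (1, 0, 0, 1).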
Step 2: Correctness assuming Polarize terminates.

Claim 7.12 (Correctness condition of Polarize). For every A ∈ F({−1,1}^k), if Polarize terminates, then Polarize(A) = A_{µ(A)}. In particular, Polarize(A) has the same marginals as A and is supported on a chain.

Proof. First, by the definition of the polarization operator (Definition 7.3), the marginals of A_t are the same for every t. So in the rest of the proof, we focus on inductively showing that if Polarize terminates, then Polarize(A) is supported on a chain.

For the base case where k = 2, we always have Polarize(A) = A_{(−1,1),(1,−1)}, which is supported on a chain, as desired.

When k > 2, note that when the algorithm enters the Clean-up stage, if we let m and n denote the largest indices such that A_t(a_t(m)), A_t(b_t(n)) > 0 and b_t(n) ≠ 1^k, then the condition that a_t(m) ∨ b_t(n) ≠ 1^k (which holds since the while loop has terminated) implies that there is a coordinate ℓ such that a_t(m)_ℓ = b_t(n)_ℓ = −1. Since every c with A_t(c) > 0 and c_k = −1 satisfies c ≤ a_t(m), every such c has c_ℓ = −1; similarly, every c ≠ 1^k with c_k = 1 and A_t(c) > 0 satisfies c ≤ b_t(n) and hence c_ℓ = −1. We conclude that A_t is supported on {1^k} ∪ {c | c_ℓ = −1}. Thus, by the induction hypothesis, after polarizing the subcube x_ℓ = −1 while leaving the subcube x_ℓ = 1 unchanged, the resulting function A_{t+1} is supported on a chain, as desired, completing the induction. We conclude that if Polarize terminates, we have Polarize(A) = A_{µ(A)}.
Step 3: Invariant in Polarize. Now, in the rest of the proof of Lemma 7.11, the goal is to show that for every input A, the number of iterations of the while loop in Algorithm 2 is finite. The key claim (Claim 7.16) here asserts that the sequence of pairs (i_t, j_t) is monotonically increasing in lexicographic order. Once we establish this claim, it follows that there are at most k² iterations of the while loop, and so N(k) ≤ (k² + 3)·(1 + N(k−1)).

Claim 7.13.
For every t ≥ 0 and every b ∈ {−1,1}, (A_t)|_{x_k=b} is supported on a chain.

Proof. For b = −1, the claim follows from the correctness of the recursive call to Polarize. For b = 1, we claim by induction on t that the supporting chain b_t(0) < ··· < b_t(k−1) never changes (with t). To see this, note that b_t(k−1) = 1^k is the only point in the subcube {x_k = 1} that increases in value compared to A_t, and this point is already in the supporting chain. Thus b_t(0) < ··· < b_t(k−1) also supports (A_{t+1})|_{x_k=1}.

For c ∈ {−1,1}^k, we say that a function A: {−1,1}^k → R_{≥0} is c-subcube-respecting (c-respecting, for short) if for every c′ such that A(c′) > 0, we have c′ ≥ c or c′ ≤ c. We say that A is c-downward-respecting if A is c-respecting and the points in the support of A above c form a partial chain; specifically, if u, v > c have A(u), A(v) > 0, then u ≥ v or v ≥ u.

Note that if A is supported on a chain, then A is c-respecting for every point c in the chain. Conversely, if A is supported on a chain and A is c-respecting, then A is supported on a chain that includes c.

Claim 7.14 (Polarization on subcubes). Let A be a c-respecting function and let Ã be obtained from A by a finite sequence of polarization updates, as in Definition 7.3. Then Ã is also c-respecting. Furthermore, if A is c-downward-respecting and w > c, then Ã is also c-downward-respecting and A(w) = Ã(w).

Proof. Note that it suffices to prove the claim for a single update by a polarization operator, since the rest follows by induction. So let Ã = A_{u,v} for incomparable u, v ∈ supp(A).

Since A is c-respecting and u, v are incomparable, either u ≤ c and v ≤ c, or u ≥ c and v ≥ c. Suppose the former is true; then u ∨ v ≤ c and u ∧ v ≤ c, and hence Ã is c-respecting. Similarly, in the case u ≥ c, v ≥ c, we can show that Ã is c-respecting. The furthermore part follows by noticing that, for u and v to be incomparable when A is c-downward-respecting and A(u), A(v) > 0, we must have u, v ≤ c, and so the update changes A only at points below c.

The following claim asserts that in every iteration of the while loop, by the lexicographically minimal choice of (i_t, j_t), there exists a coordinate h ∈ [k−1] such that every vector c < a_t(i_t) in the support of A_t, B_t, or A_{t+1} has c_h = −1, and every vector c ≠ 1^k in the support of (A_t)|_{x_k=1} has c_h = −1.

Claim 7.15.
For every t ≥ 0, there exists h ∈ [k−1] such that for every c ∈ {−1,1}^k, if c ∈ supp(A_t) ∪ supp(B_t) ∪ supp(A_{t+1}), then the following hold:

• If c < a_t(i_t), then c_h = −1.

• If c_k = 1 and c ≠ 1^k, then c_h = −1.

Proof. Since (i_t, j_t) is lexicographically the smallest incomparable pair in the support of A_t, for i < i_t, j < k−1, and A_t(a_t(i)), A_t(b_t(j)) > 0, we have a_t(i) ∨ b_t(j) ≠ 1^k. Let m be the largest index smaller than i_t such that A_t(a_t(m)) > 0. Similarly, let n < k−1 be the largest index such that A_t(b_t(n)) > 0. Then the fact that a_t(m) ∨ b_t(n) ≠ 1^k implies that there exists h ∈ [k−1] such that a_t(m)_h = b_t(n)_h = −1. Now, using the fact (from Claim 7.13) that (A_t)|_{x_k=−1} is supported on a chain, we conclude that for every c < a_t(i_t) with A_t(c) > 0, we have c ≤ a_t(m) and hence c_h = −1. Similarly, for every vector c ≠ 1^k in the support of (A_t)|_{x_k=1}, by the maximality of n, we have c_h = −1.

Next, consider the support of B_t. First, recall that supp(B_t) ⊆ supp(A_t) ∪ {1^k, a_t(i_t) ∧ b_t(j_t)}, since B_t = (A_t)_{a_t(i_t), b_t(j_t)}. Next, note that the only point (other than 1^k) where B_t is larger than A_t is a_t(i_t) ∧ b_t(j_t), so it suffices to show that (a_t(i_t) ∧ b_t(j_t))_h = −1. We have a_t(i_t) ∧ b_t(j_t) ≤ b_t(j_t) ≤ b_t(n), and hence (a_t(i_t) ∧ b_t(j_t))_h = −1.

Finally, consider the support of A_{t+1}. Since A_{t+1}|_{x_k=1} = B_t|_{x_k=1}, the second item in the claim follows trivially. To prove the first item, let us consider a′ ∈ {−1,1}^k defined as follows: a′_h = −1 and a′_r = a_t(i_t)_r for r ≠ h. Note that B_t|_{x_k=−1} is a_t(i_t)-respecting, since potentially the only new point in its support (compared to A_t|_{x_k=−1}) is a_t(i_t) ∧ b_t(j_t) ≤ a_t(i_t). From the previous paragraph we also have that if B_t(c) > 0 and c < a_t(i_t), then c_h = −1, and hence c ≤ a′. On the other hand, if B_t(c) > 0 and c ≥ a_t(i_t), then c ≥ a′. Therefore, B_t|_{x_k=−1} is a′-respecting. By applying Claim 7.14, we conclude that (A_{t+1})|_{x_k=−1} is also a′-respecting. It follows that if c < a_t(i_t) and A_{t+1}(c) > 0, then c ≤ a′ and so c_h = −1.

Step 4: Proof of Lemma 7.11.
The following claim establishes that the while loop in the
Polarize algorithm terminates after a finite number of iterations.
Claim 7.16.
For every t ≥ 0, (i_t, j_t) < (i_{t+1}, j_{t+1}) in lexicographic ordering.

Proof. Consider the chain a_{t+1}(0) < ··· < a_{t+1}(k−1) supporting A_{t+1}|_{x_k=−1}. Note that for i ≥ i_t, A_{t+1}|_{x_k=−1} is a_t(i)-respecting (since A_t|_{x_k=−1} and B_t|_{x_k=−1} were also so). In particular, A_t|_{x_k=−1} is a_t(i)-respecting because it is supported on a chain containing a_t(i). Next, B_t|_{x_k=−1} is a_t(i)-respecting since potentially the only new point in its support is a_t(i_t) ∧ b_t(j_t) ≤ a_t(i). Finally, A_{t+1}|_{x_k=−1} is also a_t(i)-respecting by Claim 7.14. Thus we can build a chain containing a_t(i) that supports A_{t+1}|_{x_k=−1}. It follows that we can take a_{t+1}(i) = a_t(i) for i ≥ i_t. Now consider i < i_t. We must have a_{t+1}(i) < a_{t+1}(i_t) = a_t(i_t). By Claim 7.15, there exists h ∈ [k−1] such that for i < i_t, a_{t+1}(i)_h = −1.

Now consider the pair (i_{t+1}, j_{t+1}). Note that by definition, A_{t+1}(a_{t+1}(i_{t+1})) > 0 and A_{t+1}(b_{t+1}(j_{t+1})) > 0. First, let us show that i_t ≤ i_{t+1}. Toward a contradiction, assume that i_{t+1} < i_t. It follows from the above paragraph that a_{t+1}(i_{t+1})_h = −1. Also, for every b_{t+1}(j) with j < k−1 and A_{t+1}(b_{t+1}(j)) > 0, we have b_{t+1}(j)_h = −1. Therefore, a_{t+1}(i_{t+1}) ∨ b_{t+1}(j_{t+1}) ≠ 1^k (in particular, (a_{t+1}(i_{t+1}) ∨ b_{t+1}(j_{t+1}))_h = −1), contradicting the choice of (i_{t+1}, j_{t+1}).

Next, we show that if i_{t+1} = i_t, then j_{t+1} ≥ j_t. By the minimality of (i_t, j_t) in the t-th round, for j < j_t such that A_t(b_t(j)) > 0, we have a_t(i_t) ∨ b_t(j) ≠ 1^k. Since i_{t+1} = i_t, we have a_{t+1}(i_{t+1}) = a_{t+1}(i_t) = a_t(i_t). We already noted in the proof of Claim 7.13 that b_t(0) < ··· < b_t(k−1) is also a supporting chain for (A_{t+1})|_{x_k=1}, and the only point where the function A_{t+1}|_{x_k=1} has greater value than A_t|_{x_k=1} is 1^k. Therefore, for j < j_t such that A_{t+1}(b_{t+1}(j)) > 0, we have a_{t+1}(i_{t+1}) ∨ b_{t+1}(j) ≠ 1^k, and hence j_{t+1} ≥ j_t.

So far, we have established that (i_{t+1}, j_{t+1}) ≥ (i_t, j_t) in lexicographic ordering. Finally, we will show that (i_{t+1}, j_{t+1}) ≠ (i_t, j_t) by proving that at least one of A_{t+1}(a_{t+1}(i_t)) and A_{t+1}(b_{t+1}(j_t)) is zero. The polarization update ensures that at least one of B_t(a_t(i_t)) and B_t(b_t(j_t)) is zero. If B_t(b_t(j_t)) = 0, then by definition we have A_{t+1}(b_{t+1}(j_t)) = A_{t+1}(b_t(j_t)) = 0. Finally, to handle the case B_t(a_t(i_t)) = 0, let us again define a′ by a′_h = −1 and a′_r = a_t(i_t)_r for r ≠ h, where h is as given by Claim 7.15. We assert that B_t|_{x_k=−1} is a′-downward-respecting. As shown in the proof of Claim 7.15, B_t|_{x_k=−1} is a′-respecting. The support of B_t|_{x_k=−1} is contained in {a_t(0), ..., a_t(k−1)} ∪ {a_t(i_t) ∧ b_t(j_t)}, with a_t(i_t) ∧ b_t(j_t) < a_t(i_t) and, by Claim 7.15, a_t(i_t) ∧ b_t(j_t) ≤ a′. It follows that B_t|_{x_k=−1} is a′-downward-respecting. Finally, by the furthermore part of Claim 7.14 applied to B_t|_{x_k=−1} and w = a_t(i_t), we get that A_{t+1}(a_{t+1}(i_t)) = A_{t+1}(a_t(i_t)) = B_t(a_t(i_t)) = 0. It follows that (i_{t+1}, j_{t+1}) ≠ (i_t, j_t).
By Claim 7.12, we know that if Algorithm 2 terminates, then Polarize(A) = A_{µ(A)}. Hence, the maximum number of polarization updates used by Polarize (on inputs from F({−1,1}^k)) serves as an upper bound on N(k). By Claim 7.16, there are at most k² iterations of the while loop, and so N(k) ≤ (k² + 3)·(1 + N(k−1)).

We now have the ingredients in place to prove Theorem 5.3.
Proof of Theorem 5.3.
Given distributions D_Y, D_N with µ = µ(D_Y) = µ(D_N), applying Lemma 7.11 to D_Y, we get that there exist D_0 = D_Y, D_1, ..., D_t = D_µ such that D_{i+1} = (D_i)_{u(i),v(i)}, i.e., D_{i+1} is a polarization update of D_i, with t ≤ N(k) < ∞. Similarly, applying Lemma 7.11 to D_N, we get that there exist D′_0 = D_N, D′_1, ..., D′_{t′} = D_µ such that D′_{i+1} = (D′_i)_{u′(i),v′(i)}, with t′ ≤ N(k) < ∞. Applying Lemma 7.7 with δ′ = δ/(2N(k)) to the pairs D_i and D_{i+1}, we get that there exists τ_i such that every protocol for (D_i, D_{i+1})-RMD requires τ_i√n bits of communication to achieve advantage δ′. Similarly, applying Lemma 7.7 again with δ′ = δ/(2N(k)) to the pairs D′_i and D′_{i+1}, we get that there exists τ′_i such that every protocol for (D′_i, D′_{i+1})-RMD requires τ′_i√n bits of communication to achieve advantage δ′. Letting τ = min{min_{i∈[t]}{τ_i}, min_{i∈[t′]}{τ′_i}}, we get, using the triangle inequality for indistinguishability, that every protocol Π for (D_Y, D_N)-RMD achieving advantage δ ≥ (t + t′)δ′ requires τ√n communication.
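The triangle-inequality step is the standard hybrid argument; spelled out, along the combined chain D_Y = E_0, E_1, ..., E_{t+t′} = D_N (passing through D_µ), any protocol Π with advantage δ satisfies

    δ ≤ |Pr_{E_0}[Π accepts] − Pr_{E_{t+t′}}[Π accepts]| ≤ Σ_{i=0}^{t+t′−1} |Pr_{E_i}[Π accepts] − Pr_{E_{i+1}}[Π accepts]|,

so some consecutive pair (E_i, E_{i+1}) is distinguished with advantage at least δ/(t + t′) ≥ δ/(2N(k)) = δ′, and Lemma 7.7 applies to that pair.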
Acknowledgments

Thanks to Johan Håstad for many pointers to the work on approximation resistance and answers to many queries. Thanks to Dmitry Gavinsky, Julia Kempe, and Ronald de Wolf for prompt and detailed answers to our queries on outdated versions of their work [GKK+09], and for pointers to ℓ_1 norm estimation algorithms.

References

[AKSY20] Sepehr Assadi, Gillat Kol, Raghuvansh R. Saxena, and Huacheng Yu. Multi-pass graph streaming lower bounds for cycle counting, MAX-CUT, matching size, and other problems. In FOCS 2020, 2020.

[AM09] Per Austrin and Elchanan Mossel. Approximation resistant predicates from pairwise independence. Comput. Complex., 18(2):249–271, 2009.

[BPR06] Saugata Basu, Richard Pollack, and Marie-Françoise Roy. Algorithms in Real Algebraic Geometry. Springer, 2006.

[Bul17] Andrei A. Bulatov. A dichotomy theorem for nonuniform CSPs. In Chris Umans, editor, FOCS 2017, pages 319–330. IEEE, 2017.

[BV04] Stephen P. Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[CGV20] Chi-Ning Chou, Alexander Golovnev, and Santhoshini Velusamy. Optimal streaming approximations for all Boolean Max-2CSPs and Max-kSAT. In FOCS 2020. IEEE, 2020.

[Cha20] Amit Chakrabarti. Data stream algorithms. Lecture notes, 2020.

[GKK+09] Dmitry Gavinsky, Julia Kempe, Iordanis Kerenidis, Ran Raz, and Ronald de Wolf. Exponential separation for one-way quantum communication complexity, with applications to cryptography. SIAM J. Comput., 38(5):1695–1708, 2009.

[GM08] Sudipto Guha and Andrew McGregor. Tight lower bounds for multi-pass stream computation via pass elimination. In ICALP 2008, pages 760–772. Springer, 2008.

[GT19] Venkatesan Guruswami and Runzhou Tao. Streaming hardness of unique games. In APPROX 2019, pages 5:1–5:12. LIPIcs, 2019.

[GVV17] Venkatesan Guruswami, Ameya Velingker, and Santhoshini Velusamy. Streaming complexity of approximating Max 2CSP and Max Acyclic Subgraph. In APPROX 2017. LIPIcs, 2017.

[Ind00] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. In FOCS 2000, pages 189–197. IEEE, 2000.

[Kho02] Subhash Khot. On the power of unique 2-prover 1-round games. In STOC 2002, pages 767–775. ACM, 2002.

[KK19] Michael Kapralov and Dmitry Krachun. An optimal space lower bound for approximating MAX-CUT. In STOC 2019, pages 277–288. ACM, 2019.

[KKL88] Jeff Kahn, Gil Kalai, and Nathan Linial. The influence of variables on Boolean functions. In FOCS 1988, pages 68–80. IEEE, 1988.

[KKS15] Michael Kapralov, Sanjeev Khanna, and Madhu Sudan. Streaming lower bounds for approximating MAX-CUT. In SODA 2015, pages 1263–1282. SIAM, 2015.

[KKSV17] Michael Kapralov, Sanjeev Khanna, Madhu Sudan, and Ameya Velingker. (1 + Ω(1))-approximation to MAX-CUT requires linear space. In SODA 2017, pages 1703–1722. SIAM, 2017.

[KNW10] Daniel M. Kane, Jelani Nelson, and David P. Woodruff. On the exact space complexity of sketching and streaming small norms. In SODA 2010, pages 1161–1178. SIAM, 2010.

[KTW14] Subhash Khot, Madhur Tulsiani, and Pratik Worah. A characterization of strong approximation resistance. In STOC 2014, pages 634–643, 2014.

[McG14] Andrew McGregor. Graph stream algorithms: a survey. SIGMOD Record, 43(1):9–20, 2014.

[O'D14] Ryan O'Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.

[Pot19] Aaron Potechin. On the approximation resistance of balanced linear threshold functions. In Moses Charikar and Edith Cohen, editors, STOC 2019, pages 430–441. ACM, 2019.

[Rag08] Prasad Raghavendra. Optimal algorithms and inapproximability results for every CSP? In STOC 2008, pages 245–254, 2008.

[Sch78] Thomas J. Schaefer. The complexity of satisfiability problems. In STOC 1978, pages 216–226. ACM, 1978.

[Yao77] Andrew Chi-Chin Yao. Probabilistic computations: Toward a unified measure of complexity. In FOCS 1977, pages 222–227. IEEE, 1977.

[Zhu17] Dmitriy Zhuk. A proof of CSP dichotomy conjecture. In