A Structural Theorem for Local Algorithms with Applications to Coding, Testing, and Privacy
Marcel Dall’Agnol, University of Warwick, [email protected]
Tom Gur∗, University of Warwick, [email protected]
Oded Lachish, Birkbeck, University of London, [email protected]
Abstract
We prove a general structural theorem for a wide family of local algorithms, which includes property testers, local decoders, and PCPs of proximity. Namely, we show that the structure of every algorithm that makes q adaptive queries and satisfies a natural robustness condition admits a sample-based algorithm with n^{1−1/O(q² log² q)} sample complexity, following the definition of Goldreich and Ron (TOCT 2016). We prove that this transformation is nearly optimal. Our theorem also admits a scheme for constructing privacy-preserving local algorithms.

Using the unified view that our structural theorem provides, we obtain results regarding various types of local algorithms, including the following.

• We strengthen the state-of-the-art lower bound for relaxed locally decodable codes, obtaining an exponential improvement on the dependency in query complexity; this resolves an open problem raised by Gur and Lachish (SODA 2020).

• We show that any (constant-query) testable property admits a sample-based tester with sublinear sample complexity; this resolves a problem left open in a work of Fischer, Lachish, and Vasudev (FOCS 2015) by extending their main result to adaptive testers.

• We prove that the known separation between proofs of proximity and testers is essentially maximal; this resolves a problem left open by Gur and Rothblum (ECCC 2013, Computational Complexity 2018) regarding sublinear-time delegation of computation.

Our techniques strongly rely on relaxed sunflower lemmas and the Hajnal–Szemerédi theorem.
Keywords: local algorithms, sample-based algorithms, coding theory, property testing, adaptivity, sunflower lemmas.

∗ Tom Gur is supported by the UKRI Future Leaders Fellowship MR/S031545/1.
Introduction
Sublinear-time algorithms are central to the theory of algorithms and computational complexity. Moreover, with the surge of massive datasets in the last decade, understanding the power of computation in sublinear time is rapidly becoming crucial for real-world applications. Indeed, in recent years this notion has received a great deal of attention, and algorithms for a plethora of problems have been studied extensively.

Since algorithms that run in sublinear time cannot even afford to read the entirety of their input, they are forced to make decisions based on a small local view of the input, and are thus often referred to as local algorithms. Prominent notions of local algorithms include property testers [RS96, GGR98], which are probabilistic algorithms that solve approximate decision problems by only probing a minuscule portion of their input; locally decodable codes (LDCs) [KT00] and locally testable codes (LTCs) [GS06], which are codes that, using a small number of queries to their input, admit algorithms that decode individual symbols and test the validity of the encoding, respectively; and probabilistically checkable proofs (PCPs) [FGL+91, AS92, ALM+92].

One of our main conceptual contributions is capturing a fundamental structural property that is common to all of the algorithms above and beyond, which in turn implies sufficient structure for obtaining our main result. We build on work of Fischer, Lachish and Vasudev [FLV15] as well as Gur and Lachish [GL20], which imply an essentially equivalent structure for non-adaptive testers and local decoders, respectively; our generalisation captures both and extends beyond them to the adaptive setting, as well as to other classes of algorithms.

More specifically, we first formalise the notion of local algorithms in the natural way: we define them simply as probabilistic algorithms that compute some function f(x), with high probability, by making a small number of queries to the input x. We then observe that, except for degenerate cases, having a promise on the input is necessary for algorithms that make a sublinear number of queries to it. Finally, we formalise a natural robustness condition that captures this phenomenon and is shared by most reasonable interpretations of local algorithms.

We say that a local algorithm is robust if its output remains stable under minor perturbations of the input. To make the discussion more precise, we define a (ρ₀, ρ₁)-robust local algorithm M for computing a function f : {0, 1}^n → {0, 1} as a local algorithm that satisfies the following: for every input w that is ρ₀-close to x such that f(x) = 0, we have M^w = 0 with high probability; and, for every w that is ρ₁-close to x such that f(x) = 1, we have M^w = 1 with high probability. We remark that our results extend to larger alphabets (see Section 4, where we formally define robust local algorithms).

We illustrate the expressivity of robust local algorithms via two examples: property testing and locally decodable codes. We remark that, similarly, locally testable codes, locally correctable codes, relaxed LDCs, PCPs of proximity, and other notions can all be cast as robust local algorithms (see Sections 4.2 to 4.4).

[Figure 1: Casting local decoders and property testers as robust local algorithms. (a) A local decoder for the code C with decoding radius δ. Codewords whose i-th message bit equals 0 (resp. 1) for a fixed i comprise C₀ (resp. C₁); the decoder is robust in the δ/2-neighbourhoods of C₀ and C₁, shaded blue and green. Inputs in the red area are within distance δ from C₀, but their δ/2-neighbourhoods are not. (b) A 2ε-tester for property Π. Inputs in the blue shaded area are 2ε-far from Π, where the tester is robust. While inputs in the red shaded area are rejected by the tester, this is not necessarily the case for their ε-neighbourhoods.]
Locally decodable codes.
An LDC is a code that admits algorithms for decoding each individual bit of the message of a moderately corrupted codeword; that is, a code C : {0, 1}^k → {0, 1}^n with decoding radius δ for which there exists a probabilistic algorithm D that, given an index i ∈ [k], makes queries to a string w promised to be δ-close to a codeword C(x) and outputs D^w(i) = x_i with high probability. Observe that D can be viewed as a (δ/2, δ/2)-robust local algorithm for computing the i-th message bit, as illustrated in Fig. 1a. This is because the δ/2-neighbourhood of any string that is δ/2-close to a codeword C(x) still lies within the decoding radius δ of C(x), so the output of D remains stable throughout this neighbourhood.
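For concreteness, the classical Hadamard code yields a 2-query LDC; the sketch below is our own illustration of the definition (a standard textbook example, not a construction from this paper). Each query is uniformly distributed, so for δ-corrupted codewords both queries land on uncorrupted positions with probability at least 1 − 2δ:

```python
import random

def hadamard_encode(x):
    """C(x) has one bit <a, x> mod 2 for every a in {0,1}^k (blocklength 2^k)."""
    k = len(x)
    return [sum(xi & ((a >> j) & 1) for j, xi in enumerate(x)) % 2
            for a in range(2 ** k)]

def hadamard_decode(w, k, i):
    """Two-query local decoder for x_i: on uncorrupted positions,
    w[r] XOR w[r ^ e_i] = <e_i, x> = x_i.  Each of the two queries is
    uniform on its own, so the decoder succeeds w.p. >= 1 - 2*delta."""
    r = random.randrange(2 ** k)
    return (w[r] + w[r ^ (1 << i)]) % 2
```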
Property testing.
The field of property testing deals with algorithms that solve approximate decision problems by only probing a minuscule part of their input; more precisely, an ε-tester for a property Π is an algorithm that queries a string x and, with high probability, outputs 1 if x ∈ Π, and outputs 0 if x is ε-far from Π. Here, unlike with local decoders, there is no robustness at all with respect to 1-inputs. Indeed, we can only cast an ε-tester as an (ε, 0)-robust local algorithm for testing the property Π with proximity parameter 2ε: doubling the proximity parameter ensures the ε-neighbourhood of each 0-input is still rejected (see Fig. 1b). We refer to such robustness as one-sided.

By the discussion above, our scope includes local algorithms that only exhibit one-sided robustness. Accordingly, we define robust local algorithms (without specifying the robustness parameters) as (ρ₀, ρ₁)-robust local algorithms where max{ρ₀, ρ₁} = Ω(1). We stress that while dealing with one-sided robustness is significantly more challenging, it is necessary for capturing algorithms such as property testers, relaxed LDCs, and PCPs of proximity. (Although Theorem 1 assumes an algorithm satisfying this condition, we remark that a weaker one suffices. Supposing, without loss of generality, that ρ₀ ≥ ρ₁, only a single input x must be such that M^w = 0 for every w that is Ω(1)-close to x; then the result follows even for ρ₁ = Θ(n^{−1/q}) = o(1), where q is the query complexity of M.)

In this work, we capture structural properties that are common to all robust local algorithms and leverage this structure to obtain a transformation that converts them into sample-based algorithms; that is, algorithms that are provided with uniformly distributed labeled samples or, alternatively, are only allowed to query each coordinate independently with some fixed probability. Adopting the latter perspective, the sample complexity of such an algorithm is the expected number of coordinates that it samples. In the following, we use n to denote the input size.

Theorem 1 (informally stated, see Theorem 6.1). Every robust local algorithm with query complexity q can be transformed into a sample-based local algorithm with sample complexity n^{1−1/O(q² log² q)}.

It is important to point out that the robustness in Theorem 1 is only required on part of the input space (i.e., it need only be one-sided); indeed, otherwise the structural properties captured become much more restrictive (and are not shared by, e.g., property testers).

Moreover, we prove that the transformation in Theorem 1 is optimal up to a quadratic factor in the dependency on the query complexity; that is, q-query robust local algorithms cannot, in general, be transformed into sample-based algorithms with sample complexity n^{1−1/o(q)} (see Section 7.3 for a more precise statement).

Our proof of Theorem 1 strongly relies on analysing the query behaviour of robust local algorithms by partitioning their local views into relaxed sunflowers and using volume lemmas that are implied by their robustness.
We build on the Hajnal–Szemerédi theorem to analyse sampling from relaxed sunflowers (see Section 2 for a detailed technical overview).

By the generality of our definition, we can apply Theorem 1 to a wide family of well-studied algorithms, such as locally testable codes, locally decodable and correctable codes, relaxed LDCs, universal LTCs, PCPs of proximity, and more (see Section 4 for details on how to cast these algorithms as robust local algorithms).

We note that [FLV15] and [GL20] obtain an essentially equivalent transformation for testers and decoders, respectively, through “lossy” versions of our relaxed sunflower lemmas (they extract one relaxed sunflower from the local views, rather than partition them) that only hold if the algorithm is non-adaptive; by a trivial transformation from adaptive to non-adaptive algorithms that incurs an exponential increase in the query complexity, these previous works show transformations whose sample-based algorithms have complexity n^{1−1/exp(q)}, which we reduce to n^{1−1/poly(q)} (indeed, as far down as n^{1−1/Õ(q²)}).

Motivation.
The notion of sample-based local algorithms was first defined in [GGR98], and its systematic study was initiated by Goldreich and Ron [GR16]. This notion is intrinsically interesting as a natural and general model of computation, and it also carries practical potential, since in many applications implementing full query access is impractical, whereas obtaining random samples is much easier. Moreover, sample-based local algorithms admit schemes for multi-computation (i.e., simultaneously computing multiple functions of the input using the same queries), as well as schemes for privacy-preserving local computation, on which we elaborate below.

Importantly, the strength and generality of Theorem 1 allows us to obtain applications and resolve several open problems in different settings, ranging from lower bounds on local codes to separations between the hardness of approximate decision versus verification. See Section 1.3 for a discussion of these applications.
Privacy-preserving local computation.
We wish to highlight an interesting application of Theorem 1 to privacy. Suppose a client wishes to compute a function of data that is stored on a server, e.g., (relaxed) decode a symbol of a code or test whether the data has a certain property. Typically, the query behaviour of a local algorithm may leak information on which function the client attempts to compute. However, since sample-based algorithms probe their input uniformly, they can be used to compute the desired function without revealing any information on which function was computed, e.g., which coordinate was decoded or which property was tested. (We stress that such obliviousness does not correspond to a differential privacy guarantee.)

Furthermore, since Theorem 1 transforms any robust local algorithm into a sample-based local algorithm that probes its input obliviously to the function it computes, then (after standard error-reduction) we can apply Theorem 1 to many algorithms at once and reuse the samples to obtain a local algorithm that computes multiple functions at the same time (see Section 1.3.2).

We proceed to present the main applications that we obtain from Theorem 1, which range over three fields of study: coding theory, property testing, and probabilistic proof systems. In the following, we state our results and discuss the context and motivation for our applications to each of these fields. We remark that our main application to property testing follows as a direct corollary of Theorem 1, whereas our applications to coding theory and probabilistic proof systems require additional arguments.
The notion of LDCs plays an important role in contemporary coding theory. Indeed, since the systematic study of LDCs was initiated in the highly influential work of Katz and Trevisan [KT00], these codes have received much attention and made a profound impact on algorithms, cryptography, complexity theory, program checking, data structures, quantum computing, pseudorandomness, and other areas in theoretical computer science (see the surveys [Tre04, Yek12, KS17] and references therein), and have also led to significant practical applications in distributed storage [HSX+12].

However, this power comes at a high cost: the best known construction of O(1)-query LDCs has a super-polynomial blocklength (cf. [Efr12], building on [Yek08]). This barrier has led to the study of relaxed LDCs, which were introduced in the seminal work of Ben-Sasson, Goldreich, Harsha, Sudan, and Vadhan [BGH+04]. Loosely speaking, a relaxed LDC C : {0, 1}^k → {0, 1}^n with decoding radius δ is a code that admits a probabilistic algorithm, a decoder, which on index i ∈ [k] makes queries to a string w ∈ {0, 1}^n that is δ-close to a codeword C(x) and satisfies the following: (1) if the input is a valid codeword (i.e., w = C(x)), the decoder outputs x_i with high probability; and (2) otherwise, the decoder outputs x_i or a special “abort” symbol ⊥, indicating it detected an error and is unable to decode. (As observed in [BGH+04], the decoder can be made to output ⊥ on at most an arbitrarily small fraction of the coordinates.)

This seemingly modest relaxation allows for obtaining dramatically stronger parameters. Indeed, Ben-Sasson et al.
[BGH+04] constructed a q-query relaxed LDC with blocklength n = k^{1+1/Ω(√q)}, and raised the problem of whether it is possible to obtain a better rate; the best known construction, obtained in recent work of Asadi and Shinkar [AS20], improves this to n = k^{1+1/Ω(q)}. We stress that proving lower bounds on relaxed LDCs is significantly harder than on standard LDCs; indeed, the first non-trivial lower bound was only recently obtained in [GL20], which shows that, to obtain query complexity q, the code must have blocklength n ≥ k^{1+1/O(2^{2q} · log² q)}.

This shows that O(1)-query relaxed LDCs cannot obtain quasilinear length, a question raised in [Gol04], but it still leaves exponential room for improvement in the dependency on query complexity (note that even for q = O(1) this strongly affects the asymptotic behaviour). Indeed, eliminating this exponential dependency was raised as the main open problem in [GL20].

Fortunately, our technical framework is general enough to capture relaxed LDCs as well, and in turn, our main application to coding theory resolves the aforementioned open problem by obtaining a lower bound with an exponentially better dependency on the query complexity. Along the way, we also extend the lower bound to hold for relaxed decoders with two-sided error, resolving another problem left open in [GL20].
Theorem 2 (informally stated, see Corollary 7.8). Any relaxed LDC C : {0, 1}^k → {0, 1}^n with constant decoding radius δ and query complexity q must have blocklength at least n ≥ k^{1+1/O(q² log² q)}.

This also makes significant progress towards resolving the open problem due to Ben-Sasson et al. [BGH+04] regarding the best rate that relaxed LDCs can achieve.

Property testers are one of the most widely studied types of sublinear-time algorithms (see, e.g., the recent textbook [Gol17]). Recall that an ε-tester for a property Π solves the approximate decision problem of accepting inputs in Π and rejecting inputs that are ε-far from Π.

The standard definition of property testing endows the tester with the ability to make queries. An alternative definition, first presented in [GGR98], only provides the tester with uniformly distributed labeled samples (or, equivalently, only allows it to make uniformly and independently distributed queries). Such algorithms are called sample-based testers and have received attention recently [GR16, FGL14, BGS15, FLV15, CFSS17, BMR19a, BMR19b], both because of their intrinsic interest as a natural model of computation and due to their practical potential. Indeed, while sample-based testers are typically much less efficient than query-based testers, in many applications obtaining full query capacity is impractical, and random samples are much easier to obtain.

As an immediate corollary of Theorem 1, we obtain that any constant-query testable property (o(√(log n / log log n))-query, in fact) admits a sample-based tester with sublinear sample complexity.

Theorem 3 (informally stated; see Corollary 7.1). Any property that is ε-testable with q queries admits a sample-based ε-tester with sample complexity n^{1−1/O(q² log² q)}.

This also admits an application to adaptive multi-testing, where the goal is to simultaneously test a large number of properties. In Section 4.2 we show that, as a corollary of Theorem 3, we can multi-test, with sublinear query complexity, exponentially many properties, namely k = exp(n^{1/ω(q² log² q)}), that are each testable with q adaptive queries.

Proofs of proximity [EKK+00, BGH+04, DR06, RVW13, GR18] are probabilistic proof systems that allow for delegation of computation in sublinear time. These proof systems have been extensively studied in recent years and have found applications to various local algorithms and cryptographic protocols (e.g., [FGL14, KR15, GGK15, GG16b, GR17, BRV18, CG18, RRR19, BSCG+19, CGS20]).

In the non-interactive setting, we have a verifier that wishes to ascertain the validity of a given statement, using a short (sublinear-length) explicitly given proof and a sublinear number of queries to its input. Since the verifier cannot even read the entire input, it is only required to reject inputs that are far from being valid. Thus, the verifier is only assured of the proximity of the statement to a correct one. Such proof systems can be viewed as the NP (or, more accurately, MA) analogue of property testing, and are referred to as MA proofs of proximity (MAPs).

As such, one of the most fundamental questions regarding proofs of proximity is their relative strength in comparison to testers; that is, whether verifying a proof for an approximate decision problem can be done significantly more efficiently than solving it. One of the main results in [GR18] is that this is indeed the case. Namely, there exists a property Π which: (1) admits an adaptive MAP with proof length O(log(n)) and query complexity q = O(1); and (2) requires at least n^{1−1/Ω(q)} queries to be tested without access to a proof. (We remark that the bound in [GR18] is stated in a slightly weaker form. However, it is straightforward to see that the proof achieves the bound stated above; see Section 7.3.)

While the above shows a chasm between the power of testers and MAPs, it remained open whether the aforementioned sublinear lower bound on testing is an artefact of the techniques and whether it would be possible to obtain a stronger separation, where the property is harder for testers (potentially requiring even Ω(n) queries).

In Section 7.3 we use Theorem 1 to show that the foregoing separation is nearly tight.

Theorem 4 (informally stated; see Theorem 7.11). Any property that admits an adaptive MAP with query complexity q and proof length p also admits a tester with query complexity p · n^{1−1/O(q² log² q)}.

Interestingly, we remark that we rely on Theorem 4 to prove the (near) optimality of Theorem 1 (see Section 2.5 for details).
Our work leaves several interesting directions and open problems that we wish to highlight. Firstly, we stress that our structural theorem is extremely general, and indeed the robustness condition that it relies on may plausibly be satisfied by other types of local algorithms, such as PAC learners and local computation algorithms (LCAs).
Open question 1.
Can Theorem 1 and the framework of robust local algorithms be used to obtain query-to-sample transformations for PAC learners and LCAs?

One promising area that we did not explore is achieving rate lower bounds on PCPs of proximity (PCPPs). Such bounds are notoriously hard to obtain; indeed, the only such bounds we are aware of are those in [BSHLM09], which are restricted to the special setting of 3-query PCPPs. We remark that our framework captures PCPPs and that, in light of the rate lower bounds it allowed us to obtain for relaxed LDCs, it would be interesting to see whether it could be used to show rate lower bounds on PCPPs.
Open question 2.
Is it possible to obtain rate lower bounds on q-query PCPs of proximity for q > 3?

Recall that Theorem 1 transforms q-query robust local algorithms into sample-based local algorithms with sample complexity n^{1−1/O(q² log² q)}, whereas in Section 7.3 we show that any such transformation must yield an algorithm with sample complexity n^{1−1/Ω(q)}. This still leaves a quadratic gap in the dependency on query complexity. We remark that closing this gap could lead to fully resolving an open problem raised in [BGH+04] regarding the power of relaxed LDCs.
Open question 3.
What is the optimal sample complexity obtained by a transformation from robust local algorithms to sample-based local algorithms?

Finally, in Section 1.2 we discuss an application of our main result to privacy-preserving local computation, where one can compute one out of a large collection of functions without revealing any information regarding which function was computed. While Theorem 1 implies such a scheme that only probes the input in a sublinear number of locations, the number of probes is quite high. Moreover, by the near-tightness of our result, we cannot expect a significant improvement of this scheme. Nevertheless, we find it very interesting to explore whether, for structured families of functions (as, for example, admitted by the canonical tester for dense graphs in [GT03]), or for statistical or computational notions of privacy, the query complexity of this scheme can be significantly reduced.
Open question 4 (Private testing of small families). Do there exist schemes for privacy-preserving local computation with small query complexity?
Organisation
The rest of the paper is organised as follows. In Section 2, we provide a high-level technical overview of the proof of our main result and its applications. In Section 3, we briefly discuss the preliminaries for the technical sections. In Section 4, we present our definition of robust local algorithms and show how to cast various types of algorithms in this framework. In Section 5, we provide an arsenal of technical tools, including relaxed sunflower lemmas and a sampling lemma that builds on the Hajnal–Szemerédi theorem. In Section 6, we use the foregoing tools to prove Theorem 1. Finally, in Section 7, we derive our applications to coding theory, property testing, and proofs of proximity.
Technical overview
In this section, we outline the techniques used and developed in the course of proving Theorem 1 and its applications. Our techniques build on and simplify ideas from [FLV15, GL20], but are significantly more general and technically involved and, in particular, offer novel insight regarding adaptivity in local algorithms.

Our starting point, which we outline in Section 2.1, generalises the techniques of [GL20] (which are, in turn, inspired by [FLV15]) to the setting of robust local algorithms. Then, in Section 2.2, we identify a key technical bottleneck in previous works: adaptivity. We discuss the fundamental challenges that adaptivity imposes, and in Section 2.3 we present our strategy for meeting these challenges and the tools that we develop for dealing with them, as well as describe our construction. Subsequently, in Section 2.4, we provide an outline of the analysis of our construction, which relies on the Hajnal–Szemerédi theorem to sample from daisies. Finally, in Section 2.5, we sketch how to derive from Theorem 1 our applications to coding theory, property testing, and probabilistic proofs.
The setting.
Recall that our goal is to transform a robust (query-based) local algorithm into a sample-based algorithm with sublinear sample complexity. Towards this end, let M be a (ρ₀, ρ₁)-robust local algorithm for computing a function f : {0, 1}^n → {0, 1}. Since we also need to deal with one-sided robustness, assume without loss of generality that ρ₁ = 0 and ρ := ρ₀ = Ω(1). Recall that the algorithm M receives query access to a string x ∈ {0, 1}^n, makes at most q queries to this string, flips at most r random coins and outputs f(x) ∈ {0, 1} with probability at least 1 − σ. (We remark that, in general, the function f may depend on an explicitly given parameter, e.g., an index for decoding in the case of (relaxed) LDCs, but for simplicity of notation we omit this parameter in the technical overview; furthermore, f may be defined over a larger alphabet, see Section 4.)

For simplicity of exposition, we assume that the error rate is σ = Θ(1/q), the query complexity is constant (q = O(1)), and the randomness complexity r is bounded by log(n) + O(1). We remark that the analysis trivially extends to non-constant values of q, and that we can achieve the other assumptions via simple transformations, which we provide in Section 5.4, at the cost of logarithmic factors. In the following, our aim is to construct a sample-based local algorithm N for computing the function f, with sample complexity O(n^{1−1/(2q²)}).

As a warmup, we first suppose that the algorithm M is non-adaptive. In this case, we can simply represent M as a distribution µ over a collection of query sets S, where each S ∈ S is a subset of [n] of size q, and predicates {f_S : {0, 1}^q → {0, 1}}_{S∈S}, as follows. The algorithm M draws a set S ∈ S according to µ, queries S, obtains the local view x|_S (i.e., x restricted to the coordinates in S), and outputs f_S(x|_S). Note that by our assumption on the randomness complexity of M, the support of µ (i.e., the collection of query sets) has linear size.

Consider an algorithm N that samples each coordinate of the string x independently with probability p = 1/n^{1/(2q²)} (and aborts in the rare event that this exceeds the desired sample complexity); this choice of p will be made clear in Section 2.4. Naively, we would have liked N^x to emulate an invocation of the algorithm M by sampling the restriction of x to a query set S ∼ µ.

Indeed, if the distribution µ is “well spread”, the probability of obtaining such a local view of M is high. Suppose, for instance, that all of the query sets are pairwise disjoint. In this case, the probability of N sampling any particular local view is p^q, and we expect N to obtain O(p^q · n) = O(n^{1−1/(2q)}) local views (recall that the number of query sets is O(n)). However, if µ is concentrated on a small number of coordinates, it is highly unlikely that N will obtain a local view of M. For example, if M queries the first coordinate of x with probability 1, then we can obtain a local view of M with probability at most p, which is negligible.

Fortunately, we can capitalise on the robustness condition to deal with this problem. We first illustrate how to do so for an easy special case, and then deal with the general setting.
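Before specialising, we record the routine calculation behind the “well spread” case above (pairwise disjoint query sets, sampling probability p = n^{−1/(2q²)}, and |S| = Θ(n), as assumed here):

```latex
\Pr[\text{a fixed } S \text{ is fully sampled}] = p^{q} = n^{-q/(2q^{2})} = n^{-1/(2q)},
\qquad
\mathbb{E}\big[\#\{\text{fully sampled } S \in \mathcal{S}\}\big]
  = |\mathcal{S}| \cdot p^{q} = \Theta\big(n^{1-1/(2q)}\big).
```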
Special case: sunflower query set. Suppose that µ is concentrated on a small coordinate set K and is otherwise disjoint, i.e., the support of µ is a sunflower with kernel K of size at most ρn; see Fig. 2a. Since the query sets are disjoint outside of K, by the discussion above we will sample many sets except for the coordinates in K (i.e., sample the petals of the sunflower). Recall that if x is such that f(x) = 0, then the (ρ, 0)-robustness of M ensures that M outputs 0, with high probability, on any input y that is ρ-close to x. Thus, even if we arbitrarily assign values to K and use them to complete sampled petals into full local views, we can emulate an invocation of M that will output as it would on x.

If all inputs in the promise of M were robust (as is the case for LDCs, but not for testers, relaxed LDCs, and PCPPs), then the above would suffice. (The type of robustness that relaxed LDCs admit is slightly more subtle, since it deals with a larger alphabet that allows for outputting ⊥; see the discussion in Section 7.2.) However, recall that we are not ensured robustness when x is such that f(x) = 1. To deal with that, we can enumerate over all possible assignments to the kernel K, considering the local views obtained by completing sampled petals into full local views by using each kernel assignment to fill in the values that were not sampled. Observe that: (1) when the input x is a 1-input and N considers the kernel assignment that coincides with x, a majority of local views (a fraction of at least 1 − σ) will lead M^x to output 1; and (2) when x is a 0-input, a minority of local views (a fraction of at most σ) will lead M^x to output 1 under any kernel assignment.

The sample-based algorithm N thus outputs 1 if and only if it sees, for some kernel assignment, a majority of local views that lead M to output 1. Recall that there is asymmetry in the robustness of M (while 0-inputs are robust, 1-inputs are not), which translates into asymmetric output conditions for N. Note, also, that correctness of this procedure for 0-inputs requires that not even a single kernel assignment would lead N to output incorrectly; but our assumption on the error rate ensures that the probability of sampling a majority of petals whose local views will lead to an error is sufficiently small to tolerate a union bound over all kernel assignments, as long as |K| is small enough.
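The following sketch illustrates this procedure in the sunflower special case. It is a minimal illustration of the idea, not the construction of Section 6; the interfaces (query sets as tuples of coordinates, and `predicates` mapping each set S to its predicate f_S) are hypothetical:

```python
import itertools
import random

def sample_coordinates(n, p):
    # Step 1: each coordinate is sampled independently with probability p.
    return {i for i in range(n) if random.random() < p}

def emulate_on_sunflower(x, n, query_sets, kernel, predicates, p):
    """Sample-based emulation when supp(mu) is a sunflower with kernel K.
    x is accessed only on sampled coordinates, as a sample-based algorithm
    would; kernel values are filled in by exhaustive enumeration."""
    sampled = sample_coordinates(n, p)
    # A set is usable if its petal (the part outside the kernel) was fully sampled.
    usable = [S for S in query_sets if set(S) - kernel <= sampled]
    if not usable:
        return 0
    kernel_list = sorted(kernel)
    for values in itertools.product([0, 1], repeat=len(kernel_list)):
        kappa = dict(zip(kernel_list, values))
        # Complete each sampled petal into a full local view using kappa.
        ones = sum(
            predicates[S](tuple(kappa[i] if i in kappa else x[i] for i in S))
            for S in usable
        )
        # Output 1 iff some kernel assignment yields a majority of 1-views;
        # on robust 0-inputs, every assignment stays below this bar w.h.p.
        if 2 * ones > len(usable):
            return 1
    return 0
```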
General case: extracting a heavy daisy from the query sets. Of course, the combinatorial structure of the query sets of a local algorithm is not necessarily a sunflower and may involve many complex intersections. While we could use the sunflower lemma to extract a sunflower from the collection of query sets, we stress that the size of such a sunflower is sublinear, which is not enough in our setting (as we deal with constant error rate).

Nevertheless, we can exploit the robustness of M even if its query sets only have the structure of a relaxed sunflower, referred to as a daisy, with a small kernel. Loosely speaking, a t-daisy is a sunflower in which the kernel is not necessarily the intersection of all petals, but is rather a small subset such that every element outside the kernel is contained in at most t petals; see Fig. 2b (and see Section 5.1 for a precise definition).

[Figure 2: Sunflowers and daisies. (a) A sunflower with one of its sets shaded; the intersection of any two sets results in the same set, the kernel. (b) A daisy with its kernel shaded, whose boundary is the dashed line; outside the kernel, each point is covered by a bounded number of petals.]

Using a daisy lemma [FLV15, GL20], we can extract from the query sets (the support of µ) of the robust local algorithm M a t-daisy D with t roughly equal to n^{i/q} and a kernel K of size roughly n^{1−i/q}, where i ∈ [q] bounds the size of the petals of D. Moreover, the weight µ(D) = Σ_{S∈D} µ(S) is significantly larger than the error rate σ of M (recall that we assumed a sufficiently small σ = Θ(1/q)). Thus, even if the daisy contains all local views that lead to an error, their total weight would still be small with respect to that of local views leading to a correct decision; hence, the query sets in the daisy D well-approximate the behaviour of M, and we can disregard the sets in the support of µ that do not belong to D at the cost of a negligible increase to the error rate.

Crucially, the intersection bound t implies that sampling a daisy is similar to sampling a sunflower: since petals do not intersect heavily, with high probability many of them are fully queried (as is the case with sunflowers). The bound on |K|, on the other hand, allows us to implement the sampling-based algorithm we discussed for the sunflower case, except with respect to a daisy. The kernel is sufficiently small so that the output of M remains stable (i.e., does not change) with respect to changes in the assignment to K, and small enough to tolerate a union bound when considering all possible assignments to K.

It follows that the daisy D provides enough “sunflower-like” structure for the sample-based algorithm N defined previously to succeed, with high probability, when it only considers the query sets in D and enumerates over all assignments to its kernel.

Let us now attempt to apply the transformation laid out in the previous section to a robust local algorithm M that makes q adaptive queries.
In this case, M may choose to query distinct coordinates depending on the answers to its previous queries, and thus there is no single distribution µ that captures its query behaviour.

Observe that now, rather than inducing a distribution on sets, the algorithm M induces a distribution over decision trees of depth q, as the behaviour of a randomised query-based algorithm M^x can be described by choosing a decision tree according to its random string, then performing the queries prescribed by evaluating that tree on the input x. (In Definition 5.1, a t-daisy has t as a function from [q] to ℕ, which allows for a tighter bound on the number of intersecting petals; we use the simplified definition of [GL20] in this technical overview.)
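To illustrate, here is a toy rendering (our own, with hypothetical names) of a decision tree, its collapse to a single query set on a fixed input (cf. Fig. 3 below), and the branch-by-branch extraction of q-sets that is used shortly to build the multi-collection S:

```python
class Node:
    """Internal node of a decision tree: query `coord`, then follow the
    `zero` or `one` child; leaves are plain ints (the output bit)."""
    def __init__(self, coord, zero, one):
        self.coord, self.zero, self.one = coord, zero, one

def induced_query_set(root, x):
    """For a fixed input x, the tree collapses to one branch, inducing a
    single query set of size at most q (plus the output at the leaf)."""
    queries, node = [], root
    while isinstance(node, Node):
        queries.append(node.coord)
        node = node.one if x[node.coord] == 1 else node.zero
    return tuple(sorted(queries)), node

def all_branch_sets(root, prefix=()):
    """Extract the q-set of every branch; applied to every decision tree
    of M, this yields the multi-collection S of all possible query sets."""
    if not isinstance(root, Node):
        return [tuple(sorted(prefix))]
    return (all_branch_sets(root.zero, prefix + (root.coord,))
            + all_branch_sets(root.one, prefix + (root.coord,)))
```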
[Figure 3: Decision tree of a 3-local algorithm. For a fixed input x, the tree collapses to a single branch (highlighted), which queries a set of three coordinates and outputs the bit at the reached leaf; inputs that answer the queries differently induce different query sets. This “collapsing” of the query behaviour is illustrated on either side of the tree.]

By our assumption on the randomness complexity of M, this distribution is supported on Θ(n) decision trees. Note that for any fixed input x, the decision tree collapses to a path, and hence the distribution over decision trees induces a distribution over query sets, which we denote µ_x (see Fig. 3).

A naive way of transitioning from decision trees to sets is by querying all of the branches of each decision tree. Alas, doing so would increase the query complexity of M exponentially, from q to (more than) 2^q, which would in turn lead to a sample-based algorithm with a much larger sample complexity than necessary. Thus, we need to deal with the far more involved structure induced by distributions over decision trees, which imposes significant technical challenges. For starters, since our technical framework inherently relies on a combinatorial characterisation of algorithms, we first need to find a method of transitioning from decision trees to (multi-)sets without increasing the query complexity of the local algorithm M.

To this end, a key idea is to enumerate over all random strings and their corresponding decision trees, and extract all q-sets (i.e., sets of size q) corresponding to each branch of each tree. This leaves us with a combinatorial multi-set S (as multiple random strings may lead to the same decision tree, and branches of distinct decision trees may query the same set) with Θ(2^q · n) = Θ(n) query sets, of size q each, corresponding to all possible query sets induced by all possible input strings. (We remark that this treatment of multi-sets allows us to significantly simplify the preparation for combinatorial analysis that was used in previous works involving sunflowers and daisies.) Note that S contains the elements of the support of µ_x for all inputs x ∈ {0, 1}^n and that, for any fixed input x, the vast majority of these query sets may not be relevant to this input: each S ∈ S \ supp(µ_x) corresponds to a branch of a decision tree that the bits of x would not have led it to query.

This already poses a significant challenge to our approach, as we would have liked to extract a heavy daisy D from the collection S which well-approximates the query sets of M independently of any input. However, it could be the case that the sets that are relevant to an input x (i.e., supp(µ_x)) induce a completely different daisy (with potentially different kernels over which we’ll need to enumerate) than the relevant sets for a different input y that differs from x on the values in the kernel, and so it is not at all clear that there exists a single daisy that well-approximates the query behaviour of the adaptive algorithm M for all inputs.

Furthermore, the above also causes problems with the kernel enumeration process. For each assignment κ to the kernel K, denote by x_κ ∈ {0, 1}^n the word that takes the values of κ in K and the values of x outside of K.
Recall that the crux of our approach is to simulate executions of M^{x_κ}, for each kernel assignment κ, using the values of the sampled petals and plugging in the kernel assignment to complete these petals into local views. Hence, since relevant sets corresponding to different kernel assignments may be distinct, it is unclear how to rule according to the local views that each of them induces.

We overcome these challenges in the next section with a more sophisticated extraction of daisies that, crucially, does not discard any query sets of the adaptive algorithm M. Specifically, we will partition the (multi-)collection of all possible query sets into a collection of daisies and simultaneously analyse all daisies in the partition to capture the adaptive behaviour of the algorithm.

Relying on techniques from [FLV15, GL20], we can not only extract a single heavy daisy, but rather partition a (multi-)collection of query sets into a family of daisies with strong structural properties on which we can capitalise. This allows us to apply our combinatorial machinery without dependency on a particular input, and analyse all daisies simultaneously.

Daisy partition lemma.
A refinement of the daisy lemma in [GL20], which we call a daisy partition lemma (Lemma 5.2), partitions a multi-set of q-sets into q + 1 daisies {D_i : 0 ≤ i ≤ q} (see Fig. 4) with the following structural properties.

1. D₁ is an n^{1/q}-daisy, and for i > 1, each D_i is a t-daisy with t = n^{(i−1)/q};
2. The kernel K₀ of D₀ coincides with that of D₁, and, for i > 0, the kernel K_i of D_i satisfies |K_i| ≤ q|S| · n^{−i/q};
3. The petal S \ K_i of every S ∈ D_i has size exactly i.

Moreover, the kernels form an incidence chain K_q = ∅ ⊆ K_{q−1} ⊆ · · · ⊆ K₁ = K₀. Note that D₀ is vacuously a t-daisy for any t, since its petals are empty; and that our assumption on the randomness complexity of M implies |K_i| = O(n^{1−i/q}) when i > 0.

We apply the daisy partition lemma to S and assert that, for any input x, there exists some i ∈ {0, . . . , q} such that µ_x(D_i) is larger than 1/(q + 1) (recall that, for all x, the support of µ_x is contained in S); that is, each input may lead to a different heavy daisy, but there will always be at least one daisy that well-approximates the behaviour of the algorithm on input x. Alas, with only a local view of the input word, we are not able to tell which daisies are heavy and which are not.

It is clear, then, that a sample-based algorithm that makes use of the daisy partition has to rule not only according to a single daisy, but rather according to all of them. But how exactly it should do so is a nontrivial question to answer, given that there are multiple daisies (and kernels) potentially interfering with one another.
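Before describing how, we note that the structural guarantees above are easy to state operationally. The checker below is our own illustration, using the simplified overview definition with a scalar intersection bound t:

```python
from collections import Counter

def is_t_daisy(sets, kernel, t):
    """Relaxed-sunflower condition: every element outside the kernel
    is contained in at most t of the sets."""
    counts = Counter(e for S in sets for e in set(S) - set(kernel))
    return all(c <= t for c in counts.values())

def check_daisy_partition(daisies, kernels, n, q):
    """Sanity-check a candidate partition {D_i} with kernels {K_i},
    i = 0, ..., q, against the properties of the daisy partition lemma."""
    ok = True
    for i in range(1, q + 1):
        # Intersection bound: n^{1/q} for D_1, n^{(i-1)/q} for i > 1.
        t = n ** (max(1, i - 1) / q)
        ok &= is_t_daisy(daisies[i], kernels[i], t)
        # Every S in D_i has a petal S \ K_i of size exactly i.
        ok &= all(len(set(S) - set(kernels[i])) == i for S in daisies[i])
    ok &= all(set(S) <= set(kernels[0]) for S in daisies[0])  # D_0 lies in K_0
    # Incidence chain: K_q = empty set, K_{i+1} contained in K_i, K_1 = K_0.
    ok &= len(kernels[q]) == 0
    ok &= all(set(kernels[i + 1]) <= set(kernels[i]) for i in range(q))
    ok &= set(kernels[1]) == set(kernels[0])
    return bool(ok)
```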
Adaptivity in daisy partitions. A natural approach for dealing with multiple daisies simultaneously is by enumerating over every assignment to all kernels (i.e., to ∪_i K_i) and, for each such assignment, obtaining local views from all daisies and ruling according to the aggregated local views. Note that the incidence chain structure implies that enumerating over assignments to K₀ suffices, since each assignment to K₀ induces assignments to K_i for all i.

[Figure 4: Daisy partition. (a) A collection S of 3-sets before being partitioned. (b) The collection S partitioned into 4 daisies: D₀ (shaded in grey), D₁ (green), D₂ (yellow) and D₃ (purple). (c) D₀, whose sets are entirely contained in the kernel K₀. (d) D₁ with K₁ = K₀, where each S ∈ D₁ has a petal S \ K₁ of size 1. (e) D₂ with K₂ ⊆ K₁, where each S ∈ D₂ has a petal S \ K₂ of size 2. (f) D₃, with (empty) kernel K₃ = ∅; each query set S ∈ D₃ has a petal S \ K₃ = S of size 3.]

However, this approach leads to fundamental difficulties. Recall that correctness of the sample-based algorithm on 0-inputs depends on no kernel assignment causing an output of 1. Although for any assignment to K_i this happens with sufficiently small probability to ensure it is unlikely to happen on all 2^{|K_i|} assignments simultaneously, this does not hold true for assignments to larger kernels. More precisely, since |K_{i−1}| may be larger than |K_i| by a factor of n^{1/q}, an error rate that is preserved by 2^{|K_i|} assignments becomes unbounded if the number of assignments increases to 2^{|K_{i−1}|}. This leads us to only consider, for query sets in D_i, assignments to K_i rather than to the union of all kernels.

Put differently, we construct an algorithm that deals with each daisy independently, and whose correctness follows from a delicate analysis that aggregates local views taken from all daisies, which we outline in Section 2.4. We begin by considering a sample-based local algorithm N that extends the strategy we used for a single daisy as follows. On input x ∈ {0, 1}^n, it: (1) samples each coordinate of the string x independently with probability p = 1/n^{1/(2q²)}; (2) for each i ∈ {0, . . . , q} and each assignment κ to the kernel K_i of the daisy D_i, outputs 1 if a majority of local views leads M to output 1; and (3) outputs 0 if no such majority is ever found.

First, note that since the algorithm N is constructed in a whitebox way, it has access to the description of all decision trees induced by the query-based algorithm M. Hence N^x is able to determine which local views correspond to a valid execution of M. Denoting by Q the set of coordinates that were sampled, an assignment κ to K_i induces, for each query set S ⊂ Q ∪ K_i, the assignment x_κ|_S; the sample-based algorithm N can check whether each such S is a relevant query set (i.e., belongs to the support of µ_{x_κ}) by verifying that it arises from some branch of a decision tree of M that x_κ would have led to query. This allows N to ignore the non-relevant query sets and overcome the difficulty pointed out in the previous section.

However, we remain with the issue that motivated searching for heavy daisies in the first place: there is no guarantee that every D_i well-approximates the algorithm M on all inputs.
Therefore, if x is a 0-input and µ_x(D_i) is smaller than the error rate σ, even when N considers the correct kernel assignment x|_{K_i} with respect to x, it may find a majority of local views that leads M^x to output 1; indeed, nothing prevents all the “bad” query sets, which lead M^x to erroneously output 1, from being placed in the same daisy D_i.

To this end, we set a threshold for the number of local views needed to lead to an output of 1. Then, a majority of “bad” query sets in a daisy is not enough to cause an incorrect output, and N rules according to the number, rather than the proportion, of local views it sees. The weight of bad query sets is at most σ, allowing us to bound their number; this prevents them from causing an incorrect output even if no local view leads to the correct one. Note that different thresholds are necessary for daisies with petals of different sizes, since the probability of sampling petals decreases as the size of the petal increases. The thresholds τ_i must thus be carefully set to take this into account.

Finally, note that whenever the daisy D₀ leads to an output of 1, this happens (almost) independently of the input: the assignment to every S ∈ D₀ is determined solely by the assignment to K₀, because S ⊂ K₀. Therefore, the sample-based algorithm N disregards D₀ in its execution.

The algorithm.
By the discussion above, we obtain the following description for the sample-based algorithm N^x (with some parameters that we will set later).

1. Sample each coordinate of x independently with probability p = 1/n^{1/(2q²)}. If the number of samples exceeds the desired sample complexity, abort.
2. For every i ∈ [q] and every assignment κ to K_i, perform the following steps.
(a) Count the number of sets in D_i with local views that lead M to output 1, which are relevant for the assignment κ and the queried values. If i = 1, discard the sets whose petals are shared by at least α local views.
(b) If the number is larger than the threshold τ_i, output 1.
3. If every assignment to every kernel failed to trigger an output of 1, then output 0.

In the next section we will present key technical tools that we develop and apply to analyse this algorithm, as well as discuss the parameters τ_i = γ_i · np^i (where γ_i = Θ(1)) and α = Θ(1), and show that it indeed suffices for the problem we set out to solve. (We remark that in the accurate description of our construction (see Section 6.1), we capture all the information contained in the decision trees via tuples that contain, besides the query set, the assignment that led to it being queried as well as the output of the algorithm when it does so; the daisy partition lemma then allows us to partition these tuples based on the structure of the sets they contain. The extra condition for i = 1 is necessary to deal with the looser intersection bound t = n^{1/q} > n^{(i−1)/q} on D₁; we discuss this in the next section.)
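Schematically, N can be rendered as follows. This is a sketch under our own toy interfaces: `relevant_one_sets(i, kappa, sampled)` stands in for the whitebox check that a set of D_i is relevant for x_κ and that its completed local view leads M to output 1, and the enumeration over kernel assignments mirrors step 2:

```python
import itertools
import random
from collections import Counter

def assignments(kernel):
    # Enumerate all 2^{|K_i|} assignments to the kernel coordinates.
    coords = sorted(kernel)
    for values in itertools.product([0, 1], repeat=len(coords)):
        yield dict(zip(coords, values))

def discard_shared_petals(sets, kernel, alpha):
    # Keep only sets whose petal is shared by fewer than alpha of the sets.
    petals = [frozenset(S) - set(kernel) for S in sets]
    counts = Counter(petals)
    return [S for S, P in zip(sets, petals) if counts[P] < alpha]

def run_N(x_oracle, n, q, kernels, relevant_one_sets, alpha, gamma):
    p = n ** (-1.0 / (2 * q * q))
    sampled = {i: x_oracle(i) for i in range(n) if random.random() < p}
    if len(sampled) > 2 * p * n:      # step 1: abort on an atypically large sample
        return 0
    for i in range(1, q + 1):         # step 2: D_0 is disregarded
        tau = gamma[i] * n * p ** i   # threshold tau_i = gamma_i * n * p^i
        for kappa in assignments(kernels[i]):
            good = relevant_one_sets(i, kappa, sampled)  # step 2(a)
            if i == 1:
                good = discard_shared_petals(good, kernels[1], alpha)
            if len(good) > tau:       # step 2(b)
                return 1
    return 0                          # step 3
```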
2.4 Analysis using a volume lemma and the Hajnal–Szemerédi theorem

To establish the correctness of the aforementioned sample-based algorithm, we shall first need two technical lemmas about sampling daisies. We will then proceed to provide an outline of the analysis of our algorithm.

We sketch the proofs of two simple, yet important, technical lemmas that will be paramount to our analysis: (1) a lemma that allows us to transition from arguing about probability mass to arguing about combinatorial volume; and (2) a lemma that allows us to efficiently analyse sampling petals of daisies with complex intersection patterns.
The volume lemma.
We start by showing how to derive, from the probability mass of query sets (i.e., the probability under µ_x when the input is x), a bound on the volume that the union of these query sets covers. This is provided by the following volume lemma, which captures what is arguably the defining structural property of robust local algorithms.

Recall that the sample-based algorithm N uses the query sets of a (ρ, 0)-robust local algorithm M with error rate σ, which comprise the support of the distributions µ_x for all inputs x. Intuitively, these sets cannot be too concentrated (i.e., cover little volume), as otherwise slightly corrupting a word (in less than ρn coordinates) could require M to output differently, a behaviour that is prevented by the robustness of M. This intuition is captured by the following volume lemma.

Lemma 2.1 (Lemma 5.6, informally stated). Let x ∈ {0, 1}^n be a non-robust input (a 1-input in our case) and S′ be a subcollection of query sets in the support of µ_x. If S′ covers little volume (i.e., |∪S′| < ρn), then it has small weight (i.e., µ_x(S′) < σ).

We stress that the robustness of the 0-inputs yields the volume lemma for 1-inputs. (This is a rather subtle consequence of adaptivity; in the non-adaptive setting, a symmetric volume lemma for b-inputs can be shown using robustness on b-inputs, for b ∈ {0, 1}.) Note that the contrapositive of the volume lemma yields a desirable property for our sample-based algorithm: for any (non-robust) 1-input x, the query sets in supp(µ_x) must cover a large amount of volume, so that we can expect to sample many such sets.

The Hajnal–Szemerédi theorem.
Once we establish that a daisy covers a large volume, it remains to argue how this affects the probability of sampling petals from this large daisy, which is a key component of our algorithm. Recall that sampling the petals of a sunflower is trivial to analyse. However, with the complex intersection patterns that the petals of a daisy could have, we need a tool to argue about sampling petals of daisies.

First, recall that the daisy partition lemma ensures that each D_i is a t-daisy where t = n^{max{1,i−1}/q}, for all i. Observe that if D_i is a 1-daisy (which we call a simple daisy), that is, each point outside the kernel K_i is contained in at most one set S ∈ D_i, then the sets in D_i have pairwise disjoint petals, so sampling them is exactly like sampling petals of a sunflower: these petals are sampled independently from one another, and we expect their number to be concentrated around the expectation of p^i|D_i| (recall that all petals have size i).
Of course, there is no guarantee that D_i is a simple daisy, though we expect it to contain a simple daisy if it is large enough. Indeed, greedily removing intersecting sets yields a simple daisy of size Θ(|D_i|/t), but this does not suffice for our purposes, because most of the sets in D_i are discarded. Instead, we rely on the Hajnal–Szemerédi theorem to obtain a “lossless” transition from a t-daisy to a collection of simple daisies, from which sampling petals is easy. The Hajnal–Szemerédi theorem shows that for every graph G with m vertices and maximum degree ∆, and for any k ≥ ∆ + 1, there exists a k-colouring of the vertices of G such that every colour class has size either ⌊m/k⌋ or ⌈m/k⌉. By applying this theorem to the incidence graph of the petals of query sets (i.e., the graph with vertex set D_i where we place an edge between S and S′ when (S ∩ S′) \ K_i ≠ ∅), we obtain a partition of D_i into Θ(t) simple daisies of the same size (up to an additive error of 1), and hence obtain stronger sampling bounds.
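The following sketch (our own illustration) builds the petal-incidence graph and splits a daisy into colour classes with pairwise disjoint petals. A greedy pass already yields a proper colouring with at most ∆ + 1 classes; the Hajnal–Szemerédi theorem additionally guarantees that the classes can be made equitable (all of size ⌊m/k⌋ or ⌈m/k⌉), which is the “lossless” property the analysis needs:

```python
def split_into_simple_daisies(daisy, kernel, num_classes=None):
    """Partition the sets of a t-daisy into simple daisies (classes with
    pairwise disjoint petals) by greedily colouring the petal-incidence
    graph. Requires num_classes >= max degree + 1; note the greedy pass
    only illustrates properness, not the equitability that
    Hajnal-Szemeredi guarantees to exist."""
    petals = [set(S) - set(kernel) for S in daisy]
    m = len(daisy)
    adj = [[j for j in range(m) if j != i and petals[i] & petals[j]]
           for i in range(m)]
    degree = max((len(a) for a in adj), default=0)
    k = num_classes or degree + 1
    colour = [None] * m
    for i in range(m):
        used = {colour[j] for j in adj[i] if colour[j] is not None}
        colour[i] = next(c for c in range(k) if c not in used)
    return [[daisy[i] for i in range(m) if colour[i] == c] for c in range(k)]
```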
Note that the probability that N samples too many coordinates (and thus aborts) is exponentially small, hence we assume hereafter that this event did not occur.

We proceed to sketch the high-level argument for the correctness of the sample-based algorithm N, described in the previous section, making use of the tools above. This follows from two claims that hold with high probability: (1) correctness on non-robust inputs, which ensures that when x is a 1-input (i.e., non-robust), there exists i ∈ [q] such that when N considers the kernel assignment x|_{K_i} (which coincides with the input), the number of local views that lead to output 1 crosses the threshold τ_i; and (2) correctness on robust inputs, which, on the other hand, ensures that when x is a 0-input (i.e., robust), for every kernel K_i and every kernel assignment, the number of local views that lead to output 1 does not cross the threshold τ_i.

In the following, we remind the reader that when the sample-based algorithm N considers a particular assignment κ to a kernel and counts the number of local views that lead to output 1, the algorithm only considers views that are relevant to x_κ (the input x where the values of its kernel are replaced by κ); that is, local views that arise from some branch of a decision tree of the adaptive algorithm M that would have led it to query these local views. While N does not know all of x, after collecting samples from x and considering the kernel assignment κ, it can check which local views are relevant to x_κ (see discussion in Section 2.3).

Correctness on non-robust inputs. We start with the easier case, where x is a non-robust input (in our case, f(x) = 1). We show that there exists i ∈ [q] such that when N considers the kernel assignment x|_{K_i}, the number of local views that lead to output 1 crosses the threshold τ_i = γ_i · np^i. We begin by recalling that N disregards the daisy D₀, whose query sets are entirely contained in the kernel K₀, and arguing that this leaves sufficiently many query sets that lead to output 1. Indeed, while we could not afford this if D₀ was heavily queried by M given the 1-input x (i.e., if µ_x(D₀) is close to 1 − σ), an application of the volume lemma shows this is not the case: since |K₀| = o(n), this volume is smaller than ρn, implying µ_x(D₀) < σ for all 1-inputs x.

Apart from D₀, the query sets in the daisy D₁ whose petals are shared by at least α local views (for a parameter α to be discussed shortly) are also discarded, and we need to show that the loss incurred by doing so is negligible as well. This is accomplished with a slightly more involved application of the volume lemma: since the sets of D₁ have petals of size 1, the subcollection C ⊆ D₁ of sets that are discarded covers a volume of at most |K₁| + |C|/α.
For a sufficiently large choice of a constant α > 0, we have |C|/α ≤ ρn/2 (since C ⊆ supp(µ_x) and |supp(µ_x)| = Θ(n) by the assumption on the randomness complexity of M). Since |K₁| = o(n), and in particular |K₁| < ρn/2, applying the volume lemma to C shows that µ_x(C) < σ.

Finally, the total weight of all query sets in supp(µ_x) that lead to output 0 is at most σ (by definition of the error rate σ). This implies that the subcollection of supp(µ_x) that leads to output 1 and is not disregarded has weight at least 1 − σ − σ − σ = 1 − 3σ, and, for a sufficiently small value of σ (recall that σ = Θ(1/q)), we have 1 − 3σ ≥ 1/2. Hence, some daisy D_i has weight at least (1 − 3σ)/q ≥ σ and thus, by the volume lemma, covers at least ρn coordinates. Therefore, since |supp(µ_x)| = Θ(n) and µ_x is uniform over a multi-collection of query sets, this daisy contains Θ(n) “good” sets (that lead to output 1 and were not discarded). For the analysis, using the Hajnal–Szemerédi theorem, we partition the t-daisy D_i into Θ(t) simple daisies of size Θ(n/t). Each such simple daisy has disjoint petals of size i, so that Ω(np^i/t) petals will be sampled except with probability exp(−Ω(np^i/t)). Finally, this implies that, when N considers the kernel assignment x|_{K_i} to K_i, at least τ_i = γ_i · np^i petals are sampled except with probability

O(t) · exp(−Ω(np^i/t)) = exp(−Ω(n^{1 − max{1,i−1}/q − i/(2q²)})) = o(1).

(Recall that t = n^{max{1,i−1}/q} and the sampling probability is p = 1/n^{1/(2q²)}.)
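To see that the right-hand side is indeed vanishing, note that the exponent stays bounded away from zero uniformly in i; a routine check (worst case i = q, for q ≥ 2) gives:

```latex
\frac{np^{i}}{t}
  = n^{\,1-\frac{\max\{1,\,i-1\}}{q}-\frac{i}{2q^{2}}}
  \;\ge\; n^{\,1-\frac{q-1}{q}-\frac{q}{2q^{2}}}
  = n^{\frac{1}{q}-\frac{1}{2q}}
  = n^{\frac{1}{2q}},
\qquad\text{so}\qquad
O(t)\cdot e^{-\Omega(np^{i}/t)} \le \mathrm{poly}(n)\cdot e^{-\Omega(n^{1/(2q)})} = o(1).
```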
Correctness on robust inputs. It remains to show the harder case: when x is a robust input (in our case, f(x) = 0), all kernel assignments to all daisies will make the local views that lead to output 1 fail to cross the threshold. This case is harder since, by the asymmetry of N with respect to robust and non-robust inputs, here we need to prove a claim for all kernel assignments to all daisies, whereas in the non-robust case above we only had to argue about the existence of a single assignment to a single kernel.

We begin with a simple observation regarding D₀, then analyse the daisies D_i for i > 1, and deal with the more delicate case of D₁ last.
Recall that D₀ is disregarded by the algorithm N, and that by the asymmetry of N with respect to 0- and 1-inputs, this only makes the analysis on robust inputs easier. Indeed, N^x is correct when no kernel assignment to any of the D_i’s leads to crossing the threshold τ_i of local views on which the query-based algorithm M outputs 1. Thus, by discarding the query sets in D₀, we only increase the probability of not crossing these thresholds.

Fix i > 1 and an assignment κ to K_i. Then, the relevant sets that N may sample are those in the support of µ_{x_κ} (recall that x_κ is the word obtained by replacing the bits of x whose coordinates lie in K_i by κ). Since |K_i| = o(n), it follows by the robustness of x that x_κ is ρ-close to x, and thus the weight of the collection O ⊆ supp(µ_{x_κ}) of query sets that lead to output 1 is at most σ. For the sake of this technical overview, we focus on the worst-case scenario, where all of these “bad” sets are in the daisy D_i (i.e., O ⊆ D_i) and |O| is as large as possible (i.e., |O| = Θ(n)), and show that even that will not suffice to cross the threshold τ_i.

By the randomness complexity of the algorithm M, the size of O is Θ(n). We apply the Hajnal–Szemerédi theorem and partition O into Θ(t) simple daisies of size Θ(n/t). Recall that the petals of query sets in D_i have size i and are disjoint; therefore, each of these simple daisies has Ω(np^i/t) sampled petals with probability only exp(−Ω(np^i/t)). By an averaging argument, the total number of sampled petals crosses the threshold τ_i = γ_i · np^i with probability at most

O(t) · exp(−Ω(np^i/t)) = exp(−Ω(n^{1 − (i−1)/q − i/(2q²)})) = exp(−Ω(n^{1 − i/q + 1/(2q)}));

recall that t = n^{(i−1)/q} and the sampling probability is p = 1/n^{1/(2q²)}, so p^i ≥ 1/n^{1/(2q)}. (We stress that the constants hidden by the Ω-notation are smaller in the robust case than in the non-robust one; this is what allows us to show that the total number of queried petals is at least τ_i = γ_i · np^i with probability exp(−Ω(np^i/t)) in the robust case, but 1 − exp(−Ω(np^i/t)) in the non-robust case, for the same constant γ_i.) Since the daisy partition lemma yields a bound of O(n^{1−i/q}) for the size of the kernel K_i, a union bound over all 2^{|K_i|} kernel assignments ensures the threshold is crossed with probability o(1).

We now analyse D₁, and stress that the need for a separate analysis arises from the looser intersection bound on this daisy: D₁ is a t-daisy with t = n^{1/q}, whereas for all other i the bound is t = n^{(i−1)/q}. This implies that there is no “gap” between the expected number of queried petals in each simple daisy, Θ(np/t) = Θ(n^{1−1/q−1/(2q²)}) = o(n^{1−1/q}), and the size of the kernel, |K₁| = O(n^{1−1/q}), so a union bound as in the case i > 1 does not suffice. This is precisely what the extra condition in the description of N on D₁ is designed to address: the query sets O ⊆ supp(µ_{x_κ}) that lead to output 1 will only be counted by N if their petals are shared by at most α query sets. (Recall that we set the sampling probability to be p := 1/n^{1/β} with β = 2q². This choice is justified as follows: the union bound requirement, that 2^{|K_i|} multiplied by the probability of crossing the threshold be small, translates into 1/q − i/β > 0 for every i ∈ [q]; taking i = q yields β = Ω(q²).) Then, by the Hajnal–Szemerédi theorem, we partition O into α = Θ(1) simple daisies of size Θ(n). Each simple daisy will have more than τ₁/α = Θ(np) queried petals with probability at most exp(−Ω(np)), so that the total number of such petals across all simple daisies exceeds τ₁ with probability at most exp(−Ω(np)). This provides the necessary gap: as Θ(np) = Ω(n^{1−1/(2q²)}) and |K₁| = O(n^{1−1/q}), a union bound over all 2^{|K₁|} assignments to K₁ shows the threshold is crossed with probability o(1). This concludes the high-level overview of the proof of correctness (see Section 6.2 for the full proof).

We conclude the technical overview by briefly sketching how to derive our main applications from Theorem 1; see Section 1.3 for details.
We conclude the technical overview by briefly sketching how to derive our main applications from Theorem 1; see Section 1.3 for details. Recall that above we assumed that the local algorithm M we start with had suitable error rate and randomness complexity; the results that follow remove this assumption at the cost of an increase in the query complexity of M, from q to O(q log q).

Query-to-sample tradeoffs for adaptive testers.
The application to property testing is an immediate corollary of Theorem 1. Namely, an ε-tester is an (ε, 0)-robust local algorithm for computing the partial function that equals 1 on the property and 0 on inputs that are 2ε-far from it. Therefore, by Theorem 1, any ε-tester making q adaptive queries can be transformed into a sample-based 2ε-tester with sample complexity n^{1−1/O(q² log² q)} (see Corollary 7.1 for details). We remark that in Section 7.1 we show an additional application to multi-testing.

Relaxed LDC lower bound.
By a straightforward extension of our definition of robust local algorithms that allows for outputting a special failure symbol ⊥, our framework captures relaxed LDCs (see Section 7.2 for details). We remark that, although standard LDCs have two-sided robustness, relaxed LDCs only admit a weaker, one-sided robustness guarantee. Applying Theorem 1 once for each bit to be decoded, we obtain a global decoder that decodes uncorrupted codewords with n^{1−1/O(q² log² q)} queries; by a simple information-theoretical argument, we obtain a rate lower bound of n = k^{1+1/O(q² log² q)} for relaxed LDCs (see Corollaries 7.8 and 7.9).
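The information-theoretical step can be spelled out in one line (a sketch, with constants suppressed): a global decoder that recovers every bit of x ∈ {0,1}^k from an uncorrupted codeword of length n queries at most s = n^{1−1/O(q² log² q)} coordinates, and since all k message bits must be determined by the queried coordinates,

k ≤ n^{1−1/O(q² log² q)}, i.e., n ≥ k^{1+1/O(q² log² q)}.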
Tightness of the separation between MAPs and testers. Theorem 1 applies to the setting of Merlin–Arthur proofs of proximity (MAPs) via a description of MAPs as coverings by partial testers, which are, in turn, robust local algorithms. In Section 7.3, we show that the existence of an adaptive MAP for a property Π with query complexity q and proof length p implies the existence of a sample-based tester for Π with sample complexity p·n^{1−1/O(q² log² q)}. This implies that there exists no property admitting a MAP with query complexity q = O(1) and logarithmic proof length (and, in fact, much longer proof length) that requires at least n^{1−1/ω(q² log² q)} queries for testers without a proof, showing the (near) tightness of the separation between the power of MAPs and testers from [GR18].

Optimality of Theorem 1.
Interestingly, as a direct corollary of the tightness of the aforementioned separation between MAPs and testers of [GR18], we obtain that the general transformation in Theorem 1 is optimal, up to a quadratic gap in the dependency of the sample complexity on the query complexity. This follows simply because a transformation with smaller sample complexity could have been applied to the MAP construction underlying the MAPs-vs-testers separation, yielding a tester with query complexity that contradicts the lower bound in that result (see Theorem 7.12).
Preliminaries

Throughout this paper, constants are denoted by Greek lowercase letters, such as α, β and γ; capital letters of the Latin alphabet generally denote sets (e.g., P and S) or algorithms (e.g., M and N). For each n ∈ ℕ, we denote by [n] the set {1, ..., n}. Sets S such that |S| = q are called q-sets. The complement of S is denoted S̄. We will use Σ to denote an alphabet.

As integrality issues do not substantially change any of our results, equality between an integer and an expression (that may not necessarily evaluate to one) is assumed to be rounded to the nearest integer.

Multi-sets of sets.
To prevent ambiguity, we call (multi-)sets comprised of objects other than points (such as sets, trees or tuples) (multi-)collections in this work, and denote them by calligraphic capitals such as D, S and T.

Distance and proximity.
We denote the absolute distance between two strings x, y ∈ Σ^n (over the alphabet Σ) by Δ̄(x, y) := |{i ∈ [n] : x_i ≠ y_i}| and their relative distance (or simply distance) by Δ(x, y) := Δ̄(x, y)/n. If Δ(x, y) ≤ ε, we say that x is ε-close to y, and otherwise we say that x is ε-far from y. The (Hamming) ball of radius ε around x is B_ε(x) = {y ∈ Σ^n : y is ε-close to x}, and the ball B_ε(S) around a set S ⊆ Σ^n is the union of B_ε(x) over all x ∈ S.
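For concreteness, a minimal Python sketch of these conventions (our own illustration; all names are ours):

    def abs_dist(x, y):
        # Absolute distance: the number of coordinates on which x and y differ.
        assert len(x) == len(y)
        return sum(1 for a, b in zip(x, y) if a != b)

    def rel_dist(x, y):
        # Relative distance Delta(x, y) = abs_dist(x, y) / n.
        return abs_dist(x, y) / len(x)

    def is_close(x, y, eps):
        # x is eps-close to y iff Delta(x, y) <= eps; otherwise x is eps-far from y.
        return rel_dist(x, y) <= eps

For example, is_close("0110", "0100", 0.25) evaluates to True, since the two strings differ in exactly one of four coordinates.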
Probability. The uniform distribution over a set S is denoted U_S. We write X ∼ μ to denote a random variable X with distribution μ; the probability of the event [X = s] is interchangeably referred to as its weight, and is denoted Pr[X = s] (the underlying distribution will be clear from context), and the expected value of X is E[X]. We also write |μ| as shorthand for |supp(μ)|, the support size of μ. Below is the version of the Chernoff bound that will be used in this work.

Lemma 3.1 (Chernoff bound). Let X_1, ..., X_k be independent Bernoulli random variables distributed as X. Then, for every δ ∈ [0, 1],

Pr[(1/k)·Σ_{i=1}^k X_i ≥ (1 + δ)·E[X]] ≤ e^{−δ²kE[X]/3}  and  Pr[(1/k)·Σ_{i=1}^k X_i ≤ (1 − δ)·E[X]] ≤ e^{−δ²kE[X]/2}.

Algorithms. We denote by M^x(z) the output of algorithm M given direct access to input z and query access to string x. Probabilistic expressions that involve a randomised algorithm M are taken over the inner randomness of M (e.g., when we write Pr[M^x(z) = s], the probability is taken over the coin tosses of M). The number of coin tosses M makes is its randomness complexity. The maximal number of queries M performs, over all strings x and outcomes of its coin tosses, is interchangeably referred to as its query complexity or locality q. When q = o(n), where n is the length of the string x, we say M is a (q-)local algorithm. If the queries performed by M are determined in advance (so that no query depends on the result of any other query), M is non-adaptive; otherwise, it is adaptive. Finally, if M queries each coordinate independently with some probability p, we say it is a sample-based algorithm. Since we will want an absolute bound on the sample complexity (i.e., the number of coordinates sampled) of sample-based algorithms, we allow them to cap the number of coordinates they sample.

Notation.
We will use the following, less standard, notation. An assignment to a set S (over alphabet Σ) is a function a : S → Σ, which may be equivalently seen as a vector in Σ^{|S|} whose coordinates correspond to elements of S in increasing order. Its restriction to P ⊆ S is denoted a|_P. If x is an assignment to [n] (i.e., x ∈ Σ^n) and κ is an assignment to P ⊆ [n], the partially replaced assignment x_κ ∈ Σ^n is that which coincides with κ on P and with x on [n] \ P (i.e., x_κ|_P = κ and x_κ|_{[n]\P} = x|_{[n]\P}).

Adaptivity.
Adaptive local algorithms are characterised in two equivalent manners: a standard description via decision trees and an alternative that makes more direct use of set systems. Let M be a q-local algorithm for a decision problem (i.e., which outputs an element of {0, 1}) with oracle access to a string over alphabet Σ and no access to additional input (what follows immediately generalises by enumerating over explicit inputs).

The behaviour of M is completely characterised by a collection {(T_s, s) : s ∈ {0,1}^r} of decision trees, where r is the randomness complexity of M; the trees are |Σ|-ary, have depth q, edges labeled by elements of Σ, inner nodes labeled by elements of [n] and leaves labeled by elements of {0, 1}. The execution of M^x proceeds as follows. It (1) flips r random coins, obtains a string s ∈ {0,1}^r and chooses the decision tree T_s to execute; (2) beginning at the root, for q steps, queries the coordinate given by the label at the current vertex and follows the edge whose label is the queried value; and (3) outputs the bit given by the label of the leaf thus reached.

Although this offers a complete description of an adaptive algorithm, we choose to use an alternative that is amenable to daisy lemmas (see Section 5.1). This is obtained by describing each decision tree by the collection of its branches. More precisely, from {(T_s, s) : s ∈ {0,1}^r} we construct {(S_{st}, a_{st}, b_{st}, s, t) : s ∈ {0,1}^r, t ∈ [|Σ|^q]}, where t identifies which branch the tuple is obtained from. S_{st} is the q-set queried by the t-th branch of T_s, while a_{st} is the assignment to S_{st} defined by the edges of this branch and b_{st} ∈ {0, 1} is the output at its leaf. We remark that the decision trees may be reconstructed from their branches, so that this description is indeed equivalent (though we will not need this fact).

We now define the distribution of an algorithm, as well as its distribution under a fixed input.

Definition 3.2 (Induced distribution). Let M be a q-local algorithm with randomness complexity r described by the collection of decision trees {T_s : s ∈ {0,1}^r}. The distribution μ̃_M of M is given by sampling s ∈ {0,1}^r uniformly at random and taking T_s.

Fix an arbitrary input x to M and, for all s ∈ {0,1}^r, let (S_{st}, x|_{S_{st}}, b_{st}, s, t) be the unique tuple defined by the branch of T_s followed on input x. We may thus discard t, and the tuple (S_s, x|_{S_s}, b_s, s) is well defined. The distribution μ^M_x is given by sampling s ∈ {0,1}^r uniformly at random and taking the set S_s (the first element of the tuple (S_s, x|_{S_s}, b_s, s)).

We note that the contents of the tuple (S_s, x|_{S_s}, b_s, s) describe exactly how M will behave on input x and random string s.
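To illustrate the branch representation with a hypothetical encoding (ours; the paper fixes no data format), the following Python sketch stores one decision tree over the binary alphabet as nested dictionaries and recovers the tuple (S_s, x|_{S_s}, b_s) induced by an input x:

    # A node is a leaf {'out': b} or an inner node {'query': i, 0: child, 1: child}.
    def run_tree(tree, x):
        queried, answers = [], []
        node = tree
        while 'out' not in node:
            i = node['query']          # coordinate labelling the current vertex
            queried.append(i)
            answers.append(x[i])
            node = node[x[i]]          # follow the edge labelled by the answer
        return set(queried), tuple(answers), node['out']

    # Example: query coordinate 0; if it equals 1, also query coordinate 2.
    t = {'query': 0,
         0: {'out': 0},
         1: {'query': 2, 0: {'out': 0}, 1: {'out': 1}}}
    S, a, b = run_tree(t, x=[1, 0, 1])   # S = {0, 2}, a = (1, 1), b = 1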
Robust local algorithms

We now introduce robust local algorithms, a natural notion that captures a wide class of sublinear-time algorithms, ranging from property testing to locally decodable codes. Our main result (Theorem 1) holds for any robust local algorithm, and indeed, we obtain our results for coding theory, testing, and proofs of proximity as direct corollaries.

While local algorithms, i.e., algorithms that only probe a minuscule portion of their input, are very well studied, their definition is typically context-dependent: they are required to perform different tasks (e.g., test, self-correct, decode, perform a local computation) under different promises (proximity to encoded inputs, being either "close" to or "far" from sets, etc.).

Indeed, except for degenerate cases, having a promise on the input is necessary for algorithms that only make a sublinear number of queries to their input. We capture this phenomenon by defining a robustness condition, which is shared by most natural sublinear-time algorithms. Loosely speaking, we say that a local algorithm is robust if its output remains stable in the presence of small perturbations.

In the next subsection, we provide a precise definition of robust local algorithms. Then, in the subsequent subsection, we show how this notion captures property testing, locally testable codes, locally decodable and correctable codes, PCPs of proximity, and other local algorithms.

We begin by defining local algorithms, which are probabilistic algorithms that receive query access to an input x and an explicit parameter z, and are required to compute a partial function f(z, x) (which represents a promise problem) by only making a small number of queries to the input x.

Definition 4.1 (Local algorithms). For sets Σ, Z, let f be a partial function f : Z × Σ^n → {0, 1}. A q-local algorithm M for computing f with error rate σ receives query access to an input x ∈ Σ^n and explicit access to z ∈ Z. The algorithm makes at most q queries to x and satisfies

Pr[M^x(z) = f(z, x)] ≥ 1 − σ.

The parameter q is called the query complexity of M, to which we also refer as locality. Throughout, when we refer to a local algorithm, we mean a q-local algorithm with q = o(n). Another important parameter is the randomness complexity of M, defined as the maximal number of coin tosses it makes over all z ∈ Z and x ∈ Σ^n.

The following definition formalises the aforementioned natural notion of robustness, which is the structural property that underlies local computation. Loosely speaking, a local algorithm M is robust if its outputs are "stable" in small neighbourhoods of its 0-inputs or 1-inputs.

Definition 4.2 (Robustness). Let ρ > 0. A local algorithm M for computing f is ρ-robust at point x ∈ Σ^n if Pr[M^w(z) = f(z, x)] ≥ 1 − σ for all w ∈ B_ρ(x) and z ∈ Z. We say that M is (ρ_0, ρ_1)-robust if, for every z ∈ Z and b ∈ {0, 1}, M is ρ_b-robust at every point x such that f(z, x) = b.

If a local algorithm M is (ρ_0, ρ_1)-robust and max{ρ_0, ρ_1} = Ω(1) (independent of n), we simply call M robust. Note that non-trivial robustness is only possible because f is a partial function; that is, the local algorithm M solves a promise problem. Indeed, for every parameter z, the algorithm is promised to receive either an input in P_{z,0} := {x ∈ Σ^n : f(z, x) = 0}, on which it should output 0, or an input in P_{z,1} := {x ∈ Σ^n : f(z, x) = 1}, on which it should output 1.
Remark 4.3 (One-sided robustness). For our main result (Theorem 1), it suffices to have one-sided robustness, i.e., (ρ_0, ρ_1)-robustness where only one of ρ_0, ρ_1 is non-zero. For example, in the setting of property testing with proximity parameter ε we only have (ε, 0)-robustness. For brevity, when a single robustness parameter ρ is relevant, we simply call the algorithm ρ-robust.

Remark 4.4 (Larger alphabets). The definition of local algorithms can be further generalised to a constant-size output alphabet Γ, in which case the partial function is f : Z × Σ^n → Γ; we assume Γ = {0, 1} for simplicity of exposition, but note that our results extend to larger output alphabets in a straightforward manner.

We proceed to show how to capture various well-studied families of sublinear-time algorithms (such as testers, local decoders, and PCPs) using the notion of robust local algorithms.

Property testing [RS96, GGR98] deals with probabilistic algorithms that solve approximate decision problems by making a small number of queries to their input. More specifically, a tester is required to decide whether its input is in a set Π (i.e., has the property Π) or whether it is ε-far from every input in Π.

Definition 4.5 (Testers). An ε-tester with error rate σ for a property Π ⊆ Σ^n is a probabilistic algorithm T that receives query access to a string x ∈ Σ^n. The tester T performs at most q = q(ε) queries to x and satisfies the following two conditions.

1. If x ∈ Π, then Pr[T^x = 1] ≥ 1 − σ.

2. For every x that is ε-far from Π (i.e., x ∉ B_ε(Π)), we have Pr[T^x = 0] ≥ 1 − σ.

We are interested in the regime where ε = Ω(1) (i.e., ε is a fixed constant independent of n), and assume this to be the case in the remainder of this discussion.

Note that testers are not robust with respect to inputs in the property Π, as a single change to an input x ∈ Π could potentially lead to an input outside Π. Moreover, an ε-tester does not immediately satisfy one-sided robustness either, as inputs on the boundary of the ε-neighbourhood of Π are not robust (see Fig. 1b). However, by increasing the value of the proximity parameter by a factor of 2, we can guarantee that every point that is 2ε-far from Π satisfies the robustness condition. The following claim formalises this statement and shows that testers can be cast as robust local algorithms.

Claim 4.6. An ε-tester T for property Π ⊆ Σ^n is an (ε, 0)-robust local algorithm, with the same parameters, for computing the function f defined as follows:

f(x) = 1, if x ∈ Π;  f(x) = 0, if x is 2ε-far from Π.

Proof.
By definition, the tester T is a local algorithm for computing f; denote its error rate by σ. We show it satisfies (one-sided) robustness with respect to f. Let x ∈ Σ^n be an input that is 2ε-far from Π, and consider y ∈ B_ε(x). By the triangle inequality, we have that y is ε-far from Π. Thus, Pr[T^y = 0] ≥ 1 − σ, and so T is an (ε, 0)-robust local algorithm for computing f.

Remark 4.7 (Robustness vs proximity tradeoff). The notion of a tester with proximity parameter ε and that of an ε-robust tester with proximity parameter 2ε coincide. Moreover, there is a tradeoff between the size of the promise captured by the partial function f and the robustness parameter ρ: taking any ε′ > ε, the tester T is a ρ-robust local algorithm with ρ = ε′ − ε for computing the function

f(x) = 1, if x ∈ Π;  f(x) = 0, if x is ε′-far from Π.

As ε′ increases, the robustness parameter ρ increases and the size of the domain of definition of f decreases. In particular, taking ε′ = ε makes T a (0, 0)-robust local algorithm (i.e., a standard ε-tester with no robustness guarantee) for computing f.
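As a toy illustration of the promise structure in Claim 4.6 (our own sketch, reusing rel_dist from the earlier snippet and representing Π as an explicit finite list, which is feasible only for illustration):

    def dist_to_property(x, Pi):
        # Relative distance from x to the property Pi (a finite list of strings).
        return min(rel_dist(x, y) for y in Pi)

    def f_tester(x, Pi, eps):
        # The partial function of Claim 4.6: defined on Pi and on inputs
        # that are 2*eps-far from Pi; undefined (None) on the gap in between.
        d = dist_to_property(x, Pi)
        if d == 0:
            return 1
        if d > 2 * eps:
            return 0
        return None  # outside the promise: any output is acceptable

Inputs in the gap 0 < d ≤ 2ε are exactly those on which the (ε, 0)-robustness of the tester is not guaranteed.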
We consider error-correcting codes that admit local algorithms for various tasks, such as testing, decoding, correcting, and computing functions of the message. Recall that a code C : {0,1}^k → {0,1}^n is an injective mapping from messages of length k to codewords of blocklength n. The rate of the code C is defined as k/n. The relative distance of the code is the minimum, over all distinct messages x, y ∈ {0,1}^k, of Δ(C(x), C(y)). We shall sometimes slightly abuse notation and use C to denote the set of all of its codewords {C(x) : x ∈ {0,1}^k} ⊂ {0,1}^n. Note that we focus on binary codes, but remind that the extension to larger alphabets is straightforward. In the following, we show how to cast the prominent notions of local codes as robust local algorithms.

4.3.1 Locally testable codes

Locally testable codes (LTCs) [GS06] are codes that admit algorithms that distinguish codewords from strings that are far from being valid codewords, using a small number of queries.

Definition 4.8 (Locally Testable Codes (LTCs)). A code C : {0,1}^k → {0,1}^n is locally testable, with respect to proximity parameter ε and error rate σ, if there exists a probabilistic algorithm T that makes q queries to a purported codeword w such that:

1. If w = C(x) for some x ∈ {0,1}^k, then Pr[T^w = 1] ≥ 1 − σ.

2. For every w that is ε-far from C, we have Pr[T^w = 0] ≥ 1 − σ.

Note that the algorithm T that an LTC admits is simply an ε-tester for the property of being a valid codeword of C. Thus, by Claim 4.6, we can directly cast T as a robust local algorithm.

4.3.2 Locally decodable codes

Locally decodable codes (LDCs) [KT00] are codes that admit algorithms for decoding each individual bit of the message of a moderately corrupted codeword by only making a small number of queries to it.
Definition 4.9 (Locally Decodable Codes (LDCs)). A code C : {0,1}^k → {0,1}^n is locally decodable with decoding radius δ and error rate σ if there exists a probabilistic algorithm D that, given an index i ∈ [k], makes q queries to a string w promised to be δ-close to a codeword C(x), and satisfies

Pr[D^w(i) = x_i] ≥ 1 − σ.

Note that local decoders are significantly different from local testers, and from testing in general. Firstly, decoders are given a promise that their input is close to a valid codeword (whereas testers are promised to either receive a perfectly valid input, or one that is far from being valid). Secondly, a decoder is given an index as an explicit parameter and is required to perform a different task (decode a different bit) for each parameter (see Fig. 1a).

Nevertheless, local decoders can also be cast as robust local algorithms. In fact, unlike testers, they satisfy two-sided robustness (i.e., both 0-inputs and 1-inputs are robust). In the following, note that since inputs near the boundary of the decoding radius are not robust, we reduce the decoding radius by a factor of 2.
Claim 4.10.
A local decoder D with decoding radius δ for the code C : {0,1}^k → {0,1}^n is a (δ/2, δ/2)-robust local algorithm for computing the function f defined as follows:

f(z, w) = x_z, if x ∈ {0,1}^k is such that w is δ/2-close to C(x).

Proof.
Take any w ∈ {0,1}^n that is δ/2-close to a codeword C(x). Then, (z, w) is in the domain of definition of f for every explicit input z ∈ [k]. Now let w′ ∈ B_{δ/2}(w) and note that w′ is still within the decoding radius of D. Hence, the decoder D outputs x_z with probability at least 1 − σ, as required. Moreover, this holds regardless of whether x_z = 0 or x_z = 1, and so D is (δ/2, δ/2)-robust.

Remark 4.11 (Robustness vs decoding radius tradeoff). A local decoder has decoding radius δ if and only if it is δ/2-robust with decoding radius δ/2, and a tradeoff between promise size and robustness parameter likewise holds in this case: for any δ′ < δ, the decoder D is a (δ − δ′, δ − δ′)-robust algorithm for the restriction of f to the δ′-neighbourhood of the code C. In particular, D is a (δ, δ)-robust algorithm with the domain of f defined to be the code C itself.

4.3.3 Relaxed locally decodable codes
Relaxed locally decodable codes (relaxed LDCs) [BGH+04] are codes that admit a natural relaxation of the notion of local decoding, in which the decoder is allowed to output a special abort symbol ⊥ on a small fraction of indices, indicating that it detected an inconsistency, while never erring with high probability.

Definition 4.12 (Relaxed LDC, informally stated). A relaxed LDC is a code that admits a decoder D with the same correctness requirement as a standard local decoder, except that if w is a corrupted codeword, there exists a subset I_w ⊆ [k] of size ξk, for a constant ξ, where the requirement for i ∈ I_w is relaxed to

Pr[D^w(i) ∈ {x_i, ⊥}] ≥ 1 − σ.

A precise definition of relaxed LDCs is stated in Section 7.2, where we apply our transformation to obtain improved rate lower bounds on relaxed LDCs. Note that, strictly speaking, the special abort symbol means that relaxed local decoders do not fully fit Definition 4.1, as the input-output mapping f becomes one-to-many. Nevertheless, a trivial generalisation of local algorithms, which allows an additional abort symbol, enables us to capture relaxed LDCs as robust local algorithms as well. We show this in Section 7.2.

4.3.4 Locally correctable codes

The notion of locally correctable codes (LCCs) is closely related to that of LDCs, except that rather than admitting an algorithm that can decode any individual message bit, LCCs admit an algorithm that can correct any individual codeword bit of a moderately corrupted codeword.
Definition 4.13 (Locally Correctable Codes (LCCs)). A code C : {0,1}^k → {0,1}^n is locally correctable with correcting radius δ and error rate σ if there exists a probabilistic algorithm D that, given an index j ∈ [n], makes q queries to a string w promised to be δ-close to a codeword C(x) and satisfies

Pr[D^w(j) = C(x)_j] ≥ 1 − σ.

A straightforward adaptation of Claim 4.10 yields the following claim.
Claim 4.14.
A local corrector D with correcting radius δ for the code C : {0,1}^k → {0,1}^n is a (δ/2, δ/2)-robust local algorithm for computing the function f defined as follows:

f(z, w) = C(x)_z, if x ∈ {0,1}^k is such that w is δ/2-close to C(x).

4.3.5 Universal locally testable codes

Universal locally testable codes (universal LTCs) [GG18] are codes that admit local tests for membership in numerous possible subcodes, allowing for testing properties of the encoded message.
Definition 4.15 (Universal LTCs). A universal LTC C : {0,1}^k → {0,1}^n for a family of functions F = {f_i : {0,1}^k → {0,1}}_{i ∈ [M]} is a code such that, for every i ∈ [M], the subcode {C(x) : f_i(x) = 1} is locally testable.

Note that universal LTCs trivially generalise LTCs, as well as generalise relaxed LDCs (see details in [GG18, Appendix A]). Since universal testers can be viewed as algorithms that receive an explicit parameter i ∈ [M] and invoke an ε-tester for the property {C(x) : f_i(x) = 1}, by applying Claim 4.6 to each value of the parameter i they can be cast as robust local algorithms.

4.4 PCPs of proximity

PCPs of proximity (PCPPs) [BGH+04] are probabilistically checkable proofs wherein the verifier is given query access not only to the proof, but also to the input. The PCPP verifier is then required to probabilistically check whether the statement is correct by only making a constant number of queries to both input and proof.
Definition 4.16.
A PCP of proximity (PCPP) for a language L, with respect to proximity parameter ε and error rate σ, consists of a probabilistic algorithm V, called the verifier, that receives query access to both an input x ∈ Σ^n and a proof π ∈ {0,1}^m. The algorithm V is allowed to make q queries to x and to π, and satisfies the following:

1. for every x ∈ L there exists a proof π such that Pr[V^{(x,π)} = 1] ≥ 1 − σ; and

2. for every x that is ε-far from L and every proof π, it holds that Pr[V^{(x,π)} = 0] ≥ 1 − σ.

We observe that PCPs of proximity with canonical proofs [GS06] (i.e., such that the verifier rejects statement-proof pairs that are far from being a valid statement with a valid proof for it) admit verifiers that are robust local algorithms. Using the tools of [DGG19], who show that PCPPs can be endowed with the canonicity property at the cost of a polynomial blowup in proof length, we can cast general PCPPs as robust local algorithms.
Claim 4.17.
A PCPP for a language L ⊆ Σ^n, with respect to proximity parameter ε > 0, can be transformed into a PCPP for L, with respect to proximity parameter 2ε, whose verifier is an (ε, 0)-robust local algorithm with the same query complexity and error rate.

Sketch of proof. Let V be a PCPP verifier with proximity parameter ε and soundness error σ for the language L ⊆ Σ^n that makes at most q queries to its input and proof. By [DGG19, Section 3], there exists a PCPP verifier V′ for L with the same parameters, except for a proof length that is polynomial in the proof length of V, such that the soundness of V′ is with respect to a set of canonical proofs {Π(x)}_{x ∈ L}; that is, V′ accepts pairs (x, y) where x ∈ L and y ∈ Π(x), and rejects pairs (x, y) that are far from being such that x ∈ L and y ∈ Π(x).

Observe that V′ is an (ε, 0)-robust local algorithm with error rate σ and query complexity q for computing the function f defined as follows:

f(x, y) = 1, if x ∈ L and y ∈ Π(x);  f(x, y) = 0, if (x, y) is 2ε-far from every pair (x′, y′) such that x′ ∈ L and y′ ∈ Π(x′).

By definition, V′ is a q-local algorithm for computing f with the parameters in the statement. If the pair (x, y) is 2ε-far from being a valid pair and (x′, y′) is ε-close to (x, y), then (x′, y′) is ε-far from valid, and thus Pr[V′^{(x′,y′)} = 0] ≥ 1 − σ.

Non-interactive proofs of proximity.
MA proofs of proximity (MAPs) [GR18, FGL14] are proof systems that can be viewed as a property testing analogue of NP proofs. The setting of MAPs is very similar to that of PCPPs, with the distinction that the purported proof is of sublinear size and is given explicitly, i.e., the MAP verifier can read the entire proof. We remark that an equivalent description of a MAP as a covering by partial testers [FGL14] is used in this work, so that every fixed proof string defines a tester, and Claim 4.6 applies. We cover this in Section 7.3.
Technical lemmas
In this section we provide an arsenal of technical tools for analysing robust local algorithms, which we will then use to prove our main result in Section 6. We present the tools in order of importance, starting with the most central lemmas. Specifically, in Section 5.1 we discuss the notion of relaxed sunflowers that we shall need, called daisies, then state and prove a daisy partition lemma for multi-collections of sets. In Section 5.2, we rely on the Hajnal–Szemerédi theorem to derive a sampling lemma for daisies. In Section 5.3, we prove a simple yet vital volume lemma for robust local algorithms, which will be used throughout our analysis. Finally, in Section 5.4 we adapt generic transformations (amplification and randomness reduction) to our setting of robust local algorithms.
We discuss the central technical tool used in the transformation to sample-based algorithms, which is a relaxation of combinatorial sunflowers, referred to as daisies [FLV15, GL20]. We extend the definition of daisies to multi-sets, then state and prove the particular variant of a daisy lemma that we shall need.
Definition 5.1 (Daisy). Suppose S is a multi-collection of subsets of [n] (subsets may repeat). S is an h-daisy (where h : ℕ → ℕ) with petals of size j and kernel K ⊆ [n] if the following holds. Every S ∈ S has a petal S \ K with |S \ K| = j and, for every k ∈ [j], there exists a subset P_k ⊆ S \ K with |P_k| ≥ k whose elements are contained in at most h(k) sets from S. A daisy with pairwise disjoint petals (an h-daisy with h ≡ 1) is referred to as a simple daisy.

We remark that the notion of a daisy relaxes the standard definition of a sunflower in two ways: (1) the kernel is not required to equal the pairwise intersection of all sets in the collection; rather, its structure is unconstrained; and (2) the petals P = {S \ K : S ∈ D} need not be pairwise disjoint, but rather each point outside of the kernel can be contained in at most h(j) sets of D; see Fig. 2b. Note that Definition 5.1 applies to multi-sets, as opposed to sunflowers (for which pairwise disjointness disallows multiple copies of a same set). These relaxations, unlike in the case of sunflowers, allow us to arbitrarily partition any collection of subsets into a collection of daisies with strong structural properties, as Lemma 5.2 shows.

Lemma 5.2 (Daisy partition lemma for multi-collections). Let S be a multi-collection of q-sets of [n], and define the function h : ℕ → ℕ as follows: h(k) = n^{max{1,k−1}/q}. Then, there exists a collection {D_j : 0 ≤ j ≤ q} such that

1. {D_j} is a partition of S, i.e., ⋃_{j=0}^q D_j = S and D_j ∩ D_k = ∅ when j ≠ k.

2. For every 0 ≤ j ≤ q, there exists a set K_j ⊆ [n] of size |K_j| ≤ q·|S|·n^{−max{1,j}/q} such that D_j is an h-daisy with kernel K_j and petals of size j. Moreover, the kernels form an incidence chain ∅ = K_q ⊆ K_{q−1} ⊆ ⋯ ⊆ K_1 ⊆ K_0.

Proof. We construct the collections {D_j : 0 ≤ j ≤ q} in a greedy iterative manner, as follows.

1. Define S_0 := S.

2. Inductively define, for each 0 ≤ j ≤ q − 1:

(a) Kernel construction:
Define K_j as the set of points in [n] that are contained in at least h(j + 1) sets from S_j.

(b) Daisy construction:
Set D_j to be all the sets S ∈ S_j such that |S \ K_j| = j.

(c) Set S_{j+1} to be S_j \ D_j.

3. Finally, set D_q = S_q and K_q = ∅.

We now prove that this construction yields daisies with the required properties. By definition, S_q ⊆ S_{q−1} ⊆ ⋯ ⊆ S_0 and D_j ⊆ S_j for all j. Since D_{j−1} ∩ S_j = ∅ for all j ∈ [q], it follows that D_j ∩ D_k = ∅ when j ≠ k. Also, since S = S_q ∪ ⋃_{0 ≤ j < q} D_j and D_q = S_q, the collection {D_j : 0 ≤ j ≤ q} is a partition of S.

For the size bound on the kernels, note that each point of K_j is contained in at least h(j + 1) sets of S_j, while the total number of point-set incidences in S_j is at most q·|S_j| ≤ q·|S| (as all sets are q-sets). Therefore |K_j|·h(j + 1) ≤ q·|S|, i.e., |K_j| ≤ q·|S|·n^{−max{1,j}/q}. Moreover, since h(j + 1) ≤ h(j + 2) and S_{j+1} ⊆ S_j, every point contained in at least h(j + 2) sets from S_{j+1} is also contained in at least h(j + 1) sets from S_j; hence K_{j+1} ⊆ K_j, and the kernels form an incidence chain.

It remains to show that each D_j is an h-daisy with kernel K_j and petals of size j. The petals have size j by construction, so only the intersection condition is in question. Fix j and, for each i ∈ [n], let d_i be the number of sets in S_j containing i. Suppose, towards a contradiction, that the intersection condition fails for some S ∈ D_j and k ∈ [j], taken to be minimal: every subset of S \ K_j with at least k elements contains a point lying in more than h(k) sets (equivalently said, j and k are minimal such that the subset L ⊂ S \ K_j, comprised of all i with d_i ≤ h(k), has size at most k − 1).

Suppose first that k = 1. Then every i ∈ S \ K_j satisfies d_i > h(2) = h(1), and thus S \ K_j ⊆ K_0. But this implies S ∈ D_0 (since |S \ K_0| = 0), a contradiction, because the intersection condition holds by vacuity on empty petals.

Suppose now that k > 1. The subset L := {i ∈ S \ K_j : d_i ≤ h(k)} contains at most k − 1 points. By the minimality of k, at least k − 1 points i ∈ S \ K_j satisfy d_i ≤ h(k − 1) ≤ h(k). Therefore, |L| = k − 1 and L = {i ∈ S \ K_j : d_i ≤ h(k − 1)}. By the definition of L, every i ∈ S \ (K_j ∪ L) satisfies d_i > h(k), so that i ∈ K_{k−1}; therefore, S ⊆ K_{k−1} ∪ K_j ∪ L. Since the kernels form an incidence chain, K_j ⊆ K_{k−1}, and thus S \ K_{k−1} = L. But then |S \ K_{k−1}| = |L| = k − 1, so that S ∈ D_{k−1}, contradicting the fact that S ∈ D_j (because k − 1 < j and {D_j} is a partition).
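The greedy construction is simple enough to state in code; the following Python sketch (our own, with the rounding conventions of Section 3, and with h passed in as a function) returns the daisies and kernels for a given multi-collection of sets:

    import math

    def daisy_partition(sets, n, q, h):
        # Greedy daisy partition of a multi-collection of q-sets of range(n).
        # Returns (D, K) where D[j] is the j-th daisy and K[j] its kernel.
        S = list(sets)                                # S_0 (repeats allowed)
        D, K = [None] * (q + 1), [None] * (q + 1)
        for j in range(q):                            # j = 0, ..., q-1
            count = {i: 0 for i in range(n)}          # incidences in S_j
            for s in S:
                for i in s:
                    count[i] += 1
            # Kernel: points in at least h(j+1) sets of S_j.
            K[j] = {i for i in range(n) if count[i] >= h(j + 1)}
            # Daisy: sets whose petal (part outside the kernel) has size j.
            D[j] = [s for s in S if len(s - K[j]) == j]
            S = [s for s in S if len(s - K[j]) != j]  # S_{j+1}
        D[q], K[q] = S, set()
        return D, K

    # Toy example with q = 2 and h(k) = n^{max(1, k-1)/q}, rounded up.
    h = lambda k, n=8, q=2: math.ceil(n ** (max(1, k - 1) / q))
    D, K = daisy_partition([{0, 1}, {0, 2}, {0, 3}, {4, 5}], n=8, q=2, h=h)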
The following claim shows an upper bound on the total number of sets in an h-daisy that may intersect a given petal. It will be useful in order to partition a daisy into simple daisies, as the next section will show.

Claim 5.3. Let S be a multi-collection of q-sets and {D_j : 0 ≤ j ≤ q} be a daisy partition obtained by an application of Lemma 5.2. Then, for every j ∈ [q] and S ∈ D_j, the number of sets in D_j other than S whose petals intersect S \ K_j is at most 2h(j) − 1 = 2n^{max{1,j−1}/q} − 1.

Proof. Let S be an arbitrary set in D_j, and name the elements of S \ K_j as u_1, u_2, ..., u_j (by Lemma 5.2, every S ∈ D_j satisfies |S \ K_j| = j). For every k ∈ [j], let d_k be the number of sets of D_j that u_k is a member of. Assume without loss of generality that d_k ≤ d_ℓ for every k and ℓ in [j] such that k < ℓ, as otherwise we can rename u_1, u_2, ..., u_j so that this holds.

By the definition of an h-daisy, for every ℓ ∈ [j] there exists a set of ℓ elements k ∈ [j] that satisfy d_k ≤ h(ℓ). Thus, by the ordering, [ℓ] is such a set and we know that d_ℓ ≤ h(ℓ). As the number of sets whose petals intersect S \ K_j (counting S itself) is at most Σ_{k=1}^j d_k, we get that

Σ_{k=1}^j d_k ≤ Σ_{k=1}^j h(k) = Σ_{k=1}^j n^{max{1,k−1}/q} ≤ 2h(j).

The equality follows directly from the value of h(k), and the final inequality holds for sufficiently large n; since S itself is among the sets counted, the claim follows.

Concentration of measure is an essential ingredient in our proofs, which we first illustrate via a simplified example. Consider a collection of singletons that comprise the petals of a combinatorial sunflower: sets P_1, ..., P_k, all disjoint and of size 1, contained in the ground set [n]. If we perform binomial sampling of the ground set (sampling each i ∈ [n] independently with probability p), the Chernoff bound ensures that the number of sampled petals is close to its expectation. Defining X_i as the random variable that indicates whether P_i was sampled, we have lower and upper tail bounds that guarantee that the number of queried petals is concentrated around p·k except with exponentially small probability. Note, too, that the same holds for larger petals: if P_i is a j-set for all i, the number of queried petals is concentrated around p^j·k.

Now consider the case where P_1, ..., P_k are petals of a daisy. In this case the Chernoff bound does not apply directly, since the indicator random variables X_i are no longer independent; however, the structure of a daisy ensures there is not too much intersection among these petals, which gives means to control the correlation between these random variables. It is thus reasonable to expect that sampling a daisy is similar to sampling a sunflower. This intuition is formalised by making use of the Hajnal–Szemerédi theorem [HS70], which we state next.
Theorem 5.4 (Hajnal–Szemerédi). Let G be a graph with m vertices and maximum degree Δ. Then, for any k ≥ Δ + 1, there exists a k-colouring of the vertices of G such that every colour class has size either ⌊m/k⌋ or ⌈m/k⌉.
We remind that integrality does not cause issues in our analyses, and we thus assume all colour classes have size m/k. By encoding the sets of a daisy as the vertices of an "intersection graph", the fact that petals have bounded intersection translates into a graph with bounded maximum degree. Applying the Hajnal–Szemerédi theorem to this graph, we are able to partition the original daisy into a small number of large simple daisies.
Lemma 5.5.
A daisy D with kernel K, such that each one of its petals has a non-empty intersection with at most t other petals, can be partitioned into t + 1 simple daisies with the same kernel, each of size |D|/(t + 1).

Proof. Construct a graph G with vertex set D by placing an edge between vertices S and S′ whenever (S ∩ S′) \ K ≠ ∅. By definition, the maximum degree of G is Δ(G) ≤ t. The Hajnal–Szemerédi theorem implies that G is colourable with t + 1 colours, where each colour class has size |D|/(t + 1). This partition of the vertex set corresponds to a partition of the daisy D into simple daisies {S_i : i ∈ [t + 1]}, each of size |D|/(t + 1).
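Lemma 5.5 is constructive; the Python sketch below (our own) uses plain greedy colouring of the intersection graph, which achieves the t + 1 classes but not the equal class sizes that the Hajnal–Szemerédi theorem additionally guarantees (the analysis uses both properties):

    def split_into_simple_daisies(D, K, t):
        # Partition daisy D (a list of sets with kernel K, each petal meeting
        # at most t others) into at most t + 1 simple daisies.
        petals = [s - K for s in D]
        colour = []
        for i, p in enumerate(petals):
            used = {colour[j] for j in range(i) if petals[j] & p}
            # Each petal meets at most t others, so some colour is always free.
            colour.append(next(c for c in range(t + 1) if c not in used))
        classes = [[] for _ in range(t + 1)]
        for s, c in zip(D, colour):
            classes[c].append(s)
        return [cl for cl in classes if cl]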
This section proves a key lemma that uses daisies to establish structure on the sets that a robust local algorithm may query. Loosely speaking, the volume lemma ensures that, in order for a collection of sets to be queried with high enough probability, it must cover a sufficiently large fraction of the input's coordinates.

Let M be a q-local algorithm that computes a partial function f with error rate σ (we assume the explicit input to be fixed and omit it). Recall that, for each input x ∈ Σ^n, the algorithm M queries according to a distribution μ_x over a multi-collection of q-sets, as defined in Definition 3.2.

Lemma 5.6 (Volume lemma). Fix x ∈ Σ^n in the domain of f. If there exists a ρ-robust y ∈ Σ^n such that f(y) ≠ f(x), then every collection S ⊆ supp(μ_x) such that |∪S| = |∪_{S∈S} S| < ρn satisfies μ_x(S) ≤ 2σ.

Proof. Suppose, by way of contradiction, that there exists S ⊆ supp(μ_x) such that μ_x(S) > 2σ and |∪S| < ρn. For notational simplicity, assume without loss of generality that f(x) = 1, and take a ρ-robust y ∈ Σ^n such that f(y) = 0. Define w to match x in the coordinates covered by ∪S, and to match y otherwise. Then w is ρ-close to y, so that M outputs 0 on input w with probability at least 1 − σ.

Whenever the algorithm samples a decision tree whose branch on input x queries a set S ∈ S, it behaves on w exactly as it does on x (since w and x agree on ∪S); this happens with probability at least μ_x(S) > 2σ. But the algorithm outputs 0 on input x with probability at most σ, and thus outputs 1 on input w with probability greater than 2σ − σ = σ, in contradiction with the robustness of y.

Remark 5.7.
The volume lemma requires an arbitrary ρ-robust y with f(y) ≠ f(x). It thus suffices that a single such ρ-robust point exists for the volume lemma to hold for every x′ such that f(x′) = f(x).

This section provides two standard transformations that improve parameters of an algorithm: error reduction (Claim 5.8) and randomness reduction (Claim 5.9). These apply generally to randomised algorithms for decision problems, and, when applied to robust local algorithms, both transformations compute the same function and preserve robustness. We defer their proofs to Appendix A.

The following claim is an adaptation of a basic fact regarding randomised algorithms: performing independent runs and selecting the output via a majority rule decreases the error probability exponentially.
Claim 5.8 (Error reduction). Let M be a (ρ_0, ρ_1)-robust algorithm for computing a function f : Z × Σ^n → {0, 1} with error rate σ ≤ 1/3, query complexity q and randomness complexity r. For any σ′ > 0, there exists a (ρ_0, ρ_1)-robust algorithm N for computing the same function with error rate σ′, query complexity O(q·log(1/σ′)) and randomness complexity O(r·log(1/σ′)).
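A hedged sketch of the majority-vote amplification behind Claim 5.8 (our own code; the algorithm is modelled as a zero-argument callable, and the repetition count 18·ln(1/σ′) follows from a Hoeffding bound for error rate σ ≤ 1/3):

    import math
    import random

    def amplify(run_once, sigma, sigma_target):
        # Majority vote over k independent runs of a randomised 0/1 algorithm.
        # For sigma <= 1/3, Pr[majority errs] <= exp(-k/18) <= sigma_target.
        assert sigma <= 1 / 3
        k = math.ceil(18 * math.log(1 / sigma_target)) | 1   # force k odd
        votes = sum(run_once() for _ in range(k))
        return 1 if 2 * votes > k else 0

    # Example: a run that outputs the correct bit 1 with probability 0.7.
    noisy = lambda: 1 if random.random() < 0.7 else 0
    print(amplify(noisy, sigma=0.3, sigma_target=0.01))      # 1 w.h.p.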
Next, we state a transformation that yields an algorithm with twice the error rate and significantly reduced randomness complexity. This, in turn, provides an upper bound on the number of q-sets queried by the algorithm, such that an application of Lemma 5.2 to this multi-collection yields daisies with kernels of sublinear size. Such a bound on the size of the kernels is crucial to ensure correctness of the sample-based algorithm we construct in Section 6.1. Our proof adapts the technique of Goldreich and Sheffet [GS10], which in turn builds on the work of Newman [New91].

Claim 5.9 (Randomness reduction). Let M be a (ρ_0, ρ_1)-robust algorithm for computing a function f : Z × Σ^n → {0, 1} with error rate σ, query complexity q and randomness complexity r. There exists a (ρ_0, ρ_1)-robust algorithm N for computing the same function with error rate 2σ and query complexity q, whose distribution μ̃_N has support size 3n·ln|Σ|/σ. In particular, the randomness complexity of N is bounded by log(n/σ) + log log|Σ| + 2.

In the next section, we need a combination of error reduction and randomness reduction. Recall that we begin with a ρ-robust ℓ-local algorithm M with explicit access to z and query access to a string in Σ^n. Its error rate is 1/3, and it may have arbitrarily large randomness complexity. We now apply both transformations in order, omitting mention of parameters that are left unchanged.

1. Apply Claim 5.8 (error reduction) to M, obtaining a new algorithm M′′ with error rate σ′′ = 1/(16q) and query complexity q = O(ℓ·log(1/σ′′)) = O(ℓ·log(16q)) (as well as larger randomness complexity).

2. Apply Claim 5.9 (randomness reduction) to M′′, thereby obtaining a new algorithm M′ with error rate σ = 2σ′′ = 1/(8q) and support size 3n·ln|Σ|/σ′′ = 6n·ln|Σ|/σ on its distribution over decision trees.

We thus derive the following lemma.

Lemma 5.10.
Assume there exists a ρ-robust algorithm M for computing f with query complexity ℓ, error rate 1/3 and arbitrary randomness complexity. Then there exists a ρ-robust q-local algorithm M′ with error rate σ = 1/(8q) such that q = O(ℓ·log(16q)) or, equivalently, q = O(ℓ log ℓ). Moreover, the distribution of M′ is uniform over a multi-collection of decision trees of size 6n·ln|Σ|/σ.

Proof of Theorem 1
This section contains the main technical contribution of our work: a proof that every robust local algorithm with query complexity q can be transformed into a sample-based local algorithm with sample complexity n^{1−1/O(q² log² q)}. We begin by providing a precise statement of Theorem 1. In the following, we remind that when the error rate of an algorithm is not stated, it is assumed to be 1/3.

Theorem 6.1 (Theorem 1, restated). Suppose there exists a (ρ_0, ρ_1)-robust local algorithm M for computing the function f : Z × Σ^n → {0, 1} with query complexity ℓ and max{ρ_0, ρ_1} = Ω(1). Then, there exists a sample-based algorithm N for f with sample complexity γ·n^{1−1/(2q²)}, where q = O(ℓ log ℓ) and γ = O(|Σ|^q·ln|Σ|).

Note that when q = Ω(√log n) or |Σ|^q = Ω(n^{1/(2q²)}), the algorithm we obtain samples linearly many coordinates, and the statement becomes trivial. Therefore, hereafter we assume that the query complexity of M satisfies ℓ = o(√log n / log log n) (so that q = o(√log n)) and that the alphabet size satisfies |Σ| = n^{1/ω(ℓ² log² ℓ)} (so that |Σ|^q = n^{1/ω(q)}).

We proceed to prove Theorem 6.1. Specifically, in Section 6.1 we construct a sample-based local algorithm N from the (ρ_0, ρ_1)-robust local algorithm M in the hypothesis of Theorem 6.1; in Section 6.2, we analyse our sample-based algorithm N; and in Section 6.3 we conclude the proof by showing that the lemmas proved throughout the analysis indeed imply the correctness of N.

Hereafter, let f : Z × Σ^n → {0, 1} be the function in the hypothesis of Theorem 6.1. As the treatment that follows is the same for all explicit inputs z ∈ Z, we assume z to be fixed and omit it from the notation. We also assume without loss of generality that ρ_1 is a constant strictly greater than 0 (if this is not the case, we simply exchange the 0 and 1 values in the truth table of f), and we set ρ = ρ_1.

Let M be the algorithm in the hypothesis of Theorem 6.1. We apply Lemma 5.10 and obtain a (ρ_0, ρ_1)-robust local algorithm M′ for the same problem, with query complexity q = O(ℓ log ℓ) and error rate σ = 1/(8q). The algorithm N we describe below has white-box access to the local algorithm M′. We next explain how it extracts information from it.

Upon execution, M′ chooses a decision tree uniformly at random according to the outcome of its coin flips; this uniform distribution is denoted μ̃ = μ̃_{M′}, and its support size is |μ̃|. For every decision tree and every one of its branches, define a description tuple (S, a_S, b, s), where s is the random string that will cause the use of this tree, S is the set of all the queries in this branch, a_S is the assignment to these queries that will result in M′ using this specific branch, and b is the value M′ returns when this occurs (as per Definition 3.2).

We assume that for every description tuple (S, a_S, b, s) the size of S is exactly q. This can be assumed without loss of generality, since it is possible to convert M′ into an algorithm such that every decision tree and every one of its branches makes q distinct queries: if the same query appears more than once on a branch of a tree, all but the first appearance can be removed by choosing the continuation that follows the (already known) value that leads to the algorithm using this branch. In addition, a tree can be expanded by adding queries, so that every branch makes exactly q distinct queries.
Both of these changes leave the parameters of the algorithm unchanged, beyond ensuring that it makes exactly q queries.

The algorithm N we describe next is only interested in description tuples (S, a_S, b, s) such that b = 1. To this end we set

T = {(S, a_S, b, s) : (S, a_S, b, s) is a description tuple such that b = 1}.

Algorithm N also requires the multi-collection S defined as follows:

S = {S : (S, a_S, b, s) ∈ T}.

Specifically, it applies Lemma 5.2 to get a daisy partition of S. When the algorithm extracts T and S from M′ and computes a daisy partition for S, it preserves the information that allows it to associate the set of a tuple (S, a_S, b, s) with the unique daisy that S is contained in.

The construction proceeds in two stages: preprocessing and execution. Recall that, for any input x ∈ Σ^n and assignment κ to a subset S ⊆ [n], we denote by x_κ the word that assigns the same values as κ on S and the same values as x on [n] \ S.

Preprocessing. N has access to M′, with which it computes T and S. Applying Lemma 5.2 to S, the algorithm obtains the daisy partition D = {D_j : 0 ≤ j ≤ q}, so that each tuple in T is associated with D_j for exactly one j ∈ {0, ..., q}. Set p := γ·n^{−1/(2q²)}, the sampling probability, where γ = 48·|Σ|^q·ln|Σ|; and, for every j ∈ [q], set τ_j := (|μ̃|/(4q))·p^j, the thresholds, which will be used in the execution stage.

Execution.
When N receives query access to a string x ∈ Σ^n, it performs the following sequence of steps.

1. Sampling: Select each element of [n] independently with probability p. Denote by Q the set of all coordinates thus obtained. If |Q| ≥ 2pn, then N outputs arbitrarily. Otherwise, N queries all the coordinates in Q.

2. Enumeration: For every j ∈ [q] and kernel assignment κ to K_j, perform the following steps, setting a counting variable v to 0 before each iteration. (The algorithm does not iterate over the case j = 0; we will show in Section 6.2 that this has a negligible effect.)

(a) Local view generation and vote counting: For every tuple (S, a_S, 1, s) ∈ T such that S ∈ D_j, increment v if S ⊆ Q ∪ K_j and a_S assigns to S the same values as x_κ does. In the case j = 1, if at least 12·ln|Σ|/(ρ·σ) sets have the same point outside K_1, disregard them in the count. (This cap is required for technical purposes when dealing with K_1.)

(b) Threshold check: If v ≥ τ_j, output 1.

3. If the condition v ≥ τ_j was never satisfied, then output 0.

We proceed to analyse this construction.
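For intuition only, here is a compact Python rendering of the execution stage (our own, hypothetical data layout; the preprocessing that produces T, the daisies and the kernels is as described above):

    import itertools
    import random
    from collections import Counter

    def run_N(x, Sigma, T, daisies, kernels, tau, p, alpha, q):
        # T[t] = (S, a_S): a 1-outputting branch, with S a frozenset of
        # coordinates and a_S a dict mapping each coordinate to its expected
        # value; daisies[j] lists indices t with S in D_j; kernels[j] = K_j.
        n = len(x)
        Q = {i for i in range(n) if random.random() < p}       # 1. sampling
        if len(Q) >= 2 * p * n:
            return random.randint(0, 1)                        # arbitrary output
        for j in range(1, q + 1):                              # 2. enumeration
            Kj = kernels[j]
            for kappa in itertools.product(Sigma, repeat=len(Kj)):
                assign = dict(zip(sorted(Kj), kappa))          # x_kappa on K_j
                def x_kappa(i):
                    return assign.get(i, x[i])
                # 2(a): local views agreeing with x_kappa inside Q u K_j
                matching = [T[t] for t in daisies[j]
                            if T[t][0] <= Q | Kj
                            and all(x_kappa(i) == T[t][1][i] for i in T[t][0])]
                if j == 1:   # cap: drop petals shared by at least alpha sets
                    mult = Counter(frozenset(S - Kj) for S, _ in matching)
                    matching = [(S, a) for S, a in matching
                                if mult[frozenset(S - Kj)] < alpha]
                if len(matching) >= tau[j]:                    # 2(b) threshold
                    return 1
        return 0                                               # 3.

Note that a kernel assignment is enumerated exhaustively, which is why bounding |K_j| (Claim 6.2 below) is what keeps the union bound in the analysis under control.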
6.2 Analysis

We remind that the explicit input z is assumed to be fixed and is omitted from the notation. For the analysis we are interested in the behaviour of the algorithm M′ on a fixed input x. For this purpose, we use the distribution μ_x from Definition 3.2. For x ∈ Σ^n we define μ_x to be the uniform distribution over the multi-collection of sets

{S : (S, a_S, b, s) is a description tuple such that a_S = x|_S},    (6.1)

where a description tuple is as appears in Section 6.1. We note that this implies that supp(μ_x) has exactly one set for each decision tree M′ may use, since when both the randomness and the input are fixed, exactly one branch of the decision tree is used by M′. Therefore, |μ_x| = |μ̃|.

We now list the relevant parameters in the analysis, with reference to where they are obtained. By Lemma 5.10,

σ = 1/(8q),    (6.2)

and, for every x ∈ Σ^n,

|μ_x| = |μ̃| = 6n·ln|Σ|/σ.    (6.3)

The construction of N in the previous section sets the parameters

γ = 48·|Σ|^q·ln|Σ|,    (6.4)

p = γ·n^{−1/(2q²)},    (6.5)

and, for all j ∈ [q],

τ_j = (|μ̃|/(4q))·p^j = (|μ_x|/(4q))·p^j,    (6.6)

(the second equality holds for all x ∈ Σ^n by Eq. (6.3)). Finally, the size of the collection of tuples T, which by the construction in Section 6.1 is the same as that of S, is bounded by the total number of branches over all decision trees in supp(μ̃). Thus, for every x ∈ Σ^n,

|S| = |T| ≤ |Σ|^q·|μ̃| = |Σ|^q·|μ_x|.    (6.7)

For our result we need an upper bound on the sizes of the kernels algorithm N enumerates over.

Claim 6.2. Let {K_i : 0 ≤ i ≤ q} be the kernels of the daisy partition {D_i} of S used by the algorithm N. Then, for every i ∈ {0, 1, ..., q}, the kernel K_i is such that |K_i| ≤ γ·q²·n^{1−max{1,i}/q} and, for n sufficiently large, |K_i| < ρn/2.

Proof. By Lemma 5.2, for every i ∈ {0, 1, ..., q},

|K_i| ≤ q·|S|·n^{−max{1,i}/q}
≤ q·|Σ|^q·|μ_x|·n^{−max{1,i}/q}    (by Eq. (6.7), |S| ≤ |Σ|^q·|μ_x|)
= q·|Σ|^q·(6n·ln|Σ|/σ)·n^{−max{1,i}/q}    (by Eq. (6.3), |μ_x| = 6n·ln|Σ|/σ)
= 48·|Σ|^q·ln|Σ|·q²·n^{1−max{1,i}/q}    (by Eq. (6.2), σ = 1/(8q))
= γ·q²·n^{1−max{1,i}/q}.    (by Eq. (6.4), γ = 48·|Σ|^q·ln|Σ|)

It remains to prove the second part of the claim. By the calculation above, since ρ is constant and |Σ|^q·ln|Σ|·q² = o(n^{1/q}) (recall that |Σ|^q = n^{1/ω(q)} and q = o(√log n)), for sufficiently large n,

|K_0| ≤ γ·q²·n^{1−1/q} = (|Σ|^q·ln|Σ|·q²·n^{−1/q})·48n ≤ ρn/2.    (6.8)

By Lemma 5.2, K_q ⊆ K_{q−1} ⊆ ⋯ ⊆ K_0, and hence the claim follows.

Next, we provide a number of definitions emanating from algorithm N. We define, for every x ∈ Σ^n, the multi-collection

O_x := {S : (S, a_S, 1, s) ∈ T and x|_S = a_S},

where T is defined as in Section 6.1. Note that the definition of this collection depends only on the algorithm M′ and not on the function f it computes. Hence, it is well-defined for every x, and in particular for points that are ρ-close to a ρ-robust point of the domain (unlike f). We note that, since μ_x is defined over the collection in (6.1), we know that

O_x ⊆ supp(μ_x).    (6.9)

Since the "capping parameter" 12·ln|Σ|/(ρ·σ) is used numerous times, we set

α = 12·ln|Σ|/(ρ·σ).    (6.10)

We refer to the act of incrementing v as counting a vote. For each j ∈ [q], we define the vote counting function v_j : Σ^n → ℕ to be a random variable over the value of Q as follows. If j > 1,

v_j(x) := |{S ∈ O_x ∩ D_j : S ⊆ Q ∪ K_j}|,

and v_1(x) is defined likewise, with the exception that, when at least α sets intersect in a point outside K_1, they are discarded.

Claim 6.3.
Let x ∈ Σ^n, j ∈ [q] and κ be an assignment to K_j. Then v_j(x_κ) is equal (as a function of Q) to the maximal value attained by the counter v computed by N on input x with kernel K_j and the kernel assignment κ to K_j.

Proof. Fix x ∈ Σ^n. Recall that when algorithm N computes v for a j ∈ {1, ..., q} and a kernel assignment κ to K_j in Step 2a, it only increases v if it encounters a tuple (S, a_S, 1, s) where S ∈ D_j, S ⊆ Q ∪ K_j and a_S assigns to S the same values as x_κ does. Thus, by the definition of O_{x_κ}, the algorithm N counts exactly the tuples (S, a_S, 1, s) such that S ∈ O_{x_κ} ∩ D_j and S ⊆ Q ∪ K_j. These are precisely the sets that comprise the collection whose cardinality is v_j(x_κ). Note that the same holds when j = 1, due to the additional condition in Step 2a and the corresponding restriction in the definition of v_1(x_κ).

We now proceed to the main claims. The algorithm N only counts votes for output 1, i.e., tuples with 1 as their third element, and hence it suffices to prove that: (1) when f(x) = 1, for some daisy D_j the kernel assignment κ = x|_{K_j} (the values of x on the indices in K_j) makes the number of votes high enough to cross the threshold τ_j; and (2) when f(x) = 0, every kernel assignment κ is such that the number of votes is smaller than the threshold. These conditions are shown to hold with high probability in Claim 6.4 and Claim 6.5, respectively, and we show how the theorem follows from them in the next section.

Claim 6.4 (Correctness on non-robust inputs). Let Q be the coordinates sampled by N and fix x ∈ Σ^n such that f(x) = 1. There exists j ∈ [q] such that, with the kernel assignment κ = x|_{K_j}, the vote counting function satisfies v_j(x_κ) ≥ τ_j with probability at least 9/10.

Proof. For ease of notation, let us fix x as in the statement and denote O := O_x = O_{x_κ}.
When j > 1, define the subcollection of O in D_j by O_j := O_x ∩ D_j; when j = 1, define O_1 := (O_x ∩ D_1) \ C, where C ⊆ O_x ∩ D_1 contains every S ∈ O_x ∩ D_1 for which there exist at least α − 1 other sets S′ ∈ O_x ∩ D_1 that have the same petal as S, i.e., such that S \ K_1 = S′ \ K_1. We also take n to be sufficiently large whenever necessary for an inequality to hold.

For the claim to hold we require the existence of a j ∈ [q] such that O_j is a sufficiently large portion of O. Since each petal of a set in C is a singleton shared by at least α sets, the coordinates covered by C satisfy |∪C| ≤ |K_1| + |C|/α. By Claim 6.2, |K_1| < ρn/2, and it follows that

|K_1| + |C|/α < ρn/2 + |O|/α    (C ⊆ O)
≤ ρn/2 + |μ_x|/α    (by (6.9), O = O_x ⊆ supp(μ_x))
= ρn/2 + |μ_x|·ρ·σ/(12·ln|Σ|)    (by Eq. (6.10), 1/α = ρ·σ/(12·ln|Σ|))
= ρn/2 + (6n·ln|Σ|/σ)·ρ·σ/(12·ln|Σ|)    (by Eq. (6.3), |μ_x| = 6n·ln|Σ|/σ)
= ρn.

Consequently, by the volume lemma (Lemma 5.6), μ_x(C) ≤ 2σ. Similarly, since every set of D_0 is contained in K_0 and |K_0| < ρn/2, we also have μ_x(O ∩ D_0) ≤ 2σ.

By the definition of error rate, μ_x(O) ≥ 1 − σ. The collections {O_j : 1 ≤ j ≤ q}, together with O ∩ D_0 and C, partition O (because {D_j} is a partition), so

μ_x(⋃_{j∈[q]} O_j) ≥ μ_x(O) − μ_x(O ∩ D_0) − μ_x(C) ≥ 1 − 5σ ≥ 1/2.

Let j be such that

|O_j| ≥ |μ_x|/(2q);    (6.11)

by averaging, such a j indeed exists. Our goal now is to show that, with probability at least 9/10, there are at least τ_j sets S ∈ O_j whose petal is sampled, i.e., such that S \ K_j ⊆ Q.

Instead of proving this directly on O_j, we do so on collections that form a partition of O_j and have a useful structure. The sets in O_j are also in D_j, so that O_j is itself a daisy with kernel K_j. By Claim 5.3, for every set S ∈ O_j, there exist at most 2n^{max{1,j−1}/q} − 1 other sets S′ ∈ O_j whose petals have a non-empty intersection with the petal of S, i.e., such that (S ∩ S′) \ K_j ≠ ∅. This enables us to apply Lemma 5.5 to O_j, partitioning it into {S_i : i ∈ [2t]}, simple daisies of equal size, where

t ≤ n^{max{1,j−1}/q}.    (6.12)

Thus, for every i ∈ [2t],

|S_i| = |O_j|/(2t).    (6.13)

Let O′_j be the multi-collection of all sets S ∈ O_j such that S \ K_j ⊆ Q. In the same manner, for every i ∈ [2t], let S′_i be the multi-collection of all sets S ∈ S_i such that S \ K_j ⊆ Q. By construction, the collections {S′_i} are pairwise disjoint. Also, by the definition of v_j, we have v_j(x) = |O′_j| = Σ_{i=1}^{2t} |S′_i|. Therefore, the event v_j(x) ≤ τ_j can only occur if there exists i ∈ [2t] such that |S′_i| ≤ τ_j/(2t). Consequently, we obtain

Pr[v_j(x) ≤ τ_j] ≤ Pr[|S′_i| ≤ τ_j/(2t) for some i ∈ [2t]]
≤ Σ_{i=1}^{2t} Pr[|S′_i| ≤ τ_j/(2t)]    (union bound)
≤ 2t·Pr[|S′_1| ≤ τ_j/(2t)].    (all S_i have equal size)

We show below that the probability of the event |S′_1| ≤ τ_j/(2t) is strictly less than 1/(20t), which by the inequality above implies the claim.

We will use the Chernoff bound on S_1, and hence we start by bounding E[|S′_1|] from below. Recall that the petal of every set S ∈ S_1 ⊆ D_j has size j (i.e., |S \ K_j| = j), and is therefore contained in Q with probability exactly p^j. So

E[|S′_1|] = |S_1|·p^j = |O_j|·p^j/(2t)    (by Eq. (6.13))
≥ |μ_x|·p^j/(4tq)    (by Eq. (6.11))    (6.14)
= τ_j/t.    (by Eq. (6.6), τ_j = (|μ_x|/(4q))·p^j)

Thus, Pr[|S′_1| ≤ τ_j/(2t)] ≤ Pr[|S′_1| ≤ E[|S′_1|]/2]. Next we show that the probability of the event |S′_1| ≤ E[|S′_1|]/2 is at most 1/(20t), which concludes the proof.
Since S_1 is a simple daisy, the petals of the sets in S_1 are pairwise disjoint, and hence the events S \ K_j ⊆ Q, for S ∈ S_1, are all independent. This enables us to use the Chernoff bound to get that

Pr[|S′_1| ≤ E[|S′_1|]/2] ≤ exp(−E[|S′_1|]/8)    (Chernoff bound)
≤ exp(−|μ_x|·p^j/(32tq))    (by Eq. (6.14))
= exp(−(3/2)·n·ln|Σ|·p^j/t)    (by Eqs. (6.2) and (6.3), σ = 1/(8q) and |μ_x| = 6n·ln|Σ|/σ)
≤ exp(−ln|Σ|·γ^j·n^{1−max{1,j−1}/q−j/(2q²)})    (by (6.12) and Eq. (6.5))
= (1/(20t))·exp(−ln|Σ|·γ^j·n^{1−max{1,j−1}/q−j/(2q²)} + ln(20t))
≤ (1/(20t))·exp(−ln|Σ|·γ·n^{1/(2q)} + ln(20t))    (1 ≤ j ≤ q)
< 1/(20t),

where the last inequality follows because ln(20t) ≤ ln n + ln 20 = o(n^{1/(2q)}) and n is sufficiently large.

Note that, although a success probability of 9/10 suffices to ensure the correctness of a single run of N, Claim 6.4 yields a much stronger result: the failure probability is exponentially small. This is because Claim 6.4 does not enumerate over kernel assignments. Moreover, the analysis for the case j = 1 can be improved significantly (as will be necessary in Claim 6.5), but this does not yield an overall improvement in our results.

In the following claim, we note that, since it is only necessary to enumerate over kernel assignments, |K_1|/n-robustness suffices for the analysis.

Claim 6.5 (Correctness on robust inputs). Suppose the input x ∈ Σ^n is |K_1|/n-robust for M′ and f(x) = 0. Then, for every j ∈ [q] and every assignment κ to the kernel K_j, the vote count satisfies v_j(x_κ) < τ_j with probability at least 1 − |Σ|^{−|K_j|}/(10q).

Proof. For ease of notation, fix j ∈ [q], an assignment κ to K_j and x as in the statement. Let O := O_{x_κ}. If j > 1, define the subcollection of O in D_j by O_j := O_{x_κ} ∩ D_j; if j = 1, define O_1 := (O_{x_κ} ∩ D_1) \ C, where C ⊆ O_{x_κ} ∩ D_1 contains every S ∈ O_{x_κ} ∩ D_1 for which there exist at least α − 1 other sets S′ ∈ O_{x_κ} ∩ D_1 that have the same petal as S, i.e., such that S \ K_1 = S′ \ K_1. We also take n to be sufficiently large whenever necessary for an inequality to hold.

Note that x_κ may not be in the domain of f, but the robustness of x allows us to bound the weight of O_{x_κ} regardless: since K_j ⊆ K_1, the word x_κ is |K_1|/n-close to x, and since f(x) = 0, we know that μ_{x_κ}(O) ≤ σ. As μ_{x_κ} is uniform, each element of the multi-collection O has weight exactly 1/|μ_x|. Therefore,

|O_i| ≤ σ·|μ_x| for every i ∈ [q].    (6.15)

Our goal now is, for every j ∈ [q], to upper bound the probability that there are at least τ_j sets S ∈ O_j whose petal is sampled, i.e., such that S \ K_j ⊆ Q. For every j ∈ [q], let β_j be such that, for every set S ∈ O_j, there exist at most β_j − 1 other sets S′ ∈ O_j whose petal intersects the petal of S, i.e., (S \ K_j) ∩ (S′ \ K_j) ≠ ∅.

For the time being, let us fix j ∈ [q]. By applying Lemma 5.5, we partition O_j into {S_i : i ∈ [β_j]} such that each S_i is a simple daisy of size

|S_i| = |O_j|/β_j ≤ σ·|μ_x|/β_j,    (6.16)

where the inequality follows from (6.15). Let O′_j be the multi-collection of all sets S ∈ O_j such that S \ K_j ⊆ Q. In the same manner, for every i ∈ [β_j], let S′_i be the multi-collection of all sets S ∈ S_i such that S \ K_j ⊆ Q.
In the following claim, we note that, since it is only necessary to enumerate over kernel assignments, $|K|/n$-robustness suffices for the analysis.

Claim 6.5 (Correctness on robust inputs). Suppose the input $x \in \Sigma^n$ is $|K|/n$-robust for $M'$ and $f(x) = 0$. Then, for every $j \in [q]$ and every assignment $\kappa$ to the kernel $K_j$, the vote count satisfies $v_j(x_\kappa) < \tau_j$ with probability at least $1 - |\Sigma|^{-|K_j|}/(10q)$.

Proof. For ease of notation, fix $j \in [q]$, an assignment $\kappa$ to $K_j$ and $x$ as in the statement. Let $\mathcal{O} := \mathcal{O}_{x_\kappa}$. If $j > 1$, define the subcollection of $\mathcal{O}$ in $D_j$ by $\mathcal{O}_j := \mathcal{O}_{x_\kappa} \cap D_j$; if $j = 1$, define $\mathcal{O}_1 := (\mathcal{O}_{x_\kappa} \cap D_1) \setminus \mathcal{C}$, where $\mathcal{C} \subseteq \mathcal{O}_{x_\kappa} \cap D_1$ contains every $S \in \mathcal{O}_{x_\kappa} \cap D_1$ for which there exist at least $\alpha - 1$ other sets $S' \in \mathcal{O}_{x_\kappa} \cap D_1$ that have the same petal as $S$, i.e., such that $S \setminus K = S' \setminus K$. We also take $n$ to be sufficiently large when necessary for an inequality to hold.

Note that $x_\kappa$ may not be in the domain of $f$, but the robustness of $x$ allows us to bound the size of $\mathcal{O}_{x_\kappa}$ regardless. Moreover, since $f(x) = 0$, we know that $\mu_x(\mathcal{O}) \le \sigma$. As $\mu_x$ is uniform, each element of the multi-collection $\mathcal{O}$ has weight exactly $1/|\mu_x|$. Therefore,

for every $j \in [q]$, $|\mathcal{O}_j| \le \sigma |\mu_x|$. (6.15)

Our goal now is, for every $j \in [q]$, to upper bound the probability that there are at least $\tau_j$ sets $S \in \mathcal{O}$ whose petal is in $Q$, i.e., such that $S \setminus K_j \subseteq Q$.

For every $j \in [q]$, let $\beta_j$ be such that, for every set $S \in \mathcal{O}_j$, there exist at most $\beta_j - 1$ other sets $S' \in \mathcal{O}_j$ whose petal intersects the petal of $S$, i.e., $(S \setminus K_j) \cap (S' \setminus K_j) \ne \emptyset$.

For the time being, let us fix $j \in [q]$. By applying Lemma 5.5, we partition $\mathcal{O}_j$ into simple daisies $\{\mathcal{S}_i : i \in [\beta_j]\}$ of equal sizes, such that

for every $i \in [\beta_j]$, $|\mathcal{S}_i| = \frac{|\mathcal{O}_j|}{\beta_j} \le \frac{\sigma |\mu_x|}{\beta_j}$, (6.16)

where the inequality follows from (6.15).

Let $\mathcal{O}'_j$ be the multi-collection of all sets $S \in \mathcal{O}_j$ such that $S \setminus K_j \subseteq Q$. In the same manner, for every $i \in [\beta_j]$, let $\mathcal{S}'_i$ be the multi-collection of all sets $S \in \mathcal{S}_i$ such that $S \setminus K_j \subseteq Q$. By the definition of $v_j$ and the fact that $\{\mathcal{S}_i\}$ is a partition, $v_j(x_\kappa) = |\mathcal{O}'_j| = \sum_{i=1}^{\beta_j} |\mathcal{S}'_i|$. Since the event $v_j(x_\kappa) \ge \tau_j$ can only occur if $|\mathcal{S}'_i| \ge \tau_j/\beta_j$ for some $i \in [\beta_j]$, we obtain

$\Pr[v_j(x_\kappa) \ge \tau_j] \le \Pr[\,|\mathcal{S}'_i| \ge \tau_j/\beta_j$ for some $i \in [\beta_j]\,]$
$\le \sum_{i=1}^{\beta_j} \Pr[\,|\mathcal{S}'_i| \ge \tau_j/\beta_j\,]$ (union bound)
$\le \beta_j \cdot \Pr[\,|\mathcal{S}'_1| \ge \tau_j/\beta_j\,]$. (all $\mathcal{S}_i$ have equal size)

Now our goal is to show that the event $|\mathcal{S}'_1| \ge \tau_j/\beta_j$ happens with probability at most $|\Sigma|^{-|K_j|}/(10q\beta_j)$. Note that this is sufficient for proving the claim, because plugging it into the previous inequality gives $\Pr[v_j(x_\kappa) \ge \tau_j] \le |\Sigma|^{-|K_j|}/(10q)$.

Since the petals of sets in $\mathcal{S}_1$ are pairwise disjoint, we can and do use the Chernoff bound. In order to do so, we first bound $\mathrm{E}[|\mathcal{S}'_1|]$ from above. Recall that the petal of every set $S \in \mathcal{S}_1 \subseteq D_j$ has size $j$ (i.e., $|S \setminus K_j| = j$), and therefore $S$ is in $\mathcal{S}'_1$ with probability exactly $p_j$. So,

$\mathrm{E}[|\mathcal{S}'_1|] = |\mathcal{S}_1| \cdot p_j \le \frac{\sigma \cdot |\mu_x| \cdot p_j}{\beta_j}$ (by Eq. (6.16))
$= \frac{\tau_j}{3\beta_j}$. (by Eq. (6.2) and Eq. (6.6), $\sigma = \frac{1}{18q}$ and $\tau_j = \frac{|\mu_x| \cdot p_j}{6q}$)

We now use the Chernoff bound; we stop at a partial result and provide separate analyses for the cases $j = 1$ and $j > 1$:

$\Pr\left[\,|\mathcal{S}'_1| \ge \frac{\tau_j}{\beta_j}\,\right] = \Pr\left[\,|\mathcal{S}'_1| \ge \frac{\tau_j}{\beta_j \mathrm{E}[|\mathcal{S}'_1|]} \cdot \mathrm{E}[|\mathcal{S}'_1|]\,\right]$
$\le \exp\left(-\left(\frac{\tau_j}{\beta_j \mathrm{E}[|\mathcal{S}'_1|]} - 1\right)^2 \cdot \frac{\mathrm{E}[|\mathcal{S}'_1|]}{3}\right)$ (Chernoff bound)
$\le \exp\left(-\frac{\tau_j}{3\beta_j}\right)$ (explained afterwards)
$= \exp\left(-\frac{|\mu_x| \cdot p_j}{18q\beta_j}\right)$ (by Eq. (6.6), $\tau_j = |\mu_x| \cdot p_j / (6q)$)
$= \exp\left(-\frac{n \ln|\Sigma| \cdot p_j}{3q\beta_j \cdot \sigma}\right)$, (by Eq. (6.3), $|\mu_x| = 6n\ln|\Sigma|/\sigma$)

where the second inequality follows from $\left(\frac{\tau_j}{\beta_j \mathrm{E}[|\mathcal{S}'_1|]} - 1\right)^2 \cdot \frac{\mathrm{E}[|\mathcal{S}'_1|]}{3}$ being minimal when $\mathrm{E}[|\mathcal{S}'_1|]$ is at its upper bound of $\tau_j/(3\beta_j)$. We next proceed to the first of the two cases.

Now, take $j = 1$. In this case, by the construction of the daisy partition (Lemma 5.2), every set $S \in \mathcal{O}_1$ has a petal $S \setminus K_1$ of cardinality exactly 1. By the definition of $\mathcal{O}_1$, each set $S \in \mathcal{O}_1$ has at most $\alpha - 1$ other sets $S' \in \mathcal{O}_1$ whose petal intersects the petal of $S$, i.e., $(S \setminus K_1) \cap (S' \setminus K_1) \ne \emptyset$ (and thus $S \setminus K_1 = S' \setminus K_1$, since both petals have size 1). So at most $\beta_1 - 1 = \alpha - 1$ other sets of $\mathcal{O}_1$ intersect each $S \in \mathcal{O}_1$, which follows from Eq. (6.10). Now,

$\exp\left(-\frac{n \ln|\Sigma| \cdot p_1}{3q\alpha \cdot \sigma}\right) = \exp\left(-\frac{n \cdot p_1 \cdot \rho}{36q}\right)$ (by Eq. (6.10), $\alpha = 12\ln|\Sigma|/(\rho \cdot \sigma)$)
$= \exp\left(-\frac{\gamma \cdot \rho}{36q} \cdot n^{1-1/(2q)}\right)$ ($p_1 = \gamma \cdot n^{-1/(2q)}$)
$= \frac{1}{10q\alpha} \exp\left(-\frac{\gamma \cdot \rho}{36q} \cdot n^{1-1/(2q)} + \ln(10q\alpha)\right)$
$\le \frac{1}{10q\alpha} \exp\left(-\ln|\Sigma| \cdot \gamma \cdot q \cdot n^{1-1/q}\right)$ (large enough $n$)
$\le \frac{|\Sigma|^{-|K_1|}}{10q\beta_1}$,

where the last inequality follows because $|K_1| \le \gamma \cdot q \cdot n^{1-1/q}$ by Claim 6.2 (and $\beta_1 = \alpha$).

Now, take $j > 1$.
By Claim 5.3, $\beta_j = 2h(j-1) = 2n^{(j-1)/(2q)}$, which implies the first equality in the following:

$\exp\left(-\frac{n\ln|\Sigma| \cdot p_j}{3q\beta_j \cdot \sigma}\right) = \exp\left(-\frac{n\ln|\Sigma| \cdot p_j}{6q\sigma \cdot n^{(j-1)/(2q)}}\right)$
$= \exp\left(-3\ln|\Sigma| \cdot p_j \cdot n^{1-(j-1)/(2q)}\right)$ (by Eq. (6.2), $\sigma = \frac{1}{18q}$)
$= \exp\left(-3\ln|\Sigma| \cdot \gamma^j \cdot n^{1-j/q} \cdot n^{1/(2q)}\right)$ ($p_j = \gamma^j \cdot n^{-j/(2q)}$)
$= \frac{1}{10q\beta_j}\exp\left(-3\ln|\Sigma| \cdot \gamma^j \cdot n^{1-j/q} \cdot n^{1/(2q)} + \ln(10q\beta_j)\right)$ ($1 < j \le q$)
$\le \frac{1}{10q\beta_j}\exp\left(-\ln|\Sigma| \cdot \gamma \cdot q \cdot n^{1-j/q}\right)$ (large enough $n$)
$\le \frac{|\Sigma|^{-|K_j|}}{10q\beta_j}$,

where the last inequality follows because $|K_j| \le \gamma \cdot q \cdot n^{1-j/q}$ by Claim 6.2.

We conclude the proof of Theorem 6.1 by applying the two previous claims. Recall that we transformed a $\rho$-robust local algorithm $M$ for a function $f$, with query complexity $\ell$, into a $\rho$-robust local algorithm $M'$ with query complexity $q = O(\ell \log \ell)$ and a suitable error rate. Then we transformed $M'$ into a sample-based algorithm $N$ with sample complexity $n^{1-1/O(q^2)} = n^{1-1/O(\ell^2 \log^2 \ell)}$, an upper bound guaranteed by the sampling step (Step 1) in the construction of $N$. It remains to show correctness of the algorithm on every input in the domain of $f$.

We first consider errors that may arise in the sampling step. By the Chernoff bound, the probability that it chooses more than $2pn = 2\gamma n^{1-1/(2q)}$ points to query, and thus outputs arbitrarily, is at most $1/10$. Otherwise, it proceeds to the next steps.

In the next part of the proof we analyse $v_j(x)$ instead of analysing $v$ (of Step 2a) in algorithm $N$; this is sufficient since, by Claim 6.3, they are identically distributed over the choice of $Q$.

Suppose the input $x \in \Sigma^n$ is such that $f(x) = 0$. Since $x$ is $\rho$-robust, it is in particular $|K|/n$-robust (because $|K| = o(n)$). Then Claim 6.5 ensures that, for every $j \in [q]$ and kernel assignment $\kappa$ to $K_j$, the vote count satisfies $v_j(x_\kappa) \ge \tau_j$ with probability at most $|\Sigma|^{-|K_j|}/(10q)$. A union bound over all $j \in [q]$ and all $|\Sigma|^{|K_j|}$ assignments to the kernel $K_j$ ensures that the probability that this happens, causing $N$ to output 1 in the threshold check step (Step 2b), is at most $1/10$; otherwise, $N$ will enumerate over every assignment and then (correctly) output 0 in Step 3.

Now suppose $x \in \Sigma^n$ is such that $f(x) = 1$. Then Claim 6.4 ensures that, for some $j \in [q]$, the kernel assignment $\kappa = x|_{K_j}$ will make the vote count satisfy $v_j(x) \ge \tau_j$ with probability at least $9/10$, in which case $N$ (correctly) outputs 1 in the threshold check step (Step 2b).

Therefore, $N$ proceeds beyond the sampling step with probability $9/10$ and outputs correctly (due to Claim 6.5 and Claim 6.4) with probability at least $9/10 - 1/10 \ge 2/3$. This concludes the proof of Theorem 6.1.

Remark 6.6. Notice that the claims actually prove a stronger statement: the failure probability is not merely $1/3$, but exponentially small. For each $j \in [q]$, the error probability is $\exp(-\Omega(n^{1-j/q+1/(2q)}))$, but it must withstand a union bound over $\exp(O(n^{1-j/q}))$ events (corresponding to the assignments to the kernel $K_j$). The smallest slackness is in the case $j = q$, where the success probability is still $1 - \exp(-\Omega(n^{1/(2q)}))$; this implies that correctness holds for $\exp(O(n^{1/(2q)}))$ many runs of the algorithm. Therefore, the same samples can be reused for exponentially many runs of possibly different algorithms.
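For intuition, the overall shape of $N$ can be summarised in code. The following schematic sketch is ours, not the paper's pseudocode: the helpers (`daisies`, `kernels`, `tau`, and the voters' `accept` predicates) are hypothetical stand-ins for the objects constructed in Section 6.1 from $M'$.

```python
import itertools
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Voter:
    petal: frozenset      # coordinates of S outside the kernel K_j
    accept: object        # callable: (petal values, kernel assignment) -> bool

def run_N(x, q, p, daisies, kernels, tau, Sigma):
    n = len(x)
    # Step 1 (sampling): every coordinate enters Q independently w.p. p.
    Q = {i for i in range(n) if random.random() < p}
    if len(Q) > 2 * p * n:
        return random.choice([0, 1])   # too many samples: output arbitrarily
    # Step 2 (enumeration): for each level j and each assignment kappa to the
    # kernel K_j, count votes v_j and compare against the threshold tau[j].
    for j in range(1, q + 1):
        for kappa in itertools.product(Sigma, repeat=len(kernels[j])):
            v_j = sum(1 for S in daisies[j]
                      if S.petal <= Q
                      and S.accept({i: x[i] for i in S.petal},
                                   dict(zip(kernels[j], kappa))))
            if v_j >= tau[j]:          # Step 2b: threshold check
                return 1
    return 0                           # Step 3: no threshold was reached
```

Note that only the sampling step touches the input; the enumeration over kernel assignments is what the union bound in the analysis above pays for.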
7 Applications

In this section, we derive applications of Theorem 6.1 that range over three fields of study: property testing, coding theory, and probabilistic proof systems. We note that our main application to property testing follows as a direct corollary of Theorem 6.1, whereas our applications to coding theory and probabilistic proof systems require additional arguments.

7.1 Sample-based testers

Recall that a property tester $T$ for a property $\Pi \subseteq \Sigma^n$ is an algorithm that receives explicit access to a proximity parameter $\varepsilon > 0$ and query access to $x \in \Sigma^n$, and approximately decides membership in $\Pi$: it accepts if $x \in \Pi$ and rejects if $x$ is $\varepsilon$-far from $\Pi$, with high probability.

By Claim 4.6, an $\varepsilon/2$-tester for $\Pi$, where $\varepsilon \in (0,1)$, is an $\varepsilon/2$-robust local algorithm for the function $f : \Sigma^n \to \{0,1\}$ defined by $f(x) = 1$ if $x \in \Pi$, and $f(x) = 0$ if $x$ is $\varepsilon$-far from $\Pi$.

Note, moreover, that a local algorithm that solves $f$ is by definition an $\varepsilon$-tester, accepting elements of $\Pi$ and rejecting points that are $\varepsilon$-far from it with high probability. A direct application of Theorem 6.1 thus yields the following corollary, which improves upon the main result of [FLV15] by extending it to the two-sided adaptive setting.

Corollary 7.1. For every fixed $\varepsilon > 0$ and $q \in \mathbb{N}$, any property of strings in $\Sigma^n$ that is $\varepsilon/2$-testable with $q$ queries admits a sample-based $\varepsilon$-tester with sample complexity $n^{1-1/O(q^2 \log^2 q)}$.

This also immediately extends an application to multitesters in [FLV15]. By standard error reduction, for any $k \in \mathbb{N}$, an increase of the sample complexity by a factor of $O(\log k)$ ensures that each member of a collection of $k$ sample-based testers errs with probability at most $1/(3k)$. A union bound allows us to reuse the same samples for all testers, so that all of them output correctly with probability $2/3$. When $k = \exp(n^{1/\omega(q^2 \log^2 q)})$, the sample complexity becomes $n^{1-1/O(q^2 \log^2 q)} \cdot n^{1/\omega(q^2 \log^2 q)} = o(n)$, which yields the following corollary.

Corollary 7.2. If a property $\Pi \subseteq \Sigma^n$ is the union of $k = \exp(n^{1/\omega(q^2 \log^2 q)})$ properties $\Pi_1, \ldots, \Pi_k$, each $\varepsilon/2$-testable with $q$ queries, then $\Pi$ is $\varepsilon$-testable via a sample-based tester with sublinear sample complexity.

A tester for the union simply runs all (sub-)testers, accepting if and only if at least one of them accepts (see the sketch below). A proof of a generalisation of this corollary, which holds for partial testers, is given in Section 7.3.
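A minimal sketch of this union multitester follows; it is ours, not the paper's. Here `enum_steps[i]` is a hypothetical stand-in for the enumeration step of the $i$-th sample-based tester, and the split into a sampling step and an enumeration step is the one from Section 6.1.

```python
import random

# Sketch (ours) of the union multitester behind Corollary 7.2. Each of the k
# sub-testers is split into a sampling step (shared by all) and an
# enumeration step (run separately); enum_steps[i] returns 0 or 1.
def union_tester(x, p, enum_steps, reps):
    n, k = len(x), len(enum_steps)
    votes = [0] * k
    for _ in range(reps):                # reps = O(log k) error reduction
        Q = {i for i in range(n) if random.random() < p}   # shared samples
        for i, enum in enumerate(enum_steps):
            votes[i] += enum(x, Q)
    # accept iff some sub-tester accepts in a majority of the repetitions
    return int(any(v > reps / 2 for v in votes))
```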
7.2 Stronger relaxed LDC lower bounds

Relaxed LDCs relax the notion of LDCs by allowing the local decoder to abort on a small fraction of the indices, yet crucially still avoid errors. This seemingly modest relaxation turns out to allow for dramatically better parameters (an exponential improvement on the rate of the best known $O(1)$-query LDCs). However, since these algorithms are much stronger, obtaining lower bounds on relaxed LDCs is significantly harder than on standard LDCs. Indeed, the first lower bound on relaxed LDCs [GL20] was only shown more than a decade after the notion was introduced; this bound shows that to obtain query complexity $q$, a relaxed LDC $C : \{0,1\}^k \to \{0,1\}^n$ must have blocklength $n \ge k^{1+1/O(2^q \cdot \log^2 q)}$.

In this section, we use Theorem 6.1 to obtain an improved lower bound with an exponentially better dependency on the query complexity (note that even for $q = O(1)$ this strongly affects the asymptotic behaviour). We begin by providing a precise definition of relaxed LDCs.

Definition 7.3 (Relaxed LDCs). A code $C : \{0,1\}^k \to \{0,1\}^n$ whose distance is $\delta_C$ is a $q$-local relaxed LDC with success rate $\rho$ and decoding radius $\delta \in (0, \delta_C/2)$ if there exists a randomised algorithm $D$, known as a relaxed decoder, that, on input $i \in [k]$, makes at most $q$ queries to an oracle $w$ and satisfies the following conditions.

1. Completeness: For any $i \in [k]$ and $w = C(x)$, where $x \in \{0,1\}^k$, $\Pr[D^w(i) = x_i] \ge 2/3$.

2. Relaxed decoding: For any $i \in [k]$ and any $w \in \{0,1\}^n$ that is $\delta$-close to a (unique) codeword $C(x)$, $\Pr[D^w(i) \in \{x_i, \bot\}] \ge 2/3$.

3. Success rate: There exists a constant $\rho > 0$ such that, for any $w \in \{0,1\}^n$ that is $\delta$-close to a codeword $C(x)$, there exists a set $I_w \subseteq [k]$ of size at least $\rho k$ such that for every $i \in I_w$, $\Pr[D^w(i) = x_i] \ge 2/3$.

We define the error rate $\sigma$ of a relaxed LDC as the maximum among $\Pr[D^{C(x)}(i) \ne x_i]$ and $\Pr[D^w(i) \notin \{x_i, \bot\}]$, over all messages $x \in \{0,1\}^k$ and words $w \in \{0,1\}^n$ that are $\delta$-close to $C(x)$.

Remark 7.4. The first two conditions imply the latter, as shown by [BGH+04]; hence, to show that $D$ is a relaxed local decoder it suffices to establish completeness and relaxed decoding.

Note that, whenever $D^w$ outputs $\bot$, it detected that the input is not valid, since it is inconsistent with any codeword $C(x)$. In order to capture this behaviour, we slightly generalise the definition of local algorithms and robustness as follows.

Definition 7.5 (Relaxed local algorithm). An algorithm $M$ is said to be a relaxed local algorithm for computing a function $f$ with error rate $\sigma$ and valid input set $V$ if it satisfies the conditions of Definition 4.1, but with the expression $\Pr[M^x(z) = f(z,x)] \ge 1 - \sigma$ replaced by

$\Pr[M^x(z) \in \{f(z,x), \bot\}] \ge 1 - \sigma$

for each input $x \notin V$. The subset $V \subseteq \Sigma^n$ of the domain of $f$ where $M$ is guaranteed never to output $\bot$ comprises the valid inputs.

We shall also need to generalise the notion of robustness accordingly.

Definition 7.6 (Robustness). A relaxed local algorithm $M$ for computing $f$ is $\rho$-robust at point $x \in \Sigma^n$ if $\Pr[M^w(z) \in \{f(z,x), \bot\}] \ge 1 - \sigma$ for all $w \in B_\rho(x)$.

We remark that robustness for algorithms that are allowed to abort permits the correct value to change to $\bot$ (but, crucially, not to the wrong value) even if only one bit is changed. This makes the argument more involved than one for LDCs; indeed, our theorem for relaxed LDCs relies on the full machinery of Theorem 6.1.
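To fix ideas, here is a small sketch (ours, not from the paper) that empirically probes the three conditions of Definition 7.3 on a candidate decoder; `C` and `decoder` are hypothetical stand-ins supplied by the user.

```python
import random

# Sketch (ours): empirically probe the three conditions of Definition 7.3.
# C maps a k-bit message to an n-bit codeword; decoder(w, i) returns 0, 1
# or BOT. Each returned value should be at least 2/3 (and rho, resp.).
BOT = "bot"

def probe_relaxed_ldc(C, decoder, k, delta, trials=300):
    x = [random.randint(0, 1) for _ in range(k)]
    w = list(C(x))
    corrupted = list(w)                    # flip fewer than delta*n bits
    for j in random.sample(range(len(w)), max(0, int(delta * len(w)) - 1)):
        corrupted[j] ^= 1
    freq = lambda word, good: min(
        sum(good(decoder(word, i), x[i]) for _ in range(trials)) / trials
        for i in range(k))
    completeness = freq(w, lambda out, xi: out == xi)            # cond. 1
    relaxed = freq(corrupted, lambda out, xi: out in (xi, BOT))  # cond. 2
    success_rate = sum(                                          # cond. 3
        sum(decoder(corrupted, i) == x[i] for _ in range(trials)) / trials
        >= 2 / 3 for i in range(k)) / k
    return completeness, relaxed, success_rate
```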
Note that an algorithm that ignores its input and always outputs $\bot$ fits both definitions above, but has no valid inputs and clearly does not display any interesting behaviour. We also remark that the set of valid inputs captures completeness (but not the success rate) in the case of relaxed LDCs. With these extensions, a relaxed local decoder $D$ with decoding radius $\delta$ fits the definition of a (relaxed) local algorithm that receives $i \in [k]$ as explicit input, where the code $C$ comprises the valid inputs and every $x \in C$ is $\delta$-robust for $D$.

While a relaxed local algorithm is very similar in flavour to a standard local algorithm, it may not be entirely clear whether a transformation analogous to Theorem 6.1 holds in this case as well. We next show that one indeed does: with small modifications to the algorithm constructed in Section 6.1, we may leverage the same analysis of Section 6.2 to prove the following variant of Theorem 6.1.

Theorem 7.7. Suppose there exists a $(\rho_0, \rho_1)$-robust relaxed local algorithm $M$ for computing the function $f : Z \times \Sigma^n \to \{0,1\}$ with query complexity $\ell = O(1)$ and $\rho_0, \rho_1 = \Omega(1)$. Let $V \subseteq \Sigma^n$ be the valid inputs of $M$. Then, there exists a sample-based relaxed local algorithm $N$ for $f$ with sample complexity $n^{1-1/O(\ell^2 \log^2 \ell)}$ and the same valid inputs $V$.

Proof. Throughout the proof, we assume the explicit input to be fixed and omit it from the notation. First, note that error reduction (Claim 5.8) and randomness reduction (Claim 5.9) apply in the relaxed setting: the analysis is identical on valid inputs, and holds likewise for the remainder of the domain of $f$ (with correctness of $M$ relaxed to $M^x \in \{f(x), \bot\}$). Thus Lemma 5.10 enables the transformation of $M$ into another robust algorithm $M'$ with small error rate that uniformly samples a decision tree from a multi-collection of small size.

Recall that the construction of the sample-based algorithm in Section 6.1 uses a collection of triplets obtained from the behaviour of $M'$ when it outputs 1. A corresponding collection can be obtained for the case where $M'$ outputs 0. Denote by $T_b$ the collection that corresponds to output $b \in \{0,1\}$, and let $N_b$ be the sample-based algorithm that

• uses the triplets $T_b$ to construct its daisy partition in the preprocessing step;
• outputs $b$ if the counter crosses the threshold in Step 2b; and
• outputs $\bot$ in Step 3 if the threshold is never reached;

but is otherwise the same as the construction of Section 6.1.

The analysis of Section 6.2 applies to $N_b^x$: if $x \in V$, the analysis of Claim 6.4 is identical; while if $x$ is robust and $f(x) = \neg b$, Claim 6.5 requires an upper bound on the probability that $M'$ outputs $b$ on $x$ (and enables an application of the volume lemma), which holds by the definition of the error rate of a relaxed local algorithm. Therefore, $N_b^x$ outputs arbitrarily in the sampling step with probability at most $1/10$, and outputs $\bot$ when $f(x) = b$ (resp. outputs $b$ when $x$ is robust and $f(x) = \neg b$) with probability at most $1/10$.

The algorithm $N$ simply executes the sampling step of $N_1$, then the enumeration steps of $N_0$ and $N_1$ on these samples, outputting $b$ if one of the $N_b$ outputs $b$ and outputting $\bot$ otherwise. Then, $N^x = f(x)$ if $x \in V$ and $N^x \in \{f(x), \bot\}$ if $x \notin V$, with probability at least $7/10 \ge 2/3$.
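The combination step at the end of the proof is simple enough to spell out. The sketch below is ours; `sample`, `enum_0` and `enum_1` are hypothetical stand-ins for the sampling and enumeration steps of $N_0$ and $N_1$, and returning $\bot$ on a conflicting pair of outputs is our choice of convention.

```python
# Sketch (ours) of the relaxed sample-based algorithm N from the proof of
# Theorem 7.7, built from the two one-sided algorithms N_0 and N_1.
BOT = "bot"

def run_relaxed_N(x, sample, enum_0, enum_1):
    Q = sample(x)                # a single shared sampling step
    hit_0 = enum_0(x, Q)         # did N_0's counter cross its threshold?
    hit_1 = enum_1(x, Q)         # did N_1's counter cross its threshold?
    if hit_0 != hit_1:           # exactly one threshold was crossed
        return 0 if hit_0 else 1
    return BOT                   # neither (or, conflictingly, both)
```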
Corollary 7.8. Any binary code $C : \{0,1\}^k \to \{0,1\}^n$ that admits a relaxed local decoder $D$ with decoding radius $\delta$ and query complexity $q = O(1)$ also admits a sample-based relaxed local decoder $D'$ with decoding radius $\delta/2$ and sample complexity $n^{1-1/O(q^2 \log^2 q)}$.

We are now ready to state the following corollary, which improves on the previous best rate lower bound for relaxed LDCs [GL20], by an application of the theorem above to the setting of relaxed local decoding. It follows from the construction of a global decoder (which is able to decode the entire message) that is only guaranteed to succeed with high probability when its input is a perfectly valid codeword.

Corollary 7.9. Any code $C : \{0,1\}^k \to \{0,1\}^n$ that is relaxed locally decodable with $q = O(1)$ queries satisfies $n \ge k^{1+1/O(q^2 \log^2 q)}$.

Proof. Let $D'$ be the sample-based relaxed local decoder with sample complexity $q'$ obtained by Corollary 7.8 from a relaxed LDC with query complexity $q$ for the code $C$. Reduce the error rate of $D'$ to $1/(3k)$ by repeating the algorithm $O(\log k)$ times and taking the majority output, thus increasing the sample complexity to $O(q' \cdot \log k) = n^{1-1/t}$ with $t = O(q^2 \log^2 q)$.

Now, consider the global decoder $G$ defined as follows: on input $w$, execute the sampling stage once and the enumeration stages of $D'^w(1), \ldots, D'^w(k)$ on the same samples. A union bound ensures that, with probability at least $2/3$, the outputs satisfy $D'^w(i) = x_i$ for all $i$ if $w = C(x)$.

The global decoder $G$ thus obtains $k$ bits of information from $n^{1-1/t}$ bits with probability above $1/2$, so that $2^k \le 2^{n^{1-1/t}+2}$ and hence $n \ge k^{t/(t-1)}$. Since $t = O(q^2 \log^2 q)$, it follows that $n \ge k^{1+1/O(q^2 \log^2 q)}$.

7.3 MAPs versus testers

Recall that a Merlin-Arthur proof of proximity (MAP, for short) for a property $\Pi$ is a local algorithm that receives explicit access to a proximity parameter $\varepsilon > 0$ and a purported proof string $\pi$, as well as query access to a string $x \in \Sigma^n$. It uses the information encoded in $\pi$ to decide which coordinates of $x$ to query, accepting if $x \in \Pi$ and $\pi$ is a valid proof for $x$, and rejecting if $x$ is $\varepsilon$-far from $\Pi$. In particular, a MAP with proof length 0 is simply a tester. The complexity of a MAP is defined as the sum of its proof length and query complexity. For simplicity, we consider the proximity parameter $\varepsilon$ to be a fixed constant in the following discussion.

A partial tester $T$ is a relaxation of the standard definition of a tester that accepts inputs inside a property $\Pi_1$ and rejects inputs that are far from a larger property $\Pi_2$ that contains $\Pi_1$ (standard testing is the case where $\Pi_1 = \Pi_2$). We first formalise an observation raised in [FGL14], which shows an equivalence between MAPs and coverings by partial testers.

Claim 7.10. A MAP $T$ for a property $\Pi \subseteq \Sigma^n$ with proof complexity $m$, error rate $\sigma$ and query complexity $q = q(\varepsilon)$ is equivalent to a collection of partial testers $\{T_i : i \in [|\Sigma|^m]\}$. Each $T_i(\varepsilon)$ accepts inputs in a property $\Pi_i$ and rejects inputs that are $\varepsilon$-far from $\Pi$, with the same query complexity $q$ and error rate $\sigma$ as $T$. The properties $\Pi_i$ satisfy $\Pi_i \subseteq B_\varepsilon(\Pi)$ and $\Pi \subseteq \cup_i \Pi_i$.

Proof. Consider a MAP $T$ with parameters as in the statement, and define $T_i(\varepsilon) := T(\varepsilon, i)$ for each purported proof $i \in [|\Sigma|^m]$. Clearly the query complexity and error rate of $T_i$ match those of $T$, and these testers reject points that are $\varepsilon$-far from $\Pi$. The property $\Pi_i$ is, by definition, the set of inputs that $T_i$ accepts (with probability at least $1 - \sigma$), which may possibly be empty. But since the definition of a MAP guarantees that for each $x \in \Pi$ there exists a proof $i$ such that $T_i^x$ accepts (with probability $1 - \sigma$), we have $\Pi \subseteq \cup_i \Pi_i$.

Consider, now, a collection of testers $\{T_i : i \in [|\Sigma|^m]\}$ as in the statement, and define a MAP $T$ that simply selects the tester indexed by the received proof string; i.e., $T(\varepsilon, i) := T_i(\varepsilon)$. Then, with probability at least $1 - \sigma$, the MAP $T$ rejects inputs that are $\varepsilon$-far from $\Pi$, and accepts $x \in \Pi$ when its proof string is an $i \in [|\Sigma|^m]$ such that $x \in \Pi_i$.
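The second direction of Claim 7.10 is pure dispatch, as the following sketch (ours) makes explicit; `testers[i]` is a hypothetical stand-in for the partial tester $T_i$.

```python
# Sketch (ours) of the second direction of Claim 7.10: a MAP that dispatches
# to the partial tester selected by the purported proof string.
def map_from_testers(testers):
    def map_verifier(eps, proof, x):
        # On x in Pi with the proof i for which x is in Pi_i, the selected
        # tester T_i accepts w.h.p.; if x is eps-far from Pi, every T_i
        # rejects w.h.p., so no proof string can help.
        return testers[proof](eps, x)
    return map_verifier
```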
As discussed in the introduction, one of the most fundamental questions regarding proofs of proximity is their relative strength in comparison to testers; that is, whether verifying a proof for an approximate decision problem can be done significantly more efficiently than solving it. This can be cast as an analogue of the P versus NP question for property testing.

Fortunately, in the setting of property testing, the problem of verification versus decision is very much tractable. Indeed, one of the main results in [GR18] is that verification can be significantly cheaper than decision. Namely, there exists a property $\Pi$ which: (1) admits a MAP with proof length $O(\log n)$ and query complexity $q = O(1)$; and (2) requires at least $n^{1-1/\Omega(q)}$ queries to be tested without access to a proof. (We remark that in [GR18], for simplicity, the lower bound is stated in a slightly weaker form. However, it is straightforward to see that the stronger form holds; see the discussion at the end of this section.)

While this implies a nearly exponential separation between the power of testers and MAPs, it remained open whether the aforementioned sublinear lower bound on testing is an artefact of the techniques, or whether it is possible to obtain a stronger separation, where the property is harder for testers, potentially requiring even a linear number of queries.

Claim 7.10 and Theorem 6.1 allow us to prove the following theorem, which shows that the foregoing separation is nearly tight.

Theorem 7.11. If a property $\Pi \subseteq \Sigma^n$ admits a MAP with query complexity $q$, proof length $m$ and proximity parameter $\varepsilon/2$, then it admits a sample-based $\varepsilon$-tester with sample complexity $m \cdot n^{1-1/O(q^2 \log^2 q)}$.

Applying Theorem 7.11 to the special case of MAPs with logarithmic proof length, we obtain a sample-based tester with sample complexity $n^{1-1/O(q^2 \log^2 q)}$, showing that the separation in [GR18] is nearly optimal, and in particular that there cannot be a fully exponential separation between MAPs and testers.

Proof (of Theorem 7.11). Let $\Pi$ be a property and $T$ be a MAP with proof length $m$ as in the statement. By Claim 7.10, there exists a collection of partial testers $\{T_i : i \le |\Sigma|^m\}$ with query complexity $q$ that satisfy the following. Each $T_i$ accepts inputs in a property $\Pi_i$ and rejects inputs that are $\varepsilon/2$-far from $\Pi$, where $\Pi \subseteq \cup_i \Pi_i$. By applying Corollary 7.1 to each of these testers, we obtain a collection of sample-based testers $\{S_i\}$ with sample complexity $q' = n^{1-1/O(q^2 \log^2 q)}$ for the same partial properties, but which only reject inputs that are $\varepsilon$-far from $\Pi$.

The execution of each of the $S_i$ proceeds in two steps, as defined in Section 6.1: sampling (Step 1) and enumeration (Step 2). Note that the sampling step is exactly the same for every $S_i$.

Let $k = O(m \log|\Sigma|) = O(m)$ be such that taking the majority output from $k$ repetitions of $S_i$ yields an error rate of $1/(3|\Sigma|^m)$. We define a new sample-based algorithm $S$ that repeats the following steps $k$ times:

1. Execute both steps of $S_1$ (sampling and enumeration), recording the output.
2. For all $1 < i \le |\Sigma|^m$, only execute the enumeration step of $S_i$ on the samples obtained in Item 1, and record the output.

After all $k$ iterations have finished, check if at least $k/2$ of the outputs of $S_i$ were 1 for some $i$. If so, output 1, and output 0 otherwise.

First suppose $S$ receives an input $x \in \Pi$, and let $i \le |\Sigma|^m$ be such that $x \in \Pi_i$. Then the majority output of the enumeration steps of $S_i$ is 1 with probability $1 - 1/(3|\Sigma|^m) \ge 2/3$. Now suppose $S$ receives an input $x$ that is $\varepsilon$-far from $\Pi$. Then, for each $i$, the majority output of the enumeration steps of $S_i$ is 1 with probability at most $1/(3|\Sigma|^m)$; a union bound over all $i \le |\Sigma|^m$ ensures that this happens for no $i$ with probability at least $2/3$, in which case $S$ correctly outputs 0.

$S$ is therefore an $\varepsilon$-tester for the property $\Pi$, and as its sample complexity is $k \cdot q' = O(m) \cdot n^{1-1/O(q^2 \log^2 q)}$, the theorem follows.

Interestingly, as a direct corollary of Theorem 7.11, we obtain that the general transformation in Theorem 6.1 is optimal up to a quadratic gap in the dependency on the sample complexity: a transformation with a smaller sample complexity could have been used to transform the MAP construction in the MAPs-vs-testers separation of [GR18], yielding a tester with query complexity that contradicts the lower bound in that result.
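The tester $S$ constructed in the proof of Theorem 7.11 can be sketched as follows (ours; `sample` and `enum_steps[i]` are hypothetical stand-ins for the shared sampling step and the enumeration step of the sample-based partial tester $S_i$).

```python
# Sketch (ours) of the tester S from the proof of Theorem 7.11.
def run_S(x, sample, enum_steps, k):
    wins = [0] * len(enum_steps)
    for _ in range(k):                     # k = O(m) rounds
        Q = sample(x)                      # fresh shared samples each round
        for i, enum in enumerate(enum_steps):
            wins[i] += enum(x, Q)          # record the output of S_i
    # output 1 iff at least k/2 of some S_i's outputs were 1
    return int(any(w >= k / 2 for w in wins))
```

Sharing one sample per round across all $|\Sigma|^m$ enumeration steps is what keeps the sample complexity at $k \cdot q'$ rather than $|\Sigma|^m \cdot k \cdot q'$.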
Theorem 7.12. There does not exist a transformation that takes a robust local algorithm with query complexity $q$ and transforms it into a sample-based local algorithm with sample complexity at most $n^{1-1/o(q)}$.

Proof. Let $\Pi$ be the encoded intersecting messages property considered in [GR18, Section 3.1], for which it was shown that $\Pi$ admits a MAP with query complexity $q$ and logarithmic proof complexity, but every tester for $\Pi$ requires at least $n^{1-1/\Omega(q)}$ queries. Suppose towards contradiction that a transformation as in the hypothesis exists. Then, applying this transformation to the aforementioned MAP (as in Theorem 7.11) yields a tester for $\Pi$ with query complexity $n^{1-1/o(q)}$, in contradiction to the aforementioned lower bound.

On the lower bound in [GR18]. The separation between MAPs and testers in [GR18] is proved with respect to a property of strings that are encoded by relaxed LDCs; namely, the encoded intersecting messages property, defined as

$\mathrm{EIM}_C = \{(C(x), C(y)) : x, y \in \{0,1\}^k, k \in \mathbb{N}$ and $\exists i \in [k]$ s.t. $x_i \ne 0$ and $y_i \ne 0\}$,

where $C : \{0,1\}^k \to \{0,1\}^n$ is a code with linear distance which is both a relaxed LDC and an LTC. In [GR18] it is shown that this property admits a MAP with proof length $O(\log n)$ and query complexity $q = O(1)$ and, crucially for us, that any tester for it requires $\Omega(k)$ queries. The best constructions of codes that satisfy the aforementioned conditions [BGH+04, CGS20, AS20] achieve blocklength $n = O(k^{1+1/q}) = k^{1+1/\Omega(q)}$, and hence the stated lower bound of $n^{1-1/\Omega(q)}$ queries follows.

References

[ALM+92] Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario Szegedy. Proof verification and hardness of approximation problems. In Proceedings of the 33rd Annual Symposium on Foundations of Computer Science (FOCS), pages 14–23, 1992.

[AS92] Sanjeev Arora and Shmuel Safra. Probabilistic checking of proofs: a new characterization of NP. In Proceedings of the 33rd Annual Symposium on Foundations of Computer Science (FOCS), pages 2–13. IEEE Computer Society, 1992.

[AS20] Vahid R. Asadi and Igor Shinkar. Relaxed locally correctable codes with improved parameters. arXiv preprint arXiv:2009.07311, 2020.

[BDG17] Jop Briët, Zeev Dvir, and Sivakanth Gopi. Outlaw distributions and locally decodable codes. In Proceedings of the 8th Innovations in Theoretical Computer Science Conference (ITCS), pages 20:1–20:19, 2017.

[BGGZ18] Jeremiah Blocki, Venkata Gandikota, Elena Grigorescu, and Samson Zhou. Relaxed locally correctable codes in computationally bounded channels. 2018.

[BGH+04] Eli Ben-Sasson, Oded Goldreich, Prahladh Harsha, Madhu Sudan, and Salil P. Vadhan. Robust PCPs of proximity, shorter PCPs and applications to coding. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), 2004.

[BGS15] Arnab Bhattacharyya, Elena Grigorescu, and Asaf Shapira. A unified framework for testing linear-invariant properties. Random Structures & Algorithms, 46(2):232–260, 2015.

[BMR19a] Piotr Berman, Meiram Murzabulatov, and Sofya Raskhodnikova. The power and limitations of uniform samples in testing properties of figures. Algorithmica, 81(3):1247–1266, 2019.

[BMR19b] Piotr Berman, Meiram Murzabulatov, and Sofya Raskhodnikova. Testing convexity of figures under the uniform distribution. Random Structures & Algorithms, 54(3):413–443, 2019.

[BRV18] Itay Berman, Ron D. Rothblum, and Vinod Vaikuntanathan. Zero-knowledge proofs of proximity. In Proceedings of the 9th Innovations in Theoretical Computer Science Conference (ITCS).
Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2018.

[BSCG+19] Eli Ben-Sasson, Alessandro Chiesa, Lior Goldberg, Tom Gur, Michael Riabzev, and Nicholas Spooner. Linear-size constant-query IOPs for delegating computation. In Theory of Cryptography Conference (TCC), pages 494–521. Springer, 2019.

[BSHLM09] Eli Ben-Sasson, Prahladh Harsha, Oded Lachish, and Arie Matsliah. Sound 3-query PCPPs are long. ACM Transactions on Computation Theory (TOCT), 1(2):1–49, 2009.

[CFSS17] Xi Chen, Adam Freilich, Rocco A. Servedio, and Timothy Sun. Sample-based high-dimensional convexity testing. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2017.

[CG18] Clément L. Canonne and Tom Gur. An adaptivity hierarchy theorem for property testing. Computational Complexity, 27(4):671–716, 2018.

[CGdW09] Victor Chen, Elena Grigorescu, and Ronald de Wolf. Efficient and error-correcting data structures for membership and polynomial evaluation. arXiv preprint arXiv:0909.3696, 2009.

[CGS20] Alessandro Chiesa, Tom Gur, and Igor Shinkar. Relaxed locally correctable codes with nearly-linear block length and constant query complexity. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2020.

[DGG19] Irit Dinur, Oded Goldreich, and Tom Gur. Every set in P is strongly testable under a suitable encoding. In Proceedings of the 10th Innovations in Theoretical Computer Science Conference (ITCS), 2019.

[DH13] Irit Dinur and Prahladh Harsha. Composition of low-error 2-query PCPs using decodable PCPs. SIAM Journal on Computing, 42(6):2452–2486, 2013.

[DR06] Irit Dinur and Omer Reingold. Assignment testers: Towards a combinatorial proof of the PCP theorem. SIAM Journal on Computing, 36(4):975–1024, 2006.

[Efr12] Klim Efremenko. 3-query locally decodable codes of subexponential length. SIAM Journal on Computing, 41(6):1694–1703, 2012.

[EKK+00] Funda Ergün, Sampath Kannan, S. Ravi Kumar, Ronitt Rubinfeld, and Mahesh Viswanathan. Spot-checkers. Journal of Computer and System Sciences, 60(3):717–751, 2000.

[FGL+91] Uriel Feige, Shafi Goldwasser, László Lovász, Shmuel Safra, and Mario Szegedy. Approximating clique is almost NP-complete. In Proceedings of the 32nd IEEE Symposium on Foundations of Computer Science (FOCS), pages 2–12, 1991.

[FGL14] Eldar Fischer, Yonatan Goldhirsh, and Oded Lachish. Partial tests, universal tests and decomposability. In Moni Naor, editor, Innovations in Theoretical Computer Science (ITCS), pages 483–500. ACM, 2014.

[FLV15] Eldar Fischer, Oded Lachish, and Yadu Vasudev. Trading query complexity for sample-based testing and multi-testing scalability. In Proceedings of the IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), 2015.

[GG16a] Oded Goldreich and Tom Gur. Universal locally verifiable codes and 3-round interactive proofs of proximity for CSP. Electronic Colloquium on Computational Complexity (ECCC), 23:192, 2016.

[GG16b] Oded Goldreich and Tom Gur. Universal locally verifiable codes and 3-round interactive proofs of proximity for CSP. In Electronic Colloquium on Computational Complexity (ECCC), volume 23, page 192, 2016.

[GG18] Oded Goldreich and Tom Gur. Universal locally testable codes. Chicago Journal of Theoretical Computer Science, 2018.

[GGK15] Oded Goldreich, Tom Gur, and Ilan Komargodski. Strong locally testable codes with relaxed local decoders. In Proceedings of the 30th Conference on Computational Complexity (CCC), 2015.

[GGR98] Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM, 45(4):653–750, 1998.

[GL20] Tom Gur and Oded Lachish. On the power of relaxed local decoding algorithms.
In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1377–1394. SIAM, 2020.

[Gol04] Oded Goldreich. Short locally testable codes and proofs. Electronic Colloquium on Computational Complexity (ECCC), 2004. (Later appeared in Property Testing 2010.)

[Gol17] Oded Goldreich. Introduction to Property Testing. Cambridge University Press, 2017.

[GR16] Oded Goldreich and Dana Ron. On sample-based testers. ACM Transactions on Computation Theory (TOCT), 8(2):1–54, 2016.

[GR17] Tom Gur and Ron D. Rothblum. A hierarchy theorem for interactive proofs of proximity. In Proceedings of the 8th Innovations in Theoretical Computer Science Conference (ITCS), pages 39:1–39:43, 2017.

[GR18] Tom Gur and Ron D. Rothblum. Non-interactive proofs of proximity. Computational Complexity, 27(1):99–207, 2018.

[GRR18] Tom Gur, Govind Ramnarayan, and Ron D. Rothblum. Relaxed locally correctable codes. In Proceedings of the 9th Innovations in Theoretical Computer Science Conference (ITCS), pages 27:1–27:11, 2018.

[GS06] Oded Goldreich and Madhu Sudan. Locally testable codes and PCPs of almost-linear length. Journal of the ACM, 53(4):558–655, 2006.

[GS10] Oded Goldreich and Or Sheffet. On the randomness complexity of property testing. Computational Complexity, 19(1), 2010.

[GT03] Oded Goldreich and Luca Trevisan. Three theorems regarding testing graph properties. Random Structures & Algorithms, 23(1):23–57, 2003.

[HS70] András Hajnal and Endre Szemerédi. Proof of a conjecture of P. Erdős. Colloq. Math. Soc. János Bolyai, 4, 1970.

[HSX+12] Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin. Erasure coding in Windows Azure storage. In 2012 USENIX Annual Technical Conference, pages 15–26, 2012.

[KR15] Yael Tauman Kalai and Ron D. Rothblum. Arguments of proximity. In Annual Cryptology Conference (CRYPTO), pages 422–442. Springer, 2015.

[KS17] Swastik Kopparty and Shubhangi Saraf. Local testing and decoding of high-rate error-correcting codes. Electronic Colloquium on Computational Complexity (ECCC), 24:126, 2017.

[KT00] Jonathan Katz and Luca Trevisan. On the efficiency of local decoding procedures for error-correcting codes. In Proceedings of the 32nd Annual ACM Symposium on Theory of Computing (STOC), 2000.

[MR10] Dana Moshkovitz and Ran Raz. Two-query PCP with subconstant error. Journal of the ACM, 57(5):29:1–29:29, 2010.

[New91] Ilan Newman. Private vs. common random bits in communication complexity. Information Processing Letters, 39(2):67–71, 1991.

[RR19] Noga Ron-Zewi and Ron Rothblum. Local proofs approaching the witness length. Electronic Colloquium on Computational Complexity (ECCC), 26:127, 2019.

[RRR19] Omer Reingold, Guy N. Rothblum, and Ron D. Rothblum. Constant-round interactive proofs for delegating computation. SIAM Journal on Computing, (0):STOC16-255, 2019.

[RS96] Ronitt Rubinfeld and Madhu Sudan. Robust characterizations of polynomials with applications to program testing. SIAM Journal on Computing, 25(2):252–271, 1996.

[RVW13] Guy N. Rothblum, Salil Vadhan, and Avi Wigderson. Interactive proofs of proximity: delegating computation in sublinear time. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC), pages 793–802, 2013.

[Tre04] Luca Trevisan. Some applications of coding theory in computational complexity. Electronic Colloquium on Computational Complexity (ECCC), 2004.

[Yek08] Sergey Yekhanin. Towards 3-query locally decodable codes of subexponential length. Journal of the ACM, 55(1):1:1–1:16, 2008.

[Yek12] Sergey Yekhanin. Locally decodable codes.
Foundations and Trends in Theoretical Computer Science, 6(3):139–255, 2012.

A Deferred proofs

In this appendix, we provide the proofs of two claims that were deferred in Section 5: Claim 5.8, which provides an amplification procedure for local algorithms, and Claim 5.9, which provides a randomness reduction procedure for local algorithms. We remark that both claims follow from a straightforward adaptation of standard techniques, and we include their proofs for completeness. We begin with the amplification procedure.

Claim A.1 (Claim 5.8, restated). Let $M$ be a $(\rho_0, \rho_1)$-robust algorithm for computing a function $f : Z \times \Sigma^n \to \{0,1\}$ with error rate $\sigma \le 1/6$, query complexity $q$ and randomness complexity $r$. For any $\sigma' > 0$, there exists a $(\rho_0, \rho_1)$-robust algorithm $N$ for computing the same function with error rate $\sigma'$, query complexity $108\, q \log(1/\sigma')/\sigma$ and randomness complexity $108\, r \log(1/\sigma')/\sigma$.

Proof. Define $N$ as the algorithm that makes $t = 108 \log(1/\sigma')/\sigma$ independent runs of $M$ and outputs the most frequent symbol, resolving ties arbitrarily. The query and randomness complexities of $N$ clearly match the statement, and we must now prove that the error rate is indeed $\sigma'$ and that $N$ is $(\rho_0, \rho_1)$-robust.

Fix $z \in Z$ and $x \in \Sigma^n$ in the domain of $f$, and let $b := f(z, x)$. As $M$ is $\rho_b$-robust at $x$, the algorithm satisfies $\Pr[M^y(z) = b] \ge 1 - \sigma$ for all $y \in B_{\rho_b}(x)$. By the Chernoff bound,

$\Pr[M^y(z) \ne b$ for at least $(\sigma + 1/4)t$ runs$] \le e^{-\sigma t/54} = e^{-2\ln(1/\sigma')} < \sigma'$.

The majority rule will thus yield outcome $b$ with probability at least $1 - \sigma'$, since at least $(1 - \sigma - 1/4)t \ge t/2$ runs output $b$ (except with probability at most $\sigma'$). As $x$, $z$ and $y \in B_{\rho_b}(x)$ are arbitrary, the result follows.
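A minimal sketch (ours) of the amplification procedure, assuming `M` is given as a callable with its own internal randomness:

```python
import math
from collections import Counter

# Minimal sketch (ours) of Claim A.1: t independent runs of M followed by a
# plurality vote drive the error rate down to sigma'.
def amplify(M, z, x, sigma, sigma_prime):
    t = math.ceil(108 * math.log(1 / sigma_prime) / sigma)
    outcomes = Counter(M(z, x) for _ in range(t))
    return outcomes.most_common(1)[0][0]   # most frequent symbol wins
```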
We proceed to the randomness reduction transformation.

Claim A.2 (Claim 5.9, restated). Let $M$ be a $(\rho_0, \rho_1)$-robust algorithm for computing a function $f : Z \times \Sigma^n \to \{0,1\}$ with error rate $\sigma$, query complexity $q$ and randomness complexity $r$. There exists a $(\rho_0, \rho_1)$-robust algorithm $N$ for computing the same function with error rate $2\sigma$ and query complexity $q$, whose distribution $\tilde{\mu}_N$ has support size $3n \ln|\Sigma|/\sigma$. In particular, the randomness complexity of $N$ is bounded by $\log(n/\sigma) + \log\log|\Sigma| + 2$.

Proof. Fix any explicit input $z \in Z$. Let $\{x_j\}$ be an enumeration of the inputs in $\Sigma^n$ such that $\Pr[M^{x_j}(z) = b_j] \ge 1 - \sigma$ for some $b_j \in \{0,1\}$. Note that this includes points in the neighbourhood of a point at which $M$ is robust, which are not necessarily in the domain of $f$; so it suffices to show $\Pr[N^{x_j}(z) = b_j] \ge 1 - 2\sigma$ to prove the claim for $N$ with the required query complexity and distribution.

Define the $2^r \times |\{x_j\}|$ matrix $E$ with entries in $\{0,1\}$ as follows. Denote by $b_{ij} \in \{0,1\}$ the output of $M^{x_j}(z)$ when it executes according to the decision tree indexed by (the binary representation of) $i \in [2^r]$. Then

$E_{i,j} = 1$ if $b_{ij} \ne b_j$, and $E_{i,j} = 0$ otherwise.

Note that $E_{i,j}$ simply indicates whether $M^{x_j}(z)$ outputs incorrectly when the outcome of the algorithm's coin flips is (the binary representation of) $i$. By construction, for each fixed $j$, a fraction of at most $\sigma$ of the indices $i \in [2^r]$ are such that $E_{i,j} = 1$.

Let $t = 3n \ln|\Sigma|/\sigma$ and let $I_1, \ldots, I_t$ be independent random variables uniformly distributed in $[2^r]$. For each fixed $j \le |\{x_j\}| \le |\Sigma|^n$ and $k \le t$, we have $\mathrm{E}[E_{I_k,j}] \le \sigma$. By the Chernoff bound,

$\Pr\left[\sum_{k=1}^t E_{I_k,j} \ge 2\sigma t\right] \le e^{-\sigma t/3} = e^{-n\ln|\Sigma|} = |\Sigma|^{-n}$.

Applying the union bound over all $j \le |\{x_j\}| \le |\Sigma|^n$, we obtain

$\Pr\left[\sum_{k=1}^t E_{I_k,j} \ge 2\sigma t$ for some $j\right] < 1$.

We have thus shown, via the probabilistic method, the existence of a multi-set $R_z$ of size $3n \ln|\Sigma|/\sigma$ such that $\Pr[N^{x_j}(z) \ne b_j] \le 2\sigma$, where $N$ samples its random strings uniformly from $R_z$ (rather than from $\{0,1\}^r$), using the corresponding decision trees of $M$. The size of $R_z$ is thus $|\tilde{\mu}_N| = 3n \ln|\Sigma|/\sigma$, and this sampling can be performed with $\log(n/\sigma) + \log\log|\Sigma| + 2$ random coins. Since the decision trees of $N$ are simply a subcollection of those of $M$, the query complexity of $N$ is $q$, concluding the proof.
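Finally, the randomness reduction also admits a direct algorithmic reading; the sketch below (ours) fixes a random multiset of seeds once and for all, as in the probabilistic argument above. `M_with_seed` is a hypothetical stand-in for $M$ run with explicit coins.

```python
import math
import random

# Sketch (ours) of Claim A.2: fix, once and offline, a random multiset R of
# t = 3n*ln|Sigma|/sigma seeds; the derandomised algorithm N then draws its
# coins uniformly from R instead of from {0,1}^r.
def reduce_randomness(M_with_seed, r, n, sigma, alphabet_size):
    t = math.ceil(3 * n * math.log(alphabet_size) / sigma)
    R = [random.getrandbits(r) for _ in range(t)]
    def N(z, x):
        return M_with_seed(z, x, random.choice(R))   # error rate 2*sigma w.h.p.
    return N
```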