Local Correction of Juntas
Noga Alon ∗ Amit Weinstein † August 20, 2018
Abstract
A Boolean function f over n variables is said to be q-locally correctable if, given black-box access to a function g which is "close" to an isomorphism f_σ of f, we can compute f_σ(x) for any x ∈ Z_2^n with good probability using q queries to g.

We observe that any k-junta, that is, any function which depends only on k of its input variables, is O(2^k)-locally correctable. Moreover, we show that there are examples where this is essentially best possible, and locally correcting some k-juntas requires a number of queries which is exponential in k. These examples, however, are far from being typical, and indeed we prove that for almost every k-junta, O(k log k) queries suffice.

1 Introduction

The field of property testing of Boolean functions has received a considerable amount of attention in the last two decades. Many properties of functions have been examined in order to estimate the query complexity needed for testing them, that is, the number of inputs of the function one has to read in order to distinguish between a function that satisfies the property and one that is "far" from satisfying it. In particular, one is usually interested in properties for which the number of queries is independent of the input size. Some of these properties are linearity [7], being a dictator function [4, 11], a junta [9, 10], or a low-degree polynomial [2].

Another property that one might consider testing is function isomorphism, i.e., testing whether two functions are identical up to relabeling of the input variables. A common scenario is where one function is given in advance and the goal of the tester is to determine if the second input function is isomorphic to it or far from any isomorphism of it. Several recent results indicate that testing this property is hard for most functions (requires Ω(n) queries), and specifically for k-juntas there are lower bounds which depend on k (see e.g.
[6, 1, 8]).

The focus of our work is not testing such properties, but rather locally correcting functions, that is, determining the value of a function at a given point by reading its values at several other points. This is closely related to random self-reducibility, as pointed out already in [7]. More precisely, we care about locally correcting specific functions which are known up to isomorphism.

∗ Sackler School of Mathematics and Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel and Institute for Advanced Study, Princeton, New Jersey 08540, USA. Email: [email protected]. Research supported in part by an ERC Advanced grant, by a USA-Israeli BSF grant and by NSF grant No. DMS-0835373.
† Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel. Email: [email protected]. Research supported in part by an ERC Advanced grant.

Question. Given a specific Boolean function f, what is the needed query complexity in order to correct an input function which is close to some isomorphism f_σ of f?

This question can be seen as a special case of locally correctable codes (see, e.g., [12]). Namely, each codeword would be the 2^n evaluations of an isomorphism of the input function (at most n! distinct codewords), and we would like to correct any specific value of the given noisy codeword using as few queries as possible.

Here we study the above question mostly for juntas. We provide both exponential lower and upper bounds for the query complexity of locally correcting juntas with respect to their size. However, the given lower bound is applicable only to a small portion of the juntas, and in fact we show that most k-juntas are locally correctable using a nearly linear (in k) number of queries.

2 Preliminaries

In order to correct functions, we need to first define when two functions are "close", as otherwise correction is hopeless.
We use the common definition, saying two Boolean functions are ε-close if they agree on all but at most an ε fraction of the inputs. The following definition best describes the focus of our work, indicating when a function is locally correctable.

Definition. A Boolean function f : Z_2^n → Z_2 is said to be q-locally correctable for ε > 0 if the following holds. There exists an algorithm that, given an input function g which is ε-close to an isomorphism f_σ of f, can determine the value f_σ(x) for any specific x ∈ Z_2^n with probability at least 2/3, using q queries to g.

More generally, we also define a family of functions to be locally correctable, where we do not require knowing which specific function from the family we are trying to correct.
Definition.
A family F of Boolean functions is said to be q-locally correctable for ε > 0 if the following holds. There exists an algorithm that, given an input function g which is ε-close to an isomorphism f_σ of f, for some f ∈ F from the family, can determine the value f_σ(x) for any specific x ∈ Z_2^n with probability at least 2/3, using q queries to g.

A crucial observation when looking at the above definitions is the fact that the mentioned algorithm must work for every input x ∈ Z_2^n. Replacing this requirement by the ability to determine the value at a uniformly random x, any function would be trivially 1-locally correctable for ε ≤ 1/3, simply by returning the queried value g(x). Throughout, we consider ε as a constant independent of n, which however can depend on some property of the function f. This dependence is often required to ensure that g is close to a unique isomorphism of f (up to equivalent isomorphisms).

A simple result regarding juntas is an exponential upper bound for the number of queries, in terms of the junta's size. For this upper bound, we use the analysis of testing low-degree polynomials, and therefore get the following more general bound.

Proposition 1.
Every polynomial of degree k over Z_2 is O(2^k)-locally correctable for ε < 2^{−k−3}.

Proof sketch. The techniques used in testing low-degree polynomials rely on their values on the points of random affine subcubes inside Z_2^n, which are defined by a random basis of k + 1 vectors and an offset in Z_2^n (see [2]). Taking such a subcube and summing the evaluations of a degree-k polynomial over all 2^{k+1} of its elements always results in zero. The test itself selects several such random subcubes and verifies that this is indeed the case. Since in our case we are given some specific input x ∈ Z_2^n at which we want to correct the function, we can use a similar argument.

Given the input x, we randomly select k + 1 vectors x_1, x_2, ..., x_{k+1} and consider the affine subcube whose basis is the set of these k + 1 vectors and whose offset is x. Since the sum of evaluations inside this affine subcube (which includes x) is zero, we can deduce the value at x by querying the other 2^{k+1} − 1 points of the subcube. As the basis vectors are chosen uniformly at random, each point queried is uniformly distributed; relying on the (easy case in the) analysis of [2], the probability of hitting at least one faulty evaluation is at most (2^{k+1} − 1)ε < 1/3. Therefore f is indeed O(2^k)-locally correctable for ε < 2^{−k−3}.

Corollary 2.
The family of k-juntas is O(2^k)-locally correctable for ε < 2^{−k−3}.

Proof. A k-junta is in particular a polynomial of degree k and therefore is also O(2^k)-locally correctable by the above proposition. In addition, the algorithm suggested by the proposition does not require any knowledge about the input function except for it being a polynomial of degree k; thus the family of k-juntas is O(2^k)-locally correctable.

A natural question is whether the exponential upper bound for low-degree polynomials, which is applicable also to juntas, is indeed tight in the case of juntas. We show that the answer is positive, but only for a small fraction of the juntas. In other words, for some juntas the exponential upper bound is also best possible, but this is far from being the typical case.
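The correction step behind the proposition above can be illustrated in code. The following is a minimal Python sketch (not the paper's implementation): the oracle interface `g`, taking an n-tuple of bits and returning a bit, is an assumed convention for illustration. It XORs the oracle's values over all points x + (sum of a nonempty subset of k + 1 random directions); since a degree-k polynomial over Z_2 sums to zero over every such (k+1)-dimensional affine subcube, the XOR recovers the value at x when no query hits a faulty evaluation.

```python
import itertools
import random

def correct_degree_k(g, n, k, x):
    """Correct the value at x of a (possibly noisy) degree-k polynomial over Z_2.

    Picks k+1 uniformly random direction vectors and XORs g over the
    2^(k+1) - 1 points obtained by adding a nonempty subset of the
    directions to x.  This XOR equals the true value at x whenever
    none of the queried points is faulty.
    """
    dirs = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(k + 1)]
    acc = 0
    for r in range(1, k + 2):
        for subset in itertools.combinations(range(k + 1), r):
            # the point x + sum_{j in subset} dirs[j], coordinate-wise over Z_2
            y = list(x)
            for j in subset:
                y = [yi ^ dj for yi, dj in zip(y, dirs[j])]
            acc ^= g(tuple(y))
    return acc
```

With a noiseless oracle the procedure always returns the correct value; with an ε-noisy oracle, a union bound over the 2^{k+1} − 1 uniformly distributed queries bounds the error probability, as in the proof sketch.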
Theorem 3.
There exist k-juntas which require 2^{Ω(k)} (adaptive or non-adaptive) queries in order to be locally corrected, even for ε which is exponentially small in n.

In the typical case, however, i.e., for almost every junta, the lower bound above is far from being tight, and in fact one can correct a typical k-junta using a nearly linear (in k) number of queries. Formally, we prove the following.

Theorem 4. A k-junta in which every influencing variable has influence at least 1/50 is O(k log k)-locally correctable for ε < 2^{−k−3}. As we later show, this is the case for almost every k-junta.

Corollary 5.
The family of k-juntas in which every influencing variable has influence at least 1/50 is O(k log k)-locally correctable for ε < 2^{−k−3}.

3 Correcting k-juntas

We start this section with the proof of the lower bound for some juntas. The juntas used in the proof are very sparse, having an exponentially small fraction of inputs for which the value of the function is 1.

Proof of Theorem 3.
Given k < n ∈ N where n is even, define f to be the AND function of the first k literals x_1, ..., x_k. In order to prove a lower bound for the number of queries, we use Yao's principle. To this end, we define two distributions on functions which are all o(1)-close to being isomorphic to f, one for which the algorithm should return zero and another for which the algorithm should return one (denoted by D_0 and D_1 respectively). We further show that any algorithm that performs only 2^{o(k)} queries would not be able to distinguish between the two distributions with non-negligible probability.

We first describe the distribution D_0 as follows. We randomly choose a permutation σ ∈ S_n so that σ(i) ∈ [n/2] for every i ∈ [k], meaning the k relevant variables are all in the first half. The function g given to the algorithm is defined by g(y) = f_σ(y) whenever the Hamming weight of y is at most 0.3n in each half (i.e., Σ_{i=1}^{n/2} y_i ≤ 0.3n and Σ_{i=n/2+1}^{n} y_i ≤ 0.3n), and otherwise g(y) = 0. Notice that indeed g is o(1)-close to being isomorphic to f, as we modified only an o(1)-fraction of the inputs. The input x is set to be the balanced input 0^{n/2}1^{n/2}, whose first n/2 coordinates are zero and last n/2 coordinates are one, so that f_σ(x) = 0 for every instance in D_0 as required.

The distribution D_1 is similar to D_0 with one modification. The permutation σ is chosen so that σ(i) ∉ [n/2] for every i ∈ [k], placing all k relevant variables in the second half. The choice of x and the locations where we fix g(y) = 0 are defined as before, and indeed f_σ(x) = 1 for every instance in D_1.

We first show that an arbitrary query to g in either distribution would output one with probability at most 2^{−Ω(k)}. Let y be some query the algorithm performs. Clearly, if the Hamming weight of y in either half is more than 0.3n, the result would be zero according to the definition of g in both distributions. Otherwise, the probability that g(y) = 1 is given by

(m choose k) / (n/2 choose k) = ((m − k + 1)(m − k + 2) ⋯ m) / ((n/2 − k + 1)(n/2 − k + 2) ⋯ (n/2)) ≤ (0.3n / (n/2))^k = 0.6^k,

where m is the Hamming weight of y in the relevant half (either the first half for D_0 or the second half for D_1), which is known to be at most 0.3n. Therefore, any algorithm that performs at most 2^{o(k)} queries would find a y for which g(y) = 1 only with negligible probability, and it would not be able to distinguish between D_0 and D_1 with noticeable probability. Notice that the proof implies that using an adaptive algorithm would not yield any improvement, as we can predict all results to be zero in advance (and therefore this is equivalent to a non-adaptive algorithm).

The fact that the AND junta is very sparse was crucial for the above proof. In order to prove a better upper bound for most juntas, we need some restriction that would ensure the function is far from being sparse. In Theorem 4 we required something even stronger, that the influence of every influencing variable, that is, any of the k special variables of the junta, is at least 1/50.

Definition 1 (Influence). Given a Boolean function f : Z_2^n → Z_2, the influence of i with respect to f is defined by Inf_i(f) = Pr_x[f(x) ≠ f(x + e_i)], where e_i is the vector having 1 only at location i. Thus, the influence is the probability that changing the value of the i-th variable will also change the value of the function.
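Definition 1 can be read operationally: the influence is estimated by sampling random inputs and flipping coordinate i. The following minimal Python sketch (the sampling routine and its parameters are illustrative, not from the paper) does exactly that:

```python
import random

def estimate_influence(f, n, i, samples=5000):
    """Empirically estimate Inf_i(f) = Pr_x[f(x) != f(x + e_i)]."""
    hits = 0
    for _ in range(samples):
        x = [random.randint(0, 1) for _ in range(n)]
        y = list(x)
        y[i] ^= 1  # add the unit vector e_i over Z_2
        hits += f(tuple(x)) != f(tuple(y))
    return hits / samples
```

For instance, for a dictator function f(x) = x_1 the estimate is exactly 1 for i = 1 and 0 for any other coordinate, while for the AND of the first k variables each influencing variable has influence only 2^{−(k−1)}, which is why the sparse AND junta of the lower bound above fails the influence condition of Theorem 4 for large k.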
This probability is taken over all values of x, and is therefore the expected influence of i in a restricted function (when the variable i itself is not restricted). Throughout this work we use the notation [ℓ] := {1, 2, ..., ℓ}.

Notice that if the influence of a variable i is greater than 1/50, then the function is 1/100-far from any function that does not depend on the i-th variable. To see that the condition of Theorem 4 indeed holds for almost every junta, consider a uniformly random k-junta for which the influencing variables are the first k variables. The influence of some variable 1 ≤ i ≤ k is determined by the bias of the 2^{k−1} pairs of inputs of length k which differ only in the i-th variable, where the values of all other variables in [k] range over all possibilities. The expected influence is hence 1/2, and moreover

Pr[Inf_i(f) < 1/50] = Pr[B(2^{k−1}, 1/2) < 2^{k−1}/50] < 2^{−c·2^k}

for some absolute constant c > 0, where here B is the binomial distribution and we applied one of the standard estimates for binomial distributions (cf., e.g., [3], Appendix A). Therefore, by the union bound, the k influencing variables would all have influence greater than 1/50 with probability 1 − 2^{−Ω(2^k)}.

Now that we defined the influence of a variable and verified that indeed almost every junta satisfies the condition in the theorem, we describe the proof of Theorem 4.

Proof of Theorem 4.
Let f be a k-junta as in Theorem 4, and let g be the given input function which is ε-close to f_σ (we assume ε < 2^{−k−3} in order to guarantee that g is close to a unique isomorphism f_σ). Following the basic approach in the known junta testing algorithms (see e.g. [10, 5]), we intend to randomly divide the variables into parts and identify which sets have influencing variables. Here, however, mistakenly identifying a set to have influencing variables (due to faulty evaluations of the input function) or having more than one such variable in a part is not an essential issue (as estimating the number of influencing variables is not our goal).

Fix s = 3k and partition the set [n] into s parts uniformly at random, by assigning to each variable one of the s sets independently. For each set we perform r = 100 log k + 500 pairs of queries, where each pair (x, x′) is chosen independently and uniformly at random such that x and x′ agree on all elements outside of the current set. When a set has at least one influencing variable (with influence at least 1/50), each pair evaluates to different values with probability at least 1/100 (as the randomly restricted function over the variables outside of the current set is expected to have influence at least 1/50 in this variable), so the probability that no pair detects the set is at most 0.99^r < 0.99^{100 log k} · 0.99^{500} < 1/(100k) (assuming we did not hit a faulty evaluation - a probability that we later consider). Since there are at most k sets with influencing variables, by the union bound we would identify them all with probability at least 99/100. During this process, we perform only sr = O(k log k) pairs of independent non-adaptive queries, and therefore we would hit a faulty evaluation only with probability O(k log k / 2^k) = 2^{−Ω(k)}.

So far, with good probability, we have identified at most k sets which are influencing. Let S_1, ..., S_k denote these k sets (where we add some arbitrary randomly chosen sets if fewer than k were found) and define S = ∪_{i=1}^{k} S_i to be their union (notice that the size of S is expected to be E[|S|] = n/3). Given the input x for which we were asked to determine f_σ(x), we would like to choose an input y which agrees with x on all indices from S, and yet is uniformly distributed (except for the restriction to match x on the k influencing variables). Achieving this would guarantee that the probability of y hitting a faulty evaluation is at most ε·2^{k+1} ≤ 2^{k+1}/2^{k+3} = 1/4 (as y is uniformly distributed over the 2^{n−k} inputs which agree with x on these k variables).

Let p = 3/4 and choose y so that y_i = x_i for every i ∈ S, and otherwise Pr[y_i ≠ x_i] = p. Whenever i is not one of the special k variables, Pr[y_i ≠ x_i] = Pr[i ∉ S] · p = (2/3) · (3/4) = 1/2, and these probabilities are all independent. The independence between the different variables and the fact that Pr[i ∉ S] is exactly 2/3 imply that y is uniformly distributed over all inputs which agree with x on the special k variables, as required.

Combining the two parts together, the algorithm would return the correct answer g(y) = f_σ(x) with probability at least 3/4 − 1/100 − 2^{−Ω(k)} > 2/3, using O(k log k) queries.

Proof of Corollary 5.
The algorithm provided here did not use any specific knowledge of the function f except for the guarantee of its structure, being a k-junta in which each influencing variable has influence at least 1/50. Therefore, this family is O(k log k)-locally correctable for ε < 2^{−k−3}. Hence, for every fixed k this family forms a locally correctable code which has polynomial size in n and a constant number of queries O(k log k).

4 Concluding remarks

In this work we have shown that k-juntas and degree-k polynomials are q-locally correctable, where q depends on the structure parameter k of the function, and not on the number of variables n. We have also seen that q is always at most O(2^k) and that sometimes an exponential behavior is tight.

The main general open question in this subject is that of computing the query complexity of local correction for any given function. In particular, it would be very interesting to find a characterization of all functions that are "easily" correctable, that is, have constant query complexity (independent of n).

Although we have seen both upper and lower bounds for juntas and polynomials, the lower bound is only applicable to specific functions. Symmetric functions, for example, are always 0-locally correctable, as one does not need to query the function at all in order to correct an input. However, such functions can have arbitrarily large degree as polynomials. Taking the majority function as an example, it is trivial to correct and yet it has high degree; but for even a slight modification of it, Maj_{n−1}, the majority of only n − 1 of the n variables, one can modify an o(1) fraction of the inputs, namely the balanced layer of those n − 1 variables, and make this function impossible to correct in any number of queries.

References

[1] N. Alon and E. Blais,
Testing boolean function isomorphism. In Proc. RANDOM-APPROX, pages 394-405, 2010.
[2] N. Alon, T. Kaufman, M. Krivelevich, S. Litsyn and D. Ron, Testing low-degree polynomials over GF(2). Proceedings of RANDOM-APPROX 2003, 188-199. Also: Testing Reed-Muller codes, IEEE Transactions on Information Theory 51 (2005), 4032-4039.
[3] N. Alon and J. Spencer, The Probabilistic Method, Third Edition, Wiley, 2008.
[4] M. Bellare, O. Goldreich, and M. Sudan, Free bits, PCPs and non-approximability - towards tight results. SIAM J. Comput., 27(3):804-915, 1998.
[5] E. Blais, Testing juntas nearly optimally. Proceedings of the 41st annual ACM STOC, 2009, pp. 151-158.
[6] E. Blais and R. O'Donnell, Lower bounds for testing function isomorphism. In IEEE Conference on Computational Complexity, pages 235-246, 2010.
[7] M. Blum, M. Luby, and R. Rubinfeld, Self-testing/correcting with applications to numerical problems. Journal of Computer and System Sciences, 47(3):549-595, 1993.
[8] S. Chakraborty, D. García-Soriano, and A. Matsliah, Nearly tight bounds for testing function isomorphism. In Proc. SODA, pages 1683-1702, 2011.
[9] H. Chockler and D. Gutfreund, A lower bound for testing juntas. Information Processing Letters, 90(6):301-305, 2004.
[10] E. Fischer, G. Kindler, D. Ron, S. Safra, and A. Samorodnitsky, Testing juntas. J. Comput. Syst. Sci., 68(4):753-787, 2004.
[11] M. Parnas, D. Ron, and A. Samorodnitsky, Testing basic boolean formulae. SIAM J. Discrete Math., 16(1):20-46, 2002.
[12] S. Yekhanin, Locally decodable codes. Foundations and Trends in Theoretical Computer Science, 6(3):139-255, 2012.