A Composition Theorem for Randomized Query Complexity
Anurag Anshu, Dmitry Gavinsky, Rahul Jain, Srijita Kundu, Troy Lee, Priyanka Mukhopadhyay, Miklos Santha, Swagato Sanyal
June 15, 2017
Abstract
Let the randomized query complexity of a relation for error probability ε be denoted by R_ε(·). We prove that for any relation f ⊆ {0,1}^n × R and Boolean function g : {0,1}^m → {0,1},

R_{1/3}(f ∘ g^n) = Ω(R_{4/9}(f) · R_{1/2 − 1/n^4}(g)),

where f ∘ g^n is the relation obtained by composing f and g. We also show, using an XOR lemma, that

R_{1/3}(f ∘ (g^{⊕O(log n)})^n) = Ω(log n · R_{4/9}(f) · R_{1/3}(g)),

where g^{⊕O(log n)} is the function obtained by composing the XOR function on O(log n) bits and g.

1 Introduction

Given two Boolean functions f : {0,1}^n → {0,1} and g : {0,1}^m → {0,1}, the composed function f ∘ g^n : ({0,1}^m)^n → {0,1} is defined as follows: for x = (x^(1), ..., x^(n)) ∈ ({0,1}^m)^n, f ∘ g^n(x) = f(g(x^(1)), ..., g(x^(n))). Composition of Boolean functions has long been a topic of active research in complexity theory. In many works, composition of Boolean functions is studied in the context of a certain complexity measure; the objective is to understand the complexity of the composed function in terms of the complexities of the individual functions. Let D(·) denote deterministic query complexity. It is easy to see that D(f ∘ g^n) ≤ D(f) · D(g), since f ∘ g^n can be computed by simulating an optimal query algorithm for f: whenever that algorithm makes a query, we run an optimal query algorithm for g to serve the query. It can be shown by an adversary argument that this simulation is optimal, and in fact D(f ∘ g^n) = D(f) · D(g).

However, such a characterization is not so obvious for randomized query complexity. Although a similar upper bound still holds (possibly with a logarithmic overhead), it is no longer clear that the product also asymptotically bounds the randomized query complexity of f ∘ g^n from below. Let R_ε(·) denote the ε-error randomized query complexity. Our main theorem in this work is the following.
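The composed function and the deterministic simulation described above can be illustrated with a small sketch of our own (not from the paper): here f is OR on n bits, g is parity on m bits (both chosen arbitrarily), and a query-counting harness checks that the simulation makes at most D(f) · D(g) = n · m queries.

```python
# Toy illustration (ours, not from the paper) of f ∘ g^n and the simulation
# giving D(f ∘ g^n) <= D(f) * D(g).  Here f = OR on n bits and g = parity
# on m bits; both have deterministic query complexity equal to their arity.

def make_oracle(x):
    """Wrap the composed input x in a query-counting oracle."""
    count = [0]
    def query(i, j):            # read bit j of block x^(i)
        count[0] += 1
        return x[i][j]
    return query, count

def compose_or_parity(x, n, m):
    """Evaluate (OR ∘ parity^n)(x) by the simulation from the text:
    whenever the outer OR algorithm needs g(x^(i)), run the (trivial)
    optimal algorithm for parity, which queries all m bits of x^(i)."""
    query, count = make_oracle(x)
    for i in range(n):          # OR examines the inner values one by one
        parity = 0
        for j in range(m):      # optimal parity algorithm: query every bit
            parity ^= query(i, j)
        if parity == 1:         # OR may stop early once it sees a 1
            return 1, count[0]
    return 0, count[0]

# x = (x^(1), x^(2)) with x^(1) = 101 (parity 0) and x^(2) = 110 (parity 0)
value, queries = compose_or_parity([[1, 0, 1], [1, 1, 0]], n=2, m=3)
print(value, queries)           # -> 0 6
assert queries <= 2 * 3         # at most D(OR) * D(parity) = n * m queries
```

The harness only certifies the upper bound; the lower-bound direction, which is the subject of this paper, is exactly what does not follow from such a simulation.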
∗ Centre for Quantum Technologies, National University of Singapore, Singapore. [email protected]
† Institute of Mathematics, Czech Academy of Sciences, Žitná 25, Praha 1, Czech Republic. Part of this work was done when Dmitry Gavinsky was visiting the Centre for Quantum Technologies at the National University of Singapore.
‡ Centre for Quantum Technologies, National University of Singapore and MajuLab, UMI 3654. [email protected]
§ Centre for Quantum Technologies, National University of Singapore, Singapore. [email protected]
¶ Division of Mathematical Sciences, Nanyang Technological University, Singapore and Centre for Quantum Technologies, National University of Singapore, Singapore. [email protected]
‖ Centre for Quantum Technologies, National University of Singapore, Singapore. [email protected]
∗∗ IRIF, Université Paris Diderot, CNRS, 75205 Paris, France, and Centre for Quantum Technologies, National University of Singapore, Singapore. [email protected]
†† Division of Mathematical Sciences, Nanyang Technological University, Singapore and Centre for Quantum Technologies, National University of Singapore, Singapore. [email protected]

Theorem 1 (Main Theorem). For any relation f ⊆ {0,1}^n × R and Boolean function g : {0,1}^m → {0,1},

R_{1/3}(f ∘ g^n) = Ω(R_{4/9}(f) · R_{1/2 − 1/n^4}(g)).

See Section 2 for definitions of composition and of the various complexity measures of relations. Theorem 1 implies that if g is a function that is hard to compute with error 1/2 − 1/n^4, then f ∘ g^n is hard to compute with error 1/3. When f is a Boolean function, Theorem 1 implies that R_{1/3}(f ∘ g^n) = Ω(R_{1/3}(f) · R_{1/2 − 1/n^4}(g)), since the success probability of a query algorithm for a Boolean function can be boosted from 5/9 to 2/3.

Theorem 1 is useful only when g is hard against randomized query algorithms even for error 1/2 − 1/n^4. In Section 3.1 we prove the following consequence of Theorem 1. Let f ⊆ {0,1}^n × R be any relation, and let g : {0,1}^m → {0,1} be a function.
Let g^{⊕t} : ({0,1}^m)^t → {0,1} be defined as follows: for x = (x^(1), ..., x^(t)) ∈ ({0,1}^m)^t, g^{⊕t}(x) = ⊕_{i=1}^t g(x^(i)).

Theorem 2. R_{1/3}(f ∘ (g^{⊕O(log n)})^n) = Ω(log n · R_{4/9}(f) · R_{1/3}(g)).

Theorem 2 is proved by establishing, via an XOR lemma of Andrew Drucker [5], that if g is hard for error 1/3, then g^{⊕O(log n)} is hard for error 1/2 − 1/n^4.

Composition theorems for randomized query complexity have been an area of active research in the past. Göös and Jayram [6] showed a composition theorem for a constrained version of conical junta degree, which is a lower bound on randomized query complexity. Composition theorems for approximate degree (which also lower bounds randomized query complexity) for the special case of the TRIBES function have seen a long line of research, culminating in the independent works of Sherstov [12] and of Bun and Thaler [3], who settled the question by proving optimal bounds.

Composition theorems have been studied and shown in the context of communication and query complexities by the works of Göös, Pitassi and Watson [7, 8] and Chattopadhyay et al. [4], for the case where the function g is the indexing function or the inner product function of large enough arity. The work of Hatami, Hosseini and Lovett [9] proves a composition theorem in the context of communication and parity query complexities when the function g is the two-bit XOR function. Ben-David and Kothari [1] proved a composition theorem for the sabotage complexity of Boolean functions, a novel complexity measure defined in the same work, which the authors prove gives a quadratically tight bound on randomized query complexity.

Composition theorems have also been successfully used in the past for constructing examples separating various complexity measures, and for bounding one complexity measure in terms of another.
Kulkarni and Tal [10] proved an upper bound on fractional block sensitivity in terms of degree by analyzing the behavior of fractional block sensitivity under function composition. A separation between block sensitivity and degree was obtained by composing Kushilevitz's icosahedron function repeatedly with itself (see [2]). A separation between parity decision tree complexity and Fourier sparsity was obtained by O'Donnell et al. by studying the behavior of the parity kill number under function composition [11].

1.1 Overview of the proof of Theorem 1

In this section, we give a high-level overview of our proof of Theorem 1. We refer the reader to Section 2 for formal definitions of composition and of the various complexity measures of relations.

Let ε = 1/2 − 1/n^4. Let µ be the distribution over the domain {0,1}^m of g for which R_ε(g) is achieved, i.e., R_ε(g) = D^µ_ε(g) (see Fact 1). For b ∈ {0,1}, let µ_b denote the distribution obtained by conditioning µ on the event that g(x) = b (see Section 2 for a formal definition).

We show that for every probability distribution λ over the domain {0,1}^n of f, there exists a deterministic query algorithm A with worst-case query complexity at most R_{1/3}(f ∘ g^n)/R_ε(g), such that Pr_{z∼λ}[(z, A(z)) ∈ f] ≥ 5/9. By the minimax principle (Fact 1) this proves Theorem 1.

Using the distribution λ over {0,1}^n we define a probability distribution γ over ({0,1}^m)^n. To define γ, we begin by defining a family of distributions {γ_z : z ∈ {0,1}^n} over ({0,1}^m)^n. For a fixed z = (z_1, ..., z_n) ∈ {0,1}^n, we define γ_z by giving a sampling procedure:

1. For each i = 1, ..., n, sample x^(i) = (x^(i)_1, ..., x^(i)_m) from {0,1}^m independently according to µ_{z_i}.
2. Return x = (x^(1), ..., x^(n)).

Thus for z = (z_1, ..., z_n) ∈ {0,1}^n and x = (x^(1), ..., x^(n)) ∈ ({0,1}^m)^n, γ_z(x) = Π_{i=1}^n µ_{z_i}(x^(i)). Note that γ_z is supported only on strings x for which the following holds: for each r ∈ R, (x, r) ∈ f ∘ g^n if and only if (z, r) ∈ f.

Having defined the distributions γ_z, we define the distribution γ by giving a sampling procedure:

1. Sample a z = (z_1, ..., z_n) from {0,1}^n according to λ.
2. Sample an x = (x^(1), ..., x^(n)) from ({0,1}^m)^n according to γ_z. Return x.

By the minimax principle (Fact 1), there is a deterministic query algorithm B of worst-case complexity at most R_{1/3}(f ∘ g^n) such that Pr_{x∼γ}[(x, B(x)) ∈ f ∘ g^n] ≥ 2/3. We will use B to construct a randomized query algorithm A′ for f with the desired properties. A deterministic query algorithm A for f with the required performance guarantees can then be obtained by appropriately fixing the randomness of A′.

See Algorithm 1 for a formal description of A′. Given an input z = (z_1, ..., z_n), A′ simulates B. Recall that an input to B is an nm-bit string (x^(i)_j)_{i=1,...,n; j=1,...,m}. Whenever B queries an input bit x^(i)_j, a response bit is appropriately generated and passed to B. To generate a response, a bit of z may be queried; those queries contribute to the query complexity of A′. The queries are addressed as follows. Let the simulation of B request the bit x^(i)_j.

• If fewer than D^µ_ε(g) queries have been made into x^(i) (including the current one), then a bit b is sampled from the marginal distribution of x^(i)_j according to µ, conditioned on the responses to the past queries, and b is passed to the simulation of B.
• If D^µ_ε(g) or more queries have been made into x^(i) (including the current one), then first the input bit z_i is queried (unless it has been queried already); then a bit b is sampled from the marginal distribution of x^(i)_j according to µ_{z_i}, conditioned on the responses to the past queries, and b is passed to the simulation of B.

The simulation continues until B terminates at a leaf. Then A′ also terminates and outputs the label of that leaf.

We use Claims 3 and 4 to prove that for a fixed z ∈ {0,1}^n, the probability distribution induced by A′ on the leaves of B is statistically close to the probability distribution induced by B on its leaves on a random input from γ_z. Averaging over different z's, the correctness of A′ follows from the correctness of B. The reader is referred to Section 3 for the details.

2 Preliminaries
In this section, we define some basic concepts and set up our notation. We begin by defining the 2-sided error randomized and distributional query complexity measures of relations. The relations considered in this work are all between the Boolean hypercube {0,1}^k of some dimension k and an arbitrary set S. For a relation h ⊆ {0,1}^k × S, the strings x ∈ {0,1}^k are called inputs to h, and {0,1}^k is referred to as the input space or the domain of h.

Definition 1 (2-sided Error Randomized Query Complexity). Let S be any set, let h ⊆ {0,1}^k × S be any relation, and let ε ∈ [0, 1/2). The 2-sided error randomized query complexity R_ε(h) is the minimum number of queries made in the worst case by a randomized query algorithm A (the worst case is over inputs and over the internal randomness of A) that on each input x ∈ {0,1}^k satisfies Pr[(x, A(x)) ∈ h] ≥ 1 − ε (where the probability is over the internal randomness of A).

Definition 2 (Distributional Query Complexity). Let h ⊆ {0,1}^k × S be any relation, µ a distribution on the input space {0,1}^k of h, and ε ∈ [0, 1/2). The distributional query complexity D^µ_ε(h) is the minimum number of queries made in the worst case (over inputs) by a deterministic query algorithm A for which Pr_{x∼µ}[(x, A(x)) ∈ h] ≥ 1 − ε.

In particular, if h is a function and A is a randomized or distributional query algorithm computing h with error ε, then Pr[h(x) = A(x)] ≥ 1 − ε, where the probability is over the respective source of randomness.

The following theorem is von Neumann's minimax principle stated for decision trees.

Fact 1 (minimax principle). For any integer k, set S, and relation h ⊆ {0,1}^k × S, R_ε(h) = max_µ D^µ_ε(h).

Let g : {0,1}^m → {0,1} be a Boolean function. Let µ be a probability distribution on {0,1}^m that intersects non-trivially both g^{−1}(0) and g^{−1}(1). For each z ∈ {0,1}, let µ_z be the distribution obtained by restricting µ to g^{−1}(z). Formally,

µ_z(x) = µ(x) / Σ_{y : g(y)=z} µ(y)   if g(x) = z,   and   µ_z(x) = 0   otherwise.

Note that µ_0 and µ_1 are defined with respect to some Boolean function g, which will always be clear from the context.

Definition 3 (Subcube, Co-dimension). A subset C of {0,1}^m is called a subcube if there exist a set S ⊆ {1, ..., m} of indices and an assignment function A : S → {0,1} such that C = {x ∈ {0,1}^m : ∀i ∈ S, x_i = A(i)}. The co-dimension codim(C) of C is defined to be |S|.

Let C ⊆ {0,1}^m be a subcube and µ a probability distribution on {0,1}^m. We will often abuse notation and use C to denote the event that a random string x belongs to the subcube C. The probability Pr_{x∼µ}[x ∈ C] will be denoted by Pr_µ[C]. For subcubes C_1 and C_2, the conditional probability Pr_{x∼µ}[x ∈ C_1 | x ∈ C_2] will be denoted by Pr_µ[C_1 | C_2].

Definition 4 (Bias of a subcube). Let g : {0,1}^m → {0,1} be a Boolean function, let µ be a probability distribution over {0,1}^m, and let C ⊆ {0,1}^m be a subcube such that Pr_µ[C] > 0. The bias of C with respect to µ is defined to be

bias_µ(C) = | Pr_{x∼µ}[g(x) = 0 | x ∈ C] − Pr_{x∼µ}[g(x) = 1 | x ∈ C] |.

A Boolean function g is implicit in the definition of bias and will always be clear from the context.

Proposition 2. Let g : {0,1}^m → {0,1} be a Boolean function and let D^µ_ε(g) > 0. Then min_{b∈{0,1}} Pr_{x∼µ}[g(x) = b] > ε. In particular, bias_µ({0,1}^m) < 1 − 2ε.

Proof. Towards a contradiction, assume that min_{b∈{0,1}} Pr_{x∼µ}[g(x) = b] ≤ ε. Then the algorithm that outputs argmax_{b∈{0,1}} Pr_{x∼µ}[g(x) = b] makes 0 queries and is correct with probability at least 1 − ε. This contradicts the hypothesis that D^µ_ε(g) > 0.

Definition 5 (Composition of relations). Let f ⊆ {0,1}^n × R and g ⊆ {0,1}^m × {0,1} be two relations. The composed relation f ∘ g^n ⊆ ({0,1}^m)^n × R is defined as follows: for x = (x^(1), ..., x^(n)) ∈ ({0,1}^m)^n and r ∈ R, (x, r) ∈ f ∘ g^n if and only if there exists b = (b^(1), ..., b^(n)) ∈ {0,1}^n such that for each i = 1, ..., n, (x^(i), b^(i)) ∈ g and (b, r) ∈ f.

We will often view a deterministic query algorithm as a binary decision tree. At each vertex v of the tree, an input variable is queried; depending on the outcome of the query, the computation moves to a child of v. The child of v corresponding to outcome b of the query is denoted by v_b. It is well known that the set of inputs that lead the computation of a decision tree to a certain vertex forms a subcube. We will denote the subcube corresponding to a vertex v by C_v.

We next prove two claims about the bias, probability, and co-dimension of subcubes that will be useful later. Claim 3 states that for a function with large distributional query complexity, the bias of most shallow leaves of any deterministic query procedure is small.

Claim 3.
Let g : {0,1}^m → {0,1} be a Boolean function. Let ε ∈ [1/3, 1/2) and let δ = 1/2 − ε. Let µ be a probability distribution on {0,1}^m, and let D^µ_ε(g) = c > 0. Let B be any deterministic query algorithm for strings in {0,1}^m. For each y ∈ {0,1}^m, let ℓ_y be the unique leaf of B that contains y. Then:

(a) Pr_{y∼µ}[codim(ℓ_y) < c and bias_µ(ℓ_y) ≥ 2δ^{1/2}] < δ^{1/2}.
(b) For each b ∈ {0,1}, Pr_{y∼µ_b}[codim(ℓ_y) < c and bias_µ(ℓ_y) ≥ 2δ^{1/2}] < 3δ^{1/2}.

In the above claim, B could be just a deterministic procedure that makes queries and eventually terminates; whether or not it produces any output upon termination is of no consequence here.

Proof. We first show that part (a) implies part (b). To this end, assume part (a) and fix a b ∈ {0,1}. Let a(y) be the indicator variable for the event that codim(ℓ_y) < c and bias_µ(ℓ_y) ≥ 2δ^{1/2}. Thus, part (a) states that Pr_{y∼µ}[a(y) = 1] < δ^{1/2}. Now,

Pr_{y∼µ_b}[codim(ℓ_y) < c and bias_µ(ℓ_y) ≥ 2δ^{1/2}]
= Σ_{y : a(y)=1} µ_b(y)
≤ (1 / Σ_{y : g(y)=b} µ(y)) · Σ_{y : a(y)=1} µ(y)   (from the definition of µ_b)
< (1/ε) · Pr_{y∼µ}[a(y) = 1]   (from Proposition 2)
< 3δ^{1/2}.   (by the hypothesis ε ≥ 1/3)

Next we prove part (a). Towards a contradiction, assume that Pr_{y∼µ}[codim(ℓ_y) < c and bias_µ(ℓ_y) ≥ 2δ^{1/2}] ≥ δ^{1/2}. Now consider the following decision tree algorithm A on m-bit strings:

Begin simulating B. Let C be the subcube associated with the current node of B in the simulation. Continue the simulation of B unless one of the following happens:
• B terminates.
• The number of queries made is c − 1.
• bias_µ(C) ≥ 2δ^{1/2}.
Upon termination, if bias_µ(C) ≥ 2δ^{1/2}, output argmax_{b∈{0,1}} Pr_{y∼µ}[g(y) = b | y ∈ C]; else output a uniformly random bit.

It immediately follows that the worst-case query complexity of A is at most c − 1. We will now prove that Pr_{y∼µ}[A(y) = g(y)] ≥ 1 − ε, which will contradict the hypothesis that D^µ_ε(g) = c. Let L be the node of B at which the computation of A ends, and let Pr_{y∼µ}[bias_µ(L) ≥ 2δ^{1/2}] = p. By our assumption, the probability (over µ) that L is a leaf and bias_µ(L) ≥ 2δ^{1/2} is at least δ^{1/2}; in particular, p ≥ δ^{1/2}. Now,

Pr_{y∼µ}[A(y) = g(y)]
= Pr_{y∼µ}[bias_µ(L) ≥ 2δ^{1/2}] · Pr_{y∼µ}[A(y) = g(y) | bias_µ(L) ≥ 2δ^{1/2}]
  + Pr_{y∼µ}[bias_µ(L) < 2δ^{1/2}] · Pr_{y∼µ}[A(y) = g(y) | bias_µ(L) < 2δ^{1/2}]
≥ p · (1/2 + δ^{1/2}) + (1 − p) · (1/2)   (from our assumption)
= 1/2 + p · δ^{1/2}
≥ 1/2 + δ   (since p ≥ δ^{1/2})
= 1 − ε.

This completes the proof.

The next claim states that if a subcube has low bias with respect to a distribution µ, then the distributions µ_0 and µ_1 ascribe almost the same probability to it.

Claim 4.
Let g : {0,1}^m → {0,1} be a Boolean function and δ ∈ (0, 1/2]. Let µ be a distribution on {0,1}^m. Let C be a subcube such that Pr_µ[C] > 0 and bias_µ(C) ≤ δ. Also assume that bias_µ({0,1}^m) ≤ δ. Then for each b ∈ {0,1} we have:

(a) Pr_µ[C] ≤ (1 + 4δ) · Pr_{µ_b}[C],
(b) Pr_µ[C] ≥ (1 − 2δ) · Pr_{µ_b}[C].

Proof. We prove part (a) of the claim; the proof of part (b) is similar. By the definition of bias and the hypothesis, for each b ∈ {0,1},

Σ_{y∈{0,1}^m : g(y)=b} µ(y) ≤ (1/2 + δ/2) · Σ_{y∈{0,1}^m} µ(y) = 1/2 + δ/2,   (1)

and

Σ_{y∈C : g(y)=b} µ(y) ≥ (1/2 − δ/2) · Σ_{y∈C} µ(y) > 0.   (2)

Now,

Pr_{µ_b}[C] = Σ_{y∈C} µ_b(y)
= Σ_{y∈C : g(y)=b} µ(y) / Σ_{y∈{0,1}^m : g(y)=b} µ(y)
≥ ((1/2 − δ/2) · Σ_{y∈C} µ(y)) / (1/2 + δ/2)
= ((1/2 − δ/2) / (1/2 + δ/2)) · Pr_µ[C].

Thus,

Pr_µ[C] ≤ ((1/2 + δ/2) / (1/2 − δ/2)) · Pr_{µ_b}[C] ≤ (1 + 4δ) · Pr_{µ_b}[C].   (since δ ≤ 1/2)

3 Proof of Theorem 1

In this section we prove our main theorem, which we restate below.
Theorem 1 (Main Theorem). For any relation f ⊆ {0,1}^n × R and Boolean function g : {0,1}^m → {0,1},

R_{1/3}(f ∘ g^n) = Ω(R_{4/9}(f) · R_{1/2 − 1/n^4}(g)).

Proof. We begin by recalling the notation defined in Section 1.1 that we will use in this proof. Let ε = 1/2 − 1/n^4. Let µ be the distribution over the domain {0,1}^m of g for which R_ε(g) is achieved, i.e., R_ε(g) = D^µ_ε(g) (see Fact 1).

We show that for every probability distribution λ over the input space {0,1}^n of f, there exists a deterministic query algorithm A with worst-case query complexity at most R_{1/3}(f ∘ g^n)/R_ε(g), such that Pr_{z∼λ}[(z, A(z)) ∈ f] ≥ 5/9. By the minimax principle (Fact 1) this will prove Theorem 1.

Using λ, we define a probability distribution γ over ({0,1}^m)^n. We first define a family of distributions {γ_z : z ∈ {0,1}^n} over ({0,1}^m)^n. For a fixed z ∈ {0,1}^n, we define γ_z by giving a sampling procedure:

1. For each i = 1, ..., n, sample x^(i) = (x^(i)_1, ..., x^(i)_m) from {0,1}^m independently according to µ_{z_i}.
2. Return x = (x^(1), ..., x^(n)).

Thus for z = (z_1, ..., z_n) ∈ {0,1}^n and x = (x^(1), ..., x^(n)) ∈ ({0,1}^m)^n, γ_z(x) = Π_{i=1}^n µ_{z_i}(x^(i)). Note that γ_z is supported only on strings x for which the following holds: for each r ∈ R, (x, r) ∈ f ∘ g^n if and only if (z, r) ∈ f.

Now we define the distribution γ by giving a sampling procedure:

1. Sample a z = (z_1, ..., z_n) from {0,1}^n according to λ.
2. Sample an x = (x^(1), ..., x^(n)) from ({0,1}^m)^n according to γ_z. Return x.

By the minimax principle (Fact 1), there is a deterministic query algorithm B of worst-case complexity at most R_{1/3}(f ∘ g^n) such that Pr_{x∼γ}[(x, B(x)) ∈ f ∘ g^n] ≥ 2/3. We will use B to construct a randomized query algorithm A′ for f with the desired properties. A deterministic query algorithm A for f with the required performance guarantees can then be obtained by appropriately fixing the randomness of A′. Algorithm 1 formally defines the algorithm A′ that we construct.

Algorithm 1:
Randomized query algorithm A′ for f

Input: z ∈ {0,1}^n
1.  Initialize v ← root of the decision tree B, and Q ← ∅.
2.  while v is not a leaf do
3.      Let a bit of x^(i) be queried at v.
4.      if i ∉ Q then   /* codim(C^(i)_v) < D^µ_ε(g) if this is satisfied */
5.          if codim(C^(i)_v) = D^µ_ε(g) − 1 then   /* the current query is the D^µ_ε(g)-th into x^(i) */
6.              Query z_i, and set Q ← Q ∪ {i}.
7.              Set v ← v_b with probability Pr_{µ_{z_i}}[C^(i)_{v_b} | C^(i)_v].
8.          else
9.              Set v ← v_b with probability Pr_µ[C^(i)_{v_b} | C^(i)_v].
10.     else
11.         Set v ← v_b with probability Pr_{µ_{z_i}}[C^(i)_{v_b} | C^(i)_v].
12. Output the label of v.

From the definition of bias one can verify that the events being conditioned on in the sampling steps of Algorithm 1 (steps 7, 9 and 11) have non-zero probabilities under the respective distributions; hence the probabilistic processes are well defined.

From the description of A′ it is immediate that z_i is queried only if the underlying simulation of B queries at least D^µ_ε(g) = R_ε(g) locations of x^(i). Thus the worst-case query complexity of A′ is at most R_{1/3}(f ∘ g^n)/R_ε(g).

We are left with the task of bounding the error of A′. Let L be the set of leaves of the decision tree B. Each leaf ℓ ∈ L is labelled with an output b_ℓ ∈ R; whenever the computation reaches ℓ, the label b_ℓ is output.

For a vertex v, let the corresponding subcube C_v be C^(1)_v × ... × C^(n)_v, where C^(i)_v is a subcube of the domain of the i-th copy of g (corresponding to the input x^(i)). Recall from Section 2 that for b ∈ {0,1}, v_b denotes the b-th child of v.

For each leaf ℓ ∈ L and i = 1, ..., n, define snip^(i)(ℓ) to be 1 if there is a node t on the unique path from the root of B to ℓ such that codim(C^(i)_t) < D^µ_ε(g) and bias_µ(C^(i)_t) ≥ 2/n². Define snip^(i)(ℓ) = 0 otherwise. Define snip(ℓ) = ∨_{i=1}^n snip^(i)(ℓ).

For each ℓ ∈ L, define p^z_ℓ to be the probability that on an input drawn from γ_z, the computation of B terminates at leaf ℓ. We have

Pr_{x∼γ_z}[(x, B(x)) ∈ f ∘ g^n] = Pr_{x∼γ_z}[(z, B(x)) ∈ f] = Σ_{ℓ∈L : (z,b_ℓ)∈f} p^z_ℓ.   (3)

From our assumption about B we also have

Pr_{x∼γ}[(x, B(x)) ∈ f ∘ g^n] = E_{z∼λ} Pr_{x∼γ_z}[(x, B(x)) ∈ f ∘ g^n] ≥ 2/3.   (4)

Now, consider a run of A′ on z. For each leaf ℓ ∈ L of B, define q^z_ℓ to be the probability that the computation of A′ on z terminates at leaf ℓ of B. Note that this probability is over the internal randomness of A′. To finish the proof, we need the following two claims. The first one states that the leaves ℓ ∈ L are sampled with similar probabilities by B and by A′.

Claim 5. For each ℓ ∈ L such that snip(ℓ) = 0, and for each z ∈ {0,1}^n,

(8/9) · p^z_ℓ ≤ q^z_ℓ ≤ (9/8) · p^z_ℓ.

The next claim states that for each z, the probability according to γ_z of the leaves ℓ for which snip(ℓ) = 1 is small.

Claim 6. For all z ∈ {0,1}^n, Σ_{ℓ∈L, snip(ℓ)=1} p^z_ℓ ≤ 3/n.

We first finish the proof of Theorem 1 assuming Claims 5 and 6, and then prove the claims. For a fixed input z ∈ {0,1}^n, the probability that A′, when run on z, outputs an r such that (z, r) ∈ f is at least

Σ_{ℓ∈L, (z,b_ℓ)∈f, snip(ℓ)=0} q^z_ℓ
≥ Σ_{ℓ∈L, (z,b_ℓ)∈f, snip(ℓ)=0} (8/9) · p^z_ℓ   (by Claim 5)
= (8/9) · ( Σ_{ℓ∈L, (z,b_ℓ)∈f} p^z_ℓ − Σ_{ℓ∈L, (z,b_ℓ)∈f, snip(ℓ)=1} p^z_ℓ )
≥ (8/9) · ( Σ_{ℓ∈L, (z,b_ℓ)∈f} p^z_ℓ − 3/n ).   (by Claim 6)   (5)

Thus, the success probability of A′ is at least

E_{z∼λ} Σ_{ℓ∈L, (z,b_ℓ)∈f, snip(ℓ)=0} q^z_ℓ
≥ (8/9) · E_{z∼λ} ( Σ_{ℓ∈L, (z,b_ℓ)∈f} p^z_ℓ − 3/n )   (by Equation (5))
≥ (8/9) · (2/3 − 3/n)   (by Equations (3) and (4))
≥ 5/9.   (for large enough n)

We now give the proofs of Claims 5 and 6.

Proof of Claim 5.
We will prove the first inequality; the proof of the second is similar. (Note that only the first inequality is used in the proof of Theorem 1.) Fix a z ∈ {0,1}^n and a leaf ℓ ∈ L. For each i = 1, ..., n, assume that codim(C^(i)_ℓ) = d^(i), and that on the path from the root of B to ℓ the variables x^(i)_1, ..., x^(i)_{d^(i)} are set to the bits b_1, ..., b_{d^(i)}, in this order. The computation of A′ terminates at leaf ℓ exactly when the values of the bits x^(i)_j sampled by A′ agree with the leaf ℓ. The probability of this happening is

q^z_ℓ = Π_{i=1}^n Pr_{A′}[x^(i)_1 = b_1, ..., x^(i)_{d^(i)} = b_{d^(i)} | z]   (6)
= Π_{i=1}^n Pr_{x∼µ}[x^(i)_1 = b_1, ..., x^(i)_{D^µ_ε(g)−1} = b_{D^µ_ε(g)−1}] · Pr_{x∼µ_{z_i}}[x^(i)_{D^µ_ε(g)} = b_{D^µ_ε(g)}, ..., x^(i)_{d^(i)} = b_{d^(i)} | x^(i)_1 = b_1, ..., x^(i)_{D^µ_ε(g)−1} = b_{D^µ_ε(g)−1}].   (7)

The second equality above follows from the observation that in Algorithm 1, the first D^µ_ε(g) − 1 queried bits of x^(i) are sampled from their marginal distributions with respect to µ, and the subsequent bits are sampled from their marginal distributions with respect to µ_{z_i}. In Equation (7), the term Pr_{x∼µ_{z_i}}[x^(i)_{D^µ_ε(g)} = b_{D^µ_ε(g)}, ..., x^(i)_{d^(i)} = b_{d^(i)} | x^(i)_1 = b_1, ..., x^(i)_{D^µ_ε(g)−1} = b_{D^µ_ε(g)−1}] is interpreted as 1 if d^(i) < D^µ_ε(g).

We invoke Claim 4(b) with C set to the subcube {x ∈ {0,1}^m : x_1 = b_1, ..., x_{D^µ_ε(g)−1} = b_{D^µ_ε(g)−1}} and δ set to 2/n². To see that the claim is applicable here, note that from the assumption snip(ℓ) = 0 we have bias_µ(C) < δ = 2/n² ≤ 1/2, where the last inequality holds for large enough n. Also, since D^µ_ε(g) > 0, by Proposition 2 the bias of {0,1}^m is at most 1 − 2ε = 2/n^4 < 2/n² = δ. Continuing from Equation (7), by invoking Claim 4(b) we have

q^z_ℓ ≥ Π_{i=1}^n (1 − 4/n²) · Pr_{x∼µ_{z_i}}[x^(i)_1 = b_1, ..., x^(i)_{D^µ_ε(g)−1} = b_{D^µ_ε(g)−1}] · Pr_{x∼µ_{z_i}}[x^(i)_{D^µ_ε(g)} = b_{D^µ_ε(g)}, ..., x^(i)_{d^(i)} = b_{d^(i)} | x^(i)_1 = b_1, ..., x^(i)_{D^µ_ε(g)−1} = b_{D^µ_ε(g)−1}]
= (1 − 4/n²)^n · Π_{i=1}^n Pr_{x∼µ_{z_i}}[x^(i)_1 = b_1, ..., x^(i)_{d^(i)} = b_{d^(i)}]
= (1 − 4/n²)^n · p^z_ℓ
≥ (8/9) · p^z_ℓ.   (for large enough n)

Proof of Claim 6.
Fix a z ∈ {0,1}^n. We shall prove that for each i, Σ_{ℓ∈L, snip^(i)(ℓ)=1} p^z_ℓ ≤ 3/n². This will prove the claim, since Σ_{ℓ∈L, snip(ℓ)=1} p^z_ℓ ≤ Σ_{i=1}^n Σ_{ℓ∈L, snip^(i)(ℓ)=1} p^z_ℓ.

To this end, fix an i ∈ {1, ..., n}. For a random x drawn from γ_z, let p be the probability that within strictly fewer than D^µ_ε(g) queries into x^(i) the computation of B reaches a node t such that bias_µ(C^(i)_t) ≥ 2/n². Note that this probability is over the choice of the different x^(j)'s. We shall show that p ≤ 3/n²; this is equivalent to showing that Σ_{ℓ∈L, snip^(i)(ℓ)=1} p^z_ℓ ≤ 3/n².

Note that each x^(j) is independently distributed according to µ_{z_j}. By averaging, there exists a choice of x^(j) for each j ≠ i such that, for a random x^(i) chosen according to µ_{z_i}, a node t as above is reached within at most D^µ_ε(g) − 1 queries into x^(i) with probability at least p. Fix such a setting of the x^(j), j ≠ i; this turns B into a deterministic query procedure on x^(i) alone. Claim 6 now follows from Claim 3(b) applied with b = z_i (note that ε = 1/2 − 1/n^4 ≥ 1/3 for large enough n, and that δ^{1/2} = 1/n²).

This completes the proof of Theorem 1.

3.1 Proof of Theorem 2

In this section we prove Theorem 2. Theorem 1 is useful only when the function g is hard against randomized query algorithms even for error 1/2 − 1/n^4. In this section we use an XOR lemma to show that, given any g that is hard against randomized query algorithms with error 1/3, one obtains another function on a slightly larger domain that is hard against randomized query algorithms with error 1/2 − 1/n^4. This yields the proof of Theorem 2.

Let g : {0,1}^m → {0,1} be a function. Let g^{⊕t} : ({0,1}^m)^t → {0,1} be defined as follows: for x = (x^(1), ..., x^(t)) ∈ ({0,1}^m)^t, g^{⊕t}(x) = ⊕_{i=1}^t g(x^(i)). The following theorem is obtained by specializing Theorem 3 of Andrew Drucker's paper [5] to this setting.

Theorem 7 (Drucker 2011 [5], Theorem 3). R_{1/2 − 2^{−Ω(t)}}(g^{⊕t}) = Ω(t · R_{1/3}(g)).

Theorem 2 (restated below) follows by setting t = Θ(log n) and combining Theorem 7 with Theorem 1.

Theorem 2. R_{1/3}(f ∘ (g^{⊕O(log n)})^n) = Ω(log n · R_{4/9}(f) · R_{1/3}(g)).

Acknowledgements:
This work was partially supported by the National Research Foundation, including under NRF RF Award No. NRF-NRFF2013-13, the Prime Minister's Office, Singapore, and the Ministry of Education, Singapore, under the Research Centres of Excellence programme, and by Grant No. MOE2012-T3-1-009. D.G. is partially funded by the grant P202/12/G061 of GA ČR and by RVO: 67985840. M.S. is partially funded by the ANR Blanc program under contract ANR-12-BS02-005 (RDAM project).
References

[1] Shalev Ben-David and Robin Kothari. Randomized query complexity of sabotaged and composed functions. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, pages 60:1–60:14, 2016.
[2] Harry Buhrman and Ronald de Wolf. Complexity measures and decision tree complexity: a survey. Theoretical Computer Science, 288(1):21–43, 2002.
[3] Mark Bun and Justin Thaler. Dual lower bounds for approximate degree and Markov-Bernstein inequalities. In Automata, Languages, and Programming - 40th International Colloquium, ICALP 2013, Riga, Latvia, July 8-12, 2013, Proceedings, Part I, pages 303–314, 2013.
[4] Arkadev Chattopadhyay, Michal Koucký, Bruno Loff, and Sagnik Mukhopadhyay. Simulation theorems via pseudorandom properties. CoRR, abs/1704.06807, 2017.
[5] Andrew Drucker. Improved direct product theorems for randomized query complexity. In Proceedings of the 26th Annual IEEE Conference on Computational Complexity, CCC 2011, San Jose, California, June 8-10, 2011, pages 1–11, 2011.
[6] Mika Göös and T. S. Jayram. A composition theorem for conical juntas. In 31st Conference on Computational Complexity, CCC 2016, pages 5:1–5:16, 2016.
[7] Mika Göös, Toniann Pitassi, and Thomas Watson. Deterministic communication vs. partition number. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 1077–1088, 2015.
[8] Mika Göös, Toniann Pitassi, and Thomas Watson. Query-to-communication lifting for BPP. CoRR, abs/1703.07666, 2017.
[9] Hamed Hatami, Kaave Hosseini, and Shachar Lovett. Structure of protocols for XOR functions. In IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pages 282–288, 2016.
[10] Raghav Kulkarni and Avishay Tal. On fractional block sensitivity. Chicago Journal of Theoretical Computer Science, 2016, 2016.
[11] Ryan O'Donnell, John Wright, Yu Zhao, Xiaorui Sun, and Li-Yang Tan. A composition theorem for parity kill number. In IEEE 29th Conference on Computational Complexity, CCC 2014, Vancouver, BC, Canada, June 11-13, 2014, pages 144–154, 2014.
[12] Alexander A. Sherstov. Approximating the AND-OR tree. Theory of Computing, 9(20):653–663, 2013.