Tribes Is Hard in the Message Passing Model
Arkadev Chattopadhyay and Sagnik Mukhopadhyay∗
Tata Institute of Fundamental Research, Mumbai
{arkadev.c | sagnik}@tifr.res.in

September 6, 2018

∗ A. Chattopadhyay is partially supported by a Ramanujan Fellowship of the DST, and S. Mukhopadhyay is supported by a TCS Fellowship.
Abstract
We consider the point-to-point message passing model of communication in which there are k processors with individual private inputs, each n-bit long. Each processor is located at a node of an underlying undirected graph and has access to private random coins. An edge of the graph is a private channel of communication between its endpoints. The processors have to compute a given function of all their inputs by communicating along these channels. While this model has been widely used in distributed computing, strong lower bounds on the amount of communication needed to compute simple functions have just begun to appear.

In this work, we prove a tight lower bound of Ω(kn) on the communication needed for computing the Tribes function, when the underlying graph is a star of k + 1 nodes that has k leaves with inputs and a center with no input. A lower bound on this topology easily implies comparable bounds for others. Our lower bounds are obtained by building upon the recent information-theoretic techniques of Braverman et al. [BEO+13].

1 Introduction

The classical model of 2-party communication was introduced in the seminal work of Yao [Yao79], motivated by problems of distributed computing. This model has proved to be of fundamental importance (see the book by Kushilevitz and Nisan [KN97]) and forms the core of the vibrant subject of communication complexity. It is fair to say that the wide applicability of this model to different areas of computer science cannot be over-emphasized.

However, a commonly encountered situation in distributed computing is one where there are multiple processors, each holding a private input, that are connected by an underlying communication graph. An edge of the graph corresponds to a private channel of communication between its endpoints. There are k processors located on distinct nodes of the graph that want to compute a function of their joint inputs. In such a networked scenario, a very natural question is to understand how much total communication is needed to compute the function. The classical 2-party model is just the special case where the graph is a single edge connecting two processors.

Among other names, this model has been called the number-in-hand multiparty point-to-point message passing model of communication. Apart from distributed computing, this model is used in secure multiparty computation. The study of communication cost in this model was most likely introduced by Dolev and Feder [DF92] and further worked on by Duris and Rolim [DR98]. These early works focused on deterministic communication. There has been renewed interest in the model because it arguably better captures many of today's networks, as studied in various distributed settings: models for map-reduce [KSV10, GSZ11], massively parallel models for computing conjunctive queries [BKS13, KS11], distributed models of learning [BBFM12], and core distributed computing [DKO12]. However, there were no known systematic techniques for proving lower bounds on the cost of randomized communication protocols that exploited the non-broadcast nature of the private channels of the model. Recently, there has been a flurry of work developing new techniques for proving lower bounds on communication. Phillips, Verbin and Zhang [PVZ12] introduced the method of symmetrization to prove strong bounds for a variety of functions. Their technique was further developed in the works of Woodruff and Zhang [WZ12, WZ13, WZ14]. All these works considered the co-ordinator model, a special case that was introduced in the early work of [DF92].
In the co-ordinator model, the underlying graph has the star topology with k + 1 nodes. There are k processors, each holding an n-bit input, and each of the k leaf-nodes is connected to the center of the star. The node at the center has no input and is called the co-ordinator. The following two simple observations about the model will be relevant for this work: every function can be trivially computed using O(nk) bits of communication by having each of the k players send their inputs to the co-ordinator, who then outputs the answer. It is also easily observed that the co-ordinator model can simulate a communication protocol on an arbitrary topology with k nodes with at most a log k factor blow-up in the total communication cost.

A key lesson learnt from our experience with the classical 2-party model is that an excellent indicator of our understanding of a model is our ability to prove lower bounds for the widely known Set-Disjointness problem in that model. Indeed, as surveyed in [CP10], several new and fundamental lower bound techniques have emerged from efforts to prove lower bounds for this function. Further, the lower bound for Set-Disjointness is what drives many of the applications of communication complexity to other domains. While the symmetrization technique of Phillips et al. and its refinements by Woodruff and Zhang proved several lower bounds, no strong lower bounds for Set-Disjointness were known until recently in the k-processor co-ordinator model. In this setting, the relevant definition of Set-Disjointness is the natural generalization of its 2-party definition: view the n-bit inputs of the k processors as a k × n Boolean matrix where the i-th row corresponds to Processor i's input. The Set-Disjointness function outputs 1 iff there exists a column of this matrix that has no zeroes.
In an important development, Braverman et al. [BEO+13] proved a tight Ω(kn) lower bound for Set-Disjointness in the co-ordinator model. Their approach is to build new information complexity tools for this model that significantly generalize the 2-party technique of Bar-Yossef et al. [BJKS02]. In this work, we further develop this information complexity method for the co-ordinator model by considering another natural and important function, known as Tribes_{m,ℓ}. In this function, the n-bit input to each processor is grouped into m blocks, each of length ℓ. Thus, the overall k × n input matrix splits up into m sub-matrices A_1, ..., A_m, each of dimension k × ℓ. Tribes outputs 1 iff the Set-Disjointness function outputs 1 on each sub-matrix A_i. This obviously imparts a direct-sum flavor to the problem of determining the complexity of the Tribes function in the following sense: a naive protocol will solve Tribes by simultaneously running an optimal protocol for Set-Disjointness on each of the m instances A_1, ..., A_m. Is this strategy optimal?
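To fix the semantics, here is a small reference implementation of both functions on an explicit Boolean input matrix. This is our own illustrative sketch; the function and variable names are ours, not the paper's.

```python
from typing import List

def disj(block: List[List[int]]) -> int:
    """Set-Disjointness: 1 iff some column of the Boolean matrix is all ones."""
    k, l = len(block), len(block[0])
    return int(any(all(block[i][j] == 1 for i in range(k)) for j in range(l)))

def tribes(matrix: List[List[int]], m: int, l: int) -> int:
    """Tribes_{m,l}: 1 iff every one of the m consecutive k x l blocks
    is a 1-input of Set-Disjointness."""
    return int(all(disj([row[t * l:(t + 1) * l] for row in matrix])
                   for t in range(m)))

# k = 3 processors, m = 2 blocks of length l = 2: columns 0 and 3 are all ones.
X = [[1, 0, 0, 1],
     [1, 1, 1, 1],
     [1, 0, 1, 1]]
assert disj(X) == 1 and tribes(X, m=2, l=2) == 1
```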
This question was answered in the affirmative for the 2-party model by Jayram, Kumar and Sivakumar [JKS03], when they proved an Ω(n) lower bound on the randomized communication complexity of the Tribes function. Their work delicately extended the information-theoretic tools of Bar-Yossef et al. [BJKS02]. Interestingly, it also exhibited the power of the information complexity approach: no other technique was known to establish a tight lower bound on the Tribes function. (This is not surprising: two other successful techniques, the discrepancy and the corruption methods, both yield lower bounds on the non-deterministic complexity, whereas Tribes and its complement, on n-bit inputs, both have non-deterministic complexity only O(√n).) In this work, we show that the naive strategy for solving Tribes is optimal also in the co-ordinator model:

Theorem 1. In the k-processor co-ordinator model, every bounded-error randomized protocol solving the Tribes_{m,ℓ} function has communication cost Ω(mℓk), for every k ≥ 2.

We prove this by extending and simplifying the information complexity approach of [BEO+13] and the earlier work of [JKS03].
13] and theearlier work of [JKS03]. It is worth noting that our bounds in Theorem 1 hold for all values of k . In particular,this also yields a lower bound for Set-Disjointness for all values of k . The earlier bound of Braverman et.al.only worked if k = Ω(log n ). We first provide a quick overview of our techniques and contributions. We follow this up with a more detaileddescription, elaborating on the main steps of the argument.
Brief Summary:
Recall that the Tribes_{m,ℓ} function can be written as an m-fold AND of Disj_ℓ instances. One possible way to show that Tribes_{m,ℓ} is hard in the message-passing model is to show that any protocol evaluating Tribes_{m,ℓ} must evaluate all the Disj_ℓ instances. This suffices to argue that Tribes_{m,ℓ} is m times as hard as Disj_ℓ. By now it is well known that information complexity provides a convenient framework for realizing such direct-sum arguments. In order to do so, one needs to define a distribution on inputs that is entirely supported on the 1-inputs of the m Set-Disjointness instances of Tribes. This was the general strategy of Jayram et al. [JKS03] in the 2-party context. However, the first problem one encounters is to define an appropriate hard distribution and a right notion of information cost such that Disjointness has high information cost, Ω(kℓ), under that distribution in the co-ordinator model. This turns out to be a delicate and involved step: various natural information costs do not work, as observed by Phillips et al. [PVZ12]. Here, we are helped by the work of Braverman et al. [BEO+13], who define a hard distribution τ and an information cost measure IC. However, we face two problems in using them. The first is that τ happens to be (almost) entirely supported on the 0-inputs of Set-Disjointness. Taking ideas from [JKS03], we modify τ to get a distribution µ supported on 1-inputs: to sample from µ, we first sample from τ and then pick a random column of the sampled input and force it to all ones. Intuitively, the idea is that the all-ones column is well hidden at a random spot; if τ was hard, µ should also remain hard. The second problem is to appropriately modify the information cost measure IC so that it yields high information complexity under µ. Here, we use an idea of [JKS03].

However, proving that IC is high for protocols whose inputs are sampled according to µ raises new technical challenges. The first challenge is to prove a direct-sum result on the information complexity of protocols as measured by IC. Implementing this step is a novelty of this work, where we show, roughly, that IC(Disj_ℓ) is at least Ω(ℓ · IC(Disj_2)). To show this, we introduce a new information measure, PIC(f), which is a lower bound on IC(f) and will be explained in the relevant section. The final challenge is to prove that IC(Disj_2) is Ω(k). We again do that by first simplifying some of the lemmas of [BEO+13] and extending them using some ideas from the work of [JKS03].
More Detailed Account:
Among the many possible ways to define the information cost of a protocol, the definition we work with stems from the inherent structure of the communication model. As evident from the previous discussion, in the model of communication we are interested in, the co-ordinator can see the whole transcript of the protocol but cannot see the inputs. On the other hand, the processors can see only a local view of the transcript - the messages they send and receive - along with their respective inputs. From the point of view of the co-ordinator, who has no input, the information revealed by the transcript about the input can be expressed as I[X : Π(X)]. This is small for the protocol in which the co-ordinator goes around probing each player on each coordinate to see whether any player has a 0 in it, and gives up on a coordinate once she finds such a player (we call this Protocol A). It is not hard to see that the information cost of Protocol A is at most O(n log k). A relevant information cost measure from the point of view of processor i is I[X_{−i} : Π_i(X) | X_i], which measures how much information processor i learns about the other inputs from the transcript. This cost is very small for the protocol in which all the processors send their respective inputs to the co-ordinator (we call this Protocol B): here I[X_{−i} : Π_i(X) | X_i] is 0 for all i. What is worth noticing is that in both protocols, if we consider the sum of the two information costs, i.e., I[X : Π(X)] + Σ_i I[X_{−i} : Π_i(X) | X_i], it is Ω(nk), which is the kind of bound we are aiming for. This cost trade-off was first observed in [PVZ12], but they were unable to prove a lower bound for Disj_ℓ in this model of communication.
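The two extreme protocols are easy to make concrete. Below is our own sketch of both, with hypothetical function names and a deliberately coarse unit-cost accounting of the bits exchanged; it is meant only to make the trade-off tangible.

```python
from typing import List, Tuple

def protocol_A(X: List[List[int]]) -> Tuple[int, int]:
    """The co-ordinator probes players column by column and gives up on a
    column as soon as some player reports a 0. Computes Set-Disjointness."""
    k, n = len(X), len(X[0])
    bits = 0
    for j in range(n):
        column_all_ones = True
        for i in range(k):
            bits += 2  # one probe and one 1-bit answer, coarsely charged
            if X[i][j] == 0:
                column_all_ones = False
                break
        if column_all_ones:
            return 1, bits
    return 0, bits

def protocol_B(X: List[List[int]]) -> Tuple[int, int]:
    """Every player ships its whole n-bit input to the co-ordinator."""
    k, n = len(X), len(X[0])
    value = int(any(all(X[i][j] == 1 for i in range(k)) for j in range(n)))
    return value, k * n
```

Protocol A reveals little about the inputs to the co-ordinator, while Protocol B reveals nothing to the players beyond their own inputs; as noted above, the sum of the two information cost terms is Ω(nk) for both.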
Braverman et al. [BEO+13] solved this problem by coming up with the following notion of information complexity. Let (X, M, Z) be distributed jointly according to some distribution τ. The information cost of a protocol Π with respect to τ is defined as

IC_τ(Π) = Σ_{i∈[k]} [ I_τ[X_i : Π_i(X) | M, Z] + I_τ[M : Π_i(X) | X_i, Z] ].  (1)

Conditioning on the auxiliary random variables M and Z serves the following purpose: even though τ is a non-product distribution, it can be thought of as a convex combination of product distributions, one for each specific value of M and Z. It is well known by now that such convex combinations facilitate proving direct-sum-like results.

The desired properties of the distribution τ are as follows. First, the distribution should have enough entropy to make it hard for the players to encode their inputs cheaply and send them across to the co-ordinator; such an encoding is attempted in Protocol B. Second, the distribution should be supported on inputs that have only a few 0's in each column of Disj_ℓ; this makes sure that the co-ordinator has to probe Ω(k) processors in each column before he finds a 0 in that column, which is the probing attempted by the co-ordinator in Protocol A. The first property can be individually satisfied by setting each processor's input to be 0 or 1 with equal probability in each column. The second property can be individually satisfied by taking a random processor for each column, giving it a 0, and giving 1 to the rest of the processors. Let Z_j denote the processor whose bit is fixed to 0 in column j. The hard distribution for Disj_ℓ is a convex combination of these two distributions: a Bernoulli random variable M_j for each column j acts as a switch, i.e., if M_j = 0 the input to column j is sampled from the first distribution, otherwise it is sampled from the second distribution. M_j takes value 0 with probability 2/3. We define M = ⟨M_1, ..., M_ℓ⟩ and Z = ⟨Z_1, ..., Z_ℓ⟩.
At this point it is instructive to go back to the definition of IC and see what each term in the definition contributes. For the co-ordinator, Σ_i I_τ[X_i : Π_i(X) | M, Z] represents the amount of information revealed about the inputs of the processors by the transcript. For convenience, we can assume that M is known to the co-ordinator. We can do this without loss of generality, as the co-ordinator can sample O(log k) inputs from column j and conclude the value of M_j from them, for any j; this amount of communication is affordable, as we are trying to show a lower bound of Ω(nk). Note, however, that we cannot assume that the processors know M. Had that been the case, the processors could have employed Protocol A or Protocol B in column j depending on the value of M_j, and the value of I[X_i : Π_i(X) | M, Z] in such a protocol would have been small. So we need to make sure that we charge the processors for their effort to learn the value of M. This is taken care of by the second term in the definition of IC, i.e., I_τ[M : Π_i(X) | X_i, Z]. Braverman et al. [BEO+13] proved an Ω(ℓk) lower bound for the information cost of Disj_ℓ with respect to this hard distribution.
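For concreteness, here is a sketch of a sampler for τ on a single Disj_ℓ instance (names are ours). It follows the per-column rule that reappears as IpSample in Section 4: processor Z_j always holds a 0, and the switch M_j decides whether the remaining processors hold all 1's or uniform bits, with Pr[M_j = 0] = 2/3 as stated above.

```python
import random
from typing import List, Tuple

def sample_tau(k: int, l: int) -> Tuple[List[List[int]], List[int], List[int]]:
    """Sample (X, M, Z) ~ tau for one k x l Disj instance."""
    M = [0 if random.random() < 2 / 3 else 1 for _ in range(l)]
    Z = [random.randrange(k) for _ in range(l)]
    X = [[0] * l for _ in range(k)]
    for j in range(l):
        for i in range(k):
            if i == Z[j]:
                X[i][j] = 0                      # processor Z_j always holds a 0
            elif M[j] == 1:
                X[i][j] = 1                      # switch on: everyone else holds 1
            else:
                X[i][j] = random.randint(0, 1)   # switch off: uniform bit
    return X, M, Z
```

Every column of such a sample contains a 0 at row Z_j, so every sample is a 0-input of Disj_ℓ; this is exactly the problem addressed next.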
As mentioned before, however, we need the hard distribution ζ for Tribes_{m,ℓ} to be entirely supported on 1-inputs of Disj_ℓ, while the distribution τ described above is supported on 0-inputs of Disj_ℓ. Here we borrow ideas from [JKS03] and design a distribution µ by selecting a random column of the Disj_ℓ instance and planting an all-1 input in it. We denote the random co-ordinate by W. It is easy to verify that µ is a distribution supported on 1-inputs of Disj_ℓ. We set the hard distribution for Tribes_{m,ℓ} to be the m-fold product distribution ζ = µ^m, denoted by the random variables ⟨X̄, M̄, Z̄, W̄⟩. It is to be noted that a correct protocol has to work well for all inputs, not only for inputs coming from the distribution ζ; this property will be crucially used in a later part of the proof.
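A sampler for µ is then a one-line modification of the τ sampler sketched above, and ζ is its m-fold product (again our own illustrative names):

```python
import random

def sample_mu(k: int, l: int):
    """Sample (X, M, Z, W) ~ mu: draw from tau, then plant an all-ones
    column at a uniformly random coordinate W."""
    X, M, Z = sample_tau(k, l)   # sample_tau as sketched above
    W = random.randrange(l)
    for i in range(k):
        X[i][W] = 1              # the planted all-ones column
    return X, M, Z, W

def sample_zeta(k: int, m: int, l: int):
    """zeta = mu^m: one independent mu-sample per block of Tribes_{m,l}."""
    return [sample_mu(k, l) for _ in range(m)]
```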
The modification of the input distribution from τ to µ, and subsequently to ζ, calls for changing the definition of information complexity to suit our purpose. We will use the following definition in this paper.

Definition 2. Let (X̄, M̄, Z̄, W̄) be distributed jointly according to ζ. The information cost of a protocol Π with k processors in the NIH point-to-point co-ordinator model with respect to ζ is defined as

IC_ζ(Π) = Σ_{i∈[k]} [ I_ζ[X̄_i : Π_i(X̄) | M̄, Z̄, W̄] + I_ζ[M̄ : Π_i(X̄) | X̄_i, Z̄, W̄] ].  (2)

For a function f : X → R, the information complexity of f is defined as

IC_{ζ,δ}(f) = inf_Π IC_ζ(Π),  (3)

where the infimum is taken over all δ-error protocols Π for f.
With this definition, we are able to bound the information complexity of Tribes_{m,ℓ} as m times that of Disj_ℓ. Although non-trivial, this step can be accomplished by exploiting the proof techniques used in [BEO+13]. The next step is to bound the information complexity of Disj_ℓ itself, which turns out to be difficult for two reasons. First, the distribution µ is no longer supported on the 0-inputs of Disj_ℓ. We get around this by defining a new information complexity measure - which we call partial information complexity - and showing that the partial information complexity of Disj_ℓ on distribution µ is at least (ℓ − 1) times that of Disj_2. This is one of the main technical contributions of our paper; see Section 4.1 for details. The second hurdle we face is bounding the information complexity of Disj_2. Here we combine ideas from [JKS03, BEO+13] to conclude that the partial information complexity of Disj_2 is Ω(k). This is the second main technical contribution of this paper, and is explained in Section 4.2. Finally, we give a simple argument in Section 5 to show that IC_ζ(Π) lower bounds the communication cost of Π for any correct protocol Π for Tribes_{m,ℓ}.

2 Preliminaries

Communication complexity.
In this work, we are mainly interested in the multiparty number-in-hand model of communication. In this model of computation, the input is distributed among k players P_1, ..., P_k, who jointly wish to compute a function f of the combined input by communicating with each other. The players may be assumed to have unlimited computational power. Several variants of this model have been studied extensively. In the message passing model, each pair of players has a dedicated communication channel, and hence a player can send messages to specific players. This contrasts with the second variant, where the players can only broadcast their messages; the latter model is known as the shared blackboard model. In this work, we consider the message passing model.

As mentioned before, the model that is easier to work with is the co-ordinator model where, in addition to the k players who hold the input, a co-ordinator is introduced who does not have any input but through whom all communication is channelled: the players can only communicate with the co-ordinator, though the co-ordinator is allowed to communicate with everybody. It is easy to observe that the co-ordinator model can simulate the message passing model with only a log k overhead in the total communication cost.

We work with randomized protocols where the players have access to private coins. (Though it might seem that public-coin protocols can yield better upper bounds, all our proofs can be modified to give the same results for the public-coin model.) We adopt the standard notion of private-coin randomized communication complexity, where we look at the worst-case communication of the protocol when the protocol is allowed to err with probability only δ (bounded away from 1/2) on each input. Here the probability is taken over the private coin tosses of the players. For more details, readers are referred to [KN97].

Information theory.
We will quickly go through the information-theoretic definitions and facts we need. For a random variable X taking values in a sample space Ω according to a distribution p(·), the entropy of X, denoted H(X), is defined as

H(X) = Σ_{x∈Ω} Pr[X = x] log(1 / Pr[X = x]) = E_x[ log(1/p(x)) ].  (4)

For two random variables X and Y, the conditional entropy of X given Y is defined as

H(X | Y) = E_{x,y}[ log(1/p(x | y)) ].  (5)

Informally, the entropy of a random variable measures the uncertainty associated with it. Conditioning on another random variable, i.e., knowing the value that the other random variable takes, can only decrease this uncertainty. This notion is captured by the fact that H(X | Y) ≤ H(X), where equality is achieved when X is independent of Y. Given two random variables X and Y with joint distribution p(x, y), we can ask how much information one random variable reveals about the other. The mutual information, as it is called, between X and Y is defined as

I[X : Y] = H(X) − H(X | Y).  (6)

Mutual information is a symmetric quantity, though this might not be obvious from the definition itself. From the previous discussion, it is easy to see that mutual information is non-negative. As before, we can also define conditional mutual information:

I[X : Y | Z] = H(X | Z) − H(X | Y, Z).  (7)

The following chain rule of mutual information will be crucially used in our proof:

I[X_1, ..., X_n : Y] = Σ_{i∈[n]} I[X_i : Y | X_{i−1}, ..., X_1].  (8)

The chain rule also holds when every term is conditioned on an additional random variable Z.
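As a quick sanity check of these definitions, the following self-contained snippet (our own illustration) computes entropies and conditional mutual information for a small joint distribution and verifies the chain rule (8) numerically:

```python
import math
from itertools import product

# Joint distribution of (X1, X2, Y) over bits: p[(x1, x2, y)].
p = {(x1, x2, y): 1 / 8 for x1, x2, y in product([0, 1], repeat=3)}
# Skew it a little so the variables are correlated.
p[(0, 0, 0)] += 1 / 16
p[(1, 1, 1)] -= 1 / 16

def H(vars_idx):
    """Entropy of the marginal on the coordinates in vars_idx."""
    marg = {}
    for outcome, pr in p.items():
        key = tuple(outcome[i] for i in vars_idx)
        marg[key] = marg.get(key, 0.0) + pr
    return -sum(pr * math.log2(pr) for pr in marg.values() if pr > 0)

def I(a, b, cond=()):
    """I[a : b | cond] via the identity
    I[A : B | C] = H(A, C) + H(B, C) - H(A, B, C) - H(C)."""
    return H(a + cond) + H(b + cond) - H(a + b + cond) - H(cond)

# Chain rule (8): I[X1, X2 : Y] = I[X1 : Y] + I[X2 : Y | X1].
lhs = I((0, 1), (2,))
rhs = I((0,), (2,)) + I((1,), (2,), cond=(0,))
assert abs(lhs - rhs) < 1e-9
```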
Remark 1. Consider a permutation σ : [n] → [n]. The following observation will be useful in our proof:

I[X_1, ..., X_n : Y] = Σ_{i∈[n]} I[X_{σ(i)} : Y | X_{σ(i−1)}, ..., X_{σ(1)}].  (9)

We will use the following lemma regarding mutual information.
Lemma 3. Consider random variables A, B, C and D. If A is independent of B given D, then

I[A : B, C | D] = I[A : C | B, D],  (10)

and

I[A : C | B, D] ≥ I[A : C | D].  (11)

Proof. By the chain rule of mutual information,

I[A : B, C | D] = I[A : B | D] + I[A : C | B, D] = I[A : C | B, D],

since A ⊥ B | D implies I[A : B | D] = 0. For the second expression, I[A : B | D] = 0 means H(A | D) = H(A | B, D). Hence,

I[A : C | B, D] = H(A | B, D) − H(A | B, C, D) = H(A | D) − H(A | B, C, D) ≥ H(A | D) − H(A | C, D) = I[A : C | D].
4 Lower bound for Tribes_{m,ℓ} in the message-passing model
Here, in the first subsection, we show two direct-sum results. In the first step we bound the information complexity of Tribes_{m,ℓ} in terms of that of Disj_ℓ. It is to be noted that the proof technique of [BJKS02] falls short of proving any lower bound on the information complexity measure we have defined, mainly because the measure consists of a sum of two different mutual information terms for each processor, and it is not clear that one can come up with lower bounds for both terms simultaneously. This problem has already been attended to in [BEO+13], and the proof we present here resembles their proof technique; for completeness we include the proof in this paper. In the second step, we bound the information complexity of Disj_ℓ in terms of that of Disj_2. This step is more difficult, and a straightforward application of the direct-sum argument of [BEO+13] will not work. First we use ideas from [JKS03] to define the partial information complexity measure, which is more convenient to work with. Then we come up with a novel direct-sum argument for this measure.

In Section 4.2, we show that the information complexity of Disj_2 is Ω(k). We manage to show this by combining ideas from [BEO+13, JKS03].
4.1 From Tribes_{m,ℓ} to Disj_ℓ

In this section we prove that the information cost of computing Tribes_{m,ℓ} is m times the information cost of computing Disj_ℓ. The proof is almost the same as in [BEO+13], where the authors used a direct sum theorem to show that the information cost of computing Disj_ℓ is ℓ times the information cost of computing the k-bit AND_k. Before going into details we need the following definitions.

Suppose f : D^m → R can be written as f(X) = g(h(X^1), ..., h(X^m)), where X = ⟨X^1, ..., X^m⟩, X^i ∈ D, and h : D → R. In other words, f is g-decomposable with primitive h.

Definition 4 (Collapsing distribution). We call X ∈ D^m a collapsing input for f if for any i ∈ [m] and y ∈ D, we have f(X(i, y)) = h(y), where X(i, y) denotes X with its i-th block replaced by y. Any distribution ζ supported entirely on collapsing inputs of f is called a collapsing distribution for f.

Definition 5 (Projection). Given a distribution ν specified by random variables (D_1, ..., D_k) and a subset S of [k], we call the projection of ν on (D_i)_{i∈S}, denoted ν ↓ (D_i)_{i∈S}, the marginal distribution of (D_i)_{i∈S} induced by ν.
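To illustrate Definition 4 in our setting: Tribes_{m,ℓ} is g-decomposable with g = AND_m and primitive h = Disj_ℓ, and any input in which every block contains an all-ones column is collapsing, since replacing one block by y makes the output equal Disj_ℓ(y). The following small check is our own illustration, reusing the disj sketch from the introduction:

```python
import itertools

def disj(block):
    k, l = len(block), len(block[0])
    return int(any(all(block[i][j] == 1 for i in range(k)) for j in range(l)))

def replace_block(matrix, i, y, l):
    """Return matrix with block i (columns i*l .. (i+1)*l - 1) replaced by y."""
    return [row[:i * l] + list(yrow) + row[(i + 1) * l:]
            for row, yrow in zip(matrix, y)]

# k = 2, m = 2, l = 2; every block of X has an all-ones column,
# so X is a collapsing input for Tribes_{2,2}.
X = [[1, 0, 1, 1],
     [1, 1, 0, 1]]
m, l, k = 2, 2, 2
for i in range(m):
    for bits in itertools.product([0, 1], repeat=k * l):
        y = [bits[r * l:(r + 1) * l] for r in range(k)]
        Xy = replace_block(X, i, y, l)
        tribes_val = int(all(disj([row[t * l:(t + 1) * l] for row in Xy])
                             for t in range(m)))
        assert tribes_val == disj(y)   # f(X(i, y)) = h(y)
```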
The proof is by reduction: we will show that, given a protocol Π for Tribes_{m,ℓ} and a collapsing distribution µ = ζ_ℓ^m, we can construct a protocol Π′ for Disj_ℓ that computes Disj_ℓ with the same error probability as Π and whose information cost is at most a 1/m fraction of that of Π.
Theorem 6. Let µ = ζ_ℓ^m be a collapsing distribution for Tribes_{m,ℓ}, partitioned by M, Z and W as described before. Then

IC_µ(Tribes_{m,ℓ}) ≥ m · IC_{ζ_ℓ}(Disj_ℓ).  (12)

As mentioned before, the proof of Theorem 6 works out by adapting the proof techniques of [BEO+13]. We construct a protocol Π′ for Disj_ℓ given a protocol Π for Tribes_{m,ℓ}. On an input u for Disj_ℓ, the processors and the co-ordinator sample a random k × mℓ matrix X̄ in the following way (see the sketch after this list):

1. The co-ordinator samples J uniformly at random from [m]. This is the block where the processors embed the input u.

2. The co-ordinator samples Z̄_{−J} ∈ [k]^{ℓ(m−1)} and sends it to all the processors.

3. The co-ordinator samples W̄_{−J} ∈ [ℓ]^{m−1} and sends it to all the processors.

4. The co-ordinator samples M̄^t ∈ {0,1}^ℓ, with each coordinate distributed as Bin(1/3), for all t less than J, and sends them to all the processors. The processors use their private randomness to sample the inputs for those Disj_ℓ instances in the following way: for the t-th Disj_ℓ instance, the processors sample X̄^t by sampling each of its columns independently - for column j, the input of the Z̄^t_j-th processor is fixed to 0 and the other processors get 1 if M̄^t_j = 1; otherwise they get 0 or 1 uniformly at random. In addition, column W̄_t of block t is set to all ones, in accordance with ζ_ℓ.
5. For the rest of the Disj_ℓ instances (t > J), the co-ordinator herself samples the inputs in the same way and sends the requisite inputs to the respective processors.
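The following sketch generates the joint sample centrally (our own illustration, with hypothetical names). In the actual protocol, as in steps 1-5, the pieces for t < J are sampled by the processors from their private coins and the pieces for t > J by the co-ordinator; that split is what makes the information accounting work, but it does not change the resulting distribution.

```python
import random

def bernoulli(p):
    return 1 if random.random() < p else 0

def embed_tribes_input(u, m, l):
    """Embed a k x l Disj_l input u into a k x (m*l) Tribes input.
    Returns the matrix Xbar and the random block index J."""
    k = len(u)
    J = random.randrange(m)                       # step 1
    blocks = []
    for t in range(m):
        if t == J:
            blocks.append([row[:] for row in u])  # the embedded instance
            continue
        # Steps 2-5: a fresh sample from zeta_l for block t.
        Z = [random.randrange(k) for _ in range(l)]
        W = random.randrange(l)
        M = [bernoulli(1 / 3) for _ in range(l)]
        B = [[0] * l for _ in range(k)]
        for j in range(l):
            for i in range(k):
                if j == W:
                    B[i][j] = 1                   # planted all-ones column
                elif i == Z[j]:
                    B[i][j] = 0
                else:
                    B[i][j] = 1 if M[j] == 1 else random.randint(0, 1)
        blocks.append(B)
    # Concatenate the m blocks into one k x (m*l) matrix.
    Xbar = [sum((blocks[t][i] for t in range(m)), []) for i in range(k)]
    return Xbar, J
```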
Observation 7. Consider the tuple (U, N, V, S) distributed according to ζ_ℓ. If U is given as input to protocol Π′, then (X̄, M̄, Z̄, W̄) is distributed according to µ.

Since µ is a collapsing distribution for Tribes_{m,ℓ}, it is easy to see that the protocol Π′ computes Disj_ℓ. It remains to connect the information cost of this protocol with the information cost of Tribes_{m,ℓ}, which we do in the following lemma.
Lemma 8 ([BEO+13]).

I_{ζ_ℓ}[U_i : Π′_i(U) | N, V, S] ≤ (1/m) · I_µ[X̄_i : Π_i(X̄) | M̄, W̄, Z̄],  (13)

and

I_{ζ_ℓ}[N : Π′_i(U) | U_i, V, S] ≤ (1/m) · I_µ[M̄ : Π_i(X̄) | X̄_i, W̄, Z̄].  (14)

Lemma 8 implies Theorem 6.
Proof of Lemma 8. The proof is essentially the same as that of Lemma 4.1 of [BEO+13]. Consider the view of processor i of the transcript of Π′, which we call Π′_i. We have

Π′_i(U) = ⟨J, Z̄_{−J}, M̄^{[1,J−1]}, X̄_i^{[J+1,m]}, W̄_{−J}, Π(X̄(J, U))⟩.

As ⟨J, Z̄_{−J}, M̄^{[1,J−1]}, X̄_i^{[J+1,m]}, W̄_{−J}⟩ is independent of N, we can write the following:

I_{ζ_ℓ}[N : Π′_i(U) | U_i, V, S]
= I[N : ⟨J, Z̄_{−J}, M̄^{[1,J−1]}, X̄_i^{[J+1,m]}, W̄_{−J}, Π(X̄(J, U))⟩ | U_i, V, S]
= I[N : Π(X̄(J, U)) | J, Z̄_{−J}, M̄^{[1,J−1]}, X̄_i^{[J+1,m]}, W̄_{−J}, U_i, V, S]
= I_µ[M̄^J : Π_i(X̄) | J, M̄^{[1,J−1]}, X̄_i^{[J,m]}, Z̄, W̄]
≤ I_µ[M̄^J : Π_i(X̄) | J, M̄^{[1,J−1]}, X̄_i, Z̄, W̄]
= (1/m) Σ_{j=1}^m I_µ[M̄^j : Π_i(X̄) | M̄^{[1,j−1]}, X̄_i, Z̄, W̄]
= (1/m) I_µ[M̄ : Π_i(X̄) | X̄_i, Z̄, W̄],

where the second equality uses Lemma 3, the inequality uses Lemma 3 again, the next equality averages over the uniform choice of J, and the last equality is the chain rule (Equation (8)). Equation (13) can be proved in the same way.
Disj ℓ under ζ ℓ to information cost of AND k . So a naturalattempt is to prove a theorem like Theorem 6 for reduction from Disj ℓ to AND k . Unfortunately this is notpossible. Recall that Disj ℓ ( X ) = W ℓi =1 V kj =1 X ji . Hence for a collapsing distribution each of the AND k s shouldevaluate to 0, which is not the case for the distribution ζ ℓ .Inspired by [JKS03], we define the following measure of information cost, namely, partial information cost.Let Π be a protocol for Disj ℓ . The partial information cost of Π is defined as, PIC (Π) = k X i =1 (cid:0) I (cid:2) M − W : Π i ( X ) | X i , Z , W (cid:3) + I (cid:2) X i − W : Π i ( X ) | M , Z , W (cid:3)(cid:1) . (15)The random variable M − W denotes M with its W -th coordinate removed. Similarly, X i − W denotes X i withits W -th coordinate removed. The partial information complexity of Disj ℓ is the partial information cost of thebest protocol computing Disj ℓ . It is easy to see that the partial information complexity of any function f lowerbounds the information complexity of f .We prove the following theorem. Theorem 9.
Theorem 9. Let ζ_ℓ be the distribution over the inputs of Disj_ℓ, partitioned by M, Z, W as described before. Then

PIC_{ζ_ℓ}(Disj_ℓ) ≥ (ℓ − 1) · PIC_{ζ_2}(Disj_2).  (16)

Here we will show a reduction analogous to our previous reduction from Tribes_{m,ℓ} to Disj_ℓ. Given a protocol Π′ for Disj_ℓ and the distribution ζ_ℓ (as described in Section 2), we will come up with a protocol Π″ for Disj_2 such that the partial information cost of Π″ w.r.t. ζ_2 is at most 1/(ℓ − 1) times the partial information cost of Π′ w.r.t. ζ_ℓ.
Let us describe the construction of the protocol Π″. On an input u = ⟨u_1, u_2⟩ for Disj_2, the processors and the co-ordinator sample a k × ℓ random matrix X(u) in the following way (a sketch of the column-sampling procedure follows the list):

1. The co-ordinator samples P and Q uniformly at random from [ℓ] such that P < Q.

2. The co-ordinator samples Z_{−{P,Q}} = (Z_i)_{i∈[ℓ]\{P,Q}}, where each Z_i ∈_R [k], and sends it to all the processors.

3. The co-ordinator samples a number R uniformly at random from {0, ..., ℓ − 2} and then samples a subset T ⊆ [ℓ] \ {P, Q} uniformly at random from all sets of size R that do not contain P, Q. Then the co-ordinator samples M_t ∼ Bin(1/3) for all t ∈ T and sends them to all the processors. The processors use their private randomness to sample X_t for each column t ∈ T in the following way: the input of the Z_t-th processor is fixed to 0 in X_t, and the other processors get 1 if M_t = 1; otherwise, if M_t = 0, they get 0 or 1 uniformly at random. We call this input-sampling procedure IpSample.

4. For the rest of the columns, the co-ordinator samples the inputs according to IpSample and sends the requisite inputs to the respective processors.

5. The processors form the input X ≡ X(u, P, Q) (i.e., X_P = u_1 and X_Q = u_2) and run the protocol Π′ for Disj_ℓ with X as input.
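Here is a sketch of IpSample and the assembly of X(u, P, Q) (our own code). It collapses the distinction between columns in T and the rest, since T only determines which party does the sampling, not the distribution of the sampled columns.

```python
import random

def ip_sample(k, z, m):
    """IpSample for one column: processor z holds 0; the others hold 1
    if the switch m is 1, and a uniform bit if m is 0."""
    return [0 if i == z else (1 if m == 1 else random.randint(0, 1))
            for i in range(k)]

def build_X(u1, u2, l):
    """Assemble the k x l input X(u, P, Q) of protocol Pi''."""
    k = len(u1)
    P, Q = sorted(random.sample(range(l), 2))        # step 1: P < Q
    cols = {}
    for t in range(l):
        if t == P:
            cols[t] = list(u1)                        # step 5: X_P = u_1
        elif t == Q:
            cols[t] = list(u2)                        # step 5: X_Q = u_2
        else:
            z = random.randrange(k)                   # step 2: Z_t
            m = 1 if random.random() < 1 / 3 else 0   # M_t ~ Bin(1/3)
            cols[t] = ip_sample(k, z, m)              # steps 3-4
    return [[cols[t][i] for t in range(l)] for i in range(k)], P, Q
```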
Observation 10. Consider the tuple (U, N, V, S) distributed according to ζ_2. If U is given as input to protocol Π″, then (X, M, Z, W) is distributed according to ζ_ℓ, where W is the unique all-1's coordinate in X. Here W = P if V = 1 and W = Q if V = 2.

Next we prove the following lemma, connecting the information cost of Π′ for Disj_ℓ and that of Π″ for Disj_2. This lemma implies Theorem 9.

Lemma 11.

I_{ζ_2}[U_i^{−V} : Π″_i(U) | N, V, S] ≤ (1/(ℓ−1)) · I_{ζ_ℓ}[X_i^{−W} : Π′_i(X) | M, W, Z],  (17)

and

I_{ζ_2}[N_{−V} : Π″_i(U) | U_i, V, S] ≤ (1/(ℓ−1)) · I_{ζ_ℓ}[M_{−W} : Π′_i(X) | X_i, W, Z].  (18)

Proof.
We consider the LHS of Equation (18). The view of processor i of the transcript of protocol Π″, denoted Π″_i(U), is given as follows:

Π″_i(U) = ⟨P, Q, Z_{−{P,Q}}, R, T, M_T, X_i^{T̄\{P,Q}}, Π′(X(P, Q, U))⟩.  (19)

So the LHS of Equation (18) can be written as

I_{ζ_2}[N_{−V} : Π″_i(U) | U_i, V, S]
= I[N_{−V} : ⟨P, Q, Z_{−{P,Q}}, T, M_T, R, X_i^{T̄\{P,Q}}, Π′_i(X(P, Q, U))⟩ | U_i, V, S]
= I[N_{−V} : Π′_i(X(P, Q, U)) | P, Q, Z_{−{P,Q}}, T, M_T, R, X_i^{T̄\{P,Q}}, U_i, V, S]   [Lemma 3, Equation (10)]
= I[N_{−V} : Π′_i(X) | P, Q, R, T, M_T, Z, V, X_i^{T̄}]   [combining (U_i, X_i^{T̄\{P,Q}}) into X_i^{T̄} and (Z_{−{P,Q}}, S) into Z]
≤ I[N_{−V} : Π′_i(X) | P, Q, R, T, M_T, Z, V, X_i]   [Lemma 3, Equation (11); X_i^T is independent of N_{−V}].

Since V takes the values 1 and 2 uniformly at random, we can write the last quantity as

(1/2) I[M_P : Π′_i(X) | P, Q, R, T, M_T, Z, V = 2, X_i] + (1/2) I[M_Q : Π′_i(X) | P, Q, R, T, M_T, Z, V = 1, X_i].  (20)

Consider the first mutual information term. Expanding the conditioning over the uniformly random pair P < Q,

I[M_P : Π′_i(X) | P, Q, R, T, M_T, Z, V = 2, X_i] = (2/(ℓ(ℓ−1))) Σ_{p<q} I[M_p : Π′_i(X) | R, T, M_T, Z, V = 2, X_i, P = p, Q = q].

Summing these terms over the pairs p < q, treating the second term of (20) symmetrically, and applying the chain rule, each coordinate of M_{−W} is accounted for once, and the total is at most (1/(ℓ−1)) · I_{ζ_ℓ}[M_{−W} : Π′_i(X) | X_i, W, Z], which gives Equation (18). Equation (17) is proved in the same way.

With Lemma 11 in hand, it remains to lower bound the partial information complexity of Disj_2.

Theorem 12. PIC_{ζ_2}(Disj_2) = Ω(k).

Together with Theorems 6 and 9, this yields an Ω(mℓk) lower bound on the switched information complexity of Tribes_{m,ℓ}, which is the lower bound on R_δ(Tribes_{m,ℓ}) we aimed for.

Notation. By ē we mean the all-1 vector of size k. By ē_{i,j} we mean the Boolean vector of size k whose entries are all 1 except those at indices i and j. Similarly, ē_i is the Boolean vector whose entries are all 1 except that at index i. Π[i, x, m, z; ē_i] denotes the transcript of the protocol Π on the following Disj_2 instance: the input of the first column comes from the distribution specified by M = m, Z = z and X_i = x, and the input of the second column is ē_i. Abusing notation slightly, Π_i[x, m, z; ē_i] denotes processor i's view of the transcript of Π when the input of the first column comes from the distribution specified by X_i = x, M = m and Z = z and the input of the second column is ē_i.

Hellinger distance. For probability distributions P and Q supported on a sample space Ω, the Hellinger distance between P and Q, denoted h(P, Q), is defined as

h(P, Q) = (1/√2) · ||√P − √Q||_2.  (26)

The Hellinger distance can also be written as

h²(P, Q) = 1 − F(P, Q),  (27)

where F(P, Q) = Σ_{ω∈Ω} √(P(ω) Q(ω)) is known as the Bhattacharyya coefficient. From the definition it is easy to see that the Hellinger distance is a metric satisfying the triangle inequality. Below we state some facts (without proof) about the Hellinger distance; interested readers can refer to [BJKS02] for the proofs. We denote the statistical distance between two distributions P and Q by ∆(P, Q).

Fact 13 (Hellinger vs. statistical distance).

h²(P, Q) ≤ ∆(P, Q) ≤ √2 · h(P, Q).  (28)

This essentially means that the Hellinger distance is a good approximation of the statistical distance.
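A quick numeric illustration of h, F, and the bounds in Fact 13 (our own snippet):

```python
import math

def hellinger(P, Q):
    """Hellinger distance between two distributions given as dicts."""
    keys = set(P) | set(Q)
    F = sum(math.sqrt(P.get(w, 0) * Q.get(w, 0)) for w in keys)  # Bhattacharyya
    return math.sqrt(1 - F)

def statistical(P, Q):
    keys = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(w, 0) - Q.get(w, 0)) for w in keys)

P = {'a': 0.5, 'b': 0.5}
Q = {'a': 0.9, 'b': 0.1}
h, d = hellinger(P, Q), statistical(P, Q)
assert h ** 2 <= d <= math.sqrt(2) * h   # Fact 13
```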
The following facts give us the necessary connection between mutual information and Hellinger distance.

Fact 14 (Hellinger vs. information [Lin91]). Let X be a random variable taking the values x_1 and x_2 with equal probability, and let Π be a randomized protocol which takes X as input. Then

I[X : Π(X)] ≥ h²(Π(x_1), Π(x_2)).  (29)

Fact 15. Let Π be a δ-error protocol for a function f. For inputs x and y such that f(x) ≠ f(y), we have

h²(Π(x), Π(y)) ≥ 1 − 2√δ.  (30)

The following lemmas will be helpful in our proof. They are straightforward generalizations of their two-party analogues.

Lemma 16. For any randomized protocol Π computing f : X^k → {0, 1}, for any x, y ∈ X^k and for any i and j,

h(Π(x_i x_j x_{−i,j}), Π(y_i y_j y_{−i,j})) = h(Π(x_i y_j x_{−i,j}), Π(y_i x_j y_{−i,j})).  (31)

Proof. We think of the randomized protocol Π on input x as a deterministic protocol Π′ working on (x, {R_i}_i), where R_i is the private random coins of player i. The first observation is that, for a k-party deterministic protocol, the inputs giving rise to the same transcript π form a combinatorial rectangle R_π = S_π^1 × ... × S_π^k. So we have

Pr_{{R_i}_i}[Π′[x, {R_i}_i] = π] = Pr_{{R_i}_i}[(x, {R_i}_i) ∈ R_π] = ∏_i Pr_{R_i}[(x_i, R_i) ∈ S_π^i].  (32)

Hence it immediately follows that

1 − h²(Π(x_i x_j x_{−i,j}), Π(y_i y_j y_{−i,j}))
= Σ_π √( Pr[Π(x_i x_j x_{−i,j}) = π] · Pr[Π(y_i y_j y_{−i,j}) = π] )
= Σ_π √( Pr[(x_{−i,j}, R_{−i,j}) ∈ R_π^{−i,j}] Pr_{R_i}[(x_i, R_i) ∈ S_π^i] Pr_{R_j}[(x_j, R_j) ∈ S_π^j] ) · √( Pr[(y_{−i,j}, R_{−i,j}) ∈ R_π^{−i,j}] Pr_{R_i}[(y_i, R_i) ∈ S_π^i] Pr_{R_j}[(y_j, R_j) ∈ S_π^j] )
= Σ_π √( Pr[Π(x_i y_j x_{−i,j}) = π] · Pr[Π(y_i x_j y_{−i,j}) = π] )
= 1 − h²(Π(x_i y_j x_{−i,j}), Π(y_i x_j y_{−i,j})).  (33)

This property of the Hellinger distance is called the k-party cut-paste property. Another property of the Hellinger distance that we require is the following.

Lemma 17. For any randomized protocol Π, for any inputs x, y ∈ X^k and for any i and j,

2 h²(Π(x_i x_j x_{−i,j}), Π(y_i y_j y_{−i,j})) ≥ h²(Π(x_i x_j x_{−i,j}), Π(x_i y_j y_{−i,j})) + h²(Π(y_i x_j x_{−i,j}), Π(y_i y_j y_{−i,j})).  (34)

Proof. As before,

(1 − h²(Π(x_i x_j x_{−i,j}), Π(x_i y_j y_{−i,j}))) + (1 − h²(Π(y_i x_j x_{−i,j}), Π(y_i y_j y_{−i,j})))
= Σ_π [ √( Pr[Π(x_i x_j x_{−i,j}) = π] · Pr[Π(x_i y_j y_{−i,j}) = π] ) + √( Pr[Π(y_i x_j x_{−i,j}) = π] · Pr[Π(y_i y_j y_{−i,j}) = π] ) ]
= Σ_π √( Pr[(x_{−i,j}, R_{−i,j}) ∈ R_π^{−i,j}] Pr[(y_{−i,j}, R_{−i,j}) ∈ R_π^{−i,j}] ) · √( Pr_{R_j}[(x_j, R_j) ∈ S_π^j] Pr_{R_j}[(y_j, R_j) ∈ S_π^j] ) · ( Pr_{R_i}[(x_i, R_i) ∈ S_π^i] + Pr_{R_i}[(y_i, R_i) ∈ S_π^i] )
≥ Σ_π √( Pr[(x_{−i,j}, R_{−i,j}) ∈ R_π^{−i,j}] Pr[(y_{−i,j}, R_{−i,j}) ∈ R_π^{−i,j}] ) · √( Pr_{R_j}[(x_j, R_j) ∈ S_π^j] Pr_{R_j}[(y_j, R_j) ∈ S_π^j] ) · 2√( Pr_{R_i}[(x_i, R_i) ∈ S_π^i] Pr_{R_i}[(y_i, R_i) ∈ S_π^i] )   [AM-GM]
= 2 (1 − h²(Π(x_i x_j x_{−i,j}), Π(y_i y_j y_{−i,j}))).  (35)

Now we show some structural properties of the co-ordinator model, described in terms of the Hellinger distance. These are generalizations of analogous properties shown in [BEO+13]. First we state a version of the diagonal lemma for M and X_i, which will be useful in our proof; an analogous statement holds with the roles of the two columns exchanged. Note that Π_i(ē_{i,j}; ē) is the same distribution as Π_i[0, 1, j; ē].

Lemma 18. For i ≠ j,

2 h²(Π_i[0, 0, j; ē], Π_i[1, 1, j; ē]) ≥ h²(Π_i(ē_{i,j}; ē), Π_i(ē_j; ē)).  (36)
The next lemma is, as mentioned in [BEO+13], a global-to-local property of the Hellinger distance in the following setting.

Lemma 19. For i ≠ j and any m,

h(Π[i, 0, m, z; ē], Π[i, 1, m, z; ē]) = h(Π_i[0, m, z; ē], Π_i[1, m, z; ē]),  (37)

and

h(Π(ē_{i,j}; ē), Π(ē_i; ē)) = h(Π_i(ē_{i,j}; ē), Π_i(ē_i; ē)).  (38)

The proofs of Lemma 18 and Lemma 19 are straightforward adaptations of the analogous lemmas in [BEO+13].

Proof of Lemma 18. We have to measure the Hellinger distance between distributions of Π_i where the input in the first column comes from the distribution switched by M and Z. The first observation is the following: in the protocol Π, the input of the i-th player is fixed, so the protocol can be thought of as a two-party protocol in which the first player is the i-th player and the second player consists of all the remaining k − 1 players, who, given M and Z, can randomly sample the inputs of all the other players participating in Π. The input of the first player, as in the case of the i-th player in Π, is fixed. Let us call this new protocol Π̂. It is clear that the Hellinger distances in Equation (36) remain the same if we consider Π̂ instead of Π. Now, given the two-party protocol Π̂, we can invoke the Pythagorean lemma (the two-party version of Lemma 17) to conclude

2 h²(Π_i[0, 0, j; ē], Π_i[1, 1, j; ē]) = 2 h²(Π̂[0, (0, j); ē], Π̂[1, (1, j); ē])
≥ h²(Π̂[0, (0, j); ē], Π̂[1, (0, j); ē]) + h²(Π̂[0, (1, j); ē], Π̂[1, (1, j); ē])
≥ h²(Π̂[0, (1, j); ē], Π̂[1, (1, j); ē])
= h²(Π_i(ē_{i,j}; ē), Π_i(ē_j; ē)).

Proof of Lemma 19. We make the following observations. First, given any transcript τ of Π and a player i, we can divide τ into three parts: τ_{i←}, the part of the transcript where the co-ordinator sends messages to player i; τ_{i→}, the part where player i sends messages to the co-ordinator; and τ_{−i}, where the other players and the co-ordinator exchange messages.

Second, the following is easy to see. When the input of player i is fixed, say to (01), as in the case of the distribution we are interested in for Equation (37),

Pr_{R,X}[Π = τ] = Pr_{R_i}[Π_{i→} = τ_{i→} | X_i = (01), Π_{i←} = τ_{i←}] · Pr_{X_{−i},R_{−i}}[Π_{i←} = τ_{i←}, Π_{−i} = τ_{−i} | Π_{i→} = τ_{i→}].  (39)

Similarly,

Pr_{R,X}[Π_i = τ_i] = Pr_{R_i}[Π_{i→} = τ_{i→} | X_i = (01), Π_{i←} = τ_{i←}] · Pr_{R_{−i},X_{−i}}[Π_{i←} = τ_{i←} | Π_{i→} = τ_{i→}].  (40)

Consider the Bhattacharyya coefficient between Π_i[0, m, z; ē] and Π_i[1, m, z; ē], computed from (40): the square-root factor depends only on τ_i, and the second factor is common to both distributions and expands as Σ_{τ : τ|_i = τ_i} Pr_{R_{−i},X_{−i}}[Π_{i←} = τ_{i←}, Π_{−i} = τ_{−i} | Π_{i→} = τ_{i→}]. Summing over all extensions τ of each τ_i and using (39), this coefficient equals, term by term, the Bhattacharyya coefficient between Π[i, 0, m, z; ē] and Π[i, 1, m, z; ē]. Hence the two Hellinger distances in Equation (37) coincide. Equation (38) can be proved in a similar way.

4.2.2 Information complexity of Disj_2

Now we are ready to prove that the partial information cost of Disj_2 is Ω(k). We consider processor i and fix a value j ≠ i.
Claim 20 ([BEO+13]).

(1) I[M_{−W} : Π_i | X_i, Z = j, W = 2] ≥ (2/3) h²(Π_i[1, 0, j; ē], Π_i[1, 1, j; ē]),  (41)

(2) I[X_i^{−W} : Π_i | M, Z = j, W = 2] ≥ (2/3) h²(Π_i[0, 0, j; ē], Π_i[1, 0, j; ē]),  (42)

(3) I[M_{−W} : Π_i | X_i, Z = j, W = 1] ≥ (2/3) h²(Π_i[ē; 1, 0, j], Π_i[ē; 1, 1, j]),  (43)

(4) I[X_i^{−W} : Π_i | M, Z = j, W = 1] ≥ (2/3) h²(Π_i[ē; 0, 0, j], Π_i[ē; 1, 0, j]).  (44)

Proof sketch. Note that M takes value 0 with probability 2/3 and value 1 with probability 1/3. This makes M_{−W} a completely unbiased random variable given X_i = (11) and Z = j, which lets us make the following assertion from the property of Hellinger distance (cf. Fact 14):

I[M_{−W} : Π_i | X_i = (11), Z = j, W = 2] ≥ h²(Π_i[1, 0, j; ē], Π_i[1, 1, j; ē]).  (45)

Also, given M = 0 and Z = j, the input X_i, for any i ≠ j, takes the values (01) and (11) uniformly. This lets us conclude, using Fact 14,

I[X_i^{−W} : Π_i | M = 0, Z = j, W = 2] ≥ h²(Π_i[1, 0, j; ē], Π_i[0, 0, j; ē]).  (46)

Now, it can be checked that Pr[X_i = (11) | Z = j, W = 2] = 2/3 and Pr[M = 0 | Z = j, W = 2] = 2/3. Combining these facts with Equations (45) and (46) proves the first two cases; the other two cases are proved similarly.

Using Claim 20 together with the triangle inequality and Cauchy-Schwarz (which give the weak triangle inequality h²(P, R) ≤ 2h²(P, Q) + 2h²(Q, R)), we can write

I[M_{−W} : Π_i | X_i, Z = j, W] + I[X_i^{−W} : Π_i | M, Z = j, W]
≥ (1/6) [ h²(Π_i[0, 0, j; ē], Π_i[1, 1, j; ē]) + h²(Π_i[ē; 0, 0, j], Π_i[ē; 1, 1, j]) ].

Using Lemma 18, this is

≥ (1/12) [ h²(Π_i(ē_{i,j}; ē), Π_i(ē_j; ē)) + h²(Π_i(ē; ē_{i,j}), Π_i(ē; ē_j)) ].

Using Lemma 19 and averaging over Z = j, we get

Σ_{i∈[k]} ( I[M_{−W} : Π_i | X_i, Z, W] + I[X_i^{−W} : Π_i | M, Z, W] )
≥ (1/(12k)) Σ_i Σ_{j : j≠i} [ h²(Π(ē_{i,j}; ē), Π(ē_j; ē)) + h²(Π(ē; ē_{i,j}), Π(ē; ē_j)) ].

By recounting the double summation, this equals

(1/(24k)) Σ_{i≠j} [ h²(Π(ē_{i,j}; ē), Π(ē_j; ē)) + h²(Π(ē_{i,j}; ē), Π(ē_i; ē)) + h²(Π(ē; ē_{i,j}), Π(ē; ē_j)) + h²(Π(ē; ē_{i,j}), Π(ē; ē_i)) ]
≥ (1/(48k)) Σ_{i≠j} [ h²(Π(ē_i; ē), Π(ē_j; ē)) + h²(Π(ē; ē_i), Π(ē; ē_j)) ]   [weak triangle inequality]
= (1/(48k)) Σ_{i≠j} [ h²(Π(ē; ē), Π(ē_{i,j}; ē)) + h²(Π(ē; ē), Π(ē; ē_{i,j})) ]   [Lemma 16]
≥ (1/(96k)) Σ_{i≠j} h²(Π(ē; ē_{i,j}), Π(ē_{i,j}; ē))   [weak triangle inequality]
≥ (1/(192k)) Σ_{i≠j} [ h²(Π(ē; ē_{i,j}), Π(ē_j; ē_i)) + h²(Π(ē_{i,j}; ē), Π(ē_i; ē_j)) ]   [Lemma 17]
≥ (1/(192k)) · k(k−1) · 2(1 − 2√δ)   [Fact 15]
= ((k−1)/96)(1 − 2√δ) = Ω(k).  (47)

We can apply Fact 15 in the last step because (ē; ē_{i,j}) and (ē_{i,j}; ē) give output 1 in Disj_2, while (ē_j; ē_i) and (ē_i; ē_j) give output 0. This proves Theorem 12.

5 Information complexity lower bounds communication complexity

In this section we show that information complexity is the right measure to lower bound, by showing that the randomized communication complexity of Tribes_{m,ℓ} is lower bounded by its switched information complexity.

Theorem 21. For any distribution µ over the inputs,

R_ǫ(Tribes_{m,ℓ}) = Ω(IC_µ(Tribes_{m,ℓ})).  (48)

Proof. Assume that the random variables (X̄, Z̄) are distributed according to µ and that the marginal distribution of X̄ is ν. Note that I_µ[X : Y | Z] ≤ H_µ(X | Z) ≤ H_ν(X). Now consider any ǫ-error protocol Π for Tribes_{m,ℓ}. We can write

I[X̄_i : Π_i(X̄) | M̄, Z̄] ≤ H(Π_i | M̄, Z̄) ≤ H(Π_i),  (49)

and

I[M̄ : Π_i(X̄) | X̄_i, Z̄] ≤ H(Π_i | X̄_i, Z̄) ≤ H(Π_i).  (50)

Now, H(Π_i) is trivially upper bounded by the largest length of Π_i (note that Π_i is a function of the random variable X̄), and thus Equation (49) can be upper bounded by the largest length of Π_i.
But for each player, the largest size of his view of the transcript can occur for a different X. Hence we cannot directly upper bound the switched information complexity by |Π|. Instead, we use the following fact from information theory.

Lemma 22 (Theorem 5.3.1 in [CT06]). The expected length L of any instantaneous q-ary code for a random variable X satisfies

L ≥ H(X) / log q.  (51)

We can make the transcript instantaneous (i.e., prefix-free) by introducing a special delimiter symbol; this keeps the alphabet size constant. Hence we can write

I[X̄_i : Π_i(X̄) | M̄, Z̄] ≤ H(Π_i | M̄, Z̄) ≤ H(Π_i) ≤ log 3 · E(|Π_i|),  (52)

and

I[M̄ : Π_i(X̄) | X̄_i, Z̄] ≤ H(Π_i | X̄_i, Z̄) ≤ H(Π_i) ≤ log 3 · E(|Π_i|).  (53)

Now we are in good shape, and we complete the proof with the following set of inequalities:

Σ_{i∈[k]} ( I[X̄_i : Π_i(X̄) | M̄, Z̄] + I[M̄ : Π_i(X̄) | X̄_i, Z̄] ) ≤ 2 log 3 · Σ_{i∈[k]} E(|Π_i|) = 2 log 3 · E( Σ_{i∈[k]} |Π_i| )   [linearity of expectation] = 2 log 3 · E(|Π|) = O( max_x |Π(x)| ).  (54)

The worst-case transcript size is upper bounded by the randomized communication complexity of Tribes_{m,ℓ}, which proves the theorem.

Combining Theorems 21, 6, 9 and 12, we can prove Theorem 1.
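To see the delimiter trick of Lemma 22 in action, here is a toy check (our own illustration): terminating each binary transcript with a third symbol '#' makes the code instantaneous, i.e., no codeword is a prefix of another, and the expected length then dominates H(X)/log 3 as in Equation (51).

```python
import math

# A toy distribution over binary transcripts; '10' is a prefix of '101',
# so the raw code is not instantaneous.
transcripts = {'0': 0.5, '10': 0.25, '101': 0.25}
# Appending a delimiter makes the code prefix-free over the alphabet {0, 1, '#'}.
code = {t: t + '#' for t in transcripts}
words = list(code.values())
assert not any(a != b and b.startswith(a) for a in words for b in words)

H = -sum(p * math.log2(p) for p in transcripts.values())    # entropy in bits
L = sum(p * len(code[t]) for t, p in transcripts.items())   # expected length
assert L >= H / math.log2(3)                                # Lemma 22 with q = 3
```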
References

[BBFM12] Maria-Florina Balcan, Avrim Blum, Shai Fine, and Yishay Mansour. Distributed learning, communication complexity and privacy. In COLT 2012 - The 25th Annual Conference on Learning Theory, pages 26.1-26.22. JMLR.org, 2012.

[BEO+13] Mark Braverman, Faith Ellen, Rotem Oshman, Toniann Pitassi, and Vinod Vaikuntanathan. A tight bound for set disjointness in the message-passing model. In FOCS, pages 668-677. IEEE Computer Society, 2013.

[BJKS02] Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. In FOCS, pages 209-218. IEEE Computer Society, 2002.

[BKS13] Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. In PODS 2013, pages 273-284. ACM, 2013.

[CP10] Arkadev Chattopadhyay and Toniann Pitassi. The story of set disjointness. SIGACT News, 41(3):59-85, 2010.

[CT06] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory (2nd ed.). Wiley, 2006.

[DF92] Danny Dolev and Tomás Feder. Determinism vs. nondeterminism in multiparty communication complexity. SIAM J. Comput., 21(5):889-895, 1992.

[DKO12] Andrew Drucker, Fabian Kuhn, and Rotem Oshman. The communication complexity of distributed task allocation. In PODC 2012, pages 67-76. ACM, 2012.

[DR98] Pavol Duris and José D. P. Rolim. Lower bounds on the multiparty communication complexity. J. Comput. Syst. Sci., 56(1):90-95, 1998.

[GSZ11] Michael T. Goodrich, Nodari Sitchinava, and Qin Zhang. Sorting, searching, and simulation in the mapreduce framework. In ISAAC 2011, volume 7074 of Lecture Notes in Computer Science, pages 374-383. Springer, 2011.

[JKS03] T. S. Jayram, Ravi Kumar, and D. Sivakumar. Two applications of information complexity. In STOC, pages 673-682. ACM, 2003.

[KN97] Eyal Kushilevitz and Noam Nisan. Communication Complexity. Cambridge University Press, 1997.

[KS11] Paraschos Koutris and Dan Suciu. Parallel evaluation of conjunctive queries. In PODS 2011, pages 223-234. ACM, 2011.

[KSV10] Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. A model of computation for mapreduce. In SODA 2010, pages 938-948. SIAM, 2010.

[Lin91] Jianhua Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145-151, 1991.

[PVZ12] Jeff M. Phillips, Elad Verbin, and Qin Zhang. Lower bounds for number-in-hand multiparty communication complexity, made easy. In SODA 2012, pages 486-501. SIAM, 2012.

[WZ12] David P. Woodruff and Qin Zhang. Tight bounds for distributed functional monitoring. In STOC 2012, pages 941-960. ACM, 2012.

[WZ13] David P. Woodruff and Qin Zhang. When distributed computation is communication expensive. In DISC 2013, volume 8205 of Lecture Notes in Computer Science, pages 16-30. Springer, 2013.

[WZ14] David P. Woodruff and Qin Zhang. An optimal lower bound for distinct elements in the message passing model. In SODA 2014, pages 718-733. SIAM, 2014.

[Yao79] Andrew Chi-Chih Yao. Some complexity questions related to distributive computing (preliminary report). In STOC, pages 209-213. ACM, 1979.