Adaptive Majority Problems for Restricted Query Graphs and for Weighted Sets

Abstract

Suppose that the vertices of a graph $G$ are colored with two colors in an unknown way. The color that occurs on more than half of the vertices is called the majority color (if it exists), and any vertex of this color is called a majority vertex. We study the problem of finding a majority vertex (or showing that none exists) if we can query edges to learn whether their endpoints have the same or different colors. Denote the least number of queries needed in the worst case by $m(G)$. It was shown by Saks and Werman that $m(K_n) = n - b(n)$, where $b(n)$ is the number of 1's in the binary representation of $n$. In this paper, we initiate the study of the problem for general graphs. The obvious bounds for a connected graph $G$ on $n$ vertices are $n - b(n) \le m(G) \le n - 1$. We show that for any tree $T$ on an even number of vertices we have $m(T) = n - 1$ and that for any tree $T$ on an odd number of vertices, we have $n - 65 \le m(T) \le n - 2$. Our proof uses results about the weighted version of the problem for $K_n$, which may be of independent interest. We also exhibit a sequence $G_n$ of graphs with $m(G_n) = n - b(n)$ such that $G_n$ has $O(n b(n))$ edges and $n$ vertices.


arXiv [math.CO]

Gábor Damásdi (c), Dániel Gerbner (a), Gyula O.H. Katona (a), Balázs Keszegh (a,c), Dániel Lenger (c), Abhishek Methuku (b), Dániel T. Nagy (a), Dömötör Pálvölgyi (c), Balázs Patkós (a,d), Máté Vizer (a), Gábor Wiener (e)

(a) Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences
(b) École Polytechnique Fédérale de Lausanne
(c) MTA-ELTE Lendület Combinatorial Geometry Research Group, Eötvös Loránd University
(d) Lab. of Combinatorial and Geometric Structures, Moscow Inst. of Physics and Technology
(e) Dept. of Computer Science and Information Theory, Budapest Univ. of Technology and Economics

May 12, 2020

∗ Research supported by the Lendület program of the Hungarian Academy of Sciences (MTA), under grant number LP2017-19/2017, the János Bolyai Research Fellowship of the Hungarian Academy of Sciences, the National Research, Development and Innovation Office – NKFIH under the grants K 116769, K 124171, K 132696, SNN 129364, KH 130371 and FK 132060, the National Research, Development and Innovation Fund (TUDFO/51757/2019-ITM, Thematic Excellence Program), by the European Union, co-financed by the European Social Fund, by the grant of Russian Government N 075-15-2019-1926, by the New National Excellence Program under the grant number ÚNKP-19-4-BME-287 and by the BME-Artificial Intelligence FIKP grant of EMMI (BME FIKP-MI/SC).

Given a set $X$ of $n$ balls and an unknown coloring of $X$ with a fixed set of colors, we say that a ball $x \in X$ is a majority ball if its color class contains more than $|X|/2$ balls. The majority problem is to find a majority ball (or show that none exists). In the basic model of majority problems, one is allowed to ask queries of pairs $(x, y)$ of balls in $X$, to which the answer tells whether the colors of $x$ and $y$ are the same or not, which we denote by SAME and DIFF, respectively. The answers are given by an adversary whose goal is to force us to use as many questions as possible. It is an easy exercise to see that if the number of colors is two, then in a non-adaptive search (all queries must be asked at once) the minimum number of queries to solve the majority problem is $n - 1$, unless $n$ is odd, in which case $n - 2$ queries suffice. In the adaptive setting, it is known that $\lceil 3n/2 \rceil - 2$ queries are necessary and sufficient if the number of colors is not restricted, while for two colors Saks and Werman proved that the minimum number of queries needed is $n - b(n)$, where $b(n)$ is the number of 1's in the binary form of $n$ (we note that there are simpler proofs of this result, see [1, 14, 17]). There are several other generalizations of the problem, which include more colors [2, 4, 10, 12], larger queries [3, 4, 6, 8, 11, 12, 13], non-adaptive [1, 5, 10] and weighted versions [10].

In the present paper we study the adaptive majority problem for two colors when we restrict the set of pairs that can be queried to the edges of some graph $G$ on $n$ vertices. The original majority problem, where we can ask any pair, corresponds to $G = K_n$. To distinguish between the version when we are restricted to the edges of a graph, and the original, unrestricted version, we call the colored objects vertices and balls, respectively. To the best of our knowledge, the only similar result is [7], where it was shown that if the sizes of the two color classes differ only by a constant, then $\Omega(n)$ queries might be needed on any graph even for a randomized algorithm to solve the majority problem, but if the sizes differ by $\Omega(n)$, then randomized algorithms can do better for several graph classes. In this paper, however, we only deal with the worst-case performance of deterministic algorithms, which allows us to obtain better bounds.

Notice that it is possible to solve the majority problem (with any number of queries) if and only if $G$ is connected when $n$ is even, and if and only if $G$ has at most two components when $n$ is odd. For any such graph, denote the minimum number of queries needed to solve the majority problem in the worst case by $m(G)$. Obviously we have $n - b(n) = m(K_n) \le m(G) \le n - 1$ (and $m(G) \le n - 2$ if $n$ is odd). Our main results are the following.

Theorem 1.1.

For every tree $T$ on an even number $n$ of vertices $m(T) = n - 1$, and for every tree $T$ on an odd number $n$ of vertices $m(T) \ge n - 65$.

The constant 65 is probably far from optimal. We will see trees $T$ on $n$ vertices with $m(T) = n - 3$, but it is possible that $m(T) \ge n - 3$ holds for every tree. We can prove a better lower bound, $n - 6$, for paths.

We also study the least number of edges a graph must have if we can solve the majority problem as fast as in the unrestricted case, i.e., when $m(G) = n - b(n)$.

Theorem 1.2.

For every $n$, there is a graph $G$ with $n$ vertices and $n(1 + b(n))$ edges such that $m(G) = n - b(n)$.

It would be interesting to determine whether this bound can be improved to $O(n)$, or to show a superlinear lower bound.

The proof of Theorem 1.1 uses a weighted version of the original (i.e., the $G = K_n$ case of the) majority problem, which is defined in the next section. We think these results are interesting on their own.

In the following, we always suppose that only two colors are used, which we call red and blue. When both colors contain the same number of balls, we call the coloring balanced.

Now we define a variant of the majority problem, where the balls are given different weights. More precisely, given $k$ balls with non-negative integer weights $w_1, \dots, w_k$, a ball is a (weighted) majority ball if the weight of its color class is more than $\sum_{i=1}^{k} w_i / 2$. The (weighted) majority problem is to find a majority ball (or show that none exists). We will often identify a ball with its weight or its index and talk about a ball with weight $w_i$, or the ball $w_i$, or the ball $i$.

Note that during the running of an adaptive algorithm solving the non-weighted majority problem, at any point the information obtained so far can be represented by a graph whose vertices are all balls, and the queries asked are edges labeled with DIFF or SAME. Since now we study the majority problem only for two colors, we can deduce from the labels of the edges the color partition inside every component. Denote the difference between the sizes of the color classes in each component by $w_i$. Finishing the algorithm from a given state is equivalent to solving the majority problem with the weights $w_i$. (This is explained in more detail in Section 3.) Similarly, in the weighted version when we compare a ball with weight $w_i$ and a ball with weight $w_j$, we can consider the answer as merging the two balls into a ball with weight $w_i + w_j$ or $|w_i - w_j|$, depending on the answer. We will say that the new ball contains the two previous balls.

A set of $k$ balls with given weights $w_1, \dots, w_k$ can be represented by a vector $w = (w_1, \dots, w_k)$. We denote the number of queries needed to solve the weighted majority problem in the worst case by $m(w)$. So, with this notation, the result of Saks and Werman for the non-weighted problem can be written as $m(1, \dots, 1) = k - b(k)$. Note that $m(w) \le k - 1$, and if $\sum_{i=1}^{k} w_i$ is odd, then $m(w) \le k - 2$ for $k \ge 2$. Let $\mu(k)$ denote the largest $l$ such that $2^l$ divides $k$ (and define $\mu(0) = \infty$). For $w = (w_1, \dots, w_k)$ denote by $p$ the number of balanced colorings and by $p_i$ the number of (non-balanced) colorings such that $w_i$ is in the majority class.

Proposition 2.1. (i) $m(w) \ge k - \mu(p)$. (ii) $m(w) \ge k - 1 - \mu(p_i)$ for every $i \le k$.

It was also shown in [10] that $m(w) = k - \mu(p)$ for $\mu(p) \le 2$, but not in general, e.g., for w = { , , , , , , } we have 8 balanced colorings, but $m(w) = 5 > 7 - \mu(8)$.

Our main results about the weighted majority problem are exact bounds for some special $w$. They are based on the following lemma.

Lemma 2.2.

Let $w = (w_1, \dots, w_k)$ and $k > 2^n$. (i) If $w_1 = \dots = w_{2^n} = 1$ and $\sum_{i=1}^{k} w_i = 2^{n+1}$, then $m(w) = k - 1$. (ii) If $w_1 = \dots = w_{2^n} = 1$, $\sum_{i=1}^{k} w_i = 2^{n+1} + 1$ and $k \ne 2^n + 1$, then $m(w) = k - 2$.

Proof. For both statements we use Proposition 2.1. Let us start with (i). We are going to calculate $\mu(p)$. First color only the balls whose index is from $\{2^n + 1, \dots, k\}$. Denote by $B$ and $R$, respectively, the blue and red balls whose index is from $\{2^n + 1, \dots, k\}$. Let $x := |\sum_{i \in B} w_i - \sum_{i \in R} w_i|$. Note that $x$ is an even integer, and for a fixed $x$ there are $2\binom{2^n}{2^{n-1} - x/2}$ ways in which we can color $\{1, \dots, 2^n\}$ to make the coloring balanced (note that this value is obtained by considering both the cases $\sum_{i \in B} w_i - \sum_{i \in R} w_i \le 0$ and $\sum_{i \in B} w_i - \sum_{i \in R} w_i \ge 0$). Observe that $2\binom{2^n}{2^{n-1} - x/2}$ is divisible by 4 except for the case $x = 2^n$, when this number is exactly 2. This means that the number of balanced colorings is 2 mod 4. Thus Proposition 2.1 (i) gives $m(w) \ge k - 1$, and since $m(w) \le k - 1$ always holds, we have equality.

The proof of (ii) goes similarly. We are going to calculate $\mu(p_k)$. We define $B$ and $R$ in the same way. Without loss of generality we can assume that ball $k$ is blue and let $y = \sum_{i \in B} w_i - \sum_{i \in R} w_i$. Then the number of colorings of the first $2^n$ balls such that blue is the majority color is $\sum_{i = 2^{n-1} - \lfloor y/2 \rfloor}^{2^n} \binom{2^n}{i}$. If $y < 2^n$, each term here is divisible by 2, except the last one, while if $y > 2^n$, the sum ranges over all $i$ and hence is divisible by 2. There are $2^{k-1-2^n}$ ways to color the balls from $\{2^n + 1, \dots, k - 1\}$, out of which $y > 2^n$ holds only once. This means that when ball $k$ is blue, the number of colorings (of all the balls) where blue is the majority color is odd. Of course, the same is true when ball $k$ is red. Thus Proposition 2.1 (ii) gives $m(w) \ge k - 2$, and since $m(w) \le k - 2$ when $\sum_{i=1}^{k} w_i$ is odd, we have equality.

(Beware that in [10] a slightly different notation was used for Proposition 2.1: there $p$ denoted the number of balanced 2-partitions, which is half of the number of balanced colorings, and part (ii) of Proposition 2.1 was not explicitly stated.)

Note that in the above proof we do not need at all that the first few weights are 1; we only need that they are the same and that their number is a power of two. We also do not need in (i) that the sum of all the weights is exactly $2^{n+1}$, only that $x = \sum_{i \in B} w_i - \sum_{i \in R} w_i = 2^n$ can happen in an odd number of ways. We also do not need in (ii) that the sum of all the weights is exactly $2^{n+1} + 1$, only that $-2^n < y = \sum_{i \in B} w_i - \sum_{i \in R} w_i \le 2^n$ can happen in an odd number of ways (with the fixed ball $k$ always blue). A sufficient condition for this is that $y > -2^n$ always holds, while $y \le 2^n$ holds except when all balls are blue, i.e., we need $w_k - \sum_{i=2^n+1}^{k-1} w_i > -2^n$ and $\sum_{i=2^n+1}^{k} w_i - w_j \le 2^n + w_j$ for any $2^n < j < k$. To summarize, this proves the following.

Lemma 2.3.

Let $w = (w_1, \dots, w_k)$ and $k > 2^n + 1$. (i) If $w_1 = \dots = w_{2^n}$ and there are an odd number of partitions $R \cup B = \{2^n + 1, \dots, k\}$ such that $\sum_{i \in B} w_i - \sum_{i \in R} w_i = 2^n w_{2^n}$, then $m(w) = k - 1$. (ii) If $w_1 = \dots = w_{2^n}$ and $\sum_{i=1}^{k} w_i \le 2^{n+1} w_{2^n} + 2w_j$ for any $2^n < j \le k$, with the inequality being strict for $j = k$, then $m(w) \ge k - 2$.

These imply, for example, that m(3, , , , 9) = 4 and m(3, , , , ) ≥ 3.

Corollary 2.4. (i) If $w_1 = \dots = w_{2^n + 2s} = 1$ and $\sum_{i=1}^{k} w_i = 2^{n+1} + 2s$, then $m(w) \ge k - 1 - s$. (ii) If $w_1 = \dots = w_{2^n + 2s} = 1$, $\sum_{i=1}^{k} w_i = 2^{n+1} + 2s + 1$ and $w_k \ne 2^n + 1$, then $m(w) \ge k - 2 - s$.

Proof. We only prove (i); the proof of (ii) goes the same way. First, we prove the weaker statement that $m(w) \ge k - 1 - 2s$. We reveal $2s$ balls of weight 1 such that half of them are red and half of them are blue, and then apply Lemma 2.2 to the remaining balls.

For the bound $m(w) \ge k - 1 - s$ we need one more trick. We run the adversarial algorithm that gives the lower bound in Lemma 2.2, until the first ball of weight 1 is queried. When this happens, we reveal that it is red, and we reveal that another ball of weight 1 is blue. We do this $s$ times, and after that proceed according to the adversarial algorithm.

Call a vector $w = (w_1, \dots, w_k)$ hard if $m(w) = k - 1$ and $\sum_{i=1}^{k} w_i$ is even, or $\sum_{i=1}^{k} w_i$ is odd and $m(w) = k - 2$. Thus Lemma 2.2 states that the vectors satisfying its conditions are hard.
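As a sanity check of the notions above, the following Python sketch computes $m(w)$ exactly for small weight vectors by exhaustive search over the query game, together with the best lower bound obtainable from Proposition 2.1. It is only an illustration under stated assumptions: the function names (`mu`, `counting_lower_bound`, `m`, `is_hard`) are ours, and the search treats a state as solved when either all weights are zero or the heaviest ball outweighs all the others combined, which is how we read the definition of the weighted majority problem. Both routines take exponential time and are meant for tiny inputs only.

```python
from functools import lru_cache
from itertools import product


def mu(p):
    """Largest l such that 2**l divides p; mu(0) is treated as infinity."""
    if p == 0:
        return float("inf")
    l = 0
    while p % 2 == 0:
        p //= 2
        l += 1
    return l


def counting_lower_bound(w):
    """Best bound given by Proposition 2.1, enumerating all 2**k colorings."""
    k, total = len(w), sum(w)
    balanced = 0               # p: number of balanced colorings
    in_majority = [0] * k      # p_i: colorings with ball i in the majority class
    for colors in product((0, 1), repeat=k):
        blue = sum(wi for wi, c in zip(w, colors) if c == 1)
        if 2 * blue == total:
            balanced += 1
            continue
        majority = 1 if 2 * blue > total else 0
        for i, c in enumerate(colors):
            if c == majority:
                in_majority[i] += 1
    bounds = [k - mu(balanced)] + [k - 1 - mu(p_i) for p_i in in_majority]
    return max(0, max(bounds))


@lru_cache(maxsize=None)
def _solve(state):
    """Worst-case number of queries for a sorted tuple of positive weights."""
    if not state or 2 * state[-1] > sum(state):
        return 0               # no majority possible, or the heaviest ball always wins
    best = len(state)          # k - 1 queries always suffice, so this is a safe start
    for i in range(len(state)):
        for j in range(i + 1, len(state)):
            rest = state[:i] + state[i + 1:j] + state[j + 1:]
            worst = 0          # the adversary picks the worse of SAME / DIFF
            for merged in (state[i] + state[j], abs(state[i] - state[j])):
                nxt = tuple(sorted(rest + ((merged,) if merged else ())))
                worst = max(worst, _solve(nxt))
            best = min(best, 1 + worst)
    return best


def m(w):
    """m(w) by brute force; zero-weight balls never need to be queried."""
    return _solve(tuple(sorted(x for x in w if x > 0)))


def is_hard(w):
    """Hardness as defined above (assumes all weights are positive)."""
    return m(w) == len(w) - (1 if sum(w) % 2 == 0 else 2)
```

For instance, `m((1,) * k)` can be compared with $k - b(k)$ for small $k$, and `is_hard((1, 1, 2))` checks the instance $n = 1$, $k = 3$ of Lemma 2.2 (i).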

Observation 2.5.

Let $w = (w_1, w_2, \dots, w_k)$ be a hard vector with $w_1 = w_2$ and let $w' = (2w_1, w_3, w_4, \dots, w_k)$. Then $w'$ is also hard.

Proof. If $k = 2$, then $w'$ is hard by definition. Let us assume that $k \ge 3$ and $w'$ is not hard. We will show that $w$ is not hard either. We ask $w_1$ and $w_2$ in the first query. If the answer is DIFF, the two balls merge into a ball of weight zero, so we can obviously finish with fewer than $k - 1$ queries in total if $\sum_{i=1}^{k} w_i$ is even and with fewer than $k - 2$ queries in total if $\sum_{i=1}^{k} w_i$ is odd, thus $w$ cannot be hard. If the answer is SAME, we apply our algorithm for $w'$ to reach the same conclusion, using that $w'$ is not hard.

Question 2.6.

Does the reverse direction also hold in the above observation?

Also, the respective statement might hold when $m(w)$ is smaller, but we do not know of other, generally applicable sufficient conditions.

Combining Observation 2.5 with Lemma 2.2, we obtain the following statement.

Lemma 2.7. If $w_1, \dots, w_j$ are each powers of two and (i) $\sum_{i=1}^{j} w_i = 2^n$ and $\sum_{i=1}^{k} w_i = 2^{n+1}$, then $w$ is hard, i.e., $m(w) = k - 1$. (ii) $\sum_{i=1}^{j} w_i = 2^n$, $\sum_{i=1}^{k} w_i = 2^{n+1} + 1$ and $k > 2^n + 1$, then $w$ is hard, i.e., $m(w) = k - 2$.

Note that (i) of Lemma 2.2 states that if the sum of the weights in $w$ is a power of two, and at least half of that weight is given by balls of weight 1, then $w$ is hard. Lemma 2.7 shows that weight 1 can be replaced by any weights that are powers of two, and (ii) of Lemma 2.2 can be similarly improved. If Observation 2.5 held for non-hard vectors as well, then we could obtain an improvement of Lemma 2.7, similar to Corollary 2.4. Instead, we state the following weaker statement.

Corollary 2.8. If $w_1, \dots, w_j$ are each powers of two and $\sum_{i=1}^{j} w_i = 2^n$, $w_{j+1} = 1$, $\sum_{i=1}^{k} w_i = 2^{n+1} + 3$ and $k > 2^n + 2$, then $m(w) \ge k - 3$.

Proof. If $w_i = 1$ for every $i > j$, then we know that $m(w) = k - 2$. Otherwise, before the start of the algorithm, we can reveal that ball $j + 1$ and the heaviest ball with index $> j + 1$ have different colors, reducing the problem to (ii) of Lemma 2.7.

Combining Lemma 2.3 and Corollary 2.8 we obtain the following.

Proposition 2.9. If $1 \le w_1, \dots, w_j \le 2$ and $\sum_{i=1}^{j} w_i = 2^n$, $\sum_{i=1}^{k} w_i = 2^{n+1} + 3$ and $k > 2^n + 2$, then $m(w) \ge k - 3$.

Proof. If there is some $i > j$ such that $w_i = 1$, we are done using Corollary 2.8. Otherwise, for all $j < i \le k$ we have $w_i \ge 2$, and we can apply Lemma 2.3.

The next subsection, contrary to its title, is not so relevant to our proof, but it helps to understand better what can happen before the final steps of an optimal algorithm that solves the majority problem.

2.1 Relevant balls

Given $w = (w_1, \dots, w_k)$, we say that $w_i$ is relevant if there is a coloring of the other balls such that the color of $w_i$ changes what the majority color is, or whether a majority color exists. In other words, there is a coloring of the other balls such that either $w_i$ being red means red is the majority and $w_i$ being blue means blue is the majority, or one color of $w_i$ means there is no majority and the other color means there is a majority. (In [10] the property that every ball is relevant was called non-slavery.) In this subsection we prove some simple facts about relevant balls. We start with some simple observations.

Proposition 2.10.
(i) If $m(w) = 0$, then either there is no relevant ball and the answer is that there is no majority, or there is one relevant ball and that is the majority ball.
(ii) If we obtain $w'$ from $w$ by any answer to a query $(a, b)$ such that $b$ is non-relevant, then $m(w') = m(w)$.
(iii) If we increase the weight of a relevant ball, it remains relevant. In other words, if we obtain $w'$ from $w$ by replacing one ball $w_i$ with $w'_i$ such that $w'_i > w_i$, and $w_i$ is relevant in $w$, then $w'_i$ is relevant in $w'$.
(iv) For any $w$ there is a threshold $t \ge 0$ such that $w_i$ is relevant if and only if $w_i > t$.
(v) If a ball $x$ is relevant before a query $Q$ not containing $x$, there is at least one answer to $Q$ such that $x$ is still relevant afterwards. If a ball $x$ is relevant before a query $(x, y)$, then after the answer SAME the resulting ball with weight $x + y$ is relevant.
(vi) For any query $(a, b)$ there is an answer such that the number of relevant balls decreases by at most two.

Proof. To prove (i), observe that if we color all the balls blue, there is a majority, unless all the balls have zero weight, in which case there is no relevant ball. Thus if there is no majority in any coloring of $w$, then there cannot be relevant balls. If there is a majority ball $a$, then it has to be the only relevant ball. Indeed, if $b$ is relevant, then changing the color of $b$ must change the answer, unless $b$ was the answer.

To prove (ii), let $c$ be the ball that the answer to the query $(a, b)$ gives, thus it has weight either $a + b$ or $|a - b|$. Any algorithm that gives a solution for $w$ gives a solution for $w'$: each query involving a ball that contains $a$ is replaced by the corresponding query involving $c$, and queries that involve $b$ are ignored. A similar argument shows that a solution for $w'$ gives a solution for $w$. Therefore, $m(w') = m(w)$.

To prove (iii), assume for a contradiction that $w'_i$ is not relevant and take a coloring of the other balls that shows this. But then taking the same coloring for the other balls also shows that $w_i$ is not relevant, a contradiction.

To prove (iv), observe first that it is equivalent to the statement that a non-relevant ball cannot have weight larger than or equal to the weight of a relevant ball. Indeed, this is obviously implied by (iv), and if this holds, then $t$ can be chosen as the largest weight of a non-relevant ball. Now assume $a$ is relevant and $w(b) \ge w(a)$. Then consider the coloring that shows $a$ is relevant, and exchange the color of $b$ and the color of $a$. This coloring clearly shows that $b$ must also be relevant.

To prove (v), assume $x$ is not in the query and consider a coloring of the other balls such that the color of $x$ decides the majority. In that coloring the balls in the query have different or same color; answer accordingly. Then the same coloring shows $x$ is still relevant.

Assume now $x$ is in the query $(x, y)$ and the answer is SAME. Consider a coloring of the other balls such that the color of $x$ decides the majority. Taking the same coloring, the new ball $x + y$ will decide the majority.

To prove (vi), consider the relevant ball $x$ not in the query with the smallest weight. By (v) there is an answer such that $x$ remains relevant. As other relevant balls have larger weight, they also remain relevant, except for $a$ and $b$ (whose total weight can go below the weight of $x$).

We remark that (vi) of the above proposition gives a new proof of a proposition from [10], which states that if all the $n$ balls are relevant, we need at least $\lfloor n/2 \rfloor$ queries.

Proposition 2.11.

Before the last query of an optimal algorithm, there are either two relevant balls, they are of equal weight and there are no other balls with non-zero weight, or there are three relevant balls, and any query that compares two of them finishes the algorithm.

Proof. Proposition 2.10 implies that there are at most three relevant balls. Observe that if there are two relevant balls $a$ and $b$ in $w$, they must have the same weight. Indeed, if $w(a) > w(b)$ and the total weight $m$ of the other balls is smaller than $w(a) - w(b)$, then $a$ is the only relevant ball. If $m \ge w(a) - w(b)$, then color $b$ red, $a$ blue and go through the other balls in increasing order of their weight, without the last ball $c$. We give each of them the color which has the smaller weight at that point. The first ball gets the color red, but as $m \ge w(a) - w(b)$, at one point the total weight of red balls becomes at least the total weight of blue balls. From that point, the difference between the classes is at most the weight of the current ball, which is at most $w(c)$. This coloring shows $c$ is relevant.

It is left to show that if there are exactly three relevant balls, $a$, $b$ and $c$, querying any two of them (say $a$ and $b$) finishes the algorithm. Let $m$ be the sum of the weights of the other balls. If $m = 0$, we are done unless both $a + b$ and $|a - b|$ are equal to $c$, which means $b = 0$, but a ball with zero weight cannot be relevant. Thus we can assume $m > 0$. We have $a \le b + c + m$, otherwise we are done (which contradicts our assumption that we are before the last query). But we also have $a + m \le b + c$, because the other balls are not relevant. Moreover, if $a + m = b + c$, then again, some of the other balls would be relevant, thus we have $a + m < b + c$. This implies $c > a - b + m$, thus we are done if the answer is DIFF, as $c$ is a majority ball. We also have $a + b + m \ge c$, otherwise we are done without the last query, and it implies $a + b \ge c + m$, moreover $a + b > c + m$, otherwise some of the remaining balls are relevant. Thus we are done if the answer is SAME.

3 Graphs

Let us start this section by describing in detail how the weighted majority problems are connected to the majority problem on graphs. Consider an algorithm solving the majority problem on a graph $G$. Let $G_i$ be the subgraph of $G$ formed by the first $i$ queries. We call the vertex set of a connected component of $G_i$ a q-component. Observe that knowing the answers to the first $i$ queries, for every q-component $U$ we know a partition of $U$ into a blue subset $U_1$ and a red subset $U_2$. Let the weight of $U$ be $w_i(U) = \big| |U_1| - |U_2| \big|$. If the $(i+1)$st query is $(u, u')$ with $u \in U$ and $u' \in U'$, where $U$ and $U'$ are q-components in $G_i$, then $U$ and $U'$ are merged into a q-component with vertex set $U \cup U'$ in $G_{i+1}$. Moreover, its weight $w_{i+1}(U \cup U')$ is either $w_i(U) + w_i(U')$ or $|w_i(U) - w_i(U')|$, depending on the answer to the $(i+1)$st query $(u, u')$. For other q-components $U''$ we have $w_{i+1}(U'') = w_i(U'')$. Hence an algorithm that finishes solving the majority problem on $G$ after the $k$th query also solves the weighted majority problem for the vector having the weights of the q-components of $G_k$ as coordinates. However, this does not work in the other direction, as we do not have the restriction of the graph structure in the weighted problem. Thus we can only transfer lower bounds from the weighted problem to $m(G)$ this way.

We will omit $i$ and simply talk about $w(U)$ instead of $w_i(U)$ because $i$ will always be clear from the context. For a ball $u \in U$, let $w(u) := w(U)$. If a q-component $X$ has weight zero, we say that $X$ is balanced. Similarly to vectors, we say that a graph $G$ on $n$ vertices is hard if $m(G) = n - 1$ for even $n$ and $m(G) = n - 2$ for odd $n$.

Proposition 3.1.

Every tree $T$ on an even number $n$ of vertices is hard, i.e., $m(T) = n - 1$.

Proof. We show more: the adversary can pick in advance a coloring $c$ of the vertices such that no matter what edge is missing from the queries, we cannot find out if there is a majority or not. All this coloring needs to satisfy is that we have $n/2$ vertices of each color, and that after deleting any edge of $T$, both the resulting subtrees are unbalanced, i.e., the number of blue and red balls is not the same in them. Indeed, if an edge of $T$ is not asked, then the adversary can either claim that the real coloring of the vertices is $c$ and thus no majority vertex exists, or that the coloring coincides with $c$ on one component, but is exactly the flipped version of $c$ on the other component and thus there is a majority.

Equivalently, we want to find a balanced 2-coloring such that each edge of $T$ cuts it into two non-balanced parts. We start with an arbitrary balanced coloring of $T$. If an edge connecting a red ball $u$ and a blue ball $v$ cuts $T$ into two balanced parts, we can simply change the color of $u$ to blue and the color of $v$ to red. Observe that any other edge $e$ cuts $T$ into two parts such that $u$ and $v$ belong to the same part, hence it does not change whether $e$ cuts $T$ into balanced parts.

Let us assume now that $u$ and $v$ are both red, and the edge $uv$ cuts $T$ into two balanced parts $A$ and $A'$. Then we change the color of every ball in $A$. We claim that for any edge $e$ that cuts $T$ into parts $B$ and $B'$, it does not change whether $B$ and $B'$ are balanced. Indeed, either $B$ is completely inside $A'$, in which case no color inside $B$ is changed, or $B$ contains $A$, in which case some colors have changed, but the number of blue balls turning red is the same as the number of red balls turning blue, as $A$ is balanced. As $B'$ is balanced if and only if $B$ is balanced, $B'$ is also unaffected. Now $u$ and $v$ have different colors, so we can again exchange their colors.

Hence we obtained that we can decrease the number of edges that cut $T$ into balanced parts. It is easy to see that the coloring remains balanced. After applying this operation finitely many times we obtain the desired coloring, finishing the proof.

Surprisingly, it is much harder to give a lower bound for trees on an odd number of vertices. For paths, for example, we have $m(P_n) = n - b(n)$ for all odd $n \le 13$, while $m(P_{15}) = 12 = n - b(n) + 1 = n - 3$. (This we have verified with a computer program.) We conjecture that […].

To prove the lower bound $n - 65$ for odd $n$, we start with a lemma that gives another proof of Proposition 3.1. First, we introduce a notation. In a graph $G$, for a subset of its vertices $X \subset V$ we denote by $\delta(X)$ the parity of the number of edges between $X$ and $V \setminus X$. If $G$ is a tree and $X$ is a connected subset of vertices, then $\delta(X)$ equals the parity of the number of components of $V \setminus X$. Recall that the weight of a q-component $X$, denoted by $w(X)$, is the difference between the number of the blue balls and the number of the red balls in it.

Lemma 3.2.

We can answer any sequence of queries in any graph $G$ such that for any q-component $X \subsetneq V$ we have $0 \le w(X) \le 2$, and (i) if $|X|$ is odd, $w(X) = 1$, (ii) if $|X|$ is even, $w(X) = 2\delta(X)$.

Note that if $T$ is a tree on an even number of vertices, then $0 \le w(X) \le 2$ implies that when the algorithm finishes, there is at most one non-balanced q-component. Assume at that point there would also be some balanced q-components. Observe that there is a tree-structure on the q-components of a tree. Then at least one of the balanced q-components would be a leaf-component, but that contradicts condition (ii). Thus there can be only one component, which implies Proposition 3.1.

For trees on an odd number of vertices, a similar argument cannot work, as for example in $P_n$, it can happen that the first two vertices form a q-component of weight 2, followed by $(n-3)/2$ balanced q-components of two vertices each, and the last vertex is a q-component of weight 1. In this case we have solved the majority problem with only $(n-1)/2$ queries, because of the many balanced q-components. For paths, there is no way to keep the weight function bounded without allowing an arbitrarily large number of adjacent balanced q-components; but if this happened, then we could merge all the q-components to their left, and all the q-components to their right, so that only two non-balanced q-components remain; after this we are done if $n$ is odd, saving an arbitrarily large number of queries. This is why the proof will be more complicated for trees on an odd number of vertices; we will need to use our results about weighted balls.

Proof of Lemma 3.2. Initially the conditions are satisfied. Suppose that the query is between two q-components, $X$ and $Y$.

If $|X| + |Y|$ is odd, then exactly one of $w(X)$ and $w(Y)$ equals 1, while the other equals 0 or 2, so we can achieve $w(X \cup Y) = 1$ to satisfy condition (i).

If $|X|$ and $|Y|$ are both odd, then we can choose the weight of $X \cup Y$ to be 0 or 2; one of those is equal to $2\delta(X \cup Y)$.

If $|X|$ and $|Y|$ are both even, then since $\delta(X \cup Y) = \delta(X) + \delta(Y) - 2|E(X, Y)| = \delta(X) + \delta(Y) \bmod 2$, we need $w(X \cup Y) = w(X) + w(Y) \bmod 4$. Observe that $w(X) + w(Y)$ is 0, 2 or 4. Thus we can answer so that $w(X \cup Y)$ becomes 0, 2 or 0, respectively, to satisfy condition (ii).

For the lower bound of $n - 65$ for trees on an odd number of vertices, we need another theorem. Before that, we prove a simpler result that contains an important ingredient of the proof, and is of independent interest.
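The adversary strategy from the proof of Lemma 3.2 can be sketched in Python as follows. This is a minimal illustration, not part of the original argument: the class name `Adversary` and its methods are hypothetical, vertices are assumed to be numbered $0, \dots, n-1$, and $\delta$ is recomputed by scanning all edges at every query, which is slow but keeps the sketch short. For each query joining two q-components, the sketch tries both replies and keeps one under which the merged component satisfies conditions (i) and (ii); by the lemma such a reply always exists.

```python
class Adversary:
    """Answers SAME/DIFF queries so that the invariant of Lemma 3.2 is kept."""

    def __init__(self, n, edges):
        self.n, self.edges = n, list(edges)
        self.parent = list(range(n))
        self.rel = [0] * n    # color of a vertex relative to its parent (0 = same)
        self.size = [1] * n   # size of the q-component, stored at its root
        self.same = [1] * n   # vertices colored like the root, stored at the root

    def find(self, v):
        """Return (root, color of v relative to the root)."""
        parity = 0
        while self.parent[v] != v:
            parity ^= self.rel[v]
            v = self.parent[v]
        return v, parity

    def weight(self, root):
        return abs(2 * self.same[root] - self.size[root])

    def delta(self, roots):
        """Parity of the number of edges leaving the union of these components."""
        def inside(v):
            return self.find(v)[0] in roots
        return sum(inside(a) != inside(b) for a, b in self.edges) % 2

    def admissible(self, size, weight, roots):
        if size == self.n:    # the lemma only constrains components X != V
            return True
        return weight == (1 if size % 2 else 2 * self.delta(roots))

    def answer(self, u, v):
        (ru, pu), (rv, pv) = self.find(u), self.find(v)
        if ru == rv:          # forced by the earlier answers
            return "SAME" if pu == pv else "DIFF"
        size = self.size[ru] + self.size[rv]
        for reply, flip in (("SAME", 0), ("DIFF", 1)):
            d = pu ^ pv ^ flip    # color of root rv relative to root ru
            gain = self.size[rv] - self.same[rv] if d else self.same[rv]
            same = self.same[ru] + gain
            if self.admissible(size, abs(2 * same - size), {ru, rv}):
                self.parent[rv], self.rel[rv] = ru, d
                self.size[ru], self.same[ru] = size, same
                return reply
        raise AssertionError("no admissible reply; should not happen by Lemma 3.2")

    def invariant_holds(self):
        roots = {self.find(v)[0] for v in range(self.n)}
        return all(self.admissible(self.size[r], self.weight(r), {r}) for r in roots)


# Example: all edges of the path P_8 except the middle one are queried.
adv = Adversary(8, [(i, i + 1) for i in range(7)])
for a, b in [(0, 1), (2, 3), (1, 2), (5, 6), (4, 5), (6, 7)]:
    adv.answer(a, b)
```

In the example, the two remaining q-components each have an even number of vertices and one outgoing edge, so condition (ii) forces both to have weight 2, and the majority question is still undecided, in line with the alternative proof of Proposition 3.1 described above.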

Theorem 3.3.

Let $n = 2^k + l$, where $l < 2^k$. If $G$ has a set $U$ of vertices such that $|U| \le 2^{k-2}$ and the components of $G \setminus U$ are single vertices (i.e., every edge is incident to a vertex in $U$), then $G$ is hard, i.e., $m(G) = n - 1$ if $n$ is even and $m(G) = n - 2$ if $n$ is odd.

Proof. Denoting by $w(X)$ the weight of a q-component $X$, we initially have $\sum_X w(X) = n$. The adversary will maintain in the first part of the algorithm that $w(X) \ne 0$ for every q-component $X$. Let us now describe a strategy of the adversary for the first part of the algorithm. Whenever for some $v \in G \setminus U$ we ask the first query containing $v$, if the other vertex in the query $u \in U$ is such that $w(u) = p \ge 2$, the answer is such that the weight of the new q-component is $p - 1$, i.e., $\sum_X w(X)$ decreases by 2. In every other case the answer is such that the weights are added up, i.e., $\sum_X w(X)$ remains the same.

Introduce the potential function $\Psi = \sum_X w(X) + |\{X \mid X \cap U \ne \emptyset,\ w(X) = 1\}|$. The adversary's strategy is such that every time we ask the first query containing a $v \in G \setminus U$, the function $\Psi$ decreases by at least 1. Since initially $\Psi = n + |U|$, after $|U| + l$ queries involving some vertex of $V \setminus U$, we would have $2^k \ge \Psi \ge \sum_X w(X)$. But the adversary stops executing this algorithm the moment we have $\sum_X w(X) = 2^k$ or $\sum_X w(X) = 2^k + 1$; this surely happens, as $\sum_X w(X)$ can only decrease by 2.

Let us consider the vertices from $G \setminus U$ that were merged into some q-components (i.e., those that appeared in queries). Let $x$ denote the number of those where the total weight did not decrease when they first appeared in a query, and $y$ denote the number of those where the total weight decreased when they first appeared in a query. Then we have $x \le y + |U|$. Indeed, consider a q-component containing a vertex $u \in U$, and observe that whenever the weight of this component increased by merging it with a vertex from $G \setminus U$, the next time its weight decreased. (Instead of the potential function argument, the proof of Theorem 3.3 could also be finished in the same way as the proof of Theorem 3.4 or 3.6.) Hence the number of vertices of $G \setminus U$ that have not appeared in any query is at least $n - |U| - (|U| + l) \ge 2^{k-1}$.

Now we can apply Lemma 2.2 to the current q-components as weighted balls. Indeed, we have at least $2^{k-1}$ q-components of weight 1, and the total weight is $2^k$ or $2^k + 1$. By Lemma 2.2 the number of queries needed is the number of components minus 1 or minus 2, depending on the parity. Hence even if we could compare any two q-components from now on, we still could not solve the majority problem with fewer queries.

With a similar method, we can obtain the following lower bound for odd paths.
Theorem 3.4. m ( P n ) ≥ n − .Moreover, m ( P n ) ≥ n − unless n + 1 or n + 3 is a power of two.Proof. We have already seen that this holds if n is even, so it is enough to prove the theoremfor n odd. First we prove the weaker claim m ( P n ) ≥ n −

10. The statement holds for n < m ( P n ) ≥ n − b ( n ). Let U include every 9 th vertex of P n , starting with the ﬁrst, and alsothe last vertex of P n , so ⌈ n ⌉ ≤ | U | ≤ ⌈ n ⌉ + 1, and P n \ U consists of paths on 8 vertices (andpossibly one shorter path at the end). We answer each query such that for any q -component X if X ∩ U = ∅ , then w ( X ) ≤

1, while if X ∩ U = ∅ , then 1 ≤ w ( X ) ≤

2. In each stepthe total weight decreases by 0 or 2, so after a while it becomes 2 k + 1 for k = ⌊ log n ⌋ . Whenthis happens, we apply Lemma 2.2 to the current q -components as weighted balls. Indeed, P X : X ∩ U = ∅ w ( X ) ≤ | U | = 2 ⌈ n ⌉ + 2 ≤ n ≤ k − if n > P X : X ∩ U = ∅ > k − . ByLemma 2.2, the number of queries needed to ﬁnish is at least the number of components minus2. Equivalently, Lemma 2.2 states that we need to connect the weighted balls until at most twocomponents remain, thus, we need to connect all of U into at most two components. This meansquerying all the edges between any two vertices of U that are in the same component at the end.That means the edges we have not queried are all on a path of length 9 (between two verticesfrom U ). This proves m ( P n ) ≥ n − m ( P n ) ≥ n − n is large enough,but now we have to be more careful with the calculations. Because of this, we also change howwe select U ; instead of starting with the ﬁrst vertex, we start with the second vertex of thepath, then take every 9 th vertex, and ﬁnally the last but one vertex. We can aﬀord to skip theendvertices, as a single vertex anyhow cannot form a balanced component, we can only compareit to its adjacent vertex from U . This gives | U | = ⌊ n +149 ⌋ , and n +149 ≤ n if n ≥ n <

127 the lower bound n − b ( n ) ≥ n − th vertex, and ﬁnally the last but one vertex. This way only 4 edges can remainunqueried between two diﬀerent components. This gives | U | = ⌊ n +128 ⌋ , and this is less than2 ⌊ log n ⌋− unless n + 1 or n + 3 is a power of two.12t is an interesting question where the truth is between n − n − P n for odd n . Ouronly (computer veriﬁed) case is m ( P ) = n − centroid decomposition . Proposition 3.5.

In every tree on $n$ vertices, for every integer $p$, there is a subset $U$ of at most $n/p$ vertices such that every component of $G \setminus U$ has at most $p$ edges (including the edges from the components to $U$).

Theorem 3.6. If $G$ is a tree on $n$ vertices, then $m(G) \ge n - 65$.

Proof. Let $n = 2^k + l$, where $l < 2^k$ is odd. Observe that the statement is trivial if $n \le 65$, thus we can assume $k \ge$

6. Apply Proposition 3.5 with p = 32 to obtain a set U of vertices such that | U | ≤ k − − T of G \ U has at most p edges. (We write p instead of 32throughout the proof.)We proceed as in the proof of Theorem 3.3. We denote by w ( X ) the weight of a q -component X , and for a vertex u of G , w ( u ) denotes the weight of the q -component containing u . Weinitially have P X w ( X ) = n . The adversary will maintain in the ﬁrst part of the algorithm that w ( X ) = 0 for every q -component X that intersects U .We split each component T to a connecting part T ′ and some hanging parts T , T , . . . whereany of these can be empty, as follows. If v ∈ T separates some vertices of U from each other,then it goes to T ′ . Each connected component of T \ T ′ forms a diﬀerent T i . Notice that eachhanging part T i is a subtree of T , thus it has a unique vertex r ( T i ) that separates T i \ { r ( T i ) } from T \ T i ; we call r ( T i ) the root of T i .We answer queries inside T i according to Lemma 3.2 (applied only to T i ), while if the query X ∩ T ′ = ∅ , we answer such that w ( X ) ≤ X ⊂ G \ U will be at most 2. The crucial property is that the balanced q -components of T will always separate either two U vertices, or some positive weight part of a T i from a U vertex. This way they are “in the way” to compare these parts with the rest of thegraph, so they cannot be simply ignored. The strategy of the adversary will be to make sure thatthe game cannot end while there are many unbalanced q -components. After there are only fewunbalanced q -components the game might end, but in this case the graph could be made intoa single q -component by adding O ( p ) further edges to it. This shows that at most these manyqueries can be saved.Also, in case we merge all of some T i into one q -component, the adversary would like to avoid w ( T i ) = 0. 
This cannot happen if $T_i$ has an odd number of vertices; if $T_i$ has an even number of vertices, the adversary adds an (imaginary) extra degree-one vertex $r'(T_i)$ to $T_i$ that is adjacent only to $r(T_i)$, to obtain $T_i^*$, and applies Lemma 3.2 to $T_i^*$ instead of $T_i$. Since $r'(T_i)$ is never compared with anything, merging all of $T_i$ into a q-component cannot give $w(T_i) = 0$, because $T' = T_i^* \setminus T_i$ has only one component, $\{r'(T_i)\}$. Therefore, in case the whole tree $T_i$ is merged, we get $w(T_i) = 2$.

Whenever we compare some $Y \subset G \setminus U$ with an $X$ intersecting $U$ such that $w(X) \ge 3$, the adversary answers such that the weight of the new q-component is $w(X) - w(Y)$, thus $\sum_X w(X)$ decreases by $2w(Y) \le 4$. In every other case the adversary answers so that the weights are added up, i.e., $\sum_X w(X)$ remains the same. This way the weight of a q-component can never exceed 4, unless we merge two q-components that both intersect $U$. Because of this, we can conclude that $\sum_{X : X \cap U \neq \emptyset} w(X) \le 4|U| \le n/8 < 2^{k-2}$.

The adversary stops executing this algorithm the moment we have $\sum_X w(X) = 2^k + 1$ or $2^k + 3$; this surely happens, as $\sum_X w(X)$ is odd and can decrease by at most 4. As we have seen in the earlier proofs, if $\sum_X w(X) = 2^k + 1$, then we will have two non-balanced q-components when the algorithm is done. If $\sum_X w(X) = 2^k + 3$, then we can apply Proposition 2.9, whose conditions are shaped to work here, to conclude that we will have at most three non-balanced q-components when the algorithm is done.

Moreover, these few remaining non-balanced q-components need to cover $U$, as the weights of sets intersecting $U$ stay positive throughout the algorithm. If at the end we have at most $\ell$ components, then by adding $\ell - 1$ of the $T$'s we can make the q-graph connected. As every tree has at most $p$ vertices, and in our case $\ell \le 3$, adding $2p$ edges can make the q-graph connected.

To summarize, instead of asking all $n - 1$ edges, we can save at most $2p = 64$ queries.

Remark. We could get a better constant by considering the number of yet unqueried edges we need to add to connect the remaining non-balanced q-components (as at the end of the proof of Theorem 3.4). Here we will not go into details, as our bound is probably far from optimal anyhow, but this would give something like $n - 33$ as a lower bound.
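For very small graphs, bounds of this kind can be sanity-checked by an exhaustive minimax search over all adaptive query strategies. The sketch below is only an illustration under stated assumptions and is not the authors' verification code: the function names `min_queries`, `path_edges` and `complete_edges` are hypothetical, vertices are numbered from 0, and the color of vertex 0 is fixed because the answers depend only on relative colors. The search is exponential in $n$ and is meant for $n$ of about 6 or less.

```python
from functools import lru_cache
from itertools import combinations


def path_edges(n):
    """Edges of the path on vertices 0, ..., n-1."""
    return [(i, i + 1) for i in range(n - 1)]


def complete_edges(n):
    """Edges of the complete graph on vertices 0, ..., n-1."""
    return list(combinations(range(n), 2))


def min_queries(n, edges):
    """Worst-case number of edge queries needed on the given graph."""
    edges = tuple(edges)
    # Relative colorings: even masks have bit 0 clear, so vertex 0 has color 0.
    colorings = [tuple((mask >> i) & 1 for i in range(n))
                 for mask in range(0, 2 ** n, 2)]

    def majority_vertices(c):
        ones = sum(c)
        if 2 * ones > n:
            return frozenset(i for i in range(n) if c[i] == 1)
        if 2 * ones < n:
            return frozenset(i for i in range(n) if c[i] == 0)
        return frozenset()           # balanced coloring: no majority

    maj = {c: majority_vertices(c) for c in colorings}

    def solved(consistent):
        sets = [maj[c] for c in consistent]
        if all(not s for s in sets):
            return True              # certainly no majority vertex
        if any(not s for s in sets):
            return False             # existence of a majority still unknown
        return bool(frozenset.intersection(*sets))

    @lru_cache(maxsize=None)
    def solve(consistent):
        if solved(consistent):
            return 0
        best = float("inf")          # stays infinite on disconnected graphs
        for u, v in edges:
            same = frozenset(c for c in consistent if c[u] == c[v])
            diff = consistent - same
            if not same or not diff:
                continue             # the answer is forced, query is useless
            best = min(best, 1 + max(solve(same), solve(diff)))
        return best

    return solve(frozenset(colorings))


print(min_queries(3, path_edges(3)))      # 1
print(min_queries(4, path_edges(4)))      # 3
print(min_queries(4, complete_edges(4)))  # 3
```

The printed values agree with the statements quoted in the text: $m(P_3) = n - 2$ for the odd path on three vertices, $m(P_4) = n - 1$ for a tree of even order, and $m(K_4) = n - b(n)$.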

We have seen that it is much harder to prove the lower bound $m(T) \ge n - 65$ for trees of odd order $n$ than the lower bound $m(T) \ge n - 1$ for trees of even order $n$. Somewhat even more surprisingly, there is a significant difference between the so-called non-deterministic complexities of trees of even and odd order. The non-deterministic complexity $m_{nd}(G)$ of a graph $G$ is defined as the minimum number of queries needed to find a majority vertex in the worst case, provided we know the color of each vertex beforehand from an unreliable source and we just have to verify (some of) this information. Let us observe that in the proof of Proposition 3.1 we actually showed $m_{nd}(T) = n - 1$ for any tree $T$ of even order $n$.

Proposition 3.7.

Let P be a path of order n , such that n is odd. Then m nd ( P ) = n − Θ( √ n ) .Proof. Let us denote the i th vertex of P by x i for i = 1 , , . . . , n . For the lower bound let ussuppose that n = k + 1 for some even k (this is possible, since we are only interested in the14rder of magnitude of n − m nd ( P )) and let us call the batch of vertices x ( i − k +1 , x ( i − k +2 , . . . , x ik Batch i for i = 1 , , . . . , k . Now let us color the vertices of Batch i red if and only if i is odd andlet x k +1 be blue (just like the vertices of Batch k , since k is even). We claim that in order toﬁnd a majority vertex, one needs at least n − k − p ≥ k , hence the number of q -components afterthe last query is p + 1 ≥ k + 1. It is easy to see that the number of balanced q -components isat most k −

1, since a balanced q -component must contain both x ik and x ik +1 for some i < k .Thus at least k + 2 of the q -components are unbalanced. It is also easy to see that the weightof any q -component is at most k + 1 (since k vertices of the same color are always followed andpreceded by k vertices of the opposite color, except for the ﬁrst k and last k + 1 vertices). Nowif one could show a majority vertex, the weight of its q -component should be more than the sumof the weights of the other q -components, which is impossible, since the latter sum is at least k + 1. This ﬁnishes the proof of the lower bound.Next we prove the upper bound. Consider any coloring of the vertices of P and let us denotethe number of red (resp. blue) vertices among { x , x , . . . , x i } by R ( i ) (resp. B ( i )) and let ussuppose without loss of generality that d := R ( n ) − B ( n ) > n is odd). Observethat if d ≥ √ n , then by asking the ﬁrst n − ⌈ d ⌉ edges (or any consecutive n − ⌈ d ⌉ edges) of P ,we obtain a q -component of weight at least ⌈ d ⌉ + 1 and ⌈ d ⌉ − q -components of weight (andalso cardinality) 1. Thus any majority vertex of the large q -component is also a majority vertexof the whole graph, and we are done. Therefore, we might assume that d < √ n .Let D ( i ) := R ( i ) − B ( i ) (so d = D ( n )), ∆ := max ni =1 | D ( i ) | , and let j be the smallest number,such that | D ( j ) | = ∆. Since d >

0, we may suppose that D ( j ) = ∆, otherwise we can reverse theorder of the vertices and obtain a situation, where the similarly obtained D ( j ′ ) is positive (thevalue ∆ would be diﬀerent then, but d remains the same). Now we consider two cases, based onthe value of ∆.Case 1. ∆ < √ n . Since all values D ( i ) are in the interval [ − ∆ , ∆], by the pigeonholeprinciple there must be a value v , for which |{ i : D ( i ) = v }| ≥ n √ n +1 > √ n . It is obvious thatif D ( a ) = D ( b ), then the number of red and blue vertices are the same in the subpath betweenthe vertices x a +1 and x b . Let the elements of { i : D ( i ) = v } be i , i , . . . , i r and let us query alledges of P , except the edges ( x i , x i +1 ) , ( x i , x i +1 ) , . . . , ( x i r , x i r +1 ). In this way we obtain r or r + 1 q -components, of which r − q -component (which exists, since n is odd) is a majority vertex of the whole graphas well. Since r > √ n , we are done with this case.Case 2. ∆ ≥ √ n . Recall that j is the smallest number, such that D ( j ) = ∆ and d < √ n ,i.e., the number of red vertices is ∆ more than the number of blue vertices by x j , but then thediﬀerence drops to d by the end of P . Thus there must exist a smallest number k , such that k > j and D ( k ) < √ n . Then the subpath P ′ between the vertices x j +1 and x k contains at least∆ − √ n ≥ √ n more blue vertices, than red vertices. Let now j be the smallest index, such15hat the subpath of P ′ between x j +1 and x j contains exactly one more of the blue vertices thanthe red vertices. Similarly, let j be the smallest index, such that the subpath of P ′ between x j +1 and x j contains one more of blue vertices than red vertices, and so on. It is clear thatthe indices j , j , . . . , j ⌊√ n ⌋ are well-deﬁned. Now let us query all edges of P , except the edges( x j , x j +1 ) , ( x j , x j +1 ) , . . . , ( x j ⌊√ n ⌋ , x j ⌊√ n ⌋ +1 ). 
In this way we obtain $\lfloor \sqrt{n} \rfloor + 2$ q-components such that one of them has weight $\Delta$, $\lfloor \sqrt{n} \rfloor$ of them have weight 1, and one of them has weight smaller than $\lfloor \sqrt{n} \rfloor$; thus any majority vertex of the q-component of weight $\Delta$ is also a majority vertex of the whole graph, finishing the proof of Proposition 3.7.

In this subsection we prove Theorem 1.2, which states that for every $n$ there is a graph $G$ with $n$ vertices and $n(1 + b(n))$ edges such that $m(G) = n - b(n)$.

Proof of Theorem 1.2.

Let F k be the graph obtained from a path v v . . . v k by adding k verticesof degree 1, u , u , . . . , u k , to it such that u i is connected to v i for each 1 ≤ i ≤ k . Let k = ⌊ n/ ⌋ and G be the graph we obtain from F k by adding all the possible edges incident to any of thevertices v k − b ( n )+1 , . . . , v k . We are going to deﬁne an algorithm A l for l = 2 i . First we describe some properties of thealgorithm. It uses the edges of F l and either gets back a DIFF answer at some point for an edgethat connects two monochromatic q -components of the same size (which is a power of two), orshows that F l is monochromatic. Moreover, at any point it uses only the ﬁrst j vertices of thepath v . . . v j and the leaves u , . . . , u j connected to them, for some j . Therefore, if there arevertices not appearing in any query, they form a connected graph.We deﬁne algorithm A l recursively. Algorithm A is trivial, it has only one query u v .Assume we have deﬁned Algorithm A l and we are given F l . The graph F l consists of twocopies of F l and an additional edge, where the ﬁrst copy has vertices v , . . . , v l , u , . . . , u l , thesecond copy has the remaining vertices, and the additional edge is v l v l +1 . Algorithm A l +1 runsalgorithm A l separately for the ﬁrst, and then for the second copy of F l , and ﬁnally asks ( v l , v l +1 ).If for either copy of F l we get back a DIFF answer at some point for an edge that connects twomonochromatic q -components of the same size, we are done. Otherwise, both copies of F l aremonochromatic. 
In this case a DIFF answer to the last query connects two monochromatic q -components of the same size, while a SAME answer shows F l is monochromatic.The algorithm showing m ( G ) ≤ n − b ( n ) for even n is based on an idea similar to the algorithmshowing m ( K n ) ≤ n − b ( n ): we ask queries such that if the answer is DIFF, we obtain a balanced q -component (that we can discard), and otherwise we build larger and larger monochromatic q - There are several non-isomorphic graphs G ′ with n (1 + b ( n )) edges such that essentially the same proof shows m ( G ′ ) = n − b ( n ). For example we could take the union of a path on n vertices with the biclique K n − b ( n ) ,b ( n ) ,but we have found our proof to be easier to present for the above graph G . v k , u k ). Observe that these two vertices can be considered asthe ﬁrst vertices of a copy of F i having additionally the vertices v , . . . , v i − − , u , . . . , u i − − ,for every i ≤ l = ⌊ log n ⌋ . Thus we run algorithm A l . If we obtain a monochromatic q -componentof size 2 l , that is of the majority q -color, and we are done. If we obtain a DIFF answer, wecontinue with Step 2, which starts with asking v k − u k − . Observe that any consecutive part of F k together with u k and v k forms a copy of F l for some l . More precisely, there is a copy of F l on the vertices v j , . . . , v j + l − , u j , . . . , u j + l − , v k − , u k − , provided j + l − < k −

1. We continuewith the vertex v j if v j − has the largest index among those appearing in any query in Step 1.Similarly to Step 1, we run algorithm A l for the largest l possible, i.e., the largest l that is apower of 2 and is smaller than k − j + 1.In general, for Step i we take v k − i +1 , the ﬁrst vertex v j not appearing in any query inStep i −

1, and 2 r − r possible without arriv-ing back to v k − i +1 . Furthermore, we take u m for every v m we took. More precisely, we take v j , . . . , v j +2 r − , u j , . . . , u j +2 r − , v k − i +1 and u k − i +1 . This is a copy of F r , thus we can run Algo-rithm A r on it. If we obtain a monochromatic q -component, that is of the majority color, andwe are done. Indeed, we took the largest r possible, thus more than half of the vertices notappearing in any query before Step i are of that color. The vertices appearing in earlier queriesare balanced, as the q -components obtained in earlier steps are balanced.Observe that if all the steps end with DIFF answers, we had at least b ( n ) steps before theend of the algorithm, i.e. before every vertex appeared in a query. Indeed, all the componentshave order that is a power of 2, and n cannot be written as the sum of less than b ( n ) powers of2. After Step b ( n ), the remaining vertices (if there are any) form a connected graph, thus we canask a spanning tree of that graph to ﬁnd a majority vertex there. Altogether the query graph isa forest of at least b ( n ) q -components, thus at most n − b ( n ) queries were asked. We collect below the most important questions that remain open. • What is the complexity of computing m ( w ) and m ( G )? • For which w ′ = (2 w , w , w , . . . , w k ) does m ( w ′ ) = m ( w ) − • Does m ( T ) ≥ n − n vertices? • What is the least number of edges a graph on n vertices can have if m ( G ) = n − b ( n )?17 cknowledgment We would like to thank our anonymous reviewers for several useful suggestions that improvedthe presentation of our paper.

References

[1] M. Aigner, Variants of the majority problem, Disc. App. Math., 137 (2004), 3–25.
[2] M. Aigner, G. De Marco, M. Montangero, The plurality problem with three colors and more, Theor. Comp. Sci., 337 (2005), 319–330.
[3] A. M. Borzyszkowski, Computing majority via multiple queries, Theor. Comp. Sci., 539 (2014), 106–111.
[4] H. Chang, D. Gerbner, B. Patkós, Finding non-minority balls with majority and plurality queries, preprint, Disc. App. Math., to appear.
[5] F. Chung, R. Graham, J. Mao, A. Yao, Oblivious and Adaptive Strategies for the Majority and Plurality Problems, Computing and combinatorics, LNCS 3595, Springer, Berlin (2005), 329–338.
[6] G. De Marco, E. Kranakis, G. Wiener, Computing Majority with Triple Queries, Theor. Comp. Sci., 461 (2012), 17–26.
[7] G. De Marco, A. Pelc, Randomized Algorithms for Determining the Majority on Graphs, Combinatorics, Probability and Computing, 15 (2006), 823–834.
[8] D. Eppstein, D. S. Hirschberg, From Discrepancy to Majority, Algorithmica, 80 (2018), 1278–1297.
[9] M.J. Fisher, S.L. Salzberg, Finding a Majority Among nn