Testing properties of signed graphs

Abstract

In graph property testing the task is to distinguish whether a graph satisfies a given property or is "far" from having that property, preferably with a sublinear query and time complexity. In this work we initiate the study of property testing in signed graphs, where every edge has either a positive or a negative sign. We show that there exist sublinear algorithms for testing three key properties of signed graphs: balance (or 2-clusterability), clusterability and signed triangle freeness. We consider both the dense graph model, where we can query the (signed) adjacency matrix of a signed graph, and the bounded-degree model, where we can query for the neighbors of a node and the sign of the connecting edge. Our algorithms use a variety of tools from graph property testing, as well as reductions from one setting to the other. Our main technical contribution is a sublinear algorithm for testing clusterability in the bounded-degree model. This contrasts with the property of k-clusterability which is not testable with a sublinear number of queries. The tester builds on the seminal work of Goldreich and Ron for testing bipartiteness.



Florian Adriaens * Simon Apers †

Signed Graphs

A signed graph is a graph in which every edge has either a positive or a negative label. Formally, it is denoted G = (V, E, σ) with node set V = [N], edge set E ⊆ V × V and edge labelling σ : E → {+, −}. Such graphs model a variety of scientific phenomena. The widely studied correlation clustering problem [BBC04, DEFI06] was originally motivated by a document classification problem, where one has knowledge of pairwise similarities between documents, and the goal is to cluster the documents into (an undefined number of) groups such that within a group the documents are similar to each other, while across groups they are less similar. Several authors [BGG+19, TOG20] have focused on the related problem of finding large polarized communities. In physics, signed graphs and their frustration index are used to model the ground-state energy of Ising models [Kas63]. A third example is social networks, in which interactions between individuals can often be categorized into binary categories: trust versus distrust, friendly or antagonistic, etc. An important aspect in the edge formation of social networks is the sign of triangles. According to structural balance theory from social psychology [CH56], triangles with either one or three positive edges are more plausible, and this prevalence has been observed in real-life social networks [LHK10, TCAL16]. Many methods and algorithms for link and sign prediction try to capitalize on this. Aside from link prediction, the survey [TCAL16] lists several other important data mining tasks in signed social media networks.

* KTH, Sweden. † CWI, the Netherlands and ULB, Belgium. (arXiv preprint, cs.DS.)

Signed graphs generalize unsigned graphs, and as such they can have properties that unsigned graphs do not. One important example is the property of clusterability or weak balance, which was first introduced in [Dav67] and is the subject of the correlation clustering problem [BBC04]. A signed graph is clusterable if there exists a partitioning of the nodes into an a priori unknown number of components such that (i) every positive edge connects two nodes in the same component, and (ii) every negative edge connects two nodes in different components. An equivalent characterization, in terms of forbidden subgraphs, is that the signed graph contains no cycles with exactly one negative edge [Dav67, DEFI06]. Clusterability does not appear to have a meaningful interpretation in the case of unsigned graphs. For example, if one views an unsigned graph as a signed graph with the restriction that all the edges have the same sign, whether positive or negative, then clearly any such graph is clusterable since it contains no cycles with exactly one negative edge.

Other signed graph properties are more closely related to unsigned graph properties. For example, the property of balance or 2-clusterability in signed graphs [Kön36, Har53] in fact generalizes that of bipartiteness in unsigned graphs. A signed graph is balanced if it is clusterable into exactly two components, with only positive edges inside the components and negative edges between the components. It follows that a signed graph with only negative edges is balanced iff the underlying unsigned graph is bipartite. There is also a reduction in the opposite direction, transforming a signed graph to an unsigned graph by replacing each positive edge by a path of two negative edges, and afterwards omitting all the signs of the edges [Zas18]. This reduction preserves distances, in the sense that the minimum number of edges that need to be deleted to make the signed graph balanced (the signed frustration index) is equal to the minimum number of edges that need to be deleted to make the transformed graph bipartite (the unsigned frustration index) [Zas18, Proposition 2.2]. We will use this reduction to show that in the bounded-degree model we can reduce the problem of testing balance to that of testing bipartiteness.

A final signed graph property that we investigate is that of signed triangle freeness. For a given signed triangle we say that a signed graph is signed triangle free if the given triangle does not occur in the graph. The occurrence or absence of certain signed triangles is relevant in structural balance theory [CH56] and it generalizes the notion of triangle freeness in unsigned graphs.
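To make the definition of balance concrete, the following is a minimal sketch of an exact (non-sublinear) balance check, not one of the testers studied in this work. It reads the whole graph and tries to 2-color it so that positive edges join equal sides and negative edges join opposite sides. The function name `is_balanced` and the `(u, v, sign)` edge encoding are assumptions made for this illustration.

```python
from collections import deque


def is_balanced(num_nodes, signed_edges):
    """Return True iff the signed graph is balanced (2-clusterable).

    signed_edges: iterable of (u, v, sign) with sign in {+1, -1}
    (a hypothetical encoding chosen for this sketch).
    """
    adj = [[] for _ in range(num_nodes)]
    for u, v, sign in signed_edges:
        adj[u].append((v, sign))
        adj[v].append((u, sign))

    side = [None] * num_nodes  # side[v] is 0 or 1 once v has been placed
    for start in range(num_nodes):
        if side[start] is not None:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in adj[u]:
                # Positive edge: same side. Negative edge: opposite side.
                expected = side[u] if sign > 0 else 1 - side[u]
                if side[v] is None:
                    side[v] = expected
                    queue.append(v)
                elif side[v] != expected:
                    # Found a cycle with an odd number of negative edges.
                    return False
    return True
```

On an all-negative graph this check coincides with a bipartiteness check, which matches the remark above that balance generalizes bipartiteness.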

Graph Property Testing

Graph property testing was formally introduced in the seminal work of Goldreich, Goldwasser and Ron [GGR98]. As input we are given query access to an (unsigned) graph G = (V, E) with node set V = [N] and edge set E ⊆ V × V. We would like to decide whether the graph obeys a certain property P, or whether it is "far" from any graph having that property. This is a relaxed setting compared to that of deciding P, and it often allows for algorithms that have sublinear query and/or time complexity. Such sublinear algorithms have been proposed for a wide range of graph properties such as bipartiteness [GGR98, GR99], k-colorability [GGR98, AK02], cycle-freeness [GR04, CGR+14] and more generally monotone graph properties.

Footnote on the reduction from balance to bipartiteness: this reduction does not work well in the dense graph model, since the transformed graph will typically be sparse.

Two query models are considered:

1. In the dense graph model [Gol17] we are able to query the adjacency matrix entries. A query takes the form (v, w) ∈ [N] × [N] and the reply is 1 if there is an edge between v and w, otherwise it is 0. Two graphs G = (V, E) and G′ = (V, E′) are said to be ε-far from each other if they differ in at least an ε-fraction of the adjacency matrix entries. Equivalently, at least εN² edges have to be added or removed to turn G into G′.

2. In the bounded-degree graph model we are given an upper bound d on the degrees of the graph, and we are given access to the adjacency list of G. A query takes the form (v, i), with v ∈ [N] and i ∈ [d]. If the degree of v is at least i, then the query is answered with the i-th neighbor u of node v (in arbitrary order). If v has degree smaller than i, an error symbol is returned. Two graphs G = (V, E) and G′ = (V, E′) are ε-far from each other if at least εdN edges have to be modified (added or removed) to turn G into G′.

Using these definitions, a graph is ε-far from having property P if it is ε-far from any graph having property P.

A property testing algorithm for P is a randomized algorithm that, given query access to G and an error parameter ε, should behave as follows: (i) if G has property P then the algorithm should accept with probability at least 2/3, whereas (ii) if G is ε-far from having property P, then the algorithm should reject with probability at least 2/3. If G satisfies neither condition then the algorithm can behave arbitrarily. This is the main reason why testing algorithms are often far more efficient than algorithms for effectively deciding whether G has property P or not.
If a property tester always accepts graphs having property P (i.e., it never falsely rejects), then it is called a one-sided property tester for P. Otherwise it is called a two-sided property tester.

Testing in Signed Graphs

Given the utility of graph property testing, and the importance of signed graphs in many applications, we believe that extending the framework of graph property testing to signed graphs is worthwhile. The definitions of distance and query access for unsigned graphs are easily extended to signed graphs:

1. In the dense signed graph model adjacency matrix queries are now answered by an element from {0, −, +}. A signed graph is ε-far from property P if at least εN² edge modifications (addition, removal or sign switch) have to be made to obtain a graph that satisfies P.

2. In the bounded-degree signed graph model a query (v, i) ∈ [N] × [d] is now answered either by the i-th neighbor w of v and the sign σ(v, w) of the corresponding edge, or by an error symbol if v has fewer than i neighbors. A signed graph is ε-far from P if at least εdN edge modifications (addition, removal or sign switch) have to be made to obtain a graph that satisfies P.

Note that in both cases the edge modifications now consist of edge additions, removals, as well as sign switches. However, the properties discussed in this paper (signed triangle freeness, balance and clusterability) are all monotone, and hence one may restrict attention to edge removals.

In this work we investigate property testing algorithms for the canonical signed graph properties of signed triangle freeness, balance and clusterability. Table 1 summarizes our results in terms of query complexity of the proposed testers. The Õ(·)-notation hides polylogarithmic factors in its argument and in N; in the bounded-degree model it also hides a polynomial dependence on the degree bound d. The time complexity for testing balance and clusterability in the bounded-degree model is bounded by the query complexity Õ(√N / poly(ε)). In all other cases the time complexity is at most exponential in the query complexity. In the rest of the section we give a sketch of our techniques.
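The two signed query models can be sketched as small oracle classes that count queries. This is only an illustration of the access model under assumed names (`DenseSignedOracle`, `BoundedDegreeSignedOracle`, `num_queries`); signs are encoded as +1 and −1, a missing edge as 0, and the error symbol as `None`.

```python
class DenseSignedOracle:
    """Signed adjacency-matrix access: query(v, w) returns 0, +1 or -1."""

    def __init__(self, num_nodes, signed_edges):
        self.N = num_nodes
        self._sign = {}
        for u, v, sign in signed_edges:
            self._sign[(u, v)] = sign
            self._sign[(v, u)] = sign
        self.num_queries = 0

    def query(self, v, w):
        self.num_queries += 1
        return self._sign.get((v, w), 0)


class BoundedDegreeSignedOracle:
    """Signed adjacency-list access with degree bound d."""

    def __init__(self, adjacency, degree_bound):
        # adjacency[v] is a list of (neighbor, sign) pairs, sign in {+1, -1}.
        if any(len(nbrs) > degree_bound for nbrs in adjacency):
            raise ValueError("a node exceeds the degree bound")
        self._adj = adjacency
        self.d = degree_bound
        self.num_queries = 0

    def query(self, v, i):
        """Return (w, sign) for the i-th neighbor of v (1-indexed), else None."""
        self.num_queries += 1
        if 1 <= i <= len(self._adj[v]):
            return self._adj[v][i - 1]
        return None  # stands in for the error symbol
```

Counting queries inside the oracle makes it easy to compare the number of queries an experiment uses against the bounds in Table 1.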
Property | Dense signed graph model | Bounded-degree signed graph model
Signed triangle freeness | Õ(tower(log(1/ε))) [Fox11] | Õ(1/ε) [Gol10]
Balance | Õ(1/ε) [Soh12] | Õ(√N / poly(ε))
Clusterability | Õ(1/ε) | Õ(√N / poly(ε))

Table 1: Query complexity of the different property testers. All testers are one-sided except for the clusterability tester in the dense model. The function tower(log(1/ε)) denotes a power tower of 2's of height O(log(1/ε)).

Dense signed graph model

We first describe property testing algorithms in the dense signed graph model. The property of signed triangle freeness can be efficiently tested by interpreting the signed graph as an edge-colored graph. We can then use Fox's edge-colored triangle removal lemma [Fox11], similar to the case of triangle freeness in unsigned graphs. For the property of balance or 2-clusterability we can use a reduction to a constraint satisfaction problem (CSP): every node corresponds to a Boolean variable (which indicates its cluster), a positive edge imposes an equality constraint between its endpoints and a negative edge imposes an inequality constraint. We can then test balance by using a property testing algorithm for CSPs [AdKK03, Soh12, AE02].

The property of clusterability can also be cast as a CSP in which the node variables now take arbitrary integer values in [N] (indicating their cluster). However, the aforementioned CSP testers [AdKK03, Soh12, AE02] are not efficient in such a regime. We circumvent this problem by proving that a signed graph that is clusterable is necessarily ε/2-close to being clusterable into O(1/ε) clusters. Using this we reduce the problem of testing clusterability to that of distinguishing graphs that are ε/2-close to being O(1/ε)-clusterable from those that are ε-far from being O(1/ε)-clusterable. This problem corresponds to tolerantly testing a CSP where the variables now take values in [O(1/ε)], and this can be done efficiently using an algorithm by Andersson and Engebretsen [AE02].

Bounded-degree signed graph model

Now we turn to the bounded-degree model. Testing signed triangle freeness is trivial in this model, similar to the unsigned case [Gol10]. Testing balance requires more care. While we can again cast the problem as a CSP, we are not aware of any appropriate property testing algorithms for CSPs in the bounded-degree model.
Rather, we reduce the problem of testing balance for signed graphs to that of testing bipartiteness for unsigned graphs, for which we can use the algorithm of Goldreich and Ron [GR99]. The reduction is based on a transformation described by Zaslavsky [Zas18], which maps balanced (resp. unbalanced) signed graphs to bipartite (resp. nonbipartite) unsigned graphs. The resulting algorithm's query complexity has an optimal √N-dependence, which follows from the Ω(√N) lower bound for testing bipartiteness.

Finally, and this is our main technical contribution, we describe a property testing algorithm for clusterability. While we can again reduce the problem to testing O(1/ε)-clusterability, similar to the dense case, the problem is that k-colorability (which is a special case of k-clusterability) is not testable in the bounded-degree model [BT04]. Rather, we base our algorithm on the forbidden subgraph characterization by Davis [Dav67], which states that a signed graph is clusterable if and only if it has no cycles with exactly one negative edge. We then use random walks to find such a cycle: first we pick a random initial node and perform a large number of random walks on the positive edges of G, then we check for the existence of a negative edge between any pair of nodes that were visited by a random walk. Such a negative edge necessarily yields a bad cycle. The correctness of this algorithm is easy to prove when the positive edges in G induce an expander. For the general case we build on the (unsigned) graph decomposition results of Goldreich and Ron [GR99].

On the one hand, our work demonstrates that key properties of signed graphs can be tested very efficiently. This seems not to have been studied before. On the other hand, we introduce signed graphs as an interesting setting for studying graph property testing.
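Davis's characterization of clusterability, on which the bounded-degree tester is based, also gives a simple exact check when the whole graph can be read. The sketch below is an illustration only, not the sublinear tester. It merges the endpoints of every positive edge with a union-find structure and then looks for a negative edge inside a positive component, which is exactly a cycle with one negative edge. The name `is_clusterable` and the `(u, v, sign)` encoding are assumptions.

```python
def is_clusterable(num_nodes, signed_edges):
    """Return True iff no cycle has exactly one negative edge.

    signed_edges: list of (u, v, sign) with sign in {+1, -1}
    (a hypothetical encoding chosen for this sketch).
    """
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Nodes joined by positive edges must share a cluster.
    for u, v, sign in signed_edges:
        if sign > 0:
            parent[find(u)] = find(v)

    # A negative edge inside a positive component closes a bad cycle.
    return all(find(u) != find(v) for u, v, sign in signed_edges if sign < 0)
```

An all-negative triangle passes this check although it is not balanced, which illustrates that clusterability is a strict relaxation of balance.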
Our work leaves open a number of questions and future directions:

• A lot of effort has been put into characterizing the set of properties that are testable (using Õ(poly(1/ε)) queries) in the dense graph model [AS08, AS05, GT01, AFNS09] and the bounded-degree graph model [BSS10, CSS09, CGR+14, NS13]. It would be interesting to characterize the set of signed graph properties that are testable.

• In a very recent work by Kumar, Seshadhri and Stolman [KSS21] an efficient partition oracle was proposed. For minor-closed graph families such an oracle gives local access to a certain global decomposition of the graph. The study of such a decomposition and corresponding oracle for signed graphs seems like an interesting future direction, especially given the connection between signed graphs and social networks.

• Finally, we did not succeed in proving an Ω(√N) lower bound for testing clusterability in the bounded-degree signed graph model, and hence we leave this as an open question.

Property testing in the dense signed graph model

A signed triangle is any triangle with a fixed sign assignment of its edges. As mentioned in the introduction, it is often interesting to check whether a signed graph contains, for instance, any triangles with exactly one negative edge. While it can be computationally expensive to effectively decide this, especially for massive graphs such as social networks, it might be easier to test whether the graph is free of such triangles.

Testing triangle freeness for unsigned graphs has been well studied. It is a direct application of the triangle removal lemma [RS78, Fox11]. For some function f, the canonical one-sided tester simply picks f(ε) random triples of nodes in [N] and rejects the graph if any of the triples induces a triangle. If the graph is triangle free, then we always accept the graph, so that the tester is indeed one-sided. If the graph is ε-far from being triangle free, then the triangle removal lemma of Fox [Fox11] proves that the graph contains at least δ(ε)·(N choose 3) triangles. Here δ(ε) is a function bounded by the inverse of the towering function tower(log(1/ε)), which corresponds to a tower of 2's of height O(log(1/ε)) (i.e., 2-to-the-2-to-the-...-to-the-2, O(log(1/ε)) times). Hence if we sample f(ε) ∈ Θ(1/δ(ε)) triples, then with high probability one of them will induce a triangle and we will correctly reject the graph. This yields a total query complexity of O(1/δ(ε)).

Testing signed triangle freeness in signed graphs can be analyzed in a very similar way. By interpreting the edge signs of a graph as an edge-coloring, we can use Fox's colored triangle removal lemma [Fox11].
This lemma states that if an edge-colored graph is ε-far from being free of a certain colored triangle, then the graph contains at least δ′(ε)·(N choose 3) such induced triangles, where δ′(ε) is again bounded by the inverse of tower(log(1/ε)). By the same argument as in the unsigned case, this implies the existence of a one-sided tester for colored (or signed) triangle freeness in the dense graph model with query complexity O(tower(log(1/ε))). This proves the following theorem.

Theorem 1.

There exists a one-sided tester for signed triangle freeness in the dense signed graph model with query complexity Õ(tower(log(1/ε))).

We can cast balance or 2-clusterability of a signed graph G = (V, E, σ) as a satisfiability problem. Associate with each node v a variable x_v ∈ {0, 1}. With every edge e = (u, v) ∈ E we associate a constraint on x_u and x_v: if σ(e) = + (positive edge) the constraint is satisfied iff x_u = x_v; if σ(e) = − (negative edge) then the constraint is satisfied iff x_u ≠ x_v. The graph G is balanced iff there exists an assignment of the x_v's such that all constraints are satisfied. Moreover, if G is ε-far from being balanced then we similarly have to remove at least εN² constraints from the satisfiability problem for it to be satisfiable. As a consequence, the problem reduces to testing whether the satisfiability problem is in fact satisfiable. For this we can use the work by Sohler [Soh12] which describes a one-sided tester with query complexity Õ(1/ε). The algorithm is very simple: sample Õ(1/ε) variables and accept if and only if the induced set of constraints on those variables has a satisfying assignment. Applying this algorithm to the problem of testing balance gives the following algorithm: sample Õ(1/ε) nodes, query the entire induced subgraph, and accept if and only if the induced subgraph is balanced. From [Soh12, Theorem 1] it then follows that this describes a one-sided property tester for balance. This proves the following theorem.

Theorem 2.

There exists a one-sided tester for balance in the dense signed graph model with query complexity Õ(1/ε).

We note that testing k-clusterability can be reduced to satisfiability in the very same manner, except that now the variables x_v ∈ {0, 1, ..., k − 1}. For constant k we can again use [Soh12, Theorem 1] to get a one-sided tester with query complexity Õ(1/ε).

A relaxation of k-clusterability for signed graphs is the notion of weak balance or clusterability [Dav67, BBC04]. A signed graph is clusterable if it is k-clusterable for some (a priori unknown) k ∈ [N]. Since there can be at most N clusters, we could test clusterability by testing N-clusterability. The satisfiability reduction from the last section however fails in this case, because typical satisfiability testers have a bad dependence on the domain size of the variables.

Rather, we argue that testing clusterability can be reduced to tolerantly testing k-clusterability for k ∈ O(1/ε). A tolerant tester [PRR06] is required to accept inputs that are ε₁-close to some property P, while rejecting inputs that are ε₂-far from P, for some parameters ε₁ < ε₂. Tolerant testing is closely related to approximating the distance from an object to a property (see [PRR06]). We use the following lemma.

Lemma 3.

If a signed graph is clusterable then it is 4ε-close to being clusterable into at most 1/ε clusters.

Proof. Let the partition V = P_1 ∪ P_2 ∪ ··· ∪ P_r denote a valid clustering of the graph. We define a new partition by merging different components: keep all components P_i of size |P_i| ≥ εN, and merge the remaining components into components of size between εN and 2εN (which is always possible). This yields a new partition with at most 1/ε components. Between the components there are only negative edges, and there are at most (2εN)²/ε = 4εN² edges within the components. Hence if we remove all the edges within the new components, then we obtain a new graph for which the new partition describes a clustering with at most 1/ε clusters, and which is 4ε-close to the original graph.

Now if a signed graph is ε-far from being clusterable, then clearly it is also ε-far from being, say, (8/ε)-clusterable. On the other hand, by this lemma (applied with ε/8 in place of ε), a graph that is ε/4-close to being clusterable will be (ε/4 + ε/2) = 3ε/4-close to a graph that is (8/ε)-clusterable. Hence we can use a tolerant tester for O(1/ε)-clusterability to tolerantly test clusterability. Equivalently, we can use an additive estimate (with error ±εN²) on the number of edges that need to be removed in order to make a signed graph k-clusterable, for k ∈ O(1/ε). Now we are in better shape to cast the problem as a satisfiability problem, similar to the last section. In Appendix A.1 we detail how to use the algorithm of Andersson and Engebretsen [AE02] to tolerantly test for O(1/ε)-clusterability using Õ(1/ε) queries. This yields the following theorem.

Theorem 4.

There exists a two-sided tolerant tester for clusterability in the dense signed graph model with query complexity Õ(1/ε).

Since the tester is tolerant, this also gives an algorithm to estimate the weak frustration index with additive error εN² using Õ(1/ε) queries.

Testing triangle freeness in the bounded-degree (unsigned) graph model is significantly easier to analyze than doing so for the dense model [Gol10]. In fact, the exact same argument applies to signed graphs and we will describe it here for completeness. Given query access to a signed graph, consider the following testing algorithm: pick Õ(1/ε) nodes and reject if any of them is part of a signed triangle. We can check whether a node is part of a signed triangle simply by querying for its neighbors, and the neighbors of its neighbors. In the bounded-degree model this takes only Õ(1) queries. If the graph is signed triangle free then we will never reject. On the other hand, if at least εdN edges have to be removed in order to make the graph signed triangle free, then at least εN nodes must be part of a signed triangle. With large probability the algorithm will sample such a node and consequently reject the graph. This yields the theorem below.

Theorem 5.

There exists a one-sided tester for signed triangle freeness in the bounded-degree signed graph model with query complexity O(1/ε).

Our algorithm for testing balance of bounded-degree signed graphs reduces the problem to testing bipartiteness in a related unsigned graph. Consider the following mapping from a signed graph G to an unsigned graph G′: (i) for every positive edge (u, v) create a new node w_(u,v) and replace the edge (u, v) by two unsigned edges (u, w_(u,v)) and (w_(u,v), v), and (ii) replace each of the remaining negative edges by an unsigned edge. The unsigned graph G′ has an odd cycle if and only if G has a cycle with an odd number of negative edges. As a consequence, G′ is bipartite if and only if G is balanced.

In fact, an even stronger property holds [Zas18, Proposition 2.2]: the (signed) frustration index of G is equal to the (unsigned) frustration index of G′. This implies the following lemma.

Lemma 6. If G is ε-far from balanced then G′ is ε/(d + 1)-far from bipartite.

Footnote: The reduction is reminiscent of the reduction from cycle-freeness testing to bipartiteness testing in [CGR+14]. The reduction in [CGR+14] is randomized and between unsigned graphs, whereas our reduction is deterministic and for signed graphs.

Figure 1: (a) Signed graph G (dotted lines are positive edges, solid lines are negative edges). (b) Bipartite unsigned graph G′ after mapping. (c) Bipartition of G′.

Proof.

For the second fact, let G be ε-far from being balanced, so that it has signed frustration index k ≥ εdN. The unsigned graph G′ then has unsigned frustration index k ≥ εdN. Now if G has m⁺ positive edges, then G′ has exactly N + m⁺ ≤ (d + 1)N vertices while keeping the same degree bound d. As a consequence, we can bound its frustration index k ≥ (ε/(d + 1))·(d + 1)dN ≥ (ε/(d + 1))·d(N + m⁺), so that G′ is indeed ε/(d + 1)-far from being bipartite.

This lemma proves that we can test ε-balancedness of G by testing ε/(d + 1)-bipartiteness of G′. For this we can use the following algorithm, which uses random walks to find odd cycles. It was proven to be a one-sided bipartiteness tester in the bounded-degree model by Goldreich and Ron [GR99].

Algorithm 1

Bipartiteness tester
  for O(1/ε) times do
    Pick a node v uniformly at random.
    Perform Õ(√N/ε) random walks starting from v, each of length Õ(1/ε).
    If some vertex u is reached both after an even path and after an odd path then reject.

It remains to prove that we can efficiently implement this tester on G′.

Lemma 7.

It is possible to implement Algorithm 1 on G′ using poly(log(N)/ε)·√N adjacency list queries to G.

Proof. We need to be able to select a uniformly random node from G′, and implement a random walk on G′. The latter is easy:

• If we are on an original node u in G′ then pick a random neighbor v of u in G. If (u, v) is negative, go to v, otherwise go to the new node indexed w_(u,v).

• If we are on a new node w_(u,v), go to either u or v with probability 1/2.

Footnote: A single step of the random walk from a node v (with degree d(v)) corresponds to the following process: with probability d(v)/d move to a uniformly random neighbor, and otherwise stay at v.
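The two walk rules above can be sketched as a single step function that simulates G′ without ever building it. This is a minimal illustration under stated assumptions: original nodes are integers, a new node w_(u,v) is encoded as the tuple `("mid", u, v)` with u < v, `signed_adj[u]` lists `(neighbor, sign)` pairs, and the lazy "stay at v" step from the footnote is omitted for brevity. The names `midpoint` and `walk_step` are hypothetical.

```python
import random


def midpoint(u, v):
    """Canonical label for the new node w_(u,v) placed on a positive edge."""
    return ("mid", min(u, v), max(u, v))


def walk_step(node, signed_adj, rng):
    """One (non-lazy) random-walk step on G', simulated on the signed graph G."""
    if isinstance(node, tuple):
        # On a new node w_(u,v): move to u or v, each with probability 1/2.
        _, u, v = node
        return rng.choice((u, v))
    neighbors = signed_adj[node]
    if not neighbors:
        return node  # isolated node: nowhere to go
    v, sign = rng.choice(neighbors)
    # Negative edge: step directly to v. Positive edge: step onto w_(node,v).
    return v if sign < 0 else midpoint(node, v)
```

Each step reads one adjacency-list entry of G, so a walk on G′ costs at most one query to G per step. For example, `walk_step(0, {0: [(1, +1)], 1: [(0, +1)]}, random.Random(0))` moves from node 0 onto the new node between 0 and 1.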

To select a uniformly random node from G′, do the following:

1. Pick (u, i) ∈ [N] × [d] uniformly at random and query for the i-th neighbor v of u in G. If u has fewer than i neighbors we reject.

2. If σ(u, v) = −, with probability 1/(4d(u)) output a random endpoint of (u, v) and terminate. Otherwise, go to the next step.

3. If σ(u, v) = +, with probability 1/4 output w_(u,v) and terminate. Otherwise, with probability 1/(3d(u)) output a random endpoint of (u, v).

With probability ≥ 1/(4d), this scheme returns a uniformly random node from G′ (and otherwise it rejects). To see this, first consider any original node u in G′. Any of its d(u) incident edges is picked with an equal probability 1/(dN). If a negative incident edge is picked, then u is returned in step 2 with probability 1/(4d(u)); if it is a positive incident edge then u is returned in step 3 with probability (1 − 1/4)/(3d(u)) = 1/(4d(u)). Hence the total probability that u is returned is d(u) · 1/(4d(u)) · 1/(dN) = 1/(4dN). Now consider a new node w_(u,v) in G′. In step 1 the edge (u, v) is picked with probability 1/(dN), after which w_(u,v) is returned with probability 1/4 in step 3, yielding a total probability 1/(4dN). Since there are N + m⁺ ≥ N nodes in G′, the total probability of returning a node is ≥ N/(4dN) = 1/(4d). The sampling scheme only requires a single query, and so we can sample a uniformly random node from G′ using at most 4d ∈ O(1) queries in expectation. By Chebyshev's inequality the total number of queries will be close to its expectation with overwhelming probability.

This proves the following theorem.

Theorem 8.

There exists a one-sided tester for balance in the bounded-degree signed graph model with query complexity Õ(√N / poly(ε)).

Since balancedness of signed graphs generalizes bipartiteness of unsigned graphs, the Ω̃(√N) lower bound for testing bipartiteness in the (unsigned) bounded-degree model [GR04, Theorem 7.1] also applies to testing balancedness in the (signed) bounded-degree model. As a consequence, the √N-dependency of our tester is optimal.

In this section we prove the existence of a one-sided property tester for clusterability in the bounded-degree signed graph model. We first note that, similar to the dense case, we can reduce the problem to testing O(1/ε)-clusterability. However, k-colorability for unsigned graphs is a special case of k-clusterability, and it is known not to be testable in the bounded-degree model [BOT02] (i.e., it requires Ω(N) queries). Instead, we use the forbidden subgraph characterization of clusterability by Davis [Dav67]:

Theorem 9 ([Dav67, Theorem 1]). A signed graph G is clusterable if and only if G contains no cycle with exactly one negative edge.

We will call such a cycle a bad cycle. This characterization is a crucial distinction between clusterability and the untestable k-clusterability (or its unsigned variant, k-colorability), which does not seem to have such a simple characterization.

Similar to the bipartiteness tester of Goldreich and Ron [GR99] we will try to find bad cycles by performing many random walks in G. Specifically, we simulate random walks on the unsigned subgraph G⁺ = (V, E⁺) induced by the positive edges E⁺ = {e ∈ E | σ(e) = +}. Starting from a random initial node, we perform many such random walks and we check for the existence of a negative edge between distinct random walks. Such a negative edge will necessarily yield a bad cycle, in which case we can safely reject the graph.

Algorithm 2

Tester for clusterability
for O(1/ε) times do
    Pick a random node s and run bad-cycle(s). If this returns a bad cycle, reject the graph.
bad-cycle(s):
    Perform Õ(√N / poly(ε)) random walks of length Õ(1 / poly(ε)) on G⁺, starting from s.
    Let K denote the set of all the nodes that are visited.
    If there is a negative edge between any pair of nodes in K, return the corresponding bad cycle.

We prove the following claim.

Theorem 10.

Algorithm 2 is a one-sided tester for clusterability with query complexity Õ(√N / poly(ε)).

The claim about the query complexity is easy to check. The total number of random walk steps is Õ(√N / poly(ε)), and a single random walk step can be implemented with Õ(1) queries. To check whether there exists a negative edge between any pair of nodes in K, it suffices to query the full (bounded) neighborhood of every node in K. This takes d|K| ∈ Õ(√N / poly(ε)) queries. The remainder of this section is used to prove correctness of the tester (which ultimately follows from Claim 14).

We will first describe the intuition behind the tester. To this end, assume that there is a decomposition V = V_1 ∪ ··· ∪ V_k as in Fig. 2 such that for each i the following holds:
1. V_i has few positive outgoing edges: |E⁺(V_i, V_i^c)| ≤ ε d|V_i|.

2. A random walk of length Õ(1 / poly(ε)) on G⁺, starting from any s ∈ V_i, ends uniformly at random inside V_i.

While such a decomposition does not generally exist, the existence of a closely related decomposition was proven by Goldreich and Ron [GR99].

Now assume that G is ε-far from being clusterable. Then we claim that there must be at least ε dN negative edges inside the partitions V_1, . . . , V_k. Indeed, if this were not the case, then we could find a valid clustering by removing these ≤ ε dN negative edges together with the ≤ ε dN positive edges between the partitions. This contradicts the fact that G is ε-far from being clusterable.

Now make the additional assumption that the number of negative edges |E⁻(V_i)| inside each partition V_i is Ω(ε|V_i|), and consider an arbitrary node s ∈ V_i. The probability that a pair of random walks on the positive edges, starting from s, results in a bad cycle can be lower bounded by the probability that the random walk endpoints u and v form a negative edge (u, v) ∈ E⁻. Since u and v are distributed uniformly over V_i, this probability is at least |E⁻(V_i)| / |V_i|² ∈ Ω(ε/N). Taking √(N/ε) independent random walks (and ignoring correlations), the total probability of finding a bad cycle then becomes Ω((√(N/ε))² · ε/N) ∈ Ω(1).

Figure 2: Bounded degree tester for clusterability. Positive edges E⁺ depicted as solid lines, negative edges E⁻ depicted as dashed lines. G is ε-close to clusterable if there are ≤ ε dN positive edges between the partitions and ≤ ε dN negative edges inside the partitions.

While the former section correctly captures the intuition behind the tester, the full proof of correctness is significantly more involved.
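The random-walk procedure of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the adjacency representation (`pos_adj` as a dict of neighbor lists for G⁺, `neg_adj` as a dict of neighbor sets for the negative edges) and the explicit parameters `repetitions`, `num_walks` and `walk_len` are all assumptions. The paper fixes these quantities only asymptotically, and its analysis may rely on a different (for example lazy) walk than the plain uniform-neighbor walk used here.

```python
import random


def bad_cycle(pos_adj, neg_adj, s, num_walks, walk_len, rng):
    """Walk on positive edges from s; return a negative edge inside the visited set, or None."""
    visited = {s}
    for _ in range(num_walks):
        node = s
        for _ in range(walk_len):
            nbrs = pos_adj.get(node, [])
            if not nbrs:  # no positive edge to follow: the walk is stuck
                break
            node = rng.choice(nbrs)
            visited.add(node)
    # Every visited node is joined to s by a positive path, so a negative edge
    # between two visited nodes closes a cycle with exactly one negative edge.
    for u in visited:
        for v in neg_adj.get(u, ()):
            if v in visited:
                return (u, v)
    return None


def clusterability_tester(nodes, pos_adj, neg_adj, repetitions, num_walks, walk_len, seed=0):
    """Return False (reject) if some run exposes a bad cycle, True (accept) otherwise."""
    rng = random.Random(seed)
    for _ in range(repetitions):
        s = rng.choice(nodes)
        if bad_cycle(pos_adj, neg_adj, s, num_walks, walk_len, rng) is not None:
            return False
    return True
```

Because the sketch rejects only after exhibiting a negative edge between two nodes reachable from s along positive edges, it never rejects a clusterable graph, which mirrors the one-sidedness claimed in Theorem 10.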
Luckily, most of the analysis runs similarly to that of Goldreich and Ron [GR99] (regarding the graph decomposition, but also regarding bounds on the correlation between distinct random walks). We can make the above intuition rigorous. In the following let G⁺ = (V, E⁺) denote the (unsigned) subgraph induced by the positive edges. The main idea of the decomposition is that for most of the vertices s in G⁺, we can find a subset S so that S has few outgoing edges and a short random walk from s mixes approximately uniformly over S. We can hence set the first partition V_1 = S. Now we would like to repeat the argument for the remaining graph H⁺ = G⁺[V − V_1], induced on the node subset V − V_1. The problem is that the next subset S′ ⊆ V − V_1 will be "good" for random walks in H⁺, but these can behave very differently from the original random walks in G⁺. This problem is dealt with by defining a Markov chain M(H⁺) on the unpartitioned subgraph H⁺ such that (i) we can use M(H⁺) to cut off a new partition S′ from H⁺, and (ii) the behavior of walks according to M(H⁺) is related to the behavior of the original random walks in G⁺. Details of the Markov chain M(H⁺) are given in Appendix B.1. The following lemma is proven in [GR99], with q_{s,v}(t) denoting the probability that t steps of the Markov chain M(H⁺), starting from s, end in v.

Lemma 11 ([GR99, Corollary 3 and Lemma 4.3]). Let H⁺ be a subgraph of G⁺ with at least εN/ vertices. Then for at least half of the vertices s in H⁺ there exist a subset of vertices S in H⁺, a value β ∈ Ω̃(ε) and an integer t ∈ Õ(1/ε) such that
1. The number of edges between S and the rest of H⁺ is at most εd|S|/ .
2. For every v ∈ S it holds that √ β |S||H⁺| ≤ q_{s,v}(t) ≤ ε √ β |S||H⁺| .

This is the key lemma that underlies the GR decomposition.
Under these conditions, we can prove that if the original signed graph G has many negative edges inside the subset S then the modified Markov chain M(H⁺) will find a bad cycle. This lemma is new, but its proof (which we defer to Appendix B.2) runs along the lines of the proof of [GR99, Lemma 4.5].

Lemma 12.

Let H⁺ be a subgraph of G⁺, s a vertex in H⁺ and S a subset of vertices in H⁺. Assume that there exist α > 0, F ≥ 1 and t such that α ≤ q_{s,v}(t) ≤ Fα for every v ∈ S. If the original graph G has at least εd|S|/ negative edges inside S then m ∈ Ω( F ε α √|S| ) runs of M(H⁺) over t steps and starting from s will return a bad cycle with probability at least . .

Ultimately we are of course interested in the behavior of random walks in G⁺, rather than that of M(H⁺). The following claim (proven in Appendix B.1) shows that both are closely related.

Claim 13.

Assume that there exist s and m such that m walks of M(H⁺) of length Õ(1/ε) and starting from s result in a bad cycle with probability at least . . Then m random walks in G⁺ of length Õ(1/ε) and starting from s will also result in a bad cycle with probability at least . .

We can prove correctness of the tester (Algorithm 2) by combining the ingredients from the previous section. Rather than proving that a graph G that is ε-far from clusterable will be rejected, we will prove that if G is accepted with large probability then G must be close to being clusterable. This is again similar to the proof of correctness in [GR99].

Claim 14.

If Algorithm 2 accepts a graph G with probability greater than / , then G must be ε-close to being clusterable.
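For small instances, the notion of clusterability that the claim refers to can be checked exactly using Theorem 9: a graph is clusterable precisely when no negative edge has both endpoints in the same connected component of G⁺. The sketch below is only a reference check under assumed conventions (nodes numbered 0 to num_nodes − 1, edges given as hypothetical `(u, v, sign)` triples with sign +1 or −1). It reads every edge, so it is not a sublinear tester and is not part of Algorithm 2.

```python
def is_clusterable(num_nodes, signed_edges):
    """Exact check via Davis's characterization, using union-find on positive edges."""
    parent = list(range(num_nodes))

    def find(x):
        # Find the component representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the endpoints of every positive edge: components of G+.
    for u, v, sign in signed_edges:
        if sign > 0:
            parent[find(u)] = find(v)

    # A negative edge inside one positive component closes a bad cycle.
    return all(find(u) != find(v) for u, v, sign in signed_edges if sign < 0)
```

When the check succeeds, the connected components of G⁺ themselves form a valid clustering.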

To prove this, let G be a graph that is accepted with probability greater than / . We say that a vertex s is good if bad-cycle(s) in Algorithm 2 returns a bad cycle with probability at most . . Otherwise it is bad. Since we reject with probability less than / , and we consider Ω(1/ε) starting vertices, there can be only εN/ bad vertices (for the appropriate constant in the Ω(·) notation). We will show that under these circumstances we can find a valid clustering by removing fewer than εdN edges. To this end, we will iteratively separate a subset S that has at most εd|S|/ positive outgoing edges and at most εd|S|/ negative internal edges. We call such a subset an ε-good cluster.

At a given step, let H⁺ denote the unpartitioned graph. We wish to invoke Lemma 11. Call a vertex s for which the lemma holds a "useful" vertex with respect to H⁺. While |H⁺| ≥ εN/ , the lemma ensures that there are ≥ εN/ useful vertices. Since there are at most εN/ bad vertices, this implies that there exists a vertex s that is both good and useful. By Lemma 11 there exists a subset S in H⁺ that has at most ε d|S| (positive) edges to the rest of H⁺. Moreover, by Lemma 12 and Claim 13, the set S is such that if the original signed graph G has at least εd|S| negative edges inside S then bad-cycle(s) will return a bad cycle with probability at least . . However, we assumed that s is a good vertex and so the latter probability can be at most . . This implies that G must have fewer than εd|S|/ negative edges inside S, and hence S is an ε-good cluster.

We repeat this process until |H⁺| < εN/ . If V_1, . . . , V_k denote the ε-good clusters that we have cut off, then we have a partition V = V_1 ∪ ··· ∪ V_k ∪ H such that the number of positive edges between the partitions is at most

d|H| + Σ_i |E⁺(V_i, V_i^c)| ≤ ε dN + Σ_i ε d|V_i| < εdN,

and the number of negative edges inside the partitions is at most

d|H| + Σ_i |E⁻(V_i)| ≤ ε dN + Σ_i ε d|V_i| < εdN.

Removing these fewer than εdN edges yields a valid clustering, so that G must be ε-close to clusterable. This proves Claim 14.

This work has benefited from discussions with Jop Briët, Aristides Gionis, Oded Goldreich and Christian Sohler. Florian Adriaens is supported by the ERC Advanced Grant REBOUND (834862), the EC H2020 RIA project SoBigData (871042), and the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. Simon Apers is supported in part by the Dutch Research Council (NWO) through QuantERA ERA-NET Cofund project QuantAlgo 680-91-034.

References

[AdKK03] Noga Alon, W. Fernandez de la Vega, Ravi Kannan, and Marek Karpinski. Random sampling and approximation of MAX-CSPs.

Journal of Computer and System Sciences, 67(2):212–243, 2003. Special Issue on STOC 2002.

[AE02] Gunnar Andersson and Lars Engebretsen. Property testers for dense constraint satisfaction programs on finite domains. Random Structures & Algorithms, 21(1):14–32, 2002.

[AFNS09] Noga Alon, Eldar Fischer, Ilan Newman, and Asaf Shapira. A combinatorial characterization of the testable graph properties: It's all about regularity. SIAM Journal on Computing, 39(1):143–167, 2009.

[AK02] Noga Alon and Michael Krivelevich. Testing k-colorability. SIAM J. Discret. Math., 15(2):211–227, February 2002.

[AS04] Noga Alon and Joel H Spencer. The probabilistic method. John Wiley & Sons, 2004.

[AS05] N. Alon and A. Shapira. A characterization of the (natural) graph properties testable with one-sided error. In , pages 429–438, 2005.

[AS08] Noga Alon and Asaf Shapira. Every monotone graph property is testable. SIAM Journal on Computing, 38(2):505–522, 2008.

[BBC04] Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering. Machine learning, 56(1):89–113, 2004.

[BGG+19] Francesco Bonchi, Edoardo Galimberti, Aristides Gionis, Bruno Ordozgoiti, and Giancarlo Ruffo. Discovering polarized communities in signed networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM '19, page 961–970, New York, NY, USA, 2019. Association for Computing Machinery.

[BOT02] Andrej Bogdanov, Kenji Obata, and Luca Trevisan. A lower bound for testing 3-colorability in bounded-degree graphs. In The 43rd Annual IEEE Symposium on Foundations of Computer Science, pages 93–102. IEEE, 2002.

[BSS10] Itai Benjamini, Oded Schramm, and Asaf Shapira. Every minor-closed property of sparse graphs is testable. Advances in mathematics, 223(6):2200–2218, 2010.

[BT04] Andrej Bogdanov and Luca Trevisan. Lower bounds for testing bipartiteness in dense graphs. In , pages 75–81. IEEE, 2004.

[CGR+14] Artur Czumaj, Oded Goldreich, Dana Ron, C Seshadhri, Asaf Shapira, and Christian Sohler. Finding cycles and trees in sublinear time. Random Structures & Algorithms, 45(2):139–184, 2014.

[CH56] D. Cartwright and F. Harary. Structural balance: a generalization of Heider's theory. Psychological review, 63(5):277–293, 1956.

[CSS09] Artur Czumaj, Asaf Shapira, and Christian Sohler. Testing hereditary properties of nonexpanding bounded-degree graphs. SIAM Journal on Computing, 38(6):2499–2510, 2009.

[Dav67] James A Davis. Clustering and structural balance in graphs. Human relations, 20(2):181–187, 1967.

[DEFI06] Erik D Demaine, Dotan Emanuel, Amos Fiat, and Nicole Immorlica. Correlation clustering in general weighted graphs. Theoretical Computer Science, 361(2-3):172–187, 2006.

[Fox11] Jacob Fox. A new proof of the graph removal lemma. Annals of Mathematics, pages 561–579, 2011.

[GGR98] Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. J. ACM, 45(4):653–750, July 1998.

[Gol10] Oded Goldreich. Introduction to testing graph properties. In Property testing, pages 105–141. Springer, 2010.

[Gol17] Oded Goldreich. Testing Graph Properties in the Dense Graph Model, page 162–212. Cambridge University Press, 2017.

[GR99] Oded Goldreich and Dana Ron. A sublinear bipartiteness tester for bounded degree graphs. Combinatorica, 19(3):335–373, 1999.

[GR04] Oded Goldreich and Dana Ron. Property testing in bounded degree graphs. Algorithmica, 32, 01 2004.

[GT01] O. Goldreich and L. Trevisan. Three theorems regarding testing graph properties. In Proceedings 42nd IEEE Symposium on Foundations of Computer Science, volume 1, pages 460–469, 2001.

[Har53] Frank Harary. On the notion of balance of a signed graph. Michigan Math. J., 2(2):143–146, 1953.

[Kas63] Pieter W Kasteleyn. Dimer statistics and phase transitions. Journal of Mathematical Physics, 4(2):287–293, 1963.

[Kön36] Dénes König. Theorie der endlichen und unendlichen Graphen: Kombinatorische Topologie der Streckenkomplexe, volume 16. Akademische Verlagsgesellschaft mbh, 1936.

[KSS19] Akash Kumar, C Seshadhri, and Andrew Stolman. Random walks and forbidden minors II: a poly(dε⁻¹)-query tester for minor-closed properties of bounded degree graphs. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 559–567, 2019.

[KSS20] Akash Kumar, C Seshadhri, and Andrew Stolman. Random walks and forbidden minors I: An n^{1/2+o(1)}-query one-sided tester for minor closed properties on bounded degree graphs. SIAM Journal on Computing, 2020.

[KSS21] Akash Kumar, C Seshadhri, and Andrew Stolman. Random walks and forbidden minors III: poly(d/ε)-time partition oracles for minor-free graph classes. 2021.

[LHK10] Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. Signed Networks in Social Media, page 1361–1370. Association for Computing Machinery, New York, NY, USA, 2010.

[NS13] Ilan Newman and Christian Sohler. Every property of hyperfinite graphs is testable. SIAM Journal on Computing, 42(3):1095–1112, 2013.

[PRR06] Michal Parnas, Dana Ron, and Ronitt Rubinfeld. Tolerant property testing and distance approximation. Journal of Computer and System Sciences, 72(6):1012–1042, 2006.

[RS78] Imre Z Ruzsa and Endre Szemerédi. Triple systems with no six points carrying three triangles. Combinatorics (Keszthely, 1976), Coll. Math. Soc. J. Bolyai, 18:939–945, 1978.

[Soh12] Christian Sohler. Almost optimal canonical property testers for satisfiability. In IEEE 53rd Annual Symposium on Foundations of Computer Science, pages 541–550. IEEE, 2012.

[TCAL16] Jiliang Tang, Yi Chang, Charu Aggarwal, and Huan Liu. A survey of signed network mining in social media. ACM Computing Surveys (CSUR), 49(3):1–37, 2016.

[TOG20] Ruo-Chun Tzeng, Bruno Ordozgoiti, and Aristides Gionis. Discovering conflicting groups in signed networks. Advances in Neural Information Processing Systems, 33, 2020.

[Zas18] Thomas Zaslavsky. Negative (and positive) circles in signed graphs: A problem collection. AKCE International Journal of Graphs and Combinatorics, 15(1):31–48, 2018.

A Technical details for dense signed graph model

A.1 Clusterability

The following theorem is proven by Andersson and Engebretsen [AE02].

Theorem 15 ([AE02, Theorem 2]). Consider a constraint family F = {f : D^ℓ → {0, 1}} over ℓ variables in domain D, and let Σ denote the maximum number of constraints that can be simultaneously satisfied. An ℓ-CSP-D instance I over n variables with constraint family F is described by a collection of constraints {(f, x_{i_1}, . . . , x_{i_ℓ})} with f ∈ F and i_1, . . . , i_ℓ ∈ [n]. It is possible to approximate the maximum number of satisfiable constraints max(I) up to error εn^k with probability at least 1 − δ using Õ( |F| Σ ℓ ε ) queries and time exp(Õ( Σ ℓ ε )).

The problem of k-clusterability is a special instance of this problem. Define the family F = {f⁺, f⁻ : [k]² → {0, 1}} by f⁺(x, y) = 1 if x = y and 0 otherwise, and f⁻(x, y) = 0 if x = y and 1 otherwise. Now given a signed graph G, we can define a collection I_G of constraints by adding constraint (f⁺, x, y) if (x, y) is a positive edge, and (f⁻, x, y) if (x, y) is a negative edge. We can query I_G using a single query to the adjacency matrix of G. Moreover, the k-frustration index of G is given by

|E(G)| − max(I_G),

with |E(G)| the number of edges in G. Using that Σ = 1, |F| = 2 and ℓ = 2 for the family F, it follows from Theorem 15 that we can find an εN approximation of max(I_G) using Õ(1/ε) queries and time exp(Õ(1/ε)). In addition we can easily approximate |E(G)| to additive error εN by randomly sampling entries of the adjacency matrix of G. Combining these gives an approximation algorithm for the k-frustration index with additive error εN, and hence a tolerant tester for k-clusterability.

B Technical details for clusterability testing in bounded-degree model

B.1 Modiﬁed Markov chain

In this section we describe the technical details of the modified Markov chain proposed by Goldreich and Ron [GR99]. Let G = (V, E, σ) be a signed graph and let G⁺ = (V, E⁺) be the subgraph induced on the positive edge set. Let H⁺ be a subgraph of G⁺, and let ℓ₁, ℓ₂ be integers. The boundary B(H⁺) of H⁺ consists of those vertices in H⁺ that have an edge in G⁺ that leaves H⁺. Let Ĥ⁺ be the graph obtained by appending to every boundary node v ∈ B(H⁺) an auxiliary path of length ℓ₁ with node set a_{v,1}, . . . , a_{v,ℓ₁}.

[Footnote to Appendix A.1] A query to the instance I takes the form Q = (f, x_{i_1}, . . . , x_{i_ℓ}) and returns 1 if Q ∈ I and 0 otherwise.

We will define a surjective mapping φ from random walks W in G⁺ of length L = ℓ₁ℓ₂ to walks φ(W) in Ĥ⁺ of length ℓ₁. If W = v_0, . . . , v_L, let i_1, . . . , i_k be the timesteps for which v_{i_j} ∈ H⁺. The mapping is defined essentially by contracting all length-(< ℓ₂) walks outside of H⁺, and routing any length-(≥ ℓ₂) walk outside of H⁺ onto an auxiliary path. More precisely:
• Contract: If W does not perform ℓ₂ or more consecutive steps outside of H⁺ before it has made ℓ₁ steps (in total) in H⁺, then φ(W) = v_{i_1}, . . . , v_{i_{ℓ₁}}.
• Contract and route:

In the other case, let i_r be the first index that precedes a walk of ≥ ℓ₂ consecutive steps outside of H⁺. Then φ(W) = v_{i_1}, . . . , v_{i_r}, a_{v_{i_r},1}, . . . , a_{v_{i_r},ℓ₁−i_r}.

Figure 3: Illustration of mapping φ from walks of length L = ℓ₁ℓ₂ on G⁺ to walks of length ℓ₁ on Ĥ⁺.

The distribution Pr_{G⁺}(W) over length-L walks W in G⁺ induces a distribution Pr_M(U) over length-ℓ₁ walks U in Ĥ⁺ by setting

Pr_M(U) = Σ_{W : φ(W) = U} Pr_{G⁺}(W).

We now define a Markov chain M(H⁺) on Ĥ⁺ such that length-ℓ₁ walks U of M(H⁺) have the same distribution Pr_M(U). In the definition of M(H⁺) we use the quantity p^H_{v,u}(t) for v, u ∈ H⁺, which denotes the probability that a random walk from v will take t − 1 steps outside of H⁺ and end in u at the t-th step. The Markov chain M(H⁺) is defined as follows:
• For every v, u ∈ H⁺: q_{v,u} = Σ_{t=1}^{ℓ₂−1} p^H_{v,u}(t).
• For every v ∈ B(H⁺):
  – q_{v,a_{v,1}} = Σ_{u ∈ H} Σ_{t ≥ ℓ₂} p^H_{v,u}(t),
  – for every ℓ, 1 ≤ ℓ < ℓ₁, q_{a_{v,ℓ},a_{v,ℓ+1}} = 1,
  – for every u ∈ H, q_{a_{v,ℓ₁},u} = q_{v,a_{v,1}}⁻¹ · Σ_{t ≥ ℓ₂} p^H_{v,u}(t).

The following claim states that if we find a bad cycle with the modified Markov chain, then we will also find a bad cycle using the original random walk. We say that a set of walks results in a bad cycle if the original graph G has a negative edge between two distinct vertices of the walks.

Claim 16.

Assume that there exist s and m such that m walks of M(H⁺) of length ℓ₁ and starting from s result in a bad cycle with probability at least . . Then m random walks in G⁺ of length L = ℓ₁ℓ₂ and starting from s will also result in a bad cycle with probability at least . .

Proof. Let U_1 and U_2 denote length-t walks of M(H⁺) that result in a bad cycle. If W_1 and W_2 are length-L walks on G⁺ with φ(W_1) = U_1 and φ(W_2) = U_2, then W_1 and W_2 will also result in a bad cycle. Now let I_bad(X_1, . . . , X_m) denote the indicator of whether the walks X_1, . . . , X_m (in G⁺ or M(H⁺)) form a bad cycle. By our former remark we know that I_bad(W_1, . . . , W_m) ≥ I_bad(φ(W_1), . . . , φ(W_m)). We can lower bound the probability that m walks in G⁺ form a bad cycle:

Σ_{W_1,...,W_m} Pr_{G⁺}(W_1) ··· Pr_{G⁺}(W_m) · I_bad(W_1, . . . , W_m)
  = Σ_{U_1,...,U_m} [ Σ_{W_1 : φ(W_1) = U_1} ··· Σ_{W_m : φ(W_m) = U_m} Pr_{G⁺}(W_1) ··· Pr_{G⁺}(W_m) · I_bad(W_1, . . . , W_m) ]
  ≥ Σ_{U_1,...,U_m} [ Σ_{W_1 : φ(W_1) = U_1} ··· Σ_{W_m : φ(W_m) = U_m} Pr_{G⁺}(W_1) ··· Pr_{G⁺}(W_m) · I_bad(U_1, . . . , U_m) ]
  = Σ_{U_1,...,U_m} Pr_M(U_1) ··· Pr_M(U_m) · I_bad(U_1, . . . , U_m)
  ≥ . ,

which proves our claim.

B.2 Sufficient condition for bad cycle

Lemma 12.

Let H⁺ be a subgraph of G⁺, s a vertex in H⁺ and S a subset of vertices in H⁺. Assume that there exist α > 0, F ≥ 1 and t such that α ≤ q_{s,v}(t) ≤ Fα for every v ∈ S. If the original graph G has at least εd|S|/ negative edges inside S then m ∈ Ω( F ε α √|S| ) runs of M(H⁺) over t steps and starting from s will return a bad cycle with probability at least . .

Proof. For 1 ≤ i, j ≤ m, let η_{i,j} be the random variable so that η_{i,j} = 1 if the i-th and j-th walk form a bad cycle and otherwise η_{i,j} = 0. We will bound the probability that we do not find a bad cycle, which is Pr( Σ i