A mixed-integer linear programming approach for soft graph clustering

Abstract

This paper proposes a Mixed-Integer Linear Programming approach for the Soft Graph Clustering Problem. This is the first method that simultaneously allocates membership proportion for vertices that lie in multiple clusters, and that enforces an equal balance of the cluster memberships. Compared to ([Palla et al., 2005], [Derenyi et al., 2005], [Adamcsek et al., 2006]), the clusters found in our method are not limited to k-clique neighbourhoods. Compared to ([Hope and Keller, 2013]), our method can produce non-trivial clusters even for a connected unweighted graph.



Vicky Mak-Hau , John Yearwood

School of Information Technology, Deakin University, Waurn Ponds, Vic. 3215, Geelong, Australia
{vicky.mak, john.yearwood}@deakin.edu.au

Consider an undirected graph $G = (V, E)$ with $V$ the set of vertices and $E = \{(i, j) \mid i, j \in V, i < j\}$ the set of edges. Each edge $e \in E$ is associated with a weight $w_e \in \mathbb{R}$ that indicates the similarity between its two end vertices: the larger the weight, the more "similar" the two vertices are. The hard graph clustering (HGC) problem is to create distinct partitions (clusters, or communities) of the set of vertices according to their similarities, i.e., to form $V_1, \ldots, V_k$, where $\bigcup_{i=1,\ldots,k} V_i = V$ and $V_i \cap V_j = \emptyset$ for all $i, j \in \{1, \ldots, k\}$, $i \neq j$. For a thorough literature review of graph clustering problems, see, e.g., [Schaeffer, 2007], and for fast algorithms for large-scale networks, see, e.g., [Girvan and Newman, 2002; Clauset et al., 2004; Rosvall and Bergstrom, 2008]. For datasets, see, e.g., the SNAP network datasets [sna, ] and the SNAP biomedical datasets [Marinka Zitnik and Leskovec, 2018].

The soft graph clustering (SGC) problem (also known as fuzzy graph clustering), on the other hand, allows clusters to overlap: a vertex may be a member of more than one cluster. SGC has numerous applications, for example in brain research, social network research, natural language processing, and citation and collaboration networks.
A precise problem definition of the SGC varies with the application, and sometimes it may not be possible to provide one.

This paper studies the combinatorial optimisation problem in which we are required to determine: 1) the composition of each of the clusters; and 2) for each vertex that belongs to more than one cluster, how the membership is distributed amongst the clusters (we denote this by $x_{ic}$, for Vertex $i$ in Cluster $c$, hence $\sum_c x_{ic} = 1$ for each clustered vertex $i \in V$). We consider the case that an equal balance of the cluster total vertex memberships is desirable, and that not all vertices are required to be in a cluster. We consider two equally important objectives: 1) to minimize the sum of inter-cluster edge weights (cut across clusters); and 2) to maximize the sum of intra-cluster edge weights (cluster association).

There are a number of existing soft clustering algorithms, each designed to suit different applications; see, e.g., CFinder of [Palla et al., 2005] (see also [Derényi et al., 2005], [Adamcsek et al., 2006]), the MaxMax Algorithm of [Hope and Keller, 2013], the WATSET methods of [Ustalov et al., 2018] for NLP, the Chinese Whispers method of [Biemann, 2006], the Betweenness-based method of [Pinney and Westhead, 2006], and the Purifying and Filtering the Coupling Matrix approach of [Liu and Foroushani, 2016]. Of these methods, [Biemann, 2006], [Pinney and Westhead, 2006], [Liu and Foroushani, 2016], and [Ustalov et al., 2018] are designed for unweighted graphs only (i.e., graphs with unit edge weight). The MaxMax Algorithm is designed for weighted undirected graphs. For unweighted graphs, however, it will return a trivial solution: each connected component of the graph will be a cluster.
The CFinder is based on finding $k$-clique neighbourhoods. The mixed-integer linear programming method we propose in this paper is able to accommodate both weighted and unweighted graphs, with a small modification required for the latter. In the preliminary experiments section, we compare and contrast the different methods.

We are not aware of any mixed-integer linear programming (MILP) models for the SGC problems. There are, however, MILP models for other graph clustering, machine learning, and data classification problems. The article [Bertsimas and Shioda, 2007] presents MILP formulations for classification and regression. The idea for classification, e.g., is to partition Class 1 points into $K$ disjoint subsets by finding the hyperplanes that describe the partitioning polyhedrons such that no Class 0 points can be expressed as a convex combination of the Class 1 points in each partition. For general clustering problems, [Sağlam et al., 2006] proposes a MILP formulation where one wishes to partition a data set into $k$ (a predetermined number of) clusters. The objective is to minimize the maximum diameter of the generated clusters in order to obtain evenly compact clusters. Essentially the method is an IP-based heuristic, with some variables fixed by the solution of a maximal independent set of size $k$, where each member of this set is a seed member in one of the $k$ clusters. The IP model (which is in fact a bilinear model, but linearized using standard linearization strategies) is then solved to obtain an optimal solution to the general clustering problem. Other MILP-based work can be found in, e.g., [Gilpin et al., 2013] and [Ye, 2007] for hierarchical clustering. The latter presents an application in recommendation systems. For the Clique Covering Problem (CCP), an NP-hard combinatorial optimisation problem where an undirected graph is to be partitioned to form complete subgraphs, [Miyauchi et al., 2018] proposes a compact ILP formulation for a relaxed problem, as well as a post-optimization repair procedure and a proof of optimality for the final solution to the original problem.

As far as we are aware, this paper is the first to propose a methodology for SGC that: i) deals with undirected graphs with general integer edge weights $w_e$ and truly takes the values of $w_e$ into the optimization process, in the sense that the larger the value is, the more favourably it will be considered, and at the same time also deals with unweighted graphs (i.e., graphs with unit edge weight); ii) simultaneously allocates membership proportion (the $x_{ic}$ values) for vertices that lie in multiple clusters; and iii) enforces an equal balance of the clusters, so that the sums of vertex memberships of the clusters are roughly the same. We propose an approach that is based on a polynomial-size MILP model, beginning with a small value for $K$; we enforce that the graph has at least $K$ clusters. One can apply an adaptive approach to find the best value of $K$ iteratively, but the focus of this research is to solve an instantaneous SGC problem with a given $K$.

The method of [Palla et al., 2005] requires obtaining $E' = \{e \in E \mid w_e > w^*\}$, and the graph is subsequently clustered by finding $\kappa$-clique neighbourhoods on an unweighted graph $H = (V, E')$. Our method does not require the finding of $\kappa$-cliques, and it takes the values of $w_e$ into account during optimization. Comparing with the method of [Hope and Keller, 2013] (MaxMax), for unweighted graphs, if the graph is connected, then MaxMax will produce only one cluster, which is the entire graph. Our method, however, can deal with unweighted graphs by converting them into weighted ones via a simple transformation.

In Section 2.1, we present our basic MILP model by considering a number of standard requirements for the SGCP.
In Section 2.2, we discuss our strategies for graph connectivity. In Section 2.3, we discuss two objectives: 1) minimizing the total inter-cluster cut, and 2) maximizing the total intra-cluster association. In Section 3, we present preliminary numerical results. We then conclude our findings and discuss future research directions in Section 4.

We first introduce the notation used in this paper. Let:
• $A = \{a_{ij} \in \{0, 1\} \mid i, j \in V\}$ be the adjacency matrix of $G$;
• $M_w = \max\{w_e \mid e \in E\}$ be the maximum edge weight;
• $K$ be the number of clusters;
• $\mathcal{C} = \{c_1, \ldots, c_K\}$ be the set of clusters;
• $y_{i,c} \in \{0, 1\}$ be a binary decision variable with $y_{i,c} = 1$ indicating Vertex $i$ is a member of Cluster $c$;
• $x_{i,c} \in [0, 1]$ be a continuous decision variable indicating the membership of Vertex $i$ in Cluster $c$;
• $\kappa(c_1, c_2)$ be the cut between clusters $c_1$ and $c_2$;
• $0 < \mu < 1$ be a predetermined minimum membership if a vertex is a member of a cluster;
• $0 < \delta < 1$ be a predetermined tolerance for the equal balance of cluster membership;
• $0 < \nu < 1$ be a predetermined maximum overlap factor.

Now we introduce the constraints. First, we have a set of Membership Constraints. The membership of Vertex $i$ in Cluster $c$ can only be non-zero if it is a member of $c$, and when it is, the membership must be no less than a predetermined value.

$$x_{i,c} \leq y_{i,c}, \quad \forall i \in V,\ \forall c \in \mathcal{C} \tag{1}$$
$$x_{i,c} \geq \mu\, y_{i,c}, \quad \forall i \in V,\ \forall c \in \mathcal{C} \tag{2}$$

Let $L_i$, for each $i \in V$, be an auxiliary binary variable with $L_i = 1$ indicating $i$ is in at least one of the clusters and $L_i = 0$ otherwise. We require that the sum of memberships for any vertex over all clusters is exactly 1 if the vertex is a member of at least one cluster and 0 otherwise.

$$y_{i,c} \leq L_i, \quad \forall i \in V,\ \forall c \in \mathcal{C} \tag{3}$$
$$L_i \leq \sum_{c \in \mathcal{C}} y_{i,c}, \quad \forall i \in V \tag{4}$$
$$\sum_{c \in \mathcal{C}} x_{i,c} = L_i, \quad \forall i \in V \tag{5}$$

We consider an Equal Balance Requirement where the sums of memberships in all clusters are "roughly" the same. As far as we are aware, this is the first method that considers such a requirement.
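To make the Membership Constraints (1), (2), and (5) concrete, the following is a minimal Python sketch that checks a candidate solution against them. It is not the authors' implementation: the function name `membership_violations`, the dictionary layout of `x` and `y`, the numerical tolerance, and the sample data are all assumptions made for illustration. $L_i$ is derived from $y$ in the way (3) and (4) force it to be.

```python
def membership_violations(x, y, mu, tol=1e-9):
    """List which of constraints (1), (2), (5) a candidate (x, y) violates.

    x[i][c] is the membership of vertex i in cluster c (a float in [0, 1]);
    y[i][c] is 1 if vertex i is a member of cluster c and 0 otherwise.
    """
    violations = []
    for i in x:
        # (3)-(4): L_i = 1 exactly when i belongs to at least one cluster.
        L_i = 1 if any(y[i][c] == 1 for c in y[i]) else 0
        for c in x[i]:
            if x[i][c] > y[i][c] + tol:        # (1): x_{i,c} <= y_{i,c}
                violations.append((1, i, c))
            if x[i][c] < mu * y[i][c] - tol:   # (2): x_{i,c} >= mu * y_{i,c}
                violations.append((2, i, c))
        if abs(sum(x[i].values()) - L_i) > tol:  # (5): memberships sum to L_i
            violations.append((5, i))
    return violations


# Hypothetical data: vertex "u" is split between clusters "a" and "b",
# vertex "v" is left unclustered.
y = {"u": {"a": 1, "b": 1}, "v": {"a": 0, "b": 0}}
x = {"u": {"a": 0.9, "b": 0.1}, "v": {"a": 0.0, "b": 0.0}}
print(membership_violations(x, y, mu=0.1))
```

A checker of this kind is only a validation aid for solutions returned by a solver; it does not replace the MILP itself.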
The Equal Balance Requirement can be modelled as below.

$$(1 - \delta) \sum_{i \in V} x_{i,c_1} \leq \sum_{i \in V} x_{i,c_2} \leq (1 + \delta) \sum_{i \in V} x_{i,c_1}, \quad \forall c_1, c_2 \in \mathcal{C},\ c_1 \neq c_2 \tag{6}$$

Next, we consider the Overlap Cardinality Constraints. Let $t^i_{c_1,c_2}$ be an auxiliary binary decision variable such that $t^i_{c_1,c_2} = 1$ if and only if $i$ is a member of both clusters $c_1, c_2$, i.e., $y_{i,c_1} = y_{i,c_2} = 1$. We have that:

$$y_{i,c_1} + y_{i,c_2} \leq t^i_{c_1,c_2} + 1, \quad \forall i \in V,\ c_1, c_2 \in \mathcal{C},\ c_1 \neq c_2 \tag{7}$$
$$t^i_{c_1,c_2} \leq y_{i,c_1}, \quad \forall i \in V,\ c_1, c_2 \in \mathcal{C},\ c_1 \neq c_2 \tag{8}$$
$$t^i_{c_1,c_2} \leq y_{i,c_2}, \quad \forall i \in V,\ c_1, c_2 \in \mathcal{C},\ c_1 \neq c_2 \tag{9}$$

Given any pair of clusters $c_1, c_2 \in \mathcal{C}$, with $c_1 \neq c_2$, the number of vertices that are in both clusters cannot be larger than a predetermined fraction, $\nu$, of the cardinality of either of the clusters.

$$\sum_{i \in V} t^i_{c_1,c_2} \leq \nu \sum_{i \in V} y_{i,c_1}, \quad \forall c_1, c_2 \in \mathcal{C},\ c_1 \neq c_2 \tag{10}$$
$$\sum_{i \in V} t^i_{c_1,c_2} \leq \nu \sum_{i \in V} y_{i,c_2}, \quad \forall c_1, c_2 \in \mathcal{C},\ c_1 \neq c_2 \tag{11}$$

To calculate the Inter-cluster Cuts between $c_1$ and $c_2$, we require auxiliary binary variables and nonlinear terms. First, let $\eta^{i,j}_{c_1,c_2}$ be a binary variable such that $\eta^{i,j}_{c_1,c_2} = 1$ if both of $i, j$ are in the intersection of clusters $c_1$ and $c_2$.

$$t^i_{c_1,c_2} + t^j_{c_1,c_2} \leq \eta^{i,j}_{c_1,c_2} + 1, \quad \forall i \neq j \in V,\ c_1 \neq c_2 \in \mathcal{C} \tag{12}$$
$$\eta^{i,j}_{c_1,c_2} \leq t^i_{c_1,c_2}, \quad \forall i \neq j \in V,\ c_1 \neq c_2 \in \mathcal{C} \tag{13}$$
$$\eta^{i,j}_{c_1,c_2} \leq t^j_{c_1,c_2}, \quad \forall i \neq j \in V,\ c_1 \neq c_2 \in \mathcal{C} \tag{14}$$

We then use a binary variable $s^e_{c_1,c_2}$ to indicate the existence of an edge (cut) $e = (i, j)$ (i.e., $a_{ij} = 1$) with $i$ in $c_1$ and $j$ in $c_2$, but not both in the intersection of $c_1$ and $c_2$ (otherwise the "cut" should not be counted).
The constraints are as below. For each pair of distinct vertices $i, j \in V$, $i \neq j$, and each pair of distinct clusters $c_1, c_2 \in \mathcal{C}$, $c_1 \neq c_2$, we have that:

$$y_{i,c_1} + y_{j,c_2} + a_{ij} + (1 - \eta^{i,j}_{c_1,c_2}) \leq s^e_{c_1,c_2} + 3 \tag{15}$$
$$s^e_{c_1,c_2} \leq y_{i,c_1} \tag{16}$$
$$s^e_{c_1,c_2} \leq y_{j,c_2} \tag{17}$$
$$s^e_{c_1,c_2} \leq a_{ij} \tag{18}$$
$$s^e_{c_1,c_2} \leq 1 - \eta^{i,j}_{c_1,c_2} \tag{19}$$

Now, the cut between two distinct vertices $i, j \in V$ across two distinct clusters $c_1, c_2 \in \mathcal{C}$, if edge $e = (i, j) \in E$ exists, is defined by:

$$w_e\, v^e_{c_1,c_2}, \quad \text{for } v^e_{c_1,c_2} = (x_{i,c_1} + x_{j,c_2})\, s^e_{c_1,c_2} \tag{20}$$

The terms $x_{i,c_1} s^e_{c_1,c_2}$ and $x_{j,c_2} s^e_{c_1,c_2}$ are bilinear, and can be linearized by introducing auxiliary non-negative continuous variables $\tau^{e,i}_{c_1,c_2} \geq 0$ and $\tau^{e,j}_{c_1,c_2} \geq 0$ and the following constraints, which force $\tau^{e,i}_{c_1,c_2} = x_{i,c_1}$ when $s^e_{c_1,c_2} = 1$ and $\tau^{e,i}_{c_1,c_2} = 0$ otherwise (and similarly for $\tau^{e,j}_{c_1,c_2}$). For each $e = (i, j) \in E$, $c_1, c_2 \in \mathcal{C}$, $c_1 \neq c_2$, we have that:

$$\tau^{e,i}_{c_1,c_2} \leq x_{i,c_1} \tag{21}$$
$$\tau^{e,i}_{c_1,c_2} \leq s^e_{c_1,c_2} \tag{22}$$
$$\tau^{e,i}_{c_1,c_2} \geq x_{i,c_1} - (1 - s^e_{c_1,c_2}) \tag{23}$$
$$\tau^{e,j}_{c_1,c_2} \leq x_{j,c_2} \tag{24}$$
$$\tau^{e,j}_{c_1,c_2} \leq s^e_{c_1,c_2} \tag{25}$$
$$\tau^{e,j}_{c_1,c_2} \geq x_{j,c_2} - (1 - s^e_{c_1,c_2}) \tag{26}$$

The cut $\kappa(c_1, c_2)$ is calculated by the following linear term.

$$\kappa(c_1, c_2) = \sum_{(i,j) \in E} w_{i,j} \left( \tau^{e,i}_{c_1,c_2} + \tau^{e,j}_{c_1,c_2} \right) \tag{27}$$

Now, we consider the Intra-cluster Association calculations. Let $z^c_{i,j}$ be an auxiliary binary variable with $z^c_{i,j} = 1$ if $i, j$ are both in $c$ and the edge $e = (i, j)$ exists in $E$. The constraints are given by:

$$y_{i,c} + y_{j,c} + a_{ij} \leq z^c_{i,j} + 2, \quad (i, j) \in E,\ \forall c \in \mathcal{C} \tag{28}$$
$$z^c_{i,j} \leq y_{i,c}, \quad (i, j) \in E,\ \forall c \in \mathcal{C} \tag{29}$$
$$z^c_{i,j} \leq y_{j,c}, \quad (i, j) \in E,\ \forall c \in \mathcal{C} \tag{30}$$
$$z^c_{i,j} \leq a_{ij}, \quad (i, j) \in E,\ \forall c \in \mathcal{C} \tag{31}$$

The intra-cluster association of $c \in \mathcal{C}$ is given by

$$A(c) = \sum_{(i,j) \in E} w_{i,j}\, (x_{i,c} + x_{j,c})\, z^c_{i,j}$$

We linearize the association using auxiliary continuous variables $\pi^{c,i}_{i,j}$ and $\pi^{c,j}_{i,j}$ to capture the memberships of two vertices in the same cluster should an edge exist between them. That is, when $z^c_{i,j} = 1$, $\pi^{c,i}_{i,j} = x_{i,c}$; otherwise, $\pi^{c,i}_{i,j} = 0$ (similarly for $\pi^{c,j}_{i,j}$). The constraints are as below.
$$\pi^{c,i}_{i,j} \leq x_{i,c}, \quad \forall e = (i, j) \in E,\ c \in \mathcal{C} \tag{32}$$
$$\pi^{c,i}_{i,j} \leq z^c_{i,j}, \quad \forall e = (i, j) \in E,\ c \in \mathcal{C} \tag{33}$$
$$\pi^{c,i}_{i,j} \geq x_{i,c} - (1 - z^c_{i,j}), \quad \forall e = (i, j) \in E,\ c \in \mathcal{C} \tag{34}$$
$$\pi^{c,j}_{i,j} \leq x_{j,c}, \quad \forall e = (i, j) \in E,\ c \in \mathcal{C} \tag{35}$$
$$\pi^{c,j}_{i,j} \leq z^c_{i,j}, \quad \forall e = (i, j) \in E,\ c \in \mathcal{C} \tag{36}$$
$$\pi^{c,j}_{i,j} \geq x_{j,c} - (1 - z^c_{i,j}), \quad \forall e = (i, j) \in E,\ c \in \mathcal{C} \tag{37}$$

The association within a cluster $c$, denoted by $A(c)$, is calculated as follows:

$$A(c) = \sum_{(i,j) \in E} w_{i,j} \left( \pi^{c,i}_{i,j} + \pi^{c,j}_{i,j} \right) \tag{38}$$

Clusters are required to be connected. It is likely that this can only be achieved with exponentially many variables and solved using a branch-and-price approach, or with exponentially many constraints and solved using a branch-and-cut approach. Our approach here is not an exact one, as we wish to keep the formulation compact (i.e., polynomial in size). We derive a number of constraints that capture a few conditions for graph connectivity that are necessary but not sufficient. When there is a connectivity violation, violation elimination constraints can be added in a lazy fashion. (Lazy constraints is a technical term in integer programming: hard (and often exponentially many) constraints are relaxed, and are only added when the current integer optimal solution to the relaxed problem violates them. Usually only a very small number of them are violated. The problem is then re-optimized, and the procedure recurs until there are no more violated hard constraints.) In our preliminary tests, where we used randomly generated undirected graphs (see the Preliminary Numerical Results section), most of the problem instances produced connected clusters.

First of all, the number of edges in a connected undirected graph cannot be smaller than the number of vertices of the graph minus one. We define $\gamma^c_{i,j} \in \{0, 1\}$ to be a span variable that can only be one when $i$ and $j$ are both in Cluster $c$ and the edge $(i, j)$ exists in $E$. We have that:

$$\gamma^c_{i,j} \leq a_{ij}, \quad (i, j) \in E,\ \forall c \in \mathcal{C} \tag{39}$$
$$\gamma^c_{i,j} \leq y_{i,c}, \quad (i, j) \in E,\ \forall c \in \mathcal{C} \tag{40}$$
$$\gamma^c_{i,j} \leq y_{j,c}, \quad (i, j) \in E,\ \forall c \in \mathcal{C} \tag{41}$$

and that

$$\sum_{i \in V} y_{i,c} - 1 \leq \sum_{(i,j) \in E} \gamma^c_{i,j}, \quad \forall c \in \mathcal{C} \tag{42}$$

We also require that every vertex in each cluster must be connected to at least one other vertex in the same cluster.

$$y_{i,c} \leq \sum_{j \in V \setminus \{i\} :\, (i,j) \in E} z^c_{ij}, \quad \forall i \in V,\ \forall c \in \mathcal{C} \tag{43}$$

However, Constraints (39)–(43) are not enough to eliminate multiple loops within a cluster. A cluster may contain vertices $\{1, 2, 3, 4, 5, 6\}$, but the constraints cannot eliminate the formation of subgraphs $\{1, 2, 3\}$ and $\{4, 5, 6\}$ within the cluster.

Therefore, we borrowed the time constraints idea for Asymmetric Travelling Salesman Problem (ATSP) subtour elimination ([Miller et al., 1960]). Let $t_i \geq 0$, $i \in V$, be a decision variable indicating the "time of arrival" at Vertex $i$. Suppose $\{1, 2, 3\}$ is a strict subset of a cluster $c$. To prevent the relevant span variables from forming a loop, we require that $\gamma^c_{i,j} = 1$ only if $t_j = t_i + 1$, for each pair of distinct $i, j \in V$, $i \neq j$, and each $c \in \mathcal{C}$.
We have that:

$$-(|V| + 1)(1 - \gamma^c_{i,j}) + 1 \leq t_j - t_i \tag{44}$$
$$t_j - t_i \leq 1 + |V| (1 - \gamma^c_{i,j}) \tag{45}$$

E.g., consider the vertices $\{1, 2, 3\}$, a strict subset of $c$. A loop with edges $(1, 2)$, $(2, 3)$, $(1, 3)$ is a violation of the time constraints, as $t_3$ cannot be equal to $t_1 + 1$ and $t_1 + 2$ simultaneously. Constraints (44) and (45) can eliminate certain types of loops formed by the span variables, but not all. In any case, adding them gives the span variables a better chance of making a full span in a cluster, and thus of achieving connectivity.

Notice that the time constraints do not cut off feasible solutions: even if a variable $\gamma^c_{i,j}$ is forced to be 0, there is no impact on the values of $y_{ic}$ and $y_{jc}$, as (39)–(41) do not induce a bi-conditional relation. In our preliminary numerical experiments, the time constraints were computationally expensive, and often we did not obtain disconnected clusters with just (39)–(43) alone.

When the number of clusters $K$ is a hard constraint and there exists a disconnected cluster, in the case of maximizing total association, e.g., one can consider adding the following cut in a lazy fashion and re-optimizing. Let $\mathcal{I}^*$ be the set of $y$-variables with a value of 1 in the optimal solution. We add:

$$\sum_{(i,c) \in \mathcal{I}^*} y_{i,c} \leq |\mathcal{I}^*| - 1. \tag{46}$$

Two commonly considered objective functions are: 1) minimize the sum of inter-cluster cuts, and 2) maximize the sum of intra-cluster associations, given by (47) and (48) below respectively.

$$\min z = \sum_{c_1, c_2 \in \mathcal{C},\ c_1 \neq c_2} \kappa(c_1, c_2) \tag{47}$$
$$\max w = \sum_{c \in \mathcal{C}} A(c) \tag{48}$$

Notice that it is necessary to enforce a minimum cluster size, otherwise a trivial optimal solution to (47) is to assign no vertices to any clusters. Thus we have:

$$\sum_{c \in \mathcal{C}} \sum_{i \in V} y_{i,c} \geq \sigma |V|, \tag{49}$$

for $0 < \sigma < 1$ a predetermined value.

The two objectives should be considered simultaneously. Some may consider minimizing the sum of the ratios of inter-cluster cuts to intra-cluster associations over all pairs of distinct clusters.
$$\min z = \sum_{c_1, c_2 \in \mathcal{C},\ c_1 \neq c_2} \frac{\kappa(c_1, c_2)}{A(c_1) + A(c_2)} \tag{50}$$

Figure 1 shows how the sum of ratios of inter-cluster cuts and intra-cluster associations changes. The first data point on the left is obtained when we optimize (47) and set $w_1$ to be the value of $\sum_{c \in \mathcal{C}} A(c)$ in the optimal solution. We can see that the total intra-cluster association is also small, and so is the value of (50). The last data point on the right is obtained when we optimize (48) and set $w_2$ to be the optimal objective value. The intermediate points are obtained by minimizing (47) subject to $\sum_{c \in \mathcal{C}} A(c) \geq \ell_j$, for thresholds $\ell_j$ proportional to $j\,(w_2 - w_1)$, $j = 1, 2, \ldots$

Figure 1: Changes of Total Association, Total Cut, and Sum of Ratios.
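The quantities in (27), (38), and (50) can be evaluated directly for a given assignment, which is useful for checking the objective values reported by a solver. The sketch below is illustrative only and is not the authors' code: the function names, the dictionary-based data layout, and the four-vertex example are assumptions. It also assumes that $s^e_{c_1,c_2}$, $z^c_{i,j}$, $\tau$, and $\pi$ take the values that constraints (12)–(19), (21)–(26), and (28)–(37) force them to take.

```python
def in_both(y, i, c1, c2):
    """t^i_{c1,c2}: vertex i is a member of both clusters."""
    return y[i][c1] == 1 and y[i][c2] == 1


def cut(edges, x, y, c1, c2):
    """kappa(c1, c2) of (27) for an ordered pair of distinct clusters."""
    total = 0.0
    for (i, j), w in edges.items():
        eta = in_both(y, i, c1, c2) and in_both(y, j, c1, c2)
        s = y[i][c1] == 1 and y[j][c2] == 1 and not eta   # (15)-(19)
        if s:
            total += w * (x[i][c1] + x[j][c2])            # (20)
    return total


def association(edges, x, y, c):
    """A(c) of (38): weighted memberships over edges with both ends in c."""
    total = 0.0
    for (i, j), w in edges.items():
        if y[i][c] == 1 and y[j][c] == 1:                 # z^c_{i,j} = 1
            total += w * (x[i][c] + x[j][c])
    return total


def ratio_objective(edges, x, y, clusters):
    """Objective (50). Assumes A(c1) + A(c2) > 0 for every pair."""
    assoc = {c: association(edges, x, y, c) for c in clusters}
    return sum(cut(edges, x, y, c1, c2) / (assoc[c1] + assoc[c2])
               for c1 in clusters for c2 in clusters if c1 != c2)


# Hypothetical example: a weighted path 0-1-2-3 split into two hard clusters.
edges = {(0, 1): 3, (1, 2): 2, (2, 3): 5}
clusters = ["a", "b"]
y = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 0},
     2: {"a": 0, "b": 1}, 3: {"a": 0, "b": 1}}
x = {i: {c: float(y[i][c]) for c in clusters} for i in y}
print(cut(edges, x, y, "a", "b"), association(edges, x, y, "a"),
      ratio_objective(edges, x, y, clusters))
```

Because the edges are stored with $i < j$ and the cluster pair is ordered, only one orientation of a crossing edge contributes to each $\kappa(c_1, c_2)$, mirroring the summation over $(i, j) \in E$ in (27).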

One can, however, consider recent advances in Bi-objective Integer Programming (see, e.g., [Dai and Charkhgard, 2018]). In fact, different applications of the SGC may have a different objective function that describes the SGC better. Besides, the structure of the graph must be taken into consideration in determining what the most appropriate objective function is. One may therefore consider applying machine learning to automate the finding of an appropriate objective function.

2.4 Edge weight transformation for unweighted graphs

For undirected graphs with $w_e = 1$ for all $e = (i, j) \in E$, the MILP does not work very well because it cannot distinguish an edge in a sparse neighbourhood from one in a dense neighbourhood. To favour edges in a dense neighbourhood, we obtain new edge weights by calculating $w'_e = 1 + |\{k \in V : (i, k), (j, k) \in E,\ k \neq i, j\}|$, i.e., the new edge weight of $e$ is one plus the number of vertices that are connected to both of the end nodes of $e$.

We compared our method with CFinder [Palla et al., 2005; Derényi et al., 2005; Adamcsek et al., 2006] and the MaxMax Algorithm [Hope and Keller, 2013]. We used a 21-vertex instance of a weighted graph. We can see that the clusters of CFinder are the $k$-clique neighbourhoods, and therefore some of the heavily weighted edges are not included, for instance edges at Vertices 6, 9, and 17.

Figure 2: CFinder solution.
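The edge weight transformation of Section 2.4, $w'_e = 1 + |\{k : (i,k), (j,k) \in E\}|$, amounts to counting common neighbours of the two end vertices. A minimal Python sketch follows; the function name `transform_weights`, the edge-list input format, and the small example graph are assumptions for illustration, not part of the paper.

```python
def transform_weights(edges):
    """Return w'_e = 1 + (number of common neighbours of i and j) per edge.

    `edges` is an iterable of pairs (i, j) of an unweighted, undirected graph.
    """
    edges = list(edges)
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    weights = {}
    for i, j in edges:
        common = (adj[i] & adj[j]) - {i, j}   # vertices adjacent to both ends
        weights[(i, j)] = 1 + len(common)
    return weights


# Hypothetical example: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
example_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
print(transform_weights(example_edges))
```

In this example the three triangle edges each gain one common neighbour, while the pendant edge keeps unit weight, which is the intended bias towards dense neighbourhoods.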

The MaxMax solution, by contrast, covers these heavily weighted edges. However, for unweighted graphs, MaxMax will cluster the graph by connected components, so if the graph is connected, there will be only one cluster.

Figure 3: MaxMax solution.

The MILP solution produces not only the clusters, but also the membership proportion for each vertex that belongs to more than one cluster. Neither MaxMax nor CFinder provides this information. One can see clearly that the Min Cut solution can also produce a relatively balanced set of clusters in terms of total weighted memberships of the clusters when the clusters are connected. None of the existing SGC methods considers such a constraint.

Figure 4: Minimum Total Cut solution.
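The balance referred to here is the Equal Balance Requirement (6). As a rough illustration, the following Python sketch computes the total membership of each cluster and tests (6) for every ordered pair of clusters. The names `cluster_totals` and `is_balanced`, the tolerance, and the data are hypothetical and chosen only for this example.

```python
def cluster_totals(x):
    """Sum of memberships x[i][c] over all vertices i, for each cluster c."""
    totals = {}
    for memberships in x.values():
        for c, value in memberships.items():
            totals[c] = totals.get(c, 0.0) + value
    return totals


def is_balanced(x, delta, tol=1e-9):
    """Check (6): (1 - delta) T(c1) <= T(c2) <= (1 + delta) T(c1) for c1 != c2."""
    totals = cluster_totals(x)
    return all(
        (1 - delta) * totals[c1] - tol <= totals[c2] <= (1 + delta) * totals[c1] + tol
        for c1 in totals for c2 in totals if c1 != c2
    )


# Hypothetical memberships: vertex 2 is shared equally between two clusters.
x_even = {1: {"a": 1.0, "b": 0.0}, 2: {"a": 0.5, "b": 0.5}, 3: {"a": 0.0, "b": 1.0}}
x_skew = {1: {"a": 1.0, "b": 0.0}, 2: {"a": 1.0, "b": 0.0}, 3: {"a": 0.0, "b": 1.0}}
print(is_balanced(x_even, delta=0.1), is_balanced(x_skew, delta=0.2))
```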

In the Max Association solution below, unfortunately, because the MILP enforced three clusters and the blue cluster is not connected, we have four clusters instead, so the equal balance cannot be guaranteed in this case. In any case, the four clusters demonstrate the four strongest connections by $w_e$. As for the membership proportions, take Vertex 9 as an example: since there are more heavily weighted links in the pink cluster than in the green cluster, 0.9 of its membership is in the former, and 0.1 in the latter.

Figure 5: Maximum Total Association solution.
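Whether a cluster returned by the MILP is connected, as in the case of the blue cluster above, can be tested with a breadth-first search on the subgraph induced by the cluster. The sketch below is one possible way to do this and is not taken from the paper; the function name `cluster_components` and the example graph are assumptions. A cluster is connected exactly when one component is returned, and more than one component could be used to trigger a lazy cut such as (46).

```python
from collections import deque


def cluster_components(cluster, edges):
    """Connected components of the subgraph induced by the vertices in `cluster`."""
    members = set(cluster)
    adj = {v: set() for v in members}
    for i, j in edges:
        if i in members and j in members:   # keep only intra-cluster edges
            adj[i].add(j)
            adj[j].add(i)
    components, seen = [], set()
    for start in sorted(members):
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:                          # breadth-first search
            v = queue.popleft()
            for u in adj[v] - seen:
                seen.add(u)
                component.add(u)
                queue.append(u)
        components.append(component)
    return components


# Hypothetical example: vertex 2 links the two halves but is not in the cluster.
graph_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(cluster_components({0, 1, 3, 4}, graph_edges))
```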

In this section, we applied our method to some KKI instances from https://github.com/shiruipan/graph_datasets. Since the KKI instances are unweighted, we perform the edge weight transformation described in Section 2.4.

Table 1: Computational results for KKI instances with 20-50 vertices. Columns: Data set; Opt, Gap, r (Min Cut); Opt, Gap, r (Max Association).

In Table 1, the column Opt presents the time taken (in seconds) for proving optimality, and the column Gap presents the MIP relative gap (i.e., the gap between the best integer objective and the objective of the best active node on the branch-and-bound tree) for instances that are not solved to optimality within the given time limit (10 minutes). The column r is the ratio of total inter-cluster cut over total intra-cluster association. We used IBM ILOG CPLEX Optimization Studio v12.8 [Support, 2018] for solving the MILPs.

We have generated 10 problem classes with 5 instances for each problem class. The problem instances were generated using a random problem generator that takes as user input the number of vertices of the undirected graph, $|V|$, the density of the graph (determined by $|E|$ relative to $|V|(|V| - 1)$), and an upper bound on the edge weight, $M_w$. Each edge weight is a number chosen uniformly at random from $[1, M_w]$. In Table 2, the names of the problem classes are in the form of: N followed by the value of $|V|$, d followed by the density of $G$, and then M followed by the value of $M_w$. The column Con is the percentage of clusters that are connected in each problem class. In the columns where two numbers are given for each problem class, the first row presents the average value over 5 problem instances, and the second row presents the standard deviation. The bracket underneath each problem class indicates the number of instances that are solved to optimality versus the number of instances that are not.
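A generator of the kind described above could look like the following Python sketch. It is not the generator used in the experiments: the function name `random_instance`, the `seed` parameter, the interpretation of density as the fraction of the $|V|(|V|-1)/2$ possible edges, and the use of integer weights are all assumptions made for illustration.

```python
import random
from itertools import combinations


def random_instance(n, density, max_weight, seed=None):
    """Random undirected weighted graph as a dict {(i, j): weight} with i < j.

    Assumes `density` is the fraction of the n(n-1)/2 possible edges that are
    present, and draws integer weights uniformly from [1, max_weight].
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))   # all vertex pairs with i < j
    m = round(density * len(pairs))           # number of edges to keep
    chosen = sorted(rng.sample(pairs, m))
    return {e: rng.randint(1, max_weight) for e in chosen}


# Hypothetical usage: 10 vertices, density 0.4, weights up to 50.
instance = random_instance(10, 0.4, 50, seed=1)
print(len(instance))
```

Fixing the seed makes an instance reproducible, which is convenient when the same graph has to be solved under both objectives.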

Table 2: Computational results for problem instances with maximum edge weight $M_w = 50$ and $100$, $K = 3$ clusters, and a minimum fraction $\sigma$ of vertices required to be in at least one cluster. Columns: Data set; Opt, Gap, r, Con (Min Cut); Opt, Gap, r, Con (Max Association).

We experimented with instances with maximum edge weight $M_w = 50$ and $100$, $K = 3$ clusters, and a fixed value of $\sigma$. From the results of Table 2, we can see that the computation time grows exponentially as the size of the problem grows, or as the density of the graph grows. For problem instances that are not solved to optimality, the MIP gaps are large. However, in all problem instances, feasible solutions are found reasonably quickly. We can see that the computation time for maximizing total intra-cluster association is substantially longer than for minimizing total inter-cluster cuts. For problem instances that are not solved to optimality, however, the former has a smaller MIP gap whilst the latter can solve larger problem instances. With an objective of maximizing total association, we did not include Constraint (49), as the objective will drive as many edges as possible to be used. We cannot draw a conclusion on the effect of $M_w$ on computation time.

Table 3: Computational results for problem instances with maximum edge weight $M_w = 50$, $K = 2$ clusters, and a minimum fraction $\sigma$ of vertices required to be in at least one cluster. Columns: Data set; Opt, Gap, r, Con (Min Cut); Opt, Gap, r, Con (Max Association).

Table 4: Computational results for problem instances with maximum edge weight $M_w = 50$, $K = 4$ clusters, and a minimum fraction $\sigma$ of vertices required to be in at least one cluster. Columns: Data set; Opt, Gap, r, Con (Min Cut); Opt, Gap, r, Con (Max Association).

Table 5: Computational results for problem instances with maximum edge weight $M_w = 50$, $K = 3$ clusters, comparing two values of $\sigma$. Columns: Data set; Opt, Gap, r, Con (Min Cut); Opt, Gap, r, Con (Max Association).

We proposed and developed a polynomial-size MILP model for the SGCP. As far as we are aware, this is the first approach that simultaneously allocates membership proportion (the $x_{ic}$ values) for vertices that lie in multiple clusters, and that enforces an equal balance of the clusters, so that the sums of vertex memberships of the clusters are roughly the same. Compared to CFinder ([Palla et al., 2005], [Derényi et al., 2005], [Adamcsek et al., 2006]), the clusters found in our method are not limited to $k$-clique neighbourhoods. Compared to MaxMax ([Hope and Keller, 2013]), our method can produce non-trivial clusters even for a connected unweighted graph. An obvious future research direction is to perform a thorough numerical experiment on all parameters used in the model. One can consider alternative formulations, e.g., a branch-and-cut approach, using constraints to cut off infeasible solutions (namely, solutions with disconnected clusters); or a branch-and-price approach, using exponentially many variables, each representing a feasible cluster (in which case connectivity is guaranteed). Even though the cut separation and column generation subproblems themselves are expected to be NP-hard, heuristic approaches can be derived for speedy execution. Further, one may consider an adaptive approach to find the optimal number of clusters, instead of iteratively optimising over different values of $K$.

References

[Adamcsek et al., 2006] Balázs Adamcsek, Gergely Palla, Illés Farkas, Imre Derényi, and Tamás Vicsek. CFinder: Locating cliques and overlapping modules in biological networks. Bioinformatics (Oxford, England), 22:1021–3, 05 2006.

[Bertsimas and Shioda, 2007] Dimitris Bertsimas and Romy Shioda. Classification and regression via integer optimization. Oper. Res., 55(2):252–271, March 2007.

[Biemann, 2006] Christian Biemann. Chinese whispers - an efficient graph clustering algorithm and its application to natural language processing problems. 2006.

[Clauset et al., 2004] Aaron Clauset, M. E. J. Newman, and Cristopher Moore. Finding community structure in very large networks. Phys. Rev. E, 70:066111, Dec 2004.

[Dai and Charkhgard, 2018] Rui Dai and Hadi Charkhgard. A two-stage approach for bi-objective integer linear programming. Oper. Res. Lett., 46:81–87, 2018.

[Derényi et al., 2005] Imre Derényi, Gergely Palla, and Tamás Vicsek. Clique percolation in random networks. Phys. Rev. Lett., 94:160202, Apr 2005.

[Gilpin et al., 2013] Sean Gilpin, Siegfried Nijssen, and Ian Davidson. Formalizing hierarchical clustering as integer linear programming, 2013.

[Girvan and Newman, 2002] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.

[Hope and Keller, 2013] David Hope and Bill Keller. MaxMax: A graph-based soft clustering algorithm applied to word sense induction. In Alexander Gelbukh, editor, Computational Linguistics and Intelligent Text Processing, pages 368–381, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.

[Liu and Foroushani, 2016] Ying Liu and Amir Foroushani. PFC: An efficient soft graph clustering method for PPI networks based on purifying and filtering the coupling matrix. In De-Shuang Huang, Vitoantonio Bevilacqua, and Prashan Premaratne, editors, Intelligent Computing Theories and Application, pages 247–257, Cham, 2016. Springer International Publishing.

[Marinka Zitnik and Leskovec, 2018] Sagar Maheshwari Marinka Zitnik, Rok Sosič and Jure Leskovec. BioSNAP Datasets: Stanford biomedical network dataset collection. http://snap.stanford.edu/biodata, August 2018.

[Miller et al., 1960] C. E. Miller, A. W. Tucker, and R. A. Zemlin. Integer programming formulation of traveling salesman problems. Journal of the ACM, 7(4):326–329, October 1960.

[Miyauchi et al., 2018] Atsushi Miyauchi, Tomohiro Sonobe, and Noriyoshi Sukegawa. Exact clustering via integer programming and maximum satisfiability, 2018.

[Palla et al., 2005] Gergely Palla, Imre Derényi, Illés Farkas, and Tamás Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814–818, June 2005.

[Pinney and Westhead, 2006] John W. Pinney and David R. Westhead. Betweenness-based decomposition methods for social and biological networks. 2006.

[Rosvall and Bergstrom, 2008] Martin Rosvall and Carl T. Bergstrom. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4):1118–1123, 2008.

[Sağlam et al., 2006] Burcu Sağlam, F. Sibel Salman, Serpil Sayın, and Metin Türkay. A mixed-integer programming approach to the clustering problem with an application in customer segmentation. European Journal of Operational Research, 173(3):866–879, 2006.

[Schaeffer, 2007] Satu Elisa Schaeffer. Graph clustering. Computer Science Review

[Ustalov et al., 2018] Dmitry Ustalov, Alexander Panchenko, Chris Biemann, and Simone Paolo Ponzetto. Local-global graph clustering with applications in sense and frame induction.