Constraint Generation Algorithm for the Minimum Connectivity Inference Problem
Édouard Bonnet, Diana-Elena Fălămaş, and Rémi Watrigant

Univ Lyon, CNRS, ENS de Lyon, Université Claude Bernard Lyon 1, LIP, F-69342, LYON Cedex 07, France
Technical University of Cluj-Napoca, Romania
{edouard.bonnet, remi.watrigant}@ens-lyon.fr, [email protected]

Abstract.
Given a hypergraph H, the Minimum Connectivity Inference problem asks for a graph on the same vertex set as H with the minimum number of edges such that the subgraph induced by every hyperedge of H is connected. This problem has received a lot of attention in recent years, both from a theoretical and a practical perspective, leading to several implemented approximation, greedy and heuristic algorithms. Concerning exact algorithms, only Mixed Integer Linear Programming (MILP) formulations have been experimented with, all representing connectivity constraints by means of graph flows. In this work, we investigate the efficiency of a constraint generation algorithm, where we iteratively add cut constraints to a simple ILP until a feasible (and optimal) solution is found. It turns out that our method is faster than the previous best flow-based MILP algorithm on randomly generated instances, which suggests that a constraint generation approach might also be useful for other optimization problems dealing with connectivity constraints. Finally, we present the results of an enumeration algorithm for the problem.

Keywords:
Hypergraph · Constraint generation algorithm · Connectivity problem.
1 Introduction

We study the problem where one wants to infer a binary relation over a set of items V (that is, a graph), where the input consists of some subsets of those items which are known to be connected in the solution we are looking for. In other words, the input can be represented by a hypergraph H = (V, E), and we are looking for an underlying undirected graph G = (V, E) such that for every hyperedge S ∈ E, the subgraph induced by S, denoted by G[S], is connected (such a graph G will be called a feasible solution in the sequel). Observe that it is easy to construct trivial feasible solutions to this problem: consider for instance the graph K(H) having vertex set V and an edge uv iff u and v belong to a same hyperedge. Since these solutions are unlikely to be of great interest in practice, it makes sense to add an optimization criterion. In this paper, we focus on minimizing the number of edges of the solution. More formally, we study the following problem:

Minimum Connectivity Inference (MCI)
Input: a hypergraph H = (V, E)
Output: a graph G = (V, E) such that G[S] is connected for all S ∈ E
Goal: minimize |E(G)|

This optimization problem is NP-hard [11], and was first introduced for the design of vacuum systems [12]. It has then been studied independently in several different contexts, mainly dealing with network design: computer networks [13], social networks [3] (more precisely modeling the publish/subscribe communication paradigm [7,15,19]), but also other fields, such as auction systems [8] and structural biology [1,2]. Finally, we can mention the issue of hypergraph drawing, where, in addition to the connectivity constraints, one usually looks for graphs with additional properties (e.g. planarity, having a tree-like structure, etc.) [5,16,17,18]. This plethora of applications explains why this problem is known under different names, such as
Subset Interconnected Design, Minimum Topic Overlay or Interconnection Design. For a comprehensive survey of the theoretical work done on this problem, see [6] and the references therein. Concerning the implementation of algorithms, previous works mainly focused on approximation, greedy and other heuristic techniques [19]. To the best of our knowledge, the first exact algorithm was designed by Agarwal et al. [1,2] in the context of structural biology, where the sought graph represents the contact relations between proteins of a macro-molecule, which has to be inferred from a hypergraph constructed by chemical experiments and mass spectrometry. In this work, the authors define a Mixed Integer Linear Programming (MILP) formulation of the problem, representing the connectivity constraints by flows. They also provide an enumeration method using their algorithm as a black box, by iteratively adding constraints to the MILP in order to forbid already found solutions. Both their optimization and enumeration algorithms were tested on some real-life (from a structural biology perspective) instances for which the contact graph was already known.

This MILP model was then improved recently by Dar et al. [10], who mainly reduced the number of variables and constraints of the formulation, but still representing the connectivity constraints by means of flows. In addition, they also presented and implemented a number of (already known and new) reduction rules. This new MILP formulation together with the reduction rules were then compared to the algorithm of Agarwal et al. on randomly generated instances. For every kind of tested hypergraphs (different numbers and sizes of hyperedges), they observed a drastic improvement of both the execution time and the maximum size of instances that could be solved.

In this paper we initiate a different approach for this problem, by defining a simple constraint generation algorithm relying on a cut-based ILP. This method can be seen as an application of
Benders' decomposition [4], where one wants to solve a (generally large) ILP called the master problem by decomposing it into a smaller (and easier to solve) one, adding new constraints from the master problem when the obtained solution is infeasible (this approach is sometimes known as row generation, because new constraints are added throughout the resolution). We first present different approaches for the addition of new constraints and compare their efficiency on random instances. We then evaluate the performance of our method by comparing it to the MILP formulation of Dar et al. on randomly generated instances (using the same random generator). Finally, we present an algorithm for enumerating all optimal solutions of an instance, which we compare to the approach developed by Agarwal et al.

Organization of the paper.
In the next section, we introduce our constraint generation algorithm. In Section 3, we recall the random generator of Dar et al. and present the results of the comparison between our constraint generation algorithm and the flow-based MILP formulation. Finally, Section 4 is devoted to our enumeration algorithm.
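Before moving on, the basic objects of the introduction can be made concrete. The sketch below is our own toy illustration (not the paper's implementation): `k_graph` builds the trivial feasible solution K(H), `violated` tests feasibility (every hyperedge must induce a connected subgraph), and `constraint_generation` is a naive version of the loop developed in Section 2, with brute force standing in for the ILP solver and per-component cuts standing in for the routines discussed later.

```python
from itertools import combinations

def k_graph(hyperedges):
    """Edge set of K(H): uv is an edge iff u and v share a hyperedge."""
    return {(min(u, v), max(u, v))
            for s in hyperedges for u, v in combinations(s, 2)}

def components(vertices, edges):
    """Connected components of (vertices, edges), via simple union-find."""
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in edges:
        parent[find(u)] = find(v)
    comps = {}
    for v in vertices:
        comps.setdefault(find(v), set()).add(v)
    return list(comps.values())

def violated(hyperedges, chosen):
    """Hyperedges S whose induced subgraph is disconnected in `chosen`."""
    return [s for s in hyperedges
            if len(components(s, [(u, v) for u, v in chosen
                                  if u in s and v in s])) > 1]

def solve_cuts(all_edges, rows):
    """Brute-force stand-in for the ILP solver: a smallest edge subset
    satisfying every cut row  sum_{e in lhs} x_e >= rhs."""
    for k in range(len(all_edges) + 1):
        for cand in combinations(sorted(all_edges), k):
            cs = set(cand)
            if all(sum(e in cs for e in lhs) >= rhs for lhs, rhs in rows):
                return cs  # returns None only if the rows are unsatisfiable

def constraint_generation(hyperedges):
    """Toy loop: solve, find disconnected hyperedges, add one cut
    separating each connected component from the rest of its hyperedge."""
    all_edges = k_graph(hyperedges)
    rows = []  # start with no cut constraints at all
    while True:
        g = solve_cuts(all_edges, rows)
        bad = violated(hyperedges, g)
        if not bad:
            return g
        for s in bad:
            inside = [(u, v) for u, v in g if u in s and v in s]
            for comp in components(s, inside):
                rest = set(s) - comp
                cut = [(min(u, v), max(u, v)) for u in comp for v in rest]
                rows.append((cut, 1))  # at least one edge must cross

print(sorted(constraint_generation([{1, 2, 3}, {2, 3, 4}])))
# [(1, 2), (2, 3), (2, 4)]  -- 3 edges, the optimum for this instance
```

On this tiny instance the loop terminates after two rounds with a 3-edge optimal solution; the real algorithm replaces the exhaustive search by CPLEX calls.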
2 A Constraint Generation Algorithm

Rather than defining a single (M)ILP model whose optimal solutions coincide with optimal solutions of the
MCI problem, our approach is a constraint generation algorithm which starts with a simple ILP whose optimal solutions do not necessarily correspond to feasible solutions for
MCI. Then, some constraints are added to the model, which is solved again. This process is repeated until we reach a feasible solution.

Let us define our approach more formally. In the sequel, H = (V, E) will always denote our input hypergraph, and n and m will always denote the number of vertices and hyperedges of H, respectively. Recall that K(H) denotes the graph with vertex set V having an edge uv iff u and v belong to a same hyperedge. Let us first define our starting ILP model. It has one binary variable x_e for every possible edge e of K(H), which takes value 1 iff the corresponding edge is in the solution. In the following, we will thus make no distinction between solutions of our ILP and graphs with vertex set V.

The constraints that will be added are defined by cuts (X_1, X_2, ..., X_r), r ≥
2, where X_i ⊆ V, X_i ≠ ∅ and X_i ∩ X_j = ∅ for every i, j ∈ {1, ..., r}, i ≠ j. Given a cut C := (X_1, ..., X_r), we define its corresponding set of edges E(C) := {xy ∈ E(K(H)) : x ∈ X_i, y ∈ X_j, i ≠ j}. Given a set of cuts C, let M(C) be the following ILP:

Minimize  Σ_{e ∈ K(H)} x_e
subject to:
  Σ_{u,v ∈ S} x_uv ≥ |S| − 1    ∀ S ∈ E                      (1)
  Σ_{e ∈ E(C)} x_e ≥ r − 1      ∀ C := (X_1, ..., X_r) ∈ C   (2)
  x_e ∈ {0, 1}                  ∀ e ∈ K(H)

Constraints (1) force the solution to contain at least |S| − 1 edges within every hyperedge S, a necessary condition for G[S] to be connected. Constraints (2) forbid X_1, ..., X_r to be connected components in the solution: they force the quotient graph w.r.t. X_1, ..., X_r to contain at least r − 1 edges. In particular, if r = 2, then it forces the solution to have an edge between the two parts X_1 and X_2.

For a set S ⊆ V, define B_S := {(X, S \ X) : X ⊆ S, X ∉ {∅, S}}, the set of cuts constructed from all non-trivial bipartitions of S, and P_S := {(X_1, ..., X_r) : r ≥ 2, X_i ⊆ S, X_i ≠ ∅, ∪_{i=1}^{r} X_i = S and X_i ∩ X_j = ∅ for all i, j ∈ {1, ..., r}, i ≠ j}, the set of cuts constructed from all non-trivial partitions of S. Moreover, let B_H := ∪_{S ∈ E} B_S and P_H := ∪_{S ∈ E} P_S. We have the following:

Proposition 1.
Optimal solutions of M(P_H) are in one-to-one correspondence with optimal solutions of M(B_H), which are themselves in one-to-one correspondence with optimal solutions of the MCI instance.

Proof.
We have B_H ⊆ P_H, hence a feasible solution of M(P_H) is also a feasible solution of M(B_H). A feasible solution of M(B_H) is also a feasible solution of MCI, since otherwise Constraint (2) would not be satisfied for some bipartition of some hyperedge. Finally, a feasible solution of
MCI is a feasible solution of M(P_H), since otherwise a hyperedge would not induce a connected subgraph. ⊓⊔

By the previous proposition, it would be sufficient to solve M(B_H) or M(P_H). However, we have |B_H| = Σ_{S ∈ E} (2^{|S|−1} −
1), and |P_H| is larger still, which makes these naive ILPs inefficient from a practical point of view. Fortunately, it turns out that for many instances in practice, only a small number of cuts among B_H (resp. P_H) is actually needed in order to ensure connectivity in every hyperedge. This idea is the basis of our constraint generation algorithm described below.

The quotient graph w.r.t. X_1, ..., X_r has r vertices v_1, ..., v_r, and an edge v_i v_j whenever there is an edge between a vertex of X_i and a vertex of X_j, i ≠ j. A non-trivial partition of a set V is a partition where each set is different from ∅ and V.

Algorithm 1: constraint generation algorithm for
MCI
Input: a hypergraph H = (V, E)
Output: a solution G = (V, E)

C ← C_init(H)
G ← solve(M(C))
while G is not feasible do
    C ← C ∪ newCuts(G)
    G ← solve(M(C))
end

Our strategy is specified by a set of initial cuts of the input hypergraph C_init(H), and a routine newCuts(G) which takes a non-feasible solution G as input, and outputs a set of cuts. If the newCuts(.) routine always returns cuts from B_H (resp. P_H) that were not considered before, then the algorithm clearly outputs a feasible optimal solution for the problem, since it only stops when a feasible solution is found and, in the worst case, it ends by solving M(B_H) (resp. M(P_H)). This proves that Algorithm 1 always terminates and returns an optimal solution for MCI, provided that the newCuts(.) routine satisfies the property described above. The choices of the initial set of cuts and this routine are described in the next sub-section.

The choice of cuts is a crucial feature of our algorithm. The main challenge is to find policies that strike the right balance in the number of added constraints: if too few constraints are added in each iteration, then the number of iterations will increase, which results in a lack of efficiency. Conversely, if too many constraints are added at the beginning and/or in each iteration, then the size of the ILP will increase too quickly, which will slow down the solver, and again result in a lack of efficiency. Here we present an initial set of cuts, and three possible newCuts(.) routines. We then conducted an empirical evaluation of these strategies (using the initial set of cuts or not, followed by one of the three newCuts(.) routines, thus defining six possible strategies).

Initial set of cuts.
For every hyperedge S ∈ E, and every vertex v ∈ S, the idea is to add the cut ({v}, S \ {v}). This set of cuts forbids solutions with isolated vertices in every hyperedge. One could also consider cuts (X, S \ X) formed from every subset X ⊆ S of a fixed size q. However, already for q = 2, we noticed a drop in efficiency, mainly caused by the large number of constraints it creates. Hence, we shall initialize C_init with the cuts formed by singletons only. In the sequel, this initial set of cuts will sometimes be called singleton cuts.

The newCuts(.) routine. Given a non-feasible solution G of MCI, recall that we shall add, for every hyperedge S such that G[S] is disconnected, a set of cuts. Let S be such a hyperedge. Notice that the objective is not to guarantee connectivity in the very next iteration of the algorithm, but to constrain the model more and more. Let S_1, ..., S_p be the connected components of G[S], with p ≥
2. We considered three natural ideas for the set of new cuts corresponding to S in this situation:

– Routine 1: add only one cut (A, B) corresponding to a balanced bipartition of the connected components, that is, A ∪ B = S, A ∩ B = ∅, S_i ⊆ A or S_i ⊆ B for every i ∈ {1, ..., p}, and the absolute value of |A| − |B| is as small as possible. Since finding a balanced bipartition of a given set of numbers is an NP-hard problem, the computation of the bipartition was done using a polynomial greedy algorithm which considers connected components in decreasing order w.r.t. their sizes, and iteratively adds each of them to A (resp. B) whenever |A| < |B| (resp. |A| ≥ |B|). Notice that this algorithm provides an approximation of an optimal bipartition, and runs in O(p log p) time [14].
– Routine 2: add the cut (S_i, ∪_{j ≠ i} S_j), for every i ∈ {1, ..., p}. This idea forbids S_i to be disconnected from the rest of S in the next iteration.
– Routine 3: add the cut (S_1, ..., S_p). Here, we simply forbid G[S] to have the exact same connected components in the next round.

Observe that the first two strategies return cuts from the set B_H defined previously, while the third one returns a cut which belongs to P_H. In all three cases, the routine returns cuts which were not in the model, hence guaranteeing the optimality and termination of our algorithms, as seen previously. Combining the above choices gives six different strategies:

– Strategy 1: initial set of cuts: none; newCuts(.): Routine 1
– Strategy 2: initial set of cuts: none; newCuts(.): Routine 2
– Strategy 3: initial set of cuts: none; newCuts(.): Routine 3
– Strategy 4: initial set of cuts: singleton cuts; newCuts(.): Routine 1
– Strategy 5: initial set of cuts: singleton cuts; newCuts(.): Routine 2
– Strategy 6: initial set of cuts: singleton cuts; newCuts(.
): Routine 3

After an empirical evaluation of the above strategies on different kinds of instances, we observed a similar behaviour for all of them, with a high deviation between seemingly similar instances. Nevertheless, we could observe that, on average, strategies 4, 5 and 6 were more efficient than strategies 1, 2 and 3, especially for instances with a high number of vertices, which suggests that using a non-empty initial set of cuts should always be better. The closeness of the results for the three routines can be explained by the fact that in practice (in our random instances, all having fewer than 25 vertices), the number of connected components of every hyperedge of a non-feasible solution is usually small (frequently 2 or 3, and often smaller than 5), which leads to similar ILP models being solved (for instance, when there are only two connected components, all three routines output exactly the same set of cuts).

Our first empirical results suggest that a more fine-grained comparison should be performed in order to better understand which hypergraph parameters influence the efficiency of our different strategies. This approach could then be used in a more general algorithm which would first analyze the instance to solve, and then choose the right strategy to use. Another option would be to run all strategies in parallel in order to obtain the least running time for every instance. In the sequel, we decided to effectively use the singleton cuts as initial set of cuts, and to use Routine 1 as newCuts(.) (that is, strategy 4 described above).

3 Experimental Evaluation

Our random generator of instances follows the same rules as in the experiments conducted by Dar et al. [10]. A given scenario depends on the following features:

– Number of vertices n of the hypergraph.
– Density of the hypergraph d = m/n. As in [10], we used the values d ∈ {1, 3, 5}.
– Hyperedge size bounds and distributions.
For this parameter, we used the four types defined by [10] plus a new fifth type. For the first four, a size is chosen uniformly at random for each hyperedge between prescribed lower and upper bounds:
• Type 1: sizes of hyperedges between 2 and n
• Type 2: sizes of hyperedges between 2 and ⌈n/2⌉
• Type 3: sizes of hyperedges between ⌈n/2⌉ and n
• Type 4: sizes of hyperedges between two intermediate fractions of n (the exact bounds are those of [10])

Then, for each hyperedge, vertices are chosen uniformly at random until the desired size is reached. For the fifth type, hyperedges are chosen uniformly at random among all possible hyperedges; to do so, for each hyperedge, each vertex is added independently with probability 1/2.

In the following, a scenario corresponds to a triple (n, d, Type). In all experiments conducted in this paper, 50 instances were generated for each scenario. Moreover, a time limit of 900 seconds (15 minutes) was set for each instance.

In this sub-section, we present the results of the comparison between our constraint generation algorithm and the best state-of-the-art exact algorithm for
MCI, which is the improved flow-based MILP model of Dar et al. [10]. As explained in the introduction, this algorithm is itself an improvement of a previous algorithm of Agarwal et al. [1]. Although both algorithms rely on a flow-based MILP formulation of the problem, the improvements of Dar et al. can be summarized as follows:

– The MILP formulation of Dar et al. contains fewer variables and constraints, mainly because of a factoring of several linearly-dependent constraints of the previous formulation. They also added some new constraints in order to speed up the resolution.
– The algorithm of Dar et al. also contains several pre-processing rules whose purpose is to reduce the number of vertices and hyperedges of the input instance, and thus reduce the size of the MILP formulation. These reduction rules rely on observations about the problem, dealing with parts of the instances where the structure of an optimal solution can be inferred in polynomial time (e.g. when a set of vertices belongs to a same set of hyperedges of a large size). Notice that Dar et al. conducted an experimental evaluation of their reduction rules in [9].

For the sake of completeness, we provide the MILP formulation of Dar et al. To this end, let us first introduce some notions and definitions. For every hyperedge S ∈ E, they choose an arbitrary vertex r_S ∈ S to be the source of the flow which will ensure connectivity. Hence, they define a complete digraph A(S) with vertex set S and, in addition to a variable x_e for every edge of K(H), their model also has a variable f^S_a for every arc a of A(S). For a vertex v ∈ S, A⁻_S(v) (resp. A⁺_S(v)) denotes the set of arcs of A(S) entering v (resp.
leaving v). The model is the following:

Minimize  Σ_{e ∈ K(H)} x_e
subject to:
  Σ_{u,v ∈ S} x_uv ≥ |S| − 1                            ∀ S ∈ E
  Σ_{a ∈ A⁻_S(v)} f^S_a − Σ_{a ∈ A⁺_S(v)} f^S_a = −1    ∀ S ∈ E, ∀ v ∈ S \ {r_S}
  f^S_uv + f^S_vu ≤ (|S| − 1) · x_uv                    ∀ S ∈ E, ∀ u, v ∈ S
  f^S_a ≥ 0                                             ∀ S ∈ E, ∀ a ∈ A(S)
  x_e ∈ {0, 1}                                          ∀ e ∈ K(H)

Since our goal was mainly to compare the performance of our constraint generation algorithm to a simple (M)ILP formulation, the reduction rules of Dar et al. were not used for either algorithm. In the sequel, the algorithm of Dar et al. will be denoted by Flow-MILP, and our constraint generation algorithm by
CGA.

All experiments were conducted on a computer equipped with an Intel® Xeon® E5620 processor (64 bits) at 2.4GHz, 24GB of RAM and a Linux system (Ubuntu version 18.04.1 LTS). The implementation of our constraint generation algorithm (Strategy 4 described above) was written and run in SageMath, while the implementation of the algorithm of Dar et al. was written and run in MATLAB® Release R2016b. The MILP solver used in both algorithms was CPLEX® version 12.8 from IBM®. All algorithms (including all MILP resolutions) were run sequentially, i.e. not exploiting multi-threading. Notice that the measured time of the algorithm of Dar et al. only consists in the resolution of the MILP model (the purpose of the MATLAB® code is thus only to construct the MILP model from the instance), hence the difference of programming languages does not matter for the comparison.

For each scenario (n, d, Type), a set of 50 instances was generated and given to both Flow-MILP and
CGA. As said previously, a time limit of 900 seconds was set for each instance. Tables 1, 2 and 3 present the results of the comparison for densities 1, 3 and 5, respectively, where the running time is the average running time over all instances solved within the time limit, and the number in brackets indicates the number of instances (out of 50) effectively solved within this limit, whenever this number differs from 50. The tables also show the average number of constraints in the MILP formulation of both algorithms: for
Flow-MILP it corresponds directly to the number of constraints of the MILP model, while for
CGA it corresponds to the number of constraints it had to add in order to be able to solve the instance (hence, it corresponds to the number of constraints in the last ILP solved).

As we can see in the results, our approach has a much lower average running time than the previous algorithm in every scenario. Indeed, on average (over all instances of all scenarios),
CGA has a running time more than 13 times smaller than
Flow-MILP. As we could expect, the newly introduced type 5 instances are the most difficult for both algorithms, certainly because these instances contain far fewer small hyperedges than the others. This also explains why type 2 instances are often the easiest to solve for both algorithms. These results also highlight the fact that our algorithm is able to solve larger instances than previously possible. When considering types 1, 2, 3 and 4 only:

– For m = n and n = 26 for instance, our algorithm is able to solve 100% of instances within the time limit, while Flow-MILP can only solve less than 85% of them.
– For m = 3n and n = 20, CGA is able to solve 90% of instances, while
Flow-MILP can only solve 66%.
– For m = 5n and n = 18, CGA is able to solve 98% of instances, while
Flow-MILP can only solve 82% of them.

Observe also that our algorithm generates much smaller MILP models. Indeed, firstly, the number of variables is always smaller, since our models do not contain any flow variables. Secondly, as we can observe in the results, the number of added constraints is roughly 6 times smaller than in the flow-based MILP model. Despite the fact that for each instance our algorithm needs to call the MILP solver several times, calling it on much smaller MILP models offers a better overall running time. (We used the implementation of [10] provided by its authors.)

n   Type  Flow-MILP (sec)  Flow-MILP (con)  CGA (sec)    CGA (con)
14  1     0.40             598.56           0.05         130.54
14  2     0.10             195.26           0.02         78.42
14  3     0.40             636.28           0.05         135.02
14  4     0.12             226.58           0.03         85.70
14  5     0.31             428.56           0.04         115.12
16  1     0.84             830.22           0.07         162.08
16  2     0.16             277.54           0.04         99.18
16  3     1.05             958.52           0.08         178.28
16  4     0.25             358.30           0.06         115.60
16  5     1.75             618.86           0.18         152.50
18  1     2.58             1163.16          0.11         201.56
18  2     0.27             372.92           0.07         123.00
18  3     8.51             1263.40          0.19         219.42
18  4     0.40             466.40           0.09         139.08
18  5     3.72             826.74           0.37         187.18
20  1     24.17            1569.76          0.34         253.22
20  2     0.52             471.60           0.11         145.18
20  3     35.33            1799.16          0.58         282.62
20  4     1.88             672.04           0.54         182.14
20  5     35.64            1158.72          2.97         250.80
22  1     60.53            2023.32          0.85         307.94
22  2     1.05             630.04           0.27         177.80
22  3     107.17 [49]      2386.94          1.08         342.64
22  4     6.31             838.70           1.14         216.18
22  5     137.17 [47]      1504.98          12.02        315.64
24  1     119.20 [49]      2566.84          2.15         365.82
24  2     3.76             807.26           0.54         211.76
24  3     314.43 [39]      3147.10          8.58         443.44
24  4     42.49 [49]       1140.20          4.29         272.38
24  5     344.12 [28]      1929.75          122.41 [49]  404.24
26  1     194.88 [38]      3342.79          28.25        448.26
26  2     19.83            1019.42          4.28         253.16
26  3     365.14 [33]      3733.33          8.39         498.22
26  4     101.67 [48]      1338.85          16.92        318.62
26  5     606.45 [8]       2382.62          285.70 [31]  478.58

Table 1. Comparison of running times and number of constraints between Flow-MILP and CGA for density 1. Columns labeled with (sec) (resp. (con)) represent the average running time (resp. number of constraints).

We also generated instances with hyperedge sizes bounded by a (small) constant, in order to see how far we could increase the number of vertices for both algorithms. More precisely, we generated instances with hyperedges of size
n   Type  Flow-MILP (sec)  Flow-MILP (con)  CGA (sec)    CGA (con)
12  1     0.90             1056.10          0.08         275.24
12  2     0.19             399.16           0.05         181.20
12  3     0.78             1153.72          0.08         291.50
12  4     0.30             469.44           0.06         198.74
12  5     0.99             814.18           0.13         253.84
14  1     2.40             1645.76          0.15         366.24
14  2     0.38             586.10           0.08         233.46
14  3     2.75             1707.58          0.19         377.32
14  4     0.59             665.06           0.14         251.90
14  5     6.14             1254.08          0.69         340.64
16  1     9.67             2424.56          0.42         469.52
16  2     0.97             827.48           0.21         293.36
16  3     31.98            2773.78          1.38         518.96
16  4     8.25             1056.66          1.60         340.16
16  5     155.75 [49]      1857.67          27.56        454.78
18  1     35.95            3269.58          1.72         578.04
18  2     3.55             1107.86          0.66         356.14
18  3     187.60 [49]      3773.76          15.92        645.78
18  4     69.12            1411.38          12.30        417.76
18  5     393.61 [11]      2458.09          241.37 [33]  566.21
20  1     178.20 [44]      4413.09          11.05        712.24
20  2     21.53            1454.96          4.73         432.20
20  3     418.32 [22]      5395.00          115.07 [48]  829.40
20  4     367.51 [16]      1943.69          274.18 [33]  532.66
20  5     -1.00 [0]        0.00             888.23 [1]   685.00
22  1     330.37 [29]      5896.55          76.89        872.12
22  2     105.96 [49]      1901.41          23.52 [49]   519.82
22  3     544.25 [2]       6935.50          210.25 [35]  993.80
22  4     -1.00 [0]        0.00             689.20 [8]   623.13
22  5     -1.00 [0]        0.00             -1.00 [0]    0.00

Table 2. Comparison of running times and number of constraints between Flow-MILP and CGA for density 3. Columns labeled with (sec) (resp. (con)) represent the average running time (resp. number of constraints).
7, and density d ∈ {1, 3} (for density 5, the maximum number of vertices for which our algorithm was able to solve 100% of the instances was only 300). The results are shown in Table 4.

The difference in running time is even more significant in this experiment. The algorithm of Dar et al. already fails to solve 100% of the instances within the time limit for 200 vertices (density 3). Moreover, for density 1, there is a huge drop in efficiency between 750 and 1000 vertices for the flow-based MILP algorithm, going from 100% of instances solved to 8%. Overall, we can observe that our approach allows solving instances of a much larger size than the previous algorithm.
n   Type  Flow-MILP (sec)  Flow-MILP (con)  CGA (sec)    CGA (con)
10  1     0.44             1009.94          0.06         322.38
10  2     0.19             432.96           0.05         226.90
10  3     0.47             1023.98          0.07         324.68
10  4     0.19             438.26           0.05         228.50
10  5     0.53             816.82           0.07         301.58
12  1     1.28             1717.92          0.14         452.32
12  2     0.36             674.92           0.07         303.98
12  3     2.37             1862.66          0.19         479.42
12  4     0.76             786.34           0.13         331.62
12  5     2.63             1346.06          0.27         420.58
14  1     6.54             2684.80          0.28         601.18
14  2     0.82             989.18           0.17         390.82
14  3     10.00            2809.78          0.42         624.30
14  4     1.79             1112.46          0.31         419.54
14  5     57.82            2108.38          5.90         568.06
16  1     27.61            3892.42          0.95         765.86
16  2     2.35             1375.52          0.45         485.30
16  3     146.77 [49]      4414.90          6.91         843.04
16  4     73.50 [46]       1757.11          42.70        567.12
16  5     546.07 [20]      2976.35          176.13 [38]  728.76
18  1     91.41 [48]       5527.35          3.31         963.82
18  2     15.35            1884.84          1.61         597.76
18  3     381.68 [30]      6082.20          55.23        1050.14
18  4     357.08 [36]      2313.11          104.60 [46]  684.29
18  5     440.86 [1]       4244.00          283.08 [2]   911.00
20  1     178.73 [35]      7266.37          17.41        1169.50
20  2     65.76            2430.20          6.06         708.82
20  3     588.01 [1]       9003.00          347.20 [21]  1322.05
20  4     -1.00 [0]        0.00             -1.00 [0]    0.00
20  5     -1.00 [0]        0.00             -1.00 [0]    0.00

Table 3. Comparison of running times and number of constraints between Flow-MILP and CGA for density 5. Columns labeled with (sec) (resp. (con)) represent the average running time (resp. number of constraints).
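The scenarios behind these tables come from the random generator described at the beginning of this section. A rough Python sketch of it follows (our own code, not the generator of [10]; it covers types 1-3 and 5 only, since the exact bounds of type 4 are given in [10]):

```python
import random
from math import ceil

def random_instance(n, d, htype, seed=0):
    """Sketch of a random MCI scenario (n, d, Type): m = d * n hyperedges
    over vertices 0 .. n-1, with sizes drawn according to the type."""
    rng = random.Random(seed)
    # size bounds for types 1-3; type 5 ignores them (uniform hyperedges)
    bounds = {1: (2, n), 2: (2, ceil(n / 2)), 3: (ceil(n / 2), n)}
    hyperedges = []
    for _ in range(d * n):
        if htype == 5:
            # uniform over all hyperedges: keep each vertex with prob. 1/2
            s = {v for v in range(n) if rng.random() < 0.5}
        else:
            lo, hi = bounds[htype]
            s = set(rng.sample(range(n), rng.randint(lo, hi)))
        hyperedges.append(s)
    return hyperedges

h = random_instance(10, 3, 2)
# 30 hyperedges, each of size between 2 and ceil(10/2) = 5
```

With a fixed seed the generator is deterministic, which is convenient for reproducing a benchmark run.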
4 Enumeration of Optimal Solutions

In this section, we describe an approach to enumerate all optimal solutions of an instance of
MCI. When solving an optimization problem using an MILP formulation in which the solution is represented by 0-1 variables, a natural way to obtain an enumeration algorithm consists in adding new constraints in order to forbid previously found solutions. More formally, if the objective of the MILP is
Minimize  Σ_{i=1}^{n} x_i
n     d  Flow-MILP (sec)  Flow-MILP (con)  CGA (sec)     CGA (con)
30    1  0.31             407.04           0.12          169.58
30    3  5.14             1259.36          2.03          512.14
50    1  0.47             699.32           0.25          286.58
50    3  12.85            2066.28          2.89          856.66
100   1  1.33             1396.66          0.62          572.82
100   3  39.35            4176.84          4.85          1752.02
200   1  5.38             2809.72          0.79          1132.54
200   3  106.51 [46]      8269.80          4.76          3436.34
300   1  17.20            4209.62          1.40          1691.58
300   3  148.21 [33]      12415.18         8.04          5117.94
400   1  41.62            5596.66          1.61          2239.86
400   3  220.36 [37]      16584.57         23.73         7033.42
500   1  83.89            7002.62          2.29          2792.82
500   3  369.10 [46]      20739.85         94.24         8969.68
750   1  296.43           10521.40         15.07         4265.36
750   3  -1.00 [0]        0.00             266.82        13454.44
1000  1  627.295 [4]      14018.50         34.54         5645.48
1000  3  -1.00 [0]        0.00             666.70 [33]   17785.06
Table 4.
Results for instances with hyperedges of size 7. Columns labeled with (sec) (resp. (con)) represent the average running time (resp. number of constraints).

where each x_i is a 0-1 variable, then one can forbid a given solution S ⊆ {1, ..., n}, represented by the indices of all variables set to 1, by adding the following constraint:

  Σ_{i ∈ S} x_i < |S|

Hence, forbidding a set of solutions A can be done by adding |A| new constraints to the model. This idea was used by Agarwal et al. [1] in order to obtain an algorithm enumerating all optimal solutions of an instance of MCI. This strategy, although easy to implement, becomes much less efficient when the number of solutions of the instance increases, because the size of the MILP model becomes too large for the solver. We propose a new method for the enumeration of solutions, which, in a nutshell, consists in forbidding the solutions "chunk by chunk". To this end, we iteratively accumulate optimal solutions by exploring the neighborhood of a solution found (the way we explore this neighborhood will be explained later). Once this exploration is done, we forbid all optimal solutions found at the same time. A pseudo-code of this approach is presented in Algorithm 2.
Algorithm 2:
Enumeration algorithm for
MCI
Input: a hypergraph H = (V, E)
Output: A: the set of all optimal solutions of H

A ← ∅
c* ← cost of an optimal solution of H
while there exists a solution S of cost c* which does not belong to A do
    N ← neighborhood of S
    A ← A ∪ N
end
return A

Naturally, we use our constraint generation algorithm described previously in order to find new optimal solutions. Notice that once we have found one optimal solution of cost c*, we add a new constraint to our ILP in order to find solutions of cost exactly c* in the next rounds, which usually speeds up the resolution. We now describe the way we explore the neighborhood of a solution (which corresponds to Line 4 of Algorithm 2). This step is done by forbidding an arbitrary edge e of the previously found solution, simply by adding a new constraint to our ILP forcing the corresponding variable x_e to 0. We thus iteratively accumulate new optimal solutions until the solver reports that the obtained ILP does not admit a solution of the desired cost, which means that the exploration of the neighborhood is done. We then remove the newly added constraints used in this routine before the next iteration of the loop in Line 3 of Algorithm 2.

We evaluated the performance of our approach by comparing its running time to the natural approach of forbidding each newly found optimal solution in the ILP, as described at the beginning of this section (still using our exact algorithm as a black box for finding new solutions). To this end, we generated a set of 1000 random instances of type 1 with a density of m/n = 2, and n = 10. These settings were chosen because they allow the random generation to produce instances with various different structures. In particular, we observed a quite fair distribution of the numbers of solutions, which seemed to be a meaningful parameter for the comparison of the two approaches. Figure 1 presents the results of these experiments.
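The "no-good" constraint Σ_{i ∈ S} x_i < |S| used by the naive scheme has a simple mechanical reading, sketched below with hypothetical helper names of our own (edge sets stand in for the 0-1 solution vectors):

```python
def no_good_row(solution):
    """Row forbidding one already-found 0/1 solution:
    sum_{e in solution} x_e < |solution|, i.e. at most |solution| - 1."""
    return (sorted(solution), len(solution) - 1)

def is_forbidden(row, candidate):
    """True iff `candidate` violates the row, i.e. contains every edge
    of the old solution (so, among equal-cost solutions, is that solution)."""
    lhs, rhs = row
    return sum(e in candidate for e in lhs) > rhs

old = {(1, 2), (2, 3)}
row = no_good_row(old)
print(is_forbidden(row, {(1, 2), (2, 3)}))  # True: the old solution
print(is_forbidden(row, {(1, 2), (1, 3)}))  # False: a different one
```

The naive approach accumulates one such row per solution found, which is exactly what makes its model grow; the chunk-by-chunk method above instead adds a batch of these rows only after a whole neighborhood has been explored.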
As we can see, our new method offers a great improvement when the number of solutions is high, reducing the running time on our generated instances by a factor of more than 8. These results suggest that, in practice, the running time of our algorithm is linear in the number of solutions.

In this paper we presented and evaluated an exact algorithm for the
Minimum Connectivity Inference problem, based on a constraint generation strategy to ensure connectivity. Our experiments, conducted on various randomly generated instances, demonstrated that our method outperforms the best previously known exact algorithm for this problem, which relies on a flow-based MILP formulation. Since connectivity constraints appear very often in practical situations that are usually solved by means of MILP, our results suggest that a constraint generation strategy can sometimes be much more efficient. As further research, it would be interesting to apply this technique to other optimization problems in which connectivity plays an important role. It should be noted that, during the empirical evaluation of the different sub-routines of our algorithm, we noticed high standard deviations in the running times. It would thus be interesting to understand which hypergraph parameters influence the complexity of our strategies. Apart from providing useful information about the problem and our method, this could be used to build a more structured benchmark of instances, which could be of great help for the evaluation of future exact algorithms. Finally, our enumeration algorithm seems to be a promising method which should be tested on other similar problems.

Fig. 1. Comparison of running times between the naive enumeration algorithm and our new approach, as a function of the number of solutions of the instances.
Acknowledgment.
We would like to thank Muhammad Abid Dar, Andreas Fischer, John Martinovic and Guntram Scheithauer for providing us with the source code of their algorithm [10].
References
1. Agarwal, D., Araújo, J.C.S., Caillouet, C., Cazals, F., Coudert, D., Pérennes, S.: Connectivity inference in mass spectrometry based structure determination. In: Proceedings of the 21st European Symposium on Algorithms (ESA 2013). pp. 289–300 (2013)
2. Agarwal, D., Araújo, J.C.S., Caillouet, C., Cazals, F., Coudert, D., Pérennes, S.: Unveiling contacts within macro-molecular assemblies by solving minimum weight connectivity inference problems. Molecular and Cellular Proteomics (2015)
3. Angluin, D., Aspnes, J., Reyzin, L.: Inferring social networks from outbreaks. In: Algorithmic Learning Theory. pp. 104–118 (2010)
4. Benders, J.F.: Partitioning procedures for solving mixed-variables programming problems. Numerische Mathematik (1), 238–252 (Dec 1962)
5. Brandes, U., Cornelsen, S., Pampel, B., Sallaberry, A.: Path-based supports for hypergraphs. Journal of Discrete Algorithms, 248–261 (2012), proceedings of the 21st International Workshop on Combinatorial Algorithms (IWOCA 2010)
6. Chen, J., Komusiewicz, C., Niedermeier, R., Sorge, M., Suchý, O., Weller, M.: Polynomial-time data reduction for the subset interconnection design problem. SIAM J. Discrete Math. (1), 1–25 (2015)
7. Chockler, G., Melamed, R., Tock, Y., Vitenberg, R.: Constructing scalable overlays for pub-sub with many topics. In: Proceedings of the 26th Annual ACM Symposium on Principles of Distributed Computing (PODC '07). pp. 109–118 (2007)
8. Conitzer, V., Derryberry, J., Sandholm, T.: Combinatorial auctions with structured item graphs. In: Proceedings of the 19th National Conference on Artificial Intelligence. pp. 212–218. AAAI'04 (2004)
9. Dar, M.A., Fischer, A., Martinovic, J., Scheithauer, G.: A computational study of reduction techniques for the minimum connectivity inference problem. Proceedings of Advances in Mathematical Methods and High Performance Computing, Springer series Advances in Mechanics and Mathematics (2018)
10. Dar, M.A., Fischer, A., Martinovic, J., Scheithauer, G.: An improved flow-based formulation and reduction principles for the minimum connectivity inference problem. Optimization (0), 1–21 (2018)
11. Du, D.Z., Miller, Z.: Matroids and subset interconnection design. SIAM Journal on Discrete Mathematics (4), 416–424 (1988)
12. Du, D.Z., Miller, Z.: On complexity of subset interconnection designs. Journal of Global Optimization (2), 193–205 (1995)
13. Fan, H., Hundt, C., Wu, Y.L., Ernst, J.: Algorithms and implementation for interconnection graph problem. In: Combinatorial Optimization and Applications. pp. 201–210 (2008)
14. Graham, R.L.: Bounds on multiprocessing timing anomalies. SIAM Journal on Applied Mathematics (2), 416–429 (1969)
15. Hosoda, J., Hromkovič, J., Izumi, T., Ono, H., Steinová, M., Wada, K.: On the approximability and hardness of minimum topic connected overlay and its special instances. Theoretical Computer Science, 144–154 (2012)
16. Johnson, D.S., Pollak, H.O.: Hypergraph planarity and the complexity of drawing Venn diagrams. Journal of Graph Theory (3), 309–325 (1987)
17. Klemz, B., Mchedlidze, T., Nöllenburg, M.: Minimum tree supports for hypergraphs and low-concurrency Euler diagrams. In: Algorithm Theory (SWAT 2014). pp. 265–276 (2014)
18. Korach, E., Stern, M.: The clustering matroid and the optimal clustering tree. Mathematical Programming 98