From an Interior Point to a Corner Point: Smart Crossover
Dongdong Ge∗, Chengwenjian Wang†, Zikai Xiong‡, Yinyu Ye§

February 22, 2021
Abstract
The crossover in solving linear programs is a procedure to recover an optimal corner/extreme point from an approximately optimal inner point generated by the interior-point method or emerging first-order methods. Unfortunately, it is often observed that the computation time of this procedure can be much longer than that of the preceding stage. Our work shows that this bottleneck can be significantly improved if the procedure smartly takes advantage of the problem characteristics and implements customized strategies. For problems with network structure, our approach can even start from an inexact solution of the interior-point method as well as of other emerging first-order algorithms. It fully exploits the network structure to smartly evaluate each column's potential of forming the optimal basis, and efficiently identifies a nearby basic feasible solution. For problems with a large optimal face, we propose a perturbation crossover approach to find a corner point of the optimal face. Comparison experiments with state-of-the-art commercial LP solvers on classical linear programming benchmarks, network flow benchmarks, and the MNIST dataset exhibit considerable advantages in practice.
The linear programming (LP) problem is a fundamental tool in a wide range of practical applications and theoretical analysis. In linear programming, a basic feasible solution (BFS) is equivalent to a vertex solution, corner solution, or extreme point of the feasible set. In practice, the BFS is essentially important for many reasons. First of all, a BFS is necessary for warm-starting the simplex method, which cannot start from a non-corner solution given by the interior-point method or first-order methods. Secondly, the BFS is often sparse, which is valuable for the LP relaxation of a mixed integer programming problem, or for the subproblems of some discrete optimization problems. Moreover, for some mixed integer problems, such as many network flow problems with integer parameters, an optimal BFS of the LP relaxation has been proven to be equivalent to an optimal solution of the original problem. This property is called total unimodularity [Schrijver, 1998].

Compared with the simplex method, the interior-point method often converges faster in practice but only returns an inner point solution instead of a BFS. To deal with this, the crossover is the procedure to obtain an optimal BFS from an interior point solution. Unfortunately, in many LP cases its computation time often largely exceeds that of the interior-point method.

∗ Research Institute for Interdisciplinary Sciences, Shanghai University of Finance and Economics, Shanghai, 200433, China. Email: [email protected]
† Corresponding author, School of Mathematical Sciences, Fudan University, Shanghai, 200086, China. Email: [email protected]
‡ Corresponding author, Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. Email: [email protected]
§ Department of Management Science and Engineering, Stanford University, Stanford, CA 94305, USA. Email: [email protected]
The same dilemma happens to popular first-order methods, which often generate approximately optimal inner point solutions. Considering that these problems usually have unique characteristics, if we can utilize those characteristics and adopt ad-hoc crossover strategies, the performance of LP solvers can be considerably improved.

There have been significant efforts devoted to identifying the optimal basis from a given interior point solution. For example, [Mitra et al., 1988] and [Kortanek and Jishan, 1988] develop an algorithm called purification that implements simplex-like iterations to push the nonbasic variables with intermediate values to their lower or upper bounds. [Megiddo, 1991] gives a strongly polynomial time algorithm to find a basis from a primal-dual solution pair. [Bixby and Saltzman, 1994] slightly modify it, simultaneously maintaining non-negativity in each pivot and iterating towards complementarity. [Andersen and Ye, 1996] construct an approximate problem with a known basic feasible solution, and prove that the optimal basis remains unchanged if starting from a high-accuracy solution. [Galabova and Hall, 2020] reveal that the open source solver Clp implements the 'Idiot' crash to increase the sparsity of the given solution via a penalty method, aimed at warm-starting the simplex method faster.

Nowadays, the traditional interior-point method is often not applicable to many huge-scale LP problems due to its high per-iteration cost. One line of research is to simplify the computation of the interior-point method by applying first-order methods to the subproblems and exploiting the problem structure. For instance, instead of applying Newton's method, [Lin et al., 2020b] propose an ADMM-Based Interior Point Method (ABIP), which implements the alternating direction method of multipliers (ADMM) to minimize the barrier function of the primal-dual path-following method for LP.
[Ge et al., 2019] exploit the structure of the Wasserstein barycenter problem and largely decrease the complexity of the interior-point method for this problem. For some important large-scale problems, many researchers develop approximate methods that fully exploit the problem's unique structure, especially the network structure, e.g., the optimal transport problem [Cuturi, 2013, Altschuler et al., 2017, Lin et al., 2019] and the Wasserstein barycenter problem [Benamou et al., 2015, Janati et al., 2020, Lin et al., 2020a]. However, state-of-the-art commercial LP solvers still cannot benefit from these emerging first-order solutions, because those solutions are usually of low accuracy and lack the corresponding dual solution.

As an important case of LP, the minimum cost flow (MCF) problem has an important property called total unimodularity [Schrijver, 1998]: when all parameters in the standard form are integers, each BFS is also integer-valued. Total unimodularity is of unique importance because the solution of the mixed integer program is exactly the optimal BFS of the LP relaxation. It should be mentioned that the network simplex method has been well developed to obtain the optimal BFS of MCF problems [Dantzig, 1963, Ahuja et al., 1993]. But for problems at a huge scale, the network simplex method cannot be efficiently parallelized and no efficient algorithm has been developed yet, compared with general first-order methods like ABIP [Lin et al., 2020b] or other methods for its special cases, like the auction algorithm [Bertsekas and Castanon, 1989] and the Sinkhorn method [Cuturi, 2013].

Besides, MCF problems contain many important special cases, such as the shortest path problem, maximum flow problem, optimal transport problem, and assignment problems (see more in [Ahuja et al., 1993]). In many of them, the exact optimal solutions are of unique interest in multiple areas.
[Wagner, 1959] has proven that MCF problems can be equivalently transformed into optimal transport (OT) problems. The OT problem has served as a foundation for a wide range of fields, including mathematics [Villani, 2003, Santambrogio, 2015], economics [Galichon, 2018], and machine learning [Nguyen, 2013, Tolstikhin et al., 2018, Ho et al., 2019]. The recent

Clp: https://github.com/coin-or/Clp

Contribution.
In this paper, we consider the problem in the following general form:

\[ \min_x \; c^\top x \quad \text{s.t.} \quad Ax = b,\ x \ge 0, \tag{1} \]

where A ∈ R^{m×n}, c ∈ R^n and b ∈ R^m. Given a single inexact solution x̃, from the output of the interior-point method or a first-order method, the crossover method can warm-start the simplex method and iterate to an optimal BFS of problem (1). Generally, the crossover method consists of two phases, basis identification and reoptimization. In the basis identification phase, the algorithm identifies an advanced BFS near x̃. In the reoptimization phase, it applies simplex-like iterations to obtain the optimal BFS.

• For the emerging large-scale LP problems with network structure, starting from an approximate solution x̃, we propose a criterion to evaluate each column's possibility of being in the optimal basis, and a column generation based basis identification phase to reach a nearby basis, without utilizing any information from the dual solution. Besides, based on the spanning tree characteristics of a BFS, we also propose a tree structure based basis identification method. We also design a reoptimization phase to iterate towards the optimal corner point if required.

• For general LP problems with long crossover computation time, it is usually the high-dimensional optimal face that causes the high basis identification cost. For these problems, we propose a perturbation crossover that first constructs an approximate optimal face via the primal-dual solution pair, and then directly finds a corner solution in the optimal face. Our perturbation crossover can also choose whether or not to iterate towards the optimum according to the problem requirements.

The numerical experiments on many classical and important datasets demonstrate that our methods merit wide applicability and perform particularly well on problems with network structure, with significant improvement over Gurobi and other commercial solvers.
In addition, combined with other, more efficient first-order algorithms, our methods can greatly speed up the overall solution of the problem. For example, on large-scale OT problems, combined with the Sinkhorn method, we can find the exact solution in a much shorter time than commercial solvers, which run the simplex method after a general barrier algorithm. Also, by using our perturbation crossover technique, we achieve a great improvement on some typical problems with high-dimensional optimal faces and long crossover times, reducing their solving time by a factor of ten or more.
Organization.
The rest of the paper is organized as follows. In Section 2, we introduce the optimal face and the framework of the column generation method. In Section 3, we propose a network structure based two-phase crossover method for MCF problems and general LP problems that can start from an approximate solution given by first-order methods. In Section 4, we propose a perturbation crossover method to accelerate the crossover procedure of LP solvers on general LP problems. The computational results are presented in Section 5.
Notations.
We let [n] be shorthand for {1, 2, . . . , n}. For a set X, the notation |X| denotes the cardinality of X. For X ⊂ [n] and an n-vector u, u_X is the |X|-vector constituted by the components of u with indices in X. 1_n and 0_n are the n-vectors of ones and zeros. For a matrix X and scalar x, X ≥ x means each component of X is greater than or equal to x. For a matrix A = (A)_{ij}, the positive part and negative part of A are A⁺ and A⁻, with (A⁺)_{ij} := max{(A)_{ij}, 0} and (A⁻)_{ij} := max{−(A)_{ij}, 0}. A_{i·}, A_{·j} and A_{ij} denote the i-th row, the j-th column, and the component in the i-th row and j-th column.

LP problems with network structure.
There are many LP problems having network structure. The Optimal Transport (OT) problem is a typical one, which is also a special equivalent form of the MCF problem. The formulation of the OT problem is as follows:
Example 2.1 (Optimal Transport (OT) Problem). Following the same notation as in Definition 3.1, the nodes N can be divided into two groups, suppliers S and consumers C, and the arcs A pairwise connect suppliers and consumers, in the direction from the former to the latter, i.e., A = S × C := {(i, j) : i ∈ S, j ∈ C}. For any i ∈ S, O(i) = C and I(i) = ∅, while for any j ∈ C, O(j) = ∅ and I(j) = S. For any supplier i and consumer j, s_i and d_j respectively denote the capacity of the supply and demand. Besides, the capacity of any route is unlimited, i.e., u_ij = ∞ for any (i, j) ∈ A. Then any OT problem can be rewritten as follows:

\[ \min_f \sum_{(i,j)\in S\times C} c_{ij} f_{ij} \quad \text{s.t.} \quad \sum_{j\in C} f_{ij} = s_i \ \ \forall i \in S, \quad \sum_{i\in S} f_{ij} = d_j \ \ \forall j \in C, \quad f_{ij} \ge 0 \ \ \forall (i,j) \in S\times C. \tag{2} \]

There are also many other important linear programs that originate from network structure but are not MCF problems, such as the Wasserstein barycenter problem with fixed support.

Example 2.2 (Fixed-Support Wasserstein Barycenter (FS-WB) Problem). The Wasserstein barycenter of probability measures (μ_k)_{k=1}^N is the probability measure μ with the minimal sum of Wasserstein distances to each probability measure. For the convenience of computation, we usually assume that (μ_k)_{k=1}^N are discrete probability measures, with support points {x^k_i}_{i∈[m_k]} and weights u_k, where u_k ∈ R^{m_k}. Similarly, in FS-WB problems, the support points {x̂_i}_{i∈[m]} of the barycenter are also fixed, but the weight u is part of the variables. Then, the FS-WB between {μ_k}_{k=1}^N can be represented by the following linear program:

\[ \min_{u,\{X_k\}_{k\in[N]}} \sum_{k=1}^{N} \omega_k \langle C_k, X_k \rangle \quad \text{s.t.} \quad X_k \mathbf{1}_m = u_k, \quad X_k^\top \mathbf{1}_{m_k} = u, \quad u \ge 0,\ X_k \ge 0, \ \text{for all } k \in [N], \tag{3} \]

where {X_k}_{k=1}^N represents the transportation plans between the barycenter and the N probability measures, and {C_k}_{k=1}^N denotes the cost matrices.
To be specific, the cost per unit of weight between the i-th support point of probability measure μ_k and the j-th support point of the barycenter μ is (C_k)_{ij} := d^p(x^k_i, x̂_j).

The Wasserstein barycenter problem has a similar structure to the OT problem but is significantly harder. If we fix the weight u of the barycenter, then the FS-WB problem is equivalent to N separate OT problems. But [Lin et al., 2020a] have proven that most fixed-support Wasserstein barycenter problems are not MCF problems. Our smart crossover can not only deal with MCF problems but also performs well on linear programs with network structure, such as FS-WB problems. When given a non-corner solution, we can fix the weight u of the barycenter and use Tree-BI to obtain the BFS X̄_k, k ∈ [N], for the separate OT problems. Then (u, {X̄_k}_{k∈[N]}) is a BFS for the FS-WB problem, and we can warm-start the simplex method from it.

Optimal face.
For the general primal linear program (1), the corresponding dual problem is

\[ \max_{y,s} \; b^\top y \quad \text{s.t.} \quad A^\top y + s = c,\ s \ge 0, \tag{4} \]

where A ∈ R^{m×n}, c ∈ R^n and b ∈ R^m. [Goldman and Tucker, 1956] have proven that there exists at least one optimal solution pair (x*, s*) that is strictly complementary, i.e.,

\[ P(x^*) \cap P(s^*) = \emptyset, \quad P(x^*) \cup P(s^*) = [n], \tag{5} \]

where P(x) := {i : x_i > 0}. Moreover, if we denote P* := P(x*) and P̄* := P(s*), then {P*, P̄*} is called the optimal partition. One can further prove that for any other complementary solution pair (x, s),

\[ P(x) \subset P^*, \quad P(s) \subset \bar{P}^*. \tag{6} \]

Therefore, the optimal face of the primal problem (1) can actually be written as

\[ \Theta_p := \{x : Ax = b,\ x \ge 0,\ \text{and } x_{\bar{P}^*} = 0\}, \tag{7} \]

and the optimal face of the dual problem (4) is

\[ \Theta_d := \{(y, s) : A^\top y + s = c,\ s \ge 0,\ \text{and } s_{P^*} = 0\}. \tag{8} \]

Moreover, the strict complementarity property shows that the relative interiors of Θ_p and Θ_d are nonempty, and [Mehrotra and Ye, 1993] have proven that the interior-point method always converges to the relative interior of the optimal face. Therefore, the larger the cardinality of P* is, the higher the dimension of the optimal face, the more corner points there are in the optimal face Θ_p, and thus the more difficult it is to identify an optimal BFS in the optimal face. In real LP applications, the optimal solution is usually highly degenerate and the cardinality of P* is very large, so the crossover often becomes the bottleneck.

Column generation method.
The column generation method is an efficient approach for large-scale LP problems. Instead of directly solving the original problem (1), the column generation method is essentially a sequence of master iterations. In each iteration, we form a collection of columns A_{·i}, i ∈ I, and solve the following restricted problem LP(I):

\[ \min_x \sum_{i\in I} c_i x_i \quad \text{s.t.} \quad \sum_{i\in I} A_{\cdot i} x_i = b, \quad x_{[n]\setminus I} = 0, \quad x \ge 0. \tag{9} \]

See Algorithm 1 for the general framework. The collection I_{k+1} contains the basis of the solution x^k of LP(I_k), so LP(I_{k+1}) can easily start from x^k. The sequence (I_k)_k can be trivially generated by

\[ I_{k+1} \leftarrow I_k \cup \arg\min_i \{\bar{c}_i \mid i \notin I_k\} \tag{10} \]

in each iteration. There are also many other variants; see e.g. [Bertsimas and Tsitsiklis, 1997].

Algorithm 1
General Column Generation Method

Initialize iteration counter k = 1.
while the reduced cost c̄ ≱ 0 do
    Use the simplex method to solve the restricted problem LP(I_k).
    Update the reduced cost c̄ and k ← k + 1.
end while

In this section, we consider the general minimum cost flow problem (11).
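The master-iteration bookkeeping of Algorithm 1, together with the update rule (10), can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: columns are stored densely, the restricted-problem simplex solve itself is left abstract, and the helper names (`reduced_costs`, `grow_index_set`) are ours.

```python
def reduced_costs(c, A, y):
    """Reduced costs cbar_i = c_i - y^T A_{.i} for every column i (A stored row-wise)."""
    m, n = len(A), len(c)
    return [c[i] - sum(y[k] * A[k][i] for k in range(m)) for i in range(n)]


def grow_index_set(I, cbar):
    """Update rule (10): add the excluded column with the most negative reduced cost.

    If every excluded column already has a nonnegative reduced cost, the current
    restricted optimum solves the full problem, so I is returned unchanged and
    the while-loop of Algorithm 1 terminates."""
    outside = [i for i in range(len(cbar)) if i not in I]
    if not outside:
        return set(I)
    best = min(outside, key=lambda i: cbar[i])
    return set(I) if cbar[best] >= 0 else set(I) | {best}
```

For example, with c = (1, 2, −3), A = [[1, 0, 1], [0, 1, 1]] and dual estimate y = (1, 1), the reduced costs are (0, 1, −5), so I = {0} grows to {0, 2}; once no reduced cost is negative, the loop stops.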
Definition 3.1 (Minimum Cost Flow (MCF) Problem). In general minimum cost flow (MCF) problems, there is a directed graph G = (N, A) with capacity u_ij and cost c_ij per unit of flow for each arc (i, j) ∈ A. Here u_ij must be non-negative and possibly infinite, and b_i represents the external supply or demand for each node i ∈ N. If we use f_ij to denote the amount of flow on arc (i, j), a general minimum cost flow problem can be formulated as follows:

\[ \min_f \sum_{(i,j)\in A} c_{ij} f_{ij} \quad \text{s.t.} \quad b_i + \sum_{j\in I(i)} f_{ji} = \sum_{j\in O(i)} f_{ij} \ \ \forall i \in N, \quad 0 \le f_{ij} \le u_{ij} \ \ \forall (i,j) \in A, \tag{11} \]

where O(i) := {j : (i, j) ∈ A} and I(i) := {j : (j, i) ∈ A}.

Suppose that we have already obtained an approximate solution f̃ of problem (11) via the interior-point method or any first-order method. Now we have the following lemma.

Lemma 3.1. For the general MCF problem (11), named
MCF(b, c, u): for any partition of A into L and U, we can construct an equivalent problem MCF(b̄, c̄, ū). Moreover, if f̄* is an optimal solution of MCF(b̄, c̄, ū), then

\[ \tilde{f}^*_{ij} := \begin{cases} \bar{f}^*_{ij}, & \text{for } (i,j) \in L, \\ u_{ij} - \bar{f}^*_{ji}, & \text{for } (i,j) \in U, \end{cases} \tag{12} \]

is an optimal solution of MCF(b, c, u), and vice versa.

Proof. We can shift problem (11) to construct an equivalent MCF problem. The new MCF problem is based on the network Ḡ := (N̄, Ā), in which

\[ \bar{N} = N, \quad \bar{A} = L \cup \{(j,i) : (i,j) \in U\}. \tag{13} \]

Accordingly, the capacity limit ū remains the same on the corresponding modified arcs, and the cost c̄ in the same direction remains unchanged while the cost in the opposite direction flips sign, i.e.,

\[ \bar{u}_L = u_L \ \text{and} \ \bar{u}_{ji} = u_{ij} \ \text{when } (i,j) \in U; \qquad \bar{c}_L = c_L \ \text{but} \ \bar{c}_{ji} = -c_{ij} \ \text{when } (i,j) \in U. \tag{14} \]

Given any flow f in the original problem (11), we can construct the corresponding new flow f̄ by

\[ \bar{f}_{ij} = f_{ij} \ \text{for } (i,j) \in L, \qquad \bar{f}_{ji} = u_{ij} - f_{ij} \ \text{for } (i,j) \in U, \tag{15} \]

so 0 ≤ f̄ ≤ ū if and only if 0 ≤ f ≤ u. And b̄_i = b_i − Σ_{(i,j)∈U} u_ij + Σ_{(j,i)∈U} u_ji. Then the new MCF problem MCF(b̄, c̄, ū) is as follows:

\[ \min_f \sum_{(i,j)\in \bar{A}} \bar{c}_{ij} f_{ij} \quad \text{s.t.} \quad \bar{b}_i + \sum_{j\in \bar{I}(i)} f_{ji} = \sum_{j\in \bar{O}(i)} f_{ij} \ \ \forall i \in \bar{N}, \quad 0 \le f_{ij} \le \bar{u}_{ij} \ \ \forall (i,j) \in \bar{A}. \tag{16} \]

It is now easy to see that f̄ is feasible for the new problem MCF(b̄, c̄, ū) if and only if f is feasible for the original problem MCF(b, c, u). Besides, we have

\[ \sum_{(i,j)\in \bar{A}} \bar{c}_{ij} \bar{f}_{ij} = \sum_{(i,j)\in A} c_{ij} f_{ij} - \sum_{(i,j)\in U} c_{ij} u_{ij}. \]

Therefore, f̄ is an optimal solution of MCF(b̄, c̄, ū) if and only if f is an optimal solution of the original problem MCF(b, c, u). □

[Mehrotra and Ye, 1993] have shown that the iterations of the interior-point method converge to an interior point of the optimal face.
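Returning to Lemma 3.1, the objective identity at the end of its proof can be checked numerically on a toy network. The sketch below is illustrative, not the paper's code: arcs are stored as a dictionary mapping (i, j) to a (flow, cost, capacity) triple, and the function names are ours.

```python
def reverse_arcs(arcs, U):
    """Lemma 3.1 transformation: each arc (i, j) in U is replaced by its reversal
    (j, i) with flow u_ij - f_ij, cost -c_ij and unchanged capacity; arcs in L
    (i.e. not in U) are kept as they are."""
    new = {}
    for (i, j), (f, c, u) in arcs.items():
        if (i, j) in U:
            new[(j, i)] = (u - f, -c, u)
        else:
            new[(i, j)] = (f, c, u)
    return new


def flow_cost(arcs):
    """Total cost sum of c_ij * f_ij over all arcs."""
    return sum(f * c for (f, c, u) in arcs.values())
```

For instance, for arcs {(1, 2): (3, 2, 5), (2, 3): (4, 1, 4)} with U = {(2, 3)}, the original cost is 10 and the transformed cost is 6, which differ by exactly c_23 · u_23 = 4, as the proof asserts.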
Therefore, if the solution f̃ is of sufficient accuracy and we use the partition L := {(i, j) : 0 ≤ f̃_ij ≤ u_ij/2} and U := A \ L, this partition will remain the same in the remaining iterations. Moreover, under this circumstance, when the problem has a unique optimal solution, the original MCF problem is even equivalent to the uncapacitated version of
MCF(b̄, c̄, ū). Without loss of generality, we can now assume that 0 ≤ f̃_ij ≤ u_ij/2 for each arc (i, j) ∈ A.

In uncapacitated MCF problems, the basic solutions have an important property: a solution f is basic (feasible) if and only if it is a (feasible) tree solution. Generally speaking, the tree solutions in uncapacitated MCF problems are those whose nonzero components form a subset of a spanning tree, without considering the direction of flows [Ahuja et al., 1993]. Therefore, for a nondegenerate BFS of the uncapacitated MCF problem on a connected graph, there are exactly |N| − 1 strictly positive components, connecting each node in the graph and corresponding to a spanning tree in G. However, for an interior-point solution, the flow on each arc is strictly positive, so the arc that serves the most flow to each specific node has higher potential and priority of being in the basis. We define the flow ratio to measure this.

Definition 3.2 (Flow ratio for MCF problems). For any flow f, we define the maximal flow on node k as f_k := Σ_{l∈O(k)} f_kl + Σ_{l∈I(k)} f_lk. Then the flow ratio of arc (i, j) is defined as r_ij := max{f_ij/f_i, f_ij/f_j}.

One can easily observe that the flow ratio measures the proportion of flow that a specific arc serves among all the conjoint arcs. Since the iterations of the interior-point method converge to the relative interior of the optimal face, for the arcs not in the optimal face the flow ratio gradually converges to zero. The flow ratio contains not only the magnitude information but also the graph information, so it is more powerful than the criterion proposed by [Mehrotra and Ye, 1993] in predicting P*. The arc/column with a higher flow ratio is more likely to be in the basis, so given any approximate solution f̃ we can sort the arcs as s := (s_1, s_2, . . .
, s_{|A|}), in which the first |N| arcs connect all nodes with the largest flow ratios and the rest of the array is arranged in descending order of f̃'s flow ratio.

Now we can propose a column generation based crossover method, composed of two phases: basis identification and reoptimization.

Column generation basis identification (
Col-BI). The basis identification phase aims at capturing a BFS near the non-corner solution f̃. The key idea in the basis identification phase is to run the column generation method (Algorithm 1) on an equivalent problem and obtain a nearby BFS of the original problem. Note that for any general LP problem (1) with upper bounds on x, we can use the big-M method to form the following equivalent problem:

\[ \min_{x, x_a} \; c^\top x + M \mathbf{1}_m^\top x_a \quad \text{s.t.} \quad Ax + I x_a = b, \quad l \le x \le u, \quad x_a \ge 0, \tag{17} \]

where I is the identity matrix in R^{m×m} and M is a large positive scalar. So there is an obvious basic feasible solution (x^⊤, x_a^⊤) = (0_n^⊤, b^⊤) that can serve as the starting point of the simplex method in the column generation method. Besides, if the original problem is the general MCF problem (11), then when M is larger than n · max_j |c_j|, it is easy to prove that problem (17) is equivalent to the original problem. Then we directly adopt Algorithm 1, in which I_1 := {n + 1, . . . , n + m} and the sequence (I_k)_k is generated by

\[ I_{k+1} \leftarrow \left( I_k \cup \{s_1, s_2, \ldots, s_{\theta_k}\} \right) \cap \{n < i \le m+n : x^k_i \text{ is nonbasic}\}^c. \tag{18} \]

Here (θ_k)_k can be any monotonically increasing integer sequence. When all the artificial variables of x^k are nonbasic, the first n components of x^k form a BFS for the original problem. Usually, if our goal is to obtain a BFS but we do not require it to be exactly optimal, this BFS is already of very high quality.

Column generation reoptimization (
Col-OPT). After obtaining an advanced starting BFS of the original problem, we can continue with the reoptimization phase to obtain an ε-optimal corner solution (one whose reduced cost satisfies c̄ ≥ −ε). In this phase, we use the column generation method (Algorithm 1) on the original problem and generate the sequence (I_k)_k by

\[ I_{k+1} \leftarrow I_k \cup \{i : \bar{c}_i < -\varepsilon\} \cup \{s_1, s_2, \ldots, s_{\theta_k}\} \tag{19} \]

until c̄ ≥ −ε. Similarly, (θ_k)_k can be any monotonically increasing integer sequence.

Spanning tree basis identification (
Tree-BI). Instead of using the general
Col-BI, we also have an alternative basis identification method for MCF problems. Because the BFS of any MCF problem must be a feasible tree solution, and a tree solution corresponds to a spanning tree in the graph, instead of directly dealing with the LP problem we can alternatively try to construct the spanning tree with the highest sum of flow ratios in the graph. If the given non-corner solution is of sufficiently high accuracy and the optimal solution is unique, such a spanning tree corresponds exactly to the optimal BFS. If not, we can still obtain a high-quality basic solution in a very short time. Besides, many efforts have been devoted to designing algorithms that find the maximum/minimum weight spanning tree. Among the classic algorithms with complexity O(|A| log(|N|)) are Prim's algorithm [Prim, 1957, Dijkstra et al., 1959] and Kruskal's algorithm [Kruskal, 1956]; for faster, almost linear time algorithms, see [Karger et al., 1995] and [Chazelle, 2000]. When the accuracy of the non-corner solution is not high enough and the tree solution is infeasible, we can implement network-simplex-like iterations to push it to the feasible set. Besides, since optimal transport problems have a unique network structure, the iterations of this 'push' phase can be very simple.

For OT problems, the network simplex iterations can be especially simple. We have the following lemma:

Lemma 3.2.
For an OT problem, suppose that the flow f is infeasible, with f_ij < 0 for some (i, j) ∈ A. Then there exist j′, i′ ∈ N such that f_ij′ > 0, f_i′j > 0, and (i′, j′) ∈ A.

Proof. Since (i, j) ∈ A, we have i ∈ S and j ∈ C. Besides, we have Σ_{l∈C} f_il = s_i ≥ 0 and Σ_{l∈S} f_lj = d_j ≥ 0. Note that we already have f_ij < 0, so there must exist f_ij′ > 0 with j′ ∈ C and f_i′j > 0 with i′ ∈ S. Because i′ ∈ S and j′ ∈ C, we have (i′, j′) ∈ A. □

Thus, by this lemma, the network simplex method can directly eliminate an inverse branch, say f_ij < 0, by repeating the following process: find j′, i′ such that f_ij′, f_i′j > 0 and increase the value along the loop i → j → i′ → j′ → i until one branch of this loop goes to zero. Therefore, the network simplex iteration in the push phase can be written directly as follows:

Algorithm 2
The 'Push' Phase for OT Problems

Input: a basic solution f and I = {(i, j) | f_ij < 0}
while I is nonempty do
    for (i, j) ∈ I do
        j′ ← arg max_{l∈C} f_il
        i′ ← arg max_{k∈S} f_kj
        θ ← min{−f_ij, f_i′j, f_ij′}
        f_ij ← f_ij + θ, f_ij′ ← f_ij′ − θ, f_i′j ← f_i′j − θ, f_i′j′ ← f_i′j′ + θ
    end for
    Update I = {(i, j) | f_ij < 0}
end while

Extension to other LP problems. For other problems with network structure, the network crossover method can potentially be easily extended. For instance, although [Lin et al., 2020a] have proven that the Wasserstein barycenter problem is not a special case of MCF problems, it can be separated into many different OT problems by fixing the weight of the barycenter. Combined with
Tree-BI for OT problems, we can obtain a nearby corner point in a very short time.

For general LP problems, we can also approximately regard them as MCF problems and use the column generation based network crossover. From the viewpoint of the graph, there is also an underlying graph G := (N, A), and each column of A corresponds to one arc in the graph. But differently from MCF problems, we allow an arc to connect more than two nodes. Similarly to Definition 3.2, we can now define the flow ratio for each column of the general LP problem (9):

Definition 3.3 (Flow ratio for general LP problems). For any non-corner solution x, we define the maximal flow of each row k as f_k := A⁺_{k·} x + A⁻_{k·} x. Then we define the flow ratio of each column i as r_i := max{A_ki x_i / f_k : k = 1, . . . , m}.

In fact, for MCF problems each column of the matrix A contains exactly two nonzero components, so the flow ratio for MCF problems defined in Definition 3.2 is just a special case of Definition 3.3. Intuitively, the flow ratio similarly measures the weight of the 'flow' that one column serves among all rows in the approximate graph G. After that, we sort the columns in descending order of flow ratio, and then follow with the basis identification phase Col-BI and the reoptimization phase
Col-OPT.

Unlike the emerging first-order methods, the interior-point method in commercial LP solvers can usually return a high-accuracy interior-point primal-dual solution pair (x^k, s^k) for problem (1). However, sometimes in real LP applications the computation time of the crossover procedure can largely exceed that of the interior-point stage. In this section, we introduce a perturbation crossover method to deal with this.

[Mehrotra and Ye, 1993] found that, for a class of classic interior-point algorithms, such as [Güler and Ye, 1993], [Kojima et al., 1989] and [Ye, 1992], one can always get an optimal partition by

\[ P^k := \{j : x^k_j \ge s^k_j\}, \tag{20} \]

or

\[ P^k := \{j : |x^{k+1}_j - x^k_j| / x^k_j \le |s^{k+1}_j - s^k_j| / s^k_j\}, \tag{21} \]

and the primal-dual pair always converges to a relative interior point of the optimal faces Θ_p and Θ_d. Therefore, when the optimal face contains too many corner points, the traditional crossover procedure might get stuck in the purification stage.

Since any extreme point of the optimal face is an optimal BFS, our goal is now equivalent to obtaining an extreme point of the optimal face. Using the criteria in (20) and (21), we can obtain P^k, an estimate of P* that usually contains the real P*, and get the estimated primal optimal face

\[ \Theta^k_p := \{x : Ax = b,\ x \ge 0,\ \text{and } x_j = 0 \text{ for } j \in \bar{P}^k\}, \quad \text{where } P^k \cap \bar{P}^k = \emptyset,\ P^k \cup \bar{P}^k = [n]. \]

Then the perturbed feasibility problem on Θ^k_p is:

\[ \min_x \; (c + \varepsilon)^\top x \quad \text{s.t.} \quad x \in \Theta^k_p. \tag{22} \]

Besides, we have the following Theorem 4.1 about problem (22):

Theorem 4.1.
If the optimal face Θ_p of the original problem (1) is bounded and P* ⊆ P^k, then there exists a δ > 0 such that for a perturbation ε whose components ε_i are uniformly distributed in [−δ, δ] for i ∈ P^k and ε_i = 0 for i ∈ P̄^k, the perturbed feasibility problem (22) has a unique optimal solution x* almost surely. Moreover, x* is an optimal BFS for the original primal problem (1).

Proof. First of all, since P* ⊆ P^k, the optimal face of the perturbed feasibility problem is exactly the same as the optimal face Θ_p of the original problem. Because the optimal face Θ_p is bounded, it is the convex hull of finitely many distinct extreme points {v_1, . . . , v_k}, and those extreme points of the optimal face are exactly all the optimal basic feasible solutions of problem (1). Since the feasible set is a polyhedron, it can be written as

\[ P := \left\{ \sum_{i=1}^{t} \lambda_i v_i + \sum_{j=1}^{q} \theta_j w_j : \sum_{i=1}^{t} \lambda_i = 1,\ \lambda_i \ge 0,\ \theta_j \ge 0 \right\}, \tag{23} \]

where {v_i} and {w_j} are the extreme points and extreme rays of the feasible set. Let f* denote the optimal value of problem (1); then c^⊤ v_i = f* when i ≤ k, c^⊤ v_i > f* when k < i ≤ t, and c^⊤ w_j > 0 for any j.

Therefore, the optimal solution of the perturbed feasibility problem (22) is still an optimal solution of the original problem if and only if

\[ \min_{i \le k} (c + \varepsilon)^\top v_i \le \min_{k < i \le t} (c + \varepsilon)^\top v_i, \tag{24} \]

which holds whenever δ is sufficiently small. Note that problem (22) has multiple optimal solutions only if there exist i, j with 1 ≤ i < j ≤ t such that (c + ε)^⊤(v_i − v_j) = 0. However, since v_i ≠ v_j, the set {ε : (v_i − v_j)^⊤ ε = −(v_i − v_j)^⊤ c} is of zero Lebesgue measure in R^n. Besides, there are only finitely many pairs of extreme points in Θ_p, so the perturbed problem has a unique optimal solution almost surely if ε is uniformly distributed in [−δ, δ]^n.
However, in practice a tiny perturbation may be swamped by machine error. To deal with this, we slightly increase the perturbation size and loosen the criterion for constructing P^k, so that P^k contains the optimal partition of the perturbed problem

min_x (c + ε)^⊤ x   s.t.   Ax = b, x ≥ 0.   (25)

In that case, we have Theorem 4.2 to estimate the error introduced by the perturbation ε:

Theorem 4.2.
Let x^k be the solution returned by the interior-point method on the original problem (1), with duality gap smaller than δ_g. If the perturbation satisfies ε ≥ 0, then the optimal basic feasible solution x̄^* returned by the perturbed problem (25) satisfies

c^⊤ x̄^* ≤ c^⊤ x^* + δ_g + (x^k)^⊤ ε,   (26)

where x^* is an optimal solution of the original problem.

Proof. Let (y^k, s^k) be the dual variables corresponding to x^k; then the duality gap is (x^k)^⊤ s^k ≤ δ_g. Since the perturbation is nonnegative, (x^k, y^k, s^k + ε) is a primal-dual solution pair for the perturbed problem (25), with duality gap (x^k)^⊤ s^k + (x^k)^⊤ ε. Now we have

c^⊤ x^k − (s^k)^⊤ x^k ≤ c^⊤ x^* ≤ c^⊤ x^k

and

c^⊤ x^k − (s^k)^⊤ x^k ≤ (c + ε)^⊤ x̄^* ≤ (c + ε)^⊤ x^k,

which, together with c^⊤ x̄^* ≤ (c + ε)^⊤ x̄^* (since ε ≥ 0 and x̄^* ≥ 0), gives

c^⊤ x̄^* ≤ c^⊤ x^* + (x^k)^⊤ s^k + (x^k)^⊤ ε  ⇒  c^⊤ x̄^* ≤ c^⊤ x^* + δ_g + (x^k)^⊤ ε. □

In practice, we can directly run the crossover procedure for the original problem starting from x̄^k, the interior-point solution of the perturbed feasibility problem, because x̄^k is also an interior point for the original problem. Moreover, when solving problem (22) is not obviously cheaper than solving problem (25), we can instead take P^k = [n] and use warm-starting strategies [Yildirim and Wright, 2002, Skajaa et al., 2013] to solve the perturbed problem (25) directly. Finally, after obtaining an advanced BFS from the perturbed problem (22) or (25), we can enter the reoptimization phase Col-OPT if an optimal BFS is required.
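The steps above can be sketched end to end on a toy LP. The snippet below is a minimal illustration, not the paper's implementation: scipy's `linprog` stands in for the LP solves, the interior-point pair (x^k, s^k) is written down by hand rather than produced by a barrier solver, and criterion (20) estimates the partition before the perturbed face problem (22) is solved and the bound (26) is checked.

```python
import numpy as np
from scipy.optimize import linprog

# Toy LP with a non-trivial optimal face: min c^T x over the simplex.
A = np.array([[1.0, 1.0, 1.0]]); b = np.array([1.0])
c = np.array([1.0, 1.0, 2.0])      # x1 and x2 tie, so the optimal face is an edge

# Hand-written stand-ins for an interior-point pair (x^k, s^k); s^k = c - y*1
# with y = 1 is dual feasible, so the pair has duality gap x^k . s^k.
x_k = np.array([0.45, 0.45, 0.10])
s_k = np.array([0.0, 0.0, 1.0])
delta_g = x_k @ s_k                # duality gap of the pair (= 0.1 here)

# Criterion (20): estimate the optimal partition P^k.
P = x_k >= s_k                     # -> [True, True, False]

# Perturbed problem (22) on the estimated face: x_j = 0 off P^k, cost c + eps
# with a small random eps >= 0 supported on P^k.
rng = np.random.default_rng(0)
eps = np.where(P, rng.uniform(0.0, 1e-4, c.size), 0.0)
bounds = [(0, None) if P[j] else (0, 0) for j in range(c.size)]
x_bar = linprog(c + eps, A_eq=A, b_eq=b, bounds=bounds, method="highs").x
# x_bar is almost surely a single vertex of the optimal face (Theorem 4.1).

# Bound (26): c^T x_bar <= c^T x_star + delta_g + (x^k)^T eps.
x_star = linprog(c, A_eq=A, b_eq=b, method="highs").x
assert c @ x_bar <= c @ x_star + delta_g + x_k @ eps + 1e-9
```

Here the perturbed solve returns one endpoint of the optimal edge, and the final assertion verifies the error bound of Theorem 4.2 numerically.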
In this section, we evaluate our smart crossover methods on a variety of datasets. For the network crossover method, we compare it with a state-of-the-art LP solver on randomly generated MCF problems, OT problems generated from the MNIST database of handwritten digits, and large network-LP problems from Hans Mittelmann's benchmarks for optimization software. For the perturbation crossover method, we compare it with a state-of-the-art LP solver on Hans Mittelmann's benchmark of barrier LP solvers. Throughout, the barrier method refers to the interior-point method. To be fair, we turn off the presolve phase of all solvers.

All experiments are implemented in MATLAB R2020b on a Windows 64 (x64) computer with an Intel(R) Core(TM) i7-7820HQ processor @2.90GHz (4 cores and 8 threads) and 16GB of RAM.

For the experiments on the network crossover method, we compare with the commercial LP solver Gurobi (9.0.3), which has demonstrated top-tier performance on the large network-LP benchmark. Since Gurobi does not provide a suitable parameter to warm-start the simplex method from a BFS of the perturbed feasibility problem (22), we compare the perturbation crossover method with another state-of-the-art commercial LP solver, Mosek (9.2), which also has top-tier performance on the benchmark of barrier LP solvers. We also select the emerging Sinkhorn method [Cuturi, 2013] as a representative algorithm for producing approximate solutions to OT problems, to show how our methods can be combined with more efficient approximation algorithms.

In this subsection, we evaluate the network crossover method introduced in Section 3. For convenience of notation, we call the two-phase crossover Col-BI and Col-OPT CNET, and the two-phase crossover Tree-BI and Col-OPT TNET. In particular, for OT problems we adopt TNET, and for the other problems we adopt the general CNET method. The simplex methods used in the column generation framework are all run through the Gurobi interface. The precision of the interior-point solution produced by the barrier algorithms (interior-point methods) is measured by the termination criterion on the primal-dual gap. Unless otherwise specified, in order to mimic the approximate solutions of first-order methods, we use a loose primal-dual gap tolerance when running the barrier algorithm and set θ_k = 2k in (18) and (19).

Synthetic MCF problems.
The synthetic problems clearly show the relationship between the crossover runtime and the problem scale. Figure 1 illustrates the crossover computation time when the number of nodes (arcs) is fixed and the number of arcs (nodes) varies. One can easily observe that, compared with the crossover procedure of Gurobi, CNET is not only considerably faster but also more stable and less susceptible to the problem scale.

Footnotes: MNIST database: http://yann.lecun.com/exdb/mnist/ ; benchmarks: http://plato.asu.edu/bench.html ; Sinkhorn code: http://marcocuturi.net/SI.html (Accessed: 21 January, 2021).

Figure 1: Synthetic MCF problems with varying scale: run-time comparison as the number of nodes or arcs gradually increases in our randomly generated MCF problems with integer components. In the left and right figures, the number of nodes is 5E+03 and the number of arcs is 4E+06, respectively. The experiment for each problem size is independently repeated 10 times.
Figure 2: Computation time of crossover on the large network-LP benchmark problems.
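Synthetic instances of the kind used above can be generated along the following lines. This is a hypothetical generator written for illustration; the paper specifies only that the instances are random MCF problems with integer components, so the particular graph model, cost and capacity ranges below are assumptions.

```python
import numpy as np

def random_mcf_lp(n_nodes, n_arcs, seed=0):
    """Generate a random min-cost-flow LP: min c^T x s.t. A x = b, 0 <= x <= u,
    where A is the node-arc incidence matrix of a random directed graph.
    (Illustrative generator; not the paper's actual instance distribution.)"""
    rng = np.random.default_rng(seed)
    tail = rng.integers(0, n_nodes, n_arcs)
    head = (tail + rng.integers(1, n_nodes, n_arcs)) % n_nodes  # head != tail
    A = np.zeros((n_nodes, n_arcs))
    A[tail, np.arange(n_arcs)] = 1.0    # each arc leaves its tail node ...
    A[head, np.arange(n_arcs)] = -1.0   # ... and enters its head node
    c = rng.integers(1, 100, n_arcs).astype(float)   # integer arc costs
    u = rng.integers(1, 50, n_arcs).astype(float)    # integer capacities
    b = np.zeros(n_nodes)   # circulation form: x = 0 is feasible by construction
    return A, b, c, u

A, b, c, u = random_mcf_lp(n_nodes=50, n_arcs=200)
# Sanity check: every incidence-matrix column sums to zero (one +1, one -1).
assert np.allclose(A.sum(axis=0), 0.0)
```

Scaling `n_nodes` or `n_arcs` independently reproduces the two sweeps shown in Figure 1.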
Large network benchmark problems.
Figures 2 and 3 show the experiments on the MCF benchmark problems from Hans Mittelmann's benchmarks for optimization software. One can observe that, besides the overall better performance than Gurobi, as the precision decreases, CNET's advantages become even more apparent and stable. This characteristic makes CNET well suited to starting from low-accuracy first-order solutions.
OT problems generated from the MNIST dataset.

Figure 3: Computation time of crossover on the large network-LP benchmark problems. The interior-point method terminates at different primal-dual gaps, ranging from 1 down to 1E-8.

The MNIST dataset consists of 60,000 images of handwritten digits of size 28 by 28 pixels. We randomly pick two images, normalize the sum of the non-zero weights to 1, and generate the OT problem of computing the Wasserstein distance between them. Figure 4 illustrates the difference between the approximate solution and the corner solution. The solution from the
Tree-BI phase is a nearby BFS of the interior-point solution, and the solution from the Col-OPT phase is the optimal BFS obtained after some simplex iterations. One can easily see that the corner solution has apparently higher sparsity and fully avoids the biased, blurry solution. Meanwhile, the corner solution retains the same shape but provides more information and greatly lowers the dimension.

Figure 4: Transport plan of a randomly generated OT problem from the MNIST dataset. The left, middle, and right images are the initial interior-point solution, the tree solution from Tree-BI, and the finally solved optimal solution, respectively.

Figure 5 illustrates the difference between the interior-point Wasserstein barycenters and the exact ones. Note that if we fix the barycenter and apply Tree-BI to construct the corresponding transportation plan, we obtain a BFS with the same barycenter shape, so the barycenter after the basis identification phase is the same as the interior-point barycenter.

Figure 5: The Wasserstein barycenters solved by the interior-point method (first row) and the barycenters after crossover (second row). Each image is the Wasserstein barycenter of 25 MNIST digit images.

Apparently, the exact barycenter obtained after some simplex iterations of the reoptimization phase is sharper and contains more information. To better exhibit the effect of the problem size, we artificially magnify the images by a factor α (α = 1, 2, 3, ...), generating images of size 28α × 28α. Table 1 lists the experimental details on the MNIST problems. No matter whether we start from the interior-point solution or the Sinkhorn method solution,
TNET is always faster than the crossover of Gurobi. Moreover, as the problem scale grows, TNET's advantages become even more apparent, which also makes it suitable for large-scale problems. It should be mentioned that [Dong et al., 2020] demonstrates experimentally that the network simplex method has the best computational performance among several exact methods. Table 1 also shows that TNET can be considerably faster than running the base simplex method from the very beginning. Therefore, since the network simplex method is an efficient implementation of the general simplex method for MCF problems, it is reasonable to conclude that combining the emerging fast inexact methods with the network-simplex-based TNET yields a faster exact method for OT problems.

Since [Dong et al., 2020] has already demonstrated that the network simplex method has the best computational performance among several exact methods for solving the OT problem, in this section we show that our crossover can also benefit from the network simplex method, and that the combination with Sinkhorn [Cuturi, 2013] can further largely outperform adopting the network simplex method directly.

Specifically, we use the commercial LP solver
Cplex, because it is equipped with a state-of-the-art network simplex method for MCF problems. To be fair, our CNET also adopts the network simplex method of Cplex to solve the subproblems. See Figure 6 for the comparison between Sinkhorn + CNET and the network simplex method of Cplex as the problem scale varies. One can easily observe that our crossover method can largely shorten the overall runtime when starting from an advanced solution, and this advantage becomes more apparent as the problem scale grows.

Besides, Table 2 compares Cplex (network simplex), Cplex (barrier method + crossover) and Sinkhorn + CNET on normally distributed random pictures. Compared with the simplex method of Gurobi, Cplex's network simplex method is more efficient on OT problems; thus, unlike for Gurobi, Cplex's network simplex method is better than its barrier algorithm. Even so, our crossover method still achieves the shortest computation time when starting from the Sinkhorn solutions. Since our crossover method can accelerate any simplex implementation from an advanced solution produced by other fast methods, it offers a new point of view for accelerating exact methods.
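For reference, the Sinkhorn iteration [Cuturi, 2013] used as the approximate first stage can be sketched as follows. This is a minimal textbook version on a toy instance; production implementations add log-domain stabilization, which is omitted here.

```python
import numpy as np

def sinkhorn(C, mu, nu, reg=0.5, iters=500):
    """Entropic-regularized OT: alternately rescale the rows and columns of
    K = exp(-C/reg) until the transport plan matches the marginals (mu, nu)."""
    K = np.exp(-C / reg)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)               # fit the column marginals
        u = mu / (K @ v)                 # fit the row marginals
    return u[:, None] * K * v[None, :]   # approximate (dense) transport plan

# Tiny OT instance: two 2-point distributions with a 0/1 ground cost.
mu = np.array([0.5, 0.5]); nu = np.array([0.3, 0.7])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
X = sinkhorn(C, mu, nu)
# The plan's marginals match (mu, nu) up to the iteration tolerance; the
# crossover step would then round this dense approximate plan to a sparse BFS.
assert np.allclose(X.sum(axis=1), mu, atol=1e-6)
assert np.allclose(X.sum(axis=0), nu, atol=1e-6)
```

The returned plan is dense and only approximately optimal, which is exactly why a crossover step is needed to recover a sparse optimal BFS from it.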
Table 1: Run-time comparison of Gurobi, Gurobi + TNET and Sinkhorn + TNET on OT problems generated from the MNIST dataset. Columns: α, n, m, gurSpl, Gurobi (gurBar, gurCrs), Gur+TNET (gurBar, TNET), Skh+TNET (Skh, TNET). Here n and m are the numbers of suppliers and consumers; gurSpl is the runtime of Gurobi's simplex algorithm on the OT problem; gurBar is the computation time of Gurobi's barrier algorithm without crossover; gurCrs is the computation time of Gurobi's crossover method after the barrier algorithm; TNET is the computation time of TNET starting from Gurobi's barrier method solution or the Sinkhorn method solution; Skh is the runtime of the Sinkhorn method; and ">" marks a time limit exceeded (over 1000 seconds).

Table 2: The average runtime of 10 independent trials of Cplex (network simplex), Cplex (barrier + crossover) and Sinkhorn + CNET on OT problems generated by normally distributed random pictures. Columns: size, cplNetSpl, Cplex (cplBarr, cplCross), Sinkhorn+CNET (Sinkhorn, CNET). Here size is the size of each picture; cplNetSpl is the runtime of Cplex's network simplex (seconds); cplBarr is the computation time of Cplex's barrier algorithm without crossover (seconds); cplCross is the computation time of Cplex's crossover method after the barrier (seconds); Sinkhorn is the runtime of the Sinkhorn method (seconds); and CNET is the computation time of CNET (seconds).

Figure 6: Average computation time of Cplex (network simplex method) and Sinkhorn + CNET on random OT problems with uniformly distributed cost: in the left diagram, the numbers of suppliers and consumers vary simultaneously between 500 and 7500; in the right diagram, the number of suppliers is fixed at 1000 and the number of consumers varies between 2000 and 30000. Each result is the average of 10 independent trials.

The perturbation crossover is aimed at accelerating the crossover procedure for general LP problems, so we use the default parameters of the LP solver. In this subsection, we run experiments on Hans Mittelmann's benchmark problems for barrier LP solvers and compare with the state-of-the-art commercial solver Mosek, since its simplex method can easily warm-start from a BFS of the perturbed feasibility problem (22); the underlying simplex method of our crossover could, however, also be another LP solver. Since the perturbation crossover method is especially valuable for problems with a high-dimensional optimal face, it may not perform well at accelerating problems whose crossover procedure is already fast. We therefore consider problems for which the crossover procedure is a bottleneck in designing an efficient barrier algorithm; in other words, problems for which, even when starting from a high-accuracy approximate solution, it still takes a long time to find a vertex solution. In our experiments, we chose all the problems whose crossover time is at least twice the interior-point method time.

Table 3: Computation time on the barrier LP benchmark problems. Columns: problem, original (mskCross), perturbed (mskBarr, mskCross, mskSpl, total time).
(Rows include stat96v1, graph40-40, graph40-80rand and fhnw-bin0.) Here mskCross (original) is the computation time of Mosek's crossover method after the barrier on the original problem (seconds); mskBarr is the computation time of Mosek's barrier algorithm without crossover on the perturbed problem; mskCross (perturbed) is the computation time of Mosek's crossover method after the barrier on the perturbed problem (seconds); mskSpl is the runtime of Mosek's simplex method when warm-started with the solution of the perturbed problem; and ">" marks a time limit exceeded (over 300 seconds).
Table 3 compares the crossover of Mosek and our perturbation crossover method, both starting from the same default interior-point solution. It shows that our perturbation crossover yields a significant improvement and thoroughly removes the bottleneck for some problems. For example, the crossover for datt256_lp is a common bottleneck for almost all commercial LP solvers, since the barrier algorithm usually terminates within seconds while the crossover procedure takes a much longer time; our perturbation crossover, however, solves it cleanly. Also note that, depending on the size of the perturbation, there is a trade-off between the crossover computation time on the perturbed feasibility problem and that of the reoptimization phase, so in practice solvers can benefit from the perturbation crossover method by concurrently running both the traditional crossover and our perturbation crossover with different perturbation sizes, to be more robust.

Acknowledgments
We thank Jiayi Guo and Fangkun Qiu for helpful discussions and fruitful suggestions.
References

[Ahuja et al., 1993] Ravindra K Ahuja, Thomas L Magnanti, and James B Orlin.
Network flows:theory, algorithms, and applications . Prentice Hall, 1993.[Altschuler et al., 2017] Jason Altschuler, Jonathan Weed, and Philippe Rigollet. Near-lineartime approximation algorithms for optimal transport via sinkhorn iteration. In
Advances in Neural Information Processing Systems 30, pages 1961–1971. Curran Associates Inc., 2017.

[Andersen and Ye, 1996] Erling D Andersen and Yinyu Ye. Combining interior-point and pivoting algorithms for linear programming.
Management Science , 42(12):1719–1731, 1996.[Benamou et al., 2015] Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna,and Gabriel Peyré. Iterative bregman projections for regularized transportation problems.
SIAM Journal on Scientific Computing , 37(2):A1111–A1138, 2015.[Bertsekas and Castanon, 1989] Dimitri P Bertsekas and David A Castanon. The auction algo-rithm for the transportation problem.
Annals of Operations Research , 20(1):67–96, 1989.[Bertsimas and Tsitsiklis, 1997] Dimitris Bertsimas and John N Tsitsiklis.
Introduction to linearoptimization , volume 6. Athena Scientific Belmont, MA, 1997.[Bixby and Saltzman, 1994] Robert E Bixby and Matthew J Saltzman. Recovering an optimallp basis from an interior point solution.
Operations Research Letters , 15(4):169–178, 1994.[Buttazzo et al., 2012] Giuseppe Buttazzo, Luigi De Pascale, and Paola Gori-Giorgi. Optimal-transport formulation of electronic density-functional theory.
Physical Review A , 85(6):062502,2012.[Carlier and Ekeland, 2010] Guillaume Carlier and Ivar Ekeland. Matching for teams.
Economictheory , 42(2):397–418, 2010.[Chazelle, 2000] Bernard Chazelle. A minimum spanning tree algorithm with inverse-ackermanntype complexity.
Journal of the ACM (JACM), 47(6):1028–1047, 2000.

http://plato.asu.edu/ftp/lpbar.html (Accessed: 21 January, 2021)

[Chiappori et al., 2010] Pierre-André Chiappori, Robert J McCann, and Lars P Nesheim. Hedonic price equilibria, stable matching, and optimal transport: equivalence, topology, and uniqueness. Economic Theory, 42(2):317–354, 2010.

[Cotar et al., 2013] Codina Cotar, Gero Friesecke, and Claudia Klüppelberg. Density functional theory and optimal transportation with coulomb cost.
Communications on Pure and AppliedMathematics , 66(4):548–599, 2013.[Cuturi, 2013] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal trans-port. In
Advances in Neural Information Processing Systems 26 , pages 2292–2300. CurranAssociates, Inc., 2013.[Cuturi and Doucet, 2014] Marco Cuturi and Arnaud Doucet. Fast computation of wassersteinbarycenters. In
Proceedings of the 31st International Conference on Machine Learning , pages685–693, 2014.[Dantzig, 1963] George Bernard Dantzig.
Linear programming and extensions . Princeton Univer-sity Press, 1963.[Dijkstra et al., 1959] Edsger W Dijkstra et al. A note on two problems in connexion with graphs.
Numerische Mathematik , 1(1):269–271, 1959.[Dong et al., 2020] Yihe Dong, Yu Gao, Richard Peng, Ilya Razenshteyn, and Saurabh Sawlani.A study of performance of optimal transport. arXiv preprint arXiv:2005.01182 , 2020.[Dvurechensky et al., 2018] Pavel Dvurechensky, Alexander Gasnikov, and Alexey Kroshnin.Computational optimal transport: Complexity by accelerated gradient descent is better thanby sinkhorn’s algorithm. In
Proceedings of the 35th International Conference on MachineLearning , pages 1367–1376, 2018.[Galabova and Hall, 2020] IL Galabova and JAJ Hall. The ‘idiot’crash quadratic penalty algo-rithm for linear programming and its application to linearizations of quadratic assignmentproblems.
Optimization Methods and Software , 35(3):488–501, 2020.[Galichon, 2018] Alfred Galichon.
Optimal transport methods in economics . Princeton UniversityPress, 2018.[Ge et al., 2019] Dongdong Ge, Haoyue Wang, Zikai Xiong, and Yinyu Ye. Interior-point methodsstrike back: solving the wasserstein barycenter problem. In
Advances in Neural InformationProcessing Systems 32 , pages 6894–6905, 2019.[Genevay et al., 2016] Aude Genevay, Marco Cuturi, Gabriel Peyré, and Francis Bach. Stochasticoptimization for large-scale optimal transport. In
Advances in Neural Information ProcessingSystems 29 , pages 3440–3448. Curran Associates, Inc., 2016.[Goldman and Tucker, 1956] Alan J Goldman and Albert W Tucker. Theory of linear program-ming.
Linear Inequalities and Related Systems , 38:53–97, 1956.[Güler and Ye, 1993] Osman Güler and Yinyu Ye. Convergence behavior of interior-point algo-rithms.
Mathematical Programming , 60(1-3):215–228, 1993.[Ho et al., 2019] Nhat Ho, Viet Huynh, Dinh Phung, and Michael Jordan. Probabilistic multilevelclustering via composite transportation distance. In
The 22nd International Conference on Artificial Intelligence and Statistics, pages 3149–3157, 2019.

[Jambulapati et al., 2019] Arun Jambulapati, Aaron Sidford, and Kevin Tian. A direct Õ(1/ε) iteration parallel algorithm for optimal transport. In Advances in Neural Information Processing Systems 32, pages 11359–11370. Curran Associates, Inc., 2019.

[Janati et al., 2020] Hicham Janati, Marco Cuturi, and Alexandre Gramfort. Debiased Sinkhorn barycenters. In
Proceedings of the 37th International Conference on Machine Learning , pages4692–4701, 2020.[Karger et al., 1995] David R Karger, Philip N Klein, and Robert E Tarjan. A randomized linear-time algorithm to find minimum spanning trees.
Journal of the ACM (JACM) , 42(2):321–328,1995.[Kojima et al., 1989] Masakazu Kojima, Shinji Mizuno, and Akiko Yoshise. A polynomial-timealgorithm for a class of linear complementarity problems.
Mathematical Programming ,44(1-3):1–26, 1989.[Kortanek and Jishan, 1988] KO Kortanek and Zhu Jishan. New purification algorithms forlinear programming.
Naval Research Logistics (NRL) , 35(4):571–583, 1988.[Kroshnin et al., 2019] Alexey Kroshnin, Nazarii Tupitsa, Darina Dvinskikh, Pavel Dvurechensky,Alexander Gasnikov, and Cesar Uribe. On the complexity of approximating Wassersteinbarycenters. In
Proceedings of the 36th International Conference on Machine Learning , pages3530–3540, 2019.[Kruskal, 1956] Joseph B Kruskal. On the shortest spanning subtree of a graph and the travelingsalesman problem.
Proceedings of the American Mathematical society , 7(1):48–50, 1956.[Lin et al., 2020a] Tianyi Lin, Nhat Ho, Xi Chen, Marco Cuturi, and Michael I Jordan. Fixed-support wasserstein barycenters: Computational hardness and fast algorithm. In
Advancesin Neural Information Processing Systems 33 Pre-proceedings , 2020a.[Lin et al., 2019] Tianyi Lin, Nhat Ho, and Michael I Jordan. On the efficiency of the sinkhorn andgreenkhorn algorithms and their acceleration for optimal transport. In
The 36th InternationalConference on Artificial Intelligence and Statistics , pages 3982–3991, 2019.[Lin et al., 2020b] Tianyi Lin, Shiqian Ma, Yinyu Ye, and Shuzhong Zhang. An admm-basedinterior-point method for large-scale linear programming.
Optimization Methods and Software ,0(0):1–36, 2020b.[Megiddo, 1991] Nimrod Megiddo. On finding primal-and dual-optimal bases.
ORSA Journal onComputing , 3(1):63–65, 1991.[Mehrotra and Ye, 1993] Sanjay Mehrotra and Yinyu Ye. Finding an interior point in the optimalface of linear programs.
Mathematical Programming , 62(1-3):497–515, 1993.[Mitra et al., 1988] Gautam Mitra, Mehrdad Tamiz, and Joseph Yadegar. Experimental investi-gation of an interior search method within a simplex framework.
Communications of theACM , 31(12):1474–1482, 1988.[Nguyen, 2013] XuanLong Nguyen. Convergence of latent mixing measures in finite and infinitemixture models.
The Annals of Statistics , 41(1):370–400, 2013.[Prim, 1957] Robert Clay Prim. Shortest connection networks and some generalizations.
The Bell System Technical Journal, 36(6):1389–1401, 1957.

[Santambrogio, 2015] Filippo Santambrogio.
Optimal transport for applied mathematicians .Springer, 2015.[Schrijver, 1998] Alexander Schrijver.
Theory of linear and integer programming . John Wiley &Sons, 1998.[Skajaa et al., 2013] Anders Skajaa, Erling D Andersen, and Yinyu Ye. Warmstarting thehomogeneous and self-dual interior point method for linear and conic quadratic problems.
Mathematical Programming Computation , 5(1):1–25, 2013.[Tolstikhin et al., 2018] Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, and Bernhard Schoelkopf.Wasserstein auto-encoders. In
International Conference on Learning Representations , 2018.[Villani, 2003] Cédric Villani.
Topics in optimal transportation . American Mathematical Society,2003.[Wagner, 1959] Harvey M Wagner. On a class of capacitated transportation problems.
Manage-ment Science , 5(3):304–318, 1959.[Ye, 1992] Yinyu Ye. On the finite convergence of interior-point algorithms for linear programming.
Mathematical Programming , 57(1-3):325–335, 1992.[Yildirim and Wright, 2002] E Alper Yildirim and Stephen J Wright. Warm-start strategies ininterior-point methods for linear programming.