From an Interior Point to a Corner Point: Smart Crossover
Dongdong Ge∗, Chengwenjian Wang†, Zikai Xiong‡, Yinyu Ye§

February 22, 2021
Abstract
The crossover in solving linear programs is a procedure to recover an optimal corner/extreme point from an approximately optimal inner point generated by the interior-point method or emerging first-order methods. Unfortunately, it is often observed that the computation time of this procedure can be much longer than that of the preceding stage. Our work shows that this bottleneck can be significantly improved if the procedure smartly takes advantage of the problem characteristics and implements customized strategies. For problems with network structure, our approach can even start from an inexact solution of the interior-point method as well as of other emerging first-order algorithms. It fully exploits the network structure to smartly evaluate each column's potential of forming the optimal basis, and efficiently identifies a nearby basic feasible solution. For problems with a large optimal face, we propose a perturbation crossover approach to find a corner point of the optimal face. Comparison experiments with state-of-the-art commercial LP solvers on classical linear programming benchmarks, network flow benchmarks, and the MNIST dataset exhibit considerable advantages in practice.
The linear programming (LP) problem is a fundamental tool in a wide range of practical applications and theoretical analysis. In linear programming, a basic feasible solution (BFS) is equivalent to a vertex solution, corner solution, or extreme point of the feasible set. In practice, the BFS is essentially important for many reasons. First of all, a BFS is necessary for warm-starting the simplex method, which cannot start from a non-corner solution given by the interior-point method or first-order methods. Secondly, the BFS is often sparse, which is valuable for the LP relaxation of a mixed integer programming problem, or for the subproblems of some discrete optimization problems. Moreover, for some mixed integer problems, such as many network flow problems with integer parameters, an optimal BFS of the LP relaxation has been proven to be equivalent to an optimal solution of the original problem. This property is called total unimodularity [Schrijver, 1998].

Compared with the simplex method, the interior-point method often converges faster in practice but only returns an inner point solution instead of a BFS. To deal with this, the crossover is the procedure to obtain an optimal BFS from an interior point solution. Unfortunately, in many LP cases its computation time often largely exceeds that of the interior-point method.

∗ Research Institute for Interdisciplinary Sciences, Shanghai University of Finance and Economics, Shanghai, 200433, China. Email: [email protected]
† Corresponding author, School of Mathematical Sciences, Fudan University, Shanghai, 200086, China. Email: [email protected]
‡ Corresponding author, Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. Email: [email protected]
§ Department of Management Science and Engineering, Stanford University, Stanford, CA 94305, USA. Email: [email protected]
The same dilemma happens to popular first-order methods, which often generate approximately optimal inner point solutions. Considering that these problems usually have unique characteristics, if we can utilize those characteristics and adopt ad-hoc crossover strategies, the performance of LP solvers can be considerably improved.

There have been significant efforts devoted to identifying the optimal basis from a given interior point solution. For example, [Mitra et al., 1988] and [Kortanek and Jishan, 1988] develop an algorithm called purification that implements simplex-like iterations to push the nonbasic variables with intermediate values to their lower or upper bounds. [Megiddo, 1991] gives a strongly polynomial time algorithm to find a basis from a primal-dual solution pair. [Bixby and Saltzman, 1994] slightly modify it, simultaneously maintaining non-negativity in each pivot and iterating towards complementarity. [Andersen and Ye, 1996] construct an approximate problem with a known basic feasible solution, and prove that the optimal basis remains unchanged if starting from a high-accuracy solution. [Galabova and Hall, 2020] reveal that the open source solver Clp implements the 'Idiot' crash to increase the sparsity of the given solution via a penalty method, aimed at warm-starting the simplex method faster.

Nowadays, the traditional interior-point method is often not applicable to many huge-scale LP problems due to its high per-iteration cost. One line of research is to simplify the computation of the interior-point method by applying first-order methods to the subproblems and exploiting the problem structure. For instance, instead of applying Newton's method, [Lin et al., 2020b] propose an ADMM-Based Interior Point Method (ABIP), which implements the alternating direction method of multipliers (ADMM) to minimize the barrier function of the primal-dual path-following method for LP.
[Ge et al., 2019] exploit the structure of the Wasserstein barycenter problem and largely decrease the complexity of the interior-point method for this problem. For some important large-scale problems, many researchers develop approximate methods that fully exploit the problem's unique structure, especially the network structure, e.g., the optimal transport problem [Cuturi, 2013, Altschuler et al., 2017, Lin et al., 2019] and the Wasserstein barycenter problem [Benamou et al., 2015, Janati et al., 2020, Lin et al., 2020a]. However, state-of-the-art commercial LP solvers still cannot benefit from these emerging first-order solutions, because those solutions are usually of low accuracy and lack the corresponding dual solution.

As an important case of LP, the minimum cost flow (MCF) problem has an important property called total unimodularity [Schrijver, 1998]: when all parameters in the standard form are integers, each BFS is also integer-valued. Total unimodularity is of unique importance because the solution of the mixed integer program is exactly the optimal BFS of the LP relaxation. It should be mentioned that the network simplex method has been well developed to obtain the optimal BFS of MCF problems [Dantzig, 1963, Ahuja et al., 1993]. But for problems at a huge scale, the network simplex method cannot be efficiently parallelized and no efficient algorithm has been developed yet, compared with general first-order methods like ABIP [Lin et al., 2020b] or other methods for its special cases, like the auction algorithm [Bertsekas and Castanon, 1989] and the Sinkhorn method [Cuturi, 2013].

Besides, MCF problems contain many important special cases, such as the shortest path problem, maximum flow problem, optimal transport problem, and assignment problems (see more in [Ahuja et al., 1993]). In many of them, the exact optimal solutions are of unique interest in multiple areas.
[Wagner, 1959] has proven that MCF problems can be equivalently transformed into optimal transport (OT) problems. The OT problem has served as a foundation for a wide range of fields, including mathematics [Villani, 2003, Santambrogio, 2015], economics [Galichon, 2018], and machine learning [Nguyen, 2013, Tolstikhin et al., 2018, Ho et al., 2019]. The recent

Clp: https://github.com/coin-or/Clp

Contribution.
In this paper, we consider the problem in the following general form:

\[ \min_x \; c^\top x \quad \text{s.t.} \quad Ax = b,\ x \ge 0, \tag{1} \]

where A ∈ R^{m×n}, c ∈ R^n and b ∈ R^m. Given a single inexact solution x̃, from the output of the interior-point method or a first-order method, the crossover method can warm-start the simplex method and iterate to an optimal BFS of problem (1). Generally, the crossover method consists of two phases, basis identification and reoptimization. In the basis identification phase, the algorithm identifies an advanced BFS near x̃. In the reoptimization phase, it applies simplex-like iterations to obtain the optimal BFS.

• For the emerging large-scale LP problems with network structure, starting from an approximate solution x̃, we propose a criterion to evaluate each column's possibility of being in the optimal basis, and a column generation based basis identification phase to reach a nearby basis, without utilizing any information from the dual solution. Besides, based on the spanning tree characteristics of a BFS, we also propose a tree structure based basis identification method. We also design a reoptimization phase to iterate towards the optimal corner point if required.

• For general LP problems with long crossover computation time, it is usually the high-dimensional optimal face that causes the high basis identification cost. For these problems, we propose a perturbation crossover that first constructs an approximate optimal face via the primal-dual solution pair, and then directly finds a corner solution in the optimal face. Our perturbation crossover can also choose whether or not to iterate towards the optimum according to the problem requirements.

The numerical experiments on many classical and important datasets demonstrate that our methods merit wide applicability and perform particularly well on problems with network structure, with significant improvement over Gurobi and other commercial solvers.
In addition, combined with other, more efficient first-order algorithms, our methods can greatly speed up the overall solution of the problem. For example, on large-scale OT problems, combined with the Sinkhorn method, we can find the exact solution in a much shorter time than commercial solvers, which run the simplex method after a general barrier algorithm. Also, by using our perturbation crossover technique, we achieve a great improvement on some typical problems with high-dimensional optimal faces and long crossover times, reducing their solving time by a factor of ten or more.
Organization.
The rest of the paper is organized as follows. In Section 2, we introduce the optimal face and the framework of the column generation method. In Section 3, we propose a network structure based two-phase crossover method for MCF problems and general LP problems that can start from an approximate solution given by first-order methods. In Section 4, we propose a perturbation crossover method to accelerate the crossover procedure of LP solvers on general LP problems. The computational results are presented in Section 5.
Notations.
We let [n] be shorthand for {1, 2, . . . , n}. For a set X, the notation |X| denotes the cardinality of X. For X ⊂ [n] and an n-vector u, u_X is the |X|-vector constituted by the components of u with indices in X. 1_n and 0_n are the n-vectors of ones and zeros. For a matrix X and scalar x, X ≥ x means each component of X is greater than or equal to x. For a matrix A = (A)_{ij}, the positive part and negative part of A are A⁺ and A⁻, with (A⁺)_{ij} := max{(A)_{ij}, 0} and (A⁻)_{ij} := max{−(A)_{ij}, 0}. A_{i·}, A_{·j} and A_{ij} denote the i-th row, the j-th column, and the component in the i-th row and j-th column.

LP problems with network structure.
There are many LP problems having network structure. The Optimal Transport (OT) problem is a typical one, which is also a special equivalent form of the MCF problem. The formulation of the OT problem is as follows:
Example 2.1 (Optimal Transport (OT) Problem). Following the same notation as in Definition 3.1, the nodes N can be divided into two groups, suppliers S and consumers C, and the arcs A pairwise connect suppliers and consumers, in the direction from the former to the latter, i.e., A = S × C := {(i, j) : i ∈ S, j ∈ C}. For any i ∈ S, O(i) = C and I(i) = ∅, while for any j ∈ C, O(j) = ∅ and I(j) = S. For any supplier i and consumer j, s_i and d_j respectively denote the capacity of the supply and demand. Besides, the capacity of any route is unlimited, i.e., u_ij = ∞ for any (i, j) ∈ A. Then any OT problem can be rewritten as follows:

\[ \min_f \sum_{(i,j)\in S\times C} c_{ij} f_{ij} \quad \text{s.t.} \quad \sum_{j\in C} f_{ij} = s_i \ \ \forall i \in S, \quad \sum_{i\in S} f_{ij} = d_j \ \ \forall j \in C, \quad f_{ij} \ge 0 \ \ \forall (i,j) \in S\times C. \tag{2} \]

There are also many other important linear programs that originate from network structure but are not MCF problems, such as the Wasserstein barycenter problem with fixed support.

Example 2.2 (Fixed-Support Wasserstein Barycenter (FS-WB) Problem). The Wasserstein barycenter of probability measures (μ_k)_{k=1}^N is the probability measure μ with the minimal sum of Wasserstein distances to each probability measure. For the convenience of computation, we usually assume that (μ_k)_{k=1}^N are discrete probability measures, with support points {x^k_i}_{i∈[m_k]} and weights u_k, where u_k ∈ R^{m_k}. Similarly, in FS-WB problems, the support points {x̂_i}_{i∈[m]} of the barycenter are also fixed, but the weight u is part of the variables. Then, the FS-WB between {μ_k}_{k=1}^N can be represented by the following linear program:

\[ \min_{u,\{X_k\}_{k\in[N]}} \sum_{k=1}^{N} \omega_k \langle C_k, X_k \rangle \quad \text{s.t.} \quad X_k \mathbf{1}_m = u_k, \quad X_k^\top \mathbf{1}_{m_k} = u, \quad u \ge 0,\ X_k \ge 0, \ \text{for all } k \in [N], \tag{3} \]

where {X_k}_{k=1}^N represents the transportation plans between the barycenter and the N probability measures, and {C_k}_{k=1}^N denotes the cost matrices.
To be specific, the cost per unit of weight between the i-th support point of probability measure μ_k and the j-th support point of the barycenter μ is (C_k)_{ij} := d^p(x^k_i, x̂_j).

The Wasserstein barycenter problem has a similar structure to the OT problem but is significantly harder. If we fix the weight u of the barycenter, then the FS-WB problem is equivalent to N separate OT problems. But [Lin et al., 2020a] have proven that most fixed-support Wasserstein barycenter problems are not MCF problems. Our smart crossover can not only deal with MCF problems but also performs well on linear programs with network structure, such as FS-WB problems. When given a non-corner solution, we can fix the weight u of the barycenter and use Tree-BI to obtain the BFS X̄_k, k ∈ [N], for the separate OT problems. Then (u, {X̄_k}_{k∈[N]}) is a BFS for the FS-WB problem, and we can warm-start the simplex method from it.

Optimal face.
For the general primal linear program (1), the corresponding dual problem is

\[ \max_{y,s} \; b^\top y \quad \text{s.t.} \quad A^\top y + s = c,\ s \ge 0, \tag{4} \]

where A ∈ R^{m×n}, c ∈ R^n and b ∈ R^m. [Goldman and Tucker, 1956] have proven that there exists at least one optimal solution pair (x*, s*) that is strictly complementary, i.e.,

\[ P(x^*) \cap P(s^*) = \emptyset, \quad P(x^*) \cup P(s^*) = [n], \tag{5} \]

where P(x) := {i : x_i > 0}. Moreover, if we denote P* := P(x*) and P̄* := P(s*), then {P*, P̄*} is called the optimal partition. One can further prove that for any other complementary solution pair (x, s),

\[ P(x) \subset P^*, \quad P(s) \subset \bar{P}^*. \tag{6} \]

Therefore, the optimal face of the primal problem (1) can actually be written as

\[ \Theta_p := \{x : Ax = b,\ x \ge 0,\ \text{and } x_{\bar{P}^*} = 0\}, \tag{7} \]

and the optimal face of the dual problem (4) is

\[ \Theta_d := \{(y, s) : A^\top y + s = c,\ s \ge 0,\ \text{and } s_{P^*} = 0\}. \tag{8} \]

Moreover, the strict complementarity property shows that the relative interiors of Θ_p and Θ_d are nonempty, and [Mehrotra and Ye, 1993] have proven that the interior-point method always converges to the relative interior of the optimal face. Therefore, the larger the cardinality of P* is, the higher the dimension of the optimal face, the more corner points there are in the optimal face Θ_p, and thus the more difficult it is to identify an optimal BFS in the optimal face. In real LP applications, the optimal solution is usually highly degenerate and the cardinality of P* is very large, so the crossover often becomes the bottleneck.

Column generation method.
The column generation method is an efficient approach for large-scale LP problems. Instead of directly solving the original problem (1), the column generation method is essentially a sequence of master iterations. In each iteration, we form a collection of columns A_{·i}, i ∈ I, and solve the following restricted problem LP(I):

\[ \min_x \sum_{i\in I} c_i x_i \quad \text{s.t.} \quad \sum_{i\in I} A_{\cdot i} x_i = b, \quad x_{[n]\setminus I} = 0, \quad x \ge 0. \tag{9} \]

See Algorithm 1 for the general framework. The collection I_{k+1} contains the basis of the solution x^k of LP(I_k), so LP(I_{k+1}) can easily start from x^k. The sequence (I_k)_k can be trivially generated by

\[ I_{k+1} \leftarrow I_k \cup \arg\min_i \{\bar{c}_i \mid i \notin I_k\} \tag{10} \]

in each iteration. There are also many other variants; see e.g. [Bertsimas and Tsitsiklis, 1997].

Algorithm 1
General Column Generation Method

Initialize iteration counter k = 1.
while the reduced cost c̄ ≱ 0 do
    Use the simplex method to solve the restricted problem LP(I_k).
    Update the reduced cost c̄ and k ← k + 1.
end while

In this section, we consider the general minimum cost flow problem (11).
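The master-iteration bookkeeping of Algorithm 1, together with the update rule (10), can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: columns are stored densely, the restricted-problem simplex solve itself is left abstract, and the helper names (`reduced_costs`, `grow_index_set`) are ours.

```python
def reduced_costs(c, A, y):
    """Reduced costs cbar_i = c_i - y^T A_{.i} for every column i (A stored row-wise)."""
    m, n = len(A), len(c)
    return [c[i] - sum(y[k] * A[k][i] for k in range(m)) for i in range(n)]


def grow_index_set(I, cbar):
    """Update rule (10): add the excluded column with the most negative reduced cost.

    If every excluded column already has a nonnegative reduced cost, the current
    restricted optimum solves the full problem, so I is returned unchanged and
    the while-loop of Algorithm 1 terminates."""
    outside = [i for i in range(len(cbar)) if i not in I]
    if not outside:
        return set(I)
    best = min(outside, key=lambda i: cbar[i])
    return set(I) if cbar[best] >= 0 else set(I) | {best}
```

For example, with c = (1, 2, −3), A = [[1, 0, 1], [0, 1, 1]] and dual estimate y = (1, 1), the reduced costs are (0, 1, −5), so I = {0} grows to {0, 2}; once no reduced cost is negative, the loop stops.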
Definition 3.1 (Minimum Cost Flow (MCF) Problem). In general minimum cost flow (MCF) problems, there is a directed graph G = (N, A) with capacity u_ij and cost c_ij per unit of flow for each arc (i, j) ∈ A. Here u_ij must be non-negative and possibly infinite, and b_i represents the external supply or demand for each node i ∈ N. If we use f_ij to denote the amount of flow on arc (i, j), a general minimum cost flow problem can be formulated as follows:

\[ \min_f \sum_{(i,j)\in A} c_{ij} f_{ij} \quad \text{s.t.} \quad b_i + \sum_{j\in I(i)} f_{ji} = \sum_{j\in O(i)} f_{ij} \ \ \forall i \in N, \quad 0 \le f_{ij} \le u_{ij} \ \ \forall (i,j) \in A, \tag{11} \]

where O(i) := {j : (i, j) ∈ A} and I(i) := {j : (j, i) ∈ A}.

Suppose that we have already obtained an approximate solution f̃ of problem (11) via the interior-point method or any first-order method. Now we have the following lemma.

Lemma 3.1. For the general MCF problem (11), named
MCF(b, c, u): for any partition of A into L and U, we can construct an equivalent problem MCF(b̄, c̄, ū). Moreover, if f̄* is an optimal solution of MCF(b̄, c̄, ū), then

\[ \tilde{f}^*_{ij} := \begin{cases} \bar{f}^*_{ij}, & \text{for } (i,j) \in L, \\ u_{ij} - \bar{f}^*_{ji}, & \text{for } (i,j) \in U, \end{cases} \tag{12} \]

is an optimal solution of MCF(b, c, u), and vice versa.

Proof. We can shift problem (11) to construct an equivalent MCF problem. The new MCF problem is based on the network Ḡ := (N̄, Ā), in which

\[ \bar{N} = N, \quad \bar{A} = L \cup \{(j,i) : (i,j) \in U\}. \tag{13} \]

Accordingly, the capacity limit ū remains the same on the corresponding modified arcs, and the cost c̄ in the same direction remains unchanged while the cost in the opposite direction flips sign, i.e.,

\[ \bar{u}_L = u_L \ \text{and} \ \bar{u}_{ji} = u_{ij} \ \text{when } (i,j) \in U; \qquad \bar{c}_L = c_L \ \text{but} \ \bar{c}_{ji} = -c_{ij} \ \text{when } (i,j) \in U. \tag{14} \]

Given any flow f in the original problem (11), we can construct the corresponding new flow f̄ by

\[ \bar{f}_{ij} = f_{ij} \ \text{for } (i,j) \in L, \qquad \bar{f}_{ji} = u_{ij} - f_{ij} \ \text{for } (i,j) \in U, \tag{15} \]

so 0 ≤ f̄ ≤ ū if and only if 0 ≤ f ≤ u. And b̄_i = b_i − Σ_{(i,j)∈U} u_ij + Σ_{(j,i)∈U} u_ji. Then the new MCF problem MCF(b̄, c̄, ū) is as follows:

\[ \min_f \sum_{(i,j)\in \bar{A}} \bar{c}_{ij} f_{ij} \quad \text{s.t.} \quad \bar{b}_i + \sum_{j\in \bar{I}(i)} f_{ji} = \sum_{j\in \bar{O}(i)} f_{ij} \ \ \forall i \in \bar{N}, \quad 0 \le f_{ij} \le \bar{u}_{ij} \ \ \forall (i,j) \in \bar{A}. \tag{16} \]

It is now easy to see that f̄ is feasible for the new problem MCF(b̄, c̄, ū) if and only if f is feasible for the original problem MCF(b, c, u). Besides, we have

\[ \sum_{(i,j)\in \bar{A}} \bar{c}_{ij} \bar{f}_{ij} = \sum_{(i,j)\in A} c_{ij} f_{ij} - \sum_{(i,j)\in U} c_{ij} u_{ij}. \]

Therefore, f̄ is an optimal solution of MCF(b̄, c̄, ū) if and only if f is an optimal solution of the original problem MCF(b, c, u). □

[Mehrotra and Ye, 1993] have shown that the iterations of the interior-point method converge to an interior point of the optimal face.
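Returning to Lemma 3.1, the objective identity at the end of its proof can be checked numerically on a toy network. The sketch below is illustrative, not the paper's code: arcs are stored as a dictionary mapping (i, j) to a (flow, cost, capacity) triple, and the function names are ours.

```python
def reverse_arcs(arcs, U):
    """Lemma 3.1 transformation: each arc (i, j) in U is replaced by its reversal
    (j, i) with flow u_ij - f_ij, cost -c_ij and unchanged capacity; arcs in L
    (i.e. not in U) are kept as they are."""
    new = {}
    for (i, j), (f, c, u) in arcs.items():
        if (i, j) in U:
            new[(j, i)] = (u - f, -c, u)
        else:
            new[(i, j)] = (f, c, u)
    return new


def flow_cost(arcs):
    """Total cost sum of c_ij * f_ij over all arcs."""
    return sum(f * c for (f, c, u) in arcs.values())
```

For instance, for arcs {(1, 2): (3, 2, 5), (2, 3): (4, 1, 4)} with U = {(2, 3)}, the original cost is 10 and the transformed cost is 6, which differ by exactly c_23 · u_23 = 4, as the proof asserts.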
Therefore, if the solution f̃ is of sufficient accuracy and we use the partition L := {(i, j) : 0 ≤ f̃_ij ≤ u_ij/2} and U := A \ L, this partition will remain the same in the remaining iterations. Moreover, under this circumstance, when the problem has a unique optimal solution, the original MCF problem is even equivalent to the uncapacitated version of
MCF(b̄, c̄, ū). Without loss of generality, we can now assume that 0 ≤ f̃_ij ≤ u_ij/2 for each arc (i, j) ∈ A.

In uncapacitated MCF problems, the basic solutions have an important property: a solution f is basic (feasible) if and only if it is a (feasible) tree solution. Generally speaking, the tree solutions in uncapacitated MCF problems are those whose nonzero components form a subset of a spanning tree, without considering the direction of flows [Ahuja et al., 1993]. Therefore, for a nondegenerate BFS of the uncapacitated MCF problem on a connected graph, there are exactly |N| − 1 strictly positive components, connecting each node in the graph and corresponding to a spanning tree in G. However, for an interior-point solution, the flow on each arc is strictly positive, so the arc that serves the most flow to each specific node has higher potential and priority of being in the basis. We define the flow ratio to measure this.

Definition 3.2 (Flow ratio for MCF problems). For any flow f, we define the maximal flow on node k as f_k := Σ_{l∈O(k)} f_kl + Σ_{l∈I(k)} f_lk. Then the flow ratio of arc (i, j) is defined as r_ij := max{f_ij/f_i, f_ij/f_j}.

One can easily observe that the flow ratio measures the proportion of flow that a specific arc serves among all the conjoint arcs. Since the iterations of the interior-point method converge to the relative interior of the optimal face, for the arcs not in the optimal face the flow ratio gradually converges to zero. The flow ratio contains not only the magnitude information but also the graph information, so it is more powerful than the criterion proposed by [Mehrotra and Ye, 1993] in predicting P*. The arc/column with a higher flow ratio is more likely to be in the basis, so given any approximate solution f̃ we can sort the arcs as s := (s_1, s_2, . . .
, s_{|A|}), in which the first |N| arcs connect all nodes with the largest flow ratios and the rest of the array is arranged in descending order of f̃'s flow ratio.

Now we can propose a column generation based crossover method, composed of two phases: basis identification and reoptimization.

Column generation basis identification (
Col-BI). The basis identification phase aims at capturing a BFS near the non-corner solution f̃. The key idea in the basis identification phase is to run the column generation method (Algorithm 1) on an equivalent problem and obtain a nearby BFS of the original problem. Note that for any general LP problem (1) with upper bounds on x, we can use the big-M method to form the following equivalent problem:

\[ \min_{x, x_a} \; c^\top x + M \mathbf{1}_m^\top x_a \quad \text{s.t.} \quad Ax + I x_a = b, \quad l \le x \le u, \quad x_a \ge 0, \tag{17} \]

where I is the identity matrix in R^{m×m} and M is a large positive scalar. So there is an obvious basic feasible solution (x^⊤, x_a^⊤) = (0_n^⊤, b^⊤) that can serve as the starting point of the simplex method in the column generation method. Besides, if the original problem is the general MCF problem (11), then when M is larger than n · max_j |c_j|, it is easy to prove that problem (17) is equivalent to the original problem. Then we directly adopt Algorithm 1, in which I_1 := {n + 1, . . . , n + m} and the sequence (I_k)_k is generated by

\[ I_{k+1} \leftarrow \left( I_k \cup \{s_1, s_2, \ldots, s_{\theta_k}\} \right) \cap \{n < i \le m+n : x^k_i \text{ is nonbasic}\}^c. \tag{18} \]

Here (θ_k)_k can be any monotonically increasing integer sequence. When all the artificial variables of x^k are nonbasic, the first n components of x^k form a BFS for the original problem. Usually, if our goal is to obtain a BFS but we do not require it to be exactly optimal, this BFS is already of very high quality.

Column generation reoptimization (
Col-OPT). After obtaining an advanced starting BFS of the original problem, we can continue with the reoptimization phase to obtain an ε-optimal corner solution (one whose reduced cost satisfies c̄ ≥ −ε). In this phase, we use the column generation method (Algorithm 1) on the original problem and generate the sequence (I_k)_k by

\[ I_{k+1} \leftarrow I_k \cup \{i : \bar{c}_i < -\varepsilon\} \cup \{s_1, s_2, \ldots, s_{\theta_k}\} \tag{19} \]

until c̄ ≥ −ε. Similarly, (θ_k)_k can be any monotonically increasing integer sequence.

Spanning tree basis identification (
Tree-BI). Instead of using the general
Col-BI, we also have an alternative basis identification method for MCF problems. Because the BFS of any MCF problem must be a feasible tree solution, and a tree solution corresponds to a spanning tree in the graph, instead of directly dealing with the LP problem we can alternatively try to construct the spanning tree with the highest sum of flow ratios in the graph. If the given non-corner solution is of sufficiently high accuracy and the optimal solution is unique, such a spanning tree corresponds exactly to the optimal BFS. If not, we can still obtain a high-quality basic solution in a very short time. Besides, many efforts have been devoted to designing algorithms that find the maximum/minimum weight spanning tree. Among the classic algorithms with complexity O(|A| log(|N|)) are Prim's algorithm [Prim, 1957, Dijkstra et al., 1959] and Kruskal's algorithm [Kruskal, 1956]; for faster, almost linear time algorithms, see [Karger et al., 1995] and [Chazelle, 2000]. When the accuracy of the non-corner solution is not high enough and the tree solution is infeasible, we can implement network-simplex-like iterations to push it to the feasible set. Besides, since optimal transport problems have a unique network structure, the iterations of this 'push' phase can be very simple.

For OT problems, the network simplex iterations can be especially simple. We have the following lemma:

Lemma 3.2.
For an OT problem, suppose that the flow f is infeasible, with f_ij < 0 for some (i, j) ∈ A. Then there exist j′, i′ ∈ N such that f_ij′ > 0, f_i′j > 0, and (i′, j′) ∈ A.

Proof. Since (i, j) ∈ A, we have i ∈ S and j ∈ C. Besides, we have Σ_{l∈C} f_il = s_i ≥ 0 and Σ_{l∈S} f_lj = d_j ≥ 0. Note that we already have f_ij < 0, so there must exist f_ij′ > 0 with j′ ∈ C and f_i′j > 0 with i′ ∈ S. Because i′ ∈ S and j′ ∈ C, we have (i′, j′) ∈ A. □

Thus, by this lemma, the network simplex method can directly eliminate an inverse branch, say f_ij < 0, by repeating the following process: find j′, i′ such that f_ij′, f_i′j > 0 and increase the value along the loop i → j → i′ → j′ → i until one branch of this loop goes to zero. Therefore, the network simplex iteration in the push phase can be written directly as follows:

Algorithm 2
The 'Push' Phase for OT Problems

Input: a basic solution f and I = {(i, j) | f_ij < 0}
while I is nonempty do
    for (i, j) ∈ I do
        j′ ← arg max_{l∈C} f_il
        i′ ← arg max_{k∈S} f_kj
        θ ← min{−f_ij, f_i′j, f_ij′}
        f_ij ← f_ij + θ, f_ij′ ← f_ij′ − θ, f_i′j ← f_i′j − θ, f_i′j′ ← f_i′j′ + θ
    end for
    Update I = {(i, j) | f_ij < 0}
end while

Extension to other LP problems. For other problems with network structure, the network crossover method can potentially be easily extended. For instance, although [Lin et al., 2020a] have proven that the Wasserstein barycenter problem is not a special case of MCF problems, it can be separated into many different OT problems by fixing the weight of the barycenter. Combined with
Tree-BI for OT problems, we can obtain a nearby corner point in a very short time.

For general LP problems, we can also approximately regard them as MCF problems and use the column generation based network crossover. From the viewpoint of the graph, there is also an underlying graph G := (N, A), and each column of A corresponds to one arc in the graph. But differently from MCF problems, we allow an arc to connect more than two nodes. Similarly to Definition 3.2, we can now define the flow ratio for each column of the general LP problem (9):

Definition 3.3 (Flow ratio for general LP problems). For any non-corner solution x, we define the maximal flow of each row k as f_k := A⁺_{k·} x + A⁻_{k·} x. Then we define the flow ratio of each column i as r_i := max{A_ki x_i / f_k : k = 1, . . . , m}.

In fact, for MCF problems each column of the matrix A contains exactly two nonzero components, so the flow ratio for MCF problems defined in Definition 3.2 is just a special case of Definition 3.3. Intuitively, the flow ratio similarly measures the weight of the 'flow' that one column serves among all rows in the approximate graph G. After that, we sort the columns in descending order of flow ratio, and then follow with the basis identification phase Col-BI and the reoptimization phase
Col-OPT.

Unlike the emerging first-order methods, the interior-point method in commercial LP solvers can usually return a high-accuracy interior-point primal-dual solution pair (x^k, s^k) for problem (1). However, sometimes in real LP applications the computation time of the crossover procedure can largely exceed that of the interior-point stage. In this section, we introduce a perturbation crossover method to deal with this.

[Mehrotra and Ye, 1993] found that, for a class of classic interior-point algorithms, such as [Güler and Ye, 1993], [Kojima et al., 1989] and [Ye, 1992], one can always get an optimal partition by

\[ P^k := \{j : x^k_j \ge s^k_j\}, \tag{20} \]

or

\[ P^k := \{j : |x^{k+1}_j - x^k_j| / x^k_j \le |s^{k+1}_j - s^k_j| / s^k_j\}, \tag{21} \]

and the primal-dual pair always converges to a relative interior point of the optimal faces Θ_p and Θ_d. Therefore, when the optimal face contains too many corner points, the traditional crossover procedure might get stuck in the purification stage.

Since any extreme point of the optimal face is an optimal BFS, our goal is now equivalent to obtaining an extreme point of the optimal face. Using the criteria in (20) and (21), we can obtain P^k, an estimate of P* that usually contains the real P*, and get the estimated primal optimal face

\[ \Theta^k_p := \{x : Ax = b,\ x \ge 0,\ \text{and } x_j = 0 \text{ for } j \in \bar{P}^k\}, \quad \text{where } P^k \cap \bar{P}^k = \emptyset,\ P^k \cup \bar{P}^k = [n]. \]

Then the perturbed feasibility problem on Θ^k_p is:

\[ \min_x \; (c + \varepsilon)^\top x \quad \text{s.t.} \quad x \in \Theta^k_p. \tag{22} \]

Besides, we have the following Theorem 4.1 about problem (22):

Theorem 4.1.
If the optimal face Θ_p of the original problem (1) is bounded and P* ⊆ P^k, then there exists a δ > 0 such that for a perturbation ε whose components ε_i are uniformly distributed in [−δ, δ] for i ∈ P^k and ε_i = 0 for i ∈ P̄^k, the perturbed feasibility problem (22) has a unique optimal solution x* almost surely. Moreover, x* is an optimal BFS for the original primal problem (1).

Proof. First of all, since P* ⊆ P^k, the optimal face of the perturbed feasibility problem is exactly the same as the optimal face Θ_p of the original problem. Because the optimal face Θ_p is bounded, it is the convex hull of finitely many distinct extreme points {v_1, . . . , v_k}, and those extreme points of the optimal face are exactly all the optimal basic feasible solutions of problem (1). Since the feasible set is a polyhedron, it can be written as

\[ P := \left\{ \sum_{i=1}^{t} \lambda_i v_i + \sum_{j=1}^{q} \theta_j w_j : \sum_{i=1}^{t} \lambda_i = 1,\ \lambda_i \ge 0,\ \theta_j \ge 0 \right\}, \tag{23} \]

where {v_i} and {w_j} are the extreme points and extreme rays of the feasible set. Let f* denote the optimal value of problem (1); then c^⊤ v_i = f* when i ≤ k, c^⊤ v_i > f* when k < i ≤ t, and c^⊤ w_j > 0 for any j.

Therefore, the optimal solution of the perturbed feasibility problem (22) is still an optimal solution of the original problem if and only if

\[ \min_{i \le k} (c + \varepsilon)^\top v_i \le \min_{k < i \le t} (c + \varepsilon)^\top v_i, \tag{24} \]

which holds whenever δ is sufficiently small. Note that problem (22) has multiple optimal solutions only if there exist i, j with 1 ≤ i < j ≤ t such that (c + ε)^⊤(v_i − v_j) = 0. However, since v_i ≠ v_j, the set {ε : (v_i − v_j)^⊤ ε = −(v_i − v_j)^⊤ c} is of zero Lebesgue measure in R^n. Besides, there are only finitely many pairs of extreme points in Θ_p, so the perturbed problem has a unique optimal solution almost surely if ε is uniformly distributed in [−δ, δ]^n.
However, in practice a tiny perturbation may be swamped by machine error. To deal with this, we slightly increase the perturbation size and loosen the criterion for constructing P^k, so that P^k contains the optimal partition of the perturbed problem

min_x (c + ε)^⊤ x   s.t.   Ax = b, x ≥ 0.   (25)

In that case, we have Theorem 4.2 to estimate the error introduced by the perturbation ε:

Theorem 4.2.
Let x^k be the solution returned by the interior-point method on the original problem (1), with duality gap smaller than δ_g. If the perturbation satisfies ε ≥ 0, then the optimal basic feasible solution x̄^* returned by the perturbed problem (25) satisfies

c^⊤ x̄^* ≤ c^⊤ x^* + δ_g + (x^k)^⊤ ε,   (26)

where x^* is an optimal solution of the original problem.

Proof. Let (y^k, s^k) be the dual variables corresponding to x^k; then the duality gap is (x^k)^⊤ s^k ≤ δ_g. Since the perturbation is nonnegative, (x^k, y^k, s^k + ε) is a primal-dual solution pair for the perturbed problem (25), with duality gap (x^k)^⊤ s^k + (x^k)^⊤ ε. Now we have

c^⊤ x^k − (s^k)^⊤ x^k ≤ c^⊤ x^* ≤ c^⊤ x^k

and

c^⊤ x^k − (s^k)^⊤ x^k ≤ (c + ε)^⊤ x̄^* ≤ (c + ε)^⊤ x^k,

which, together with c^⊤ x̄^* ≤ (c + ε)^⊤ x̄^* (since ε ≥ 0 and x̄^* ≥ 0), gives

c^⊤ x̄^* ≤ c^⊤ x^* + (x^k)^⊤ s^k + (x^k)^⊤ ε  ⇒  c^⊤ x̄^* ≤ c^⊤ x^* + δ_g + (x^k)^⊤ ε. □

In practice, we can directly run the crossover procedure for the original problem starting from x̄^k, the interior-point solution of the perturbed feasibility problem, because x̄^k is also an interior point for the original problem. Moreover, when solving problem (22) is not obviously cheaper than solving problem (25), we can instead take P^k = [n] and use warm-starting strategies [Yildirim and Wright, 2002, Skajaa et al., 2013] to solve the perturbed problem (25) directly. Finally, after obtaining an advanced BFS from the perturbed problem (22) or (25), we can enter the reoptimization phase Col-OPT if an optimal BFS is required.
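The steps above can be sketched end to end on a toy LP. The snippet below is a minimal illustration, not the paper's implementation: scipy's `linprog` stands in for the LP solves, the interior-point pair (x^k, s^k) is written down by hand rather than produced by a barrier solver, and criterion (20) estimates the partition before the perturbed face problem (22) is solved and the bound (26) is checked.

```python
import numpy as np
from scipy.optimize import linprog

# Toy LP with a non-trivial optimal face: min c^T x over the simplex.
A = np.array([[1.0, 1.0, 1.0]]); b = np.array([1.0])
c = np.array([1.0, 1.0, 2.0])      # x1 and x2 tie, so the optimal face is an edge

# Hand-written stand-ins for an interior-point pair (x^k, s^k); s^k = c - y*1
# with y = 1 is dual feasible, so the pair has duality gap x^k . s^k.
x_k = np.array([0.45, 0.45, 0.10])
s_k = np.array([0.0, 0.0, 1.0])
delta_g = x_k @ s_k                # duality gap of the pair (= 0.1 here)

# Criterion (20): estimate the optimal partition P^k.
P = x_k >= s_k                     # -> [True, True, False]

# Perturbed problem (22) on the estimated face: x_j = 0 off P^k, cost c + eps
# with a small random eps >= 0 supported on P^k.
rng = np.random.default_rng(0)
eps = np.where(P, rng.uniform(0.0, 1e-4, c.size), 0.0)
bounds = [(0, None) if P[j] else (0, 0) for j in range(c.size)]
x_bar = linprog(c + eps, A_eq=A, b_eq=b, bounds=bounds, method="highs").x
# x_bar is almost surely a single vertex of the optimal face (Theorem 4.1).

# Bound (26): c^T x_bar <= c^T x_star + delta_g + (x^k)^T eps.
x_star = linprog(c, A_eq=A, b_eq=b, method="highs").x
assert c @ x_bar <= c @ x_star + delta_g + x_k @ eps + 1e-9
```

Here the perturbed solve returns one endpoint of the optimal edge, and the final assertion verifies the error bound of Theorem 4.2 numerically.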
In this section, we evaluate our smart crossover methods on a variety of datasets. For the network crossover method, we compare it with a state-of-the-art LP solver on randomly generated MCF problems, OT problems generated from the MNIST database of handwritten digits, and large network-LP problems from Hans Mittelmann's benchmarks for optimization software. For the perturbation crossover method, we compare it with a state-of-the-art LP solver on Hans Mittelmann's benchmark of barrier LP solvers. Throughout, the barrier method refers to the interior-point method. To be fair, we turn off the presolve phase of all solvers.

All experiments are implemented in MATLAB R2020b on a Windows 64 (x64) computer with an Intel(R) Core(TM) i7-7820HQ processor @2.90GHz (4 cores and 8 threads) and 16GB of RAM.

For the experiments on the network crossover method, we compare with the commercial LP solver Gurobi (9.0.3), which has demonstrated top-tier performance on the large network-LP benchmark. Since Gurobi does not provide a suitable parameter to warm-start the simplex method from a BFS of the perturbed feasibility problem (22), we compare the perturbation crossover method with another state-of-the-art commercial LP solver, Mosek (9.2), which also has top-tier performance on the benchmark of barrier LP solvers. We also select the emerging Sinkhorn method [Cuturi, 2013] as a representative algorithm for producing approximate solutions to OT problems, to show how our methods can be combined with more efficient approximation algorithms.

In this subsection, we evaluate the network crossover method introduced in Section 3. For convenience of notation, we call the two-phase crossover Col-BI and Col-OPT CNET, and the two-phase crossover Tree-BI and Col-OPT TNET. In particular, for OT problems we adopt TNET, and for the other problems we adopt the general CNET method. The simplex methods used in the column generation framework are all run through the Gurobi interface. The precision of the interior-point solution produced by the barrier algorithms (interior-point methods) is measured by the termination criterion on the primal-dual gap. Unless otherwise specified, in order to mimic the approximate solutions of first-order methods, we use a loose primal-dual gap tolerance when running the barrier algorithm and set θ_k = 2k in (18) and (19).

Synthetic MCF problems.
The synthetic problems clearly show the relationship between the crossover runtime and the problem scale. Figure 1 illustrates the crossover computation time when the number of nodes (arcs) is fixed and the number of arcs (nodes) varies. One can easily observe that, compared with the crossover procedure of Gurobi, CNET is not only considerably faster but also more stable and less susceptible to the problem scale.

Footnotes: MNIST database: http://yann.lecun.com/exdb/mnist/ ; benchmarks: http://plato.asu.edu/bench.html ; Sinkhorn code: http://marcocuturi.net/SI.html (Accessed: 21 January, 2021).

Figure 1: Synthetic MCF problems with varying scale: run-time comparison as the number of nodes or arcs gradually increases in our randomly generated MCF problems with integer components. In the left and right figures, the number of nodes is 5E+03 and the number of arcs is 4E+06, respectively. The experiment for each problem size is independently repeated 10 times.
Figure 2: Computation time of crossover on the large network-LP benchmark problems.
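Synthetic instances of the kind used above can be generated along the following lines. This is a hypothetical generator written for illustration; the paper specifies only that the instances are random MCF problems with integer components, so the particular graph model, cost and capacity ranges below are assumptions.

```python
import numpy as np

def random_mcf_lp(n_nodes, n_arcs, seed=0):
    """Generate a random min-cost-flow LP: min c^T x s.t. A x = b, 0 <= x <= u,
    where A is the node-arc incidence matrix of a random directed graph.
    (Illustrative generator; not the paper's actual instance distribution.)"""
    rng = np.random.default_rng(seed)
    tail = rng.integers(0, n_nodes, n_arcs)
    head = (tail + rng.integers(1, n_nodes, n_arcs)) % n_nodes  # head != tail
    A = np.zeros((n_nodes, n_arcs))
    A[tail, np.arange(n_arcs)] = 1.0    # each arc leaves its tail node ...
    A[head, np.arange(n_arcs)] = -1.0   # ... and enters its head node
    c = rng.integers(1, 100, n_arcs).astype(float)   # integer arc costs
    u = rng.integers(1, 50, n_arcs).astype(float)    # integer capacities
    b = np.zeros(n_nodes)   # circulation form: x = 0 is feasible by construction
    return A, b, c, u

A, b, c, u = random_mcf_lp(n_nodes=50, n_arcs=200)
# Sanity check: every incidence-matrix column sums to zero (one +1, one -1).
assert np.allclose(A.sum(axis=0), 0.0)
```

Scaling `n_nodes` or `n_arcs` independently reproduces the two sweeps shown in Figure 1.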
Large network benchmark problems.
Figures 2 and 3 show the experiments on the MCF benchmark problems from Hans Mittelmann's benchmarks for optimization software. One can observe that, besides the overall better performance than Gurobi, as the precision decreases, CNET's advantages become even more apparent and stable. This characteristic makes CNET well suited to starting from low-accuracy first-order solutions.
OT problems generated from the MNIST dataset.

Figure 3: Computation time of crossover on the large network-LP benchmark problems. The interior-point method terminates at different primal-dual gaps, ranging from 1 down to 1E-8.

The MNIST dataset consists of 60,000 images of handwritten digits of size 28 by 28 pixels. We randomly pick two images, normalize the sum of the non-zero weights to 1, and generate the OT problem of computing the Wasserstein distance between them. Figure 4 illustrates the difference between the approximate solution and the corner solution. The solution from the
Tree-BI phase is a nearby BFS of the interior-point solution, and the solution from the Col-OPT phase is the optimal BFS obtained after some simplex iterations. One can easily see that the corner solution has apparently higher sparsity and fully avoids the biased, blurry solution. Meanwhile, the corner solution retains the same shape but provides more information and greatly lowers the dimension.

Figure 4: Transport plan of a randomly generated OT problem from the MNIST dataset. The left, middle, and right images are the initial interior-point solution, the tree solution from Tree-BI, and the finally solved optimal solution, respectively.

Figure 5 illustrates the difference between the interior-point Wasserstein barycenters and the exact ones. Note that if we fix the barycenter and apply Tree-BI to construct the corresponding transportation plan, we obtain a BFS with the same barycenter shape, so the barycenter after the basis identification phase is the same as the interior-point barycenter.

Figure 5: The Wasserstein barycenters solved by the interior-point method (first row) and the barycenters after crossover (second row). Each image is the Wasserstein barycenter of 25 MNIST digit images.

Apparently, the exact barycenter obtained after some simplex iterations of the reoptimization phase is sharper and contains more information. To better exhibit the effect of the problem size, we artificially magnify the images by a factor α (α = 1, 2, 3, ...), generating images of size 28α × 28α. Table 1 lists the experimental details on the MNIST problems. No matter whether we start from the interior-point solution or the Sinkhorn method solution,
TNET is always faster than the crossover of Gurobi. Moreover, as the problem scale grows, TNET's advantages become even more apparent, which also makes it suitable for large-scale problems. It should be mentioned that [Dong et al., 2020] demonstrates experimentally that the network simplex method has the best computational performance among several exact methods. Table 1 also shows that TNET can be considerably faster than running the base simplex method from the very beginning. Therefore, since the network simplex method is an efficient implementation of the general simplex method for MCF problems, it is reasonable to conclude that combining the emerging fast inexact methods with the network-simplex-based TNET yields a faster exact method for OT problems.

Since [Dong et al., 2020] has already demonstrated that the network simplex method has the best computational performance among several exact methods for solving the OT problem, in this section we show that our crossover can also benefit from the network simplex method, and that the combination with Sinkhorn [Cuturi, 2013] can further largely outperform adopting the network simplex method directly.

Specifically, we use the commercial LP solver
Cplex, because it is equipped with a state-of-the-art network simplex method for MCF problems. To be fair, our CNET also adopts the network simplex method of Cplex to solve the subproblems. See Figure 6 for the comparison between Sinkhorn + CNET and the network simplex method of Cplex as the problem scale varies. One can easily observe that our crossover method can largely shorten the overall runtime when starting from an advanced solution, and this advantage becomes more apparent as the problem scale grows.

Besides, Table 2 compares Cplex (network simplex), Cplex (barrier method + crossover) and Sinkhorn + CNET on normally distributed random pictures. Compared with the simplex method of Gurobi, Cplex's network simplex method is more efficient on OT problems; thus, unlike for Gurobi, Cplex's network simplex method is better than its barrier algorithm. Even so, our crossover method still achieves the shortest computation time when starting from the Sinkhorn solutions. Since our crossover method can accelerate any simplex implementation from an advanced solution produced by other fast methods, it offers a new point of view for accelerating exact methods.
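For reference, the Sinkhorn iteration [Cuturi, 2013] used as the approximate first stage can be sketched as follows. This is a minimal textbook version on a toy instance; production implementations add log-domain stabilization, which is omitted here.

```python
import numpy as np

def sinkhorn(C, mu, nu, reg=0.5, iters=500):
    """Entropic-regularized OT: alternately rescale the rows and columns of
    K = exp(-C/reg) until the transport plan matches the marginals (mu, nu)."""
    K = np.exp(-C / reg)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)               # fit the column marginals
        u = mu / (K @ v)                 # fit the row marginals
    return u[:, None] * K * v[None, :]   # approximate (dense) transport plan

# Tiny OT instance: two 2-point distributions with a 0/1 ground cost.
mu = np.array([0.5, 0.5]); nu = np.array([0.3, 0.7])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
X = sinkhorn(C, mu, nu)
# The plan's marginals match (mu, nu) up to the iteration tolerance; the
# crossover step would then round this dense approximate plan to a sparse BFS.
assert np.allclose(X.sum(axis=1), mu, atol=1e-6)
assert np.allclose(X.sum(axis=0), nu, atol=1e-6)
```

The returned plan is dense and only approximately optimal, which is exactly why a crossover step is needed to recover a sparse optimal BFS from it.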
Table 1: Run-time comparison of Gurobi, Gurobi + TNET and Sinkhorn + TNET on OT problems generated from the MNIST dataset. Columns: α, n, m, gurSpl, Gurobi (gurBar, gurCrs), Gur+TNET (gurBar, TNET), Skh+TNET (Skh, TNET). Here n and m are the numbers of suppliers and consumers; gurSpl is the runtime of Gurobi's simplex algorithm on the OT problem; gurBar is the computation time of Gurobi's barrier algorithm without crossover; gurCrs is the computation time of Gurobi's crossover method after the barrier algorithm; TNET is the computation time of TNET starting from Gurobi's barrier method solution or the Sinkhorn method solution; Skh is the runtime of the Sinkhorn method; and ">" marks a time limit exceeded (over 1000 seconds).

Table 2: The average runtime of 10 independent trials of Cplex (network simplex), Cplex (barrier + crossover) and Sinkhorn + CNET on OT problems generated by normally distributed random pictures. Columns: size, cplNetSpl, Cplex (cplBarr, cplCross), Sinkhorn+CNET (Sinkhorn, CNET). Here size is the size of each picture; cplNetSpl is the runtime of Cplex's network simplex (seconds); cplBarr is the computation time of Cplex's barrier algorithm without crossover (seconds); cplCross is the computation time of Cplex's crossover method after the barrier (seconds); Sinkhorn is the runtime of the Sinkhorn method (seconds); and CNET is the computation time of CNET (seconds).

Figure 6: Average computation time of Cplex (network simplex method) and Sinkhorn + CNET on random OT problems with uniformly distributed cost: in the left diagram, the numbers of suppliers and consumers vary simultaneously between 500 and 7500; in the right diagram, the number of suppliers is fixed at 1000 and the number of consumers varies between 2000 and 30000. Each result is the average of 10 independent trials.

The perturbation crossover is aimed at accelerating the crossover procedure for general LP problems, so we use the default parameters of the LP solver. In this subsection, we run experiments on Hans Mittelmann's benchmark problems for barrier LP solvers and compare with the state-of-the-art commercial solver Mosek, since its simplex method can easily warm-start from a BFS of the perturbed feasibility problem (22); the underlying simplex method of our crossover could, however, also be another LP solver. Since the perturbation crossover method is especially valuable for problems with a high-dimensional optimal face, it may not perform well at accelerating problems whose crossover procedure is already fast. We therefore consider problems for which the crossover procedure is a bottleneck in designing an efficient barrier algorithm; in other words, problems for which, even when starting from a high-accuracy approximate solution, it still takes a long time to find a vertex solution. In our experiments, we chose all the problems whose crossover time is at least twice the interior-point method time.

Table 3: Computation time on the barrier LP benchmark problems. Columns: problem, original (mskCross), perturbed (mskBarr, mskCross, mskSpl, total time).
(Rows include stat96v1, graph40-40, graph40-80rand and fhnw-bin0.) Here mskCross (original) is the computation time of Mosek's crossover method after the barrier on the original problem (seconds); mskBarr is the computation time of Mosek's barrier algorithm without crossover on the perturbed problem; mskCross (perturbed) is the computation time of Mosek's crossover method after the barrier on the perturbed problem (seconds); mskSpl is the runtime of Mosek's simplex method when warm-started with the solution of the perturbed problem; and ">" marks a time limit exceeded (over 300 seconds).
Table 3 compares the crossover of Mosek and our perturbation crossover method, both starting from the same default interior-point solution. It shows that our perturbation crossover yields a significant improvement and thoroughly removes the bottleneck for some problems. For example, the crossover for datt256_lp is a common bottleneck for almost all commercial LP solvers, since the barrier algorithm usually terminates within seconds while the crossover procedure takes a much longer time; our perturbation crossover, however, solves it cleanly. Also note that, depending on the size of the perturbation, there is a trade-off between the crossover computation time on the perturbed feasibility problem and that of the reoptimization phase, so in practice solvers can benefit from the perturbation crossover method by concurrently running both the traditional crossover and our perturbation crossover with different perturbation sizes, to be more robust.

Acknowledgments
We thank Jiayi Guo and Fangkun Qiu for helpful discussions and fruitful suggestions.
References

[Ahuja et al., 1993] Ravindra K Ahuja, Thomas L Magnanti, and James B Orlin.
Network flows:theory, algorithms, and applications . Prentice Hall, 1993.[Altschuler et al., 2017] Jason Altschuler, Jonathan Weed, and Philippe Rigollet. Near-lineartime approximation algorithms for optimal transport via sinkhorn iteration. In
Advances in Neural Information Processing Systems 30, pages 1961–1971. Curran Associates Inc., 2017.

[Andersen and Ye, 1996] Erling D Andersen and Yinyu Ye. Combining interior-point and pivoting algorithms for linear programming.
Management Science , 42(12):1719–1731, 1996.[Benamou et al., 2015] Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna,and Gabriel Peyré. Iterative bregman projections for regularized transportation problems.
SIAM Journal on Scientific Computing , 37(2):A1111–A1138, 2015.[Bertsekas and Castanon, 1989] Dimitri P Bertsekas and David A Castanon. The auction algo-rithm for the transportation problem.
Annals of Operations Research , 20(1):67–96, 1989.[Bertsimas and Tsitsiklis, 1997] Dimitris Bertsimas and John N Tsitsiklis.
Introduction to linearoptimization , volume 6. Athena Scientific Belmont, MA, 1997.[Bixby and Saltzman, 1994] Robert E Bixby and Matthew J Saltzman. Recovering an optimallp basis from an interior point solution.
Operations Research Letters , 15(4):169–178, 1994.[Buttazzo et al., 2012] Giuseppe Buttazzo, Luigi De Pascale, and Paola Gori-Giorgi. Optimal-transport formulation of electronic density-functional theory.
Physical Review A , 85(6):062502,2012.[Carlier and Ekeland, 2010] Guillaume Carlier and Ivar Ekeland. Matching for teams.
Economictheory , 42(2):397–418, 2010.[Chazelle, 2000] Bernard Chazelle. A minimum spanning tree algorithm with inverse-ackermanntype complexity.
Journal of the ACM (JACM), 47(6):1028–1047, 2000.

http://plato.asu.edu/ftp/lpbar.html (Accessed: 21 January, 2021)

[Chiappori et al., 2010] Pierre-André Chiappori, Robert J McCann, and Lars P Nesheim. Hedonic price equilibria, stable matching, and optimal transport: equivalence, topology, and uniqueness. Economic Theory, 42(2):317–354, 2010.

[Cotar et al., 2013] Codina Cotar, Gero Friesecke, and Claudia Klüppelberg. Density functional theory and optimal transportation with coulomb cost.
Communications on Pure and AppliedMathematics , 66(4):548–599, 2013.[Cuturi, 2013] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal trans-port. In
Advances in Neural Information Processing Systems 26 , pages 2292–2300. CurranAssociates, Inc., 2013.[Cuturi and Doucet, 2014] Marco Cuturi and Arnaud Doucet. Fast computation of wassersteinbarycenters. In
Proceedings of the 31st International Conference on Machine Learning , pages685–693, 2014.[Dantzig, 1963] George Bernard Dantzig.
Linear programming and extensions . Princeton Univer-sity Press, 1963.[Dijkstra et al., 1959] Edsger W Dijkstra et al. A note on two problems in connexion with graphs.
Numerische Mathematik , 1(1):269–271, 1959.[Dong et al., 2020] Yihe Dong, Yu Gao, Richard Peng, Ilya Razenshteyn, and Saurabh Sawlani.A study of performance of optimal transport. arXiv preprint arXiv:2005.01182 , 2020.[Dvurechensky et al., 2018] Pavel Dvurechensky, Alexander Gasnikov, and Alexey Kroshnin.Computational optimal transport: Complexity by accelerated gradient descent is better thanby sinkhorn’s algorithm. In
Proceedings of the 35th International Conference on MachineLearning , pages 1367–1376, 2018.[Galabova and Hall, 2020] IL Galabova and JAJ Hall. The ‘idiot’crash quadratic penalty algo-rithm for linear programming and its application to linearizations of quadratic assignmentproblems.
Optimization Methods and Software , 35(3):488–501, 2020.[Galichon, 2018] Alfred Galichon.
Optimal transport methods in economics . Princeton UniversityPress, 2018.[Ge et al., 2019] Dongdong Ge, Haoyue Wang, Zikai Xiong, and Yinyu Ye. Interior-point methodsstrike back: solving the wasserstein barycenter problem. In
Advances in Neural InformationProcessing Systems 32 , pages 6894–6905, 2019.[Genevay et al., 2016] Aude Genevay, Marco Cuturi, Gabriel Peyré, and Francis Bach. Stochasticoptimization for large-scale optimal transport. In
Advances in Neural Information ProcessingSystems 29 , pages 3440–3448. Curran Associates, Inc., 2016.[Goldman and Tucker, 1956] Alan J Goldman and Albert W Tucker. Theory of linear program-ming.
Linear Inequalities and Related Systems , 38:53–97, 1956.[Güler and Ye, 1993] Osman Güler and Yinyu Ye. Convergence behavior of interior-point algo-rithms.
Mathematical Programming , 60(1-3):215–228, 1993.[Ho et al., 2019] Nhat Ho, Viet Huynh, Dinh Phung, and Michael Jordan. Probabilistic multilevelclustering via composite transportation distance. In
The 22nd International Conference on Artificial Intelligence and Statistics, pages 3149–3157, 2019.

[Jambulapati et al., 2019] Arun Jambulapati, Aaron Sidford, and Kevin Tian. A direct Õ(1/ε) iteration parallel algorithm for optimal transport. In Advances in Neural Information Processing Systems 32, pages 11359–11370. Curran Associates, Inc., 2019.

[Janati et al., 2020] Hicham Janati, Marco Cuturi, and Alexandre Gramfort. Debiased Sinkhorn barycenters. In
Proceedings of the 37th International Conference on Machine Learning , pages4692–4701, 2020.[Karger et al., 1995] David R Karger, Philip N Klein, and Robert E Tarjan. A randomized linear-time algorithm to find minimum spanning trees.
Journal of the ACM (JACM) , 42(2):321–328,1995.[Kojima et al., 1989] Masakazu Kojima, Shinji Mizuno, and Akiko Yoshise. A polynomial-timealgorithm for a class of linear complementarity problems.
Mathematical Programming ,44(1-3):1–26, 1989.[Kortanek and Jishan, 1988] KO Kortanek and Zhu Jishan. New purification algorithms forlinear programming.
Naval Research Logistics (NRL) , 35(4):571–583, 1988.[Kroshnin et al., 2019] Alexey Kroshnin, Nazarii Tupitsa, Darina Dvinskikh, Pavel Dvurechensky,Alexander Gasnikov, and Cesar Uribe. On the complexity of approximating Wassersteinbarycenters. In
Proceedings of the 36th International Conference on Machine Learning , pages3530–3540, 2019.[Kruskal, 1956] Joseph B Kruskal. On the shortest spanning subtree of a graph and the travelingsalesman problem.
Proceedings of the American Mathematical society , 7(1):48–50, 1956.[Lin et al., 2020a] Tianyi Lin, Nhat Ho, Xi Chen, Marco Cuturi, and Michael I Jordan. Fixed-support wasserstein barycenters: Computational hardness and fast algorithm. In
Advancesin Neural Information Processing Systems 33 Pre-proceedings , 2020a.[Lin et al., 2019] Tianyi Lin, Nhat Ho, and Michael I Jordan. On the efficiency of the sinkhorn andgreenkhorn algorithms and their acceleration for optimal transport. In
The 36th InternationalConference on Artificial Intelligence and Statistics , pages 3982–3991, 2019.[Lin et al., 2020b] Tianyi Lin, Shiqian Ma, Yinyu Ye, and Shuzhong Zhang. An admm-basedinterior-point method for large-scale linear programming.
Optimization Methods and Software ,0(0):1–36, 2020b.[Megiddo, 1991] Nimrod Megiddo. On finding primal-and dual-optimal bases.
ORSA Journal onComputing , 3(1):63–65, 1991.[Mehrotra and Ye, 1993] Sanjay Mehrotra and Yinyu Ye. Finding an interior point in the optimalface of linear programs.
Mathematical Programming , 62(1-3):497–515, 1993.[Mitra et al., 1988] Gautam Mitra, Mehrdad Tamiz, and Joseph Yadegar. Experimental investi-gation of an interior search method within a simplex framework.
Communications of theACM , 31(12):1474–1482, 1988.[Nguyen, 2013] XuanLong Nguyen. Convergence of latent mixing measures in finite and infinitemixture models.
The Annals of Statistics , 41(1):370–400, 2013.[Prim, 1957] Robert Clay Prim. Shortest connection networks and some generalizations.
The Bell System Technical Journal, 36(6):1389–1401, 1957.

[Santambrogio, 2015] Filippo Santambrogio.
Optimal transport for applied mathematicians .Springer, 2015.[Schrijver, 1998] Alexander Schrijver.
Theory of linear and integer programming . John Wiley &Sons, 1998.[Skajaa et al., 2013] Anders Skajaa, Erling D Andersen, and Yinyu Ye. Warmstarting thehomogeneous and self-dual interior point method for linear and conic quadratic problems.
Mathematical Programming Computation , 5(1):1–25, 2013.[Tolstikhin et al., 2018] Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, and Bernhard Schoelkopf.Wasserstein auto-encoders. In
International Conference on Learning Representations , 2018.[Villani, 2003] Cédric Villani.
Topics in optimal transportation . American Mathematical Society,2003.[Wagner, 1959] Harvey M Wagner. On a class of capacitated transportation problems.
Manage-ment Science , 5(3):304–318, 1959.[Ye, 1992] Yinyu Ye. On the finite convergence of interior-point algorithms for linear programming.
Mathematical Programming , 57(1-3):325–335, 1992.[Yildirim and Wright, 2002] E Alper Yildirim and Stephen J Wright. Warm-start strategies ininterior-point methods for linear programming.