Between steps: Intermediate relaxations between big-M and convex hull formulations
Jan Kronqvist (corresponding author), Ruth Misener, and Calvin Tsay
Department of Computing, Imperial College London
{j.kronqvist, r.misener, c.tsay}@imperial.ac.uk

Abstract.
This work develops a class of relaxations in between the big-M and convex hull formulations of disjunctions, drawing advantages from both. The proposed "P-split" formulations split convex additively separable constraints into P partitions and form the convex hull of the partitioned disjuncts. The parameter P represents the trade-off of model size vs. relaxation strength. We examine the novel formulations and prove that, under certain assumptions, the relaxations form a hierarchy starting from a big-M equivalent and converging to the convex hull. We computationally compare the proposed formulations to big-M and convex hull formulations on a test set including: K-means clustering, P_ball problems, and ReLU neural networks. The computational results show that the intermediate P-split formulations can form strong outer approximations of the convex hull with fewer variables and constraints than the extended convex hull formulations, giving significant computational advantages over both the big-M and convex hull.

Keywords:
Disjunctive programming · Relaxation comparison · Formulations · Mixed-integer programming · Convex MINLP
1 Introduction

There are well-known trade-offs between the big-M and convex hull relaxations of disjunctions in terms of problem size and relaxation tightness. Convex hull formulations [4,6,9,16,20,36] provide a sharp formulation for a single disjunction, i.e., the continuous relaxation provides the best possible lower bound. The convex hull is often represented by so-called extended (a.k.a. perspective/multiple-choice) formulations [5,7,11,14,15,17,38], which introduce multiple copies of each variable in the disjunction(s). On the other hand, the big-M formulation only introduces one binary variable for each disjunct and results in a smaller problem in terms of both number of variables and constraints; however, in general it provides a weaker relaxation than the convex hull and may require a solver to explore significantly more nodes in a branch-and-bound tree [10,38]. Even though the big-M formulation is weaker, in some cases it can computationally outperform extended convex hull formulations, as the simpler subproblems can
offset the larger number of explored nodes. Anderson et al. [1] describe a folklore observation in mixed-integer programming (MIP) that extended convex hull formulations tend to perform worse than expected. The observation is supported by the numerical results in Anderson et al. [1] and in this paper.

This paper presents a framework for generating formulations for disjunctions between the big-M and convex hull, with the intention of combining the best of both worlds: a tight, yet computationally efficient, formulation. The main idea behind the novel formulations is partitioning the constraints of each disjunct and moving most of the variables out of the disjunction. Forming the convex hull of the resulting disjunctions results in a smaller problem, while retaining some features of the convex hull. We call the new formulation the P-split, as the constraints are split into P parts. While many efforts have been devoted to computationally efficient convex hull formulations [3,11,19,33,37,39,40,41] and techniques for deriving the convex hull of MIP problems [2,22,25,31,35], our primary goal is not to generate the convex hull. Rather, we provide a straightforward framework for generating a family of relaxations that approximate the convex hull for a general class of disjunctions using a smaller problem formulation. Our experiments show that the P-split formulations can give a significant computational advantage over both the big-M and convex hull formulations.

This paper is organized as follows: the P-split formulation is presented in Section 2, together with properties of the P-split relaxations and how they compare to the big-M and convex hull relaxations. We also present a non-extended realization of the P-split formulation for the special case of a two-term disjunction.
Finally, a numerical comparison of the formulations is presented in Section 3, using instances with both linear and nonlinear disjunctions.

2 The P-split formulation

We consider optimization problems containing disjunctions of the form

$$\bigvee_{l\in\mathcal{D}} \Big[\, g_k(x) \le b_k \;\; \forall k\in\mathcal{C}_l \,\Big],\qquad x\in X\subset\mathbb{R}^n, \tag{1}$$

where $\mathcal{D}$ contains the indices of the disjuncts, $\mathcal{C}_l$ the indices of the constraints in disjunct $l$, and $X$ is a convex compact set. This paper assumes the following:

Assumption 1
The functions $g_k : \mathbb{R}^n \to \mathbb{R}$ are convex additively separable functions, i.e., $g_k(x) = \sum_{i=1}^{n} h_{ik}(x_i)$ where $h_{ik} : \mathbb{R} \to \mathbb{R}$ are convex functions, and each disjunct is non-empty on $X$.

Assumption 2
All functions $g_k$ are bounded over $X$.

Assumption 3
Each disjunct contains far fewer constraints than the number of variables in the disjunction, i.e., $|\mathcal{C}_l| \ll n$.

The first two assumptions are needed for the P-split formulation to be valid and result in a convex MIP. While the first assumption simplifies our analysis of P-split formulations, it could easily be relaxed to partially additively separable functions. Furthermore, the computational experiments only consider problems with linear or quadratic constraints, which ensures that the convex hull of the disjunction is representable by a polyhedron or (rotated) second-order cone constraints [6]. Assumption 3 characterizes problem structures favorable for the presented formulations. Problems with such a structure include, e.g., clustering [28,32], mixed-integer classification [24,30], optimization over trained neural networks [1,8,12,13,34], and coverage optimization [18].

The formulations in this section apply to disjunctions with multiple constraints per disjunct. However, to simplify the derivation, we only consider disjunctions with one constraint per disjunct, i.e., $|\mathcal{C}_l| = 1\ \forall l\in\mathcal{D}$. The extension to multiple constraints per disjunct simply applies the splitting procedure to each constraint.

To derive the new formulations, we partition the variables into P sets and form the corresponding index sets $\mathcal{I}_1,\dots,\mathcal{I}_P$. The constraint of each disjunct is then split into P constraints, by introducing auxiliary variables $\alpha^l\in\mathbb{R}^P$:

$$\bigvee_{l\in\mathcal{D}} \big[\, g_l(x) \le b_l \,\big],\; x\in X \;\;\longrightarrow\;\;
\bigvee_{l\in\mathcal{D}} \left[\;
\sum_{i\in\mathcal{I}_1} h_{i,l}(x_i) \le \alpha^l_1,\;\dots,\;
\sum_{i\in\mathcal{I}_P} h_{i,l}(x_i) \le \alpha^l_P,\;
\sum_{s=1}^{P} \alpha^l_s \le b_l,\;
\underline{\alpha}^l_s \le \alpha^l_s \le \bar{\alpha}^l_s \;\forall s\in\{1,\dots,P\}
\;\right],\quad x\in X,\ \alpha^l\in\mathbb{R}^P\ \forall l\in\mathcal{D}. \tag{2}$$

By Assumption 2, each function $h_{i,l}$ is bounded on $X$, and bounds on the auxiliary variables are given by

$$\underline{\alpha}^l_s := \min_{x\in X} \sum_{i\in\mathcal{I}_s} h_{i,l}(x_i), \qquad
\bar{\alpha}^l_s := \max_{x\in X} \sum_{i\in\mathcal{I}_s} h_{i,l}(x_i). \tag{3}$$
The P-split formulation does not require tight bounds, but weak bounds result in an overall weaker relaxation. The splitting creates a lifted formulation by introducing $P \times |\mathcal{D}|$ auxiliary variables. Both formulations in (2) have the same feasible set in the $x$ variables.
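When $X$ is a box and the functions $h_{i,l}$ are linear, each bound in (3) separates into per-variable terms that can be evaluated in closed form. A minimal sketch under these assumptions (the function name `split_bounds` is ours, not from the paper):

```python
# Hedged sketch: the bounds (3) for a linear constraint a^T x <= b,
# split into partitions I_s, with X the box lo <= x <= hi.

def split_bounds(a, lo, hi, partitions):
    """For each index set I_s, return (alpha_lower, alpha_upper) =
    (min, max) of sum_{i in I_s} a[i]*x[i] over the box."""
    bounds = []
    for I in partitions:
        # a linear term attains its extrema at box corners, coordinate-wise
        lb = sum(a[i] * lo[i] if a[i] >= 0 else a[i] * hi[i] for i in I)
        ub = sum(a[i] * hi[i] if a[i] >= 0 else a[i] * lo[i] for i in I)
        bounds.append((lb, ub))
    return bounds

# Example: 4 variables on [0,1]^4, a = [1,-2,3,4], 2-split {0,1},{2,3}
print(split_bounds([1, -2, 3, 4], [0] * 4, [1] * 4, [[0, 1], [2, 3]]))
# -> [(-2, 1), (0, 7)]
```

For convex nonlinear $h_{i,l}$, the same per-variable decomposition applies whenever the bounds are independent in the sense of Definition 2 below; otherwise (3) requires solving small convex subproblems.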
We relax the disjunction by treating the split constraints as global constraints:

$$\bigvee_{l\in\mathcal{D}} \left[\; \sum_{s=1}^{P} \alpha^l_s \le b_l,\;\;
\underline{\alpha}^l_s \le \alpha^l_s \le \bar{\alpha}^l_s \;\forall s\in\{1,\dots,P\} \;\right]$$
$$\sum_{i\in\mathcal{I}_s} h_{i,l}(x_i) \le \alpha^l_s \quad \forall s\in\{1,\dots,P\},\ \forall l\in\mathcal{D}$$
$$x\in X,\qquad \alpha^l\in\mathbb{R}^P\ \forall l\in\mathcal{D}. \tag{4}$$

Definition 1.
Formulation (4) is a P-split representation of the original disjunction in (2).

Lemma 1 relates the P-split representation to the original disjunction. The property is rather simple, but for completeness we state it as a lemma.

Lemma 1.
The feasible set of the P-split representation (4), projected onto the $x$-space, is equal to the feasible set of the original disjunction in (2).

Proof. An $\bar{x}$ that is feasible for (4) and violates (2) gives a contradiction. Similarly, an $\bar{x}$ that is feasible for (2) is also clearly feasible for (4). ⊓⊔

Using the extended formulation [4] to represent the convex hull of the disjunction in (4) results in the P-split formulation

$$\alpha^l_s = \sum_{d\in\mathcal{D}} \nu^{\alpha^l_s}_d \quad \forall s\in\{1,\dots,P\},\ \forall l\in\mathcal{D}$$
$$\sum_{s=1}^{P} \nu^{\alpha^l_s}_l \le b_l \lambda_l \quad \forall l\in\mathcal{D}$$
$$\underline{\alpha}^l_s \lambda_d \le \nu^{\alpha^l_s}_d \le \bar{\alpha}^l_s \lambda_d \quad \forall s\in\{1,\dots,P\},\ \forall l,d\in\mathcal{D}$$
$$\sum_{i\in\mathcal{I}_s} h_{i,l}(x_i) \le \alpha^l_s \quad \forall s\in\{1,\dots,P\},\ \forall l\in\mathcal{D}$$
$$\sum_{l\in\mathcal{D}} \lambda_l = 1,\qquad \lambda\in\{0,1\}^{|\mathcal{D}|}$$
$$x\in X,\quad \alpha^l\in\mathbb{R}^P,\quad \nu^{\alpha^l_s}\in\mathbb{R}^{|\mathcal{D}|} \quad \forall s\in\{1,\dots,P\},\ \forall l\in\mathcal{D}, \tag{P-split}$$

which forms a convex MIP problem. To clarify our terminology: a 2-split formulation is a formulation (P-split) where the constraints of the original disjunction are split into two parts, i.e., P = 2. We assume that the disjunction is part of a larger optimization problem that may contain multiple disjunctions. Therefore, we need to enforce integrality on the λ variables even if we recover the convex hull of the disjunction. Proposition 1 shows the correctness of the (P-split) formulation of the original disjunction.

Proposition 1.
The set of feasible $x$ variables in formulation (P-split) is equal to the feasible set of $x$ variables in disjunction (2).

Proof. By Lemma 1, (2) and (4) have equivalent feasible sets in $x$. For $\lambda\in\{0,1\}^{|\mathcal{D}|}$, the extended formulation (P-split) exactly represents the disjunction (4). ⊓⊔

Proposition 1 states that the P-split formulation is correct for integer feasible solutions, but it does not give any insight into the quality of the continuous relaxation. The following subsections further analyze the properties of the (P-split) formulation and its relation to the big-M and convex hull formulations.

Remark 1.
A (P-split) formulation introduces $P\cdot(|\mathcal{D}|+1)$ continuous and $|\mathcal{D}|$ binary variables. Unlike the extended convex hull formulation (which introduces $|\mathcal{D}|\cdot n$ continuous and $|\mathcal{D}|$ binary variables), the number of "extra" variables is independent of $n$, i.e., the number of variables in the original disjunction. As we later show, there are applications where $|\mathcal{D}| \ll n$, for which (P-split) formulations can be smaller and computationally more tractable than the extended convex hull formulation.

2.1 Relaxation properties of the P-split formulation

This section focuses on the strength of the continuous relaxation of the P-split formulation, and how it compares to the convex hull and big-M formulations. To simplify the analyses, we only consider disjunctions with a single constraint per disjunct. However, the results directly extend to the case of multiple constraints per disjunct by applying the same procedure to each individual constraint. We first analyze the 1-split, as summarized in the following theorem.

Theorem 1.
The 1-split formulation is equivalent to the big-M formulation.

Proof.
We eliminate the disaggregated variables $\nu^{\alpha^l}_d$ from the 1-split formulation using Fourier-Motzkin elimination. Furthermore, we eliminate trivially redundant constraints, e.g., $\underline{\alpha}^l\lambda_d \le \bar{\alpha}^l\lambda_d$, resulting in

$$\alpha^l \le b_l\lambda_l + \sum_{d\in\mathcal{D}\setminus l} \bar{\alpha}^l\lambda_d \quad \forall l\in\mathcal{D}$$
$$\sum_{i=1}^{n} h_{i,l}(x_i) \le \alpha^l \quad \forall l\in\mathcal{D}$$
$$\sum_{l\in\mathcal{D}} \lambda_l = 1,\quad \lambda\in\{0,1\}^{|\mathcal{D}|},\quad x\in X,\quad \alpha^l\in\mathbb{R}\ \forall l\in\mathcal{D}. \tag{5}$$

The auxiliary variables $\alpha^l$ are removed by combining the first and second constraints in (5). The smallest valid big-M coefficients are $M_l = \bar{\alpha}^l - b_l$, which enables us to write (5) as

$$\sum_{i=1}^{n} h_{i,l}(x_i) \le b_l + M_l(1-\lambda_l) \quad \forall l\in\mathcal{D}$$
$$\sum_{l\in\mathcal{D}} \lambda_l = 1,\quad \lambda\in\{0,1\}^{|\mathcal{D}|},\quad x\in X. \tag{6}$$

⊓⊔
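The coefficient $M_l = \bar{\alpha}^l - b_l$ from the closing step of the proof can be computed by simple interval arithmetic when the disjuncts are linear and $X$ is a box. A minimal sketch (the helper name and the linear/box assumptions are ours):

```python
# Hedged sketch: tightest big-M coefficient M_l = alpha_bar_l - b_l for a
# linear disjunct a_l^T x <= b_l over the box lo <= x <= hi.

def tightest_big_m(a_l, b_l, lo, hi):
    # alpha_bar_l = max over the box of a_l^T x, attained corner-wise
    alpha_bar = sum(ai * (hi_i if ai >= 0 else lo_i)
                    for ai, lo_i, hi_i in zip(a_l, lo, hi))
    return alpha_bar - b_l

# Disjunct x1 + x2 <= 1 on the box [0, 3]^2: alpha_bar = 6, so M = 5.
print(tightest_big_m([1, 1], 1, [0, 0], [3, 3]))  # -> 5
```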
Since the 1-split formulation introduces $|\mathcal{D}|+1$ auxiliary variables but has the same continuous relaxation as the big-M formulation, there is no clear advantage of the 1-split formulation over the big-M formulation. We now examine the other extreme, where the constraints are fully disaggregated, i.e., the n-split. Its relation to the convex hull is given in the following theorem.

Theorem 2.
If all $h_{i,l}$ are affine functions, then the n-split formulation (where the constraints are split for each variable) provides the convex hull of the disjunction.

Proof. In the linear case, the original disjunction is given by

$$\bigvee_{l\in\mathcal{D}} \big[\, (a^l)^T x \le b_l \,\big],\qquad x\in X, \tag{7}$$

and the n-split formulation can be written compactly as

$$\bigvee_{l\in\mathcal{D}} \big[\, B^l\tilde{\alpha} \le \tilde{b}^l \,\big],\qquad \tilde{\alpha} = \Gamma x,\quad x\in X,\quad \tilde{\alpha}\in\mathbb{R}^{n\times|\mathcal{D}|}. \tag{8}$$

The n-split formulation is given by the convex hull of (8) through the extended formulation. Here, $\Gamma$ defines a bijective mapping between the $x$ and $\tilde{\alpha}$ variable spaces (only true for an n-split). A reverse mapping is given by $x = \Psi\tilde{\alpha}$. The linear transformations preserve an exact representation of the feasible sets, i.e.,

$$B^l\tilde{\alpha} \le \tilde{b}^l \iff (a^l)^T\Psi\tilde{\alpha} \le b_l, \qquad (a^l)^T x \le b_l \iff B^l\Gamma x \le \tilde{b}^l. \tag{9}$$

For any point $z$ in the convex hull of (8) there exist $\tilde{\alpha}^1,\tilde{\alpha}^2,\dots,\tilde{\alpha}^{|\mathcal{D}|}$ and $\lambda\in\mathbb{R}^{|\mathcal{D}|}_+$ such that

$$z = \sum_{l=1}^{|\mathcal{D}|} \lambda_l\tilde{\alpha}^l, \qquad \sum_{l=1}^{|\mathcal{D}|} \lambda_l = 1, \qquad B^l\tilde{\alpha}^l \le \tilde{b}^l\ \ \forall l\in\mathcal{D}. \tag{10}$$

Applying the reverse mapping to (10) gives

$$\Psi z = \sum_{l=1}^{|\mathcal{D}|} \lambda_l\Psi\tilde{\alpha}^l. \tag{11}$$

By construction, $(a^l)^T\Psi\tilde{\alpha}^l \le b_l\ \forall l\in\mathcal{D}$. The point $\Psi z$ is given by a convex combination of points that all satisfy the constraints of one of the disjuncts in (7) and, therefore, belongs to the convex hull of (7). The same technique easily shows that any point in the convex hull of disjunction (7) also belongs to the convex hull of disjunction (8). ⊓⊔

Theorem 2 does not hold with nonlinear functions, since the mapping may not be bijective or a homomorphism. In general, the n-split formulation will not obtain the convex hull of nonlinear disjunctions, as Section 2.2 shows by example, but it can provide a strong outer approximation.

2.2 Two-term disjunctions
We further analyze the special case of a two-term disjunction, for which we also present a non-lifted P-split formulation in the following theorem.

Theorem 3.
For a two-term disjunction, the P-split formulation has the following non-extended realization:

$$\sum_{s\in\mathcal{S}_p}\sum_{i\in\mathcal{I}_s} h_{i,1}(x_i) \le b_1\lambda_1 - \sum_{s\in\mathcal{S}\setminus\mathcal{S}_p}\underline{\alpha}^1_s\lambda_1 + \sum_{s\in\mathcal{S}_p}\bar{\alpha}^1_s\lambda_2 \quad \forall\mathcal{S}_p\subseteq\mathcal{S}$$
$$\sum_{s\in\mathcal{S}_p}\sum_{i\in\mathcal{I}_s} h_{i,2}(x_i) \le b_2\lambda_2 - \sum_{s\in\mathcal{S}\setminus\mathcal{S}_p}\underline{\alpha}^2_s\lambda_2 + \sum_{s\in\mathcal{S}_p}\bar{\alpha}^2_s\lambda_1 \quad \forall\mathcal{S}_p\subseteq\mathcal{S}$$
$$\lambda_1 + \lambda_2 = 1,\quad \lambda\in\{0,1\}^2,\quad x\in X, \tag{12}$$

where $\mathcal{S} = \{1,2,\dots,P\}$.

Proof. The equality constraints for the disaggregated variables ($\alpha^l_s = \nu^{\alpha^l_s}_1 + \nu^{\alpha^l_s}_2$) enable us to easily eliminate the variables $\nu^{\alpha^l_s}_1$ from (P-split), resulting in

$$\sum_{s=1}^{P}\big(\alpha^1_s - \nu^{\alpha^1_s}_2\big) \le b_1\lambda_1 \tag{13}$$
$$\sum_{s=1}^{P}\nu^{\alpha^2_s}_2 \le b_2\lambda_2 \tag{14}$$
$$\underline{\alpha}^l_s\lambda_1 \le \alpha^l_s - \nu^{\alpha^l_s}_2 \le \bar{\alpha}^l_s\lambda_1 \quad \forall s\in\{1,\dots,P\},\ \forall l\in\{1,2\} \tag{15}$$
$$\underline{\alpha}^l_s\lambda_2 \le \nu^{\alpha^l_s}_2 \le \bar{\alpha}^l_s\lambda_2 \quad \forall s\in\{1,\dots,P\},\ \forall l\in\{1,2\} \tag{16}$$
$$\sum_{i\in\mathcal{I}_s} h_{i,l}(x_i) \le \alpha^l_s \quad \forall s\in\{1,\dots,P\},\ \forall l\in\{1,2\} \tag{17}$$
$$\lambda_1 + \lambda_2 = 1,\quad \lambda\in\{0,1\}^2 \tag{18}$$
$$x\in X,\quad \alpha^l\in\mathbb{R}^P,\quad \nu^{\alpha^l_s}_2\in\mathbb{R} \quad \forall l\in\{1,2\},\ \forall s\in\{1,\dots,P\}. \tag{19}$$

Next, we use Fourier-Motzkin elimination to project out the $\nu^{\alpha^1_s}_2$ variables. Combining the constraints in (15) and (16) only results in trivially redundant constraints, e.g., $\alpha^l_s \le \bar{\alpha}^l_s(\lambda_1+\lambda_2)$. Eliminating the first variable $\nu^{\alpha^1_1}_2$ creates two new constraints by combining (13) with (15)-(16). The first constraint is obtained by removing $\nu^{\alpha^1_1}_2$ and $\alpha^1_1$ from (13) and adding $\underline{\alpha}^1_1\lambda_1$ to the left-hand side. The second constraint is obtained by removing $\nu^{\alpha^1_1}_2$ from (13) and subtracting $\bar{\alpha}^1_1\lambda_2$ from the left-hand side. Eliminating the next variable is done by repeating the procedure of combining the two new constraints with the corresponding inequalities in (15)-(16). Each elimination step doubles the number of constraints originating from inequality (13). Eliminating all the variables $\nu^{\alpha^1_s}_2$ and $\alpha^1_s$ results in the first set of constraints

$$\sum_{s\in\mathcal{S}_p}\alpha^1_s \le b_1\lambda_1 - \sum_{s\in\mathcal{S}\setminus\mathcal{S}_p}\underline{\alpha}^1_s\lambda_1 + \sum_{s\in\mathcal{S}_p}\bar{\alpha}^1_s\lambda_2 \quad \forall\mathcal{S}_p\subseteq\mathcal{S}. \tag{20}$$
The variables $\nu^{\alpha^2_s}_2$ and $\alpha^2_s$ are eliminated by the same steps, resulting in the second set of constraints in (12). ⊓⊔
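Since each elimination step doubles the number of constraints, the non-extended realization (12) trades the disaggregated ν variables for $2^P$ inequalities per disjunct. These can be enumerated directly; the following sketch (function name and numbers are illustrative, not from the paper) generates the coefficient data for the first constraint family:

```python
from itertools import chain, combinations

# Hedged sketch of the first constraint family in (12): one inequality per
# subset S_p of the split indices {0, ..., P-1}. alpha_lo/alpha_hi are the
# bounds from (3) for disjunct 1.

def nonextended_rows(b1, alpha_lo, alpha_hi):
    P = len(alpha_lo)
    S = range(P)
    subsets = chain.from_iterable(combinations(S, r) for r in range(P + 1))
    rows = []
    for Sp in subsets:
        # sum_{s in Sp} alpha_s^1 <= coef_l1 * lambda1 + coef_l2 * lambda2
        coef_l1 = b1 - sum(alpha_lo[s] for s in S if s not in Sp)
        coef_l2 = sum(alpha_hi[s] for s in Sp)
        rows.append((Sp, coef_l1, coef_l2))
    return rows

rows = nonextended_rows(1.0, [0.0, 0.0], [3.0, 5.0])
print(len(rows))   # 2^P = 4 inequalities for P = 2
print(rows[-1])    # ((0, 1), 1.0, 8.0)
```

The exponential growth in P is the price of avoiding the lifted variables, which is why this realization is only attractive for modest P.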
To further analyze the tightness of different P-split relaxations, we require that the bounds on the auxiliary variables be independent, as defined below:

Definition 2. We say that the upper and lower bounds for the constraint $\sum_{i=1}^{n} h_i(x_i) \le b$ are independent on $X$ if

$$\min_{x\in X}\big(h_i(x_i)+h_j(x_j)\big) = \min_{x\in X} h_i(x_i) + \min_{x\in X} h_j(x_j)$$
$$\max_{x\in X}\big(h_i(x_i)+h_j(x_j)\big) = \max_{x\in X} h_i(x_i) + \max_{x\in X} h_j(x_j) \tag{21}$$

hold for all $i,j\in\{1,2,\dots,n\}$.

Independent bounds are not restricted to linear constraints, but the most general case of independent bounds is linear disjunctions with $X$ defined as a box. Independent bounds enable us to establish a strict relation on the tightness of different P-split formulations, which is presented in the next corollary.

Corollary 1.
For a two-term disjunction with independent bounds, a (P+1)-split formulation, obtained by splitting one variable group in the P-split, is always as tight or tighter than the corresponding P-split formulation.

Proof. The non-extended formulation (12) for the (P+1)-split comprises the constraints in the P-split formulation and some additional constraints. ⊓⊔

From Corollary 1 it follows that the P-split formulations represent a hierarchy of relaxations, and we formally state this property in the following corollary.

Corollary 2.
For a linear two-term disjunction, the P-split formulations form a hierarchy of relaxations, starting from the big-M relaxation (P = 1) and converging to the convex hull relaxation (P = n).

Proof. Theorems 1 and 2 give equivalence to big-M and convex hull. By Corollary 1, the (P+1)-split is as tight or tighter than the P-split relaxation. ⊓⊔

To see the differences between P-split formulations, consider the disjunction

$$\left[\,\sum_{i=1}^{4} x_i^2 \le 1\,\right] \vee \left[\,\sum_{i=1}^{4} (3-x_i)^2 \le 1\,\right],\qquad x\in\mathbb{R}^4. \tag{ex-1}$$

The tightest valid bounds on all the auxiliary variables are given by

$$\underline{\alpha}^l_s = 0,\qquad \bar{\alpha}^l_s := \left(\sqrt{|\mathcal{I}_s|\cdot 9}+1\right)^2 \quad \forall s,\ \forall l\in\{1,2\}. \tag{22}$$

These bounds are derived from the fact that one of the two constraints in the disjunction must hold, and are symmetric for the two sets of α-variables. The continuously relaxed feasible sets of the P-split formulations of disjunction (ex-1) are shown in Fig. 1, which shows that the relaxations overall tighten with an increasing number of splits P. The 4-split formulation does not give the convex hull, but provides a good approximation. For this example, the independent bound property does not hold and the relaxations do not form a proper hierarchy. To show why the independent bound property is needed, we compare the non-extended representations of the 1-split and 2-split formulations:

$$\sum_{i=1}^{4} x_i^2 \le \lambda_1 + \left(\sqrt{36}+1\right)^2\lambda_2, \qquad \sum_{i=1}^{4} (3-x_i)^2 \le \lambda_2 + \left(\sqrt{36}+1\right)^2\lambda_1 \tag{1-s}$$
$$\sum_{i=1}^{2} x_i^2 \le \lambda_1 + \left(\sqrt{18}+1\right)^2\lambda_2, \qquad \sum_{i=3}^{4} x_i^2 \le \lambda_1 + \left(\sqrt{18}+1\right)^2\lambda_2 \tag{2-s1}$$
$$\sum_{i=1}^{2} (3-x_i)^2 \le \lambda_2 + \left(\sqrt{18}+1\right)^2\lambda_1, \qquad \sum_{i=3}^{4} (3-x_i)^2 \le \lambda_2 + \left(\sqrt{18}+1\right)^2\lambda_1 \tag{2-s2}$$
$$\sum_{i=1}^{4} x_i^2 \le \lambda_1 + 2\left(\sqrt{18}+1\right)^2\lambda_2, \qquad \sum_{i=1}^{4} (3-x_i)^2 \le \lambda_2 + 2\left(\sqrt{18}+1\right)^2\lambda_1. \tag{2-s3}$$

The 1-split formulation is given by (1-s), and the 2-split by (2-s1)-(2-s3). The 2-split contains the additional constraints (2-s1) and (2-s2), but (2-s3) is a weaker version of (1-s). If the independent bound property were true, then (2-s3) and (1-s) would be identical and the relaxations would form a proper hierarchy.

Fig. 1: The dark circles show the feasible set of (ex-1) in the $x_1, x_2$ space. The light grey areas show the continuously relaxed feasible set of the P-split formulations: 1-split ($\{x_1,x_2,x_3,x_4\}$), 2-split ($\{x_1,x_2\},\{x_3,x_4\}$), and 4-split ($\{x_1\},\{x_2\},\{x_3\},\{x_4\}$). The sets in the parentheses show the partitioning of the variables.

3 Numerical comparison

To compare how the formulations perform computationally, we apply the P-split, big-M, and convex hull formulations to several test problems. We consider three types of optimization problems that have a suitable structure for the P-split formulation (Assumptions 1-3) and that are known to be challenging.
K-means clustering. Using the formulation by Papageorgiou and Trespalacios [28], the K-means clustering problem [26] is given by

$$\min_{r\in\mathbb{R}^L,\ x^j\in\mathbb{R}^n\ \forall j\in\mathcal{K}}\ \sum_{i=1}^{L} r_i \quad \text{s.t.}\quad \bigvee_{j\in\mathcal{K}}\Big[\,\big\|x^j - d^i\big\|_2^2 \le r_i\,\Big]\quad \forall i\in\{1,2,\dots,L\}, \tag{23}$$

where the $x^j$ are the cluster centers, $\{d^i\}_{i=1}^{L}$ are n-dimensional data points, and $\mathcal{K}=\{1,2,\dots,k\}$. The tightest upper bounds for the auxiliary variables in the P-split formulations are given by the largest squared Euclidean distance between any two data points in the subspace corresponding to the auxiliary variable. By introducing auxiliary variables for the differences $(x-d)$, we can express the convex hull of the disjunctions by rotated second-order cone constraints [6] in a form suitable for Gurobi. We use the G2 data set [27] to generate low-dimensional test instances, and the MNIST data set [23] to generate high-dimensional test instances. For the MNIST-based problems, we select the first images of each class, ranging from 0 to the number of clusters. Details about the problems are presented in Table 1.
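The upper bounds described above can be sketched as follows, under the stated premise that the largest pairwise squared distance within each coordinate subspace is a valid upper bound (the cluster centers can be assumed to lie in the convex hull of the data; the helper name is ours):

```python
import numpy as np

# Hedged sketch: upper bounds for the clustering auxiliary variables.
# For each split I_s, bound sum_{i in I_s} (x_i^j - d_i)^2 by the largest
# squared distance between any two data points restricted to I_s.

def clustering_split_bounds(D, partitions):
    """D: (L, n) array of data points; partitions: list of index lists."""
    bounds = []
    for I in partitions:
        sub = D[:, I]
        # all pairwise differences in the subspace, shape (L, L, |I|)
        diff = sub[:, None, :] - sub[None, :, :]
        bounds.append(float((diff ** 2).sum(-1).max()))
    return bounds

D = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
print(clustering_split_bounds(D, [[0], [1]]))  # -> [9.0, 16.0]
```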
P_ball problems. The task is to assign p points to n-dimensional unit balls such that the total $\ell$ distance between all points is minimized, with only one point assigned to each unit ball [21]. Upper bounds on the auxiliary variables in the P-split formulation are given by the same technique as for the M-coefficients in [21], but in the subspace corresponding to the auxiliary variable. By introducing auxiliary variables for the differences between the points and the centers, we are able to express the convex hull by second-order cone constraints [6] in a form suitable for Gurobi. We have generated a few larger instances to obtain more challenging problems, and details of the problems are given in Table 1.
ReLU neural networks. Optimization over a ReLU neural network (NN) is used to quantify extreme outputs [1,8]. Each ReLU activation function ($y = \max\{0, w^Tx + b\}$) can be expressed as a two-part disjunction using the P-split formulation, by separating $w^Tx = \sum_{i\in\mathcal{S}_1\cup\dots\cup\mathcal{S}_P} w_ix_i$. We sort the variables $x_i$ by index and assign them to splits of even size. Upper bounds on node outputs and auxiliary variables can be computed using simple interval arithmetic. We created several instances (Table 1) that minimize the prediction of single-output NNs trained on the d-dimensional Ackley/Rastrigin functions. All NNs were implemented in PyTorch [29] and trained for 1000 epochs, using a Latin hypercube of 10 samples. Note that more samples may be required to accurately represent the target functions, but here we are solely concerned with the performance of various optimization formulations.

Table 1: Details of the clustering, P_ball and neural network problems.

name       | data points | data dimension | number of clusters
Cluster g1 | 20          | 32             | 2
Cluster g2 | 25          | 32             | 2
Cluster g3 | 20          | 16             | 3
Cluster m1 | 5           | 784            | 3
Cluster m2 | 8           | 784            | 2
Cluster m3 | 10          | 784            | 2

name     | number of balls | number of points | ball dimension
P_ball 1 | 10              | 5                | 8
P_ball 2 | 10              | 5                | 16
P_ball 3 | 8               | 5                | 32

name | input dimension (d) | hidden layers | function
NN 1 | 2                   | [50, 50, 50]  | Ackley
NN 2 | 10                  | [50, 50, 50]  | Ackley
NN 3 | 3                   | [100, 100]    | Rastrigin
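The simple interval arithmetic mentioned above for bounding node outputs can be sketched as follows, propagating elementwise box bounds through one ReLU layer $y = \max\{0, Wx+b\}$ (the function name and example are ours, not from the paper):

```python
import numpy as np

# Hedged sketch: interval-arithmetic bound propagation through one ReLU layer.
# Per-split bounds on partial sums of w^T x follow the same pattern.

def interval_relu_layer(W, b, lo, hi):
    """Given input bounds lo <= x <= hi, return bounds on relu(W x + b)."""
    W = np.asarray(W, dtype=float)
    Wp = np.clip(W, 0, None)   # positive part of the weights
    Wn = np.clip(W, None, 0)   # negative part of the weights
    pre_lo = Wp @ lo + Wn @ hi + b   # min of W x + b over the box
    pre_hi = Wp @ hi + Wn @ lo + b   # max of W x + b over the box
    return np.maximum(pre_lo, 0.0), np.maximum(pre_hi, 0.0)

# One hidden node y = relu(x1 - x2) with x in [0, 1]^2: output bounds [0, 1].
lo, hi = interval_relu_layer([[1.0, -1.0]], np.array([0.0]),
                             np.array([0.0, 0.0]), np.array([1.0, 1.0]))
print(lo, hi)  # -> [0.] [1.]
```

Applying this layer by layer yields the (progressively weaker) bounds for layers 2-3 discussed in the results.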
Computational setup. Optimization performance depends on both the tightness and the computational complexity of the continuous relaxation. The default (automatic) parameter selection in Gurobi caused large variations in the results that were due to different solution strategies rather than differences between formulations. Therefore, we used the parameter settings MIPFocus=3, Cuts=1, and MIQCPMethod=1 for all problems. We found that using PreMIQCPForm=2 drastically improves the performance of the extended convex hull formulations for the clustering and P_ball problems. However, it resulted in worse performance for the other formulations and, therefore, we only used it with the convex hull. Since the NN problems only contain linear constraints, only the MIPFocus and Cuts parameters apply to these problems. The default values were used for all other settings. All problems were solved using Gurobi 9.0.3 on a desktop computer with an i7 8700k processor and 16GB RAM.

Different variable partitionings can lead to differences in the P-split formulations. For all the problems, the variables are simply partitioned based on their ordered indices. For the K-means clustering and P_ball problems, we have used the smallest valid M-coefficients and tight bounds for the α-variables; both problem types have analytical expressions for all the bounds. For the NN problems, tight bounds are not easily obtained, and the bounds are computed using interval arithmetic.

Table 2 shows the elapsed CPU time and number of nodes explored to solve each problem. The results show that P-split formulations can drastically reduce the number of explored nodes compared to the big-M formulation, even with only a few splits. The differences are clearest for the nonlinear problems, where both the CPU times and numbers of nodes are reduced by several orders of magnitude. As expected, the convex hull formulation results in the fewest explored nodes. However, the P-split formulations have a simpler problem formulation, reducing the CPU times compared to the convex hull for all but one instance. The results clearly show the advantage of the intermediate P-split formulations, resulting in a tighter formulation than big-M and a computationally cheaper formulation than the extended convex hull.

Note that the P-split formulations are in general robust towards the choice of P. For the clustering and P_ball problems, all P-split formulations outperformed the big-M formulation both in terms of solution times and numbers of explored nodes.
For the cases where the smallest P-split formulations timed out, Gurobi terminated with a much smaller gap compared to that of the big-M formulation. The P-split formulations also outperform the convex hull formulations in terms of solution time for a wide range of P in all but one of the test problems.

For the NN problems, which have linear disjunctions, the situation is somewhat different. Here, while increasing P still decreased the number of explored nodes, the improvements are less significant, and the trend is not completely monotonic. Note that bounds on the inputs to layers 2-3 are computed using interval arithmetic, resulting in overall weaker relaxations for all formulations. The weaker bounds in layers 2-3 reduce the benefits of both the P-split and convex hull formulations, and may favor the simpler big-M formulation. As the reduction in explored nodes is less drastic, smaller formulations perform the best in terms of CPU time, supporting claims that extended formulations may perform worse than expected [1,39]. This may also be a consequence of Gurobi efficiently handling linear problems when it detects big-M-type constraints. Ignoring the big-M (1-split), the 2- and 4-splits have the lowest CPU time for all NNs, and all the split formulations solve the problems significantly faster than the convex hull formulation.

The extended convex hull formulations for the nonlinear problems require auxiliary variables and (rotated) second-order cone constraints. All P-split formulations have fewer variables and constraints, and only contain linear/convex-quadratic constraints.

Table 2: CPU times [s] and numbers of nodes explored for the test problems. In bold is the winner for each test instance with respect to both time and number of nodes. The grey shading shows the P-split times that strictly outperform both the big-M and convex hull formulations. The time limit was 1800 CPU seconds.
Cells marked NA correspond to instances with fewer than P terms per disjunction.

instance | big-M | 2-split | 4-split | 8-split | 16-split | 32-split | convex hull
instance | big-M | 14-split | 28-split | 56-split | 196-split | 392-split | convex hull
[per-instance CPU times and node counts are not recoverable from this extraction]

*The 50-split is not the convex hull of each node for NN 3, which has layers of 100 nodes.

4 Conclusions

We have presented a general framework for generating intermediate relaxations in between the big-M and convex hull. The numerical results show great potential of the intermediate relaxations, by providing a good approximation of the convex hull through a computationally simpler problem. For several of the test problems, the intermediate relaxations result in a similar number of explored nodes as the convex hull formulation while reducing the total solution time by an order of magnitude.
Acknowledgements
The research was funded by a Newton International Fellowship by the Royal Society (NIF\R1\ ).

References
1. Anderson, R., Huchette, J., Ma, W., Tjandraatmadja, C., Vielma, J.P.: Strong mixed-integer programming formulations for trained neural networks. Mathematical Programming, pp. 1-37 (2020)
2. Balas, E.: Disjunctive programming and a hierarchy of relaxations for discrete optimization problems. SIAM Journal on Algebraic Discrete Methods (3), 466-486 (1985)
3. Balas, E.: On the convex hull of the union of certain polyhedra. Operations Research Letters (6), 279-283 (1988)
4. Balas, E.: Disjunctive programming: Properties of the convex hull of feasible points. Discrete Applied Mathematics (1-3), 3-44 (1998)
5. Balas, E.: Disjunctive Programming. Springer International Publishing (2018). https://doi.org/10.1007/978-3-030-00148-3
6. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, vol. 2. SIAM (2001)
7. Bonami, P., Lodi, A., Tramontani, A., Wiese, S.: On mathematical programming with indicator constraints. Mathematical Programming (1), 191-223 (2015)
8. Botoeva, E., Kouvaros, P., Kronqvist, J., Lomuscio, A., Misener, R.: Efficient verification of ReLU-based neural networks via dependency analysis. In: AAAI-20 Proceedings, pp. 3291-3299 (2020)
9. Ceria, S., Soares, J.: Convex programming for disjunctive convex optimization. Mathematical Programming (3), 595-614 (1999)
10. Conforti, M., Cornuéjols, G., Zambelli, G.: Integer Programming, volume 271 of Graduate Texts in Mathematics (2014)
11. Conforti, M., Wolsey, L.A.: Compact formulations as a union of polyhedra. Mathematical Programming (2), 277-289 (2008)
12. Fischetti, M., Jo, J.: Deep neural networks and mixed integer linear optimization. Constraints (3), 296-309 (2018)
13. Grimstad, B., Andersson, H.: ReLU networks as surrogate models in mixed-integer linear programs. Computers & Chemical Engineering, 106580 (2019)
14. Grossmann, I.E., Lee, S.: Generalized convex disjunctive programming: Nonlinear convex hull relaxation. Computational Optimization and Applications (1), 83-100 (2003)
15. Günlük, O., Linderoth, J.: Perspective reformulations of mixed integer nonlinear programs with indicator variables. Mathematical Programming (1-2), 183-205 (2010)
16. Helton, J.W., Nie, J.: Sufficient and necessary conditions for semidefinite representability of convex hulls and sets. SIAM Journal on Optimization (2), 759-791 (2009)
17. Hijazi, H., Bonami, P., Cornuéjols, G., Ouorou, A.: Mixed-integer nonlinear programs featuring "on/off" constraints. Computational Optimization and Applications (2), 537-558 (2012)
18. Huang, C.F., Tseng, Y.C.: The coverage problem in a wireless sensor network. Mobile Networks and Applications (4), 519-528 (2005)
19. Jeroslow, R.G.: A simplification for some disjunctive formulations. European Journal of Operational Research (1), 116-121 (1988)
20. Jeroslow, R.G., Lowe, J.K.: Modelling with integer variables. In: Mathematical Programming at Oberwolfach II, pp. 167-184. Springer (1984)
21. Kronqvist, J., Misener, R.: A disjunctive cut strengthening technique for convex MINLP. Optimization and Engineering, pp. 1-31 (2020)
22. Lasserre, J.B.: An explicit exact SDP relaxation for nonlinear 0-1 programs. In: International Conference on Integer Programming and Combinatorial Optimization, pp. 293-303. Springer (2001)
23. LeCun, Y., Cortes, C., Burges, C.: MNIST handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist (2010)
24. Liittschwager, J., Wang, C.: Integer programming solution of a classification problem. Management Science (14), 1515-1525 (1978)
25. Lovász, L., Schrijver, A.: Cones of matrices and set-functions and 0-1 optimization. SIAM Journal on Optimization (2), 166-190 (1991)
26. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281-297. Oakland, CA, USA (1967)
27. Mariescu-Istodor, P.F.R., Zhong, C.: XNN graph. LNCS 10029, 207-217 (2016)
28. Papageorgiou, D.J., Trespalacios, F.: Pseudo basic steps: bound improvement guarantees from Lagrangian decomposition in convex disjunctive programming. EURO Journal on Computational Optimization (1), 55-83 (2018)
29. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: PyTorch: An imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems, pp. 8026-8037 (2019)
30. Rubin, P.A.: Solving mixed integer classification problems by decomposition. Annals of Operations Research, 51-64 (1997)
31. Ruiz, J.P., Grossmann, I.E.: A hierarchy of relaxations for nonlinear convex generalized disjunctive programming. European Journal of Operational Research (1), 38-47 (2012)
32. Sağlam, B., Salman, F.S., Sayın, S., Türkay, M.: A mixed-integer programming approach to the clustering problem with an application in customer segmentation. European Journal of Operational Research (3), 866-879 (2006)
33. Sawaya, N.W., Grossmann, I.E.: Computational implementation of non-linear convex hull reformulation. Computers & Chemical Engineering (7), 856-866 (2007)
34. Serra, T., Kumar, A., Ramalingam, S.: Lossless compression of deep neural networks. In: Integration of Constraint Programming, Artificial Intelligence, and Operations Research, pp. 417-430. Springer (2020)
35. Sherali, H.D., Adams, W.P.: A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM Journal on Discrete Mathematics (3), 411-430 (1990)
36. Stubbs, R.A., Mehrotra, S.: A branch-and-cut method for 0-1 mixed convex programming. Mathematical Programming (3), 515-532 (1999)
37. Trespalacios, F., Grossmann, I.E.: Algorithmic approach for improved mixed-integer reformulations of convex generalized disjunctive programs. INFORMS Journal on Computing (1), 59-74 (2015)
38. Vielma, J.P.: Mixed integer linear programming formulation techniques. SIAM Review (1), 3-57 (2015)
39. Vielma, J.P.: Small and strong formulations for unions of convex sets from the Cayley embedding. Mathematical Programming (1-2), 21-53 (2019)
40. Vielma, J.P., Ahmed, S., Nemhauser, G.: Mixed-integer models for nonseparable piecewise-linear optimization: unifying framework and extensions. Operations Research (2), 303-315 (2010)
41. Vielma, J.P., Nemhauser, G.L.: Modeling disjunctive constraints with a logarithmic number of binary variables and constraints. Mathematical Programming 128