On branching-point selection for trilinear monomials in spatial branch-and-bound: the hull relaxation
Emily Speakman · Jon Lee

Abstract
In Speakman and Lee (2017), we analytically developed the idea of using volume as a measure for comparing relaxations in the context of spatial branch-and-bound. Specifically, for trilinear monomials, we analytically compared the three possible “double-McCormick relaxations” with the tight convex-hull relaxation. Here, again using volume as a measure, for the convex-hull relaxation of trilinear monomials, we establish simple rules for determining the optimal branching variable and optimal branching point. Additionally, we compare our results with current software practice.
In this article, we consider the spatial branch-and-bound (sBB) family of algorithms (see, for example, [2], [19], [22], building on [13]), which aim to find globally-optimal solutions of factorable mathematical-optimization formulations via a divide-and-conquer approach (building on the branch-and-bound approach for discrete optimization; see [9] and [10]). Implementations of these sBB algorithms for factorable formulations work by introducing auxiliary variables in such a way as to decompose every function of the original formulation, which we can then view as a labeled directed acyclic graph (DAG). Leaves correspond to original model variables, and we assume that the domain of each such model variable is a finite interval. We have a library of basic functions, including ‘linear combination’ of an arbitrary number of variables, and other simple functions of a small number of variables. The out-degree of each internal node, labeled by a library function f ∈ F that is not ‘linear combination’, is typically small (say d_f ≤ 3, for all f ∈ F). We assume that we have methods for convexifying each low-dimensional library function f on an arbitrary box domain in R^{d_f}.

(⋆ This work extends and presents parts of the first author’s doctoral dissertation [26], and it corrects results first announced in the short abstract [23]. E. Speakman: Dept. of Industrial and Operations Engineering, University of Michigan, Ann Arbor; e-mail: [email protected]. J. Lee: Dept. of Industrial and Operations Engineering, University of Michigan, Ann Arbor; e-mail: [email protected].)

From these DAGs, relaxations are composed and refined (see [4], for example). For a given function f, the associated DAG can be constructed in more than one way, and therefore sBB has choices to make in this regard. Such choices can have a strong impact on the quality of the convex relaxation obtained from the formulation. Because sBB algorithms obtain bounds from these convex relaxations, these choices can have a significant impact on the performance of the algorithm.

There has been substantial research on how to obtain good-quality convex relaxations of graphs of low-dimensional nonlinear functions on various domains (see, for example, [8], [21], [15], [14], [18], [12]), and some consideration has been given to constructing DAGs in a favorable way. In particular, in [24], we obtained analytic results regarding the convexifications obtained from different ways of treating trilinear monomials, f = x_1x_2x_3, on non-negative box domains {x ∈ R^3 : x_i ∈ [a_i, b_i], i = 1, 2, 3}. We computed both the extreme-point and inequality representations of the alternative relaxations (derived from iterating McCormick inequalities) and calculated their 4-dimensional volumes (in the space of {(f, x_i, x_j, x_k) ∈ R^4}) as a comparison measure.
Using volume as a measure gives a way to analytically compare formulations and corresponds to a uniform distribution of the optimal solution across a relaxation; when concerned with non-linear optimization, such a uniform distribution is quite natural. This is in contrast to linear optimization, where an optimal solution can always be found at an extreme point, and therefore the distribution of the optimal solution (or optimal solutions) is clearly not uniform across the feasible region. Experimental corroboration for using volume as a measure of the quality of relaxations for trilinear monomials appears in [25] (also see [5], concerning quadrilinear monomials).

Along with utilizing good convex relaxations, other important issues in the effective implementation of sBB for factorable formulations are: (i) the choice of branching variable, and (ii) the selection of the branching point. Software developers have tuned their choice of branching point using extensive problem test beds. It is common practice for solvers to branch on the value of the variable at the current solution, adjusted using some method to ensure that the branching point is not too close to either of the interval endpoints. Often this is done by weighting the interval midpoint and the variable at the optimal solution of the current relaxation, and/or restricting the branching choice to a central part of the interval. For example, in [6] (also see [7]), they suggest branching at the current relaxation point when it is in the middle 60% of the interval and, failing that, branching at the midpoint. The ooOPS software, see [3], uses the solution of an upper-bounding problem as a reference solution, if such a solution is found; otherwise the solution of the lower-bounding relaxation is used as a reference solution. ooOPS then identifies the non-convex term with the greatest separation distance with respect to its convex relaxation.
The branching variable is then chosen as the variable whose value at the reference solution is nearest to the midpoint of its range. But it is not clear how ooOPS then chooses the branching point.

[27] describe a typical way to avoid the interval endpoints by choosing the branching point as

max{ a_i + β(b_i − a_i),  min{ b_i − β(b_i − a_i),  αx̂_i + (1 − α)(a_i + b_i)/2 } },   (1)

where x̂_i is the value of the branching variable x_i at the current solution. The constants α ∈ [0, 1] and β ∈ [0, 1/2] are algorithm parameters. So, the branching point is the closest point in the interval

[a_i + β(b_i − a_i),  b_i − β(b_i − a_i)]

to the weighted combination αx̂_i + (1 − α)(a_i + b_i)/2 (of the x-value in the current optimal solution and the interval midpoint), thus explicitly ruling out branching in the bottom and top β fraction of the interval. Note that if β ≤ (1 − α)/2, then there is no such explicit restriction, because the weighted combination αx̂_i + (1 − α)(a_i + b_i)/2 already lies in this interval for all such α and β.

The method (mostly) employed by SCIP (see [1], [28] and the open-source code itself) is to select the branching point as the closest point in the middle 60% of the interval to the variable value x̂_i. This is equivalent to setting α = 1 and β = 0.2 in (1). The current default settings of ANTIGONE ([16] and [17]), BARON ([20]) and COUENNE (see [4] and the open-source code itself) all have β ≤ (1 − α)/2, and so the default branching point is simply the weighted combination αx̂_i + (1 − α)(a_i + b_i)/2; see Table 1.
Solver      α      β
SCIP        1.00   0.2, with β ≰ (1 − α)/2
ANTIGONE    0.75   β ≤ (1 − α)/2
BARON       0.70   β ≤ (1 − α)/2
COUENNE     0.25   β ≤ (1 − α)/2

Table 1: Default parameter settings
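As a concrete sketch of formula (1) (the function and variable names here are ours, not any solver's API), the selection rule is just a clamp of the weighted combination onto the central subinterval:

```python
def branching_point(a, b, x_hat, alpha, beta):
    """Formula (1): the closest point in [a + beta*(b - a), b - beta*(b - a)]
    to the weighted combination alpha*x_hat + (1 - alpha)*(a + b)/2."""
    target = alpha * x_hat + (1 - alpha) * (a + b) / 2.0
    return max(a + beta * (b - a), min(b - beta * (b - a), target))
```

With α = 1 and β = 0.2 (the SCIP-style setting), a relaxation value near an endpoint is pulled back into the middle 60% of the interval; with β ≤ (1 − α)/2 the clamp is never active and the weighted combination itself is returned.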
The different choices are based on combinations of intuition and substantialempirical evidence gathered by the software developers. We note that there is con-siderable variation in the settings of these parameters, across the various softwarepackages. Furthermore, there are other factors (especially in
BARON) that sometimes supersede selecting a branching point according to formula (1); in particular, functional forms involved, the solution of the current relaxation, available incumbent solutions, complementarity considerations, etc. Our work is based solely on analyzing a single trilinear monomial, after branching on a variable in that trilinear monomial, with the goal of helping to guide, and in some cases mathematically support, the choice of a branching point. Of course variables often appear in multiple functions. So, when deciding on a branching variable or a branching point, we may obtain conflicting guidance. But this is an issue with most branching rules, including those developed empirically, and it is always a challenge to find good ways to combine local information to make algorithmic decisions (see [2]). We hope that our results can help influence such decisions. For example, taking weighted averages of scores based on our metric would be a reasonable way to proceed.

(Private communication with Ruth Misener. Private communication with Nick Sahinidis.)
In this work, we focus on trilinear monomials; that is, functions of the form f = x_1x_2x_3. This is an important class of functions for sBB algorithms, because such monomials may also involve auxiliary variables. This means that whenever a formulation contains the product of three (or more) expressions (possibly complicated themselves), our results apply.

Following [24], for the variables x_i ∈ [a_i, b_i], i = 1, 2, 3, throughout this paper we assume the following conditions hold:

0 ≤ a_i < b_i, for i = 1, 2, 3, and
a_1b_2b_3 + b_1a_2a_3 ≤ a_2b_1b_3 + b_2a_1a_3 ≤ a_3b_1b_2 + b_3a_1a_2.   (Ω)

To see that the latter two inequalities are without loss of generality, let O_i := a_i(b_jb_k) + b_i(a_ja_k), for i = 1, 2, 3 (where {i, j, k} = {1, 2, 3}). Then we can construct a labeling such that O_1 ≤ O_2 ≤ O_3. Note that because we are only considering non-negative bounds, the latter part of this condition is equivalent to:

a_1/b_1 ≤ a_2/b_2 ≤ a_3/b_3.   (2)

This follows from Lemma 5 (in the Appendix), and that b_i > 0, i = 1, 2, 3. Also, it is very important to note that once we have labeled our variables to satisfy Ω, our trilinear monomial cannot be treated as symmetric across variables. This condition also arises in the complete characterization of the inequality description for the (polyhedral) convex hull of the graph of the trilinear monomial f := x_1x_2x_3 (in R^4) (see [15] and [14]).

We introduce the following notation for the convex hull of the graph of f := x_1x_2x_3 on a box domain:

P_h := conv({(f, x_1, x_2, x_3) ∈ R^4 : f = x_1x_2x_3, x_i ∈ [a_i, b_i], i = 1, 2, 3}).

Instead of referring to convex lower envelopes and concave upper envelopes, we take the view that any given monomial is likely to be composed in many different ways in a complicated formulation, and so we are agnostic about focusing on only one of convex lower envelopes and concave upper envelopes; rather, we look at the convex hull of the graph of the function on the domain of interest (and its total volume, not just the volume below or above the graph).

The extreme points of P_h are the eight points that correspond to the 2^3 = 8 choices of each x-variable at its upper or lower bound (see [18]). We label these eight points (all of the form [x_1x_2x_3, x_1, x_2, x_3]^T) as follows:

v_1 := [b_1a_2a_3, b_1, a_2, a_3]^T,  v_2 := [a_1a_2a_3, a_1, a_2, a_3]^T,
v_3 := [a_1a_2b_3, a_1, a_2, b_3]^T,  v_4 := [a_1b_2a_3, a_1, b_2, a_3]^T,
v_5 := [a_1b_2b_3, a_1, b_2, b_3]^T,  v_6 := [b_1b_2b_3, b_1, b_2, b_3]^T,
v_7 := [b_1b_2a_3, b_1, b_2, a_3]^T,  v_8 := [b_1a_2b_3, b_1, a_2, b_3]^T.

The (complicated) inequality description of the convex hull (see [15] and [14]) is directly used by some global-optimization software (e.g., BARON and ANTIGONE). However, other software packages (e.g., COUENNE and SCIP) instead use McCormick inequalities iteratively to obtain a (simpler) convex relaxation for trilinear monomials. These alternative approaches reflect the tradeoff between using a more complicated but stronger convexification and a simpler but weaker one, especially in the context of global optimization (see [11], for example).

From [24], we have a formula for the volume of the convex-hull relaxation (additionally, for the various double-McCormick relaxations), parameterized in terms of the upper and lower variable bounds.
Theorem 1 (see [24])
Under Ω, we have

vol(P_h) = (b_1 − a_1)(b_2 − a_2)(b_3 − a_3)
           × (b_1(5b_2b_3 − a_2b_3 − b_2a_3 − 3a_2a_3) + a_1(5a_2a_3 − b_2a_3 − a_2b_3 − 3b_2b_3))/24.

Note that due to the asymmetry introduced by Ω, the formula does not treat all variables in the same manner. In particular, the role of x_1 is quite different than the roles of x_2 and x_3 (which can be interchanged). This observation is very important in our analysis that follows.

In the context of branching within sBB, let c_i ∈ [a_i, b_i] be the branching point of variable x_i. We obtain two children. By substituting b_i = c_i (for the left child) and a_i = c_i (for the right child) for a given variable x_i into the appropriate formula (i.e., Theorem 1 after a possible relabeling of the variables), and summing the results, we obtain the total resulting volume of the relaxations at the two child nodes, given that we branch on variable x_i at point c_i. It is important to realize that the volume formula only holds when the labeling Ω is respected. Because the bounds in our problem change when we branch, we must be careful to ensure that we always use the formula correctly (i.e., if necessary, we relabel the variables to ensure that Ω holds).

In §3, we present our results analyzing optimal branching-point selection for x_1. Then, in §4, we present the analysis for x_2 and x_3. Due to the special role of x_1, we will see that the analysis (and even the result) is significantly simpler for x_2 and x_3 than for x_1. In §5, we analyze branching-variable selection, and in particular, we demonstrate that it is always best to branch on x_1. In §6, we make some concluding remarks. In the Appendix, we provide proofs of various technical results that we utilize.

3 Optimal branching-point selection for x_1

First, we define the following quantities (note that because we assume b_i > a_i, i = 1, 2,
3, the denominators will not be zero for any valid parameter choice):

q_1 := (3a_1a_2a_3 + a_1a_2b_3 − a_1b_2a_3 − 3a_1b_2b_3 + 4b_1a_2a_3 − 4b_1b_2b_3) / (2(3a_2a_3 + a_2b_3 − 4b_2b_3));   (3)

q_2 := (a_1 + b_1)/2;   (4)

q_3 := (4a_1a_2a_3 − 4a_1b_2b_3 + 3b_1a_2a_3 + b_1a_2b_3 − b_1b_2a_3 − 3b_1b_2b_3) / (2(4a_2a_3 − b_2a_3 − 3b_2b_3)).   (5)

Next, we refer to Procedure 1, which depicts a procedure for choosing a branching point when branching on variable x_1. Note that q_1 is not used in the procedure, but it is used in the analysis of the procedure.

Procedure 1:
Output is the optimal branching point when branching on variable x_1.

First, consider what happens when we pick a branching variable x_i and branch at a given point c_i: we obtain two children, now with different bounds on the branching variable. The upper bound of the branching variable in the left child becomes the value of the branching point, as does the lower bound of the branching variable in the right child. That is, the domain of x_i for the left child is [a_i, c_i], and the domain of x_i for the right child is [c_i, b_i]. We reconvexify the two children using our chosen method of convexification (i.e., the convex hull), and we can sum the volumes from both children to obtain the total volume when branching at that given point. We are interested in finding the branching point that leads to the least total volume. For an example of this principle in a lower dimension, see the diagram of Figure 1, which illustrates reconvexifying after branching in sBB. Here, because we have a one-dimensional function, the graph of the function is a set in R^2. Therefore, in the context of this diagram, we wish to find the branching point that minimizes the sum of the areas of the two diagonally striped (green) regions. Clearly this depends on the choice of convexification method.

Fig. 1:
Illustration of sBB for a 1-dimensional function
We can compute the volume of the relaxation for each of the children using Theorem 1 (i.e., Theorem 4.1 from [24]). To ensure that we compute the appropriate volumes, we need to check that, as the bounds on the branching variable change, we still respect the labeling Ω. To illustrate this, consider the left child obtained by branching on variable x_1 at some point c ∈ [a_1, b_1]. For this left child, the lower bound on the branching variable remains the same and the new upper bound is c. We can see that if c is ‘close enough’ to b_1, then Ω will remain satisfied; however, as c decreases, there comes a point where the labeling must change. By simple algebra, we calculate that this critical point is when a_2c = a_1b_2 ⇔ c = a_1b_2/a_2 (assuming for now that a_2 > 0). For the right child, the new lower bound is c. When c is close to a_1, Ω will remain satisfied; however, as c becomes larger, eventually the labeling must change. This critical point for the right child is at cb_2 = a_2b_1 ⇔ c = b_1a_2/b_2.

We note that because of the structure of the volume function of the convex hull (see Theorem 1), the second and third variables are interchangeable. This means that we do not need to consider what happens when the bounds vary enough for x_1 to be relabeled as x_3.
Before stating Theorem 2, we need to clarify some definitions and state four technical lemmas that will be needed in the proof. The proofs of these four lemmas can be found in the Appendix. We first define

V(l_1, u_1, l_2, u_2, l_3, u_3) := (u_1 − l_1)(u_2 − l_2)(u_3 − l_3)
    × (u_1(5u_2u_3 − l_2u_3 − u_2l_3 − 3l_2l_3) + l_1(5l_2l_3 − u_2l_3 − l_2u_3 − 3u_2u_3))/24,

to be the volume of the convex hull with variable lower bounds l_i and upper bounds u_i, for i = 1, 2, 3. We then define the first parameterized function:

(6) TV(c) := V_1(c), for a_1 ≤ c ≤ b_1a_2/b_2;
             V_2(c), for b_1a_2/b_2 < c < a_1b_2/a_2;
             V_3(c), for a_1b_2/a_2 ≤ c ≤ b_1,

where:

V_1(c) := V(a_2, b_2, a_1, c, a_3, b_3) + V(c, b_1, a_2, b_2, a_3, b_3),
V_2(c) := V(a_2, b_2, a_1, c, a_3, b_3) + V(a_2, b_2, c, b_1, a_3, b_3),
V_3(c) := V(a_1, c, a_2, b_2, a_3, b_3) + V(a_2, b_2, c, b_1, a_3, b_3).

And finally, the second parameterized function:

(7) T̂V(c) := V_1(c), for a_1 ≤ c ≤ a_1b_2/a_2;
             V_4(c), for a_1b_2/a_2 < c < b_1a_2/b_2;
             V_3(c), for b_1a_2/b_2 ≤ c ≤ b_1,

where V_1(c) and V_3(c) are defined as before and:

V_4(c) := V(a_1, c, a_2, b_2, a_3, b_3) + V(c, b_1, a_2, b_2, a_3, b_3).

Both TV(c) and T̂V(c) are piecewise-quadratic functions in c. We can easily observe this by noticing that V is the product of a pair of multilinear functions in the parameters.

Lemma 1 Given that the upper- and lower-bound parameters respect the labeling Ω, and b_1a_2/b_2 ≤ a_1b_2/a_2,

V_1(b_1a_2/b_2) = V_2(b_1a_2/b_2) ≥ V_2(a_1b_2/a_2) = V_3(a_1b_2/a_2).

Lemma 2 Given that the upper- and lower-bound parameters respect the labeling Ω, and b_1a_2/b_2 > a_1b_2/a_2,

V_1(a_1b_2/a_2) = V_4(a_1b_2/a_2) ≥ V_4(b_1a_2/b_2) = V_3(b_1a_2/b_2).

Lemma 3 Given that the parameters satisfy the conditions Ω, and furthermore, b_1a_2/b_2 ≤ a_1b_2/a_2, we have q_1 ≥ b_1a_2/b_2.

Lemma 4 Given that the parameters satisfy the conditions Ω, and furthermore, b_1a_2/b_2 ≥ a_1b_2/a_2, we have q_1 ≥ a_1b_2/a_2.

We are now ready to state the theorem.
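Before the theorem, a quick numerical sanity check of the definitions above (a sketch: the helper names, the test instance, and the reconstructed coefficients of V are ours). On the unit box, Theorem 1 gives vol(P_h) = 5/24, and TV(c) is continuous at its two breakpoints:

```python
def V(l1, u1, l2, u2, l3, u3):
    # Volume expression of Theorem 1, with (l_i, u_i) in place of (a_i, b_i);
    # valid when the labeling Omega is respected.
    return ((u1 - l1) * (u2 - l2) * (u3 - l3)
            * (u1 * (5*u2*u3 - l2*u3 - u2*l3 - 3*l2*l3)
               + l1 * (5*l2*l3 - u2*l3 - l2*u3 - 3*u2*u3)) / 24.0)

# A Case 1 instance (it satisfies Omega: 1/35 <= 2/12 <= 12/35):
a1, b1, a2, b2, a3, b3 = 1, 35, 2, 12, 12, 35
V1 = lambda c: V(a2, b2, a1, c, a3, b3) + V(c, b1, a2, b2, a3, b3)
V2 = lambda c: V(a2, b2, a1, c, a3, b3) + V(a2, b2, c, b1, a3, b3)
V3 = lambda c: V(a1, c, a2, b2, a3, b3) + V(a2, b2, c, b1, a3, b3)
bp1, bp2 = b1 * a2 / b2, a1 * b2 / a2   # breakpoints of TV(c)
```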
Theorem 2
Assume initial bounds a_1, b_1, a_2, b_2, a_3, b_3 that satisfy Ω and that we branch on x_1. Furthermore, assume that q_2 and q_3 are defined as in Equation (4) and Equation (5). Procedure 1 gives the optimal branching point with respect to minimizing the sum of the volumes of the two convex-hull relaxations of the children.

Proof Given our earlier discussion, it is natural to think about three cases. First, when a_2 = 0 (we refer to this as Case 0). Second (Case 1), when a_2 ≠ 0 and b_1a_2/b_2 ≤ a_1b_2/a_2 ⇔ b_1a_2^2 ≤ a_1b_2^2, and third (Case 2), when a_2 ≠ 0 and b_1a_2/b_2 > a_1b_2/a_2 ⇔ b_1a_2^2 > a_1b_2^2. The case of equality, i.e., b_1a_2/b_2 = a_1b_2/a_2, is arbitrarily included with Case 1. In fact, when equality holds, the analysis that follows is simplified, and it could be contained in either of the cases. Depending on the case, the necessary relabeling to ensure Ω remains satisfied is different, and the functions we defined as V_i(c), i = 1, …, 4, come into play in different combinations.

Case 0: a_2 = 0. From the condition Ω, we know that a_2 = 0 ⇒ a_1 = 0. In this special case, the labeling for the left child does not change no matter how small the upper bound becomes. Conversely, the labeling for the right child changes as soon as the lower bound becomes positive. We therefore have the picture shown in Figure 2, and the function (i.e., V_3(c)) describing the sum of the volumes of the two child relaxations over the entire domain, [a_1, b_1], does not need to be defined in a piecewise manner. As we will see shortly, this function is a convex quadratic, and therefore it is easy to check (by calculating where the derivative is zero) that in this special case the minimizer of this function is q_3 (defined above), and this is the minimizer of the total volume of the two children. Furthermore, when a_2 = 0 (and therefore a_1 = 0), this minimizer simplifies to b_1/2 = (a_1 + b_1)/2 = q_2, the midpoint of the interval.

Fig. 2:
Variable labeling as the branching point varies for the three cases
Case 1: b_1a_2/b_2 ≤ a_1b_2/a_2. As illustrated in Figure 2, in Case 1, the function describing the sum of the volumes of the child relaxations is
TV(c) (Equation (6)). It is straightforward to check that the function TV(c) is continuous over its domain. Furthermore, by observing that the leading coefficient of each piece (V_i(c), i = 1, 2, 3) is positive for all parameter values satisfying Ω, we conclude that each piece is strictly convex. We are able to claim strict convexity because we assume b_i > a_i for all i. Using this fact, for each coefficient below we observe that each multiplicand in the numerator is strictly positive and therefore each leading coefficient is strictly positive.

The coefficient of c^2 in the quadratic function V_1(c) is:

(b_2 − a_2)(b_3 − a_3)(6(b_2b_3 − a_2a_3) + 2b_3(b_2 − a_2))/24 > 0.

The coefficient of c^2 in the quadratic function V_2(c) is:

(b_2 − a_2)(b_3 − a_3)(4(b_2b_3 − a_2a_3) + 2(b_3 + a_3)(b_2 − a_2))/24 > 0.

The coefficient of c^2 in the quadratic function V_3(c) is:

(b_2 − a_2)(b_3 − a_3)(6(b_2b_3 − a_2a_3) + 2a_3(b_2 − a_2))/24 > 0.

Figure 3 gives some idea of what this function could look like. The example depicts a globally convex function, and we are yet to prove that this will always be the case. However, in later analysis (Theorem 3 in §3.3), we will prove that it is.

Fig. 3:
Illustration of a (globally convex) continuous piecewise-quadratic function
Now that we know that
TV(c) has this structure, to find the global minimizer over the domain [a_1, b_1], we can simply find the local minimizer on each of the three pieces and pick the point with the least function value. Because we have convex functions, the local minimum of a given piece will either occur at the global minimizer of V_i(c) (if this occurs over the appropriate subdomain), or at one of the end points of the subdomain. Therefore, to find the local minimizer for a given segment, we first find the global minimizer of V_i(c) over the entire real line and check if it occurs in the interval; if so, it is the local minimizer; if not, we examine the interval end points to locate the local minimizer. We can then compare the function value of the local minimizer of each of the three pieces to find the global minimizer of TV(c), i.e., the branching point that obtains the least total volume.

Given that each V_i is a parameterized convex-quadratic function in c, it is easy to use a computer algebra system to calculate the following:

The minimum of V_1(c) occurs at:

c = (3a_1a_2a_3 + a_1a_2b_3 − a_1b_2a_3 − 3a_1b_2b_3 + 4b_1a_2a_3 − 4b_1b_2b_3)/(2(3a_2a_3 + a_2b_3 − 4b_2b_3)) = q_1.

The minimum of V_2(c) occurs at:

c = (a_1 + b_1)/2 = q_2.

The minimum of V_3(c) occurs at:

c = (4a_1a_2a_3 − 4a_1b_2b_3 + 3b_1a_2a_3 + b_1a_2b_3 − b_1b_2a_3 − 3b_1b_2b_3)/(2(4a_2a_3 − b_2a_3 − 3b_2b_3)) = q_3.

Therefore, the candidate points for the minimizer are a_1, b_1a_2/b_2, a_1b_2/a_2, b_1, q_1, q_2 and q_3. We can immediately discard a_1 and b_1 because these are both equivalent to not branching. By branching and reconvexifying over the two children, we can never do worse with regard to volume. Therefore, we have five points to consider. For a given set of parameters, it is straightforward to evaluate and check which of these five points is the minimizer. However, making use of the following observations, we can further reduce the possibilities.

If q_1 were to be the global minimizer, then it must fall in the appropriate subdomain; i.e., it must be that q_1 ≤ b_1a_2/b_2. However, by Lemma 3, in Case 1 we always have q_1 ≥ b_1a_2/b_2. Therefore, we can discard q_1 as a candidate point for the minimizer, because for it to be the minimizer, this quantity would have to be exactly equal to b_1a_2/b_2, which is already on the list of candidate points.

Now, consider the quantities:

q_1 − q_2 = (b_3 − a_3)(b_1a_2 − a_1b_2)/(2(4b_2b_3 − a_2b_3 − 3a_2a_3)) ≥ 0,   (8)

and

q_3 − q_2 = (a_3 − b_3)(b_1a_2 − a_1b_2)/(2(3b_2b_3 + b_2a_3 − 4a_2a_3)) ≤ 0.   (9)

The inequalities follow from b_i > a_i, i = 1, 2, 3, and Lemma 5 in the Appendix. We therefore have:

q_1 ≥ q_2 = (a_1 + b_1)/2 ≥ q_3.   (10)

From this, we can observe that if q_3 ≥ a_1b_2/a_2, then q_2 ≥ q_3 ≥ a_1b_2/a_2, and therefore q_3 is the minimizer. This is because neither q_1 nor q_2 falls in its key interval (i.e., in the appropriate subdomain); furthermore, by the definition of q_3 as the minimizer of V_3, we must have that V_3(q_3) ≤ V_3(a_1b_2/a_2), and by Lemma 1, we have that V_3(a_1b_2/a_2) ≤ V_1(b_1a_2/b_2).

If this does not occur, i.e., q_3 < a_1b_2/a_2, then if b_1a_2/b_2 ≤ (a_1 + b_1)/2 ≤ a_1b_2/a_2, the midpoint q_2 is the minimizer. This is because under these conditions, q_2 is the only minimizer that occurs in the ‘correct’ function piece, and by definition of q_2 as the minimizer of V_2, the function value is not more than at either of the end points.

Otherwise, if none of the above occurs (i.e., none of the intervals contain their function global minimizer), we have that a_1b_2/a_2 is the minimizer, by Lemma 1.

Case 2: b_1a_2/b_2 > a_1b_2/a_2. In this second case, for a given problem with initial upper and lower bounds (a_1, b_1, a_2, b_2, a_3, b_3), the sum of the volumes of the two child relaxations after branching at point c is given by the function T̂V(c) (Equation (7), and illustrated in Figure 2). This is similar to, but distinct from, the function in Case 1.

Recall that this is a piecewise-quadratic function in c, and, as before, it is simple to check that the function is continuous over its domain. Furthermore, by observing that the leading coefficient of each piece is positive for all parameter values satisfying Ω, we know that each piece is strictly convex. Strict convexity comes from the knowledge b_i > a_i, i = 1, 2, 3. The coefficient of c^2 in the quadratic function V_4(c) is:

8(b_2 − a_2)(b_3 − a_3)(b_2b_3 − a_2a_3)/24 > 0.

Therefore, we can take the same approach as before to find the global minimizer: first find the local minimizer for each segment.
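The ordering (10), and the Lemma 3 bound that rules out q_1 in Case 1, can be spot-checked numerically (the expressions for q_1 and q_3 here are our reconstructions of (3) and (5); the instance is a Case 1 example that also appears in §3.2):

```python
a1, b1, a2, b2, a3, b3 = 1, 35, 2, 12, 12, 35  # satisfies Omega: 1/35 <= 2/12 <= 12/35

q1 = ((3*a1*a2*a3 + a1*a2*b3 - a1*b2*a3 - 3*a1*b2*b3 + 4*b1*a2*a3 - 4*b1*b2*b3)
      / (2.0 * (3*a2*a3 + a2*b3 - 4*b2*b3)))      # minimizer of V1, Equation (3)
q2 = (a1 + b1) / 2.0                              # midpoint, Equation (4)
q3 = ((4*a1*a2*a3 - 4*a1*b2*b3 + 3*b1*a2*a3 + b1*a2*b3 - b1*b2*a3 - 3*b1*b2*b3)
      / (2.0 * (4*a2*a3 - b2*a3 - 3*b2*b3)))      # minimizer of V3, Equation (5)
```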
We do this by finding the global minimizer of the appropriate function (V_i(c)) over the whole real line and checking if it occurs in the segment. If it does, we have found the minimizer for that segment; if not, we examine the interval end points. We then compare the minimum in each of the three segments to find the branching point that obtains the least total volume.

From our analysis of Case 1, we know that the minimums of V_1(c) and V_3(c) occur at q_1 and q_3 respectively. We compute that the minimum of V_4(c) occurs at the midpoint of the whole interval, i.e., at c = (a_1 + b_1)/2 = q_2. As before, the candidate points for the minimizer are b_1a_2/b_2, a_1b_2/a_2, q_1, q_2 and q_3. However, by making the following observations we can further reduce the points we need to examine.

If q_1 were to be the global minimizer, then it must fall in the appropriate subdomain, i.e., it must be that q_1 ≤ a_1b_2/a_2. However, by Lemma 4, in Case 2 we always have q_1 ≥ a_1b_2/a_2. Therefore, we can discard q_1 as a candidate point for the minimizer, because for it to be the minimizer it would have to be exactly equal to a_1b_2/a_2, which is already on the list of candidate points.

If q_3 ≥ b_1a_2/b_2, then q_2 ≥ q_3 ≥ b_1a_2/b_2, and therefore q_3 is the minimizer. This is because neither q_1 nor q_2 falls in its key interval; furthermore, by definition of q_3 as the minimizer of V_3, we must have that V_3(q_3) ≤ V_3(b_1a_2/b_2), and by Lemma 2 we know that V_3(b_1a_2/b_2) ≤ V_1(a_1b_2/a_2).

If this does not occur, i.e., q_3 < b_1a_2/b_2, then if a_1b_2/a_2 ≤ (a_1 + b_1)/2 ≤ b_1a_2/b_2, the midpoint q_2 is the minimizer. This is because under these conditions, q_2 is the only minimizer that occurs in the ‘correct’ function piece, and by definition of q_2 as the minimizer of V_4, the function value is no more than at either of the end points.

Otherwise, we have that b_1a_2/b_2 is the minimizer, by Lemma 2. Therefore Procedure 1 is correct. ⊓⊔

As an interesting side point, we note that in Case 1, if it were possible to have q_1 ≤ b_1a_2/b_2, then q_3 ≤ q_2 ≤ q_1 ≤ b_1a_2/b_2, and therefore q_1 would be the minimizer. This is because neither q_2 nor q_3 would fall in their key intervals; furthermore, by the definition of q_1 as the minimizer of V_1, we have that V_1(q_1) ≤ V_1(b_1a_2/b_2), and by Proposition 1 (see the Appendix), we know that V_1(q_1) ≤ V_3(a_1b_2/a_2). However, by Lemma 3 we have already discarded this case.

As another interesting side point, we also note that in Case 2, if it were possible to have q_1 ≤ a_1b_2/a_2, then q_3 ≤ q_2 ≤ q_1 ≤ a_1b_2/a_2, and q_1 would be the minimizer. This is because neither q_2 nor q_3 would fall in their key intervals. Furthermore, by definition of q_1 as the minimizer of V_1, we must have that V_1(q_1) ≤ V_1(a_1b_2/a_2), and by Proposition 2 (see the Appendix), we know that V_1(q_1) ≤ V_3(b_1a_2/b_2). However, by Lemma 4 we have already discarded this case.

3.2 Some examples

We can illustrate these piecewise-quadratic functions for the possible outcomes of Procedure 1. In this illustration, we focus on Case 1, and therefore Figure 4 shows the function TV(c) over the domain [a_1, b_1]. The (orange) dashed curve illustrates an example where the minimizer of V_3(c) (i.e., q_3) falls in the relevant interval, and therefore is the minimizer over our whole domain. The (purple) solid curve illustrates an example where q_3 does not fall in this interval; however, the midpoint, q_2, falls in between the quantities b_1a_2/b_2 and a_1b_2/a_2 and is therefore the required minimizer. The (green) dotted curve illustrates an example where neither of the above happens, and therefore the breakpoint between the function V_2(c) and the function V_3(c) is the minimizer. In this example we are in Case 1, and therefore this point is a_1b_2/a_2.

It is important to note that each of the cases in Procedure 1 actually can occur. It is easy to check the following:

– An example of a dashed curve (minimum occurs at q_3) is (a_1 = 1, b_1 = 35, a_2 = 2, b_2 = 12, a_3 = 12, b_3 = 35).
– An example of a solid curve (minimum occurs at q_2) is (a_1 = 1, b_1 = 34, a_2 = 2, b_2 = 36, a_3 = 12, b_3 = 35).
– An example of a dotted curve (minimum occurs at a_1b_2/a_2) is (a_1 = 1, b_1 = 8, a_2 = 5, b_2 = 22, a_3 = 1, b_3 = 4).

Unfortunately, the plots of the actual functions do not display the key details as clearly as our illustration, so we do not include them here. Furthermore, an example of Case 2, where the minimum occurs at the breakpoint between the function V_4 and the function V_3, i.e., the point b_1a_2/b_2, is (a_1 = 1, b_1 = 13, a_2 = 1, b_2 = 2, a_3 = 2, b_3 = 4). Finally, a simple example of Case 0 is the special case (a_1 = 0, b_1 = 1, a_2 = 0, b_2 = 1, a_3 = 0, b_3 = 1). In Figure 5 we can see the plot of this function and the minimum, which falls at the midpoint. In Case 0 we always have q_1 = q_2 = q_3 = (a_1 + b_1)/2 = b_1/2.

3.3 Global convexity of our piecewise-quadratic function over its domain

We have seen that each piece of TV(c) and T̂V(c) is a convex quadratic function. However, this does not imply that the functions are convex over the whole domain, [a_1, b_1]. Nevertheless, as we show in the following theorem, with a bit more work, we are able to demonstrate that TV(c) and T̂V(c) are convex over the domain [a_1, b_1]. It is very useful that these functions are globally convex; if a variable appears in many trilinear terms, it is quite reasonable to combine volumes in a reasonable manner (see [25]). For example, we can take a weighted average (of the sum of the two volumes for each term) as a measure for deciding on a branching point. A weighted average (assuming positive weights) of convex functions is convex, and therefore, the global-convexity property of these functions allows us to find the optimal branching point (defined as the minimum of the weighted-average function) by a simple bisection search. However, it is important to note that we are not advocating a bisection search if there is only one term being considered. In this case, Procedure 1 is more efficient.

Fig. 4:
Picture to illustrate the possible outcomes of Procedure 1 in Case 1.
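The weighted-average idea discussed above can be implemented directly once global convexity is in hand. The sketch below is our own illustration, not the paper's code: the two "total-volume" functions are made-up convex piecewise quadratics standing in for the paper's volume formulas (each is a pointwise maximum of convex pieces, hence convex), and a ternary search (a bisection-style method valid for convex functions) locates the combined branching point.

```python
# Sketch (ours, not the paper's code): minimize a weighted average of convex
# total-volume functions by a bisection-style (ternary) search. The two
# "total-volume" functions below are made-up convex piecewise quadratics,
# standing in for the paper's volume formulas.

def ternary_search_min(f, lo, hi, tol=1e-10):
    """Minimize a convex function f on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2  # for convex f, the minimizer lies in [lo, m2]
        else:
            lo = m1
    return 0.5 * (lo + hi)

# Each stand-in is a pointwise max of convex quadratic pieces, hence convex:
tv1 = lambda c: max((c - 2.0) ** 2, 0.5 * (c - 3.0) ** 2 + 0.5)
tv2 = lambda c: max((c - 4.0) ** 2, 2.0 * (c - 3.5) ** 2)

w1, w2 = 0.7, 0.3  # positive weights keep the average convex
avg = lambda c: w1 * tv1(c) + w2 * tv2(c)

c_star = ternary_search_min(avg, 0.0, 6.0)  # combined branching point
```

With only one trilinear term, the closed-form case analysis of Procedure 1 is of course cheaper than any search.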
Fig. 5:
Plot of the total-volume function (when branching on x₁), for parameter values (a₁ = 0, b₁ = 1, a₂ = 0, b₂ = 1, a₃ = 0, b₃ = 1).

Theorem 3
Given that the upper- and lower-bound parameters respect the labeling Ω, the functions TV(c) and T̂V(c) are globally convex functions of the branching point c over the domain [a₁, b₁].

Proof To demonstrate the global convexity of TV(c), we will establish that it is the pointwise maximum of its three convex quadratic pieces, which we denote V₁(c), V₂(c) and V₃(c) (from left to right). Similarly, to demonstrate the global convexity of T̂V(c), we will establish that it is the pointwise maximum of its three convex quadratic pieces V̂₁(c), V̂₂(c) and V̂₃(c).

Global convexity of TV(c): Consider the difference of V₁(c) and V₂(c):

V₁(c) − V₂(c) = (b₂ − a₂)(b₁ − c)(b₃ − a₃)(b₁a₂ − cb₂)/12.

Note that for all parameter values such that Ω is satisfied and a₁ < c < b₁, we have that V₁(c) > V₂(c) if and only if c < b₁a₂/b₂, and conversely V₂(c) > V₁(c) if and only if c > b₁a₂/b₂. They are equal when c = b₁a₂/b₂.

Now consider the difference of V₃(c) and V₂(c):

V₃(c) − V₂(c) = (b₂ − a₂)(c − a₁)(b₃ − a₃)(ca₂ − a₁b₂)/12.

Again, for all parameter values such that Ω is satisfied and a₁ < c < b₁, we have that V₃(c) > V₂(c) if and only if c > a₁b₂/a₂, with equality when c = a₁b₂/a₂. Hence each piece dominates the others exactly on its own subdomain, so TV(c) = max{V₁(c), V₂(c), V₃(c)}, and TV(c) is convex as a pointwise maximum of convex functions.

Global convexity of T̂V(c): Consider the difference of V̂₁(c) and V̂₂(c):

V̂₁(c) − V̂₂(c) = (b₂ − a₂)(c − a₁)(b₃ − a₃)(a₁b₂ − ca₂)/12.

Note that for all parameter values such that Ω is satisfied and a₁ < c < b₁, we have that V̂₁(c) > V̂₂(c) if and only if c < a₁b₂/a₂, and conversely V̂₂(c) > V̂₁(c) if and only if c > a₁b₂/a₂. They are equal when c = a₁b₂/a₂.

Now consider the difference of V̂₃(c) and V̂₂(c):

V̂₃(c) − V̂₂(c) = (b₂ − a₂)(b₁ − c)(b₃ − a₃)(cb₂ − b₁a₂)/12.

Again, for all parameter values such that Ω is satisfied and a₁ < c < b₁, we have that V̂₃(c) > V̂₂(c) if and only if c > b₁a₂/b₂, with equality when c = b₁a₂/b₂. Hence T̂V(c) = max{V̂₁(c), V̂₂(c), V̂₃(c)}, and T̂V(c) is likewise convex. ⊓⊔
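The proof strategy of Theorem 3 rests on the fact that a pointwise maximum of convex functions is convex. A quick numerical sanity check of that fact (our own, using made-up quadratic pieces rather than the paper's volume formulas):

```python
# Numerical sanity check (ours): a pointwise maximum of convex functions is
# convex -- the mechanism behind Theorem 3. The quadratic pieces are made up,
# not the paper's volume formulas.
import random

pieces = [
    lambda c: 3.0 * c * c - 2.0 * c + 1.0,
    lambda c: 0.5 * c * c + 4.0 * c,
    lambda c: 2.0 * (c - 1.0) ** 2 - 0.5,
]

def tv(c):
    """Piecewise function equal to the pointwise max of the convex pieces."""
    return max(p(c) for p in pieces)

random.seed(0)
violations = 0
for _ in range(10_000):
    x, y = random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)
    lam = random.random()
    z = lam * x + (1.0 - lam) * y
    # convexity: tv(lam*x + (1-lam)*y) <= lam*tv(x) + (1-lam)*tv(y)
    if tv(z) > lam * tv(x) + (1.0 - lam) * tv(y) + 1e-9:
        violations += 1
```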
Theorem 4 Given that the upper- and lower-bound parameters respect the labeling Ω, the branching point for variable x₁ that obtains the least total volume never occurs at a point in the interval greater than the midpoint.

Proof If a₁ = 0, then we are in Case 0, and the minimizer is at the midpoint, which is clearly no greater than the midpoint.

If a₁b₂/a₂ ≥ b₁a₂/b₂, then we are in Case 1. If q, the unconstrained minimizer of the relevant piece, satisfies q ≥ a₁b₂/a₂, then q is the minimizer, but we know that q ≤ (a₁+b₁)/2 (see Equation 9). If the midpoint (a₁+b₁)/2 falls in the interval [b₁a₂/b₂, a₁b₂/a₂], then the midpoint is the minimizer. If it does not, then (i) a₁b₂/a₂ is the minimizer, and (ii) it must be either that (a₁+b₁)/2 > a₁b₂/a₂, in which case our claim is valid, or (a₁+b₁)/2 < b₁a₂/b₂ ≤ a₁b₂/a₂. We will show by contradiction that the latter cannot occur.

Toward this end, assume that

(a₁+b₁)/2 < b₁a₂/b₂ and (a₁+b₁)/2 < a₁b₂/a₂.

This implies

2b₁a₂ − b₁b₂ − a₁b₂ = b₁(a₂ − b₂) + (b₁a₂ − a₁b₂) > 0, and
2a₁b₂ − a₁a₂ − b₁a₂ = a₁(b₂ − a₂) + (a₁b₂ − b₁a₂) > 0.

Now let X := b₂ − a₂ and Y := b₁a₂ − a₁b₂ (note that both X and Y are non-negative: Lemma 5). Therefore we can write our assumption as

b₁(−X) + Y > 0 and a₁(X) + (−Y) > 0,

which implies Y > b₁X and Y < a₁X, a contradiction (because b₁ > a₁ ≥ 0 and X ≥ 0). Therefore, in Case 1 the minimizer must be no larger than the midpoint.

We make a similar argument for Case 2. Here a₁b₂/a₂ < b₁a₂/b₂. If q ≥ b₁a₂/b₂, then q is the minimizer, but we know that q ≤ (a₁+b₁)/2 (see Equation 9). If the midpoint (a₁+b₁)/2 falls in the interval [a₁b₂/a₂, b₁a₂/b₂], then the midpoint is the minimizer. If it does not, then (i) b₁a₂/b₂ is the minimizer, and (ii) it must be either that (a₁+b₁)/2 > b₁a₂/b₂, in which case our claim is valid, or (a₁+b₁)/2 < a₁b₂/a₂ < b₁a₂/b₂. However, we have just shown by contradiction that this cannot be the case. Therefore, in Case 2 the minimizer must be no larger than the midpoint. ⊓⊔

This theorem gives an upper bound on the fraction of the way through the interval at which the minimizer can fall (namely 1/2). Furthermore, this bound is sharp (i.e., it is attained and therefore cannot be strengthened) because we know examples where the minimizer is exactly at the midpoint. It would be nice to also obtain a sharp lower bound on this fraction. By demonstrating that the minimizer cannot fall too close to the endpoints of the interval, we provide mathematical evidence justifying the current choices of branching point in software, as discussed in §1. The following theorem gives a lower bound on this fraction when a₁ ≠ 0 (when a₁ = 0, we know that the minimizer is exactly at the midpoint). We note that because of the condition Ω, the problem is not symmetric, and therefore knowledge about the upper bound does not allow us to draw conclusions about the lower bound.

Theorem 5
Given upper- and lower-bound parameters (a₁, b₁, a₂, b₂, a₃, b₃) satisfying Ω, with a₁ ≠ 0, the branching point for variable x₁ that obtains the least total volume never occurs at a point in the interval less than

min{ max{ a₁(b₂ − a₂)/(a₂(b₁ − a₁)), (b₁a₂ − a₁b₂)/(b₁b₂ − a₁b₂) }, 1/2 }

of the way through the interval.

Proof There are four candidate points where the minimizer can occur: namely, the midpoint (a₁+b₁)/2, the unconstrained piece minimizer q, a₁b₂/a₂, and b₁a₂/b₂. Therefore

min{ (a₁+b₁)/2, q, a₁b₂/a₂, b₁a₂/b₂ }

is a trivial lower bound on this minimizer. We know that if q is the minimizer, then we must have q ≥ a₁b₂/a₂ (Case 1) or q ≥ b₁a₂/b₂ (Case 2), so we can discard this point. Additionally, we know that if a₁b₂/a₂ is the minimizer, then we have a₁b₂/a₂ ≥ b₁a₂/b₂ (Case 1), and if b₁a₂/b₂ is the minimizer, then we have b₁a₂/b₂ > a₁b₂/a₂ (Case 2). Therefore a lower bound on the minimizer is

min{ max{ a₁b₂/a₂, b₁a₂/b₂ }, (a₁+b₁)/2 }.

Moreover, a lower bound for the fraction of the interval where this point can fall is

min{ max{ (a₁b₂/a₂ − a₁)/(b₁ − a₁), (b₁a₂/b₂ − a₁)/(b₁ − a₁) }, ((a₁+b₁)/2 − a₁)/(b₁ − a₁) }
= min{ max{ a₁(b₂ − a₂)/(a₂(b₁ − a₁)), (b₁a₂ − a₁b₂)/(b₁b₂ − a₁b₂) }, 1/2 }. ⊓⊔

We note that this lower bound is unlikely to be sharp. Consider the case where a₁ = ε > 0 and b₁ = 1, with the remaining parameters chosen (consistent with Ω) so that the bound becomes of order ε; it is then not particularly informative, given that we can make ε as close to zero as we wish. However, we have computationally checked many examples, and we have yet to find an example where the minimizer occurs less than ∼0.45 of the way through the interval. It would be nice to sharpen this bound, and our computations indicate that this should be possible.

We now consider branching on x₂ and x₃. As in the x₁ case, recall the condition Ω, which due to our non-negativity assumption can be written as a₁/b₁ ≤ a₂/b₂ ≤ a₃/b₃. Now consider what happens to the quantity a₂/b₂ when we branch on x₂. In the left interval, a₂ remains constant, and b₂ becomes the branching point, c₂ < b₂. Therefore, a₂/b₂ cannot decrease. In the right interval, b₂ remains constant and a₂ becomes the branching point, c₂ > a₂. Therefore, again, a₂/b₂ cannot decrease. Because of this, the labeling for x₁ and x₂ will not have to be switched to ensure Ω remains satisfied. Furthermore, x₂ and x₃ are interchangeable in the formula, so we do not need to consider what happens when the ratios change such that a₂/b₂ > a₃/b₃.

The cases of x₂ and x₃ therefore both require the analysis of only one convex quadratic function. This is formalized in the following theorem.

Theorem 6
Let cᵢ ∈ [aᵢ, bᵢ] be the branching point for xᵢ, i = 2, 3. With the convex-hull relaxation, the least total volume after branching is obtained when cᵢ = (aᵢ + bᵢ)/2, i.e., branching at the midpoint is optimal.

Proof We first consider branching on x₂. Consider the sum of the two resulting volumes, given by the following function:

TV(c₂) = V(a₁, b₁, c₂, b₂, a₃, b₃) + V(a₁, b₁, a₂, c₂, a₃, b₃),

which is quadratic in c₂. The second derivative (twice the leading coefficient) is

TV″(c₂) = (1/12)(b₁ − a₁)(b₃ − a₃)(3(b₁b₃ − a₁a₃) + (b₁a₃ − a₁b₃)),

which is non-negative for all parameters satisfying Ω, and hence for all c₂ ∈ [a₂, b₂] (Lemma 5). Therefore this function is convex. Setting the first derivative equal to zero and solving for c₂, we obtain that the minimum occurs at c₂ = (a₂ + b₂)/2. A similar analysis for i = 3 yields the result. ⊓⊔

Now that we have established the optimal branching point for each variable in all cases, it is interesting to compare the total volumes obtained when branching at the optimal point for each variable. In this section we establish the optimal branching variable.
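A minimal illustration (ours) of the mechanism behind Theorem 6, under an assumed stand-in child-volume function rather than the paper's hull-relaxation volume formula: when the total volume is a convex quadratic in the branching point, zeroing the derivative gives the minimizer, and for this symmetric stand-in it lands at the midpoint.

```python
# Illustration (ours) of the mechanism in Theorem 6, with a stand-in
# child-volume function -- NOT the paper's hull-relaxation volume formula.
# When the total volume is a convex quadratic in the branching point c,
# zeroing the derivative gives the minimizer; here it is the midpoint.

def child_vol(lo, hi):
    """Hypothetical stand-in: volume grows quadratically with child width."""
    return (hi - lo) ** 2

def total_vol(c, a=1.0, b=5.0):
    return child_vol(a, c) + child_vol(c, b)

# total_vol(c) = (c-a)^2 + (b-c)^2; derivative 2(c-a) - 2(b-c) = 0 gives
# c = (a+b)/2, and the second derivative is 4 > 0, so this is the minimum.
a, b = 1.0, 5.0
c_analytic = (a + b) / 2.0

# Numeric confirmation by grid search:
grid = [a + (b - a) * k / 10_000 for k in range(10_001)]
c_grid = min(grid, key=total_vol)
```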
Theorem 7
Given that the upper- and lower-bound parameters respect the labeling Ω, if we assume optimal branching-point selection, then branching on x₁ obtains the least total volume, and branching on x₃ obtains the greatest total volume. Additionally, even if we branch at the midpoint for x₁ (which may not be optimal), this is at least as good as optimal branching-point selection (i.e., midpoint branching) on either x₂ or x₃.

Proof First, we establish that branching optimally (at the midpoint) on variable x₂ obtains a lower total volume than branching optimally (at the midpoint) on variable x₃. The optimal total volume when branching on variable x₂ is

[(b₁ − a₁)(b₂ − a₂)(b₃ − a₃)/48] × (7 a a a + a a b − a a b − a b b − a a b − a b b + a b b + 7 b b b).

The optimal total volume when branching on variable x₃ is

[(b₁ − a₁)(b₂ − a₂)(b₃ − a₃)/48] × (7 a a a − a a b + a a b − a b b − a a b + a b b − a b b + 7 b b b).

Therefore, the difference in total volume from branching on x₃ compared with x₂ is

(b₁ − a₁)(b₂ − a₂)(b₃ − a₃)(b₂a₃ − a₂b₃)/12,

which is non-negative by Lemma 5. Therefore, if we assume optimal branching, branching on x₃ always results in at least as great a total volume as branching on x₂.

Now consider the optimal total volume when branching on x₁; this quantity is always at most the total volume obtained by branching at the midpoint of the interval (with equality exactly when the midpoint is the optimal branching point). Therefore, if we can establish that branching on variable x₁ at the midpoint always obtains at most the total volume of branching on variable x₂ at the midpoint, we will have shown our claim.

Recall Figure 2. We know from the proof of Theorem 4 that the midpoint can never be less than min{ a₁b₂/a₂, b₁a₂/b₂ }. Therefore, in every case, the midpoint must fall in a subdomain where: (i) the labeling for the left interval stays the same and the labeling for the right changes; (ii) the labeling changes for both intervals; or (iii) the labeling remains the same for both intervals. This means that we are interested in the function value (total volume) at the midpoint for each of the three pieces of the total-volume function.

The total volume of branching (on variable x₁) at the midpoint, if it occurs in the subdomain corresponding to the first of these pieces, is

[(b₁ − a₁)(b₂ − a₂)(b₃ − a₃)/48] × (7 a a a − a a b − a a b + a b b + a a b − a b b − a b b + 7 b b b).

Therefore, the difference in total volume from branching on x₂ compared with this quantity is

(b₁ − a₁)(b₂ − a₂)(b₃ − a₃)(b₁a₂ − a₁b₂)/8,

which is non-negative by Lemma 5.

The total volume of branching (on variable x₁) at the midpoint, if it occurs in the subdomain corresponding to the second piece, is

[(b₁ − a₁)(b₂ − a₂)(b₃ − a₃)/48] × (6 a a a − a a b − a a b − a b b − a b b − a b b + 7 b b b).

Therefore, the difference in total volume from branching on x₂ compared with this quantity is

(b₁ − a₁)(b₂ − a₂)(b₃ − a₃)(4(b₁a₂ − a₁b₂) + a(b − a))/48,

which is non-negative by Lemma 5.

The total volume of branching (on variable x₁) at the midpoint, if it occurs in the subdomain corresponding to the third piece, is

[(b₁ − a₁)(b₂ − a₂)(b₃ − a₃)/24] × (3 a a a − a a b − a a b − a b b − a a b − a b b − a b b + 3 b b b).

Therefore, the difference in total volume from branching on x₂ compared with this quantity is

(b₁ − a₁)(b₂ − a₂)(b₃ − a₃)(b b − a a + 3(b₁a₂ − a₁b₂))/48,

which is non-negative by Lemma 5.

Therefore, for each one of these possible scenarios, optimal branching on x₂ results in at least as great a total volume as branching on x₁ at the midpoint. And so we can conclude that, given optimal branching, branching on x₁ obtains the least total volume, and branching on x₃ obtains the greatest total volume.
⊓⊔

We have presented some analytic results on branching-variable and branching-point selection in the context of sBB applied to models having functions involving the multiplication of three or more terms. In particular, for trilinear monomials f = x₁x₂x₃ on a box domain satisfying Ω, we have shown that when the convex-hull relaxation is used and the branching variable is x₂ or x₃, branching at the commonly-used midpoint results in the least total volume.

We have presented a simple procedure for obtaining the optimal branching point when using the convex-hull relaxation and branching on variable x₁. We have provided a sharp upper bound on where in the interval the minimizer can occur, and we have also obtained a lower bound for this fraction. By computationally checking many examples, we have evidence to suggest that this lower bound can be sharpened, thus providing analysis that backs up software's current choice of branching point. Furthermore, we have shown that the piecewise-quadratic functions we have been considering are globally convex over their entire domain.

Given that we branch at an optimal branching point, we have also compared the choice of branching variable. We demonstrated that branching on x₁ gives the least total volume.

We are carrying out a similar analysis to the one presented here, but for the best of the double-McCormick convexifications rather than for the convex-hull relaxation. However, due to the structure of the volume formula for the best double-McCormick convexification (see [24]), that task is significantly more complex.

Finally, we hope that our mathematical results can be used as guidance toward justifying, developing and refining practical branching rules. We believe that our work is just a first step in this direction. In this regard, we hope to further extend our mathematical analysis to directly deal with variables appearing in multiple nonlinear terms.
Acknowledgements
This work was supported in part by ONR grants N00014-14-1-0315 and N00014-17-1-2296. The authors gratefully acknowledge conversations with Ruth Misener and Nick Sahinidis concerning how branching points are selected in ANTIGONE and BARON.

References
1. Achterberg, T.: SCIP: solving constraint integer programs. Mathematical Programming Computation (1), 1–41 (2009)
2. Adjiman, C., Dallwig, S., Floudas, C., Neumaier, A.: A global optimization method, αBB, for general twice-differentiable constrained NLPs: I. Theoretical advances. Computers & Chemical Engineering (9), 1137–1158 (1998)
3. Agarwal, S., Nadeem, A.: Branch and bound algorithm with implementation of ooOPS. IOSR Journal of Mathematics (4), 22–26 (2012)
4. Belotti, P., Lee, J., Liberti, L., Margot, F., Wächter, A.: Branching and bounds tightening techniques for non-convex MINLP. Optimization Methods & Software (4-5), 597–634 (2009)
5. Cafieri, S., Lee, J., Liberti, L.: On convex relaxations of quadrilinear terms. Journal of Global Optimization, 661–685 (2010)
6. Epperly, T., Swaney, R.: Branch and bound for global NLP: iterative LP algorithm & results. In: I.E. Grossmann (ed.) Global Optimization in Engineering Design, Nonconvex Optimization and its Applications, vol. 9, p. 41. Springer, US (1996)
7. Epperly, T.G.W.: Global optimization of nonconvex nonlinear programs using parallel branch and bound. Ph.D. thesis, The University of Wisconsin–Madison (1995)
8. Jach, M., Michaels, D., Weismantel, R.: The convex envelope of (n−1)-convex functions. SIAM Journal on Optimization (3), 1451–1466 (2008)
9. Land, A.H.: A problem of assignment with inter-related costs. Operational Research Quarterly (2), 185–199 (1963)
10. Land, A.H., Doig, A.G.: An automatic method of solving discrete programming problems. Econometrica, 497–520 (1960)
11. Lee, J.: Mixed integer nonlinear programming: Some modeling and solution issues. IBM Journal of Research and Development (3/4), 489–497 (2007)
12. Maranas, C., Floudas, C.: Finding all solutions of nonlinearly constrained systems of equations. Journal of Global Optimization, 143–182 (1995)
13. McCormick, G.: Computability of global solutions to factorable nonconvex programs: Part I. Convex underestimating problems. Mathematical Programming, 147–175 (1976)
14. Meyer, C., Floudas, C.: Trilinear monomials with mixed sign domains: Facets of the convex and concave envelopes. Journal of Global Optimization, 125–155 (2004)
15. Meyer, C., Floudas, C.: Trilinear monomials with positive or negative domains: Facets of the convex and concave envelopes. Frontiers in Global Optimization, pp. 327–352 (2004)
16. Misener, R., Floudas, C.: GloMIQO: Global mixed-integer quadratic optimizer. Journal of Global Optimization (1), 3–50 (2013)
17. Misener, R., Floudas, C.: ANTIGONE: Algorithms for continuous/integer global optimization of nonlinear equations. Journal of Global Optimization, 503–526 (2014)
18. Rikun, A.: A convex envelope formula for multilinear functions. Journal of Global Optimization, 425–437 (1997)
19. Ryoo, H., Sahinidis, N.: A branch-and-reduce approach to global optimization. Journal of Global Optimization (2), 107–138 (1996)
20. Sahinidis, N.V.: BARON: A general purpose global optimization software package. Journal of Global Optimization (2), 201–205 (1996)
21. Sherali, H.D., Alameddine, A.: An explicit characterization of the convex envelope of a bivariate bilinear function over special polytopes. Annals of Operations Research, 197–209 (1990)
22. Smith, E., Pantelides, C.: A symbolic reformulation/spatial branch-and-bound algorithm for the global optimisation of nonconvex MINLPs. Computers & Chemical Engineering, 457–478 (1999)
23. Speakman, E., Lee, J.: On sBB branching for trilinear monomials. In: A. Rocha, M. Costa, E. Fernandes (eds.) Proceedings of the XIII Global Optimization Workshop (GOW16), pp. 81–84 (2016)
24. Speakman, E., Lee, J.: Quantifying double McCormick. Mathematics of Operations Research (4), 1230–1253 (2017)
25. Speakman, E., Yu, H., Lee, J.: Experimental validation of volume-based comparison for double McCormick. To appear in: Proceedings of the Fourteenth International Conference on Integration of Artificial Intelligence and Operations Research Techniques in Constraint Programming (CPAIOR) (2017)
26. Speakman, E.E.: Volumetric guidance for handling triple products in spatial branch-and-bound. Ph.D. thesis, University of Michigan (2017)
27. Tawarmalani, M., Sahinidis, N.: Convexification and global optimization in continuous and mixed-integer nonlinear programming: theory, algorithms, software and applications. Nonconvex Optimization and Its Applications, vol. 65. Kluwer Academic Publishers, Dordrecht (2002)
28. Vigerske, S., Gleixner, A.: SCIP: Global optimization of mixed-integer nonlinear programs in a branch-and-cut framework. Tech. Rep. 16-24, ZIB, Takustr. 7, 14195 Berlin (2016)
Appendix: technical propositions and lemmas
In this section, we provide the technical propositions and lemmas used for our analysis.
Proposition 1
Given that the upper- and lower-bound parameters respect the labeling Ω, and b₁a₂/b₂ ≤ a₁b₂/a₂, we have V(q) ≤ V(a₁b₂/a₂) (the two pieces of the total-volume function that meet at a₁b₂/a₂ agree there).

Proof It is easy to check that the two meeting pieces agree at a₁b₂/a₂. We can write

V(a₁b₂/a₂) − V(q) = [(b − a)(b − a)/(48(4 b b − a b − a a) a)] × (p a² + q a + r),

where

p = (−a a − a b + b a + 3 b b)(−a a − a b + 13 a b a + 7 a b b − a b a − a b b + 16 b b)
  = ((b b − a a) + b a − a b)((−a + 13 a b − a b) a + (−a + 7 a b − a b + 16 b) b),
q = 4 a b (2 a a − a b a − a b b + 4 b b)(3 a a + a b − b a − b b),
r = 4 a b (a a + a b − b b).

To show that V(a₁b₂/a₂) − V(q) is non-negative for all parameters satisfying Ω, we will show that p a² + q a + r ≥ 0. Write

p = (−a + 7 a b − a b + 16 b) b + (−a + 13 a b − a b) a =: b Y + a Z,

where Y + Z = 4(b − a)(2b − a) ≥ 0, and

Y = (b − a)(b(b − a) + 12 b + a) + 2 a b ≥ 0.

Therefore, by Lemma 6, b Y + a Z is non-negative, and so p is non-negative (Lemma 5). From this we know that p a² + q a + r is a convex function of a, and we can find its minimizer by setting the derivative to zero and solving for a. The minimum occurs at

a = 2 b a (2 a a − a b a − a b b + 4 b b)/(−a a − a b + 13 a b a + 7 a b b − a b a − a b b + 16 b b).

Substituting this into p a² + q a + r, we obtain that the minimum value of this quadratic is

4 a b (b − a)(b − a)(3 a a + a b − b b)/(−a a − a b + 13 a b a + 7 a b b − a b a − a b b + 16 b b).

In demonstrating the non-negativity of p, we have already shown that the denominator is non-negative, and it is easy to see that the numerator is non-negative for all values of the parameters satisfying Ω. Therefore p a² + q a + r ≥ 0, and consequently V(a₁b₂/a₂) − V(q) ≥ 0. ⊓⊔

Lemma 1
Given that the upper- and lower-bound parameters respect the labeling Ω, and b₁a₂/b₂ ≤ a₁b₂/a₂, we have V(b₁a₂/b₂) ≥ V(a₁b₂/a₂) (at each of these two points, the two meeting pieces of the total-volume function agree).

Proof It is easy to check that the meeting pieces agree at b₁a₂/b₂ and at a₁b₂/a₂. Furthermore,

V(b₁a₂/b₂) − V(a₁b₂/a₂) = (b − a)(b − a)(b₁a₂ − a₁b₂)(a b − a b)(3(b b − a a) + b a − a b)/(12 a b) ≥ 0,

as required. ⊓⊔

Proposition 2
Given that the upper- and lower-bound parameters respect the labeling Ω, and b₁a₂/b₂ > a₁b₂/a₂, we have V(q) ≤ V(b₁a₂/b₂) (the two pieces of the total-volume function that meet at b₁a₂/b₂ agree there).

Proof It is easy to check that the two meeting pieces agree at b₁a₂/b₂. We can write

V(b₁a₂/b₂) − V(q) = [(b − a)(b − a)/(48(4 b b − a b − a a) b)] × (p a² + q a + r),

where

p = b (5 b b − b a − a b − a a),
q = 8 b b (6 a a + 2 a b − a b a − a b b + b a + 3 b b)(b b − a a),
r = 16 b (−a a − a b + 3 a b a + 5 a b b − a b a − a b b + b b)(b b − a a).

For all parameters satisfying Ω we have p = b (5 b b − b a − a b − a a) ≥ 0. From this we know that p a² + q a + r is a convex function of a, and we can find its minimizer by setting the derivative to zero and solving for a. The minimum occurs at

a = 4 b (6 a a + 2 a b − a b a − a b b + b a + 3 b b)(a a − b b)/(b (3 a a + a b + b a − b b)).

Substituting this into p a² + q a + r, we obtain that the minimum value of this quadratic is

16 b (b − a)(b − a)(b b − a a)(3 a a + a b − b b)/(3 a a + a b + b a − b b),

which is non-negative for all parameters satisfying Ω. Therefore p a² + q a + r ≥ 0, and consequently V(b₁a₂/b₂) − V(q) ≥ 0, as required. ⊓⊔
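The proofs of Propositions 1 and 2 both use the same pattern: show the quadratic p·a² + q·a + r has p ≥ 0 (so it is convex), find its vertex by zeroing the derivative, and check that the vertex value is non-negative. A small numerical sketch of that pattern, with hypothetical stand-in coefficients:

```python
# Sketch (ours) of the proof pattern in Propositions 1 and 2: a quadratic
# p*t^2 + q*t + r with p > 0 is convex, its vertex is at t = -q/(2p) with
# value r - q^2/(4p); if that value is non-negative, the quadratic is
# non-negative everywhere. Coefficients are hypothetical stand-ins.

def vertex_value(p, q, r):
    """Minimum value of p*t^2 + q*t + r for p > 0."""
    return r - q * q / (4.0 * p)

p, q, r = 2.0, -3.0, 1.5
f = lambda t: p * t * t + q * t + r

m = vertex_value(p, q, r)            # 1.5 - 9/8 = 0.375 >= 0
samples = [f(-10.0 + 0.01 * k) for k in range(2001)]
```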
Lemma 2
Given that the upper- and lower-bound parameters respect the labeling Ω, and b₁a₂/b₂ > a₁b₂/a₂, we have V(a₁b₂/a₂) ≥ V(b₁a₂/b₂) (at each of these two points, the two meeting pieces of the total-volume function agree).

Proof It is easy to check that the meeting pieces agree at a₁b₂/a₂ and at b₁a₂/b₂. Furthermore,

V(a₁b₂/a₂) − V(b₁a₂/b₂) = (b − a)(b − a)(b a − a b)(b₁a₂ − a₁b₂)(b b − a a)/(3 a b) ≥ 0,

as required. ⊓⊔

Lemma 3
Given that the parameters satisfy the conditions Ω, and furthermore b₁a₂/b₂ ≤ a₁b₂/a₂, we have q ≥ b₁a₂/b₂.

Proof From the proof of Theorem 4, we know that the midpoint cannot be less than both a₁b₂/a₂ and b₁a₂/b₂; that is, the midpoint is at least min{ a₁b₂/a₂, b₁a₂/b₂ }. Because we saw in (10) that q is at least the midpoint, we also have

q ≥ min{ a₁b₂/a₂, b₁a₂/b₂ }.

Therefore, under the conditions of the lemma, q ≥ b₁a₂/b₂, as required. ⊓⊔

Lemma 4 Given that the parameters satisfy the conditions Ω, and furthermore b₁a₂/b₂ ≥ a₁b₂/a₂, we have q ≥ a₁b₂/a₂.

Proof We saw in the proof of Lemma 3 that q ≥ min{ a₁b₂/a₂, b₁a₂/b₂ }. Therefore, under the conditions of the lemma, q ≥ a₁b₂/a₂, as required. ⊓⊔

For completeness, we state and give proofs of two very simple lemmas (from [24]) that we used several times.
Lemma 5 (Lemma 10.1 in [24])
For all choices of parameters 0 ≤ aᵢ < bᵢ satisfying Ω, we have: b₁a₂ − a₁b₂ ≥ 0, b₂a₃ − a₂b₃ ≥ 0, and b₁a₃ − a₁b₃ ≥ 0.

Proof By Ω,

(b₃ − a₃)(b₁a₂ − a₁b₂) = b₁a₂b₃ + a₁b₂a₃ − a₁b₂b₃ − b₁a₂a₃ ≥ 0,

and because b₃ − a₃ > 0, we get b₁a₂ − a₁b₂ ≥ 0. Similarly, b₂a₃ − a₂b₃ ≥ 0, and combining these, b₁a₃ − a₁b₃ ≥ 0. ⊓⊔

Lemma 6 (Lemma 10.4 in [24])
Let
A, B, C, D ∈ ℝ with A ≥ B ≥ 0, C + D ≥ 0, and C ≥ 0. Then AC + BD ≥ 0.

Proof Because C ≥ 0 and A ≥ B, we have AC + BD ≥ BC + BD = B(C + D) ≥ 0. ⊓⊔
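As a sanity check of these two lemmas (our own, assuming the ratio reading of Ω given in the main text, a₁/b₁ ≤ a₂/b₂ ≤ a₃/b₃ with non-negative bounds):

```python
# Randomized sanity check (ours) of the two auxiliary lemmas, under the ratio
# reading of Omega from the main text: with 0 <= ai < bi and
# a1/b1 <= a2/b2 <= a3/b3, the quantities b1*a2 - a1*b2, b2*a3 - a2*b3 and
# b1*a3 - a1*b3 are non-negative; and A >= B >= 0, C + D >= 0, C >= 0
# imply A*C + B*D >= 0.
import random

random.seed(1)

def random_omega_box():
    """Sample 0 <= ai < bi whose ratios satisfy a1/b1 <= a2/b2 <= a3/b3."""
    pairs = sorted(
        [(random.uniform(0, 10), random.uniform(0.1, 10)) for _ in range(3)],
        key=lambda p: p[0] / (p[0] + p[1]),  # ratio ai/bi with bi = ai + width
    )
    a = [p[0] for p in pairs]
    b = [p[0] + p[1] for p in pairs]  # bi > ai by construction
    return a, b

lemma5_ok = True
for _ in range(1000):
    a, b = random_omega_box()
    if (b[0] * a[1] - a[0] * b[1] < -1e-9 or
            b[1] * a[2] - a[1] * b[2] < -1e-9 or
            b[0] * a[2] - a[0] * b[2] < -1e-9):
        lemma5_ok = False

lemma6_ok = True
for _ in range(1000):
    B_ = random.uniform(0, 5)
    A_ = B_ + random.uniform(0, 5)   # A >= B >= 0
    C_ = random.uniform(0, 5)        # C >= 0
    D_ = -C_ + random.uniform(0, 5)  # C + D >= 0
    if A_ * C_ + B_ * D_ < -1e-9:
        lemma6_ok = False
```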