A task-based approach to parallel parametric linear programming solving, and application to polyhedral computations
Camille Coti (LIPN, CNRS UMR 7030, Université Sorbonne Paris Nord, 99, avenue Jean-Baptiste Clément, F-93430 Villetaneuse, France), David Monniaux, and Hang Yu (VERIMAG, Univ. Grenoble Alpes, CNRS, Grenoble INP — Institute of Engineering Univ. Grenoble Alpes, F-38000 Grenoble, France)

October 1, 2020

Abstract
Parametric linear programming is a central operation for polyhedral computations, as well as in certain control applications. Here we propose a task-based scheme for parallelizing it, with quasi-linear speedup over large problems. This type of parallel application is challenging, because several tasks might be computing the same region. In this paper, we present the algorithm itself together with a parallel redundancy elimination algorithm, and conduct a thorough performance analysis.

Introduction

A convex polyhedron in dimension n is the solution set over Q^n (or, equivalently, R^n) of a system of inequalities (with integer or rational coefficients). Whether emptiness tests, inclusion tests, projection etc. are specified with real or rational variables, the results are the same: it is impossible to distinguish the reals from the rationals using first-order formulas in linear arithmetic. Since all the polyhedra we consider here are convex, we shall talk of polyhedra for short. There also exist integer polyhedra, defined over Z^n, but they are very different in many respects, and we shall not consider them here. Among the differences: it is possible to check in polynomial time whether a given list of inequalities defines an empty polyhedron over Q, but the same problem is NP-complete over Z.

Computations over polyhedra arise in low dimension (n = 2 and n = 3) for modeling physical objects, but here we are interested in higher dimensions. Polyhedra in higher dimension are typically used to enclose the set of reachable states of systems whose state can be expressed, at least partially, as a vector of reals or rationals. For instance, one can study the flow of ordinary differential equations, or more generally the trajectories of hybrid systems, by enclosing the trajectories into a succession of convex polyhedra.
If the internal state of a control system is expressed by a vector of n numeric variables, one may prove that this system never encounters a bad condition by exhibiting a polyhedron P such that the initial state belongs to P, no bad condition belongs to P, and all possible time steps of the control system leave P stable (that is, it is not possible to execute a step starting in P and ending outside of P).

One generally proves the correctness of programs by exhibiting inductive invariants — an inductive invariant for a loop is a set containing the precondition of the loop, stable by moving to the next loop iteration, and implying the desired postcondition. Since providing inductive invariants by hand is tedious, it is desirable to automate that process. One approach for doing so is abstract interpretation, where an ascending sequence of sets of states is computed until reaching a fixed point. One may for instance search for such sets as products of intervals, an approach known as interval analysis, but the lack of relationships between the dimensions tends to severely limit the kind of properties that can be proved (e.g., one cannot have an invariant i < n where n is a parameter: this is neither an interval on i nor on n nor a combination thereof). Cousot and Halbwachs proposed searching for polyhedral inductive invariants [10, 16]. The operations needed there are projection, more generally image by an affine transformation, convex hull, and inclusion (or equality) test, together with an extrapolation operator known as widening.

Such usages of convex polyhedra suffered from the curse of dimensionality: complexity grows quickly with the number of dimensions — the number of numeric variables in the program or hybrid system under analysis. This problem is exacerbated in approaches that double the number of variables to represent pairs of states, or add extra variables to represent nonlinear terms (e.g., a variable v_xy is added to represent xy [27]).
This led to either restricting polyhedra to subclasses with lower computational costs (e.g., restricting inequalities to ±x_i ± x_j ≤ C as in the octagons [28]), or severely restricting the number of dimensions of the system under consideration (e.g., by focusing on a subset of interest of the program variables, disregarding relationships with variables outside of that subset).

Why this curse of dimensionality? There exist multiple libraries for computing over polyhedra (NewPolka, Parma Polyhedra Library, PolyLib, CDD, ...). (Jeannet's NewPolka is now part of Apron, http://apron.cri.ensmp.fr/library/ [18], [4]; PolyLib is available at https://icps.u-strasbg.fr/PolyLib/.) They all use the double description [29] of convex polyhedra: both as constraints (inequalities or equalities) and as generators (vertices, and, for unbounded polyhedra, lines and rays). Some operations are indeed simpler on one description than on the other, and Chernikova's algorithm [24] converts between the two. Having both descriptions can however blow up in size: consider for instance a product of intervals [l_1, h_1] × ··· × [l_n, h_n]. Such a polyhedron has 2n constraints (l_i ≤ x_i and x_i ≤ h_i for 1 ≤ i ≤ n) but 2^n vertices (each x_i is independently chosen in {l_i, h_i}). The double description then blows up. One workaround for the above case is to detect that the polyhedron is exactly a Cartesian product of simpler polyhedra, and to compute as locally as possible in terms of that product [17], but this fails if the polyhedron is almost a Cartesian product. We thus chose to completely do away with the generator representation and use constraints only.

The question was how to compute over convex polyhedra described by constraints only. It is possible to reduce most operations (image, convex hull etc.) to projection. An algorithm for projecting polyhedra described by inequalities only, Fourier–Motzkin elimination [22], has long been known.
Unfortunately it tends to generate a very large number of redundant constraints, which must be eliminated using an expensive procedure.

We instead turned to parametric linear programming. Image, projection, and convex hull can all be formulated as solutions to linear programs where parameters occur linearly within the objective [25, 5]: given a system of equalities and inequalities defining a convex polyhedron P in higher dimension, and a parametric bilinear objective function f(x, λ), where x is the point and λ a vector of parameters, give for each λ a vertex x* of P such that f(x*, λ) is minimal. A solution to such a program is a quasi-partition of the space of parameters λ into convex polyhedra, with one optimum associated to each polyhedron. The issue is how to compute this solution efficiently. In this article, we describe how we parallelized our algorithm for doing so.

In addition to computations on convex polyhedra, parametric linear programming is also used for control applications [20]: instead of using a solver, whose computation times are high and hard to predict, inside the control loop, the solution of the linear program is tabulated as a piecewise linear function over the values of the parameters. Parametric linear programming is also used for affine linear approximations of nonlinear expressions [27].

Parallelizing this type of application seems straightforward at first sight, since each polyhedron can be computed independently from the other ones. However, it is actually challenging, because several computation units (threads or processes) might be computing the same region at the same time. In this paper, we present a parallel redundancy elimination algorithm that improves performance by eliminating redundant computations between concurrent threads, and conduct a thorough experimental study of the task-based parallel scheme presented briefly in [8].
By polyhedral duality, which exchanges constraints and generators, the worst case for generators translates into a worst case for constraints. The crucial point is that the worst case for generators occurs very naturally in the analysis of programs or hybrid systems.

Related works
Most libraries for computing over convex polyhedra are based on the double description approach: a polyhedron is described both as the convex hull of its vertices (and, in the case of unbounded polyhedra, rays and/or lines), and as the solution set of a system of equalities and inequalities. They convert from one description to the other using Chernikova's algorithm [24], which computes a set of generators (vertices, rays, lines) from a set of (in)equalities (and, dually, the converse) by considering each (in)equality in a sequence and computing the intersection of the polyhedron defined by the previous (in)equalities in the sequence with the current one. To the best of our knowledge, there is no parallel version of Chernikova's algorithm, and we fail to see how to parallelize its main loop. It may be possible to parallelize the inner loops that compute the generators of the intersection of a polyhedron P and an (in)equality C given the generators of P. An alternative to Chernikova is the reverse search vertex enumeration algorithm [2].

We also opted out of the double description because it is difficult to independently verify that a polyhedron described by generators includes the polyhedron that should have been computed. Such verifiability is desirable for certain applications: when one computes a polyhedron meant to include all reachable states of a system, to prove that no undesirable state can be reached, it would be catastrophic for this polyhedron to exclude some reachable state due to a bug in the library. Our Verified Polyhedron Library (VPL) [1, 25, 26, 27, 11, 12, 13] provides, in addition to core computations, an optional layer, formally proved correct within the Coq proof assistant, that performs this verification.

VPL implements a constraint-only description (equalities and inequalities) for polyhedra, using two generations of algorithms. The first generation maps all operations, including convex hull, to projection, performed by Fourier–Motzkin elimination [22, 14].
The second generation maps all operations to parametric linear programming, performed as in Algorithm 1 except that no floating-point solver is used, just an exact-precision implementation of the simplex algorithm. Furthermore, VPL is implemented in OCaml, which does not currently allow running computations in multiple threads, and its data structures were designed for compatibility with Coq. For all these reasons, VPL (https://github.com/VERIMAG-Polyhedra/VPL) has lower performance than the C++ implementation described in this paper, even with one thread.

Independently verifying a generator description is indeed hard: it is co-NP-hard to check that, given the description of a polyhedron A by constraints and a polyhedron B described by generators, A is included in B [15]. Furthermore, the vertex enumeration problem — checking whether, given a polyhedron P described by a list of constraints and a set of vertices V of that polyhedron, P has another vertex not in V — is NP-hard for unbounded polyhedra [23]. As of 2008, it was unknown whether the same problem for bounded polyhedra, or, equivalently, that of generator enumeration (V may contain rays and lines) for unbounded polyhedra, was also NP-complete (it is known to be in NP). No progress seems to have been made on this front since then. Enumeration can be done in polynomial time for simple polyhedra, those for which a vertex corresponds to exactly one basis [2] (no degeneracy); note that degeneracy is also a major source of complication in our algorithms.

Our exploration of the regions is related to reverse search [3] (vertex enumeration is the case where there are as many independent parameters as there are variables, so that the optimization direction can point in any direction). A difference of our approach with reverse search is that we store the nodes already traversed in a central location, which they do not have to do.
Jones and Maciejowski [21] applied reverse search to parametric linear programming; while their approach has better asymptotic complexity than other approaches, they warn that the constant hidden in the big-O notation is huge, and that their approach is likely to be interesting only on larger examples. In contrast, we base ourselves on an approach already used in a sequential library that is competitive with double description approaches even on problems in moderate dimension [5].

We shall leave out here how polyhedral computations such as projection and convex hull can be reduced to parametric linear programming — this is covered in the literature [19, 1] — and focus on solving the parametric linear programs.
A linear program with n unknowns is defined by a system of equations AX = B, where A is an m × n matrix; a solution is a vector X such that X ≥ 0 and AX = B. The program is said to be feasible if it has at least one solution, infeasible otherwise. In a non-parametric linear program one considers an objective C: one wants the solution that maximizes C^T X. The program is deemed unbounded if it is feasible yet has no such optimal solution.

Example. Consider the polygon P defined by x_1 ≥ 0, x_2 ≥ 0, 3x_1 − x_2 ≤ 6, −x_1 + 3x_2 ≤ 6. Define the slack variables x_3 = 6 − 3x_1 + x_2 and x_4 = 6 + x_1 − 3x_2. Let X = (x_1, x_2, x_3, x_4); then P is the projection onto the first two coordinates of the solution set of AX = B ∧ X ≥ 0, where

    A = [  3  −1  1  0 ]      B = [ 6 ]
        [ −1   3  0  1 ]          [ 6 ]

A linear programming solver takes as input (A, B, C) and outputs "infeasible", "unbounded", or an optimal solution X*. Some linear programming solvers output X* as exact rational numbers and ensure that the answer is correct. Most, however, operate fully on floating-point numbers, and their final answer may be incorrect: they may answer "infeasible" whereas the problem is feasible, or propose "optimal solutions" that are not solutions, or that are not optimal. (There exist more general formulations of linear programming with upper and/or lower bound constraints on each coordinate of X; our approach generalizes to them.)
In addition to an exact X* or floating-point X̃* output, the solvers also provide the discrete information of the optimal basis: a solution is obtained by the simplex algorithm by setting n − m coordinates of X* to 0 (known as nonbasic variables) and solving for the other coordinates (known as basic variables) using AX* = B, and the solver provides the partition into basic and nonbasic variables it used. If a floating-point solver is used, it is possible to reconstruct an exact rational point X* using that information and a library for solving linear systems in rational arithmetic. One then checks whether it is truly a solution by checking X* ≥ 0, expresses the objective C^T X as Σ_{i∈N} α_i X_i, where N is the set of indices of the nonbasic variables, and concludes that the solution obtained by setting these nonbasic variables to 0 is maximal because all the α_i are nonpositive. If X* is not a solution of the problem (the condition X* ≥ 0 fails), or some α_i is positive, the floating-point result was incorrect and the program is solved again, this time in exact arithmetic.

Example. Assume the objective is C = [1 1 0 0], that is, C^T X = x_1 + x_2. From AX = B we deduce x_1 = 3 − (3/8)x_3 − (1/8)x_4 and x_2 = 3 − (1/8)x_3 − (3/8)x_4. Thus x_1 + x_2 = 6 − (1/2)x_3 − (1/2)x_4. Assume x_3 and x_4 are nonbasic variables and thus set to 0; then X* = (x_1, x_2, x_3, x_4) = (3, 3, 0, 0). Since X ≥ 0, changing the values of x_3 and x_4 can only decrease the objective o = 6 − (1/2)x_3 − (1/2)x_4. This expression of o in terms of the nonbasic variables can be obtained by linear algebra once the partition into basic and nonbasic variables is known.

If the linear programming solver uses floating-point arithmetic and is not to be trusted, it is still possible to reconstruct, by pure linear arithmetic, the expression of the objective function as a function of the nonbasic variables and check the signs of the coefficients.

While the optimal value C^T X*, if it exists, is unique for a given (A, B, C), there may exist several X* for it, a situation known as dual degeneracy.

Example. Assume the objective is C = [0 −1 0 0], that is, C^T X = −x_2. X* can be chosen to be (0, 0, 6, 6), (2, 0, 0, 8), or any point in between.

The same X* may be described by different bases, a situation known as primal degeneracy, happening when more than n − m coordinates of X* are zero, and thus some basic variables could be used as nonbasic and the converse.

Example. Consider the regular polygon B with the k ≥ 3 vertices (cos(2iπ/k), sin(2iπ/k), 0) for 0 ≤ i < k. The convex hull of B and T = (0, 0, 1) is a pyramid with k + 1 faces (e.g., k = 4 is a square pyramid), defined using 3 + k unknowns and k equations (e.g., for k = 4, x_4 = 1 − x_1 − x_2 − x_3, x_5 = 1 + x_1 − x_2 − x_3, x_6 = 1 − x_1 + x_2 − x_3, x_7 = 1 + x_1 + x_2 − x_3), plus x_4, ..., x_{k+3} ≥ 0. The apex T of the pyramid corresponds to (0, 0, 1, 0, ..., 0), with any three of x_4, ..., x_{k+3} as nonbasic: there are C(k,3) bases defining it (Fig. 1).

Figure 1: A pyramid based on an octagon; the apex T is described by C(8,3) = 56 bases.

For a parametric linear program, we replace the constant vector C by C_0 + Σ_{i=1}^{k} μ_i C_i, where the μ_i are parameters. When the μ_i change, the optimum X* changes. Assume temporarily that there is no degeneracy. Then, for given values of the μ_i, the problem is either unbounded, or there is one single optimal solution X*. It can be shown that the region of the (μ_1, ..., μ_k) associated to a given optimum X* is a convex polyhedron (for C_0 = 0, a convex polyhedral cone), and that these regions form a quasi-partition of the space of parameters (two regions may overlap at their boundary, but not in their interior) [19, 20, 1]. The output of the parametric linear programming solver is this quasi-partition and the associated optima — in our applications, the problem is always bounded in the optimization directions, so we do not deal with the unbounded case. (There exists another, dual, kind of parametric linear programming, where the parameters are in the right-hand side B; we do not consider it here.)

Let us see in more detail how to compute these regions. We wish to attach to each basis (at least, to each basis that is optimal for at least one vector of parameters) the region of parameters for which it is optimal.

Example. Instead of C = [1 1 0 0] we consider the parametric C = [μ_1 μ_2 0 0]. Let us now express o = C^T X as a function of the nonbasic variables x_3 and x_4:

    o = (3μ_1 + 3μ_2) + (−(3/8)μ_1 − (1/8)μ_2) x_3 + (−(1/8)μ_1 − (3/8)μ_2) x_4    (1)

The coefficients of x_3 and x_4 are nonpositive if and only if 3μ_1 + μ_2 ≥ 0 and μ_1 + 3μ_2 ≥ 0, which defines the cone of optimality associated to that basis and to the optimum X* = (3, 3, 0, 0).

Figure 2: R_i and R_j are not adjacent. The intermediate region is missed.

The redundancy elimination procedures we apply to the constraint description of a region R also provide a set of vectors outside of R: for each constraint in the description, they provide a vector violating it [26], a feature that will be useful.

Assume now we have solved the optimization problem for a value C(D_1) of the optimization direction, for a vector of parameters D_1, and obtained a region R_1 in the parameters (of course, D_1 ∈ R_1). We now pick D_2 ∉ R_1 — if the redundancy elimination procedure provided us with a set of vectors outside of R_1, we can store them in a "working set" to be processed and choose D_2 in it. We compute the region R_2 associated to D_2. Assume that R_1 and R_2 are adjacent, meaning that they have a common boundary. We get vectors outside of R_2 and add them to the working set. We pick D_3 in the working set, check that it is not covered by R_1 or R_2, and, if it is not, compute a region R_3, etc. This amounts to a traversal of the adjacency graph of the optimality regions. The algorithm terminates when the working set becomes empty, meaning the R_1, R_2, ... produced form the desired quasi-partition.

This simplistic algorithm might fail because it assumes that it is discovering the adjacency relation of the graph. The problem is that, if we move from a region R_i to a vector D_j ∉ R_i, it is not certain that the region R_j generated from D_j is adjacent to it, so we could miss some intermediate region (Fig. 2). In order to cope with this issue, we modify our traversal algorithm as follows. The working set contains pairs (R, D') where R is a region and D' ∉ R a vector (there is also a special value none for R). The region R' corresponding to D' is computed. If R and R' are not adjacent, then a vector D'' in between R and R' is computed, and the pair (R, D'') is added to the working set. This ensures that at the end, we obtain a quasi-partition.
An additional benefit is that we can obtain a spanning tree of the region graph, with edges from R to R', tracking which region led to which other one.

The last difficulty is degeneracy. So far we have assumed that each optimization direction corresponds to exactly one optimal vector X*, and that this optimal vector is described by exactly one basis. This is not the case in general; in that case, the interiors of the optimality regions may overlap. This situation results in a performance hit; also, the final result is no longer a quasi-partition, but instead just a covering of the parameter space (for each possible vector D of parameters, there is at least one region that covers it). Of course, the value of the objective function C(D)^T X* must be the same for all optimal vectors X* associated with the regions covering D. A covering suffices, however, for the correctness of our projection, convex hull etc. algorithms.

We are currently investigating approaches for getting rid of degeneracy — enforcing one optimal vector X* and only one optimal basis per vector D, except at the boundaries. The methods for doing so rely on lexicographic orderings or perturbations of the objective and/or the constant term, or on pivoting rules [20]. We have recently proposed a working solution to degeneracy [31], but there is still room for improvement.

Redundancy elimination

A polyhedron may be specified by a redundant system of inequalities, meaning that some inequalities can be discarded without changing the polyhedron.
The first step is to eliminate syntactic redundancies — constraints simplified into true or false, or subsumed by another: if the system contains two inequalities C^T X ≤ B_1 and C^T X ≤ B_2 where B_1 ≤ B_2, then the latter is to be discarded (note that this involves putting the vector of rationals C in canonical form, flushing denominators and removing common factors, so that, e.g., 2x + 2y ≤ 2 and x + y ≤ 1 are recognized as equivalent).

Then, checking whether an inequality C_1 is redundant with respect to other inequalities C_2 ∧ ··· ∧ C_n boils down to finding a vector D such that D satisfies C_2 ∧ ··· ∧ C_n but not C_1: such a vector exists if and only if the inequality is irredundant. This is a pure satisfiability problem in linear programming (it uses a strict inequality ¬C_1, but this can be dealt with). Note that if an inequality is irredundant, a vector D not satisfying that inequality is provided: this is handy, since eliminate_redundancy(S) is to, in addition to removing constraints, provide a set of vectors each violating one constraint but not the others.

A sequential algorithm for removing redundant constraints thus considers each constraint in sequence, and tests its irredundancy with respect to all the other remaining constraints that have not been shown to be redundant yet. This can also be done in parallel, as in Algorithm 2. The only requirement is that the table marking which constraints have already been found to be redundant should support atomic accesses.

Algorithm 1 Sequential parametric linear programming solver.
float_lp(A, B, C) is an external procedure returning the optimal basis for maximizing C^T X subject to AX = B, X ≥ 0; it may provide incorrect results.
exact_lp returns the exact optimum and optimal basis.
midpoint(R, R', D'), where D' ∈ R', computes a vector in between regions R and R'.
exact_point computes the exact rational point X* corresponding to the basis.
exact_objective computes the objective function as a bilinear function of the parameters and the nonbasic variables (the output is a matrix).
sign_conditions translates it into sign conditions on the parameters, defining a cone.
eliminate_redundancy(S) returns (R, D_next), where R is an irredundant set of inequalities defining the same cone as S and D_next are vectors outside of that cone, each violating one different inequality in R but not the others.

procedure PLP(A, B, C)
  pick any nonzero vector of parameters D
  W ← {(none, D)}
  regions ← ∅
  while W ≠ ∅ do
    pick (R_from, D) in W and remove it from W
    R_cov ← is_covered(D, regions)
    if R_cov = none then
      basis ← float_lp(A, B, C(D))
      X* ← exact_point(basis)
      o ← exact_objective(basis)
      if ¬(X* ≥ 0 ∧ o ≤ 0) then
        (basis, X*) ← exact_lp(A, B, C(D))
      end if
      S ← sign_conditions(basis)
      R ← eliminate_redundancy(S)
      for each constraint i in R do
        D_next ← compute_next(R, i)
        W ← W ∪ {(R, D_next)}
      end for
      regions ← regions ∪ {(R, X*)}
      R_cov ← R
    end if
    if ¬are_adjacent(R_from, R_cov) then
      D' ← midpoint(R_from, R_cov, D)
      W ← W ∪ {(R_from, D')}
    end if
  end while
  return regions
end procedure

procedure is_covered(D, regions)
  for (R, X*) ∈ regions do
    if D covered by R then
      return R
    end if
  end for
  return none
end procedure

Algorithm 2 Parallel redundancy elimination.
is_redundant ← make_array(n, false)
for i ∈ 0 ... n − 1 do in parallel
  F ← {¬C_i}
  for j ∈ 0 ... n − 1 do
    atomic: r ← is_redundant[j]
    if j ≠ i ∧ ¬r then
      F ← F ∪ {C_j}
    end if
  end for
  if ¬check_sat(F) then
    atomic: is_redundant[i] ← true
  end if
end for
We applied the same parallelization scheme toour more refined “ray-tracing” redundancy elimination algorithm [26].We can use parallel redundancy elimination within parallel parametric linearprogramming. It helps in some cases where the coarse-grained parallelism of theparametric linear programming solver is limited due to the lack of tasks thatcan be executed in parallel, e.g., projections of polyhedra in smaller dimensionsand few faces in the result; otherwise it does not help and hampers tuning. We implemented two variants of task-based coarse-grained parallelism: one us-ing Intel’s Thread Building Blocks (TBB), the other using OpenMP tasks [30].Algorithm 1 boils down to executing tasks taken from a working set, whichcan themselves spawn new tasks. In addition to the working set, it uses a shareddata structure: the set regions of regions already seen, used: i) for checkingwhether a vector D belongs to a region already covered ( is covered ); ii) forchecking adjacency of regions; iii) for appending new regions found.We implemented it as a concurrent extensible array, with a “push at end” op-eration, random read accesses, and iteration: either tbb::concurrent_vector ,or our simple lock-free implementation based on an array with a large, stati-cally defined maximal capacity, using atomic operations for increasing the cur-rent number n fill of items; the latter has the advantage of not needing TBB.There is a little subtlety involved here, in both implementations: n fill is incre-mented atomically before the item is pushed, so that if two items are pushedconcurrently, they are pushed to different slots. However, not all items in slots0 . . . n fill − n ready ≤ n fill , such thatall slots 0 . . . n ready − Algorithm 3
Algorithm 3 Concurrent push on the shared region structure.
procedure push_region(R)
  atomic (i ← n_fill; n_fill ← n_fill + 1)
  regions[i] ← R
  while n_ready < i do
    skip    ▷ possibly use a condition variable instead of spinning
  end while    ▷ at this point n_ready = i
  atomic: n_ready ← i + 1
end procedure

The working set is managed differently depending on whether we use TBB or OpenMP. For TBB, we use tbb::parallel_do, which dynamically schedules tasks from the working set on a number of threads and allows dynamically adding tasks to the working set. For OpenMP, when we run a task, we collect all the tasks (
R, D) that it generates for spawning, and we spawn them at the end.

The resulting implementation, however, had disappointing performance. In fact, we obtained better performance using a naive OpenMP implementation that collected all tasks to spawn, spawned them, and waited for them to complete, using a barrier, before the next round of spawning!

We identified the reason. It was frequent that the working set contained two tasks (R_1, D_1) and (R_2, D_2) such that the regions generated from D_1 and D_2 were the same. In that case, there were two tasks that each solved a linear program, found the same basis, reconstructed exactly the solution and the parametric objective function in that basis, etc. The workaround was to add a hash table (either a tbb::concurrent_unordered_set or a normal hash table protected by a mutex) that stores the set of bases (each basis being identified by the ordered set of its nonbasic variables) that have been processed or are currently under processing. A task aborts after solving the floating-point linear program if it finds a basis identical to one already in the table.

The overall algorithm is simple: create an initial task (none, D) and then run the tasks over the available threads as long as there are uncompleted tasks (Algorithm 4). The number of tasks running to completion (not aborted early due to a test) is the same as the number of generated regions. Thus, if, geometrically, the problem does not have enough regions in comparison to the number of available threads of execution, its parallelism is intrinsically limited.

The is_covered(D, regions) loop could easily be parallelized as well. We opted against it, as it would introduce a difficult-to-tune second level of parallelism.

Figure 3 presents the tasks generated by the sequential and the parallel algorithms on a real example, and their dependencies. Figure 3a shows how the sequential algorithm handles tasks: as long as the tasks generate subtasks, these subtasks are computed.
For instance, task 10 has a long series of descendants that are computed sequentially. Figure 3b shows how the tasks are handled in parallel: after an initial task 0 that generates a set of subtasks (tasks 1 to 9), these subtasks are computed in parallel, as well as the subtasks they generate.

This figure also illustrates the parallelism extracted from the computation, on a polyhedron involving 29 constraints and 16 variables. In particular, we can see in Fig. 3b the number of parallel tasks and their dependencies. For instance, task 5 is generated by task 0 and generates tasks 15 and 16. We can see that, at the beginning of the computation, task 0 generates 7 tasks (i.e., regions to compute); hence 7 cores will be used by the parallel algorithm. Since the computation time varies a lot between tasks, the second level of the tree is not necessarily executed at the same time.

(a) Task graph with 1 thread
(b) Task graph with multiple threads
Figure 3: Generation graph of the regions from one typical polyhedron, computed with 1 thread and with 30 threads. The region graphs, depending on overlaps etc., are different; the numbers in the two trees have no relationship.
We implemented our parallel algorithms in C++, with three alternate schemes selectable at compile time: no parallelism, OpenMP parallelism, or TBB. We use:
Eigen for floating-point matrix computations outside of linear programming. Eigen supports internal OpenMP parallelism; we disabled it, since these computations take very little time in our overall execution and using it would have entailed tuning for two levels of parallelism. We used Eigen 3.3.4.
(a) 2 dimensions projected. (b) 5 dimensions projected. (Each panel plots execution time in seconds and speedup against the number of threads.)
Figure 4: Computation time and speedup for 20 projections of polyhedra with 9 constraints (none redundant) and 16 variables, on Paranoia (20 hyperthreaded cores, OpenMP). Each parametric linear program has 2–36 regions.
GLPK for floating-point linear programming. This library was not thread-safe, meaning it was impossible to solve linear programs in multiple threads at the same time. We suggested to the maintainer making one global variable thread-local, which solved the problem and was made the default as of GLPK version 4.61; we used 4.63. GLPK has no internal parallelism.
Flint for computations on rationals and rational matrices. This library is designed to be thread-safe, but does not use parallelism by itself, at least for the operations that we use. We used Flint 2.5.2.

All the benchmarks were run on the Paranoia cluster of Grid'5000 [7] and on a server called Pressembois. Paranoia features 8 nodes, each equipped with 2 Intel Xeon E5-2660v2 CPUs (10 cores or 20 threads/CPU, 40 threads per node) and 128 GB of RAM. Although the network was not used for these experiments, each node has two 10 Gbps and one 1 Gbps Ethernet NICs. The code was compiled using GCC 6.3.1 and OpenMP 4.5 (201511). The nodes run a Linux Debian Stretch environment with a 4.9.0 kernel. Pressembois features 2 Intel Xeon Gold 6138 CPUs (20 cores or 40 threads/CPU, 80 threads per node) and 192 GB of RAM. It runs a 4.9 Linux kernel, and we used GCC 6.3 to compile the code. Every experiment was run 10 times, and the plots presented in this section provide the average and standard deviation. Paranoia was used for the OpenMP experiments, whereas Pressembois was used for the TBB experiments.

We evaluated our parallel parametric linear programming implementation by using it to project polyhedra. This is a very fundamental operation on polyhedra, since it is used for computing forward images (image of a polyhedron by an affine linear transformation), and may also be used for computing convex hulls, although there exists a more direct approach for that.

We used a set of typical polyhedra, with different characteristics: numbers of dimensions and of constraints, sparsity, number of dimensions to be projected. Here we present a subset of these benchmarks. Each benchmark comprises 50 to 100 polyhedra. Our polyhedra were randomly generated.
The reason is that it is difficult to obtain polyhedra typically used for the target application (static analysis): because libraries based on the dual description approach behave exponentially with respect to the dimension, static analysers are typically designed to keep the dimension of the polyhedra low, at the expense of analysis precision.

On problems that have only a few regions, not enough parallelism can be extracted to exploit all the cores of the machine. For instance, Figure 4 presents two experiments on 2 to 36 regions using the OpenMP version. It gives an acceptable speedup on a few cores (up to 10), then the computation does not generate enough tasks to keep the additional cores busy.

As expected, when the solution has a large number of regions, computation scales better. Figure 5 presents the performance obtained on polyhedra made of 24 constraints, involving 8 to 764 regions and using the OpenMP version. The speedup is sublinear. This is likely due to the synchronizations when tasks are created (lookup in the hash table), some contention on shared data structures, and task management.

On larger polyhedra, with 120 constraints and 50 variables, we can see however that the speedup is close to linear with OpenMP as well as with TBB (Figure 8).

The speedup is good for larger problems with many regions (which can be computed independently from each other): the number of tasks is larger than the number of available cores, hence allowing an efficient parallel computation (Fig. 6). When the problem does not have enough regions to allow the algorithm to extract enough parallelism to use all the available cores, the speedup is bounded by how many independent tasks are generated. We have seen an example in Figure 3, where at the beginning of the computation 7 cores at most can be used in parallel. Hence, a plateau occurs when the problems do not have enough regions (Fig. 9); the limited width of the region graph then limits parallelism.
We have successfully parallelized computations over convex polyhedra represented as constraints, most importantly parametric linear programming. Speedups are particularly satisfactory for the most complex cases: polyhedra with many constraints in higher dimension.

There is a chicken-and-egg problem there: static analysis tools do not use general convex polyhedra, or use them only in low dimension, due to the cost of the dual description in higher dimension; thus there is little incentive to develop libraries more efficient in higher dimension. Designers of such libraries then lack higher-dimension examples from static analysis.

(a) 2 dimensions projected, 4 redundant constraints
(b) 5 dimensions projected, 4 redundant constraints
(c) 2 dimensions projected, 12 redundant constraints
(d) 5 dimensions projected, 12 redundant constraints
Figure 5: Computation time and speedup for different numbers of projections of polyhedra with 10 variables and variable numbers of redundant constraints, on Paranoia (20 hyperthreaded cores, OpenMP). Each parametric linear program has 8–764 regions.

The main cause of inefficiency in our current implementation is geometrical degeneracy, which causes overlapping regions. Several approaches are being studied in this respect, all based on enforcing that, for given parameters, there should be only one optimal basis.

The floating-point simplex algorithm is restarted from scratch in each task. It could be more efficient to store in the task structure the last basis used, or even the full simplex tableau, to start solving from that basis instead of the default initial one. The intuition is that neighboring regions are likely to have similar bases.

For some of our applications, such as polyhedral projection, there cannot be several optimal vertices for the same parameters, except at region boundaries, so that source of degeneracy is not present. There however remains the other source, that is, several bases for the same vertex.
Figure 6: Computation time and speedup for 50 projections of polyhedra in dimension 120, on Paranoia (20 hyperthreaded cores, OpenMP). Each parametric linear program has 3460–3715 regions.
Figure 7: Computation time and speedup for 50 projections of polyhedra in dimension 100, on Pressembois (40 hyperthreaded cores, TBB).

We have presented an approach to handle the heterogeneous, non-predictable computation times and the lack of predictability of the domain decomposition. One major challenge of this problem is that each subtask covers an area of the domain, but the size and shape of this area is not known before the task is actually computed. In addition, precautions must be taken to avoid computing the same area multiple times in multiple tasks.

Checking whether a vector belongs to a region that has already been processed is currently implemented following a very simplistic approach: the vector is searched for in every region. A binary space partitioning region storage could be used instead: build a tree with nodes adorned by hyperplanes, all regions wholly on one side of the hyperplane under the first child, all regions wholly on the other side under the second child, all regions straddling the hyperplane in both branches, and so recursively in the branches. Then a region covering a vector is searched for by testing, at each node, on which side of the hyperplane the vector lies, and entering the corresponding branch. An appropriate locking scheme would however have to be designed, hence introducing synchronizations and requiring more collaboration from the operating system.

Our coarse-grained parallelism for parametric linear programming depends highly on the geometry of the problem. If there are too few regions, too few tasks will keep the cores busy and little speedup will be achieved. One could use two levels of parallelism, with parallel linear programming, parallel exact reconstruction, etc., in each task, using parallel matrix computations. One possibility would be to parallelize floating-point linear programming and exact matrix computations, both handled by external libraries.
These phases are based on matrix operations (either floating-point, rational, or integer modular), so the usual parallelization schemes for matrix computations should apply. One should keep in mind, however, that our matrices are much smaller than the
(a) OpenMP on Paranoia
(b) TBB on Pressembois
Figure 8: 120 constraints, 50 variables, 1 dimension projected, 3459–3718 regions.

ones typically used in high performance computing applications. One option could be to replace GLPK by another library based on BLAS, such as Coin-OR LP, and use a parallel BLAS implementation such as GotoBLAS or Intel MKL, using an adaptive, hierarchical parallelism, as mentioned in [9]. With respect to Flint, we attempted to run certain internal matrix computation loops in parallel, but gave up due to crashes; perhaps this could be done with more careful examination of shared data structures or the collaboration of the library maintainer.

While our approach based on (in)equalities only does not blow up exponentially when the polyhedron is a Cartesian product of simple polyhedra, as opposed to the double description, it would likely benefit from being applied separately to the terms of a Cartesian product as opposed to the full product, perhaps in parallel. Some computations are performed over dense matrices, and, even if sparseness is exploited, it is easier to exploit it at a coarse level. We think of combining our approach with a Cartesian decomposition [17].
Acknowledgements
Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see ).

References

[1] Alexandre Maréchal, David Monniaux, and Michaël Périn. "Scalable minimizing-operators on polyhedra via parametric linear programming". In:
Static
Figure 9: Computation time and speedup for smaller, simpler polyhedra: 16 projections in dimension 29, on Paranoia (20 cores, OpenMP).

analysis (SAS). Ed. by Francesco Ranzato. Springer, 2017. hal: hal-01555998.

[2] David Avis. "A revised implementation of the reverse search vertex enumeration algorithm". In: Polytopes — Combinatorics and Computation. Ed. by Gil Kalai and Günter M. Ziegler. Vol. 29. DMV Seminar. Deutsche Mathematiker-Vereinigung. Birkhäuser, 2000.

[3] David Avis and Charles Jordan. "mplrs: A scalable parallel vertex/facet enumeration code". In: Math. Program. Comput.

[4] Roberto Bagnara, Patricia M. Hill, and Enea Zaffanella. "The Parma Polyhedra Library: toward a complete set of numerical abstractions for the analysis and verification of hardware and software systems". In: Science of Computer Programming. arXiv: cs/0612085.

[5] Sylvain Boulmé et al. "The Verified Polyhedron Library: an overview". Ed. by Erika Ábrahám et al. IEEE Computer Society, 2019, pp. 9–17.

[6] David Cantor, Basil Gordon, and Bruce L. Rothschild, eds. Selected papers of Theodore S. Motzkin. Birkhäuser, 1983.

[7] Franck Cappello et al. "Grid'5000: A large scale and highly reconfigurable grid experimental testbed". In: Proceedings of the 6th IEEE/ACM International Workshop on Grid Computing (SC'05). IEEE/ACM. Seattle, Washington, USA, Nov. 2005, pp. 99–106.

[8] Camille Coti, David Monniaux, and Hang Yu. "Parallel parametric linear programming solving, and application to polyhedral computations". In:
International conference on computational science (ICCS). Springer, 2019, pp. 566–572. hal: hal-02097321.

[9] Camille Coti et al. "Solving 0-1 quadratic problems with two-level parallelization of the BiqCrunch solver". IEEE, 2017, pp. 445–452.

[10] Patrick Cousot and Nicolas Halbwachs. "Automatic discovery of linear restraints among variables of a program". In: SIGACT-SIGPLAN Symposium on Principles of programming languages (POPL). Ed. by Alfred V. Aho, Stephen N. Zilles, and Thomas G. Szymanski. ACM Press, 1978, pp. 84–96.

[11] Alexis Fouilhé. "Revisiting the abstract domain of polyhedra: constraints-only representation and formal proof". PhD thesis. Saint Martin d'Hères, France: Université Grenoble Alpes, 2015. tel: tel-01286086.

[12] Alexis Fouilhé and Sylvain Boulmé. "A certifying frontend for (sub)polyhedral abstract domains". In: Verified software: theories, tools and experiments (VSTTE). Vol. 8471. Lecture Notes in Computer Science. Springer, 2014, pp. 200–215. hal: hal-00991853.

[13] Alexis Fouilhé, David Monniaux, and Michaël Périn. "Efficient generation of correctness certificates for the abstract domain of polyhedra". In: Static analysis (SAS). 2013, pp. 345–365. hal: hal-00806990.

[14] Joseph Fourier. "Histoire de l'Académie, partie mathématique (1824)". In: Mémoires de l'Académie des sciences de l'Institut de France. Vol. 7. Gauthier-Villars, 1827, xlvij–lv. Gallica: ark:/12148/bpt6k32227/f53.

[15] Robert M. Freund and James B. Orlin. "On the complexity of four polyhedral set containment problems". In:
Math. Program.

[16] Nicolas Halbwachs. "Détermination automatique de relations linéaires vérifiées par les variables d'un programme". French. PhD thesis. Grenoble, France: Université Scientifique et Médicale de Grenoble & Institut National Polytechnique de Grenoble, Mar. 1979. hal: tel-00288805. url: https://tel.archives-ouvertes.fr/tel-00288805.

[17] Nicolas Halbwachs, David Merchat, and Laure Gonnord. "Some ways to reduce the space dimension in polyhedra computations". In: Formal Methods in System Design. hal: hal-00189633.

[18] Bertrand Jeannet and Antoine Miné. "Apron: a library of numerical abstract domains for static analysis". In: Computer aided verification (CAV). Ed. by Ahmed Bouajjani and Oded Maler. Vol. 5643. Lecture Notes in Computer Science. Springer, 2009, pp. 661–667. hal: hal-00786354.

[19] Colin N. Jones et al. "On polyhedral projections and parametric programming". In: J. Optimization Theory and Applications.

[20] Colin N. Jones, Eric C. Kerrigan, and Jan M. Maciejowski. "Lexicographic perturbation for multiparametric linear programming with applications to control". In: Automatica (43 2007).

[21] Colin N. Jones and Jan M. Maciejowski. "Reverse Search for Parametric Linear Programming". In: Proceedings of the 45th IEEE Conference on Decision and Control. IEEE, Dec. 2006, pp. 1504–1509. doi:
10.1109/CDC.2006.377799.

[22] Leonid Khachiyan. "Fourier-Motzkin elimination method". In:
Encyclopedia of Optimization. Ed. by Christodoulos A. Floudas and Panos M. Pardalos. 2nd ed. Springer, 2009, pp. 1074–1076.

[23] Leonid Khachiyan et al. "Generating all vertices of a polyhedron is hard". In: Discrete & Computational Geometry, pp. 174–190. http://archive.dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2005/2005-21.pdf.

[24] Hervé Le Verge. A note on Chernikova's Algorithm. Tech. rep. 635. Rennes, France: IRISA, 1992. hal: inria-00074895.

[25] Alexandre Maréchal. "New algorithmics for polyhedral calculus via parametric linear programming (Nouvelle algorithmique pour le calcul polyédral via programmation linéaire paramétrique)". PhD thesis. Saint Martin d'Hères, France: Université Grenoble Alpes, 2017. tel: tel-01695086.

[26] Alexandre Maréchal and Michaël Périn. "Efficient elimination of redundancies in polyhedra by raytracing". In: Verification, model checking, and abstract interpretation (VMCAI). Ed. by Ahmed Bouajjani and David Monniaux. Vol. 10145. Lecture Notes in Computer Science. Springer, 2017, pp. 367–385. hal: hal-01385653.

[27] Alexandre Maréchal et al. "Polyhedral approximation of multivariate polynomials using Handelman's theorem". In: Verification, model checking, and abstract interpretation (VMCAI). Ed. by Barbara Jobstmann and K. Rustan M. Leino. Vol. 9583. Lecture Notes in Computer Science. Springer, Jan. 2016, pp. 166–184. hal: hal-01223362.

[28] Antoine Miné. "The octagon abstract domain". In: Higher-Order and Symbolic Computation. doi:
10.1007/s10990-006-8609-1. hal: hal-00136639.

[29] Theodore S. Motzkin et al. "The double description method". In: Contributions to the theory of games, vol. II. Ed. by Harold W. Kuhn and Albert W. Tucker. Vol. 28. Annals of Mathematics Studies. Reprinted as [6, Ch. 2]. Princeton University Press, 1953, pp. 51–74.

[30] OpenMP Application Programming Interface. 4.5. OpenMP Architecture Review Board. Nov. 2015.

[31] Hang Yu and David Monniaux. "An efficient parametric linear programming solver and application to polyhedral projection". In: Static analysis (SAS). Ed. by Bor-Yuh Evan Chang. Vol. 11822. Lecture Notes in Computer Science. Springer, 2019, pp. 203–224. url: https://doi.org/10.1007/978-3-030-32304-2_11.

Algorithm 4 Task for parallel linear programming solver.
push_tasks adds new tasks to those to be processed; its implementation is different under TBB and OpenMP.
test_and_insert(T, x) checks whether x already belongs to the hash table T, in which case it returns true; otherwise it adds it and returns false. This operation is atomic.

procedure process_task((R_from, D))
    R_cov ← is_covered(D, regions)
    if R_cov = none then
        basis ← float_lp(A, B, C(D))
        if ¬ test_and_insert(bases, basis) then
            X* ← exact_point(basis)
            o ← exact_objective(basis)
            if ¬(X* ≥ 0 ∧ o ≤ 0) then
                (basis, X*) ← exact_lp(A, B, C(D))
            end if
            S ← sign_conditions(basis)
            R ← eliminate_redundancy(S)
            for each constraint i in R do
                D_next ← compute_next(R, i)
                push_tasks(D_next)
            end for
            push_region(R, X*)
            R_cov ← R
        end if
    end if
    if ¬ are_adjacent(R_from, R_cov) then
        D′ ← midpoint(R_from, R_cov, D)
        W ← W ∪ {(R_from, D′)}
    end if
end procedure

procedure is_covered(D, regions)
    for i ∈ 0 … n_ready − 1 do    ▷ n_ready is to be read at every loop iteration
        (R, X*) ← regions[i]
        if D covered by R then
            return R
        end if
    end for
    return none
end procedure