Scenario Aggregation using Binary Decision Diagrams for Stochastic Programs with Endogenous Uncertainty
aa r X i v : . [ m a t h . O C ] J a n Scenario Aggregation using Binary Decision Diagrams forStochastic Programs with Endogenous Uncertainty
Utz-Uwe Haus · Carla Michini · MarcoLaumannsAbstract
Modeling decision-dependent scenario probabilities in stochastic pro-grams is difficult and typically leads to large and highly non-linear MINLPs thatare very difficult to solve. In this paper, we develop a new approach to obtain acompact representation of the recourse function using a set of binary decision dia-grams (BDDs) that encode a nested cover of the scenario set. The resulting BDDscan then be used to efficiently characterize the decision-dependent scenario prob-abilities by a set of linear inequalities, which essentially factorizes the probabilitydistribution and thus allows to reformulate the entire problem as a small mixed-integer linear program. The approach is applicable to a large class of stochasticprograms with multivariate binary scenario sets, such as stochastic network design,network reliability, or stochastic network interdiction problems. Computational re-sults show that the BDD-based scenario representation reduces the problem size,and hence the computation time, significant compared to previous approaches.
Keywords multistage stochastic optimization, exact reformulation, scenarioaggregation, reliability optimization
When modeling a stochastic optimization problem as a stochastic program, oneusually assumes that the scenario probabilities are fixed and given input the prob-lem. In this setup, decisions can influence the outcome in each scenario but nottheir probability of realizing. This standard way of reflecting the effect of uncer-tainty in the model has been referred to as exogenous uncertainty , while the term endogenous uncertainty has been introduced to describe situations where decisionscan actually influence the stochastic process itself and not only its outcomes [28].While endogenous uncertainty is straightforward to express in the frameworkof Markov decision processes, its use in stochastic programming has remainedvery rare, because the resulting models appear very cumbersome to formulate
Cray EMEA Research Lab, Cray Switzerland, Hochbergerstr. 60C, 4057 Basel Switzerland E-mail: [email protected] · Wisconsin Institute for Discovery, University of Wisconsin – Madison,330 North Orchard Street, Madison WI 53715, USA E-mail: [email protected] · IBM Research,8803 Rueschlikon, Switzerland, E-mail: [email protected] Utz-Uwe Haus et al. and notoriously hard to solve. [19, 20] distinguish two types of endogenous uncer-tainty, according to whether the decisions can influence the temporal unfoldingof events or the probability distribution. Models of the first type are usually for-mulated by enumerating the various scenario trees that correspond to the differ-ent possible sequencing of events and couple them via disjunctive formulations ofnon-anticipativity constraints (see e.g. [12]. Solution approaches include implicitenumeration schemes [28], branch-and-bound coupled with Lagrange duality [20],branch-and-cut [11], Lagrange relaxation of the non-anticipativity constraints [22],decomposition techniques [21] or heuristics [26].In this paper we focus on problems of the second type of endogenous uncer-tainty, where decisions can influence the probability distribution of the scenarios.The straightforward way to model this involves products or other non-linear ex-pressions of decision variables thus leads to highly non-linear models that are veryhard to solve [37, 29]. [1] as well as [17] have used convexification techniques todeal with these polynomials, while [37] have relied on linear approximations. [42]solve the non-linear stochastic program directly with a new constraint program-ming approach using new type of extreme resource constraint in combination withan efficient propagation algorithm.In order to deal with the usually exponentially large scenario sets, most ap-proaches had to resort to scenario sampling and work with a representative sub-set of scenarios. Recently [38] and [39] proposed an exact scenario bundling ap-proach, where scenarios that are equivalent with respect to the recourse functionare merged. In [42] scenarios are grouped according to the shortest surviving pathsin the underlying network.An important class of stochastic optimization problems that typically includeendogenous uncertainty comes from the related areas of stochastic network design,network reliability, and network interdiction. Common to these problems are anunderlying graph whose edges or nodes are subject to random failures. The deci-sions to be made are in order to change their failure (or survival) probabilities ofedges or nodes such that the resulting failure process has some favorable properties,such as connectivity, shortest path lengths or other performance indicators relatedto the function or process which the graph expresses. Such problems can naturallybe formulated as two-stage stochastic programs, where those tactical or “design”decisions constitute the first stage, the failure process is represented by a (usuallyexponentially) large set of scenarios, and the resulting performance of the networkin each failure scenario is captured by a recourse function (which might involveauxiliary second-stage decisions, e.g., for determining flows or shortest paths).We consider a general setting where scenarios can be formally described asrandom binary vectors. This holds in particular for stochastic network design andnetwork reliability problems described above. In many such cases, the recoursefunction is monotonous with respect to the scenario. Take for example the short-est path length between a given pair of nodes in a graph: if scenarios indicatea set of surviving edges after some disaster, then the shortest path length cannever increase when fewer edges survive. Our main idea is exploit this structurefor scenario aggregation, and our main contribution is a new method for scenarioaggregation by means of binary decision diagrams. The approach not only enablesa substantial reduction of the (initially exponentially large) scenario set withoutany accuracy loss, but also to efficiently characterize the decision-dependent sce-nario probabilities by a set of linear inequalities. This essentially factorizes the cenario Aggregation using BDDs 3 probability distribution and thus allows to reformulate the entire problem as asmall mixed-integer linear program.Our motivation and running example will be the shortest path problem hintedat above:
Example 1
Let G = ( V, E ) be a graph whose edges have some (independent) failureprobabilities p e ∈ [0 ,
1] ( e ∈ E ). After a disastrous event some edges will have failed,and each failure scenario is described by a set of surviving edges ξ ⊆ E . We areinterested answering the following two questions:1. What is the expected value of the shortest path lengths between a set of dis-tinguished nodes?2. If we can take some actions to modify the failure probabilities p e , how shouldthis be done optimally in order to to minimize the expected shortest pathlength?If the graph is a road network, and edges fail due to an earthquake, this is thesetting of [37]. The set of all failure scenarios can be totally ordered by comparingthe lengths of the shortest path between two (fixed) nodes: If in failure scenariothere exists an st -path of length at most α , then in all scenarios where only asubset of these edges fails, this path still exists. Scenarios with the same shortestpath length will be considered equal in the total order. Of course, depending onthe application, many other scenario orderings are conceivable, e.g., accordingto longest st -path, number of disjoint st -paths, or size of the largest connectedcomponent of the graph.The successive shortest path problem is closely related to the classic st -reliabilityproblem introduced by [33, 34] and [46] and has been studied by many authors innumerous variants since then. Here the network is a system of circuits with edgescorresponding to components which have some probability of failing. The systemas a whole operates if there exists some st -path, and the probability of having sucha path is called st -reliability. This fundamental problem and its various extensionshave a vast number of applications in reliability engineering [6].The question also naturally arises in the setting of interdiction problems, wheretypically one considers decisions to be actions of an attacker to weaken the net-work structure (to increase failure probability of edges, reduce capacity, etc.) inorder to hamper the network’s operational capability. For a survey on typical net-work interdiction problems and corresponding modeling and solution approachessee [13].The approach we develop in this paper is not limited to combinatorial stochas-tic optimization problems. It can also be used in computing the expected objectivefunction value for linear programs with varying right-hand-side coefficients, whereeach coefficient independently attains one of two values with a given probability.Consider a maximization problem with less than or equal constraints, and assumethat: ( i ) in the nominal problem all right-hand-side coefficients are at the largerof the two values; ( ii ) the scenarios are given by sets of rows where the coefficienttakes the smaller value. Then the objective function is monotonously decreasingwhen taking supersets among the scenarios. In a similar way, irreducible inconsis-tent linear systems and maximal consistent subsets of LPs [36, 2] can be studied. Utz-Uwe Haus et al.
We consider a two-stage stochastic optimization problemmin E ξ | x [ f ( ξ )] Cx ≤ dx e ∈ { , } e ∈ Eξ = ( ξ e ) e ∈ E ∈ { , } | E | (1)where we minimize the expected value (over all realized failure scenarios) of therecourse function f ( ξ ) : 2 E → R conditioned on first-level decisions x e , where e ∈ E has an associated elementary failure event ξ e whose probability may beinfluenced by decision x e . The probability distribution of the scenario distributionthus changes depending on the first-level decisions. The first-level decisions areto be taken subject to some set of linear constraints, e.g., a budget restriction P e ∈ E x e ≤ β . Typically the evaluation of the recourse function f ( ξ ) will itselfamount to solving an optimization problem. We will interpret scenarios as sets of‘positive events’, e.g., the sets of surviving edges in a network after some disaster.In order to make our assumption on the problem structure more precise, wedefine the following properties and objects related to the recourse function. Definition 1 (aggregable recourse function)
The recourse function f of a 2-stage stochastic optimization problem is called aggregable if1. f does not depend on the first-stage decision x ,2. f is counter-monotone with taking subsets of scenarios, ξ ⊆ ξ ⇒ f ( ξ ) ≥ f ( ξ ) (2)for all ξ , ξ ∈ E ,3. the probabilities of events e ∈ E are independent.We denote the critical values of f , i.e. the different values f takes, by C ( f ) = { α : α = f ( ξ ) , ξ ∈ E } . Moreover we denote by T M α ( f ) = { ¯ ξ ∈ E : f ( ξ ) > α, ξ =1 E − ¯ ξ } the sets of edge failures that forces f to take values strictly greater than α . Finally we define the set of minimal survivable scenarios for each critical valueby M α ( f ) = { ξ ∈ E : f ( ξ ) = α, ∀ ξ ′ ⊂ ξ : f ( ξ ′ ) > f ( ξ ) } , with α ∈ C ( f ).The elements of M α ( f ) are the transversal of the clutter given by the minimalmembers of T M α ( f ). Example 1 (continued)
In our running example, computation of f amounts to com-puting a shortest path length f SP in a graph whose edge set depends on the sce-nario: the edges that failed are missing in the scenario compared to the initial graph G . Clearly, shortest path length is counter-monotone with removing supersets ofedges, i.e., considering subsets of the surviving edges of some other scenario. An-other way of looking at this is to say that the sublevel sets L − α ( f ) = { ξ : f ( ξ ) ≤ α } of f are co-monotone (wrt. inclusion) with the natural ordering of real numbers α i < α j . Each set M α ( f ) consists of the simple paths of length α in the graph. cenario Aggregation using BDDs 5 The set C ( f ) allows us to rewrite the objective function as follows: E ξ | x [ f ( ξ )] = X α ∈C ( f ) α Prob[ f ( ξ ) = α | x ] . We are thus interested in computing the probabilities Prob[ f ( ξ ) = α ], first withouttaking into account possible decision variables x e , and then integrating them. In this section, we develop our approach to provide an effective scenario aggre-gation by using nested scenario covers. We first describe how scenario sets canbe encoded as BDDs in such a way that these sets represent the sublevel sets ofthe recourse function. We then proceed to show how scenario probabilities canbe represented as a flow of probability on the DAGs represented by the BDDs.Finally, we describe how those probability flows can be shaped using the binaryfirst-stage decision variables via linear inequalities and a derive the resulting MIPformulation.3.1 Scenario Aggregation using BDDsWe will from now on consider the set C ( f ) as an ordered set ( α , . . . , α N ) with thenatural ordering < of the real numbers. This induces an ordering of the set M ( f )as ( M α ( f ) , . . . , M α N ( f )).Each M α ( f ) induces a monotone Boolean function Φ ≤ α on the scenarios whoseminimal true points are the members of M α ( f ) by “ Φ ≤ α ( ξ ) = 1 if and only if f ( ξ ) ≤ α .” It is well-defined because of (2). Note that Φ ≤ α ( ξ ) describes the supportof the cumulative distribution function Prob[ f ( ξ ) ≤ α ]. We refer to [14] for moredetails on Boolean functions. In the reliability analysis setting the function Φ ≤ α is called system state function, and we can assume that the system is a coherentbinary system. Then M α ( f ) are the path sets of the system and Prob[ Φ ≤ α ( ξ ) = 1]is the reliability, see [4].We propose to encode each function Φ ≤ α for α ∈ C ( f ) in the form of a (reduced,ordered) binary decision diagram (BDD), see [30, 8]. Intuitively, a BDD can beunderstood as a directed acyclic graph with a single source and a single sink,in which each source-sink path encodes one or more feasible points of a Booleanfunction. The graph is layered, with the layers indexed by the variables of thefunction. The key property is that every node has at most 2 children, and foreach sub-dag that includes a sink there are no isomorphic copies in the BDD. Thiscan make them much more compact than conventional decision trees encoding thesame feasible points. Although BDDs can be of exponential size compared to thefunction they encode, for various classes of functions one can find compact BDDencodings. Note that finding a minimal size BDD is N P -hard, even for monotoneBoolean functions [45], and not every BDD construction algorithm will be output-polynomial. Despite all of these caveats, BDDs have been highly successful inmany applications, see [32] for an overview. Furthermore, good software for BDD
Utz-Uwe Haus et al. handling exists, which provides various heuristic BDD size minimization strategies,e.g, cudd of [44].Recently, it was shown that BDDs encoding independence systems can alwaysbe constructed in output-polynomial time, if equivalence of minors in the asso-ciated circuit system can be checked efficiently by a top-down compilation ruleof [25]. We will use this to prove Lemma 3.
Definition 2 (BDD)
Let E = ( e , . . . , e | E | ) be a finite linearly ordered set. A binary decision diagram (BDD) B = ( E, V, A ⊆ V × V, ǫ : V → { , . . . , | E |} , l (( u, v )) : A → { , } ) is either degenerate, B = ( E, {⊥} , ∅ , ǫ, l ) or B = ( E, {⊤} , ∅ , ǫ, l ), orconsists of a directed acyclic graph on node set V with exactly one root u ∗ , twoleaves ⊤ , ⊥ , arc set A , arc label function l : A → { , } and a layering function ǫ : V → { , . . . , | E |} . Unless B is degenerate, it must satisfy the following conditions: – B is a layered digraph, i.e. its node set partitions into layers L i = { u ∈ V : ǫ ( u ) = i } , i = 1 , . . . , | E | and L | E | +1 = {⊤} , such that | L | = 1 (the root layer)and L | E | +1 is the terminal layer. – Arcs only extend to layers with higher index, i.e. ∀ ( u, v ) ∈ A, ǫ ( u ) < ǫ ( v ). – Every node u ∈ V \ {⊥ , ⊤} is the tail of exactly two differently labeled arcs,i.e. ∀ u ∈ V \ {⊥ , ⊤}∃ v , v ∈ V : v = v , ( u, v ) ∈ A, ( u, v ) ∈ A, l (( u, v )) = l (( u, v )). – For any two distinct nodes u, v of V , the sub-BDDs rooted at u and v are notisomorphic, i.e. B u = B v .Here B u denotes the sub-BDD of B defined by the sub-dag of ( V, A ) rooted at u ,with node and arc label functions obtained by restriction of ǫ and l to V ( B u )and A ( B u ), respectively. BDD-isomorphism ∼ = is defined as isomorphism ofdirected graphs with identical variable set and identically evaluating functions ǫ and l .For each node u ∈ V , the outgoing arc labeled by 1 is called the True -arc , and theoutgoing arc labeled by 0 is called the
False -arc .Note that BDDs according to this definition are called reduced, ordered BDD in theclassical literature [30, 8, 32]. (In drawing BDDs one often suppresses the ⊥ -nodeand all arcs δ in ( ⊥ ), since these can be easily reconstructed.)For i = 1 , . . . , | E | , the layer width w i of layer L i is defined as w i = | L i | . The BDD-width is then w = max i =1 ,..., | E | w i . The total size of a BDD is | V | + 1 = P | E | +1 i =1 w i .Each BDD B represents a Boolean function: Φ B ( x ) = _ P an u ∗ - ⊤ path ^ ( u, v ) edge of P : l (( u,v ))=1 x ǫ ( u ) ∧ ^ ( u, v ) edge of P :l((u,v))=0 ( ¬ x ǫ ( u ) ) . (3)Furthermore, every Boolean function has a BDD representation (which is es-sentially unique given the ordering of E ). Remark 1
Let Φ ( x ) be a monotone Boolean function and Φ D ( x ) = ¬ Φ ( ¬ x ) itsdual. If B = ( E, V, A, ǫ, l ) is a BDD representing Φ ( x ), then B ′ = ( E, V, A, ǫ, l ′ )with l ′ (( u, v )) = 1 − l (( u, v )) represents ¬ Φ D ( x ), i.e. except for the labeling, Φ ( x )and Φ D have the same BDD. cenario Aggregation using BDDs 7 This is a direct consequence of the fact that the BDD not only encodes allfeasible points of the monotone Boolean function but also all infeasible points (aspaths from u ∗ to ⊥ ) and formula (3).The isomorphism between BDDs for a monotone Boolean function and its dualfunction is useful if the set M α are given as a list of minimal true elements: thislist provides a covering-type formulation for the dual function and the output-polynomial time top-down BDD construction rule using matrix minors of [25].3.2 Integrating Scenario Probabilities Lemma 1
Given a BDD B encoding a Boolean function Φ : 2 E → { , } and ascenario distribution such that the events e ∈ E are independently distributed randomvariables, the value Prob[ B ] := Prob[ Φ ( ξ ) = 1] can be computed in linear time. Note that linear time here refers to linear in the size of the BDD B (and | E | ); inthe proof we only have to show that we can obtain a system of equations computingthe probability that is shaped like the BDD and does not have excessively largesubexpressions. Proof
The construction is similar to the classical computation of probabilities ina scenario tree, with the difference being that recurring computations in subtreesare avoided because the BDD structure (‘no isomorphic sub-BDDs’) merges thesecases.Let p i ∈ [0 ,
1] be the probability of event e i ∈ E . We proceed by induction: If B is the BDD consisting of only the ⊤ -terminal, it encodes the entire cube { , } | E | and Prob[ Φ ( ξ ) = 1] = 1. Similarly, if it consists only of the ⊥ terminal, in encodesthe empty set, so Prob[ Φ ( ξ ) = 1] = 0.Assume now that the statement has been proven for BDDs with k − > B be a BDD with k layers. The unique root node u ∗ has exactly 2 children thatwe’ll call True -child and
FALSE -child; let the event for the root layer be ǫ ( u ∗ ) = e .Initially assume that the sub-BDDs have their roots in the layer L directly follow-ing e . Then,denoting the probabilities computed for the sub-BDDs by p True -child and p False -child ,Prob[ Φ ( ξ ) = 1] = p p True -child + (1 − p ) p False -child , (4)since scenarios contain/don’t contain event e with these probabilities and theprobabilities of the completions have been computed inductively for the sub-BDDsof size at most k − ξ = 1, has its root in a layer l >
2, then thissub-BDD encodes the set of solutions (1 , ⋆, . . . , ⋆ | {z } l − , ξ ′ ) where ⋆ denotes that 0 or 1can be chosen arbitrarily for every ( ξ ′ ) ∈ { , } | E |− l .For simplicity assume thatonly the True -child of u ∗ starts at layer l . ThenProb[ Φ ( ξ ) = 1] = p l − Y i =2 ( p i + (1 − p i )) ! p True -child + (1 − p ) p False -child = p p True -child + (1 − p ) p False -child , (5) Utz-Uwe Haus et al. since the scenarios do not depend on events on layer 2 , . . . , l − ξ = 1.3.3 Shaping the Scenario DistributionEquations (4) and (5) give a recursive set of linear equations (starting with p ⊤ =1 , p ⊥ = 0 for the leafs) that can be used to directly turn a BDD encoding M α ( f )into a set of linear constraints to compute Prob[ Φ ≤ α ( ξ ) = 1] using as many variablesand equations as the BDD has nodes (each node has exactly one defining equation).We now show how decisions influencing the probabilities p i can be incorporatedinto these formulas as well. Lemma 2
Given a BDD B encoding a Boolean function Φ : 2 E → { , } and ascenario distribution such that the events e i ∈ E are independently distributed randomvariables depending on decisions ( x i ) i =1 ,..., | E | , the value Prob { ξ | x } [ Φ ( ξ ) = 1] can becomputed in linear time.Proof This follows immediately from (5) because the events are assumed to beindependent: Let p i ( x ) denote the conditioned probabilities, and p True -sub ( x ), p False -sub ( x ) the conditioned sub-BDD probabilities. ThenProb[ Φ ( ξ ) = 1] = p ( x ) p True -sub ( x ) + (1 − p ( x )) p False -sub ( x ) , (6)just as in (4).Again, equation (6) gives a recursive set of equations (starting with p ⊤ =1 , p ⊥ = 0 for the leafs), but depending on how p i ( x ) is defined they may not belinear. In the next section we will show that if the decisions are binary (or cantake only a fixed number of values), the problem can be formulated as a compactmixed-integer programming problem.3.4 MIP Model for Aggregated Scenario ProbabilitiesLet us now consider for ease of presentation the case where | E | binary decisionvariables x i ∈ { , } influence the nominal event probability p e such that p i ( x ) = ( p i if x i = 0 p i + ∆ i if x i = 1where ∆ i ∈ [ − p i , − p i ] is the boost or decrease of the probability of event e i ∈ E when taking decision x i = 1.With this definition, and in light of equation (6), we define for each arc ( u, v )in the BDD B ≤ α = ( E, A, ǫ, l ) p ( u,v ) ( x ) = ( p i ( x ) if ( u, v ) ∈ A, ǫ ( u ) = i, l (( u, v )) = 1(1 − p i ( x )) if ( u, v ) ∈ A, ǫ ( u ) = i, l (( u, v )) = 0 , that is, depending on the decision x i all arcs in the BDD that start at node u associated with event e i are assigned the appropriate probability depending on x i cenario Aggregation using BDDs 9 and whether they correspond to the event being in the scenario ( l (( u, v )) = 1) ornot ( l (( u, v )) = 0). Introducing intermediate variables for each BDD node we canthus formulate the following MIP constraints to compute the probability of allscenarios of B ≤ α subject to decision variables x = ( x i ) i =1 ,..., | E | : P α = ( p ≤ α , x ) : p ≤ α ( u ) = 1 u = ⊤ ∈ V ( B ≤ α ) p ≤ α ( u ) = 0 u = ⊥ ∈ V ( B ≤ α ) p ≤ α ( u ) = p ( u,v ) ( x ) p ≤ α ( v ) + p ( u,w ) ( x ) p ≤ α ( w )( u, v ) , ( u, w ) ∈ A ( B ≤ α ) , v = wp ≤ α ∈ [0 , V ( B ≤ α ) x ∈ { , } E (7)It is easy to see that the constraints of P α can be written as linear inequalities,which we avoided above to unclutter the formulation. For example, in the case( u, v ) , ( u, w ) ∈ A ( B ≤ α ) , v = w, l (( u, v )) = 1 , l (( u, w )) = 0 where ǫ ( u ) = i : p ≤ α ( u ) = p ( u,v ) ( x ) p ≤ α ( v ) + p ( u,w ) ( x ) p ≤ α ( w )can be written using inequalities p ≤ α ( u ) ≤ ( p i + ∆ i ) p ≤ α ( v ) + (1 − p i − ∆ i ) p ≤ α ( w ) + (1 − x i ) p ≤ α ( u ) ≤ p i p ≤ α ( v ) + (1 − p i ) p ≤ α ( w ) + x i p ≤ α ( u ) ≥ ( p i + ∆ i ) p ≤ α ( v ) + (1 − p i − ∆ i ) p ≤ α ( w ) − (1 − x i ) p ≤ α ( u ) ≥ p i p ≤ α ( v ) + (1 − p i ) p ≤ α ( w ) − x i . Then the formulation of P α contains | V ( B ≤ α ) | + | E | variables and 4 | V ( B ≤ α ) | + 1constraints (plus box constraints for all variables).Clearly, similar models can be built if x i is allowed to take a finite numberof discrete values, using 2 constraints per possible value of x i , or disjunctive, orconstraint programming techniques.To obtain a MIP model for aggregable problems of the form (1) we can simplycombine blocks of the form (7) using inclusion-exclusion on the scenario sets,exploiting: min α ∈C ( f ) αp = α p = α = p ≤ α ( u ∗ ) u ∗ root node of B ≤ α p = α i = p ≤ α i ( u ∗ ) − p = α i − u ∗ root node of B ≤ α i , α i ∈ C ( f ) \ { α } Cx ≤ d ( p ≤ α i , x ) ∈ P α i α i ∈ C ( f ) (8)Note that the x variables will be shared between P α i and P α j for α i , α j ∈ C ( f ).When can we expect a polynomial-size MIP formulation using aggregated sce-narios? This depends on two prerequisites: We need to be sure that the set C ( f )and the BDDs B ≤ α are small. The first condition may be satisfied automaticallyfor some problem classes, or may be a consequence of the way that the instance is encoded, e.g., giving an explicit list of relevant objective values. The second con-dition is hard to control in general, but often one can identify parametric problemsubclasses such that the width of B ≤ α is appropriately bounded for each fixed valueof the parameter: Bounds on the BDD width have been studied widely. For ourpurposes we will use the results of [25] and [35].To actually construct the formulation we also need to ensure that the construc-tion of each B ≤ α can be performed in (output-)polynomial time. The easiest way toensure this is to specify a top-down-compilation rule, such that each BDD node istouched only once in the construction, and decisions about whether to merge thechild subtrees are made immediately in the node in polynomial time. For mono-tone Boolean functions, which encode the members of an independence system,this can be achieved if equivalence of minors of the associated circuit system canbe checked efficiently, see [25]. A prominent example is that of the graphic ma-troid [43], i.e. when scenarios are forests or simple cycles of a graph. By Remark 1this is also always the case if the sets M ≤ α (or the dual sets) are explicitly given.Thus if M ≤ α is explicitly given or can be enumerated in polynomial time, efficientBDD construction reduces to the question whether the BDD width is polynomiallybounded.In light of this reformulation technique we make the following definition. Definition 3 (polynomially aggregable 2-stage stochastic optimization prob-lem)
An aggregable 2-stage stochastic optimization problem is called polynomiallyaggregable if – the number of critical values C ( f ) of f is polynomial, and – all BDDs B ≤ α ( f ) can be constructed in polynomial time.It follows from [25, Thm. 6] that whenever the matrix containing the incidencevectors of the elements of T M α ( f ) has bandwidth k , then B ≤ α ( f ) has size at most n k − , which is polynomial if k ∈ O ( logn ). Hence 2-stage stochastic optimiza-tion problems with bandwidth-limited sets T M α ( f ) and a polynomial number ofcritical values C ( f ) are polynomially aggregable: Lemma 3 σ , a polynomialnumber of critical values C ( f ) , and an explicitly given sets T M α ( f ) whose incidencematrix has bandwidth k ∈ log( σ ) are polynomially aggregable. Note that our definition of (8) contains as a special case the union of products(UPP) problem (this is the case where the decisions x e have no influence on theprobabilities). [4] show that UPP is N P -hard, even when the sets M ( f ) are ex-plicitly given – thus unless P = N P the BDD construction will be exponential inthese cases. cenario Aggregation using BDDs 11O-D-pair Dist.-Limit
Prestwich ’13 ) (using BDD-bundles)14–20 31 39 4 237 × × × × × P
223 14174 × × Prestwich ’13 ) (using BDD-bundles)14–20 378 14 2609 × × × × × P × × Table 1: Scenario partition bundles vs. BDD bundles in the Istanbul road net-work problem instance of [37]. MIP size is number of constraints/number of rows,excluding [0 , < s . CPLEX 12.5 in single-thread mode on a ded-icated 24-core, 4-CPU Intel X5650/2.67 GHz system with 96Gb RAM runningLinux 3.14.17.easy to answer: The set of all shortest paths M ( f SP ) between a pair of nodes canbe enumerated in time O (( | V | + | E | ) |M ( f SP ) | ) using the classic restricted back-tracking technique of [41] or the recent output-linear method of [7]. One couldalternatively enumerate the elements of T M ( f SP ) using the method of [40], whichmay be preferable if one is interested in displaying the minimal sets of simul-taneously failing edges characterizing a bundle of scenarios to a user. Actually,since T M α ( f SP ) may be exponential in the size of M α ( f SP ) (a graph consistingof k edge-disjoint st -paths with at most α edges each has α k minimal sets of edgefailures defining M α ( f SP )), it may be preferable to start computing both setsconcurrently, and stop when one is complete. From both sets one can constructidentically-sized BDD (which differ only by the arc labels), since the sets define apair of dual monotone Boolean functions.We can apply Lemma 3 to this problem as follows: For fixed s and t andeach α the set T M ≤ α contains the minimal sets of edges that need to be removedto destroy all shortest st -paths of length at most α (sometimes called α -length-bounded cuts [3]). Clearly, each such set is a subset of some minimal st -cut. Hence,if the incidence matrix of all minimal st -cuts has bandwidth at most k , so do allincidence matrices for the clutters T M ≤ α . Note that for the incidence matrix of all n arcs 2 arcs pairs size size size size size size α = 1 .
11 5 32 64 2 2 3 3 5 52 11 2048 384 2 3 4 5 14 263 19 524288 1280 2 3 5 6 33 1264 30 1 . × . × . × . × α = 1 .
51 5 32 64 2 2 3 4 9 92 11 2048 384 2 3 4 8 37 643 19 524288 1280 2 4 6 22 261 5654 30 1 . × . × . × . × α = ∞ . × . × . × . × Table 2: BDD bundles for random road networks on grid of ( n + 1) nodes withdensity approximately 1 .
2, see [31]. Experiments repeated for 32 different net-works and for all origin-destination pairs in each network, subject to length cutoff α · d , where d is the shortest path distance for the origin-destination pair underconsideration. Number and size of BDDs is reported for minimum, maximum,25/50/75/99% quantiles. Average CPU time per experiment across all 411264 ex-periments is 2.5s. Compare to [39, Table 7]. n B DD s i ze min 25% median75% 99% (a) Length cutoff α = 1 . n B DD s i ze min 25% median75% 99% (b) Length cutoff α = 1 . n B DD s i ze min 25% median75% 99% (c) Length cutoff α = ∞ . Fig. 1: BDD size comparison across instances of Table 2.. cenario Aggregation using BDDs 13 minimal st -cuts to have bandwidth at most k , in particular the cardinality of eachcut must be ≤ k . Lemma 4 shows a class of such graphs. Lemma 4
Let G , . . . , G l be graphs for which the incidence matrices of minimal s i − t i -cuts ( s i , t i ∈ N ( G i ) ) all have bandwidth at most k . Let v be a new node, i.e. v / ∈ S i N ( G i ) . Then all graphs G = ( N, E ) of the form N = { v } ∪ S i N ( G i ) , E = E v ∪ S i E ( G i ) with E v = S i E iv such that E iv ⊂ { ( x, v ) : x ∈ N ( G i ) } and | E iv | ≤ haveincidence matrices of minimal st -cuts (for all s, t ∈ N ) with bandwidth at most k .Proof If s and t are in an original graph G i then the statement is true by assump-tion. If s ∈ G i and t = v then all minimal cuts are either the edge ( v, x ) ∈ E iv , or s − x -cuts in G i , so have bandwidth at most k . Otherwise there is a path from s to t in which v is a separating node, so cuts either separate s and v or v and t andthe previous case applies.This recursive construction yields tree-like graphs composed of smaller bandwidth-bounded subgraphs and connecting nodes that are separators.We conjecture that the pre-disaster planning problem has polynomial-size exactMIP reformulation if the line graph of G = ( V, E ) has logarithmically boundedpathwidth k ∈ O (log( | V | + | E | )), by exploiting few vertex cuts in the line graphand translating them to few edge cuts in the original graph, but refer this tofurther work.From a practical perspective, we obtained very small BDDs for the relativelysparse graphs arising in road networks using the following heuristic. For each α ∈C ( f SP ) use the following edge ordering: Count the number of occurrences of edge e in the sets of M α ( f SP ) and sort them by increasing value. The intuition is thatedges which are in no element of M α ( f SP ) are ‘don’t care’ edges for every scenario,and will thus not require introduction of any node in the BDD, while puttingcentral edges (participating in many minimal scenarios) at the end will not lead tomany branching decisions in the BDD until the very last layers. Furthermore, theseedges will be included to complete many scenarios, so when merging isomorphicsub-dags in the BDD construction it is good to find them in the bottom layers.The ordering obtained will be similar, at least for graphs with few central nodes,to a heuristic low-bandwidth ordering obtained by the Cuthill-McKee algorithm([15]). The circuit system minor equivalence check amounts to a direct matrixminor comparison, since we assume that the circuit system is explicitly given bythe list of minimal scenarios.This allows us to give an exact formulation of the pre-disaster planning problemof [37] as a MIP of very small size, an order of magnitude smaller than in therecent work of [38, 39]. Table 1 shows the MIP sizes for the Istanbul instance, andTable 2 shows number of BDDs and sizes for randomly constructed networks ofsimilar density, as done in [39] (see also [31] for the specifics on sampling randomconnected graphs). Again, the BDD sizes directly correspond to MIP sizes.We remark that the partitioning scheme and ‘molded distribution’ conceptof [38, 39] can be seen as a special case of our construction where all BDDs aresimply paths, not arbitrary DAGs. I = ( E, S ) with S ⊆ E be an independence system, i.e. ∀ S ∈ S : S ⊆ S ⇒ S ∈ S . Let w : S → R with w ( S i ) = P e ∈ S i w e with w e ≥ e ∈ E can fail independentlyat random. Let x e ∈ { , } be decision variables such that the element failureprobabilities are p e ( x e ) = ( p e x e = 1 p e x e = 0 . Consider the 2-stage stochastic programming problem of the form (1) wherefor a scenario of failing edges ξ ⊆ E the inner optimization problem is given by f ( ξ ) = max S i ∩ ξ ∈ S w ( S i ), i.e. by optimizing w over the restriction of I to ξ . Thenthis setting covers the following problems: – Expected maximum weight forests in a graph G = ( V, E )The independence system considered is the graphic matroid with ground setelements E , non-negative edge weights w e . Edges can fail (independently) withsome failure probability that can be influenced by decisions x e . The goal is tominimize or maximize the expected weight of the maximal forest under edgedecisions. – Expected stability number in a graph G = ( V, E )The independence system considered is the set of stable sets in G . – Matroid Steiner Problems [10]As mentioned before, BDD construction is output-polynomial in the graphicmatroid. We point out that in all of the cases we believe that relevant graph classeswith bounded BDD width can be characterized, but each may be a separate andintricate topic of investigation.4.3 Stochastic Flow Network InterdictionThe flow value function of a capacitated network flow problems is monotone wrt.the network capacities. Consider a setting where arc capacity failures (or reduc-tions) occur randomly. Decisions allow us to increase or decrease reliability of thelinks. If we assume complete failure of arcs, the fundamental theorem of networkflow theory about path decomposition of flows lets us link this problem to a classicmaximum flow problem. It is more challenging to consider the case of (discrete)arc capacity changes; see [16] and papers citing that for structural statements onthe lattice of flow distributions under capacity changes.In particular, the network interdiction problem SNIP(IB) introduced in [13]fits this framework:
Definition 4 (stochastic network interdiction problem SNIP(IB))
Let G =( V, A ) be a (directed) graph, s, t ∈ V a source and a sink node, and arc capacities { c e } e ∈ A be given. For each arc e ∈ A let a probability p e of an interdiction attemptbeing successful be given, where success means that the arc capacity becomes 0.The problem SNIP(IB) for ( G, { p e } e ∈ A , s, t ) is then the problem of selecting a set ofarcs to interdict, such that the expected maximum flow in the remaining networkis minimized. cenario Aggregation using BDDs 15 e x p ec t e dfl o w investment/expected flow plot 00 . . . C P L E X . t i m e ( i n s ) expected flowtime 10 . . . . . interdiction budget e x p ec t e dfl o w log/log investment/yield plot expected flow (a) SNIP(IB)-4x9 : 38 nodes, 67 arcs (24 targetable), interdiction success rate 75%. Network has10 different flow values; P BDD sizes: 207 nodes, construction time 3 .
3s (3969 max flow oraclecalls), MIP size 1771 × e x p ec t e dfl o w investment/expected flow plot 0102030 C P L E X . t i m e ( i n s ) expected flowtime 10 . . . interdiction budget e x p ec t e dfl o w log/log investment/yield plot expected flow (b) SNIP(IB)-7x5 : 37 nodes, 72 arcs (22 targetable), interdiction success rate 75%. Networkhas 19 different flow values; P BDD sizes: 2302 nodes, construction time 5 .
9s (61988 max floworacle calls), MIP size 22867 × e x p ec t e dfl o w investment/expected flow plot 01 · · · C P L E X . . t i m e ( i n s ) expected flowtime 10 . . . interdiction budget e x p ec t e dfl o w log/log investment/yield plot expected flow (c) SNIP(IB)-10x10 : 10 ×
10 grid, 35% interdictable arcs, vertical arc orientation unif. randomup/down 50 : 50, capacities 10 ..
100 uniformly random multiples of 10, 75% interdiction successrates.
Fig. 2: SNIP(IB) Instances of [13]. Computations performed using CPLEX on asingle node 16-core Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz.
To solve an SNIP(IB) problem using our framework we need to compute theset of all st -flows arising under edge removal from the graph G . By the max flowmin cut theorem the number of different flow values appearing depends on thenumber of possible cut values, so in particular a rough estimate shows that if thenumber of different arc capacities is k there are at most ( | E | k ) possible cut values.Having a limited number of arc capacities is very natural in many applicationswhere the capacity is determined by a technological parameter of the arc type,e.g., link speed or pipe diameter.For the purpose of showing practicality of our approach we resort to computingthe sets of all minimal edge sets whose removal reduces the maximal st -flow in theresidual graph to less than α using a technique successfully applied to determiningminimal cut sets in metabolic flux networks [24, 5]. In particular, we solve the max-flow problem using the lemon-1.3.1 library [9] in the membership oracle of thejoint generation procedure [18, 23]. In Figure 2 we use the benchmark SNIP(IB)-instances of [13]. Due to the fact that the MIP only depends on the networkstructure and the set of interdictable arcs we can seamlessly change the budgetvalue or interdiction success probabilities without re-generating the MIP. Thisseparation of structure and data is a major advantage to other methods that takethese parameters into account throughout the solution procedure like [27], andmakes it possible to do a parameter study accounting for all conceivable budgetvalues.In summary, we find that we obtain MIPs that allow us to solve the exactreformulation, rather than resorting to solving approximations, and for all con-ceivable budgets, but that current MIP solvers – we’ve also done limited trialswith FICO Xpress, Gurobi and SCIP – struggle with the particular problem struc-ture more than we expected. The formulation with few, binary variables couplingmany continuous variables in equations seems like a natural fit for decompositionapproaches, which we are planning to investigate in future research. Acknowledgements
This work was partially supported by EU-FP7-ITN 289581 ‘NPlast’(while the second author was at ETH Z¨urich) and EU-FP7-ITN 316647 ‘MINO’ (while thefirst author was at ETH Z¨urich).
References
1. S. Ahmed.
Strategic Planning Under Uncertainty: Stochastic Integer Programming Ap-proaches . PhD thesis, University of Illinois at Urbana-Champaign, 2000.2. Edoardo Amaldi, Marc E. Pfetsch, and Leslie E. Trotter Jr. On the maximum feasiblesubsystem problem, IISs and IIS-hypergraphs.
Math. Program. , 95(3):533–554, 2003.3. Georg Baier, Thomas Erlebach, Alexander Hall, Ekkehard K¨ohler, Heiko Schilling, andMartin Skutella. Length-bounded cuts and flows. In Michele Bugliesi, Bart Preneel,Vladimiro Sassone, and Ingo Wegener, editors,
Automata, Languages and Programming ,volume 4051 of
Lecture Notes in Computer Science , pages 679–690. Springer Berlin Hei-delberg, 2006.4. Michael O. Ball and J. Scott Provan. Disjoint products and efficient computation ofreliability.
Operations Research , 36(5):703–715, 1988.5. Kathrin Ballerstein, Axel von Kamp, Steffen Klamt, and Utz-Uwe Haus. Minimal cutsets in a metabolic network are elementary modes in a dual network.
Bioinformatics ,28(3):381–387, 2012.6. R. Billinton and Allan R. N.
Reliability Evaluation of Engineering Systems: Concepts andTechniques . Plenum Press, 2 edition, 1992.cenario Aggregation using BDDs 177. Etienne Birmel´e, Rui Ferreira, Roberto Grossi, Andrea Marino, Nadia Pisanti, , RomeoRizzi, and Gustavo Sacomoto. Optimal listing of cycles and st-paths in undirected graphs.In
Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algo-rithms , pages 1884–1896, 2013.8. R.E. Bryant. Graph-based algorithms for boolean function manipulation.
IEEE Transac-tions on Computers , C-35(8):677–691, Aug. 1986.9. COIN-OR. Lemon, a graph library. http://lemon.cs.elte.hu/trac/lemon , 2014.10. Charles J Colbourn and William R Pulleyblank. Matroid steiner problems, the tuttepolynomial and network reliability.
Journal of Combinatorial Theory, Series B , 47(1):20– 31, 1989.11. Matthew Colvin and Christos T. Maravelias. Modeling methods and a branch and cut al-gorithm for pharmaceutical clinical trial planning using stochastic programming.
EuropeanJournal of Operational Research , 203(1):205 – 215, 2010.12. M. Colving and C. T. Maravelias. A stochastic programming approach for clinical trialplanning in new drug development.
Computers and Chemical Engineering , 32(11):2626–2642, 2008.13. Kelly J. Cormican, David P. Morton, and R. Kevin Wood. Stochastic network interdiction.
Operations Research , 46(2):184–197, 1998.14. Yves Crama and Peter L. Hammer.
Boolean Functions: Theory, Algorithms, and Appli-cations . Encyclopedia of Mathematics and its Applications. Cambridge University Press,2011.15. E. Cuthill and J. McKee. Reducing the bandwidth of sparse symmetric matrices. In
Proceedings of the 1969 24th national conference , ACM ’69, pages 157–172, New York,NY, USA, 1969. ACM.16. J. R. Evans. Maximum flow in probabilistic graphs-the discrete case.
Networks , 6(2):161–183, 1976.17. B. Flach and M. Poggi. On a class of stochastic programs with endogenous uncertainty:theory, algorithm and. Technical report, Monografias em Ciencia da Computac˜ao No05/10, PUC, Rio, 2010.18. Michael L. Fredman and Leonid Khachiyan. On the complexity of dualization of monotonedisjunctive normal forms.
J. Algorithms , 21(3):618–628, 1996.19. V. Goel and I. E. Grossmann. A stochastic programming approach to planning of offshoregas field developments under uncertainty in reserves.
Computers & Chemical Engineering ,28(8):1409–1429, 2004.20. V. Goel and I. E. Grossmann. A class of stochastic programs with decision dependentuncertainty.
Mathematical Programming , 108(2):355–394, 2006.21. V. Gupta and I. E. Grossmann. A new decomposition algorithm for multistage stochasticprograms with endogenous uncertainties.
Computers & Chemical Engineering , 62:62–79,2011.22. V. Gupta and I. E. Grossmann. Solution strategies for multistage stochastic programmingwith endogenous uncertainties.
Computers & Chemical Engineering , 35(11):2235–2247,2011.23. Utz-Uwe Haus. cl-jointgen , a Common Lisp Implementation of the Joint-GenerationMethod. available at , 2008-2013.24. Utz-Uwe Haus, Steffen Klamt, and Tamon Stephen. Computing knockout strategies inmetabolic networks.
Journal of Computational Biology , 15:259–268, 2008.25. Utz-Uwe Haus and Carla Michini. Compact representations of all members of an inde-pendence system.
Annals of Mathematics and Artificial Intelligence , pages 1–18, 2016.26. H. Held and D. L. Woodruff. Heuristics for multi-stage interdiction of stochastic networks.
Journal of Heuristics , 11(5):483–500, 2005.27. Udom Janjarassuk and Jeff Linderoth. Reformulation and sampling to solve a stochasticnetwork interdiction problem.
Networks , 52(3):120–132, 2008.28. T. W. Jonsbraten.
Optimization models for petroleum field exploitation . PhD thesis,Stavanger College, Department of Business Administration, 1998.29. B. Kawas, M. Laumanns, and E. Pratsini. A robust optimization approach to enhancingreliability in production planning under non-compliance risk.
OR Spectrum , 35(4):835–865, 2013.30. C. Y. Lee. Representation of switching circuits by binary-decision programs.
Bell SystemsTechnical Journal , 38(4):985–999, 1959.31. A. P. Masucci, D. Smith, A. Crooks, and M. Batty. Random planar graphs and the londonstreet network.
The European Physical Journal B , 71(2):259–271, 2009.8 Utz-Uwe Haus et al.32. Christoph Meinel and Thorsten Theobald.
Algorithms and Data Structures in VLSI De-sign: OBDD – Foundations and Applications . Springer-Verlag New York, Inc., Secaucus,NJ, USA, 1998.33. E. F. Moore and C. E. Shannon. Reliable circuits using less reliable relays: Part i.
Journalof the Franklin Institute , 262(3):191–208, 1956.34. E. F. Moore and C. E. Shannon. Reliable circuits using less reliable relays: Part II.
Journalof the Franklin Institute , 262(4):281–297, 1956.35. Guoqiang Pan and Moshe Y. Vardi. Symbolic techniques in satisfiability solving.
J.Autom. Reasoning , 35(1-3):25–50, 2005.36. Mark Parker and Jennifer Ryan. Finding the minimum weight IIS cover of an infeasiblesystem of linear inequalities.
Annals of Mathematics and Artificial Intelligence , 17(1):107–126, 1996.37. Srinivas Peeta, F. Sibel Salman, Dilek Gunnec, and Kannan Viswanath. Pre-disasterinvestment decisions for strengthening a highway network.
Computers & Operations Re-search , 37(10):1708 – 1719, 2010.38. Steven D. Prestwich, Marco Laumanns, and Ban Kawas. Value interchangeability inscenario generation. In Christian Schulte, editor,
Principles and Practice of ConstraintProgramming , volume 8124 of
Lecture Notes in Computer Science , pages 587–595. SpringerBerlin Heidelberg, 2013.39. Steven D. Prestwich, Marco Laumanns, and Ban Kawas. Distribution shaping and scenariobundling for stochastic programs with endogenous uncertainty. Stochastic ProgrammingE-Print Series 5, Institut f¨ur Mathematik, HU Berlin, 2014.40. J. Scott Provan and Douglas R. Shier. A paradigm for listing ( s, t )-cuts in graphs.
Algo-rithmica , 15(4):351–372, 1996.41. R.C. Read and R.E. Tarjan. Bounds on backtrack algorithms for listing cycles, paths, andspanning trees.
Networks , 5(3):237–252, 1975.42. Hermann Schichl and Meinolf Sellmann. Predisaster preparation of transportation net-works. In
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence,January 25-30, 2015, Austin, Texas, USA. , pages 709–715, 2015.43. Kyoko Sekine, Hiroshi Imai, and Seiichiro Tani. Computing the tutte polynomial of agraph of moderate size. In John Staples, Peter Eades, Naoki Katoh, and Alistair Moffat,editors,
Algorithms and Computations: 6th International Symposium, ISAAC ’95 Cairns,Australia, December 4–6, 1995 Proceedings , pages 224–233. Springer Berlin Heidelberg,Berlin, Heidelberg, 1995.44. Fabio Somenzi. CUDD: CU decision diagram package release 2.5.0. available as OpenSource at ftp://vlsi.colorado.edu/pub/cudd-2.5.0.tar.gz , 2012.45. Yasuhiko Takenaga and Shuzo Yajima. Hardness of identifying the minimum orderedbinary decision diagram.
Discrete Applied Mathematics , 107(1–3):191 – 201, 2000. SIBoolean Functions.46. J. D. Esary Z. W. Birnbaum and S. C. Saunders. Multi-component systems and structuresand their reliability.