On the approximability of robust spanning tree problems
Adam Kasperski
Institute of Industrial Engineering and Management, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland, [email protected]
Paweł Zieliński
Institute of Mathematics and Computer Science, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland, [email protected]
Abstract
In this paper the minimum spanning tree problem with uncertain edge costs is discussed. In order to model the uncertainty, a discrete scenario set is specified and a robust framework is adopted to choose a solution. The min-max, min-max regret and 2-stage min-max versions of the problem are discussed. The complexity and approximability of all these problems are explored. It is proved that the min-max and min-max regret versions with nonnegative edge costs are hard to approximate within O(log^{1-ǫ} n) for any ǫ > 0 unless the problems in NP have quasi-polynomial time algorithms. Randomized LP-based approximation algorithms with a performance ratio of O(log^2 n) for the min-max and 2-stage min-max problems are also proposed.

Keywords:
Combinatorial optimization; Approximation; Robust optimization; Two-stage optimization; Computational complexity
The usual assumption in combinatorial optimization is that all input parameters are precisely known. However, in real life this is rarely the case. There are two popular optimization settings for hedging against the uncertainty of parameters: the stochastic optimization setting and the robust optimization setting.

In stochastic optimization, the uncertainty is modeled by specifying probability distributions of the parameters and the goal is to optimize the expected value of the solution built (see, e.g., [7, 22]). One of the most popular models of stochastic optimization is the 2-stage model [7]. In the 2-stage approach the precise values of the parameters are specified in the first stage, while the values of these parameters in the second stage are uncertain and are specified by probability distributions. The goal is to choose a part of a solution in the first stage and complete it in the second stage so that the expected value of the obtained solution is optimized. Recently, there has been a growing interest in combinatorial optimization problems formulated in the 2-stage stochastic framework [9, 10, 12, 16, 21].

In the robust optimization setting [17] the uncertainty is modeled by specifying a set of all possible realizations of the parameters, called scenarios. No probability distribution on the scenario set is given. In the discrete scenario case, which is considered in this paper, we define a scenario set by explicitly listing all scenarios. Then, in order to choose a solution, two optimization criteria, called the min-max and the min-max regret, can be adopted. Under the min-max criterion, we seek a solution that minimizes the largest cost over all scenarios. Under the min-max regret criterion, we wish to find a solution that minimizes the largest deviation from optimum over all scenarios.
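To make the two criteria concrete, the following small sketch (our own illustration, not part of the paper) chooses a solution from an explicitly listed candidate set under a discrete scenario set; the candidate names and cost values are invented.

```python
# Rows: candidate solutions; columns: scenarios. costs[x][s] is the cost
# of solution x under scenario s.
costs = {
    "X1": [4, 7, 3],
    "X2": [5, 5, 5],
    "X3": [2, 9, 4],
}
num_scenarios = 3

# Best achievable cost under each scenario (over all candidates).
best = [min(costs[x][s] for x in costs) for s in range(num_scenarios)]

# Min-max: minimize the largest cost over all scenarios.
minmax = min(costs, key=lambda x: max(costs[x]))

# Min-max regret: minimize the largest deviation from the scenario optimum.
def regret(x):
    return max(costs[x][s] - best[s] for s in range(num_scenarios))

minmax_regret = min(costs, key=regret)
print(minmax, minmax_regret)
```

On this toy instance the two criteria select different solutions, which illustrates that they are genuinely different objectives.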
A deeper discussion of both criteria can be found in [17]. The min-max (regret) versions of some basic combinatorial optimization problems with the discrete structure of uncertainty have been extensively studied in the recent literature [2, 3, 14, 19]. Furthermore, both robust criteria can be easily extended to the 2-stage framework. Such an extension has been recently done in [8, 16].

In this paper, we investigate the min-max (regret) and 2-stage min-max versions of the classical minimum spanning tree problem. The classical deterministic problem is formally stated as follows. We are given a connected graph G = (V, E) with edge costs c_e, e ∈ E. We seek a spanning tree of G of the minimal total cost. We use Φ to denote the set of all spanning trees of G. The classical deterministic minimum spanning tree is a well studied problem, for which several very efficient algorithms exist (see, e.g., [1]).

In the robust framework, the edge costs are uncertain and the set of scenarios Γ is defined by explicitly listing all possible edge cost vectors. So, Γ = {S_1, ..., S_K} is finite and contains exactly K scenarios, where a scenario is a cost realization S = (c_e^S)_{e ∈ E}. In this paper we consider the unbounded case, where the number of scenarios is a part of the input. We will denote by C*(S) = min_{T ∈ Φ} Σ_{e ∈ T} c_e^S the cost of a minimum spanning tree under a fixed scenario S ∈ Γ. In the Min-max Spanning Tree problem, we seek a spanning tree that minimizes the largest cost over all scenarios, that is

OPT = min_{T ∈ Φ} max_{S ∈ Γ} Σ_{e ∈ T} c_e^S. (1)

In the Min-max Regret Spanning Tree problem, we wish to find a spanning tree that minimizes the maximal regret:

OPT = min_{T ∈ Φ} max_{S ∈ Γ} (Σ_{e ∈ T} c_e^S − C*(S)). (2)

The formulation (1) is a single-stage decision one. We can extend this formulation to the 2-stage case as follows. We are given the first stage edge costs c_e, e ∈ E, and in the second stage there are K possible cost realizations (scenarios) listed in scenario set Γ. The 2-Stage Spanning Tree problem consists in determining a subset of edges E_1 in the first stage and a subset of edges E^S that augments it to form a spanning tree T_S = E_1 ∪ E^S ∈ Φ under scenario S in the second stage, for each scenario S ∈ Γ. The goal is to minimize the maximum cost of the determined subsets of edges E_1, E^{S_1}, ..., E^{S_K}:

OPT = min_{E_1, E^{S_1}, ..., E^{S_K}} max_{S ∈ Γ} { Σ_{e ∈ E_1} c_e + Σ_{e ∈ E^S} c_e^S : T_S = E_1 ∪ E^S ∈ Φ }. (3)

Let us now recall some known results on the problems under consideration. In the bounded case (when the number of scenarios is bounded by a constant), the Min-max (Regret) Spanning Tree problem is NP-hard even if Γ contains only 2 scenarios [17] and admits an FPTAS [3], whose running time, however, grows exponentially with K. In the unbounded case, the Min-max (Regret) Spanning Tree problem is strongly NP-hard [2, 17] and not approximable within (2 − ǫ), for any ǫ > 0, unless P=NP, even for edge series-parallel graphs [14]. The Min-max (Regret) Spanning Tree problem is approximable within K [3]. However, up to now the existence of an approximation algorithm with a constant performance ratio for the unbounded case has been an open question. To the best of the authors' knowledge, the 2-stage version of the minimum spanning tree problem has so far been considered only in the stochastic setting [9, 10, 12]. Recently, the robust 2-stage framework has been employed in [8, 16] for some network design and matching problems.

Our results
In this paper we prove that the Min-max Spanning Tree and Min-max Regret Spanning Tree problems are hard to approximate with a constant performance ratio (Theorem 3 and Corollary 1). Namely, they are not approximable within O(log^{1-ǫ} n) for any ǫ > 0, where n is the input size, unless NP ⊆ DTIME(n^{poly log n}). We thus give a negative answer to the open question about the existence of approximation algorithms with a constant performance ratio for these problems. Moreover, if both positive and negative edge costs are allowed, then the Min-max Spanning Tree problem is not at all approximable unless P=NP (Theorem 4). For the 2-Stage Spanning Tree problem, we show that it is not approximable within any constant, unless P=NP, and within (1 − ǫ) ln n for any ǫ > 0, unless NP ⊆ DTIME(n^{log log n}) (Theorem 6). The above negative results encourage us to find randomized approximation algorithms, which yield an O(log^2 n) approximation ratio for Min-max Spanning Tree (Theorem 5) and 2-Stage Spanning Tree (Theorem 7).
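For intuition on the min-max objective studied below, the following brute-force sketch (ours, with invented data) evaluates it exactly on a tiny graph by enumerating all spanning trees; this of course takes exponential time in general, which is why the paper studies approximation.

```python
from itertools import combinations

def is_spanning_tree(n, edges):
    """Check that the given n-1 edges connect vertices 0..n-1 (union-find)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    merged = 0
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            merged += 1
    return merged == n - 1

def minmax_spanning_tree(n, edge_list, scenarios):
    """scenarios[S][e] is the cost of edge index e under scenario S."""
    best, best_tree = float("inf"), None
    for tree in combinations(range(len(edge_list)), n - 1):
        if not is_spanning_tree(n, [edge_list[e] for e in tree]):
            continue
        worst = max(sum(S[e] for e in tree) for S in scenarios)
        if worst < best:
            best, best_tree = worst, tree
    return best, best_tree

# Triangle on 3 vertices, edges 0:(0,1), 1:(1,2), 2:(0,2), two scenarios.
edges = [(0, 1), (1, 2), (0, 2)]
scenarios = [[1, 5, 2], [4, 1, 3]]
print(minmax_spanning_tree(3, edges, scenarios))
```

Note that the tree that is optimal under a single scenario need not be the min-max optimum, since the adversary picks the worst scenario after the tree is fixed.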
In this section, we study the Min-max Spanning Tree and Min-max Regret Spanning Tree problems. We improve the results obtained in [2, 14] by showing that both problems are hard to approximate within a ratio of O(log^{1-ǫ} n) for any ǫ > 0, unless the problems in NP have quasi-polynomial time algorithms. We then provide an LP-based randomized algorithm with approximation ratio of O(log^2 n) for Min-max Spanning Tree.

We reduce a variant of the Label Cover problem (see, e.g., [5, 19]) to Min-max Spanning Tree.

Label Cover:
Input: A regular bipartite graph G = (V, W, E), E ⊆ V × W; an integer N that defines the set of labels, which are the integers in {1, ..., N}; for every edge (v, w) ∈ E a partial map σ_{v,w} : {1, ..., N} → {1, ..., N}. A labeling of the instance L = (G, N, {σ_{v,w}}_{(v,w) ∈ E}) is a function l assigning a nonempty set of labels to each vertex in V ∪ W, namely l : V ∪ W → 2^{{1,...,N}}. A labeling satisfies an edge (v, w) ∈ E if

∃ a ∈ l(v), ∃ b ∈ l(w) : σ_{v,w}(a) = b.

A total labeling is a labeling that satisfies all edges. The value of a total labeling l is max_{x ∈ V ∪ W} |l(x)|.
Output: A total labeling of the minimum value. This value is denoted by val(L).

We now recall the following theorem [5, 19]:

Theorem 1. There exists a constant γ > 0 so that for any language L ∈ NP, any input w and any N > 0, one can construct an instance L of Label Cover with |w|^{O(log N)} vertices and a label set of size N, so that:
w ∈ L ⇒ val(L) = 1,
w ∉ L ⇒ val(L) ≥ N^γ.
Furthermore, L can be constructed in time polynomial in its size.

We now state and prove the theorem, which is essential in showing the hardness results for the problems of interest.
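As a concrete illustration of the Label Cover definitions (a toy instance of our own; the partial maps below are invented), the following sketch checks when a labeling satisfies an edge and computes the value of a total labeling:

```python
# sigma[(v, w)] encodes the partial map of edge (v, w) as a dict a -> b.
def satisfies(sigma, labeling, edge):
    """An edge (v, w) is satisfied if some a in l(v) maps to some b in l(w)."""
    v, w = edge
    return any(sigma[edge].get(a) in labeling[w] for a in labeling[v])

def value(sigma, labeling, edges):
    """Value of a total labeling: the size of the largest label set used."""
    assert all(satisfies(sigma, labeling, e) for e in edges)  # totality
    return max(len(s) for s in labeling.values())

# Bipartite graph with V = {"v"}, W = {"w1", "w2"} and labels {1, 2}.
edges = [("v", "w1"), ("v", "w2")]
sigma = {("v", "w1"): {1: 1}, ("v", "w2"): {2: 2}}

# Satisfying both edges forces two labels on v, so the value is 2.
labeling = {"v": {1, 2}, "w1": {1}, "w2": {2}}
print(value(sigma, labeling, edges))
```

This mirrors the gap in Theorem 1: "easy" instances admit a total labeling of value 1, while "hard" instances force a large label set on some vertex.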
Theorem 2. There exists a constant γ > 0 so that for any language L ∈ NP, any input w, any N > 0 and any g ≤ N^γ, one can construct an instance T of Min-max Spanning Tree in time O(|w|^{O(g log N)} N^{O(g)}), so that:
w ∈ L ⇒ OPT(T) ≤ 1,
w ∉ L ⇒ OPT(T) ≥ g.

Proof. Let L be a language in NP and let L = (G = (V, W, E), N, {σ_{v,w}}_{(v,w) ∈ E}) be the instance of Label Cover from Theorem 1 constructed for L. Let us introduce some additional notation:
• δ(x) is the set of edges of G incident to vertex x ∈ V ∪ W,
• N_{v,w} = {(a, b) ∈ N × N : σ_{v,w}(a) = b}.
We now transform L into an instance T of Min-max Spanning Tree. Let us fix g ≤ N^γ, where γ is the constant from Theorem 1. We first construct graph G′ in the following way. We replace every edge (v, w) ∈ E with the paths (v, u^{v,w}_{a,b}, w^v) for all (a, b) ∈ N_{v,w} (see Figure 1). The edges of the form (u^{v,w}_{a,b}, w^v) (the dashed edges) are called dummy edges and the edges of the form (v, u^{v,w}_{a,b}) (the solid edges) are called label edges. We say that label edge (v, u^{v,w}_{a,b}) assigns label a to v and label b to w. We will denote the obtained component by G_{v,w} and we will use E^l_{v,w} to denote the set of all label edges of G_{v,w}; obviously |E^l_{v,w}| = |N_{v,w}|. We finish the construction of G′ by adding an additional vertex s and connecting all the components by additional dummy edges (s, v) for all v ∈ V. A sample graph G′, where G is K_{3,3}, is shown in Figure 2.

We now form scenario set Γ. We first note that all dummy edges have costs equal to 0 under all scenarios. We say that two label edges are label-distinct if they do not assign the same label to any vertex v or w. Namely, (v, u^{v,w}_{a_i,b_i}) and (v′, u^{v′,w′}_{a′_i,b′_i}) are label-distinct if a_i = a′_i implies v ≠ v′ and b_i = b′_i implies w ≠ w′. Consider vertex v ∈ V, for which there is the set of p = |δ(v)| components G_v = {G_{v,w_1}, ..., G_{v,w_p}}. For every subset F ⊆ G_v of exactly g components, F = {G_{v,w_{i_1}}, ..., G_{v,w_{i_g}}}, and for every g-tuple of pairwise label-distinct edges ((v, u^{v,w_{i_1}}_{a_1,b_1}), ..., (v, u^{v,w_{i_g}}_{a_g,b_g})) ∈ E^l_{v,w_{i_1}} × ··· × E^l_{v,w_{i_g}}, we form a scenario under which all these edges have cost 1 and all the remaining edges have cost 0. We repeat this procedure for all vertices v ∈ V. Consider then vertex w ∈ W, for which there is the set of q = |δ(w)| components G_w = {G_{v_1,w}, ..., G_{v_q,w}}. For every subset F ⊆ G_w of exactly g components, F = {G_{v_{i_1},w}, ..., G_{v_{i_g},w}}, and for every g-tuple of pairwise label-distinct edges ((v_{i_1}, u^{v_{i_1},w}_{a_1,b_1}), ..., (v_{i_g}, u^{v_{i_g},w}_{a_g,b_g})) ∈ E^l_{v_{i_1},w} × ··· × E^l_{v_{i_g},w}, we form a scenario under which all these edges have cost 1 and all the remaining edges have cost 0. We repeat this for all vertices w ∈ W. In order to ensure Γ ≠ ∅, we include in Γ the scenario in which every edge has zero cost.

Figure 1: Replacing edge (v, w) ∈ E with component G_{v,w}.

Figure 2: A sample graph G′, where graph G in L is K_{3,3}.

Assume that w ∈ L and thus val(L) = 1. Thus, there exists a total labeling l satisfying all edges in G such that max_{x ∈ V ∪ W} |l(x)| = 1. Each edge (v_i, w_i) ∈ E in G corresponds to exactly one component G_{v_i,w_i} in G′. Let (a_i, b_i) be the pair of labels satisfying the edge (v_i, w_i) in the total labeling l, i.e. a_i ∈ l(v_i) and b_i ∈ l(w_i). We form a spanning tree T in G′ by adding exactly one label edge (v_i, u^{v_i,w_i}_{a_i,b_i}) from every component G_{v_i,w_i}, and we complete the construction by adding a necessary number of dummy edges. Since the labeling l is such that max_{x ∈ V ∪ W} |l(x)| = 1, no pair of label-distinct edges has been chosen while constructing T, so Σ_{e ∈ T} c_e^S ≤ 1 under every S ∈ Γ and consequently max_{S ∈ Γ} Σ_{e ∈ T} c_e^S ≤ 1.

Assume now that w ∉ L and thus max_{x ∈ V ∪ W} |l(x)| ≥ N^γ ≥ g for all total labelings l. Consider any spanning tree T in G′. Without loss of generality, we can assume that T contains exactly one label edge from every component G_{v,w}. The set of all label edges contained in T corresponds to a total labeling l of L. Since |l(x)| ≥ g for some vertex x ∈ V ∪ W, we have to use at least g distinct labels in the labeling l. Suppose that x = v ∈ V and we use distinct labels a_1, ..., a_g for v. Then T contains pairwise label-distinct edges (v, u^{v,w_i}_{a_i,b_i}), i = 1, ..., g, and Σ_{e ∈ T} c_e^S = g under the scenario S that corresponds to this g-tuple of edges. The reasoning for x = w, w ∈ W, is the same. In consequence, max_{S ∈ Γ} Σ_{e ∈ T} c_e^S ≥ g and OPT(T) ≥ g.

Let us now examine the size of the resulting instance of Min-max Spanning Tree. The size of the set of edges E′ is at most |V| + 2|E|N^2, the size of the set of vertices V′ is at most 1 + |V| + |E|N^2 + |W||V|, and the number of scenarios is at most 1 + 2|E|^g N^{2g}. Hence, and from |E| = |w|^{O(log N)}, we deduce that the size of the constructed instance (G′, Γ) is |w|^{O(g log N)} N^{O(g)}, so it can be constructed in O(|w|^{O(g log N)} N^{O(g)}) time.

From Theorem 2, we obtain the following result:

Theorem 3.
The Min-max Spanning Tree problem with nonnegative edge costs under all scenarios is not approximable within O(log^{1-ǫ} n) for any ǫ > 0, where n is the input size, unless NP ⊆ DTIME(n^{poly log n}).

Proof. Let γ be the constant from Theorem 2. For any β > 0, set g = log^β |w| and N = log^{O(β)} |w|, so that the inequality g ≤ N^γ is satisfied for the constant γ (see Theorem 2). The input size of the resulting instance (G′, Γ) from Theorem 2 is n = |w|^{O(g log N)} N^{O(g)} = |w|^{O(log^{β+δ} |w|)} for some constant δ > 0, so it can be constructed in O(|w|^{poly log |w|}) time. Since g = log^β |w| and n = 2^{O(log^{β+δ+1} |w|)}, we get g = Ω(log^{β/(β+δ+1)} n), and the gap is log^{1-ǫ} n for any ǫ > 0, provided β is chosen sufficiently large.

Corollary 1.
The Min-max Regret Spanning Tree problem is not approximable within O(log^{1-ǫ} n) for any ǫ > 0, where n is the input size, unless NP ⊆ DTIME(n^{poly log n}).

Proof. The corollary follows easily if we assume that each component G_{v,w} in the construction from Theorem 2 has at least 2 label edges or, equivalently, that every edge in the instance of Label Cover has at least two pairs of labels. In this case, under every scenario S ∈ Γ, there is a spanning tree of cost 0 (recall that we never assign two 1's to the same component in S). Hence C*(S) = 0 for every S ∈ Γ, the maximal regret of every spanning tree equals its maximal cost, and the proof is completed. If some edge in the instance of Label Cover has only one pair of labels, then this pair trivially forces an assignment of labels to two vertices, which (after checking consistency with the other edges) can be removed from the instance before applying the construction from Theorem 2.

Up to this point we have assumed that the edge costs under all scenarios are nonnegative. The following theorem demonstrates that violating this assumption makes the Min-max Spanning Tree problem not at all approximable:
Theorem 4. If both positive and negative costs are allowed, then the Min-max Spanning Tree problem is not at all approximable, unless P=NP, even for edge series-parallel graphs.

Proof. We show a gap-introducing reduction from 3-SAT, which is known to be strongly NP-complete [13].

3-SAT:
Input: A set U = {x_1, ..., x_n} of Boolean variables and a collection C = {C_1, ..., C_m} of clauses, where every clause in C has exactly three distinct literals.
Question: Is there an assignment to U that satisfies all clauses in C?

We will assume that in the instance of 3-SAT, for every variable x_i both x_i and ∼x_i appear in C. Obviously, under such an assumption 3-SAT remains strongly NP-complete. Given an instance of 3-SAT, we construct an instance of Min-max Spanning Tree as follows. For each clause C_i = (l_i^1 ∨ l_i^2 ∨ l_i^3) we create a graph G_i composed of 5 vertices: s_i, v_i^1, v_i^2, v_i^3, t_i. The edges (s_i, v_i^1), (s_i, v_i^2), (s_i, v_i^3) correspond to the literals in C_i, and the edges (v_i^1, t_i), (v_i^2, t_i), (v_i^3, t_i) have costs equal to −1 under all scenarios. To obtain graph G = (V, E) with |V| = 4m + 1, |E| = 6m, we identify vertex t_i of G_i with vertex s_{i+1} of G_{i+1} for i = 1, ..., m − 1. Note that the resulting graph G is edge series-parallel. Finally, we form scenario set Γ as follows. For every pair of edges of G, (s_i, v_i^j) and (s_q, v_q^r), that correspond to contradictory literals l_i^j and l_q^r, i.e. l_i^j = ∼l_q^r, we create a scenario S such that under this scenario the costs of the edges (s_i, v_i^j) and (s_q, v_q^r) are set to 4m − 1 and the costs of all the remaining edges are set to −1. It is easy to verify that each spanning tree T in the constructed instance has nonnegative maximal cost over all scenarios.

Suppose that 3-SAT is satisfiable. Then there exists a spanning tree T of G containing exactly 4m edges that do not correspond to contradictory literals. Thus, under every scenario S, the tree contains at most one edge with the cost 4m − 1 and at least 4m − 1 edges with the cost −1. In consequence we get Σ_{e ∈ T} c_e^S ≤ 0 under every S ∈ Γ and OPT = 0. If 3-SAT is unsatisfiable, then every spanning tree T of G contains at least two edges which correspond to contradictory literals, and so OPT = max_{S ∈ Γ} Σ_{e ∈ T} c_e^S ≥ m. Consequently, Min-max Spanning Tree is not approximable, unless P=NP. Otherwise, any polynomial time approximation algorithm applied to the constructed instance could decide if an instance of 3-SAT is satisfiable.
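The scenario structure of the reduction above can be sketched as follows. The encoding and the helper function are our own, and the cost values 4m − 1 and −1 follow our reading of the proof; a literal x_i is encoded as +i and its negation as −i.

```python
clauses = [(1, 2, 3), (-1, 2, -3)]          # a toy satisfiable 3-SAT instance
m = len(clauses)

# One label edge per literal occurrence: (clause index, position, literal).
literal_edges = [(i, j, clauses[i][j]) for i in range(m) for j in range(3)]

# One scenario per pair of edges carrying contradictory literals; under it
# those two edges cost 4m - 1 and every other edge costs -1.
scenarios = [(e1, e2)
             for idx, e1 in enumerate(literal_edges)
             for e2 in literal_edges[idx + 1:]
             if e1[2] == -e2[2]]

def tree_cost(chosen, scenario):
    # A spanning tree consists of one literal edge per clause (`chosen`)
    # plus 3m dummy edges, i.e. 4m edges in total; k of them are the
    # expensive scenario edges, the remaining 4m - k edges cost -1 each.
    k = sum(e in scenario for e in chosen)
    return k * (4 * m - 1) - (4 * m - k)

# Pick one true literal per clause under x1 = x2 = x3 = True: the chosen
# edges are pairwise non-contradictory, so every scenario cost is <= 0.
chosen = [(0, 0, 1), (1, 1, 2)]
print(max(tree_cost(chosen, S) for S in scenarios))
```

A tree forced to use both edges of a contradictory pair pays 2(4m − 1) − (4m − 2) = 4m under the corresponding scenario, which reproduces the gap of the proof.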
If the edge costs are nonnegative under all scenarios, then the Min-max Spanning Tree problem is approximable within K, where K is the number of scenarios, and this is the best approximation ratio known so far [3]. On the other hand, the problem is not at all approximable if negative costs are allowed (Theorem 4). In this section, we assume that all costs are nonnegative and we give a polynomial time approximation algorithm for the problem which returns an O(log^2 n)-approximate spanning tree, where n is the number of vertices of G. The algorithm is based on a randomized rounding of a solution to an iterative linear program.

It is easy to check that binary solutions to the following program LP_minmax(C) are in one-to-one correspondence with the solutions to Min-max Spanning Tree whose edge costs in every scenario are at most C:

LP_minmax(C):
Σ_{e ∈ E} c_e^S x_e ≤ C  ∀S ∈ Γ, (4)
Σ_{e ∈ E} x_e = n − 1, (5)
Σ_{e ∈ δ(W)} x_e ≥ 1  ∀W ⊂ V, W ≠ ∅, (6)
0 ≤ x_e ≤ 1  ∀e ∈ E, (7)
if c_e^S > C then x_e = 0  ∀e ∈ E and ∀S ∈ Γ, (8)

where δ(W) denotes the cut determined by vertex set W, i.e. δ(W) = {(i, j) ∈ E : i ∈ W, j ∈ V \ W}. The core of LP_minmax(C) (constraints (5)-(7)) is the relaxation of the cut-set formulation for spanning tree [18]. The polynomial time solvability of LP_minmax(C) follows from an efficient polynomial time separation based on the min-cut problem (see [18]). Solving LP_minmax(C) consists in rejecting all edges e ∈ E having c_e^S > C under some scenario S ∈ Γ and then solving the resulting linear programming problem. Using binary search in [0, (n − 1)c_max], where c_max = max_{e ∈ E} max_{S ∈ Γ} c_e^S, one can find the minimal value of parameter C for which there is a feasible solution to LP_minmax(C). Let Ĉ be this minimal value and let (x̂_e)_{e ∈ E} be a feasible solution to LP_minmax(Ĉ). Clearly Ĉ ≤ OPT. Furthermore, if x̂_e > 0, then c_e^S ≤ Ĉ and thus c_e^S ≤ OPT for each scenario S ∈ Γ.

We now give an algorithm that randomly rounds a feasible solution of LP_minmax(Ĉ) to an O(log^2 n)-approximate min-max spanning tree (see Algorithm 1).

Algorithm 1: Randomized algorithm for Min-max Spanning Tree
Use binary search in [0, (n − 1)c_max] to find the minimal value of C such that there exists a feasible solution to LP_minmax(C), i.e., Ĉ and (x̂_e)_{e ∈ E}.
Initially F̂ contains only the vertices of G, that is n components.
r ← ⌈2(11 + √21) ln n⌉
for k ← 1 to r do
  For all e ∈ E, add edge e independently with probability x̂_e to F̂.
  if F̂ is connected then exit for-loop
if F̂ is connected then return a spanning tree of F̂

Let us analyze Algorithm 1. Obviously the algorithm is polynomial. The following lemma shows that the total cost of the edges included in each iteration under any scenario S ∈ Γ is O(ln n)·OPT with high probability:

Lemma 1.
Let Ê_k be the set of edges added to F̂ at iteration k of Algorithm 1 and let K ≤ n^{ρ_1}, 1 ≤ f ≤ n^{ρ_2}, where f, ρ_1, ρ_2, ρ_3 are nonnegative constants such that ρ_1 + ρ_2 ≤ 2.25·ρ_3 and ρ_3 ≥ 1. Then

max_{S ∈ Γ} Σ_{e ∈ Ê_k} c_e^S ≤ (ρ_3 ln n + 1.5 √((ln K + ln f) ρ_3 ln n)) OPT (9)

holds with probability at least 1 − (f n^{ρ_3 − 1})^{−1}.

Proof. See Appendix A.

We now analyze the feasibility of an output solution F̂. Let F̂_k be the forest obtained from F̂_{k−1} after the k-th iteration. Initially, F̂_0, F̂_0 ⊂ G, has no edges. Let C_k denote the number of connected components of F̂_k. Obviously, C_0 = n. We say that an iteration k is "successful" if either C_{k−1} = 1 (F̂_{k−1} is connected) or C_k < 0.75·C_{k−1}; otherwise, it is a "failure". We now recall a result of Alon [4] (see also [9]). His proof is repeated in Appendix A for completeness.

Lemma 2 (Alon [4]). For every k, the conditional probability that iteration k is "successful", given any set of components in F̂_{k−1}, is at least 1/2.

From Lemma 2, it follows that the probability of the event that iteration k is "successful" is at least 1/2. This is a lower bound on the probability of success given any history. Note that, if forest F̂_k is not connected (C_k > 1), then the number of "successful" iterations has been less than log_{4/3} n < 10 ln n. Let X be a random variable denoting the number of "successful" iterations among the r performed iterations of the algorithm. The probability Pr[X < 10 ln n] can be upper bounded by Pr[Y < 10 ln n], where Y = Σ_{k=1}^{r} Y_k is the sum of r independent Bernoulli trials such that Pr[Y_k = 1] = 1/2. This estimation can be done, since we have a lower bound on the probability of success given any history. Clearly, E[Y] = r/2. We apply the Chernoff bound (see for instance [20]) and determine the values of δ ∈ (0, 1] and r in order to fulfill the following inequality:

Pr[X < 10 ln n] ≤ Pr[Y < 10 ln n] = Pr[Y < (1 − δ)E[Y]] < e^{−E[Y]δ²/2} = 1/n. (10)

It is easily seen that inequality (10) holds if the following system of equations

(1 − δ)r/2 = 10 ln n,  rδ²/4 = ln n (11)

holds true. An easy computation for δ and r in (11) shows that r = 2(11 + √21) ln n and δ = (√21 − 1)/10. Hence, after r iterations, r = ⌈2(11 + √21) ln n⌉, we obtain with probability at least 1 − 1/n a spanning tree. By the union bound and Lemma 1 (set f = r), with probability at least 1 − 1/n, in every iteration k = 1, ..., r, the set of edges Ê_k included at iteration k satisfies the bound (9). We conclude that after r iterations, we get with probability at least 1 − 2/n a spanning tree whose total cost in every scenario is O(r ln n)·OPT = O(ln^2 n)·OPT. We have thus proved the following theorem:

Theorem 5.
There is a polynomial time randomized algorithm for Min-max Spanning Tree that returns with probability at least 1 − 2/n a solution whose total cost in every scenario is O(log^2 n)·OPT.

In this section, we discuss the 2-Stage Spanning Tree problem in the robust optimization setting. We show that the problem is hard to approximate within a ratio of O(log n) unless the problems in NP have quasi-polynomial time algorithms. Then, we give an LP-based randomized approximation algorithm with a ratio of O(log^2 n).

Theorem 6.
The 2-Stage Spanning Tree problem is not approximable within any constant, unless P=NP, and within (1 − ǫ) ln n for any ǫ > 0, unless NP ⊆ DTIME(n^{log log n}).

Proof. We proceed with a cost preserving reduction from Set Cover to 2-Stage Spanning Tree. The reduction is similar to that in [12] for the 2-stage stochastic spanning tree. Set Cover is defined as follows (see, e.g., [5, 13]):

Set Cover:
Input: A ground set U = {1, ..., n} and a collection of its subsets U_1, ..., U_m such that ∪_{i=1}^{m} U_i = U. A subcollection I ⊆ {1, ..., m} covers U if ∪_{i ∈ I} U_i = U, where |I| is the size of the subcollection.
Output: A minimum sized subcollection that covers U.

The Set Cover problem is not approximable within any constant, unless P=NP, and within (1 − ǫ) ln n for any ǫ > 0, unless NP ⊆ DTIME(n^{log log n}), where n is the size of the ground set (see [6, 11]). For a given instance C = (U, U_1, ..., U_m) of Set Cover, we construct an instance T = (G = (V, E), Γ) of 2-Stage Spanning Tree as follows. Graph G = (V, E) is a complete graph with m + n + 1 vertices V = {u_1, ..., u_m, 1, ..., n, r}. Vertices u_1, ..., u_m correspond to the m subsets U_1, ..., U_m and vertices 1, ..., n correspond to the n elements of set U. The costs of the edges (r, u_i), i = 1, ..., m, in G in the first stage are set to 1 and the costs of all the remaining edges in G are set to m + 1. Now we form scenario set Γ in the second stage. Each scenario S_j ∈ Γ corresponds to vertex j, j = 1, ..., n. Let T_j = {j} ∪ {u_i : j ∈ U_i} and let (T_j, V \ T_j) be the cut separating T_j from all the other vertices of G. Each second stage scenario S_j is defined as follows: the costs of the edges from the cut (T_j, V \ T_j) are set to m + 1 and the costs of the remaining edges in G are set to 0.

We now prove that there is a subcollection of size at most k ≤ m that covers U if and only if there exists a spanning tree in G of the maximum 2-stage cost at most k ≤ m. Suppose we are given a subcollection U_{i_1}, ..., U_{i_k} of size k that covers U. In the first stage, we include in E_1 the edges (r, u_{i_j}), where the vertices u_{i_j} correspond to the subsets U_{i_j}, j = 1, ..., k. The cost of E_1 is equal to k. In the second stage, we augment E_1 to form a spanning tree with edges of cost zero in each scenario S_j, j = 1, ..., n. Hence, the maximum 2-stage cost of the obtained spanning tree equals k. Conversely, let T be a spanning tree in G with the maximum 2-stage cost at most k. Hence, this tree does not contain any edge with cost m + 1. Consequently, in the first stage the tree contains k′ ≤ k edges of the form (r, u_{i_j}), j = 1, ..., k′, and in the second stage in each scenario it contains zero cost edges. The vertices u_{i_j} correspond to the subsets U_{i_j}, j = 1, ..., k′. It is easily seen that any element i ∈ U must be covered by at least one of the subsets U_{i_j}, j = 1, ..., k′. Otherwise the solution would contain an edge of cost m + 1. Thus, U_{i_j}, j = 1, ..., k′, form a subcollection of size at most k that covers U.

The presented reduction is cost preserving. Hence, 2-Stage Spanning Tree has the same approximation bounds as Set Cover.

In this section we construct a randomized approximation algorithm for 2-Stage Spanning Tree, which is based on a similar idea as the corresponding algorithm for Min-max Spanning Tree (see Section 2.2). Consider the following program LP_2stage(C), whose binary solutions correspond to the solutions of 2-Stage Spanning Tree:

LP_2stage(C):
Σ_{e ∈ E} c_e x_e + Σ_{e ∈ E} c_e^S x_e^S ≤ C  ∀S ∈ Γ,
Σ_{e ∈ E} (x_e + x_e^S) = n − 1  ∀S ∈ Γ,
Σ_{e ∈ δ(W)} (x_e + x_e^S) ≥ 1  ∀W ⊂ V, W ≠ ∅, ∀S ∈ Γ,
0 ≤ x_e, x_e^S ≤ 1  ∀e ∈ E, ∀S ∈ Γ,
if c_e > C then x_e = 0  ∀e ∈ E,
if c_e^S > C then x_e^S = 0  ∀e ∈ E, ∀S ∈ Γ.

Algorithm 2 randomly rounds a feasible solution x̂_e, x̂_e^S, S ∈ Γ, e ∈ E, of LP_2stage(Ĉ), where Ĉ denotes the minimal value of C for which there is a feasible solution to LP_2stage(C).

Algorithm 2:
Randomized algorithm for 2-Stage Spanning Tree
c_max ← max_{e ∈ E} {c_e, max_{S ∈ Γ} c_e^S}
Use binary search in [0, (n − 1)c_max] to find the minimal value of C such that there exists a feasible solution of LP_2stage(C), i.e., x̂_e, x̂_e^S, S ∈ Γ, e ∈ E.
Initially F̂_S contains only the vertices of G, for each S ∈ Γ.
r ← ⌈(√(ln n + ln K) + √(21 ln n + ln K))²⌉
for k ← 1 to r do
  In the first stage: for all e ∈ E, choose edge e independently with probability x̂_e and add it to each F̂_S, S ∈ Γ.
  In the second stage: for every S ∈ Γ and every e ∈ E, add edge e independently with probability x̂_e^S to F̂_S.
  if all F̂_S, S ∈ Γ, are connected then return {F̂_S}_{S ∈ Γ}

An analysis of Algorithm 2 proceeds similarly to that of Algorithm 1. The following lemma holds (the proof goes in a similar manner to the proof of Lemma 1):
Lemma 3. Let Ê_k and Ê_k^S be the sets of edges in the first stage and in the second stage for every S ∈ Γ, respectively, added to F̂_S at iteration k of Algorithm 2, and let K ≤ n^{ρ_1}, 1 ≤ f ≤ n^{ρ_2}, where f, ρ_1, ρ_2, ρ_3 are nonnegative constants such that ρ_1 + ρ_2 ≤ 2.25·ρ_3 and ρ_3 ≥ 1. Then

Σ_{e ∈ Ê_k} c_e + Σ_{e ∈ Ê_k^S} c_e^S ≤ (ρ_3 ln n + 1.5 √((ln K + ln f) ρ_3 ln n)) OPT  ∀S ∈ Γ (12)

holds with probability at least 1 − (f n^{ρ_3 − 1})^{−1}.

Let F̂_k^S be the forest for S ∈ Γ after the k-th iteration of Algorithm 2, and let C_k^S denote the number of connected components of F̂_k^S. Again, we say that an iteration k is "successful" if either C_{k−1}^S = 1 or C_k^S < 0.75·C_{k−1}^S; otherwise it is a "failure". The probability of the event that iteration k is "successful" is at least 1/2, which is due to Lemma 2.

Consider any scenario S ∈ Γ. If forest F̂_k^S is not connected, then the number of "successful" iterations is less than log_{4/3} n < 10 ln n. We estimate Pr[X < 10 ln n] by Pr[Y < 10 ln n], where X is a random variable denoting the number of "successful" iterations among the r iterations and Y = Σ_{k=1}^{r} Y_k is the sum of r independent Bernoulli trials such that Pr[Y_k = 1] = 1/2. Clearly, E[Y] = r/2. We use the Chernoff bound and compute the values of δ ∈ (0, 1] and r satisfying the following inequality:

Pr[X < 10 ln n] ≤ Pr[Y < 10 ln n] = Pr[Y < (1 − δ)E[Y]] < e^{−E[Y]δ²/2} = 1/(nK). (13)

This gives r = (√(ln n + ln K) + √(21 ln n + ln K))² and δ = 2√(ln n + ln K)/(√(ln n + ln K) + √(21 ln n + ln K)). Recall that K is the number of scenarios. By the union bound, the probability that the forest in at least one scenario S is not connected is less than 1/n. Again, by the union bound and Lemma 3 (set f = r), with probability at least 1 − 1/n, in every iteration k = 1, ..., r, the sets of edges Ê_k and Ê_k^S for each S ∈ Γ, included at iteration k, satisfy the bound (12). Thus, after r iterations, r = ⌈(√(ln n + ln K) + √(21 ln n + ln K))²⌉, with probability at least 1 − 2/n, we obtain spanning trees of cost O(r ln n)·OPT in every scenario. We get the following theorem:

Theorem 7.
There is a polynomial time randomized algorithm for 2-Stage Spanning Tree that returns with probability at least 1 − 2/n a spanning tree whose cost in every scenario is O(log^2 n)·OPT.

References

[1] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin.
Network Flows: Theory, Algorithms, and Applications. Prentice Hall, Englewood Cliffs, New Jersey, 1993.

[2] H. Aissi, C. Bazgan, and D. Vanderpooten. Approximation complexity of min-max (regret) versions of shortest path, spanning tree, and knapsack. In ESA 2005, volume 3827 of Lecture Notes in Computer Science, pages 789–798. Springer-Verlag, 2005.

[3] H. Aissi, C. Bazgan, and D. Vanderpooten. Approximation of min-max (regret) versions of some polynomial problems. In COCOON 2006, volume 4112 of Lecture Notes in Computer Science, pages 428–438. Springer-Verlag, 2006.

[4] N. Alon. A note on network reliability. In D. Aldous, P. Diaconis, J. Spencer, and J. M. Steele, editors, Discrete Probability and Algorithms, volume 72 of IMA Volumes in Mathematics and its Applications, pages 11–14. Springer-Verlag, 1995.

[5] S. Arora and C. Lund. Hardness of approximations. In D. Hochbaum, editor, Approximation Algorithms for NP-Hard Problems. PWS, 1995.

[6] M. Bellare, O. Goldreich, and M. Sudan. Free Bits, PCPs and Non-Approximability - Towards Tight Results. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS 1995), pages 422–431. IEEE Computer Society, 1995.

[7] J. R. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer-Verlag, 1997.

[8] K. Dhamdhere, V. Goyal, and R. Ravi. Pay Today for a Rainy Day: Improved Approximation Algorithms for Demand-Robust Min-Cut and Shortest Path Problems. In STACS 2006, volume 3884 of Lecture Notes in Computer Science, pages 206–217. Springer-Verlag, 2006.

[9] K. Dhamdhere, R. Ravi, and M. Singh. On Two-Stage Stochastic Minimum Spanning Trees. In M. Jünger and V. Kaibel, editors, IPCO 2005, volume 3509 of Lecture Notes in Computer Science, pages 321–334. Springer-Verlag, 2005.

[10] B. Escoffier, L. Gourvès, J. Monnot, and O. Spanjaard. Two-stage stochastic matching and spanning tree problems: Polynomial instances and approximation. European Journal of Operational Research, 205:19–30, 2010.

[11] U. Feige. A Threshold of ln n for Approximating Set Cover. Journal of the ACM, 45:634–652, 1998.

[12] A. D. Flaxman, A. M. Frieze, and M. Krivelevich. On the random 2-stage minimum spanning tree. Random Structures and Algorithms, 28:24–36, 2006.

[13] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.

[14] A. Kasperski and P. Zieliński. On the approximability of minmax (regret) network optimization problems. Information Processing Letters, 109:262–266, 2009.

[15] A. Kasperski and P. Zieliński. A randomized algorithm for the min-max selecting items problem with uncertain weights. Annals of Operations Research, 172:221–230, 2009.

[16] I. Katriel, C. Kenyon-Mathieu, and E. Upfal. Commitment under uncertainty: Two-stage matching problems. Theoretical Computer Science, 408:213–223, 2008.

[17] P. Kouvelis and G. Yu. Robust Discrete Optimization and Its Applications. Kluwer Academic Publishers, 1997.

[18] T. L. Magnanti and L. A. Wolsey. Optimal Trees. In M. O. Ball, T. L. Magnanti, C. L. Monma, and G. L. Nemhauser, editors, Network Models, Handbooks in Operations Research and Management Science, volume 7, pages 503–615. North-Holland, Amsterdam, 1995.

[19] M. Mastrolilli, N. Mutsanas, and O. Svensson. Approximating Single Machine Scheduling with Scenarios. In APPROX-RANDOM 2008, volume 5171 of Lecture Notes in Computer Science, pages 153–164. Springer-Verlag, 2008.

[20] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.

[21] R. Ravi and A. Sinha. Hedging Uncertainty: Approximation Algorithms for Stochastic Optimization Problems. Mathematical Programming, 108:97–114, 2006.

[22] M. H. van der Vlerk. Stochastic programming bibliography. http://mally.eco.rug.nl/spbib.html, 1996–2007.
A Some proofs
Proof. (Lemma 1) In order to prove the bound (9), we will apply a technique used in [16, 15]. Consider any scenario $S \in \Gamma$. Let us sort the costs in $S$ in nonincreasing order $c^S_{e[1]} \ge c^S_{e[2]} \ge \dots \ge c^S_{e[m]}$ ($m$ is the number of edges of $G$). We partition the ordered set of edges $E$ into groups as follows. The first group $G(1)$ consists of edges $e[1],\dots,e[j(1)]$, where $j(1)$ is the maximum index such that $\hat{x}_{e[1]} + \dots + \hat{x}_{e[j(1)]} \le \rho\ln n$. The subsequent groups $G(l)$, $l = 2,\dots,t$, are defined in the same way, that is, $G(l)$ consists of edges $e[j(l-1)+1],\dots,e[j(l)]$, where $j(l)$ is the maximum index such that $\hat{x}_{e[j(l-1)+1]} + \dots + \hat{x}_{e[j(l)]} \le \rho\ln n$. Since every $\hat{x}_e$ is at most 1 and each group except possibly the last one is maximal, $\sum_{e \in G(l)} \hat{x}_e \ge \rho\ln n - 1$ for $l = 1,\dots,t-1$. The optimal value $OPT$ satisfies:
\[
OPT \ge \widehat{C} \ge \sum_{i=1}^{m} c^S_{e[i]}\,\hat{x}_{e[i]} \ge \sum_{l=1}^{t}\Big(\min_{e \in G(l)} c^S_e\Big)\sum_{e \in G(l)} \hat{x}_e \ge (\rho\ln n - 1)\sum_{l=1}^{t-1}\min_{e \in G(l)} c^S_e. \quad (14)
\]
Let $\mathrm{X}_e$ be a binary random variable with $\Pr[\mathrm{X}_e = 1] = \hat{x}_e$. It holds
\[
\sum_{e \in \hat{E}_k} c^S_e \le \sum_{l=1}^{t}\sum_{e \in G(l)} c^S_e\,\mathrm{X}_e \le \sum_{l=1}^{t}\Big(\max_{e \in G(l)} c^S_e\Big)\sum_{e \in G(l)}\mathrm{X}_e \le \Big(\max_{e \in G(1)} c^S_e\Big)\sum_{e \in G(1)}\mathrm{X}_e + \sum_{l=2}^{t}\Big(\min_{e \in G(l-1)} c^S_e\Big)\sum_{e \in G(l)}\mathrm{X}_e. \quad (15)
\]
Let us recall a Chernoff bound (see, e.g., [20]). Suppose $\mathrm{X}_1,\dots,\mathrm{X}_N$ are independent Poisson trials such that $\Pr[\mathrm{X}_i = 1] = p_i$, and let $\mathrm{X} = \sum_{i=1}^{N}\mathrm{X}_i$. Then the inequality $\Pr[\mathrm{X} > \mathrm{E}[\mathrm{X}](1+\delta)] < e^{-\mathrm{E}[\mathrm{X}]\delta^2/4}$ holds for any $\delta \le 2e - 1$. We use this Chernoff bound to estimate $\sum_{e \in G(l)}\mathrm{X}_e$ in each group $G(l)$. Consider a group $G(l)$. It holds $\mathrm{E}[\sum_{e \in G(l)}\mathrm{X}_e] = \sum_{e \in G(l)}\hat{x}_e \le \rho\ln n$. Set $\delta = 2\sqrt{(\rho\ln n + \ln K + \ln f)/(\rho\ln n)}$. Since $K \le n^{\rho}$ and $1 \le f \le n^{\rho}$, we have $(\rho\ln n + \ln K + \ln f)/(\rho\ln n) \le 3$, so $\delta \le 2\sqrt{3} < 2e - 1$ and the bound applies:
\[
\Pr\Big[\sum_{e \in G(l)}\mathrm{X}_e > \rho\ln n\,(1+\delta)\Big] < e^{-(\rho\ln n + \ln K + \ln f)} = \frac{1}{fKn^{\rho}}. \quad (16)
\]
By the union bound, the probability that $\sum_{e \in G(l)}\mathrm{X}_e > \rho\ln n\,(1+\delta)$ holds for at least one group $G(l)$ is less than $1/(fKn^{\rho-1})$ (because the number of groups is at most $n$). Now applying the bound $\sum_{e \in G(l)}\mathrm{X}_e \le \rho\ln n\,(1+\delta)$ for every $l = 1,\dots,t$ to (15), and using the fact that $\max_{e \in G(1)} c^S_e \le OPT$ and inequality (14), we obtain:
\[
\sum_{e \in \hat{E}_k} c^S_e \le \rho\ln n\left(1 + 2\sqrt{\frac{\rho\ln n + \ln K + \ln f}{\rho\ln n}}\right)\left(OPT + \frac{OPT}{\rho\ln n - 1}\right).
\]
An easy computation shows that $\sum_{e \in \hat{E}_k} c^S_e = O\big(\rho\ln n + \sqrt{\rho\ln n\,(\ln K + \ln f)}\big)\, OPT$. The probability that the bound fails for a given scenario $S$ is less than $1/(fKn^{\rho-1})$, so, by the union bound, the probability that it fails for at least one scenario $S \in \Gamma$ is less than $1/(fn^{\rho-1})$.

Proof. (Lemma 2) If $\hat{F}_{k-1}$ is connected, then we are done. Otherwise, let us denote by $H = (V_H, E_H)$ the graph obtained from $\hat{F}_{k-1}$ by contracting each of its connected components to a single vertex. An edge $e$ is not included in $\hat{F}_k$ with probability $1 - \hat{x}_e$. Hence, the probability that a vertex $v$ of $H$ remains isolated is
\[
\prod_{e \in \delta(v)} (1 - \hat{x}_e) \le \exp\Big(-\sum_{e \in \delta(v)}\hat{x}_e\Big) \le 1/e,
\]
where $\delta(v)$ denotes the set of edges incident to $v$. The last inequality follows from the fact that $\sum_{e \in \delta(v)}\hat{x}_e \ge 1$. By linearity of expectation, the expected number of isolated vertices of $H$ is at most $|V_H|/e$, and thus, by Markov's inequality, with probability at least $1/2$ the number of isolated vertices of $H$ is at most $2|V_H|/e$. Every isolated vertex yields one connected component of $\hat{F}_k$, while the remaining vertices of $H$ are merged into components containing at least two vertices of $H$ each. Hence, with probability at least $1/2$, the number of connected components of $\hat{F}_k$ is at most
\[
\frac{2|V_H|}{e} + \frac{1}{2}\left(|V_H| - \frac{2|V_H|}{e}\right) = \left(\frac{1}{2} + \frac{1}{e}\right)|V_H| < 0.9\,|V_H|.
\]
Since $|V_H| = C_{k-1}$, we conclude that $C_k < 0.9\,C_{k-1}$ with probability at least $1/2$.
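The component-shrinking step behind Lemma 2 can be sanity-checked numerically. The sketch below is not taken from the paper: as a stand-in for an actual fractional LP solution restricted to the contracted graph $H$, it uses a complete graph on $n$ vertices with the uniform values $\hat{x}_e = 1/(n-1)$, so that $\sum_{e \in \delta(v)} \hat{x}_e = 1$ at every vertex, the weakest case allowed by the cut constraints. It samples each edge independently with probability $\hat{x}_e$, counts the resulting connected components with union-find, and checks that one sampling round shrinks the component count below $0.9n$ in at least half of the trials, as the lemma guarantees.

```python
import math
import random

def components_after_sampling(n, p, rng):
    """One rounding iteration on a contracted graph H = K_n:
    include each edge independently with probability p (playing the
    role of x_hat_e), then count connected components via union-find."""
    parent = list(range(n))

    def find(a):
        # Find with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    comps = n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    comps -= 1  # a union merges two components
    return comps

rng = random.Random(0)
n = 100
p = 1.0 / (n - 1)   # sum of x_hat over delta(v) equals exactly 1
trials = 300
good = sum(components_after_sampling(n, p, rng) <= 0.9 * n
           for _ in range(trials))

# The constant from the proof: 1/2 + 1/e < 0.9.
assert 0.5 + 1.0 / math.e < 0.9
# Lemma 2 promises success probability at least 1/2 per iteration.
assert good >= trials / 2
print(good / trials)
```

In practice the observed success fraction is far above the guaranteed $1/2$: the $2|V_H|/e$ bound on isolated vertices comes from Markov's inequality, which is loose for this concentrated quantity.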