Network Design with Coverage Costs
arXiv [cs.DS], Jul
Siddharth Barman ∗   Shuchi Chawla †   Seeun Umboh ‡

Abstract
We study network design with a cost structure motivated by redundancy in data traffic. We are given a graph, $g$ groups of terminals, and a universe of data packets. Each group of terminals desires a subset of the packets from its respective source. The cost of routing traffic on any edge in the network is proportional to the total size of the distinct packets that the edge carries. Our goal is to find a minimum cost routing. We focus on two settings. In the first, the collection of packet sets desired by source-sink pairs is laminar. For this setting, we present a primal-dual based 2-approximation, improving upon a logarithmic approximation due to Barman and Chawla (2012) [6]. In the second setting, packet sets can have non-trivial intersection. We focus on the case where each packet is desired by either a single terminal group or by all of the groups, and the graph is unweighted. For this setting we present an $O(\log g)$-approximation.

Our approximation for the second setting is based on a novel spanner-type construction in unweighted graphs that, given a collection of $g$ vertex subsets, finds a subgraph of cost only a constant factor more than the minimum spanning tree of the graph, such that every subset in the collection has a Steiner tree in the subgraph of cost at most $O(\log g)$ times that of its minimum Steiner tree in the original graph. We call such a subgraph a group spanner.

Some of the classical applications of the theory of algorithms are in transportation and commodity networks: how should commodities be transported from where they are manufactured to where they are consumed? How should pipelines be laid to be most effective at balancing costs with requirements? Questions such as these have led to some of the most basic problems and theorems in the area of approximation algorithms: network flow, traveling salesman, Steiner tree, flow-cut gaps, etc. Over time, solutions to these problems have come to be applied to a different class of networks, namely communication networks.
At a basic level, the problems in communication networks are similar: how should data be routed from its sources to its destinations? How should networks be designed to be able to handle different kinds of workload and traffic patterns? However, the underlying commodity in these networks, data, is fundamentally different from physical commodities. Unlike the latter, data can be compressed, encoded, or replicated, at virtually no cost. Network algorithms that do not exploit these properties fail to utilize the entire capacity of the network.

∗ California Institute of Technology. † University of Wisconsin – Madison. ‡ University of Wisconsin – Madison.

The last few years have seen a rapid growth in “content aware” network optimization solutions, both within the academic literature (see, e.g., [1, 24], and references therein) and in the form
of commercial technologies [8, 22]. One of the functionalities that these technologies provide is to remove duplicate traffic from the network. In particular, every router in the network equipped with such a technology keeps track of recently seen traffic. When duplicates are detected, a single copy of the duplicated data is sent forward along with a short message containing instructions for replication at the next router. This defines a cost function on every link in the network, where the cost of carrying data is proportional to the number (or total size) of distinct packets that the link carries; in other words, it is a coverage function over the set of traffic streams that use the link. We study network design problems within this context.

We consider the following framework. We are given a weighted network, and multiple commodities, each with a source and several possible destinations that we collectively call terminals. Each commodity is composed of a number of different data packets drawn from a universe of packets; we call these sets of packets demands. Importantly, there is redundancy in traffic: different commodities may overlap in the sets of packets they contain, and so can benefit from using common routes. Our goal is to find a minimum cost routing for the given traffic matrix, assuming that we can buy bandwidth at a fixed rate on every edge. Formally, our solution specifies for each commodity a routing tree spanning all of the terminals for this commodity. The cost of this solution on any particular edge is proportional to the total size of the distinct packets that the edge carries.
This problem was introduced in [6] where it was called redundancy aware network design.

Network design with coverage costs displays the same short-routes-versus-shared-routes trade-off present in several classical network design problems with nonlinear costs, such as rent-or-buy network design [17, 13], access network design [3], and buy-at-bulk network design [4, 14, 19, 26]. However, there are fundamental differences. The buy-at-bulk cost model is inspired by economies of scale in a physical commodity network: the volume of traffic that an edge carries is the sum of the volumes that the different commodities impose on it, and the routing cost on the edge is a concave function of the total volume of traffic. On the other hand, in our setting, the volume of traffic itself is lowered due to the inherent nature of data traffic. In particular, this means that the savings achieved depend on the contents of the traffic and not just its quantity. We not only need to bundle traffic streams as much as we can, but we also need to decide the right sets of traffic streams to bundle. Consequently, the approximability of the problem also depends on the extent and manner in which different commodities share packets. When every source-sink pair in the network demands a distinct packet, that is, there is no data redundancy in the network, the problem reduces to finding the shortest route for each pair. When all of the demands are identical, the problem reduces to finding a single optimal Steiner forest over all of the terminal sets.

In this paper we focus on two special cases of the network design problem with coverage costs: the laminar demands setting and the sunflower demands setting. In the laminar demands setting the packet sets corresponding to the commodities form a laminar family: the packet sets of any two commodities are either completely disjoint or one contains the other.
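The laminarity condition can be checked directly from its definition. The following is a minimal sketch only; the function name `is_laminar` and the representation of packet sets as Python sets are assumptions made for illustration and do not come from the paper.

```python
from itertools import combinations


def is_laminar(packet_sets):
    """Return True if every pair of sets is either disjoint or nested."""
    sets = [frozenset(s) for s in packet_sets]
    for a, b in combinations(sets, 2):
        # A violating pair overlaps without one set containing the other.
        if a & b and not (a <= b or b <= a):
            return False
    return True


# Hypothetical demands: {1} and {2} are disjoint, and both are inside {1, 2, 3}.
laminar_demands = [{1}, {2}, {1, 2, 3}]
# {0, 1} and {0, 2} share packet 0, but neither contains the other.
crossing_demands = [{0, 1}, {0, 2}]
```

The check is quadratic in the number of demand sets, which is adequate for a sanity test on small instances.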
There is a natural hierarchy over commodities in this setting and any commodity can use for free an edge that is being used for another commodity that “dominates” it. So we may favor long routes for a commodity if those routes share edges with a dominating commodity, in comparison to shorter ones that do not share edges. Less intuitively, it may be useful to pick similar routes for two commodities with disjoint packet sets if a portion of the shared route can be used for a commodity that dominates both. Consequently, commodities that are higher up in the hierarchy are in some sense more important than commodities that are lower in the hierarchy.

Non-laminar settings, where packet sets can have arbitrary intersection, also display sharing of paths among similar as well as dissimilar commodities. However, we cannot exploit any natural ordering over commodities in determining which paths to use. Our second setting captures the complexity introduced by non-trivial intersections. In the sunflower demands setting, every collection of demands has the same intersection. In other words, there is a common set of packets that belongs to every commodity, and every other packet belongs to exactly one commodity. A simple example of this setting is where each demand is of the form $\{0, i\}$; here 0 denotes the common packet, and $i$ denotes the packet belonging only to commodity $i$. Once again our goal is to construct a routing tree for each commodity of minimum total cost. The cost of the collection of routing trees has two components. The first corresponds to the total size of the union of the routing trees: we pay for the cost of routing the common packets on this entire subgraph.
The second corresponds to the costs of the individual trees, weighted by the sizes of their respective unique packets.

A standard approach in network optimization is to approximate a given network by a subgraph that is much cheaper or sparser than the entire graph, and yet faithfully captures some essential property of the graph. For example, spanners [21] are low-cost subgraphs that approximately capture shortest path distances between every pair of points in the graph. Likewise, cut- and flow-sparsifiers [20, 18] are sparse subgraphs that approximate cuts and flows in the graph respectively. Network design with coverage costs defines another such graph sparsification problem that may be of independent interest. In particular, for a given solution to the network design problem, consider partitioning the edges into sets that carry a particular packet. Each such set is a Steiner forest over the terminal sets that demand that packet. Our goal is to find a solution that minimizes a weighted sum of the sizes of these Steiner forests. One way of doing so may be to find a subgraph that induces Steiner forests over each respective set of terminals corresponding to a single packet, that are simultaneously approximately minimal for their corresponding instances. This approach is particularly relevant for the sunflower demands setting. In that setting, the Steiner forest corresponding to the common packets is the entire subgraph itself, whereas the forest corresponding to packets unique to a commodity is simply the routing tree constructed for that commodity. We therefore ask: is there a subgraph that $\alpha$-approximates the size of the minimum Steiner forest over the union of all terminal sets, and at the same time induces a Steiner tree over each individual terminal set that is within a factor of $\beta$ of the smallest such tree? We call such a subgraph an $(\alpha, \beta)$ group spanner.
Group spanners generalize spanners: if for every pair of nodes in the graph our instance contains a terminal set comprising the two nodes, then a group spanner for the instance simultaneously approximates the shortest path distances between every pair of nodes. The factor $\beta$ is called the stretch of the spanner.

The main technical component in our approach for the sunflower demands setting is a construction for group spanners in unweighted graphs where the union of all terminal sets spans the entire graph (see the footnote below). Our construction achieves an $(O(1), O(\log g))$ approximation, where $g$ is the number of commodities. This implies an $O(\log g)$ approximation for the sunflower demands setting under those assumptions. We leave open the problem of extending our construction to arbitrary weighted graphs.

Footnote: We note that the first assumption by itself, i.e. the graph is unweighted, is without loss of generality: since our approximation is with respect to the sizes of the subgraphs, and not with respect to the number of edges, we can break up each long edge into edges of equal size by introducing new nodes. However, the additional assumption that every vertex belongs to some terminal set disallows this sort of transformation.

For the laminar demands setting we obtain a 2-approximation in general graphs. To form intuition for this setting consider an instance with $k$ different packets and $k + 1$ commodities: for $i \le k$ the demand set of commodity $i$ contains only packet $i$, and the demand set of commodity $k + 1$ contains all of the $k$ packets. Suppose also that every commodity has a single source and a single sink. Then, one approach to solving the problem is to first find a least cost path for commodity $k + 1$, and then find least cost paths for the remaining commodities using the edges in the first path for free. This approach misses solutions where a slightly longer path for commodity $k + 1$ is much more cost efficient for the remaining commodities than the shortest path for $k + 1$.
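A small instance, constructed here for illustration and not taken from the paper, makes the loss concrete. Assume four vertices $s, t, u, v$ with edge costs $c_{st} = 10$, $c_{su} = c_{vt} = 1$ and $c_{uv} = 9$. Commodity $k + 1$ has source $s$ and sink $t$, and every commodity $i \le k$ has source $u$ and sink $v$. Routing commodity $k + 1$ first on its shortest path (the edge $st$) leaves each commodity $i$ choosing between the edge $uv$ at cost 9 and the detour $u, s, t, v$ at cost $1 + 0 + 1 = 2$, since $st$ already carries packet $i$. Routing commodity $k + 1$ on the longer path $s, u, v, t$ instead lets every commodity $i$ use $uv$ for free.

```latex
\begin{align*}
\text{largest demand first:}\quad
  & \underbrace{10k}_{\text{commodity } k+1 \text{ on } st}
    + \underbrace{(1 + 1)\,k}_{\text{commodities } 1,\dots,k \text{ on } us,\, tv}
    = 12k, \\
\text{longer path for } k+1\text{:}\quad
  & \underbrace{(1 + 9 + 1)\,k}_{\text{commodity } k+1 \text{ on } su,\, uv,\, vt}
    + \underbrace{0}_{\text{commodities } 1,\dots,k \text{ on } uv}
    = 11k.
\end{align*}
```

In this toy instance the gap is modest, but it shows that committing to the shortest path for the dominating commodity is not always the cheapest choice overall.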
An alternative is to first find shortest paths for commodities 1 through $k$, and then find the least cost path for commodity $k + 1$ that can use edges in previously picked paths at a cheaper cost. This misses solutions where picking slightly longer paths for commodities 1 through $k$ leads to a greater sharing of the edges. The first approach is indeed the approach analyzed in [6] for the special case of the problem where there is a single source that belongs to all of the terminal sets. That paper shows that in any single source laminar demands setting, routing commodities in order of decreasing sizes of demand sets achieves an $O(\log k)$ approximation where $k$ is the number of different packets in the universe.

We extend and improve the result of [6] to obtain a 2-approximation for the laminar demands setting with arbitrary terminal sets. Our approach is a hybrid of the two described above. At a high level, we first consider commodities in increasing order of the sizes of their demand sets. However, instead of committing to a single path for each commodity before considering the next, we keep around a collection of all possible near-optimal paths for the smaller demand sets before considering choices for the larger demand sets. Then in a second pass, we finalize a single path (tree) for each commodity, considering commodities in decreasing order of sizes of their demand sets. That is, we commit to paths for the larger demand sets before finalizing paths for the smaller demand sets. In order to maintain a collection of all near-optimal paths efficiently we use a primal-dual approach. The duals constructed for each commodity give a succinct description of all possible short paths connecting the source and the sink for that commodity. After having constructed all of the duals, we perform a reverse delete step that finalizes paths for commodities starting from the one with the largest demand and moving on to smaller demand sets.
The cost structure in the network design problem we consider is uniform in the sense that costs on different edges are related through constant factors. Obtaining a randomized $O(\log n)$ approximation for network design problems with a uniform cost structure is often easy: we can use the tree embeddings of Bartal [7] and Fakcharoenphol et al. [10] to convert the graph into a distribution over trees such that distances between nodes are preserved to within logarithmic factors in expectation. Then the expected cost of the optimal routing over the (random) tree is related within logarithmic factors to the cost of the optimal routing over the graph. Moreover, the problem is easy to solve on trees, because there is a unique path between every pair of nodes. We achieve much better approximation factors. For the laminar demands setting, we obtain a 2-approximation. For the sunflower demands setting, our approximation factor is $O(\log g)$; note that $g$ is always at most $n$, and in most applications should be much smaller.

As mentioned earlier, network design with coverage costs is closely related but incomparable to other models of network design with uniform costs that display economies of scale. This includes, e.g., the uniform buy-at-bulk [4, 14, 19, 26], rent-or-buy [17, 13], and access network design [3, 12] problems. For all of these problems constant factor approximations are known in the uniform costs setting for the special case where all of the commodities share a common source. In the multi-commodity setting, i.e., with distinct sources and sinks, the rent-or-buy network design problem admits a 2-approximation [17, 13], but the buy-at-bulk network design problem is hard to approximate within poly-logarithmic factors [2].

Cost models specific to communication networks have been considered before in network design. Hayrapetyan et al.
[15] study a single-source network design problem in which the cost on an edge is a monotone submodular function of the commodities that use the edge. They obtain an $O(\log n)$ approximation via tree embeddings [7, 10], where $n$ is the number of vertices in the graph. The cost structure that we consider is a special case of the one in [15] (coverage functions are submodular). However, unlike [15] we assume that terminal sets are arbitrary (in particular, they do not share a common source). Moreover, we obtain stronger approximation guarantees.

Shmoys et al. [23] study a facility location problem with a cost structure very similar to that in our sunflower demands setting. In their model, the cost of opening a facility has two components: a fixed cost (similar to the cost of routing the common packets in our setting), and a service specific cost (similar to the cost of routing other packets in our setting). They present a constant factor approximation for facility location with this cost structure. Svitkina and Tardos [25] further extend this to a facility location problem with hierarchical costs, again presenting a constant factor approximation. Extending our results to more general non-laminar coverage functions including hierarchical costs is an interesting open problem.

As mentioned earlier, a main component in our approach for the sunflower demands setting is a construction for group spanners in unweighted graphs. Group spanners generalize graph spanners. Low-stretch spanners have a number of applications, including distributed routing using small routing tables and computing near-shortest paths in distributed networks (see [21] and references therein). In unweighted graphs it is well known that the size of the smallest spanner with multiplicative stretch $k$ is equal to the maximum number of edges in a graph with girth at least $k + 1$; this is known to be $O(n^{1 + O(1/k)})$, and is conjectured tight.
Our result is consistent with this bound: when the number of commodities $g$ is equal to the number of vertex pairs, we get an $O(\log g) = O(\log n)$ stretch with a spanner of size $O(n)$. Other work on spanners has focused on additive stretch and weighted graphs (see, e.g., [9, 21, 27]).

Group spanners also generalize shallow-light spanning trees. The latter is a subgraph that is simultaneously an approximately-minimum spanning tree of the given graph, as well as an approximate-shortest-paths tree with respect to a given source node. Consider an instance with a special source node $s$ that for every node $v$ in the graph contains the terminal set $\{s, v\}$. Then an $(\alpha, \beta)$ group spanner for this instance simultaneously approximates the shortest path distance from $s$ to $v$ for every $v$ to within a factor of $\beta$, and has size no more than $\alpha$ times the size of the minimum spanning tree in the graph. However, while our approach only guarantees $\beta = O(\log n)$ for $g = n$ commodities, it is possible to obtain an $(O(1/\epsilon), 1 + \epsilon)$ approximation for any $\epsilon > 0$.

In this section, we formally define Network Design with Coverage Costs. We are given a graph $G = (V, E)$ with costs $c_e$ on edges, a universe $\Pi$ of packets, and $g$ commodities with terminal sets $X_1, \ldots, X_g \subseteq V$. The demand set of terminal set $X_j$ is denoted $D_j \subseteq \Pi$, and we denote the collection of all demand sets as $\mathcal{D}$. A solution consists of a collection of $g$ Steiner trees $\mathcal{T} = \{T_1, \ldots, T_g\}$ where $T_j$ is a Steiner tree spanning terminal set $X_j$. The trees specify how packets are to be routed over the edges: the packets of demand $D_j$ are routed over edges of $T_j$. For a solution $\mathcal{T}$, the load on edge $e$ is $\ell_e(\mathcal{T}) = \left| \bigcup_{i : e \in T_i} D_i \right|$, i.e. the total number of distinct packets being routed over edge $e$. More generally, we can consider a setting in which packets have weights and we define the load on an edge to be the total weight of all of the distinct packets that an edge carries.
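The load definition translates directly into a few lines of code. The sketch below assumes unit packet weights, represents each tree $T_j$ as a list of undirected edges, and sums $c_e \ell_e(\mathcal{T})$ over the edges in use; the names `edge_loads` and `routing_cost` and the toy instance are hypothetical and chosen only for illustration.

```python
def edge_loads(trees, demands):
    """Map each edge to the set of distinct packets routed over it."""
    packets_on = {}
    for tree, demand in zip(trees, demands):
        for u, v in tree:
            edge = frozenset((u, v))  # undirected edge
            packets_on.setdefault(edge, set()).update(demand)
    return packets_on


def routing_cost(trees, demands, cost):
    """Sum over used edges of c_e times the number of distinct packets on e."""
    loads = edge_loads(trees, demands)
    return sum(cost[edge] * len(packets) for edge, packets in loads.items())


# Hypothetical instance: a path a - b - c with unit-cost edges.
cost = {frozenset(("a", "b")): 1, frozenset(("b", "c")): 1}
trees = [[("a", "b")], [("a", "b"), ("b", "c")]]  # T_1, T_2
demands = [{1}, {1, 2}]                            # D_1, D_2
total = routing_cost(trees, demands, cost)
```

Here both trees use edge $ab$, but packet 1 is counted only once on it, so the total is 4 rather than the 5 that summing per-commodity volumes would give.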
The performance and running times of both of our algorithms are independent of the number of distinct packets, so we may assume without loss of generality that all packets have unit weight. Our goal is to find a solution $\mathcal{T}$ so as to minimize the total cost $\sum_{e \in E} c_e \ell_e(\mathcal{T})$.

We now describe the two special cases of network design with coverage costs that we study. In the following, for a subgraph $H$, we write $c(H)$ for the total cost of edges in $H$, i.e. $c(H) := \sum_{e \in H} c_e$.

Laminar demands.
In this setting, the collection of demand sets is laminar: for any $D, D' \in \mathcal{D}$, $D \cap D' \neq \emptyset$ implies either $D \subseteq D'$ or $D' \subseteq D$. In this case we can transform our objective into a simpler form where the cost of each edge is charged to a collection of disjoint demand sets. In particular, given a solution $\mathcal{T}$, for an edge $e$ consider the demand sets $D$ that are maximal among the collection $\{D_j : e \in T_j\}$ of demand sets that this edge carries. Because of laminarity, these maximal demand sets are disjoint, and so the load on the edge is simply the sum of the sizes of these demand sets. Accordingly, let us define $H_D(\mathcal{T})$ to be the set of edges $e$ such that $D$ is a maximal set in $\{D_j : e \in T_j\}$. The packet set $D$ will contribute to the load on these edges. Then we can write the total cost of the solution $\mathcal{T}$ as

$$\ell(\mathcal{T}) = \sum_e c_e \ell_e(\mathcal{T}) = \sum_e \sum_{D : H_D(\mathcal{T}) \ni e} c_e |D| = \sum_D |D| \sum_{e \in H_D(\mathcal{T})} c_e = \sum_D |D| \, c(H_D(\mathcal{T})).$$

Further note that in a feasible solution $\mathcal{T}$, for each commodity $j$, the subgraph $\bigcup_{D \supseteq D_j} H_D(\mathcal{T})$ contains the tree $T_j$ and therefore spans the terminal set $X_j$. Therefore, instead of specifying a Steiner tree for each terminal set, it suffices to specify a forest $H_D$ for each demand set $D$ such that each terminal set $X_j$ is connected in $\bigcup_{D \supseteq D_j} H_D$.

Sunflower demands.
In this setting, there is a special set of packets $P \subseteq \Pi$ such that for all $i \neq j$, we have $D_i \cap D_j = P$. In other words, $D_j = P \cup P_j$ with $P_i \cap P_j = \emptyset$ for all $i \neq j$. We can again transform our objective into a simpler form. For a routing solution $\mathcal{T} = \{T_1, T_2, \ldots, T_g\}$, let $H$ denote the subgraph obtained by taking the union of the $T_j$s. Observe that $H$ is a Steiner forest for $X_1, \ldots, X_g$. We have to route $P$ over $H$, since all terminal sets demand $P$, and $P_j$ over $T_j$. Thus the cost of the routing solution can be expressed as $\ell(\mathcal{T}) = |P| \, c(H) + \sum_j |P_j| \, c(T_j)$.

We will now describe a lower bound on the cost of the optimal solution in this setting. For a vertex set $X$ and subgraph $H$, let $\mathrm{St}_H(X)$ denote the cost of an optimal (i.e., minimum cost) Steiner tree over $X$ in $H$. Let $\mathcal{T}^* = \{T_1^*, T_2^*, \ldots, T_g^*\}$ be an optimal routing solution to the given instance and let $H^* = \bigcup_j T_j^*$. Suppose $F^*$ is an optimal Steiner forest for $X_1, \ldots, X_g$. Since $H^*$ is a Steiner forest for $X_1, \ldots, X_g$ and $T_j^*$ is a Steiner tree for $X_j$, we have $c(H^*) \ge c(F^*)$ and $c(T_j^*) \ge \mathrm{St}_G(X_j)$. Therefore the optimal routing-solution cost can be bounded as $\ell(\mathcal{T}^*) \ge |P| \, c(F^*) + \sum_j |P_j| \, \mathrm{St}_G(X_j)$.

Group spanners.
For a graph $G = (V, E)$ with cost $c_e$ on edges and $g$ terminal sets $X_1, \ldots, X_g \subseteq V$, we say that subgraph $H$ is an $(\alpha, \beta)$ group spanner if $c(H) \le \alpha \, c(F^*)$ and $\mathrm{St}_H(X_j) \le \beta \, \mathrm{St}_G(X_j)$ for all $j$. Here $F^*$ denotes an optimal Steiner forest for $X_1, \ldots, X_g$ in $G$. Note that a group spanner generalizes the notion of a spanner since the latter asks for a sparse spanning subgraph $H$ such that for every pair of vertices $(u, v)$ we have $\beta$ stretch: $d_H(u, v) \le \beta \, d_G(u, v)$. Here $d_H(u, v)$ (respectively, $d_G(u, v)$) denotes the distance, with edge lengths $c_e$, between vertices $u$ and $v$ in $H$ (respectively, $G$).

The following lemma shows that a good group spanner implies an approximation for the sunflower demands setting.

Lemma 1.
Given an $(\alpha, \beta)$ group spanner $H$ for graph $G$ and terminal sets $X_1, X_2, \ldots, X_g$, we can obtain an $\alpha + 2\beta$ approximation for any sunflower demands instance defined over $G$ and the $X_j$s.

Proof. For all $j$, let $H_j$ be a Steiner tree over $X_j$ in $H$ obtained via any constant factor approximation. We set $\{H_1, H_2, \ldots, H_g\}$ as the routing solution for the given instance. The cost of this solution is no more than $|P| \, c(H) + \sum_j |P_j| \, c(H_j)$. Recall that the optimal routing-solution cost for the given instance is at least $|P| \, c(F^*) + \sum_j |P_j| \, \mathrm{St}_G(X_j)$. Therefore, using the fact that $H$ is an $(\alpha, \beta)$ group spanner and $c(H_j) \le O(1) \, \mathrm{St}_H(X_j)$, we get the desired claim.

Note that using group spanners we get an oblivious approximation in the sense that the construction uses only the knowledge of the underlying graph and the terminal sets but not the demand sets.

In Section 4 we consider unweighted graphs with terminal sets that satisfy $V = \bigcup_j X_j$. We develop an algorithm that obtains a $(14, O(\log g))$ group spanner for such an instance, and so by Lemma 1 gives an $O(\log g)$ approximation to the sunflower demands setting over the instance (see Theorem 9).

3 A 2-approximation for the laminar demands setting

Recall that in the laminar demands setting, for all
$D, D' \in \mathcal{D}$ with $D \cap D' \neq \emptyset$, we have $D \subseteq D'$ or $D' \subseteq D$. As established in Section 2, in order to obtain a feasible solution in this setting, it suffices to specify a forest $H_D$ for each demand set $D$ such that each terminal set $X_j$ is connected in $\bigcup_{D \supseteq D_j} H_D$. The cost of the corresponding routing is $\sum_D |D| \, c(H_D(\mathcal{T}))$.

Our algorithm for the laminar demands case is an extension of the Goemans-Williamson primal-dual algorithm for the Steiner Forest Problem [11]. We begin by defining the primal and dual linear programs.

In the linear program below, the variable $x_{e,D}$ denotes whether $e \in H_D$. We denote by $\delta(S)$ the set of edges crossing a cut $S \subseteq V$, and by $\mathcal{S}_D$ the collection of cuts $S \subseteq V$ that separate a terminal set $X_j$ with $D_j \supseteq D$. The cut constraints require that each terminal set $X_j$ is connected by $\bigcup_{D \supseteq D_j} H_D$.

$$
\begin{aligned}
\text{minimize} \quad & \sum_{e,\, D \in \mathcal{D}} x_{e,D} \cdot |D| \, c_e \\
\text{subject to} \quad & \sum_{D' \supseteq D} \sum_{e \in \delta(S)} x_{e,D'} \ge 1 && \forall D \in \mathcal{D},\ S \in \mathcal{S}_D \\
& x_{e,D} \ge 0 && \forall e,\ D \in \mathcal{D}
\end{aligned}
$$

The dual program is:

$$
\begin{aligned}
\text{maximize} \quad & \sum_{D \in \mathcal{D},\, S \in \mathcal{S}_D} y_{D,S} \\
\text{subject to} \quad & \sum_{D' \subseteq D} \ \sum_{S \in \mathcal{S}_{D'} : e \in \delta(S)} y_{D',S} \le |D| \, c_e && \forall e,\ D \in \mathcal{D} \\
& y_{D,S} \ge 0 && \forall D \in \mathcal{D},\ S \in \mathcal{S}_D
\end{aligned}
$$

The algorithm starts with a dual ascent stage in which it adds edges to forests $\{F_D\}_{D \in \mathcal{D}}$, and ends with a pruning stage. In the following discussion, for a demand set $D \in \mathcal{D}$ we say that $S \in \mathcal{S}_D$ is a $D$-unsatisfied cut if $\left( \bigcup_{D' \supseteq D} F_{D'} \right) \cap \delta(S) = \emptyset$. We also say that an edge $e$ is $D$-tight if

$$\sum_{D' \subseteq D} \ \sum_{S \in \mathcal{S}_{D'} : e \in \delta(S)} y_{D',S} = |D| \, c_e.$$

In the dual ascent stage, the algorithm raises duals in phases, one per demand set $D \in \mathcal{D}$ in order of increasing size. In phase $D$, while there exists a $D$-unsatisfied cut it alternates between raising duals of the minimal $D$-unsatisfied cuts and adding $D$-tight edges to $F_D$. We say that $S$ is an active set in the current iteration of the inner while loop if it is a minimal $D$-unsatisfied cut. The algorithm ensures that at the end of phase $D$, the edges $F_D$ are paid for by the dual and $F_D$ is a Steiner forest for terminal sets whose demand set contains $D$.
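The definition of a $D$-unsatisfied cut can be tested mechanically. The sketch below is illustrative only: the function names, the dictionary `forests` mapping each demand set (as a `frozenset`) to its edge list $F_D$, and the toy instance are all assumptions. It checks membership in $\mathcal{S}_D$ and the absence of crossing edges; it does not test minimality, which is what additionally distinguishes active sets.

```python
def separates(cut, terminal_set):
    """True if the cut contains some but not all terminals of the set."""
    inside = sum(1 for t in terminal_set if t in cut)
    return 0 < inside < len(terminal_set)


def is_unsatisfied_cut(cut, D, forests, terminal_sets, demands):
    """Check whether `cut` is a D-unsatisfied cut, following the definition."""
    D = frozenset(D)
    cut = set(cut)
    # The cut must lie in S_D: it separates some X_j whose D_j contains D.
    in_S_D = any(
        D <= frozenset(Dj) and separates(cut, Xj)
        for Xj, Dj in zip(terminal_sets, demands)
    )
    if not in_S_D:
        return False
    # No edge of any F_{D'} with D' containing D may cross the cut.
    for D_prime, edges in forests.items():
        if D <= D_prime:
            for u, v in edges:
                if (u in cut) != (v in cut):
                    return False
    return True


# Hypothetical instance: one commodity with terminals {a, c} and demand {1};
# the forest for demand {1} currently holds the single edge (a, b).
terminal_sets = [{"a", "c"}]
demands = [{1}]
forests = {frozenset({1}): [("a", "b")]}
```

In this instance the cut $\{a\}$ is crossed by the edge $(a, b)$ and so is satisfied, while $\{a, b\}$ separates $a$ from $c$ with no crossing edge and is therefore $\{1\}$-unsatisfied.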
In the pruning stage, the algorithm processes the demand sets in order of decreasing size and removes unnecessary edges from $\{F_D\}_{D \in \mathcal{D}}$ and returns $\{H_D\}_{D \in \mathcal{D}}$.

The following lemma implies that we can efficiently find active sets.

Lemma 2.
In any iteration in phase $D$, a set $S$ is active if and only if it is a component of $F_D$ and it separates a terminal set whose demand set contains $D$.

Proof. Let $S$ be an active set. By definition, $S$ is a minimal cut in $\mathcal{S}_D$ such that $\left( \bigcup_{D' \supseteq D} F_{D'} \right) \cap \delta(S) = \emptyset$. Since $S \in \mathcal{S}_D$, it separates a terminal set whose demand set contains $D$. The algorithm raises duals for demand sets in increasing order of size, so we have $F_{D'} = \emptyset$ for $D' \supsetneq D$. This implies that $F_D \cap \delta(S) = \emptyset$ and so $S \cap C = \emptyset$ or $C \subseteq S$ for every connected component $C$ of $F_D$. Thus, $S$ is a superset of a union of connected components of $F_D$. By minimality, we have that $S$ is a connected component of $F_D$.

For the converse, consider a connected component $S'$ of $F_D$ that separates a terminal set whose demand set contains $D$. By definition, we have $S' \in \mathcal{S}_D$. Since $S'$ is a connected component of $F_D$ and $F_{D'} = \emptyset$ for $D' \supsetneq D$, it is a minimal set in $\mathcal{S}_D$ such that $\left( \bigcup_{D' \supseteq D} F_{D'} \right) \cap \delta(S') = \emptyset$. Therefore $S'$ is an active set.

Our analysis follows along the lines of the analysis for the Goemans-Williamson algorithm. We first establish that the primal and dual solutions generated by the algorithm are feasible.

Algorithm 1
Primal-Dual Algorithm for Laminar Buy-at-Bulk
  Initialize $F_D \leftarrow \emptyset$ for all $D \in \mathcal{D}$ and $y_{D,S} \leftarrow 0$ for all $D \in \mathcal{D}$, $S \subseteq V$.
  (Dual ascent stage)
  for $D \in \mathcal{D}$ in increasing order of size do
    (Start of phase $D$)
    while there exists a $D$-unsatisfied cut do
      Simultaneously raise $y_{D,S}$ for active sets $S$ until some edge $e$ goes $D$-tight.
      $F_D \leftarrow F_D + e$.
    end while
    (End of phase $D$)
  end for
  (End of dual ascent stage)
  (Pruning stage)
  $H_D \leftarrow F_D$ for all $D \in \mathcal{D}$.
  for $D \in \mathcal{D}$ in decreasing order of size do
    for $e \in H_D$ do
      if $(H_D - e) \cup \bigcup_{D' \supsetneq D} H_{D'}$ is a Steiner forest for terminal sets with demand set $D$ then
        $H_D \leftarrow H_D - e$.
      end if
    end for
  end for
  (End of pruning stage)
  return $\{H_D\}_{D \in \mathcal{D}}$

Lemma 3. The primal solution $\{H_D\}_{D \in \mathcal{D}}$ and the dual solution $\{y_{D,S}\}_{D \in \mathcal{D}, S \subseteq V}$ are feasible.

Proof. We first prove that the primal solution is feasible. Consider an iteration during the pruning stage. We say that terminal set $X_j$ is $H$-disconnected if it is disconnected with respect to edge set $\bigcup_{D \supseteq D_j} H_D$ and $H$-connected otherwise. We will show that all terminal sets are $H$-connected in all iterations of the pruning stage.

Observe that at the end of phase $D$, there are no $D$-unsatisfied cuts and $F_{D'} = \emptyset$ for $D' \supsetneq D$. Thus, all terminal sets with demand set $D$ are connected with respect to edge set $F_D$. At the beginning of the pruning stage, we have $H_D = F_D$ for all $D \in \mathcal{D}$, and so all terminal sets are $H$-connected. Consider an iteration in which the algorithm deletes an edge $e$ from $H_D$. By definition of $H$-disconnected, this can only cause a terminal set with demand set $D' \subseteq D$ to be $H$-disconnected. However, the algorithm will not delete $e$ if it causes a terminal set with demand set $D$ to be $H$-disconnected. Now consider a demand set $D' \subsetneq D$. Since $|D'| < |D|$, the pruning stage has not yet processed $D'$, so we still have $H_{D'} = F_{D'}$ and all terminal sets with demand set $D'$ are $H$-connected.
Thus, all terminal sets are $H$-connected throughout the pruning stage and so $\{H_D\}_{D \in \mathcal{D}}$ is a feasible primal solution.

The dual solution is feasible since the algorithm explicitly ensures that the dual variables in a tight constraint are not raised.

Next, we show that in each phase $D$ of the dual raising stage, the current active sets have average degree with respect to edges $\bigcup_{D' \supseteq D} H_{D'}$ (formally defined below) at most 2 in every iteration. This in turn implies that the primal solution has cost at most twice the total dual value. Since the dual is feasible, we have that the algorithm gives a 2-approximation. We bound the average degree of active sets by showing that $\bigcup_{D' \supseteq D} H_{D'}$ is a forest and that no inactive set has degree 1.

Lemma 4.
For all $D \in \mathcal{D}$, we have that $\bigcup_{D' \supseteq D} H_{D'}$ is a forest.

Proof. Suppose, towards a contradiction, that the statement is false. Let $D$ be a maximal demand set such that $\bigcup_{D' \supseteq D} H_{D'}$ contains a cycle $C$. By maximality, there exists $e \in C \cap H_D$. Since $e$ is in a cycle in $\bigcup_{D' \supseteq D} H_{D'}$, we have that $(H_D - e) \cup \bigcup_{D' \supsetneq D} H_{D'}$ is still a Steiner forest for terminal sets with demand set $D$. Thus, the algorithm would have removed $e$ from $H_D$ and so we have a contradiction.

For a subset of edges $E' \subseteq E$, let $\deg_{E'}(S) = |\delta(S) \cap E'|$ denote the number of edges in $E'$ exiting $S$.

Lemma 5.
Consider an iteration in phase $D$ of the dual raising stage. Let $S$ be a connected component of $F_D$ in this iteration. If $S \notin \mathcal{S}_D$, then $\sum_{D' \supseteq D} \deg_{H_{D'}}(S) \neq 1$.

Proof. We prove the contrapositive. Suppose $\sum_{D' \supseteq D} \deg_{H_{D'}}(S) = 1$. Let $e$ and $A \supseteq D$ be the unique edge and demand set, respectively, such that $e \in H_A \cap \delta(S)$. Since the algorithm did not delete $e$ from $H_A$ and $\bigcup_{D' \supseteq A} H_{D'}$ is acyclic by Lemma 4, there exists $X_j$ with $D_j = A$ and $u, v \in X_j$ such that $e$ is on the unique $u$-$v$ path in $\bigcup_{D' \supseteq A} H_{D'}$. Since $\sum_{D' \supseteq D} \deg_{H_{D'}}(S) = 1$, the path crosses $S$ exactly once. Thus, we have that $S$ separates $u, v$ and so $S \in \mathcal{S}_A$. By definition of $\mathcal{S}_D$, we have $\mathcal{S}_A \subseteq \mathcal{S}_D$ and this completes the proof of the lemma.

We are now ready to prove that the primal solution has cost at most twice the dual value.

Lemma 6. $\sum_D \sum_{e \in H_D} |D| \, c_e \le 2 \sum_{D,S} y_{D,S}$.

Proof.
Using the fact that we only add tight edges, we have

$$
\begin{aligned}
\sum_D \sum_{e \in H_D} |D| c_e
&= \sum_D \sum_{e \in H_D} \sum_{D' \subseteq D} \sum_{S \in \mathcal{S}_{D'} : e \in \delta(S)} y_{D',S} \\
&= \sum_{D'} \sum_{S \in \mathcal{S}_{D'}} y_{D',S} \sum_{D \supseteq D'} \sum_{e \in \delta(S) \cap H_D} 1 \\
&= \sum_{D'} \sum_{S \in \mathcal{S}_{D'}} y_{D',S} \sum_{D \supseteq D'} \deg_{H_D}(S) \\
&= \sum_{D'} \sum_{S \in \mathcal{S}_{D'}} y_{D',S} \deg_{\bigcup_{D \supseteq D'} H_D}(S).
\end{aligned}
$$

The second equality is obtained by rearranging, and the last follows from the fact that each edge is in $H_D$ for at most one $D \supseteq D'$.

Suppose that in an iteration in phase $D'$, the dual for each active set is raised by $\Delta$. This implies that $\sum_{S \in \mathcal{S}_{D'}} y_{D',S} \deg_{\bigcup_{D \supseteq D'} H_D}(S)$ increases by $\Delta \cdot \sum_{S \text{ active}} \deg_{\bigcup_{D \supseteq D'} H_D}(S)$, and $\sum_{D,S} y_{D,S}$ increases by $\Delta \cdot |\{S : S \text{ active}\}|$. Hence it suffices to show that for each phase $D'$ and in each iteration within the phase, the average degree of active sets is at most 2:

$$
\sum_{S \text{ active}} \deg_{\bigcup_{D \supseteq D'} H_D}(S) \le 2 \cdot |\{S : S \text{ active}\}|.
$$

Fix an iteration in phase $D'$. Note that each active set corresponds to some connected component of $F_{D'}$ by Lemma 2. Let $G'$ be a graph whose nodes are the connected components of $F_{D'}$ and whose edge set is $\bigcup_{D \supseteq D'} H_D$. The degree of a node in $G'$ is equal to the degree of the corresponding set with respect to the edge set $\bigcup_{D \supseteq D'} H_D$. Let us say that a node of $G'$ corresponding to an active set is an active node, and that any other node is inactive. We want to show that the average degree of active nodes in $G'$ is at most 2. Suppose we remove all isolated nodes from $G'$. In the resulting graph, by Lemma 5 the degree of each inactive node is at least 2, and by Lemma 4 the graph is a forest, so its average degree is at most 2. So the claim follows.

Lemmas 3 and 6 give us the following theorem.

Theorem 7.
Algorithm 1 is a 2-approximation for network design with coverage costs in the laminar demands setting.

We now consider the sunflower demands setting. The main technical result of this section is the following lemma, which says that we can find a group spanner of linear size with stretch $O(\log g)$.

Lemma 8.
Given an unweighted graph $G = (V, E)$ ($c_e = 1$ for all $e \in E$) and terminal sets $X_1, \ldots, X_g$ such that $V = \bigcup_j X_j$, we can construct in polynomial time a $(14, O(\log g))$ group spanner.

Theorem 9.
Network design with coverage costs in the sunflower demands setting admits an $O(\log g)$-approximation over unweighted graphs with vertex set $V = \bigcup_j X_j$.

In the remainder of the section we focus on unweighted graphs and write $|H|$ to denote the cost (i.e., the number of edges) of a subgraph $H$. Let us recall some notation: for a subgraph $H$, $\mathrm{St}_H(X)$ denotes the cost of an optimal (i.e., minimum cost) Steiner tree over vertex set $X$ in $H$, and $d_H(u, v)$ denotes the distance between vertices $u, v$ in $H$. Let $T$ denote a minimum spanning tree of the given graph $G$.

Now we prove Lemma 8. To that end we consider uniform group spanner instances, where the following holds for all $j$: for every nonempty strict subset $S$ of $X_j$, there exists an edge $(x, y) \in E$ such that $x \in S$ and $y \in X_j \setminus S$. In other words, there exists an optimal Steiner tree for each $X_j$ with no Steiner vertices, and it is easy to find.

Next we show that in order to establish Lemma 8 it suffices to solve uniform instances. We can transform any given group spanner instance over an unweighted graph $G$ with $V = \bigcup_j X_j$ into a uniform instance as follows: add to $X_j$ all Steiner vertices in the 2-approximate Steiner tree given by the MST heuristic [28] applied over $X_j$ in $G$, and let $X'_j$ be the resulting set. Since $X'_j$ is the set of all vertices of a Steiner tree, the group spanner instance with terminal sets $X'_1, \ldots, X'_g$ is a uniform one.

Say we obtain a subgraph $H$ after solving the above uniform instance, and $H$ satisfies $\mathrm{St}_H(X'_j) \le \beta\, \mathrm{St}_G(X'_j)$ for all $j$ and $|H| \le \alpha |T|$. We show that $H$ is in fact a $(2\alpha, 2\beta)$ group spanner for the original instance. The MST heuristic guarantees that $\mathrm{St}_G(X'_j) \le 2\, \mathrm{St}_G(X_j)$. Since $X_j \subseteq X'_j$, this implies $\mathrm{St}_H(X_j) \le \mathrm{St}_H(X'_j) \le \beta\, \mathrm{St}_G(X'_j) \le 2\beta\, \mathrm{St}_G(X_j)$. Finally, let $F^*$ denote an optimal Steiner forest for $X_1, \ldots, X_g$ in $G$. In an unweighted instance, we have $|F^*| \ge |T|/2$. This is because $V = \bigcup_j X_j$ and each component of the forest has at least one edge, so $|F^*| \ge |V|/2 \ge |T|/2$. Since $|H| \le \alpha |T|$, we get the cost guarantee $|H| \le 2\alpha |F^*|$.

This implies that to prove Lemma 8 we only need to solve uniform group spanner instances. In the remainder of this section, we focus on uniform instances and, for ease of exposition, write $X_j$ in place of $X'_j$.

Lemma 10.
Given any uniform group spanner instance with terminal sets $X_j$, there exists a subset of edges $A$ of size $|A| \le 6|T|$ such that for $H := A \cup T$ we have $\mathrm{St}_H(X_j) \le (2 \log g)\, \mathrm{St}_G(X_j)$ for all $j$.

Since $|H| = |A| + |T| \le 7|T|$ and $\mathrm{St}_H(X_j) \le (2 \log g)\, \mathrm{St}_G(X_j)$, we get that $H$ is a $(14, O(\log g))$ group spanner that satisfies the desired bounds in Lemma 8.

We now move on to present a constructive proof of Lemma 10. We assume without loss of generality that $|X_j| \ge 2$ for all $j$. We assume that the terminals of $X_j$ are ordered $x_{j,1}, x_{j,2}, \ldots$ such that for $i > 1$, there exists an edge $(x_{j,i}, x_{j,k}) \in E$ for some $k < i$; we call this edge a satisfying edge for $x_{j,i}$. For ease of notation, we drop the indices when they do not matter and write $(x, y)$ to denote $x$'s satisfying edge. Note that such an ordering always exists, e.g., a preorder traversal of the (uniform) Steiner tree over $X_j$ from any root. We say that a terminal $x_{j,i} \in X_j$ is unsatisfied in a spanning subgraph $H$ if $d_H(x_{j,i}, \{x_{j,1}, \ldots, x_{j,i-1}\}) > 2 \log g$; we define the lowest indexed vertex $x_{j,1}$ to be always satisfied. Note that a single vertex may correspond to multiple satisfied/unsatisfied terminals of different groups. The following fact shows that satisfying every terminal yields stretch $\beta = 2 \log g$.

Fact 1. If $H$ is a spanning subgraph such that $d_H(x_{j,i}, \{x_{j,1}, \ldots, x_{j,i-1}\}) \le 2 \log g$ for all $i > 1$, then there exists a Steiner tree for $X_j$ in $H$ with total size at most $(2 \log g)\, \mathrm{St}_G(X_j)$.

Our algorithm starts with the MST $T$ and adds satisfying edges to it in order to construct $H$. In order to bound the cost of these edges, the algorithm maintains an arc set $E'$ defined over the vertex set $V$. Let $G'$ denote the directed graph $(V, E')$. At the beginning of the algorithm, $E'$ is empty. We use arcs to refer to directed edges in $E'$ and simply edges for edges in $E$. Our algorithm works in two phases. In the first phase, for each unsatisfied terminal, the algorithm adds its satisfying edge only if we can add an oriented copy of it to $E'$ and modify nearby arcs in $E'$ such that the out-degree of every node is at most 2. The main lemma is that the number of unsatisfied terminals at the end of this phase is at most $|V|$, and so we can simply add their satisfying edges in the second phase. We use the following notation for the algorithm: $\delta^+(x)$ denotes the number of arcs of $E'$ that are oriented away from $x$; $\Gamma(x) \subseteq V$ denotes the set of terminals reachable from $x$ via a directed path in $E'$ of length at most $\log g$.

Algorithm 2
Algorithm for uniform group spanner instances

    (Phase 1)
    $E', A_1, A_2 \leftarrow \emptyset$
    while there exists $x$ that is unsatisfied in $T \cup A_1$ and $z \in \Gamma(x)$ such that $\delta^+(z) \le 1$ do
        Add $x$'s satisfying edge $(x, y)$ to $E'$ oriented from $x$ to $y$
        Add $(x, y)$ to $A_1$
        if $\delta^+(x) > 2$ then
            Flip directions of arcs in $G'$ along the $x$-$z$ path
        end if
    end while
    (Phase 2)
    For every $x$ unsatisfied in $T \cup A_1$, add its satisfying edge $(x, y)$ to $A_2$
    return $A = A_1 \cup A_2$

At the end of the algorithm every terminal is satisfied. Fact 1 then implies that $H = T \cup A_1 \cup A_2$ is a group spanner with $\beta = 2 \log g$. So we only need to bound the sizes of $A_1$ and $A_2$. Since there is a one-to-one correspondence between edges in $A_1$ and arcs in $E'$, the following lemma implies that $|A_1| = |E'| \le 2|V|$.

Lemma 11.
We have $\delta^+(x) \le 2$ for all $x \in V$.

Proof. We prove the lemma by induction on the iterations of the algorithm. The base case ($E' = \emptyset$) is trivial. The interesting case is when $\delta^+(x) = 2$ at the beginning of the iteration and the algorithm adds $(x, y)$ to $E'$ oriented from $x$ to $y$. At this point, we have $\delta^+(x) = 3$, $\delta^+(z) \le 1$, and all other terminals on the $x$-$z$ path have out-degree at most 2 by the inductive hypothesis. When the algorithm flips the arcs on the path, it decrements $\delta^+(x)$ by 1, increments $\delta^+(z)$ by 1, and does not affect the out-degrees of the other terminals on the path. This proves the lemma.

Next we bound $|A_2|$.

Lemma 12. $|A_2| \le |V|$.

Proof. First we prove that, even if we ignore edge directions, the length of the smallest cycle (i.e., the girth) in $E'$ is at least $\log g$. Assume, towards a contradiction, that there is an undirected cycle of length $k \le \log g$ in $E'$. Let $(x, y)$ be the last arc added in the cycle. Before the algorithm added it, there was a path from $x$ to $y$ of length $k - 1$ in $A_1$ corresponding to the other arcs in the cycle. This contradicts the condition for adding $(x, y)$; in particular, $x$ is not unsatisfied.

Let $U = \{x_{j,i} : x_{j,i} \text{ unsatisfied in } T \cup A_1\}$. For $x_{j,i} \in U$, we have $\delta^+(z) = 2$ for all $z \in \Gamma(x_{j,i})$, since otherwise we would have added its satisfying edge in phase 1. Since the girth of $E'$ is at least $\log g$, we have a full binary tree of depth $\log g$ rooted at $x_{j,i}$ in $E'$. This implies $|\Gamma(x_{j,i})| \ge g$. Furthermore, for any $x_{j,i}, x_{j,k} \in U$ with $i > k$, we have $\Gamma(x_{j,i}) \cap \Gamma(x_{j,k}) = \emptyset$, because otherwise $d_{T \cup A_1}(x_{j,i}, x_{j,k}) \le 2 \log g$ and $x_{j,i}$ would not have been unsatisfied in $T \cup A_1$. Therefore any terminal can belong to at most one $\Gamma(x_{j,i})$ per $j$, giving us $\sum_{x_{j,i} \in U} |\Gamma(x_{j,i})| \le g|V|$. Hence we get the desired bound: $|V| \ge \sum_{x_{j,i} \in U} |\Gamma(x_{j,i})| / g \ge g|U|/g = |U| \ge |A_2|$.

Lemmas 11 and 12 imply that $|A_1| + |A_2| \le 3|V|$, and since $|V| = |T| + 1 \le 2|T|$, we get $|A| \le 6|T|$. Furthermore, the algorithm ensures that all the terminals are satisfied in $T \cup A_1 \cup A_2$.
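The two phases of Algorithm 2 and the out-degree invariant of Lemma 11 can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `group_spanner_edges`, `hops_within`, `path_to_slack`, and the `satisfying` dictionary are hypothetical, logarithms are taken base 2 and rounded down, $g \ge 2$ is assumed, and the unsatisfied terminals are recomputed from scratch in every iteration for clarity rather than efficiency.

```python
import math
from collections import deque


def hops_within(adj, src, limit):
    """Vertices within `limit` hops of src in the undirected graph adj."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == limit:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def group_spanner_edges(vertices, tree_edges, groups, satisfying):
    """Two-phase edge selection on a uniform instance (hypothetical interface).

    groups[j] is the ordered terminal list of group j; satisfying[(j, i)] is
    an earlier terminal of group j adjacent in G to groups[j][i].
    Returns (A1, A2, out), where out[v] holds the heads of arcs leaving v.
    """
    g = len(groups)
    assert g >= 2, "sketch assumes at least two groups, so that log g >= 1"
    reach = math.floor(2 * math.log2(g))  # satisfied iff an earlier terminal is this close
    radius = math.floor(math.log2(g))     # path-length bound defining Gamma(x)
    adj = {v: set() for v in vertices}    # H, initialised to the spanning tree T
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)
    out = {v: set() for v in vertices}    # arc set E'

    def unsatisfied():
        pending = []
        for j, X in enumerate(groups):
            for i in range(1, len(X)):
                near = hops_within(adj, X[i], reach)
                if not any(x in near for x in X[:i]):
                    pending.append((j, i))
        return pending

    def path_to_slack(x):
        """Directed path x -> ... -> z with at most `radius` arcs and out-degree(z) <= 1."""
        parent, depth, queue = {x: None}, {x: 0}, deque([x])
        while queue:
            u = queue.popleft()
            if len(out[u]) <= 1:
                path = []
                while u is not None:
                    path.append(u)
                    u = parent[u]
                return path[::-1]
            if depth[u] < radius:
                for v in out[u]:
                    if v not in parent:
                        parent[v], depth[v] = u, depth[u] + 1
                        queue.append(v)
        return None

    A1, A2 = [], []
    while True:                           # Phase 1
        for j, i in unsatisfied():
            x, y = groups[j][i], satisfying[(j, i)]
            path = path_to_slack(x)
            if path is not None:
                break
        else:
            break                         # no unsatisfied terminal reaches a slack vertex
        out[x].add(y)
        adj[x].add(y)
        adj[y].add(x)
        A1.append((x, y))
        if len(out[x]) > 2:               # restore out-degree <= 2 by flipping the x-z path
            for u, v in zip(path, path[1:]):
                out[u].remove(v)
                out[v].add(u)
    for j, i in unsatisfied():            # Phase 2
        x, y = groups[j][i], satisfying[(j, i)]
        if y not in adj[x]:
            adj[x].add(y)
            adj[y].add(x)
            A2.append((x, y))
    return A1, A2, out


# Toy instance: a 6-cycle whose spanning tree is the path 0-1-2-3-4-5.
vertices = range(6)
tree_edges = [(i, i + 1) for i in range(5)]
groups = [[0, 5], [0, 1, 2, 3, 4, 5]]
satisfying = {(0, 1): 0}
satisfying.update({(1, i): i - 1 for i in range(1, 6)})
A1, A2, out = group_spanner_edges(vertices, tree_edges, groups, satisfying)
```

In the toy instance, terminal 5 of the first group is at distance 5 from terminal 0 in the tree, which exceeds $2 \log g = 2$, so phase 1 adds the cycle-closing edge $(5, 0)$; no flip is needed because vertex 5 has out-degree 0 beforehand.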
Combining this guarantee with Fact 1 and the size bounds of Lemmas 11 and 12, we get Lemma 10.

References

[1] Ashok Anand, Vyas Sekar, and Aditya Akella. SmartRE: an architecture for coordinated network-wide redundancy elimination. In ACM SIGCOMM, 2009.

[2] Matthew Andrews. Hardness of buy-at-bulk network design. In Foundations of Computer Science, 2004. Proceedings. 45th Annual IEEE Symposium on, pages 115–124. IEEE, 2004.

[3] Matthew Andrews and Lisa Zhang. Approximation algorithms for access network design. Algorithmica, 34(2):197–215, 2002.

[4] Baruch Awerbuch and Yossi Azar. Buy-at-bulk network design. In Proceedings of the 38th IEEE Symposium on Foundations of Computer Science, pages 542–547, 1997.

[5] Baruch Awerbuch, Alan Baratz, and David Peleg. Cost-sensitive analysis of communication protocols. In Proceedings of the ninth annual ACM symposium on Principles of distributed computing, PODC '90, pages 177–187, New York, NY, USA, 1990. ACM.

[6] Siddharth Barman and Shuchi Chawla. Traffic-redundancy aware network design. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '12, pages 1487–1498. SIAM, 2012.

[7] Y. Bartal. Probabilistic approximation of metric spaces and its algorithmic applications. In Proceedings of the 37th Annual Symposium on Foundations of Computer Science [...]

[9] [...] $(1 + \epsilon, \beta)$-spanner constructions for general graphs. SIAM Journal on Computing, 33(3):608–631, 2004.

[10] Jittat Fakcharoenphol, Satish Rao, and Kunal Talwar. A tight bound on approximating arbitrary metrics by tree metrics. In Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, pages 448–455, 2003.

[11] M. Goemans and D. Williamson. A general approximation technique for constrained forest problems. SIAM Journal on Computing, 24(2):296–317, 1995.

[12] S. Guha, A. Meyerson, and K. Munagala. Hierarchical placement and network design problems. In Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on, pages 603–612, 2000.

[13] Anupam Gupta, Amit Kumar, Martin Pál, and Tim Roughgarden. Approximation via cost-sharing: A simple approximation algorithm for the multicommodity rent-or-buy problem. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, FOCS '03, pages 606–617, 2003.

[14] Anupam Gupta, Amit Kumar, and Tim Roughgarden. Simpler and better approximation algorithms for network design. In STOC '03: Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, pages 365–372, 2003.

[15] A. Hayrapetyan, C. Swamy, and É. Tardos. Network design for information networks. In Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, pages 933–942. Society for Industrial and Applied Mathematics, 2005.

[16] Samir Khuller, Balaji Raghavachari, and Neal Young. Balancing minimum spanning and shortest path trees. In Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, SODA '93, pages 243–250, Philadelphia, PA, USA, 1993. Society for Industrial and Applied Mathematics.

[17] Amit Kumar, Anupam Gupta, and Tim Roughgarden. A constant-factor approximation algorithm for the multicommodity rent-or-buy problem. In Proceedings of the 43rd Symposium on Foundations of Computer Science, FOCS '02, pages 333–344, 2002.

[18] F Thomson Leighton and Ankur Moitra. Extensions and limits to vertex sparsification. In Proceedings of the 42nd ACM symposium on Theory of computing, pages 47–56. ACM, 2010.

[19] Adam Meyerson, Kamesh Munagala, and Serge Plotkin. Cost-distance: Two metric network design. SIAM J. Comput., 38(4):1648–1659, December 2008.

[20] Ankur Moitra. Approximation algorithms for multicommodity-type problems with guarantees independent of the graph size. In Foundations of Computer Science, 2009. FOCS'09. 50th Annual IEEE Symposium on, pages 3–12. IEEE, 2009.

[21] Seth Pettie. Low distortion spanners. In Automata, Languages and Programming [...]

[23] [...] Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, pages 1088–1097, 2004.

[24] Neil T. Spring and David Wetherall. A protocol-independent technique for eliminating redundant network traffic. In Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication - SIGCOMM '00, pages 87–95, Stockholm, Sweden, 2000.

[25] Zoya Svitkina and Éva Tardos. Facility location with hierarchical facility costs. In Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, pages 153–161, 2006.

[26] Kunal Talwar. The Single-Sink Buy-at-Bulk LP Has Constant Integrality Gap. In Proceedings of the 9th International IPCO Conference on Integer Programming and Combinatorial Optimization, pages 475–486, 2002.

[27] Mikkel Thorup and Uri Zwick. Spanners and emulators with sublinear distance errors. In Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, pages 802–809. ACM, 2006.

[28] Vijay V Vazirani.