Centrality-based Middlepoint Selection for Traffic Engineering with Segment Routing
George Trimponias, Yan Xiao, Hong Xu, Xiaorui Wu, Yanhui Geng
IEEE/ACM TRANSACTIONS ON NETWORKING
On Traffic Engineering with Segment Routing in SDN based WANs
George Trimponias, Yan Xiao, Hong Xu, Xiaorui Wu, and Yanhui Geng
Abstract: Segment routing is an emerging technology to simplify traffic engineering implementation in WANs. It expresses an end-to-end logical path as a sequence of segments, each of which is represented by a middlepoint. In this paper, we conduct arguably the first systematic study of traffic engineering with segment routing in SDN based WANs. We first provide a theoretical characterization of the problem. We show that for general segment routing, where flows can take any path that goes through a middlepoint, the resulting traffic engineering problem is NP-hard. We then consider segment routing with shortest paths only, and prove that the traffic engineering problem can now be solved in (weakly) polynomial time when the number of middlepoints per path is fixed and not part of the input. Our results thus explain, for the first time, the underlying reason why existing work only focuses on segment routing with shortest paths. In the second part of the paper, we study practical traffic engineering using shortest path based segment routing. We note that existing methods work by taking each node as a candidate middlepoint. This requires solving a large-scale linear program, which is prohibitively slow. We thus propose to select just a few important nodes as middlepoints for all traffic. We use node centrality concepts from graph theory, notably group shortest path centrality, for middlepoint selection. Our performance evaluation using realistic topologies and traffic traces shows that a small percentage of the most central nodes can achieve good results with orders of magnitude lower runtime.
Index Terms: Segment Routing, Traffic Engineering, Graph Centrality, Software Defined Networking
I. INTRODUCTION
Traffic engineering (TE) is an important task for network operators to improve network efficiency and application performance. TE is commonly exercised in a wide range of networks, from carrier networks [22], [28] to data center backbones [30], [31]. Increasingly, TE is implemented using SDN (Software Defined Networking) given its flexibility. Notable examples include Google's B4 [31] and Microsoft's SWAN [30].

Implementing TE in the data plane requires a large number of flow table entries on switches. This is because each switch on the path needs an entry for each demand, i.e., ingress-egress switch pair, to forward its traffic to the next hop, and a large-scale network can have many demands. Commodity switches, on the other hand, have very limited capacity for flow entries (usually 1-2 thousand entries [9], [35]) due to the expensive TCAM (Ternary Content-Addressable Memory) hardware needed [8], [9], [40]. The use of wildcarding could reduce the number of flow entries, but it is often undesirable as it reduces the ability to implement demand-level policies and monitoring. Therefore it has become a major challenge to practically implement TE on commodity SDN switches.

The work was supported in part by contract research between City University of Hong Kong and Huawei. The corresponding author is Hong Xu. G. Trimponias and Y. Geng are with Huawei Noah's Ark Lab, Hong Kong, China (email: [email protected], [email protected]). Y. Xiao, H. Xu and X. Wu are with Department of Computer Science, City University of Hong Kong, Hong Kong, China (email: [email protected], [email protected], [email protected]).

Segment routing [18]–[20] is a recently proposed routing architecture to tackle this challenge. Its key idea is to perform routing based on a sequence of logical segments formed by some middlepoints between the ingress and egress nodes. A segment is the logical pipe between two middlepoints that may include multiple physical paths spanning multiple hops, and ECMP is used to load balance traffic among these paths. With segment routing instead of end-to-end paths, intermediate switches only need to know how to reach middlepoints in order to forward packets. They no longer need to maintain per-demand routing information, which scales quadratically with the number of nodes. Thus segment routing has the potential to greatly reduce the overhead and cost of TE [5], [28].

Segment routing has been explored with TE in some existing work. For example, Bhatia et al. [5] apply 2-segment routing to TE, where any logical path contains only one middlepoint and thus two segments. Hartert et al. [28] propose some heuristics to solve various TE problems with segment routing. What is still lacking is a thorough exploration and understanding of applying segment routing to TE, particularly the hardness of the resulting TE problem and the development of practical TE algorithms with segment routing.

In this paper, we conduct arguably the first systematic study of TE with segment routing in SDN based WANs. We first focus on the theoretical aspects of TE with segment routing. We consider two common types of TE:
$TE_{MF}$, which maximizes total throughput based on multi-commodity flow, and $TE_{LU}$, which minimizes the maximum link utilization. $TE_{MF}$ is mostly for data center backbone WANs [30], [31], and $TE_{LU}$ mostly for carrier networks [22], [28].

We provide new hardness results for TE with segment routing in directed networks. With general segment routing, where traffic can take any path that goes through a middlepoint, we prove that it is NP-hard to decide if the maximum flow through just one given middlepoint is greater than 0 (§IV). Thus $TE_{MF}$ is NP-hard. Due to the connection between the decision version of maximum flow and $TE_{LU}$, this also proves that $TE_{LU}$ given a single fixed middlepoint is NP-hard. We then study a restricted form of segment routing that uses only shortest paths within each segment in §V, and prove that both TE problems can now be solved in (weakly) polynomial time as an LP when the number of middlepoints per path is fixed and not part of the input. Our results thus provide a theoretical foundation for existing work that focuses on shortest path based segment routing [5], [28]. Interestingly, imposing acyclic end-to-end paths renders TE NP-hard.

Given our theoretical results, we next focus on the practical problem of how to choose a small but representative set of middlepoints in order to solve TE with shortest path based segment routing (§VI). Existing approaches [5] assume that for each demand, every node in the network is potentially a middlepoint candidate, and formulate it as part of the TE problem. This makes the TE problem very large and computationally expensive to solve in practice.
As we show in §VII-C, it cannot be solved by the ECOS solver [11] after three hours on a medium topology with 100 nodes and 1500 commodity flows, while in practice TE often needs to be re-computed at the granularity of 10 minutes [28], [30], [31], [37].

Thus we propose to apply the centrality concept from graph theory and network analysis [42] to select a few middlepoints to route all traffic in the network. Centrality was first developed in social network analysis [6], [24] to determine the most influential nodes in a social graph. In the context of routing, centrality can be naturally viewed in terms of the node importance when routing the demands along the admissible paths. We explore several centrality definitions based on the network topology only, such as shortest-path, group shortest-path, and degree centralities, and apply them to middlepoint selection in networks. We also introduce weighted variants that additionally take into account the link capacities.

We conduct a comprehensive performance evaluation of centrality based middlepoint selection using real topologies and traffic traces. Our results demonstrate that a small percentage (around 2.5%–7%) of the most central nodes can achieve good TE performance with orders of magnitude lower runtime. Using centrality based middlepoint selection methods, one can solve TE problems with up to 3000 flows on a 161-node topology in less than 3 minutes. We also observe that group shortest-path consistently outperforms other centralities for middlepoint selection, and may be used as the sole solution in practice for simplicity.

Finally, we comment that our work is of independent theoretical interest for two reasons. First, our theoretical analysis for $TE_{MF}$ can be used to prove that the flow centralities, first introduced in 1991 by Freeman, Borgatti, and White [23], are NP-hard to compute in directed graphs, thus restricting their practical applicability. Second, we show a profound dichotomy between directed and undirected networks: TE and flow centralities are NP-hard to compute in the former, but they are (at least) weakly polynomial in the latter. These results are included in the Appendices at the end.

II. A PRIMER ON SEGMENT ROUTING
We start by introducing segment routing and the benefit of applying it to TE. We next explain related work on segment routing.

Segment routing [18]–[20] is a recently proposed architecture based on source routing that facilitates packet forwarding via a series of segments. It can be directly applied to MPLS and IPv6. The key idea is that the ingress switch can break up the end-to-end logical path into segments, and specify this logical path as a series of middlepoints to traverse. Figure 1 illustrates an example of segment routing. The ingress switch S embeds a stack of segment labels (MPLS labels for example) into the packet header to specify the entire path. Note that each label here just represents a middlepoint in the network, i.e., we consider node segments [18]–[20]. The top label is the active label that instructs packet forwarding. The packet is then sent toward the top label $M_1$ along the shortest path(s). ECMP is used if there are multiple shortest paths. When the packet reaches $M_1$, the top label $M_1$ is popped and the packet is routed to the next label $M_2$. Finally, all the labels are popped and the packet arrives at the egress switch D.

Fig. 1. A segment routing example from S to D through middlepoints $M_1$ and $M_2$.

One key advantage of segment routing is that it can greatly reduce routing cost in terms of the number of flow table entries required. To see this, consider the example shown in Figure 2. Three demands, each of which refers to the aggregated flows between a unique ingress-egress switch pair, are routed through three paths P1, P2, and P3 to their respective destinations. With tunnel-based forwarding in SDN [30], [31], each intermediate switch needs to store flow entries for each demand, and in total 12 entries are needed as shown in Table I. Now if segment routing is applied with node E as the middlepoint, the three paths can be represented using just two labels each as in Figure 2, and switches C and D only need one entry each in order to forward to the middlepoint E. The total number of entries is reduced to only 8, a 33.3% saving.

Path   Segment labels
P1     {E, G}
P2     {E, H}
P3     {E, I}

Fig. 2. An example where segment routing saves flow table entries.
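The entry counts quoted for this example (12 without segment routing, 8 with it) can be reproduced with a short sketch. The hop sequences of P1, P2, and P3 are not listed in the text, so the paths below are hypothetical and were chosen only to be consistent with those totals; the counting rules (one entry per demand at every non-egress switch for tunnel-based forwarding, versus one entry per demand at the ingress plus one entry per distinct active label at every other switch for segment routing) are likewise a simplified model, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical paths: the text does not give the hops of P1-P3, so these are
# an assumption chosen to match the totals of 12 and 8 quoted above.
PATHS = {
    "P1": ["A", "C", "D", "E", "G"],
    "P2": ["A", "C", "D", "E", "H"],
    "P3": ["B", "C", "D", "E", "I"],
}


def tunnel_entries(paths):
    """Tunnel-based forwarding: every non-egress switch on a path stores
    one flow entry per demand routed through it."""
    entries = defaultdict(int)
    for path in paths.values():
        for node in path[:-1]:
            entries[node] += 1
    return dict(entries)


def segment_entries(paths, middlepoint):
    """Segment routing with one middlepoint (simplified model): the ingress
    keeps one entry per demand to push the label stack; every other
    non-egress switch keeps one entry per distinct active label it forwards
    toward (the middlepoint before it is reached, the egress afterwards)."""
    ingress = defaultdict(int)
    labels = defaultdict(set)
    for path in paths.values():
        k = path.index(middlepoint)
        ingress[path[0]] += 1
        for j, node in enumerate(path[1:-1], start=1):
            active = middlepoint if j < k else path[-1]
            labels[node].add(active)
    nodes = set(ingress) | set(labels)
    return {n: ingress[n] + len(labels[n]) for n in nodes}


if __name__ == "__main__":
    before = tunnel_entries(PATHS)
    after = segment_entries(PATHS, "E")
    # prints: 12 8
    print(sum(before.values()), sum(after.values()))
```

Under these assumed paths, switches C and D drop from three entries each to one, which accounts for the whole saving of four entries.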
Given the potential of segment routing, some recent work has started to investigate how to apply it in TE. In [5], the authors propose solutions for determining the optimal TE with segment routing and ECMP. In their scheme, they regard all nodes except for the source and the destination as candidate middlepoints, and split flows across exactly one middlepoint. Although they limit the number of middlepoints to one for each logical path, they consider all the intermediate nodes for each demand, so the search space for middlepoints is very large. The algorithm thus cannot scale to handle medium to large scale networks. Hartert et al. [27], [28] studied a similar TE problem with segment routing under a constraint programming framework. Their middlepoint selection method also takes every node as a potential candidate on a per-demand basis, and they have to resort to heuristics to reduce the run time of the algorithm.

When it is clear from the context, we use the terms commodities, demands, and flows interchangeably.

TABLE I
NUMBER OF FLOW TABLE ENTRIES FOR THE EXAMPLE IN FIG. 2.

Node   w.o. segment routing   w. segment routing
A      2                      2
B      1                      1
C      3                      1
D      3                      1
E      3                      3

The exploration of segment routing in TE has been ad hoc so far. Existing work uses heuristics [28] or some special form of segment routing [5] without theoretical justification. We are thus motivated to conduct a systematic study of segment routing in TE, including the theoretical characterization of the hardness of various forms of segment routing, and practical algorithms for solving the problems.

Given our focus on TE, in the next section we review the two common types of TE formulations. We subsequently reveal an interesting connection between them, which we utilize later in the proof of several hardness results.

III. BACKGROUND ON TRAFFIC ENGINEERING
In our work, we focus on two common types of traffic engineering, depending on the objective criterion. The first type maximizes the total throughput subject to the capacity and maximum demand constraints. Since it can be formulated as a maximum flow problem, we call it $TE_{MF}$. The second type minimizes the maximum link utilization, which acts as the system bottleneck. For this reason, we call it $TE_{LU}$.

A. Preliminaries
Assume a directed graph $G = (V, E)$, where $V$ is the set of nodes and $E$ the set of directed edges. Given a node $v \in V$, $v^+$ denotes the set of outgoing edges of node $v$, i.e., the subset of edges in $E$ of the form $(v, u)$, $u \in V$. Similarly, $v^-$ denotes the set of incoming edges of $v$, of the form $(u, v)$, $u \in V$. The out-degree of $v$ is defined as the cardinality $|v^+|$, whereas the in-degree is defined as the cardinality $|v^-|$.

A flow network $G = (V, E, c)$ is defined as a directed graph $G = (V, E)$, together with a non-negative function $c : V \times V \to \mathbb{R}_{\ge 0}$ that assigns to each edge $e \in E$ a non-negative capacity $c(e)$. If $(u, v) \notin E$, then we define $c(u, v) = 0$.

A walk in a directed graph is an alternating sequence of vertices and edges, $v_0, e_0, v_1, \ldots, v_{k-1}, e_{k-1}, v_k$, which begins and ends with vertices and has the property that each $e_i$ is an edge from $v_i$ to $v_{i+1}$. A path is a walk where all edges are distinct. A simple path is a path where all vertices are distinct. The term $u$-$v$ path (resp., simple path) refers to any valid path (resp., simple path) from $u$ to $v$.

In flow networks, we usually distinguish between single-commodity and multi-commodity flows. For single-commodity flow, we consider a single commodity that consists of a source $s \in V$ and a sink $t \in V$, where $s \neq t$. For multi-commodity flows, we assume $L$ commodities of the form $(s_i, t_i)$, where $s_i, t_i \in V$, $s_i \neq t_i$. Each commodity $i$ is associated with a non-negative demand $D_i \ge 0$. For convenience, we also use the notation $\mathbf{s} = (s_1, \ldots, s_L)$ and $\mathbf{t} = (t_1, \ldots, t_L)$, and write $(\mathbf{s}, \mathbf{t})$ to denote the corresponding multi-commodity network.

B. TE Type 1: $TE_{MF}$

Let $\mathcal{P}_i$ be the set of all $s_i$-$t_i$ paths, and $\mathcal{P}_{i,e}$ the set of all $s_i$-$t_i$ paths that go through edge $e$. Then the maximum multi-commodity flow program can be expressed by the following path-based formulation:

$$\text{maximize} \quad \nu = \sum_{i=1}^{L} \sum_{p \in \mathcal{P}_i} f_i(p) \tag{1}$$
$$\text{subject to} \quad \sum_{i=1}^{L} \sum_{p \in \mathcal{P}_{i,e}} f_i(p) \le c(e), \quad \forall e \in E \tag{2}$$
$$\sum_{p \in \mathcal{P}_i} f_i(p) \le D_i, \quad \forall i \in \{1, \ldots, L\} \tag{3}$$
$$f_i(p) \ge 0, \quad \forall i \in \{1, \ldots, L\}, \ \forall p \in \mathcal{P}_i \tag{4}$$

The flow of commodity $i$ along path $p \in \mathcal{P}_i$ is $f_i(p)$. Constraint (2) is a capacity constraint: the sum of all sub-flows on any edge cannot exceed the edge capacity. Constraint (3) describes the maximum demand $D_i$ for commodity $i$. Finally, constraint (4) imposes that the flow should be non-negative. For any valid flow $f$, the value of the flow $\nu(f)$ is defined as the total number of units that all sub-flows $f_i$ send. The value of the maximum flow is denoted by $\nu_{\max}$. $TE_{MF}$ is mostly used in data center backbone WANs [30], [31], where traffic is elastic and the main objective is to fully utilize the expensive WAN links.

Note that even though the single-commodity maximum flow accepts various combinatorial algorithms [3], e.g., Ford-Fulkerson or Edmonds-Karp, there is to date no combinatorial algorithm for the maximum multi-commodity flow, even though the problem is known to be strongly polynomial due to Tardos [46]. Furthermore, even though single-commodity networks with integral capacities always accept an integer maximum flow, this is not always the case with multi-commodity networks; in fact, the decision problem of integral multi-commodity flow is NP-complete even if the number of commodities is two, for both the directed and undirected cases [14].

C. TE Type 2: $TE_{LU}$

$TE_{LU}$ is mostly used in carrier networks [22], [28], where traffic demands are given and inelastic, and the main objective thus is to control the congestion or link utilization in order to ensure the smooth operation of the network. The general form for this type of TE is:

$$\text{minimize} \quad \theta \tag{5}$$
$$\text{subject to} \quad \sum_{i=1}^{L} \sum_{p \in \mathcal{P}_{i,e}} f_i(p) \le \theta \cdot c(e), \quad \forall e \in E \tag{6}$$
$$\sum_{p \in \mathcal{P}_i} f_i(p) \ge D_i, \quad \forall i \in \{1, \ldots, L\} \tag{7}$$
$$f_i(p) \ge 0, \quad \forall i \in \{1, \ldots, L\}, \ \forall p \in \mathcal{P}_i \tag{8}$$

The variable $\theta$ in the objective function (5) refers to the maximum link utilization, which must be minimized. Constraint (6) ensures that $\theta$ will be at least as large as the maximum link utilization; constraint (7) ensures that each demand is satisfied; and the last constraint (8) is the same non-negativity constraint as (4) for $TE_{MF}$ in Section III-B.

D. An Interesting Connection
Having introduced the two types of TE, a natural question is whether they are related. To answer this question, we introduce the following decision version of the maximum multi-commodity flow problem.

Definition 1. [Decision version of maximum flow (DMF)] Given a flow network $G = (V, E, c)$ with a set of $L$ commodities $(\mathbf{s}, \mathbf{t})$, each associated with a non-negative maximum demand $D_i \ge 0$, decide whether the maximum multi-commodity flow has a value of at least $\sum_{i=1}^{L} D_i$.

Note that if the answer to the decision problem DMF is "yes", then by constraint (3) the maximum flow has to be exactly equal to $\sum_{i=1}^{L} D_i$. If the answer is "no", then the maximum flow is strictly less than $\sum_{i=1}^{L} D_i$.

We are now ready to establish the following result that reveals the relationship between the two types of TE:

Lemma 1. DMF accepts a "yes" answer if and only if the system (5)–(8) for $TE_{LU}$ accepts a solution $\theta^* \le 1$.

Proof. Assume that DMF accepts a "yes" answer. Then there is a flow of value $\sum_{i=1}^{L} D_i$ that respects constraints (2)–(4); by constraint (3), each commodity $i$ then sends exactly $D_i$ units. That flow therefore trivially satisfies constraints (6)–(8) with $\theta = 1$. Since the objective criterion of $TE_{LU}$ minimizes over $\theta$, the TE program (5)–(8) will accept an optimal solution $\theta^* \le 1$.

For the reverse direction, assume that the system (5)–(8) accepts a solution $\theta^* \le 1$. Then constraint (6) implies that the capacity constraints are satisfied for each edge. Scaling down the flow of each commodity to exactly $D_i$ if necessary, the corresponding flow is a valid flow for system (1)–(4) with value $\sum_{i=1}^{L} D_i$. The maximum flow then trivially has a value of at least $\sum_{i=1}^{L} D_i$.

Lemma 1 shows that solving $TE_{LU}$ immediately generates a "yes" or "no" answer to the DMF. Thus, $TE_{LU}$ naturally encompasses the general DMF problem of Definition 1. This also suggests that hardness results on the DMF (Proposition 1) immediately imply hardness for $TE_{LU}$. We describe this in more detail in the next section.

To conclude this section, notice that even though we assumed a directed network throughout Section III, it is possible to extend the definitions to undirected graphs as well. The main difference is that an undirected edge is associated with a single capacity, and flow can travel in both directions of a link, under the constraint that the sum of the flow values in the two edge directions does not exceed the capacity. Despite this seemingly innocuous difference, our analysis in Appendix B uncovers a profound dichotomy between the two cases with regard to maximum flow.

IV. HARDNESS OF GENERAL SEGMENT ROUTING
In this section, we present the first part of our theoretical investigation. We study the general form of applying segment routing to TE, where traffic can take any path that goes through a middlepoint. We prove an important new result: it is NP-hard to decide if the maximum flow through just one given middlepoint is greater than 0. Due to the connection between the decision version of maximum flow and $TE_{LU}$, this also means that $TE_{LU}$ given a single fixed middlepoint is already NP-hard. This then motivates us to consider using segment routing with shortest paths only in the next section.

A. Hardness of $TE_{MF}$

The maximum multi-commodity flow $f_{\max}$ with value $\nu_{\max}$ refers to the total flow over all possible paths that each commodity accepts. Assume instead that we focus on the maximum flow that can go through a specific network node, e.g., $w \neq s, t$. Let $\mathcal{P}^w_i$ be the set of all $s_i$-$w$-$t_i$ paths (i.e., $s_i$-$t_i$ paths that go through $w$), and $\mathcal{P}^w_{i,e}$ the set of all $s_i$-$w$-$t_i$ paths that also go through edge $e$. The path-based formulation then is:

$$\text{maximize} \quad \nu^w = \sum_{i=1}^{L} \sum_{p \in \mathcal{P}^w_i} f_i(p)$$
$$\text{subject to} \quad \sum_{i=1}^{L} \sum_{p \in \mathcal{P}^w_{i,e}} f_i(p) \le c(e), \quad \forall e \in E$$
$$f_i(p) \ge 0, \quad \forall i \in \{1, \ldots, L\}, \ \forall p \in \mathcal{P}^w_i$$

We call the maximum flow through a node $w$ the maximum $w$-flow $f^w_{\max}$ and denote its value by $\nu^w_{\max}$. Alternatively, we use the notation $s$-$w$-$t$ flow for single-commodity networks (or $\mathbf{s}$-$w$-$\mathbf{t}$ flow for multi-commodity networks). Similarly, for single-commodity flows we also write $\nu^w_{\max}(s, t)$ (or $\nu^w_{\max}(\mathbf{s}, \mathbf{t})$ for multi-commodity networks) for the value of the maximum $w$-flow.

Note that in the single-commodity case we always assume that $w \neq s, t$, even if not explicitly stated. Indeed, if either $w = s$ or $w = t$ then $\nu_{\max} = \nu^w_{\max}$. In this case, the problem is strongly polynomial and accepts combinatorial algorithms such as the Ford-Fulkerson algorithm.
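To make the path set $\mathcal{P}^w$ concrete, the following brute-force sketch enumerates every $s$-$t$ path (a walk with pairwise distinct edges, as defined in Section III-A) that visits $w$ in a small directed graph. The function names and the example graph are illustrative assumptions, not taken from the paper; the search is exponential in the worst case and is meant only for tiny instances.

```python
def swt_paths(edges, s, w, t):
    """Yield every s-t walk with pairwise distinct edges that visits w.
    Brute-force DFS over edges; assumes a simple digraph (no parallel edges)."""
    out = {}
    for u, v in edges:
        out.setdefault(u, []).append((u, v))

    def extend(node, used, nodes):
        # A walk ending at t counts only if it has already passed through w.
        if node == t and w in nodes:
            yield list(nodes)
        for edge in out.get(node, []):
            if edge not in used:
                used.add(edge)
                nodes.append(edge[1])
                yield from extend(edge[1], used, nodes)
                nodes.pop()
                used.remove(edge)

    yield from extend(s, set(), [s])


def is_simple(path):
    """A path is simple when no vertex repeats."""
    return len(set(path)) == len(path)


# Hypothetical example: w hangs off node a, so reaching w forces a revisit of a.
EDGES = [("s", "a"), ("a", "t"), ("a", "w"), ("w", "a")]

if __name__ == "__main__":
    for path in swt_paths(EDGES, "s", "w", "t"):
        print(path, "simple" if is_simple(path) else "not simple")
```

In this example the only $s$-$w$-$t$ path is s, a, w, a, t, which repeats node a: an $s$-$w$-$t$ path exists although no simple one does, so the two notions are genuinely different.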
A central result in graph theory that we will be using throughout the paper concerns the two node-disjoint paths (2DP) problem and is due to Fortune, Hopcroft and Wyllie [21].

Theorem 1 (NP-hardness of 2DP [21]). Assume a directed graph $G = (V, E)$ and four distinct vertices $u_1, u_2, v_1, v_2 \in V$. It is NP-hard to decide whether there are two node-disjoint paths in $G$ from $u_1$ to $u_2$ and from $v_1$ to $v_2$.

We are now ready to provide two lemmas.
Lemma 2.
Deciding whether there is a simple $s$-$w$-$t$ path in a directed graph $G = (V, E)$, where $w, s, t \in V$ and $w \neq s, t$, is NP-hard.

Proof. Finding whether there is a simple $s$-$t$ path that goes through a node $w$ is equivalent to determining whether there are two node-disjoint paths from $s$ to $w$ and from $w$ to $t$ (excluding of course node $w$). We prove that the latter problem is NP-hard by a reduction from the NP-hard 2DP problem.

Assume a directed graph $G = (V, E)$ and nodes $u_1, u_2, v_1, v_2 \in V$. We introduce a new node $w$ and create the new edges $e_1 = (u_2, w)$ and $e_2 = (w, v_1)$. We now argue that there are two node-disjoint paths from $u_1$ to $u_2$ and from $v_1$ to $v_2$, if and only if there is a simple $u_1$-$w$-$v_2$ path.

Indeed, if the former condition is satisfied, then the path from $u_1$ to $u_2$ cannot go through node $w$ via edge $e_1$, since otherwise that path would also have to use node $v_1$ after $w$, given that $u_2$ is the end node. Similarly, the path from $v_1$ to $v_2$ cannot go through node $w$ via edge $e_2$. But then we can form a new path from $u_1$ to $v_2$ by concatenating the path from $u_1$ to $u_2$, edge $e_1$, edge $e_2$, and finally the path from $v_1$ to $v_2$. This path does not repeat any node, since the node-disjoint paths from $u_1$ to $u_2$ and from $v_1$ to $v_2$ do not contain $w$; hence it is a simple path. For the reverse direction, we just note that if there exists a simple $u_1$-$w$-$v_2$ path, then this path will necessarily contain edges $e_1$ and $e_2$. By removing these two edges, we get two node-disjoint paths, one from $u_1$ to $u_2$ and another from $v_1$ to $v_2$, since the $u_1$-$w$-$v_2$ path is simple.

Lemma 3.
Deciding whether there is an $s$-$w$-$t$ path in a directed graph $G = (V, E)$, where $w, s, t \in V$ and $w \neq s, t$, is NP-hard.

Proof. Finding whether there is an $s$-$t$ path that goes through a node $w$ is equivalent to determining whether there are two edge-disjoint paths from $s$ to $w$ and from $w$ to $t$. We now argue that the latter problem is NP-hard by a reduction from the 2DP problem.

Indeed, assume any graph $G = (V, E)$ and three distinct nodes $s, t, w \in V$. We construct a new graph $G' = (V', E')$ from $G = (V, E)$ in the following manner. For each node $v \in V$ we introduce two nodes $v_{in}, v_{out} \in V'$. For each edge $e = (u, v) \in E$, we introduce an edge $e' = (u_{out}, v_{in}) \in E'$. Moreover, $E'$ contains an edge $e' = (v_{in}, v_{out})$ connecting each pair of nodes $v_{in}, v_{out}$ in $G'$ that we constructed above. We now claim that there are two edge-disjoint paths in graph $G'$ from $s_{out}$ to $w_{in}$ and from $w_{out}$ to $t_{in}$, if and only if there are two node-disjoint paths in $G$ from $s$ to $w$ and from $w$ to $t$.

Indeed, consider first any two node-disjoint paths in $G$, namely $s, u_1, \ldots, u_l, w$ and $w, v_1, \ldots, v_m, t$, where all intermediate nodes $u_i$ and $v_j$ are distinct. Then it is easy to see that the paths $s_{out}, u_{1,in}, u_{1,out}, \ldots, u_{l,in}, u_{l,out}, w_{in}$ and $w_{out}, v_{1,in}, v_{1,out}, \ldots, v_{m,in}, v_{m,out}, t_{in}$ in $G'$ are: (1) valid, since they use existing edges in $G'$, and (2) edge-disjoint, since the set of nodes on the first path is disjoint from the set of nodes on the second. For the reverse direction, consider two edge-disjoint paths in $G'$ from $s_{out}$ to $w_{in}$ and from $w_{out}$ to $t_{in}$. We then argue that these paths necessarily have the above form $s_{out}, u_{1,in}, u_{1,out}, \ldots, u_{l,in}, u_{l,out}, w_{in}$ and $w_{out}, v_{1,in}, v_{1,out}, \ldots, v_{m,in}, v_{m,out}, t_{in}$. The reason is that any pair of nodes $(v_{in}, v_{out})$ can only be reached from other nodes in $V'$ via $v_{in}$ and can only reach other nodes in $V'$ via $v_{out}$. So, a path will necessarily consist of consecutive pairs of nodes of the form $(v_{in}, v_{out})$ (with the exception of the two endpoints). Furthermore, any such pair $(v_{in}, v_{out})$ (1) can appear at most once on either path, and (2) cannot appear on both paths. The reason is that going from $v_{in}$ to $v_{out}$ requires the edge $(v_{in}, v_{out})$, but the two paths are edge-disjoint. But then it holds that the paths $s, u_1, \ldots, u_l, w$ and $w, v_1, \ldots, v_m, t$ in $G$ are both valid and node-disjoint. This completes the proof.

We next provide definitions and results for the maximum $w$-flow that are reminiscent of results in traditional single-commodity maximum flow. One significant difference is that the cut is now defined as a collection of edges rather than a collection of nodes. We focus first on the $s$-$w$-$t$ flow in single-commodity networks.

Definition 2. An $s$-$w$-$t$ edge-cut is a subset of edges $\mathcal{C}^w \subseteq E$ such that removing the edges in $\mathcal{C}^w$ from the graph results in no $s$-$w$-$t$ paths, i.e., there are no $s$-$w$-$t$ paths in the graph $G' = (V, E - \mathcal{C}^w)$. The value $c(\mathcal{C}^w)$ of the edge-cut is defined as the sum of the capacities of all edges in $\mathcal{C}^w$.

Lemma 4.
Let $f^w$ be any $s$-$w$-$t$ flow, and $\mathcal{C}^w$ any $s$-$w$-$t$ cut. Then $\nu^w(f^w) \le c(\mathcal{C}^w)$.

Proof. First, note that the flow $f^w$ is the sum of individual flows, each going through a distinct $s$-$w$-$t$ path $p$. Each of these individual flows must go through at least one edge $e \in \mathcal{C}^w$, otherwise there would be an $s$-$w$-$t$ path in the graph $G' = (V, E - \mathcal{C}^w)$. So, let $F_e$ be the set of flows that go through $e$. Then we have:

$$\sum_{e \in \mathcal{C}^w} \nu^w(F_e) \le \sum_{e \in \mathcal{C}^w} c(e) \;\Leftrightarrow\; \sum_{e \in \mathcal{C}^w} \nu^w(F_e) \le c(\mathcal{C}^w) \;\Rightarrow\; \nu^w(f^w) \le c(\mathcal{C}^w) \tag{9}$$

The last step uses $\nu^w(f^w) \le \sum_{e \in \mathcal{C}^w} \nu^w(F_e)$, which holds since we argued that the path for each individual flow must go through at least one edge in $\mathcal{C}^w$, so every individual flow is counted at least once in the sum.

Lemma 5.
Given a flow network $G = (V, E, c)$ and three distinct nodes $s, w, t$, the maximum value of any $s$-$w$-$t$ flow is equal to the minimum capacity of any $s$-$w$-$t$ edge-cut.
Proof.
Consider a variation of the well-known Ford-Fulkerson algorithm for (single-commodity) maximum flow [43], where at each round the algorithm picks an $s$-$w$-$t$ path rather than just an $s$-$t$ path. The augmenting $s$-$w$-$t$ path algorithm terminates if and only if there is an $s$-$w$-$t$ edge-cut $\mathcal{C}^w$ in the graph where each edge $e \in \mathcal{C}^w$ is saturated. Next, note that the total flow comprised of all individual path flows must be at least as large as the capacity of that edge-cut. But since by Lemma 4 the value of any flow can be at most as large as the value of any $s$-$w$-$t$ edge-cut, it must necessarily hold that the flow that the augmenting path algorithm returns is maximal and equal to the capacity of the minimum $s$-$w$-$t$ edge-cut.

Corollary 1.
For integral capacities, there is a maximum $s$-$w$-$t$ flow that is integral.

Proof. Note that by Lemma 5 the augmenting $s$-$w$-$t$ path algorithm returns a maximum flow and, furthermore, at each step the flow on any edge is integral. Thus, there must be a maximum flow whose value on any edge is integral.

We are now ready to prove that the decision problem of whether the maximum flow through any specific node is greater than 0 is NP-hard under integral demands.

Proposition 1.
Given a multi-commodity flow network $G = (V, E, c)$, it is NP-hard to even decide whether $\nu^w_{\max}(\mathbf{s}, \mathbf{t}) > 0$.

Proof. We show that even the single-commodity version of our problem is NP-hard. Since the single-commodity case can be seen as a special case of the multi-commodity problem, our result immediately implies hardness for the multi-commodity maximum flow as well. Our strategy will be to reduce the $s$-$w$-$t$ path problem in Lemma 3 to the single-commodity maximum flow problem with integral demands.

Indeed, consider a directed graph $G = (V, E)$ and three distinct nodes $s, t, w \in V$. We construct a flow network $G'$ from $G$ in the following manner. We consider one commodity from $s$ to $t$, and we further assume that each edge $e \in E$ has a capacity of 1. We then argue that there is a path from $s$ to $t$ through $w$, if and only if the maximum flow through $w$ in $G'$ is greater than 0.

First, assume there is a path $p = (s, e_1, \ldots, e_m, t)$, so that every edge $e_i$ in the path appears only once and node $w$ appears in the path. It is then possible to send one unit of flow from $s$ to $t$ along $p$, given the unit capacities. Thus, the maximum flow through $w$ will be at least 1, and hence greater than 0. For the reverse direction, assume there is a maximum flow greater than 0 in $G'$. Since we only have one commodity, it follows that there has to be a maximum flow where the flow value on every edge is integral. Moreover, note that each edge can be used at most once in that flow due to its unit capacity. Now, consider any path $p_{(s,w)}$ that carries flow from $s$ to $w$, and any path $p_{(w,t)}$ that carries flow from $w$ to $t$. Since each edge is used at most once, the path $p_{(s,w)} \cup p_{(w,t)}$ has the following properties: (1) it goes from $s$ to $t$ through $w$, and (2) it visits any edge at most once. Thus, there must be a path from $s$ to $t$ through $w$.

Corollary 2.
Given a flow network G = ( V, E, c ) with asingle commodity ( s, t ) and a node w ∈ V , it is NP-hard tocompute the minimum s - w - t cut C w .Proof. We can show that even for single commodity-networksthe problem is NP-hard. Indeed, by Lemma 5 we know thatthe maximum value of any s - w - t flow is equal to the minimumcapacity of any s - w - t edge-cut. But since by Proposition 1 it isNP-hard to compute the maximum s - w - t flow, it must also beNP-hard to compute the value of the minimum s - w - t cut.Perhaps unexpectedly, the above hardness results do nothold when the underlying graph is undirected. Appendix Bdiscusses this dichotomy in detail. B. Hardness of
$TE_{LU}$

For the flow network $G$ with the $L$ commodities, we define the $TE_{LU}$ in the following manner:

\[ \min \theta \tag{10} \]
subject to
\[ \sum_{i=1}^{L} \sum_{p \in \mathcal{P}^{w}_{i,e}} f_i(p) \le \theta \cdot c(e), \quad \forall e \in E \tag{11} \]
\[ \sum_{i=1}^{L} \sum_{p \in \mathcal{P}^{w}_{i}} f_i(p) \ge \tag{12} \]
\[ f_i(p) \ge 0, \quad \forall i \in \{1, \ldots, L\}, \; \forall p \in \mathcal{P}^{w}_{i} \tag{13} \]

Lemma 1 shows that solving the $TE_{LU}$ immediately generates a "yes" or "no" answer to the corresponding DMF. Thus, the hardness results on the DMF (Proposition 1) immediately imply hardness for the corresponding TE:

Corollary 3.
It is NP-hard to solve the $TE_{LU}$ (10)–(13).

V. SEGMENT ROUTING WITH SHORTEST PATHS
Given the NP-hardness of applying general segment routing in TE, here we consider TE with shortest path based segment routing. That is, traffic is now routed only along the shortest paths for a given segment. In this sense, our results provide, for the first time, a theoretical foundation for existing work that focuses on shortest path based segment routing [5], [28].
A. Network model and TE
Assume there are in total $K$ middlepoints available. Each end-to-end path can use up to $M \le K$ of these middlepoints. For a segment $s \in S$ between an ingress node and a middlepoint, two middlepoints, or a middlepoint and an egress node, there are multiple paths in general. We assume, for simplicity, that routing is done by ECMP over all shortest paths of a segment. This is consistent with prior work [5]. We use $T_i$ to denote the complete set of logical tunnels formed by segments in $S$ that can be used for commodity $i$, with up to $M$ middlepoints. A tunnel involves only the ingress and egress switches and the intermediate middlepoints. This set can be constructed offline efficiently.

Let $G_{t,s}$ denote whether a tunnel $t$ uses segment $s$ or not, and $I_{p,e}$ denote whether path $p$ uses link $e$ or not. Furthermore, let $\hat{P}_s$ be the set of all shortest paths for segment $s$, and let $f_i(t)$ represent the flow in tunnel $t$ for commodity $i$. The split ratio $x_{i,t}$ for $i$ on tunnel $t$ is defined as the ratio $x_{i,t} = f_i(t) / \sum_{t' \in T_i} f_i(t')$.

The $TE_{LU}$ problem with segment routing can be formulated similarly to Section III-C, where the set of paths $P_i$ for commodity $i$ is now replaced by the set of logical tunnels $T_i$:

\[ \min \theta \tag{14} \]
s.t.
\[ \sum_{i=1}^{L} \sum_{t \in T_i} \sum_{s \in S_t} \sum_{p \in \hat{P}_s} f_i(t) \frac{I_{p,e}}{|\hat{P}_s|} \le \theta \cdot c(e), \quad \forall e \in E, \tag{15} \]
\[ 0 \le f_i(t), \quad \forall i \in \{1, \ldots, L\}, \; t \in T_i, \tag{16} \]
\[ \sum_{t \in T_i} f_i(t) \ge D_i, \quad \forall i \in \{1, \ldots, L\}. \tag{17} \]

The capacity constraint (15) indicates that the total traffic routed to link $e$ from across all flows, tunnels, segments, and shortest paths cannot exceed $\theta$ times the link capacity. Since ECMP is used for routing within any segment $s$, each shortest path $p$ of segment $s$ receives flow equal to $f_i(t) / |\hat{P}_s|$.

Regarding the TE asymptotic complexity, we have the following result when $M$ is fixed and not part of the input:

Proposition 2.
For fixed $M$ with respect to the input graph $G$, the $TE_{LU}$ problem described by Equations (14)–(17) can be solved in (weakly) polynomial time.

Proof. The number of commodities $L$ cannot exceed $|V| \cdot (|V| - 1)$, and the number $|T_i|$ of tunnels per commodity $i$ is upper bounded by $\binom{K}{1} + \cdots + \binom{K}{M}$, where $K \le |V|$. For fixed $M$ w.r.t. the input graph $G$, $|T_i|$ has polynomial size w.r.t. the graph. Finally, the number $|S_t|$ of segments per tunnel cannot exceed $K + 1 \le |V| + 1$, since a tunnel can use at most all $K$ middlepoints. The inner sum $\sum_{p \in \hat{P}_s} I_{p,e} / |\hat{P}_s|$ denotes the fraction of shortest paths for segment $s$ that use link $e$. This can be computed in polynomial time, e.g., by using the techniques in [7].

Thus, we have proved that for fixed $M$, the LP has a polynomial number of variables, and a polynomial number of constraints whose coefficients can be computed in polynomial time. The proposition then immediately follows by standard results in linear programming [33], [34].

Given that the $TE_{MF}$ formulation is very similar, we can similarly prove that it can also be solved in (weakly) polynomial time for segment routing with shortest paths.

Finally, we observe that the TE problem is naturally related to the shortest path centrality discussed in Section VI. Indeed, the inner part $\sum_{p \in \hat{P}_s} I_{p,e} / |\hat{P}_s|$ of constraint (15) precisely describes the fraction of shortest paths for segment $s$ that use a specific edge, and we compute it for all possible segments. Even though shortest path centrality refers to a node rather than an edge and equally takes into account all possible source-destination pairs, constraint (15) reveals interesting connections between the popular centrality metric and the segment routing problem.

B. Hardness of Acyclic Segment Routing
So far, we have focused on segment routing where routing on a segment is based on ECMP. One challenge is that this generally produces source-destination paths with edge repetitions, i.e., walks. Even in the case of just one middlepoint per path, it is possible that a (simple) shortest path from the source $s$ to a middlepoint $M$ shares an edge $e$ with a shortest path from $M$ to the destination $d$. So, even though the paths for any given segment are simple, the resulting $s$-$d$ route may not even be a path. This may reduce the performance of segment routing, because it increases the link load on the reused edges and may lead to higher link utilization.

A natural question then arises: what if we consider segment routing with shortest paths for segments, under the condition that the resulting walk from the source to the destination is a path or even a simple path? As our subsequent analysis shows, traffic engineering then becomes NP-hard, even for just one commodity. To prove this fact, we first introduce the following fundamental result due to Eilam-Tzoreff [12].

Theorem 2 (NP-hardness of $k$DSP [12]). Given a graph $G = (V, E)$ and $k$ pairs of distinct vertices $(u_i, v_i)$, $1 \le i \le k$, the $k$DSP problem of computing $k$ pairwise disjoint shortest paths $P_i$ between $u_i$ and $v_i$ is NP-complete when $k$ is part of the input. This result holds for all four versions of the $k$DSP problem, namely, node- or edge-disjoint paths for directed or undirected graphs.
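The edge-repetition effect described above, together with the per-edge coefficient of constraint (15), can be illustrated with a small sketch. It assumes hop-count shortest paths on a directed graph and an equal split over all shortest paths of a segment; the function names (`count_paths`, `segment_fractions`, `tunnel_load`) and the toy topology are hypothetical and not taken from the paper.

```python
from collections import deque, defaultdict

def count_paths(adj, src):
    """BFS from src: hop distance and number of shortest paths to each reachable node."""
    dist = {src: 0}
    sigma = defaultdict(int)
    sigma[src] = 1
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def reverse(adj):
    """Adjacency lists of the graph with every edge reversed."""
    rev = defaultdict(list)
    for u, nbrs in adj.items():
        for v in nbrs:
            rev[v].append(u)
    return rev

def segment_fractions(adj, a, b):
    """Fraction of shortest a->b paths that use each directed edge."""
    dist_a, sig_a = count_paths(adj, a)
    dist_b, sig_b = count_paths(reverse(adj), b)
    if b not in dist_a:
        return {}
    frac = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            on_shortest = (u in dist_a and v in dist_b
                           and dist_a[u] + 1 + dist_b[v] == dist_a[b])
            if on_shortest:
                frac[(u, v)] = sig_a[u] * sig_b[v] / sig_a[b]
    return frac

def tunnel_load(adj, s, m, d):
    """Load per unit of flow on each edge for the one-middlepoint tunnel s -> m -> d."""
    load = defaultdict(float)
    for a, b in ((s, m), (m, d)):
        for edge, f in segment_fractions(adj, a, b).items():
            load[edge] += f
    return dict(load)

# Toy topology (an assumption): both segments must cross the edge (a, b).
adj = {"s": ["a"], "a": ["b"], "b": ["m", "d"], "m": ["a"]}
load = tunnel_load(adj, "s", "m", "d")
repeated = sorted(e for e, x in load.items() if x > 1)
print(repeated)
```

In this toy graph each segment has a single shortest path, yet the concatenated route crosses one edge twice, so that edge carries twice the tunnel's flow. This is the extra load that motivates the acyclic variant studied in this subsection.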
Proposition 3. The $TE_{MF}$ and $TE_{LU}$ problems in a directed or undirected graph with $K$ middlepoints, (1) using only shortest path segment routing and (2) only allowing paths or simple paths from a source to a destination, are NP-hard, even for just one commodity, when $K$ is part of the input.

Proof sketch. We can show the statement by making similar arguments as for general segment routing TE in Section IV. For $TE_{MF}$, the idea is to first show that we can solve $k$DSP if and only if we can solve the corresponding $TE_{MF}$ formulation. Indeed, assume a directed or undirected graph $G = (V, E)$, two distinct nodes $s, t$ in $V$, and $K$ distinct nodes $s \ne M_i \ne t$ in $V$, $1 \le i \le K$. Suppose we do segment routing from $s$ to $t$ using nodes $M_i$ as our $K$ middlepoints. We will show that the $TE_{MF}$ problem is NP-hard by a reduction from the $k$DSP problem.

Concretely, consider for instance the node-disjoint $k$DSP problem in Theorem 2. We construct a new graph $G'$ as follows. For each $i$, $1 \le i \le K - 1$, we introduce a new node $M_i$ along with the two (directed or undirected) edges $e^{i}_{in} = (v_i, M_i)$ and $e^{i}_{out} = (M_i, u_{i+1})$. Moreover, we associate each edge in $G'$ with a positive capacity, and we assume the single commodity $(s, t) = (u_1, v_K)$ with a positive demand $D > 0$. We now argue that there are $K$ node-disjoint shortest paths between $u_i$ and $v_i$ if and only if $TE_{MF}$ with the single commodity $(s, t)$ and the $K - 1$ middlepoints $M_1, \ldots, M_{K-1}$ admits a positive solution (maximum flow).

This can be proven using very similar techniques as in Section V-B. The only difference is that the minimum edge cut in this case corresponds to the minimum sum of edge capacities whose removal results in no path (or simple path) using shortest paths from the source to the destination through the middlepoints.

Finally, NP-hardness for $TE_{LU}$ then follows immediately by Lemma 1, in a similar spirit as Corollary 3.

Interestingly, Proposition 3 is general and holds for both directed and undirected graphs. We emphasize, however, that it assumes that the number of middlepoints $K$ is part of the input. For $k = 2$, [12] provides a polynomial algorithm for the undirected case of $k$DSP, whereas the complexity for the directed case when $k = 2$ remains open.

VI. CENTRALITY BASED MIDDLEPOINT SELECTION
The previous sections investigated the fundamentals of segment routing, and showed that the TE for unrestricted segment routing is NP-hard. On the other hand, Proposition 2 suggests that if we only allow a fixed number $M$ of middlepoints per path, then TE with shortest paths is (weakly) polynomially computable.

One approach would then be to consider all nodes as candidate middlepoints, i.e., $K = |V|$. However, that results in very large TE programs that are costly to solve. An alternative is to consider just a small number of middlepoints, such that $K \ll |V|$, that would still produce good output for the TE. Given that this is generally NP-hard [28], in this section we discuss practical middlepoint selection based on alternative centrality measures with polynomial complexity. Note that these centralities are structural metrics that look at the graph structure, i.e., the connections among the various nodes. However, they generally do not take into account the flow network and its flow conservation and capacity constraints, which was the case with the NP-hard flow centralities in Appendix A.

Shortest-path centrality.
We start with shortest-path centrality, which characterizes the power of a node in terms of the number of shortest paths that go through that node for a randomly picked source-destination pair. Concretely, assume a directed graph $G = (V, E)$. The shortest-path betweenness centrality of a node $v \in V$ [24] is defined as:

\[ \delta(v) = \sum_{s,t \in V \mid s \ne v \ne t} \frac{\sigma_{st}(v)}{\sigma_{st}}, \tag{18} \]

where $\sigma_{st}(v)$ is the number of shortest paths from $s$ to $t$ that go through $v$, and $\sigma_{st}$ is the total number of shortest paths from $s$ to $t$. Calculating the shortest path centrality of all vertices in a graph requires $\Theta(|V|^3)$ time and $\Theta(|V|^2)$ space when done by augmenting the Floyd-Warshall algorithm for the all-pairs shortest-paths problem with path counting. Brandes' algorithm improves these bounds by using only $O(|V| + |E|)$ space and running in $O(|V| \cdot |E|)$ and $O(|V| \cdot |E| + |V|^2 \log |V|)$ time on unweighted and weighted networks, respectively [7].

Group shortest-path centrality.
As opposed to the aforementioned individual centrality, the group shortest-path betweenness centrality of a group of nodes $C \subseteq V$ refers to the combined centrality of the group [15]. It is defined as:

\[ \delta^{G}(C) = \sum_{s,t \in V \setminus C \mid s \ne t} \frac{\sigma_{st}(C)}{\sigma_{st}}, \tag{19} \]

where $\sigma_{st}(C)$ is the number of shortest paths from $s$ to $t$ that go through any node in $C$. Group betweenness centrality can be approximated within a factor $1 - 1/e$ of the optimal using a greedy incremental algorithm [10]. Brandes' algorithm for computing the betweenness centrality of all vertices can be modified to compute the group betweenness centrality of one group of nodes with the same asymptotic running time [44].

Degree centrality.
A simple alternative to the family of shortest path centralities is degree centrality. The degree centrality of a node $v \in V$ is defined as the average of its in-degree and its out-degree:

\[ d(v) = \frac{|v^{+}| + |v^{-}|}{2} \tag{20} \]

Degree centrality captures a node's power by its number of neighbors; the higher that number, the better connected the node and the larger its centrality. Despite its simplicity, degree centrality can capture to a good extent a node's structural importance.

Weighted centralities.
All aforementioned centralities only employ the graph connectivity information, and treat all links equally. However, in practice links are further characterized by their capacity. We can thus define variants of the previous centralities that additionally take into account the link capacity information. A simple approach is to associate each edge with the non-negative cost $1 / c(e)$. This is based on the observation that the higher the capacity, the lower the cost of the link, since it can accommodate larger flows. The shortest path centrality variants are simple to define if we note that the cost of a path is the sum of the costs of its constituent links, and the shortest path refers to the path with the minimum cost among all paths. In a similar spirit, we can define the weighted degree of any node as the sum of the costs of the edges that are incident to the node. Intuitively, we expect that the weighted variants should perform better since they take into account both the connectivity and the capacity information. Section VII-E empirically confirms our intuition.

VII. EVALUATION
In this section, we conduct trace-driven simulations to evaluate the performance of centrality based middlepoint selection methods. The experiments are designed to answer the following important questions:
• What is the best parameter setting for centrality based middlepoint selection? Specifically, how many middlepoints per commodity, and how many middlepoints in total, should we use?
• How does our centrality based approach compare to existing work, in terms of both performance and complexity?
• How do various centrality definitions perform against each other?
A. Methodology
We use two network topologies from the dataset provided by DEFO (Declarative and Expressive Forwarding Optimizer) [2], which is used in [28]. One is a synthetic network with 100 nodes (synth100 in the dataset), and the other is a real network with 161 nodes (rf3257 in the dataset). Table II provides more details about the networks. The DEFO dataset also contains information on commodity flows (simply referred to as flows hereafter) for these topologies. For the real 161-node topology, the flows are provided by the ISP [28]. For the synthetic topology, the demand matrices are computed using the approach in [45]. As explained in [28], this approach uses a gravity model fed with i.i.d. exponential random variables. It produces realistic demand matrices as shown in [28], [45].
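As a rough illustration of a gravity model fed with i.i.d. exponential random variables, the sketch below draws one outgoing and one incoming weight per node and makes each demand proportional to their product. The exact procedure of [45] is not reproduced here: the rate parameter, the normalization to a fixed total volume, and the function name `gravity_demands` are all assumptions made for illustration.

```python
import random

def gravity_demands(nodes, total_traffic, seed=0):
    """Return {(src, dst): demand} from a simple gravity model.

    Each node gets an i.i.d. exponential 'out' weight and 'in' weight;
    the demand from s to t is proportional to out[s] * in[t], and the
    matrix is rescaled so that all demands sum to total_traffic.
    """
    rng = random.Random(seed)
    out_w = {v: rng.expovariate(1.0) for v in nodes}
    in_w = {v: rng.expovariate(1.0) for v in nodes}
    raw = {(s, t): out_w[s] * in_w[t]
           for s in nodes for t in nodes if s != t}
    scale = total_traffic / sum(raw.values())
    return {pair: w * scale for pair, w in raw.items()}

demands = gravity_demands(["a", "b", "c", "d"], total_traffic=100.0)
for pair, volume in sorted(demands.items()):
    print(pair, round(volume, 2))
```

Fixing the seed keeps the generated matrix reproducible across runs, which is convenient when the same flow set has to be reused for several middlepoint selections.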
TABLE II
DATASET SUMMARY.

Type       ID        Nodes  Links  Flows
Synthetic  synth100  100    572    9817
Real       rf3257    161    656    25486

We perform simulations on servers, each with a 2.2 GHz 64-bit 8-core Xeon processor and 128 GB memory. We use the cvxpy [1] modeling language with the ECOS solver [11] to solve the LPs. Our evaluation compares the following schemes:
• Baseline: Traditional approach of applying segment routing studied in Sec. IV of [5]. Specifically, the TE problem assumes that every node is a candidate middlepoint, and exactly one middlepoint is used for each flow. Only shortest paths are used for segment routing.
• Random: Our approach where a total of $K$ middlepoints are randomly selected and used in TE problems $TE_{MF}$ and $TE_{LU}$.
• Shortest-path centrality (SP): Our approach where middlepoints are selected using shortest-path centrality as explained in §VI for TE.
• Group shortest-path centrality (GSP): Our approach where middlepoints are selected using group shortest-path centrality.
• Degree centrality (Degree): Our approach where middlepoints are selected using degree centrality.
B. Microscopic Performance
First we aim to understand the microscopic performance of our centrality based approach. There are two key parameters affecting our approach in general: the number of middlepoints per flow $M$, and the total number of available middlepoints for all flows $K$. Their effects need to be thoroughly understood before we compare our approach to existing methods.

Number of middlepoints per flow $M$: We begin by asking how many middlepoints should be used for each flow. Note that there is an inherent tradeoff: more middlepoints per flow lead to more flexibility in constructing the paths and balancing the traffic, and thus better performance. On the other hand, they also mean more overhead in terms of higher complexity of the TE algorithms.

To demonstrate this tradeoff, we use $TE_{LU}$ and compute the maximum link utilization with our centrality based middlepoint selection when the number of middlepoints per flow $M$ is equal to 1 or 2. We use the synth100 network with 100 nodes. We randomly choose 1000 flows for the topology ten times and report the average. We apply GSP to select the middlepoints, and vary the total number of available middlepoints $K$ from 2 to 6. For a given $K$, the flows are identical for different values of $M$. Note that we also experiment with $TE_{MF}$ and the 161-node network; the results are qualitatively similar and omitted here for space.

[Figure: x-axis number of middlepoints (K), y-axis max link utilization (%); curves M=1, M=2]
Fig. 3. Maximum link utilization of the 100-node network with 1000 flows and varying $M$.

[Figure: x-axis number of middlepoints (K), y-axis LP time (s); curves M=1, M=2]
Fig. 4. LP solving time of $TE_{LU}$ on the 100-node network with 1000 flows and varying $M$.

Fig. 3 and Fig. 4 depict the results. Interestingly, the maximum link utilizations for $M = 1$ and $M = 2$ are quite similar. Yet solving the TE with $M = 2$ takes much longer than with $M = 1$, as shown in Fig. 4 (a difference of almost 20x when $K = 6$). Given that the middlepoints are central, there is indeed a low probability that a bottleneck link exists between two middlepoints. Hence, maximum link utilization is largely determined by the bottleneck links between either the source and the middlepoint, or between the middlepoint and the destination. Therefore we conclude that 1 middlepoint per flow is good enough for performance, and use that throughout the remainder of the experiments.

Number of total middlepoints $K$: We next run experiments to verify that just a few central middlepoints are sufficient to achieve satisfactory TE performance. We vary the total number of middlepoints $K$ from 1 to 6 for $TE_{LU}$ and from 1 to 8 for $TE_{MF}$. The middlepoints are selected using Random, SP, GSP, and Degree as explained in §VII-A. We use both the 100-node and 161-node networks, and randomly choose 1000 flows and 2000 flows, respectively, for 10 runs. We report the average and standard deviation. Since Random is non-deterministic, we randomly select 5 sets of middlepoints for each of the 10 flow sets, resulting in 50 runs in total for Random. For $TE_{MF}$, in order to make the results more readable, we scale the traffic volumes by 10 times for the 100-node topology and 40 times for the 161-node topology.

We depict the results in Fig. 5–Fig. 8. As expected, more available middlepoints improve TE performance. For $TE_{LU}$, when there is only one available middlepoint for the network, every flow has to be routed through it, which severely limits the path choice; the maximum link utilization is way above 1 for Random and over 1 for the other schemes. With two middlepoints the maximum link utilization is dramatically reduced, by over 50% for most schemes, as seen in Fig. 5 and Fig. 6. The same can be observed for $TE_{MF}$ in Fig. 7 and Fig. 8: the demand satisfaction ratio improves by around a factor of 2 when $K$ increases to 2.

[Figure: x-axis number of middlepoints (K), y-axis max link utilization; curves Random, SP, Degree, GSP]
Fig. 5. 100-node network with 1000 flows of $TE_{LU}$.

[Figure: x-axis number of middlepoints (K), y-axis max link utilization]
Fig. 6. 161-node network with 2000 flows of $TE_{LU}$.

[Figure: x-axis number of middlepoints (K), y-axis demand satisfaction (%); curves Random, SP, Degree, GSP]
Fig. 7. 100-node network with 1000 flows of $TE_{MF}$.

[Figure: x-axis number of middlepoints (K), y-axis demand satisfaction (%)]
Fig. 8. 161-node network with 2000 flows of $TE_{MF}$.

Another important observation is that the TE performance exhibits diminishing marginal gains as $K$ increases. For $TE_{LU}$, when more than 4 middlepoints are used, very limited gains are observed. For $TE_{MF}$, beyond 7 middlepoints there is little demand satisfaction improvement, especially for GSP, in Fig. 7 and Fig. 8. On the other hand, the runtime of the TE algorithms increases dramatically due to the growing size of the LP problems. Take the 161-node network for instance. The LP time for $TE_{LU}$ increases by ∼50% when $K$ increases from 4 to 6, as shown in Table III, and from 6 to 8 for $TE_{MF}$, as in Table IV.

[Table III: average LP time (seconds) of $TE_{LU}$, by scheme and K]

[Table IV: average LP time (seconds) of $TE_{MF}$, by scheme and K]

Based on the above results, we conclude that 4 middlepoints for $TE_{LU}$ and 7 middlepoints for $TE_{MF}$ are the sweet spots of the tradeoff between performance and complexity. We thus use these settings in the rest of the experiments. This confirms the intuition behind our centrality based approach, namely, that it suffices to use just a small fraction of nodes as middlepoints (2.48%–7% of nodes) to achieve satisfactory performance.

C. Comparison with Baseline
Our motivation for using centrality based middlepoint selection is to reduce the high complexity of existing approaches, which take all nodes in the network as middlepoints [5], as discussed in §I. We now compare our approach against Baseline to validate its effectiveness in this regard. The experiments here are performed on the 100-node topology for both TE formulations. The maximum link utilization and LP time of $TE_{LU}$ are shown in Fig. 9 and Fig. 10, respectively. The demand satisfaction ratio and corresponding LP time are depicted in Fig. 11 and Fig. 12, respectively, for $TE_{MF}$. We scale the demands of flows by a factor of 2 for $TE_{LU}$ and a factor of 40 for $TE_{MF}$.
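To give a feel for why Baseline is so much more expensive, the following back-of-the-envelope count uses the tunnel bound from Proposition 2 with $M = 1$, under which each flow has at most $K$ tunnels and the LP has at most $L \cdot K$ tunnel-flow variables. The flow count $L = 400$ is chosen only for illustration, and the figures are upper bounds under that assumption rather than measured LP sizes.

```latex
\begin{align*}
\text{Baseline } (K = |V| = 100):\quad & L \cdot K = 400 \cdot 100 = 40{,}000 \\
\text{Centrality based } (K = 4):\quad & L \cdot K = 400 \cdot 4 = 1{,}600
\end{align*}
```

Under these assumptions the centrality based LP has 25 times fewer tunnel variables for the same set of flows.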
[Figure: x-axis number of flows (200–400), y-axis max link utilization (%); curves Baseline, Random, SP, Degree, GSP]
Fig. 9. Performance of $TE_{LU}$ with different centralities and Baseline.

[Figure: x-axis number of flows (200–400), y-axis LP time (s); curves Baseline, Random, SP, Degree, GSP]
Fig. 10. LP time of $TE_{LU}$ with different centralities and Baseline. Note the log scale of the y-axis.

[Figure: x-axis number of flows (500–700), y-axis demand satisfaction (%); curves Baseline, Random, SP, Degree, GSP]
Fig. 11. Performance of $TE_{MF}$ between different centralities and Baseline.

[Figure: x-axis number of flows (500–700), y-axis LP time (s); curves Baseline, Random, SP, Degree, GSP]
Fig. 12. LP time of $TE_{MF}$ between different centralities and Baseline. Note the log scale of the y-axis.
Notice that with Baseline, the TE problems have many more variables and constraints due to the large number of middlepoints. As a result, our machines can only solve $TE_{LU}$ with ∼400 flows. $TE_{MF}$ with Baseline and 1500 flows takes more than three hours, far exceeding the time scale (5–10 min) at which TE is performed in practice [28], [30], [31]. Thus we only run Baseline with up to 700 flows for $TE_{MF}$ to make sure the LP time is less than 1000 seconds.

As shown in Fig. 9, the maximum link utilization of our approach is about 4–5 times that of Baseline, whereas the LP time of Baseline is at least 40 times worse than that of any centrality based approach, as shown in Fig. 10. For $TE_{MF}$, the demand satisfaction ratio of Baseline is about 1.5 times ours in Fig. 11, but the LP time is about 60 times higher than ours, as in Fig. 12.

Indeed, we observe that our centrality based approach sacrifices performance in order to reduce the complexity of TE. We argue that this is a sensible tradeoff to make in most cases, especially for data center backbone WANs that use $TE_{MF}$ with very short time periods of 5–10 min [28], [30], [31]. The centrality based approach can support much larger topologies and many more flows with orders of magnitude smaller runtime. One can also increase $K$ to obtain better performance if necessary.

D. Comparison of Various Centralities
We now wish to understand the relative performance of various centralities in realistic settings. We use both the 100-node and 161-node topologies with $M = 1$. The total number of middlepoints $K$ is set to 4 for $TE_{LU}$ and 7 for $TE_{MF}$ based on our previous experiments. We vary the number of flows and, for a given number of flows, randomly draw flows 15 times from the traces. For Random we perform 5 independent random selections of middlepoints for a given set of flows, resulting in 75 runs in total. For each run we compute the respective performance metrics and report the average and standard deviation. In order to make the results more readable, we scale the demands by 10 for the 100-node topology and 40 for the 161-node topology, respectively, for $TE_{MF}$.

[Figure: x-axis number of flows, y-axis max link utilization (%); curves Random, SP, Degree, GSP]
Fig. 13. Performance of $TE_{LU}$ on the 100-node network with various centralities. $M = 1$ and $K = 4$.

[Figure: x-axis number of flows, y-axis max link utilization (%)]
Fig. 14. Performance of $TE_{LU}$ on the 161-node network with various centralities. $M = 1$ and $K = 4$.

[Figure: x-axis number of flows, y-axis demand satisfaction (%); curves Random, SP, Degree, GSP]
Fig. 15. Performance of $TE_{MF}$ on the 100-node network with various centralities. $M = 1$ and $K = 7$.

[Figure: x-axis number of flows, y-axis demand satisfaction (%)]
Fig. 16. Performance of $TE_{MF}$ on the 161-node network with various centralities. $M = 1$ and $K = 7$.

Fig. 13 and Fig. 14 depict the results for $TE_{LU}$, and Fig. 15 and Fig. 16 for $TE_{MF}$. We can make several interesting observations. First, for the 100-node network SP and GSP perform the best under all settings in Fig. 13. In contrast, for the 161-node topology in Fig. 14, GSP and Degree perform the best. When considering $TE_{MF}$, Fig. 15 shows that GSP and Degree perform better in the 100-node topology, while in the 161-node topology GSP performs best in Fig. 16. Thus, middlepoints chosen by group shortest path centrality consistently outperform those selected by other centralities in terms of TE performance.

The main advantage of GSP is that it selects a set of middlepoints whose combined power is strong. In particular, SP may select nodes that are individually strong but cover the same set of shortest paths; when combined, these nodes result in poor performance since they share the same shortest paths and are unable to spread out the traffic. This is the reason why GSP performs consistently well, while the performance of SP can fluctuate from very strong, as in Fig. 13, to very poor and even worse than Random, as in Fig. 14.

Second, Random performs the worst in Fig. 13, Fig. 15, and Fig. 16, and it also performs badly in the 161-node network in Fig. 14. This confirms our premise that centrality based middlepoint selection generally outperforms a naive random selection scheme. Indeed, Random does not utilize any topological information from the network. Further, Random fluctuates wildly, which makes it ill-suited for practical use. As seen from the figures, Random has the largest standard deviations among all schemes.

Third, we observe that the performance of SP can sometimes be worse than Random in Fig. 14. Indeed, SP just greedily selects the top-$K$ shortest-path central nodes, even though in reality these nodes may share several shortest paths. Random, on the other hand, can do better than SP in certain settings since it has a lower probability of choosing overlapping shortest paths.

Another aspect of performance is the runtime of the TE LPs. Table V and Table VI show the average runtimes for $TE_{LU}$ and $TE_{MF}$, respectively. Random consistently has the worst results. SP takes the least time, but the difference between SP, GSP, and Degree is small. All of the schemes can finish within 100 seconds even with 2000 flows, which demonstrates that centrality based segment routing can be practically used in large-scale networks.
[Table V: average LP time (seconds) per scheme with $TE_{LU}$]

[Table VI: average LP time (seconds) per scheme with $TE_{MF}$]

The reason that Random has the longest runtime is that it selects nodes that are not central, with possibly many distinct paths and links. This leads to more active optimization variables and constraints for the same LP, and thus longer runtime. By the same token, the reason that SP has the lowest runtime is that it selects the top-$K$ central nodes with many overlapping shortest paths. This results in fewer active links being used for routing, and thus fewer active optimization variables and constraints in the LP.

To summarize, based on the above experimental results and analysis, we find that GSP consistently delivers the best TE performance with the least LP time among all centralities we considered.
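The contrast between SP (top-$K$ individually central nodes) and GSP (a group that is central as a whole) can be made concrete with a small sketch. It is a plain greedy selection that recomputes group betweenness from scratch at every step, not the incremental algorithm of [10] or the modified Brandes algorithm of [44]. The graph is treated as unweighted and undirected, pairs with an endpoint inside the group are skipped, and the toy topology and all function names are assumptions for illustration.

```python
from collections import deque

def bfs_counts(adj, s, blocked=frozenset()):
    """Hop distances from s and shortest-path counts that avoid `blocked` interior nodes."""
    dist, order, queue = {s: 0}, [s], deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                order.append(v)
                queue.append(v)
    sigma = {v: 0 for v in dist}
    sigma[s] = 1
    for u in order:  # BFS order, so sigma[u] is final when u is expanded
        if u != s and u in blocked:
            continue  # a path may end at a blocked node but not pass through it
        for v in adj[u]:
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def group_betweenness(adj, group):
    """Sum over ordered pairs outside `group` of the fraction of shortest paths hitting it."""
    group = frozenset(group)
    total = 0.0
    for s in adj:
        if s in group:
            continue
        dist, full = bfs_counts(adj, s)
        _, avoid = bfs_counts(adj, s, group)
        for t in dist:
            if t != s and t not in group:
                total += (full[t] - avoid[t]) / full[t]
    return total

def top_k_individual(adj, k):
    """SP-style selection: the k nodes with the highest individual betweenness."""
    score = {v: group_betweenness(adj, {v}) for v in adj}
    return sorted(adj, key=lambda v: (-score[v], v))[:k]

def greedy_group(adj, k):
    """GSP-style selection: repeatedly add the node that maximizes group betweenness."""
    chosen = []
    for _ in range(k):
        candidates = sorted(v for v in adj if v not in chosen)
        best = max(candidates, key=lambda v: group_betweenness(adj, chosen + [v]))
        chosen.append(best)
    return chosen

# Toy topology (an assumption): two small stars joined by the relay chain x - y.
edges = [("l1", "c1"), ("l2", "c1"), ("c1", "x"), ("x", "y"),
         ("y", "c2"), ("c2", "r1"), ("c2", "r2")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

sp = top_k_individual(adj, 2)
gsp = greedy_group(adj, 2)
print(sp, group_betweenness(adj, sp))
print(gsp, group_betweenness(adj, gsp))
```

In this toy graph the two relay nodes are the most central individually, but they lie on exactly the same shortest paths, so picking both adds little. The greedy group selection instead pairs one relay with a star center and covers more source-destination pairs, which mirrors the overlap argument made above for SP.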
E. Comparison of Weighted Centralities
The centralities we have studied so far only considered the connectivity of the network topology. As discussed in §VI, it is also possible to take into account the link capacity information by adding weights to links and using weighted versions of centralities.
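As a minimal sketch of how capacity weights change the picture, the code below assigns each link the cost 1/capacity, computes a minimum-cost path with Dijkstra's algorithm, and evaluates the weighted degree as the sum of incident link costs. Taking the cost to be the inverse capacity is an assumption based on the remark in §VI that higher capacity means lower cost; the function names and the three-link example are hypothetical.

```python
import heapq

def inverse_capacity_costs(capacity):
    """Map each directed link to the cost 1/capacity (assumed cost model)."""
    return {e: 1.0 / c for e, c in capacity.items()}

def weighted_degree(node, cost):
    """Sum of the costs of all links incident to `node`."""
    return sum(w for (u, v), w in cost.items() if node in (u, v))

def cheapest_path(cost, src, dst):
    """Dijkstra's algorithm over the link costs; returns the node sequence or None."""
    adj = {}
    for (u, v), w in cost.items():
        adj.setdefault(u, []).append((v, w))
    best, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < best.get(v, float("inf")):
                best[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in best:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

# Two high-capacity links via a, and one low-capacity direct link (assumed values).
capacity = {("s", "a"): 10.0, ("a", "t"): 10.0, ("s", "t"): 1.0}
cost = inverse_capacity_costs(capacity)
print(cheapest_path(cost, "s", "t"))
print(weighted_degree("s", cost))
```

With hop counts the direct link would be the unique shortest path, whereas under inverse-capacity costs the two-hop route over the high-capacity links is cheaper. The weighted shortest-path centralities therefore count a different set of shortest paths than their unweighted counterparts.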
[Figure: x-axis number of flows, y-axis max link utilization (%); curves GSP, weighted SP, weighted Degree, weighted GSP]
Fig. 17. 100-node network when $M = 1$ and $K = 4$ based on weighted centrality for $TE_{LU}$.

[Figure: x-axis number of flows, y-axis max link utilization (%)]
Fig. 18. 161-node network when $M = 1$ and $K = 4$ based on weighted centrality for $TE_{LU}$.

[Figure: x-axis number of flows, y-axis demand satisfaction (%); curves GSP, weighted SP, weighted Degree, weighted GSP]
Fig. 19. 100-node network when $M = 1$ and $K = 7$ based on weighted centrality for $TE_{MF}$.

[Figure: x-axis number of flows, y-axis demand satisfaction (%)]
Fig. 20. 161-node network when $M = 1$ and $K = 7$ based on weighted centrality for $TE_{MF}$.

We also carry out experiments to compare the performance of weighted SP, weighted Degree, and weighted GSP centralities against GSP, the best centrality that does not use weights for middlepoint selection. Weighted here means that the three centrality based approaches are weighted by the capacity of each edge. Fig. 17 and Fig. 18 show the performance comparison with $TE_{LU}$, while Fig. 19 and Fig. 20 show the comparison with $TE_{MF}$. For the 161-node topology, we observe that GSP and weighted GSP are always the best. In the 100-node topology, GSP is sometimes worse than weighted Degree and weighted GSP, although the differences are very small. Therefore, GSP without weights is still the most effective and robust middlepoint selection method in all settings.

VIII. RELATED WORK
We now review related work on segment routing other than that already discussed in §II. Segment routing is a relatively new concept with limited prior work. Aubry et al. [4] propose to use segment routing for continuous monitoring of the data plane of the network with a single box. Segment routing is used to force probe packets to traverse specific paths. Giorgetti et al. [26] propose algorithms for segment routing label stack computation that guarantee minimum label stack depth.

TE has been extensively studied in carrier networks [13], [22], [28], [29], [32], [47], and has also attracted much attention recently in data center backbone WANs [25], [30], [31], [37] with software defined networking [16]. These works usually use end-to-end paths, whereas we study segment routing in TE.

Finally, we note that graph centralities have been applied to routing in some specific SDN problems, such as service chain embedding [39] and incremental SDN deployment [36], [38]. In a service chain [39], traffic needs to be steered through a set of waypoints, with the goal of admitting a maximum number of routes. In the context of hybrid and incremental SDN deployment [36], a set of middleboxes needs to be deployed in order to serve a maximal number of flows, respecting flow rule constraints. Solutions to these problems are based on degree centralities, and there exist greedy approximation algorithms exploiting submodularity as well [38]. Contrary to these works, our paper focuses on the theoretical fundamentals of TE using segment routing and on graph-theoretic practical middlepoint selection.

IX. CONCLUSION

We have conducted the first systematic study of traffic engineering with segment routing in SDN based WANs. We showed that TE for general segment routing is NP-hard, while TE for segment routing with shortest paths is solvable in polynomial time when the number of middlepoints per logical path is fixed and not part of the input. We also studied practical TE with shortest path based segment routing, and proposed to select just a few important nodes for all network traffic using graph-theoretic centrality concepts. Our performance evaluation demonstrated that just a small percentage of powerful nodes can achieve good results with very low runtime.

REFERENCES
Network Flows: Theory, Algorithms, and Applications. Prentice-Hall, Inc., 1993.
[4] F. Aubry, D. Lebrun, S. Vissicchio, M. T. Khong, Y. Deville, and O. Bonaventure, “SCMon: Leveraging Segment Routing to Improve Network Monitoring,” in Proc. IEEE INFOCOM, 2016.
[5] R. Bhatia, F. Hao, M. Kodialam, and T. V. Lakshman, “Optimized Network Traffic Engineering using Segment Routing,” in Proc. IEEE INFOCOM, 2015.
[6] P. Bonacich, “Power and Centrality: A Family of Measures,” American Journal of Sociology, vol. 92, no. 5, pp. 1170–1182, 1987.
[7] U. Brandes, “A faster algorithm for betweenness centrality,” Journal of Mathematical Sociology, vol. 25, pp. 163–177, 2001.
[8] R. Cohen, L. Lewin-Eytan, J. S. Naor, and D. Raz, “On the effect of forwarding table size on SDN network utilization,” in Proc. IEEE INFOCOM, 2014.
[9] A. R. Curtis, J. C. Mogul, J. Tourrilhes, P. Yalagandula, P. Sharma, and S. Banerjee, “Devoflow: Scaling flow management for high-performance networks,” in Proc. ACM SIGCOMM, 2011.
[10] S. Dolev, Y. Elovici, R. Puzis, and P. Zilberman, “Incremental deployment of network monitors based on group betweenness centrality,” Inf. Process. Lett., vol. 109, no. 20, pp. 1172–1176, 2009.
[11] A. Domahidi, E. Chu, and S. Boyd, “ECOS: An SOCP solver for embedded systems,” in Proc. European Control Conference (ECC), 2013.
[12] T. Eilam-Tzoreff, “The disjoint shortest paths problem,” Discrete Appl. Math., vol. 85, no. 2, pp. 113–138, 1998.
[13] A. Elwalid, C. Jin, S. Low, and I. Widjaja, “MATE: MPLS Adaptive Traffic Engineering,” in Proc. IEEE INFOCOM, 2001.
[14] S. Even, A. Itai, and A. Shamir, “On the complexity of time table and multi-commodity flow problems,” in Proceedings of the 16th Annual Symposium on Foundations of Computer Science, 1975, pp. 184–193.
[15] M. G. Everett and S. P. Borgatti, “The centrality of groups and classes,” The Journal of Mathematical Sociology, vol. 23, no. 3, pp. 181–201, 1999.
[16] N. Feamster, J. Rexford, and E. Zegura, “The road to SDN: An intellectual history of programmable networks,” ACM Queue, vol. 11, no. 12, pp. 20:20–20:40, December 2013.
[17] U. Feige, “A threshold of ln n for approximating set cover,” Journal of ACM, vol. 45, no. 4, pp. 634–652, 1998.
[18] C. Filsfils, S. Previdi, A. Bashandy, and Decraene, “Segment routing with mpls data plane,” Internet Engineering Task Force, Internet Draft (Work in Progress) draft-ietf-spring-segment-routing-mpls-00, 2014.
[19] C. Filsfils, P. Francois, and Previdi, “Segment routing use cases,” 2013.
[20] C. Filsfils, N. K. Nainar, and Pignataro, “The Segment Routing Architecture,” in Proc. IEEE Globecom, 2015.
[21] S. Fortune, J. Hopcroft, and J. Wyllie, “The directed subgraph homeomorphism problem,” Theoretical Computer Science, vol. 10, no. 2, pp. 111–121, 1980.
[22] B. Fortz and M. Thorup, “Internet traffic engineering by optimizing OSPF weights,” in Proc. IEEE INFOCOM, 2000.
[23] L. C. Freeman, S. P. Borgatti, and D. R. White, “Centrality in valued graphs: A measure of betweenness based on network flow,” Social Networks, vol. 13, no. 2, pp. 141–154, 1991.
[24] L. C. Freeman, “A Set of Measures of Centrality Based on Betweenness,” Sociometry, vol. 40, no. 1, pp. 35–41, 1977.
[25] A. Ghosh, S. Ha, E. Crabbe, and J. Rexford, “Scalable multi-class traffic management in data center backbone networks,” 2013.
[26] A. Giorgetti, P. Castoldi, F. Cugini, J. Nijhof, F. Lazzeri, and G. Bruno, “Path encoding in segment routing,” in Proc. IEEE Globecom, 2015.
[27] R. Hartert, P. Schaus, S. Vissicchio, and O. Bonaventure, “Solving Segment Routing Problems with Hybrid Constraint Programming Techniques,” in International Conference on Principles and Practice of Constraint Programming, 2015.
[28] R. Hartert, S. Vissicchio, P. Schaus, O. Bonaventure, C. Filsfils, T. Telkamp, and P. Francois, “A Declarative and Expressive Approach to Control Forwarding Paths in Carrier-Grade Networks,” in Proc. ACM SIGCOMM, 2015.
[29] J. He, M. Bresler, M. Chiang, and J. Rexford, “Towards robust multi-layer traffic engineering: Optimization of congestion control and routing,” vol. 25, no. 5, pp. 868–880, June 2007.
[30] C.-Y. Hong, S. Kandula, R. Mahajan, M. Zhang, V. Gill, M. Nanduri, and R. Wattenhofer, “Achieving high utilization with software-driven WAN,” in Proc. ACM SIGCOMM, 2013.
[31] S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu, J. Zolla, U. Hölzle, S. Stuart, and A. Vahdat, “B4: Experience with a globally-deployed software defined WAN,” in Proc. ACM SIGCOMM, 2013.
[32] S. Kandula, D. Katabi, B. Davie, and A. Charny, “Walking the Tightrope: Responsive Yet Stable Traffic Engineering,” in Proc. ACM SIGCOMM, 2005.
[33] N. Karmarkar, “A new polynomial-time algorithm for linear programming,” in Proc. ACM STOC, 1984.
[34] L. Khachiyan, “Polynomial algorithms in linear programming,” USSR Computational Mathematics and Mathematical Physics, vol. 20, no. 1, pp. 53–72, 1980.
[35] M. Kuźniar, P. Perešíni, and D. Kostić, “What you need to know about SDN flow tables,” in Proc. PAM, 2015.
[36] D. Levin, M. Canini, S. Schmid, F. Schaffert, and A. Feldmann, “Panopticon: Reaping the benefits of incremental sdn deployment in enterprise networks,” 2014, pp. 333–345.
[37] H. H. Liu, S. Kandula, R. Mahajan, M. Zhang, and D. Gelernter, “Traffic engineering with forward fault correction,” in Proc. ACM SIGCOMM, 2014.
[38] T. Lukovszki, M. Rost, and S. Schmid, “It’s a match!: Near-optimal and incremental middlebox deployment,” SIGCOMM Comput. Commun. Rev., vol. 46, no. 1, pp. 30–36, 2016.
[39] T. Lukovszki and S. Schmid, “Online admission control and embedding of service chains,” in Post-Proceedings of the 22nd International Colloquium on Structural Information and Communication Complexity - Volume 9439, ser. SIROCCO 2015, 2015, pp. 104–118.
[40] L. Molnár, G. Pongrácz, G. Enyedi, Z. L. Kis, L. Csikor, F. Juhász, A. Kőrösi, and G. Rétvári, “Dataplane Specialization for High-performance OpenFlow Software Switching,” in Proc. ACM SIGCOMM, 2016.
[41] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, “An analysis of approximations for maximizing submodular set functions-I,” Mathematical Programming, vol. 14, no. 1, pp. 265–294, 1978.
[42] M. Newman, Networks: An Introduction. Oxford University Press, Inc., 2010.
[43] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Inc., 1982.
[44] R. Puzis, D. Yagil, Y. Elovici, and D. Braha, “Collaborative attack on internet users’ anonymity,” Internet Research, vol. 19, pp. 60–77, 2009.
[45] M. Roughan, “Simplifying the synthesis of Internet traffic matrices,” ACM CCR, vol. 35, no. 5, pp. 93–96, 2005.
[46] E. Tardos, “A strongly polynomial algorithm to solve combinatorial linear programs,” Oper. Res., vol. 34, no. 2, pp. 250–256, 1986.
[47] H. Wang, H. Xie, L. Qiu, Y. R. Yang, Y. Zhang, and A. Greenberg, “COPE: Traffic Engineering in Dynamic Networks,” in Proc. ACM SIGCOMM, 2006.
George Trimponias received his PhD from the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology. Prior to that he obtained a five-year diploma in Electrical and Computer Engineering from the National Technical University of Athens, Greece. He is currently a Researcher at Huawei Noah’s Ark Lab in Hong Kong. His research interests include machine learning, combinatorial optimization, and game theory.
Yan Xiao received the B.Eng. and M.Eng. degrees in College of Computer and Information from Hohai University, Nanjing, China, in 2013 and 2016, respectively. She is currently a Ph.D. student in Department of Computer Science, City University of Hong Kong. Her research interests include data mining, big data, deep learning based software engineering, and graph theory.
Hong Xu received the B.Eng. degree from the Department of Information Engineering, The Chinese University of Hong Kong, in 2007, and the M.A.Sc. and Ph.D. degrees from the Department of Electrical and Computer Engineering, University of Toronto. He joined the Department of Computer Science, City University of Hong Kong in 2013, where he is currently an assistant professor. His research interests include data center networking, SDN, NFV, and cloud computing. He was the recipient of an Early Career Scheme Grant from the Research Grants Council of Hong Kong in 2014. He also received the best paper awards from IEEE ICNP 2015 and ACM CoNEXT Student Workshop 2014. He is a member of ACM and IEEE.
Xiaorui Wu received his B.S. degree in Electrical Information Engineering from University of Electronic Science and Technology of China in 2016. He is currently a first-year PhD student in Department of Computer Science, City University of Hong Kong. His research interests include computer networking, machine learning, and big data systems. He is a student member of the IEEE.
Yanhui Geng is a senior researcher and project manager at Huawei Noah’s Ark Lab (Hong Kong). He received his B.Eng. and M.Eng. in Electronic Engineering and Information Science from the University of Science and Technology of China (USTC) in 2002 and 2005, respectively. He obtained his Ph.D. degree in Electrical and Electronic Engineering from the University of Hong Kong (HKU) in 2009. Before joining Huawei, he was a Post-Doctoral Research Fellow at HKU from 2009 to 2012, and was a Senior Engineer at the Hong Kong Applied Science and Technology Research Institute (ASTRI) from 2012 to 2013. His research interests include performance modeling and analysis of communication networks, SDN, machine learning, big data analytics, cloud computing, and indoor positioning technology. He has filed 15 patents related to communication networks. He has 26 technical publications on international journals and conferences including ACM SIGCOMM, IEEE/ACM Transactions on Networking, IEEE INFOCOM, ICNP, ICDCS, ICDM, etc. and his work on information quality received the IEEE ICC 2010 Best Paper Award.
APPENDIX A
FLOW CENTRALITIES
A. Preliminaries
The original flow centrality by Freeman et al. [23] defines the flow centrality of a node $w \in V$ in a flow network $G = (V, E, c)$ as:

$$\gamma(w) = \sum_{s,t \in V \,:\, s \neq w \neq t} \frac{\nu^{w}_{\max}(s,t)}{\nu_{\max}(s,t)}, \qquad (21)$$

where $\nu_{\max}(s,t)$ is the maximum flow in the single-commodity flow network with commodity $s$-$t$, and $\nu^{w}_{\max}(s,t)$ is the maximum flow through node $w$ in the single-commodity flow network with commodity $s$-$t$. Thus, the flow centrality of node $w$ represents the percentage of the maximum flow that can go through $w$ for a demand chosen uniformly at random.

For multi-commodity networks with $L$ commodities as described in Section III-A, we can provide an alternative definition of flow centrality as follows:

$$\tilde{\gamma}(w) = \frac{\nu^{w}_{\max}(\mathbf{s},\mathbf{t})}{\nu_{\max}(\mathbf{s},\mathbf{t})}, \qquad (22)$$

which denotes the ratio of the maximum multi-commodity flow that can go through node $w$, $\nu^{w}_{\max}(\mathbf{s},\mathbf{t})$, to the maximum multi-commodity flow $\nu_{\max}(\mathbf{s},\mathbf{t})$.

The basic difference between the two definitions is that the former considers all possible source-destination pairs with equal probability, while the latter focuses on the actual commodities in the flow network. Thus, the former is based on the single-commodity formulation, and the latter on the multi-commodity one.

B. Hardness of Flow Centralities
Corollary 4.
Given a flow network $G = (V, E, c)$ and a node $w \in V$, it is NP-hard to compute the flow centrality $\gamma(w)$ as in Equation (21). Furthermore, it is NP-hard to compute the flow centrality $\tilde{\gamma}(w)$ of Equation (22).

Proof. The first statement is straightforward, since $\gamma(w)$ uses $\nu^{w}_{\max}(s,t)$, which is NP-hard to compute by Proposition 1. For the second statement, note that $\tilde{\gamma}(w)$ is a ratio of two terms. The denominator is the maximum multi-commodity flow $\nu_{\max}$, which can be computed in strongly polynomial time [46]. If $\tilde{\gamma}(w)$ could also be computed in polynomial time, then we could simply compute $\nu^{w}_{\max}$ in polynomial time as the product of $\tilde{\gamma}(w)$ and $\nu_{\max}$, which is a contradiction since, by Proposition 1, it is NP-hard to compute $\nu^{w}_{\max}$.

C. Group flow centrality
In this section, we introduce the concept of multi-commodity group flow centrality, which can be seen as a generalization of the flow centrality $\tilde{\gamma}$ that we defined in Equation (22).

Definition 3.
The group multi-commodity maximum flow $GF : 2^{V} \to \mathbb{R}_{\geq 0}$ in a multi-commodity flow network is a function which, for any group of nodes $C \subseteq V$, returns the maximum multi-commodity flow $GF(C)$ that can go through any node in $C$. Furthermore, we call the maximum group flow that uses at most $N$ nodes the $N$-group maximum flow.

Computing the group flow centrality for directed graphs is obviously NP-hard, as it generalizes the $\tilde{\gamma}$ flow centrality, which is also NP-hard by Proposition 1. It is however possible to show NP-hardness by reduction from the maximum coverage problem, even for just one commodity. This alternative reduction is important because it also yields approximability results, as we discuss later.

Proposition 4.
The $N$-group maximum single-commodity flow is NP-hard.

Proof. We prove NP-hardness by reduction from the maximum coverage problem (MCP) [41], which is well-known to be NP-hard. In particular, assume a set $I = \{i_1, \dots, i_m\}$ of $m$ items and $n$ sets $S_1, \dots, S_n$ containing elements of $I$. Given a positive integer $N \leq n$, the MCP tries to select $N$ sets among $S_1, \dots, S_n$ such that the maximum number of elements are covered, i.e., the union of the selected sets has maximal size. We can reduce the MCP to the $N$-group maximum flow with a single commodity by constructing a directed graph $G = (V, E)$ as follows. $V$ contains two dedicated nodes $s$ and $t$, one node $u_j$ for each item $i_j$ that appears in $I$, and one node $v_k$ for each set $S_k$. We then add one edge $(s, u_j)$ from $s$ to every node $u_j$, one edge $(v_k, t)$ from each node $v_k$ to $t$, and finally an edge $(u_j, v_k)$ if and only if set $S_k$ contains item $i_j$. Each edge of the form $(s, u_j)$ has a capacity of 1, while all other edges have infinite capacity. Finally, the demand has an upper bound of $m$.

We now prove that the maximum coverage problem has a value equal to $C_{\max}$ if and only if the $N$-group maximum flow has a value of $C_{\max}$. Assume first that the MCP has a value of $C_{\max}$. We can then construct a corresponding flow in $G$ in the following manner. First, we send one unit of flow from $s$ to node $u_j$ if and only if item $i_j$ is covered. Let $V_1 \subseteq V$ be the subset of nodes $u_j$ to which a unit of flow was sent from $s$. For each $u_j \in V_1$ we subsequently send one unit of flow to exactly one of the nodes $v_k$ that $u_j$ connects to and whose set $S_k$ belongs to the selected cover, chosen arbitrarily. Let $V_2 \subseteq V$ be the subset of nodes $v_k$ which receive flow from any node in $V_1$. We complete the construction by sending $l$ units of flow from every $v_k \in V_2$ to $t$, where $l$ is the number of nodes in $V_1$ that send a unit of flow to $v_k$.
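The reduction graph used in this proof can be checked on a toy instance. The sketch below is an illustration under stated assumptions, not the authors' code: the helper names (`max_flow`, `reduction_graph`, `best_coverage`, `best_group_flow`) are hypothetical, every item is assumed to appear in at least one set, and the group flow through a set of nodes $v_k$ is computed as an ordinary maximum flow after removing the unselected $v_k$, which is valid here only because every $s$-$t$ path in this layered graph has the form $s \to u_j \to v_k \to t$.

```python
import math
from collections import deque
from itertools import combinations


def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow; cap is a dict-of-dicts of edge capacities."""
    res = {}
    for u in cap:
        for v, c in cap[u].items():
            res.setdefault(u, {})[v] = res.get(u, {}).get(v, 0) + c
            res.setdefault(v, {}).setdefault(u, 0)  # reverse residual edge
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Find the bottleneck capacity, then push it along the path.
        bottleneck, v = math.inf, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck


def reduction_graph(sets, group):
    """Reduction graph restricted to the set nodes v_k with k in group."""
    cap = {"s": {}}
    for j in set().union(*sets):
        cap["s"][("u", j)] = 1                      # unit capacity on (s, u_j)
    for k in group:
        cap[("v", k)] = {"t": math.inf}             # infinite capacity on (v_k, t)
        for j in sets[k]:
            cap.setdefault(("u", j), {})[("v", k)] = math.inf
    return cap


def best_coverage(sets, n):
    """Brute-force MCP value: largest union of n sets."""
    return max(len(set().union(*(sets[k] for k in grp)))
               for grp in combinations(range(len(sets)), n))


def best_group_flow(sets, n):
    """Brute-force n-group maximum flow over groups of set nodes."""
    return max(max_flow(reduction_graph(sets, grp), "s", "t")
               for grp in combinations(range(len(sets)), n))


toy_sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
print(best_coverage(toy_sets, 2), best_group_flow(toy_sets, 2))
```

On this instance both quantities coincide for every group size, which is the equality the proof establishes in general. The brute-force enumeration is exponential and serves only to illustrate the correspondence.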
Now, the constructed flow is valid, since it is easy to verify that it respects all capacity and conservation constraints. Based on that flow, we can also form an $N$-group flow. Indeed, by definition the MCP solution contains at most $N$ sets $S_k$, so there can be at most $N$ nodes of the form $v_k$ participating in the flow. Since the entire flow has to pass through these nodes, the above flow is an $N$-group flow passing through (at most) $N$ nodes of the form $v_k$. So, the $N$-group maximum flow is at least $C_{\max}$.

Furthermore, we can argue that the $N$-group maximum flow cannot be greater than $C_{\max}$. Indeed, if that were not the case, then there would be another group of $N$ nodes that can accept an even larger flow. Without loss of generality, we may assume that these nodes are all of the form $v_k$, since we can trivially replace any node of the form $u_j$ by any node $v_k$ that it connects to and achieve a flow that is at least as large as the original one. The key observation is that the group flow of $N$ nodes of the form $v_k$ has a value $C$ if and only if the coverage of the respective sets has a size of $C$. So, if there existed a group maximum flow with a value greater than $C_{\max}$, then that would imply the existence of a coverage of value greater than $C_{\max}$, which is a contradiction.

The reverse direction can be proven similarly.

The previous result immediately implies that the $N$-group multi-commodity flow is also NP-hard. It is interesting that the proof of Proposition 4 does not rely at all on the theory that we developed to show that the individual flow centrality is NP-hard in Proposition 1. It is well-known that the MCP cannot be approximated within a ratio better than $1 - 1/e$, unless P=NP. We show below that this is also the case for the group multi-commodity flow by using results from submodular function maximization [17], [41].

Definition 4.
Consider a finite set of elements $U$ and a function $g : 2^{U} \to \mathbb{R}_{\geq 0}$. We call $g$ monotone if adding an element to a set $S \subseteq U$ cannot cause the function to decrease, i.e., $g(S \cup \{v\}) \geq g(S)$ for all $v \in U$ and $S \subseteq U$. Furthermore, we call $g$ submodular if the marginal gain from adding an element to a set $S$ is at least as high as the marginal gain from adding the same element to a superset of $S$, i.e., $g(S \cup \{v\}) - g(S) \geq g(T \cup \{v\}) - g(T)$ for all $v \in U$ and all pairs of sets $S \subseteq T \subseteq U$.

Lemma 6.
The function $GF : 2^{V - \{s,t\}} \to \mathbb{R}_{\geq 0}$ is (1) monotone, and (2) submodular.

Proof. For monotonicity, note that adding a node can never decrease the maximum group flow, since an additional node can never decrease the number of available paths to route the flow; in particular, adding a node $v \neq s, t$ to a set $S$ can either increase the number of available paths to route the flow, or leave the number of paths unchanged, if all paths that go through $v$ already go through nodes in $S$.

It is also simple to prove submodularity by noticing that the augmenting $s$-$w$-$t$ algorithm can pick the augmenting paths arbitrarily. For $T \cup \{v\}$, we can first saturate paths through $S$, then through $T - S$, and finally through $v$. For $S \cup \{v\}$, we can first saturate paths through $S$, and finally through $v$. Since paths through $T - S$ may overlap with paths through $v$, the marginal gain from adding $v$ in the former case can never exceed the marginal gain in the latter.

Proposition 5.
Consider the greedy algorithm that each time picks the node in $V - \{s,t\}$ that maximizes the marginal utility and adds it to the group set. Let $S^{*}$ be the subset of size $k$ of $V - \{s,t\}$ that maximizes the group flow $GF$. Then the set $S_g$ that the greedy algorithm selects satisfies $GF(S_g) \geq (1 - 1/e) \cdot GF(S^{*})$, i.e., $S_g$ provides a $(1 - 1/e)$-approximation. Furthermore, unless P=NP, no polynomial algorithm can achieve a $(1 - 1/e + o(1))$-approximation ratio.

Proof. $GF$ is a non-negative monotone submodular function. The approximation ratio for the greedy algorithm then follows directly from submodular function maximization [41]. The impossibility result on the approximation ratio follows from [17].

However, note that applying the greedy algorithm by Nemhauser et al. [41] to approximate the maximal group flow is NP-hard, since even computing the maximum $s$-$w$-$t$ flow is NP-hard by Proposition 4. So, the greedy algorithm will also be NP-hard.

For the undirected group flow centrality, we can get a similar result as in Proposition 5. The main difference is that in the undirected case an $s$-$w$-$t$ path consists of undirected rather than directed edges. In terms of computational complexity, note that now each step of the greedy algorithm runs in (weakly) polynomial time (see Appendix B), so the time complexity of the greedy algorithm will be (weakly) polynomial, as opposed to the directed case where it is NP-hard. In fact, a strongly polynomial algorithm may also be possible, in a similar spirit to the results in [46].

APPENDIX B
THE UNDIRECTED CASE
We demonstrate a very interesting dichotomy between the cases of a directed and an undirected graph. In particular, we show that the maximum multi-commodity $s$-$w$-$t$ flow in an undirected graph can be computed in polynomial time. Note that the main difference between the directed and the undirected flow is that the directed case assumes separate capacities for each direction $(u, v)$ and $(v, u)$, whereas the undirected case assumes a single capacity for the undirected edge $e$. This capacity can be arbitrarily allocated between the two directions: it upper bounds the total flow that we can send in both directions, but not the flow in any individual direction.

Proposition 6.
The maximum multi-commodity flow $\nu^{w}_{\max}$ in any undirected graph $G = (V, E)$, $w \in V$, can be computed exactly in (weakly) polynomial time.

Proof. Assume a multi-commodity undirected graph $G = (V, E)$ with $L$ commodities of the form $(s_i, t_i)$. For simplicity, we assume infinite maximum demands $D_i$ so that the demand constraint becomes redundant. We construct a directed graph $G' = (V', E')$ from $G$ as follows. We first replace each undirected edge $(u, v) \in E$ by two directed edges $(u, v)$ and $(v, u)$. The capacities of the two directed edges are equal to the original capacity $c(e)$ of the undirected edge. We next introduce $L$ new nodes $z_1, \dots, z_L$ (one for each commodity), and for each $z_i$ we add the two directed edges $(s_i, z_i)$ and $(t_i, z_i)$. Finally, we introduce a node $z$ and $L$ directed edges $(z_i, z)$, one from each $z_i$ to $z$. Thus, we have that $V' = V \cup \{z_1, \dots, z_L, z\}$ and $E' = E_G \cup \left( \bigcup_i \{(s_i, z_i), (t_i, z_i), (z_i, z)\} \right)$, where $E_G$ are the edges that we got by replacing each undirected edge in $E$ by two directed edges. The capacities of the newly constructed edges of the form $(s_i, z_i)$, $(t_i, z_i)$, $(z_i, z)$ are infinite.
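The construction of $G'$ described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `build_directed_graph`, the dictionary-based edge representation, and the node labels `("z", i)` and `"z"` are assumptions introduced here. The sketch only builds the edge set and capacities of $G'$; the shared capacity of the two directions of an original edge is enforced later by the linear program, not by this data structure.

```python
import math


def build_directed_graph(edges, commodities):
    """Build the capacitated edge set E' of G' from an undirected graph G.

    edges: dict mapping (u, v) -> capacity c(e), each undirected edge listed once
    commodities: list of (s_i, t_i) pairs
    Returns a dict mapping each directed edge (u, v) -> capacity.
    """
    directed = {}
    # Replace each undirected edge by two directed edges of the same capacity.
    for (u, v), cap in edges.items():
        directed[(u, v)] = cap
        directed[(v, u)] = cap
    # Add one auxiliary node z_i per commodity and a common node z.
    for i, (s_i, t_i) in enumerate(commodities, start=1):
        z_i = ("z", i)
        directed[(s_i, z_i)] = math.inf
        directed[(t_i, z_i)] = math.inf
        directed[(z_i, "z")] = math.inf
    return directed


# Toy example: a triangle with one commodity (a, c).
g_prime = build_directed_graph({("a", "b"): 2, ("b", "c"): 3, ("a", "c"): 1},
                               [("a", "c")])
print(len(g_prime))  # 9 directed edges: 6 from E_G plus 3 auxiliary edges
```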
Next, we claim that the maximum flow in $G$ can be computed by considering the following arc-based linear program:

$$
\begin{aligned}
\text{maximize} \quad & V = \sum_{i=1}^{L} \sum_{e \in w^{+}} f_i(e) \\
\text{subject to} \quad & \sum_{i=1}^{L} f_i(e) \leq c(e), && \forall e \in E' && (23) \\
& \sum_{e \in u^{+}} f_i(e) = \sum_{e \in u^{-}} f_i(e), && \forall i,\ \forall u \in V',\ w \neq u \neq z && (24) \\
& f_i(u, v) + f_i(v, u) \leq c(e), && \forall i,\ \forall e = (u, v) \in E && (25) \\
& f_i(e) \geq 0, && \forall e \in E',\ \forall i \in \{1, \dots, L\} && (26) \\
& f_j(s_i, z_i) = 0, && \forall i, j \in \{1, \dots, L\} \text{ with } i \neq j && (27) \\
& f_i(s_i, z_i) = f_i(t_i, z_i), && \forall i \in \{1, \dots, L\} && (28)
\end{aligned}
$$

The above LP computes the maximum flow from $w$ to $z$, where the total flow is composed of $L$ separate sub-flows, one for each commodity. The sub-flow for commodity $i$ can be sent from $w$ to $z_i$ through either $s_i$ or $t_i$. Constraints (23), (24), (26) are the link capacity, node conservation and non-negative flow constraints, respectively. Constraint (25) is necessary to make sure that the sum of flow units in the two directed edges does not exceed the capacity of the original undirected edge. Constraint (27) implies that node $z_i$ can only receive flow from commodity $i$. Constraint (28) is especially important because it ensures that the sub-flow for commodity $i$ sent through $s_i$ is the same as the one sent through $t_i$.

Now, we establish the equivalence between the original problem and the above LP by showing that (i) $2 \cdot \nu^{w}_{\max} \leq V^{*}$, and (ii) $V^{*} \leq 2 \cdot \nu^{w}_{\max}$. We start with (i). Assume any maximum flow through $w$ with value $\nu^{w}_{\max}$ in $G$. We first construct the corresponding flow in $G'$. We note that the flow in $G$ consists of $L$ sub-flows, one for each commodity $i$, that send flow from $s_i$ to $t_i$ through $s_i$-$w$-$t_i$ paths. The idea is then to reverse the direction of each sub-flow in the part from $s_i$ to $w$, so that it now flows in the opposite direction.
We then send $\nu(f_i)$ units of flow from $s_i$ to $z_i$, $\nu(f_i)$ units of flow from $t_i$ to $z_i$, and $2 \cdot \nu(f_i)$ units of flow from $z_i$ to $z$. Note that this is a valid flow since it respects all constraints in the LP, and it has a value $2 \cdot (\nu(f_1) + \cdots + \nu(f_L)) = 2 \cdot \nu^{w}_{\max}$. But then the maximum flow will be at least as large, hence $2 \cdot \nu^{w}_{\max} \leq V^{*}$.

For the reverse direction (ii), assume a maximum flow in $G'$. Then for each commodity $i$ half of the units are sent from $w$ to $s_i$ and half from $w$ to $t_i$ (and subsequently to $z_i$ and $z$) due to constraint (28). Again, the idea is to reverse the flow $f_i$ in all paths from $w$ to $s_i$. For each edge of $G$ we then send in each direction as many units of flow as we send in $G'$ (after reversing the direction from $w$ to $s_i$). The key is that in graph $G'$ the same amount of flow is sent from $w$ to $s_i$ and from $w$ to $t_i$, which implies that the constructed flow in $G$ will respect all capacity and conservation constraints, including for node $w$. The value of the flow in $G$ is half that in $G'$, thus for the maximum flow on $G$ we will trivially have that $V^{*} \leq 2 \cdot \nu^{w}_{\max}$.

Since the constraints of the LP have a size that is polynomial in $V$ and $E$, and LP has a (weakly) polynomial complexity in terms of the LP size [33], [34], we immediately deduce that computing the maximum $s$-$w$-$t$ flow in an undirected graph $G = (V, E)$