A Convergent Semi-Proximal Alternating Direction Method of Multipliers for Recovering Internet Traffics from Link Measurements
Zhenyu Ming, Liping Zhang, Hao Wu, Yanwei Xu, Mayank Bakshi, Bo Bai, Gong Zhang
Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China
Theory Lab, Central Research Institute, 2012 Labs, Huawei Technologies Co., Ltd, Hong Kong
Abstract
It is challenging to recover large-scale internet traffic data purely from link measurements. With the rapid growth of the problem scale, it becomes extremely difficult to sustain recovery accuracy at an acceptable computational cost. In this work, we propose a new Sparsity Low-Rank Recovery (SLRR) model and its Schur Complement Based semi-proximal Alternating Direction Method of Multipliers (SCB-spADMM) solver. Our approach distinguishes itself mainly in two aspects. First, we fully exploit the spatial low-rank property and the sparsity of traffic data, which are barely considered in the literature. Our model can be divided into a series of subproblems, each of which only involves the traffics in a single time interval, so the model scale is significantly reduced. Second, we establish a globally convergent ADMM-type algorithm inspired by [Li et al., Math. Program., 155(2016)] to solve the SLRR model. In each iteration, the optimum of every intermediate variable can be calculated analytically, which makes the algorithm fast and accurate. Besides, due to the separability of the SLRR model, it is possible to design a parallel algorithm to further reduce the computational time. According to the numerical results on the classic datasets Abilene and GÉANT, our method achieves the best accuracy with a low computational cost. Moreover, on our newly released large-scale Huawei Origin-Destination (HOD) network traffics, our method reaches seconds-level feedback, which meets the essential requirement of practical scenarios.

Keywords: Large-scale network recovery, HOD dataset, Spatial low-rankness, Nuclear norm minimization, Semi-proximal ADMM
The increasing demand of various services and applications running on the internet has led to an exponential growth of internet traffic in the last two decades. Such growth has, in turn,

∗Corresponding author: [email protected]

Figure 1: Third-order traffic tensor model. (a) Third-order traffic tensor X, with the (i, j, k)-th element x_ijk being the traffic flow transmitted from the i-th origin to the j-th destination in the k-th time interval. (b) Different time intervals of X, i.e., X^(1), ..., X^(T). (c) The traffic tensor X is recovered in chronological order. The traffics in green have been recovered, while those in yellow have not.

We model the traffic data as a third-order tensor X (= x_ijk), with each component x_ijk being the traffic flow transmitted from the i-th node to the j-th node over the k-th time interval (see Figure 1(a)). Suppose the network consists of S nodes and we have traffic information on T discrete time intervals; the dimension of the traffic tensor X is then S × S × T, and the dimensions of its frontal slices, i.e., X^(1), ..., X^(T) (see Figure 1(b)), are all S × S. If we further assume that the network has M links and N OD pairs, then
M, S and N satisfy the following relations:

$$\sqrt{N} = S \le M + 1 < N.$$

The character of a large-scale network system is that M and N can be very large. In addition, the network topology is rather sparse (S ≈ M ≪ N). In the HOD network, there are 577 links, 243 nodes and 59049 OD pairs, and more than 95% of the OD pairs do not transmit any traffic.

In this work, we propose a fast and accurate recovery method for the network tomography problem (NTP). First, we develop a convex Sparsity Low-Rank Recovery (SLRR) model, which is the first to consider both the spatial low-rank property and the sparsity of traffic data. Moreover, SLRR can be divided into a group of subproblems, each of which only relates to the traffics in a single time interval. Thus, the problem scale is sharply reduced, and we can recover the traffics in chronological order (see Figure 1(c)). Then, we design a Schur Complement Based semi-proximal Alternating Direction Method of Multipliers (SCB-spADMM) to solve SLRR. For each block variable, we derive the explicit form of its optimum in the updating schemes, so the algorithm iteration is fast and accurate. Based on Li, Sun and Toh's research [21], the global convergence of SCB-spADMM is established. In addition, the separable structure of SLRR makes it easy to design a parallel algorithm. Finally, we utilize cross-validation, a commonly used strategy for parameter setting in machine learning, to pick out in advance a group of appropriate parameters involved in our model and algorithm. Such a technique is data-independent, and thus it strengthens the robustness of our approach.

To test the accuracy and efficiency of numerical methods, we need practical network datasets. In this field, the Abilene and GÉANT datasets are frequently used. The Abilene/GÉANT dataset consists of 41/74 links and 11/23 nodes, collected using 5-minute/15-minute time intervals from Dec. 8, 2003 to Dec. 28, 2003/Jan. 1, 2005 to Apr. 29, 2005.
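As a quick sanity check, the size relations above can be evaluated directly on the HOD figures quoted in the text (a sketch; the numbers are taken from this paper):

```python
# Size relations between nodes (S), links (M) and OD pairs (N),
# checked on the HOD network: 243 nodes, 577 links, 59049 OD pairs.
S, M, N = 243, 577, 59049

assert S * S == N        # every ordered node pair is an OD pair, so N = S^2
assert S <= M + 1 < N    # sqrt(N) = S <= M + 1 < N
assert N / M > 100       # the topology is sparse: S ~ M but both are << N
```

The last line illustrates the "S ≈ M ≪ N" regime that motivates recovering N unknowns per interval from only M link measurements.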
However, the scale of these datasets is too small compared to real network problems. Therefore, we released a new dataset, the HOD dataset, which is extracted and modified from real data. The traffics of the HOD network are collected over 671 time intervals, each of which lasts 15 minutes. There are 243 nodes and 577 links, and 95% of the OD pairs never transmit traffic, so the HOD traffic data is highly sparse. Since monitoring the OD traffics is prohibitively expensive, our HOD dataset does not contain such information. This means that the HOD dataset is mainly applicable to the NTP that this work cares about. We will see in Section 5 that the method proposed in this work is very effective on the HOD dataset. Furthermore, we hope that the newly proposed HOD dataset can serve as a benchmark for traffic engineering methods in the future.

The outline of this paper is as follows. Our recovery model SLRR and the SCB-spADMM algorithm are discussed in Section 2 and Section 3, respectively. In Section 4, we conduct numerical simulations and evaluate the performance of different approaches. In Section 5, we introduce the newly published HOD dataset and present the corresponding numerical simulations. In Section 6, the conclusion of our work is drawn.

https://totem.info.ucl.ac.be/dataset.html
https://gitee.com/network-user123/hod-network_traffic_data

2 Recovery Model
In this section, we first establish a regularized nuclear norm minimization with linear constraints, called the Sparsity Low-Rank Recovery (SLRR) model, to infer the traffic data from link measurements. After analyzing the structural properties of the traffic data, we handle the traffic tensor X through its frontal slices X^(1), ..., X^(T). This improvement not only notably reduces the size of the problem but also makes it possible to design a parallel algorithm. Then, we derive the dual model of SLRR, which can be easily solved by SCB-spADMM. Finally, the optimality conditions of the primal and dual problems are presented for setting the stopping criterion of SCB-spADMM.

Intuitively, the optimization model to recover the traffics X^(k) in the k-th (k = 1, ..., T) time interval with the spatial low-rank property can be established by minimizing the rank function:

$$\min_{X^{(k)}} \ \mathrm{rank}(X^{(k)}) \quad \text{s.t.} \quad \mathcal{A}_k(X^{(k)}) = B_k, \tag{1}$$

where $\mathcal{A}_k$ is a linear operator and $B_k$ contains measurements and other given constraints. Since the rank function is nonconvex and nonsmooth, it is often relaxed by the nuclear norm $\|\cdot\|_*$, the sum of all singular values of a matrix, which is proved to be the tightest convex relaxation of the rank function on the unit ball $\mathbb{B} := \{X : \|X\|_2 \le 1\}$ [11]. The linear constraint $\mathcal{A}_k(X^{(k)}) = B_k$ contains three types of information:

1. X^(k) is nonnegative.
2. The sparsity constraint of the complete traffic tensor takes the form $\mathcal{P}_\Omega(X) = 0$, where $\Omega \subset \{(i,j,k) : 1 \le i, j \le S,\ 1 \le k \le T\}$ is the index set indicating the locations of the zero-traffics and $\mathcal{P}_\Omega(\cdot)$ is the projection onto $\Omega$, i.e., $\mathcal{P}_\Omega(x_{ijk}) = x_{ijk}$ if $(i,j,k) \in \Omega$ and $\mathcal{P}_\Omega(x_{ijk}) = 0$ otherwise. Denote $\Omega^{(k)} := \{(i,j) : (i,j,k) \in \Omega\}$. Thus, in model (1), it follows that $\mathcal{P}_{\Omega^{(k)}}(X^{(k)}) = 0$.
3.
By link-load measurements, we have

$$L^{(k)} = R \cdot \mathrm{vec}(X^{(k)}), \tag{2}$$

where $R \in \mathbb{R}^{M \times N}$ is the 0-1 routing matrix, indicating which link is associated with which OD pair (see Figure 2), $L^{(k)} \in \mathbb{R}^M$ is the k-th column of the link-measurement matrix $L$, and $\mathrm{vec}(X^{(k)})$ is the column-wise vectorization of the spatial matrix $X^{(k)}$. Formulation (2) can be easily rewritten as

$$R \cdot \mathrm{vec}(X^{(k)}) = \sum_{j=1}^{S} R_j X^{(k)} e_j,$$

where $R = [R_1, R_2, \cdots, R_S]$ is uniformly divided into $S$ parts along its columns and $e_j$ is the zero vector except for the j-th element being 1.

Figure 2: Routing matrix and its visualization. (a) Routing matrix R and its blocks R_1, ..., R_S. The dimensions of R and of each block R_i (i ∈ {1, ..., S}) are M × N and M × S, respectively. (b) The routing matrix of the Abilene dataset. (c) The routing matrix of the GÉANT dataset. In (b) and (c), the blue elements are 0 while the yellow elements are 1.

Therefore, we modify model (1) to (for simplicity, but with a little ambiguity, we omit the index '(k)'):

$$\min_X \ \|X\|_* \quad \text{s.t.} \quad \sum_{j=1}^{S} R_j X e_j = L, \quad \mathcal{P}_\Omega(X) = 0, \quad X \ge 0.$$

In internet traffic data, the variation of the traffics from the start to the end is often considerably large. Nevertheless, the traffics change slowly between adjacent time intervals. This continuity property of traffics has been studied in many models [26, 37, 43].
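The block rewriting of (2) can be checked numerically. In the sketch below a random 0-1 matrix stands in for a real routing matrix (purely illustrative), and vec(·) is taken column-wise so that the column block R_j multiplies the j-th column X e_j:

```python
import numpy as np

rng = np.random.default_rng(0)
S, M = 5, 8
N = S * S

# Random 0-1 "routing" matrix standing in for a real topology (illustrative).
R = (rng.random((M, N)) < 0.3).astype(float)
X = rng.random((S, S))                 # spatial traffic matrix X^(k)

# L = R vec(X), with column-major vectorization.
L = R @ X.flatten(order="F")

# Block form: sum_j R_j X e_j, where R = [R_1, ..., R_S] is split columnwise
# and X[:, j] = X e_j is the j-th column of X.
L_blocks = sum(R[:, j * S:(j + 1) * S] @ X[:, j] for j in range(S))

assert np.allclose(L, L_blocks)
```

The identity holds for any R and X once the same (column-major) vectorization convention is used on both sides.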
On the other hand, the traffics also exhibit periodicity [37]: similar traffic volumes in the same time period on different days can be understood as internet users having similar behaviors at the same time every day. By adding two regularization terms to the above model, we obtain a more realistic model:

$$\min_X \ \|X\|_* + \rho_1 \|X - \bar{X}\|_F^2 + \rho_2 \|X - \hat{X}\|_F^2 \quad \text{s.t.} \quad \sum_{j=1}^{S} R_j X e_j = L, \quad \mathcal{P}_\Omega(X) = 0, \quad X \ge 0. \tag{3}$$

Here, $\bar{X}$ and $\hat{X}$ represent the traffics one time interval ago and one week ago, respectively. The regularization parameters $\rho_i > 0$ (i = 1, 2) allow a tunable tradeoff between the spatial low-rank property, the continuity, and the periodicity of traffics.

Model (3) is the Sparsity Low-Rank Recovery (SLRR) model. It allows us to cope with the traffics in each individual time interval in chronological order X^(1) → X^(2) → ⋯, instead of with the aggregate data. As shown in Figure 1(c), the green frontal slices represent recovered traffic data, while the yellow ones are not yet recovered. This character sharply reduces the variable size of each independent subproblem, and it correctly reflects the basic requirement that the updated traffics need to be reported immediately each time the latest time slice ends, e.g., for congestion control [14] and real-time monitoring [33].

2.2 Dual Problem

We first reformulate the SLRR model (3) as

$$\min_{X, Y, Z} \ \|X\|_* + \alpha \|Z\|_F^2 \quad \text{s.t.} \quad \sum_{j=1}^{S} R_j X e_j = L, \quad \mathcal{P}_\Omega(X) = 0, \quad X = Y, \quad X - A - Z = 0, \quad Y \ge 0. \tag{4}$$

In model (4),

$$\alpha := \rho_1 + \rho_2 \quad \text{and} \quad A := \frac{\rho_1 \bar{X} + \rho_2 \hat{X}}{\rho_1 + \rho_2}. \tag{5}$$

The Lagrangian function of (4) is

$$\mathcal{L}(X, Y, Z; U, V, W, Q) := \|X\|_* + \alpha \|Z\|_F^2 - \langle U, \mathcal{P}_\Omega(X) \rangle - \langle V, X - Y \rangle - \langle W, X - A - Z \rangle - \Big\langle Q, \sum_{j=1}^{S} R_j X e_j - L \Big\rangle,$$

with the variable matrix Y being nonnegative. Since the objective function of (4) is convex and all of the constraints are linear, strong duality holds, namely

$$\min_{X,\, Y \ge 0,\, Z} \ \max_{U, V, W, Q} \ \mathcal{L}(X, Y, Z; U, V, W, Q) = \max_{U, V, W, Q} \ \min_{X,\, Y \ge 0,\, Z} \ \mathcal{L}(X, Y, Z; U, V, W, Q).$$

This property leads to a dual problem that is very easy to solve with SCB-spADMM. Next, we derive the dual form through the following minimization problem:

$$\min_{X,\, Y \ge 0,\, Z} \ \mathcal{L}(X, Y, Z; U, V, W, Q) =: \min_{X,\, Y \ge 0,\, Z} \ \mathcal{L}(X, Y, Z).$$

Note that
X, Y and Z in $\mathcal{L}(\cdot)$ are decoupled. Thus, the above problem can be divided into three smaller pieces associated with X, Y and Z and solved separately.

The subproblem of X is solved by

$$\min_X \ \|X\|_* - \Big\langle \mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T,\, X \Big\rangle = \begin{cases} 0, & \text{if } \big\|\mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T\big\|_2 \le 1, \\ -\infty, & \text{otherwise}. \end{cases}$$

The last equality holds according to the inequality $\langle A, B \rangle \le \|A\|_* \|B\|_2$. When a nonzero B is fixed, the equality can be attained for certain A because $\|\cdot\|_*$ and $\|\cdot\|_2$ are dual norms.

In terms of Y, the subproblem can be easily fixed by

$$\min_{Y \ge 0} \ \langle V, Y \rangle = \begin{cases} 0, & V \ge 0, \\ -\infty, & \text{otherwise}. \end{cases}$$

Z can also be trivially obtained, since we only need to solve a convex quadratic program:

$$\min_Z \ \alpha \|Z\|_F^2 + \langle W, Z \rangle = -\frac{\|W\|_F^2}{4\alpha}.$$

As a consequence, the dual model of (4) takes the following form and, by strong duality, they are equivalent:

$$\min_{U, V, W, Q, G} \ \frac{1}{4\alpha}\|W - 2\alpha A\|_F^2 - \langle Q, L \rangle \quad \text{s.t.} \quad \mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T = G, \quad V \ge 0, \quad \|G\|_2 \le 1. \tag{6}$$

For later developments, we further rewrite problem (6) as

$$\min_{U, V, W, Q, G} \ \delta_{\mathbb{R}^{S \times S}_+}(V) - \langle Q, L \rangle + \delta_{\mathbb{B}}(G) + \frac{1}{4\alpha}\|W - 2\alpha A\|_F^2 \quad \text{s.t.} \quad \mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T = G, \tag{7}$$

where $\mathbb{B} := \{X \in \mathbb{R}^{S \times S} : \|X\|_2 \le 1\}$, $\mathbb{R}^{S \times S}_+ := \{X \in \mathbb{R}^{S \times S} : X \ge 0\}$, and $\delta_C(\cdot)$ is the indicator function of a closed convex set $C$, i.e., $\delta_C(x) = 0$ if $x \in C$ and $\delta_C(x) = +\infty$ otherwise. The objective of (7) contains two blocks, namely

$$\text{block}_1 := \Big( \delta_{\mathbb{R}^{S \times S}_+}(V) - \langle Q, L \rangle \Big) \quad \text{and} \quad \text{block}_2 := \Big( \delta_{\mathbb{B}}(G) + \frac{1}{4\alpha}\|W - 2\alpha A\|_F^2 \Big),$$

each of which is a convex composite quadratic function. In general, a convex composite quadratic function takes the form

$$f(x, y) = p(x) + \langle y, \mathcal{M} y + b \rangle,$$

where $p(\cdot): \mathcal{X} \to (-\infty, +\infty]$ is a closed proper convex function, $\mathcal{M}$ is a positive semidefinite matrix and $b$ is a vector.

The Karush-Kuhn-Tucker (KKT) conditions associated with (4) and (7) are given as follows:

$$\sum_{j=1}^{S} R_j X e_j = L, \quad \mathcal{P}_\Omega(X) = 0, \quad \mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T = G, \quad V \in \mathbb{R}^{S \times S}_+, \quad G \in \mathbb{B}.$$

We can set the stopping criterion according to these KKT conditions. Specifically, define

$$\eta := \max\{\eta_{P_1}, \eta_{P_2}, \eta_D, \eta_V, \eta_G\},$$

where

$$\eta_{P_1} := \frac{\big\|\sum_{j=1}^{S} R_j X e_j - L\big\|_F}{\|L\|_F}, \quad \eta_{P_2} := \frac{\|\mathcal{P}_\Omega(X)\|_F}{\|X\|_F}, \quad \eta_D := \frac{\big\|\mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T - G\big\|_F}{\|G\|_F},$$

$$\eta_V := \frac{\|V - \mathcal{P}_{\mathbb{R}^{S \times S}_+}(V)\|_F}{\|V\|_F}, \quad \eta_G := \frac{\|G - \mathcal{P}_{\mathbb{B}}(G)\|_F}{\|G\|_F}.$$

We can set a small positive parameter ε and stop the algorithm when η < ε or the iteration count attains its maximum.

3 The SCB-spADMM Algorithm

Next, we develop the Schur Complement Based semi-proximal ADMM (SCB-spADMM) for (7). As is well known, ADMM-type algorithms are effective and efficient for large-scale convex models with linear constraints [3, 8, 9, 21] and have been successfully applied to many nuclear norm minimization problems [32, 24, 22]. ADMM is easy to implement and requires only a little memory space.
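The closed form of the Z-subproblem above can be verified numerically: the minimizer of $\alpha\|Z\|_F^2 + \langle W, Z\rangle$ is $Z = -W/(2\alpha)$ with optimal value $-\|W\|_F^2/(4\alpha)$ (a sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.7
W = rng.standard_normal((4, 4))

def obj(Z):
    # alpha * ||Z||_F^2 + <W, Z>
    return alpha * np.sum(Z ** 2) + np.sum(W * Z)

Z_star = -W / (2 * alpha)                   # stationary point
val_star = -np.sum(W ** 2) / (4 * alpha)    # claimed optimal value

assert np.isclose(obj(Z_star), val_star)
# The objective is strictly convex, so random perturbations never do better.
for _ in range(100):
    Z = Z_star + 0.1 * rng.standard_normal((4, 4))
    assert obj(Z) >= val_star - 1e-12
```

The same stationarity argument yields the explicit updates used later in the algorithm.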
The convergence of our SCB-spADMM can be proved similarly to [21]. In particular, the optimum of each variable can be computed explicitly.
A natural idea is to directly employ a 5-block ADMM to solve model (7). Denote

$$\Gamma(U, V, W, Q, G) := \mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T - G.$$

The augmented Lagrangian function of (7) is

$$\mathcal{L}_\beta(U, V, W, Q, G; X) := \delta_{\mathbb{R}^{S \times S}_+}(V) - \langle Q, L \rangle + \delta_{\mathbb{B}}(G) + \frac{1}{4\alpha}\|W - 2\alpha A\|_F^2 + \langle X, \Gamma(U, V, W, Q, G) \rangle + \frac{\beta}{2}\|\Gamma(U, V, W, Q, G)\|_F^2.$$

A commonly used algorithm for solving (7) is the directly extended 5-block ADMM. Its iteration steps are given by

$$\begin{aligned}
U^{k+1} &:= \arg\min_U \ \mathcal{L}_\beta(U, V^k, W^k, Q^k, G^k; X^k), \\
V^{k+1} &:= \arg\min_V \ \mathcal{L}_\beta(U^{k+1}, V, W^k, Q^k, G^k; X^k), \\
W^{k+1} &:= \arg\min_W \ \mathcal{L}_\beta(U^{k+1}, V^{k+1}, W, Q^k, G^k; X^k), \\
Q^{k+1} &:= \arg\min_Q \ \mathcal{L}_\beta(U^{k+1}, V^{k+1}, W^{k+1}, Q, G^k; X^k), \\
G^{k+1} &:= \arg\min_G \ \mathcal{L}_\beta(U^{k+1}, V^{k+1}, W^{k+1}, Q^{k+1}, G; X^k), \\
X^{k+1} &:= X^k + \tau\beta \Big( \mathcal{P}_\Omega(U^{k+1}) + V^{k+1} + W^{k+1} + \sum_{j=1}^{S} R_j^T Q^{k+1} e_j^T - G^{k+1} \Big).
\end{aligned}$$

Since each variable can be obtained explicitly in the iterations, solving the subproblems becomes very simple. On the other hand, as mentioned in [7], it is a challenge to ensure that a directly extended multi-block (≥ 3) ADMM is convergent.

As introduced in [21], let $\mathcal{H}_U, \mathcal{H}_Q, \mathcal{H}_W, \mathcal{H}_V$ and $\mathcal{H}_G$ be self-adjoint positive semidefinite linear operators satisfying

$$\begin{aligned}
\mathcal{H}_U &:= \mathcal{G}_U - \mathcal{P}_\Omega^* \mathcal{P}_\Omega = \mathcal{G}_U - \mathcal{P}_\Omega, \qquad
\mathcal{H}_Q := \mathcal{G}_Q - \mathcal{A}^* \mathcal{A}, \\
\mathcal{H}_W &:= \mathcal{G}_W - \tfrac{1}{\beta}\mathcal{I} - \mathcal{I}^* \mathcal{I} = \mathcal{G}_W - \big(\tfrac{1}{\beta} + 1\big)\mathcal{I}, \qquad
\mathcal{H}_V := \mathcal{G}_V - \mathcal{I}^* \mathcal{I} = \mathcal{G}_V - \mathcal{I}, \qquad
\mathcal{H}_G := \mathcal{G}_G - \mathcal{I}^* \mathcal{I} = \mathcal{G}_G - \mathcal{I},
\end{aligned} \tag{10}$$

where $\mathcal{G}_U, \mathcal{G}_Q, \mathcal{G}_W, \mathcal{G}_V$ and $\mathcal{G}_G$ are also self-adjoint positive semidefinite linear operators whose inverses are relatively easy to compute. $\mathcal{A}$ is the linear operator satisfying $\mathcal{A} Q := \sum_{j=1}^{S} R_j^T Q e_j^T$ and $\mathcal{I}$ is the identity operator, i.e., $\mathcal{I} X = X$ for any $X$. From the definition, we can derive that $\mathcal{A}^* \mathcal{A} Q = \sum_{j=1}^{S} R_j R_j^T Q$. We propose the SCB-spADMM algorithm for solving model (7) as in Algorithm 1.
The presence of $\mathcal{H}_U, \mathcal{H}_Q, \mathcal{H}_W, \mathcal{H}_V$ and $\mathcal{H}_G$ guarantees the existence of solutions of subproblems (8) and (9), as well as the boundedness of the iteration points $U^k, Q^k, W^k, V^k$ and $G^k$. In general, $\mathcal{H}_U, \mathcal{H}_Q, \mathcal{H}_W, \mathcal{H}_V$ and $\mathcal{H}_G$ should be as small as possible while keeping the subproblems in $U^k, Q^k, W^k, V^k$ and $G^k$ easy to compute. According to [21], we have the following convergence result.

Algorithm 1: SCB-spADMM for solving (7)
Define $R_j$ as the j-th block of the routing matrix R, and $e_j$ as the zero vector except for the j-th element being one. Calculate α and A through (5). Choose $U^0, Q^0, V^0, W^0, G^0, X^0$ as zero matrices. For k = 0, 1, 2, ..., repeat the following iteration scheme.
while the stopping criterion is not reached do
Update U, Q and V in the following order, according to formulations (11), (12), (14), respectively:

$$\begin{aligned}
U^{k+\frac{1}{2}} &:= \arg\min_U \ \mathcal{L}_\beta(U, Q^k, V^k, W^k, G^k; X^k) + \tfrac{\beta}{2}\|U - U^k\|^2_{\mathcal{H}_U}, \\
Q^{k+\frac{1}{2}} &:= \arg\min_Q \ \mathcal{L}_\beta(U^{k+\frac{1}{2}}, Q, V^k, W^k, G^k; X^k) + \tfrac{\beta}{2}\|Q - Q^k\|^2_{\mathcal{H}_Q}, \\
V^{k+1} &:= \arg\min_V \ \mathcal{L}_\beta(U^{k+\frac{1}{2}}, Q^{k+\frac{1}{2}}, V, W^k, G^k; X^k) + \tfrac{\beta}{2}\|V - V^k\|^2_{\mathcal{H}_V}, \\
Q^{k+1} &:= \arg\min_Q \ \mathcal{L}_\beta(U^{k+\frac{1}{2}}, Q, V^{k+1}, W^k, G^k; X^k) + \tfrac{\beta}{2}\|Q - Q^{k+\frac{1}{2}}\|^2_{\mathcal{H}_Q}, \\
U^{k+1} &:= \arg\min_U \ \mathcal{L}_\beta(U, Q^{k+1}, V^{k+1}, W^k, G^k; X^k) + \tfrac{\beta}{2}\|U - U^{k+\frac{1}{2}}\|^2_{\mathcal{H}_U}.
\end{aligned} \tag{8}$$

Update W and G in the following order, according to formulations (13), (15), respectively:

$$\begin{aligned}
W^{k+\frac{1}{2}} &:= \arg\min_W \ \mathcal{L}_\beta(U^{k+1}, Q^{k+1}, V^{k+1}, W, G^k; X^k) + \tfrac{\beta}{2}\|W - W^k\|^2_{\mathcal{H}_W}, \\
G^{k+1} &:= \arg\min_G \ \mathcal{L}_\beta(U^{k+1}, Q^{k+1}, V^{k+1}, W^{k+\frac{1}{2}}, G; X^k) + \tfrac{\beta}{2}\|G - G^k\|^2_{\mathcal{H}_G}, \\
W^{k+1} &:= \arg\min_W \ \mathcal{L}_\beta(U^{k+1}, Q^{k+1}, V^{k+1}, W, G^{k+1}; X^k) + \tfrac{\beta}{2}\|W - W^{k+\frac{1}{2}}\|^2_{\mathcal{H}_W}.
\end{aligned} \tag{9}$$

$$X^{k+1} := X^k + \tau\beta \Big( \mathcal{P}_\Omega(U^{k+1}) + V^{k+1} + W^{k+1} + \sum_{j=1}^{S} R_j^T Q^{k+1} e_j^T - G^{k+1} \Big).$$

end while
return X

Theorem 3.1.
Suppose that either (1) $\tau \in (0, \frac{1+\sqrt{5}}{2})$, or (2) $\tau \ge \frac{1+\sqrt{5}}{2}$ but

$$\sum_{k=0}^{+\infty} \Big( \big\|(G^{k+1} - G^k) + (W^{k+1} - W^k)\big\|_F^2 + \tau \Big\|\mathcal{P}_\Omega(U^{k+1}) + V^{k+1} + W^{k+1} + \sum_{j=1}^{S} R_j^T Q^{k+1} e_j^T - G^{k+1}\Big\|_F^2 \Big) < +\infty.$$

Then we have the following properties.
(i) $U^k, Q^k, W^k, V^k$ and $G^k$ are automatically well defined, based on the settings of $\mathcal{H}_U, \mathcal{H}_Q, \mathcal{H}_W, \mathcal{H}_V$ and $\mathcal{H}_G$ given by (10).
(ii) $\{(U^k, Q^k, W^k, V^k, G^k, X^k)\}_{k=1}^{\infty}$ converges to a unique accumulation point $(U^\infty, Q^\infty, W^\infty, V^\infty, G^\infty, X^\infty)$, with $(U^\infty, Q^\infty, W^\infty, V^\infty, G^\infty)$ being an optimum of model (7) and $X^\infty$ being an optimum of model (4).

Proof. After substituting the specific form of (7), this becomes a straightforward deduction of Theorem 3 in [21]. □
Recalling the requirements on $\mathcal{H}_U, \mathcal{H}_Q, \mathcal{H}_W, \mathcal{H}_V$ and $\mathcal{H}_G$ introduced previously, we can simply set $\mathcal{H}_W = \mathcal{H}_V = \mathcal{H}_G = 0$ by letting $\mathcal{G}_V = \mathcal{G}_G = \mathcal{I}$ and $\mathcal{G}_W = (\frac{1}{\beta} + 1)\mathcal{I}$. Since $\mathcal{P}_\Omega$ is a nondegenerate projection operator, it follows that $\|\mathcal{P}_\Omega\| = 1$. Thus, we can set $\mathcal{G}_U = \mathcal{I}$ and have $\mathcal{H}_U = \mathcal{I} - \mathcal{P}_\Omega$. In terms of $\mathcal{H}_Q$, from the equation $\mathcal{A}^* \mathcal{A} Q = \sum_{j=1}^{S} R_j R_j^T Q$, we know that $\|\mathcal{A}^* \mathcal{A}\| = \|\sum_{j=1}^{S} R_j R_j^T\|$. As for $\mathcal{G}_Q$, a common and simple setting is $\mathcal{G}_Q = \lambda_{\max} \mathcal{I}$, where $\lambda_{\max}$ is the largest eigenvalue of $\mathcal{A}^* \mathcal{A}$. Thus, we have

$$\mathcal{H}_Q(Q) = \Big( \lambda_{\max} I - \sum_{j=1}^{S} R_j R_j^T \Big) Q := H_Q Q.$$

In the following, we derive the explicit forms of the subproblems associated with
U, Q, V, W and G, respectively.

The subproblem of U takes the form

$$\min_U \ \langle X, \mathcal{P}_\Omega(U) \rangle + \frac{\beta}{2}\|\Gamma(U, V, W, Q, G)\|_F^2 + \frac{\beta}{2}\|U - U^k\|^2_{\mathcal{I} - \mathcal{P}_\Omega}.$$

Since it is an unconstrained model and its objective function is convex and differentiable, we can obtain its optimum by directly setting the gradient of the objective function to zero. Thus, the optimum of U is

$$U^* := \mathcal{P}_\Omega\Big( -V - \sum_{j=1}^{S} R_j^T Q e_j^T - W + G - \frac{X}{\beta} \Big) + \mathcal{P}_{\Omega^C}(U^k). \tag{11}$$

Similarly, we can derive the optima of Q and W:

$$Q^* := \arg\min_Q \ \Big\langle \sum_{j=1}^{S} R_j X e_j - L, Q \Big\rangle + \frac{\beta}{2}\|\Gamma(U, V, W, Q, G)\|_F^2 + \frac{\beta}{2}\|Q - Q^k\|^2_{\mathcal{H}_Q} = -\frac{1}{\lambda_{\max}} \Big( \sum_{j=1}^{S} R_j (V + \mathcal{P}_\Omega(U) + W - G) e_j - H_Q Q^k + \frac{\sum_{j=1}^{S} R_j X e_j - L}{\beta} \Big) \tag{12}$$

and

$$W^* := \arg\min_W \ \frac{1}{4\alpha}\|W - 2\alpha A\|_F^2 + \langle X, W \rangle + \frac{\beta}{2}\|\Gamma(U, V, W, Q, G)\|_F^2 = \frac{A - X - \beta\Big( V + \sum_{j=1}^{S} R_j^T Q e_j^T + \mathcal{P}_\Omega(U) - G \Big)}{\frac{1}{2\alpha} + \beta}. \tag{13}$$

The subproblem in terms of V can be divided into a group of convex quadratic programs, each of which asks for a nonnegative solution and is associated with a scalar variable. It is easy to derive that

$$V^* := \arg\min_V \ \delta_{\mathbb{R}^{S \times S}_+}(V) + \langle X, V \rangle + \frac{\beta}{2}\Big\|\mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T - G\Big\|_F^2 = \max\Big\{ 0,\ -\Big( \sum_{j=1}^{S} R_j^T Q e_j^T + \mathcal{P}_\Omega(U) + W - G \Big) - \frac{X}{\beta} \Big\}. \tag{14}$$

About G, suppose

$$U_G \Sigma_G V_G^T := V + \sum_{j=1}^{S} R_j^T Q e_j^T + \mathcal{P}_\Omega(U) + W + \frac{X}{\beta}$$

is the economical singular value decomposition (SVD). Then, we have

$$G^* := \arg\min_G \ \delta_{\mathbb{B}}(G) - \langle X, G \rangle + \frac{\beta}{2}\|\Gamma(U, V, W, Q, G)\|_F^2 = U_G \min\{I, \Sigma_G\} V_G^T. \tag{15}$$

4 Numerical Simulations

In this section, we present the simulation results of our SLRR model and SCB-spADMM algorithm on the Abilene and GÉANT datasets. The results of Tomo-SRMF [26], SDPLR [24] and Tomo-Gravity [42] are also reported for comparison. As already mentioned in the introduction, we focus here on the more practical network tomography problem [1, 24, 26, 31, 40, 41] rather than the completion problem [5, 6, 15, 24, 26, 34, 35, 37, 38, 39, 43].

We notice that there are some anomalies in the GÉANT traffics [30]. As seen in Figure 3, a few traffic volumes are much larger than the others.
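Such outliers can be repaired by linear interpolation between surrounding normal intervals; a sketch with synthetic data and hypothetical interval indices:

```python
import numpy as np

rng = np.random.default_rng(3)
volumes = rng.uniform(10.0, 12.0, size=20)   # per-interval traffic volume
volumes[8:12] *= 50.0                        # injected anomalies (illustrative)

t1, t2 = 7, 12                               # last/next normal intervals
# Linearly interpolate the abnormal intervals strictly between t1 and t2.
ts = np.arange(t1 + 1, t2)
volumes[ts] = np.interp(ts, [t1, t2], [volumes[t1], volumes[t2]])

assert volumes.max() < 13.0                  # anomalies removed
```

The anomaly indices here are made up for illustration; in practice they would come from an outlier detector on the per-interval norms shown in Figure 3.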
Suppose the traffics in the time intervals t₁ and t₂ are correct but all the intermediate traffics are abnormal. Then, we repair the anomalies by linear interpolation between the time intervals t₁ and t₂. The refined data is presented in Figure 3(c). The Abilene dataset, as Figure 4 shows, is of good quality, so no special processing is required.

As previously mentioned, real-world traffic data is highly sparse. To imitate this realistic circumstance, we modify the Abilene and GÉANT datasets by setting the smallest p% (p = 50, 70, 90) of OD pairs to zero-ODs, which means they do not transmit any network traffic.

Figure 3: The Frobenius norm of (a) the original GÉANT traffics in each time interval, (b) the original GÉANT traffics from the 5001st time interval to the 5500th time interval, (c) the modified GÉANT traffics in each time interval.

K-fold [23] Cross Validation (CV) is employed to find proper parameters (the regularization parameters ρ₁, ρ₂ and the penalty parameter β) in our algorithm. We uniformly divide the rows of the link-measurement matrix L into K groups. Each row group and the traffics therein are denoted by Ψ_l and L_{Ψ_l} (l = 1, ..., K), respectively. Then, we generate candidate parameter triples from a wide-range interval. All candidates are evaluated by the following steps:

1. Let the integer l change from 1 to K. In each case, L_{Ψ_l} is taken as the test set, while the other link-load data is taken as the training set.
2. Based on the link-load traffics in the training set, we obtain an estimation of the traffics in rows Ψ_l using the proposed SCB-spADMM, denoted by L̂_{Ψ_l}.
3. If l = K, calculate the overall recovery error by the Normalized Mean Absolute Error (NMAE) for CV, defined as

$$N_{CV} := \frac{\sum_{l=1}^{K} \sum_{i \in \Psi_l} \sum_{j=1}^{T} \big| (\hat{L}_{\Psi_l})_{ij} - (L_{\Psi_l})_{ij} \big|}{\sum_{i=1}^{M} \sum_{j=1}^{T} L_{ij}}.$$

Ultimately, we choose the candidate that corresponds to the lowest $N_{CV}$ as the parameters used in our algorithm. With the fixed parameters, we can implement SCB-spADMM to estimate the whole OD traffics. The recovery performance is evaluated by NMAE on the missing data:

$$\mathrm{NMAE} := \frac{\sum_{(i,j,k) \in \Omega^C} |\hat{X}_{ijk} - X_{ijk}|}{\sum_{(i,j,k) \in \Omega^C} X_{ijk}}. \tag{16}$$

We compare SLRR against three alternative methods: Tomo-SRMF, SDPLR and Tomo-Gravity.
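The NMAE metric (16), restricted to the evaluated entries, can be sketched as follows (with synthetic tensors standing in for the true and recovered traffics):

```python
import numpy as np

def nmae(X_hat, X, mask):
    """Normalized Mean Absolute Error over the entries selected by mask."""
    return np.abs(X_hat[mask] - X[mask]).sum() / X[mask].sum()

rng = np.random.default_rng(4)
X = rng.uniform(1.0, 2.0, size=(6, 6, 10))   # "true" traffic tensor (synthetic)
X_hat = X + 0.1                              # estimate uniformly off by +0.1
mask = rng.random(X.shape) < 0.5             # entries used for evaluation

err = nmae(X_hat, X, mask)
# |error| = 0.1 everywhere and X lies in [1, 2], so the NMAE is bounded:
assert 0.1 / 2.0 - 1e-9 <= err <= 0.1 / 1.0 + 1e-9
```

In the paper's setting, `mask` would select the index set $\Omega^C$ of evaluated OD entries.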
The complete three-week Abilene traffics and the four-month GÉANT traffics are simulated. In particular, we simulate the GÉANT dataset over three different time periods: the first week, the first eight weeks and the last eight weeks, respectively. We point out that other work [26, 43] only simulated the one-week data of the GÉANT dataset. In fact, sometimes people only care about one-week traffics rather than the aggregate data.

The simulation results are reported in Table 1. We can see that as the sparsity increases, the recovery errors decrease. This is because high sparsity contains more raw traffic information and reduces uncertainty. More importantly, our SLRR is significantly better than the other methods in all cases. For the GÉANT dataset, the results on the first-week data are better than those of the other two groups, which may be related to the data quality of the GÉANT dataset. Nevertheless, our SLRR is still superior to the other methods in these cases.

Table 1: Comparison of NMAE on the Abilene and GÉANT datasets. For the GÉANT dataset, we output the simulation results of the first-week/the first-eight-week/the last-eight-week data.

Abilene
  Method        Sparsity = 50%   Sparsity = 70%   Sparsity = 90%
  Tomo-SRMF     0.374            0.335            0.236
  SDPLR         0.535            0.393            0.272
  Tomo-Gravity  0.631            0.558            0.248
  SLRR

GÉANT
  Method        Sparsity = 50%        Sparsity = 70%        Sparsity = 90%
  Tomo-SRMF     1.003/1.182/1.342     0.725/0.922/1.075     0.271/0.465/0.530
  SDPLR         1.004/1.055/1.171     0.734/0.859/0.973     0.308/0.456/0.526
  Tomo-Gravity  1.065/1.113/1.145     0.871/0.971/1.005     0.583/0.644/0.734
  SLRR

The recovery performance of SLRR in different time intervals is presented in Figure 4. We especially mark the NMAE under each sparsity with a horizontal dashed line of the same color. It shows that our recovery results are consistent and reliable.

Figure 4: NMAE in different time intervals on (a) the complete three-week traffics of the Abilene dataset, (b) the first-week traffics of the GÉANT dataset. The spaces between adjacent time intervals are 40 for the Abilene dataset and 15 for the GÉANT dataset.

5 The HOD Dataset

Compared with the Abilene and GÉANT datasets, the most significant feature of the HOD dataset is that it is highly sparse. In fact, 95.26% of OD pairs have no traffic; Figure 5 fully demonstrates this. Moreover, the HOD dataset also has continuity and periodicity, see Figure 6 for an illustration. In the following simulations, we will also make use of these properties.

Figure 5: The sparsity of (a) the spatial matrix, (b) the routing matrix of the HOD dataset. The red components in the spatial matrix mean that the corresponding OD pairs have transmitted traffic data, and those in the routing matrix stand for 1-components.

Figure 6: The Frobenius norm of link-loads in different time intervals. The figure illustrates the continuity and the periodicity of the HOD dataset.

Next, we evaluate the different methods on the HOD dataset. Because the real OD traffics are unknown, the NMAE metric (16) is no longer applicable, so we again utilize CV errors for assessment. Besides the aforementioned K-fold CV, we also consider another kind of CV, the Monte Carlo CV [10], to make a fair evaluation. In Monte Carlo CV, the training set and the test set are randomly generated. We set the ratio of the test sets to 2% and repeat 50 times in each simulation.

Table 2 presents the recovery performance of the different methods on the HOD dataset. 'SLRR-MC' stands for the case where SLRR is evaluated by Monte Carlo CV. From Table 2, we find that among the four methods, SLRR significantly outperforms the others. Moreover, the recovery results of SLRR assessed by the two different CVs are very close.

Table 2: Comparison of recovery performance on the HOD dataset.
  Method   Tomo-SRMF   SDPLR    Tomo-Gravity   SLRR   SLRR-MC
  NMAE     0.3314      0.2832   0.3243

The computational time of our algorithm is also investigated. All simulations are implemented on an ASUS laptop (4-core, Intel i7-10510U, @2.30GHz, 16G RAM). We have to admit that our SLRR model is slower than the other methods due to the SVD decomposition.
However, our method takes only a few seconds (see Figure 7) to recover a time interval that lasts more than ten minutes in practical scenarios. This well satisfies the requirements of practical problems. Moreover, our SLRR model has better recovery accuracy than all the existing methods in every case. These properties give our SLRR model significant advantages for the network tomography problem, and it may be integrated into future products.

Figure 7: CPU-time of each time interval on (a) the Abilene dataset, (b) the GÉANT dataset, (c) the HOD dataset. For the Abilene and GÉANT datasets, it takes less than 0.1 second to recover the traffics in one time interval. For the large-scale HOD dataset, it also only takes several seconds for each interval.
6 Conclusion

Our research extends the knowledge of solving the large-scale network tomography problem. By leveraging the spatial low-rank property and the sparsity of traffic data, we propose the SLRR model to accurately infer network traffics. The separability of SLRR significantly reduces the problem scale and makes it possible to design a parallel algorithm. To solve SLRR, we design a globally convergent Schur complement based semi-proximal alternating direction method of multipliers. Since the optima of all block variables can be calculated in explicit form, the iteration scheme is fast and accurate. K-fold cross validation is employed to find proper parameters in our algorithm. By taking advantage of the above techniques, our method outperforms all the others with respect to recovery accuracy. Moreover, the seconds-level feedback on different datasets also demonstrates the high computational efficiency of our algorithm. Besides the proposed methodology, another highlight of our work is that we publish an industrial dataset, HOD, to enrich the database for network traffic recovery study.

Acknowledgement
This work was supported by the National Natural Science Foundation of China (Grant No. 11771244, 11871297) and the Tsinghua University Initiative Scientific Research Program.