A Convergent Semi-Proximal Alternating Direction Method of Multipliers for Recovering Internet Traffics from Link Measurements
Zhenyu Ming, Liping Zhang, Hao Wu, Yanwei Xu, Mayank Bakshi, Bo Bai, Gong Zhang
Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China
Theory Lab, Central Research Institute, 2012 Labs, Huawei Technologies Co., Ltd, Hong Kong
Abstract
It is challenging to recover large-scale internet traffic data purely from link measurements. With the rapid growth of the problem scale, it becomes extremely difficult to sustain recovery accuracy at an acceptable computational cost. In this work, we propose a new Sparsity Low-Rank Recovery (SLRR) model and its Schur Complement Based semi-proximal Alternating Direction Method of Multipliers (SCB-spADMM) solver. Our approach distinguishes itself mainly in two aspects. First, we fully exploit the spatial low-rank property and the sparsity of traffic data, which are barely considered in the literature. Our model can be divided into a series of subproblems, each of which only involves the traffics in a single time interval, so the model scale is significantly reduced. Second, we establish a globally convergent ADMM-type algorithm inspired by [Li et al., Math. Program., 155(2016)] to solve the SLRR model. In each iteration, the optimum of every intermediate variable can be calculated analytically, which makes the algorithm fast and accurate. Besides, due to the separability of the SLRR model, it is possible to design a parallel algorithm to further reduce the computational time. According to the numerical results on the classic datasets Abilene and GÉANT, our method achieves the best accuracy with a low computational cost. Moreover, on our newly released large-scale Huawei Origin-Destination (HOD) network traffics, our method reaches seconds-level feedback, which meets the essential requirement of practical scenarios.

Keywords: Large-scale network recovery, HOD dataset, Spatial low-rankness, Nuclear norm minimization, Semi-proximal ADMM
The increasing demand of various services and applications running on the internet has led to an exponential growth of internet traffic in the last two decades. Such growth has, in turn,

∗Corresponding author: [email protected]

Figure 1: Third-order traffic tensor model. (a) Third-order traffic tensor X, with the (i, j, k)-th element x_ijk being the traffic flow transmitted from the i-th origin to the j-th destination in the k-th time interval. (b) Different time intervals of X, i.e., X^(1), ..., X^(T). (c) The traffic tensor X is recovered in chronological order. The traffics in green have been recovered, while those in yellow have not.

We model the traffic data as a third-order tensor X (= x_ijk), with each component x_ijk being the traffic flow transmitted from the i-th node to the j-th node over the k-th time interval (see Figure 1(a)). Suppose the network consists of S nodes and we have traffic information on T discrete time intervals; the dimension of the traffic tensor X is then S × S × T, and the dimensions of its frontal slices, i.e., X^(1), ..., X^(T) (see Figure 1(b)), are all S × S. If we further assume that the network has M links and N OD pairs, then
M, S and N satisfy the following relations:

$$\sqrt{N} = S \le M + 1 < N.$$

The character of a large-scale network system is that M and N can be very large. In addition, the network topology is rather sparse (S ≈ M ≪ N). In the HOD network, there are 577 links, 243 nodes and 59049 OD pairs, and more than 95% of the OD pairs do not transmit any traffic.

In this work, we propose a fast and accurate recovery method for the network tomography problem (NTP). First, we develop a convex Sparsity Low-Rank Recovery (SLRR) model, which is the first to consider both the spatial low-rank property and the sparsity of traffic data. Moreover, SLRR can be divided into a group of subproblems, each of which only relates to the traffics in a single time interval. Thus, the problem scale is sharply reduced, and we can recover the traffics in chronological order (see Figure 1(c)). Then, we design a Schur Complement Based semi-proximal Alternating Direction Method of Multipliers (SCB-spADMM) to solve SLRR. For each block variable, we derive the explicit form of its optimum in the updating schemes, so the algorithm iteration is fast and accurate. Based on Li, Sun and Toh's research [21], the global convergence of SCB-spADMM is established. In addition, the separable structure of SLRR makes it easy to design a parallel algorithm. Finally, we utilize cross-validation, a commonly used strategy for parameter setting in machine learning, to pick out in advance a group of appropriate parameters involved in our model and algorithm. Such a technique is data-independent, and thus it strengthens the robustness of our approach.

To test the accuracy and efficiency of numerical methods, we need practical network datasets. In this field, the Abilene and GÉANT datasets are frequently used. The Abilene/GÉANT dataset consists of 41/74 links and 11/23 nodes, collected using 5-minute/15-minute time intervals from Dec. 8, 2003 to Dec. 28, 2003/Jan. 1, 2005 to Apr. 29, 2005.
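As a quick sanity check, the size relations above can be evaluated directly on the HOD figures quoted in the text (a sketch; the numbers are taken from this paper):

```python
# Size relations between nodes (S), links (M) and OD pairs (N),
# checked on the HOD network: 243 nodes, 577 links, 59049 OD pairs.
S, M, N = 243, 577, 59049

assert S * S == N        # every ordered node pair is an OD pair, so N = S^2
assert S <= M + 1 < N    # sqrt(N) = S <= M + 1 < N
assert N / M > 100       # the topology is sparse: S ~ M but both are << N
```

The last line illustrates the "S ≈ M ≪ N" regime that motivates recovering N unknowns per interval from only M link measurements.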
However, the scale of these datasets is too small compared to real network problems. Therefore, we released a new dataset, the HOD dataset, which is extracted and modified from real data. The traffics of the HOD network are collected over 671 time intervals, each of which lasts 15 minutes. There are 243 nodes and 577 links, and 95% of the OD pairs never transmit traffic, so the HOD traffic data is highly sparse. Since monitoring the OD traffics is prohibitively expensive, our HOD dataset does not contain such information. This means that the HOD dataset is mainly applicable to the NTP that this work cares about. We will see in Section 5 that the method proposed in this work is very effective on the HOD dataset. Furthermore, we hope that the newly proposed HOD dataset can serve as a benchmark for traffic engineering methods in the future.

The outline of this paper is as follows. Our recovery model SLRR and the SCB-spADMM algorithm are discussed in Section 2 and Section 3, respectively. In Section 4, we conduct numerical simulations and evaluate the performance of different approaches. In Section 5, we introduce the newly published HOD dataset and present the corresponding numerical simulations. In Section 6, the conclusion of our work is drawn.

https://totem.info.ucl.ac.be/dataset.html
https://gitee.com/network-user123/hod-network_traffic_data

2 Recovery Model
In this section, we first establish a regularized nuclear norm minimization with linear constraints, called the Sparsity Low-Rank Recovery (SLRR) model, to infer the traffic data from link measurements. After analyzing the structural properties of the traffic data, we handle the traffic tensor X through its frontal slices X^(1), ..., X^(T). This improvement not only notably reduces the size of the problem but also makes it possible to design a parallel algorithm. Then, we derive the dual model of SLRR, which can be easily solved by SCB-spADMM. Finally, the optimality conditions of the primal and dual problems are presented for setting the stopping criterion of SCB-spADMM.

Intuitively, the optimization model to recover the traffics X^(k) in the k-th (k = 1, ..., T) time interval with the spatial low-rank property can be established by minimizing the rank function:

$$\min_{X^{(k)}} \ \mathrm{rank}(X^{(k)}) \quad \text{s.t.} \quad \mathcal{A}_k(X^{(k)}) = B_k, \tag{1}$$

where $\mathcal{A}_k$ is a linear operator and $B_k$ contains measurements and other given constraints. Since the rank function is nonconvex and nonsmooth, it is often relaxed by the nuclear norm $\|\cdot\|_*$, the sum of all singular values of a matrix, which is proved to be the tightest convex relaxation of the rank function on the unit ball $\mathbb{B} := \{X : \|X\|_2 \le 1\}$ [11]. The linear constraint $\mathcal{A}_k(X^{(k)}) = B_k$ contains three types of information:

1. X^(k) is nonnegative.
2. The sparsity constraint of the complete traffic tensor takes the form $\mathcal{P}_\Omega(X) = 0$, where $\Omega \subset \{(i,j,k) : 1 \le i, j \le S,\ 1 \le k \le T\}$ is the index set indicating the locations of the zero-traffics and $\mathcal{P}_\Omega(\cdot)$ is the projection onto $\Omega$, i.e., $\mathcal{P}_\Omega(x_{ijk}) = x_{ijk}$ if $(i,j,k) \in \Omega$ and $\mathcal{P}_\Omega(x_{ijk}) = 0$ otherwise. Denote $\Omega^{(k)} := \{(i,j) : (i,j,k) \in \Omega\}$. Thus, in model (1), it follows that $\mathcal{P}_{\Omega^{(k)}}(X^{(k)}) = 0$.
3.
By link-load measurements, we have

$$L^{(k)} = R \cdot \mathrm{vec}(X^{(k)}), \tag{2}$$

where $R \in \mathbb{R}^{M \times N}$ is the 0-1 routing matrix, indicating which link is associated with which OD pair (see Figure 2), $L^{(k)} \in \mathbb{R}^M$ is the k-th column of the link-measurement matrix $L$, and $\mathrm{vec}(X^{(k)})$ is the column-wise vectorization of the spatial matrix $X^{(k)}$. Formulation (2) can be easily rewritten as

$$R \cdot \mathrm{vec}(X^{(k)}) = \sum_{j=1}^{S} R_j X^{(k)} e_j,$$

where $R = [R_1, R_2, \cdots, R_S]$ is uniformly divided into $S$ parts along its columns and $e_j$ is the zero vector except for the j-th element being 1.

Figure 2: Routing matrix and its visualization. (a) Routing matrix R and its blocks R_1, ..., R_S. The dimensions of R and of each block R_i (i ∈ {1, ..., S}) are M × N and M × S, respectively. (b) The routing matrix of the Abilene dataset. (c) The routing matrix of the GÉANT dataset. In (b) and (c), the blue elements are 0 while the yellow elements are 1.

Therefore, we modify model (1) to (for simplicity, but with a little ambiguity, we omit the index '(k)'):

$$\min_X \ \|X\|_* \quad \text{s.t.} \quad \sum_{j=1}^{S} R_j X e_j = L, \quad \mathcal{P}_\Omega(X) = 0, \quad X \ge 0.$$

In internet traffic data, the variation of the traffics from the start to the end is often considerably large. Nevertheless, the traffics change slowly between adjacent time intervals. This continuity property of traffics has been studied in many models [26, 37, 43].
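The block rewriting of (2) can be checked numerically. In the sketch below a random 0-1 matrix stands in for a real routing matrix (purely illustrative), and vec(·) is taken column-wise so that the column block R_j multiplies the j-th column X e_j:

```python
import numpy as np

rng = np.random.default_rng(0)
S, M = 5, 8
N = S * S

# Random 0-1 "routing" matrix standing in for a real topology (illustrative).
R = (rng.random((M, N)) < 0.3).astype(float)
X = rng.random((S, S))                 # spatial traffic matrix X^(k)

# L = R vec(X), with column-major vectorization.
L = R @ X.flatten(order="F")

# Block form: sum_j R_j X e_j, where R = [R_1, ..., R_S] is split columnwise
# and X[:, j] = X e_j is the j-th column of X.
L_blocks = sum(R[:, j * S:(j + 1) * S] @ X[:, j] for j in range(S))

assert np.allclose(L, L_blocks)
```

The identity holds for any R and X once the same (column-major) vectorization convention is used on both sides.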
On the other hand, the traffics also exhibit periodicity [37]: similar traffic volumes in the same time period on different days can be understood as internet users having similar behaviors at the same time every day. By adding two regularization terms to the above model, we obtain a more realistic model:

$$\min_X \ \|X\|_* + \rho_1 \|X - \bar{X}\|_F^2 + \rho_2 \|X - \hat{X}\|_F^2 \quad \text{s.t.} \quad \sum_{j=1}^{S} R_j X e_j = L, \quad \mathcal{P}_\Omega(X) = 0, \quad X \ge 0. \tag{3}$$

Here, $\bar{X}$ and $\hat{X}$ represent the traffics one time interval ago and one week ago, respectively. The regularization parameters $\rho_i > 0$ (i = 1, 2) allow a tunable tradeoff between the spatial low-rank property, the continuity, and the periodicity of traffics.

Model (3) is the Sparsity Low-Rank Recovery (SLRR) model. It allows us to cope with the traffics in each individual time interval in chronological order X^(1) → X^(2) → ⋯, instead of with the aggregate data. As shown in Figure 1(c), the green frontal slices represent recovered traffic data, while the yellow ones are not yet recovered. This character sharply reduces the variable size of each independent subproblem, and it correctly reflects the basic requirement that the updated traffics need to be reported immediately each time the latest time slice ends, e.g., for congestion control [14] and real-time monitoring [33].

2.2 Dual Problem

We first reformulate the SLRR model (3) as

$$\min_{X, Y, Z} \ \|X\|_* + \alpha \|Z\|_F^2 \quad \text{s.t.} \quad \sum_{j=1}^{S} R_j X e_j = L, \quad \mathcal{P}_\Omega(X) = 0, \quad X = Y, \quad X - A - Z = 0, \quad Y \ge 0. \tag{4}$$

In model (4),

$$\alpha := \rho_1 + \rho_2 \quad \text{and} \quad A := \frac{\rho_1 \bar{X} + \rho_2 \hat{X}}{\rho_1 + \rho_2}. \tag{5}$$

The Lagrangian function of (4) is

$$\mathcal{L}(X, Y, Z; U, V, W, Q) := \|X\|_* + \alpha \|Z\|_F^2 - \langle U, \mathcal{P}_\Omega(X) \rangle - \langle V, X - Y \rangle - \langle W, X - A - Z \rangle - \Big\langle Q, \sum_{j=1}^{S} R_j X e_j - L \Big\rangle,$$

with the variable matrix Y being nonnegative. Since the objective function of (4) is convex and all of the constraints are linear, strong duality holds, namely

$$\min_{X,\, Y \ge 0,\, Z} \ \max_{U, V, W, Q} \ \mathcal{L}(X, Y, Z; U, V, W, Q) = \max_{U, V, W, Q} \ \min_{X,\, Y \ge 0,\, Z} \ \mathcal{L}(X, Y, Z; U, V, W, Q).$$

This property leads to a dual problem that is very easy to solve with SCB-spADMM. Next, we derive the dual form through the following minimization problem:

$$\min_{X,\, Y \ge 0,\, Z} \ \mathcal{L}(X, Y, Z; U, V, W, Q) =: \min_{X,\, Y \ge 0,\, Z} \ \mathcal{L}(X, Y, Z).$$

Note that
X, Y and Z in $\mathcal{L}(\cdot)$ are decoupled. Thus, the above problem can be divided into three smaller pieces associated with X, Y and Z and solved separately.

The subproblem of X is solved by

$$\min_X \ \|X\|_* - \Big\langle \mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T,\, X \Big\rangle = \begin{cases} 0, & \text{if } \big\|\mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T\big\|_2 \le 1, \\ -\infty, & \text{otherwise}. \end{cases}$$

The last equality holds according to the inequality $\langle A, B \rangle \le \|A\|_* \|B\|_2$. When a nonzero B is fixed, the equality can be attained for certain A because $\|\cdot\|_*$ and $\|\cdot\|_2$ are dual norms.

In terms of Y, the subproblem can be easily fixed by

$$\min_{Y \ge 0} \ \langle V, Y \rangle = \begin{cases} 0, & V \ge 0, \\ -\infty, & \text{otherwise}. \end{cases}$$

Z can also be trivially obtained, since we only need to solve a convex quadratic program:

$$\min_Z \ \alpha \|Z\|_F^2 + \langle W, Z \rangle = -\frac{\|W\|_F^2}{4\alpha}.$$

As a consequence, the dual model of (4) takes the following form and, by strong duality, they are equivalent:

$$\min_{U, V, W, Q, G} \ \frac{1}{4\alpha}\|W - 2\alpha A\|_F^2 - \langle Q, L \rangle \quad \text{s.t.} \quad \mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T = G, \quad V \ge 0, \quad \|G\|_2 \le 1. \tag{6}$$

For later developments, we further rewrite problem (6) as

$$\min_{U, V, W, Q, G} \ \delta_{\mathbb{R}^{S \times S}_+}(V) - \langle Q, L \rangle + \delta_{\mathbb{B}}(G) + \frac{1}{4\alpha}\|W - 2\alpha A\|_F^2 \quad \text{s.t.} \quad \mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T = G, \tag{7}$$

where $\mathbb{B} := \{X \in \mathbb{R}^{S \times S} : \|X\|_2 \le 1\}$, $\mathbb{R}^{S \times S}_+ := \{X \in \mathbb{R}^{S \times S} : X \ge 0\}$, and $\delta_C(\cdot)$ is the indicator function of a closed convex set $C$, i.e., $\delta_C(x) = 0$ if $x \in C$ and $\delta_C(x) = +\infty$ otherwise. The objective of (7) contains two blocks, namely

$$\text{block}_1 := \Big( \delta_{\mathbb{R}^{S \times S}_+}(V) - \langle Q, L \rangle \Big) \quad \text{and} \quad \text{block}_2 := \Big( \delta_{\mathbb{B}}(G) + \frac{1}{4\alpha}\|W - 2\alpha A\|_F^2 \Big),$$

each of which is a convex composite quadratic function. In general, a convex composite quadratic function takes the form

$$f(x, y) = p(x) + \langle y, \mathcal{M} y + b \rangle,$$

where $p(\cdot): \mathcal{X} \to (-\infty, +\infty]$ is a closed proper convex function, $\mathcal{M}$ is a positive semidefinite matrix and $b$ is a vector.

The Karush-Kuhn-Tucker (KKT) conditions associated with (4) and (7) are given as follows:

$$\sum_{j=1}^{S} R_j X e_j = L, \quad \mathcal{P}_\Omega(X) = 0, \quad \mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T = G, \quad V \in \mathbb{R}^{S \times S}_+, \quad G \in \mathbb{B}.$$

We can set the stopping criterion according to these KKT conditions. Specifically, define

$$\eta := \max\{\eta_{P_1}, \eta_{P_2}, \eta_D, \eta_V, \eta_G\},$$

where

$$\eta_{P_1} := \frac{\big\|\sum_{j=1}^{S} R_j X e_j - L\big\|_F}{\|L\|_F}, \quad \eta_{P_2} := \frac{\|\mathcal{P}_\Omega(X)\|_F}{\|X\|_F}, \quad \eta_D := \frac{\big\|\mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T - G\big\|_F}{\|G\|_F},$$

$$\eta_V := \frac{\|V - \mathcal{P}_{\mathbb{R}^{S \times S}_+}(V)\|_F}{\|V\|_F}, \quad \eta_G := \frac{\|G - \mathcal{P}_{\mathbb{B}}(G)\|_F}{\|G\|_F}.$$

We can set a small positive parameter ε and stop the algorithm when η < ε or the iteration count attains its maximum.

3 The SCB-spADMM Algorithm

Next, we develop the Schur Complement Based semi-proximal ADMM (SCB-spADMM) for (7). As is well known, ADMM-type algorithms are effective and efficient for large-scale convex models with linear constraints [3, 8, 9, 21] and have been successfully applied to many nuclear norm minimization problems [32, 24, 22]. ADMM is easy to implement and requires only a little memory space.
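The closed form of the Z-subproblem above can be verified numerically: the minimizer of $\alpha\|Z\|_F^2 + \langle W, Z\rangle$ is $Z = -W/(2\alpha)$ with optimal value $-\|W\|_F^2/(4\alpha)$ (a sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.7
W = rng.standard_normal((4, 4))

def obj(Z):
    # alpha * ||Z||_F^2 + <W, Z>
    return alpha * np.sum(Z ** 2) + np.sum(W * Z)

Z_star = -W / (2 * alpha)                   # stationary point
val_star = -np.sum(W ** 2) / (4 * alpha)    # claimed optimal value

assert np.isclose(obj(Z_star), val_star)
# The objective is strictly convex, so random perturbations never do better.
for _ in range(100):
    Z = Z_star + 0.1 * rng.standard_normal((4, 4))
    assert obj(Z) >= val_star - 1e-12
```

The same stationarity argument yields the explicit updates used later in the algorithm.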
The convergence of our SCB-spADMM can be proved similarly to [21]. In particular, the optimum of each variable can be computed explicitly.
A natural idea is to directly employ a 5-block ADMM to solve model (7). Denote

$$\Gamma(U, V, W, Q, G) := \mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T - G.$$

The augmented Lagrangian function of (7) is

$$\mathcal{L}_\beta(U, V, W, Q, G; X) := \delta_{\mathbb{R}^{S \times S}_+}(V) - \langle Q, L \rangle + \delta_{\mathbb{B}}(G) + \frac{1}{4\alpha}\|W - 2\alpha A\|_F^2 + \langle X, \Gamma(U, V, W, Q, G) \rangle + \frac{\beta}{2}\|\Gamma(U, V, W, Q, G)\|_F^2.$$

A commonly used algorithm for solving (7) is the directly extended 5-block ADMM. Its iteration steps are given by

$$\begin{aligned}
U^{k+1} &:= \arg\min_U \ \mathcal{L}_\beta(U, V^k, W^k, Q^k, G^k; X^k), \\
V^{k+1} &:= \arg\min_V \ \mathcal{L}_\beta(U^{k+1}, V, W^k, Q^k, G^k; X^k), \\
W^{k+1} &:= \arg\min_W \ \mathcal{L}_\beta(U^{k+1}, V^{k+1}, W, Q^k, G^k; X^k), \\
Q^{k+1} &:= \arg\min_Q \ \mathcal{L}_\beta(U^{k+1}, V^{k+1}, W^{k+1}, Q, G^k; X^k), \\
G^{k+1} &:= \arg\min_G \ \mathcal{L}_\beta(U^{k+1}, V^{k+1}, W^{k+1}, Q^{k+1}, G; X^k), \\
X^{k+1} &:= X^k + \tau\beta \Big( \mathcal{P}_\Omega(U^{k+1}) + V^{k+1} + W^{k+1} + \sum_{j=1}^{S} R_j^T Q^{k+1} e_j^T - G^{k+1} \Big).
\end{aligned}$$

Since each variable can be obtained explicitly in the iterations, solving the subproblems becomes very simple. On the other hand, as mentioned in [7], it is a challenge to ensure that a directly extended multi-block (≥ 3) ADMM is convergent.

As introduced in [21], let $\mathcal{H}_U, \mathcal{H}_Q, \mathcal{H}_W, \mathcal{H}_V$ and $\mathcal{H}_G$ be self-adjoint positive semidefinite linear operators satisfying

$$\begin{aligned}
\mathcal{H}_U &:= \mathcal{G}_U - \mathcal{P}_\Omega^* \mathcal{P}_\Omega = \mathcal{G}_U - \mathcal{P}_\Omega, \qquad
\mathcal{H}_Q := \mathcal{G}_Q - \mathcal{A}^* \mathcal{A}, \\
\mathcal{H}_W &:= \mathcal{G}_W - \tfrac{1}{\beta}\mathcal{I} - \mathcal{I}^* \mathcal{I} = \mathcal{G}_W - \big(\tfrac{1}{\beta} + 1\big)\mathcal{I}, \qquad
\mathcal{H}_V := \mathcal{G}_V - \mathcal{I}^* \mathcal{I} = \mathcal{G}_V - \mathcal{I}, \qquad
\mathcal{H}_G := \mathcal{G}_G - \mathcal{I}^* \mathcal{I} = \mathcal{G}_G - \mathcal{I},
\end{aligned} \tag{10}$$

where $\mathcal{G}_U, \mathcal{G}_Q, \mathcal{G}_W, \mathcal{G}_V$ and $\mathcal{G}_G$ are also self-adjoint positive semidefinite linear operators whose inverses are relatively easy to compute. $\mathcal{A}$ is the linear operator satisfying $\mathcal{A} Q := \sum_{j=1}^{S} R_j^T Q e_j^T$ and $\mathcal{I}$ is the identity operator, i.e., $\mathcal{I} X = X$ for any $X$. From the definition, we can derive that $\mathcal{A}^* \mathcal{A} Q = \sum_{j=1}^{S} R_j R_j^T Q$. We propose the SCB-spADMM algorithm for solving model (7) as in Algorithm 1.
The presence of $\mathcal{H}_U, \mathcal{H}_Q, \mathcal{H}_W, \mathcal{H}_V$ and $\mathcal{H}_G$ guarantees the existence of solutions of subproblems (8) and (9), as well as the boundedness of the iteration points $U^k, Q^k, W^k, V^k$ and $G^k$. In general, $\mathcal{H}_U, \mathcal{H}_Q, \mathcal{H}_W, \mathcal{H}_V$ and $\mathcal{H}_G$ should be as small as possible while keeping the subproblems in $U^k, Q^k, W^k, V^k$ and $G^k$ easy to compute. According to [21], we have the following convergence result.

Algorithm 1: SCB-spADMM for solving (7)
Define $R_j$ as the j-th block of the routing matrix R, and $e_j$ as the zero vector except for the j-th element being one. Calculate α and A through (5). Choose $U^0, Q^0, V^0, W^0, G^0, X^0$ as zero matrices. For k = 0, 1, 2, ..., repeat the following iteration scheme.
while the stopping criterion is not reached do
Update U, Q and V in the following order, according to formulations (11), (12), (14), respectively:

$$\begin{aligned}
U^{k+\frac{1}{2}} &:= \arg\min_U \ \mathcal{L}_\beta(U, Q^k, V^k, W^k, G^k; X^k) + \tfrac{\beta}{2}\|U - U^k\|^2_{\mathcal{H}_U}, \\
Q^{k+\frac{1}{2}} &:= \arg\min_Q \ \mathcal{L}_\beta(U^{k+\frac{1}{2}}, Q, V^k, W^k, G^k; X^k) + \tfrac{\beta}{2}\|Q - Q^k\|^2_{\mathcal{H}_Q}, \\
V^{k+1} &:= \arg\min_V \ \mathcal{L}_\beta(U^{k+\frac{1}{2}}, Q^{k+\frac{1}{2}}, V, W^k, G^k; X^k) + \tfrac{\beta}{2}\|V - V^k\|^2_{\mathcal{H}_V}, \\
Q^{k+1} &:= \arg\min_Q \ \mathcal{L}_\beta(U^{k+\frac{1}{2}}, Q, V^{k+1}, W^k, G^k; X^k) + \tfrac{\beta}{2}\|Q - Q^{k+\frac{1}{2}}\|^2_{\mathcal{H}_Q}, \\
U^{k+1} &:= \arg\min_U \ \mathcal{L}_\beta(U, Q^{k+1}, V^{k+1}, W^k, G^k; X^k) + \tfrac{\beta}{2}\|U - U^{k+\frac{1}{2}}\|^2_{\mathcal{H}_U}.
\end{aligned} \tag{8}$$

Update W and G in the following order, according to formulations (13), (15), respectively:

$$\begin{aligned}
W^{k+\frac{1}{2}} &:= \arg\min_W \ \mathcal{L}_\beta(U^{k+1}, Q^{k+1}, V^{k+1}, W, G^k; X^k) + \tfrac{\beta}{2}\|W - W^k\|^2_{\mathcal{H}_W}, \\
G^{k+1} &:= \arg\min_G \ \mathcal{L}_\beta(U^{k+1}, Q^{k+1}, V^{k+1}, W^{k+\frac{1}{2}}, G; X^k) + \tfrac{\beta}{2}\|G - G^k\|^2_{\mathcal{H}_G}, \\
W^{k+1} &:= \arg\min_W \ \mathcal{L}_\beta(U^{k+1}, Q^{k+1}, V^{k+1}, W, G^{k+1}; X^k) + \tfrac{\beta}{2}\|W - W^{k+\frac{1}{2}}\|^2_{\mathcal{H}_W}.
\end{aligned} \tag{9}$$

$$X^{k+1} := X^k + \tau\beta \Big( \mathcal{P}_\Omega(U^{k+1}) + V^{k+1} + W^{k+1} + \sum_{j=1}^{S} R_j^T Q^{k+1} e_j^T - G^{k+1} \Big).$$

end while
return X

Theorem 3.1.
Suppose that either (1) $\tau \in (0, \frac{1+\sqrt{5}}{2})$, or (2) $\tau \ge \frac{1+\sqrt{5}}{2}$ but

$$\sum_{k=0}^{+\infty} \Big( \big\|(G^{k+1} - G^k) + (W^{k+1} - W^k)\big\|_F^2 + \tau \Big\|\mathcal{P}_\Omega(U^{k+1}) + V^{k+1} + W^{k+1} + \sum_{j=1}^{S} R_j^T Q^{k+1} e_j^T - G^{k+1}\Big\|_F^2 \Big) < +\infty.$$

Then we have the following properties.
(i) $U^k, Q^k, W^k, V^k$ and $G^k$ are automatically well defined, based on the settings of $\mathcal{H}_U, \mathcal{H}_Q, \mathcal{H}_W, \mathcal{H}_V$ and $\mathcal{H}_G$ given by (10).
(ii) $\{(U^k, Q^k, W^k, V^k, G^k, X^k)\}_{k=1}^{\infty}$ converges to a unique accumulation point $(U^\infty, Q^\infty, W^\infty, V^\infty, G^\infty, X^\infty)$, with $(U^\infty, Q^\infty, W^\infty, V^\infty, G^\infty)$ being an optimum of model (7) and $X^\infty$ being an optimum of model (4).

Proof. After substituting the specific form of (7), this becomes a straightforward deduction of Theorem 3 in [21]. □
Recalling the requirements on $\mathcal{H}_U, \mathcal{H}_Q, \mathcal{H}_W, \mathcal{H}_V$ and $\mathcal{H}_G$ introduced previously, we can simply set $\mathcal{H}_W = \mathcal{H}_V = \mathcal{H}_G = 0$ by letting $\mathcal{G}_V = \mathcal{G}_G = \mathcal{I}$ and $\mathcal{G}_W = (\frac{1}{\beta} + 1)\mathcal{I}$. Since $\mathcal{P}_\Omega$ is a nondegenerate projection operator, it follows that $\|\mathcal{P}_\Omega\| = 1$. Thus, we can set $\mathcal{G}_U = \mathcal{I}$ and have $\mathcal{H}_U = \mathcal{I} - \mathcal{P}_\Omega$. In terms of $\mathcal{H}_Q$, from the equation $\mathcal{A}^* \mathcal{A} Q = \sum_{j=1}^{S} R_j R_j^T Q$, we know that $\|\mathcal{A}^* \mathcal{A}\| = \|\sum_{j=1}^{S} R_j R_j^T\|$. As for $\mathcal{G}_Q$, a common and simple setting is $\mathcal{G}_Q = \lambda_{\max} \mathcal{I}$, where $\lambda_{\max}$ is the largest eigenvalue of $\mathcal{A}^* \mathcal{A}$. Thus, we have

$$\mathcal{H}_Q(Q) = \Big( \lambda_{\max} I - \sum_{j=1}^{S} R_j R_j^T \Big) Q := H_Q Q.$$

In the following, we derive the explicit forms of the subproblems associated with
U, Q, V, W and G, respectively.

The subproblem of U takes the form

$$\min_U \ \langle X, \mathcal{P}_\Omega(U) \rangle + \frac{\beta}{2}\|\Gamma(U, V, W, Q, G)\|_F^2 + \frac{\beta}{2}\|U - U^k\|^2_{\mathcal{I} - \mathcal{P}_\Omega}.$$

Since it is an unconstrained model and its objective function is convex and differentiable, we can obtain its optimum by directly setting the gradient of the objective function to zero. Thus, the optimum of U is

$$U^* := \mathcal{P}_\Omega\Big( -V - \sum_{j=1}^{S} R_j^T Q e_j^T - W + G - \frac{X}{\beta} \Big) + \mathcal{P}_{\Omega^C}(U^k). \tag{11}$$

Similarly, we can derive the optima of Q and W:

$$Q^* := \arg\min_Q \ \Big\langle \sum_{j=1}^{S} R_j X e_j - L, Q \Big\rangle + \frac{\beta}{2}\|\Gamma(U, V, W, Q, G)\|_F^2 + \frac{\beta}{2}\|Q - Q^k\|^2_{\mathcal{H}_Q} = -\frac{1}{\lambda_{\max}} \Big( \sum_{j=1}^{S} R_j (V + \mathcal{P}_\Omega(U) + W - G) e_j - H_Q Q^k + \frac{\sum_{j=1}^{S} R_j X e_j - L}{\beta} \Big) \tag{12}$$

and

$$W^* := \arg\min_W \ \frac{1}{4\alpha}\|W - 2\alpha A\|_F^2 + \langle X, W \rangle + \frac{\beta}{2}\|\Gamma(U, V, W, Q, G)\|_F^2 = \frac{A - X - \beta\Big( V + \sum_{j=1}^{S} R_j^T Q e_j^T + \mathcal{P}_\Omega(U) - G \Big)}{\frac{1}{2\alpha} + \beta}. \tag{13}$$

The subproblem in terms of V can be divided into a group of convex quadratic programs, each of which asks for a nonnegative solution and is associated with a scalar variable. It is easy to derive that

$$V^* := \arg\min_V \ \delta_{\mathbb{R}^{S \times S}_+}(V) + \langle X, V \rangle + \frac{\beta}{2}\Big\|\mathcal{P}_\Omega(U) + V + W + \sum_{j=1}^{S} R_j^T Q e_j^T - G\Big\|_F^2 = \max\Big\{ 0,\ -\Big( \sum_{j=1}^{S} R_j^T Q e_j^T + \mathcal{P}_\Omega(U) + W - G \Big) - \frac{X}{\beta} \Big\}. \tag{14}$$

About G, suppose

$$U_G \Sigma_G V_G^T := V + \sum_{j=1}^{S} R_j^T Q e_j^T + \mathcal{P}_\Omega(U) + W + \frac{X}{\beta}$$

is the economical singular value decomposition (SVD). Then, we have

$$G^* := \arg\min_G \ \delta_{\mathbb{B}}(G) - \langle X, G \rangle + \frac{\beta}{2}\|\Gamma(U, V, W, Q, G)\|_F^2 = U_G \min\{I, \Sigma_G\} V_G^T. \tag{15}$$

4 Numerical Simulations

In this section, we present the simulation results of our SLRR model and SCB-spADMM algorithm on the Abilene and GÉANT datasets. The results of Tomo-SRMF [26], SDPLR [24] and Tomo-Gravity [42] are also reported for comparison. As already mentioned in the introduction, we focus here on the more practical network tomography problem [1, 24, 26, 31, 40, 41] rather than the completion problem [5, 6, 15, 24, 26, 34, 35, 37, 38, 39, 43].

We notice that there are some anomalies in the GÉANT traffics [30]. As seen in Figure 3, a few traffic volumes are much larger than the others.
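Such outliers can be repaired by linear interpolation between surrounding normal intervals; a sketch with synthetic data and hypothetical interval indices:

```python
import numpy as np

rng = np.random.default_rng(3)
volumes = rng.uniform(10.0, 12.0, size=20)   # per-interval traffic volume
volumes[8:12] *= 50.0                        # injected anomalies (illustrative)

t1, t2 = 7, 12                               # last/next normal intervals
# Linearly interpolate the abnormal intervals strictly between t1 and t2.
ts = np.arange(t1 + 1, t2)
volumes[ts] = np.interp(ts, [t1, t2], [volumes[t1], volumes[t2]])

assert volumes.max() < 13.0                  # anomalies removed
```

The anomaly indices here are made up for illustration; in practice they would come from an outlier detector on the per-interval norms shown in Figure 3.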
Suppose the traffics in the time intervals t₁ and t₂ are correct but all the intermediate traffics are abnormal. Then, we repair the anomalies by linear interpolation between the time intervals t₁ and t₂. The refined data is presented in Figure 3(c). The Abilene dataset, as Figure 4 shows, is of good quality, so no special processing is required.

As previously mentioned, real-world traffic data is highly sparse. To imitate this realistic circumstance, we modify the Abilene and GÉANT datasets by setting the smallest p% (p = 50, 70, 90) of OD pairs to zero-ODs, which means they do not transmit any network traffic.

Figure 3: The Frobenius norm of (a) the original GÉANT traffics in each time interval, (b) the original GÉANT traffics from the 5001st time interval to the 5500th time interval, (c) the modified GÉANT traffics in each time interval.

K-fold [23] Cross Validation (CV) is employed to find proper parameters (the regularization parameters ρ₁, ρ₂ and the penalty parameter β) in our algorithm. We uniformly divide the rows of the link-measurement matrix L into K groups. Each row group and the traffics therein are denoted by Ψ_l and L_{Ψ_l} (l = 1, ..., K), respectively. Then, we generate candidate parameter triples from a wide-range interval. All candidates are evaluated by the following steps:

1. Let the integer l change from 1 to K. In each case, L_{Ψ_l} is taken as the test set, while the other link-load data is taken as the training set.
2. Based on the link-load traffics in the training set, we obtain an estimation of the traffics in rows Ψ_l using the proposed SCB-spADMM, denoted by L̂_{Ψ_l}.
3. If l = K, calculate the overall recovery error by the Normalized Mean Absolute Error (NMAE) for CV, defined as

$$N_{CV} := \frac{\sum_{l=1}^{K} \sum_{i \in \Psi_l} \sum_{j=1}^{T} \big| (\hat{L}_{\Psi_l})_{ij} - (L_{\Psi_l})_{ij} \big|}{\sum_{i=1}^{M} \sum_{j=1}^{T} L_{ij}}.$$

Ultimately, we choose the candidate that corresponds to the lowest $N_{CV}$ as the parameters used in our algorithm. With the fixed parameters, we can implement SCB-spADMM to estimate the whole OD traffics. The recovery performance is evaluated by NMAE on the missing data:

$$\mathrm{NMAE} := \frac{\sum_{(i,j,k) \in \Omega^C} |\hat{X}_{ijk} - X_{ijk}|}{\sum_{(i,j,k) \in \Omega^C} X_{ijk}}. \tag{16}$$

We compare SLRR against three alternative methods: Tomo-SRMF, SDPLR and Tomo-Gravity.
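The NMAE metric (16), restricted to the evaluated entries, can be sketched as follows (with synthetic tensors standing in for the true and recovered traffics):

```python
import numpy as np

def nmae(X_hat, X, mask):
    """Normalized Mean Absolute Error over the entries selected by mask."""
    return np.abs(X_hat[mask] - X[mask]).sum() / X[mask].sum()

rng = np.random.default_rng(4)
X = rng.uniform(1.0, 2.0, size=(6, 6, 10))   # "true" traffic tensor (synthetic)
X_hat = X + 0.1                              # estimate uniformly off by +0.1
mask = rng.random(X.shape) < 0.5             # entries used for evaluation

err = nmae(X_hat, X, mask)
# |error| = 0.1 everywhere and X lies in [1, 2], so the NMAE is bounded:
assert 0.1 / 2.0 - 1e-9 <= err <= 0.1 / 1.0 + 1e-9
```

In the paper's setting, `mask` would select the index set $\Omega^C$ of evaluated OD entries.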
The complete three-week Abilene traffics and the four-month GÉANT traffics are simulated. In particular, we simulate the GÉANT dataset over three different time periods: the first week, the first eight weeks and the last eight weeks, respectively. We point out that other work [26, 43] only simulated the one-week data of the GÉANT dataset. In fact, sometimes people only care about one-week traffics rather than the aggregate data.

The simulation results are reported in Table 1. We can see that as the sparsity increases, the recovery errors decrease. This is because high sparsity contains more raw traffic information and reduces uncertainty. More importantly, our SLRR is significantly better than the other methods in all cases. For the GÉANT dataset, the results on the first-week data are better than those of the other two groups, which may be related to the data quality of the GÉANT dataset. Nevertheless, our SLRR is still superior to the other methods in these cases.

Table 1: Comparison of NMAE on the Abilene and GÉANT datasets. For the GÉANT dataset, we output the simulation results of the first-week/the first-eight-week/the last-eight-week data.

Abilene
  Method        Sparsity = 50%   Sparsity = 70%   Sparsity = 90%
  Tomo-SRMF     0.374            0.335            0.236
  SDPLR         0.535            0.393            0.272
  Tomo-Gravity  0.631            0.558            0.248
  SLRR

GÉANT
  Method        Sparsity = 50%        Sparsity = 70%        Sparsity = 90%
  Tomo-SRMF     1.003/1.182/1.342     0.725/0.922/1.075     0.271/0.465/0.530
  SDPLR         1.004/1.055/1.171     0.734/0.859/0.973     0.308/0.456/0.526
  Tomo-Gravity  1.065/1.113/1.145     0.871/0.971/1.005     0.583/0.644/0.734
  SLRR

The recovery performance of SLRR in different time intervals is presented in Figure 4. We especially mark the NMAE under each sparsity with a horizontal dashed line of the same color. It shows that our recovery results are consistent and reliable.

Figure 4: NMAE in different time intervals on (a) the complete three-week traffics of the Abilene dataset, (b) the first-week traffics of the GÉANT dataset. The spaces between adjacent time intervals are 40 for the Abilene dataset and 15 for the GÉANT dataset.

5 The HOD Dataset

Compared with the Abilene and GÉANT datasets, the most significant feature of the HOD dataset is that it is highly sparse. In fact, 95.26% of OD pairs have no traffic; Figure 5 fully demonstrates this. Moreover, the HOD dataset also has continuity and periodicity, see Figure 6 for an illustration. In the following simulations, we will also make use of these properties.

Figure 5: The sparsity of (a) the spatial matrix, (b) the routing matrix of the HOD dataset. The red components in the spatial matrix mean that the corresponding OD pairs have transmitted traffic data, and those in the routing matrix stand for 1-components.

Figure 6: The Frobenius norm of link-loads in different time intervals. The figure illustrates the continuity and the periodicity of the HOD dataset.

Next, we evaluate the different methods on the HOD dataset. Because the real OD traffics are unknown, the NMAE metric (16) is no longer applicable, so we again utilize CV errors for assessment. Besides the aforementioned K-fold CV, we also consider another kind of CV, the Monte Carlo CV [10], to make a fair evaluation. In Monte Carlo CV, the training set and the test set are randomly generated. We set the ratio of the test sets to 2% and repeat 50 times in each simulation.

Table 2 presents the recovery performance of the different methods on the HOD dataset. 'SLRR-MC' stands for the case where SLRR is evaluated by Monte Carlo CV. From Table 2, we find that among the four methods, SLRR significantly outperforms the others. Moreover, the recovery results of SLRR assessed by the two different CVs are very close.

Table 2: Comparison of recovery performance on the HOD dataset.
  Method   Tomo-SRMF   SDPLR    Tomo-Gravity   SLRR   SLRR-MC
  NMAE     0.3314      0.2832   0.3243

The computational time of our algorithm is also investigated. All simulations are implemented on an ASUS laptop (4-core, Intel i7-10510U, @2.30GHz, 16G RAM). We have to admit that our SLRR model is slower than the other methods due to the SVD decomposition.
However, our method takes only a few seconds (see Figure 7) to recover a time interval that lasts more than ten minutes in practical scenarios. This well satisfies the requirements of practical problems. Moreover, our SLRR model has better recovery accuracy than all the existing methods in every case. These properties give our SLRR model significant advantages for the network tomography problem, and it may be integrated into future products.

Figure 7: CPU-time of each time interval on (a) the Abilene dataset, (b) the GÉANT dataset, (c) the HOD dataset. For the Abilene and GÉANT datasets, it takes less than 0.1 second to recover the traffics in one time interval. For the large-scale HOD dataset, it also only takes several seconds for each interval.
6 Conclusion

Our research extends the knowledge of solving the large-scale network tomography problem. By leveraging the spatial low-rank property and the sparsity of traffic data, we propose the SLRR model to accurately infer network traffics. The separability of SLRR significantly reduces the problem scale and makes it possible to design a parallel algorithm. To solve SLRR, we design a globally convergent Schur complement based semi-proximal alternating direction method of multipliers. Since the optima of all block variables can be calculated in explicit form, the iteration scheme is fast and accurate. K-fold cross validation is employed to find proper parameters in our algorithm. By taking advantage of the above techniques, our method outperforms all the others with respect to recovery accuracy. Moreover, the seconds-level feedback on different datasets also demonstrates the high computational efficiency of our algorithm. Besides the proposed methodology, another highlight of our work is that we publish an industrial dataset, HOD, to enrich the database for network traffic recovery study.

Acknowledgement
This work was supported by the National Natural Science Foundation of China (Grant No. 11771244, 11871297) and the Tsinghua University Initiative Scientific Research Program.