A fast two-stage algorithm for non-negative matrix factorization in streaming data
arXiv [math.OC]
IEEE TRANSACTIONS ON SIGNAL PROCESSING
Ran Gu, Qiang Du, and Simon J. L. Billinge
Abstract—In this article, we study algorithms for nonnegative matrix factorization (NMF) in various applications involving streaming data. Utilizing the continual nature of the data, we develop a fast two-stage algorithm for highly efficient and accurate NMF. In the first stage, an alternating non-negative least squares (ANLS) framework is used, in combination with an active set method and a warm-start strategy for the solution of subproblems. In the second stage, an interior point method is adopted to accelerate the local convergence. The convergence of the proposed algorithm is proved. The new algorithm is compared with some existing algorithms in benchmark tests using both real-world data and synthetic data. The results demonstrate the advantage of our algorithm in finding high-precision solutions.
Index Terms—Interior point method, active set method, non-negative matrix factorization, streaming data.
I. INTRODUCTION

NONNEGATIVE matrix factorization (NMF) [26], [19] refers to the approximate factorization of a matrix into the product of two nonnegative matrices with low rank, M ≈ XY. It has become one of the most popular multi-dimensional data processing tools in various applications such as signal processing [2], biomedical engineering [27], pattern recognition [6], computer vision and image engineering [2]. Lee and Seung initiated the study of NMF and presented a method in 1999 [19]. Their method makes all decomposed components non-negative and achieves non-linear dimension reduction at the same time. Developed in [20] for NMF, Lee and Seung's multiplicative update rule has been a popular method due to its simplicity in implementation.

A commonly used optimization formulation of M ≈ XY is to use the square of the Euclidean distance (SED) as the objective function, that is,

  min_{X ∈ R^{n×k}, Y ∈ R^{k×m}} ||XY − M||_F^2   s.t. X ≥ 0, Y ≥ 0.   (1)

Many studies of NMF based on the above formulation have focused on the use of different optimization approaches like the alternating non-negative least squares (ANLS) [22], [15], [11], [13], coordinate descent methods [5], [21], and the alternating direction method of multipliers (ADMM) [12].

Ran Gu is with the School of Statistics and Data Science, Nankai University, Tianjin 300071, China. Qiang Du and Simon J. L. Billinge are with the Department of Applied Physics and Applied Mathematics, Fu Foundation School of Engineering and Applied Science, Columbia University, New York, NY 10027, USA. Qiang Du is also with the Data Science Institute, Columbia University, New York, NY 10027, USA. Simon J. L. Billinge is also with the Condensed Matter Physics and Materials Science Department, Brookhaven National Laboratory, Upton, NY 11973, USA. This research was supported by NSF-DMR 1922234 (for R.G., Q.D. and S.B.) and NSF CCF 1740833 (for Q.D.). Manuscript received January 2, 2021.
A comprehensive survey of various NMF models and many existing NMF algorithms can be found in [30]. The NMF problem has been shown to be non-convex and NP-hard [29]. The algorithms studied in the literature can in general only guarantee finding a local minimum, rather than a global minimum of the cost function. Although Arora et al. [1] presented a polynomial-time algorithm for constant k, the order of the complexity of the algorithm is too high for it to be applied in practice. Nevertheless, in many data mining applications, high quality local minima are desired in short time [30].

In this study, we are motivated by the application of NMF to problems involving streaming data. Here, streaming data is continually generated data with continuous distributions, such as those from the real-time monitoring of reaction products in chemistry experiments and materials synthesis [23], [28]. For example, Zhao et al. [32] measured the X-ray diffraction data during the nucleation and growth of zeolite-supported Ag nanoparticles through reduction of Ag-exchanged mordenite (MOR), and then processed the data with pair distribution function (PDF) analysis [7]. In the data, each PDF is approximately represented by a vector of n dimensions, which is recorded at m time instances in total. We should note the key features of the continual nature of these types of continuously distributed data: at any fixed time, the PDF is continuous in the distance variable; meanwhile, at any fixed distance in the PDF measurement, its value also has continuity in time; moreover, the spatially distributed data at later times are generated progressively following the data earlier in time. In the AgMOR data used in [32], the dimensions were respectively n = 3000 and m = 36, where n is the length of each data vector, and m is the number of measurements. It was anticipated that there are three materials present in the reaction, which means that k = 3.
The focus of our study here is on streaming data in the particular regime of n ≫ m ≫ k, with k being very small. This reflects the high dimensionality of data in an individual measurement (very large n) for systems with a relatively small number of components k.

Our goal is to obtain high-precision local solutions for streaming data with relatively small n, m, and k. Based on our observation, ANLS has a fast descent rate on the objective function in the first few iterations, but a low convergence rate near a local minimizer. The interior point method has a better local convergence rate, but requires much more computation. Thus, we propose to combine ANLS with the interior point method into a two-stage algorithm. In the first stage, an active set method is used for ANLS. Due to the continuity of streaming data, the number of active set changes is small. In the second stage, a line search interior point method is adopted to reach fast local convergence. Based on the properties of streaming data, we optimize the computation of the interior point algorithm so that the computational cost can scale like O(nm^2 k^2) in each iteration. The proposed two-stage algorithm combines the advantages of ANLS and the interior point method for streaming data.

The organization of this paper is as follows. Section 2 introduces the first stage of our proposed algorithm, which is based on ANLS with an active set method for streaming data. Section 3 proposes the second stage of the algorithm, which is a line search interior point method. Its convergence is also discussed. Section 4 gives the whole framework of our proposed two-stage algorithm to solve NMF on streaming data. Section 5 shows the efficiency of our proposed algorithm by numerical tests. Section 6 concludes this paper with discussions of some future concerns.

II. ANLS FRAMEWORK AND ACTIVE SET METHOD
We first briefly review the ANLS framework for solving (1). In ANLS, the variables are simply divided into two groups, which are then updated alternately as outlined in algorithm 1.
Algorithm 1 Alternating nonnegative least squares (ANLS)

Repeat until stopping criteria are satisfied:
  min_{X ∈ R^{n×k}} ||XY − M||_F^2  s.t. X ≥ 0.
  min_{Y ∈ R^{k×m}} ||XY − M||_F^2  s.t. Y ≥ 0.
end

Note that each subproblem can be split into a series of nonnegative least squares problems

  min_x ||Cx − d||^2  s.t. x ≥ 0.   (2)

For C = X, d corresponds to every column of M, while for C = Y^T, d takes every row of M. Although the original problem (1) is nonconvex, the subproblems in algorithm 1 are convex quadratic problems whose optimal solutions can be found in polynomial time. In addition, the convergence of algorithm 1 to a stationary point of (1) has been proved [10].

On the basis of the ANLS framework, many algorithms for NMF have been developed, such as the active set method [15], the projected gradient method [22], the projected Newton method [9], projected quasi-Newton methods [14], [31], Nesterov's gradient method [11], and a method combined with the Barzilai-Borwein stepsize [13].

The active set method solves the subproblem exactly. Kim and Park [15] introduced an algorithm based on ANLS and the active set method. The constrained least squares problem in matrix form with multiple right-hand side vectors is decomposed into several independent nonnegative least squares problems with a single right-hand side vector (2), which are solved by the active set method of Lawson and Hanson [18]. Later, Kim and Park [16] proposed an active-set-like algorithm based on the ANLS framework.
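To make the alternation in Algorithm 1 concrete, here is a minimal Python sketch built on SciPy's implementation of the Lawson-Hanson NNLS solver. The function name, iteration count, and random initialization are our own choices for illustration; the paper's implementation additionally uses the warm-start machinery described later in this section.

```python
import numpy as np
from scipy.optimize import nnls

def anls(M, k, iters=20, seed=0):
    """Sketch of Algorithm 1: alternate exact NNLS solves for Y and X."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    X = rng.random((n, k))
    Y = rng.random((k, m))
    for _ in range(iters):
        # Fix X, solve min ||X Y - M||_F^2 s.t. Y >= 0, column by column (C = X).
        for j in range(m):
            Y[:, j], _ = nnls(X, M[:, j])
        # Fix Y, solve min ||X Y - M||_F^2 s.t. X >= 0, row by row (C = Y^T).
        for i in range(n):
            X[i, :], _ = nnls(Y.T, M[i, :])
    return X, Y
```

Each inner `nnls` call solves one instance of (2) exactly with the Lawson-Hanson active set method.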
In their algorithm, the single least squares problems are solved by a block principal pivoting method, and the columns that have a common active set are grouped together to avoid redundant computation of the Cholesky factorization in solving the linear equations.

Because of the continuity in streaming data, when a single least squares problem (2) is solved, the same C and a similar d may be used in the subsequent least squares problem, resulting in similar solutions with likely the same active set. Therefore, we choose the active set method of Lawson and Hanson [18] to solve the least squares problem (2), using the solution and its active set as the initial guess for the subsequent least squares problem. This strategy is called a warm start strategy or continuation technique; it is widely used in many algorithms and can considerably improve effectiveness [24], [8]. We do not need to do any re-grouping, because the right-hand side vectors have been naturally grouped due to the continuity in the data, and we just need to solve them one by one sequentially. In fact, the active set of the solution of the subsequent least squares problem usually has little or no change. When we solve the first set of equations in the new least squares subproblem, the Cholesky factorization performed in the previous step can be utilized to avoid redundant computation. Numerically, we see that for the streaming data under consideration the active set usually changes only on a very few occasions, or does not change at all. Consequently, the first stage of our algorithm is chosen to be a combination of the ANLS framework, the active set method, and the warm start strategy.

III. A LINE SEARCH INTERIOR POINT METHOD FOR NMF

Interior-point methods have proved to be as successful for various nonlinear optimization problems as for linear programming. In this section, we propose a line search interior point method for solving NMF. Its global convergence and computational cost are analyzed.
A. Algorithm
Given that the linear independence constraint qualification (LICQ) holds for the NMF problem, the KKT conditions [17] for the problem can be written as

  (XY − M)Y^T = R,
  X^T(XY − M) = S,
  ⟨R, X⟩ = 0,  ⟨S, Y⟩ = 0,
  X ≥ 0, Y ≥ 0, R ≥ 0, S ≥ 0.   (3)

We denote x = vec(X^T), y = vec(Y), r = vec(R^T), s = vec(S), where vec transforms a matrix into a vector by stacking its columns. Meanwhile, we define the inverse operations mat(x) = X^T, mat(y) = Y, mat(r) = R^T, mat(s) = S.
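The vec/mat conventions above map neatly onto NumPy's row-major memory layout. The helpers below are a small sketch of the correspondence (the function names are ours): x = vec(X^T) stacks the rows of X, and y = vec(Y) stacks the columns of Y.

```python
import numpy as np

def vec_x(X):            # x = vec(X^T): concatenated rows of X
    return X.ravel()

def vec_y(Y):            # y = vec(Y): concatenated columns of Y
    return Y.T.ravel()

def mat_x(x, n, k):      # inverse map: recover X (n-by-k) from x
    return x.reshape(n, k)

def mat_y(y, k, m):      # inverse map: recover Y (k-by-m) from y
    return y.reshape(m, k).T
```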
Applying Newton's method to the nonlinear system, in the variables x, y, r, s, we obtain

  [ Q1        C^T       −I        0       ] [Δx]   [ r − graX       ]
  [ C         Q2        0         −I      ] [Δy] = [ s − graY       ]   (4)
  [ Diag(r)   0         Diag(x)   0       ] [Δr]   [ μe − Diag(r)x  ]
  [ 0         Diag(s)   0         Diag(y) ] [Δs]   [ μe − Diag(s)y  ]

with μ = 0, where graX = vec(Y(XY − M)^T), graY = vec(X^T(XY − M)), e is a vector of ones, and Diag(x) is the diagonal matrix with diagonal given by x. Here

  Q1 = blkdiag(YY^T, ..., YY^T)  (n blocks),
  Q2 = blkdiag(X^T X, ..., X^T X)  (m blocks),

and C consists of k-by-k blocks

  C_{ji} = X_i^T Y_j^T + (X_i Y_j − M_{ij}) I,  j = 1, ..., m,  i = 1, ..., n,   (5)

where X_i is the i-th row of X and Y_j is the j-th column of Y.

Let μ be strictly positive; then the variables x, y, r and s are forced to take positive values. The trajectory (x(μ), y(μ), r(μ), s(μ)) is called the primal-dual central path. The variables are updated by

  x := x + α1 Δx,  y := y + α1 Δy,  r := r + α2 Δr,  s := s + α2 Δs,   (6)

where α1 ∈ (0, α1^max], α2 ∈ (0, α2^max], and

  α1^max = max{ α ∈ (0, 1] : x + αΔx ≥ (1 − τ)x,  y + αΔy ≥ (1 − τ)y },
  α2^max = max{ α ∈ (0, 1] : r + αΔr ≥ (1 − τ)r,  s + αΔs ≥ (1 − τ)s },   (7)

with τ ∈ (0, 1). The condition (7) is called the fraction to the boundary rule [25], which is used to prevent the variables from approaching their lower bounds of 0 too quickly. In this work, we choose a fixed value of τ close to 1.

A predictor or probing strategy [25] can also be used to determine the parameter μ. We calculate a predictor (affine scaling) direction (Δx^aff, Δy^aff, Δr^aff, Δs^aff) by setting μ = 0. We probe this direction by letting (α1^aff, α2^aff) be the longest step lengths that can be taken along the affine scaling direction before violating the nonnegativity conditions (x, y, r, s) ≥ 0. Explicit formulas for these step lengths are given by (7) with τ = 1. We then define μ^aff to be the value of complementarity along the (shortened) affine scaling step, that is,

  μ^aff = [ (x + α1^aff Δx)^T (r + α2^aff Δr) + (y + α1^aff Δy)^T (s + α2^aff Δs) ] / (nk + mk),   (8)

and a heuristic choice of μ is the update μ := σμ with

  σ = min{ ( μ^aff / ( (x^T r + y^T s)/(nk + mk) ) )^3 , c },   (9)

where c ∈ (0, 1) is a fixed cap.

We propose a two-level nested loop algorithm for the interior point search. In the inner loop, the parameter μ is fixed. In the outer loop, we gradually reduce μ to 0. We use the following error function, based on the perturbed KKT system, to terminate the inner loop:

  E(x, y, r, s; μ) = max{ ||((graX − r)^T, (graY − s)^T)^T||, ||((r^T Diag(x), s^T Diag(y)) − μe^T)^T|| }.   (10)

To guarantee the global convergence of the algorithm, we apply a line search approach.
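The fraction-to-the-boundary rule (7) reduces to a componentwise ratio test over the decreasing components. A minimal sketch for one variable block follows; the default τ = 0.995 is an assumed typical value, since the paper's exact τ is not legible in the source.

```python
import numpy as np

def max_step(v, dv, tau=0.995):
    """Fraction-to-the-boundary rule (7): the largest alpha in (0, 1] with
    v + alpha*dv >= (1 - tau)*v, assuming v > 0 componentwise.
    tau = 0.995 is an assumed value."""
    mask = dv < 0                  # only decreasing components can bind
    if not mask.any():
        return 1.0
    # v + alpha*dv >= (1 - tau)*v  <=>  alpha <= tau * v / (-dv) where dv < 0
    return float(min(1.0, np.min(-tau * v[mask] / dv[mask])))
```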
We first consider the derivation of the interior-point method from the associated barrier problem

  min_{X,Y} ||XY − M||_F^2 − μ Σ_{i,k} log(X_{ik}) − μ Σ_{k,j} log(Y_{kj}).

We use an exact merit function that is the same as the barrier function, which can be written as

  φ(x, y) = ||mat(x)^T mat(y) − M||_F^2 − μ Σ_i log(x_i) − μ Σ_j log(y_j).

In the algorithm, we utilize an Armijo line search to make the merit function decrease sufficiently. Considering the convexity of the Hessian, we approximate

  [ Q1  C^T ]
  [ C   Q2  ]

by a positive definite matrix to guarantee that the direction (Δx, Δy) is always a descent direction of φ(x, y), so that such a line search can be implemented. Notice that the original NMF problem is a least squares problem. Thus we can utilize the Hessian of the traditional Gauss-Newton algorithm. The Gauss-Newton Hessian is

  [ Q1   C̄^T ]
  [ C̄    Q2  ]

and it is guaranteed to be positive semi-definite. Here C̄ consists of the k-by-k blocks C̄_{ji} = X_i^T Y_j^T, which is the first term of C in (5). For further safeguarding, we add a diagonal matrix ρI to this Hessian, where ρ is a small positive constant. Then we obtain the primal-dual direction by solving

  [ Q1 + ρI   C̄^T      −I        0       ] [Δx]   [ r − graX       ]
  [ C̄         Q2 + ρI   0         −I      ] [Δy] = [ s − graY       ]   (11)
  [ Diag(r)   0         Diag(x)   0       ] [Δr]   [ μe − Diag(r)x  ]
  [ 0         Diag(s)   0         Diag(y) ] [Δs]   [ μe − Diag(s)y  ]

By eliminating Δr and Δs in (11), we have

  [ R1  C̄^T ] [Δx]
  [ C̄   R2  ] [Δy] = −∇φ(x, y),   (12)

where R1 = Q1 + ρI + Diag(x)^{-1} Diag(r), R2 = Q2 + ρI + Diag(y)^{-1} Diag(s), ∇φ(x, y) = (graX^T − μ(x^{-1})^T, graY^T − μ(y^{-1})^T)^T, and Diag(x)^{-1} e is simply written as x^{-1}. We abbreviate the above system as Bp = −∇φ(x, y), where B denotes the coefficient matrix and p denotes the direction (Δx^T, Δy^T)^T.
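The rank-one block structure of C̄ that the later complexity analysis exploits can be checked directly. The sketch below, which is ours and omits the constant factor that depends on the scaling of the squared objective, assembles C̄ from its blocks X_i^T Y_j^T and is verifiable against an explicit residual Jacobian.

```python
import numpy as np

def build_Cbar(X, Y):
    """Assemble the Gauss-Newton coupling block C-bar (a sketch; constant
    factors from the objective scaling are omitted): block (j, i) is the
    rank-one k-by-k matrix X_i^T Y_j^T, where X_i is the i-th row of X
    and Y_j is the j-th column of Y."""
    n, k = X.shape
    m = Y.shape[1]
    Cbar = np.zeros((m * k, n * k))
    for j in range(m):
        for i in range(n):
            Cbar[j*k:(j+1)*k, i*k:(i+1)*k] = np.outer(X[i, :], Y[:, j])
    return Cbar
```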
Since B is positive definite, the inner product p^T ∇φ(x, y) = −p^T B p < 0, which means that p is a descent direction. Summing up all the approaches above, we present the whole interior point method in algorithm 2. It contains two loops. The parameter μ is fixed in the inner loop. Due to the Armijo line search (13) in the inner loop, the stopping criterion of the inner loop can be satisfied in finitely many iterations. Further, the parameters μ and ǫ_μ are reduced gradually, and by the definition of the error function E, the solution produced by the two-loop algorithm satisfies the KKT system (3) within the error ǫ_TOL. In practice, the barrier stop tolerance can be defined as ǫ_μ = μ. The complete convergence theorem and its proof are given in Theorem 1.
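The merit function and the Armijo backtracking in (13) can be sketched in a few lines. The helper below is our own minimal illustration: the function names, the step-halving, and the explicit positivity guard are ours; in Algorithm 2 the fraction-to-the-boundary rule already keeps the trial iterates positive.

```python
import numpy as np

def phi(X, Y, M, mu):
    """Merit (barrier) function: ||XY - M||_F^2 - mu*sum(log X) - mu*sum(log Y)."""
    return (np.linalg.norm(X @ Y - M, "fro") ** 2
            - mu * np.log(X).sum() - mu * np.log(Y).sum())

def armijo(X, Y, dX, dY, M, mu, g_dot_p, alpha_max, eta=0.5):
    """Halve alpha from alpha_max until the Armijo condition (13) holds.
    g_dot_p is grad(phi)^T p and must be negative (descent direction)."""
    alpha = alpha_max
    f0 = phi(X, Y, M, mu)
    while True:
        Xn, Yn = X + alpha * dX, Y + alpha * dY
        # Positivity guard keeps the log terms well defined along trials.
        if (Xn > 0).all() and (Yn > 0).all() and \
                phi(Xn, Yn, M, mu) <= f0 + eta * alpha * g_dot_p:
            return alpha
        alpha *= 0.5
```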
Theorem 1.
Suppose that all the sequences {x_k}, {y_k}, {r_k}, {s_k} generated by algorithm 2 are bounded. Then algorithm 2 stops in finitely many iterations.

Algorithm 2 A line search interior point method

Initialize: Choose x_0, y_0, r_0, s_0 > 0. Select an initial barrier parameter μ > 0, parameters η, σ ∈ (0, 1), and decreasing tolerances ǫ_μ ↓ 0 and ǫ_TOL. Set k = 0.
Repeat until E(x_k, y_k, r_k, s_k; 0) ≤ ǫ_TOL
  Repeat until E(x_k, y_k, r_k, s_k; μ) ≤ ǫ_μ
    Compute the primal-dual direction by solving (11).
    Compute α1^max and α2^max using (7).
    Backtrack step lengths α1 = 2^{−t} α1^max, α2 = α2^max to find the smallest integer t ≥ 0 satisfying
      φ(x_k + α1 Δx, y_k + α1 Δy) ≤ φ(x_k, y_k) + η α1 (Δx^T, Δy^T) ∇φ(x_k, y_k)   (13)
    Compute (x_{k+1}, y_{k+1}, r_{k+1}, s_{k+1}) using (6).
    Set k := k + 1.
  end
  Compute the parameter σ using (8) and (9) and update μ = σμ.
end

Proof. We consider the inner loop first and show that for a given μ > 0, E(x_k, y_k, r_k, s_k; μ) ≤ ǫ_μ will be satisfied in finitely many iterations.

Based on the Armijo line search rule (13), the value of φ(x_k, y_k) decreases monotonically. It follows that x_k and y_k are bounded below by a strictly positive constant that depends on μ. Thus the smallest eigenvalue of the coefficient matrix of (12) is bounded below by a positive constant. Furthermore, using the boundedness of (x_k, y_k), we obtain that (Δx, Δy) is bounded. From the lower bound on x_k, y_k, the boundedness of (Δx, Δy), and (7), we have

  inf α1^max > 0.   (14)

According to the stepsize rule (13), we have α1 (Δx^T, Δy^T) ∇φ(x_k, y_k) → 0. We next aim to prove that

  (Δx^T, Δy^T) ∇φ(x_k, y_k) → 0.   (15)

To prove this by contradiction, suppose that

  (Δx^T, Δy^T) ∇φ(x_k, y_k) ↛ 0.   (16)

This means that there exist a subsequence T and a constant a > 0 such that

  −(Δx^T, Δy^T) ∇φ(x_k, y_k) > a   (17)

for k ∈ T. Due to the boundedness of {(x_k, y_k)}_{k∈T}, there exists a subsequence T1 ⊆ T such that {(x_k, y_k)}_{k∈T1} converges to (x̄, ȳ). By (16) and (17), we have {α1^k}_{k∈T1} → 0. According to (14), α1^k < inf α1^max when k ∈ T1 is large enough. For simplicity, we redefine the sequence satisfying the above condition as T1. By the stepsize rule, the condition (13) is violated by α1 = 2α1^k. We have

  ( φ(x_k + 2α1^k Δx, y_k + 2α1^k Δy) − φ(x_k, y_k) ) / (2α1^k) > η (Δx^T, Δy^T) ∇φ(x_k, y_k).   (18)

Taking the limit of the above inequality, we obtain

  (Δx̄^T, Δȳ^T) ∇φ(x̄, ȳ) ≥ η (Δx̄^T, Δȳ^T) ∇φ(x̄, ȳ).

Since 0 < η < 1, it follows that (Δx̄^T, Δȳ^T) ∇φ(x̄, ȳ) ≥ 0. On the other hand, (Δx^T, Δy^T) ∇φ(x_k, y_k) < 0 for all k. Therefore (Δx̄^T, Δȳ^T) ∇φ(x̄, ȳ) = 0, which contradicts (16). So (15) is established.

From (12) and (15), it follows that ∇φ(x_k, y_k) → 0 and (Δx, Δy) → 0. Then, according to (11), we obtain

  Δr → μ x_k^{-1} − r_k,  Δs → μ y_k^{-1} − s_k.

For an arbitrary cluster point (x̄, ȳ) and any δ_μ > 0, there exists an index κ such that ||(x_κ, y_κ) − (x̄, ȳ)|| ≤ δ_μ and

  ||Δr + r_k − μ x_k^{-1}|| ≤ δ_μ,  ||Δs + s_k − μ y_k^{-1}|| ≤ δ_μ,
  ||(Δx, Δy)|| ≤ δ_μ,  ||∇φ(x_k, y_k)|| ≤ δ_μ

for all k ≥ κ. Due to the boundedness of (x_k, y_k), α1 can reach 1 for some k := k̄ ≤ κ + T̄, where T̄ is a constant. Then it follows from ||(x̄, ȳ) − (x_k̄, y_k̄)|| ≤ T̄ δ_μ that

  ||r_k̄ − μ x_k̄^{-1}|| ≤ c δ_μ,  ||s_k̄ − μ y_k̄^{-1}|| ≤ c δ_μ,

where c is a constant. Therefore, for a given ǫ_μ > 0, letting δ_μ be sufficiently small, there exists a k such that E(x_k, y_k, r_k, s_k; μ) ≤ ǫ_μ.

We denote the sequence satisfying the inner loop stopping criterion by {(x_k, y_k, r_k, s_k)}_{k∈S}. To prove the theorem by contradiction, suppose that there is no point satisfying the outer loop stopping criterion E(x_k, y_k, r_k, s_k; 0) ≤ ǫ_TOL. Due to boundedness, there exists a cluster point of {(x_k, y_k, r_k, s_k)}_{k∈S}. For any cluster point (x̄, ȳ, r̄, s̄), taking limits on both sides of E(x_k, y_k, r_k, s_k; μ) ≤ ǫ_μ along k ∈ S yields E(x̄, ȳ, r̄, s̄; 0) = 0. Thus algorithm 2 stops at some k ∈ S, a contradiction. Therefore, algorithm 2 stops in finitely many iterations.

B. Computation
In general, the computational cost of each iteration of an interior point method is cubic in the problem size, which is impractical for large scale problems. However, for the special NMF problem under consideration here, the amount of computation can be greatly reduced, so that the proposed method can be applied to practical problems involving streaming data. In the following, we analyze the computational cost of algorithm 2.

First of all, computing the gradient takes O(nmk) flops. The main computational cost of algorithm 2 is computing the primal-dual direction (11). One can solve (12) to obtain (Δx, Δy) first, and then compute (Δr, Δs) at low cost. We rewrite (12) as

  [ Q̄1  C̄^T ] [Δx]   [ b1 ]
  [ C̄   Q̄2  ] [Δy] = [ b2 ]   (19)

To minimize the computational cost, we first decompose Q̄1 as Q̄1 = P^T P, which can be obtained by Cholesky factorization or eigenvalue decomposition. Since the matrix Q̄1 consists of n positive definite diagonal blocks of size k by k, one can obtain P and P^{-1} within O(nk^3) flops. (19) is equivalent to

  [ I         P^{-T} C̄^T ] [ PΔx ]   [ P^{-T} b1 ]
  [ C̄P^{-1}   Q̄2         ] [ Δy  ] = [ b2        ].

Then we solve Δy from

  ( Q̄2 − (C̄P^{-1})(C̄P^{-1})^T ) Δy = b2 − (C̄P^{-1})(P^{-T} b1)   (20)

and compute Δx by Δx = P^{-1}( P^{-T} b1 − (C̄P^{-1})^T Δy ).

We need O(nmk^2) flops for constructing C̄ and O(nmk^3) flops for computing C̄P^{-1}. By exploiting the fact that each block of C̄ is rank one, we can compute C̄P^{-1} within O(nmk^2) flops. The dominant computation is forming (C̄P^{-1})(C̄P^{-1})^T, which costs O(nm^2k^3) flops; exploiting the rank-one blocks of C̄ again, it can be reduced to O(nm^2k^2) flops. When solving (20) by Cholesky factorization, since the size of the coefficient matrix is mk by mk, the computational cost is O(m^3k^3).
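The block elimination in (19)-(20) can be exercised end to end on random data. In the sketch below, the dimensions, the SPD shifts, and the dense block_diag representation are our own choices for a small self-contained check; a real implementation would keep Q̄1 and P in block form and exploit the rank-one blocks of C̄.

```python
import numpy as np
from scipy.linalg import block_diag, cho_factor, cho_solve

rng = np.random.default_rng(0)
n, m, k = 30, 6, 3

# Q1-bar: block-diagonal with n SPD k-by-k blocks, so its factor P in
# Q1 = P^T P is cheap to form and invert block by block, O(n k^3).
blocks = []
for _ in range(n):
    A = rng.standard_normal((k, k))
    blocks.append(A @ A.T + k * np.eye(k))
Q1 = block_diag(*blocks)

# Q2-bar: dense SPD mk-by-mk; the shift keeps the Schur complement SPD here.
B = rng.standard_normal((m * k, m * k))
Q2 = B @ B.T + n * k * np.eye(m * k)

Cbar = rng.standard_normal((m * k, n * k))
b1 = rng.standard_normal(n * k)
b2 = rng.standard_normal(m * k)

# Q1 = P^T P with P block upper-triangular (numpy's cholesky gives L L^T).
P = block_diag(*[np.linalg.cholesky(Bk).T for Bk in blocks])
Pinv = np.linalg.inv(P)          # block triangular solves in a real code
CPinv = Cbar @ Pinv              # rank-one blocks would reduce this cost
w = Pinv.T @ b1

# Schur complement system (20) for dy, then back-substitution for dx.
S = Q2 - CPinv @ CPinv.T
dy = cho_solve(cho_factor(S), b2 - CPinv @ w)
dx = Pinv @ (w - CPinv.T @ dy)
```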
Other computations, like forming the right-hand side of (20) and computing Δx, take O(nmk) flops. To sum up, the computational cost of algorithm 2 in each iteration is O((n + mk) m^2 k^2). Compared with the O(nmk) cost of a gradient computation, this is at most O(mk) times larger. Since n ≫ m ≫ k and k is usually very small for our streaming data, the computational complexity is completely acceptable.

IV. A TWO-STAGE ALGORITHM FOR NMF ON STREAMING DATA
In this section, we propose a practical algorithm with fast convergence for solving NMF on streaming data. It combines the ANLS framework with the active set method and the interior point method proposed in the previous sections.

In the early stage of the algorithm, we use the ANLS framework with the active set method. It can reduce the value of the objective function rapidly. We use the relative step tolerance, which is a relative lower bound on the size of a step,

  ||(x_k, y_k) − (x_{k+1}, y_{k+1})|| ≤ ǫ_STOL (1 + ||(x_k, y_k)||),

as the stopping criterion of this stage. If the algorithm attempts to take a step smaller than the step tolerance, the iterations end.
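The relative step tolerance test is a one-liner; a sketch follows (the function name and the default ǫ_STOL are our own choices).

```python
import numpy as np

def small_step(z_old, z_new, eps_stol=1e-6):
    """Relative step tolerance used to end stage one: the step counts as
    'small' when ||z_old - z_new|| <= eps_stol * (1 + ||z_old||)."""
    return np.linalg.norm(z_old - z_new) <= eps_stol * (1.0 + np.linalg.norm(z_old))
```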
At the end of the first phase, the algorithm is about to enter the second phase, which uses the interior point method. However, the solutions from the active set method usually contain elements with value zero, which is incompatible with the strictly interior iterates required by the interior point method. Meanwhile, we also need to provide initial dual variables (r, s) to the interior point method. To address these issues, we first raise any primal variable smaller than ρ0 up to ρ0 by

  (x, y) := max{ (x, y), ρ0 },   (21)

where ρ0 is a small positive constant; we can choose ρ0 as a small fixed fraction of

  max{ (x, y) }.   (22)

Next, we set the initial values of the dual variables to

  r = max{ |graX| } e,  s = max{ |graY| } e.   (23)

We set the parameters

  μ = (x^T r + y^T s) / (mk + nk)   (24)

and

  ρ = ǫ_TOL.   (25)

After these preparations, the algorithm enters its second stage by running algorithm 2. Since the Hessian is approximated in algorithm 2 by the positive definite matrix

  [ Q1 + ρI  C̄^T     ]
  [ C̄        Q2 + ρI ],   (26)

in order to further speed up convergence, we can change C̄ back to C at the right time. A heuristic way to decide when to switch to

  [ Q1 + ρI  C^T     ]
  [ C        Q2 + ρI ]   (27)

is to monitor σ, which is given in (9). A small σ implies that the predictor step generates a point close to the boundary, and thus the iterate is likely to be close to a local minimum. Therefore, when σ ≤ σ_c, we switch to (27), where σ_c is a user-supplied constant. The computation of the primal-dual direction is similar to that using C̄. The difference is that each block of C is no longer rank one, so the computational cost is O(nm^2k^3) flops. Since (27) is not guaranteed to be positive definite, the primal-dual direction using (27) may not be a descent direction. We therefore check the sign of (Δx^T, Δy^T) ∇φ(x_k, y_k) in (13) before we implement the line search.
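The stage-transition updates (21)-(25) can be sketched as follows. The 10^{-6} scale used for ρ0 is an assumption on our part, since the constant in (22) is not legible in the source; the function name is also ours.

```python
import numpy as np

def warm_start_stage2(X, Y, graX, graY, eps_tol):
    """Sketch of the stage-transition initialization (21)-(25).
    The 1e-6 scale for rho0 is an assumed value."""
    rho0 = 1e-6 * max(X.max(), Y.max())           # (22), assumed constant
    X = np.maximum(X, rho0)                       # (21): move off the boundary
    Y = np.maximum(Y, rho0)
    r = np.full(X.size, np.abs(graX).max())       # (23): uniform dual guesses,
    s = np.full(Y.size, np.abs(graY).max())       # so vec ordering is immaterial
    mu = (X.ravel() @ r + Y.ravel() @ s) / (X.size + Y.size)   # (24)
    rho = eps_tol                                 # (25)
    return X, Y, r, s, mu, rho
```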
If we fail to obtain a descent direction, we switch back to the positive definite Hessian (26). All elements of the above approaches are collected in our two-stage algorithm, outlined below as algorithm 3.

Algorithm 3
A fast two-stage algorithm

Initialize: Choose initial X, Y.
Implement algorithm 1 with the active set method.
Update the variables by (21), (22) and (23).
Set the parameters by (24) and (25), select η ∈ (0, 1) and the tolerance ǫ_TOL, and let flag = 0.
Repeat until E(x_k, y_k, r_k, s_k; 0) ≤ ǫ_TOL
  ǫ_μ := μ
  Repeat until E(x_k, y_k, r_k, s_k; μ) ≤ ǫ_μ
    If flag = 1,
      compute the primal-dual direction by solving (11) with Hessian (27).
      If (Δx^T, Δy^T) ∇φ(x_k, y_k) ≥ 0,
        set flag = 0 and compute the primal-dual direction by solving (11).
    Else,
      compute the primal-dual direction by solving (11).
    Compute α1^max and α2^max using (7).
    Backtrack step lengths α1 = 2^{−t} α1^max, α2 = α2^max to find the smallest integer t ≥ 0 satisfying (13).
    Compute (x_{k+1}, y_{k+1}, r_{k+1}, s_{k+1}) using (6).
    Set k := k + 1.
  End
  Compute the parameter σ using (8) and (9) and update μ = σμ.
  If σ ≤ σ_c, set flag = 1, else set flag = 0.
End

V. NUMERICAL TESTS
We test our two-stage algorithm and compare it with other ANLS-based methods, including NeNMF [11], QRPBB [13] and ANLS-BPP [16]. All tests are performed using MATLAB 2020a on a computer with a 64-bit system and a 2.70 GHz CPU. Comparisons are done on both synthetic data sets and real-world problems.
A. Stopping criterion
The KKT conditions of (1) are given in (3). The error function E(x, y, r, s; 0) defined in (10) measures the violation of the KKT conditions. Therefore, we use E(x, y, r, s; 0) ≤ ǫ_TOL as the stopping criterion, where ǫ_TOL is a small fixed tolerance. The ANLS-based algorithms do not generate the dual variables r and s. Here we give a reasonable definition,

  r = max{ graX, 0 },  s = max{ graY, 0 }.

We also limit the maximum CPU time, in cases where some algorithms cannot reach the given accuracy.
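With the duals defined this way, the stopping quantity E(x, y, r, s; 0) of (10) can be sketched as below. This is our own illustration; the gradients follow the scaling of the KKT system (3), i.e. without an extra constant factor.

```python
import numpy as np

def kkt_violation(X, Y, M):
    """Sketch of E(x, y, r, s; 0) from (10), with the duals defined as
    r = max{graX, 0}, s = max{graY, 0}."""
    graX = (X @ Y - M) @ Y.T           # matches the scaling of (3)
    graY = X.T @ (X @ Y - M)
    r, s = np.maximum(graX, 0.0), np.maximum(graY, 0.0)
    stationarity = np.linalg.norm(
        np.concatenate([(graX - r).ravel(), (graY - s).ravel()]))
    complementarity = np.linalg.norm(
        np.concatenate([(X * r).ravel(), (Y * s).ravel()]))
    return max(stationarity, complementarity)
```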
B. Streaming AgMOR PDF data
Zhao et al. [32] measured the X-ray diffraction data during the nucleation and growth of zeolite-supported Ag nanoparticles through reduction of Ag-exchanged mordenite (MOR), and processed the data with pair distribution function (PDF) analysis. In the field of chemistry, more and more people use mathematical tools to analyze their measured data. Chapman et al. [4] used principal component analysis (PCA) and some post-processing to analyze the data given in [32], and obtained 3 principal components. Since a PDF is a distribution-type function, NMF may be intuitively more applicable.

We simply remove the negativity of the raw data by shifting each PDF up by the opposite of its original minimum value. Then we run the NMF algorithms on the data. The size of the data is n = 3000, m = 36, and in the algorithms k is set to 3 based on the analysis of [4].

Fig. 1. CPU time (s) versus tolerance values on AgMOR data, for NeNMF, QRPBB, ANLS-BPP and 2-STAGE.

We use the same initial point for each algorithm. The relationship between the KKT violation E and CPU time for the four algorithms is shown in Fig. 1. It can be seen that the performances of NeNMF, ANLS-BPP and our 2-STAGE algorithm are similar in the early stage, and the decline of the error function value for QRPBB is the most pronounced. After that, there is a rise of the error function value for 2-STAGE, because our algorithm begins to enter the second stage and the variables change; some oscillations occur later, caused by the instability of the early iterations of the interior point method. After several iterations, 2-STAGE becomes stable and converges quickly to meet the termination criterion. Compared with NeNMF and ANLS-BPP, QRPBB is consistently more accurate. Even so, it cannot achieve the given criterion in a short time.

Next, we generate 10 different random initial points. For each of them, all algorithms are run. The results are shown in Tab. I. Because none of the three algorithms compared with the 2-STAGE algorithm can meet the stopping criterion in a short time, we set the maximum running CPU time to 20 seconds. Tab.
I presents the average, minimum and maximum CPU time, KKT violation E, and objective function value for each algorithm. It can be seen that our algorithm always attains the smallest KKT violation and the smallest objective function value. It has a great advantage in finding a high-precision solution within a short amount of time.

C. Yale face database
The Yale face database is widely used for testing face recognition [3]. It contains 165 grayscale images of 15 individuals. There are 11 images per subject, one per facial expression or configuration: lighting (center-light, left-light and right-light), with/without glasses, and facial expressions (normal, happy, sad, sleepy, surprised, and wink). To reduce the computational burden, each image was downsampled to 64 × 64 pixels. Because face images also have some degree of continuity, we regard the data as streaming data to test our algorithm. The data size is n = 4096, m = 165.

We gradually increase m to explore the influence of varying m on the algorithms. We select the face images of the first i individuals, i = 1, 2, 4, 8, 15, i.e. m = 11, 22, 44, 88, 165. In the following tests, we fix k = 3. We limit the maximum CPU time to 60 seconds for each algorithm and generate 10 different random initial points for each sample. We see from Fig. 2, which shows an example with m = 44 and an example with m = 165, that the first stage of 2-STAGE is not as efficient as the other algorithms. The main reason is that the continuity of the face data is not as strong as that of streaming data, so the active set changes frequently in the algorithm, resulting in a large computational cost. In Fig. 2 (top), the second stage of 2-STAGE, the interior point method, shows fast local convergence, since the slope is significantly steeper than for the others. However, in Fig. 2 (bottom), the slope of the second stage of 2-STAGE is similar to the others. The reason is that when m increases, the computational cost of the interior point method increases faster than that of the other algorithms. Therefore, we see from Tab. II that, as anticipated, the larger m is, the less efficient 2-STAGE is compared with the other algorithms.

D. Synthetic data
In order to test more problems at different scales, we artificially synthesized some data for testing. We note that the focus of this paper is on streaming data. Chapman et al. [4] used the PCA method to study the AgMOR data. They found that there were three dominant components (singular values) in the data, which means that the original data is a rank-3 matrix plus noise. According to this characteristic, we construct the artificial data by the following method. First, we decide the problem size (n, m, k). Then, we generate a random matrix X of size n × k, with each element uniformly distributed between 0 and 1. In the same way, we generate a random matrix Y of size k × m. Next, we compute M = XY and add Gaussian noise, with mean 0 and a small fixed standard deviation, to each element of M. Finally, we set the negative elements of M to zero.

In this set of tests, each algorithm is limited to a maximum CPU time of 60 seconds. We generate 10 different random initial points for each sample. The results are shown in Tab. III. On average, the performance of 2-STAGE is the best of the four algorithms, followed by QRPBB. Except for m = 50, k = 6, ANLS-BPP exceeds QRPBB. In some cases, other algorithms may be better than 2-STAGE. For example, when m = 100 and k = 6, the minimum time of QRPBB is the smallest.
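The data generation procedure can be sketched as follows; the noise level sigma and the seed below are our own placeholder choices, since the paper's exact standard deviation is not legible in the source.

```python
import numpy as np

def make_synthetic(n, m, k, sigma=0.1, seed=0):
    """Synthetic data: nonnegative rank-k product plus Gaussian noise,
    with negative entries clipped to zero. sigma is an assumed value."""
    rng = np.random.default_rng(seed)
    X = rng.random((n, k))                 # uniform on [0, 1]
    Y = rng.random((k, m))
    M = X @ Y + sigma * rng.standard_normal((n, m))
    return np.maximum(M, 0.0)
```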
TABLE I: Experimental results on AgMor data.
Algorithm   cpu(s): avrg(min,max)   E: avrg(min,max)       f: avrg(min,max)
NeNMF       20.01(20.00,20.03)      6.64(1.72,16.32)e-2    5.37274(5.37242,5.37329)e+2
QRPBB       20.02(20.00,20.03)      2.19(1.48,2.94)e-3     5.37236(5.37213,5.37263)e+2
ANLS-BPP    20.02(20.02,20.05)      1.37(0.75,2.14)e-2     5.37236(5.37206,5.37264)e+2
2-STAGE      5.03(3.55,8.88)        3.58(1.24,6.30)e-7     5.37205(5.37205,5.37205)e+2

TABLE II: Experimental results on the Yale face database.
size(n,m,k)   Algorithm   cpu(s): avrg(min,max)   E: avrg(min,max)      f: avrg(min,max)
(4096,11,3)   NeNMF       44.94(30.50,66.53)      125(9.77e-7,1254)     1.40186(1.40104,1.40918)e+7
              QRPBB       19.10(12.05,40.28)      8.20(3.39,9.99)e-7    1.40511(1.40104,1.40918)e+7
              ANLS-BPP    23.14(16.42,49.08)      9.80(9.50,9.98)e-7    1.40185(1.40104,1.40918)e+7
              2-STAGE     13.30(6.83,24.33)       4.39(1.16,9.46)e-7    1.40520(1.40104,1.41008)e+7
(4096,22,3)   NeNMF       36.36(7.28,65.63)       1.13(9.47e-7,11.34)   3.13573(3.12627,3.14786)e+7
              QRPBB        6.30(2.41,17.23)       7.32(2.63,9.95)e-7    3.13324(3.12627,3.14785)e+7
              ANLS-BPP    19.90(2.89,60.03)       0.79(9.48e-7,5.78)    3.13772(3.12627,3.14786)e+7
              2-STAGE      9.81(5.45,18.28)       3.54(1.79,7.37)e-7    3.12893(3.12627,3.14785)e+7
(4096,44,3)   NeNMF       63.76(61.03,67.06)      2.71(2.04,6.17)e-6    8.20694(8.20694,8.20694)e+7
              QRPBB        6.32(4.97,7.06)        8.52(2.49,9.99)e-7    8.20694(8.20694,8.20694)e+7
              ANLS-BPP    10.06(8.34,12.14)       9.85(9.44,9.99)e-7    8.20694(8.20694,8.20694)e+7
              2-STAGE     16.05(11.08,31.14)      5.05(1.02,9.02)e-7    8.20694(8.20694,8.20694)e+7
(4096,88,3)   NeNMF       62.86(61.05,64.81)      2.09(1.35,4.19)e-6    1.97873(1.97753,1.98962)e+8
              QRPBB        4.97(3.14,10.58)       8.37(5.42,9.96)e-7    1.97945(1.97753,1.98962)e+7
              ANLS-BPP     9.10(7.25,16.50)       4.53(1.32,9.81)e-7    1.97874(1.97753,1.98962)e+7
              2-STAGE     14.76(7.86,43.48)       5.05(1.02,9.02)e-7    1.97753(1.97753,1.97753)e+7
(4096,165,3)  NeNMF       63.99(60.17,67.20)      1.54(1.35,1.89)e-6    4.06305(4.06305,4.06305)e+8
              QRPBB        5.91(5.30,6.78)        5.01(4.79,9.58)e-7    4.06305(4.06305,4.06305)e+8
              ANLS-BPP    11.46(10.81,12.36)      9.60(9.40,9.85)e-7    4.06305(4.06305,4.06305)e+8
              2-STAGE     29.78(16.58,61.78)      4.10(1.19,8.39)e-7    4.06305(4.06305,4.06305)e+8
Fig. 2. Experimental tests on Yale face data, m = 44 (top) and m = 165 (bottom).

is much less than the maximum time of 2-STAGE. Generally speaking, the performance of 2-STAGE in obtaining high-precision solutions truly stands out.

VI. CONCLUSIONS
In this paper we focused on solving NMF for streaming data. We presented a fast two-stage algorithm, where the first stage is an ANLS framework with an active set method, which benefits from the continuity of streaming data, and the second stage is a line search interior point method, which benefits from n ≫ m ≫ k. In addition, we proved the global convergence of the proposed line search interior point method. The first stage reduces the value of the objective function rapidly, and the second stage converges to a local solution quickly owing to the Newton-type search direction. We tested the proposed algorithm on several real and synthetic datasets and observed that, compared with other algorithms, it is more effective at finding high-precision local solutions.

The active set method in the first stage does not reach the expected speed, even when tested on continuous data. We suspect this is caused by limitations of the underlying code implementation in MATLAB. On the other hand, we find that the transition between the two stages may induce instability: the solution of the active set method cannot be used directly as the initial guess for the interior point method, and changes to it affect stability. At present, the parameters used to generate the starting point are chosen carefully to avoid this instability. In the future, we will work to find a more stable transition technique.
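For concreteness, the ANLS alternation at the core of the first stage can be sketched as follows. This is a minimal illustrative Python/NumPy version, not the paper's MATLAB implementation: scipy.optimize.nnls (a Lawson-Hanson active-set NNLS solver, cf. [18]) stands in for the warm-started active set method, and the X0/Y0 arguments indicate where warm-starting from the previous frame's factors enters in the streaming setting.

```python
import numpy as np
from scipy.optimize import nnls


def anls_nmf(M, k, X0=None, Y0=None, n_iter=30, seed=0):
    """Schematic ANLS loop for min ||XY - M||_F^2 s.t. X >= 0, Y >= 0.

    X0 and Y0 allow warm-starting from the factors obtained on the
    previous frame of a data stream; by default the factors are
    initialized randomly.
    """
    rng = np.random.default_rng(seed)
    n, m = M.shape
    X = rng.random((n, k)) if X0 is None else X0.copy()
    Y = rng.random((k, m)) if Y0 is None else Y0.copy()
    for _ in range(n_iter):
        # Y-subproblem (X fixed): one NNLS problem per column of M.
        for j in range(m):
            Y[:, j], _ = nnls(X, M[:, j])
        # X-subproblem (Y fixed): one NNLS problem per row of M.
        for i in range(n):
            X[i, :], _ = nnls(Y.T, M[i, :])
    return X, Y
```

When new columns of M arrive, passing the previous X (and Y padded with fresh columns) as X0 and Y0 lets the subproblem solver start near the correct active set, which is the intuition behind the first stage's efficiency on streaming data.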
TABLE III: Experimental results on synthetic data.
size(n,m,k)   Algorithm   cpu(s): avrg(min,max)   E: avrg(min,max)        f: avrg(min,max)
(2000,50,3)   NeNMF       60.01(60.00,60.03)      1.74(0.70,3.57)e-5      4.61662(4.61662,4.61662)e+2
              QRPBB       29.36(20.17,48.14)      8.46(5.60,9.80)e-7      4.61662(4.61662,4.61662)e+2
              ANLS-BPP    58.48(47.89,60.05)      1.69(1.00,3.29)e-6      4.61662(4.61662,4.61662)e+2
              2-STAGE      3.31(2.45,4.75)        3.07(1.16,5.86)e-7      4.61662(4.61662,4.61662)e+2
(2000,50,4)   NeNMF       47.80(40.88,60.02)      1.19(1.00,2.31)e-6      4.56019(4.56019,4.56019)e+2
              QRPBB       22.89(12.78,34.22)      7.30(2.67,9.99)e-7      4.56019(4.56019,4.56019)e+2
              ANLS-BPP    28.03(23.44,32.61)      9.99(9.98,10.00)e-7     4.56019(4.56019,4.56019)e+2
              2-STAGE      7.41(4.22,11.78)       3.25(1.09,8.91)e-7      4.56019(4.56019,4.56019)e+2
(2000,50,5)   NeNMF       49.81(40.92,56.80)      9.99(9.98,10.00)e-7     4.45449(4.45449,4.45449)e+2
              QRPBB       25.14(20.61,35.91)      5.99(1.80,8.86)e-7      4.45449(4.45449,4.45449)e+2
              ANLS-BPP    24.78(21.67,28.31)      9.99(9.97,9.99)e-7      4.45449(4.45449,4.45449)e+2
              2-STAGE      6.98(6.20,7.91)        3.93(1.69,8.31)e-7      4.45449(4.45449,4.45449)e+2
(2000,50,6)   NeNMF       60.02(60.00,60.03)      5.18(0.38,13.11)e-3     4.36121(4.36121,4.36121)e+2
              QRPBB       60.01(60.00,60.03)      2.19(0.02,8.31)e-4      4.36121(4.36121,4.36121)e+2
              ANLS-BPP    56.07(20.38,60.05)      5.35(0.10,32.81)e-5     4.36309(4.36121,4.38005)e+2
              2-STAGE     14.46(6.31,25.81)       4.74(1.18,7.77)e-7      4.36121(4.36121,4.36121)e+2
(2000,100,3)  NeNMF       17.14(14.88,19.91)      9.96(9.93,10.00)e-7     9.57337(9.57337,9.57337)e+2
              QRPBB        5.84(4.86,7.70)        7.81(1.76,9.98)e-7      9.57337(9.57337,9.57337)e+2
              ANLS-BPP    15.10(14.27,15.97)      9.98(9.94,10.00)e-7     9.57337(9.57337,9.57337)e+2
              2-STAGE      4.05(2.95,5.33)        4.71(1.00,9.92)e-7      9.57337(9.57337,9.57337)e+2
(2000,100,4)  NeNMF       26.86(23.59,31.17)      9.98(9.97,10.00)e-7     9.54230(9.54230,9.54230)e+2
              QRPBB       14.53(13.17,15.63)      5.75(2.37,8.99)e-7      9.54230(9.54230,9.54230)e+2
              ANLS-BPP    24.60(22.36,26.70)      9.98(9.96,10.00)e-7     9.54230(9.54230,9.54230)e+2
              2-STAGE      7.46(5.36,8.91)        4.82(1.01,9.44)e-7      9.54230(9.54230,9.54230)e+2
(2000,100,5)  NeNMF       36.04(31.03,41.45)      9.98(9.97,10.00)e-7     9.44857(9.44857,9.44857)e+2
              QRPBB       17.60(16.17,20.45)      7.02(4.22,9.12)e-7      9.44857(9.44857,9.44857)e+2
              ANLS-BPP    27.54(25.70,29.58)      9.98(9.96,10.00)e-7     9.44857(9.44857,9.44857)e+2
              2-STAGE     11.77(9.69,17.59)       4.59(1.27,8.06)e-7      9.44857(9.44857,9.44857)e+2
(2000,100,6)  NeNMF       60.02(60.00,60.03)      7.43(0.03,28.52)e-3     9.35451(9.35451,9.35452)e+2
              QRPBB       57.20(37.47,60.03)      2.54(0.07,12.20)e-5     9.35451(9.35451,9.35451)e+2
              ANLS-BPP    60.03(60.02,60.05)      1.14(0.04,4.44)e-4      9.35451(9.35451,9.35451)e+2
              2-STAGE     25.54(14.44,52.39)      3.05(1.04,9.33)e-7      9.35451(9.35451,9.35451)e+2
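The synthetic data construction described in the previous subsection can be sketched as follows. This is an illustrative Python/NumPy version; the parameter noise_std is an assumed placeholder, since the exact noise standard deviation is garbled in the source text.

```python
import numpy as np


def make_synthetic(n, m, k, noise_std=0.1, seed=0):
    """Build a rank-k nonnegative matrix plus Gaussian noise, following
    the recipe in the text. noise_std is an assumed placeholder value;
    the exact standard deviation is garbled in the source."""
    rng = np.random.default_rng(seed)
    X = rng.random((n, k))     # entries uniform on [0, 1]
    Y = rng.random((k, m))
    M = X @ Y + rng.normal(0.0, noise_std, size=(n, m))
    return np.maximum(M, 0.0)  # set negative entries to zero
```

The resulting M is nonnegative and, up to the noise level, has numerical rank k, mirroring the rank-3-plus-noise structure observed in the AgMor data [4].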
Considering that, in addition to the basic NMF model, there are other variants of NMF, such as constrained NMFs and structured NMFs, our algorithm has the potential to be extended to a broader range of problems. This will be our consideration in future work.

REFERENCES

[1] S. Arora, R. Ge, R. Kannan, and A. Moitra, "Computing a nonnegative matrix factorization—provably," SIAM Journal on Computing, vol. 45, no. 4, pp. 1582–1611, 2016.
[2] I. Buciu, "Non-negative matrix factorization, a new tool for feature extraction: theory and applications," International Journal of Computers, Communications and Control, vol. 3, no. 3, pp. 67–74, 2008.
[3] D. Cai, X. He, J. Han, and H.-J. Zhang, "Orthogonal Laplacianfaces for face recognition," IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3608–3614, 2006.
[4] K. W. Chapman, S. H. Lapidus, and P. J. Chupas, "Applications of principal component analysis to pair distribution function data," Journal of Applied Crystallography, vol. 48, no. 6, pp. 1619–1626, 2015.
[5] A. Cichocki and A.-H. Phan, "Fast local algorithms for large scale nonnegative matrix and tensor factorizations," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. 92, no. 3, pp. 708–721, 2009.
[6] A. Cichocki, R. Zdunek, A. H. Phan, and S.-i. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. John Wiley & Sons, 2009.
[7] T. Egami and S. J. L. Billinge, Underneath the Bragg Peaks: Structural Analysis of Complex Materials, 2nd ed. Amsterdam: Elsevier, 2012.
[8] D. Goldfarb and S. Ma, "Convergence of fixed-point continuation algorithms for matrix rank minimization," Foundations of Computational Mathematics, vol. 11, no. 2, pp. 183–210, 2011.
[9] P. Gong and C. Zhang, "Efficient nonnegative matrix factorization via projected Newton method," Pattern Recognition, vol. 45, no. 9, pp. 3557–3565, 2012.
[10] L. Grippo and M. Sciandrone, "On the convergence of the block nonlinear Gauss–Seidel method under convex constraints," Operations Research Letters, vol. 26, no. 3, pp. 127–136, 2000.
[11] N. Guan, D. Tao, Z. Luo, and B. Yuan, "NeNMF: An optimal gradient method for nonnegative matrix factorization," IEEE Transactions on Signal Processing, vol. 60, no. 6, pp. 2882–2898, 2012.
[12] D. Hajinezhad, T.-H. Chang, X. Wang, Q. Shi, and M. Hong, "Nonnegative matrix factorization using ADMM: Algorithm and convergence analysis," IEEE, 2016, pp. 4742–4746.
[13] Y. Huang, H. Liu, and S. Zhou, "Quadratic regularization projected Barzilai–Borwein method for nonnegative matrix factorization," Data Mining and Knowledge Discovery, vol. 29, no. 6, pp. 1665–1684, 2015.
[14] D. Kim, S. Sra, and I. S. Dhillon, "Fast Newton-type methods for the least squares nonnegative matrix approximation problem," in Proceedings of the 2007 SIAM International Conference on Data Mining. SIAM, 2007, pp. 343–354.
[15] H. Kim and H. Park, "Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method," SIAM Journal on Matrix Analysis and Applications, vol. 30, no. 2, pp. 713–730, 2008.
[16] J. Kim and H. Park, "Fast nonnegative matrix factorization: An active-set-like method and comparisons," SIAM Journal on Scientific Computing, vol. 33, no. 6, pp. 3261–3281, 2011.
[17] H. W. Kuhn and A. W. Tucker, "Nonlinear programming," in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, Calif.: University of California Press, 1951, pp. 481–492.
[18] C. L. Lawson and R. J. Hanson, Solving Least Squares Problems. SIAM, 1995, vol. 15.
[19] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, p. 788, 1999.
[20] ——, "Algorithms for non-negative matrix factorization," in Advances in Neural Information Processing Systems, 2001, pp. 556–562.
[21] L. Li and Y.-J. Zhang, "FastNMF: highly efficient monotonic fixed-point nonnegative matrix factorization algorithm with good applicability," Journal of Electronic Imaging, vol. 18, no. 3, p. 033004, 2009.
[22] C.-J. Lin, "Projected gradient methods for nonnegative matrix factorization," Neural Computation, vol. 19, no. 10, pp. 2756–2779, 2007.
[23] C.-H. Liu, C. J. Wright, R. Gu, S. Bandi, A. Wustrow, P. K. Todd, D. O'Nolan, M. L. Beauvais, J. R. Neilson, P. J. Chupas, K. W. Chapman, and S. J. L. Billinge, "Validation of non-negative matrix factorization for assessment of atomic pair-distribution function (PDF) data in a real-time streaming context," J. Appl. Crystallogr., Jan 2020, arXiv:2010.11807 [cond-mat.mtrl-sci].
[24] Y.-F. Liu, M. Hong, and E. Song, "Sample approximation-based deflation approaches for chance SINR-constrained joint power and admission control," IEEE Transactions on Wireless Communications, vol. 15, no. 7, pp. 4535–4547, 2016.
[25] J. Nocedal and S. Wright, Numerical Optimization. Springer Science & Business Media, 2006.
[26] P. Paatero and U. Tapper, "Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111–126, 1994.
[27] S. Sra and I. S. Dhillon, Nonnegative Matrix Approximation: Algorithms and Applications. Computer Science Department, University of Texas at Austin, 2006.
[28] P. K. Todd, A. Wustrow, R. D. McAuliffe, M. J. McDermott, G. T. Tran, B. C. McBride, E. D. Boeding, D. O'Nolan, C.-H. Liu, S. S. Dwaraknath, K. W. Chapman, S. J. L. Billinge, K. A. Persson, A. Huq, G. M. Veith, and J. R. Neilson, "Defect-accommodating intermediates yield selective low-temperature synthesis of YMnO polymorphs," Inorg. Chem., vol. 59, pp. 13639–13650, Aug 2020.
[29] S. A. Vavasis, "On the complexity of nonnegative matrix factorization," SIAM Journal on Optimization, vol. 20, no. 3, pp. 1364–1377, 2009.
[30] Y.-X. Wang and Y.-J. Zhang, "Nonnegative matrix factorization: A comprehensive review," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 6, pp. 1336–1353, 2012.
[31] R. Zdunek and A. Cichocki, "Non-negative matrix factorization with quasi-Newton optimization," in International Conference on Artificial Intelligence and Soft Computing. Springer, 2006, pp. 870–879.
[32] H. Zhao, T. M. Nenoff, G. Jennings, P. J. Chupas, and K. W. Chapman, "Determining quantitative kinetics and the structural mechanism for particle growth in porous templates,"