Monotonous (Semi-)Nonnegative Matrix Factorization
Nirav Bhatt a,⋆ and Arun Ayyar b,∗

a Systems & Control Group, Department of Chemical Engineering, Indian Institute of Technology Madras, Chennai - 600036, India
b Santa Fe Research, IIT Madras Research Park, Chennai - 600036, India

May 5, 2015
Abstract
Nonnegative matrix factorization (NMF) factorizes a nonnegative matrix into the product of two nonnegative matrices, namely a signal matrix and a mixing matrix. NMF suffers from scale and ordering ambiguities. Often, the source signals are monotonous in nature. For example, in source separation problems, the source signals can be monotonously increasing or decreasing while the mixing matrix has nonnegative entries. Because of the ordering ambiguity, standard NMF methods may not be effective in such cases. This paper proposes an approach to incorporate the notion of monotonicity in NMF, labeled monotonous NMF. An algorithm based on alternating least squares is proposed for recovering monotonous signals from a data matrix. Further, the assumption on the mixing matrix is relaxed to extend monotonous NMF to data matrices with real-valued entries. The approach is illustrated using synthetic noisy data. The results obtained by monotonous NMF are compared with standard NMF algorithms from the literature, and it is shown that monotonous NMF estimates the source signals better than standard NMF algorithms when the underlying source signals are monotonous.
Keywords: Nonnegative matrix factorization, Monotonicity, Unsupervised learning, Blind source separation
Nonnegative matrix factorization (NMF) is one of the most widely used matrix factorization techniques, with application areas ranging from basic sciences such as chemistry, environmental science, systems

∗ Corresponding authors. E-mail: [email protected], [email protected]

Notations
Bold capital and small letters are used for matrices and vectors, respectively; small italic letters denote scalar quantities; $w_{\cdot,i}$ and $h_{i,\cdot}$ denote the $i$th column vector and the $i$th row vector, respectively; $Q(i,j)$ denotes the $(i,j)$th element of a matrix $Q$; $p(\cdot)$ indicates a permutation; $\mathrm{vec}(Q)$ indicates the vectorization of a matrix $Q$, which stacks the matrix into a column vector; $(\cdot)^{+}$ indicates the pseudo-inverse of a matrix; $Q^{T}$ is the transpose of the matrix $Q$; $\mathbb{R}^{m \times n}$ indicates the set of $m \times n$-dimensional real matrices; $\|\cdot\|_{F}$ denotes the Frobenius norm; $\|\cdot\|_{2}$ denotes the $L_{2}$-norm of a vector; $\mathbb{Z}$ denotes the integers; $\mathbb{R}_{+}$ indicates the nonnegative real numbers; $\mathbb{S}^{n}_{+}$ denotes the set of $n \times n$-dimensional symmetric positive semidefinite matrices; $\otimes$ denotes the Kronecker product; $I_{n}$ is the $n$-dimensional identity matrix; $0_{n}$ denotes the zero vector of length $n$; $\geq$ or $\leq$ indicates component-wise inequality for matrices and vectors.

A data matrix $Z \in \mathbb{R}^{n \times m}$ can be factorized as the product of $W \in \mathbb{R}^{n \times s}$ and $H \in \mathbb{R}^{s \times m}$ for $s \ll \min(m, n)$ as follows:

$$Z \approx WH. \qquad (1)$$

In BSS problems, $W$ is the unknown mixing matrix and $H$ is the source signal matrix containing the unknown signals. Then, $m$ is the number of observations, $n$ the number of samples, and $s$ the number of sources. Note that Eq. (1) considers an approximate factorization of the data matrix. Since physical systems are often corrupted by noise, an exact factorization of the data $Z$ into $W$ and $H$ should not be attempted. In the presence of noise, the exact decomposition of $Z$ can be written as

$$Z = WH + E, \qquad (2)$$

where $E$ is an $n \times m$-dimensional matrix containing the contribution of the noise in the given data.

If $W$ is known and $\mathrm{rank}(W) = s$, then the least-squares solution for the unknown signal matrix is given by $\hat{H} = W^{+}Z$. However, $W$ is often not known, and the objective of BSS problems is to estimate the unknown mixing matrix and the signal matrix from the given data matrix. Note that BSS problems can also be viewed as unsupervised learning problems in which the objective is to extract useful features from the given data. To accomplish this task, prior knowledge about the underlying system, such as non-negativity or sparsity, is imposed in the form of constraints. These constraints lead to factorizations of the data matrix with meaningful physical insights. The non-negativity constraint leads to the popular non-negative matrix factorization (NMF), described briefly next.

Nonnegative matrix factorization (NMF)

NMF aims to factorize $Z$ into two nonnegative matrices $W \in \mathbb{R}^{n \times s}_{+}$ and $H \in \mathbb{R}^{s \times m}_{+}$. In NMF, the following problem is solved to obtain $W$ and $H$:

$$\min_{W, H}\ \|Z - WH\|_{F}^{2} \quad \text{s.t.}\ W \geq 0,\ H \geq 0. \qquad (3)$$

Imposing the nonnegativity constraints means that the data are represented using only additive combinations of rank-one factors. Several algorithms have been proposed in the literature to solve Eq. (3) efficiently. However, the NMF solution suffers from the following ambiguities [14]:

• Scale ambiguity: Consider a non-singular matrix $T \in \mathbb{R}^{s \times s}_{+}$. Then, the factorization of $Z$ can also be represented as

$$Z \approx WH = (WT)(T^{-1}H) = W_{T}H_{T}. \qquad (4)$$

Eq. (4) indicates that $W_{T}$ and $H_{T}$ are also a solution of the problem (3). Hence, NMF converges to a local optimum that depends on the initialization, and the problem (3) admits numerous solutions.

• Ordering ambiguity: Note that $Z \approx \sum_{i} w_{\cdot,i}\, h_{i,\cdot} = \sum_{i} w_{\cdot,p(i)}\, h_{p(i),\cdot}$, where $p(\cdot)$ is a permutation. Hence, it is not possible to recover the order of the columns of $W$ and of the rows of $H$.

Therefore, the formulation of NMF in Eq. (3) has many solutions. The following sketch illustrates the scale ambiguity numerically.
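As a quick numerical check of the scale ambiguity (4), the MATLAB sketch below builds random nonnegative factors with illustrative, assumed dimensions and verifies that the rescaled factors reproduce the same product:

% Numerical sketch of the scale ambiguity (4); sizes are assumptions.
n = 8; s = 3; m = 50;
W = rand(n, s);                 % nonnegative mixing matrix
H = rand(s, m);                 % nonnegative signal matrix
T = diag(1 + rand(s, 1));       % nonsingular, nonnegative scaling matrix
WT = W * T;                     % rescaled mixing matrix W_T
HT = T \ H;                     % inversely rescaled signal matrix H_T = T^{-1}H
fprintf('||WH - W_T H_T||_F = %g\n', norm(W*H - WT*HT, 'fro'));  % ~ 1e-15

Any positive diagonal $T$ works here; a general nonnegative nonsingular $T$ additionally requires $T^{-1}H \geq 0$ for the rescaled pair to remain feasible.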
Additional constraints, such as sparsity or partial or full knowledge of a noisy $W$, need to be imposed to reduce the number of solutions or to obtain a unique solution of the problem (3). These constraints depend on the application and the underlying physical system. Often, the source signals are in increasing or decreasing order, i.e., they are monotonous in nature. In such cases, each row of $H$ contains ordered elements. Hence, a monotonicity constraint can be added to the optimization problem (3) for recovering these signals from the data matrix. The main objective of this paper is to develop an approach based on NMF for recovering monotonous source signals. This leads to a novel method, called monotonous NMF, which is described next.

In this section, the monotonicity constraints on the source signals are formulated and imposed on the factorization problem (3). Consider the $i$th row of the matrix $H$, $h_{i,\cdot}$ with $i = 1, \ldots, s$, whose elements are in monotonously increasing or decreasing order. Without loss of generality, we assume that, out of the $s$ rows, the first $c$ rows have entries that are monotonously increasing, while the remaining rows have entries that are monotonously decreasing. Then, the following relationships hold between the elements:

$$h_{k,1} \leq h_{k,2} \leq \ldots \leq h_{k,m}, \quad k = 1, \ldots, c \quad \text{(increasing order)},$$
$$h_{l,1} \geq h_{l,2} \geq \ldots \geq h_{l,m}, \quad l = c+1, \ldots, s \quad \text{(decreasing order)}. \qquad (5)$$

The optimization problem in Eq. (3) can be reformulated by imposing the constraints in Eq. (5) as:

$$\min_{W, H}\ \|Z - WH\|_{F}^{2} \quad \text{s.t.}\ W \geq 0,\ H \geq 0,$$
$$h_{k,1} \leq h_{k,2} \leq \ldots \leq h_{k,m}, \quad k = 1, \ldots, c,$$
$$h_{l,1} \geq h_{l,2} \geq \ldots \geq h_{l,m}, \quad l = c+1, \ldots, s. \qquad (6)$$

The formulation in Eq. (6) is a nonconvex optimization problem. A solution of the problem (6) can be obtained by decomposing it into two subproblems that are solved alternately until convergence. This approach is known as alternating least squares (ALS) in the literature [8, 11]. The two subproblems are formulated as follows:

• Given $H$:

$$\min_{W}\ \|Z - WH\|_{F}^{2} \quad \text{s.t.}\ W \geq 0 \qquad (7)$$

• Given $W$:

$$\min_{H}\ \|Z - WH\|_{F}^{2} \quad \text{s.t.}\ H \geq 0,$$
$$h_{k,1} \leq h_{k,2} \leq \ldots \leq h_{k,m}, \quad k = 1, \ldots, c,$$
$$h_{l,1} \geq h_{l,2} \geq \ldots \geq h_{l,m}, \quad l = c+1, \ldots, s. \qquad (8)$$

Note that the objective functions in Eqs. (7) and (8) are quadratic. If these subproblems can be written as standard quadratic programs of the form

$$\min_{x}\ x^{T}Px + f^{T}x \quad \text{s.t.}\ Gx \leq l,\ Kx = r, \qquad (9)$$

where $x \in \mathbb{R}^{p}$, $P \in \mathbb{S}^{p}_{+}$, $G \in \mathbb{R}^{t \times p}$, $l \in \mathbb{R}^{t}$, $K \in \mathbb{R}^{b \times p}$, and $r \in \mathbb{R}^{b}$, then both subproblems are convex optimization problems [2]. The standard ALS framework can then be applied to solve the problems (7)–(8). We rewrite the problems (7)–(8) in the standard form (9) as:

• Given $H$:

$$\min_{W}\ \mathrm{vec}(W)^{T}(HH^{T} \otimes I_{n})\,\mathrm{vec}(W) - 2\,\mathrm{vec}(ZH^{T})^{T}\mathrm{vec}(W) \quad \text{s.t.}\ -I_{ns}\,\mathrm{vec}(W) \leq 0_{ns} \qquad (10)$$

• Given $W$:

$$\min_{H}\ \mathrm{vec}(H^{T})^{T}(W^{T}W \otimes I_{m})\,\mathrm{vec}(H^{T}) - 2\,\mathrm{vec}(Z^{T}W)^{T}\mathrm{vec}(H^{T}) \quad \text{s.t.}\ \begin{bmatrix} -I_{ms} \\ A \end{bmatrix}\mathrm{vec}(H^{T}) \leq 0_{(2m-1)s} \qquad (11)$$

Here $\mathrm{vec}(H^{T})$, which stacks the rows of $H$, is used so that the monotonicity constraints take the block-diagonal form

$$A = \begin{bmatrix} D_{1} & & \\ & \ddots & \\ & & D_{s} \end{bmatrix} \in \mathbb{R}^{(m-1)s \times ms},$$

where the entries not shown in $A$ are zeros. The matrices $D_{k} \in \mathbb{R}^{(m-1) \times m}$, $k = 1, \ldots, c$, and $D_{l} \in \mathbb{R}^{(m-1) \times m}$, $l = c+1, \ldots, s$, are defined as

$$D_{k} = \begin{bmatrix} 1 & -1 & & \\ & \ddots & \ddots & \\ & & 1 & -1 \end{bmatrix}, \qquad D_{l} = \begin{bmatrix} -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{bmatrix}.$$

Note that $D_{k}$ and $D_{l}$ are Toeplitz matrices with first rows $[1\ {-1}\ 0\ \ldots\ 0]$ and $[{-1}\ 1\ 0\ \ldots\ 0]$, respectively.

The matrices $(HH^{T} \otimes I_{n})$ and $(W^{T}W \otimes I_{m})$ in Eqs. (10) and (11) are positive semidefinite, and they are positive definite whenever $H$ and $W$ have full rank $s$. Further, both subproblems (10)–(11) are subject only to linear inequality constraints. The construction of the matrix $A$ and the solution of subproblem (11) are sketched below.
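The following MATLAB sketch assembles the difference matrices $D_k$, $D_l$, and the block-diagonal matrix $A$, and solves subproblem (11) with quadprog (Optimization Toolbox). The data are random placeholders, and the nonnegativity constraint is imposed through quadprog's lower bound, which is equivalent to $-I\,\mathrm{vec}(H^{T}) \leq 0$:

% Sketch of subproblem (11) for a fixed W; Z, W, and c are placeholders.
n = 8; m = 50; s = 3; c = 2;        % c = number of increasing rows (assumed)
Z = rand(n, m); W = rand(n, s);
E = eye(m);
Dinc = E(1:m-1, :) - E(2:m, :);     % rows encode h_j - h_{j+1} <= 0 (increasing)
Ddec = -Dinc;                       % rows encode h_{j+1} - h_j <= 0 (decreasing)
blocks = cell(1, s);
for k = 1:s
    if k <= c, blocks{k} = Dinc; else blocks{k} = Ddec; end
end
A = blkdiag(blocks{:});             % ((m-1)s x ms) block-diagonal matrix of (11)
P = kron(W'*W, eye(m));             % quadratic term acting on vec(H^T)
f = -2 * reshape(Z'*W, [], 1);      % linear term, -2 vec(Z^T W)
% quadprog minimizes (1/2) x'Qx + f'x, so pass Q = 2P; the lower bound
% zeros(m*s,1) enforces H >= 0.
x = quadprog(2*P, f, A, zeros((m-1)*s, 1), [], [], zeros(m*s, 1), []);
H = reshape(x, m, s)';              % fold vec(H^T) back into the s x m matrix H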
Hence, the subproblems (10)–(11) are quadratic programs (QPs) of the form (9). Consequently, they are convex optimization problems [2], and optimal solutions of the problems in Eqs. (10)–(11) exist. Standard QP algorithms, such as active-set and interior-point methods [2, 12], can be used to solve the optimization problems (10)–(11) in the ALS framework. Implementations of these algorithms are available in commercial scientific packages.

Next, we present an algorithm that solves the optimization problems (10)–(11) for estimating $W$ and $H$ given $Z$. Further, one can impose upper bounds on the elements of $W$ and $H$ based on the data matrix and the underlying physical problem. Algorithm 1 presents an implementation for solving the monotonous NMF problem (6) in the ALS framework.

Data: $Z$, number of source signals $s \ll \min(m, n)$
Result: $W$, $H$
initialization: $W_{old}$, $H_{old}$;
while $\|W_{old} - W_{new}\|_{F} \geq$ tolerance and $\|H_{old} - H_{new}\|_{F} \geq$ tolerance do
    $W_{new} \Leftarrow$ solution of (10);
    $H_{new} \Leftarrow$ solution of (11);
end
Return $W = W_{new}$, $H = H_{new}$

Algorithm 1: Algorithm for Monotonous NMF
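A compact MATLAB sketch of Algorithm 1 is given below. The helper solve_monotone_H is a hypothetical name wrapping the quadprog step of the previous listing; the $W$ step is solved row by row with the built-in lsqnonneg, which is equivalent to subproblem (10) because the Frobenius objective decouples over the rows of $W$:

% Sketch of Algorithm 1 (save as monotonous_nmf.m); solve_monotone_H is a
% hypothetical helper wrapping the quadprog step shown earlier.
function [W, H] = monotonous_nmf(Z, s, c, tol, maxit)
[n, m] = size(Z);
W = rand(n, s); H = rand(s, m);            % random nonnegative initialization
for it = 1:maxit
    Wold = W; Hold = H;
    for i = 1:n                            % subproblem (10): Z(i,:)' ~ H' * W(i,:)'
        W(i, :) = lsqnonneg(H', Z(i, :)')';
    end
    H = solve_monotone_H(Z, W, c);         % subproblem (11), as sketched above
    d = max(sqrt(sum(H.^2, 2)), eps);      % normalize the rows of H and
    H = H ./ d; W = W .* d';               % rescale the columns of W to match
    if norm(W - Wold, 'fro') < tol && norm(H - Hold, 'fro') < tol
        break;                             % both factors have stopped changing
    end
end
end

The normalization step inside the loop anticipates the scaling discussed next; it leaves the product $WH$ unchanged.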
In Algorithm 1, $H$ can be normalized after each iteration; $W$ then has to be rescaled accordingly before proceeding to the next iteration. Further, note that the subproblem (10) is a standard subproblem in NMF algorithms based on the ALS framework. Hence, the multiplicative update rules proposed in [1, 10] can also be used to update $W$ instead of solving the optimization problem (10). It has been shown that algorithms in the ALS framework (such as Algorithm 1) decrease the objective value at each iteration because each subproblem is solved to a stationary point [8, 11]. Hence, Algorithm 1 converges to a solution, although the convergence may be slow.

This section demonstrates how the idea of monotonous NMF can be extended by relaxing the nonnegativity constraints on the mixing matrix $W$ and the data matrix $Z$. This allows monotonous NMF to be applied to data matrices containing negative entries. When the nonnegativity constraint is relaxed, the entries of $W$ can have either sign. This kind of factorization is referred to as semi-NMF in the literature [4]. The entries of the signal matrix, however, remain nonnegative and monotonous. This relaxation on the mixing matrix allows non-positive entries in the data matrix. The optimization problem (7) can be reformulated without the non-negativity constraint as follows:

$$\min_{W}\ \|Z - WH\|_{F}^{2}. \qquad (12)$$

For given $H$, Eq. (12) is a least-squares problem, and hence the analytical update rule

$$W = ZH^{T}(HH^{T})^{+} \qquad (13)$$

is obtained. In Eq. (13), $HH^{T}$ is a positive semidefinite matrix, and its inverse (or pseudo-inverse) exists. With this update rule, a modified version of Algorithm 1 is given in Algorithm 2.

Data: $Z$, number of source signals $s \ll \min(m, n)$
Result: $W$, $H$
initialization: $W_{old}$, $H_{old}$;
while $\|W_{old} - W_{new}\|_{F} \geq$ tolerance and $\|H_{old} - H_{new}\|_{F} \geq$ tolerance do
    $W_{new} \Leftarrow Z H_{old}^{T} (H_{old} H_{old}^{T})^{+}$;
    $H_{new} \Leftarrow$ solution of (11);
end
Return $W = W_{new}$, $H = H_{new}$

Algorithm 2: Algorithm for Semi-NMF
This extension is in line with the work in [4]. Further, Algorithm 2 converges to a solution by the same argument as for Algorithm 1.
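In code, the only change relative to the Algorithm 1 sketch is the $W$ step, which now uses the closed-form update (13); solve_monotone_H is again the hypothetical helper from the earlier listing:

W = Z * H' * pinv(H * H');      % update rule (13); pinv also covers a
                                % rank-deficient H * H'
H = solve_monotone_H(Z, W, c);  % monotonicity-constrained H step, as in (11)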
In this section, we demonstrate monotonous NMF on synthetic data involving three source signals. Simulation studies are performed for two scenarios: (S1) monotonically increasing source signals, and (S2) mixed monotonous source signals (two increasing signals and one decreasing signal). The noise-free data matrix is constructed in the following manner. Three monotonous source signals are considered, as shown in Figures 1 and 2 for Scenarios S1 and S2, respectively. Fifty sample points of the signals are available in both scenarios. The mixing matrix of size (8 × 3) is generated from the uniform distribution on the interval (0,1). The noise-free data matrix $Z$ of size (8 × 50) is obtained by multiplying the mixing matrix and the signal source matrix. To generate the noisy data, 5% random uniformly distributed noise is added to the noise-free data. All simulations are performed in MATLAB. For Algorithm 1, the "quadprog" function is used to solve the problems (10)–(11). A sketch of this data construction is given below.

The following three methods are applied to both scenarios: (i) monotonous NMF (labeled MNMF), (ii) NMF using multiplicative rules (labeled NNMF; the MATLAB built-in function), and (iii) the fast NMF algorithm (labeled NMF) based on active-set methods in [8], with the assumption of three source signals.

Figure 1: Source signals extracted from the noisy data for Scenario S1
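The data construction can be sketched in a few lines of MATLAB; the functional forms of the three sources and the multiplicative form of the 5% noise are assumptions for illustration, since the paper does not specify them:

% Assumed synthetic setup (Scenario S2-like: two increasing, one decreasing).
m = 50; t = linspace(0, 1, m);
H = [t; sqrt(t); 1 - t];                        % monotonous source signals
W = rand(8, 3);                                 % mixing matrix, uniform on (0,1)
Z0 = W * H;                                     % noise-free data matrix (8 x 50)
Z  = Z0 + 0.05 * (2*rand(size(Z0)) - 1) .* Z0;  % add ~5% uniform noise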
Scenario S1
The normalized source signals obtained by applying the three methods, along with the true signals, are shown in Figure 1 for Scenario S1. The reconstruction errors ($\|Z - WH\|_{F}$) are 0.1197, 0.1564, and 0.9835 for MNMF, NNMF, and NMF, respectively. This result shows that MNMF performs well in comparison to NNMF and NMF. It should be noted that NMF estimates a rank-deficient source signal matrix ($\mathrm{rank}(H) = 1$ for NMF). In other words, NMF factorizes the data matrix into a single rank-one factor instead of three rank-one factors, while MNMF and NNMF factorize the data matrix into three rank-one factors. However, NNMF fails to capture the monotonous behaviour of the source signals.

Scenario S2
The normalized source signals obtained using the three methods, along with the true signals, are shown in Figure 2 for Scenario S2. The reconstruction errors ($\|Z - WH\|_{F}$) are 0.1189, 0.1399, and 0.5736 for MNMF, NNMF, and NMF, respectively. In this case, MNMF performs better than the other methods in capturing both kinds of monotonous signals. NMF again estimates a rank-deficient source signal matrix, although the rank of $H$ has increased by one ($\mathrm{rank}(H) = 2$ for NMF). Note that NNMF fails to capture the monotonous behaviour of the source signals in Scenario S2 as well.

Figure 2: Source signals extracted from the noisy data for Scenario S2

This paper has proposed an approach to incorporate the notion of monotonicity in NMF. The new extension is called monotonous NMF. Monotonous NMF has been applied to recover monotonous source signals from noisy data. Further, we have extended monotonous NMF to semi-NMF by relaxing the non-negativity constraints on the mixing matrix. Algorithms for monotonous (semi-)NMF using quadratic programming have been proposed in the ALS framework. The illustrative examples show that monotonous NMF performs better than the NMF algorithms from the literature when the source signals exhibit monotonous behaviour. This indicates the importance of the monotonicity constraint in NMF.
References

[1] M. W. Berry, M. Browne, A. M. Langville, V. P. Pauca, and R. J. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics & Data Analysis, 52:155–173, 2007.

[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, UK, 2004.

[3] A. Cichocki, R. Zdunek, A. H. Phan, and S. Amari. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. John Wiley & Sons, Ltd, New York, USA, 2009.

[4] C. Ding, T. Li, and M. I. Jordan. Convex and semi-nonnegative matrix factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32:45–55, 2010.

[5] D. L. Donoho and V. C. Stodden. When does non-negative matrix factorization give a correct decomposition into parts? In Advances in Neural Information Processing Systems (NIPS), volume 16, pages 1141–1148, 2003.

[6] P. O. Hoyer. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5:1457–1469, 2004.

[7] K. Huang, N. D. Sidiropoulos, and A. Swami. Non-negative matrix factorization revisited: Uniqueness and algorithm for symmetric decomposition. IEEE Transactions on Signal Processing, 62:211–224, 2014.

[8] J. Kim and H. Park. Fast nonnegative matrix factorization: An active-set-like method and comparisons. SIAM Journal on Scientific Computing, 33:3261–3281, 2011.

[9] D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788–791, 1999.

[10] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems (NIPS), volume 13, pages 556–562, 2001.

[11] C. J. Lin. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19:2756–2779, 2007.

[12] J. Nocedal and S. J. Wright. Numerical Optimization. Springer-Verlag New York, Inc., USA, 1999.

[13] P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5:111–126, 1994.

[14] J. Rapin, J. Bobin, A. Larue, and J. Starck. Sparse and non-negative BSS for noisy data. IEEE Transactions on Signal Processing, 61(22):5620–5632, 2013.

[15] T. Virtanen. Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria. IEEE Transactions on Audio, Speech, and Language Processing, 15(3):1066–1074, 2007.

[16] R. Zdunek and A. Cichocki. Nonnegative matrix factorization with quadratic programming.