Sparse Separable Nonnegative Matrix Factorization
Nicolas Nadisic, Arnaud Vandaele, Jeremy E. Cohen, Nicolas Gillis
University of Mons, Belgium. {nicolas.nadisic, arnaud.vandaele, nicolas.gillis}@umons.ac.be
Univ Rennes, Inria, CNRS, IRISA, Rennes, France. [email protected]
Abstract.
We propose a new variant of nonnegative matrix factorization (NMF), combining separability and sparsity assumptions. Separability requires that the columns of the first NMF factor are equal to columns of the input matrix, while sparsity requires that the columns of the second NMF factor are sparse. We call this variant sparse separable NMF (SSNMF), which we prove to be NP-complete, as opposed to separable NMF which can be solved in polynomial time. The main motivation to consider this new model is to handle underdetermined blind source separation problems, such as multispectral image unmixing. We introduce an algorithm to solve SSNMF, based on the successive nonnegative projection algorithm (SNPA, an effective algorithm for separable NMF), and an exact sparse nonnegative least squares solver. We prove that, in noiseless settings and under mild assumptions, our algorithm recovers the true underlying sources. This is illustrated by experiments on synthetic data sets and the unmixing of a multispectral image.
Keywords:
Nonnegative Matrix Factorization · Sparsity · Separability.
1 Introduction

Nonnegative Matrix Factorization (NMF) is a low-rank model widely used for feature extraction in applications such as multispectral imaging, text mining, or blind source separation; see [8,6] and the references therein. Given a nonnegative data matrix M ∈ ℝ^{m×n}_+ and a factorization rank r, NMF consists in finding two nonnegative matrices W ∈ ℝ^{m×r}_+ and H ∈ ℝ^{r×n}_+ such that M ≈ WH. NMF can be formalized as the following optimization problem:

    min_{W ≥ 0, H ≥ 0} ‖M − WH‖_F .    (1)

In this paper, we use the Frobenius norm to measure the quality of the approximation. Although other measures are possible, the Frobenius norm is by far the most commonly used, because it assumes Gaussian noise (which is reasonable in many real-life applications) and allows for efficient computations [8].

One of the advantages of NMF over similar methods such as principal component analysis (PCA) is that the nonnegativity constraint favors a part-based representation [13], which is to say that the factors are more easily interpretable, in particular when they have a physical meaning. If each column of M represents a data point, then each corresponding column of H contains the coefficients to reconstruct it from the r atoms represented by the columns of W, since M(:, j) ≈ W H(:, j) for all j. Every data point is therefore expressed as a linear combination of atoms. For example, when using NMF for multispectral unmixing, a data point is a pixel, an atom is a specific material, and each column of H contains the abundance of these materials in the corresponding pixel; see Section 5.2 for more details. Geometrically, the atoms (columns of W) can be seen as r vertices whose convex hull contains the data points (columns of M), under appropriate scaling.

In general, computing NMF is NP-hard [19]. However, Arora et al. [2] proved that NMF is solvable in polynomial time under the separability assumption on the input matrix.
Definition 1.
A matrix M is r-separable if there exists a subset of r columns of M, indexed by J, and a nonnegative matrix H ≥ 0, such that M = M(:, J)H.

Equivalently, M is r-separable if M has the form M = W [I_r, H′] Π, where I_r is the identity matrix of size r, H′ is a nonnegative matrix, and Π is a permutation. Separable NMF consists in selecting the right r columns of M such that M can be reconstructed perfectly. In other words, it consists in finding the atoms (columns of W) among the data points (columns of M).
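As a small illustration, the following Matlab sketch (with hypothetical sizes) builds an r-separable matrix of the form M = W[I_r, H′]Π described above; it is only meant to make the definition concrete.

```matlab
% Toy illustration of Definition 1 (hypothetical sizes): build an r-separable
% matrix M = W * [I_r, H'] * Pi with nonnegative W and H', Pi a permutation.
m = 5; r = 3; n = 10;
W  = rand(m, r);                        % the r atoms
Hp = rand(r, n - r);                    % nonnegative coefficients H'
H  = [eye(r), Hp];                      % H = [I_r, H']
Pi = eye(n); Pi = Pi(:, randperm(n));   % random permutation matrix
M  = W * H * Pi;                        % r of the columns of M are the atoms themselves
```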
Problem 1 (Separable NMF). Given an r-separable matrix M, find W = M(:, J) with |J| = r and H ≥ 0 such that M = WH.

Note that, if W is known, the computation of H is straightforward: it is a convex problem that can be solved using any nonnegative least squares (NNLS) solver (for example, it can be solved with the Matlab function lsqnonneg). However, the solution is not necessarily unique, unless W is full rank.

In the presence of noise, which is typically the case in real-life applications, this problem is called near-separable NMF and is also solvable in polynomial time, given that the noise level is sufficiently small [2]. In this case, we are given a near-separable matrix M ≈ M(:, J)H where |J| = r and H ≥ 0.

Various algorithms have been developed to tackle the (near-)separable NMF problem. Some examples are the successive projections algorithm (SPA) [1], the fast conical hull algorithm [12], and the successive nonnegative projections algorithm (SNPA) [7]. Such algorithms start with an empty matrix W and a residual matrix R = M, and then alternate between two steps: a greedy selection of one column of R to be added to W, and an update of R using M and the columns extracted so far. As SNPA was shown, both theoretically and empirically, to perform better and to be more robust than its competitors [7], it is the one we study here in detail. Moreover, SNPA is able to handle the underdetermined case when rank(W) < m, which will be key for our problem setting (see below for more details).
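As noted above, once the index set J is known, each column of H can be computed independently with any NNLS solver. A minimal Matlab sketch using the built-in lsqnonneg (and assuming M and J are available in the workspace) is:

```matlab
% Minimal NNLS sketch: compute H given the selected columns M(:, J).
W = M(:, J);
H = zeros(numel(J), size(M, 2));
for j = 1:size(M, 2)
    H(:, j) = lsqnonneg(W, M(:, j));    % min_h ||M(:,j) - W*h||_2  s.t.  h >= 0
end
fprintf('residual = %g\n', norm(M - W * H, 'fro'));   % zero for an exactly separable M
```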
Algorithm 1: SNPA
Input:
A near-separable matrix M ∈ ℝ^{m×n}, the number r of columns to be extracted, and a strongly convex function f with f(0) = 0 (by default, f(x) = ‖x‖₂²).
Output: A set of r indices J, and a matrix H ∈ ℝ^{r×n}_+ such that M ≈ M(:, J)H.

1: R ← M
2: J ← {}
3: t ← 1
4: while R ≠ 0 and t ≤ r do
5:     p = argmax_j f(R(:, j))
6:     J = J ∪ {p}
7:     foreach j do
8:         H*(:, j) = argmin_{h ∈ Δ} f(M(:, j) − M(:, J) h)
9:     R(:, j) = M(:, j) − M(:, J) H*(:, j)
10:    t = t + 1

SNPA is presented in Algorithm 1. SNPA selects, at each step, the column of M maximizing a function f (which can be any strongly convex function such that f(0) = 0; f = ‖·‖₂² is the most common choice). Then, the columns of M are projected onto the convex hull of the origin and the columns extracted so far; see step 8, where we use the notation

    Δ = { h | h ≥ 0, Σ_i h_i ≤ 1 },

whose dimension is clear from the context. After r steps, given that the noise is sufficiently small and that the columns of W are vertices of conv(W), SNPA is guaranteed to identify W. An important point is that SNPA requires the columns of H to satisfy ‖H(:, j)‖₁ ≤ 1 for all j, where ‖x‖₁ = Σ_i |x_i| is the ℓ1 norm. This assumption can be made without loss of generality by properly scaling the columns of the input matrix to have unit ℓ1 norm; see the discussion in [7].
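For concreteness, here is a simplified Matlab sketch of Algorithm 1 with f(x) = ‖x‖₂². It is only an illustration of the steps above, not the authors' implementation: a basic projected-gradient routine is used for the projection onto Δ.

```matlab
function [J, H] = snpa_sketch(M, r)
% Simplified sketch of SNPA (Algorithm 1) with f(x) = ||x||_2^2.
[~, n] = size(M);
J = []; H = zeros(r, n); R = M;
for t = 1:r
    if norm(R, 'fro') < 1e-9, break; end
    [~, p] = max(sum(R.^2, 1));             % step 5: column maximizing f(R(:,j))
    J = [J, p];                             % step 6
    W = M(:, J);
    for j = 1:n                             % steps 7-8: project onto conv([0, W])
        H(1:numel(J), j) = simplex_ls(W, M(:, j));
    end
    R = M - W * H(1:numel(J), :);           % step 9
end
H = H(1:numel(J), :);
end

function h = simplex_ls(W, x)
% Projected gradient for  min_h ||x - W*h||_2^2  s.t.  h >= 0, sum(h) <= 1.
L = norm(W' * W); h = zeros(size(W, 2), 1);
for it = 1:500
    h = proj_simplex_ineq(h - (W' * (W * h - x)) / L);
end
end

function p = proj_simplex_ineq(v)
% Euclidean projection onto Delta = {h | h >= 0, sum(h) <= 1}.
p = max(v, 0);
if sum(p) <= 1, return; end
u = sort(v, 'descend'); sv = cumsum(u);
rho = find(u - (sv - 1) ./ (1:numel(u))' > 0, 1, 'last');
p = max(v - (sv(rho) - 1) / rho, 0);
end
```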
[Figure 1 appears here: two 3-dimensional examples, (a) and (b), showing data points M(:, j), exterior vertices, an interior vertex, and the unit simplex; the columns of W are labeled W1 to W4 and one data point is labeled M1.]
Fig. 1.
On the left (a), all vertices are exterior, and SNPA is assured to identify them all. On the right (b), the data points are 2-sparse combinations of 4 points, one of which (vertex W4) is "interior", hence it cannot be identified with separable NMF.

Unfortunately, some data sets cannot be handled successfully by separable NMF, even when all data points are linear combinations of a subset of the columns of the input matrix. In fact, in some applications, the columns of the basis matrix W, that is, the atoms, might not be vertices of conv(W). This may happen when one seeks a matrix W which is not full column rank. For example, in multispectral unmixing, m is the number of spectral bands, which can be smaller than r, the number of materials present in the image; see Section 5.2 for more details. Therefore, it is possible for some columns of W to be contained in the convex hull of the other columns, that is, to be additive linear combinations of other columns of W; see Figure 1 for illustrations in three dimensions (that is, m = 3).

These difficult cases cannot be handled with separable NMF, because it assumes the data points to be linear combinations of vertices, so an "interior vertex" cannot be distinguished from another data point. However, if we assume the sparsity of the mixture matrix H, we may be able to identify these interior vertices. To do so, we introduce a new model, extending the approach of SNPA using additional sparsity constraints. We introduce in Section 2 a proper definition of this new problem, which we coin sparse separable NMF (SSNMF). Before doing so, let us recall the literature on sparse NMF.

A vector or matrix is said to be sparse when it has few non-zero entries. Sparse NMF is one of the most popular variants of NMF, as it helps producing more interpretable factors. In this model, we usually consider column-wise sparsity of the factor H, meaning that a data point is expressed as the combination of only a few atoms. For example, in multispectral unmixing, the column-wise sparsity of H means that a pixel is composed of fewer materials than the total number of materials present in the image. When sparsity is an a priori knowledge on the structure of the data, encouraging sparsity while computing NMF is likely to reduce noise and produce better results.

Sparse NMF is usually solved by extending standard NMF algorithms with a regularization such as the ℓ1 penalty [9,11], or with constraints on some sparsity measure, like the one introduced in [10]. Recently, exact k-sparse methods based on the ℓ0-"norm" have been used for NMF, using a brute-force approach [4] or a dedicated branch-and-bound algorithm [16]. They allow the explicit definition of a maximum number (usually noted k) of non-zero entries per column of H. These approaches leverage the fact that, in most NMF problems, the factorization rank r is small, hence it is reasonable to solve the k-sparse NNLS subproblems exactly.

In this work, we study the SSNMF model from a theoretical and a practical point of view. Our contributions can be summarized as follows:
– In Section 2, we introduce the SSNMF model. We prove that, unlike separable NMF, SSNMF is NP-complete.
– In Section 3, we propose an algorithm to tackle SSNMF, based on SNPA and an exact sparse NNLS solver.
– In Section 4, we prove that our algorithm is correct under reasonable assumptions, in the noiseless case.
– In Section 5, experiments on both synthetic and real-world data sets illustrate the relevance and efficiency of our algorithm.
2 Sparse Separable NMF

We explained in the previous section why separable NMF does not allow for the identification of "interior vertices", as they are nonnegative linear combinations of other vertices. However, if we assume a certain column-wise sparsity of the coefficient matrix H, they may become identifiable. For instance, the vertex W4 of Figure 1b can be expressed as a combination of the three exterior vertices (W1, W2, and W3), but not as a combination of any two of these vertices. Moreover, some data points cannot be explained using only pairs of exterior vertices, while they can be if we also select the interior vertex W4.
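A tiny numerical illustration of this situation (hypothetical coordinates, k = 2): the fourth column below is a convex combination of the first three, but of no pair of them, so every 2-sparse nonnegative fit of it has a strictly positive residual.

```matlab
% Hypothetical 3-dimensional example: W(:,4) is interior (a combination of
% W(:,1:3)) but is not a nonnegative combination of any *two* columns.
W = [1 0 0 0.3;
     0 1 0 0.3;
     0 0 1 0.4];
x = W(:, 4);
for pair = nchoosek(1:3, 2)'              % all 2-sparse supports
    h = lsqnonneg(W(:, pair), x);
    fprintf('support (%d,%d): residual %.2f\n', pair(1), pair(2), ...
            norm(x - W(:, pair) * h));    % strictly positive in all three cases
end
```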
2.1 Problem Statement and Complexity

Let us denote ‖x‖₀ the number of non-zero entries of the vector x.

Definition 2. A matrix M is k-sparse r-separable if there exists a subset of r columns of M, indexed by J, and a nonnegative matrix H ≥ 0 with ‖H(:, j)‖₀ ≤ k for all j, such that M = M(:, J)H.

Definition 2 corresponds to Definition 1 with the additional constraint that H has k-sparse columns, that is, columns with at most k non-zero entries. A natural assumption to ensure that we can identify W (that is, find the set J) is that the columns of W are not k-sparse combinations of any other columns of W; see Section 4 for the details.
Problem 2 (SSNMF).
Given a k-sparse r-separable matrix M, find W = M(:, J) with |J| = r and a column-wise k-sparse matrix H ≥ 0 such that M = WH.

As opposed to separable NMF, given J, computing H is not straightforward. It requires solving the following ℓ0-constrained optimization problem:

    H* = argmin_{H ≥ 0} f(M − M(:, J)H)  such that  ‖H(:, j)‖₀ ≤ k for all j.    (2)

Because of the combinatorial nature of the ℓ0-"norm", this k-sparse projection is a difficult subproblem with (r choose k) possible solutions, which is known to be NP-hard [17]. In particular, a brute-force approach could tackle this problem by solving O(r^k) NNLS problems. However, this combinatorial subproblem can be solved exactly and at a reasonable cost by dedicated branch-and-bound algorithms, such as arborescent [16], given that r is sufficiently small, which is typically the case in practice.
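A brute-force Matlab sketch of this k-sparse projection for a single column (enumerating all supports and calling an NNLS solver on each) is given below; arborescent [16] solves the same subproblem far more efficiently with branch-and-bound, and this sketch is only meant to make problem (2) concrete.

```matlab
function [h, best] = ksparse_nnls_bruteforce(W, x, k)
% Solve  min_h ||x - W*h||_2  s.t.  h >= 0, ||h||_0 <= k  by enumeration.
r = size(W, 2);
h = zeros(r, 1); best = inf;
supports = nchoosek(1:r, k);              % all (r choose k) candidate supports
for s = 1:size(supports, 1)
    idx = supports(s, :);
    hs  = lsqnonneg(W(:, idx), x);        % NNLS restricted to this support
    res = norm(x - W(:, idx) * hs);
    if res < best
        best = res;
        h = zeros(r, 1); h(idx) = hs;
    end
end
end
```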
Even when k is fixed, the following result shows that no provably correct algorithm exists for solving SSNMF in polynomial time (unless P=NP).

Theorem 1. SSNMF is NP-complete for any fixed k ≥ 2.

Proof. The proof is given in Appendix A. Note that the case k = 1 is trivial since each data point is then a multiple of a column of W.

However, in Section 4, we show that, under a reasonable assumption, SSNMF can be solved in polynomial time when k is fixed.

2.2 Related Work

To the best of our knowledge, the only work presenting an approach to tackle SSNMF is the one by Sun and Xin (2011) [18], and it does so only partially. It studies the blind source separation of nonnegative data in the underdetermined case. The problem tackled is equivalent to NMF in the case m < r. The assumptions used in this work are similar to ours, that is, separability and sparsity. However, the setup considered is less general than SSNMF because the sparsity assumption (on each column of H) is limited to k = m − 1, while the only case considered theoretically is the case r = m + 1 with only one interior vertex. The proposed algorithm first extracts the exterior vertices using the method LP-BSS from [15], and then identifies the interior vertex using a brute-force geometric method. More precisely, they select an interior point and check whether at least two of the hyperplanes generated by this vertex together with m − 2 of the extracted exterior vertices contain other data points. If it is the case, then they conclude that the selected point is an interior vertex; otherwise they select another interior point. For example, when m = 3, this method consists in constructing the segments between the selected interior point and all the exterior vertices. If two of these segments contain at least one data point, then the method stops and the selected interior point is chosen as the interior vertex. Looking at Figure 1b, the only interior point for which two segments joining this point and an exterior vertex contain data points is W4. Note that, to be guaranteed to work, this method requires at least two hyperplanes containing the interior vertex and m − 2 exterior vertices to contain data points. This will not be a requirement in our method.

3 Proposed Algorithm: brassens

In the following, we assume that the input matrix M is k-sparse r-separable. Our algorithm, called brassens (the name stands for "Relies on Assumptions of Separability and Sparsity for Elegant NMF Solving"), is presented formally in Algorithm 2.

Algorithm 2: brassens
Input: A k-sparse near-separable matrix M ∈ ℝ^{m×n}, and the desired sparsity level k.
Output: A set of r indices J, and a matrix H ∈ ℝ^{r×n}_+ such that ‖H(:, j)‖₀ ≤ k for all j and M ≈ M(:, J)H.

1: J = SNPA(M, ∞)
2: J′ = kSSNPA(M, ∞, J)
3: foreach j ∈ J′ do
4:     if min_{h ≥ 0, ‖h‖₀ ≤ k} f(M(:, j) − M(:, J′ \ {j}) h) > 0 then
5:         J = J ∪ {j}
6: H = arborescent(M, M(:, J), k)

On Line 1, we apply the original SNPA to select the exterior vertices; it is computationally cheap and ensures that these vertices are properly identified. The symbol ∞ means that SNPA stops only when the residual error is zero. For the noisy case, we replace the condition R ≠ 0 by ‖R‖ > δ, where δ is a user-provided noise-tolerance threshold. Then, we adapt SNPA to impose a k-sparsity constraint on H: the projection step (Line 8 of Algorithm 1) is replaced by a k-sparse projection step that imposes the columns of H to be k-sparse by solving (2). We call kSSNPA this modified version of SNPA. Note that, if k = r, kSSNPA reduces to SNPA.
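In code, the only change with respect to the SNPA sketch given earlier is the projection step. A sketch of the k-sparse projection used by kSSNPA, relying on the illustrative brute-force routine ksparse_nnls_bruteforce shown above (and not on the authors' implementation), could look as follows:

```matlab
function [H, R] = ksparse_projection(M, J, k)
% k-sparse projection: fit every column of M by a k-sparse nonnegative
% combination of the selected columns M(:, J), and return the residual.
n = size(M, 2);
H = zeros(numel(J), n);
for j = 1:n
    H(:, j) = ksparse_nnls_bruteforce(M(:, J), M(:, j), k);
end
R = M - M(:, J) * H;      % kSSNPA keeps extracting columns while norm(R,'fro') > 0
end
```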
Relies on Assumptions of Separability and Sparsity for Ele-gant NMF Solving. N. Nadisic et al. k -sparse hull of the selected points, that is, the union of the convex hulls ofthe subsets of k already selected points. For example, in Figure 1b, if W , W ,and W are selected, the k -sparse hull is composed of the 3 segments [ W , W ] , [ W , W ] , and [ W , W ] . In this case, although only point W is a interior vertex,point M is selected before point W , because it is located further away fromthe k -sparse hull.On Lines 3 to 5, we apply a postprocessing to the selected points by checkingwhether they are k -sparse combinations of other selected points; this is a k -sparseNNLS problem solved with arborescent [16]. If they are, then they cannot bevertices and they are discarded, such as point M in Figure 1b which belongs tothe segment [ W , W ] .Note that this “postprocessing” could be applied directly to the whole dataset by selecting data points as the columns of W if they are not k -sparse com-binations of other data points. However, this is not reasonable in practice, asit is equivalent to solving n times a k -sparse NNLS subproblem in n − vari-ables. The kSSNPA step can thus be interpreted as a safe screening technique,similarly as done in [5] for example, in order to reduce the number of candidateatoms from all the columns of M to a subset J (cid:48) of columns. In practice, we haveobserved that kSSNPA is very effective at identifying good candidates points;see Section 5.1. brassens In this section, we first discuss the assumptions that guarantee brassens torecover W given the k -sparse r -separable matrix M , and then discuss the com-putational complexity of brassens . In this section, we show that, given a k -sparse r -separable matrix M , the bras-sens algorithm provably solves SSNMF, that is, it is able to recover the correctset of indices J such that W = M (: , J ) , under a reasonable assumption.Clearly, a necessary assumption for brassens to be able to solve SSNMFis that no column of W is a k -sparse nonnegative linear combinations of othercolumns of W , otherwise kSSNPA might set that column of W to zero, hencemight not be able to extract it. Assumption 1.
No column of W is a nonnegative linear combination of k other columns of W.

Interestingly, unlike the standard separable case (that is, k = r), and although it is necessary in our approach with brassens, Assumption 1 is not necessary in general to be able to uniquely recover W. Take for example the situation of Figure 2, with three aligned points in the interior of a triangle, so that r = 6, m = 3, and k = 2. The middle point of these three aligned points is a 2-sparse combination of the other two, by construction. If there are data points on each segment joining these three interior points and the exterior vertices, the only solution to SSNMF with r = 6 is the one selecting these three aligned points. However, Assumption 1 is a reasonable assumption for SSNMF.

Fig. 2.
There are three interior vertices. One of them is a combination of the other two.

Fig. 3.
There are two interior vertices. One of them is a 2-sparse combination of two data points.

Unfortunately, Assumption 1 is not sufficient for brassens to provably recover W. In fact, we need the following stronger assumption.

Assumption 2.
No column of W is a nonnegative linear combination of k other columns of M.

This assumption guarantees that a situation such as the one shown in Figure 3, where one of the columns of W is a 2-sparse combination of two data points, is not possible. In fact, in that case, if brassens picks these two data points before the interior vertex located between them, it will not be able to identify that vertex, as it is set to zero within the projection step of kSSNPA.

Interestingly, in the standard separable case, that is, k = r, the two assumptions above coincide; this is the condition under which SNPA is guaranteed to work. Although Assumption 2 may appear much stronger than Assumption 1, they are actually generically equivalent given that the entries of the columns of H are generated randomly (that is, the non-zero entries are picked at random and follow some continuous distribution). For instance, for m = 3 and k = 2, it means that no vertex is on a segment joining two data points. If the data points are generated randomly on the segments generated by any two columns of W, the probability for the segment defined by two such data points to contain a column of W is zero. In fact, segments define a set of measure zero in the unit simplex.

We can now provide a recovery result for brassens.

Theorem 2.
Let M = WH with W = M(:, J) be a k-sparse r-separable matrix so that |J| = r; see Definition 2. We have that:
– If W satisfies Assumption 2, then the factor W with r columns in SSNMF is unique (up to permutation and scaling) and brassens recovers it.
– If W satisfies Assumption 1, the entries of H are generated at random (more precisely, the positions of the non-zero entries are picked at random, while their values follow a continuous distribution) and k < rank(M), then, with probability one, the factor W with r columns in SSNMF is unique (up to permutation and scaling) and brassens recovers it.

Proof. Uniqueness of W in SSNMF under Assumption 2 is straightforward: since the columns of W are not k-sparse combinations of other columns of M, they have to be selected in the index set J. Otherwise, since the columns of W are among the columns of M, it would not be possible to reconstruct M exactly using k-sparse combinations of M(:, J). Then, since all other columns are k-sparse combinations of the r columns of W (by assumption), no other column needs to be added to J, which satisfies |J| = r.

Let us show that, under Assumption 2, brassens recovers the correct set of indices J. kSSNPA can only stop when all columns of W have been identified. In fact, kSSNPA stops when the reconstruction error is zero, while, under Assumption 2, this is possible only when all columns of W are selected (for the same reason as above). Then, the postprocessing will be able to identify, among all selected columns, the columns of W, because they will be the only ones that are not k-sparse combinations of other selected columns.

The second part of the proof follows from standard probabilistic results: since k < rank(M), the combination of k data points generates a subspace of dimension smaller than that of col(M). Hence, generating data points at random is equivalent to generating such subspaces at random. Since these subspaces form a space of measure zero in col(M), the probability for these subspaces to contain a column of W is zero, which implies that Assumption 2 is satisfied with probability one.

4.2 Computational Cost

Let us derive an upper bound on the computational cost of brassens. First, recall that solving an NNLS problem up to any precision can be done in polynomial time. For simplicity, and because we focus on the non-polynomial part of brassens, we denote by Ō(1) the complexity of solving an NNLS problem. In the worst case, kSSNPA will extract all columns of M. In each of the n iterations of kSSNPA, problem (2) needs to be solved. When |J| = O(n), this requires solving n times (once for each column of M) a k-sparse least squares problem in |J| = O(n) variables. The latter requires in the worst case O(n^k) operations by trying all possible index sets; see the discussion after (2). In total, kSSNPA will therefore run in the worst case in time Ō(n^{k+2}).

Therefore, when k is fixed (meaning that k is considered as a fixed constant) and under Assumption 2, brassens can solve SSNMF in polynomial time. Note that this is not in contradiction with our NP-completeness result when k is fixed (Theorem 1), because our NP-completeness proof does not rely on Assumption 2. In summary, to make SSNMF hard, we need either k to be part of the input, or the columns of W to be themselves k-sparse combinations of other columns of W.

5 Experiments

The code and data are available online at https://gitlab.com/nnadisic/ssnmf.
All experiments have been performed on a personal computer with an i5 processor, with a clock frequency of 2.30 GHz. All algorithms are single-threaded. All are implemented in Matlab, except the sparse NNLS solver arborescent, which is implemented in C++ with a Matlab MEX interface.

As far as we know, no algorithm other than brassens can tackle SSNMF with more than one interior point (see Section 2.2), hence comparisons with existing works are unfortunately limited. For example, separable NMF algorithms can only identify the exterior vertices; see Section 1. However, we will compare brassens to SNPA on a real multispectral image in Section 5.2, to show the advantages of the SSNMF model over separable NMF. In Section 5.1, we illustrate the correctness and efficiency of brassens on synthetic data sets.

5.1 Synthetic Data Sets

In this section, we illustrate the behaviour of brassens in different experimental setups. The generation of a synthetic data set is done as follows: for a given number of dimensions m, number of vertices r, number of data points n, and data sparsity k, we generate matrices W ∈ ℝ^{m×r}_+ such that the last r − m columns of W are linear combinations of the first m columns, and H ∈ ℝ^{r×n}_+ such that H = [I_r, H′] and ‖H(:, j)‖₀ ≤ k for all j. We use the uniform distribution in the interval [0, 1] to generate random numbers (columns of W and columns of H), and then normalize the columns of W and H to have unit ℓ1 norm. We then compute M = WH. This way, the matrix M is k-sparse r-separable, with r vertices, of which r − m are interior vertices (in fact, the first m columns of W are linearly independent with probability one as they are generated randomly). We then run brassens on M, with the parameter k, and no noise tolerance. For a given setup, we perform 30 rounds of generation and solving, and we measure the median of the running time and the median of the number of candidates extracted by kSSNPA. This number of candidates corresponds to |J′| in Algorithm 2, that is, the number of interior points selected by kSSNPA as potential interior vertices. Note that a larger number of candidates only results in an increased computation time, and does not change the output of the algorithm, which is guaranteed to extract all vertices (Theorem 2).

Figure 4 shows the behaviour of brassens when n varies, with fixed m = 3, k = 2, and r = 5. To the best of our knowledge, this case is not handled by any other algorithm in the literature. Both the number of candidates and the run time grow slower than linearly. The irregularities in the plot are due to the high variance between runs. Indeed, if vertices are generated in a way that some segments between vertices are very close to each other, brassens typically selects more candidates before identifying all columns of W.
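A Matlab sketch of the data generation described above (an illustrative reading of the procedure, not the authors' experiment script):

```matlab
% Generate a k-sparse r-separable synthetic data set (illustrative sketch).
m = 3; r = 5; n = 100; k = 2;
W = rand(m, m);
W = [W, W * rand(m, r - m)];              % last r-m atoms are interior vertices
W = W ./ sum(W, 1);                       % unit l1-norm columns
Hp = zeros(r, n - r);
for j = 1:(n - r)
    supp = randperm(r, k);                % random k-sparse support
    Hp(supp, j) = rand(k, 1);
end
Hp = Hp ./ sum(Hp, 1);                    % unit l1-norm columns
H = [eye(r), Hp];
M = W * H;                                % M is k-sparse r-separable
```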
[Figure 4 appears here: number of candidate interior vertices and run time (in seconds) as a function of the number of data points n.]

Fig. 4.
Results for brassens on synthetic data sets for different values of n, with fixed m = 3, k = 2, and r = 5, with 3 exterior and 2 interior vertices. The values shown are the medians over 30 experiments.

In Table 1, we compare the performance of brassens for several sets of parameters. The number of candidates grows relatively slowly as the dimensions (m, n) of the problem increase, showing the efficiency of the screening performed by kSSNPA. However, the run time grows rather fast when the dimensions (m, n) grow. This is because not only the number of NNLS subproblems to solve increases, but also their size. In all cases, as guaranteed by Theorem 2, brassens was able to correctly identify the columns of W. Again, as far as we know, no existing algorithm in the literature can perform this task.

Table 1.
Results for brassens on synthetic data sets (median over 30 experiments).

  m    n    r    k    Number of candidates    Run time in seconds
  3   25    5    2     5.5                      0.26
  4   30    6    3     8.5                      3.30
  5   35    7    4     9.5                     38.71
  6   40    8    5    13                      395.88

To summarize, our synthetic experiments show the efficiency of the screening done by kSSNPA, and the capacity of brassens to handle medium-scale data sets.

5.2 Blind Unmixing of a Multispectral Image

A multispectral image is an image composed of various wavelength ranges, called spectral bands, where every pixel is described by its spectral signature. This signature is a vector representing the amount of energy measured for this pixel in every considered spectral band. Multispectral images usually have a small number of bands (between 3 and 15). These bands can be included or not in the spectrum of visible light. In the NMF model, if the columns of M are the n pixels of the image, then its rows represent the m spectral bands.

The unmixing of a multispectral image consists in identifying the different materials present in that image. When the spectral signatures of the materials present in the image are unknown, it is referred to as blind unmixing. The use of NMF for blind unmixing of multispectral images relies on the linear mixing model, that is, the assumption that the spectral signature of a pixel is the linear combination of the spectral signatures of the materials present in this pixel. This corresponds exactly to the NMF model, which is therefore able to identify both the materials present in the image (W) and the proportions/abundances of materials present in every pixel (H); see [3,14] for more details.

Let us apply brassens to the unmixing of the well-known Urban satellite image [21], composed of 307 × 307 pixels. The original cleaned image has 162 bands, but we only keep 3 bands, namely bands 2, 80, and 133 (these were obtained by selecting different bands with SPA applied on Mᵀ), to obtain a data set of size 3 × 94 249.
The question is: can we still recover materials by using only 3 bands? (The reason for this choice is that this data set is well known and the ground truth is available, which is not the case for most multispectral images with only 3 bands.) We first normalize all columns of M so that they sum to one. Then, we run brassens with a sparsity constraint k = 2 (this means that we assume that a pixel can be composed of at most 2 materials, which is reasonable for this relatively high-resolution image) and a noise-tolerance threshold of 4%; this means that we stop SNPA and kSSNPA when ‖M − M(:, J)H‖_F ≤ 0.04 ‖M‖_F. brassens extracts 5 columns of the input matrix. For comparison, we run SNPA with r = 5. Note that this setup corresponds to underdetermined blind unmixing, because m = 3 < r = 5. It would not be possible to tackle this problem using standard NMF algorithms (which would return a trivial solution such as M = I_m M). It can be solved with SNPA, but SNPA cannot identify interior vertices.

SNPA extracts the 5 vertices, while brassens extracts 5 vertices, including one interior vertex. The resulting abundance maps are shown in Figure 5. They correspond to the reshaped rows of H, hence they show which pixel contains which extracted material (they are more easily interpretable than the spectral signatures contained in the columns of W). The materials they contain are given in Table 2, using the ground truth from [20]. We see that brassens produces a better solution, as the materials present in the image are better separated: the first three abundance maps of brassens are sparser and correspond to well-defined materials. The last two abundance maps of SNPA and of brassens are similar but extracted in a different order. The running time of brassens is reasonable, although about ten times higher than that of SNPA.
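For reference, the preprocessing and the calls described above can be sketched as follows. Here snpa and brassens are hypothetical function names standing for implementations of Algorithms 1 and 2 (not shown here), and the final reshape assumes the pixels were vectorized column by column from the 307 × 307 image.

```matlab
% Urban unmixing setup (sketch). M is the 3 x 94249 matrix of the 3 kept bands.
M = M ./ sum(M, 1);                        % normalize columns to sum to one
delta = 0.04 * norm(M, 'fro');             % 4% noise-tolerance threshold
[J_snpa, H_snpa] = snpa(M, 5);             % separable NMF baseline, r = 5
[J_bra,  H_bra ] = brassens(M, 2, delta);  % SSNMF with k = 2
maps = reshape(H_bra', 307, 307, size(H_bra, 1));  % one abundance map per atom
```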
Fig. 5. Abundance maps of materials (that is, reshaped rows of H) extracted by SNPA (top) and by brassens (bottom) in the Urban image with only 3 spectral bands.

Table 2.
Interpretation of the unmixing results from Figure 5.

  Image    Materials extracted by SNPA    Materials extracted by brassens
6 Conclusion

In this paper, we introduced SSNMF, a new variant of the NMF model combining the assumptions of separability and sparsity. We presented brassens, an algorithm able to solve SSNMF exactly, based on SNPA and an exact sparse NNLS solver. We showed its efficiency for various setups and in the successful unmixing of a multispectral image. The present work provides a new way to perform underdetermined blind source separation, under mild hypotheses, and a new way to regularize NMF. It makes NMF identifiable even when atoms of W are nonnegative linear combinations of other atoms (as long as these combinations have sufficiently many non-zero coefficients). Further work includes the theoretical analysis of the proposed model and algorithm in the presence of noise.

Acknowledgments.
The authors are grateful to the reviewers, whose insightful comments helped improve the paper. NN and NG acknowledge the support of the European Research Council (ERC starting grant No 679515), and of the Fonds de la Recherche Scientifique - FNRS and the Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) under EOS project O005318F-RG47.
A Proof of Theorem 1: NP-Completeness of SSNMF
The main purpose of this material is to provide the proof of Theorem 1. More precisely, we prove the NP-completeness of SSNMF with k = 2, which we denote 2-SSNMF. The decision version of this problem is formally defined as follows.

Problem 3. 2-SSNMF
Given: A positive integer r > 0 and a 2-sparse r-separable matrix M.
Question: Find a dictionary matrix W = M(:, J) with |J| ≤ r and a column-wise 2-sparse matrix H ≥ 0 such that M = WH.

NP-completeness of SSNMF for any 2 ≤ k ≤ r follows directly, as it would allow to solve 2-SSNMF by simply adding artificial columns to W (for example, orthogonal to the ones used in the 2-sparse decomposition). In order to prove the NP-hardness of 2-SSNMF, we first demonstrate a polynomial-time reduction from the well-known NP-complete problem SET-COVER (see Garey and Johnson, Computers and Intractability, 2002) to 2-SSNMF.

Problem 4.
SET-COVER
Given: A finite set S = {1, ..., n}, a collection C = {C_1, ..., C_m} of subsets of S, and a positive integer K ≤ m.
Question: Does there exist C′ ⊆ C with |C′| ≤ K such that every element of S belongs to at least one member of C′?

From an instance (S, C, K) of SET-COVER, let us construct an instance (M, r) of 2-SSNMF in polynomial time.

– The natural number r > 0 is defined as

    r = Σ_{i=1}^{m} |C_i| + 2 + K.

– The matrix M is the concatenation of three matrices M1, M2, M3, such that M = [M1, M2, M3].
  • For each subset C_i of Problem 4, with i = 1, ..., m, we have the data point M1(:, i) = (0, −h_i)ᵀ with h_i = i/(m + 1). Hence, M1 is a 2-by-m matrix.
  • For each element j = 1, ..., n of the ground set S, we have the data point M2(:, j) = (b_j, b_j²)ᵀ with b_j = 1/(m + 1 + aj), where a > 0 is a sufficiently small constant (a ≤ 1/n suffices; see Lemma 1). These points belong to the curve y = x². M2 is a 2-by-n matrix.
  • When the jth element of the ground set S is a member of the ith subset C_i, we add a data point to M3 as follows: M3(:, l) = (h_i/b_j, h_i²/b_j²)ᵀ, with h_i and b_j as previously defined. Moreover, we add two more columns to M3 for the data points (0, 0)ᵀ and (0, −1)ᵀ. Hence, M3 is a 2-by-(2 + Σ_{i=1}^{m} |C_i|) matrix. Note that the intersection of the curve y = x² with the line connecting (0, −h_i) and (b_j, b_j²) is precisely the point (h_i/b_j, h_i²/b_j²). We show in Lemma 1 that all these points never overlap. It implies that a straight line between the ith point of M1 and a point of M3 passes through the jth point of M2 if and only if j is in the subset C_i.
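A small Matlab sketch of this construction, using the constants as reconstructed above (h_i = i/(m+1), b_j = 1/(m+1+aj) with a small); this is our reading of the gadget and is only meant to illustrate the layout of M:

```matlab
function M = setcover_to_2ssnmf(C, n)
% Build the 2-SSNMF gadget from a SET-COVER instance.
% C is a cell array of subsets of {1,...,n}; constants follow the text above.
m = numel(C);
a = 1 / n;                                 % a sufficiently small constant
h = (1:m) / (m + 1);
b = 1 ./ (m + 1 + a * (1:n));
M1 = [zeros(1, m); -h];                    % one column (0, -h_i) per subset
M2 = [b; b.^2];                            % one column (b_j, b_j^2) per element
M3 = [[0; 0], [0; -1]];                    % the two extra points
for i = 1:m
    for j = C{i}                           % membership points, on the curve y = x^2
        M3 = [M3, [h(i) / b(j); (h(i) / b(j))^2]];
    end
end
M = [M1, M2, M3];
end
```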
All the columns of M3 are different.

Proof. Suppose this is not the case, and that for (i, j) and (i′, j′) with i ≠ i′ and j ≠ j′ we have h_i/b_j = h_{i′}/b_{j′}, which means that, after rearrangement,

    i/i′ = (m + 1 + a j′) / (m + 1 + a j).    (3)

For j, j′ = 1, ..., n, the right-hand side of (3) varies as follows:

    1 − a(n − 1)/(m + 1 + a n)  ≤  (m + 1 + a j′)/(m + 1 + a j)  ≤  1 + a(n − 1)/(m + 1 + a),

and, for a sufficiently small (a ≤ 1/n suffices), we have a(n − 1)/(m + 1 + a n) ≤ a(n − 1)/(m + 1 + a) < 1/(m + 1), which means that the variation of the right-hand side around 1 is as follows:

    1 − 1/(m + 1)  <  (m + 1 + a j′)/(m + 1 + a j)  <  1 + 1/(m + 1).

For i, i′ = 1, ..., m + 1, the closest value to 1 of i/i′ is m/(m + 1), that is, 1 − 1/(m + 1). Therefore, the choice of a prevents the right-hand side of (3) from being equal to its left-hand side. It results that all the values h_i/b_j are different for i = 1, ..., m + 1 and j = 1, ..., n.
All the columns of M1 and M2 are situated inside the convex hull of the columns of M3.

Proof. Except for (0, −1), the columns of M3 are located on the moment curve y = x². It results that these points are the vertices of a convex polygon, known under the name of cyclic polytope. The intersection of the y-axis with the line connecting any two points of the set {(x, y) | y = x², x ≥ 1} is located strictly below the point (0, −1). Following the definitions of h_i and b_j, we have h_i/b_j ≥ 1 for any i = 1, ..., m and j = 1, ..., n, which means that, even with the addition of (0, −1), the points of M3 still form a convex polygon. It is then easy to check that the points of M1 and M2 are inside the convex hull of M3.
The 2-SSNMF instance is a yes-instance if and only if the SET-COVER instance is a yes-instance.

Proof.
The if part. Suppose we have an optimal cover C′ ⊆ C of the SET-COVER instance with |C′| ≤ K. From this solution, we build a solution to the 2-SSNMF instance as follows:
– For the dictionary matrix W, we concatenate M3 and the columns of M1 corresponding to the subsets in C′. In this way, the number of columns of W is less than or equal to r = Σ_{i=1}^{m} |C_i| + 2 + K.
– With M3 being in the dictionary, it is easy to construct the columns of H corresponding to [M1, M3] in M: it is trivial for M3 and, for M1, the two nonnegative entries of a column of H are the two coefficients of the convex combination of (0, 0) and (0, −1). Moreover, since W also contains the K columns of M1 corresponding to the cover C′, every column coming from M2 in M can be expressed as the convex combination of exactly two columns of W (see the reduction above). In this way, we have H ≥ 0, a column-wise 2-sparse matrix, such that M = WH.

The only if part.
Suppose that we have a solution (W, H) of the 2-SSNMF instance such that M = WH, with W having at most r columns and H ≥ 0 being a column-wise 2-sparse matrix. From this factorization, we show how to extract a cover C′ made of at most K subsets. All the columns of M3 are necessarily in W since they are the vertices of a convex polygon (see Lemma 2). Since, by construction, no convex combination of two points of M3 can reach the n points of M2, we must have W = [M3, W′]Π, with the columns of W′ coming either from M1 or from M2. The number of columns of W′ is therefore at most r − (Σ_{i=1}^{m} |C_i| + 2) = K. It remains to show how to construct a solution to the SET-COVER instance from W′. For every point of M2, it is possible to find a point of M1 and a point of M3 such that the three points are lined up (it is always possible to find such points since we suppose that every element of the ground set belongs to at least one subset in the SET-COVER instance). It means that we can replace all the columns coming from M2 in W′ by columns of M1 without increasing the size of W′. In order to maintain the equality M = WH, it is easy to update the matrix H accordingly while keeping it column-wise 2-sparse. Finally, with the (at most) K columns of W′ coming from M1, we have identified a cover C′ composed of at most K subsets for the SET-COVER instance.
Proof of Theorem 1. 2-SSNMF is in NP since we can check in polynomial time that a given pair (W, H) is a solution of a 2-SSNMF instance. With the reduction from the SET-COVER problem presented above and Lemma 3, we can conclude that 2-SSNMF is NP-hard, hence NP-complete.

Illustration of the reduction.
From the SET-COVER instance n = 5, m = 4, K = 2, with C_1, C_2, C_3, C_4 subsets of S = {1, ..., 5} (one of them containing three elements and the other three containing two elements each), the reduction presented above leads to the following 2-SSNMF instance: r = Σ_i |C_i| + 2 + K = 9 + 2 + 2 = 13 and M = [M1, M2, M3], where
– M1 is the 2-by-4 matrix with columns (0, −h_i)ᵀ,
– M2 is the 2-by-5 matrix with columns (b_j, b_j²)ᵀ,
– M3 is the 2-by-11 matrix containing the nine membership points (h_i/b_j, h_i²/b_j²)ᵀ together with (0, 0)ᵀ and (0, −1)ᵀ,
with h_i = i/5 for i = 1, ..., 4 and b_j = 1/(5 + aj) for j = 1, ..., 5 (see Figure 6). A solution to the SET-COVER instance is a cover C′ made of two of the subsets, and the corresponding 2-SSNMF solution is the dictionary W formed by the 11 columns of M3 together with the 2 columns of M1 associated with the subsets of C′, and a column-wise 2-sparse matrix H ≥ 0, whose non-zero entries are the convex-combination coefficients described in the proof of Lemma 3, such that M = WH.

Fig. 6. Example of a 2-SSNMF instance constructed from the SET-COVER instance above (n = 5, m = 4, K = 2). The second picture is a zoom of the first picture near the origin. Red circles correspond to the points of M1, blue diamonds to the points of M2, and green squares to the points of M3. The red dashed lines are the edges of the convex hull of the points of M3.

References
1. Araújo, M.C.U., Saldanha, T.C.B., Galvão, R.K.H., Yoneyama, T., Chame, H.C., Visani, V.: The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemometrics and Intelligent Laboratory Systems (2), 65–73 (2001)
2. Arora, S., Ge, R., Kannan, R., Moitra, A.: Computing a nonnegative matrix factorization – provably. In: Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, pp. 145–162 (2012)
3. Bioucas-Dias, J.M., Plaza, A., Dobigeon, N., Parente, M., Du, Q., Gader, P., Chanussot, J.: Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2), 354–379 (2012)
Example of a 2-SSNMF instance constructed from the following SET-COVERinstance: n = 5 , m = 4 , C = { , } , C = { , , } , C = { , } and C = { , } . Thesecond picture is a zoom of the first picture on the [0 , . × [ − , box. Red circlescorrespond to the points of M , blue diamonds to the points of M and green squaresto the points of M . The red dashed lines are the edges of the convex hull of the pointsin M .0 N. Nadisic et al.4. Cohen, J.E., Gillis, N.: Nonnegative Low-rank Sparse Component Analysis.In: IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP). pp. 8226–8230 (2019)5. El Ghaoui, L., Viallon, V., Rabbani, T.: Safe feature elimination in sparse super-vised learning technical report no. Tech. rep., UC/EECS-2010-126, EECS Dept.,University of California at Berkeley (2010)6. Fu, X., Huang, K., Sidiropoulos, N.D., Ma, W.K.: Nonnegative matrix factorizationfor signal and data analytics: Identifiability, algorithms, and applications. IEEESignal Processing Magazine (2), 59–80 (2019)7. Gillis, N.: Successive Nonnegative Projection Algorithm for Robust NonnegativeBlind Source Separation. SIAM Journal on Imaging Sciences pp. 1420–1450 (2014)8. Gillis, N.: The why and how of nonnegative matrix factorization. Regularization,Optimization, Kernels, and Support Vector Machines (257), 257–291 (2014)9. Hoyer, P.O.: Non-negative sparse coding. In: Proceedings of the 12th IEEE Work-shop On Neural Networks for Signal Processing. pp. 557–565 (2002)10. Hoyer, P.O.: Non-negative matrix factorization with sparseness constraints. Journalof machine learning research , 1457–1469 (2004)11. Kim, H., Park, H.: Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics (12), 1495–1502 (2007)12. Kumar, A., Sindhwani, V., Kambadur, P.: Fast Conical Hull Algorithms for Near-separable Non-negative Matrix Factorization. In: Proceedings of the 30th Interna-tional Conference on Machine Learning (2013)13. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix fac-torization. Nature (6755), 788–791 (1999)14. Ma, W.K., Bioucas-Dias, J.M., Chan, T.H., Gillis, N., Gader, P., Plaza, A.J., Am-bikapathi, A., Chi, C.Y.: A signal processing perspective on hyperspectral unmix-ing: Insights from remote sensing. IEEE Signal Processing Magazine (1), 67–81(2014)15. Naanaa, W., Nuzillard, J.M.: Blind source separation of positive and partiallycorrelated data. Signal Processing (9), 1711–1722 (2005)16. Nadisic, N., Vandaele, A., Gillis, N., Cohen, J.E.: Exact Sparse Nonnegative LeastSquares. In: IEEE International Conference on Acoustics, Speech and Signal Pro-cessing (ICASSP). pp. 5395 – 5399 (2020)17. Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM journal oncomputing (2), 227–234 (1995)18. Sun, Y., Xin, J.: Underdetermined sparse blind source separation of nonnegativeand partially overlapped data. SIAM Journal on Scientific Computing (4), 2063–2094 (2011)19. Vavasis, S.A.: On the Complexity of Nonnegative Matrix Factorization. SIAM Jour-nal on Optimization (3), 1364–1377 (2010)20. Zhu, F.: Hyperspectral unmixing: ground truth labeling, datasets, benchmark per-formances and survey. arXiv preprint arXiv:1708.05125 (2017)21. Zhu, F., Wang, Y., Xiang, S., Fan, B., Pan, C.: Structured sparse method forhyperspectral unmixing. ISPRS Journal of Photogrammetry and Remote Sensing88