Near-Convex Archetypal Analysis
Pierre De Handschutter, Nicolas Gillis, Arnaud Vandaele, Xavier Siebert
Department of Mathematics and Operational Research, Faculté Polytechnique, Université de Mons, Rue de Houdain 9, 7000 Mons, Belgium
This work was supported by the European Research Council (ERC starting grant).
Abstract
Nonnegative matrix factorization (NMF) is a widely used linear dimensionality reduction technique for nonnegative data. NMF requires that each data point is approximated by a convex combination of basis elements. Archetypal analysis (AA), also referred to as convex NMF, is a well-known NMF variant imposing that the basis elements are themselves convex combinations of the data points. AA has the advantage of being more interpretable than NMF because the basis elements are directly constructed from the data points. However, it usually suffers from a high data fitting error because the basis elements are constrained to be contained in the convex cone of the data points. In this letter, we introduce near-convex archetypal analysis (NCAA), which combines the advantages of both AA and NMF. As for AA, the basis vectors are required to be linear combinations of the data points and hence are easily interpretable. As for NMF, the additional flexibility in choosing the basis elements allows NCAA to have a low data fitting error. We show that NCAA compares favorably with a state-of-the-art minimum-volume NMF method on synthetic datasets and on a real-world hyperspectral image.
Keywords. nonnegative matrix factorization, separability, algorithms
1 Introduction

Nonnegative matrix factorization (NMF) is a well-known technique in unsupervised data analysis; see for example [11, 8] and the references therein. Given an m-by-n nonnegative input matrix X ∈ R^{m×n}_+ and a factorization rank r, the goal of NMF is to find two nonnegative matrices W ∈ R^{m×r}_+ and H ∈ R^{r×n}_+ such that X ≈ WH. The standard NMF optimization problem is formulated as follows:
\[
\min_{W \in \mathbb{R}^{m \times r},\; H \in \mathbb{R}^{r \times n}} \|X - WH\|_F^2 \quad \text{such that } W \geq 0 \text{ and } H \geq 0,
\]
where \(\|A\|_F^2 = \sum_{i,j} A(i,j)^2\) is the squared Frobenius norm of the matrix A. The matrix W is referred to as the matrix of basis elements, while the matrix H indicates the proportions in which each basis vector is present in each data point. Due to its physical interpretation, for example in hyperspectral unmixing (see Section 4.2), the matrix H is often required to have the sum of the entries of each column less than or equal to one. With this constraint, and denoting Δ^r = {x ∈ R^r | x ≥ 0, Σ_{i=1}^r x_i ≤ 1}, we consider in this paper the following problem:
\[
\min_{W \in \mathbb{R}^{m \times r}_+,\; H \in \mathbb{R}^{r \times n}_+} \|X - WH\|_F^2 \quad \text{such that } H(:,j) \in \Delta^r \text{ for all } j.
\]
A notable variant of NMF is archetypal analysis (AA) [5]. In AA, an additional constraint imposes that the basis vectors, referred to as archetypes, are themselves convex combinations of the data points, that is, W(:,k) = XA(:,k) with A(:,k) ∈ Δ^n for k = 1, ..., r. The problem becomes
\[
\min_{A(:,k) \in \Delta^n \text{ for } k=1,\dots,r,\;\; H(:,j) \in \Delta^r \text{ for } j=1,\dots,n} \|X - XAH\|_F^2. \qquad (1)
\]
AA has also been introduced under the name convex NMF [6]. While AA offers higher guarantees of interpretability, since the archetypes have to belong to the convex hull of the data points, its reconstruction error is likely to be higher than that of NMF. Hence, the membership of the archetypes to the convex hull of the columns of X was relaxed in some previous works. In [20], the sum of the entries of each column of A is allowed to lie between 1 − δ and 1 + δ for some δ ≥ 0 fixed a priori. In [14], the authors combine AA and NMF through a trade-off between the reconstruction error and the distance between the archetypes and the convex hull of X; however, the variables involved are the NMF ones, namely W and H. Minimum-volume NMF [19] is an NMF variant that minimizes the volume of the convex hull of the columns of W; see [8] and the references therein for details. The latter two approaches, though close in spirit to AA, do not allow one to interpret how the archetypes are built from the data through a coefficient matrix A.

In this work, we propose a new model, dubbed near-convex archetypal analysis (NCAA), which benefits from the advantages of both NMF, via a low reconstruction error, and AA, via interpretability. This letter is organized as follows. In Section 2, we state the model and explain its geometric interpretation. We detail the optimization framework in Section 3. We present the performance of our algorithm on both synthetic and real datasets in Section 4, and discuss some perspectives for future research in Section 5.
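To fix the dimensions and constraints involved, here is a small NumPy sketch (ours, not from the paper) that evaluates the relative reconstruction error for a generic NMF factorization and for an AA-style factorization W = XA, with the columns of A and H drawn on the respective simplices.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 10, 200, 4
X = rng.random((m, n))                       # nonnegative data matrix

def rel_err(X, W, H):
    """Relative Frobenius reconstruction error ||X - WH||_F / ||X||_F."""
    return np.linalg.norm(X - W @ H, 'fro') / np.linalg.norm(X, 'fro')

# Generic NMF: W is free (here, a random nonnegative matrix), H columns on the simplex.
W = rng.random((m, r))
H = rng.dirichlet(np.ones(r), size=n).T      # r x n, each column sums to 1 and is nonnegative

# Archetypal analysis: W = X A, each column of A a convex combination of data points.
A = rng.dirichlet(np.ones(n), size=r).T      # n x r, columns on the simplex
W_aa = X @ A

print("NMF-style error:", rel_err(X, W, H))
print("AA-style  error:", rel_err(X, W_aa, H))
```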
2 Near-convex archetypal analysis

Given a data matrix X ∈ R^{m×n}, a matrix Y ∈ R^{m×d} with d columns, the rank r of the factorization and a scalar ε ≥ 0, we define the NCAA problem as follows:
\[
\min_{A \in \mathbb{R}^{d \times r},\;\; H(:,j) \in \Delta^r \text{ for } j=1,\dots,n} \|X - YAH\|_F^2 \quad \text{such that } A(k,l) \geq -\varepsilon \text{ for all } k,l, \quad \sum_{k=1}^{d} A(k,l) = 1 \text{ for all } l = 1,\dots,r. \qquad (2)
\]
For Y = X and ε = 0, NCAA (2) coincides with AA (1). Let us point out the two differences between NCAA and AA:

• For ε > 0, the archetypes are allowed to lie outside the convex hull of the data points and are said to be near-convex combinations (NCCs) of the data points.

• In order to reduce the computational cost compared to AA, the basis vectors YA are combinations of a matrix Y made of d points such that d ≪ n. In practice (see below), we will choose these d columns as a subset of the columns of X. Note that, in [3], Y is chosen as the vertices of the convex hull of X. However, in noisy scenarios, most data points are vertices, hence this approach usually does not allow d to be significantly smaller than n.

[Fig. 1: Geometric interpretation of NCAA for r = 3, d = 2r = 6, m = 2: as ε grows, the estimated basis vectors lie further away from the convex hull of X. Legend: data points X, points Y, points Z, basis vectors W, mean vector ȳ.]

NCCs have an interesting geometric interpretation, as stated in the following lemma.
Lemma 1. Let Y ∈ R^{m×d}, and define the columns of the matrix Z as
\[
Z(:,j) = Y(:,j) + d\varepsilon\,\big(Y(:,j) - \bar{y}\big) \quad \text{for } j = 1,\dots,d,
\]
where ȳ is the average of the columns of Y, that is, ȳ = Ye/d with e the vector of all ones. Then the set of NCCs of the columns of Y, that is, the set 𝒴 = {x | x = Ya, Σ_i a_i = 1, a_i ≥ −ε}, is equal to the set of convex combinations of the columns of Z, that is, to the set 𝒵 = {x | x = Za, Σ_i a_i = 1, a_i ≥ 0}.

Fig. 1 illustrates the geometric interpretation given in Lemma 1 for d = 2r = 6: each Z(:,j) is aligned with the corresponding Y(:,j) and ȳ, and lies outside the convex hull of the columns of Y. This interpretation is rather interesting: as ε increases, the archetypes W = YA are allowed to lie further away from the convex hull of the columns of Y. Let us define the purity level p of X = WH as p = min_{1≤i≤r} max_{1≤j≤n} H(i,j). If p = 1, the data is said to be separable (see for example [8]), as the basis vectors W then correspond to some of the data points in X, so that ε can be chosen equal to 0.
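As a quick numerical sanity check of Lemma 1 (a sketch of ours, not part of the paper), the following NumPy snippet builds Z from a random Y and verifies that a near-convex combination of the columns of Y is also a convex combination of the columns of Z.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, eps = 2, 6, 0.2

Y = rng.random((m, d))
ybar = Y.mean(axis=1, keepdims=True)
Z = Y + d * eps * (Y - ybar)              # Z(:,j) = Y(:,j) + d*eps*(Y(:,j) - ybar)

# A near-convex combination of the columns of Y: sum(a) = 1 and a_i >= -eps.
a = rng.random(d); a /= a.sum()           # start from a convex combination
a = a * (1 + d * eps) - eps               # still sums to 1, entries >= -eps
x = Y @ a

# The corresponding convex combination of the columns of Z.
b = (a + eps) / (1 + d * eps)             # b >= 0 and sum(b) = 1
assert np.allclose(Z @ b, x)
print("Lemma 1 verified on this instance.")
```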
There are two key aspects in the NCAA model: the choice of Y and the choice of ε. The value of ε is tuned automatically within the algorithm; see Section 3 for more details. For the choice of Y, we use two simple schemes, but others could be considered:

• The successive nonnegative projection algorithm (SNPA) [10], designed to solve separable NMF, extracts extreme points of the dataset but is sensitive to outliers (although appropriate pre- and post-processing could resolve this issue).

• Hierarchical clustering (HC) [12], designed to cluster data points in a hierarchical way, identifies points that are not necessarily extreme points of the data cloud but is less sensitive to outliers, hence it is appropriate for real datasets.

The number of points d in Y is chosen a priori depending on the application (as is the rank r in most NMF models). It can be chosen as a small multiple of r: values between r and 10r work well in our experiments; see the numerical experiments for some examples.

3 Algorithm

We propose a standard optimization framework to solve (2), namely a two-block coordinate descent (BCD) [15]. It consists in alternately optimizing A and H while keeping the other fixed; this is the standard framework for most NMF algorithms [11]. The optimization of A and H is performed with a fast projected gradient method (FPGM) with Nesterov acceleration [21]. The step size is tuned with a backtracking line search. The columns of H are projected onto the unit simplex with the algorithm described in the appendix of [10]. The projection of A can be performed in a similar way but requires an efficient column-wise algorithm, valid for any ε; we have implemented such an approach (a minimal sketch of this column-wise projection is given below). See the Matlab code available from http://bit.ly/NCAAv1.
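As an illustration of the column-wise projection mentioned above, here is a minimal NumPy sketch (ours, not the authors' Matlab routine) of the Euclidean projection onto the set {a ∈ R^d : a_i ≥ −ε, Σ_i a_i = 1} appearing in (2); shifting by ε reduces it to the classical projection onto a scaled simplex.

```python
import numpy as np

def project_simplex(y, s=1.0):
    """Euclidean projection of y onto {b : b >= 0, sum(b) = s} (sort-based algorithm)."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u) - s
    rho = np.nonzero(u * np.arange(1, y.size + 1) > css)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(y - theta, 0.0)

def project_shifted_simplex(y, eps):
    """Projection onto {a : a_i >= -eps, sum_i a_i = 1}, via the change of variable b = a + eps."""
    b = project_simplex(y + eps, s=1.0 + y.size * eps)
    return b - eps

a = project_shifted_simplex(np.array([0.9, -0.6, 0.4, 0.1]), eps=0.2)
print(a, a.sum())   # all entries >= -0.2 and the entries sum to 1
```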
In the following, we denote Δ^r_ε = {x ∈ R^r | x ≥ −ε, Σ_{i=1}^r x_i ≤ 1}.

Algorithm 1 NCAA
Input: Nonnegative matrices X ∈ R^{m×n}_+ and Y ∈ R^{m×d}_+, rank r, number of inner iterations K, bounds 0 < ε_min < ε_max, tolerance δ.
Output: Matrices A ∈ R^{d×r} and H ∈ R^{r×n} that solve (2).
1: Compute initial matrices A(0) and H(0); i = 0; ε(1) = ε_min; err(0) = ‖X − Y A(0) H(0)‖_F
2: for t = 1, 2, . . . do
3:   for k = 1, 2, . . . , K do
4:     i = i + 1
5:     A(i) = argmin_{A(:,l) ∈ Δ^d_{ε(t)} for all l} ‖X − Y A H(i−1)‖_F²  (*)
6:     H(i) = argmin_{H(:,j) ∈ Δ^r for all j} ‖X − Y A(i) H‖_F²  (*)
7:     err(i) = ‖X − Y A(i) H(i)‖_F
8:   end for
9:   if err(i−K) − err(i) < δ · err(0) then
10:    ε_max = ε(t);  ε(t+1) = (ε_min + ε_max)/2
11:  else
12:    ε_min = ε(t);  ε(t+1) = min(2 ε(t), ε_max)
13:  end if
14: end for
(*) The subproblem is solved via FPGM.

The value of the parameter ε is tuned automatically since it highly depends on the data distribution; we believe this is a strong advantage of our method. For example, in the separable case, that is, when the basis vectors belong to the data points [2], ε should be set to 0. When this is not the case, ε should be chosen more carefully; see for example Fig. 1 for a non-separable case. We proceed as follows. The value of ε is initially set to a very small value (we have used ε_min = 10^{-4} in all numerical experiments), which imposes that the archetypes are close to the convex hull of X. Then, ε is doubled at each iteration as long as the relative error decreases by a given tolerance δ between two consecutive iterations (a small value of δ is used in our implementation); when the decrease falls below this tolerance, ε is updated by bisection between ε_min and ε_max. Intuitively, the idea is to slowly allow the basis vectors to lie further away from the convex hull of X. This tuning process is summarized in Algorithm 1.
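The ε-tuning loop of Algorithm 1 can be summarized in a few lines of Python. The sketch below is ours and is not the authors' Matlab code; the subproblem solvers solve_A(X, Y, H, eps) and solve_H(X, Y, A) are hypothetical stand-ins for the FPGM routines described above.

```python
import numpy as np

def ncaa_tune_eps(X, Y, r, solve_A, solve_H, eps_min=1e-4, eps_max=1.0,
                  n_outer=10, n_inner=50, delta=1e-3, seed=0):
    """Sketch of Algorithm 1: block coordinate descent on (A, H) with automatic tuning of eps."""
    rng = np.random.default_rng(seed)
    d, n = Y.shape[1], X.shape[1]
    A = rng.dirichlet(np.ones(d), size=r).T          # d x r, columns on the simplex
    H = rng.dirichlet(np.ones(r), size=n).T          # r x n, columns on the simplex
    err0 = np.linalg.norm(X - Y @ A @ H, 'fro')
    eps = eps_min
    for _ in range(n_outer):
        err_before = np.linalg.norm(X - Y @ A @ H, 'fro')
        for _ in range(n_inner):
            A = solve_A(X, Y, H, eps)                # argmin over A, columns in Delta^d_eps
            H = solve_H(X, Y, A)                     # argmin over H, columns in Delta^r
        err_after = np.linalg.norm(X - Y @ A @ H, 'fro')
        if err_before - err_after < delta * err0:    # not enough progress: bisect
            eps_max = eps
            eps = 0.5 * (eps_min + eps_max)
        else:                                        # enough progress: keep increasing eps
            eps_min = eps
            eps = min(2.0 * eps, eps_max)
    return A, H, eps
```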
In the model (2) and in Algorithm 1, the value of ε is the same for all entries of A. However, in practice, it may be crucial to allow some columns of W to lie closer to the data points than others. For example, it may happen that a subset of the archetypes belong to the data points (that is, some columns of W appear as columns of Y), in which case the corresponding value of ε should be equal to zero. Therefore, rather than considering a single ε in the model, we impose instead that
\[
A(:,l) \geq -\varepsilon_l \quad \text{for } l = 1,\dots,r,
\]
where the ε_l ≥ 0 (1 ≤ l ≤ r) are parameters. To deal with this non-symmetric case, we propose a fine-tuning stage after Algorithm 1 has terminated. Starting from the value of ε computed by Algorithm 1, ε_l is fine-tuned for each basis vector Y A(:,l), one at a time and independently of the others (keeping the other columns fixed at the values returned by Algorithm 1). It works as follows. The value of ε_l (l = 1, . . . , r) is decreased by a multiplicative factor α < 1 until the reconstruction error becomes larger than δ* times the error at the end of the global tuning; see Algorithm 2. Intuitively, we move each column of W = YA back towards the convex hull of the columns of Y as long as the error does not increase too much.
Algorithm 2 Fine tuning of NCAA
Input: Nonnegative matrices X ∈ R^{m×n}_+ and Y ∈ R^{m×d}_+, matrices A ∈ R^{d×r} and H ∈ R^{r×n}, rank r, tolerance δ*.
Output: Matrices A(*) ∈ R^{d×r} and H(*) ∈ R^{r×n} that solve (2).
1: err = ‖X − Y A H‖_F; i = 0
2: for l = 1, . . . , r do
3:   B = A;  ε_l(0) = −min(A(:,l))
4:   for t = 1, 2, . . . do
5:     ε_l(t) = α ε_l(t−1)
6:     for k = 1, 2, . . . , K do
7:       i = i + 1
8:       B(i)(:,l) = argmin_{B(:,l) ∈ Δ^d_{ε_l(t)}} ‖X − Y B H(i−1)‖_F²  (*)
9:       H(i) = argmin_{H(:,j) ∈ Δ^r for all j} ‖X − Y B(i) H‖_F²  (*)
10:      err(i) = ‖X − Y B(i) H(i)‖_F
11:    end for
12:    if err(i) > δ* · err then
13:      A(*)(:,l) = B(i)(:,l)
14:      break
15:    end if
16:  end for
17: end for
18: H(*) = argmin_{H(:,j) ∈ Δ^r for all j} ‖X − Y A(*) H‖_F²  (*)
(*) The subproblem is solved via FPGM.
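A compact Python sketch of the same per-column schedule follows (again ours, with hypothetical solve_A_col and solve_H stand-ins for the FPGM subproblem solvers; a maximum number of shrinking steps is added as a simplification to guarantee termination).

```python
import numpy as np

def ncaa_fine_tune(X, Y, A, H, solve_A_col, solve_H,
                   alpha=0.9, delta_star=1.01, n_inner=20, max_outer=50):
    """Sketch of Algorithm 2: per-column fine tuning of eps_l after Algorithm 1.

    solve_A_col(X, Y, B, H, l, eps_l) updates only column l of B under
    B(:, l) >= -eps_l and sum(B(:, l)) = 1; solve_H(X, Y, B) updates H.
    """
    err_ref = np.linalg.norm(X - Y @ A @ H, 'fro')
    A_star = A.copy()
    for l in range(A.shape[1]):
        B, Hl = A_star.copy(), H.copy()
        eps_l = -B[:, l].min()                     # start from the current most negative entry
        for _ in range(max_outer):
            eps_l *= alpha                         # shrink eps_l, pulling Y B(:, l) back
            for _ in range(n_inner):               # towards the convex hull of the columns of Y
                B = solve_A_col(X, Y, B, Hl, l, eps_l)
                Hl = solve_H(X, Y, B)
            err = np.linalg.norm(X - Y @ B @ Hl, 'fro')
            if err > delta_star * err_ref:         # error degraded too much: stop for this column
                break
        A_star[:, l] = B[:, l]
    H_star = solve_H(X, Y, A_star)
    return A_star, H_star
```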
Computational cost. The main computational cost of Algorithms 1 and 2 lies in the computation of the gradients and of the projections, which both require matrix-matrix multiplications. One can check that the computational cost per iteration is O(mnr) operations. As long as d is chosen small enough (we recommend a small multiple of r), the computational cost of the algorithm remains linear in the dimensions of the input matrix, as for classical NMF, hence it can be applied to large-scale datasets. Note that the number of variables of NCAA is r(d + n), which is less than the 2nr variables of classical AA but of the same order as the number of variables r(m + n) in NMF, since d is usually smaller than m.

4 Numerical experiments

In this section, the performance of NCAA is evaluated on synthetic datasets and on a real-world hyperspectral image. We compare NCAA with a state-of-the-art minimum-volume NMF algorithm that uses a logdet penalty to penalize the volume of the convex hull of the columns of W [9]:
\[
\min_{W \geq 0,\;\; H(:,j) \in \Delta^r \text{ for all } j} \|X - WH\|_F^2 + \tilde{\lambda}\, \log\det(W^T W + \delta I_r), \qquad (3)
\]
where I_r is the identity matrix of size r, λ̃ is a regularization parameter and δ is a small scalar constant. This model was shown to be the most efficient among several minimum-volume algorithms in [1]. We use the implementation from [17], where it is recommended to set λ̃ = λ ‖X − WH‖_F² / logdet(WᵀW + δI_r) with λ a small constant balancing the two terms of the objective function (two values of λ are considered below), and where the initial matrices (W, H) are computed by SNPA. The performance metric considered is the average mean removed spectral angle (MRSA) over all pairs of corresponding estimated and expected basis vectors, after a proper assignment with the Hungarian algorithm [16]. The MRSA between two vectors x and y is given by
\[
\mathrm{MRSA}(x, y) = \frac{100}{\pi} \arccos\!\left( \frac{\langle x - \bar{x},\, y - \bar{y} \rangle}{\|x - \bar{x}\|_2\, \|y - \bar{y}\|_2} \right) \in [0, 100],
\]
where ⟨·,·⟩ denotes the scalar product of two vectors and x̄ the mean of the entries of x.

4.1 Synthetic datasets

We first compare the methods on synthetic datasets to investigate the influence of different data distributions. For NCAA, we use SNPA to generate Y with d = 10r. Moreover, only Algorithm 1 is used, as the purity level is identical over all the basis vectors and the improvement brought by the fine-tuning stage was negligible.

We generate the data matrices X ∈ R^{m×n}_+ as follows. We fix n = 1000 and m = 10. Given the factorization rank r, the purity level p ∈ (0, 1] and the noise level υ, we generate 25 random matrices X = W_t H_t + N as follows. Each entry of W_t ∈ R^{m×r}_+ is drawn from a uniform distribution over the interval [0, 1]. Then, each column is normalized so that its entries sum to one, so that all the basis vectors belong to Δ^m. The matrix H_t ∈ R^{r×n}_+ is generated with a Dirichlet distribution whose parameter α ∈ R^r has all entries equal to the same small value. Each column is resampled until every entry is smaller than the given purity level p. Finally, noise is added to the data such that
\[
X = \max\!\left(0,\; \tilde{X} + \upsilon\, \frac{\|\tilde{X}\|_F}{\|N\|_F}\, N \right),
\]
where X̃ = W_t H_t and each entry of N follows a Gaussian distribution with mean 0 and standard deviation 1.

Several values of the rank r (starting from r = 3), of the purity level p, and of the noise level υ (including the noiseless case υ = 0) are considered.
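For reference, here is a small NumPy/SciPy sketch (ours) of the MRSA and of the matching of estimated to ground-truth basis vectors with the Hungarian algorithm; average_mrsa is a helper name we introduce for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mrsa(x, y):
    """Mean removed spectral angle between two vectors, scaled to [0, 100]."""
    xc, yc = x - x.mean(), y - y.mean()
    c = np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))
    return 100.0 / np.pi * np.arccos(np.clip(c, -1.0, 1.0))

def average_mrsa(W_est, W_true):
    """Average MRSA after matching columns of W_est to columns of W_true (Hungarian algorithm)."""
    r = W_true.shape[1]
    cost = np.array([[mrsa(W_est[:, i], W_true[:, j]) for j in range(r)] for i in range(r)])
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()
```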
[Table 1: Comparison of the performance of NCAA, MinVolNMF (with two values of λ) and SNPA on synthetic data, with n = 1000, m = 10 and d = 10r, as a function of the purity level p, the rank r and the noise level υ, in terms of average MRSA (± standard deviation) over the 25 randomly generated true factors. For each configuration, the best average MRSA is highlighted in bold and the number of times each algorithm performs best is given in parentheses.]

Table 1 displays the average MRSA and the corresponding standard deviations obtained over the 25 generated true factors, computed for NCAA, for MinVolNMF (Eq. (3)) with two values of λ, and for SNPA. Table 1 also reports the number of times each algorithm returned the best solution (that is, the smallest MRSA among the four algorithms). We only present the results for some important configurations: we fix r = 7, a purity level p < 1 and υ = 0 and, starting from these values, we vary r, p and υ independently.

We observe the following:

• The variability of the settings generates in general high standard deviations. However, the ranking suggested by the average MRSA is confirmed by the distribution of the best instances.

• The MRSA of NCAA is in most cases lower than that of MinVolNMF. Note that MinVolNMF with the two values of λ gives similar results. The baseline SNPA is only competitive in separable cases (that is, when p = 1). NCAA performs particularly well in the difficult scenarios where r > m, in the presence of heavy noise, and in highly mixed situations (p ≪ 1). As opposed to MinVolNMF, NCAA uses the data points to construct the basis vectors, hence it is more robust in these difficult scenarios.

4.2 Real-world hyperspectral image

Hyperspectral unmixing consists in identifying r materials, or endmembers, inside a hyperspectral image made of n pixels in m spectral bands. The HYDICE Urban image is made of n = 307 × 307 pixels in m = 162 denoised spectral bands. Four important materials are asphalt road, grass, tree and roof, and we fix r = 4 (see for example [22] for more details). The matrix Y is set up with HC, and we use d = 20. Fig. 2 compares the normalized spectral signatures of the endmembers obtained with NCAA (including the fine-tuning step) and with MinVolNMF to the ground truth. We observe that NCAA produces spectral signatures very close to the ground truth, and its MRSA is lower than that of MinVolNMF. Moreover, the abundance maps in Fig. 3b show that NCAA is able to retrieve meaningful proportions of each endmember in the initial image. For comparison, the abundance maps of MinVolNMF are presented in Fig. 3a.

[Fig. 2: Endmember comparison between NCAA, MinVolNMF and the ground truth for the Urban image with r = 4.]

[Fig. 3: Material abundances for the Urban image with r = 4; (a) MinVolNMF, (b) NCAA. From left to right, on top: road, grass; on bottom: tree, roof.]
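To produce abundance maps such as those in Fig. 3, each row of H can be reshaped to the image grid; the sketch below is ours and assumes the n = 307 × 307 pixels were stacked row by row when forming X.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_abundance_maps(H, height=307, width=307, names=("road", "grass", "tree", "roof")):
    """Reshape each row of H (r x n, n = height*width) into an abundance map and display it."""
    r = H.shape[0]
    fig, axes = plt.subplots(2, (r + 1) // 2, figsize=(8, 6))
    for i, ax in zip(range(r), np.ravel(axes)):
        ax.imshow(H[i].reshape(height, width), cmap="gray")
        ax.set_title(names[i] if i < len(names) else f"endmember {i + 1}")
        ax.axis("off")
    plt.tight_layout()
    plt.show()
```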
5 Conclusion

In this letter, we have proposed a new NMF model, called near-convex archetypal analysis (NCAA), which on the one hand guarantees a low approximation error and on the other hand is interpretable, like the archetypal analysis from which it is inspired. The value of the parameter ε in NCAA (2) plays the role of a cursor regulating the maximum distance between the basis vectors and the convex hull of the data points X. Although it is possible to estimate the value of ε that retrieves the true basis vectors in simple settings, we would be interested in analysing the uniqueness of the solution of NCAA. As the intuition behind NCAA is close to that of minimum-volume NMF, it would be particularly interesting to extend the identifiability results obtained in [13, 18, 7]. It would also be interesting to explore other models inspired by NCAA. For example, using Y = X and imposing row sparsity on A would make the model learn Y and the number of points d automatically; see [4] for a similar idea. Other models, using regularization terms in the objective function rather than hard constraints on A, could also be worth exploring.

References

[1] A. Ang and N. Gillis. Algorithms and comparisons of nonnegative matrix factorizations with volume regularization for hyperspectral unmixing.
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019.
[2] S. Arora, R. Ge, R. Kannan, and A. Moitra. Computing a nonnegative matrix factorization - provably. SIAM Journal on Computing, 45(4):1582-1611, 2016.
[3] C. Bauckhage. A note on archetypal analysis and the approximation of convex hulls. arXiv preprint arXiv:1410.0642, 2014.
[4] C. Bauckhage and C. Thurau. Making archetypal analysis practical. In Joint Pattern Recognition Symposium, pages 272-281. Springer, 2009.
[5] A. Cutler and L. Breiman. Archetypal analysis. Technometrics, 36(4):338-347, 1994.
[6] C. H. Ding, T. Li, and M. I. Jordan. Convex and semi-nonnegative matrix factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(1):45-55, 2008.
[7] X. Fu, K. Huang, and N. D. Sidiropoulos. On identifiability of nonnegative matrix factorization. IEEE Signal Processing Letters, 25(3):328-332, 2018.
[8] X. Fu, K. Huang, N. D. Sidiropoulos, and W.-K. Ma. Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications. IEEE Signal Processing Magazine, 36(2):59-80, 2019.
[9] X. Fu, K. Huang, B. Yang, W.-K. Ma, and N. D. Sidiropoulos. Robust volume minimization-based matrix factorization for remote sensing and document clustering. IEEE Transactions on Signal Processing, 64(23):6254-6268, 2016.
[10] N. Gillis. Successive nonnegative projection algorithm for robust nonnegative blind source separation. SIAM Journal on Imaging Sciences, 7(2):1420-1450, 2014.
[11] N. Gillis. The why and how of nonnegative matrix factorization. Regularization, Optimization, Kernels, and Support Vector Machines, 12(257), 2014.
[12] N. Gillis, D. Kuang, and H. Park. Hierarchical clustering of hyperspectral images using rank-two nonnegative matrix factorization. IEEE Transactions on Geoscience and Remote Sensing, 53(4):2066-2078, 2015.
[13] K. Huang, N. D. Sidiropoulos, and A. Swami. Non-negative matrix factorization revisited: Uniqueness and algorithm for symmetric decomposition. IEEE Transactions on Signal Processing, 62(1):211-224, 2013.
[14] H. Javadi and A. Montanari. Non-negative matrix factorization via archetypal analysis. Journal of the American Statistical Association, (just-accepted):1-27, 2019.
[15] J. Kim, Y. He, and H. Park. Algorithms for nonnegative matrix and tensor factorizations: A unified view based on block coordinate descent framework. Journal of Global Optimization, 58(2):285-319, 2014.
[16] H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2):83-97, 1955.
[17] V. Leplat, A. M. Ang, and N. Gillis. Minimum-volume rank-deficient nonnegative matrix factorizations. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3402-3406. IEEE, 2019.
[18] C.-H. Lin, W.-K. Ma, W.-C. Li, C.-Y. Chi, and A. Ambikapathi. Identifiability of the simplex volume minimization criterion for blind hyperspectral unmixing: The no-pure-pixel case. IEEE Transactions on Geoscience and Remote Sensing, 53(10):5530-5546, 2015.
[19] L. Miao and H. Qi. Endmember extraction from highly mixed data using minimum volume constrained nonnegative matrix factorization. IEEE Transactions on Geoscience and Remote Sensing, 45(3):765-777, 2007.
[20] M. Mørup and L. K. Hansen. Archetypal analysis for machine learning and data mining. Neurocomputing, 80:54-63, 2012.
[21] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k²). In Dokl. Akad. Nauk SSSR, volume 269, pages 543-547, 1983.
[22] F. Zhu. Spectral unmixing datasets with ground truths. arXiv preprint arXiv:1708.05125, 2017.