A normalized scaled gradient method to solve non-negativity and equality constrained linear inverse problem - Application to spectral mixture analysis
Céline Theys*, Henri Lantéri*, Nicolas Dobigeon†, Cédric Richard*, Jean-Yves Tourneret† and André Ferrari*

*Laboratoire Lagrange, Université de Nice Sophia-Antipolis, 06108 Nice, France
Email: {celine.theys, henri.lanteri, cedric.richard, andre.ferrari}@unice.fr
†IRIT/INP-ENSEEIHT, Université de Toulouse, 31071 Toulouse Cedex 7, France
Email: {nicolas.dobigeon, jean-yves.tourneret}@enseeiht.fr

Abstract
This paper addresses the problem of minimizing a convex cost function under non-negativity and equality constraints, with the aim of solving the linear unmixing problem encountered in hyperspectral imagery. This problem can be formulated as a linear regression problem whose regression coefficients (abundances) satisfy sum-to-one and positivity constraints. A normalized scaled gradient iterative method (NSGM) is proposed for estimating the abundances of the linear mixing model. The positivity constraint is ensured by the Karush-Kuhn-Tucker conditions, whereas the sum-to-one constraint is fulfilled by introducing normalized variables in the algorithm. The convergence is ensured by a one-dimensional search of the step size. Note that the NSGM can be applied to any convex cost function with non-negativity and flux constraints. In order to compare the NSGM with the well-known fully constrained least squares (FCLS) algorithm, the latter is reformulated in terms of a penalized function, which reveals its suboptimality. Simulations on synthetic data illustrate the performance of the proposed algorithm in comparison with other unmixing algorithms and, more particularly, demonstrate its efficiency when compared to the popular FCLS. Finally, results on real data are given.
I. INTRODUCTION
Hyperspectral and multispectral imagery have received considerable attention in the literature (see for instance [1], [2] and references therein). Hyperspectral and multispectral data are collected in many spectral bands, providing accurate information regarding the observed scene. Recent applications benefiting from multi/hyperspectral imagery include ecosystem monitoring [3], crop measurement [4] and natural disaster analysis [5]. Each pixel of such images is represented by a reflectance vector, called a spectrum, which contains the measurements associated with the different spectral bands. However, mainly due to the spatial resolution of current spectro-imagers, the measured pixel spectrum consists of a mixture of several spectral signatures, usually referred to as endmembers, that characterize the macroscopic materials present in this pixel [6], [7]. Identifying these endmembers and estimating their corresponding fractions, or abundances, in each image pixel is the core of linear spectral mixture analysis (LSMA). As a first approximation, it is widely admitted that each pixel of the image can be accurately modeled as a linear mixture of the endmembers. Following this linear mixing model (LMM), LSMA can be addressed in two steps. The first step is to identify the spectral signatures associated with the endmembers. Very popular algorithms for endmember determination are the N-FINDR algorithm proposed by Winter [8] and the vertex component analysis (VCA) introduced in [9]. The second step within LSMA is the linear unmixing of each pixel of the image, which consists of estimating the abundance of each endmember contained in a given pixel. The linear unmixing problem is challenging because the abundances have to satisfy sum-to-one and positivity constraints.

There are mainly two kinds of approaches to estimate abundances that satisfy these constraints. The first approach is to define appropriate prior distributions for the abundances (satisfying the sum-to-one and positivity constraints) and to estimate the unknown parameters from the resulting joint posterior distribution following the principles of Bayesian inference [10]. However, the complexity of the parameter posterior distribution (essentially due to the constraints inherent to the abundances) requires sophisticated sampling algorithms to compute the Bayesian estimators, such as the Gibbs sampler or the Metropolis-within-Gibbs algorithm [11]. This approach was for instance advocated in [10] and provided interesting results. The price to pay with Bayesian unmixing algorithms is the high computational complexity resulting from the sampling strategy.

The second approach consists of estimating the abundances by minimizing an appropriate
cost function, such as the least squares criterion, under sum-to-one and positivity constraints. As explained in [12], there is no analytical solution to this optimization problem because of the linear inequalities resulting from the positivity constraints. However, an efficient iterative algorithm referred to as the fully constrained least squares (FCLS) algorithm has been proposed in [12]. The FCLS algorithm has been applied successfully to the unmixing of hyperspectral images. More recently, another algorithm called the projected scaled gradient method (PSGM) has been proposed in [13], where the sum-to-one constraint is ensured by a projection at each iteration. An important advantage of the estimators proposed in [12] and [13] is their reduced computational cost compared to Bayesian strategies. However, the convergence of these algorithms to the global minimum of the cost function of interest is generally not ensured, which is their main drawback.

This paper studies a normalized scaled gradient method (NSGM) for estimating the abundances involved within the LMM under positivity and sum-to-one constraints. The convergence of the NSGM is ensured for an appropriate step size, which makes this approach very attractive. In order to compare the proposed NSGM with the popular FCLS algorithm, we will show that, in the FCLS algorithm, the flux constraint is ensured by including a penalty term in the quadratic data term. The paper is organized as follows. In Section II, the scaled gradient method is developed to minimize a general criterion under positivity constraints. The problem of taking a flux constraint (i.e., the additivity constraint) into account is addressed in Section III. The resulting iterative algorithm to perform LSMA, i.e., to solve the LMM-based unmixing problem, is derived in Section IV. Section V provides an interpretation of the FCLS algorithm in terms of an explicit penalization of the flux constraint. Simulation results conducted on synthetic and real data are presented in Section VI. Conclusions are reported in Section VII.

II. SCALED GRADIENT ALGORITHM FOR POSITIVITY CONSTRAINTS
This section studies an iterative method, referred to as the scaled gradient method (SGM), to minimize any convex criterion under positivity constraints. In other words, the sum-to-one constraint is not taken into account for now; it will be handled in the next section. Minimizing a convex cost function $J$ under inequality constraints can classically be achieved by introducing the Lagrange function. The Lagrange function $L$ associated with the linear unmixing problem with positivity constraints can be written as
$$L(\alpha, \lambda) = J(\alpha) - \lambda^T g(\alpha),$$
where $\lambda = [\lambda_1 \dots \lambda_R]^T$ contains the Lagrange multipliers and $g(\alpha) = (g(\alpha_1) \dots g(\alpha_R))^T$. Let us note that the method proposed hereafter is valid for any differentiable criterion $J$. Moreover, if the criterion is convex and gradient Lipschitz, as in (24), the proposed method converges to the global minimum of the cost function $J$. The function $g$ has to be chosen to express the positivity constraints. More precisely, $g$ is an increasing function that must be positive for inactive constraints ($\alpha_r > 0$) and zero for active constraints ($\alpha_r = 0$). The Karush-Kuhn-Tucker (KKT) conditions [14], [15] at the optimum $(\alpha^*, \lambda^*)$ express as follows:
$$[\nabla_\alpha L(\alpha^*, \lambda^*)]_r = 0, \quad \forall r, \qquad (1)$$
$$g(\alpha_r^*) \geq 0, \quad \forall r, \qquad (2)$$
$$\lambda_r^* \geq 0, \quad \forall r, \qquad (3)$$
$$\lambda_r^* \, g(\alpha_r^*) = 0, \quad \forall r, \qquad (4)$$
where $\nabla_\alpha L$ is the gradient of $L$ with respect to (w.r.t.) $\alpha$ and the notation $[\cdot]_r$ is used for the $r$th component of a vector. As a consequence, (1) leads to
$$\lambda_r^* \, \frac{\partial g(\alpha_r^*)}{\partial \alpha_r} = [\nabla_\alpha J(\alpha^*)]_r.$$
Equivalently, the $r$th Lagrange multiplier can be computed as
$$\lambda_r^* = \frac{[\nabla_\alpha J(\alpha^*)]_r}{\partial g(\alpha_r^*)/\partial \alpha_r}, \quad r = 1, \dots, R.$$
Taking into account the properties of $g$, (4) reduces to
$$[\nabla_\alpha J(\alpha^*)]_r \, g(\alpha_r^*) = 0, \quad r = 1, \dots, R. \qquad (5)$$
This last equation allows an iterative algorithm to be derived to estimate $\alpha$ under positivity constraints.

Let us note that the choice of $g$ affects the algorithm speed. For example, taking a power function with an exponent smaller than $1$ accelerates the algorithm (see Appendix A). If the descent step size is computed to monitor the convergence of the algorithm, an obvious choice for $g(\cdot)$ is $g(\alpha_r) = \alpha_r$. Then, more generally, the $r$th component of the descent direction can be
$$f_r(\hat\alpha) \, \hat\alpha_r \, [-\nabla_\alpha J(\hat\alpha)]_r, \qquad (6)$$
where $f_r(\alpha)$ is a positive function scaling the gradient, leading to the scaled gradient method (SGM)
$$\alpha_r^{(k+1)} = \alpha_r^{(k)} + \gamma_r^{(k)} f_r\big(\alpha^{(k)}\big) \, \alpha_r^{(k)} \big[-\nabla_\alpha J\big(\alpha^{(k)}\big)\big]_r, \qquad (7)$$
where $\gamma_r^{(k)}$ is the descent step size that must be adjusted to ensure convergence of the algorithm.

An interesting choice for the scaling function $f_r(\cdot)$, initially proposed in [16], is recalled below. The negative gradient of any convex cost function $J(\alpha)$ with a finite minimum can always be expressed as the difference between two positive functions:
$$-\big[\nabla_\alpha J\big(\alpha^{(k)}\big)\big]_r = \big[U\big(\alpha^{(k)}\big)\big]_r - \big[V\big(\alpha^{(k)}\big)\big]_r. \qquad (8)$$
By choosing the scaling function as
$$f_r\big(\alpha^{(k)}\big) = \frac{1}{[V(\alpha^{(k)})]_r}, \qquad (9)$$
equation (7) becomes
$$\alpha_r^{(k+1)} = \alpha_r^{(k)} + \gamma_r^{(k)} \, \alpha_r^{(k)} \left( \frac{[U(\alpha^{(k)})]_r - [V(\alpha^{(k)})]_r}{[V(\alpha^{(k)})]_r} \right). \qquad (10)$$
Let us determine the maximum value of the step size such that $\alpha_r^{(k+1)} \geq 0$, given $\alpha_r^{(k)} \geq 0$. Note that, according to (10), a restriction may only apply to the set of indices $r$ such that
$$\big[U\big(\alpha^{(k)}\big)\big]_r - \big[V\big(\alpha^{(k)}\big)\big]_r < 0, \qquad (11)$$
since the other terms are positive. The maximum step size ensuring $\alpha_r^{(k+1)} \geq 0$ is given by
$$\big(\gamma_r^{(k)}\big)_{\max} = \left[ 1 - \frac{[U(\alpha^{(k)})]_r}{[V(\alpha^{(k)})]_r} \right]^{-1}, \qquad (12)$$
which is strictly greater than $1$. Finally, the maximum step size over all the components must satisfy
$$\gamma_{\max}^{(k)} \leq \min_r \big\{ \big(\gamma_r^{(k)}\big)_{\max} \big\}. \qquad (13)$$
This choice ensures the non-negativity of the components of $\alpha^{(k)}$ from iteration to iteration. We can then write the algorithm (10) with a step size $\gamma$ independent of the component:
$$\alpha_r^{(k+1)} = \alpha_r^{(k)} + \gamma^{(k)} \, \alpha_r^{(k)} \left( \frac{[U(\alpha^{(k)})]_r - [V(\alpha^{(k)})]_r}{[V(\alpha^{(k)})]_r} \right). \qquad (14)$$
The step size ensuring the convergence of the algorithm must be computed by an economic line search, i.e., following the Armijo rule (see Appendix B), over the range $\left]0, \gamma_{\max}^{(k)}\right[$. If the step size is chosen equal to $1$, $\gamma_r^{(k)} = 1$, $\forall r = 1, \dots, R$, we obtain the classical multiplicative algorithm:
$$\alpha_r^{(k+1)} = \alpha_r^{(k)} \, \frac{[U(\alpha^{(k)})]_r}{[V(\alpha^{(k)})]_r}.$$
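To make this update concrete, here is a minimal Python/NumPy sketch for the quadratic cost $J(\alpha) = \frac12 \|y - M\alpha\|^2$ of Section IV, for which $U(\alpha) = M^T y$ and $V(\alpha) = M^T M \alpha$ are both positive when $M$ and $y$ have non-negative entries; the function name and the fixed iteration count are illustrative choices, not taken from the paper.

```python
import numpy as np

def multiplicative_sgm(y, M, n_iter=500, eps=1e-12):
    """Multiplicative SGM update (step size 1) for J(a) = 0.5*||y - M a||^2,
    i.e. the ISRA iteration discussed below. Assumes M and y are entrywise
    non-negative so that U = M^T y and V = M^T M a are positive."""
    R = M.shape[1]
    a = np.full(R, 1.0 / R)      # any positive initialization works
    U = M.T @ y                  # numerator U(a), constant here
    MTM = M.T @ M
    for _ in range(n_iter):
        V = MTM @ a              # denominator V(a)
        a = a * U / (V + eps)    # update stays non-negative by construction
    return a
```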
This multiplicative form is very attractive since the positivity of $\alpha_r^{(k)}$ throughout the iterations is ensured for any positive initial value $\alpha_r^{(0)}$, but the convergence is not ensured in the general case. However, in some particular cases, for instance if $J(\hat\alpha)$ is a quadratic cost function, we obtain the iterative space reconstruction algorithm (ISRA), whose convergence has been proved in [17].

III. ALGORITHMS FOR POSITIVITY AND SUM-TO-ONE CONSTRAINTS
A. Normalized SGM
In the case where parameters are subject to positivity and sum-to-one constraints, we propose a normalized SGM (NSGM) by introducing the non-normalized variable $u = [u_1 \dots u_R]^T$ related to $\alpha$ by
$$\alpha = \frac{u}{\sum_j u_j}. \qquad (15)$$
Let us note that if $\sum_j u_j$ is a constant and if $J$ is convex w.r.t. $\alpha$, then $J$ is also convex w.r.t. $u$. This property will be important in the following. The gradient of $J$ w.r.t. a component $u_r$ is
$$\frac{\partial J}{\partial u_r} = \sum_{l=1}^R \frac{\partial J}{\partial \alpha_l} \frac{\partial \alpha_l}{\partial u_r}, \qquad (16)$$
hence
$$\frac{\partial J}{\partial u_r} = \frac{1}{\sum_j u_j} \left( \frac{\partial J}{\partial \alpha_r} - \sum_{l=1}^R \alpha_l \frac{\partial J}{\partial \alpha_l} \right). \qquad (17)$$
Thus the negative gradient of $J$ can be decomposed as in (8) with
$$\big[U\big(u^{(k)}\big)\big]_r = \frac{1}{\sum_j u_j^{(k)}} \left( -\frac{\partial J}{\partial \alpha_r} - \min_r\left(-\frac{\partial J}{\partial \alpha_r}\right) + \epsilon \right), \qquad (18)$$
$$\big[V\big(u^{(k)}\big)\big]_r = \frac{1}{\sum_j u_j^{(k)}} \left( \sum_{l=1}^R \alpha_l \left(-\frac{\partial J}{\partial \alpha_l}\right) - \min_r\left(-\frac{\partial J}{\partial \alpha_r}\right) + \epsilon \right). \qquad (19)$$
Note that the term $\min_r\left(-\frac{\partial J}{\partial \alpha_r}\right)$ is subtracted to ensure the positivity of both $U$ and $V$, whereas $\epsilon$ is a small fixed constant that does not modify the gradient but avoids division by zero in (14). The final algorithm, based on (10) with the particular choice of $f(\cdot)$ given by (9) and expressions (18), (19), is then
$$u_r^{(k+1)} = u_r^{(k)} + \gamma_r^{(k)} u_r^{(k)} \left[ \frac{-\frac{\partial J}{\partial \alpha_r} - \min_r\left(-\frac{\partial J}{\partial \alpha_r}\right) + \epsilon}{\sum_{l=1}^R \alpha_l \left(-\frac{\partial J}{\partial \alpha_l}\right) - \min_r\left(-\frac{\partial J}{\partial \alpha_r}\right) + \epsilon} - 1 \right]. \qquad (20)$$
It can easily be verified that the flux is maintained on $u$, i.e.,
$$\sum_j u_j^{(k+1)} = \sum_j u_j^{(k)}. \qquad (21)$$
As noticed above, this ensures the convexity of $J$ w.r.t. $u$, and allows us to come back to the initial variables $\alpha$, finally leading to the following updating rule of the proposed NSGM:
$$\alpha_r^{(k+1)} = \alpha_r^{(k)} + \gamma_r^{(k)} \alpha_r^{(k)} \left[ \frac{-\frac{\partial J}{\partial \alpha_r} - \min_r\left(-\frac{\partial J}{\partial \alpha_r}\right) + \epsilon}{\sum_{l=1}^R \alpha_l \left(-\frac{\partial J}{\partial \alpha_l}\right) - \min_r\left(-\frac{\partial J}{\partial \alpha_r}\right) + \epsilon} - 1 \right]. \qquad (22)$$
The step sizes $\gamma_r^{(k)}$ are tuned following the Armijo rule [18] (see Appendix B) at each iteration $k$ to ensure the convergence of the algorithm.

It can easily be shown that the KKT conditions (1), (2), (3) and (4) are fulfilled at the solution:
• If the solution $\alpha_r^\star > 0$, then from (7), $[-\nabla_\alpha J(\alpha^*)]_r = 0$.
• If $\alpha_r^\star = 0$ and $[\nabla_\alpha J(\alpha^*)]_r < 0$, there is a contradiction because the term $\big(1 + \gamma_r^{(k)} f_r\big(\alpha^{(k)}\big) \big[-\nabla_\alpha J\big(\alpha^{(k)}\big)\big]_r\big)$ is greater than $1$ in the neighborhood of $\alpha_r^\star$, so the solution would never be reached.
IV. APPLICATION TO LSMA

Within a widely admitted LSMA framework, the LMM assumes that a mixed pixel $y$, resulting from an observation in $L$ spectral bands, can be written as a linear combination of $R$ endmember spectra $M_1, \dots, M_R$:
$$y = M\alpha + e, \qquad (23)$$
where $M = (M_1 \dots M_R)$ is the $L \times R$ matrix of the endmember spectra, $\alpha = (\alpha_1 \dots \alpha_R)^T$ is the abundance vector to be estimated and $e$ is an additive noise. The linear unmixing problem considered in this paper consists of estimating $\alpha$ under the positivity and sum-to-one constraints
$$\alpha_r \geq 0, \quad \forall r = 1, \dots, R \quad \text{and} \quad \sum_{r=1}^R \alpha_r = 1.$$
A standard assumption related to the LMM defined in (23) is that the noise vector is distributed according to a Gaussian distribution with zero mean and covariance matrix $\Sigma = \sigma^2 I_L$, where $I_L$ is the $L \times L$ identity matrix. Note that this statistical model assumes that the noise variance is the same in all bands. This assumption has been used extensively in the literature (see for instance [19], [20], [21]). Since the variance $\sigma^2$ can easily be estimated from the observation vector, it is assumed to be known in this paper. After removing the additive and multiplicative constants, the negative log-likelihood associated with the observation model reduces to the following cost function:
$$J(\alpha) = \frac{1}{2} (y - M\alpha)^T (y - M\alpha). \qquad (24)$$
The opposite of the gradient is then
$$-\nabla_\alpha J(\alpha) = M^T y - M^T M \alpha, \qquad (25)$$
and the resulting iterative algorithm on $\alpha$ to conduct LSMA is given by the updating rule
$$\alpha_r^{(k+1)} = \alpha_r^{(k)} + \gamma_r^{(k)} \alpha_r^{(k)} \left[ \frac{-\frac{\partial J}{\partial \alpha_r} - \min_r\left(-\frac{\partial J}{\partial \alpha_r}\right) + \epsilon}{\sum_{l=1}^R \alpha_l \left(-\frac{\partial J}{\partial \alpha_l}\right) - \min_r\left(-\frac{\partial J}{\partial \alpha_r}\right) + \epsilon} - 1 \right]. \qquad (26)$$
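The update (26) is simple to implement. Below is a minimal Python/NumPy sketch; for brevity it fixes the step size to $\gamma_r^{(k)} = 1$, which reduces to a multiplicative update preserving positivity and the sum-to-one constraint, but without the convergence guarantee provided by the Armijo search of Appendix B. The function name and iteration count are illustrative.

```python
import numpy as np

def nsgm(y, M, n_iter=200, eps=1e-10):
    """Sketch of the NSGM update (26) for J(a) = 0.5*||y - M a||^2 under
    positivity and sum-to-one constraints, with step size fixed to 1
    instead of the Armijo line search used in the paper."""
    R = M.shape[1]
    a = np.full(R, 1.0 / R)                  # feasible initialization
    MTy, MTM = M.T @ y, M.T @ M
    for _ in range(n_iter):
        neg_grad = MTy - MTM @ a             # -grad J, see (25)
        shift = neg_grad.min()               # min_r(-dJ/da_r)
        U = neg_grad - shift + eps           # shifted numerator, positive
        V = a @ neg_grad - shift + eps       # scalar denominator, positive
        a = a + a * (U / V - 1.0)            # update (26) with gamma = 1
    return a                                 # a >= 0 and sum(a) = 1 hold
```

With $\gamma = 1$ the iteration reads $\alpha_r \leftarrow \alpha_r U_r / V$, so each iterate remains positive, and since $\sum_r \alpha_r U_r = V$ whenever $\sum_r \alpha_r = 1$, the sum-to-one constraint is preserved exactly at every step.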
V. INTERPRETATION OF THE FCLS ALGORITHM
The FCLS algorithm [12] is a popular tool to solve the linear unmixing problem detailed in Section IV, briefly recalled below:
$$\min_\alpha \; (y - M\alpha)^T (y - M\alpha) \qquad (27)$$
subject to $\alpha_r \geq 0$, $\forall r = 1, \dots, R$, and $\sum_{r=1}^R \alpha_r = 1$.

Within the FCLS scheme, the positivity constraint is ensured using the nonnegative least squares (NNLS) method proposed by Lawson and Hanson [22]. Regarding the sum-to-one constraint, Heinz and Chang introduce in [12] a new signature matrix $N$ and a new observation vector $s$ defined by
$$N = \begin{pmatrix} \delta M \\ \mathbf{1}^T \end{pmatrix} \quad \text{and} \quad s = \begin{pmatrix} \delta y \\ 1 \end{pmatrix}, \qquad (28)$$
with $\mathbf{1} = (1 \dots 1)^T$. The initial unmixing problem then becomes
$$\min_\alpha \; (s - N\alpha)^T (s - N\alpha) \qquad (29)$$
subject to $\alpha_r \geq 0$, $\forall r = 1, \dots, R$.
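As an illustration, this strategy can be sketched in a few lines with an off-the-shelf NNLS solver; the value of `delta` below is an arbitrary placeholder, since the paper notes it must be tuned empirically.

```python
import numpy as np
from scipy.optimize import nnls

def fcls(y, M, delta=1e-3):
    """Sketch of the FCLS approach of [12]: fold the sum-to-one constraint
    into the augmented least-squares problem (28)-(29), then enforce
    positivity with the Lawson-Hanson NNLS solver [22]. The sum-to-one
    constraint only holds approximately, controlled by delta."""
    R = M.shape[1]
    N = np.vstack([delta * M, np.ones((1, R))])   # N = [delta*M ; 1^T]
    s = np.append(delta * y, 1.0)                 # s = [delta*y ; 1]
    alpha, _ = nnls(N, s)                         # non-negative solution
    return alpha
```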
The negative gradient w.r.t. $\alpha$ is then
$$-\nabla_\alpha J(\alpha) = N^T s - N^T N \alpha \qquad (30)$$
$$= \delta^2 M^T y + \mathbf{1} - \delta^2 M^T M \alpha - \mathbf{1} \sum_i \alpha_i \qquad (31)$$
$$= \delta^2 \left[ M^T y - M^T M \alpha + \frac{1}{\delta^2}\, \mathbf{1} \left( 1 - \sum_i \alpha_i \right) \right]. \qquad (32)$$
Note that (30) corresponds, up to the positive factor $\delta^2$, to the negative gradient of
$$J(\alpha) = \frac{1}{2} (y - M\alpha)^T (y - M\alpha) + \frac{1}{2\delta^2} \left( \sum_i \alpha_i - 1 \right)^2. \qquad (33)$$
This result shows that the sum-to-one constraint is taken into account within the FCLS algorithm by adding a penalization to the data-fidelity term. Consequently, flux conservation is not ensured at each iteration, but only in an approximate way at convergence. Moreover, it depends on the regularization parameter $1/\delta^2$, which needs to be empirically tuned. To conclude, the fundamental difference with the proposed NSGM lies in the fact that we propose an interior point method: at each iterative step, the current estimates satisfy all the constraints and we search for the best estimate among the proposed solutions.

VI. SIMULATION RESULTS
A. Synthetic data
Many simulations have been conducted to validate the NSGM algorithm in the LSMA context. The first experiment has been conducted on a linear mixture of $R = 3$ endmembers with $\alpha = [0. , 0. , 0. ]^T$. The Armijo rule has been implemented with fixed parameters $\beta$ and $\sigma$ (see Appendix B). The three endmembers used in this example have been extracted from the ENVI library [23] and correspond to the spectra of construction concrete, green grass and micaceous loam. The NSGM defined by (26) has been applied to these simulated data corrupted by an additive Gaussian noise with a signal-to-noise ratio SNR = 25 dB. Figure 1 shows a typical example of the abundance estimates $\alpha^{(k)}$ as a function of the number of iterations $k$. The algorithm clearly converges after very few iterations.

The means of the estimated abundances with NSGM, averaged over the Monte Carlo runs, are depicted in Figure 2 as a function of the SNR, and compared to those obtained with the FCLS algorithm [12], the Bayes estimator [10], the SGM algorithm (without flux constraint) detailed in Section II and the PSGM algorithm [13]. The corresponding estimate variances are given in separate tables, namely Tables I, II and III for parameters $\alpha_1$, $\alpha_2$ and $\alpha_3$, respectively.

Fig. 1. Typical NSGM estimate $\alpha^{(k)}$ of the abundance vector $\alpha$ versus the iteration number $k$ for SNR = 25 dB.

TABLE I. Variance of $\alpha_1$ as a function of the SNR (columns: $-10$, $0$, $10$, $20$ dB; rows: Bayes, FCLS, SGM, PSGM, NSGM).

Fig. 3 shows the means and the standard deviations of the estimated abundances for both FCLS and NSGM. Initial values of the abundances have been uniformly drawn, for each Monte Carlo run, in the domain defined by the constraints. Figure 2 shows that the means obtained with the five algorithms are very similar for sufficiently high SNR levels. However, for low SNR, the iterative algorithms SGM, PSGM and NSGM perform better, with very similar performances among the three. However, in the case of SGM, the sum over the components of $\alpha$ fluctuates around one without summing exactly to one. Moreover, the convergence of PSGM is not ensured, contrary to the proposed NSGM. Figure 3 shows that the performances of FCLS and NSGM are very similar in terms of mean and variance of the estimates but, again, the flux constraint is strictly imposed only in the case of NSGM, as demonstrated in Section V.

Fig. 2. Means of the abundances versus SNR for the Bayes, FCLS, SGM, PSGM and NSGM algorithms.
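For reproducibility, a sketch of this kind of synthetic experiment is given below; the endmember matrix is drawn at random here instead of being read from the ENVI library, and the abundance values are illustrative placeholders, not the ones used in the paper.

```python
import numpy as np

def synthetic_mixture(M, alpha, snr_db, rng=np.random.default_rng(0)):
    """Generate y = M @ alpha + e as in (23), with white Gaussian noise
    whose variance is set from the requested SNR in dB (one common
    convention: SNR = mean signal power / noise variance)."""
    y0 = M @ alpha
    sigma2 = np.mean(y0**2) / 10.0**(snr_db / 10.0)
    return y0 + rng.normal(scale=np.sqrt(sigma2), size=y0.shape)

# Placeholder endmembers and abundances (the paper uses ENVI spectra).
L, R = 100, 3
M = np.abs(np.random.default_rng(1).normal(size=(L, R)))
alpha_true = np.array([0.3, 0.5, 0.2])   # sums to one, illustrative only
y = synthetic_mixture(M, alpha_true, snr_db=25.0)
```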
B. Real AVIRIS data
The proposed unmixing algorithm has also been applied to two real hyperspectral images acquired by the JPL spectro-imager AVIRIS: the Cuprite mining site (NV, USA) and the Purdue Indiana Indian Pines test site (IN, USA).

TABLE II. Variance of $\alpha_2$ as a function of the SNR (columns: $-10$, $0$, $10$, $20$ dB; rows: Bayes, FCLS, SGM, PSGM, NSGM).

TABLE III. Variance of $\alpha_3$ as a function of the SNR (columns: $-10$, $0$, $10$, $20$ dB; rows: Bayes, FCLS, SGM, PSGM, NSGM).
1) Cuprite:
This dataset has received considerable attention in the literature since the geologic characteristics of the scene have been mapped in [24], [25]. The sample analyzed in this experiment consists of a sub-image initially studied in [9]. Following the choice in [9], $R = 14$ endmember spectra have been extracted by the vertex component analysis (VCA) [9]. Then, the proposed unmixing procedure has been applied pixel-by-pixel to evaluate the relative contribution of each endmember in each image pixel. The abundance maps are depicted in Fig. 4, where a black (resp. white) pixel corresponds to the absence (resp. presence) of the corresponding endmember. Note that several areas with dominant endmembers are clearly recovered, similar to those identified in [9].
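A sketch of this pixel-by-pixel procedure, reusing the `nsgm` routine sketched after (26); the cube layout below is an assumption, and the endmember matrix is taken as given (the paper obtains it with VCA):

```python
import numpy as np

def unmix_cube(cube, M):
    """Apply NSGM unmixing pixel-by-pixel to a hyperspectral cube of
    shape (rows, cols, L), with endmember matrix M of shape (L, R),
    returning one abundance map per endmember, shape (rows, cols, R)."""
    rows, cols, L = cube.shape
    R = M.shape[1]
    maps = np.empty((rows, cols, R))
    for i in range(rows):
        for j in range(cols):
            maps[i, j] = nsgm(cube[i, j], M)   # nsgm: sketch in Sec. IV
    return maps
```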
2) Indian Pines:
The use of the abundances as features to classify the voxels of a hyperspectral image is today a well-established approach (see for example [26]). It relies on both the dimensionality reduction and the physical meaningfulness of the abundance vector.

Fig. 3. Means and variances of the abundances versus SNR for FCLS and NSGM.

The gain provided by the proposed unmixing algorithm in a classification framework has been evaluated on the Indian Pines data. Figure 5 shows a typical spectral band of the data cube. The ground truth image shown in Figure 6(a) reveals distinct object classes and an additional background class.

The procedure followed to assess the performance of the unmixing algorithms is detailed in what follows.
Fig. 4. Abundance maps estimated by the proposed normalized SGM.

First, a principal component analysis (PCA) has been conducted to reduce the dimensionality of the image. The spectra have been projected on the subspace spanned by the first principal components, the resulting data cube containing most of the total cube power. Then $R = 18$ endmembers (corresponding to the number of classes in the ground truth, plus additional ones for the background class) have been extracted using VCA. Finally, the abundances have been estimated with the FCLS algorithm [12], the SUNSAL algorithm [27] and the proposed fully constrained method, NSGM.

Classification of the abundances has been performed using the multi-class one-against-all support vector machine algorithm [28], with a Gaussian kernel of fixed bandwidth and a fixed regularization parameter. The classifier has been trained using a fraction of the abundances of each class. Figure 6 shows a representative classification result obtained with the considered methods. Note that, for this particular choice of training data, 63.23% of the voxels are correctly classified with FCLS, 63.05% with SUNSAL and 64.14% with NSGM. A Monte Carlo simulation has been performed using different training sets randomly chosen in the image; the mean percentage of correctly classified voxels again favors NSGM over FCLS and SUNSAL. In this classification context, the advantage of the proposed method can be explained by the fact that, contrary to FCLS and SUNSAL, NSGM estimates abundances that strictly comply with the constraints.

Fig. 5. Indian Pines data: image at band 186.
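A sketch of this evaluation step using scikit-learn; the bandwidth, regularization parameter and training fraction below are placeholders, since their exact values are not legible in the source.

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def classify_abundances(abundances, labels, train_frac=0.1, gamma=1.0, C=1.0):
    """Train a one-against-all SVM with Gaussian kernel on a fraction of
    the abundance vectors and return the correct-classification rate on
    the remaining voxels."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        abundances, labels, train_size=train_frac, stratify=labels)
    clf = OneVsRestClassifier(SVC(kernel="rbf", gamma=gamma, C=C))
    clf.fit(X_tr, y_tr)
    return clf.score(X_te, y_te)
```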
VII. CONCLUSIONS
Constrained scaled gradient methods were initially derived for linear models subject to positivity constraints. This paper studied a normalized scaled gradient method (NSGM) handling both positivity and sum-to-one constraints. Contrary to previously proposed algorithms, the NSGM can be applied to any differentiable criterion, all the constraints being fulfilled at each iteration (a characteristic of interior point methods), with guaranteed convergence. The efficiency of the proposed NSGM was illustrated in an LSMA context for the estimation of abundances. The results obtained on synthetic and real data were very promising.

APPENDIX
A. Discussion on the relation between the function g and the algorithm speed

In Section II, the non-negativity constraint is expressed using the function $g(\alpha) = \alpha$. This appendix shows that other functions can be used, resulting in other multiplicative algorithms with higher convergence rates.

Fig. 6. Indian Pines classification. (a) Ground truth. (b) Classification result using FCLS unmixing: 63.23% of voxels are correctly classified. (c) Classification result using NSGM unmixing: 64.14% of voxels are correctly classified. (d) Classification result using SUNSAL unmixing: 63.05% of voxels are correctly classified.

Let us consider, for example, the case where the non-negativity constraint is expressed using the general function $g(\alpha) = \alpha^{1/n}$, with $n \in \mathbb{N}^*$. Taking the decomposition (8),
$$-\big[\nabla_\alpha J\big(\alpha^{(k)}\big)\big]_r = \big[U\big(\alpha^{(k)}\big)\big]_r - \big[V\big(\alpha^{(k)}\big)\big]_r, \qquad (34)$$
the KKT condition (5) writes at the solution
$$\alpha_r^{1/n} \big( [U(\alpha)]_r - [V(\alpha)]_r \big) = 0, \qquad (35)$$
which can be recast in the equivalent form
$$\frac{\alpha_r}{[V(\alpha)]_r^n} \big( [U(\alpha)]_r^n - [V(\alpha)]_r^n \big) = 0. \qquad (36)$$
The expression
$$\frac{[U(\alpha)]_r^n - [V(\alpha)]_r^n}{[V(\alpha)]_r^n} \qquad (37)$$
can be expanded in the form
$$\frac{[U(\alpha)]_r - [V(\alpha)]_r}{[V(\alpha)]_r} \left[ \sum_{p=0}^{n-1} [V(\alpha)]_r^{p-n+1} \, [U(\alpha)]_r^{n-1-p} \right]. \qquad (38)$$
Then the algorithm can be rewritten in the form
$$\alpha_r^{(k+1)} = \alpha_r^{(k)} + \gamma_r^{(k)} \, \alpha_r \, \frac{[U(\alpha)]_r - [V(\alpha)]_r}{[V(\alpha)]_r} \left[ \sum_{p=0}^{n-1} [V(\alpha)]_r^{p-n+1} \, [U(\alpha)]_r^{n-1-p} \right]. \qquad (39)$$
The function $f_r(\alpha)$ is then
$$f_r(\alpha) = \frac{1}{[V(\alpha)]_r} \left[ \sum_{p=0}^{n-1} [V(\alpha)]_r^{p-n+1} \, [U(\alpha)]_r^{n-1-p} \right]. \qquad (40)$$
The effect of this function is to modify both the direction and the modulus of the descent vector. It is always greater than $1/V$ and, when $U$ tends to $V$, i.e., close to convergence, it tends to $n/V$. Thus, taking a function $g(\alpha) = \alpha^{1/n}$ with $n > 1$ has the effect of multiplying the descent step size by a factor greater than $1$, equal to $n$ close to convergence. The search for the maximum step size that ensures the non-negativity of the components of $\alpha$ follows the same procedure as for $n = 1$, and its value is
$$\big(\gamma_r^{(k)}\big)_{\max} = \frac{1}{1 - \dfrac{[U(\alpha^{(k)})]_r^n}{[V(\alpha^{(k)})]_r^n}} \qquad (41)$$
for $r$ such that $[\nabla_\alpha J(\alpha)]_r > 0$. It is always greater than one. In the same way, we can obtain a multiplicative form of the algorithm by using a constant step size equal to one; in this case, the update of $\alpha$ is given by
$$\alpha_r^{(k+1)} = \alpha_r^{(k)} \, \frac{[U(\alpha^{(k)})]_r^n}{[V(\alpha^{(k)})]_r^n}.$$
Clearly, in this case, the use of the exponent $1/n$ proposed in the literature [29], [30] plays the role of an accelerating term, but the convergence is not ensured. Let us note that if $0 < n \leq 1$, we can easily show that
$$\frac{[U(\alpha^{(k)})]_r^n - [V(\alpha^{(k)})]_r^n}{[V(\alpha^{(k)})]_r^n} = \frac{[U(\alpha^{(k)})]_r - [V(\alpha^{(k)})]_r}{[V(\alpha^{(k)})]_r} \cdot \frac{1}{1 + \left(\frac{[U(\alpha^{(k)})]_r}{[V(\alpha^{(k)})]_r}\right)^n + \left(\frac{[U(\alpha^{(k)})]_r}{[V(\alpha^{(k)})]_r}\right)^{2n} + \dots + \left(\frac{[U(\alpha^{(k)})]_r}{[V(\alpha^{(k)})]_r}\right)^{1-n}}. \qquad (42)$$
Then, taking a function $g(\alpha) = \alpha^{1/n}$ with $0 < n \leq 1$ has the effect of multiplying the descent step size by a factor smaller than one and, consequently, of decreasing the algorithm speed.
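A minimal sketch of this accelerated multiplicative variant, assuming the quadratic cost of Section IV (with the paper's caveat that convergence is not guaranteed for this fixed-step form):

```python
import numpy as np

def accelerated_multiplicative(y, M, n=2, n_iter=500, eps=1e-12):
    """Multiplicative update with exponent n, i.e. a_r <- a_r (U_r/V_r)^n,
    corresponding to g(a) = a^(1/n). n > 1 accelerates near convergence
    but removes the convergence guarantee."""
    R = M.shape[1]
    a = np.full(R, 1.0 / R)
    U = M.T @ y                       # positive if M and y are non-negative
    MTM = M.T @ M
    for _ in range(n_iter):
        V = MTM @ a + eps
        a = a * (U / V)**n            # exponent n acts as an accelerator
    return a
```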
B. Line search and Armijo rule
A line search method consists, at each iteration $k$, of choosing a descent direction $p_k$ and a step length $\gamma_k$ to compute
$$u_{k+1} = u_k + \gamma_k p_k \qquad (43)$$
in order to solve the optimization problem
$$\min_{u \in \mathbb{R}^n} f(u) \qquad (44)$$
under the following assumptions on $f(u)$:
• $f(u)$ is a convex function with a finite minimum;
• the gradient of $f(u)$, denoted $\nabla f(u)$, is Lipschitz continuous.

Armijo rule
• Set the scalars $s_k$, $\beta$, $L > 0$ and $\sigma$ as follows:
• $$s_k = \frac{-\nabla f(u_k)^T p_k}{L \, \|p_k\|^2}, \qquad (45)$$
• $\beta \in (0, 1)$,
• $\sigma \in (0, 1)$.
• Then let $\gamma_k$ be the largest $\gamma$ in $\{s_k, \beta s_k, \beta^2 s_k, \dots\}$ such that
$$f(u_k + \gamma p_k) - f(u_k) \leq \sigma \gamma \nabla f(u_k)^T p_k. \qquad (46)$$
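A direct transcription of this rule into Python follows; it is a sketch in which `f` and `grad_f` are assumed callables, the default values of `beta` and `sigma` are placeholders, and `max_backtracks` is an added safeguard not in the paper.

```python
def armijo_step(f, grad_f, u, p, L, beta=0.5, sigma=0.1, max_backtracks=50):
    """Backtracking line search following the Armijo rule (45)-(46):
    start from s_k and shrink by beta until sufficient decrease holds."""
    slope = grad_f(u) @ p                # grad f(u_k)^T p_k, negative
    gamma = -slope / (L * (p @ p))       # initial step s_k, see (45)
    for _ in range(max_backtracks):
        if f(u + gamma * p) - f(u) <= sigma * gamma * slope:   # test (46)
            return gamma
        gamma *= beta                    # next candidate beta^m s_k
    return gamma
```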
Let the sequence { u } be generated by u k +1 = u k + γ k p k where p k is gradientrelated and u k is chosen by Armijo rule. Then every limit point of the sequence { u } is astationary point. October 29, 2018 DRAFT9 R EFERENCES [1] D. A. Landgrebe,
Signal Theory Methods in Multispectral Remote Sensing . New York: Wiley, 2003.[2] C. I. Chang,
Hyperspectral Imaging: Techniques for Spectral Detection and Classification . New York: PlenumPublishing Co., 2003.[3] G. P. Asner, D. E. Knapp, T. Kennedy-Bowdoin, M. O. Jones, R. E. Martin, J. Boardman, and C. B. Field, “Carnegieairborne observatory: in-flight fusion of hyperspectral imaging and waveform light detection and ranging for three-dimensional studies of ecosystems,”
J. Appl. Remote Sensing , vol. 1, no. 1, p. 013536, June 2007.[4] A. Larsolle and H. Hamid Muhammed, “Measuring crop status using multivariate analysis of hyperspectral fieldreflectance with application to disease severity and plant density,”
Precision Agriculture , vol. 8, no. 1–2, pp. 37–47,2007.[5] R. F. Kokalya, B. W. Rockwella, S. L. Haireb, and T. V. V. Kinga, “Characterization of post-fire surface cover, soils,and burn severity at the Cerro Grande Fire, New Mexico, using hyperspectral and multispectral remote sensing,”
Remote Sensing of Environment , vol. 106, no. 3, pp. 305–325, Feb. 2007.[6] N. Keshava and J. F. Mustard, “Spectral unmixing,”
IEEE Signal Processing Magazine , vol. 19, no. 1, pp. 44–57, Jan.2002.[7] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot, “Hyperspectralunmixing overview: Geometrical, statistical, and sparse regression-based approaches,”
IEEE J. Sel. Topics Appl. EarthObservations and Remote Sens. , vol. 5, no. 2, pp. 354–379, April 2012.[8] M. E. Winter, “Fast autonomous spectral end-member determination in hyperspectral data,” in
Proc. 13th Int. Conf.on Applied Geologic Remote Sensing , vol. 2, Vancouver, April 1999, pp. 337–344.[9] J. M. Nascimento and J. M. Bioucas-Dias, “Vertex component analysis: A fast algorithm to unmix hyperspectral data,”
IEEE Trans. Geosci. and Remote Sensing , vol. 43, no. 4, pp. 898–910, April 2005.[10] N. Dobigeon, J.-Y. Tourneret, and C.-I Chang, “Semi-supervised linear spectral unmixing using a hierarchical Bayesianmodel for hyperspectral imagery,”
IEEE Trans. Signal Process. , vol. 56, no. 7, pp. 2684–2695, July 2008.[11] C. P. Robert and G. Casella,
Monte Carlo Statistical Methods , ser. Springer Texts in Statistics. New York: Springer-Verlag, 2005.[12] D. C. Heinz and C.-I Chang, “Fully constrained least squares linear spectral mixture analysis method for materialquantification in hyperspectral imagery,”
IEEE Trans. Geosci. and Remote Sensing , vol. 39, no. 3, pp. 529–545, March2001.[13] C. Theys, N. Dobigeon, J.-Y. Tourneret, and H. Lant´eri, “Linear unmixing of hyperspectral images using a scaledgradient method,” in
Proc. IEEE Workshop Stat. Signal Process. (SSP) , Cardiff, Wales, UK, Aug. 2009, pp. 729–732.[14] W. Karush, “Minima of functions of several variables with inequalities as side constraints,” Ph.D. dissertation, Univ.of Chicago, 1939.[15] H. W. Kuhn and A. Tucker, “Nonlinear programming,” in
Proc. 2nd Berkeley Symp. , U. of California Press, Ed., 1951,pp. 481–492.[16] H. Lant´eri, M. Roche, and C. Aime, “Penalized maximum likelihood image restoration with positivity constraints –multiplicative algorithms,”
Inverse problems , vol. 18, no. 5, pp. 1397–1419, 2002.[17] M. E. Daube-Witherspoon and G. Muehllehner, “An iterative image space reconstruction algorithm suitable for volumeECT,”
IEEE Trans. Medical Imaging , vol. 5, no. 2, pp. 61–66, June 1986.[18] D. P. Bertsekas,
Non linear programming . Athena Scientific, 1995.[19] C.-I Chang, X.-L. Zhao, M. L. G. Althouse, and J. J. Pan, “Least squares subspace projection approach to mixed pixel
October 29, 2018 DRAFT0 classification for hyperspectral images,”
IEEE Trans. Geosci. and Remote Sensing , vol. 36, no. 3, pp. 898–912, May1998.[20] D. Manolakis, C. Siracusa, and G. Shaw, “Hyperspectral subpixel target detection using the linear mixing model,”
IEEE Trans. Geosci. and Remote Sensing , vol. 39, no. 7, pp. 1392–1409, July 2001.[21] J. Wang and C.-I Chang, “Applications of independent component analysis in endmember extraction and abundancequantification for hyperspectral imagery,”
IEEE Trans. Geosci. and Remote Sensing , vol. 44, no. 9, pp. 2601–2616,Sept. 2006.[22] C. L. Lawson and R. J. Hanson,
Solving Least Squares Problems . Prentice-Hall, 1974.[23] RSI (Research Systems Inc.),
ENVI User’s guide Version 4.0 , Boulder, CO 80301 USA, Sept. 2003.[24] R. N. Clark, G. A. Swayze, and A. Gallagher, “Mapping minerals with imaging spectroscopy, U.S. Geological Survey,”
Office of Mineral Resources Bulletin , vol. 2039, pp. 141–150, 1993.[25] R. N. Clark et al. , “Imaging spectroscopy: Earth and planetary remote sensing with the USGS Tetracorder and expertsystems,”
J. Geophys. Res. , vol. 108, no. E12, pp. 5–1–5–44, Dec. 2003.[26] I. D´opido, A. Villa, A. Plaza, and P. Gamba, “A quantitative and comparative assessment of unmixing-based featureextraction techniques for hyperspectral image classification,”
IEEE J. Sel. Topics Applied Earth Observations andRemote Sens. , vol. 5, no. 2, pp. 421–435, 2012.[27] J. Bioucas-Dias and M. A. T. Figueiredo, “Alternating direction algorithms for constrained sparse regression:Application to hyperspectral unmixing,” in
Proc. IEEE GRSS Workshop Hyperspectral Image SIgnal Process.:Evolution in Remote Sens. (WHISPERS) , 2010, pp. 1–4.[28] C.-W. Hsu and C.-J. Lin, “A comparison of methods for multiclass support vector machines,”
IEEE Trans. Neur. Net. ,vol. 13, no. 2, pp. 415–425, 2002.[29] J. Llacer and J. Nu˜nez, “Iterative maximum likelihood and Bayesian algorithms for image reconstruction in astronomy,”in
The restoration Of Hubble Space Telescope images , R. L. White and R. J. Allen, Eds. The Space Telescope ScienceInstitute, 1990, pp. 62–69.[30] T. S. Zaccheo and R. A. Gonsalves, “Iterative maximum-likelihood estimators for positively constrained objects,”
J.Opt. Soc. Am. A , vol. 13, no. 2, pp. 236–242, Feb 1996., vol. 13, no. 2, pp. 236–242, Feb 1996.