A Semi-supervised Spatial Spectral Regularized Manifold Local Scaling Cut With HGF for Dimensionality Reduction of Hyperspectral Images
Ramanarayan Mohanty, Student Member, IEEE, S L Happy, Member, IEEE, and Aurobinda Routray, Member, IEEE
Abstract—Hyperspectral images (HSI) contain a wealth of information over hundreds of contiguous spectral bands, making it possible to classify materials through subtle spectral discrepancies. However, the classification of this rich spectral information is accompanied by challenges such as high dimensionality, singularity, limited training samples, lack of labeled data samples, heteroscedasticity, and nonlinearity. To address these challenges, we propose a semi-supervised graph-based dimensionality reduction method named 'semi-supervised spatial-spectral regularized manifold local scaling cut' (S3RMLSC). The underlying idea of the proposed method is to exploit the limited labeled information from both the spectral and spatial domains, along with the abundant unlabeled samples, to facilitate the classification task while retaining the original distribution of the data. In S3RMLSC, a hierarchical guided filter (HGF) is initially used to smooth the pixels of the HSI data to preserve the spatial pixel consistency. This step is followed by the construction of linear patches from the nonlinear manifold using the maximal linear patch (MLP) criterion. Then the inter-patch and intra-patch dissimilarity matrices are constructed in the spectral and spatial domains by the regularized manifold local scaling cut (RMLSC) and the neighboring pixel manifold local scaling cut (NPMLSC), respectively. Finally, we obtain the projection matrix by optimizing the updated semi-supervised spatial-spectral between-patch and total-patch dissimilarity. The effectiveness of the proposed DR algorithm is illustrated with publicly available real-world HSI datasets.
Index Terms—Dimensionality reduction, hyperspectral image, manifold local scaling cut, neighboring pixel manifold local scaling cut, regularized manifold local scaling cut, semi-supervised spatial-spectral regularized manifold local scaling cut.
R. Mohanty is with the Advanced Technology Development Centre, Indian Institute of Technology, Kharagpur, West Bengal, 721302 India (e-mail: [email protected]). S L Happy and A. Routray are with the Department of Electrical Engineering, Indian Institute of Technology, Kharagpur.

I. INTRODUCTION

HYPERSPECTRAL remote sensing images (HSI) with high spectral and spatial resolutions capture the inherent physical and chemical properties of the land cover. Therefore, HSI data analysis finds potential applications in environmental research, geological surveys, mineral identification, agriculture monitoring, etc. However, the availability of limited training data with a large number of spectral bands makes these applications very challenging. Generally, the variation in sun-canopy-sensor geometry, the multipath scattering of light, and the non-homogeneous composition of pixels make the acquired HSI data modeling nonlinear [1]. Handling these complex and high-dimensional redundant nonlinear data is a major challenge in HSI data analysis. To mitigate this challenge, an effective dimensionality reduction (DR) method is essential before training the classifiers. In this paper, we make an effort to address this challenge from the manifold learning point of view.

Assuming that real-world high-dimensional data possess few degrees of freedom [2], manifold learning helps in recovering compact, meaningful low-dimensional structures from complex high-dimensional data for subsequent processing, such as classification and visualization [3]. This projects the higher-dimensional data into a lower-dimensional space while preserving the underlying geometrical structure [4]. Several state-of-the-art techniques for DR utilize manifold learning, such as locally linear embedding (LLE) [5] and isometric feature mapping (ISOMAP) [6]. These methods follow an unsupervised mode and model the data using a single manifold. More broadly, DR methods are classified into three major categories: unsupervised, supervised, and semi-supervised. The unsupervised DR methods project the data to a lower-dimensional space without using any label information; state-of-the-art unsupervised DR techniques include principal component analysis (PCA) [7], LLE [5], [8], ISOMAP [6], Laplacian eigenmaps (LE) [9], etc. However, their implicit nonlinear mapping forbids them from being applied directly to new test samples, which limits the application of these methods to the classification task. Apart from that, these unsupervised manifold learning methods search the k-nearest neighbors (Knn) of a given point across different classes, whereas their supervised counterparts only identify the neighbors that belong to the same class as that point. Hence, supervised manifold learning is more favorable for the classification of HSI data [10], [11].

The supervised DR approaches use the label information to learn discriminative projections. These include linear discriminant analysis (LDA) [12], [13], scaling cut (SC) [14], local scaling cut (LSC) [14], [15], [16], linear discriminant embedding (LDE) [17], local Fisher discriminant analysis (LFDA) [18], [19], nonparametric weighted feature extraction (NWFE) [20], and so on. LDA seeks a discriminative projection by maximizing the between-class scatter and minimizing the within-class scatter under the assumption that each class follows a unimodal Gaussian distribution with equal covariance. Therefore, LDA fails to handle real-world heteroscedastic and multimodal data. SC [14] and LSC [15] address these issues by constructing a pairwise dissimilarity matrix among the samples. LDE extends LDA by performing local discrimination in a graph embedding framework. LFDA fuses the discriminative property of LDA with the locality-preserving capability of locality preserving projection (LPP) [21]. The NWFE method extends LDA by adding nonparametric scatter matrices computed from the training samples.

The aforementioned DR methods mostly focus on spectral-based approaches. They use the spectral-domain Euclidean distance to compute the similarity measure.
The spectral-domain methods possess several limitations: 1) A relatively large number of spectral bands with respect to small training samples creates a singularity in the sample covariance matrix, which leads to ill-posed problems in classification [22]. 2) The limited availability of labeled data samples for supervised learning is a major bottleneck in HSI data classification; it is also an expensive task, in terms of time and effort, to label all the acquired data for classification purposes. 3) As an HSI data class is distributed over multiple subregions, two HSI data samples with a small spectral distance may have a large spatial distance or may belong to different classes (e.g., the concrete rooftop of a house and a concrete road may have a similar spectral signature but belong to different classes). This implies that a spectral similarity measure alone is not sufficient for the HSI data classification task. Hence, the spectral similarity measure, without considering the spatial inter-pixel correlation, sometimes leads to under-classification or over-classification [23]. Therefore, the use of spectral information alone results in unsatisfactory performance.

To mitigate the challenge of limited training data, several DR algorithms adopt the semi-supervised approach by incorporating unlabeled data samples with the labeled training samples. Several attempts have been made at semi-supervised classification of HSI data, such as semi-supervised discriminant analysis (SDA) [24], semi-supervised local Fisher discriminant analysis (SELF) [25], semi-supervised local discriminant analysis (SELD) [26], generalized semi-supervised local discriminant analysis (GSELD) [27], and semi-supervised local scaling cut (SSLSC) [16]. SDA is the semi-supervised version of LDA and faces problems similar to those of LDA. Similarly, SELF, SELD, GSELD, and SSLSC are the semi-supervised versions of the LFDA, NPE, and LSC algorithms. SELF is based on LDA; when few labeled samples are available, LDA performs poorly due to over-fitting of the data. The SELD method addresses the issues faced by SELF, yet it overlooks the nonlinear property of the labeled data in deriving the scatter matrix for the whole class. The GSELD method is an extended version of SELD with tunable parameters. Similarly, SSLSC only considers the k-nearest elements to form the dissimilarity matrix, without considering the nonlinear property of the classes. However, this is not sufficient to cope with the other mentioned limitations. Hence, recent DR methods consider both the spatial and the spectral information to perform the similarity measure.

The spatial contextual information boosts the discrimination ability of the spectral information and thus improves the HSI data classification performance. Therefore, spectral-spatial based methods have gained considerable attention in HSI feature extraction and classification tasks [22], [23], [28], [29]. In [30], Zhong et al. proposed a tensorial extension of LDA to extract spatial-spectral features of HSI, which assumes a Gaussian distribution with equal variance. The spatial methods include two major categories: i) spatial filtering and ii) spatial feature extraction. Spatial filtering is a preprocessing approach to classification, while spatial feature extraction incorporates the spatial information to improve the class discrimination.
For example, multiple spectral and spatial features at both pixel and object levels are combined in [31] to construct an ensemble support vector machine (SVM) for direct classification of the data.

The above-mentioned methods either focus more on the data distribution problem or pay more attention to the problem of learning a discriminant function, overlooking the nonlinearity property, the intrinsic geometry, and the proper data projection direction. In this paper, we propose an approach for solving these issues that other techniques have overlooked. To achieve this objective, we first represent the whole dataset as the union of several local linear patches, followed by the application of the local scaling cut to find the optimal projection matrix of the data in both the spectral and spatial domains. We then extend this to a semi-supervised method by exploiting the unlabeled data samples and name it semi-supervised spatial-spectral regularized manifold local scaling cut (S3RMLSC). A brief overview of the proposed method follows.

Here, we propose a manifold-based method for reducing the dimensions of the HSIs on the basis of their geometry and nonlinearity. The proposed S3RMLSC method is derived in six small steps. The initial step consists of preprocessing the data with a spatial hierarchical guided filter (HGF) [32] to enhance pixel consistency by performing edge-aware noise smoothing. In the second step, we generate local patches from the preprocessed data by selecting data points of the local neighborhood within each class in a hierarchical manner; this results in the construction of multiple non-overlapping linear patches from a single class. In the third step, we propose the spectral-domain manifold local scaling cut (MLSC) by constructing the inter-patch and intra-patch dissimilarity matrices; the inter-patch dissimilarity matrix is constructed between two nearest patches of different classes only. We then add a regularizer to the spectral-domain MLSC to improve the classification performance by enhancing data diversity and stability, formulating a regularized MLSC (RMLSC). However, spectral information alone is not sufficient to achieve better classification accuracy. Hence, in the fourth step, we propose a graph-based spatial segmentation technique over the patches, called the neighboring pixel MLSC (NPMLSC). The NPMLSC method constructs the between-patch and within-patch dissimilarity matrices among the spectrally closest patches. After obtaining the dissimilarity matrices from the spectral RMLSC and the spatial NPMLSC, we fuse both sets of matrices in step five to obtain the new dissimilarity matrices of the spatial-spectral RMLSC (SSRMLSC). Finally, in the last step, we incorporate the unlabeled data with the labeled data to formulate the semi-supervised SSRMLSC (S3RMLSC) method. Irrespective of the distribution and the modality of the data, these local patches in the manifold are better separated, while the intrinsic geometry of the data is well preserved to maintain the within-patch compactness. This assures reliable classification of new testing data in the projected embedding space.

In summary, the method proposed in this paper addresses several issues, such as high dimensionality, singularity, insufficient labeled data samples, multimodality, and heteroscedasticity, while preserving the local geometry of the data.
The major characteristics of this work are enumerated as follows:
• The spatial filter enhances the robustness towards noisy points by preserving the neighboring pixel consistency.
• The spectral information reveals the nonlinear manifold properties of the HSI data and gathers the neighboring data samples from the manifold that span the same class and lie on a linear patch.
• The spatial inter-pixel correlation among the elements of the nearest patches enhances the class discrimination of objects having similar spectral signatures but belonging to different classes.
• The regularization strategy used in the spectral-domain graph cut overcomes the singularity issue that might arise due to small sample sizes. Moreover, the penalty term reveals the data diversity and enhances the data stability.
• SSRMLSC fuses both the local label neighborhood and the local pixel neighborhood relations of the patches to achieve a better projection and high classification accuracy.
• S3RMLSC incorporates the spectral-spatial information from the labeled training data with randomly selected unlabeled samples from the test data for further improvement of the classification performance.

The complete workflow diagram of the proposed S3RMLSC-based HSI classification system is shown in Fig. 1.

II. SEMI-SUPERVISED SPATIAL SPECTRAL REGULARIZED MANIFOLD LOCAL SCALING CUT

A. Problem formulation
Let us assume that the data samples of the dataset lie on a nonlinear manifold $\mathcal{M}$. In semi-supervised learning, a few unlabeled samples from the test set are used during training. Suppose $L + U$ is the total number of training samples, containing $L$ labeled and $U$ unlabeled samples. Therefore, we can represent the training dataset as $X = \{x_i\}_{i=1}^{L+U} = (X_L, X_U)$, $x_i \in \mathbb{R}^D$, where $X_L = \{x_i\}_{i=1}^{L}$ is the labeled training dataset with labels $y_i$, and $X_U$ is the unlabeled training data collected from the test dataset. The test dataset is $X_{test}$ and $X_U \subset X_{test}$. Here the number of distinct classes is $K$, i.e., $y_i \in \{1, 2, ..., K\}$. We represent $\mathcal{M}$ as the union of several linear patches such that $\mathcal{M} = \{S_1, S_2, ..., S_n\}$, where $n$ denotes the number of linearized patches.

Our objective is to project the higher-dimensional feature space to a lower one ($\mathbb{R}^d$) by considering the original distribution of samples in the patch-wise locality of the spectral-domain manifold, while utilizing the information from the spatial neighborhood pixel structure to boost the system performance. These projection directions are obtained by maximizing the between-patch separability and minimizing the within-patch distances to enhance the compactness of the local patches with varied class labels. First, the HSI data are preprocessed spatially by HGF [32]. Then, the manifold is learned considering the spatial-spectral local patch discrimination to distinguish the manifold boundaries efficiently.
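To make the data setup concrete, the following minimal Python sketch (with hypothetical variable names) assembles the semi-supervised training set $X = (X_L, X_U)$ by drawing the unlabeled portion at random from the test pool; the default U = 2000 matches the number of unlabeled samples used in the experiments reported later.

```python
import numpy as np

def make_semisupervised_set(X_lab, y_lab, X_test, U=2000, seed=0):
    """Assemble X = (X_L, X_U): all L labeled samples plus U unlabeled
    samples drawn at random from the test pool, so that X_U is a subset
    of X_test."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_test), size=U, replace=False)
    X = np.vstack([X_lab, X_test[idx]])   # (L + U) x D feature matrix
    return X, y_lab, idx                  # labels exist only for the first L rows
```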
B. Spatial Hierarchical Guided Filter

The HGF [32] is a hierarchical extension of the edge-preserving guided filter (GF) [33], which is used for edge-aware noise removal. It is based on the assumption of a local linear model, i.e., the filter output $F$ is a linear transformation of the guidance image $I$ in a square window $w_k$ of size $r \times r$ centered at the pixel $k$:

$$f_i = a_k I_i + b_k \quad \forall i \in w_k \qquad (1)$$

where $a_k$ and $b_k$ are linear coefficients for $w_k$. The assumption of this model ensures that $\nabla f \approx a \nabla I$, i.e., the filter output $F$ has an edge wherever the guidance image $I$ has an edge. Given the input image $P$, the linear coefficients $a_k$ and $b_k$ are determined by minimizing the energy function

$$E(a_k, b_k) = \sum_{i \in w_k} \left( (a_k I_i + b_k - P_i)^2 + \epsilon a_k^2 \right) \qquad (2)$$

Here, $\epsilon$ is the regularization parameter that controls the degree of blurring of the guided image.

In this filtering step, we initially obtain the guidance image $I$ by taking the PCA of the input image $P$; the first principal component is selected as the gray-scale single-channel guidance image, so that maximum reconstruction is possible. The given input HSI dataset is represented as $X = \{B_1, B_2, ..., B_D\}$, where the input image $P = X$, $B_i$ is the $i$th band, and $D$ is the total number of bands. The principal components of the HSI data are derived as $[pc_1, pc_2, pc_3, ..., pc_D] = PCA(X)$, and the constructed guidance image is $I = [pc_1]$. Then, using Eqs. (1) and (2), we determine the filtered output of each band of the input image $P$ and generate the new filtered image $F = [f_1, f_2, ..., f_D]$ with the same dimensions as $P$. In this hierarchical model, the output image $F$ of the current hierarchy is utilized as the input to the next hierarchy. This filter captures both the small and the large homogeneous spatial structures of the HSI data.
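To illustrate the filtering step, the sketch below implements a standard guided filter (He et al. [33]) wrapped in a simple hierarchy, assuming the HSI cube is an H × W × D array. The window radius, ε, and the number of hierarchies are illustrative assumptions, not the settings of [32].

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, P, r=3, eps=1e-3):
    """Edge-preserving guided filter: filters one band P using the guidance
    image I with a (2r+1) x (2r+1) box window (Eqs. (1)-(2))."""
    box = lambda x: uniform_filter(x, size=2 * r + 1, mode='reflect')
    mean_I, mean_P = box(I), box(P)
    cov_IP = box(I * P) - mean_I * mean_P     # windowed covariance of (I, P)
    var_I = box(I * I) - mean_I ** 2          # windowed variance of I
    a = cov_IP / (var_I + eps)                # linear coefficients a_k
    b = mean_P - a * mean_I                   # linear coefficients b_k
    return box(a) * I + box(b)                # filter output f_i

def hgf(X, hierarchies=2, r=3, eps=1e-3):
    """Hierarchical guided filtering of an HSI cube X (H x W x D): the
    output of one hierarchy is the input to the next."""
    F = X.astype(float)
    for _ in range(hierarchies):
        H, W, D = F.shape
        flat = F.reshape(-1, D)
        flat = flat - flat.mean(axis=0)
        _, _, Vt = np.linalg.svd(flat, full_matrices=False)
        I = (flat @ Vt[0]).reshape(H, W)      # guidance = first PC of the cube
        F = np.stack([guided_filter(I, F[..., d], r, eps) for d in range(D)],
                     axis=-1)
    return F
```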
C. Local linearized patch model construction

Several methods have been proposed to extract local linear patches from a manifold using k-means clustering [34], [35], [36] and hierarchical agglomerative clustering (HAC) [37], [38]. These methods do not consider the linearity property of the extracted local patches during manifold formation. Moreover, the number of extracted local patches needs to be specified prior to clustering. Apart from that, the Euclidean distance measure becomes uniform and adversely affects the data representation when the dimension is high. In order to overcome these limitations, the concept of a maximal linear patch (MLP) [39] with top-down hierarchical divisive clustering (HDC) was proposed in [40].

Fig. 1: Complete workflow diagram of the proposed approach.

The principal objective of the MLP lies in two major criteria: 1) the linear patch criterion: for every point pair, the geodesic distance must be nearly equal to the Euclidean distance, which ensures that the patch lies in a linear subspace; 2) the maximal linear patch criterion: the patch size is maximized until an appended data sample violates the linear patch criterion. The nonlinearity degree of this technique is measured by the linear patch criterion, i.e., the deviation between the Euclidean and geodesic distances [6], [41]. In this work, we choose the hierarchical clustering technique HDC to construct the local linear patches from the nonlinear manifold, due to its ability to construct cluster trees or dendrograms of different degrees.

We construct the local linear patches $(S_1, S_2, ..., S_n)$ from the filtered training data samples ($X_L$) using the HDC-MLP algorithm for each class separately; that is, the samples from each class are further divided into different patches. Thus, the nonlinear manifold is approximated by the union of the local patches, each containing samples from one class only, given by

$$\mathcal{M} = \bigcup_{k=1}^{n} S_k, \quad \text{and} \quad S_i \cap S_j = \phi \ (i \neq j) \qquad (3)$$

where $L$ is the total number of labeled training samples and $n$ is the total number of disjoint linear patches. Let $t_k$ be the number of data samples in the $k$th patch ($S_k$), so that $\sum_{k=1}^{n} t_k = L$.

The major advantages of generating the local patches are 1) preserving the inherent structure of the nonlinear manifold, and 2) using these local patches, instead of the class samples, for the construction of the projection matrix in MLSC. Extracting local patches proves to be beneficial for obtaining projection vectors that achieve the optimal performance locally.
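A minimal sketch of this patch construction follows. It approximates the HDC-MLP procedure with a recursive two-way split (seeded here by k-means, which is our assumption; [40] only prescribes top-down divisive clustering) and the geodesic-versus-Euclidean linearity test; the tolerance and minimum patch size are illustrative.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def nonlinearity_degree(X, k=5):
    """Mean ratio of graph-geodesic to Euclidean distance over all pairs;
    a value close to 1 indicates the points lie on a (locally) linear patch."""
    if len(X) <= k:
        return 1.0
    G = kneighbors_graph(X, n_neighbors=k, mode='distance')
    geo = shortest_path(G, directed=False)
    eucl = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    mask = np.isfinite(geo) & (eucl > 0)
    return float(np.mean(geo[mask] / eucl[mask]))

def hdc_mlp(X, tol=1.05, min_size=10, k=5):
    """Top-down hierarchical divisive clustering: keep splitting a group in
    two until every patch satisfies the (maximal) linear patch criterion."""
    if len(X) <= min_size or nonlinearity_degree(X, k) <= tol:
        return [X]                            # already a maximal linear patch
    labels = KMeans(n_clusters=2, n_init=5).fit_predict(X)
    if min(np.bincount(labels, minlength=2)) == 0:
        return [X]                            # degenerate split; stop here
    return (hdc_mlp(X[labels == 0], tol, min_size, k)
            + hdc_mlp(X[labels == 1], tol, min_size, k))

# per-class patching: patches = [p for c in set(y) for p in hdc_mlp(X[y == c])]
```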
D. Manifold Local Scaling Cut (MLSC)

In order to extract the spectral-domain information from the generated linear patches, we propose a manifold local scaling cut (MLSC) method. After representing the manifold by the local linear patches, the aim is to construct the optimal projection matrix. The MLSC criterion is constructed to exploit both the local linear patch geometry and the global manifold structure of the HSI data. It exploits the geometry by using a Knn graph over the local linear patches extracted from the nonlinear manifold.

In [40], the discriminant function was calculated by considering the patch centers, which implies that every patch obeys a Gaussian distribution with equal variance. However, real-world data usually have non-Gaussian distributions. To address this issue, the existing graph-based DR approaches such as SC [14], LSC [15], and SSLSC [16] select the nearest data samples from the same class and from dissimilar classes for constructing the dissimilarity matrix. This has a limitation too: if a sample is surrounded by samples of other classes in all directions, then a proper projection matrix is not learned. However, this can be overcome by considering samples in groups instead of individually. When samples are grouped, we can use the nearest groups of similar and dissimilar classes for constructing the dissimilarity matrix.

MLSC solves the above issue by working on the local linear patches ($S_i$). It optimizes the projection vectors by applying discriminant analysis on the samples locally. The formation of local patches enables the algorithm to select the closest dissimilar-class patches for constructing the dissimilarity matrix. Since the patches are locally linear, it guarantees the appropriate learning of the projection directions. Moreover, the number of samples in a patch is determined by the linearity constraints. Thus, the selection of the neighboring patch, instead of the neighboring k samples, helps in preserving the data variance and the inherent manifold structure. The closest patch pairs are determined by computing the inter-patch distances of every patch of one class with every patch of the dissimilar classes only.

A conceptual illustration of the proposed method is shown in Fig. 2. As can be seen in Fig. 2, the nonlinear manifold $\mathcal{M}$ is represented by the linear local patches {A, B, C, D, E, F, G, H}, each of which contains samples of a single class.

Fig. 2: Illustration of the inter-patch dissimilarity matrix construction for MLSC.

MLSC uses the data points of the closest patch of a different class to construct the dissimilarity matrix. The between-class separability is represented by double-headed thick blue arrows. The closest patch pairs based on their inter-patch distances are shown by thin black arrows within the thick blue arrows. For instance, G ⇔ H (⇔ signifies that H is the nearest patch of G and vice versa), as the distance between G and H is the smallest inter-patch distance for G as well as for H. Similarly, A ⇐ C (⇐ signifies that C is the nearest patch of A but not vice versa). Note that C ⇐ D, as D is the nearest one for C. However, E is the nearest one for D; therefore, D ⇐ E. Although the distance between patches B and D is the least, D cannot be the nearest patch pair of B because B and D belong to the same class. Hence, B is paired with C and B ⇐ C. The between-patch distances are maximized to enhance their separability (shown by the blue arrows), and within each patch the data samples are compressed to enhance the within-patch compactness (shown by the red arrows).

Now, we have the manifold $\mathcal{M} = \{S_1, S_2, ..., S_n\}$ represented by $n$ local linear patches. Let $S_k$ and $S_{k'}$ be two local patches of different classes that are situated close to each other. Since the nonlinear manifold is locally approximated by linear patches, it is natural to assume that the distances between the data samples and patches are locally linear [9]. Hence, the distance between two close local patches of different classes is a Euclidean distance, computed as

$$D^{patch}_{S_k - S_{k'}} = \|\mu_{S_k} - \mu_{S_{k'}}\|, \quad \text{where } \mu_{S_k} = \frac{1}{t_k}\sum_{x_i \in S_k} x_i \text{ and } \mu_{S_{k'}} = \frac{1}{t_{k'}}\sum_{x_i \in S_{k'}} x_i \qquad (4)$$

$$B^{Spec}_k = \sum_{x_i \in S_k}\sum_{x_j \in S_{k'}} \frac{1}{t_k t_{k'}}(x_i - x_j)(x_i - x_j)^T, \qquad W^{Spec}_k = \sum_{x_i \in S_k}\sum_{x_j \in S_k} \frac{1}{t_k^2}(x_i - x_j)(x_i - x_j)^T \qquad (5)$$

where $t_k$ and $t_{k'}$ denote the numbers of samples in the local patches $S_k$ and $S_{k'}$, respectively. $B^{Spec}_k$ measures the dissimilarity between the samples of the local $k$th patch $S_k$ and those of its nearest neighbor $S_{k'}$. Similarly, $W^{Spec}_k$ measures the total dissimilarity within the samples of the patch $S_k$. Here, we only use the samples of the closest local linear patch pairs of different classes, instead of all samples of the classes, for computing the dissimilarity matrix. For example, in Fig. 2, the local linear patches {A, B, D}, {C, E, G}, and {F, H} have different class labels. We can observe that patches B and D are closer to each other than {B, C}, {C, D}, or {D, E}. However, $S_{k'} = C$ for $S_k = B$, as B and D belong to the same class whereas patch C belongs to a different class. Similarly, $S_{k'} = E$ for $S_k = D$. These nearest patches of the varied classes are then used for constructing the dissimilarity matrices, which enter the optimization process that yields the optimal projection matrix for dimension reduction. Using the definitions of $B^{Spec}_k$ and $W^{Spec}_k$, the objective function of MLSC is defined as
$$MLSC(V) = \frac{\left|\sum_{k=1}^{n} V^T B^{Spec}_k V\right|}{\left|\sum_{k=1}^{n}\left(V^T W^{Spec}_k V + V^T B^{Spec}_k V\right)\right|} = \frac{\left|V^T B^{Spec} V\right|}{\left|V^T (W^{Spec} + B^{Spec}) V\right|} = \frac{\left|V^T B^{Spec} V\right|}{\left|V^T T^{Spec} V\right|}, \quad \text{with } \sum_{k=1}^{n} B^{Spec}_k = B^{Spec}, \ \sum_{k=1}^{n} W^{Spec}_k = W^{Spec} \qquad (6)$$

where $B^{Spec}$ is the between-patch, $W^{Spec}$ the within-patch, and $T^{Spec} = B^{Spec} + W^{Spec}$ the total dissimilarity matrix of all local patches in the spectral domain. These spectral dissimilarity matrices are used to construct the optimal projection matrix $V$ by simultaneously maximizing the between-patch dissimilarity and minimizing the within-patch dissimilarity.
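The sketch below assembles $B^{Spec}$ and $W^{Spec}$ from a list of patches, using a closed-form pairwise-scatter identity to avoid an explicit double loop over samples; the nearest dissimilar-class patch is chosen by the centroid distance of Eq. (4), and the $1/(t_k t_{k'})$ scaling follows Eq. (5).

```python
import numpy as np

def pairwise_scatter(A, B):
    """Sum over all pairs (a in A, b in B) of (a - b)(a - b)^T in closed form:
    nB * A^T A + nA * B^T B - (sum A)(sum B)^T - (sum B)(sum A)^T."""
    nA, nB = len(A), len(B)
    sA, sB = A.sum(0, keepdims=True), B.sum(0, keepdims=True)
    return nB * A.T @ A + nA * B.T @ B - sA.T @ sB - sB.T @ sA

def mlsc_matrices(patches, patch_labels):
    """Spectral between-patch (B) and within-patch (W) dissimilarity matrices
    of Eqs. (5)-(6); patch_labels[k] is the class of patch k."""
    D = patches[0].shape[1]
    B, W = np.zeros((D, D)), np.zeros((D, D))
    mus = [p.mean(0) for p in patches]
    for k, Sk in enumerate(patches):
        # nearest patch of a *different* class, per Eq. (4)
        rivals = [j for j in range(len(patches))
                  if patch_labels[j] != patch_labels[k]]
        kp = min(rivals, key=lambda j: np.linalg.norm(mus[k] - mus[j]))
        Skp = patches[kp]
        B += pairwise_scatter(Sk, Skp) / (len(Sk) * len(Skp))
        W += pairwise_scatter(Sk, Sk) / (len(Sk) ** 2)
    return B, W
```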
E. Spectral Regularized Manifold LSC (RMLSC)

The major limitation of the spectral MLSC method is that it suffers from the singularity issue caused by small sample sizes. This limitation is addressed by appending a regularization term to the spectral MLSC. The regularizer also increases the patch discrimination by enhancing the inter-patch variability. The newly derived criterion is termed regularized manifold LSC (RMLSC). RMLSC performs the spectral-domain manifold local scaling cut operation with a penalty term. Inspired by the existing literature on spectral-domain DR methods in [23] and [15], we propose a new objective function for the RMLSC criterion, defined as

$$RB^{spec} = tr\left(V^T [(1-\alpha) B^{spec} + \alpha X X^T] V\right), \quad RW^{spec} = tr\left(V^T [(1-\alpha) W^{spec} + \alpha\, Diag(diag(W^{spec}))] V\right),$$
$$RT^{spec} = RB^{spec} + RW^{spec} = tr\left(V^T [(1-\alpha)(W^{spec} + B^{spec}) + \alpha (R_w + R_b)] V\right) \qquad (7)$$

$$RMLSC(V) = \max_{V \in \mathbb{R}^{D \times d}} \frac{RB^{spec}}{RT^{spec}} = \max_{V \in \mathbb{R}^{D \times d}} \frac{tr\left(V^T [(1-\alpha) B^{spec} + \alpha R_b] V\right)}{tr\left(V^T [(1-\alpha) T^{spec} + \alpha (R_w + R_b)] V\right)} \qquad (8)$$

where $R_w = Diag(diag(W^{spec}))$ and $R_b = X X^T$ are the regularizers of the within-patch dissimilarity $W^{spec}$ and the between-patch dissimilarity $B^{spec}$, respectively; $\alpha \in [0, 1]$ is the regularization parameter, $tr(\cdot)$ is the trace of a matrix, $diag(\cdot)$ extracts the diagonal elements of a matrix, and $Diag(\cdot)$ converts a vector into a diagonal matrix. The numerator ($RB^{spec}$) of the objective function corresponds to the between-patch dissimilarity with the regularization term $R_b$, and the denominator ($RT^{spec} = RB^{spec} + RW^{spec}$) represents the combination of the within-patch ($RW^{spec}$) and between-patch ($RB^{spec}$) dissimilarity matrices with their corresponding regularizers $R_w$ and $R_b$.

The major modification in RMLSC is the regularization terms $R_b$ and $R_w$. The regularization term $R_b$ in the numerator preserves the data diversity by maximizing the data variance [23], [42]; it has been shown that well-preserved data diversity greatly improves classification performance [43]. The regularizer $R_w$ used in the within-patch dissimilarity matrix is diagonal, and it improves the stability of the solution. Due to the limited training samples in HSI [26], the eigenvalues decay very rapidly to zero [44], [23]. These small or zero eigenvalues induce instability and lose discriminative information by placing null spaces in the basis. The diagonal regularizer reduces the decay of the eigenvalues by acting against the biased estimation of the small eigenvalues of the limited training data [45]; hence, it provides better stability. In the denominator, the total dissimilarity combines both $RB^{spec}$ and $RW^{spec}$; hence, the denominator provides both data diversity and stability to the solution.

When $\alpha = 0$, RMLSC reduces to MLSC. RMLSC uses the labeled samples to determine the discriminative projection direction while respecting the original distribution and modality. The regularization terms are added to the between-patch and within-patch dissimilarity matrices to incorporate data diversity and stability into the local manifold structure of the neighborhood samples.
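A small sketch of Eq. (7) follows; it returns the regularized matrices before projection (the trace over $V$ is applied later, inside the optimization). The default α = 0.5 is a placeholder, since the paper selects α empirically by grid search.

```python
import numpy as np

def rmlsc_matrices(B_spec, W_spec, X, alpha=0.5):
    """Regularized spectral matrices of Eq. (7): R_b = X X^T promotes data
    diversity, R_w = Diag(diag(W)) stabilizes small or zero eigenvalues.
    X is the D x L matrix of labeled samples (features in rows)."""
    R_b = X @ X.T
    R_w = np.diag(np.diag(W_spec))
    RB = (1 - alpha) * B_spec + alpha * R_b
    RW = (1 - alpha) * W_spec + alpha * R_w
    return RB, RW, RB + RW                # the last term is RT = RB + RW
```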
F. Spatial Neighboring Pixel MLSC (NPMLSC)

In the spatial domain, neighboring pixels share similar land-cover properties and usually belong to the same class. Hence, this spatial information of the MLSC patches can be useful in determining the projection matrix and thereby improving the classification performance. In the proposed neighboring pixel manifold local scaling cut (NPMLSC) method, we construct a dissimilarity matrix using the spatial neighborhood pixel information of the MLSC patches. It preserves the original spatial neighborhood pixel correlation in the projected NPMLSC embedding space.

As explained in Section II-D, we first determine the locally situated nearest linear patch of $S_k$, i.e., $S_{k'}$, where $S_k$ and $S_{k'}$ belong to two different classes. Next, we determine the spatial neighborhood of all the pixels present in $S_{k'}$. Let $x_j$ be a pixel in patch $S_{k'}$ ($x_j \in S_{k'}$), and denote the $p$ surrounding pixels in the spatial neighborhood of $x_j$ by $P_j = \{x_{j1}, x_{j2}, ..., x_{jp}\}$. The collection of spatial neighborhood elements of $S_{k'}$ is denoted by $\bar{S}_{k'} = \{P_1, P_2, ...\} = \{x_{11}, x_{12}, ..., x_{1p}, x_{21}, x_{22}, ..., x_{2p}, ..., x_{j1}, x_{j2}, ..., x_{jp}, ...\}$. Then, using the spatial neighborhood elements $\bar{S}_k$ and $\bar{S}_{k'}$ of the nearest patch pair, we compute both the between-patch dissimilarity matrix ($B^{spa}$) and the within-patch dissimilarity matrix ($W^{spa}$):

$$B^{spa} = \sum_{k=1}^{n} \sum_{x_i \in \bar{S}_k} \sum_{x_j \in \bar{S}_{k'}} \eta_{ij} (x_i - x_j)(x_i - x_j)^T, \qquad W^{spa} = \sum_{k=1}^{n} \sum_{x_i \in \bar{S}_k} \sum_{x_j \in \bar{S}_k,\, x_i \neq x_j} \eta_{ij} (x_i - x_j)(x_i - x_j)^T \qquad (9)$$

where $\eta_{ij} = \frac{d_{ij}}{\sum_{(i,j)} d_{ij}}$ and $d_{ij} = \exp(-\gamma \|x_i - x_j\|^2)$ is a weight function that controls the contribution of the points based on their spectral distance. NPMLSC seeks the linear projection matrix that maximizes the spatial neighborhood class discrimination using $B^{spa}$ and $W^{spa}$.
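The sketch below illustrates Eq. (9) for a single nearest-patch pair: spatial neighbors are gathered from a square window on the image grid, and the heat-kernel weights $\eta_{ij}$ down-weight spectrally distant pairs. The window radius and γ = 1 are illustrative choices.

```python
import numpy as np

def spatial_neighbors(idx, H, W, radius=1):
    """Flat indices of the (2*radius+1)^2 spatial window around the pixel
    with flat index idx on an H x W image grid."""
    r, c = divmod(idx, W)
    rows = range(max(0, r - radius), min(H, r + radius + 1))
    cols = range(max(0, c - radius), min(W, c + radius + 1))
    return [i * W + j for i in rows for j in cols]

def npmlsc_pair_scatter(Xbar_k, Xbar_kp, gamma=1.0):
    """Weighted between-patch spatial scatter of Eq. (9) for one nearest
    patch pair; rows of Xbar_k / Xbar_kp are the neighborhood pixels."""
    diff = Xbar_k[:, None, :] - Xbar_kp[None, :, :]   # (n_k, n_k', D)
    d = np.exp(-gamma * (diff ** 2).sum(-1))          # d_ij heat kernel
    eta = d / d.sum()                                 # normalized weights eta_ij
    # sum_ij eta_ij (x_i - x_j)(x_i - x_j)^T
    return np.einsum('ij,ijd,ije->de', eta, diff, diff)
```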
G. Spatial Spectral Regularized MLSC (SSRMLSC)

Due to the heteroscedastic distribution of HSI data, spectrally close samples may belong to different classes. Hence, a spectral distance measure alone is inadequate for determining the optimal projection matrix. The spectral RMLSC method exploits the local intrinsic manifold of the data in the spectral domain. On the other hand, NPMLSC uses the spatial information to retain the local pixel neighborhood structure of the linear patch without using the labeled spectral information. However, NPMLSC fails to connect two pixels with a large spatial pixel distance in a homogeneous region or in a linearly constructed patch. In such a case, the labeled spectral information plays a vital role in establishing the connection, which improves the discrimination criterion.

Therefore, the labeled spectral information and the spatial information complement each other in terms of information content and thereby improve the HSI classification performance. Here, we incorporate the information from the spatial domain with the spectral domain and propose a spatial-spectral RMLSC (SSRMLSC) method. By merging the spectral RMLSC and the spatial NPMLSC methods, we construct the spatial-spectral between-patch dissimilarity matrix $B^{SS}$ and within-patch dissimilarity matrix $W^{SS}$ as

$$W^{SS} = \beta\, RW^{spec} + (1-\beta) W^{spa}, \quad B^{SS} = \beta\, RB^{spec} + (1-\beta) B^{spa}, \quad T^{SS} = W^{SS} + B^{SS} \qquad (10)$$

where $\beta \in [0, 1]$ controls the contribution of the spectral and spatial information. The optimal projection matrix $V$ is obtained by solving the generalized eigenvalue problem. The obtained projection matrix projects the original data onto the lower-dimensional space spanned by $V$ to get the new feature vectors.
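A compact sketch of the fusion of Eq. (10) and of the subsequent generalized eigenvalue problem is given below; the small ridge added to $T^{SS}$ is our numerical safeguard, not part of the formulation, and β = 0.5 is a placeholder for the grid-searched value.

```python
import numpy as np
from scipy.linalg import eigh

def ssrmlsc_projection(RB, RW, B_spa, W_spa, beta=0.5, d=50):
    """Fuse the spectral and spatial matrices (Eq. (10)) and solve the
    generalized eigenproblem B_SS v = lambda * T_SS v for the projection V."""
    B_SS = beta * RB + (1 - beta) * B_spa
    W_SS = beta * RW + (1 - beta) * W_spa
    T_SS = B_SS + W_SS
    vals, vecs = eigh(B_SS, T_SS + 1e-8 * np.eye(len(T_SS)))
    V = vecs[:, np.argsort(vals)[::-1][:d]]   # top-d generalized eigenvectors
    return V                                  # project with Z = V.T @ X
```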
H. Semi-supervised Spatial Spectral Regularized MLSC (S3RMLSC)

SSRMLSC seeks the optimal projection matrix for dimension reduction purely from the labeled training data in both the spectral and spatial domains. However, a large amount of unlabeled data is available in practical scenarios, and semi-supervised algorithms take advantage of this abundance of unlabeled data to improve the classification performance. The proposed SSRMLSC algorithm can be extended to a semi-supervised spatial-spectral regularized MLSC (S3RMLSC) method by adding a penalty term derived from the unlabeled data. This exploits the underlying geometrical structure of the unlabeled data during the construction of the projection vectors.

For this semi-supervised method, we construct a weighted undirected graph $G \in (X, E)$ from the total training dataset $X = (X_L, X_U)$. Each observation in the dataset is considered a node, and the nodes are connected by a set of edges $E$ with associated weights represented by an adjacency matrix. The adjacency matrix $A$ of the graph $G$ is determined by computing the Knn of each vertex (data sample). The diagonal elements of the adjacency matrix are zero, as the distance measure is zero for the same vertex. Then the graph Laplacian $L$ is determined by

$$L = D - A \qquad (11)$$

where $D$ is the diagonal degree matrix of the adjacency matrix $A$. This degree estimates the density around the data samples; the $i$th diagonal entry of $D$ is calculated as $D_{ii} = \sum_{j=1}^{L+U} a_{ij}$, where $a_{ij}$ is an element of the adjacency matrix $A$. Both the adjacency and Laplacian matrices are symmetric. $V$ is the optimal projection matrix for DR such that $z_i = V^T x_i \in \mathbb{R}^d$. If we consider two close points $x_i$ and $x_j$ on the manifold, then their projections $z_i$ and $z_j$ are expected to be as close as possible on the reduced-dimensional hyperplane. Hence, the projection matrix can be obtained by solving the optimization with respect to $V$ in

$$\min \sum_{i,j} a_{ij} \|z_i - z_j\|^2 \qquad (12)$$

Motivated by [16] and [46], we obtain a proper projection matrix $V^*$ by formulating the regularization term as $V^T X L X^T V$. To make this paper self-contained, we derive the regularization term as

$$\frac{1}{2}\sum_{i,j=1}^{L+U} a_{ij}\|z_i - z_j\|^2 = \frac{1}{2}\sum_h \sum_{i,j=1}^{L+U} a_{ij}\left(v_h^T x_i - v_h^T x_j\right)^2 = \sum_h \left(\sum_{i=1}^{L+U} v_h^T x_i d_{ii} x_i^T v_h - \sum_{i,j=1}^{L+U} v_h^T x_i a_{ij} x_j^T v_h\right) = \sum_h v_h^T X (D - A) X^T v_h = tr\left(V^T X L X^T V\right) \qquad (13)$$
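The penalty term is built from a Knn graph over all L + U training samples. A minimal sketch, using k = 5 as selected later in the experiments:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def graph_laplacian(samples, k=5):
    """Unnormalized Laplacian L = D - A of Eq. (11); samples is the
    (L+U) x D matrix of labeled plus unlabeled training points."""
    A = kneighbors_graph(samples, n_neighbors=k, mode='connectivity').toarray()
    A = np.maximum(A, A.T)        # symmetrize the Knn adjacency
    np.fill_diagonal(A, 0)        # zero distance to itself, no self-loops
    Dg = np.diag(A.sum(axis=1))   # degree matrix, D_ii = sum_j a_ij
    return Dg - A

# penalty of Eqs. (13)-(14), with X the D x (L+U) data matrix:
# M = X @ graph_laplacian(X.T, k=5) @ X.T
```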
We obtain the updated objective function of S3RMLSC by adding this penalty term to SSRMLSC. Motivated by [15], we formulate the optimization as a trace-ratio problem:

$$V^* = \arg\max_{V \in \mathbb{R}^{D \times d}} \frac{tr\left(V^T B^{SS} V\right)}{tr\left(V^T (T^{SS} + \gamma X L X^T) V\right)} \qquad (14)$$

where $\gamma \in [0, 1]$ is a pooling parameter that balances the contribution of the regularization term. Following [47] and [48], we recast the trace-ratio problem as a trace-difference problem:

$$V^* = \arg\max_{V \in \mathbb{R}^{D \times d}} tr\left(V^T \left(B^{SS} - \lambda (T^{SS} + \gamma X L X^T)\right) V\right) \qquad (15)$$

This trace-difference problem is solved by the decomposed Newton method (DNM) [47] to achieve the global optimum of the trace-ratio problem. In general, the trace-difference function depends on the largest $d$ eigenvalues. Initially, DNM determines the eigenvalue set by function decomposition and uses a Taylor series expansion to approximate the function value; it then finds the eigenvalue $\lambda$ by solving this approximated function iteratively. Since the dissimilarity matrices are scaled by the sizes of the patches, this graph cut criterion is called a scaling cut criterion; and as the scaling cut is applied between two nearest patches, it is termed a localized scaling cut criterion.

The optimal projection matrix $V$ obtained by S3RMLSC extracts reliable boundaries of the linear patches in the manifold space. We use the obtained projection matrix $V$ to project the labeled training set and the testing set onto the new reduced dimension. Then, a support vector machine (SVM) classifier is employed on the projected test dataset to predict the labels for evaluating the accuracy.
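The sketch below shows a generic iterative trace-ratio solver that converges to the same fixed point DNM targets: take the top-d eigenvectors of $B^{SS} - \lambda(T^{SS} + \gamma X L X^T)$, update λ as the achieved trace ratio, and repeat. This mirrors the underlying iteration only; it is not the exact Newton decomposition of [47].

```python
import numpy as np

def trace_ratio(Bss, Tss_reg, d=50, iters=20, tol=1e-6):
    """Maximize tr(V^T Bss V) / tr(V^T Tss_reg V) over D x d orthonormal V,
    where Tss_reg = T_SS + gamma * X L X^T (Eqs. (14)-(15))."""
    lam = 0.0
    V = None
    for _ in range(iters):
        vals, vecs = np.linalg.eigh(Bss - lam * Tss_reg)
        V = vecs[:, np.argsort(vals)[::-1][:d]]   # maximizes the trace difference
        lam_new = np.trace(V.T @ Bss @ V) / np.trace(V.T @ Tss_reg @ V)
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return V, lam
```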
III. EXPERIMENTAL RESULTS AND ANALYSIS

A. Dataset description
This paper adopts two benchmark real-world HSI datasets, i.e., Indian Pines [49] and Botswana, to conduct our experiments.

i) Indian Pines dataset: It was captured by the Airborne Visible/Infrared Imaging Spectrometer over Northwest Indiana's Indian Pines test site. The dataset is 145 × 145 pixels in the spatial direction and contains 200 bands in the spectral direction with 16 ground-truth classes.

ii) Botswana dataset: It is a spaceborne dataset collected over the Okavango Delta, Botswana, obtained by the Hyperion sensor of the NASA Earth Observing-1 satellite. This image has 1476 × 256 spatial pixels and 145 spectral bands with 14 ground-truth classes.

TABLE I: Overall accuracy (%) with varying w (w = 3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, 13 × 13, 15 × 15) for the Indian Pines and Botswana datasets.
B. Experimental settings
In this section, we provide a comparative analysis of the performance of the proposed approaches against the state-of-the-art techniques. SVM without DR is reported as the baseline. The supervised methods, such as SC, LSC, NWFE, RLDE, and MLSC, use the labeled pixels to compute the projection matrix. For the supervised spatial-spectral methods, such as SSRLDE, SSRMLSC, and S3RMLSC, we use the labeled pixels and their corresponding spatial pixels to compute the projection matrix. However, for the semi-supervised methods, such as SELDLPP, SELDNPE [26], SSLSC, and S3RMLSC, the complete training set (labeled + unlabeled pixels) is used. For a fair comparison, and to understand the effect of guided filtering, we consider the proposed technique and its variants both with filtering (HGF) and without filtering (No HGF). The semi-supervised SSRLDE technique used in the comparison is considered with filtering (weighted mean filtering (WMF)) only. The evaluation of S3RMLSC is carried out by determining the class accuracy (CA), overall classification accuracy (OA), class average accuracy (AA), and kappa coefficient (κ) [50] of the SVM with a linear kernel as the classifier on the projected data. All reported results are the average of five independent runs. The parameters of the comparative approaches are selected based on their relevant literature. For both datasets, we randomly selected the labeled pixels from each class and 2000 unlabeled pixels to construct the training set for the semi-supervised case, with the remaining pixels as the testing set.
S3RMLSC uses several basic parameters: the regularization parameter ε in HGF, the spectral penalty parameter α, the spatial-spectral contribution pooling parameter β, the number of spatial neighborhood pixels p, the number of unlabeled data samples u, the Knn graph parameter k for the Laplacian matrix, and the semi-supervised regularization parameter γ. The experimental value of the HGF regularization parameter ε is selected based on its parent literature [32]. To obtain suitable values of α and β, we performed a grid search while varying both parameters in the range [0, 1], and empirically selected the best-fitting values for the problem at hand.

In the experiment, p is related to a square spatial neighborhood window w; a window w = 3 × 3 results in p = 9. We varied w from 3 × 3 to 15 × 15 and obtained the maximum OA at the window size shown in Table I, with a fixed number of labeled training samples per class and the selected α and β. A larger window size increases the probability of interference from pixels of other classes; hence, we select a moderate window size w to reduce this interference, and it is used in the subsequent experiments.

To show the impact of the number of unlabeled samples in semi-supervised training, we experimented with different unlabeled data sizes in the range {200, 400, ..., 2500}, as shown in Table II. These experiments are conducted by selecting a fixed number of labeled samples per class as the training data and the rest as the test set. The unlabeled training samples are randomly selected from the test set. As per Table II, we found that the proposed approach performs well while using 2000 unlabeled samples in training for both datasets. Hence, we selected 2000 unlabeled samples for the rest of the experiments.

Fig. 3: Effect of the Knn parameter on the OA of S3RMLSC.

The regularization parameter γ is set empirically. Fig. 3 shows the variation of OA with respect to the value k of Knn in the proposed semi-supervised approach. From Fig. 3, we found that the proposed method performs well for k = 5. Hence, we consider k = 5 for the rest of our experiments.
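For completeness, a hypothetical version of the α-β grid search reads as follows; `build_projection` stands in for the full pipeline above and is an assumed callable, with the OA estimated by a cross-validated linear SVM.

```python
import numpy as np
from itertools import product
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def grid_search_alpha_beta(build_projection, X_lab, y_lab, steps=11):
    """Grid search over alpha and beta in [0, 1]; build_projection(alpha, beta)
    is assumed to return the D x d projection matrix of the pipeline above."""
    best = (None, None, -np.inf)
    for a, b in product(np.linspace(0, 1, steps), repeat=2):
        V = build_projection(alpha=a, beta=b)
        Z = X_lab @ V                    # project the labeled samples
        oa = cross_val_score(SVC(kernel='linear'), Z, y_lab, cv=3).mean()
        if oa > best[2]:
            best = (a, b, oa)
    return best                          # (alpha, beta, cross-validated OA)
```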
D. Comparison with other DR methods

Tables III and IV provide the statistics of the class-wise, average, kappa, and overall classification accuracy, along with the execution time of the algorithms, for the Indian Pines and Botswana datasets, respectively. The experiments were conducted with randomly chosen training samples, and the results are averaged over five runs for each case. From Tables III and IV, we can observe the following:
• S3RMLSC with the HGF filter yields the best classification performance among these supervised and semi-supervised DR methods on both datasets.
• The use of the HGF filter significantly improves the classification accuracy of all methods. On the Indian Pines dataset (Table III), we observed a performance boost of 15% with the use of the HGF filter, while the improvement on Botswana (Table IV) is around 4%.
• Among the spectral-domain supervised DR methods, MLSC outperforms the other methods with a few exceptions. For example, on the Botswana dataset, RLDE performs better than MLSC when HGF is not applied; however, the classification accuracy of MLSC is close to that of the best-performing methods while its time complexity is very low.

TABLE II: The effect of the number of unlabeled data in semi-supervised training (OA in %).
Unlabeled data points | 200 | 400 | 600 | 800 | 1000 | 1200 | 1400 | 1600 | 1800 | 2000 | 2200 | 2400 | 2500
Botswana | 97.30 | 96.78 | 97.15 | 96.32 | 96.97 | 96.61 | 97.23 | 96.54 | 96.79
TABLE III: Comparison of the best classification accuracies (in %) with corresponding dimensions (in brackets) and computation time (in sec) on the Indian Pines dataset. 10% of labeled samples per class and 2000 unlabeled samples are used in this experiment.
Class | Train size | Test size | RAW (200) | SC (48) | LSC (48) | LFDA (50) | NWFE (36) | RLDE (38) | MLSC HGF (50) | MLSC No HGF (46) | SSRLDE WMF (42) | SSRMLSC HGF (48) | SSRMLSC No HGF (50) | SELDLPP (44) | SELDNPE (42) | SSLSC (16) | S3RMLSC HGF (48) | S3RMLSC No HGF (46)
1 | 10 | 36 | 53.89 | 61.11 | 61.24 | 55.00 | 65.56 | 56.67 | 95.16 | 66.58 | 30.65 | 98.49 | 88.17 | 27.32 | 20.23 | 45.56 | 97.00 | 75.31
2 | 143 | 1285 | 73.45 | 70.57 | 74.69 | 65.85 | 74.91 | 66.85 | 92.07 | 69.53 | 82.79 | 93.02 | 67.83 | 87.81 | 68.67 | 54.10 | 94.48 | 69.53
3 | 83 | 747 | 46.79 | 52.82 | 56.98 | 48.81 | 55.37 | 50.63 | 94.91 | 73.82 | 75.94 | 97.40 | 57.66 | 75.42 | 66.34 | 44.87 | 97.42 | 65.82
4 | 24 | 213 | 58.12 | 52.77 | 52.88 | 42.35 | 63.66 | 53.33 | 95.75 | 57.09 | 87.21 | 96.97 | 73.36 | 69.09 | 69.21 | 43.85 | 97.53 | 77.09
5 | 49 | 434 | 87.70 | 84.61 | 84.73 | 88.76 | 87.56 | 88.29 | 93.84 | 84.33 | 90.00 | 95.22 | 82.48 | 87.21 | 71.49 | 87.24 | 95.77 | 84.53
6 | 73 | 657 | 90.31 | 95.53 | 95.65 | 94.89 | 95.56 | 94.22 | 99.25 | 93.18 | 82.92 | 98.35 | 94.67 | 91.80 | 93.45 | 93.94 | 98.54 | 93.18
7 | 10 | 18 | 85.56 | 84.44 | 84.57 | 92.22 | 91.11 | 85.56 | 94.56 | 84.44 | 73.29 | 95.16 | 86.67 | 34.54 | 30.16 | 76.67 | 94.22 | 84.44
8 | 48 | 430 | 85.81 | 93.91 | 94.03 | 97.12 | 98.60 | 95.95 | 99.18 | 95.53 | 97.49 | 99.76 | 92.84 | 98.32 | 99.55 | 98.33 | 98.39 | 95.53
9 | 10 | 10 | 80.00 | 86.00 | 86.13 | 86.00 | 90.00 | 70.00 | 99.89 | 86.00 | 28.90 | 100.00 | 82.00 | 25.12 | 22.34 | 70.00 | 100.00 | 86.32
10 | 98 | 874 | 66.89 | 67.39 | 76.49 | 44.55 | 66.16 | 63.43 | 89.93 | 76.31 | 89.20 | 90.35 | 76.16 | 86.31 | 84.57 | 59.24 | 91.63 | 68.31
11 | 246 | 2209 | 81.28 | 79.67 | 79.79 | 88.20 | 81.96 | 71.30 | 97.09 | 80.53 | 99.86 | 98.59 | 77.71 | 96.18 | 98.70 | 81.11 | 98.26 | 80.53
12 | 60 | 533 | 64.02 | 55.38 | 55.51 | 52.57 | 67.88 | 62.96 | 96.29 | 64.71 | 95.90 | 97.88 | 79.38 | 84.35 | 88.92 | 36.21 | 97.78 | 69.71
13 | 21 | 184 | 95.11 | 93.37 | 93.50 | 97.28 | 90.76 | 95.98 | 96.67 | 92.93 | 91.43 | 94.14 | 93.80 | 88.78 | 72.71 | 93.80 | 93.84 | 92.93
14 | 127 | 1138 | 73.92 | 93.55 | 93.68 | 96.82 | 95.17 | 90.72 | 98.99 | 94.59 | 99.23 | 99.44 | 92.02 | 94.75 | 98.82 | 95.10 | 99.51 | 94.59
15 | 39 | 347 | 65.59 | 60.40 | 58.90 | 58.90 | 62.65 | 60.35 | 96.65 | 72.54 | 97.68 | 97.77 | 57.93 | 68.78 | 89.17 | 58.33 | 97.73 | 69.95
16 | 10 | 83 | 52.41 | 82.89 | 72.05 | 72.05 | 76.39 | 76.63 | 93.80 | 83.37 | 59.76 | 94.35 | 78.55 | 60.40 | 50.54 | 77.83 | 95.39 | 83.37
OA | | | 71.51 | 76.29 | 78.88 | 75.40 | 79.08 | 73.44 | | | | | | | | | |
Time (sec): 931.70, 20.20, 0.34, 216.13, 126.77
TABLE IV: Comparison of the best classification accuracies (in %) with corresponding dimensions (in brackets) and computation time (in sec) on the Botswana dataset. 10% of labeled samples per class and 2000 unlabeled samples are used in this experiment.
Class | Train size | Test size | RAW (145) | SC (44) | LSC (44) | LFDA (50) | NWFE (50) | RLDE (8) | MLSC HGF (50) | MLSC No HGF (36) | SSRLDE WMF (42) | SSRMLSC HGF (48) | SSRMLSC No HGF (48) | SELDLPP (46) | SELDNPE (38) | SSLSC (48) | S3RMLSC HGF (40) | S3RMLSC No HGF (48)
1 | 27 | 243 | 85.17 | 96.81 | 97.94 | 99.42 | 99.47 | 99.18 | 99.67 | 99.00 | 97.86 | 100.00 | 98.68 | 97.20 | 97.28 | 99.26 | 99.92 | 98.68
2 | 11 | 90 | 90.78 | 88.65 | 89.78 | 94.00 | 90.89 | 92.89 | 93.01 | 91.57 | 90.00 | 95.13 | 93.78 | 87.11 | 74.22 | 92.44 | 96.31 | 94.00
3 | 26 | 225 | 89.67 | 95.23 | 96.36 | 97.69 | 98.93 | 99.02 | 100.00 | 97.33 | 95.58 | 100.00 | 96.53 | 97.78 | 99.29 | 99.11 | 100.00 | 96.18
4 | 22 | 193 | 89.64 | 94.52 | 95.65 | 94.30 | 97.10 | 96.06 | 98.42 | 91.30 | 90.21 | 98.45 | 92.85 | 98.65 | 86.32 | 93.89 | 98.26 | 92.85
5 | 27 | 242 | 92.15 | 85.73 | 86.86 | 79.17 | 89.17 | 82.89 | 90.94 | 87.63 | 94.17 | 96.48 | 93.69 | 79.92 | 79.17 | 86.78 | 96.25 | 94.02
6 | 27 | 242 | 83.47 | 79.53 | 80.66 | 74.38 | 82.73 | 85.87 | 95.99 | 89.69 | 92.00 | 96.32 | 92.65 | 77.44 | 65.62 | 84.05 | 96.73 | 92.31
7 | 26 | 233 | 87.42 | 95.97 | 96.39 | 95.11 | 96.48 | 96.39 | 98.13 | 95.19 | 88.84 | 99.03 | 95.62 | 96.91 | 89.53 | 97.08 | 99.01 | 96.22
8 | 21 | 182 | 89.25 | 97.81 | 98.24 | 92.53 | 98.46 | 96.92 | 98.43 | 95.60 | 89.21 | 99.12 | 95.82 | 92.09 | 83.85 | 97.14 | 99.21 | 95.82
9 | 32 | 282 | 87.94 | 87.30 | 87.73 | 87.52 | 88.58 | 90.92 | 97.97 | 91.38 | 100.00 | 99.35 | 88.51 | 90.92 | 99.65 | 88.72 | 99.52 | 88.23
10 | 25 | 223 | 93.27 | 89.62 | 90.04 | 91.39 | 90.13 | 92.74 | 95.13 | 90.49 | 90.18 | 97.27 | 93.27 | 95.16 | 86.46 | 91.57 | 97.65 | 93.09
11 | 31 | 274 | 86.50 | 93.00 | 93.43 | 91.09 | 94.82 | 91.90 | 95.13 | 94.60 | 94.38 | 95.75 | 90.15 | 81.24 | 88.47 | 91.24 | 95.89 | 89.93
12 | 19 | 162 | 85.36 | 89.08 | 89.51 | 87.16 | 87.41 | 89.26 | 96.63 | 93.46 | 95.12 | 95.43 | 90.97 | 84.94 | 89.14 | 91.98 | 95.71 | 93.25
13 | 27 | 241 | 88.53 | 90.28 | 90.71 | 88.63 | 94.77 | 93.86 | 97.13 | 88.63 | 87.88 | 98.56 | 90.85 | 92.45 | 94.44 | 92.95 | 98.62 | 91.79
14 | 10 | 85 | 86.47 | 96.04 | 96.47 | 93.41 | 90.12 | 92.71 | 96.00 | 95.76 | 94.31 | 94.24 | 91.12 | 76.94 | 78.10 | 96.24 | 95.39 | 89.18
Total | 331 | 2917 | | | | | | | | | | | | | | | |
OA | | | 86.90 | 90.11 | 91.83 | 89.96 | 92.94 | 92.79 | | | | | | | | | |
Time (sec): 56.75, 3.16, 0.14, 12.30, 3.18

• The performance of NWFE is very competitive with that of MLSC in the without-HGF condition on both datasets. However, the time complexity of NWFE is 30 to 100 times that of MLSC. For real-time operation, the proposed MLSC might be a suitable solution with reliable performance.
• MLSC extends LSC along the nonlinear manifold and achieves a better overall accuracy than LSC on both datasets. Additionally, the computational complexity of MLSC is ten times smaller than that of LSC.
• SSRMLSC outperforms the spectral-based supervised, the semi-supervised, and the other spatial-spectral DR methods. Though RLDE performs better than MLSC, the performance of its spatial extension SSRLDE is lower than that of SSRMLSC (considering the case when HGF is not used).
• The use of unlabeled data samples in semi-supervised learning can negatively affect the classification performance, as in the case of SELDLPP and SELDNPE.
• From Table III, we can observe that all the state-of-the-art methods fail to properly classify the classes with insufficient data (classes 1, 7, 9, and 16, each with only 10 training samples). However, the regularization terms in the proposed methods avoid this problem and classify these classes correctly.
• Comparing the computation times, S3RMLSC is comparatively slower than the other semi-supervised methods, but it is faster than the SSLSC method. Similarly, the times consumed by MLSC and SSRMLSC are competitive with the other state-of-the-art DR methods while achieving reliable performance.
• The proposed supervised and semi-supervised methods also perform well for very small labeled training sets; Table V supports the effectiveness of the proposed methods.
• In this work, we performed most of the analysis using the Botswana and Indian Pines data. However, to show the effectiveness of the parameters learned from this analysis, we tested them on the Pavia University dataset (610 × 340 spatial pixels, 103 spectral bands, and 9 classes). From Table V, we can observe that the proposed algorithms boost the performance on the Pavia University data to a large extent in the presence of the HGF filter, for both small and larger numbers of training samples, and a clear improvement is also observed when HGF is not used.

TABLE V: Classification performance in the case of small sample size, i.e., 8, 10, 15, and 20 labeled samples per class.
Indian Pines
No. of labeled train samples | RLDE | MLSC HGF | MLSC No HGF | SSRLDE WMF | SSRMLSC HGF | SSRMLSC No HGF | SSLSC | S3RMLSC HGF | S3RMLSC No HGF
8 | 53.21 | 71.04 | 53.70 | 58.70 | 76.01 | 58.20 | 48.72 | 75.89 | 58.90
10 | 55.06 | 74.22 | 55.67 | 69.99 | 78.70 | 55.26 | 49.51 | 78.77 | 55.62
15 | 61.91 | 81.57 | 61.69 | 73.95 | 84.40 | 61.97 | 56.89 | 84.60 | 62.57
20 | 63.84 | 82.70 | 63.89 | 76.16 | 85.50 | 64.27 | 59.32 | 85.59 | 64.96

Botswana
No. of labeled train samples | RLDE | MLSC HGF | MLSC No HGF | SSRLDE WMF | SSRMLSC HGF | SSRMLSC No HGF | SSLSC | S3RMLSC HGF | S3RMLSC No HGF
8 | 85.66 | 89.87 | 86.22 | 85.96 | 90.64 | 86.58 | 84.86 | 90.75 | 86.44
10 | 86.98 | 90.33 | 87.80 | 87.18 | 92.23 | 88.47 | 86.56 | 92.55 | 88.51
15 | 90.17 | 94.17 | 90.31 | 92.85 | 95.09 | 90.87 | 89.59 | 95.12 | 90.76
20 | 91.33 | 95.90 | 91.52 | 93.77 | 97.19 | 91.67 | 91.04 | 97.27 | 91.97

Pavia University
No. of labeled train samples | RLDE | MLSC HGF | MLSC No HGF | SSRLDE WMF | SSRMLSC HGF | SSRMLSC No HGF | SSLSC | S3RMLSC HGF | S3RMLSC No HGF
8 | 65.14 | 72.34 | 68.21 | 66.52 | 72.89 | 69.66 | 65.17 | 72.95 | 69.94
10 | 67.19 | 75.81 | 70.65 | 69.74 | 76.21 | 71.31 | 66.73 | 76.00 | 71.36
15 | 68.06 | 78.55 | 72.90 | 72.07 | 78.91 | 73.72 | 65.84 | 78.75 | 73.66
20 | 70.16 | 80.34 | 74.93 | 73.96 | 80.81 | 76.73 | 66.57 | 80.88 | 76.97
E. Effect of Number of Reduced Dimensions
Fig. 4 shows the variation of the overall accuracy with respect to different reduced dimensions for the supervised and semi-supervised methods. We can observe that the S3RMLSC, SSRMLSC, and MLSC methods outperform the other state-of-the-art DR methods at the various reduced dimensions. SSRLDE is the closest competitor of the proposed semi-supervised method; however, its performance is lower than that of MLSC on the Botswana dataset. S3RMLSC achieves significant accuracy at low reduced dimensions in both the Indian Pines and Botswana datasets; afterward, its performance gradually becomes consistent due to the redundancy of the spectral bands. The performance in Fig. 4 proves the robustness of the proposed algorithm across different dimensions.

Fig. 4: OA of different methods with increasing subspace dimensions: (4a) Botswana and (4b) Indian Pines.
Fig. 5: OA of different methods with an increasing number of labeled training data samples (in %) per class: (5a) Botswana and (5b) Indian Pines.
F. Effect of Number of Train Samples
Fig. 5 shows the effect of the number of labeled training samples on the classification performance. As per the experimental observations on Botswana (Fig. 5a) and Indian Pines (Fig. 5b), the proposed method significantly outperforms the other state-of-the-art methods. Table V also proves the effectiveness of the proposed methods for small training sets of 8, 10, 15, and 20 samples per class. Figs. 6 and 7 show the classification maps of the Indian Pines and Botswana images for the different methods in a single run.

From the above-observed results, we can highlight the following conclusions:
i) A large number of the bands in HSI data are redundant, and the relevant information lies in a few intrinsic dimensions. Hence, DR improves the HSI classification performance by projecting the data to a reduced feature space where the effects of redundant bands are lessened.
ii) MLSC performs better than the supervised graph cut methods (SC and LSC), whereas SSRMLSC performs far better than SSRLDE. Further, S3RMLSC achieves a significant performance improvement compared to the other supervised as well as semi-supervised methods.
iii) The proposed S3RMLSC method not only gives the best overall accuracy with the highest class-wise average performance but also gives the best kappa coefficient compared to the others, using a limited number of bands in an optimal time. This explains the effectiveness of the algorithm in terms of both space and performance.
iv) Most of the state-of-the-art methods trade performance gain against computational complexity. However, the time complexity of S3RMLSC is competitive with the other methods while achieving promising performance.
v) The proposed method also outperforms the other state-of-the-art methods when the training set size is small.
vi) S3RMLSC gives the best classification performance for both the Botswana and Indian Pines data with the same set of optimal parameters, whereas the other methods tune their parameters based on the dataset. This demonstrates the robustness of the proposed algorithm over the others.
vii) The robustness of the proposed methods is also demonstrated by learning the parameters from the Indian Pines and Botswana data and successfully using them on the Pavia University data to obtain better classification results.
viii) The use of labeled and unlabeled data samples in the proposed semi-supervised DR method exploits the local property as well as the global geometry of the data. This gives S3RMLSC a performance edge over MLSC.

IV. CONCLUSION
In this paper, we propose S3RMLSC, which uses both the spectral and the spatial information to maximize the class discrimination. The spectral RMLSC incorporates the spectral information with a regularization term, which overcomes the data singularity by diversifying the HSI data samples. This enhances the discrimination capability and improves the classification accuracy. The NPMLSC method is a robust graph-cut-based spatial segmentation technique, which incorporates the spectral neighborhood measure with the spatial pixel neighborhood correlation to improve the class dissimilarity matrices. S3RMLSC takes advantage of both the spectral RMLSC and NPMLSC to obtain the optimal projection direction. The idea of maximizing the local patch margin between dissimilar classes while maintaining the compactness of the individual patches of the manifold makes our method theoretically and practically appealing. The selection of data samples from the patch-wise locality of the manifold retains the geometrical and nonlinear properties of the data. Apart from that, the use of HGF increases the neighboring pixel consistency, preserves the spatial contextual information, and robustly discriminates the edges. We tested our method and other classical methods on two popular real-world HSI datasets; in these experiments, the proposed method consistently outperforms the classical methods by a large margin. These promising experimental results of S3RMLSC on different datasets demonstrate its robustness as well as its generic applicability. In future studies, we aim to explore a multivariate tensorial extension of this method.

Fig. 6: Classification maps for Indian Pines using the labeled and unlabeled training pixels with 50 dimensions: (6a) ground truth, (6b) RLDE, (6c) MLSC, (6d) SSRLDE, (6e) SSRMLSC, (6f) SSLSC, (6g) S3RMLSC.

Fig. 7: Classification maps for Botswana using the labeled and unlabeled training pixels with 50 dimensions: (7a) ground truth, (7b) RLDE, (7c) MLSC, (7d) SSRLDE, (7e) SSRMLSC, (7f) SSLSC, (7g) S3RMLSC.

REFERENCES

[1] P. Ghamisi, N. Yokoya, J. Li, W. Liao, S. Liu, J. Plaza, B. Rasti, and A. Plaza, “Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art,”
REFERENCES

[1] P. Ghamisi, N. Yokoya, J. Li, W. Liao, S. Liu, J. Plaza, B. Rasti, and A. Plaza, "Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art," IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 4, pp. 37–78, 2017.
[2] N. Zheng and J. Xue, Statistical Learning and Pattern Analysis for Image and Video Processing. Springer Science & Business Media, 2009.
[3] D. Lunga, S. Prasad, M. M. Crawford, and O. Ersoy, "Manifold-learning-based feature extraction for classification of hyperspectral data: A review of advances in manifold learning," IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 55–66, 2014.
[4] T. Lin and H. Zha, "Riemannian manifold learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 5, pp. 796–809, 2008.
[5] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[6] J. B. Tenenbaum, V. De Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
[7] A. M. Martínez and A. C. Kak, "PCA versus LDA," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228–233, 2001.
[8] T. Han and D. G. Goodenough, "Nonlinear feature extraction of hyperspectral data based on locally linear embedding (LLE)," in Proc. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), vol. 2, 2005, pp. 1237–1240.
[9] M. Belkin and P. Niyogi, "Laplacian eigenmaps and spectral techniques for embedding and clustering," in Advances in Neural Information Processing Systems (NIPS), vol. 14, 2001, pp. 585–591.
[10] X. Geng, D.-C. Zhan, and Z.-H. Zhou, "Supervised nonlinear dimensionality reduction for visualization and classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 35, no. 6, pp. 1098–1107, 2005.
[11] L. Ma, M. M. Crawford, and J. Tian, "Local manifold learning-based k-nearest-neighbor for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 11, pp. 4099–4109, 2010.
[12] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, 1997.
[13] T. V. Bandos, L. Bruzzone, and G. Camps-Valls, "Classification of hyperspectral images with regularized linear discriminant analysis," IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 3, pp. 862–873, 2009.
[14] X. Zhang, S. Zhou, and L. Jiao, "Local graph cut criterion for supervised dimensionality reduction," in Proc. Sixth International Symposium on Multispectral Image Processing and Pattern Recognition. International Society for Optics and Photonics, 2009, p. 74962I.
[15] X. Zhang, Y. He, L. Jiao, R. Liu, J. Feng, and S. Zhou, "Scaling cut criterion-based discriminant analysis for supervised dimension reduction," Knowledge and Information Systems, vol. 43, no. 3, pp. 633–655, 2015.
[16] X. Zhang, Y. He, N. Zhou, and Y. Zheng, "Semisupervised dimensionality reduction of hyperspectral images via local scaling cut criterion," IEEE Geoscience and Remote Sensing Letters, vol. 10, no. 6, pp. 1547–1551, 2013.
[17] H.-T. Chen, H.-W. Chang, and T.-L. Liu, "Local discriminant embedding and its variants," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2005, pp. 846–853.
[18] M. Sugiyama, "Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis," Journal of Machine Learning Research, vol. 8, pp. 1027–1061, 2007.
[19] W. Li, S. Prasad, J. E. Fowler, and L. M. Bruce, "Locality-preserving dimensionality reduction and classification for hyperspectral image analysis," IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 4, pp. 1185–1198, 2012.
[20] B.-C. Kuo and D. A. Landgrebe, "Nonparametric weighted feature extraction for classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 5, pp. 1096–1105, 2004.
[21] X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang, "Face recognition using Laplacianfaces," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 328–340, 2005.
[22] L. He, J. Li, C. Liu, and S. Li, "Recent advances on spectral–spatial hyperspectral image classification: An overview and new guidelines," IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 3, pp. 1579–1597, 2018.
[23] Y. Zhou, J. Peng, and C. P. Chen, "Dimension reduction using spatial and spectral regularized local discriminant embedding for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 2, pp. 1082–1095, 2015.
[24] D. Cai, X. He, and J. Han, "Semi-supervised discriminant analysis," in Proc. IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–7.
[25] M. Sugiyama, T. Idé, S. Nakajima, and J. Sese, "Semi-supervised local Fisher discriminant analysis for dimensionality reduction," Machine Learning, vol. 78, no. 1–2, pp. 35–61, 2010.
[26] W. Liao, A. Pizurica, P. Scheunders, W. Philips, and Y. Pi, "Semisupervised local discriminant analysis for feature extraction in hyperspectral images," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 184–198, 2013.
[27] W. Liao, R. Bellens, A. Pizurica, W. Philips, and Y. Pi, "Classification of hyperspectral data over urban areas using directional morphological profiles and semi-supervised feature extraction," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 4, p. 1177, 2012.
[28] J. Xia, J. Chanussot, P. Du, and X. He, "Spectral–spatial classification for hyperspectral data using rotation forests with local feature extraction and Markov random fields," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 5, pp. 2532–2546, 2015.
[29] L. Sun, Z. Wu, J. Liu, L. Xiao, and Z. Wei, "Supervised spectral–spatial hyperspectral image classification with weighted Markov random fields," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 3, pp. 1490–1503, 2015.
[30] Z. Zhong, B. Fan, J. Duan, L. Wang, K. Ding, S. Xiang, and C. Pan, "Discriminant tensor spectral–spatial feature extraction for hyperspectral image classification," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 5, pp. 1028–1032, 2015.
[31] X. Huang and L. Zhang, "An SVM ensemble approach combining spectral, structural, and semantic features for the classification of high-resolution remotely sensed imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 257–272, 2013.
[32] B. Pan, Z. Shi, and X. Xu, "Hierarchical guidance filtering-based ensemble classification for hyperspectral images," IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 7, pp. 4177–4189, 2017.
[33] X. Kang, S. Li, and J. A. Benediktsson, "Spectral–spatial hyperspectral image classification with edge-preserving filtering," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 5, pp. 2666–2677, 2014.
[34] H. Tan, Z. Ma, S. Zhang, Z. Zhan, B. Zhang, and C. Zhang, "Grassmann manifold for nearest points image set classification," Pattern Recognition Letters, vol. 68, pp. 190–196, 2015.
[35] T.-K. Kim, O. Arandjelović, and R. Cipolla, "Boosted manifold principal angles for image set-based recognition," Pattern Recognition, vol. 40, no. 9, pp. 2475–2484, 2007.
[36] M.-H. Yang, J. Ho, and K.-C. Lee, "Video-based face recognition using probabilistic appearance manifolds," U.S. Patent 7,499,574, Mar. 3, 2009.
[37] Y. Zhao, S. Xu, and Y. Jia, "Discriminant clustering embedding for face recognition with image sets," in Proc. Asian Conference on Computer Vision. Springer, 2007, pp. 641–650.
[38] W. Fan and D.-Y. Yeung, "Locally linear models on face appearance manifolds with application to dual-subspace based classification," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2006, pp. 1384–1390.
[39] R. Wang, S. Shan, X. Chen, and W. Gao, "Manifold-manifold distance with application to face recognition based on image set," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
[40] R. Wang and X. Chen, "Manifold discriminant analysis," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 429–436.
[41] R. Wang, S. Shan, X. Chen, J. Chen, and W. Gao, "Maximal linear embedding for dimensionality reduction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 9, pp. 1776–1792, 2011.
[42] K. Q. Weinberger and L. K. Saul, "An introduction to nonlinear dimensionality reduction by maximum variance unfolding," in Proc. AAAI Conference on Artificial Intelligence, 2006.
[43] Q. Gao, J. Ma, H. Zhang, X. Gao, and Y. Liu, "Stable orthogonal local discriminant embedding for linear dimensionality reduction," IEEE Transactions on Image Processing, vol. 22, no. 7, pp. 2521–2531, 2013.
[44] X. Jiang, B. Mandal, and A. Kot, "Eigenfeature regularization and extraction in face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 383–394, 2008.
[45] J. H. Friedman, "Regularized discriminant analysis," Journal of the American Statistical Association, vol. 84, no. 405, pp. 165–175, 1989.
[46] M. Belkin, P. Niyogi, and V. Sindhwani, "Manifold regularization: A geometric framework for learning from labeled and unlabeled examples," Journal of Machine Learning Research, vol. 7, pp. 2399–2434, 2006.
[47] H. Wang, S. Yan, D. Xu, X. Tang, and T. Huang, "Trace ratio vs. ratio trace for dimensionality reduction," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007, pp. 1–8.
[48] Y. Jia, F. Nie, and C. Zhang, "Trace ratio problem revisited," IEEE Transactions on Neural Networks, vol. 20, no. 4, pp. 729–735, 2009.
[49] M. F. Baumgardner, L. L. Biehl, and D. A. Landgrebe, "220 band AVIRIS hyperspectral image data set: June 12, 1992 Indian Pine Test Site 3," Sep. 2015. [Online]. Available: https://purr.purdue.edu/publications/1947/1
[50] R. G. Congalton and K. Green, Assessing the Accuracy of Remotely Sensed Data: Principles and Practices. CRC Press, 2008.