[PDF] High Dimensional Data Modeling Techniques for Detection of Chemical Plumes and Anomalies in Hyperspectral Images and Movies

Abstract

We briefly review recent progress in techniques for modeling and analyzing hyperspectral images and movies, in particular for detecting plumes of both known and unknown chemicals. For detecting chemicals of known spectrum, we extend the technique of using a single subspace for modeling the background to a "mixture of subspaces" model to tackle more complicated background. Furthermore, we use partial least squares regression on a resampled training set to boost performance. For the detection of unknown chemicals we view the problem as an anomaly detection problem, and use novel estimators with low-sampled complexity for intrinsically low-dimensional data in high-dimensions that enable us to model the "normal" spectra and detect anomalies. We apply these algorithms to benchmark data sets made available by the Automated Target Detection program co-funded by NSF, DTRA and NGA, and compare, when applicable, to current state-of-the-art algorithms, with favorable results.

Full PDF

HHigh Dimensional Data Modeling Techniques forDetection of Chemical Plumes and Anomalies inHyperspectral Images and Movies

Yi (Grace) Wang ∗ , Guangliang Chen † , and Mauro Maggioni ‡§ Abstract

We brieﬂy review recent progress in techniques for modeling and ana-lyzing hyperspectral images and movies, in particular for detecting plumesof both known and unknown chemicals. For detecting chemicals of knownspectrum, we extend the technique of using a single subspace for modelingthe background to a “mixture of subspaces” model to tackle more com-plicated background. Furthermore, we use partial least squares regressionon a resampled training set to boost performance. For the detection ofunknown chemicals we view the problem as an anomaly detection prob-lem, and use novel estimators with low-sampled complexity for intrinsi-cally low-dimensional data in high-dimensions that enable us to modelthe “normal” spectra and detect anomalies. We apply these algorithmsto benchmark data sets made available by the Automated Target Detec-tion program co-funded by NSF, DTRA and NGA, and compare, whenapplicable, to current state-of-the-art algorithms, with favorable results.

Keywords.

Remote sensing, mixture models, robust modeling chemicalplumes, automated detection.

The ability to remotely sense and analyze chemical plumes has become increas-ingly important in many civilian and military applications. Hyperspectral imag-ing sensors that operate in the long-wave infrared (LWIR) part of the spectrumare particularly well suited for these chemical-sensing tasks, because their wide ∗ Y. Wang is with the Mathematics Department, Syracuse University, Syracuse, NY 13244,United States. † G. Chen is with Department of Mathematics and Statistics, San Jos´e State University,San Jos´e, CA 95192, United States. ‡ M. Maggioni is with the Departments of Mathematics, Electrical and Computer Engi-neering, Computer Science, Duke University, Durham, NC 27708, United States. § The work of M. Maggioni was supported by the ATD program funded by NSF, NGA andDTRA, under award a r X i v : . [ s t a t . M L ] J a n elds of view and high spectral resolutions allow many square kilometers to beimaged almost simultaneously and even optically thin chemical clouds to bedetected. Data collected by these sensors has the form of m × n × p arrays,where m, n are the spatial dimensions and p is the spectral dimension. Underphysically reasonable assumptions and simpliﬁcations [1], the following linearmixing model is obtained: x = (cid:88) ≤ i ≤ N g i s i + v , (1)where x ∈ R p represents the radiance spectrum in the scene, N the number ofchemicals of interest, s i ∈ R p the signature spectrum for the i -th chemical ofinterest, g i ≥ i , and v the radiance spectrum of thebackground. In practice, it is usually assumed that N ≤

3. In this paper, dueto the data under consideration, we consider the case N = 1 (thus, x = g s + v ).However, the methods discussed in this paper can be naturally extended to thecases where N > v . Currentapproaches (e.g. [2, 3]) often represent the background by a single Gaussian dis-tribution or subspace and then derive corresponding statistical estimators whichassign detection scores to each spectrum (see the review [1]). These detectionalgorithms have shown their eﬀectiveness in many applications, but there isroom for improvement. In particular, background often consists of heteroge-neous regions (such as sky, mountain, desert; see e.g. [1, Fig. 9]) which mayrequire separate subspaces to better capture the complexity of the background.Some other methods tackle the problem alternatively [4, 5, 6, 7], including thecase when the target plume is unknown [8].Our contributions in this work are threefold: (a) we extend the single-subspace model [3] for modeling the background to a “mixture of subspaces”model; (b) we propose techniques to enhance the detectability of the chemicalplume region, moving beyond classical least squares and using resampling tech-niques to boost detection; and (c) when detection of chemicals with unknownsignature is of interest, we propose a ﬂexible anomaly detection procedure whicheﬃciently constructs an empirical model of “normal” spectra and ﬂags anoma-lous ones according to likelihood assigned by the empirical model. In particular,(c) is achieved by applying an eﬃcient multiscale transform [9] to the hyperspec-tral spectra of a training frame and then model the density of the transformedpixels for computing the likelihood of each spectrum in any testing frame: if thetesting frame contains gas, then the gas region will be ﬂagged as an anomaly ifit is assigned low likelihood values.We consider two diﬀerent scenarios, depending on whether we are given onlya single hyperspectral cube or a time series of them. In Scenario (I), we use theavailable cube both for learning an empirical model for the background and forchemical detection, while in Scenario (II) we use the ﬁrst few frames (assumedto be without chemical plume) for background modeling and any subsequent2rame for testing for anomalies based on the learned background model. Ifthe sensor is assumed stationary, one could use the temporal variation of thespectrum at each location to detect changes and detect anomalies: we do not assume that the sensor is stationary (albeit it is such in most of the data weconsider) and therefore we do not use spatial coherence information across theimages. This makes the problem harder, but the proposed solution applicablein a wider variety of settings. Based on diﬀerent assumptions on the background radiation v , diﬀerent detec-tion algorithms have been proposed for the setting when the signature of thetarget plume is known. In this section we review some of the existing approachesfollowing the presentation of [1]. The simplest and most practical algorithm for chemical gas detection uses thelinear model in (1) and assumes normally distributed background clutter withknown covariance, that is, x = s g + v , v ∼ N ( µ, σ Σ)where the plume spectral signature s is also assumed to be known. The coef-ﬁcients g and σ can be estimated by maximizing the likelihood from a givensample. The gas detection problem can be formalized as that of testing thehypotheses H : Plume absent ( g = 0); H : Plume present ( g > . By applying the generalized likelihood ratio test (GLRT) one obtains the fol-lowing detector: T NMF ( x | µ, Σ , s ) = ( s T Σ − ˜ x ) ( s T Σ − s )(˜ x T Σ − ˜ x ) , ˜ x = x − µ (2)This is known as the normalized matched ﬁlter (NMF) and also the adaptivecosine or coherence estimator (ACE) [2]. Since the NMF requires estimationand inversion of the background covariance matrix Σ, they are approximated inthe following way in order to avoid numerical instability and to obtain robustdetectors: ˆΣ = p (cid:88) k =1 λ k q k q Tk + δI, ˆΣ − = p (cid:88) k =1 λ k + δ q k q Tk where λ k and q k ( k = 1 , · · · , p ) are the eigenvalues and eigenvectors of thesample variance, δ is a regularization parameter and can be chosen as certainpercentile of the values of { λ k } . For the data of our interest in this paper, the3erformance of ACE is not sensitive to the choice of δ as long as it is moderate,say between 30 and 80 percentiles of { λ k } . In the experiments of this paper, wechoose δ as the median of { λ k } . Another category of approaches for chemical plume detection assumes that thebackground clutter can be well represented by a low-dimensional subspace. Inthat case, the signal model becomes x = s g + µ + B c + (cid:15), (cid:15) ∼ N (0 , σ I )where µ, B represent a ﬁxed point and a basis for the background subspaceand (cid:15) is the additive noise. To ﬁnd µ, B , principal component analysis (PCA)is applied to the background spectra. Similarly to before, the quantities g , c and σ are estimated by maximizing the likelihood. Due to the assumption ofnormally distributed error (cid:15) , the maximum likelihood estimators (MLE) of g and c may be computed by least squares.The GLRT approach yields the following detector [3]: T NSS ( x | µ, B , s ) = (cid:107) P ⊥ b x (cid:107) (cid:107) P ⊥ tb x (cid:107) (3)where P ⊥ b and P ⊥ tb are projection matrices: P ⊥ b = I − B ( B T B ) − B T ; P ⊥ tb = I − A ( A T A ) − A T , A = [ s B ] . Note that (cid:107) P ⊥ b x (cid:107) and (cid:107) P ⊥ tb x (cid:107) are respectively the orthogonal distances from thebackground and target-background subspaces. These algorithms can be naturally extended to the case where

N >

1, i.e., thereare more than one chemical plumes. The corresponding detectors can be writtenas: T NMF ( x | µ, Σ , S ) = (˜ x T Σ − S )( S T Σ − S ) − ( S T Σ − ˜ x )˜ x T Σ − ˜ x , ˜ x = x − µ where S = [ s , · · · , s N ] and s i ∈ R p is the signature spectrum for the i th chem-ical plume, and T NSS ( x | µ, B , S ) = (cid:107) P ⊥ b x (cid:107) (cid:107) P ⊥ tb x (cid:107) where P ⊥ b and P ⊥ tb are projection matrices: P ⊥ b = I − B ( B T B ) − B T ; P ⊥ tb = I − A ( A T A ) − A T , A = [ S B ] . Mixture models and Enhancement with KnownTarget Signature

In this section, we describe both the mixture models and some techniques toenhance the detection. Our algorithm is presented at the end of this section.

Gaussian mixture models v ∼ (cid:80) j π j N ( µ j , Σ j ), where N represents the Normaldistribution, may be used to capture complex backgrounds. We are particularlyinterested in the case where Σ j is rank-deﬁcient, and therefore N ( µ j , Σ j ) issupported on an aﬃne subspace spanned by cols( B j ), the columns of a matrix B j . More generally, in a subspace mixture model v ∼ (cid:80) j π j S ( µ j , B j ) where S ( µ j , B j ) is a probability measure with mean µ j and support contained in thesubspace spanned by the columns of B j .The model parameters π j and Θ j (( µ j , Σ j ) or ( µ j , B j )) in each case can beestimated by K -means, K -means-like subspace clustering algorithms (e.g. [10,11, 12]), fast multiscale techniques [13], or Expectation-Maximization (EM)methods, through iterations between updating cluster assignments and modelparameters.The signal x is then assigned to the cluster that maximizes the estimationof π j p ( x | Θ j ): ˆ j = arg max j π j p ( x | Θ j ) . Given a target plume signature s , the mixture versions of the two estimators,NMF (also known as ACE) and NSS, are given by T mixNMF ( x | s , { π j , Θ j } ) = T NMF ( x | s , µ ˆ j , Σ ˆ j ); T mixNSS ( x | s , { π j , Θ j } ) = T NSS ( x | s , µ ˆ j , B ˆ j ) . (4)Alternatively, for the subspace mixture model, we may simply use the coef-ﬁcient g as the detector and solve for it by least squares. Speciﬁcally, letting ˆ j be deﬁned as above, it is easy to show that the least squares estimator ˆ g of g isthe ﬁrst entry of ˆ β = ( A T ˆ j A ˆ j ) − A T ˆ j ( x − µ ˆ j ) , A ˆ j = [ s B ˆ j ] . This yields the mixture Linear Coeﬃcient (LC) estimator T mixLC ( x | s , { π j , Θ j } ) = max { ˆ g, } . The knowledge about the background complexity can be used to choose theappropriate number of Gaussians (or subspaces), e.g., for the data in Section 5.2,we use three components since the scene has sky, mountain and ground areas.In practice, the detection performance is not sensitive when this number isoverestimated and there are enough sample spectra. These mixture models canalso be applied with the detectors that handle multiple plumes.5inally, in section 4 we consider a related but diﬀerent family of multiscalemodels for background spectra, in the context of anomaly detection in hyper-spectral movies, in the setting when the spectral signature s of the chemical ofinterest is not given, nor the frames in the movies are assumed to be registered. We propose a few enhancement techniques for background estimation to reducethe aﬀects of the contamination:

We detect and remove a small fraction of the spectra that are dissimilar to themain part of the data, in terms of magnitude or connectivity (in this paperwe simply compute the sum of squares of each spectrum as its magnitude andclassify those spectra with largest magnitudes as outliers).

This technique is relevant only when we are in Scenario (I). For this goal weutilize an iterative scheme. We ﬁrst choose a few likely background spectrabased on a reliable detection score (output of a detection algorithm, e.g. ACE),and then select their spatial neighbors as well, since adjacent pixels are verylikely to be of the same category. Using these selected spectra, the backgroundmodel parameters are re-estimated and the detection statistics are re-deducedaccordingly.

Algorithm 1

Resampling Enhancement

Input: { x i } mni =1 ⊂ R p : spectra; { T i } mni =1 ⊂ R : detection score; s ∈ R p : plumesignature; τ ∈ (0 , Output: { y i } mni =1 ⊂ R : enhanced detection score. Sort T i s.t., T (1) ≤ · · · T ( mn ) . Choose δ = T ([ τ mn ]) . Let A := { x i : T i ≤ δ } . Let B be the union of A and the 4 (above, below, left and right) spatialneighbors of x i ∈ A . Re-estimate model parameters Θ j ’s from B . Compute the statistic (detection score) based on the updated model.

PLSR [14] reduces the dimensionality of the data in the way that the covariancebetween predictors and responses is maximized. The response is re-estimatedon the reduced predictors. In our problem, PLSR is applied so that the ra-diance data is projected to a subspace that is most relevant to the detectionscore (output of a detection algorithm, or enhanced detection score resulted6rom resampling). A new and enhanced detection score is computed from theprojected radiance data. In fact, we select a certain amount of spectra whichare most likely to be background as well as chemical clouds and apply PLSR onthe selections.

Algorithm 2

PLSR Enhancement

Input: { x i } mni =1 ⊂ R p : spectra; { T i } mni =1 ⊂ R : detection score; s ∈ R p : plumesignature; τ , τ ∈ (0 , Output: { y i } mni =1 ⊂ R : enhanced detection score. Sort T i s.t., T (1) ≤ · · · T ( mn ) . Choose δ = T ([ τ mn ]) and δ = T ([(1 − τ ) mn ]) . Let A := { x i : T i ≤ δ } and A := { x i : T i ≥ δ } . Apply PLSR on the selected pairs ( x i , T i ) with x i ∈ A (cid:83) A to obtainparameters β ∈ R p and β ∈ R . y i = β T x i + β , i = 1 , · · · , mn . We propose a unifying algorithm (see Algorithm. 3) that can detect chemi-cal plumes in both scenarios: in Scenario (I) when we have only a single hy-perspectral cube, we incorporate the enhancement techniques proposed in theprevious section to learn the background via mixture modeling; in Scenario(II) where we are given a time series of hyperspectral cubes, we assume thatthe ﬁrst few frames were collected before the chemical release so we may usetheir spectra for background modeling. Afterwards, we select a detector (from T mixNMF , T mixNSS , T mixLC )) and apply it to the given cube(s). Algorithm 3

Chemical plume detection through mixture background modelingin hyperspectral images or movies

Input: { x ( (cid:96) ) i } mni =1 , ≤ (cid:96) ≤ L : L hyperspectral frame(s); s ∈ R p : plume signa-ture, and detector (one of the T mixNMF , T mixNSS , T mixLC ). Output: { T ( x ( (cid:96) ) i ) } ⊂ R : detection score Fit a mixture model { Θ j } to the given cube (when L = 1) or the ﬁrst fewclean frames (when L > for each frame (cid:96) = 1 , . . . , L ,(1) Assign spectra of the (cid:96) -th frame to nearest component models by maxi-mizing the estimator of π j p ( x | Θ j )(2) Evaluate the given detector for all spectra of the frame (within thecorresponding clusters) to produce detection scores end for Anomaly Detection in hyperspectral movieswith Unknown Target Signature

In this section we consider the problem of anomaly detection instead of detec-tion of known chemicals. In particular, we are interested in the application tohyperspectral movies, where at unknown time and location a gas is releasedand the gas plume needs to be tracked. We think of the chemical plume re-gion, once released, as a set of anomalous spectra, when compared against thebackground clutter, and thus we base detection on accurately estimating theprobability measure modeling the space of the background spectra and comput-ing the likelihood scores of every spectrum in any testing frame relative to thedensity model.In the original paper on Geometric Multi-Resolution Analysis (GMRA) [9],an approach for approximating probability distributions in high-dimensions us-ing the intrinsically low-dimensional GMRA structure was suggested, and thoseideas were further developed in [13, 17]. In this case we are not given sig-natures of spectra to detect; instead we are given one or more hyperspectralscenes deﬁned “normal” (a training set), and given a new hyperspectral scenewe are interested in deciding if its spectra are normal or present “anomalies”.We model this problem as follows: we assume that there is an unknown prob-ability measure ν in R p from which “normal” spectra are drawn. The trainingset X := { x i } Ni =1 ⊆ R p consisting of all the spectra in the training hyperspec-tral scenes is modeled as n i.i.d. samples from ν . We use these n samples tolearn a probability measure ˆ ν X approximating, in a suitable sense, ν . Given anew scene, i.e. a new set of samples X new = { x new i } , we could ask what is theprobability of seeing x new i if it was sampled from ν , and call x new i an anomaly ifthis probability is below a certain threshold. Unfortunately this does not makesense since typically ν (and our estimator ˆ ν X ) do not assign positive probabil-ity to any point. Often one then replaces probabilities by probability densitiesand associated likelihoods. An alternative is to replace the question above byevaluating the probability (according to ˆ ν X ) of seeing a point within distance r from x new i , and decide whether x new is an anomaly or not based on a thresholdon such probability, i.e. we declare x new i an anomaly if P ˆ ν X ( B r ( x new i )) < η .The values of η and r tune the sensitivity of the anomaly detection. We maychoose them by ﬁrst ﬁxing η , then choosing r minimizing type I or type II er-ror (or other similar criterion), or by choosing r to be smallest value such that P ˆ ν X ( B r ( x val i )) > η for all i ’s, where { x val i } is a validation data set (possiblyextracted and excluded from a training set). Then as we vary η we obtain aROC for the anomaly detector (assuming we know the ground truth).The key problem here is to eﬃciently construct an estimator ˆ ν X : this is achallenging task, since ν is in R p , with p typically large ( p >

100 is common).This is accomplished by using techniques based on GMRA, originally suggestedin [9], and further developed and analyzed in [13, 18]. We have no space to Clearly, independence is a rather strong assumption, but could be relaxed with only ratherminor technical diﬃculties to more general settings that accommodate mild dependencies. ν X approximating ν . The results in [13, 17] guarantee that – under suitablegeometric assumptions on the data and on the regularity of ν X – the estimatorˆ ν X gets increasingly closer to ν (in Wasserstein distance) with high probabilityas N increases, and with a rate that depends only on the intrinsic dimension d ofthe data and not on the ambient dimension p . Furthermore, these constructionsare yielded by eﬃcient algorithms, having complexity essentially linear in thetraining data size pN , with a constant depending essentially on the intrinsicdimension of the data d . In the setting of hyperspectral images considered here, N = mn is the number of pixels in an image. The intrinsic dimension of thedata, measured by Multiscale SVD [19, 21] is often very small in hyperspectraldata, between 1 and 5. Moreover, these algorithms [13]-[17] easily allow toquickly (in O d (log N )) incorporate new samples in an online fashion, in boththe GMRA construction and the estimator ˆ ν X . Finally, the underlying GMRAconstruction, simultaneously to the above, yields a dictionary that sparsiﬁesthe data (in our case the spectra) [18, 21], enabling the compression of thedata. These representations also lead to a variation of Compressed Sensingwhere the data model is that of a nonlinear manifold, with extremely eﬃcientinversion algorithms that do not require the solution of high-dimensional convexoptimization problems [13]. In this section we consider both synthetic data and some real data sets in theColorado State Repository (will write CSR for short thereafter) made availablethrough the ATD program (

Algorithms for Threat Detection ), co-sponsored byNSF, DTRA and NGA. We start with synthetic data to present the functionalityof our models in simple situations, and subsequently consider real data sets,demonstrating the eﬀectiveness of our methods in applications.9 lgorithm 4

Multiscale-transform based Density Estimation

Input:

Data set X n Output:

Multiscale densities { ˆ ν j,k } j ≥ ,k ∈ Λ j Apply GMRA to the training data to obtain a multiscale dictionary { c j,k , Φ j,k , w j,k , Ψ j,k } j> ,k ∈ Λ j For each j > , k ∈ Λ j , apply the Geometric Wavelet Transform to the datain each C j,k and obtain low dimensional coeﬃcients Apply a density estimator (e.g. KDE [15]) to each set of geometric scalingand wavelet coeﬃcients corresponding to each C j,k , and obtain density es-timates ˆ ν j,k ; also record ˆ π j,k , the empirical probability of a point belongingto C j,k . By testing on a validation set, select an optimal scale j ∗ and return themixture model (cid:80) k ˆ π j ∗ ,k ˆ ν j ∗ ,k . Description . We use synthetic data set to realize the Gaussian mixture model.We take the MIT Lincoln Lab Challenge Data (see details in the next section) toobtain the mean spectra for three regions - sky, mountain and ground (denotedby µ , µ and µ respectively) and the target plume signature s . These spectraconsist of 68 measurements at diﬀerent values of wavelength and are shown inFigure 1.

900 1000 1100 12000.040.060.080.10.120.140.16 wavenumber r ad i an c e µ − sky µ − mountain µ − ground

900 1000 1100 120000.10.20.30.40.5 wavenumber ab s o r p t i on Figure 1: Left: the mean spectra of sky, mountain and ground. Right: theabsorption of the chemical plume.Moreover, 5 , , ,

000 and 4 ,

000 spectra are generated i.i.d. from Gaussiandistributions N ( µ , Σ ), N ( µ , Σ ) and N ( µ , Σ ) respectively for the threeregions, where Σ i = diag { σ i, , · · · , σ i, } , i = 1 , , σ i,j ’s are drawn i.i.d.from the uniform distribution on [0 . a, . a ] with a = max( µ ). Finally,1 ,

000 spectra on the ground with chemical plume are generated from g s + v ,where v ∼ N ( µ , Σ ) and g ∼ N ( − . , .

00 1000 1100 12000.060.070.080.090.10.110.12 wavenumber r ad i an c e skymountaingroundplume A C E s t a t i s t i cs ground mountain sky plume00.20.40.60.81 A C E s t a t i s t i cs ground mountain sky plume00.20.40.60.81 A C E s t a t i s t i cs Figure 2: Top left: radiance against wavenumber of 5 samples each from thegroups of sky, mountain, ground and plume. Top right: the ACE scores of 10spectra samples computed by 1 Gaussian and 3 Gaussians respectively. Bottom:the boxplot of the ACE scores of using 1 Gaussian (left) and 3 Gaussians (right).11 ask . Detect the plume and compare the Gaussian mixture model with thesingle Gaussian model by the detection.

Technique . We compute the ACE (or NMF) statistics (detection scores) usingthe ground truth. More precisely, we compute T mixNMF by (4) using the truevalues of µ i and Σ i with i = 1 , , T NMF by (2) using µ = ( µ + µ + µ ) / + Σ + Σ ) / Results . Ten sample ACE statistics each from the 4 groups are demonstratedon the top right of Figure 2 for both the single Gaussian and mixture Gaussianmodels. In addition, the summary of these statistics are shown as boxplot atthe bottom of Figure 2, with the single Gaussian on the left and the mixtureGaussian on the right. The boxes are colored by groups of sky, mountain,ground and plume. On each box, the central mark is the median, the edgesof the box are the 25th and 75th percentiles, the whiskers extend to the mostextreme data points not considered as outliers. These ﬁgures show that whenusing three Gaussians instead of a single Gaussian, the ACE statistics of theplumes are greatly larger than those of the other regions. This makes the plumesmore separable and thus more detectable.

Description . We use synthetic data set to realize the subspace mixture model.5 , , ,

000 and 4 ,

000 spectra are generated i.i.d. from cµ + (cid:15) , cµ + (cid:15) and cµ + (cid:15) respectively for sky, mountain and ground regions, where c ∼ N (1 , . (cid:15) ∼ N (0 , Σ (cid:15) ), Σ (cid:15) = diag { σ (cid:15), , · · · , σ (cid:15), } and σ (cid:15),j ’s are drawn i.i.d. from N (0 , .

005 max( µ )). Then 1 ,

000 spectra on the ground with plume are gener-ated as gs + cµ + (cid:15) with g ∼ N ( − . , . Task . Detect the plume and compare the subspace mixture model with thesingle subspace model by the detection.

Technique . We compute the NSS statistics (detection scores) using the groundtruth. More precisely, we compute T mixNSS by (4) letting B i = µ i , i = 1 , , T NSS by (3) using B = ( µ + µ + µ ) / Results . Ten sample NSS statistics each from the 4 groups are demonstratedon the top right of Figure 3 for both the single subspace and mixture subspacemodels. In addition, the summary of these statistics are shown as boxplot atthe bottom of Figure 3, with the single subspace on the left and the mixturesubspace on the right. From these ﬁgures, we know that the separation betweenthe plume and other regions greatly improves when using the mixture subspacemodel.

Description . We generate 10 ,

000 spectra from µ + b e , where µ = ( µ + µ + µ ) / b = max µ and e j ∼ Poisson(0 . j = 1 , · · · ,

68. Then 1 ,

000 spectraare generated from µ + b e + g s with g ∼ N ( − . , .

00 1000 1100 12000.060.080.10.12 wavenumber r ad i an c e skygroundplume N SS s t a t i s t i cs ground mountain sky plume01234567 N SS s t a t i s t i cs ground mountain sky plume01234567 N SS s t a t i s t i cs Figure 3: Top left: radiance against wavenumber of 5 samples each from thegroups of sky, mountain, ground and plume. Top right: the NSS scores of 10samples computed by 1 subspace and 3 subspaces respectively. Bottom: theboxplot of the NSS scores of using 1 Gaussian (left) and 3 Gaussians (right).

900 1000 1100 12000.050.10.150.2 wavenumber r ad i an c e sceneplume Figure 4: Background and plume spectra contaminated by Poisson noise.13

10 20 30 40 50−0.500.511.5 spectra instances A C E s t a t i s t i cs A C E a ft e r P L S R scene plume−0.500.511.5 A C E s t a t i s t i cs scene plume−22.5−22−21.5−21−20.5 A C E s t a t i s t i cs Figure 5: Top left: ACE statistics of 50 sample spectra of each class; Top right:ACE statistics after PLSR of the same 50 sample spectra of each class; Bottomleft: the boxplot of the ACE statistics; Bottom right: the boxplot of the ACEstatistics with PLSR.

Task . Detect the plume and compare the performances before and after PLSR.

Technique . We compute the ACE statistics as described in Section 2. Then weapply Algorithm 2 with τ = τ = 0 . Results . The top of Figure 5 displays the 50 sample ACE scores on the left andthe corresponding enhanced results after applying PLSR on the right. Theirstatistical summary are demonstrated as boxplots at the bottom of Figure 5with ACE on the left and PLSR on the right. These ﬁgures demonstrate thatthe PLSR procedure greatly improves the separation of the detection scoresbetween scene and plume.

Description . We follow Section 5.1.1 to generate 5 ,

000 sample spectra each forthe sky, mountain and ground areas, then we generate 1 ,

000 sample spectrafor the ground area each with chemical plume 1 and 2. The signature ( s , s )of these two plumes are demonstrated on the top left of Figure 6. Finally wegenerate 100 sample spectra for the ground area with both chemical plumesfrom g s + g s + v , where v ∼ N ( µ , Σ ) and g , g ∼ N ( − . , . Task . Detect where a plume (or both plumes) are present.

Technique . We compute the detectors as described in Section 2.3.14

00 1000 1100 1200 130000.10.20.30.40.50.60.7 wavenumber ab s o r p t i on plume 1plume 2

900 1000 1100 12000.060.070.080.090.10.110.12 wavenumber r ad i an c e skymountaingroundplume1plume2plume1&2 ground mountain sky plume1 plume2 plume1&200.10.20.3 A C E s t a t i s t i cs ground mountain sky plume1 plume2 plume1&20246810 N SS s t a t i s t i cs Figure 6: Top left: signature spectra of plume 1 and plume 2. Top right: samplespectra of diﬀerent areas, with and without plumes. Bottom left: boxplot of theACE statistics for diﬀerent areas. Bottom right: boxplot of the NSS statisticsfor diﬀerent areas.

Results . The box plots of the NMF (ACE) statistics and the NSS statistics areshown at the bottom of Figure 6. It is evident that the regions with plumescan be distinguished from those without plumes very well. We will not conductan experimental analysis as careful as for a single target plume for two reasons.First, all the real data under consideration have only a single chemical plume.Second, from the statistical point of view, our models (both Gaussian and sub-space) for diﬀerent numbers of chemical plumes are essentially the same modelswith diﬀerent dimensions. We will just use this simple example in this sectionto justify that our methods work eﬀectively for

N > Description . We use hyperspectral images available at the CSR, collected by theMIT Lincoln Lab, and made available through the ATD program. This data setconsists of four individual hyperspectral images, two of which are for releasedchemical plume and two for embedded plume. In each of the four, availabledata include a radiance data cube, its matrix form, the absorption coeﬃcientspectrum of the chemical plume of interest and plume present mask. Data cubesare about of the size 200 × × ×

300 is the spatial size and100 is the spectral dimension.

Task . Detect the chemical plume from a single cube.

Technique . We apply the three detection algorithms (NMF, NSS, LC), theircorresponding mixture models (mixNMF, mixNSS, mixLC), and the two en-hancement techniques (resampling and PLSR) as described in Section 2 and 3.15he dimension for the subspaces is chosen to be 2 by looking at where the sin-gular values start to ﬂatten. The parameters for the enhancement techniquesare τ = 0 . , τ = τ = 0 . Results . The receiver operating characteristic (ROC) curves are shown in Fig-ure 7 for one of the embedded data cube (also the most diﬃcult one). Com-parisons are made between single and mixture models on the left and betweenmixture models and those followed by resampling and then PLSR (the combi-nation of these two enhancement techniques are denoted as -eh ) on the right.From the left ﬁgure, we see that using mixture models for background can im-prove the detection results. Among the three algorithms, mixture LC worksthe best. In the right ﬁgure, when adding the enhancement procedures, bothmixNMF and mixNSS improve, but mixLC does not. Overall, mixNMF withenhancement outperforms others. FA P D NMFmixNMFNSSmixNSSLCmixLCdiag 0 0.2 0.4 0.6 0.8 100.20.40.60.81 ROC Curve in stairsP FA P D mixNMFNMF−ehmixNSSNSS−ehmixLCLC−ehdiag Figure 7: ROC curves. The left is comparison for single and mixture models andthe right for mixture models before and after enhancement (denoted as -eh ).Furthermore, we demonstrate the detection maps of the intermediate stepsof the best performed method mixNMF -eh , namely mixNMF, mixNMF with re-sampling (mixNMF-rs), mixNMF with resampling twice (mixNMF-rs2), mixNMFwith resampling twice and followed by PLSR (mixNMF-rs2-plsr). These resultsare shown with those of the NMF on the original data (NMF-outliers) and onthe data with outliers removed (NMF). All the methods stated in this sectionproceed the outlier removal ﬁrst, except NMF-outliers. Figure 8 shows that themixture model and the various enhancement steps (resampling and PLSR) im-prove the results gradually; the plume region becomes more and more separableover these steps.

Description . This dataset consists of ﬁve time series of radiance data, collectedusing an imaging Spectroradiometer that operates in the 8 – 11 micron rangeunder ﬁve diﬀerent combinations of the kind of chemical material (TEP, orMeS, or GAA), the release amount (75 kg/burst or 150), and the sensor used.To generate each sequence, a predetermined quantity of simulant is released into16

Figure 8: From top to bottom and from left to right: the detections map ofNMF-outlier, MNF, mixNMF, mixNMF-rs, mixNMF-rs2 and mixNMF-rs2-plsr.17igure 9: Detection results by ACE (left column) and T mixLC (right column) ona ﬁxed frame in each of the ﬁve hyperspectral movies of the Fabry-Perot dataset. Note that (1) our detection map is consistently cleaner than ACE, and (2)our algorithm signiﬁcantly outperformed ACE in the third movie.18he troposphere by using explosives (i.e. a burst release). Successive data cubesare collected, beginning before the event and terminating when it is deemedthat additional data collected will not augment scientiﬁc value. As a result,each hyperspectral time series contains several hundred frames of hyperspectralimages, which all have 256 ×

256 spectra along 20 spectral bands.

Task . Detect and track the chemical plume.

Technique . We ﬁrst check the time derivatives of the spectra (along the se-quence) but did not observe anything useful (this indicates that the data isparticularly challenging). We then applied Algorithm. 3 with the mixLC esti-mator and compared our results with ACE [2].

Results . The comparison is shown in Figure 9. It is obvious that the detectionmap by our algorithm is always much cleaner than that by ACE. Furthermore,in one case, our algorithm was able to clearly locate the plume while ACE couldnot (see Figure 9, third column). In this experiment, we used the 5th frame forbackground modeling (via a union of lines) and the 35th frame for testing (for allﬁve movies); future work will utilize multiple clean frames for joint backgroundmodeling.

Description . We use hyperspectral movies available at CSR, collected by JohnsHopkins Applied Physics Lab, and made available through the NSF ATD pro-gram. These movies are taken with an FTIR based long wave infrared sen-sor, recording a variety of releases of known chemicals in a gaseous, liquid andgaseous state. They have frames consisting of 128 × ×

120 hyperspectralcubes, with one collected approximately every 8 seconds. The data consists ofa desert scene where an unknown (to us) chemical is released at an unknownlocation at an unknown time (after the beginning of the movie), growing into achemical plume.

Task . Detect and track the chemical plume.

Technique . We detect the spectra in the chemical plume as an anomaly withrespect to an empirical model for the background. We use the ﬁrst two frames ofthe movies (where we are told that no chemical plum is present) as our trainingset X , and use the techniques detailed in section 4 to construct ˆ ν X , and detectanomalies. Results . An example of anomaly detection is visualized in Figure 10. In thecontext of these movies, the intrinsic dimension of the data d appears to bevery low, typically d ≤

5, as measured by Multiscale SVD [19]-[21]. Thereforethe number of samples required in order to learn ˆ ν X is expected to be low, evenif the high-dimensional space R p with p = 120. In practice, with non-optimizedMatlab code, the construction of ˆ ν X , using the ﬁrst frame as a training set(a much smaller set would be more than suﬃcient) takes about a minute, andthe evaluation of ˆ ν X at all the spectra of a new frame takes a few seconds.The anomaly detection requires the choice of a threshold, but not knowing theground truth we cannot study ROC curves. However the choice of thresholddid not seem to aﬀect the results much, especially in the detection of the “core”19art of the chemical plume, albeit it did aﬀect the detection of (presumablyless dense) regions of the plume: see Figure 10, where we chose on purpose aparticularly conservative threshold. For the algorithms in section 3, the algorithm [11] to estimate the subspacesmodeling background has computational complexity is O ( c mn ( dp + c +log( mn ))),where d is the intrinsic dimension of the subspaces, c and c are two parameters(set to 10 and 20). Algorithm [12] has similar cost and performance. Construct-ing the estimators T mixNMF , T mixNSS , T mixLC requires O ( p ) , O (( p + d ) d ) and O (( p + d ) d ) basic operations. Resampling has cost O ( mn log( mn )), PLSR re-quires O (( pmn + l + pl ) l ) operations, with l the dimension chosen for PLSR(typically O (1), independently of p ). The cost of the GMRA-based algorithms(see section 4) is: O ( C d mnp log( mn ) d ), where C d is a constant that dependsexponentially in the intrinsic dimension d of the subspaces, for constructing theGMRA; of O ( mnd ) for constructing the estimator ˆ µ s using low-rank Gaussiansor KDE, and for evaluating it at new points; O ( mnp log( mn )) for computingthe coeﬃcients of new data needed to evaluate the likelihood.In summary, all the algorithms we discuss run in time proportional to (up tologarithmic factors) the size mnp of the hyperspectral data cube, with constantsthat depend on the intrinsic dimension d of the data. We have presented several ideas aimed at improving the current state-of-art inseveral tasks related to the analysis of hyperspectral images, in particular forbackground modeling, chemical plume detection and anomaly detection. Wediscussed the application of these algorithms to a variety of data sets, withstate-of-art or better results (when ground truth was available). The proposedtechniques are diverse, but are mostly motivated by the observation that hy-perspectral data is often noisy but intrinsically low-dimensional, allowing oneto use ideas from the dimension reduction and manifold learning algorithmsoriginally considered in view of machine learning applications. In particularwe use techniques for approximating data by mixtures of distributions on low-dimensional subspaces [10, 11, 12], ﬁrst with a small, ﬁxed number of subspacesfor background modeling, and then with more complex, multiscale mixtures ofsubspaces using GMRA [9] and its extensions to the estimate of probabilitymeasures in high-dimensions.

Acknowledgment

The authors thank Dr. Dimitris G. Manolakis for sharing with them a manuscript [1]and for his correspondence answering questions.20igure 10: Anomaly detection of a chemical plume of an unknown chemicalappearing at an unknown time and location in a hyperspectral movie. No spatialregistration among the frames is assumed. The chemical plume is detected asanomaly, evolving in time and space. Left column: log-likelihood score accordingto empirical GMRA-based model, with darker red meaning lower log-likelihood,i.e. higher probability of being an anomaly. Right column: thresholded versionof the images on the left, with a non-optimized, conservative threshold, showingdetection of the chemical plume. Ground truth was not provided for this dataset. 21 eferences [1] D.G. Manolakis, S.E. Golowich and R.S. DiPietro. Long-Wave Infrared Hy-perspectral Remote Sensing of Chemical Clouds: A focus on signal process-ing approaches.

IEEE Signal Processing Magazine , vol. 31, no. 4, pp. 120–141, 2014.[2] S. Kraut, L. Scharf, and L. T. McWhorter. Adaptive subspace detectors.

IEEE Transactions on Signal Processing , vol. 49, no. 1, pp. 1–16, 2001.[3] D. Manolakis, C. Siracusa and G. Shaw. Hyperspectral subpixel targetdetection using the linear mixing model.

IEEE Trans. on Geoscience andRemote Sensing , vol. 39, no. 7, pp. 1392–1409, 2001.[4] M. Dao, D. Nguyen, T. Tran and S. Chin. Chemical plume detection inhyperspectral imagery via joint sparse representation.

Military Communi-cations Conference , 2012.[5] P. Gurram and H. Kwon Ensemble learning based on multiple kernel learn-ing for hyperspectral chemical plume detection.

Proc. SPIE 7695, Algo-rithms and Technologies for Multispectral, Hyperspectral, and UltraspectralImagery XVI , May, 2010.[6] J. Theiler, B.R. Foy and A.M. Fraser Characterizing non-Gaussian clutterand detecting weak gaseous plumes in hyperspectral imagery.

Proc. SPIE5806, Algorithms and Technologies for Multispectral, Hyperspectral, andUltraspectral Imagery XI , June, 2005.[7] T. Gerhart, J. Sunu, L. Lieu, E. Merkurjev, J. Chang, J. Gilles andA.L. Bertozzi Detection and tracking of gas plumes in LWIR hyperspectralvideo sequence data.

Proc. SPIE 8743, Algorithms and Technologies forMultispectral, Hyperspectral, and Ultraspectral Imagery XIX , May, 2013.[8] J. Theiler and B. Wohlberg Detection of unknown gas-phase chemicalplumes in hyperspectral imagery.

Proc. SPIE 8743, Algorithms and Tech-nologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XIX ,May, 2013.[9] W.K. Allard, G. Chen, and M. Maggioni. “Multiscale geometric methodsfor data sets II: Geometric multiscale analysis”.

Applied and ComputationalHarmonic Analysis (ACHA) , vol. 32, no. 3, pp. 435–462, 2012. Availableonline September 2011.[10] G. Chen and G. Lerman. Spectral Curvature Clustering (SCC).

Interna-tional Journal of Computer Vision , vol. 81, no. 3, pp. 317–330, 2009.[11] T. Zhang, A. Szlam, Y. Wang and G. Lerman. Hybrid Linear Modeling viaLocal Best-ﬁt Flats.

Proceedings of IEEE Conference on Computer Visionand Pattern Recognition (CVPR) , pp. 1927–1934, 2010.[12] G. Chen and M. Maggioni. Multiscale Geometric and Spectral Analysis ofPlane Arrangements.

Proceedings of IEEE Conference on Computer Visionand Pattern Recognition (CVPR) , pp. 2825–2832, 2011.2213] G. Chen, M. Iwen, S. Chin and M. Maggioni. A Fast Multiscale Frame-work for Data in High Dimensions: Measure Estimation, Anomaly Detec-tion, and Compressive Measurements. Visual Communications and ImageProcessing (VCIP), 2012 IEEE, pp. 1-6, 2012.[14] T. Hastie, R. Tibshirani, and J. Friedman. The elements of statisticallearning: data mining, inference and prediction.

Springer

IEEE Asilomar Conference on Signals, Systems and Comput-ers, arXiv ,to appear in Jour. Mach. Learn. Res., 2014.[19] A. V. Little, Y.-M. Jung, and M. Maggioni. “Multiscale Estimation ofIntrinsic Dimensionality of Data Sets”.

Proc. A.A.A.I. , 2009[20] A. V. Little. “Estimating the Intrinsic Dimension of High-Dimensional DataSets: A Multiscale, Geometric Approach”.