GLIMPS: A Greedy Mixed Integer Approach for Super Robust Matched Subspace Detection
Md Mahfuzur Rahman, Daniel Pimentel-Alarcón
Georgia State University, University of Wisconsin-Madison
Abstract—Due to the diverse nature of data acquisition in modern applications, many contemporary problems involve a high-dimensional datum x ∈ R^d whose entries often lie in a union of subspaces, and the goal is to find out which entries of x match a particular subspace U, a task classically called matched subspace detection. Entries that match a subspace are considered inliers with respect to that subspace, while all other entries are considered outliers. The proportion of outliers relative to each subspace varies with how many coordinates of x are drawn from the other subspaces. The problem is combinatorial and NP-hard, and has been studied extensively in recent years. Existing approaches can solve the problem when outliers are sparse. However, if outliers are abundant, i.e., if x contains coordinates from a fair number of subspaces, the problem cannot be solved with acceptable accuracy or within a reasonable amount of time. This paper proposes a two-stage approach called the Greedy Linear Integer Mixed Programmed Selector (GLIMPS) for this abundant-outliers setting, which combines a greedy algorithm with a mixed-integer formulation and can tolerate over 80% outliers, outperforming the state of the art.
I. INTRODUCTION
Approaches for data analysis have advanced greatly in recent years, in step with the emergence of high-dimensional data in all areas of science, business, and engineering. Such high-dimensional data can often be best explained using low-dimensional linear subspaces. In this model, a high-dimensional datum x ∈ R^d is a point lying in a subspace U. The task of testing whether a given datum x lies in a given subspace U is commonly known as matched subspace detection (MSD) [1]. Testing whether x is a complete inlier vector can be readily accomplished using the mathematical relationship between a vector x and its corresponding subspace U. The classical version of this MSD problem is formulated as a hypothesis test for which optimality of the solution is guaranteed based on the amount of signal energy of x relative to the given subspace U [1], [2]. Variants of the matched subspace detection problem have numerous applications in the literature, including remote sensing systems [3], [4], neural activation detection [5], anomaly detection [6], [7], and communications [8]. In those applications, outliers are assumed to be sparse in the datum. However, many contemporary applications, including computer vision [9], hyperspectral imaging [10], recommender systems [11], [12], lidar [13], and many more, may inherently allow a vast majority of outliers in the problem setting. Furthermore, several other factors (e.g., missingness, inconsistencies, noise, and outliers in the data) can make the problem very difficult to solve. To cope with these varied situations, several modified formulations of MSD [14]–[16] have been proposed in the literature. For instance, it may often be the case that a full measurement of x is not feasible or not available, as assumed in [17], [18].

In this paper, we consider two variants of the MSD formulation, namely robust matched subspace detection (RMSD) and super robust matched subspace detection (SRMSD). By RMSD, we mean that a small number of coordinates of x (fewer than d/2) are not just slightly perturbed by small noise; instead, those entries have been replaced entirely by wrong values (outliers). SRMSD refers to the case where most of the coordinates of x (far more than d/2) have been severely corrupted. In both cases, the goal is to identify the inlier entries in x.

II. RELATED WORK
The RMSD variant of the classical MSD problem can be easily tackled using ℓ1-norm minimization of the residual x − Uθ, because the solution always favors the majority entries, which in RMSD are the inliers (number of inlier entries > d/2). This formulation is closely related to LASSO [19], which also uses an ℓ1 penalty as regularization. The RMSD setting has been well studied in practice because numerous applications can be modeled as RMSD problems [20]–[27]. However, RMSD frameworks do not extend easily to problems that must be modeled as SRMSD.

The MSD or RMSD problem is often approached using randomized methods like [28]–[30], but these are not applicable in the SRMSD setting because of its combinatorial, NP-hard nature. Moreover, the solutions provided by randomized methods are not guaranteed to be globally optimal [31]–[38], even for an RMSD problem. A very recent bi-convex programming based method [35] aims to refine the result obtained from RANSAC using a second-stage bisection search in the remaining solution space; however, RANSAC itself is sub-optimal for SRMSD problems. Another variant of RANSAC, proposed in [39], ranks measurements in each iteration, but it requires parameter configurations that make the approach harder to use in practice. A deterministic and locally convergent method proposed in [37] can handle a large proportion of outliers, but it is slow and depends heavily on a RANSAC or least-squares based initial solution.

In quest of global optimality for the SRMSD class of problems, several approaches have been proposed in recent years. For example, in MaxFS [31], a problem similar to SRMSD is formulated as a set of infeasible constraints, and the goal is to find the maximum feasible subset. It uses deterministic branch-and-bound (BnB) methods, but for large-scale problems it may unfortunately take exponential time [34]; moreover, MaxFS only guarantees solutions for small-scale homogeneous linear systems. Linear matrix inequality constraints within a BnB method are proposed in [40] to reduce the running time and handle a large amount of outliers, but this requires a pre-defined structure in the objective variables. A recent iteratively re-weighted ℓ1-minimization method [32] can find a comparatively large maximal inlier set, but it requires iteratively solving a number of similar sub-problems, making it slower than the RANSAC class of algorithms; in addition, it does not support the SRMSD formulation, as its performance is highly determined by the proportion of outliers. A tree-based technique was recently proposed in [34] to solve the problem in less time using application-dependent heuristics and under the assumption that the residual structure is quasi-convex. Unfortunately, in many situations these heuristics may not be available, and the assumption on the residual structure may not be accurate in practice.

An MILP-based outlier removal technique called GORE is proposed in [33], but it can only be used as a preprocessing routine for algorithms that solve the maximum consensus set problem; moreover, its removal is guaranteed only up to 20% outliers, which is of little use when outliers are abundant. More recently, a heuristic greedy algorithm was proposed in [41] for subspace tracking; it finds the most probable outlier entry by testing whether removing that entry minimizes the gap between x and its projection onto U. This greedy algorithm, which runs in polynomial time, outperforms methods based on ℓ1-minimization of the residual and tolerates up to 60% outliers.

In this paper, we propose GLIMPS, a joint effort of the greedy [41] and MILP algorithms that combines the power of a fast (greedy) and an exact (MILP) algorithm to solve SRMSD problems. We show that GLIMPS outperforms other existing approaches, including ℓ1-minimization and the greedy and MILP algorithms alone. We also show that ℓ1-minimization, even when assisted by the greedy algorithm, does not perform very well.

III. CONTRIBUTIONS OF THIS PAPER
The contributions of this paper can be summarized as follows:

• GLIMPS offers a two-stage approach that extracts the best of both the greedy and MILP algorithms. More precisely, the greedy algorithm is fast, and it handles outliers better than other randomized and ℓ1-based approaches; however, when outliers are abundant (80% or more), it occasionally removes some inliers as outliers, and this unintended phenomenon makes the greedy algorithm fail in the majority of trials when x is severely outlier-dominant. On the other hand, MILP is an exact algorithm, but because of the combinatorial nature of the SRMSD problem it may take exponentially long to solve. GLIMPS reduces the search space for the MILP by removing a certain percentage of the most harmful entries with the greedy algorithm, eventually making the MILP faster.

• The greedy algorithm in the first stage provides a means to warm-start the MILP, noticeably increasing the scalability of the MILP within GLIMPS.

• The greedy algorithm and MILP tolerate up to 60% and 76% outliers, respectively, when used separately. GLIMPS, in contrast, tolerates up to 80% outliers.

• Existing approaches like ℓ1-minimization [18], [20], [42]–[44] assume subspaces of specific coherence and uniformity of the outlier locations in x. In contrast, GLIMPS places no restriction on the subspace structure or the outlier locations.

• SRMSD is a generalized version of a whole class of problems, so the idea of GLIMPS can easily be applied to other tasks whenever a subspace-based model is appropriate. Examples include robust PCA (principal component analysis), background separation, subspace tracking [18], [20], [27], [41], and matrix completion [42]–[51]. More details are discussed in Section IV.

• The MILP can be implemented in a few lines of code using high-level mathematical optimization languages like AMPL [52] and JuMP [53]. A variety of solvers like GUROBI [54] and CPLEX [55] are readily available for academic use.

IV. MOTIVATING APPLICATIONS
MSD is the classical setting underlying RMSD and SRMSD. Moreover, the RMSD and SRMSD settings can be used to model many applications, as mentioned in Sections I and II. In fact, the idea of SRMSD applies whenever subspace-based modeling is feasible. In this section, applications of SRMSD in recommender systems, network inference, signal processing, computer vision, and metagenomics are briefly described.
Recommender Systems.
Organizations like Netflix, YouTube, Amazon, and Facebook use recommender systems, according to their needs, for video, music, product, and content recommendations. Different approaches, including k-NN, the Pearson correlation coefficient, and ensemble methods [12], have been proposed to solve the problem. However, since a user preference vector x usually lies in a low-dimensional subspace, low-rank matrix completion is considered potentially the best model [42]–[47]. Unfortunately, if a single account is shared by multiple users, the preference vector x may contain entries that belong to different subspaces. This situation can easily be modeled as an SRMSD problem due to the predominance of outliers with respect to each subspace.
Network Inference.
In systems biology, understanding a system can be accomplished by establishing correlations among its components [56]. This idea applies to a broad class of networks (the Internet, biological networks, social networks, and many more) [57], [58]. In general, the interactions among components can be analyzed using mathematical graph theory where, for example in a biological network, each node represents an entity like DNA, RNA, a protein, or a small molecule. Edges may represent activation levels, or weights referring to confidence levels, strengths, or reaction speeds. Such a network can be decomposed into multiple strongly connected components, which we may refer to as subnets. In this setting, each node in subnet k can be encoded as a datum x, which in turn can be represented as a linear combination of two levels of interaction: 1) interactions within its own subnet k, and 2) interactions between subnet k and other subnets. Under this modeling, x lies in a subspace U. Because interactions behave dynamically, the state of each node may change over time, and consequently x can have measurements from different subspaces, resulting in a majority of outliers with respect to each subspace; hence we can leverage SRMSD algorithms for this class of problems.
Signal Processing.
In signal processing detection problems, time-series data are typically given, and the task is to identify how the data were actually generated: that is, whether the given vector x consists entirely of noise or outliers, or lies in some subspace containing the spectral signature of the signal. This matched subspace detector is a generalization of a special case called the matched signal detector, also known as the matched filter, which is considered an important building block of signal processing [1]. In modern signal processing applications such as target detection or anomaly detection, the detector is treated as a binary classifier that labels pixels of input images as pixels of interest or as anomalous pixels. In this detection problem, a fixed background can be treated as a data point lying in a low-dimensional linear space, and targeted or anomalous pixels can be treated as outliers. Depending on the position and orientation of the cameras or receiving stations, the data may contain different amounts of outliers. This idea can also be applied to identify peculiarities in solar images and thereby infer potential events.

Computer Vision.
Computer vision, as a subfield of signal processing, often tries to segment the background from the foreground to detect events occurring in a video. Though the background in traffic or video surveillance problems is subject to minor variation with environmental or instrumental changes, it can still be modeled as a low-dimensional subspace. In this formulation, each video frame (an image) is vectorized as x, and the background is approximated as a subspace U. The task is to find which entries of x match U (inliers) and which do not (outliers). Depending on the position and orientation of the cameras and the objects involved, a frame x can have varying numbers of outliers. If foreground objects are far from the cameras, the outliers in x are sparse, as assumed in RMSD, robust PCA [22]–[27], and other subspace tracking approaches [18], [27], [41]. However, in some data acquisition situations this is not the case, and the outliers can be a dominant majority; it turns out that we then need to handle the SRMSD setting precisely. This kind of modeling is also applicable to image inpainting [59], scene reconstruction, the design of industrial robots, and many more.
Metagenomics.
To better understand the different microorganisms present in an environmental sample, microbiologists traditionally used cultivation-based methods, which replicate certain genes to create biodiversity. Unfortunately, this approach is less effective in the sense that it misses most of the microbial diversity. Metagenomics, a recent field of microbial study, aims to provide the means to understand all of the genomes in an environmental DNA sample, which is a mixture of genes from multiple organisms. Each genome is a collection of genes and is representative of one organism, so the task is to find gene-wise identities of the genomes present in the mixed sample. Metagenomics is expected to advance knowledge and have a vast impact on practical applications in medicine, agriculture, and engineering. For example, in agriculture, a better understanding of the relationships between plants and microbes can reduce the risk of disease in crops and livestock, and provide enhanced means for adaptive farming [60]. The metagenomic problem can easily be modeled as an SRMSD problem where x refers to the DNA sample collected from the environment, i.e., each gene corresponds to one entry in x. Each genome can be modeled as a subspace, and the goal is to find out which gene (coordinate) corresponds to which genome (subspace). As the sample x contains entries (genes) from multiple organisms, there will be few inliers (correct genes) for each subspace. This formulation matches our SRMSD setting.

V. PROBLEM FORMULATION
The problem is formulated as follows: consider an r-dimensional subspace U and a vector x in the ambient space R^d. The given vector x contains some coordinates that are correct relative to the subspace U, while the rest are outliers relative to U. SRMSD is defined as finding the set of indices ω of all the inlier coordinates in x, where ω is the largest υ ⊂ [d] := {1, . . . , d} such that x_υ lies in the subspace U_υ. The subscript υ denotes the restriction to the coordinates indicated in υ. The subspace U is spanned by U ∈ R^{d×r}, whereas U_υ is spanned by U_υ ∈ R^{|υ|×r}.

Example 1.
Consider d = 5 and r = 2, a basis U ∈ R^{5×2} with U = span{U}, and a vector x whose first three coordinates are consistent with U. Then ω = {1, 2, 3}, because x_ω = U_ω θ for some θ ∈ R^2, and because ω is the largest υ for which x_υ lies in U_υ. The fact that the θ derived this way is correct, and that the entries x_ω really are inliers, can be justified as follows. The dimension of the subspace is r = 2, and consequently the coefficient vector θ has two unknowns in the chosen system of equations x_ω = U_ω θ. Now, if we choose m = |ω| ≤ r equations, the system is either underdetermined or uniquely determined, and we are guaranteed to get some solution for θ: for example, using only the two equations indexed by {1, 2} yields a solution, and the same holds if we choose even fewer equations. However, such a solution is not trustworthy, precisely because the system is under-constrained or uniquely constrained. To guarantee that the set of indices ω and the corresponding inliers are correct, we need to find a solution to an overdetermined system, i.e., we need at least r + 1 equations.
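The overdetermined-system argument is easy to check numerically. The following sketch (our own illustration in Python/numpy, with made-up values rather than the matrices of the original example) tests a candidate index set ω by checking whether the least-squares residual of x_ω against U_ω vanishes; note that the test is only conclusive when |ω| ≥ r + 1.

import numpy as np

rng = np.random.default_rng(0)
d, r = 5, 2

# Illustrative basis and coefficient vector (not the paper's numbers).
U = rng.standard_normal((d, r))
theta = rng.standard_normal(r)
x = U @ theta
x[3:] = rng.standard_normal(2)         # corrupt the last two coordinates

def is_inlier_set(omega, U, x, tol=1e-9):
    # Check x_omega in span(U_omega) via the least-squares residual.
    # Only conclusive when len(omega) >= r + 1 (overdetermined system).
    U_o, x_o = U[omega], x[omega]
    theta_hat, *_ = np.linalg.lstsq(U_o, x_o, rcond=None)
    return np.linalg.norm(x_o - U_o @ theta_hat) < tol

print(is_inlier_set([0, 1, 2], U, x))  # True: |omega| = r + 1 clean rows
print(is_inlier_set([0, 1, 3], U, x))  # False: coordinate 3 is an outlier
print(is_inlier_set([0, 3], U, x))     # True, but inconclusive: |omega| = r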
VI. GREEDY LINEAR INTEGER MIXED PROGRAMMED SELECTOR

As mentioned in the Related Work section, both a greedy algorithm and a mixed-integer linear program (MILP) based formulation have been proposed in the literature to solve the SRMSD problem. However, when the fraction of outliers becomes huge, these techniques used separately either fail to give correct results or take exponentially long. Our two-stage approach to SRMSD combines a greedy algorithm with an MILP to improve on the existing results. In the first stage, we iteratively remove those coordinates (one per iteration) whose removal minimizes the gap between x and its projection onto the subspace U; projections are computed dynamically as we restrict to the remaining coordinates. Then, in the second stage, we apply an MILP to identify the remaining inliers and outliers. Once we identify the remaining inliers, we obtain the coefficient vector θ of the inliers of x, which in turn allows us to identify all the inliers and outliers in x.
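At a high level, the two-stage pipeline can be sketched as below. This is our Python paraphrase with hypothetical helper names, not the authors' Matlab/AMPL implementation; greedy_erase and milp_select are sketched in the following two subsections.

import numpy as np

def glimps(U, x, r, removal_pct=0.40, tol=1e-6):
    # Two-stage GLIMPS sketch. Stage 1 (greedy erasure) shrinks the
    # problem; stage 2 (MILP) exactly labels the survivors. The recovered
    # theta then identifies every inlier of the full vector x (Remark 1).
    n_remove = int(removal_pct * len(x))
    keep = greedy_erase(U, x, n_remove)        # stage 1: drop suspects
    theta = milp_select(U[keep], x[keep], r)   # stage 2: exact selection
    inliers = np.flatnonzero(np.abs(x - U @ theta) < tol)
    return inliers, theta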
First Stage: Greedy Algorithm. Suppose ω^c is the set of coordinates of the given vector x that have already been found to be outliers. We compute d − |ω^c| pairs of projections (for U and x) onto each set of coordinates ω_i = {1, . . . , d} \ {ω^c ∪ {i}}, for every i ∈ {1, . . . , d} \ ω^c. Mathematically, U_{ω_i} = P_{ω_i} U and x_{ω_i} = P_{ω_i} x, where P_{ω_i} is the projection operator restricting x and U to the coordinate set ω_i. We then compute the projection of x_{ω_i} onto the projected subspace U_{ω_i}, i.e., x̂_{ω_i} = U_{ω_i}(U_{ω_i}^T U_{ω_i})^{-1} U_{ω_i}^T x_{ω_i}. For each ω_i defined this way, we compute the ratio of two norms, r_i = ‖x̂_{ω_i}‖ / ‖x_{ω_i}‖ ≤ 1. A larger r_i implies that x_{ω_i} and x̂_{ω_i} are closer; intuitively, it implies that if we remove the i-th coordinate along with the coordinates in ω^c, x gets closer to the subspace U whose basis is U. Because it follows a clear sequence of locally best choices, this approach is called a greedy erasure (outlier removal) algorithm. We identify the coordinate with the largest ratio and remove it as an outlier, adding its index to the outlier set ω^c. At the beginning of the algorithm, ω^c = ∅. We iterate this process until the remaining outliers are few enough relative to the inliers that the MILP can succeed in the second stage. The greedy algorithm used in the first stage is shown in Algorithm 1 below.
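Before the formal listing, here is a minimal numpy sketch of the same erasure loop (our own rendering; it uses a least-squares solve in place of the explicit (U^T U)^{-1} projector, which is numerically equivalent when U has full column rank).

import numpy as np

def greedy_erase(U, x, n_remove):
    # First-stage greedy erasure (sketch). Repeatedly drops the single
    # coordinate whose removal makes x closest to the subspace, measured
    # by r_i = ||x_hat_{omega_i}|| / ||x_{omega_i}||.
    keep = list(range(len(x)))
    for _ in range(n_remove):
        best_ratio, best_idx = -np.inf, None
        for i in keep:
            omega_i = [j for j in keep if j != i]      # drop candidate i
            U_o, x_o = U[omega_i], x[omega_i]
            # Projection of x_{omega_i} onto span(U_{omega_i}).
            x_hat = U_o @ np.linalg.lstsq(U_o, x_o, rcond=None)[0]
            ratio = np.linalg.norm(x_hat) / np.linalg.norm(x_o)
            if ratio > best_ratio:
                best_ratio, best_idx = ratio, i
        keep.remove(best_idx)                          # erase the outlier
    return np.array(keep)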
Algorithm 1: Greedy Algorithm (First Stage)
Input: subspace basis U, subspace dimension r, observed vector x, removal percentage p.
Output: a reduced version x_υ of the input vector x, to be solved by the MILP.
1. Initialize the set of outliers as ω^c = ∅.
2. for i = 1 to d with i ∉ ω^c do
3.   Project x and U onto the existing coordinates except i, obtaining x_{ω_i} and U_{ω_i}.
4.   Project x_{ω_i} onto U_{ω_i}, obtaining x̂_{ω_i} = U_{ω_i}(U_{ω_i}^T U_{ω_i})^{-1} U_{ω_i}^T x_{ω_i}.
5.   Compute r_i = ‖x̂_{ω_i}‖ / ‖x_{ω_i}‖.
6. Find r_m = max(r_i) over all i ∈ {1, . . . , d} \ ω^c.
7. Remove the m-th coordinate and update ω^c = ω^c ∪ {m}.
8. Repeat steps 2–7 until the desired number of coordinates has been removed, yielding x_υ.

Second Stage: MILP. As output from the greedy stage, we obtain x_υ, a reduced version of x. Not all entries of x_υ are necessarily inliers; however, after the partial processing by the greedy algorithm in the first stage, the outliers in x_υ are expected to be less dense than in x. We apply an MILP (mixed-integer linear program) formulation to x_υ, which should perform better since the number of variables is reduced before being fed to the MILP. The MILP for x_υ can be formulated as follows:

  argmin_{z ∈ {0,1}^{|υ|}, θ ∈ R^r}  ‖z‖1   subject to   |x_υ − U_υ θ| ≤ M z.     (1)

In this formulation, M is a sufficiently big constant useful for the MILP formulation [61]. We should keep M as small as possible to avoid potential numerical instability. Fortunately, we can use problem structure to choose M; for example, we could use

  M = max_{i ∈ [d]} |U_i θ − x_i|.     (2)

Here z is a vector of binary variables that assumes zero entries for inliers. To make this precise, let ẑ be the solution to (1) and ω̂ be the set of indices in ẑ such that ẑ_ω̂ = 0. Then |x_ω̂ − U_ω̂ θ| ≤ M ẑ_ω̂ = 0 implies |x_ω̂ − U_ω̂ θ| = 0, which means all the entries indicated in x_ω̂ match U_ω̂ for the θ that the MILP obtains through its search. Moreover, M and z (which takes the value 1 for outliers) collectively satisfy the constraints for the outliers. In other words, the idea is to find a suitable θ that makes as many entries as possible inliers, maximizing the number of zeros in z (the largest possible ω), i.e., minimizing ‖z‖1. Once we find the correct θ for x_υ with at least |ω| ≥ r + 1 inliers, we can find all the inliers in x.
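For concreteness, here is a minimal sketch of (1) using the open-source PuLP/CBC toolchain. This is our assumption for illustration only: the paper itself implements the MILP in AMPL with CPLEX, and the big-M choice below is a crude data-scale heuristic rather than the structured choice (2).

import numpy as np
import pulp

def milp_select(U, x, r, M=None):
    # Sketch of the noiseless MILP (1). z_i = 0 marks coordinate i as an
    # inlier; minimizing sum(z) maximizes the inlier set.
    n = len(x)
    if M is None:
        M = 10.0 * (np.abs(x).max() + 1.0)   # crude big-M (our heuristic)
    prob = pulp.LpProblem("srmsd_milp", pulp.LpMinimize)
    z = [pulp.LpVariable(f"z{i}", cat="Binary") for i in range(n)]
    theta = [pulp.LpVariable(f"theta{j}") for j in range(r)]
    prob += pulp.lpSum(z)                    # objective: number of outliers
    for i in range(n):
        xi = float(x[i])
        fit_i = pulp.lpSum(float(U[i, j]) * theta[j] for j in range(r))
        prob += xi - fit_i <= M * z[i]       # |x_i - (U theta)_i| <= M z_i
        prob += fit_i - xi <= M * z[i]
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return np.array([t.varValue for t in theta])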
MILP Formulation for Noisy Data. As mentioned earlier, high-dimensional data in many problems can be modeled as points lying in a subspace. In real settings, however, that may not always be exactly the case: there may be small modeling noise causing x_υ to be close to U_υ but not exactly lying on it. That is, if ω is the set of inlier indices in x_υ, then x_ω ≈ U_ω θ holds instead of the noiseless x_ω = U_ω θ. Alternatively, for the noisy case we can write x_ω = U_ω θ + w_ω, where w_ω is a vector of small noise values associated with the inlier entries. This noise can be taken into account by a minor variation of our noiseless MILP formulation (1):

  argmin_{z ∈ {0,1}^{|υ|}, θ ∈ R^r, w ∈ R^{|υ|}}  ‖z‖1 + λ‖w‖   subject to   |x_υ − U_υ θ − w| ≤ M z,     (3)

where λ > 0 is a regularization parameter that quantifies how much noise we allow for entries to be treated as inliers. λ = 0 places no restriction on w in (3); consequently, all entries of x_υ would be treated as inliers and we would end up with an erroneous θ, because for any θ it is possible to satisfy the constraint in (3) with w = x_υ − U_υ θ and z = 0. At the other extreme, λ = ∞ forces w to be 0 and allows no noise. In general, between these two extremes, a smaller λ lets (3) admit more noise, while a larger λ penalizes large values of w and thus admits less noise. For our experiments, we use a moderately large value of λ. We show in Section VII that GLIMPS tolerates a reasonable amount of noise with almost no compromise in accuracy.
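The noisy variant (3) only changes the objective and constraints of the previous sketch. Since the norm on w is not specified above, we take ‖w‖ as the ℓ1-norm (our assumption), linearized through auxiliary variables u_i ≥ |w_i| so that the program remains an MILP.

import numpy as np
import pulp

def milp_select_noisy(U, x, r, lam=1000.0, M=None):
    # Sketch of the noisy MILP (3); lam = 1000 matches the paper's
    # experiments. lam -> 0 admits arbitrary noise, lam -> inf forces w = 0.
    n = len(x)
    if M is None:
        M = 10.0 * (np.abs(x).max() + 1.0)   # crude big-M, as before
    prob = pulp.LpProblem("srmsd_milp_noisy", pulp.LpMinimize)
    z = [pulp.LpVariable(f"z{i}", cat="Binary") for i in range(n)]
    w = [pulp.LpVariable(f"w{i}") for i in range(n)]
    u = [pulp.LpVariable(f"u{i}", lowBound=0) for i in range(n)]
    theta = [pulp.LpVariable(f"theta{j}") for j in range(r)]
    prob += pulp.lpSum(z) + lam * pulp.lpSum(u)   # ||z||_1 + lam ||w||_1
    for i in range(n):
        xi = float(x[i])
        fit_i = pulp.lpSum(float(U[i, j]) * theta[j] for j in range(r))
        prob += w[i] <= u[i]                      # u_i >= |w_i|
        prob += -w[i] <= u[i]
        prob += xi - fit_i - w[i] <= M * z[i]     # |x_i - (U theta)_i - w_i|
        prob += fit_i + w[i] - xi <= M * z[i]     #   <= M z_i
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return np.array([t.varValue for t in theta])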
Remark 1. As already mentioned, finding r + 1 or more inliers is as good as finding all of the inliers in x. This is because we can estimate the true coefficient vector θ from any set of inliers ω with |ω| ≥ r + 1 using θ = (U_ω^T U_ω)^{-1} U_ω^T x_ω. Since the coefficient vector θ has only r unknowns and we have at least r + 1 equations, the system of equations is overdetermined and the solution obtained for θ is correct. Once we obtain θ, we can easily recover the entire correct x using x = Uθ, even if x is only partially observed or severely corrupted by outliers. This idea can easily be applied to complete missing or corrupted data in applications such as background estimation when the background is covered by a foreground object, or image inpainting [59], where we want to estimate corrupted or outlier pixels in an image with respect to the background.
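In code, the completion step of Remark 1 is a single least-squares solve (a sketch, reusing numpy):

import numpy as np

def reconstruct(U, x, omega):
    # Remark 1: given a verified inlier set omega with |omega| >= r + 1,
    # estimate theta by least squares and fill in every entry of x.
    theta, *_ = np.linalg.lstsq(U[omega], x[omega], rcond=None)
    return U @ theta   # corrupted or missing coordinates are replaced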
VII. EXPERIMENTS
In this section, we study the performance of GLIMPS on synthetic data as a function of the fraction of outliers p and the noise variance σ². For comparison, we test GLIMPS against a similar MILP formulation, the very recent greedy Erasure algorithm of [41], and greedy-assisted ℓ1-minimization (a combination of greedy and ℓ1-minimization), which tolerate larger amounts of outliers than other existing algorithms. In our experiments, we measure accuracy as the normalized error between the true and estimated inlier coefficients θ and θ̂, i.e., ‖θ − θ̂‖ / (‖θ‖ + ‖θ̂‖), and as the ratio of misclassified entries to the total number of inliers. Both error metrics are tightly related, and all algorithms behave very similarly under both criteria.

In all our simulations, we use ambient dimension d = 100 and subspace dimension r = 5. To represent the subspace U and later create a sample vector x in the subspace, we first generate a basis U ∈ R^{d×r} and a coefficient vector θ ∈ R^r with i.i.d. standard Gaussian entries. We then generate x as Uθ + ε, where ε ∈ R^d is the possible noise, generated i.i.d. N(0, σ²). Finally, for each coordinate i ∈ [d], with probability p we replace the entry of x with an outlier drawn from an i.i.d. standard Gaussian distribution. For the noisy MILP formulation, we solve (3) with λ = 1000. We use T = 50 trials for all of our experiments.

We implement the greedy algorithm in Matlab, and for solving the MILP formulation we use the CPLEX solver [55] in the AMPL environment with a reasonable time limit (60 seconds) per trial; CPLEX reports the best result found within the allowed time. All experiments were run on a MacBook Pro with a 3.5 GHz Intel Core i7 processor and 16 GB of memory.
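The data-generation procedure just described is straightforward to reproduce. A sketch follows (our Python paraphrase of the Matlab setup, treating σ as the scale of the Gaussian noise):

import numpy as np

def make_trial(d=100, r=5, p=0.5, sigma=0.0, seed=None):
    # One synthetic trial: x = U theta + noise, then each coordinate is
    # independently replaced with a standard-Gaussian outlier w.p. p.
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((d, r))                 # i.i.d. Gaussian basis
    theta = rng.standard_normal(r)
    x = U @ theta + sigma * rng.standard_normal(d)  # possible noise
    is_outlier = rng.random(d) < p                  # Bernoulli(p) corruption
    x[is_outlier] = rng.standard_normal(is_outlier.sum())
    return U, x, theta, is_outlier

def coeff_error(theta, theta_hat):
    # Normalized coefficient-error metric used above.
    return np.linalg.norm(theta - theta_hat) / (
        np.linalg.norm(theta) + np.linalg.norm(theta_hat))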
In our first experiment, we compare the performance of GLIMPS with the other competing approaches as a function of the fraction of outliers. It is evident from Fig. 1 that GLIMPS outperforms all of them.

Fig. 1: GLIMPS outperforms other approaches. In the first stage of GLIMPS, we remove 40% of the entries using the greedy algorithm, and then in the second stage we identify the remaining inliers and outliers using the MILP. This approach succeeds for around 80% outliers. In Greedy+ℓ1, we remove 100% of the outliers (empirically fixed) and then use ℓ1-minimization, but similar performance can be achieved using the greedy algorithm only.

Since, as Fig. 1 shows, the combined performance of greedy and ℓ1-minimization can be achieved using the greedy algorithm alone, which runs in polynomial time, we only show comparative analyses of GLIMPS against the greedy and MILP formulations in the subsequent experiments.

In our second experiment, we show how the behavior of GLIMPS, the greedy Erasure algorithm, and the MILP approach for SRMSD
changes with a gradual increase in the fraction of outliers. The average error rate for all algorithms is shown in Fig. 2. We observe that if we remove the optimal number of entries (in this case, 40%) in the first stage, GLIMPS achieves a 100% success rate for up to 80% outliers, whereas the greedy algorithm and the MILP approach achieve similar performance (100% success rate) only for up to 60% and 76% outliers, respectively.

Fig. 2: GLIMPS outperforms both the greedy and MILP approaches. In particular, GLIMPS tolerates 80% outliers, whereas MILP strictly fails after 76%. Moreover, the performance of the greedy algorithm becomes unstable if the fraction of outliers goes beyond 60%.

In our third experiment, we study the computational time requirements of the greedy, MILP, and proposed GLIMPS approaches. As this is a combinatorial problem, MILP and GLIMPS take longer than the greedy algorithm, which reduces the search space using heuristics and thus takes significantly less (polynomial) time. The results are shown in Fig. 3.

Fig. 3: MILP and GLIMPS take longer to solve the problem. The greedy algorithm uses heuristics to reduce the solution space and is hence computationally inexpensive. In this experiment, we remove 40% of the entries in the first stage of GLIMPS.

In our fourth experiment, we study the significance of the outlier removal percentage in the first (greedy) stage of GLIMPS. We use this experiment as a hyperparameter tuning step and apply three different removal percentages, namely 30%, 40%, and 50%. The average results are shown in Fig. 4. We notice that removing 40% of the entries in the greedy stage achieves the best outlier tolerance.

Fig. 4: 40% removal in the first stage provides the best result in the second stage. Removal of 30% is too low, whereas removal of 50% is too ambitious in the abundant-outliers regime. All three removal quantities perform very well in the sparse-outliers regime.
Finally, we study the behavior of GLIMPS under different noise conditions: low, medium, and high. We use three values of σ corresponding to the three noise levels. We observe that GLIMPS steadily tolerates noise up to the medium level; however, high noise severely affects GLIMPS's performance, as shown in Fig. 5.
Fig. 5 (three panels: low, medium, and high noise, each against the no-noise baseline): GLIMPS performs satisfactorily under low and medium noise conditions; however, high noise conditions considerably affect its performance. In this experiment, 30% of the entries of x were removed in the first stage.

VIII. CONCLUSION

In this paper, a two-stage approach (GLIMPS) for super robust matched subspace detection is proposed to improve the performance of the MILP formulation in terms of both accuracy and computation time. In the first stage, a greedy algorithm is applied to remove a percentage of the outliers, yielding a reduced input vector. We then feed the reduced input vector to, and warm-start, the MILP, increasing the scalability and performance of the same MILP. We observe that GLIMPS achieves better results than Erasure (the greedy algorithm) and MILP, with full accuracy for fractions of outliers up to 80%, whereas the greedy algorithm and MILP individually tolerate only up to 60% and 76% outliers, respectively. We also show that greedy+ℓ1-minimization cannot achieve higher accuracy than the greedy algorithm alone. For GLIMPS, we notice that when outliers are abundant, the greedy algorithm erroneously removes some inliers as outliers in the first stage, which in turn prevents GLIMPS from succeeding beyond 80%. Our next focus is to find necessary and sufficient deterministic conditions that guarantee no error is incurred in the greedy first stage, so that GLIMPS can tolerate even more outliers.

REFERENCES
[1] L. Scharf and B. Friedlander, Matched subspace detectors, IEEE Transactions on Signal Processing, 1994.
[2] L. Scharf, Statistical Signal Processing, Addison-Wesley, 1991.
[3] S. Shahbazpanahi, S. Valaee and M. Bastani, Distributed source localization using ESPRIT algorithm, IEEE Transactions on Signal Processing, 2001.
[4] M. Rangaswamy, F. Lin and K. Gerlach, Robust adaptive signal processing methods for heterogeneous radar clutter scenarios, Signal Processing, 2004.
[5] B. Ardekani, J. Kershaw, K. Kashikura and I. Kanno, Activation detection in functional MRI using subspace modeling and maximum likelihood estimation, IEEE Transactions on Medical Imaging, 1999.
[6] D. Stein, S. Beaven, L. Hoff, E. Winter, A. Schaum and A. Stocker, Anomaly detection from hyperspectral imagery, IEEE Signal Processing Magazine, 2002.
[7] T. Ahmed, M. Coates and A. Lakhina, Multivariate online anomaly detection using kernel recursive least squares, INFOCOM, 2007.
[8] M. McCloud and L. Scharf, Interference estimation with applications to blind multiple-access communication over fading channels, IEEE Transactions on Information Theory, 2000.
[9] K. Kanatani, Motion segmentation by subspace separation and model selection, IEEE International Conference on Computer Vision, 2001.
[10] H. Kwon and N. Nasrabadi, Kernel matched subspace detectors for hyperspectral target detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006.
[11] Netflix, Inc., The Netflix Prize.
[12] R. Bell, Y. Koren and C. Volinsky, The BellKor solution to the Netflix Prize, KorBell Team's Report to Netflix, 2007.
[13] C. Parrish, I. Jeong, R. Nowak and R. Smith, Empirical comparison of full-waveform lidar algorithms: Range extraction and discrimination performance, Photogrammetric Engineering & Remote Sensing, 2011.
[14] S. Kraut, L. Scharf and L. McWhorter, Adaptive subspace detectors, IEEE Transactions on Signal Processing, 2001.
[15] M. Desai and R. Mangoubi, Robust Gaussian and non-Gaussian matched subspace detection, IEEE Transactions on Signal Processing, 2003.
[16] O. Besson, L. Scharf and F. Vincent, Matched direction detectors and estimators for array processing with subspace steering vector uncertainties, IEEE Transactions on Signal Processing, 2005.
[17] L. Balzano, B. Recht and R. Nowak, High-dimensional matched subspace detection when data are missing, IEEE International Symposium on Information Theory, 2010.
[18] L. Balzano, R. Nowak and B. Recht, Online identification and tracking of subspaces from highly incomplete information, Allerton Conference on Communication, Control and Computing, 2010.
[19] R. Tibshirani, Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society, 1996.
[20] J. He, L. Balzano and A. Szlam, Incremental gradient on the Grassmannian for online foreground and background separation in subsampled video, Conference on Computer Vision and Pattern Recognition, 2012.
[21] C. Papadimitriou, P. Raghavan, H. Tamaki and S. Vempala, Latent semantic indexing: a probabilistic analysis, Journal of Computer and System Sciences, 2000.
[22] E. Learned-Miller, M. Narayana and A. Hanson, Coherent motion segmentation in moving camera videos using optical flow orientations, International Conference on Computer Vision, 2013.
[23] T. Bouwmans, A. Sobral, S. Javed, S. Jung and E. Zahzah, Decomposition into low-rank plus additive matrices for background/foreground separation: A review for a comparative evaluation with a large-scale dataset, Computer Science Review, 2016.
[24] T. Bouwmans and E. Zahzah, Robust PCA via principal component pursuit: a review for a comparative evaluation in video surveillance, Computer Vision and Image Understanding, 2014.
[25] Z. Lin, A. Ganesh, J. Wright, L. Wu, M. Chen and Y. Ma, Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix, Computational Advances in Multi-Sensor Adaptive Processing, 2009.
[26] D. Pimentel-Alarcón and R. Nowak, Random consensus robust PCA, Electronic Journal of Statistics, 2017.
[27] H. Mansour and X. Jiang, A robust online subspace estimation and tracking algorithm, IEEE International Conference on Acoustics, Speech, and Signal Processing, 2015.
[28] M. A. Fischler and R. C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM 24 (1981) 381–395.
[29] R. Raguram, J. M. Frahm and M. Pollefeys, A comparative analysis of RANSAC techniques leading to adaptive real-time random sample consensus, ECCV, Springer (2008) 500–513.
[30] R. Raguram, O. Chum, M. Pollefeys, J. Matas and J. M. Frahm, USAC: a universal framework for random sample consensus, IEEE TPAMI 35 (2013) 2022–2038.
[31] Y. Zheng, S. Sugimoto and M. Okutomi, Deterministically maximizing feasible subsystem for robust model fitting with unit norm constraint, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1825–1832.
[32] P. Purkait, C. Zach and A. Eriksson, Maximum consensus parameter estimation by reweighted ℓ1 methods, International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, Springer, Cham, 2017.
[33] T.-J. Chin, Y. Heng Kee, A. Eriksson and F. Neumann, Guaranteed outlier removal with mixed integer linear programs, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5858–5866.
[34] T.-J. Chin et al., Efficient globally optimal consensus maximisation with tree search, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[35] Z. Cai et al., Deterministic consensus maximization with biconvex programming, arXiv preprint arXiv:1807.09436, 2018.
[36] M. C. Tsakiris and R. Vidal, Dual principal component pursuit, Proceedings of the IEEE International Conference on Computer Vision Workshops, 2015, pp. 10–18.
[37] H. Le, T.-J. Chin and D. Suter, An exact penalty method for locally convergent maximum consensus, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[38] K. H. Lee, C. Yu and S. W. Lee, Deterministic hypothesis generation for robust fitting of multiple structures, arXiv preprint arXiv:1807.09408, 2018.
[39] R. Li et al., ARSAC: Efficient model estimation via adaptively ranked sample consensus, Neurocomputing, 2018.
[40] P. Speciale et al., Consensus maximization with linear matrix inequality constraints, CVPR, 2017.
[41] D. Pimentel-Alarcón, Selective erasures for high-dimensional robust subspace tracking, SPIE Defense + Commercial Sensing, 2018.
[42] E. Candès and B. Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, 2009.
[43] E. Candès and T. Tao, The power of convex relaxation: near-optimal matrix completion, IEEE Transactions on Information Theory, 2010.
[44] B. Recht, A simpler approach to matrix completion, Journal of Machine Learning Research, 2011.
[45] Z. Lin, R. Liu and Z. Su, Linearized alternating direction method with adaptive penalty for low rank representation, Advances in Neural Information Processing Systems, 2011.
[46] P. Jain, P. Netrapalli and S. Sanghavi, Low-rank matrix completion using alternating minimization, ACM Symposium on Theory of Computing, 2013.
[47] D. Pimentel-Alarcón, N. Boston and R. Nowak, A characterization of deterministic sampling patterns for low-rank matrix completion, IEEE Journal of Selected Topics in Signal Processing, 2016.
[48] B. Eriksson, L. Balzano and R. Nowak, High-rank matrix completion and subspace clustering with missing data, Artificial Intelligence and Statistics, 2012.
[49] D. Pimentel-Alarcón, L. Balzano, R. Marcia, R. Nowak and R. Willett, Group-sparse subspace clustering with missing data, IEEE Statistical Signal Processing, 2016.
[50] D. Pimentel-Alarcón and R. Nowak, The information-theoretic requirements of subspace clustering with missing data, International Conference on Machine Learning, 2016.
[51] E. Elhamifar, High-rank matrix completion and clustering under self-expressive models, Advances in Neural Information Processing Systems, 2016.
[52] R. Fourer, D. Gay and B. Kernighan, AMPL: A Modeling Language for Mathematical Programming, Second Edition, Danvers: Boyd and Fraser.
[53] M. Lubin and I. Dunning, Computing in operations research using Julia, INFORMS Journal on Computing 27.2 (2015): 238–248.
[54] Gurobi Optimization, Inc., Gurobi Optimizer Reference Manual.
[55] IBM, ILOG CPLEX Optimizer.
[56] R. Albert, Network inference, analysis, and modeling in systems biology, The Plant Cell 19.11 (2007): 3327–3338.
[57] B. Eriksson, P. Barford and R. Nowak, Network discovery from passive measurements, ACM SIGCOMM, 2008.
[58] B. Eriksson, P. Barford, J. Sommers and R. Nowak, DomainImpute: inferring unseen components in the Internet, IEEE INFOCOM, 2011.
[59] J. Mairal, F. Bach, J. Ponce and G. Sapiro, Online dictionary learning for sparse coding, International Conference on Machine Learning, 2009.
[60] National Research Council, The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet, National Academies Press, 2007.
[61] J. W. Chinneck, Feasibility and Infeasibility in Optimization: Algorithms and Computational Methods, Springer, 2008.