Faster Maximum Feasible Subsystem Solutions for Dense Constraint Matrices
Fereshteh Fakhar Firouzeh, John W. Chinneck, Sreeraman Rajan
Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada
Abstract
Finding the largest cardinality feasible subset of an infeasible set of linear constraints is the Maximum Feasible Subsystem problem (MAX FS). Solving this problem is crucial in a wide range of applications such as machine learning and compressive sensing. Although MAX FS is NP-hard, useful heuristic algorithms exist, but these can be slow for large problems. We extend the existing heuristics for the case of dense constraint matrices to greatly increase their speed while preserving or improving solution quality. We test the extended algorithms on two applications that have dense constraint matrices: binary classification, and sparse recovery in compressive sensing. In both cases, speed is greatly increased with no loss of accuracy.
Keywords:
Maximum Feasible Subsystem, Linear Programming, Classification, Compressive Sensing
1. Introduction
Finding the maximum cardinality feasible subsystem of an infeasible set of linear constraints is known as the Maximum Feasible Subsystem problem (MAX FS) [1]. Its complement is the Minimum Unsatisfied Linear Relation problem (MIN ULR) [2] of finding the minimum number of constraints to remove from an infeasible set such that the remaining constraints have a feasible solution. Moreover, all infeasible systems have at least one Irreducible Infeasible Subset (IIS) in their original set of constraints. An IIS is a set of constraints that is infeasible, but is rendered feasible if any constraint is removed. If all of the IISs in the model are enumerated, then a MIN ULR solution is found as a minimum cover of the IISs; this is known as the minimum IIS cover problem [3]. In this paper, MAX FS, MIN ULR, and MIN IIS COVER are the same problem and the terms will be used interchangeably.

MAX FS is NP-hard [4, 5, 6], so exact solution algorithms are only available for relatively small instances, but solutions are still needed in a wide variety of applications including machine learning [7], misclassification minimization [8], training of neural networks [2], telecommunications [9], computational biology [10], and compressive sensing (CS) [11]. To be useful in practice, MAX FS heuristics should obtain high quality solutions (large feasible subsets, or equivalently, small MIN ULR subsets) quickly.

The MAX FS problem can be formulated for exact solution in various ways, including as a mixed-integer linear program [12], a linear program with equilibrium constraints (LPEC) [13, 14, 15, 16], or as a type of set-covering problem [17, 18, 19, 20]. As shown in chapter 7 of [12], there are practical difficulties in using any of these exact methods to solve this NP-hard problem (i.e. they can be used only for very small MAX FS problems). For this reason, heuristic solutions are required.

Chinneck [3, 21] developed a set of effective linear programming (LP)-based heuristics for solving the MAX FS problem, and showed experimentally that they give better results than methods such as MISMIN, a state-of-the-art LPEC solver at the time [21]. Variants of Chinneck’s methods have been developed more recently for finding sparse solutions to underdetermined systems of linear equations in unbounded variables [11] for sparse recovery in compressive sensing.

This paper extends Chinneck’s algorithms to provide major improvements in solution speed for the case of dense constraint matrices, based on the observation that dense constraint matrices provide multiple similar candidates for removal during the MAX FS solution process. The new extensions are evaluated on the following two applications that have dense constraint matrices: binary classification, and sparse recovery in compressive sensing.

Chinneck’s algorithms remove constraints from the original set one by one until what remains is the heuristic MAX FS solution; the removed constraints constitute the heuristic MIN ULR solution. An inner loop assesses candidate constraints, and an outer loop removes the single constraint chosen by the inner loop. Two extensions that greatly improve solution speed when a dense constraint matrix provides multiple similar candidates are proposed in this paper. Extension 1 amalgamates the two loops into a single one that both assesses the candidates and removes multiple constraints at each iteration. Extension 2 provides an early exit under certain conditions. When tested on binary classification and compressive sensing sparse recovery, the extended heuristics greatly reduce solution time while preserving or improving solution quality.

The paper is structured as follows. Section 2 reviews the existing MAX FS solution heuristics. The new extensions are developed in Section 3. The experimental results for binary classification are given in Section 4 and for compressive sensing sparse recovery in Section 5. Section 6 concludes the paper and outlines future work.

∗ Corresponding author. Email address: [email protected] (Fereshteh Fakhar Firouzeh)
Preprint submitted to Journal of Computers & Operations Research, February 12, 2021.
2. LP-based MAX FS Solution Heuristics
Chinneck’s original polynomial time heuristics [3] are briefly summarized here. All variants first elasticize the infeasible set of linear row constraints (and possibly variable bounds) by adding nonnegative elastic variables as shown in Table 1. The elastic variables measure the extent to which each constraint is violated, so a linear program (LP) minimizing their sum finds the minimum total violation for an infeasible set of constraints and bounds. There are two types of elastic models: the standard elastic LP elasticizes only the row constraints, and the full elastic LP elasticizes both the row constraints and the column bounds. The objective for a full elastic LP is min Z = Σ_i (e_i^+ + e_i^−) + Σ_j (e_j^+ + e_j^−) for constraints i = 1...m and bounded variables j = 1...n; the second term is omitted for a standard elastic LP. Z = 0 indicates a feasible system.

Table 1: Elasticizing constraints by adding non-negative elastic variables.

  Type        Nonelastic            Standard elastic                   Full elastic
  Row cons.   Σ_j a_ij x_j ≥ b_i    Σ_j a_ij x_j + e_i ≥ b_i           Σ_j a_ij x_j + e_i ≥ b_i
              Σ_j a_ij x_j ≤ b_i    Σ_j a_ij x_j − e_i ≤ b_i           Σ_j a_ij x_j − e_i ≤ b_i
              Σ_j a_ij x_j = b_i    Σ_j a_ij x_j + e_i^+ − e_i^− = b_i Σ_j a_ij x_j + e_i^+ − e_i^− = b_i
  Var. bnds   x_j ≥ l_j             x_j ≥ l_j                          x_j + e_j^+ ≥ l_j
              x_j ≤ u_j             x_j ≤ u_j                          x_j − e_j^− ≤ u_j

As shown in Fig. 1, the General MAX FS algorithm of [3] has an inner loop (line 10) and an outer loop (line 5). The inner loop tests each member of a set of candidate constraints in CandidateSet by examining their effect on Z when temporarily removed from the model. The candidate constraint that most reduces Z when removed is selected as the Winner candidate, and is subsequently removed from the model and added to
MINULR in the outer loop. This process continues until Z = 0 (feasibility is reached). The constraints remaining in the model constitute the heuristic MAX FS solution; the removed constraints in MINULR constitute the heuristic MIN ULR solution. Algorithm variants differ in how the list of candidate constraints is constructed.

The heuristic relies on Observation 3 from [3], which notes that a constraint that appears in more than one IIS will reduce Z more than a constraint that appears in a single IIS because it will eliminate multiple IISs upon removal. This means that a large drop in Z when a constraint is removed is a useful sign that the constraint is part of the MIN ULR set. The larger the drop in Z, the more IISs the constraint likely covers.

There are efficiency enhancements in terms of algorithm processing time. If the CandidateSet has a single element (line 6), then that element is added to MINULR and the algorithm exits: there is no need to solve the LP. As the Winner constraint is updated, so is the list of candidates for the next round, NextCandidateSet. All versions of the algorithm solve multiple LPs, but each LP is identical to the last one except for two constraints, so simplex advanced starts speed the process considerably.

Algorithms 1 to 3 of the base algorithm differ in how the list of candidate constraints is constructed, as described next.
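To make the loop structure concrete, the following is a minimal sketch of elasticization and the outer/inner-loop heuristic on a toy infeasible system. It is illustrative only: it uses SciPy’s linprog in place of the simplex solver with advanced starts used in the paper, and the three one-variable constraints are made up for the example.

```python
# Sketch of the general LP-based MAX FS heuristic on a tiny infeasible
# system: an outer loop removes the Winner constraint, an inner loop
# tests each candidate by temporary removal.  Assumes SciPy is available.
import numpy as np
from scipy.optimize import linprog

# Infeasible system in one variable x:  x <= 1,  x >= 3,  x <= 5.
# Rows stored as (coefficient a, sense, rhs b).
rows = [(1.0, "<=", 1.0), (1.0, ">=", 3.0), (1.0, "<=", 5.0)]

def elastic_Z(active):
    """Minimum total violation Z of the active rows (elastic LP)."""
    m = len(active)
    c = np.r_[0.0, np.ones(m)]          # variables: [x, e_1..e_m]
    A_ub, b_ub = [], []
    for k, i in enumerate(active):
        a, sense, b = rows[i]
        e = np.zeros(m); e[k] = 1.0
        if sense == "<=":               # a*x - e_k <= b
            A_ub.append(np.r_[a, -e]); b_ub.append(b)
        else:                           # a*x + e_k >= b  ->  -a*x - e_k <= -b
            A_ub.append(np.r_[-a, -e]); b_ub.append(-b)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] + [(0, None)] * m)
    return res.fun

minulr, active = [], list(range(len(rows)))
while elastic_Z(active) > 1e-9:         # outer loop: repeat until feasible
    # inner loop: test each candidate by temporary removal, keep the Winner
    winner = min(active, key=lambda i: elastic_Z([j for j in active if j != i]))
    active.remove(winner)               # permanent removal of the Winner
    minulr.append(winner)

print(minulr)   # one of the two conflicting rows
```

Removing either of the two conflicting constraints (x ≤ 1 or x ≥ 3) drives Z to zero, so the heuristic MIN ULR set has a single element and the remaining two constraints form the heuristic MAX FS solution.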
Candidate constraints are those to which the elastic objective function is sensitive (nonzero dual price). Constraints not in this list do not affect Z when dropped, so they are not candidates (Observation 4 from [3]). This is the longest possible candidate list, but its length can be limited as follows. Sort the constraints in descending order by their absolute dual cost and take only the first k elements of the sorted list. This is referred to as Algorithm 1(k). For clarity in the sequel, unlimited lists are indicated by k = ∞.

Algorithm 2 uses a different criterion for constructing the CandidateSet based on Observation 5 in [21]: a good predictor of the magnitude of the drop in Z obtained by deleting a violated constraint j is given by Eqn. (1):

    product: e_j × |constraint j dual price|    (1)

For equality constraints, the larger of the two elastic variables is used in Eqn. (1). Algorithm 2 can also limit the length of the candidate list. First sort the candidate constraints in descending order by product magnitude, and then add only the top k constraints to CandidateSet; this is Algorithm 2(k). List length k = 1 is particularly useful in some applications, where it is referred to as the maximum product algorithm.

Figure 1: General MAX FS Algorithm [3]
 1: MINULR ← ∅
 2: Set up elastic LP.
 3: Solve elastic LP.
 4: Construct CandidateSet.
 5: do [outer loop]:
 6:   if |CandidateSet| = 1 then
 7:     Add the single candidate to MINULR and exit.
 8:   end if
 9:   WinnerZ ← ∞.
10:   for each candidate j in CandidateSet do [inner loop]:
11:     Delete candidate j.
12:     Solve elastic LP.
13:     if Z = 0 then
14:       Add candidate j to MINULR and exit.
15:     end if
16:     if Z < WinnerZ then
17:       Winner ← candidate j.
18:       WinnerZ ← Z.
19:       Construct NextCandidateSet.
20:     end if
21:     Reinstate candidate j.
22:   end for
23:   Add Winner to MINULR.
24:   Delete Winner permanently.
25:   CandidateSet ← NextCandidateSet.
26: end do
OUTPUT: MINULR is a heuristic MIN ULR solution.

Algorithm 3 uses
Observation 6 [21]: for satisfied constraints, a good predictor of the relative magnitude of any drop in Z that may be obtained by deleting the constraint is given by |constraint dual price|. Algorithm 3 uses 2 lists: (i) for violated constraints, the list of product magnitudes (Eqn. (1)), and for satisfied constraints (ii) the list of absolute constraint sensitivities. Both lists are independently sorted in descending order, and the top k elements are taken from each list, resulting in 2k candidates in total. This is referred to as Algorithm 3(k).

A sparse solution is one having very few nonzeros. Algorithms 1-3 can be used to find sparse solutions to general systems of linear constraints as follows. Given the system Ax {≤, =, ≥} b, add the variable zeroing constraints x = 0. Now apply any of Algorithms 1-3 with candidates chosen only from among the constraints x = 0. The MAX FS solution provides the sparsest solution by allowing the smallest number of nonzero variables. Finding the sparsest solution to an underdetermined system of linear equalities in unbounded variables is of particular interest in a number of applications. We focus on this problem here.

Chinneck’s algorithms have been adapted for underdetermined systems of linear equations in unbounded variables. Jokar and Pfetsch [22] modified Chinneck’s algorithm as follows. Given the m × n linear system Ax = b in unbounded variables, all variables x_j are first replaced by u_j − v_j where both u_j and v_j are nonnegative, and the following LP is constructed:

    min Z = Σ_j (u_j + v_j)  s.t.  A(u − v) = b,  u ≥ 0,  v ≥ 0    (2)

This LP has m constraints in 2n nonnegative variables. The objective function attempts to drive all variables to zero in a manner similar to driving elastic variables to zero in Algorithms 1-3.

The basic algorithm follows the logic in Fig. 1, except that candidates are variables and not constraints. Additional differences from Fig.
1 are:

  • Candidate variables are those not in the MIN ULR set and having a nonzero value in the current LP solution. Candidates are sorted in decreasing order by the magnitude of |u_j − v_j|.

  • Candidate variables are tested in the inner loop by temporarily removing them from the objective function by setting the objective coefficients of the associated u_j, v_j pair to zero and re-solving the LP.

  • A candidate variable x_j is moved into the MIN ULR set by resetting the objective function coefficients of u_j and v_j to zero permanently so that x_j no longer contributes to Z.

  • At exit, the variables in the MIN ULR set are those that can take nonzero values in the sparse solution.

  • Extraneous variables are sometimes included in the MIN ULR solution. A simple post-processing step may remove them. First construct A_mn by eliminating all columns from A that are not in the MIN ULR solution. For each variable in the MIN ULR set in turn, force it to zero and test the feasibility of A_mn x = b; if feasible then the tested variable is permanently removed from the MIN ULR set.

The well-known Basis Pursuit (BP) algorithm [23] solves system (2) a single time. This works well if the solution is very sparse, but when this is not the case, Fig. 1 with the modifications above is more effective.

Several other recent variants have been shown to be very effective for this problem [24, 11]:

  • Method B uses system (2). The objective coefficients of the u_j and v_j associated with variable x_j moved into the MIN ULR set are permanently reset to 0.1 instead of zero [24]. Since Z never reaches 0, exit is instead triggered when |CandidateSet| = 0.

  • Method M combines Method B with
Basis Pursuit (BP) [23], based on the observation that if BP fails, it returns a solution that is typically of size m or nearly so. Thus, the BP result (the set of nonzeros in the first solution of (2)) is taken when the number of nonzeros is less than m − 3. Method B is applied when the number of nonzeros in that first solution is larger than m − 3.

  • Method C has explicit elastic variable zeroing constraints, resulting in the following LP:

        min Z = Σ_j (e_j^+ + e_j^−)
        s.t.  Ax = b
              x_j + e_j^+ − e_j^− = 0,  j = 1...n
              e_j^+, e_j^− ≥ 0    (3)

    where e_j^+ and e_j^− are nonnegative, A is m × n and the identity matrix in the zeroing rows is n × n. This LP has m + n constraints in 3n variables. There are two lists of candidates: (1) a list based on the magnitude of nonzero variables not yet assigned to the MIN ULR set, and (2) a list based on the dual prices of the variable zeroing constraints x_j + e_j^+ − e_j^− = 0. The selected variable is added to the MIN ULR set by zeroing the objective coefficients of the corresponding elastic variables.
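As a concrete illustration of system (2), the following sketch recovers a 1-sparse vector from three measurements with a single Basis Pursuit solve, i.e. by minimizing Σ_j (u_j + v_j) subject to A(u − v) = b. The 3 × 4 matrix (rows of a Hadamard matrix) is an assumed toy example, and SciPy’s linprog stands in for the LP solvers used in the paper.

```python
# Minimal sketch of LP (2): split x into nonnegative u and v and minimize
# sum(u + v), the l1 norm of x = u - v, subject to A(u - v) = b.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1,  1,  1,  1],
              [1, -1,  1, -1],
              [1,  1, -1, -1]], dtype=float)   # toy 3x4 sensing matrix
x_true = np.array([0.0, 0.0, 5.0, 0.0])        # sparse signal to recover
b = A @ x_true                                 # compressed measurement

m, n = A.shape
c = np.ones(2 * n)                             # minimize sum(u) + sum(v)
A_eq = np.hstack([A, -A])                      # A u - A v = b
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))

u, v = res.x[:n], res.x[n:]
x_rec = u - v
print(np.round(x_rec, 6))                      # recovers the sparse x_true
```

For this instance the l1 minimizer is unique, so a single BP solve suffices; when BP fails (the solution has close to m nonzeros), the iterative loop of Fig. 1 with the modifications above takes over.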
3. Extensions to Increase Solution Speed
The existing algorithms are effective, but there are opportunities for improving their speed. Limiting the length of the candidate list is helpful, but there are still two bottlenecks: evaluating multiple candidates in the inner loop, and removing just a single constraint in the outer loop. New Extension 1 eliminates the inner loop entirely and removes multiple candidate constraints simultaneously in the single remaining loop. The trick is in identifying which constraints to remove in each iteration. New Extension 2 provides an early exit under certain conditions.

Each of the extensions can be combined with any of the existing list construction algorithms to create new combinations, e.g. combining Algorithm 2(∞) and Extension 1 creates Algorithm 2(∞)E1.

There are three ways to assemble the candidate list in Algorithms 1-3. Regardless of the type of list, previous research shows that the order in the sorted list of candidates is important: those earlier in the list are much more likely to be part of the MIN ULR than those later in the list. Thus, if we remove multiple constraints at each iteration of a single loop, it makes sense to select them from the start of the sorted list. This argument is especially potent when the constraint matrix is dense. In a dense constraint matrix, all constraints involve all of the variables, so all of the constraints are similar in structure. A consequence is that many of the constraints have similar impacts on Z. For example, in the classification application, where the constraints are derived from the data points, data points that are close together generate similar constraints. These similar constraints have similar ranking scores in the candidate list, and hence have similar effects on Z when removed from the LP, leading to this Observation: When candidates have similar ranking measures as the top-ranked element in the candidate list, they are likely to appear in the final MIN ULR set.

This observation allows both the elimination of the inner loop to test individual candidate constraints and the simultaneous removal of multiple candidates in the single remaining loop. The question is how to determine which ranking scores are “similar” to the top score. We sort the candidates by descending ranking score and apply standard methods for detecting abrupt changes in the mean of subsets of the set [25]. We select for removal all candidates up to the first identified abrupt change.
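The sketch below shows one simple stand-in for the abrupt-change test (the experiments in this paper use Matlab’s ischange, which detects changes in the mean): choose the single split point of the descending score list that minimizes the total within-segment squared deviation, and select every candidate before it. The score values are invented for illustration.

```python
# Stand-in for the abrupt-change test of Extension 1: find the best
# single mean-shift point in a list of ranking scores sorted in
# descending order, then keep the block of similar top candidates.
import numpy as np

def first_abrupt_change(scores):
    """Index of the best single mean-shift split of a sorted score list."""
    s = np.asarray(scores, dtype=float)
    best_k, best_cost = 1, np.inf
    for k in range(1, len(s)):
        left, right = s[:k], s[k:]
        # total squared deviation of each segment from its own mean
        cost = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Five similar top scores, then a sharp drop: all five are removed at once.
scores = [9.1, 9.0, 8.8, 8.7, 8.7, 2.1, 1.9, 1.8, 0.4, 0.3]
p = first_abrupt_change(scores)
print(p, scores[:p])   # 5 and the top block of similar candidates
```

A single pass like this replaces the entire inner loop: the p top-ranked candidates are removed together, at the cost of one LP solve per outer iteration instead of one per candidate.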
Extension 2 provides an early exit. If the number of candidates is less than or equal to ℓ, then permanently remove all candidates and exit the algorithm. This is designated E2(ℓ).
4. Application: Binary Classification
Amaldi [2], Parker [17], Chinneck [26, 27, 21], and Silva [28] have shown how the binary classification problem can be transformed into a MAX FS problem:
Given a training set of I data points (i = 1...I) in J dimensions (j = 1...J) where the class of each point is known (Type 0 or 1). Define a constraint for each data point:

  • For points of Type 0: Σ_j d_ij w_j ≤ w_0 − ε
  • For points of Type 1: Σ_j d_ij w_j ≥ w_0 + ε

where d_ij indicates the value of attribute j for point i. The d_ij are known constants, ε is a small positive constant, and the variables w_j are unrestricted.

Most datasets cannot be completely separated by a single hyperplane, so the initial system is infeasible. After the MAX FS solution, the final LP is feasible, and its solution returns the separating hyperplane Σ_j d_ij w_j = w_0 which correctly classifies all of the data points in the MAX FS subset while those in the MIN ULR subset are incorrectly classified. This is heuristically the most accurate hyperplane.

Prior work shows that Algorithm 2 provides a major speed-up over Algorithm 1 for binary classification [21], with a tiny loss in average accuracy. Violated constraints correspond to misclassified points, so the product ranking score (Eqn. 1) used in Algorithm 2 estimates the drop in Z relatively accurately. A standard elastic program is used because the variables are unrestricted. Since all attribute values are specified for all data points, the constraint matrix, D, is dense; hence, Extension 1 can be applied. The revised algorithm is designated as Algorithm 2E1, and is summarized in Fig. 2. To illustrate the workings of Extension 1, Fig. 3 shows the first sorted candidate list for the “Ozone Level Detection” dataset. The first sharp drop-off occurs after the fifth candidate, so the first five constraints/points are removed in this case.

We compare Algorithm 2E1 with two existing algorithms: (i) original Algorithm 2(∞) and (ii) original Algorithm 2(1). Both have been shown to provide high accuracies in binary classification [21]. Accuracy is used to evaluate the quality of the solutions:
    Accuracy = (total correctly classified instances) / (total instances)    (4)

For comparison with previous results [21], the training set is identical to the entire dataset.

Figure 2:
Algorithm 2E1
MINULR ← ∅
Set up the standard elastic LP.
do:
  Solve LP.
  if Z = 0 then Exit. end if
  Construct CandidateSet using Eqn. (1).
  if |CandidateSet| = 1 then Add the single candidate to MINULR and exit. end if
  Sort candidates from largest to smallest product.
  Select top p candidates for removal based on abrupt change in product score.
  Add the p selected candidates to MINULR and remove them from the LP.
end do
OUTPUT: MINULR is a heuristic MIN ULR solution.
Figure 3: First sorted candidate list for the “Ozone Level Detection” dataset (plot of product magnitude vs. sorted candidates; figure omitted).

4.4. Data Sets

Twenty binary classification problems are derived from datasets in the UCI Repository of Machine Learning Databases [29]. Table 2 summarizes their characteristics.
Table 2: Classification Datasets
Dataset Instances Features
Breast Cancer Wisconsin Original          683     9
Cardiotocography (Type 1 vs. others)     2126    34
Cardiotocography (Type 2 vs. others)     2126    34
Car Evaluation (Type 1 vs. others)       1728     6
Car Evaluation (Type 2 vs. others)       1728     6
Car Evaluation (Type 3 vs. others)       1728     6
Car Evaluation (Type 4 vs. others)       1728     6
Climate Model Simulation Crashes          540    18
Credit Approval                           653    15
Diabetic Retinopathy Debrecen            1151    19
Glass Identification                      214     9
Heart Disease                             297    13
Hepatitis Domain                          112    18
Ionosphere                                351    33
BUPA Liver Disorders                      341     6
Ozone Level Detection                    1848   102
Parkinson’s Data Set                      195    22
Pima Indians Diabetes                     768     8
Thyroid Gland Data                        215     5
Vowel Recognition Data                    990    11
All algorithms are implemented in Matlab version 2020. The linear programming solver is MOSEK via the MOSEK Optimization Toolbox for Matlab version 8 . . . 56 [30]. Abrupt changes in the ranking scores in the candidate list are identified using the Matlab ischange() function. The computations are carried out on a 3.40 GHz Intel Core i machine with . GB RAM, running Windows 10.
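The sketch below illustrates how the per-point constraints of this section are elasticized and solved. It is a toy stand-in (SciPy’s linprog on a made-up 1-D data set) rather than the Matlab/MOSEK implementation used in the experiments. A deliberately placed Type 0 outlier makes the full system infeasible (Z > 0); dropping that single point, as the MAX FS heuristic would, restores Z = 0.

```python
# Standard elastic LP for the binary classification constraints:
#   Type 0:  d_i*w <= w0 - eps      Type 1:  d_i*w >= w0 + eps
# with variables (w, w0, e_1..e_m) and nonnegative elastic variables e.
import numpy as np
from scipy.optimize import linprog

eps = 1.0
# (attribute value d_i, class) -- d = 5.0 is an outlier of Type 0
points = [(0.0, 0), (1.0, 0), (5.0, 0), (4.0, 1), (6.0, 1)]

def min_total_violation(pts):
    """Minimum total violation Z of the elasticized point constraints."""
    m = len(pts)
    c = np.r_[0.0, 0.0, np.ones(m)]        # minimize sum of elastics
    A_ub, b_ub = [], []
    for k, (d, cls) in enumerate(pts):
        e = np.zeros(m); e[k] = 1.0
        if cls == 0:   # d*w <= w0 - eps  ->  d*w - w0 - e_k <= -eps
            A_ub.append(np.r_[d, -1.0, -e])
        else:          # d*w >= w0 + eps  -> -d*w + w0 - e_k <= -eps
            A_ub.append(np.r_[-d, 1.0, -e])
        b_ub.append(-eps)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * 2 + [(0, None)] * m)
    return res.fun

Z_all = min_total_violation(points)                   # > 0: not separable
Z_cut = min_total_violation(points[:2] + points[3:])  # outlier removed
print(Z_all, Z_cut)
```

With the outlier removed, the final LP is feasible and its solution gives a separating hyperplane w·d = w0 that classifies the remaining four points correctly.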
The results in this section show the higher accuracy and greater speed of Algorithm 2E1 vs. original Algorithms 2(∞) and 2(1) for binary classification. Table 3 reports algorithm accuracies with the best results bolded. Table 4 shows the difference from the best accuracy on each data set. The last three rows give additional metrics: the largest difference from the best accuracy, the average difference from the best accuracy, and the number of best accuracies obtained by each algorithm.

Algorithm 2E1 provides better total accuracy. It has the smallest average difference from best accuracy, and provides the best accuracy for 12 of the 20 data sets while it was 8 for the two existing algorithms. The largest difference from the best accuracy for Algorithm 2E1 is −1.38, which is smaller than those for the other two (−4.28 for both).
Table 3: Algorithm Accuracies for Classification Datasets.
Dataset                                 Accuracy: Algorithm 2(∞)   Algorithm 2(1)   Algorithm 2E1
Breast Cancer Wisconsin Original          98.10   98.10
Cardiotocography (Type 1 vs. others)      99.72   99.72
Cardiotocography (Type 2 vs. others)      99.67   99.58
Car Evaluation (Type 1 vs. others)        78.47   78.47
Car Evaluation (Type 2 vs. others)
Car Evaluation (Type 3 vs. others)        87.79   88.42
Car Evaluation (Type 4 vs. others)
Hepatitis Domain                          92.86
Ionosphere                                98.01
Parkinsons Data Set                       94.36   94.87
Pima Indians Diabetes
Vowel Recognition Data                    98.38   98.38
Table 4: Differences from Best Accuracy.

Dataset                                 Algorithm 2(∞)   Algorithm 2(1)   Algorithm 2E1
Breast Cancer Wisconsin Original          -0.29   -0.29
Cardiotocography (Type 1 vs. others)      -0.14   -0.14
Cardiotocography (Type 2 vs. others)      -0.14   -0.23
Car Evaluation (Type 1 vs. others)        -4.28   -4.28
Car Evaluation (Type 2 vs. others)
Car Evaluation (Type 3 vs. others)        -1.22   -0.59
Car Evaluation (Type 4 vs. others)        -0.06   -0.06
Climate Model Simulation Crashes          -0.19   -0.18
Credit Approval                           -1.37   -1.37
Diabetic Retinopathy Debrecen             -0.35   -0.79
Glass Identification                      -0.45   -1.38
Heart Disease Cleveland
Hepatitis Domain                          -3.57
Ionosphere                                -0.28
BUPA Liver Disorders                      -0.29   -0.88
Ozone Level Detection                     -0.33   -0.06
Parkinsons Data Set                       -2.05   -1.54
Pima Indians Diabetes                     -0.39   -0.39
Thyroid Gland Data
Vowel Recognition Data                    -0.1    -0.1
Largest Difference                        -4.28   -4.28   -1.38
Average Difference                        -0.66   -0.46   -0.27
No. of bests                               8       8      12

Table 5 compares the number of linear programs solved (LPs) and the processing time in seconds (sec) for each algorithm, with best results in boldface. Algorithm 2E1 requires the fewest LP solutions and the least processing time in all cases. Table 6 reports the percentage reduction in number of LPs and processing time for Algorithm 2E1 vs. the comparators for each dataset. For example, the processing times for Algorithms 2(∞) and 2E1 are 3.1 and 0.3 seconds respectively for the first dataset in Table 5. Therefore, Algorithm 2E1 provides a 90.32% reduction in processing time when compared to Algorithm 2(∞) for the first dataset in Table 6. The last two rows of Table 6 show the maximum and average speed improvements obtained by Algorithm 2E1 when compared with the other two algorithms. Algorithm 2E1 reduces the average number of LPs by about 96.56% and 67.90% when compared to Algorithms 2(∞) and 2(1), respectively, and also reduces the solution time by 94.77% and 71.31% when compared to Algorithm 2(∞) and Algorithm 2(1) respectively.

Algorithm 2E1 thus reduces solution time by an average 70-90% when compared to the existing algorithms, while also marginally improving the total accuracy on average.
5. Application: Sparse Recovery in Compressive Sensing
Table 5: Algorithm speeds (number of LPs, seconds) for binary classification.

Dataset                                 Algorithm 2(∞)   Algorithm 2(1)   Algorithm 2E1
                                          LPs     Sec      LPs    Sec      LPs    Sec
Breast Cancer Wisconsin Original           100    3.1        12   0.7        6    0.3
Cardiotocography (Type 1 vs. others)        35    4.3         4   0.6
Cardiotocography (Type 2 vs. others)        67    5.1         4   0.6
Car Evaluation (Type 1 vs. others)        2583  350.6       372  28.1
Car Evaluation (Type 2 vs. others)         462   28.7        69   4.5
Car Evaluation (Type 3 vs. others)        1458  183.2       203  13.8       10    0.9
Car Evaluation (Type 4 vs. others)         205   16.7        33   2.3
Climate Model Simulation Crashes            77    3.2         7   0.3
Credit Approval                           1237   37.1        87   7.3
Diabetic Retinopathy Debrecen             4723  625.2       238  25.6       16    1.4
Glass Identification                       292    3.1        33   1.5
Heart Disease Cleveland                    404    5.3        31   1.0
Hepatitis Domain                            16    0.4         4   0.2
Ionosphere                                 104    4.2         7
BUPA Liver Disorders                       581   14.8        83   2.7       10    0.3
Ozone Level Detection                      542  159.7        14   4.1
Parkinsons Data Set                        103    2.7         7
Pima Indians Diabetes                     1325   41.7       153   6.7       12    0.5
Thyroid Gland Data                          61    1.3        12   0.4
Vowel Recognition Data                     146    4.0        14   0.5

Finding a sparse solution to an underdetermined system of linear equations in unbounded variables is an essential step in compressive sensing (CS) [31]. A common way to compress an input signal vector x in CS is to multiply it by a random matrix A of size m × n where m << n to yield a compressed vector b of size m, which is transmitted or stored. The input vector is assumed to be sparse, so the decompression or recovery process tries to find a sparse solution given only A and the compressed vector b. The CS sparse recovery problem is to find a sparse solution y to the underdetermined system Ay = b, where y is of size n × 1. The random matrix A is dense, and thus the recovery process becomes more suitable for the extended algorithms introduced here. This section examines whether the new algorithm extensions can increase the speed of the MAX FS based algorithms for the sparse recovery problem without loss of solution quality.

Methods B, C, and M (Section 2.4) have been applied to CS sparse recovery and are observed to provide improvements over the state of the art in certain scenarios. Previous work [11] showed Method M to be one of the better methods for CS sparse recovery. So we combine it with our two extensions to create Method ME1E2(ℓ). The steps are summarized in Fig. 4.

Method ME1E2(ℓ) assumes BP failure if the length of the first candidate list is greater than the defined threshold ℓ. We set ℓ to m −
3, as done for Method M in [11].

Table 6: Percentage reductions in LPs solved and solution times by using Algorithm 2E1 instead of original Algorithms 2(∞) and 2(1) for classification.

Dataset                                 vs. Algorithm 2(∞)   vs. Algorithm 2(1)
                                          LPs      Sec        LPs      Sec
Breast Cancer Wisconsin Original          94       90.32      50       57.14
Cardiotocography (Type 1 vs. others)      94.29    93.02      50       50
Cardiotocography (Type 2 vs. others)      97.01    94.12      50       50
Car Evaluation (Type 1 vs. others)        99.69    99.80      97.85    97.51
Car Evaluation (Type 2 vs. others)        99.78    99.65      98.55    97.78
Car Evaluation (Type 3 vs. others)        99.31    99.51      95.07    93.48
Car Evaluation (Type 4 vs. others)        97.56    97.60      84.85    82.61
Climate Model Simulation Crashes          93.51    93.75      28.57    33.33
Credit Approval                           99.92    99.73      98.85    98.63
Diabetic Retinopathy Debrecen             99.66    99.78      93.28    94.53
Glass Identification                      97.60    90.32      78.79    80
Heart Disease Cleveland                   98.51    96.23      80.64    80
Hepatitis Domain                          87.5     75         50       50
Ionosphere                                95.19    97.62      28.57    66.67
BUPA Liver Disorders                      98.28    97.97      87.95    88.89
Ozone Level Detection                     98.71    98.68      50       48.78
Parkinsons Data Set                       96.12    96.30      42.86    74.36
Pima Indians Diabetes                     99.09    98.80      92.16    92.54
Thyroid Gland Data                        90.16    84.61      50       50
Vowel Recognition Data                    95.20    92.5       50       40
Maximum Percent Reduction                 99.92    99.80      98.85    98.63
Average Percent Reduction                 96.56    94.77      67.90    71.31
Method ME1E2(ℓ) is compared with the original Methods B(2), C(2), and M(2). We choose list length k = 2 to provide fast solutions with good solution quality (k is typically 1 − ...).

In sparse recovery, if the number of nonzeros in recovered signal y equals the number of nonzeros in the input signal x, then y and x are usually identical. For this reason our main metric relates to the sparsity (number of nonzeros) of y vs. x. The critical sparsity is the largest number of nonzeros in the input vector x for which the sparsity of y reliably equals the sparsity of x. Higher critical sparsity means that input vectors that have been more compressed, or that have more nonzeros, can still be reliably recovered. We use the critical sparsity as the evaluation metric: higher is better.

Figure 4: Algorithm ME1E2(ℓ)
MINULR ← ∅
Set up the sparse recovery LP.
do:
  Solve LP.
  Construct CandidateSet: variables x_j not in MINULR that have nonzero |u_j − v_j|.
  if |CandidateSet| = ∅ then exit.
  if first iteration and |CandidateSet| ≤ ℓ then Add all candidates to MINULR and exit. end if
  Sort candidates from largest to smallest |u_j − v_j|.
  Select top p candidates for removal based on first abrupt change in |u_j − v_j|.
  Add the p selected candidates to MINULR and set the objective coefficients of the associated u_j, v_j pairs to 0.1.
end do
OUTPUT: MINULR is a small number of nonzeros forming a support for the system of equations.
We construct examples for which the solution is known. The system is Ax = b with A of size 128 × 256, x of size 256 × 1, and b of size 128 × 1, providing a compression ratio (CR) = (1 − m/n) × 100% = 50%. A is a random matrix with values ranging from -10 to 10, and hence dense. The constructed input vector x has various numbers of nonzeros ranging from 12 to 96, randomly distributed among its elements, and with values randomly drawn from a normal distribution with mean 0 and variance 1. See Section 4.5.
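The instance construction above can be sketched as follows (assuming NumPy; the seed and the choice S = 52 are arbitrary):

```python
# Test-instance construction: a dense random 128x256 matrix A with
# entries in [-10, 10], an input vector x with S nonzeros drawn from
# N(0, 1) at random positions, and the compressed vector b = A x.
import numpy as np

rng = np.random.default_rng(0)
m, n, S = 128, 256, 52

A = rng.uniform(-10.0, 10.0, size=(m, n))    # dense sensing matrix
x = np.zeros(n)
support = rng.choice(n, size=S, replace=False)
x[support] = rng.standard_normal(S)          # S nonzeros, N(0, 1)
b = A @ x                                    # compressed measurement

cr = (1 - m / n) * 100
print(A.shape, int((x != 0).sum()), cr)      # (128, 256) 52 50.0
```

Recovery is then judged by comparing the sparsity of the recovered y against the known sparsity S of x, which is how the critical sparsity in Table 7 is measured.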
Table 7 compares Method ME1E2(ℓ) with existing methods, namely, Methods B(2), C(2), and M(2). Results in the table are averages over 100 instances at each sparsity level S (actual number of nonzeros in the solution). Estimated sparsity (T) is the average number of nonzeros returned by each method at each value of S over 100 instances, with the number of correct results shown in brackets. Results that are accurate in all 100 cases are shown in boldface. The critical sparsity is the largest number shown in boldface for each method. Extended Method ME1E2(ℓ) has the same critical sparsity as the other methods.

Table 8 shows the number of LPs solved and the processing time for Method ME1E2(ℓ) and the comparators. The smallest number of LPs and time in seconds are bolded. Method ME1E2(ℓ) requires the fewest LP solutions at all values of S. It is significantly faster than the other algorithms, as shown in Table 9, which reports the percentage reduction in LP solutions required and processing times for Method ME1E2(ℓ) when compared to others. The last two rows of the table show the maximum and the average percent improvement. At S = 52, Method ME1E2(ℓ) reduces the solution time by 96 . . . 83% compared to Methods C(2), B(2), and M(2), respectively. Method ME1E2(ℓ) does not run faster than Method M(2) when the input is very sparse (up to S = 36), but for S > 36 it reduces solution time when compared to M(2) by 62.32% on average. Method ME1E2(ℓ) provides a 98 . . . 61% reduction in LPs solved when compared to Methods C(2), B(2), and M(2), respectively. It reduces the average solution time by 95 . . . 32% when compared to Methods C(2), B(2), and M(2), respectively.

Method ME1E2(ℓ) reduces solution time significantly with no loss of solution quality (as measured by critical sparsity) for this CS sparse recovery scenario.

Table 7: Average and critical sparsities of Method ME1E2(ℓ) vs. Methods C(2), B(2) and M(2) for random matrices of size m = 128 and n = 256 in CS sparse recovery.

S    Estimated sparsity (T): Method C(2)   Method B(2)   Method M(2)   Method ME1E2(ℓ)
12     12 (100)      (100)      (100)      (100)
16 16 (100) (100) (100) (100)
20 20 (100) (100) (100) (100)
24 24 (100) (100) (100) (100)
28 28 (100) (100) (100) (100)
32 32 (100) (100) (100) (100)
36 36 (100) (100) (100) (100)
40 40 (100) (100) (100) (100)
44 44 (100) (100) (100) (100)
48 48 (100) (100) (100) (100)
52 52 (100) (100) (100) (100) (99) (98) (98) (98) (90) (86) (86) (86) (77) (67) (67) (73) (50) (45) (45) (39) (25) (25) (25) (19) (9) (5) (5) (6) (3) (4) (4) (3) (1) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) able 8: Number of LPs and processing times for Method ME1E2( ℓ ) vs. Methods C(2),B(2) and M(2) for random matrices of size m = 128 and n = 256 in CS sparse recovery. S Method C(2) Method B(2) Method M(2) Method ME1E2( ℓ )LPs Sec LPs Sec LPs Sec LPs Sec12
48 1.44 23 0.84
64 2.02 31 1.16
493 23.7 245.06 13.91 256.08 14.05
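The two evaluation quantities used in these tables — the critical sparsity from Table 7 and the percent reductions reported in Table 9 — are straightforward to compute. A minimal sketch (function names are ours; the percent-reduction example reuses two of the surviving LP counts from Table 8):

```python
def critical_sparsity(correct_by_s):
    """Largest sparsity level S recovered correctly in all 100 trials.

    correct_by_s: dict mapping S -> number of correct recoveries out of 100.
    """
    exact = [s for s, c in correct_by_s.items() if c == 100]
    return max(exact) if exact else None

def percent_reduction(baseline, new):
    """Percent reduction of `new` relative to `baseline` (Table 9 style)."""
    return 100.0 * (baseline - new) / baseline

# Fragment of Table 7: every method is exact up to S = 52
counts = {44: 100, 48: 100, 52: 100, 56: 98, 60: 86}
print(critical_sparsity(counts))     # → 52
print(percent_reduction(64, 31))     # 64 LPs down to 31 LPs → 51.5625
```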
6. Conclusions
We propose two new extensions to Chinneck's original MAX FS solution algorithms [3, 21] for use with dense constraint matrices. These extensions greatly increase solution speed with no loss of solution quality when tested in two applications: classification, and sparse recovery in compressive sensing:

• Classification: The new Algorithm 2E1 reduces the mean solution time by about 94.77% and 71.31% when compared to Algorithms 2(∞) and 2(1), respectively, while slightly improving accuracy on average. The better classification accuracies reflect the larger feasible subsets found by the algorithm relative to its predecessors.

• Sparse recovery in compressive sensing: The new Method ME1E2(ℓ) reduces processing time by 95…86% and 62.32% on average when compared with Methods C, B and M, with no reduction in the critical sparsity.

We expect that these results will generalize to other applications that have dense constraint matrices. We plan to investigate the use of the extended algorithms in the recovery of compressively sensed biomedical signals and in dictionary learning. We also intend to extend the basic ideas to less dense constraint matrices, including those found in general LPs.

Table 9: Percent reduction in the number of LPs and processing times for Method ME1E2(ℓ) vs. Methods C(2), B(2) and M(2) for random matrices of size m = 128 and n = 256 in CS sparse recovery.

S         ME1E2(ℓ) vs. C(2)    ME1E2(ℓ) vs. B(2)    ME1E2(ℓ) vs. M(2)
          LPs     Sec          LPs     Sec          LPs     Sec
12        …       …            …       …            …       …
…
Maximum   …       …            …       …            …       …
Average   …       …            …       …            …       …
References

[1] E. Amaldi, M. E. Pfetsch, L. E. Trotter, Some structural and algorithmic properties of the maximum feasible subsystem problem, in: International Conference on Integer Programming and Combinatorial Optimization, 1999, pp. 45–59.

[2] E. Amaldi, From finding maximum feasible subsystems of linear systems to feedforward neural network design, PhD dissertation, Swiss Federal Institute of Technology at Lausanne (EPFL), 1994.

[3] J. W. Chinneck, An effective polynomial-time heuristic for the minimum-cardinality IIS set-covering problem, Annals of Mathematics and Artificial Intelligence 17 (1996) 127–144.

[4] N. Chakravarti, Some results concerning post-infeasibility analysis, Oper. Res. 73 (1994) 139–143.

[5] E. Amaldi, V. Kann, The complexity and approximability of finding maximum feasible subsystems of linear relations, Oper. Res. 147 (1994) 181–210.

[6] J. K. Sankaran, A note on resolving infeasibility in linear programs by constraint relaxation, Oper. Res. 13 (1993) 19–20.

[7] K. P. Bennett, E. J. Bredensteiner, A parametric optimization method for machine learning, INFORMS Journal on Computing 9 (1997) 311–318.

[8] O. L. Mangasarian, Misclassification minimization, Journal of Global Optimization 5 (1994) 309–323.

[9] F. Rossi, S. Smriglio, A. Sassano, Models and algorithms for terrestrial digital, Annals of Operations Research 107 (2001) 267–283.

[10] M. Wagner, R. Elber, Large-scale linear programming techniques for the design, Annals of Operations Research 318 (2004) 301–318.

[11] F. F. Firouzeh, J. W. Chinneck, S. Rajan, Maximum feasible subsystem algorithms for recovery of compressively sensed speech, IEEE Access 8 (2020) 82539–82550.

[12] J. W. Chinneck, Feasibility and Infeasibility in Optimization: Algorithms and Computational Methods, volume 118, Springer Science & Business Media, 2007.

[13] E. Amaldi, The maximum feasible subsystem problem and some applications, in: Modelli e Algoritmi per l'Ottimizzazione di Sistemi Complessi, A. Agnetis and G. D. Pillo, eds., Pitagora Editrice, Bologna, Italy (2003) 31–69.

[14] O. L. Mangasarian, Misclassification minimization, Journal of Global Optimization 5 (1994) 309–323.

[15] O. Mangasarian, Machine learning via polyhedral concave minimization, in: Applied Mathematics and Parallel Computing, Springer, 1996, pp. 175–188.

[16] K. P. Bennett, E. J. Bredensteiner, A parametric optimization method for machine learning, INFORMS Journal on Computing 9 (1997) 311–318.

[17] M. R. Parker, A set covering approach to infeasibility analysis of linear programming problems and related issues, Ph.D. thesis, University of Colorado at Denver, Denver, Colorado, 1995.

[18] M. Parker, J. Ryan, Finding the minimum weight IIS cover of an infeasible system of linear inequalities, Annals of Mathematics and Artificial Intelligence 17 (1996) 107–126.

[19] M. E. Pfetsch, The maximum feasible subsystem problem and vertex-facet incidences of polyhedra, Ph.D. thesis, TU Berlin, Berlin, 2003.

[20] M. E. Pfetsch, Branch-and-cut for the maximum feasible subsystem problem, SIAM Journal on Optimization 19 (2008) 21–38.

[21] J. W. Chinneck, Fast heuristics for the maximum feasible subsystem problem, INFORMS Journal on Computing 13 (2001) 210–223.

[22] S. Jokar, M. E. Pfetsch, Exact and approximate sparse solutions of underdetermined linear equations, SIAM Journal on Scientific Computing 31 (2008) 23–44.

[23] S. S. Chen, D. L. Donoho, M. A. Saunders, Atomic decomposition by basis pursuit, SIAM Review 43 (2001) 129–159.

[24] J. W. Chinneck, Sparse solutions of linear systems via maximum feasible subsets, Les Cahiers du GERAD G-2018-104 (2018).

[25] R. Killick, P. Fearnhead, I. A. Eckley, Optimal detection of changepoints with a linear computational cost, Journal of the American Statistical Association 107 (2012) 1590–1598.

[26] J. W. Chinneck, Tailoring classifier hyperplanes to general metrics, in: Operations Research and Cyber-Infrastructure, Springer, 2009, pp. 365–387.

[27] J. W. Chinneck, Integrated classifier hyperplane placement and feature selection, Expert Systems with Applications 39 (2012) 8193–8203.

[28] A. P. D. Silva, Optimization approaches to supervised classification, European Journal of Operational Research 261 (2017) 772–788.

[29] D. Dua, C. Graff, UCI machine learning repository, 2017. URL: http://archive.ics.uci.edu/ml.

[30] Mosek ApS, The MOSEK optimization software, 2018. Last accessed 2021.

[31] E. J. Candès, M. B. Wakin, An introduction to compressive sampling, IEEE Signal Processing Magazine 25 (2008) 21–30.

[32] S. Qaisar, R. M. Bilal, W. Iqbal, M. Naureen, S. Lee, Compressive sensing: From theory to applications, a survey, Journal of Communications and Networks 15 (2013) 443–456.

[33] S. B. Meenakshi, A survey of compressive sensing based greedy pursuit reconstruction algorithms, International Journal of Image, Graphics and Signal Processing 7 (2015) 1–10.

[34] F. Salahdine, N. Kaabouch, H. El Ghazi, A survey on compressive sensing techniques for cognitive radio networks, Physical Communication 20 (2016) 61–73.
(Figures: per-dataset classification accuracy, and accuracy differences of Algorithms 2(2), 2E2(ℓ), E1(2), and E3(p) relative to Algorithm 2, over twenty UCI datasets: Vowel Recognition, Thyroid gland, Pima Indians Diabetes, Parkinsons, Ozone Level Detection, BUPA Liver Disorders, Ionosphere, Hepatitis Domain, Heart Disease Cleveland, Glass Identification, Diabetic Retinopathy Debrecen, Credit Approval, Climate Model Simulation Crashes, Car Evaluation (types 1–4 vs. others), Cardiotocography (types 1–2 vs. others), and Breast Cancer Wisconsin Original.)