Faster Maximum Feasible Subsystem Solutions for Dense Constraint Matrices
Fereshteh Fakhar Firouzeh, John W. Chinneck, Sreeraman Rajan
Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada
Abstract
Finding the largest cardinality feasible subset of an infeasible set of linear constraints is the Maximum Feasible Subsystem problem (MAX FS). Solving this problem is crucial in a wide range of applications such as machine learning and compressive sensing. Although MAX FS is NP-hard, useful heuristic algorithms exist, but these can be slow for large problems. We extend the existing heuristics for the case of dense constraint matrices to greatly increase their speed while preserving or improving solution quality. We test the extended algorithms on two applications that have dense constraint matrices: binary classification, and sparse recovery in compressive sensing. In both cases, speed is greatly increased with no loss of accuracy.
Keywords:
Maximum Feasible Subsystem, Linear Programming, Classification, Compressive Sensing
1. Introduction
Finding the maximum cardinality feasible subsystem of an infeasible set of linear constraints is known as the Maximum Feasible Subsystem problem (MAX FS) [1]. Its complement is the Minimum Unsatisfied Linear Relation problem (MIN ULR) [2] of finding the minimum number of constraints to remove from an infeasible set such that the remaining constraints have a feasible solution. Moreover, all infeasible systems have at least one Irreducible Infeasible Subset (IIS) in their original set of constraints. An IIS is a set of constraints that is infeasible, but is rendered feasible if any constraint is removed. If all of the IISs in the model are enumerated, then a MIN ULR solution is found as a minimum cover of the IISs; this is known as the minimum IIS cover problem [3]. In this paper, MAX FS, MIN ULR, and MIN IIS COVER are the same problem and the terms will be used interchangeably.

MAX FS is NP-hard [4, 5, 6], so exact solution algorithms are only available for relatively small instances, but solutions are still needed in a wide variety of applications including machine learning [7], misclassification minimization [8], training of neural networks [2], telecommunications [9], computational biology [10], and compressive sensing (CS) [11]. To be useful in practice, MAX FS heuristics should obtain high quality solutions (large feasible subsets, or equivalently, small MIN ULR subsets) quickly.

The MAX FS problem can be formulated for exact solution in various ways, including as a mixed-integer linear program [12], a linear program with equilibrium constraints (LPEC) [13, 14, 15, 16], or as a type of set-covering problem [17, 18, 19, 20]. As shown in chapter 7 of [12], there are practical difficulties in using any of these exact methods to solve this NP-hard problem (i.e. they can be used only for very small MAX FS problems). For this reason, heuristic solutions are required.

Chinneck [3, 21] developed a set of effective linear programming (LP)-based heuristics for solving the MAX FS problem, and showed experimentally that they give better results than methods such as MISMIN, a state-of-the-art LPEC solver at the time [21]. Variants of Chinneck’s methods have been developed more recently for finding sparse solutions to underdetermined systems of linear equations in unbounded variables [11] for sparse recovery in compressive sensing.

This paper extends Chinneck’s algorithms to provide major improvements in solution speed for the case of dense constraint matrices, based on the observation that dense constraint matrices provide multiple similar candidates for removal during the MAX FS solution process. The new extensions are evaluated on the following two applications that have dense constraint matrices: binary classification, and sparse recovery in compressive sensing.

Chinneck’s algorithms remove constraints from the original set one by one until what remains is the heuristic MAX FS solution; the removed constraints constitute the heuristic MIN ULR solution. An inner loop assesses candidate constraints, and an outer loop removes the single constraint chosen by the inner loop. Two extensions that greatly improve solution speed when a dense constraint matrix provides multiple similar candidates are proposed in this paper. Extension 1 amalgamates the two loops into a single one that both assesses the candidates and removes multiple constraints at each iteration. Extension 2 provides an early exit under certain conditions. When tested on binary classification and compressive sensing sparse recovery, the extended heuristics greatly reduce solution time while preserving or improving solution quality.

The paper is structured as follows. Section 2 reviews the existing MAX FS solution heuristics. The new extensions are developed in Section 3. The experimental results for binary classification are given in Section 4 and for compressive sensing sparse recovery in Section 5. Section 6 concludes the paper and outlines future work.

∗ Corresponding author. Email address: [email protected] (Fereshteh Fakhar Firouzeh)
Preprint submitted to Journal of Computers & Operations Research, February 12, 2021.
2. LP-based MAX FS Solution Heuristics
Chinneck’s original polynomial time heuristics [3] are briefly summarized here. All variants first elasticize the infeasible set of linear row constraints (and possibly variable bounds) by adding nonnegative elastic variables as shown in Table 1. The elastic variables measure the extent to which each constraint is violated, so a linear program (LP) minimizing their sum finds the minimum total violation for an infeasible set of constraints and bounds. There are two types of elastic models: the standard elastic LP elasticizes only the row constraints, and the full elastic LP elasticizes both the row constraints and the column bounds. The objective for a full elastic LP is min Z = Σ_i (e_i^+ + e_i^−) + Σ_j (e_j^+ + e_j^−) for constraints i = 1...m and bounded variables j = 1...n; the second term is omitted for a standard elastic LP. Z = 0 indicates a feasible system.

Table 1: Elasticizing constraints by adding non-negative elastic variables.

  Type        Nonelastic            Standard elastic                   Full elastic
  Row cons.   Σ_j a_ij x_j ≥ b_i    Σ_j a_ij x_j + e_i ≥ b_i           Σ_j a_ij x_j + e_i ≥ b_i
              Σ_j a_ij x_j ≤ b_i    Σ_j a_ij x_j − e_i ≤ b_i           Σ_j a_ij x_j − e_i ≤ b_i
              Σ_j a_ij x_j = b_i    Σ_j a_ij x_j + e_i^+ − e_i^− = b_i Σ_j a_ij x_j + e_i^+ − e_i^− = b_i
  Var. bnds   x_j ≥ l_j             x_j ≥ l_j                          x_j + e_j^+ ≥ l_j
              x_j ≤ u_j             x_j ≤ u_j                          x_j − e_j^− ≤ u_j

As shown in Fig. 1, the General MAX FS algorithm of [3] has an inner loop (line 10) and an outer loop (line 5). The inner loop tests each member of a set of candidate constraints in CandidateSet by examining their effect on Z when temporarily removed from the model. The candidate constraint that most reduces Z when removed is selected as the Winner candidate, and is subsequently removed from the model and added to
MINULR in the outer loop. This process continues until Z = 0 (feasibility is reached). The constraints remaining in the model constitute the heuristic MAX FS solution; the removed constraints in MINULR constitute the heuristic MIN ULR solution. Algorithm variants differ in how the list of candidate constraints is constructed.

The heuristic relies on Observation 3 from [3], which notes that a constraint that appears in more than one IIS will reduce Z more than a constraint that appears in a single IIS because it will eliminate multiple IISs upon removal. This means that a large drop in Z when a constraint is removed is a useful sign that the constraint is part of the MIN ULR set. The larger the drop in Z, the more IISs the constraint likely covers.

There are efficiency enhancements in terms of algorithm processing time. If the CandidateSet has a single element (line 6), then that element is added to MINULR and the algorithm exits: there is no need to solve the LP. As the Winner constraint is updated, so is the list of candidates for the next round, NextCandidateSet. All versions of the algorithm solve multiple LPs, but each LP is identical to the last one except for two constraints, so simplex advanced starts speed the process considerably.

Algorithms 1 to 3 of the base algorithm differ in how the list of candidate constraints is constructed, as described next.
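To make the loop structure concrete, the following is a minimal sketch of elasticization and the outer/inner-loop heuristic on a toy infeasible system. It is illustrative only: it uses SciPy’s linprog in place of the simplex solver with advanced starts used in the paper, and the three one-variable constraints are made up for the example.

```python
# Sketch of the general LP-based MAX FS heuristic on a tiny infeasible
# system: an outer loop removes the Winner constraint, an inner loop
# tests each candidate by temporary removal.  Assumes SciPy is available.
import numpy as np
from scipy.optimize import linprog

# Infeasible system in one variable x:  x <= 1,  x >= 3,  x <= 5.
# Rows stored as (coefficient a, sense, rhs b).
rows = [(1.0, "<=", 1.0), (1.0, ">=", 3.0), (1.0, "<=", 5.0)]

def elastic_Z(active):
    """Minimum total violation Z of the active rows (elastic LP)."""
    m = len(active)
    c = np.r_[0.0, np.ones(m)]          # variables: [x, e_1..e_m]
    A_ub, b_ub = [], []
    for k, i in enumerate(active):
        a, sense, b = rows[i]
        e = np.zeros(m); e[k] = 1.0
        if sense == "<=":               # a*x - e_k <= b
            A_ub.append(np.r_[a, -e]); b_ub.append(b)
        else:                           # a*x + e_k >= b  ->  -a*x - e_k <= -b
            A_ub.append(np.r_[-a, -e]); b_ub.append(-b)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] + [(0, None)] * m)
    return res.fun

minulr, active = [], list(range(len(rows)))
while elastic_Z(active) > 1e-9:         # outer loop: repeat until feasible
    # inner loop: test each candidate by temporary removal, keep the Winner
    winner = min(active, key=lambda i: elastic_Z([j for j in active if j != i]))
    active.remove(winner)               # permanent removal of the Winner
    minulr.append(winner)

print(minulr)   # one of the two conflicting rows
```

Removing either of the two conflicting constraints (x ≤ 1 or x ≥ 3) drives Z to zero, so the heuristic MIN ULR set has a single element and the remaining two constraints form the heuristic MAX FS solution.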
Candidate constraints are those to which the elastic objective function is sensitive (nonzero dual price). Constraints not in this list do not affect Z when dropped, so they are not candidates (Observation 4 from [3]). This is the longest possible candidate list, but its length can be limited as follows. Sort the constraints in descending order by their absolute dual cost and take only the first k elements of the sorted list. This is referred to as Algorithm 1(k). For clarity in the sequel, unlimited lists are indicated by k = ∞.

Algorithm 2 uses a different criterion for constructing the CandidateSet based on Observation 5 in [21]: a good predictor of the magnitude of the drop in Z obtained by deleting a violated constraint j is given by Eqn. (1):

    product: e_j × |constraint j dual price|    (1)

For equality constraints, the larger of the two elastic variables is used in Eqn. (1). Algorithm 2 can also limit the length of the candidate list. First sort the candidate constraints in descending order by product magnitude, and then add only the top k constraints to CandidateSet; this is Algorithm 2(k). List length k = 1 is particularly useful in some applications, where it is referred to as the maximum product algorithm.

Figure 1: General MAX FS Algorithm [3]
 1: MINULR ← ∅
 2: Set up elastic LP.
 3: Solve elastic LP.
 4: Construct CandidateSet.
 5: do [outer loop]:
 6:   if |CandidateSet| = 1 then
 7:     Add the single candidate to MINULR and exit.
 8:   end if
 9:   WinnerZ ← ∞.
10:   for each candidate j in CandidateSet do [inner loop]:
11:     Delete candidate j.
12:     Solve elastic LP.
13:     if Z = 0 then
14:       Add candidate j to MINULR and exit.
15:     end if
16:     if Z < WinnerZ then
17:       Winner ← candidate j.
18:       WinnerZ ← Z.
19:       Construct NextCandidateSet.
20:     end if
21:     Reinstate candidate j.
22:   end for
23:   Add Winner to MINULR.
24:   Delete Winner permanently.
25:   CandidateSet ← NextCandidateSet.
26: end do
OUTPUT: MINULR is a heuristic MIN ULR solution.

Algorithm 3 uses
Observation 6 [21]: for satisfied constraints, a good predictor of the relative magnitude of any drop in Z that may be obtained by deleting the constraint is given by |constraint dual price|. Algorithm 3 uses 2 lists: (i) for violated constraints, the list of product magnitudes (Eqn. (1)), and for satisfied constraints (ii) the list of absolute constraint sensitivities. Both lists are independently sorted in descending order, and the top k elements are taken from each list, resulting in 2k candidates in total. This is referred to as Algorithm 3(k).

A sparse solution is one having very few nonzeros. Algorithms 1-3 can be used to find sparse solutions to general systems of linear constraints as follows. Given the system Ax {≤, =, ≥} b, add the variable zeroing constraints x = 0. Now apply any of Algorithms 1-3 with candidates chosen only from among the constraints x = 0. The MAX FS solution provides the sparsest solution by allowing the smallest number of nonzero variables. Finding the sparsest solution to an underdetermined system of linear equalities in unbounded variables is of particular interest in a number of applications. We focus on this problem here.

Chinneck’s algorithms have been adapted for underdetermined systems of linear equations in unbounded variables. Jokar and Pfetsch [22] modified Chinneck’s algorithm as follows. Given the m × n linear system Ax = b in unbounded variables, all variables x_j are first replaced by u_j − v_j where both u_j and v_j are nonnegative, and the following LP is constructed:

    min Z = Σ_j (u_j + v_j)  s.t.  A(u − v) = b,  u ≥ 0,  v ≥ 0    (2)

This LP has m constraints in 2n nonnegative variables. The objective function attempts to drive all variables to zero in a manner similar to driving elastic variables to zero in Algorithms 1-3.

The basic algorithm follows the logic in Fig. 1, except that candidates are variables and not constraints. Additional differences from Fig.
1 are:

  • Candidate variables are those not in the MIN ULR set and having a nonzero value in the current LP solution. Candidates are sorted in decreasing order by the magnitude of |u_j − v_j|.

  • Candidate variables are tested in the inner loop by temporarily removing them from the objective function by setting the objective coefficients of the associated u_j, v_j pair to zero and re-solving the LP.

  • A candidate variable x_j is moved into the MIN ULR set by resetting the objective function coefficients of u_j and v_j to zero permanently so that x_j no longer contributes to Z.

  • At exit, the variables in the MIN ULR set are those that can take nonzero values in the sparse solution.

  • Extraneous variables are sometimes included in the MIN ULR solution. A simple post-processing step may remove them. First construct A_mn by eliminating all columns from A that are not in the MIN ULR solution. For each variable in the MIN ULR set in turn, force it to zero and test the feasibility of A_mn x = b; if feasible then the tested variable is permanently removed from the MIN ULR set.

The well-known Basis Pursuit (BP) algorithm [23] solves system (2) a single time. This works well if the solution is very sparse, but when this is not the case, Fig. 1 with the modifications above is more effective.

Several other recent variants have been shown to be very effective for this problem [24, 11]:

  • Method B uses system (2). The objective coefficients of the u_j and v_j associated with variable x_j moved into the MIN ULR set are permanently reset to 0.1 instead of zero [24]. Since Z never reaches 0, exit is instead triggered when |CandidateSet| = 0.

  • Method M combines Method B with
Basis Pursuit (BP) [23], based on the observation that if BP fails, it returns a solution that is typically of size m or nearly so. Thus, the BP result (the set of nonzeros in the first solution of (2)) is taken when the number of nonzeros is less than m − 3. Method B is applied when the number of nonzeros in that first solution is larger than m − 3.

  • Method C has explicit elastic variable zeroing constraints, resulting in the following LP:

        min Z = Σ_j (e_j^+ + e_j^−)
        s.t.  Ax = b
              x_j + e_j^+ − e_j^− = 0,  j = 1...n
              e_j^+, e_j^− ≥ 0    (3)

    where e_j^+ and e_j^− are nonnegative, A is m × n and the identity matrix in the zeroing rows is n × n. This LP has m + n constraints in 3n variables. There are two lists of candidates: (1) a list based on the magnitude of nonzero variables not yet assigned to the MIN ULR set, and (2) a list based on the dual prices of the variable zeroing constraints x_j + e_j^+ − e_j^− = 0. The selected variable is added to the MIN ULR set by zeroing the objective coefficients of the corresponding elastic variables.
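As a concrete illustration of system (2), the following sketch recovers a 1-sparse vector from three measurements with a single Basis Pursuit solve, i.e. by minimizing Σ_j (u_j + v_j) subject to A(u − v) = b. The 3 × 4 matrix (rows of a Hadamard matrix) is an assumed toy example, and SciPy’s linprog stands in for the LP solvers used in the paper.

```python
# Minimal sketch of LP (2): split x into nonnegative u and v and minimize
# sum(u + v), the l1 norm of x = u - v, subject to A(u - v) = b.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1,  1,  1,  1],
              [1, -1,  1, -1],
              [1,  1, -1, -1]], dtype=float)   # toy 3x4 sensing matrix
x_true = np.array([0.0, 0.0, 5.0, 0.0])        # sparse signal to recover
b = A @ x_true                                 # compressed measurement

m, n = A.shape
c = np.ones(2 * n)                             # minimize sum(u) + sum(v)
A_eq = np.hstack([A, -A])                      # A u - A v = b
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))

u, v = res.x[:n], res.x[n:]
x_rec = u - v
print(np.round(x_rec, 6))                      # recovers the sparse x_true
```

For this instance the l1 minimizer is unique, so a single BP solve suffices; when BP fails (the solution has close to m nonzeros), the iterative loop of Fig. 1 with the modifications above takes over.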
3. Extensions to Increase Solution Speed
The existing algorithms are effective, but there are opportunities for improving their speed. Limiting the length of the candidate list is helpful, but there are still two bottlenecks: evaluating multiple candidates in the inner loop, and removing just a single constraint in the outer loop. New Extension 1 eliminates the inner loop entirely and removes multiple candidate constraints simultaneously in the single remaining loop. The trick is in identifying which constraints to remove in each iteration. New Extension 2 provides an early exit under certain conditions.

Each of the extensions can be combined with any of the existing list construction algorithms to create new combinations, e.g. combining Algorithm 2(∞) and Extension 1 creates Algorithm 2(∞)E1.

There are three ways to assemble the candidate list in Algorithms 1-3. Regardless of the type of list, previous research shows that the order in the sorted list of candidates is important: those earlier in the list are much more likely to be part of the MIN ULR than those later in the list. Thus, if we remove multiple constraints at each iteration of a single loop, it makes sense to select them from the start of the sorted list. This argument is especially potent when the constraint matrix is dense. In a dense constraint matrix, all constraints involve all of the variables, so all of the constraints are similar in structure. A consequence is that many of the constraints have similar impacts on Z. For example, in the classification application, where the constraints are derived from the data points, data points that are close together generate similar constraints. These similar constraints have similar ranking scores in the candidate list, and hence have similar effects on Z when removed from the LP, leading to this Observation: When candidates have similar ranking measures as the top-ranked element in the candidate list, they are likely to appear in the final MIN ULR set.

This observation allows both the elimination of the inner loop to test individual candidate constraints and the simultaneous removal of multiple candidates in the single remaining loop. The question is how to determine which ranking scores are “similar” to the top score. We sort the candidates by descending ranking score and apply standard methods for detecting abrupt changes in the mean of subsets of the set [25]. We select for removal all candidates up to the first identified abrupt change.
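The sketch below shows one simple stand-in for the abrupt-change test (the experiments in this paper use Matlab’s ischange, which detects changes in the mean): choose the single split point of the descending score list that minimizes the total within-segment squared deviation, and select every candidate before it. The score values are invented for illustration.

```python
# Stand-in for the abrupt-change test of Extension 1: find the best
# single mean-shift point in a list of ranking scores sorted in
# descending order, then keep the block of similar top candidates.
import numpy as np

def first_abrupt_change(scores):
    """Index of the best single mean-shift split of a sorted score list."""
    s = np.asarray(scores, dtype=float)
    best_k, best_cost = 1, np.inf
    for k in range(1, len(s)):
        left, right = s[:k], s[k:]
        # total squared deviation of each segment from its own mean
        cost = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Five similar top scores, then a sharp drop: all five are removed at once.
scores = [9.1, 9.0, 8.8, 8.7, 8.7, 2.1, 1.9, 1.8, 0.4, 0.3]
p = first_abrupt_change(scores)
print(p, scores[:p])   # 5 and the top block of similar candidates
```

A single pass like this replaces the entire inner loop: the p top-ranked candidates are removed together, at the cost of one LP solve per outer iteration instead of one per candidate.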
Extension 2 provides an early exit. If the number of candidates is less than or equal to ℓ, then permanently remove all candidates and exit the algorithm. This is designated E2(ℓ).
4. Application: Binary Classification
Amaldi [2], Parker [17], Chinneck [26, 27, 21], and Silva [28] have shown how the binary classification problem can be transformed into a MAX FS problem:
Given a training set of I data points (i = 1...I) in J dimensions (j = 1...J) where the class of each point is known (Type 0 or 1). Define a constraint for each data point:

  • For points of Type 0: Σ_j d_ij w_j ≤ w_0 − ε
  • For points of Type 1: Σ_j d_ij w_j ≥ w_0 + ε

where d_ij indicates the value of attribute j for point i. The d_ij are known constants, ε is a small positive constant, and the variables w_j are unrestricted.

Most datasets cannot be completely separated by a single hyperplane, so the initial system is infeasible. After the MAX FS solution, the final LP is feasible, and its solution returns the separating hyperplane Σ_j d_ij w_j = w_0 which correctly classifies all of the data points in the MAX FS subset while those in the MIN ULR subset are incorrectly classified. This is heuristically the most accurate hyperplane.

Prior work shows that Algorithm 2 provides a major speed-up over Algorithm 1 for binary classification [21], with a tiny loss in average accuracy. Violated constraints correspond to misclassified points, so the product ranking score (Eqn. 1) used in Algorithm 2 estimates the drop in Z relatively accurately. A standard elastic program is used because the variables are unrestricted. Since all attribute values are specified for all data points, the constraint matrix, D, is dense; hence, Extension 1 can be applied. The revised algorithm is designated as Algorithm 2E1, and is summarized in Fig. 2. To illustrate the workings of Extension 1, Fig. 3 shows the first sorted candidate list for the “Ozone Level Detection” dataset. The first sharp drop-off occurs after the fifth candidate, so the first five constraints/points are removed in this case.

We compare Algorithm 2E1 with two existing algorithms: (i) original Algorithm 2(∞) and (ii) original Algorithm 2(1). Both have been shown to provide high accuracies in binary classification [21]. Accuracy is used to evaluate the quality of the solutions:
    Accuracy = (total correctly classified instances) / (total instances)    (4)

For comparison with previous results [21], the training set is identical to the entire dataset.

Figure 2:
Algorithm 2E1
MINULR ← ∅
Set up the standard elastic LP.
do:
  Solve LP.
  if Z = 0 then Exit. end if
  Construct CandidateSet using Eqn. (1).
  if |CandidateSet| = 1 then Add the single candidate to MINULR and exit. end if
  Sort candidates from largest to smallest product.
  Select top p candidates for removal based on abrupt change in product score.
  Add the p selected candidates to MINULR and remove them from the LP.
end do
OUTPUT: MINULR is a heuristic MIN ULR solution.
Figure 3: First sorted candidate list for the “Ozone Level Detection” dataset (plot of product magnitude vs. sorted candidates; figure omitted).

4.4. Data Sets

Twenty binary classification problems are derived from datasets in the UCI Repository of Machine Learning Databases [29]. Table 2 summarizes their characteristics.
Table 2: Classification Datasets
Dataset Instances Features
Breast Cancer Wisconsin Original          683     9
Cardiotocography (Type 1 vs. others)     2126    34
Cardiotocography (Type 2 vs. others)     2126    34
Car Evaluation (Type 1 vs. others)       1728     6
Car Evaluation (Type 2 vs. others)       1728     6
Car Evaluation (Type 3 vs. others)       1728     6
Car Evaluation (Type 4 vs. others)       1728     6
Climate Model Simulation Crashes          540    18
Credit Approval                           653    15
Diabetic Retinopathy Debrecen            1151    19
Glass Identification                      214     9
Heart Disease                             297    13
Hepatitis Domain                          112    18
Ionosphere                                351    33
BUPA Liver Disorders                      341     6
Ozone Level Detection                    1848   102
Parkinson’s Data Set                      195    22
Pima Indians Diabetes                     768     8
Thyroid Gland Data                        215     5
Vowel Recognition Data                    990    11
All algorithms are implemented in Matlab version 2020. The linear programming solver is MOSEK via the MOSEK Optimization Toolbox for Matlab version 8 . . . 56 [30]. Abrupt changes in the ranking scores in the candidate list are identified using the Matlab ischange() function. The computations are carried out on a 3.40 GHz Intel Core i machine with . GB RAM, running Windows 10.
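The sketch below illustrates how the per-point constraints of this section are elasticized and solved. It is a toy stand-in (SciPy’s linprog on a made-up 1-D data set) rather than the Matlab/MOSEK implementation used in the experiments. A deliberately placed Type 0 outlier makes the full system infeasible (Z > 0); dropping that single point, as the MAX FS heuristic would, restores Z = 0.

```python
# Standard elastic LP for the binary classification constraints:
#   Type 0:  d_i*w <= w0 - eps      Type 1:  d_i*w >= w0 + eps
# with variables (w, w0, e_1..e_m) and nonnegative elastic variables e.
import numpy as np
from scipy.optimize import linprog

eps = 1.0
# (attribute value d_i, class) -- d = 5.0 is an outlier of Type 0
points = [(0.0, 0), (1.0, 0), (5.0, 0), (4.0, 1), (6.0, 1)]

def min_total_violation(pts):
    """Minimum total violation Z of the elasticized point constraints."""
    m = len(pts)
    c = np.r_[0.0, 0.0, np.ones(m)]        # minimize sum of elastics
    A_ub, b_ub = [], []
    for k, (d, cls) in enumerate(pts):
        e = np.zeros(m); e[k] = 1.0
        if cls == 0:   # d*w <= w0 - eps  ->  d*w - w0 - e_k <= -eps
            A_ub.append(np.r_[d, -1.0, -e])
        else:          # d*w >= w0 + eps  -> -d*w + w0 - e_k <= -eps
            A_ub.append(np.r_[-d, 1.0, -e])
        b_ub.append(-eps)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * 2 + [(0, None)] * m)
    return res.fun

Z_all = min_total_violation(points)                   # > 0: not separable
Z_cut = min_total_violation(points[:2] + points[3:])  # outlier removed
print(Z_all, Z_cut)
```

With the outlier removed, the final LP is feasible and its solution gives a separating hyperplane w·d = w0 that classifies the remaining four points correctly.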
The results in this section show the higher accuracy and greater speed of Algorithm 2E1 vs. original Algorithms 2(∞) and 2(1) for binary classification. Table 3 reports algorithm accuracies with the best results bolded. Table 4 shows the difference from the best accuracy on each data set. The last three rows give additional metrics: the largest difference from the best accuracy, the average difference from the best accuracy, and the number of best accuracies obtained by each algorithm.

Algorithm 2E1 provides better total accuracy. It has the smallest average difference from best accuracy, and provides the best accuracy for 12 of the 20 data sets while it was 8 for the two existing algorithms. The largest difference from the best accuracy for Algorithm 2E1 is −1.38, which is smaller than those for the other two (−4.28 for both).
Table 3: Algorithm Accuracies for Classification Datasets.
Dataset                                 Accuracy: Algorithm 2(∞)   Algorithm 2(1)   Algorithm 2E1
Breast Cancer Wisconsin Original          98.10   98.10
Cardiotocography (Type 1 vs. others)      99.72   99.72
Cardiotocography (Type 2 vs. others)      99.67   99.58
Car Evaluation (Type 1 vs. others)        78.47   78.47
Car Evaluation (Type 2 vs. others)
Car Evaluation (Type 3 vs. others)        87.79   88.42
Car Evaluation (Type 4 vs. others)
Hepatitis Domain                          92.86
Ionosphere                                98.01
Parkinsons Data Set                       94.36   94.87
Pima Indians Diabetes
Vowel Recognition Data                    98.38   98.38
Table 4: Differences from Best Accuracy.

Dataset                                 Algorithm 2(∞)   Algorithm 2(1)   Algorithm 2E1
Breast Cancer Wisconsin Original          -0.29   -0.29
Cardiotocography (Type 1 vs. others)      -0.14   -0.14
Cardiotocography (Type 2 vs. others)      -0.14   -0.23
Car Evaluation (Type 1 vs. others)        -4.28   -4.28
Car Evaluation (Type 2 vs. others)
Car Evaluation (Type 3 vs. others)        -1.22   -0.59
Car Evaluation (Type 4 vs. others)        -0.06   -0.06
Climate Model Simulation Crashes          -0.19   -0.18
Credit Approval                           -1.37   -1.37
Diabetic Retinopathy Debrecen             -0.35   -0.79
Glass Identification                      -0.45   -1.38
Heart Disease Cleveland
Hepatitis Domain                          -3.57
Ionosphere                                -0.28
BUPA Liver Disorders                      -0.29   -0.88
Ozone Level Detection                     -0.33   -0.06
Parkinsons Data Set                       -2.05   -1.54
Pima Indians Diabetes                     -0.39   -0.39
Thyroid Gland Data
Vowel Recognition Data                    -0.1    -0.1
Largest Difference                        -4.28   -4.28   -1.38
Average Difference                        -0.66   -0.46   -0.27
No. of bests                               8       8      12

Table 5 compares the number of linear programs solved (LPs) and the processing time in seconds (sec) for each algorithm, with best results in boldface. Algorithm 2E1 requires the fewest LP solutions and the least processing time in all cases. Table 6 reports the percentage reduction in number of LPs and processing time for Algorithm 2E1 vs. the comparators for each dataset. For example, the processing times for Algorithms 2(∞) and 2E1 are 3.1 and 0.3 seconds respectively for the first dataset in Table 5. Therefore, Algorithm 2E1 provides a 90.32% reduction in processing time when compared to Algorithm 2(∞) for the first dataset in Table 6. The last two rows of Table 6 show the maximum and average speed improvements obtained by Algorithm 2E1 when compared with the other two algorithms. Algorithm 2E1 reduces the average number of LPs by about 96.56% and 67.90% when compared to Algorithms 2(∞) and 2(1), respectively, and also reduces the solution time by 94.77% and 71.31% when compared to Algorithm 2(∞) and Algorithm 2(1) respectively.

Algorithm 2E1 thus reduces solution time by an average 70-90% when compared to the existing algorithms, while also marginally improving the total accuracy on average.
5. Application: Sparse Recovery in Compressive Sensing
Table 5: Algorithm speeds (number of LPs, seconds) for binary classification.

Dataset                                 Algorithm 2(∞)   Algorithm 2(1)   Algorithm 2E1
                                          LPs     Sec      LPs    Sec      LPs    Sec
Breast Cancer Wisconsin Original           100    3.1        12   0.7        6    0.3
Cardiotocography (Type 1 vs. others)        35    4.3         4   0.6
Cardiotocography (Type 2 vs. others)        67    5.1         4   0.6
Car Evaluation (Type 1 vs. others)        2583  350.6       372  28.1
Car Evaluation (Type 2 vs. others)         462   28.7        69   4.5
Car Evaluation (Type 3 vs. others)        1458  183.2       203  13.8       10    0.9
Car Evaluation (Type 4 vs. others)         205   16.7        33   2.3
Climate Model Simulation Crashes            77    3.2         7   0.3
Credit Approval                           1237   37.1        87   7.3
Diabetic Retinopathy Debrecen             4723  625.2       238  25.6       16    1.4
Glass Identification                       292    3.1        33   1.5
Heart Disease Cleveland                    404    5.3        31   1.0
Hepatitis Domain                            16    0.4         4   0.2
Ionosphere                                 104    4.2         7
BUPA Liver Disorders                       581   14.8        83   2.7       10    0.3
Ozone Level Detection                      542  159.7        14   4.1
Parkinsons Data Set                        103    2.7         7
Pima Indians Diabetes                     1325   41.7       153   6.7       12    0.5
Thyroid Gland Data                          61    1.3        12   0.4
Vowel Recognition Data                     146    4.0        14   0.5

Finding a sparse solution to an underdetermined system of linear equations in unbounded variables is an essential step in compressive sensing (CS) [31]. A common way to compress an input signal vector x in CS is to multiply it by a random matrix A of size m × n where m << n to yield a compressed vector b of size m, which is transmitted or stored. The input vector is assumed to be sparse, so the decompression or recovery process tries to find a sparse solution given only A and the compressed vector b. The CS sparse recovery problem is to find a sparse solution y to the underdetermined system Ay = b, where y is of size n × 1. The random matrix A is dense, and thus the recovery process becomes more suitable for the extended algorithms introduced here. This section examines whether the new algorithm extensions can increase the speed of the MAX FS based algorithms for the sparse recovery problem without loss of solution quality.

Methods B, C, and M (Section 2.4) have been applied to CS sparse recovery and are observed to provide improvements over the state of the art in certain scenarios. Previous work [11] showed Method M to be one of the better methods for CS sparse recovery. So we combine it with our two extensions to create Method ME1E2(ℓ). The steps are summarized in Fig. 4.

Method ME1E2(ℓ) assumes BP failure if the length of the first candidate list is greater than the defined threshold ℓ. We set ℓ to m −
3, as done for Method M in [11].

Table 6: Percentage reductions in LPs solved and solution times by using Algorithm 2E1 instead of original Algorithms 2(∞) and 2(1) for classification.

Dataset                                 vs. Algorithm 2(∞)   vs. Algorithm 2(1)
                                          LPs      Sec        LPs      Sec
Breast Cancer Wisconsin Original          94       90.32      50       57.14
Cardiotocography (Type 1 vs. others)      94.29    93.02      50       50
Cardiotocography (Type 2 vs. others)      97.01    94.12      50       50
Car Evaluation (Type 1 vs. others)        99.69    99.80      97.85    97.51
Car Evaluation (Type 2 vs. others)        99.78    99.65      98.55    97.78
Car Evaluation (Type 3 vs. others)        99.31    99.51      95.07    93.48
Car Evaluation (Type 4 vs. others)        97.56    97.60      84.85    82.61
Climate Model Simulation Crashes          93.51    93.75      28.57    33.33
Credit Approval                           99.92    99.73      98.85    98.63
Diabetic Retinopathy Debrecen             99.66    99.78      93.28    94.53
Glass Identification                      97.60    90.32      78.79    80
Heart Disease Cleveland                   98.51    96.23      80.64    80
Hepatitis Domain                          87.5     75         50       50
Ionosphere                                95.19    97.62      28.57    66.67
BUPA Liver Disorders                      98.28    97.97      87.95    88.89
Ozone Level Detection                     98.71    98.68      50       48.78
Parkinsons Data Set                       96.12    96.30      42.86    74.36
Pima Indians Diabetes                     99.09    98.80      92.16    92.54
Thyroid Gland Data                        90.16    84.61      50       50
Vowel Recognition Data                    95.20    92.5       50       40
Maximum Percent Reduction                 99.92    99.80      98.85    98.63
Average Percent Reduction                 96.56    94.77      67.90    71.31
Method ME1E2(ℓ) is compared with the original Methods B(2), C(2), and M(2). We choose list length k = 2 to provide fast solutions with good solution quality (k is typically 1 − ...).

In sparse recovery, if the number of nonzeros in recovered signal y equals the number of nonzeros in the input signal x, then y and x are usually identical. For this reason our main metric relates to the sparsity (number of nonzeros) of y vs. x. The critical sparsity is the largest number of nonzeros in the input vector x for which the sparsity of y reliably equals the sparsity of x. Higher critical sparsity means that input vectors that have been more compressed, or that have more nonzeros, can still be reliably recovered. We use the critical sparsity as the evaluation metric: higher is better.

Figure 4: Algorithm ME1E2(ℓ)
MINULR ← ∅
Set up the sparse recovery LP.
do:
  Solve LP.
  Construct CandidateSet: variables x_j not in MINULR that have nonzero |u_j − v_j|.
  if |CandidateSet| = ∅ then exit.
  if first iteration and |CandidateSet| ≤ ℓ then Add all candidates to MINULR and exit. end if
  Sort candidates from largest to smallest |u_j − v_j|.
  Select top p candidates for removal based on first abrupt change in |u_j − v_j|.
  Add the p selected candidates to MINULR and set the objective coefficients of the associated u_j, v_j pairs to 0.1.
end do
OUTPUT: MINULR is a small number of nonzeros forming a support for the system of equations.
We construct examples for which the solution is known. The system is Ax = b with A of size 128 × 256, x of size 256 × 1, and b of size 128 × 1, providing a compression ratio (CR) = (1 − m/n) × 100% = 50%. A is a random matrix with values ranging from -10 to 10, and hence dense. The constructed input vector x has various numbers of nonzeros ranging from 12 to 96, randomly distributed among its elements, and with values randomly drawn from a normal distribution with mean 0 and variance 1. See Section 4.5.
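The instance construction above can be sketched as follows (assuming NumPy; the seed and the choice S = 52 are arbitrary):

```python
# Test-instance construction: a dense random 128x256 matrix A with
# entries in [-10, 10], an input vector x with S nonzeros drawn from
# N(0, 1) at random positions, and the compressed vector b = A x.
import numpy as np

rng = np.random.default_rng(0)
m, n, S = 128, 256, 52

A = rng.uniform(-10.0, 10.0, size=(m, n))    # dense sensing matrix
x = np.zeros(n)
support = rng.choice(n, size=S, replace=False)
x[support] = rng.standard_normal(S)          # S nonzeros, N(0, 1)
b = A @ x                                    # compressed measurement

cr = (1 - m / n) * 100
print(A.shape, int((x != 0).sum()), cr)      # (128, 256) 52 50.0
```

Recovery is then judged by comparing the sparsity of the recovered y against the known sparsity S of x, which is how the critical sparsity in Table 7 is measured.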
Table 7 compares Method ME1E2(ℓ) with existing methods, namely, Methods B(2), C(2), and M(2). Results in the table are averages over 100 instances at each sparsity level S (actual number of nonzeros in the solution). Estimated sparsity (T) is the average number of nonzeros returned by each method at each value of S over 100 instances, with the number of correct results shown in brackets. Results that are accurate in all 100 cases are shown in boldface. The critical sparsity is the largest number shown in boldface for each method. Extended Method ME1E2(ℓ) has the same critical sparsity as the other methods.

Table 8 shows the number of LPs solved and the processing time for Method ME1E2(ℓ) and the comparators. The smallest number of LPs and time in seconds are bolded. Method ME1E2(ℓ) requires the fewest LP solutions at all values of S. It is significantly faster than the other algorithms, as shown in Table 9, which reports the percentage reduction in LP solutions required and processing times for Method ME1E2(ℓ) when compared to others. The last two rows of the table show the maximum and the average percent improvement. At S = 52, Method ME1E2(ℓ) reduces the solution time by 96 . . . 83% compared to Methods C(2), B(2), and M(2), respectively. Method ME1E2(ℓ) does not run faster than Method M(2) when the input is very sparse (up to S = 36), but for S > 36 it reduces solution time when compared to M(2) by 62.32% on average. Method ME1E2(ℓ) provides a 98 . . . 61% reduction in LPs solved when compared to Methods C(2), B(2), and M(2), respectively. It reduces the average solution time by 95 . . . 32% when compared to Methods C(2), B(2), and M(2), respectively.

Method ME1E2(ℓ) reduces solution time significantly with no loss of solution quality (as measured by critical sparsity) for this CS sparse recovery scenario.

Table 7: Average and critical sparsities of Method ME1E2(ℓ) vs. Methods C(2), B(2) and M(2) for random matrices of size m = 128 and n = 256 in CS sparse recovery.

S    Estimated sparsity (T): Method C(2)   Method B(2)   Method M(2)   Method ME1E2(ℓ)
12     12 (100)      (100)      (100)      (100)
16 16 (100) (100) (100) (100)
20 20 (100) (100) (100) (100)
24 24 (100) (100) (100) (100)
28 28 (100) (100) (100) (100)
32 32 (100) (100) (100) (100)
36 36 (100) (100) (100) (100)
40 40 (100) (100) (100) (100)
44 44 (100) (100) (100) (100)
48 48 (100) (100) (100) (100)
52 52 (100) (100) (100) (100) (99) (98) (98) (98) (90) (86) (86) (86) (77) (67) (67) (73) (50) (45) (45) (39) (25) (25) (25) (19) (9) (5) (5) (6) (3) (4) (4) (3) (1) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) able 8: Number of LPs and processing times for Method ME1E2( ℓ ) vs. Methods C(2),B(2) and M(2) for random matrices of size m = 128 and n = 256 in CS sparse recovery. S Method C(2) Method B(2) Method M(2) Method ME1E2( ℓ )LPs Sec LPs Sec LPs Sec LPs Sec12
48 1.44 23 0.84
64 2.02 31 1.16
493 23.7 245.06 13.91 256.08 14.05
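The two evaluation quantities used in these tables — the critical sparsity from Table 7 and the percent reductions reported in Table 9 — are straightforward to compute. A minimal sketch (function names are ours; the percent-reduction example reuses two of the surviving LP counts from Table 8):

```python
def critical_sparsity(correct_by_s):
    """Largest sparsity level S recovered correctly in all 100 trials.

    correct_by_s: dict mapping S -> number of correct recoveries out of 100.
    """
    exact = [s for s, c in correct_by_s.items() if c == 100]
    return max(exact) if exact else None

def percent_reduction(baseline, new):
    """Percent reduction of `new` relative to `baseline` (Table 9 style)."""
    return 100.0 * (baseline - new) / baseline

# Fragment of Table 7: every method is exact up to S = 52
counts = {44: 100, 48: 100, 52: 100, 56: 98, 60: 86}
print(critical_sparsity(counts))     # → 52
print(percent_reduction(64, 31))     # 64 LPs down to 31 LPs → 51.5625
```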
6. Conclusions
We propose two new extensions to Chinneck's original MAX FS solution algorithms [3, 21] for use with dense constraint matrices. These extensions greatly increase solution speed with no loss of solution quality when tested in two applications: classification, and sparse recovery in compressive sensing:

• Classification: The new Algorithm 2E1 reduces the mean solution time by about 94.77% and 71.31% when compared to Algorithms 2(∞) and 2(1), respectively, while slightly improving accuracy on average. The better classification accuracies reflect the larger feasible subsets found by the algorithm relative to its predecessors.

• Sparse recovery in compressive sensing: The new Method ME1E2(ℓ) reduces processing time by 95…86% and 62.32% on average when compared with Methods C, B and M, with no reduction in the critical sparsity.

We expect that these results will generalize to other applications that have dense constraint matrices. We plan to investigate the use of the extended algorithms in the recovery of compressively sensed biomedical signals and in dictionary learning. We also intend to extend the basic ideas to less dense constraint matrices, including those found in general LPs.

Table 9: Percent reduction in the number of LPs and processing times for Method ME1E2(ℓ) vs. Methods C(2), B(2) and M(2) for random matrices of size m = 128 and n = 256 in CS sparse recovery.

S         ME1E2(ℓ) vs. C(2)    ME1E2(ℓ) vs. B(2)    ME1E2(ℓ) vs. M(2)
          LPs     Sec          LPs     Sec          LPs     Sec
12        …       …            …       …            …       …
…
Maximum   …       …            …       …            …       …
Average   …       …            …       …            …       …
References

[1] E. Amaldi, M. E. Pfetsch, L. E. Trotter, Some structural and algorithmic properties of the maximum feasible subsystem problem, in: International Conference on Integer Programming and Combinatorial Optimization, 1999, pp. 45–59.

[2] E. Amaldi, From finding maximum feasible subsystems of linear systems to feedforward neural network design, PhD dissertation, Swiss Federal Institute of Technology at Lausanne (EPFL), 1994.

[3] J. W. Chinneck, An effective polynomial-time heuristic for the minimum-cardinality IIS set-covering problem, Annals of Mathematics and Artificial Intelligence 17 (1996) 127–144.

[4] N. Chakravarti, Some results concerning post-infeasibility analysis, Oper. Res. 73 (1994) 139–143.

[5] E. Amaldi, V. Kann, The complexity and approximability of finding maximum feasible subsystems of linear relations, Oper. Res. 147 (1994) 181–210.

[6] J. K. Sankaran, A note on resolving infeasibility in linear programs by constraint relaxation, Oper. Res. 13 (1993) 19–20.

[7] K. P. Bennett, E. J. Bredensteiner, A parametric optimization method for machine learning, INFORMS Journal on Computing 9 (1997) 311–318.

[8] O. L. Mangasarian, Misclassification minimization, Journal of Global Optimization 5 (1994) 309–323.

[9] F. Rossi, S. Smriglio, A. Sassano, Models and algorithms for terrestrial digital, Annals of Operations Research 107 (2001) 267–283.

[10] M. Wagner, R. Elber, Large-scale linear programming techniques for the design, Annals of Operations Research 318 (2004) 301–318.

[11] F. F. Firouzeh, J. W. Chinneck, S. Rajan, Maximum feasible subsystem algorithms for recovery of compressively sensed speech, IEEE Access 8 (2020) 82539–82550.

[12] J. W. Chinneck, Feasibility and Infeasibility in Optimization: Algorithms and Computational Methods, volume 118, Springer Science & Business Media, 2007.

[13] E. Amaldi, The maximum feasible subsystem problem and some applications, in: Modelli e Algoritmi per l'Ottimizzazione di Sistemi Complessi, A. Agnetis and G. D. Pillo, eds., Pitagora Editrice, Bologna, Italy (2003) 31–69.

[14] O. L. Mangasarian, Misclassification minimization, Journal of Global Optimization 5 (1994) 309–323.

[15] O. Mangasarian, Machine learning via polyhedral concave minimization, in: Applied Mathematics and Parallel Computing, Springer, 1996, pp. 175–188.

[16] K. P. Bennett, E. J. Bredensteiner, A parametric optimization method for machine learning, INFORMS Journal on Computing 9 (1997) 311–318.

[17] M. R. Parker, A set covering approach to infeasibility analysis of linear programming problems and related issues, Ph.D. thesis, University of Colorado at Denver, Denver, Colorado, 1995.

[18] M. Parker, J. Ryan, Finding the minimum weight IIS cover of an infeasible system of linear inequalities, Annals of Mathematics and Artificial Intelligence 17 (1996) 107–126.

[19] M. E. Pfetsch, The maximum feasible subsystem problem and vertex-facet incidences of polyhedra, Ph.D. thesis, TU Berlin, Berlin, 2003.

[20] M. E. Pfetsch, Branch-and-cut for the maximum feasible subsystem problem, SIAM Journal on Optimization 19 (2008) 21–38.

[21] J. W. Chinneck, Fast heuristics for the maximum feasible subsystem problem, INFORMS Journal on Computing 13 (2001) 210–223.

[22] S. Jokar, M. E. Pfetsch, Exact and approximate sparse solutions of underdetermined linear equations, SIAM Journal on Scientific Computing 31 (2008) 23–44.

[23] S. S. Chen, D. L. Donoho, M. A. Saunders, Atomic decomposition by basis pursuit, SIAM Review 43 (2001) 129–159.

[24] J. W. Chinneck, Sparse solutions of linear systems via maximum feasible subsets, Les Cahiers du GERAD G-2018-104 (2018).

[25] R. Killick, P. Fearnhead, I. A. Eckley, Optimal detection of changepoints with a linear computational cost, Journal of the American Statistical Association 107 (2012) 1590–1598.

[26] J. W. Chinneck, Tailoring classifier hyperplanes to general metrics, in: Operations Research and Cyber-Infrastructure, Springer, 2009, pp. 365–387.

[27] J. W. Chinneck, Integrated classifier hyperplane placement and feature selection, Expert Systems with Applications 39 (2012) 8193–8203.

[28] A. P. D. Silva, Optimization approaches to supervised classification, European Journal of Operational Research 261 (2017) 772–788.

[29] D. Dua, C. Graff, UCI machine learning repository, 2017. URL: http://archive.ics.uci.edu/ml.

[30] Mosek ApS, The MOSEK optimization software, 2018. Last accessed 2021.

[31] E. J. Candès, M. B. Wakin, An introduction to compressive sampling, IEEE Signal Processing Magazine 25 (2008) 21–30.

[32] S. Qaisar, R. M. Bilal, W. Iqbal, M. Naureen, S. Lee, Compressive sensing: From theory to applications, a survey, Journal of Communications and Networks 15 (2013) 443–456.

[33] S. B. Meenakshi, A survey of compressive sensing based greedy pursuit reconstruction algorithms, International Journal of Image, Graphics and Signal Processing 7 (2015) 1–10.

[34] F. Salahdine, N. Kaabouch, H. El Ghazi, A survey on compressive sensing techniques for cognitive radio networks, Physical Communication 20 (2016) 61–73.
(Figures: per-dataset classification accuracy, and accuracy differences of Algorithms 2(2), 2E2(ℓ), E1(2), and E3(p) relative to Algorithm 2, over twenty UCI datasets: Vowel Recognition, Thyroid gland, Pima Indians Diabetes, Parkinsons, Ozone Level Detection, BUPA Liver Disorders, Ionosphere, Hepatitis Domain, Heart Disease Cleveland, Glass Identification, Diabetic Retinopathy Debrecen, Credit Approval, Climate Model Simulation Crashes, Car Evaluation (types 1–4 vs. others), Cardiotocography (types 1–2 vs. others), and Breast Cancer Wisconsin Original.)