Algorithmically generating new algebraic features of polynomial systems for machine learning
Dorian Florescu and Matthew England
Faculty of Engineering, Environment and Computing, Coventry University, Coventry, CV1 5FB, UK
{Dorian.Florescu, Matthew.England}@coventry.ac.uk

Abstract.
There are a variety of choices to be made in both computer algebra systems (CASs) and satisfiability modulo theory (SMT) solvers which can impact performance without affecting mathematical correctness. Such choices are candidates for machine learning (ML) approaches; however, there are difficulties in applying standard ML techniques, such as the efficient identification of ML features from input data, which is typically a polynomial system. Our focus is selecting the variable ordering for cylindrical algebraic decomposition (CAD), an important algorithm implemented in several CASs, and now also SMT solvers. We created a framework to describe all the previously identified ML features for the problem and then enumerated all options in this framework to automatically generate many more features. We validate the usefulness of these with an experiment which shows that an ML choice for CAD variable ordering is superior to those made by human-created heuristics, and is further improved with these additional features. We expect that this technique of feature generation could be useful for other choices related to CAD, or even choices for other algorithms with polynomial systems for input.
Keywords: machine learning; feature generation; non-linear real arithmetic; symbolic computation; cylindrical algebraic decomposition
Machine Learning (ML), that is, statistical techniques to give computer systems the ability to learn rules from data, is a topic that has found great success in a diverse range of fields over recent years. ML is most attractive when the underlying functional relationship to be modelled is complex or not well understood. Hence ML has yet to make a large impact in the fields which form SC², Symbolic Computation and Satisfiability Checking [1], since these prize mathematical correctness and seek to understand underlying functional relationships. However, as most developers would acknowledge, our software usually comes with a range of choices which, while having no effect on the correctness of the end result, could have a great effect on the resources required to find it. These choices range from the low level (in what order to perform a search that may terminate early) to the high (which of a set of competing exact algorithms to use for this problem instance). In making such choices we may be faced with decisions where relationships are not fully understood, but are not the key object of study.

In practice such choices may be made by man-made heuristics based on some experimentation (e.g. [18]) or magic constants where crossing a single threshold changes system behaviour [11]. It is likely that many of these decisions could be improved by allowing learning algorithms to analyse the data. The broad topic of this paper is ML for algorithm choices where the input is a set of polynomials, which encompasses a variety of tools in computer algebra systems and the SMT theory of [Quantifier Free] Non-Linear Real Arithmetic, [QF]NRA.

There has been little research on the use of ML in computer algebra: only [28], [27], [24] on the topic of CAD variable ordering choice; [26], [27] on the question of whether to precondition CAD with Groebner Bases; and [31] on deciding the order of sub-formulae solving for a QE procedure.
Within SMT there has been significant work on the Boolean logic side, e.g. the portfolio SAT solver SATZilla [45] and MapleSAT [33], which views solver branching as an optimisation problem. However, there is little work on the use of ML to choose or optimise theory solvers. We note that other fields of mathematical software are ahead in the use of ML, most notably the automated reasoning community (see e.g. [42], [32], [7], or the brief survey in [19]).
There are difficulties in applying standard ML techniques to problems in NRA. One is the lack of sufficiently large datasets, which is addressed only partially by the SMT-LIB. The experiment in [26] found the [QF]NRA sections of the SMT-LIB too uniform, and had to resort to randomly generated examples (although the state of benchmarking in computer algebra is far worse [22]). There have been improvements since then, with the benchmarks increasing both in number and diversity of underlying application. For example, there are now problems arising from biology [4], [23] and economics [37], [38].

Another difficulty is the identification of suitable features from the input with which to train the ML models. There are some obvious candidates concerning the size and degrees of polynomials, and the distribution of variables. However, this provides a starting set (i.e. before any feature selection takes place) that is small in comparison to other machine learning applications. The main focus of this paper is to introduce a method to automatically (and cheaply) generate further features for ML from polynomial systems.
Our main contributions are the new feature generation approach described in Section 3 and the validation of its use in the experiments described in Sections 4–5. The experiments are for the choice of variable ordering for cylindrical algebraic decomposition, a topic whose background we first present in Section 2, but we emphasise that the techniques may be applicable more broadly.

A Cylindrical Algebraic Decomposition (CAD) is a decomposition of ordered R^n space into cells arranged cylindrically: the projections of any pair of cells with respect to the variable ordering are either equal or disjoint. The projections form an induced CAD of the lower dimensional space. The cells are (semi)-algebraic, meaning each can be described with a finite sequence of polynomial constraints. A CAD is produced to be truth-invariant for a logical formula (so the formula is either true or false on each cell). Such a decomposition can then be used to perform Quantifier Elimination (QE) over the reals, i.e. given a quantified Tarski formula, find an equivalent quantifier-free formula over the reals. For example, QE would transform ∃x, ax^2 + bx + c = 0 ∧ a ≠ 0 to the equivalent unquantified statement b^2 − 4ac ≥ 0. A CAD over the (x, a, b, c)-space could be used to ascertain this, so long as the variable ordering ensured that there was an induced CAD of (a, b, c)-space. We test one sample point per cell and construct a quantifier-free formula from the relevant semi-algebraic cell descriptions.

CAD was introduced by Collins in 1975 [15] and works relative to a set of polynomials. Collins' CAD produces a decomposition so that each polynomial has constant sign on each cell (thus truth-invariant for any formula built with those polynomials). The algorithm first projects the polynomials into smaller and smaller dimensions; and then uses these to lift, to incrementally build decompositions of larger and larger spaces according to the polynomials at that level. For further details on CAD see for example the collection [12].

QE has numerous applications throughout science and engineering [41]. Our work also speeds up independent applications of CAD, such as reasoning with multi-valued functions [17] or motion planning [44].

The definition of cylindricity and both stages of the algorithm are relative to an ordering of the variables. For example, given polynomials in variables ordered as x_n ≻ x_{n−1} ≻ ... ≻ x_2 ≻ x_1, we first project away x_n and so on until we are left with polynomials univariate in x_1. We then start lifting by decomposing the x_1-axis, then the (x_1, x_2)-plane, and so on. The cylindricity condition refers to projections of cells in R^n onto a space (x_1, ..., x_m) where m < n.

There have been numerous advances to CAD since its inception: new projection schemes [34], [36]; partial construction [16], [43]; symbolic-numeric lifting [40], [29]; adapting to the Boolean structure [5], [20]; and adaptations for SMT [30], [9]. However, in all cases, the need for a fixed variable ordering remains. Depending on the application, the variable ordering may be determined, constrained, or free.
QE requires that quantified variables are eliminated first and that variables are eliminated in the order in which they are quantified. However, variables in blocks of the same quantifier (and the free variables) can be swapped, so there is partial freedom. Of course, in the SMT context there is only a single existentially quantified block and so there is a free choice of ordering. So the discriminant in the example above could have been found with any CAD which eliminates x first. A CAD for the quadratic polynomial under ordering a ≺ b ≺ c has only 27 cells, but needs 115 for the reverse ordering.

Since we can switch the order of quantified variables in a statement when the quantifier is the same, we also have some choice on the ordering of quantified variables. For example, a QE problem of the form ∃x ∃y ∀a φ(x, y, a) could be solved by a CAD under either ordering x ≻ y ≻ a or ordering y ≻ x ≻ a.

The choice of variable ordering can have a great effect on the time and memory use of CAD, and on the number of cells in the output. Further, Brown and Davenport presented a class of problems in which one variable ordering gave output of double exponential complexity in the number of variables and another gave output of a constant size [10].

Heuristics have been developed to choose a variable ordering, with Dolzmann et al. [18] giving the best known study. After analysing a variety of metrics they proposed a heuristic, sotd, which constructs the full set of projection polynomials for each permitted ordering and selects the ordering whose corresponding set has the lowest sum of total degrees for the monomials in the polynomials. The second author demonstrated examples for which that heuristic could be misled in [6]; and then later showed that tailoring to an implementation could improve performance [21].
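The quantifier elimination result quoted above, that the quadratic has a real root exactly when the discriminant b^2 − 4ac is non-negative, can be sanity-checked numerically. A minimal sketch, our own code rather than anything from the paper, assuming numpy is available:

```python
import numpy as np

rng = np.random.default_rng(0)

def exists_real_root(a, b, c):
    """True iff a*x^2 + b*x + c = 0 has a real solution x."""
    roots = np.roots([a, b, c])
    return bool(np.any(np.abs(roots.imag) < 1e-9))

for _ in range(1000):
    a, b, c = rng.uniform(-5, 5, size=3)
    disc = b * b - 4 * a * c
    if abs(a) < 1e-6 or abs(disc) < 1e-6:
        continue  # skip near-degenerate cases where floating point is unreliable
    # QE: a real root exists exactly when b^2 - 4ac >= 0 (given a != 0)
    assert exists_real_root(a, b, c) == (disc >= 0)
print("discriminant equivalence confirmed on random samples")
```

Near-degenerate inputs are skipped because floating-point root finding is unreliable there; CAD-based QE has no such caveat, which is part of its appeal.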
These heuristics all involved potentially costly projection operations on the input polynomials.

In [28] the second author of the present paper collaborated to use a support vector machine to choose which of three human-made heuristics to believe when picking the variable ordering, based only on simple features of the input polynomials. The experiments identified substantial subclasses on which each of the three heuristics made the best decision, and demonstrated that the machine learned choice did significantly better than any one heuristic overall. This work was picked up again in [24] by the present authors, where ML was used to predict directly the variable ordering for CAD leading to the shortest computing time, with experiments conducted for four different ML models.

Both [28] and [24] used a set of 11 human-identified features. These did lead to good performance of the models, with ML outperforming the prior human-created heuristics, but a starting set of 11 features is relatively small for ML and so we hypothesise that identifying more would improve the results.

An early heuristic for the choice of CAD variable ordering is that of Brown [8], which chooses a variable ordering according to the following criteria, starting with the first and breaking ties with successive ones.

(1) Eliminate a variable first if it appears with the lowest overall degree in the input.
(2) For each variable calculate the maximum total degree for the set of terms in the input in which it occurs.
Eliminate first the variable for which this is lowest.
(3) Eliminate a variable first if there is a smaller number of terms in the input which contain the variable.

Despite being computationally cheaper than the sotd heuristic (because the latter performs projections before measuring degrees), experiments in [28] suggested this simpler measure actually performs slightly better, although the key message from those experiments is that there were substantial subsets of problems for which each heuristic made a better choice than the others.

The Brown heuristic inspired almost all the features used by the authors of [28], [24] to perform ML for CAD variable ordering, with the full set of 11 features listed in Table 1 (column 3 will be explained later).
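Brown's three criteria can be computed directly from the monomial data, without any projections. A sketch in Python (our own helper names, not the authors' code), storing each polynomial as a list of (coefficient, (d1, d2, d3)) pairs as formalised in the next section:

```python
def brown_key(polys, v):
    """Brown's three measures for variable index v (0-based)."""
    degs = [d[v] for p in polys for _, d in p]           # degree of x_v per monomial
    occ = [d for p in polys for _, d in p if d[v] > 0]   # monomials containing x_v
    return (
        max(degs),                               # (1) overall degree of x_v
        max((sum(d) for d in occ), default=0),   # (2) max total degree of terms with x_v
        len(occ),                                # (3) number of terms containing x_v
    )

def brown_order(polys, n_vars=3):
    # eliminate first the variable with the smallest key (ties broken by later criteria)
    return sorted(range(n_vars), key=lambda v: brown_key(polys, v))

# Pr = {x1*x2 - x3, x1*x2*x3 + x1*x3}
polys = [
    [(1, (1, 1, 0)), (-1, (0, 0, 1))],
    [(1, (1, 1, 1)), (1, (1, 0, 1))],
]
print(brown_order(polys))  # → [1, 0, 2]: x2 is eliminated first
```

Sorting by the tuple of criteria implements the tie-breaking: the first criterion dominates, later criteria only break ties.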
Our new feature generation procedure is based on the observation that all the measurements taken by the Brown heuristic, and all those features used in [28], [24], can be formalised mathematically using a small number of functions. For simplicity, the following discussion will be restricted to polynomials in 3 variables, as these were used in the following experiments, but everything generalises in an obvious way to n variables.

Let a problem instance Pr be defined by a set of P polynomials

    Pr = { P_p | p = 1, ..., P }.    (1)

This is the case for producing a sign-invariant CAD. Of course, any problem instance consisting of a logical formula whose atoms are polynomial sign conditions can also have such a set extracted.
Table 1. Features used by ML in [28] to choose the ordering of 3 variables for CAD.

 1  Number of polynomials                          P
 2  Maximum total degree of polynomials            max_{m,p} Σ_v d_v^{m,p}
 3  Maximum degree of x_1 among all polynomials    max_{m,p} d_1^{m,p}
 4  Maximum degree of x_2 among all polynomials    max_{m,p} d_2^{m,p}
 5  Maximum degree of x_3 among all polynomials    max_{m,p} d_3^{m,p}
 6  Proportion of x_1 occurring in polynomials     av_p( sgn( Σ_m d_1^{m,p} ) )
 7  Proportion of x_2 occurring in polynomials     av_p( sgn( Σ_m d_2^{m,p} ) )
 8  Proportion of x_3 occurring in polynomials     av_p( sgn( Σ_m d_3^{m,p} ) )
 9  Proportion of x_1 occurring in monomials       av_{m,p}( sgn( d_1^{m,p} ) )
10  Proportion of x_2 occurring in monomials       av_{m,p}( sgn( d_2^{m,p} ) )
11  Proportion of x_3 occurring in monomials       av_{m,p}( sgn( d_3^{m,p} ) )

In the following we define the notation for polynomial variables and coefficients that will be used throughout the manuscript. Each polynomial with index p, for p = 1, ..., P, contains a different number of monomials, which will be labelled with index m, where m = 1, ..., M_p and M_p denotes the number of monomials in polynomial p. We note that these are just labels and are not setting an ordering themselves. The degrees corresponding to each of the variables x_1, x_2, x_3 are a function of m and p. These need to be explicitly labelled in order to allow a rigorous definition of our proposed procedure of feature generation. We next write each polynomial as

    P_p = Σ_{m=1}^{M_p} c^{m,p} · x_1^{d_1^{m,p}} x_2^{d_2^{m,p}} x_3^{d_3^{m,p}},   p = 1, ..., P.    (2)

Here, x_v represents the polynomial variables (v = 1, 2, 3), and each monomial carries a tuple (m, p) of positive integers that label it. We denote by d_v^{m,p} the degree of variable x_v in that monomial, and by c^{m,p} the constant coefficient; i.e., tuple superscripts give a label for a monomial in a problem. The original indices are simply a labelling and not an ordering of the variables x_1, x_2, x_3. Therefore, any one of our problem instances
Pr is uniquely represented by a set of sets

    S_Pr = { { [ c^{m,p}, (d_1^{m,p}, d_2^{m,p}, d_3^{m,p}) ] | m = 1, ..., M_p } | p = 1, ..., P }.    (3)

Observe now that each of Brown's measures can be formalised as a vector of features for choosing a variable as follows.

(1) Overall degree in the input of a variable: max_{m,p} d_v^{m,p}.
(2) Maximum total degree of those terms in the input in which a variable occurs: max_{m,p} sgn(d_v^{m,p}) · (d_1^{m,p} + d_2^{m,p} + d_3^{m,p}).
(3) Number of terms in the input which contain the variable: Σ_{m,p} sgn(d_v^{m,p}).

In the latter two we use the sign function to discriminate between monomials which contain a variable (sign of degree is positive) and those which do not (sign of degree is zero). Of course, the sign of the degree is never negative.

Define now also the averaging functions

    av_m := (1/M_p) Σ_{m=1}^{M_p},   av_p := (1/P) Σ_{p=1}^{P},   av_{m,p} := (1 / Σ_{p=1}^{P} M_p) Σ_{p=1}^{P} Σ_{m=1}^{M_p}.

Then the features in Table 1 can be formalised similarly to Brown's metrics, as shown in the third column of Table 1. We can place all of these scalars into a single framework:

    f(Pr) = (g_4 ∘ g_3 ∘ g_2 ∘ g_1 ∘ h^{m,p})(Pr),    (4)

where

    h^{m,p}(Pr) ∈ { d_v^{m,p}, sgn(d_v^{m,p}) · (Σ_{v'} d_{v'}^{m,p}) | v = 1, 2, 3 }

and g_1, g_2, g_3, and g_4 are all taken from the set

    { max_p, max_m, max_{m,p}, max_∅, Σ_p, Σ_m, Σ_{m,p}, Σ_∅, av_p, av_m, av_{m,p}, av_∅, sgn, sgn_∅ }.

In the above set max_∅, Σ_∅, av_∅ and sgn_∅ are all equal to the identity function. For example, let Pr = { x_1 x_2 − x_3, x_1 x_2 x_3 + x_1 x_3 }. If m = 1, p = 2, then

    h^{1,2}(Pr) ∈ { d_v^{1,2}, sgn(d_v^{1,2}) · (Σ_{v'} d_{v'}^{1,2}) | v = 1, 2, 3 } = { 1, 1·3, 1, 1·3, 1, 1·3 }.

We will thus consider deriving all of the other features which fall into this framework, but to do so we must first impose a number of rules.

1. The functions g_1, g_2, g_3, g_4 must all belong to distinct categories of function, i.e. one each of max, Σ, av, and sgn.
2. Exactly one of the functions g_1, g_2, g_3, g_4 is computed over p and exactly one is computed over m (it may be the same one).
3. The computation over p is always performed by a function g_i with an index i greater than or equal to that of the function computing over m.

Table 2. Possible distributions of the indices p, m to the function classes max, av, Σ, sgn in the feature framework (12 distributions in total).

f(Pr) can be interpreted as follows. The values h^{m,p}(Pr) are functions of the variables m and p. Each of the functions g_1, g_2, g_3, g_4 either leaves the function unchanged, or turns it into a function of fewer variables (first into a function of p only, and then into a scalar value, representing the ML feature).

The rules above are justified as follows. Rule 1 reduces the redundancy in the feature set. Rules 2 and 3 guarantee that the feature f_v(Pr) is well defined and is a scalar number. In particular, Rule 3 is necessary because the computation over the terms in a polynomial is dependent on their number, which is not the same for all polynomials.

The final set { f^{(1)}(Pr), ..., f^{(N_f)}(Pr) } has size N_f = 1728 for a problem with 3 variables. This number is attained as follows: we have 12 possible distributions of indices to the functions g_1, ..., g_4, as shown in Table 2; then 4! possible orderings of those functions; and 6 possible choices for h. In total, 4! · 12 · 6 = 1728.

However, many of these features will be identical (e.g. due to a different placement of the identity function). We do not identify these manually now: a task that is trivial for a given dataset, but substantial to do in generality.
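As a concrete illustration of the framework, the evaluation of h^{1,2} on the example problem above can be replayed in code (our own sketch; indices are 0-based in code versus the paper's 1-based labels):

```python
def sgn(x):
    return (x > 0) - (x < 0)

# Pr = {x1*x2 - x3, x1*x2*x3 + x1*x3}, stored as (coefficient, (d1, d2, d3)) pairs
polys = [
    [(1, (1, 1, 0)), (-1, (0, 0, 1))],
    [(1, (1, 1, 1)), (1, (1, 0, 1))],
]

def h_values(polys, m, p):
    """Candidate values of h^{m,p}: d_v and sgn(d_v) * (total degree), v = 1..3."""
    _, d = polys[p][m]
    total = sum(d)
    return [d[v] for v in range(3)] + [sgn(d[v]) * total for v in range(3)]

# the paper's m = 1, p = 2 (the monomial x1*x2*x3) is (m=0, p=1) here
print(set(h_values(polys, m=0, p=1)))  # → {1, 3}
```

Composing four g functions over such values, subject to the rules above, then yields one scalar ML feature per valid combination.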
We now describe an ML experiment to choose the variable ordering for cylindrical algebraic decomposition. The methodology here is similar to that in our recent paper [24], except for the addition of the extra features from Section 3. A more detailed discussion of the methodology can be found in [24].
We use the nlsat dataset produced to evaluate the work in [30]; thus the problems are all fully existentially quantified. Although there are CAD algorithms that reduce what is being computed based on the quantifiers in the input (most notably via Partial CAD [16]), the conclusions drawn are likely to be applicable outside of the SAT context.

We use the 6117 problems with 3 variables from this database, so each has a choice of six different variable orderings. We extracted only the polynomials involved, and randomly divided them into two datasets for training (4612) and testing (1505). Only the former was used to tune the parameters of the ML models. We used the CAD routine
CylindricalAlgebraicDecompose, which is part of the RegularChains Library for Maple. This algorithm builds decompositions first of n-dimensional complex space before refining to a CAD of R^n [14], [13], [3]. We ran the code in Maple 2018 but used an updated version of the RegularChains Library. The ML models were implemented in the scikit-learn package [39] v0.20.2 for Python 2.7. The features for ML were extracted using code written with the sympy package v1.3 for Python 2.7, as was Brown's heuristic. The sotd heuristic was implemented in Maple as part of the ProjectionCAD package [25].
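The monomial representation (3) that the features are computed from can be read off with sympy, the package used here for feature extraction; this particular helper is our own sketch, not the project's code:

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')

def to_monomial_sets(polys):
    """Represent each polynomial as a list of [coefficient, (d1, d2, d3)] pairs,
    i.e. the set-of-sets representation (3)."""
    out = []
    for f in polys:
        poly = sp.Poly(f, x1, x2, x3)
        # monoms() gives exponent tuples; coeffs() gives the matching coefficients
        out.append([[c, degs] for degs, c in zip(poly.monoms(), poly.coeffs())])
    return out

S = to_monomial_sets([x1*x2 - x3, x1*x2*x3 + x1*x3])
print(S)
```

All of the generated features are then arithmetic over these integer tuples, so extraction remains cheap regardless of how many features are enumerated.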
CAD construction was timed in a Maple script that was called separately from Python for each CAD (to avoid Maple's caching of results). The target variable ordering for ML was defined as the one that minimises the computing time for a given problem. All CAD function calls included a time limit. For the training dataset an initial time limit of 4 seconds was used, doubled incrementally if all orderings timed out, until CAD completed for at least one ordering (a target variable ordering could be assigned for all problems using time limits no bigger than 64 seconds). The problems in the testing dataset were processed with a larger time limit of 128 seconds for all orderings (time outs set as 128s). (The nlsat dataset is freely available from http://cs.nyu.edu/~dejan/nonlinear/.)

When computed on a set of problems { Pr_1, ..., Pr_N }, some of the features f^{(i)} turn out to be constant, i.e. f^{(i)}(Pr_1) = f^{(i)}(Pr_2) = ... = f^{(i)}(Pr_N). Such features will have no benefit for ML and are removed. Further, other features may be repetitive, i.e. f^{(i)}(Pr_n) = f^{(j)}(Pr_n) for all n = 1, ..., N. This repetition may represent a mathematical equality, or just be the case for the given dataset. Either way, they are merged into a single feature for the experiment. After this step, we are left with 78 features: so while a large majority were redundant, we still have seven times those available in [28], [24].
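The pruning just described, dropping features that are constant on the dataset and merging duplicates, is straightforward on the problem-by-feature matrix. A sketch with numpy (our own code):

```python
import numpy as np

def prune_features(F):
    """F is n_problems x n_features; drop constant columns, merge duplicates."""
    keep, seen = [], set()
    for j in range(F.shape[1]):
        col = F[:, j]
        if np.all(col == col[0]):
            continue                  # constant on the dataset: no information for ML
        key = col.tobytes()
        if key in seen:
            continue                  # identical to an already-kept feature: merge
        seen.add(key)
        keep.append(j)
    return F[:, keep], keep

F = np.array([[1, 1, 2, 2],
              [1, 3, 4, 4],
              [1, 5, 6, 6]])
F2, kept = prune_features(F)
print(kept)  # → [1, 2]: column 0 is constant, column 3 duplicates column 2
```

Note that a duplicate detected this way may or may not reflect a mathematical identity; as in the paper, the merge is made per dataset.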
Feature selection was performed with the training dataset to see if any features were redundant for the ML. We chose the Analysis of Variance (ANOVA) F-value to determine the importance of each feature for the classification task. Other choices we considered were unsuitable for our problem, e.g. mutual information based selection requires very large amounts of data.

The training dataset consists of N = 4612 problems with 3 variables, and each problem is assigned a target ordering, or class c = 1, ..., C, where C = 6. Let Pr_{c,n} denote problem number n from the training dataset that is assigned class number c, for c = 1, ..., C and n = 1, ..., N_c, where N_c denotes the number of problems that are assigned class c. Thus Σ_{c=1}^{C} N_c = N. The F-value for feature number i is computed as follows [35]:

    F_i = [ (1/(C−1)) Σ_{c=1}^{C} N_c ( f̄_c^{(i)} − f̄^{(i)} )² ] / [ (1/(N−C)) Σ_{c=1}^{C} Σ_{n=1}^{N_c} ( f^{(i)}(Pr_{c,n}) − f̄_c^{(i)} )² ],    (5)

where f̄_c^{(i)} is the sample mean in class c, and f̄^{(i)} the overall mean of the data:

    f̄_c^{(i)} = (1/N_c) Σ_{n=1}^{N_c} f^{(i)}(Pr_{c,n}),   f̄^{(i)} = (1/N) Σ_{c=1}^{C} Σ_{n=1}^{N_c} f^{(i)}(Pr_{c,n}).

The numerator in (5) represents the between-class variability, or explained variance, and the denominator the within-class variability, or unexplained variance. Of the 78 features, the three with the highest F-values were the following (for the relevant variable x_v in each case):

    f_65(Pr) = max_∅ av_{m,p} Σ_∅ sgn(d_v^{m,p}) = (1 / Σ_p M_p) Σ_p Σ_m sgn(d_v^{m,p}),
    f_46(Pr) = max_∅ Σ_p av_m sgn(d_v^{m,p}) · (Σ_{v'} d_{v'}^{m,p}) = Σ_p (1/M_p) Σ_m sgn(d_v^{m,p}) · (Σ_{v'} d_{v'}^{m,p}),
    f_76(Pr) = av_∅ Σ_p max_m sgn(d_v^{m,p}) · (Σ_{v'} d_{v'}^{m,p}) = Σ_p max_m sgn(d_v^{m,p}) · (Σ_{v'} d_{v'}^{m,p}).
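Equation (5) can be implemented directly. The sketch below (our own code, on toy data) computes the standard one-way ANOVA F statistic, i.e. the same quantity reported by scipy.stats.f_oneway or scikit-learn's f_classif:

```python
import numpy as np

def f_value(x, y):
    """ANOVA F-value (5) of one feature x against class labels y."""
    classes = np.unique(y)
    C, N = len(classes), len(x)
    overall = x.mean()
    # between-class (explained) variability
    between = sum(np.sum(y == c) * (x[y == c].mean() - overall) ** 2
                  for c in classes) / (C - 1)
    # within-class (unexplained) variability
    within = sum(np.sum((x[y == c] - x[y == c].mean()) ** 2)
                 for c in classes) / (N - C)
    return between / within

# toy data: two well-separated classes -> a large F-value
x = np.array([1.0, 1.2, 0.9, 3.1, 2.9, 3.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(f_value(x, y))  # → 348.1 (up to rounding)
```

A large F-value indicates that the feature's class means differ much more than the scatter within classes, so the feature is useful for discriminating the target orderings.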
Table 3. The ML hyperparameters used following optimisation on the training dataset.
Model                    Hyperparameter               Value
Decision Tree            Criterion                    Gini impurity
                         Maximum tree depth           17
K-Nearest Neighbours     Train instances weighting    Inversely proportional to distance
                         Algorithm                    Ball Tree
Support Vector Machine   Regularization parameter C
                         γ
Multi-Layer Perceptron   α

The new features may be translated back into natural language. For example, feature 65 is the proportion of monomials containing the variable x_v, averaged across all polynomials; feature 46 is the sum of the degrees of the variables in all monomials containing x_v, averaged across all monomials and summed across all polynomials; and feature 76 is the maximum sum of the degrees of the variables in all monomials containing x_v, summed across all polynomials.

Feature selection did not suggest removing any features (they all contributed meaningful information), so we proceed with our experiment using all 78.

Four of the most commonly used deterministic ML models were tuned on the training data (for details on the methods see e.g. the textbook [2]):
– The K-Nearest Neighbours (KNN) classifier [2];
– The Multi-Layer Perceptron (MLP) classifier [2];
– The Decision Tree (DT) classifier [2];
– The Support Vector Machine (SVM) classifier with RBF kernel [2].

The ML approaches were compared in terms of prediction accuracy and resulting CAD computing time against the two best known human-constructed heuristics [8], [18], as discussed earlier. Unlike the ML, these can end up predicting several variable orderings (i.e. when they cannot discriminate). In practice, if this were to happen, the heuristic would select one randomly (or perhaps lexicographically), but that final pick is not meaningful. To accommodate this, for each problem the prediction accuracy of such a heuristic is judged to be the percentage of its predicted variable orderings that are also target orderings. The average of this percentage over all problems in the testing dataset represents the prediction accuracy.
Similarly, the computing time for such methods was assessed as the average computing time over all predicted orderings, and it is this that is summed over all problems in the testing dataset.
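The scoring of a heuristic that returns several tied orderings, as described above, can be sketched as follows (names and timings here are illustrative, not data from the paper):

```python
def score_problem(predicted, targets, times):
    """predicted: orderings returned by the heuristic; targets: the optimal
    orderings for this problem; times: CAD computing time per ordering."""
    # accuracy: fraction of the predictions that are target orderings
    acc = sum(o in targets for o in predicted) / len(predicted)
    # time: average CAD time over the predicted orderings
    time = sum(times[o] for o in predicted) / len(predicted)
    return acc, time

times = {'abc': 4.0, 'acb': 9.0, 'bac': 4.0}
acc, t = score_problem(predicted=['abc', 'acb'], targets={'abc', 'bac'}, times=times)
print(acc, t)  # → 0.5 6.5
```

Averaging over the tied predictions avoids rewarding or penalising a heuristic for an arbitrary final pick it does not actually make.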
The results are presented in Table 4. We compare the four ML models on the percentage of problems where they selected the optimum ordering, and on the total computation time (in seconds) for solving all the problems with their chosen orderings. The first two rows reproduce the results of [24], which used only the 11 features from Table 1, while the latter two rows are the results from the new experiment in the present paper, which has 78 features. We also compare with the two human-constructed heuristics and with the outcome of a random choice between the 6 orderings (which do not change with the number of features). We might expect a random choice to be correct one sixth of the time, but it is higher, as for some problems there were multiple variable orderings with equally fast timings. We also consider the distribution of the computation times: the differences between the computation time of each method and the minimum computation time, given as a percentage of the minimum time, are depicted in Figure 1.
The minimum total computing time, achieved if we select an optimal ordering for every problem, is 8,623s. Choosing at random would take 30,235s, almost 4 times as much. The maximum time, if we selected the worst ordering for every problem, is 64,534s. The K-Nearest Neighbours model achieved the shortest time of our models and heuristics, with 9,178s, only 6% more than the minimal possible.
Table 4. The comparative performance of DT, KNN, MLP, SVM, and the Brown and sotd heuristics on the testing dataset for the present experiment and the one in [24].

                           DT      KNN     MLP     SVM     Brown   sotd    rand
From [24]      Accuracy            63.3%   61.6%   58.8%   51%     49.5%
               Time (s)
(78 Features)  Accuracy            66.3%   67%     65%
               Time (s)

[Figure 1: six histograms of problem count, one per method (Decision Tree, K-Nearest Neighbors, Multi Layer Perceptron, Support Vector Machine, Brown heuristic, sotd heuristic).]

Fig. 1. The histograms of the percentage increase in computation time relative to the minimum computation time for each method, calculated for a bin size of 1%.
Since they are not affected by the new feature framework of the present paper, the findings on the human-made heuristics are the same as in [24]. Of the two human-made heuristics, Brown performed the best, which is surprising since the sotd heuristic has access to additional information (not just the input polynomials but also their projections). Obtaining an ordering for a problem instance with sotd hence takes longer than for Brown or any ML model: generating an ordering with sotd for all problems in the testing dataset took over 30 minutes. Using Brown we can solve all problems in 10,951s, 27% more than the minimum. While sotd is only 0.7% less accurate than Brown in identifying the best ordering, it is much slower at 11,938s, or 38% more than the minimum. So, while Brown is not much better at identifying the best ordering, it is much better at discarding the worst!
The results show that all ML approaches outperform the human-constructed heuristics in terms of both accuracy and timings. Moreover, the results show that the new algorithm for generating features leads to a clear improvement in ML performance compared to using only the small number of human-generated features in [24]. For all four models, accuracy has increased and computation time decreased. The best achieved time was 14% above the minimum using the original 11 features, but now only 6% above with the new features.
The computing time for all the methods lies between the best (8,623s) and the worst (64,534s). Therefore, if we scale this time to [0, 1], with 1 for the best, the Brown heuristic lies at 0.96 and the best ML method (KNN) lies at 0.99. So using ML allows us to be 4 times closer to the minimum possible computing time.

Figure 1 shows that the human-made heuristics result in computing times that are often significantly larger than 1% of the corresponding minimum time for each problem. The ML methods, on the other hand, all result in over 1000 problems (around 75% of the testing dataset) within 1% of the minimum time.
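Using the total timings quoted in this section (8,623s best possible, 64,534s worst, 10,951s for Brown, 9,178s for KNN), the scaling onto [0, 1] works out as follows (our own quick check):

```python
best, worst = 8623.0, 64534.0

def scaled(t):
    """Map a total computing time onto [0, 1], with 1 the best and 0 the worst."""
    return (worst - t) / (worst - best)

print(round(scaled(10951), 2))  # Brown heuristic → 0.96
print(round(scaled(9178), 2))   # KNN → 0.99
```

Equivalently, KNN's excess over the minimum (555s) is roughly a quarter of Brown's (2,328s), hence "4 times closer".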
In this experiment the MLP and KNN models offered the best performance, and a clear advance on the prior state of the art. But we acknowledge that there is much more to do and emphasise that these are only the initial findings of the project; we need to see if the findings are replicated. Planned extensions include: expanding the dataset to problems with more variables and quantifier structure; trying different feature selection techniques; and seeing if classifiers trained for the Maple CAD may be applied to other implementations.

Our main result is that a great many more features can be obtained trivially from the input (i.e. without any projection operations) than previously thought, and that these are relevant and lead to better ML choices. Some of these are easy to express in natural language, such as the number of polynomials containing a certain variable, but others do not have an obvious interpretation. This is important because something that is hard to describe in natural language is unlikely to be suggested by a human as a feature, which illustrates the benefit of our framework. This contribution to feature extraction for algebraic problems should be more widely applicable than the CAD variable ordering decision.
Acknowledgements
This work is supported by EPSRC Project EP/R019622/1: Embedding Machine Learning within Quantifier Elimination Procedures.

References
1. ´Abrah´am, E., et al.: SC : Satisfiability checking meets symbolic computation. In:Proc. CICM ’16, LNCS 9791, pp. 28–43. Springer (2016)2. Bishop, C.: Pattern Recognition and Machine Learning. Springer (2006)3. Bradford, R., Chen, C., Davenport, J., England, M., Moreno Maza, M., Wilson,D.: Truth table invariant cylindrical algebraic decomposition by regular chains. In:Proc. CASC ’14, LNCS 8660, pp. 44–58. Springer (2014)4. Bradford, R., et al.: A case study on the parametric occurrence of multiple steadystates. In: Proc. ISSAC ’17, pp. 45–52. ACM (2017)5. Bradford, R., Davenport, J., England, M., McCallum, S., Wilson, D.: Truth tableinvariant cylindrical algebraic decomposition. J. Symb. Comp. , 1–35 (2016)4 D. Florescu and M. England6. Bradford, R., Davenport, J., England, M., Wilson, D.: Optimising problem for-mulations for cylindrical algebraic decomposition. In: Intelligent Computer Math-ematics, LNCS 7961, pp. 19–34. Springer Berlin Heidelberg (2013)7. Bridge, J., Holden, S., Paulson, L.: Machine learning for first-order theorem prov-ing. Journal of Automated Reasoning , 299–328 (1991)17. Davenport, J., Bradford, R., England, M., Wilson, D.: Program verification in thepresence of complex numbers, functions with branch cuts etc. In: Proc. SYNASC’12, pp. 83–88. IEEE (2012)18. Dolzmann, A., Seidl, A., Sturm, T.: Efficient projection orders for CAD. In: Proc.ISSAC ’04, pp. 111–118. ACM (2004)19. England, M.: Machine learning for mathematical software. In: Mathematical Soft-ware – Proc. ICMS ’18. LNCS 10931, pp. 165–174. Springer (2018)20. England, M., Bradford, R., Davenport, J.: Improving the use of equational con-straints in cylindrical algebraic decomposition. In: Proc. ISSAC ’15, pp. 165–172.ACM (2015)21. England, M., Bradford, R., Davenport, J., Wilson, D.: Choosing a variable orderingfor truth-table invariant CAD by incremental triangular decomposition. In: Proc.ICMS ’14, LNCS 8592, pp. 450–457. Springer (2014)22. 
England, M., Davenport, J.: Experience with heuristics, benchmarks & standards for cylindrical algebraic decomposition. In: Proc. SC² '16. CEUR-WS (2016)
23. England, M., Errami, H., Grigoriev, D., Radulescu, O., Sturm, T., Weber, A.: Symbolic versus numerical computation and visualization of parameter regions for multistationarity of biological networks. In: Computer Algebra in Scientific Computing (CASC '17), LNCS 10490, pp. 93–108. Springer (2017)
24. England, M., Florescu, D.: Comparing machine learning models to choose the variable ordering for cylindrical algebraic decomposition. To appear in Proc. CICM '19 (Springer LNCS) (2019). Preprint: https://arxiv.org/abs/1904.11061
25. England, M., Wilson, D., Bradford, R., Davenport, J.: Using the Regular Chains Library to build cylindrical algebraic decompositions by projecting and lifting. In: Mathematical Software – ICMS '14, LNCS 8592, pp. 458–465. Springer (2014)
26. Huang, Z., England, M., Davenport, J., Paulson, L.: Using machine learning to decide when to precondition cylindrical algebraic decomposition with Groebner bases. In: Proc. SYNASC '16, pp. 45–52. IEEE (2016)
27. Huang, Z., England, M., Wilson, D., Bridge, J., Davenport, J., Paulson, L.: Using machine learning to improve cylindrical algebraic decomposition. Mathematics in Computer Science, volume to be assigned, 28 pages (2019)
28. Huang, Z., England, M., Wilson, D., Davenport, J., Paulson, L., Bridge, J.: Applying machine learning to the problem of choosing a heuristic to select the variable ordering for cylindrical algebraic decomposition. In: Intelligent Computer Mathematics, LNAI 8543, pp. 92–107. Springer International (2014)
29. Iwane, H., Yanami, H., Anai, H., Yokoyama, K.: An effective implementation of a symbolic-numeric cylindrical algebraic decomposition for quantifier elimination. In: Proc. SNC '09, pp. 55–64 (2009)
30. Jovanović, D., de Moura, L.: Solving non-linear arithmetic.
In: Gramlich, B., Miller, D., Sattler, U. (eds.) Automated Reasoning – Proc. IJCAR '12, LNCS 7364, pp. 339–354. Springer (2012)
31. Kobayashi, M., Iwane, H., Matsuzaki, T., Anai, H.: Efficient subformula orders for real quantifier elimination of non-prenex formulas. In: Proc. MACIS '15, LNCS 9582, pp. 236–251. Springer International Publishing (2016)
32. Kühlwein, D., Blanchette, J., Kaliszyk, C., Urban, J.: MaSh: Machine learning for Sledgehammer. In: Interactive Theorem Proving, LNCS 7998, pp. 35–50. Springer Berlin Heidelberg (2013)
33. Liang, J., Hari Govind, V., Poupart, P., Czarnecki, K., Ganesh, V.: An empirical study of branching heuristics through the lens of global learning rate. In: Proc. SAT '17, LNCS 10491, pp. 119–135. Springer (2017)
34. McCallum, S.: An improved projection operation for cylindrical algebraic decomposition. In: [12], pp. 242–268 (1998)
35. Markowski, C.A., Markowski, E.P.: Conditions for the effectiveness of a preliminary test of variance. The American Statistician 44(4), 322–326 (1990)
36. McCallum, S., Parusiński, A., Paunescu, L.: Validity proof of Lazard's method for CAD construction. Journal of Symbolic Computation 92, 52–69 (2019)
37. Mulligan, C., Bradford, R., Davenport, J., England, M., Tonks, Z.: Non-linear real arithmetic benchmarks derived from automated reasoning in economics. In: Proc. SC² '18, pp. 48–60. CEUR-WS (2018)
38. Mulligan, C., Davenport, J., England, M.: TheoryGuru: A Mathematica package to apply quantifier elimination technology to economics. In: Mathematical Software – Proc. ICMS '18, LNCS 10931, pp. 369–378. Springer (2018)
39. Pedregosa, F., et al.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
40. Strzeboński, A.: Cylindrical algebraic decomposition using validated numerics. Journal of Symbolic Computation 41(9), 1021–1038 (2006)
41. Sturm, T.: New domains for applied quantifier elimination. In: Computer Algebra in Scientific Computing, LNCS 4194, pp. 295–301.
Springer (2006)
42. Urban, J.: MaLARea: A metasystem for automated reasoning in large theories. In: Proc. ESARLT '07, p. 14. CEUR-WS (2007)
43. Wilson, D., Bradford, R., Davenport, J., England, M.: Cylindrical algebraic sub-decompositions. Mathematics in Computer Science 8, 263–288 (2014)
44. Wilson, D., Davenport, J., England, M., Bradford, R.: A "piano movers" problem reformulated. In: Proc. SYNASC '13, pp. 53–60. IEEE (2013)
45. Xu, L., Hutter, F., Hoos, H., Leyton-Brown, K.: SATzilla: Portfolio-based algorithm selection for SAT. J. Artificial Intelligence Research 32, 565–606 (2008)