Border Basis Computation with Gradient-Weighted Norm
Hiroshi Kera∗

Abstract
Normalization of polynomials plays an essential role in the approximate basis computation of vanishing ideals. In computer algebra, coefficient normalization, which normalizes a polynomial by its coefficient norm, is the most common method. In this study, we propose gradient-weighted normalization for the approximate border basis computation of vanishing ideals, inspired by recent results in machine learning. The data-dependent nature of gradient-weighted normalization leads to powerful properties such as better stability against perturbation and consistency in the scaling of input points, which cannot be attained by the conventional coefficient normalization. With a slight modification, the analysis of algorithms with coefficient normalization still works with gradient-weighted normalization, and the time complexity does not change. We also provide an upper bound on the coefficient norm based on the gradient-weighted norm, which allows us to discuss approximate border bases with gradient-weighted normalization from the perspective of the coefficient norm.
Given a set of points $X \subset \mathbb{R}^n$, the vanishing ideal of $X$ is the set of polynomials in $\mathbb{R}[x_1, \ldots, x_n]$ that vanish for any $\mathbf{x} \in X$:
$$\mathcal{I}(X) = \{ g \in \mathbb{R}[x_1, \ldots, x_n] \mid \forall \mathbf{x} \in X,\ g(\mathbf{x}) = 0 \}.$$
In the last decade, the approximate computation of bases of vanishing ideals has been extensively studied [1, 4, 6, 11, 12, 13, 15, 17, 18, 19], where a basis comprises approximately vanishing polynomials, i.e., $g(\mathbf{x}) \approx 0$ for all $\mathbf{x} \in X$. Such approximate basis computation and approximately vanishing polynomials have been exploited in various fields such as dynamics reconstruction, signal processing, and machine learning [2, 7, 8, 10, 14, 21, 22, 23]. The wide variety of applications is based on the fact that the approximate basis computation takes a set of noisy points as its input, which makes it suitable for recent data-driven science, and efficiently computes a set of multivariate polynomials that characterize the given data.

∗Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo. Corresponding author: Hiroshi Kera (e-mail: [email protected]).

To consider an approximately vanishing polynomial $g$ for $X$, the normalization of $g$ plays a fundamental role because scaling $g$ by $k \in \mathbb{R}$ changes the extent of approximate vanishing. For example, with a threshold $\epsilon = 0.5$, even if $g$ is approximately vanishing in the sense that $|g(\mathbf{x})| \le \epsilon$, the scaled polynomial $10g$ need not be approximately vanishing. In computer algebra, coefficient normalization, where polynomials are normalized to have a unit coefficient norm, is the most common choice. In contrast, in machine learning, the basis computation of vanishing ideals is performed in a monomial-agnostic manner to sidestep symbolic computation and term orderings [12, 15, 18], and one cannot efficiently access the coefficients of terms. Thus, polynomials are handled without any proper normalization. Recently, this issue was resolved by gradient normalization [13], which uses the gradient norm $\sqrt{\sum_{\mathbf{x} \in X} \|\nabla g(\mathbf{x})\|^2}$. Interestingly, the data-dependent nature of gradient normalization provides fascinating properties that have never been realized by other basis computation algorithms. However, the direct application of gradient normalization to the monomial-aware basis computation in computer algebra does not take over these advantages and merely increases the computational cost. Thus, an effective data-dependent normalization remains unexplored for computer-algebraic approaches.

In this paper, we propose a new normalization, called gradient-weighted normalization, which is a hybrid of the coefficient normalization conventionally used in computer algebra and the gradient normalization recently developed in machine learning. Gradient-weighted normalization can be applied to most existing basis computation algorithms for vanishing ideals in computer algebra. In particular, we focus on the approximate computation of border bases because these are the most common choice in approximate computation [1, 6, 17] due to their better numerical stability compared with Gröbner bases [4, 20]. We highlight the following advantages of gradient-weighted normalization in the approximate border basis computation.
As a particular example, we analyze the approximate Buchberger–Möller (ABM) algorithm [17].

• Gradient-weighted normalization realizes an approximate border basis computation that outputs polynomials that are more robust against perturbations on the input points.

• With gradient-weighted normalization, the approximate border basis computation acquires scaling consistency; scaling the input points does not change the size of the output basis and only linearly scales the evaluation values for the input points.

• Gradient-weighted normalization only requires slight modifications to the algorithm and leads to only slight changes in the analysis of the original algorithm. The time complexity of the algorithm does not change with the use of gradient-weighted normalization, unlike with gradient normalization.

Furthermore, we derive an upper bound on the coefficient norm of a vanishing polynomial based on its gradient-weighted norm. This result is helpful in analyzing approximate border bases, whose definition relies on the coefficient norm. We consider that this study provides a new direction for approximate border basis computation toward data-dependent normalization and analysis.
The gradient of polynomials has been exploited for the approximate computation of vanishing ideals in several studies. In [1], the first-order approximation (and thus the gradient) of polynomials was computed to discover a set of monomials whose evaluation matrix remains full-rank under small perturbations of the points. Similarly, in [5], the first-order approximation of polynomials was considered to compute a low-degree polynomial that approximately passes through the given points in terms of the geometrical distance. However, both methods use coefficient normalization and do not always succeed. In contrast, our algorithm uses gradient information in the normalization and always succeeds.

In the study most similar to the present one [13], polynomials normalized by the gradient norm were considered. However, that method focuses on the monomial-agnostic basis computation, which does not consider a polynomial as a linear combination of terms but instead as a linear combination of lower-degree polynomials. Although this can be helpful in practical situations in which symbolic computation and term orderings are not favorable, it is still unknown how helpful data-dependent normalization is in the monomial-aware setting, which is standard in computer algebra. In particular, the direct application of gradient normalization in the border basis computation cannot fully exploit the advantages shown in the monomial-agnostic basis computation. Furthermore, how the gradient norm can be related to the coefficient norm, which plays an important role in approximate border bases, remains unclear. The gradient-weighted norm proposed in this study realizes the advantages of gradient normalization and also upper-bounds the coefficient norm. By taking advantage of the monomial-aware setting, a more detailed analysis than [13] is performed, such as providing lower and upper bounds on the norms of the gradients of terms and polynomials.
Throughout the paper, we consider a finite set of points $X \subset \mathbb{R}^n$, a polynomial ring $R_n = \mathbb{R}[x_1, \ldots, x_n]$, and the set of terms $\mathcal{T}_n \subset R_n$, where $x_1, \ldots, x_n$ are indeterminates. The definitions of the order ideal and the border basis are based on those in [16], and the definitions of the approximate notions are based on [6].

Definition 3.1.
Given a set of points $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\} \subset \mathbb{R}^n$, with a slight abuse of notation, the evaluation vectors of a polynomial $h \in R_n$ and of its gradient $\nabla h$ are defined as follows, respectively:
$$h(X) = \begin{pmatrix} h(\mathbf{x}_1) & h(\mathbf{x}_2) & \cdots & h(\mathbf{x}_N) \end{pmatrix}^{\top} \in \mathbb{R}^N,$$
$$\nabla h(X) = \begin{pmatrix} \nabla h(\mathbf{x}_1)^{\top} & \nabla h(\mathbf{x}_2)^{\top} & \cdots & \nabla h(\mathbf{x}_N)^{\top} \end{pmatrix}^{\top} \in \mathbb{R}^{nN}.$$
For a set of polynomials $H = \{h_1, h_2, \ldots, h_s\} \subset R_n$ and the set of their gradients $\nabla H = \{\nabla h_1, \nabla h_2, \ldots, \nabla h_s\}$, the evaluation matrices are defined as
$$H(X) = \begin{pmatrix} h_1(X) & h_2(X) & \cdots & h_s(X) \end{pmatrix} \in \mathbb{R}^{N \times s},$$
$$\nabla H(X) = \begin{pmatrix} \nabla h_1(X) & \nabla h_2(X) & \cdots & \nabla h_s(X) \end{pmatrix} \in \mathbb{R}^{nN \times s}.$$

Here, following the convention in border basis computation, we use terms for power products of indeterminates and monomials for terms accompanied by coefficients.

Definition 3.2.
A polynomial $f \in R_n$ is said to be unitary if the norm of its coefficient vector equals one.

Definition 3.3.
A finite set of terms $O \subset \mathcal{T}_n$ is called an order ideal if the following holds: if $t \in \mathcal{T}_n$ divides $o \in O$, then $t \in O$. The border of $O$ is defined as $\partial O = \left( \bigcup_{k=1}^{n} x_k O \right) \setminus O$.

Definition 3.4.
Let $O \subset k[x_1, \ldots, x_n]$ be an order ideal. Then, an $O$-border prebasis is a set $G$ of polynomials of the form
$$b - \sum_{o \in O} c_o o,$$
one for each $b \in \partial O$, where $c_o \in k$. If the residue classes of the terms in $O$ form a basis of the $k$-vector space $k[x_1, x_2, \ldots, x_n]/I$, then $G$ is called an $O$-border basis of an ideal $I$.

Definition 3.5.
Given $\delta \ge 0$, an $O$-border prebasis $G \subset R_n$ is said to be a $\delta$-approximate $O$-border basis of an ideal $I$ if, for any $g, \widetilde{g} \in G$, the normal remainder of their S-polynomial has a coefficient norm equal to or less than $\delta$.

Definition 3.6.
Given $\epsilon \ge 0$, a polynomial $g \in R_n$ is said to be $\epsilon$-approximately vanishing for a set of points $X \subset \mathbb{R}^n$ if $\|g(X)\| \le \epsilon$, where $\|\cdot\|$ denotes the Euclidean norm.

Definition 3.7.
Given $\epsilon \ge 0$, an ideal $I \subset R_n$ is said to be an $\epsilon$-approximate vanishing ideal for a set of points $X \subset \mathbb{R}^n$ if there exists a system of unitary polynomials that generates $I$ and is $\epsilon$-approximately vanishing for $X$.

Remark 3.8.
Let us consider the $O$-border basis $G$ of the vanishing ideal $\mathcal{I}(X)$ of $X \subset \mathbb{R}^n$. The evaluation vectors of the order terms span $\mathbb{R}^{|X|}$. The evaluation vectors of the terms in $O$ are linearly independent, and $|O| = |X|$. In the approximate case, the former still holds, and the latter becomes $|O| \le |X|$.

We omit the definitions of normal remainders and S-polynomials, which are relatively basic notions in computer algebra; refer to [3].

Other notations. We denote the support of a given polynomial by $\mathrm{supp}(\cdot)$ and the set of linear combinations of a given set of terms with coefficients in $\mathbb{R}$ by $\mathrm{span}_{\mathbb{R}}(\cdot)$. Besides, $\|\cdot\|$ denotes the Euclidean norm of a vector, and $\|\cdot\|_{\mathrm{c}}$ denotes the coefficient norm of a polynomial. The total degree of a polynomial is denoted by $\deg(\cdot)$, and $\deg_k(\cdot)$ denotes the degree with respect to $x_k$. The cardinality of a set is denoted by $|\cdot|$.

Definition 4.1.
The gradient norm of a polynomial $g \in R_n$ with respect to $X \subset \mathbb{R}^n$ is
$$\|g\|_{\mathrm{g},X} = \frac{1}{Z}\sqrt{\sum_{\mathbf{x} \in X} \|\nabla g(\mathbf{x})\|^2}, \quad \text{where } Z = \sqrt{\sum_{k=1}^{n} \deg_k(g)}.$$

Definition 4.2.
The gradient-weighted norm of a polynomial $g = \sum_i c_i t_i$ ($c_i \in \mathbb{R}$, $t_i \in \mathcal{T}_n$) is
$$\|g\|_{\mathrm{gw},X} = \sqrt{\sum_i c_i^2 \|t_i\|_{\mathrm{g},X}^2}.$$
If the gradient-weighted norm of $g$ is equal to one, then $g$ is gradient-weighted unitary.

Remark 4.3.
For any term $t \in \mathcal{T}_n$, its gradient norm and gradient-weighted norm are identical, i.e., $\|t\|_{\mathrm{g},X} = \|t\|_{\mathrm{gw},X}$. The derivations of our results work with any $Z > 0$; our choice of $Z$ provides simpler bounds.

In general, the gradient-weighted norm and the coefficient norm of a polynomial are not correlated; a large gradient-weighted norm does not always imply a large coefficient norm, and vice versa. The following two examples illustrate this.

Example 4.4.
Let us consider the polynomial $f = x^2 y^2 - c \in \mathbb{R}[x, y]$ ($c \in \mathbb{R}$). The gradient-weighted norm of $f$ is $\|f\|_{\mathrm{gw},X} = 0$ for $X = \{(1, 0), (0, 1)\}$, whereas the coefficient norm $\|f\|_{\mathrm{c}} = \sqrt{1 + c^2}$ can be made arbitrarily large by increasing $|c|$.

Example 4.5.
Let us consider the polynomial $f = (x^2 + y^2 - 1)/\sqrt{3} \in \mathbb{R}[x, y]$. The coefficient norm of $f$ is $\|f\|_{\mathrm{c}} = 1$, whereas the gradient-weighted norm for $X = \{(k, 0), (0, k)\}$ is $\|f\|_{\mathrm{gw},X} = 2|k|/\sqrt{3}$, which can be made arbitrarily large by increasing $|k|$.

Example 4.4 also indicates that normalizing polynomials by their gradient-weighted norm is not always valid because zero-division can occur. However, in border basis computation, we can show that gradient-weighted normalization is always valid.

Lemma 4.6.
Let $O \subset \mathcal{T}_n$ be an order ideal. Then, for any $o \in O$, if $\deg_k(o) > 0$ (i.e., $\partial o/\partial x_k \ne 0$), then $\frac{\partial o}{\partial x_k}/\deg_k(o) \in O$. Besides, for any $b \in \partial O$, there exists some $x_k$ that satisfies $\frac{\partial b}{\partial x_k}/\deg_k(b) \in O$.

Proof.
Let $o = \prod_{l=1}^{n} x_l^{\alpha_l} \in O$, where $\alpha_l \in \mathbb{Z}_{\ge 0}$ and $\alpha_k > 0$. Then, $\partial o/\partial x_k = \deg_k(o)\, x_k^{\alpha_k - 1} \prod_{l \ne k} x_l^{\alpha_l}$. Because $x_k^{\alpha_k - 1} \prod_{l \ne k} x_l^{\alpha_l}$ divides $o$, it holds that $\frac{\partial o}{\partial x_k}/\deg_k(o) \in O$. For $b \in \partial O$, we can write $b = x_k o$ for some $x_k$ and $o = \prod_{l=1}^{n} x_l^{\alpha_l} \in O$. Thus, $\partial b/\partial x_k = (\alpha_k + 1) \prod_{l=1}^{n} x_l^{\alpha_l} = \deg_k(b)\, o$; hence, $\frac{\partial b}{\partial x_k}/\deg_k(b) = o \in O$.

Proposition 4.7. Let $G \subset R_n$ be the $O$-border basis of the vanishing ideal $\mathcal{I}(X)$ of $X \subset \mathbb{R}^n$. Then, the following holds.

1. Any $o \in O \setminus \{1\}$ has a nonzero gradient-weighted norm, i.e., $\|o\|_{\mathrm{gw},X} \ne 0$.
2. Any $g \in G$ has a nonzero gradient-weighted norm, i.e., $\|g\|_{\mathrm{gw},X} \ne 0$.
3. For $O = \{1, o_2, \ldots, o_{|O|}\}$, the vectors $\nabla o_2(X), \ldots, \nabla o_{|O|}(X)$ are linearly independent.
4. For $G = \{g_1, \ldots, g_{|G|}\}$, the vectors $\nabla g_1(X), \ldots, \nabla g_{|G|}(X)$ are linearly independent.

Proof.
From Lemma 4.6, for any order term except 1, there is a partial derivative whose term is again an order term. In addition, for any border basis polynomial, whose support is in $O \cup \partial O$, there is a partial derivative that is a nontrivial linear combination of order terms. Because the evaluation vectors of order terms are linearly independent (Remark 3.8), the gradient-weighted norms of order terms and border basis polynomials are always nonzero (properties (1) and (2) are proven).

Next, we prove property (3). For $o \in O$, we define the index set $i(o) = \{k \mid \deg_k(o) > 0\}$ and the order term $o^{(k)} := \frac{\partial o}{\partial x_k}/\deg_k(o)$ for $k \in i(o)$. Let us consider any two distinct order terms $o_1, o_2 \in O \setminus \{1\}$. If $i(o_1) \cap i(o_2) = \emptyset$, then $\nabla o_1(X)$ and $\nabla o_2(X)$ are linearly independent because $\frac{\partial o_1}{\partial x_k}(X) \ne \mathbf{0}$ always indicates that $\frac{\partial o_2}{\partial x_k}(X) = \mathbf{0}$, and vice versa. If there exists $k \in i(o_1) \cap i(o_2)$, then, noting that $o_1^{(k)}, o_2^{(k)} \in O$ are distinct order terms, their evaluation vectors with respect to $X$ are linearly independent. Thus, $\nabla o_1(X)$ and $\nabla o_2(X)$ are always linearly independent. This proof readily generalizes from two order terms to $O = \{1, o_2, \ldots, o_s\}$. Property (4) can be proven similarly by noting that distinct $g_1, g_2 \in G$ have different border terms.

Proposition 4.7 indicates that gradient-weighted normalization is always valid in the basis computation. Thus, given a polynomial, its coefficient normalization and gradient-weighted normalization are identical up to a constant scale (although this difference of scales leads to the interesting results presented in this paper). Hence, replacing the coefficient norm with the gradient-weighted norm does not affect most symbolic analyses of border bases in the existing studies. Besides such compatibility, border bases with gradient-weighted normalization yield higher stability against perturbations of the points.

Proposition 4.8.
For any gradient-weighted unitary polynomial $g \in R_n$ for $X \subset \mathbb{R}^n$, it holds that $\|\nabla g(X)\| \le \deg(g)\sqrt{|X|}$.

Proof.
Let $g = \sum_{i=1}^{s} c_i t_i$ ($c_i \in \mathbb{R}$, $t_i \in \mathcal{T}_n$). In addition, we define the index set $j(g) = \{i \in \{1, \ldots, s\} \mid \|t_i\|_{\mathrm{gw},X} \ne 0\}$. Then,
$$\nabla g(\mathbf{x}) = \sum_{i \in j(g)} c_i \|t_i\|_{\mathrm{gw},X} \frac{\nabla t_i(\mathbf{x})}{\|t_i\|_{\mathrm{gw},X}} = \sum_{i \in j(g)} c_i \|t_i\|_{\mathrm{gw},X} \frac{\nabla t_i(\mathbf{x})}{\|\nabla t_i(X)\|} \sqrt{\sum_{k=1}^{n} \deg_k(t_i)}.$$
Using $\sqrt{\sum_k \deg_k(t_i)} \le \deg(t_i) \le \deg(g)$, the triangle inequality, and $\|g\|_{\mathrm{gw},X} = \sqrt{\sum_{i=1}^{s} c_i^2 \|t_i\|_{\mathrm{gw},X}^2} = 1$, we obtain
$$\|\nabla g(\mathbf{x})\| \le \deg(g) \sum_{i \in j(g)} |c_i| \|t_i\|_{\mathrm{gw},X} \frac{\|\nabla t_i(\mathbf{x})\|}{\|\nabla t_i(X)\|} \le \deg(g) \sqrt{\sum_{i \in j(g)} c_i^2 \|t_i\|_{\mathrm{gw},X}^2} \sqrt{\sum_{i \in j(g)} \frac{\|\nabla t_i(\mathbf{x})\|^2}{\|\nabla t_i(X)\|^2}} = \deg(g) \sqrt{\sum_{i \in j(g)} \frac{\|\nabla t_i(\mathbf{x})\|^2}{\|\nabla t_i(X)\|^2}},$$
where at the second inequality, we used the Cauchy–Schwarz inequality. Thus,
$$\|\nabla g(X)\| = \sqrt{\sum_{\mathbf{x} \in X} \|\nabla g(\mathbf{x})\|^2} \le \deg(g) \sqrt{\sum_{\mathbf{x} \in X} \sum_{i \in j(g)} \frac{\|\nabla t_i(\mathbf{x})\|^2}{\|\nabla t_i(X)\|^2}} = \deg(g)\sqrt{|\mathrm{supp}(g) \setminus \{1\}|} \le \deg(g)\sqrt{|X|}.$$
At the last inequality, we used $|\mathrm{supp}(g) \setminus \{1\}| \le |O| \le |X|$.

Remark 4.9.
The inequality in Proposition 4.8 becomes $\|\nabla g(X)\| \le \sqrt{|X|}$ if we use $Z = 1$ in Definition 4.1.

Proposition 4.8 implies that, for a small perturbation $\mathbf{p}$ on $\mathbf{x}$, the two evaluation values $g(\mathbf{x})$ and $g(\mathbf{x} + \mathbf{p})$ are close to each other, and their difference can be bounded by a constant scaling of the magnitude of the perturbation. This is not the case with coefficient normalization because a (coefficient-)unitary polynomial does not necessarily have a small gradient (cf. Example 4.5).

Among several methods of approximate border basis construction, we choose the ABM algorithm [17] to make the analysis simple.
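Before turning to the algorithm, note that the norms of Definitions 4.1 and 4.2 are directly computable from the input points. The following sketch (Python with NumPy; the helper names are ours) numerically reproduces the values in Examples 4.4 and 4.5:

```python
import numpy as np

def term_partial(e, x, k):
    """Evaluate d/dx_k of the term prod_l x_l^{e_l} at the point x."""
    if e[k] == 0:
        return 0.0
    v = float(e[k])
    for l in range(len(e)):
        v *= x[l] ** (e[l] - 1 if l == k else e[l])
    return v

def grad_norm(e, X):
    """Gradient norm ||t||_{g,X} of a term (Definition 4.1), Z = sqrt(sum_k deg_k)."""
    sq = sum(term_partial(e, x, k) ** 2 for x in X for k in range(len(e)))
    Z = np.sqrt(sum(e)) if sum(e) > 0 else 1.0
    return np.sqrt(sq) / Z

def gw_norm(poly, X):
    """Gradient-weighted norm (Definition 4.2); poly maps exponent tuples to coefficients."""
    return np.sqrt(sum(c ** 2 * grad_norm(e, X) ** 2 for e, c in poly.items()))

# Example 4.4: f = x^2 y^2 - c has zero gradient-weighted norm on {(1,0), (0,1)}.
f1 = {(2, 2): 1.0, (0, 0): -100.0}
print(gw_norm(f1, [(1.0, 0.0), (0.0, 1.0)]))  # 0.0

# Example 4.5: f = (x^2 + y^2 - 1)/sqrt(3) on X = {(k,0), (0,k)} has norm 2|k|/sqrt(3).
k = 5.0
f2 = {(2, 0): 1 / np.sqrt(3), (0, 2): 1 / np.sqrt(3), (0, 0): -1 / np.sqrt(3)}
print(gw_norm(f2, [(k, 0.0), (0.0, k)]), 2 * abs(k) / np.sqrt(3))  # both ≈ 5.7735
```

The sparse exponent-tuple representation of polynomials is a choice of ours for illustration, not part of the paper.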
Given a finite set of points $X \subset \mathbb{R}^n$, an error tolerance $\epsilon \ge 0$, and a term ordering $\sigma$, the ABM algorithm collects order terms and approximately vanishing polynomials from lower to higher degrees.

Algorithm 1: The ABM algorithm with gradient-weighted normalization

Input: $X, \epsilon, \sigma$
Output: $G, O$
  $G = \{\}$; $O = \{1\}$
  for $d = 1, 2, \ldots$ do
    $L = \{b \in \partial O \mid \deg(b) = d\}$  // terms in $L$ are ordered increasingly w.r.t. $\sigma$
    if $|L| = 0$ then return $(G, O)$ and terminate
    for $b$ in $L$ do
      solve the generalized eigenvalue problem Eq. (12) and obtain $(\lambda_{\min}, \mathbf{v}_{\min})$
      if $\sqrt{\lambda_{\min}} \le \epsilon$ then
        /* $O = \{o_1, o_2, \ldots, o_s\}$ and $\mathbf{v}_{\min} = (v_1, \ldots, v_{s+1})^{\top}$ */
        $g := v_1 b + v_2 o_1 + \cdots + v_{s+1} o_s$
        $G = G \cup \{g\}$
      else
        $O = O \cup \{b\}$

First, $O = \{1\}$ and $G = \{\}$ are set at degree 0. At each degree $d \ge 1$, the degree-$d$ border terms are prepared as $L = \{b \in \partial O \mid \deg(b) = d\}$. If $L$ is empty, the algorithm outputs $(O, G)$ and terminates; otherwise, the following steps S1–S3 are repeated until $L$ becomes empty.

S1 Select the smallest $b \in L$ in terms of $\sigma$ and remove $b$ from $L$. (In the original paper [17], the largest term is selected. We consider this to be a minor error; the smallest term should be selected first because the term $b$ is a potential leading term (or border term) and thus must always be larger than the terms in the tentative $O$.)

S2 Let $M, D$ be
$$M = \begin{pmatrix} b(X) & O(X) \end{pmatrix} \quad \text{and} \quad D = \mathrm{diag}\left( \|b\|_{\mathrm{gw},X}^2, \|o_1\|_{\mathrm{gw},X}^2, \ldots, \|o_s\|_{\mathrm{gw},X}^2 \right),$$
where $O = \{o_1, \ldots, o_s\}$. Solve the following generalized eigenvalue problem:
$$M^{\top} M \mathbf{v}_{\min} = \lambda_{\min} D \mathbf{v}_{\min}, \qquad (12)$$
where $\lambda_{\min}$ and $\mathbf{v}_{\min}$ are the smallest generalized eigenvalue and the corresponding generalized eigenvector, respectively.

S3 If $\sqrt{\lambda_{\min}} \le \epsilon$, define a new polynomial
$$g = v_1 b + v_2 o_1 + v_3 o_2 + \cdots + v_{s+1} o_s,$$
where $\mathbf{v}_{\min} = (v_1, \ldots, v_{s+1})^{\top}$, and update $G$ by $G = G \cup \{g\}$. Otherwise, update $O$ by $O = O \cup \{b\}$.

Once $L$ becomes empty, we proceed to the next degree $d + 1$ and construct a new $L$.

Proposition 5.1.
During the ABM algorithm with gradient-weighted normalization, the following always hold.

1. In Eq. (12), $g$ is gradient-weighted unitary with a nonzero coefficient on $b$, and it is $\sqrt{\lambda_{\min}}$-approximately vanishing for $X$.
2. No gradient-weighted unitary polynomial $h \in \mathrm{span}_{\mathbb{R}}(O)$ is $\epsilon$-approximately vanishing for $X$.

Proof.
We prove the claim by induction. At the initialization, the claim holds. Assume that the claim holds up to some point of the ABM algorithm, and we have $O, G, b$ at S1. By solving Eq. (12) at S2, we obtain the coefficient vector $\mathbf{v}_{\min}$ and $g = v_1 b + v_2 o_1 + v_3 o_2 + \cdots + v_{s+1} o_s$. Note that solving Eq. (12) minimizes $\|g(X)\|^2 = \mathbf{v}_{\min}^{\top} M^{\top} M \mathbf{v}_{\min} = \lambda_{\min}$ under the constraint $\|g\|_{\mathrm{gw},X}^2 = \mathbf{v}_{\min}^{\top} D \mathbf{v}_{\min} = 1$. Thus, $g$ is gradient-weighted unitary and $\sqrt{\lambda_{\min}}$-approximately vanishing. By construction, the generalized eigenvalue problem $(O(X)^{\top} O(X), \widetilde{D})$, where $\widetilde{D}$ is the diagonal matrix whose diagonal entries are the squared gradient-weighted norms of the terms in $O$, only has eigenvalues larger than $\epsilon^2$ because $O$ is extended only when $\sqrt{\lambda_{\min}} > \epsilon$. This implies that no gradient-weighted unitary polynomial whose support is in $O$ is $\epsilon$-approximately vanishing, and thus the coefficient of the border term $b$ in $g$ is nonzero.

The following theorem states that the ABM algorithm with gradient-weighted normalization enjoys properties that are almost identical to those of the original ABM algorithm (Theorem 4.3.1 in [17]). The proof is presented in Section 5.3.

Theorem 5.2.
Given $X \subset \mathbb{R}^n$, $\epsilon \ge 0$, and a term ordering $\sigma$, the ABM algorithm with gradient-weighted normalization computes $G = \{g_1, \ldots, g_{|G|}\} \subset R_n$ and $O = \{o_1, \ldots, o_{|O|}\} \subset \mathcal{T}_n$, which have the following properties:

1. All the polynomials in $G$ are gradient-weighted unitary, and $G$ generates an $\epsilon$-approximate vanishing ideal of $X$.
2. No gradient-weighted unitary polynomial in $\mathrm{span}_{\mathbb{R}}(O)$ vanishes $\epsilon$-approximately on $X$.
3. If $O$ is an order ideal of terms, then the set $\widetilde{G} = \{g/\mathrm{LC}_{\sigma}(g) \mid g \in G\}$ is an $O$-border prebasis, where $\mathrm{LC}_{\sigma}(\cdot)$ denotes the leading coefficient of a polynomial in the ordering $\sigma$.
4. If $O$ is an order ideal of terms and if $\mathbf{0} \in X$, then the set $\widetilde{G}$ is a $\delta$-approximate border basis in terms of the gradient-weighted norm and an $\eta$-approximate border basis in terms of the coefficient norm, where $\delta$ and $\eta$, respectively, satisfy
$$\delta < \frac{1 + \sqrt{|X|}}{|\gamma_{\min}|} \left( \|X\|_{\max} + n\sqrt{|X|}\,\sqrt{\epsilon}\, \frac{\min\{(\epsilon/|\gamma_{\min}|)^{\deg(G)-1},\, 1\}}{|\gamma_{\min}|} \right),$$
$$\eta < \sqrt{\delta^2 + (\epsilon/|\gamma_{\min}|)^2}\, \min\{(\epsilon/|\gamma_{\min}|)^{\deg(G)-1},\, 1\}\, \sqrt{|X|},$$
where $\gamma_{\min}$ is the minimum absolute value of the coefficients of the border terms in $G$, $\deg(G)$ is the highest degree of the polynomials in $G$, and $\|X\|_{\max} = \max_k \|x_k(X)\|$.
5. If $\epsilon = 0$, the algorithm produces the same results as the Buchberger–Möller algorithm for border bases with gradient-weighted normalization.

The main changes from the original theorem with coefficient normalization are as follows: in Theorem 5.2, (i) the unitarity of polynomials is based on the gradient-weighted norm instead of the coefficient norm, and (ii) there are two upper bounds on the approximation, in terms of the gradient-weighted norm and the coefficient norm.
Notably, the approximation quality of a border basis can be discussed with the coefficient norm even when the basis is computed with the gradient-weighted norm. As shown in Examples 4.4 and 4.5, the gradient-weighted norm and the coefficient norm are not generally correlated. However, Theorem 5.2 shows that, in the border basis computation, we can upper-bound the coefficient norm by the gradient-weighted norm.

We now present two advantages of using gradient-weighted normalization. The first advantage is robustness against perturbations of the input points.
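This robustness rests on the gradient bound of Proposition 4.8, $\|\nabla g(X)\| \le \deg(g)\sqrt{|X|}$ for gradient-weighted unitary $g$. The bound is easy to probe numerically; the sketch below (Python with NumPy; the helper names and the choice of support are ours) checks it for random gradient-weighted unitary quadratics:

```python
import numpy as np

def term_partial(e, x, k):
    """Evaluate d/dx_k of the term prod_l x_l^{e_l} at the point x."""
    if e[k] == 0:
        return 0.0
    v = float(e[k])
    for l in range(len(e)):
        v *= x[l] ** (e[l] - 1 if l == k else e[l])
    return v

def grad_norm(e, X):
    """||t||_{g,X} with Z = sqrt(sum_k deg_k(t)) (Definition 4.1)."""
    sq = sum(term_partial(e, x, k) ** 2 for x in X for k in range(len(e)))
    Z = np.sqrt(sum(e)) if sum(e) > 0 else 1.0
    return np.sqrt(sq) / Z

rng = np.random.default_rng(0)
terms = [(1, 0), (0, 1), (1, 1), (2, 0), (0, 2)]   # supp(g) \ {1}, so deg(g) = 2
X = rng.normal(size=(20, 2))
norms = np.array([grad_norm(e, X) for e in terms])

for _ in range(100):
    c = rng.normal(size=len(terms))
    c /= np.sqrt(np.sum(c**2 * norms**2))          # make g gradient-weighted unitary
    # ||∇g(X)||: stack the gradient of g over all points of X
    grad_gX = np.concatenate(
        [[sum(ci * term_partial(e, x, k) for ci, e in zip(c, terms))
          for k in range(2)] for x in X])
    assert np.linalg.norm(grad_gX) <= 2 * np.sqrt(len(X)) + 1e-9   # deg(g)·sqrt(|X|)
```

In fact, the proof gives the sharper intermediate bound $\deg(g)\sqrt{|\mathrm{supp}(g)\setminus\{1\}|}$, which this experiment also respects.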
Proposition 5.3.
Let $G \subset R_n$ be an $O$-border basis of $\mathcal{I}(X)$, where $O$ is an order ideal and $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\} \subset \mathbb{R}^n$. Let $P = \{\mathbf{p}_1, \ldots, \mathbf{p}_N\} \subset \mathbb{R}^n$ be a set of small perturbations. If $g \in G$ is gradient-weighted unitary and $\epsilon$-approximately vanishing for $X$, then
$$\|g(X + P)\| \le \epsilon + \|\mathbf{p}_{\max}\| \deg(g) \sqrt{|X|} + o(\|\mathbf{p}_{\max}\|),$$
where $\mathbf{p}_{\max} = \arg\max_{\mathbf{p} \in P} \|\mathbf{p}\|$, and $o(\cdot)$ is Landau's little-o.

Proof.
By the Taylor expansion, we get
$$\|g(X + P)\| = \sqrt{\sum_{i=1}^{N} \left( g(\mathbf{x}_i) + \mathbf{p}_i^{\top} \nabla g(\mathbf{x}_i) + o(\|\mathbf{p}_i\|) \right)^2} \le \|g(X)\| + \|\mathbf{p}_{\max}\| \|\nabla g(X)\| + o(\|\mathbf{p}_{\max}\|) \le \epsilon + \|\mathbf{p}_{\max}\| \deg(g) \sqrt{|X|} + o(\|\mathbf{p}_{\max}\|),$$
where at the last inequality, we used Proposition 4.8.

Here, we assume $\mathbf{0} \in X$; however, this restriction can be removed because the key lemmas (Lemmas 5.7 and 5.8) do not assume $\mathbf{0} \in X$. However, because of the page limit and because the bounds are somewhat complicated, we omit this result. We aim to derive simpler bounds in future work.

Remark 5.4. The inequality in Proposition 5.3 becomes $\|g(X + P)\| \le \epsilon + \|\mathbf{p}_{\max}\| \sqrt{|X|} + o(\|\mathbf{p}_{\max}\|)$ if we use $Z = 1$ in Definition 4.1.

The upper bound is similar to that derived in [1], where the first-order approximation is used as in our analysis. However, their approach relies on coefficient normalization and only succeeds when there is a set of points that is close enough to the original set of points and satisfies a certain criterion. The criterion is somewhat complicated to calculate and is not necessarily satisfied. In contrast, gradient-weighted normalization actively scales polynomials such that the extent of vanishing of the polynomials always satisfies the bound in Proposition 5.3.

Another advantage of using gradient-weighted normalization is that it enables the ABM algorithm to output similar bases before and after scaling the input points.

Proposition 5.5.
Suppose the ABM algorithm with gradient-weighted normalization outputs $(O, G)$ for $(X, \epsilon, \sigma)$ and $(\widehat{O}, \widehat{G})$ for $(\alpha X, |\alpha| \epsilon, \sigma)$, where $\alpha \ne 0$. Then, $O = \widehat{O}$. In addition, a one-to-one correspondence exists between $G$ and $\widehat{G}$. For corresponding polynomials $g \in G$ and $\widehat{g} \in \widehat{G}$, the following holds:
$$\widehat{g}(\alpha X) = \alpha g(X).$$
Symbolically, $\mathrm{supp}(g) = \mathrm{supp}(\widehat{g})$, and the coefficients of $t \in \mathrm{supp}(g)$ in $g$ and $\widehat{g}$, say $v_t$ and $\widehat{v}_t$, satisfy $v_t = \alpha^{\deg(t) - 1} \widehat{v}_t$.

Proof.
Let us consider two runs of the ABM algorithm, one for $(X, \epsilon, \sigma)$ and the other for $(\alpha X, |\alpha| \epsilon, \sigma)$. We use the notation of Algorithm 1 for the former run and add $\widehat{\cdot}$ to the notation of the latter. At the initialization, the claim holds because $O = \widehat{O} = \{1\}$ and $G = \widehat{G} = \{\}$. Assume that the claim holds for several iterations, and we are now at S1 with $O = \widehat{O}$, a pair $(G, \widehat{G})$ that satisfies the correspondence, and $L = \widehat{L}$. Note that for any term $t \in \mathcal{T}_n$, $t(\alpha X) = \alpha^{\deg(t)} t(X)$ and $\|t\|_{\mathrm{gw},\alpha X} = \alpha^{\deg(t) - 1} \|t\|_{\mathrm{gw},X}$. Thus, $t(\alpha X)/\|t\|_{\mathrm{gw},\alpha X} = \alpha\, t(X)/\|t\|_{\mathrm{gw},X}$. Therefore, by defining $S = \mathrm{diag}(\alpha^{\deg(b)}, \alpha^{\deg(o_1)}, \ldots, \alpha^{\deg(o_s)})$,
$$\widehat{M}^{\top} \widehat{M} \widehat{\mathbf{v}}_{\min} = \widehat{\lambda}_{\min} \widehat{D} \widehat{\mathbf{v}}_{\min} \iff M^{\top} M S \widehat{\mathbf{v}}_{\min} = \widehat{\lambda}_{\min} \alpha^{-2} D S \widehat{\mathbf{v}}_{\min},$$
from which we obtain $\lambda_{\min} = \alpha^{-2} \widehat{\lambda}_{\min}$ and $\mathbf{v}_{\min} \propto S \widehat{\mathbf{v}}_{\min}$ at S2. This indicates that thresholding $\sqrt{\lambda_{\min}}$ by $\epsilon$ in the first run is equivalent to thresholding $\sqrt{\widehat{\lambda}_{\min}}$ by $|\alpha| \epsilon$ in the second run. Furthermore, by comparing the constraints of the two generalized eigenvalue problems, $1 = \mathbf{v}_{\min}^{\top} D \mathbf{v}_{\min}$ and $1 = \widehat{\mathbf{v}}_{\min}^{\top} \widehat{D} \widehat{\mathbf{v}}_{\min}$, we obtain $\mathbf{v}_{\min} = \alpha^{-1} S \widehat{\mathbf{v}}_{\min}$. Thus, at S3, the coefficients of the $i$-th term $t_i$ in $g$ and $\widehat{g}$ are related by $v_i = \alpha^{\deg(t_i) - 1} \widehat{v}_i$. In summary, if $g = v_1 b + v_2 o_1 + \cdots + v_{s+1} o_s$ is $\epsilon$-approximately vanishing for $X$, then $\widehat{g} = \widehat{v}_1 \widehat{b} + \widehat{v}_2 \widehat{o}_1 + \cdots + \widehat{v}_{s+1} \widehat{o}_s$ is $|\alpha| \epsilon$-approximately vanishing for $\alpha X$, and vice versa. If $b$ is appended to $O$, then $b$ is also appended to $\widehat{O}$, and thus $O = \widehat{O}$; otherwise, $g$ and $\widehat{g}$ are appended to $G$ and $\widehat{G}$, respectively, and thus $G$ and $\widehat{G}$ maintain the correspondence.

Proposition 5.5 works even in the approximate case (i.e., $\epsilon > 0$) and provides a theoretical justification for the scaling of points at preprocessing. That is, even if we scale a set of points before computing a basis, e.g., for numerical stability, there is a corresponding basis for the set of points before the scaling, and one can retrieve this basis from the basis computed from the scaled points. As shown in Section 6 and Fig. 1, this is not the case with coefficient normalization, although some existing work assumes scaling of points at preprocessing such that the range of the values falls in $[-1, 1]$ for numerical stability or analysis [6].
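The two scaling identities at the heart of the proof of Proposition 5.5, $t(\alpha X) = \alpha^{\deg(t)} t(X)$ and $\|t\|_{\mathrm{gw},\alpha X} = \alpha^{\deg(t)-1} \|t\|_{\mathrm{gw},X}$, are easy to verify numerically; a minimal sketch (Python with NumPy; the helper names are ours, and $\alpha > 0$ is assumed):

```python
import numpy as np

def term_eval(e, X):
    """Evaluation vector t(X) of the term t = prod_l x_l^{e_l}."""
    return np.array([np.prod([x[l] ** e[l] for l in range(len(e))]) for x in X])

def term_grad_norm(e, X):
    """||t||_{g,X} = ||t||_{gw,X} for a term (Definitions 4.1 and 4.2)."""
    sq = 0.0
    for x in X:
        for k in range(len(e)):
            if e[k] == 0:
                continue
            v = float(e[k])
            for l in range(len(e)):
                v *= x[l] ** (e[l] - 1 if l == k else e[l])
            sq += v ** 2
    return np.sqrt(sq) / np.sqrt(sum(e))

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))
alpha, t = 2.5, (2, 1)               # scaling factor and the term x^2 y, deg(t) = 3
assert np.allclose(term_eval(t, alpha * X), alpha**3 * term_eval(t, X))
assert np.isclose(term_grad_norm(t, alpha * X), alpha**2 * term_grad_norm(t, X))
# Hence the normalized evaluation scales linearly in alpha, as used in the proof:
lhs = term_eval(t, alpha * X) / term_grad_norm(t, alpha * X)
assert np.allclose(lhs, alpha * term_eval(t, X) / term_grad_norm(t, X))
```

The last assertion is the term-level instance of the scaling consistency $\widehat{g}(\alpha X) = \alpha g(X)$ stated in Proposition 5.5.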
From Proposition 5.1, the overall proof of Theorem 5.2 (except claim (4)) is almost the same as that of the original theorem, replacing coefficient unitarity with gradient-weighted unitarity. For claim (4) of Theorem 5.2, we now derive several inequalities to upper-bound the coefficient norm by the gradient-weighted norm.
Lemma 5.6.
Let $O \subset \mathcal{T}_n$ be an order ideal obtained by the ABM algorithm with gradient-weighted normalization for $X \subset \mathbb{R}^n$ and $\epsilon \ge 0$. Then, for any $o \in O \setminus \{1\}$, it holds that $\|o\|_{\mathrm{gw},X} > \epsilon^{\deg(o) - 1} \sqrt{|X|}$.

Proof.
We prove the claim by induction. Assume that at degree $d > 0$ the claim holds; that is, for any $o \in O$ of degree $d$, it holds that $\|o\|_{\mathrm{gw},X} > \epsilon^{d-1} \sqrt{|X|}$. Let $i(o) \subset \{1, \ldots, n\}$ be the index set such that $x_k$ divides $o$ for $k \in i(o)$, and let $o^{(k)}$ denote the term such that $o = x_k o^{(k)}$ for $o \in O$ and $k \in i(o)$. For any $o \in O$ of degree $d + 1$, we obtain
$$\|o\|_{\mathrm{gw},X}^2 = \frac{1}{\sum_{k=1}^{n} \deg_k(o)} \sum_{k \in i(o)} \left\| \frac{\partial o}{\partial x_k}(X) \right\|^2 = \frac{1}{\sum_{k=1}^{n} \deg_k(o)} \sum_{k \in i(o)} \deg_k(o)^2 \left\| o^{(k)}(X) \right\|^2 = \frac{1}{\sum_{k=1}^{n} \deg_k(o)} \sum_{k \in i(o)} \deg_k(o)^2 \frac{\|o^{(k)}(X)\|^2}{\|o^{(k)}\|_{\mathrm{gw},X}^2} \|o^{(k)}\|_{\mathrm{gw},X}^2 > \epsilon^{2d} |X|.$$
For the first equality, we used $\sum_{k=1}^{n} \|\frac{\partial o}{\partial x_k}(X)\|^2 = \sum_{k \in i(o)} \|\frac{\partial o}{\partial x_k}(X)\|^2$ because $\frac{\partial o}{\partial x_k} = 0$ for $k \notin i(o)$. For the second equality, we used $\frac{\partial o}{\partial x_k} = \deg_k(o)\, o^{(k)}$. For the last inequality, we used $\sum_{k=1}^{n} \deg_k(o) = \sum_{k \in i(o)} \deg_k(o) \le \sum_{k \in i(o)} \deg_k(o)^2$, the bound $\|o^{(k)}(X)\| / \|o^{(k)}\|_{\mathrm{gw},X} > \epsilon$, and the induction hypothesis at degree $d$. At $d = 1$, the claim holds because $\|\nabla o(X)\| = \sqrt{|X|}$. Thus, for any $o \in O \setminus \{1\}$, it follows that $\|o\|_{\mathrm{gw},X} > \epsilon^{\deg(o) - 1} \sqrt{|X|}$.

Interestingly, we can upper-bound the coefficient norm of an approximately vanishing polynomial by its gradient-weighted norm. Given $X \subset \mathbb{R}^n$, we translate it so that $\mathbf{0} \in X$. Consequently, in the exact case, any vanishing polynomial for $X$ cannot have a constant term. In the approximate case (with $\epsilon$), $g \in G$ may have a small constant term $c_0$ ($|c_0| \le \epsilon$).

Lemma 5.7.
Let $(\mathcal{O}, G) \subset \mathcal{T}_n \times \mathcal{R}_n$ be a pair of an order ideal and an $\mathcal{O}$-border basis of $\mathcal{I}(X)$, obtained by the ABM algorithm for $X \subset \mathbb{R}^n$ and $\epsilon \ge 0$. Then, for any $g \in G$, the following holds:
\[
\|g\|_{\mathrm{c}} < \frac{\sqrt{\|g\|_{\mathrm{gw},X}^2 + c_0^2}}{\min\{\epsilon^{\deg(g)-1}, 1\}\sqrt{|X|}},
\]
where $c_0$ is the coefficient of the constant term of $g$. Furthermore, if $\mathbf{0} \in X$, then
\[
\|g\|_{\mathrm{c}} < \frac{\sqrt{\|g\|_{\mathrm{gw},X}^2 + \epsilon^2}}{\min\{\epsilon^{\deg(g)-1}, 1\}\sqrt{|X|}}.
\]

Proof.
Let $g = \sum_{i=0}^{s} c_i t_i$, where $c_i \in \mathbb{R}$ and $t_0 = 1$. Let $\mathbf{c} = (c_1, \ldots, c_s)^{\top}$ and $D = \mathrm{diag}(\|t_1\|_{\mathrm{gw},X}, \|t_2\|_{\mathrm{gw},X}, \ldots, \|t_s\|_{\mathrm{gw},X})$ (note that $c_0$ and $\|t_0\|_{\mathrm{gw},X}$ are excluded). Then,
\[
\|g\|_{\mathrm{gw},X}^2 = \mathbf{c}^{\top} D^2 \mathbf{c}
\ge \min_{i \in \{1, \ldots, s\}} \|t_i\|_{\mathrm{gw},X}^2 \left(\|g\|_{\mathrm{c}}^2 - c_0^2\right).
\]
From Lemma 5.6, we have $\|t_i\|_{\mathrm{gw},X} > \epsilon^{\deg(t_i)-1}\sqrt{|X|}$ and obtain
\[
\|g\|_{\mathrm{c}} < \frac{\sqrt{\|g\|_{\mathrm{gw},X}^2 + c_0^2}}{\min\{\epsilon^{\deg(g)-1}, 1\}\sqrt{|X|}}.
\]
If $\mathbf{0} \in X$, then $g$ is $\epsilon$-approximately vanishing at $\mathbf{0}$ (i.e., $|c_0| \le \epsilon$), which yields the second inequality.

Lemma 5.8.
Let $X \subset \mathbb{R}^n$ be a set of points, and let $(\mathcal{O}, G)$ be a pair of an order ideal and an approximate border basis obtained by the ABM algorithm with $(X, \epsilon)$. Then, for any polynomial $h \in \mathcal{R}_n$ with support $\mathcal{O}$ (but not necessarily gradient-weighted unitary), the following holds:
\[
\|h\|_{\mathrm{gw},X} < \frac{\|h(X)\| + |c_0|\sqrt{|X|}}{\epsilon},
\]
where $c_0$ is the coefficient of the constant term of $h$. If $\mathbf{0} \in X$ and $h$ is $\epsilon'$-approximately vanishing for $X$, then
\[
\|h\|_{\mathrm{gw},X} < \frac{\epsilon'}{\epsilon}\left(1 + \sqrt{|X|}\right).
\]

Proof.
Let $\mathcal{O} = \{1, o_1, \ldots, o_s\}$. Let us consider $\mathcal{O}^- = \mathcal{O} \setminus \{1\}$ and $h^-(X) = h(X) - c_0 \mathbf{1}$, where $c_0$ is the constant term of $h$ and $\mathbf{1} \in \mathbb{R}^{|X|}$ is the all-one vector. Let $D = \mathrm{diag}(\|1\|_{\mathrm{gw},X}, \|o_1\|_{\mathrm{gw},X}, \ldots, \|o_s\|_{\mathrm{gw},X})$ and $D_- = \mathrm{diag}(\|o_1\|_{\mathrm{gw},X}, \ldots, \|o_s\|_{\mathrm{gw},X})$. By the triangle inequality, $\|h^-(X)\| \le \|h(X)\| + |c_0|\sqrt{|X|}$, and $h^-(X) = \mathcal{O}^-(X) D_-^{-1} D_- \mathbf{c}$, where $\mathbf{c}$ collects the coefficients of $h$ on $\mathcal{O}^-$. Note that $\mathcal{O}^-(X) D_-^{-1}$ is a full-rank "tall" matrix; thus, it follows that $D_- \mathbf{c} = (\mathcal{O}^-(X) D_-^{-1})^{\dagger} h^-(X)$, where $(\cdot)^{\dagger}$ denotes the pseudo-inverse. By Cauchy's interlacing theorem, the smallest eigenvalue of $(\mathcal{O}^-(X) D_-^{-1})^{\top} \mathcal{O}^-(X) D_-^{-1}$ is larger than that of $(\mathcal{O}(X) D^{-1})^{\top} \mathcal{O}(X) D^{-1}$, which is larger than $\epsilon^2$ by construction. Thus, $\|(\mathcal{O}^-(X) D_-^{-1})^{\dagger}\|_{\mathrm{s}} < 1/\epsilon$, where $\|\cdot\|_{\mathrm{s}}$ denotes the spectral norm of a matrix. The gradient-weighted norm of $h$ can then be bounded as
\begin{align*}
\|h\|_{\mathrm{gw},X} = \|h^-\|_{\mathrm{gw},X} = \|D_- \mathbf{c}\|
&\le \left\|(\mathcal{O}^-(X) D_-^{-1})^{\dagger}\right\|_{\mathrm{s}} \|h^-(X)\| \\
&< \frac{\|h^-(X)\|}{\epsilon} \le \frac{\|h(X)\| + |c_0|\sqrt{|X|}}{\epsilon}.
\end{align*}
If $\mathbf{0} \in X$ and $h$ is $\epsilon'$-approximately vanishing for $X$, then $\|h(X)\| \le \epsilon'$ and $|c_0| \le \epsilon'$, which yields the second inequality.

Now, we are ready to prove Theorem 5.2.

Proof of Theorem 5.2.
Claims (1) and (2) are proven in Proposition 5.1. By Proposition 5.1, the coefficient of the leading term (border term) of each polynomial in $G$ is nonzero; thus, by construction in Algorithm 1, claim (3) holds (refer to the proof of the original ABM algorithm [17]). In addition, by Proposition 4.7, claim (5) also holds. (To show claim (5), it is necessary to introduce the Buchberger–Möller algorithm for border bases and show that it can work with gradient-weighted normalization; owing to page limitations, this part is omitted in this paper.)

Claim (4) is proven as follows. For $\tilde{g}_i, \tilde{g}_j \in \tilde{G}$, we write $\tilde{g}_i = b_i - \tilde{h}_i$ and $\tilde{g}_j = b_j - \tilde{h}_j$, where $b_i, b_j \in \mathcal{T}_n$ and $\tilde{h}_i, \tilde{h}_j \in \mathcal{R}_n$. Note that $\tilde{g}_i$ is an $(\epsilon / |\gamma_i|)$-approximately vanishing polynomial with gradient-weighted norm $1 / |\gamma_i|$. We denote the S-polynomial of $\tilde{g}_i, \tilde{g}_j$ and its normal remainder by $S_{ij}$ and $r_{ij}$, respectively. There are two cases for the S-polynomial: (i) $S_{ij} = x_k \tilde{g}_i - x_l \tilde{g}_j$ and (ii) $S_{ij} = \tilde{g}_i - x_l \tilde{g}_j$. It is known that, in either case, the normal remainder can be written as $r_{ij} = S_{ij} - \sum_{\mu=1}^{|\tilde{G}|} v_\mu \tilde{g}_\mu$, where $v_\mu$ is some coefficient of $\tilde{h}_i$ [9, 17]. By the triangle inequality,
\[
\|r_{ij}(X)\| \le \|S_{ij}(X)\| + \Bigl\|\sum_{\mu=1}^{|\tilde{G}|} v_\mu \tilde{g}_\mu(X)\Bigr\|.
\]
For the first term, in either case,
\[
\|S_{ij}(X)\| \le \frac{2\epsilon \|X\|_{\max}}{|\gamma_{\min}|}.
\]
For the second term,
\[
\Bigl\|\sum_{\mu=1}^{|\tilde{G}|} v_\mu \tilde{g}_\mu(X)\Bigr\|
\le \frac{\epsilon}{|\gamma_{\min}|} \sum_{\mu=1}^{|\tilde{G}|} |v_\mu|
\le \frac{\epsilon n |X|}{|\gamma_{\min}|} \max_{\mu} |v_\mu|,
\]
where we used $|\tilde{G}| \le n|X|$ in the last inequality. Because $v_\mu$ is some coefficient of $\tilde{h}_\mu$ and $|v_\mu| \le \|\tilde{g}_\mu\|_{\mathrm{c}}$, by Lemma 5.7,
\[
|v_\mu| < \frac{\sqrt{1 + \epsilon^2}}{\min\{(\epsilon/|\gamma_{\min}|)^{\deg(\tilde{g}_\mu)-1}, 1\}\, |\gamma_{\min}| \sqrt{|X|}}.
\]
Thus, the second term is bounded as
\[
\Bigl\|\sum_{\mu=1}^{|\tilde{G}|} v_\mu \tilde{g}_\mu(X)\Bigr\|
< \frac{\epsilon n \sqrt{|X|} \sqrt{1 + \epsilon^2}}{\min\{(\epsilon/|\gamma_{\min}|)^{\deg(G)-1}, 1\}\, |\gamma_{\min}|^2}.
\]
Applying these two bounds to the triangle inequality above, we obtain
\[
\|r_{ij}(X)\| < \frac{2\epsilon \|X\|_{\max}}{|\gamma_{\min}|} + \frac{\epsilon n \sqrt{|X|} \sqrt{1 + \epsilon^2}}{\min\{(\epsilon/|\gamma_{\min}|)^{\deg(G)-1}, 1\}\, |\gamma_{\min}|^2}.
\]
Therefore, from Lemma 5.8,
\[
\|r_{ij}\|_{\mathrm{gw},X} < \frac{1 + \sqrt{|X|}}{|\gamma_{\min}|} \left( 2\|X\|_{\max} + \frac{n \sqrt{|X|} \sqrt{1 + \epsilon^2}}{\min\{(\epsilon/|\gamma_{\min}|)^{\deg(G)-1}, 1\}\, |\gamma_{\min}|} \right).
\]
Finally, by Lemma 5.7, we can upper bound the coefficient norm by the gradient-weighted norm, which provides $\eta$ in claim (4).

We now analyze the additional time complexity introduced by gradient-weighted normalization. In particular, we consider step S2, where the generalized eigenvalue problem Eq. (12) needs to be solved instead of the standard eigenvalue problem in the original ABM algorithm. Note that the computational complexity of solving generalized and standard eigenvalue problems is the same; in our case, it is $O(|\mathcal{O}|^2) = O(|X|^2)$ because $|\mathcal{O}| \le |X|$ and we only compute the smallest (generalized) eigenvalue and the corresponding (generalized) eigenvector. In Eq. (12), we prepare $M^{\top} M$ with cost $O(|X||\mathcal{O}|) = O(|X|^2)$ by exploiting the previous calculation. We thus only have to focus on the calculation of the diagonal matrix $D$ in Eq. (12). Note that all but one diagonal entry can be taken over from the previous iteration. Thus, the additional runtime comes only from evaluating the gradient of the new term $b$ at the given points, which costs $O(n|X|E)$, where $E$ is the cost of evaluating a monomial at a point. This is negligible because the cost of solving the generalized eigenvalue problem is dominant. In conclusion, introducing gradient-weighted normalization to the ABM algorithm does not change its computational complexity. In contrast, gradient normalization can be more computationally expensive: it requires calculating $\nabla \mathcal{O}(X)^{\top} \nabla \mathcal{O}(X)$, which costs $O(n|X||\mathcal{O}|) = O(n|X|^2)$ even if we utilize the results of the previous calculation.

We demonstrate the scaling consistency (Proposition 5.5) using a simple numerical example. We consider six points sampled from the unit circle, $X^* = \{(\cos\theta_k, \sin\theta_k)\}_{k=1,\ldots,6}$, where $\theta_k = k\pi/3$. The points are perturbed by additive Gaussian noise $\mathcal{N}(\mathbf{0}, \epsilon I)$ with $\epsilon = 0.1$ to obtain the perturbed point set $X$. We compare the two normalizations by applying the ABM algorithm to $(X, \epsilon)$ and $(0.1X, 0.1\epsilon)$, using the degree-reverse lexicographic order. The results are shown in Fig. 1: gradient-weighted normalization leads to two bases of the same size with almost identical contour plots (Fig. 1(a)), whereas coefficient normalization does not (Fig. 1(b)). Furthermore, as Proposition 5.5 suggests, the coefficients of the polynomials in the first and second rows of Fig. 1 exhibit a definite relationship. For example, for the first basis polynomials (ellipses) $g$ and $\hat{g}$, the coefficients of the constant, linear, and quadratic terms in $g$ are $0.1^{-1}$, $0.1^{0}$, and $0.1$ times those in $\hat{g}$, respectively. (The values of points and coefficients in this section are rounded for visibility.)

In this paper, we proposed gradient-weighted normalization for the approximate border basis computation of vanishing ideals. We showed that it is a valid normalization for border basis computation by proving that the gradient-weighted norm always takes nonzero values for order terms and border basis polynomials. Gradient-weighted normalization is compatible with the existing analysis of approximate border bases and the computation algorithms, and the time complexity does not change. The data-dependent nature of gradient-weighted normalization provides several important properties (stability against perturbation and scaling consistency) in basis computation algorithms, which cannot be realized by coefficient normalization.

A current limitation of our study is that, although we proved that the change in the evaluation of a gradient-weighted unitary polynomial is bounded with respect to the magnitude of the perturbation, it does not necessarily follow that the whole basis computation algorithm is more robust than existing methods. Nevertheless, we consider that the present study provides a new ingredient for analyzing border basis computation in the approximate setting, where perturbed points are given and stable computation is required. We also derived an inequality that relates the gradient-weighted norm to the coefficient norm, which we consider helpful for exploiting the existing analyses.
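To make the quantities used in Section 5 concrete, the following Python sketch mimics the normalization step: it reconstructs the gradient-weighted norm of a term as in the proof of Lemma 5.6 and solves a small generalized eigenvalue problem of the form $M^{\top}M\mathbf{c} = \lambda D^2 \mathbf{c}$ (cf. Eq. (12)) on noisy circle points. This is a hedged sketch, not the paper's implementation: the candidate term set, the handling of the constant term by centering, and all helper names are illustrative assumptions.

```python
# Hedged sketch (illustrative, not the paper's code): the normalization step
# solves the generalized eigenvalue problem (M^T M) c = lambda D^2 c, where M
# stacks term evaluations on X and D holds gradient-weighted norms of terms.
import numpy as np

def term_eval(exps, X):
    """Evaluate the monomial prod_k x_k^exps[k] at each row of X."""
    return np.prod(X ** np.asarray(exps), axis=1)

def term_gw_norm(exps, X):
    """Gradient-weighted norm of a monomial, reconstructed from Lemma 5.6:
    sqrt( sum_k ||d t / d x_k (X)||^2 / sum_k deg_k(t)^2 )."""
    exps = np.asarray(exps)
    num = 0.0
    for k, d in enumerate(exps):
        if d > 0:
            e = exps.copy()
            e[k] -= 1
            num += np.sum((d * term_eval(e, X)) ** 2)
    return np.sqrt(num / np.sum(exps ** 2))

rng = np.random.default_rng(0)
theta = np.arange(1, 7) * np.pi / 3              # six points on the unit circle
X = np.c_[np.cos(theta), np.sin(theta)] + 0.01 * rng.standard_normal((6, 2))

terms = [(1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]  # x, y, x^2, xy, y^2
M = np.column_stack([term_eval(t, X) for t in terms])
Dinv = np.diag([1.0 / term_gw_norm(t, X) for t in terms])

# Center the columns so that a free constant term is implicitly allowed, then
# reduce the generalized problem to a standard symmetric eigenproblem.
M0 = M - M.mean(axis=0)
lam, W = np.linalg.eigh(Dinv @ (M0.T @ M0) @ Dinv)
c = Dinv @ W[:, 0]       # gradient-weighted-unitary coefficient vector
c0 = -np.mean(M @ c)     # recovered constant term
residual = np.linalg.norm(M @ c + c0)
print(residual)          # small: the polynomial approximately vanishes on X
```

With exact circle points and no noise, the smallest eigenvalue would be zero and the recovered polynomial would be a rescaling of $x^2 + y^2 - 1$.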
References

[1] John Abbott, Claudia Fassino, and Maria-Laura Torrente. Stable border bases for ideals of points. Journal of Symbolic Computation, 43(12):883–894, 2008.
[2] Rika Antonova, Maksim Maydanskiy, Danica Kragic, Sam Devlin, and Katja Hofmann. Analytic manifold learning: Unifying and evaluating representations for continuous control. arXiv preprint arXiv:2006.08718, 2020.
[3] David Cox, John Little, and Donal O'Shea. Ideals, Varieties, and Algorithms. Springer, 1992.
[4] Claudia Fassino. Almost vanishing polynomials for sets of limited precision points. Journal of Symbolic Computation, 45(1):19–37, 2010.
[5] Claudia Fassino and Maria-Laura Torrente. Simple varieties for limited precision points. Theoretical Computer Science, 479:174–186, 2013.
[6] Daniel Heldt, Martin Kreuzer, Sebastian Pokutta, and Hennie Poulisse. Approximate computation of zero-dimensional polynomial ideals. Journal of Symbolic Computation, 44:1566–1591, 2009.
[7] Chenping Hou, Feiping Nie, and Dacheng Tao. Discriminative vanishing component analysis. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI), pages 1666–1672. AAAI Press, 2016.
[8] Artur Karimov, Erivelton G. Nepomuceno, Aleksandra Tutueva, and Denis Butusov. Algebraic method for the reconstruction of partially observed nonlinear systems using differential and integral embedding. Mathematics, 8(2):300, 2020.
[9] Achim Kehrein and Martin Kreuzer. Characterizations of border bases. Journal of Pure and Applied Algebra, 196(2):251–270, 2005.
[10] Hiroshi Kera and Yoshihiko Hasegawa. Noise-tolerant algebraic method for reconstruction of nonlinear dynamical systems. Nonlinear Dynamics, 85:675–692, 2016.
[11] Hiroshi Kera and Yoshihiko Hasegawa. Approximate vanishing ideal via data knotting. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), pages 3399–3406. AAAI Press, 2018.
[12] Hiroshi Kera and Yoshihiko Hasegawa. Spurious vanishing problem in approximate vanishing ideal. IEEE Access, 7:178961–178976, 2019.
[13] Hiroshi Kera and Yoshihiko Hasegawa. Gradient boosts the approximate vanishing ideal. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), pages 4428–4435. AAAI Press, 2020.
[14] Hiroshi Kera and Hitoshi Iba. Vanishing ideal genetic programming. In Proceedings of the 2016 IEEE Congress on Evolutionary Computation (CEC), pages 5018–5025. IEEE, 2016.
[15] Franz J. Király, Martin Kreuzer, and Louis Theran. Dual-to-kernel learning with ideals. arXiv preprint arXiv:1402.0099, 2014.
[16] Martin Kreuzer and Lorenzo Robbiano. Computational Commutative Algebra 2. Springer, 2005.
[17] Jan Limbeck. Computation of Approximate Border Bases and Applications. PhD thesis, Universität Passau, 2013.
[18] Roi Livni, David Lehavi, Sagi Schein, Hila Nachliely, Shai Shalev-Shwartz, and Amir Globerson. Vanishing component analysis. In Proceedings of the Thirtieth International Conference on Machine Learning (ICML), pages 597–605. PMLR, 2013.
[19] Lorenzo Robbiano and John Abbott. Approximate Commutative Algebra. Springer-Verlag Wien, 2010.
[20] Hans J. Stetter. Numerical Polynomial Algebra. Society for Industrial and Applied Mathematics, 2004.
[21] Maria-Laura Torrente. Application of Algebra in the Oil Industry. PhD thesis, Scuola Normale Superiore, Pisa, 2008.
[22] Lu Wang and Tomoaki Ohtsuki. Nonlinear blind source separation unifying vanishing component analysis and temporal structure. IEEE Access, 6:42837–42850, 2018.
[23] Zhichao Wang, Qian Li, Gang Li, and Guandong Xu. Polynomial representation for persistence diagram. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.

Figure 1: Contour plots of the bases computed by the ABM algorithm from $(X, \epsilon = 0.1)$ (first row) and $(0.1X, \epsilon = 0.01)$ (second row). (a) Gradient-weighted normalization. (b) Coefficient normalization.
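The scaling-consistency behavior illustrated in Fig. 1 can be reproduced with a small self-contained sketch: fitting a gradient-weighted-unitary approximately vanishing polynomial on $X$ and on $0.1X$ scales the degree-$d$ coefficients by $10^{d-1}$ and the constant term by $0.1$. As before, this is a hedged reconstruction under assumed definitions, not the paper's implementation; the term set and helper names are illustrative.

```python
# Hedged sketch of the scaling consistency (Proposition 5.5): the fitted
# polynomial on 0.1X has degree-d coefficients 10^(d-1) times those on X,
# and a constant term 0.1 times as large. Definitions reconstructed from
# the lemmas; all names are illustrative.
import numpy as np

TERMS = [(1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]  # x, y, x^2, xy, y^2

def term_eval(exps, X):
    return np.prod(X ** np.asarray(exps), axis=1)

def term_gw_norm(exps, X):
    exps = np.asarray(exps)
    num = 0.0
    for k, d in enumerate(exps):
        if d > 0:
            e = exps.copy()
            e[k] -= 1
            num += np.sum((d * term_eval(e, X)) ** 2)
    return np.sqrt(num / np.sum(exps ** 2))

def fit_gw(X):
    """Smallest approximately vanishing polynomial over TERMS (plus a free
    constant), normalized to gradient-weighted norm 1; returns (c0, coeffs)."""
    M = np.column_stack([term_eval(t, X) for t in TERMS])
    Dinv = np.diag([1.0 / term_gw_norm(t, X) for t in TERMS])
    M0 = M - M.mean(axis=0)
    _, W = np.linalg.eigh(Dinv @ (M0.T @ M0) @ Dinv)
    c = Dinv @ W[:, 0]
    if c[2] < 0:               # fix the eigenvector's sign ambiguity
        c = -c
    return -np.mean(M @ c), c

rng = np.random.default_rng(0)
theta = np.arange(1, 7) * np.pi / 3
X = np.c_[np.cos(theta), np.sin(theta)] + 0.1 * rng.standard_normal((6, 2))

c0, c = fit_gw(X)
c0_hat, c_hat = fit_gw(0.1 * X)
print(c0_hat / c0)             # ~0.1
print(c_hat / c)               # ~[1, 1, 10, 10, 10]
```

The coefficient ratios are independent of the noise realization, since the two fitting problems are exact rescalings of each other under this normalization.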