Border Basis Computation with Gradient-Weighted Norm
Hiroshi Kera∗

Abstract
Normalization of polynomials plays an essential role in the approximate basis computation of vanishing ideals. In computer algebra, coefficient normalization, which normalizes a polynomial by its coefficient norm, is the most common method. In this study, we propose gradient-weighted normalization for the approximate border basis computation of vanishing ideals, inspired by recent results in machine learning. The data-dependent nature of gradient-weighted normalization leads to powerful properties such as better stability against perturbation and consistency in the scaling of input points, which cannot be attained by the conventional coefficient normalization. With a slight modification, the analysis of algorithms with coefficient normalization still works with gradient-weighted normalization, and the time complexity does not change. We also provide an upper bound on the coefficient norm based on the gradient-weighted norm, which allows us to discuss approximate border bases with gradient-weighted normalization from the perspective of the coefficient norm.
Given a set of points $X \subset \mathbb{R}^n$, the vanishing ideal of $X$ is the set of polynomials in $\mathbb{R}[x_1, \ldots, x_n]$ that vanish for any $\mathbf{x} \in X$:
$$\mathcal{I}(X) = \{ g \in \mathbb{R}[x_1, \ldots, x_n] \mid \forall \mathbf{x} \in X,\ g(\mathbf{x}) = 0 \}.$$
In the last decade, the approximate computation of bases of vanishing ideals has been extensively studied [1, 4, 6, 11, 12, 13, 15, 17, 18, 19], where a basis comprises approximately vanishing polynomials, i.e., $g(\mathbf{x}) \approx 0$ for all $\mathbf{x} \in X$. Such approximate basis computation and approximately vanishing polynomials have been exploited in various fields such as dynamics reconstruction, signal processing, and machine learning [2, 7, 8, 10, 14, 21, 22, 23]. The wide variety of applications is based on the fact that the approximate basis computation takes a set of noisy points as its input, which makes it suitable for recent data-driven science, and efficiently computes a set of multivariate polynomials that characterize the given data.

∗Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo. Corresponding author: Hiroshi Kera (e-mail: [email protected]).

To consider an approximately vanishing polynomial $g$ for $X$, the normalization of $g$ plays a fundamental role because scaling $g$ by $k \in \mathbb{R}$ changes the extent of approximate vanishing. For example, with a threshold $\epsilon = 0.5$, even if $g$ is approximately vanishing in the sense that $|g(\mathbf{x})| \le \epsilon$, the scaled polynomial $10g$ need not be approximately vanishing. In computer algebra, coefficient normalization, where polynomials are normalized to have a unit coefficient norm, is the most common choice. In contrast, in machine learning, the basis computation of vanishing ideals is performed in a monomial-agnostic manner to sidestep symbolic computation and term orderings [12, 15, 18], and one cannot efficiently access the coefficients of terms. Thus, polynomials are handled without any proper normalization. Recently, this issue was resolved by gradient normalization [13], which uses the gradient norm $\sqrt{\sum_{\mathbf{x} \in X} \|\nabla g(\mathbf{x})\|^2}$. Interestingly, the data-dependent nature of gradient normalization provides fascinating properties that have never been realized by other basis computation algorithms. However, the direct application of gradient normalization to the monomial-aware basis computation in computer algebra does not take over these advantages and merely increases the computational cost. Thus, an effective data-dependent normalization remains unexplored for computer-algebraic approaches.

In this paper, we propose a new normalization, called gradient-weighted normalization, which is a hybrid of the coefficient normalization conventionally used in computer algebra and the gradient normalization recently developed in machine learning. Gradient-weighted normalization can be applied to most existing basis computation algorithms for vanishing ideals in computer algebra. In particular, we focus on the approximate computation of border bases because these are the most common choice in approximate computation [1, 6, 17] due to their better numerical stability compared with Gröbner bases [4, 20]. We highlight the following advantages of gradient-weighted normalization in the approximate border basis computation.
As a particular example, we analyze the approximate Buchberger–Möller (ABM) algorithm [17].

• Gradient-weighted normalization realizes an approximate border basis computation that outputs polynomials that are more robust against perturbations on the input points.

• With gradient-weighted normalization, the approximate border basis computation acquires scaling consistency; scaling the input points does not change the size of the output basis and only linearly scales the evaluation values for the input points.

• Gradient-weighted normalization only requires slight modifications to the algorithm and leads to only slight changes in the analysis of the original algorithm. The time complexity of the algorithm does not change with the use of gradient-weighted normalization, unlike with gradient normalization.

Furthermore, we derive an upper bound on the coefficient norm of a vanishing polynomial based on its gradient-weighted norm. This result is helpful in analyzing approximate border bases, whose definition relies on the coefficient norm. We consider that this study provides a new direction for approximate border basis computation toward data-dependent normalization and analysis.
The gradient of polynomials has been exploited for the approximate computation of vanishing ideals in several studies. In [1], the first-order approximation (and thus the gradient) of polynomials was computed to discover a set of monomials whose evaluation matrix remains full-rank under small perturbations of the points. Similarly, in [5], the first-order approximation of polynomials was considered to compute a low-degree polynomial that approximately passes through the given points in terms of the geometrical distance. However, both methods use coefficient normalization and do not always succeed. In contrast, our algorithm uses gradient information in the normalization and always succeeds.

In the study most similar to the present one [13], polynomials normalized by the gradient norm were considered. However, that method focuses on the monomial-agnostic basis computation, which does not consider a polynomial as a linear combination of terms but instead as a linear combination of lower-degree polynomials. Although this can be helpful in practical situations in which symbolic computation and term orderings are not favorable, it is still unknown how helpful data-dependent normalization is in the monomial-aware setting, which is standard in computer algebra. In particular, the direct application of gradient normalization in the border basis computation cannot fully exploit the advantages shown in the monomial-agnostic basis computation. Furthermore, how the gradient norm can be related to the coefficient norm, which plays an important role in approximate border bases, remains unclear. The gradient-weighted norm proposed in this study realizes the advantages of gradient normalization and also upper-bounds the coefficient norm. By taking advantage of the monomial-aware setting, a more detailed analysis than [13] is performed, such as providing lower and upper bounds on the norms of the gradients of terms and polynomials.
Throughout the paper, we consider a finite set of points $X \subset \mathbb{R}^n$, a polynomial ring $R_n = \mathbb{R}[x_1, \ldots, x_n]$, and the set of terms $\mathcal{T}_n \subset R_n$, where $x_1, \ldots, x_n$ are indeterminates. The definitions of the order ideal and the border basis are based on those in [16], and the definitions of the approximate notions are based on [6].

Definition 3.1.
Given a set of points $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\} \subset \mathbb{R}^n$, with a slight abuse of notation, the evaluation vectors of a polynomial $h \in R_n$ and of its gradient $\nabla h$ are defined as follows, respectively:
$$h(X) = \begin{pmatrix} h(\mathbf{x}_1) & h(\mathbf{x}_2) & \cdots & h(\mathbf{x}_N) \end{pmatrix}^{\top} \in \mathbb{R}^N,$$
$$\nabla h(X) = \begin{pmatrix} \nabla h(\mathbf{x}_1)^{\top} & \nabla h(\mathbf{x}_2)^{\top} & \cdots & \nabla h(\mathbf{x}_N)^{\top} \end{pmatrix}^{\top} \in \mathbb{R}^{nN}.$$
For a set of polynomials $H = \{h_1, h_2, \ldots, h_s\} \subset R_n$ and the set of their gradients $\nabla H = \{\nabla h_1, \nabla h_2, \ldots, \nabla h_s\}$, the evaluation matrices are defined as
$$H(X) = \begin{pmatrix} h_1(X) & h_2(X) & \cdots & h_s(X) \end{pmatrix} \in \mathbb{R}^{N \times s},$$
$$\nabla H(X) = \begin{pmatrix} \nabla h_1(X) & \nabla h_2(X) & \cdots & \nabla h_s(X) \end{pmatrix} \in \mathbb{R}^{nN \times s}.$$

Here, following the convention in border basis computation, we use terms for power products of indeterminates and monomials for terms accompanied by coefficients.

Definition 3.2.
A polynomial $f \in R_n$ is said to be unitary if the norm of its coefficient vector equals one.

Definition 3.3.
A finite set of terms $O \subset \mathcal{T}_n$ is called an order ideal if the following holds: if $t \in \mathcal{T}_n$ divides $o \in O$, then $t \in O$. The border of $O$ is defined as $\partial O = \left( \bigcup_{k=1}^{n} x_k O \right) \setminus O$.

Definition 3.4.
Let $O \subset k[x_1, \ldots, x_n]$ be an order ideal. Then, an $O$-border prebasis is a set $G$ of polynomials of the form
$$b - \sum_{o \in O} c_o o,$$
one for each $b \in \partial O$, where $c_o \in k$. If the residue classes of the terms in $O$ form a basis of the $k$-vector space $k[x_1, x_2, \ldots, x_n]/I$, then $G$ is called an $O$-border basis of an ideal $I$.

Definition 3.5.
Given $\delta \ge 0$, an $O$-border prebasis $G \subset R_n$ is said to be a $\delta$-approximate $O$-border basis of an ideal $I$ if, for any $g, \widetilde{g} \in G$, the normal remainder of their S-polynomial has a coefficient norm equal to or less than $\delta$.

Definition 3.6.
Given $\epsilon \ge 0$, a polynomial $g \in R_n$ is said to be $\epsilon$-approximately vanishing for a set of points $X \subset \mathbb{R}^n$ if $\|g(X)\| \le \epsilon$, where $\|\cdot\|$ denotes the Euclidean norm.

Definition 3.7.
Given $\epsilon \ge 0$, an ideal $I \subset R_n$ is said to be an $\epsilon$-approximate vanishing ideal for a set of points $X \subset \mathbb{R}^n$ if there exists a system of unitary polynomials that generates $I$ and is $\epsilon$-approximately vanishing for $X$.

Remark 3.8.
Let us consider the $O$-border basis $G$ of the vanishing ideal $\mathcal{I}(X)$ of $X \subset \mathbb{R}^n$. The evaluation vectors of the order terms span $\mathbb{R}^{|X|}$. The evaluation vectors of the terms in $O$ are linearly independent, and $|O| = |X|$. In the approximate case, the former still holds, and the latter becomes $|O| \le |X|$.

We omit the definitions of normal remainders and S-polynomials, which are relatively basic notions in computer algebra; refer to [3].

Other notations. We denote the support of a given polynomial by $\mathrm{supp}(\cdot)$ and the set of linear combinations of a given set of terms with coefficients in $\mathbb{R}$ by $\mathrm{span}_{\mathbb{R}}(\cdot)$. Besides, $\|\cdot\|$ denotes the Euclidean norm of a vector, and $\|\cdot\|_{\mathrm{c}}$ denotes the coefficient norm of a polynomial. The total degree of a polynomial is denoted by $\deg(\cdot)$, and $\deg_k(\cdot)$ denotes the degree with respect to $x_k$. The cardinality of a set is denoted by $|\cdot|$.

Definition 4.1.
The gradient norm of a polynomial $g \in R_n$ with respect to $X \subset \mathbb{R}^n$ is
$$\|g\|_{\mathrm{g},X} = \frac{1}{Z}\sqrt{\sum_{\mathbf{x} \in X} \|\nabla g(\mathbf{x})\|^2}, \quad \text{where } Z = \sqrt{\sum_{k=1}^{n} \deg_k(g)}.$$

Definition 4.2.
The gradient-weighted norm of a polynomial $g = \sum_i c_i t_i$ ($c_i \in \mathbb{R}$, $t_i \in \mathcal{T}_n$) is
$$\|g\|_{\mathrm{gw},X} = \sqrt{\sum_i c_i^2 \|t_i\|_{\mathrm{g},X}^2}.$$
If the gradient-weighted norm of $g$ is equal to one, then $g$ is gradient-weighted unitary.

Remark 4.3.
For any term $t \in \mathcal{T}_n$, its gradient norm and gradient-weighted norm are identical, i.e., $\|t\|_{\mathrm{g},X} = \|t\|_{\mathrm{gw},X}$. The derivations of our results work with any $Z > 0$; our choice of $Z$ provides simpler bounds.

In general, the gradient-weighted norm and the coefficient norm of a polynomial are not correlated; a large gradient-weighted norm does not always imply a large coefficient norm, and vice versa. The following two examples illustrate this.

Example 4.4.
Let us consider the polynomial $f = x^2 y^2 - c \in \mathbb{R}[x, y]$ ($c \in \mathbb{R}$). The gradient-weighted norm of $f$ is $\|f\|_{\mathrm{gw},X} = 0$ for $X = \{(1, 0), (0, 1)\}$, whereas the coefficient norm $\|f\|_{\mathrm{c}} = \sqrt{1 + c^2}$ can be made arbitrarily large by increasing $|c|$.

Example 4.5.
Let us consider the polynomial $f = (x^2 + y^2 - 1)/\sqrt{3} \in \mathbb{R}[x, y]$. The coefficient norm of $f$ is $\|f\|_{\mathrm{c}} = 1$, whereas the gradient-weighted norm for $X = \{(k, 0), (0, k)\}$ is $\|f\|_{\mathrm{gw},X} = 2|k|/\sqrt{3}$, which can be made arbitrarily large by increasing $|k|$.

Example 4.4 also indicates that normalizing polynomials by their gradient-weighted norm is not always valid because zero-division can occur. However, in border basis computation, we can show that gradient-weighted normalization is always valid.

Lemma 4.6.
Let $O \subset \mathcal{T}_n$ be an order ideal. Then, for any $o \in O$, if $\deg_k(o) > 0$ (i.e., $\partial o/\partial x_k \ne 0$), then $\frac{\partial o}{\partial x_k}/\deg_k(o) \in O$. Besides, for any $b \in \partial O$, there exists some $x_k$ that satisfies $\frac{\partial b}{\partial x_k}/\deg_k(b) \in O$.

Proof.
Let $o = \prod_{l=1}^{n} x_l^{\alpha_l} \in O$, where $\alpha_l \in \mathbb{Z}_{\ge 0}$ and $\alpha_k > 0$. Then, $\partial o/\partial x_k = \deg_k(o)\, x_k^{\alpha_k - 1} \prod_{l \ne k} x_l^{\alpha_l}$. Because $x_k^{\alpha_k - 1} \prod_{l \ne k} x_l^{\alpha_l}$ divides $o$, it holds that $\frac{\partial o}{\partial x_k}/\deg_k(o) \in O$. For $b \in \partial O$, we can write $b = x_k o$ for some $x_k$ and $o = \prod_{l=1}^{n} x_l^{\alpha_l} \in O$. Thus, $\partial b/\partial x_k = (\alpha_k + 1) \prod_{l=1}^{n} x_l^{\alpha_l} = \deg_k(b)\, o$; hence, $\frac{\partial b}{\partial x_k}/\deg_k(b) = o \in O$.

Proposition 4.7. Let $G \subset R_n$ be the $O$-border basis of the vanishing ideal $\mathcal{I}(X)$ of $X \subset \mathbb{R}^n$. Then, the following holds.

1. Any $o \in O \setminus \{1\}$ has a nonzero gradient-weighted norm, i.e., $\|o\|_{\mathrm{gw},X} \ne 0$.
2. Any $g \in G$ has a nonzero gradient-weighted norm, i.e., $\|g\|_{\mathrm{gw},X} \ne 0$.
3. For $O = \{1, o_2, \ldots, o_{|O|}\}$, the vectors $\nabla o_2(X), \ldots, \nabla o_{|O|}(X)$ are linearly independent.
4. For $G = \{g_1, \ldots, g_{|G|}\}$, the vectors $\nabla g_1(X), \ldots, \nabla g_{|G|}(X)$ are linearly independent.

Proof.
From Lemma 4.6, for any order term except 1, there is a partial derivative whose term is again an order term. In addition, for any border basis polynomial, whose support is in $O \cup \partial O$, there is a partial derivative that is a nontrivial linear combination of order terms. Because the evaluation vectors of order terms are linearly independent (Remark 3.8), the gradient-weighted norms of order terms and border basis polynomials are always nonzero (properties (1) and (2) are proven).

Next, we prove property (3). For $o \in O$, we define the index set $i(o) = \{k \mid \deg_k(o) > 0\}$ and the order term $o^{(k)} := \frac{\partial o}{\partial x_k}/\deg_k(o)$ for $k \in i(o)$. Let us consider any two distinct order terms $o_1, o_2 \in O \setminus \{1\}$. If $i(o_1) \cap i(o_2) = \emptyset$, then $\nabla o_1(X)$ and $\nabla o_2(X)$ are linearly independent because $\frac{\partial o_1}{\partial x_k}(X) \ne \mathbf{0}$ always indicates that $\frac{\partial o_2}{\partial x_k}(X) = \mathbf{0}$, and vice versa. If there exists $k \in i(o_1) \cap i(o_2)$, then, noting that $o_1^{(k)}, o_2^{(k)} \in O$ are distinct order terms, their evaluation vectors with respect to $X$ are linearly independent. Thus, $\nabla o_1(X)$ and $\nabla o_2(X)$ are always linearly independent. This proof readily generalizes from two order terms to $O = \{1, o_2, \ldots, o_s\}$. Property (4) can be proven similarly by noting that distinct $g_1, g_2 \in G$ have different border terms.

Proposition 4.7 indicates that gradient-weighted normalization is always valid in the basis computation. Thus, given a polynomial, its coefficient normalization and gradient-weighted normalization are identical up to a constant scale (although this difference of scales leads to the interesting results presented in this paper). Hence, replacing the coefficient norm with the gradient-weighted norm does not affect most symbolic analyses of border bases in the existing studies. Besides such compatibility, border bases with gradient-weighted normalization yield higher stability against perturbations of the points.

Proposition 4.8.
For any gradient-weighted unitary polynomial $g \in R_n$ for $X \subset \mathbb{R}^n$, it holds that $\|\nabla g(X)\| \le \deg(g)\sqrt{|X|}$.

Proof.
Let $g = \sum_{i=1}^{s} c_i t_i$ ($c_i \in \mathbb{R}$, $t_i \in \mathcal{T}_n$). In addition, we define the index set $j(g) = \{i \in \{1, \ldots, s\} \mid \|t_i\|_{\mathrm{gw},X} \ne 0\}$. Then,
$$\nabla g(\mathbf{x}) = \sum_{i \in j(g)} c_i \|t_i\|_{\mathrm{gw},X} \frac{\nabla t_i(\mathbf{x})}{\|t_i\|_{\mathrm{gw},X}} = \sum_{i \in j(g)} c_i \|t_i\|_{\mathrm{gw},X} \frac{\nabla t_i(\mathbf{x})}{\|\nabla t_i(X)\|} \sqrt{\sum_{k=1}^{n} \deg_k(t_i)}.$$
Using $\sqrt{\sum_k \deg_k(t_i)} \le \deg(t_i) \le \deg(g)$, the triangle inequality, and $\|g\|_{\mathrm{gw},X} = \sqrt{\sum_{i=1}^{s} c_i^2 \|t_i\|_{\mathrm{gw},X}^2} = 1$, we obtain
$$\|\nabla g(\mathbf{x})\| \le \deg(g) \sum_{i \in j(g)} |c_i| \|t_i\|_{\mathrm{gw},X} \frac{\|\nabla t_i(\mathbf{x})\|}{\|\nabla t_i(X)\|} \le \deg(g) \sqrt{\sum_{i \in j(g)} c_i^2 \|t_i\|_{\mathrm{gw},X}^2} \sqrt{\sum_{i \in j(g)} \frac{\|\nabla t_i(\mathbf{x})\|^2}{\|\nabla t_i(X)\|^2}} = \deg(g) \sqrt{\sum_{i \in j(g)} \frac{\|\nabla t_i(\mathbf{x})\|^2}{\|\nabla t_i(X)\|^2}},$$
where at the second inequality, we used the Cauchy–Schwarz inequality. Thus,
$$\|\nabla g(X)\| = \sqrt{\sum_{\mathbf{x} \in X} \|\nabla g(\mathbf{x})\|^2} \le \deg(g) \sqrt{\sum_{\mathbf{x} \in X} \sum_{i \in j(g)} \frac{\|\nabla t_i(\mathbf{x})\|^2}{\|\nabla t_i(X)\|^2}} = \deg(g)\sqrt{|\mathrm{supp}(g) \setminus \{1\}|} \le \deg(g)\sqrt{|X|}.$$
At the last inequality, we used $|\mathrm{supp}(g) \setminus \{1\}| \le |O| \le |X|$.

Remark 4.9.
The inequality in Proposition 4.8 becomes $\|\nabla g(X)\| \le \sqrt{|X|}$ if we use $Z = 1$ in Definition 4.1.

Proposition 4.8 implies that, for a small perturbation $\mathbf{p}$ on $\mathbf{x}$, the two evaluation values $g(\mathbf{x})$ and $g(\mathbf{x} + \mathbf{p})$ are close to each other, and their difference can be bounded by a constant scaling of the magnitude of the perturbation. This is not the case with coefficient normalization because a (coefficient-)unitary polynomial does not necessarily have a small gradient (cf. Example 4.5).

Among several methods of approximate border basis construction, we choose the ABM algorithm [17] to make the analysis simple.
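Before turning to the algorithm, note that the norms of Definitions 4.1 and 4.2 are directly computable from the input points. The following sketch (Python with NumPy; the helper names are ours) numerically reproduces the values in Examples 4.4 and 4.5:

```python
import numpy as np

def term_partial(e, x, k):
    """Evaluate d/dx_k of the term prod_l x_l^{e_l} at the point x."""
    if e[k] == 0:
        return 0.0
    v = float(e[k])
    for l in range(len(e)):
        v *= x[l] ** (e[l] - 1 if l == k else e[l])
    return v

def grad_norm(e, X):
    """Gradient norm ||t||_{g,X} of a term (Definition 4.1), Z = sqrt(sum_k deg_k)."""
    sq = sum(term_partial(e, x, k) ** 2 for x in X for k in range(len(e)))
    Z = np.sqrt(sum(e)) if sum(e) > 0 else 1.0
    return np.sqrt(sq) / Z

def gw_norm(poly, X):
    """Gradient-weighted norm (Definition 4.2); poly maps exponent tuples to coefficients."""
    return np.sqrt(sum(c ** 2 * grad_norm(e, X) ** 2 for e, c in poly.items()))

# Example 4.4: f = x^2 y^2 - c has zero gradient-weighted norm on {(1,0), (0,1)}.
f1 = {(2, 2): 1.0, (0, 0): -100.0}
print(gw_norm(f1, [(1.0, 0.0), (0.0, 1.0)]))  # 0.0

# Example 4.5: f = (x^2 + y^2 - 1)/sqrt(3) on X = {(k,0), (0,k)} has norm 2|k|/sqrt(3).
k = 5.0
f2 = {(2, 0): 1 / np.sqrt(3), (0, 2): 1 / np.sqrt(3), (0, 0): -1 / np.sqrt(3)}
print(gw_norm(f2, [(k, 0.0), (0.0, k)]), 2 * abs(k) / np.sqrt(3))  # both ≈ 5.7735
```

The sparse exponent-tuple representation of polynomials is a choice of ours for illustration, not part of the paper.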
Given a finite set of points $X \subset \mathbb{R}^n$, an error tolerance $\epsilon \ge 0$, and a term ordering $\sigma$, the ABM algorithm collects order terms and approximately vanishing polynomials from lower to higher degrees.

Algorithm 1: The ABM algorithm with gradient-weighted normalization

Input: $X, \epsilon, \sigma$
Output: $G, O$
  $G = \{\}$; $O = \{1\}$
  for $d = 1, 2, \ldots$ do
    $L = \{b \in \partial O \mid \deg(b) = d\}$  // terms in $L$ are ordered increasingly w.r.t. $\sigma$
    if $|L| = 0$ then return $(G, O)$ and terminate
    for $b$ in $L$ do
      solve the generalized eigenvalue problem Eq. (12) and obtain $(\lambda_{\min}, \mathbf{v}_{\min})$
      if $\sqrt{\lambda_{\min}} \le \epsilon$ then
        /* $O = \{o_1, o_2, \ldots, o_s\}$ and $\mathbf{v}_{\min} = (v_1, \ldots, v_{s+1})^{\top}$ */
        $g := v_1 b + v_2 o_1 + \cdots + v_{s+1} o_s$
        $G = G \cup \{g\}$
      else
        $O = O \cup \{b\}$

First, $O = \{1\}$ and $G = \{\}$ are set at degree 0. At each degree $d \ge 1$, the degree-$d$ border terms are prepared as $L = \{b \in \partial O \mid \deg(b) = d\}$. If $L$ is empty, the algorithm outputs $(O, G)$ and terminates; otherwise, the following steps S1–S3 are repeated until $L$ becomes empty.

S1 Select the smallest $b \in L$ in terms of $\sigma$ and remove $b$ from $L$. (In the original paper [17], the largest term is selected. We consider this to be a minor error; the smallest term should be selected first because the term $b$ is a potential leading term (or border term) and thus must always be larger than the terms in the tentative $O$.)

S2 Let $M, D$ be
$$M = \begin{pmatrix} b(X) & O(X) \end{pmatrix} \quad \text{and} \quad D = \mathrm{diag}\left( \|b\|_{\mathrm{gw},X}^2, \|o_1\|_{\mathrm{gw},X}^2, \ldots, \|o_s\|_{\mathrm{gw},X}^2 \right),$$
where $O = \{o_1, \ldots, o_s\}$. Solve the following generalized eigenvalue problem:
$$M^{\top} M \mathbf{v}_{\min} = \lambda_{\min} D \mathbf{v}_{\min}, \qquad (12)$$
where $\lambda_{\min}$ and $\mathbf{v}_{\min}$ are the smallest generalized eigenvalue and the corresponding generalized eigenvector, respectively.

S3 If $\sqrt{\lambda_{\min}} \le \epsilon$, define a new polynomial
$$g = v_1 b + v_2 o_1 + v_3 o_2 + \cdots + v_{s+1} o_s,$$
where $\mathbf{v}_{\min} = (v_1, \ldots, v_{s+1})^{\top}$, and update $G$ by $G = G \cup \{g\}$. Otherwise, update $O$ by $O = O \cup \{b\}$.

Once $L$ becomes empty, we proceed to the next degree $d + 1$ and construct a new $L$.

Proposition 5.1.
During the ABM algorithm with gradient-weighted normalization, the following always hold.

1. In Eq. (12), $g$ is gradient-weighted unitary with a nonzero coefficient on $b$, and it is $\sqrt{\lambda_{\min}}$-approximately vanishing for $X$.
2. No gradient-weighted unitary polynomial $h \in \mathrm{span}_{\mathbb{R}}(O)$ is $\epsilon$-approximately vanishing for $X$.

Proof.
We prove the claim by induction. At the initialization, the claim holds. Assume that the claim holds up to some point of the ABM algorithm, and we have $O, G, b$ at S1. By solving Eq. (12) at S2, we obtain the coefficient vector $\mathbf{v}_{\min}$ and $g = v_1 b + v_2 o_1 + v_3 o_2 + \cdots + v_{s+1} o_s$. Note that solving Eq. (12) minimizes $\|g(X)\|^2 = \mathbf{v}_{\min}^{\top} M^{\top} M \mathbf{v}_{\min} = \lambda_{\min}$ under the constraint $\|g\|_{\mathrm{gw},X}^2 = \mathbf{v}_{\min}^{\top} D \mathbf{v}_{\min} = 1$. Thus, $g$ is gradient-weighted unitary and $\sqrt{\lambda_{\min}}$-approximately vanishing. By construction, the generalized eigenvalue problem $(O(X)^{\top} O(X), \widetilde{D})$, where $\widetilde{D}$ is the diagonal matrix whose diagonal entries are the squared gradient-weighted norms of the terms in $O$, only has eigenvalues larger than $\epsilon^2$ because $O$ is extended only when $\sqrt{\lambda_{\min}} > \epsilon$. This implies that no gradient-weighted unitary polynomial whose support is in $O$ is $\epsilon$-approximately vanishing, and thus the coefficient of the border term $b$ in $g$ is nonzero.

The following theorem states that the ABM algorithm with gradient-weighted normalization enjoys properties that are almost identical to those of the original ABM algorithm (Theorem 4.3.1 in [17]). The proof is presented in Section 5.3.

Theorem 5.2.
Given $X \subset \mathbb{R}^n$, $\epsilon \ge 0$, and a term ordering $\sigma$, the ABM algorithm with gradient-weighted normalization computes $G = \{g_1, \ldots, g_{|G|}\} \subset R_n$ and $O = \{o_1, \ldots, o_{|O|}\} \subset \mathcal{T}_n$, which have the following properties:

1. All the polynomials in $G$ are gradient-weighted unitary, and $G$ generates an $\epsilon$-approximate vanishing ideal of $X$.
2. No gradient-weighted unitary polynomial in $\mathrm{span}_{\mathbb{R}}(O)$ vanishes $\epsilon$-approximately on $X$.
3. If $O$ is an order ideal of terms, then the set $\widetilde{G} = \{g/\mathrm{LC}_{\sigma}(g) \mid g \in G\}$ is an $O$-border prebasis, where $\mathrm{LC}_{\sigma}(\cdot)$ denotes the leading coefficient of a polynomial in the ordering $\sigma$.
4. If $O$ is an order ideal of terms and if $\mathbf{0} \in X$, then the set $\widetilde{G}$ is a $\delta$-approximate border basis in terms of the gradient-weighted norm and an $\eta$-approximate border basis in terms of the coefficient norm, where $\delta$ and $\eta$, respectively, satisfy
$$\delta < \frac{1 + \sqrt{|X|}}{|\gamma_{\min}|} \left( \|X\|_{\max} + n\sqrt{|X|}\,\sqrt{\epsilon}\, \frac{\min\{(\epsilon/|\gamma_{\min}|)^{\deg(G)-1},\, 1\}}{|\gamma_{\min}|} \right),$$
$$\eta < \sqrt{\delta^2 + (\epsilon/|\gamma_{\min}|)^2}\, \min\{(\epsilon/|\gamma_{\min}|)^{\deg(G)-1},\, 1\}\, \sqrt{|X|},$$
where $\gamma_{\min}$ is the minimum absolute value of the coefficients of the border terms in $G$, $\deg(G)$ is the highest degree of the polynomials in $G$, and $\|X\|_{\max} = \max_k \|x_k(X)\|$.
5. If $\epsilon = 0$, the algorithm produces the same results as the Buchberger–Möller algorithm for border bases with gradient-weighted normalization.

The main changes from the original theorem with coefficient normalization are as follows: in Theorem 5.2, (i) the unitarity of polynomials is based on the gradient-weighted norm instead of the coefficient norm, and (ii) there are two upper bounds on the approximation, in terms of the gradient-weighted norm and the coefficient norm.
Notably, the approximation quality of a border basis can be discussed with the coefficient norm even when the basis is computed with the gradient-weighted norm. As shown in Examples 4.4 and 4.5, the gradient-weighted norm and the coefficient norm are not generally correlated. However, Theorem 5.2 shows that, in the border basis computation, we can upper-bound the coefficient norm by the gradient-weighted norm.

We now present two advantages of using gradient-weighted normalization. The first advantage is robustness against perturbations of the input points.
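This robustness rests on the gradient bound of Proposition 4.8, $\|\nabla g(X)\| \le \deg(g)\sqrt{|X|}$ for gradient-weighted unitary $g$. The bound is easy to probe numerically; the sketch below (Python with NumPy; the helper names and the choice of support are ours) checks it for random gradient-weighted unitary quadratics:

```python
import numpy as np

def term_partial(e, x, k):
    """Evaluate d/dx_k of the term prod_l x_l^{e_l} at the point x."""
    if e[k] == 0:
        return 0.0
    v = float(e[k])
    for l in range(len(e)):
        v *= x[l] ** (e[l] - 1 if l == k else e[l])
    return v

def grad_norm(e, X):
    """||t||_{g,X} with Z = sqrt(sum_k deg_k(t)) (Definition 4.1)."""
    sq = sum(term_partial(e, x, k) ** 2 for x in X for k in range(len(e)))
    Z = np.sqrt(sum(e)) if sum(e) > 0 else 1.0
    return np.sqrt(sq) / Z

rng = np.random.default_rng(0)
terms = [(1, 0), (0, 1), (1, 1), (2, 0), (0, 2)]   # supp(g) \ {1}, so deg(g) = 2
X = rng.normal(size=(20, 2))
norms = np.array([grad_norm(e, X) for e in terms])

for _ in range(100):
    c = rng.normal(size=len(terms))
    c /= np.sqrt(np.sum(c**2 * norms**2))          # make g gradient-weighted unitary
    # ||∇g(X)||: stack the gradient of g over all points of X
    grad_gX = np.concatenate(
        [[sum(ci * term_partial(e, x, k) for ci, e in zip(c, terms))
          for k in range(2)] for x in X])
    assert np.linalg.norm(grad_gX) <= 2 * np.sqrt(len(X)) + 1e-9   # deg(g)·sqrt(|X|)
```

In fact, the proof gives the sharper intermediate bound $\deg(g)\sqrt{|\mathrm{supp}(g)\setminus\{1\}|}$, which this experiment also respects.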
Proposition 5.3.
Let $G \subset R_n$ be an $O$-border basis of $\mathcal{I}(X)$, where $O$ is an order ideal and $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\} \subset \mathbb{R}^n$. Let $P = \{\mathbf{p}_1, \ldots, \mathbf{p}_N\} \subset \mathbb{R}^n$ be a set of small perturbations. If $g \in G$ is gradient-weighted unitary and $\epsilon$-approximately vanishing for $X$, then
$$\|g(X + P)\| \le \epsilon + \|\mathbf{p}_{\max}\| \deg(g) \sqrt{|X|} + o(\|\mathbf{p}_{\max}\|),$$
where $\mathbf{p}_{\max} = \arg\max_{\mathbf{p} \in P} \|\mathbf{p}\|$, and $o(\cdot)$ is Landau's little-o.

Proof.
By the Taylor expansion, we get
$$\|g(X + P)\| = \sqrt{\sum_{i=1}^{N} \left( g(\mathbf{x}_i) + \mathbf{p}_i^{\top} \nabla g(\mathbf{x}_i) + o(\|\mathbf{p}_i\|) \right)^2} \le \|g(X)\| + \|\mathbf{p}_{\max}\| \|\nabla g(X)\| + o(\|\mathbf{p}_{\max}\|) \le \epsilon + \|\mathbf{p}_{\max}\| \deg(g) \sqrt{|X|} + o(\|\mathbf{p}_{\max}\|),$$
where at the last inequality, we used Proposition 4.8.

Here, we assume $\mathbf{0} \in X$; however, this restriction can be removed because the key lemmas (Lemmas 5.7 and 5.8) do not assume $\mathbf{0} \in X$. However, because of the page limit and because the bounds are somewhat complicated, we omit this result. We aim to derive simpler bounds in future work.

Remark 5.4. The inequality in Proposition 5.3 becomes $\|g(X + P)\| \le \epsilon + \|\mathbf{p}_{\max}\| \sqrt{|X|} + o(\|\mathbf{p}_{\max}\|)$ if we use $Z = 1$ in Definition 4.1.

The upper bound is similar to that derived in [1], where the first-order approximation is used as in our analysis. However, their approach relies on coefficient normalization and only succeeds when there is a set of points that is close enough to the original set of points and satisfies a certain criterion. The criterion is somewhat complicated to calculate and is not necessarily satisfied. In contrast, gradient-weighted normalization actively scales polynomials such that the extent of vanishing of the polynomials always satisfies the bound in Proposition 5.3.

Another advantage of using gradient-weighted normalization is that it enables the ABM algorithm to output similar bases before and after scaling the input points.

Proposition 5.5.
Suppose the ABM algorithm with gradient-weighted normalization outputs $(O, G)$ for $(X, \epsilon, \sigma)$ and $(\widehat{O}, \widehat{G})$ for $(\alpha X, |\alpha| \epsilon, \sigma)$, where $\alpha \ne 0$. Then, $O = \widehat{O}$. In addition, a one-to-one correspondence exists between $G$ and $\widehat{G}$. For corresponding polynomials $g \in G$ and $\widehat{g} \in \widehat{G}$, the following holds:
$$\widehat{g}(\alpha X) = \alpha g(X).$$
Symbolically, $\mathrm{supp}(g) = \mathrm{supp}(\widehat{g})$, and the coefficients of $t \in \mathrm{supp}(g)$ in $g$ and $\widehat{g}$, say $v_t$ and $\widehat{v}_t$, satisfy $v_t = \alpha^{\deg(t) - 1} \widehat{v}_t$.

Proof.
Let us consider two runs of the ABM algorithm, one for $(X, \epsilon, \sigma)$ and the other for $(\alpha X, |\alpha| \epsilon, \sigma)$. We use the notation of Algorithm 1 for the former run and add $\widehat{\cdot}$ to the notation of the latter. At the initialization, the claim holds because $O = \widehat{O} = \{1\}$ and $G = \widehat{G} = \{\}$. Assume that the claim holds for several iterations, and we are now at S1 with $O = \widehat{O}$, a pair $(G, \widehat{G})$ that satisfies the correspondence, and $L = \widehat{L}$. Note that for any term $t \in \mathcal{T}_n$, $t(\alpha X) = \alpha^{\deg(t)} t(X)$ and $\|t\|_{\mathrm{gw},\alpha X} = \alpha^{\deg(t) - 1} \|t\|_{\mathrm{gw},X}$. Thus, $t(\alpha X)/\|t\|_{\mathrm{gw},\alpha X} = \alpha\, t(X)/\|t\|_{\mathrm{gw},X}$. Therefore, by defining $S = \mathrm{diag}(\alpha^{\deg(b)}, \alpha^{\deg(o_1)}, \ldots, \alpha^{\deg(o_s)})$,
$$\widehat{M}^{\top} \widehat{M} \widehat{\mathbf{v}}_{\min} = \widehat{\lambda}_{\min} \widehat{D} \widehat{\mathbf{v}}_{\min} \iff M^{\top} M S \widehat{\mathbf{v}}_{\min} = \widehat{\lambda}_{\min} \alpha^{-2} D S \widehat{\mathbf{v}}_{\min},$$
from which we obtain $\lambda_{\min} = \alpha^{-2} \widehat{\lambda}_{\min}$ and $\mathbf{v}_{\min} \propto S \widehat{\mathbf{v}}_{\min}$ at S2. This indicates that thresholding $\sqrt{\lambda_{\min}}$ by $\epsilon$ in the first run is equivalent to thresholding $\sqrt{\widehat{\lambda}_{\min}}$ by $|\alpha| \epsilon$ in the second run. Furthermore, by comparing the constraints of the two generalized eigenvalue problems, $1 = \mathbf{v}_{\min}^{\top} D \mathbf{v}_{\min}$ and $1 = \widehat{\mathbf{v}}_{\min}^{\top} \widehat{D} \widehat{\mathbf{v}}_{\min}$, we obtain $\mathbf{v}_{\min} = \alpha^{-1} S \widehat{\mathbf{v}}_{\min}$. Thus, at S3, the coefficients of the $i$-th term $t_i$ in $g$ and $\widehat{g}$ are related by $v_i = \alpha^{\deg(t_i) - 1} \widehat{v}_i$. In summary, if $g = v_1 b + v_2 o_1 + \cdots + v_{s+1} o_s$ is $\epsilon$-approximately vanishing for $X$, then $\widehat{g} = \widehat{v}_1 \widehat{b} + \widehat{v}_2 \widehat{o}_1 + \cdots + \widehat{v}_{s+1} \widehat{o}_s$ is $|\alpha| \epsilon$-approximately vanishing for $\alpha X$, and vice versa. If $b$ is appended to $O$, then $b$ is also appended to $\widehat{O}$, and thus $O = \widehat{O}$; otherwise, $g$ and $\widehat{g}$ are appended to $G$ and $\widehat{G}$, respectively, and thus $G$ and $\widehat{G}$ maintain the correspondence.

Proposition 5.5 works even in the approximate case (i.e., $\epsilon > 0$) and provides a theoretical justification for the scaling of points at preprocessing. That is, even if we scale a set of points before computing a basis, e.g., for numerical stability, there is a corresponding basis for the set of points before the scaling, and one can retrieve this basis from the basis computed from the scaled points. As shown in Section 6 and Fig. 1, this is not the case with coefficient normalization, although some existing work assumes scaling of points at preprocessing such that the range of the values falls in $[-1, 1]$ for numerical stability or analysis [6].
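The two scaling identities at the heart of the proof of Proposition 5.5, $t(\alpha X) = \alpha^{\deg(t)} t(X)$ and $\|t\|_{\mathrm{gw},\alpha X} = \alpha^{\deg(t)-1} \|t\|_{\mathrm{gw},X}$, are easy to verify numerically; a minimal sketch (Python with NumPy; the helper names are ours, and $\alpha > 0$ is assumed):

```python
import numpy as np

def term_eval(e, X):
    """Evaluation vector t(X) of the term t = prod_l x_l^{e_l}."""
    return np.array([np.prod([x[l] ** e[l] for l in range(len(e))]) for x in X])

def term_grad_norm(e, X):
    """||t||_{g,X} = ||t||_{gw,X} for a term (Definitions 4.1 and 4.2)."""
    sq = 0.0
    for x in X:
        for k in range(len(e)):
            if e[k] == 0:
                continue
            v = float(e[k])
            for l in range(len(e)):
                v *= x[l] ** (e[l] - 1 if l == k else e[l])
            sq += v ** 2
    return np.sqrt(sq) / np.sqrt(sum(e))

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))
alpha, t = 2.5, (2, 1)               # scaling factor and the term x^2 y, deg(t) = 3
assert np.allclose(term_eval(t, alpha * X), alpha**3 * term_eval(t, X))
assert np.isclose(term_grad_norm(t, alpha * X), alpha**2 * term_grad_norm(t, X))
# Hence the normalized evaluation scales linearly in alpha, as used in the proof:
lhs = term_eval(t, alpha * X) / term_grad_norm(t, alpha * X)
assert np.allclose(lhs, alpha * term_eval(t, X) / term_grad_norm(t, X))
```

The last assertion is the term-level instance of the scaling consistency $\widehat{g}(\alpha X) = \alpha g(X)$ stated in Proposition 5.5.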
From Proposition 5.1, the overall proof of Theorem 5.2 (except claim (4)) is almost the same as that of the original theorem, replacing coefficient unitarity with gradient-weighted unitarity. For claim (4) of Theorem 5.2, we now derive several inequalities to upper-bound the coefficient norm by the gradient-weighted norm.
Lemma 5.6.
Let $O \subset \mathcal{T}_n$ be an order ideal obtained by the ABM algorithm with gradient-weighted normalization for $X \subset \mathbb{R}^n$ and $\epsilon \ge 0$. Then, for any $o \in O \setminus \{1\}$, it holds that $\|o\|_{\mathrm{gw},X} > \epsilon^{\deg(o) - 1} \sqrt{|X|}$.

Proof.
We prove the claim by induction. Assume that at degree $d > 0$ the claim holds; that is, for any $o \in O$ of degree $d$, it holds that $\|o\|_{\mathrm{gw},X} > \epsilon^{d-1} \sqrt{|X|}$. Let $i(o) \subset \{1, \ldots, n\}$ be the index set such that $x_k$ divides $o$ for $k \in i(o)$, and let $o^{(k)}$ denote the term such that $o = x_k o^{(k)}$ for $o \in O$ and $k \in i(o)$. For any $o \in O$ of degree $d + 1$, we obtain
$$\|o\|_{\mathrm{gw},X}^2 = \frac{1}{\sum_{k=1}^{n} \deg_k(o)} \sum_{k \in i(o)} \left\| \frac{\partial o}{\partial x_k}(X) \right\|^2 = \frac{1}{\sum_{k=1}^{n} \deg_k(o)} \sum_{k \in i(o)} \deg_k(o)^2 \left\| o^{(k)}(X) \right\|^2 = \frac{1}{\sum_{k=1}^{n} \deg_k(o)} \sum_{k \in i(o)} \deg_k(o)^2 \frac{\|o^{(k)}(X)\|^2}{\|o^{(k)}\|_{\mathrm{gw},X}^2} \|o^{(k)}\|_{\mathrm{gw},X}^2 > \epsilon^{2d} |X|.$$
For the first equality, we used $\sum_{k=1}^{n} \|\frac{\partial o}{\partial x_k}(X)\|^2 = \sum_{k \in i(o)} \|\frac{\partial o}{\partial x_k}(X)\|^2$ because $\frac{\partial o}{\partial x_k} = 0$ for $k \notin i(o)$. For the second equality, we used $\frac{\partial o}{\partial x_k} = \deg_k(o)\, o^{(k)}$. For the last inequality, we used $\sum_{k=1}^{n} \deg_k(o) = \sum_{k \in i(o)} \deg_k(o) \le \sum_{k \in i(o)} \deg_k(o)^2$, the bound $\|o^{(k)}(X)\| / \|o^{(k)}\|_{\mathrm{gw},X} > \epsilon$, and the induction hypothesis at degree $d$. At $d = 1$, the claim holds because $\|\nabla o(X)\| = \sqrt{|X|}$. Thus, for any $o \in O \setminus \{1\}$, it follows that $\|o\|_{\mathrm{gw},X} > \epsilon^{\deg(o) - 1} \sqrt{|X|}$.

Interestingly, we can upper-bound the coefficient norm of an approximately vanishing polynomial by its gradient-weighted norm. Given $X \subset \mathbb{R}^n$, we translate it so that $\mathbf{0} \in X$. Consequently, in the exact case, any vanishing polynomial for $X$ cannot have a constant term. In the approximate case (with $\epsilon$), $g \in G$ may have a small constant term $c_0$ ($|c_0| \le \epsilon$).

Lemma 5.7.
Let $(\mathcal{O}, G) \subset \mathcal{T}_n \times \mathcal{R}_n$ be a pair of an order ideal and an $\mathcal{O}$-border basis of $\mathcal{I}(X)$, obtained by the ABM algorithm for $X \subset \mathbb{R}^n$ and $\epsilon \ge 0$. Then, for any $g \in G$, the following holds:
\[
\|g\|_{\mathrm{c}} < \frac{\sqrt{\|g\|_{\mathrm{gw},X}^2 + c_0^2}}{\min\{\epsilon^{\deg(g)-1}, 1\}\sqrt{|X|}},
\]
where $c_0$ is the coefficient of the constant term of $g$. Furthermore, if $\mathbf{0} \in X$, then
\[
\|g\|_{\mathrm{c}} < \frac{\sqrt{\|g\|_{\mathrm{gw},X}^2 + \epsilon^2}}{\min\{\epsilon^{\deg(g)-1}, 1\}\sqrt{|X|}}.
\]

Proof.
Let $g = \sum_{i=0}^{s} c_i t_i$, where $c_i \in \mathbb{R}$ and $t_0 = 1$. Let $\mathbf{c} = (c_1, \ldots, c_s)^{\top}$ and $D = \mathrm{diag}(\|t_1\|_{\mathrm{gw},X}, \|t_2\|_{\mathrm{gw},X}, \ldots, \|t_s\|_{\mathrm{gw},X})$ (note that $c_0$ and $\|t_0\|_{\mathrm{gw},X}$ are excluded). Then,
\[
\|g\|_{\mathrm{gw},X}^2 = \mathbf{c}^{\top} D^2 \mathbf{c}
\ge \min_{i \in \{1, \ldots, s\}} \|t_i\|_{\mathrm{gw},X}^2 \left(\|g\|_{\mathrm{c}}^2 - c_0^2\right).
\]
From Lemma 5.6, we have $\|t_i\|_{\mathrm{gw},X} > \epsilon^{\deg(t_i)-1}\sqrt{|X|}$ and obtain
\[
\|g\|_{\mathrm{c}} < \frac{\sqrt{\|g\|_{\mathrm{gw},X}^2 + c_0^2}}{\min\{\epsilon^{\deg(g)-1}, 1\}\sqrt{|X|}}.
\]
If $\mathbf{0} \in X$, then $g$ is $\epsilon$-approximately vanishing at $\mathbf{0}$ (i.e., $|c_0| \le \epsilon$), which yields the second inequality.

Lemma 5.8.
Let $X \subset \mathbb{R}^n$ be a set of points, and let $(\mathcal{O}, G)$ be a pair of an order ideal and an approximate border basis obtained by the ABM algorithm with $(X, \epsilon)$. Then, for any polynomial $h \in \mathcal{R}_n$ with support $\mathcal{O}$ (but not necessarily gradient-weighted unitary), the following holds:
\[
\|h\|_{\mathrm{gw},X} < \frac{\|h(X)\| + |c_0|\sqrt{|X|}}{\epsilon},
\]
where $c_0$ is the coefficient of the constant term of $h$. If $\mathbf{0} \in X$ and $h$ is $\epsilon'$-approximately vanishing for $X$, then
\[
\|h\|_{\mathrm{gw},X} < \frac{\epsilon'}{\epsilon}\left(1 + \sqrt{|X|}\right).
\]

Proof.
Let $\mathcal{O} = \{1, o_1, \ldots, o_s\}$. Let us consider $\mathcal{O}^- = \mathcal{O} \setminus \{1\}$ and $h^-(X) = h(X) - c_0 \mathbf{1}$, where $c_0$ is the constant term of $h$ and $\mathbf{1} \in \mathbb{R}^{|X|}$ is the all-one vector. Let $D = \mathrm{diag}(\|1\|_{\mathrm{gw},X}, \|o_1\|_{\mathrm{gw},X}, \ldots, \|o_s\|_{\mathrm{gw},X})$ and $D_- = \mathrm{diag}(\|o_1\|_{\mathrm{gw},X}, \ldots, \|o_s\|_{\mathrm{gw},X})$. By the triangle inequality, $\|h^-(X)\| \le \|h(X)\| + |c_0|\sqrt{|X|}$, and $h^-(X) = \mathcal{O}^-(X) D_-^{-1} D_- \mathbf{c}$, where $\mathbf{c}$ collects the coefficients of $h$ on $\mathcal{O}^-$. Note that $\mathcal{O}^-(X) D_-^{-1}$ is a full-rank "tall" matrix; thus, it follows that $D_- \mathbf{c} = (\mathcal{O}^-(X) D_-^{-1})^{\dagger} h^-(X)$, where $(\cdot)^{\dagger}$ denotes the pseudo-inverse. By Cauchy's interlacing theorem, the smallest eigenvalue of $(\mathcal{O}^-(X) D_-^{-1})^{\top} \mathcal{O}^-(X) D_-^{-1}$ is larger than that of $(\mathcal{O}(X) D^{-1})^{\top} \mathcal{O}(X) D^{-1}$, which is larger than $\epsilon^2$ by construction. Thus, $\|(\mathcal{O}^-(X) D_-^{-1})^{\dagger}\|_{\mathrm{s}} < 1/\epsilon$, where $\|\cdot\|_{\mathrm{s}}$ denotes the spectral norm of a matrix. The gradient-weighted norm of $h$ can then be bounded as
\begin{align*}
\|h\|_{\mathrm{gw},X} = \|h^-\|_{\mathrm{gw},X} = \|D_- \mathbf{c}\|
&\le \left\|(\mathcal{O}^-(X) D_-^{-1})^{\dagger}\right\|_{\mathrm{s}} \|h^-(X)\| \\
&< \frac{\|h^-(X)\|}{\epsilon} \le \frac{\|h(X)\| + |c_0|\sqrt{|X|}}{\epsilon}.
\end{align*}
If $\mathbf{0} \in X$ and $h$ is $\epsilon'$-approximately vanishing for $X$, then $\|h(X)\| \le \epsilon'$ and $|c_0| \le \epsilon'$, which yields the second inequality.

Now, we are ready to prove Theorem 5.2.

Proof of Theorem 5.2.
Claims (1) and (2) are proven in Proposition 5.1. By Proposition 5.1, the coefficient of the leading term (border term) of each polynomial in $G$ is nonzero; thus, by construction in Algorithm 1, claim (3) holds (refer to the proof of the original ABM algorithm [17]). In addition, by Proposition 4.7, claim (5) also holds. (To show claim (5), it is necessary to introduce the Buchberger–Möller algorithm for border bases and show that it can work with gradient-weighted normalization; owing to page limitations, this part is omitted in this paper.)

Claim (4) is proven as follows. For $\tilde{g}_i, \tilde{g}_j \in \tilde{G}$, we write $\tilde{g}_i = b_i - \tilde{h}_i$ and $\tilde{g}_j = b_j - \tilde{h}_j$, where $b_i, b_j \in \mathcal{T}_n$ and $\tilde{h}_i, \tilde{h}_j \in \mathcal{R}_n$. Note that $\tilde{g}_i$ is an $(\epsilon / |\gamma_i|)$-approximately vanishing polynomial with gradient-weighted norm $1 / |\gamma_i|$. We denote the S-polynomial of $\tilde{g}_i, \tilde{g}_j$ and its normal remainder by $S_{ij}$ and $r_{ij}$, respectively. There are two cases for the S-polynomial: (i) $S_{ij} = x_k \tilde{g}_i - x_l \tilde{g}_j$ and (ii) $S_{ij} = \tilde{g}_i - x_l \tilde{g}_j$. It is known that, in either case, the normal remainder can be written as $r_{ij} = S_{ij} - \sum_{\mu=1}^{|\tilde{G}|} v_\mu \tilde{g}_\mu$, where $v_\mu$ is some coefficient of $\tilde{h}_i$ [9, 17]. By the triangle inequality,
\[
\|r_{ij}(X)\| \le \|S_{ij}(X)\| + \Bigl\|\sum_{\mu=1}^{|\tilde{G}|} v_\mu \tilde{g}_\mu(X)\Bigr\|.
\]
For the first term, in either case,
\[
\|S_{ij}(X)\| \le \frac{2\epsilon \|X\|_{\max}}{|\gamma_{\min}|}.
\]
For the second term,
\[
\Bigl\|\sum_{\mu=1}^{|\tilde{G}|} v_\mu \tilde{g}_\mu(X)\Bigr\|
\le \frac{\epsilon}{|\gamma_{\min}|} \sum_{\mu=1}^{|\tilde{G}|} |v_\mu|
\le \frac{\epsilon n |X|}{|\gamma_{\min}|} \max_{\mu} |v_\mu|,
\]
where we used $|\tilde{G}| \le n|X|$ in the last inequality. Because $v_\mu$ is some coefficient of $\tilde{h}_\mu$ and $|v_\mu| \le \|\tilde{g}_\mu\|_{\mathrm{c}}$, by Lemma 5.7,
\[
|v_\mu| < \frac{\sqrt{1 + \epsilon^2}}{\min\{(\epsilon/|\gamma_{\min}|)^{\deg(\tilde{g}_\mu)-1}, 1\}\, |\gamma_{\min}| \sqrt{|X|}}.
\]
Thus, the second term is bounded as
\[
\Bigl\|\sum_{\mu=1}^{|\tilde{G}|} v_\mu \tilde{g}_\mu(X)\Bigr\|
< \frac{\epsilon n \sqrt{|X|} \sqrt{1 + \epsilon^2}}{\min\{(\epsilon/|\gamma_{\min}|)^{\deg(G)-1}, 1\}\, |\gamma_{\min}|^2}.
\]
Applying these two bounds to the triangle inequality above, we obtain
\[
\|r_{ij}(X)\| < \frac{2\epsilon \|X\|_{\max}}{|\gamma_{\min}|} + \frac{\epsilon n \sqrt{|X|} \sqrt{1 + \epsilon^2}}{\min\{(\epsilon/|\gamma_{\min}|)^{\deg(G)-1}, 1\}\, |\gamma_{\min}|^2}.
\]
Therefore, from Lemma 5.8,
\[
\|r_{ij}\|_{\mathrm{gw},X} < \frac{1 + \sqrt{|X|}}{|\gamma_{\min}|} \left( 2\|X\|_{\max} + \frac{n \sqrt{|X|} \sqrt{1 + \epsilon^2}}{\min\{(\epsilon/|\gamma_{\min}|)^{\deg(G)-1}, 1\}\, |\gamma_{\min}|} \right).
\]
Finally, by Lemma 5.7, we can upper bound the coefficient norm by the gradient-weighted norm, which provides $\eta$ in claim (4).

We now analyze the additional time complexity introduced by gradient-weighted normalization. In particular, we consider step S2, where the generalized eigenvalue problem Eq. (12) needs to be solved instead of the standard eigenvalue problem in the original ABM algorithm. Note that the computational complexity of solving generalized and standard eigenvalue problems is the same; in our case, it is $O(|\mathcal{O}|^2) = O(|X|^2)$ because $|\mathcal{O}| \le |X|$ and we only compute the smallest (generalized) eigenvalue and the corresponding (generalized) eigenvector. In Eq. (12), we prepare $M^{\top} M$ with cost $O(|X||\mathcal{O}|) = O(|X|^2)$ by exploiting the previous calculation. We thus only have to focus on the calculation of the diagonal matrix $D$ in Eq. (12). Note that all but one diagonal entry can be taken over from the previous iteration. Thus, the additional runtime comes only from evaluating the gradient of the new term $b$ at the given points, which costs $O(n|X|E)$, where $E$ is the cost of evaluating a monomial at a point. This is negligible because the cost of solving the generalized eigenvalue problem is dominant. In conclusion, introducing gradient-weighted normalization to the ABM algorithm does not change its computational complexity. In contrast, gradient normalization can be more computationally expensive: it requires calculating $\nabla \mathcal{O}(X)^{\top} \nabla \mathcal{O}(X)$, which costs $O(n|X||\mathcal{O}|) = O(n|X|^2)$ even if we utilize the results of the previous calculation.

We demonstrate the scaling consistency (Proposition 5.5) using a simple numerical example. We consider six points sampled from the unit circle, $X^* = \{(\cos\theta_k, \sin\theta_k)\}_{k=1,\ldots,6}$, where $\theta_k = k\pi/3$. The points are perturbed by additive Gaussian noise $\mathcal{N}(\mathbf{0}, \epsilon I)$ with $\epsilon = 0.1$ to obtain the perturbed point set $X$. We compare the two normalizations by applying the ABM algorithm to $(X, \epsilon)$ and $(0.1X, 0.1\epsilon)$, using the degree-reverse lexicographic order. The results are shown in Fig. 1: gradient-weighted normalization leads to two bases of the same size with almost identical contour plots (Fig. 1(a)), whereas coefficient normalization does not (Fig. 1(b)). Furthermore, as Proposition 5.5 suggests, the coefficients of the polynomials in the first and second rows of Fig. 1 exhibit a definite relationship. For example, for the first basis polynomials (ellipses) $g$ and $\hat{g}$, the coefficients of the constant, linear, and quadratic terms in $g$ are $0.1^{-1}$, $0.1^{0}$, and $0.1$ times those in $\hat{g}$, respectively. (The values of points and coefficients in this section are rounded for visibility.)

In this paper, we proposed gradient-weighted normalization for the approximate border basis computation of vanishing ideals. We showed that it is a valid normalization for border basis computation by proving that the gradient-weighted norm always takes nonzero values for order terms and border basis polynomials. Gradient-weighted normalization is compatible with the existing analysis of approximate border bases and the computation algorithms, and the time complexity does not change. The data-dependent nature of gradient-weighted normalization provides several important properties (stability against perturbation and scaling consistency) in basis computation algorithms, which cannot be realized by coefficient normalization.

A current limitation of our study is that, although we proved that the change in the evaluation of a gradient-weighted unitary polynomial is bounded with respect to the magnitude of the perturbation, it does not necessarily follow that the whole basis computation algorithm is more robust than existing methods. Nevertheless, we consider that the present study provides a new ingredient for analyzing border basis computation in the approximate setting, where perturbed points are given and stable computation is required. We also derived an inequality that relates the gradient-weighted norm to the coefficient norm, which we consider helpful for exploiting the existing analyses.
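To make the quantities used in Section 5 concrete, the following Python sketch mimics the normalization step: it reconstructs the gradient-weighted norm of a term as in the proof of Lemma 5.6 and solves a small generalized eigenvalue problem of the form $M^{\top}M\mathbf{c} = \lambda D^2 \mathbf{c}$ (cf. Eq. (12)) on noisy circle points. This is a hedged sketch, not the paper's implementation: the candidate term set, the handling of the constant term by centering, and all helper names are illustrative assumptions.

```python
# Hedged sketch (illustrative, not the paper's code): the normalization step
# solves the generalized eigenvalue problem (M^T M) c = lambda D^2 c, where M
# stacks term evaluations on X and D holds gradient-weighted norms of terms.
import numpy as np

def term_eval(exps, X):
    """Evaluate the monomial prod_k x_k^exps[k] at each row of X."""
    return np.prod(X ** np.asarray(exps), axis=1)

def term_gw_norm(exps, X):
    """Gradient-weighted norm of a monomial, reconstructed from Lemma 5.6:
    sqrt( sum_k ||d t / d x_k (X)||^2 / sum_k deg_k(t)^2 )."""
    exps = np.asarray(exps)
    num = 0.0
    for k, d in enumerate(exps):
        if d > 0:
            e = exps.copy()
            e[k] -= 1
            num += np.sum((d * term_eval(e, X)) ** 2)
    return np.sqrt(num / np.sum(exps ** 2))

rng = np.random.default_rng(0)
theta = np.arange(1, 7) * np.pi / 3              # six points on the unit circle
X = np.c_[np.cos(theta), np.sin(theta)] + 0.01 * rng.standard_normal((6, 2))

terms = [(1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]  # x, y, x^2, xy, y^2
M = np.column_stack([term_eval(t, X) for t in terms])
Dinv = np.diag([1.0 / term_gw_norm(t, X) for t in terms])

# Center the columns so that a free constant term is implicitly allowed, then
# reduce the generalized problem to a standard symmetric eigenproblem.
M0 = M - M.mean(axis=0)
lam, W = np.linalg.eigh(Dinv @ (M0.T @ M0) @ Dinv)
c = Dinv @ W[:, 0]       # gradient-weighted-unitary coefficient vector
c0 = -np.mean(M @ c)     # recovered constant term
residual = np.linalg.norm(M @ c + c0)
print(residual)          # small: the polynomial approximately vanishes on X
```

With exact circle points and no noise, the smallest eigenvalue would be zero and the recovered polynomial would be a rescaling of $x^2 + y^2 - 1$.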
References

[1] John Abbott, Claudia Fassino, and Maria-Laura Torrente. Stable border bases for ideals of points. Journal of Symbolic Computation, 43(12):883–894, 2008.
[2] Rika Antonova, Maksim Maydanskiy, Danica Kragic, Sam Devlin, and Katja Hofmann. Analytic manifold learning: Unifying and evaluating representations for continuous control. arXiv preprint arXiv:2006.08718, 2020.
[3] David Cox, John Little, and Donal O'Shea. Ideals, Varieties, and Algorithms. Springer, 1992.
[4] Claudia Fassino. Almost vanishing polynomials for sets of limited precision points. Journal of Symbolic Computation, 45(1):19–37, 2010.
[5] Claudia Fassino and Maria-Laura Torrente. Simple varieties for limited precision points. Theoretical Computer Science, 479:174–186, 2013.
[6] Daniel Heldt, Martin Kreuzer, Sebastian Pokutta, and Hennie Poulisse. Approximate computation of zero-dimensional polynomial ideals. Journal of Symbolic Computation, 44:1566–1591, 2009.
[7] Chenping Hou, Feiping Nie, and Dacheng Tao. Discriminative vanishing component analysis. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI), pages 1666–1672. AAAI Press, 2016.
[8] Artur Karimov, Erivelton G. Nepomuceno, Aleksandra Tutueva, and Denis Butusov. Algebraic method for the reconstruction of partially observed nonlinear systems using differential and integral embedding. Mathematics, 8(2):300, 2020.
[9] Achim Kehrein and Martin Kreuzer. Characterizations of border bases. Journal of Pure and Applied Algebra, 196(2):251–270, 2005.
[10] Hiroshi Kera and Yoshihiko Hasegawa. Noise-tolerant algebraic method for reconstruction of nonlinear dynamical systems. Nonlinear Dynamics, 85:675–692, 2016.
[11] Hiroshi Kera and Yoshihiko Hasegawa. Approximate vanishing ideal via data knotting. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), pages 3399–3406. AAAI Press, 2018.
[12] Hiroshi Kera and Yoshihiko Hasegawa. Spurious vanishing problem in approximate vanishing ideal. IEEE Access, 7:178961–178976, 2019.
[13] Hiroshi Kera and Yoshihiko Hasegawa. Gradient boosts the approximate vanishing ideal. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), pages 4428–4435. AAAI Press, 2020.
[14] Hiroshi Kera and Hitoshi Iba. Vanishing ideal genetic programming. In Proceedings of the 2016 IEEE Congress on Evolutionary Computation (CEC), pages 5018–5025. IEEE, 2016.
[15] Franz J. Király, Martin Kreuzer, and Louis Theran. Dual-to-kernel learning with ideals. arXiv preprint arXiv:1402.0099, 2014.
[16] Martin Kreuzer and Lorenzo Robbiano. Computational Commutative Algebra 2. Springer, 2005.
[17] Jan Limbeck. Computation of Approximate Border Bases and Applications. PhD thesis, Universität Passau, 2013.
[18] Roi Livni, David Lehavi, Sagi Schein, Hila Nachliely, Shai Shalev-Shwartz, and Amir Globerson. Vanishing component analysis. In Proceedings of the Thirtieth International Conference on Machine Learning (ICML), pages 597–605. PMLR, 2013.
[19] Lorenzo Robbiano and John Abbott. Approximate Commutative Algebra. Springer-Verlag Wien, 2010.
[20] Hans J. Stetter. Numerical Polynomial Algebra. Society for Industrial and Applied Mathematics, 2004.
[21] Maria-Laura Torrente. Application of Algebra in the Oil Industry. PhD thesis, Scuola Normale Superiore, Pisa, 2008.
[22] Lu Wang and Tomoaki Ohtsuki. Nonlinear blind source separation unifying vanishing component analysis and temporal structure. IEEE Access, 6:42837–42850, 2018.
[23] Zhichao Wang, Qian Li, Gang Li, and Guandong Xu. Polynomial representation for persistence diagram. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.

Figure 1: Contour plots of the bases computed by the ABM algorithm from $(X, \epsilon = 0.1)$ (first row) and $(0.1X, \epsilon = 0.01)$ (second row). (a) Gradient-weighted normalization. (b) Coefficient normalization.
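The scaling-consistency behavior illustrated in Fig. 1 can be reproduced with a small self-contained sketch: fitting a gradient-weighted-unitary approximately vanishing polynomial on $X$ and on $0.1X$ scales the degree-$d$ coefficients by $10^{d-1}$ and the constant term by $0.1$. As before, this is a hedged reconstruction under assumed definitions, not the paper's implementation; the term set and helper names are illustrative.

```python
# Hedged sketch of the scaling consistency (Proposition 5.5): the fitted
# polynomial on 0.1X has degree-d coefficients 10^(d-1) times those on X,
# and a constant term 0.1 times as large. Definitions reconstructed from
# the lemmas; all names are illustrative.
import numpy as np

TERMS = [(1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]  # x, y, x^2, xy, y^2

def term_eval(exps, X):
    return np.prod(X ** np.asarray(exps), axis=1)

def term_gw_norm(exps, X):
    exps = np.asarray(exps)
    num = 0.0
    for k, d in enumerate(exps):
        if d > 0:
            e = exps.copy()
            e[k] -= 1
            num += np.sum((d * term_eval(e, X)) ** 2)
    return np.sqrt(num / np.sum(exps ** 2))

def fit_gw(X):
    """Smallest approximately vanishing polynomial over TERMS (plus a free
    constant), normalized to gradient-weighted norm 1; returns (c0, coeffs)."""
    M = np.column_stack([term_eval(t, X) for t in TERMS])
    Dinv = np.diag([1.0 / term_gw_norm(t, X) for t in TERMS])
    M0 = M - M.mean(axis=0)
    _, W = np.linalg.eigh(Dinv @ (M0.T @ M0) @ Dinv)
    c = Dinv @ W[:, 0]
    if c[2] < 0:               # fix the eigenvector's sign ambiguity
        c = -c
    return -np.mean(M @ c), c

rng = np.random.default_rng(0)
theta = np.arange(1, 7) * np.pi / 3
X = np.c_[np.cos(theta), np.sin(theta)] + 0.1 * rng.standard_normal((6, 2))

c0, c = fit_gw(X)
c0_hat, c_hat = fit_gw(0.1 * X)
print(c0_hat / c0)             # ~0.1
print(c_hat / c)               # ~[1, 1, 10, 10, 10]
```

The coefficient ratios are independent of the noise realization, since the two fitting problems are exact rescalings of each other under this normalization.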