Generalizing The Davenport-Mahler-Mignotte Bound -- The Weighted Case
VIKRAM SHARMA,
Institute of Mathematical Sciences, HBNI
Root separation bounds play an important role as a complexity measure in understanding the behaviour of various algorithms in computational algebra, e.g., root isolation algorithms. A classic result in the univariate setting is the Davenport-Mahler-Mignotte (DMM) bound. One way to state the bound is to consider a directed acyclic graph $(V, E)$ on a subset of roots of a degree $d$ polynomial $f(z) \in \mathbb{C}[z]$, where the edges point from a root of smaller absolute value to one of larger absolute value, and the in-degree of every vertex is at most one. Then the DMM bound is an amortized lower bound on the product $\prod_{(\alpha,\beta)\in E} |\alpha - \beta|$. However, the lower bound involves the discriminant of the polynomial $f$, and becomes trivial if the polynomial is not square-free. This was resolved by Eigenwillig (2008) by using a suitable subdiscriminant instead of the discriminant. Escorcielo-Perrucci (2016) further dropped the in-degree constraint on the graph by using the theory of finite differences. Emiris et al. (2019) generalized their result to handle the case where the exponent of the term $|\alpha - \beta|$ in the product is at most the multiplicity of either of the roots. In this paper, we generalize these results by allowing arbitrary positive integer weights on the edges of the graph, i.e., for a weight function $w : E \to \mathbb{Z}_{>0}$, we derive an amortized lower bound on $\prod_{(\alpha,\beta)\in E} |\alpha - \beta|^{w(\alpha,\beta)}$. Such a product occurs in the complexity estimates of some recent algorithms for root clustering (e.g., Becker et al., 2016), where the weights are usually some function of the multiplicity of the roots. Because of its amortized nature, our bound is arguably better than the bounds obtained by manipulating existing results to accommodate the weights.

Additional Key Words and Phrases: Root separation bounds, confluent Vandermonde matrix, finite differences, sub-discriminants, nuclear norm.
Given a monic univariate polynomial $f(z) \in \mathbb{C}[z]$ of degree $d$ with roots $\alpha_1, \ldots, \alpha_d$, not all necessarily distinct, a root separation bound is a lower bound on the smallest distance $\mathrm{sep}(f)$ between any pair of distinct roots of $f$. A classic result [11] states that $\mathrm{sep}(f) > d^{-(d+2)/2}\, |\Delta(f)|^{1/2}\, M(f)^{1-d}$, where $\Delta(f) := \prod_{i<j} (\alpha_i - \alpha_j)^2$ is the discriminant of $f$, and

$$M(f) := \prod_{i=1}^{d} \max\{1, |\alpha_i|\} \qquad (1)$$

is the Mahler measure of $f$.

The parameter $\mathrm{sep}(f)$ naturally occurs in the complexity analysis of many algorithms; examples are the (real or complex) root isolation algorithms ([13], [3], [5], [2]). However, most of these algorithms need a lower bound on the product of the distances between certain pairs of roots and not just the worst-case separation. To capture these pairs, we consider a simple (i.e., with no loops and multiple edges) undirected graph $G = (V, E)$ whose vertices are a subset of the distinct roots of $f$. Then we want a lower bound on $\prod_{(\alpha_i,\alpha_j)\in E} |\alpha_i - \alpha_j|$. One straightforward lower bound is $\mathrm{sep}(f)^{|E|}$, but Davenport [3] used the amortized nature of the Mahler measure to derive a lower bound for real roots that essentially matches the lower bound on $\mathrm{sep}(f)$ given above; the argument was later modified by Mignotte to handle complex roots [12]. A consequence of these results is a straightforward improvement in the bounds on the running time of root isolation algorithms by a multiplicative factor of the degree.

, Vol. 1, No. 1, Article 1. Publication date: January 2020.

Both these lower bounds, nevertheless, rely on the discriminant $\Delta(f)$ and are trivial when the polynomial is not square-free, i.e., when it has multiple roots. A remedy is to work with the square-free part $\hat{f}$ of $f$, but this again blows up the bound by exponential factors because of the growth in the coefficients of $\hat{f}$ as compared to $f$.
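As a quick numerical illustration of these definitions, the following sketch computes $\mathrm{sep}(f)$, the Mahler measure, and the classic lower bound for a hypothetical set of roots (the roots and all names here are illustrative, not taken from the paper):

```python
import numpy as np
from itertools import combinations

# Hypothetical roots of a monic square-free quartic.
roots = np.array([1.5, -0.5, 2.0 + 1.0j, 2.0 - 1.0j])
d = len(roots)

# Mahler measure M(f) = prod over roots of max(1, |alpha_i|), as in (1).
mahler = np.prod(np.maximum(1.0, np.abs(roots)))

# Discriminant of a monic f: prod over i < j of (alpha_i - alpha_j)^2.
disc = np.prod([(roots[i] - roots[j]) ** 2
                for i, j in combinations(range(d), 2)])

# Smallest distance between distinct roots.
sep = min(abs(roots[i] - roots[j]) for i, j in combinations(range(d), 2))

# Classic bound: sep(f) > d^{-(d+2)/2} |Disc(f)|^{1/2} M(f)^{1-d}.
bound = d ** (-(d + 2) / 2) * abs(disc) ** 0.5 * mahler ** (1 - d)
assert sep > bound
```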
An alternative was presented by Eigenwillig [4] that uses the $(d-r)$-th sub-discriminant of $f$ instead of the discriminant, where $r$ is the number of distinct roots of $f$. However, there are some constraints on the graph $G$ for the bound to be applicable, namely, in the directed acyclic graph obtained by directing the edges of $G$ from a root of smaller absolute value to one of larger absolute value, the in-degree of every vertex is at most one. Escorcielo-Perrucci [7] dropped this in-degree constraint by using the theory of finite differences. Nevertheless, their result gives weaker bounds on products of the form

$$\prod_{(\alpha_i,\alpha_j)\in E} |\alpha_i - \alpha_j|^{w(\alpha_i,\alpha_j)}, \qquad (2)$$

where $w : E \to \mathbb{N}$ is a weight function that assigns a positive integer to every edge. In the special case where the weight of an edge is bounded by the multiplicity of one of its endpoints, [10] and [6] have derived lower bounds when the coefficients of $f$ are real and complex numbers, respectively. To state their bound, let $f$ have $r$ distinct roots $\alpha_1, \ldots, \alpha_r$ with multiplicities $m_1, \ldots, m_r$, respectively, let $\hat{f}$ denote the square-free part of $f$, and for a root $\alpha_i$ let $\Delta_i$ denote the distance to the nearest distinct root. Then the bound in [6] is the following: if $K \subseteq [r]$ and $w_i \in \mathbb{N}$ is such that $w_i \le m_i$, for $i \in K$, then

$$\prod_{i\in K} \Delta_i^{w_i} \ge 2^{-d(r+1)} \left(\|f\|_\infty \|\hat{f}\|_\infty\right)^{-d} M(f)^{-r}\, |\mathrm{res}(f, \hat{f}')|, \qquad (3)$$

where $\|\cdot\|_\infty$ is the maximum absolute value over the coefficient sequence of the polynomial, and $\mathrm{res}(\cdot,\cdot)$ is the univariate resultant. These bounds, though useful, fail to provide amortized lower bounds when the $w_i$'s exceed the multiplicities. Such a scenario occurs, for instance, in the complexity analysis of some recent root clustering algorithms [1, 2], where the following product appears, for some subsets $K_i \subseteq [r]$:

$$\prod_{i\in K} \Delta_i^{\sum_{j\in K_i} m_j}.$$
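To make the quantities in this product concrete, here is a small hedged sketch, with hypothetical roots, multiplicities, and index sets (not taken from [1, 2]), computing the $\Delta_i$'s and the clustering-type product $\prod_i \Delta_i^{\sum_{j\in K_i} m_j}$:

```python
import numpy as np

# Hypothetical distinct roots and multiplicities (r = 3, d = 6).
alphas = np.array([0.5, 1.0 + 0.5j, -1.2])
m = [3, 2, 1]

# Delta_i: distance from alpha_i to the nearest distinct root.
delta = [min(abs(a - b) for b in alphas if b != a) for a in alphas]

# Hypothetical index sets K_i and the clustering-type product
# prod over i of Delta_i^{sum of m_j for j in K_i}.
K = {0: [0, 1], 1: [1, 2], 2: [2]}
product = np.prod([delta[i] ** sum(m[j] for j in K[i]) for i in K])
assert product > 0
```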
One way to derive a lower bound on this product is to exponentiate the left-hand side of (3) by the degree $d$ (since the sum of the multiplicities over $K_i$ is bounded by $d$), move the extraneous factors to the denominator on the right-hand side, and upper bound these to get a lower bound on the desired product. But, just as was the case with $\mathrm{sep}(f)^{|E|}$ earlier, such an approach loses the amortization property and gives exponentially worse bounds.

In this paper, we derive a lower bound on the product in (2) for arbitrary weight functions. The restrictions on the weights in the earlier approaches were an outcome of the choice of the symmetric function (either the discriminant, the sub-discriminant, or the resultant). We instead choose a symmetric function based on the weights and try to optimize over all valid choices of the function. This is done by constructing a confluent Vandermonde matrix to get the desired weight structure in the exponents. The choice of the confluent Vandermonde matrix is especially helpful when the weights are skewed in distribution, because this means we can pick a different multiplicity structure on the roots and obtain better bounds. The spectral structure of the weighted adjacency matrix $A_w := [w_{i,j}]_{i,j=1,\ldots,r}$ plays an important role in the choice of the multiplicity structure for constructing the confluent Vandermonde matrix. For ease of comprehension, we state our result when $f$ is an integer polynomial (since then the absolute value of the non-zero symmetric function is at least one, which is how the bounds are used in practice) and is also monic (otherwise divide $M(f)$ by the absolute value of the leading coefficient). Throughout, we use $\mathbb{N}$ to denote the set of positive integers and $\mathbb{Z}_{\ge 0}$ the set of non-negative integers. Let $\|A_w\|_*$ denote the nuclear norm of $A_w$, i.e., the sum of its singular values, let $n := r\lceil\sqrt{\|A_w\|_*}\rceil$, and let $w(E)$ be the sum of the weights over the edges of $G$.
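The quantities just defined can be computed directly; the sketch below, with a hypothetical weighted adjacency matrix, obtains $\|A_w\|_*$ from the singular values, together with $n$ and $w(E)$:

```python
import numpy as np
from math import ceil, sqrt

# Hypothetical symmetric weighted adjacency matrix A_w on r = 3 roots.
A_w = np.array([[0, 4, 1],
                [4, 0, 2],
                [1, 2, 0]], dtype=float)
r = A_w.shape[0]

nuclear = np.linalg.svd(A_w, compute_uv=False).sum()  # ||A_w||_* = sum of singular values
n = r * ceil(sqrt(nuclear))                           # n := r * ceil(sqrt(||A_w||_*))
w_E = A_w.sum() / 2                                   # w(E): each edge counted once
assert w_E == 7.0 and n % r == 0
```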
Then we show that

$$\prod_{(\alpha_i,\alpha_j)\in E} |\alpha_i - \alpha_j|^{w(\alpha_i,\alpha_j)} > M(f)^{-r\|A_w\|_*} \left(\frac{n}{\sqrt{3}}\right)^{-r\|A_w\|_* - w(E)} n^{-n/2}. \qquad (4)$$

The bound is amortized because the exponent of the Mahler measure does not contain $w(E)$, which would be the case if we tried to derive the lower bound by modifying the earlier results (see (11) below).

In the next section, we give the requisite details and properties of the confluent Vandermonde matrix; Section 3 contains the statement of our main result, Theorem 3.2, and its comparison with a modification of an existing bound; Section 4 contains a proof of the main result, and in Section 4.1 we specialize it to obtain the form given above in (4).

Consider the column vector $v(x)^t := \begin{bmatrix} 1 & x & x^2 & \cdots & x^{n-1} \end{bmatrix}$. Define $v_i(x)$ as the vector obtained by differentiating each entry in the column above $i$ times and dividing by $i!$, i.e.,

$$v_i(x)^t := \begin{bmatrix} \binom{0}{i}x^{-i} & \binom{1}{i}x^{1-i} & \binom{2}{i}x^{2-i} & \cdots & \binom{n-1}{i}x^{n-1-i} \end{bmatrix}, \qquad (5)$$

with the natural convention that $\binom{j}{i} = 0$ if $j < i$. Let $\beta := (\beta_1, \ldots, \beta_r) \in \mathbb{C}^r$ be an $r$-dimensional vector of complex numbers, $\mu := (\mu_1, \ldots, \mu_r) \in \mathbb{N}^r$ be a sequence of positive integers, and $n := \sum_i \mu_i$. Then the confluent Vandermonde matrix $V(\beta; \mu)$ is the $n \times n$ matrix with columns $v_j(\beta_i)$, where $1 \le i \le r$ and $0 \le j \le \mu_i - 1$. We will also use the notation $V(\beta_1, \ldots, \beta_r; \mu_1, \ldots, \mu_r)$ when we want to emphasize the $\beta_i$'s and $\mu_i$'s. We illustrate it below for $r = 2$, $\mu_1 = 2$, $\mu_2 = 3$:

$$V(\beta; \mu) = \begin{bmatrix} 1 & 0 & 1 & 0 & 0\\ \beta_1 & 1 & \beta_2 & 1 & 0\\ \beta_1^2 & 2\beta_1 & \beta_2^2 & 2\beta_2 & 1\\ \beta_1^3 & 3\beta_1^2 & \beta_2^3 & 3\beta_2^2 & 3\beta_2\\ \beta_1^4 & 4\beta_1^3 & \beta_2^4 & 4\beta_2^3 & 6\beta_2^2 \end{bmatrix}.$$

The block $B(\beta_i)$ corresponding to a $\beta_i$ is the set of columns $v_j(\beta_i)$, for $j = 0, \ldots, \mu_i - 1$. If all the $\mu_i$'s are one, then we obtain the standard Vandermonde matrix, denoted $V(\beta)$. A key observation in understanding the determinant of the matrix above is to consider the matrix obtained by replacing the last column $v_{\mu_i-1}(\beta_i)$, corresponding to some $\beta_i$ with $\mu_i > 1$, with the column $v(y)$, for some variable $y$, which gives us the matrix

$$V(y) := V(\beta_1, \ldots, \beta_i, y, \beta_{i+1}, \ldots, \beta_r;\; \mu_1, \ldots, \mu_i - 1, 1, \mu_{i+1}, \ldots, \mu_r). \qquad (6)$$

Let $\mathrm{V}(y) := \det(V(y))$. By expanding along the column corresponding to $y$, we can express $\mathrm{V}(y)$ as a polynomial in $y$ of degree at most $n - 1$. If we differentiate this polynomial $(\mu_i - 1)$ times, divide by $(\mu_i - 1)!$, and substitute $y = \beta_i$, then we recover the determinant of $V(\beta; \mu)$ expanded along the last column of the block $B(\beta_i)$. More precisely,

$$\det(V(\beta; \mu)) = \frac{\mathrm{V}^{(\mu_i - 1)}(y)}{(\mu_i - 1)!}\bigg|_{y = \beta_i}. \qquad (7)$$

This result is crucial in deriving the following explicit form for the determinant [9].

Proposition 2.1. The determinant of the confluent Vandermonde matrix satisfies

$$\det(V(\beta; \mu)) = \prod_{1\le i<j\le r} (\beta_j - \beta_i)^{\mu_i \mu_j}.$$

The following variant of the bound appears in [7].

Proposition 3.1.
Let $\alpha := (\alpha_1, \ldots, \alpha_r)$ be a sequence of distinct complex numbers, and

$$M(\alpha) := \prod_{i=1}^{r} \max\{1, |\alpha_i|\}. \qquad (8)$$

If $G(V, E)$ is an undirected simple graph (i.e., with no multi-edges and self-loops) with vertices $V \subseteq \{\alpha_1, \ldots, \alpha_r\}$, then

$$\prod_{(\alpha_i,\alpha_j)\in E} |\alpha_i - \alpha_j| \ge |\det(V(\alpha))|\; M(\alpha)^{-(r-1)} \left(\frac{r}{\sqrt{3}}\right)^{-|E|} r^{-r/2}.$$

Remark:
The result in [7] actually uses the sub-discriminant. Given a degree $d$ polynomial

$$f(z) = \prod_{i=1}^{r} (z - \alpha_i)^{m_i},$$

with distinct roots $\alpha_i$ of multiplicity $m_i$, for $1 \le i \le r$, the $(d-r)$-th sub-discriminant of $f$ is given by

$$\mathrm{sDisc}_{d-r}(f) := \det(V(\alpha))^2 \prod_{j=1}^{r} m_j. \qquad (9)$$

Taking absolute values and substituting the expression for the absolute value of the determinant into Proposition 3.1, we get

$$\prod_{(\alpha_i,\alpha_j)\in E} |\alpha_i - \alpha_j| \ge \mathrm{sDisc}_{d-r}(f)^{1/2}\, M(f)^{-(r-1)} \times \left(\frac{r}{\sqrt{3}}\right)^{-|E|} \frac{r^{-r/2}}{\prod_{i=1}^r \sqrt{m_i}}. \qquad (10)$$

Escorcielo-Perrucci [7] then use the following upper bound by Eigenwillig [4] to derive the final form of their result: if $m_1, \ldots, m_r \in \mathbb{N}$ and $\sum_{i=1}^r m_i = d$, then

$$\prod_{i=1}^{r} \sqrt{m_i} \le 2^{\min\{d,\, 2(d-r)\}/2}.$$

Instead, if we use the AM-GM inequality, then we get a sharper bound, namely,

$$\prod_{i=1}^{r} \sqrt{m_i} \le \left(\frac{d}{r}\right)^{r/2}.$$

Substituting this in (10), we get the following improvement over [7]:

$$\prod_{(\alpha_i,\alpha_j)\in E} |\alpha_i - \alpha_j| \ge \mathrm{sDisc}_{d-r}(f)^{1/2}\, M(f)^{-(r-1)} \left(\frac{r}{\sqrt{3}}\right)^{-|E|} d^{-r/2}.$$

We will generalize Proposition 3.1 above to account for non-zero integer weights on the edges, i.e., derive a lower bound on the product given in (2). To illustrate the advantage of our approach, we first give the details of a lower bound obtained by a straightforward modification of Proposition 3.1.

Let $w_{\max}$ be the largest weight over all the edges in $G$. Then we can raise the bound in Proposition 3.1 to this weight, move the extraneous factors to the right-hand side, and replace them with an upper bound. For any edge $(\alpha_i, \alpha_j) \in E$, we have

$$|\alpha_i - \alpha_j|^{w_{\max} - w(\alpha_i,\alpha_j)} \le (2M(f))^{w_{\max}}.$$
Therefore, we obtain the following lower bound as a modification of Proposition 3.1, which we will use for comparison with the bound derived in this paper:

$$\prod_{(\alpha_i,\alpha_j)\in E} |\alpha_i - \alpha_j|^{w(\alpha_i,\alpha_j)} \ge |\det(V(\alpha))|^{w_{\max}}\, M(\alpha)^{-((r-1)w_{\max} + |E|w_{\max})} \cdot 2^{-|E|w_{\max}} \left(\frac{r}{\sqrt{3}}\right)^{-|E|w_{\max}} r^{-(r w_{\max})/2}. \qquad (11)$$

In comparison, we obtain the following generalization:

Theorem 3.2. Let $\alpha_1, \ldots, \alpha_r \in \mathbb{C}$ be distinct complex numbers. Let $G(V, E)$ be an undirected graph whose vertex set $V$ is a subset of $\{\alpha_1, \ldots, \alpha_r\}$, with an associated weight function $w : E \to \mathbb{N}$. Denote by $A_w = [w(\alpha_i, \alpha_j)]_{i,j=1,\ldots,r}$ the associated weighted adjacency matrix. To every vertex $\alpha_i \in V$, we assign a potential $\mu_i \in \mathbb{N}$ such that for every edge $(\alpha_i, \alpha_j) \in E$ we have $w(\alpha_i, \alpha_j) \le \mu_i \mu_j$. Define $\mu$ as the column vector of these potentials, $n := \sum_{i=1}^r \mu_i$, let $M(\alpha)$ be as in (8), and let $w(E)$ be the sum of the weights of the edges in the graph $G$, i.e.,

$$w(E) := \sum_{(\alpha_i,\alpha_j)\in E} w(\alpha_i, \alpha_j). \qquad (12)$$

Then

$$\prod_{(\alpha_i,\alpha_j)\in E} |\alpha_i - \alpha_j|^{w(\alpha_i,\alpha_j)} > |\det(V(\alpha;\mu))|\; M(\alpha)^{-\|\mu\mu^t - A_w\|_\infty} \left(\frac{n}{\sqrt{3}}\right)^{-\sum_i \binom{\mu_i}{2} - w(E)} n^{-n/2}, \qquad (13)$$

where the $\infty$-norm of a matrix is the maximum one-norm over the rows of the matrix.

Remarks:
(1) Since we are dealing with symmetric matrices, we can replace the $\infty$-norm with the induced 1-norm, which is the maximum one-norm over the columns.
(2) If all the weights are one, then we can take all the $\mu_i$'s to be 1 and obtain Proposition 3.1 as a corollary.
(3) There is an interesting trade-off between the absolute values of the exponents of $M(\alpha)$ and $n/\sqrt{3}$, namely, as the number of edges in $G$ increases, the former decreases whereas the latter increases.

In order to compare (11) and (13), we make three assumptions:
(i) $G$ is connected, so $|E| \ge r - 1$;
(ii) $\mu_i = \lceil\sqrt{w_{\max}}\rceil$, for all $i = 1, \ldots, r$; and
(iii) $f$ is an integer polynomial.

From the last assumption it follows that both $|\det(V(\alpha))|$ and $|\det(V(\alpha;\mu))|$ are at least one, and that is how we often use them in applications. The second assumption implies that $n = r\lceil\sqrt{w_{\max}}\rceil$. We now compare three analogous terms from both bounds by taking logarithms.

From the assumption of connectivity, it follows that the absolute value of the exponent of $M(\alpha)$ in (11) is at least $2(r-1)w_{\max}$, whereas in (13) it is at most $r w_{\max}$. If $r \ge 2$, then the former is larger than the latter. The difference is because of the amortized nature of the bound in (13).

Consider the negation of the logarithm of the term $n^{-\sum_i \binom{\mu_i}{2} - w(E)}$ in (13). This is equal to

$$\left(\sum_i \binom{\mu_i}{2} + w(E)\right)\log n \le \left(\sum_i \binom{\mu_i}{2} + |E| w_{\max}\right)\log(r\sqrt{w_{\max}}).$$

Since $\binom{\mu_i}{2} \le \mu_i^2/2$, it follows that $\sum_i \binom{\mu_i}{2} \le r w_{\max}/2$. Therefore, the right-hand side above is upper bounded by

$$2|E| w_{\max} \log(r\sqrt{w_{\max}}),$$

which is somewhat larger than $|E| w_{\max}\log(2r/\sqrt{3})$, the corresponding term in (11). It must be remarked, nevertheless, that the choice in the second assumption is not the best (see Section 4.1) and is only used for illustration at this point.

The negation of the logarithm of $n^{-n/2}$ in (13) is $\frac{r\sqrt{w_{\max}}}{2}\log(r\sqrt{w_{\max}})$, which is better than the corresponding term in (11), namely $\frac{r w_{\max}}{2}\log r$, for sufficiently large $w_{\max}$.

Let $f : \mathbb{C} \to \mathbb{C}$ be a function and $y_1, \ldots, y_n$ be $n$ nodes. Then the divided difference of $f$ on these $n$ nodes is given by

$$f[y_1, \ldots, y_n] := \sum_{k=1}^{n} \frac{f(y_k)}{\prod_{\ell=1,\,\ell\neq k}^{n} (y_k - y_\ell)}. \qquad (14)$$

If $f(z) := z^m$, for some $m \in \mathbb{Z}_{\ge 0}$, then we have the following closed form:

$$f[y_1, \ldots, y_n] = \begin{cases} \sum_{(t_1,\ldots,t_n)\in\mathbb{Z}^n_{\ge 0},\; \sum_{i=1}^n t_i = m-n+1} \;\prod_{j=1}^{n} y_j^{t_j} & \text{if } n \le m+1,\\ 0 & \text{if } n > m+1. \end{cases} \qquad (15)$$

Given $i_1, \ldots, i_n \in \mathbb{Z}_{\ge 0}$, denote by

$$f^{(i_1,\ldots,i_n)}[y_1, \ldots, y_n] := \frac{1}{i_1!}\frac{\partial^{i_1}}{\partial y_1^{i_1}} \cdots \frac{1}{i_n!}\frac{\partial^{i_n}}{\partial y_n^{i_n}}\, f[y_1, \ldots, y_n]. \qquad (16)$$

Then the following claim is straightforward to show:

Lemma 3.3. Given $i_1, \ldots, i_n \in \mathbb{Z}_{\ge 0}$, the quantity $f^{(i_1,\ldots,i_n)}[y_1, \ldots, y_n]$ is a linear combination of $f^{(k_j)}(y_j)$, where $j = 1, \ldots, n$ and $k_j = 0, \ldots, i_j$. Moreover, the coefficient of $f^{(i_j)}(y_j)$ in this linear combination is

$$\frac{1}{i_j!} \prod_{\ell=1,\,\ell\neq j}^{n} (y_j - y_\ell)^{-(i_\ell+1)}.$$

Proof. For simplicity, we only argue for $i_1$; the argument is similar for the other cases. Consider the effect of $\frac{1}{i_1!}\frac{\partial^{i_1}}{\partial y_1^{i_1}}$ on $f[y_1, \ldots, y_n]$. By the linearity of the derivative operator, we only need to focus on the term $f(y_1)/\prod_{i\neq 1}(y_1 - y_i)$. From Leibniz's rule applied to this term, we get the expression

$$\frac{1}{i_1!}\,\frac{f^{(i_1)}(y_1)}{\prod_{i\neq 1}(y_1 - y_i)}.$$

The effect of the other partial derivatives $\frac{1}{i_\ell!}\frac{\partial^{i_\ell}}{\partial y_\ell^{i_\ell}}$ is only on the terms in the denominator, which yields the desired expression for the coefficient of $f^{(i_1)}(y_1)$. □

If $f(z) := z^m$, for some $m \in \mathbb{Z}_{\ge 0}$, and $(i_1, \ldots, i_n) \in \mathbb{Z}^n_{\ge 0}$, then as a generalization of (15) we obtain the following:

$$f^{(i_1,\ldots,i_n)}[y_1, \ldots, y_n] = \begin{cases} \sum_{(t_1,\ldots,t_n)\in\mathbb{Z}^n_{\ge 0},\; \sum_{i=1}^n t_i = m-n+1} \;\prod_{j=1}^{n} \binom{t_j}{i_j}\, y_j^{t_j - i_j} & \text{if } n \le m+1,\\ 0 & \text{if } n > m+1, \end{cases} \qquad (17)$$

with the natural convention that $\binom{t_j}{i_j} = 0$ if $t_j < i_j$.

The idea of the proof is similar to [7]. Given the undirected graph $G$, we first direct its edges to go from a root of smaller modulus to one of larger modulus; this way we obtain a directed acyclic graph $G$; the in-degrees of the vertices in $G$ can be larger than one, which is the case addressed in [7]. We consider the vertices of $G$ in the reverse order of a topological sort on its vertices, i.e., in the order $(\alpha_1, \ldots$
, \alpha_r)$, where if $(\alpha_i, \alpha_j)$ is an edge in $G$ then $j < i$. Let $\mathrm{In}(\alpha_i)$ denote the set of all vertices that have an edge pointing to $\alpha_i$, let $d_i$ be the cardinality of $\mathrm{In}(\alpha_i)$ (i.e., the in-degree of $\alpha_i$), and let

$$V_0 := V(\alpha; \mu). \qquad (18)$$

At the $i$-th step we will process the block corresponding to $\alpha_i$ in $V_{i-1}$, where $i \ge 1$, to obtain a matrix $V_i$. The relation between the two matrices is the following:

$$\det(V_{i-1}) = \det(V_i) \prod_{\alpha_j \in \mathrm{In}(\alpha_i)} (\alpha_i - \alpha_j)^{w(\alpha_j, \alpha_i)}. \qquad (19)$$

The matrix $V_i$ is obtained from $V_{i-1}$ in stages by modifying the columns in the block corresponding to $\alpha_i$; that is, there are two loops -- an outer loop over the blocks $B(\alpha_i)$, and an inner loop processing the columns of the block $B(\alpha_i)$. The end result is a matrix $V_r$ such that

$$\det(V_0) = \det(V_r) \prod_{i=1}^{r} \prod_{\alpha_j \in \mathrm{In}(\alpha_i)} (\alpha_i - \alpha_j)^{w(\alpha_j, \alpha_i)}.$$

The final step is to derive an upper bound on $|\det(V_r)|$; this is done by applying Hadamard's inequality and obtaining upper bounds on the two-norms of the columns of $V_r$.

In what follows, we will use $\alpha$ in place of $\alpha_i$, $\mu_\alpha$ as the size of the block $B(\alpha)$, $k := d_i$, and $V := V_{i-1}$. Without loss of generality, let us assume that $\beta_1, \ldots, \beta_k$ are the $k$ vertices in $\mathrm{In}(\alpha)$, with respective weights $w_1, \ldots, w_k$. Since we are processing the vertices in reverse topological order, we know that the blocks corresponding to these vertices have not been changed. Let $\mu_1, \ldots, \mu_k$ be the sizes of the blocks $B(\beta_1), \ldots, B(\beta_k)$, respectively. We will replace each column in the block $B(\alpha)$ by a suitable linear combination of the columns in the blocks $B(\alpha)$ and $B(\beta_i)$, for $i = 1, \ldots, k$. The linear combination will be obtained by taking a suitable partial derivative of the form given in (16) and then substituting the $y_i$'s appropriately. Ideally, we would have replaced, say, the last column in $B(\alpha)$ by the partial derivative obtained by taking full weights $w_1, \ldots, w_k$. However, there is a slight obstacle, namely, that the order of the derivatives of $f$ at $\beta_i$, for $i = 1, \ldots, k$, cannot exceed $\mu_i - 1$. To overcome this, we assign each edge $(\beta_i, \alpha)$, with corresponding weight $w_i$, to a column in the block $B(\alpha)$, namely to the $\lceil w_i/\mu_i \rceil$-th column of $B(\alpha)$; since $w_i \le \mu_i \mu_\alpha$ by the assumption on the weights, the edge will indeed be assigned to a column in $B(\alpha)$. Let $S_j \subseteq [k]$, for $j = 1, \ldots, \mu_\alpha$, denote the set of all indices assigned to the $j$-th column of $B(\alpha)$, i.e.,

$$S_j := \{ i \in [k] : \lceil w_i/\mu_i \rceil = j \}. \qquad (20)$$

By assignment, it follows that the $S_j$'s form a partition of $[k]$. The reason why this assignment works is the following: each column in $B(\alpha)$, along with its preceding columns in $B(\alpha)$ and the blocks $B(\beta_1), \ldots, B(\beta_k)$, can be used to factor out $(\beta_i - \alpha)^{\mu_i}$; therefore, $\lceil w_i/\mu_i \rceil$ columns will be required to get to $(\beta_i - \alpha)^{w_i}$. An illustrative aid for the subsequent proof is provided in Figure 1.
Fig. 1. The matrix $V_{i-1}$ and the block $B(\alpha)$ at stage $i$ of the proof. At the $j$-th step in processing $V$, the columns $(j+1)$ to $\mu_\alpha$ of the block $B(\alpha)$ have been processed to obtain $V^{(j+1)}$. In $V^{(j+1)}$, the $j$-th column is processed to obtain $V^{(j)}$.

We will now process the columns of $B(\alpha)$ starting from the last column to the first in $V$; it will help the reader to note that the columns will be counted from 1 to $\mu_\alpha$. Suppose we have already processed the columns of $B(\alpha)$ from $\mu_\alpha$ down to $(j+1)$ in $V$; let $V^{(j+1)}$ be the resulting matrix; initially, define $V^{(\mu_\alpha+1)} := V$. For $\beta_\ell$, $\ell = 1, \ldots, k$, define

$$r_\ell := \begin{cases} \mu_\ell & \text{if } w_\ell \text{ is divisible by } \mu_\ell,\\ (w_\ell \bmod \mu_\ell) & \text{otherwise}. \end{cases} \qquad (21)$$

We inductively claim the following relation for $j \le \mu_\alpha$:

$$\det(V) = \det(V^{(j+1)}) \prod_{\kappa=j+1}^{\mu_\alpha} \prod_{\ell\in S_\kappa} (\beta_\ell - \alpha)^{(\kappa-j-1)\mu_\ell + r_\ell}. \qquad (22)$$

The proof is by reverse induction on decreasing values of $j$; the base case trivially holds for $j = \mu_\alpha$, since the product vanishes and $V = V^{(\mu_\alpha+1)}$ by choice.

To complete the inductive claim (22), we have to obtain the following terms from the $j$-th column in $B(\alpha)$:
(1) the residue terms $(\beta_\ell - \alpha)^{r_\ell}$, for each index $\ell \in S_j$, and
(2) a factor of $(\beta_\ell - \alpha)^{\mu_\ell}$, for all the indices $\ell \in S_\kappa$, where $\kappa > j$.

This is done by taking a suitable partial derivative of the finite difference. Let

$$N_j := |\cup_{\kappa=j}^{\mu_\alpha} S_\kappa|, \qquad (23)$$

that is, the total number of indices assigned to column $j$ or greater; clearly $N_j \le k$. We will introduce $N_j$ variables for these indices, and a variable $y_0$ for $\alpha$. Note that the $j$-th column of the block $B(\alpha)$ in $V^{(j+1)}$ is obtained by substituting $z = \alpha$ in $v_{j-1}(z)$, given in (5). The $m$-th entry of this column, for $m = 1, \ldots, n$, is

$$\binom{m-1}{j-1} z^{m-j} = \frac{(z^{m-1})^{(j-1)}}{(j-1)!}. \qquad (24)$$

Define $f_m(z) := z^{m-1}$, and consider the finite difference $f_m[y_0, y_1, \ldots, y_{N_j}]$, where the $y_\ell$'s are variables. Since the order of the $y_i$'s in (16) does not matter, we can assume without loss of generality that $S_j = \{1, \ldots, |S_j|\}$, the indices in $S_{j+1}$ are the next $|S_{j+1}|$ numbers, and so on until $S_{\mu_\alpha}$, which is the last $|S_{\mu_\alpha}|$ numbers not exceeding $N_j$; thus the sets $S_\kappa$, for $\kappa = j, \ldots, \mu_\alpha$, form a partition of the set $\{1, \ldots, N_j\}$. Further define

$$i_0 := j - 1, \qquad i_\ell := r_\ell - 1, \qquad (25)$$

for $\ell = 1, \ldots, |S_j|$, and

$$i_\ell := \mu_\ell - 1, \qquad (26)$$

for $\ell = |S_j| + 1, \ldots, N_j$. Then we replace the $m$-th entry of the $j$-th column $v_{j-1}(\alpha)$ in the matrix $V^{(j+1)}$ by

$$f_m^{(i_0,\ldots,i_{N_j})}[y_0, y_1, \ldots, y_{N_j}] \qquad (27)$$

and substitute $y_0 := \alpha$ and $y_\ell := \beta_\ell$, for $\ell = 1, \ldots, N_j$. This is done for all the $n$ entries (that is, $m = 1, \ldots, n$) in the $j$-th column. Let $V^{(j)}$ be the resulting matrix. From Lemma 3.3 we know that the coefficient of $f_m^{(i_0)}(y_0)$ is

$$\frac{1}{i_0!} \prod_{\ell=1}^{N_j} (y_0 - y_\ell)^{-(i_\ell+1)} = \frac{1}{(j-1)!} \prod_{\kappa=j}^{\mu_\alpha} \prod_{\ell\in S_\kappa} (\alpha - \beta_\ell)^{-(i_\ell+1)}, \qquad (28)$$

which is the same for all $m = 1, \ldots, n$. Therefore, the replacement of the entries of the $j$-th column in the matrix $V^{(j+1)}$ by (27), for $m = 1, \ldots, n$, is tantamount to obtaining the matrix $V^{(j)}$ from $V^{(j+1)}$ by replacing the $j$-th column of $V^{(j+1)}$ by a linear combination of its other columns and a scaled version of the $j$-th column, where the scaling factor is the product term in (28); the $1/(j-1)!$ is not part of the scaling, as it already occurs in all the entries of the column (see (24)).
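The finite-difference machinery driving this replacement can be sanity-checked numerically; the sketch below (hypothetical nodes; the helper name is ours) verifies the closed form (15) for $f(z) = z^m$ via the standard recursive divided difference:

```python
from math import prod
from itertools import product as iproduct

def divided_difference(f, ys):
    """Standard recursive divided difference on distinct nodes."""
    if len(ys) == 1:
        return f(ys[0])
    return (divided_difference(f, ys[1:]) - divided_difference(f, ys[:-1])) \
        / (ys[-1] - ys[0])

# Hypothetical example: f(z) = z^5 on three nodes.
m, ys = 5, [1.0, 2.0, 4.0]
n = len(ys)
dd = divided_difference(lambda z: z ** m, ys)

# Closed form (15): sum over (t_1,...,t_n) >= 0 with sum t_i = m - n + 1
# of prod_j y_j^{t_j} (the complete homogeneous symmetric polynomial).
deg = m - n + 1
closed = sum(prod(ys[j] ** t[j] for j in range(n))
             for t in iproduct(range(deg + 1), repeat=n) if sum(t) == deg)
assert abs(dd - closed) < 1e-9 * closed
```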
In terms of the determinant, we obtain the following relation:

$$\det(V^{(j+1)}) = \det(V^{(j)}) \prod_{\kappa=j}^{\mu_\alpha} \prod_{\ell\in S_\kappa} (\beta_\ell - \alpha)^{i_\ell+1} = \det(V^{(j)}) \prod_{\ell\in S_j} (\beta_\ell - \alpha)^{r_\ell} \prod_{\kappa=j+1}^{\mu_\alpha} \prod_{\ell\in S_\kappa} (\beta_\ell - \alpha)^{\mu_\ell}.$$

Substituting this in (22), we have the desired inductive relation:

$$\det(V) = \det(V^{(j)}) \prod_{\kappa=j}^{\mu_\alpha} \prod_{\ell\in S_\kappa} (\beta_\ell - \alpha)^{(\kappa-j)\mu_\ell + r_\ell}.$$

We stop when $j = 1$, to get

$$\det(V) = \det(V^{(1)}) \prod_{\kappa=1}^{\mu_\alpha} \prod_{\ell\in S_\kappa} (\beta_\ell - \alpha)^{(\kappa-1)\mu_\ell + r_\ell}.$$

But recall from (20) that for an index $\ell \in S_\kappa$ we have $\lceil w_\ell/\mu_\ell \rceil = \kappa$. Furthermore, from (21) it follows that $w_\ell = (\kappa-1)\mu_\ell + r_\ell$. Since $\cup_{\kappa=1}^{\mu_\alpha} S_\kappa = [k]$, we have accounted for all the $\beta_\ell$'s, and so the equation above is the same as

$$\det(V) = \det(V^{(1)}) \prod_{\ell=1}^{k} (\beta_\ell - \alpha)^{w_\ell} = \det(V^{(1)}) \prod_{\alpha_j\in \mathrm{In}(\alpha_i)} (\alpha_j - \alpha)^{w(\alpha_j, \alpha_i)}.$$

Defining $V_i := V^{(1)}$ and recalling that $V = V_{i-1}$, we complete the proof of the inductive claim (19). Applying the claim for $i = 1, \ldots, r$, and making the appropriate substitutions, we get the desired relation:

$$\det(V_0) = \det(V_r) \prod_{i=1}^{r} \prod_{\alpha_j\in \mathrm{In}(\alpha_i)} (\alpha_i - \alpha_j)^{w(\alpha_j, \alpha_i)}, \qquad (29)$$

where $V_0 = V(\alpha; \mu)$ (see (18)). The absolute value of the product on the right-hand side is the value that we need to lower bound; we know the determinant on the left-hand side from Proposition 2.1, so all that remains is to derive an upper bound on $|\det(V_r)|$. We will use Hadamard's inequality for this purpose, which requires us to derive an upper bound on the two-norms of the columns of the matrix $V_r$. Let $V_r(\alpha_i; j)$ denote the $j$-th column of the block of columns $V_r(\alpha_i)$ of $V_r$ corresponding to $B(\alpha_i)$; note that $V_r(\alpha_i)$ may be the same as $B(\alpha_i)$ (this happens, for instance, when there are no edges incident on $\alpha_i$ in $G$). In what follows, we derive an upper bound on $\|V_r(\alpha_i; j)\|$.

Recall the definition of $N_j$ from (23), and that $\mu_i$ is the size of the block $B(\alpha_i)$. For convenience again, let the sets $S_j, S_{j+1}, \ldots, S_{\mu_i} \subseteq \mathrm{In}(\alpha_i)$ be indexed such that $S_j = \{1, \ldots, |S_j|\}$, the next $|S_{j+1}|$ numbers are in $S_{j+1}$, and so on until $S_{\mu_i}$, which is the last $|S_{\mu_i}|$ numbers not exceeding $N_j$; thus these sets form a partition of the set $\{1, \ldots, N_j\}$. Now the $m$-th entry in the column $V_r(\alpha_i; j)$ is (27). From (17), we have the following bound on the absolute value of (27) after substituting $n := N_j + 1$, $y_0 := \alpha_i$ (with $\alpha_0 := \alpha_i$), $y_\ell := \alpha_\ell$, for $\ell = 1, \ldots, N_j$, where the indices $i_\ell$ are defined as in (25) and (26):

$$\sum_{\substack{(t_0,t_1,\ldots,t_{N_j})\in\mathbb{Z}^{N_j+1}_{\ge 0}\\ t_0+t_1+\cdots+t_{N_j} = m-1-N_j}} \;\prod_{\ell=0}^{N_j} \binom{t_\ell}{i_\ell} |\alpha_\ell|^{t_\ell - i_\ell}.$$

Since $\alpha_1, \ldots, \alpha_{N_j}$ have edges directed to $\alpha_i$, their absolute values are smaller than $|\alpha_i|$. Therefore, the quantity above is upper bounded by

$$\sum_{\substack{(t_0,\ldots,t_{N_j})\in\mathbb{Z}^{N_j+1}_{\ge 0}\\ t_0+\cdots+t_{N_j} = m-1-N_j}} \;\prod_{\ell=0}^{N_j} \binom{t_\ell}{i_\ell} |\alpha_i|^{t_\ell - i_\ell},$$

which is equal to

$$|\alpha_i|^{m-1-N_j-\sum_{\ell=0}^{N_j} i_\ell} \sum_{\substack{(t_0,\ldots,t_{N_j})\in\mathbb{Z}^{N_j+1}_{\ge 0}\\ t_0+\cdots+t_{N_j} = m-1-N_j}} \;\prod_{\ell=0}^{N_j} \binom{t_\ell}{i_\ell}. \qquad (30)$$

Define

$$M_j := N_j + \sum_{\ell=0}^{N_j} i_\ell = N_j + j - 1 + \sum_{\ell=1}^{N_j} i_\ell, \qquad (31)$$

where the second equality follows from the fact that $i_0 = j - 1$. The binomial coefficients $\binom{t_\ell}{i_\ell}$ vanish for $t_\ell < i_\ell$, so we can assume that $t_\ell \ge i_\ell$. If $j_\ell := t_\ell - i_\ell$, then

$$\sum_{\ell=0}^{N_j} t_\ell = \sum_{\ell=0}^{N_j} i_\ell + \sum_{\ell=0}^{N_j} j_\ell,$$

and so the constraint $\sum_{\ell=0}^{N_j} t_\ell = m-1-N_j$ is equivalent to

$$\sum_{\ell=0}^{N_j} j_\ell = m - 1 - N_j - \sum_{\ell=0}^{N_j} i_\ell = m - 1 - M_j,$$

where the last step follows from the definition of $M_j$ in (31).
Changing the indices from $t_\ell$ to $j_\ell$ in (30), we get the following bound on the $m$-th entry of $V_r(\alpha_i; j)$:

$$|\alpha_i|^{m-1-M_j} \sum_{\substack{(j_0,j_1,\ldots,j_{N_j})\in\mathbb{Z}^{N_j+1}_{\ge 0}\\ j_0+j_1+\cdots+j_{N_j} = m-1-M_j}} \;\prod_{\ell=0}^{N_j} \binom{i_\ell + j_\ell}{i_\ell}. \qquad (32)$$

We next derive a closed form for the summation term above. Consider the generating function

$$\sum_{t_\ell\ge i_\ell} \binom{t_\ell}{i_\ell} x^{t_\ell - i_\ell} = \sum_{j_\ell\ge 0} \binom{i_\ell+j_\ell}{j_\ell} x^{j_\ell} = (1-x)^{-(i_\ell+1)}$$

for a given $\ell$. Taking the product of these for the different choices of $\ell$, it follows that the summation term on the right-hand side of (32) is the coefficient of $x^{m-1-M_j}$ in the generating function

$$(1-x)^{-\sum_{\ell=0}^{N_j}(i_\ell+1)} = (1-x)^{-(M_j+1)},$$

which is $\binom{m-1}{M_j}$. This implies that (32) is equal to $|\alpha_i|^{m-1-M_j}\binom{m-1}{M_j}$.

From the argument in the preceding paragraph, it follows that in the matrix $V_r$ the two-norm of the $j$-th column in the block of columns corresponding to $B(\alpha_i)$ is

$$\|V_r(\alpha_i; j)\| \le \left(\sum_{m=1}^{n} |\alpha_i|^{2(m-1-M_j)} \binom{m-1}{M_j}^2\right)^{1/2}.$$

Since for $m - 1 < M_j$ the binomial term vanishes, we can start the summation from $M_j$ onwards to obtain the following equivalent form:

$$\|V_r(\alpha_i; j)\| \le \left(\sum_{m=M_j}^{n-1} |\alpha_i|^{2(m-M_j)} \binom{m}{M_j}^2\right)^{1/2}.$$

Substituting $|\alpha_i|$ by

$$\max|\alpha_i| := \max\{1, |\alpha_i|\} \qquad (33)$$

and pulling out its largest power from the summation, we have the following upper bound on the two-norm:

$$\|V_r(\alpha_i; j)\| \le \max|\alpha_i|^{(n-1-M_j)} \left(\sum_{m=M_j}^{n-1} \binom{m}{M_j}^2\right)^{1/2}.$$
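As a brute-force check of the generating-function step just used (the derivative orders $i_\ell$ below are hypothetical, and the names are ours), one can verify that the constrained sum of products of binomials collapses to a single binomial coefficient:

```python
from math import comb, prod
from itertools import product as iproduct

i_vals = [1, 0, 2]            # hypothetical orders i_l for N = 3 nodes
N = len(i_vals)
M = (N - 1) + sum(i_vals)     # chosen so that sum of (i_l + 1) equals M + 1
s = 4                         # target exponent, playing the role of m - 1 - M_j

# Coefficient of x^s in prod_l (1 - x)^{-(i_l + 1)} = (1 - x)^{-(M + 1)}:
# the constrained sum of binomial products must equal binom(s + M, M).
total = sum(prod(comb(i_vals[l] + t[l], i_vals[l]) for l in range(N))
            for t in iproduct(range(s + 1), repeat=N) if sum(t) == s)
assert total == comb(s + M, M)
```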
Using the upper bound from [7, Lemma 7] on the summation term above, we get the following inequality:
\[
  \|V_r(\alpha_i; j)\| \le \max|\alpha_i|^{(n-1-M_j)} \left( \frac{n}{\sqrt{3}} \right)^{M_j} \sqrt{n}.
\]
Taking the product of these quantities for $j = 1, \ldots, \mu_i$, we get the following upper bound on the product of the two-norms of the columns in the block $V_r(\alpha_i)$ in $V_r$:
\[
  \prod_{j=1}^{\mu_i} \|V_r(\alpha_i; j)\| \le \max|\alpha_i|^{\sum_{j=1}^{\mu_i}(n-1-M_j)} \left( \frac{n}{\sqrt{3}} \right)^{\sum_{j=1}^{\mu_i} M_j} n^{\mu_i/2}. \tag{34}
\]
Let us understand the term $\sum_{j=1}^{\mu_i} M_j$.

Lemma 4.1. For a vertex $\alpha_i$ in the directed acyclic graph $G$, define
\[
  w_i := \sum_{\alpha_\ell \in \mathrm{In}(\alpha_i)} w(\alpha_\ell, \alpha_i), \tag{35}
\]
that is, the sum of the weights of all edges directed into $\alpha_i$. Then
\[
  \sum_{j=1}^{\mu_i} M_j = \binom{\mu_i}{2} + w_i.
\]

Proof. Recall the definition of the sets $S_j$ from (20) and the definition of $M_j$ from (31). Given a $j$: for $\ell \in S_j$, $i_\ell = r_\ell - 1$; for $\ell \in S_\kappa$, where $\kappa = j+1, \ldots, \mu_i$, $i_\ell = \mu_\ell - 1$. Therefore, we can rewrite (31) as
\[
  M_j = N_j + j - 1 + \sum_{\ell \in S_j} (r_\ell - 1) + \sum_{\ell \in \cup_{\kappa > j} S_\kappa} (\mu_\ell - 1)
      = j - 1 + \sum_{\ell \in S_j} r_\ell + \sum_{\ell \in \cup_{\kappa > j} S_\kappa} \mu_\ell.
\]
The sum $\sum_j \sum_{\ell \in S_j} r_\ell$ is the sum of the residue terms over all indices in $\cup_{j=1}^{\mu_i} S_j$. Now consider the sum
\[
  \sum_{j=1}^{\mu_i} \; \sum_{\ell \in \cup_{\kappa > j} S_\kappa} \mu_\ell.
\]
For two indices $j < \kappa$, the summation over $j$ contributes a $\mu_\ell$ for every $\ell \in S_\kappa$. Therefore,
\[
  \sum_{j=1}^{\mu_i} \left( \sum_{\ell \in S_j} r_\ell + \sum_{\ell \in \cup_{\kappa > j} S_\kappa} \mu_\ell \right) = w_i.
\]
Since, in addition, $\sum_{j=1}^{\mu_i} (j-1) = \binom{\mu_i}{2}$, the claim follows. $\square$

Substituting the result of the lemma above into (34), we get the following upper bound on the product of the two-norms of the columns in $V_r(\alpha_i)$:
\[
  \prod_{j=1}^{\mu_i} \|V_r(\alpha_i; j)\| \le \max|\alpha_i|^{(n-1)\mu_i - \binom{\mu_i}{2} - w_i} \left(\frac{n}{\sqrt{3}}\right)^{\binom{\mu_i}{2} + w_i} n^{\mu_i/2}.
\]
Taking the product of this bound for $i = 1, \ldots, r$, along with Hadamard's inequality, gives us the following upper bound:
\[
  |\det(V_r)| \le \prod_{i=1}^{r} \left( \max|\alpha_i|^{(n-1)\mu_i - \binom{\mu_i}{2} - w_i} \left(\frac{n}{\sqrt{3}}\right)^{\binom{\mu_i}{2} + w_i} \right) n^{n/2}, \tag{36}
\]
where we use the fact that $n = \sum_{i=1}^r \mu_i$. The term
\[
  (n-1)\mu_i - \binom{\mu_i}{2} - w_i = \sum_{\substack{j=1 \\ j \ne i}}^{r} \mu_i\mu_j - w_i + \binom{\mu_i}{2} < \sum_{\substack{j=1 \\ j \ne i}}^{r} \mu_i\mu_j - w_i + \mu_i^2.
\]
If $\mu$ is the column vector of all the $\mu_i$'s, and $A_w$ is the adjacency matrix whose $(i,j)$th entry is the weight $w(\alpha_i, \alpha_j)$ of the corresponding edge $(\alpha_i, \alpha_j)$, then the last term in the inequality above is the one-norm of the $i$th row of the matrix $\mu\mu^t - A_w$. Since the $\infty$-norm $\|\mu\mu^t - A_w\|_\infty$ is the maximum over all the row-sums, we have
\[
  (n-1)\mu_i - \binom{\mu_i}{2} - w_i \le \|\mu\mu^t - A_w\|_\infty.
\]
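The bound on the exponent by the $\infty$-norm of $\mu\mu^t - A_w$ can be checked numerically. The following Python sketch uses a small hypothetical instance (three roots, potentials $\mu = (2,1,3)$, and a symmetric weight matrix satisfying the edge constraints $w(\alpha_i,\alpha_j) \le \mu_i\mu_j$); the numbers are illustrative, not taken from the paper:

```python
import numpy as np
from math import comb

# Hypothetical data: potentials mu_i and a symmetric weighted adjacency
# matrix A_w with zero diagonal, satisfying w(a_i, a_j) <= mu_i * mu_j.
mu = np.array([2, 1, 3])
A_w = np.array([[0, 2, 4],
                [2, 0, 3],
                [4, 3, 0]])

n = mu.sum()                   # n = sum_i mu_i
M = np.outer(mu, mu) - A_w     # the matrix mu mu^t - A_w
assert (M >= 0).all()          # entrywise constraint mu_i mu_j >= w(a_i, a_j)

inf_norm = np.abs(M).sum(axis=1).max()   # ||mu mu^t - A_w||_inf = max row sum

for i in range(len(mu)):
    w_i = A_w[i].sum()         # total weight incident on vertex i
    exponent = (n - 1) * mu[i] - comb(mu[i], 2) - w_i
    assert exponent <= inf_norm
print("exponent bound verified; inf-norm =", inf_norm)
```

For this instance the row sums of $\mu\mu^t - A_w$ are $6, 1, 11$, so the $\infty$-norm is $11$, which indeed dominates each exponent $(n-1)\mu_i - \binom{\mu_i}{2} - w_i$.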
As for the remaining term, we have
\[
  \sum_{i=1}^{r} \left( \binom{\mu_i}{2} + w_i \right) = \sum_{i=1}^{r} \binom{\mu_i}{2} + w(E),
\]
where $w(E)$ is defined in (12). Substituting these bounds in (36), we obtain the following upper bound:
\[
  |\det(V_r)| \le M(\alpha)^{\|\mu\mu^t - A_w\|_\infty} \left(\frac{n}{\sqrt{3}}\right)^{\sum_i \binom{\mu_i}{2} + w(E)} n^{n/2}. \tag{37}
\]
Substituting this upper bound in (29) and moving it to the denominator on the left-hand side completes the proof of Theorem 3.2.

Theorem 3.2 leaves open the choice of the potentials $\mu_i \in \mathbb{N}$, $i = 1, \ldots, r$. Our aim here is to find the best possible choice of $\mu_i$'s satisfying the edge constraints $w(\alpha_i, \alpha_j) \le \mu_i\mu_j$ while at the same time minimizing $\|\mu\mu^t - A_w\|_\infty$. For example, if all the weights are one, then it is clear that $\mu_i = 1$, for $i = 1, \ldots, r$, is the best possible assignment. In this case, $V(\alpha; \mu) = V(\alpha)$, $\|\mu\mu^t - A_w\|_\infty \le (r-1)$, $n = r$, $w(E) = |E|$, and so Theorem 3.2 matches the bound given in Proposition 3.1.

Consider the relaxed version of the problem where the $\mu_i$'s are positive reals; it is clear that rounding them up to the nearest integer gives a valid solution (though not an optimal one) to the problem over the positive integers. The optimization problem is then to minimize $\|\mu\mu^t - A_w\|_\infty$ subject to
\[
  \mu\mu^t \ge A_w,
\]
where `$\ge$' here means entrywise; note that the non-edge constraints are trivially satisfied, since no $\mu_i$ is ever assigned zero. Since $A_w$ is non-negative, we know from Perron-Frobenius theory [9] that the spectral radius $\rho(A_w)$ is an eigenvalue of $A_w$. Moreover, as $A_w$ is symmetric, it can be orthogonally diagonalized, i.e., $A_w = Q \Lambda Q^t$, where $Q$ is the $r \times r$ orthogonal matrix whose columns $q_k$, $k = 1, \ldots, r$, are the eigenvectors of $A_w$, and $\Lambda$ is a diagonal matrix containing the corresponding eigenvalues of $A_w$. Another way to express this relation is that $A_w$ is the sum of rank-one matrices obtained from its eigenvectors, i.e.,
\[
  A_w = \sum_{k=1}^{r} \lambda_k q_k q_k^t.
\]
We can also assume that $\|q_k\| = 1$, $k = 1, \ldots, r$. Combined with the equation above, it follows that the $(i,j)$th entry of $A_w$ is
\[
  w(\alpha_i, \alpha_j) = \sum_{k=1}^{r} \lambda_k q_{k,i} q_{k,j}.
\]
Since by assumption $\|q_k\| = 1$, taking absolute values we get
\[
  w(\alpha_i, \alpha_j) \le \sum_{k=1}^{r} |\lambda_k| = \|A_w\|_\star,
\]
where $\|A_w\|_\star$ is the nuclear norm of $A_w$. Therefore, we can take $\mu$ in Theorem 3.2 as the vector
\[
  \mu := \left\lceil \sqrt{\|A_w\|_\star} \right\rceil \underbrace{(1, 1, \ldots, 1)}_{r}, \tag{38}
\]
which implies that $n = r\left\lceil \sqrt{\|A_w\|_\star} \right\rceil$ in the theorem. The error in the approximation can be shown to be bounded by
\[
  \|\mu\mu^t - A_w\|_\infty \le 4r\|A_w\|_\star \quad \text{and} \quad \sum_i \binom{\mu_i}{2} \le r\|A_w\|_\star,
\]
where in the last inequality we use the observation that, as $A_w$ has non-zero entries in $\mathbb{Z}_{\ge 0}$, its spectral radius is at least one, and hence $\|A_w\|_\star \ge 1$. By making these substitutions in Theorem 3.2, we obtain the result, namely (4), mentioned in Section 1.
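As a concrete illustration of this choice of $\mu$, the following Python sketch takes a small hypothetical weight matrix $A_w$ (illustrative numbers, not from the paper), computes the nuclear norm from the eigenvalues, forms $\mu = \lceil\sqrt{\|A_w\|_\star}\rceil(1,\ldots,1)$, and checks that $\mu\mu^t$ dominates $A_w$ entrywise and that the row sums stay within a small constant multiple of $r\|A_w\|_\star$:

```python
import numpy as np
from math import ceil, comb

# Hypothetical symmetric weighted adjacency matrix (nonnegative integers,
# zero diagonal).
A_w = np.array([[0, 2, 4],
                [2, 0, 3],
                [4, 3, 0]], dtype=float)
r = A_w.shape[0]

# Nuclear norm of a symmetric matrix = sum of absolute eigenvalues.
nuc = np.abs(np.linalg.eigvalsh(A_w)).sum()
c = ceil(np.sqrt(nuc))
mu = np.full(r, c)                 # mu = ceil(sqrt(||A_w||_*)) * (1, ..., 1)

M = np.outer(mu, mu) - A_w
assert (M >= 0).all()              # mu mu^t dominates A_w entrywise
assert np.abs(M).sum(axis=1).max() <= 4 * r * nuc
assert r * comb(c, 2) <= r * nuc
print("scaling factor ceil(sqrt(nuclear norm)):", c)
```

For this matrix the nuclear norm is roughly $12.1$, so every $\mu_i$ is set to $4$ and both checks pass; the domination holds for any symmetric nonnegative $A_w$, since each entry is bounded by the nuclear norm.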
Our derivation using the confluent Vandermonde matrix to get the desired weights in the exponents has the advantage of optimizing over the various choices of the matrix. We have given a first attempt at exploiting this choice. Whereas rank-one approximations to matrices are well studied [8], the challenge in our context is to derive a symmetric rank-one matrix that also dominates $A_w$. One would also like to derive a lower bound on the absolute value of $\det(V(\alpha; \mu))$ in terms of the polynomial $f$, to get a more direct comparison with the earlier results. Perhaps an algorithm to compute the determinant from the coefficients would also be interesting; a related recent result is an algorithm to compute the $D^+(f)$-root function defined as $\prod_{1 \le i < j \le r} (\alpha_i - \alpha_j)^{m_i + m_j}$, i.e., $G$ is the complete graph on the roots and the weight of an edge is the sum of the multiplicities of its vertices [14]. Similar to [6], one would also like to derive a weighted version of these results in the more general setting of polynomial systems.

REFERENCES

[1] Prashant Batra and Vikram Sharma. 2019. Complexity of a Root Clustering Algorithm.
CoRR abs/1912.02820 (2019). arXiv:1912.02820 http://arxiv.org/abs/1912.02820
[2] Ruben Becker, Michael Sagraloff, Vikram Sharma, and Chee Yap. 2018. A near-optimal subdivision algorithm for complex root isolation based on the Pellet test and Newton iteration.
Journal of Symbolic Computation
86 (2018), 51–96. https://doi.org/10.1016/j.jsc.2017.03.009
[3] James H. Davenport. 1985.
Computer algebra for Cylindrical Algebraic Decomposition
[4] Arno Eigenwillig. 2008. Real Root Isolation for Exact and Approximate Polynomials Using Descartes' Rule of Signs. Ph.D. Thesis. University of Saarland, Saarbruecken, Germany.
[5] Arno Eigenwillig, Vikram Sharma, and Chee Yap. 2006. Almost Tight Complexity Bounds for the Descartes Method. In
Proc. of the 31st Intl. Symp. on Symbolic and Algebraic Computation. 71–78. Genova, Italy, Jul 9–12, 2006.
[6] Ioannis Emiris, Bernard Mourrain, and Elias Tsigaridas. 2019. Separation bounds for polynomial systems.
Journal of Symbolic Computation (2019). https://doi.org/10.1016/j.jsc.2019.07.001
[7] Paula Escorcielo and Daniel Perrucci. 2017. On the Davenport-Mahler bound.
J. Complexity
41 (2017), 72–81. https://doi.org/10.1016/j.jco.2016.12.001
[8] Shmuel Friedland. 2013. Best rank one approximation of real symmetric tensors can be chosen symmetric.
Frontiers of Mathematics in China.
[9] Roger A. Horn and Charles R. Johnson. 1991. Topics in Matrix Analysis. Cambridge University Press, Cambridge.
[10] Alexander Kobel and Michael Sagraloff. 2015. On the complexity of computing with planar algebraic curves.
J. Complexity
31, 2 (2015), 206–236. https://doi.org/10.1016/j.jco.2014.08.002
[11] Maurice Mignotte. 1992.
Mathematics for Computer Algebra. Springer-Verlag, Berlin.
[12] Maurice Mignotte. 1995. On the Distance Between the Roots of a Polynomial.
Applicable Algebra in Engineering, Commun., and Comput.
[13] Victor Y. Pan. 2002. Univariate Polynomials: Nearly Optimal Algorithms for Numerical Factorization and Root-finding. Journal of Symbolic Computation 33, 5 (2002), 701–733.
[14] Jing Yang and Chee K. Yap. 2020. On mu-Symmetric Polynomials. arXiv:cs.SC/2001.07403