Finding polynomial roots by dynamical systems -- a case study
Sergey Shemyakov, Roman Chernov, Dzmitry Rumiantsau, Dierk Schleicher, Simon Schmitt, Anton Shemyakov
Abstract.
We investigate two well known dynamical systems that are designed to find roots of univariate polynomials by iteration: the methods known by Newton and by Ehrlich–Aberth. Both are known to have found all roots of high degree polynomials with good complexity. Our goal is to determine in which cases which of the two algorithms is more efficient. We come to the conclusion that Newton is faster when the polynomials are given by recursion so they can be evaluated in logarithmic time with respect to the degree, or when all the roots are near the boundary of their convex hull. Conversely, Ehrlich–Aberth has the advantage when no fast evaluation of the polynomials is available, and when roots are in the interior of the convex hull of other roots.

1. Introduction
Among the most fundamental and classical results in mathematics is the fact that every univariate polynomial of degree $d$ over $\mathbb{C}$ splits into $d$ linear factors (i.e., $\mathbb{C}$ is algebraically closed), and that there is no closed formula to determine their roots in terms of radicals. Therefore, these roots have to be approximated numerically, usually in terms of an iterated process that can be viewed as a very natural dynamical system.

We investigate two such dynamical systems that are particularly prominent as root finders: Newton's method and the Ehrlich–Aberth method. Both are well known as efficient root finders that have a good track record for finding all roots of rather large degrees, and the question arises which of these is more efficient. Of course, there are numerous other root finding procedures, including eigenvalue methods or Weierstrass' iteration method. Our choice is motivated by the fact that the Ehrlich–Aberth method underlies one of the best established practical root finding software packages, MPSolve, while in our own work Newton's method has turned out to be particularly successful.
Moreover, these two root finders are conceptually related in an interesting way: Newton approximates one root at a time, while Ehrlich–Aberth may be seen as a system of $d$ orbits that try to approximate all the $d$ roots at the same time, and these $d$ orbits can be viewed as $d$ Newton orbits that are synchronized in the sense that each of them is aware of where the others are: they are "attracted" by the roots in the same sense as for Newton's method, but repelled from each other so that they avoid approximating the same (simple) root. We feel it would be quite interesting to extend this comparison also to other root finding procedures.

It is clear that no continuous deterministic root finding system can converge to roots for every set of initial conditions: the domains of convergence to some set of roots are open, and the complement must contain at least the boundaries of the basins. The second best desired property of a root finder is that it be generally convergent: this means it should converge to a root for an open and dense subset of all possible starting conditions (starting points for Newton, starting vectors for Ehrlich–Aberth). This property is difficult to establish: Newton's method is known not to be generally convergent, and for Ehrlich–Aberth it is an open question.

The structure of this paper is as follows: in Section 2 we describe the iteration steps of Newton and of Ehrlich–Aberth, trying to bring out their conceptual similarities. Relevant and interesting properties of both methods are then described, respectively, in Sections 3 and 4. Our key results are presented in Section 5: we describe several families of polynomials with different properties, all with a wide range of degrees, and measure the performance of Newton's and Ehrlich–Aberth's methods in systematic experiments. The final Section 6 contains a discussion of the results and the complexity of the methods, as well as possible conclusions and ideas for improvements that might inspire further research.
Acknowledgements.
This project has been inspired by several of our friends and colleagues. It owes a lot to many interesting discussions with Dario Bini and Leonardo Robol in Pisa, as well as with Victor Pan in New York and John Hubbard at Cornell. It builds on many ideas and discussions with the members of our research group, in particular Robin Stoll, Marvin Randig, and Bernhard Reinke. In addition, we are most grateful to the two referees for their helpful comments and suggestions.

We gratefully acknowledge support through the Advanced Grant HOLOGRAM of the European Research Council.
2. The root finding methods by Newton and Ehrlich–Aberth in comparison
The setting in all cases is the same: we investigate a monic polynomial $p(z) = (z - \alpha_1) \cdots (z - \alpha_d)$ of degree $d$, with $d$ roots $\alpha_1, \dots, \alpha_d$; nothing will be lost by assuming that $p$ is monic.

2.1. Newton's method.
Newton's well known root finding method is not an algorithm but a heuristic: given a polynomial $p$ (or more generally a differentiable mapping) and a guess $z$ for a possible root, a "suggested improvement" for the guess is

(1)  $z' = N_p(z) := z - p(z)/p'(z)$.

If $z$ is already a root, i.e. $p(z) = 0$, then this point remains fixed: roots of $p$ are fixed points of $N_p$. In general, $N_p$ is a rational map of the same degree as the number of distinct roots of $p$ (multiple roots of $p$ are accounted for in the degree of $p$ but not of $N_p$).

Lemma 1 (Local convergence of Newton's method). Every simple root $\alpha$ has a neighborhood on which convergence of Newton's method is quadratic: $|N_p(z) - \alpha| < C |z - \alpha|^2$. For multiple roots the convergence is only linear: if $\alpha$ is a root with multiplicity $k \ge 2$, then $|N_p(z) - \alpha| / |z - \alpha| \to (k-1)/k$ as $z \to \alpha$.

There is an electrostatic interpretation of Newton's method: place protons at the locations of the roots of $p$; then the electrostatic force felt by an electron at a point $z \in \mathbb{C}$ is the sum of the attractive forces of all the roots, and each of these forces is in the direction of the root and with strength inversely proportional to the distance. Except for constants of physical units, the combined force equals $-\overline{\sum_i 1/(z - \alpha_i)} = -\overline{p'(z)/p(z)}$ (where the overbar denotes complex conjugation). For Newton's method, the image $z'$ of $z$ is the unique point for which this sum is equal to $-\overline{1/(z - z')}$: it places the point $z'$ so that a single charge at $z'$ has the same effect as the superposition of all $d$ charges at the roots. In other words, it moves $z$ in the direction of the net force of all the roots, but at a distance inversely proportional to the net force: the smaller this net force, the further away all the roots, and the further the point $z$ has to move in the direction of the origin of the force.
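As an illustration, here is a minimal Python sketch of iterating the update (1); the function name and the crude stopping rule are our own illustrative choices, not part of the implementations compared in this paper.

```python
def newton_iterate(p, dp, z0, max_iter=200, tol=1e-14):
    """Iterate the Newton map N_p(z) = z - p(z)/dp(z) from the guess z0.

    p and dp are callables evaluating the polynomial and its derivative.
    Convergence is not guaranteed for arbitrary starting points.
    """
    z = z0
    for _ in range(max_iter):
        pz = p(z)
        if abs(pz) < tol:       # crude stopping criterion
            break
        z = z - pz / dp(z)      # the Newton step (1)
    return z

# Example: one root of p(z) = z^3 - 1 from a complex starting guess.
root = newton_iterate(lambda z: z**3 - 1, lambda z: 3 * z**2, 0.5 + 0.5j)
```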
2.2. The Ehrlich–Aberth method. This method does not approximate one root at a time, but tries to find all $d$ roots at once; hence it is an iteration on $\mathbb{C}^d$ (or rather an open subset thereof).
Consider a vector $(z_1, \dots, z_d)$ of initial guesses. The Ehrlich–Aberth iteration consists of the map $(z_1, \dots, z_d) \mapsto (z_1', \dots, z_d')$ given by

(2)  $z_k' := z_k - \dfrac{p(z_k)/p'(z_k)}{1 - \frac{p(z_k)}{p'(z_k)} \sum_{i \ne k} \frac{1}{z_k - z_i}}$.

Observe that if some $z_k$ is equal to a root of $p$, i.e. $p(z_k) = 0$, then $z_k' = z_k$, so this coordinate remains fixed. There is a heuristic interpretation for this formula: each of the $d$ component variables $z_k$ "thinks" that all the other components are already equal to a root ("everyone else knows what they are doing, only I am wrong"), and tries to use the standard Newton update $z \mapsto z - f(z)/f'(z)$ for the function $f(z) = p(z) / \prod_{i \ne k} (z - z_i)$, which would be the remaining linear factor of $p$ if all the other approximations were correct. To see this, observe that

$f'(z) = \dfrac{p'(z) \prod_{i \ne k}(z - z_i) - p(z) \prod_{i \ne k}(z - z_i) \sum_{i \ne k} \frac{1}{z - z_i}}{\left( \prod_{i \ne k}(z - z_i) \right)^2} = \dfrac{p'(z) - p(z) \sum_{i \ne k} \frac{1}{z - z_i}}{\prod_{i \ne k}(z - z_i)}$

and hence

$\dfrac{f(z_k)}{f'(z_k)} = \dfrac{p(z_k)}{p'(z_k) - p(z_k) \sum_{i \ne k} \frac{1}{z_k - z_i}} = \dfrac{p(z_k)/p'(z_k)}{1 - \frac{p(z_k)}{p'(z_k)} \sum_{i \ne k} \frac{1}{z_k - z_i}}$

as claimed above.

Lemma 2 (Local convergence of the Ehrlich–Aberth method). Every vector consisting of the $d$ roots of $p$ has a neighborhood in $\mathbb{C}^d$ on which the Ehrlich–Aberth method converges to this solution vector. This convergence is cubic when all roots of $p$ are simple, and linear otherwise.

This method has an electrostatic interpretation similar to Newton's method: once again, we place positive unit charges at the positions $\alpha_1, \dots, \alpha_d$ of the $d$ roots in $\mathbb{C}$. This time, there are $d$ approximations to the roots at the positions $z_1, \dots, z_d$, each coming with a negative unit charge. The $d$ approximations to the roots are as much attracted to the actual roots as they are repelled by each other, so they have a natural tendency to find different roots; in particular, if one approximation has already "found" a root, then the positive and negative charges cancel exactly.

Lemma 3 (Electrostatic interpretation of the Ehrlich–Aberth method). In the Ehrlich–Aberth method, each approximation moves in the direction of the resulting net charge, and a distance that is inversely proportional to the strength of this charge.

Proof. The following computation starts with the Ehrlich–Aberth iteration formula and terminates with the electrostatic interpretation:

$z_k' := z_k - \dfrac{p(z_k)/p'(z_k)}{1 - \frac{p(z_k)}{p'(z_k)} \sum_{i \ne k} \frac{1}{z_k - z_i}} = z_k - \dfrac{p(z_k)}{p'(z_k) - p(z_k) \sum_{i \ne k} \frac{1}{z_k - z_i}} = z_k - \dfrac{1}{\frac{p'(z_k)}{p(z_k)} - \sum_{i \ne k} \frac{1}{z_k - z_i}} = z_k - \dfrac{1}{\sum_i \frac{1}{z_k - \alpha_i} - \sum_{i \ne k} \frac{1}{z_k - z_i}}.$

So indeed, the formula is the same as for Newton's method, except that each of the other $d - 1$ approximations contributes a repelling unit charge. □

If one runs Newton's method on $d$ approximations separately, one can give it a similar electrostatic interpretation in the sense that the $d$ approximations have infinitesimally small test charges so that they see the roots, but do not interact with each other. This can be seen as a conceptual advantage of Ehrlich–Aberth over Newton: the $d$ approximations are synchronized with each other. However, currently there does not seem to be any known theory that exploits this fact towards an understanding of the global dynamics of the Ehrlich–Aberth method. In particular, it is not known whether it is globally convergent; see Section 3.1.

One key difference between the Newton and Ehrlich–Aberth methods will be relevant below: in order to evaluate Newton's method, it suffices to have a way of evaluating $p'/p$. In certain cases, for instance when $p$ is given by iteration or other efficient types of recursion, this may have much better complexity than $O(d)$: in the cases explored here, this complexity was $O(\log d)$. Linear recursions for $p$ do not offer improvements here, but sparse polynomials might (compare for instance [BF1]).

Even though one can use the same efficient evaluation of $p'/p$ in Ehrlich–Aberth, this does not improve the overall complexity of the method. This explains why specifically in such cases, the performance of Newton is much better.

We would like to mention that for both methods approximate improvements are indeed possible that might lead to substantially faster evaluations at least in appropriate cases: these are discussed in Section 6.5.
2.3. Machine precision and stopping criteria.
Like for all numerical experiments, it is important to keep in mind the capabilities and limitations of the computing system. Our experiments were all performed with floating point numbers with double precision. In practice, we made the "idealistic" assumption that all our calculations were exact. They are not, of course, but we have reasons to believe that this is not a relevant issue in practice for our polynomials, even for large degrees: both numerical methods are known to be intrinsically stable with self-correcting errors, and one can often verify a posteriori that indeed all roots have been found (see for instance [SSt1, Section 2], where an explicit worst-case estimate on error bounds for Newton's method is computed, and [RaSS] for such an a-posteriori verification for degrees up to $2^{30}$). In fact, in many cases (including all our polynomials) the mutual distance between roots is much larger than the available precision of computation, so all roots have numerically distinguishable domains of quadratic convergence. In each case it remains an a-posteriori verification that the conceivable problem of insufficient machine precision does not lead to missed roots.

A final remark concerns the stopping criterion: a frequent condition that a root has been found is when the value of the polynomial is smaller than a given threshold. A different criterion is that a particular approximation $z_k$ is $\delta$-close to an actual root $\alpha_i$, with a precise bound on $\delta$. This can be upgraded to the requirement that all roots of $p$ have been found if there exists a vector $(z_1, \dots, z_d) \in \mathbb{C}^d$ with a guarantee that, up to permutation, $|z_i - \alpha_i| < \delta$ for all $i \in \{1, \dots, d\}$. That is the stopping criterion that we used for our Ehrlich–Aberth implementation.

The question of whether some root has higher multiplicity is irrelevant (and numerically usually not a valid question): what matters is that several $z_i$ are close to each other, whether or not the approximated roots are multiple or only nearby. As we mentioned before, all the polynomials considered in our experiments have well-separated roots.

The actual precision of the computed roots is not of great relevance because of quadratic convergence, provided the roots are simple: once the roots have been found in the sense that they are separated, then any desired precision $\varepsilon$ can be achieved in time of about $O(d \log|\log \varepsilon|)$ (within the limits of machine precision). For both methods, by far most of the computation time is spent on separating the roots. However, if one wants to estimate the error from the perspective of numerical analysis, the paper [Pr] gives some useful a priori and a posteriori bounds for Newton's method.
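For illustration, here is a small Python sketch of such an a-posteriori check; the greedy matching below is our own simplification and assumes that the true roots are known (as in our experiments where polynomials were constructed from their roots) and separated by much more than $\delta$.

```python
def all_roots_found(approx, roots, delta):
    """Check that, up to permutation, |z_i - alpha_i| < delta for all i.

    Greedy matching is adequate when the mutual distances between the
    roots are much larger than delta.
    """
    remaining = list(approx)
    for alpha in roots:
        j = min(range(len(remaining)), key=lambda i: abs(remaining[i] - alpha))
        if abs(remaining[j] - alpha) >= delta:
            return False
        remaining.pop(j)
    return True
```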
3. The Ehrlich–Aberth method

3.1. Properties of the method; general convergence.
The key feature of this method is the recursive step, given in (2), that takes one vector of $d$ complex numbers, viewed as initial guesses, and computes a new vector of supposedly improved approximations of the $d$ roots of a given degree $d$ polynomial $p$. In order to upgrade this iteration step to an algorithm, one needs to specify the vector of initial guesses (starting points), as well as a stopping criterion. Moreover, at least in principle there is the possibility that the iteration fails to find the roots of the polynomials, and this needs to be detected.

While the local convergence of the Ehrlich–Aberth method in a neighborhood of the roots is understood (Lemma 2), we are not aware of any theory about global convergence properties. The method cannot converge in all cases. An obvious case where it fails is when $p$ is a real polynomial with non-real roots, and the vector of starting points is entirely real. Then by symmetry all subsequent iterates will be entirely real, so they cannot converge to the roots. There are similar symmetric situations that prevent convergence to roots.

More conceptually, convergence must fail on larger sets of starting points: the set of initial vectors $(z_1, \dots, z_d) \in \mathbb{C}^d$ from which the iteration converges to a vector of roots in any particular order is open, and it is non-empty because it includes a neighborhood of the root vector. Since this is so for every ordered vector of the $d$ roots, it follows that the subset of $\mathbb{C}^d$ from which convergence occurs is a finite union of disjoint open sets, more than one, and this cannot be all of $\mathbb{C}^d$: therefore convergence must fail on all the boundaries, which must be large enough to separate open sets in $\mathbb{C}^d$, so it must fail on a set of topological dimension at least $2d - 1$ (the different orderings of the roots yield disjoint open subsets of $\mathbb{C}^d$ that are permuted under relabeling of the coordinates).

The way Newton's method fails to be generally convergent (see Section 4) is because some of its periodic points may become attracting, so they attract a whole open set of starting points, and this happens for all non-trivial cases: even for degree 3 there are polynomials that have attracting cycles of any period. By analogy, one may suspect that the Ehrlich–Aberth method also has periodic cycles that are attracting. In fact, this has been the intuition of several people in dynamical systems; however, in numerous experiments, in particular using the MPSolve implementation, no attracting cycles have ever been observed.

At this point a digression to the Weierstrass method may be interesting, even though it is not the focus of the current paper. Like the Ehrlich–Aberth method, it is an iteration in $\mathbb{C}^d$ (undefined at certain points where two or more coordinates coincide). Both methods are known as reliable root finders that seem to find roots generically and efficiently, but both are lacking global theory.

For the Weierstrass method, there are recent results in [ReSS]. Most importantly, it has been discovered that the Weierstrass method is not generally convergent: there is an open subset of the space of polynomials of degree 3 or higher that has an open set of starting vectors that fails to converge to any roots. More precisely, there are explicit cubic polynomials such as $z \mapsto z^3 + z + 180$ for which the Weierstrass method has periodic points of period 4 that are attracting. This result seems to go against the expectation of people working in numerical analysis: but it is what had been suspected in analogy to Newton's method by people in dynamical systems, in particular by Smale.

The result in [ReSS] comes out of a systematic investigation of periodic points of small degrees and periods. It is shown that for cubic polynomials, period 4 is minimal: all periodic points of period 2 or 3 are non-attracting because explicit equations are found for the sums of the eigenvalues at periodic points, and these equations are not compatible with the condition that all eigenvalues are in the unit disk $\mathbb{D}$ as required for attracting periodic points. Another unexpected property that was discovered is that there are starting points for which the Weierstrass iteration is defined at all times but for which the dynamics tends to $\infty$; this is so for (almost) every polynomial of degree 3 or higher.

This situation is less clear for the Ehrlich–Aberth method. This method seems similar to the Weierstrass method, but it is more complicated: the equations themselves are algebraically more complicated, and Weierstrass has the advantage that it has an invariant subspace of codimension 1 where the interesting dynamics takes place. Therefore, the analysis in [ReSS] (which even for Weierstrass very quickly develops high complexity that limits the cases that can be treated) can cover only basic cases for Ehrlich–Aberth, and it is not clear whether attracting cycles might exist, or whether the substantial numerical evidence points to general convergence.
3.2. Starting points; order of update of the approximation vector.
As mentioned earlier, there is no theory yet about where to start the Ehrlich–Aberth iteration from. For our experiments, we felt it was natural to use those points that provide good starting points for Newton's method: hence, we used a vector of $d$ points equidistributed on a large circle that surrounds all the roots. In contrast, Bini [Bi] discusses the use of Rouché's theorem to find several circles that enclose different clusters of roots.

The update from the "old" vector $(z_1, \dots, z_d) \in \mathbb{C}^d$ of possible roots to the "new" vector $(z_1', \dots, z_d')$ can be done in (at least) two ways: for the computation of the new coordinates $z_k'$ one can either use only the old coordinates $(z_1, \dots, z_d)$ throughout ("Jacobi-style"), or, for each $z_k$, make use of the already computed new coordinates $z_1', \dots, z_{k-1}'$ ("Gauss–Seidel-style"). The former approach is a natural iteration in $\mathbb{C}^d$, while the latter is not, but it is considered more efficient; a sketch of both styles is given below.
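The following Python sketch shows one Ehrlich–Aberth sweep in both update styles; it is a minimal illustration of formula (2) with our own naming, not the implementation used for the experiments.

```python
def aberth_sweep(z, p, dp, gauss_seidel=True):
    """One Ehrlich-Aberth update of the approximation vector z.

    z: list of complex approximations; p, dp: callables for p and p'.
    With gauss_seidel=True, freshly computed coordinates are used
    immediately (as in MPSolve); otherwise the sweep is Jacobi-style.
    """
    d = len(z)
    cur = list(z)          # mixes old and new entries in Gauss-Seidel style
    new = list(z)
    for k in range(d):
        src = cur if gauss_seidel else z
        newton = p(src[k]) / dp(src[k])                    # p/p' at z_k
        repel = sum(1.0 / (src[k] - src[i])                # sum over i != k
                    for i in range(d) if i != k)
        new[k] = src[k] - newton / (1.0 - newton * repel)  # formula (2)
        if gauss_seidel:
            cur[k] = new[k]
    return new
```

Such a sweep would be repeated until a stopping criterion as in Section 2.3 is met.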
3.3. The MPSolve implementation. There is a prominent software package that finds roots of univariate polynomials in practice, authored by Dario Bini, Giuseppe Fiorentino and Leonardo Robol, and called MPSolve. This software package has an impressive track record on root finding. Experiments were made for polynomials of degrees exceeding 25 000, some of them with very large coefficients; see [Bi, Ro]. For all these experiments, usually no more than 20 Ehrlich–Aberth iteration cycles sufficed to find all roots (due to superlinear convergence, the required precision for root finding is not a decisive issue once all roots were found with sufficient precision so as to separate them).

In this implementation, the update of the new approximate root vectors is done in Gauss–Seidel style: every new coordinate $z_k'$ is used immediately in all subsequent computations.

4. Newton's method and iterated refinement: virtues and problems
Newton's method in its original form is a heuristic, not an algorithm: it gives a rule to modify an initial guess, and it comes with a promise that initial guesses sufficiently close to a root will converge to this root. However, many starting points will not converge to a root, and if they do, it is not clear where to start in order to find all roots, not just a subset.

For a polynomial $p$, the Newton method $N_p$ is a rational map that very naturally forms a dynamical system on the Riemann sphere $\hat{\mathbb{C}}$. The Riemann sphere decomposes into the Julia set $J$ and the Fatou set $F$, both of which are invariant under the dynamics. The Fatou set is open and dense and contains all the roots. The Julia set is compact and nowhere dense, but it is a non-empty, even uncountable compact set that contains (finitely many) periodic points of every period $m \ge 2$. Any starting points in the Julia set will remain in the Julia set forever, so these points fail to converge to any root. It is known that the Julia set may have positive 2-dimensional Lebesgue measure, so a random starting point may be in the Julia set.

Worse yet, the Fatou set may contain open sets of points that converge to attracting cycles of any period $m \ge 2$. The classical example is $p(z) = z^3 - 2z + 2$, where the Newton map $N_p(z) = z - (z^3 - 2z + 2)/(3z^2 - 2)$ has a superattracting 2-cycle $0 \mapsto 1 \mapsto 0$; see Figure 1, but it may happen for any degree and any period; a complete classification may be found in [LMS].
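To make this concrete, a few lines of Python (our own illustration) confirm numerically that orbits near this cycle are trapped by it:

```python
def N(z):
    """Newton map of p(z) = z^3 - 2z + 2."""
    return z - (z**3 - 2*z + 2) / (3*z**2 - 2)

z = 0.01 + 0.01j        # a starting point near the superattracting cycle
for _ in range(20):
    z = N(z)
print(z, N(z))          # settles onto the 2-cycle 0 -> 1 -> 0, never a root
```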
Figure 1. The Newton map for $p(z) = z^3 - 2z + 2$ has a superattracting 2-cycle; the domain in black converges to this cycle.

Therefore, individual orbits may fail to converge to a root (even though one may argue that most orbits will converge to a root; see [HSS, Section 4]). If nothing is known about the positions of the roots (other than a normalization such as a circle enclosing all the roots), then one needs more than $d$ starting points in order to have a chance that all $d$ roots will be found. A sufficient such set of starting points has been constructed explicitly in [HSS].
These starting points have to be placed on a circle, or on a small number of circles, that surround all roots and are at a certain distance from the roots, at places where a good control of Newton's method can be assured. The disadvantage is that from there it takes a lot of iterations until the orbits reach the region where the roots are; typically, more than $d$ orbits each have to undergo more than $d$ iterations before interesting dynamics happens: so $O(d^2)$ iterations are "wasted" to move from the domain with control about Newton's method to the domain where the roots are. This prohibits any complexity of less than $O(d^2)$ iterations.

In response to this problem, the "iterated refinement" Newton method has been developed in [SSt1, RaSS]: the idea is that initially, where the orbits are still close to the initial circles, there is a lot of control on the Newton dynamics, so all orbits do just about the same thing. The iterated refinement approach thus starts with a small initial number of starting points on the original circles, perhaps 64 points, and iterates them while their orbits are "parallel" in a sense to be made precise (for instance, in the sense of cross ratios between three adjacent orbits, together with $\infty$). One may also think of such orbits as "boring": they do not exhibit any unexpected behavior. Once two or three orbits fail to remain parallel, these orbits are refined by creating additional orbits between them. We think of this occurrence as an indication that the dynamics becomes more intricate and needs more orbits to be studied. This way, for certain families of polynomials of degrees up to $2^{30} > 10^9$, all roots were found with log-linear complexity and log-linear computing time [RaSS]. For an illustration of the iterated refinement in action, see Figure 2; a simplified sketch of the refinement loop is given below.

The problem is that this is experimental science, and the original guarantee that all roots will be found by $d$ or more starting points is given up in favor of speed of computation. Among the heuristic parameters are the exact way to measure deviation of parallel orbits, as well as a quantitative refinement sensitivity with respect to this measure. Too insensitive refinement will lead to missed roots, while too high sensitivity will lead to slow computation. It is not clear how to fill in roots that are missed in the end. Deflation is usually not an option: it is in general very unstable, and it may destroy the simple structure of the polynomial (for instance, if the polynomial is given by iteration, then after deflation this structure is lost, and the evaluation becomes highly inefficient).

There is another possible optimization for Newton's method: for those cases where polynomials are given in terms of coefficients, one has numerous parallel Newton orbits, hence has to evaluate the same polynomial $p$ at many different points. We did these evaluations at all these points independently, not taking advantage of possible efficient evaluation of polynomials at many points in parallel.

Figure 2. Illustration of the "iterated refinement" idea: we start with a small number of points and refine when adjacent orbits no longer move in parallel. Note that most of the refinement happens once the orbits are close to the roots, which drastically reduces the necessary computations. Figure taken from [RaSS].
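The refinement loop can be sketched in heavily simplified Python form as follows; this is our own illustration, with a crude separation test standing in for the cross-ratio criterion of [SSt1, RaSS], and it omits the stopping and root-collection logic.

```python
import cmath

def iterated_refinement(N, radius, n0=64, steps=100, spread=2.0):
    """Iterate Newton orbits from n0 points on a circle of the given
    radius, inserting a new orbit between adjacent orbits whose images
    separate noticeably faster than the orbits themselves."""
    pts = [radius * cmath.exp(2j * cmath.pi * k / n0) for k in range(n0)]
    for _ in range(steps):
        nxt = [N(z) for z in pts]
        refined = []
        for k in range(len(pts)):
            refined.append(nxt[k])
            k1 = (k + 1) % len(pts)
            # crude "no longer parallel" test between adjacent orbits
            if abs(nxt[k] - nxt[k1]) > spread * abs(pts[k] - pts[k1]):
                refined.append(N(0.5 * (pts[k] + pts[k1])))
        pts = refined
    return pts
```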
5. Experimental results of both methods in comparison

5.1. Overview of polynomial families considered.
The numerical experiments were performed on various classes of polynomials of different degrees, defined in as different ways as possible, and so that the geometry of the root distribution was as different as possible, in order to explore different behavior of the two root finders. We explored the following families of polynomials:

(1) iterated quadratic polynomials (or periodic points of quadratic polynomials), Section 5.4.1;
(2) polynomials that describe the centers of hyperbolic components of the Mandelbrot set, Section 5.4.2;
(3) Chebyshev polynomials, Section 5.4.3;
(4) Legendre polynomials, Section 5.4.4;
(5) polynomials with random roots on a circle, Section 5.4.5;
(6) polynomials with random roots on a disk, Section 5.4.6;
(7) polynomials with roots on a finite square grid, Section 5.4.7.

The motivation for these various cases was quite different: iterated polynomials were originally chosen because they can be evaluated easily, and our focus was on root finding, not polynomial evaluation. Chebyshev and Legendre polynomials were selected because the distribution of their roots is well known (along intervals). For polynomials where the coefficients are equidistributed [ET, Ar], the roots tend to accumulate at the unit circle with even distribution; this justifies some of the other choices.
5.2. Evaluation of polynomials.
A polynomial of degree $d$ is a holomorphic mapping from $\mathbb{C}$ to $\mathbb{C}$ of degree $d$; we would like to stress that one should not assume by default that it is given in terms of coefficients: indeed, all we require is that $p$ and $p'$ can be evaluated in some way (or possibly only $p'/p$). Indeed, much of the difficulty in finding roots of polynomials lies in evaluating the polynomials, and this problem is especially pronounced for high degree polynomials that are given in terms of coefficients. Here is one of our favorite prominent examples: periodic points of period $n$ for $p(z) = z^2 + 2$ are roots of the polynomial $P_n(z) = p^{\circ n}(z) - z$, where $p^{\circ n}$ denotes the $n$-th iterate of $p$. The constant coefficient of $P_n$ has size greater than $2^{2^{n-1}}$, which is utterly impossible to handle even for small $n$, while $P_n$ and its derivatives can easily be evaluated for $n$ as large as 25 or more [RaSS], and all these $2^n$ roots have been found successfully and easily.

In our experiments, some polynomials are given by recursive formulas that allow for evaluation in logarithmic time with respect to the degree. Others are evaluated in terms of coefficients, and yet others are, for the purposes of these experiments, evaluated in terms of the roots that are known ahead of time: in some experiments we wanted to use a prescribed distribution of the locations of the roots, so these roots were computed first and the polynomials were evaluated in terms of these. This may be against the spirit of "finding" the roots, but is quite interesting for checking properties of root finding algorithms. Our main focus is not on polynomial evaluation, but we do want to point out that the evaluation is a serious and interesting issue, and often a serious bottleneck in the performance of root finders.
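As an illustration of such a recursive evaluation, the following Python sketch (our own) computes $P_n(z) = p^{\circ n}(z) - z$ and its derivative for $p(z) = z^2 + c$ with $O(n)$ arithmetic operations, i.e. logarithmically in the degree $2^n$, using the chain rule at every step.

```python
def P_and_dP(z, n, c=2.0):
    """Evaluate P_n(z) = p^n(z) - z and P_n'(z) for p(z) = z^2 + c
    in O(n) operations, via the chain rule
    (p^(k+1))'(z) = 2 * p^k(z) * (p^k)'(z)."""
    w, dw = z, 1.0          # p^0(z) = z and its derivative
    for _ in range(n):
        dw = 2.0 * w * dw   # chain rule (uses the old w)
        w = w * w + c
    return w - z, dw - 1.0  # subtract the identity and its derivative

# A Newton step for P_n at z then only needs these two values:
val, der = P_and_dP(0.3 + 0.4j, 10)
step = val / der
```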
5.3. Remarks on implementation and performance. Our focus was on flexibility of experimentation, not real-time optimization. We thus did most of our implementations in Java.
We measured complexity in terms of the number of arithmetic operations, and that is independent of the environment of implementation. More precisely, our implementation contains a counter which is updated each time an operation of real addition or multiplication is performed.

In order to work with high-degree polynomials we had to solve potential overflow problems. In particular, when we compute any fraction where the numerator or denominator can be unexpectedly large, we manually maintain the scientific representation of numbers by remembering the mantissa and the order of magnitude.

In cases when a polynomial is given by its roots, we computed the value of $p(z)/p'(z)$ with the formula $\left( \sum_{i=1}^d \frac{1}{z - \xi_i} \right)^{-1}$ (see the sketch below).

When performing the Ehrlich–Aberth iteration we update the components of the approximation vector in Gauss–Seidel style. However, all components are updated until the iteration stops, even if some components already approximate a root well enough. We admit that our implementation of the Ehrlich–Aberth iteration was quite down-to-earth. We believe, however, that the overall structure of the results is unaffected by such possible optimizations. A relevant method of improvement in the special case of polynomials with "fast" evaluation was recently pointed out to us by Dario Bini; this is discussed in Section 6.5.
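In Python, this logarithmic-derivative trick is a one-liner (illustrative; the name is ours); it avoids coefficients entirely and costs $O(d)$ operations per point:

```python
def p_over_dp(z, roots):
    """p(z)/p'(z) for the monic polynomial with the given roots,
    computed as the reciprocal of p'(z)/p(z) = sum_i 1/(z - xi_i)."""
    return 1.0 / sum(1.0 / (z - xi) for xi in roots)
```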
5.4. Experimental results. We describe the results of our experiments on the performance comparison. The graphs picture the dependence of the number of operations (real additions and multiplications) needed to find all the roots in terms of the degree of the polynomials. As mentioned earlier, various polynomials were evaluated in linear time with respect to the degree ("slow evaluation") and others in logarithmic time ("fast evaluation"), depending on their definition.
5.4.1. Iterated quadratic polynomials.
A class of polynomials that is particularly easy to evaluate are those that describe periodic points of period $n$ for quadratic polynomials $p_c(z) = z^2 + c$, with $c \in \mathbb{C}$: these polynomials have the form $P_{n,c}(z) = p_c^{\circ n}(z) - z$, where $p_c^{\circ n}$ denotes the $n$-th iterate of $p_c$. These polynomials have degree $2^n$ but can be evaluated in complexity $O(n)$, hence in logarithmic complexity with respect to the degree. For certain choices of $c$, all periodic points of periods up to 30 (i.e., degrees up to $2^{30} > 10^9$) were computed successfully by Newton's method in [SSt1, RaSS].
Figure 3. Complexity (in terms of arithmetic operations) for finding periodic points of quadratic polynomials; log-log scale. Top: the polynomial $z^2 + 1$; bottom: the polynomial $z^2 + i$. These graphs combine results for evaluating the polynomials in coefficient form ("slow evaluation") and evaluating using iteration ("fast evaluation"). Notice that the Ehrlich–Aberth iteration does not accelerate significantly by taking advantage of fast evaluation.

In order to gauge the dependence of the complexity on the possibility to evaluate the polynomials efficiently, these experiments were run twice: once by evaluating $P_{n,c}$ in logarithmic time ("fast") and once in linear time ("slow"). The results are shown in Figure 3. It turns out that Newton's method is much faster, especially for high degrees, when taking advantage of the fast polynomial iteration, but not necessarily otherwise. Moreover, the graphs show clearly that most of the computing time for Ehrlich–Aberth is not spent on the evaluation of the polynomials (but compare the discussion in Section 6.5).

It is a well known and simple fact from complex dynamics that periodic points of $p_c$ are distributed along the Julia set (even equidistributed with respect to harmonic measure). For the later discussion it may be worth mentioning that Julia sets are never convex, except when $c = 0$ (the Julia set is a circle) or $c = -2$ (the Julia set is an interval). Our experiments used $z^2 + 1$ (the Julia set is a Cantor set) and $z^2 + i$ (the Julia set is a dendrite).

5.4.2. Centers of hyperbolic components of the Mandelbrot set.
These polynomials in the variable $c$ are defined by the iteration $q_0 = 0$ and $q_{n+1} = q_n^2 + c$, i.e. $q_1 = c$, $q_2 = c^2 + c$, $q_3 = (c^2 + c)^2 + c$, etc., so $q_n$ has degree $2^{n-1}$. Equivalently, for the iteration $p_c \colon z \mapsto z^2 + c$, we have $q_n(c) = p_c^{\circ n}(z)$ with $z = 0$: the roots are exactly the parameters $c$ for which the critical point $z = 0$ of $p_c$ is periodic with period $n$. Such parameters are well known to be the centers of hyperbolic components of period $n$ of the Mandelbrot set (see for instance [DH, Sch2]). These polynomials can be evaluated by recursion as efficiently as the iterated quadratic polynomials from Section 5.4.1. For large $n$, these points accumulate at the boundary of the Mandelbrot set, so they form a very non-convex set. The results of the experiments are displayed in Figure 4.
Figure 4. Complexity for finding centers of hyperbolic components of the Mandelbrot set; log-log scale.
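A sketch of this recursion in Python (our own illustration): one pass yields both $q_n(c)$ and its derivative, since $q_{n+1} = q_n^2 + c$ implies $q_{n+1}' = 2 q_n q_n' + 1$.

```python
def q_and_dq(c, n):
    """Evaluate q_n(c) and dq_n/dc in O(n) operations, where q_0 = 0
    and q_{n+1} = q_n^2 + c; the roots of q_n are the parameters c for
    which the critical point 0 is periodic with period dividing n."""
    q, dq = 0.0, 0.0
    for _ in range(n):
        dq = 2.0 * q * dq + 1.0   # derivative recursion (uses the old q)
        q = q * q + c
    return q, dq
```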
It may be interesting to note that these polynomials have been used as sample polynomials for evaluating the efficiency of the Ehrlich–Aberth method in its implementation in MPSolve (see [BF2]), and independently also for our experiments on Newton's method (see [SSt1, RaSS]). One has to keep in mind that, formally speaking, [BF2] works with the polynomials $q_n(c)/c$ of degree $2^{n-1} - 1$.

5.4.3. Chebyshev polynomials.
These polynomials have the special property that they can be evaluated in logarithmic time with respect to the degree, based on the recursion formula $T_{2d}(x) = 2 T_d(x)^2 - 1$. This evaluation works, of course, only for degrees $d = 2^n$, but we have to mention that Chebyshev polynomials can be defined for arbitrary degree. Two sets of experiments were done for these polynomials as well, with evaluation in logarithmic and in linear time ("fast" and "slow").

For these polynomials, Newton turned out to be significantly faster than Ehrlich–Aberth in both cases — obviously more so for the recursive evaluation.
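A Python sketch (our own) of this doubling evaluation for degree $d = 2^n$, together with the derivative via $T_{2d}' = 4\,T_d\,T_d'$:

```python
def chebyshev_T_and_dT(x, n):
    """Evaluate T_{2^n}(x) and its derivative in O(n) operations,
    starting from T_1(x) = x and doubling with T_{2d} = 2*T_d^2 - 1."""
    t, dt = x, 1.0
    for _ in range(n):
        dt = 4.0 * t * dt        # T_{2d}' = 4 * T_d * T_d' (uses the old t)
        t = 2.0 * t * t - 1.0
    return t, dt
```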
5.4.4. Legendre polynomials. Here the evaluation of the polynomials was always in linear time. The observed complexity seems comparable to the "slow" experiment for Chebyshev polynomials (Figure 5); this is consistent with the observation that the speed of evaluation is comparable between both polynomial families, and so is the location of the roots (on the unit interval).
5.4.5. Polynomials with random roots on a circle.
Here the structure of the experiment was that initially a given number of roots was distributed randomly on a circle, and then the polynomials were defined in terms of these roots (evaluation in linear complexity). The outcome is that Newton's method seems to be faster for large degrees, but this effect is not very pronounced.
5.4.6. Polynomials with random roots in a disk.
For these polynomials, the roots were placed randomly in the unit disk, and the polynomials were again defined in terms of these roots; as a result, their evaluation is linear in the degree ("slow"). In this case, the Ehrlich–Aberth method is consistently faster.
Figure 5. Complexity for finding roots of Chebyshev polynomials; log-log scale. Top: evaluation of polynomials in coefficient form ("slow evaluation"); bottom: evaluation using iteration.

We should mention that in our implementation, the roots were distributed independently over the unit disk, but not with respect to planar Lebesgue measure: instead, we used the linear product measure in polar coordinates, i.e. for points $z = re^{i\varphi}$, the distribution was uniform (Lebesgue measure) for $(r, \varphi) \in [0,1] \times [0, 2\pi]$.
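In Python, this sampling reads as follows (our illustration); sampling uniformly with respect to area would instead take $r = \sqrt{u}$ for uniform $u$.

```python
import random, cmath

def random_root_in_disk():
    """Sample z = r*exp(i*phi) with (r, phi) uniform on [0,1] x [0,2*pi].
    This is NOT uniform with respect to area: small radii are
    over-represented compared to planar Lebesgue measure."""
    r = random.random()
    phi = random.uniform(0.0, 2.0 * cmath.pi)
    return r * cmath.exp(1j * phi)
```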
5.4.7. Polynomials with roots on a square grid. In this set of experiments, the roots were placed on a grid $\{1, 2, \dots, n\} + i\{1, 2, \dots, n\}$ of $n^2$ points in $\mathbb{C}$, for various values of $n$, and once again the polynomials were defined in terms of their roots, with evaluation in linear time. Consistently, the Ehrlich–Aberth method was faster than Newton's method, but not as significantly as for random roots on the disk.
Figure 6. Complexity for finding roots of Legendre polynomials; log-log scale. Evaluated "slowly".

Figure 7. Complexity for finding roots distributed randomly on the unit circle; log-log scale.

6. Conclusion
6.1. Analysis of the performance.
The outcome of our experiments is that, among our two implementations, neither is universally better: in some cases, one of the two methods outperforms the other, and in other cases the result is the opposite. What follows is a heuristic explanation for the observed behavior.
Figure 8. Complexity for finding roots distributed randomly in the unit disk; log-log scale.
Figure 9. Complexity for finding roots on a square grid; log-log scale.
(1) Evaluation of polynomials. Some of the experiments were motivated by the choice of polynomials that can be evaluated in logarithmic time with respect to the degree. This leads to efficient evaluation in Newton's method. Newton's method generally outperforms Ehrlich–Aberth for those polynomials: this includes iterated quadratic polynomials (Section 5.4.1) and Chebyshev polynomials (Section 5.4.3), as well as the centers of hyperbolic components that are among the "demo polynomials" for the MPSolve implementation.
This advantage plays an especially significant role when the degrees of the polynomials are large, and for these it has the decisive effect on the overall efficiency outcome.
(2) Location of roots.
There are several additional cases where Newton outperforms Ehrlich–Aberth, even though the evaluation of the polynomials was only linear with respect to the degree: among our experiments these were Chebyshev polynomials (Section 5.4.3), Legendre polynomials (Section 5.4.4), and those polynomials where the roots are randomly distributed on the unit circle (Section 5.4.5). These were exactly those experiments where the roots were "exposed" from infinity, that is, all roots were on the boundary of the convex hull formed by the roots.

On the other hand, Ehrlich–Aberth was faster, sometimes very much so, when roots were "hidden" in the interior of the convex hull: this includes the case of iterated quadratics with linear ("slow") evaluation (Section 5.4.1), where the roots are distributed along Julia sets of polynomials that are almost never convex, as well as the cases of random roots on the disk (Section 5.4.6) and when the roots were distributed on a grid (Section 5.4.7).
6.2. Complexity.
We are not aware of any complete analysis of the complexity of either method, but there are some partial results that shed some light on the overall efficiency.

For the Ehrlich–Aberth method, the complexity of every iteration step seems to be of the order $O(d^2)$, irrespective of how efficiently $p'/p$ can be evaluated. In the various experiments performed by Bini and Robol, no more than 20 iteration steps needed to be executed overall in order to find all roots (where the required precision for finding the roots is largely irrelevant, at least in the case of simple roots, because of the superlinear convergence once all roots are separated). This suggests an overall complexity of $O(d^2)$ for finding all roots. It is not certain that this method is generally convergent, even though failure to converge has never been observed experimentally except in cases with clear symmetry. If the method were not generally convergent, then this might have a relevant impact on the complexity.

In the case of Newton's method, the results are less clear, especially since the method is not generally convergent, and orbits that fail to converge can frequently be observed. One needs at least $d$ orbits to iterate, and can be certain to find all roots when starting at $O(d \log^2 d)$ initial points [HSS]. Each of these orbits starts away from the disk containing the roots, and in order to have good control, the starting points need to be iterated at least $d$ times each in order to land in the disk containing all roots; this gives a lower bound of $O(d^2 \log^2 d)$ Newton iterations. It is shown in [BAS] that one can expect, under reasonable assumptions of equidistribution of the roots, that the same complexity should also suffice to find all roots. Of course, this complexity needs to be multiplied with the complexity to evaluate each Newton step, which may be $O(d)$ or $O(\log d)$ depending on how the polynomial is specified. However, parallel evaluation of all the Newton orbits can yield an overall logarithmic complexity for the Newton evaluation, so that we expect again arithmetic complexity of $O(d^2 \log^k d)$ for some $k$ (based on a machine model that has infinite precision in each operation); however, there may be issues with stability.

In [RaSS], the iterated refinement Newton method was introduced; this was also the procedure implemented here, as described earlier. The number of orbits that are iterated in parallel goes down from $d \log^2 d$ to a much smaller number. The exact complexity depends on the sensitivity of refinement and is difficult to estimate theoretically (too sensitive refinement leads to $d$ orbits that are iterated in parallel, hence no gain in complexity; too insensitive refinement leads to missed roots that are hard to recover).

6.3. A new conjecture on starting points for Newton's method.
Neither of the two root finding methods discussed here has a complete theory about its global dynamics. Very little is known about the global properties of the Ehrlich–Aberth method (the recent results in [ReSS] show that the related Weierstrass method is not generally convergent, but it is not clear what the conclusion for Ehrlich–Aberth should be). More is known about Newton's method, beyond the fact that it is not generally convergent and may have attracting cycles of any period. The first quantitative result was published in [HSS], as mentioned earlier; it is usually presented for polynomials with all roots in the unit disk $\mathbb{D}$ (this is not a restriction after an appropriate coordinate change).

Theorem 4 (Starting points for Newton's method, [HSS]). For every degree $d$, there are $c_1 \log d$ circles with $c_2 d \log d$ points on each so that for every polynomial $p$ of degree $d$ with all roots in $\mathbb{D}$, and for every root $\alpha$ of $p$, at least one of these $c_1 c_2 d \log^2 d$ points converges to $\alpha$ under iteration of the Newton method.

These circles and the points on them, as well as the constants $c_1$ and $c_2$, are explicitly given. When allowing the circles to be far away from $\mathbb{D}$ (thus allowing many iterations until the points on the circles reach the disk containing the roots), one may get the product $c_1 c_2$ rather small.
This result was improved in [BLS], using a probabilistic set of starting points, to a total number of starting points of size $c\,d (\log \log d)^2$ that finds all roots with arbitrary desired probability (where of course the constant $c$ depends on this probability).

In the experiments described in this paper, we discovered that Newton's method finds roots more easily when they are on the boundary of the convex hull of all the roots. We believe that the complexity to find all roots with this property is much better, and suggest the following.
Conjecture 1. For every radius $r > 1$ there is a universal constant $c > 0$ with the following property. If $p$ is a polynomial of degree $d$, rescaled so that all its roots are in the unit disk, then $\lceil cd \rceil$ equidistributed points on the circle $\partial D_r(0)$ have the property that for each root on the boundary of the convex hull of all roots, at least one of the given points will converge to this root under iteration of Newton's method.

In other words, there is a universal set of $O(d)$ points that finds all the roots of $p$ that are on the boundary of the convex hull. We believe that $c$ is an explicit small constant depending on $r$.

This conjecture goes back to the Bachelor thesis [Sh]. In order to express the underlying idea, we need to review some background from [HSS]. For every root $\alpha$, the immediate basin $U_\alpha$ of the root is the connected component containing $\alpha$ of the set of points that converge to $\alpha$ under the Newton iteration. A channel is a homotopy class of curves in $U_\alpha$ with endpoints fixed that connect the root $\alpha$ to $\infty$. Every root has some number $k_\alpha$ of channels with $1 \le k_\alpha \le d - 1$. Each of these channels has a certain "thickness", measured in terms of a conformal invariant. The thicker a channel, the fewer starting points are needed in order to be sure that one of them is in the immediate basin. The more channels a root has, the thinner they may be, and the more starting points are required.

The idea behind the conjecture is that every root on the boundary of the convex hull has a "main channel" with the property that the root can be connected to $\infty$ along this main channel so that the convex hull of the roots is avoided. While this is a theorem, the conjecture is based on the assumption that a main channel has at least the same thickness as the channels of a root with only two channels. The conjecture would then follow as in [HSS] in the case when all roots are real.

Of course, with the conjectured improvement of the number of starting points, the complexity of finding all the roots on the boundary of the convex hull, as described in [Sch1, BAS], would also improve.

6.4. Postprocessing; mixing both approaches.
One notorious issue for the iterated refinement Newton method is that it is optimized beyond the regime for which convergence can be guaranteed by theoretical bounds; it depends on some experimental parameters such as the refinement threshold. In case of too optimistic refinement, hence a too insensitive refinement threshold, a certain number of roots may be missed in the process. One possible conclusion is to run a "post-processing step" of Ehrlich–Aberth on the approximate roots found by Newton. If most roots were found by Newton initially, say only $d'$ of the $d$ roots are still missing, then each Ehrlich–Aberth post-processing step requires $O(d'd)$ operations, so this implies a nontrivial lower bound on the overall complexity. However, since the total number of Ehrlich–Aberth steps is usually quite small (as mentioned in Section 3.3 and also confirmed by our experiments), this may become a realistic option only when $d' = o(d)$.

This post-processing was implemented in various of our experiments. The general observation was that, compared to necessary post-processing by Ehrlich–Aberth, the performance of Newton's method was generally better when more sensitive refinement parameters were chosen that avoided any post-processing steps, even though this led to earlier refinement overall.

Another rather radical approach is to run both methods in parallel, sharing execution time equally between both processes, and stopping once one of the two methods terminates successfully. This seems wasteful, as roughly half the operations are eventually not used at all. However, the overall complexity is then the minimum of both methods, and if the complexities of both of them differ significantly in ways that cannot be predicted efficiently ahead of time, this may lead to overall improved complexity. (After all, for each of the two methods our focus was on complexity, not on improvement by bounded factors, so factors of two were often ignored.)

6.5. Possible improvements.
We take it as an encouraging sign that since the time of our original experiments, there have been inspiring discussions and suggestions for improvements on both methods.

Newton's method has its most obvious bottleneck in the case of polynomials with "slow" evaluation, i.e. when the computation of $p'/p$ has complexity $O(d)$. Since this is a parallel computation, it can in principle be accelerated by computing all orbits in parallel. We were aware of fast algorithms that were, however, too unstable to be used in practice. We are grateful to Dan Spielman for having pointed out to us that meanwhile there are fast stable methods (albeit apparently not easy to implement). These should have the potential to speed up Newton's method even for general polynomials to a speed comparable to those with "fast" evaluation.

On the other hand, the Ehrlich–Aberth iteration step, especially in the form given in (2), has two major ingredients: an evaluation of the logarithmic derivative $p'/p$, and the correction term $\sum_{i \ne k} \frac{1}{z_k - z_i}$. The logarithmic derivative benefits from the same advantages of fast computation as for Newton's method (especially in the cases of particularly suitable polynomials like the ones considered in some of our experiments, but also for general polynomials as described above). However, since the correction term requires $O(d)$ operations per component (hence $O(d^2)$ per sweep), its exact computation can far outweigh all computations depending on the polynomials (which is why we had not optimized polynomial evaluations for the Ehrlich–Aberth method). Soon after the first version of this paper was submitted, we learned from Dario Bini and Victor Pan about the Fast Multipole Method (FMM; see [CGR]) that makes a very fast approximate computation of the correction term possible. Since all described methods only compute (hopefully ever-improving) approximations, it is not a structural disadvantage to settle for "approximate approximations" — even more so as there is still no global theory for the Ehrlich–Aberth method even in its precise form.

Dario Bini informed us that he has used this improved Ehrlich–Aberth method very recently for certain polynomials that were defined by fast recursion (possibly inspired by the super-fast performance of Newton's method for such polynomials) with almost linear CPU time up to very high degrees.

It would be an interesting project to perform a fresh set of experiments where both methods are optimized in the ways described, and to see whether the relative advantages, in terms of recursive form of the polynomials and shape of the roots, persist. This would, however, be an entirely new project, inspired not least by the results of the experiments described here.

6.6. Conclusion.
This project of comparing the efficiency of the Newton and Ehrlich–Aberth root finders is experimental, not an exact science. It is supposed to inspire, or possibly provoke, further investigations into the question of which root finders perform more efficiently under which circumstances, and how this depends on the properties of the polynomials — and possibly on the specific clever implementations of the root finders; and we are quite pleased to learn that some of this has already happened since the first version of this paper was submitted. Surprisingly, even at this time the fundamental question of finding all roots of univariate complex polynomials seems far from resolved and invites more progress to be made!
References

[Ar] Ludwig Arnold, Über die Nullstellenverteilung zufälliger Polynome (On the distribution of roots of random polynomials). Math. Z. (1966), 12–18.
[BAS] Todor Bilarev, Magnus Aspenberg, and Dierk Schleicher, On the speed of convergence of Newton's method for complex polynomials. Math. Comp. (2016), 693–705.
[BLS] Béla Bollobás, Malte Lackmann, and Dierk Schleicher, A small probabilistic universal set of starting points for finding roots of complex polynomials by Newton's method. Mathematics of Computation (2013), 443–457.
[CGR] J. Carrier, Leslie Greengard, and Vladimir Rokhlin, A fast adaptive multipole algorithm for particle simulations. SIAM J. Sci. and Stat. Comput. (1988), 669–686.
[ET] Paul Erdős and Pál Turán, On the distribution of roots of polynomials. Ann. of Math. (2) (1950), 105–119.
[Bi] Dario Andrea Bini, Numerical computation of polynomial zeros by means of Aberth's method. Numerical Algorithms (1996), 179–200. DOI: 10.1007/BF02207694.
[BF1] Dario Andrea Bini and Giuseppe Fiorentino, On the parallel evaluation of a sparse polynomial at a point. Numerical Algorithms (1999), 323–329. DOI: 10.1023/A:1019116203957.
[BF2] Dario Andrea Bini and Giuseppe Fiorentino, Numerical computation of polynomial roots using MPSolve version 2.2. Technical report, 2000.
[Pr] Petko D. Proinov, Unified convergence analysis for Picard iteration in n-dimensional vector spaces. Calcolo.
[DH] Adrien Douady and John H. Hubbard, Étude dynamique des polynômes complexes I & II (the "Orsay Notes"). Publications Mathématiques d'Orsay (1984/85).
[LMS] Russell Lodge, Yauhen Mikulich, and Dierk Schleicher, A classification of postcritically finite Newton maps. Manuscript, submitted. arXiv:1510.02771.
[HSS] John Hubbard, Dierk Schleicher, and Scott Sutherland, How to find all roots of complex polynomials with Newton's method. Inventiones Mathematicae (2001), 1–33.
[SSt1] Dierk Schleicher and Robin Stoll, Newton's method in practice: finding all roots of polynomials of degree one million efficiently. Theoretical Computer Science (2017), 146–166. arXiv:1508.02935.
[RaSS] Marvin Randig, Dierk Schleicher, and Robin Stoll, Newton's method in practice II: The iterated refinement Newton method and near-optimal complexity for finding all roots of some polynomials of very large degrees. Manuscript, submitted (2019). arXiv:1703.05847.
[Ro] Leonardo Robol, A rootfinding algorithm for polynomials and secular equations. Master's thesis, University of Pisa (2013).
[ReSS] Bernhard Reinke, Dierk Schleicher, and Michael Stoll, The Weierstrass root finder is not generally convergent. Manuscript, in preparation.
[Sch1] Dierk Schleicher, On the efficient global dynamics of Newton's method for complex polynomials. Manuscript, submitted. arXiv:1108.5773.
[Sch2] Dierk Schleicher, The dynamics of iterated polynomials. Manuscript, in preparation.
[Sh] Sergey Shemyakov, Newton's method for complex polynomials with roots on a unit circle. Bachelor thesis, Jacobs University (2018).
Institut de Mathématiques de Marseille (UMR CNRS 7373), Campus de Luminy, 163 avenue de Luminy, Case 907, 13288 Marseille Cedex 9, France
E-mail address: [email protected]
E-mail address: [email protected]
E-mail address: [email protected]
E-mail address: [email protected]