Iterative Power Algorithm for Global Optimization with Quantics Tensor Trains
Micheline B. Soley,†,‡ Paul Bergold,¶ and Victor S. Batista∗,†,‡

†Yale Quantum Institute, Yale University, P.O. Box 208334, New Haven, CT 06520-8263, USA
‡Department of Chemistry, Yale University, P.O. Box 208107, New Haven, CT 06520, USA
¶Zentrum Mathematik, Technische Universität München, Boltzmannstr. 3, 85748 Garching, Germany
E-mail: [email protected] a r X i v : . [ qu a n t - ph ] J a n bstract We introduce the iterative power algorithm (IPA) for global optimization and aformal proof of convergence for both discrete and continuous global search problems.IPA implements the power iteration method in quantics tensor train (QTT) represen-tations. Analogous to the imaginary time propagation method with infinite mass, IPAstarts with an initial probability distribution ρ ( r ) and iteratively applies the recurrencerelation ρ k +1 ( r ) = U ρ k ( r ) / (cid:107) U ρ k ( r ) (cid:107) L , where U = e − V ( r ) is defined in terms of the‘potential energy’ function V ( r ) with global minimum at r = r ∗ . Upon convergence, theprobability distribution becomes a delta function δ ( r − r ∗ ) , so the global minimum canbe obtained as the position expectation value r ∗ = Tr [ r δ ( r − r ∗ )] . QTT representationsof V ( r ) and ρ ( r ) are generated by fast adaptive interpolation of multidimensional ar-rays to bypass the curse of dimensionality and the need to evaluate V ( r ) for all possiblevalues of r . We illustrate the capabilities of IPA as applied to solving the factorizationproblem formulated as a global search optimization on the "potential energy" surface V ( r ) = mod ( N, r ) , where N is the number to be factorized and r ∈ { , , , , , . . . } isthe space of prime numbers folded as a d -dimensional × × · · · × d tensor. We findthat IPA resolves multiple degenerate global minima corresponding to prime factorsof N even when separated by large energy barriers in the highly rugged landscape of V ( r ) . Therefore, IPA should be of great interest for a wide range of other optimizationproblems ubiquitous in molecular and electronic structure calculations. Introduction
The development of efficient optimization algorithms remains a subject of great research interest, since optimization problems are central to important applications in many branches of science and engineering, including molecular and electronic structure calculations. In control theory, for example, global optimization algorithms are essential to determine the drives that steer a system into a desired final state.
Another prototypical example is the problem of finding the minimum-energy structure of a complex molecule, usually the first step in studies of molecular properties, molecular reactivity, and drug design.
The simplest approach for finding the global optima in a discrete set is to sift through all possibilities. However, that approach becomes intractable for high-dimensional systems, since the number of possible states typically scales exponentially with the number of degrees of freedom – i.e., the so-called "curse of dimensionality" problem. Analogously, simple approaches for continuous optimization involve sampling stochastically or deterministically.
Yet, these procedures typically lead to "trapping" in local minima. Therefore, the development of efficient global search algorithms remains an open problem of great interest.

In this paper, we build upon the strategy of the diffeomorphic modulation under observable-response-preserving homotopy (DMORPH) method, and we introduce the iterative power algorithm (IPA) for global optimization. DMORPH evolves a distribution function ρ(r) in the search space of configurations, so that the distribution becomes localized at the global optima and the minimum can be revealed by computing the position expectation value. Analogously, IPA implements the same strategy of evolving a probability distribution function, although with a very different approach. Instead of implementing the DMORPH approach of iteratively optimizing control parameters of an externally applied field that localizes ρ(r) at the global optimum, IPA applies a simple amplitude amplification scheme based on the power method. The resulting algorithm is essentially an imaginary time propagation, although with infinite mass. The relation between the power method of linear algebra and the imaginary time propagation method has been previously discussed, although it remains to be formally analyzed.

The power method is based on the recurrence relation ρ_{k+1}(r) = U ρ_k(r)/‖U ρ_k(r)‖_{L¹}. In the IPA implementation, U = e^{−V(r)} is defined by the scaled potential energy surface V(r), and ρ_k(r) is the density distribution after the k-th optimization step. Such an iterative procedure transforms any initial distribution with non-zero amplitude at the global minimum into a delta function ρ(r) = δ(r − r*) (i.e., the eigenvector of U(r) with maximum eigenvalue in the basis of Dirac delta functions).
The global minimum can then be revealed, as in the DMORPH method, by computing the position expectation value r* = Tr[r ρ(r)].

IPA can efficiently find the global minimum of low-rank high-dimensional potential energy surfaces with 2^d possible states r by approximating ρ(r) and V(r) in the form of quantics tensor trains (QTTs), a specific form of tensor trains (TT) or matrix product states (MPS) of great interest. The QTTs have arrays reshaped into 2 × 2 × · · · × 2 tensors, so they represent high-dimensional quantities Q(i_1, . . . , i_d) with 2^d possible values. Since they depend on d physical variables i_k, each of them with 2 possible values, they are decomposed into the outer product of tensor cores, as follows:

Q(i_1, . . . , i_d) ≈ Σ_{α_1=1}^{r_1} Σ_{α_2=1}^{r_2} · · · Σ_{α_{d−1}=1}^{r_{d−1}} A_1(1, i_1, α_1) A_2(α_1, i_2, α_2) · · · A_d(α_{d−1}, i_d, 1),   (1)

where Q is the reshaped d-dimensional tensor; A_j are individual, order-three, rank-r_j tensor cores contracted over the auxiliary indices α_j; and i_1, . . . , i_d ∈ {0, 1}. The QTT format, introduced by Eq. (1), reduces the cost of evaluating Q over the search space of 2^d possibilities to not more than 2dr² evaluations for the maximal rank r = max(r_1, . . . , r_{d−1}). In addition, quantics tensor trains feature the same exponential improvement in data sparsity given by quantum computers, which offers the possibility of developing methods like IPA that can be thought of as classical computing analogues of quantum computing algorithms.

Quantum search algorithms (e.g., Grover's search method) typically initialize a uniform superposition and evolve it multiple times until a measurement of the resulting state can identify one out of 2^d possibilities with sufficiently high probability. Analogously, we initialize ρ(r) as a uniform distribution in QTT format to enable sampling of the entire search space simultaneously.
Iterative application of the recurrence relation amplifies the amplitude at the global minima, which yields a final density localized at the global minima. We prove that the number of steps required by IPA to amplify the amplitude of the global minimum to a probability higher than 50% scales logarithmically with the size of the search space, which provides a valuable global search methodology alternative to well-established optimization methods.

The paper is organized as follows. The IPA method is introduced in Sec. 2, followed by the analysis of convergence rate in Sec. 3 and a discussion in perspective of existing approaches in Sec. 4. Computational results are presented in Sec. 5, and conclusions in Sec. 6. Appendix A presents a formal proof of IPA convergence. Appendix B analyzes the convergence rate of the power method. Python codes to reproduce the reported calculations are provided in Appendices C and D.
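To make the QTT construction of Eq. (1) concrete, the following self-contained sketch compresses a function sampled on a 2^d grid into tensor-train cores by sequential SVDs (TT-SVD). This is only an illustration: the paper generates QTTs by fast adaptive interpolation (TT-Toolbox) rather than by forming the full array, and the grid, test function e^{−x}, and tolerance here are arbitrary choices for the demonstration.

```python
import numpy as np

def tt_svd(tensor, tol=1e-10):
    """Decompose a 2 x 2 x ... x 2 tensor into TT cores A_j(alpha_{j-1}, i_j, alpha_j)
    by sequential truncated SVDs. (The paper instead uses adaptive cross
    interpolation, which never forms the full tensor.)"""
    d = tensor.ndim
    cores, r_prev = [], 1
    mat = tensor.reshape(r_prev * 2, -1)
    for _ in range(d - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r = max(1, int(np.sum(s > tol * s[0])))  # truncate to numerical rank
        cores.append(U[:, :r].reshape(r_prev, 2, r))
        mat = (np.diag(s[:r]) @ Vt[:r]).reshape(r * 2, -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, 2, 1))
    return cores

def tt_eval(cores, bits):
    """Evaluate the TT at a multi-index (i_1, ..., i_d), cf. Eq. (1)."""
    v = np.ones((1, 1))
    for A, i in zip(cores, bits):
        v = v @ A[:, i, :]
    return v[0, 0]

d = 10                              # 2**10 = 1024 grid points
x = np.linspace(0.0, 5.0, 2**d)
q = np.exp(-x)                      # a smooth, separable (low-rank) function
cores = tt_svd(q.reshape([2] * d))

ranks = [A.shape[2] for A in cores[:-1]]
n_params = sum(A.size for A in cores)
print("TT ranks:", ranks)
print("parameters: %d instead of %d" % (n_params, 2**d))

# spot-check the decomposition against the original array
idx = 777
bits = [(idx >> (d - 1 - k)) & 1 for k in range(d)]
print("error at sample point:", abs(tt_eval(cores, bits) - q[idx]))
```

For a separable function such as e^{−x} every unfolding has rank 1, so the d cores store 2d numbers in place of the 2^d samples, which is the data sparsity that IPA exploits.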
Iterative Power Algorithm

IPA solves the optimization problem

min_{x ∈ R^n} V(x),   (2)

for a given potential V : R^n → R. Here, we limit the discussion to the one-dimensional case n = 1, since any problem with n > 1 can be vectorized into an n = 1 version. To guarantee the existence of a global minimum, we assume V(x) is continuous and coercive (i.e., V(x) → +∞ as |x| → +∞). Our goal is to compute the set of all minima locations of V,

arg min_{x ∈ R} V(x) = { x* ∈ R | V(x) ≥ V(x*) for all x ∈ R }.   (3)

Therefore, we employ a non-negative probability density function ρ : R → [0, ∞) that is bounded and with unit norm:

‖ρ‖_{L¹} = ∫_R dx ρ(x) = 1.   (4)

The initial density ρ_0 is supported (non-zero) around every minimum x* of the potential V, so for all r > 0 the initial density ρ_0 satisfies the following condition:

∫_{x*−r}^{x*+r} dx ρ_0(x) > 0.   (5)

In each IPA iteration, a transformation matrix U is applied from the left to ρ to increase the density amplitude at the global minimum positions relative to amplitudes at the remainder of the search space. The resulting density distribution Uρ is then normalized to obtain a new density ρ, which is the input for the next IPA iteration. Any U can be used, provided it satisfies the following two conditions: (i) U(x) must be a continuous and positive function that is maximized at the global minima of V,

arg max_{x ∈ R} U(x) = arg min_{x ∈ R} V(x),   (6)

and (ii) U(x) must be integrable (we denote this by U ∈ L¹(R)).

A simple example is U(x) = e^{−βV(x)} for a fixed scaling parameter β > 0. We note that Eq. (6) holds for this U(x) since the exponential is a strictly increasing function. Furthermore, the coercivity condition of the potential implies U(x) is integrable for a sufficiently fast growing potential V(x) in the asymptotic region |x| → +∞.

Evolution: Amplitude Amplification

IPA generates a sequence of density distributions ρ_0, ρ_1, . . .
, starting from a uniform distribution ρ_0, as follows:

for k = 1, 2, . . .
    r_k = ‖U ρ_{k−1}‖_{L¹} = ∫_R dx U(x) ρ_{k−1}(x);
    ρ_k(x) = U(x) ρ_{k−1}(x)/r_k = U(x)^k ρ_0(x)/‖U^k ρ_0‖_{L¹};
end

Since U is assumed to be continuous and integrable, we conclude it is bounded and L¹-normalizable (U ∈ L^∞(R) ∩ L¹(R)). In particular, this guarantees the normalization factors r_k > 0 are well-defined, since repeated applications of U remain L¹-normalizable (U^k ∈ L¹(R) for all iterations k ≥ 1).

Appendix A proves the sequence ρ_1, ρ_2, . . . produced by IPA converges to a "Dirac comb" – i.e., a sum of Dirac delta functions located at the global minima x*_1 < x*_2 < · · · < x*_m of the potential V (which can be viewed as the limit of the so-called Dirac sequences, as mentioned in Appendix A):

ρ_final(x) = lim_{k→∞} ρ_k(x) → (1/m) Σ_{j=1}^{m} δ(x − x*_j).   (7)

The global minima are obtained after obtaining ρ_final(x), as follows:

(i) When V(x) has a single global minimum at x = x*, the minimum is obtained by computing the position expectation value with the final density ρ_final(x):

x* = ⟨x⟩_{ρ_final} = ∫_R dx x ρ_final(x).   (8)

(ii) When V(x) has only two degenerate global minima (e.g., as for the factorization of biprimes discussed below), we first compute the position expectation value of ρ_final to obtain the average position x̄ of the two global minima. Then, we multiply ρ_final by a Heaviside step function,

Θ(x − x̄) = { 0, if x ≤ x̄,
            { 1, if x > x̄,   (9)

to obtain the distributions ρ_final(x) Θ(x − x̄) and ρ_final(x) (1 − Θ(x − x̄)), which are single delta functions resolving the two minima.

(iii) When V(x) has an unknown number of global minima, we first obtain ρ_final. Then, we reinitialize ρ_0 = ρ_final and we run IPA with a "ramp potential" rather than using the potential of the problem of interest. The ramp is usually a simple monotonically increasing function (e.g.
, V(x) = x) that breaks the degeneracy of the Dirac comb by amplifying the amplitude of the minimum of all minima (i.e., x*_1). After computing x*_1 as an expectation value, we multiply ρ_final by the Heaviside function Θ(x − x*_1) introduced by Eq. (9) and we repeat the IPA ramp to identify the second minimum. The scheme is then repeated until all global minima are resolved.

IPA is not limited to a specific choice of basis set representation for ρ(x), V(x) and U(x). However, we employ the quantics tensor train (QTT) representation, generated by fast adaptive interpolation of multidimensional arrays as implemented in Oseledets' TT-Toolbox. The resulting implementation bypasses the curse of dimensionality, which allows for applications to high-dimensional potentials (Python scripts provided in Appendices C and D).
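The amplitude-amplification loop above can be sketched in a few lines of plain NumPy on a discrete grid. This is only an illustration, not the QTT implementation of Appendices C and D; the double-well potential, grid, β, and iteration count are arbitrary choices for this example.

```python
import numpy as np

# Discrete IPA sketch: amplify a uniform density with U = exp(-beta*V),
# then read off the global minimum as a position expectation value, Eq. (8).
beta = 5.0
x = np.linspace(-4.0, 4.0, 2**10)
V = (x**2 - 1.0)**2 + 0.1 * x      # tilted double well; global minimum near x = -1
U = np.exp(-beta * V)

rho = np.ones_like(x)
rho /= rho.sum()                   # uniform initial density (discrete l1 norm)
for _ in range(2000):              # rho_k = U^k rho_0 / ||U^k rho_0||_1
    rho = U * rho
    rho /= rho.sum()

x_star = np.dot(x, rho)            # position expectation value
print("estimated minimum:", x_star)
print("grid argmin:      ", x[np.argmin(V)])
```

The local minimum near x = +1 is suppressed geometrically at each iteration, so the density collapses onto the global minimum near x = −1 and the expectation value recovers its location to within the grid spacing.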
Convergence Rate Analysis
Appendix A provides a formal proof of convergence for IPA continuous global optimization. Here, we focus on discrete optimization for a problem with a single global minimum. We show that the number of IPA steps necessary to amplify the amplitude of the global minimum to a value higher than 1/2 scales logarithmically with the number N of possible states. The analysis is analogous to the estimation of the number of queries required for amplitude amplification by Grover's algorithm.

First, we show that IPA converges to the global minimum for the specific case where U is given by an N × N diagonal matrix with N ≥ 2 positive entries (eigenvalues λ_j with j = 1, . . . , N) with a unique maximum λ_1 > 0. For simplicity, we take all other eigenvalues to be λ_2, with

0 < λ_2 < λ_1.   (10)

Hence, U can be expressed as follows:

U = diag(λ_2, . . . , λ_2, λ_1, λ_2, . . . , λ_2) ∈ R^{N×N},   (11)

where λ_1 is the k_0-th diagonal entry for some 1 ≤ k_0 ≤ N. An illustration of U is given in Fig. 1.

Figure 1: Illustration of the vector form of U, assumed to have a unique maximum eigenvalue λ_1 and all other eigenvalues of equal amplitude λ_2.

We consider an initial density given by the discrete uniform distribution

ρ_0 = (1/N)(1, . . . , 1)ᵀ ∈ R^N.   (12)

The k-th IPA iteration updates the density distribution, as follows:

ρ_k = U ρ_{k−1}/‖U ρ_{k−1}‖_1 = U^k ρ_0/‖U^k ρ_0‖_1   (13)
    = u_k/‖u_k‖_1,   (14)

where repeated application of U yields:

u_k = (λ_2^k, . . . , λ_2^k, λ_1^k, λ_2^k, . . . , λ_2^k),   (15)

with norm

‖u_k‖_1 = Σ_{j=1}^{N} |(u_k)_j| = λ_1^k + (N − 1) λ_2^k.   (16)

We note that λ_1^k > λ_2^k since λ_1 > λ_2, so the vector ρ_k produced after k iterations has N ≥ 2 positive entries, a unique maximum

ρ_{k,max} = max_{j=1,...,N} (ρ_k)_j = λ_1^k/‖u_k‖_1,   (17)

and all other entries with value

ρ_{k,min} = min_{j=1,...,N} (ρ_k)_j = λ_2^k/‖u_k‖_1.
(18)

Therefore, the minimum-to-maximum amplitude ratio is

ρ_{k,min}/ρ_{k,max} = (λ_2/λ_1)^k.   (19)

Each IPA iteration decreases the ratio by a factor of λ_2/λ_1 < 1, while the norm is conserved. Therefore, only the maximum entry of the state vector ρ_k survives in the limit of an infinite number of iterations k → +∞.

Using the normalization condition,

‖ρ_k‖_1 = ρ_{k,max} + (N − 1) ρ_{k,min} = 1,   (20)

and inserting the ratio given by Eq. (19) into the normalization condition introduced by Eq. (20), we can solve for the maximum amplitude ρ_{k,max}, as follows:

ρ_{k,max} = 1/(1 + (N − 1) · (λ_2/λ_1)^k),   (21)

which converges to 1 in the limit k → ∞.

The number of iterations required to amplify the amplitude of the global minimum to a value higher than or equal to 1/2 is given by
1/(1 + (N − 1) · (λ_2/λ_1)^k) ≥ 1/2.   (22)

Solving this inequality gives the minimum number of required IPA iterations,

k ≥ log(N − 1)/log(λ_1/λ_2),   (23)

which scales logarithmically with the size of the search space N and inverse-logarithmically with the ratio of eigenvalues λ_1/λ_2.

Comparison to Other Methods
IPA can be compared to the power method and imaginary time propagation.
The connection between the power method and imaginary time propagation has been discussed, although the relationship between the two methods has yet to be formally analyzed.

We begin with the recurrence relation of the power method. For a matrix U ∈ C^{N×N} with eigenvalues λ_1, . . . , λ_N ∈ C, the subscripts denote the order |λ_1| > |λ_2| ≥ · · · ≥ |λ_N|. Given a starting vector ρ_0 ∈ C^N that has a non-zero amplitude along the direction of the eigenvector with largest eigenvalue λ_1, the power method produces the following sequence of vectors ρ_k ∈ C^N:

ρ_k = U ρ_{k−1}/‖U ρ_{k−1}‖ = U^k ρ_0/‖U^k ρ_0‖,   (24)

a sequence that converges to an eigenvector associated with the largest eigenvalue λ_1, independently of the norm ‖·‖. The resulting convergence is geometric in the ratio

|λ_2/λ_1| < 1.   (25)

We note that, according to the recurrence relation introduced by Eq. (24), imaginary time propagation is essentially the power method where ρ_0 represents a trial initial wavefunction in a given basis set and U is the matrix representation of the Boltzmann operator e^{−βĤ}, where the Hamiltonian Ĥ is typically Ĥ = p̂²/(2m) + V(x̂), with m the mass and p̂ the momentum operator.

In IPA, however, ρ is a probability density and U can be any integrable, continuous, and positive function of x that is maximal at the global minimum of V. As a result, IPA finds the global minima of V(x), while the imaginary time propagation method finds the eigenstate of the Hamiltonian with minimum eigenvalue (i.e., the ground state). For the particular choice of U(x) = e^{−βV(x)}, however, IPA corresponds to imaginary time propagation with m = ∞.
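The geometric convergence in Eq. (25) and the iteration bound of Eq. (23) can be checked numerically for the diagonal "oracle" of Eq. (11); in this sketch N, λ_1, λ_2, and the position k_0 are arbitrary illustrative values.

```python
import numpy as np
import math

# Power iteration on the diagonal oracle of Eq. (11): lambda1 at one
# position, lambda2 everywhere else, starting from the uniform density.
N, lam1, lam2, k0 = 1024, 1.0, 0.9, 345   # illustrative values
diag = np.full(N, lam2)
diag[k0] = lam1

rho = np.full(N, 1.0 / N)                 # uniform rho_0, Eq. (12)
k_pred = math.ceil(math.log(N - 1) / math.log(lam1 / lam2))  # bound of Eq. (23)

k = 0
while rho.max() < 0.5:                    # iterate until p(minimum) >= 1/2
    rho = diag * rho
    rho /= rho.sum()                      # l1 normalization
    k += 1

print("iterations needed:", k, "| predicted by Eq. (23):", k_pred)
print("amplified position:", rho.argmax())
```

The observed iteration count matches the prediction of Eq. (23), consistent with the closed form of Eq. (21), and the amplified entry is the position of λ_1.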
Eq. (24) also shows that IPA differs from the power method because it employs a U ∈ L^∞(R) ∩ L¹(R) that meets the conditions described in Section 2 and a probability density function ρ ∈ L¹(R) to find the global minima, whereas the power method employs an arbitrary matrix U ∈ C^{N×N} and a discrete vector ρ ∈ C^N to find an eigenvector. This relationship also allows us to use the power method to analyze the convergence rate of IPA for discrete problems, as discussed in Appendix B.

Computational Results

This section shows that IPA successfully finds the degenerate global minima r*_1 < r*_2 < · · ·

Figure 3: As expected, the IPA procedure for global optimization of the function Eq. (26), for N as defined in the text, yielded a final density in the form of a Dirac comb that was maximal at positions of global optima and zero elsewhere. Only a fraction of the search space of prime numbers is illustrated for clarity.

Figure 4: The Dirac delta components of the final density in IPA were successfully isolated, without evaluation of the function at all points on the search space, via the ramp method for U = e^{−β̃ V_ramp} with parameter β̃ = 0.5, and found to be located at the global optima of the function Eq. (26) for the large number N. Given the size of the search space of prime numbers, the density is shown in a restricted region to enable visualization of its maximal values.

Figure 5 shows the IPA execution time as a function of N when solving the factorization problem of biprimes N = p_1 × p_2, where p_1 and p_2 are primes with values up to 9998000099 and where U = e^{−βV} with β = 20 (requiring only one IPA iteration). The regression analysis shows that the execution time scales approximately as O(ln(N)) (R² = 0.978), or O(ln(ln(N))) (R² = 0.977).
The logarithmic scaling agrees with the analysis of Section 3, which shows that the resulting scaling for amplitude amplification is comparable to or better than in optimal quantum search algorithms (e.g., the Grover quantum search method, where the number of queries necessary to amplify the amplitude of one out of N possible states scales as O(√N)).

Figure 5: The real execution time for IPA global optimization of the function Eq. (26) in the twin global minima case (i.e., for prime factorization of biprimes N), which agrees with the predicted scaling of Section 3 (ln(N) fit, R² = 0.978; ln(ln(N)) fit, R² = 0.977) and which is comparable to or better than the number of steps required for the rate-limiting part of the foremost quantum approach.

Conclusions

The QTT implementation of IPA illustrates the possibility of developing efficient algorithms for classical computing. Analogous to quantum computing algorithms, superposition states can be evolved by applying a sequence of unitary transformations, and the outcome of the calculation corresponds to a "measurement" (i.e., an expectation value obtained with the evolved superposition). The QTT representation avoids the curse of dimensionality, enabling benchmark calculations that would be otherwise impossible on classical high-performance computing facilities. We find that such a computational strategy enables IPA to perform quite efficiently, bypassing the usual limitations of traditional optimization methods. Therefore, it is natural to anticipate that IPA should be of great interest for a wide range of applications, including optimization problems in molecular and electronic structure calculations.

Acknowledgements

The authors are grateful for conversations with Dr. Caroline Lasser, Dr. Erik T. J. Nibbering, and Dr. Maximilian Engel. M. B. S.
acknowledges financial support from the Yale Quantum Institute Postdoctoral Fellowship, the National Science Foundation Graduate Research Fellowship grant number DGE-1144152, and the Blue Waters Graduate Research Fellowship, part of the Blue Waters sustained-petascale computing project supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the State of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications. V.S.B. acknowledges support from the NSF Grant No. CHE-1900160, and high performance computing time from NERSC and the Yale High Performance Computing Center.

Keywords

Global optimization, quantum computing, tensor networks, prime factorization, quantum superposition

Appendix

A Proof of Convergence

This section shows that the sequence generated by the IPA recurrence relation converges to a delta distribution δ(x − x*) when V(x) has a single global minimum at x = x*. An analogous proof can be provided for surfaces with multiple global minima by generalization of the concept of a Dirac sequence.

The sequence of densities ρ_k converges to the delta distribution as a Dirac sequence:

(i) For all k ∈ N and all x ∈ R: ρ_k(x) ≥ 0,
(ii) For all k ∈ N: ρ_k ∈ L¹(R) and ∫_R dx ρ_k(x) = 1,
(iii) For all ε > 0: lim_{k→∞} ∫_{R \ (x*−ε, x*+ε)} dx ρ_k(x) = 0, where the integral is evaluated over the real line R except the interval (x*−ε, x*+ε).

These conditions guarantee the area under the curve ρ_k is concentrated near the global minimum location x*, provided the number of iterations k is sufficiently large.

The properties (i) and (ii) follow by construction of the IPA sequence. To prove property (iii), let ε > 0 be a positive distance. For a radius r > 0, we denote the minimum of U on the interval [x*−r, x*+r] by

m_r = min_{x ∈ [x*−r, x*+r]} U(x).
(28)

Since U is continuous with a single global maximum at x*, there exists a radius r_ε > 0 such that the number m_{r_ε} is a positive and strict upper bound for U outside the interval (x*−ε, x*+ε), as follows (Figure 6):

U(x)/m_{r_ε} < 1, for all x ∈ R \ (x*−ε, x*+ε).   (29)

Figure 6: There exists a radius r_ε > 0 such that the minimum m_{r_ε} on [x*−r_ε, x*+r_ε] is a strict upper bound (orange) for all values outside the interval (x*−ε, x*+ε) (shown in blue), since U is a continuous function with a single global maximum at x = x*.

We then introduce the probability,

p_ε = ∫_{x*−r_ε}^{x*+r_ε} dx ρ_0(x) > 0,   (30)

and according to the definition of the minimum m_{r_ε} > 0, introduced by Eq. (28), for all k ≥ 1 we obtain the norm,

‖U^k ρ_0‖_{L¹} = ∫_R dx U(x)^k ρ_0(x) ≥ m_{r_ε}^k ∫_{x*−r_ε}^{x*+r_ε} dx ρ_0(x) = m_{r_ε}^k p_ε,   (31)

which gives the bound,

ρ_k(x) = U(x)^k ρ_0(x)/‖U^k ρ_0‖_{L¹} ≤ (‖ρ_0‖_∞/p_ε) (U(x)/m_{r_ε})^k for all x ∈ R,   (32)

where ‖ρ_0‖_∞ is the supremum sup_{x∈R} |ρ_0(x)|. According to Eq. (29), U(x)/m_{r_ε} < 1 for all positions outside the interval (x*−ε, x*+ε). Hence, we conclude that the density after k iterations is bounded for all those positions x and all iterations k ≥ 1, as follows:

ρ_k(x) ≤ (‖ρ_0‖_∞/p_ε) U(x)/m_{r_ε},   (33)

showing that the sequence is dominated by an integrable function. Thus, the Lebesgue dominated convergence theorem yields

lim_{k→∞} ∫_{R \ (x*−ε, x*+ε)} dx ρ_k(x) = ∫_{R \ (x*−ε, x*+ε)} dx lim_{k→∞} ρ_k(x) = 0.   (34)

B Power Method: Convergence Rate Analysis

We consider a diagonal matrix U ∈ R^{N×N} whose entries are given by the value of U at the equally spaced positions x_1, . . . , x_N with Δx = x_{j+1} − x_j = (b − a)/N in the finite interval [a, b],

U = diag(U(x_1), U(x_2), . . . , U(x_N)).
(35)

We consider an initial vector whose entries are given by the value of the initial density ρ_0 at the same positions,

ρ_0 = (ρ_0(x_1), ρ_0(x_2), . . . , ρ_0(x_N)) ∈ R^N.   (36)

When N is sufficiently large, we obtain the following approximation for all iterations:

‖U^k ρ_0‖_1 = Σ_{j=1}^{N} U(x_j)^k ρ_0(x_j) ≈ (1/Δx) ∫_R dx U(x)^k ρ_0(x) = (1/Δx) ‖U^k ρ_0‖_{L¹}.   (37)

In the following, we denote by ρ* ∈ R^N the vector whose j-th coordinate equals one if U(x_j) = λ_1 is the dominant eigenvalue of U and zero otherwise. Moreover, we introduce the constant,

c = 1/#{ j | U(x_j) = λ_1 },   (38)

where we use the notation #A for the cardinality (i.e., the number of elements) of the set A. The definition of U in Eq. (35) yields that the sequence ρ_1, ρ_2, . . . produced by the power iteration (i.e., Eq. (24) using the norm ‖·‖_1) converges to c · ρ*. Using the approximation in Eq. (37), we conclude that the density ρ_k produced by IPA can be approximated at a given grid point x_j as

ρ_k(x_j) = U(x_j)^k ρ_0(x_j)/‖U^k ρ_0‖_{L¹} ≈ (1/Δx) (U^k ρ_0)_j/‖U^k ρ_0‖_1 → (c/Δx) ρ*_j as k → ∞.   (39)

In the special case where U has a unique dominant eigenvalue, say λ_1 = U(x_n) for some unique n ∈ {1, . . . , N}, we get that ρ*_j is the Kronecker delta δ_{j,n}. This allows us to confirm IPA generates a Dirac sequence at the global minimum for discrete optimization problems. The relationship of this expression to that of the power method also shows IPA inherits the geometric convergence rate in the ratio λ_2/λ_1 < 1 from the power method, in agreement with the alternative analysis introduced in Section 3.

To further specify the convergence rate of IPA, we relate the ratio λ_2/λ_1 of the power method to the spacing Δx in IPA. This is accomplished by classifying the steepness of U around its maximum location x* via local approximations by polynomials of even degree.
If there exists a positive parameter α > 0 and an integer m ≥ 1 such that, for all positions within a distance of Δx of the maximum x*, U is bounded from below by

U(x) ≥ U(x*) − α (x − x*)^{2m},   (40)

then the eigenvalue λ_2 is bounded from below by U(x*) − α Δx^{2m}. Therefore, we conclude the rate of convergence is bounded as

λ_2/λ_1 ≥ (U(x*) − α Δx^{2m})/U(x*) = 1 − (α/U(x*)) Δx^{2m}.   (41)

In particular, λ_2/λ_1 → 1 as Δx → 0.

C Multiple Degenerate Global Minima

The following Python script illustrates the implementation of IPA as applied to finding multiple degenerate global minima corresponding to the degenerate prime factors of the integer N = (3 × 3 × 11 × 17 × 23 × 41 × 53 × 79 × 101 × 109)^{200} with 2,773 digits, when using the ttpy library installed from http://github.com/oseledets/ttpy.

import numpy as np
from numpy import zeros, reshape, sqrt, arange, vectorize, extract, int
import matplotlib.pyplot as plt
import tt
import mpmath
from mpmath import mp, mpf, floor, exp, nint

def parameters():
    global dim, eps, num, rmax, nsteps, d, searchspacesize, beta, betaprime
    num = mpf(3*3*11*17*23*41*53*79*101*109)**200
    beta = 30
    betaprime = 0.5
    dim = 1
    eps = 1.0e-100
    rmax = 100
    nsteps = 3
    d = 6
    searchspacesize = 2**d
    return ()

def rhoo(input):
    V = 1.0 + 0*input
    return V

def is_prime(n):
    if n % 2 == 0 and n > 1:
        return False
    return all(n % i for i in range(3, int(sqrt(n)) + 1, 2))

def tto(input, param=None):
    global num, beta
    nevals, dim = input.shape
    out = np.zeros((nevals,))
    for ii in range(nevals):
        a = num - nint(input[ii, 0])*floor(num/nint(input[ii, 0]))
        out[ii] = input[ii, 1]*exp(-beta*a)
    return (out)

def newtto(input, param=None):
    global num, betaprime
    nevals, dim = input.shape
    out = np.zeros((nevals,))
    for ii in range(nevals):
        a = input[ii, 0]
        out[ii] = input[ii, 1]*exp(-betaprime*a)
    return (out)

def ttround(input, param=None):
    nevals, dim = input.shape
    out = np.zeros((nevals,))
    for ii in range(nevals):
        out[ii] = np.round(input[ii, 1])
    return (out)

def ttremovefactor(input, param=None):
    global largestfactor
    nevals, dim = input.shape
    out = np.zeros((nevals,))
    for ii in range(nevals):
        if input[ii, 0] - 0.5 < largestfactor:
            out[ii] = 0.
        else:
            out[ii] = np.round(input[ii, 1])
    return (out)

if __name__ == "__main__":
    global eps, num, rmax, nsteps, d, searchspacesize, largestfactor
    np.random.seed(1234)
    mp.dps = 3000
    parameters()
    print(num)
    a = arange(2, 10**6)
    foo = vectorize(is_prime)
    pbools = foo(a)
    primes = extract(pbools, a)
    pp = np.zeros(searchspacesize, dtype=float)
    for j in range(searchspacesize):
        pp[j] = primes[j]
    ttpp = tt.tensor(reshape(pp, [2]*d))
    lprimes = []
    ttrho = tt.multifuncrs2([ttpp], rhoo, eps, verb=0, rmax=rmax)
    for k in range(nsteps):
        ttrho = tt.multifuncrs([ttpp, ttrho], tto, eps, verb=0, rmax=rmax)
        ttrhostore = tt.multifuncrs([ttpp, ttrho], ttround, eps, verb=0, rmax=rmax)
        ttrho = ttrho*(1.0/(ttrho.norm())**2)
    plt.bar(pp, reshape(ttrho.full(), searchspacesize), color='red', ls='-', label='ttrho')
    plt.xlabel("Prime Number")
    plt.ylabel("Density [arb. units.]")
    plt.title("Optimized Density")
    plt.xlim(1, 128)
    plt.pause(1.0)
    plt.savefig('diraccomb.png')
    largestfactor = 0.
    count = 0
    while num > 1:
        count = count + 1
        ttrho = tt.multifuncrs([ttpp, ttrhostore], ttremovefactor, eps, verb=0, rmax=rmax)
        for k in range(nsteps):
            ttrho = tt.multifuncrs([ttpp, ttrho], newtto, eps, verb=0, rmax=rmax)
            ttrho = ttrho*(1.0/ttrho.norm())
        ev = nint(tt.dot(ttpp, ttrho))
        largestfactor = ev
        lprimes.append(ev)
        num = num/ev
        while nint(num) % nint(ev) == 0:
            lprimes.append(ev)
            num = num/ev
        plt.clf()
        plt.bar(pp, reshape(ttrho.full(), searchspacesize), color='red', ls='-', label='Density')
        plt.xlabel("Prime Number")
        plt.ylabel("Density [arb. units.]")
        plt.title("Prime Factor %i" % count)
        print("prime factors=", lprimes, num)
        plt.xlim(1, 128)
        plt.savefig('primefactor' + str(count) + '.png')

D Paired Degenerate Global Minima

The following Python script illustrates the implementation of IPA as applied to finding the prime factors of the biprime N = 99989 × 99991 by resolving the degenerate global minima of the mod function, as described in the text, while using the ttpy library installed from http://github.com/oseledets/ttpy.

import numpy as np
from numpy import zeros, reshape, sqrt, arange, vectorize, extract, int, empty_like
import tt
import mpmath
from mpmath import mp, mpf, floor, exp, nint

def parameters():
    global dim, eps, num, rmax, nsteps, d, searchspacesize, beta
    num = mpf(99989*99991)
    sqrtnum = sqrt(num)
    if sqrtnum < 24:
        d = 3
    elif sqrtnum < 60:
        d = 4
    elif sqrtnum < 138:
        d = 5
    elif sqrtnum < 314:
        d = 6
    elif sqrtnum < 728:
        d = 7
    elif sqrtnum < 1622:
        d = 8
    elif sqrtnum < 3674:
        d = 9
    elif sqrtnum < 8168:
        d = 10
    elif sqrtnum < 17882:
        d = 11
    elif sqrtnum < 38892:
        d = 12
    elif sqrtnum < 84048:
        d = 13
    elif sqrtnum < 180512:
        d = 14
    else:
        print("Error: Dimension not implemented.")
        quit()
    rmax = 100
    beta = 20
    dim = 1
    eps = 1.0e-100
    nsteps = 1
    searchspacesize = 2**d
    return ()

def rhoo(input):
    V = 1.0 + 0*input
    return V

def is_prime(n):
    if n % 2 == 0 and n > 1:
        return False
    return all(n % i for i in range(3, int(sqrt(n)) + 1, 2))

def tto(input, param=None):
    global num, beta
    nevals, dim = input.shape
    out = np.zeros((nevals,))
    for ii in range(nevals):
        a = num - nint(input[ii, 0])*floor(num/nint(input[ii, 0]))
        if a > 10:
            a = 10
        out[ii] = input[ii, 1]*exp(-beta*a)
    return (out)

if __name__ == "__main__":
    global eps, num, rmax, nsteps, d, searchspacesize, ttavg
    np.random.seed(1234)
    mp.dps = 2000
    parameters()
    print(num)
    a = arange(2, 10**6)
    foo = vectorize(is_prime)
    pbools = foo(a)
    primes = extract(pbools, a)
    pp = np.zeros(searchspacesize, dtype=float)
    for j in range(searchspacesize):
        pp[j] = primes[j]
    ttpp = tt.tensor(reshape(pp, [2]*d))
    lprimes = []
    ttrho = tt.multifuncrs2([ttpp], rhoo, eps, verb=0, rmax=rmax)
    for k in range(nsteps):
        ttrho = tt.multifuncrs([ttpp, ttrho], tto, eps, verb=0, rmax=rmax)
        ttrho = ttrho*(1.0/(ttrho.norm())**2)
    ttrhostore = ttrho
    ttavg = nint(tt.dot(ttpp, ttrho))
    heaviside = empty_like(pp)
    for j in range(searchspacesize):
        if pp[j] - 0.5 > ttavg:
            heaviside[j] = 0.
        else:
            heaviside[j] = 1.
    ttheaviside = tt.tensor(reshape(heaviside, [2]*d))
    ttrho = ttheaviside*ttrhostore
    ev = nint(tt.dot(ttpp, ttrho*(1.0/(ttrho.norm()))))
    num = nint(num/ev)
    lprimes.append(ev)
    lprimes.append(num)
    print("prime factors=", lprimes, num)

References

(1) Bellman, R. E. Adaptive Control Processes: A Guided Tour; Princeton University Press: Princeton, NJ, 1961.
(2) Li, X.; Pęcak, D.; Sowiński, T.; Sherson, J.; Nielson, A. E. B.
Global optimization for quantum dynamics of few-fermion systems. Phys. Rev. A, , 033602.
(3) Shi, S.; Woody, A.; Rabitz, H. Optimal control of selective vibrational excitation in harmonic linear chain molecules. J. Chem. Phys., , 6870–6883.
(4) Shi, S.; Rabitz, H. Selective excitation in harmonic molecular systems by optimally designed fields. Chem. Phys., , 185–199.
(5) Peirce, A. P.; Dahleh, M. A.; Rabitz, H. Optimal control of quantum-mechanical systems: Existence, numerical approximation, and applications. Phys. Rev. A, , 4950–4964.
(6) Kosloff, R. Wavepacket dancing: Achieving chemical selectivity by shaping light pulses. Chem. Phys., , 201–220.
(7) Jakubetz, W. Theory of optimal laser pulses for selective transitions between molecular eigenstates. Chem. Phys. Lett., , 100–106.
(8) Brif, C.; Chakrabarti, R.; Rabitz, H. Control of quantum phenomena: past, present and future. New J. Phys., , 075008.
(9) Soley, M.; Markmann, A.; Batista, V. S. Steered quantum dynamics for energy minimization. J. Phys. Chem. B, , 715–727.
(10) Soley, M. B.; Markmann, A.; Batista, V. S. Classical Optimal Control for Energy Minimization Based on Diffeomorphic Modulation under Observable-Response-Preserving Homotopy. J. Chem. Theory Comput., , 3351–3362.
(11) Levinthal, C. How to Fold Graciously. Mossbauer Spectroscopy in Biological Systems: Proceedings of a meeting held in Allerton House, Monticello, Illinois. Monticello, Illinois, 1969; pp 22–24.
(12) Šali, A.; Shakhnovich, E.; Karplus, M. How does a protein fold? Nature, , 248–251.
(13) Wales, D. J.; Doye, J. P. K.; Miller, M. A.; Mortenson, P. N.; Walsh, T. R. Energy landscapes: From clusters to biomolecules. Adv. Chem. Phys., , 1–111.
(14) Dill, K. A.; Ozkan, S. B.; Shell, M. S.; Weikl, T. R. The Protein Folding Problem. Annu. Rev. Biophys., , 289–316.
(15) Fogel, L. J. Autonomous Automata. Industrial Research Magazine, , 14–19.
(16) Pincus, M. A Closed Form Solution of Certain Programming Problems. Oper. Res., , 690–694.
(17) Cavicchio, D. J., Jr. Adaptive search using simulated evolution. Ph.D. thesis, University of Michigan, Ann Arbor, MI, 1970.
(18) Pincus, M. A Monte Carlo Method for the Approximate Solution of Certain Types of Constrained Optimization Problems. Oper. Res., , 1225–1228.
(19) Holland, J. H. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence; University of Michigan Press, 1975.
(20) Kirkpatrick, S.; Gelatt, C. D., Jr.; Vecchi, M. P. Optimization by simulated annealing. Science, , 671–680.
(21) Černý, V. Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. J. Optim. Theory Appl., , 41–51.
(22) Li, Z.; Scheraga, H. A. Monte Carlo-minimization approach to the multiple-minima problem in protein folding. Proc. Natl. Acad. Sci. USA, , 6611–6615.
(23) Koza, J. R. Hierarchical Genetic Algorithms Operating on Populations of Computer Programs. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence IJCAI-89. Detroit, MI, 1989; pp 768–774.
(24) Koza, J. R. Genetic Programming: A Paradigm for Genetically Breeding Populations of Computer Programs to Solve Problems; 1990.
(25) Wales, D. J.; Doye, J. P. K. Global Optimization by Basin-Hopping and the Lowest Energy Structures of Lennard-Jones Clusters Containing up to 110 Atoms. J. Phys. Chem. A, , 5111–5116.
(26) Hooke, R.; Jeeves, T. A. "Direct Search" Solution of Numerical and Statistical Problems. JACM, , 212–229.
(27) Spendley, W.; Hext, G. R.; Himsworth, F. R. Sequential Application of Simplex Designs in Optimisation and Evolutionary Operation. Technometrics, , 441–461.
(28) Nelder, J. A.; Mead, R. A Simplex Method for Function Minimization. Comput. J., , 308–313.
(29) Land, A. H.; Doig, A. G. An automatic method of solving discrete programming problems. Econometrica, , 497–520.
(30) Little, J. D. C.; Murty, K. G.; Sweeney, D. W.; Karel, C. An Algorithm for the Traveling Salesman Problem. Oper. Res., , 972–989.
(31) Glover, F.; McMillan, C.; Novick, B. Interactive design software and computer graphics for architectural and space planning. Ann. Oper. Res., , 557–573.
(32) Glover, F. Future paths for integer programming and links to artificial intelligence; CAAI Report 85-8, 1985.
(33) Amara, P.; Hsu, D.; Straub, J. E. Global energy minimum searches using an approximate solution of the imaginary time Schroedinger Equation. J. Phys. Chem., , 6715–6721.
(34) Andricioaei, I.; Straub, J. E. Finding the needle in the haystack: Algorithms for conformational optimization. Comput. in Phys., , 449–454.
(35) Piela, L.; Kostrowicki, J.; Scheraga, H. A. The Multiple-Minima Problem in the Conformational Analysis of Molecules. Deformation of the Potential Energy Hypersurface by the Diffusion Equation Method. J. Phys. Chem., , 3339–3346.
(36) Pillardy, J.; Olszewski, K. A.; Piela, L. Performance of the Shift Method of Global Minimization in Searches for Optimum Structures of Clusters of Lennard-Jones Atoms. J. Phys. Chem., , 4337–4341.
(37) Fletcher, R.; Powell, M. J. D. A rapidly convergent descent method for minimization. Comput. J., , 163–168.
(38) Fletcher, R.; Reeves, C. M. Function minimization by conjugate gradients. Comput. J., , 149–154.
(39) Lee, E. T. Optimization by a Gradient Technique. Ind. Eng. Chem. Res., , 373–380.
(40) Broyden, C. G. Quasi-Newton Methods and Their Application to Function Minimisation. Mathematics of Computation, , 368–381.
(41) Broyden, C. G. The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations. J. Inst. Maths Applics, , 76–90.
(42) Goldfarb, D. A family of variable-metric methods derived by variational means. Mathematics of Computation, , 23–26.
(43) Shanno, D. F. Conditioning of quasi-Newton methods for function minimization. Mathematics of Computation, , 647–656.
(44) Byrd, R. H.; Lu, P.; Nocedal, J. A Limited Memory Algorithm for Bound Constrained Optimization. SIAM J. Sci. Stat. Comp., , 1190–1208.
(45) Morales, J. L.; Nocedal, J. Remark on Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization. ACM Trans. Math. Softw., , 7:1–7:4.
(46) Zhu, C.; Byrd, R. H.; Nocedal, J. Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization. ACM Trans. Math. Softw., , 550–560.
(47) Müntz, C. L. Solution directe de l'équation séculaire et de quelques problèmes analogues transcendents. Comptes Rendus Acad. Sci. Paris, , 43–46.
(48) v. Mises, R.; Pollaczek-Geiringer, H. Praktische Verfahren der Gleichungsauflösung. Z. Angew. Math. Mech., , 58–77.
(49) v. Mises, R.; Pollaczek-Geiringer, H. Praktische Verfahren der Gleichungsauflösung. Z. Angew. Math. Mech., , 152–164.
(50) Chatelin, F. Eigenvalues of Matrices; Society for Industrial & Applied Mathematics, U.S.: New York, NY, 2013.
(51) Kosloff, R.; Tal-Ezer, H. A direct relaxation method for calculating eigenfunctions and eigenvalues of the Schrödinger equation on a grid. Chem. Phys. Lett., , 223–230.
(52) Metropolis, N.; Ulam, S. The Monte Carlo Method. J. Am. Stat. Assoc., , 335–341.
(53) Donsker, M. D.; Kac, M. A Sampling Method for Determining the Lowest Eigenvalue and the Principal Eigenfunction of Schrödinger's Equation. J. Res. Natl. Bur. Stand., , 551–557.
(54) Anderson, J. B. A random-walk simulation of the Schrödinger equation: H3+. J. Chem. Phys., , 1499–1503.
(55) Reynolds, P. J.; Ceperley, D. M. Fixed-node quantum Monte Carlo for molecules. J. Chem. Phys., , 5593–5603.
(56) Greene, S. M.; Batista, V. S. Tensor-Train Split-Operator Fourier Transform (TT-SOFT) Method: Multidimensional Nonadiabatic Quantum Dynamics. J. Chem. Theory Comput., , 4034–4042.
(57) Lehtovaara, L.; Toivanen, J.; Eloranta, J. Solution of time-independent Schrödinger equation by the imaginary time propagation method. J. Comput. Phys., , 148–157.
(58) Bader, P.; Blanes, S.; Casas, F. Solving the Schrödinger eigenvalue problem by the imaginary time propagation technique using splitting methods with complex coefficients. arXiv, 1304.6845v2.
(59) Shani, E. Analysis and Numerical Performance of Methods of Solving the Time Independent Schrödinger Equation for Simulation in Strong-Field Physics. M.Sc. thesis, University of Colorado, 2017.
(60) Schwarz, L. R.; Alavi, A.; Booth, G. H. A Projector Quantum Monte Carlo Method for non-linear wavefunctions. Phys. Rev. Lett., , 176403.
(61) Khoromskij, B. N. O(d log N)-Quantics Approximation of N-d Tensors in High-Dimensional Numerical Modeling. Constr. Approx., , 257–280.
(62) Khoromskij, B. N.; Oseledets, I. V. DMRG+QTT approach to computation of the ground state for the molecular Schrödinger operator. ,
(63) Gavrilyuk, I.; Khoromskij, B. Quantized-TT-Cayley Transform for Computing the Dynamics and the Spectrum of High-Dimensional Hamiltonians. Comput. Methods Appl. Math., , 273–290.
(64) Oseledets, I.; Tyrtyshnikov, E. T. TT-cross approximation for multidimensional arrays. Linear Algebra Its Appl., , 70–88.
(65) Oseledets, I. V. Tensor-Train Decomposition. SIAM J. Sci. Comput., , 2295–2317.
(66) Östlund, S.; Rommer, S. Thermodynamic Limit of Density Matrix Renormalization. Phys. Rev. Lett., , 3537–3540.
(67) Savostyanov, D. QTT-rank-one vectors with QTT-rank-one and full-rank Fourier images. Linear Algebra Its Appl., , 3215–3224.
(68) Grover, L. K. A Fast Quantum Mechanical Algorithm for Database Search. Proceedings of the Twenty-eighth Annual ACM Symposium on Theory of Computing, Philadelphia, PA. New York, 1996; pp 212–219.
(69) Eiselt, H.; Sandblom, C.-L. Nonlinear Optimization: Methods and Applications; 2019.
(70) Bomze, I. M.; Demyanov, V.; Fletcher, R.; Terlaky, T. Nonlinear optimization; Lecture Notes in Mathematics; Springer-Verlag, Berlin; Fondazione C.I.M.E., Florence, 2010; Vol. 1989; pp xiv+279. Papers from the CIME Summer School held in Cetraro, July 1–7, 2007; edited by Gianni Di Pillo and Fabio Schoen.
(71) Aragón, F. J.; Goberna, M. A.; López, M. A.; Rodríguez, M. M. L. Nonlinear optimization; Springer Undergraduate Texts in Mathematics and Technology; Springer, Cham, 2019; pp xiv+350.
(72) Folland, G. B. Real analysis, 2nd ed.; Pure and Applied Mathematics (New York); John Wiley & Sons, Inc., New York, 1999; pp xvi+386; Modern techniques and their applications; A Wiley-Interscience Publication.
(73) Oseledets, I. oseledets/TT-Toolbox. 2020; .
(74) Golub, G. H.; Van Loan, C. F. Matrix computations, 4th ed.; Johns Hopkins Studies in the Mathematical Sciences; Johns Hopkins University Press, Baltimore, MD, 2013; pp xiv+756.
(75) Johansson, F., et al. mpmath: a Python library for arbitrary-precision floating-point arithmetic (version 0.18). 2013; http://mpmath.org/.
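The scripts above rely on ttpy for the QTT arithmetic. As a minimal, dependency-free sketch of the same logic (our own illustration, not part of the published code: the function names ipa_density and factor_biprime, the small prime grid, and the parameter values beta = 20 and nsteps = 50 are illustrative choices), the following NumPy code carries out the IPA recurrence ρ_{k+1} = Uρ_k/‖Uρ_k‖ with U = exp(-βV) and V(r) = N mod r on a dense grid, and resolves paired degenerate global minima with the mean-plus-Heaviside filter of Appendix D:

```python
import numpy as np

def ipa_density(N, r, beta=20.0, nsteps=50):
    """Power-iterate rho_{k+1} = U rho_k / ||U rho_k|| with U = exp(-beta*V),
    V(r) = N mod r, so the density concentrates on the prime factors of N."""
    V = np.mod(N, r)                  # 'potential': zero exactly at factors of N
    rho = np.ones_like(r)             # uniform initial distribution
    for _ in range(nsteps):
        rho = np.exp(-beta * V) * rho
        rho /= np.linalg.norm(rho)    # power-iteration normalization
    return rho**2 / np.sum(rho**2)    # probability distribution over the grid

def factor_biprime(N, primes, beta=20.0):
    """Resolve paired degenerate minima: the position expectation value lies
    between the two factors, and a Heaviside filter below that mean
    isolates the smaller one."""
    r = np.array(primes, dtype=float)
    rho2 = ipa_density(N, r, beta)
    mean = np.dot(r, rho2)                     # falls between the two minima
    masked = np.where(r <= mean, rho2, 0.0)    # Heaviside filter below the mean
    masked /= np.sum(masked)
    p = int(round(np.dot(r, masked)))          # smaller prime factor
    return p, N // p

print(factor_biprime(221, [2, 3, 5, 7, 11, 13, 17, 19]))  # 221 = 13 * 17
```

Because the dense grid stores all 2^d values explicitly, this sketch forgoes the QTT compression that makes the full method scalable; it only illustrates the logic of the iteration and of the degenerate-minima filter.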