Quantum System Compression: A Hamiltonian Guided Walk Through Hilbert Space
Robert L. Kosut,1,2 Tak-San Ho,2 and Herschel Rabitz2
1 SC Solutions, Sunnyvale, CA 94085
2 Department of Chemistry, Princeton University, Princeton, NJ 08544
(Dated: June 25, 2020)

We present a systematic study of quantum system compression for the evolution of generic many-body problems. The necessary numerical simulations of such systems are seriously hindered by the exponential growth of the Hilbert space dimension with the number of particles. For a constant
Hamiltonian system of Hilbert space dimension n whose frequencies range from f_min to f_max, we show via a proper orthogonal decomposition that, for a run-time T, the dominant dynamics are compressed in the neighborhood of a subspace whose dimension is the smallest integer larger than the time-bandwidth product ∆ = (f_max − f_min) T. We also show how the distribution of initial states can further compress the system dimension. Under the stated conditions, the time-bandwidth estimate reveals the existence of an effective compressed model whose dimension is derived solely from system properties and is not dependent on the particular implementation of a variational simulator, such as a machine learning system or quantum device. However, finding an efficient solution procedure is dependent on the simulator implementation, which is not discussed in this paper. In addition, we show that the compression rendered by the proper orthogonal decomposition encoding method can be further strengthened via a multi-layer autoencoder. Finally, we present numerical illustrations to affirm the compression behavior in time-varying Hamiltonian dynamics in the presence of external fields. We also discuss the potential implications of the findings for machine learning tools to efficiently solve the many-body or other high dimensional Schrödinger equations.

I. INTRODUCTION
Numerous recent studies [1–6] utilizing a flexible representation of a variational quantum state have been proposed based on artificial neural networks (ANN) for solving many-body quantum problems. These studies have shown a favorable polynomial scaling with respect to the system’s number of particles. Such findings are consistent across many fields [7–12]; despite system complexity in the underlying physics, much of the observed behavior is compressed, i.e., the dominant dynamics is manifested in significantly lower dimensions. Compression arises also in the search over the quantum control landscape as a favorable scaling of control complexity [13, 14].

By compression of quantum system dynamics we mean a simulator that has these two features: (1) the simulator has a reduced number of variables that do not scale exponentially with the number of quantum particles (or, the number of simulator variables is exponentially smaller than the Hilbert space dimension), and (2) the state error, upon using the simulator, remains satisfactory for the intended purpose. In this context, we present a time-bandwidth product which reveals the existence of a compressed system which satisfies the stated features. Naturally, the level of compression in (1) above is inversely related to the degree of dynamical error tolerated in (2). The title of the paper can thus be understood: the compression occurs in the system’s Hilbert space, guided by particular characteristics of the Hamiltonian involved, as shown in the main body of the paper. The key time-bandwidth product is reminiscent of “Hartley’s Law” [15, 16], referenced in [17] in relation to optimal control complexity. This potential compression of quantum dynamics depends only on the system properties, including the range of the eigenvalues of the Hamiltonian, the simulation run-time, and the initial state.
Though the compression is not dependent on the particular method of simulation, achieving a similar level of compression is expected to be dependent on the simulator implementation. Additionally, we show that the compression rendered by proper orthogonal decomposition can be further reduced via a multi-layer autoencoder. We remark that the compressibility analysis in this paper will mainly be carried out for constant Hamiltonians, as will the autoencoder technique for estimating the reduced dimensionality. Numerical illustrations for time-varying Hamiltonian dynamics in external (control) fields will also be presented to support the pervasive nature of system compression behavior. Evidence further supporting this result comes from the tested Hamiltonians, ranging from many-body coupled spin systems to those chosen randomly.

The paper is written in a style of introducing concepts along with the associated mathematical formulation as well as clarifying numerical illustrations throughout the text to best express the various aspects of quantum system compression as they naturally arise. The paper is organized as follows: Section II introduces the variational state problem, while Section III describes the method of proper orthogonal decomposition for obtaining linear variational states. Section IV focuses on constant Hamiltonian dynamics with an example provided in Section V. Sections VI and VII present the main theoretical results, followed by particular numerical simulation tests in Section VIII. The utilization of an autoencoder to enhance compression is presented in Section IX. Additional numerical illustrations for time-varying dynamics in external fields are presented in Section X, and we present extensions on compression for unitary dynamics and for nonlinear frequency sweeps in Section XI. A discussion of the findings in the paper is given in Section XII. Finally, details of particular derivations are given in the Appendix.

II. VARIATIONAL STATE
The goal is to simulate the quantum state ψ_t ∈ C^n for t ∈ [0, T] by a variational quantum state x(θ_t) ∈ C^n, a function of a time-varying parameter θ_t ∈ C^m where m < n. We refer to θ_t as the compressed state and to m as the compression, respectively. The effective “small volume in Hilbert space” referred to in [8] and exposed in [9] is akin to the compressed state discussed here. In a machine learning system, such as a neural network, the “parameters” are the weights that connect all the layers. These weighting parameters are not necessarily the same as the compressed dimension or the associated state variables, though they may coincide depending on the simulator implementation. In this note we leave the simulator parametrization and implementation unspecified and seek to show that generally there are fewer effective compressed states (m) than the dimension of the full Hilbert space (n). Irrespective of the implementation, it is assumed that the variational state x(θ_t) is a smooth function of the time-varying parameter vector θ_t, and as a result,

  ẋ(θ_t) = G(θ_t) θ̇_t ,  G(θ) = ∇_θ x(θ) ∈ C^{n×m},  (1)

where G(θ_t), as indicated, is a gradient matrix.

A. Variational formulations
In the most general case the system to be simulated is the time-varying quantum system,

  i ψ̇_t = H_t ψ_t ,  t ∈ [0, T].  (2)

The data available for simulation is the initial state ψ_0 ∈ C^n and the time-varying Hamiltonian H_t ∈ C^{n×n}, a description that encompasses a quantum system under a known control or external field. The variational optimization problem that we seek to solve is to minimize the following functional,

  E_st[θ] = (1/T) ∫_0^T ‖ψ_t − x(θ_t)‖² dt.  (3)

The qualifying phrase “seek to solve” is stated because the actual state flow {ψ_t, t ∈ [0, T]} is not available in the simulation context. If it were available then there would be no need for a variational version, unless one seeks to post facto find a low dimensional representation. In general, minimizing E_st[θ] is not possible without the actual state. As in [1] for variational simulation, there is a means to query as necessary the available data: the initial state ψ_0 and the time-varying Hamiltonian {H_t, t ∈ [0, T]}. The variational problem is then pragmatically posed to minimize a functional such as,

  E_eq[θ] = (1/T) ∫_0^T ‖H_t x(θ_t) − i G(θ_t) θ̇_t‖² dt.  (4)

Depending on the context we will refer to E_st[θ] or its integrand as the state error, and to E_eq[θ] or its integrand as the equation error. To construct a variational quantum simulator in a classical device using only the available data (e.g., ψ_0 ∈ C^n, {H_t ∈ C^{n×n}, t ∈ [0, T]}) is equivalent to minimizing an equation error functional such as (4). This is clearly a necessary surrogate for the ideal goal: minimizing the state error (3).

B. Compression
If the variational solution is perfect then x(θ_t) = ψ_t, ∀t ∈ [0, T], from which it follows that both state and equation errors (and associated functionals) are zero. As a consequence the variational parameter θ_t must satisfy,

  G(θ_t) θ̇_t + i H_t ψ_t = 0 ,  x(θ_0) = ψ_0 ,  t ∈ [0, T].  (5)

These equations for finding θ̇_t ∈ C^m with m < n are over-determined, i.e., more equations (n) than variables (m). As a result, (5) will hold only if H_t ψ_t (or i ψ̇_t) is a linear combination of the columns of G(θ_t). In particular, a solution for θ̇_t to solve (5) will exist if and only if, for t ∈ [0, T],

  (I_n − Γ_m(θ_t)) H_t ψ_t = 0 ,  Γ_m(θ_t) = G(θ_t) G(θ_t)⁺ ∈ C^{n×n},  (6)

where G(θ_t)⁺ ∈ C^{m×n} is the pseudo-inverse of G(θ_t). Note that the properties of the pseudo-inverse yield Γ_m(θ_t)² = Γ_m(θ_t), i.e., it is idempotent. Condition (6) is necessary and sufficient for the variational state to provide a perfect simulation, i.e., x(θ_t) and the true state ψ_t are identical.

For a perfect simulation, and clearly for an imperfect but very good simulation, especially with m ≪ n, or perhaps more importantly, where m does not scale exponentially with the number of particles as does n, the state flow ψ_t from (2) must be inherently compressible. To understand the basic foundations of quantum system compression, for now we focus on a time-invariant quantum system, meaning that H_t = H is a constant matrix; the time-dependent extension will be returned to in Section X. In addition we restrict attention to a linear variational state, that is, the variational state gradient is a constant matrix: G(θ_t) = M ∈ C^{n×m}, m < n. The introduction of nonlinearity into the compression process will be treated with an autoencoder in Section IX.
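To make condition (6) concrete, the following sketch (illustrative only; the dimensions n = 64 and m = 8 and the random Hamiltonian are arbitrary choices, not the paper's examples) builds a linear variational state whose gradient columns span the eigenvectors carrying the initial state, then verifies that Γ_m = G G⁺ is idempotent and that (I_n − Γ_m) H ψ_0 vanishes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 8

# A random constant Hamiltonian (Hermitian), standing in for H.
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (A + A.conj().T) / 2
w, V = np.linalg.eigh(H)

# Linear variational state x(theta) = M theta: constant gradient G = M,
# whose columns are m eigenvectors of H.
M = V[:, :m]
Gamma = M @ np.linalg.pinv(M)        # projector Gamma_m = G G^+

# Idempotence from the properties of the pseudo-inverse.
assert np.allclose(Gamma @ Gamma, Gamma)

# Initial state confined to the chosen eigenspace: H psi_0 then stays in
# range(M), so condition (6) holds and the simulation can be perfect.
psi0 = M @ rng.standard_normal(m)
psi0 = psi0 / np.linalg.norm(psi0)
residual = np.linalg.norm((np.eye(n) - Gamma) @ (H @ psi0))
print(residual)  # numerically zero: compression to m = 8 is exact here
```

Replacing psi0 by a fully random state makes the residual order one, illustrating why compressibility depends on how the initial state projects onto the Hamiltonian eigenvectors.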
Assuming that the eigenvalues sweep linearly from their lowest to highest values, we show that it is the system properties alone which allow the state and equation error measures to show a favorable scaling of compression m with system dimension n. Our analysis provides a means to assess the compression range that might be achievable from a more generally flexible variational state parametrization, i.e., with a time-varying nonlinear gradient G(θ_t). As in numerous studies showing favorable compression, we are led to the assertion that a variational quantum solution of a time-invariant quantum system is possible with its compression dependent solely on the system parameters: Hamiltonian eigenvalues, initial state, and simulation run time. However, the demonstrated existence of system compression shown in this paper still leaves the challenge to explicitly exploit it for the performance of dynamical simulations (i.e., especially for high dimensional many-body situations) in, for example, a machine learning format.

III. LINEAR VARIATIONAL STATE
A common means of obtaining a linear variational state is the method of proper orthogonal decomposition (POD), often referred to equivalently as principal component analysis (PCA) [18]. There are several variants of POD; here we use an unbiased version where the variational state is set to x(θ_t) = M θ_t with gradient matrix M ∈ C^{n×m} where m < n [19]. The optimization variables M and θ_t are selected to minimize a state error functional chosen as (3):

  minimize  E_st^pod = (1/T) ∫_0^T ‖ψ_t − M θ_t‖² dt,
  subject to  M ∈ C^{n×m}, θ_t ∈ C^m, t ∈ [0, T].  (7)

The well known optimal solution is,

  θ_t = M_m† ψ_t ,  M_m = arg min_{M ∈ C^{n×m}} Tr (I_n − M M†) C[ψ],  (8)

where M_m ∈ C^{n×m} is found via a singular value decomposition of the n × n positive semi-definite “covariance” matrix,

  C[ψ] = (1/T) ∫_0^T ψ_t ψ_t† dt = [M_m M_{n−m}] diag(Q_m, Q_{n−m}) [M_m M_{n−m}]† ,  (9)

with singular values arranged (as usual) in descending order in Q_m = diag(σ_1(C), …, σ_m(C)) and Q_{n−m} = diag(σ_{m+1}(C), …, σ_n(C)), and where [M_m M_{n−m}] ∈ U(n). The resulting optimal (POD) variational state is,

  x_t = Γ_m ψ_t ,  Γ_m = M_m M_m† ∈ C^{n×n},  (10)

which produces the minimum state POD error measure: the sum of the smallest n − m singular values of the covariance matrix, i.e.,

  E_st^pod = Tr (I_n − Γ_m) C[ψ] = Σ_{i=m+1}^n σ_i(C).  (11)

Because the matrix M_m is part of a unitary matrix, its m columns are orthonormal vectors in C^n and thus M_m† M_m = I_m. The columns of M_m form a basis set for the compressed state θ_t, making Γ_m idempotent, i.e., Γ_m² = Γ_m. Combining (10) with (4) we get the corresponding POD equation error measure,

  E_eq^pod = (1/T) ∫_0^T ‖(H_t Γ_m − Γ_m H_t) ψ_t‖² dt.  (12)
A typical measure of model reduction error is the relative error in the cumulative sum of singular values of the covariance matrix compared to the sum of all the singular values:

  ε_m(C) = 1 − Σ_{i=1}^m σ_i(C) / Σ_{i=1}^n σ_i(C) = Σ_{i=m+1}^n σ_i(C) / Σ_{i=1}^n σ_i(C).  (13)

To predict compression quantitatively for a quantum system, we make an obvious restriction, as discussed next, which leads to the time-bandwidth product presented in Section VII. Later, in Section X, we will discuss the compression effect with a time-varying Hamiltonian and suggest how a compression estimate can be obtained.

IV. TIME-INVARIANT HAMILTONIAN DYNAMICS
Consider a time-invariant Hamiltonian driving the quantum dynamical system,

  i ψ̇_t = H ψ_t ,  t ∈ [0, T].  (14)

Since the Hamiltonian H ∈ C^{n×n} is constant, the standard eigenvalue decomposition yields,

  H = V Ω V† ,  V ∈ U(n), Ω = diag(ω), ω ∈ R^n.  (15)

Subsequently the state flow ψ_t can be expressed variously as,

  ψ_t = V e^{−itΩ} V† ψ_0 = V f_t ,
  f_t = [α_1 e^{−iω_1 t} ⋯ α_n e^{−iω_n t}]^T ,
  α = V† ψ_0 ,  (16)

where α ∈ C^n, ‖α‖ = 1, is the projection of the initial state onto the natural basis of the Hamiltonian. As shown in Appendix A, the singular values of the POD covariance matrix (9) are identical with those of a positive semidefinite matrix R ∈ C^{n×n} which has the decomposition,

  R = diag(α) S diag(α)† ,
  S = sinc(Λ) ,  Λ_{kℓ} = (ω_k − ω_ℓ) T/2 ,  k, ℓ = 1, …, n.  (17)

We will refer to R as a “covariance” matrix and to S as the “sinc” matrix, with respective elements,

  R_{kℓ} = |α_k|² for k = ℓ ,  (α_k α_ℓ*) sinc Λ_{kℓ} for k ≠ ℓ ,
  S_{kℓ} = 1 for k = ℓ ,  sinc Λ_{kℓ} for k ≠ ℓ .  (18)

The corresponding normalized singular value errors in the form of (13) for m = 1, …, n are,

  ε_m(R) = 1 − Σ_{i=1}^m σ_i(R) ,
  ε_m(S) = 1 − Σ_{i=1}^m σ_i(S)/n ,  (19)

where the cumulative sums of singular values are normalized by their respective singular value totals: Tr R = ‖α‖² = ‖ψ_0‖² = 1 and Tr S = n. Though the calculation of the elements of R and S does not require any dynamical simulation, to use these expressions necessitates obtaining an eigenvalue decomposition of the system Hamiltonian, knowledge of the initial state, and calculating the singular values of R and S. We can, however, without doing any such decompositions, assert some generic properties of compression. Before discussing these, it is worthwhile to do an example.

V. EXAMPLE: TIME-INVARIANT HAMILTONIAN SPIN SYSTEM
Consider the transverse-field Ising (TFI) n_spin system with Hamiltonian,

  H_TFI = −h Σ_{i=1}^{n_spin} σ_i^x − Σ_{i=1}^{n_spin−1} σ_i^z σ_{i+1}^z .  (20)

This system was studied in [1]; here we use the same model parameters and initial state choices, but only for a limited number of spins. The upper plot (a) in Figure 1 shows the eigenvalues of H_TFI with TFI parameter h = 2 for n_spin ∈ {10, 12}, ergo, Hilbert space dimension n = 2^{n_spin} ∈ {1024, 4096}. The eigenvalues clearly sweep uniformly from max to min and sum to zero, thus forming two groups of n/2 positive eigenvalues and n/2 negative eigenvalues with each group having the same magnitudes: ω ∈ {−ω_1, …, −ω_{n/2}, ω_1, …, ω_{n/2}}. Plots (b)-(c) in Figure 1 correspond to the two spin examples n_spin ∈ {10, 12}, each with run time T = 2. In each of the (b)-(c) plots, the two sub-plots on the right show, respectively, the RMS values over t ∈ [0, T] of both state error and equation error, i.e., ‖ψ − x‖_rms and ‖Hx − iẋ‖_rms. (These are the square-roots, respectively, of E_st^pod and E_eq^pod from (11)-(12).) The left sub-plots of (b)-(c) show the relative error in the cumulative sum of singular values of the covariance matrix R (blue dots) and the sinc matrix S (red dots), both calculated from (18). For comparison with the RMS error measures we use the square roots of the cumulative singular value errors of R and S from (19), namely, √ε_m(R) and √ε_m(S). The error measures are all computed for H_TFI with h = 2, and for each of two initial states: one fixed at the ground state of H_TFI with h = 4 (blue dots), and one with the initial state randomly selected (green dots). (Only results for the ground state initialization were presented in [1].)
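The eigenvalue structure just described is easy to reproduce. The sketch below (an illustration, not the paper's code) assembles H_TFI of (20) from Kronecker products and computes its spectral range and the resulting time-bandwidth product ∆ = (f_max − f_min)T from the abstract, with f = ω/2π; a smaller chain of n_spin = 8 is assumed here purely to keep the dense diagonalization cheap:

```python
import numpy as np
from functools import reduce

def kron_chain(ops):
    """Kronecker product of a list of single-site operators."""
    return reduce(np.kron, ops)

def tfi_hamiltonian(n_spin, h):
    """Transverse-field Ising Hamiltonian (20), open boundary conditions."""
    I = np.eye(2)
    sx = np.array([[0.0, 1.0], [1.0, 0.0]])
    sz = np.array([[1.0, 0.0], [0.0, -1.0]])
    n = 2 ** n_spin
    H = np.zeros((n, n))
    for i in range(n_spin):           # -h * sum_i sigma_x^i
        ops = [I] * n_spin
        ops[i] = sx
        H -= h * kron_chain(ops)
    for i in range(n_spin - 1):       # - sum_i sigma_z^i sigma_z^{i+1}
        ops = [I] * n_spin
        ops[i] = sz
        ops[i + 1] = sz
        H -= kron_chain(ops)
    return H

H = tfi_hamiltonian(8, h=2.0)         # 8 spins: n = 256, fast to diagonalize
w = np.linalg.eigvalsh(H)             # eigenvalues in ascending order
T = 2.0
delta = (w[-1] - w[0]) * T / (2 * np.pi)   # time-bandwidth product
m_tbw = int(np.ceil(delta))
print(delta, m_tbw)
```

Since every term is a Kronecker product of traceless Paulis, Tr H_TFI = 0, so the eigenvalues sum to zero, consistent with the symmetric sweep seen in Figure 1(a).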
As seen in Figure 1, the selection of a random initial state (green dots in the singular value plots and green triangles in the error measure plots) increases the variational compression required to achieve the same relative model error, though the increases are not dramatic.

Figure 2 displays the TFI data in histograms of the log of the absolute values of all elements of the state error matrix (I_n − Γ_m) C[ψ] from (11) for n = 2^{10} and for the selected compressions shown. The decrease in error magnitudes is quite dramatic for compressions greater than 15. This value is effectively predicted by the time-bandwidth product, i.e.,

FIG. 1.
TFI. All runs with h = 2 (20), run-time T = 2, 10 and 12 spins (eigenvalues in (a)), and two initial states: one at the ground state of H_TFI with h = 4 (blue dots and triangles in (b)-(c)); one randomly selected (green dots and triangles in (b)-(c)). The left sub-plots in (b)-(c) show the relative RMS error (square-root of (19)) of the covariance matrix R (blue dots) and the sinc matrix S (red dots) from (18) for compressions m = 1, …. Lower right sub-plots in (b)-(c) show errors for four selected compressions m.

∆ = (f_max − f_min) T ≈ 14. Clearly there is significant dynamic compression: not only is the level small compared to the system dimension, moreover, there is little increase in compression with increasing number of spins. In both cases shown a compression of order m = 25 suffices to produce very small state and equation

FIG. 2.
Histograms of state errors
The log of the absolute values of all elements of the state error matrix (I_n − Γ_m) C[ψ] from (11) for n = 2^{10} and for the selected compressions shown.

FIG. 3.
Robustness of RMS magnitude of POD error time responses: (a) change to run time T = 4 from variational states optimized for run time T = 2, and (b) small perturbations to the initial state.

errors.

The compression order selections (m) are based on the initial state and system dynamics over the specified simulation time interval t ∈ [0, T] with T = 2. The left plot (a) in Figure 3 shows what happens when the run time is extended to T = 4 for compressions m ∈ {14, 21}. As expected, outside of the design range of t ∈ [0, 2], the state RMS error as a function of time dramatically increases. In the right plot (b) the initial state differs from that used in the SVD of the covariance matrix, and as a result the state RMS error deteriorates over the whole time interval. Two levels of initial state perturbation ∆ψ_0 are shown. For m = 14 there is no significant change, whereas for m = 21 the change is considerable, more than two orders of magnitude increase in error. The difference in robustness to the initial state may be attributed to the nominal error magnitude, i.e., the state error for m = 14 is much larger than that for m = 21, and hence the latter is more sensitive to changes in the initial state.

VI. RUN-TIME AND SINGULAR VALUE BOUNDS
In the previous examples the relative singular value error of S, the sinc matrix, provides an upper bound on the errors of the covariance matrix R, and these errors are close when the initial state is random. This suggests that the efficacy of a variational quantum state of a time-invariant quantum system (14) can be obtained from the singular values of R (equivalently C) and the sinc matrix S (17)-(18). To see this we first examine some qualitative properties at the extremes of run time T.

A. Run-time
In the limit as the run-time T goes to infinity, the sinc matrix approaches the identity, hence,

  lim_{T→∞} σ(R) = σ(diag(|α|²)).  (21)

The singular values of R become the sorted squared magnitudes of α, the initial state expressed via the Hamiltonian eigenvectors. These are the diagonal elements of R; all the other elements tend to zero. Compression to m < n in this case requires that the last n − m elements of α be much smaller than the first m elements. At the opposite end, as the run-time T goes to zero, R → αα†, thus,

  lim_{T→0} σ(R) = { Σ_{k=1}^n |α_k|², 0, …, 0 } = { 1, 0, …, 0 }.  (22)

Note that there is only one non-zero singular value at ‖α‖² = ‖ψ_0‖² = 1. A reduction to a model with dimension one is certainly extreme, but expected. The message to take here is that the dynamics becomes further compressed as run-time decreases. In effect, for small run-times not very much of the space gets filled out beyond where the state started.

The extreme run-time scenarios reflect the fact that since α = V† ψ_0, compression depends on how the initial state is projected onto the eigenvectors of the Hamiltonian. In the previous examples (Figure 1), when the initial state is prepared nearby the ground state, the error measure for R is significantly smaller than for S because the initial state is localized in the Hilbert space. For a random initial state we see that the singular value errors for R and S are close since the initial state now is more spread out.

B. Singular value bounds
More quantitative insights can be revealed using a standard singular value inequality for the product of matrices. From the relation of R and S (17), the singular values of R are bounded by,

  σ_k(R) ≤ min { ‖α‖_∞² σ_k(S), ‖S‖ σ_k(α)² },  k = 1, …, n.  (23)

Here we use the notation σ_k(α) to mean σ_k(diag α), so σ_k(α) assumes that the elements of α have been reordered in decreasing magnitude. Another well known inequality follows, namely,

  rank R ≤ min { rank S, rank α }.  (24)

This rank inequality is only useful if S has some zero singular values and/or α has some zero elements. The latter can occur when the initial state lies completely in a lower dimensional subspace, e.g., [20]. Though this is generally unlikely, as observed in the previous spin system examples, compression is possible because many singular values of R are nearly zero, possibly driven by the localized initial state lying dominantly in a low dimensional subspace. For example, suppose that no elements of α are zero and that σ_k(S) ≈ 0 for k > m with m ≪ n. Then σ_k(R) ≈ 0 for k > m, and thus m is the maximum variational compression. In this case the compression order is bounded by the number of non-zero (or non-small) singular values of S, the sinc matrix. This property is clearly seen in the numerical results of Figure 1 with the TFI example. We see this in many other cases that we have run: the sinc matrix error bounds the covariance matrix error, i.e., ε_m(R) ≤ ε_m(S), or equivalently,

  Σ_{i=1}^m σ_i(S)/n ≤ Σ_{i=1}^m σ_i(R).  (25)

Both sides of this inequality are norms, specifically Ky Fan m-norms respectively of S/n and R. From the Ky Fan Dominance Theorem [21] the above will hold for all m = 1, …, n if and only if ‖S/n‖ ≤ ‖R‖ for every unitarily invariant norm.
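These bounds are straightforward to probe numerically. The sketch below (an illustration with an arbitrary random frequency set and random initial state, not the paper's spin data, and with the squared-norm exponents as written in (23)) builds S and R from (17), checks the product bound σ_k(R) ≤ ‖α‖_∞² σ_k(S), and reports whether the empirical Ky Fan dominance (25) happens to hold for this draw:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 200, 2.0

# Frequencies of a hypothetical constant Hamiltonian, then the sinc
# matrix of Eq. (17): Lambda_kl = (w_k - w_l) T / 2.
w = np.sort(rng.uniform(-10.0, 10.0, n))
Lam = np.subtract.outer(w, w) * T / 2
S = np.sinc(Lam / np.pi)             # np.sinc(x) = sin(pi x)/(pi x)

# Random normalized initial state expressed in the eigenbasis.
alpha = rng.standard_normal(n) + 1j * rng.standard_normal(n)
alpha /= np.linalg.norm(alpha)
R = np.diag(alpha) @ S @ np.diag(alpha).conj().T   # Eq. (17)

sR = np.linalg.svd(R, compute_uv=False)   # descending order
sS = np.linalg.svd(S, compute_uv=False)

# Product bound on singular values, cf. Eq. (23): always true.
assert np.all(sR <= np.max(np.abs(alpha)) ** 2 * sS + 1e-12)

# Empirical Ky Fan dominance (25): partial sums of sigma(S)/n vs sigma(R).
# The text notes this can occasionally fail, so we only report it.
dominance = np.all(np.cumsum(sS) / n <= np.cumsum(sR) + 1e-12)
print(dominance)
```

Note that numpy's sinc is the normalized sin(πx)/(πx), hence the division by π to recover sin(Λ)/Λ; the diagonal Λ = 0 correctly gives S_kk = 1, so Tr R = ‖α‖² = 1 as in (19).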
Only in very few cases have we seen this behavior violated. Nevertheless, at the moment (25) remains a sufficient condition for a general frequency sweep. For a linear sweep approximation we can make stronger statements.

VII. LINEAR FREQUENCY SWEEP
In many quantum systems, such as just observed for the spin system example, the eigenvalues of the Hamiltonian are almost linear from minimum to maximum, or reasonably approximated as such over a large portion of the range (see Figure 1). Under the assumption of a linear sweep of eigenvalues, the sinc matrix S (17) becomes a symmetric (real) Toeplitz matrix,

  S_lin = sinc Λ_lin ∈ R^{n×n} ,
  Λ_lin,kℓ = (ω_k − ω_ℓ) T/2 = ((k − ℓ)/(n − 1)) ∆π ,  k, ℓ = 1, …, n,
  ∆ = (ω_max − ω_min) T / 2π.  (26)

The non-dimensional variable ∆, referred to here as the “time-bandwidth product,” will be seen to play the key role in establishing an approximate upper bound on the variational model order. The covariance matrix corresponding to S_lin follows from the form of (17),

  R_lin = diag(α) S_lin diag(α)† ,  (27)

in which R_lin is a Hermitian matrix but not generally of Toeplitz form. The spin system results in Figure 4 are of sufficiently small size so that the singular values of the R, S, S_lin, and R_lin matrices can be directly calculated on a standard laptop. For large dimensions, this becomes infeasible. Fortunately, even for very large n, the singular values of S_lin can be approximated by taking the discrete Fourier transform (DFT) of a column of a related circulant matrix, for which there are a number of versions, all resulting in asymptotic approximations of the eigenvalues [22, 23]. A similar procedure was utilized in [24]. In the case here, with a very specific (sinc) function forming the symmetric Toeplitz matrix elements, we can appeal to a more direct result in [25, 26] on the asymptotic distribution of the Toeplitz eigenvalues. As shown in Appendix B, for large n the singular values of the Toeplitz sinc matrix σ_i(S_lin), i = 1, …
, n, are well approximated by,

  σ_i(S_lin) ≈ (n − 1)/∆ for i < m_tbw ,  0 for i ≥ m_tbw ,  (28)

where the index value m_tbw, under the conditions of a linear frequency sweep, is the level of compression,

  m_tbw = ⌈ (n/(n − 1)) ∆ ⌉ ,  ∆ = (f_max − f_min) T.  (29)

The frequency spread is expressed here in Hz using ω = 2πf. The corresponding covariance error (19) is,

  ε_k(S_lin) = 1 − Σ_{i=1}^k σ_i(S_lin) / Σ_{i=1}^n σ_i(S_lin)
   ≈ 1 − k/m_tbw for k < m_tbw ,  0 for k ≥ m_tbw .  (30)

Note that one could multiply the constant Hamiltonian H by a and divide the run-time T by a and get the same value for m_tbw. This is a trivial scaling for a constant Hamiltonian; however, as we will see later (Section X), this simple scaling does not apply for a time-varying Hamiltonian, where m_tbw is interpreted differently to account for the observed compression.

At the compression level m_tbw, the variational state errors are either very small or rapidly decreasing for m > m_tbw. For quantum systems with n ≫ 1 we can take m_tbw = ⌈∆⌉. Let S̄_lin denote the ideal sinc matrix with exactly m_tbw = ⌈∆⌉ non-zero constant singular values,

  σ_k(S̄_lin) = n/∆ for k ≤ ⌈∆⌉ ,  0 for k > ⌈∆⌉ .  (31)

Let R̄_lin denote the corresponding covariance matrix,

  R̄_lin = diag(α) S̄_lin diag(α)† .  (32)

Application of the singular value bound (23) results in,

  σ_k(R̄_lin) ≤ (n/∆) σ_k(α)² for k ≤ ⌈∆⌉ ,  σ_k(R̄_lin) = 0 for k > ⌈∆⌉ .  (33)

This result ensures that for a linear frequency sweep and large n the Ky Fan Dominance Theorem holds for the ideal pair (R̄_lin, S̄_lin), namely, σ_k(S̄_lin/n) ≤ σ_k(R̄_lin), because ‖α‖_∞² ≥ 1/n since ‖α‖ = 1. Thus for a linear sweep, the ideal relative covariance error is bounded by the relative sinc error. Further, if for all elements |α_i|² = 1/n, i = 1, …
, n, then both the singular values of R̄_lin and S̄_lin as well as their relative errors coincide. This tendency was observed for random initial states in the spin example and is seen more pointedly in the numerical simulations presented next.

Note that the time-bandwidth product m_tbw is not independent of the system dimension n. For example, the spin system Hamiltonian (20) frequency range is linear in the number of spins n_spin, and since n = 2^{n_spin}, it follows that m_tbw ∼ log n. More specifically,

  m_tbw = ⌈ (log n / log n_0) ∆_0 ⌉  (34)

where ∆_0, n_0 are chosen nominal values for comparison with n ≥ n_0. For example, with n_0 = 2^{10} and m_0 = ⌈∆_0⌉, if n = 2^{20} then m_tbw = 2 m_0, or if n = 2^{80} then m_tbw = 8 m_0, and so on. More generally, if the frequency range scales as a polynomial function of the number of two-level particles, say ∼ N^k, then m_tbw ∼ (log n)^k. Such behavior is still a very favorable scaling with respect to the exponential scaling of the system dimension with the number of quantum particles.

VIII. NUMERICAL RESULTS

A. TFI system
Figure 4 shows plots of the normalized singular values of R, S, S_lin and S̄_lin for H_TFI (20) with h = 2 for 10 and 12 spins and with two initial states: (a,c) at the ground state for h = 4 and (b,d) random. Both R and S are calculated using the actual (nonlinear) eigenvalue sweep of H_TFI, whereas S_lin uses a linear sweep over the same range. The text boxes with m_tbw = 14 and m_tbw = 17 show the ideal predicted compression for the two spin cases assuming a linear frequency sweep. The ratio 17/14 ≈ 1.2 follows the log-scaling (34) with log 4096 / log 1024 = 1.2, which obviously is the ratio of spins 12/10. For 80 spins the compression estimate increases by a factor of 8 over 10 spins to m = 112, dramatically low compared to n = 2^{80}, and so on. The dashed-line rect-function next to these boxes is the ideal singular value function given by (B1): a constant until index k = m_tbw, which then drops to zero thereafter. The compression level where the singular values of both S and S_lin drop to essentially zero is almost identical to that predicted by (29). Additionally, the actual singular values of S_lin follow those of the ideal S̄_lin, tending more closely as n increases from 1024 to 4096.

FIG. 4. Normalized singular values of R, S and S_lin for H_TFI (20) with h = 2 for 10 and 12 spins and with initial states at the ground state for h = 4 ((a,c)) and random ((b,d)). The rect-function (dashed lines) is S̄_lin from (31) with associated singular values as indicated by the text boxes m_tbw = 14 and m_tbw = 17.

The break points (i.e., compression level) where the singular values drop significantly, as predicted by the sinc matrix S, or the linear sweep matrix S_lin, can differ from, and exceed, those of the covariance matrix R (or R_lin), whose break points are generally smaller, especially from a ground state. They adhere closely to S (or S_lin) for a random initial state. The former is to be expected considering that the sinc matrix, S, contains no information about any correlations with the initial state. We also see that the linear frequency sweep approximation, resulting in the symmetric Toeplitz matrix S_lin and the corresponding covariance R_lin, are in agreement with these findings.

B. POD via Snapshot
Figure 5 shows a further comparison with n_spin ∈ {10, 12}, where the singular values are computed from the "snapshot" version of the covariance matrix (9),

Ψ = [ψ_{t_1} · · · ψ_{t_K}] ∈ C^{n×K},   (35)

where K = 200 uniformly spaced time samples are taken over the same simulation run-time T = 2 as in the previous TFI examples. This gives a sampling rate many times the maximum Hamiltonian frequency.

FIG. 5. Singular values of the snapshot matrix (35) for the H_tfi system with n_spin ∈ {10, 12}, each initialized from a random state.

FIG. 6. Random Hamiltonian with n = 2000; initial states confined to a random selection of 15, 30, and 60 eigenvectors of the Hamiltonian.

For both spin settings the dynamics are initialized with a random state. The singular values shown are the squares of the singular values of the n × K snapshot matrix Ψ; these approximate those of the covariance matrix, i.e., C([ψ]) ≈ (T/K)ΨΨ†. The time-bandwidth predicted compression levels m_tbw ∈ {14, 17} are again validated by the data and also follow the log scaling (34) for these spin systems, i.e., compression ratio 17/14 ≈ 1.21 compared to spin ratio 12/
10 = 1.2.

C. Random Hamiltonians
We generated many random Hamiltonians of dimension n = 2000 with a variety of distributions of elements. The results consistently confirm that the largest compression value lies in the range predicted by the time-bandwidth product, with significant further reduction dependent on how the initial state is distributed amongst the subspaces defined by the Hamiltonian eigenvectors. Figure 6 depicts a typical result with a random Hamiltonian of dimension n = 2000. For the three examples shown, we confined the random initial states to be linear combinations of 15, 30, and 60 eigenvectors of the Hamiltonian. (Of course no initial state would be exactly so confined; this illustrates the effect.) As expected, the confinement of the initial state to a further compressed subspace is seen in the middle plot, where the sorted magnitudes of the unit vector (16) (α = V†ψ) clearly drop to zero at exactly 15, 30, and 60. The time-bandwidth product predicts a maximum compression of m_tbw = 32, which is confirmed by the singular value errors shown in the lower plot. What is noteworthy is that for the case of the initial state confined to a 60-dimensional subspace, the time-bandwidth product bounds the covariance-level compression, and the true covariance error (the same as e_sv(R)) begins to approach the sinc matrix error. In other words, the "sinc" matrix singular values dominate the onset of compression. A fully random initial state will cause the covariance singular values to line up with those of the sinc matrix, which bounds the compression. As seen in the lower plot, as the initial state subspace dimension increases, the errors get larger and, as expected, do not exceed the sinc matrix errors.

To emphasize this point, a variety of full-length (i.e., n = 2000) random initial states were tested, and the compression level matched that predicted by the time-bandwidth measure. An interesting point is that the structured Hamiltonian many-body cases reported earlier in the paper, and the extreme of random Hamiltonians shown here, both displayed the same characteristic compression behavior. This indicates that the origin of the compression is the Hamiltonian spectral bandwidth and not any other special feature (e.g., many-body coupling character or patterns).

IX. FURTHER COMPRESSION WITH AN AUTOENCODER
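The confinement effect above is easy to reproduce at small scale. The following sketch (a minimal numpy illustration, not the paper's code; the dimension n = 200 and subspace size k = 15 are arbitrary choices) builds a random Hermitian Hamiltonian, confines a random initial state to a span of k eigenvectors, and checks that the snapshot matrix (35) then has numerical rank at most k:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, T, k = 200, 200, 2.0, 15   # dimension, samples, run-time, subspace size

# Random Hermitian Hamiltonian and its eigendecomposition H = V diag(w) V†
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
w, V = np.linalg.eigh((A + A.conj().T) / 2)

# Initial state confined to a random selection of k eigenvectors
c = np.zeros(n, dtype=complex)
idx = rng.choice(n, size=k, replace=False)
c[idx] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
psi0 = V @ (c / np.linalg.norm(c))

# Snapshot matrix Psi = [psi_t1 ... psi_tK] via exact eigenbasis propagation
ts = np.linspace(0.0, T, K)
Psi = V @ (np.exp(-1j * np.outer(w, ts)) * (V.conj().T @ psi0)[:, None])

# Squares of these singular values approximate those of C = (T/K) Psi Psi†
s = np.linalg.svd(Psi, compute_uv=False)
numrank = int(np.sum(s > 1e-8 * s[0]))
print(numrank)   # at most k: the flow never leaves the k-dimensional eigenspace
```

Since the evolution never leaves the chosen k-dimensional eigenspace, the numerical rank is bounded by k regardless of the run-time, mirroring the drop at exactly 15, 30, and 60 seen in Figure 6.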
We compare our findings using POD with a variational model constructed from an autoencoder (AE) as depicted in Figure 7. This configuration is one of several variations for seeking compression by combining an autoencoder with a data-driven pre-processing procedure such as POD, e.g., [27]. By disconnecting the encoder and decoder the output becomes X = M_m M_m† Ψ, which is exactly the POD solution (10). The autoencoder weights w ∈ R^p are selected so as to minimize a weight-dependent error, i.e.,

minimize E(w) = ‖Ψ − X‖²_fro / K,
subject to X = A_dec(w)(Z) + M_m Z,
           Z = A_enc(w)(Ψ) + M_m† Ψ,   (36)

where ‖·‖_fro is the Frobenius norm and Ψ = [ψ_{t_1} · · · ψ_{t_K}] ∈ C^{n×K} is the data (snapshot (35)) at uniform sample times ∆t. The weight-dependent interconnection structure of the encoder A_enc(w): C^n → C^m, the decoder A_dec(w): C^m → C^n, and the variational parameter dimension m < n are all specified before employing the AE. That is, the AE seeks to achieve better predictive quality than POD for a given m value. This utilization of the AE (i.e., a special form of machine learning algorithm) can be viewed as bringing in a nonlinear feature beyond the linear use of M_m as in POD. The flexibility inherent in the nonlinear encoding/decoding structure gives the AE the potential to produce a smaller state error for the same compression level m than POD, which is restricted to a linear encoder/decoder structure. Finally, unlike in a variational simulator, the input state flow is available, embedded here in the snapshot matrix Ψ.

FIG. 7. Autoencoder with POD pre- and post-processing.

Table I and Figure 8 compare the POD-only state errors with those from the POD/AE system. The input data is the snapshot matrix Ψ ∈ C^{1024×200} from our previous TFI system example with a random initial state. The table shows that the POD-only RMS error of 0.5452 at m = 5 is reduced to 0.0101 with POD/AE, a value near that of POD-only at m = 15, which has an RMS error of 0.0121. At m = 10, the RMS error with POD/AE is approximately that of POD-only at m = 16, possibly indicating the limit obtainable with this multi-layer autoencoder. Figure 8 highlights the large error reduction with the addition of the AE. The POD/AE mechanism achieves an error level commensurate with the time-bandwidth product at m_tbw = 14 for compressions m = 5 and 10.
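The POD-bypass structure in (36), and the fact that disconnecting the encoder/decoder recovers POD exactly, can be sketched as follows (a minimal numpy illustration with real-valued stand-in data; the single-layer tanh maps are hypothetical stand-ins for A_enc and A_dec, not the multi-layer network used in the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, m = 64, 100, 8
Psi = rng.standard_normal((n, K))          # stand-in (real) snapshot data

# POD basis M_m: top-m left singular vectors of the snapshot matrix
U, s, _ = np.linalg.svd(Psi, full_matrices=False)
Mm = U[:, :m]

def ae_forward(Psi, W_enc, W_dec):
    """Forward pass of (36): encoder and decoder each bypassed by POD."""
    Z = np.tanh(W_enc @ Psi) + Mm.T @ Psi  # Z = A_enc(Psi) + M_m† Psi
    X = np.tanh(W_dec @ Z) + Mm @ Z        # X = A_dec(Z) + M_m Z
    return X

# Disconnected AE (zero weights) reproduces the POD solution X = M_m M_m† Psi
X0 = ae_forward(Psi, np.zeros((m, n)), np.zeros((n, m)))
assert np.allclose(X0, Mm @ Mm.T @ Psi)

# The POD RMS error ‖Psi − X‖_fro/√K equals the discarded singular value tail
rms = np.linalg.norm(Psi - X0) / np.sqrt(K)
print(np.isclose(rms, np.linalg.norm(s[m:]) / np.sqrt(K)))  # True
```

Training the nonzero weights to reduce E(w) below the POD baseline is then the role of the AE; the bypass guarantees the AE can never do worse than POD at initialization.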
Such an error reduction indicates the benefit of the inherent nonlinearity of the variational AE state. The fundamental reason that the AE, and the particular configuration used here, can outperform POD remains an open issue. That is, given that an AE, or any neural network, can approximate most nonlinear functions, what is the character of the AE transformation that yields this property, i.e., compression in this case?

m    POD only   POD with AE
5    0.5452     0.0101
10   0.1717     0.0070
15   0.0121     0.0030
16   0.0044     –
17   0.0016     –

TABLE I. Comparison of RMS error ‖Ψ − X_m‖_fro/√K from POD-only and POD/AE (Figure 7) for m = 5, 10, 15.

FIG. 8. Histograms of log |(Ψ − X)_ij|, the log of the absolute value of all elements of the state error matrix Ψ − X (36). Upper: POD-only for m = 5, 10, 15. Lower: POD with autoencoder for m = 5, 10.

X. COMPRESSION OF TIME-VARYING HAMILTONIAN DYNAMICS
Consider the time-varying quantum system,

i ψ̇_t = (H_0 + c_t H_1) ψ_t,  ψ_t ∈ C^n,  t ∈ [0, T],   (37)

where the external field c_t is independently, identically, and uniformly distributed in [−c_mag, c_mag] over each of the K time intervals in [0, T]. The resulting state samples are stored in the snapshot matrix,

Ψ = [ψ_{t_1} · · · ψ_{t_K}] ∈ C^{n×K},   (38)

which is used to compute the sampled-data version of the covariance matrix (9) via C = ΨΨ†/K. The blue curves in Figure 9 are the singular value errors ε_m(C) vs. compression level m from (13) for n = 128, with 100 trials at K = 200 time samples for T = 2, at each of three field magnitudes c_mag, with both H_0, H_1 ∈ C^{128×128} randomly generated, normalized to ‖H_0‖ = ‖H_1‖ = 10, and held fixed throughout the 100 trials. The two red curves are the singular value errors for one trial each with n = 1024 at two of the field magnitudes, with again randomly generated H_0, H_1 ∈ C^{1024×1024}, also normalized to 10.

FIG. 9. Linear time-varying random field c_t, t ∈ [0, T]. Blue curves are POD errors (13) for n = 128 from 100 trials each, at uniform random controls over three magnitude ranges, with fixed random Hamiltonians normalized as indicated. The two red curves are for n = 1024, from one trial each, with random Hamiltonians normalized as previously. Applying the autoencoder (Figure 7) with POD at compression level m = 15 (upper diamond) reduces the error by almost two orders of magnitude (lower diamond).

We ran many cases at n = 1024; so as not to crowd the figure we show two representative examples, the rest fell in the same range as the blue curves. What is interesting to note is that there is compression in these cases and the levels do not depend very much on the Hilbert space dimension. They do, however, depend significantly on the external field magnitude and certainly on the relative magnitudes of the Hamiltonians; fixing these at 10 for both n = 128 and n = 1024 shows this effect. The results depicted are qualitatively similar to what we expect from the time-invariant Hamiltonians and the compression level predicted by the time-bandwidth product.

Though compression is clearly revealed by the POD procedure, it is based on a linear variational state. A nonlinear variational state, such as one obtained from a neural network, has the potential for improvement. Applying the autoencoder of Figure 7 with POD initiated at the compression level m = 15 (upper diamond) reduces the error by almost two orders of magnitude for the same compression level (lower diamond).

Although a detailed mathematical analysis of the time-dependent compression behavior remains to be determined, we speculate here on how a time-bandwidth measure can be developed for a time-varying quantum system such as (37). Following the exposition and notation in [28], consider the norm-bounded linear differential inclusion (NLDI),

i ψ̇ᴺ_t ∈ H_NLDI ψᴺ_t,  ψᴺ_0 = ψ_0,
H_NLDI = {H_δ = H_0 + δH_1 | |δ| ≤ c_mag},   (39)

where the set H_NLDI ⊂ C^{n×n}. Any solution of (37) is also a solution of the NLDI, and in most cases the converse also holds. As a result, many characteristics of all solutions of the NLDI are inherited by all solutions of (37). Our speculation is that if compression is one of these characteristics, then the time-bandwidth product can be applied to the NLDI to establish the onset level of compression for (37). Table II shows the results of using the worst-case frequency spread from H_NLDI to predict the onset of compression for (37).

TABLE II. Comparison of POD compression error levels with the predicted compression onset from the worst-case spread of the eigenvalues of H_NLDI (39), for system (37) with respect to the three values of c_mag, for 100 trials each of systems of dimension n = 128 with ‖H_{0,1}‖ = 10. Columns: compression level at each c_mag from the NLDI prediction max ∆eig{H_NLDI} and from the POD errors.

Though the predicted NLDI onset of compression is in the neighborhood produced by POD, absent a more rigorous theoretical analysis we leave this objective for a future study. One possible path we will explore is using approaches based on robust control theory for multiple uncertainties, as applied in [29] for finding the set of uncertain eigenvalues, as might be characterized by the time-varying nature of a Hamiltonian with multiple time-varying fields.
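A small-scale version of this experiment can be sketched as follows (a minimal numpy illustration under assumed parameters n = 64, K = 200, T = 2, c_mag = 1; the normalized tail sum below is a simple stand-in for the singular value error (13) defined earlier in the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
n, K, T, c_mag = 64, 200, 2.0, 1.0
dt = T / K

def random_hermitian(n, norm):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = (A + A.conj().T) / 2
    return H * (norm / np.linalg.norm(H, 2))   # spectral norm set to `norm`

H0, H1 = random_hermitian(n, 10.0), random_hermitian(n, 10.0)

# Piecewise-constant uniform random field, propagated exactly per interval
psi = np.zeros(n, dtype=complex)
psi[0] = 1.0
snaps = []
for _ in range(K):
    c = rng.uniform(-c_mag, c_mag)
    w, V = np.linalg.eigh(H0 + c * H1)
    psi = V @ (np.exp(-1j * w * dt) * (V.conj().T @ psi))
    snaps.append(psi)
Psi = np.stack(snaps, axis=1)               # snapshot matrix (38)

# Normalized singular value tail of C = Psi Psi†/K as a compression error proxy
s2 = np.linalg.svd(Psi, compute_uv=False) ** 2
err = lambda m: s2[m:].sum() / s2.sum()
print(err(5), err(15), err(30))             # error shrinks rapidly with m
```

With ‖H_0‖ = ‖H_1‖ = 10 and T = 2, the error proxy collapses well before m reaches n, illustrating that compression survives the random driving field.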
XI. EXTENSIONS OF THE ANALYSIS
In this section we briefly discuss potential extensions of the time-bandwidth product theory to (A) unitary dynamics and (B) nonlinear frequency sweeps for constant Hamiltonians.
A. Unitary dynamics
The time-bandwidth product (29) also predicts the approximate size of the compression of a variational simulation of unitary dynamics,

i U̇_t = H U_t,  U_0 = I_n,  t ∈ [0, T].   (40)

For ν = 1, . . . , n let u_{t,ν} ∈ C^n denote the columns of U_t. Each column of the unitary evolves according to the same Hamiltonian system, i.e.,

i u̇_{t,ν} = H u_{t,ν},  u_{0,ν} = ε_ν,  t ∈ [0, T].   (41)

Since the initial unitary is the identity matrix, the initial value of the ν-th column is ε_ν ∈ R^n, a vector with a single non-zero element equal to one in the ν-th place.

FIG. 10. Two types of eigenvalue distributions with system dimension n = 1024: (a) from an XXZ spin-chain, exhibiting characteristic piecewise-constant variations, and (b) a fictitious system with an exaggerated polynomial distribution.

Let X_t denote the variational unitary approximation with columns x_{t,ν} ∈ C^n, ν = 1, . . . , n. Using the Frobenius norm, the state (unitary) error is,

E = (1/T) ∫₀ᵀ ‖X_t − U_t‖²_fro dt = Σ_{ν=1}^n (1/T) ∫₀ᵀ ‖x_{t,ν} − u_{t,ν}‖² dt.   (42)

As shown in Appendix C, under a linear sweep of Hamiltonian eigenvalues, the previously defined sinc matrix bounds all the singular values of each subsystem covariance, resulting in the total error bound,

E ≈ E_lin ≤ n (1 − Σ_{i=1}^m σ_i(S_lin/n)).   (43)

As we have shown for compression of the quantum state dynamics with a constant Hamiltonian, we see the same here, i.e., for large n the singular values of S_lin approach those of the ideal S̄_lin and, as in (30), the error tends asymptotically to zero, thereby ensuring compression for a value of m not dependent on the exponential growth of the Hilbert space dimension with the number of particles.

B. Nonlinear frequency sweeps
The more common case to expect from an arbitrary Hamiltonian is a nonlinear eigenvalue sweep (e.g., see Figure 1). Figure 10 presents two other types of eigenvalue spreads and the corresponding singular value plots of the covariance matrix C, the sinc matrix S, and the linear sinc matrix S_lin. The upper plot in (a) shows the eigenvalues from an XXZ spin-chain with Hamiltonian H = Σ_i σ_x^i σ_x^{i+1} + σ_y^i σ_y^{i+1} + δ σ_z^i σ_z^{i+1}. The characteristic feature of these eigenvalues is that they are piecewise constant over varying intervals. With a random initial state, the lower plot in (a) shows that the linear approximation gives a compression in a range comparable with that obtained from POD of a snapshot matrix. On the right, the upper plot in (b) shows a fictitious eigenvalue distribution made from an exaggerated polynomial, where the eigenvectors are selected randomly to generate the Hamiltonian. Though the linear approximation is clearly not very good, the compression estimate remains in the range predicted by the time-bandwidth product.

Though these circumstances violate the assumption leading to the time-bandwidth product analysis, the examples presented earlier in the paper show that a nonlinear eigenvalue sweep does not pose a serious problem: the predicted compression from the linear eigenvalue sweep is in the range computed from the actual system. To further support this finding we propose to stretch the nonlinear eigenvalue sweep engendered by an n-dimensional Hamiltonian into a straightened-out linear sweep over the same range. This is accomplished by equating the path lengths of the two sweeps. The path length of n nonlinear monotonically increasing eigenvalues from ω_1 to ω_n is,

d(ω) = Σ_{k=1}^{n−1} ((ω_{k+1} − ω_k)² + 1)^{1/2}.   (44)

If the sweep were linear then d(ω_lin) = ((∆ω)² + (n − 1)²)^{1/2} with ∆ω = ω_n − ω_1.
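The path-length stretching just described, yielding the stretched dimension n′ of (45), can be sketched as follows (a minimal numpy illustration; the example sweeps are arbitrary, and a small tolerance guards the ceiling against floating-point round-off):

```python
import numpy as np

def stretched_dimension(w):
    """Stretched linear-sweep dimension n' from (44)-(45): equate the arc
    length of the (sorted) eigenvalue sweep, with unit index spacing, to
    that of a linear sweep over the same frequency range."""
    w = np.sort(np.asarray(w, dtype=float))
    d = np.sum(np.sqrt(np.diff(w) ** 2 + 1.0))   # path length (44)
    dw = w[-1] - w[0]                            # frequency range
    # tolerance keeps an exactly-linear sweep from rounding up spuriously
    return int(np.ceil(np.sqrt(max(d ** 2 - dw ** 2, 0.0)) - 1e-9)) + 1

# A linear sweep is left unchanged ...
print(stretched_dimension(np.linspace(0.0, 5.0, 100)))        # 100
# ... while a nonlinear (quadratic) sweep over [0, 25] stretches modestly
print(stretched_dimension(np.linspace(0.0, 5.0, 100) ** 2))
```

For a linear sweep d² − (∆ω)² = (n − 1)², so the function returns n, while mildly nonlinear sweeps add only a handful of interpolated eigenvalues, consistent with the modest n′ values reported for the XXZ and polynomial examples.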
To keep the same eigenvalue range with the stretched linear sweep requires interpolating between the stretched gaps in successive eigenvalues with an additional n′ − n eigenvalues, where

n′ = ⌈(d(ω)² − (∆ω)²)^{1/2}⌉ + 1.   (45)

For the two TFI frequency sweeps shown in Figure 1 for n ∈ {1024, 4096}, the stretched system dimension increases only modestly. Since the compression m scales logarithmically with dimension (34), the relative change in the time-bandwidth compression estimate, log n′/log n − 1, is correspondingly small; these changes are not visible in Figure 4, where the predicted compressions under the linear sweep assumption are compared with the order at which the actual singular values become insignificant. Similarly for the XXZ and polynomial eigenvalues shown in Figure 10: n′_xxz = 1046 and n′_poly = 1047.

XII. SUMMARY
The main result reported here is the introduction of the time-bandwidth product,

∆ = (f_max − f_min) T,   (46)

which, for a quantum state evolving for time T with a constant Hamiltonian whose frequencies range from f_min to f_max, provides an estimate of compression to a reduced system whose dimension is the smallest integer larger than ∆. We show that the predicted compressed dimension is exact (in a well-defined asymptotic sense) if the Hamiltonian frequencies (eigenvalues) range linearly when ordered from minimum to maximum. Since the time-bandwidth product does not depend on the initial state, it is, in effect, predicting the range of the worst-case level of compression, or more precisely, where compression begins as defined in the Introduction. Though every real system has a nonlinear frequency sweep, the numerous simulations of systems with random (constant) Hamiltonians shown here do not violate the predicted approximate level of compression, i.e., no orders-of-magnitude changes. In general, the time-bandwidth product is consistent with the level or onset of observed compression.

The time-bandwidth product is of course not independent of the Hilbert space dimension. For spin systems composed of two-level particles, the frequency range typically scales linearly with the number of particles. As a result, the time-bandwidth estimated compression level scales logarithmically. More generally, a polynomial scaling of the frequency range will still result in a compression that beats exponential scaling.

A lower value of the predicted level of compression, sometimes significantly lower, is possible depending upon how the initial state is distributed amongst the Hamiltonian eigenvectors. In particular, for a system with a random initial state, the system dimension has little impact on the variational compression, the main driver being the product of the frequency sweep range and the simulation run-time. In contrast, for a specific initial state, such as one close to the system ground state, the interaction with the Hamiltonian eigenvectors plays a significant role in further reducing the variational compression. We showed how this comes into play using a related measure: if the initial state is predominantly in or near a subspace spanned by a small number of Hamiltonian eigenvectors, compression will be in the range of the initial subspace dimension, lower than that indicated by the time-bandwidth product. With enough run-time, and with the initial state not predominantly confined to a subspace, all of Hilbert space will eventually be populated. The good news is that this upper dimension can only grow linearly with run-time: with a huge number of states, the run-time would have to be very long to make such an impact [8].

A limitation of the time-bandwidth product is that it is derived from a linear variational model. In contrast, a machine learning system is built to implement a nonlinear transformation, and can thereby potentially deliver a lower level of compression with the same or smaller error. We demonstrated this effect using an autoencoder. On the other hand, the time-bandwidth product does reveal the existence of a useful level of compression; finding that with a machine learning system to actually solve for the dynamics is an evolving area of challenging research.

For a time-varying Hamiltonian, e.g., a system affected by time-varying external fields, the time-bandwidth product is not strictly applicable. Nevertheless, simulations show that compression still holds, though with an expected dependence on the field strength. It also seems reasonable to expect compression with a time-varying Hamiltonian to be worse than in the time-invariant case, as the external field can be thought of as moving the initial state around through some portion of the specified Hilbert space. For a reasonably posed control problem, even for an exponentially large many-body system, the goal would normally not be to move the state from one end of Hilbert space to the other.

The time-bandwidth predicted compression in some ways reveals that the Schrödinger equation is a giant variational minimization machine within which is found a "discovery": compression. Returning to the second paragraph of the paper with respect to points (1) and (2) there, one can also view the system compression as an asymptotic result, where the formulation presented in the paper and the numerical evidence clearly indicate that a tolerable level of the onset of compression typically sets in at very low values of m, with m ≪ n. That possibility is of course buried in the data that goes into any machine learning system. It would seem, then, that a neural-net quantum simulator would have to find a compressed system, or else how could it simulate quantum dynamics (with usefully small errors) without using a number of parameters equivalent to the exponential size of the quantum state? Though we have no proof at this time, the reports of successful simulations of quantum many-body dynamics with non-exponential scaling of neural network parameters lend support to the time-bandwidth product prediction of the existence of compression.

Acknowledgments
All of the authors acknowledge support by the Data X project at Princeton University. RLK was partly supported under the Defense Advanced Research Projects Agency (DARPA) Physics of Artificial Intelligence (PAI) Program (Contract HR00111890031). RLK thanks Shaowu Pan for alerting us to the POD-modified autoencoder structure, and Jun Kyu Lee and Kamal Nayal for the implementation and data assembly thereof.

[1] Giuseppe Carleo and Matthias Troyer. Solving the quantum many-body problem with artificial neural networks.
Science, 355(6325):602–606, 2017.
[2] Stefanie Czischek, Martin Gärttner, and Thomas Gasenzer. Quenches near Ising quantum criticality as a challenge for artificial neural networks. Phys. Rev. B, 98:024311, Jul 2018.
[3] Markus Schmitt and Markus Heyl. Quantum dynamics in transverse-field Ising models from classical networks. SciPost Phys., 4:013, 2018.
[4] G. Fabiani and J. H. Mentink. Investigating ultrafast quantum magnetism with machine learning. SciPost Phys., 7:4, 2019.
[5] Sankar Das Sarma, Dong-Ling Deng, and Lu-Ming Duan. Machine learning meets quantum physics. Physics Today, 72(3):48–54, March 2019.
[6] Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie Vogt-Maranto, and Lenka Zdeborová. Machine learning and the physical sciences. Rev. Mod. Phys., 91:045002, Dec 2019.
[7] Seth Lloyd. Universal quantum simulators.
Science, 273(5278):1073–1078, 1996.
[8] David Poulin, Angie Qarry, Rolando Somma, and Frank Verstraete. Quantum simulation of time-dependent Hamiltonians and the convenient illusion of Hilbert space. Physical Review Letters, 106(17):4, 2011.
[9] Benjamin B. Machta, Ricky Chachra, Mark K. Transtrum, and James P. Sethna. Parameter space compression underlies emergent theories and predictive models. Science, 342(6158):604–607, 2013.
[10] J. K. Freericks, B. K. Nikolić, and O. Frieder. The nonequilibrium quantum many-body problem as a paradigm for extreme data science. International Journal of Modern Physics B, 28(31):1430021, 2014.
[11] Henry W. Lin, Max Tegmark, and David Rolnick. Why does deep and cheap learning work so well? Journal of Statistical Physics, 168(6):1223–1247, Sep 2017.
[12] Xizhi Han and Sean A. Hartnoll. Deep quantum geometry of matrices. Phys. Rev. X, 10:011069, Mar 2020.
[13] Benjamin Russell, Herschel Rabitz, and Re-Bing Wu. Control landscapes are almost always trap free: a geometric assessment. Journal of Physics A: Mathematical and Theoretical, 50(20):205302, 2017.
[14] Robert L. Kosut, Christian Arenz, and Herschel Rabitz. Quantum control landscape of bipartite systems. Journal of Physics A: Mathematical and Theoretical, 2019.
[15] R. V. L. Hartley. Transmission of information.
The Bell System Technical Journal, 7(3):535–563, 1928.
[16] Hartley's Law: "It is shown that when the storage of energy is used to restrict the steady state transmission to a limited range of frequencies the amount of information that can be transmitted is proportional to the product of the width of the frequency-range by the time it is available."
[17] S. Lloyd and S. Montangero. Information theoretical analysis of quantum optimal control. Phys. Rev. Lett., 113:010502, Jul 2014.
[18] J. Nathan Kutz. Data-Driven Modeling & Scientific Computation: Methods for Complex Systems & Big Data. Oxford University Press, Inc., USA, 2013.
[19] In POD/PCA often a bias term is included: x(θ_t) = Mθ_t + b.
[20] Akshat Kumar and Mohan Sarovar. On model reduction for quantum dynamics: symmetries and invariant subspaces. Journal of Physics A: Mathematical and Theoretical, 48(1):015301, Dec 2014.
[21] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1990.
[22] Ulf Grenander and Gabor Szegő. Toeplitz Forms and Their Applications. Chelsea Pub. Co., first edition, 1958. Second edition 1984.
[23] Robert M. Gray. On the asymptotic eigenvalue distribution of Toeplitz matrices. IEEE Transactions on Information Theory, 18(6):725–730, 1972.
[24] Georgia Fisanick-Englot and Herschel Rabitz. Studies of inelastic molecular collisions using impact parameter methods. I. Model calculations. The Journal of Chemical Physics, 62(4):1409–1424, 1975.
[25] A. Böttcher, J. M. Bogoya, S. M. Grudsky, and E. A. Maximenko. Asymptotics of eigenvalues and eigenvectors of Toeplitz matrices. Sbornik: Mathematics, 208(11):1578–1601, Nov 2017.
[26] Sven-Erik Ekström, Carlo Garoni, and Stefano Serra-Capizzano. Are the eigenvalues of banded symmetric Toeplitz matrices known in almost closed form? Experimental Mathematics, Dec 2017.
[27] Shaowu Pan and Karthik Duraisamy. Physics-informed probabilistic learning of linear embeddings of nonlinear dynamics with guaranteed stability. SIAM Journal on Applied Dynamical Systems, 19(1):480–509, 2020.
[28] Stephen Boyd, Laurent El Ghaoui, Eric Feron, and Venkataramanan Balakrishnan. Linear Matrix Inequalities in System and Control Theory. Society for Industrial and Applied Mathematics, 1994.
[29] M. Kishida and R. D. Braatz. On the analysis of the eigenvalues of uncertain matrices by µ and ν: Applications to bifurcation avoidance and convergence rates. IEEE Transactions on Automatic Control, 61(3):748–753, 2016.
Appendix A: Singular values of R

Using the eigenvalue decomposition of the Hamiltonian and state flow (15)-(16), the covariance matrix (9) can be expressed as,

C[ψ] = V F V†,  F = (1/T) ∫₀ᵀ f_t f_t† dt ∈ C^{n×n}.   (A1)

Because V is unitary, the singular values of the state covariance matrix C (we drop the C[ψ] notation for clarity) are identical to those of F:

F = W diag(Q_m, Q_{n−m}) W†,
W = [W_m  W_{n−m}],  W_m ∈ C^{n×m}
⇒ [M_m  M_{n−m}] = V W,  M_m = V W_m ∈ C^{n×m}.   (A2)

The matrices Q_m, Q_{n−m} are diagonal and contain the singular values of C (or F) in descending order (same as in (9)). Since V, W ∈ U(n), the singular vectors of C are the columns of the unitary product V W ∈ U(n), from which the first m columns provide the basis for the variational model reduction corresponding to the model error Tr Q_{n−m} (11). An equivalent expression for F is,

F = diag(e^{−iωT/2}) R diag(e^{−iωT/2})†,
R = diag(α) S diag(α)†,   (A3)

with R and S as defined in (17)-(18). Since diag(e^{−iωT/2}) is unitary, the singular values of F are the same as those of R, and also of C.

Appendix B: Singular values of S_lin

Let T_L ∈ R^{L×L} be a real symmetric Toeplitz matrix whose first column is generated by the series f̄_L = {f_ℓ, ℓ ∈ [0, L−1]}. From [25], if the Fourier transform F(ω) = F(f̄_∞) satisfies certain monotonicity conditions, then as L → ∞ the eigenvalues of T_L approach those of the Fourier transform F(ω). The first column of the sinc matrix with a linear sweep, S_lin (26), consists of the first n terms of the L-length series

s̄_L = {sinc(πΔℓ/(n−1)), ℓ ∈ [0, L−1]}.

With n fixed, the Fourier transform of s̄_∞ satisfies the aforementioned conditions, i.e., from standard tables,

F(ω) = Σ_{ℓ=−∞}^{∞} e^{−iωℓ} sinc(πΔℓ/(n−1))
     = { (n−1)/Δ,  |ω| ≤ πΔ/(n−1),
         0,        πΔ/(n−1) < |ω| < π.   (B1)

Discretizing ω = 2πk/n, k = 0, . . . , n−1, and then sorting and grouping the absolute values of |F(ω_k)|, we get the singular value (asymptotic in n) approximation (28). (In [26] this type of asymptotic approximation was used for Toeplitz eigenvalues.) Figure 11 compares the ideal (Fourier transform) singular values (28), labeled svSft, with those from the linear sweep matrix S_lin ∈ R^{n×n}, labeled svS, for two instances: (a) {n = 500, Δ = 100/π} and (b) {n = 3000, Δ = 300/π}, with compression estimate m_tbw = ⌈(n/(n−1))Δ⌉ from (29).

FIG. 11. Singular values of the ideal (28) and S_lin: (a) {n = 500, Δ = 100/π} and (b) {n = 3000, Δ = 300/π}, with compression estimate m_tbw = ⌈(n/(n−1))Δ⌉ from (29).

Appendix C: Unitary Compression
Applying the POD method, mutatis mutandis, to each of the n systems (41), with the variational compression fixed for all at m, results in the optimal linear variational unitary,

X_t = [Γ_1 u_{t,1} · · · Γ_n u_{t,n}],   (C1)

where the rank-m matrices Γ_ν = M_ν M_ν† ∈ C^{n×n}, with each M_ν ∈ C^{n×m} formed as in (9) from the m singular vectors corresponding to the m largest singular values of each covariance matrix,

C_ν = (1/T) ∫₀ᵀ u_{t,ν} u_{t,ν}† dt.   (C2)

The optimal (POD) state (unitary) error and the corresponding R and S matrices are,

E = Σ_{ν=1}^n Tr (I_n − Γ_ν) C_ν,
C_ν = V diag(e^{−iωT/2}) R_ν diag(e^{iωT/2}) V†,
R_ν = diag(α_ν) S diag(α_ν)†,  α_ν = V† ε_ν.   (C3)

Here S is exactly the same sinc matrix as in (18) for every subsystem ν, since each column evolves under the same Hamiltonian.
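The identity behind (A3)/(C3), that the covariance singular values coincide with those of R = diag(α) S diag(α)†, is easy to check numerically. The sketch below is a minimal numpy illustration; the frequencies, run-time, and the sinc form S_{jk} = sinc((ω_j − ω_k)T/2), with sinc(x) = sin(x)/x, are assumptions consistent with the linear-sweep special case (26). It compares the singular values of R against those of a sampled covariance built from f_t = α ∘ e^{−iωt}:

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, K = 40, 2.0, 4000
w = np.sort(rng.uniform(0.0, 10.0, n))       # Hamiltonian frequencies omega
alpha = rng.standard_normal(n) + 1j * rng.standard_normal(n)
alpha /= np.linalg.norm(alpha)               # alpha = V† psi_0 in the eigenbasis

# Assumed sinc matrix S_{jk} = sinc((w_j - w_k) T / 2), sinc(x) = sin(x)/x
D = (w[:, None] - w[None, :]) * T / 2.0
S = np.sinc(D / np.pi)                       # np.sinc(y) = sin(pi y)/(pi y)
R = np.diag(alpha) @ S @ np.diag(alpha).conj().T

# Sampled covariance of f_t = alpha * exp(-i w t): approximates (1/T)∫ f f† dt
ts = (np.arange(K) + 0.5) * T / K            # midpoint quadrature
Fm = alpha[:, None] * np.exp(-1j * np.outer(w, ts))
C = Fm @ Fm.conj().T / K

sv_R = np.linalg.svd(R, compute_uv=False)
sv_C = np.linalg.svd(C, compute_uv=False)
print(np.max(np.abs(sv_R - sv_C)) < 1e-3)    # spectra agree to quadrature error
```

The agreement follows because the sampled covariance equals diag(e^{−iωT/2}) R diag(e^{−iωT/2})† up to quadrature error, and the diagonal phase factors are unitary, so the singular values are untouched.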