Global sensitivity analysis for optimization of the Trotter-Suzuki decomposition
GGlobal sensitivity analysis for optimization of the Trotter-Suzuki decomposition
Alexey N. Pyrkov, ∗ Yurii Zotov, Jiangyu Cui, and Manhong Yung Institute of problems of chemical physics RAS, Acad. Semenov av. 1, Chernogolovka, Moscow region, Russia, 142432 Huawei Russian Research Institute, Huawei Technologies Co. Ltd., Moscow, Russia, 121614 Central Research Institute, Huawei Technologies, Shenzhen 518129, China Data Center Technology Laboratory, Huawei Technologies Co Ltd, 115371 Shenzhen, Guangdong, China
The Trotter-Suzuki decomposition is one of the main approaches for realization of quantum simu-lations on digital quantum computers. Variance-based global sensitivity analysis (the Sobol method)is a wide used method which allows to decompose output variance of mathematical model into frac-tions allocated to different sources of uncertainty in inputs or sets of inputs of the model. Here wedeveloped a method for application of the global sensitivity analysis to the optimization of Trotter-Suzuki decomposition. We show with a proof-of-concept example that this approach allows to reducethe number of exponentiations in the decomposition and provides a quantitative method for findingand truncation ’unimportant’ terms in the system Hamiltonian.
I. INTRODUCTION
Nowadays complex mathematical and computationalmodels are used in all areas of modern society and af-fected economy and technologies in many different ways[1]. Contemporary mathematical models and develop-ment of computational devices have given impulse to fastdigitalization of different areas from medicine to agricul-ture [2–5]. However, growing of data and complexity ofsystems under consideration does not allow to process allthat exactly and the ab-initio understanding of importantfeatures of the complex mathematical models looks likean unrealizable dream [6, 7]. Recently, a lot of methodswere developed in order to extract the most importantfeatures of the complex models in such a manner that itis possible to make predictions about the outcomes onthe basis of the chosen features only [8–12]. One of suchmethods is the variance-based global sensitivity analy-sis (the Sobol method) [21–24]. It is a wide used methodwhich allows to decompose output variance of mathemat-ical model into fractions allocated to different sources ofuncertainty in inputs or sets of inputs of the model. Thisallows to simplify model under consideration: we canidentify the inputs that have no effect on the output andremove redundant parts of the model from consideration.On the other hand, recently the fast progress in ex-perimental quantum computing and quantum algorithmswas observed [25–36] and now many researchers are try-ing to learn how quantum computing and the Big Datacan benefit from each other. Since the idea of quantumcomputing appeared, one of the most promising appli-cations of quantum computers has been the simulationof complex quantum systems [37–44]. Nowadays quan-tum simulations has become one of the most importantapplications for NISQ devices [45–53]. Quantum pro-grammable devices with two-qubit gate fidelities more98 percents were developed [54–61] and this raises theexciting opportunities for solving problems that are be- ∗ [email protected] yond the reach of classical computation [45, 62]. How-ever, the error rates of currently available and near-termquantum machines are severely limited by the total num-ber of gates that can be reliably performed to simulatethe dynamics of quantum systems. Usually, the totalnumber of gates is defined by the Trotter-Suzuki (TS)decomposition [63–65, 67]. This decomposition allows torepresent the evolutionary operator, in the first order ex-pansion, as a product of matrix exponents of the systemHamiltonian terms. Furthermore, TS decomposition is akey routine in some methods of quantum optimal con-trol [13], in time evolution algorithms for matrix productstates [14–16] and so on [20]. Thus reaching the goal ofpractical quantum supremacy will require not only sig-nificant experimental advances, but also careful quantumalgorithm design and implementation. Recently, somemethods for optimization of TS decomposition were pro-posed. In particular, the schemes based on evolutionaryalgorithms [17], random sampling [18] and so on [19] wereconsidered. Nevertheless, all the protocols work withsome restrictions and have some disadvantages for prac-tical using on current quantum devices (for example, theprotocol on the basis of random sampling [18] has ad-vantages only for the huge unpractical number of gatesunachievable on current digital quantum machines). Onthe other hand, one of the simplest approaches that canbe used on the current generation of quantum devicesand as preprocessing for other more complex protocolsis optimization of TS decomposition by removing unim-portant gates from consideration with some increasing inerror. However, in many cases it is not clear what termsof Hamiltonian can be removed from consideration whenwe apply the TS decomposition and what criteria we canuse for that. Nowadays, the removing of the irrelevantterms is mostly empirical and deals only with the termswhich are a few orders of magnitude smaller in terms ofsome matrix norm in comparison with others.Here, we develop an approach based on the Global sen-sitivity analysis for optimization of the TS decompositionto remove the irrelevant terms. It is shown that the ap-proach allows to find most of the terms that give smallcontribution in the whole problem variance providing a a r X i v : . [ qu a n t - ph ] J a n method for quantitative analysis of the irrelevant termsin the system Hamiltonian. We show that the removingthese terms from consideration allows to reduce the num-ber of exponentiations up to 25 percent with moderateincreasing in the error of approximation. II. THE TROTTER-SUZUKI DECOMPOSITION
Simulations of quantum dynamics of complex systemsis the classically unsolvable problem due to the fact thatthe dimension of the problem grows exponentially withthe system size and the matrix exponentiation is re-quired. The Trotter-Suzuki decomposition theoreticallyallows one to model efficiently evolution of complex quan-tum systems on a digital quantum computer. This de-composition approximates the matrix exponentiation ofthe sum of non-commuting operators with the productof exponents, each of which can be represented throughsingle-qubit and two-qubit gates, and thus allows to sim-ulate the matrix exponentiation of quantum systems withthe desired accuracy.The simplest first-order Trotter-Suzuki decompositioncan be represented as e x ( A + B ) ≈ e xA e xB + O ( x ) , (1)where x - small parameter and A , B - non-commutingoperators [ A, B ] (cid:54) = 0.In order to obtain the second–order expansion, it ispossible to represent both sides of the equation (1) inthe following form: e x ( A + B ) = I + x ( A + B ) + 12 x ( A + B ) + O ( x ) = I + x ( A + B ) + 12 x (cid:0) A + AB + BA + B (cid:1) + O ( x ) ,e xA e xB = (cid:18) I + xA + 12 x A + O ( x ) (cid:19) × (cid:18) I + xB + 12 x B + O ( x ) (cid:19) = I + x ( A + B ) + 12 x (cid:0) A + 2 AB + B (cid:1) + O ( x ) . (2)Thus, we get the following expression e xA e xB = e x ( A + B )+ x [ A,B ]+ O ( x ) , (3)which corresponds to the well-known Baker-Hausdorffoperator identity e ( A + B ) = e A e B e − [ A,B ] , (4)when A and B commute with their commutator.Usually, using the Trotter-Suzuki decomposition, wesplit x into n parts and in this case the representation FIG. 1. Error of the TS decomposition versus number ofqubits for the dense case. The Hamiltonian has 105 randomlysampled Hermitian terms and about 30 of them are weakenin 50 times. The sensitivity analysis allows to reduce num-ber of exponentiations to 95, 76, 75 terms for 5, 7, 9 qubitsrespectively with moderate increasing of the error. can be written as: (cid:0) e xn A e xn B (cid:1) n = (cid:20) e xn ( A + B )+ ( xn ) [ A,B ]+ O (cid:16) ( xn ) (cid:17) (cid:21) n = e x ( A + B )+ x n [ A,B ]+ O (cid:16) x n (cid:17) , (5)that gives the convergence of the formula for n → ∞ .Generalization of the Trotter-Suzuki decomposition tothe higher orders reads as e x ( A + B ) = e p xA e p xB e p xA e p xB · · · e p M xB + O ( x m +1 ) , (6)where the selection of the parameters { p , p , · · · , p M } iscarried out in such a way that the correction term is ofthe order of x m +1 . In this paper we will benchmark onlyfirst and second order TS decompositions that are morerelevant from practical point of view. III. VARIANCE-BASED GLOBAL SENSITIVITYANALYSIS
Modern physical systems can be very complex andwhen we try to model them the relationships betweeninputs and outputs are mostly poorly understood. Theinputs and outputs can be subject to different sources ofuncertainty, including external noise, errors in measure-ments and incomplete understanding of intrinsic mecha-nisms that manage the complex system under considera-tion. This uncertainty imposes a limit on our confidencein the output of the model and requires an knowledge ofhow much each input is contributing to the output uncer-tainty. Sensitivity analysis performs the role of orderingthe inputs by impact in the variance of the output. Inmodels involving many input variables, sensitivity anal-ysis allows to make the dimension reduction. Here weoverview the theory of variance-based global sensitivityanalysis [21–24].
A. Decomposition of variance and Sensitivityindexes
From a black box perspective, any model may beviewed as a function y = f ( x ) defined in the d -dimensional unit hypercube, where x is a vector with d components x , x , . . . x d , and y is a scalar output. Inorder to intuitively understand how the sensitivity anal-ysis works lets consider what happens with the output y when we fix one of the variables x i at a particular value˜ x i . Denote the conditional variance of y over all vari-ables but x i as V ar x − i ( y | x i = ˜ x i ). It is evident thatthe resulting conditional variance with one frozen vari-able is less or equal to the total unconditional variance V ar ( y ). Furthermore, it is easy to notice that the smaller V ar x − i ( y | x i = ˜ x i ) in comparison with the total vari-ance of the model, the greater the influence of x i on themodel. Thus, the conditional variance V ar x − i ( y | x i = ˜ x i )can be used as a measure of the relative importance of x i . In order to avoid the dependence of the measure onthe particular ˜ x i we can average V ar x − i ( y | x i = ˜ x i ) overall possible points x i in our hypercube and consider theexpectation value E x i ( V ar x − i ( y | x i )) as the importancemeasure. This expectation value satisfies the followingexpression: E x i ( V ar x − i ( y | x i )) + V ar x i ( E x − i ( y | x i )) = V ar ( y )and a small E x i ( V ar x − i ( y | x i )) (or large V ar x i ( E x − i ( y | x i ))) implies that x i is an importantvariable in the model. Defining and ordering thesensitivity indexes of the first order as S i = V ar x i ( E x − i ( y | x i )) V ar ( y )it is possible to estimate how much each input is con-tributing to the output uncertainty (a high value indi-cates an important variable).More rigorously and generally speaking, according tothe Sobol’s approach the y = f ( x ) may be decomposedas y = f + d (cid:88) i =1 f i ( x i )+ d (cid:88) i In the most cases it is not possible to calculate thesensitivity indexes analytically and usually they are es-timated with some numerical approximation methods.One of the most common approaches in this case is theMonte Carlo method. The Monte Carlo method involvesgenerating a sequence of randomly distributed points inthe input space and calculating some probabilistic char-acteristics of the process under consideration. In prac-tice, in order to improve efficiency, the quasi-Monte Carlomethod is used, in this case the genuine random samplesequence replaced with low-discrepancy sequence. One ofthe low-discrepancy sequences commonly used in sensi-tivity analysis is the Sobol sequence. In order to calculatethe sensitivity indexes in the Sobol paradigm, the follow-ing steps are required: 1) Generate an N × d samplematrix where each row is a sample point with 2 d dimen-sions; 2) Denote the first d columns of the matrix as ma-trix A , and the remaining d columns as matrix B . Thiseffectively can be considered as two independent sets ofsamples of N points in the d -dimensional unit hypercube; number of qubits 5 7 9dense (no. of gates after reduction) 95 76 75sparse (no. of gates after reduction) 104 95 77TABLE I. Number of gates after truncation for different num-ber of qubits for dense and sparse cases. Initial randomlygenerated system Hamiltonians consist of 105 terms for eachnumber of qubits. 3) Construct a set of d matrices with dimension of N × d with replacing of i th column of matrix A with i th columnof matrix B and the remaining columns leave unchanged,denote each such matrix as A i ( B ) where i = 1 , , ..., d ; 4)The matrices A , B , and A i ( B ) define N × ( d + 2) pointsin the input space (one for each row). Using this samplepoints calculate the corresponding outcome values f ( A ), f ( B ) and f ( A i ( B )); 5) Then in order to calculate thesensitivity indexes it is possible to use the following esti-mators: V ar x i ( E x − i ( y | x i )) ≈ N N (cid:88) n =1 f ( B ) n ( f ( A i ( B )) n − f ( A ) n )(14)It is worth to emphasize that the accuracy of the es-timators depends on N which can be chosen by sequen-tially adding points to reach convergence. In order toestimate the sensitivity indexes for all input variables itis required to run calculations N ( d + 2) times. Since N isoften quite large, computational cost can quickly becomechallenging when the model requires a significant amountof time for a single run. In order to reduce the compu-tational cost some additional numerical techniques weredeveloped, such as emulators, HDMR [68–70] and FAST[71]. IV. OPTIMIZATION OF THETROTTER-SUZUKI DECOMPOSITION In this section we present an algorithm for optimizationof the TS decomposition by removing unimportant gatesfrom consideration using the global sensitivity analysiswith some increasing in error. We apply the algorithm tothe first order TS decomposition and show that it allowsto find unimportant terms in the system Hamiltonian.We present a proof-of-concept examples with randomlysampled dense and sparse Hermitian matrices as parts ofa system Hamiltonian.In order to construct the proof-of-concept examples,we randomly generate Hamiltonians with some fixednumber of random Hermitian matrices, sparse (only 30percent of each Hamiltonian term are non-zero) anddense: H = (cid:88) n H n . (15) FIG. 2. Error of the TS decomposition versus number ofqubits for the sparse case. The Hamiltonian has 105 randomlysampled Hermitian terms and about 30 of them are weakenin 50 times. The sensitivity analysis allows to reduce numberof exponentiations to 104, 95, 77 terms for 5, 7, 9 qubitsrespectively with moderate increasing of the error. Our algorithm is based on sensitivity analysis of theFrobenious norm of different linear compositions of theterms in our system Hamiltonian (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:88) n β in H n (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (16)where the coefficients β in constructed as quasi-randomSaltelli sequence [24] of the coefficients and i is indexingsimilar for rows in matrices A , B , and A i ( B ) from theprevious section. Then we calculate how the coefficientsaffect the variance of the norm of the system Hamilto-nian of (14). In order to take into account contributionof each Hamiltonian term in the total output variancewe consider sensitivity indexes of first-order only of (12)and trying to indicate with the indexes what terms ofour Hamiltonian can be truncated. We use the SALibpython library [72] to implement our calculations of thesensitivity indexes on the HPC [73].In particular, we consider quantum systems with 5, 7and 9 qubits. For each of them we construct a Hamil-tonian on the basis randomly sampled Hermitian matri-ces with 105 terms H = (cid:80) n H n (separately for denseand sparse cases). Then we artificially weaken about 30of randomly choosen { H n } in 50 times multiplying with0.02 and construct a Hamiltonian ˜ H = (cid:80) n ˜ H n with theweaken terms and the rest (the total number of termsin the Hamiltonian ˜ H does not change and still equalsto 105). In order to calculate error of the TS decompo-sition in comparison with genuine evolutionary operatorwith Hamiltonian ˜ H we calculate the norm of differenceof matrix exponentiations: (cid:15) = (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) e − i ˜ Ht − N (cid:89) n =1 e − i ˜ H n t (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) , (17) for the first-order TS approximation and (cid:15) = (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) e − i ˜ Ht − N (cid:89) n =1 e − i ˜ Hnt (cid:89) l = N e − i ˜ Hlt (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) , (18)for the second-order TS approximation. In order to cal-culate the errors we use the HiQPulse python library[74] which allows to calculate the matrix exponentiationsmore than 5 times faster in comparison with SciPy li-brary.Then we implement the sensitivity analysis for theFrobenious norm of (cid:13)(cid:13)(cid:13) β i ˜ H (cid:13)(cid:13)(cid:13) = (cid:13)(cid:13)(cid:13)(cid:80) n β in ˜ H n (cid:13)(cid:13)(cid:13) with the useof the Saltelli sequence and find the first-order sensitivityindexes. We remove from consideration the terms withsensitivity indexes which are in 1000 times smaller thanthe average value of the sensitivity indexes for the densecase and in 5000 times smaller than the average value forthe sparse case. After that, we calculate the error of theTS decomposition for the reduced Hamiltonian in com-parison with genuine evolutionary operator with Hamil-tonian ˜ H using the HiQPulse python library. It allows toremove 4, 13 and 25 gates in the first-order TS decom-position for 5, 7 and 9 qubit systems respectively for thedense case (6, 11 and 26 gates for the sparse case) withvery small increasing in error (estimated analogously to(17) and (18)), see Table I.Choosing the same terms for reduction in the second-order TS decomposition we can reduce number of gatesfrom 209 in about 5, 13 and 25 percents for 5, 7 and 9qubit systems respectively with moderate increasing inerror, see Table I.On the Fig. 1, the results in error between actual sys-tem evolution and TS approximations with and withoutremoving unimportant gates in the TS approximationsof different orders for the dense case are presented. Wecan see that the error of the reduced second-order TSapproximation is much better in comparison with thefirst-order TS decomposition with 50 percent more gatesfor implementation and moderately worse than the gen-uine second-order TS approximation with 25 percent lessnumber of gates. This opens up some opportunities formore optimal TS approximations with truncation unim-portant gates in the NISQ era devices. The similar resultsare obtained for the sparse case (see Fig. 2). V. DISCUSSION