Practical Pulse Engineering: Gradient Ascent Without Matrix Exponentiation
Gaurav Bhole and Jonathan A. Jones*
Centre for Quantum Computation, Clarendon Laboratory, University of Oxford, Parks Road, OX1 3PU, United Kingdom
(Dated: April 25, 2018)

Since 2005 there has been a huge growth in the use of engineered control pulses to perform desired quantum operations in systems such as NMR quantum information processors. These approaches, which build on the original gradient ascent pulse engineering (GRAPE) algorithm, remain computationally intensive because of the need to calculate matrix exponentials for each time step in the control pulse. Here we discuss how the propagators for each time step can be approximated using the Trotter–Suzuki formula, and a further speed-up achieved by avoiding unnecessary operations. The resulting procedure can give a substantial speed gain with negligible cost in propagator error, providing a more practical approach to pulse engineering.
Quantum information processors encode information in two-level quantum systems (qubits) and manipulate this through a series of elementary unitary transformations (quantum logic gates) [1, 2]. Quantum control seeks to implement some target unitary propagator U in a quantum system with background Hamiltonian H_0 by applying some time-dependent Hamiltonian H_1(t). The resulting operator can be written as

$V = \mathcal{T} \exp\left[ -\mathrm{i} \int_0^T \big( H_0 + H_1(t) \big) \, \mathrm{d}t \right]$   (1)

where $\mathcal{T}$ is the Dyson time-ordering operator. To make progress beyond this formal solution it is usually necessary to replace this continuously varying Hamiltonian by a piecewise constant form, so that

$V = V_n V_{n-1} \dots V_1, \qquad V_j = \exp\left[ -\mathrm{i} (H_0 + H_j)\, \delta t \right],$   (2)

and to write the time-varying portion of the Hamiltonian as a weighted sum of a set of p distinct control fields

$H_j = \sum_{k=1}^{p} a_{kj} H_k .$   (3)

Any particular control pulse can then be described by the corresponding set of amplitudes, a_kj, and the time step δt, here taken as fixed. The quality of a control pulse can be measured by its fidelity with the desired operation U,

$\Phi = |\langle U | V \rangle|^2$   (4)

where the Hilbert–Schmidt inner product is defined by $\langle U | V \rangle = \mathrm{tr}(U^\dagger V)$, possibly normalised by the dimension of the operators [2]. The optimal control problem is then to find the set of amplitudes which maximises this fidelity, usually in the presence of practical constraints on the magnitudes of the amplitudes and the total length of the sequence. This is computationally challenging as the dimension of the underlying Hilbert space rises exponentially with the number of qubits to be controlled, although this difficulty can in some cases be reduced by using subsystems to simplify the calculations [3]. One recent approach [4] is to use subsystem methods to find approximate control pulses and then optimise these directly using the quantum system itself.

*Electronic address: [email protected]
Whatever approach is adopted, it is important to perform any computations as efficiently as possible.
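The quantities in Eqns. 2–4 map directly onto code. The following is a minimal illustrative sketch in Python/numpy (not part of the original implementation; scipy's `expm` stands in for whatever exponentiation routine is actually used):

```python
import numpy as np
from scipy.linalg import expm

# Single spin-1/2 angular momentum operators (hbar = 1); illustrative only.
Ix = np.array([[0, 0.5], [0.5, 0]], dtype=complex)
Iz = np.array([[0.5, 0], [0, -0.5]], dtype=complex)

def piecewise_propagator(H0, Hc, amps, dt):
    """V = V_n ... V_1 with V_j = exp[-i(H0 + a_j Hc) dt], as in Eqn. 2."""
    V = np.eye(H0.shape[0], dtype=complex)
    for a in amps:                      # one matrix exponential per time step
        V = expm(-1j * (H0 + a * Hc) * dt) @ V
    return V

def fidelity(U, V):
    """Phi = |<U|V>|^2 with <U|V> = tr(U^dag V), normalised by the dimension (Eqn. 4)."""
    N = U.shape[0]
    return (abs(np.trace(U.conj().T @ V)) / N) ** 2
```

For example, a resonant constant-amplitude pulse whose amplitudes integrate to a 90° rotation angle reproduces the target rotation with unit fidelity.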
I. GRADIENT ASCENT
Practical algorithms to maximize a function usually rely on the calculation of gradients, built from values of ∂Φ/∂a_kj, but early work on optimal control was hampered by the belief that calculation of each of the np distinct elements of the gradient vector would require the full evaluation of V, and thus a total of n²p sub-propagators. The design of control pulses was a major topic within Nuclear Magnetic Resonance (NMR) [5], but most NMR sequences were parameterized by a small number of variables to keep the size of the computation manageable.

A breakthrough in 2005 was the realisation that it is not necessary to recalculate these sub-propagators, which can instead be stored and reused. The resulting algorithm, known as GRadient Ascent Pulse Engineering (GRAPE) [6], is now widely used, as are many variants, such as second-order GRAPE [7], Newton–Raphson GRAPE [8], and GRadient Ascent in Functional Space (GRAFS) [9]. However, although this algorithm reduces the number of sub-propagator calculations from n²p to n, the calculation of each of the n sub-propagators still requires a matrix exponential, which must be re-evaluated at each round of the search algorithm.

A variety of algorithms exist for calculating numerical matrix exponentials [10], but the most widely used methods combine scaling and squaring with Padé approximants [11, 12]. Whatever approach is used, this large number of matrix exponentials remains a key computational bottleneck, and reducing the number of such operations would greatly speed up the entire process. Similar issues are encountered in other optimal control algorithms such as Krotov [13], Lyapunov [14], and Chopped Random Basis Optimization (CRAB) [15].
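The bookkeeping behind this reuse is simple to sketch. The fragment below is an illustrative numpy sketch, not code from any of the cited implementations, and uses the standard first-order approximation to the derivative of each sub-propagator; it obtains all np gradient elements from only n matrix exponentials by caching forward and backward products:

```python
import numpy as np
from scipy.linalg import expm

def grape_gradient(U, H0, controls, amps, dt):
    """Gradient of Phi = |tr(U^dag V)|^2 / N^2 over all n*p amplitudes,
    using only n matrix exponentials per evaluation."""
    n, p = amps.shape
    N = H0.shape[0]
    # n sub-propagators (Eqn. 2), each costing one matrix exponential.
    Vs = [expm(-1j * (H0 + sum(amps[j, k] * controls[k] for k in range(p))) * dt)
          for j in range(n)]
    # Forward products F[j] = V_j ... V_1 and backward products B[j] = V_n ... V_{j+1}.
    F = [np.eye(N, dtype=complex)]
    for V in Vs:
        F.append(V @ F[-1])
    B = [np.eye(N, dtype=complex)]
    for V in reversed(Vs):
        B.append(B[-1] @ V)
    B = B[::-1]
    overlap = np.trace(U.conj().T @ F[-1])
    grad = np.zeros((n, p))
    for j in range(n):
        for k in range(p):
            # First-order derivative of V_j with respect to a_kj.
            dV = -1j * dt * (controls[k] @ Vs[j])
            d_overlap = np.trace(U.conj().T @ (B[j + 1] @ dV @ F[j]))
            grad[j, k] = 2 * np.real(np.conj(overlap) * d_overlap) / N**2
    return grad
```

Every gradient element reuses the cached `F` and `B` products, which is the key saving over recomputing V from scratch for each amplitude.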
As we shall show below, however, when control pulses are made up from a large number of small rotations then the matrix exponentials can be efficiently approximated with negligible error.

To make further progress we will initially specialise to the case of a homonuclear NMR system of q distinct spin-1/2 nuclei used as qubits, with a Hilbert space of dimension N = 2^q. Our efficient algorithm is not confined to this case, but it is useful to begin with a concrete example. The background Hamiltonian can then be written as

$H_0 = \sum_{r=1}^{q} \omega_r I_{zr} + \sum_{r<s} 2\pi J_{rs}\, I_{zr} I_{zs}$   (5)

in the weak coupling limit, with offset frequencies ω_r and scalar couplings J_rs. In a homonuclear system a single control field couples to every spin, so an x-phase control field in time step j contributes H_xj = α_j F_x, where F_x = Σ_r I_xr is the total x angular momentum, while a field of amplitude α_j and phase φ_j gives H_j = e^{−iφ_j F_z} α_j F_x e^{+iφ_j F_z}. Since H_0 does not commute with H_xj, the sub-propagator exponentials in Eqn. 2 cannot simply be factored. It is, however, possible to approximate the sub-propagator as long as δt is small enough, as all small-angle unitary evolutions approximately commute. If the terms are very small then it suffices to write

$e^{-\mathrm{i}(H_0 + H_{xj})\delta t} \approx e^{-\mathrm{i} H_0 \delta t}\, e^{-\mathrm{i} H_{xj} \delta t} \approx e^{-\mathrm{i} H_{xj} \delta t}\, e^{-\mathrm{i} H_0 \delta t},$   (9)

but this Trotter approximation, implicitly used by Bhole and Mahesh [17], is only accurate to second order in δt. It is better to use the Trotter–Suzuki form [2, 18]

$e^{-\mathrm{i}(H_0 + H_{xj})\delta t} \approx e^{-\mathrm{i} H_0 \delta t/2}\, e^{-\mathrm{i} H_{xj} \delta t}\, e^{-\mathrm{i} H_0 \delta t/2},$   (10)

which is accurate to third order in δt. As the propagator fidelity depends quadratically on the size of the error, this approximation will be accurate to sixth order in δt when calculating fidelities and their associated gradients. A more thorough treatment of these errors is given below, but in our experience these errors are essentially negligible for pulse engineering calculations.

It might seem that approximating a matrix exponential by the product of three exponentials does not constitute progress. However, the two outer terms are constant, and so only need to be evaluated once for the entire GRAPE calculation, while the central Hamiltonian is a simple scalar multiple of F_x, and so will always be diagonal in the Hadamard basis.
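The relative accuracy of the two splittings in Eqns. 9 and 10 is easy to check numerically; a minimal sketch for a single spin, with hypothetical parameter values:

```python
import numpy as np
from scipy.linalg import expm

Ix = np.array([[0, 0.5], [0.5, 0]], dtype=complex)
Iz = np.array([[0.5, 0], [0, -0.5]], dtype=complex)

H0 = 2 * np.pi * 5e3 * Iz      # background: 5 kHz offset (assumed value)
Hx = 2 * np.pi * 1e3 * Ix      # control: 1 kHz nutation rate (assumed value)
dt = 10e-6                     # 10 microsecond time step

exact = expm(-1j * (H0 + Hx) * dt)
# Simple Trotter splitting (Eqn. 9) and symmetric Trotter-Suzuki splitting (Eqn. 10).
trotter = expm(-1j * H0 * dt) @ expm(-1j * Hx * dt)
suzuki = expm(-1j * H0 * dt / 2) @ expm(-1j * Hx * dt) @ expm(-1j * H0 * dt / 2)

err = lambda A: np.linalg.norm(A - exact, 2)   # spectral-norm propagator error
```

At these parameters the symmetric splitting is roughly an order of magnitude more accurate than the simple one, consistent with its extra order in δt.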
The basis transformation can be combined with the fixed outer terms, giving

$V_j \approx e^{-\mathrm{i}\phi_j F_z}\, W_1\, e^{-\mathrm{i}\alpha_j F_z \delta t}\, W_2\, e^{+\mathrm{i}\phi_j F_z},$   (11)

with all explicit matrix exponentials now diagonal in the computational basis. The basis transformations

$W_1 = e^{-\mathrm{i} H_0 \delta t/2}\, \mathrm{H}^{(q)}, \qquad W_2 = \mathrm{H}^{(q)}\, e^{-\mathrm{i} H_0 \delta t/2},$   (12)

where H^(q) is the q-qubit Hadamard gate, are the same for every propagator and so are evaluated only once, requiring only a single full matrix exponential.

This expression does not depend on the weak coupling approximation, as strong couplings are unaffected by phase transformations. Similarly, dipolar couplings are axially symmetric, and so once again unaffected. It can be trivially extended to heteronuclear spin systems, as control fields applied to one nuclear species do not affect spins of other species, and so all commute with one another. The resulting approximate expression is identical to Eqn. 11, except that the three diagonal matrix exponentials are now calculated over weighted sums of operators corresponding to each control field.

III. ERRORS AND SPEED GAINS

The errors in this approach are most simply considered for a one-spin system with H_0 = ω I_z and H_j = α_j I_x, which permits analytical calculations. The fidelity between the exact value of V_j, Eqn. 2, and its approximation, Eqn. 11, is

$\Phi = 1 - \frac{\omega^2 \alpha_j^2 (\omega^2 + 4\alpha_j^2)}{2304}\, \delta t^6 + O(\delta t^8).$   (13)

For realistic frequencies and time steps in NMR systems, the infidelity of the approximate approach will be very small. Offset frequencies rarely exceed 15 kHz, and for the low RF powers used during long control pulses the nutation rate is usually below 5 kHz.
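This error formula can be checked numerically. The sketch below builds the approximate one-spin sub-propagator of Eqn. 11 (with φ_j = 0) and compares its infidelity against the prediction of Eqn. 13; the parameter values are illustrative, not those of any particular experiment:

```python
import numpy as np
from scipy.linalg import expm

Ix = np.array([[0, 0.5], [0.5, 0]], dtype=complex)
Iz = np.array([[0.5, 0], [0, -0.5]], dtype=complex)
Had = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # one-qubit Hadamard

omega, alpha, dt = 2 * np.pi * 5e3, 2 * np.pi * 2e3, 2e-6     # assumed values
H0 = omega * Iz

# Exact sub-propagator (Eqn. 2) for a phase-zero pulse.
exact = expm(-1j * (H0 + alpha * Ix) * dt)

# Approximate sub-propagator (Eqn. 11 with phi = 0): fixed W1, W2 from Eqn. 12,
# with the central exponential taken over the diagonal operator Fz = Iz.
W1 = expm(-1j * H0 * dt / 2) @ Had
W2 = Had @ expm(-1j * H0 * dt / 2)
central = np.diag(np.exp(-1j * alpha * dt * np.array([0.5, -0.5])))
approx = W1 @ central @ W2

infid = 1 - (abs(np.trace(exact.conj().T @ approx)) / 2) ** 2
predicted = omega**2 * alpha**2 * (omega**2 + 4 * alpha**2) * dt**6 / 2304
```

At these small step sizes the numerically computed infidelity agrees with the leading term of Eqn. 13 to within a few per cent.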
Even for these extreme values (a 15 kHz offset and a 5 kHz nutation rate), the error for a 10 µs time step is below 5 × 10⁻⁵, and for the lower frequencies and smaller time steps normally used the error will be far smaller.

Evaluating the error for pulse engineering in a real system is more complicated, reflecting the larger matrices involved, the large number of relevant frequencies, and the need to evaluate the error in the entire combined propagator, and not just the individual sub-propagators. The worst case occurs when each sub-propagator is identical: in this case the errors grow linearly with the number of steps, and so the infidelity, which depends on the square of the error, grows quadratically with the number of steps. However, such a case is quite unrealistic in practice, as variations in the amplitude and phase of control fields mean that errors will partly cancel. Our simulations indicate that the infidelities remain small in realistic cases.

The speed gains achieved by this approximate approach arise from avoiding full matrix exponentials by performing all calculations in an appropriate eigenbasis where the operators are diagonal. The new computational bottleneck is the calculation of matrix products, and since both matrix multiplication and numerical matrix exponentiation have a computational complexity of O(N³) for N-by-N matrices, these gains are approximately constant, and independent of the dimension of the Hamiltonian. (Much larger speed gains have been demonstrated in open-system GRAPE [21], but only for state-to-state tasks.)

As three of the matrices in Eqn. 11 are diagonal, most steps can be carried out using efficient partial matrix multiplication, and only one full matrix product is required (see the Appendix). The exact speed-up achieved can be quite complex, due to implicit parallel computation on larger matrices. While this reduces the wall clock time, the CPU time will be increased to handle the overheads of parallelisation.
This will affect the observed speed-up in a way that depends upon matrix size [19], as matrix exponentiation is more easily parallelised than our efficient algorithm. The speed gain achievable will depend on both the spin system chosen and the precise code used, but our MATLAB simulations for the 3-spin homonuclear system corresponding to the three ¹³C nuclei in alanine [20] indicate that the calculation of sub-propagators can be sped up by a factor of around 18, while full propagators (which require an additional full matrix multiplication for each sub-propagator) are sped up by a factor of around 13. This speed gain means that a four-spin system can be simulated more rapidly with our approximate approach than an exact simulation of a three-spin system.

IV. REDUCING ERRORS

The infidelity in Eqn. 11 arises from the fact that H_xj does not commute with H_0, and can be minimised by making H_xj as small as possible. This can be achieved by moving part of H_xj into H_0, writing

$H_0' = H_0 + \Omega F_x, \qquad H_j' = \alpha_j' F_x = (\alpha_j - \Omega) F_x,$   (14)

where the offset frequency Ω is chosen to make the values α′_j as close to zero as possible, for example by setting it to half the maximum amplitude expected. While this will increase the infidelity of some individual sub-propagators, on average the infidelity will be decreased, and the fidelity of the total propagator will improve. The basis transformation operators, Eqn. 12, have to be calculated for the new value of H₀′, but as this only needs to be done once for the entire calculation this gain in fidelity comes at no cost in time.

A more accurate approach is to choose Ω as the mean amplitude for the particular propagator being calculated: in this case the basis transformation operators have to be recalculated each time, but this still only requires one full matrix exponential for each propagator.
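The effect of the offset in Eqn. 14 can be illustrated numerically. The sketch below compares the average splitting error with and without an offset chosen as the mean amplitude, over a hypothetical range of pulse amplitudes (the error metric is the spectral norm of the sub-propagator error rather than the fidelity):

```python
import numpy as np
from scipy.linalg import expm

Ix = np.array([[0, 0.5], [0.5, 0]], dtype=complex)
Iz = np.array([[0.5, 0], [0, -0.5]], dtype=complex)

omega, dt = 2 * np.pi * 5e3, 10e-6     # assumed offset frequency and time step
H0 = omega * Iz

def suzuki_err(H_bg, alpha):
    """Spectral-norm error of the symmetric splitting for one sub-propagator."""
    exact = expm(-1j * (H_bg + alpha * Ix) * dt)
    approx = (expm(-1j * H_bg * dt / 2) @ expm(-1j * alpha * Ix * dt)
              @ expm(-1j * H_bg * dt / 2))
    return np.linalg.norm(approx - exact, 2)

amps = 2 * np.pi * np.linspace(0, 2e3, 21)   # hypothetical amplitude range
Omega = amps.mean()                          # offset as in Eqn. 14
plain = np.mean([suzuki_err(H0, a) for a in amps])
offset = np.mean([suzuki_err(H0 + Omega * Ix, a - Omega) for a in amps])
```

Because the physical Hamiltonian H_0 + α F_x is unchanged, both branches approximate the same exact propagator; only the splitting error differs, and the offset version is smaller on average.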
If even higher accuracy is required then it is possible to use two different offset values, one for small values of α_j and one for large values, choosing the appropriate basis transformation in each case. Our simulations suggest that using a single value of Ω can improve the infidelity of the propagator by a factor of around 15, while using two values can give an improvement of about 200, leading to still smaller propagator infidelities.

This could be improved still further by using a larger number of offsets, effectively trading memory for time [17], but these later gains are smaller than the early ones. When the number of offsets is large then all values of α′_j are small in comparison with all other frequencies, and in this limit the infidelity falls quadratically with the number of offsets, as expected from Eqn. 13. If very precise pulse engineering is required then it is simpler to use approximate techniques in the early stages of optimization, and switch to exact calculations using full matrix exponentials once the pulse fidelity is high enough to justify this.

V. CONCLUSIONS

We have applied our efficient algorithm with a single offset frequency to calculate GRAPE control pulses in a variety of NMR systems, observing a speed increase of at least a factor of 6. We have checked each pulse against a conventional full matrix exponential calculation, and the error in the fidelity of the final propagator has always been orders of magnitude below the infidelities of interest. As typical experiments in this field seek a fidelity of around 0.999 [22], this error is effectively negligible, and approximate propagators provide an entirely practical approach to pulse engineering.

Although developed, like GRAPE, for use in NMR, our approach could be used in other fields where GRAPE has been applied, such as ESR [23], NV centres [24], ion traps [25], and circuit QED [26].
The key requirement is that control Hamiltonians either commute with one another or can be converted to commuting forms using phase transformations which commute with the background Hamiltonian, and that the individual sub-propagators remain small enough to use Trotter–Suzuki approximations.

Acknowledgments

G. Bhole is supported by a Felix Scholarship.

Appendix: Efficient matrix multiplication

As matrix multiplication is now the computational bottleneck, it is vital that this is carried out as efficiently as possible. Full matrix multiplication, with a computational complexity of O(N³), should only be used when actually necessary: when multiplying a full matrix by a diagonal matrix only O(N²) steps are required.

If the algorithm is coded in a low-level language then it is easy to ensure this, but in higher-level languages, such as MATLAB, it is necessary to code carefully. Diagonal matrices should be stored not as matrices but as vectors, to avoid unnecessary operations. In particular the exponentials of diagonal matrices must be calculated using direct exponentiation of the individual elements. To combine the individual matrices, Eqn. 11 can be written as

$\{\phi\}\, W_1\, \{\alpha\}\, W_2\, \{\phi^*\}$   (A.1)

where braces indicate diagonal matrices stored as vectors. The multiplications by the two outer diagonal matrices can be combined, and the overall process written as

Φ .* (W1 * ({α} .* W2)), with Φ = φ .* φ′,   (A.2)

where * is the MATLAB operator for a full matrix multiplication, .* is the operator for an element-by-element multiplication, and φ′ indicates the adjoint of the vector φ. Note that only a single full matrix multiplication is required.

Combining the phase multiplications into a single matrix Φ is particularly useful when developing pulses which are robust to variations in the RF coupling strength. These are simulated by evaluating propagators with the control amplitude set to a range of values [22] (e.g., 95%, 100% and 105% of the nominal value).
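Although the implementation described here is in MATLAB, the combination in Eqn. A.2 translates directly into numpy, where `*` is element-wise and `@` is the full matrix product; an illustrative sketch, with `phi` and `alpha` holding the stored diagonal vectors of phase factors:

```python
import numpy as np

def combined_step(phi, alpha, W1, W2):
    """Eqn. A.2: Phi .* (W1 * ({alpha} .* W2)) with Phi = phi .* phi'.
    phi and alpha are vectors holding the diagonals of the outer and central
    exponentials; only the W1 product is a full matrix multiplication."""
    Phi = np.outer(phi, phi.conj())        # phi .* phi' (outer product), O(N^2)
    return Phi * (W1 @ (alpha[:, None] * W2))
```

Row-scaling `W2` by `alpha` and the final element-wise product are both O(N²) operations, so the single `@` call is the only O(N³) step, exactly as in the MATLAB version.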
When simulating such a range of amplitudes the matrix Φ is the same for all the different coupling strengths, and need only be calculated once.

When programming in MATLAB it is also important to think carefully about memory handling. In particular it is quicker to evaluate Eqn. A.2 one multiplication at a time, storing intermediate results in explicit variables. If the multiplications are carried out in one line then temporary variables are created to hold intermediate results, and subsequently destroyed. If equivalent calculations are carried out many times, as happens when evaluating propagators, it is quicker to reuse previously allocated variables. These minor issues are almost irrelevant in conventional GRAPE calculations, where the time needed for matrix exponentiation dominates over everything else, but become important once all these slow stages have been removed.

It might appear possible to speed up calculations further, for example by using the structure in F_z to avoid repeatedly calculating the same exponential terms. This would certainly be sensible when programming in a low-level language, but in our experience such tricks actually slow MATLAB down. It can be difficult to predict precisely what will give the fastest MATLAB code, and experimentation is the best approach.

[1] C. H. Bennett and D. P. DiVincenzo, Nature 404, 247 (2000).
[2] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (CUP, 2000).
[3] C. A. Ryan, C. Negrevergne, M. Laforest, E. Knill, and R. Laflamme, Phys. Rev. A 78, 012328 (2008).
[4] D. Lu, K. Li, J. Li, H. Katiyar, A. J. Park, G. Feng, T. Xin, H. Li, G. Long, A. Brodutch, et al., npj Quantum Information 3, 45 (2017).
[5] R. R. Ernst, G. Bodenhausen, and A. Wokaun, Principles of Nuclear Magnetic Resonance in One and Two Dimensions (Oxford University Press, 1987).
[6] N. Khaneja, T. Reiss, C. Kehlet, T. Schulte-Herbrüggen, and S. J. Glaser, J. Magn. Reson. 172, 296 (2005).
[7] P. de Fouquieres, S. G. Schirmer, S. J. Glaser, and I. Kuprov, J. Magn. Reson. 212, 412 (2011).
[8] D. L. Goodwin and I. Kuprov, J. Chem. Phys. 144, 204107 (2016).
[9] D. G. Lucarelli, preprint arXiv:1611.00188 (2016).
[10] C. Moler and C. Van Loan, SIAM Review 45, 3 (2003).
[11] N. J. Higham, SIAM J. Matrix Anal. Appl. 26, 1179 (2005).
[12] A. H. Al-Mohy and N. J. Higham, SIAM J. Matrix Anal. Appl. 31, 970 (2009).
[13] I. I. Maximov, Z. Tošner, and N. C. Nielsen, J. Chem. Phys. 128, 184505 (2008).
[14] S. C. Hou, L. C. Wang, and X. X. Yi, Phys. Lett. A 378, 699 (2014).
[15] T. Caneva, T. Calarco, and S. Montangero, Phys. Rev. A 84, 022326 (2011).
[16] G. Bhole, V. S. Anjusha, and T. S. Mahesh, Phys. Rev. A 93, 042339 (2016).
[17] G. Bhole and T. S. Mahesh, preprint arXiv:1707.02162 (2017).
[18] M. Suzuki, J. Stat. Phys. 43, 883 (1986).
[19] K. Waldherr, T. Huckle, T. Auckenthaler, U. Sander, and T. Schulte-Herbrüggen, High Performance Computing in Science and Engineering (Springer, 2010), chap. Fast 3D Block Parallelisation for the Matrix Multiplication Prefix Problem, pp. 39–50.
[20] D. G. Cory, M. D. Price, W. Maas, E. Knill, R. Laflamme, W. H. Zurek, T. F. Havel, and S. S. Somaroo, Phys. Rev. Lett. 81, 2152 (1998).
[21] S. Boutin, C. K. Andersen, J. Venkatraman, A. J. Ferris, and A. Blais, Phys. Rev. A 96, 042315 (2017).
[22] T. Xin, S. Huang, S. Lu, K. Li, Z. Luo, Z. Yin, J. Li, D. Lu, G. Long, and B. Zeng, Sci. Bull. 63, 17 (2018).
[23] Y. Zhang, C. A. Ryan, R. Laflamme, and J. Baugh, Phys. Rev. Lett. 106, 170503 (2011).
[24] F. Dolde, V. Bergholm, Y. Wang, I. Jakobi, B. Naydenov, S. Pezzagna, J. Meijer, F. Jelezko, P. Neumann, T. Schulte-Herbrüggen, et al., Nat. Commun. 5, 3371 (2014).
[25] V. Nebendahl, H. Häffner, and C. F. Roos, Phys. Rev. A 79, 012312 (2009).
[26] R. Fisher, F. Helmer, S. J. Glaser, F. Marquardt, and T. Schulte-Herbrüggen, Phys. Rev. B 81, 085328 (2010).