Geometrical versus time-series representation of data in quantum control learning
M. Ostaszewski, J.A. Miszczak, and P. Sadowski
Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Bałtycka 5, 44-100 Gliwice, Poland (Dated: v. 1.0 (14/03/2018))

We study the application of machine learning methods based on the geometrical and time-series character of data to quantum control. We demonstrate that recurrent neural networks possess the ability to generalize the correction pulses with respect to the level of noise present in the system. We also show that utilising the geometrical structure of control pulses is sufficient for achieving high fidelity in quantum control using machine learning procedures.
I. INTRODUCTION
The presented work aims to study two approaches to representing the information included in control pulses. In the first approach, the pulses are treated as time series. In the second approach, we utilise only the Euclidean geometry of the space of pulses. We focus on the correction scheme, which can be used to turn normal control pulses (NCP), i.e. control pulses corresponding to the system without the drift Hamiltonian, into denoising control pulses (DCP) for the system with the drift Hamiltonian, taking into account the undesired dynamics in the system.

The optimization of the control parameters of quantum systems is carried out mainly using greedy gradient methods [1, 2]. This is a different situation than in the field of control of classical systems, where machine learning methods are often used. In contrast to gradient methods, machine learning methods allow for a representation of the dynamics of the quantum system: an artificial neural network processing control pulses for the quantum system is an unambiguous representation of such dynamics. Therefore, it can be used to study the dynamics of the system, such as its sensitivity to disorder. Thus, the approach based on the use of neural networks in the theory of quantum control is important not only from the point of view of obtaining optimal control pulses [3, 4]; it also allows one to build and analyse the dynamics of the quantum system [5].

We present the analysis of different methods for representing the knowledge about the correction scheme. We focus on two categories of such methods. The first one relies on the geometrical dependencies of the data. The second one takes into account the time-series characteristics of the data. To be more specific, we limit our considerations to two methods: construction of the correction scheme by the k-means and kNN algorithms, and an approximation of this scheme by recurrent neural networks.
The methods based on the k-means and kNN algorithms are machine learning algorithms utilizing the geometrical features of data. They are used in many applications where fast classification is required. On the other hand, recurrent neural networks were developed for the purpose of learning time correlations of sequences and are applied in situations involving time series with long-range correlations, such as natural language processing.

Our analysis of the methods for representing the knowledge about the correction scheme is focused on two criteria:

• Efficiency in the reproduction of the correction scheme.
• Ability to generalize with respect to the strength of the noise parameter.

The first criterion tells us how good the analysed methods are at reproducing the correction scheme, i.e. what is the quality of the approximation they provide. The second criterion enables the examination of how much information about the reparation of control pulses for a fixed strength of noise can be used to generate control pulses for different strengths of the noise. Such analysis can be used to investigate to what degree the used methods can encode the form of the undesired interaction present in the system.

We present our results in the following order. We start by introducing the notation and technical details required for the description of the experiments. In particular, we introduce the architecture of an artificial neural network suitable for processing quantum control pulses. Next, we focus on the efficiency of artificial neural networks and geometrical methods in the reconstruction of quantum control pulses.

II. PRELIMINARIES

A. Single-qubit dynamics
Let us consider a two-dimensional system, described by $\mathcal{H} = \mathbb{C}^2$, with evolution described by the Schrödinger equation of the form

$\frac{d|\psi\rangle}{dt} = -i H(t) |\psi\rangle$. (1)

The Hamiltonian of the system is a sum of two terms, corresponding to the control field and the drift interaction respectively,

$H(t) = H_c(t) + H_\gamma$, (2)

with control Hamiltonian

$H_c(t) = h_x(t)\sigma_x + h_z(t)\sigma_z$, (3)

and drift Hamiltonian $H_\gamma$, where $\gamma$ is a real parameter. To steer the system, one needs to choose the coefficients in Eq. (3), i.e. $h(t) = (h_x(t), h_z(t))$. In simulations we assume that the function $h(t)$ is constant in time intervals (time slots) $\Delta t_i = [t_i, t_{i+1}]$, which form an equal partition of the evolution time interval, $T = \bigcup_{i=0}^{n-1} \Delta t_i$. Moreover, we assume that $h(t)$ takes values from the interval $[-1, 1]$; $h(t)$ will be denoted as a vector of values for the time intervals. For $\gamma = 0$, the control parameters are called NCP (normal control pulses), while for $\gamma > 0$ they are called DCP (denoising control pulses). Both are stored as arrays, with the first dimension corresponding to the time slots $\{0, \ldots, n-1\}$ and the second dimension corresponding to the controls $\{x, z\}$,

$NCP_{i,j} = h_j(t)$, for $\gamma = 0$,
$DCP_{i,j} = h_j(t)$, for $\gamma \neq 0$, (4)

with $t \in [t_i, t_{i+1}]$, $i \in \{0, \ldots, n-1\}$, $j \in \{x, z\}$.

The figure of merit in our problem is the fidelity distance between superoperators, defined as [6]

$F = 1 - F_{\mathrm{err}}$, (5)

with

$F_{\mathrm{err}} = \frac{1}{2N^2} \mathrm{Tr}\left((Y - X(T))^\dagger (Y - X(T))\right)$, (6)

where $N$ is the dimension of the system in question, $Y$ is the superoperator of the fixed target operator, and $X(T)$ is the evolution superoperator of the operator resulting from the numerical integration of Eq. (1) with the given controls. One should note that the superoperator $S$ of an operator $U$ is given by the formula

$S = U \otimes \bar{U}$. (7)
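The dynamics and figure of merit above can be sketched numerically as follows. This is a minimal numpy-only illustration, not the paper's actual code: the drift is taken as $H_\gamma = \gamma\sigma_y$ (an assumption consistent with the drift Hamiltonian mentioned later in the text), and the normalization $1/(2N^2)$ is this author's reading of the garbled Eq. (6).

```python
import numpy as np

# Pauli matrices; the drift is assumed to be H_gamma = gamma * sigma_y.
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def evolve(pulses, T=2.0, gamma=0.0):
    """Piecewise-constant evolution: `pulses` has shape (n, 2) with columns
    (h_x, h_z); each of the n time slots lasts dt = T / n."""
    n = len(pulses)
    dt = T / n
    U = np.eye(2, dtype=complex)
    for hx, hz in pulses:
        H = hx * SX + hz * SZ + gamma * SY  # H_c(t) + H_gamma, Eqs. (2)-(3)
        w, V = np.linalg.eigh(H)            # exact exp(-i H dt) for Hermitian H
        U = (V @ np.diag(np.exp(-1j * w * dt)) @ V.conj().T) @ U
    return U

def superoperator(U):
    return np.kron(U, U.conj())             # S = U (x) U-bar, Eq. (7)

def fidelity(U_target, U_evolved):
    """F = 1 - F_err, Eqs. (5)-(6), with N the dimension of the system."""
    N = U_target.shape[0]
    D = superoperator(U_target) - superoperator(U_evolved)
    return 1.0 - np.real(np.trace(D.conj().T @ D)) / (2 * N ** 2)
```

With zero pulses and no drift the evolution is the identity, while a constant $h_x$ with $h_x \cdot T = \pi/2$ implements $\sigma_x$ up to a global phase, which the superoperator fidelity ignores.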
B. Machine learning methods
In this section we introduce two machine learning algorithms, which will be compared in the next sections. It should be noted that the algorithms described below work directly on (NCP, DCP) pairs. However, the evaluation of their efficiency demands a comparison on the level of operators corresponding to the considered control pulses. Therefore, (NCP, U_target) pairs are used in the testing data. By the efficiency we understand the mean fidelity between the operators generated from DCP using the considered approximation and the target operators. We can distinguish two reference values to which the efficiency of the considered approximation is compared:

1. the mean fidelity between the operators obtained from NCP applied to the system with drift, and the target operators,
2. one, i.e. the maximal value which can be obtained from the fidelity function.

The necessary condition is to obtain efficiency higher than the value from point 1. The desired condition is to obtain efficiency close to the value from point 2.
1. LSTM as an approximation of the correction scheme
The control pulses used to drive the quantum system with the Hamiltonian given by Eq. (3) are formally time series. This suggests that one may study their properties using methods from pattern recognition and machine learning [7, 8] that have been successfully applied to process data with similar characteristics. The mapping from NCP to DCP shares similar mathematical properties with that of statistical machine translation [9], a problem which is successfully modelled with artificial neural networks (ANN) [10]. Because of this analogy, we use an ANN as the approximation function to learn the correction scheme for control pulses. A trained artificial neural network will be used as a map from NCP to DCP,

ANN(NCP) = nnDCP, (8)

where nnDCP, neural network DCP, is an approximation of DCP obtained using the neural network.

Our approach is based on a bidirectional LSTM network, where the input is a batch of NCP. After the last unit of the bidirectional LSTM, we apply one dense layer, which processes the output of the LSTMs into the final output. Because of the time-series character of control sequences, bidirectional long short-term memory (LSTM) units are the core of our network [11]. A long short-term memory (LSTM) block is a special kind of recurrent neural network (RNN), a type of neural network which has directed cycles between units. Similarly to other RNNs, LSTM networks can take into account hidden states from their history. In other words, the output at a given time depends not only on the current input but also on earlier inputs.
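To make the bidirectional data flow concrete, here is an untrained, numpy-only sketch of one forward pass of such a network: forward and reversed LSTM passes merged by element-wise sum, followed by a dense tanh head. The actual model in this work was trained in TensorFlow; the random weight initialisation here is purely illustrative, while the layer widths follow the architecture described in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_layer(x, H):
    """One LSTM pass over a sequence x of shape (T, D) with random,
    untrained weights; returns the hidden states, shape (T, H)."""
    T, D = x.shape
    W = rng.normal(0.0, 0.1, (4 * H, D + H))  # stacked i, f, g, o gates
    b = np.zeros(4 * H)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    h, c = np.zeros(H), np.zeros(H)
    out = np.zeros((T, H))
    for t in range(T):
        z = W @ np.concatenate([x[t], h]) + b
        i, f = sig(z[:H]), sig(z[H:2 * H])
        g, o = np.tanh(z[2 * H:3 * H]), sig(z[3 * H:])
        c = f * c + i * g          # cell state keeps long-range memory
        h = o * np.tanh(c)
        out[t] = h
    return out

def bidirectional_block(x, H):
    """Forward pass plus a pass over the reversed sequence, merged by
    element-wise sum, as in the architecture described in the text."""
    fwd = lstm_layer(x, H)
    bwd = lstm_layer(x[::-1], H)[::-1]
    return fwd + bwd

# Shapes as in the experiments: 16 time slots, 2 controls (h_x, h_z).
ncp = rng.uniform(-1, 1, (16, 2))
h = ncp
for H in (200, 250, 300):          # three stacked bidirectional layers
    h = bidirectional_block(h, H)
W_out = rng.normal(0.0, 0.1, (2, 300))
nn_dcp = np.tanh(h @ W_out.T)      # dense tanh head: shape (16, 2)
```

The tanh head keeps the predicted pulse amplitudes inside the admissible range of the controls.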
Therefore, this kind of neural network is applicable in situations involving time series with long-range correlations, such as natural language processing, where the next word depends not only on the previous sentence but also on the context.

Basic RNN architectures are not suitable for maintaining long time dependences, due to the so-called vanishing/exploding gradient problem: the gradient of the cost function may either exponentially decay or explode as a function of the hidden units. The LSTM unit has a structure specifically built to solve the vanishing/exploding gradient problem of other RNNs, and is adjusted to maintain memory over long periods of time. The bidirectional version of LSTM is characterized by the fact that it analyses the input sequence (time series) forwards and backwards, so it uses not only information from the past but also from the future [12, 13].

It should be noted that LSTM units are not the only ones in our network. Because a bidirectional LSTM returns forward and backward layers, the resulting signals have to be merged. The obtained results suggest that a fully connected layer at the end of the network increases the efficiency. For two-qubit systems we found that three stacked bidirectional LSTM layers are sufficient. Moreover, at the end of the network we use one dense layer which processes the output of the stacked LSTMs to obtain our nnDCP. Experiments are performed using the TensorFlow library [14, 15].

The details of the proposed architecture are as follows:

• input representing a batch of NCP, with shape equal to [batch size, time slots, 2], where the last dimension is 2 because we have 2 controls;
• three layers of bidirectional LSTM with the number of hidden units equal to 200, 250, and 300 respectively, resulting in two outputs with shapes [batch size, time slots, 300];
• merging of the outputs of the last LSTM unit, i.e. element-wise sum of the forward LSTM output and the backward LSTM output, resulting in the output shape [batch size, time slots, 300];
• joining of the batch size and time slots dimensions, i.e. a reshape, resulting in the output shape [batch size * time slots, 300];
• processing of the data by a dense layer, i.e. a fully connected layer (MLP), with tanh as the activation function, resulting in the output shape [batch size * time slots, 2];
• reshape, resulting in the output shape [batch size, time slots, 2];
• output with shape [batch size, time slots, 2].

As the cost function we choose the mean squared error between nnDCP and the target DCP. The scheme of generating the data and utilizing the network is as follows.

1. Generating the training set:
Generate (NCP, DCP) pairs for random target unitary operators using QuTiP [16–18].

2. Training the network: Train the neural network using NCP as the input and treating the corresponding DCP as the output reference.

3.
Testing:
(a) use QuTiP to generate a testing set consisting of NCP vectors for random U_target operators,
(b) use the trained neural network to generate an nnDCP vector for each NCP in the testing set,
(c) construct the operator F_nnDCP for the system with the drift, using the resulting nnDCP as the control sequence,
(d) construct the operator F_NCP for the system with the drift from the testing NCP,
(e) calculate the fidelity between F_nnDCP and U_target,
(f) calculate the fidelity between F_NCP and U_target corresponding to the NCP (from the testing set),
(g) compare the results from (e) and (f).

C. Clustering/classification algorithm
The considered geometric method of approximating the correction scheme is based on two steps. The first one is clustering of the training set and generating the set of representative corrections for the resulting clusters. In this step we generate the corrections which will be assigned in the next step, during the classification procedure. The second step is building a classifier that decides which correction cluster should be taken into account given the input NCP. This is motivated by the question whether it is possible to divide control pulses into clusters such that a common correction method would be effective.
1. Generation of representative corrections
In the first step of our procedure we need to extract the key information from the set of training control pulses. To do this, we perform the k-means algorithm on the training data and calculate the means over each cluster.

In the simplest approach, the input data for the k-means algorithm are provided as raw pulses obtained from QuTiP. However, it is possible to reduce the dimensionality of the space by using various approximations. Here, we have checked three types of approximation of the correction control pulses, namely

a) a sum of sinusoids

$a \sin(bx + c) + d \sin(ex + f) + g$, (9)

with $a > d$,

b) a polynomial of the third degree (poly3)

$ax^3 + bx^2 + cx + d$, (10)

c) a polynomial of the fourth degree (poly4)

$ax^4 + bx^3 + cx^2 + dx + e$. (11)

Input:
A set of CCP_i = DCP_i − NCP_i vectors and the number of clusters k. For each unitary matrix we calculate DCP and NCP using QuTiP. Next, we calculate the correction control pulses (CCP) as the differences between DCP and NCP. The number of clusters k should be chosen to maximize the mean fidelity, but it should be significantly smaller than the number of samples.

Output:
Cluster labels for each training point, a set of k corrections $\bar{C}_1, \ldots, \bar{C}_k$ representative for the clusters, and the mean efficiency of the corrections.

Step 0: (Optional) Approximation of the CCP vectors. We choose the type of approximation (one of Eqs. (9), (10), or (11)) and change the space of the considered data, i.e. we express every CCP_i by a vector of suitable approximation coefficients, coeffs_i.

Step 1:
Clustering of the set of CCP vectors. Apply the k-means algorithm on the set of correction pulses $C = \{CCP_i\}_i$, or on the coefficients coeffs_i. As every coeffs_i corresponds to a CCP_i, the result is a set of k clusters $C_1, C_2, \ldots, C_k$. Note that these clusters are pairwise disjoint and their union is the initial set C.

Step 2:
Calculate the output corrections. For each of the clusters we calculate the output correction as the mean correction vector

$\bar{C}_i = \frac{1}{|C_i|} \sum_{CCP \in C_i} CCP$. (12)

Step 3:
Calculate the efficiency of the output corrections.
1. For each training sample i we calculate the fidelity between the operator obtained from NCP_i + CCP_i and the operator resulting from applying NCP_i + $\bar{C}_j$, for j being the cluster label of the i-th training sample.
2. The mean of the obtained results is the score of the parameter k, SoP(k).

Step 4:
Return the calculated labels, the corrections, and the score of the parameter k.

It should be stressed that if Step 0 of the above algorithm is executed, then the data from the new space are used only in the first step of the algorithm, for calculating distances. The remaining part of the algorithm is based on the raw data, and the resulting corrections are calculated and benchmarked against the raw data.

If SoP(k) is lower than the expected efficiency, the above algorithm should be repeated for another k. In other words, if we do not know a suitable k, then the above algorithm can be used as a subprocedure for finding a parameter k with the assumed efficiency.
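The steps above can be sketched on toy data as follows. This is a numpy-only illustration under stated assumptions: the synthetic CCP vectors, the cluster count, and the dispersion-based stand-in for SoP(k) are all hypothetical, since computing the true score requires integrating the controlled evolution.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, iters=50):
    """Plain Lloyd's k-means, standing in for a library implementation."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Toy stand-ins for the real data: flattened CCP = DCP - NCP vectors,
# 16 time slots x 2 controls = 32 features per sample.
n_samples, n_features, k = 300, 32, 5
ccp = rng.normal(0.0, 1.0, (n_samples, n_features))

# Step 0 (optional): dimensionality reduction, e.g. degree-4 polynomial
# coefficients per pulse (Eq. (11)); these could replace raw CCP in Step 1.
t = np.linspace(0.0, 1.0, n_features)
coeffs = np.array([np.polyfit(t, v, 4) for v in ccp])

# Step 1: cluster the raw CCP vectors.
labels, centers = kmeans(ccp, k)

# Step 2: the representative correction of a cluster is its mean CCP,
# Eq. (12); after Lloyd's updates the final centers are exactly these means.
corrections = centers

# Step 3: the true SoP(k) averages fidelities of NCP_i + C_j over the
# training set; as a cheap proxy we report the within-cluster dispersion.
sop_proxy = float(np.mean([((ccp[labels == j] - corrections[j]) ** 2).mean()
                           for j in range(k) if np.any(labels == j)]))
```

In the full procedure the loop over k described in Step 4 would simply rerun this with different cluster counts until the score reaches the assumed level.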
2. Application of corrections
In the next step of our procedure we need to develop a method for deciding which correction should be used for a given NCP. For this purpose we utilize the kNN algorithm, which is fitted on the set of training NCP with the labels obtained during the clustering of CCP. The prediction of kNN on a test NCP gives us the most probable cluster j, and as a result a correction $\bar{C}_j$, which should be used to generate a suitable DCP.

The choice of the kNN algorithm is motivated by the geometrical dependencies of the data it utilizes. This is in line with our assumption about the examined approaches for representing control pulses.

III. EFFICIENCY OF LEARNING
In the experiments described below, we examine the two introduced methods, i.e. LSTM and k-means/kNN. We compare their efficiency as approximations of the correction scheme and we analyse their ability to generalize the scheme given additional information about the noise strength in the input.

The training and testing data used in the experiments described below were generated using standard methods [19, 20] and the QuTiP [16–18] package for the Python programming language. First, we construct the target matrix U, which is a 2 × 2 unitary matrix. The parameters of the model are:

• time of evolution T = 2,
• number of intervals n = 16,
• control pulses in [−1, 1].

A. Efficiency of LSTM
In this experiment we generate 5000 random unitary matrices and train the neural network to predict DCP based on NCP. The network is given the reference (NCP, DCP) pairs as a training set. The following results were obtained with an LSTM trained on 3000 control pulses. In this experiment we aim at analysing the limiting efficiency, i.e. the maximal obtainable efficiency with an unlimited number of training samples; we do not observe any significant improvement with an increasing number of training samples.

TABLE I. Mean fidelity for the drift Hamiltonian $H_\gamma = \gamma\sigma_y$ for different methods of obtaining control pulses.

The test has been performed on a set of 2000 control pulses, and the range of values of the parameter γ is adjusted to the restrictions on the parameters of the model. We restrict the values of this parameter to γ ∈ [0, 1]. The reference DCP are obtained for each γ using gradient-based methods implemented in QuTiP.

The results presented in Table I demonstrate that the LSTM network can achieve efficiency with errors of the same order of magnitude as the reference data. However, one should note that for higher values of the γ parameter the data obtained from QuTiP contain many outliers, i.e. there is a significant number of matrices for which the resulting fidelity is below the acceptable level of 0.90.

B. Efficiency of corrections from clustering
In the second experiment we use the k-means algorithm and utilize the mean control from a cluster to correct NCP. As the input data we take 3000 unitary matrices and generate the DCP, NCP and CCP vectors for them. It should be noted that at this level of the algorithm we test how efficient the clusters are. For this purpose we calculate the mean fidelity of the operators obtained from NCP corrected by the representative corrections and compare it to the efficiency of DCP and to the efficiency of NCP applied to the system with noise. Additionally, we perform the three approximations on all CCP.

As one can see in Fig. 1, for each kind of approximation method there exists a k such that applying the correction scheme, being the mean CCP within the corresponding cluster, is better than the mean fidelity of the application of NCP with γ ≠ 0. Moreover, the clustering on the raw data, without any kind of approximation, yields significantly better results. This demonstrates that the methods used for approximation are not suitable for reducing the dimensionality in this case.

In Fig. 2 one can see an important characteristic of the algorithm, namely its invariance with respect to the number of samples. One can see that the clustering with a fixed parameter k gives similar results for training sets consisting of 1000 and 5000 samples.

The last part of the assessment procedure consists of the efficiency test of the whole correction scheme, i.e. we aim at answering what the efficiency is when we use the kNN classifier. In this case the efficiency of kNN should be limited by the efficiency resulting from the clustering. As we can see in Fig. 3, the kNN classifier achieves similar results on a test set as the clustering on the training set. Moreover, one can see that for a number of clusters close to 300 we obtain efficiency similar to LSTM.

From the above results one can see that the data have some geometrical structure which can be captured by the k-means and kNN algorithms. Moreover, the results obtained using this approach are very similar to the results obtained using the significantly more complicated approach represented by LSTM.

IV. GENERALIZATION IN LEARNING QUANTUM CONTROL
As we are interested not only in the efficiency of the reproduction of the correction scheme, the goal of the next experiment is to investigate the ability of the considered methods to provide a generalization of the correction scheme. To achieve this goal, we modify the inputs in such a way that we add γ as an additional parameter to NCP. Next, we train the models for many choices of γ and check whether the analysed algorithms are able to infer the correction pulse for new choices of γ.

This allows the examination of the analysed algorithms' ability to interpolate and extrapolate the correction scheme according to the parameter γ. The analysis will be based on the mean efficiency of the considered algorithms. Moreover, we will compare their results to a reference point, i.e. to the efficiency of DCP generated for some γ, but integrated within the system with different values of γ.

FIG. 1. The above plots show the comparison of the efficiency of corrections from clustering with the efficiency of NCP, for (a) γ = 0.2, where the average fidelity of NCP is .90, (b) γ = 0.4, where the average fidelity of NCP is .66, (c) γ = 0.6, where the average fidelity of NCP is .37, and (d) γ = 0.8. The analysis was performed on 3000 samples. The number of clusters on the x axis varies from 1 to 496 with step 5. The x-marked plots correspond to clustering directly on CCP, the dotted plots correspond to clustering on data approximated by poly4, the dot-dashed plots correspond to clustering on data approximated by poly3, the dashed plots correspond to clustering on data approximated by the combination of sinusoids, and the continuous lines correspond to the average fidelity obtained from the application of NCP on the system with the drift Hamiltonian.

We perform experiments on two sets of γ parameters, {…, …, …} and {…, …, …}. We choose the training values of γ from different halves of the [0, 1] interval. Training on different values of γ has different efficiency, that is, if γ is larger, then the efficiency of the approximation is lower. Moreover, the tests in all experiments are performed on the set of γ values {0.1, 0.2, …, 0.9}.

A. Generalization using LSTM
In the previous experiments, the vectors of pairs, i.e. NCP, were applied as the input of the artificial neural network. Now, as the input, we will take vectors of triples, where each triple will be of the form $(h_x(t), h_z(t), \gamma)$. We denote these inputs as (NCP, $\bar{\gamma}$). This can be interpreted as the addition of a third dimension to the time series, where at each time slot there is the same value of γ. Because of that, the architecture of our network remains unchanged. The elements of the training set

• were generated from 3000 random unitary matrices,
• consist of 9000 NCP with gammas γ1, γ2, γ3, where we have 3000 vectors for each γ,
• each pair (NCP, $\bar{\gamma}$) corresponds to a different DCP.

The elements of the test set

• were generated from a different set of 2000 random unitary matrices,
• consist of 18000 NCP with gammas 0.1, 0.2, …, 0.9, where we have 2000 vectors for each γ,
• each pair (NCP, $\bar{\gamma}$) corresponds to a different DCP.

FIG. 2. Invariance of the clustering with respect to the number of samples. The graph plotted with a blue dashed line corresponds to the sample of size 1000, and the graph plotted with a yellow dot-dashed line corresponds to the sample of size 5000. The experiment was performed for a fixed value of γ.

FIG. 3. The above plots show the comparison of the efficiency of clustering on a training set (3000 samples) and of the classifier on a test set (2000 samples) with respect to the number of clusters, i.e. the number of universal correction schemes, for (a) γ = 0.2, where the average fidelity of NCP is .90, (b) γ = 0.4, where the average fidelity of NCP is .65, (c) γ = 0.6, where the average fidelity of NCP is .37, and (d) γ = 0.8, where the average fidelity of NCP is .16. Because of low divergence, we choose one representative k in the kNN algorithm, i.e. k = 4.

B. Generalization using k-means and kNN

Training of k-means: We train the clustering algorithm, with the number of clusters equal to 500, on CCP obtained from DCP for γ = …, …, ….

Training of kNN:
We train the classification algorithm on the flattened NCP corresponding to the CCP from the clustering. However, we include additional information about γ by adding γ, multiplied by some large number, to the flattened NCP vector. This separates the vectors in the subspace spanned by the added element. In our experiments, this number is equal to 1000. In kNN we choose k = 4.

Testing of kNN:
This situation is analogous to training, i.e. we take NCP from the test set and concatenate to it the corresponding label encoding γ.
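The γ-tagging trick described above can be sketched as follows. This is a numpy-only illustration: a plain kNN majority vote stands in for the library classifier, and the toy NCP vectors and cluster labels are hypothetical stand-ins for the real training data.

```python
import numpy as np

rng = np.random.default_rng(2)

def knn_predict(X_train, y_train, x, k=4):
    """Plain k-nearest-neighbours majority vote (Euclidean distance)."""
    d = ((X_train - x) ** 2).sum(axis=1)
    nearest = y_train[np.argsort(d)[:k]]
    return int(np.bincount(nearest).argmax())

SCALE = 1000.0  # the "large number" multiplying gamma, as in the text

def with_gamma(flat_ncp, gamma):
    """Append gamma * SCALE, separating samples by gamma along one axis."""
    return np.concatenate([flat_ncp, [gamma * SCALE]])

# Toy training data: flattened NCP vectors (16 slots x 2 controls) with
# hypothetical cluster labels from the CCP clustering step.
gammas = (0.1, 0.3, 0.5)
X = np.array([with_gamma(rng.uniform(-1, 1, 32), g)
              for g in gammas for _ in range(50)])
y = rng.integers(0, 5, size=len(X))

# The scaled gamma entry dominates the distance, so only neighbours with
# the matching gamma can be selected.
test_point = with_gamma(rng.uniform(-1, 1, 32), 0.3)
cluster = knn_predict(X, y, test_point, k=4)
```

Since the spread of the pulse features is bounded while a mismatch in γ contributes a squared distance of at least $(0.2 \cdot 1000)^2$, all selected neighbours share the test point's γ.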
1. Reference point
For the purpose of analysing the ability to generalize the correction scheme, we are using a reference point, defined as the values of the mean fidelity obtained by the algorithm trained on data with a fixed parameter γ, applied to other values of γ. The reference point provides the minimal efficiency which should be obtained by the tested algorithm in order to consider it acceptable.

The reference points are constructed as follows. Let us suppose that we have already trained the artificial neural network and the k-means/kNN algorithms. These trained approximations reproduce the correction schemes for a system with a fixed parameter γ and can be applied to a test set of NCP. The generated approximations of DCP can be integrated with different values of the parameter γ. The fidelity of the resulting operator with the target operator provides the efficiency of the approximation. In other words, we test how efficient a DCP generated for some γ is when we apply it to a system with a different γ.

FIG. 4. Reference points of the generalization of LSTM and kNN.
The reference points for kNN and LSTM are presented in Fig. 4. One can observe that, near the value of γ used for the reference point, LSTM has higher efficiency than kNN.

FIG. 5. Comparison of the generalization abilities of LSTM and kNN. Plots 5(a) and 5(b) present the generalization abilities of kNN, compared with its reference points; plots 5(c) and 5(d) present the generalization abilities of LSTM, compared with its reference points. The training set has size 3000, and the test set has size 2000.
2. Results
As one can see in Fig. 5(a), the k-means/kNN approach has three local extrema, which correspond to the values of γ on which the algorithm was trained. Therefore, this algorithm has noticeable drops in the interpolation. Comparing exact values, the k-means/kNN trained on data with γ = …, …, 0.5 has efficiency 0.962 and 0.945 for γ equal to 0.4 and 0.6, respectively. Reference values for kNN trained with γ = … and γ = 0.3 behave similarly: in this case the values of the mean fidelity for γ = … and γ = … remain close for other γ. Moreover, for kNN trained with γ = …, …, …, the efficiency for γ = 0.5 is limited. For kNN trained with γ = …, …, …, the efficiency for other values of γ is also limited; in this case this might be caused by the presence of outliers in the training data (see Fig. 5(b)). This suggests that this method does not utilize the information about the γ parameter: the additional γ in the input is not utilized, and the algorithm obtains similar results to the algorithm without γ in the input.

The situation is different in the case of the LSTM network, which displays the ability to generalize the correction scheme using the information about the γ parameter. This effect can be observed in Fig. 5(c). As one can see, LSTM has high efficiency in the neighbourhood of the training points. The reference point is the result obtained from LSTM trained on pairs (NCP, DCP) for the system with γ = 0.5. For the tested cases γ = … and γ = …, the LSTM with γ as an additional input parameter has efficiencies equal to 0.992 and 0.984, respectively. Thus the LSTM provides better results in the case of interpolation. A similar effect can be observed for the reference points for LSTM trained with γ = 0.3. In this case, the values of the mean fidelity for γ = … and γ = … obtained for LSTM trained with γ = …, …, … are higher than for the reference point with γ = 0.5. Thus, one can conclude that LSTM has the ability to generalize for other values of γ.

One can observe a decrease of the efficiency of the generalization for LSTM trained on γ = …, …, 0.9 and applied for γ = … and γ = 0.9. This might be caused by the presence of outliers in the training data. However, one should note that this effect does not influence the ability to provide the correction scheme for lower values of γ, which is more efficient than the reference point. This is in contrast with the lack of such ability observed in the case of using kNN.

One should note that the ability of generalization does not depend on the absolute values of γ. This can be observed in the decrease in the efficiency of extrapolation, which occurs both when we train the algorithms on small values of γ and extrapolate to larger values (see Figs. 5(a) and 5(c)), and when we train the algorithms on large values and extrapolate to small values (see Figs. 5(b) and 5(d)).

V. CONCLUDING REMARKS
The presented work demonstrates that the techniques used in machine learning can be applied for the purpose of generating quantum control pulses. The conducted experiments demonstrate that both neural networks and geometrical methods provide good approximations of the correction schemes and enable counteracting the undesired interaction present in the system.

However, one should note that both methods have their specific advantages and disadvantages. Artificial neural networks are useful in the sense of an approximation function, as the trained network is a unique map from NCP to DCP. Because of this, one can examine the variation, continuity, and other mathematical features of this correction scheme [21]. Moreover, we demonstrated that recurrent neural networks have the ability to generalise their predictions. This can be seen in the presented experiments, where the network that was trained on a few values of γ also gives good results for values of γ which were absent in the training process.

On the other hand, the application of clustering shows that this repair scheme can be compressed to a relatively small number of corrections. This demonstrates that the continuous process of quantum control can be represented by a relatively small number of representative control pulses. Such a method provides an efficiency of approximation similar to that of the recurrent neural networks. Unfortunately, the obtained results suggest that such a purely geometrical approach is significantly less reliable in the process of generalization. This is especially visible in situations where the extrapolation of the correction scheme is required. One should also note that the correction scheme based on the geometrical features of control pulses cannot be easily simplified by using standard approximation methods.
ACKNOWLEDGMENTS
MO acknowledges support from the Polish National Science Centre grant 2011/03/D/ST6/00413. JAM acknowledges support from the Polish National Science Centre grant 2014/15/B/ST6/05204. The authors would like to thank Daniel Burgarth and Leonardo Banchi for discussions about quantum control, Bartosz Grabowski and Wojciech Masarczyk for discussions concerning the details of the LSTM architecture, and Izabela Miszczak for reviewing the manuscript.

[1] N. Khaneja, T. Reiss, C. Kehlet, T. Schulte-Herbrüggen, and S. J. Glaser, J. Magn. Reson., 296 (2005).
[2] P. Doria, T. Calarco, and S. Montangero, Phys. Rev. Lett., 190501 (2011).
[3] M. August and J. M. Hernández-Lobato, (2018), arXiv:1802.04063.
[4] M. Swaddle, L. Noakes, L. Salter, H. Smallbone, and J. Wang, (2017), arXiv:1703.10743.
[5] S. Lloyd and S. Montangero, Phys. Rev. Lett., 010502 (2014).
[6] F. Floether, P. de Fouquieres, and S. Schirmer, New J. Phys., 073023 (2012).
[7] C. M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, Oxford, UK, 1995).
[8] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, Vol. 1 (MIT Press, Cambridge, USA, 2016).
[9] P. Koehn, Statistical Machine Translation (Cambridge University Press, Cambridge, UK, 2009).
[10] D. Bahdanau, K. Cho, and Y. Bengio, (2014), arXiv:1409.0473.
[11] S. Hochreiter and J. Schmidhuber, Neural Computation, 1735 (1997).
[12] M. Schuster and K. K. Paliwal, IEEE Transactions on Signal Processing, 2673 (1997).
[13] A. Graves, S. Fernández, and J. Schmidhuber, in International Conference on Artificial Neural Networks (Springer, 2005), pp. 799–804.
[14] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Vol. 16 (2016), pp. 265–283.
[15] "TensorFlow: An open-source machine learning framework for everyone" (2016–).
[16] "QuTiP – Quantum Toolbox in Python" (2012–).
[17] J. Johansson, P. Nation, and F. Nori, Comput. Phys. Commun., 1760 (2012).
[18] J. Johansson, P. Nation, and F. Nori, Comput. Phys. Commun., 1234 (2013).
[19] F. Mezzadri, Notices of the American Mathematical Society (2007), math-ph/0609050.
[20] J. Miszczak, Comput. Phys. Commun. 183.