Denoising quantum states with Quantum Autoencoders -- Theory and Applications
Tom Achache†, Lior Horesh†,⋆, and John Smolin†,⋆
†Columbia University, ⋆IBM Research
January 1, 2021
Abstract
We implement a Quantum Autoencoder (QAE) as a quantum circuit capable of correcting Greenberger–Horne–Zeilinger (GHZ) states subject to various noisy quantum channels: the bit-flip channel and the more general quantum depolarizing channel. The QAE shows particularly interesting results, as it enables an almost perfect reconstruction of noisy states, but can also, more surprisingly, act as a generative model to create noise-free GHZ states. Finally, we detail a useful application of QAEs: Quantum Secret Sharing (QSS). We analyze how noise corrupts QSS, causing it to fail, and show how the QAE allows the QSS protocol to succeed even in the presence of noise.
Autoencoders are a powerful type of neural network that can learn compact representations of data, and are often used as a denoising method [1]. Following this idea, Quantum Autoencoders (QAEs) have recently emerged as a possible solution to the crippling quantum noise problem, which is ubiquitous in real-device quantum computations. Here, we focus on QAEs as proposed in [2], which enable an almost perfect reconstruction of Greenberger–Horne–Zeilinger (GHZ) states subject to random bit-flips and small unitary noise. An $m$-qubit GHZ state is defined by
$$\frac{1}{\sqrt{2}}\left(|0\rangle^{\otimes m} + |1\rangle^{\otimes m}\right). \tag{1}$$
Studying GHZ states is of particular relevance as they exhibit long-range entanglement, and are used in a wide range of applications.

Notably, the circuit depth of the QAEs considered in this paper is very shallow (for instance, the QAE denoising 3-qubit GHZ states only requires 4 qubits), making them easy to implement on currently existing quantum devices.

These QAEs rely on the Quantum Neural Network (QNN) architecture described in [3], and are trained in an unsupervised fashion [4]. While [3] already provides an implementation of QNNs in Matlab [5], we choose here to implement the QAEs in a pure quantum framework, using the Qiskit [6] library.

In this study, we propose an end-to-end quantum (circuit) encoding of all the algorithmic steps associated with QAE training and testing. Further, we discuss important implementation and training details accompanied by extended results of the QAEs. Lastly, we perform a thorough mathematical analysis of the impact of noise in Quantum Secret Sharing (QSS) and show how to counter it. Hence, this work covers both theoretical and experimental aspects of QAEs.

This paper is organized as follows. Section 2 addresses the theoretical background regarding the implementation and training of QAEs on quantum circuits.
Section 3 presents numerical results pertaining to the performance of the QAEs when denoising corrupted states subject to different kinds of noise. Finally, in Section 4 we discuss an application of QAEs in QSS, and demonstrate their significance in that context.

This section summarizes the theory for implementing a QNN on a quantum computer, with particular attention to the specific behavior of QAEs. The critical steps of the computation are described below; the reader interested in the details of the calculations is referred to [3].
The QAE used in this study was selected from [3] for its appealing and practical property of being trainable and testable using a single quantum circuit and a minimal number of qubits. We denote by $[m_1, \dots, m_\ell]$ a QNN with $\ell$ layers, each layer having size $m_i$, $1 \le i \le \ell$ (thus the input of the QNN is a quantum state of $m_1$ qubits). For such a QNN, the quantum circuit is made up of $Q$ qubits, where
$$Q = 1 + m_1 + w \tag{2}$$
with
$$w = \max_{1 \le i \le \ell - 1} (m_i + m_{i+1}). \tag{3}$$
$w$ is the width of the QNN, and the other qubits provide a measure of the fidelity (see Eq. (4)) between the results of the QNN and the target states during the training phase. The fidelity between a state $|\phi\rangle$ and a density matrix $\rho$ is defined as
$$F(|\phi\rangle, \rho) = \langle\phi|\rho|\phi\rangle. \tag{4}$$
It should be underlined that once the training has been completed, we only use the QAE, which has $w$ qubits. An example of the complete circuit is represented in Figure 1. We divided it into four parts, separated by vertical grey lines ("barriers"). The first part of the network represents the state preparation, the second part contains the QAE itself, the third part allows us to compute the fidelity between the output of the QAE and the target state, as demonstrated in [3], and the last part is the measurement. If we denote by $\rho$ the density matrix of the state output by the QAE, by $|\phi\rangle$ the target state, and by $p$ the probability of getting 0 in the measurement, we then have
$$p = \frac{1}{2}\left(1 + F(|\phi\rangle, \rho)\right), \tag{5}$$
where $F$ is the fidelity function defined in Eq. (4).

The QAE circuit is represented in Figure 2. It has two internal layers, separated by a barrier. As explained in [3], a QNN is composed of layers of qubits (replacing the neurons in classical NNs), connected by unitary transformations (replacing the weights). Each unitary transformation acts on all the qubits of the input layer and one qubit of the output layer.
This is equivalent to classical NNs where each neuron at a certain layer is connected to all the neurons of the previous one. Thus, there are as many unitary transformations connecting two layers as there are qubits in the output layer. Moreover, each unitary transformation $U$ is defined by
$$U = e^{iK} \quad \text{with} \quad K = \sum_{\sigma \in P^{\otimes (m+1)}} k_\sigma \cdot \sigma, \tag{6}$$
where the $k_\sigma$ are the parameters to be learned, $m$ is the size of the input layer, $P = \{I, X, Y, Z\}$ is the set of Pauli matrices and the identity, and $P^{\otimes j}$ denotes the set of all possible tensor products of length $j$ between the elements of $P$. For instance, $P^{\otimes 2} = \{II, IX, IY, IZ, XI, XX, XY, XZ, \dots\}$. Since $P$ forms a basis for the real vector space of $2 \times 2$ Hermitian matrices, $P^{\otimes (m+1)}$ naturally forms a basis for the real vector space of $2^{m+1} \times 2^{m+1}$ Hermitian matrices. Hence, $K$ is uniquely defined by its coefficients $k_\sigma$.

As a reminder, the Pauli matrices are
$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{7}$$
Thus, between two layers of size $m_i$ and $m_{i+1}$, we will have $m_{i+1}$ unitary transformations, each having $4^{m_i + 1}$ coefficients. Hence, for a circuit of shape $[m_1, \dots, m_\ell]$, the total number of coefficients trained is
$$\sum_{i=1}^{\ell - 1} m_{i+1} \cdot 4^{m_i + 1}. \tag{8}$$
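As a quick check of Eqs. (2), (3) and (8), the qubit and parameter counts can be computed directly from a network shape. The sketch below is illustrative only: the function names are hypothetical and not taken from the authors' code, and the training-circuit size assumes that the $m$ in Eq. (2) is the size of the input layer.

```python
def qnn_width(shape):
    """Width w of Eq. (3): largest sum of two consecutive layer sizes."""
    return max(m_in + m_out for m_in, m_out in zip(shape, shape[1:]))


def training_qubits(shape):
    """Qubit count Q of Eq. (2), assuming the target register has the
    size of the input layer (true for autoencoders, where both match)."""
    return 1 + shape[0] + qnn_width(shape)


def parameter_count(shape):
    """Number of trained coefficients, Eq. (8).

    Each of the m_out unitaries between two layers acts on m_in + 1
    qubits and therefore carries 4 ** (m_in + 1) Pauli coefficients.
    """
    return sum(m_out * 4 ** (m_in + 1)
               for m_in, m_out in zip(shape, shape[1:]))


if __name__ == "__main__":
    for shape in ([2, 1, 2], [3, 1, 3]):
        print(shape, qnn_width(shape), training_qubits(shape),
              parameter_count(shape))
```

For the [3,1,3] shape this reproduces the values $w = 4$ and $Q = 8$ quoted for Figure 1, with 304 trainable coefficients; the [2,1,2] shape has 96.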
Figure 1: Circuit for the training of a [3,1,3] QAE (blocks: Input State Prep, Target State Prep, Quantum AE). Here, $w = 4$ and $Q = 8$.

Figure 2: Circuit of a [3,1,3] QAE.

In order to use the minimum possible number of qubits for the QAE circuit, we can reuse the qubits of the preceding layers by resetting them, as illustrated in Figure 2 (the $|0\rangle$'s). In practice, at all times during training, we only need the qubits representing two consecutive layers, so we can use a fixed number of qubits by resetting the ones used in past layers. However, one has to be particularly careful at the end of the circuit to identify the qubits where the state is actually encoded.

The aim of the training part is to maximize the fidelity between the output of the QAE and the actual state(s) we are trying to reconstruct (for instance, the noise-free GHZ states).

As in a regular neural network, we have $N$ training pairs, which here consist of two circuits preparing the input and target states (see Figure 1). Since our aim is to denoise GHZ states, it is not possible to set them as the targets of our network, because that would imply that we actually have access to them. Rather, we train our QAE in an unsupervised way, similarly to the training of classical AEs. However, classical AEs are trained with pairs $(x, x)$, i.e. where the input and the target are identical. This method does not apply here, as we do not have access to the specific noise affecting a state. Therefore, we train our QAE on pairs $(x, y)$, where $x$ and $y$ are drawn from the same noisy distribution.

The cost function $C$ is a function of the vector $\kappa$ of all the parameters $k_\sigma$. We denote by $\rho_i^\kappa$ the output of the QAE with parameters $\kappa$, when fed with the $i$-th training state $|\phi\rangle_i$. We have
$$C(\kappa) = \frac{1}{N} \sum_{i=1}^{N} F(|\phi\rangle_i, \rho_i^\kappa). \tag{9}$$
The network is trained using a gradient ascent method. To approximate the gradient of the cost function, we use the finite difference method, due to its simplicity.
However, this method can be rather unstable numerically in the case of noisy functions, and one might want to consider other Derivative-Free Optimization methods instead [7].

Let $\epsilon > 0$. For every parameter $k_\sigma$, if $\epsilon$ is small enough,
$$\frac{\partial C}{\partial k_\sigma}(\kappa) \simeq \frac{C(\kappa + \epsilon_{k_\sigma}) - C(\kappa)}{\epsilon}, \tag{10}$$
where $\epsilon_{k_\sigma}$ is an indicator vector with $\epsilon$ at the position corresponding to $k_\sigma$, and 0 elsewhere. We can then update the parameter vector $\kappa$ with the classical rule
$$\kappa \longleftarrow \kappa + \eta \, \frac{\partial C}{\partial k_\sigma}(\kappa), \tag{11}$$
where $\eta$ is the learning rate.

Hence, we implemented the training algorithm as follows: at each epoch, we start by computing $C(\kappa)$ (with the circuit depicted in Figure 1), and then, for each coefficient $k_\sigma$, we compute $C(\kappa + \epsilon_{k_\sigma})$. Finally, we update $\kappa$.

In terms of complexity, each epoch requires $N \cdot (|\kappa| + 1) \cdot S$ measurements, where $|\kappa|$ is the value defined in Eq. (8) and $S$ is the number of times we run each circuit in order to estimate the probability of getting 0 as a result (and thus the fidelity, through Eq. (5)). Choosing a large $S$ may unnecessarily increase the computation time. Conversely, choosing a small $S$ may result in a poor approximation of $p$ (and thus of the fidelity), which will hinder the training performance. We decided to use $S = 1000$ in all simulations, as it offers a sensible approximation of $p$ while keeping the number of measurements reasonable.

All the measurements can be done sequentially. This is what makes the QAE of practical interest, as it can be implemented and trained using a single and relatively small quantum circuit. However, running measurements in parallel can significantly decrease the training time of the algorithm, as each epoch can be entirely parallelized. We included both options in our code.

Besides, we need to tune two hyperparameters in the training phase, $\epsilon$ (the differential length) and $\eta$ (the learning rate), which can be prohibitive. However, while the success of the training phase is dependent on these two parameters, in practice $\epsilon = 0.$[…] and $\eta = 1/$[…] […] $\eta$ due to its adaptive learning rate.
However, using vanilla gradient ascent with the stated value of $\eta$ yielded more than satisfactory results.

Due to the high number of coefficients (see Eq. (8)) and training states needed to properly train the QAE, the training phase is very time-consuming. We were able to obtain almost perfect results (fidelity above 0.95) on [2,1,2] and [3,1,3] QAEs, but the resources needed for the training of larger networks exceeded our capacity. The method should nevertheless work in theory, independently of the size of the network; [2] shows results on a [4,2,1,2,4] network.
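The training procedure of Eqs. (10) and (11), together with the measurement budget discussed above, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: `toy_cost` is a hypothetical stand-in for the circuit-based cost $C(\kappa)$, the step sizes passed in the usage example are arbitrary illustrative values, and `fidelity_from_counts` simply inverts Eq. (5).

```python
def fidelity_from_counts(zeros, shots):
    """Invert Eq. (5): p = (1 + F) / 2, with p estimated as zeros / shots."""
    return 2.0 * zeros / shots - 1.0


def finite_difference_gradient(cost, kappa, eps):
    """Forward-difference estimate of the gradient, Eq. (10)."""
    base = cost(kappa)
    grad = []
    for j in range(len(kappa)):
        shifted = list(kappa)
        shifted[j] += eps  # perturb one coefficient at a time
        grad.append((cost(shifted) - base) / eps)
    return grad


def gradient_ascent(cost, kappa, eps, eta, epochs):
    """Repeat the update of Eq. (11) on every coefficient."""
    kappa = list(kappa)
    for _ in range(epochs):
        grad = finite_difference_gradient(cost, kappa, eps)
        kappa = [k + eta * g for k, g in zip(kappa, grad)]
    return kappa


def measurements_per_epoch(n_pairs, n_params, shots):
    """N * (|kappa| + 1) * S circuit executions per epoch."""
    return n_pairs * (n_params + 1) * shots


def toy_cost(kappa):
    """Hypothetical smooth cost with its maximum (value 1) at (0.3, -0.7)."""
    return 1.0 - (kappa[0] - 0.3) ** 2 - (kappa[1] + 0.7) ** 2


if __name__ == "__main__":
    trained = gradient_ascent(toy_cost, [0.0, 0.0], eps=1e-3, eta=0.25, epochs=100)
    print(trained, toy_cost(trained))
    print(measurements_per_epoch(100, 96, 1000))
```

With $N = 100$ pairs, the 96 coefficients of a [2,1,2] QAE and $S = 1000$ shots, `measurements_per_epoch` gives 9.7 million circuit executions per epoch, which illustrates why training larger networks quickly becomes expensive.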
We first tested our QAE on the task of denoising GHZ states subjected to random bit-flips. In our model, every bit of the state is flipped with a certain probability $p$.

We trained a [2,1,2] QAE with 100 training pairs, drawn by taking GHZ states subjected to a bit-flip probability $p = 0.2$. The training curve is shown in Figure 3. At the end of each epoch we computed the fidelity on the training pairs and on validation pairs (100 pairs, drawn from the same distribution as the training pairs). As the small discrepancy between the training and validation performance in Figure 3 shows, overfitting is not a concern here: the validation performance closely follows the training performance, and in particular never starts decreasing.
Figure 3: Training fidelity of a [2,1,2] QAE using 100 training pairs, for a bit-flip noise of probability $p = 0.2$ (x-axis: epochs; y-axis: training fidelity; curves: training pairs, validation pairs). We used $\epsilon = 0.$[…] and $\eta = 1/$[…].

[…] The theoretical fidelity for a given $p$ can be computed by noting that when subjecting a two-qubit GHZ state to random bit-flips, there are only two possible resulting states:
$$\frac{1}{\sqrt{2}}\left(|00\rangle + |11\rangle\right) \quad \text{and} \quad \frac{1}{\sqrt{2}}\left(|01\rangle + |10\rangle\right). \tag{12}$$
The fidelities of these states with the GHZ state are respectively 1 and 0. Moreover, to get the first state (the GHZ state), either zero or both qubits have to be flipped. Hence, the theoretical fidelity is simply
$$(1 - p)^2 + p^2. \tag{13}$$
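Eq. (13) is easy to check numerically. The following sketch, with hypothetical function names, compares the closed form with a Monte Carlo estimate in which each of the two qubits is flipped independently with probability $p$ and the state counts as GHZ when zero or both qubits were flipped.

```python
import random


def bitflip_fidelity_theory(p):
    """Eq. (13): probability that zero or both qubits are flipped."""
    return (1 - p) ** 2 + p ** 2


def bitflip_fidelity_sampled(p, shots, seed=0):
    """Monte Carlo estimate of the same quantity for a two-qubit GHZ state."""
    rng = random.Random(seed)
    unchanged = 0
    for _ in range(shots):
        flip_first = rng.random() < p
        flip_second = rng.random() < p
        # Flipping both qubits maps |00> + |11> onto itself.
        if flip_first == flip_second:
            unchanged += 1
    return unchanged / shots


if __name__ == "__main__":
    for p in (0.1, 0.2, 0.3, 0.4, 0.5):
        print(p, bitflip_fidelity_theory(p), bitflip_fidelity_sampled(p, 20000))
```

Note that the closed form never drops below 0.5 (its minimum, reached at $p = 0.5$), so on average the GHZ state is never the minority outcome under this model.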
Figure 4: Mean and deviation of the fidelity of the [2,1,2] network, trained with $p = 0.2$, over different bit-flip probabilities (y-axis: fidelity with GHZ; series: noisy theoretical fidelity, noisy actual fidelity, denoised fidelity). For each $p$, 200 test states were drawn. The green points are the fidelities of the test states. The red points are their theoretical fidelities.

Thus, our QAE almost perfectly reconstructs GHZ states affected by bit-flips of arbitrary probability.

Since increasing the bit-flip probability does not affect our QAE's performance, one could wonder whether the strength of the bit-flip channel (i.e. the bit-flip probability) is totally irrelevant in the whole experiment. It is not. The strength of the channel plays a critical role in training: the stronger the noise, the slower the convergence. And if the noise is too strong, the QAE might never converge to the desired state. For instance, if $p$ is too large, a finite training set may contain about as many (or, by sampling fluctuation, more) $\frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$ states as GHZ states, and the QAE may erroneously conclude that this is the original state to learn. This is represented in Figure 5.

In sub-figures 5a to 5d, the QAE is able to achieve an almost perfect fidelity. However, in sub-figures 5e and 5f, we observe two different behaviors: the fidelity stagnates around 0.5 or goes to 0 instead of 1. This is because when $p = 0.4$, by Eq. (13), we have 52% of GHZ states (and thus 48% of the $\frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$ state). Since these probabilities are very close to 50%/50%, the QAE may fail to settle on a state (sub-figure 5e) or may converge to the $\frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$ state (sub-figure 5f).

Considering the ability of our QAE to denoise states corrupted by bit-flips, we further attempted to denoise states affected by the quantum depolarizing channel (QDC), which is a more general form of noise. For an input state with density matrix $\rho$ representing $m$ qubits, the QDC with noise strength $p$ is defined as
$$\rho \xrightarrow{\text{QDC}} (1 - p)\rho + p\left(\frac{1}{2^m} \cdot I\right), \tag{14}$$
where $\frac{1}{2^m} \cdot I$ is the density matrix of the maximally mixed state. One can show that this is equivalent to applying exactly one of the matrices $\{I, X, Y, Z\}$ (the Pauli matrices and the identity) to each qubit, with respective probabilities $\left\{1 - \frac{3p}{4}, \frac{p}{4}, \frac{p}{4}, \frac{p}{4}\right\}$.

We trained a [2,1,2] QAE with 100 training pairs subjected to the QDC with noise strength $p = 0.2$ (see Figure 6). […]
Figure 5: Training fidelities of a [2,1,2] QAE trained with pairs affected by the bit-flip channel for various strengths $p$ (panels (a) to (f); x-axis: epochs; y-axis: training fidelity). In all experiments, we trained on 100 pairs, using $\epsilon = 0.$[…] and $\eta = 1/4$. We can indeed see that the larger $p$ is, the slower the convergence. For instance, one can compare the epoch at which the fidelity crosses the 0.9 threshold.
Figure 6: Training fidelity of a [2,1,2] QAE using 100 training pairs affected by the QDC with noise strength $p = 0.2$ (x-axis: epochs; y-axis: training fidelity). We used $\epsilon = 0.$[…] and $\eta = 1/$[…].

Proposition. For an $m$-qubit GHZ state corrupted by the QDC with noise strength $p$, the theoretical fidelity with GHZ is
$$\sum_{\substack{k=0 \\ k \text{ even}}}^{m} \binom{m}{k} \left(\frac{p}{4}\right)^{k} \left(1 - \frac{3p}{4}\right)^{m-k} + 2^{m-1} \cdot \left(\frac{p}{4}\right)^{m}. \tag{15}$$

Proof. Every state output by the QDC that is not GHZ (up to a global phase) will be orthogonal to it, hence their fidelity will be 0. For the QDC to output the GHZ state (up to a global phase), there are only two possibilities:
• an even number of qubits were affected by $Z$ and the rest by $I$
• an even number of qubits were affected by $Y$ and the rest by $X$
Since the fidelity of these states with GHZ is 1, the theoretical fidelity is thus
$$\sum_{\substack{k=0 \\ k \text{ even}}}^{m} \binom{m}{k} \left(\frac{p}{4}\right)^{k} \left(1 - \frac{3p}{4}\right)^{m-k} + \sum_{\substack{k=0 \\ k \text{ even}}}^{m} \binom{m}{k} \underbrace{\left(\frac{p}{4}\right)^{k} \left(\frac{p}{4}\right)^{m-k}}_{\left(\frac{p}{4}\right)^{m}}. \tag{16}$$
Since $\sum_{k=0,\, k \text{ even}}^{m} \binom{m}{k} = 2^{m-1}$, we get the final result. ∎

As was the case with the bit-flip channel, our QAE is indifferent to the strength of the QDC and displays almost perfect reconstruction of GHZ states in every case.

Figure 7: Mean and deviation of the fidelity of the [2,1,2] network, trained with $p = 0.2$, over different noise strengths of the QDC (y-axis: fidelity with GHZ; series: noisy theoretical fidelity, noisy actual fidelity, denoised fidelity). For each $p$, 200 test states were drawn. The green points are the fidelities of the test states. The red points are their theoretical fidelities.

The QAE also features impressive generative abilities. Since the QAE is able to effectively recreate GHZ states independently of the noise strength, it should be able to recreate them without taking any input (i.e. starting with the state $|0 \cdots 0\rangle$), and thus act as a generative model for noise-free GHZ states. We ran 200 simulations, and found that the average fidelity between the states created by our QAE and the GHZ state is 0.96 with a standard deviation of 0.01, thus supporting the above claim.

In a second experiment, we trained a [3,1,3] QAE using 150 training pairs affected by the QDC with noise strength $p = 0.2$ (see Figure 8). The performance of the [3,1,3] QAE is displayed in Figure 9. Once again, the QAE is rather insensitive to stronger noise. However, the fidelity is not as high as with the [2,1,2] QAE (between 0.9 and 0.95 here), and the standard deviation is quite large, especially for very strong levels of noise.

To further improve the fidelity and dampen the standard deviation, we can apply our [3,1,3] QAE twice. The results are presented in Figure 10.

Our [3,1,3] QAE can also be used as a generative model. Running the same experiment as with the [2,1,2] QAE, we found that the states created by our QAE achieved an average fidelity of 0.[…] ± […] with the 3-qubit GHZ state.

Figure 8: Training fidelity of a [3,1,3] QAE using 150 training pairs affected by the QDC with noise strength $p = 0.2$ (x-axis: epochs; y-axis: training fidelity). We used $\epsilon = 0.$[…] and $\eta = 1/$[…].

Figure 9: Mean and deviation of the fidelity of the [3,1,3] network, trained with $p = 0.2$, over different noise strengths of the QDC (y-axis: fidelity with GHZ; series: noisy theoretical fidelity, noisy actual fidelity, denoised fidelity). For each $p$, 200 test states were drawn. The green points are the fidelities of the test states. The red points are their theoretical fidelities.

To conclude this section, we can visualize the effect of our [2,1,2] QAE on noisy states, using the 'States City' plot feature of Qiskit (see Figure 11). It can be observed how the QAE restores the height of the relevant entries, i.e. those corresponding to the density matrix of the GHZ state.
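As a sanity check of the closed form in Eq. (15), the fidelity can also be obtained by brute force. The sketch below is an illustration under the per-qubit Pauli description of the QDC stated in the text (each qubit receives $I$, $X$, $Y$ or $Z$ with probabilities $1 - 3p/4$, $p/4$, $p/4$, $p/4$); the function names are hypothetical. It applies every Pauli string to the GHZ state, computes the overlap with the original state, and sums the weighted fidelities.

```python
from itertools import product
from math import comb


def pauli_on_bit(pauli, bit):
    """Return (new_bit, phase) for a single-qubit Pauli applied to |bit>."""
    if pauli == "I":
        return bit, 1
    if pauli == "X":
        return 1 - bit, 1
    if pauli == "Z":
        return bit, (-1) ** bit
    # Y|0> = i|1> and Y|1> = -i|0>
    return 1 - bit, 1j * (-1) ** bit


def ghz_fidelity_after(paulis):
    """|<GHZ| P |GHZ>|^2 for a Pauli string P given as a tuple of letters."""
    overlap = 0
    for bit in (0, 1):  # the two branches |0...0> and |1...1>
        phase = 1
        out_bits = []
        for pauli in paulis:
            new_bit, factor = pauli_on_bit(pauli, bit)
            out_bits.append(new_bit)
            phase *= factor
        if len(set(out_bits)) == 1:  # branch is still |0...0> or |1...1>
            overlap += phase / 2
    return abs(overlap) ** 2


def qdc_fidelity_enumerated(m, p):
    """Average fidelity over all 4**m Pauli strings, weighted per qubit."""
    weight = {"I": 1 - 3 * p / 4, "X": p / 4, "Y": p / 4, "Z": p / 4}
    total = 0.0
    for paulis in product("IXYZ", repeat=m):
        prob = 1.0
        for pauli in paulis:
            prob *= weight[pauli]
        total += prob * ghz_fidelity_after(paulis)
    return total


def qdc_fidelity_closed_form(m, p):
    """Eq. (15)."""
    a, b = 1 - 3 * p / 4, p / 4
    even_z = sum(comb(m, k) * b ** k * a ** (m - k) for k in range(0, m + 1, 2))
    return even_z + 2 ** (m - 1) * b ** m


if __name__ == "__main__":
    for m in (2, 3):
        for p in (0.0, 0.2, 0.5, 1.0):
            print(m, p, qdc_fidelity_enumerated(m, p), qdc_fidelity_closed_form(m, p))
```

The enumeration is exponential in $m$, so it is only meant for the small sizes considered here.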
It is interesting to assess the robustness of our QAEs to internal noise, as it is ubiquitous in a real quantum device. To do so, we introduce Gaussian noise in the unitary gates composing our QAEs.

Figure 10: Mean and deviation of the fidelity of the [3,1,3] network, trained with $p = 0.2$ […] (y-axis: fidelity with GHZ; series: noisy theoretical fidelity, noisy actual fidelity, denoised fidelity 2x). For each $p$, 200 test states were drawn. The green points are the fidelities of the test states. The red points are their theoretical fidelities.

Using Eq. (6), the addition of a noisy term $\delta$ leads to the following equation for the noisy gates:
$$\tilde{U} = e^{i(K + \delta)} = e^{i\delta} \cdot U, \tag{17}$$
where the second equality is exact only when $\delta$ commutes with $K$.

For a meaningful comparison, the introduced noise $\delta$ has to be significantly smaller than $K$, which was found in previous simulations to be about −[…] ± […]25. We chose a Gaussian noise $\delta$ centered on 0, and tested the robustness of the QAE while increasing its standard deviation. The results are illustrated in Figures 12 and 13, respectively for the [2,1,2] and [3,1,3] QAEs. In each case, 200 states are drawn from the QDC with noise strength $p = 0.3$. The results show a significant robustness to noise, as the QAEs still increase the fidelity of the denoised states until the standard deviation of the gates' noise reaches roughly one third of the average value of the coefficients $K$. Furthermore, Figures 12 and 13 show that the GHZ states are almost perfectly reconstructed by the QAEs as long as the standard deviation remains below 0.03.

In this section, we have demonstrated that the QAEs studied in this paper are able to (almost) perfectly denoise GHZ states, and at the same time generate them without explicitly being given the goal state. Furthermore, we have shown that these QAEs are extremely robust to internal noise.

We emphasize once again that during the whole training phase, the QAE only sees noisy GHZ states, both on the input and on the target. It is then able to perceive the underlying structure behind all these noisy states, and to recompose it.
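To give a feel for how Gaussian noise on the coefficients of Eq. (6) perturbs a gate, here is a deliberately simplified single-qubit toy model; it is not the experiment reported above, which uses multi-qubit unitaries inside the QAE. The sketch assumes $K = k_x X + k_y Y + k_z Z$, uses the standard identity $e^{i\,\vec{k}\cdot\vec{\sigma}} = \cos r\, I + i \sin r\, (\hat{k}\cdot\vec{\sigma})$ with $r = |\vec{k}|$, and measures the overlap $|\mathrm{Tr}(U^\dagger \tilde{U})|/2$ between the ideal and noisy gates. All names and coefficient values are hypothetical.

```python
import math
import random


def su2_components(k):
    """Return (a0, a) with exp(i * k.sigma) = a0 * I + i * (a . sigma)."""
    r = math.sqrt(sum(c * c for c in k))
    if r == 0.0:
        return 1.0, (0.0, 0.0, 0.0)
    s = math.sin(r) / r
    return math.cos(r), tuple(s * c for c in k)


def gate_overlap(k, k_noisy):
    """|Tr(U^dagger U_noisy)| / 2 for the two single-qubit gates."""
    a0, a = su2_components(k)
    b0, b = su2_components(k_noisy)
    return abs(a0 * b0 + sum(x * y for x, y in zip(a, b)))


def mean_overlap(k, sigma, trials=2000, seed=0):
    """Average overlap when each coefficient receives N(0, sigma^2) noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        noisy = [c + rng.gauss(0.0, sigma) for c in k]
        total += gate_overlap(k, noisy)
    return total / trials


if __name__ == "__main__":
    coefficients = (0.3, -0.2, 0.5)  # arbitrary illustrative values
    for sigma in (0.0, 0.03, 0.1, 0.3):
        print(sigma, mean_overlap(coefficients, sigma))
```

In this toy setting the overlap stays very close to 1 for small standard deviations and degrades as the noise becomes comparable to the coefficients themselves, which is qualitatively the trend described in the text.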
We detailed the creation and training of QAEs. We now discuss a practical application of them in the context of QSS. We analyze the limitation that noise imposes on QSS and show how QAEs can rescue QSS protocols by dampening the impact of noise. But first, we explain the basis of a QSS protocol.
A Secret Sharing protocol is a method that allows a secret to be split between participants, so that no single share gives any information about the secret, but combined together, the shares allow the participants to recover it.

In a classical setting, for instance, a secret could be a two-digit number (i.e. between 00 and 99). If we have two participants, one possible scheme would be to give one digit of the number to each participant. However, this scheme is not considered a secure secret sharing protocol, as having a digit of a number gives considerable information about that number. In contrast, consider a scheme where a randomly drawn number is added to the secret and the sum is reduced modulo 100. We then give the first participant the result, and the second participant the drawn number. In this example, neither participant can learn anything about the secret alone, but by collaborating they can fully retrieve it.

Formally, a secure Secret Sharing protocol must ensure that:
• collaboration allows the participants to recover the secret
• shares alone do not allow any participant to reduce the search space in which the secret lies
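The classical two-party scheme described above can be written down directly. The sketch below is an illustration with hypothetical function names; it assumes the random number is drawn uniformly from 0 to 99, which is what makes each share on its own uninformative.

```python
import secrets

MODULUS = 100  # secrets are two-digit numbers, 00 to 99


def split_secret(secret):
    """Return (masked, pad): the masked value goes to the first participant,
    the uniformly drawn pad to the second."""
    pad = secrets.randbelow(MODULUS)
    masked = (secret + pad) % MODULUS
    return masked, pad


def recover_secret(masked, pad):
    """Both participants combine their shares to undo the masking."""
    return (masked - pad) % MODULUS


if __name__ == "__main__":
    share_one, share_two = split_secret(42)
    print(share_one, share_two, recover_secret(share_one, share_two))
```

Because the pad is uniform, the masked value is itself uniform over 00 to 99 whatever the secret is, so neither share alone narrows down the search space.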
Figure 11: States city of noisy and denoised states: (a) noisy states city, (b) denoised states city (each panel shows Re[ρ] and Im[ρ] over the basis states 00, 01, 10, 11). We used 200 GHZ states affected by the QDC with noise strength $p = 0.4$. The states cities are respectively the average initial state vector and the average density matrix output by the QAE.
Figure 12: Noise robustness of the [2,1,2] QAE (y-axis: fidelity with GHZ; series: mean original fidelity, denoised fidelity).

Figure 13: Noise robustness of the [3,1,3] QAE (y-axis: fidelity with GHZ; series: mean original fidelity, denoised fidelity).

Quantum physics helps build secure Secret Sharing protocols by distributing entangled states between participants. We refer to QSS as described in [9]. The interest of QSS is to make it difficult both for a malicious participant to cheat and for an eavesdropper to intercept information, as they will be (with high probability) detected by the other participants.

We consider here a GHZ triplet, shared between Alice, Bob and Charlie. Initially, Charlie possesses the whole triplet. He sends the first and second qubits to Alice and Bob, and then each of the parties makes a measurement in either the $X$ or $Y$ basis (randomly selected), defined by
$$X : \begin{cases} |+x\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \\ |-x\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) \end{cases}, \tag{18}$$
$$Y : \begin{cases} |+y\rangle = \frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle) \\ |-y\rangle = \frac{1}{\sqrt{2}}(|0\rangle - i|1\rangle) \end{cases}. \tag{19}$$
Once all measurements are made, each participant announces the basis he has chosen. As we will show, in half of the cases Alice and Bob will know the result of Charlie's measurement, provided that they collaborate (and they cannot find it if they do not). They will then have a secret bit in common. By repeating this protocol multiple times, Charlie will be able to establish a secret key with Alice and Bob, which can be used, for instance, for sending a classical message encrypted with the shared key (like a One-Time-Pad [10]).

To show how Alice and Bob can know the result of Charlie's measurement, we can rewrite the GHZ triplet in all the possible combinations of the $X$ and $Y$ bases. We have:
$$|\psi\rangle = \frac{1}{\sqrt{2}}\left(|000\rangle + |111\rangle\right) \tag{20}$$
$$\begin{aligned} = \frac{1}{8}\Big[ &\big(|+x\rangle|+x\rangle + |-x\rangle|-x\rangle + |+y\rangle|-y\rangle + |-y\rangle|+y\rangle\big)|+x\rangle \\ + &\big(|+x\rangle|-x\rangle + |-x\rangle|+x\rangle + |+y\rangle|+y\rangle + |-y\rangle|-y\rangle\big)|-x\rangle \\ + &\big(|+x\rangle|-y\rangle + |-x\rangle|+y\rangle + |+y\rangle|-x\rangle + |-y\rangle|+x\rangle\big)|+y\rangle \\ + &\big(|+x\rangle|+y\rangle + |-x\rangle|-y\rangle + |+y\rangle|+x\rangle + |-y\rangle|-x\rangle\big)|-y\rangle \Big]. \end{aligned} \tag{21}$$
Hence, we can see that the 'valid' bases (under which Charlie will be able to have a shared bit with Alice and Bob) are $\{XXX, YYX, XYY, YXY\}$, or simply those in which an even number of participants measure in the $Y$ basis (as measuring a state of one basis in the other one leads to a probability of 1/2 for each outcome).

Now let us assume that participants do not have access to a noise-free GHZ triplet (for instance because of hardware constraints or shortcomings), or that the one they have has been corrupted by noise. In this section, we analyze how that will impact the protocol.

To do so, we suppose that the GHZ triplet is corrupted by the QDC with strength $p$. This means that each of the qubits will be affected by a particular Pauli matrix with probability $\frac{p}{4}$ (for each of the three matrices), and will stay unharmed with probability $1 - \frac{3p}{4}$. Hence, to quantify the impact of the QDC, we only have to assess the effect of Pauli matrices on the $X$ and $Y$ bases. We have
$$\begin{aligned} X|\pm x\rangle &= \pm|\pm x\rangle, & X|\pm y\rangle &= \pm i|\mp y\rangle, \\ Y|\pm x\rangle &= \mp i|\mp x\rangle, & Y|\pm y\rangle &= \pm|\pm y\rangle, \\ Z|\pm x\rangle &= |\mp x\rangle, & Z|\pm y\rangle &= |\mp y\rangle, \end{aligned} \tag{22}$$
i.e. $X$ affects the $Y$ basis, $Y$ affects the $X$ basis, and $Z$ affects both. By 'affecting' the basis, we mean that it swaps the basis states, which will scramble Eq. (21). The coefficients are irrelevant here.

It then becomes straightforward to understand how the QDC affects the protocol. Let us suppose that we are in the case of a 'valid' basis, hence the result of this measurement is kept as part of the shared key. Let us further suppose that in the QDC, Alice's qubit is affected by $X$, and Bob's and Charlie's qubits are lucky enough to stay unaffected. Will the protocol fail?

If Alice chooses to do her measurement according to the $X$ basis, nothing will change, as $X$ leaves the $X$ basis unharmed, so the protocol will succeed. However, if she chooses the $Y$ basis, then the bit recovered by Alice and Bob (which is supposed to be Charlie's bit) will always be the wrong one. Indeed, let us assume for instance that Bob chooses the $Y$ basis too and finds $|-y\rangle$, and Charlie chooses the $X$ basis. If Alice finds $|+y\rangle$, then, by referring to Eq. (21), she and Bob will assume that Charlie found $|+x\rangle$, when he actually found $|-x\rangle$, since Alice's measurement would have been $|-y\rangle$ had her qubit not been affected by $X$. Hence, in this case, the probability of getting the wrong bit (and thus of the protocol failing) is 1/2. We can apply the same reasoning to every other option (syndrome) in the QDC, which leads to the following result.

Proposition.
If the QSS protocol is affected by the QDC with strength $p$, the probability $\Gamma$ of the protocol failing is
$$\Gamma = \frac{p}{2}\left(p^2 - 3p + 3\right). \tag{23}$$

Proof. Assuming that Charlie's qubit is left unharmed, we get for Alice and Bob:
• $XI, YI, IX, IY, XX, YY, XY, YX, XZ, ZX, YZ, ZY \implies$ the protocol fails half of the time
• $IZ, ZI \implies$ the protocol always fails
• $II, ZZ \implies$ the protocol never fails (we can see in Eq. (21) that if both signs of the first two qubits change, that will leave the protocol unharmed)
We can then obtain the probability $\Gamma_I$ of the protocol failing when Charlie's qubit is affected by $I$ (i.e. is unharmed) by multiplying the probabilities of failing and the probabilities of having each syndrome in the QDC:
$$\Gamma_I = \frac{1}{2}\left[4 \cdot \frac{p}{4}\left(1 - \frac{3p}{4}\right) + 8 \cdot \left(\frac{p}{4}\right)^2\right] + 2 \cdot \frac{p}{4}\left(1 - \frac{3p}{4}\right) = \frac{1}{2}\left[p - \frac{3p^2}{4} + \frac{p^2}{2}\right] + \frac{p}{2} - \frac{3p^2}{8} = p\left(1 - \frac{p}{2}\right). \tag{24}$$
We can apply the same method when Charlie's qubit is affected by $X$ or $Y$, which yields:
$$\Gamma_X = \Gamma_Y = \frac{1}{2}. \tag{25}$$
Finally, when Charlie's qubit is affected by $Z$, we have:
012 Depolarizing Channel S † HS † HH q q q c Figure 14: Noisy QSS circuit. Here, the basis chosen is
YX Y .Γ Z = 1 − p + p . (26)Putting everything together, we get the totalprobability Γ of the protocol failing :Γ = (1 − p · Γ I + p · Γ X + p · Γ Y + p · Γ Z = p p − p + 3) . (27) (cid:4) We simulated the QSS protocol on Qiskit, as canbe seen in Figure 14 (making a measurement in the X basis is equivalent to applying H and makinga measurement in the Z = {| (cid:105) , | (cid:105)} basis, andsimilarly for the Y basis by adding a S † before H ).In every simulation, the bases are randomly cho-sen, and if they correspond to a valid configuration,the secret bit is kept for the key. We made p varybetween 0 and 1, ran 1000 measurements for eachvalue of p , then computed the number of differencesbetween the two keys, weighted by the key length.The results are shown in Figure 15.The probability of failure increases with p , and atthe limit p = 1 (for a full-noise QDC), we have Γ = , thus the protocol is not better than guessing abit at random. Even a small value of p , like p = 0 . .
24, so on average approximately1 bit out of 4 of the final secret key will be wrong,rendering it rather useless.Moreover, Charlie will not be able to detect ifAlice or Bob is cheating, or if a malignant thirdparty is trying to get a hold of the key, at it wasthe case in a non-noisy QSS protocol. Indeed, [9]showed that if Alice or Bob is cheating by gettingboth qubits and sending the other a qubit she/he . . . . . . . . . . . . P r o b a b ili t y o f h a v i n g w r o n g s h a r e db i t s Empirical probabilityTheoretical probabilityDenoised probability
Figure 15: Probabilities of the QSS protocol failingwhen submitted to the QDC with noise strength p .The blue line corresponds to Γ.carefully prepared, or if a third party is grafting anancilla to the GHZ state to get any information,then it will irreducibly introduce errors. In a non-noisy QSS scheme, the participants can detect that,by revealing a fraction of the key, and see if thereare any discrepancies. However, in a noisy QSSscheme, it will be challenging to determine if theerror is due to a cheating act or simply due to thenoisy channel. Although, cheating activities canstill be detected, provided that the channel is nottoo noisy, and at the expense of a great numberof simulations of the protocol, which may not bepossible in practice. Indeed, the probability of theprotocol failing in the presence of a cheater is 1/4.Hence, if the noise strength of the channel leads toa fewer probability of errors (for instance, when itis fewer than 0 . We have seen in the previous section that noiseis a serious limitation to QSS protocols. In thissection, we show how QAEs can limit the impactof noise by drastically diminishing the probabilityof the protocol failing.To do so, Charlie can apply the QAE to de-noise the triplet before it sends Alice and Bob theirqubits. Alternatively, Charlie can directly use theQAE to create an almost perfect GHZ triplet, asdemonstrated in section 3. This will dampen, ifnot almost annihilate, the probability of having er-roneous shared bits between Alice and Bob, andCharlie.The circuit used for this experiment is repre-sented in Figure 16. We use our [3 , ,
3] QAE trained on the QDC. Qubit 0 is Alice's qubit, qubit 1 is Bob's qubit, qubit 2 is Charlie's qubit, and qubit 3 is an ancilla used by the QAE (recall that the width of our QAEs is defined by $\max_{i} (m_i + m_{i+1})$, which equals 4 here). At the output of the QAE, the state is encoded in qubits 0, 1 and 2 (but that may not always be the case, for instance if the QAE had shape [3 , , ,

We successfully implemented the QAE from [2], using Python and Qiskit. Additionally, we demonstrated the QAE's denoising ability on GHZ states for multiple noise channels. The network effectively denoises all of the given channels for various noise strengths, and is itself resistant to noise. Finally, we have shown that, once trained, the QAE can function as a generative model for the denoised state. It also proved particularly useful in the context of QSS. Other advantages of the QAE lie in its compactness and in the fact that it can be applied immediately once trained, despite a lengthy training process.

There are many avenues worth exploring with this QAE in the future. In particular, since the network can function as a quantum error correcting code (QECC), it would be interesting to benchmark it against other state-of-the-art QECCs, notably to compare its denoising accuracy as well as the time and resources involved.

In this paper, we were limited by computational resources, so we could only fully test [2, 1, 2] and [3, 1, 3] networks. In the future, it would be useful to explore the effectiveness of the network with different topologies.
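To make the resource cost of other topologies concrete, the width formula recalled in the QSS experiment can be evaluated directly. The sketch below is only an illustration: the function name `qae_width` and the five-layer shape are hypothetical and not taken from the authors' code. It takes the width to be the largest sum of two consecutive layer sizes, which gives the value of 4 quoted in the text for a three-qubit input and output with a one-qubit bottleneck.

```python
def qae_width(shape):
    """Number of qubits needed by a layered QAE.

    Assumption: the width is the maximum of m_i + m_{i+1} over all
    pairs of consecutive layers, as recalled in the text.
    """
    if len(shape) < 2:
        raise ValueError("a network needs at least two layers")
    return max(a + b for a, b in zip(shape, shape[1:]))


# Shape used in the QSS experiment; the text states a width of 4.
print(qae_width([3, 1, 3]))

# Hypothetical deeper topology, chosen only for illustration.
print(qae_width([3, 2, 1, 2, 3]))

# Number of amplitudes a state-vector simulation must store: 2 ** width.
print(2 ** qae_width([3, 1, 3]))
```

Under this definition the width depends only on the largest pair of adjacent layers, so adding narrow layers does not increase the number of qubits, although it does add depth and trainable parameters.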
Figure 16: Quantum circuit to denoise the QSS protocol. [Circuit: a depolarizing channel on qubits 0, 1 and 2, followed by the quantum autoencoder on qubits 0, 1, 2 and 3, then S† and H gates before measurement into the classical register c.]

Code

The code to create, train and test the QAEs, as well as for the QSS part, can be found at https://github.com/Tom-Achache/QAEs.

References

[1] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, Extracting and Composing Robust Features with Denoising Autoencoders. ICML '08, New York, NY, USA: Association for Computing Machinery, 2008.
[2] D. Bondarenko and P. Feldmann, Quantum autoencoders to denoise quantum data. In arXiv:1910.09169, 2019.
[3] K. Beer, D. Bondarenko, T. Farrelly, T. J. Osborne, R. Salzmann, and R. Wolf, Efficient Learning for Deep Quantum Neural Networks. In arXiv:1902.10445, 2019.
[4] G. Sentís, A. Monràs, R. Muñoz-Tapia, J. Calsamiglia, and E. Bagan, Unsupervised classification of quantum data. In arXiv:1903.01391, 2019.
[5] https://github.com/R8monaW/DeepQNN.
[6] https://github.com/Qiskit.
[7] A. R. Conn, K. Scheinberg, and L. N. Vicente, Introduction to derivative-free optimization. SIAM, 2009.
[8] D. P. Kingma and J. Ba, Adam: A Method for Stochastic Optimization. In arXiv:1412.6980, 2014.
[9] M. Hillery, V. Buzek, and A. Berthiaume, Quantum secret sharing. In arXiv:quant-ph/9806063, 1998.
[10] F. Miller,