Quantum generalisation of feedforward neural networks
Kwok Ho Wan,^{1,2} Oscar Dahlsten,^{1,3,2} Hlér Kristjánsson,^{1,2} Robert Gardner,^{1,2} and M. S. Kim^{1}

^{1}Blackett Laboratory, Imperial College London, London, SW7 2AZ, United Kingdom
^{2}London Institute for Mathematical Sciences, 35a South Street Mayfair, London, W1K 2XF, United Kingdom
^{3}Clarendon Laboratory, University of Oxford, Parks Road, Oxford, OX1 3PU, United Kingdom

(Dated: December 6, 2016)

We propose a quantum generalisation of a classical neural network. The classical neurons are firstly rendered reversible by adding ancillary bits. Then they are generalised to being quantum reversible, i.e. unitary. (The classical networks we generalise are called feedforward, and have step-function activation functions.) The quantum network can be trained efficiently using gradient descent on a cost function to perform quantum generalisations of classical tasks. We demonstrate numerically that it can: (i) compress quantum states onto a minimal number of qubits, creating a quantum autoencoder, and (ii) discover quantum communication protocols such as teleportation. Our general recipe is theoretical and implementation-independent. The quantum neuron module can naturally be implemented photonically.
INTRODUCTION
Artificial neural networks mimic biological neural networks to perform information processing tasks. They are highly versatile, applying to vehicle control, trajectory prediction, game-playing, decision making, pattern recognition (such as facial recognition and spam filters), financial time series prediction, automated trading systems, mimicking unpredictable processes, and data mining [1, 2]. The networks can be trained to perform tasks without the programmer necessarily detailing how to do it. Novel techniques for training networks of many layers (deep networks) are credited with giving impetus to the neural networks approach [3].

The field of quantum machine learning is rapidly developing, though the focus has arguably not been on the connection to neural networks. Quantum machine learning, see e.g. [4-17], employs quantum information processing (QIP) [18]. QIP uses quantum superpositions of states with the aim of faster processing of classical data as well as tractable simulation of quantum systems. In a superposition each bit string is associated with two numbers: the probability of the string and the phase [19], respectively. The phase impacts the future probabilities via a time evolution law. There are certain promising results that concern quantum versions of recurrent neural networks, wherein neurons talk to each other in all directions rather than feeding signals forward to the next layer, e.g. with the purpose of implementing quantum simulated annealing [8, 14, 20, 21]. In [22] several papers proposing quantum neural network designs are discussed and critically reviewed. A key challenge to overcome is the clash between the nonlinear, dissipative dynamics of neural network computing and the linear, reversible dynamics of quantum computing [22]. A key reason for wanting well-functioning quantum neural networks is that these could do for quantum inputs what classical networks can do for classical inputs, e.g. compressing data encoded in quantum superpositions to a minimal number of qubits.

We here accordingly focus on creating quantum generalisations of classical neural networks, which can take quantum inputs and process them coherently. Our networks contribute to a research direction known as quantum learning [23-27], which concerns learning and optimising with truly quantum objects. The networks provide a route to harnessing the powerful neural network paradigm for this purpose. Moreover, they are strict generalisations of the classical networks, providing a clear framework for comparing the power of quantum and classical neural networks.

The networks generalise classical neural networks to the quantum case in a similar sense to how quantum computing generalises classical computing. We start with a common classical neural network family: feedforward perceptron networks. We make the individual neurons reversible and then naturally generalise them to being quantum reversible (unitary). This resolves the classical-quantum clash mentioned above from [22]. An efficient training method is identified: global gradient descent for a quantum generalisation of the cost function, a function evaluating how close the outputs are to the desired outputs. To illustrate the ability of the quantum network we apply it to (i) compressing information encoded in superpositions onto fewer qubits (an autoencoder) and (ii) re-discovering the quantum teleportation protocol; this illustrates that the network can work out QIP protocols given only the task.
To make the connection to physics clear, we describe how to simulate and train the network with quantum photonics.

We proceed as follows. Firstly, we describe the recipe for generalising the classical neural network. Then it is demonstrated how the network can be applied to the tasks mentioned above, followed by a design of a quantum photonic realisation of a neural module. We discuss the results, followed finally by a summary and outlook.
QUANTUM NEURAL NETWORKS
Classical neural networks are composed of elementary units called neurons. We begin with describing these, before detailing how to generalise them to quantum neurons.

The classical neuron
A classical neuron is depicted in FIG. 1. In this case, it has two inputs (though there could be more). There is one output, which depends on the inputs (bits in our case) and a set of weights (real numbers): if the weighted sum of inputs is above a set threshold, the output is 1, else it is 0.
FIG. 1. A classical neuron taking two inputs $in_1$ and $in_2$ and giving a corresponding output $out$ [1]. $a_j^{(l)}$ labels the output of the $j$th neuron in the $l$th layer of the network.

We will use the following standard general notation. The $j$th neuron in the $l$th layer of a network takes a number of inputs, $a_k^{(l-1)}$, where $k$ labels the input. The inputs are each multiplied by a corresponding weight, $w_{jk}^{(l)}$, and an output, $a_j^{(l)}$, is fired as a function of the weighted input $z_j^{(l)} = \sum_{k=1}^{n} w_{jk}^{(l)} a_k^{(l-1)}$, where $n$ is the number of inputs to the neuron (FIG. 1). The function relating the output to the weighted input is called the activation function, which has most commonly been a Heaviside step function or a sigmoid [1]. For example, the neuron in FIG. 1 with a Heaviside activation function gives an output of the form

$a_j^{(l)} = \begin{cases} 1, & \text{if } z_j^{(l)} > 0, \\ 0, & \text{otherwise}. \end{cases}$   (1)

This paper aims to generalise the classical neuron to a quantum mechanical one. In the absence of measurement, quantum mechanical processes are required to be reversible, and more specifically, unitary, in a closed quantum system [18, 28]. This suggests the following procedure for generalising the neuron first to a reversible gate and finally to a unitary gate:

Irreversible → reversible: For an $n$-input classical neuron taking $(in_1, in_2, ..., in_n) \to out$, create a classical reversible gate taking $(in_1, in_2, ..., in_n, 0) \to (in_1, in_2, ..., in_n, out)$. Such an operation can always be represented by a permutation matrix [29]. This is a clean way of rendering the classical neuron reversible. The extra 'dummy' input bit is used to make it reversible [28]; in particular, some of the '2 bits in - 1 bit out' functions the neuron can implement require 3 bits to be made reversible in this manner.

Reversible → unitary: Generalise the classical reversible gate to a quantum unitary taking input $(|\psi_{\rm in}\rangle_{1,2,...,n}, |0\rangle) \to |\psi_{\rm out}\rangle_{1,2,...,n,{\rm out}}$, such that the final output qubit is the output of interest. This is the natural way of making a permutation matrix unitary.

If the input is a mixture of states in the computational basis and the unitary a permutation matrix [30], the output qubit will be a mixture of $|0\rangle$ or $|1\rangle$: this we call the classical special case. This way the quantum neuron can simulate any classical neuron as defined above. The generalisation recipe summarised in FIG. 2 also illustrates how any irreversible classical computation can be recovered as a special case from reversible classical computation (by ignoring the dummy and copied bits), which in turn can be recovered as a special case from quantum computation.

FIG. 2. Diagram summarising our method of generalising the classical irreversible neuron with Heaviside activation function, first to a reversible neuron represented by a permutation matrix (P), and finally to a quantum reversible computation, represented by a unitary operator (U).
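As a concrete illustration of the irreversible → reversible step, the following minimal Python sketch (with hypothetical weights; the XOR-into-the-dummy-bit completion is one standard way of extending $(in_1, ..., in_n, 0) \to (in_1, ..., in_n, out)$ to a full bijection) builds the permutation matrix for a two-input Heaviside neuron:

```python
import numpy as np

def neuron_out(inputs, weights):
    """Heaviside neuron of Eq. (1): fire 1 iff the weighted input is positive."""
    return 1 if np.dot(weights, inputs) > 0 else 0

def reversible_neuron_matrix(weights):
    """Permutation matrix on n+1 bits sending (in_1..in_n, d) to
    (in_1..in_n, d XOR out), so that d = 0 yields the neuron output."""
    n = len(weights)
    dim = 2 ** (n + 1)
    P = np.zeros((dim, dim), dtype=int)
    for x in range(dim):
        bits = [(x >> (n - k)) & 1 for k in range(n + 1)]   # [in_1..in_n, d]
        bits[-1] ^= neuron_out(bits[:-1], weights)          # d -> d XOR out
        y = sum(b << (n - k) for k, b in enumerate(bits))
        P[y, x] = 1
    return P

P = reversible_neuron_matrix(weights=[1.0, -0.5])   # hypothetical weights
assert (P @ P.T == np.eye(8, dtype=int)).all()      # a permutation: P P^T = 1
```

Read as a matrix on computational basis states, this permutation is already unitary, which is exactly the classical special case described above.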
The network
In order to form a neural network, classical neurons are connected together in various configurations. Here, we consider feedforward classical networks, where neurons are arranged in layers and each neuron in the $l$th layer is connected to every neuron in the $(l-1)$th and $(l+1)$th layers, but with no connections within the same layer. For an example of such a classical network, see FIG. 3. Note that in this case the same output of a single neuron is sent to all the neurons in the next layer [1, 2].

To make the copying reversible, in line with our approach of firstly making the classical neural network reversible, we propose the recipe:

Irreversible → reversible: For a classical irreversible copying operation of a bit $b \to (b, b)$, create a classical reversible gate, which can be represented by a permutation matrix [28], taking $(b, 0) \to (b, b)$.

In the quantum case the no-cloning theorem shows one cannot do this in the most naive way [18]. For a 2-qubit case, one can use a CNOT for example to copy in the classical computational basis [28]: $|b\rangle|0\rangle \to |b\rangle|b\rangle$, if $|b\rangle \in \{|0\rangle, |1\rangle\}$. Thus one may consider replacing the copying with a CNOT. However, when investigating applications of the network we realised that there are scenarios (the autoencoder in particular) where entanglement between different neurons is needed to perform the task. We have therefore chosen the following definition:

Reversible → unitary: The classical CNOT is generalised to a general 2-qubit 'fan-out' unitary $U_F$, with one dummy input set to $|0\rangle$, such that $|b\rangle|0\rangle \to U_F|b\rangle|0\rangle$. As this unitary does not in general copy quantum states that are non-orthogonal, we call it a 'fan-out' operation rather than a copying operation, as it distributes information about the input state into several output qubits. Note that a quantum network would be trained to choose the unitary in question.

FIG. 3. A classical autoencoder taking two inputs $in_1 = a_1^{(0)}$ and $in_2 = a_2^{(0)}$ and compressing them to one hidden layer output $a_1^{(l)}$. The final output layer is used in training and is trained to reconstruct the inputs. The notation here is in accordance with [1]. The blue box represents the data compression device after the training procedure.
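Returning to the fan-out step: both the sense in which the CNOT copies basis states and the failure of naive copying on superpositions can be checked directly (a small numpy sketch; the basis ordering $|q_1 q_2\rangle$ is our convention):

```python
import numpy as np

# CNOT with the first qubit as control, basis order |00>, |01>, |10>, |11>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)

# Basis states are fanned out as desired: |b>|0> -> |b>|b>.
assert np.allclose(CNOT @ np.kron(ket1, ket0), np.kron(ket1, ket1))

# A superposition is NOT copied: |+>|0> becomes the entangled state
# (|00> + |11>)/sqrt(2), not the product |+>|+>.
out = CNOT @ np.kron(plus, ket0)
assert not np.allclose(out, np.kron(plus, plus))
assert np.allclose(out, (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2))
```

A trained $U_F$ is simply some two-qubit unitary of this type chosen by gradient descent, not necessarily the CNOT itself.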
Efficient training with gradient descent

A classical neural network is trained to perform particular tasks. This is done by randomly initialising the weights and then propagating inputs through the network many times, altering the weights after each propagation in such a way as to make the network output closer to the desired output. A cost function, $C$, relating the network output to the desired output is defined by

$C = \frac{1}{2}\left|\vec{y}^{(L)} - \vec{a}^{(L)}\right|^2,$   (2)

where $\vec{y}^{(L)}$ is a vector of the desired outputs from each of the final layer $l = L$ neurons, $\vec{a}^{(L)}$ is the vector of actual outputs, which depends on the network weights, and $|\,\cdot\,|$ is the $l^2$-norm. The cost function is minimised to zero when the weights propagate the input in such a way that the network output vector equals the desired output vector.

Since the weights are continuous variables, the numerical partial derivatives of the cost function w.r.t. each weight can be found by approximating $\frac{\partial C}{\partial w} \approx \frac{C(w + \epsilon) - C(w)}{\epsilon}$. After each propagation, these partial derivatives are computed and the weights are altered in the direction of greatest decrease of the cost function. Specifically, each weight $w_{jk}^{(l)}$ is increased by $\delta w_{jk}^{(l)}$, with

$\delta w_{jk}^{(l)} = -\eta \frac{\partial C}{\partial w_{jk}^{(l)}},$   (3)

where $\eta$ is an adjustable non-negative parameter. This training procedure is known as gradient descent [1]. Note that gradient descent normally also requires a continuous and differentiable activation function, to allow small changes in the weights to relate to small changes in the cost. For this reason, the Heaviside activation function has traditionally been replaced by a sigmoid function [1, 2]. Nevertheless, gradient descent has also been achieved using Heaviside activation functions, by taking the weights as Gaussian variables and taking partial derivatives w.r.t. the means and standard deviations of the appropriate Gaussian distributions [31, 32].

In the reversible generalisation, where each neuron is replaced by a permutation matrix, we find that the output is no longer a function of the inputs and continuous weights, but rather of the inputs and a discrete set of permutation matrices. However, in the generalisation to unitaries, for a gate with $n$ inputs and outputs, there exist an infinite number of unitaries, in contrast with the discrete set of permutation matrices. This means that the unitaries can be parametrised by continuous variables, which once again allows the application of gradient descent.

Given that any unitary matrix $U$ can be expressed as $U = e^{iH}$, where $H$ is a Hermitian matrix [18], and that such matrices can be written as linear combinations of tensor products of the Pauli matrices and the identity, it follows that a general $N$-qubit unitary can be expressed as

$U_N = \exp\left[i \sum_{j_1,...,j_N=0}^{3} \alpha_{j_1,...,j_N} \left(\sigma_{j_1} \otimes \cdots \otimes \sigma_{j_N}\right)\right],$   (4)

where $\sigma_i$ are the Pauli matrices for $i \in \{1, 2, 3\}$ and $\sigma_0$ is the $2 \times 2$ identity matrix. In the gradient descent we accordingly replace each weight $w_{jk}^{(l)}$ with a general parameter $\alpha_{j_1,...,j_N}$ of the unitary $U_N$:

$\delta\alpha_{j_1,...,j_N} = -\eta \frac{\partial C}{\partial \alpha_{j_1,...,j_N}}.$   (5)
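For concreteness, Eq. (4) can be prototyped in a few lines (a sketch, not the authors' code; `unitary_from_alphas` is our naming). The $4^N$ real parameters $\alpha_{j_1,...,j_N}$ play the role of the classical weights:

```python
import numpy as np
from scipy.linalg import expm
from itertools import product
from functools import reduce

PAULIS = [np.eye(2),                              # sigma_0 (identity)
          np.array([[0, 1], [1, 0]]),             # sigma_1
          np.array([[0, -1j], [1j, 0]]),          # sigma_2
          np.array([[1, 0], [0, -1]])]            # sigma_3

def unitary_from_alphas(alphas, n_qubits):
    """Eq. (4): U_N = exp(i sum alpha_{j1..jN} (sigma_j1 x ... x sigma_jN))."""
    dim = 2 ** n_qubits
    H = np.zeros((dim, dim), dtype=complex)
    for a, js in zip(alphas, product(range(4), repeat=n_qubits)):
        H += a * reduce(np.kron, [PAULIS[j] for j in js])
    return expm(1j * H)                           # H Hermitian, so U is unitary

rng = np.random.default_rng(0)
alphas = rng.uniform(-1, 1, size=4 ** 2)          # 4^N parameters; N = 2 here
U = unitary_from_alphas(alphas, n_qubits=2)
assert np.allclose(U @ U.conj().T, np.eye(4))
```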
A simpler and less general form of $U_N$ has been sufficient for the tasks discussed in this paper:

$U = \sum_{j=1}^{4} |\tau_j\rangle\langle\tau_j| \otimes T_j,$   (6)

where $\{|\tau_j\rangle\}_{j=1}^{4} = \{V|00\rangle, V|01\rangle, V|10\rangle, V|11\rangle\}$ and $V$ is a general 2-qubit unitary of the form of Eq. (4). Each $T_j$ is similarly a general 1-qubit unitary, and one can see, using the methods of [33] on Eq. (4), that this can be expressed as a linear combination of the Pauli matrices $\sigma_j$:

$U_{1\text{-qubit}} = e^{i\alpha_0}\left(\cos\Omega \, \mathbb{1} + i\,\frac{\sin\Omega}{\Omega}\sum_{j=1}^{3}\alpha_j\sigma_j\right),$   (7)

where $\Omega = \sqrt{\alpha_1^2 + \alpha_2^2 + \alpha_3^2}$ [33]. To extend this to higher dimensional unitaries, see e.g. [34].

The cost function we use for the quantum neural networks is, with experimental feasibility in mind, determined by the expectation values of local Pauli matrices ($\sigma_1, \sigma_2, \sigma_3$) on individual output qubits $j$. It has the form

$C = \sum_{i,j} f_{ij}\left(\langle\sigma_i^{(j)}\rangle_{\rm actual} - \langle\sigma_i^{(j)}\rangle_{\rm desired}\right)^2,$   (8)

where $f_{ij}$ is a real non-negative number (in the examples to follow $f_{ij} \in \{0, 1\}$). We note that in the classical mode of operation, where the total density matrix is diagonal in the computational basis, only $\sigma_3$ will have non-zero expectation, and the cost function becomes the same as in the classical case (Eq. 2) up to a simple transformation.

It is important to note that the number of weights grows polynomially in the number of neurons. Each weight shift is determined by evaluating the cost function twice to get the RHS of Eq. (5). Thus the number of evaluations of the cost function for a given iteration of the gradient descent grows polynomially in the number of neurons. The training procedure is efficient in this sense. We do not here attempt to provide a proof that the convergence to zero cost function, where possible, will always take a number of iterations that grows polynomially in the number of neurons. Note also that the statements about the efficiency of the training procedure refer to the physical implementation with quantum technology: the simulation of quantum systems with a classical computer is, with the best known methods, in general inefficient.
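A minimal classical simulation of one training step might look as follows (a sketch; the helper names are ours, and `forward` stands for whatever circuit maps the parameters and input state to the network's output density matrix):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])

def local_expectations(rho, n_qubits):
    """The 3n local Pauli expectations <sigma_i^(j)> entering Eq. (8)."""
    vals = []
    for j in range(n_qubits):
        for sigma in (X, Y, Z):
            op = np.eye(1)
            for k in range(n_qubits):          # sigma on qubit j, identity elsewhere
                op = np.kron(op, sigma if k == j else np.eye(2))
            vals.append(np.real(np.trace(rho @ op)))
    return np.array(vals)

def cost(alphas, rho_in, target, forward, n_qubits):
    """Eq. (8) with f_ij = 1. `target` holds the desired expectations."""
    rho_out = forward(alphas, rho_in)
    return np.sum((local_expectations(rho_out, n_qubits) - target) ** 2)

def gradient_step(alphas, cost_fn, eta=0.1, eps=1e-4):
    """Eqs. (3)/(5): one finite-difference gradient-descent update."""
    base = cost_fn(alphas)                     # C(alpha)
    grad = np.zeros_like(alphas)
    for i in range(len(alphas)):
        shifted = alphas.copy()
        shifted[i] += eps                      # C(alpha + eps e_i)
        grad[i] = (cost_fn(shifted) - base) / eps
    return alphas - eta * grad                 # delta alpha = -eta dC/dalpha
```

Each call to `gradient_step` costs one evaluation per parameter plus one shared base evaluation, matching the polynomial scaling argued above.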
Example: Autoencoder for data compression

We now demonstrate applications of our quantum generalisation of neural networks described in the previous section. We begin with autoencoders. These compress an input signal from a given set of possible inputs onto a smaller number of bits, and are 'work-horses' of classical machine learning [2].
Classical autoencoder
Autoencoders are commonly achieved by a feedforward neural network with a bottleneck in the form of a layer with fewer neurons than the input layer. The network is trained to recreate the signal at a later layer, which necessitates reversibly compressing it (as well as possible) to a bit size equal to the number of neurons in the bottleneck layer [2]. The bottleneck layer size can be varied as part of the training to find the smallest compression size possible, which depends on the data set in question. After the training is complete, the post-bottleneck part of the network can be discarded and the compressed output taken directly from after the bottleneck.

In FIG. 3 a basic autoencoder designed to compress two bits into a single bit is shown. (Here the number of input bits, $j_{\rm max} = 2$.) The basic training procedure consists of creating a cost function

$C = \sum_{j=1}^{j_{\rm max}} (in_j - out_j)^2,$   (9)

with which the network is trained using the learning rule of Eq. (3). If the outputs are identical to the inputs (to within numerical precision), the network is fully trained. The final layer is then removed, revealing the second last layer, which should enclose the compressed data. The number of neurons in a given hidden layer of a classical autoencoder will not exceed $j_{\rm max}$. Once the network is trained, the removal of the post-bottleneck layer(s) will yield a second last layer of fewer neurons, achieving dimensional reduction [2].

Quantum autoencoder
We now generalise the classical autoencoder shown in FIG. 3 to the quantum case. We generalise the neurons labelled 1, 2 and 3 in FIG. 3 into unitary matrices $U_1$, $U_2$ and $U_3$, respectively, with the addition of a 'fan-out' gate, $U_F$, as motivated in the previous sections. The result is shown in FIG. 4 as a quantum circuit model. (We follow the classical convention that this neural network is drawn with the input neurons as well, but they are identity operators which let the inputs through regardless, and can be ignored in the simulation of the network.) The input state of interest $|in\rangle$ is on 2 qubits, each fed into a different neuron, generalising the classical autoencoder in FIG. 3. From each of these neurons, one output qubit each is led into the bottleneck neuron $U_3$, followed by a fan-out of its output. We add as an extra desideratum that the compressed bit, the output of $U_3$, is diagonal in the computational basis. The final neurons have the task of recreating $|in\rangle$ on the outputs labelled 6 and 8 respectively.

FIG. 4. Neural network implementing a quantum autoencoder that can accommodate two input qubits that are entangled. The blue box represents the quantum compression device after training.

This means that a natural and simple cost function is

$C = \sum_{j,k=0}^{3}\left({\rm Tr}(\rho_{6,8}\,\sigma_j \otimes \sigma_k) - {\rm Tr}(\rho_{\rm in}\,\sigma_j \otimes \sigma_k)\right)^2.$   (10)

Training is then conducted via global gradient descent of the cost w.r.t. the $\alpha_{j_1,...,j_N}$ parameters, as defined in Eq. (5). During the training the network was fed states from the given input set, picked independently and identically for each step (i.i.d.). Standard speed-up techniques for learning were used, e.g. a momentum term [1, 2]. In training with a variety of 2 possible orthogonal input states including superposition states, such as orthogonal pairs of the form $(|\cdot\rangle + |\cdot\rangle)/\sqrt{2}$ and $(|\cdot\rangle - |\cdot\rangle)/\sqrt{2}$, the cost function of the quantum autoencoder converged towards zero through global gradient descent in every case, starting with uniformly randomised weights $\alpha_{j_1,...,j_N} \in [-1, 1]$. One can force the compressed output to be diagonal in a particular basis by adding an extra term to the cost function (e.g. desiring the expectation value of Pauli X and Y to be zero in the case of a single qubit will push the network to give an output diagonal in the Z-basis).
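The reduced state $\rho_{6,8}$ in Eq. (10) is obtained by tracing out all other qubits of the simulated network. A sketch follows; which simulated wires correspond to the ports labelled 6 and 8 depends on the circuit layout, so `keep` is left as a parameter:

```python
import numpy as np

PAULIS = [np.eye(2),
          np.array([[0, 1], [1, 0]]),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]])]

def reduced_state(rho, keep, n_qubits):
    """Partial trace of an n-qubit density matrix onto the qubits in `keep`."""
    t = rho.reshape([2] * (2 * n_qubits))
    for q in sorted(set(range(n_qubits)) - set(keep), reverse=True):
        t = np.trace(t, axis1=q, axis2=q + t.ndim // 2)
    d = 2 ** len(keep)
    return t.reshape(d, d)

def autoencoder_cost(rho_full, rho_in, keep, n_qubits):
    """Eq. (10): match every two-qubit Pauli correlation of the reconstructed
    state (reduced onto the two output ports) against the input state."""
    rho_out = reduced_state(rho_full, keep, n_qubits)
    c = 0.0
    for j in range(4):
        for k in range(4):
            op = np.kron(PAULIS[j], PAULIS[k])
            c += (np.real(np.trace(rho_out @ op)) -
                  np.real(np.trace(rho_in @ op))) ** 2
    return c
```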
Example: Neural network discovers teleportation protocol
With quantum neural networks already shown to be able to perform generalisations of classical tasks, we now consider the possibility of quantum networks discovering solutions to existing and potentially undiscovered quantum protocols. We propose a quantum neural network structure that can, on its own, work out the standard protocol for quantum teleportation [18]. The design and training of this network is analogous to the autoencoder, and the quantum circuit diagram is shown in FIG. 5.
FIG. 5. A circuit diagram of a quantum neural network that can learn and carry out teleportation of the state $|\psi\rangle$ from Alice to Bob using quantum entanglement. The standard teleportation protocol allows only classical communication of 2 bits [18]; this is enforced by only allowing two connections, which are dephased in the Z-basis (D). $U_1$, $U_2$ and $U_3$ are unitaries. The blue line is the boundary between Alice and Bob.

The cost function used was

$C = \sum_{j=0}^{3}\left({\rm Tr}(|\psi\rangle\langle\psi|\,\sigma_j) - {\rm Tr}(\rho_6\,\sigma_j)\right)^2.$   (11)

A fully trained network can teleport the state $|\psi\rangle$ (from Alice) to the output port of qubit 6 (to Bob). Once trained properly, the output state on Alice's side will no longer be $|\psi\rangle\langle\psi|$, as the teleportation has 'messed up' Alice's state [35].

In order to train the teleportation for any arbitrary state $|\psi\rangle$ (and to avoid the network simply learning to copy $|\psi\rangle$ from Alice to Bob), the training inputs are randomly picked from the axis intersection states on the surface of the Bloch sphere [18]. FIG. 6 shows the convergence of the cost function during training, simulated on a classical computer. As can be seen, the training was found to be successful, i.e. the cost function converged towards zero. This held for all tests with randomly initialised weights.
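A sketch of these ingredients (the $j = 0$ identity term of Eq. (11) vanishes, both arguments having unit trace, so only X, Y, Z contribute):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# The six axis-intersection states of the Bloch sphere: the eigenstates
# of Z, X and Y respectively.
TRAINING_STATES = [
    ket0, ket1,
    (ket0 + ket1) / np.sqrt(2), (ket0 - ket1) / np.sqrt(2),
    (ket0 + 1j * ket1) / np.sqrt(2), (ket0 - 1j * ket1) / np.sqrt(2),
]

def teleport_cost(psi, rho_bob):
    """Eq. (11): compare Pauli expectations of |psi><psi| with Bob's output."""
    rho_psi = np.outer(psi, psi.conj())
    return sum((np.real(np.trace(rho_psi @ s)) -
                np.real(np.trace(rho_bob @ s))) ** 2 for s in (X, Y, Z))

def haar_qubit(rng):
    """Haar-random qubit state (uniform over the Bloch sphere), as used
    for the 1000-state test quoted in FIG. 6."""
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)
```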
DISCUSSION

Quantum vs. classical

Can these neural networks show some form of quantum supremacy? The comparison of classical and quantum neural networks is well-defined within our set-up, as the classical networks correspond to a particular parameter regime for the quantum networks. A key type of quantum supremacy is that the quantum network can take and process quantum inputs: it can for example process $|+\rangle$ and $|-\rangle$ differently. Thus, there are numerous quantum tasks it can do that the classical network cannot, including the two examples above. We anticipate that they will moreover, in some cases, be able to process classical inputs faster, by turning them into superpositions; investigating this is a natural follow-on from this work.

FIG. 6. A plot of the teleportation cost function w.r.t. the number of steps used in the training procedure. The cost function can be seen to converge to zero. The non-monotonic decrease is to be expected as we are varying the input states. The network now teleports any qubit state: picking 1000 states at random from the Haar measure (uniform distribution over the Bloch sphere) gives a cost function distribution with mean 5.×10^- and standard deviation 1.×10^-, which is effectively zero.

We also mention that we term our above design a quantum neural network with classical learning parameters, as the parameters in the unitaries are classical. It seems plausible that allowing these parameters to be in superpositions, whilst experimentally more challenging, could give further advantages.

Whilst adding the ancillary qubits ensures that the network is a strict generalisation of the classical network, it can of course be experimentally and numerically simpler to omit these. Then one would sacrifice performance in the classical mode of operation, and the network may not be as good as a classical network with the same number of neurons for all tasks.
Visualising the cost function landscape
To gain intuitive understanding, one can visualise the gradient descent in 3D by reducing the number of free parameters. We sampled the cost surface and gradient descent path of a one-input neuron (a 4 × 4 unitary) trained to perform $|+\rangle\otimes|0\rangle \to |+\rangle\otimes|0\rangle$ and $|-\rangle\otimes|0\rangle \to |-\rangle\otimes|1\rangle$. We optimised, similarly to Eq. (6), over unitaries of the form

$U = |\tau\rangle\langle\tau| \otimes \mathbb{1} + |\tau^{\perp}\rangle\langle\tau^{\perp}| \otimes \sigma_1,$   (12)

where $|\tau\rangle = \cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle$ and $|\tau^{\perp}\rangle = \sin(\theta/2)|0\rangle - e^{i\phi}\cos(\theta/2)|1\rangle$. We performed gradient descent along the variables $\theta$ and $\phi$ as shown by the red path in FIG. 7.

FIG. 7. A 3-D plot of the cost function (vertical axis) of a 2-qubit unitary as a function of $\theta$ and $\phi$ (horizontal axes). The red line represents the path taken when carrying out gradient descent from a particular starting point.
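A surface of this kind can be sampled in a few lines (a sketch; the target mapping and the infidelity-style cost used here for plotting are our reading of the task, not necessarily the exact cost of Eq. (8)):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
zero, one = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = (zero + one) / np.sqrt(2), (zero - one) / np.sqrt(2)

def u_landscape(theta, phi):
    """Eq. (12): U = |tau><tau| (x) 1 + |tau_perp><tau_perp| (x) sigma_1."""
    tau = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
    tau_p = np.array([np.sin(theta / 2), -np.exp(1j * phi) * np.cos(theta / 2)])
    return (np.kron(np.outer(tau, tau.conj()), np.eye(2)) +
            np.kron(np.outer(tau_p, tau_p.conj()), X))

# Our reading of the task: the dummy qubit is left alone for |+>
# and flipped for |->.
pairs = [(np.kron(plus, zero), np.kron(plus, zero)),
         (np.kron(minus, zero), np.kron(minus, one))]

thetas = np.linspace(0, np.pi, 50)
phis = np.linspace(0, 2 * np.pi, 50)
surface = np.array([[sum(1 - abs(tgt.conj() @ (u_landscape(t, p) @ inp)) ** 2
                         for inp, tgt in pairs)
                     for p in phis]
                    for t in thetas])          # grid of cost values, as in FIG. 7
```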
Scaling to bigger networks

The same scheme can be used to make quantum generalisations of networks whose generalised neurons have more inputs/outputs and connections. FIG. 8 illustrates an $M$-qubit input quantum neuron with a subsequent $N$-qubit fan-out gate.

FIG. 8. Diagram of the quantum generalisation of a classical neuron with $M$ inputs and $N$ outputs. The superscripts inside the square brackets of the unitaries represent the number of qubits the respective unitaries act on. $U^{[M+1]}$ is the unitary that represents the quantum neuron with an $M$-qubit input and $U^{[N]}$ is the fan-out gate that fans out the output in the final port of $U^{[M+1]}$ in a particular basis.

If one wishes the number of free parameters of a neuron to grow no more than polynomially in the number of inputs, one needs to restrict the unitary: by Eq. (4), an unrestricted $(M+1)$-qubit neuron has $4^{M+1}$ parameters. It is natural to demand it to be a polynomial length circuit of some elementary universal gates; in particular, if the input states are known to be generated by a polynomial length circuit of a given set of gates, it is natural to restrict the unitary to that set of gates.

The evaluation of the cost function can be kept to a sensible scaling if we restrict it to be a function of local observables on each qubit, in particular a function of the local Pauli expectation values, as was used in this paper, in which case a vector of $3n$ expectation values suffices for $n$ qubits.

QUANTUM PHOTONICS NEURON MODULE
To investigate the physical viability of these quantum neural networks we consider quantum photonics. This is an attractive platform for quantum information processing: it has room temperature operation and the possibility of robust miniaturisation through photonic integrated circuits; in general it harnesses the highly developed optical fibre-related technology for QIP purposes [36]. Moreover, optical implementations have been viewed as optimal for neural networks in the classical case, due to the low design cost of adding multiple connections (as light passes through light without interacting) [37]. A final motivation for choosing this platform is that the tuning can be naturally implemented, as detailed below.

We design a neuron as a module that can then be connected to other neurons. This makes it concrete how experimentally complex the network would be to build and operate, including how it could be trained.

The design employs the Cerf-Adami-Kwiat (C-A-K) protocol [38], where a single photon with polarisation and multiple possible spatial modes encodes the quantum state; the scheme falls into the category of hyper-entangling schemes, which entangle different degrees of freedom. One qubit is the polarisation; digital encodings of the spatial mode labels give rise to the others. With four spatial modes this implements 3 qubits, with basis $|0/1\rangle|H/V\rangle|0/1\rangle$, where H/V are two different polarisation states, and the other bits label the four spatial modes. The first bit says whether the photon is in the top or bottom pair of modes and the last bit whether it is the upper or lower one within that pair. This scheme and related ones such as [39, 40] are experimentally viable, theoretically clean and can implement any unitary on a single photon spread out over spatial modes. In such a single photon scenario they do not scale well, however: the number of spatial modes grows exponentially in the number of qubits. Thus for larger networks our design below would need to be modified to something less simple, e.g. accepting probabilistic gates in the spirit of the KLM scheme [41], or using measurement-based cluster state quantum computation approaches [36].

Before describing the module we make the simplifying restriction that there is one input qubit to the neuron and one dummy input. We will ensure that the designated output qubit can be fed into another neuron, as in FIG. 9 and FIG. 10.
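The C-A-K bookkeeping just described amounts to the following enumeration (our labelling convention, for illustration):

```python
# Hypothetical labelling for the C-A-K basis |b2>|H/V>|b0> described above:
# b2 selects the top/bottom pair of spatial modes, b0 the mode within a pair,
# and the middle qubit is the photon polarisation.
for mode in range(4):
    b2, b0 = (mode >> 1) & 1, mode & 1
    for pol in "HV":
        print(f"spatial mode {mode}, polarisation {pol}  <->  |{b2}>|{pol}>|{b0}>")
```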
FIG. 9. The first neuron takes one input and one dummy input and its designated output is fed into the next neuron.

FIG. 10. A circuit diagram of our neural module. Following C-A-K there are three qubits, with basis $|0/1\rangle|H/V\rangle|0/1\rangle$, where H/V label different polarisation states, and the other bits label the four spatial modes. We define the input to the module to be carried by the middle (polarisation) qubit. The neuron U has the form of Eq. (6), modifying the output conditional on the input state. The swaps ensure that the next neuron module U also gets the input via the polarisation.

We propose to update the neural network by adjusting both variable polarisation rotators and spatial phase shifters in a set of Mach-Zehnder interferometers, as shown in FIG. 11. In this way we are able to change the outputs from each layer of the network. The spatial shift could be induced by varying the strain or temperature on the waveguides at given locations, to change their refractive indices and hence the relative phase; this may have additional difficulties in that silicon waveguides are birefringent [42]. Alternatively we can tune both polarisation and spatial qubits via the electro-optic effect.

This circuit can be made more robust and miniaturised using silicon or silica optical waveguides [36]. They have been extensively used to control spatial modes and recently also polarisation [43]. Several labs can implement the phase shifting via heaters or the electro-optic effect. Conventionally, phase shifters built upon the electro-optic effect are known to work in the megahertz region and have extremely low loss [36]. For many applications this would be considered slow, but our tuning only requires (in the region of) a few thousand steps, meaning learning tasks for neural networks this small could be completed in milliseconds. While it appears that this effect will be the limiting factor in terms of speed, photodetectors are able to reach reset times in the tens of nanoseconds, while the production of single photons through parametric down conversion has megahertz repetition rates [44].
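For intuition, the tunable element is the standard Mach-Zehnder interferometer: two 50:50 beam splitters enclosing a variable internal phase, preceded by an external phase (a textbook parametrisation in the spirit of the multiport designs of [39, 40], not a layout taken from FIG. 11):

```python
import numpy as np

BS = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)    # symmetric 50:50 beam splitter

def mzi(theta, phi):
    """Mach-Zehnder interferometer: external phase phi, then two 50:50
    beam splitters enclosing an internal phase shifter theta."""
    return BS @ np.diag([np.exp(1j * theta), 1]) @ BS @ np.diag([np.exp(1j * phi), 1])

U = mzi(0.3, 1.2)
assert np.allclose(U @ U.conj().T, np.eye(2))     # unitary for any settings
assert np.isclose(abs(mzi(np.pi, 0)[0, 0]), 1)    # theta = pi: 'bar' state
assert np.isclose(abs(mzi(0.0, 0)[0, 1]), 1)      # theta = 0:  'cross' state
```

Sweeping $\theta$ moves the device continuously between the cross and bar configurations, which is exactly the continuous parametrisation that gradient descent needs.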
SUMMARY AND OUTLOOK

We have given a protocol for generalising classical feedforward step-function neural networks to networks that take and process quantum inputs. We have shown that these networks can perform the natural quantum generalisation of the classical network in the case of an autoencoder, being able for example to compress entangled inputs. We have shown that they can be used to work out a quantum information processing protocol, teleportation, without being told how to do it, only the task. Based on these results we think that these networks will be highly versatile tools for quantum information scientists, similar to the classical networks' role in classical information processing.
ACKNOWLEDGMENTS
We acknowledge discussions with Stefanie Baerz, Abbas Edalat, William Clements, Alex Jones, Mio Murao, Maria Schuld, Vlatko Vedral, Alejandro Valido and discussions and detailed comments from Doug Plato, Mihai Vidrighin, Peter Wittek. We are grateful for funding from the EU Collaborative Project TherMiQ (Grant Agreement 618074), the London Institute for Mathematical Sciences, a Leverhulme Trust Research Grant (No. RPG-2014-055), and a programme grant from the UK EPSRC (EP/K034480/1).

[1] M. A. Nielsen,
Neural Networks and Deep Learning (Determination Press, online book, 2015).
[2] E. M. Azoff,
Neural Network Time Series Forecasting of Financial Markets (John Wiley and Sons, Chichester, 1994).
[3] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, 436-444 (2015).
[4] S. Lloyd, M. Mohseni, and P. Rebentrost, "Quantum algorithms for supervised and unsupervised machine learning," (2013), arXiv:1307.0411 [quant-ph].
[5] S. Lloyd, M. Mohseni, and P. Rebentrost, "Quantum principal component analysis," Nature Physics, 631-633 (2014).
[6] A. Montanaro, "Quantum pattern matching fast on average," Algorithmica, 1-24 (2015).
[7] S. Aaronson, "Read the fine print," Nature Physics, 291-293 (2015).
[8] S. Garnerone, P. Zanardi, and D. A. Lidar, "Adiabatic quantum algorithm for search engine ranking," Phys. Rev. Lett., 230506 (2012).
[9] A. W. Harrow, A. Hassidim, and S. Lloyd, "Quantum algorithm for linear systems of equations," Phys. Rev. Lett., 150502 (2009).
[10] S. Lloyd, S. Garnerone, and P. Zanardi, "Quantum algorithms for topological and geometric analysis of big data," Nature Communications, 10138 (2016).
[11] P. Rebentrost, M. Mohseni, and S. Lloyd, "Quantum support vector machine for big data classification," Phys. Rev. Lett., 130503 (2014).
[12] N. Wiebe, D. Braun, and S. Lloyd, "Quantum algorithm for data fitting," Phys. Rev. Lett., 050505 (2012).
[13] J. Adcock, E. Allen, M. Day, S. Frick, J. Hinchliff, M. Johnson, S. Morley-Short, S. Pallister, A. Price, and S. Stanisic, "Advances in quantum machine learning," (2015), arXiv:1512.02900 [quant-ph].
[14] B. Heim, T. F. Rønnow, S. V. Isakov, and M. Troyer, "Quantum versus classical annealing of Ising spin glasses," Science, 215-217 (2015).
[15] D. Gross, Y. K. Liu, S. T. Flammia, S. Becker, and J. Eisert, "Quantum state tomography via compressed sensing," Phys. Rev. Lett., 150401 (2010).
[16] V. Dunjko, J. M. Taylor, and H. J. Briegel, "Quantum-enhanced machine learning," Phys. Rev. Lett., 130501 (2016).
[17] P. Wittek, ed., Quantum Machine Learning (Academic Press, Boston, 2014) pp. i-ii.
[18] M. A. Nielsen and I. L. Chuang,
Quantum Computation and Quantum Information (Cambridge University Press, Cambridge, 2000).
[19] A. J. P. Garner, O. C. O. Dahlsten, Y. Nakata, M. Murao, and V. Vedral, "A framework for phase and interference in generalized probabilistic theories," New Journal of Physics, 093044 (2013).
[20] W. Lechner, P. Hauke, and P. Zoller, "A quantum annealing architecture with all-to-all connectivity from local interactions," Science Advances (2015).
[21] N. Wiebe, A. Kapoor, and K. M. Svore, "Quantum deep learning," (2014), unpublished.
[22] M. Schuld, I. Sinayskiy, and F. Petruccione, "The quest for a quantum neural network," Quantum Information Processing, 2567-2586 (2014).
[23] A. Bisio, G. Chiribella, G. M. D'Ariano, S. Facchini, and P. Perinotti, "Optimal quantum learning of a unitary transformation," Phys. Rev. A, 032324 (2010).
[24] M. Sasaki and A. Carlini, "Quantum learning and universal quantum matching machine," Phys. Rev. A, 022303 (2002).
[25] G. Sentís, M. Guță, and G. Adesso, "Quantum learning of coherent states," EPJ Quantum Technology, 17 (2015).
[26] L. Banchi, N. Pancotti, and S. Bose, "Quantum gate learning in qubit networks: Toffoli gate without time-dependent control," npj Quantum Information, 16019 (2016).
[27] P. Palittapongarnpim, P. Wittek, E. Zahedinejad, S. Vedaie, and B. C. Sanders, "Learning in quantum control: high-dimensional global optimization for noisy quantum dynamics," (2016).
[28] R. P. Feynman, "Quantum mechanical computers," Found. Physics, 507-531 (1986).
[29] A. Muthukrishnan, "Classical and quantum logic gates: An introduction to quantum computing," (1999), seminar notes, unpublished.
[30] C. W. Curtis and I. Reiner, Representation Theory of Finite Groups and Associative Algebras (AMS Chelsea Publishing, Providence, RI, 1962).
[31] P. L. Bartlett and T. Downs, "Using random weights to train multilayer networks of hard-limiting units," IEEE Transactions on Neural Networks, 202-210 (1992).
[32] T. Downs and R. J. Gaynier, "The use of random weights for the training of multilayer networks of neurons with Heaviside characteristics," Mathl. Comput. Modelling, 53-61 (1995).
[33] D. Rowell, "Computing the matrix exponential: the Cayley-Hamilton method," (2004), online lecture notes from MIT Dept. of Mech. Eng., unpublished.
[34] S. R. Hedemann, "Hyperspherical parameterization of unitary matrices," (2013), arXiv:1303.5904 [quant-ph].
[35] M. M. Wilde, Quantum Information Theory (Cambridge University Press, Cambridge, 2013).
[36] T. Rudolph, "Why I am optimistic about the silicon-photonic route to quantum computing," (2016), arXiv:1607.08535 [quant-ph].
[37] R. Rojas,
Neural Networks (Springer, Berlin, 1996).
[38] N. J. Cerf, C. Adami, and P. G. Kwiat, "Optical simulation of quantum logic," Phys. Rev. A, R1477-R1480 (1998).
[39] M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, "Experimental realization of any discrete unitary operator," Phys. Rev. Lett., 58-61 (1994).
[40] W. R. Clements, P. C. Humphreys, B. J. Metcalf, W. S. Kolthammer, and I. A. Walmsley, "An optimal design for universal multiport interferometers," (2016), arXiv:1603.08788 [physics.optics].
[41] E. Knill, R. Laflamme, and G. J. Milburn, "A scheme for efficient quantum computation with linear optics," Nature, 46-52 (2001).
[42] P. C. Humphreys, B. J. Metcalf, J. B. Spring, M. Moore, P. S. Salter, M. J. Booth, W. S. Kolthammer, and I. A. Walmsley, "Strain-optic active control for quantum integrated photonics," Opt. Express, 21719-21726 (2014).
[43] L. Sansoni, F. Sciarrino, G. Vallone, P. Mataloni, A. Crespi, R. Ramponi, and R. Osellame, "Polarization entangled state measurement on a chip," Phys. Rev. Lett., 200503 (2010).
[44] D. Bonneau, M. Lobino, P. Jiang, C. M. Natarajan, M. G. Tanner, R. H. Hadfield, S. N. Dorenbos, V. Zwiller, M. G. Thompson, and J. L. O'Brien, "Fast path and polarization manipulation of telecom wavelength single photons in lithium niobate waveguide devices," Phys. Rev. Lett., 053601 (2012).

FIG. 11. The optics circuit of the neuron module, with four labelled spatial modes.