Spiking neuromorphic chip learns entangled quantum states
Stefanie Czischek, Andreas Baumbach, Sebastian Billaudelle, Benjamin Cramer, Lukas Kades, Jan M. Pawlowski, Markus K. Oberthaler, Johannes Schemmel, Mihai A. Petrovici, Thomas Gasenzer, Martin Gärttner
Kirchhoff-Institut für Physik, Ruprecht-Karls-Universität Heidelberg, Im Neuenheimer Feld 227, 69120 Heidelberg, Germany
Institut für Theoretische Physik, Ruprecht-Karls-Universität Heidelberg, Philosophenweg 16, 69120 Heidelberg, Germany
Department of Physiology, University of Bern, 3012 Bern, Switzerland
(Dated: August 6, 2020)

Neuromorphic systems are designed to emulate certain structural and dynamical properties of biological neuronal networks, with the aim of inheriting the brain's functional performance and energy efficiency in artificial-intelligence applications [1, 2]. Among the platforms existing today, the spike-based BrainScaleS system stands out by realizing fast analog dynamics which can boost computationally expensive tasks [3]. Here we use the latest BrainScaleS generation [4] for the algorithm-free simulation of quantum systems, thereby opening up an entirely new application space for these devices. This requires an appropriate spike-based representation of quantum states and an associated training method for imprinting a desired target state onto the network. We employ a representation of quantum states using probability distributions [5, 6], enabling the use of a Bayesian sampling framework for spiking neurons [7]. For training, we developed a Hebbian learning scheme that explicitly exploits the inherent speed of the substrate, which enables us to realize a variety of network topologies. We encoded maximally entangled states of up to four qubits and observed fidelities that imply genuine N-partite entanglement. In particular, the encoding of entangled pure and mixed two-qubit states reaches a quality that allows the observation of Bell correlations, thus demonstrating that non-classical features of quantum systems can be captured by spiking neural dynamics. Our work establishes an intriguing connection between quantum systems and classical spiking networks, and demonstrates the feasibility of simulating quantum systems with neuromorphic hardware.

As von-Neumann computers are rapidly approaching fundamental physical limitations of conventional semiconductor technology, a number of alternative computing architectures are currently being explored. Among them, neuromorphic devices [1, 2], which take inspiration from the way the human brain works, hold promise of having a wide range of applications, in particular in machine learning and artificial intelligence [4, 8-15]. Here we focus on using them for an algorithm-free emulation of measurement outcomes in quantum physics [16], which are inherently probabilistic in nature. For this task, our approach has two distinct advantages when compared with classical Monte Carlo methods using von-Neumann devices. First, the accelerated analog circuit dynamics enable a rapid generation of samples, which inherently corresponds to a faster approximation of the underlying probability distribution. Second, the parallel nature of the neuromorphic substrate carries immediate scaling benefits, as the time to solution is independent of the emulated network size.

We use neuronal spikes (action potentials) to mark transitions between discrete states and thereby effectively carry out the sampling process. The all-or-nothing nature of spikes represents a blessing in disguise. On the one hand, it has an apparent drawback by making the computation of gradients, and thus training, more demanding than in classical deep neural networks [2].
However, it also allows us to use a spiking neuromorphic substrate in the first place, the speed-up of which we harness for efficient Hebbian learning [17]. Since any quantum state can be equivalently represented by a probability distribution [5, 6], it can, in turn, be encoded using networks of leaky integrate-and-fire (LIF) neurons [3, 7, 18]. Here, we use the BrainScaleS-2 chip [4] as a physical substrate to emulate such networks. This mixed-signal neuromorphic platform is centered around an analog core: neuro-synaptic states are represented as voltages and currents in integrated electronic circuits and evolve in continuous time. Its configurable connectivity of neurons allows us to explore the representational power of different network topologies, including shallow, as well as deep and densely connected ones. With this substrate, we demonstrate the encoding of entangled quantum states in spiking neural networks.

Neuromorphic encoding of quantum states
In classical machine learning, generative models based on artificial neural networks are used to encode and sample from probability distributions [17]. Similarly, spiking neural networks can be viewed as approximating Markov-chain Monte-Carlo sampling, albeit with dynamics that differ fundamentally from standard statistical methods [19]. Here, we encode quantum states using the hierarchical network architecture illustrated in Fig. 1a. The network consists of N visible and M hidden leaky integrate-and-fire neurons arranged in a bipartite graph with a symmetric connectivity matrix.
FIG. 1.
Neuromorphic representation of quantum states. a, Two-layer spiking network architecture with weight parameters W_{i,j} between the visible (orange) and hidden (green) neurons and biases d_i (b_j) for the binary visible (hidden) neurons. b, Photograph of the BrainScaleS-2 chip used as a substrate for the experiments in this work. c, Dynamical evolution of the spiking network. Upper panel: membrane potential evolution of a single LIF neuron integrating synaptic input. Whenever the potential crosses a threshold a spike is generated and the potential is clamped to prevent immediate refiring (refractory period). Lower panels: spikes (solid lines) for 4 visible (orange) and 8 hidden (green) neurons with associated z = 1 time frames (shaded regions). The network state is observed periodically (gray lines showing only every fifth observation time for visibility reasons). Each observation results in a binary vector corresponding to a sample drawn from the underlying distribution. d, The 4-state positive-operator-valued measure (POVM) representation of a qubit state can be encoded by a pair of visible neurons. A combination of N such neuron pairs thus serves to represent an N-qubit system. The frequency of occurrence of neuron configurations drawn from a trained network encodes the POVM probability distribution of a quantum state (lower panel). e, Any quantum state can be represented as a density matrix ρ, which can be a statistical mixture of states |Ψ_i⟩. For the example of two qubits shown here, the complex-valued entries of ρ can be reconstructed linearly from the sampled probabilities P(a_1, a_2). For the definition of the operators Q_{a_1,a_2}, see Methods.

Such a network can be tuned to approximate the probability of the visible neurons to be in state v = (v_1, …, v_N), v_i ∈ {0, 1}, as the marginal

p(v; W) = (1/Z(W)) Σ_{h} exp[−E(v, h; W)],   (1)

over all hidden states h = (h_1, …, h_M), where h_j ∈ {0, 1}, of the joint Boltzmann distribution p(v, h; W) = exp[−E(v, h; W)]/Z(W) [19]. The network energy E(v, h; W) = −Σ_{i,j} v_i W_{i,j} h_j − Σ_i v_i d_i − Σ_j h_j b_j depends on the set of network parameters W = (W, b, d), including the weights W_{i,j} and biases b_j and d_i. The partition sum Z(W) = Σ_{v,h} exp[−E(v, h; W)] ensures normalization.

The BrainScaleS-2 system, depicted in Fig. 1b, features 512 LIF neuron circuits interacting through a configurable weight matrix [4]. Communication in a network of LIF neurons is realized via spikes. Whenever the membrane potential of a neuron, which is the result of a leaky integration of the synaptic input, exceeds a threshold, it sends a spike to the neurons connected to it (Fig. 1c, top panel). In the spike-based sampling framework, the refractory period τ_ref following a spike encodes the state z = 1, while z = 0 at all other times (Fig. 1c, lower panel). The stochasticity required for sampling is induced by adding a random component to the generation of spikes; for LIF networks, this can be ensured by sufficiently noisy membrane potentials [7, 18]. To this end, we used on-chip sources to inject pseudo-Poisson spike trains into the network (see [20]).

As an experimental result, the BrainScaleS-2 chip returns a list of all spike times and associated neuron IDs. This information is sufficient to reconstruct the state sampled by the network at any point in time.
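This readout can be made concrete in a few lines. The sketch below is our own illustration, not the authors' software stack; the function name, time grid, and parameter values are hypothetical. It converts a list of spike times and neuron IDs into binary state vectors by marking a neuron as z = 1 whenever the observation time falls within a window τ_ref after one of its spikes.

```python
import numpy as np

def states_from_spikes(spike_times, neuron_ids, n_neurons,
                       tau_ref=10.0, dt=2.0, t_max=1000.0):
    """Read out binary network states from spike data (times in us).

    A neuron is in state z = 1 at observation time t if it spiked
    within the preceding refractory period tau_ref.
    """
    t_obs = np.arange(0.0, t_max, dt)          # periodic observation grid
    states = np.zeros((len(t_obs), n_neurons), dtype=np.int8)
    for t, nid in zip(spike_times, neuron_ids):
        # mark all observation points falling into [t, t + tau_ref)
        lo = np.searchsorted(t_obs, t)
        hi = np.searchsorted(t_obs, t + tau_ref)
        states[lo:hi, nid] = 1
    return states

# toy usage: two neurons, three spikes
s = states_from_spikes([3.0, 7.5, 40.0], [0, 1, 0], n_neurons=2, t_max=60.0)
```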
We estimated the distribution sampled by the network by observing its state at regular intervals, as visualized in Fig. 1c. To ensure an optimal estimate, the observation frequency needs to be at least (τ_ref/2)^{−1} (see [20]). For our analysis, we used (τ_ref/5)^{−1}, thereby guaranteeing a large safety margin. The resulting binary configurations are collected in a histogram as shown in Fig. 1d.

A pure quantum state is described by a vector in Hilbert space and can be represented by a hermitian density matrix with complex entries. Density matrices can also encode mixed states and thus account for a possible coupling to an environment, which is relevant for a realistic description of experiments. Fig. 1e shows an example of a density matrix for a system of two spin-1/2 degrees of freedom (qubits) corresponding to a Hilbert-space dimension d = 4. The corresponding probability distribution which we encode in our network is obtained from a so-called tomographically complete measurement [5]. Such a measurement has d² possible outcomes. Mathematically, these outcomes are represented by a set of operators {M_a}_a, forming a so-called positive-operator-valued measure (POVM). The density matrix can be reconstructed uniquely as ρ = Σ_{a} P(a) Q_a from the probabilities P(a) = Tr[ρ M_a] for obtaining outcome a. The operators Q_a are given by Q_a = Σ_{a′} T^{−1}_{a,a′} M_{a′}, with T_{a,a′} = Tr[M_a M_{a′}] [6]. In our two-qubit example (Fig. 1d) we chose M_a = M_{a_1} ⊗ M_{a_2}, where the M_{a_i} (a_i = 0, …, 3) are projection operators onto the single-qubit states represented as the four corners of a tetrahedron on the Bloch sphere.
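To make this reconstruction tangible, the following sketch (our own illustration; the tetrahedral vectors are spelled out in Methods) builds the single-qubit POVM {M_a}, the overlap matrix T and the operators Q_a, and checks the identity ρ = Σ_a P(a) Q_a on a test state.

```python
import numpy as np

# Pauli matrices and tetrahedral Bloch vectors (cf. Methods)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
s = np.array([[0, 0, 1],
              [2*np.sqrt(2)/3, 0, -1/3],
              [-np.sqrt(2)/3,  np.sqrt(6)/3, -1/3],
              [-np.sqrt(2)/3, -np.sqrt(6)/3, -1/3]])

# POVM elements M_a = (1 + s_a . sigma)/4 and overlap matrix T
M = [(I2 + sa[0]*X + sa[1]*Y + sa[2]*Z) / 4 for sa in s]
T = np.array([[np.trace(Ma @ Mb).real for Mb in M] for Ma in M])
Tinv = np.linalg.inv(T)
Q = [sum(Tinv[a, b] * M[b] for b in range(4)) for a in range(4)]

# round trip: probabilities P(a) = Tr[rho M_a] determine rho uniquely
rho = np.array([[0.7, 0.2 + 0.1j], [0.2 - 0.1j, 0.3]])   # arbitrary test state
P = np.array([np.trace(rho @ Ma).real for Ma in M])
rho_rec = sum(P[a] * Q[a] for a in range(4))
assert np.allclose(rho, rho_rec)
```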
As each a_i can take four different values, the encoding of the probabilities P(a) by a spiking network is realized by representing each qubit by a pair of binary neurons in the visible layer (cf. gray shadings in Fig. 1a). This results in the distribution p*(v) over the visible neurons (see Methods).

To achieve spike-based sampling from the target distribution p*(v), the parameters of the spiking network were adjusted in an iterative training procedure. We used the Kullback-Leibler divergence

D_KL(p* ∥ p) = Σ_{v} p*(v) ln[p*(v)/p(v; W)]   (2)

to measure the quality of the sampled marginal p(v; W). In each training epoch, the synaptic weights were updated along the gradient of the D_KL (see Methods):

ΔW_{i,j} ∝ ⟨v_i h_j⟩_target − ⟨v_i h_j⟩_model.   (3)

Pairwise correlations ⟨v_i h_j⟩_model in the network were directly estimated from the sampled distribution p(v, h; W). Target correlations were also obtained from the sampled distribution by renormalization to the target marginal distribution:

⟨v_i h_j⟩_target = ⟨ [p*(v)/p(v; W)] v_i h_j ⟩_{p(v,h;W)}.   (4)

A similar scheme was used for the neuronal biases b_j and d_i. This otherwise prohibitively compute-intensive method was made possible by the accelerated hardware dynamics and allows a much better approximation of the D_KL gradient than the more conventional contrastive divergence update scheme [17]. Moreover, it does not rely on layer-wise conditional independence, allowing the exploration of network topologies other than bipartite graphs.
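In software form, one training step of Eqs. (3)-(4) could look as follows. This is a minimal sketch under assumed array conventions, with helper names of our own choosing; on the hardware, the sample arrays V and H are read out from the chip as described above.

```python
import numpy as np

def weight_update(V, H, p_target, eta=0.05):
    """One Hebbian step, Eqs. (3)-(4), from binary sample arrays.

    V: (S, N) visible samples; H: (S, M) hidden samples.
    p_target: dict mapping a visible configuration tuple to p*(v).
    """
    keys = [tuple(int(x) for x in v) for v in V]
    counts = {}
    for k in keys:                       # empirical model marginal p(v; W)
        counts[k] = counts.get(k, 0) + 1
    p_model = {k: c / len(V) for k, c in counts.items()}
    # per-sample reweighting factor p*(v) / p(v; W), cf. Eq. (4)
    w = np.array([p_target[k] / p_model[k] for k in keys])
    V = np.asarray(V, dtype=float)
    H = np.asarray(H, dtype=float)
    corr_target = np.einsum('s,si,sj->ij', w, V, H) / len(V)
    corr_model = np.einsum('si,sj->ij', V, H) / len(V)
    return eta * (corr_target - corr_model)        # Delta W, Eq. (3)
```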
Encoding an entangled Bell state

To demonstrate that a spiking neural network can learn to represent entangled quantum states we focus on a maximally entangled two-qubit state, the Bell state |Ψ+⟩ = (|↑↑⟩ + |↓↓⟩)/√2.
This state is a prototypical example exhibiting quantum mechanical correlations [21, 22]. We trained a network of four visible and 20 hidden neurons to encode the POVM probability distribution corresponding to ρ_B = |Ψ+⟩⟨Ψ+|. For calculating the weight updates in each epoch of the training procedure, as well as for evaluating expectation values, we drew 125000 samples of neuron states. This number is sufficient for the saturation of the D_KL, as can be seen in Fig. 3b, and was used for all experiments, if not specified otherwise.

To characterize the learned quantum state, we used the observable B(Θ), which can signal genuine quantum correlations and is experimentally accessible via measurements as illustrated and defined in Fig. 2a: the two qubits are distributed to two parties who independently perform one of two possible measurements on their respective qubit. We choose the standard parametrization of the different measurements by a single angle Θ. For a Bell state this procedure yields correlations violating the inequality |B(Θ)| ≤ 2, which is obeyed by classical systems [22].
At Θ = π/4 the violation becomes maximal for ρ_B, and B(π/4) thus yields an experimentally accessible witness for Bell correlations [21, 23]. The data which we obtain from the trained spiking network show that the encoded quantum correlations surpass the classicality bound |B(Θ)| = 2 (red points in Fig. 2b) and are in agreement with the theoretical prediction (black line). The inset shows how the Bell-correlation witness B(Θ = π/4) develops during the training, converging after less than 1000 iterations.
To illustrate the generality of our neuromorphic encoding scheme we consider mixed quantum states by adding white noise to the pure Bell state, resulting in the Werner state ρ_W = rρ_B + (1 − r)𝟙/4 with 0 ≤ 1 − r ≤ 1. Increasing the noise contribution 1 − r reduces |B(Θ)| and eventually confines it within the classical regime (cf. green data in Fig. 2b). For 1 − r > 1 − 1/√2 the state no longer violates the classical bound, and for 1 − r > 2/3 it becomes separable. We find that the trained network captures the exact value of B(π/4) for all values of r, as shown in Fig. 2c. The fluctuations in the experimental data decrease with increasing noise contribution, allowing a more accurate learning of mixed states. This counterintuitive effect is due to additional noise leading to an increase in entropy, which is synonymous with sampling from more uniform distributions. These, in turn, are realized by weaker weights, thus decreasing the influence of imperfect synaptic interactions in the neuromorphic substrate.
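For reference, the witness can be evaluated directly on a density matrix using the definitions of Fig. 2a. The sketch below is our own; in the experiment, expectation values are instead obtained from the sampled POVM distribution via Eq. (9) of Methods.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def S(alpha):
    # measurement direction in the x-z plane, cf. Fig. 2a
    return np.cos(alpha) * Z + np.sin(alpha) * X

def E(rho, alpha, beta):
    # connected correlator E_{alpha,beta} = <S S> - <S><S>
    SS = np.kron(S(alpha), S(beta))
    S1 = np.kron(S(alpha), np.eye(2))
    S2 = np.kron(np.eye(2), S(beta))
    tr = lambda A: np.trace(rho @ A).real
    return tr(SS) - tr(S1) * tr(S2)

def bell_B(rho, theta):
    return (E(rho, 0, theta) + E(rho, 0, -theta)
            + E(rho, 2*theta, theta) - E(rho, 2*theta, -theta))

# pure Bell state: B(pi/4) = 2*sqrt(2), violating |B| <= 2
rho_B = np.zeros((4, 4), dtype=complex)
rho_B[[0, 0, 3, 3], [0, 3, 0, 3]] = 0.5
print(bell_B(rho_B, np.pi / 4))   # ~2.828
```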
Learning performance

We analyzed in detail the convergence of the learning algorithm using the classical Kullback-Leibler divergence D_KL as defined in Eq. (2).
[Fig. 2 panels: a, Measurement setup; b, Violation of the classical bound; c, Effects of white noise. Panel a defines the measured quantities: party 1 chooses the basis α = 0 or α = 2Θ, party 2 chooses β = Θ or β = −Θ, with S_α = cos(α)σ_z + sin(α)σ_x, E_{α,β} = ⟨S_α S_β⟩ − ⟨S_α⟩⟨S_β⟩, and B(Θ) = E_{0,Θ} + E_{0,−Θ} + E_{2Θ,Θ} − E_{2Θ,−Θ}.]
FIG. 2. Encoding Bell states and Werner states. a, Illustration of a typical Bell-test scenario. Two correlated qubits emerging from a source are distributed between two parties. Each of the parties is allowed to choose between two different measurements, each characterized by a single common angle Θ. The measurement outcomes indicate genuine quantum correlations if the combination B(Θ) of the correlations violates the inequality |B(Θ)| ≤ 2. b, Observable B(Θ) evaluated on the learned encoding of the Bell state ρ_B = |Ψ+⟩⟨Ψ+| on the neuromorphic hardware, with M = 20 hidden neurons. Red symbols depict the observable for different angles Θ, averaged over the last 200 training epochs, where errorbars here and in the following denote the standard deviation. Note that these data points have been obtained from the same trained network and the same set of neuron states sampled from it by evaluating the observable for different angles Θ on this sample set. Green symbols show the same observable for the Werner state ρ_W = rρ_B + (1 − r)𝟙/4 with r = 0.3. In both cases, the data capture the exact values (black lines) well, including the violation of the classical bound in the pure case r = 1. The inset shows the evolution of the Bell-correlation witness B(Θ = π/4) during training (red line) and the convergence towards the expected value (black dashed line). c, Bell-correlation witness B(Θ = π/4) for a Werner state as a function of the noise strength 1 − r. The exact solution (black line) is captured well for both entangled and separable states.
[Fig. 3 panels: a, Training process (fidelity and D_KL versus training epochs for different numbers of hidden neurons M); b, Sampling behavior (D_KL(p* ∥ p) versus number of samples S); c, Multiple spins (fidelity versus hidden neurons M for N = 2, 3, 4).]
FIG. 3. Training performance. a, Dynamics of the learning procedure for the pure Bell state ρ_B. The quality of the network-encoded state is measured by the quantum fidelity, Eq. (5) (main frame), and by the Kullback-Leibler divergence, Eq. (2) (inset), for different numbers of hidden neurons. For better visibility, the running average over 50 epochs is shown in the inset as solid lines, with the shaded areas indicating the corresponding standard deviation. b, Kullback-Leibler divergence in a fixed trained network with M = 20 hidden neurons as a function of the number of samples drawn. The dashed line shows the expected trend for exact sampling from the target distribution. c, Quantum fidelity as a function of the number of hidden neurons for GHZ states |Ψ⟩ = (|↑⟩^{⊗N} + |↓⟩^{⊗N})/√2 with N = 2 (Bell state), 3, and 4 qubits. We show the averages over 200 training epochs after convergence (gray shaded area in a). The dashed line shows the bound for genuine N-partite entanglement.

In addition to the D_KL, we use the quantum fidelity

F(ρ_B, ρ_N) = Tr[ √( √ρ_B ρ_N √ρ_B ) ],   (5)

to quantify the distance between the target state ρ_B and the network-encoded state ρ_N, which, for pure states, reduces to the state overlap. As shown in Fig. 3a, the learning converges after 1000 training epochs. Increasing the number of hidden neurons, we find that the fidelity reaches ≈ 98% (correspondingly D_KL ≲ 10^{−2}) for M ≳ 20 hidden neurons. The limited reachable fidelity is a result of many different factors of the physical implementation of the spiking neural network on the BrainScaleS-2 platform. The synaptic connections are implemented with 6-bit resolution, limiting the achievable precision of approximating the probability distribution. Also, uncontrolled environmental changes such as temperature variations or host-to-system effects influence the performance of the hardware. This manifests in the jumps of fidelity occurring during learning, as well as in strong noise in the fidelity after the learning process has saturated, as can be seen in Fig. 3a. These instabilities exceed the anticipated noise level due to the finite sample statistics used for evaluating observables and calculating gradients in each epoch. These factors degrade the correspondence between the model assumption underlying the employed learning rule and the actual dynamics of the hardware. Many of the issues mentioned above can be resolved in future hardware generations.
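The quantum fidelity of Eq. (5) can be evaluated numerically with a matrix square root. This is a generic sketch using scipy; in our setting, ρ_N would be reconstructed from the sampled distribution as in Fig. 1e.

```python
import numpy as np
from scipy.linalg import sqrtm

def fidelity(rho_target, rho_network):
    """Quantum fidelity F = Tr[sqrt(sqrt(rho_t) rho_n sqrt(rho_t))], Eq. (5)."""
    sq = sqrtm(rho_target)
    return np.trace(sqrtm(sq @ rho_network @ sq)).real

# for pure targets |psi>, F reduces to the overlap sqrt(<psi|rho_n|psi>)
```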
[Fig. 4 panels: a, Partially-layered network; b, Deep network structure.]

FIG. 4.
Extending the network architecture. a, Fidelity between the network-encoded and the perfect Bell state, Eq. (5), as a function of the number of hidden neurons for a strictly layered network (blue) and an architecture with additional connections between the visible neurons (orange). b, Quantum fidelity for states encoded in a deep network architecture with a second hidden layer containing 5 (orange) and 10 (green) neurons, compared to the restricted two-layer network (blue).

To ensure that the learning performance is not limited by finite sample statistics, we evaluated the Kullback-Leibler divergence as a function of the number of samples in a trained network with fixed network parameters. Figure 3b shows the expected convergence towards a minimum value determined by the quality with which the spiking network approximates the POVM distribution. Beyond this convergence, the statistical error is negligible compared to the representation error, which justifies our choice of training with 125000 samples per epoch.

Having demonstrated high-fidelity emulation of two-qubit entangled states, we investigated whether states of multiple qubits can also be encoded by our spiking sampling network. Figure 3c shows the fidelity achieved in learning Greenberger-Horne-Zeilinger (GHZ) states [25], i.e. N-qubit generalizations of a Bell state, as a function of the number of hidden neurons M. The underlying probability distribution covers a larger state space of the visible neurons, requiring us to increase the number of samples to 225000 to reach convergence in the D_KL. In all cases the fidelity of the learned state to the perfect GHZ state increases with M, reaching values of close to 90% and about 70% for three and four qubits, respectively. Note that a GHZ-state fidelity above F = 1/√2 ≈ 0.71 witnesses genuine N-partite entanglement (cf. dashed line in Fig. 3c).
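The target distributions for these multi-qubit experiments follow the same recipe as in the two-qubit case, P(a) = Tr[ρ (M_{a_1} ⊗ … ⊗ M_{a_N})]. A brute-force sketch of our own (exponential in N, fine for N ≤ 4; POVM conventions as in Methods, Eq. (6)):

```python
import numpy as np
from functools import reduce

# single-qubit tetrahedral POVM, cf. Methods Eq. (6)
M = [np.array([[1, 0], [0, 0]]) / 2,
     np.array([[1, np.sqrt(2)], [np.sqrt(2), 2]]) / 6,
     np.array([[2, -np.sqrt(2) - 1j*np.sqrt(6)],
               [-np.sqrt(2) + 1j*np.sqrt(6), 4]]) / 12,
     np.array([[2, -np.sqrt(2) + 1j*np.sqrt(6)],
               [-np.sqrt(2) - 1j*np.sqrt(6), 4]]) / 12]

def ghz_povm_distribution(n):
    """P(a_1,...,a_N) = Tr[rho_GHZ M_{a_1} x ... x M_{a_N}], as a 4^N vector."""
    dim = 2 ** n
    psi = np.zeros(dim, dtype=complex)
    psi[0] = psi[-1] = 1 / np.sqrt(2)      # (|up...up> + |down...down>)/sqrt(2)
    rho = np.outer(psi, psi.conj())
    P = np.empty(4 ** n)
    for idx in range(4 ** n):
        a = np.unravel_index(idx, (4,) * n)   # outcome index for each qubit
        Ma = reduce(np.kron, [M[ai] for ai in a])
        P[idx] = np.trace(rho @ Ma).real
    return P

P = ghz_povm_distribution(3)
assert np.isclose(P.sum(), 1.0)
```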
Deep and partially restricted networks

Our flexible learning scheme allows the training of network architectures beyond simple bipartite graphs. To explore network architectures with potentially larger representational power we added connections between the visible neurons, resulting in a more densely connected network. Figure 4a shows that a Bell state can be encoded successfully with this architecture, reaching similar fidelities as the two-layer fully restricted spiking network. We also explored deeper network architectures by adding an additional hidden layer, see Fig. 4b. Again, the Bell state was learned successfully, reaching similar fidelities as in the bipartite case. We note that the learning performance is not monotonic at small M for M_2 = 10 neurons in the second hidden layer. This is expected, since the intermediate layer constitutes an information bottleneck towards the visible layer, which makes learning more difficult. Therefore, the greater representational power offered by additional depth [26] does not necessarily translate into a higher fidelity for M < M_2.

The fact that the learning performance does not improve when using different architectures indicates that the reachable fidelity is currently limited by technical imperfections rather than the representational power of the ansatz. Larger-scale systems may be able to exploit the greater representational power of these deeper and more complex architectures, especially considering that the sampling time is independent of the network size for this class of neuromorphic devices.
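In software terms, the only difference between these topologies is which entries of a global symmetric weight matrix are allowed to be non-zero. A sketch of such connectivity masks (our own illustration; on BrainScaleS-2 the same freedom is provided by the configurable routing):

```python
import numpy as np

def topology_mask(n_vis, n_hid1, n_hid2=0, visible_visible=False):
    """Boolean mask of allowed symmetric connections for one network graph."""
    n = n_vis + n_hid1 + n_hid2
    mask = np.zeros((n, n), dtype=bool)
    v = slice(0, n_vis)
    h1 = slice(n_vis, n_vis + n_hid1)
    h2 = slice(n_vis + n_hid1, n)
    mask[v, h1] = True                    # restricted bipartite core
    if visible_visible:                   # partially restricted (Fig. 4a)
        mask[v, v] = ~np.eye(n_vis, dtype=bool)
    if n_hid2 > 0:                        # deep architecture (Fig. 4b)
        mask[h1, h2] = True
    return mask | mask.T                  # symmetric connectivity

m = topology_mask(n_vis=4, n_hid1=20, n_hid2=10)
```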
Discussion and Outlook

We have shown that a spiking neural network implemented on a neuromorphic chip can encode entangled quantum states of few particles with high quantum fidelity. In particular, non-classical Bell correlations can be encoded faithfully, demonstrating that intrinsic quantum features can be captured by a classical spiking network.

The fidelities and system sizes achieved in this first study on neuromorphic quantum state encoding should be regarded as a proof of principle. The experienced restrictions are only technical in nature and can be improved in future generations of spiking neuromorphic devices. Specifically for the BrainScaleS-2 system, both the hardware and its surrounding software framework are in an ongoing maturation process. The size and fidelity of amenable simulations of quantum systems can be significantly improved upon by optimizing the usage of hardware real-estate, the signal-to-noise ratio of the analog circuitry and the calibration of the chip. Judging from the current pace of progress in neuromorphic engineering, significantly larger systems, both digital and analog, can be expected to become available in the near future [1].

Furthermore, runtime improvements are anticipated, as the current bottleneck is the calculation of the weight updates of the network parameters, which is done "off-line" on a conventional computer; only the sampling itself is performed on the chip (see [20]). Using the on-chip plasticity processor to update synaptic weights has the potential of drastically reducing the training time by removing the cumbersome chip-host loop [27].

One key advantage of this neuromorphic system as compared with simulated generative models is that scaling to larger network sizes does not increase the time needed to collect a desired number of samples. Given the efficient learnability [28] and representability of quantum states [29-31], as well as sampling schemes for neuromorphic devices [32, 33], we thus expect favorable scaling properties for our approach. Thus our work opens up a path towards applications of neuromorphic hardware in quantum many-body physics.
METHODS

Neuromorphic hardware chip.
The BrainScaleS-2 system is a mixed-signal neuromorphic platform. Its analog core is composed of neuron and synapse circuits with inherent time constants of the order of microseconds. An application-specific integrated circuit (ASIC) for the BrainScaleS-2 system features 512 neuron circuits, which emulate the adaptive exponential integrate-and-fire model. These individual compartments can be wired to resemble more complex structured neurons. An on-chip analog parameter memory as well as integrated static random-access memory (SRAM) cells allow us to individually configure and optimize the dynamics of each circuit. Each neuron integrates input from 256 dedicated synapses, which carry a 6-bit weight and can be either excitatory or inhibitory.

The analog core is accompanied by supporting logic, including circuitry for communication and configuration. Further functionality is provided by high-bandwidth spike sources, which can emit either regular or Poissonian spike trains of configurable frequency. A routing module allows mixing these spikes with external stimuli and recurrent events. It allows, in combination with in-synapse event filtering, the implementation of arbitrary network topologies.

Custom embedded processors allow the modification of the entire configuration space during the runtime of an experiment. Tightly coupled to the synaptic arrays, they allow the efficient and flexible implementation of learning rules based on observables such as neuronal potentials, firing rates, and synaptic correlations.
Representation of the Bell state.
The Bell state, |Ψ+⟩ = (|↑↑⟩ + |↓↓⟩)/√2, is described by the density matrix

ρ_B = (1/2) \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}

in the standard basis. To encode this state in a spiking neural network, we map it to a POVM probability distribution. While several choices of POVM representations are possible, we here focus on the tetrahedral representation, where each measurement projects a single qubit onto one corner of a tetrahedron in the Bloch sphere [6]. The POVM elements M_{a_i} for each qubit i can hence be expressed in the form M_{a_i} = (𝟙 + s_{a_i} · σ)/4, with Pauli operators σ = (σ_x, σ_y, σ_z) and

s_{a_i=0} = (0, 0, 1), s_{a_i=1} = (1/3)(2√2, 0, −1), s_{a_i=2} = (1/3)(−√2, √6, −1), s_{a_i=3} = (1/3)(−√2, −√6, −1).

The POVM elements thus take the form

M_{a_i=0} = (1/2) \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},  M_{a_i=1} = (1/6) \begin{pmatrix} 1 & \sqrt{2} \\ \sqrt{2} & 2 \end{pmatrix},
M_{a_i=2} = (1/12) \begin{pmatrix} 2 & -\sqrt{2} - i\sqrt{6} \\ -\sqrt{2} + i\sqrt{6} & 4 \end{pmatrix},
M_{a_i=3} = (1/12) \begin{pmatrix} 2 & -\sqrt{2} + i\sqrt{6} \\ -\sqrt{2} - i\sqrt{6} & 4 \end{pmatrix}.   (6)

With this, the POVM probability distribution of the Bell state, P_B(a_1, a_2) = Tr[ρ_B M_{a_1} ⊗ M_{a_2}], evaluates to

P_B = (1/8) \begin{pmatrix} 1 & 1/3 & 1/3 & 1/3 \\ 1/3 & 1 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 & 1 \\ 1/3 & 1/3 & 1 & 1/3 \end{pmatrix},   (7)

where rows correspond to the index a_1 and columns to the index a_2.

To reconstruct the density matrix from this probability distribution, the inverse of the full-system overlap matrix T is needed, which can be constructed as the product of the single-qubit overlap matrices, T = T_1 ⊗ T_2. Each single-qubit overlap matrix consists of the elements T_{a_i, a'_i} = Tr[M_{a_i} M_{a'_i}]. For the tetrahedral POVM the inverse T_i^{−1} of the single-qubit overlap matrix takes the form

T_i^{−1} = \begin{pmatrix} 5 & -1 & -1 & -1 \\ -1 & 5 & -1 & -1 \\ -1 & -1 & 5 & -1 \\ -1 & -1 & -1 & 5 \end{pmatrix}.   (8)

The density matrix can then be reconstructed linearly as ρ = Σ_{a_1,a_2} P(a_1, a_2) Q_{a_1,a_2}, with operators Q_{a_1,a_2} = Σ_{a'_1,a'_2} T^{−1}_{a_1,a'_1} T^{−1}_{a_2,a'_2} (M_{a'_1} ⊗ M_{a'_2}).

Furthermore, expectation values of general operators O can be rewritten in terms of the probability distribution P(a_1, a_2),

⟨O⟩ = Tr[ρ O] = Σ_{a_1,a_2} Q^O_{a_1,a_2} P(a_1, a_2),   (9)

with Q^O_{a_1,a_2} = Σ_{a'_1,a'_2} Tr[O (M_{a'_1} ⊗ M_{a'_2})] T^{−1}_{a_1,a'_1} T^{−1}_{a_2,a'_2}. This enables an efficient evaluation of expectation values by sampling configurations from P(a_1, a_2) in the POVM representation, where the density matrix does not need to be calculated explicitly [6].

The Bell state is encoded in a sampling spiking network as follows. The visible neurons v are identified with the qubits a in the POVM representation. The network parameters are trained such that the distribution P_B(a_1, a_2) is represented by the network. To achieve this, we need to translate the variables a_1, a_2, which can take four possible values each, into binary neurons v, where each neuron can take the values 0 or 1. The mapping to four binary visible neurons v_1, …, v_4 is accomplished by defining

a_1 = 2v_1 + v_2,  a_2 = 2v_3 + v_4.   (10)

From this we can derive the distribution p*_B(v) over the states of the visible neurons and have all ingredients to encode the Bell state in our spiking network.

Analogously, the probability distribution for the two-qubit Werner state with noise contribution 1 − r can be derived from its density matrix [24, 34],

ρ_W = (1/4) \begin{pmatrix} 1+r & 0 & 0 & 2r \\ 0 & 1-r & 0 & 0 \\ 0 & 0 & 1-r & 0 \\ 2r & 0 & 0 & 1+r \end{pmatrix}.   (11)

The same is true for Greenberger-Horne-Zeilinger (GHZ) states of more than two qubits, described by the density matrices [25]

ρ_GHZ = (1/2) \begin{pmatrix} 1 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 0 & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 1 \end{pmatrix}.   (12)

We can then approximate the corresponding probability distributions by a spiking sampling network.
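A short consistency check of Eq. (7) and the mapping of Eq. (10), as a sketch under the conventions above (the helper names are ours):

```python
import numpy as np

M = [np.array([[1, 0], [0, 0]]) / 2,
     np.array([[1, np.sqrt(2)], [np.sqrt(2), 2]]) / 6,
     np.array([[2, -np.sqrt(2) - 1j*np.sqrt(6)],
               [-np.sqrt(2) + 1j*np.sqrt(6), 4]]) / 12,
     np.array([[2, -np.sqrt(2) + 1j*np.sqrt(6)],
               [-np.sqrt(2) - 1j*np.sqrt(6), 4]]) / 12]

rho_B = np.zeros((4, 4))
rho_B[[0, 0, 3, 3], [0, 3, 0, 3]] = 0.5

# Eq. (7): P_B(a1, a2) = Tr[rho_B M_{a1} x M_{a2}]
P_B = np.array([[np.trace(rho_B @ np.kron(M[a1], M[a2])).real
                 for a2 in range(4)] for a1 in range(4)])
assert np.isclose(P_B[0, 0], 1/8) and np.isclose(P_B[0, 1], 1/24)

# Eq. (10): POVM outcomes (a1, a2) <-> four binary visible neurons
p_star = {}
for k in range(16):                        # enumerate visible configurations
    v = tuple(int(b) for b in np.binary_repr(k, width=4))
    p_star[v] = P_B[2*v[0] + v[1], 2*v[2] + v[3]]
```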
Training algorithm.

Our goal is to approximate a target distribution p*(v) by the model distribution p(v; W) encoded by the spiking neuromorphic hardware. The distance between the two distributions is quantified by the Kullback-Leibler divergence,

D_KL(p* ∥ p) = Σ_{v} p*(v) ln[p*(v)/p(v; W)] = Σ_{v} p*(v) ( ln[p*(v)] − ln[p̃(v; W)] + ln[Z(W)] ).   (13)

Here we assumed that p(v; W) is well described by the marginal of a Boltzmann distribution and introduced the unnormalized probability distribution p̃(v; W) = Σ_{h} exp[−E(v, h; W)] as the exponential of the negative network energy, as well as the partition sum Z(W) = Σ_{v,h} exp[−E(v, h; W)], which allows us to replace p(v; W) = p̃(v; W)/Z(W) in the second line of Eq. (13).

The gradient of the Kullback-Leibler divergence with respect to a general connecting weight W_{i,j} is given by

∂D_KL/∂W_{i,j} = Σ_{v} p*(v) [ −(1/p̃(v; W)) ∂p̃(v; W)/∂W_{i,j} + (1/Z(W)) ∂Z(W)/∂W_{i,j} ]
= −Σ_{v} [p*(v)/p̃(v; W)] Σ_{h} v_i h_j e^{−E(v,h;W)} + (1/Z(W)) Σ_{v',h} v'_i h_j e^{−E(v',h;W)}
= −Σ_{v,h} [p*(v)/p(v; W)] v_i h_j e^{−E(v,h;W)}/Z(W) + Σ_{v',h} v'_i h_j e^{−E(v',h;W)}/Z(W)
= Σ_{v,h} [1 − p*(v)/p(v; W)] v_i h_j exp[−E(v, h; W)]/Z(W)
= ⟨ [1 − p*(v)/p(v; W)] v_i h_j ⟩_{p(v,h;W)}.   (14)

Thus, the weight updates are calculated by drawing a sample set of network states, evaluating the probability p(v; W) underlying the configurations in the set, and calculating the expectation value of the product of the two connected neurons, weighted with 1 − p*(v)/p(v; W). When using the spiking network on the BrainScaleS-2 system, we draw these sample states by observing the network at regular points in time spaced by 2 µs (for a refractory time of about 10 µs, see [20]).

The weight update in training epoch t then reads

W^t_{i,j} = W^{t−1}_{i,j} − η ⟨ [1 − p*(v)/p(v; W)] v_i h_j ⟩_{p(v,h;W)},   (15)

with learning rate η. Analogously, updates for the biases can be derived,

b^t_j = b^{t−1}_j − η ⟨ [1 − p*(v)/p(v; W)] h_j ⟩_{p(v,h;W)},
d^t_i = d^{t−1}_i − η ⟨ [1 − p*(v)/p(v; W)] v_i ⟩_{p(v,h;W)}.   (16)

If connections between the visible neurons exist in the network structure, the updates for those connecting weights are analogous to Eq. (15), where the weighted expectation value of the product of the corresponding visible neurons is evaluated. This learning scheme is a modified version of wake-sleep learning [17].

Since this training algorithm is based on a gradient-descent ansatz, we can apply further modifications which lead to better convergence, such as a momentum approach to avoid getting stuck at a local minimum. In our simulations, we apply the Adam optimizer scheme. This scheme combines a momentum approach with an adaptive learning rate which is chosen for each network parameter individually.
The update for a general network parameter W_k is given, in the Adam optimizer, by

m^t_k = β_1 m^{t−1}_k + (1 − β_1) ∂D_KL(p* ∥ p)/∂W_k,
v^t_k = β_2 v^{t−1}_k + (1 − β_2) [∂D_KL(p* ∥ p)/∂W_k]²,
m̂^t_k = m^t_k/(1 − β_1^t),  v̂^t_k = v^t_k/(1 − β_2^t),
W^t_k = W^{t−1}_k − η m̂^t_k/(√(v̂^t_k) + ε),   (17)

where m_k acts as a momentum and v_k sets the adaptive stepsize. Here we follow the common choice and set the hyper-parameters to β_1 = 0.9, β_2 = 0.999, ε = 10^{−8} [35].

In general, Hebbian training algorithms are based on minimizing the correlation mismatch between data and model distributions. The traditional way for estimating this mismatch is contrastive divergence [17, 36], where the target and model distributions are approximated by a single layer-wise network update (CD-1). This method can only be used to obtain an update of the network parameters for layer-wise connected networks and essentially represents a performance optimization with respect to sampling from the complete distributions. For physical neuromorphic systems, the notion of a "single network update" becomes meaningless, and the performance characteristics make the actual sampling run cheap compared with the start-up cost. We take advantage of this difference by using the full model distribution for calculating updates. We thus reconstruct the target visible distribution from the model distribution by reweighting as described above. The advantage of this strategy is that arbitrary network architectures including partially restricted and deep networks can be used.
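For completeness, the Adam step of Eq. (17) in code, in the standard form of [35] with the hyper-parameters quoted above (the class name and the learning rate value are our own placeholders; the paper does not fix η):

```python
import numpy as np

class Adam:
    """Per-parameter adaptive update, Eq. (17)."""
    def __init__(self, shape, eta=0.01, b1=0.9, b2=0.999, eps=1e-8):
        self.m = np.zeros(shape)         # momentum estimate m_k
        self.v = np.zeros(shape)         # adaptive stepsize estimate v_k
        self.eta, self.b1, self.b2, self.eps, self.t = eta, b1, b2, eps, 0

    def step(self, W, grad):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad**2
        m_hat = self.m / (1 - self.b1**self.t)   # bias correction
        v_hat = self.v / (1 - self.b2**self.t)
        return W - self.eta * m_hat / (np.sqrt(v_hat) + self.eps)
```

In a training loop, `grad` would be the sampled gradient estimate of Eq. (14), for instance the output of the `weight_update` sketch given in the main text.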
Acknowledgments

We are indebted to the late K. Meier, who envisioned and championed the BrainScaleS system and made seminal contributions to this endeavor. We thank A. Kungl for discussions and the Electronic Vision(s) group, in particular E. C. Müller, C. Mauch, Y. Stradmann, P. Spilger, J. Weis, and A. Emmel, for maintaining and providing access to the BrainScaleS-2 system and for technical support. This work is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy EXC 2181/1 - 390900948 (the Heidelberg STRUCTURES Excellence Cluster), by the DFG, project-ID 273811115, SFB 1225 (ISOQUANT), by the European Union 7th and Horizon-2020 Framework Programmes, through the ERC Advanced Grant EntangleGen (Project-ID 694561) and under grant agreements 604102, 720270, 785907, 945539 (HBP), and by the Manfred Stärk Foundation.
Author contributions
S.C. and A.B. carried out the experiment, analysed the data and wrote the paper. S.B., B.C., and A.B. configured the neuromorphic hardware and provided the software interface to access it. S.C., M.G., and T.G. developed the theory and designed the neuromorphic encoding scheme. All authors contributed to interpreting the data and writing the manuscript.

[1] C. S. Thakur, J. L. Molin, G. Cauwenberghs, G. Indiveri, K. Kumar, N. Qiao, J. Schemmel, R. Wang, E. Chicca, J. Olson Hasler, J. Seo, S. Yu, Y. Cao, A. van Schaik, and R. Etienne-Cummings, Front. Neurosci. 12, 891 (2018).
[2] K. Roy, A. Jaiswal, and P. Panda, Nature 575, 607 (2019).
[3] A. F. Kungl, S. Schmitt, J. Klähn, P. Müller, A. Baumbach, D. Dold, A. Kugele, E. Müller, C. Koke, M. Kleider, C. Mauch, O. Breitwieser, L. Leng, N. Gürtler, M. Güttler, D. Husmann, K. Husmann, A. Hartel, V. Karasenko, A. Grübl, J. Schemmel, K. Meier, and M. A. Petrovici, Front. Neurosci. 13, 1201 (2019).
[4] S. Billaudelle, Y. Stradmann, K. Schreiber, B. Cramer, A. Baumbach, D. Dold, J. Göltz, A. F. Kungl, T. C. Wunderlich, A. Hartel, E. Müller, O. Breitwieser, C. Mauch, M. Kleider, A. Grübl, D. Stöckel, C. Pehle, A. Heimbrecht, P. Spilger, G. Kiene, V. Karasenko, W. Senn, M. A. Petrovici, J. Schemmel, and K. Meier, arXiv:1912.12980 [q-bio.NC] (2019).
[5] A. Peres, Quantum Theory: Concepts and Methods (Springer, Dordrecht, 2002).
[6] J. Carrasquilla, G. Torlai, R. G. Melko, and L. Aolita, Nat. Mach. Intell. 1, 155 (2019).
[7] M. A. Petrovici, J. Bill, I. Bytschok, J. Schemmel, and K. Meier, Phys. Rev. E 94, 042312 (2016).
[8] M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain, et al., IEEE Micro 38, 82 (2018).
[9] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, et al., Science 345, 668 (2014).
[10] J. Pei, L. Deng, S. Song, M. Zhao, Y. Zhang, S. Wu, G. Wang, Z. Zou, Z. Wu, W. He, et al., Nature 572, 106 (2019).
[11] F. Cai, J. M. Correll, S. H. Lee, Y. Lim, V. Bothra, Z. Zhang, M. P. Flynn, and W. D. Lu, Nature Electronics 2, 290 (2019).
[12] I. Boybat, M. Le Gallo, S. Nandakumar, T. Moraitis, T. Parnell, T. Tuma, B. Rajendran, Y. Leblebici, A. Sebastian, and E. Eleftheriou, Nature Communications 9, 2514 (2018).
[13] S. Moradi, N. Qiao, F. Stefanini, and G. Indiveri, IEEE Transactions on Biomedical Circuits and Systems 12, 106 (2017).
[14] S. B. Furber, F. Galluppi, S. Temple, and L. A. Plana, Proceedings of the IEEE 102, 652 (2014).
[15] M. A. Petrovici, S. Schmitt, J. Klähn, D. Stöckel, A. Schroeder, G. Bellec, J. Bill, O. Breitwieser, I. Bytschok, A. Grübl, et al., in 2017 IEEE International Symposium on Circuits and Systems (ISCAS) (IEEE, 2017) pp. 1-4.
[16] R. P. Feynman, Int. J. Theor. Phys. 21, 467 (1982).
[17] G. Hinton, P. Dayan, B. Frey, and R. Neal, Science 268, 1158 (1995).
[18] D. Dold, I. Bytschok, A. F. Kungl, A. Baumbach, O. Breitwieser, W. Senn, J. Schemmel, K. Meier, and M. A. Petrovici, Neural Networks 119, 200 (2019).
[19] M. A. Petrovici, Form Versus Function: Theory and Models for Neuronal Substrates (Springer International Publishing, 2016).
[20] See Supplemental Material at [URL will be inserted by publisher] for details on the hardware setup.
[21] J. S. Bell, Speakable and Unspeakable in Quantum Mechanics: Collected Papers on Quantum Philosophy, 2nd ed. (Cambridge University Press, 2004).
[22] J. F. Clauser, M. A. Horne, A. Shimony, and R. A. Holt, Phys. Rev. Lett. 23, 880 (1969).
[23] A. Aspect, P. Grangier, and G. Roger, Phys. Rev. Lett. 47, 460 (1981).
[24] A. Cabello, A. Feito, and A. Lamas-Linares, Phys. Rev. A 72, 052112 (2005).
[25] D. M. Greenberger, M. A. Horne, and A. Zeilinger, "Going beyond Bell's theorem," in Bell's Theorem, Quantum Theory and Conceptions of the Universe, edited by M. Kafatos (Springer Netherlands, Dordrecht, 1989) pp. 69-72.
[26] N. Le Roux and Y. Bengio, Neural Comput. 20, 1631 (2008).
[27] T. Wunderlich, A. F. Kungl, E. Müller, A. Hartel, Y. Stradmann, S. A. Aamir, A. Grübl, A. Heimbrecht, K. Schreiber, D. Stöckel, C. Pehle, S. Billaudelle, G. Kiene, C. Mauch, J. Schemmel, K. Meier, and M. A. Petrovici, Front. Neurosci. 13, 260 (2019).
[28] S. Aaronson, Proc. R. Soc. A 463, 3089 (2007).
[29] G. Carleo and M. Troyer, Science 355, 602 (2017).
[30] R. G. Melko, G. Carleo, J. Carrasquilla, and J. I. Cirac, Nat. Phys. 15, 887 (2019).
[31] C. Wetterich, Nucl. Phys. B 948, 114776 (2019).
[32] S. Czischek, J. M. Pawlowski, T. Gasenzer, and M. Gärttner, Phys. Rev. B 100, 195120 (2019).
[33] L. Kades and J. M. Pawlowski, Phys. Rev. E 101, 063304 (2020).
[34] R. F. Werner, Phys. Rev. A 40, 4277 (1989).
[35] D. P. Kingma and J. Ba, arXiv:1412.6980 [cs.LG] (2014).
[36] G. E. Hinton, Neural Comput. 14, 1771 (2002).
[37] I. Bytschok, D. Dold, J. Schemmel, K. Meier, and M. A. Petrovici, arXiv:1707.01746 (2017).
[38] O. Breitwieser, A. Baumbach, A. Korcsak-Gorzo, J. Klähn, M. Brixner, and M. Petrovici, "sbs: Spike-based sampling (v1.8.2)" (2020).

Supplementary Material
Sampling Experiments on BrainScaleS-2
A network of leaky integrate-and-fire (LIF) neurons can implement a sampling spiking network (SSN) if the neurons are under stochastic noise influence, their membrane time constant is sufficiently small, and the synaptic and refractory time constants roughly match [7]. A system-specific calibration is required to configure the analog core of BrainScaleS-2, shown in Fig. S1a, according to these requirements. For ease of implementation we use a simple routing scheme in which the on-chip network looks like 128 unique sources which can be arbitrarily connected. This allows the association of each of the 128 synapse drivers with one spike source while using the double line to implement signed synapses (cf. Fig. S1b).

The stochastic input spikes are generated via two of the eight on-chip linear shift registers (LSFRs). We assign the spike source IDs 0-63 to the network neurons and split the spike trains from the LSFRs among the IDs 64-127. For networks smaller than 64 neurons, the upper part of (0-63) remains unused. Again simplifying the implementation, we use the first half of the noise IDs (64-95) as excitatory and the second half (96-127) as inhibitory sources (cf. Fig. S1c, lower part). This scheme allows in principle all-to-all connectivity within the network. Choosing to use a layered network structure results in a block structure of the upper part of the synapse array (cf. Fig. S1c).

Each sampling neuron is connected to 5 randomly chosen excitatory and 5 randomly chosen inhibitory noise sources. This introduces correlations between neurons even without synaptic connections, but in general does not hinder training [18, 37]. Synaptic connections on BrainScaleS-2 are 6-bit-valued circuits. The dynamical impact of a single network spike (used to mediate the stochastic response of the receiving sampling unit) onto another neuron is given by its own strength relative to the total strength of the input provided by the background sources. The latter defines the transfer function and thereby the excitability of the neurons (cf. Fig. S1g). Choosing the noise parameters (weight and number of sources) is done such as to attain the competing goals of allowing the network neurons to drive each other significantly while allowing for small weight changes within the 6-bit resolution limit. The particular choice is, in general, problem dependent.

Having chosen the noise parameters, the sampling interface of BrainScaleS-2 becomes a black box that requires a weight (6-bit) matrix and a (10-bit) bias vector and returns a set of spike trains. Neurons are assigned a state of z = 1 at time t if they emitted a spike within their effective refractory period τ_ref^eff prior to t (cf. Fig. 1c in the main text). We determine τ_ref^eff by setting the leak potential of the neurons to its maximum value and measuring the resulting inter-spike intervals (cf. Fig. S1e). The effective refractory time consists of the clamped part, which is digitally driven and therefore does not vary between different neurons, and the drift back to the spiking threshold in the end. Due to the circuit variability (e.g. different membrane time constants) of the analog circuits we see some modest variation in τ_ref^eff (cf. Fig. S1f). Using the measured τ_ref^eff we assign a state every 2 µs and use the set of these states for the evaluation and the update calculation (cf. Methods section of the main text).
Figure S1h demonstrates the correctness of an approximated distribution for a simulated sampling spiking network (using [38]) as a function of the number of samples for different state-assignment times dt (cf. Fig. 1 in the main text). For more than two samples per refractory period τ_ref, the number of samples required to achieve a given performance level increases due to the correlated states, as expected from the Nyquist-Shannon theorem. Both the noise parameters and the sample frequency were chosen such that they enable sufficiently accurate sampling, but without performing an exhaustive optimization.

As discussed above, a chip-specific calibration is required but can be reused for each training. For each experiment the chip needs to be initialized (blue period in Fig. S1d) once. This ensures that the correct calibration is loaded and the routing is configured correctly before the training iterations (orange period in Fig. S1d) can start. After the initialization only the synapse array (weights) and the leak potentials of the neurons (biases) are reconfigured once per epoch (green period in Fig. S1d). Each training epoch consists of 26 sampling runs (red section in Fig. S1d) and a single calculation of the parameter update (purple in Fig. S1d). In each hardware run we build a program for the FPGA to execute (dark red in Fig. S1d), transfer it to the FPGA with some initial buffering (yellow in Fig. S1d) in order to compensate for network latencies, perform the actual execution on chip (light blue in Fig. S1d), and transfer the spikes back to the host computer (grey in Fig. S1d). In total, an epoch takes about 1 s.
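The convergence check of Fig. S1h can be reproduced in simulation by estimating the sampled distribution from growing sample sets and evaluating Eq. (2) against the target. This is a generic sketch of our own, independent of the hardware; the function names are hypothetical.

```python
import numpy as np

def dkl(p_target, p_est, eps=1e-12):
    """Kullback-Leibler divergence D_KL(p* || p), cf. Eq. (2)."""
    return float(np.sum(p_target * np.log(p_target / (p_est + eps))))

def dkl_vs_samples(p_target, sample_counts, seed=0):
    """D_KL of histogram estimates versus number of independent samples.

    Drawing from p_target itself gives the exact-sampling baseline
    (dashed line in Fig. 3b); correlated samples from oversampling
    (dt < tau_ref / 2) would flatten out above this curve.
    """
    rng = np.random.default_rng(seed)
    out = []
    for n in sample_counts:
        draws = rng.choice(len(p_target), size=n, p=p_target)
        hist = np.bincount(draws, minlength=len(p_target)) / n
        out.append(dkl(p_target, hist))
    return out

curve = dkl_vs_samples(np.full(16, 1/16), [10**3, 10**4, 10**5])
```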
[Fig. S1 panels show the routing schematic (synapse drivers, synapses and neurons with signed connections from LSFR spike sources), the logical connectivity matrix (target ID versus source ID for network, excitatory-noise and inhibitory-noise connections), the time usage per training experiment (chip initialization, epochs, hardware runs and update calculation), an exemplary membrane trace, the histogram of measured τ_ref^eff, the neuronal activation functions versus hardware bias, and the sampling performance (D_KL versus number of samples) for different observation intervals dt.]

FIG. S1.
Details of the BrainScaleS-2 neuromorphic chip. a, Photograph of the BrainScaleS-2 chip with circuits of 4 × 128 AdEx-LIF neurons (green), 2 × 2 × 128 synapse drivers (white) and 4 synapse arrays with 256 × 128 synapses (yellow). b, Routing schematic used to implement the sampling spiking network. Each synapse driver projects to two synapse rows in order to allow signed synapses. c, Utilized logical connectivity matrix projecting onto the 64 neurons used. Network (neuron-to-neuron) connections are truncated at index 24 (4 visible and 20 hidden) and intra-layer connections are not used. Each neuron receives noise input from 5 excitatory (64-95) and 5 inhibitory (96-127) sources, generated by one on-chip LSFR each. Each connection selects the appropriate synapse row depending on its sign (cf. b). d, Time usage across a training experiment. The initial configuration (blue) of the chip is comparable to a single epoch (orange). Each epoch consists of a parameter update (green), 26 sampling runs (red) and the update calculation (purple). Each hardware run consists of the construction of the playback program (ruby), the initial buffering on the FPGA (brown), the actual chip runtime (turquoise) and the readout to the host (grey). e, Membrane trace of an exemplary neuron at the high-bias end. τ_ref^eff is the inter-spike interval. f, Histogram of measured τ_ref^eff. Variations are due to the analog nature of the system. g, Activation functions of the 64 neurons used, as a function of the leak potential under noise input. τ_ref^eff is estimated from the output frequency at the high-bias end. h, Sampling performance as a function of the number of samples, rather than execution time, for different sampling time deltas dt. More than two samples per refractory time τ_ref^eff ≈ 10 µs increase the Kullback-Leibler divergence as the samples are not independent.