Nanoscale neural network using non-linear spin-wave interference
Ádám Papp, Wolfgang Porod, and György Csaba
Pázmány Péter Catholic University, Faculty of Information Technology and Bionics, Budapest, Hungary
Center for Nano Science and Technology, University of Notre Dame (NDnano), Notre Dame, IN, USA
*[email protected]

ABSTRACT
We demonstrate the design of a neural network where all neuromorphic computing functions, including signal routing and nonlinear activation, are performed by spin-wave propagation and interference. Weights and interconnections of the network are realized by a magnetic field pattern that is applied on the spin-wave propagating substrate and scatters the spin waves. The interference of the scattered waves creates a mapping between the wave sources and detectors. Training the neural network is equivalent to finding the field pattern that realizes the desired input-output mapping. A custom-built micromagnetic solver, based on the Pytorch machine learning framework, is used to inverse-design the scatterer. We show that the behavior of spin waves transitions from linear to nonlinear interference at high intensities and that their computational power greatly increases in the nonlinear regime. We envision small-scale, compact and low-power neural networks that perform their entire function in the spin-wave domain.
The interest in neuromorphic computing hardware has skyrocketed in recent years, for two main reasons. It was realized long ago that digital systems (be they CPUs or GPUs) are rather inefficient for such inherently analog tasks. A more recent development is that traditional, MOS-transistor-based devices turned out to have strong staying power for Boolean, digital logic, which has driven the research of emerging nanoelectronic devices towards neuromorphic, analog problems. These are the application areas where emerging devices have the potential to show substantial benefits over MOS switches.

A central challenge of the research on neuromorphic devices is that most computing models require highly interconnected systems, i.e. artificial neurons with a large number of connections, often all-to-all connections. Stand-alone neuronal units have little utility: there should always be an effective way to interconnect those devices into computing systems. This is where wave-based computing concepts show their strength: if the computing device is realized in a wave-propagating substrate, then interference patterns realize an all-to-all interconnection between points of this substrate. The power of wave-based computing has long been harnessed in optical computing, and the high interconnectivity is a major selling point for most optical (holographic, interference-based) devices. It is, however, clear that while linear interference is excellent for high interconnection, its computing power is fairly limited. Linear interference is sufficient only for signal processing tasks: general-purpose computing and all variants of neuromorphic computing require some sort of nonlinearity. In optical computing, implementing nonlinearities requires high optical intensities, and nonlinearities are often implemented separately from the linear scatterer that provides the interconnections. Other types of waves may implement nonlinear functions in a more natural way.
In the present paper we show that spin waves provide both the high interconnectivity and the nonlinearities required for neuromorphic computing.

Spin waves (also referred to as magnons) are wave-like, collective excitations of a spin ensemble. Here we restrict ourselves to spin waves propagating in ferro- and ferrimagnetic thin films. Spin-wave behavior is approximately linear at low amplitudes, but nonlinearities become significant at moderate intensities. Unlike photons, magnons interact with each other, which is a requirement for non-trivial computation. Spin waves show many similarities to electromagnetic waves and preserve many benefits of optics; for example, they can maintain a long coherence length even at room temperature. Spin waves exist down to sub-100 nm wavelengths at microwave frequencies, and they are suitable for integration with electronic components.

High connectivity and built-in nonlinearity make spin waves an ideal choice for neuromorphic computing, in theory. However, in order to actually use spin waves for useful computing tasks, an inverse problem must be solved: one must find a scatterer configuration that yields a certain input/output relation via the formation of an interference pattern. This is in general a daunting task due to the complexity of nonlinear wave propagation.

Very recently, Hughes et al. presented a theoretical framework for implementing a recurrent neural network (RNN) in a medium described by a nonlinear wave equation. Specifically, it was shown that if a substrate is described by the nonlinear wave equation and this substrate is excited and probed at given points, then the equations that give the wave dynamics between the prescribed points map to an RNN. In their work the nonlinearity of the medium is modeled by a spatially varying and intensity-dependent wave propagation speed.
Training of the neural network is implemented by adjusting the spatially dependent wave propagation speed via gradient-based computational learning.

The work of Hughes et al. is an original, fresh approach to wave-based computing, but it leaves crucial questions unanswered. It is admitted that numerical simulations with the computational learning machine do not fully support the premise of the paper, as the RNN-equivalent nonlinear structure shows similar performance to what is achievable by linear propagation. Thus, it is not proved that the presented structure can indeed exploit nonlinear waves to achieve better performance in problems beyond linear signal processing. Furthermore, it is not elaborated how the form of nonlinearity assumed in the paper can be realized by a physical system, although a few hints are provided for optical implementations. In the present paper, we use the work of Hughes et al. as a starting point, but we study an experimentally realizable magnetic system and model it with full micromagnetic simulations that can precisely describe experimental scenarios. We employ a specific physical system and program it to perform true neuromorphic functions. The device is a magnetic thin film with a spatially non-uniform magnetic field acting on it. A custom micromagnetic solver based on a machine-learning framework, Pytorch, is used to design a magnetic field distribution that steers (scatters) spin waves to achieve the desired function. We named our micromagnetic design engine Spintorch.

For small-amplitude excitations, Spintorch solves an inverse problem for the linear wave equation: it designs a magnetic field distribution that performs a desired linear operation (such as matrix multiplication, convolution, pattern matching, spectral analysis, or matched filtering), as we will show in Sec. 2.
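The structure of such a gradient-based inverse-design loop can be illustrated with a toy one-dimensional model. The sketch below is not the Spintorch engine: wave propagation is reduced to two hypothetical paths whose accumulated phase depends linearly on a trainable "field" profile, and a finite-difference gradient stands in for the automatic differentiation that Pytorch provides in the real solver. All names and coefficients are illustrative.

```python
import numpy as np

# Toy inverse design: a wave travels along two hypothetical paths, each
# accumulating a phase that depends linearly on a trainable field profile h,
# and the two paths interfere at a detector. We ascend the detector
# intensity by gradient steps; finite differences stand in for autograd.

def detector_intensity(h):
    # Illustrative dispersion: the local field shifts the accumulated phase.
    phase_a = np.sum(1.0 + 0.5 * h[:5])   # phase along path A
    phase_b = np.sum(1.0 + 0.5 * h[5:])   # phase along path B
    # Interference of two unit-amplitude waves at the output point.
    return abs(np.exp(1j * phase_a) + np.exp(1j * phase_b)) ** 2

def train(h, lr=0.1, eps=1e-6, epochs=200):
    for _ in range(epochs):
        base = detector_intensity(h)
        grad = np.zeros_like(h)
        for j in range(h.size):            # finite-difference gradient
            hp = h.copy()
            hp[j] += eps
            grad[j] = (detector_intensity(hp) - base) / eps
        h = h + lr * grad                  # gradient ascent on intensity
    return h

rng = np.random.default_rng(0)
h0 = rng.normal(size=10)                   # random initial field profile
h_opt = train(h0)
# the trained profile approaches the constructive-interference maximum of 4
```

In Spintorch the same logic applies, except that the forward pass is a full micromagnetic simulation and the gradient is obtained by backpropagation through the simulated time evolution.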
The algorithm has great utility already in this regime, as it automates the design of spin-wave-based RF signal processors.

Higher-amplitude spin waves, with a precession angle above a few degrees, show nonlinear behavior, and Spintorch – the exact same computational learning engine – can be used to design a nonlinear interference device. This device is functionally equivalent to the RNN of Hughes et al., and in Sec. 2.3 we will show how the introduction of nonlinearity increases the computational ability of the device. The spin-wave scatterer becomes a true neural network, exploiting nonlinearity to exceed the performance of linear classifiers.

We note that our manuscript is submitted simultaneously with a parallel work in which the authors use inverse-design magnonics to create arbitrary linear, nonlinear and nonreciprocal devices.

A spin-wave scatterer is a magnetic thin film with a spatially non-uniform magnetic field acting on it: this magnetic field distribution locally changes the dispersion relation of the wave and scatters (steers) the spin waves, creating an interference pattern. For the sake of concreteness we assumed that the wave source is a microwave coplanar waveguide (CPW). The output of the spin-wave scatterer is the spin-wave intensity at particular areas, which experimentally could be picked up via antennas on the film surface.
Figure 1.
Nanomagnet-based spin-wave scatterer. a) The schematics of the envisioned computing device. The input signal is applied on the coplanar waveguide (CPW) on the left, and the magnetic state (up/down) of programming magnets on top of the YIG film defines the weights. b) The PMA magnets on top of YIG generate a bias-field landscape. The training algorithm finds the binary state of the programming magnets. c) Spin-wave intensity pattern for a particular applied input, which results in a high intensity at o₁. The size of the simulation area is 10 µm by 10 µm.

In order to design experimentally realizable field distributions, a specific geometry of field-generating nanomagnets was assumed, as sketched in Fig. 1a. The punchcard-like pattern of up/down-pointing nanomagnets sits on top of a low-damping substrate (such as yttrium iron garnet, YIG) and acts as the program for the spin-wave scatterer. The programming nanomagnets are assumed to exhibit strong perpendicular magnetic anisotropy (PMA), and their magnetization is not influenced by the spin waves propagating in the layer underneath – see the Supplementary for details on the material system. The physical system is straightforwardly realizable; in fact, it is rather similar to the scenarios used in recent experiments. For some simulations we used a more fine-grained field distribution; see the Supplementary for details.

Spintorch inverse-designs the up/down configuration of the programming magnets in order to realize particular output intensity patterns as a response to an input temporal waveform. The code uses the same gradient-based algorithm that is implemented in Wavetorch, but a GPU-based, custom-built full micromagnetic solver is used to model spin-wave propagation, as described in the Supplementary section. Instead of using a (nonlinear) wave equation for modeling wave propagation, we solve for the underlying physics by discretizing the modeled region into 25 nm × 25 nm × 25 nm sized volumes and solving the Landau-Lifshitz-Gilbert equation to calculate the precession of magnetic moments in these computational regions. Most importantly, our micromagnetic solver fully accounts for the demagnetizing field, and thus the change in magnetic field due to the magnetization precession, which is the source of nonlinearity in spin-wave propagation. The micromagnetic solver is fully integrated within the computational engine, which performs gradient-based optimization of the trainable parameters, finding the optimal up/down magnet configuration.

Perhaps the simplest example of inverse design is that of a spectrum analyzer, where the design objective is to focus different spectral components (frequencies) to different spatial locations of the scatterer. In our example we used a 10 µm × 10 µm scatterer to separate the 3 GHz, 3.5 GHz and 4 GHz components of the time-domain signal applied on the waveguide. The outputs are 300 nm diameter areas, and the time-integrated wave intensity over these areas is defined as the output variable. The computational learning engine converges to a high-quality design in about 30 training epochs. Here we used small-amplitude spin waves for the training: for precession angles not exceeding a few degrees (excitation fields in the mT range), the computational learning algorithm finds the same solution regardless of the amplitude. The snapshots of Fig. 2 show the spin-wave intensity for the three frequencies and show that the device performs the required function. The punch-card program that is found by the learning engine is non-intuitive and does not resemble spectrum analyzer designs that were constructed from optical analogies. The field pattern, however, makes a similar impression to refractive-index patterns in photonic metamaterial devices. The converged scattering pattern also depends on the initial conditions that are given to the computational learning engine.
The designs, however, all appear to be robust: we verified that switching errors in the magnet states (which are unavoidable in an experimentally realized device) do not affect the performance significantly in most cases.

Figure 2.
Frequency separation by training. a-c) The scatterer was trained to direct frequency components f₁ = 3 GHz, f₂ = 3.5 GHz, and f₃ = 4 GHz to the corresponding outputs denoted by o₁, o₂, o₃. The bar charts indicate time-integrated intensities measured at the outputs (green circles). The colormaps show the time-integrated intensity of spin waves at t = 30 ns. Black/white circles are contours of the out-of-plane component of the magnetic field, indicating the state of the magnets on top of the YIG film (same in all cases a-c). The size of the simulation area is 10 µm by 10 µm.

We would like to point out that the selectivity of the spectrum analyzer design is limited by the relatively small degrees of freedom provided by the approximately 300 binary values (i.e. the magnets). To scale up the simulations to include more magnets, significantly more computing resources would be needed. Instead, in the following examples we used external magnetic field values as training parameters directly, without simulating magnets on top. This increases the degrees of freedom to approximately 1600 continuous variables, resulting in much better performance at the same computational expense.

The automated design of linear signal processors alone is an important result and opens many potential applications for spin-wave-based devices. Just as photonic metamaterials have a much smaller footprint than classical optics devices (such as a 4f correlator), the above-designed scatterer (spin-wave metamaterial) has the same advantages over designs based on classical optics.

For the computational engine it makes no difference whether the scatterer needs to focus 'pure' frequencies to the output points or it has to identify a certain spectral pattern. We tested this by running a vowel recognition example using the vowel samples available in the Wavetorch package.
The waveforms of the vowels were scaled up to microwave frequencies, in such a way that the frequency components with significant energy content on the input waveguide launch propagating waves with wavelengths compatible with the scatterer. The scatterer structure was trained to maximize the spin-wave intensity at one of the three output points, which correspond to the recognized vowels. We used two or four samples of each vowel as a training set. The rest of the 44 samples for each vowel were used as a test set.

Some results on the training samples can be seen in Fig. 3a. In 30 training epochs the system was able to learn to distinguish the vowels 'ae', 'ei', and 'iy', directing the waves toward the correct outputs.

For comparison, we repeated the simulations with increased excitation fields (nonlinear regime, see Fig. 3b). On the training dataset the difference is not very significant, although in Fig. 3c it is clearly visible that the nonlinear operation achieved better performance and also that the convergence is faster. The quality of the vowel recognition operation is also compared using confusion matrices for the testing dataset. For three vowels these are 3 × 3 matrices; the cᵢⱼ matrix elements give the percentage of cases where vowel i is identified for vowel j as input. For perfect recognition the confusion matrix is diagonal, with 100% at the cᵢᵢ elements.
Figure 3. Using the spin-wave scatterer for vowel recognition. a-b) Wave intensity patterns formed in response to the time-domain excitations (vowels). The scatterer was trained to focus waves to the corresponding outputs. The bar charts show the intensity at the output locations (normalized). The linear regime a (1 mT excitation field) and the nonlinear regime b (50 mT excitation field) perform comparably well on the training data (with a slight improvement in the case of nonlinear waves). c) The cross-entropy loss decreases during the training, indicating learning. After 30 epochs (training steps), the nonlinear cases achieve better performance compared to the linear case. Note that a nonzero loss value corresponds to the perfect response, indicated by the solid line. d-e) Confusion matrices over the testing data set (120 vowel samples).

Confusion matrices on the training data set – not shown here – were perfectly diagonal for all amplitudes, which is not surprising with these small training sets and the comparably large internal complexity of the scatterer. But a striking difference appears between the linear and nonlinear performance when the larger, 44-element data set is used for testing. Figure 3d shows the confusion matrices for this test scenario. The confusion matrices are significantly more diagonal in the nonlinear case. The linear device performed significantly worse in learning to recognize unseen vowels. The nonlinear device, however, performed relatively well even when trained on only two samples (not shown here), and its accuracy further improved when using four samples for training. In the latter case it misidentified only 7 out of 120 vowels.

The confusion matrices in this test scenario characterize the generalization (extrapolation) ability of the network. Based on a very small (2- or 4-vowel waveform) learning set, the network had to recognize and classify vowels that it had not seen before.
Linear scatterers cannot excel in this job: they match the distinctive spectral features of learned samples, but their ability to generalize from learned data is very limited. The nonlinear scatterer appears to behave as a true neural network, which performs nonlinear classification and generalizes (extrapolates) from the training data. We believe that our simulation data may also verify the hypothesis of Hughes et al. that nonlinear wave interference acts as an RNN.

Performing successful vowel recognition does not necessarily require a neural network, and satisfactory results can be obtained by linear classifiers, as shown in the above example. Using the vowel recognition example, it is not at all straightforward to identify what benefits could possibly come from a neural-network-like behavior.

However, a fairly simple example can show the computational limitations of linear interference, where the superposition principle always holds. In the following example (see Fig. 4) the training goals were to

(A) focus waves on output o₁ at 3 GHz input frequency,
(B) focus waves on output o₂ at 4 GHz input frequency, and
(C) focus waves on output o₃ if and only if 3 GHz and 4 GHz are simultaneously present.

Clearly, condition (C) is inconsistent with the superposition of (A) and (B).

We used excitation amplitudes in the linear (1 mT) and soft-nonlinear (20 mT and 50 mT) regimes and ran the training for 30 epochs in every case. The resulting spin-wave intensity snapshots are shown in Fig. 4. It is clearly visible that the results of the training are different for different amplitudes. The paths traveled by the waves are completely different in the three cases.
It is also clear from the snapshots that the linear case failed to focus on o₃, while the nonlinear cases were clearly focusing on the bottom output (o₃) and avoiding the other outputs.

As expected, the linear case could not provide the desired outcome: the output of the two-frequency case is a linear combination of the outputs observed with single-frequency excitations. On the contrary, the operation in the nonlinear regime achieved good results, with the highest-amplitude excitation giving the best outcome. Quantitatively, the loss function, which quantifies the quality of the computational learning, yields the same conclusion: the linear case did not show convergence over the 30 epochs, while the nonlinear cases converged to an optimal loss value. The highest excitation amplitude achieved a lower loss at the end of the training, and its convergence was also faster.

This elementary example demonstrates a crucial difference between the computational power of linear and nonlinear spin waves – and this difference is expected to manifest itself in more complex operations, such as the high-amplitude vowel recognition example in Sec. 2.2. It also serves as a proof that our computation engine is able to exploit the nonlinearity of spin waves.

Nowadays complex neuromorphic computing pipelines are implemented on CPUs and GPUs, which possess extreme computing power but have poor energy efficiency for analog neuromorphic tasks: computing steps are implemented in digital (often floating-point) arithmetic, and each of those steps consumes energy in the E = − J range. A typical computing primitive (such as a single convolution with smaller precision in a convolutional neural network) consumes roughly the same amount.

Spin waves in nanoscale magnetic structures carry very little energy: the total magnetic energy stored in the patterns of, say, Fig. 4c is about E ≈ − J. Patterns in the linear regime hold orders of magnitude less energy, in the few-eV range.
The stored energy can serve as a first estimate of the energy that is dissipated in the magnetic domain in each neural computation step. The time it takes for the interference pattern to build up is on the order of t = 10 ns – so the spin-wave scatterer simultaneously achieves low power and high speed. These are stellar numbers when compared to the above-mentioned energy of an approximate convolution or floating-point operation, indicating great potential for the spin-wave-based processor.

Figure 4. A simple example of a problem that is not solvable by a linear system. The input is encoded in two frequencies (f₁ = 3 GHz and f₂ = 4 GHz), and the training function is listed in the inset tables (expected results indicated by numbers, output data shown in color). a) In the linear case (1 mT excitation field) the application of simultaneous frequencies results in both o₁ and o₂ high (incorrect training). b-c) In the nonlinear cases the wave is focused at o₃, while the other outputs are avoided (correct operation). In the case of 50 mT excitation the distinction is even stronger. The colormap shows integrated wave intensity. The size of the simulation area is 10 µm by 10 µm.

Neurons and synapses based on other emerging devices also consume significantly more energy than the stated E ≈ − J, and they do so at slower computational speeds. The neural operation done by the scattering block in the nonlinear regime is also considerably more complex than a convolution or what is performed by such synapses and neurons.

The energy dissipated in the magnetic domain is just a lower bound for the consumption: the spin-wave scatterer is most likely used as a hardware accelerator in electrical circuitry – and in that case the net energy efficiency of the spin-wave-based computing block is dominated by the magneto-electric transducers. More specifically, picking up magnetic oscillations from sub-square-micrometer areas will induce less than a microvolt in the transducer antenna, possibly even less than that. Amplifying such small and high-frequency signals requires significant microwave circuitry, which consumes at least 10 mW of power. Assuming a GHz data rate, this gives E = − J per output point.
Transduction on the input side (creation of spin waves) is less of a problem, as it can be done with acceptable efficiency using coplanar waveguides, and a single waveguide can excite a larger number of scatterers.

The net power efficiency of the spin-wave scatterer is comparable to that of an electronic implementation for a simple operation (i.e. a convolution). If a large internal complexity can be reached in the scatterer with a single or very few inputs, then the spin-wave scatterer potentially leads to several orders of magnitude performance gain compared to electronic implementations.

It is worth noting that optical reservoir computing – another promising hardware for accelerating neural computations – consumes on the order of E = − to E = − J, which is comparable to a small spin-wave scatterer with I/O, with a significantly larger device footprint. Strictly linear operations in optics may be performed with much higher energy efficiency (due to the more straightforward scalability of optical systems), but such systems require several additional components for general-purpose computation.

Spin waves are a leading candidate for non-electrical information processing, and magnonic devices have been designed for many different purposes, such as Boolean logic gates and signal processors. In many cases, magnonic computers are derived from photonic computing devices, and most often classical photonics is used as an inspiration, with lenses, mirrors, and interferometers designed in the spin-wave domain.

Our work advances the state of the art of magnonic computing devices on two fronts. Firstly, we demonstrated that the computational tools developed for the inverse design of photonic metamaterials (a.k.a. photonic inverse design) can be applied in the spin-wave domain: convolvers, spectrum analyzers, matched filters and possibly a large variety of RF signal processing devices can be designed in a fully automatic way.
Spin waves, unlike electromagnetic waves, seamlessly transition to a nonlinear regime at higher excitation amplitudes. Apparently, the computational design algorithm operates just as well if the underlying wave propagation is nonlinear, and it designs devices based on nonlinear interference.

The second and perhaps most important result of our work is that the capabilities of such-designed nonlinear interference devices go way beyond linear signal processing; they are likely equivalent to recurrent neural networks. The device realizes all the interconnections, weighted sums and nonlinearities in a single magnetic film.

Wave-based general-purpose computing – and more generally, computing in a material substrate by the laws of physics – is a longtime dream of the emerging-computing community. Possibly, spin-wave-based nonlinear processors bring us closer to the fulfillment of this vision.

Acknowledgements
The authors are grateful for fruitful discussions and encouragement from Markus Becherer (TUM), Andrii Chumak, Qi Wang (University of Vienna) and Philipp Pirro (TU Kaiserslautern). We are also grateful to the team of the DARPA NAC (Nature as Computer) program, especially to Dr. Jiangying Zhou (DARPA), for her professional project management and expert advice in driving the work. This research was in part financially supported by the DARPA NAC program. Ádám Papp received funding from the postdoctoral grant (PPD-2019) of the Hungarian Academy of Sciences.
Author contributions statement
G.C. and W.P. conceived the original idea; Á.P. designed the computational engine, performed the micromagnetic simulations and explored the role of nonlinearities. Á.P., W.P. and G.C. wrote the manuscript. All authors discussed the results and reviewed the manuscript.
Additional information
Competing financial interests:
The authors declare no competing financial interests.
Supplementary information

Spintorch is a modified version of Wavetorch, in which we implemented a full micromagnetic solver to precisely model spin-wave behavior. The numerical engine for inverse design is built on the popular and open-source machine learning framework Pytorch. An important feature of Pytorch is the automatic gradient calculation, which allows automatic backpropagation throughout complicated multilayered computational flows. In our system this means that gradient calculation can be performed backwards in time throughout the whole wave propagation. This allows us to perform gradient-based optimization of the trainable parameters, e.g. the applied magnetic-field distribution. Pytorch also provides a number of optimizers, loss functions, and data loading modules, so we did not need to implement these from scratch. Pytorch modules can run on a CPU or GPU (using CUDA), without any device-specific coding on the user side.

In order to exploit the automatic gradient calculation feature of Pytorch, custom modules must use the internal methods for implementing the forward path of the system. This way the backward method is automatically generated on the fly by building a computational graph and saving the required intermediate results. Thus, readily available micromagnetic solvers (such as OOMMF or mumax3) cannot be integrated in Pytorch, because these do not build a computational graph and do not save intermediate results for backpropagation.

The dynamics of the media are described by the Landau-Lifshitz-Gilbert (LLG) equation, which takes into account all relevant physical interactions in the micromagnetic model. Elementary magnetic moments are represented by three-dimensional vectors, and we use a finite-difference discretization with a rectangular grid. The dynamics of magnetic moments depend on the torque exerted on them by the effective magnetic field, which is a sum of several field components.
Most importantly, it includes a (space- and time-dependent) external field, the dipole fields of other magnetic moments, and the exchange interaction between neighboring electrons. The dipole interaction is a long-range effect; thus, it is the most computationally expensive part of the calculation. We used FFT-based acceleration for calculating the solution of the Poisson equation (i.e. determining the dipole fields), for which the GPU-accelerated FFT module of Pytorch enabled an effective implementation. The exchange field is calculated only between nearest neighbors, as the exchange field is local. The exchange-field calculation is implemented using a convolution with a Laplacian kernel. The time stepping of the differential equation is realized by a classical 4th-order Runge-Kutta method. The LLG equation also includes a damping term, which is also implemented in our code; thus, realistic attenuation of spin waves is simulated. Damping is also used to realize absorbing boundary conditions, so we could accurately model a few-micron-sized, finite region of an extended magnetic film. The micromagnetic model fully accounts for the nonlinearities appearing at higher intensities: these nonlinearities are a direct consequence of the dependence of the demagnetizing field on the local magnetization and the spin-wave amplitude.

We verified our solver by comparing results with the widely used mumax3 solver. The high-level use of GPU-based functions and the overhead of automatic gradient computation make our code less efficient as a general-purpose micromagnetic solver, but still, running times are comparably fast, and more than 100,000 cells with a few thousand timesteps can be simulated in minutes on a state-of-the-art GPU. This makes it possible to embed the solver into the learning algorithm and train the system with multiple samples and epochs within a few hours or, with a larger training set, days.
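The core of the time stepping described above can be sketched for a single macrospin. The snippet below is a minimal illustration, not the Spintorch solver: it integrates the LLG equation for one moment in a constant effective field with classical 4th-order Runge-Kutta, omitting exchange, dipole and boundary terms; the damping constant and field value are illustrative.

```python
import numpy as np

# Minimal macrospin LLG integrator with classical RK4 time stepping.
# One unit moment m in a constant effective field; with damping on,
# m precesses and relaxes toward the field direction.

GAMMA = 1.76e11   # gyromagnetic ratio (rad/(s*T))
ALPHA = 0.2       # Gilbert damping (large here, to shorten the demo)

def llg_rhs(m, h_eff):
    # Landau-Lifshitz-Gilbert equation in explicit (Landau-Lifshitz) form:
    # dm/dt = -gamma/(1+alpha^2) * [ m x H + alpha * m x (m x H) ]
    pre = -GAMMA / (1.0 + ALPHA**2)
    mxh = np.cross(m, h_eff)
    return pre * (mxh + ALPHA * np.cross(m, mxh))

def rk4_step(m, h_eff, dt):
    k1 = llg_rhs(m, h_eff)
    k2 = llg_rhs(m + 0.5 * dt * k1, h_eff)
    k3 = llg_rhs(m + 0.5 * dt * k2, h_eff)
    k4 = llg_rhs(m + dt * k3, h_eff)
    m = m + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return m / np.linalg.norm(m)   # keep |m| = 1

h = np.array([0.0, 0.0, 0.1])      # 0.1 T effective field along +z
m = np.array([1.0, 0.0, 0.0])      # moment starts in-plane
dt = 1e-12                         # 1 ps time step
for _ in range(10000):             # 10 ns of damped precession
    m = rk4_step(m, h, dt)
# m has relaxed close to the field direction (+z)
```

In the full solver the same RK4 update runs over every cell of the discretized film, with the effective field assembled from the external, exchange and FFT-computed dipole contributions at each substep.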
YIG is used as a medium for low-damping spin-wave propagation, and arrays of nanomagnets with perpendicular magnetic anisotropy (PMA) provide control over the spin waves via their dipole fields. PMA magnets are bistable (magnetization pointing either upwards or downwards) if their size is below the single-domain limit (typically less than a few hundred nanometers).

Such a system could provide a reconfigurable means of programming spin-wave-based neural networks, by individual switching of the nanomagnets. This implementation of the scatterer shows many benefits over a lithographically patterned (hardwired) scatterer. In our model we included the calculation of realistic dipole fields of the nanomagnet arrays, which works for any configuration.

The chosen material system and geometry is just one of many possible choices. Metallic ferromagnets could have been used in place of the YIG film: these have higher damping (shorter propagation length) but are easier to integrate and access electrically. Also, instead of the stray-field programming, lithographically defined patterns (lithography followed by etching) could have defined the function of the scatterer. Fine-grained tuning of YIG magnetic properties can be achieved by FIB irradiation of a YIG film, which continuously changes the magnetic parameters as a function of the local dose. We expect that our computational engine can be used with similar effectiveness when film magnetic parameters are adjusted by training, instead of designing the applied field pattern as we have done in this work.
Here we show how a spin-wave scatterer block can represent a single layer of a neural network (a perceptron layer). A perceptron layer can be described mathematically as a linear transformation (vector-matrix multiplication) followed by a nonlinear activation function: y = σ(Wx), where x is a vector of length n representing the input, W is an m × n matrix that contains the trainable weights of the layer, and σ is an activation function applied to every output channel (in the simplest case a threshold function). Functionally, a single perceptron performs a linear classification, so a layer of perceptrons performs m different linear classifications. The linear transformation (W) can be performed by a spin-wave scatterer block, as depicted in Fig. 5. If the amplitude of the spin waves is sufficiently small, the wave propagation can be described by the linear wave equation. Input signals routed to input antennas generate spin waves with corresponding amplitudes and phases. The waves travel through a region where the effective refractive index varies spatially according to the program (the desired linear transformation); the scatterer map distributes the wave intensity from every input among the outputs (some losses may also occur). Since the wave propagation is assumed to be linear, the activation function has to be implemented in the readout circuitry.

Figure 5. a) Perceptron layer b) Spin-wave scatterer
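The perceptron mapping y = σ(Wx), together with the superposition-principle construction of the equivalent matrix (exciting the inputs one at a time with unit amplitude and recording the outputs as columns), can be sketched in a few lines of Python. The function names and the 2 × 2 example weights are illustrative only:

```python
def perceptron_layer(W, x, sigma):
    # y = sigma(W x): vector-matrix multiplication followed by an
    # elementwise activation applied to every output channel
    return [sigma(sum(w * xj for w, xj in zip(row, x))) for row in W]

def recover_matrix(linear_map, n):
    # Superposition principle: excite each input with a unit basis vector;
    # the recorded outputs are the columns of the equivalent matrix.
    columns = [linear_map([1.0 if j == i else 0.0 for j in range(n)])
               for i in range(n)]
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

step = lambda v: 1.0 if v > 0.0 else 0.0   # simplest activation: threshold
W = [[0.6, -0.4], [-0.3, 0.8]]             # example 2x2 weight matrix

# The linear part alone (sigma = identity) plays the role of the scatterer:
linear = lambda x: perceptron_layer(W, x, lambda v: v)
print(recover_matrix(linear, 2))                # reproduces W column by column
print(perceptron_layer(W, [1.0, 0.5], step))    # full layer with threshold
```

As in the device, the nonlinearity here sits strictly after the linear map; `recover_matrix` works only because `linear` obeys superposition, which is exactly what breaks down in the nonlinear spin-wave regime discussed below.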
The matrix representation of a given scatterer map can be constructed using the superposition principle: exciting the inputs one at a time with unit amplitude (the basis vectors) and recording the outputs yields the columns of the equivalent matrix. The inverse problem is, however, more cumbersome to solve in general. One possible approach is the machine-learning method described by Hughes et al., which is directly applicable to any system that obeys the linear wave equation and can be modified for nonlinear equations. Such a device, apart from dynamic-range and scaling limitations, can in principle realize any perceptron layer. But the computing capabilities of a single layer are limited to linear classification, and even some relatively simple operations (such as an XOR gate) are impossible to realize with this device. To overcome such limitations one could create a multilayer neural network by using such devices sequentially, but any advantages that come from the low-power operation and compactness of the spin-wave scatterer would be overshadowed by the required readout circuitry. Any approach that exploits the benefits of the highly interconnected nature of wave interference should minimize the number of input-output conversions. Thus, we investigated the feasibility of exploiting the nonlinearity of spin waves, which would allow implementing a multilayer neural network (or a recurrent neural network) within a single scattering block.

References

Markovic, Danijela, Alice Mizrahi, Damien Querlioz, and Julie Grollier. "Physics for neuromorphic computing." Nature Reviews Physics (2020): 1-12.
Maendl, Stefan, Ioannis Stasinopoulos, and Dirk Grundler. "Spin waves with large decay length and few 100 nm wavelengths in thin yttrium iron garnet grown at the wafer scale." Applied Physics Letters 111, no. 1 (2017): 012403.
Csaba, Gyorgy, Adam Papp, and Wolfgang Porod. "Perspectives of using spin waves for computing and signal processing." Physics Letters A 381, no. 17 (2017): 1471-1476.
Hughes, Tyler W., Ian A. D. Williamson, Momchil Minkov, and Shanhui Fan. "Wave physics as an analog recurrent neural network." Science Advances 5, no. 12 (2019): eaay6946.
https://pytorch.org
Wang, Q., A. Chumak, and P. Pirro. "Inverse-design magnonic devices." arXiv (2020).
Papp, A., W. Porod, and G. Csaba. "Hybrid yttrium iron garnet-ferromagnet structures for spin-wave devices." Journal of Applied Physics 117, no. 17 (2015): 17E101.
https://github.com/fancompute/wavetorch
Vansteenkiste, Arne, Jonathan Leliaert, Mykola Dvornik, Mathias Helsen, Felipe Garcia-Sanchez, and Bartel Van Waeyenberge. "The design and verification of MuMax3." AIP Advances 4, no. 10 (2014): 107133; see also M. J. Donahue and D. G. Porter, Interagency Report NISTIR 6376, National Institute of Standards and Technology, Gaithersburg, MD (Sept 1999).
Papp, Adam, Wolfgang Porod, Arpad I. Csurgay, and Gyorgy Csaba. "Nanoscale spectrum analyzer based on spin-wave interference." Scientific Reports 7, no. 1 (2017): 1-9.
Molesky, Sean, Zin Lin, Alexander Y. Piggott, Weiliang Jin, Jelena Vučković, and Alejandro W. Rodriguez. "Inverse design in nanophotonics." Nature Photonics 12, no. 11 (2018): 659-670.
Lu, Jesse, and Jelena Vučković. "Nanophotonic computational design." Optics Express 21, no. 11 (2013): 13351-13367.
Estakhri, Nasim Mohammadi, Brian Edwards, and Nader Engheta. "Inverse-designed metastructures that solve equations." Science 363, no. 6433 (2019): 1333-1338.
Csaba, G., A. Papp, and W. Porod. "Spin-wave based realization of optical computing primitives." Journal of Applied Physics 115, no. 17 (2014): 17C741.
Nikonov, Dmitri E., and Ian A. Young. "Benchmarking Delay and Energy of Neural Inference Circuits." IEEE Journal on Exploratory Solid-State Computational Devices and Circuits 5, no. 2 (2019): 75-84.
Egel, Eugen, György Csaba, Andreas Dietz, Stephan Breitkreutz-von Gamm, Johannes Russer, Peter Russer, Franz Kreupl, and Markus Becherer. "Design of a 40-nm CMOS integrated on-chip oscilloscope for 5-50 GHz spin wave characterization." AIP Advances 8, no. 5 (2018): 056001.
Nakajima, Kohei. "Physical reservoir computing—an introductory perspective." Japanese Journal of Applied Physics 59, no. 6 (2020): 060501.
Freiberger, Matthias, Andrew Katumba, Peter Bienstman, and Joni Dambre. "Training passive photonic reservoirs with integrated optical readout." IEEE Transactions on Neural Networks and Learning Systems 30, no. 7 (2018): 1943-1953.
Miscuglio, Mario, and Volker J. Sorger. "Photonic tensor cores for machine learning." Applied Physics Reviews 7 (2020): 031404.
Wang, Q., M. Kewenig, M. Schneider, R. Verba, F. Kohl, B. Heinz, M. Geilen et al. "A magnonic directional coupler for integrated magnonic half-adders." Nature Electronics (2020): 1-10.
Stepney, Susan. "The neglected pillar of material computation." Physica D: Nonlinear Phenomena 237, no. 9 (2008): 1157-1164.
Porod, W. "Let Physics Do the Computing: Analog Computation Revisited." Keynote, ISCASD 2020.
Csaba, Gyorgy, Adam Papp, Wolfgang Porod, and Ramazan Yeniceri. "Non-boolean computing based on linear waves and oscillators." In 2015 45th European Solid State Device Research Conference.