Reservoir Computing with Thin-film Ferromagnetic Devices
Matthew Dale, Richard F. L. Evans, Sarah Jenkins, Simon O'Keefe, Angelika Sebald, Susan Stepney, Fernando Torre, Martin Trefzer
Department of Computer Science, University of York, UK; Department of Physics, University of York, UK; Department of Chemistry, University of York, UK; Department of Electronic Engineering, University of York, UK
* [email protected]

Abstract
Advances in artificial intelligence are driven by technologies inspired by the brain, but these technologies are orders of magnitude less powerful and energy efficient than biological systems. Inspired by the nonlinear dynamics of neural networks, new unconventional computing hardware has emerged with the potential for extreme parallelism and ultra-low power consumption. Physical reservoir computing demonstrates this with a variety of unconventional systems, from optical-based to spintronic [1]. Reservoir computers provide a nonlinear projection of the task input into a high-dimensional feature space by exploiting the system's internal dynamics. A trained readout layer then combines features to perform tasks such as pattern recognition and time-series analysis. Despite progress, achieving state-of-the-art performance without external signal processing to the reservoir remains challenging. Here we show, through simulation, that magnetic materials in thin-film geometries can realise reservoir computers with accuracy greater than or similar to that of digital recurrent neural networks. Our results reveal that the basic spin properties of magnetic films generate the required nonlinear dynamics and memory to solve machine learning tasks. Furthermore, we show that neuromorphic hardware can be reduced in size by removing the need for discrete neural components and external processing. The natural dynamics and nanoscale size of magnetic thin-films present a new path towards fast, energy-efficient computing, with the potential to innovate portable smart devices, self-driving vehicles, and robotics.
Main
Performing machine learning at 'the edge' is a growing area of interest, where inference is performed locally in real time [2, 3, 4]. Embedded devices that can perform complex information processing without the need to outsource to remote servers are ideal for real-time applications. However, current systems are limited by processing speeds, memory, size, and power consumption. Unconventional hardware is a potential alternative to classical computing hardware, with low energy consumption, inherent parallelism, and no separation between processor and memory (the von Neumann bottleneck) [5]. Neuro-inspired hardware [6] is one route to embed machine learning at the edge; another is to exploit embodied computation in novel dynamical systems.

By design, neural-based hardware implements the abstract behaviour of neurons and their connectivity at the lowest circuit level, e.g. weighted summation, threshold functions, synapses. This typically requires a combination of simpler components to implement the model. For example, replicating a neuron-synapse circuit with conventional complementary metal–oxide–semiconductor technology takes tens to hundreds of transistors per neuron [7, 8]. Another option is to force the neuron model directly onto the material to improve energy efficiency and reduce the physical footprint, yet model constraints may require the removal of useful natural properties (e.g. variability in components) or additional engineering [9]. Here we demonstrate an alternative approach that exploits the dynamical behaviours of neural systems without the direct implementation of neural units, allowing further reductions in size and improvements in efficiency. Dynamical properties that occur naturally within complex materials, such as memory, nonlinear oscillation, and chaos, can be directly exploited for computation, with less top-down engineering of the material.
However, the discovery and control of intractable or unknown material properties raises new challenges.

Two novel approaches have been proposed to exploit the embodied computation of materials: evolution in materio and reservoir computing. Miller and Downing [10] proposed using artificial evolution as a mechanism to exploit and configure materials, arguing that natural evolution is the method par excellence for exploiting the physical properties of materials. Evolution in materio uses computer-controlled manipulation of external stimuli to configure the material and its input-output mapping, using digital computers to directly evolve physical material configurations. A range of materials have been evolved to perform classification, real-time robot control, and pattern recognition [11, 12, 13, 14].

Reservoir computing is a neuro-inspired framework that harnesses the high-dimensionality and temporal properties of recurrent networks and novel systems [15, 16]. Physical implementations of the reservoir model are diverse [17, 18, 19], with recent spintronic reservoirs showing some key advantages compared to other systems, combining GHz+ operating frequencies, ultra-compact size, and ultra-low energy consumption [20, 21, 22, 23, 24, 25, 26, 27].

Here we demonstrate material computation with ferromagnetic materials in thin nano-film geometries, combining both evolution in materio and reservoir computing methods. The reservoir model is used to harness the propagation of information through magnetic films, and artificial evolution is used to optimise reservoir parameters. Using open-source simulation software, we evolve three ferromagnetic materials to solve three time-dependent tasks of increasing complexity. All materials are evaluated at various film sizes with direct comparisons to equivalent-sized recurrent neural networks. The magnetic system is then characterised by metrics to understand the dynamical properties of each material.
Lastly, the effects of temperature and film size are explored to inform future physical implementations.

Reservoir computers are composed of three layers: input, reservoir, and output (Fig. 1a). A reservoir, typically a fixed random network of discrete processing nodes with recurrent connections, features non-linear characteristics and a short-term memory. The reservoir network is driven by a time-varying input u that propagates through a random input mapping via connection weights W_in (see Methods). The non-linear reservoir provides a high-dimensional projection of the input from which a subsequent linear readout layer can extract features relevant to the problem task. Training occurs only at the readout through trained weighted connections W_out connecting observable states to the final output. Typically, one-shot learning is used through linear regression, making learning extremely fast.

Fig. 1b details the layout of the proposed magnetic system and its reservoir representation. The film does not possess any discrete processing nodes; our representation of the system defines discrete "cells" for the purpose of input and output locations. The film is conceptually divided into a grid of magnetic cells; each cell is connected to a time-varying input signal source and a bias source via weighted connections W_in. The output of each cell is represented by a three-dimensional magnetisation vector X_xyz. This approach models a grid of nano-contacts across the film, measuring a low-resolution snapshot of the film's magnetic state.

The reservoir thin films are simulated micromagnetically, where the atomistic detail is coarse-grained into 5 nm cells (see Methods). Here we consider three simple ferromagnetic metals: cobalt (Co), nickel (Ni), and iron (Fe). The atomic magnetic properties of these materials are well understood from first-principles calculations [28], providing a detailed insight into microscopic and macroscopic magnetic behaviour.
These metals are abundant in nature, inexpensive, and highly stable.

As a thin film, the reservoir is highly structured. The influence each cell has on its nearest neighbours is determined by the physical properties of the exchange, anisotropy, and dipole Hamiltonians (see Methods). The exchange interactions dominate over short lengthscales, meaning that cells have finite temporal and spatial correlations over the total sample size. Fig. 1c shows a typical simulated micromagnetic response to three input pulses at the film's centre. When perturbed, spin waves propagate through the film, inducing reflections, oscillations, and interference patterns. At the edges, a similar characteristic response is seen per impulse, but with some contributions from previous stimuli.

To exploit the fast spin dynamics of the ferromagnetic materials, data inputs are applied at 10 ps intervals (100 GHz). Selecting a suitable input timescale depends on the material's dynamics. An input faster or slower than the system's intrinsic timescale alters the temporal dynamics and thus can affect settling times, refractory periods, and memory in the system. The inherent volatility and nonlinear dynamics of the spin precession provide a temporal mapping of the input into different reservoir states.
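The spin precession underlying this temporal mapping is governed by the Landau–Lifshitz–Gilbert equation used in the Methods (eq. 6). A minimal single-macrospin sketch in reduced (dimensionless) units, not the VAMPIRE multi-cell solver, illustrates damped precession towards an applied field; the Heun integration scheme and all parameter values are illustrative choices:

```python
import numpy as np

def llg_step(m, H, gamma=1.0, lam=0.1, dt=0.01):
    """One Heun (predictor-corrector) step of the LLG equation in reduced units:
    dm/dt = -gamma/(1+lam^2) * [ m x H + lam * m x (m x H) ]."""
    def dmdt(m):
        mxH = np.cross(m, H)
        return -gamma / (1 + lam**2) * (mxH + lam * np.cross(m, mxH))
    m_pred = m + dt * dmdt(m)                         # predictor
    m_new = m + 0.5 * dt * (dmdt(m) + dmdt(m_pred))   # corrector
    return m_new / np.linalg.norm(m_new)              # keep |m| = 1

# A spin initially along x precesses about a field along z and,
# through the damping term, gradually aligns with the field.
m = np.array([1.0, 0.0, 0.0])
H = np.array([0.0, 0.0, 1.0])
for _ in range(20000):
    m = llg_step(m, H)
```

The precession term alone conserves the angle to the field; it is the damping term (weighted by λ) that relaxes the moment towards H, which is why settling times in the film depend on the damping parameter α tuned later in the paper.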
Figure 1: a) Reservoir computing model split into input, reservoir and output layers connected by adjustable weights. The reservoir is self-contained, typically featuring a sparse, recurrent network of processing nodes. b) Schematic of our simulated thin-film magnetic reservoir system, consisting of micromagnetic cells derived from atomistic values. Global input sources u connect via weights W_in to drive local magnetisation fields, inducing spin oscillations. Each cell's average magnetic moment produces a 3-d orientation vector X_xyz forming a reservoir state. States are then combined via a linear readout function W_out to produce the final system output y. c) Impulse response of the micromagnetic spin system. Signal injected in the centre of the film via the z-axis at 25 time-step intervals with 10 ps scanning frequency.

To evaluate the materials, three temporal tasks are used. The Santa Fe chaotic laser time-series prediction dataset [29] is chosen for its nonlinear properties and periodic structure, and the nonlinear autoregressive moving average model (NARMA) with lags of 10 (NARMA-10) and 30 (NARMA-30) time-steps is chosen to evaluate the film's ability to manage the nonlinearity-memory trade-off [30]. Each benchmark increases in difficulty, demonstrating the film's dynamic range and ability to perform increasingly complex tasks.

Our experimental results show the investigated materials are competitive with state-of-the-art reservoir networks, and typically outperform small networks with equivalent reservoir size. Fig. 2 shows the performance of each material at three film sizes. Four types of recurrent neural networks are provided for the comparison, including random and evolved networks, and networks with limited connectivity.
As reservoir-internal connections are typically random, baseline comparisons to random networks are included. Highly structured networks, such as a lattice, more accurately model the material crystal structure. Lattice networks with recurrent connections have been shown to be dynamically similar to less restrictive recurrent neural networks, but often have to compensate with larger network size [31, 32, 33].

Figure 2: Performance of materials and simulated reservoir networks on benchmark tasks. Normalised mean square error (NMSE) is used to compare equivalent-sized reservoirs. Multiple reservoir sizes are displayed in columns, and each task is divided into rows. Each type of system is represented by colour (lattice reservoir = purple; echo state reservoir = green; material = orange). The method used to create the reservoirs is given on the x-axis (random or evolved). For random search, the best reservoir from a batch of 2000 instances is shown, repeated over 20 batches. For evolved, the final evolved reservoirs are given from 20 evolutionary runs of 2000 evaluations each.

For the laser task (Fig. 2, top row), all materials significantly outperform random networks at small sizes. At the largest size (225 nodes, right column), only Co outperforms random networks; however, Ni and Fe remain statistically similar. At the smallest film size, all materials outperform evolved networks. At 100 nodes, only Co outperforms evolved networks, achieving the smallest normalised mean square error (NMSE) found, while Ni and Fe remain statistically similar to evolved networks. For the laser task, even the smallest magnetic reservoirs here outperform larger material reservoirs reported in the literature [34, 35].

For the NARMA-10 task (Fig. 2, middle row), all materials outperform random networks at small sizes. At 225 nodes, all materials are statistically similar to random lattices but worse than other networks. In some cases, materials are better than, or similar to, evolved networks, which have unrestricted access to long-distance connections. The lowest material errors found on this task are achieved by Co at every size, including NMSE = 0.032 (100 nodes) and 0.025 (225 nodes). These are highly competitive with, or outperform, other material reservoirs reported in the literature, such as optoelectronic (50 nodes [36]) and digital delay-line (400 nodes [17]) reservoirs.

For the NARMA-30 task (Fig. 2, bottom row), the difference between materials becomes clear. Each material performs differently, with Co being able to better match the dynamics of the task. Across all sizes, Co is competitive with random and evolved networks, with its lowest error found at 225 nodes. Ni and Fe struggle to compete with other networks at small sizes; nevertheless, as size increases, NMSE decreases. This suggests that these materials require larger films to exhibit the necessary dynamics to perform the tasks.

The NARMA-30 results show a strong distinction between the materials, despite their similar performances on other tasks. To understand this further, task-independent measures are used to assess non-linearity and memory. These measures better determine the general underlying dynamics of the system than tasks can achieve alone.
They have been used to qualitatively assess the dynamical range of materials for reservoir computing [37, 31] and to determine a system's total information processing capacity [38]. Here, the non-linear projection and short-term memory are measured using the kernel rank (KR) [39] and linear memory capacity (MC) [40] of the reservoir (see Supplementary Material). Fig. 3 shows values of these measures for each of the material reservoirs used in the NARMA-30 task (see Supplementary Material for all tasks). The Co material tends to cluster at low normalised KR and large MC, suggesting it exploits a weak non-linearity and a large memory to perform the task, which corresponds to the known dynamics of the task (see eq. 13). Ni typically has smaller memory than Co but larger than Fe, explaining its intermediate performance. Fe features small values of both KR and MC across all sizes; however, as size increases, both measures slowly move towards values representative of more desirable dynamics. This change, relative to the increase in size, mirrors the gradual decrease in error shown in Fig. 2.

Figure 3: Normalised kernel rank (KR) and linear memory capacity (MC) of evolved films across three sizes, for the NARMA-30 task. Materials are separated by colour – Nickel (red), Cobalt (blue), and Iron (black). To perform well on the NARMA-30 task, MC should be close to the driving equation's time-lag of 30, which in turn requires more linear behaviour (i.e., a low KR). The Ni and Co materials do this well; Fe does not. Only at larger film sizes does Fe grow in memory capacity.

Task performances and the KR/MC measures indicate that several trade-offs exist. First, smaller films generally show better performance than similarly sized digital reservoirs. This suggests properties of small films, such as shorter distances between edges, may improve performance. Interference and reflection of travelling spin waves from the edges are likely to increase as size decreases. The geometry of the film is also likely to have an effect. In our experiments, only square films are used; other shapes can provide greater asymmetry at the boundaries. Second, depending on the material, larger films can boost desirable dynamical properties such as memory. A large surface area enables signals to persist unperturbed away from rapidly changing input sources. Exploiting geometry, size, and inputs to control these trade-offs is of great interest for future work.
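The linear memory capacity quoted above can be estimated for any reservoir by training a separate linear readout to reconstruct each delayed copy u(t−k) of a random input and summing the squared correlation coefficients over delays [40]. A minimal sketch for a generic echo state network; the network size, scalings, and delay range are illustrative assumptions, not the settings used for the films:

```python
import numpy as np

def memory_capacity(n_nodes=50, max_delay=20, T=2000, washout=100, seed=0):
    """Estimate MC = sum_k r^2( u(t-k), y_k(t) ) for a small random
    tanh echo state network driven by a uniform random input."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1, 1, T)
    W_in = rng.uniform(-0.5, 0.5, n_nodes)
    W = rng.normal(0, 1, (n_nodes, n_nodes))
    W *= 0.9 / max(abs(np.linalg.eigvals(W)))   # spectral radius below 1

    # Collect reservoir states driven by the input.
    x = np.zeros(n_nodes)
    states = np.zeros((T, n_nodes))
    for t in range(T):
        x = np.tanh(W_in * u[t] + W @ x)
        states[t] = x

    mc = 0.0
    X = states[washout:]                            # discard washout
    for k in range(1, max_delay + 1):
        target = u[washout - k:T - k]               # delayed input u(t-k)
        w = np.linalg.lstsq(X, target, rcond=None)[0]  # linear readout
        r = np.corrcoef(X @ w, target)[0, 1]        # in-sample correlation
        mc += r ** 2
    return mc
```

Each delay contributes at most 1 to the sum, so the estimate here is bounded by the delay range scanned; in theory MC is bounded by the number of observable states, which is why larger films can hold more memory.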
The simulated platform is realisable in physical hardware. Fig. 4a shows a proposed interface applying a local time-varying field B(t) to each region of the device.

With any new reservoir system, an ability to scale hardware components and reduce error is desired. In our experiments, each material exhibits a significant improvement as film size increases, despite its restrictive lattice topology and lack of predefined discrete processing nodes. The greatest improvements relate to the difficulty of the task, where distinct trade-offs in non-linearity and memory are required. The most significant differences between material and size are shown for the NARMA tasks, where memory is a strong indicator of performance.

To assess scaling potential, additional evolutionary searches are conducted with the Co material for larger systems. In order to compare material scaling with digital reservoirs, equivalent-sized networks are evolved as well. Fig. 4b shows NARMA-10 task performance as film and reservoir size increases. Scaling begins at 25 material cells/network nodes up to 900 cells/nodes, representing film dimensions (D) of 25 nm up to 150 nm: D = √(N_cells) × cell size. The results show that up to 400 cells/nodes there is a significant reduction in the average error as size increases. After this, the median error is no longer significantly different; however, lower errors continue to be found in the best runs. This could indicate that larger films with lower errors are more challenging to discover, or that potentially beneficial properties of small films are lost, such as the interaction of reflections from edges.

At the nanoscale, thermal noise is a limiting factor. Maintaining performance close to room temperature is desirable for practical implementations, as stability and reproducibility can be adversely affected by thermal noise. In our experiments, temperature is set to zero kelvin to observe pure magnetic behaviour without thermal effects.
Methods to control and reduce thermal fluctuations have been proposed using spin transfer torque to modify thermal activation rates [41]. This suggests different paths towards room-temperature computing with thin films, without cooling, are plausible.

Figure 4: a) Proposed hardware interface to realise a thin-film reservoir computing device. b) Performance of the Co material on the NARMA-10 task as the number of cells increases. Performance of the material remains competitive with scaled simulated reservoirs. c) Grid sweep of film temperatures (K) and film thickness (nm). The NMSE of the evolved Co configuration is shown using colour. Errors are for the NARMA-30 task. Temperature ranges from 0 K (original experiments) to more practical temperatures including room temperature (300 K). White box-plots in the colour bar display performances of the 20 best random ESNs at the respective size. A white diamond in a cell signals task error is within the ESN range.

To demonstrate the effect of temperature on our films, additional experiments are conducted. Fig. 4c shows reservoir performance at various temperatures on the NARMA-30 task. The temperature range includes millikelvin temperatures, liquid helium (4.2 K), liquid nitrogen (77 K), and room temperature (300 K). The top-left shows the original experimental setup (temperature = 0 K, sub-nanometre thickness) for an evolved Co reservoir. As temperature increases along the x-axis, thermal noise dominates and degrades performance. A similar pattern is present across all film sizes, tasks and materials (see Supplementary Material).

Film thickness is also investigated, to see whether thickness can compensate for a rise in temperature. On the y-axis of Fig. 4c, film thickness varies from 0.1–2 nm. In general, performance is maintained at sub-nanometre thicknesses and temperatures up to 30–77 K. Between 0.5–1 nm, the change in error slows as temperature rises (30 to 200 K); however, errors are higher than for thinner films. Beyond this, thicker films tend to degrade performance, but this varies depending on material and film size (see Supplementary Material). The results show that films with sub-nanometre thickness at temperatures up to 100 K work best, outperforming or matching equivalent-sized random reservoir networks.

Our spintronic-based system provides an exceptional platform for machine learning with analogue hardware. By combining two frameworks, evolution in materio and reservoir computing, novel magnetic computing devices are demonstrated. Without the need for discrete neural components, physical reservoirs are possible with smaller footprints than other neuromorphic devices, e.g., memristors, spin torque oscillators, photonics [42, 22, 24]. The evolved devices operate at frequencies of 100 GHz and require no special preprocessing to emulate network structures [17, 22]. The basic materials used are inexpensive and feature a large dynamical range that can be reconfigured externally to solve different machine learning tasks.

With this generic platform, other complex magnetic materials such as alloys, oxides, skyrmion fabrics, and antiferromagnetic reservoirs [43] can be optimised and exploited. Furthermore, simulations of complex atomic structures are possible.
With atomistic simulations, desirable heterostructures or defects can be introduced to add more reservoir complexity and greater physical realism.

The natural dynamics and nanoscale size of the proposed magnetic substrates present a new path towards fast, energy-efficient computing platforms, enabling new innovations in smart technologies.
Methods

For a generic atomistic model with n nearest-neighbour interactions, the Curie temperature T_C can be calculated from the atomistic exchange J_ij by the mean-field expression, summing over every exchange interaction that occurs in each cell to calculate the total exchange [44]:

$$T_C = \frac{\varepsilon}{3 k_B N_c} \sum_{i=0}^{N_c} \sum_{j=0}^{n} J_{ij} \qquad (1)$$

where $k_B$ is the Boltzmann constant, $N_c$ is the number of atoms per cell, and $\varepsilon$ is a correction factor to the usual mean-field expression which arises due to spin waves in the 3D Heisenberg model.

The anisotropy $k_u$ and the spontaneous magnetisation $M_s$ are calculated as a sum of the atomic anisotropies and spin moments within each cell. The gyromagnetic ratio $\gamma$ and the damping constant $\alpha$ are calculated as an average of the atomic parameters for each cell.

The energetics of the micromagnetic system are described using a spin Hamiltonian, neglecting non-magnetic contributions, given by:

$$\mathcal{H}_{\mathrm{eff}} = \mathcal{H}_{\mathrm{app}} + \mathcal{H}_{\mathrm{ani}} + \mathcal{H}_{\mathrm{exc}} + \mathcal{H}_{\mathrm{dip}} \qquad (2)$$

where $\mathcal{H}_{\mathrm{app}}$ is the applied field, $\mathcal{H}_{\mathrm{ani}}$ is the anisotropy field, $\mathcal{H}_{\mathrm{exc}}$ is the intergranular exchange, and $\mathcal{H}_{\mathrm{dip}}$ is the dipole field.

The anisotropy Hamiltonian describes the directional dependence of the material's magnetisation; in this case the anisotropy is uniaxial along $z$ and is described by:

$$\mathcal{H}_{\mathrm{ani}} = K V \left( m_x^2 + m_y^2 \right) \qquad (3)$$

The exchange field is calculated as a sum of the exchange interactions between neighbouring cells. The micromagnetic exchange constant $A$ is a sum over all atoms which have neighbours in another cell; the summation over all the interactions gives a total interaction from cell $i$ to cell $j$, from which the micromagnetic exchange constant is calculated by multiplying by the distance between the atomistic atoms.
$$\mathbf{H}^{i}_{\mathrm{exc}} = \frac{A_{ij}}{M_S \Delta^2} \sum_{j \in \mathrm{cells}} \left( \mathbf{m}_j - \mathbf{m}_i \right) \qquad (4)$$

$$\mathbf{H}_{\mathrm{dip}} = \frac{\mu_0}{4\pi} \frac{3 (\mathbf{m} \cdot \hat{\mathbf{r}}) \hat{\mathbf{r}} - \mathbf{m}}{|\mathbf{r}|^3} - \frac{\mu_0 \mathbf{m}}{3V} \qquad (5)$$

The atomistic Landau–Lifshitz–Gilbert (LLG) equation is used to model the time-dependent behaviour of the magnetic films, given by:

$$\frac{\partial \mathbf{m}_i}{\partial t} = -\frac{\gamma}{1 + \lambda^2} \left[ \mathbf{m}_i \times \mathbf{H}^{i}_{\mathrm{eff}} + \lambda\, \mathbf{m}_i \times \left( \mathbf{m}_i \times \mathbf{H}^{i}_{\mathrm{eff}} \right) \right] \qquad (6)$$

where $\mathbf{m}_i$ is a unit vector representing the direction of the magnetic spin moment of cell $i$, $\gamma$ is the gyromagnetic ratio, and $\mathbf{H}^{i}_{\mathrm{eff}}$ is the net magnetic field on each cell, equal to the derivative of the spin Hamiltonian:

$$\mathbf{H}^{i}_{\mathrm{eff}} = -\frac{1}{\mu_s} \frac{\partial \mathcal{H}_{\mathrm{eff}}}{\partial \mathbf{S}_i} \qquad (7)$$

The reservoir dynamics of the simulated networks are given by the state update equation:

$$\mathbf{x}(t) = (1 - a)\, \mathbf{x}(t-1) + a f\big( b\, W_{\mathrm{in}} [\mathbf{u}(t); u_{\mathrm{bias}}] + c\, W \mathbf{x}(t-1) \big) \qquad (8)$$

where $\mathbf{x}$ is the internal state at time-step $t$, $f$ is the non-linear neuron activation function (a tanh function), $\mathbf{u}$ is the input signal, and $u_{\mathrm{bias}}$ is a bias source. $W_{\mathrm{in}}$ and $W$ are weight matrices giving the connection weights to inputs and internal neurons respectively. The parameters $b$ and $c$ control the global scaling of the input weights and internal weights. Input scaling $b$ affects the non-linear response of the reservoir and the relative effect of the current input. Internal scaling $c$ controls the reservoir's stability as well as the influence and persistence of the input: low values dampen internal activity and increase the response to input, while high values lead to chaotic behaviour. A leakage filter $a$ is used to match the internal timescales of the film to the characteristic timescale of the task; this is similar to adding a low-pass filter before the output. The leak rate controls the timescale mismatch between the input and reservoir dynamics; when $a = 1$, the previous states do not leak into the current states.

For both random and evolved reservoir networks, $W_{\mathrm{in}}$ and $W$ are initialised as sparse, normally distributed random matrices (mean = 0, variance = 1). For the lattice network, we define a square grid of neurons, each connected to its nearest neighbours in its Moore neighbourhood [45]. Each non-perimeter node has eight connections to neighbours and one self-connection, resulting in each node having a maximum of nine adaptable weights in $W$.

The final trained output $\mathbf{y}(t)$ is given when the reservoir states $\mathbf{x}(t)$ are combined with the trained readout weight matrix $W_{\mathrm{out}}$:

$$\mathbf{y}(t) = W_{\mathrm{out}}\, \mathbf{x}(t) \qquad (9)$$

Readout training is performed using ridge regression [46] and occurs within the evolutionary loop during the training phase. A validation and testing phase is carried out to evaluate the generalisation of the readout to new data.
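The leaky state update of eq. (8) and a ridge-trained readout as in eq. (9) can be sketched for a generic echo state network. The sizes, scalings (a, b, c), toy recall task, and regularisation value below are illustrative assumptions, not the evolved settings from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 50, 1000                      # reservoir nodes, time-steps
a, b, c = 0.8, 0.5, 0.9              # leak rate, input scaling, internal scaling

u = rng.uniform(0, 0.5, T)           # input signal
W_in = rng.normal(0, 1, (N, 2))      # weights for [u(t); u_bias]
W = rng.normal(0, 1, (N, N))
W *= 1.0 / max(abs(np.linalg.eigvals(W)))   # normalise spectral radius

# Eq. (8): leaky state update with tanh activation.
x = np.zeros(N)
states = np.zeros((T, N))
for t in range(T):
    pre = b * W_in @ np.array([u[t], 1.0]) + c * W @ x
    x = (1 - a) * x + a * np.tanh(pre)
    states[t] = x

# Eq. (9) with ridge regression: solve (X^T X + beta I) W_out = X^T Y.
target = np.roll(u, 1)               # toy task: recall the previous input
beta = 1e-6                          # ridge regularisation strength
X = states[50:]                      # discard washout period
Y = target[50:]
W_out = np.linalg.solve(X.T @ X + beta * np.eye(N), X.T @ Y)
y_pred = X @ W_out
```

Because only W_out is trained, the whole fit is a single linear solve; this is the "extremely fast" one-shot learning the reservoir framework relies on, whether the states come from a simulated network or from magnetisation measurements of a film.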
This approach is similar to previous work [47, 48].

During the simulation, material parameters such as exchange interactions, anisotropies, and atomic moments are defined by the material and remain unaltered. Parameters controlling the input mapping, field intensity $b$, intrinsic magnetic damping $\alpha$, and a post-state collection filter $a$ are tuned. The material is interpreted as a reservoir in the following way:

$$X(t) = \sigma\big( b\, W_{\mathrm{in}} [\mathbf{u}; u_{\mathrm{bias}}],\, \alpha \big) \qquad (10)$$

$$X_f(t) = (1 - a)\, X(t-1) + a\, X(t) \qquad (11)$$

$$\mathbf{y}(t) = W_{\mathrm{out}}\, X_f(t) \qquad (12)$$

where $X$ is the global material state comprising each cell's local 3-d magnetisation vector $X_{xyz}$, $\sigma$ represents the material function, $a$ is the leakage parameter, and $X_f$ is an external filter layer with a one-step memory, implemented after the observation of the material state $X$ and before the readout weights are applied.

The input mapping $W_{\mathrm{in}}$ consists of weighted connections from the input $\mathbf{u}$ and a bias source $u_{\mathrm{bias}}$ to each cell. The input search space is typically large and grows with film size. Field intensity $b$ is a global scaling factor applied to the input mapping; it suppresses or raises the overall magnitude of the locally applied fields, promoting varying dynamical behaviours. The magnetic damping parameter $\alpha$ controls the speed of information propagation and oscillation. Damping describes the non-linear spin relaxation across the film, controlling the rate at which magnetisation spins reach equilibrium.

To optimise magnetic reservoirs, artificial evolution is applied. To reduce convergence time, linear regression is used to train the readout rather than evolving it (see Methods). The evolutionary goal is to find parameters that optimise the efficiency and ability of the readout layer to perform its function.

Parameter                    | Ni       | Co       | Fe       | Unit
Crystal structure            | fcc      | fcc      | bcc      | –
Unit cell size a             | ·        | ·        | ·        | –
µ_s                          | ·        | ·        | ·        | µ_B
Exchange energy J_ij         | · × 10⁻  | · × 10⁻  | · × 10⁻  | J/link
Anisotropy k                 | · × 10⁻  | · × 10⁻  | · × 10⁻  | J/atom
Temp. rescaling exponent     | 2.322    | 2.369    | 2.       | –
Rescaling Curie temperature  | 635      | 1395     | 1049     | –

Table 1: Parameters used to simulate each ferromagnetic material in VAMPIRE. These parameters are static in our work and are not affected by the evolutionary algorithm.

Many heuristics can be used to optimise reservoirs [49], but here the microbial genetic algorithm (MGA) [50] is chosen for its simplicity. The MGA allows individuals to survive across many generations, provides elitism for free, and offers a simple mechanism for selection, recombination and mutation. Parameters for the MGA include: population size = 100, number of generations = 2000, and number of runs = 20, together with fixed mutation rate, recombination rate, and deme size (species separation, as a fraction of the population). These parameters were used for all experiments involving an evolutionary algorithm.

To conduct the experiments, the VAMPIRE source code was adapted to construct a dynamic input-output mechanism. Important parameters for the VAMPIRE simulation include input frequency, integration time-step, initial spin direction, and macro-cell size (micromagnetic simulation). The input frequency chosen – 10 ps / 100 GHz – was based on qualitative experiments in search of characteristic behaviours, such as fast response and a short settling time. The input frequency has to closely match the internal timescales and dynamics of the system.

To optimise the evaluation process and reduce computational cost, an integration time-step of 100 fs was used. This provides a less accurate model compared to an integration time-step of 1 fs, but gives manageable computational run times. Details of how this parameter choice minimally affects performance are provided in the Supplementary Material.

The initial spin direction was aligned with the x-axis, and input signals were injected in the z-direction. The macro-cell size for each simulation was fixed at 5 nm.

Simulation parameters for each material are given in Table 1. These include exchange constants and second-order uniaxial anisotropy constants.
To conduct accurate temperature calculations, rescaling exponents and Curie temperature information are also included.

The chosen tasks are widely used benchmarks for different reservoir systems and methods [51, 33, 52, 36, 34, 30]. The laser task predicts the next value of the Santa Fe Time-Series Competition data (dataset A) [29]. The dataset holds original source data recorded from a far-infrared laser in a chaotic state. Training and testing use the first 2,000 values of the dataset, divided into three sets: 1,200 (training), 400 (validation), and 400 (test). The first 50 output values of each sub-set are discarded as an initial washout period.

The NARMA task originates from work on training recurrent networks [53]. It evaluates a reservoir's ability to model an n-th order highly non-linear dynamical system where the system state depends on the driving input as well as its own history. The challenging aspect of the NARMA task is that it contains both non-linearity and long-term dependencies created by the n-th order time-lag.

An n-th order NARMA experiment is carried out by predicting the output $y(n+1)$ given by eq. (13) when supplied with $u(n)$ drawn from a uniform distribution on the interval [0, 0.5]. For the 10-th and 30-th order systems, $\alpha = 0.3$, $\beta = 0.05$, $\delta = 10$ and $\gamma = 0.1$.

$$y(n+1) = \alpha y(n) + \beta y(n) \left( \sum_{i=0}^{\delta} y(n-i) \right) + 1.5\, u(n-\delta)\, u(n) + \gamma \qquad (13)$$

The NARMA equation is simulated for 5,000 values and split into 3,000 training, 1,000 validation, and 1,000 test values for both versions. The first 50 values of each sub-set are discarded as an initial washout period.

Acknowledgements

This work is part of the SpInspired project, funded by EPSRC Grant EP/R032823/1. All experiments were carried out using the University of York's Super Advanced Research Computing Cluster (Viking).
References

[1] Tanaka, G. et al. Recent advances in physical reservoir computing: A review. Neural Networks (2019).
[2] Shi, W., Cao, J., Zhang, Q., Li, Y. & Xu, L. Edge computing: Vision and challenges. IEEE Internet of Things Journal, 637–646 (2016).
[3] Chen, J. & Ran, X. Deep learning with edge computing: A review. Proceedings of the IEEE, 1655–1674 (2019).
[4] Wang, X. et al. Convergence of edge computing and deep learning: A comprehensive survey. IEEE Communications Surveys & Tutorials, 869–904 (2020).
[5] Adamatzky, A. (ed.) Advances in Unconventional Computing: Volume 2: Prototypes, Models and Algorithms (Springer, 2016).
[6] Young, A. R., Dean, M. E., Plank, J. S. & Rose, G. S. A review of spiking neuromorphic hardware communication systems. IEEE Access, 135606–135620 (2019).
[7] Indiveri, G. et al. Neuromorphic silicon neuron circuits. Frontiers in Neuroscience, 73 (2011).
[8] Jo, S. H. et al. Nanoscale memristor device as synapse in neuromorphic systems. Nano Letters, 1297–1301 (2010).
[9] Xia, Q. & Yang, J. J. Memristive crossbar arrays for brain-inspired computing. Nature Materials, 309–323 (2019).
[10] Miller, J. F. & Downing, K. Evolution in materio: Looking beyond the silicon box. In NASA/DoD Conference on Evolvable Hardware 2002, 167–176 (IEEE, 2002).
[11] Mohid, M. et al. Evolving solutions to computational problems using carbon nanotubes. International Journal of Unconventional Computing, 245–281 (2015).
[12] Massey, M. et al. Evolution of electronic circuits using carbon nanotube composites. Scientific Reports (2016).
[13] Bose, S. et al. Evolution of a designless nanoparticle network into reconfigurable Boolean logic. Nature Nanotechnology, doi:10.1038/nnano.2015.207 (2015).
[14] Chen, T. et al. Classification with a disordered dopant-atom network in silicon. Nature, 341–345 (2020).
[15] Schrauwen, B., Verstraeten, D. & Van Campenhout, J. An overview of reservoir computing: theory, applications and implementations. In Proceedings of the 15th European Symposium on Artificial Neural Networks (Citeseer, 2007).
[16] Verstraeten, D. & Schrauwen, B. On the quantification of dynamics in reservoir computing. In Artificial Neural Networks – ICANN 2009, 985–994 (Springer, 2009).
[17] Appeltant, L. et al. Information processing using a single dynamical node as complex system. Nature Communications, 468 (2011).
[18] Caravelli, F. & Carbajal, J. Memristors for the curious outsiders. Technologies, 118 (2018).
[19] Dion, G., Mejaouri, S. & Sylvestre, J. Reservoir computing with a single delay-coupled non-linear mechanical oscillator. Journal of Applied Physics, 152132 (2018).
[20] Prychynenko, D. et al. Magnetic skyrmion as a nonlinear resistive element: A potential building block for reservoir computing. Physical Review Applied, 014034 (2018).
[21] Pinna, D., Bourianoff, G. & Everschor-Sitte, K. Reservoir computing with random skyrmion textures. Physical Review Applied, 054020 (2020). URL https://link.aps.org/doi/10.1103/PhysRevApplied.14.054020.
[22] Torrejon, J. et al. Neuromorphic computing with nanoscale spintronic oscillators. Nature, 428–431 (2017).
[23] Nakane, R., Tanaka, G. & Hirose, A. Reservoir computing with spin waves excited in a garnet film. IEEE Access, 4462–4469 (2018).
[24] Romera, M. et al. Vowel recognition with four coupled spin-torque nano-oscillators. Nature, 230–234 (2018).
[25] Zheng, Q., Zhu, X., Mi, Y., Yuan, Z. & Xia, K. Recurrent neural networks made of magnetic tunnel junctions. AIP Advances, 025116 (2020).
[26] Watt, S. & Kostylev, M. Reservoir computing using a spin-wave delay-line active-ring resonator based on yttrium-iron-garnet film. Physical Review Applied, 034057 (2020).
[27] Zahedinejad, M. et al. Two-dimensional mutually synchronized spin Hall nano-oscillator arrays for neuromorphic computing. Nature Nanotechnology, 47–52 (2020).
[28] Pajda, M., Kudrnovský, J., Turek, I., Drchal, V. & Bruno, P. Ab initio calculations of exchange interactions, spin-wave stiffness constants, and Curie temperatures of Fe, Co, and Ni. Physical Review B, 174402 (2001).
[29] Weigend, A. The Santa Fe Time Series Competition Data: Data set A, Laser generated data (1991; accessed March 2016).
[30] Inubushi, M. & Yoshimura, K. Reservoir computing beyond memory-nonlinearity trade-off. Scientific Reports, 10199 (2017).
[31] Dale, M. et al. The role of structure and complexity on reservoir computing quality. In International Conference on Unconventional Computation and Natural Computation, 52–64 (Springer, 2019).
[32] Dale, M., O'Keefe, S., Sebald, A., Stepney, S. & Trefzer, M. A. Reservoir computing quality: connectivity and topology. Natural Computing (2020). doi:10.1007/s11047-020-09823-1.
[33] Rodan, A. & Tiňo, P. Simple deterministically constructed recurrent neural networks. In International Conference on Intelligent Data Engineering and Automated Learning, 267–274 (Springer, 2010).
[34] Larger, L. et al. Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing. Optics Express, 3241–3249 (2012).
[35] Hou, Y. et al. Prediction performance of reservoir computing system based on a semiconductor laser subject to double optical feedback and optical injection. Optics Express, 10211–10219 (2018).
[36] Paquot, Y. et al. Optoelectronic reservoir computing. Scientific Reports (2012).
[37] Dale, M., Miller, J. F., Stepney, S. & Trefzer, M. A. A substrate-independent framework to characterize reservoir computers. Proceedings of the Royal Society A, 20180723 (2019).
[38] Dambre, J., Verstraeten, D., Schrauwen, B. & Massar, S. Information processing capacity of dynamical systems. Scientific Reports (2012).
[39] Legenstein, R. & Maass, W. Edge of chaos and prediction of computational performance for neural circuit models. Neural Networks, 323–334 (2007).
[40] Jaeger, H. Short term memory in echo state networks (GMD-Forschungszentrum Informationstechnik, 2001).
[41] Demidov, V. E. et al. Magnetic nano-oscillator driven by pure spin current. Nature Materials, 1028–1031 (2012).
[42] Du, C. et al. Reservoir computing using dynamic memristors for temporal information processing. Nature Communications, 1–10 (2017).
[43] Kurenkov, A., Fukami, S. & Ohno, H. Neuromorphic computing with antiferromagnetic spintronics. Journal of Applied Physics, 010902 (2020).
[44] Jiles, D. Introduction to Magnetism and Magnetic Materials (CRC Press, 2015).
[45] Adamatzky, A. Game of Life Cellular Automata, vol. 1 (Springer, 2010).
[46] Lukoševičius, M. A practical guide to applying echo state networks. In Neural Networks: Tricks of the Trade, 659–686 (Springer, 2012).
[47] Dale, M., Miller, J. F., Stepney, S. & Trefzer, M. A. Evolving carbon nanotube reservoir computers. In International Conference on Unconventional Computation and Natural Computation, 49–61 (Springer, 2016).
[48] Dale, M. Neuroevolution of hierarchical reservoir computers. In Proceedings of the Genetic and Evolutionary Computation Conference, 410–417 (ACM, 2018).
[49] Bala, A., Ismail, I., Ibrahim, R. & Sait, S. M. Applications of metaheuristics in reservoir computing techniques: a review. IEEE Access, 58012–58029 (2018).
[50] Harvey, I. The microbial genetic algorithm. In European Conference on Artificial Life, 126–133 (Springer, 2009).
[51] Jaeger, H. The "echo state" approach to analysing and training recurrent neural networks, with an erratum note. Bonn, Germany: German National Research Center for Information Technology, GMD Technical Report, 34 (2001).
[52] Tran, S. D. & Teuscher, C. Memcapacitive reservoir computing. In , 115–116 (IEEE, 2017).
[53] Atiya, A. F. & Parlos, A. G. New results on recurrent network training: unifying the algorithms and accelerating convergence. IEEE Transactions on Neural Networks, 697–709 (2000).
[54] Ganguli, S., Huh, D. & Sompolinsky, H. Memory traces in dynamical systems. Proceedings of the National Academy of Sciences, 18970–18975 (2008).

Supplementary Material
Optimised Integration Time-step
To reduce the computational time of simulating thin-films, a large integrator time-step was used. Ideally, small time-steps are preferable to more accurately capture spin precession and the general dynamics between input pulses; however, this comes with a large computational cost.

A characterisation of how the integrator time-step affects task performance is given in Fig. 5, where the chosen 100 fs time-step is compared to the more accurate 1 fs time-step. These results show that, in general, the chosen time-step is statistically similar, representing a reasonably accurate model of the driven dynamics.

To test whether the medians for the two time-steps were significantly different, the non-parametric two-sided Wilcoxon rank-sum test was used. This tests the null hypothesis that both samples are from the same distribution with equal medians. A rejection of the null hypothesis at the 95% confidence level is indicated by a p-value < 0.05. The p-values for each task are: p = 0. (laser), p = 0. (NARMA-10), and p = 0. (NARMA-30). This indicates that performance is not significantly affected by the change in time-step, while computational time is reduced dramatically from hours to minutes.

Figure 5: Comparing integrator time-steps across three tasks (laser, NARMA-10 and NARMA-30; NMSE). 100 fs provides a less accurate model compared to 1 fs but dramatically reduces run-time. Only the Co material was used in this experiment. Each boxplot shows a total of 30 random configurations for each task, compared at both time-steps.
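The rank-sum comparison above can be reproduced in a few lines. The sketch below uses the normal approximation (without tie correction) rather than the exact test, and is applied to synthetic samples, not the paper's NMSE data:

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation
    (no tie correction), adequate for samples of ~30 values as in the
    time-step comparison."""
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    ranks = {}
    i = 0
    while i < len(pooled):            # assign average ranks over tied groups
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    R1 = sum(ranks[v] for v in x)     # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2       # mean of R1 under the null hypothesis
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (R1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
```

A p-value < 0.05 rejects equal medians at the 95% confidence level; a large p-value mirrors the conclusion above that the 100 fs and 1 fs runs are statistically similar.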
Kernel Rank and Memory Capacity
In this work, two property measures are used to assess the underlying dynamics of the reservoir system: the kernel rank (KR) and the linear memory capacity (MC).

Kernel rank is a measure of the reservoir's ability to separate distinct input patterns [39]. It measures the reservoir's ability to produce a rich non-linear representation of the input u(t) and its history u(t − 1), u(t − 2), .... This is closely linked to the linear separation property, which measures how different input signals map onto different reservoir states. As many practical tasks are linearly inseparable, reservoirs typically require some non-linear transformation of the input; KR measures the complexity and diversity of these non-linear operations. Reservoirs in ordered dynamical regimes typically have low KR, and those in chaotic regimes high KR. The maximum value of KR is relative to the number of observable states; in our experiments, KR is normalised to observe the underlying non-linearity of the task without distortion from reservoir size.

Another important property for reservoir computing is memory, as reservoirs are typically configured to solve temporal problems. A simple measure of reservoir memory is the linear short-term memory capacity (MC), first outlined in [40] to quantify the echo state property. For the echo state property to hold, the dynamics of the input-driven reservoir must asymptotically wash out any information resulting from initial conditions. This property therefore implies that a fading memory exists, characterised by the short-term memory capacity.

A full understanding of a reservoir's memory capacity, however, cannot be encapsulated by a linear memory measure alone, as a reservoir will also possess some non-linear memory. Other measures proposed in the literature quantify other aspects of memory, such as the quadratic and cross-memory capacities, and the total memory of reservoirs using the Fisher Memory Curve [54, 38]. The linear measure is used here as a simple benchmark; more sophisticated measures are unnecessary to identify the differences in the following tasks.
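As a concrete illustration of how KR and MC are measured, the sketch below drives a small echo state network (a stand-in for the magnetic film; the size and scalings are illustrative, not those of our experiments) with random input, takes the rank of the collected state matrix for KR, and sums the squared correlations of linear reconstructions of delayed inputs for MC:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50                                        # illustrative reservoir size
Win = rng.uniform(-1, 1, (N, 1))              # input weights
W = rng.uniform(-0.5, 0.5, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius 0.9

def run(u):
    """Collect reservoir states driven by input sequence u."""
    x, X = np.zeros(N), []
    for ut in u:
        x = np.tanh(Win[:, 0] * ut + W @ x)
        X.append(x.copy())
    return np.array(X)

# Kernel rank: rank of the state matrix under random drive, normalised by N
U = rng.uniform(-1, 1, 500)
KR = np.linalg.matrix_rank(run(U)) / N

def memory_capacity(max_delay=2 * N, T=2000, washout=100):
    """MC = sum over delays k of r^2 between u(t-k) and its best
    linear reconstruction from the reservoir states."""
    u = rng.uniform(-1, 1, T)
    X = run(u)
    mc = 0.0
    for k in range(1, max_delay + 1):
        Xk, yk = X[washout:], u[washout - k:T - k]
        w, *_ = np.linalg.lstsq(Xk, yk, rcond=None)
        mc += np.corrcoef(Xk @ w, yk)[0, 1] ** 2
    return mc

MC = memory_capacity()
```

For the films, the observable states would be the macro-cell spin components rather than ESN node activations, but the two measures are computed in the same way.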
Figure 6: Kernel rank (normalised) versus linear memory capacity for all materials, sizes and tasks. Each column refers to a film size (left, middle, right: 49, 100, 225) and each row to a task (top, middle, bottom: laser, NARMA-10, NARMA-30). The material reservoirs shown are those displayed in Fig. 2.

In Fig. 3, only the results for the NARMA-30 task are given; the results for all tasks and sizes are provided in Fig. 6. From these results, it is possible to determine the difference in dynamics required for each task.

The laser task requires very little memory and is mainly driven by non-linear dynamics. The normalised KR of 0.5 is relatively high considering that many of the magnetic materials' observable states are highly correlated, e.g., the x and y components of the spins.

The NARMA-10 task features more linear tendencies. Memory capacity clusters around the value of 10, corresponding to the 10 time-step time-lag present in the system being modelled. Irrespective of size, the same characteristic dynamics have converged during evolutionary selection, and all materials are able to exhibit them.

The NARMA-30 task requires a notable increase in memory capacity. At the smallest size, no material meets the necessary criterion (MC = 30) to perform well at the task; Co and Ni attempt to maximise their MC, while Fe struggles to exhibit any memory. As size increases, Co and Ni gradually reach MC = 30, and this is reflected in their performance. The MC of Fe also increases, but at a slower rate proportional to size.

Temperature Effect and Film Thickness
To build practical computing systems, it is desirable for the materials to function close to room temperature. In addition, thicker films put less strain on the fabrication process. In our main experiments, each material film was evolved at 0 K to evaluate performance without thermal fluctuations. Here, we show how temperature affects performance at all film sizes (numbers of cells) and across each task.

For the laser task (Fig. 7), performance is stable, and competitive with random ESNs of equivalent node size, at higher temperatures, typically up to 100 K, depending on the material and number of cells. The most stable material and film size is Fe at 100 cells; in this configuration, only a small change in performance is present as thickness is increased up to 1 nm.

For the NARMA-10 task, performance is again stable, in some cases up to 100 K, e.g., Co with 100 cells. As temperature increases, performance tends to drop off slightly faster than for the laser task. This could be due to degradation in memory quality as thermal noise increases. In general, the results suggest the Co material responds better to increased temperatures; however, thicker films tend to be more detrimental to performance. The same trends are seen for the NARMA-30 task.
Interference and Reflective Boundaries
The proposed system exploits the nonlinear interactions of spins when perturbed by local magnetic fields. As information propagates, locally coupled spins form wave crests and troughs that interact, creating interference patterns. At the boundaries, waves are reflected back into the film. Figs. 10 and 11 visualise this dynamical behaviour for two Co films (49 cells and 900 cells). At t = 10, a single input pulse is supplied to two separate input locations. In the following time-steps, waves appear and propagate. The smaller film (Fig. 10) interacts almost instantly with the boundaries, and waves reverberate around the film for some time. In the larger film (Fig. 11), signals propagate undisturbed for longer, until the wave crests reach each other and the boundaries. At later time-steps, interference and reflected waves begin to dominate; however, memory of past inputs is still recoverable according to the memory capacity measure.
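The interference-and-reflection behaviour can be illustrated with a toy model. The sketch below integrates a simple 2D scalar wave equation with fixed (reflecting) edges; it is not the atomistic spin dynamics solved by VAMPIRE, only a qualitative picture of two pulses spreading, interfering and bouncing off the boundaries:

```python
import numpy as np

def toy_wave(n_cells=49, steps=12, c2=0.2):
    """Toy 2D wave equation on an n_cells film with fixed (reflecting)
    edges; a qualitative stand-in for the spin-wave dynamics, not the
    LLG equations. Two unit pulses mimic the two input locations."""
    side = int(round(n_cells ** 0.5))
    z_prev = np.zeros((side, side))
    z = np.zeros((side, side))
    z[1, 1] = z[side - 2, side - 2] = 1.0     # two input pulses at t = 10
    for _ in range(steps):
        # discrete Laplacian (neighbour coupling)
        lap = (np.roll(z, 1, 0) + np.roll(z, -1, 0)
               + np.roll(z, 1, 1) + np.roll(z, -1, 1) - 4 * z)
        z_next = 2 * z - z_prev + c2 * lap    # leapfrog wave update
        # pin the edges: waves reflect back into the film
        z_next[0, :] = z_next[-1, :] = z_next[:, 0] = z_next[:, -1] = 0
        z_prev, z = z, z_next
    return z

field = toy_wave()
```

Plotting the field at successive steps reproduces the qualitative picture of Figs. 10 and 11: early free propagation, then interference where the crests meet, then boundary reflections.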
Figure 7: Temperature and film thickness sweeps for the laser task. Normalised mean square error (colour) of an evolved configuration for Ni, Co, Fe at 49, 100 and 225 cells. Box plots (white) display the performances of the 20 best random ESNs for this task. A white diamond signals that the performance of the film is within the ESN range; a yellow diamond signals that it is better than the ESNs.
Figure 8: Temperature and film thickness sweeps for the NARMA-10 task. Normalised mean square error (colour) of an evolved configuration for Ni, Co, Fe. Box plots (white) display the performances of the 20 best random ESNs for this task. A white diamond signals that the performance of the film is within the ESN range; a yellow diamond signals that it is better than the ESNs.
Figure 9: Temperature and film thickness sweeps for the NARMA-30 task. Normalised mean square error (colour) of an evolved configuration for Ni, Co, Fe. Box plots (white) display the performances of the 20 best random ESNs for this task. A white diamond signals that the performance of the film is within the ESN range; a yellow diamond signals that it is better than the ESNs.
Figure 10: Dynamics of a Co magnetic film with 49 cells. An input pulse is supplied at two locations on the film at t = 10; panels show t = 11 to t = 22. Red indicates a positive magnetisation, and blue, negative.

Figure 11: Dynamics of a Co magnetic film with 900 cells. An input pulse is supplied at two locations on the film at t = 10; panels show t = 11 to t = 22.