Reservoir Computing with Thin-film Ferromagnetic Devices
Matthew Dale, Richard F. L. Evans, Sarah Jenkins, Simon O'Keefe, Angelika Sebald, Susan Stepney, Fernando Torre, Martin Trefzer
Department of Computer Science, University of York, UK; Department of Physics, University of York, UK; Department of Chemistry, University of York, UK; Department of Electronic Engineering, University of York, UK
* [email protected]

Abstract
Advances in artificial intelligence are driven by technologies inspired by the brain, but these technologies are orders of magnitude less powerful and energy efficient than biological systems. Inspired by the nonlinear dynamics of neural networks, new unconventional computing hardware has emerged with the potential for extreme parallelism and ultra-low power consumption. Physical reservoir computing demonstrates this with a variety of unconventional systems, from optical-based to spintronic [1]. Reservoir computers provide a nonlinear projection of the task input into a high-dimensional feature space by exploiting the system's internal dynamics. A trained readout layer then combines features to perform tasks such as pattern recognition and time-series analysis. Despite progress, achieving state-of-the-art performance without external signal processing to the reservoir remains challenging. Here we show, through simulation, that magnetic materials in thin-film geometries can realise reservoir computers with accuracy greater than or similar to that of digital recurrent neural networks. Our results reveal that the basic spin properties of magnetic films generate the required nonlinear dynamics and memory to solve machine learning tasks. Furthermore, we show that neuromorphic hardware can be reduced in size by removing the need for discrete neural components and external processing. The natural dynamics and nanoscale size of magnetic thin-films present a new path towards fast, energy-efficient computing, with the potential to innovate portable smart devices, self-driving vehicles, and robotics.
Main
Performing machine learning at 'the edge' is a growing area of interest, where inference is performed locally in real time [2, 3, 4]. Embedded devices that can perform complex information processing without the need to outsource to remote servers are ideal for real-time applications. However, current systems are limited by processing speeds, memory, size, and power consumption. Unconventional hardware is a potential alternative to classical computing hardware, with low energy consumption, inherent parallelism, and no separation between processor and memory (the von Neumann bottleneck) [5]. Neuro-inspired hardware [6] is one route to embed machine learning at the edge; another is to exploit embodied computation in novel dynamical systems.

By design, neural-based hardware implements the abstract behaviour of neurons and their connectivity at the lowest circuit level, e.g. weighted summation, threshold functions, synapses. This typically requires a combination of simpler components to implement the model. For example, replicating a neuron-synapse circuit with conventional complementary metal–oxide–semiconductor technology takes tens to hundreds of transistors per neuron [7, 8]. Another option is to force the neuron model directly onto the material to improve energy efficiency and reduce the physical footprint, yet model constraints may require the removal of useful natural properties (e.g. variability in components) or additional engineering [9]. Here we demonstrate an alternative approach that exploits the dynamical behaviours of neural systems without the direct implementation of neural units, allowing further reductions in size and improvements in efficiency. Dynamical properties that occur naturally within complex materials, such as memory, nonlinear oscillation, and chaos, can be directly exploited for computation, with less top-down engineering of the material.
However, the discovery and control of intractable or unknown material properties raises new challenges.

Two novel approaches have been proposed to exploit the embodied computation of materials: evolution in materio and reservoir computing. Miller and Downing [10] proposed using artificial evolution as a mechanism to exploit and configure materials, arguing that natural evolution is the method par excellence for exploiting the physical properties of materials. Evolution in materio uses computer-controlled manipulation of external stimuli to configure the material and its input-output mapping, using digital computers to directly evolve physical material configurations. A range of materials have been evolved to perform classification, real-time robot control, and pattern recognition [11, 12, 13, 14].

Reservoir computing is a neuro-inspired framework that harnesses the high-dimensionality and temporal properties of recurrent networks and novel systems [15, 16]. Physical implementations of the reservoir model are diverse [17, 18, 19], with recent spintronic reservoirs showing some key advantages compared to other systems, combining GHz+ operating frequencies, ultra-compact size, and ultra-low energy consumption [20, 21, 22, 23, 24, 25, 26, 27].

Here we demonstrate material computation with ferromagnetic materials in thin nano-film geometries, combining both evolution in materio and reservoir computing methods. The reservoir model is used to harness the propagation of information through magnetic films, and artificial evolution is used to optimise reservoir parameters. Using open-source simulation software, we evolve three ferromagnetic materials to solve three time-dependent tasks of increasing complexity. All materials are evaluated at various film sizes with direct comparisons to equivalent-sized recurrent neural networks. The magnetic system is then characterised by metrics to understand the dynamical properties of each material.
Lastly, the effects of temperature and film size are explored to inform future physical implementations.

Reservoir computers are composed of three layers: input, reservoir, and output (Fig. 1a). A reservoir, typically a fixed random network of discrete processing nodes with recurrent connections, features non-linear characteristics and a short-term memory. The reservoir network is driven by a time-varying input u that propagates through a random input mapping via connection weights W_in (see Methods). The non-linear reservoir provides a high-dimensional projection of the input from which a subsequent linear readout layer can extract features relevant to the problem task. Training occurs only at the readout through trained weighted connections W_out connecting observable states to the final output. Typically, one-shot learning is used through linear regression, making learning extremely fast.

Fig. 1b details the layout of the proposed magnetic system and its reservoir representation. The film does not possess any discrete processing nodes; our representation of the system defines discrete "cells" for the purpose of input and output locations. The film is conceptually divided into a grid of magnetic cells; each cell is connected to a time-varying input signal source and a bias source via weighted connections W_in. The output of each cell is represented by a three-dimensional magnetisation vector X_xyz. This approach models a grid of nano-contacts across the film, measuring a low-resolution snapshot of the film's magnetic state.

The reservoir thin films are simulated micromagnetically, where the atomistic detail is coarse-grained into 5 nm cells (see Methods). Here we consider three simple ferromagnetic metals: cobalt (Co), nickel (Ni), and iron (Fe). The atomic magnetic properties of these materials are well understood from first-principles calculations [28], providing a detailed insight into microscopic and macroscopic magnetic behaviour.
These metals are abundant in nature, inexpensive, and highly stable.

As a thin film, the reservoir is highly structured. The influence each cell has on its nearest neighbours is determined by the physical properties of the exchange, anisotropy, and dipole Hamiltonians (see Methods). The exchange interactions dominate over short lengthscales, meaning that cells have finite temporal and spatial correlations over the total sample size. Fig. 1c shows a typical simulated micromagnetic response to three input pulses at the film's centre. When perturbed, spin waves propagate through the film, inducing reflections, oscillations, and interference patterns. At the edges, a similar characteristic response is seen per impulse, but with some contributions from previous stimuli.

To exploit the fast spin dynamics of the ferromagnetic materials, data inputs are applied at 10 ps intervals (100 GHz). Selecting a suitable input timescale depends on the material's dynamics. An input faster or slower than the system's intrinsic timescale alters the temporal dynamics and thus can affect settling times, refractory periods, and memory in the system. The inherent volatility and nonlinear dynamics of the spin precession provide a temporal mapping of the input into different reservoir states.
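The spin precession underlying this temporal mapping is governed by the Landau–Lifshitz–Gilbert equation used in the Methods (eq. 6). A minimal single-macrospin sketch in reduced (dimensionless) units, not the VAMPIRE multi-cell solver, illustrates damped precession towards an applied field; the Heun integration scheme and all parameter values are illustrative choices:

```python
import numpy as np

def llg_step(m, H, gamma=1.0, lam=0.1, dt=0.01):
    """One Heun (predictor-corrector) step of the LLG equation in reduced units:
    dm/dt = -gamma/(1+lam^2) * [ m x H + lam * m x (m x H) ]."""
    def dmdt(m):
        mxH = np.cross(m, H)
        return -gamma / (1 + lam**2) * (mxH + lam * np.cross(m, mxH))
    m_pred = m + dt * dmdt(m)                         # predictor
    m_new = m + 0.5 * dt * (dmdt(m) + dmdt(m_pred))   # corrector
    return m_new / np.linalg.norm(m_new)              # keep |m| = 1

# A spin initially along x precesses about a field along z and,
# through the damping term, gradually aligns with the field.
m = np.array([1.0, 0.0, 0.0])
H = np.array([0.0, 0.0, 1.0])
for _ in range(20000):
    m = llg_step(m, H)
```

The precession term alone conserves the angle to the field; it is the damping term (weighted by λ) that relaxes the moment towards H, which is why settling times in the film depend on the damping parameter α tuned later in the paper.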
Figure 1: a) Reservoir computing model split into input, reservoir and output layers connected by adjustable weights. The reservoir is self-contained, typically featuring a sparse, recurrent network of processing nodes. b) Schematic of our simulated thin-film magnetic reservoir system, consisting of micromagnetic cells derived from atomistic values. Global input sources u connect via weights W_in to drive local magnetisation fields, inducing spin oscillations. Each cell's average magnetic moment produces a 3-d orientation vector X_xyz forming a reservoir state. States are then combined via a linear readout function W_out to produce the final system output y. c) Impulse response of the micromagnetic spin system. Signal injected in the centre of the film via the z-axis at 25 time-step intervals with 10 ps scanning frequency.

To evaluate the materials, three temporal tasks are used. The Santa Fe chaotic laser time-series prediction dataset [29] is chosen for its nonlinear properties and periodic structure, and the nonlinear autoregressive moving average model (NARMA) with lags of 10 (NARMA-10) and 30 (NARMA-30) time-steps is chosen to evaluate the film's ability to manage the nonlinearity-memory trade-off [30]. Each benchmark increases in difficulty, demonstrating the film's dynamic range and ability to perform increasingly complex tasks.

Our experimental results show the investigated materials are competitive with state-of-the-art reservoir networks, and typically outperform small networks with equivalent reservoir size. Fig. 2 shows the performance of each material at three film sizes. Four types of recurrent neural networks are provided for the comparison, including random and evolved networks, and networks with limited connectivity.
As reservoir-internal connections are typically random, baseline comparisons to random networks are included. Highly structured networks, such as a lattice, more accurately model the material crystal structure. Lattice networks with recurrent connections have been shown to be dynamically similar to less restrictive recurrent neural networks, but often have to compensate with larger network size [31, 32, 33].

Figure 2: Performance of materials and simulated reservoir networks on benchmark tasks. Normalised mean square error (NMSE) is used to compare equivalent-sized reservoirs. Multiple reservoir sizes are displayed in columns, and each task is divided into rows. Each type of system is represented by colour (lattice reservoir = purple; echo state reservoir = green; material = orange). The method used to create the reservoirs is given on the x-axis (random or evolved). For random search, the best reservoir from a batch of 2000 instances is shown, repeated over 20 batches. For evolved, the final evolved reservoirs are given from 20 evolutionary runs of 2000 evaluations each.

For the laser task (Fig. 2, top row), all materials significantly outperform random networks at small sizes. At the largest size (225 nodes, right column), only Co outperforms random networks; however, Ni and Fe remain statistically similar. At the smallest film size, all materials outperform evolved networks. At 100 nodes, only Co outperforms evolved networks, achieving the smallest normalised mean square error (NMSE) found, while Ni and Fe remain statistically similar to evolved networks. For the laser task, even the smallest magnetic reservoirs here outperform larger material reservoirs reported in the literature [34, 35].

For the NARMA-10 task (Fig. 2, middle row), all materials outperform random networks at small sizes. At 225 nodes, all materials are statistically similar to random lattices but worse than other networks. In some cases, materials are better than, or similar to, evolved networks, which have unrestricted access to long-distance connections. The lowest material errors found on this task are achieved by Co at every size, including NMSE = 0.032 (100 nodes) and 0.025 (225 nodes). These are highly competitive with, or outperform, other material reservoirs reported in the literature, such as optoelectronic (50 nodes [36]) and digital delay-line (400 nodes [17]) reservoirs.

For the NARMA-30 task (Fig. 2, bottom row), the difference between materials becomes clear. Each material performs differently, with Co being able to better match the dynamics of the task. Across all sizes, Co is competitive with random and evolved networks, with its lowest error found at 225 nodes. Ni and Fe struggle to compete with other networks at small sizes; nevertheless, as size increases, NMSE decreases. This suggests that these materials require larger films to exhibit the necessary dynamics to perform the tasks.

The NARMA-30 results show a strong distinction between the materials, despite their similar performances on other tasks. To understand this further, task-independent measures are used to assess non-linearity and memory. These measures better determine the general underlying dynamics of the system than tasks can achieve alone.
They have been used to qualitatively assess the dynamical range of materials for reservoir computing [37, 31] and to determine a system's total information processing capacity [38]. Here, the non-linear projection and short-term memory are measured using the kernel rank (KR) [39] and linear memory capacity (MC) [40] of the reservoir (see Supplementary Material). Fig. 3 shows values of these measures for each of the material reservoirs used in the NARMA-30 task (see Supplementary Material for all tasks). The Co material tends to cluster at low normalised KR and large MC, suggesting it exploits a weak non-linearity and a large memory to perform the task, which corresponds to the known dynamics of the task (see eq. 13). Ni typically has smaller memory than Co but larger than Fe, explaining its intermediate performance. Fe features small values of both KR and MC across all sizes; however, as size increases, both measures slowly move towards values representative of more desirable dynamics. This change, relative to the increase in size, mirrors the gradual decrease in error shown in Fig. 2.

Figure 3: Normalised kernel rank (KR) and linear memory capacity (MC) of evolved films across three sizes, for the NARMA-30 task. Materials are separated by colour – Nickel (red), Cobalt (blue), and Iron (black). To perform well on the NARMA-30 task, MC should be close to the driving equation's time-lag of 30, which in turn requires more linear behaviour (i.e., a low KR). The Ni and Co materials do this well; Fe does not. Only at larger film sizes does Fe grow in memory capacity.

Task performances and the KR/MC measures indicate that several trade-offs exist. First, smaller films generally show better performance than similarly sized digital reservoirs. This suggests properties of small films, such as shorter distances between edges, may improve performance. Interference and reflection of travelling spin waves from the edges are likely to increase as size decreases. The geometry of the film is also likely to have an effect. In our experiments, only square films are used; other shapes can provide greater asymmetry at the boundaries. Second, depending on the material, larger films can boost desirable dynamical properties such as memory. A large surface area enables signals to persist unperturbed away from rapidly changing input sources. Exploiting geometry, size, and inputs to control these trade-offs is of great interest for future work.
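The linear memory capacity quoted above can be estimated for any reservoir by training a separate linear readout to reconstruct each delayed copy u(t−k) of a random input and summing the squared correlation coefficients over delays [40]. A minimal sketch for a generic echo state network; the network size, scalings, and delay range are illustrative assumptions, not the settings used for the films:

```python
import numpy as np

def memory_capacity(n_nodes=50, max_delay=20, T=2000, washout=100, seed=0):
    """Estimate MC = sum_k r^2( u(t-k), y_k(t) ) for a small random
    tanh echo state network driven by a uniform random input."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1, 1, T)
    W_in = rng.uniform(-0.5, 0.5, n_nodes)
    W = rng.normal(0, 1, (n_nodes, n_nodes))
    W *= 0.9 / max(abs(np.linalg.eigvals(W)))   # spectral radius below 1

    # Collect reservoir states driven by the input.
    x = np.zeros(n_nodes)
    states = np.zeros((T, n_nodes))
    for t in range(T):
        x = np.tanh(W_in * u[t] + W @ x)
        states[t] = x

    mc = 0.0
    X = states[washout:]                            # discard washout
    for k in range(1, max_delay + 1):
        target = u[washout - k:T - k]               # delayed input u(t-k)
        w = np.linalg.lstsq(X, target, rcond=None)[0]  # linear readout
        r = np.corrcoef(X @ w, target)[0, 1]        # in-sample correlation
        mc += r ** 2
    return mc
```

Each delay contributes at most 1 to the sum, so the estimate here is bounded by the delay range scanned; in theory MC is bounded by the number of observable states, which is why larger films can hold more memory.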
The simulated platform is realisable in physical hardware. Fig. 4a shows a proposed interface applying a local time-varying field B(t) to each region of the device.

With any new reservoir system, an ability to scale hardware components and reduce error is desired. In our experiments, each material exhibits a significant improvement as film size increases, despite its restrictive lattice topology and lack of predefined discrete processing nodes. The greatest improvements relate to the difficulty of the task, where distinct trade-offs in non-linearity and memory are required. The most significant differences between material and size are shown for the NARMA tasks, where memory is a strong indicator of performance.

To assess scaling potential, additional evolutionary searches are conducted with the Co material for larger systems. In order to compare material scaling with digital reservoirs, equivalent-sized networks are evolved as well. Fig. 4b shows NARMA-10 task performance as film and reservoir size increases. Scaling begins at 25 material cells/network nodes up to 900 cells/nodes, representing film dimensions (D) of 25 nm up to 150 nm: D = √(N_cells) × cell size. The results show that up to 400 cells/nodes there is a significant reduction in the average error as size increases. After this, the median error is no longer significantly different; however, lower errors continue to be found in the best runs. This could indicate that larger films with lower errors are more challenging to discover, or that potentially beneficial properties of small films are lost, such as the interaction of reflections from edges.

At the nanoscale, thermal noise is a limiting factor. Maintaining performance close to room temperature is desirable for practical implementations, as stability and reproducibility can be adversely affected by thermal noise. In our experiments, temperature is set to zero kelvin to observe pure magnetic behaviour without thermal effects.
Methods to control and reduce thermal fluctuations have been proposed using spin transfer torque to modify thermal activation rates [41]. This suggests different paths towards room-temperature computing with thin films, without cooling, are plausible.

Figure 4: a) Proposed hardware interface to realise a thin-film reservoir computing device. b) Performance of the Co material on the NARMA-10 task as the number of cells increases. Performance of the material remains competitive with scaled simulated reservoirs. c) Grid sweep of film temperatures (K) and film thickness (nm). The NMSE of the evolved Co configuration is shown using colour. Errors are for the NARMA-30 task. Temperature ranges from 0 K (original experiments) to more practical temperatures including room temperature (300 K). White box-plots in the colour bar display performances of the 20 best random ESNs at the respective size. A white diamond in a cell signals task error is within the ESN range.

To demonstrate the effect of temperature on our films, additional experiments are conducted. Fig. 4c shows reservoir performance at various temperatures on the NARMA-30 task. The temperature range includes millikelvin temperatures, liquid helium (4.2 K), liquid nitrogen (77 K), and room temperature (300 K). The top-left shows the original experimental setup (temperature = 0 K, sub-nanometre thickness) for an evolved Co reservoir. As temperature increases along the x-axis, thermal noise dominates and degrades performance. A similar pattern is present across all film sizes, tasks and materials (see Supplementary Material).

Film thickness is also investigated, to see whether thickness can compensate for a rise in temperature. On the y-axis of Fig. 4c, film thickness varies from 0.1–2 nm. In general, performance is maintained at sub-nanometre thicknesses and temperatures up to 30–77 K. Between 0.5–1 nm, the change in error slows as temperature rises (30 to 200 K); however, errors are higher than for thinner films. Beyond this, thicker films tend to degrade performance, but this varies depending on material and film size (see Supplementary Material). The results show that films with sub-nanometre thickness at temperatures up to 100 K work best, outperforming or matching equivalent-sized random reservoir networks.

Our spintronic-based system provides an exceptional platform for machine learning with analogue hardware. By combining two frameworks, evolution in materio and reservoir computing, novel magnetic computing devices are demonstrated. Without the need for discrete neural components, physical reservoirs are possible with smaller footprints than other neuromorphic devices, e.g., memristors, spin torque oscillators, photonics [42, 22, 24]. The evolved devices operate at frequencies of 100 GHz and require no special preprocessing to emulate network structures [17, 22]. The basic materials used are inexpensive and feature a large dynamical range that can be reconfigured externally to solve different machine learning tasks.

With this generic platform, other complex magnetic materials such as alloys, oxides, skyrmion fabrics, and antiferromagnetic reservoirs [43] can be optimised and exploited. Furthermore, simulations of complex atomic structures are possible.
With atomistic simulations, desirable heterostructures or defects can be introduced to add more reservoir complexity and greater physical realism.

The natural dynamics and nanoscale size of the proposed magnetic substrates present a new path towards fast, energy-efficient computing platforms, enabling new innovations in smart technologies.
Methods

For a generic atomistic model with n nearest-neighbour interactions, the Curie temperature T_C can be calculated from the atomistic exchange J_ij by the mean-field expression, summing over every exchange interaction that occurs in each cell to calculate the total exchange [44]:

$$T_C = \frac{\varepsilon}{3 k_B N_c} \sum_{i=0}^{N_c} \sum_{j=0}^{n} J_{ij} \qquad (1)$$

where $k_B$ is the Boltzmann constant, $N_c$ is the number of atoms per cell, and $\varepsilon$ is a correction factor to the usual mean-field expression which arises due to spin waves in the 3D Heisenberg model.

The anisotropy $k_u$ and the spontaneous magnetisation $M_s$ are calculated as a sum of the atomic anisotropies and spin moments within each cell. The gyromagnetic ratio $\gamma$ and the damping constant $\alpha$ are calculated as an average of the atomic parameters for each cell.

The energetics of the micromagnetic system are described using a spin Hamiltonian, neglecting non-magnetic contributions, given by:

$$\mathcal{H}_{\mathrm{eff}} = \mathcal{H}_{\mathrm{app}} + \mathcal{H}_{\mathrm{ani}} + \mathcal{H}_{\mathrm{exc}} + \mathcal{H}_{\mathrm{dip}} \qquad (2)$$

where $\mathcal{H}_{\mathrm{app}}$ is the applied field, $\mathcal{H}_{\mathrm{ani}}$ is the anisotropy field, $\mathcal{H}_{\mathrm{exc}}$ is the intergranular exchange, and $\mathcal{H}_{\mathrm{dip}}$ is the dipole field.

The anisotropy Hamiltonian describes the directional dependence of the material's magnetisation; in this case the anisotropy is uniaxial along $z$ and is described by:

$$\mathcal{H}_{\mathrm{ani}} = K V \left( m_x^2 + m_y^2 \right) \qquad (3)$$

The exchange field is calculated as a sum of the exchange interactions between neighbouring cells. The micromagnetic exchange constant $A$ is a sum over all atoms which have neighbours in another cell; the summation over all the interactions gives a total interaction from cell $i$ to cell $j$, from which the micromagnetic exchange constant is calculated by multiplying by the distance between the atomistic atoms.
$$\mathbf{H}^{i}_{\mathrm{exc}} = \frac{A_{ij}}{M_S \Delta^2} \sum_{j \in \mathrm{cells}} \left( \mathbf{m}_j - \mathbf{m}_i \right) \qquad (4)$$

$$\mathbf{H}_{\mathrm{dip}} = \frac{\mu_0}{4\pi} \frac{3 (\mathbf{m} \cdot \hat{\mathbf{r}}) \hat{\mathbf{r}} - \mathbf{m}}{|\mathbf{r}|^3} - \frac{\mu_0 \mathbf{m}}{3V} \qquad (5)$$

The atomistic Landau–Lifshitz–Gilbert (LLG) equation is used to model the time-dependent behaviour of the magnetic films, given by:

$$\frac{\partial \mathbf{m}_i}{\partial t} = -\frac{\gamma}{1 + \lambda^2} \left[ \mathbf{m}_i \times \mathbf{H}^{i}_{\mathrm{eff}} + \lambda\, \mathbf{m}_i \times \left( \mathbf{m}_i \times \mathbf{H}^{i}_{\mathrm{eff}} \right) \right] \qquad (6)$$

where $\mathbf{m}_i$ is a unit vector representing the direction of the magnetic spin moment of cell $i$, $\gamma$ is the gyromagnetic ratio, and $\mathbf{H}^{i}_{\mathrm{eff}}$ is the net magnetic field on each cell, equal to the derivative of the spin Hamiltonian:

$$\mathbf{H}^{i}_{\mathrm{eff}} = -\frac{1}{\mu_s} \frac{\partial \mathcal{H}_{\mathrm{eff}}}{\partial \mathbf{S}_i} \qquad (7)$$

The reservoir dynamics of the simulated networks are given by the state update equation:

$$\mathbf{x}(t) = (1 - a)\, \mathbf{x}(t-1) + a f\big( b\, W_{\mathrm{in}} [\mathbf{u}(t); u_{\mathrm{bias}}] + c\, W \mathbf{x}(t-1) \big) \qquad (8)$$

where $\mathbf{x}$ is the internal state at time-step $t$, $f$ is the non-linear neuron activation function (a tanh function), $\mathbf{u}$ is the input signal, and $u_{\mathrm{bias}}$ is a bias source. $W_{\mathrm{in}}$ and $W$ are weight matrices giving the connection weights to inputs and internal neurons respectively. The parameters $b$ and $c$ control the global scaling of the input weights and internal weights. Input scaling $b$ affects the non-linear response of the reservoir and the relative effect of the current input. Internal scaling $c$ controls the reservoir's stability as well as the influence and persistence of the input: low values dampen internal activity and increase the response to input, while high values lead to chaotic behaviour. A leakage filter $a$ is used to match the internal timescales of the film to the characteristic timescale of the task; this is similar to adding a low-pass filter before the output. The leak rate controls the timescale mismatch between the input and reservoir dynamics; when $a = 1$, the previous states do not leak into the current states.

For both random and evolved reservoir networks, $W_{\mathrm{in}}$ and $W$ are initialised as sparse, normally distributed random matrices (mean = 0, variance = 1). For the lattice network, we define a square grid of neurons, each connected to its nearest neighbours in its Moore neighbourhood [45]. Each non-perimeter node has eight connections to neighbours and one self-connection, resulting in each node having a maximum of nine adaptable weights in $W$.

The final trained output $\mathbf{y}(t)$ is given when the reservoir states $\mathbf{x}(t)$ are combined with the trained readout weight matrix $W_{\mathrm{out}}$:

$$\mathbf{y}(t) = W_{\mathrm{out}}\, \mathbf{x}(t) \qquad (9)$$

Readout training is performed using ridge regression [46] and occurs within the evolutionary loop during the training phase. A validation and testing phase is carried out to evaluate the generalisation of the readout to new data.
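The leaky state update of eq. (8) and a ridge-trained readout as in eq. (9) can be sketched for a generic echo state network. The sizes, scalings (a, b, c), toy recall task, and regularisation value below are illustrative assumptions, not the evolved settings from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 50, 1000                      # reservoir nodes, time-steps
a, b, c = 0.8, 0.5, 0.9              # leak rate, input scaling, internal scaling

u = rng.uniform(0, 0.5, T)           # input signal
W_in = rng.normal(0, 1, (N, 2))      # weights for [u(t); u_bias]
W = rng.normal(0, 1, (N, N))
W *= 1.0 / max(abs(np.linalg.eigvals(W)))   # normalise spectral radius

# Eq. (8): leaky state update with tanh activation.
x = np.zeros(N)
states = np.zeros((T, N))
for t in range(T):
    pre = b * W_in @ np.array([u[t], 1.0]) + c * W @ x
    x = (1 - a) * x + a * np.tanh(pre)
    states[t] = x

# Eq. (9) with ridge regression: solve (X^T X + beta I) W_out = X^T Y.
target = np.roll(u, 1)               # toy task: recall the previous input
beta = 1e-6                          # ridge regularisation strength
X = states[50:]                      # discard washout period
Y = target[50:]
W_out = np.linalg.solve(X.T @ X + beta * np.eye(N), X.T @ Y)
y_pred = X @ W_out
```

Because only W_out is trained, the whole fit is a single linear solve; this is the "extremely fast" one-shot learning the reservoir framework relies on, whether the states come from a simulated network or from magnetisation measurements of a film.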
This approach is similar to previous work [47, 48].

During the simulation, material parameters such as exchange interactions, anisotropies, and atomic moments are defined by the material and remain unaltered. Parameters controlling the input mapping, field intensity $b$, intrinsic magnetic damping $\alpha$, and a post-state collection filter $a$ are tuned. The material is interpreted as a reservoir in the following way:

$$X(t) = \sigma\big( b\, W_{\mathrm{in}} [\mathbf{u}; u_{\mathrm{bias}}],\, \alpha \big) \qquad (10)$$

$$X_f(t) = (1 - a)\, X(t-1) + a\, X(t) \qquad (11)$$

$$\mathbf{y}(t) = W_{\mathrm{out}}\, X_f(t) \qquad (12)$$

where $X$ is the global material state comprising each cell's local 3-d magnetisation vector $X_{xyz}$, $\sigma$ represents the material function, $a$ is the leakage parameter, and $X_f$ is an external filter layer with a one-step memory, implemented after the observation of the material state $X$ and before the readout weights are applied.

The input mapping $W_{\mathrm{in}}$ consists of weighted connections from the input $\mathbf{u}$ and a bias source $u_{\mathrm{bias}}$ to each cell. The input search space is typically large and grows with film size. Field intensity $b$ is a global scaling factor applied to the input mapping; it suppresses or raises the overall magnitude of the locally applied fields, promoting varying dynamical behaviours. The magnetic damping parameter $\alpha$ controls the speed of information propagation and oscillation. Damping describes the non-linear spin relaxation across the film, controlling the rate at which magnetisation spins reach equilibrium.

To optimise magnetic reservoirs, artificial evolution is applied. To reduce convergence time, linear regression is used to train the readout rather than evolving it (see Methods). The evolutionary goal is to find parameters that optimise the efficiency and ability of the readout layer to perform its function.

Parameter                    | Ni       | Co       | Fe       | Unit
Crystal structure            | fcc      | fcc      | bcc      | –
Unit cell size a             | ·        | ·        | ·        | –
µ_s                          | ·        | ·        | ·        | µ_B
Exchange energy J_ij         | · × 10⁻  | · × 10⁻  | · × 10⁻  | J/link
Anisotropy k                 | · × 10⁻  | · × 10⁻  | · × 10⁻  | J/atom
Temp. rescaling exponent     | 2.322    | 2.369    | 2.       | –
Rescaling Curie temperature  | 635      | 1395     | 1049     | –

Table 1: Parameters used to simulate each ferromagnetic material in VAMPIRE. These parameters are static in our work and are not affected by the evolutionary algorithm.

Many heuristics can be used to optimise reservoirs [49], but here the microbial genetic algorithm (MGA) [50] is chosen for its simplicity. The MGA allows individuals to survive across many generations, provides elitism for free, and offers a simple mechanism for selection, recombination and mutation. Parameters for the MGA include: population size = 100, number of generations = 2000, and number of runs = 20, together with fixed mutation rate, recombination rate, and deme size (species separation, as a fraction of the population). These parameters were used for all experiments involving an evolutionary algorithm.

To conduct the experiments, the VAMPIRE source code was adapted to construct a dynamic input-output mechanism. Important parameters for the VAMPIRE simulation include input frequency, integration time-step, initial spin direction, and macro-cell size (micromagnetic simulation). The input frequency chosen – 10 ps / 100 GHz – was based on qualitative experiments in search of characteristic behaviours, such as fast response and a short settling time. The input frequency has to closely match the internal timescales and dynamics of the system.

To optimise the evaluation process and reduce computational cost, an integration time-step of 100 fs was used. This provides a less accurate model compared to an integration time-step of 1 fs, but gives manageable computational run times. Details of how this parameter choice minimally affects performance are provided in the Supplementary Material.

The initial spin direction was aligned with the x-axis, and input signals were injected in the z-direction. The macro-cell size for each simulation was fixed at 5 nm.

Simulation parameters for each material are given in Table 1. These include exchange constants and second-order uniaxial anisotropy constants.
To conduct accurate temperature calculations, rescaling exponents and Curie temperature information are also included.

The chosen tasks are widely used benchmarks for different reservoir systems and methods [51, 33, 52, 36, 34, 30]. The laser task predicts the next value of the Santa Fe Time-Series Competition data (dataset A) [29]. The dataset holds original source data recorded from a far-infrared laser in a chaotic state. Training and testing use the first 2,000 values of the dataset, divided into three sets: 1,200 (training), 400 (validation), and 400 (test). The first 50 output values of each sub-set are discarded as an initial washout period.

The NARMA task originates from work on training recurrent networks [53]. It evaluates a reservoir's ability to model an n-th order highly non-linear dynamical system where the system state depends on the driving input as well as its own history. The challenging aspect of the NARMA task is that it contains both non-linearity and long-term dependencies created by the n-th order time-lag.

An n-th order NARMA experiment is carried out by predicting the output $y(n+1)$ given by eq. (13) when supplied with $u(n)$ drawn from a uniform distribution on the interval [0, 0.5]. For the 10-th and 30-th order systems, $\alpha = 0.3$, $\beta = 0.05$, $\delta = 10$ and $\gamma = 0.1$.

$$y(n+1) = \alpha y(n) + \beta y(n) \left( \sum_{i=0}^{\delta} y(n-i) \right) + 1.5\, u(n-\delta)\, u(n) + \gamma \qquad (13)$$

The NARMA equation is simulated for 5,000 values and split into 3,000 training, 1,000 validation, and 1,000 test values for both versions. The first 50 values of each sub-set are discarded as an initial washout period.

Acknowledgements

This work is part of the SpInspired project, funded by EPSRC Grant EP/R032823/1. All experiments were carried out using the University of York's Super Advanced Research Computing Cluster (Viking).
References

[1] Tanaka, G. et al. Recent advances in physical reservoir computing: A review. Neural Networks (2019).
[2] Shi, W., Cao, J., Zhang, Q., Li, Y. & Xu, L. Edge computing: Vision and challenges. IEEE Internet of Things Journal, 637–646 (2016).
[3] Chen, J. & Ran, X. Deep learning with edge computing: A review. Proceedings of the IEEE, 1655–1674 (2019).
[4] Wang, X. et al. Convergence of edge computing and deep learning: A comprehensive survey. IEEE Communications Surveys & Tutorials, 869–904 (2020).
[5] Adamatzky, A. (ed.) Advances in Unconventional Computing: Volume 2: Prototypes, Models and Algorithms (Springer, 2016).
[6] Young, A. R., Dean, M. E., Plank, J. S. & Rose, G. S. A review of spiking neuromorphic hardware communication systems. IEEE Access, 135606–135620 (2019).
[7] Indiveri, G. et al. Neuromorphic silicon neuron circuits. Frontiers in Neuroscience, 73 (2011).
[8] Jo, S. H. et al. Nanoscale memristor device as synapse in neuromorphic systems. Nano Letters, 1297–1301 (2010).
[9] Xia, Q. & Yang, J. J. Memristive crossbar arrays for brain-inspired computing. Nature Materials, 309–323 (2019).
[10] Miller, J. F. & Downing, K. Evolution in materio: Looking beyond the silicon box. In NASA/DoD Conference on Evolvable Hardware 2002, 167–176 (IEEE, 2002).
[11] Mohid, M. et al. Evolving solutions to computational problems using carbon nanotubes. International Journal of Unconventional Computing, 245–281 (2015).
[12] Massey, M. et al. Evolution of electronic circuits using carbon nanotube composites. Scientific Reports (2016).
[13] Bose, S. et al. Evolution of a designless nanoparticle network into reconfigurable Boolean logic. Nature Nanotechnology, doi:10.1038/nnano.2015.207 (2015).
[14] Chen, T. et al. Classification with a disordered dopant-atom network in silicon. Nature, 341–345 (2020).
[15] Schrauwen, B., Verstraeten, D. & Van Campenhout, J. An overview of reservoir computing: theory, applications and implementations. In Proceedings of the 15th European Symposium on Artificial Neural Networks (Citeseer, 2007).
[16] Verstraeten, D. & Schrauwen, B. On the quantification of dynamics in reservoir computing. In Artificial Neural Networks – ICANN 2009, 985–994 (Springer, 2009).
[17] Appeltant, L. et al. Information processing using a single dynamical node as complex system. Nature Communications, 468 (2011).
[18] Caravelli, F. & Carbajal, J. Memristors for the curious outsiders. Technologies, 118 (2018).
[19] Dion, G., Mejaouri, S. & Sylvestre, J. Reservoir computing with a single delay-coupled non-linear mechanical oscillator. Journal of Applied Physics, 152132 (2018).
[20] Prychynenko, D. et al. Magnetic skyrmion as a nonlinear resistive element: A potential building block for reservoir computing. Physical Review Applied, 014034 (2018).
[21] Pinna, D., Bourianoff, G. & Everschor-Sitte, K. Reservoir computing with random skyrmion textures. Physical Review Applied, 054020 (2020). URL https://link.aps.org/doi/10.1103/PhysRevApplied.14.054020.
[22] Torrejon, J. et al. Neuromorphic computing with nanoscale spintronic oscillators. Nature, 428–431 (2017).
[23] Nakane, R., Tanaka, G. & Hirose, A. Reservoir computing with spin waves excited in a garnet film. IEEE Access, 4462–4469 (2018).
[24] Romera, M. et al. Vowel recognition with four coupled spin-torque nano-oscillators. Nature, 230–234 (2018).
[25] Zheng, Q., Zhu, X., Mi, Y., Yuan, Z. & Xia, K. Recurrent neural networks made of magnetic tunnel junctions. AIP Advances, 025116 (2020).
[26] Watt, S. & Kostylev, M. Reservoir computing using a spin-wave delay-line active-ring resonator based on yttrium-iron-garnet film. Physical Review Applied, 034057 (2020).
[27] Zahedinejad, M. et al. Two-dimensional mutually synchronized spin Hall nano-oscillator arrays for neuromorphic computing. Nature Nanotechnology, 47–52 (2020).
[28] Pajda, M., Kudrnovský, J., Turek, I., Drchal, V. & Bruno, P. Ab initio calculations of exchange interactions, spin-wave stiffness constants, and Curie temperatures of Fe, Co, and Ni. Physical Review B, 174402 (2001).
[29] Weigend, A. The Santa Fe Time Series Competition Data: Data set A, Laser generated data (1991; accessed March 2016).
[30] Inubushi, M. & Yoshimura, K. Reservoir computing beyond memory-nonlinearity trade-off. Scientific Reports, 10199 (2017).
[31] Dale, M. et al. The role of structure and complexity on reservoir computing quality. In International Conference on Unconventional Computation and Natural Computation, 52–64 (Springer, 2019).
[32] Dale, M., O'Keefe, S., Sebald, A., Stepney, S. & Trefzer, M. A. Reservoir computing quality: connectivity and topology. Natural Computing (2020). doi:10.1007/s11047-020-09823-1.
[33] Rodan, A. & Tiňo, P. Simple deterministically constructed recurrent neural networks. In International Conference on Intelligent Data Engineering and Automated Learning, 267–274 (Springer, 2010).
[34] Larger, L. et al. Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing. Optics Express, 3241–3249 (2012).
[35] Hou, Y. et al. Prediction performance of reservoir computing system based on a semiconductor laser subject to double optical feedback and optical injection. Optics Express, 10211–10219 (2018).
[36] Paquot, Y. et al. Optoelectronic reservoir computing. Scientific Reports (2012).
[37] Dale, M., Miller, J. F., Stepney, S. & Trefzer, M. A. A substrate-independent framework to characterize reservoir computers. Proceedings of the Royal Society A, 20180723 (2019).
[38] Dambre, J., Verstraeten, D., Schrauwen, B. & Massar, S. Information processing capacity of dynamical systems. Scientific Reports (2012).
[39] Legenstein, R. & Maass, W. Edge of chaos and prediction of computational performance for neural circuit models. Neural Networks, 323–334 (2007).
[40] Jaeger, H. Short term memory in echo state networks (GMD-Forschungszentrum Informationstechnik, 2001).
[41] Demidov, V. E. et al. Magnetic nano-oscillator driven by pure spin current. Nature Materials, 1028–1031 (2012).
[42] Du, C. et al. Reservoir computing using dynamic memristors for temporal information processing. Nature Communications, 1–10 (2017).
[43] Kurenkov, A., Fukami, S. & Ohno, H. Neuromorphic computing with antiferromagnetic spintronics. Journal of Applied Physics, 010902 (2020).
[44] Jiles, D. Introduction to Magnetism and Magnetic Materials (CRC Press, 2015).
[45] Adamatzky, A. Game of Life Cellular Automata, vol. 1 (Springer, 2010).
[46] Lukoševičius, M. A practical guide to applying echo state networks. In Neural Networks: Tricks of the Trade, 659–686 (Springer, 2012).
[47] Dale, M., Miller, J. F., Stepney, S. & Trefzer, M. A. Evolving carbon nanotube reservoir computers. In International Conference on Unconventional Computation and Natural Computation, 49–61 (Springer, 2016).
[48] Dale, M. Neuroevolution of hierarchical reservoir computers. In Proceedings of the Genetic and Evolutionary Computation Conference, 410–417 (ACM, 2018).
[49] Bala, A., Ismail, I., Ibrahim, R. & Sait, S. M. Applications of metaheuristics in reservoir computing techniques: a review. IEEE Access, 58012–58029 (2018).
[50] Harvey, I. The microbial genetic algorithm. In European Conference on Artificial Life, 126–133 (Springer, 2009).
[51] Jaeger, H. The "echo state" approach to analysing and training recurrent neural networks, with an erratum note. Bonn, Germany: German National Research Center for Information Technology, GMD Technical Report, 34 (2001).
[52] Tran, S. D. & Teuscher, C. Memcapacitive reservoir computing. In , 115–116 (IEEE, 2017).
[53] Atiya, A. F. & Parlos, A. G. New results on recurrent network training: unifying the algorithms and accelerating convergence. IEEE Transactions on Neural Networks, 697–709 (2000).
[54] Ganguli, S., Huh, D. & Sompolinsky, H. Memory traces in dynamical systems. Proceedings of the National Academy of Sciences, 18970–18975 (2008).

Supplementary Material
Optimised Integration Time-step
To reduce the computational time of simulating thin-films, a large integrator time-step was used. Ideally, small time-steps are preferable to more accurately capture spin precession and the general dynamics between input pulses; however, this comes with a large computational cost.

A characterisation of how the integrator time-step affects task performance is given in Fig. 5, where the chosen 100 fs time-step is compared to the more accurate 1 fs time-step. These results show that, in general, the chosen time-step is statistically similar, representing a reasonably accurate model of the driven dynamics.

To test whether the medians for the two time-steps were significantly different, the non-parametric two-sided Wilcoxon rank-sum test was used. This tests the null hypothesis that both samples are from the same distribution with equal medians. A rejection of the null hypothesis at the 95% confidence level is indicated by a p-value < 0.05. The p-values for each task are: p = 0. (laser), p = 0. (NARMA-10), and p = 0. (NARMA-30). This indicates that performance is not significantly affected by the change in time-step, while computational time is reduced dramatically from hours to minutes.

Figure 5: Comparing integrator time-steps across three tasks (laser, NARMA-10 and NARMA-30; NMSE). 100 fs provides a less accurate model compared to 1 fs but dramatically reduces run-time. Only the Co material was used in this experiment. Each boxplot shows a total of 30 random configurations for each task, compared at both time-steps.
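The rank-sum comparison above can be reproduced in a few lines. The sketch below uses the normal approximation (without tie correction) rather than the exact test, and is applied to synthetic samples, not the paper's NMSE data:

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation
    (no tie correction), adequate for samples of ~30 values as in the
    time-step comparison."""
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    ranks = {}
    i = 0
    while i < len(pooled):            # assign average ranks over tied groups
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    R1 = sum(ranks[v] for v in x)     # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2       # mean of R1 under the null hypothesis
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (R1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
```

A p-value < 0.05 rejects equal medians at the 95% confidence level; a large p-value mirrors the conclusion above that the 100 fs and 1 fs runs are statistically similar.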
Kernel Rank and Memory Capacity
In this work, two property measures are used to assess the underlying dynamics of the reservoir system: the kernel rank (KR) and the linear memory capacity (MC).

Kernel rank is a measure of the reservoir's ability to separate distinct input patterns [39]. It measures the reservoir's ability to produce a rich non-linear representation of the input u(t) and its history u(t − 1), u(t − 2), .... This is closely linked to the linear separation property, which measures how different input signals map onto different reservoir states. As many practical tasks are linearly inseparable, reservoirs typically require some non-linear transformation of the input; KR measures the complexity and diversity of these non-linear operations. Reservoirs in ordered dynamical regimes typically have low KR, and those in chaotic regimes high KR. The maximum value of KR is relative to the number of observable states; in our experiments, KR is normalised to observe the underlying non-linearity of the task without distortion from reservoir size.

Another important property for reservoir computing is memory, as reservoirs are typically configured to solve temporal problems. A simple measure of reservoir memory is the linear short-term memory capacity (MC), first outlined in [40] to quantify the echo state property. For the echo state property to hold, the dynamics of the input-driven reservoir must asymptotically wash out any information resulting from initial conditions. This property therefore implies that a fading memory exists, characterised by the short-term memory capacity.

A full understanding of a reservoir's memory capacity, however, cannot be encapsulated by a linear memory measure alone, as a reservoir will also possess some non-linear memory. Other measures proposed in the literature quantify other aspects of memory, such as the quadratic and cross-memory capacities, and the total memory of reservoirs using the Fisher Memory Curve [54, 38]. The linear measure is used here as a simple benchmark; more sophisticated measures are unnecessary to identify the differences in the following tasks.
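As a concrete illustration of how KR and MC are measured, the sketch below drives a small echo state network (a stand-in for the magnetic film; the size and scalings are illustrative, not those of our experiments) with random input, takes the rank of the collected state matrix for KR, and sums the squared correlations of linear reconstructions of delayed inputs for MC:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50                                        # illustrative reservoir size
Win = rng.uniform(-1, 1, (N, 1))              # input weights
W = rng.uniform(-0.5, 0.5, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius 0.9

def run(u):
    """Collect reservoir states driven by input sequence u."""
    x, X = np.zeros(N), []
    for ut in u:
        x = np.tanh(Win[:, 0] * ut + W @ x)
        X.append(x.copy())
    return np.array(X)

# Kernel rank: rank of the state matrix under random drive, normalised by N
U = rng.uniform(-1, 1, 500)
KR = np.linalg.matrix_rank(run(U)) / N

def memory_capacity(max_delay=2 * N, T=2000, washout=100):
    """MC = sum over delays k of r^2 between u(t-k) and its best
    linear reconstruction from the reservoir states."""
    u = rng.uniform(-1, 1, T)
    X = run(u)
    mc = 0.0
    for k in range(1, max_delay + 1):
        Xk, yk = X[washout:], u[washout - k:T - k]
        w, *_ = np.linalg.lstsq(Xk, yk, rcond=None)
        mc += np.corrcoef(Xk @ w, yk)[0, 1] ** 2
    return mc

MC = memory_capacity()
```

For the films, the observable states would be the macro-cell spin components rather than ESN node activations, but the two measures are computed in the same way.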
Figure 6: Kernel rank (normalised) versus linear memory capacity for all materials, sizes and tasks. Each column refers to a film size (left, middle, right: 49, 100, 225) and each row to a task (top, middle, bottom: laser, NARMA-10, NARMA-30). The material reservoirs shown are those displayed in Fig. 2.

In Fig. 3, only the results for the NARMA-30 task are given; the results for all tasks and sizes are provided in Fig. 6. From these results, it is possible to determine the difference in dynamics required for each task.

The laser task requires very little memory and is mainly driven by non-linear dynamics. The normalised KR of 0.5 is relatively high considering that many of the magnetic materials' observable states are highly correlated, e.g., the x and y components of the spins.

The NARMA-10 task features more linear tendencies. Memory capacity clusters around the value of 10, corresponding to the 10 time-step time-lag present in the system being modelled. Irrespective of size, the same characteristic dynamics have converged during evolutionary selection, and all materials are able to exhibit them.

The NARMA-30 task requires a notable increase in memory capacity. At the smallest size, no material meets the necessary criterion (MC = 30) to perform well at the task; Co and Ni attempt to maximise their MC, while Fe struggles to exhibit any memory. As size increases, Co and Ni gradually reach MC = 30, and this is reflected in their performance. The MC of Fe also increases, but at a slower rate proportional to size.

Temperature Effect and Film Thickness
To build practical computing systems, it is desirable for the materials to function close to room temperature. In addition, thicker films put less strain on the fabrication process. In our main experiments, each material film was evolved at 0 K to evaluate performance without thermal fluctuations. Here, we show how temperature affects performance at all film sizes (numbers of cells) and across each task.

For the laser task (Fig. 7), performance is stable, and competitive with random ESNs of equivalent node size, at higher temperatures, typically up to 100 K, depending on the material and number of cells. The most stable material and film size is Fe at 100 cells; in this configuration, only a small change in performance is present as thickness is increased up to 1 nm.

For the NARMA-10 task, performance is again stable, in some cases up to 100 K, e.g., Co with 100 cells. As temperature increases, performance tends to drop off slightly faster than for the laser task. This could be due to degradation in memory quality as thermal noise increases. In general, the results suggest the Co material responds better to increased temperatures; however, thicker films tend to be more detrimental to performance. The same trends are seen for the NARMA-30 task.
Interference and Reflective Boundaries
The proposed system exploits the nonlinear interactions of spins when perturbed by local magnetic fields. As information propagates, locally coupled spins form wave crests and troughs that interact, creating interference patterns. At the boundaries, waves are reflected back into the film. Figs. 10 and 11 visualise this dynamical behaviour for two Co films (49 cells and 900 cells). At t = 10, a single input pulse is supplied to two separate input locations. In the following time-steps, waves appear and propagate. The smaller film (Fig. 10) interacts almost instantly with the boundaries, and waves reverberate around the film for some time. In the larger film (Fig. 11), signals propagate undisturbed for longer, until the wave crests reach each other and the boundaries. At later time-steps, interference and reflected waves begin to dominate; however, memory of past inputs is still recoverable according to the memory capacity measure.
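The interference-and-reflection behaviour can be illustrated with a toy model. The sketch below integrates a simple 2D scalar wave equation with fixed (reflecting) edges; it is not the atomistic spin dynamics solved by VAMPIRE, only a qualitative picture of two pulses spreading, interfering and bouncing off the boundaries:

```python
import numpy as np

def toy_wave(n_cells=49, steps=12, c2=0.2):
    """Toy 2D wave equation on an n_cells film with fixed (reflecting)
    edges; a qualitative stand-in for the spin-wave dynamics, not the
    LLG equations. Two unit pulses mimic the two input locations."""
    side = int(round(n_cells ** 0.5))
    z_prev = np.zeros((side, side))
    z = np.zeros((side, side))
    z[1, 1] = z[side - 2, side - 2] = 1.0     # two input pulses at t = 10
    for _ in range(steps):
        # discrete Laplacian (neighbour coupling)
        lap = (np.roll(z, 1, 0) + np.roll(z, -1, 0)
               + np.roll(z, 1, 1) + np.roll(z, -1, 1) - 4 * z)
        z_next = 2 * z - z_prev + c2 * lap    # leapfrog wave update
        # pin the edges: waves reflect back into the film
        z_next[0, :] = z_next[-1, :] = z_next[:, 0] = z_next[:, -1] = 0
        z_prev, z = z, z_next
    return z

field = toy_wave()
```

Plotting the field at successive steps reproduces the qualitative picture of Figs. 10 and 11: early free propagation, then interference where the crests meet, then boundary reflections.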
Figure 7: Temperature and film thickness sweeps for the laser task. Normalised mean square error (colour) of an evolved configuration for Ni, Co, Fe at 49, 100 and 225 cells. Box plots (white) display the performances of the 20 best random ESNs for this task. A white diamond signals that the performance of the film is within the ESN range; a yellow diamond signals that it is better than the ESNs.
Figure 8: Temperature and film thickness sweeps for the NARMA-10 task. Normalised mean square error (colour) of an evolved configuration for Ni, Co, Fe. Box plots (white) display the performances of the 20 best random ESNs for this task. A white diamond signals that the performance of the film is within the ESN range; a yellow diamond signals that it is better than the ESNs.
Figure 9: Temperature and film thickness sweeps for the NARMA-30 task. Normalised mean square error (colour) of an evolved configuration for Ni, Co, Fe. Box plots (white) display the performances of the 20 best random ESNs for this task. A white diamond signals that the performance of the film is within the ESN range; a yellow diamond signals that it is better than the ESNs.
Figure 10: Dynamics of a Co magnetic film with 49 cells. An input pulse is supplied at two locations on the film at t = 10; panels show t = 11 to t = 22. Red indicates a positive magnetisation, and blue, negative.

Figure 11: Dynamics of a Co magnetic film with 900 cells. An input pulse is supplied at two locations on the film at t = 10; panels show t = 11 to t = 22.