Harnessing adaptive dynamics in neuro-memristive nanowire networks for transfer learning
Ruomin Zhu∗, Joel Hochstetter∗, Alon Loeffler∗, Adrian Diaz-Alvarez†, Adam Stieg†‡, James Gimzewski†‡, Tomonobu Nakayama†∗ and Zdenka Kuncic∗†
∗School of Physics and Sydney Nano Institute, University of Sydney, Sydney, NSW 2006, Australia. Email: [email protected]
†International Centre for Materials Nanoarchitectonics, National Institute for Materials Science, Tsukuba, Japan
‡California NanoSystems Institute, University of California at Los Angeles, California, USA
Abstract—Nanowire networks (NWNs) represent a unique hardware platform for neuromorphic information processing. In addition to exhibiting synapse-like resistive switching memory at their cross-point junctions, their self-assembly confers a neural network-like topology on their electrical circuitry, something that is impossible to achieve through conventional top-down fabrication approaches. In addition to their low power requirements, cost effectiveness and efficient interconnects, neuromorphic NWNs are also fault-tolerant and self-healing. These highly attractive properties can be largely attributed to their complex network connectivity, which enables a rich repertoire of adaptive nonlinear dynamics, including edge-of-chaos criticality. Here, we show how the adaptive dynamics intrinsic to neuromorphic NWNs can be harnessed to achieve transfer learning. We demonstrate this through simulations of a reservoir computing implementation in which NWNs perform the well-known benchmark task of Mackey–Glass (MG) signal forecasting. First, we show how NWNs can predict MG signals with arbitrary degrees of unpredictability (i.e. chaos). We then show that NWNs pre-exposed to an MG signal perform better at forecasting than NWNs without prior experience of an MG signal. This type of transfer learning is enabled by the network's collective memory of previous states. Overall, their adaptive signal processing capabilities make neuromorphic NWNs promising candidates for emerging real-time applications in IoT devices, particularly at the far edge.
Index Terms—neuromorphic information processing, memristive switching, neural network, nonlinear dynamics, transfer learning
I. INTRODUCTION
The field of neuromorphic engineering is widely recognized as the realization of Carver Mead's original vision for a new type of electronic hardware engineered to mimic information processing in biological nervous systems [1], [2]. Today, the lowest common denominator of virtually all neuromorphic hardware systems is the co-location of memory and processing units (i.e. non-von Neumann architecture). This minimal neuromorphic feature alone has dramatically improved power efficiency in training various artificial neural network (ANN) models [3].

A higher-level neuromorphic attribute is the ability to learn, and while ANN models demonstrate learning in software, learning in hardware is desirable for next-generation stand-alone cognitive devices, especially at the IoT edge [4]. In hardware, spike-based learning has been successfully implemented in conventional silicon CMOS technology (e.g. [5]–[8]). Beyond silicon, nanoelectronic materials with intrinsic neuromorphic properties, including memory and the ability to emulate synaptic connections [3], [9], have attracted enormous attention for on-chip learning [10]. In particular, resistive switching memory (memristive) devices [11]–[14] are leading candidates for efficient neuromorphic computing architectures, with demonstrated neuromorphic learning functionalities such as short-term and long-term potentiation (STP/LTP) and spike-timing dependent plasticity (STDP) [15]–[18].

At the device level, neuromorphic functionalities can be broadly attributed to modification of electronic transport mechanisms by nanoscale geometric confinement, usually across a metal-insulator-metal (MIM) junction [19]. Importantly, synapse-like memristive switching is observed not just in memristors fabricated from conventional bulk materials (e.g. metal oxides), but also in neuromorphic systems self-assembled from nanomaterials using bottom-up techniques [10].
Here, we focus on self-assembled metallic nanowires because not only do they form memristive switching MIM junctions, but they also form a complex neural-like network topology, with all-in-one connectivity properties such as small-worldness, modularity and recurrent feedback loops [20]–[27]. The unique neuromorphic topology of self-assembled nanowire networks (NWNs) is responsible for collective functionalities emerging from the interplay between network connectivity and synaptic nonlinear dynamics [25], [28]–[32].

Learning in NWN hardware does not require implementation of an ANN model, as has been demonstrated with associative memory tasks [33], [34] and with temporal information processing tasks using a reservoir computing approach, where the network self-regulates in response to continuous-time input signals and only the readout is trained [29], [35], [36]. Varying spatio-temporal input signals (i.e. delivered via different contact electrodes and with time-varying amplitudes) results in the formation of new electrical pathways, analogous to synaptogenetic learning [37], [38]. Here, we show that NWNs with prior experience of a complex, nonlinear time-series signal can perform better at forecasting the signal than a NWN without prior exposure, thus demonstrating a capacity for transfer learning, an important attribute for general intelligence (see [39] for a recent comprehensive review).

Fig. 1. Graph representations of 300-node networks: left – self-assembled nanowire network (2434 edge junctions, average degree 16, small-world propensity 0.67); right – random network (2400 junctions, average degree 16, small-world propensity 0.29). Nodes in red, edges in black.

II. METHODS
A. Modelling network connectivity and memristive junctions
We performed simulations using a physically motivated model based on polymer-coated Ag nanowires that self-assemble into a complex network [25], [31]. Self-assembly was modelled by distributing individual nanowires on a 2D plane, with uniformly random positions and orientations, and with lengths sampled from a gamma distribution (mean 100 µm, stdev 10 µm). The variance in nanowire length is based on experimental observations [20], [21], [25], [33] and increases the probability of forming cross-point junctions between overlapping nanowires. This mimics biological neural networks, in which individual neurons can each make several thousand synaptic connections to neighbouring neurons. In our model of self-assembled nanowire networks (NWNs), a range of nanowire connectivities is possible for a fixed number of nanowires. Importantly, the resulting network structure is more complex than a purely random topology or fully connected network (Fig. 1), with sparseness and recurrence characteristics that are responsible for efficient signal transduction and emergent cognitive function in biological neural networks [27], [40], [41]. It is also noteworthy that the complex network topology of self-assembled networks differs from the bipartite structure used in ANN models.

Nanowire–nanowire cross-points were modelled as voltage-controlled memristive junctions described by a state-dependent Ohm's law, I = G(λ)V, where the conductance G(λ) is a function of the state variable λ(t), which depends on the past history of voltage input. Physically, λ(t) parameterizes the evolution of a conductive filament that forms across the MIM junction above a threshold bias. For polymer-coated Ag nanowires, the polymer is electrically insulating but ionically conducting, so Ag+ cations can migrate across the biased junction [42].
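As a concrete illustration of this self-assembly model, the sketch below (not the authors' simulator) drops wires with gamma-distributed lengths at uniformly random positions and orientations, then finds cross-point junctions by segment intersection. The plane size and the brute-force intersection test are illustrative assumptions chosen only to make the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_nanowires(n_wires=300, plane=350.0, mean_len=100.0, std_len=10.0):
    """Drop wires on a plane with uniformly random centres and orientations,
    and gamma-distributed lengths (mean 100 um, stdev 10 um)."""
    shape = (mean_len / std_len) ** 2        # gamma shape k = (mu/sigma)^2
    scale = std_len ** 2 / mean_len          # gamma scale theta = sigma^2/mu
    lengths = rng.gamma(shape, scale, n_wires)
    centres = rng.uniform(0, plane, (n_wires, 2))
    angles = rng.uniform(0, np.pi, n_wires)
    half = 0.5 * lengths[:, None] * np.c_[np.cos(angles), np.sin(angles)]
    return centres - half, centres + half    # wire end points a, b

def segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 properly intersect (orientation test)."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    d1, d2 = cross(q1, q2, p1), cross(q1, q2, p2)
    d3, d4 = cross(p1, p2, q1), cross(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0

def find_junctions(a, b):
    """Brute-force list of wire pairs that overlap (candidate MIM junctions)."""
    n = len(a)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if segments_cross(a[i], b[i], a[j], b[j])]
```

Each junction found this way becomes one memristive edge in the network graph; denser wire deposition (smaller plane for a fixed wire count) raises the average degree.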
The conducting atomic filament that forms in this way switches the junction from a high-resistance "off" state to a low-resistance "on" state when λ ≥ λ_crit, where λ_crit is a threshold. As the polymer thickness is comparable to the Fermi length of Ag, resistive switching is modelled as a change in the junction conductance state G(λ) by an amount equal to the conductance quantum G_0 = (13 kΩ)^-1, consistent with measurements of individual nanowire junctions [43]. The corresponding resistance states are R_on = G_0^-1 and R_off = ζR_on, with ζ = 10 used in the simulation results presented here. Additionally, the Simmons formula is used to model the low-voltage tunneling regime in G(λ) when the conductive filament is close to the opposite nanowire [44]. Network conductance is calculated using a modified nodal analysis [45] to solve Kirchhoff's circuit law equations at each time point.

B. Mackey–Glass time series prediction
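A minimal single-junction sketch of this threshold-switching picture is given below. The voltage threshold, decay rate, λ_crit value and Euler time step are illustrative assumptions (the paper does not state them here), and the Simmons tunnelling correction is omitted for brevity; only G_0 = (13 kΩ)^-1 and ζ = 10 come from the text.

```python
# One voltage-controlled threshold memristive junction, in the spirit of the
# state-dependent Ohm's law I = G(lambda) V described above.
G0 = 1.0 / 13e3        # conductance quantum used in the paper, (13 kOhm)^-1
ZETA = 10.0            # R_off / R_on ratio from the text
LAM_CRIT = 0.1         # filament-formation threshold (assumed value)
V_TH = 0.01            # voltage threshold for filament growth (assumed value)

class Junction:
    def __init__(self):
        self.lam = 0.0                        # filament state variable lambda(t)

    def conductance(self):
        """Binary switch: 'on' at G0 once |lambda| exceeds lambda_crit."""
        return G0 if abs(self.lam) >= LAM_CRIT else G0 / ZETA

    def step(self, v, dt=1e-3):
        """Advance lambda by forward Euler; return the current I = G(lambda) V."""
        if abs(v) > V_TH:
            # filament grows in the direction of the applied bias
            self.lam += (abs(v) - V_TH) * (1 if v > 0 else -1) * dt
        else:
            self.lam -= 0.1 * self.lam * dt   # slow relaxation toward 0
        return self.conductance() * v
```

In the full network model, one such state update runs at every junction on each time step, with the junction voltages obtained from the modified nodal analysis solve.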
Reservoir computing was implemented on a network with N = 100 nanowire nodes and 577 memristive junctions. The Mackey–Glass (MG) signal was delivered to one source node as an input voltage bias relative to a drain node. MG signals with varying time delays τ were predicted, with τ = 17 corresponding to the onset of chaotic dynamics.

The network state as a function of time t is represented by the instantaneous voltage on all N nodes. The MG signal at a future time, u_{t+δt}, was predicted using a subset of n = 10 node states weighted by a vector w:

u_{t+δt} = w · η_t,   (1)

where η_t is an 11-element vector that includes a 1 V linear shift element, and where w was determined by least squares regression using all past states of the n nodes and the input (teacher) signal in the time interval t ∈ [0, T], i.e.

[u_δt, u_{δt+1}, ..., u_T] = w · [η_0, η_1, ..., η_{T−δt}].   (2)

A history length of T > τ was used to train the n output weights and the prediction step was set to δt = τ. Accuracy of the prediction task was calculated as

Accuracy = 1 − RNMSE,   (3)

where RNMSE is the root-normalized mean square error. Statistical uncertainties were determined by randomly selecting the n = 10 readout nodes for 100 simulations and averaging the accuracy.

C. Transfer learning
In conventional reservoir computing, the initial state of the network is homogeneous, and for the MG prediction task described above we set η_0 = 0. We modified the task by first exposing the network to a source MG signal with delay parameter τ′ before training and predicting a second target MG signal with delay parameter τ. Delivering the source MG(τ′) signal for 1.5 s effectively primed the network to an initial state η′ ≠ 0 that has memory of previous states associated with the source MG signal. This is analogous to transfer learning methods applied to ANN models, where synaptic weights are trained on a source domain and the knowledge gained is transferred to a different, but related, target domain [39]. Our case is somewhat different, as the network dynamically self-adjusts its own synaptic junction states during the priming period (since in reservoir computing, only the output weights are trained, not the network weights). We compared the accuracy in predicting the MG target signal to that obtained for a network without prior exposure to the source MG signal during a pre-training period.

III. RESULTS
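The priming protocol can be sketched as follows. A small echo-state surrogate stands in for the NWN reservoir (an assumption — in the paper the reservoir is the memristive network itself, and priming also adapts the junction states, which this surrogate does not capture); what the sketch preserves is the protocol shape: the source signal is driven through the reservoir with no training, so only the internal state η′ carries over before the target signal is trained.

```python
import numpy as np

rng = np.random.default_rng(1)

class SurrogateReservoir:
    """Fixed random recurrent network as a stand-in for the N = 100 node NWN."""
    def __init__(self, n=100):
        self.W = rng.normal(0, 1, (n, n))
        self.W *= 0.9 / np.max(np.abs(np.linalg.eigvals(self.W)))  # keep stable
        self.w_in = rng.normal(0, 1, n)
        self.state = np.zeros(n)                  # homogeneous eta_0 = 0

    def drive(self, signal):
        """Feed a signal; return the state trajectory."""
        states = []
        for u in signal:
            self.state = np.tanh(self.W @ self.state + self.w_in * u)
            states.append(self.state.copy())
        return np.array(states)

def primed_vs_unprimed(source, target, dstep=17):
    """Train a 10-node linear readout on `target`, with and without priming."""
    accs = {}
    for label, prime in (("unprimed", None), ("primed", source)):
        res = SurrogateReservoir()
        if prime is not None:
            res.drive(prime)                      # priming: eta' != 0, no training
        S = res.drive(target[:-dstep])
        X = np.c_[S[:, :10], np.ones(len(S))]     # n = 10 readout nodes + bias
        y = target[dstep:]
        w, *_ = np.linalg.lstsq(X[:1500], y[:1500], rcond=None)
        rnmse = np.sqrt(np.mean((X[1500:] @ w - y[1500:])**2) / np.var(y[1500:]))
        accs[label] = 1.0 - rnmse
    return accs
```

Whether priming helps in this surrogate depends on the chosen signals; the paper's result is that in the NWN it does, because the junction states themselves retain memory of the source signal.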
A. Adaptive dynamics
Fig. 2. Top panel – individual memristive junction conductances G_jn (in units of the conductance quantum G_0) as a function of time t (in units of total simulation time T) for a triangular voltage signal (black) input to a 261-junction NWN. Bottom panel – snapshots of the network at four sequential time points, with the colorbar indicating G_jn. Dark blue junctions denote memristive switches in the off state.

Figure 2 shows the NWN response to a triangular input signal. In the top panel, each colored curve represents the evolution in time of the conductance G_jn across an individual memristive junction. The network connectivity determines the spatial distribution of voltage at each moment in time. This connectivity influences the voltage-controlled memristive dynamics of each junction, resulting in collective switching as G_jn continuously adapts. The bottom panel shows this self-regulation of the synaptic junctions in snapshot visualizations of the network at successive time points during evolution. Brightly colored circles, evident in two of the frames, represent memristive switches in their on state, with current paths indicated (white). The intrinsic adaptive dynamics of NWNs can in principle be harnessed for information processing. For the signal parameters used in Fig. 2, the network exhibits "edge-of-chaos" dynamics (e.g. I–V trajectories begin to diverge), which may be optimal for information processing [46].

B. Mackey–Glass time-series prediction
Fig. 3 plots the time series for training and predicting an MG signal with τ = 20. Network output weights w are trained using the first 2400 time steps, after which the signal is predicted 20 steps ahead using eq. (1). The target signal is overplotted for comparison. The resulting prediction accuracy is 75%. Fig. 4 plots the prediction accuracy as a function of τ. Accuracy decreases with τ because errors amplify exponentially as the MG signal becomes more chaotic.

Fig. 3. Time series of the MG source signal during training (gray), followed by target (dashed blue) and predicted (red) signals for τ = 20. The inset shows a zoom-in of part of the prediction period.

Fig. 4. Average MG forecasting accuracy as a function of delay parameter τ. Shading indicates standard error.

C. Transfer learning

Figure 5 plots MG prediction accuracy when the network is primed by an MG signal prior to training. Accuracy is plotted for three different pre-training MG signals (τ′ = 20, …, 150) as a function of the τ used for MG signal training. For comparison, prediction accuracy without pre-training (cf. Fig. 4) is also overplotted. Accuracy improves when the network is first primed with an MG signal. This demonstrates the principle of transfer learning, where knowledge is extracted from a source domain and then leveraged for learning in a related target domain.

Fig. 5. Average MG signal prediction accuracy as a function of the delay parameter τ of the predicted signal, for different τ′ signals used to prime the network before training. Average accuracy without pre-training is shown for comparison (cf. Fig. 4). Shading indicates standard error.

Learning performance is expected to improve especially when there is insufficient information in the target domain compared to the source domain. In this example, prediction accuracy improves more when the network is primed by a source MG signal that is more chaotic (i.e. has more degrees of freedom) than the target MG signal (i.e. τ′ > τ). This is shown by the blue curve (for τ′ = 150) in Fig. 5 and by the accuracy-difference heatmap in Fig. 6.

Fig. 6. Heatmap showing the change in average accuracy in predicting an MG signal with delay τ when the network is primed using an MG signal with delay τ′, relative to prediction without priming.

Importantly, the target MG signal is predicted without relying on any teacher signal for recall. This suggests learning is achieved by harnessing the network's collective memory of past dynamical states. Priming the network before training improves learning by strengthening the memristive connections in an adaptive way, enabling longer-term memory consolidation.

Prediction accuracy also depends on the instantaneous network state selected for priming. Regardless of the value of τ of the MG signal being predicted, we find accuracy is optimized for a small range of primed network states. This optimal range of states occurs around network activation, coinciding with the formation of a winner-takes-all (WTA) current path (cf. Fig. 2, bottom panel). Such WTA gate modules in network circuits are purported to have universal computational power for both digital and analog information processing [47].

IV. CONCLUSIONS
We have demonstrated that the complex interplay between the neural network-like circuitry of nanowire networks and their memristive junctions results in adaptive dynamics, whereby the network self-regulates to find the optimal signal transduction routes. We showed how these adaptive dynamics can be harnessed for signal processing using a reservoir computing implementation. Prediction of the highly nonlinear Mackey–Glass signal was demonstrated well into the strongly chaotic regime. This has not previously been demonstrated with other memristive reservoir computing approaches. Moreover, we found that performance accuracy on this task is improved by transfer learning, where the network is primed by a Mackey–Glass signal before training. Our results show that transfer learning improves performance the most when pre-training with a source signal that is more complex than the target signal to be predicted.

ACKNOWLEDGMENT
The authors acknowledge use of the Artemis High Performance Computing resource at the Sydney Informatics Hub, a Core Research Facility of the University of Sydney.
REFERENCES

[1] C. Mead, "Neuromorphic electronic systems", Proc. IEEE, 78, pp. 1629-1636, 1990.
[2] C. Mead, "How we created neuromorphic engineering", Nat. Elect., 3, pp. 434-435, 2020.
[3] W. Zhang et al., "Neuro-inspired computing chips", Nat. Elect., 3, pp. 371-382, 2020.
[4] O. Krestinskaya, A. P. James, L. O. Chua, "Neuromemristive circuits for edge computing: a review", IEEE Trans. Neur. Net. Learn. Sys., 31, 4, 2020.
[5] G. Indiveri et al., "Neuromorphic silicon neuron circuits", Front. Neurosci., 5, pp. 1-23, 2011.
[6] P. A. Merolla et al., "A million spiking neuron integrated circuit with a scalable communication network and interface", Science, 345, pp. 668-673, 2014.
[7] T. Pfeil et al., "Six networks on a universal neuromorphic computing substrate", Front. Neurosci., 7, pp. 1-17, 2013.
[8] T. Wunderlich et al., "Demonstrating advantages of neuromorphic computing: a pilot study", Front. Neurosci., 13, pp. 260, 2019.
[9] G. W. Burr et al., "Neuromorphic computing using non-volatile memory", Adv. Phys. X, 2, pp. 89-124, 2017.
[10] V. K. Sangwan, M. C. Hersam, "Neuromorphic nanoelectronic materials", Nat. Nanotech., 15, pp. 517-528, 2020.
[11] R. Waser, M. Aono, "Nanoionics-based resistive switching memories", Nat. Mat., 6, pp. 833-840, 2007.
[12] M. A. Zidan, J. P. Strachan, W. D. Lu, "The future of electronics based on memristive systems", Nat. Electron., 1, pp. 22-29, 2018.
[13] D. Ielmini, H.-S. P. Wong, "In-memory computing with resistive switching devices", Nat. Electron., 1, pp. 333-343, 2018.
[14] Z. Wang, H. Wu, G. W. Burr, C. S. Hwang, K. L. Wang, Q. Xia, J. J. Yang, "Resistive switching materials for information processing", Nat. Rev. Mat., 5, pp. 173-195, 2020.
[15] T. Ohno, T. Hasegawa, T. Tsuruoka, K. Terabe, J. K. Gimzewski, M. Aono, "Short-term plasticity and long-term potentiation mimicked in single inorganic synapses", Nat. Mat., 10, pp. 591-595, 2011.
[16] T. Serrano-Gotarredona, T. Masquelier, T. Prodromakis, G. Indiveri, B. Linares-Barranco, "STDP and STDP variations with memristors for spiking neuromorphic learning systems", Front. Neurosci., 7, pp. 2, 2013.
[17] A. Serb, J. Bill, A. Khiat, R. Berdan, R. Legenstein, T. Prodromakis, "Unsupervised learning in probabilistic neural networks with multi-state metal-oxide memristive synapses", Nat. Commun., 7, pp. 12611, 2016.
[18] A. Mehonic, A. Sebastian, B. Rajendran, O. Simeone, E. Vasilaki, A. J. Kenyon, "Memristors – from in-memory computing, deep learning acceleration, and spiking neural networks to the future of neuromorphic and bio-inspired computing", Adv. Intell. Syst., 2000085, 2020.
[19] Y. V. Pershin, M. Di Ventra, "Memory effects in complex materials and nanoscale systems", Adv. Phys., 60, pp. 145-227, 2011.
[20] P. N. Nirmalraj et al., "Manipulating connectivity and electrical conductivity in metallic nanowire networks", Nano Lett., 12, pp. 5966-5971, 2012.
[21] A. T. Bellew, A. P. Bell, E. K. McCarthy, J. A. Fairfield, J. J. Boland, "Programmability of nanowire networks", Nanoscale, 6, pp. 9632-9639, 2014.
[22] A. V. Avizienis, H. O. Sillin, C. Martin-Olmos, H. H. Shieh, M. Aono, A. Z. Stieg, J. K. Gimzewski, "Neuromorphic atomic switch networks", PLoS ONE, 7, pp. e42772, 2012.
[23] E. C. Demis et al., "Atomic switch networks – nanoarchitectonic design of a complex system for natural computing", Nanotech., 26, pp. 204003, 2015.
[24] G. Milano, S. Porro, I. Valov, C. Ricciardi, "Recent developments and perspectives for memristive devices based on metal oxide nanowires", Adv. Electronic Mat., 5, 1800090, 2019.
[25] A. Diaz-Alvarez et al., "Emergent dynamics of neuromorphic nanowire networks", Sci. Rep., 9, pp. 14920, 2019.
[26] R. D. Pantone, J. D. Kendall, J. C. Nino, "Memristive nanowires exhibit small-world connectivity", Neur. Net., 106, pp. 144-151, 2018.
[27] A. Loeffler et al., "Topological properties of neuromorphic nanowire networks", Front. Neurosci., 14, pp. 184, 2020.
[28] A. Stieg, A. V. Avizienis, H. O. Sillin, C. Martin-Olmos, M. Aono, J. K. Gimzewski, "Emergent criticality in complex Turing-B type atomic switch networks", Adv. Mater., 24, pp. 286-293, 2012.
[29] H. O. Sillin, R. Aguilera, H. H. Shieh, A. V. Avizienis, M. Aono, A. Z. Stieg, J. K. Gimzewski, "A theoretical and experimental study of neuromorphic atomic switch networks for reservoir computing", Nanotech., 24, pp. 384004, 2013.
[30] H. G. Manning et al., "Emergence of winner-takes-all connectivity paths in random nanowire networks", Nat. Commun., 9, pp. 3219, 2018.
[31] Z. Kuncic et al., "Emergent brain-like complexity from nanowire atomic switch networks", (IEEE-NANO), Cork, Ireland, pp. 1-3, 2018.
[32] G. Milano et al., "Brain-inspired structural plasticity through reweighting and rewiring in multi-terminal self-organizing memristive nanowire networks", Adv. Intell. Syst., 2, pp. 2000096, 2020.
[33] A. Diaz-Alvarez, R. Higuchi, Q. Li, Y. Shingaya, T. Nakayama, "Associative routing through neuromorphic nanowire networks", AIP Adv., 10, pp. 025134, 2020.
[34] Q. Li et al., "Dynamical electrical pathway tuning in neuromorphic nanowire networks", Adv. Func. Mat., in press, 2020.
[35] K. Fu et al., "Reservoir computing with neuro-memristive nanowire networks", in Proc. Intl. Joint Conf. Neural Networks (IJCNN), in press, 2020.
[36] Z. Kuncic et al., "Neuromorphic information processing with nanowire networks", in press, 2020.
[37] K. Zito, K. Svoboda, "Activity-dependent synaptogenesis in the adult mammalian cortex", Neuron, 35, pp. 1015-1017, 2002.
[38] Y. Cui, S. Ahmad, J. Hawkins, "Continuous online sequence learning with an unsupervised neural network model", Neur. Comp., 28, pp. 2474-2504, 2016.
[39] F. Zhuang et al., "A comprehensive survey on transfer learning", Proc. IEEE, pp. 1-34, 2020.
[40] E. Bullmore, O. Sporns, "Complex brain networks: graph theoretical analysis of structural and functional systems", Nat. Rev. Neurosci., 10, pp. 186-198, 2009.
[41] C. W. Lynn, D. S. Bassett, "The physics of brain network structure, function and control", Nat. Rev. Phys., 1, pp. 318-332, 2019.
[42] J. Zhu, T. Zhang, Y. Yang, R. Huang, "A comprehensive review on emerging artificial neuromorphic devices", Appl. Phys. Rev., 7, pp. 011312, 2020.
[43] K. Terabe, T. Hasegawa, T. Nakayama, M. Aono, "Quantized conductance atomic switch", Nature, 433, pp. 47-50, 2005.
[44] J. G. Simmons, "Generalized formula for the electric tunnel effect between similar electrodes separated by a thin insulating film", J. Appl. Phys., 34, pp. 1793-1803, 1963.
[45] C.-W. Ho, A. Ruehli, P. Brennan, "The modified nodal approach to network analysis", IEEE Transactions on Circuits and Systems, 22, pp. 504-509, 1975.
[46] N. Bertschinger, T. Natschläger, "Real-time computation at the edge of chaos in recurrent neural networks", Neur. Comp., 16, pp. 1413-1436, 2004.
[47] W. Maass, "On the computational power of winner-take-all".