Harnessing adaptive dynamics in neuro-memristive nanowire networks for transfer learning
Ruomin Zhu∗, Joel Hochstetter∗, Alon Loeffler∗, Adrian Diaz-Alvarez†, Adam Stieg†‡, James Gimzewski†‡, Tomonobu Nakayama†∗ and Zdenka Kuncic∗†
∗School of Physics and Sydney Nano Institute, University of Sydney, Sydney, NSW 2006, Australia. Email: [email protected]
†International Centre for Materials Nanoarchitectonics, National Institute for Materials Science, Tsukuba, Japan
‡California NanoSystems Institute, University of California at Los Angeles, California, USA
Abstract—Nanowire networks (NWNs) represent a unique hardware platform for neuromorphic information processing. In addition to exhibiting synapse-like resistive switching memory at their cross-point junctions, their self-assembly confers a neural network-like topology on their electrical circuitry, something that is impossible to achieve through conventional top-down fabrication approaches. In addition to their low power requirements, cost effectiveness and efficient interconnects, neuromorphic NWNs are also fault-tolerant and self-healing. These highly attractive properties can be largely attributed to their complex network connectivity, which enables a rich repertoire of adaptive nonlinear dynamics, including edge-of-chaos criticality. Here, we show how the adaptive dynamics intrinsic to neuromorphic NWNs can be harnessed to achieve transfer learning. We demonstrate this through simulations of a reservoir computing implementation in which NWNs perform the well-known benchmark task of Mackey–Glass (MG) signal forecasting. First, we show how NWNs can predict MG signals with arbitrary degrees of unpredictability (i.e. chaos). We then show that NWNs pre-exposed to an MG signal perform better at forecasting than NWNs without prior experience of an MG signal. This type of transfer learning is enabled by the network's collective memory of previous states. Overall, their adaptive signal processing capabilities make neuromorphic NWNs promising candidates for emerging real-time applications in IoT devices, particularly at the far edge.
Index Terms—neuromorphic information processing, memristive switching, neural network, nonlinear dynamics, transfer learning
I. INTRODUCTION
The field of neuromorphic engineering is widely recognized as the realization of Carver Mead's original vision for a new type of electronic hardware engineered to mimic information processing in biological nervous systems [1], [2]. Today, the lowest common denominator of virtually all neuromorphic hardware systems is the co-location of memory and processing units (i.e. non-von Neumann architecture). This minimal neuromorphic feature alone has dramatically improved power efficiency in training various artificial neural network (ANN) models [3].

A higher-level neuromorphic attribute is the ability to learn, and while ANN models demonstrate learning in software, learning in hardware is desirable for next-generation stand-alone cognitive devices, especially at the IoT edge [4]. In hardware, spike-based learning has been successfully implemented in conventional silicon CMOS technology (e.g. [5]–[8]). Beyond silicon, nanoelectronic materials with intrinsic neuromorphic properties, including memory and the ability to emulate synaptic connections [3], [9], have attracted enormous attention for on-chip learning [10]. In particular, resistive switching memory (memristive) devices [11]–[14] are leading candidates for efficient neuromorphic computing architectures, with demonstrated neuromorphic learning functionalities such as short-term and long-term potentiation (STP/LTP) and spike-timing dependent plasticity (STDP) [15]–[18].

At the device level, neuromorphic functionalities can be broadly attributed to modification of electronic transport mechanisms by nanoscale geometric confinement, usually across a metal-insulator-metal (MIM) junction [19]. Importantly, synapse-like memristive switching is observed not just in memristors fabricated from conventional bulk materials (e.g. metal oxides), but also in neuromorphic systems self-assembled from nanomaterials using bottom-up techniques [10].
Here, we focus on self-assembled metallic nanowires because not only do they form memristive switching MIM junctions, but they also form a complex neural-like network topology, with all-in-one connectivity properties such as small-worldness, modularity and recurrent feedback loops [20]–[27]. The unique neuromorphic topology of self-assembled nanowire networks (NWNs) is responsible for collective functionalities emerging from the interplay between network connectivity and synaptic nonlinear dynamics [25], [28]–[32].

Learning in NWN hardware does not require implementation of an ANN model, as has been demonstrated with associative memory tasks [33], [34] and with temporal information processing tasks using a reservoir computing approach, where the network self-regulates in response to continuous-time input signals and only the readout is trained [29], [35], [36]. Varying spatio-temporal input signals (i.e. delivered via different contact electrodes and with time-varying amplitudes) results in the formation of new electrical pathways, analogous to synaptogenetic learning [37], [38]. Here, we show that NWNs with prior experience of a complex, nonlinear time-series signal can perform better at forecasting the signal than a NWN without prior exposure, thus demonstrating a capacity for transfer learning, an important attribute for general intelligence (see [39] for a recent comprehensive review).

Fig. 1. Graph representations of 300-node networks: left – self-assembled nanowire network (2434 edge junctions, average degree 16, small-world propensity 0.67); right – random network (2400 junctions, average degree 16, small-world propensity 0.29). Nodes in red, edges in black.

II. METHODS
A. Modelling network connectivity and memristive junctions
We performed simulations using a physically motivated model based on polymer-coated Ag nanowires that self-assemble into a complex network [25], [31]. Self-assembly was modelled by distributing individual nanowires on a 2D plane, with uniformly random positions and orientations, and with lengths sampled from a gamma distribution (mean 100 µm, stdev 10 µm). The variance in nanowire length is based on experimental observations [20], [21], [25], [33] and increases the probability of forming cross-point junctions between overlapping nanowires. This mimics biological neural networks, in which individual neurons can each make several thousand synaptic connections to neighbouring neurons. In our model of self-assembled nanowire networks (NWNs), a range of nanowire connectivities is possible for a fixed number of nanowires. Importantly, the resulting network structure is more complex than a purely random topology or fully connected network (Fig. 1), with sparseness and recurrence characteristics that are responsible for efficient signal transduction and emergent cognitive function in biological neural networks [27], [40], [41]. It is also noteworthy that the complex network topology of self-assembled networks differs from the bipartite structure used in ANN models.

Nanowire–nanowire cross-points were modelled as voltage-controlled memristive junctions described by a state-dependent Ohm's law, I = G(λ)V, where the conductance G(λ) is a function of the state variable λ(t), which depends on the past history of voltage input. Physically, λ(t) parameterizes the evolution of a conductive filament that forms across the MIM junction above a threshold bias. For polymer-coated Ag nanowires, the polymer is electrically insulating but ionically conducting, so Ag+ cations can migrate across the biased junction [42].
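As a concrete illustration of this self-assembly model, the sketch below (not the authors' simulator) drops wires with gamma-distributed lengths at uniformly random positions and orientations, then finds cross-point junctions by segment intersection. The plane size and the brute-force intersection test are illustrative assumptions chosen only to make the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_nanowires(n_wires=300, plane=350.0, mean_len=100.0, std_len=10.0):
    """Drop wires on a plane with uniformly random centres and orientations,
    and gamma-distributed lengths (mean 100 um, stdev 10 um)."""
    shape = (mean_len / std_len) ** 2        # gamma shape k = (mu/sigma)^2
    scale = std_len ** 2 / mean_len          # gamma scale theta = sigma^2/mu
    lengths = rng.gamma(shape, scale, n_wires)
    centres = rng.uniform(0, plane, (n_wires, 2))
    angles = rng.uniform(0, np.pi, n_wires)
    half = 0.5 * lengths[:, None] * np.c_[np.cos(angles), np.sin(angles)]
    return centres - half, centres + half    # wire end points a, b

def segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 properly intersect (orientation test)."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    d1, d2 = cross(q1, q2, p1), cross(q1, q2, p2)
    d3, d4 = cross(p1, p2, q1), cross(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0

def find_junctions(a, b):
    """Brute-force list of wire pairs that overlap (candidate MIM junctions)."""
    n = len(a)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if segments_cross(a[i], b[i], a[j], b[j])]
```

Each junction found this way becomes one memristive edge in the network graph; denser wire deposition (smaller plane for a fixed wire count) raises the average degree.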
The conducting atomic filament that forms in this way switches the junction from a high-resistance "off" state to a low-resistance "on" state when λ ≥ λ_crit, where λ_crit is a threshold. As the polymer thickness is comparable to the Fermi length of Ag, resistive switching is modelled as a change in the junction conductance state G(λ) by an amount equal to the conductance quantum G_0 = (13 kΩ)^-1, consistent with measurements of individual nanowire junctions [43]. The corresponding resistance states are R_on = G_0^-1 and R_off = ζR_on, with ζ = 10 used in the simulation results presented here. Additionally, the Simmons formula is used to model the low-voltage tunneling regime in G(λ) when the conductive filament is close to the opposite nanowire [44]. Network conductance is calculated using a modified nodal analysis [45] to solve Kirchhoff's circuit law equations at each time point.

B. Mackey–Glass time series prediction
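A minimal single-junction sketch of this threshold-switching picture is given below. The voltage threshold, decay rate, λ_crit value and Euler time step are illustrative assumptions (the paper does not state them here), and the Simmons tunnelling correction is omitted for brevity; only G_0 = (13 kΩ)^-1 and ζ = 10 come from the text.

```python
# One voltage-controlled threshold memristive junction, in the spirit of the
# state-dependent Ohm's law I = G(lambda) V described above.
G0 = 1.0 / 13e3        # conductance quantum used in the paper, (13 kOhm)^-1
ZETA = 10.0            # R_off / R_on ratio from the text
LAM_CRIT = 0.1         # filament-formation threshold (assumed value)
V_TH = 0.01            # voltage threshold for filament growth (assumed value)

class Junction:
    def __init__(self):
        self.lam = 0.0                        # filament state variable lambda(t)

    def conductance(self):
        """Binary switch: 'on' at G0 once |lambda| exceeds lambda_crit."""
        return G0 if abs(self.lam) >= LAM_CRIT else G0 / ZETA

    def step(self, v, dt=1e-3):
        """Advance lambda by forward Euler; return the current I = G(lambda) V."""
        if abs(v) > V_TH:
            # filament grows in the direction of the applied bias
            self.lam += (abs(v) - V_TH) * (1 if v > 0 else -1) * dt
        else:
            self.lam -= 0.1 * self.lam * dt   # slow relaxation toward 0
        return self.conductance() * v
```

In the full network model, one such state update runs at every junction on each time step, with the junction voltages obtained from the modified nodal analysis solve.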
Reservoir computing was implemented on a network with N = 100 nanowire nodes and 577 memristive junctions. The Mackey–Glass (MG) signal was delivered to one source node as an input voltage bias relative to a drain node. MG signals with varying time delays τ were predicted, with τ = 17 corresponding to the onset of chaotic dynamics.

The network state as a function of time t is represented by the instantaneous voltage on all N nodes. The MG signal at a future time, u_{t+δt}, was predicted using a subset of n = 10 node states weighted by a vector w:

u_{t+δt} = w · η_t,   (1)

where η_t is an 11-element vector that includes a 1 V linear shift element, and where w was determined by least squares regression using all past states of the n nodes and the input (teacher) signal in the time interval t ∈ [0, T], i.e.

[u_δt, u_{δt+1}, ..., u_T] = w · [η_0, η_1, ..., η_{T−δt}].   (2)

A history length of T > τ was used to train the n output weights and the prediction step was set to δt = τ. Accuracy of the prediction task was calculated as

Accuracy = 1 − RNMSE,   (3)

where RNMSE is the root-normalized mean square error. Statistical uncertainties were determined by randomly selecting the n = 10 readout nodes for 100 simulations and averaging the accuracy.

C. Transfer learning
In conventional reservoir computing, the initial state of the network is homogeneous, and for the MG prediction task described above we set η_0 = 0. We modified the task by first exposing the network to a source MG signal with delay parameter τ′ before training and predicting a second target MG signal with delay parameter τ. Delivering the source MG(τ′) signal for 1.5 s effectively primed the network to an initial state η′ ≠ 0 that has memory of previous states associated with the source MG signal. This is analogous to transfer learning methods applied to ANN models, where synaptic weights are trained on a source domain and the knowledge gained is transferred to a different, but related, target domain [39]. Our case is somewhat different, as the network dynamically self-adjusts its own synaptic junction states during the priming period (since in reservoir computing, only the output weights are trained, not the network weights). We compared the accuracy in predicting the MG target signal to that obtained for a network without prior exposure to the source MG signal during a pre-training period.

III. RESULTS
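The priming protocol can be sketched as follows. A small echo-state surrogate stands in for the NWN reservoir (an assumption — in the paper the reservoir is the memristive network itself, and priming also adapts the junction states, which this surrogate does not capture); what the sketch preserves is the protocol shape: the source signal is driven through the reservoir with no training, so only the internal state η′ carries over before the target signal is trained.

```python
import numpy as np

rng = np.random.default_rng(1)

class SurrogateReservoir:
    """Fixed random recurrent network as a stand-in for the N = 100 node NWN."""
    def __init__(self, n=100):
        self.W = rng.normal(0, 1, (n, n))
        self.W *= 0.9 / np.max(np.abs(np.linalg.eigvals(self.W)))  # keep stable
        self.w_in = rng.normal(0, 1, n)
        self.state = np.zeros(n)                  # homogeneous eta_0 = 0

    def drive(self, signal):
        """Feed a signal; return the state trajectory."""
        states = []
        for u in signal:
            self.state = np.tanh(self.W @ self.state + self.w_in * u)
            states.append(self.state.copy())
        return np.array(states)

def primed_vs_unprimed(source, target, dstep=17):
    """Train a 10-node linear readout on `target`, with and without priming."""
    accs = {}
    for label, prime in (("unprimed", None), ("primed", source)):
        res = SurrogateReservoir()
        if prime is not None:
            res.drive(prime)                      # priming: eta' != 0, no training
        S = res.drive(target[:-dstep])
        X = np.c_[S[:, :10], np.ones(len(S))]     # n = 10 readout nodes + bias
        y = target[dstep:]
        w, *_ = np.linalg.lstsq(X[:1500], y[:1500], rcond=None)
        rnmse = np.sqrt(np.mean((X[1500:] @ w - y[1500:])**2) / np.var(y[1500:]))
        accs[label] = 1.0 - rnmse
    return accs
```

Whether priming helps in this surrogate depends on the chosen signals; the paper's result is that in the NWN it does, because the junction states themselves retain memory of the source signal.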
A. Adaptive dynamics
Fig. 2. Top panel – individual memristive junction conductances G_jn (in units of the conductance quantum G_0) as a function of time t (in units of total simulation time T) for a triangular voltage signal (black) input to a 261-junction NWN. Bottom panel – snapshots of the network at four sequential time points, with the colorbar indicating G_jn. Dark blue junctions denote memristive switches in the off state.

Figure 2 shows the NWN response to a triangular input signal. In the top panel, each colored curve represents the evolution in time of the conductance G_jn across an individual memristive junction. The network connectivity determines the spatial distribution of voltage at each moment in time. This connectivity influences the voltage-controlled memristive dynamics of each junction, resulting in collective switching as G_jn continuously adapts. The bottom panel shows this self-regulation of the synaptic junctions in snapshot visualizations of the network at successive time points during evolution. Brightly colored circles, evident in two of the frames, represent memristive switches in their on state, with current paths indicated (white). The intrinsic adaptive dynamics of NWNs can in principle be harnessed for information processing. For the signal parameters used in Fig. 2, the network exhibits "edge-of-chaos" dynamics (e.g. I–V trajectories begin to diverge), which may be optimal for information processing [46].

B. Mackey–Glass time-series prediction
Fig. 3 plots the time series for training and predicting an MG signal with τ = 20. Network output weights w are trained using the first 2400 time steps, after which the signal is predicted 20 steps ahead using eq. (1). The target signal is overplotted for comparison. The resulting prediction accuracy is 75%. Fig. 4 plots the prediction accuracy as a function of τ. Accuracy decreases with τ because errors amplify exponentially as the MG signal becomes more chaotic.

Fig. 3. Time series of the MG source signal during training (gray), followed by target (dashed blue) and predicted (red) signals for τ = 20. The inset shows a zoom-in of part of the prediction period.

Fig. 4. Average MG forecasting accuracy as a function of delay parameter τ. Shading indicates standard error.

C. Transfer learning

Figure 5 plots MG prediction accuracy when the network is primed by an MG signal prior to training. Accuracy is plotted for three different pre-training MG signals (τ′ = 20, …, 150) as a function of the τ used for MG signal training. For comparison, prediction accuracy without pre-training (cf. Fig. 4) is also overplotted. Accuracy improves when the network is first primed with an MG signal. This demonstrates the principle of transfer learning, where knowledge is extracted from a source domain and then leveraged for learning in a related target domain.

Fig. 5. Average MG signal prediction accuracy as a function of the delay parameter τ of the predicted signal, for different τ′ signals used to prime the network before training. Average accuracy without pre-training is shown for comparison (cf. Fig. 4). Shading indicates standard error.

Learning performance is expected to improve especially when there is insufficient information in the target domain compared to the source domain. In this example, prediction accuracy improves more when the network is primed by a source MG signal that is more chaotic (i.e. has more degrees of freedom) than the target MG signal (i.e. τ′ > τ). This is shown by the blue curve (for τ′ = 150) in Fig. 5 and by the accuracy-difference heatmap in Fig. 6.

Fig. 6. Heatmap showing the change in average accuracy in predicting an MG signal with delay τ when the network is primed using an MG signal with delay τ′, relative to prediction without priming.

Importantly, the target MG signal is predicted without relying on any teacher signal for recall. This suggests learning is achieved by harnessing the network's collective memory of past dynamical states. Priming the network before training improves learning by strengthening the memristive connections in an adaptive way, enabling longer-term memory consolidation.

Prediction accuracy also depends on the instantaneous network state selected for priming. Regardless of the value of τ of the MG signal being predicted, we find accuracy is optimized for a small range of primed network states. This optimal range of states occurs around network activation, coinciding with the formation of a winner-takes-all (WTA) current path (cf. Fig. 2, bottom panel). Such WTA gate modules in network circuits are purported to have universal computational power for both digital and analog information processing [47].

IV. CONCLUSIONS
We have demonstrated that the complex interplay between the neural network-like circuitry of nanowire networks and their memristive junctions results in adaptive dynamics, whereby the network self-regulates to find the optimal signal transduction routes. We showed how these adaptive dynamics can be harnessed for signal processing using a reservoir computing implementation. Prediction of the highly nonlinear Mackey–Glass signal was demonstrated well into the strongly chaotic regime. This has not previously been demonstrated with other memristive reservoir computing approaches. Moreover, we found that performance accuracy on this task is improved by transfer learning, where the network is primed by a Mackey–Glass signal before training. Our results show that transfer learning improves performance the most when pre-training with a source signal that is more complex than the target signal to be predicted.

ACKNOWLEDGMENT
The authors acknowledge use of the Artemis High Performance Computing resource at the Sydney Informatics Hub, a Core Research Facility of the University of Sydney.
REFERENCES

[1] C. Mead, "Neuromorphic electronic systems", Proc. IEEE, 78, pp. 1629-1636, 1990.
[2] C. Mead, "How we created neuromorphic engineering", Nat. Elect., 3, pp. 434-435, 2020.
[3] W. Zhang et al., "Neuro-inspired computing chips", Nat. Elect., 3, pp. 371-382, 2020.
[4] O. Krestinskaya, A. P. James, L. O. Chua, "Neuromemristive circuits for edge computing: a review", IEEE Trans. Neur. Net. Learn. Sys., 31, 4, 2020.
[5] G. Indiveri et al., "Neuromorphic silicon neuron circuits", Front. Neurosci., 5, pp. 1-23, 2011.
[6] P. A. Merolla et al., "A million spiking neuron integrated circuit with a scalable communication network and interface", Science, 345, pp. 668-673, 2014.
[7] T. Pfeil et al., "Six networks on a universal neuromorphic computing substrate", Front. Neurosci., 7, pp. 1-17, 2013.
[8] T. Wunderlich et al., "Demonstrating advantages of neuromorphic computing: a pilot study", Front. Neurosci., 13, pp. 260, 2019.
[9] G. W. Burr et al., "Neuromorphic computing using non-volatile memory", Adv. Phys. X, 2, pp. 89-124, 2017.
[10] V. K. Sangwan, M. C. Hersam, "Neuromorphic nanoelectronic materials", Nat. Nanotech., 15, pp. 517-528, 2020.
[11] R. Waser, M. Aono, "Nanoionics-based resistive switching memories", Nat. Mat., 6, pp. 833-840, 2007.
[12] M. A. Zidan, J. P. Strachan, W. D. Lu, "The future of electronics based on memristive systems", Nat. Electron., 1, pp. 22-29, 2018.
[13] D. Ielmini, H.-S. P. Wong, "In-memory computing with resistive switching devices", Nat. Electron., 1, pp. 333-343, 2018.
[14] Z. Wang, H. Wu, G. W. Burr, C. S. Hwang, K. L. Wang, Q. Xia, J. J. Yang, "Resistive switching materials for information processing", Nat. Rev. Mat., 5, pp. 173-195, 2020.
[15] T. Ohno, T. Hasegawa, T. Tsuruoka, K. Terabe, J. K. Gimzewski, M. Aono, "Short-term plasticity and long-term potentiation mimicked in single inorganic synapses", Nat. Mat., 10, pp. 591-595, 2011.
[16] T. Serrano-Gotarredona, T. Masquelier, T. Prodromakis, G. Indiveri, B. Linares-Barranco, "STDP and STDP variations with memristors for spiking neuromorphic learning systems", Front. Neurosci., 7, pp. 2, 2013.
[17] A. Serb, J. Bill, A. Khiat, R. Berdan, R. Legenstein, T. Prodromakis, "Unsupervised learning in probabilistic neural networks with multi-state metal-oxide memristive synapses", Nat. Commun., 7, pp. 12611, 2016.
[18] A. Mehonic, A. Sebastian, B. Rajendran, O. Simeone, E. Vasilaki, A. J. Kenyon, "Memristors – from in-memory computing, deep learning acceleration, and spiking neural networks to the future of neuromorphic and bio-inspired computing", Adv. Intell. Syst., 2000085, 2020.
[19] Y. V. Pershin, M. Di Ventra, "Memory effects in complex materials and nanoscale systems", Adv. Phys., 60, pp. 145-227, 2011.
[20] P. N. Nirmalraj et al., "Manipulating connectivity and electrical conductivity in metallic nanowire networks", Nano Lett., 12, pp. 5966-5971, 2012.
[21] A. T. Bellew, A. P. Bell, E. K. McCarthy, J. A. Fairfield, J. J. Boland, "Programmability of nanowire networks", Nanoscale, 6, pp. 9632-9639, 2014.
[22] A. V. Avizienis, H. O. Sillin, C. Martin-Olmos, H. H. Shieh, M. Aono, A. Z. Stieg, J. K. Gimzewski, "Neuromorphic atomic switch networks", PLoS ONE, 7, pp. e42772, 2012.
[23] E. C. Demis et al., "Atomic switch networks – nanoarchitectonic design of a complex system for natural computing", Nanotech., 26, pp. 204003, 2015.
[24] G. Milano, S. Porro, I. Valov, C. Ricciardi, "Recent developments and perspectives for memristive devices based on metal oxide nanowires", Adv. Electronic Mat., 5, 1800090, 2019.
[25] A. Diaz-Alvarez et al., "Emergent dynamics of neuromorphic nanowire networks", Sci. Rep., 9, pp. 14920, 2019.
[26] R. D. Pantone, J. D. Kendall, J. C. Nino, "Memristive nanowires exhibit small-world connectivity", Neur. Net., 106, pp. 144-151, 2018.
[27] A. Loeffler et al., "Topological properties of neuromorphic nanowire networks", Front. Neurosci., 14, pp. 184, 2020.
[28] A. Stieg, A. V. Avizienis, H. O. Sillin, C. Martin-Olmos, M. Aono, J. K. Gimzewski, "Emergent criticality in complex Turing-B type atomic switch networks", Adv. Mater., 24, pp. 286-293, 2012.
[29] H. O. Sillin, R. Aguilera, H. H. Shieh, A. V. Avizienis, M. Aono, A. Z. Stieg, J. K. Gimzewski, "A theoretical and experimental study of neuromorphic atomic switch networks for reservoir computing", Nanotech., 24, pp. 384004, 2013.
[30] H. G. Manning et al., "Emergence of winner-takes-all connectivity paths in random nanowire networks", Nat. Commun., 9, pp. 3219, 2018.
[31] Z. Kuncic et al., "Emergent brain-like complexity from nanowire atomic switch networks", (IEEE-NANO), Cork, Ireland, pp. 1-3, 2018.
[32] G. Milano et al., "Brain-inspired structural plasticity through reweighting and rewiring in multi-terminal self-organizing memristive nanowire networks", Adv. Intell. Syst., 2, pp. 2000096, 2020.
[33] A. Diaz-Alvarez, R. Higuchi, Q. Li, Y. Shingaya, T. Nakayama, "Associative routing through neuromorphic nanowire networks", AIP Adv., 10, pp. 025134, 2020.
[34] Q. Li et al., "Dynamical electrical pathway tuning in neuromorphic nanowire networks", Adv. Func. Mat., in press, 2020.
[35] K. Fu et al., "Reservoir computing with neuro-memristive nanowire networks", in Proc. Intl. Joint Conf. Neural Networks (IJCNN), in press, 2020.
[36] Z. Kuncic et al., "Neuromorphic information processing with nanowire networks", in press, 2020.
[37] K. Zito, K. Svoboda, "Activity-dependent synaptogenesis in the adult mammalian cortex", Neuron, 35, pp. 1015-1017, 2002.
[38] Y. Cui, S. Ahmad, J. Hawkins, "Continuous online sequence learning with an unsupervised neural network model", Neur. Comp., 28, pp. 2474-2504, 2016.
[39] F. Zhuang et al., "A comprehensive survey on transfer learning", Proc. IEEE, pp. 1-34, 2020.
[40] E. Bullmore, O. Sporns, "Complex brain networks: graph theoretical analysis of structural and functional systems", Nat. Rev. Neurosci., 10, pp. 186-198, 2009.
[41] C. W. Lynn, D. S. Bassett, "The physics of brain network structure, function and control", Nat. Rev. Phys., 1, pp. 318-332, 2019.
[42] J. Zhu, T. Zhang, Y. Yang, R. Huang, "A comprehensive review on emerging artificial neuromorphic devices", Appl. Phys. Rev., 7, pp. 011312, 2020.
[43] K. Terabe, T. Hasegawa, T. Nakayama, M. Aono, "Quantized conductance atomic switch", Nature, 433, pp. 47-50, 2005.
[44] J. G. Simmons, "Generalized formula for the electric tunnel effect between similar electrodes separated by a thin insulating film", J. Appl. Phys., 34, pp. 1793-1803, 1963.
[45] C.-W. Ho, A. Ruehli, P. Brennan, "The modified nodal approach to network analysis", IEEE Transactions on Circuits and Systems, 22, pp. 504-509, 1975.
[46] N. Bertschinger, T. Natschläger, "Real-time computation at the edge of chaos in recurrent neural networks", Neur. Comp., 16, pp. 1413-1436, 2004.
[47] W. Maass, "On the computational power of winner-take-all".