On the number of limit cycles in diluted neural networks
Sungmin Hwang, Enrico Lanza, Giorgio Parisi, Jacopo Rocchi, Giancarlo Ruocco, Francesco Zamponi
LPTMS, Université Paris-Sud 11, UMR 8626 CNRS, Bât. 100, 91405 Orsay Cedex, France
Dipartimento di Biotecnologie Cellulari ed Ematologia, Sapienza Università di Roma, P.le A. Moro 2, 00185 Roma, Italy
Dipartimento di Fisica, Sapienza Università di Roma, P.le A. Moro 2, 00185 Roma, Italy
Center for Life Nano Science, Fondazione Istituto Italiano di Tecnologia (IIT), Viale Regina Elena 291, I-00161 Roma, Italy
Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
(Dated: January 3, 2020)

We consider the storage properties of temporal patterns, i.e. cycles of finite length, in neural networks represented by (generally asymmetric) spin glasses defined on random graphs. Inspired by the observation that the dynamics on sparse systems has more basins of attraction than the dynamics of densely connected ones, we consider the attractors of a greedy dynamics in sparse topologies as proxies for the stored memories. We enumerate them using numerical simulation and extend the analysis to large system sizes using belief propagation. We find that the logarithm of the number of such cycles is a non-monotonic function of the mean connectivity, and we discuss the similarities with biological neural networks describing the memory capacity of the hippocampus.
I. INTRODUCTION
Identifying the relation between single-neuron activity and the mechanisms underlying cognitive processes and the collective properties of complexly interconnected networks is one of the most important questions in neuroscience. In spite of its highly organized structure, the mammalian brain is impressively complex, with billions of neurons and ten thousand times as many connections supporting the integration of information from different areas. Facing such high complexity, we are forced to resort to simplified models to dissect the problem into clear and controllable elements. Artificial neural networks offer a way to model and study nervous systems with arbitrarily simple connectivity. To this extent, understanding the role of network topology in the dynamics of these models and in the resulting information storage and processing is a way to investigate crucial elements affecting the behaviour of complex nervous systems.
The present work takes up where Ref. [1] left off, further investigating the relationship between the connectivity structure of recurrent networks and their storing power, associated with the resulting number of limit cycles, which represent cyclically repeating patterns of neuronal activity. As a content-addressable model of the Hopfield type [2], we employ a recurrent network of McCulloch-Pitts neurons [3] with deterministic dynamics. In recurrent neural networks, information storage is based on the association of input patterns to cyclic activity (limit cycles, or attractors), which encode memorized elements. Given the deterministic dynamics, this association is necessarily unequivocal, making it a reliable way to store information non-locally. The retrieval time of a memory encoded in this way depends on the strength of the associated neuronal patterns.
The connectivity structure of such a system completely determines the clustering operation mapping the input set into the generally smaller set of attractors. The identification of information storage with cyclical activation patterns, as featured in Hopfield models, has found experimental support in resting-state recordings of neuronal populations in the hippocampus of rodents [4] and in task-driven neuronal activity in monkeys [5, 6]. Additionally, the Hopfield model benefited greatly from the striking analogy it bears with spin systems, which belong to a category of complex systems that has been extensively studied in physics, from its introduction [7] until today [8-11]. Both models represent networks of relatively simple elementary units, whose dynamics result from the interaction among elements. The "neighboring" elements are completely defined by the connectivity matrix J, which plays a key role in such systems. In the case of neuronal networks, the connectivity matrix contains the information about the connectome, i.e. the set of synaptic connections of either chemical or electrical nature. The construction of the connectome may follow two different paths. On one side, it can be based on synaptic plasticity, which means that a preexisting 'empty' network is enriched with new connections every time a new memory is produced. Alternatively, when storing memories, preexisting connections of an already filled matrix may be modified (reinforcement). In either case, memories are represented by attractors of the dynamics defined by the preexisting random network, and the maximum storage capacity grows linearly with the system size N. This is the framework we focus on, studying the properties of random J matrices. Tanaka and Edwards [12] identified the relation between the mean number of fixed points and the system size for random ensembles of fully connected Ising spin glass models in the thermodynamic limit.
In this case, the binary variables are spins σ_i ∈ {−1, 1} and, again, the dynamics are driven by the strength of the interaction of neighboring nodes, defined by asymmetric matrices J whose elements J_ij are drawn from a fixed distribution with zero mean. By including a symmetry parameter (shown to influence the number of limit cycles as well [13]), another study focusing on this kind of model at finite size shows the presence of a transition point linked to the symmetry of the connectivity matrix [14]. Systems with higher symmetry feature fixed points or limit cycles of length 2, whose number increases exponentially with the system size. Systems with a lower symmetry parameter present limit cycles whose length increases exponentially with the system size. Other transitions obtained with this model concern the transient time spent to reach the attractor from random inputs [13-15] and the size of the basins of attraction of the existing limit cycles [14]. Adding noise [16, 17], dilution [18], a gain function [19-21] or self-interaction [22] may further expand the array of possible transitions in the system. In particular, adding dilution to fully asymmetric networks results in an increased number of attractors, and therefore in an increased storage capacity [23].
Our previous work [1] focused on the mean number of limit cycles of arbitrary length L and their associated complexity Σ_L. In particular, we presented a method based on [24] that enables one to compute, for fully connected matrices and different symmetry degrees, the average number of limit cycles of any length in the form n_L ∼ (A_L/L) exp(N Σ_L), and solved the numerical computation of the average for lengths L ≤ 3. We also included information on the overlap between cycle states, which is related to the attractor structure, and showed the presence of a transition to chaotic regimes, depending on the symmetry degree of the network.
Here, we focus on the complexity of limit cycles of length 4 in the thermodynamic limit, in diluted matrices, as a function of the connectivity parameter c and the symmetry parameter ε. The problem of storing temporal patterns in a supervised manner has been considered in [25]; here we consider a similar problem in an unsupervised context. We argue that for fully asymmetric networks these are the smallest cycle lengths allowed, and that longer cycles have lengths that are multiples of four. Additionally, we investigate the finite-size effects of the model with three different ensembles of random matrices and show how the complexity Σ_4 at finite sizes converges towards the fully connected case as c is increased.

II. MODEL
We consider a system of N binary neurons, encoded in a vector spin variable σ with entries σ_i ∈ {−1, 1} (i = 1, ..., N). At each discrete time step t, σ^t undergoes a deterministic dynamics which updates all nodes synchronously according to the following rule:

\sigma_i^{t+1} = \mathrm{sgn}\left( \sum_{j=1}^{N} J_{ij} \sigma_j^t \right), \qquad (1)

where sgn(x) is the sign function. The couplings J_ij are quenched random variables which stay constant over the dynamical process. To study the average properties of the dynamics, we introduce matrix ensembles for J_ij constructed as follows. First, regarding topology, we consider two commonly used graph ensembles that control the overall connectivity of the networks, namely regular random graphs and Erdős-Rényi (ER) graphs. For both cases, as long as the average connectivity c, or mean degree, is kept constant in the limit N → ∞, the resulting graphs are locally tree-like and their adjacency matrices are sparse. Next, for each graph realization G, we assign the coupling constants for connected links considering bidirectional couplings, such that both J_ij and J_ji are nonzero if i and j are connected, following experimental evidence commonly found in real cortical networks. We further assume that the couplings J_ij and J_ji are constructed from two random numbers of the form

J_{ij} = \left(1 - \frac{\epsilon}{2}\right) S_{ij} + \frac{\epsilon}{2} A_{ij}, \qquad (2)

where S_ij and A_ij are i.i.d. standard Gaussian random variables with the symmetric and antisymmetric conditions, i.e., S_ji = S_ij and A_ji = −A_ij, respectively. The parameter ε controls the overall asymmetry of the coupling constants, the special cases ε = 0, 1, 2 corresponding to symmetric, asymmetric and antisymmetric coupling matrices, respectively.
Here, we mainly focus on the long-time properties of the dynamics, i.e., the statistical properties of the periodic points (or limit cycles) of the dynamics. In the following, we present a general formalism that computes the average number of L-cycles in the limit N → ∞ by extending the pioneering work of Gardner, Derrida and Mottishaw [24].
Given two spin configurations σ, σ′, we construct an indicator variable w(σ, σ′), which is equal to one if σ′ is the one-step evolution of σ according to Eq. (1), and zero otherwise. This indicator variable is conveniently represented by observing that σ′ is the evolution of σ if and only if the local field H_i(σ_∂i) = Σ_{j∈∂i} J_ij σ_j has the same sign as σ′_i, for all i. Here, the symbol σ_∂i denotes the set of neighboring spins of i, which emphasizes the fact that H_i(σ_∂i) only depends on the spin configuration of the neighbors of node i. The above condition allows us to encode w(σ, σ′) compactly as a product of Heaviside theta functions:

w(\sigma, \sigma') = \prod_{i=1}^{N} \theta\big( \sigma'_i H_i(\sigma_{\partial i}) \big). \qquad (3)

By applying this logic repeatedly, it is clear that the product \prod_{t=1}^{L} w(\sigma^t, \sigma^{t+1}) is again an indicator variable, which detects the trajectory \underline{\sigma} = \{\sigma^1, \dots, \sigma^L, \sigma^{L+1}\}. Here and from now on, the underline symbol denotes a trajectory \underline{\sigma} = \{\sigma^1, \dots, \sigma^L, \sigma^{L+1}\}, and \underline{\sigma}_i = \{\sigma_i^1, \dots, \sigma_i^L, \sigma_i^{L+1}\} denotes its single entry. With the additional periodic boundary condition σ^{L+1} = σ^1, the trajectory is a cycle of length L. Specifically, we define the partition function

Z_L = \sum_{\underline{\sigma}} \prod_{t=1}^{L} w(\sigma^t, \sigma^{t+1}) \equiv \sum_{\underline{\sigma}} \prod_{i=1}^{N} \psi_i(\underline{\sigma}_i, \underline{\sigma}_{\partial i}), \qquad (4)

where the summation over \underline{\sigma} runs over all possible spin trajectories obeying the periodic boundary condition.
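As an illustration, the coupling ensemble of Eq. (2) and the synchronous update of Eq. (1) can be sketched as follows. This is a minimal sketch, not the authors' code: the function names are our own, the graph here is ER, and the tie-breaking convention sgn(0) = +1 for zero local fields is our assumption.

```python
import numpy as np

def make_couplings(N, c, eps, rng):
    """Sample a sparse coupling matrix per Eq. (2) on an Erdos-Renyi graph.

    Each undirected link (i, j) is present with probability c/N; on a link,
    J_ij = (1 - eps/2) S_ij + (eps/2) A_ij with S symmetric, A antisymmetric.
    """
    J = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            if rng.random() < c / N:
                s = rng.standard_normal()   # S_ij = S_ji
                a = rng.standard_normal()   # A_ij = -A_ji
                J[i, j] = (1 - eps / 2) * s + (eps / 2) * a
                J[j, i] = (1 - eps / 2) * s - (eps / 2) * a
    return J

def step(J, sigma):
    """Synchronous update of Eq. (1); sgn(0) is broken to +1 (our convention)."""
    h = J @ sigma
    return np.where(h >= 0, 1, -1)
```

At ε = 0 the resulting matrix is exactly symmetric, at ε = 2 exactly antisymmetric, and at ε = 1 the two directions of each link are statistically independent, matching the three special cases quoted above.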
In the last equality, we rearranged the terms to represent the partition function as a product of local constraints

\psi_i(\underline{\sigma}_i, \underline{\sigma}_{\partial i}) = \prod_{t=1}^{L} \theta\big( \sigma_i^{t+1} H_i(\sigma_{\partial i}^t) \big). \qquad (5)

By explicitly writing out their arguments, we stress that each local constraint ψ_i(σ_i, σ_∂i) depends only on the configurations of node i and of its neighbors.
One caveat follows from the formalism. The periodic boundary condition in Eq. (4) is satisfied not only by L-cycles but also by cycles of length L′ whenever L′ is a divisor of L, i.e., L′ | L. Thus, denoting the number of L-cycles by n_L, the following identity holds:

Z_L = \sum_{L' | L} n_{L'} L', \qquad (6)

where the additional factor L′ comes from the fact that each L′-cycle consists of L′ distinct spin configurations. Thus, to compute n_L exactly, one needs to evaluate Z_{L′} for all divisors L′ and reconstruct n_L recursively from Eq. (6). This procedure corresponds to the Möbius inversion formula of a Dirichlet convolution.
We use a powerful technique, the so-called cavity or belief propagation (BP) equations [26, 27], which computes Z_L in a time linear in N, in contrast to the exponential time complexity exhibited by naive brute-force counting. This method is guaranteed to work if two conditions are met: i) the underlying graph structure is locally tree-like, and ii) the partition function is given as a product of local constraints ψ_i(σ_i, σ_∂i) summed over the configurations. In the problem under consideration, the usual BP framework needs to be generalised [28, 29], and it can be used once the problem has been set on a dual graph representation, explained in Appendix A and recently used in [30, 31]. Under these conditions, the partition function is decomposed into two contributions: i) z_i coming from the nodes i and ii) z_(ij) from the links (ij) of the underlying graph G.
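The bookkeeping in Eq. (6) and its inversion can be checked mechanically: count the periodic points Z_L of a deterministic map by brute force, then peel off the contributions of the divisors. The following is a self-contained sketch with invented names; the toy map in the usage example is the two-spin motif of Appendix B, not one of the paper's network realizations.

```python
import itertools

def divisors(n):
    """All positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]

def cycle_counts(Z):
    """Invert Eq. (6), Z_L = sum_{L'|L} L' * n_{L'}, for a dict Z = {L: Z_L}.

    Z must contain every divisor of each of its keys; returns {L: n_L}.
    """
    n = {}
    for L in sorted(Z):
        n[L] = (Z[L] - sum(d * n[d] for d in divisors(L) if d < L)) // L
    return n

def count_periodic(stepf, N, L):
    """Z_L by brute force: number of states s in {-1,1}^N with stepf^L(s) = s."""
    tot = 0
    for s0 in itertools.product([-1, 1], repeat=N):
        s = s0
        for _ in range(L):
            s = stepf(s)
        tot += (s == s0)
    return tot
```

For instance, for the isolated-pair map of Appendix B, (σ_i, σ_j) → (σ_j, −σ_i), one finds Z_1 = Z_2 = 0 and Z_4 = 4, so the inversion gives n_4 = 1: a single 4-cycle, consistent with Eq. (6).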
Namely, we have

\log Z_L = \sum_i \log z_i - \sum_{(ij)} \log z_{(ij)}, \qquad (7)

where z_i and z_(ij) are defined below. The derivation of this decomposition relies on a super-factor representation and is detailed in Appendix A.
Furthermore, one can show that each contribution is solely determined by a single set of messages m_{i→j}(σ_i, σ_j) in the following way:

z_{(ij)} = \sum_{\underline{\sigma}_i, \underline{\sigma}_j} m_{i \to j}(\underline{\sigma}_i, \underline{\sigma}_j) \, m_{j \to i}(\underline{\sigma}_j, \underline{\sigma}_i) \qquad (8)

and

z_i = \sum_{\underline{\sigma}_i} \sum_{\underline{\sigma}_{\partial i}} \psi_i(\underline{\sigma}_i, \underline{\sigma}_{\partial i}) \prod_{k \in \partial i} m_{k \to i}(\underline{\sigma}_k, \underline{\sigma}_i), \qquad (9)

where the messages satisfy the recursive relation

m_{i \to j}(\underline{\sigma}_i, \underline{\sigma}_j) \propto \sum_{\underline{\sigma}_{\partial i \setminus j}} \psi_i(\underline{\sigma}_i, \underline{\sigma}_{\partial i}) \prod_{k \in \partial i \setminus j} m_{k \to i}(\underline{\sigma}_k, \underline{\sigma}_i). \qquad (10)

With the additional normalization condition \sum_{\underline{\sigma}_i, \underline{\sigma}_j} m_{i \to j}(\underline{\sigma}_i, \underline{\sigma}_j) = 1, this set of recursive equations can be solved iteratively. Starting from an arbitrarily chosen set of messages, the right-hand side of Eq. (10) is computed repeatedly until convergence. This iterative scheme is guaranteed to work on a tree-like factor graph, which is the case for our problem in the super-factor representation, as explained in Appendix A.

III. RESULTS
With this powerful theoretical tool for computing Eq. (6) at hand, we are ready to compute the ensemble average of ln Z_L over many samples, up to sufficiently large system sizes; ln Z_L is a self-averaging quantity, as opposed to the annealed average of Z_L. Mainly, we want to estimate the exponential growth coefficient Σ_L by extracting the linear part of the usual 1/N expansion:

\ln Z_L \sim \Sigma_L N + B_L + O(N^{-1}). \qquad (11)

We specifically focus on the case L = 4, because cycles of smaller length are not allowed in the infinite-N limit. As discussed in Appendix B, this is due to the existence of a graph motif which only forms periodic orbits of length L = 4. The probability of finding these motifs in a graph can be arbitrarily small, depending on the model parameters c and ε > 0, but the probability per node is nonzero and independent of N, implying that they will eventually appear if the system size is taken to be large enough. Therefore, the smallest possible cycles in the system are of length L = 4, and cycle lengths should be multiples of 4. This statement is confirmed by our numerical findings. Strictly speaking, one cannot exclude the possibility of finding other motifs that further prevent the existence of cycles of length L = 4. However, both our brute-force counting and our results from the cavity method suggest that 4-cycles always yield the dominant contributions to the partition function; thus, at least within the system sizes we consider, such motifs do not exist or are rare enough not to be found within our numerical procedures. On the other hand, solving our BP equations takes a time growing exponentially in L, as a result of the increasing dimension of the state variables, the length of the trajectories increasing linearly with L, see Eq. (10). Computing Σ_L for the next non-trivial length, L = 8, is thus computationally very challenging. Unless specified otherwise, in the following we focus on the special case ε = 1.

FIG. 1. Quenched complexity Σ_4 as a function of ε for various c, extrapolated to the thermodynamic limit. The complexity is seen to be symmetric around the line ε = 1, at which the lowest value of Σ_4 is achieved. The solid line indicates the annealed complexity Σ_4^ann for the fully-connected model [1]. Quenched and annealed averages are not guaranteed to be the same; nevertheless, as c increases, the quenched complexity (slowly) converges to the annealed one, see Fig. 2.

A. Thermodynamic limit of the random regular graph
Now, let us discuss our numerical findings. First of all, we present the results for the random regular (RR) graph ensemble, in which the connectivity is fixed to c, so that no fluctuations are associated with the degree distribution. When dealing with different ensembles of graphs, we generally use c to denote the mean connectivity, except in the RR case, where it refers to the fixed node degree. In Fig. 1, we present the quenched complexity Σ_4(c), extrapolated to the thermodynamic limit using Eq. (11), for c = 2, 3, 4, 5, as well as the annealed complexity Σ_4^ann(c → ∞), which is computed analytically for fully connected graphs in the thermodynamic limit [1]. The complexity attains its minimum at ε = 1, i.e. for fully asymmetric couplings. In fact, since J_ij and J_ji are completely independent, the formation of cycles must rely purely on random occurrences. Moreover, at fixed ε, we clearly see the decreasing behavior of Σ_4 as a function of c.
Note that Σ_4 may or may not converge to the fully connected annealed result (solid line) in the limit c → ∞, because the quenched complexity is not necessarily the same as the annealed one. On the other hand, Jensen's inequality implies that for any c, N,

\Sigma_4(c, N) \le \Sigma_4^{\mathrm{ann}}(c, N). \qquad (12)

Hence, the limit for c → ∞ of Σ_4(c, N) should be less than or equal to the fully connected annealed result. We attribute the observation that in Fig. 1 the data seem to converge to a slightly higher value than the solid line to a slow convergence with c, as we now discuss.
In Fig. 2 we present the behavior of Σ_4(c), extrapolated to N → ∞ as above, as a function of c for ε = 1 and ε = 0. Because (i) Σ_4^ann(ε = 1) = 0 for the fully connected system, (ii) the annealed complexity is an upper bound on the quenched one, and (iii) the complexity cannot be negative, we should expect Σ_4 to converge to zero as c increases. Consistently with this argument, in Fig. 2 we show that Σ_4 decays as a power law of the form Σ_4 ∼ c^{−α}, with an exponent α ≈ 0.4. For ε = 0, our data show that Σ_4 also converges to the annealed result for c → ∞, this time with a smaller exponent α.

FIG. 2. Quenched complexity Σ_4, extrapolated to the thermodynamic limit, as a function of 1/c at ε = 1 (a) and ε = 0 (b). As expected from the fully-connected case, the complexity decreases monotonically with increasing c. The decay is reasonably described by a power law, Σ_4 ∼ c^{−α} in (a) and Σ_4 − Σ_4^∞ ∼ c^{−α} in (b); the decay in (b) is significantly slower than in (a).

FIG. 3. Quenched complexity Σ_4 as a function of c at ε = 1 for finite-size systems with N ranging from 8 to 15. (Inset) Quenched complexity Σ_4 at c = 6 as a function of 1/N at ε = 1, together with a quadratic fit Σ_4(N) = Σ_4(∞) + a/N + b/N², with Σ_4(∞) = 0.09 taken from Fig. 2.

FIG. 4. Plot of the complexity Σ_4(N) = ln Z_4/N vs N for c = 4, showing the rate of decay of the finite-size correction to Σ_4. The dashed line indicates the estimate of Σ_4 from the ansatz Eq. (11). The error bars show the sample-to-sample fluctuations for different system sizes. The variability decreases monotonically with increasing N, implying that Σ_4 is self-averaging.

B. Finite size corrections
Next, we discuss the finite-size effects in our numerical results for Σ_4(N) = ln Z_4/N. The results for the quenched complexity obtained by exact enumeration for finite-size systems, with sizes ranging from N = 8 to 15, are reported in Fig. 3. Fig. 4 shows the same quantity, now obtained from the cavity method for much larger systems. As indicated by the 1/N expansion in Eq. (11), we can extrapolate to the infinite-size limit, and we observe that the result obtained at N = 200 is already close to the asymptotic value. Besides, the sample-to-sample fluctuations for different system sizes are reported. Consistently with our theoretical prediction that ln Z_4/N is self-averaging, we find a decreasing variability with system size N.

C. Other graph ensembles
We compute the complexity as a function of the average connectivity in different ensembles of graphs, see Fig. 5. For ER graphs, we observe that the complexity is a non-monotonic function of the mean connectivity. On the other hand, for RR graphs, we observe that it is a decreasing function of the mean connectivity. A possible interpretation of this phenomenon is that, once a giant connected cluster exists, each time a connection is established some cycle disappears. The initial increase in ER graphs at low connectivity is due to a very sparse regime with many isolated nodes and without a giant connected cluster. To test this hypothesis, we consider a third ensemble of graphs, built according to the following rule. Let M be the total number of links. For a graph of N sites, we first form N/2 disjoint pairs of nodes, so that c = 1. Only after having paired every node with another random one can random nodes be paired by the creation of other links. We denote this ensemble by the acronym DP (Dyadic Pair). In this ensemble, adding new links at low connectivity leads to a linear increase of the complexity. This effect is smoothed in ER graphs because, at the same value of c with respect to a DP graph, some of the links are added between already connected nodes, leading to a smaller increase of the complexity with c.
The behavior observed at low connectivity depends on the choice to exclude the contribution of isolated nodes (i.e. nodes with degree equal to zero) from the computation of the complexity. In fact, the case of isolated nodes seems pathological at first sight because, having no neighbors, they are not considered in the dynamical update defined in Eq. (1). One possible choice to extend the dynamical rule to cover this case would be to set the state of σ_i^t equal to +1 or −1 with probability 1/2. In this case, the dynamics would be completely symmetric with respect to the up-down symmetry. The non-monotonic behavior of the complexity as a function of the connectivity would disappear in this case, leading in particular to the result Σ_4 = log 2 at c = 0, because any configuration would trivially satisfy the constraints. On the other hand, there are no reasons for the dynamical update to be symmetric, and the choice made here is equivalent to letting each isolated node take one state only; we notice that their contribution to the number of cycles is then just a factor of one. Since the complexity of disconnected parts of the graph is the sum of their relative complexities, the contribution due to isolated nodes is zero. Moreover, the number of isolated nodes depends on the ensemble considered and, for c < 1, an ER graph contains more such nodes than a DP graph, explaining the lower growth of the complexity in the first case.

FIG. 5. Complexity Σ_4 at ε = 1 for different ensembles and connectivities at N = 100. The DP label refers to the ensemble discussed in the main text. Inset: size of the largest component in the graph. Results are averaged over 100 random realizations for each ensemble.
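Our reading of the DP construction can be sketched as follows. The function name is hypothetical and the pairing rule is our interpretation of the description above: first a perfect matching over all nodes (so c = 1), then uniformly random extra links up to M = cN/2.

```python
import numpy as np

def dyadic_pair_graph(N, c, rng):
    """DP ensemble sketch: pair all N nodes into N/2 disjoint dyads
    (mean degree c = 1), then add uniformly random extra links until
    the total number of links reaches M = c*N/2."""
    assert N % 2 == 0 and c >= 1
    order = rng.permutation(N)
    edges = {tuple(sorted((int(order[2 * k]), int(order[2 * k + 1]))))
             for k in range(N // 2)}
    target = int(c * N / 2)
    while len(edges) < target:
        i, j = rng.integers(N), rng.integers(N)
        if i != j:
            edges.add(tuple(sorted((int(i), int(j)))))
    return edges
```

By construction every node is covered by the initial matching, so the minimum degree is one; in an ER graph at the same c, some links instead fall between already-connected nodes, which is precisely the smoothing effect discussed above.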
IV. CONCLUSION
In Ref. [1] we extended the analysis of [12], deriving an analytical expression for the average number of limit cycles of length L, ⟨n_L⟩, as a function of the symmetry parameter ε, Eq. (2), in neural networks defined by Eq. (1), and solving it for L ≤ 3. In the present work we considered diluted networks, characterized by their mean connectivity c, which is included as an additional argument of the complexity Σ and is related to the degree of dilution of the network. We compute the quenched average Σ_L(N) = ln Z_L/N, reported in Fig. 1 as a function of ε for different values of the connectivity c, together with the corresponding annealed average.
We argue that, at ε = 1, i.e. for fully asymmetric networks, the shortest limit cycles appearing in the dynamics are of length L = 4, and that the length of longer cycles must be a multiple of 4 (L = 4n, n = 1, 2, ...). We show that the hypothesis that these cycles provide the main contribution to the average of the total number of attractors is confirmed both by our brute-force calculation and by the results of the cavity method. We also provide an explanation as to why there cannot be limit cycles with L < 4. We report the complexity for L = 4 cycles and show that it is a decreasing function of the connectivity c at ε = 1. In particular, the complexity may be approximated in the form Σ_4(c, ε = 1) ∼ c^{−α}, with α ≈ 0.4. Similar results are also obtained at ε = 0, but with a smaller exponent α. This analysis is enriched with the study of finite-size effects, carried out on three different ensembles of random networks. The numerical results show that ln Z_L/N is close to the infinite-size limit already at N = 200. Additionally, its decreasing variability is in line with the hypothesis that this quantity is self-averaging. We also notice that adding new links at low connectivity (c < 1) increases the complexity, while doing so at high connectivity (c > 1) results in the opposite effect, and we associate this behavior with the creation of a giant cluster crucially altering the dynamics of the networks.
Taken together, these results are in line with previous findings, which prove that diluting random networks of large size results in a higher number of basins of attraction, directly related to a higher storing capacity of the system [23]. Interestingly, diluted networks also have other important properties shared with biological neural networks: sparse connections have been proved to reduce spiking-related noise in cortical areas, contributing to enhancing the reliability and the stability of the memory retrieval process [32].
Sparsity is a feature shared also by the local area of the hippocampus known as CA3: the probability of establishing a connection between two CA3 neurons is about 4% in the mammalian hippocampus, and 10% for pyramidal cells in neocortex [33, 34]. To explain this observation, the authors of [33] show that increasing the connectivity reduces the memory capacity of the network. This is consistent with the memory-oriented specialization of these brain areas, although other factors may contribute to defining these particular structures, such as the retrieval time of the stored limit cycles. We notice that the 'overloading' condition discussed in [33], linked to a memory-retrieval impairment due to the shift of the energy landscape of the system towards fewer and bigger basins of attraction, shares the same flavour as the results presented in this study.
In conclusion, this work emphasizes the relationship between the observed structure of diluted brain areas and the crucial role of dilution in models of neuronal networks, which determines an optimization rule that may have shaped these brain areas to maximize the possible storage capacity.
FIG. 6. (a) Original graph topology G on which we consider a parallel dynamics. (b) Corresponding factor graph for the cycle-counting problem. While G contains no loops, the factor graph contains many short loops.

FIG. 7. An equivalent super-factor graph representation of the original problem in Fig. 6. For each node i of the original graph G we assign a super-factor î, and for each link (i, j) we assign a super-variable S_ij = (σ_ij, σ_ji). At the cost of doubling the dimension of the super-variables, the resulting graph retains the same structure as G, and thus allows us to use a cavity method as long as the original graph G is tree-like. The redundancy due to the increase of state dimension is then handled by implementing additional local constraints such that all the variables σ_ik sharing the same first index are constrained to be the same.

ACKNOWLEDGMENTS
We thank Guilhem Semerjian for useful discussions related to this work, and in particular for contributing to the development of the procedure discussed in Appendix A. J.R. and S.H. acknowledge the support of a grant from the Simons Foundation (No. 454941, Silvio Franz).
Appendix A: BP equations
A well-known technique for computing the partition function Eq. (4) on a sparse graph is the so-called cavity method [26, 27]. Given the problem under consideration, one can construct a factor graph by introducing factor nodes î for each local constraint ψ_i(σ_i, σ_∂i), see Fig. 6(b). Even if the underlying graph G is tree-like, as illustrated in Fig. 6(a), the resulting factor graph contains many small loops, as can be seen in Fig. 6(b). This is notoriously a problem if we want to implement a cavity equation naively, since this method is guaranteed to work well only on tree-like factor graphs.
One way to overcome this problem is to consider a particular form of Generalized Belief Propagation (GBP) [28], proposed precisely to deal with loops. In the original formulation, the difficulty arising from the presence of loops is alleviated by considering messages between regions of nodes that have a tree-like structure. In the present case, this comes at the price of doubling the state space and considering the super-factor graph shown in Fig. 7. In this representation, we introduce a super-variable for each link (i, j) of G to hold two spin trajectories, S_ij = (σ_ij, σ_ji). Then, the redundancy due to doubling the spin dimension is removed by adding delta-function constraints to each factor node î, ensuring that all spin variables sharing the same first index i are bound to be the same, i.e., σ_ij = σ_i for all j ∈ ∂i. From the above setting, we achieve a tree-like factor graph without losing the locality of the constraints. This trick has been recently used in similar contexts [30, 31].
Having established a locally tree-like factor graph, it is straightforward to write the corresponding BP equations.
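Before writing these equations explicitly, we note that the collapsed message-passing scheme of Eqs. (7)-(10) can be condensed into a short program. The following is our own illustrative sketch (not the authors' code), valid on trees: periodic trajectories of length L are enumerated explicitly, each message is a 2^L x 2^L array, and log Z_L is assembled from the Bethe decomposition of Eq. (7).

```python
import itertools
import math
import numpy as np

def bp_logZ(J, adj, L):
    """Bethe estimate of log Z_L, Eq. (7), for parallel dynamics on a tree.

    J[i][k] is the coupling entering the local field of i from k; adj[i]
    lists the neighbours of i. A message m[(i, j)] is a 2^L x 2^L array
    indexed by the periodic trajectories of sigma_i and sigma_j.
    """
    T = list(itertools.product([-1, 1], repeat=L))  # periodic trajectories
    S = len(T)

    def psi_ok(i, ti, nbrs, trajs):
        # Eq. (5): sigma_i^{t+1} must carry the sign of the local field.
        for t in range(L):
            h = sum(J[i][k] * trajs[a][t] for a, k in enumerate(nbrs))
            if ti[(t + 1) % L] * h <= 0:
                return False
        return True

    m = {(i, j): np.full((S, S), 1.0 / S**2) for i in adj for j in adj[i]}
    for _ in range(2 * len(adj)):  # exact after enough sweeps on a tree
        for (i, j) in list(m):
            others = [k for k in adj[i] if k != j]
            new = np.zeros((S, S))
            for a, ti in enumerate(T):
                for b, tj in enumerate(T):
                    tot = 0.0
                    for ks in itertools.product(range(S), repeat=len(others)):
                        if psi_ok(i, ti, others + [j], [T[x] for x in ks] + [tj]):
                            tot += math.prod(m[(k, i)][x, a]
                                             for k, x in zip(others, ks))
                    new[a, b] = tot
            m[(i, j)] = new / new.sum()  # normalisation of Eq. (10)

    def z_node(i):  # Eq. (9)
        tot = 0.0
        for a, ti in enumerate(T):
            for ks in itertools.product(range(S), repeat=len(adj[i])):
                if psi_ok(i, ti, adj[i], [T[x] for x in ks]):
                    tot += math.prod(m[(k, i)][x, a]
                                     for k, x in zip(adj[i], ks))
        return tot

    def z_edge(i, j):  # Eq. (8)
        return sum(m[(i, j)][a, b] * m[(j, i)][b, a]
                   for a in range(S) for b in range(S))

    return (sum(math.log(z_node(i)) for i in adj)
            - sum(math.log(z_edge(i, j)) for i in adj for j in adj[i] if i < j))
```

As a concrete check, take the three-node chain 0-1-2 with J_01 = 1, J_10 = -1, J_12 = 0.1, J_21 = 1 (a choice of ours such that nodes 0 and 1 form the motif of Appendix B): direct enumeration of the 8 states gives Z_4 = 4, and the sketch reproduces log Z_4 = log 4.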
Following the standard procedure, the messagethat factor ˆ i sends to node ( ij ) reads ν ˆ i → ( ij ) ( σ i , σ j ) ∝ (cid:88) σ ∂i \ j ψ i ( σ i , σ ∂i ) (cid:89) k ∈ ∂i \ j η ( ki ) → ˆ i ( σ k , σ i )(A1)where the first summation runs over all possible sub-trajectories of spins ∂i \ j . Similarly, the messages fromthe super-variables to the super-factors can be writtentrivially as there exist only two links for each super-variables, namely η ( ij ) → ˆ j ( σ i , σ j ) ∝ ν ˆ i → ( ij ) ( σ i , σ j ) . (A2)By combining Eqs. (A1) and (A2), we can define asingle set of messages m i → j ( σ i , σ j ) ≡ η ( ij ) → ˆ j ( S ij ) ∝ ν ˆ i → ( ij ) ( S ij ) satisfying Eq. (10).Once the messages are determined through the iter-ative process with Eq. (10) or equivalently Eqs. (A1)and (A2), the partition function Eq. (4) can be viewedas the sum of three different contributions coming fromsuper-nodes, super-factors, and super-links, namely, thepartition function readslog Z L = (cid:88) ( ij ) log z ( ij ) + (cid:88) ˆ i log z ˆ i − (cid:88) < ˆ i, ( ij ) > z ˆ i, ( ij ) , (A3)where the first sum is made on all super-nodes, the secondone on all super-factors and the third one on all the linksbetween super-nodes and super-factors. For a generalfactor graph, these factors are given in terms of ν ˆ i → ( ij ) and η ( ij ) → ˆ i , i.e., z ( ij ) = (cid:88) S ij ν ˆ i → ( ij ) ( S ij ) ν ˆ j → ( ij ) ( S ji ) , (A4) z ˆ i = (cid:88) S ∂ ˆ i ψ S ˆ i ( S ∂ ˆ i ) (cid:89) k ∈ ∂i η ( ki ) → ˆ i ( S ki ) , (A5) z ˆ i, ( ij ) = (cid:88) S ij ν ˆ i → ( ij ) ( S ij ) η ( ij ) → ˆ i ( S ij ) , (A6)where ψ S ˆ i ( S ∂ ˆ i ) denotes the constraint on the super graph, ψ S ˆ i ( S ∂ ˆ i ) = ψ i ( σ i , σ ∂i ) (cid:89) j ∈ ∂i δ ( σ ij − σ i ) (A7)In our setting, these equations are compactly writtenin terms of the messages Eq. 
(10):

z_{(ij)} = \sum_{\sigma_i, \sigma_j} m_{i \to j}(\sigma_i, \sigma_j) \, m_{j \to i}(\sigma_j, \sigma_i)    (A8)

and

z_i = z_{\hat i} = \sum_{\sigma_i} \sum_{\sigma_{\partial i}} \psi_i(\sigma_i, \sigma_{\partial i}) \prod_{k \in \partial i} m_{k \to i}(\sigma_k, \sigma_i),    (A9)

where we have introduced the alternative notation z_i = z_î in order to drop the hat from î, which is possible because factor nodes and original nodes are in one-to-one correspondence, î ↔ i. Additionally, one can immediately establish the relations z_{î,(ij)} = z_{ĵ,(ji)} and z_{(ij)} = z_{î,(ij)}. After applying these simplifications to Eq. (A3), one establishes Eq. (7).

Appendix B: Motifs preventing cycles of length L < 4

There exist certain motifs that forbid the existence of cycles of length L < 4. Let us consider a subset of the graph G, shown in Fig. 8, in which two nodes i and j are connected, and let us suppose that the couplings satisfy the condition

J_{ij} > \sum_{k \in \partial i \setminus j} |J_{ik}|.    (B1)

If this condition is met, the state of spin σ_i^t is solely determined to be σ_j^{t-1} by the state of spin j at time step (t - 1), according to Eq. (1). Additionally, let us impose a similar condition on node j, but with the opposite sign:

-J_{ji} > \sum_{l \in \partial j \setminus i} |J_{jl}|.    (B2)

It then similarly follows that σ_j^t = -σ_i^{t-1}. The existence of such conditions implies that, even though the network is connected, there can be subsets of the network that are independent of the rest. In this particular case, because of the imposed conditions, the pair (σ_i^t, σ_j^t) can only form a periodic orbit of length 4, so the length of every cycle in the system must be a multiple of 4. Considering that all the graph ensembles discussed in this paper have a finite probability per node of containing such motifs whenever ε > 0, cycles of length L < 4 are suppressed, the more so as ε increases. In fact, for small values of ε the couplings tend to be more symmetric and the occurrence of this motif is suppressed; in this case, the existence of cycles of length smaller than L = 4 is possible at finite sizes, as shown in Fig. 9.

FIG. 8. Illustration of the graph motif that forbids cycles with L < 4. If the coupling matrix satisfies both conditions Eqs. (B1) and (B2), the states of i and j are completely independent of the rest of the system. At the same time, these states can only form a periodic orbit of length 4, implying that only cycles whose length is a multiple of 4 exist in the system.

If the two conditions defined in Eqs. (B1) and (B2) hold for each pair of linked nodes, cycles are skew symmetric. A generic 4-cycle σ^(1) → σ^(2) → σ^(3) → σ^(4) → σ^(1) → ··· is skew symmetric if it has the form σ^(1) → σ^(2) → -σ^(1) → -σ^(2) → σ^(1) → ···. Such a symmetry may be conveniently described by the quantity Q:

Q = \frac{1}{2N} \left( \sigma^{(1)} \cdot \sigma^{(3)} + \sigma^{(2)} \cdot \sigma^{(4)} \right).    (B3)

Because each σ_i ∈ {-1, +1}, so that σ · σ = N, we have -1 ≤ Q ≤ 1. If a cycle is skew symmetric (σ^(3) = -σ^(1) and σ^(4) = -σ^(2)), then Q = -1. Additionally, if we consider only limit cycles of length 4, Q cannot be equal to 1. At ε = 2, couplings are perfectly antisymmetric and our numerical results confirm that the limit cycles are of length L = 4, all of them having Q = -1.

FIG. 9. Complexity of cycles of length L = 1 and L = 2 for a RR graph with c = 6, as a function of ε, for N = 100. As mentioned in the text, these cycles disappear as N increases. However, at finite N and small ε, the probability of the existence of the motif discussed in the text is small, and numerically it is not found.

FIG. 10. Histogram of the values of Q for N = 10, ε = 1 and connectivity c = 1 (top) or c = 4 (bottom) in the RR ensemble.

FIG. 11. Fraction of limit cycles of length 4 with Q = -1, as a function of c, for various N ranging from 7 to 16, at ε = 1, obtained for the RR ensemble.

We then ran simulations and checked the skew symmetry of limit cycles of length 4, studying the Q values at decreasing connectivity for ε = 1 (from N = 6 to N = 15, with c ranging from 0.4 to 4). As an example, in Fig. 10 we report two histograms of the obtained Q values for N = 10, in the cases c = 1 and c = 4 (ε = 1). It is evident that the fraction of limit cycles with Q = -1, which dominates for c = 4, tends to disappear for c = 1. To confirm this, Fig. 11 shows the fraction of limit cycles of length 4 for which Q = -
1, as a function of c. In this case, all curves decrease with decreasing connectivity, consistently with the results of the previous figure. It is worth noticing that, for large N, the occurrence of situations like the one outlined in Fig. 8, which can be found only at sufficiently small values of c, increases, as does the probability that sgn(J_ij) = sgn(J_ji) (given that ε = 1) in at least one pair of nodes, which is sufficient to break the skew symmetry of the limit cycles of length 4. The second effect dominates over the first one at low connectivities, and the skew symmetry disappears.
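The motif mechanism is easy to verify numerically. The sketch below (our own illustrative Python; the instance, sparsity and seed are arbitrary choices, not taken from the paper) plants the motif of Eqs. (B1) and (B2) on one pair of nodes of a small random asymmetric network, runs the parallel dynamics of Eq. (1) from every initial condition, and checks that every attractor length is a multiple of 4.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
N = 8
i, j = 0, 1  # the pair of nodes carrying the motif

# Small random asymmetric network (illustrative instance)
J = rng.normal(size=(N, N)) * (rng.random((N, N)) < 0.4)
np.fill_diagonal(J, 0.0)

# Plant the motif: Eq. (B1) for node i, Eq. (B2) for node j
others_i = np.abs(J[i]).sum() - abs(J[i, j])
J[i, j] = others_i + 1.0             #  J_ij > sum_{k != j} |J_ik|
others_j = np.abs(J[j]).sum() - abs(J[j, i])
J[j, i] = -(others_j + 1.0)          # -J_ji > sum_{l != i} |J_jl|

def cycle_length(sigma):
    """Iterate sigma_i^{t+1} = sgn(sum_k J_ik sigma_k^t) until the
    trajectory closes, and return the length of the attractor reached."""
    seen, t = {}, 0
    s = tuple(sigma)
    while s not in seen:
        seen[s] = t
        sigma = np.where(J @ sigma > 0, 1, -1)
        s = tuple(sigma)
        t += 1
    return t - seen[s]

# Exhaustive scan over all 2^N initial conditions
lengths = {cycle_length(np.array(s0))
           for s0 in itertools.product([-1, 1], repeat=N)}
print(sorted(lengths))  # every attractor length is a multiple of 4
```

The check works because, under Eqs. (B1) and (B2), the pair follows σ_i^t = σ_j^{t-1} and σ_j^t = -σ_i^{t-1} regardless of the rest of the network, so the pair has period exactly 4 and the global period is a multiple of it.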
Appendix C: Additivity of complexity in the presence of disconnected clusters

In this appendix, we construct a simple relation for Z_L in the case of graphs with a finite number of connected clusters. Since each connected cluster is independent of the others, we have the following additivity property:

\ln Z_L = N_C \sum_C P(C) \ln Z_C,    (C1)

where N_C is the number of connected clusters and P(C) is the cluster size distribution. From the normalization condition, we have the relation

N_C \sum_C P(C) \, C = N.    (C2)

If the clusters possess the same statistical properties and their sizes are sufficiently large, it is reasonable to assume that ln Z_C follows the same asymptotic expansion as ln Z_L, i.e., ln Z_C ∼ C Σ_L + B_L + O(C^{-1}). Under these conditions, the overall partition function reads

\ln Z_L \sim N \Sigma_L + B_L + O(N^{-1}) = N_C \sum_C P(C) \ln Z_C \sim N \Sigma_L + N_C B_L + O(N_C C^{-1}),    (C3)

where we have used the normalization condition. Using Eq. (11), we thus reach a mismatch at constant order, implying B_L = 0. This relation can potentially break down if N_C is extensive: in that case, all the higher corrections in Eq. (C3) become of the same order as Σ_L. Nevertheless, we found numerically that B_L is distributed close to our prediction B_L = 0.

[1] Sungmin Hwang, Viola Folli, Enrico Lanza, Giorgio Parisi, Giancarlo Ruocco, and Francesco Zamponi. On the number of limit cycles in asymmetric neural networks. Journal of Statistical Mechanics: Theory and Experiment, 2019(5):053402, 2019.
[2] J.J. Hopfield. Neural networks and physical systems with emergent collective computational abilities.
Proceedings of the National Academy of Sciences, 79(8):2554–2558, 1982.
[3] Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.
[4] Brad E. Pfeiffer and David J. Foster. Autoassociative dynamics in the generation of sequences of hippocampal place cells. Science, 349(6244):180–183, 2015.
[5] Joaquin M Fuster, Garrett E Alexander, et al. Neuron activity related to short-term memory. Science, 173(3997):652–654, 1971.
[6] Y Miyashita. Neuronal correlate of visual associative long-term memory in the primate temporal cortex. Nature, 335(6193):817–20, Oct 1988.
[7] W. Heisenberg. Zur Theorie des Ferromagnetismus. Zeitschrift für Physik, 49(9):619–636, Sep 1928.
[8] Daniel J Amit, Hanoch Gutfreund, and Haim Sompolinsky. Spin-glass models of neural networks. Physical Review A, 32(2):1007, 1985.
[9] Daniel J Amit, Hanoch Gutfreund, and Haim Sompolinsky. Storing infinite numbers of patterns in a spin-glass model of neural networks. Physical Review Letters, 55(14):1530, 1985.
[10] Daniel J Amit, Hanoch Gutfreund, and Haim Sompolinsky. Statistical mechanics of neural networks near saturation. Annals of Physics, 173(1):30–67, 1987.
[11] Daniel J Amit. Modeling Brain Function: The World of Attractor Neural Networks. Cambridge University Press, 1992.
[12] F Tanaka and S F Edwards. Analytic theory of the ground state properties of a spin glass. I. Ising spin glass. Journal of Physics F: Metal Physics, 10(12):2769, 1980.
[13] H Gutfreund, J D Reger, and A P Young. The nature of attractors in an asymmetric spin glass with deterministic dynamics. Journal of Physics A: Mathematical and General, 21(12):2775, 1988.
[14] Ugo Bastolla and Giorgio Parisi. Relaxation, closing probabilities and transition from oscillatory to chaotic attractors in asymmetric neural networks. Journal of Physics A: Mathematical and General, 31(20):4583, 1998.
[15] K Nutzel. The length of attractors in asymmetric random neural networks with deterministic dynamics. Journal of Physics A: Mathematical and General, 24(3):L151, 1991.
[16] L. Molgedey, J. Schuchhardt, and H. G. Schuster. Suppressing chaos in neural networks by noise. Physical Review Letters, 69:3717–3719, Dec 1992.
[17] Jannis Schuecker, Sven Goedeke, and Moritz Helias. Optimal sequence memory in driven random networks. Physical Review X, 8(4):041029, 2018.
[18] B. Tirozzi and M. Tsodyks. Chaos in highly diluted neural networks. EPL (Europhysics Letters), 14(8):727, 1991.
[19] H. Sompolinsky, A. Crisanti, and H. J. Sommers. Chaos in random neural networks. Physical Review Letters, 61:259–262, Jul 1988.
[20] A. Crisanti, H. J. Sommers, and H. Sompolinsky. Chaos in neural networks: chaotic solutions. Preprint, March 1990.
[21] A Crisanti and H Sompolinsky. Path integral approach to random neural networks. Physical Review E, 98(6):062120, 2018.
[22] M. Stern, H. Sompolinsky, and L. F. Abbott. Dynamics of random neural networks with bistable units. Physical Review E, 90:062710, Dec 2014.
[23] Viola Folli, Giorgio Gosti, Marco Leonetti, and Giancarlo Ruocco. Effect of dilution in asymmetric recurrent neural networks. Neural Networks, 104:50–59, 2018.
[24] E. Gardner, B. Derrida, and P. Mottishaw. Zero temperature parallel dynamics for infinite range spin glasses and neural networks. J. Phys. France, 48(5):741–755, 1987.
[25] Carlo Baldassi, Alfredo Braunstein, and Riccardo Zecchina. Theory and learning protocols for the material tempotron model. Journal of Statistical Mechanics: Theory and Experiment, 2013(12):P12013, 2013.
[26] Marc Mézard, Giorgio Parisi, and Miguel Virasoro. Spin Glass Theory and Beyond: An Introduction to the Replica Method and Its Applications, volume 9. World Scientific Publishing Company, 1987.
[27] Marc Mézard and Giorgio Parisi. The Bethe lattice spin glass revisited. The European Physical Journal B - Condensed Matter and Complex Systems, 20(2):217–233, 2001.
[28] Jonathan S Yedidia, William T Freeman, and Yair Weiss. Characterization of belief propagation and its generalizations. IEEE Transactions on Information Theory, 51:2282–2312, 2001.
[29] Jonathan S Yedidia, William T Freeman, and Yair Weiss. Generalized belief propagation. In Advances in Neural Information Processing Systems, pages 689–695, 2001.
[30] Andrey Y Lokhov, Marc Mézard, and Lenka Zdeborová. Dynamic message-passing equations for models with unidirectional dynamics. Physical Review E, 91(1):012811, 2015.
[31] Jacopo Rocchi, David Saad, and Chi Ho Yeung. Slow spin dynamics and self-sustained clusters in sparsely connected systems. Physical Review E, 97(6):062154, 2018.
[32] Edmund T Rolls and Tristan J Webb. Cortical attractor network dynamics with diluted connectivity. Brain Research, 1434:212–225, 2012.
[33] Edmund T Rolls. Advantages of dilution in the connectivity of attractor networks in the brain. Biologically Inspired Cognitive Architectures, 1:44–54, 2012.
[34] Menno P Witter. Connectivity of the hippocampus. In