A new stochastic STDP rule in a neural network model
Pascal Helson ∗ Draft March 2, 2018
∗ [email protected]

Abstract

Thought to be responsible for memory, synaptic plasticity has been widely studied in the past few decades. One example of a plasticity model is the popular Spike-Timing Dependent Plasticity (STDP). There is a huge literature on STDP models; their analysis is mainly based on numerical work, and only a few have been studied mathematically. Unlike most models, we aim at proposing a new stochastic STDP rule with discrete synaptic weights. It brings a new framework in which probabilistic tools can be used for an analytical study of plasticity. A separation of time scales enables us to derive an equation for the weight dynamics in the limit where plasticity is infinitely slow compared to the neural network dynamics. Such an equation is then analysed in simple cases which show a counter-intuitive result: divergence of the weights even when the integral over the learning window is negative. Finally, without adding constraints to our STDP rule, such as bounds or metaplasticity, we are able to give a simple condition on the parameters for which the weight process remains ergodic. This model attempts to answer the need for understanding the interplay between the weight dynamics and the neuron dynamics.
A large number of studies have focused on neural network dynamics in order to reproduce biological phenomena observed in experiments. Thereby, there exist many different individual neuron models, from two-state neurons to the adaptive exponential integrate-and-fire model [17, 24]. Compared to this literature, plasticity in recurrent networks has been much less studied. One reason is that it adds a layer of complexity to existing models, despite being a candidate mechanism for memory formation, learning, etc. [6, 10].

In the beginning, plasticity models were based on firing rates [8]. Later on, as suggested by Hebb in 1949 [23], the crucial role of precise spike timings was proved experimentally and gave rise to Spike-Timing Dependent Plasticity (STDP) [7, 34, 36]. Following such a breakthrough, numerous STDP models emerged. They were associated with neural networks of either Poisson neurons [18, 29, 30] or continuous neuron models [1, 12, 40]. Here, we would like to present a new STDP rule which is implemented in the well-known stochastic Wilson-Cowan model of spiking neurons as presented in [5]. More precisely, because of the plasticity rule, our model is a piecewise deterministic Markov process [13, 14], whereas it is a pure point process in [5].

Our motivations for proposing such a new model are fourfold. First, although the mechanisms involved in plasticity are mainly stochastic, such as the activation of ion channels and proteins, the majority of studies on STDP use a deterministic description or an extrinsic noise source [12, 21, 38]. One exception is the stochastic STDP model proposed by Appleby and Elliott in [3, 4]. The stochasticity of their model lies in the learning window size. They analyse the dynamics of the weights of one target cell innervated by a few Poisson neurons. A fixed point analysis enabled them to show that their model is not relevant in the pair-based case and that multispike interactions are required to get stable competitive weight dynamics. Second, most studies are based on simulations and their analyses, so there is still a need to find a good mathematical framework, see [16, 33, 40]. We propose here a mathematical analysis based on probabilistic methods which leads to a control of the weights through the study of their dynamics on their slow time scale. Indeed, the long term plasticity timescale ranges from minutes to more than one hour, whereas a spike lasts for a few milliseconds [38]. Thus, third, there is a need to understand how to bridge this time scale gap between the synapse level and the network level [15, 45, 48]. Finally, the interplay between the weight dynamics and the neuron dynamics is not yet fully understood, and we think the study of recurrent networks is necessary to bring some basis to fully numerical studies.

Such motivations impose some constraints on our model. It has to be rich enough to reproduce biological phenomena, simple enough to be mathematically tractable, and easily simulated with thousands of neurons. Finally, it has to enable us to observe macroscopic effects out of microscopic events. The Wilson-Cowan model has been widely studied [5, 9, 33] and reproduces many biological features of a network, such as oscillations and bi-stability for example. On the other hand, based on experimental evidence [7, 44], we propose a new STDP rule with intrinsic noise and a fixed synaptic weight increment [41]. This allows us to control independently the synaptic weight increment and the probability of a plasticity event.
Indeed, several pairing protocols are required for the induction of plasticity [7, 36].

Thus, we can produce a mathematical analysis by studying the Markov process composed of the following three components: the synaptic weight matrix, the inter-spiking times and the neuron states. In the context of long term plasticity, the synaptic weight dynamics are much slower than the neural network ones. A timescale analysis enables us to remove the neuron dynamics from the equations. We can then derive an equation for the slow weight dynamics alone, in which the neuron dynamics are replaced by their stationary distributions. Thus, we don't need to simulate the dynamics of thousands of fast neurons, and we obtain a much easier equation to analyse. We then discuss the implications of such a derivation for learning and adaptation in neural networks.

A similar analysis has been done in a few papers with different mathematical tools and models [18, 19, 29, 30, 32, 40]. While the first two studied only one postsynaptic neuron, the others considered recurrent networks. Thanks to a separation of time scales, they derive an equation for the weights in which STDP appears as an integral of the STDP curve against the cross-correlation matrix. The main problem is the computation of such a matrix; they use Taylor expansions and Fourier analysis to derive estimations of it. Thanks to probabilistic methods, we do not need such an estimation for our analysis.
As in all models of neural networks with plastic connections, one can separate the neuron model and the plasticity one. Our neuron model is the well-known stochastic Wilson-Cowan model of spiking neurons presented in [5]. In such a model, neurons are binary, meaning they are either at rest, state 0, or spiking, state 1. This model has been widely studied in the case of fixed weights and presents realistic features such as oscillations or bistable phenomena, see [9]. However, there are only a few studies with plasticity, see for instance the Ising model in [42].

We implement plasticity in this model in a stochastic way. Indeed, our plasticity rule depends on the precise spike times and thus has the same form as STDP, see [35] for an overview, but it is not deterministic: when spikes are correlated, weights change or not according to a certain probability.

First, we are interested in excitatory neurons, as in most models inhibitory neurons are not plastic, so the synaptic weights will be positive. Also, we suppose the neurons are all-to-all connected, so this positivity will be strict. We will discuss these assumptions at the end. Therefore, we first give some global notations, then explain the neuron model and the plasticity rule, and finally we gather these dynamics in the generator of the process.

We are interested in analysing the time-continuous Markov process $(W_t, S_t, V_t)_{t \geq 0}$ where:
- $W_t \in \{\Delta w K,\ K \in \mathcal{E}\}$ is the synaptic weight matrix, with $\mathcal{E} = \{K \in \mathbb{N}^{N \times N},\ K^{ij} > 0\ \forall i \neq j \text{ and } K^{ii} = 0\ \forall i\}$ and $\Delta w \in \mathbb{R}_+^*$; $W_t^{ij}$ is the weight of the connection from neuron $i$ to neuron $j$ at time $t$.
- $S_t \in \mathbb{R}_+^N$ is the vector of times since the last spike of each neuron.
- $V_t \in I = \{0,1\}^N$ is the state of the neuron system.
As the weight dynamics and the neural network ones will be separated, we split the global state space $E$ into two spaces. Hence, in the following we denote $E_1 = \{\Delta w K,\ K \in \mathcal{E}\}$ and $E_2 = \mathbb{R}_+^N \times I$, such that $E = E_1 \times E_2$.

Neuron model
Let us define the dynamics of the process. It is a recurrent plastic neural network with Poisson neurons in interaction. Each neuron jumps with an inhomogeneous rate between two states, 0 and 1. This rate depends on the network state and on the weight matrix:
$$0 \;\underset{\beta}{\overset{\alpha_i(W_t, V_t)}{\rightleftharpoons}}\; 1$$
$\alpha_i$ is given by a function $\xi_i : \mathbb{R} \to \mathbb{R}_+^*$, bounded, positive and nondecreasing:
$$\alpha_i(W_t, V_t) = \xi_i\Big(\sum_{j=1}^N W_t^{ji} V_t^j\Big) \qquad (2)$$
As the neuron activity is never null, we will consider that for all $i$, $\inf_{x \in \mathbb{R}} \xi_i(x) \geq \alpha_m > 0$. Hence $\alpha_i$ is uniformly bounded in $w$ and $v$ for all $i$:
$$0 < \alpha_m = \min_i \Big(\inf_{x \in \mathbb{R}} \xi_i(x)\Big) \leq \alpha_i(w, v) \leq \alpha_M = \max_i \Big(\sup_{x \in \mathbb{R}} \xi_i(x)\Big)$$
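To make the rate map concrete, the following minimal Python sketch evaluates $\alpha_i(W,V)$ for every neuron. It assumes a particular sigmoidal shape for $\xi$ (the form used later in the simulations section) and purely illustrative parameter values `S`, `sigma`, `theta`, `alpha_m`, `beta`; none of these numbers are prescribed by the model itself.

```python
import numpy as np

def xi(x, S=1.0, sigma=0.5, theta=5.0, alpha_m=0.01):
    """Bounded, positive, nondecreasing gain function: a sigmoid plus the
    floor alpha_m.  Shape and parameter values are illustrative only."""
    return S / (1.0 + np.exp(-sigma * (x - theta))) + alpha_m

def jump_rates(W, V, beta=0.1):
    """Active transition rates of every neuron given weights W and states V.
    up[i]   = alpha_i(W, V) = xi(sum_j W[j, i] * V[j]), active while V[i] == 0,
    down[i] = beta, active while V[i] == 1."""
    drive = W.T @ V              # total presynaptic drive received by each neuron
    up = xi(drive) * (V == 0)    # 0 -> 1 (spiking) rates
    down = beta * (V == 1)       # 1 -> 0 (relaxation) rates
    return up, down
```

Because $\xi$ is bounded below by $\alpha_m$ and above by its supremum, the rates produced this way automatically satisfy the uniform bounds above.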
Plasticity rule

The basic idea of STDP is that of Hebb's law (1949):
"When an axon of cell A [...] repeatedly or persistently takes part in firing (a cell B), [...] A's efficiency, as one of the cells firing B, is increased" [23].

STDP is a bit more complex, as it completes this law with the possibility for weights to decrease when the spikes are decorrelated.

We expose our plasticity model through an example. First, weights can change only when a neuron spikes, which we define as the jump from 0 to 1 (we could have chosen the jump from 1 to 0). So suppose neuron $i$ spikes at time $t$. Then, the weights related to this neuron, that is to say $W_t^{ji}$ and $W_t^{ij}$ for all $j \neq i$, have a certain probability to jump. This differs from models found in the literature, for which weight jumps are systematic but small [1, 29, 38]. Here, the jump is not small but happens with a small probability: $W_t^{ji}$ increases with probability $p_+(S_t^j)$ and $W_t^{ij}$ decreases with probability $p_-(S_t^j)$. These probabilities depend on the inter-spiking times given by $S_t^j$.

Figure 1: Dynamics of neurons $i$ and $j$ over time, and the corresponding probability of jump for the weights.

As the classic STDP curve found by Bi & Poo [7] suggests, we take the following probability functions in our examples, with $0 < A_+, A_- \leq 1$ and $\tau_+, \tau_- > 0$:
$$p_+(s) = A_+ e^{-s/\tau_+} \quad \text{and} \quad p_-(s) = A_- e^{-s/\tau_-} \qquad (3)$$

Remark. By definition of $E_1$ and $\alpha_i$, we study excitatory neurons. We see at the end how to extend our results to inhibitory-excitatory networks. Also, we remark that $W_t^{ii}$ stays constant and, as $W_0^{ii} = 0$ for all $i$, $W_t^{ii} = 0$ for all $t$. We will discuss this assumption later on. Finally, $(S_t)_{t \geq 0}$ is crucial for our process to be Markovian.
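The rule above can be written as a short procedural sketch. In this illustration, the weight increment `dw`, the parameter values of `p_plus` and `p_minus`, and the lower reflection at `dw` (no depression of a weight already at its minimal value, as in the example treated later for recurrence) are assumptions of the sketch, not of the general model.

```python
import numpy as np

def p_plus(s, A_plus=1.0, tau_plus=17.0):
    """Potentiation probability p_+(s) = A_+ exp(-s / tau_+), eq. (3)."""
    return A_plus * np.exp(-s / tau_plus)

def p_minus(s, A_minus=0.5, tau_minus=34.0):
    """Depression probability p_-(s) = A_- exp(-s / tau_-), eq. (3)."""
    return A_minus * np.exp(-s / tau_minus)

def plasticity_event(W, S, i, dw=1.0, rng=None):
    """Stochastic STDP update triggered by a spike of neuron i (jump 0 -> 1).
    Each incoming weight W[j, i] increases by dw with probability p_+(S[j]);
    each outgoing weight W[i, j] decreases by dw with probability p_-(S[j]),
    here only while it stays strictly positive."""
    rng = rng or np.random.default_rng()
    N = W.shape[0]
    for j in range(N):
        if j == i:
            continue
        if rng.random() < p_plus(S[j]):
            W[j, i] += dw
        if W[i, j] > dw and rng.random() < p_minus(S[j]):
            W[i, j] -= dw
    return W
```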
Generator of the process

Now that we know how the process works, we can write its infinitesimal generator. To do so, we need the following notations. We denote by $G_i^w$ the set of all weights reachable after a spike of neuron $i$ while the current weight is $w \in E_1$:
$$G_i^w = \big\{\, w + \Delta w\,(Z_p + Z_d),\ (\vec{\zeta}_p, \vec{\zeta}_d) \in F_i^w \,\big\}$$
where
$$F_i^w = \Big\{ (\vec{\zeta}_p, \vec{\zeta}_d) :\ \vec{\zeta}_d = [\zeta_d^1, ..., \zeta_d^N],\ \vec{\zeta}_p = [\zeta_p^1, ..., \zeta_p^N]^T,\ \zeta_d^j, \zeta_p^j \in \{0,1\},\ \zeta_d^i = \zeta_p^i = 0,\ \zeta_d^j = 0 \text{ if } w^{ij} = \Delta w \Big\}$$
and we call $Z_p$ (respectively $Z_d$) the $N \times N$ matrix associated to the vector $\vec{\zeta}_p$ (respectively $\vec{\zeta}_d$): $Z_p$ carries $\vec{\zeta}_p$ on its $i$-th column, $Z_d$ carries $-\vec{\zeta}_d$ on its $i$-th row, and all other entries are zero. As each weight jumps independently whenever neuron $i$ spikes, we can decompose the probability of jumping to a given state as the product of the probabilities to jump or not for each weight. We want to compute $\phi_i(s, \tilde{w}, w)$, the probability of jumping to a given $\tilde{w} \in G_i^w$ knowing that neuron $i$ spikes. Let $\tilde{w} = w + \Delta w (Z_p + Z_d)$; the probability for $w^{ji}$ to increase ($\zeta_p^j = 1$) is $p_+(s^j)$, while the probability for it to stay the same ($\zeta_p^j = 0$) is $(1 - p_+(s^j))$, for all $j \neq i$. This appears as $\zeta_p^j p_+(s^j) + (1-\zeta_p^j)(1-p_+(s^j))$ in $\phi_i(s, \tilde{w}, w)$:
$$\phi_i(s, \tilde{w}, w) = \Phi_i(s, \vec{\zeta}_p, \vec{\zeta}_d) = \prod_{j \neq i} \big[\zeta_p^j p_+(s^j) + (1-\zeta_p^j)(1-p_+(s^j))\big]\,\big[\zeta_d^j p_-(s^j) + (1-\zeta_d^j)(1-p_-(s^j))\big] \qquad (4)$$
Therefore, we can write the generator $(\mathcal{C}, D(\mathcal{C}))$ of the whole process $(W_t, S_t, V_t)_{t \geq 0}$, where $D(\mathcal{C}) \subset C_b(E)$ and $\mathcal{C}$ is given, for all $f \in D(\mathcal{C})$, by
$$\mathcal{C} f(w,s,v) = \sum_i \delta_1(v^i)\,\beta\,[f(w,s,v-e_i) - f(w,s,v)] + \sum_i \alpha_i(w,v)\,\delta_0(v^i) \sum_{\tilde{w} \in G_i^w} \big(f(\tilde{w}, s - s^i e_i, v + e_i) - f(w,s,v)\big)\,\phi_i(s, \tilde{w}, w) + \sum_{i=1}^N \partial_{s^i} f(w,s,v)$$
or, equivalently,
$$\mathcal{C} f(w,s,v) = \underbrace{\sum_i \delta_1(v^i)\,\beta\,[f(w,s,v-e_i) - f(w,s,v)]}_{B^{\downarrow} f(w,s,v)} + \underbrace{\sum_i \phi_i(s,w,w)\,\alpha_i(w,v)\,\delta_0(v^i)\,\big(f(w, s - s^i e_i, v + e_i) - f(w,s,v)\big)}_{B^{\uparrow} f(w,s,v)}$$
$$\qquad + \underbrace{\sum_{i=1}^N \partial_{s^i} f(w,s,v)}_{B^{tr} f(w,s,v)} + \sum_i \alpha_i(w,v)\,\delta_0(v^i) \sum_{\tilde{w} \in G_i^w,\, \tilde{w} \neq w} \big(f(\tilde{w}, s - s^i e_i, v + e_i) - f(w,s,v)\big)\,\phi_i(s, \tilde{w}, w)$$
Written in this form, the generator shows two different but related dynamics: the weight dynamics and the network (inter-spiking time) dynamics. As we know that the synaptic weight dynamics are slow compared to the network dynamics ($(S_t, V_t)_{t>0}$ changes fast compared to $(W_t)_{t>0}$), this means that for all $i$:
$$\sum_{\tilde{w} \in G_i^w,\, \tilde{w} \neq w} \phi_i(s, \tilde{w}, w) \ll \phi_i(s, w, w)$$
Typically, $\sum_{\tilde{w} \in G_i^w,\, \tilde{w} \neq w} \phi_i(s, \tilde{w}, w) = O(\epsilon)$ and $\phi_i(s, w, w) = 1 - O(\epsilon)$. This time scale difference is studied in section 3.2, while the study of the fast part of the process is done in section 3.1. The fast process is given by the generator $B : D(B) \subset C_b(E_2) \to C_b(E_2)$:
$$B = B^{tr} + B^{\downarrow} + B^{\uparrow} \qquad (5)$$

3 Derivation of the weight equation
In this section, $W_t = W_0 = w \in E_1$ is fixed. We are interested in proving:

Theorem 3.1.
For all $w \in E_1$, the process $(S_t, V_t)_{t \geq 0}$ with generator $B^w$ mapping $D(B)$ into $C_b(E_2)$, defined for all $f \in D(B)$ as
$$B^w f(s,v) = \sum_i \delta_1(v^i)\,\beta\,[f(s, v - e_i) - f(s,v)] \qquad (6)$$
$$\qquad\qquad + \sum_i \alpha_i(w,v)\,\delta_0(v^i)\,\big(f(s - s^i e_i, v + e_i) - f(s,v)\big) \qquad (7)$$
$$\qquad\qquad + \sum_{i=1}^N \partial_{s^i} f(s,v) \qquad (8)$$
has a unique invariant measure.

This aim is part of a broader ambition to analyse the total process $(W_t, S_t, V_t)_{t \geq 0}$ on two different time scales. Indeed, in the limit where the plasticity is infinitely slow, the weight stays constant, so $\phi_i(s, w, w) = 1$, and then for all $f \in D(B^w)$, $B^w f(s,v) = B f(w,s,v)$. This analysis enables us to show in section 3.2 that, on the slow time scale of plasticity, $(W_t)_{t \geq 0}$ behaves simply against the invariant measure of $(S_t, V_t)^w_{t \geq 0}$. In the following, we omit the dependence on $w$ in the notation of the processes only, and we use $(S_t, V_t)_{t \geq 0}$ instead of $(S_t, V_t)^w_{t \geq 0}$.

In a first subsection we show the existence of an invariant measure of the process $(S_t, V_t)_{t \geq 0}$, and then its uniqueness in the next subsection. We start with some notations.

Notations
Let $X_t = (S_t, V_t)$ with $S_t \in \mathbb{R}_+^N$ and $V_t \in I = \{0,1\}^N$. The process is then the same as the one defined before with a fixed weight matrix $w$. Each $X_t^i = (S_t^i, V_t^i) \in \mathbb{R}_+ \times \{0,1\}$, for $i \in [\![1,N]\!]$, follows the same kind of dynamics: the discrete variable $V_t$ jumps with total rate $\sum_j \big(\alpha_j(w,v)\,\delta_0(v^j) + \beta\,\delta_1(v^j)\big)$ when $V_t = v$. Between these jumps, the continuous part $S_t$ grows linearly with slope 1 ($\frac{dS_t}{dt} = 1$), except that when $V_t^i$ jumps from 0 to 1 at time $t$, the corresponding continuous part restarts from 0, i.e. $S_t^i = 0$, see Figure 2. We denote by $(N_t)_{t \geq 0}$ the counting process corresponding to the number of jumps of the process $(V_t)_{t \geq 0}$. We can then write $N_t = \sum_{i=1}^N N_t^i$, where $(N_t^i)_{t \geq 0}$ counts the number of jumps of neuron $i$. By definition of $\alpha_i$, one has $N_t^i = Y_i\big(\int_0^t \alpha_i(w, V_s)\,ds\big)$, where the $Y_i$ are independent Poisson processes of intensity 1, as in [27].

Finally, we call $(P_t)_{t \geq 0}$ the transition probability of the process; $P_t$ maps $E_2 \times \mathcal{B}(E_2)$ into $\mathbb{R}_+$. Hence, for all $x \in E_2$ and $A \in \mathcal{B}(E_2)$ (the $\sigma$-algebra of Borel sets of $E_2$), $P_t(x, A)$ is the probability that $X_t \in A$ knowing $X_0 = x$, also written $\mathbb{P}_x(X_t \in A)$.

Figure 2: Graph representing the $i$-th coordinates of the processes $S_t$ and $V_t$.
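For intuition, here is a minimal Gillespie-style simulation of the fast process $(S_t, V_t)$ for a fixed weight matrix, reusing the `jump_rates` helper sketched earlier. It is only an illustration of the dynamics just described, with an arbitrary random seed; it plays no role in the analysis.

```python
import numpy as np

def simulate_fast_process(W, T, beta=0.1, rng=None):
    """Simulate (S_t, V_t) for fixed weights W up to time T.
    V[i] flips 0 -> 1 at rate alpha_i(W, V) and 1 -> 0 at rate beta;
    S grows with slope 1 between jumps, and S[i] is reset to 0 whenever
    neuron i spikes (jump 0 -> 1)."""
    rng = rng or np.random.default_rng(0)
    N = W.shape[0]
    V = np.zeros(N, dtype=int)
    S = np.zeros(N)
    t = 0.0
    while t < T:
        up, down = jump_rates(W, V, beta)
        rates = up + down                  # exactly one active rate per neuron
        total = rates.sum()
        dt = rng.exponential(1.0 / total)  # waiting time to the next jump
        t += dt
        S += dt                            # deterministic drift of the clocks
        i = rng.choice(N, p=rates / total) # which neuron jumps
        if V[i] == 0:                      # spike: reset its clock
            V[i], S[i] = 1, 0.0
        else:                              # relaxation back to rest
            V[i] = 0
    return S, V
```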
In this subsection, we aim at proving the following proposition:

Proposition 3.2.
The process $(S_t, V_t)_{t \geq 0}$ defined in Theorem 3.1 has at least one invariant probability measure.

To do so, we use the following theorem, which is classical in the theory of discrete-time Markov chains on a general state space:
Theorem 3.3.
If a transition probability $P$ is Feller and admits a Lyapunov function, then it also has an invariant probability measure.

Proof.
A nice proof of this result can be found in the course of Martin Hairer called
Ergodic Properties of Markov Processes. See Theorem 2 of [46]. One just needs to show that condition (F) there is equivalent to our Lyapunov condition.

After recalling the definitions of a Lyapunov function and of a Feller process, we exhibit such a Lyapunov function for our process.

Definition 3.4.
Let $\mathcal{X}$ be a complete separable metric space and let $P$ be a transition probability on $\mathcal{X}$. A Borel measurable function $V : \mathcal{X} \to \mathbb{R}_+ \cup \{\infty\}$ is called a Lyapunov function for $P$ if it satisfies the following conditions:
- $V^{-1}(\mathbb{R}_+) \neq \emptyset$; in other words, there are some values of $x$ for which $V(x)$ is finite.
- For every $c \in \mathbb{R}_+$, the sublevel set $V^{-1}(\{x : x \leq c\})$ is compact.
- There exist a positive constant $\gamma < 1$ and a constant $C$ such that, for every $x$ such that $V(x) \neq +\infty$:
$$\int_{\mathcal{X}} V(y)\, P(x, dy) \leq \gamma V(x) + C$$

Definition 3.5. We say that a homogeneous Markov process with transition operator $P$ is Feller if $Pf$ is continuous whenever $f$ is continuous and bounded. It is strong Feller if $Pf$ is continuous whenever $f$ is measurable and bounded.

We emphasize that the previous definitions and theorem are given for Markov chains and not for processes. The following proposition links them.
Proposition 3.6.
Let $(P_t)_{t \geq 0}$ be a Markov semigroup over $\mathcal{X}$ and let $P = P_T$ for some fixed $T > 0$. Then, if $\mu$ is invariant for $P$, the measure $\hat{\mu}$ defined by
$$\hat{\mu}(A) = \frac{1}{T} \int_0^T P_t \mu(A)\, dt, \quad \forall A \in \mathcal{B}(E_2)$$
is invariant for $(P_t)_{t \geq 0}$.

Proof.
$$P_t \hat{\mu} = P_t\Big(\frac{1}{T}\int_0^T P_s \mu\, ds\Big) = \frac{1}{T}\int_0^T P_{t+s}\mu\, ds = \frac{1}{T}\int_t^{T+t} P_s \mu\, ds = \frac{1}{T}\Big(\int_t^T P_s \mu\, ds + \int_T^{T+t} P_s \mu\, ds\Big)$$
$$= \frac{1}{T}\Big(\int_t^T P_s \mu\, ds + \int_0^t P_s P_T \mu\, ds\Big) = \frac{1}{T}\int_0^T P_s \mu\, ds = \hat{\mu}$$

Hence, we want to apply Theorem 3.3 to the transition probability $P_T$ extracted from $(P_t)_{t \geq 0}$ for some fixed $T > 0$. To do so, we show that for $T > 0$ the function $V$ defined by $V(x) = s^1 + s^2 + ... + s^N$, $\forall x = (s,v) \in E_2$, is a Lyapunov function for $P_T$. Then we use Theorem 27.6 of Davis' book [14] to prove that $P_T$ is Feller. We conclude on the existence of an invariant probability measure for $P_T$, and thus for $(P_t)_{t \geq 0}$, thanks to Proposition 3.6.

After these definitions and notations, let us prove that the process $(X_t)_{t \geq 0}$ has at least one invariant measure $\pi$, i.e. $X_0 \sim \pi \Rightarrow \forall t \geq 0,\ X_t \sim \pi$, or more formally, $\forall A \in \mathcal{B}(E_2)$:
$$\int_{E_2} P_t(x, A)\, \pi(dx) = \pi(A) \qquad (9)$$

Existence
Assumption 3.7. There exist $\alpha_m, \alpha_M \in \mathbb{R}_+$ such that $\forall v \in I,\ w \in E_1$:
$$0 < \alpha_m \leq \alpha_i(w,v),\ \beta \leq \alpha_M < \infty$$

Proposition 3.8.
With Assumption 3.7, for any $T > 0$, $V(x) = s^1 + ... + s^N$ is a Lyapunov function for $P_T$, with constants $C = NT$ and $\gamma = \mathbb{P}_x(\exists i : N_T^i < 2) < 1$, $\forall x \in E_2$.

Proof. The main idea is to use the fact that $S_t^i$ returns to 0 whenever neuron $i$ jumps from 0 to 1. Hence, as neurons have only two states, if $N_T^i \geq$
2, neuron i has jumped at least one timefrom 0 to 1 between 0 and T . Therefore, decomposing possible events we get: V ( X T ) ≤ ( V ( x ) + N T ) {∃ iN iT < } + N T {∀ iN iT ≥ } E x V ( X T ) ≤ N T + V ( x ) P x ( ∃ i : N iT < | {z } < Furthermore, one can show the process ( S t , V t ) is Feller thanks to Davis’ book [14]: Proposition 3.9. ( S t , V t ) is Feller.Proof. First, we define a distance ρ such that ( E , ρ ) is a metric space, locally compact. Such adistance is proposed in [14] page 58: ∀ x = ( s x , v x ) , y = ( s y , v y ) ∈ E : ρ ( x, y ) = ( v x = v y π max { ≤ i ≤ N } tan − ( | s ix − s iy | ) if v x = v y (10)We need this kind of norm because if we take for instance the euclidean distance ρ ( x, y ) = k s x − s y k ,we can have ρ ( x, y ) = 0 and x = y as soon as s x = s y and v x = v y .Then, we want to apply theorem 27.6 of [14]. We define t ∗ ( x ) as t ∗ ( x ) = { time to hit the boundary of E leaving from x and following the flow on s } t ∗ ( x ) = + ∞ as the only boundary is for x = (0 , v ) which is never reached because S t increasestoward infinity following the flow.Moreover, we define the total jump rate λ ( x ) = P j (cid:0) α j ( w, v ) δ ( v j ) + βδ ( v j ) (cid:1) = λ ( v ). Thus, as λ is bounded by assumption 3.7 and it only depends on v , as soon as ρ ( x, y ) < v x = v y so λ ( x ) = λ ( y ), hence λ ∈ C b ( E ).Finally, we define Q as Q (cid:0) { (( s − δ ( v i ) s i e i , v + e i ) } , ( s, v ) (cid:1) = α i ( w, v ) δ ( v i ) + βδ ( v i ) λ ( v )and show it is continuous for f ∈ D ( B w ). Indeed, let f ∈ D ( B w ), if ρ ( x, y ) ≤ η < | Qf ( x ) − Qf ( y ) | = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)X i f ( s x , v + e i ) α i ( w, v ) δ ( v i ) + βδ ( v i ) λ ( v ) − X i f ( s y , v + e i ) α i ( w, v ) δ ( v i ) + βδ ( v i ) λ ( v ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ N sup i | f ( s x , v + e i ) − f ( s y , v + e i ) | Then, choosing η such that sup v ∈ I | f ( s x , v ) − f ( s y , v ) | ≤ (cid:15)N (possible as f ∈ D ( B w ) ⊂ C b ( E ))we have for all (cid:15) > ∃ η > ρ ( x, y ) ≤ η ⇒ | Qf ( x ) − Qf ( y ) | ≤ (cid:15) Thus, x → Qf ( x ) is continuous for f ∈ D ( B w ). We can apply theorem 27.6 of Davis’ book [14]which ends the proof.We can now prove theorem 3.2: Proof.
Proposition 3.8 and Proposition 3.9 allow us to apply Theorem 3.3 and thus to conclude on the existence of an invariant probability measure for $(S_t, V_t)$.

In the following, we show that such a measure is unique.

3.1.2 Uniqueness through Laplace transform

We now want to show that this process has a unique invariant probability measure $\pi$. To do so, we find the possible Laplace transforms of the invariant measures of the process. We prove that such Laplace transforms satisfy an equation with a unique solution. By uniqueness of the Laplace transform of a measure, we deduce the desired result.

In the following, we use an equivalent definition of invariant measures which makes use of the generator $(B^w, D(B^w))$ of the process, see Proposition 34.7 in [14].

Proposition 3.10.
Let ( T t ) t ≥ be a semigroup on F , a Banach space, associated to a Markovprocess ( X t ) t ≥ . We note, ( B w , D ( B w )) its generator and we assume D ( B w ) is separating. Then, π is an invariant measure if and only if ∀ f ∈ D ( B w ) , Z E B w f dπ = 0 (11)We remind us what is a separating class of functions: Definition 3.11.
A class of functions
$D \subset B(E_2)$ (the measurable and bounded functions on $E_2$) is said to be separating if, for probability measures $\mu_1$ and $\mu_2$ on $E_2$, $\mu_1 = \mu_2$ whenever $\int_{E_2} f\, d\mu_1 = \int_{E_2} f\, d\mu_2$ for all $f \in D$.

In what follows, domains of generators will always be separating, as shown in Proposition 34.11 of [14].

Uniqueness
We invite the reader to have a look at Appendix A for a better view of the following computations.
Proposition 3.12.
Assume the process ( X t ) t ≥ in dimension N has at least one invariantmeasure of probability π w . Then it is unique.Proof. Let start with some notations: I = { , } N and E = R N + × I ∀ ( s, v ) ∈ E , s = ( s , ..., s N ) ∈ R N + and v = ( v , ..., v N ) ∈ Ie i = (0 , ..., , |{z} i , , ..., B I = ( ~v , ..., ~v N ) an enumeration of I s.t. k ≥ l ⇒ N X i =1 v ik ≥ N X i =1 v il | λ | = N X i =1 λ i (12)The jump process alone ( V t ) t ≥ has a unique invariant measure µ w = ( µ w , ..., µ w N ) ∈ R N + . Indeed,as each neuron is connected to each other, ( V t ) t ≥ is irreducible. As its state space is finite,the process is also positive recurrent so it has a unique invariant probability measure µ w bytheorem1.7.7 in [39]. Moreover, as each state is positive recurrent, µ wv > , ∀ v ∈ I . In particular,this measure satisfies P N k =1 B g ( ~v k ) µ wk = 0, where B is the generator of ( V t ) t ≥ and for functions g I -measurable: B g ( v ) = B w g ( s, v ) = N X i =1 βδ ( v i )[ g ( v − e i ) − g ( v )] + α i ( w, v ) δ ( v i ) [ g ( v + e i ) − g ( v )] (13)11ence, with g ( v ) = ~v j ( v ) we get ∀ j ∈ [[1 , N ]]: N X k =1 B g ( ~v k ) µ wk = N X k =1 µ wk N X i =1 ( βδ ( v ik )[ ~v j ( v k − e i ) − ~v j ( ~v k )] + α i ( ~v k ) δ ( v ik ) (cid:2) ~v j ( ~v k + e i ) − ~v j ( ~v k ) (cid:3) ) = 0 ⇔ N X k =1 ,k = j µ wk N X i =1 [ βδ ( v ik ) ~v j ( ~v k − e i ) + α i ( ~v k ) δ ( v ik ) ~v j ( ~v k + e i )] = µ wj N X i =1 βδ ( v ij ) + α i ( ~v j ) δ ( v ij )(14)We can then write the system satisfied by Laplace transforms of invariant probability measuresof the process ( S t , V t ) t ≥ . We call π w one of them. First we can decompose π w as: π w ( ds, v ) = N X k =1 π w~v k ( ds ) µ wk ~v k ( v ) (15)In what follows, for the sake of simplicity, we note π wk for π w~v k .From proposition 3.10, ∀ f ∈ D ( B w ): N X k =1 Z s ∈ R N + B w f ( s, ~v k ) µ wk π wk ( ds ) = 0 (16)Where ( B w , D ( B w )) is the generator of the process ( X t ) t ≥ (6) . As we are interested in findingthe Laplace transform of π w we take f ( s, v ) = e − ~λ.~s g ( v ). First we compute B w f : B w f ( s, v ) = N X i =1 βδ ( v i )[ e − ~λ.~s g ( v − e i ) − e − ~λ.~s g ( v )]+ N X i =1 α i ( w, v ) δ ( v i ) h e − ~λ. ( ~s − s i ~e i ) g ( v + e i ) − e − ~λ.~s g ( v ) i − ( N X i =1 λ i ) | {z } | λ | e − ~λ.~s g ( v ) (17)So in (16) we get: N X k =1 Z s ∈ R N + B w f ( s, ~v k ) µ wk π wk ( ds )= N X k =1 " N X i =1 βδ ( v ik )[ g ( ~v k − e i ) − g ( ~v k )] − α i ( ~v k ) δ ( v ik ) g ( ~v k ) ! − | λ | g ( ~v k ) µ wk Z s e − ~λ.~s π wk ( ds ) | {z } L ( π wk )( λ ) + N X k =1 N X i =1 α i ( ~v k ) δ ( v ik ) g ( ~v k + e i ) Z s e − ~λ. ( ~s − s i ~e i ) π wk ( ds ) | {z } L ( π wk )( b λ i ) µ wk = 0 (18)12here b λ i = ( λ , ..., λ i − , , λ i +1 , ..., λ N ). We first show recursively that we can express L ( π wk )( λ )in function of linear combinations of L ( π wl )(ˇ λ l ) where ˇ λ l = (0 , ..., , λ l , , ..., D (ˇ λ l ) invertible such that: D (ˇ λ l ) L ( π w )(ˇ λ l )... L ( π w N )(ˇ λ l ) = Λ ( l ) , with Λ ( l ) ∈ R N a constant vector Where ˇ λ l = (0 , ..., , λ l , , ..., L ( π wk )( λ ) as a linear combination of L ( π wk )(ˇ λ l ). Step 1
First, we express the L ( π wk )( λ ) in function of the L ( π wl )( b λ i ). In particular, we find Γ( λ ) : R N + → M N ( R ) and Λ( λ ) : R N + → R N , for which Λ j ( λ ) depends only on linear combination of L ( π wl )( b λ i )where i ∈ [[1 , N ]] and l ∈ [[1 , N ]], such that:Γ( λ ) L ( π w )( λ )... L ( π w N )( λ ) = Λ( λ ) (19)To do so, we take g ( v ) = ~v j ( v ) in (18) and find Γ and Λ : N X k =1 L ( π wk )( λ ) " N X i =1 βδ ( v ik )[ ~v j ( ~v k ) − ~v j ( ~v k − e i )] + α i ( ~v k ) δ ( v ij ) ~v j ( ~v k ) ! + | λ | ~v j ( ~v k ) µ wk | {z } Γ jk ( λ ) = N X k =1 " N X i =1 α i ( ~v k ) δ ( v ik ) ~v j ( ~v k + e i ) L ( π wk )( b λ i ) µ wk | {z } Λ j ( λ ) (20)We can remark from (14) that: Γ jk ( λ ) = 0 ∀ k < j Γ jj ( λ ) = h(cid:16)P Ni =1 βδ ( v ij ) + α i ( ~v j ) δ ( v ij ) (cid:17) + | λ | i µ wj > jk ( λ ) = − P Ni =1 βδ ( v ik ) ~v j ( ~v k − e i ) µ wk , ∀ k > j
13o Γ jj ( λ ) = µ wj N X i =1 βδ ( v ij ) + α i ( ~v j ) δ ( v ij ) + | λ | µ wj = N X k =1 ,k = j µ wk N X i =1 [ βδ ( v ik ) ~v j ( ~v k − e i ) + α i ( ~v k ) δ ( v ik ) ~v j ( ~v k + e i )] + | λ | µ wj = N X k =1 ,k = j | Γ jk ( λ ) | + N X k =1 ,k = j µ wk N X i =1 α i ( ~v k ) δ ( v ik ) ~v j ( ~v k + e i ) + | λ | µ wj (21)Thus Γ is invertible as a strictly dominant diagonal matrix as soon as | λ | ≥
0. We will use thesame idea in what follows to show there is a unique way to express each L ( π wm )( λ ), m ∈ I , as alinear combination of terms of the family (cid:16) L ( π wk )(ˇ λ l ) (cid:17) ≤ l ≤ N,m ∈ I .Second, take a sequence k , k , ... , k d ∈ [[1 , N ]], d ≤ N − b λ k ...k d whichchecks the conditions b λ k i k ...k d = 0. We have from (19):Γ( b λ k ...k d ) L ( π w )( b λ k ...k d )... L ( π w N )( b λ k ...k d ) = Λ( b λ k ...k d ) (22)Using (20) we get:Λ j ( b λ k ...k d ) = N X k =1 X i/ ∈{ k ,...,k d } α i ( ~v k ) δ ( v ik ) ~v j ( ~v k + e i ) L ( π wk )( b λ k ...k d m ) + X i ∈{ k ,...,k d } α i ( ~v k ) δ ( v ik ) ~v j ( ~v k + e i ) | {z } Γ jk L ( π wk )( b λ k ...k d ) µ wk Hence we can decompose Λ j ( b λ k ...k d ) as follows:Λ( b λ k ...k d ) = Λ ( k ...k d ) ( λ ) + Γ L ( π w )( b λ k ...k d )... L ( π w N )( b λ k ...k d ) Where Λ ( k ...k d ) j ( λ ) depends on λ only through (cid:16) L ( π wk )( b λ k ...k d m ) (cid:17) m/ ∈{ k ,...,k d } ,k ∈ I . Thus, equa-tion (22) can be rewritten as: h Γ( b λ k ...k d ) − Γ i| {z } Γ ( k ...kd ) ( b λ k ...kd ) L ( π w )( b λ k ...k d )... L ( π w N )( b λ k ...k d ) = Λ ( k ...k d ) ( λ ) (23)14ventually, we show Γ ( k ...k d ) ( b λ k ...k d ) is invertible as soon as | λ | ≥
0, denoting by K = { k , ..., k d } : N X k =1 , = j | Γ ( k ...k d ) jk ( b λ k ...k d ) | = X k
To end with a way to compute L ( π w )( λ ) we show how to find L ( π wm )(ˇ λ l ) and then we get a newsystem of the form: D (ˇ λ l ) L ( π w )(ˇ λ l )... L ( π w N )(ˇ λ l ) = Λ ( i ) , with Λ ( l ) ∈ R N a constant vector (24)The idea is the same as previously. We evaluate the expression (19) in all ˇ λ l which gives:Γ(ˇ λ l ) L ( π w )(ˇ λ l )... L ( π w N )(ˇ λ l ) = Λ(ˇ λ l ) (25)So at line j: Λ j (ˇ λ l ) = N X k =1 N X i =1 α i ( ~v k ) δ ( v ik ) ~v j ( ~v k + e i ) L ( π wk )( b λ i ∩ ˇ λ l | {z } =(0 ,..., if i = l ) µ wk And as L ( π wk )(0 , ...,
0) = 1 we have:Λ j (ˇ λ l ) = N X k =1 µ wk N X i =1 ,i = l α i ( ~v k ) δ ( v ik ) ~v j ( ~v k + e i ) | {z } Λ ( l ) j = cst + N X k =1 α l ( ~v k ) δ ( v lk ) ~v j ( ~v k + e l ) µ wk | {z } D jk for k
We use the Theorem 2.1 of [31] twice. Once to link the occupation measure of the fastprocess to its invariant measure and then again to show (32).We denote by F nt the natural filtration of ( W nt , S nt , V nt ) t ≥ . I will enumerate and show theproperties we need in order to apply [31].1. ( W nt ) t ≥ satisfies the compact containment condition that is for each (cid:15) > T > K ⊂ E such that:inf n P ( W nt ∈ K, t ≤ T ) ≥ − (cid:15) Proof.
We denote K i = (cid:8) ˜ w ∈ E s.t. ∀ k, l ∈ [[1 , N ]] , | ˜ w kl − w kl | ≤ i ∆ w (cid:9) . Therefore, we want toshow that for each (cid:15) > T > ∃ i large enough to have ∀ n ∈ N : P ( W nt ∈ K i − , t ≤ T ) ≥ − (cid:15) (33)But: P ( W nt ∈ K i − , t ≤ T ) = P (cid:18)f W nt ∈ K i − , t ≤ T(cid:15) n (cid:19) = 1 − P (cid:18) ∃ t ≤ T(cid:15) n , f W nt / ∈ K i − (cid:19)
18o we major P (cid:16) ∃ t ≤ T(cid:15) n , f W nt / ∈ K i − (cid:17) in what follows. As lim n ∞ T(cid:15) n = + ∞ , the time on whichwe are looking at our process is becoming larger and larger with n so we need the probabilityof jumping to become smaller and smaller as it is the case for ( f W nt ) t ≥ . Indeed, when neuron i jumps from 0 to 1, w ij and w ji for j = i have probability to jump of order (cid:15) n .First, from (26) there exists c > i jumped from 0 to 1 is less than c (cid:15) n , so for all i, s and w : X ˜ w ∈ G wi , ˜ w = w φ in ( s, e w, w ) = P (cid:16)f W nt = f W nt − | e V n,it − e V n,it − = 1 (cid:17) ≤ c (cid:15) n < X nt as the particular case of the process e X nt for which neuronsare independent and fire at rate γ = max( β, α M ) and whenever a neuron i jumps (from 0 to 1 or1 to 0), W nt change with probability c (cid:15) n . We just impose that the size of weights jumps are asbefore: + / − ∆ w . Hence, in such a process weights jump more frequently. So denoting by N wt and N wt processes respectively counting the number of jumps of f W nt and W wt between 0 and t,and as previously, N t the counting process corresponding to the number of jump of the process( V t ) t ≥ . Thus: P (cid:18) ∃ t ≤ T(cid:15) n , f W nt / ∈ K i − (cid:19) = P (cid:18) ∃ k, l ∈ [[1 , N ]] , ∃ t ≤ T(cid:15) n , | ( f W nt ) kl − w kl | ≥ i ∆ w (cid:19) ≤ P (cid:16) N w T(cid:15)n ≥ i (cid:17) ≤ P (cid:16) N w T(cid:15)n ≥ i (cid:17) = + ∞ X k = i P ( N w T(cid:15)n = k )But P ( N w T(cid:15)n = k ) = + ∞ X m = k (cid:18) P ( N T(cid:15)n = m )( c(cid:15) n ) k (1 − c(cid:15) n ) m − k (cid:18) mk (cid:19)(cid:19) = + ∞ X m = k e − Nα M T(cid:15)n ( N α
M T(cid:15) n ) m m ! ( c(cid:15) n ) k (1 − c(cid:15) n ) m − k (cid:18) mk (cid:19)| {z } probability ( W nt ) t ≥ changed k times knowing ( V nt ) t ≥ jumped m times So for (cid:15) n small enough: P (cid:18) ∃ t ≤ T(cid:15) n , f W nt / ∈ K i − (cid:19) ≤ + ∞ X k = i + ∞ X m = k e − Nα M T(cid:15)n ( N α
M T(cid:15) n ) m m ! ( c(cid:15) n ) k (1 − c(cid:15) n ) m − k (cid:18) mk (cid:19)! ≤ + ∞ X k = i e − Nα M T(cid:15)n + ∞ X m = k (cid:18) ( N α M T ) m k !( m − k )! c k ( (cid:15) n ) m − k (1 − c(cid:15) n ) m − k (cid:19) ≤ + ∞ X k = i ( N α M T c ) k k ! e − Nα M T(cid:15)n + ∞ X m = k ( N α M T ) m − k ( (cid:15) n − c ) m − k ( m − k )! !| {z } e NαM ( 1 (cid:15)n − c ) T ≤ + ∞ X k = i ( N α M T c ) k k ! e − Nα M T ≤ + ∞ X k = i ( N α M T c ) k k ! −→ i → + ∞ ∃ i such that (33) is satisfied. 19. Moreover, define ∀ w ∈ E , h ∈ D ( B ) ⊂ C b ( E ) the operator B w by B w h ( s, v ) = B h ( w, s, s ).There exists a unique probability measure on E π w such that: Z E B w ( s, v ) π w ( ds, dv ) = 0 Proof.
See theorem 3.1.3. ∀ g ∈ D ( C ) ∩ C b ( E ) : g ( W nt ) − Z t A g ( W nu , S nu , V nu ) du + o (cid:15) n (1) Z t A r g ( W nu , S nu , V nu ) du (34)is a F nt martingale and ∀ ( w, s, v ) ∈ E lim n → + ∞ E ( w,s,v ) (cid:20) sup t ≤ T (cid:12)(cid:12)(cid:12)(cid:12) o (cid:15) n (1) Z t A r g ( W nu , S nu , V nu ) du (cid:12)(cid:12)(cid:12)(cid:12)(cid:21) = 0 (35) Proof. ∀ f ∈ D ( C ) = D ( C n ): f ( W nt , S nt , V nt ) − Z t C n f ( W nu , S nu , V nu ) du (36)is a F nt martingale and ∀ g ∈ D ( C ) ∩ C b ( E ) (cid:15) n C n g ( w, s, v ) = X i α i ( w, v ) δ ( v i ) X ˜ w ∈ G wi , ˜ w = w ( g ( ˜ w ) − g ( w )) φ in ( s, ˜ w, w ) = (cid:15) n A g ( w, s, v ) + o ( (cid:15) n ) A r g ( w, s, v )So (34) is a F nt martingale.Moreover, as g ∈ D ( A ) ∩ C b ( E ) and max i ∈ I,w ∈ E ( G wi ) ∩ ≤ N − , ∃ M > |A r g ( w, s, v ) | = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)X i α i ( w, v ) δ ( v i ) X ˜ w ∈ G wi , ˜ w = w ( g ( ˜ w ) − g ( w )) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ N α M (cid:18) max i ∈ I,w ∈ E G wi (cid:19) x ∈ E | g ( x ) | ≤ M Hence, ∀ ( w, s, v ) ∈ E , E ( w,s,v ) h sup t ≤ T (cid:12)(cid:12)(cid:12) o (cid:15) n (1) R t A r g ( W nu , S nu , V nu ) du (cid:12)(cid:12)(cid:12)i = o (cid:15) n (1), thus condi-tion (35) is satisfied.4. Similarly, ∀ h ∈ D ( C ) ∩ C b ( E ) h ( S nt , V nt ) − Z t (cid:15) n B h ( W nu , S nu , V nu ) du (37)is a F nt martingale 20 roof. (cid:15) n C n h ( w, s, v ) = N X i =1 ∂ s i h ( s, v ) + X i δ ( v i ) β [ h ( s, v − e i ) − h ( s, v )]+ X i α i ( w, v ) δ ( v i ) ( h ( s − s i e i , v + e i ) − h ( s, v )) φ in ( s, w, w )+ X i α i ( w, v ) δ ( v i ) X ˜ w ∈ G wi , ˜ w = w ( h ( s − s i e i , v + e i ) − h ( s, v )) φ in ( s, ˜ w, w ) = N X i =1 ∂ s i h ( s, v ) + X i δ ( v i ) β [ h ( s, v − e i ) − h ( s, v )]+ X i α i ( w, v ) δ ( v i ) ( h ( s − s i e i , v + e i ) − h ( s, v )) φ in ( s, w, w ) + X ˜ w ∈ G wi , ˜ w = w φ in ( s, ˜ w, w ) | {z } =1 = B h ( w, s, v )So C n h = 1 (cid:15) n B h As (36)is a F nt martingale, (37) is a F nt martingale too.Thus, conditions of example 2.3 of [31] are satisfied and ( W nt , S nt , V nt ) t ≥ converges, when n → + ∞ , in law to ( W t , S t , V t ) t ≥ where ( S t , V t ) ∼ π W t and ( W t ) is the solution of the martingaleproblem associated to the operator C av : D ( C av ) ⊂ C b ( E ) → C b ( E ): C av f ( w ) = Z E A f ( w, s, v ) π w ( ds, dv )Indeed, we use theorem 2.1 of [31] twice. First the point 1., 2. and 4. enable us to use thetheorem to obtain that when n → + ∞ , ( W n , Γ n ) → ( W, Γ) such that there exists a filtration {G t } such that M t = Z t Z E B f ( W ( s ) , y )Γ( ds × dy )is a {G t } -martingale for each f ∈ D ( C | C b ( E ) ). But M t is continuous and of bounded variation,so it must be constant (see for instance Theorem 27 of [43]) and finally M t = 0 for all t >
0. Wethen write Γ( ds × dy ) = γ s ( dy ) ds and get Z t Z E B f ( W ( s ) , y ) γ s ( dy ) ds = 0And then Z E B f ( W ( s ) , y ) γ s ( dy ) = 0So we can take γ s ( dy ) = π W s ( dy ) is the unique invariant measure for B x such that B x f ( y ) = B f ( x, y ). We conclude using 1.,2. and 3. and the Theorem 2.1 of [31] which gives that Z t Z E A f ( W ( s ) , y )Γ( ds × dy )21 martingale and thus ( W t ) is the solution of the martingale problem associated to the operator C av : D ( C av ) ⊂ C b ( E ) → C b ( E ): C av f ( w ) = Z E A f ( w, s, v ) π w ( ds, dv )This time scale separation gives the infinitesimal generator of the weight process on the slowtime scale. However, we don’t know explicitly π w but its Laplace transform. Under some simpleassumptions, we can get explicitly the dynamic of the weights which is a Markov process on E with non-homogeneous jump rates depending on the Laplace transform of π w . Proposition 3.14.
Suppose that for all i ∃ Φ i ( ˜ w,w ) such that ϕ i ( s, ˜ w, w ) = L (cid:16) Φ i ( ˜ w,w ) (cid:17) ( s ) .Then, C av f ( w ) = X ˜ w ∈ G w , ˜ w = w ( f ( ˜ w ) − f ( w )) X v ∈ I µ wv X i s.t. ˜ w ∈ G wi α i ( w, v ) δ ( v i ) Z R N + Φ i ( ˜ w,w ) ( s ) L ( π wv )( s )( ds ) Where G w = { w ∈ E , P ( W = w | W = w ) > } and µ wv is the invariant measure of the processgenerated by B defined in (13) .Proof. If we develop the infinitesimal generator of the process ( W t ) t ≥ . Thanks to (32) and (28)we get: C av f ( w ) = Z E A f ( w, s, v ) π w ( ds, dv ) = X v ∈ I Z E A f ( w, s, v ) µ wv π wv ( ds )= X v ∈ I Z E X i α i ( w, v ) δ ( v i ) X ˜ w ∈ G wi , ˜ w = w ( f ( ˜ w ) − f ( w )) ϕ i ( s, ˜ w, w ) µ wv π wv ( ds )= X v ∈ I µ wv X i α i ( w, v ) δ ( v i ) X ˜ w ∈ G wi , ˜ w = w ( f ( ˜ w ) − f ( w )) Z E ϕ i ( s, ˜ w, w ) π wv ( ds ) With the assumption that for all i ∃ Φ i ( ˜ w,w ) such that ϕ i ( s, ˜ w, w ) = L (Φ ( ˜ w,w ) )( s ) we get: C av f ( w ) = X v ∈ I µ wv X i α i ( w, v ) δ ( v i ) X ˜ w ∈ G wi , ˜ w = w ( f ( ˜ w ) − f ( w )) Z R N + L (Φ i ( ˜ w,w ) )( s ) π wv ( ds ) = X v ∈ I µ wv X i α i ( w, v ) δ ( v i ) X ˜ w ∈ G wi , ˜ w = w ( f ( ˜ w ) − f ( w )) Z R N + Φ i ( ˜ w,w ) ( s ) L ( π wv )( s )( ds ) = X ˜ w ∈ G, ˜ w = w ( f ( ˜ w ) − f ( w )) X v ∈ I µ wv X i s.t. ˜ w ∈ G wi α i ( w, v ) δ ( v i ) Z R N + Φ i ( ˜ w,w ) ( s ) L ( π wv )( s )( ds ) Sufficient conditions for recurrence and transience
Plasticity models have evolved in interaction with neuroscientists' discoveries. For instance, models based on STDP confirmed the need for homeostasis in order to regulate the evolution of the weights: preventing their divergence or extinction, and providing competition. Indeed, Hebbian learning suffers from a positive feedback instability and leads to all neurons wiring together [48]. Synaptic scaling and metaplasticity are the main homeostatic mechanisms used in models, through different implementations [47]. In our model we do not have such mechanisms, like hard or soft bounds, but we can show that the weights still stabilize under some conditions. We propose some general conditions which we manage to express as a simple condition on the parameters of our model.

In our case, we are faced with a Markov process which is non-homogeneous in space and homogeneous in time, on a space equivalent to $\mathbb{N}^N$. Few results exist for such processes. As the authors of the book [37] underline, Lyapunov techniques seem to be the most adapted to analyse them.

For the sake of simplicity, and as it does not change anything in what follows, we now consider $\Delta w = 1$, so that $E_1 = \mathbb{N}_*^N$. Also, we are interested in the case presented in the first example given in Remark 2. Therefore, the slow process comes from the fact that $p_{+/-}$ are multiplied by $\epsilon_n$, so:
$$\varphi_i(s, w' = w - E^{ij}, w) = \begin{cases} p_-(s^j) & \text{if } w^{ij} > 1 \\ 0 & \text{if } w^{ij} = 1 \end{cases}, \qquad \varphi_i(s, w' = w + E^{ji}, w) = p_+(s^j), \qquad \varphi_i(s, w', w) = 0 \ \text{ for all other } w'$$
If we develop the infinitesimal generator of the process $(W_t)_{t \geq 0}$, thanks to (32) and (28) we get:
$$\mathcal{C}^{av} f(w) = \int_{E_2} \mathcal{A} f(w,s,v)\, \pi^w(ds, dv) = \sum_{v \in I} \int_{E_2} \mathcal{A} f(w,s,v)\, \mu_v^w\, \pi_v^w(ds)$$
$$= \sum_{v \in I} \int_{E_2} \sum_i \alpha_i(w,v)\, \delta_0(v^i) \sum_{\tilde{w} \in G_i^w,\, \tilde{w} \neq w} (f(\tilde{w}) - f(w))\, \varphi_i(s, \tilde{w}, w)\, \mu_v^w\, \pi_v^w(ds)$$
$$= \sum_{i,j:\, i \neq j} (f(w + E^{ij}) - f(w)) \sum_{v:\, v^j = 0} \mu_v^w\, \alpha_j(w,v) \int_{E_2} p_+(s^i)\, \pi_v^w(ds) + \sum_{i,j:\, i \neq j} \mathbf{1}_{]1, +\infty[}(w^{ij})\, (f(w - E^{ij}) - f(w)) \sum_{v:\, v^i = 0} \mu_v^w\, \alpha_i(w,v) \int_{E_2} p_-(s^j)\, \pi_v^w(ds) \qquad (38)$$
Denoting the jump rates by $r^{+/-}_{ij}(w)$, we get:
$$\mathcal{C}^{av} f(w) = \sum_{i,j} (f(w + E^{ij}) - f(w))\, r^+_{ij}(w) + \mathbf{1}_{]1, +\infty[}(w^{ij})\, (f(w - E^{ij}) - f(w))\, r^-_{ij}(w) \qquad (39)$$

4.1 General conditions for positive recurrence and transience

Proposition 4.1.
Assume the following conditions:• ∃ I + / − m , I + / − M ∈ R ∗ + such that I + / − m ≤ r + / − ij ( w ) ≤ I + / − M for all w ,• I − m > I + M which leads to r + ij ( w ) − r − ij ( w ) ≤ I + M − I − m < for all w Then, the process ( W t ) t ≥ associated to the generator C av given in (39) is positive recurrent.Proof. We use proposition 1.3 from Hairer’s course [22]. In order to check assumptions of thisproposition, we need to find a function f : E → R + such that lim x → + ∞ f ( x ) = + ∞ and ∃ A ⊂ E finite such that for all w ∈ E \ A : C av f ( w ) ≤ − f : E → R + as: ∀ w ∈ E , f ( w ) = X i,j : i = j ( w ij ) = || w || So C av f ( w ) = X i,j : i = j ( || w + E ij || − || w || ) r + ij ( w ) + X i,j : i = j, w ij =1 ( || w − E ij || − || w || ) r − ij ( w )= X i,j : i = j (2 w ij + 1) r + ij ( w ) + X i,j : i = j, w ij =1 ( − w ij + 1) r − ij ( w )= X i,j : i = j, w ij =1 (2 w ij + 1) r + ij ( w ) + X i,j : i = j, w ij =1 ( − w ij + 1) r − ij ( w ) + X i,j : i = j, w ij =1 (2 w ij + 1) r + ij ( w )= X i,j : i = j, w ij =1 w ij ( r + ij ( w ) − r − ij ( w )) + X i,j : i = j, w ij =1 (cid:0) r − ij ( w ) + r + ij ( w ) (cid:1) + X i,j : i = j, w ij =1 (2 w ij + 1) r + ij ( w ) ≤ X i,j : i = j, w ij =1 w ij ( r + ij ( w ) − r − ij ( w )) + ( N − { w ij = 1 } ) r − ij ( w ) + N r + ij ( w ) + 2 { w ij = 1 } r + ij ( w ) | {z } ≤ N ( r + ij ( w )+ r − ij ( w )) ≤ N ( I + M + I − M ) As r + ij ( w ) − r − ij ( w ) ≤ I + M − I − m < w such that || w || > N (to enforce that at least one w kl > C av f ( w ) ≤ i,j : i = j ( w ij )( I + M − I − m ) + 3 N (( I + M + I − M )) −→ || w ||→ + ∞ −∞ Let w sep ∈ N ∗ + be such that w ≥ w sep ⇒ w ( I + M − I − m ) + 3 N ( I + M + I − M ) ≤ − i,j ( w ij ) ≥ || w || N , we define A = { w, || w || > N w sep } so: w ∈ A ⇒ max i,j ( w ij ) ≥ w sep ⇒ i,j ( w ij )( I + M − I − m ) + 3 N (( I + M + I − M )) ≤ − A = A c = { w, || w || ≤ N w sep } . A is finite and for all w ∈ E \ A : C av f ( w ) ≤ − W t ) t ≥ .24 orollary 4.2. If lim r → + ∞ sup w ∈ Σ , k w k≥ r ( r + ij ( w ) − r − ij ( w )) < Then, the process ( W t ) t ≥ associated to the generator C av given in (39) is positive recurrent.Proof. Exactly the same as the proof of proposition 4.1.
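As a sanity check of this drift criterion, one can simulate the averaged weight process of generator (39) with user-supplied rate functions and verify that trajectories stay bounded when $r^+_{ij}(w) - r^-_{ij}(w)$ is uniformly negative. The sketch below treats the rates as black boxes (in the model they are integrals of $p_{\pm}$ against the invariant law $\pi^w$, which are not computed here); the function names, the constant example rates and the horizon are illustrative assumptions.

```python
import numpy as np

def simulate_averaged_weights(w0, r_plus, r_minus, T, dw=1.0, rng=None):
    """Gillespie simulation of the averaged weight process of generator (39).
    r_plus(w, i, j) / r_minus(w, i, j) are the potentiation / depression rates
    of weight w[i, j]; depression is only proposed while w[i, j] > dw."""
    rng = rng or np.random.default_rng(1)
    w = np.array(w0, dtype=float)
    N = w.shape[0]
    t, path = 0.0, [(0.0, w.copy())]
    while t < T:
        moves, rates = [], []
        for i in range(N):
            for j in range(N):
                if i == j:
                    continue
                moves.append((i, j, +dw)); rates.append(r_plus(w, i, j))
                if w[i, j] > dw:
                    moves.append((i, j, -dw)); rates.append(r_minus(w, i, j))
        rates = np.array(rates)
        t += rng.exponential(1.0 / rates.sum())       # time to the next weight jump
        i, j, step = moves[rng.choice(len(moves), p=rates / rates.sum())]
        w[i, j] += step
        path.append((t, w.copy()))
    return path

# Usage: constant rates with depression dominating, as in Proposition 4.1.
path = simulate_averaged_weights(np.ones((3, 3)), lambda w, i, j: 0.1,
                                 lambda w, i, j: 0.2, T=500.0)
```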
Proposition 4.3. p + ( s ) − p − ( s ) > γ > for all s ∈ R ∗ + imply transience of ( W t ) t ≥ .Proof. Let define A = { w ∈ E s.t. min w ij ≤ w > } And f : E → R + such that: f ( x ) = N w if x ∈ A, P x ij if x ∈ A c Thus, inf A f = N w so for all w ∈ A c , f ( w ) < inf A f .Moreover, C av f ( w ) = X i,j ( f ( w + E ij ) − f ( w )) r + ij ( w ) + ]1 , + ∞ [ ( w ij )( f ( w − E ij ) − f ( w )) r − ij ( w )= X i,j − w ( w + 1) r + ij ( w ) + ]1 , + ∞ [ ( w ij ) 1 w ( w − r − ij ( w ) ≤ X i,j − w ( w + 1) r + ij ( w ) + 1 w ( w − r − ji ( w ) ≤ X i X v,v i =0 µ wv α i ( w, v ) X j = i w ( w + 1) Z E ( p − ( s j ) − p + ( s j )) π wv ( ds ) < − γ w ( w + 1) X i X v,v i =0 µ wv α i ( w, v ) ≤ p + ( s ) − p − ( s ) < − γ < s ∈ R ∗ + imply positive recurrenceof ( W t ) t ≥ as we showed in simulations. Remark . Denoting by η ( w ) the expectation of jumps of ( W t ) t ≥ , we easily get that η ij ( w ) =( r + ij ( w ) − r − ij ( w ))∆ w . Thus, conditions on ( r + ij ( w ) − r − ij ( w )) are equivalent to conditions on η ij ( w ).We now compute the constants I + / − m , I + / − M ∈ R ∗ + in order to derive a simple condition oftransience or recurrence depending on parameters. We want to bound the following quantities: r + / − ij ( w ) = X v,v j =0 µ wv α j ( w, v ) Z E p + / − ( s i ) π wv ( ds )The main idea is to use that 0 < α m ≤ α j ( w, v ) ≤ α M so α m Z E p + ( s i ) X v,v j =0 µ wv π wv ( ds ) ≤ X v,v j =0 µ wv α j ( w, v ) Z E p + ( s i ) π wv ( ds ) ≤ α M Z E p + ( s i ) X v,v j =0 µ wv π wv ( ds ) Z E p + ( s i ) X v,v j =0 µ wv π wv ( ds ) But for all differentiable p + , by Fubini: Z E p + ( s i ) π wv ( ds ) = Z E (cid:18)Z s i ( p + ) ( u ) du + p + (0) (cid:19) π wv ( ds )= p + (0) + Z E (cid:18)Z + ∞ ( p + ) ( u ) { u u | V t = v (cid:1) ( p + ) ( u ) du We are finally interested in bounding X v,v j =0 µ wv P π w (cid:0) S it > u | V t = v (cid:1) = X v,v j =0 P π w ( V t = v ) P π w (cid:0) S it > u | V t = v (cid:1) = P π w (cid:16) S it > u, V jt = 0 (cid:17) Proposition 4.4.
For all w ∈ E : α M e − βu − β e − α M u α M − β βα M + β ≤ P π w (cid:16) S it > u, V jt = 0 (cid:17) ≤ α m e − βu − β e − α m u α m − β βα m + β (41)Let ( V t , S t ) and ( V t , S t ) be the processes for which (cid:16) ( V it , S it ) t ≥ (cid:17) i (the same for (cid:16) ( V it , S it ) t ≥ (cid:17) i )are independent each other and neurons jump from 0 to 1 respectively with a rate α M and α m and from 1 to 0 with the rate β . We thus get for similar trajectories, for all t ≥ i : S it ≤ S it ≤ S it Thus, we can bound P π w (cid:0) S it > u | V t = v (cid:1) as follows: P π w (cid:16) S it > u, V jt = 0 (cid:17) ≤ P π w (cid:16) S it > u, V jt = 0 (cid:17) ≤ P π w (cid:16) S it > u, V jt = 0 (cid:17) So P π w (cid:16) S it > u (cid:17) P π w (cid:16) V jt = 0 (cid:17) ≤ P π w (cid:16) S it > u, V jt = 0 (cid:17) ≤ P π w (cid:16) S it > u (cid:17) P π w (cid:16) V jt = 0 (cid:17) (42)First, let bound P π w (cid:16) V jt = 0 (cid:17) = P v,v j =0 . Proposition 4.5.
For all i , w : βα M + β ≤ X v,v i =0 µ wv ≤ βα m + β (43)26 roof. Let recall from (13) the generator of the process of neurons ( V t ) only when w is fixedjump: B g ( v ) = N X i =1 βδ ( v i )[ g ( v − e i ) − g ( v )] + α i ( w, v ) δ ( v i ) [ g ( v + e i ) − g ( v )]Which gives for the invariant measure µ w = ( µ wv ) v ∈ I : X v ∈ I B g ( v ) µ wv = 0Thus, let i ∈ [[1 , N ]] with g i ( v ) = δ ( v i ), we get:0 = X v ∈ I B g i ( v ) µ wv = X v ∈ I µ wv N X j =1 βδ ( v j )[ g i ( v − e j ) − g i ( v )] + α j ( w, v ) δ ( v j ) [ g i ( v + e j ) − g i ( v )]= X v ∈ I,v i =0 µ wv N X j =1 βδ ( v j )[ g i ( v − e j ) − g i ( v )] + α j ( w, v ) δ ( v j ) [ g i ( v + e j ) − g i ( v )] X v ∈ I,v i =1 µ wv N X j =1 βδ ( v j )[ g i ( v − e j ) − g i ( v )] + α j ( w, v ) δ ( v j ) [ g i ( v + e j ) − g i ( v )]= X v ∈ I,v i =0 µ wv ( − α i ( w, v )) + X v ∈ I,v i =1 µ wv β Indeed, when v i = 0, for all j = i one has g i ( v − e j ) − g i ( v ) = g i ( v + e j ) − g i ( v ) = 1 − g i ( v + e i ) − g i ( v ) = 0 − − δ ( v i ) = 0. Doing the same reasoning with v i = 1 we get the lastline. Then, we also know that P v ∈ I,v i =0 µ wv + P v ∈ I,v i =1 µ wv = 1 so:0 = X v ∈ I,v i =0 µ wv ( − α i ( w, v )) + X v ∈ I,v i =1 µ wv β = X v ∈ I,v i =0 µ wv ( − α i ( w, v )) + (1 − X v ∈ I,v i =0 µ wv ) β = β − ( β X v ∈ I,v i =0 µ wv + X v ∈ I,v i =0 µ wv α i ( w, v ))Finally,( β + α m ) X v ∈ I,v i =0 µ wv ≤ ( β X v ∈ I,v i =0 µ wv + X v ∈ I,v i =0 µ wv α i ( w, v )) ≤ ( β + α M ) X v ∈ I,v i =0 µ wv We conclude that βα M + β ≤ X v,v i =0 µ wv ≤ βα m + β (44)We now focus our interest on computations of P π w (cid:16) S it > u (cid:17) and P π w (cid:16) S it > u (cid:17) .27t is interesting to note that the previous inequality holds for all t ≥
0. We already showedin theorem 3.1 that each of ( S it , V it ) and ( S it , V it ) possesses a unique invariant measure ( S ∞ , V ∞ )and ( S ∞ , V ∞ ). Therefore, as (42) is true for all t ≥
0, we get: P (cid:0) S ∞ > u (cid:1) P π w (cid:16) V jt = 0 (cid:17) ≤ P π w (cid:16) S it > u, V jt = 0 (cid:17) ≤ P (cid:0) S ∞ > u (cid:1) P π w (cid:16) V jt = 0 (cid:17) (45)We turn on the computing of measures of ( S ∞ , V ∞ ) and ( S ∞ , V ∞ ) from their Laplace trans-forms. To do so, we study the process ( S t , V t ) ∈ R + ×{ , } with the following generator( A , D ( A )): A f ( s, v ) = βδ ( v ) ( f ( s, − f ( s, αδ ( v ) ( f (0 , − f ( s, ∂ s f ( s, v ) (46) Proposition 4.6.
The invariant probability measure π ( ds, v ) of ( S t , V t ) is: π ( ds, v ) = αβα − β ( e − βs − e − αs ) dsµ ( v ) + βe − βs dsµ ( v ) (47) Proof.
As in (15), π can be written as: π ( ds, v ) = π ( ds ) ( v ) µ + π ( ds ) ( v ) µ . In this case, µ = βα + β and µ = αα + β . Moreover, it is an invariant measure if and only if E π [ A f ] = 0 , ∀ f ∈ D ( A ). Thanks to functions f well-chosen we get equations on Laplace transforms of π and π .Denoting e iλ ( s, v ) = e − λs δ v ( i ) we get: βα + β R R + Ae λ ( s, π ( ds ) + αα + β R R + Ae λ ( s, π ( ds ) = 0 βα + β R R + Ae λ ( s, π ( ds ) + αα + β R R + Ae λ ( s, π ( ds ) = 0We remind us that: A f ( s, v ) = βδ ( v ) ( f ( s, − f ( s, αδ ( v ) ( f (0 , − f ( s, ∂ s f ( s, v )Thus: βα + β R R + ( αe − λs + λe − λs ) π ( ds ) = αα + β R R + βe − λs π ( ds ) β R R + απ ( ds ) = α R R + ( β + λ ) e − λs π ( ds )But R R + π ( ds ) = R R + π ( ds ) = 1.Therefore: R R + ( α + λ ) e − λs π ( ds ) = α R R + e − λs π ( ds ) R R + e − λs π ( ds ) = β ( β + λ ) ⇒ π ( s ) = βe − βs So R R + e − λs π ( ds ) = αβ ( α + λ )( β + λ ) ⇒ π ( s ) = αβα − β ( e − βs − e − αs ) π ( s ) = βe − βs
28e finally check the measure is invariant, that is to say: E π [ A f ] = βα + β Z R + A f ( s, π ( ds ) + αα + β Z R + f ( s, π ( ds )= αβ ( α + β )( α − β ) Z R + ( − αf ( s,
0) + ∂ s f ( s, e − βs − e − αs ) ds + αβα + β Z R + ( β ( f ( s, − f ( s, ∂ s f ( s, e − βs ds = αβ ( α + β )( α − β ) "Z R + ( − αf ( s,
0) + ∂ s f ( s, e − βs ds + Z R + ( αf ( s, − ∂ s f ( s, e − αs ds + αβα + β "Z R + βf ( s, e − βs ds + Z R + ( − βf ( s,
1) + ∂ s f ( s, e − βs ds = (cid:20) αβ ( β − α )( α + β )( α − β ) + αβ ( α + β ) (cid:21) Z R + f ( s, e − βs ds = 0Moreover, P v ∈{ , } R R + π ( ds, v ) = βα + β R R + π ( ds ) + αα + β R R + π ( ds ) = 1 completes the proof.We can now go on the proof of proposition 4.4. Proof.
We replace α by α M for S ∞ and by α m for S ∞ : P (cid:0) S ∞ > u (cid:1) = Z ∞ u ( π α M ( ds,
0) + π α M ( ds, Z ∞ u (cid:18) α M βα M − β ( e − βs − e − α M s ) βα M + β + βe − βs α M α M + β (cid:19) ds = α M e − βu − β e − α M u α M − β And P (cid:0) S ∞ > u (cid:1) = Z ∞ u ( π α m ( ds,
0) + π α m ( ds, α m e − βu − β e − α m u α m − β So from (45): α M e − βu − β e − α M u α M − β P π w (cid:16) V jt = 0 (cid:17) ≤ P π w (cid:16) S it > u, V jt = 0 (cid:17) ≤ α m e − βu − β e − α m u α m − β P π w (cid:16) V jt = 0 (cid:17) Hence with (44): α M e − βu − β e − α M u α M − β βα M + β ≤ P π w (cid:16) S it > u, V jt = 0 (cid:17) ≤ α m e − βu − β e − α m u α m − β βα m + β (48)29rom this proposition we deduce bounds on the rates r + / − ij ( w ) = P v,v i =0 µ wv α i ( w, v ) R E p + / − ( s j ) π wv ( ds )for all p + and p − differentiable monotone. If functions p + and p − are decreasing: (cid:18) p + / − (0) + Z + ∞ (cid:18) α m e − βu − β e − α m u α m − β (cid:19) ( p + / − ) ( u ) du (cid:19) α m X v,v j =0 µ wv ≤ r + / − ij ( w ) ≤ (cid:18) p + / − (0) + Z + ∞ (cid:18) α M e − βu − β e − α M u α M − β (cid:19) ( p + / − ) ( u ) du (cid:19) α M X v,v j =0 µ wv We finally conclude with p + ( s ) = A + e − sτ + and p − ( s ) = A − e − sτ − : r + / − ij ( w ) ≥ (cid:18) A + / − − Z + ∞ (cid:18) α m e − βu − β e − α m u α m − β (cid:19) A + / − τ + / − e − uτ + / − du (cid:19) α m X v,v j =0 µ wv ≥ A + / − α m − α m τ + / − β +1 − β τ + / − α m +1 α m − β X v,v j =0 µ wv ≥ A + / − α m βα M + β − α m τ + / − β +1 − β τ + / − α m +1 ( α m − β ) To get the last inequality, we used the fact that 1 − α mτ + / − β +1 − β τ + / − αm +1 ( α m − β ) ≥ r + / − ij ( w ): r + / − ij ( w ) ≤ A + / − α M βα m + β − α M τ + / − β +1 − β τ + / − α M +1 ( α M − β ) But we showed that if r + ij ( w ) < r − ij ( w ) for all w we get that the limit process W t is recurrentpositive so it is the case if: A + α M α m + β − α M τ + β +1 − β τ + α M +1 ( α M − β ) < A − α m α M + β − α m τ − β +1 − β τ − α m +1 ( α m − β ) Finally we get the following simple condition: α M A + τ + ( α M τ + + βτ + + 1)( τ − α m + 1)( τ − β + 1) α m A − τ − ( α m τ − + βτ − + 1)( τ + α M + 1)( τ + β + 1) < p + and p − are not monotone, we can get a similar condition separating intervals where theyare increasing or decreasing. 30inally, previous results show that in our model weights can diverge although rates are boundedand we can give simple explicit condition on parameters for which they don’t diverge. This isthe first time, to our knowledge, that such a condition can be given without any homeostaticmechanisms added. Some analytical studied previously needed to add some constraints in orderto bound weights and obtained results depending on the spike correlation matrix they were notable to control [18, 29, 40]. With such a condition, our model becomes ready to use being awareof criticizes we present in the sixth section. As shown in the appendix A, we can find the Laplace transform of π , the invariant measure of thefast process. However, inverting it analytically for a network of N neurons, N too large, needs tooheavy computations. Hence, we apply our results in a network of 2 neurons and then simulate abigger network. But first let remind us the parameters present in our model. Even if simple, our model depends on many parameters. First, let’s recall the probability tojump: p + ( s ) = A + e − sτ + and p − ( s ) = A − e − sτ − Then let’s detail the function ξ i we used in our simulations. We used the same ξ i = ξ for allneurons, σ > θ > ξ i ( x ) = ξ ( x ) = S e − σ ( x − θ ) + α m Our parameters are then: (cid:15), A + , A − , τ − , τ + , σ, θ, β, α m and α M . Time of influence of a spike10ms so β ∼ .
1. Firing rates of neurons are bounded by $\alpha_m \sim 0.01$ and $\alpha_M \sim 1$. The STDP parameters lie in the following ranges: $\tau_{+/-} \in [1,\ \cdot\,]$ ms, $A_{+/-} \in [0, 1]$, $S = \alpha_M$, $\sigma = 0.\,$, $\theta = \ln(\alpha_M/\alpha_m - 1)/\sigma$ and $\epsilon \leq \cdot$. The functions $p_+$ and $p_-$ enable us to be close to the biological experiments [7]:
Figure 3: Bi & Poo pairing experiment reproduced on our model, compared to the real one. Parameters used here are: $A_+ = 1$, $A_- = 0.\,$, $\tau_- = 2\tau_+ = 34$ ms, as in [20].

5.2 First applications of our results

In the simple case of (3) we get:
$$r^+_{ij}(w) = \sum_{v:\, v^j = 0} \mu_v^w\, \alpha_j(w,v) \int_{E_2} p_+(s^i)\, \pi_v^w(ds) = \sum_{v \in I:\, v^j = 0} \mu_v^w\, \alpha_j(w,v) \int_{E_2} A_+ e^{-s^i/\tau_+}\, \pi_v^w(ds) = \sum_{v \in I:\, v^j = 0} \mu_v^w\, \alpha_j(w,v)\, A_+\, \mathcal{L}\{\pi_v^w\}\big(0, ..., 0, \tfrac{1}{\tau_+}, 0, ..., 0\big)$$
where the entry $1/\tau_+$ is in position $i$, and similarly
$$r^-_{ij}(w) = \sum_{v \in I:\, v^i = 0} \mu_v^w\, \alpha_i(w,v)\, A_-\, \mathcal{L}\{\pi_v^w\}\big(0, ..., 0, \tfrac{1}{\tau_-}, 0, ..., 0\big)$$
with the entry $1/\tau_-$ in position $j$.

One weight free and 2 neurons:
In this example of one weight free and 2 neurons, we get a birth and death process with w fixed, w =( w , w ). We can find the explicit stationnary distribution of the weights in that case.From previous computations we have: w → w + ∆ w : r + ( w ) = A + (cid:20) µ w α ( w, L ( π w ) (cid:18) τ + , (cid:19) + µ w α ( w, L ( π w ) (cid:18) τ + , (cid:19)(cid:21) w → w − ∆ w : r − ( w ) = ]∆ w, + ∞ [ ( w ) A − (cid:20) µ w α ( w, L ( π w ) (cid:18) , τ − (cid:19) + µ w α ( w, L ( π w ) (cid:18) , τ − (cid:19)(cid:21) Hence, it is similar to a birth process on N with 0 reflecting. In order to study the conditions fortransience and recurrence, we use the following theorem which gather some results of the fourfirst sections of [28] with its notations. Theorem 5.1.
Suppose $X_t$ is a birth and death process on $\mathbb{N}$ with birth rates $\lambda_k > 0$ for all $k \in \mathbb{N}$ and death rates $\mu_k > 0$ for all $k \in \mathbb{N}^*$, with $\mu_0 = 0$. Then [28] gives the following classification:
(a) The process is ergodic if and only if $\sum_{i=1}^{+\infty} \prod_{j=1}^{i} \frac{\mu_j}{\lambda_j} = +\infty$ and $\sum_{i=1}^{+\infty} \prod_{j=1}^{i} \frac{\lambda_{j-1}}{\mu_j} < +\infty$. In this case, there exists a unique invariant measure $\theta$ given by:
$$\theta(i) = \theta(0) \prod_{j=1}^{i} \frac{\lambda_{j-1}}{\mu_j}, \quad \text{with} \quad \theta(0) = \frac{1}{1 + \sum_{i=1}^{+\infty} \prod_{j=1}^{i} \frac{\lambda_{j-1}}{\mu_j}}$$
(b) The process is null recurrent if and only if $\sum_{i=1}^{+\infty} \prod_{j=1}^{i} \frac{\mu_j}{\lambda_j} = +\infty$ and $\sum_{i=1}^{+\infty} \prod_{j=1}^{i} \frac{\lambda_{j-1}}{\mu_j} = +\infty$.
(c) The process is transient if and only if $\sum_{i=1}^{+\infty} \prod_{j=1}^{i} \frac{\mu_j}{\lambda_j} < +\infty$ and $\sum_{i=1}^{+\infty} \prod_{j=1}^{i} \frac{\lambda_{j-1}}{\mu_j} = +\infty$.
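This classification can be probed numerically by truncating the two series; here is a minimal sketch, where the truncation level and the example rates are arbitrary choices (products are accumulated in log space to avoid overflow).

```python
import numpy as np

def classify_birth_death(lam, mu, n_terms=500):
    """Evaluate truncations of the two series of Theorem 5.1 for a birth-death
    chain with birth rates lam(k), k >= 0, and death rates mu(k), k >= 1."""
    log_rec = np.cumsum([np.log(mu(j) / lam(j)) for j in range(1, n_terms + 1)])
    log_erg = np.cumsum([np.log(lam(j - 1) / mu(j)) for j in range(1, n_terms + 1)])
    return np.exp(log_rec).sum(), np.exp(log_erg).sum()

# Example: rates converging to lambda = 1 and mu = 2 (ergodic by Corollary 5.2):
# the first partial sum keeps growing while the second one stabilizes.
s_rec, s_erg = classify_birth_death(lambda k: 1.0 + 1.0 / (k + 1),
                                    lambda k: 2.0 - 1.0 / (k + 1))
print(s_rec, s_erg)
```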
In order to apply this theorem to our example, we prove the following corollary.
Corollary 5.2.
Suppose the assumptions of Theorem 5.1 hold. Suppose moreover that $\lambda_k$ and $\mu_k$ converge respectively to $\lambda$ and $\mu$ as $k \to +\infty$. Then $X_t$ is ergodic iff $0 < \lambda < \mu$, and transient if $\lambda > \mu > 0$.

Proof. Let us prove only the ergodic case, as the proof for the transient one is similar. Suppose that $0 < \lambda < \mu$. Then, for all $\epsilon > 0$, there exists $k_\epsilon \in \mathbb{N}$ such that for all $j > k_\epsilon$, $\frac{\lambda_{j-1}}{\mu_j} \leq \frac{\lambda}{\mu} + \epsilon = l_\epsilon$ and $\frac{\lambda_j}{\mu_j} \leq l_\epsilon$. Taking $\epsilon$ small enough that $l_\epsilon < 1$, the series $\sum_i \prod_{j=1}^i \frac{\lambda_{j-1}}{\mu_j}$ is dominated by a convergent geometric series, while $\sum_i \prod_{j=1}^i \frac{\mu_j}{\lambda_j}$ diverges; Theorem 5.1 then gives ergodicity.

Remark. The case $\lambda = \mu > 0$ is more complex, as it will depend on the way $(\lambda_k)$ and $(\mu_k)$ converge.

We come back to our example.
Proposition 5.3. r + ( w , w ) and r − ( w , w ) are strictly positive and converge respectivelyto R + ( α M , w ) > and R − ( α M , w ) > when w → ∞ .Proof. First, α ( w,
00) = α ( w,
00) = ξ (0) = α m and α ( w,
01) = ξ ( w ) don’t depend on w .Second, x
7→ L π wv (0 , x ), x
7→ L π wv ( x,
0) and µ wv depend on w only through α ( w,
10) = ξ ( w ).But lim w → + ∞ ξ ( w ) = α M so ~µ w converges to ~µ solution of (50) with α = α ( w,
01) = ξ ( w )and α = lim w → + ∞ α ( w,
10) = lim w → + ∞ ξ ( w ) = α M . Concerning x
7→ L π wv (0 , x ),we can fix x = x and call f v ( ξ ( w )) = L π wv (0 , x ). Computations of 54 show that for all v ∈ { , } , 0 < f v ( y ) < ∞ for all y ∈ [ α m , α M ] and is continuous as a positive boundedrational fraction. Hence, w f v ( ξ ( w )) is continuous by composition. We conclude thatlim w → + ∞ L π wv (0 , x ) = f v ( α M ) and:lim w →∞ r + ( w ) = lim w →∞ A + µ w α m L ( π w ) (cid:18) τ + , (cid:19)| {z } −→ w →∞ f ( α M ) + µ w α M L ( π w ) (cid:18) τ + , (cid:19)| {z } −→ w →∞ f ( α M ) = A + ( µ α m f ( α M ) + µ α M f ( α M ))= R + It is similar for x
7→ L π wv ( x, w →∞ r − ( w ) = R − Hence, by corollary 5.2, R + < R − ensures the process w t admits a unique invariant measure θ : θ ( i ∆ w ) = θ (∆ w ) i Y j =2 r + (( j − w ) r − ( j ∆ w )With θ (∆ w ) = 11 + P + ∞ i =1 Q ij =2 r + (( j − w ) r − ( j ∆ w )
33e then wonder when this condition holds and we did simulations with parameters in therange of biological ones. Practically, explosion of the weight reflects the fact that LTP winsover LTD. Some studies has tried to tackle question of the relationship between STDP curveparameters, τ + / − and A + / − , and the balance of LTP and LTD. They showed that when theintegral of the STDP window is enough biased toward depression the system is intrinsicallystable [25, 29, 30]. In our case, we can find examples for which the "enough" is important. Forinstance with the following parameters, we get an explosion of w when depression wins againstpotentiation: β = 0 . , α m = 0 . , α M = 1 , τ + = 17 ms, τ − = 34 ms, A − = 0 . , A + = 0 . , (cid:15) = 10 − We took p + / − (cid:15) = (cid:15)p + / − . When (cid:15) is small enough ( ≤ − ) simulations agrees with analyticalresults. That is to say w diverges when w <
25 and doesn’t diverge when w > + p - Figure 4: Plot of r + ( w ) − r − ( w ) (left) and plot of p + , p − on the same graph(right) time(ms) w time(ms) w Figure 5: Evolution of the weight w when w is fixed at (left) and 30 (right) and (cid:15) = 10 − Remark . We can even get divergence when p + ( s ) < p − ( s ) for all s ∈ R + xample with 2 excitatory neurons Let’s apply this result in a network of 2 excitatory neurons. First, we denote w = ( w , w ) sincethe diagonal elements are null. We are interested in the sign of the limit of sup k w k≥ r ( r ij + ( w ) − r ij − ( w ))which is equivalent to sup k w k≥ r ( η ( w )) ij (see 3), when r → ∞ , in order to use corollary 4.2 tostudy stability of weights. We first show this limit exists and then compute it to determineparameters for which we don’t have weights divergence.In order to show the existence of the limit, we first recall that w is only present in neurons’ rates.Thus, thanks to the sigmoid, these rates are bounded and when one of the components of w goes to ∞ , rates in which it plays a role tends to the upper bound of the sigmoid, α M , since allneurons are excitatory ones. For instance: α ( w,
01) = ξ ( w ) −→ w →∞ α M Therefore, we can separate the space R + × R + as following the intuition given by the graph of( η ( w )) for instance: �� �� �� ���������� ������������������������������ ��� � �� Figure 6: η ( w ) when A + = A − = 0 , and τ − = 2 τ + = 34 ms So the separation looks like this:As showed in the appendix A, we can compute the Laplace transforms L{ π wv } ( λ , λ ) forfixed w . If we introduce the dependence on w , it will be in rate terms such as α = α ( w, α = α ( w,
01) = ξ ( w ), α = α ( w,
10) = ξ ( w ), α = α = ξ (0) = α m and α = α = α = α = β . So we can rewrite η as a function of α ( w ) and α ( w ). Therefore, when r → ∞ , η ( α ( w ) , α ( w )) → η ( α M , α M ) on B . The sup of η becomes sup α m ≤ α ≤ α M η ( α M , α ) on A and sup α m ≤ α ≤ α M η ( α, α M ) on A . We conclude withlim r ∞ sup k w k≥ r η = max (cid:18) sup α m ≤ α ≤ α M η ( α, α M ) , sup α m ≤ α ≤ α M η ( α M , α ) (cid:19) We can compute numerically this limit in function of A − and τ − : t a u m o i n s - . - . . . . . . Figure 7: sup η when k w k → ∞ for A + = 0 , and τ + = 17 ms
36e note that we need a really small value of A + compared to the one of A − to satisfythe condition of positive recurrence. However, such a difference doesn’t seem to be needed insimulations. Indeed, we can have numerically positive recurrence for any parameters A + between0 and 1. Remark . The condition for null recurrence given in [37] result in η ij = 0 for all i, j in ourcase. Condition for transience leads to the exact opposite of the one of corollary 4.2: lim r → + ∞ sup w ∈ Σ , k w k≥ r ( η ( w )) ij ≥ , ∀ i, j And ∃ ( k, l ) , j = i s.t. lim r → + ∞ sup w ∈ Σ , k w k≥ r ( η ( w )) kl > It would be interesting to try to have a larger range of values of parameters for which we are inthe null recurrence case, and we need another plasticity rule to do so (with the condition of [37]).
10 neurons:
When depression is really higher than potentiation, weights seem to converge to a stationarydistribution and have such trajectories: 37owever, initial weights can play an important role. With parameters A + =0 . , A − =0 . , β =1 , α m =0 . , α M = 0 . (cid:15) =0 .
1, we have no divergence in short time with low initial weights and selection of oneweight from big initial ones, W i10 = :The selected weight is different from one trajectory to another. Remark . We have chosen 10 neurons for plotting constraints. Thousands of them are easilysimulated. This kind of phenomenon is called winner take all dynamics in [33] where they prevent themusing iSTDP. The reason to avoid them is that it prevents new assemblies to be formed.38
Discussion
Mathematical results
Based on a well known neural network model, we added plasticity in order to get insight onthe combined neurons - weights dynamics. We could analyse plasticity on the slow time scaleof weights dynamics compared to the neurons ones, thus producing a simplified model. Thislatter gives the weights dynamics under the stationary distribution of the fast process and isa continuous time Markov jump process on the state space of weights with non homogeneousin space jump rates. Such processes are hard to deal with and current results are given in [37].Moreover, even if we could prove existence and uniqueness of the invariant measure of the fastprocess, we were not able to express it explicitly. Thus, it is even harder to analyse the limitmodel. However, we can compute its Laplace transform in small networks, we didn’t try morethan 2 but it should not be too hard for more. The problem will nevertheless become quicklyharder as it consists in inverting a 2 N square matrix for a given w and as soon as w change, thiscomputation need to be done again. Here, making use of bounds on jump rates of neurons, weare able to give conditions of stability, but we emphasize it is only sufficient ones. To know if weneed additive terms, depending on weights for instance or just hard bounds, in order to avoiddivergence in the context of biological parameters is still under study. Simulation results
For small networks (2 neurons) and in the case of a STDP rule following the classical STDPcurve [7], we computed Laplace transform of the stationary distribution. We then gave explicitexpression of jump rates for the limit process which enabled us to study the weight dynamicsmore precisely. We even show that the divergence of weights is possible even when integral ofthe learning window is biased towards synaptic depression, even when depression curve is alwaysstronger than depression ( p + ( s ) < p − ( s ) for all s ). Such a result is not intuitive and led us to findconditions on parameters for which such a divergence doesn’t occur. Simulations with more thantwo neurons showed the winner take all phenomenon takes place. A calibration of parametersis needed to test more characteristics of the model: how does it respond to high frequence, lowfrequence? Does it enable bidirectional connections?... Limitations of our model and future work
We are aware our neuron model is far from the reality of neurons. It is really simple in order tomake the study of plasticity easier. Some questions raise when we try to match it with biology.For instance, what does β represents? Many things at the same time: the time one neuron willinfluence others, the time of a spike as it will not be able to spike again until the moment itcomes back to the state 0. Neurons are generally described through their membrane potentialwhich has no link to our model. Then, observations such as potential depolarisation is needed tolead to potentiation cannot be checked or modelled. Moreover, the way their rate of jump from 0to 1 depends on weights is not really clear and needs to be clarify, maybe there is a need to adddelay as it is done in other papers [32].While STDP seems good to keep in memory stimuli, even spontaneously after such inputs [33],it needs to forget somehow. This seems not be the case in our model. Such a phenomenonis possible for instance under homeostatic mechanisms [33, 45, 48, 49]. STDP plays the role ofadditive synaptic scaling as when a weight increases, let say w , then w decreases. It is not agood thing according to [45], as they observed multiplicative synaptic scaling in their experiments.This is understandable as it is too specific and seems not sufficient. It is not useless if you think39s information supported by w is the exact opposite of the one supported by w , it enablesneurons " to win time ". So there is a need to add homeostasis to our model. Metaplasticityor plastic inhibitory (iSTDP) neurons are the most used. Indeed, we studied only a network ofexcitatory neurons. Adding non plastic inhibitory neurons will just decrease the minimum offiring rates of neurons. However, plastic inhibitory neurons could prevent from divergence ofweights. Finally, w ii = 0 is imposed but it could be interesting to use it as an homeostatic factor,decreasing the firing rate when it is to high and increasing it when it is weak. Relation to previous work
Analysis using the separation of time scale between weights dynamics and the network one hasbeen done in many other articles [11, 16, 18, 29, 30, 32, 40]. They modelled neurons as Poisson,except for [40], and derived a similar equation for weights on their slow time scale. This equationmainly depends on the cross correlation matrix which is not easy to handle with. They use Taylorexpansion and Fourier transform to approximate it for their simulations. In our model, such amatrix is hidden in the invariant measure of the fast process. Concerning the stability of weights,a similar result was found in [30] where "a stable fixed point of the output rate is possible if theintegral over the learning window is sufficiently negative." As, in their model, rates are linear inweights, stability of rates is equivalent to weights stability. Even if it is not a necessary condition,we could give an idea of how much negative the integral over the learning window needs to be inorder to have stability.
Conclusion
We propose a new view on STDP models. In contrast with tiny deterministic jumps of weights,weights have some weak probability to make a "big" jump. Thus, instead of continuous, weightsare discrete [2, 44]. Associated to the inter arrival time of spikes and the network state, we get aMarkov process. We simplified it thanks to a separation of time scale and found simple conditionsof positive recurrence. This work opens a new framework of study for plasticity which we hope itwill give rise to more mathematical results on plasticity in the following.40 nnexesA Dimension 2 for uniqueness
After giving the generator ( B , D ( B )) in 2 dimensions, we then compute the equation satisfies bythe Laplace transform of a given stationary distribution for ( S t , V t ). GeneratorProposition A.1. D ( B ) = { f ∈ C ub ( E ) and ( ∂ s + ∂ s ) f ∈ C ub ( E ) } and ∀ f ∈ D ( B ) : B f ( s, (0 , α ( f (( s , , (0 , − f ( s, (0 , α ( f ((0 , s ) , (1 , − f ( s, (0 , P ∂ s i f ( s, (0 , B f ( s, (0 , α ( f ((0 , s ) , (1 , − f ( s, (0 , β ( f ( s, (0 , − f ( s, (0 , P ∂ s i f ( s, (0 , B f ( s, (1 , α ( f (( s , , (1 , − f ( s, (1 , β ( f ( s, (0 , − f ( s, (1 , P ∂ s i f ( s, (1 , B f ( s, (1 , β ( f ( s, (0 , − f ( s, (1 , β ( f ( s, (1 , − f ( s, (1 , P ∂ s i f ( s, (1 , Or in a shorter version: B f ( s, v ) = (( ∂ s + ∂ s ) f )( x )+ α (1 − v ,v ) v [ f (( s v , s ) , (1 − v , v )) − f ( x )]+ α ( v , − v ) v [ f (( s , s v ) , ( v , − v )) − f ( x )] Proof.
Let f ∈ D ( B ), then by definition lim t → E x ( f ( X t )) − f ( x ) t exists. Let’s compute it. We knowthat each element v ∈ I has only two neighbors (in the sens it can only reach two different states).We note α v v the rates to reach the neighbor v’. We do the computations for v = (0 , E ( s, (0 , ( f ( S t , V t )) = P ( s, (0 , ( V t = v ) f (( s + t, s + t ) , (0 , P ( s, (0 , ( V t = (1 , f ((0 , s + t ) , (1 , P ( s, (0 , ( V t = (0 , f (( s + t, s + t ) , (0 , o ( t )= (cid:16) − (cid:0) α + α (cid:1) t e − ( α + α ) t (cid:17) f (( s + t, s + t ) , (0 , α t e − α t f ((0 , s + t ) , (1 , α t e − α t f (( s + t, s + t ) , (0 , o ( t )= f (( s + t ) , (0 , α ( f ((0 , s + t ) , (1 , − f ( s + t, (0 , β ( f ( s + t, (0 , − f ( s + t, (0 , o ( t )Then we obtain: B f ( x ) = lim t → E ( s, (0 , ( f ( X t )) − f ( x ) t = α ( f ((0 , s ) , (1 , − f ( s, (0 , β ( f ( s, (0 , − f ( s, (0 , , . ∇ s f ( s, (0 , B f ( x ) as in the proposition ∀ x ∈ E , and D ( B ) ⊆ { f ∈ C ub ( E ) and ( ∂ s + ∂ s ) f ∈ C ub ( E ) } . In order to have the other inclusion, we take f ∈ { g, g ∈ C ub ( E ) and ( ∂ s + ∂ s ) g ∈ C ub ( E ) } , then we compute for x = ( s, (0 , ∈ E : r x ( t ) = (cid:12)(cid:12)(cid:12)(cid:12) E x ( f ( X t )) − f ( x ) t − (cid:2) α ( f ((0 , s ) , (1 , − f ( s, (0 , β ( f ( s, (0 , − f ( s, (0 , , . ∇ s f ( s, (0 , (cid:3)(cid:12)(cid:12)(cid:12)(cid:12) t → , . ∇ s f ∈ C ub ( E ) (cid:12)(cid:12)(cid:12)(cid:12) f ( s + t, (0 , − f ( s, (0 , t − (1 , . ∇ s f ( s, (0 , (cid:12)(cid:12)(cid:12)(cid:12) ≤ t Z t | (1 , . ∇ s f ( s + u, (0 , − (1 , . ∇ s f ( s, (0 , | du ≤ sup ≤ u ≤ t | (1 , . ∇ s f ( s + u, (0 , − (1 , . ∇ s f ( s, (0 , |≤ (cid:15) If t small enough.Hence, lim t → E ( s, (0 , ( f ( X t )) − f ( x ) t exists. As we can do exactly the same computations for all x ∈ E , we deduce that { f ∈ C ub ( E ) and ( ∂ s + ∂ s ) f ∈ C ub ( E ) } ⊆ D ( B ). Thus, we have theequality wanted.We can see here the need to chose C ub ( E ) instead of C b ( E ) for instance. Indeed, the uniformcontinuity enable us to conclude on the domain of B and on another hand it is the biggestsubspace of L ∞ ( E ) on which the derivative is the generator of a C -semigroup. If we had chosen C ( E ) = { f unctions vanishing at ∞} , we see immediately the semigroup associated to ourprocess will not map C ( E ) into itself. T t f has no reason to vanish at ∞ . C ub ( E ) seems tobe the space that suits. Moreover, thanks to the portmanteau lemma, the knowledge of thesemigroup on C ub ( E ) characterizes the law of the process. We can then use the definition 3.10to search the Laplace transforms of invariant measures. Laplace transform
First, we show we can write any invariant measure of the process in the form π ( s, v ) = P k ∈ I δ v k ( v ) µ wk π k ( s ) where ( µ w , ..., µ wN ) is the only invariant measure of the jump process ( V t )and π k is a measure on B ( R ). Then, we prove that if the process ( X t ) t ≥ has at least oneinvariant measure of probability π , then it is unique.It is interesting to look at the form of invariant measures for the following. Indeed, as ( V t )doesn’t depend on ( S t ), we can study its dynamic and deduce a nice decomposition of thestationary distribution of ( X t ). Proposition A.2.
The jump process alone ( V t ) t ≥ has a unique invariant probability measure ~µ = ( µ w , µ w , µ w , µ w ) T . Moreover, µ wv > , ∀ v ∈ I , and it satisfies: Q~µ = − α − α β β α − α − β βα − α − β β α α − β µ w µ w µ w µ w = 0 (50) Proof.
Indeed, as each neuron is connected to each other, ( V t ) t ≥ is irreducible. As its state spaceis finite, the process is also positive recurrent so has a unique invariant probability measure µ w by theorem1.7.7 in [39]. 42oreover, as each state is positive recurrent, µ wv > , ∀ v ∈ I .The matrix Q is the matrix of transition rates (Q-matrix) of ( V t ) t ≥ . With 1 = (0 , , , , , , , Q = ( q ij ) ≤ i,j ≤ we have Q has in the proposition. As µ w isinvariant, it belongs to the kernel of Q, which is (50), Theorem 3.5.5 in [39].From this result, we deduce that ∀ k ∈ I, R R π ( ds, k ) = µ wk . Therefore, we define π k as π k ( A ) = R A π ( ds,k ) µ wk , ∀ A ∈ B ( R ). Hence, π ( s, v ) = P k ∈ I δ v k ( v ) µ wk π k ( s ).Now, we previously showed the process ( X t ) t ≥ has at least one invariant probability measureon E , let π be one of them and let’s compute its Laplace transform to show the followingproposition: Proposition A.3.
Assume the process ( X t ) t ≥ has at least one invariant measure of probability π . Then it is unique.Proof. We will show that all invariant measure of probability has the same Laplace transform andas the later characterizes it, see for instance Theorem 4.3 in [26], there only exists one invariantmeasure of probability.We can write π as π ( A, v ) = P k ∈ I π k ( A ) ⊗ µ wk δ k ( v ) , ∀ A ∈ B ( R ), with π k ( A ) = π ( A, k )( µ wk ) − .To simplify computations, we will denote by L π be the vector of Laplace transforms of π k . So ∀ λ , λ ∈ R + : L π ( λ , λ ) = L π ( λ , λ ) L π ( λ , λ ) L π ( λ , λ ) L π ( λ , λ ) where ∀ v ∈ I, L π v ( λ , λ ) = Z R e − ( λ s + λ s ) π v ( ds )Just a remark, ∀ v ∈ I L π v (0 ,
0) = Z R π v ( ds ) = ( µ wv ) − Z R π ( ds, v ) = 1As we want to compute the Laplace transform of π which is in fact ( λ , λ ) P v ∈ I µ wv L π v ( λ , λ ),let’s use the following test functions, with λ = ( λ , λ ) and ∀ k ∈ I : e kλ ( s, v ) = e − ( λ s + λ s ) δ k ( v )By definition 3.10 of an invariant measure we get ∀ v ∈ I : X k ∈ I Z R +2 B e vλ ( s, k ) µ wk π k ( ds ) = 0 (51)43e then compute B e kλ ( s, v ): B e λ ( s, (0 , − α − α − ( λ + λ )) e − λ s − λ s B e λ ( s, (0 , βe − λ s − λ s B e λ ( s, (1 , βe − λ s − λ s B e λ ( s, (1 , B e λ ( s, (0 , α e − λ s B e λ ( s, (0 , − α − β − ( λ + λ )) e − λ s − λ s B e λ ( s, (1 , B e λ ( s, (1 , βe − λ s − λ s B e λ ( s, (0 , α e − λ s B e λ ( s, (0 , B e λ ( s, (1 , − α − β − ( λ + λ )) e − λ s − λ s B e λ ( s, (1 , βe − λ s − λ s B e λ ( s, (0 , B e λ ( s, (0 , α e − λ s B e λ ( s, (1 , α e − λ s B e λ ( s, (1 , − β − ( λ + λ )) e − λ s − λ s So with (51) and v = (0 ,
0) for instance: X k ∈ I Z R +2 B e λ ( s, k ) µ wk π k ( ds ) = 0 ⇔ (cid:0) − α − α − ( λ + λ ) (cid:1) µ w L π ( λ , λ ) + βµ w L π + βµ w L π = 0After computations for all v ∈ I we get: M ( λ , λ ) L π ( λ , λ ) L π ( λ , λ ) L π ( λ , λ ) L π ( λ , λ ) = − α L π ( λ , µ w1 µ w2 − α L π (0 , λ ) µ w1 µ w3 − α L π (0 , λ ) µ w2 µ w4 − α L π ( λ , µ w3 µ w4 (52)44ith: M ( λ , λ ) = − α − α − λ − λ β µ w2 µ w1 β µ w3 µ w1 − α − β − λ − λ β µ w4 µ w2 − α − β − λ − λ β µ w4 µ w3 − β − λ − λ As we have L π ( λ ,
0) = L π ( λ , L π ( λ , L π ( λ , L π ( λ , and L π (0 , λ ) = L π (0 , λ ) L π (0 , λ ) L π (0 , λ ) L π (0 , λ ) .Then we can get L π ( λ ,
0) and L π (0 , λ ) evaluating (52) in λ = 0 and λ = 0: M ( λ , L π ( λ ,
0) = − α L π ( λ , µ w1 µ w2 − α L π (0 , µ w1 µ w3 − α L π (0 , µ w2 µ w4 − α L π ( λ , µ w3 µ w4 As ∀ v ∈ I , L π v (0 ,
0) = R R +2 π v ( ds , ds ) = 1 so: M ( λ , L π ( λ ,
0) = − α µ w µ w − α µ w µ w L π ( λ ,
0) + − α µ w µ w − α µ w µ w And M (0 , λ ) L π (0 , λ ) = − α µ w µ w − α µ w µ w L π ( λ ,
0) + − α µ w µ w − α µ w µ w Putting terms in T i ( λ i ) in matrices marked M i ( λ i ) we get: M ( λ ) L π ( λ ,
0) = − α µ w1 µ w2 − α µ w3 µ w4 and M ( λ ) L π (0 , λ ) = − α µ w1 µ w3 − α µ w2 µ w4 (53)45ith: M ( λ ) = − α − α − λ β µ µ β µ µ α µ µ − α − β − λ β µ µ − α − β − λ β µ µ α µ µ − β − λ And M ( λ ) = − α − α − λ β µ µ β µ µ − α − β − λ β µ µ α µ µ − α − β − λ β µ µ α µ µ − β − λ As a triangular superior matrix with diagonal elements strictly positive, M is invertible. Moreover, M and M are invertible as diagonally dominant matrices whenever ( λ , λ ) ∈ R ∗ + × R ∗ + :First line of (50) gives: (cid:0) α + α (cid:1) µ w = βµ w + βµ w So ∀ λ ∈ R ∗ + : | M ( λ ) | − X j =2 | M j ( λ ) | = λ > ∀ λ ∈ R ∗ + : | M ( λ ) | − X j =2 | M j ( λ ) | = λ > | M ( λ ) | − X j =3 | M j ( λ ) | = α µ w + λ > | M ( λ ) | − X j =4 | M j ( λ ) | = α µ w + λ > M which shows M ( λ ) et M ( λ ) are diagonally dominant matricesso they are invertible ∀ ( λ , λ ) ∈ R ∗ + × R ∗ + . Hence, if π is an invariant measure for ( X t ) t ≥ , ∀ ( λ , λ ) ∈ R ∗ + × R ∗ + : L π (0 ,
0) =
By (53): L π ( λ ,
0) = ( M ( λ )) − − α µ w1 µ w2 − α µ w3 µ w4 and L π (0 , λ ) = ( M ( λ )) − − α µ w1 µ w3 − α µ w2 µ w4 (54)46y (52) L π ( λ , λ ) = ( M ( λ , λ )) − − α L π ( λ , µ w1 µ w2 − α L π (0 , λ ) µ w1 µ w3 − α L π (0 , λ ) µ w2 µ w4 − α L π ( λ , µ w3 µ w4 We conclude using the fact the Laplace transform of a law determines it, so π is unique.47 eferences [1] L. F. Abbott and S. B. Nelson. Synaptic plasticity: taming the beast. Nature neuroscience ,3:1178–1183, 2000.[2] D. J. Amit and S. Fusi. Learning in neural networks with material synapses.
NeuralComputation , 6(5):957–982, 1994.[3] P. A. Appleby and T. Elliott. Synaptic and temporal ensemble interpretation of spike-timing-dependent plasticity.
Neural computation , 17(11):2316–2336, 2005.[4] P. A. Appleby and T. Elliott. Stable competitive dynamics emerge from multispike interactionsin a stochastic model of spike-timing-dependent plasticity.
Neural computation , 18(10):2414–2464, 2006.[5] M. Benayoun, J. D. Cowan, W. van Drongelen, and E. Wallace. Avalanches in a StochasticModel of Spiking Neurons.
PLoS Computational Biology , 6(7):e1000846, July 2010.[6] M. K. Benna and S. Fusi. Computational principles of synaptic memory consolidation.
Nature Neuroscience , 19(12):1697–1706, Oct. 2016.[7] G.-q. Bi and M.-m. Poo. Synaptic modifications in cultured hippocampal neurons: dependenceon spike timing, synaptic strength, and postsynaptic cell type.
Journal of neuroscience ,18(24):10464–10472, 1998.[8] E. L. Bienenstock, L. N. Cooper, and P. W. Munro. Theory for the development of neuronselectivity: orientation specificity and binocular interaction in visual cortex. Technical report,DTIC Document, 1981.[9] P. C. Bressloff. Metastable states and quasicycles in a stochastic Wilson-Cowan model ofneuronal population dynamics.
Physical Review E , 82(5), Nov. 2010.[10] N. Brunel. Is cortical connectivity optimized for storing information?
Nature Neuroscience ,19(5):749–755, Apr. 2016.[11] A. N. Burkitt, H. Meffin, and D. B. Grayden. Spike-timing-dependent plasticity: therelationship to rate-based learning for models with weight dynamics determined by a stablefixed point.
Neural Computation , 16(5):885–940, 2004.[12] C. Clopath, L. Büsing, E. Vasilaki, and W. Gerstner. Connectivity reflects coding: A modelof voltage-based spike-timing-dependent-plasticity with homeostasis.
Nature , 2009.[13] M. H. Davis. Piecewise-deterministic Markov processes: A general class of non-diffusionstochastic models.
Journal of the Royal Statistical Society. Series B (Methodological) , pages353–388, 1984.[14] M. H. A. Davis.
Markov models and optimization . Monographs on statistics and appliedprobability. Chapman & Hall, London ; New York, 1st ed edition, 1993.[15] K. Fox and M. Stryker. Integrating Hebbian and homeostatic plasticity: introduction.
Philosophical Transactions of the Royal Society B: Biological Sciences , 372(1715):20160413,Mar. 2017. 4816] M. N. Galtier and G. Wainrib. A Biological Gradient Descent for Prediction Through aCombination of STDP and Homeostatic Plasticity.
Neural Computation , 25(11):2815–2832,Nov. 2013.[17] W. Gerstner and W. M. Kistler.
Spiking neuron models: single neurons, populations, plasticity .Cambridge University Press, Cambridge, U.K. ; New York, 2002.[18] M. Gilson, A. N. Burkitt, D. B. Grayden, D. A. Thomas, and J. L. van Hemmen. Emergence ofnetwork structure due to spike-timing-dependent plasticity in recurrent neuronal networks. I.Input selectivity–strengthening correlated input pathways.
Biological Cybernetics , 101(2):81–102, Aug. 2009.[19] M. Gilson, A. N. Burkitt, D. B. Grayden, D. A. Thomas, and J. L. van Hemmen. Emergenceof network structure due to spike-timing-dependent plasticity in recurrent neuronal networksV: self-organization schemes and weight dependence.
Biological Cybernetics , 103(5):365–386,Nov. 2010.[20] M. Gilson, T. Fukai, and A. N. Burkitt. Spectral Analysis of Input Spike Trains by Spike-Timing-Dependent Plasticity.
PLOS Computational Biology , 8(7):e1002584, 2012.[21] M. Graupner and N. Brunel. Calcium-based plasticity model explains sensitivity of synapticchanges to spike pattern, rate, and dendritic location.
PNAS , 109(52):21551–21552, 2012.[22] M. Hairer. Convergence of Markov processes. lecture notes , 2010.[23] D. Hebb.
The Organization of Behavior . Wiley & Sons. Wiley, New York, 1st ed edition,1949.[24] E. M. Izhikevich.
Dynamical systems in neuroscience: the geometry of excitability and bursting .Computational neuroscience. MIT Press, Cambridge, Mass, 2007. OCLC: ocm65400606.[25] E. M. Izhikevich and N. S. Desai. Relating stdp to bcm.
Neural computation , 15(7):1511–1523,2003.[26] O. Kallenberg.
Foundations of modern probability . Springer Science & Business Media, 2006.[27] H.-W. Kang and T. G. Kurtz. Separation of time-scales and model reduction for stochasticreaction networks.
The Annals of Applied Probability , 23(2):529–583, Apr. 2013.[28] S. Karlin and J. McGregor. The classification of birth and death processes.
Transactions ofthe American Mathematical Society , 86(2):366–400, 1957.[29] R. Kempter, W. Gerstner, and J. L. Van Hemmen. Hebbian learning and spiking neurons.
Physical Review E , 59(4):4498, 1999.[30] R. Kempter, W. Gerstner, and J. L. Van Hemmen. Intrinsic stabilization of output rates byspike-based Hebbian learning.
Neural computation , 13(12):2709–2741, 2001.[31] T. G. Kurtz. Averaging for martingale problems and stochastic approximation. In
AppliedStochastic Analysis , pages 186–209. Springer, 1992.[32] G. Lajoie, N. I. Krouchev, J. F. Kalaska, A. L. Fairhall, and E. E. Fetz. Correlation-basedmodel of artificially induced plasticity in motor cortex by a bidirectional brain-computerinterface.
PLOS Computational Biology , 13(2):e1005343, 2017.4933] A. Litwin-Kumar and B. Doiron. Formation and maintenance of neuronal assemblies throughsynaptic plasticity.
Nature communications , 5:5319, 2014.[34] H. Markram. A history of spike-timing-dependent plasticity.
Frontiers in Synaptic Neuro-science , 3, 2011.[35] H. Markram, W. Gerstner, and P. J. Sjöström. Spike-Timing-Dependent Plasticity: AComprehensive Overview.
Frontiers in Synaptic Neuroscience , 4, 2012.[36] H. Markram, J. Lübke, M. Frotscher, and B. Sakmann. Regulation of synaptic efficacy bycoincidence of postsynaptic aps and epsps.
Science , 275(5297):213–215, 1997.[37] M. Menshikov, S. Popov, and A. Wade.
Non-homogeneous Random Walks: LyapunovFunction Methods for Near-Critical Stochastic Systems , volume 209. Cambridge UniversityPress, 2016.[38] A. Morrison, M. Diesmann, and W. Gerstner. Phenomenological models of synaptic plasticitybased on spike timing.
Biological Cybernetics , 98(6):459–478, June 2008.[39] J. R. Norris.
Markov chains . Number 2. Cambridge university press, 1998.[40] G. K. Ocker, A. Litwin-Kumar, and B. Doiron. Self-organization of microcircuits in networksof spiking neurons with plastic synapses.
PLoS Comput Biol , 11(8):e1004458, 2015.[41] D. H. O’Connor, G. M. Wittenberg, and S. S.-H. Wang. Graded bidirectional synapticplasticity is composed of switch-like unitary events.
Proceedings of the National Academy ofSciences of the United States of America , 102(27):9679–9684, 2005.[42] E. Pechersky, G. Via, and A. Yambartsev. Stochastic Ising model with plastic interactions.
Statistics & Probability Letters , 123:100–106, Apr. 2017.[43] P. E. Protter.
Stochastic Integration and Differential Equations: Version 2.1 . Number 21 inStochastic Modelling and Applied Probability. Springer, Berlin, 2. ed. , corr. 3rd pr edition,2010. OCLC: 837782643.[44] C. Ribrault, K. Sekimoto, and A. Triller. From the stochasticity of molecular processes tothe variability of synaptic transmission.
Nature Reviews Neuroscience , 12(7):375–387, June2011.[45] G. G. Turrigiano. The dialectic of Hebb and homeostasis.
Philosophical Transactions of theRoyal Society B: Biological Sciences , 372(1715):20160258, Mar. 2017.[46] R. L. Tweedie. Invariant Measures for Markov Chains with no Irreducibility Assumptions.
Journal of Applied Probability , 25:275–285, 1988.[47] P. Yger and M. Gilson. Models of Metaplasticity: A Review of Concepts.
Frontiers inComputational Neuroscience , 9, Nov. 2015.[48] F. Zenke, W. Gerstner, and S. Ganguli. The temporal paradox of Hebbian learning andhomeostatic plasticity.
Current Opinion in Neurobiology , 43:166–176, Apr. 2017.[49] F. Zenke, G. Hennequin, and W. Gerstner. Synaptic Plasticity in Neural Networks NeedsHomeostasis with a Fast Rate Detector.