Improved algorithm for neuronal ensemble inference by Monte Carlo method
Shun Kimura and Koujin Takeda
Department of Mechanical Systems Engineering, Graduate School of Science and Engineering, Ibaraki University
Correspondence: [email protected]
Abstract.
Neuronal ensemble inference is one of the significant problems in the study of biological neural networks. Various methods have been proposed for ensemble inference from activity data taken experimentally. Here we focus on a Bayesian inference approach for ensembles with a generative model, which was proposed in recent work. However, this method requires a large computational cost, and the result sometimes gets stuck in a bad local maximum solution of Bayesian inference. In this work, we give an improved Bayesian inference algorithm to address these problems. We modify the ensemble generation rule in the Markov chain Monte Carlo method, and introduce the idea of simulated annealing for hyperparameter control. We also compare the performance of ensemble inference between our algorithm and the original one.
Keywords: neural network, Bayesian inference, Markov chain Monte Carlo method, simulated annealing, neuronal dynamics and structure inference
1 Introduction

In recent studies of biological neural networks, advanced recording technologies such as calcium imaging or functional magnetic resonance imaging enable us to obtain neuronal activity data from thousands of neurons simultaneously. Such activity data will reveal features of the neural network. For instance, neurons in the same neuronal ensemble tend to fire synchronously [1][2]; therefore, identification of ensembles is significant for understanding the whole neural network structure and its dynamical behavior. In fact, there are some studies on neural network structure using ensemble information [3][4].

Several statistical methods are known for ensemble identification in activity data. For instance, principal component analysis or singular value decomposition can identify ensembles [5]. Their advantage is that they can effectively reduce the dimension and volume of activity data, which are very large in general. On the other hand, these methods require prior knowledge such as the number of ensembles. There is an alternative approach using graph theory, where neuronal activities are expressed as nodes in a graph.
From such a graph, ensemble activity can be extracted by a graph clustering method such as spectral clustering [6], although this method is basically applied to static data and neglects dynamical behavior.

One strategy to overcome the problems mentioned above is Bayesian modeling of neuronal activity. In recent work, a generative model of ensemble activity was proposed within a Bayesian inference framework [7]. Using a Markov chain Monte Carlo (MCMC) method, this model enables us to infer neuronal ensembles and their dynamical behavior. However, it requires a large computational cost in MCMC, and the result sometimes gets stuck in a bad local maximum solution of Bayesian inference.

In this work, to reduce the computational cost and to mitigate the bad-local-maximum problem, we propose an improved algorithm for neuronal ensemble identification. First, we change the update rule in MCMC for controlling the number of ensembles. Second, we introduce the idea of simulated annealing for hyperparameter control. We also compare our algorithm with the original one in terms of ensemble identification using synthetic data, and discuss the advantage of our method.
2 Method

2.1 Bayesian inference framework

Here we outline the framework of Bayesian inference. We basically follow the notation in the original paper [7]; see it for details. Each neuron has a label $i \in \{1, 2, \ldots, N\}$, where $N$ is the total number of neurons. The variable $k \in \{1, 2, \ldots, M\}$ is the time step, and $M$ is the size of the time frame. The variable $\mu \in \{1, 2, \ldots, A\}$ is the label of a neuronal ensemble, and $A$ is the total number of ensembles. The $i$th neuron has a membership label to an ensemble, $t_i \in \{1, 2, \ldots, A\}$, and an activity variable $s_{ik} \in \{0, 1\}$ at time $k$. The $\mu$th ensemble has an ensemble activity variable $\omega_{k\mu} \in \{0, 1\}$ at time $k$. The state 1 of a binary variable means an active (firing) neuron or neuronal ensemble, and the state 0 means inactive.

With these variables, we give the generative model for neuronal activity as the conditional joint probability
\begin{align}
P(\boldsymbol{t}, \boldsymbol{\omega}, \boldsymbol{s} \mid \boldsymbol{n}, \boldsymbol{p}, \boldsymbol{\lambda})
\propto \left( \prod_{i=1}^{N} n_{t_i} \right)
\cdot \left( \prod_{\mu=1}^{A} \prod_{k=1}^{M} p_\mu^{\omega_{k\mu}} (1 - p_\mu)^{1 - \omega_{k\mu}} \right)
\cdot \left( \prod_{i=1}^{N} \prod_{k=1}^{M} [\lambda_{t_i}(\omega_{k t_i})]^{s_{ik}} [1 - \lambda_{t_i}(\omega_{k t_i})]^{1 - s_{ik}} \right),
\tag{1}
\end{align}
where a boldface letter represents a set of variables (e.g. $\boldsymbol{t} = \{t_1, t_2, \ldots, t_N\}$). The $\mu$th ensemble has an activity rate $p_\mu$, which describes the activity of the ensemble. It also has an assign probability $n_\mu$, which describes how many neurons belong to this ensemble. The conditional activity rate $\lambda_{t_i}$ of the $i$th neuron for a given ensemble activity $\omega_{k t_i}$ is defined as
\begin{align}
\lambda_{t_i}(\omega_{k t_i}) = P(s_{ik} = 1 \mid \omega_{k t_i}) \quad \text{for } \omega_{k t_i} \in \{0, 1\}.
\tag{2}
\end{align}
Fig. 1. Graphical representation of the relations among variables/parameters in the Bayesian inference model: (A) the full model; (B) the model after integrating out $\boldsymbol{n}$, $\boldsymbol{p}$, $\boldsymbol{\lambda}$.

Accordingly, the inactivity rate is expressed as $1 - \lambda_{t_i}(\omega_{k t_i}) = P(s_{ik} = 0 \mid \omega_{k t_i})$. The parameter $\lambda$ describes coherence or incoherence (= noise) between neuronal activity and ensemble activity. We assume that the priors of the ensemble activity rate $p$ and the conditional activity rate $\lambda$ are beta distributions (denoted by Beta), while the prior of the assign probability $n$ is a Dirichlet distribution (denoted by Dir), namely
\begin{align}
P(p_\mu) &= \mathrm{Beta}\bigl(\alpha^{(p)}_\mu, \beta^{(p)}_\mu\bigr), \tag{3} \\
P(\lambda_\mu(z)) &= \mathrm{Beta}\bigl(\alpha^{(\lambda)}_{z,\mu}, \beta^{(\lambda)}_{z,\mu}\bigr), \tag{4} \\
P(n_1, \ldots, n_A) &= \mathrm{Dir}\bigl(\alpha^{(n)}_1, \ldots, \alpha^{(n)}_A\bigr), \tag{5}
\end{align}
where $\alpha^{(p)}_\mu, \beta^{(p)}_\mu, \alpha^{(\lambda)}_{z,\mu}, \beta^{(\lambda)}_{z,\mu}, \alpha^{(n)}_\mu$ ($z \in \{0, 1\}$, $\mu \in \{1, 2, \ldots, A\}$) are hyperparameters. The relations among variables/parameters in this generative model are represented graphically in Fig. 1A.

For improvement of inference accuracy, we analytically integrate out the set of model parameters $\{\boldsymbol{n}, \boldsymbol{p}, \boldsymbol{\lambda}\}$. Integration over these parameters leads to the joint probability
\begin{align}
P(\boldsymbol{t}, \boldsymbol{\omega}, \boldsymbol{s})
&= \int d\boldsymbol{n}\, d\boldsymbol{p}\, d\boldsymbol{\lambda}\,
P(\boldsymbol{t}, \boldsymbol{\omega}, \boldsymbol{s} \mid \boldsymbol{n}, \boldsymbol{p}, \boldsymbol{\lambda})\,
P(\boldsymbol{n}, \boldsymbol{p}, \boldsymbol{\lambda}) \nonumber \\
&\propto \int d\boldsymbol{n}\, d\boldsymbol{p}\, d\boldsymbol{\lambda}\,
\left( \prod_{i=1}^{N} n_{t_i} \right) \mathrm{Dir}\bigl(\alpha^{(n)}_1, \ldots, \alpha^{(n)}_A\bigr)
\cdot \left( \prod_{\mu=1}^{A} \prod_{k=1}^{M} p_\mu^{\omega_{k\mu}} (1 - p_\mu)^{1 - \omega_{k\mu}} \right)
\left( \prod_{\mu=1}^{A} \mathrm{Beta}\bigl(\alpha^{(p)}_\mu, \beta^{(p)}_\mu\bigr) \right) \nonumber \\
&\qquad \cdot \left( \prod_{i=1}^{N} \prod_{k=1}^{M} [\lambda_{t_i}(\omega_{k t_i})]^{s_{ik}} [1 - \lambda_{t_i}(\omega_{k t_i})]^{1 - s_{ik}} \right)
\prod_{\mu=1}^{A} \prod_{z \in \{0,1\}} \mathrm{Beta}\bigl(\alpha^{(\lambda)}_{z,\mu}, \beta^{(\lambda)}_{z,\mu}\bigr) \nonumber \\
&= \frac{B\bigl(\alpha^{(n)}_1 + G_1, \alpha^{(n)}_2 + G_2, \ldots, \alpha^{(n)}_A + G_A\bigr)}{B\bigl(\alpha^{(n)}_1, \alpha^{(n)}_2, \ldots, \alpha^{(n)}_A\bigr)}
\cdot \prod_{\mu=1}^{A} \frac{B(H_\mu, \bar{H}_\mu)}{B\bigl(\alpha^{(p)}_\mu, \beta^{(p)}_\mu\bigr)}
\prod_{z \in \{0,1\}} \frac{B\bigl(T^{z1}_\mu, T^{z0}_\mu\bigr)}{B\bigl(\alpha^{(\lambda)}_{z,\mu}, \beta^{(\lambda)}_{z,\mu}\bigr)}.
\tag{6}
\end{align}
Note that $B(\cdot, \cdot)$ is the beta function, and $B$ is defined by gamma functions as
\begin{align}
B(x_1, \ldots, x_A) \equiv \frac{\prod_{k=1}^{A} \Gamma(x_k)}{\Gamma\bigl(\sum_{k=1}^{A} x_k\bigr)}. \tag{7}
\end{align}
In addition, we introduce several variables,
\begin{align}
G_\mu &= \sum_{i=1}^{N} \delta_{\mu, t_i}, \quad
H_\mu = \alpha^{(p)}_\mu + \sum_{k=1}^{M} \omega_{k\mu}, \quad
\bar{H}_\mu = \beta^{(p)}_\mu + \sum_{k=1}^{M} (1 - \omega_{k\mu}), \nonumber \\
T^{z1}_\mu &= \alpha^{(\lambda)}_{z,\mu} + \sum_{k=1}^{M} \sum_{i \in \boldsymbol{\mu}} \delta_{z, \omega_{k\mu}} \delta_{1, s_{ik}}, \quad
T^{z0}_\mu = \beta^{(\lambda)}_{z,\mu} + \sum_{k=1}^{M} \sum_{i \in \boldsymbol{\mu}} \delta_{z, \omega_{k\mu}} \delta_{0, s_{ik}},
\tag{8}
\end{align}
where $\delta_{ij}$ is the Kronecker delta, boldface $\boldsymbol{\mu}$ is the set of neurons in the $\mu$th ensemble, and $z \in \{0, 1\}$. All variables are defined for the $\mu$th ensemble, and their meanings are as follows: $G_\mu$ counts the number of neurons in the ensemble; $H_\mu$ and $\bar{H}_\mu$ indicate the frequencies of active and inactive states, respectively; $T^{z1}_\mu$ and $T^{z0}_\mu$ measure coherence between ensemble activity and neuronal activity when the superscript numbers agree, and incoherence (= noise) when they differ. The relations among variables/parameters in the model after integrating out $\{\boldsymbol{n}, \boldsymbol{p}, \boldsymbol{\lambda}\}$ are represented in Fig. 1B.

The posterior $P(\boldsymbol{t}, \boldsymbol{\omega} \mid \boldsymbol{s})$ can be constructed from this model. With this posterior, we can infer the membership $\boldsymbol{t}$ and the ensemble activity $\boldsymbol{\omega}$ from the neuronal activity variable $\boldsymbol{s}$, which is obtained experimentally.
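For concreteness, the collapsed probability (6) is evaluated in practice through the count statistics (8) and log-gamma functions. The following is a minimal sketch of such an evaluation; it is our own illustration, not the authors' code, all names are ours, and labels are taken 0-based.

```python
import numpy as np
from scipy.special import gammaln

def log_multi_beta(x):
    """Log of the (multivariate) beta function B(x_1, ..., x_A) of equation (7)."""
    x = np.asarray(x, dtype=float)
    return np.sum(gammaln(x)) - gammaln(np.sum(x))

def log_joint(t, omega, s, a_n, a_p, b_p, a_lam, b_lam):
    """Collapsed log P(t, omega, s) of equation (6), up to an additive constant.

    t:            (N,) membership labels in {0, ..., A-1} (0-based here)
    omega:        (M, A) binary ensemble activities omega_{k mu}
    s:            (N, M) binary neuronal activities s_{ik}
    a_n:          (A,) Dirichlet hyperparameters alpha^(n)
    a_p, b_p:     (A,) beta hyperparameters for p
    a_lam, b_lam: (2, A) beta hyperparameters for lambda, rows indexed by z
    """
    t = np.asarray(t)
    A = omega.shape[1]
    G = np.bincount(t, minlength=A)              # G_mu: neurons per ensemble
    logp = log_multi_beta(a_n + G) - log_multi_beta(a_n)
    for mu in range(A):
        H = a_p[mu] + omega[:, mu].sum()         # H_mu: active frames
        Hbar = b_p[mu] + (1 - omega[:, mu]).sum()  # bar H_mu: inactive frames
        logp += log_multi_beta([H, Hbar]) - log_multi_beta([a_p[mu], b_p[mu]])
        members = np.where(t == mu)[0]           # the set of neurons in ensemble mu
        for z in (0, 1):
            frames = np.where(omega[:, mu] == z)[0]  # frames with ensemble state z
            block = s[np.ix_(members, frames)]
            T1 = a_lam[z, mu] + block.sum()                  # T^{z1}_mu of equation (8)
            T0 = b_lam[z, mu] + block.size - block.sum()     # T^{z0}_mu of equation (8)
            logp += log_multi_beta([T1, T0]) - log_multi_beta([a_lam[z, mu], b_lam[z, mu]])
    return logp
```

In an actual MCMC implementation, only the factors of the ensembles touched by a move need to be recomputed, which is what makes collapsed sampling affordable.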
2.2 MCMC algorithm

With the scheme mentioned above, we can obtain neuronal ensembles and ensemble activities by Bayesian inference. However, the number of possible states over all neurons is huge; therefore, direct Bayesian inference is infeasible. To cope with this computational cost problem, we employ MCMC to evaluate the maximum of the posterior distribution.

In the previous work [7], collapsed Gibbs sampling is used, where the number of ensembles $A$ is unknown and must also be evaluated. For inference of $A$, a Dirichlet process (DP) [8] is introduced, and $A$ can vary until the DP converges. However, we need to start with a large $A$ as the initial condition of the DP. If we start with a small $A$, we get stuck at a solution with very few ensembles or without ensemble structure, which is supposed to be a bad local maximum solution of Bayesian inference. Hence, this method still requires a large computational cost at the early stage of MCMC for successful inference, because the cost is proportional to $A$ at a given MCMC stage (see Algorithm 1).

To cope with this problem, we propose a novel method. The differences from the previous work are summarized as follows:
- In our method, when a new ensemble is created to increase $A$ in the DP, multiple neurons can move to the new ensemble simultaneously, while only a single neuron can move to a new ensemble in the original. Such simultaneous moves make a new ensemble hard to vanish.
- We apply the idea of simulated annealing to the transient probability to a new ensemble in the DP.

As shown later, our method enables us to infer an appropriate ensemble structure without starting from a large $A$.

We give the details of our algorithm in the following. In our MCMC, we first update the ensemble activity $\boldsymbol{\omega}$, then update the ensemble membership labels $\boldsymbol{t}$ by collapsed Gibbs sampling. This process is the same as in the original. Next, we employ the DP in order to increase or decrease the number of ensembles. Suppose that we have $A$ ensembles at an intermediate stage of MCMC. The destination ensemble of the $i$th neuron after the MCMC update, denoted by $t^*_i$, is drawn from the probability
\begin{align}
q_i(t^*_i) =
\begin{cases}
\dfrac{G^{(\backslash i)}_{t^*_i}}{q^{[\gamma]}_\alpha + N - 1}, & t^*_i = 1, \ldots, A, \\[2ex]
\dfrac{q^{[\gamma]}_\alpha}{q^{[\gamma]}_\alpha + N - 1}, & t^*_i = A + 1.
\end{cases}
\tag{9}
\end{align}
The backslash $\backslash$ denotes the removal of a specific element, so $G^{(\backslash i)}_{t^*_i}$ means the number of neurons in the $t^*_i$th ensemble, where the $i$th neuron is not counted. Note that $\sum_{\mu=1}^{A} G^{(\backslash i)}_\mu = N - 1$, so $q_i(t^*_i)$ satisfies $\sum_{t^*_i=1}^{A+1} q_i(t^*_i) = 1$. The parameter $q^{[\gamma]}_\alpha$ is proportional to the transient probability to the new $(A+1)$th ensemble. As mentioned before, we apply the idea of simulated annealing to the DP. In our method, the transient parameter at the $\gamma$th MCMC stage, $q^{[\gamma]}_\alpha$, decays exponentially as
\begin{align}
q^{[\gamma]}_\alpha = q^{[0]}_\alpha e^{-\gamma/\tau}, \tag{10}
\end{align}
where $\tau$ is the decay constant. The idea of equation (10) is that the number of ensembles $A$ is changed frequently at the early MCMC stage for exploring an appropriate $A$, while the change is suppressed at the late stage for convergence. If a new ensemble is accepted in the DP, we must generate a new ensemble activity $\boldsymbol{\omega}$ and hyperparameters for the new ensemble. We give the hyperparameters of the new $(A+1)$th ensemble as the arithmetic averages of the already-existing hyperparameters, because a new hyperparameter should have the same scale as the others for appropriate convergence of MCMC:
\begin{align}
\alpha^{(p)}_{A+1} &= \frac{1}{A} \sum_{\mu=1}^{A} \alpha^{(p)}_\mu, \quad
\beta^{(p)}_{A+1} = \frac{1}{A} \sum_{\mu=1}^{A} \beta^{(p)}_\mu, \nonumber \\
\alpha^{(\lambda)}_{z,A+1} &= \frac{1}{A} \sum_{\mu=1}^{A} \alpha^{(\lambda)}_{z,\mu}, \quad
\beta^{(\lambda)}_{z,A+1} = \frac{1}{A} \sum_{\mu=1}^{A} \beta^{(\lambda)}_{z,\mu} \quad \text{for } z \in \{0, 1\}, \nonumber \\
\alpha^{(n)}_{A+1} &= \frac{1}{A} \sum_{\mu=1}^{A} \alpha^{(n)}_\mu.
\tag{11}
\end{align}
In addition, if some already-existing ensembles become empty (= no neuron) after the update, we delete those ensembles and their hyperparameters.

Now we consider the case where the membership labels $\boldsymbol{t} = \{t_1, t_2, \ldots, t_N\}$ may be updated to new labels $\boldsymbol{t}^* = \{t^*_1, t^*_2, \ldots, t^*_N\}$ in MCMC. In this update, the ratio of probabilities between $\boldsymbol{t}$ and $\boldsymbol{t}^*$, which is necessary for the acceptance rule of the MCMC update, is calculated from equation (6) as
\begin{align}
\frac{P(\boldsymbol{t}^*, \boldsymbol{\omega}, \boldsymbol{s})}{P(\boldsymbol{t}, \boldsymbol{\omega}, \boldsymbol{s})}
= \frac{\prod_{\mu=1}^{A} \Bigl\{ \Gamma\bigl(G_\mu + \alpha^{(n)}_\mu\bigr) \prod_{z \in \{0,1\}} B\bigl(T^{z1}_\mu, T^{z0}_\mu\bigr) \Bigr\}\Big|_{\boldsymbol{t} = \boldsymbol{t}^*}}{\prod_{\mu=1}^{A} \Bigl\{ \Gamma\bigl(G_\mu + \alpha^{(n)}_\mu\bigr) \prod_{z \in \{0,1\}} B\bigl(T^{z1}_\mu, T^{z0}_\mu\bigr) \Bigr\}\Big|_{\boldsymbol{t} = \boldsymbol{t}}}
\cdot \left[
\frac{\Gamma\bigl(\sum_{\mu=1}^{A+1} \alpha^{(n)}_\mu\bigr)}{\Gamma\bigl(\sum_{\mu=1}^{A} \alpha^{(n)}_\mu\bigr)}
\cdot \frac{\Gamma\bigl(G_{A+1} + \alpha^{(n)}_{A+1}\bigr)}{\Gamma\bigl(\alpha^{(n)}_{A+1}\bigr)}
\cdot \frac{B\bigl(H_{A+1}, \bar{H}_{A+1}\bigr) \prod_{z \in \{0,1\}} B\bigl(T^{z1}_{A+1}, T^{z0}_{A+1}\bigr)}{B\bigl(\alpha^{(p)}_{A+1}, \beta^{(p)}_{A+1}\bigr) \prod_{z \in \{0,1\}} B\bigl(\alpha^{(\lambda)}_{z,A+1}, \beta^{(\lambda)}_{z,A+1}\bigr)}
\right].
\tag{12}
\end{align}
The factor in the square bracket is the contribution from the neurons moving to the new $(A+1)$th ensemble. If no neuron moves to the new ensemble, the factor in the square bracket reduces to unity because the denominator and the numerator cancel out.

For the Metropolis-Hastings update rule, we also need to define the proposal distribution from the $t_i$th ensemble to the $t^*_i$th, $Q_i(t^*_i \mid t_i)$, and that of the reverse process, $Q_i(t_i \mid t^*_i)$, for the $i$th neuron. These probabilities are calculated so as to satisfy the detailed balance condition:
\begin{align}
Q_i(t^*_i \mid t_i) =
\begin{cases}
\dfrac{G^{(\backslash i)}_{t^*_i}}{q^{[\gamma]}_\alpha + N - 1}, & t^*_i = 1, \ldots, A, \\[2ex]
\dfrac{q^{[\gamma]}_\alpha}{q^{[\gamma]}_\alpha + N - 1}, & t^*_i = A + 1,
\end{cases}
\tag{13}
\end{align}
\begin{align}
Q_i(t_i \mid t^*_i) = \frac{G^{(\backslash i)}_{t_i}}{N - 1}
\tag{14}
\end{align}
for the $\gamma$th MCMC stage, where equation (13) is the same as (9). If multiple neurons move simultaneously, we consider the product of the probabilities above over all moving neurons. As a result, the acceptance rate from the membership labels $\boldsymbol{t}$ to $\boldsymbol{t}^*$ is written as
\begin{align}
a(\boldsymbol{t}^*, \boldsymbol{t}) = \min \left\{ 1,\,
\frac{P(\boldsymbol{t}^*, \boldsymbol{\omega}, \boldsymbol{s})}{P(\boldsymbol{t}, \boldsymbol{\omega}, \boldsymbol{s})}
\frac{Q(\boldsymbol{t} \mid \boldsymbol{t}^*)}{Q(\boldsymbol{t}^* \mid \boldsymbol{t})} \right\},
\tag{15}
\end{align}
where
\begin{align}
Q(\boldsymbol{t} \mid \boldsymbol{t}^*) = \prod_{i=1}^{N} Q_i(t_i \mid t^*_i), \quad
Q(\boldsymbol{t}^* \mid \boldsymbol{t}) = \prod_{i=1}^{N} Q_i(t^*_i \mid t_i).
\tag{16}
\end{align}
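As an illustration of the proposal and acceptance steps of equations (9) and (13)-(15), a single-neuron move might be sketched as follows. This is our own hedged sketch: `log_joint` stands for any evaluation of equation (6) (such as the one sketched in Sect. 2.1), `q_alpha` is the annealed transient parameter $q^{[\gamma]}_\alpha$, and labels are 0-based.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_destination(i, t, A, q_alpha):
    """Draw a destination ensemble for neuron i from equation (9).

    Returns the proposed label (the value A encodes the new, (A+1)th
    ensemble, since labels are 0-based here) and the forward proposal
    probability Q_i(t*_i | t_i) of equation (13).
    """
    N = len(t)
    G = np.bincount(np.delete(t, i), minlength=A)    # G^{(\i)}: counts without neuron i
    probs = np.append(G, q_alpha) / (q_alpha + N - 1)
    t_star = rng.choice(A + 1, p=probs)
    return t_star, probs[t_star]

def accept_move(i, t, q_fwd, log_joint_new, log_joint_old, A):
    """Metropolis-Hastings acceptance of equation (15) for a single-neuron move.

    (For simplicity this sketch ignores the corner case where neuron i was
    alone in its ensemble, for which the reverse probability would be zero.)
    """
    N = len(t)
    G = np.bincount(np.delete(t, i), minlength=A)
    q_rev = G[t[i]] / (N - 1)                        # Q_i(t_i | t*_i), equation (14)
    log_ratio = log_joint_new - log_joint_old + np.log(q_rev) - np.log(q_fwd)
    return np.log(rng.random()) < min(0.0, log_ratio)
```

For a simultaneous move of several neurons to the new ensemble, the forward and reverse probabilities are multiplied over all moving neurons, as in equation (16).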
Finally, we update the hyperparameters for $\{\boldsymbol{p}, \boldsymbol{\lambda}, \boldsymbol{n}\}$ of the remaining ensembles. For the hyperparameter update, we introduce a learning rate $\varepsilon^{[\gamma]}$, where $\gamma$ is the stage of the MCMC update as in (10), to control the effect of simulated annealing. Here we use a sigmoid function for $\varepsilon^{[\gamma]}$ because it is bounded and smooth,
\begin{align}
\varepsilon^{[\gamma]} = \frac{1}{1 + e^{-\gamma/\tau}}. \tag{17}
\end{align}
The decay constant $\tau$ is the same as in (10); note that we do not need to introduce an additional hyperparameter for the learning rate. Following the update rule in the original work [7], the hyperparameters are updated with the learning rate $\varepsilon^{[\gamma]}$ as
\begin{align}
\tilde{\alpha}^{(p)}_\mu &= \alpha^{(p)}_\mu + \varepsilon^{[\gamma]} \sum_{k=1}^{M} \omega_{k\mu}, \quad
\tilde{\beta}^{(p)}_\mu = \beta^{(p)}_\mu + \varepsilon^{[\gamma]} \sum_{k=1}^{M} (1 - \omega_{k\mu}), \nonumber \\
\tilde{\alpha}^{(\lambda)}_{z,\mu} &= \alpha^{(\lambda)}_{z,\mu} + \varepsilon^{[\gamma]} \sum_{k=1}^{M} \sum_{i \in \boldsymbol{\mu}} \delta_{z,\omega_{k\mu}} \delta_{1,s_{ik}}, \quad
\tilde{\beta}^{(\lambda)}_{z,\mu} = \beta^{(\lambda)}_{z,\mu} + \varepsilon^{[\gamma]} \sum_{k=1}^{M} \sum_{i \in \boldsymbol{\mu}} \delta_{z,\omega_{k\mu}} \delta_{0,s_{ik}}, \nonumber \\
\tilde{\alpha}^{(n)}_\mu &= \alpha^{(n)}_\mu + \varepsilon^{[\gamma]} G_\mu,
\tag{18}
\end{align}
where the tilde denotes the updated hyperparameter. By introducing the learning rate, the hyperparameter update is suppressed at the early stage of MCMC, when the number of ensembles $A$ changes frequently instead.

To conclude this section, we summarize our MCMC process as the pseudocode in Algorithm 1 below, where the symbol $\boldsymbol{\omega}_{\backslash k\mu}$ denotes the set of parameters $\boldsymbol{\omega}$ except $\omega_{k\mu}$, as used in collapsed Gibbs sampling.
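Before the full pseudocode, the two annealing schedules (10) and (17), together with one representative line of the hyperparameter update (18), can be written compactly. The sketch below is our own illustration; the default values are taken from the experimental setting in Sect. 3.2.

```python
import numpy as np

def q_alpha_schedule(gamma, q0=100.0, tau=10.0):
    """Transient parameter of equation (10): exponential decay over MCMC stages."""
    return q0 * np.exp(-gamma / tau)

def learning_rate(gamma, tau=10.0):
    """Learning rate of equation (17): a sigmoid rising from 1/2 toward 1."""
    return 1.0 / (1.0 + np.exp(-gamma / tau))

def update_p_hyperparameters(gamma, a_p, b_p, omega, tau=10.0):
    """First line of equation (18): annealed update of the (p) hyperparameters.

    omega: (M, A) binary ensemble activities. The (lambda) and (n)
    hyperparameters are updated analogously from their own counts.
    """
    eps = learning_rate(gamma, tau)
    return a_p + eps * omega.sum(axis=0), b_p + eps * (1 - omega).sum(axis=0)
```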
Algorithm 1 Inference of ensemble activities and the number of ensembles
    initialize ω and t
    while the number of ensembles A has not converged do
        for each ensemble µ ∈ [1, A], k ∈ [1, M] do
            draw ω_{kµ} ∼ P(ω_{kµ} = 1 | t, ω_{\kµ}, s)
        end for
        for each neuron i ∈ [1, N] do
            draw destination ensemble t*_i ∼ q_i(t*_i) in equation (9)
            if t*_i = A + 1 then
                A → A + 1
                give new hyperparameters as in equation (11)
            end if
        end for
        for each neuron i ∈ [1, N] do
            accept the move t → t* with probability a(t*, t) in equation (15)
        end for
        for each ensemble µ ∈ [1, A] do
            if G_µ = 0 then
                delete the µth ensemble and its hyperparameters
            end if
        end for
        update hyperparameters as in equation (18)
    end while

3 Numerical experiment

3.1 Synthetic data generation

Before discussing the utility of our algorithm, we summarize how to generate synthetic neuronal activity data for our numerical experiment. In our generative model, neuronal activities are closely related to ensemble activities as in equation (2), and the relation between neuronal and ensemble activities is characterized by the conditional activity rate $\lambda$. Hence, to generate synthetic data, we first divide all neurons into ensembles. Next, we generate ensemble activity data $\boldsymbol{\omega}$ based on the ensemble activity parameter $\boldsymbol{p}$. Finally, we determine the activity of each neuron $\boldsymbol{s}$ by the conditional activity rate $\boldsymbol{\lambda}$.

The algorithm for synthetic data generation is summarized as the pseudocode in Algorithm 2. Details can also be found in the original work [7].
Algorithm 2 Generation of synthetic neuronal activity data

    set all ω and s to 0
    for each ensemble µ ∈ [1, A], k ∈ [1, M] do
        draw ω_{kµ} ∼ P(ω_{kµ})
    end for
    for each neuron i ∈ [1, N] do
        deal the ith neuron to an ensemble
    end for
    for each neuron i ∈ [1, N], k ∈ [1, M] do
        if ω_{k t_i} = 1 then
            draw s_{ik} ∼ P(s_{ik} | ω_{k t_i} = 1)
        else
            draw s_{ik} ∼ P(s_{ik} | ω_{k t_i} = 0)
        end if
    end for

Table 1. The condition of synthetic data generation

    parameter                   value
    the number of neurons       N = 100
    the number of ensembles     A = 10
    ensemble activity rate      p_µ = 0. (∀µ)
    conditional activity rate   λ_µ(0) = 0., λ_µ(1) = 0. (∀µ)
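For concreteness, a generator in the spirit of Algorithm 2 and Table 1 might look as follows. This is our own sketch; note that the decimal digits of the Table 1 rates are lost in this copy, so the default values of `p`, `lam0`, and `lam1` below are placeholders, not the paper's values.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_synthetic(N=100, A=10, M=1000, p=0.3, lam0=0.1, lam1=0.8):
    """Synthetic activity in the spirit of Algorithm 2 (assumes A divides N).

    p    = ensemble activity rate p_mu (placeholder value),
    lam0 = lambda_mu(0), lam1 = lambda_mu(1) (placeholder values).
    """
    t = np.repeat(np.arange(A), N // A)           # equal-size ensembles, as in Fig. 2A
    omega = (rng.random((M, A)) < p).astype(int)  # ensemble activity omega_{k mu}
    lam = np.where(omega[:, t] == 1, lam1, lam0)  # P(s_ik = 1 | omega_{k t_i}), shape (M, N)
    s = (rng.random((M, N)) < lam).astype(int)
    return t, omega, s.T                          # s returned as (N, M): neurons x time

t_true, omega_true, s = generate_synthetic()
```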
3.2 Result of ensemble inference

To validate our method, we conduct a numerical experiment on neuronal ensemble inference. In the experiment, we first generate synthetic data by Algorithm 2, and then extract the information of ensembles by Algorithm 1. We illustrate a sample activity matrix $\boldsymbol{s}$ obtained by Algorithm 2 in Fig. 2A. This sample has 10 ensembles and 100 neurons, where each ensemble has exactly 10 neurons. The vertical axis shows the neuron label, sorted by the neuronal membership label $\boldsymbol{t}$. We can easily see the structure of the 10 ensembles; note, however, that it cannot be seen if the neuron labels are randomly permuted. The parameters for synthetic data generation are summarized in Table 1. All ensembles/neurons are generated with the same ensemble activity rate $p$ and conditional activity rate $\lambda$.

For ensemble inference, we use the data in Fig. 2A as the input activity $\boldsymbol{s}$. As shown in Algorithm 1, we repeatedly update the ensemble activity $\boldsymbol{\omega}$, the ensemble membership labels $\boldsymbol{t}$, and the hyperparameters until the number of moving neurons in the DP becomes sufficiently small. In this experiment, we set the initial number of ensembles $A = 3$, decay constant $\tau = 10$, initial transient parameter $q^{[0]}_\alpha = 100$, and hyperparameters $\alpha^{(p)}_\mu = 100$, $\beta^{(p)}_\mu = 100$, $\alpha^{(\lambda)}_{z,\mu} = 100$, $\beta^{(\lambda)}_{z,\mu} = 100$, $\alpha^{(n)}_\mu = 100$ for all $\mu, z$. At the initialization step, we randomly assign an initial membership label to each neuron, uniformly between 1 and $A$.

Fig. 2. Dynamical behavior of Algorithm 1 in MCMC: (A) Synthetic data of neuronal activity with 100 neurons, 10 ensembles, and 1000 time steps (black = active, white = inactive). (B) Behavior of the ensemble membership labels. (C) Behavior of the number of ensembles (broken line) and the transient rate (solid line).
In Fig. 2B, we show a typical example of the dynamical behavior of the membership labels in MCMC as a heat map, where the horizontal axis indicates the stage of the MCMC update. The colors in the heat map distinguish the ensemble labels $\{1, 2, \ldots, A\}$. The original ensemble structure is clearly obtained after at most 40 MCMC stages.

In Fig. 2C, we show the behavior of the number of ensembles $A$ and the transient rate in MCMC, calculated from the result in Fig. 2B. The transient rate at the $\gamma$th MCMC stage is defined by
\begin{align}
\text{transient rate} = \frac{1}{N} \sum_{i=1}^{N} \Bigl( 1 - \delta_{t^{[\gamma]}_i,\, t^{[\gamma-1]}_i} \Bigr),
\tag{19}
\end{align}
namely the fraction of neurons whose membership label changes between the $(\gamma-1)$th and $\gamma$th MCMC stages. The broken line represents the number of ensembles and the solid line the transient rate. From Fig. 2C, we find that the number of ensembles and the ensemble membership labels converge after 40 MCMC stages.

We should note that the final number of ensembles is 5, which is smaller than the ground-truth value of 10. This is because some ensembles are merged in the final result. To manage this problem, we run our algorithm repeatedly with other initial membership labels, which leads to different ensemble structures. Even then, we obtain a blockwise ensemble structure again as in Fig. 2B, and the number of ensembles is still smaller than 10; however, different ensembles are merged under different initial membership labels. Therefore, we can recover the original 10-ensemble structure exactly by combining several MCMC results with different initial membership labels.

When we change the conditional activity rate $\lambda$, which controls coherence or noise, ensemble inference becomes easier or harder. Even under a hard, noisy condition, we can still obtain a nearly correct ensemble structure, with the noise removed as much as possible. In addition, even if the sizes of the ground-truth ensembles are not equal, unlike Fig. 2A, we can infer the correct ensemble structure.

3.3 Comparison with the original algorithm

We compare our algorithm with the original one in Ref. [7]. For comparison, we use the same synthetic data as in Fig. 2A. In the original algorithm, we use neither the simultaneous update rule that moves multiple neurons to a new ensemble nor the simulated annealing of hyperparameters (equivalently, we take the limit $\tau \to +\infty$).

A typical example of the dynamics is shown in Fig. 3. In Fig. 3A, we start with a small number of ensembles ($A = 3$), as for our algorithm in Sect. 3.2. However, we cannot obtain a blockwise structure: all ensembles are merged into one at a late MCMC stage. In Fig. 3B, we show the number of ensembles and the transient rate for the result in Fig. 3A, which exhibits slow convergence of the transient rate. We verified that this result does not depend on initial conditions such as the ensemble membership labels. From these results, we conclude that the original algorithm gets stuck in a bad local maximum of Bayesian inference when started with a small initial number of ensembles.
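For reference, the transient rate of equation (19), plotted as the solid lines in Figs. 2C and 3B, is simply the fraction of neurons whose label changed between consecutive stages; a minimal sketch with our own naming:

```python
import numpy as np

def transient_rate(t_curr, t_prev):
    """Equation (19): fraction of neurons whose membership label changed
    between the (gamma-1)th and gamma-th MCMC stages."""
    return float(np.mean(np.asarray(t_curr) != np.asarray(t_prev)))
```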
Fig. 3. Dynamical behavior of the original algorithm of Ref. [7] in MCMC: (A) Behavior of the ensemble membership labels. (B) Behavior of the number of ensembles (broken line) and the transient rate (solid line).
4 Conclusion and future works

In this work, we proposed a Bayesian inference algorithm with faster convergence. In our method, we introduced an update rule that moves multiple neurons to a new ensemble, together with the idea of simulated annealing, to avoid bad local maximum solutions of Bayesian inference. For simulated annealing, we introduced the decay constant $\tau$ to control the annealing schedules of the transient probability and the learning rate. As a consequence, we found in numerical experiments on synthetic data that our method successfully obtains the blockwise neuronal ensemble structure, even with a small initial number of ensembles. We also compared our algorithm with the original one, and the result indicates that our algorithm has an advantage in finding the correct ensemble solution.

Note that our method in this work focuses on neuronal ensemble identification, not on details of neural network structure such as connectivity. However, we believe that our idea for ensemble identification will be helpful for understanding the whole structure of biological neural networks. Moreover, by further improvement of our method, we expect that a Bayesian inference framework for the details of neural network structure can be constructed.

Several issues remain as future work. First, the additional hyperparameter $\tau$ may also be useful for finding hierarchical ensemble structure. If $\tau$ is set large, so that the annealing schedule is slow, we can obtain a finer ensemble structure, although this requires many MCMC stages until convergence. On the other hand, if $\tau$ is small, we can obtain an ensemble structure faster; however, only the large-scale structure will be found and the fine structure will be neglected. Even in this case, we can obtain the fine structure by applying the method repeatedly, as mentioned in Sect. 3.2. We should investigate the role of the hyperparameter $\tau$ in more detail.

Second, we should apply our algorithm to real neuronal activity data, which we are planning at present. One of the problems for such applications is that real experimental data of neuronal activity are often continuous, not binary as in our formulation. A natural idea for application to continuous data is to binarize the real activity data; however, this may discard significant information in the neuronal activity. Another idea is to generalize our formalism to continuous activity data, for which we must modify the generative model. We should consider which strategy is more appropriate for application to real activity data.

Acknowledgments. We appreciate the comments from Giovanni Diana and Yuishi Iwasaki. This work is supported by KAKENHI Nos. 18K11175 and 19K12178.