An information-theoretic approach to infer the underlying interaction domain among elements from finite length trajectories in a noisy environment
Udoy S. Basak, Sulimon Sattari, Hossain M. Motaleb, Kazuki Horikawa, Tamiki Komatsuzaki
IInferring Domain of Interactions among Elements from Ensemble of Trajectories II.-Effects of finite length trajectories and external noise
Udoy S. Basak
Graduate School of Life Science, Transdisciplinary Life Science Course,Hokkaido University, Kita 12, Nishi 6, Kita-ku, Sapporo 060-0812, Japan andPabna University of Science and Technology, Pabna 6600, Bangladesh
Sulimon Sattari
Research Center of Mathematics for Social Creativity,Research Institute for Electronic Science, Hokkaido University,Kita 20, Nishi 10, Kita-ku, Sapporo 001-0020, Japan
Hossain M. Motaleb
Research Center of Mathematics for Social Creativity,Research Institute for Electronic Science, Hokkaido University,Kita 20, Nishi 10, Kita-ku, Sapporo 001-0020, Japan andUniversity of Dhaka, Dhaka 1000, Bangladesh
Kazuki Horikawa
Department of Optical Imaging, The Institute of Biomedical Sciences, Tokushima University Graduate School,3-18-15 Kuramoto-cho, Tokushima City, Tokushima, 770-8503, Japan
Tamiki Komatsuzaki
Research Center of Mathematics for Social Creativity,Research Institute for Electronic Science, Hokkaido University,Kita 20, Nishi 10, Kita-ku, Sapporo 001-0020, JapanInstitute for Chemical Reaction Design and Discovery (WPI-ICReDD),Hokkaido University Kita 21 Nishi 10, Kita-ku, Sapporo, Hokkaido 001-0021, JapanGraduate School of Life Science, Transdisciplinary Life Science Course,Hokkaido University, Kita 12, Nishi 6, Kita-ku, Sapporo 060-0812, Japan andGraduate School of Chemical Sciences and Engineering Materials Chemistry and Engineering Course,Hokkaido University, Kita 13, Nishi 8, Kita-ku, Sapporo 060-0812, Japan
Transfer entropy in information theory was recently shown [Phys. Rev. E 102, 012404 (2020)] toenable us to elucidate the interaction domain among interacting particles solely from an ensemble oftrajectories. There, only pairs of particles whose distances are shorter than some distance variable,termed cutoff distance, are taken into account in the computation of transfer entropies. The predic-tion performance in capturing the underlying interaction domain is subject to noise level exerted onthe particles and the sufficiency of statistics of the interaction events. In this paper, the dependenceof the prediction performance is scrutinized systematically on noise level and the length of trajecto-ries by using a modified Vicsek model. The larger the noise level and the shorter the time length oftrajectories, the more the derivative of average transfer entropy fluctuates, which makes it difficultto identify the interaction domain in terms of the position of global minimum of the derivative ofaverage transfer entropy. A measure to quantify the degree of strong convexity at coarse-grainedlevel is proposed. It is shown that the convexity score scheme can identify the interaction distancefairly well even while the position of global minimum of the derivative of average transfer entropydoes not.
I. INTRODUCTION
Collective migration is the synchronized movement ofagents emerging from the mutual interactions betweenthem [1, 2]. One of the basic properties of collective mo-tion is that the movement of an individual is influencedby the movement of other individuals in its local vicin-ity and/or through long range interactions, e.g., via somesignals such as chemicals emitted by cells. At the cellularlevel, collective motion can be observed in wound heal-ing, cancer development, and organogenesis [3–5]. The question of how microscopic interactions between agentsregulate the macroscopic group behavior is one of themost intriguing subjects [6]. This is closely related to theproblem of causal inference within systems composed ofmany agents.For the qualitative understanding of collective motion,a variety of simulation models have been proposed suchas the Reynolds’ flocking model [7], the Vicsek model(VM) [8], and the Couzin model [9]. Among these, theVM has been widely used to study various dynamics ofcollectively moving self-propelled particles, such as sym- a r X i v : . [ n li n . AO ] S e p metry breaking [10, 11], phase transition [8, 12], and clas-sification of leaders and followers [13, 14]. In the VM,each particle moves with a constant speed and its thedirection is determined by the average direction of mo-tion of its neighboring particles in the presence of noise[15–17]. In other words, a moving particle interacts onlywith the particles within a distance of R as it would viadirect interactions but also via signal transduction suchas chemicals.One of the possible drivers of collectively movingagents is the presence of influential individuals, some-times referred to as ‘leaders’, who control the movementof the other individuals, referred to as the ‘followers’.Leader-follower relationships have been studied in a fishshoal [18], troops of baboons [19], in a colony of honey bee[20] and so forth. At the cellular level, it has been stud-ied that collective migration of MDCK epithelial cells[21, 22], wound healing [23], cancer growth in breast [24],etc., are regulated by the leader cells.Identifying leader and follower agents is a challengingendeavor. First and foremost, one must identify whatit means to be a leader. Based on asymmetric natureof influence on activities among entities, in this article,we define a ‘leader’ as an entity which more influences(on average) on the activity of the other entities (termed‘followers’). Once leadership has been defined, varioustypes of empirical data, e.g., ensembles of trajectoriesof agents, can be used to infer the differential influencein interaction and identify leader-follower relationships.By definition, leaders are expected to be more persua-sive compared to the followers. Since followers followthe movement of leaders, some correlation should existbetween some physical quantity related to a leader andthat related to a follower with a certain time delay, asinformation cannot travel from a leader to a follower atinfinite speed.To measure causal influence among multivariate timeseries, information theory provides a variety of ap-proaches [25]. Some of the typical quantities used are mu-tual information [26], time-delayed mutual information[27], transfer entropy [27], and causation entropy [28].These quantities have been computed using time lapsemotion data of moving individuals to determine the di-rections of influence. For example, it was found by usinga zebrafish interaction model that net transfer entropyis a more accurate classifier than extreme-event synchro-nization and cross-correlation for classifying leaders andfollowers [13]. In swarms of bats, transfer entropy wasused to demonstrate that there exists a leader-followerrelationship between the front bat and the rear bat [29].Using the trajectories of handball players, it was showedthat transfer entropy is capable of capturing the causalrelationships between players [30].In the above-mentioned studies, all pairs of agents aretaken into account at every time instance to evaluatetransfer entropy irrespective of the distance between theagents. This is not necessarily an optimal use of the dataavailable for capturing the underlying leader-follower re- lationship, given that the interaction domain is known.It was shown, using a modified VM, that the classifica-tion scores of leaders and followers significantly increaseupon incorporating the identified interaction domain in-formation compared to the conventional way of trans-fer entropy estimation where the distance between theagents is not taken into account [31]. When two parti-cles come into their interaction domain, they may shareor transfer information which results in some change intheir motion such as the direction of motion. As the dis-tance between them exceeds the interaction radius, theamount of information flow decreases and goes to zero atthe limit of the distance being infinity in a fluctuatingenvironment. This methodology requires that the inter-action domain is known, which may not be the case. Anew scheme has been proposed to estimate the underly-ing interaction domain from the trajectories of particlesto monitor the change in transfer entropy as a functionof the distance between them, called cutoff distance λ . Itwas demonstrated that the derivative of average trans-fer entropy (and also cross correlation) with respect to λ has a minimum near the interaction domain, by whichone can identify the underlying interaction domain froma set of trajectories [31].The scheme is dependent on how transfer entropy canbe estimated so that it takes into account enough statis-tics of interacting particles, and positions and numbersof the minimum of the derivative of average transfer en-tropy along the cutoff distance λ may also be subject tothe extent of external noise and time length of trajec-tories. In this paper, we scrutinize how the predictionperformance in capturing the underlying interaction do-main depends on the size of noise and time length of thetrajectory data. We also examine an alternative schemeexpected to be stable against noises and time length, thatrelies on the degree of convexity at coarse-grained scale inthe derivative of average transfer entropy along the cut-off distance, and time variance of underlying interactionradius of particles. II. IDENTIFICATION OF LEADERS ANDFOLLOWERS
Transfer Entropy (TE) from time series of a stochas-tic variable X = { ..., x t − , x t , x t +1 , ... } to time series ofanother stochastic variable Y = { ..., y t − , y t , y t +1 , ... } isdefined as [27]: T E X → Y = I ( y t + τ ; x t | y t ) , = (cid:88) y t + τ ,y t ,x t p ( y t + τ , y t , x t ) log ( p ( y t + τ | y t , x t ) p ( y t + τ | y t ) ) , = H ( y t + τ | y t ) − H ( y t + τ | y t , x t ) , (1)where τ is the time lag between the two time instantsand H ( . | . ) represents the conditional Shannon entropy[26]. TE is proven to be non-negative. A positive valueof T X → Y is considered to indicate the causal influenceof X on Y [32]. For a pair of agents X and Y , the nettransfer entropy from X to Y , defined as N T E X → Y = T E X → Y − T E Y → X can be used to infer the direction ofcausal influence. A positive N T E X → Y may indicate that Y follows X , which quantifies the causal direction from X to Y .As a classifier to differentiate leaders and followers, theaverage net transfer entropy is used, which is denoted as χ and defined for a given particle i as follows: χ ( i ) = 1 N − (cid:88) j ( (cid:54) = i ) ( T E i → j − T E j → i ) , where T E i → j represents TE from the particle i to j and N is the total number of particles in the system. Thevalue of χ ( i ) for each particle i is compared to a selectedthreshold value (cid:15) . A particle for which χ ( i ) is higherthan the threshold (cid:15) is identified as a leader, otherwiseit is identified as a follower. The resulting classifica-tion is compared to the ground truth to obtain the true-positive rate and the false-positive rate for the chosen (cid:15) . To show the classification performance of a classi-fier receiver-operating characteristic curve is used. It isobtained by plotting the true positive rate versus falsepositive rate at different values of (cid:15) [13]. In order toquantify the accuracy of the classifier ' s performance andto compare the performance of different classifiers, areaunder receiver-operating characteristic curve (AUC) hasbeen used [33]. An AUC score of 1.0 means that thatclassifier accurately predicts the identities of the parti-cles whereas a value of 0.5 means that the classifier hasno class separation capacity whatsoever. III. MODIFIED VICSEK MODEL
Similar to the standard VM [8], we consider that N self-propelled particles are moving with the same con-stant speed v in a two-dimensional square box of length L with periodic boundary conditions, and at time t = 0the particles are positioned and oriented randomly. Attime t + 1, the position of i th particle is denoted by (cid:126)r t +1 i is updated at each time step ∆ t as: (cid:126)r t +1 i = (cid:126)r ti + (cid:126)v ti ∆ t, (2)where (cid:126)r ti denotes the position of i th particle at time t , and (cid:126)v ti represents the corresponding velocity of the particlewith an absolute speed v and a direction given by theangle θ i ( t ). This angle is obtained from the followingequation: θ i ( t + 1) = (cid:104) θ ( t ) (cid:105) R, w ,(cid:126)r ti + ∆ θ i . (3)Here (cid:104) θ ( t ) (cid:105) R, w ,(cid:126)r ti is the weighted orientation aver-aged over particles (including the particle i itself),which are within a circle of radius R centered onthe position (cid:126)r ti of the particle i at time t , computed by arctan (cid:104)(cid:80) (cid:48) j w ji sin θ j ( t ) / (cid:80) (cid:48) j w ji cos θ j ( t ) (cid:105) where (cid:80) (cid:48) takes over all j satisfying | (cid:126)r ti − (cid:126)r tj |≤ R [31]. w is amatrix whose element w ij corresponds to the interactionstrength that the particle i exhibits on its neighboringparticle j . If the particle i is a leader and j is a follower,then w ij > w ji . Also the interaction strength of a par-ticle i on itself is 1.0 i.e., w ii = 1 .
0. We set the valuesof leaders ' and followers ' interaction strengths to 1.05and 1.00, respectively. ∆ θ i represents random numberat time t which can be chosen with a uniform probabil-ity distribution from the interval [ − η / , η / η may be considered as a temperature-like parameter. Thetotal time length is designated by T during which trans-fer entropies are estimated between leader and followerparticles. 𝜃 𝑘 𝜃 𝑖 𝜃 𝑗 Δ𝜃 𝑘 Δ𝜃 𝑖 Δ𝜃 𝑗 𝑅 𝑅 𝑅𝜆 𝜆 Figure 1. Schematic diagram of cutoff distance. Ovals repre-sent moving particles. θ i , θ j , and θ k represent the direction ofmotion of receptive particles at time t and R is the interactionradius. λ and λ exemplify two different cutoff distances.For example, for the cutoff distance λ = λ , the actual dis-tance between the particles i and j at this time instance islarger than λ , and hence, their orientation information is notconsidered for TE calculation between them even though theparticles are located within each other’s interaction domain(likewise the orientational information between the particle i and k is not taken into account in the computation of TEat λ = λ ). But for λ = λ , particles j and k are both lo-cated within the distance of λ from particle i . Hence theorientation information of θ k and θ i and that of θ j and θ i are considered to compute TE irrespective of the underlyinginteraction radius R . In the VM each particle moves with a constant speed v . In this paper the value of v is set to be 0.3 arb.units. It was found that in the range of 0 . ≤ v ≤ . ' s headings. Hence orienta- . . . . Noise A UC sc o r e (a) N = 2 N = 6 N = 10 N = 15 . . . . Noise A UC sc o r e (b) N = 2 N = 6 N = 10 N = 15 Figure 2. Classification scores of different numbers of particles at different noise levels. (a) Fixed box size. Density changeswith different number of particles: ρ = 0.02 arb. units ( N = 2), 0.06 ( N = 6), 0.10 ( N = 10), and 0.15 ( N = 15). (b) Fixeddensity as 0 . L = 4 .
47 arb. units ( N = 2), 7.75 ( N = 6), 10( N = 10), and 12.25 ( N = 15). Interaction radius R T i m e l eng t h T (a) Interaction strength w LF T i m e l eng t h T (b) Figure 3. AUC landscape at different time length for different interaction radius and interaction strength. (a) AUC landscapecorresponds to η = 1 . π arb. units and w LF = 1 .
05 arb. units. AUC value increases as the time length and interaction radius R increase. (b) AUC landscape correspond to η = 1 . π and R = 1. For this analysis w FL = 1 is fixed and w LF were varied.Similar behaviors were observed at other noise levels (not shown). tion of particles is used to compute transferred informa-tion between them. In this paper, the orientation space[0 : 2 π ] is discretized into six bins of equal size which arerepresented by unique symbols [31].In Eq. (3) the orientation of the particle i at time( t + 1) depends largely on the orientation of itself andnearby particles at time t . The delay time τ is set to be1 for the estimation of TE throughout this paper. Allanalyses were performed with 1000 realizations and foreach realization the values of (cid:126)r ti and (cid:126)r ti at time t = 0 arechosen randomly in Eq. (2). In this paper box size L ,number of particles N , time length T , interaction radius R , are varied to check their effect on classification score. IV. CUTOFF DISTANCE
Knowledge of the interaction domain greatly improvesthe classification of leaders and followers. In practice,however, one does not know the interaction domain fora group of animals, cells, or birds a priori. To deduceit from an ensemble of trajectories, the ‘cutoff distancevariable’ λ was introduced [31]. In this problem setting,the interaction domain is considered as a circle of radius R , which is unknown, however, in general the same tech-nique can be applied to infer an interaction domain ofany shape. Then for the estimation of TE, the cutoffdistance λ is defined as a distance up to which the in-teractions between particles are taken into consideration[Figure 1]. In other words, for the estimation of TE fromthe ‘symbolic time series’ of a particle to another, theprobability distributions are estimated only at the timeinstance t when the distance between those two particlesis less than the cutoff distance λ . Finally the value of λ is varied and TE between particles is computed as afunction of λ . Whenever there is no mention of a cutoffdistance λ , e.g. in Section V, it means that the distanceinformation between particles is not considered which isthe conventional way of TE computation [13, 29]. V. RESULTS AND DISCUSSIONS
Figure 4. Distributions of the classifiers χ TE (/bits) from theleader to the others, and those from a follower to the othersfor R = 3, η = 1 . π , and (a) T = 100 ,
000 arb. units, (b) T = 50 , T = 25 , T = 12 , Figure 2 shows the AUC classification score (=the ex-tent of how leader and followers are correctly classifiedfrom their orientation dynamics) with different numbersof particles N and noise levels η . Here, only one particleserves as a leader and the other ( N −
1) particles serve asfollowers, under the constraint of box size L [Fig. 2(a)]or density ρ [Fig. 2(b)]. At low noise level η ∼ η > . π , the distributions of transferentropies χ TE from leader to follower and vice versa over1000 realizations were found to significantly overlap witheach other, which makes differentiation between leaderand follower difficult. The low AUC at very low noiselevel arises from the fact that particles fall into someconcerted motion very quickly dependent solely on ini-tial configurations, resulting in insufficient sampling oforientational dynamics, and in turn the low AUC at veryhigh noise arises from overshadowing of the interactionsby random noises [31]. As the number of particles in-creases, the AUC score decreases in both fixed box sizeand fixed density cases. This is because, in any pair of Figure 5. Distributions of the classifiers χ TE (/bits) from theleader to the others, and those from a follower to the othersfor T = 100 , η = 1 . π , and (a) R = 4 .
0, (b) R = 3 .
0, (c) R = 2 .
0, and (d) R = 1 . particles for which TE is evaluated, their motions arealso influenced by the other particles, and the more thenumber of particles increases, the more the motions ofthe particles in question are influenced by the third (orhigher) particle. Compared the fixed box size case to thefixed density case, the AUC score for the fixed box sizequickly drops to 0.5 (corresponding to a coin toss) around η ∼ . π than those for the fixed density condition. Itsuggests that, as the density gets higher, the motion ofparticles in any pair are more influenced by additionalparticles.The effects of time length T , interaction radius R , andleaders ' interaction strength w LF on classification scoreare shown in Fig. 3. In this analysis 10 particles wereused, with one serving as a leader and the other 9 par-ticles as followers. In Fig. 3(a) we varied time length T and interaction radius R . It is shown that the AUCscore increases with T and R . For shorter T , due to in-sufficient sampling in characterizing leader-follower inter-action relationship, the distributions of χ TE from leaderto the others and from follower to the others have highervariance as shown in Fig. 4. As a result, it is difficultto distinguish leader and followers for short T . As T increases, due to more exploration of interaction eventsbetween leader and followers, the variance of leaders’ andfollowers’ distributions gets smaller, making the classifi-cation easier. Figure 5 shows, in turn, the interactionradius R dependency on distributions of the classifiers χ TE . Larger R allows particles to be taken into ac-count in elucidating the leader-follower interaction rela-tionship, which produces easily distinguishable distribu-tions of leader and followers as shown in Fig. 5(a). In .
25 0 .
75 1 .
25 1 .
75 2 .
25 2 .
75 3 .
25 3 .
75 4 .
25 4 . Cutoff distance N (a) .
25 0 .
75 1 .
25 1 .
75 2 .
25 2 .
75 3 .
25 3 .
75 4 .
25 4 . Cutoff distance N (b) Figure 6. AUC landscape for different numbers of particles at different cutoffs. Actual interaction radius R used for thetrajectory calculation is 3 . η = π . (a) Fixed box size. The highest AUC scores and their locations are: 0.999at λ = 3 .
25 arb. units ( N = 2), 0.984 at λ = 3 .
25 ( N = 4), 0.946 at λ = 3 . N = 6), 0.926 at λ = 3 .
25 ( N = 8), 0.903 at λ = 3 . N = 10), 0.878 at λ = 3 . N = 12), and 0.846 at λ = 3 . N = 15). (b) Fixed density. The highest AUC scores andtheir locations are: 0.999 at λ = 3 . N = 2), 0.986 at λ = 3 .
25 ( N = 4), 0.947 at λ = 3 . N = 6), 0.921 at λ = 3 . N = 8),0.897 at λ = 3 . N = 10), 0.870 at λ = 3 . N = 12), and 0.851 at λ = 3 . N = 15). (a) T =12,000 T =25,000 T =50,000 T =100,000 (b) T =12,000 T =25,000 T =50,000 T =100,000 (c) T =12,000 T =25,000 T =50,000 T =100,000 Figure 7. Derivative of average TE for different time length along with local minima and identified interaction radius based onglobal minimum scheme at (a) η = 0 . π , (b) η = 1 . π and (c) η = 1 . π . The actual interaction radius R used for ensembleof trajectories is 3. Circles represent all local minima and filled-circles represent the identified interaction radius at each timelength T . contrast, when R is smaller, the χ TE distributions of lead-ers and followers overlap each other more with smallervariance, making the classification more difficult.In Fig. 3(b) we set R = 1 . w LF . In thisanalysis followers’ interaction strength is set to 1.0, i.e., w FL = w FF = 1 .
0. Hence w LF = 1 . w LF increases AUC value also increases as the leader isgetting more influential on followers which makes classi-fication easier even at short time length.How does the predictability of interaction radius byusing TE with cutoff distance depend on the number of particles? Figure 6 represents the AUC landscape as afunction of number of particles N and cutoff distance λ .Figure 6(a) corresponds to a fixed box size (i.e. densityis changing with N ), whereas Fig. 6(b) represents thesystems having same density (i.e. L is changing with N ). The actual interaction radius used to simulate thetrajectories of the particles was R = 3 and noise was setto η = π . Although at each N the maximum AUCsare located at the underlying interaction radius R = 3 . R = 5 √ (a) T =12,000 T =25,000 T =50,000 T =100,000 (b) T =12,000 T =25,000 T =50,000 T =100,000 (c) T =12,000 T =25,000 T =50,000 T =100,000 Figure 8. Derivative of average TE for different time lengths along with the identified interaction radius based on convexityscore scheme at (a) η = 0 . π , (b) η = 1 . , π and (c) η = 1 . π for { n } = { n | ≤ n ≤ } and { M } = { M | ≤ M ≤ } and δ = 1 × − . between particles. Similar behaviors are observed at dif-ferent interaction radius R and noise levels (not shownhere).How can one infer the underlying interaction radiussolely from ensembles of trajectories? Recently, a sim-ple scheme has been proposed [31] to infer the underly-ing interaction distance from ensembles of trajectories,based on the existence of a significant decrease in aver-aged transfer entropy when cutoff distance λ exceeds theunderlying interaction radius. The interaction distanceis inferred as that where the minimum of the derivativesexists along the cutoff distance λ :ˆ R ≡ argmin λ d (cid:104) TE (cid:105) λ dλ , under the condition of d (cid:104) TE (cid:105) λ dλ = 0. In practice, thelength of trajectories may not be long enough and shorterlength tends to result in a fluctuation in the course of TEalong the cutoff distance λ , resulting in apparent minima.Figure 7 shows the derivative of average TE as a func-tion of cutoff distance λ , denoted by d (cid:104) T E (cid:105) λ dλ for interac-tion radius R = 3 at (a) η = 0 . π , (b) η = 1 . π , and (c) η = 1 . π for different T . Here circles represent all localminima and the filled-circles represent the global mini-mum identified as the interaction radius. In Figs. 7(b)and 7(c) for relatively short T = 25 ,
000 or less, d (cid:104) T E (cid:105) λ dλ as a function of λ has the global minimum at low λ . Thelocations of these global minima for short time lengthchange with T and they vanish when T is longer, andthe longer T = 50 , − ,
000 both result in a closevalue of the underlying interaction radius R = 3. Thisimplies that to look for global minimum of derivative oftransfer entropy may not necessarily result in an approx-imation of the true interaction radius especially for someshort T at high noise levels.In this paper, we present another scheme expectedto be robust against fluctuations of average transfer en- tropies along the cutoff distance by introducing a mea-sure to quantify the degree of strong convexity at coarse-grained level, and time variance of underlying interactionradius of particles as follows.Note in the insets of Fig. 7 that the shape of d (cid:104) T E (cid:105) λ dλ as a function of λ is (strongly) convex near the actualinteraction radius irrespective of time length T . This in-dicates that the derivative of transfer entropy as a func-tion of cutoff distance can shed light on the underlyingspatial scale of interactions among particles. However, itis not trivial to devise a scheme to automatically inferthe interaction radius. Since d (cid:104) T E (cid:105) λ dλ as a function of λ isconvex near the interaction radius, a measure of convex-ity of d (cid:104) T E (cid:105) λ dλ is versatile in determining the interactionradius. In general, due to noise, d (cid:104) T E (cid:105) λ dλ can be fluctu-ated, producing apparent convex patterns locally. Thusin defining the convexity score, it is necessary to cap-ture the non-local feature of d (cid:104) T E (cid:105) λ dλ rather than the localfeature that may be subject to noise(s). We define theconvexity score κ ( λ i ) of a function f ( λ ) at a point λ i as κ ( λ i ) = M (cid:80) Mm =1 σ i ( m ) where σ i ( m ) = +1 if f ( λ i − m ) − f ( λ i ) > δ and f ( λ i + m ) − f ( λ i ) > δ and σ i ( m ) = − f ( λ i ) − f ( λ i − m ) > δ and f ( λ i ) − f ( λ i + m ) > δ , otherwise σ i ( m ) = 0. Here δ is a non-negative small number and M is the number of neighboring points used to determinethe convexity score, and − ≤ κ ( λ i ) ≤
1. As the functionof f ( λ ) we employ a simple moving averaged d (cid:104) T E (cid:105) λ dλ withan interval n : f ( λ i ) = 1 /n (cid:80) n − j =0 d (cid:104) T E (cid:105) λ /dλ | λ = λ i + j .Thus, around a point λ i where d (cid:104) T E (cid:105) λ dλ is convex atsome coarse-grained level κ ( i ) is close to unity. Hencethe point λ i around which κ ( λ i ) is the maximum is con-sidered to be an indicator of the interaction radius abovewhich average transfer entropy between particles signifi-cantly decreases.How to choose the optimal n , M , and δ ? Wedefine the estimated interaction radius ˆ R as ˆ R ≡ Figure 9. Relative error ˆ R in identifying underlying interaction domain using global minimum scheme at (a) η = 0 . π , (b) η = 1 . π , and (c) η = 1 . π . Cross-marked boxes ‘NaN’ mean that any minimum of derivatives was not detected.Figure 10. Relative error ˆ R in identifying underlying interaction domain using convexity score scheme with δ = 1 × − and { n } = { n | ≤ n ≤ } , { M } = { M | ≤ M ≤ } at (a) η = 0 . π , (b) η = 1 . π , and (c) η = 1 . π . Cross-marked boxes‘NaN’ mean that the no point was detected having non-zero convexity score κ . argmax λ κ ( n, M ; T ), where T represents the time length.Then the cost function is defined as C ( n, M ) ≡ (cid:88) T (cid:88) T (cid:48) | ˆ R ( n, M ; T ) − ˆ R ( n, M ; T (cid:48) ) | , (4)by assuming that the interaction radius is inde-pendent of time, i.e., time-invariant, and there ex-ists (approximately) sufficient statistics for each timelength in elucidating the interaction events. Theparameters ( n, M ) is determined so that ( n, M ) =argmin n ∈{ n } ,M ∈{ M } C ( n, M ). Here the sets of n and M to be searched for finding optimal n and M , { n } and { M } , are those users need to input a priori . Note thatfor some time length T , ˆ R ( n, M ; T ) could not be chosenuniquely due to the degeneracy of κ ( n, M ; T ) and alsoˆ R ( n, M ; T ) could become undefined when d (cid:104) T E (cid:105) λ dλ curvedoes not has any strongly convex part. In both the cases,we exclude such T in Eq. (4) to compute the cost function C ( n, M ) in defining optimal n and M . We also varied δ = 0 , × − , × − , × − (arb. units) and foundthat within this range of δ has no significant effect on theestimation of interaction radius.Figures 9 and 10 show the relative errors (∆ R ) inidentifying the underlying interaction domain using the global minimum of the derivative of transfer entropy andconvexity score scheme, respectively, as a function of timelength and interaction radius at three different noise lev-els. Here relative error (∆ R ) is defined as ∆ R = | R − ˆ R | R ,where ˆ R is the identified interaction radius using eitherof the two schemes.For the global minimum (of TE derivative) scheme[Fig. 9], there exists a clear trend such that the largerthe noise level the larger the relative error, and as thetime length T decreases, the relative errors are more pro-nounced for relatively large noise levels η = 1 . π − . π .Because of the appearance of a minimum at low cutoffdistance for short T that ceases to exist for longer T inFigs. 7(b) and 7(c), the global minimum scheme appar-ently possesses higher relative error for short T [Figs. 9(b) and 9 (c)]. Note also that even when the time length T is sufficiently long, e.g., T = 50000 − η = 1 . π , and inter-action radius R is short, i.e., R = 1 . − . R is short ( R = 1 . ,
2) and noise levelis very high [Fig. 10(c)].
VI. CONCLUSIONS
In this study, we examined the performance of twoheuristic schemes using the derivative of transfer entropywith respect to cutoff distance λ , d (cid:104) T E (cid:105) λ dλ : global mini-mum and convexity score scheme for determining the in-teraction radius by using the modified VM. The strikingfeature -based on which we proposed the two schemes-is that d (cid:104) T E (cid:105) λ dλ exhibits a kink near the actual interac-tion radius. A method that is capable of determiningthe exact location of that kink can be used in inferringinteraction radius.For short time length (at the moderate and high levelof noise) for the modified VM, the derivative of aver-age TE exhibits a minimum at low cutoff distances thatproduce a relatively high error for the global minimumscheme. Moreover, in real experiments it is not neces-sarily possible to get sufficiently long trajectories withsmall noises, and the global minimum scheme may yieldsome non-negligible error especially for short trajecto-ries with noise. In this paper, an alternative scheme waspresented, based on the property of convexity of d (cid:104) T E (cid:105) λ dλ at the coarse-grained level and the assumption of time-invariance of the underlying interaction domain. For themodified VM, the scheme could capture the underlyinginteraction radius at the low and moderate levels of noiseespecially for relatively short time length, ca. T =12,000-25,000 for which global minimum scheme possesses highrelative error (at moderate noise level). These two heuris-tic schemes are compliment to each other and, as forappropriate usages, one should first visualize transfer en-tropy as a function of cutoff distance λ with its deriva-tives with respect to cutoff distance to confirm the exis-tence of abrupt changes along the cutoff distance. Sig-nificant changes in the derivative of TE with respect tocutoff distance were also observed for classical trajecto-ries of particles interacting via Lennard-Jones potential(not shown), and the existence of some significant changealong cutoff distance around the typical length scale ofinteractions may be ubiquitous.In systems with many variables, identifying causal re-lationships is a daunting task. An important aspect ofsystems that should be exploited, however, is that partic-ular variable may be only influencing another particularvariable at certain time instances. We have shown thatfiltering out those time instances where influence does not occur greatly improves the identification of causal rela-tionships. In the Vicsek model, for example, two agentsmay only interact when their distance is less than a cer-tain threshold. To pose this as a question, we wonder atwhich value of interaction radius R does the motion ofone agent influence the motion of another? More gener-ally, we ponder the question: at which levels of variable X does variable Y influence variable Z ? In future work,we will demonstrate the applicability of this method byapplying it to data sets stemming from different fields. ACKNOWLEDGMENTS
We thank Prof. A. Nakamura, Prof. M. Toda, andProf. S. Sawai for their valuable discussions. This workwas supported by a Grant-in-Aid for Scientific Researchon Innovative Areas “Singularity Biology (No.8007)”(18H05413), MEXT, and by JSPS (No. 25287105 and25650044 to T.K.), and JST/CREST (No. JPMJCR1662to T.K.).
Appendix A: Estimated interaction radius at R = 2 . Figure (11) shows the derivative of average TE for dif-ferent time lengths along with the identified interactionradius based on convexity score scheme at different noiselevels. It has been found that at low and moderate noiselevels the convexity score scheme identifies the interac-tion radius satisfactorily [Figs. 11(a) and 11(b)]. But atvery high level of noise ( η ≈ . π ) this scheme performssatisfactorily for longer T but possesses higher relativeerror for shorter T [Fig. 11(c)]. Appendix B: Convexity score
According to the definition convexity score κ is maxi-mum at the point for which the d (cid:104) T E (cid:105) λ dλ is strongly convexand κ is minimum where the curve is concave. Figure(12) shows the convexity score for very high noise level( η = 1 . π ) at different interaction radii R . For R = 1 . T ( T = 12 , − , λ . Since d (cid:104) T E (cid:105) λ dλ has no convex point for T = 50 , − , R = 1 .
5, the convexity score scheme fails to identifyinteraction radius. For R = 2 . T = 50 ,
000 but it identifies inter-action radius for T = 100 ,
000 [Fig. 12(b)]. For R = 2 . T ( T = 50 , − , (a) T =12,000 T =25,000 T =50,000 T =100,000 (b) T =12,000 T =25,000 T =50,000 T =100,000 (c) T =12,000 T =25,000 T =50,000 T =100,000 Figure 11. Derivative of average TE for different time lengths along with the identified interaction radius based on convexityscore scheme at (a) η = 0 . π , (b) η = 1 . , π and (c) η = 1 . π for R = 2 . δ = 1 × − , and { n } = { n | ≤ n ≤ } and { M } = { M | ≤ M ≤ } . Cutoff distance C on v e x i t y sc o r e (a) T =12,000 T =25,000 T =50,000 T =100,000 Cutoff distance -0.4-0.200.20.4 C on v e x i t y sc o r e (b) T =12,000 T =25,000 T =50,000 T =100,000 Cutoff distance -0.4-0.200.20.4 C on v e x i t y sc o r e (c) T =12,000 T =25,000 T =50,000 T =100,000 Figure 12. Convexity score κ at (a) R = 1 .
5, (b) R = 2 .
0, and (c) R = 2 . η = 1 . π .[1] A. Grada, M. Otero-Vinas, F. Prieto-Castrillo, Z. Obagi,and V. Falanga, Journal of Investigative Dermatology , e11 (2017).[2] E. Theveneau and C. Linker, F1000Research (2017).[3] P. Friedl and D. Gilmour, Nature reviews Molecular cellbiology , 445 (2009).[4] A. Haeger, K. Wolf, M. M. Zegers, and P. Friedl, Trendsin cell biology , 556 (2015).[5] X. Trepat, M. R. Wasserman, T. E. Angelini, E. Millet,D. A. Weitz, J. P. Butler, and J. J. Fredberg, Naturephysics , 426 (2009).[6] W. M. Lord, J. Sun, N. T. Ouellette, and E. M. Bollt,IEEE Transactions on Molecular, Biological and Multi-Scale Communications , 107 (2016).[7] C. W. Reynolds, in Proceedings of the 14th annual con-ference on Computer graphics and interactive techniques (1987) pp. 25–34.[8] T. Vicsek, A. Czir´ok, E. Ben-Jacob, I. Cohen, andO. Shochet, Phys. Rev. Lett. , 1226 (1995).[9] I. D. Couzin, J. Krause, R. James, G. D. Ruxton, andN. R. Franks, Journal of theoretical biology , 1(2002). [10] W. Li, H.-T. Zhang, M. Z. Chen, and T. Zhou, Phys.Rev. E , 021920 (2008).[11] A. Creppy, F. Plourabou´e, O. Praud, X. Druart, S. Cazin,H. Yu, and P. Degond, Journal of The Royal SocietyInterface , 20160575 (2016).[12] B. Szab´o, G. J. Sz¨oll¨osi, B. G¨onci, Z. Jur´anyi,D. Selmeczi, and T. Vicsek, Phys. Rev. E , 061908(2006).[13] S. Butail, V. Mwaffo, and M. Porfiri, Phys. Rev. E ,042411 (2016).[14] V. Mwaffo, S. Butail, and M. Porfiri, Frontiers inRobotics AI , 35 (2017).[15] M. Aldana, V. Dossetti, C. Huepe, V. M. Kenkre, andH. Larralde, Phys. Rev. Lett. , 095702 (2007).[16] Z. Liu and L. Guo, Science in China Series F: InformationSciences , 848 (2008).[17] H. Chat´e, F. Ginelli, G. Gr´egoire, F. Peruani, andF. Raynaud, The European Physical Journal B , 451(2008).[18] J. Krause, D. Hoare, S. Krause, C. Hemelrijk, andD. Rubenstein, Fish and Fisheries , 82 (2000).[19] C. Sueur, BMC ecology , 26 (2011). [20] T. D. Seeley, The wisdom of the hive: the social phys-iology of honey bee colonies (Harvard University Press,2009).[21] N. Yamaguchi, T. Mizutani, K. Kawabata, and H. Haga,Scientific reports , 1 (2015).[22] M. Reffay, L. Petitjean, S. Coscoy, E. Grasland-Mongrain, F. Amblard, A. Buguin, and P. Silberzan,Biophysical journal , 2566 (2011).[23] T. Omelchenko, J. Vasiliev, I. Gelfand, H. Feder, andE. Bonder, Proceedings of the National Academy of Sci-ences , 10788 (2003).[24] K. J. Cheung, E. Gabrielson, Z. Werb, and A. J. Ewald,Cell , 1639 (2013). [25] K. Hlav´aˇckov´a-Schindler, M. Paluˇs, M. Vejmelka, andJ. Bhattacharya, Physics Reports , 1 (2007).[26] T. M. Cover and J. A. Thomas, Elements of informationtheory (Wiley Interscience, New York, 2006).[27] T. Schreiber, Phys. Rev. Lett. , 461 (2000).[28] J. Sun and E. M. Bollt, Physica D: Nonlinear Phenomena , 49 (2014).[29] N. Orange and N. Abaid, Eur. Phys. J. , 3279 (2015).[30] K. Itoda, N. Watanabe, and Y. Takefuji, Procedia Com-puter Science , 85 (2015).[31] U. S. Basak, S. Sattari, K. Horikawa, and T. Komat-suzaki, Phys. Rev. E , 012404 (2020).[32] R. G. James, N. Barnett, and J. P. Crutchfield, Phys.Rev. Lett. , 238701 (2016).[33] J. A. Hanley and B. J. McNeil, Radiology143