[PDF] An information-theoretic approach to infer the underlying interaction domain among elements from finite length trajectories in a noisy environment

Abstract

Transfer entropy in information theory was recently demonstrated [Phys. Rev. E 102, 012404 (2020)] to enable us to elucidate the interaction domain among interacting elements solely from an ensemble of trajectories. There, only pairs of elements whose distances are shorter than some distance variable, termed cutoff distance, are taken into account in the computation of transfer entropies. The prediction performance in capturing the underlying interaction domain is subject to noise level exerted on the elements and the sufficiency of statistics of the interaction events. In this paper, the dependence of the prediction performance is scrutinized systematically on noise level and the length of trajectories by using a modified Vicsek model. The larger the noise level and the shorter the time length of trajectories, the more the derivative of average transfer entropy fluctuates, which makes it difficult to identify the interaction domain in terms of the position of global minimum of the derivative of average transfer entropy. A measure to quantify the degree of strong convexity at coarse-grained level is proposed. It is shown that the convexity score scheme can identify the interaction distance fairly well even while the position of global minimum of the derivative of average transfer entropy does not. We also derive an analytical model to explain the relationship between the interaction domain and the change of transfer entropy that supports our cutoff distance technique to elucidate the underlying interaction domain from trajectories.

Full PDF

IInferring Domain of Interactions among Elements from Ensemble of Trajectories II.-Eﬀects of ﬁnite length trajectories and external noise

Udoy S. Basak

Graduate School of Life Science, Transdisciplinary Life Science Course,Hokkaido University, Kita 12, Nishi 6, Kita-ku, Sapporo 060-0812, Japan andPabna University of Science and Technology, Pabna 6600, Bangladesh

Sulimon Sattari

Research Center of Mathematics for Social Creativity,Research Institute for Electronic Science, Hokkaido University,Kita 20, Nishi 10, Kita-ku, Sapporo 001-0020, Japan

Hossain M. Motaleb

Research Center of Mathematics for Social Creativity,Research Institute for Electronic Science, Hokkaido University,Kita 20, Nishi 10, Kita-ku, Sapporo 001-0020, Japan andUniversity of Dhaka, Dhaka 1000, Bangladesh

Kazuki Horikawa

Department of Optical Imaging, The Institute of Biomedical Sciences, Tokushima University Graduate School,3-18-15 Kuramoto-cho, Tokushima City, Tokushima, 770-8503, Japan

Tamiki Komatsuzaki

Research Center of Mathematics for Social Creativity,Research Institute for Electronic Science, Hokkaido University,Kita 20, Nishi 10, Kita-ku, Sapporo 001-0020, JapanInstitute for Chemical Reaction Design and Discovery (WPI-ICReDD),Hokkaido University Kita 21 Nishi 10, Kita-ku, Sapporo, Hokkaido 001-0021, JapanGraduate School of Life Science, Transdisciplinary Life Science Course,Hokkaido University, Kita 12, Nishi 6, Kita-ku, Sapporo 060-0812, Japan andGraduate School of Chemical Sciences and Engineering Materials Chemistry and Engineering Course,Hokkaido University, Kita 13, Nishi 8, Kita-ku, Sapporo 060-0812, Japan

Transfer entropy in information theory was recently shown [Phys. Rev. E 102, 012404 (2020)] toenable us to elucidate the interaction domain among interacting particles solely from an ensemble oftrajectories. There, only pairs of particles whose distances are shorter than some distance variable,termed cutoﬀ distance, are taken into account in the computation of transfer entropies. The predic-tion performance in capturing the underlying interaction domain is subject to noise level exerted onthe particles and the suﬃciency of statistics of the interaction events. In this paper, the dependenceof the prediction performance is scrutinized systematically on noise level and the length of trajecto-ries by using a modiﬁed Vicsek model. The larger the noise level and the shorter the time length oftrajectories, the more the derivative of average transfer entropy ﬂuctuates, which makes it diﬃcultto identify the interaction domain in terms of the position of global minimum of the derivative ofaverage transfer entropy. A measure to quantify the degree of strong convexity at coarse-grainedlevel is proposed. It is shown that the convexity score scheme can identify the interaction distancefairly well even while the position of global minimum of the derivative of average transfer entropydoes not.

I. INTRODUCTION

Collective migration is the synchronized movement ofagents emerging from the mutual interactions betweenthem [1, 2]. One of the basic properties of collective mo-tion is that the movement of an individual is inﬂuencedby the movement of other individuals in its local vicin-ity and/or through long range interactions, e.g., via somesignals such as chemicals emitted by cells. At the cellularlevel, collective motion can be observed in wound heal-ing, cancer development, and organogenesis [3–5]. The question of how microscopic interactions between agentsregulate the macroscopic group behavior is one of themost intriguing subjects [6]. This is closely related to theproblem of causal inference within systems composed ofmany agents.For the qualitative understanding of collective motion,a variety of simulation models have been proposed suchas the Reynolds’ ﬂocking model [7], the Vicsek model(VM) [8], and the Couzin model [9]. Among these, theVM has been widely used to study various dynamics ofcollectively moving self-propelled particles, such as sym- a r X i v : . [ n li n . AO ] S e p metry breaking [10, 11], phase transition [8, 12], and clas-siﬁcation of leaders and followers [13, 14]. In the VM,each particle moves with a constant speed and its thedirection is determined by the average direction of mo-tion of its neighboring particles in the presence of noise[15–17]. In other words, a moving particle interacts onlywith the particles within a distance of R as it would viadirect interactions but also via signal transduction suchas chemicals.One of the possible drivers of collectively movingagents is the presence of inﬂuential individuals, some-times referred to as ‘leaders’, who control the movementof the other individuals, referred to as the ‘followers’.Leader-follower relationships have been studied in a ﬁshshoal [18], troops of baboons [19], in a colony of honey bee[20] and so forth. At the cellular level, it has been stud-ied that collective migration of MDCK epithelial cells[21, 22], wound healing [23], cancer growth in breast [24],etc., are regulated by the leader cells.Identifying leader and follower agents is a challengingendeavor. First and foremost, one must identify whatit means to be a leader. Based on asymmetric natureof inﬂuence on activities among entities, in this article,we deﬁne a ‘leader’ as an entity which more inﬂuences(on average) on the activity of the other entities (termed‘followers’). Once leadership has been deﬁned, varioustypes of empirical data, e.g., ensembles of trajectoriesof agents, can be used to infer the diﬀerential inﬂuencein interaction and identify leader-follower relationships.By deﬁnition, leaders are expected to be more persua-sive compared to the followers. Since followers followthe movement of leaders, some correlation should existbetween some physical quantity related to a leader andthat related to a follower with a certain time delay, asinformation cannot travel from a leader to a follower atinﬁnite speed.To measure causal inﬂuence among multivariate timeseries, information theory provides a variety of ap-proaches [25]. Some of the typical quantities used are mu-tual information [26], time-delayed mutual information[27], transfer entropy [27], and causation entropy [28].These quantities have been computed using time lapsemotion data of moving individuals to determine the di-rections of inﬂuence. For example, it was found by usinga zebraﬁsh interaction model that net transfer entropyis a more accurate classiﬁer than extreme-event synchro-nization and cross-correlation for classifying leaders andfollowers [13]. In swarms of bats, transfer entropy wasused to demonstrate that there exists a leader-followerrelationship between the front bat and the rear bat [29].Using the trajectories of handball players, it was showedthat transfer entropy is capable of capturing the causalrelationships between players [30].In the above-mentioned studies, all pairs of agents aretaken into account at every time instance to evaluatetransfer entropy irrespective of the distance between theagents. This is not necessarily an optimal use of the dataavailable for capturing the underlying leader-follower re- lationship, given that the interaction domain is known.It was shown, using a modiﬁed VM, that the classiﬁca-tion scores of leaders and followers signiﬁcantly increaseupon incorporating the identiﬁed interaction domain in-formation compared to the conventional way of trans-fer entropy estimation where the distance between theagents is not taken into account [31]. When two parti-cles come into their interaction domain, they may shareor transfer information which results in some change intheir motion such as the direction of motion. As the dis-tance between them exceeds the interaction radius, theamount of information ﬂow decreases and goes to zero atthe limit of the distance being inﬁnity in a ﬂuctuatingenvironment. This methodology requires that the inter-action domain is known, which may not be the case. Anew scheme has been proposed to estimate the underly-ing interaction domain from the trajectories of particlesto monitor the change in transfer entropy as a functionof the distance between them, called cutoﬀ distance λ . Itwas demonstrated that the derivative of average trans-fer entropy (and also cross correlation) with respect to λ has a minimum near the interaction domain, by whichone can identify the underlying interaction domain froma set of trajectories [31].The scheme is dependent on how transfer entropy canbe estimated so that it takes into account enough statis-tics of interacting particles, and positions and numbersof the minimum of the derivative of average transfer en-tropy along the cutoﬀ distance λ may also be subject tothe extent of external noise and time length of trajec-tories. In this paper, we scrutinize how the predictionperformance in capturing the underlying interaction do-main depends on the size of noise and time length of thetrajectory data. We also examine an alternative schemeexpected to be stable against noises and time length, thatrelies on the degree of convexity at coarse-grained scale inthe derivative of average transfer entropy along the cut-oﬀ distance, and time variance of underlying interactionradius of particles. II. IDENTIFICATION OF LEADERS ANDFOLLOWERS

Transfer Entropy (TE) from time series of a stochas-tic variable X = { ..., x t − , x t , x t +1 , ... } to time series ofanother stochastic variable Y = { ..., y t − , y t , y t +1 , ... } isdeﬁned as [27]: T E X → Y = I ( y t + τ ; x t | y t ) , = (cid:88) y t + τ ,y t ,x t p ( y t + τ , y t , x t ) log ( p ( y t + τ | y t , x t ) p ( y t + τ | y t ) ) , = H ( y t + τ | y t ) − H ( y t + τ | y t , x t ) , (1)where τ is the time lag between the two time instantsand H ( . | . ) represents the conditional Shannon entropy[26]. TE is proven to be non-negative. A positive valueof T X → Y is considered to indicate the causal inﬂuenceof X on Y [32]. For a pair of agents X and Y , the nettransfer entropy from X to Y , deﬁned as N T E X → Y = T E X → Y − T E Y → X can be used to infer the direction ofcausal inﬂuence. A positive N T E X → Y may indicate that Y follows X , which quantiﬁes the causal direction from X to Y .As a classiﬁer to diﬀerentiate leaders and followers, theaverage net transfer entropy is used, which is denoted as χ and deﬁned for a given particle i as follows: χ ( i ) = 1 N − (cid:88) j ( (cid:54) = i ) ( T E i → j − T E j → i ) , where T E i → j represents TE from the particle i to j and N is the total number of particles in the system. Thevalue of χ ( i ) for each particle i is compared to a selectedthreshold value (cid:15) . A particle for which χ ( i ) is higherthan the threshold (cid:15) is identiﬁed as a leader, otherwiseit is identiﬁed as a follower. The resulting classiﬁca-tion is compared to the ground truth to obtain the true-positive rate and the false-positive rate for the chosen (cid:15) . To show the classiﬁcation performance of a classi-ﬁer receiver-operating characteristic curve is used. It isobtained by plotting the true positive rate versus falsepositive rate at diﬀerent values of (cid:15) [13]. In order toquantify the accuracy of the classiﬁer ' s performance andto compare the performance of diﬀerent classiﬁers, areaunder receiver-operating characteristic curve (AUC) hasbeen used [33]. An AUC score of 1.0 means that thatclassiﬁer accurately predicts the identities of the parti-cles whereas a value of 0.5 means that the classiﬁer hasno class separation capacity whatsoever. III. MODIFIED VICSEK MODEL

Similar to the standard VM [8], we consider that N self-propelled particles are moving with the same con-stant speed v in a two-dimensional square box of length L with periodic boundary conditions, and at time t = 0the particles are positioned and oriented randomly. Attime t + 1, the position of i th particle is denoted by (cid:126)r t +1 i is updated at each time step ∆ t as: (cid:126)r t +1 i = (cid:126)r ti + (cid:126)v ti ∆ t, (2)where (cid:126)r ti denotes the position of i th particle at time t , and (cid:126)v ti represents the corresponding velocity of the particlewith an absolute speed v and a direction given by theangle θ i ( t ). This angle is obtained from the followingequation: θ i ( t + 1) = (cid:104) θ ( t ) (cid:105) R, w ,(cid:126)r ti + ∆ θ i . (3)Here (cid:104) θ ( t ) (cid:105) R, w ,(cid:126)r ti is the weighted orientation aver-aged over particles (including the particle i itself),which are within a circle of radius R centered onthe position (cid:126)r ti of the particle i at time t , computed by arctan (cid:104)(cid:80) (cid:48) j w ji sin θ j ( t ) / (cid:80) (cid:48) j w ji cos θ j ( t ) (cid:105) where (cid:80) (cid:48) takes over all j satisfying | (cid:126)r ti − (cid:126)r tj |≤ R [31]. w is amatrix whose element w ij corresponds to the interactionstrength that the particle i exhibits on its neighboringparticle j . If the particle i is a leader and j is a follower,then w ij > w ji . Also the interaction strength of a par-ticle i on itself is 1.0 i.e., w ii = 1 .

0. We set the valuesof leaders ' and followers ' interaction strengths to 1.05and 1.00, respectively. ∆ θ i represents random numberat time t which can be chosen with a uniform probabil-ity distribution from the interval [ − η / , η / η may be considered as a temperature-like parameter. Thetotal time length is designated by T during which trans-fer entropies are estimated between leader and followerparticles. 𝜃 𝑘 𝜃 𝑖 𝜃 𝑗 Δ𝜃 𝑘 Δ𝜃 𝑖 Δ𝜃 𝑗 𝑅 𝑅 𝑅𝜆 𝜆 Figure 1. Schematic diagram of cutoﬀ distance. Ovals repre-sent moving particles. θ i , θ j , and θ k represent the direction ofmotion of receptive particles at time t and R is the interactionradius. λ and λ exemplify two diﬀerent cutoﬀ distances.For example, for the cutoﬀ distance λ = λ , the actual dis-tance between the particles i and j at this time instance islarger than λ , and hence, their orientation information is notconsidered for TE calculation between them even though theparticles are located within each other’s interaction domain(likewise the orientational information between the particle i and k is not taken into account in the computation of TEat λ = λ ). But for λ = λ , particles j and k are both lo-cated within the distance of λ from particle i . Hence theorientation information of θ k and θ i and that of θ j and θ i are considered to compute TE irrespective of the underlyinginteraction radius R . In the VM each particle moves with a constant speed v . In this paper the value of v is set to be 0.3 arb.units. It was found that in the range of 0 . ≤ v ≤ . ' s headings. Hence orienta- . . . . Noise A UC sc o r e (a) N = 2 N = 6 N = 10 N = 15 . . . . Noise A UC sc o r e (b) N = 2 N = 6 N = 10 N = 15 Figure 2. Classiﬁcation scores of diﬀerent numbers of particles at diﬀerent noise levels. (a) Fixed box size. Density changeswith diﬀerent number of particles: ρ = 0.02 arb. units ( N = 2), 0.06 ( N = 6), 0.10 ( N = 10), and 0.15 ( N = 15). (b) Fixeddensity as 0 . L = 4 .

47 arb. units ( N = 2), 7.75 ( N = 6), 10( N = 10), and 12.25 ( N = 15). Interaction radius R T i m e l eng t h T (a) Interaction strength w LF T i m e l eng t h T (b) Figure 3. AUC landscape at diﬀerent time length for diﬀerent interaction radius and interaction strength. (a) AUC landscapecorresponds to η = 1 . π arb. units and w LF = 1 .

05 arb. units. AUC value increases as the time length and interaction radius R increase. (b) AUC landscape correspond to η = 1 . π and R = 1. For this analysis w FL = 1 is ﬁxed and w LF were varied.Similar behaviors were observed at other noise levels (not shown). tion of particles is used to compute transferred informa-tion between them. In this paper, the orientation space[0 : 2 π ] is discretized into six bins of equal size which arerepresented by unique symbols [31].In Eq. (3) the orientation of the particle i at time( t + 1) depends largely on the orientation of itself andnearby particles at time t . The delay time τ is set to be1 for the estimation of TE throughout this paper. Allanalyses were performed with 1000 realizations and foreach realization the values of (cid:126)r ti and (cid:126)r ti at time t = 0 arechosen randomly in Eq. (2). In this paper box size L ,number of particles N , time length T , interaction radius R , are varied to check their eﬀect on classiﬁcation score. IV. CUTOFF DISTANCE

Knowledge of the interaction domain greatly improvesthe classiﬁcation of leaders and followers. In practice,however, one does not know the interaction domain fora group of animals, cells, or birds a priori. To deduceit from an ensemble of trajectories, the ‘cutoﬀ distancevariable’ λ was introduced [31]. In this problem setting,the interaction domain is considered as a circle of radius R , which is unknown, however, in general the same tech-nique can be applied to infer an interaction domain ofany shape. Then for the estimation of TE, the cutoﬀdistance λ is deﬁned as a distance up to which the in-teractions between particles are taken into consideration[Figure 1]. In other words, for the estimation of TE fromthe ‘symbolic time series’ of a particle to another, theprobability distributions are estimated only at the timeinstance t when the distance between those two particlesis less than the cutoﬀ distance λ . Finally the value of λ is varied and TE between particles is computed as afunction of λ . Whenever there is no mention of a cutoﬀdistance λ , e.g. in Section V, it means that the distanceinformation between particles is not considered which isthe conventional way of TE computation [13, 29]. V. RESULTS AND DISCUSSIONS

Figure 4. Distributions of the classiﬁers χ TE (/bits) from theleader to the others, and those from a follower to the othersfor R = 3, η = 1 . π , and (a) T = 100 ,

000 arb. units, (b) T = 50 , T = 25 , T = 12 , Figure 2 shows the AUC classiﬁcation score (=the ex-tent of how leader and followers are correctly classiﬁedfrom their orientation dynamics) with diﬀerent numbersof particles N and noise levels η . Here, only one particleserves as a leader and the other ( N −

1) particles serve asfollowers, under the constraint of box size L [Fig. 2(a)]or density ρ [Fig. 2(b)]. At low noise level η ∼ η > . π , the distributions of transferentropies χ TE from leader to follower and vice versa over1000 realizations were found to signiﬁcantly overlap witheach other, which makes diﬀerentiation between leaderand follower diﬃcult. The low AUC at very low noiselevel arises from the fact that particles fall into someconcerted motion very quickly dependent solely on ini-tial conﬁgurations, resulting in insuﬃcient sampling oforientational dynamics, and in turn the low AUC at veryhigh noise arises from overshadowing of the interactionsby random noises [31]. As the number of particles in-creases, the AUC score decreases in both ﬁxed box sizeand ﬁxed density cases. This is because, in any pair of Figure 5. Distributions of the classiﬁers χ TE (/bits) from theleader to the others, and those from a follower to the othersfor T = 100 , η = 1 . π , and (a) R = 4 .

0, (b) R = 3 .

0, (c) R = 2 .

0, and (d) R = 1 . particles for which TE is evaluated, their motions arealso inﬂuenced by the other particles, and the more thenumber of particles increases, the more the motions ofthe particles in question are inﬂuenced by the third (orhigher) particle. Compared the ﬁxed box size case to theﬁxed density case, the AUC score for the ﬁxed box sizequickly drops to 0.5 (corresponding to a coin toss) around η ∼ . π than those for the ﬁxed density condition. Itsuggests that, as the density gets higher, the motion ofparticles in any pair are more inﬂuenced by additionalparticles.The eﬀects of time length T , interaction radius R , andleaders ' interaction strength w LF on classiﬁcation scoreare shown in Fig. 3. In this analysis 10 particles wereused, with one serving as a leader and the other 9 par-ticles as followers. In Fig. 3(a) we varied time length T and interaction radius R . It is shown that the AUCscore increases with T and R . For shorter T , due to in-suﬃcient sampling in characterizing leader-follower inter-action relationship, the distributions of χ TE from leaderto the others and from follower to the others have highervariance as shown in Fig. 4. As a result, it is diﬃcultto distinguish leader and followers for short T . As T increases, due to more exploration of interaction eventsbetween leader and followers, the variance of leaders’ andfollowers’ distributions gets smaller, making the classiﬁ-cation easier. Figure 5 shows, in turn, the interactionradius R dependency on distributions of the classiﬁers χ TE . Larger R allows particles to be taken into ac-count in elucidating the leader-follower interaction rela-tionship, which produces easily distinguishable distribu-tions of leader and followers as shown in Fig. 5(a). In .

25 0 .

75 1 .

25 1 .

75 2 .

25 2 .

75 3 .

25 3 .

75 4 .

25 4 . Cutoff distance N (a) .

25 0 .

75 1 .

25 1 .

75 2 .

25 2 .

75 3 .

25 3 .

75 4 .

25 4 . Cutoff distance N (b) Figure 6. AUC landscape for diﬀerent numbers of particles at diﬀerent cutoﬀs. Actual interaction radius R used for thetrajectory calculation is 3 . η = π . (a) Fixed box size. The highest AUC scores and their locations are: 0.999at λ = 3 .

25 arb. units ( N = 2), 0.984 at λ = 3 .

25 ( N = 4), 0.946 at λ = 3 . N = 6), 0.926 at λ = 3 .

25 ( N = 8), 0.903 at λ = 3 . N = 10), 0.878 at λ = 3 . N = 12), and 0.846 at λ = 3 . N = 15). (b) Fixed density. The highest AUC scores andtheir locations are: 0.999 at λ = 3 . N = 2), 0.986 at λ = 3 .

25 ( N = 4), 0.947 at λ = 3 . N = 6), 0.921 at λ = 3 . N = 8),0.897 at λ = 3 . N = 10), 0.870 at λ = 3 . N = 12), and 0.851 at λ = 3 . N = 15). (a) T =12,000 T =25,000 T =50,000 T =100,000 (b) T =12,000 T =25,000 T =50,000 T =100,000 (c) T =12,000 T =25,000 T =50,000 T =100,000 Figure 7. Derivative of average TE for diﬀerent time length along with local minima and identiﬁed interaction radius based onglobal minimum scheme at (a) η = 0 . π , (b) η = 1 . π and (c) η = 1 . π . The actual interaction radius R used for ensembleof trajectories is 3. Circles represent all local minima and ﬁlled-circles represent the identiﬁed interaction radius at each timelength T . contrast, when R is smaller, the χ TE distributions of lead-ers and followers overlap each other more with smallervariance, making the classiﬁcation more diﬃcult.In Fig. 3(b) we set R = 1 . w LF . In thisanalysis followers’ interaction strength is set to 1.0, i.e., w FL = w FF = 1 .

0. Hence w LF = 1 . w LF increases AUC value also increases as the leader isgetting more inﬂuential on followers which makes classi-ﬁcation easier even at short time length.How does the predictability of interaction radius byusing TE with cutoﬀ distance depend on the number of particles? Figure 6 represents the AUC landscape as afunction of number of particles N and cutoﬀ distance λ .Figure 6(a) corresponds to a ﬁxed box size (i.e. densityis changing with N ), whereas Fig. 6(b) represents thesystems having same density (i.e. L is changing with N ). The actual interaction radius used to simulate thetrajectories of the particles was R = 3 and noise was setto η = π . Although at each N the maximum AUCsare located at the underlying interaction radius R = 3 . R = 5 √ (a) T =12,000 T =25,000 T =50,000 T =100,000 (b) T =12,000 T =25,000 T =50,000 T =100,000 (c) T =12,000 T =25,000 T =50,000 T =100,000 Figure 8. Derivative of average TE for diﬀerent time lengths along with the identiﬁed interaction radius based on convexityscore scheme at (a) η = 0 . π , (b) η = 1 . , π and (c) η = 1 . π for { n } = { n | ≤ n ≤ } and { M } = { M | ≤ M ≤ } and δ = 1 × − . between particles. Similar behaviors are observed at dif-ferent interaction radius R and noise levels (not shownhere).How can one infer the underlying interaction radiussolely from ensembles of trajectories? Recently, a sim-ple scheme has been proposed [31] to infer the underly-ing interaction distance from ensembles of trajectories,based on the existence of a signiﬁcant decrease in aver-aged transfer entropy when cutoﬀ distance λ exceeds theunderlying interaction radius. The interaction distanceis inferred as that where the minimum of the derivativesexists along the cutoﬀ distance λ :ˆ R ≡ argmin λ d (cid:104) TE (cid:105) λ dλ , under the condition of d (cid:104) TE (cid:105) λ dλ = 0. In practice, thelength of trajectories may not be long enough and shorterlength tends to result in a ﬂuctuation in the course of TEalong the cutoﬀ distance λ , resulting in apparent minima.Figure 7 shows the derivative of average TE as a func-tion of cutoﬀ distance λ , denoted by d (cid:104) T E (cid:105) λ dλ for interac-tion radius R = 3 at (a) η = 0 . π , (b) η = 1 . π , and (c) η = 1 . π for diﬀerent T . Here circles represent all localminima and the ﬁlled-circles represent the global mini-mum identiﬁed as the interaction radius. In Figs. 7(b)and 7(c) for relatively short T = 25 ,

000 or less, d (cid:104) T E (cid:105) λ dλ as a function of λ has the global minimum at low λ . Thelocations of these global minima for short time lengthchange with T and they vanish when T is longer, andthe longer T = 50 , − ,

000 both result in a closevalue of the underlying interaction radius R = 3. Thisimplies that to look for global minimum of derivative oftransfer entropy may not necessarily result in an approx-imation of the true interaction radius especially for someshort T at high noise levels.In this paper, we present another scheme expectedto be robust against ﬂuctuations of average transfer en- tropies along the cutoﬀ distance by introducing a mea-sure to quantify the degree of strong convexity at coarse-grained level, and time variance of underlying interactionradius of particles as follows.Note in the insets of Fig. 7 that the shape of d (cid:104) T E (cid:105) λ dλ as a function of λ is (strongly) convex near the actualinteraction radius irrespective of time length T . This in-dicates that the derivative of transfer entropy as a func-tion of cutoﬀ distance can shed light on the underlyingspatial scale of interactions among particles. However, itis not trivial to devise a scheme to automatically inferthe interaction radius. Since d (cid:104) T E (cid:105) λ dλ as a function of λ isconvex near the interaction radius, a measure of convex-ity of d (cid:104) T E (cid:105) λ dλ is versatile in determining the interactionradius. In general, due to noise, d (cid:104) T E (cid:105) λ dλ can be ﬂuctu-ated, producing apparent convex patterns locally. Thusin deﬁning the convexity score, it is necessary to cap-ture the non-local feature of d (cid:104) T E (cid:105) λ dλ rather than the localfeature that may be subject to noise(s). We deﬁne theconvexity score κ ( λ i ) of a function f ( λ ) at a point λ i as κ ( λ i ) = M (cid:80) Mm =1 σ i ( m ) where σ i ( m ) = +1 if f ( λ i − m ) − f ( λ i ) > δ and f ( λ i + m ) − f ( λ i ) > δ and σ i ( m ) = − f ( λ i ) − f ( λ i − m ) > δ and f ( λ i ) − f ( λ i + m ) > δ , otherwise σ i ( m ) = 0. Here δ is a non-negative small number and M is the number of neighboring points used to determinethe convexity score, and − ≤ κ ( λ i ) ≤

1. As the functionof f ( λ ) we employ a simple moving averaged d (cid:104) T E (cid:105) λ dλ withan interval n : f ( λ i ) = 1 /n (cid:80) n − j =0 d (cid:104) T E (cid:105) λ /dλ | λ = λ i + j .Thus, around a point λ i where d (cid:104) T E (cid:105) λ dλ is convex atsome coarse-grained level κ ( i ) is close to unity. Hencethe point λ i around which κ ( λ i ) is the maximum is con-sidered to be an indicator of the interaction radius abovewhich average transfer entropy between particles signiﬁ-cantly decreases.How to choose the optimal n , M , and δ ? Wedeﬁne the estimated interaction radius ˆ R as ˆ R ≡ Figure 9. Relative error ˆ R in identifying underlying interaction domain using global minimum scheme at (a) η = 0 . π , (b) η = 1 . π , and (c) η = 1 . π . Cross-marked boxes ‘NaN’ mean that any minimum of derivatives was not detected.Figure 10. Relative error ˆ R in identifying underlying interaction domain using convexity score scheme with δ = 1 × − and { n } = { n | ≤ n ≤ } , { M } = { M | ≤ M ≤ } at (a) η = 0 . π , (b) η = 1 . π , and (c) η = 1 . π . Cross-marked boxes‘NaN’ mean that the no point was detected having non-zero convexity score κ . argmax λ κ ( n, M ; T ), where T represents the time length.Then the cost function is deﬁned as C ( n, M ) ≡ (cid:88) T (cid:88) T (cid:48) | ˆ R ( n, M ; T ) − ˆ R ( n, M ; T (cid:48) ) | , (4)by assuming that the interaction radius is inde-pendent of time, i.e., time-invariant, and there ex-ists (approximately) suﬃcient statistics for each timelength in elucidating the interaction events. Theparameters ( n, M ) is determined so that ( n, M ) =argmin n ∈{ n } ,M ∈{ M } C ( n, M ). Here the sets of n and M to be searched for ﬁnding optimal n and M , { n } and { M } , are those users need to input a priori . Note thatfor some time length T , ˆ R ( n, M ; T ) could not be chosenuniquely due to the degeneracy of κ ( n, M ; T ) and alsoˆ R ( n, M ; T ) could become undeﬁned when d (cid:104) T E (cid:105) λ dλ curvedoes not has any strongly convex part. In both the cases,we exclude such T in Eq. (4) to compute the cost function C ( n, M ) in deﬁning optimal n and M . We also varied δ = 0 , × − , × − , × − (arb. units) and foundthat within this range of δ has no signiﬁcant eﬀect on theestimation of interaction radius.Figures 9 and 10 show the relative errors (∆ R ) inidentifying the underlying interaction domain using the global minimum of the derivative of transfer entropy andconvexity score scheme, respectively, as a function of timelength and interaction radius at three diﬀerent noise lev-els. Here relative error (∆ R ) is deﬁned as ∆ R = | R − ˆ R | R ,where ˆ R is the identiﬁed interaction radius using eitherof the two schemes.For the global minimum (of TE derivative) scheme[Fig. 9], there exists a clear trend such that the largerthe noise level the larger the relative error, and as thetime length T decreases, the relative errors are more pro-nounced for relatively large noise levels η = 1 . π − . π .Because of the appearance of a minimum at low cutoﬀdistance for short T that ceases to exist for longer T inFigs. 7(b) and 7(c), the global minimum scheme appar-ently possesses higher relative error for short T [Figs. 9(b) and 9 (c)]. Note also that even when the time length T is suﬃciently long, e.g., T = 50000 − η = 1 . π , and inter-action radius R is short, i.e., R = 1 . − . R is short ( R = 1 . ,

2) and noise levelis very high [Fig. 10(c)].

VI. CONCLUSIONS

In this study, we examined the performance of twoheuristic schemes using the derivative of transfer entropywith respect to cutoﬀ distance λ , d (cid:104) T E (cid:105) λ dλ : global mini-mum and convexity score scheme for determining the in-teraction radius by using the modiﬁed VM. The strikingfeature -based on which we proposed the two schemes-is that d (cid:104) T E (cid:105) λ dλ exhibits a kink near the actual interac-tion radius. A method that is capable of determiningthe exact location of that kink can be used in inferringinteraction radius.For short time length (at the moderate and high levelof noise) for the modiﬁed VM, the derivative of aver-age TE exhibits a minimum at low cutoﬀ distances thatproduce a relatively high error for the global minimumscheme. Moreover, in real experiments it is not neces-sarily possible to get suﬃciently long trajectories withsmall noises, and the global minimum scheme may yieldsome non-negligible error especially for short trajecto-ries with noise. In this paper, an alternative scheme waspresented, based on the property of convexity of d (cid:104) T E (cid:105) λ dλ at the coarse-grained level and the assumption of time-invariance of the underlying interaction domain. For themodiﬁed VM, the scheme could capture the underlyinginteraction radius at the low and moderate levels of noiseespecially for relatively short time length, ca. T =12,000-25,000 for which global minimum scheme possesses highrelative error (at moderate noise level). These two heuris-tic schemes are compliment to each other and, as forappropriate usages, one should ﬁrst visualize transfer en-tropy as a function of cutoﬀ distance λ with its deriva-tives with respect to cutoﬀ distance to conﬁrm the exis-tence of abrupt changes along the cutoﬀ distance. Sig-niﬁcant changes in the derivative of TE with respect tocutoﬀ distance were also observed for classical trajecto-ries of particles interacting via Lennard-Jones potential(not shown), and the existence of some signiﬁcant changealong cutoﬀ distance around the typical length scale ofinteractions may be ubiquitous.In systems with many variables, identifying causal re-lationships is a daunting task. An important aspect ofsystems that should be exploited, however, is that partic-ular variable may be only inﬂuencing another particularvariable at certain time instances. We have shown thatﬁltering out those time instances where inﬂuence does not occur greatly improves the identiﬁcation of causal rela-tionships. In the Vicsek model, for example, two agentsmay only interact when their distance is less than a cer-tain threshold. To pose this as a question, we wonder atwhich value of interaction radius R does the motion ofone agent inﬂuence the motion of another? More gener-ally, we ponder the question: at which levels of variable X does variable Y inﬂuence variable Z ? In future work,we will demonstrate the applicability of this method byapplying it to data sets stemming from diﬀerent ﬁelds. ACKNOWLEDGMENTS

We thank Prof. A. Nakamura, Prof. M. Toda, andProf. S. Sawai for their valuable discussions. This workwas supported by a Grant-in-Aid for Scientiﬁc Researchon Innovative Areas “Singularity Biology (No.8007)”(18H05413), MEXT, and by JSPS (No. 25287105 and25650044 to T.K.), and JST/CREST (No. JPMJCR1662to T.K.).

Appendix A: Estimated interaction radius at R = 2 . Figure (11) shows the derivative of average TE for dif-ferent time lengths along with the identiﬁed interactionradius based on convexity score scheme at diﬀerent noiselevels. It has been found that at low and moderate noiselevels the convexity score scheme identiﬁes the interac-tion radius satisfactorily [Figs. 11(a) and 11(b)]. But atvery high level of noise ( η ≈ . π ) this scheme performssatisfactorily for longer T but possesses higher relativeerror for shorter T [Fig. 11(c)]. Appendix B: Convexity score

According to the deﬁnition convexity score κ is maxi-mum at the point for which the d (cid:104) T E (cid:105) λ dλ is strongly convexand κ is minimum where the curve is concave. Figure(12) shows the convexity score for very high noise level( η = 1 . π ) at diﬀerent interaction radii R . For R = 1 . T ( T = 12 , − , λ . Since d (cid:104) T E (cid:105) λ dλ has no convex point for T = 50 , − , R = 1 .

5, the convexity score scheme fails to identifyinteraction radius. For R = 2 . T = 50 ,

000 but it identiﬁes inter-action radius for T = 100 ,

000 [Fig. 12(b)]. For R = 2 . T ( T = 50 , − , (a) T =12,000 T =25,000 T =50,000 T =100,000 (b) T =12,000 T =25,000 T =50,000 T =100,000 (c) T =12,000 T =25,000 T =50,000 T =100,000 Figure 11. Derivative of average TE for diﬀerent time lengths along with the identiﬁed interaction radius based on convexityscore scheme at (a) η = 0 . π , (b) η = 1 . , π and (c) η = 1 . π for R = 2 . δ = 1 × − , and { n } = { n | ≤ n ≤ } and { M } = { M | ≤ M ≤ } . Cutoff distance C on v e x i t y sc o r e (a) T =12,000 T =25,000 T =50,000 T =100,000 Cutoff distance -0.4-0.200.20.4 C on v e x i t y sc o r e (b) T =12,000 T =25,000 T =50,000 T =100,000 Cutoff distance -0.4-0.200.20.4 C on v e x i t y sc o r e (c) T =12,000 T =25,000 T =50,000 T =100,000 Figure 12. Convexity score κ at (a) R = 1 .

5, (b) R = 2 .

0, and (c) R = 2 . η = 1 . π .[1] A. Grada, M. Otero-Vinas, F. Prieto-Castrillo, Z. Obagi,and V. Falanga, Journal of Investigative Dermatology , e11 (2017).[2] E. Theveneau and C. Linker, F1000Research (2017).[3] P. Friedl and D. Gilmour, Nature reviews Molecular cellbiology , 445 (2009).[4] A. Haeger, K. Wolf, M. M. Zegers, and P. Friedl, Trendsin cell biology , 556 (2015).[5] X. Trepat, M. R. Wasserman, T. E. Angelini, E. Millet,D. A. Weitz, J. P. Butler, and J. J. Fredberg, Naturephysics , 426 (2009).[6] W. M. Lord, J. Sun, N. T. Ouellette, and E. M. Bollt,IEEE Transactions on Molecular, Biological and Multi-Scale Communications , 107 (2016).[7] C. W. Reynolds, in Proceedings of the 14th annual con-ference on Computer graphics and interactive techniques (1987) pp. 25–34.[8] T. Vicsek, A. Czir´ok, E. Ben-Jacob, I. Cohen, andO. Shochet, Phys. Rev. Lett. , 1226 (1995).[9] I. D. Couzin, J. Krause, R. James, G. D. Ruxton, andN. R. Franks, Journal of theoretical biology , 1(2002). [10] W. Li, H.-T. Zhang, M. Z. Chen, and T. Zhou, Phys.Rev. E , 021920 (2008).[11] A. Creppy, F. Plourabou´e, O. Praud, X. Druart, S. Cazin,H. Yu, and P. Degond, Journal of The Royal SocietyInterface , 20160575 (2016).[12] B. Szab´o, G. J. Sz¨oll¨osi, B. G¨onci, Z. Jur´anyi,D. Selmeczi, and T. Vicsek, Phys. Rev. E , 061908(2006).[13] S. Butail, V. Mwaﬀo, and M. Porﬁri, Phys. Rev. E ,042411 (2016).[14] V. Mwaﬀo, S. Butail, and M. Porﬁri, Frontiers inRobotics AI , 35 (2017).[15] M. Aldana, V. Dossetti, C. Huepe, V. M. Kenkre, andH. Larralde, Phys. Rev. Lett. , 095702 (2007).[16] Z. Liu and L. Guo, Science in China Series F: InformationSciences , 848 (2008).[17] H. Chat´e, F. Ginelli, G. Gr´egoire, F. Peruani, andF. Raynaud, The European Physical Journal B , 451(2008).[18] J. Krause, D. Hoare, S. Krause, C. Hemelrijk, andD. Rubenstein, Fish and Fisheries , 82 (2000).[19] C. Sueur, BMC ecology , 26 (2011). [20] T. D. Seeley, The wisdom of the hive: the social phys-iology of honey bee colonies (Harvard University Press,2009).[21] N. Yamaguchi, T. Mizutani, K. Kawabata, and H. Haga,Scientiﬁc reports , 1 (2015).[22] M. Reﬀay, L. Petitjean, S. Coscoy, E. Grasland-Mongrain, F. Amblard, A. Buguin, and P. Silberzan,Biophysical journal , 2566 (2011).[23] T. Omelchenko, J. Vasiliev, I. Gelfand, H. Feder, andE. Bonder, Proceedings of the National Academy of Sci-ences , 10788 (2003).[24] K. J. Cheung, E. Gabrielson, Z. Werb, and A. J. Ewald,Cell , 1639 (2013). [25] K. Hlav´aˇckov´a-Schindler, M. Paluˇs, M. Vejmelka, andJ. Bhattacharya, Physics Reports , 1 (2007).[26] T. M. Cover and J. A. Thomas, Elements of informationtheory (Wiley Interscience, New York, 2006).[27] T. Schreiber, Phys. Rev. Lett. , 461 (2000).[28] J. Sun and E. M. Bollt, Physica D: Nonlinear Phenomena , 49 (2014).[29] N. Orange and N. Abaid, Eur. Phys. J. , 3279 (2015).[30] K. Itoda, N. Watanabe, and Y. Takefuji, Procedia Com-puter Science , 85 (2015).[31] U. S. Basak, S. Sattari, K. Horikawa, and T. Komat-suzaki, Phys. Rev. E , 012404 (2020).[32] R. G. James, N. Barnett, and J. P. Crutchﬁeld, Phys.Rev. Lett. , 238701 (2016).[33] J. A. Hanley and B. J. McNeil, Radiology143