System Size Identification from Sinusoidal Probing in Diffusive Complex Networks
SSystem Size Identification from Sinusoidal Probing
Melvyn Tyloo and Robin Delabays School of Engineering, University of Applied Sciences of Western Switzerland HES-SO, CH-1951 Sion, Switzerland. Automatic Control Laboratory, ETH Zurich, CH-8092 Zurich, Switzerland (Dated: September 17, 2020)One of the most fundamental characteristic of a complex system is its size (or volume), which,in many modelling, is represented by the number of its individual components. Complex systemsunder investigation nowadays are typically large and/or time-varying, rendering their identificationchallenging. We propose here an accurate and efficient method to determine the size of (i.e., numberof agents in) a complex dynamical system. Our only requirement is to be able to inject a probingsignal at any point of the system and to measure its response to our probing. The complexity ofthe approach depends only on the length of the measurements and not on the size of the systeminvestigated, which renders it extremely efficient.
Introduction.
At the era of big data , access to high-resolution (in space and time) measurements is increas-ingly easy. In parallel, the ever-improving computationalpower of computers allows for the analysis of such largesets of data [1]. Among many examples, these consider-ations apply to domains ranging from electrical grid tosocial networks and gene regulatory networks.
Phasormeasurement units [2] are broadly installed in the elec-trical grid and provide high-frequency measurements ofvoltage and current over large interconnected areas of thegrid. Similarly, online social networks gather a significantfraction of the world population with the ability to mon-itor their opinion in real time [3]. And modern biochem-ical technologies allow to assess the expression of a largevariety of genes governed by gene regulatory networks [4].Overall, it is a general trend that an increasing number ofdata are available covering larger and larger physical andvirtual systems. With maching learning leading the way,modern technologies promise to leverage this increasingamount of data to improve the wellbeing of humanity inthe future [5].However, the other side of the coin of this tremen-dous amount of data is that it comes with significantuncertainties. Indeed, it is technically impossible to per-manently monitor the monitoring system itself. Somemeasurements might be inaccurate, wrong, or even miss-ing. Furthermore, in general, it is not possible to guaran-tee that each component of a large system will be mon-itored at all time, either due to the number of units tobe monitored or to the time-varying nature of the sys-tem’s components. These unavoidable inaccuracies anduncertainties can jeopardize the efficacy of data-basedtechnologies.In this scope, it is of particular interest to be able torecover the system’s characteristics (parameters, internalstructure, etc.) from the measured data. The more effi-cient such inference method are, the closer to real-timeit can be performed, allowing an accuate picture of thesystem at all time. For instance, recovering the underly-ing structure of a network of dynamical agents, based onmeasurements, has been an active topic of research along the last decades [6–13]. However, the majority of thoseapproaches rely on the knowledge of most (if not all) theagents composing the system. Sometimes, a subset ofthese agents is not accessible to the observer, renderingthe recovery of the network harder, if not impossible [14].In the worst situations, it even happens that the oberverdoes not even know the actual size of the system [15].The number of components in a system is one of itsmost fundamental characteristics and is often unknown,especially for large, time-varying systems. A crucial stepin the process of system identification is then to recoverit as efficiently as possible, i.e., as accurately and withas few measurements as possible. Despite its apparentsimplicity, this problem is actually not trivial and surpiz-ingly underinvestigated given its fundamental relevancein the scope of system identification. The authors of [14]proposed an approach to locate hidden nodes in a net-worked dynamical system, which relies on the compari-son between measured and predicted trajectories of theaccessible agents of the system. This method requiresa good knowledge of the differential equations determin-ing the dynamics of the system in order to numericallyintegrate them. Moreover, the authors assume that thenumber of unaccessible agents is very small (usually onlyone).As far as we can tell, the most up-to-date approachto recover the number of agents in a system has beendetailed in [15]. This approach recovers the number ofunits as the rank of a detection matrix constructed withtime series of the measured units. Namely, it requiresto observe the system at k time steps, along M differ-ent trajectories, and if k and M are large enough, inprinciple, the total number of dynamical units can be re-covered. In summary, if N is the number of observableunits, this method relies on N kM measured values andrequires to be able to set the system in M different initialconditions. A recent Letter [16] elegantly draws the linkbetween the detection matrix of [15] and the observabilitymatrix commonly used in control theory [17]. Ref. [16]shows that the approach proposed in [15] can unambigu-ously determine the system size if and only if the system a r X i v : . [ n li n . AO ] S e p is observable in the control-theoretic sense.We show here that an accurate determination of thenumber of units composing a physical system can pro-ceed via measurement of the trajectory of one or twodynamical units, which reduces significantly the numberof measurements compared to the state-of-the-art litera-ture [15]. The cost of our approach is that we require tobe able to inject a probing signal at one of the nodes ofthe network. In counterpart, we are able to accuratelyrecover the number of agents by injecting a probing sig-nal at unit i and by measuring the induced response atunit j (possibly equal to i ). Moreover, the numericalcomplexity of our method is independent of the systemsize, rendering it extremely scalable.We argue that our method can hardly be more efficient.Indeed, extracting information from a system requires tobe able to measure some output (we require one time se-ries) and to have some information about the inputs thattriggered the measured response (we require a controlledprobing signal). System of coupled units.
Let us consider a generalsystem of n coupled dynamical units with first-order dy-namics˙ x i = ω i − n (cid:88) j =1 a ij f ij ( x i − x j ) + ξ i , i = 1 , ..., n , (1)where x i ∈ M is the time-varying value of the i th agent,evolving on a one-dimensional manifold M , ω i ∈ R is thenatural driving term of agent i , and ξ i will be used as aninput to the system. Two agents i and j are interactingif a link between them exists in the interaction network,i.e., if and only if the corresponding term of the adjacencymatrix a ij = 1. We assume that the interaction graph isconnected (otherwise we restrict ourselves to a connectedcomponent). The interaction function between i and j is an odd, differentiable function f ij : R → R , and weconsider the coupling to be attractive ( ∂f ij /∂x >
0) inan interval around x = 0.If a fixed point x ∗ ∈ M n exists, one can linearizeEq. (1) around it, which yields, for a small deviation δx = x − x ∗ , to the approximate dynamics˙ δx = − J ( x ∗ ) δx + ξ , (2)where we use the Jacobian matrix of Eq. (1), J ij ( x ∗ ) = (cid:40) − a ij ∂ x f ij ( x ) (cid:12)(cid:12) x = x ∗ i − x ∗ j , if i (cid:54) = j , (cid:80) k (cid:54) = i ∂ x f ik ( x ) (cid:12)(cid:12) x = x ∗ i − x ∗ k , if i = j . (3)One can verify that the structure of the interac-tions implies that the Jacobian J is a weighted directedLaplacian matrix of the interaction graph. Let us de-notes its right- (resp. left-) eigenvectors u , ..., u n (resp. v , ..., v n ). From now on, we will focus on stable fixedpoints of Eq. (1), implying that the eigenvalues have non-negative real part, 0 = λ < Re( λ ) ≤ ... ≤ Re( λ n ) (note that one eigenvalue is always zero) and the eigenvectorsform a basis of R n .Equation (2) is then solved by expanding the devi-ation δx over the eigenvectors v α of J , i.e., δx i ( t ) = (cid:80) α c α ( t ) v α,i , yielding a set of Langevin equations in c α ,whose solution is c α ( t ) = e − λ α t (cid:90) t e λ α t (cid:48) v (cid:62) α ξ ( t (cid:48) )d t (cid:48) , (4)for α = 1 , ..., n. A detailed derivation of this result canbe found in [18], we do not reproduce it here for sake ofbrevity.
Sinusoidal probing.
We will say that a pair of agents( i, j ) can be probed if we have the ability to inject a prob-ing signal at one of them and to measure the response atthe other. Note that we accept that i = j .We propose now to inject a sinusoidal signal at agent i and to measure its impact at agent j . Let ξ i ( t ) = a sin( ω t ) , (5)be the probing signal at agent i . We do not inject a prob-ing signal at other nodes. To guarantee a minimal impacton the operation of the system, we keep the amplitude a and most importantly the frequency ω to small val-ues. Keeping a small probing frequency guarantees thatthe system can adapt to the input and follow the probingsignal. More precisely, a probing frequency can be qual-ified as ”small” as long as it is smaller than the smallest(nonzero) eigenvalue of the Jacobian matrix J in absolutevalue.Introducing Eq. (5) into Eq. (4), and recombining theeigenmodes yields the following response measured atagent j while probing agent i , x ij ( t ) = (cid:88) α v α,i u α,j a λ α + ω × (cid:2) λ α sin( ω t ) + ω e − λ α t − ω cos( ω t ) (cid:3) , (6)which, in the long time limit λ α t (cid:29) ω (cid:28) λ α , yields x ij ( t ) = J † ij a sin ( ω t ) + a nω [1 − cos ( ω t )] , (7)where the † denotes the Moore-Penrose pseudo-inverse.From here, we distinguish two cases. First, if it is pos-sible to probe the system with a signal frequency which ismuch smaller than all characteristic times of the systems,i.e., ω (cid:28) λ α for all α = 2 , ..., n , then the first term in theright hand side of Eq. (7) can be neglected with respectto the second term. Considering the maximal value ofthe trajectory of x j , one then getsmax t | x ij ( t ) | ≈ a nω , (8)from which one obtains an estimate for the number ofnodes as, ˆ n = 2 a ω max t | x ij | . (9)The only requirement is that the time series is longenough to reach the maximum of the trajectory.Note that we take the maximum to have a better ac-curacy in the estimation. However one can choose a par-ticular time step t , keeping in mind that t too short leadsto vanishing values for Eq. (7).The second case we consider is when, for some reason,it is not possible to probe with a signal with sufficientlylow frequency, meaning that the first term in the righthand side of Eq. (7) cannot be neglected with respectto the second. In this case, we are able to estimate thenumber of agent in the system provided we can measurethe trajectories of two distinct agents j and k while weinject the probing signal at agent i (which, again, can be j or k ). One realizes that the trajectories x ij ( t ) and x ik ( t )differ in the first term of the right hand side of Eq. (7).But this term vanishes for all t = kπ/ω , k ∈ Z . At thesetime steps, all trajectories then coincide, and take valueeither zero, or ˆ x := 2 a /nω . The numer of agents canthen be estimated as ˆ n = 2 a ω ˆ x . (10)As previously, we require that the measured trajectoriesare long enough to reach the intersection.Note that if, for some reason, J † ij = J † ik (which we can-not exclude a priori), then the two trajectories coincidefor all t . In this case, n cannot be estimated based onthese two trajectories (which are technically the same). Numerical validation.
In order to get numerical con-firmation of our results, we will apply them to the Ku-ramoto model on three different interaction graphs. Eachsystem has the same number of units, n = 3809, but sig-nificantly different network structure: ER:
An Erd¨os-R´enyi graph with edge probability p =0 .
003 ( m = 21444 in our realization); SW:
A Small-World constructed following the Watts-Strogatz process [19], with m = 38090 edges andrewiring probability p = 0 . EU:
The more realistic network of the PanTaGruElmodel of the European interconnected high voltagegrid [20, 21], composed of m = 4955 edges.For probing, we use a sinusoidal signal with controlledamplitude and frequency, similarly as what is typicallyused to identify eigenmodes in electrical networks [22].The relative errors of the estimate of n for each of thethree networks considered and for different probing fre-quencies is shown if Fig. 1. For small frequency of prob-ing [panels (g-i)], all trajectories almost coincide and we can use Eq. (9) to estimate the number of agents. Forlarger frequencies [panels (a-f)], the trajectories are dis-tinct, but intersect all at the same time step, and Eq. (10)can be used to estimate the number of nodes. Overall,we see an extremely good accuracy with the trajecto-ries considered, except in Fig. 1(c). We recall that thecomputation time required by these estimations is inde-pendent of the actual size of the system as we only needto work with one or two time series.Determining what a ”small” probing frequency de-pends on the Jacobian matrix. More precisely, a probingfrequency can be considered as ”small” if it is sufficientlysmaller than all the intrinsic time scales of the system,i.e., smaller than all eigenvalues of the Jacobian. In prac-tice, this means that the intrinsic dynamics of the systemrelax before the probing signal significantly changes. Ifthe probing frequency is not small enough, the systemwill not relax before the probing significantly changes.In such a case, our approximation Eq. (7) will be inac-curate and consequently the estimate ˆ n as well. Thisexplains why the estimate in Fig. 1(c) is not very accu-rate. Also, if J has a lot of small eigenvalues (in the sensethat they are close to λ ), the addition of all the slow-relaxing modes will take a long time to have a negligiblecontribution to the trajectories Eq. (6). An accurate es-timate of ˆ n will then require a smaller probing frequency,and this explains why different ω /λ ratios are consid-ered for different networks in Fig. 1. In Fig. 2, we showthe histograms of the eigenvalues of the Jacobian for eachsystem considered in Fig. 1. Unravelling with more preci-sion the relation between the distribution of eigenvaluesand the probing frequency will be the purpose of a futurework. Conclusion.
As far as we can tell, we improved thecurrent state-of-the-art approaches to estimate the sizeof a complex dynamical system, based on one time seriesmeasurement solely. The computation cost is low andindependent of the system’s size. The only requirementsare that we need to be able to probe the system with asignal (typically with low amplitude and frequency) andto measure its response.In our opinion, the performance of our method (interms of cost of data acquisition and computation) isclose to be as optimal as possible. Indeed, in order toget information about the system, one needs to measuresomething, i.e., at least one output, which is what wedo. Furthermore, to be able to analyze the output of thesystem, the observer needs some information about theinput that triggered the response. Aside of our approachwhich is to have a direct control on the input signal, onecould monitor and analyze a non-controlable input sig-nal [13], but this would require more computation time.A possible improvement could be to rely on shorter timeseries, which can be long in our case, due to the low fre-quency of the probing signal.
Figure 1. Histogram of the relative error on the number of nodes obtained for the two methods Eq. (9) and Eq. (10). Weconsider an Erd¨os-R´enyi graph (first row) with m = 21444 edges, a Small-World network (second row) with m = 38090 edges,and rewiring probability p = 0 .
01, and the network of the PanTaGruEl model of the European electrical grid (thrid row) with m = 4955 edges. All networks have n = 3809 vertices. We consider three probing frequency regimes, decreasing from left toright panels. Ratios between probing frequencies and λ are shown in insets. Insets of panels (a-f) show all nodes’ trajectoryas response to a single node probing, while insets of panels (g-i) show trajectories obtained from probing and measurement ata single node for 39 different network nodes. The estimate of the number of agents in the system is given by the height ofthe intersection of the curves in the insets of panels (a-f) [see Eq. (10)] and by the maximal value of the curves in panels (g-i)[see Eq. (9)]. Percentages at the top of each panel correspond to the fraction of nodes for which the error was more than 1%,delimited by the vertical dashed lines.Figure 2. Histograms of the eigenvalues of the three networks considered (normalized by the smallest nonvanishing eigenvalue λ ). We observe very different distributions for different network topologies. Acknowledgments.
We thank Philippe Jacquod foruseful and stimulating discussions. This work has beensupported by the Swiss National Science Foundation un-der grant 2000020 182050. RD acknowledges support byETH Z¨urich funding. [1] M. Hilbert and P. Lopez, Science , 60 (2011). [2] J. Machowski, J. W. Bialek, and J. R. Bumby,
PowerSystem Dynamics , 2nd ed. (Wiley, Chichester, U.K,2008).[3] V. Sekara, A. Stopczynski, and S. Lehmann, Proc. Natl.Acad. Sci. USA , 9977 (2016).[4] D. Bray, Science , 1864 (2003).[5] J. D. Ford, S. E. Tilleard, L. Berrang-Ford, M. Araos,R. Biesbroek, A. C. Lesnikowski, G. K. MacDonald,A. Hsu, C. Chen, and L. Bizikova, Proc. Natl. Acad.Sci. USA , 10729 (16).[6] N. Prˇzulj and N. Malod-Dognin, Science , 123 (2016).[7] D. Yu, M. Righero, and L. Kocarev, Phys. Rev. Lett. , 188701 (2006).[8] M. Timme, Phys. Rev. Lett. , 224101 (2007).[9] M. Newman, Networks (Oxford University Press, 2018).[10] T. P. Peixoto, Phys. Rev. Lett. , 128301 (2019).[11] M. Timme and J. Casadiego, J. Phys. A , 343001(2014).[12] I. Brugere, B. Gallagher, and T. Y. Berger-Wolf, ACMComput. Surv. , 24 (2018). [13] M. Tyloo, R. Delabays, and P. Jacquod, arXiv preprint:2007.16136 (2020).[14] R.-Q. Su, W.-X. Wang, and Y.-C. Lai, Phys. Rev. E ,065201 (2012).[15] H. Haehne, J. Casadiego, J. Peinke, and M. Timme,Phys. Rev. Lett. , 158301 (2019).[16] M. Porfiri, Phys. Rev. Lett. , 168301 (2020).[17] W. J. Rugh, Linear System Theory , Vol. 2 (Prentice Hall,1996).[18] M. Tyloo, T. Coletta, and P. Jacquod, Phys. Rev. Lett. , 084101 (2018).[19] D. J. Watts and S. H. Strogatz, Nature , 440 (1998).[20] M. Tyloo, L. Pagnier, and P. Jacquod, Sci. Adv. ,eaaw8359 (2019).[21] L. Pagnier and P. Jacquod, “PanTaGruEl - a pan-European transmission grid and electricity genera-tion model (Zenodo Rep.),” https://doi.org/10.5281/zenodo.2642175 (2019).[22] J. W. Pierre, N. Zhou, F. K. Tuffner, J. F. Hauer, D. J.Trudnowski, and W. A. Mittelstadt, IEEE Trans. PowerSyst.25