Community detection analysis in wind speed-monitoring systems using mutual information-based complex network
Mohamed Laib, Fabian Guignard, Mikhail Kanevski, Luciano Telesca
Mohamed Laib,a) Fabian Guignard, Mikhail Kanevski, and Luciano Telesca
IDYST, Faculty of Geosciences and Environment, University of Lausanne, 1015, Switzerland.
CNR, Istituto di Metodologie per l’Analisi Ambientale, 85050 Tito (PZ), Italy.
(Dated: 5 September 2018)
A mutual information-based weighted network representation of a wide wind speed monitoring system in Switzerland was analysed in order to detect communities. Two communities were revealed, corresponding to two clusters of sensors situated respectively in the Alps and on the Jura-Plateau, which define the two major climatic zones of Switzerland. The silhouette measure is used to evaluate the obtained communities and to confirm the membership of each sensor in its cluster.

Keywords: Wind, Weighted network, Mutual information, Community detection, Time series

a) Electronic mail: [email protected]

Since the installation of dense meteorological monitoring systems has made a huge amount of data available, investigating the properties of meteo-climatic parameters has become a challenging way to understand the mechanisms underlying climatic systems. Complex networks represent an important theoretical framework that helps to describe and understand the interaction among meteo-climatic parameters measured concomitantly by the sensors of a very dense monitoring system. This work proposes a mutual information-based network to study the interaction between the wind speed series measured by a meteorological monitoring system in Switzerland, a country characterized by very diverse topographies. Applying a multilevel community detection method, two clusters of wind stations were identified, matching the two main climatic zones of Switzerland. The results of this study suggest new methodological approaches to investigate wind speed time series.

I. INTRODUCTION

Over the past years, data have been collected in ever larger volumes and at ever higher frequency, so that developing efficient pattern-detection methods and data-mining techniques has become crucial for identifying a few highly informative features.
In this context, one of the most relevant examples is given by high-dimensional (multiple) time series that originate from the constituent units of large systems characterized by inner interactions.

The cooperative behaviour within a complex system involving relationships among its constituent units can be effectively described by networks, where the interactions among the constituents (or nodes of the network) are represented by links. The topology of the network, which coincides with the topology of such interconnections or links, is in itself complex. These networks show a certain organization at a mesoscopic level, which is intermediate between the microscopic level that involves the single constituent units and the macroscopic level that involves the entire system as a whole. This mesoscopic level reflects the modular organization of the system, characterized by the existence of interconnected groups in which some units are heavily linked with each other while, at the same time, being less correlated with the rest of the network. These interconnected groups are generally referred to as communities. Detecting such communities represents an important step in the dynamical characterization of a network, because it can reveal special relationships between the nodes that may not be easily detectable by direct empirical tests; this leads to a better understanding of the characteristics of the dynamic processes that take place in a network.

The use of complex networks to understand the interactions characterizing a climatic system has been growing in recent years, and various approaches have been used in constructing the related networks.
Further, complex networks offer a new mathematical modelling approach for non-linear dynamics and for climatological data analysis.

Among the meteo-climatic parameters, wind is an important factor that influences the evolution of a climatic system; several studies have been devoted to better understanding its time dynamics using a range of methods, such as extreme value theory and copulas, machine learning algorithms, visibility graph analysis, Markov chain models, and fractal and multifractal analysis.

The topological properties of wind systems have become a focus of investigation only very recently. Laib et al. studied the long-range fluctuations in the connectivity density time series of a correlation-based network of high-dimensional wind speed time series recorded by a monitoring system in Switzerland. They found that the daily time series of the connectivity density of the wind speed network is characterized by a clear annual periodicity that modulates the connectivity density more intensively for low than for high absolute values of the correlation threshold. Laib et al. also analysed the multifractality of the connectivity density time series of the wind network and found that the larger multifractality at higher absolute threshold values is probably induced by the higher spatial sparseness of the linked nodes at these thresholds.

Considering the topographic conditions of Switzerland and its widespread wind monitoring system, it is a challenging task to investigate the topology of the wind network in terms of the existence of network communities, and to check whether these communities match the topography of the territory.

To this aim, the edges of the network (the links between any two stations of the wind system, which are the nodes of the network) are weighted by the mutual information between the wind time series recorded at each station.
The mutual information, which quantifies the degree of non-linear correlation between two time series, has already been used to construct seismic networks, to analyse global foreign exchange markets, and to predict stock market movements.

II. DATA AND NETWORK CONSTRUCTION
The data used in this work consist of daily mean wind speeds collected at 119 measuring stations from 2012 to 2016 by SwissMetNet, one of the weather monitoring systems in Switzerland, which covers the Swiss territory almost homogeneously (Fig. 1). Fig. 2 shows, as an example, some of the measured wind speed series.

To construct the network, the mutual information was used as a metric to weight the edges between the nodes:

$$ I(X, Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log\left( \frac{p(x, y)}{p(x)\, p(y)} \right) \tag{1} $$

where $X$ and $Y$ are two different random variables (wind time series), $p(x)$ and $p(y)$ are their respective marginal probabilities, and $p(x, y)$ is their joint probability.

Mutual information is a measure of the amount of information that one random variable contains about another random variable.

It can be shown that Eq. 1 can be written as follows:

$$ I(X, Y) = D\big( p(x, y) \,\|\, p(x)\, p(y) \big) \tag{2} $$

where $D$ is the Kullback-Leibler divergence, which is a dissimilarity measure between two probability distributions.

Thus, the mutual information can be seen as the departure of the joint probability $p(x, y)$ from the product of the two marginal probabilities $p(x)$ and $p(y)$. It can easily be shown that $I(X, Y) \geq 0$, with equality if and only if $X$ and $Y$ are independent. Consequently, the higher the mutual information, the stronger the dependence between $X$ and $Y$.

Since the mutual information defined in Eq. 1 is symmetric, the network is undirected. Furthermore, the network is fully connected: every pair of nodes is joined by an edge. The edges differ, however, in their weights, which are given by the mutual information.

III. COMMUNITY DETECTION BY THE MULTILEVEL METHOD
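As an illustration of the network construction described in Sec. II, the edge weights of Eq. 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the estimator used in the study: the source does not say how the probabilities were estimated, so the equal-width histogram (plug-in) estimator, the number of bins, and the function names `discretize`, `mutual_information` and `mi_matrix` are all hypothetical choices, and the three synthetic series merely stand in for station records.

```python
import math
import random
from collections import Counter


def discretize(series, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in series]


def mutual_information(x, y, n_bins=8):
    """Plug-in estimate of Eq. 1 (natural logarithm, result in nats)."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    joint = Counter(zip(bx, by))      # counts for p(x, y)
    px, py = Counter(bx), Counter(by)  # counts for p(x) and p(y)
    mi = 0.0
    for (i, j), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[i] / n) * (py[j] / n)))
    return mi


def mi_matrix(series_list, n_bins=8):
    """Symmetric weight matrix of a fully connected, undirected network."""
    k = len(series_list)
    w = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a + 1, k):
            w[a][b] = w[b][a] = mutual_information(
                series_list[a], series_list[b], n_bins
            )
    return w


# Synthetic stand-ins for station series (assumption: not real wind data).
random.seed(0)
base = [random.gauss(0, 1) for _ in range(2000)]
dependent = [v * v + 0.1 * random.gauss(0, 1) for v in base]  # non-linear link
independent = [random.gauss(0, 1) for _ in range(2000)]
W = mi_matrix([base, dependent, independent])
```

In this toy setting the weight between `base` and `dependent` is larger than the weight between `base` and `independent`, even though the first pair is linked only through a non-linear (quadratic) relation. Note that a histogram estimate of this kind is biased upwards for finite samples, so the weight of an independent pair is small but not exactly zero.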
Proposed by Blondel et al., the MultiLevel (ML) algorithm is a community detection method. Yang et al. compared several well-known community detection algorithms (Edge-betweenness, Fastgreedy, Infomap, Walktrap, and Spinglass) and found that ML outperforms all the other algorithms on a set of benchmarks.

The ML algorithm aims to optimise the modularity, which compares the density of links inside communities with the density of links between communities. The modularity is defined as follows:

$$ Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j) \tag{3} $$

where $Q$ ranges between $-1$ and $1$, and:
• $A_{ij}$ is the weight between nodes $i$ and $j$;
• $m$ is the sum of all the weights in the graph;
• $k_i$ and $k_j$ are the sums of the weights connected to nodes $i$ and $j$, respectively;
• $c_i$ and $c_j$ are the communities (classes) of nodes $i$ and $j$;
• $\delta$ is the Kronecker delta, equal to 1 if $c_i = c_j$ and 0 otherwise.

The ML algorithm consists of two iterative steps. In the first step, each node is initially considered as a community of its own. Then node $i$ is removed from its community $c_i$ and placed in another community $c_j$ if this move increases the modularity (Eq. 3); otherwise node $i$ remains in its original community. This is repeated until no further gain in modularity is obtained.
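The modularity of Eq. 3 and the node-moving step described above can be sketched as follows. This is an illustrative sketch rather than the implementation used in the study. It assumes the convention of Blondel et al., in which 2m equals the sum of all entries of the symmetric weight matrix; it recomputes Q from scratch for every trial move instead of using an incremental gain formula; and the function names and the six-node toy network are hypothetical.

```python
def modularity(weights, communities):
    """Weighted modularity Q of Eq. 3 for a symmetric weight matrix."""
    n = len(weights)
    strength = [sum(row) for row in weights]  # k_i
    two_m = sum(strength)                     # 2m: every edge is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:  # delta(c_i, c_j) = 1
                q += weights[i][j] - strength[i] * strength[j] / two_m
    return q / two_m


def local_moving(weights, communities):
    """First ML step: move single nodes between communities while Q increases."""
    communities = list(communities)
    improved = True
    while improved:
        improved = False
        for i in range(len(weights)):
            best_c = communities[i]
            best_q = modularity(weights, communities)
            for c in set(communities):
                trial = communities[:i] + [c] + communities[i + 1:]
                q = modularity(weights, trial)
                if q > best_q + 1e-12:  # accept strict improvements only
                    best_c, best_q = c, q
            if best_c != communities[i]:
                communities[i] = best_c
                improved = True
    return communities


# Toy fully connected network (assumption): two groups of three nodes,
# weight 1.0 inside a group and 0.1 between groups, no self-loops.
group = [0, 0, 0, 1, 1, 1]
W = [[0.0 if i == j else (1.0 if group[i] == group[j] else 0.1)
      for j in range(6)] for i in range(6)]

partition = local_moving(W, communities=list(range(6)))  # start from singletons
q_final = modularity(W, partition)
```

Starting from one community per node, the local moves recover the two planted groups, with a modularity of about 0.37. Placing all nodes in a single community gives Q = 0, which is why a positive Q indicates more internal weight than expected from the node strengths alone.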
The gain in modularity obtained by moving a node $i$ into a community $C$ is computed as follows:

$$ \Delta Q = \left[ \frac{\Sigma_{in} + 2 k_{i,in}}{2m} - \left( \frac{\Sigma_{tot} + k_i}{2m} \right)^2 \right] - \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^2 - \left( \frac{k_i}{2m} \right)^2 \right] \tag{4} $$

where $\Sigma_{in}$ is the sum of the weights inside $C$, $\Sigma_{tot}$ is the sum of the weights of the edges incident to nodes in community $C$, $k_{i,in}$ is the sum of the weights of the links between node $i$ and the nodes of community $C$, and $m$ is the sum of all the weights in the network.

In the second step, every community is treated as a node and a new network is built. The weights between these new nodes are defined as the sum of the link weights between the corresponding communities of the old network, as proposed by Arenas et al. for reducing the size of a complex network while preserving the modularity. The first step is then applied to the new network, and the two steps are iterated until the modularity stops increasing.

IV. RESULTS AND DISCUSSION
Fig. 3 shows the mutual information among all the nodes. Applying community detection based on the MultiLevel method, three different communities are identified, as shown in Fig. 4. When the communities are mapped onto the territory of Switzerland (Fig. 5), two classes are spatially mixed (stations indicated by green and black circles).

To quantify this spatial mixing, the well-known silhouette width was used. This is defined as

$$ s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \tag{5} $$

where $a(i)$ is the dissimilarity between the node (object) $i$ and the other nodes of the same community, $b(i)$ is the minimum value of the dissimilarity between node $i$ and the nodes of the other communities, and the dissimilarity is the minimum Euclidean distance. From Eq. 5, we can see that the silhouette $s(i)$ ranges between $-1$ and $1$. […] the mean silhouette values are […] (mutual information matrix) and 0.09 (XY coordinates). These low values indicate that the obtained communities are not well separated spatially.

In order to understand the origin of this spatial mixing between communities, we filtered out the trend and the yearly cycle from the wind series by using the Seasonal Decomposition of Time Series by Loess (STL), implemented using the stl function of the stats R library. Then, we applied the MultiLevel community detection method to the residual wind series. Fig. 8 shows the residuals of the same time series shown in Fig. 2, and Fig. 9 presents the two detected communities.

When the communities are mapped onto the Swiss territory, the two communities do not show significant spatial mixing (Fig. 10).

Furthermore, the silhouette width for each station of each community is shown in Figs. 11 and 12, and the mean silhouette values are 0.35 (mutual information matrix) and […] (XY coordinates). […] 1,000 random spatial distributions of the stations. Figs. 13 and 14 show the histograms of the silhouette width for the randomised classes.
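The silhouette width of Eq. 5 and the comparison against randomised classes can be sketched as follows. This is a minimal sketch under stated assumptions rather than the procedure used in the study. It follows Rousseeuw's original definition, in which a(i) is the average distance to the other members of the node's own community and b(i) is the smallest average distance to another community. The two-dimensional points are synthetic stand-ins for XY station coordinates, and the function names are hypothetical.

```python
import math
import random


def silhouette_widths(points, labels):
    """Silhouette s(i) of Eq. 5 with Euclidean distances between points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton community: s(i) = 0 by convention
            widths.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom else 0.0)
    return widths


def mean_silhouette(points, labels):
    s = silhouette_widths(points, labels)
    return sum(s) / len(s)


def shuffled_silhouettes(points, labels, n_shuffles=1000, seed=0):
    """Mean silhouette for randomly reassigned labels (community sizes kept)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    values = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        values.append(mean_silhouette(points, shuffled))
    return values


# Synthetic coordinates (assumption): two spatially separated groups of stations.
rng = random.Random(1)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(15)]
points += [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(15)]
labels = [0] * 15 + [1] * 15

observed = mean_silhouette(points, labels)
null = shuffled_silhouettes(points, labels, n_shuffles=1000)
p_value = sum(v >= observed for v in null) / len(null)
```

Here `p_value` is the fraction of shuffled labellings whose mean silhouette is at least as large as the observed one. For well-separated groups the observed value lies far in the right tail of the shuffled distribution, which is the kind of comparison displayed in Figs. 13 and 14.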
V. CONCLUSIONS
1. The wind network, constructed by representing the interactions between the nodes using the mutual information, highlights the (non-linear) correlations among the wind series.
2. The STL decomposition makes it possible to extract the residuals of the wind speed, which are not influenced by the trends and annual weather-induced forcings but only by local meteo-climatic features depending on the geo-morphological and topographic characteristics of each measuring station.
3. The MultiLevel method for community detection in the mutual information-based network of wind series shows different topological structures of the monitoring system before and after the removal of the trend and seasonal components. The network constructed on the original data is characterized by three different communities, while that constructed on the residual data (deprived of the trend and seasonal components) is characterized by only two communities.
4. The communities of the network built on the original data are quite mixed spatially. The communities of the network built on the residual data are, instead, spatially well separated, with no significant apparent mixing between the stations belonging to the two communities.
5. The silhouette width, used to quantify the spatial mixing between the detected communities, shows an average value for the communities detected in the network based on the original data that is much lower than that found for the communities detected in the network based on the residuals. Furthermore, the latter is significant when compared with the silhouette widths calculated after shuffling the stations of the two communities.
6. The two communities detected after removing the trend and seasonal components match very well the climatic zones of Switzerland, the Alps and the Jura-Plateau. This suggests the potential of the complex network method in disclosing the inner interactions among wind speed series measured in different climatic regions, which are mainly due to local topographic factors.
VI. ACKNOWLEDGEMENTS
F. Guignard thanks the support of the National Research Programme 75 "Big Data" (PNR75) of the Swiss National Science Foundation (SNSF).
L. Telesca thanks the support of the "Scientific Exchanges" project n° […]

REFERENCES

Arenas, A., Duch, J., Fernandez, A., and Gomez, S., “Size reduction of complex networks preserving modularity,” New Journal of Physics, 176.
Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E., “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, P10008 (2008).
Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., and Hwang, D.-U., “Complex networks: Structure and dynamics,” Physics Reports, 175–308 (2006).
Cao, G., Zhang, Q., and Li, Q., “Causal relationship between the global foreign exchange market based on complex networks and entropy theory,” Chaos, Solitons & Fractals, 36–44 (2017).
Clauset, A., Newman, M. E. J., and Moore, C., “Finding community structure in very large networks,” Phys. Rev. E, 066111 (2004).
Cleveland, R. B., Cleveland, W. S., McRae, J. E., and Terpenning, I., “STL: A seasonal-trend decomposition procedure based on loess,” Journal of Official Statistics, 3–73 (1990).
Cover, T. M. and Thomas, J. A., Elements of Information Theory, 2nd ed. (Wiley-Interscience, 2006) p. 774.
D’Amico, G., Petroni, F., and Prattico, F., “Wind speed prediction for wind farm applications by extreme value theory and copulas,” Wind Eng. Ind. Aerodyn. Journal, pp. 229–236 (2015).
Donges, J. F., Zou, Y., Marwan, N., and Kurths, J., “The backbone of the climate network,” EPL (Europhysics Letters), 48007 (2009).
Donges, J. F., Zou, Y., Marwan, N., and Kurths, J., “Complex networks in climate dynamics,” The European Physical Journal Special Topics, 157–179 (2009).
Donner, R. V., Lindner, M., Tupikina, L., and Molkenthin, N., “Characterizing flows by complex network methods,” in A Mathematical Modeling Approach from Nonlinear Dynamics to Complex Systems, edited by E. E. N. Macau (Springer International Publishing, Cham, 2019) pp. 197–226.
Donner, R. V., Wiedermann, M., and Donges, J. F., “Complex network techniques for climatological data analysis,” in Nonlinear and Stochastic Climate Dynamics, edited by C. L. E. Franzke and T. J. O’Kane (Cambridge University Press, 2017) pp. 159–183.
Fortuna, L., Nunnari, S., and Guariso, G., “Fractal order evidences in wind speed time series,” ICFDA’14 International Conference on Fractional Differentiation and Its Applications 2014, 1–6 (2014).
Fortunato, S., “Community detection in graphs,” Physics Reports, 75–174 (2010).
Garcia-Marin, A. P., Estévez, J., Jiménez-Hornero, F. J., and Ayuso-Munoz, J. L., “Multifractal analysis of validated wind speed time series,” Chaos: An Interdisciplinary Journal of Nonlinear Science, 013133 (2013).
Girvan, M. and Newman, M. E. J., “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences, 7821–7826 (2002).
Gozolchiani, A., Yamasaki, K., Gazit, O., and Havlin, S., “Pattern of climate network blinking links follows El Niño events,” EPL (Europhysics Letters), 28005 (2008).
Jiménez, A., “A complex network model for seismicity based on mutual information,” Physica A: Statistical Mechanics and its Applications, 2498–2506 (2013).
Kantz, H., Holstein, D., Ragwitz, M., and Vitanov, N. K., “Markov chain model for turbulent wind speed data,” Physica A: Statistical Mechanics and its Applications, 315–321 (2004), proceedings of the VIII Latin American Workshop on Nonlinear Phenomena.
Kim, M. and Sayama, H., “Predicting stock market movements using network science: an information theoretic approach,” Applied Network Science, 35 (2017).
Laib, M., Golay, J., Telesca, L., and Kanevski, M., “Multifractal analysis of the time series of daily means of wind speed in complex regions,” Chaos, Solitons & Fractals, 118–127 (2018).
Laib, M., Telesca, L., and Kanevski, M., “Long-range fluctuations and multifractality in connectivity density time series of a wind speed monitoring network,” Chaos: An Interdisciplinary Journal of Nonlinear Science, 033108 (2018).
Laib, M., Telesca, L., and Kanevski, M., “Periodic fluctuations in correlation-based connectivity density time series: Application to wind speed-monitoring network in Switzerland,” Physica A: Statistical Mechanics and its Applications, 1555–1569 (2018).
Lancichinetti, A., Fortunato, S., and Radicchi, F., “Benchmark graphs for testing community detection algorithms,” Phys. Rev. E, 046110 (2008).
Newman, M., “The structure and function of complex networks,” SIAM Review, 167–256 (2003).
Newman, M. E. J., “Analysis of weighted networks,” Phys. Rev. E, 056131 (2004).
de Oliveira Santos, M., Stosic, T., and Stosic, B. D., “Long-term correlations in hourly wind speed records in Pernambuco, Brazil,” Physica A: Statistical Mechanics and its Applications, 1546–1552 (2012).
Pierini, J. O., Lovallo, M., and Telesca, L., “Visibility graph analysis of wind speed records measured in central Argentina,” Physica A: Statistical Mechanics and its Applications, 5041–5048 (2012).
Pons, P. and Latapy, M., “Computing communities in large networks using random walks,” in Computer and Information Sciences - ISCIS 2005, edited by P. Yolum, T. Güngör, F. Gürgen, and C. Özturan (Springer Berlin Heidelberg, Berlin, Heidelberg, 2005) pp. 284–293.
R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2018).
Reichardt, J. and Bornholdt, S., “Statistical mechanics of community detection,” Phys. Rev. E, 016110 (2006).
Rosvall, M. and Bergstrom, C. T., “An information-theoretic framework for resolving community structure in complex networks,” Proceedings of the National Academy of Sciences, 7327–7331 (2007).
Rousseeuw, P. J., “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, 53–65 (1987).
Steinhaeuser, K., Chawla, N. V., and Ganguly, A. R., “An exploration of climate data using complex networks,” in Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data (ACM, 2009) pp. 23–31.
Telesca, L., Lovallo, M., and Kanevski, M., “Power spectrum and multifractal detrended fluctuation analysis of high-frequency wind measurements in mountainous regions,” Applied Energy, 1052–1061 (2016).
Treiber, N. A., Heinermann, J., and Kramer, O., “Wind power prediction with machine learning,” in Computational Sustainability (Springer International Publishing, 2016) Chap. Wind Power Prediction with Machine Learning, pp. 13–29.
Tsonis, A. and Roebber, P., “The architecture of the climate network,” Physica A: Statistical Mechanics and its Applications, 497–504 (2004).
Tsonis, A. A. and Swanson, K. L., “Topology and predictability of El Niño and La Niña networks,” Phys. Rev. Lett., 228502 (2008).
Tsonis, A. A., Swanson, K. L., and Roebber, P. J., “What do networks have to do with climate?” Bulletin of the American Meteorological Society, 585–595 (2006).
Yamasaki, K., Gozolchiani, A., and Havlin, S., “Climate networks around the globe are significantly affected by El Niño,” Phys. Rev. Lett., 228501 (2008).
Yang, Z., Algesheimer, R., and Tessone, C. J., “A comparative analysis of community detection algorithms on artificial networks,” Scientific Reports, 30750 (2016).

FIG. 1. Study area and location of measuring stations.

FIG. 2. Some examples of daily wind time series. (Panels include stations KOP, BAS, CHU and LUG; axes: time, wind speed in m/s.)

FIG. 3. Mutual information matrix (119 x 119). Each cell (i, j) represents the mutual information between station i and station j.
FIG. 4. Network visualisation with the three communities obtained by the ML method before applying the STL.

FIG. 5. Communities detected in the network constructed before the STL decomposition.

FIG. 6. Silhouette width of each node of each community, computed on the mutual information matrix, before applying the STL decomposition. The average values of the silhouette widths are: 0.12 for the first community (black), 0.24 for the second community (red), and 0.24 for the third community (green). The total average is […].

FIG. 7. Silhouette width of each node of each community, computed on the XY coordinates, before applying the STL decomposition. The average values of the silhouette widths are: 0.04 for the first community (black), 0.24 for the second community (red), and −0.14 for the third community (green). The total average is […].

FIG. 8. Residuals of the wind series shown in Fig. 2, obtained by using the STL decomposition. (Axes: time, wind speed in m/s.)

FIG. 9. Network visualisation with the two communities obtained by the ML method after applying the STL.

FIG. 10. Communities detected in the network constructed after the STL decomposition.

FIG. 11. Silhouette width of each node of each community, computed on the mutual information matrix, after applying the STL decomposition. The average values of the silhouette widths are: 0.47 for the first community (black) and 0.23 for the second community (red). The total average is […].

FIG. 12. Silhouette width of each node of each community, computed on the XY coordinates, after applying the STL decomposition. The average values of the silhouette widths are: 0.21 for the first community (black) and 0.27 for the second community (red). The total average is […].

FIG. 13. Comparison between the histogram of the silhouette width (obtained using the mutual information matrix) of 1,000 random classes (blue) and the total average silhouette width for the classes obtained after STL decomposition (red).

FIG. 14. Comparison between the histogram of the silhouette width (obtained using the XY coordinates) of 1,000 random classes (blue) and the total average silhouette width for the classes obtained after STL decomposition (red).