A Weight-based Information Filtration Algorithm for Stock-Correlation Networks

Abstract

Several algorithms have been proposed to filter information on a complete graph of correlations across stocks to build a stock-correlation network. Among them, the planar maximally filtered graph (PMFG) algorithm uses 3n−6 edges to build a graph whose features include a high frequency of small cliques and a good clustering of stocks. We propose a new algorithm, which we call proportional degree (PD), to filter information on the complete graph of normalised mutual information (NMI) across stocks. Our results show that the PD algorithm produces a network whose cliques are more homogeneous with respect to economic sectoral classification than those of its PMFG counterpart. We also show that the partition of the PD network obtained through normalised spectral clustering (NSC) agrees better with the NSC of the complete graph than the corresponding one obtained from PMFG. Finally, we show that the clusters in the PD network are more robust with respect to the removal of random sets of edges than those in the PMFG network.


Seyed Soheil Hosseini, Nick Wormald*, and Tianhai Tian
School of Mathematics, Monash University

* Research supported by ARC DP160100835.

Key words: stock-correlation network, PD network, PMFG network, normalised mutual information

'Complex systems' is the term referring to the study of systems with a significant number of components in which we want to find out how the relationships between those components affect the behaviour of the system. The study of complex systems includes concepts from various disciplines such as mathematics, statistics, and computer science. One type of complex system is a complex network, which consists of a large number of vertices and the relationships across them [1]. There are many examples of such networks, such as the World Wide Web [2, 3], paper citation networks [4, 5], social networks [6–9], and financial networks [10, 11].

One kind of financial network is a stock network.
In such a network, the vertices denote the stocks and the weight of an edge between two stocks shows the similarity between them. Similarity could be, for example, the influence the stocks have over the price of each other. One of the most commonly used measures of similarity in stock networks is the Pearson correlation coefficient. Including all the cross-correlations across stocks, however, creates a complete graph whose densely interwoven structure is hard to interpret; for this reason, several algorithms have been proposed to filter the complete graph into a simpler subgraph to use as a representation of the original network. Some of these algorithms are minimum spanning tree (MST) [12–15], asset graph (AG) [16], planar maximally filtered graph (PMFG) [17–21], and the correlation threshold method [22–25]. Also, see Birch et al. [26] for a comparison of the advantages and limitations of MST, AG, and PMFG on a dataset. Now the question is, what exactly makes a filtering algorithm better than the others? There is no unique answer, but to get a perspective, let us take a quick look at the reported positive aspects of the above-mentioned methods.

Mantegna [12] attributed the advantage of MST to the fact that it provided a hierarchical clustering of stocks. Onnela et al. [16] demonstrated the advantage of AG by observing that it had a higher survival ratio (ratio of common edges existing in two consecutive time steps) compared with MST. Yet they also mentioned that, unlike MST, AG did not show evident scale-free behaviour, that is, a degree distribution following a power law. In sum, they found AG better in terms of being less fragile in the presence of market crisis, and that it incorporated more information from the original complete graph compared to MST, for it did not have the structural limitations of MST.
Tumminello [17] attributed the usefulness of PMFG to the fact that the network produced always contains the one produced by MST, and that it contains cliques whose stocks mostly belong to the same economic sectors. Boginski [22] mentioned that the correlation threshold method was useful since, for a large enough minimum threshold on the value of the correlation coefficient, their network had scale-free behaviour, and they could classify financial instruments through the analysis of cliques and independent sets of their network.

Others have also discussed advantages of the above algorithms. Huang et al. [23] argued that the correlation threshold method displayed robustness against random vertex failures and a high average clustering coefficient. One of the points Wang et al. [21] made is that PMFG is useful because it provided a good clustering of the stocks according to the economic sectoral benchmark clustering. In sum, the positive aspects of filtering algorithms considered so far in the literature are sparsity, scale-free behaviour, homogeneity of cliques, survival ratio, good clustering, and robustness. Of these, the clustering behaviour seems to attract the most attention.

We propose an algorithm called proportional degree (PD) to build a stock-correlation network based on the normalised mutual information (NMI) similarity matrix across the stocks. We show that the PD network with the same size as its PMFG counterpart has better homogeneity of cliques according to the stock economic sectors. We also show that the PD network has an overall better clustering compared to the PMFG network in terms of agreement with the normalised spectral clustering (NSC) of the similarity matrix.

In Section 2, we define mutual information and explain why we used this measure to account for correlation across the stocks. Then we describe the PD and PMFG algorithms in conjunction with the methods that we used to compare the corresponding networks of those algorithms.
In Section 3, we provide the results of the comparison of the two networks built by the PD and PMFG algorithms. Finally, Section 4 includes our conclusion and some ideas for prospective researchers.

The measure we use to account for the correlation across stocks is NMI. The reason we prefer mutual information over the correlation coefficient is that the former can detect relationships between variables that cannot be detected by a linear correlation measure such as the latter [27]. This feature of the mutual information measure is more evident when the stock market exhibits violent fluctuations [15]. We define this measure in the following.

Mutual information measures the level of dependence between two random variables [28], where a value of zero shows statistical independence of the random variables. The mutual information between two stocks $X$ and $Y$ can be formulated as

$$I(X, Y) = H(X) + H(Y) - H(X, Y) \tag{1}$$

which is derived from Shannon's information entropy [29], a measure that quantifies the uncertainty of a random variable. Here, $I(X, Y)$ is the mutual information of $X$ and $Y$, $H(X)$ and $H(Y)$ denote the entropy of $X$ and $Y$ respectively, and $H(X, Y)$ denotes the joint entropy of $X$ and $Y$. The entropy and joint entropy of discrete random variables $X$ and $Y$ are defined by

$$H(X) = -\sum_{i} p(x_i) \log p(x_i) \tag{2}$$

$$H(X, Y) = -\sum_{i} \sum_{j} p(x_i, y_j) \log p(x_i, y_j) \tag{3}$$

where $p(x_i)$ and $p(x_i, y_j)$ are the probability distribution and joint probability distribution of $X$ and $(X, Y)$ respectively.

Unlike the correlation coefficient, mutual information is not bounded above by 1. Since large values of mutual information could be hard to interpret, it is useful to use the normalised mutual information, NMI, which brings the values down to the bounded interval $[0, 1]$. It is defined by

$$\mathrm{NMI}(X, Y) = \frac{2\, I(X, Y)}{H(X) + H(Y)}. \tag{4}$$

One question we must face is how to construct the probability and joint probability distributions of the stocks in the S&P/ASX 200 that we have chosen to study in order to find the mutual information between them. To this end, we use the same numerical method as proposed by Guo et al. [15]. For $n$ stocks traded in $m$ business days, let $P_{it}$ be the closing price of stock $i$ on day $t$. The log-return of stock $i$ on day $t$ for $t = 2, 3, \dots, m$ and $i = 1, 2, \dots, n$ is defined by

$$R_{it} = \ln \frac{P_{it}}{P_{i(t-1)}}. \tag{5}$$

In order to find the probability distribution of the log-return of stock $i$, we sort the $R_{it}$ values for $t = 2, 3, \dots, m$ in ascending order and divide the sorted values into $q$ bins. Then, for each stock $i = 1, 2, \dots, n$, we count the number of log-returns in each bin $a$ for $a = 1, 2, \dots, q$, denoted by $f_{ia}$, and get the approximate probability by $p_{ia} \approx f_{ia}/m$. Similarly, we find the joint probability distribution of the log-returns of stocks $i$ and $j$ for $i, j = 1, 2, \dots, n$ by dividing their sorted log-returns into $q \times q$ bins. In such case, $f_{ijab}$ denotes the number of days on which the log-return of $i$ falls in bin $a$ and the log-return of $j$ falls in bin $b$, and the approximate joint probability is given by $p_{ijab} \approx f_{ijab}/m$. As a result, we can approximate the entropy of stock $i$ and the joint entropy of stocks $i$ and $j$ by

$$H(S_i) = -\sum_{a=1}^{q} p_{ia} \log p_{ia} \tag{6}$$

$$H(S_i, S_j) = -\sum_{a=1}^{q} \sum_{b=1}^{q} p_{ijab} \log p_{ijab}. \tag{7}$$

Therefore, the mutual information of stocks $i$ and $j$ can be given by substituting equations (6) and (7) in equation (1), and the NMI is given by

$$\mathrm{NMI}(S_i, S_j) = \frac{2\, I(S_i, S_j)}{H(S_i) + H(S_j)}, \quad i \neq j \tag{8}$$

which produces a symmetric $n \times n$ matrix with diagonal elements of zero. We consider this matrix to be the similarity matrix of the stocks.

A graph can be represented by $G(V, E)$ in which $V = \{v_1, v_2, \dots, v_n\}$ denotes the vertices and $E = \{\dots, e_{ij}, \dots\}$ denotes the edges.
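The estimation procedure of equations (5) to (8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the bins are assumed to be equal-frequency bins obtained from the sorted log-returns (the text does not say whether bins are equal-width or equal-frequency), and probabilities are normalised by the number of log-returns rather than by $m$.

```python
import numpy as np

def log_returns(prices):
    """Equation (5). prices has shape (m, n): rows are days, columns are stocks."""
    prices = np.asarray(prices, dtype=float)
    return np.log(prices[1:] / prices[:-1])

def bin_labels(x, q):
    """Assumption: q equal-frequency bins, assigned from the rank of each value."""
    ranks = np.argsort(np.argsort(x))          # rank 0 .. len(x)-1 of each value
    return (ranks * q) // len(x)               # bin index in 0 .. q-1

def entropy(counts):
    """Shannon entropy of a table of counts; empty cells contribute nothing."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def nmi_matrix(returns, q):
    """Equations (6)-(8): symmetric NMI matrix with a zero diagonal."""
    returns = np.asarray(returns, dtype=float)
    n = returns.shape[1]
    labels = [bin_labels(returns[:, i], q) for i in range(n)]
    h = [entropy(np.bincount(lab, minlength=q)) for lab in labels]
    nmi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            joint = np.zeros((q, q))
            np.add.at(joint, (labels[i], labels[j]), 1)   # q x q joint counts
            mi = h[i] + h[j] - entropy(joint)             # equation (1)
            nmi[i, j] = nmi[j, i] = 2.0 * mi / (h[i] + h[j])
    return nmi
```

Calling `nmi_matrix(log_returns(prices), q)` on an $m \times n$ price array would return the similarity matrix used by the filtering algorithms. Two identical return series receive an NMI of 1, which is the upper end of the interval stated after equation (4).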
A planar graph is one that can be embedded onto a surface with genus $g = 0$, that is, the plane, without any two edges crossing or edges intersecting a vertex. The PMFG algorithm builds a network as follows.

Algorithm: PMFG algorithm

    Input:  V: set of stocks
            s_ij: similarity between stocks i and j given in equation (8)
    Output: G(V, E): planar network

    G(V, E) ← empty network of stocks V
    S ← list of (i, j, s_ij) (i, j ∈ V, i ≠ j), sorted in descending order
    for (i, j, s_ij) in S do
        E ← E ∪ {e_ij}
        if G is not planar then
            E ← E − {e_ij}

The resulting network has $3n - 6$ edges whenever $n$ is at least 3.

We first determine the degree of each vertex in our output network in a manner such that it is proportional to its weight, where the weight of a vertex (or stock weight) is the sum of its similarity values across all the other vertices. The weight of stock $i$ is defined by

$$SW_i = \sum_{j \neq i} s_{ij} \tag{9}$$

where $SW_i$ and $s_{ij}$ respectively denote the weight of stock $i$ and the similarity between stocks $i$ and $j$.

Consequently, the calculated degree of a vertex, $d'_i$, should be more or less given by

$$d'_i = \frac{SW_i}{\sum_{j=1}^{n} SW_j} \times (2M) \tag{10}$$

in which $M$ is the total number of edges, so $2M$ would be the sum of the degrees of all vertices. However, the degree of a vertex, being the number of adjacent vertices, is required to be an integer. In order to round the calculated degrees $d'_i$ while preserving their total sum, we apply the cascade rounding algorithm. For the rest of the paper, wherever we mention degree in association with the PD algorithm, it means the integer or rounded calculated degree. To use cascade rounding, we first relabel the vertices as 1 to $n$, from largest stock weight to smallest. Then we determine the degree of vertex $i$ recursively by subtracting the cumulative sum of the degrees of the $i - 1$ vertices before it from the rounded cumulative sum of the calculated degrees of vertices 1 to $i$. Thus,

$$d_1 = \lfloor d'_1 \rceil \quad \text{and} \quad d_i = \Big\lfloor \sum_{j=1}^{i} d'_j \Big\rceil - \sum_{j=1}^{i-1} d_j, \quad i \geq 2 \tag{11}$$

where $d_i$ is the degree of vertex $i$ and $\lfloor x \rceil$ denotes the nearest integer to $x$. Then the PD algorithm builds a network as follows.
Algorithm: PD algorithm

    Input:  V: set of stocks
            s_ij: similarity between stocks i and j given in equation (8)
    Output: G(V, E): proportional degree network

    G(V, E) ← empty network of stocks V
    S ← list of (i, j, s_ij) (i, j ∈ V, i ≠ j), sorted in descending order
    deg(i): number of vertices adjacent to vertex i in network G
    for (i, j, s_ij) in S do
        if (deg(i) < d_i) and (deg(j) < d_j) and (e_ij ∉ E) then
            E ← E ∪ {e_ij}

For the purpose of comparing with the PMFG network, we set the total number of edges in this algorithm to $M = 3n - 6$ to equal the value in PMFG.

One of the advantages of PMFG over MST is the additional information linked with the inclusion of 3- and 4-cliques [17, 18]. A clique is a subset of vertices in which every two vertices are connected via an edge. Such a subset is called a maximal clique if it is not contained in any larger clique. A clique of size $m$ is referred to as an $m$-clique. One way of analysing the cliques is to investigate how often the stocks in them belong to the same economic sector; in other words, what is the degree of homogeneity of the cliques with respect to the economic sectors [17]?

One of the most extensively investigated features of complex networks is community structure or clustering. Clusters in a graph are groups of vertices in which the density of edges inside those groups is considerably larger than the average edge density of the graph [30]. If each vertex of a graph belongs to only one cluster (no overlapping vertices), such a division of the graph determines a partition. Partitions of the stock-correlation PMFG networks have been widely studied [19–21, 31]. As with the analysis of cliques, one of the ways of analysing the clusters is investigating how well they match the economic sector classification of the stocks, since we would hope that stocks belonging to the same economic sector are more likely to be in the same cluster [32].
We evaluate the clusters found by Louvain community detection [33], which is defined in the next subsection, in the PD and PMFG networks through their similarity to the partition of the stocks by economic sector. We also use Louvain community detection and normalised spectral clustering (NSC) [34] later in the paper on the similarity matrix of the stocks (the complete graph of NMI between stocks) and compare the resulting partitions with the partitions of the PD and PMFG networks achieved through the same methods. To compare any two partitions, we use the adjusted Rand index (ARI) [35], which we discuss in more detail in subsection 2.5.
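Before turning to the clustering methods, the PD construction described earlier in this section (stock weights, proportional degrees, cascade rounding, and the greedy edge selection) can be sketched in Python. This is a hedged illustration only: `cascade_round` and `pd_network` are hypothetical names, ties at exactly one half are assumed to round up (the text only says "nearest integer"), and the similarity matrix is assumed to be symmetric.

```python
import numpy as np

def cascade_round(values):
    """Equation (11): round so that every rounded cumulative sum is preserved."""
    cum = np.floor(np.cumsum(values) + 0.5).astype(int)  # assumption: halves round up
    return np.diff(cum, prepend=0)

def pd_network(s, num_edges):
    """Greedy PD edge selection on a symmetric similarity matrix s."""
    s = np.asarray(s, dtype=float)
    n = s.shape[0]
    weights = s.sum(axis=1) - np.diag(s)                 # SW_i, equation (9)
    target = weights / weights.sum() * 2 * num_edges     # d'_i, equation (10)
    order = np.argsort(-weights, kind="stable")          # largest stock weight first
    degree_cap = np.empty(n, dtype=int)
    degree_cap[order] = cascade_round(target[order])     # d_i, equation (11)

    # Candidate pairs, most similar first; each unordered pair appears once.
    pairs = [(s[i, j], i, j) for i in range(n) for j in range(i + 1, n)]
    pairs.sort(key=lambda t: -t[0])

    deg = np.zeros(n, dtype=int)
    edges = []
    for _, i, j in pairs:
        if deg[i] < degree_cap[i] and deg[j] < degree_cap[j]:
            edges.append((i, j))
            deg[i] += 1
            deg[j] += 1
    return edges, degree_cap
```

With `num_edges = 3 * n - 6` the rounded degrees sum to $2M$, matching the edge budget of the PMFG. In this sketch the greedy pass stops when no remaining pair has spare degree at both ends, so it can return fewer than $M$ edges; whether and how the original implementation handles that case is not stated in the text.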

Louvain community detection is a greedy algorithm that tries to optimise the modularity function by choosing values $c_i$ for each vertex $i$ of the network. The modularity function is given below.

$$Q = \frac{1}{2S} \sum_{ij} \left[ s_{ij} - \frac{SW_i\, SW_j}{2S} \right] \delta(c_i, c_j) \tag{12}$$

Here, $Q \in [-1, 1]$, $S$ is the sum of all similarities (edge weights), $c_i$ and $c_j$ are the communities of stocks $i$ and $j$, $\delta$ is the Kronecker delta function ($\delta(c_i, c_j) = 1$ if $c_i = c_j$ and 0 otherwise), and $SW_i$, $SW_j$, and $s_{ij}$ are as already defined in equation (9).

In the first step of this algorithm, each vertex is in its own community, that is, all the $c_i$'s are distinct. The effect on modularity caused by changing the community of a vertex $i$ to that of each of its neighbours in turn is checked. Then the community of vertex $i$ is reassigned to the community of the neighbouring vertex that leads to the largest increase in modularity. In the case of no increase in modularity, $i$ keeps its own community label. This process is applied to all vertices and repeated until no community reassignment of any vertex leads to an increase in $Q$. In the second step, all the vertices belonging to the same community are merged into a single vertex, and the edges between vertices within that community are represented by self-loops on the new vertex. Also, the edges from vertices of one community to vertices of another community are represented by a single weighted edge between the two communities. These two steps are repeated iteratively until there is no change in the community assignment of the vertices in step one.

That being said, depending on the order in which the vertices are evaluated by this algorithm, we get different partitions. Accordingly, this algorithm is not guaranteed to yield the global maximum modularity. It is also worth mentioning that finding the exact maximum modularity is an NP-hard problem, so an efficient exact algorithm is not expected.
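To make the modularity function concrete, the following Python sketch evaluates $Q$ for a given community assignment on a weighted similarity matrix. It only evaluates the objective and does not perform the Louvain optimisation itself. The function name is hypothetical, and the matrix is assumed to be symmetric with a zero diagonal, so that the sum of all edge weights $S$ is half the sum of the matrix entries.

```python
import numpy as np

def modularity(s, communities):
    """Weighted modularity Q of a partition, following equation (12)."""
    s = np.asarray(s, dtype=float)
    c = np.asarray(communities)
    sw = s.sum(axis=1)                      # SW_i: weight of each vertex
    total = s.sum() / 2.0                   # S: each undirected edge counted once
    same = c[:, None] == c[None, :]         # delta(c_i, c_j) as a boolean matrix
    expected = np.outer(sw, sw) / (2.0 * total)
    return float(((s - expected) * same).sum() / (2.0 * total))
```

As a small sanity check, for two disjoint triangles with unit edge weights, assigning each triangle to its own community gives $Q = 0.5$, whereas putting all six vertices in one community gives $Q = 0$.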
NSC is an algorithm that takes the similarity matrix and the number of clusters $k$ as its inputs and partitions the data set as below [34, 36].

Algorithm: NSC algorithm

    W = (w_ij), i, j = 1, 2, ..., n: similarity matrix
    k: number of clusters
    D: diagonal matrix with d_i = Σ_{j=1}^{n} w_ij, i = 1, 2, ..., n
    L = D − W: Laplacian matrix
    ν_1, ν_2, ..., ν_k ← eigenvectors of the k smallest eigenvalues of the eigenproblem Lν = λDν
    V ∈ R^{n×k} ← matrix with ν_1, ν_2, ..., ν_k as columns
    y_i ∈ R^k, i = 1, 2, ..., n ← vector corresponding to the i-th row of V
    C_1, C_2, ..., C_k ← clusters of y_i ∈ R^k, i = 1, 2, ..., n by the k-means algorithm

But what is a good choice of $k$? One tool to answer this question is the eigengap heuristic [36]. Denoting the sorted eigenvalues of the Laplacian $L$ of the similarity matrix by $\lambda_1, \lambda_2, \dots, \lambda_n$, the eigengap heuristic states that the network should be divided into $k$ clusters so that $\lambda_{k+1}$ is significantly larger than $\lambda_1, \dots, \lambda_k$. In other words, if the largest gap among $k = 1, 2, \dots, n - 1$ in the sorted eigenvalues of the Laplacian of the similarity matrix is between $\lambda_k$ and $\lambda_{k+1}$, we divide the network into $k$ clusters.

The Rand index [37] is a measure in statistics that quantifies the similarity between two partitions of a data set. The ARI is a version of the Rand index that is corrected for chance. Given two partitions $A = \{A_1, A_2, \dots, A_r\}$ and $B = \{B_1, B_2, \dots, B_s\}$ of the set $S$ containing $n$ elements, the ARI of $A$ and $B$ is given by

$$\mathrm{ARI} = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[ \sum_{i} \binom{a_i}{2} \sum_{j} \binom{b_j}{2} \right] \Big/ \binom{n}{2}}{\frac{1}{2} \left[ \sum_{i} \binom{a_i}{2} + \sum_{j} \binom{b_j}{2} \right] - \left[ \sum_{i} \binom{a_i}{2} \sum_{j} \binom{b_j}{2} \right] \Big/ \binom{n}{2}} \tag{13}$$

where $n_{ij} = |A_i \cap B_j|$, $a_i = \sum_{j=1}^{s} n_{ij}$, and $b_j = \sum_{i=1}^{r} n_{ij}$. This measure satisfies $\mathrm{ARI} \in [-1, 1]$, so that 1 shows identical clusters, -1 shows complete mismatch, and 0 shows random assignment to clusters.

We selected 125 out of 200 stocks in the S&P/ASX 200.
The criterion used for selection was that these 125 stocks are the ones that were traded throughout the whole period of the years 2013-2016. The economic sectors and their number of corresponding stocks are shown in Table 1.

In order to get the NMI between all the stocks to generate the similarity matrix, we chose $q = 20$ bins (referring to equations (6) and (7)) for the 1013 trading days in our data since, as Guo et al. [15] mention, for a large enough $q$ there is not much difference in the values of mutual information, and they considered $q = 10$ bins for the 734 trading days in their data.

We generated the PD and PMFG networks for the above-mentioned data, the analysis of which is provided below. In the PD network, we have vertices with degrees ranging from 1 to 9, whereas in the PMFG network the degrees range from 3 to 29. Visualisations of both networks are in Figure 3.

Table 1. Economic sectors of stocks

    Economic Sector               Number of Stocks
    Consumer Discretionary        21
    Consumer Staples              6
    Energy                        8
    Financials                    19
    Health Care                   10
    Industrials                   16
    Information Technology        2
    Materials                     26
    Real Estate                   12
    Telecommunication Services    2
    Utilities                     3

The analysis of the networks generated in the previous subsection indicates that there are 87 maximal cliques of size 3 and larger in the PD network, including 52 maximal cliques of size 3, 23 of size 4, 9 of size 5, and 3 of size 6. Similarly, there are 122 maximal cliques of size 4 in the PMFG network. To quantify homogeneity, […]/[…] = 0.[…] of the maximal cliques in the PD network consist of stocks all belonging to the same economic sector, whereas this ratio is […]/[…] = 0.[…] for the PMFG network. We also compared the homogeneity of the maximal cliques with minimum size of 3 in the two networks on different random subsets of the stocks. To this end, we considered different proportions $r$ = […]/[…], […]/[…], […]/[…], […]/[…] of all the 125 stocks, and for each $r$, we took 10 samples of size $\lfloor r \times n \rceil$ from the stocks. We plotted the results of all samples for each $r$ and each network as shown in Figure 1, and we can see that for every $r$, the PD network has an overall larger homogeneity of maximal cliques compared with PMFG.

Contrasting with this, in a PMFG network, as in any other planar graph, we cannot have a maximal clique of size more than 4, since such a clique cannot be embedded into a surface with genus $g = 0$ without any two edges crossing. In fact, we can have at most $n$ − […] maximal 4-cliques and $n$ − […] […]/[…] = 0.[…] in the 3-cliques and […]/[…] = 0.[…] in the 4-cliques of it. These ratios are […]/[…] = 0.[…] and […]/[…] = 0.[…] in the PMFG network. So we can also see that all the maximal cliques in the PMFG network are 4-cliques here. As with the previous analysis, we compared the homogeneity of 3-cliques and 4-cliques of the two networks on different random subsets of the stocks, and the result is plotted in Figure 2. We can see that the comparison of 3-clique and 4-clique homogeneity between the two networks is even more striking than for maximal cliques.

Fig. 1. Maximal cliques homogeneity comparison of the two networks on different random subsets of proportions r of all stocks

Fig. 2. (a) (b) […] r of all stocks

Using the Louvain community detection approach, following Wang and Xie [20] and Wang et al. [21], we identified clusters as shown in Tables 2 and 3. We found 7 clusters in PD and 6 in PMFG. In both tables, "Sectors" refers to the number of stocks belonging to each economic sector in the corresponding cluster (e.g. 17F denotes 17 stocks belonging to the Financials sector), "Size" refers to the cluster size, "Dominant" refers to the economic sector repeated the most in the cluster, and "Percentage" refers to the proportion of the stocks belonging to the dominant sector in the cluster.

Table 2.

Clusters captured in the PD network by Louvain community detection

    Cluster    Sectors    Size    Dominant    Percentage

Table 3.

Clusters captured in the PMFG network by Louvain community detection

    Cluster    Sectors    Size    Dominant    Percentage

Nonetheless, as pointed out in Section 2.4.1, Louvain community detection yields different partitions depending on the order of vertex evaluation. To mitigate the effect on ARI of different partitions corresponding to different orders of vertices, we applied the Louvain method on 100 random orders of vertices in both networks and took the average of those 100 ARIs for each method in terms of resemblance to the partition of stocks by economic sector. This produced average ARIs of 0.31 and 0.26 for the PD and PMFG networks respectively.

However, the economic sector classification is not the be-all and end-all partition of stocks. For example, every stock labelled as Real Estate in the ASX/S&P 200 data of 01/10/2018 had been put in the Financials category in the ASX/S&P 200 data of 21/03/2016, which means that the economic sector classification is subject to change, reducing the likelihood that it represents the unique correct partition. Indeed, it could also be argued that there are some significant sub-categories in other economic sectors, which would create more clusters than the number of economic sectors. To create another partition benchmark other than the economic sector classification, we used Louvain community detection on the complete graph of NMI between stocks (the similarity matrix of the stocks). This computation produced only four clusters of stocks. Comparing the clusters achieved by Louvain community detection in the PD and PMFG networks, as shown in Tables 2 and 3, with the new partition benchmark, we got ARIs of 0.40 and 0.36 respectively. Yet not much can be concluded from this comparison since the number of clusters in the benchmark partition is so different from the numbers of clusters in the two networks.

Fig. 3. Clusters found in the PD (a) and PMFG (b) networks using Louvain community detection. (a) Different colors referring to different clusters of Table 2. (b) Different colors referring to different clusters of Table 3.

To draw a more significant comparison between the clustering behaviour of the two networks, we used NSC on the similarity matrix of the stocks and called the resulting partition $C_K$. Then we applied NSC to the PD and PMFG networks, where the corresponding partitions are denoted by $C_{PD}$ and $C_{PMFG}$ respectively. For the similarity matrix to input into NSC, we used the binary adjacency matrix of the networks. In Figure 4, the Y-axis denotes the ARI of $C_K$ and $C_{PD}$ versus that of $C_K$ and $C_{PMFG}$, and the X-axis denotes $k$. Here, we regard a network as having a good ARI performance if its ARI against $C_K$ is large. It can be seen that for small values of $k$, there is not much difference in the ARI performance of the networks; for $k$ = 7, […], PMFG has a better ARI performance; and for $k$ > […], PD consistently has a better ARI performance than PMFG. As shown in Figure 4, we restrict the number of clusters to being at least 4 because this is the least number of clusters in the application of Louvain to any of the networks or graphs under discussion, and is much smaller than the number of economic sectors.

Implementing the heuristic described in Section 2.4.2 and ignoring the gaps between the first, second, and third sorted eigenvalues, since we ignore 1 and 2 as the number of clusters, we find the largest gaps between the sorted eigenvalues for the similarity matrix are $g(\lambda_{[…]}, \lambda_{[…]})$ = 0.[…], $g(\lambda_{[…]}, \lambda_{[…]})$ = 0.[…], and $g(\lambda_{[…]}, \lambda_{[…]})$ = 0.[…]. We then note that the PD network has a better ARI performance than the PMFG network for $k$ = 4, […], […]. From another perspective, one of the points we made is that there could be some subsectors lurking in the classification of stocks by economic sector. As we have 11 economic sectors, from this point of view, the number of clusters can be $k$ > […], and for these values of $k$, PD consistently displays a better ARI performance than PMFG.
It should be said that although spectral clustering does not always perform well on sparse networks [39–41], NSC gives a sensible result in our networks, as the partitions match fairly well with the economic sector classification $C_e$ of the stocks, as shown in Table 4. Also, the result of this table is another indicator that small values of $k$ are not valid, for the ARI of $C_{PD}$ and $C_e$ is smaller than that of $C_K$ and $C_e$. Besides this, the ARI of $C_{PD}$ / $C_{PMFG}$ and $C_e$ is small for small values of $k$ compared to larger values of $k$.

Table 4. ARI of $C_{PD}$ / $C_{PMFG}$ / $C_K$ and $C_e$

    k    PD    PMFG    Complete graph

As with our analysis of the homogeneity of cliques, to test the validity of our result, we also compared the ARI performance of the two networks on different random subsets of the stocks. To this end, again, we considered different possible proportions $r$ = […]/[…], […]/[…], […]/[…], […]/[…], and for each $r$, we took 10 samples of size $\lfloor rn \rceil$ from the stocks. Then we implemented the PD and PMFG algorithms on the samples to generate the two networks and applied NSC on both networks for each sample. Denoting the partitions of PD, PMFG, and the complete similarity matrix of sample $i$ by $C_{PD_i}$, $C_{PMFG_i}$, and $C_{K_i}$ respectively, we considered the average ARI of $C_{K_i}$ and $C_{PD_i}$ versus that of $C_{K_i}$ and $C_{PMFG_i}$ for $i = 1, \dots, 10$ and plotted the results as shown in Figure 5. We can see the same pattern for every $r$: for a large enough $k$, PD has consistently a better average ARI performance than PMFG, whereas for small values of $k$, there is virtually no difference in the average ARI performance of the networks. In addition, we can see in Figure 5 that as the size of the network shrinks (for smaller values of $r$), the difference between the average ARI performance of the networks becomes smaller. In other words, forcing the network to be planar has less effect on the clustering of the stocks in smaller networks. One reason could be that there is less use of larger cliques in smaller networks; thus, the restriction of PMFG to a maximal clique size of 4 becomes less important.

Fig. 4. ARI performance comparison of $C_{PD}$ versus $C_{PMFG}$

One method to investigate the robustness or stability of a network is removing a subset of its vertices or edges at a certain rate [23]. On both networks, we removed 100 different samples of 20%, 30%, and 40% of the edges randomly and applied NSC on them. Then we plotted the average ARI of $C_K$ and $C_{PD}$ and that of $C_K$ and $C_{PMFG}$ as shown in Figure 6.
As expected, there is an overalldecrease in ARI performance of both networks as the percentages of edge removal increases. Thatbeing said, there is an increase in the average ARI for small k ’s ( k ≤ and k ≤ for the PDand PMFG networks respectively), which could be another indicator that small values of k arenot valid. Hence, PD has a better clustering behaviour that PMFG since for large values of k , itdisplays a better ARI performance.In order to see which network has more change of clusters by edge removal, we took thevariance of ARIs of both networks for each k in 4 states, being ﬁrstly the networks with no change,and then the networks with 20%,30%, and 40% edge removal respectively. The results are plottedon Figure 7, and we can see that for every k , there is either not a signiﬁcant difference in thevariance of the ARIs or PD has a signiﬁcantly smaller variance than PMFG; thus, more robust withrespect to change in clusters. 14 a) r = / (b) r = / (c) r = / (d) r = / Fig. 5.

Average ARI performance of the PD and PMFG networks for differentproportions r of stocks a) PD (b) PMFG

Fig. 6.

Fluctuations in ARI for NSC of the networks for different proportions ofedge removal ig. 7. Robustness of the networks clusters in presence of edge removal for each k We used the NMI measure to build a cross-correlation similarity matrix across stocks and ap-plied the PD and PMFG algorithms to generate the corresponding stock-correlation networks. Weshowed that maximal cliques, 3-cliques, and 4-cliques had a higher homogeneity in the PD net-work than the PMFG network as to ﬁnancial sectoral classiﬁcation of the stocks. Moreover, weshowed that for a realistic number of clusters in the NSC algorithm, the PD network has a bet-ter ARI performance than the PMFG network in terms of matching the clusters achieved throughapplying the NSC algorithm on the similarity matrix of the stocks.It should be noted that the aforementioned results were achieved using NMI, and they are notnecessarily expected using other correlation measures. Also, we used n − edges to build thePD network for the whole purpose of comparing its performance with its PMFG counterpart. Itis not clear this size of the PD network is the optimal one considering the criteria of a superiorstocks-correlation network. A future topic for prospective researchers can be varying the sparsityof the PD algorithm and comparing the resulting networks. Also, other measures of correlation anddependence across stocks such as Spearman’s rank correlation coefﬁcient could be used to buildthe stock-correlation network and compare that with other stock-correlation networks according tothe criteria used in the literature. References [1] R´eka Albert and Albert-L´aszl´o Barab´asi. Statistical mechanics of complex networks.

Reviews of Modern Physics, 74(1):47, 2002.
[2] Réka Albert, Hawoong Jeong, and Albert-László Barabási. Internet: Diameter of the world-wide web. Nature, 401(6749):130, 1999.
[3] Albert-László Barabási, Réka Albert, and Hawoong Jeong. Scale-free characteristics of random networks: the topology of the world-wide web. Physica A: Statistical Mechanics and its Applications, 281(1-4):69–77, 2000.
[4] Sidney Redner. How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B-Condensed Matter and Complex Systems, 4(2):131–134, 1998.
[5] Henry Small. Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4):265–269, 1973.
[6] Joseph Galaskiewicz and Stanley Wasserman. Social network analysis: Concepts, methodology, and directions for the 1990s. Sociological Methods & Research, 22(1):3–22, 1993.
[7] Stanley Wasserman and Katherine Faust. Social network analysis: Methods and applications, volume 8. Cambridge University Press, 1994.
[8] Duncan J Watts, Peter Sheridan Dodds, and Mark EJ Newman. Identity and search in social networks. Science, 296(5571):1302–1305, 2002.
[9] Mark EJ Newman, Duncan J Watts, and Steven H Strogatz. Random graph models of social networks. Proceedings of the National Academy of Sciences, 99(suppl 1):2566–2572, 2002.
[10] Michael Boss, Helmut Elsinger, Martin Summer, and Stefan Thurner. Network topology of the interbank market. Quantitative Finance, 4(6):677–684, 2004.
[11] Kimmo Soramäki, Morten L Bech, Jeffrey Arnold, Robert J Glass, and Walter E Beyeler. The topology of interbank payment flows. Physica A: Statistical Mechanics and its Applications, 379(1):317–333, 2007.
[12] Rosario N Mantegna. Hierarchical structure in financial markets. The European Physical Journal B-Condensed Matter and Complex Systems, 11(1):193–197, 1999.
[13] Giovanni Bonanno, Guido Caldarelli, Fabrizio Lillo, and Rosario N Mantegna. Topology of correlation-based minimal spanning trees in real and model markets. Physical Review E, 68(4):046130, 2003.
[14] Benjamin M Tabak, Thiago R Serra, and Daniel O Cajueiro. Topological properties of stock market networks: The case of Brazil. Physica A: Statistical Mechanics and its Applications, 389(16):3240–3249, 2010.
[15] Xue Guo, Hu Zhang, and Tianhai Tian. Development of stock correlation networks using mutual information and financial big data. PloS one, 13(4):e0195941, 2018.
[16] Jukka-Pekka Onnela, Anirban Chakraborti, Kimmo Kaski, Janos Kertesz, and Antti Kanto. Asset trees and asset graphs in financial markets. Physica Scripta, 2003(T106):48, 2003.
[17] Michele Tumminello, Tomaso Aste, Tiziana Di Matteo, and Rosario N Mantegna. A tool for filtering information in complex systems. Proceedings of the National Academy of Sciences, 102(30):10421–10426, 2005.
[18] Michele Tumminello, Tiziana Di Matteo, Tomaso Aste, and Rosario N Mantegna. Correlation based networks of equity returns sampled at different time horizons. The European Physical Journal B, 55(2):209–217, 2007.
[19] Dong-Ming Song, Michele Tumminello, Wei-Xing Zhou, and Rosario N Mantegna. Evolution of worldwide stock markets, correlation structure, and correlation-based graphs. Physical Review E, 84(2):026108, 2011.
[20] Gang-Jin Wang and Chi Xie. Correlation structure and dynamics of international real estate securities markets: A network perspective. Physica A: Statistical Mechanics and its Applications, 424:176–193, 2015.
[21] Gang-Jin Wang, Chi Xie, and Shou Chen. Multiscale correlation networks analysis of the US stock market: a wavelet analysis. Journal of Economic Interaction and Coordination, 12(3):561–594, 2017.
[22] Vladimir Boginski, Sergiy Butenko, and Panos M Pardalos. Statistical analysis of financial networks. Computational Statistics & Data Analysis, 48(2):431–443, 2005.
[23] Wei-Qiang Huang, Xin-Tian Zhuang, and Shuang Yao. A network analysis of the Chinese stock market. Physica A: Statistical Mechanics and its Applications, 388(14):2956–2964, 2009.
[24] Chi K Tse, Jing Liu, and Francis CM Lau. A network perspective of the stock market. Journal of Empirical Finance, 17(4):659–667, 2010.
[25] A Namaki, AH Shirazi, R Raei, and GR Jafari. Network analysis of a financial market based on genuine correlation and threshold method. Physica A: Statistical Mechanics and its Applications, 390(21-22):3835–3841, 2011.
[26] Jenna Birch, Athanasios A Pantelous, and Kimmo Soramäki. Analysis of correlation based networks representing DAX 30 stock price returns. Computational Economics, 47(4):501–525, 2016.
[27] Fernando Lopes da Silva, Jan Pieter Pijn, and Peter Boeijinga. Interdependence of EEG signals: linear vs. nonlinear associations and the significance of time delays and phase shifts. Brain Topography, 2(1-2):9–18, 1989.
[28] Thomas M Cover and Joy A Thomas. Elements of information theory. John Wiley & Sons, 2012.
[29] Claude Elwood Shannon. A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1):3–55, 2001.
[30] Santo Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75–174, 2010.
[31] Giuseppe Buccheri, Stefano Marmi, and Rosario N Mantegna. Evolution of correlation structure of industrial indices of US equity markets. Physical Review E, 88(1):012806, 2013.
[32] Huan Chen, Yong Mai, and Sai-Ping Li. Analysis of network clustering behavior of the Chinese stock market. Physica A: Statistical Mechanics and its Applications, 414:360–367, 2014.
[33] Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.
[34] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
[35] Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2(1):193–218, 1985.
[36] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
[37] William M Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846–850, 1971.
[38] David R Wood. On the maximum number of cliques in a graph. Graphs and Combinatorics, 23(3):337–352, 2007.
[39] Florent Krzakala, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, and Pan Zhang. Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences, 110(52):20935–20940, 2013.
[40] Can M Le, Elizaveta Levina, and Roman Vershynin. Sparse random graphs: regularization and concentration of the Laplacian. arXiv preprint arXiv:1502.03049, 2015.
[41] Arash A Amini, Aiyou Chen, Peter J Bickel, Elizaveta Levina, et al. Pseudo-likelihood methods for community detection in large sparse networks.