arXiv on 3 January 2021

Computing Cliques and Cavities in Networks

Dinghua Shi*, Zhifeng Chen, Xiang Sun, Qinghua Chen*, Yang Lou, Guanrong Chen*
(Department of Mathematics, College of Science, Shanghai University, China, [email protected]; School of Mathematics and Statistics, Fujian Normal University, China, [email protected]; Department of Electrical Engineering, City University of Hong Kong, China, [email protected])

Abstract:

Complex networks have complete subgraphs such as nodes, edges, triangles, etc., referred to as cliques of different orders. Notably, cavities consisting of higher-order cliques have been found to play an important role in brain functions. Since searching for the maximum clique in a large network is an NP-complete problem, we propose using $k$-core decomposition to determine the computability of a given network subject to limited computing resources. For a computable network, we design a search algorithm for finding cliques of different orders, which also provides the Euler characteristic number. Then, we compute the Betti numbers by using the ranks of the boundary matrices of adjacent cliques. Furthermore, we design an optimized algorithm for finding cavities of different orders. Finally, we apply the algorithm to the neuronal network of C. elegans in one dataset, and find all of its cliques and some cavities of different orders therein, providing a basis for further mathematical analysis and computation of the structure and function of the C. elegans neuronal network.

Keywords:

C. elegans neuronal network, boundary matrix, clique, cavity, 0-1 programming, Euler characteristic number, Betti number

Introduction

A network has three basic sub-structures: chain, star and cycle. Chains are closely related to the concept of average distance, and a small average distance together with a large clustering coefficient implies a small-world network, where the clustering coefficient is determined by the number of triangles, which are special cycles. Stars follow heterogeneous degree distributions, with which the growth of node numbers and a preferential attachment mechanism together lead from random networks to scale-free networks. Cycles contain not only triangles but also higher-order cliques and cavities. In retrospect, we introduced the notion of totally homogeneous networks in studying optimal network synchronization, which are networks with the same node degree, same girth (length of the smallest cycle passing through the node) and same path-sum (sum of all distances from other nodes to the node). We showed that totally homogeneous networks are the easiest ones to self-synchronize among all networks of the same size. Recently, we found that cycles are essentially described by the Euler characteristic number (alternating sum of the numbers of cliques of different orders) and the Betti numbers (numbers of cavities of different orders), while higher-order cliques and smallest cavities are key components of totally homogeneous networks. It is more challenging to study network cycles than nodes. A triangle is the smallest first-order cycle (denoted 1-cycle for brevity), which consists of three edges, and is a second-order clique (denoted 2-clique for brevity). Similarly, a complete graph of four nodes, which consists of 4 triangles, is a 3-clique. In the same manner, these concepts can be extended to higher-order ones.
In a connected undirected network, the number of cycles with different lengths (where the length is the number of cliques that compose the cycle) is huge; therefore, new mathematical concepts and tools, such as cyclic operations and equivalent cycles, are needed to classify them and select their representatives for effective analysis and computation. In brain science, computational neuroscience has a special focus on cyclic structures in neuronal networks. It was found, for example as reported in [8], that cycles generate neural loops in the brain, which not only can transmit information all over the brain but also have an important feedback function. It was suggested that this provides a foundation for the brain functions of memory and control. Unlike cliques, which are located at particular places, e.g., the cerebral cortex, cavities extend almost everywhere in the brain, connecting many different regions together. Reference [9] points out that, in both biological and artificial neural networks, one can find huge numbers of cliques and cavities, which are massive and complex but were not noticed before. Of particular importance is that cavities play an indispensable role in brain functioning. All these findings indicate an encouraging and promising direction in brain science research. However, it remains unclear how and in what pattern all such neuronal cliques and cavities are organized and connected with one another. This calls for further endeavor to understand the relationship between the complexity of higher-order topologies and the complexity of intrinsic neural functions of the brain. To do so, however, one needs to find most if not all cliques, and especially cavities, of different orders in the network.
Artificial intelligence, on the other hand, relies on artificial neural networks inspired by the brain neuronal network, including recurrent neural networks, convolutional neural networks, Hopfield neural networks, etc. Now, given the recent discovery of higher-order cliques and cavities in the brain, the question is how to further develop artificial intelligence to an even higher level by utilizing the new knowledge about the brain topology. It is notable that a new neuronal network construction was recently proposed by an MIT research team, inspired by the real structure of the neuronal network of C. elegans. It is an important but challenging problem to understand how the brain stores information, learns new knowledge and reacts to external stimuli, as well as its adaptively created topological connections and parallel computing patterns, which depends on in-depth studies of the brain neuronal network. Recently, the BRAIN Initiative of the USA, the Human Brain Project of the EU and the China Brain Project were established to take on such big challenges. In retrospect, many innovative mathematicians contributed a great deal of fundamental work to related subjects, such as the Euler characteristic number, the Betti number, the notion of groups introduced by Abel and Galois, higher-order Laplacian matrices, the Euler-Poincaré formula and the homology group. This also demonstrates the importance of studying cliques and cavities for further development of network science. In addition, the advance from node-based dynamics to higher-order Laplacian-based dynamics requires the knowledge of higher-order cliques and cavities. The numbers of zero eigenvalues of higher-order Laplacians are equal to the corresponding Betti numbers, while their associated eigenvectors are closely related to higher-order cavities.
Motivated by all the above observations, this paper investigates the computability of a complex network, based on which the study continues to find higher-order cliques and their Euler characteristic number, as well as higher-order Betti numbers and higher-order cavities. The approach starts from $k$-core decomposition, proceeds through finding cliques of different orders, and then performs a sequence of computations on the ranks of the corresponding boundary matrices to obtain the Betti numbers. An optimized algorithm is then developed for finding higher-order cavities. Finally, the paper shows how to apply the optimized algorithm to the neuronal network of C. elegans from a dataset, finding all of its cliques and some cavities of different orders.

Results

For computable undirected networks, the proposed approach is able to find all higher-order cliques and cavities, thereby obtaining the Euler characteristic number and all Betti numbers. These can provide global information for understanding and analyzing the relationships between topologies and functions of various complex networks such as brain neural networks.

1. Computable Networks

For undirected networks, the concept of clique in graph theory refers to a complete subgraph, e.g., a node is a 0-clique, an edge is a 1-clique, a triangle is a 2-clique, etc. For example, it is easy to find all such cliques in the sample network shown in Figure 1.

Figure 1.

A sample network, with 14 nodes, 26 edges, 13 triangles, and 1 tetrahedron

For a given general large-scale complex network, however, finding all cliques of different orders is never an easy task. In fact, even just searching for a maximum clique in a large network is an NP-complete problem. When finding all cliques of a large-scale undirected network, especially a dense one, the number of cliques is huge and increases rapidly as the network size becomes larger. For example, in the real USAir, Jazz and Yeast networks, if the number of cliques of each order is limited to no more than $10^7$ for the network to be computable, the order of the cliques can go up only to 9, 6 and 4, respectively, as summarized in Table 1, where $|N|$ ($|E|$) is the number of nodes (edges). If the number of cliques does not decrease with the increase of the order, it will become impossible to compute them by using personal computers.

Table 1.

Three real networks: their sizes, maximum core values $k_{\max}$, maximum clique orders $c_{\max}$, and the maximum order $k$ of cliques whose number $m_k$ is below $10^7$

| Network | $\lvert N \rvert$ | $\lvert E \rvert$ | $k_{\max}$ | $c_{\max}$ | $\max\{k \mid m_k < 10^7\}$ |
|---|---|---|---|---|---|
| USAir | 332 | 2126 | 26 | > 21 | 9 ($m_9 = 9121594$) |
| Jazz | 198 | 2742 | 29 | 29 | 6 ($m_6 = 2416059$) |
| Yeast | 2375 | 11693 | 40 | > 30 ? | 4 ($m_4 = 2454474$) |

For large and dense networks, $k$-core decomposition can determine the cells (layers), where the $k$th cell has all nodes with degrees at least $k$, and the kernel of the network has the largest core value, where nodes are very densely connected. Therefore, the largest core value $k_{\max}$ can be used to estimate the order of a maximum clique. For this reason, $k$-core decomposition is used to determine whether a given network is computable subject to the available limited computing resources. If the computing resources allow the number of cliques of the first several lowest orders to be no more than $10^7$, which commercial laptops and PCs can handle, then the maximum core value should not be bigger than 30, say limited to $k_{\max} = 25$, as detailed in Supplementary Information 1.

2. Clique-Searching Algorithm

For computable networks, we propose a clique-searching algorithm, namely a common-neighbors scheme, which can quickly find all cliques of different orders and the associated Euler characteristic number.

For illustration, consider the sample network shown in Figure 1.
(1) Find all neighbors of each node, which are: Node 1 {2,3,4,5}, Node 2 {1,3,4,5}, Node 3 {1,2,4,6,8}, Node 4 {1,2,3}, Node 5 {1,2,9}, Node 6 {3,7,14}, Node 7 {6,8}, Node 8 {3,7}, Node 9 {5,10,11,12,13}, Node 10 {9,11,13,14}, Node 11 {9,10,12,14}, Node 12 {9,11,13,14}, Node 13 {9,10,12,14}, Node 14 {6,10,11,12,13}. Compute the number of nodes (0-cliques): $m_0 = 14$.
(2) Then, from the above list, generate edges in increasing order of node numbers: (1,2), (1,3), (1,4), (1,5), (2,3), (2,4), (2,5), (3,4), (3,6), (3,8), (5,9), (6,7), (6,14), (7,8), (9,10), (9,11), (9,12), (9,13), (10,11), (10,13), (10,14), (11,12), (11,14), (12,13), (12,14), (13,14). Compute the number of edges (1-cliques): $m_1 = 26$.
(3) For every edge, check if its two nodes have common neighbors (the index-number should be bigger than the index-numbers of both nodes), and record all such neighbors. For example, edge (1,2) has common neighbors {3,4,5}, edge (1,3) has {4}, edge (2,3) has {4}, edge (9,10) has {11,13}, edge (9,11) has {12}, edge (9,12) has {13}, edge (10,11) has {14}, edge (10,13) has {14}, edge (11,12) has {14}, edge (12,13) has {14}. However, edges (1,4), (1,5), (2,4), (2,5), (3,4), (3,6), (3,8), (5,9), (6,7), (6,14), (7,8), (9,13), (10,14), (11,14), (12,14), (13,14) do not have any such common neighbor. Thus, the following triangles are obtained: (1,2,3), (1,2,4), (1,2,5), (1,3,4), (2,3,4), (9,10,11), (9,10,13), (9,11,12), (9,12,13), (10,11,14), (10,13,14), (11,12,14), (12,13,14). Compute the number of triangles (2-cliques): $m_2 = 13$.
(4) For each triangle, check if its three nodes have common neighbors (the index-number should be bigger than the index-numbers of all three nodes), and record all such neighbors. Here, only triangle (1,2,3) has a common neighbor {4}, yielding 1 tetrahedron (1,2,3,4). Compute the number of tetrahedrons (3-cliques): $m_3 = 1$.
(5) This does not yield any higher-order clique.
(6) Compute the Euler characteristic number: $\chi = m_0 - m_1 + m_2 - m_3 = 14 - 26 + 13 - 1 = 0$.
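The common-neighbors scheme in steps (1) to (6) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `find_cliques` and the list-of-levels layout are assumptions made here, and the edge list is the one given in step (2).

```python
# Edge list of the Figure 1 sample network, as listed in step (2).
EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4),
         (3, 6), (3, 8), (5, 9), (6, 7), (6, 14), (7, 8), (9, 10), (9, 11),
         (9, 12), (9, 13), (10, 11), (10, 13), (10, 14), (11, 12), (11, 14),
         (12, 13), (12, 14), (13, 14)]


def find_cliques(edges):
    """Return levels, where levels[k] lists all k-cliques as sorted node tuples."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    levels = [[(v,) for v in sorted(nbrs)]]          # 0-cliques (nodes)
    while levels[-1]:
        nxt = []
        for clique in levels[-1]:
            # Common neighbors of all nodes in the clique ...
            common = set.intersection(*(nbrs[v] for v in clique))
            # ... keeping only larger indices, so each clique is found once.
            nxt.extend(clique + (w,) for w in sorted(common) if w > clique[-1])
        levels.append(nxt)
    return levels[:-1]                                # drop the empty last level


levels = find_cliques(EDGES)
counts = [len(level) for level in levels]             # m_0, m_1, m_2, ...
chi = sum((-1) ** k * m for k, m in enumerate(counts))
print(counts, chi)                                    # [14, 26, 13, 1] 0
```

Starting from single nodes lets the same loop produce edges, triangles and tetrahedra without a special case for each order.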

3. Computing Betti Numbers

The cliques of all orders obtained above can be used to generate the boundary matrices $B_k$, $k = 1, 2, \ldots$, where $B_1$ is the node-edge matrix, in which an element is 1 if the node is on the corresponding edge; otherwise, it is 0. Similarly, $B_2$ is the edge-face matrix, in which an element is 1 if the edge is on the corresponding face; otherwise, it is 0, etc. It is straightforward to compute the rank $r_k$ of the matrix $B_k$ for every $k = 1, 2, \ldots$, using row-column operations in the binary field $\mathbb{F}_2$, following the binary operation rules, namely addition modulo 2 (so that $1 + 1 = 0$). Then, the Betti number can be obtained as $\beta_k = m_k - r_k - r_{k+1}$.

Figure 2.

A subnetwork of the network shown in Figure 1.

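As an example, consider the network shown in Figure 2, which is a subnetwork of the one shown in Figure 1, with the node-edge boundary matrix $B_1$ of rank $r_1 = 7$ as follows:

$$
B_1 = \begin{array}{c|cccccccccccc}
 & (1,2) & (1,3) & (1,4) & (1,5) & (2,3) & (2,4) & (2,5) & (3,4) & (3,6) & (3,8) & (6,7) & (7,8) \\ \hline
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
2 & \mathbf{1} & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
3 & 0 & \mathbf{1} & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & 0 \\
4 & 0 & 0 & \mathbf{1} & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
5 & 0 & 0 & 0 & \mathbf{1} & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \mathbf{1} & 0 & 1 & 0 \\
7 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \mathbf{1} & 1 \\
8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \mathbf{1} & 0 & 1
\end{array}
$$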

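Moreover, its edge-face boundary matrix $B_2$ of rank $r_2 = 4$ is obtained, as follows:

$$
B_2 = \begin{array}{c|ccccc}
 & (1,2,3) & (1,2,4) & (1,2,5) & (1,3,4) & (2,3,4) \\ \hline
(1,2) & 1 & 1 & 1 & 0 & 0 \\
(1,3) & 1 & 0 & 0 & 1 & 0 \\
(1,4) & 0 & 1 & 0 & 1 & 0 \\
(1,5) & 0 & 0 & 1 & 0 & 0 \\
(2,3) & \mathbf{1} & 0 & 0 & 0 & 1 \\
(2,4) & 0 & \mathbf{1} & 0 & 0 & 1 \\
(2,5) & 0 & 0 & \mathbf{1} & 0 & 0 \\
(3,4) & 0 & 0 & 0 & \mathbf{1} & 1
\end{array}
$$

Table 2 summarizes all data for the network shown in Figure 1, in which the Euler characteristic number and Betti numbers satisfy the Euler-Poincaré formula $\chi = \beta_0 - \beta_1 + \beta_2 - \beta_3 = 1 - 2 + 1 - 0 = 0$.

Table 2.

Data for the network shown in Figure 1

| Order $k$ | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| $m_k$ | 14 | 26 | 13 | 1 |
| $r_k$ | 0 | 13 | 11 | 1 |
| $\beta_k = m_k - r_k - r_{k+1}$ | 1 | 2 | 1 | 0 |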
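The rank and Betti-number computation for the Figure 1 network can be reproduced with the following sketch. It is an illustration under stated assumptions, not the authors' code: columns of each boundary matrix are encoded as integer bitmasks, the rank over the binary field is obtained by XOR elimination, and the helper names (`cliques_by_order`, `gf2_rank`, `betti_numbers`) are hypothetical.

```python
from itertools import combinations

EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4),
         (3, 6), (3, 8), (5, 9), (6, 7), (6, 14), (7, 8), (9, 10), (9, 11),
         (9, 12), (9, 13), (10, 11), (10, 13), (10, 14), (11, 12), (11, 14),
         (12, 13), (12, 14), (13, 14)]


def cliques_by_order(edges):
    """levels[k] = all k-cliques (sorted tuples), via common neighbors."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    levels = [[(v,) for v in sorted(nbrs)]]
    while levels[-1]:
        levels.append([c + (w,) for c in levels[-1]
                       for w in sorted(set.intersection(*(nbrs[v] for v in c)))
                       if w > c[-1]])
    return levels[:-1]


def gf2_rank(vectors):
    """Rank over the binary field of vectors encoded as integer bitmasks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)        # clears the leading bit of b from v
        if v:
            basis.append(v)
    return len(basis)


def betti_numbers(levels):
    m = [len(level) for level in levels]
    ranks = [0]                                       # r_0 = 0
    for k in range(1, len(levels)):
        index = {c: i for i, c in enumerate(levels[k - 1])}
        # Column of B_k for a k-clique: its k+1 faces, each a (k-1)-clique.
        cols = [sum(1 << index[face] for face in combinations(c, k))
                for c in levels[k]]
        ranks.append(gf2_rank(cols))
    ranks.append(0)                                   # no cliques above top order
    betti = [m[k] - ranks[k] - ranks[k + 1] for k in range(len(m))]
    return m, ranks[:-1], betti


m, r, beta = betti_numbers(cliques_by_order(EDGES))
print(m, r, beta)   # [14, 26, 13, 1] [0, 13, 11, 1] [1, 2, 1, 0]
```

The printed rows agree with the $m_k$, $r_k$ and $\beta_k$ rows of Table 2, and the alternating sums of `m` and of `beta` are both 0.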

4. Cavity-Searching Algorithm

The concept of cavity comes from the homology group in algebraic topology. Since a network usually has many 1-cycles (for instance, the network shown in Figure 1 has nearly one hundred), they are classified into equivalence classes to facilitate investigation. In a network, each 1-cavity belongs to a linearly independent cycle-equivalent class, with the total number of such classes equal to the Betti number $\beta_1$. It is relatively easy to understand a 1-cavity, which has boundary edges consisting of 1-cliques. It needs some imagination to understand higher-order cavities, which have boundaries consisting of cliques of the same order. So far, in the literature, only one 2-cavity consisting of 8 triangles has been found and reported. In the present paper, we found all possible smallest cavities and list them up to order 11 in Supplementary Information 2. Since a cavity belongs to a cycle-equivalent class, only one representative from the class with the smallest length (namely, the smallest number of cliques) is chosen for further discussion. To find the smallest one, however, optimization is needed. Select a maximum linearly independent group of column vectors from the boundary matrix $B_k$ as the minimum $k$th-order spanning tree, which consists of $r_k$ $k$-cliques, where $r_k$ is the rank of matrix $B_k$. Then, perform row-column binary operations to reduce it to a simplest form. In every row of the resultant matrix, the column index of the first nonzero element is used as the index of the $k$-clique in the spanning tree. For the example shown in Figure 2, the bold-faced 1's in the matrix $B_1$ correspond to columns (1, 2), (1, 3), (1, 4), (1, 5), (3, 6), (3, 8), (6, 7), which constitute a spanning tree. Note that the minimum $k$th-order spanning trees are not unique in general. Then, find a maximum group of linearly independent column vectors from the boundary matrix $B_{k+1}$, and obtain $r_{k+1}$ $(k+1)$-cliques as a group of linearly independent cliques.
From this group, for each $(k+1)$-clique, search for a $k$-clique (the row index of the first nonzero element) that belongs to the boundary of the $(k+1)$-clique but does not belong to the $k$th-order spanning tree. In other words, none of these $r_{k+1}$ $k$-cliques should be a $k$-clique in the minimum spanning tree. If this cannot be achieved, then choose another maximum group of linearly independent column vectors from the boundary matrix $B_{k+1}$, and try again. In this way, $r_{k+1}$ $k$-cliques are found. For the example shown in Figure 2, the bold-faced 1's in the boundary matrix $B_2$ correspond to the rows (2, 3), (2, 4), (2, 5), (3, 4), which are different from the cliques in the spanning tree. Recall the formula of Betti numbers, $\beta_k = m_k - r_k - r_{k+1}$, which gives the number of linearly independent $k$-cavities. Now, the task is to find the remaining $k$-cliques that are neither in the $k$th-order minimum spanning tree nor on the boundaries of the linearly independent $(k+1)$-cliques. These are called cavity-generating cliques. In the example shown in Figure 2, there is only one: (7, 8). On the minimum spanning tree, after including all linearly independent boundaries, adding every cavity-generating $k$-clique will create a linearly independent $k$-cavity. Every cavity-generating $k$-clique corresponds to at least one $k$-cavity. However, a cavity-generating $k$-clique may correspond to several equal-length cavities, where the length is the number of cliques. Since a cavity is a linearly independent cycle with the smallest number of cliques, the task of searching for a cavity can be reformulated as a 0-1 programming problem. Recall that there are $m_k$ $k$-cliques, $B_k$ is the boundary matrix between $(k-1)$-cliques and $k$-cliques, $B_{k+1}$ is the boundary matrix between $k$-cliques and $(k+1)$-cliques, and a $k$-cavity consists of some $k$-cliques. Let $C_k$ be the vector space based on $k$-cliques.
A $k$-cavity can be expressed as $\boldsymbol{x} = (x_1, x_2, \ldots, x_{m_k}) \in C_k$, in which each component $x_i$ takes value 1 or 0, where 1 means that the $k$-clique with index $i$ is in the cavity, while 0 means that it is not. Now, suppose that a cavity-generating $k$-clique has index $v$ among all $k$-cliques and let $\boldsymbol{e} = (1, 1, \ldots, 1)^T$. Then, the problem of searching for a $k$-cavity becomes the following optimization problem, to be solved for all nonzero solutions:

$$\min_{\boldsymbol{x} \in C_k} f(\boldsymbol{x}) = \boldsymbol{x}\boldsymbol{e}$$

s.t. (1) $x_v = 1$, (2) $B_k \boldsymbol{x}^T = 0 \pmod 2$, (3) $\mathrm{rank}(\boldsymbol{x}^T, B_{k+1})_{\mathbb{F}_2} \neq r_{k+1}$.

Here, the first constraint means that the cavity comes from the cavity-generating $k$-clique with index $v$. The second constraint implies that the cavity is a $k$-cycle, namely the boundaries of the $k$-cliques that form the cavity should appear in pairs. The third constraint shows that the $k$-cavity to be found is not a linear representation of the boundaries of $(k+1)$-cliques. This can avoid generating false cavities. It has been found that the sample network shown in Figure 1 has two 1-cavities. With the 26 edges indexed in the order listed in Section 2, the two cavity-generating 1-cliques are $x_{14} = 1$, corresponding to edge (7, 8), and $x_{11} = 1$, corresponding to edge (5, 9). The optimization problem for the first one is as follows:

$$\min_{\boldsymbol{x} \in C_1} f(\boldsymbol{x}) = \boldsymbol{x}\boldsymbol{e}$$

s.t. (1) $x_{14} = 1$, (2) $B_1 \boldsymbol{x}^T = 0 \pmod 2$, namely

$$
\begin{aligned}
x_1 + x_2 + x_3 + x_4 &= 0, & x_1 + x_5 + x_6 + x_7 &= 0, \\
x_2 + x_5 + x_8 + x_9 + x_{10} &= 0, & x_3 + x_6 + x_8 &= 0, \\
x_4 + x_7 + x_{11} &= 0, & x_9 + x_{12} + x_{13} &= 0, \\
x_{12} + x_{14} &= 0, & x_{10} + x_{14} &= 0, \\
x_{11} + x_{15} + x_{16} + x_{17} + x_{18} &= 0, & x_{15} + x_{19} + x_{20} + x_{21} &= 0, \\
x_{16} + x_{19} + x_{22} + x_{23} &= 0, & x_{17} + x_{22} + x_{24} + x_{25} &= 0, \\
x_{18} + x_{20} + x_{24} + x_{26} &= 0, & x_{13} + x_{21} + x_{23} + x_{25} + x_{26} &= 0,
\end{aligned}
$$

(3) $\mathrm{rank}(\boldsymbol{x}^T, B_2)_{\mathbb{F}_2} \neq r_2$.

Solving the above 0-1 programming problem, from $x_{14} = 1$ corresponding to (7, 8) it yields $x_{10} = 1$ corresponding to (3, 8), and from $x_{12} = 1$ corresponding to (6, 7) it yields $x_9 = 1$ corresponding to (3, 6), leading to the first cavity (3, 6, 7, 8). Then, replacing $x_{14} = 1$ by $x_{11} = 1$ yields the second cavity (1, 5, 9, 10, 14, 6, 3). Finally, there are 8 equal-length cavities through edge (5, 9), including the 1-cavity (2, 5, 9, 10, 14, 6, 3) and the 1-cavities (1, 5, 9, $u$, 14, 6, 3) with $u \in \{11, 12, 13\}$, etc.
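For a network as small as the Figure 1 sample, the three constraints can be checked by plain enumeration, which makes the formulation concrete. The sketch below is an assumption-laden illustration and not the optimized algorithm of the paper: it tries edge subsets in increasing size instead of using a 0-1 programming solver, it handles only 1-cavities, and the function names are hypothetical.

```python
from itertools import combinations

# Edges and triangles of the Figure 1 sample network, as listed in Section 2.
EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4),
         (3, 6), (3, 8), (5, 9), (6, 7), (6, 14), (7, 8), (9, 10), (9, 11),
         (9, 12), (9, 13), (10, 11), (10, 13), (10, 14), (11, 12), (11, 14),
         (12, 13), (12, 14), (13, 14)]
TRIANGLES = [(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (2, 3, 4),
             (9, 10, 11), (9, 10, 13), (9, 11, 12), (9, 12, 13),
             (10, 11, 14), (10, 13, 14), (11, 12, 14), (12, 13, 14)]


def gf2_rank(vectors):
    """Rank over the binary field of vectors encoded as integer bitmasks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)        # clears the leading bit of b from v
        if v:
            basis.append(v)
    return len(basis)


def smallest_cavities(edges, triangles, generator):
    """All minimum-length 1-cavities through the edge `generator` (exhaustive)."""
    idx = {e: i for i, e in enumerate(edges)}
    v = idx[generator]
    node_mask = [(1 << a) | (1 << b) for a, b in edges]        # columns of B_1
    tri_cols = [(1 << idx[(a, b)]) | (1 << idx[(a, c)]) | (1 << idx[(b, c)])
                for a, b, c in triangles]                       # columns of B_2
    r2 = gf2_rank(tri_cols)
    others = [i for i in range(len(edges)) if i != v]
    for size in range(1, len(edges) + 1):      # objective value f(x), ascending
        found = []
        for rest in combinations(others, size - 1):
            chosen = (v,) + rest               # constraint (1): x_v = 1
            boundary = 0
            for i in chosen:
                boundary ^= node_mask[i]
            if boundary:                       # constraint (2) violated
                continue
            x = sum(1 << i for i in chosen)
            if gf2_rank(tri_cols + [x]) != r2:  # constraint (3): not a boundary
                found.append(sorted(edges[i] for i in chosen))
        if found:
            return found
    return []


print(smallest_cavities(EDGES, TRIANGLES, (7, 8)))
print(len(smallest_cavities(EDGES, TRIANGLES, (5, 9))))   # 8 equal-length cavities
```

Enumeration grows exponentially with the number of cliques, which is why the paper turns to 0-1 programming for real networks; the sketch is only meant to show what each constraint tests.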

5. Cliques and Cavities of C. elegans

For a dataset of C. elegans with 297 neurons and 2148 synapses, all of its cliques and some of its cavities are obtained in this paper by using the above-described techniques, and are compared with those of typical scale-free (SF), small-world (SW) and random (ER) network models with the same numbers of nodes and edges. The results are shown in Figure 3 and Table 3.

Figure 3.

The number of cliques and the Betti numbers for the C. elegans versus ER, SF and SW networks

Table 3.

The Euler characteristic number, Betti numbers and the Euler-Poincaré formula for the C. elegans, ER, SW and SF networks

Since the highest-order nonzero Betti number is $\beta_3 = 4$, the C. elegans network has 4 linearly independent 3-cavities, two of which are shown in Figure 4. The 3-cavity in Figure 4 (a) has 8 nodes. The cavity-generating 3-clique (118, 119, 167, 227) forms a 3-cavity with 11 nodes, (162, 3, 163, 158, 13, 85, 227, 118, 167, 154, 119), as shown in Figure 4 (b); see Supplementary Information 3.

Figure 4.

Two 3-cavities in the C. elegans neuronal network: (a) 3-cavity with 8 nodes; (b) 3-cavity with 11 nodes

Discussions

For a directed network, how should higher-order cliques and cavities be analyzed? In [9], directed cliques are introduced and a Hasse algorithm is developed to find them. However, the concepts of cycle and especially cavity were not precisely defined therein. For an undirected network, the length of a cavity, namely the number of cliques that compose it, is longer than the length of a clique, viewed as a cycle, of the same order as the cavity. For example, an undirected triangle of length 3 is not only a 2-clique but also a 1-cycle, while a 1-cavity is at least a quadrilateral, with length 4. For a directed network, however, this may not be true. For example, the smallest 1-cavity could be composed of two oppositely directed edges, with length 2, but a directed 2-clique could be a directed triangle of length 3. This implies the extreme complexity of directed cavities, which will be a topic for future investigation. It should be noted that the key technique in the present approach is to examine various combinations of cliques and cavities, which differs from the focus on node degrees in the current investigation of complex networks, where the emphasis is on statistical rather than topological properties. After comparing the neuronal network of C. elegans with the scale-free, small-world and random network models, we found that they are very different regarding the numbers of cliques and cavities. From the perspective of brain science, various combinations of higher-order topological components such as cliques and cavities are of extreme importance, without which it is very difficult or even impossible to understand and explain the functional complexity of the brain. In fact, this seems to provide reasonable support for the recent works of many brain scientists.

Method

1. Maximum Clique and 0-1 Programming

The clique-searching algorithm aims to find all cliques of different orders. To find a maximum clique is to find a complete subgraph of the largest size. This is a classical combinatorial optimization problem, which is NP-complete in computational complexity. A common approach to finding a solution is the branch-and-bound method, which constructs a binary tree through all nodes of the network. Searching for cavities of different orders can be formulated as a typical 0-1 programming problem, which can also be implemented using the binary tree method, as follows. Let $x_1, x_2, \ldots, x_n$ be integer variables taking values 0 or 1. Denote the problem by $P(x_1, x_2, \ldots, x_n)$, and its relaxed (not restricted to integers) linear programming problem by $LP(x_1, x_2, \ldots, x_n)$, with an optimal solution $f(x_1, x_2, \ldots, x_n)$. First, solve the two sub-problems $P(0, x_2, \ldots, x_n)$ and $P(1, x_2, \ldots, x_n)$ by using $LP(0, x_2, \ldots, x_n)$ and $LP(1, x_2, \ldots, x_n)$, obtaining two solutions $f(0, x_2, \ldots, x_n)$ and $f(1, x_2, \ldots, x_n)$, respectively. If both are integer solutions, then the smaller one will be the optimal solution of the original problem. If only one is an integer solution, but it is not larger than the other one, then this integer solution is the optimal solution. If, however, neither of the above two cases applies, then continue to solve new sub-problems with the second variable being 0 or 1, and so on, till the end.
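The binary-tree idea can be illustrated on the Figure 2 subnetwork with a depth-first branch-and-bound over the edge variables. This is a simplified sketch, not the method described above in full: instead of solving a linear-programming relaxation at each node of the tree, it uses the number of variables already set to 1 as the lower bound, and the function names are hypothetical.

```python
# Edges and triangles of the Figure 2 subnetwork.
EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4),
         (3, 6), (3, 8), (6, 7), (7, 8)]
TRIANGLES = [(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (2, 3, 4)]


def gf2_rank(vectors):
    """Rank over the binary field of vectors encoded as integer bitmasks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def branch_and_bound(edges, triangles, generator):
    """Smallest 1-cavity through `generator`, by a binary tree over x_1..x_m."""
    m = len(edges)
    idx = {e: i for i, e in enumerate(edges)}
    v = idx[generator]
    node_mask = [(1 << a) | (1 << b) for a, b in edges]
    tri_cols = [(1 << idx[(a, b)]) | (1 << idx[(a, c)]) | (1 << idx[(b, c)])
                for a, b, c in triangles]
    r2 = gf2_rank(tri_cols)
    best = {"size": m + 1, "x": None}          # incumbent solution

    def visit(i, x, ones, boundary):
        if ones >= best["size"]:               # bound: cannot beat the incumbent
            return
        if i == m:                             # leaf: every variable is fixed
            if boundary == 0 and gf2_rank(tri_cols + [x]) != r2:
                best["size"], best["x"] = ones, x
            return
        if i != v:                             # constraint (1) forbids x_v = 0
            visit(i + 1, x, ones, boundary)    # branch x_i = 0
        visit(i + 1, x | (1 << i), ones + 1, boundary ^ node_mask[i])  # x_i = 1

    visit(0, 0, 0, 0)
    if best["x"] is None:
        return []
    return [edges[i] for i in range(m) if (best["x"] >> i) & 1]


print(branch_and_bound(EDGES, TRIANGLES, (7, 8)))
# [(3, 6), (3, 8), (6, 7), (7, 8)]
```

A tighter bound, such as the linear-programming relaxation described in the text, prunes far more of the tree and is what makes the approach practical beyond toy examples.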

2. Homology Groups and Cavity-Searching Algorithm

The vector space based on cliques includes the kernel space and the image space. For example, the vector space $C_1$, with all edges (1-cliques) as its basis, contains elements called 1-chains. A chain $l \in C_1$ that satisfies $\partial_1(l) = 0$ is a 1-cycle, where $\partial_k: C_k \to C_{k-1}$, $k = 1, 2, \ldots$, is the boundary operator. All 1-cycles constitute the kernel space $Z_1 = \ker(\partial_1)$, while $\partial_2(\triangle)$ is the image of the mapping from $C_2$ to $C_1$, where $\triangle$ represents a triangle. All such images together constitute the image space $Y_1 = \mathrm{im}(\partial_2)$. Note that the homology group is defined by $Z_k / Y_k = \ker(\partial_k) / \mathrm{im}(\partial_{k+1})$, where $Y_k \subseteq Z_k \subseteq C_k$, so the cavity-searching algorithm must be subject to three constraints: (1) $x_v = 1$, (2) $B_k \boldsymbol{x}^T = 0 \pmod 2$, (3) $\mathrm{rank}(\boldsymbol{x}^T, B_{k+1})_{\mathbb{F}_2} \neq r_{k+1}$.

A feasible and improved method is to use some information provided by the eigenvectors of higher-order Laplacian matrices to solve a dimension-descending 0-1 programming problem for the $l$th cavity $\boldsymbol{x}^{(l)}$, $l = 1, 2, \ldots, \beta_k$:
(1) $x_{v_1}^{(l)} = 1, \cdots, x_{v_i}^{(l)} = 1$, $x_{u_1}^{(l)} = 0, \cdots, x_{u_j}^{(l)} = 0$;
(2) $B_k \boldsymbol{x}^T = 0 \pmod 2$;
(3) $\mathrm{rank}(\boldsymbol{x}^{(1)}, \cdots, \boldsymbol{x}^{(l)}, B_{k+1})_{\mathbb{F}_2} = l + r_{k+1}$,
which can ensure that the cavities found are linearly independent.

Data availability

Data used in this work can be accessed at http://linkprediction.org/index.php/link/resource/data/1

References

1 Watts, DJ, Strogatz, SH. Collective dynamics of 'small-world' networks. Nature. 1998: 440-442.
2 Erdös, P, Rényi, A. On random graphs. Publicationes Mathematicae: 290-291.
3 Barabási, A-L, Albert, R. Emergence of scaling in random networks. Science. 1999: 509-512.
4 Shi, DH, Chen, GR, Thong, WWK et al. Searching for optimal network topology with best possible synchronizability. IEEE Circ. Syst. Magaz.: 66-75.
5 Shi, DH, Lü, LY, Chen, GR. Totally homogeneous networks. Natl. Sci. Rev.: 962-969.
6 Zomorodian, A, Carlsson, G. Computing persistent homology. Discrete Comput. Geom.: 249-274.
7 Gu, DXF, Yau, ST. Computational Conformal Geometry - Theory. International Press of Boston, Inc. 2008.
8 Sizemore, AE, Giusti, C, Kahn, A et al. Cliques and cavities in the human connectome. J. Comput. Neurosci.: 115-145.
9 Reimann, MW et al. Cliques of neurons bound into cavities provide a missing link between structure and function. Frontiers in Comput. Neurosci.: 00048.
10 Hassoun, MH. Fundamentals of Artificial Neural Networks. MIT Press, 1995.
11 Lechner, M, Hasani, R et al. Neural circuit policies enabling auditable autonomy. Nature Machine Intelligence. 2020: 542-652.
12 https://en.wikipedia.org/wiki/BRAIN_Initiative https://braininitiative.nih.gov/
13 https://en.wikipedia.org/wiki/China_Brain_Project
15 Battiston, F, Latora, V, Petri, G et al. Networks beyond pairwise interactions: structure and dynamics. Phys. Rep. 2020: 004.
16 Millan, AP, Torres, JJ, Bianconi, G. Explosive higher-order dynamics on simplicial complexes. Phys. Rev. Lett. 2020: 218301.
17 Kitsak, M, Makse, HA et al. Identification of influential spreaders in complex networks. Nat. Phys. 2010; (11): 888-893.
18 Bomze, IM, Budinich, M, Pardalos, PM, Pelillo, M. The maximum clique problem. In Handbook of Combinatorial Optimization, pp. 1-74. Springer, Boston, MA, 1999.
19 Fan, TL, Lü, LY, Shi, DH, Zhou, T. Characterizing cycle structure in complex networks. arXiv:2001.08541 [physics.soc-ph].
20 Rossi, RA, Ahmed, NK. The network data repository with interactive graph analysis and visualization. In Twenty-Ninth AAAI Conference, AAAI Press, 2015; 4292-4293.

Supplementary Information 1. $k$-Cores and Computable Networks

For the real USAir, Jazz and Yeast networks, the number of cliques of each order is limited to no more than $10^7$, as detailed in Table SI-1.

Table SI-1

Number of cliques of different orders in real networks

| Network | 0-cliques | 1-cliques | 2-cliques | 3-cliques | 4-cliques |
|---|---|---|---|---|---|
| USAir | 332 | 2126 | 12181 | 61072 | 243506 |
| Jazz | 198 | 2742 | 17899 | 78442 | 273697 |
| Yeast | 2375 | 11693 | 60689 | 424444 | 2454474 |

| Network | 5-cliques | 6-cliques | 7-cliques | 8-cliques | 9-cliques |
|---|---|---|---|---|---|
| USAir | 766659 | 1931547 | 3947163 | 6608097 | 9121594 |
| Jazz | 845960 | 2416059 | | | |
| Yeast | | | | | |

If the number of cliques does not decrease with the increase of the order, it will become impossible to compute them by using personal computers. It is noted that, in any network of a fixed size, except trees, its number of cliques of different orders has a peak value as the order number increases, namely it is first increasing and then decreasing. For instance, for a fully-connected network of size 𝑁 , the numbers of its 𝑚 -th order cliques are: 𝑚 = 𝐶 𝑚1 , 𝑚 = 𝐶 𝑚2 , ⋯ , 𝑚 𝑁−2 = 𝐶

𝑁𝑁−1 , 𝑚 𝑁−1 = 𝐶 𝑁𝑁 , where it peaks at ( 𝑁2 − 1) -clique (if 𝑁 is even) or ( 𝑁−12 − 1) -clique (if 𝑁 is odd). For example, when 𝑁 = 30 , it peaks at the 14-clique, with m =155117520; when 𝑁 = 25 , it peaks at the 12-clique, with 𝑚 = 𝐶 = 5,200,300 . Given limited computational resources, how can one determine if a given network is computable? For relatively large-scale and dense networks, 𝑘 -core decomposition may be used to roughly give an estimate. The 𝑘 -core technique can be used to determine the cell of different orders, where all nodes on the 𝑘 -sell have degree larger than or equal to 𝑘 . The cell with the largest core value is the core of the network, where the connection is dense, therefore it can be used for measuring the order of the largest clique in the network. For example, in the Jazz network, the 29th cell has 30 nodes and 435 edges, implying that this is a fully-connected network; therefore, its core is a 29-clique, which is the order of the largest clique of the Jazz network. In the USAir network, the largest core value is 26, where the core has 35 nodes and 539 edges; therefore, its largest clique is a 21-clique, which is close to the core value 26. For the Yeast network, its core has 64 nodes and 1623 edges, which is rXiv on 3 January 2021 known to have largest core value of 40; although the computation here reaches up to 6-clique, it can be seen that the order of the largest clique would not be small. The detailed core values of USAir, Jazz and Yeast are summarized in Table SI-2, where m i is the core value of the 𝑖 -core, 𝑖 = 0, 1, 2, … , 29 . Table SI-2

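Core values of real networks

| Core value | $m_0$ | $m_1$ | $m_2$ | $m_3$ | $m_4$ | $m_5$ | $m_6$ | $m_7$ | $m_8$ | $m_9$ |
|---|---|---|---|---|---|---|---|---|---|---|
| USAir-26 | 35 | 539 | 4938 | 30580 | 137428 | 468604 | 1248988 | 2656044 | 4570650 | 6425067 |
| Jazz-29 | 30 | 435 | 4060 | 27405 | 142506 | 593775 | 2035800 | 5852925 | 14307150 | 30045015 |
| Yeast-40 | 64 | 1623 | 22344 | 196991 | 1222179 | 5656082 | 20278476 | | | |

[Columns $m_{10}$ to $m_{29}$ are not recoverable from the source, except for the USAir-26 entries 132, 5, 0 in the last block, $m_{20}$ to $m_{22}$.]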
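As a quick check of the binomial counts quoted for a fully-connected network, the following minimal sketch lists $m_i = C_N^{i+1}$ and locates the peak. The function names are illustrative and do not come from the original work.

```python
from math import comb

def complete_graph_clique_counts(n):
    """Return [m_0, ..., m_{n-1}], where m_i = C(n, i+1) counts the
    i-cliques (complete subgraphs on i+1 nodes) of a complete graph on n nodes."""
    return [comb(n, i + 1) for i in range(n)]

def peak_orders(counts):
    """Return every order i at which the clique count is maximal."""
    top = max(counts)
    return [i for i, m in enumerate(counts) if m == top]

for n in (30, 25):
    counts = complete_graph_clique_counts(n)
    # For each size: the order(s) of the peak and the peak value
    print(n, peak_orders(counts), max(counts))
```

For $N = 30$ the peak is the single order 14 with 155117520 cliques; for $N = 25$ the maximum 5200300 is attained at both orders 11 and 12.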

The above analysis shows that, given the limited computational resources available today, if the number of $k$-cliques is up to the order of […], then the largest core value of the network should not be larger than 30, or should even be restricted to below 25.

The $k$-core decomposition proceeds as follows. First, all nodes of degree $k = 1$ are removed; this may create new nodes of degree $k \le 1$, which need to be removed as well, until all remaining nodes have degree $k > 1$. All removed nodes and edges constitute the 1-shell, with core value 1. This process continues for $k = 2, 3, \ldots$, up to the highest value $k_{max}$ at which all nodes are removed; this last shell has core value $k_{max}$ and is the core of the original network. The same idea can be applied to cliques, which is named $k$-clique decomposition.

Consider the sample network shown in Fig. 1, for instance. This network has no 0-shell and no 1-shell. Its 2-shell contains nodes 6, 7, 8 and edges (3,6), (3,8), (6,7), (6,14), (7,8), while its 1-cliques are the edges (3,6), (3,8), (6,7), (6,14), (7,8). Its 3-shell consists of nodes 1, 2, 3, 4, 5 and edges (1,2), (1,3), (1,4), (1,5), (2,3), (2,4), (2,5), (3,4), (5,9), while its 2-cliques are the triangles (1,2,5), (9,10,11), (9,10,13), (9,11,12), (9,12,13), (10,11,14), (10,13,14), (11,12,14), (12,13,14). Its 4-shell consists of nodes 9, 10, 11, 12, 13, 14 and edges (9,10), (9,11), (9,12), (9,13), (10,11), (10,13), (10,14), (11,12), (11,14), (12,13), (12,14), (13,14), while its only 3-clique is (1,2,3,4). This example shows the difference between the $k$-core decomposition and the $k$-clique decomposition.
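The peeling procedure of the $k$-core decomposition can be sketched in a few lines. The edge list below is assembled from the shells listed in the text for the sample network of Fig. 1 (an assumption, since the figure itself is not reproduced here), and `core_numbers` is a hypothetical helper, not the authors' implementation.

```python
def core_numbers(edges):
    """Assign each node its core value by repeatedly peeling low-degree nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    degree = {v: len(nb) for v, nb in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Nodes whose current degree does not exceed k belong to the k-shell
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed through a duplicate entry
            remaining.discard(v)
            core[v] = k
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] <= k:
                        peel.append(w)  # removal exposed a new low-degree node
    return core

# Edges of the sample network, collected from the shells listed in the text
sample_edges = [
    (3, 6), (3, 8), (6, 7), (6, 14), (7, 8),
    (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (5, 9),
    (9, 10), (9, 11), (9, 12), (9, 13), (10, 11), (10, 13), (10, 14),
    (11, 12), (11, 14), (12, 13), (12, 14), (13, 14),
]

core = core_numbers(sample_edges)
shells = {}
for node, k in core.items():
    shells.setdefault(k, []).append(node)
for k in sorted(shells):
    print(k, sorted(shells[k]))
```

On this edge list the sketch places nodes 6, 7, 8 in the 2-shell, nodes 1 to 5 in the 3-shell, and nodes 9 to 14 in the 4-shell.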

2. Smallest Possible Cavities of Different Orders

The concept of a cavity comes from the homology groups of algebraic topology; a cavity is a special topological structure. The 1-cavity and 2-cavity have been found by observation. In general, a smallest $n$-cavity is the smallest cycle consisting of $n$-cliques, where the number of such $n$-cliques is larger than the number of boundaries of $(n+1)$-cliques. Furthermore, a smallest $n$-cavity can be obtained by introducing 2 more nodes, each connected to all nodes in the smallest $(n-1)$-cavity. It is suspected that cavities of order as high as 11 exist in the neural network of the brain. It is also known that the smallest $k$-cavity has the characteristic number $\chi = 1 + (-1)^k$. The numbers and features of the smallest cavities of order 1 to order 11 are summarized in Fig. SI-1.

1-cavity: $m_0=4$, $m_1=4$; $\chi=0$
2-cavity: $m_0=6$, $m_1=12$, $m_2=8$; $\chi=2$
3-cavity: $m_0=8$, $m_1=24$, $m_2=32$, $m_3=16$; $\chi=0$
4-cavity: $m_0=10$, $m_1=40$, $m_2=80$, $m_3=80$, $m_4=32$; $\chi=2$
5-cavity: $m_0=12$, $m_1=60$, $m_2=160$, $m_3=240$, $m_4=192$, $m_5=64$; $\chi=0$
6-cavity: $m_0=14$, $m_1=84$, $m_2=280$, $m_3=560$, $m_4=672$, $m_5=448$, $m_6=128$; $\chi=2$
7-cavity: $m_0=16$, $m_1=112$, $m_2=448$, $m_3=1120$, $m_4=1792$, $m_5=1792$, $m_6=1024$, $m_7=256$; $\chi=0$
8-cavity: $m_0=18$, $m_1=144$, $m_2=672$, $m_3=2016$, $m_4=4032$, $m_5=5376$, $m_6=4608$, $m_7=2304$, $m_8=512$; $\chi=2$
9-cavity: $m_0=20$, $m_1=180$, $m_2=960$, $m_3=3360$, $m_4=8064$, $m_5=13440$, $m_6=15360$, $m_7=11520$, $m_8=5120$, $m_9=1024$; $\chi=0$
10-cavity: $m_0=22$, $m_1=220$, $m_2=1320$, $m_3=5280$, $m_4=14784$, $m_5=29568$, $m_6=42240$, $m_7=42240$, $m_8=28160$, $m_9=11264$, $m_{10}=2048$; $\chi=2$
11-cavity: $m_0=24$, $m_1=264$, $m_2=1760$, $m_3=7920$, $m_4=25344$, $m_5=59136$, $m_6=101376$, $m_7=126720$, $m_8=112640$, $m_9=67584$, $m_{10}=24576$, $m_{11}=4096$; $\chi=0$

Figure SI-1. Smallest cavities of order 1 to order 11.
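The two-node construction can be turned into a counting recurrence. The sketch below assumes that the two added nodes are not connected to each other, so every new $i$-clique is one added node joined to an old $(i-1)$-clique, giving $m_i^{(n)} = m_i^{(n-1)} + 2\,m_{i-1}^{(n-1)}$ with $m_{-1} = 1$. The function names are hypothetical.

```python
def smallest_cavity_counts(n):
    """Clique counts [m_0, ..., m_n] of the smallest n-cavity, for n >= 1."""
    counts = [4, 4]  # 1-cavity: a cycle with 4 nodes and 4 edges
    for _ in range(2, n + 1):
        prev = [1] + counts  # prev[i] = m_{i-1}, with m_{-1} = 1 for the empty clique
        counts = [
            (counts[i] if i < len(counts) else 0) + 2 * prev[i]
            for i in range(len(counts) + 1)
        ]
    return counts

def euler_characteristic(counts):
    """Alternating sum of the clique counts."""
    return sum((-1) ** i * m for i, m in enumerate(counts))

for n in range(1, 12):
    counts = smallest_cavity_counts(n)
    print(n, counts, euler_characteristic(counts))
```

Under this assumption the recurrence gives 6, 12, 8 for the 2-cavity and 8, 24, 32, 16 for the 3-cavity, and the alternating sum equals $1 + (-1)^n$ for every order from 1 to 11.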

3. Cliques and Cavities in C. elegans Network

For a dataset of C. elegans with 297 neurons and 2148 synapses, its cliques, ranks and cavities are all obtained by using an available algorithm, with results summarized in Table SI-3. For comparisons of this C. elegans network with the scale-free, small-world and random network models of the same size (same numbers of nodes and edges), see Excel 1.

Table SI-3.

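| C. elegans network | $k=0$ | $k=1$ | $k=2$ | $k=3$ | $k=4$ | $k=5$ | $k=6$ | $k=7$ | $k=8$ |
|---|---|---|---|---|---|---|---|---|---|
| Clique $m_k$ | 297 | 2148 | 3241 | 2010 | 801 | 240 | 40 | 2 | 0 |
| Rank $r_k$ | 0 | 296 | 1713 | 1407 | 599 | 202 | 38 | 2 | 0 |
| Betti number $\beta_k$ | 1 | 139 | 121 | 4 | 0 | 0 | 0 | 0 | 0 |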
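The rows of Table SI-3 can be cross-checked against each other. The sketch below assumes the standard rank-nullity relation $\beta_k = m_k - r_k - r_{k+1}$, where $r_k$ is the rank of the boundary matrix from $k$-cliques to $(k-1)$-cliques; this relation is not written out in this section, and `betti_numbers` is an illustrative name.

```python
def betti_numbers(m, r):
    """beta_k = m_k - r_k - r_{k+1}; the rank beyond the last order is taken as 0."""
    r_ext = list(r) + [0]
    return [m[k] - r_ext[k] - r_ext[k + 1] for k in range(len(m))]

# Clique counts and ranks for the C. elegans dataset;
# m_1 = 2148 is the synapse count stated in the text.
m = [297, 2148, 3241, 2010, 801, 240, 40, 2, 0]
r = [0, 296, 1713, 1407, 599, 202, 38, 2, 0]

beta = betti_numbers(m, r)
euler_from_cliques = sum((-1) ** k * mk for k, mk in enumerate(m))
euler_from_betti = sum((-1) ** k * bk for k, bk in enumerate(beta))
print(beta, euler_from_cliques, euler_from_betti)
```

With these inputs the Betti numbers come out as 1, 139, 121, 4 followed by zeros, and both alternating sums equal -21, as the Euler-Poincaré formula requires.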

Based on the data in Table SI-3, using 0-1 programming, it is possible to find 2 different 3-cavities, detailed below.

The first 3-cavity, with 8 nodes, is surrounded by the following 16 3-cliques:
(3,13,85,158) (13,85,158,118) (85,158,118,119) (158,118,119,163)
(118,119,163,164) (119,163,164,3) (163,164,3,13) (164,3,13,85)
(3,85,158,119) (13,158,118,163) (85,118,119,164) (158,119,163,3)
(118,163,164,13) (119,164,3,85) (163,3,13,158) (164,13,85,118)

The second 3-cavity, with 11 nodes, is surrounded by the following 28 3-cliques:
(162,163,3,13) (162,163,3,119) (162,163,13,118) (162,163,118,119)
(163,158,3,13) (163,158,3,119) (163,158,13,118) (163,158,118,119)
(158,85,3,13) (158,85,3,119) (158,85,13,118) (158,85,118,119)
(85,227,3,13) (85,227,3,119) (85,227,13,118) (85,227,118,119)
(227,167,3,13) (227,167,3,119) (227,167,13,118) (227,167,118,119)
(167,154,3,13) (167,154,3,119) (167,154,13,118) (167,154,118,119)
(154,162,3,13) (154,162,3,119) (154,162,13,118) (154,162,118,119)
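A simple sanity check on such a list is that the 3-cliques close up: every triangular face must be shared by an even number of them, so that the boundary vanishes modulo 2. This is only a necessary condition for a cavity (the cycle must also not be the boundary of 4-cliques), and the helper names below are hypothetical. The sketch applies the check to the first 3-cavity.

```python
from collections import Counter
from itertools import combinations

def face_multiplicities(cliques):
    """Count how many of the given cliques contain each face of one order lower."""
    counts = Counter()
    for clique in cliques:
        for face in combinations(sorted(clique), len(clique) - 1):
            counts[face] += 1
    return counts

def is_closed(cliques):
    """True if every face occurs an even number of times (zero boundary mod 2)."""
    return all(c % 2 == 0 for c in face_multiplicities(cliques).values())

# The 16 3-cliques of the first 3-cavity, as listed in the text
first_cavity = [
    (3, 13, 85, 158), (13, 85, 158, 118), (85, 158, 118, 119), (158, 118, 119, 163),
    (118, 119, 163, 164), (119, 163, 164, 3), (163, 164, 3, 13), (164, 3, 13, 85),
    (3, 85, 158, 119), (13, 158, 118, 163), (85, 118, 119, 164), (158, 119, 163, 3),
    (118, 163, 164, 13), (119, 164, 3, 85), (163, 3, 13, 158), (164, 13, 85, 118),
]

nodes = {v for clique in first_cavity for v in clique}
faces = face_multiplicities(first_cavity)
print(len(nodes), len(faces), is_closed(first_cavity))
```

The list uses 8 nodes and 32 distinct triangles, each shared by exactly two 3-cliques, which matches the counts $m_0 = 8$ and $m_2 = 32$ of the smallest 3-cavity.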