STruD: Truss Decomposition of Simplicial Complexes
Giulia Preti, Gianmarco De Francisci Morales, Francesco Bonchi
Giulia Preti
ISI Foundation, Turin
Gianmarco De Francisci Morales
ISI Foundation, Turin
Francesco Bonchi
ISI Foundation, Italy; Eurecat
ABSTRACT
A simplicial complex is a generalization of a graph: a collection of $n$-ary relationships (instead of binary as the edges of a graph), named simplices. In this paper, we develop a new tool to study the structure of simplicial complexes: we generalize the graph notion of truss decomposition to complexes, and show that this more powerful representation gives rise to different properties compared to the graph-based one. This power, however, comes with important computational challenges derived from the combinatorial explosion caused by the downward closure property of complexes.
Drawing upon ideas from itemset mining and similarity search, we design a memory-aware algorithm, dubbed STruD, which is able to efficiently compute the truss decomposition of a simplicial complex. STruD adapts its behavior to the amount of available memory by storing intermediate data in a compact way. We then devise a variant that directly computes the $n$ simplices of maximum trussness. By applying STruD to several datasets, we demonstrate its scalability, and provide an analysis of their structure.
Finally, we show that the truss decomposition can be seen as a filtration, and as such it can be used to study the persistent homology of a dataset, a method for computing topological features at different spatial resolutions, prominent in Topological Data Analysis.
CCS CONCEPTS
• Mathematics of computing → Hypergraphs; Graph algorithms; • Information systems → Data mining.
KEYWORDS
Graph mining, truss decomposition, simplicial complex, higher-order, topological data analysis.
ACM Reference Format:
Giulia Preti, Gianmarco De Francisci Morales, and Francesco Bonchi. 2021. STruD: Truss Decomposition of Simplicial Complexes. In Proceedings of the Web Conference 2021 (WWW ’21), April 19–23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3442381.3450073
Graphs have been widely used to model complex relationships (edges) between pairs of entities (vertices). Aiming to discover useful patterns such as communities, and identify important vertices in a graph, several metrics and substructures have been defined, and applied in settings such as biology, sociology, and Internet topology. Among them, truss decomposition has received considerable attention [21, 40] thanks to its efficiency and effectiveness. Trusses are cohesive subgraphs that are rich in triangles: a relaxation of cliques that can be computed exactly in polynomial time.
However, many real-world interactions occur among more than two entities at once [8, 41]. For example, people are likely to collaborate in groups when writing papers, and often send emails that have multiple recipients. Simple graphs are not able to capture such higher-order relationships, because they cannot distinguish the case of three authors writing three papers in pairs from the case where all three authors collaborate on a single paper.
Hypergraphs [10] and simplicial complexes [2] are higher-order generalizations of simple graphs that can characterize interactions between any number of entities [38]. Hypergraphs generalize graphs by allowing an edge to connect several vertices. A hypergraph is a pair $(V, E)$ where $V$ is a set of vertices and $E$ a family of subsets of vertices called hyperedges. Conversely, a simplicial complex is a collection of polytopes such as triangles and tetrahedra, which are called simplices. Both these structures can be used to represent any higher-order relation [37]. The key difference between hypergraphs and simplicial complexes is that the latter satisfy the downward closure property: every substructure (also known as face) of a simplex that is contained in a complex $\mathcal{K}$ is also in $\mathcal{K}$. While it might appear constraining, this property naturally arises in all systems characterized by interactions that are “maximal” [42]: e.g., in scientific collaborations (the authors of a paper are all co-authors) or gene activation pathways (largest group of collectively activated genes).

authors            conference   year
DH, JK             SODA         1994
CF, JK, JL         SIGKDD       2005
CF, DC, JK, JL     PKDD         2005
CF, DC             CSUR         2006
CF, JL             SIGKDD       2006
JK, JL, LB         SIGKDD       2009
DH, JK, JL         WWW          2010
LB, JL             WSDM         2011
LB, JK             CSCW         2014

Figure 1: Authors, conference, and year of publication of a set of papers (left), the corresponding co-authorship relations represented as a simplicial complex $\mathcal{K}$ (center), and as a simple collaboration network $G$ (right).

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
Example 1. The table in Figure 1 (left) reports authors, conference, and year of publication of a set of papers extracted from DBLP. A typical representation of this type of data is a collaboration network built around the co-authorship relation: two authors (vertices) are connected by an edge if they have been co-authors of some paper. This type of simple graph is reported in Figure 1 (right).
A richer representation can be obtained by interpreting each subset of authors of a paper as co-authors, and representing it as a simplex: this leads to the simplicial complex $\mathcal{K}$ illustrated in Figure 1 (center). For example, D. Chakrabarti (DC), C. Faloutsos (CF), J. Kleinberg (JK), and J. Leskovec (JL) form a tetrahedron because they wrote a paper together at PKDD 2005. Although in the table there is no paper authored solely by (DC, JK, JL), such a triangle is nevertheless part of the simplicial complex $\mathcal{K}$, which represents the fact that these three scientists have been co-authors of a paper (even though with other co-authors).
The simplicial complex conveys information that is lost if the co-authorship relations are modeled as a simple collaboration network. For instance, the latter would represent the case where JK, JL, and DH wrote a paper together in the same way as the case where they wrote papers only in pairs.
Simplices and simplicial complexes have been successfully applied to analyze the organization of the brain [28, 30], understand the mechanics of social contagion [16], predict the appearance of new links [3], and study protein interaction networks [11].
In this paper, we develop a new tool to study the structure of simplicial complexes, by generalizing the graph notion of truss decomposition. The definition of truss decomposition in a graph is based on triangles, i.e., a $k$-truss is a subgraph whose edges participate in (are supported by) at least $k$ triangles. When the input is a simplicial complex, a truss can also be determined by the existence of higher-order structures.
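The difference between the two representations in the example above can be checked with a few lines of code. The following is a minimal sketch, not the authors' implementation: the function names `simplicial_complex` and `collaboration_graph` are hypothetical, and simplices are simply encoded as sorted tuples of author initials taken from the table in Figure 1.

```python
from itertools import combinations

# Author sets of the papers listed in the table of Figure 1 (left).
PAPERS = [
    ("DH", "JK"), ("CF", "JK", "JL"), ("CF", "DC", "JK", "JL"),
    ("CF", "DC"), ("CF", "JL"), ("JK", "JL", "LB"),
    ("DH", "JK", "JL"), ("LB", "JL"), ("LB", "JK"),
]


def simplicial_complex(papers):
    """Downward closure: every non-empty subset of an author set is a simplex."""
    simplices = set()
    for authors in papers:
        ordered = tuple(sorted(authors))
        for size in range(1, len(ordered) + 1):
            simplices.update(combinations(ordered, size))
    return simplices


def collaboration_graph(papers):
    """Edge set of the co-authorship graph: pairs of co-authors only."""
    return {pair for authors in papers
            for pair in combinations(sorted(authors), 2)}


K = simplicial_complex(PAPERS)
G = collaboration_graph(PAPERS)

# (DC, JK, JL) never signed a paper on their own, yet the triangle is in K.
print(("DC", "JK", "JL") in K)

# Three papers written in pairs vs. one paper written together:
pairs_only = [("A", "B"), ("B", "C"), ("A", "C")]
together = [("A", "B", "C")]
print(collaboration_graph(pairs_only) == collaboration_graph(together))
print(simplicial_complex(pairs_only) == simplicial_complex(together))
```

The two toy inputs at the end yield the same graph but different complexes, which is the information loss the example describes.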
Challenges and contributions.
We bridge two different disciplines, i.e., graph mining and Topological Data Analysis (TDA), by generalizing the notion of truss decomposition to simplicial complexes. Our problem statement is a proper generalization of truss decomposition on graphs: if the simplicial complex in input is a collection of only 1-dimensional simplices (edges), or in other terms, it is a graph, then the truss decomposition of the simplicial complex corresponds to that of the graph.
We show several properties of simplicial truss decomposition, and show how this more expressive representation differentiates itself from the graph-based one. However, computing it is much harder than for a graph. Indeed, due to the downward closure property of simplicial complexes, which produces a combinatorial explosion of the candidate substructures, the computation becomes especially demanding in terms of memory.
To tackle these significant computational challenges, this paper introduces STruD, an efficient and scalable algorithm for simplicial truss decomposition. A key observation towards developing our method is an a-priori property similar to the one from frequent itemset mining: a $(q+1)$-simplex has simplicial trussness no larger than a $q$-simplex that it contains. Based on this key observation, we derive lower and upper bounds for the simplicial trussness of a simplex, which are at the basis of our method. Our algorithm relies on finding joists, higher-order generalizations of triangles that compose the support of a simplex. The identification of the joists of a simplex is a major computational bottleneck, as their number can be extremely large. To tackle this challenge, we leverage a compact inverted index, and devise a memory-aware strategy that switches to out-of-core operations depending on the memory available.
Our extensive empirical assessment confirms the scalability of STruD and the relevance of the simplicial truss decomposition.
Finally, we show how the simplicial truss decomposition can be interpreted and used as a filtration in the context of persistent homology [14], a technique used to quantify the shape of the data via summarization of its topological features. Persistent homology requires defining a nested sequence of simplicial complexes, called filtration, and tracks the topological features which exhibit long persistence through the filtration. Persistent homology has been applied, among others, to characterize financial markets [39], study multivariate time series [35], and analyze sensor networks [9].
The contributions of our work can be summarized as follows:
• We define the truss decomposition of a simplicial complex.
• We design an efficient and scalable algorithm that leverages bounds and techniques from itemset mining and similarity search to compute the simplicial truss decomposition.
• We provide extensive empirical evidence of the scalability of the algorithm, and of the relevance of the simplicial truss decomposition.
• We showcase the use of the simplicial truss decomposition as a filtration to compute a persistent homology.

Table 1: Simplicial trussness $tr_{\mathcal{K}}$ of the simplices in $\mathcal{K}$ in Figure 1 (center), number of joists $|J|$ containing each simplex, and graph trussness $tr_G$ of the edges in $G$ in Figure 1 (right). Columns: $\sigma^{(q)}$, $|J|$, $tr_{\mathcal{K}}$, $tr_G$. Rows: [CF, DC], [CF, JK], [CF, JL], [DC, JK], [DC, JL], [JK, JL], [LB, JK], [LB, JL], [DH, JK], [DH, JL], [DH, JK, JL], [JK, JL, LB], [DC, JK, JL], [CF, JK, JL], [CF, DC, JL], [CF, DC, JK], [CF, DC, JK, JL].

We next provide the formal definitions of the needed notions and of the computational problem addressed in this paper.
Let $\mathcal{V}$ be a ground set of elements. A $q$-dimensional simplex, or simply $q$-simplex, is a relation connecting $q+1$ elements of $\mathcal{V}$.
Definition 1 ($q$-simplex, face, coface). A $q$-simplex $\sigma^{(q)}$ is a set of $q+1$ elements (or vertices) $\sigma^{(q)} = [v_0, \ldots, v_q] \subset \mathcal{V}$. Each $r$-simplex $\sigma^{(r)} \subset \sigma^{(q)}$ (with $r < q$) is called a face of $\sigma^{(q)}$, and it is called a coface when $r = q - 1$.
According to this definition, a vertex of a graph is a 0-simplex, and an edge a 1-simplex, while a 2-simplex can be seen as a triangle, a 3-simplex a tetrahedron, and so on. For instance, in Figure 1 (center), the triplet of authors (LB, JL, JK) constitutes a 2-simplex $\sigma$, and each pair of authors is a coface of $\sigma$.
Definition 2 (Simplicial complex, dimension, $n$-skeleton). A simplicial complex $\mathcal{K}$ is a set of simplices such that all the faces of all the simplices in $\mathcal{K}$ are also in $\mathcal{K}$. The dimension of $\mathcal{K}$ is the dimension of its largest simplex, and the $n$-skeleton of $\mathcal{K}$ is the subset of simplices of $\mathcal{K}$ of dimensions $q \leq n$.
According to this definition, the 1-skeleton of a simplicial complex corresponds to the underlying (undirected simple) graph of $\mathcal{K}$. For instance, the 1-skeleton of the simplicial complex in Figure 1 (center) is the graph in Figure 1 (right).
In a simple graph $G = (V, E)$, a $k$-truss is defined as the subgraph $G_{T_k}$ of $G$ induced by the edges $S$ that participate in at least $k$ triangles in $S$ [7]. The original definition of $k$-truss, though, requires that each edge take part in at least $k - 2$ triangles, so that a $k$-clique is a $k$-truss [7]; given that the two definitions are analogous, for convenience in this work we adopt the one requiring $k$ triangles. According to this definition, for each edge in the $k$-truss $G_{T_k}$, there exist $k$ pairs of other edges that, together with it, form a triangle in $G_{T_k}$; as such, a $k$-clique is a $(k-2)$-truss. As an example, the graph in Figure 1 (right) contains the 2-truss formed by the subgraph induced by the vertices (CF, DC, JK, JL), and the 1-truss formed by the whole graph. We generalize this notion to simplicial complexes by considering $q$-simplices instead of edges, and joists (defined next) instead of triangles.
Definition 3 (Joist). The joist of a simplex $\sigma^{(q)}$ is the set $J_{\sigma^{(q)}}$ of all its cofaces (of which there are $q+1$).
Definition 4 (Simplicial $k$-truss).
Given a simplicial complex $\mathcal{K}$, a $k$-truss in $\mathcal{K}$ is a maximal set of simplices $T_k \subseteq \mathcal{K}$ of dimension greater than 0, such that for each $\sigma^{(q)} = [v_0, \ldots, v_q] \in T_k$ there exist at least $k$ joists $J_1, \ldots, J_k$ such that $\forall i \in [1, k]$, $\sigma^{(q)} \in J_i$ and $\forall \tau \in J_i$, $\tau \in T_k$. The maximal $k$ such that $\sigma^{(q)} \in T_k$ is called simplicial trussness of $\sigma^{(q)}$ and denoted as $tr_{\mathcal{K}}(\sigma^{(q)})$.
Example 2. The simplicial complex in Figure 1 (center) contains a 1-truss and a 2-truss. The 1-truss includes the simplices in Table 1 with simplicial trussness $tr_{\mathcal{K}}$ greater than or equal to 1, while the 2-truss includes the simplices with simplicial trussness greater than or equal to 2. Differently from the standard truss decomposition, in this case the 1-truss $T_1$ also contains all the 2-simplices (triangles) spanned by the vertices CF, DC, JK, and JL.
We next provide two key properties of simplicial $k$-trusses. The proofs can be found in Appendix B.
Property 1 (Uniqueness). The $k$-truss of a simplicial complex $\mathcal{K}$ is unique.
Property 2 (Containment). The $(k+1)$-truss of a simplicial complex $\mathcal{K}$ is a subset of its $k$-truss.
Thanks to the properties of uniqueness and containment we can define a simplicial truss decomposition, i.e., the problem of computing all the non-empty $k$-trusses of a simplicial complex $\mathcal{K}$.
Problem 1 (Simplicial Truss Decomposition). Given a simplicial complex $\mathcal{K}$, find the simplicial truss decomposition of $\mathcal{K}$, i.e., the sequence of $k$-trusses $\mathcal{T} = [T_1, \ldots, T_{k_{\max}}]$, where $k_{\max}$ is the maximal integer such that $T_{k_{\max}} \neq \emptyset$.
Observation 1. Our problem statement is a proper generalization of the standard truss decomposition on graphs. When a simplicial complex $\mathcal{K}$ contains only binary relationships, there exists a bijection $\mathcal{T} \rightarrow \mathcal{G}_{\mathcal{T}}$ between the simplicial truss decomposition $\mathcal{T}$ of $\mathcal{K}$ and the standard truss decomposition $\mathcal{G}_{\mathcal{T}}$ of the 1-skeleton of $\mathcal{K}$. The bijection maps each $k$-truss $T_k \in \mathcal{T}$ to the $k$-truss $G_{T_k} \in \mathcal{G}_{\mathcal{T}}$, by associating each 1-simplex $[v_i, v_j]$ to the edge $(v_i, v_j)$.
For instance, the simplicial truss decomposition of the graph in Figure 1 (right) is $T_1$ = {[CF, DC], [CF, JK], [CF, JL], [DC, JK], [DC, JL], [JK, JL], [LB, JK], [LB, JL], [DH, JL], [DH, JK]} and $T_2$ = {[CF, DC], [CF, JK], [CF, JL], [DC, JK], [DC, JL], [JK, JL]}, which maps to the 1-truss $G_{T_1} = G$ and the 2-truss $G_{T_2}$ with edge set {(CF, DC), (CF, JK), (CF, JL), (DC, JK), (DC, JL), (JK, JL)}.
In this section we present our algorithms for computing the simplicial truss decomposition. A key property (proof in Appendix B) derives directly from the downward closure of simplicial complexes, and helps greatly to prune the search space, similarly to the a-priori property of frequency in frequent itemset mining [1].
Property 3. A $q$-simplex $\sigma^{(q)}$ that is a face of a $(q+1)$-simplex $\sigma^{(q+1)}$ has trussness $tr_{\mathcal{K}}(\sigma^{(q)}) \geq tr_{\mathcal{K}}(\sigma^{(q+1)})$.
A corollary of Property 3 is that a $k$-truss $T_k$ is a simplicial complex, because if a $q$-simplex belongs to $T_k$, then all its faces belong to $T_k$ as well. In addition, we can identify a lower bound of the simplicial trussness of any $\sigma^{(q)}$ in $\mathcal{K}$ by looking at the largest simplex that contains $\sigma^{(q)}$ (i.e., that $\sigma^{(q)}$ is a face of):
Property 4 (Lower bound). Let $\sigma^{(q)}$ be a $q$-simplex and $\sigma^{(q+h)}$ be the largest simplex in $\mathcal{K}$ such that $\sigma^{(q)} \subset \sigma^{(q+h)}$. Then, $tr_{\mathcal{K}}(\sigma^{(q)}) \geq h$.
Finally, since a simplicial truss is defined by a set of joists, an upper bound of the trussness of $\sigma^{(q)}$ is given by the total number of joists containing $\sigma^{(q)}$:
Property 5 (Upper bound). Let $\sigma^{(q)}$ be a $q$-simplex in $\mathcal{K}$ and $J_{\sigma^{(q+1)}}$ indicate a joist of a $(q+1)$-simplex. Then, $tr_{\mathcal{K}}(\sigma^{(q)}) \leq |\{ J_{\sigma^{(q+1)}} \mid J_{\sigma^{(q+1)}} \subseteq \mathcal{K} \wedge \sigma^{(q)} \in J_{\sigma^{(q+1)}} \}|$.
We propose algorithms to solve two different variants of the truss decomposition problem.
The first algorithm performs the complete truss decomposition of the simplicial complex, and the second one finds the top-$n$ simplices with maximum trussness and given size $q$. Both algorithms follow a 3-step, apriori-like approach that materializes and examines simplices of increasing dimension. In the first step, they extend the simplices retained in the previous iteration by adding an additional vertex; in the second step, they search for all the sets of simplices that form a joist; and in the last step, they compute the trussness of each simplex. The simplices with positive trussness will be extended in the following iteration. The pseudocode of our STruD algorithm is reported in Algorithm 1.
Recall that a $q$-simplex $\sigma^{(q)}$ belongs to a joist when, together with other $(q+1)$ $q$-simplices, it forms the cofaces of the same $(q+1)$-simplex. Therefore, $\sigma^{(q)}$ shares exactly $q$ vertices with any other simplex in the joist. As a consequence, we can run STruD separately on each connected component of the 1-skeleton of the simplicial complex. Moreover, thanks to Property 3, only the simplices with trussness greater than 0 are extended by the procedure $\textsc{S}$ (line 6). This procedure receives in input a set of simplices $S$ and the set of simplices in the connected component under examination $C$, and extends each $\sigma \in S$ by appending each vertex $v \in V_\sigma$, the set of all the vertices that appear in some $\sigma' \in \mathcal{K}$ such that $\sigma \subset \sigma'$. Generating the extensions by using only those vertices guarantees that we create only simplices that exist in $\mathcal{K}$.
Once the set of extended simplices $E$ is computed, we need to find all the joists formed by simplices in $E$. To do so, we need to find, for each $q$-simplex $\sigma \in E$, all the $q$-simplices in $E$ that could be part of a joist with $\sigma$. Therefore, as the size of $E$ increases, the memory required to store the candidate joists increases as well.

Algorithm 1 STruD
Require: Simplicial complex $\mathcal{K}$; max size $maxSize$
Ensure: Trussness of the simplices in $\mathcal{K}$
 1: $tr \leftarrow \emptyset$
 2: for $C \in \textsc{C}(\mathcal{K})$ do
 3:   $S \leftarrow \emptyset$; $q_{\max} \leftarrow \min(maxSize, \max_{\sigma \in C} |\sigma|)$
 4:   for $q \leftarrow 2$ to $q_{\max}$ do
 5:     if $q > 2$ and $S = \emptyset$ then break
 6:     $E \leftarrow \textsc{S}(S, C, q)$
 7:     if $E = \emptyset$ then break
 8:     $lb[\sigma] \leftarrow \max_{\tau \in C \wedge \sigma \subseteq \tau}(|\tau|) - |\sigma|$ for $\sigma \in E$
 9:     $C \leftarrow \textsc{J}(E)$
10:     $tr[\sigma] \leftarrow |C[\sigma]|$ for $\sigma \in E$
11:     if $\nexists \sigma$ s.t. $lb[\sigma] < tr[\sigma]$ then
12:       $S \leftarrow \{\sigma \mid \sigma \in E \wedge tr[\sigma] > 0\}$
13:       continue
14:     $S \leftarrow \emptyset$   ▷ simplices to extend in the next iteration
15:     $Q \leftarrow \emptyset$   ▷ ordered queue of simplices to examine
16:     $Q[tr[\sigma]] \leftarrow Q[tr[\sigma]] \cup \{\sigma\}$ for $\sigma \in E$
17:     while $Q \neq \emptyset$ do
18:       $M \leftarrow$ simplices with minimum trussness in $Q$
19:       for $\sigma \in M$ do
20:         for subset $\beta$ of $\sigma$ of size $|\sigma| - 1$ do
21:           for $v \in C[\sigma]$ do
22:             $\tau \leftarrow \beta \cup \{v\}$
23:             if $tr[\tau] > tr[\sigma]$ then
24:               remove $\sigma \setminus \tau$ from $C[\tau]$
25:               update $Q$ if $tr[\tau]$ has changed
26:         if $tr[\sigma] > 0$ then $S \leftarrow S \cup \{\sigma\}$
27:         remove $\sigma$ from $Q$
28: remove from $tr$ each $\sigma$ with $tr[\sigma] = lb[\sigma]$
29: return $tr$

To deal with the large memory requirement, and hence allow the algorithm to be executed also on less powerful machines, we propose two different strategies to find and validate all the candidate joists. The first one is an in-memory strategy, while the second one temporarily stores the candidates in chunks on disk, and then reads one chunk at a time to validate the candidates in it. To decide which strategy to use, we keep track of the memory consumption of the data structures that store the candidates. When we reach the memory limit, we create $M$ chunks, where the $i$-th chunk contains the pairs $(a, b)$ such that $a \bmod M = i$ and $b$ is the id of a simplex that could be part of a joist with the simplex with id $a$. Then, if the chunk can be loaded into main memory, we load and validate its candidates by calling Procedure $\textsc{J}$. Otherwise, if the size of the chunk exceeds the memory limit, we need to load it in batches. Procedure $\textsc{J}$ requires all the simplices that could form a joist with a given simplex $\sigma$ to determine the actual joists containing $\sigma$ (Algorithm 2, line 25). To make sure that we process them in the same batch, we sort the chunk by $a$ via an algorithm for external sorting; then we load and validate the candidate joists of one simplex at a time.
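As a cross-check for Algorithm 1 on small inputs, the definition of simplicial $k$-truss can also be evaluated directly by iterative peeling. The sketch below is not STruD: it is a brute-force reference under stated assumptions, with hypothetical function names (`downward_closure`, `count_joists`, `naive_trussness`), no bounds, no inverted index, and no out-of-core handling. It assumes that a simplex of size $s$ has one joist for every vertex $w$ such that all the size-$s$ subsets of the simplex extended with $w$ are still present.

```python
from itertools import combinations

# Author sets of the papers in Figure 1 (left), used here as maximal simplices.
PAPERS = [
    ("DH", "JK"), ("CF", "JK", "JL"), ("CF", "DC", "JK", "JL"),
    ("CF", "DC"), ("CF", "JL"), ("JK", "JL", "LB"),
    ("DH", "JK", "JL"), ("LB", "JL"), ("LB", "JK"),
]


def downward_closure(maximal_simplices):
    """All faces with at least two vertices (dimension > 0), as sorted tuples."""
    simplices = set()
    for simplex in maximal_simplices:
        ordered = tuple(sorted(simplex))
        for size in range(2, len(ordered) + 1):
            simplices.update(combinations(ordered, size))
    return simplices


def count_joists(simplex, alive, vertices):
    """Count vertices w such that every face of simplex + {w} with the same
    size as `simplex` is still in `alive` (one joist per such vertex)."""
    joists = 0
    for w in vertices - set(simplex):
        extended = tuple(sorted(simplex + (w,)))
        if all(face in alive for face in combinations(extended, len(simplex))):
            joists += 1
    return joists


def naive_trussness(simplices):
    """Peel for k = 1, 2, ...: the simplices surviving round k form the k-truss."""
    vertices = {v for s in simplices for v in s}
    trussness = {s: 0 for s in simplices}
    alive = set(simplices)
    k = 1
    while alive:
        changed = True
        while changed:  # repeat until no simplex has fewer than k joists
            changed = False
            for s in list(alive):
                if count_joists(s, alive, vertices) < k:
                    alive.discard(s)
                    changed = True
        for s in alive:
            trussness[s] = k
        k += 1
    return trussness


tr = naive_trussness(downward_closure(PAPERS))
for simplex in sorted(tr, key=lambda s: (len(s), s)):
    print(simplex, tr[simplex])
```

On the data of Figure 1, this sketch places the six edges among CF, DC, JK, and JL in the 2-truss and all ten edges in the 1-truss, in agreement with the sets $T_1$ and $T_2$ listed in the text. Its cost grows quickly with the number of simplices, so it is only meant for validating results on toy complexes.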
Algorithm 2 $\textsc{J}$
Require: Set of simplices $E$
Ensure: The joists formed by simplices in $E$
 1: $C \leftarrow \emptyset$; $I \leftarrow \emptyset$
 2: for $\sigma \in E$ do
 3:   $C \leftarrow \textsc{Update}(C, \textsc{M}(\sigma, I), \sigma)$
 4:   for subset $\beta$ of $\sigma$ of size $|\sigma| - 1$ do $I[\beta] \leftarrow I[\beta] \cup \{\sigma\}$
 5: $C \leftarrow \textsc{J}(C)$
 6: return $C$
 7: function $\textsc{M}(\sigma, I)$
 8:   $C \leftarrow \emptyset$
 9:   for subset $\beta$ of $\sigma$ of size $|\sigma| - 1$ do
10:     for $\tau \in I[\beta]$ do
11:       $C \leftarrow C \cup \{\tau\}$
12:   return $C$
13: function $\textsc{Update}(C, matches, \sigma)$
14:   for $\tau \in matches$ do
15:     $v \leftarrow \sigma \setminus \tau$; $u \leftarrow \tau \setminus \sigma$
16:     add $\tau$ to $C[\sigma]$ if $u > \sigma[|\sigma| - 1]$
17:     add $\sigma$ to $C[\tau]$ if $v > \tau[|\tau| - 1]$
18:   return $C$
19: function $\textsc{J}(C)$
20:   $\mathcal{J} \leftarrow \emptyset$
21:   for $\sigma \in C$ do
22:     $W \leftarrow$ vertices in the simplices in $C[\sigma]$ but not in $\sigma$
23:     for $w \in W$ do
24:       $C^w_\sigma \leftarrow \{\tau \mid \tau \in C[\sigma] \wedge w \in \tau\}$
25:       if $|C^w_\sigma| = |\sigma|$ then
26:         add $w$ to $\mathcal{J}[\sigma]$
27:         add $\sigma \setminus \tau$ to $\mathcal{J}[\tau]$ for $\tau \in C^w_\sigma$
28:   return $\mathcal{J}$

To determine which sets of simplices could form a joist, we borrow techniques from similarity search and build a pruned inverted index for the simplices. To do so, we associate to each simplex $\sigma$ a set of fingerprints, or codes, defined as subsets of vertices of $\sigma$. Since a necessary condition for two $q$-simplices $\sigma_1$ and $\sigma_2$ to be in the same joist is to share all but one vertex, we generate codes of size $|\sigma| - 1$. This way, we ensure that each simplex shares one code with all and only the candidate cofaces of a joist. Then, Algorithm 2 dynamically creates an inverted index $I$ that maps a code $\beta$ to the set of simplices that contain $\beta$. In the first step, the algorithm uses Procedure $\textsc{M}$ to find all the simplices $\tau$ that share an indexed code with the current simplex $\sigma$. In the second step, the algorithm indexes all the codes of $\sigma$. Executing $\textsc{M}$ before indexing the codes of $\sigma$ ensures that each pair of simplices is compared only once.
Once all the simplices that share a coface $\beta$ with $\sigma$ have been found, Procedure $\textsc{Update}$ updates the data structure $C$, which tracks candidate joists. To ensure that Procedure $\textsc{J}$ (line 19) examines each candidate joist only once, we leverage the natural order of the vertices, and thus add to the set of candidates $C[\sigma]$ of a simplex $\sigma$ only the simplices $\tau$ such that the vertex $u$ in $\tau$ but not in $\sigma$ satisfies $u > \sigma[|\sigma| - 1]$ (Algorithm 2 lines 16-17). This condition is true only for a single $q$-simplex $[v_0, \ldots, v_q]$ in each joist $[v_0, \ldots, v_{q+1}]$.
Procedure $\textsc{J}$ extracts subsets of simplices in the candidate set $C[\sigma]$ of a simplex $\sigma = [v_0, \ldots, v_q]$ that could form, together with $\sigma$, the joist of a $(q+1)$-simplex $[v_0, \ldots, v_q, w]$. Each subset of $(q+1)$ simplices in a joist $J$ of a $(q+1)$-simplex shares a vertex $w$, because each pair of simplices in $J$ has $q$ vertices in common. Therefore, for each vertex $w$ in some simplex in $C[\sigma]$ but not in $\sigma$, the algorithm checks if the total number of simplices in $C[\sigma]$ that contain $w$ is equal to $q+1$ (line 25). When this condition holds, it adds $w$ to the set of real joists $\mathcal{J}[\sigma]$, and $\sigma \setminus \tau$ to the set of real joists $\mathcal{J}[\tau]$ of all the simplices $\tau$ containing $w$ (lines 26-27).
We first illustrate the algorithm that performs the complete truss decomposition and then explain how the top-$n$ algorithm differs from it. Algorithm 1 initializes the trussness of each simplex $\sigma$ in $E$ as its upper bound (line 10), which is the number of joists that contain $\sigma$. By exploiting an inverted index, the lower bound $lb[\sigma]$ of the simplicial trussness of a simplex $\sigma$ can be computed in linear time. Therefore, the algorithm stores only the simplicial trussness $tr[\sigma]$ of the simplices such that $tr[\sigma] > lb[\sigma]$. If all the simplices $\sigma$ have lower bound equal to the upper bound, the algorithm has already found their real simplicial trussness. Therefore, it inserts them in the set $S$ of simplices to extend in the next iteration, increases $q$, and finally continues to the next iteration. Otherwise, the simplices need to be processed.
To compute the real simplicial trussness values, we insert the simplices in a dictionary $Q$ (line 16) that allows us to extract them in increasing order of upper bound. When examining a simplex $\sigma$, all the simplices in the same joists of $\sigma$ are visited by generating them on the fly.
Since the set $C[\sigma]$ stores the vertices $v$ such that $\sigma$ is a coface of the $(q+1)$-simplex $\sigma \cup \{v\}$, we can generate those simplices by adding the vertex $v$ to all the subsets of $\sigma$ of size $|\sigma| - 1$ (line 22). If some $\tau$ has trussness greater than that of $\sigma$ (line 23), we remove from $C[\tau]$ the vertex that belongs to $\sigma$ but not to $\tau$. If this is the first time that we remove it, the simplicial trussness of $\tau$ is decreased by 1 and $Q$ is updated accordingly (line 25). Finally, if $tr[\sigma] > 0$ we insert $\sigma$ in the set of simplices to extend in the next iteration. At the end of the computation, we remove all the simplices $\sigma$ such that $tr[\sigma] = lb[\sigma]$ from $tr$, so that $tr$ contains only the non-trivial simplicial trussness values.
To find the top-$n$ simplices of size $q$ with maximum simplicial trussness for given parameters $n$ and $q$, we modify two procedures of Algorithm 1. First, we let procedure $\textsc{S}$ generate only the simplices of size $q$. Secondly, we replace the while cycle at lines 17-27 with Algorithm 3, whose pseudocode is reported in Appendix A. This algorithm examines the simplices in the dictionary $Q$ in descending order of upper bound, to increase the likelihood that the simplices with largest real simplicial trussness are discovered sooner. Then, it exploits a priority queue $T$ of $(t, i)$ pairs sorted on $t$ to maintain the trussness $t$ of the simplices $i$ for which the real simplicial trussness has been found. This way, it can terminate the examination of a simplex $\sigma$ as soon as at least $n$ simplices have been collected in $T$, and the estimate of the trussness of $\sigma$ is lower than the minimum key in $T$ (line 8). Moreover, the algorithm can terminate the computation as soon as the largest upper bound of the simplices still in $Q$ is lower than the minimum key in $T$ (line 4). Finally, the algorithm returns the top $n$ elements in $T$, which correspond to the $n$ simplices in $\mathcal{K}$ with maximum trussness.

Table 2: Characteristics of the real datasets.
                       DBLP         Enron        Zebra        ETFs
Vertices
Edges
Triangles              11 350 197   1 431 200    3 080 990    7130
Maximal Simplices
Max Dimension          17           64           61           71
Min/Max Jaccard        0.01/0.97    0.007/0.95   0.013/0.81   0.07/0.87
Avg/Median Jaccard     0.61/0.68    0.55/0.60    0.55/0.61    0.39/0.33
Connected Components   60773        1004         313          174

Let $\mathcal{K}$ be a simplicial complex with vertex set $V$ of size $m$. In the worst case, $\mathcal{K}$ is a clique complex, i.e., any subset of $V$ belongs to $\mathcal{K}$. In this case, the number of simplices to examine in Algorithm 1 is equal to the number of proper subsets of $V$ with size greater than 1, which is $2^m - m - 2$. Then, each $q$-simplex shares $q$ vertices with every other $q$-simplex, and therefore Algorithm 2 has complexity
$$O\left( \sum_{q=2}^{m-1} \binom{m}{q} \cdot \binom{m}{q} \right) \leq O\left(4^m\right)$$
and the total cost of Algorithm 1 is upper-bounded by $O(4^m)$. When the input contains only binary relationships (i.e., it is a graph), the parameter $q$ takes only value 2, and therefore the summation becomes $m^2 \cdot m^2$. In this case, the complexity is upper bounded by $O(m^4)$. Let $n$ be the number of binary relationships in the input; then the complexity can be expressed as $O(n^2)$, because the number of binary relationships in a clique complex is $n = m(m-1)/2$.
In this section we (1) show the significance of simplicial truss decomposition when compared to classic graph decomposition, (2) evaluate the performance and scalability of our algorithm STruD, (3) study the persistent homology of a dataset by using its simplicial truss decomposition as filtration, (4) show how the simplicial trussness can be used to measure the manifoldness of a dataset, (5) compare the simplicial truss decomposition of real datasets with that of random complexes, and (6) present and discuss particular findings in one of the real datasets.
We implemented STruD in Python 3.6 and Networkx v2.4.
We used Dionysus v2.0.7 to compute persistent homology. The code is available on GitHub (https://github.com/lady-bluecopper/STruD). We ran the experiments on an Intel(R) Xeon(R) Gold 6138 with 126GB of RAM, running Ubuntu, limiting the available memory to 70GB, and using a single core.
We considered four real-world and two synthetic datasets. Their characteristics are reported in Table 2 and Table 3.
DBLP is the coauth-DBLP simplicial complex provided by Benson et al. [3], where each simplex represents a publication and its vertices are the corresponding authors.
Enron contains emails sent from roughly 150 employees of the Enron corporation. We obtain a simplicial complex by representing senders and recipients as vertices, and each email as a simplex.

Table 3: Characteristics of the synthetic datasets.
                       RFC            SCM
Vertices
Edges                  69 274         2324
Triangles              443 033        21 023
Maximal Simplices      254 820        191
Max Dimension          6              50
Min/Max Jaccard        . 11 e /0.64   0.23/0.6
Avg/Median Jaccard     0.35/0.30      0.46/0.57
Connected Components   23             3
Zebra is a simplicial complex constructed from the genetic data of zebrafish provided by COXPRESdb [26] (https://coxpresdb.jp/download). Starting from the gene correlation table generated using the method RC-PS, we extract, for each gene, the genes with mutual rank value lower than one tenth of the average value in its neighborhood, and then create a simplicial complex containing the extracted genes. Finally, we retain the simplices with size in the 99th percentile.
ETFs (Exchange-Traded Funds) contains general aspects, portfolio indicators, returns, and financial ratios of 2353 ETFs, scraped from the Yahoo Finance website (https://finance.yahoo.com). We use this information as features of the ETFs, compute the Kendall correlation between each pair of ETFs, and find a percentile threshold for each row of the correlation matrix. Then, for each ETF, we create a simplex containing all the ETFs with correlation above that threshold. Finally, we retain the simplices with size in a given percentile.
RFC is a random flag complex [19], i.e., a simplicial complex whose $q$-simplices correspond to the $(q+1)$-cliques of a random graph. It is constructed by first generating an Erdös-Rényi random graph on $n$ vertices with edge probability $p$, and then creating a $q$-simplex for each $(q+1)$-clique in the graph. We set $p$ proportional to $\log(n)/n$.
SCM is a null model for comparison with empirical simplicial complexes [42]. Let the degree of a vertex in a complex be the number of maximal simplices that contain it, and the size of a simplex be the number of vertices it contains. Then, SCM is the uniform distribution over all the complexes with degree sequence $d$ and size sequence $s$. We sample a complex from this distribution by using the uniform Markov chain Monte Carlo (MCMC) sampler (https://github.com/jg-you/scm), which takes in input only a list of maximal simplices from a real complex (we use the Enron complex for this).
Figure 2 illustrates the richness of the simplicial truss decomposition, when compared to the standard truss decomposition. As we can see, the size of the graph trusses is a convex function, whereas the size of the simplicial trusses is concave. This difference is due to the presence of the higher-order structures that exist in the complex, but are lost when adopting a graph representation of the data.

Figure 2: Size of the trusses found in the 1-skeleton of ETFs (left) and size of the simplicial trusses found in ETFs (right).

We evaluate the impact of the dataset characteristics on the performance of STruD. Figure 3 (left) shows the total time required to compute the simplicial truss decomposition of the real datasets, varying the maximum size of the simplices to explore. For these experiments, we terminate the computation when the memory size limit is reached, and therefore the running times are not determined by I/O operations. As expected, the time grows exponentially with the maximum size, although more steeply for the Enron and Zebra datasets than for DBLP and ETFs. Even though the size of Enron is smaller than that of DBLP, Enron contains simplices with larger dimension and lower overlap of their vertices, which results in a number of candidate simplices to examine that grows exponentially with the max size. Recall that the number of candidate simplices at each iteration $q$ is upper-bounded by the number of possible subsets of size $q$ of the simplices in the complex. Figure 4 illustrates how the actual number of candidates grows for the DBLP and Enron datasets, together with the running time required to complete each step of the computation. At a given iteration, the number of candidates in Enron is almost 3 times the number of candidates in DBLP, and hence the memory limit is reached.
A similar situation happens in the Zebra dataset, where the number of distinct triangles to examine is 3 times that of Enron.

Figure 3 (right), instead, shows the time required to find the top-$n$ simplices of size Size with highest simplicial trussness, compared with the running time of the brute-force approach (denoted with -B), which first computes the simplicial trussness of the simplices and then retains the top-$n$. We report only the time required to complete the first step. The chart indicates that the difference between the two approaches is not significant. As illustrated in Figure 4, this is because the time required to find (candidates) and validate (validation) the neighbors dominates the computation, and the two algorithms differ only in how they perform the trussness step, i.e., in how they compute the simplicial trussness of the simplices. These results show that the combinatorial explosion caused by the downward closure property severely affects the performance of any approach that needs to operate on the simplices in a complex.

Figure 3: Running time of STruD to find the simplicial truss decomposition of all the datasets, varying the maximum size of the simplices considered (left); and to find the top-$n$ simplices with highest simplicial trussness and given size (right). Axes: Max Size (left) and Size (right) against Time (s). Series: DBLP, ENRON, ETFs, ZEBRA, with the corresponding -B variants in the right chart.

Figure 4: Running time required by the various steps of STruD (simplices, candidates, extension, validation, trussness) in DBLP (left) and Enron (right). The bars indicate the number of simplices of size Size that must be examined.

Persistent homology [14] is a mathematical tool used in topological data analysis (TDA) to identify the qualitative features of a simplicial complex, and quantify the shape of the underlying data in terms of those features. This is achieved by measuring the lifetime of the topological features through a filtration, which is a sequence of nested subcomplexes of the simplicial complex constructed by iteratively adding simplices to an initially empty set, under the condition that a simplex is added only after all its faces. The qualitative features are the topological features with long persistence through the filtration. As such, persistent homology provides a measure of robustness of the features emerging across different scales, and generates an accurate approximation of the underlying data space.

Given that the definition of simplicial truss satisfies the containment property, we can use the reverse of the simplicial truss decomposition as a filtration. The persistent homology of a filtration can be visualized via a persistence diagram, which represents each topological feature in the filtration as a point. Each point is called a persistence pair, and its coordinates indicate the birth and death time of the feature in the filtration sequence. Roughly speaking, a 0-dimensional feature is a connected component, a 1-dimensional feature is a hole, and a 2-dimensional feature is a void.

Figure 5: Persistence diagrams of the 0-, 1-, and 2-dimensional features, for two different subsets of the DBLP dataset.

Figure 5 shows the result of the truss-based persistent homology on different clusters of co-authors in DBLP via persistence diagrams. The left diagrams indicate the 0-dimensional features, the middle ones the 1-dimensional features, and the right ones the 2-dimensional features. Points close to the diagonal are features that are born and immediately die, and thus represent features created by simplices that are not maximal, but contained in a larger simplex. These simplices have trivial simplicial trussness, i.e., simplicial trussness equal to the lower bound.
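The use of the reversed truss decomposition as a filtration can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, simplices are represented as sorted tuples of vertices, and the trussness values of the toy complex are made up (chosen so that no face has lower trussness than any of its cofaces). Simplices with the highest trussness enter first, and the check verifies the filtration condition that every simplex is added only after all its faces.

```python
from itertools import combinations


def truss_filtration(trussness):
    """Order simplices by decreasing trussness (ties: smaller simplices first).

    `trussness` maps each simplex (a sorted tuple of vertices) to its value.
    Returns a list of (simplex, birth_step) pairs.
    """
    k_max = max(trussness.values())
    # Simplices of the innermost truss are born at step 0.
    birth = {s: k_max - k for s, k in trussness.items()}
    order = sorted(trussness, key=lambda s: (birth[s], len(s), s))
    return [(s, birth[s]) for s in order]


def is_valid_filtration(filtration):
    """Check that each simplex appears only after all its codimension-1 faces."""
    seen = set()
    for simplex, _ in filtration:
        for face in combinations(simplex, len(simplex) - 1):
            if face and face not in seen:  # the empty face of a vertex is skipped
                return False
        seen.add(simplex)
    return True


# Hypothetical trussness values for a triangle (1, 2, 3) with a pendant edge (3, 4).
toy = {
    (1,): 3, (2,): 3, (3,): 3, (4,): 1,
    (1, 2): 3, (1, 3): 3, (2, 3): 3, (3, 4): 1,
    (1, 2, 3): 2,
}

filtration = truss_filtration(toy)
print(filtration)
print(is_valid_filtration(filtration))
```

The resulting list of (simplex, birth step) pairs is the kind of input that persistent homology software typically expects; computing the persistence pairs themselves is outside the scope of this sketch.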
The middle and right parts of Figure 5 show that many of the higher-order simplices in DBLP have trivial simplicial trussness. In contrast, points in the upper-left corner indicate features that originated early in the filtration and died at the end of the filtration (or never died). Alive features can correspond to joists of simplices that do not exist in the complex, and can be surrounded by simplices with both high and low simplicial trussness values. The case of high simplicial trussness is particularly interesting, because it corresponds to a dense region of the complex with a hole, i.e., to a group of researchers who frequently collaborated in subgroups but never all together. Dead features, instead, can correspond to simplices whose cofaces have simplicial trussness much higher than that of the simplex, meaning that the neighborhood of the simplex is much less dense than those of its cofaces. When looking, for example, for strong collaborations or central vertices in DBLP, one may want to concentrate on these particular cofaces. Points in the upper-right corner, in turn, indicate features that originated towards the end of the filtration. Since the simplicial trussness is inversely proportional to the time of birth, these points can indicate papers written mostly by small groups of authors who do not often cross-collaborate. Another interesting case is illustrated in the upper-left corner of Figure 5.
Given that the simplicial trussness of a simplex is lower bounded by the size of the largest simplex containing it, when the complex contains simplices with heterogeneous sizes, persistence pairs can be found all over the persistence diagram.

Finally, by looking at the persistence diagrams of different datasets, we can compare the dynamics that characterize them. In the case presented in Figure 5, similar diagrams indicate that authors in different fields of research collaborate in a similar way, while different diagrams suggest different underlying mechanisms. The structure and patterns of scientific collaboration have been an object of study for many research communities [13]. Among them, Patania et al. [27] use persistent homology to characterize the patterns of collaboration in the arXiv data.

It has been shown that the topology and the geometry of a network affect its dynamics [4], and hence play an important role in understanding the organization of the brain [5], and in defining routing protocols [20], among others. A well-studied topological object is the manifold. A simplicial manifold is a simplicial complex for which the geometric realization is homeomorphic to a topological manifold, i.e., a space where the neighborhood of each point is homeomorphic to $\mathbb{R}^n$ for some integer $n$. A simplicial manifold $\mathcal{M}$ of a given dimension is a growing simplicial complex generated by gluing simplices of that dimension along their cofaces [24]. For a coface $\delta$, let $n_\delta$ be the number of such simplices of which $\delta$ is a coface, minus a constant. In the first step, $\mathcal{M}$ consists of a single simplex. At any subsequent step $t$, an additional simplex is attached to a coface $\delta$ in $\mathcal{M}$ with a probability $P_\delta$ that depends on $n_\delta$ and is normalized over the cofaces of $\mathcal{M}$. It follows that the number of vertices in $\mathcal{M}$ grows by one at each step, and hence the total number of vertices is equal to $s$ plus the dimension of the manifold, where $s$ is the number of simplices of that dimension in $\mathcal{M}$.

Figure 6: Simplicial Trussness of ETFs.

Figure 7: Simplicial Trussness of DBLP.
Given the dimension and the size $s$ as parameters, we generated several manifolds and performed their simplicial truss decomposition. Experimental results showed that, in a manifold of a given dimension, all the $q$-simplices have the same simplicial trussness, which is determined by $q$. This behavior arises from how the simplicial manifolds are generated. By construction, a new vertex is added at each step, meaning that the new vertex $v$ in the last simplex added is included in a fixed number of different joists, which determines its simplicial trussness. For the same reason, all the higher-order simplices incident to $v$ that have the same size have the same simplicial trussness. By cascade, the same holds for all the neighbors of $v$, for all the neighbors of its incident simplices, and so on. As a consequence, simplicial trussness can quantify how much a dataset deviates from a manifold: the more diverse the simplicial trussness values of simplices of the same size are, the more the simplicial complex differs from a manifold. For example, by looking at Figure 6 and Figure 7 we can conclude that ETFs is more similar to a manifold than DBLP, because most of the simplices of the same size have the same simplicial trussness value.

Random models are a useful tool to assess the statistical significance of the findings of an analysis on real data. We show that our definition of simplicial trussness is informative and non-trivial by comparing the simplicial truss decomposition of a real complex with that of random complexes.

We generate two random complexes, RFC and SCM, and compute their simplicial truss decomposition. Figure 8 illustrates the size of the simplicial trusses found in Enron (a), RFC (b), and SCM (c), while the simplicial trussness of their simplices is reported in Figure 9 (in Appendix).

Figure 8: Simplicial trusses of Enron (a), RFC (b), and SCM (c).

As the majority of the simplices in SCM have the same simplicial trussness, in Figure 8 (c) we can see that the size of many of the $k$-trusses is almost the same. This behavior is due to the presence of a very large simplex that contains a large portion of the vertices, and thus determines the simplicial trussness of most of the simplices in the complex. In contrast, social networks such as Enron usually follow power-law degree distributions, so that most of the vertices have few connections, and few vertices are highly connected. As a consequence, the simplicial trussness values of the simplices in Enron are more heterogeneous (Figure 8 (a)) and the convexity of the function in the chart is more pronounced. Similarly, RFC does not follow a power-law distribution, having mainly size-5 and size-3 simplices, and therefore the simplicial trussness values of its simplices are smaller than those of the Enron simplices (Figure 8 (b)). Additionally, Figure 9 (b) shows that most of the higher-order structures have simplicial trussness 1 or 2, hence leading to a $k$-truss size chart that resembles those obtained in the standard truss decomposition. This situation happens when the larger simplices do not participate in many joists, and can be explained by the simplicial closure phenomenon [27]: in social networks, three vertices connected in pairs are more likely to form a triangle.
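To make the notions of closed and open triangles used in this comparison concrete, the following minimal sketch counts them in a complex given by a list of simplices. It is an illustration under stated assumptions, not the procedure used for the experiments: the function name `triangle_census` and the toy input are hypothetical, and a triple of vertices is called open when its three edges exist in the complex but no simplex contains all three vertices.

```python
from itertools import combinations


def triangle_census(maximal_simplices):
    """Return (closed, open) triangle counts of the complex
    generated by the downward closure of the given simplices."""
    simplices = [frozenset(s) for s in maximal_simplices]

    # Every pair of vertices inside a simplex is an edge of the complex.
    edges = set()
    for s in simplices:
        edges.update(frozenset(e) for e in combinations(sorted(s), 2))

    neighbors = {}
    for e in edges:
        u, v = tuple(e)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    closed = open_ = 0
    for e in edges:
        u, v = sorted(e)
        for w in neighbors[u] & neighbors[v]:
            if w > v:  # count each triple u < v < w exactly once
                triple = frozenset((u, v, w))
                if any(triple <= s for s in simplices):
                    closed += 1
                else:
                    open_ += 1
    return closed, open_


# Hypothetical toy complex: one filled triangle {1, 2, 3} and one hollow
# triangle on vertices 3, 4, 5 (its three edges exist, the triangle does not).
toy = [{1, 2, 3}, {3, 4}, {4, 5}, {3, 5}]
print(triangle_census(toy))
```

The ratio between the second count and the total gives a percentage of open triangles in the spirit of the one reported in Table 4; the same idea extends to joists of larger simplices by replacing pairs and triples with larger vertex subsets.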
Since this principle does not generally hold for random networks, most of the joists are joists of simplices that do not exist in the complex, and thus the simplicial trussness values are low. The same conclusions can be drawn by looking at Table 4, which summarizes the topological features of Enron, RFC, and SCM, in terms of Betti numbers $\beta_i$ (i.e., structural holes of dimension $i$), percentage of open joists to total joists, percentage of open triangles to total open joists, and percentage of simplices with non-trivial simplicial trussness. Here, we use the term open to indicate joists of simplices that do not exist in the complex. As we can see, the percentage of open joists is higher in RFC, and grows further for SCM, compared to that of Enron.

Table 4: Betti numbers $\beta_i$ of Enron, RFC, and SCM; percentage of open joists; percentage of open triangles; and percentage of simplices with non-trivial simplicial trussness.

Dataset   $\beta_0$   $\beta_1$   $\beta_2$   Open Joists (%)   Open Triangles (%)   Non-trivial $tr$ (%)
Enron     215         34685       28725       0.46              89.2                 0.12
RFC
SCM

Finally, we note that most of the simplices in SCM have trivial simplicial trussness, which means that their simplicial trussness value is only determined by the largest simplex that contains them. This percentage is quite similar to that of Enron due to the simplicial closure phenomenon, but is lower for RFC. Arguably, the simplicial truss decomposition of SCM behaves similarly to that of Enron because the SCM random complex is generated by using Enron as input.

By analyzing the experimental results, it turns out that a simplex $\sigma$ achieves a high simplicial trussness in one of the following two cases: (i) $\sigma$ belongs to many overlapping joists, or (ii) $\sigma$ is contained in a very large simplex $\tau$. In a collaboration network such as DBLP, these two cases can correspond, respectively, to authors that have often collaborated with each other, and to a paper written by a large group of researchers. When looking for interesting structures in the network, the first situation may be of greater interest than the second one, as it indicates that stronger and more persistent relationships exist between the authors. By storing only the simplices with non-trivial simplicial trussness, we can immediately detect the most interesting simplices in the dataset. Indeed, we note that in the second case the simplicial trussness of $\sigma$ is determined by the size of $\tau$, i.e., it is equal to the lower bound. On the contrary, in the first case, the simplicial trussness of $\sigma$ is non-trivial, and moreover, it is more likely to be much larger than that of the larger simplices $\tau$ containing it. As an example, by looking at the simplices with non-trivial simplicial trussness in DBLP, we can identify the triangle (Marek Tutaj, Howard J. Jacob, Weisong Liu) with simplicial trussness 11, determined by several papers co-written by them. On the other hand, by looking at the top simplices with maximal simplicial trussness, we find that the triangle (Pablo Losada, Charo Gil, Maria C. Viegas) has simplicial trussness 14. However, the latter simplex may be less interesting than the first one, because its simplicial trussness is completely determined by a single paper published in 2011 and co-written by 17 authors.

Finally, other interesting structures to analyze are the structural holes, i.e., joists of simplices that do not exist in the complex, surrounded by dense structures, because, due to the simplicial closure phenomenon, they indicate structures that could be filled in the future. This kind of joist is characterized by the presence of higher-order structures with high simplicial trussness around it. For example, let us consider the joist of the triangle (Liwei Wang, Harold R. Solbrig, Cui Tao). Liwei Wang wrote 3 papers together with Cui Tao, one of which has many authors and thus contributes significantly to the simplicial trussness of the edge (Liwei Wang, Cui Tao). On the other hand, 15 papers contribute to the simplicial trussness of the edge (Cui Tao, Harold R. Solbrig). Finally, one paper with many authors determines the simplicial trussness of the edge (Liwei Wang, Harold R. Solbrig). Even though Liwei Wang and Harold R. Solbrig work in the same research center and Harold R. Solbrig has an extensive collaboration with Cui Tao, the three of them never collaborated together. However, the high values of simplicial trussness of the edges may be a sign of future collaboration.

Truss Decomposition.
Truss mining has garnered attention in the data mining community as it represents a cohesive, relaxed version of clique mining that can be computed very efficiently [40]. Its definition is based on the notion of triangles, which have always been considered fundamental building blocks of a network, especially social ones [36]. For this reason, $k$-trusses have been successfully used to detect communities in social networks [21] and to identify target vertices for viral marketing [23], among other applications. Truss decomposition, which is the task of detecting all the non-empty $k$-trusses of a graph, has been studied for probabilistic graphs [15], large graphs [6, 18], bipartite graphs [32], and dynamic graphs [29]. In addition, Sariyuce et al. [33] have proposed a generalization of the $k$-truss decomposition that finds a hierarchy of dense structures in a simple graph. However, computing the truss decomposition of a simplicial complex has not been considered thus far.

Frequent Itemset Mining.
If we represent a simplicial complex and its simplices as a family of sets, then the simplicial complex can be seen as a transactional database, and the simplices as transactions. This way, the task of finding the top-$n$ simplices with largest trussness resembles that of frequent itemset mining, which requires finding all the subsets of items that appear frequently in a transactional database [1, 17, 25]. Here, the support of an itemset is the number of transactions in the database that contain all the elements in the itemset. Similarly to our case, this definition satisfies the a-priori property, and thus the search space can be efficiently explored in the same bottom-up fashion used in our work. However, an algorithm designed to solve frequent itemset mining cannot be directly applied to solve simplicial truss decomposition: the trussness of a simplex depends on the size of the simplices that contain it and the trussness of the simplices whose vertex set intersects with that of the simplex. Conversely, the support of an itemset depends on the number of transactions that contain it, rather than their size.

Simplicial Complex Analysis.
Graphs are not always the most suitable data structure to encode relationships between real-world actors. For example, in an email network we cannot distinguish a message with multiple recipients from several messages with a single recipient. Similarly, in a co-authorship network, we cannot distinguish a paper written by a group of authors from several papers written by pairs of authors. Simplicial complexes have been used by the data mining community to better capture these higher-order relationships and solve several interesting problems [31]. By observing that the interactions between subsets of a group of users in a social network increase the likelihood that the members of the group will be pairwise connected in the future, Benson et al. [3] tackled link prediction via simplicial closure. Similarly, Eswaran et al. [12] addressed the semi-supervised learning task of label propagation in partially-labeled graphs. Iacopini et al. [16] adopted simplicial complexes to describe social contagion and diffusion phenomena, Horak et al. [14] to characterize networks by means of their topological features, and Serrano and Gómez [34] to define centrality measures. Finally, substantial work has been devoted to studying the brain's functional and structural organization [22, 28].

We introduced the problem of truss decomposition in simplicial complexes, which generalizes the standard truss decomposition on graphs. We showed that our definition of $k$-truss in a complex satisfies the uniqueness, containment, and a-priori properties, which allows the development of bottom-up solutions that examine simplices of increasing dimension. Moreover, we identified a convenient lower bound to the simplicial trussness of the simplices that lets us reduce the size of the output significantly; and an upper bound that gives an ordering of the simplices to use for the computation of the real values of simplicial trussness.
Borrowing ideas from similarity search, we designed STruD, a memory-aware algorithm that can efficiently compute the simplicial truss decomposition of a complex. In addition, we presented a version of STruD that extracts the top-$n$ simplices with maximum trussness and given size. Our experimental evaluation has demonstrated (i) the richness of the simplicial trusses when compared to the standard ones, and (ii) the scalability of STruD; and has shown (iii) a topological and (iv) a geometrical interpretation of the simplicial trussness (as a filtration and as a measure of manifoldness).

Acknowledgments. The authors acknowledge support from Intesa Sanpaolo Innovation Center. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
REFERENCES
[1] Rakesh Agrawal and Ramakrishnan Srikant. 1994. Fast Algorithms for Mining Association Rules in Large Databases. In VLDB.
[2] Ron Atkin. 1974. Mathematical structure in human affairs. Heinemann Educational Publishers.
[3] Austin R Benson, Rediet Abebe, Michael T Schaub, Ali Jadbabaie, and Jon Kleinberg. 2018. Simplicial closure and higher-order link prediction. PNAS
[4] flavor: from complexity to quantum geometry. Physical Review E 93, 3 (2016), 032315.
[5] Ed Bullmore and Olaf Sporns. 2009. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10, 3 (2009), 186–198.
[6] Pei-Ling Chen, Chung-Kuang Chou, and Ming-Syan Chen. 2014. Distributed algorithms for k-truss decomposition. In Big Data. 471–480.
[7] Jonathan Cohen. 2008. Trusses: Cohesive subgraphs for social network analysis. Technical Report.
[8] Yuri Dabaghian, Facundo Mémoli, Loren Frank, and Gunnar Carlsson. 2012. A topological paradigm for hippocampal spatial map formation using persistent homology. PLoS Computational Biology 8, 8 (2012), e1002581.
[9] V. De Silva and R. Ghrist. 2007. Homological sensor networks. Notices of the American Mathematical Society 54 (2007). Issue 1.
[10] W Dörfler and DA Waller. 1980. A category-theoretical approach to hypergraphs. Archiv der Mathematik 34, 1 (1980), 185–192.
[11] Ernesto Estrada and Grant J Ross. 2018. Centralities in simplicial complexes. Applications to protein interaction networks. Journal of Theoretical Biology
[12] The Web Conference 2020. 2493–2499.
[13] Wolfgang Glänzel and András Schubert. 2004. Analysing scientific networks through co-authorship. In Handbook of quantitative science and technology research. 257–276.
[14] Danijela Horak, Slobodan Maletić, and Milan Rajković. 2009. Persistent homology of complex networks. Journal of Statistical Mechanics: Theory and Experiment
[15] ICDM. 77–90.
[16] Iacopo Iacopini, Giovanni Petri, Alain Barrat, and Vito Latora. 2019. Simplicial models of social contagion. Nature Communications 10, 1 (2019), 1–9.
[17] Ruoming Jin and G. Agrawal. 2005. An algorithm for in-core frequent itemset mining on streaming data. In ICDM.
[18] Humayun Kabir and Kamesh Madduri. 2017. Shared-memory graph truss decomposition. In HiPC. 13–22.
[19] Matthew Kahle. 2009. Topology of random clique complexes. Discrete Mathematics
[20] INFOCOM. 1902–1909.
[21] Qing Liu, Minjun Zhao, Xin Huang, Jianliang Xu, and Yunjun Gao. 2020. Truss-based community search over large directed graphs. In ICDM. 2183–2197.
[22] Louis-David Lord, Paul Expert, Henrique M. Fernandes, Giovanni Petri, Tim J. Van Hartevelt, Francesco Vaccarino, Gustavo Deco, Federico Turkheimer, and Morten L. Kringelbach. 2016. Insights into Brain Architectures from the Homological Scaffolds of Functional Connectivity Networks. Frontiers in Systems Neuroscience 10 (2016), 85.
[23] Fragkiskos D Malliaros, Maria-Evgenia G Rossi, and Michalis Vazirgiannis. 2016. Locating influential nodes in complex networks. Scientific Reports
[24] Scientific Reports 8, 1 (2018), 1–10.
[25] S. Moens, E. Aksehirli, and B. Goethals. 2013. Frequent Itemset Mining for Big Data. In BigData. 111–118.
[26] Takeshi Obayashi, Yuki Kagaya, Yuichi Aoki, Shu Tadaka, and Kengo Kinoshita. 2018. COXPRESdb v7: a gene coexpression database for 11 animal species supported by 23 coexpression platforms for technical evaluation and evolutionary inference. Nucleic Acids Research 47, D1 (2018), D55–D62.
[27] Alice Patania, Giovanni Petri, and Francesco Vaccarino. 2017. The shape of collaborations. EPJ Data Science 6, 1 (2017), 18.
[28] Giovanni Petri, Paul Expert, Federico Turkheimer, Robin Carhart-Harris, David Nutt, Peter J Hellyer, and Francesco Vaccarino. 2014. Homological scaffolds of brain functional networks. Journal of The Royal Society Interface 11, 101 (2014).
[29] Venkata Rohit Jakkula and George Karypis. 2019. Streaming and Batch Algorithms for Truss Decomposition. arXiv preprint arXiv:1908.10550 (2019).
[30] Manish Saggar, Olaf Sporns, Javier Gonzalez-Castillo, Peter A Bandettini, Gunnar Carlsson, Gary Glover, and Allan L Reiss. 2018. Towards a new approach to reveal dynamical organization of the brain using topological data analysis. Nature Communications 9, 1 (2018), 1–14.
[31] Vsevolod Salnikov, Daniele Cassese, and Renaud Lambiotte. 2018. Simplicial complexes and complex systems. European Journal of Physics 40, 1 (2018).
[32] Ahmet Erdem Sarıyüce and Ali Pinar. 2018. Peeling bipartite networks for dense subgraph discovery. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 504–512.
[33] Ahmet Erdem Sariyuce, C Seshadhri, Ali Pinar, and Umit V Catalyurek. 2015. Finding the hierarchy of dense subgraphs using nucleus decompositions. In Proceedings of the 24th International Conference on World Wide Web. 927–937.
[34] Daniel Hernández Serrano and Darío Sánchez Gómez. 2019. Centrality measures in simplicial complexes: applications of TDA to Network Science. arXiv preprint arXiv:1908.02967 (2019).
[35] L. M. Seversky, S. Davis, and M. Berger. 2016. On time-series topological data analysis: new data and opportunities. In CVPR Workshops. 59–67.
[36] Georg Simmel. 1950. The Sociology of Georg Simmel. Vol. 92892. Simon and Schuster.
[37] David I Spivak. 2009. Higher-dimensional models of networks. arXiv preprint arXiv:0909.4314 (2009).
[38] Leo Torres, Ann S Blevins, Danielle S Bassett, and Tina Eliassi-Rad. 2020. The why, how, and when of representations for complex systems. arXiv preprint arXiv:2006.02870 (2020).
[39] Jeremy D. Turiela, Paolo Barucca, and Tomaso Astea. 2020. Simplicial persistence of financial markets: filtering, generative processes and portfolio risk. arXiv preprint arXiv:2009.08794 (2020).
[40] Jia Wang and James Cheng. 2012. Truss decomposition in massive networks. arXiv preprint arXiv:1205.6693 (2012).
[41] Kelin Xia and Guo-Wei Wei. 2014. Persistent homology analysis of protein structure, flexibility, and folding. Int J Numer Meth Bio 30, 8 (2014), 814–844.
[42] Jean-Gabriel Young, Giovanni Petri, Francesco Vaccarino, and Alice Patania. 2017. Construction of and efficient sampling from the simplicial configuration model. Physical Review E 96, 3 (2017).
A TOP-N SIMPLICES OF SIZE Q

Algorithm 3 reports the pseudocode of the method for finding the top-$n$ simplices with size $q$ and maximal simplicial trussness, discussed in Section 3.2.

Algorithm 3 Top-$n$ Simplices of size $q$
Require: Simplicial complex $K$; Num simplices $n$; Size $q$
Ensure: Top-$n$ simplices of size $q$ with maximum trussness
  $T \leftarrow$ priority queue of $(t, \varphi)$ pairs sorted on $t$
  while $Q \neq \emptyset$ do
    $M \leftarrow$ simplices with max trussness upper bound $tr_M$ in $Q$
    if $|T| = n$ and $tr_M \leq T.min\_key()$ then break
    for $\sigma \in M$ do
      for $v \in C[\sigma]$ do
        if $|T| = n$ and $tr[\sigma] \leq T.min\_key()$ then break
        for subset $\xi$ of $\sigma$ of size $|\sigma| - 1$ do
          $\tau \leftarrow \xi \cup \{v\}$
          if $tr[\tau] < tr[\sigma]$ then
            $C[\sigma] \leftarrow C[\sigma] \setminus \{v\}$
            $tr[\sigma] \leftarrow |C[\sigma]|$
            break
      remove $\sigma$ from $Q$
      insert $(tr[\sigma], \sigma)$ in $T$
  return $T.top(n)$

B PROOFS
We next report the proof of Property 1 (uniqueness): i.e., the : -trussof a simplicial complex K is unique. P . Assume, by absurd, that ) : and ) : are two distinct : -trusses of K . Let ) : = ) : [ ) : . For each f ) : , CA ( f ) : becauseeach f in either ) : or ) : has trussness : by de nition. Since thesize of ) : is not lower than that of ) : and ) : , each @ -simplex in ) : is involved in not less than : ( @ + ) -ary relationships, meaningthat the number of joists to which f ( @ ) belongs is not lower than : . Therefore ) : is a : -truss of K larger than ) : and ) : , which is acontraction as ) : and ) : are maximal. ⇤ We next report the proof of Property 2 (containment): i.e., the ( : + ) -truss of a simplicial complex K is a subset of its : -truss. P . Each f ) : + has trussness CA ( f ) : + , meaning thatit belongs to at least : + joists. Therefore, it satis es the conditionto belong to ) : . As a result, it holds that ) : + ⇢ ) : . ⇤ We next report the proof of Property 3 (a-priori property): i.e., a @ -simplex f ( @ ) that is a face of a ( @ + ) -simplex f ( @ + ) has trussness CA K ( f ( @ ) ) CA K ( f ( @ + ) ) . P . Let assume there exists a ( @ + ) -simplex f ( @ + ) thatcontains f ( @ ) and has trussness : + with : = CA K ( f ( @ ) ) . Then, ) : + contains : + joists , = , . . . , : + of ( @ + ) -simplices f ( @ + ) ,each of which contains ( @ + ) ( @ + ) -simplices with trussness : + . Each f ( @ + ) contains the same vertices of f ( @ + ) plus anadditional vertex E . Since each pair of cofaces of a @ -simplex shareexactly @ vertices, for each there exists a pair of simplicesthat share the coface f ( @ ) , for a total of : + ( @ + ) -simplices.Similarly, there exist a set of : + ( @ + ) -simplices that containeach other coface of f ( @ + ) , meaning that f ( @ ) and all the other @ -simplices in f ( @ + ) appear in : + joists. Therefore, the simplicialtrussness of f ( @ ) is : + . We reached a contradiction, as we assumed : = CA K ( f ( @ ) ) . 
□

C ADDITIONAL EXPERIMENTS

Figure 9 illustrates the number of simplices per simplicial trussness value, for Enron (a), RFC (b), and SCM (c). These charts supplement those presented in Section 4.5.

Figure 9: Number of simplices per simplicial trussness value in a) Enron, b) RFC, and c) SCM.
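Returning to the top-$n$ search of Algorithm 3 (Appendix A), the pruning idea it relies on can be sketched as follows. This is a simplified illustration, not a transcription of Algorithm 3: it assumes that an upper bound is already available for each simplex and that the exact value never exceeds it, and the names `top_n_with_bounds`, `upper_bound`, and `exact_value`, as well as the toy numbers, are hypothetical. Simplices are visited by decreasing upper bound, and the search stops as soon as the best remaining bound cannot improve on the current $n$-th best value.

```python
import heapq


def top_n_with_bounds(upper_bound, exact_value, n):
    """Return the n items with the largest exact value, plus the number of
    exact evaluations performed.

    upper_bound: dict mapping each item to an upper bound on its value.
    exact_value: function computing the exact value of an item
                 (assumed never larger than the item's upper bound).
    """
    # Max-heap on the upper bounds (heapq is a min-heap, hence the negation).
    queue = [(-ub, item) for item, ub in upper_bound.items()]
    heapq.heapify(queue)

    top = []  # min-heap of (value, item): top[0] is the current n-th best
    evaluated = 0
    while queue:
        neg_ub, item = heapq.heappop(queue)
        if len(top) == n and -neg_ub <= top[0][0]:
            break  # no remaining item can enter the top n
        value = exact_value(item)
        evaluated += 1
        if len(top) < n:
            heapq.heappush(top, (value, item))
        elif value > top[0][0]:
            heapq.heapreplace(top, (value, item))
    return sorted(top, reverse=True), evaluated


# Hypothetical bounds and exact values for four triangles.
bounds = {(1, 2, 3): 5, (2, 3, 4): 4, (3, 4, 5): 3, (4, 5, 6): 1}
exact = {(1, 2, 3): 2, (2, 3, 4): 4, (3, 4, 5): 3, (4, 5, 6): 1}

result, evaluated = top_n_with_bounds(bounds, exact.get, n=2)
print(result, evaluated)
```

In this toy run the last triangle is never evaluated, because its upper bound is not larger than the second-best value found so far. The brute-force alternative discussed in the experiments would instead evaluate every simplex before selecting the top $n$.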