Co-clustering Vertices and Hyperedges via Spectral Hypergraph Partitioning
Yu Zhu, Boning Li, Santiago Segarra
Rice University, USA
Abstract—We propose a novel method to co-cluster the vertices and hyperedges of hypergraphs with edge-dependent vertex weights (EDVWs). In this hypergraph model, the contribution of every vertex to each of its incident hyperedges is represented through an edge-dependent weight, conferring the model higher expressivity than the classical hypergraph. In our method, we leverage random walks with EDVWs to construct a hypergraph Laplacian and use its spectral properties to embed vertices and hyperedges in a common space. We then cluster these embeddings to obtain our proposed co-clustering method, of particular relevance in applications requiring the simultaneous clustering of data entities and features. Numerical experiments using real-world data demonstrate the effectiveness of our proposed approach in comparison with state-of-the-art alternatives.
Index Terms—Hypergraphs, co-clustering, Laplacian, spectral partitioning, edge-dependent vertex weights.
I. INTRODUCTION
Clustering, a fundamental task in data mining and machine learning, aims to divide a set of entities into several groups such that entities in the same group are more similar to each other than to those in other groups. In graph clustering or partitioning, the entities are modeled as the vertices of a graph and their similarities are encoded in the edges. In this setting, the goal is to group the vertices into clusters such that there are more edges within each cluster than across clusters.

While graphs serve as a popular tool to model pairwise relationships, in many real-world applications the entities engage in more complicated, higher-order relationships. For example, in coauthorship networks [1] more than two authors can interact in writing a manuscript. Hypergraphs can be used to represent such datasets, where the notion of an edge is extended to a hyperedge that can connect more than two vertices. Existing research on hypergraph partitioning mainly follows two directions. One is to project a hypergraph onto a proxy graph via hyperedge expansion, after which graph partitioning methods can be directly leveraged [2-4]. The other is to represent hypergraphs using tensors and adopt tensor decomposition algorithms [5-8].

To better accommodate hypergraphs for the representation of real-world data, several extensions of the classical hypergraph have been recently proposed [9-13]. These more elaborate models consider different types of vertices
[Footnote: This work was supported by NSF under award CCF-2008555. B. Li was partially supported by the Ken Kennedy Institute 2020/21 Ken Kennedy-Cray Graduate Fellowship. We also acknowledge the support of NVIDIA Corporation. E-mails: {yz126, boning.li, segarra}@rice.edu]

or hyperedges, or different levels of relations. In this paper, we consider edge-dependent vertex weights (EDVWs) [11], which can be used to reflect the different importance or contribution of vertices in a hyperedge. This model is highly relevant in practice. For example, an e-commerce system can be modeled as a hypergraph with EDVWs where users and products are respectively modeled as vertices and hyperedges, and EDVWs represent the quantity of a product in a user's shopping basket [14]. EDVWs can also be used to model the relevance of a word to a document in text mining [12], the probability of an image pixel belonging to a segment in image segmentation [15], and the author positions in a coauthorship or citation network [11], to name a few.

A large portion of clustering algorithms focus on one-way clustering, i.e., clustering data entities based on their features and, in the hypergraph setting, clustering vertices based on hyperedges. Indeed, in [12], a hypergraph partitioning algorithm was proposed to cluster the vertices in a hypergraph with EDVWs. However, it is more desirable to simultaneously cluster (or co-cluster) both vertices and hyperedges in many applications, including text mining [16, 17], product recommendation [18], and bioinformatics [19, 20]. Moreover, co-clustering can leverage the benefit of exploiting the duality between data entities and features to effectively deal with high-dimensional and sparse data [17, 21].

In this paper, we study the problem of co-clustering vertices and hyperedges in a hypergraph with EDVWs.
Our contributions can be summarized as follows:
(i) We define a Laplacian for hypergraphs with EDVWs through random walks on vertices and hyperedges and show its equivalence to the Laplacian of a specific digraph obtained via a modified star expansion of the hypergraph.
(ii) We propose a spectral hypergraph co-clustering method based on the proposed hypergraph Laplacian.
(iii) We validate the effectiveness of the proposed method via numerical experiments on real-world datasets.

Notation: The entries of a matrix X are denoted by X_ij or X(i,j). The operations (·)^T and Tr(·) represent transpose and trace, respectively. 1 and I refer to the all-ones vector and the identity matrix, whose sizes are clear from context. I_N and 0_{N×M} refer to the identity matrix of size N × N and the all-zero matrix of size N × M. diag(x) denotes a diagonal matrix whose diagonal entries are given by the vector x. Finally, [X; Y] represents the matrix obtained by vertically concatenating two matrices X and Y, while [X, Y] denotes horizontal concatenation.

II. PRELIMINARIES
A. Hypergraphs with edge-dependent vertex weights
Hypergraphs are generalizations of graphs where edges can connect more than two vertices. In this paper, we consider the hypergraph model with EDVWs [11], as defined next.
Definition 1. A hypergraph H = (V, E, ω, γ) with EDVWs consists of a set of vertices V; a set of hyperedges E, where a hyperedge is a subset of the vertex set; a weight ω(e) for every hyperedge e ∈ E; and a weight γ_e(v) for every hyperedge e ∈ E and every vertex v ∈ e.

The difference between the above hypergraph model and the typical hypergraph model considered in most existing papers is the introduction of the EDVWs {γ_e(v)}. The motivation is to enable the model to describe cases in which the vertices in the same hyperedge contribute differently to that hyperedge. For example, in a coauthorship network, every author (vertex) in general has a different degree of contribution to a paper (hyperedge), usually represented by the order of the authors. This information is lost in traditional hypergraph models, but it can be easily encoded through EDVWs.

For convenience, let R ∈ ℝ^{|E|×|V|} collect the edge-dependent vertex weights, with R_ev = γ_e(v) if v ∈ e and R_ev = 0 otherwise. Also, let W ∈ ℝ^{|V|×|E|} collect the hyperedge weights, with W_ve = ω(e) if v ∈ e and W_ve = 0 otherwise. Throughout the paper we assume that the hypergraph is connected.

B. Spectral graph partitioning
Given an undirected graph G with N vertices, the goal of graph partitioning is to divide its vertex set into k disjoint subsets (clusters) S_1, ..., S_k such that there are more (heavily weighted) edges inside each cluster than across clusters, while these clusters are also balanced in size.

To formalize this problem, let A_g, D_g = diag(A_g 1), and L_g = D_g − A_g denote the weighted adjacency matrix, the degree matrix, and the combinatorial graph Laplacian, respectively. Denote by S a subset of vertices and by S^c its complement. The cut between S and S^c is defined as the sum of the weights of the edges across them, whereas the volume of S is defined as the sum of the weighted degrees of the vertices in S. More formally, we have

    cut(S, S^c) = Σ_{u∈S, v∈S^c} A_g(u,v),    vol(S) = Σ_{u∈S} D_g(u,u).

One well-known measure for evaluating a partition is the normalized cut (Ncut) [23], defined as
    Ncut(S_1, ..., S_k) = Σ_{i=1}^{k} cut(S_i, S_i^c) / vol(S_i).

If we define an N × k matrix Q whose entries are

    Q_vi = 1/√(vol(S_i)) if v ∈ S_i, and Q_vi = 0 otherwise,    (1)

[Footnote: Although there are different variations of the graph partitioning problem [22], this is the one that we adopt in this paper.]

then it can be shown that
    Ncut(S_1, ..., S_k) = Tr(Q^T L_g Q).

Thus, we can write the problem of minimizing the Ncut as

    min_{S_1,...,S_k} Tr(Q^T L_g Q)    s.t.    Q^T D_g Q = I,    Q as in (1).    (2)

The spectral graph partitioning method [23] relaxes (2) into a continuous optimization problem by ignoring its second constraint. The solution to the relaxed problem is given by the k generalized eigenvectors of L_g q_i = λ_i D_g q_i associated with the k smallest eigenvalues. Then, k-means [24] can be applied to the rows of Q = [q_1, ..., q_k] to obtain the desired clusters S_1, ..., S_k.

III. THE PROPOSED HYPERGRAPH CO-CLUSTERING
A. Star expansion and hypergraph Laplacians
We project the hypergraph H onto a directed graph G_s = (V_s, E_s) via the so-called star expansion, in which we replace each hyperedge with a star graph. More precisely, we introduce a new vertex for every hyperedge e ∈ E, thus V_s = V ∪ E. The graph G_s connects each new vertex representing a hyperedge e with each vertex v ∈ e through two directed edges (one in each direction) that we weigh differently, as explained next.

We consider a random walk on the hypergraph H (equivalently, on G_s) in which we walk from a vertex v to a hyperedge e that contains v with probability proportional to ω(e), and then walk from e to a vertex u contained in e with probability proportional to γ_e(u). We define two matrices P_{V→E} ∈ ℝ^{|V|×|E|} and P_{E→V} ∈ ℝ^{|E|×|V|} to collect the transition probabilities from V to E and from E to V, respectively. The corresponding entries are given by P_{V→E}(v,e) = W_ve / Σ_{e'} W_ve' and P_{E→V}(e,v) = R_ev / Σ_{v'} R_ev'. Then, the transition probability matrix associated with a random walk on G_s can be written as

    P = [ 0_{|V|×|V|}  P_{V→E} ; P_{E→V}  0_{|E|×|E|} ].

When the hypergraph H is connected, the graph G_s is strongly connected, thus the random walk defined by P is irreducible (every vertex can reach every other vertex). However, it is periodic since G_s is bipartite: once we start at a vertex v, we can only return to v after an even number of steps.

It is well known that a random walk has a unique stationary distribution if it is irreducible and aperiodic [25]. To fix the above periodicity problem, we introduce self-loops in G_s and define a new transition probability matrix P_α = (1 − α) I + α P, where 0 < α < 1. The matrix P_α defines a random walk (the so-called lazy random walk) in which, at each discrete time step, we take a step of the original random walk with probability α and stay at the current vertex with probability 1 − α.
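As a concrete illustration, the transition matrices and the lazy random walk above can be assembled with a few lines of NumPy. The toy hypergraph below (its EDVWs and hyperedge weights) is a hypothetical example of ours, not data from the paper:

```python
import numpy as np

# Hypothetical toy hypergraph with EDVWs: |V| = 4 vertices, |E| = 3 hyperedges.
# R is |E| x |V| with R[e, v] = gamma_e(v) if v is in e, and 0 otherwise.
R = np.array([
    [0.5, 0.5, 0.0, 0.0],   # e0 = {v0, v1}
    [0.4, 0.4, 0.2, 0.0],   # e1 = {v0, v1, v2} (bridge keeping H connected)
    [0.0, 0.0, 0.6, 0.4],   # e2 = {v2, v3}
])
omega = np.array([1.0, 1.0, 2.0])     # hyperedge weights omega(e)
# W is |V| x |E| with W[v, e] = omega(e) if v is in e, and 0 otherwise.
W = (R.T > 0) * omega

# Row-normalizing W and R gives the two transition-probability matrices.
P_VE = W / W.sum(axis=1, keepdims=True)   # walk v -> e, proportional to omega(e)
P_EV = R / R.sum(axis=1, keepdims=True)   # walk e -> v, proportional to gamma_e(v)

# Block transition matrix of the random walk on the star expansion G_s.
nV, nE = P_VE.shape
P = np.block([[np.zeros((nV, nV)), P_VE],
              [P_EV, np.zeros((nE, nE))]])

alpha = 0.5                               # any 0 < alpha < 1
P_alpha = (1 - alpha) * np.eye(nV + nE) + alpha * P   # lazy random walk
```

Both P and P_alpha are row-stochastic, and the diagonal of P_alpha carries the 1 − α self-loop probability that breaks the bipartite periodicity.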
The stationary distribution π of this random walk is the all-positive dominant left eigenvector of P_α, i.e., π^T P_α = π^T, scaled to satisfy ‖π‖_1 = 1. Notice that different choices of α lead to the same π.

[Table I: Summary of the datasets considered.]
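The stationary distribution can be computed as the dominant left eigenvector of P_α, as in the minimal sketch below; the helper name and the tiny two-state chain used for the sanity check are our own illustration, not from the paper:

```python
import numpy as np

def stationary_distribution(P_alpha):
    """Dominant left eigenvector of the lazy-walk matrix, scaled to sum to one."""
    evals, evecs = np.linalg.eig(P_alpha.T)           # left eigenvectors of P_alpha
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()                              # the Perron vector has one sign

# Sanity check on a two-state bipartite chain made lazy (illustrative numbers).
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
for alpha in (0.3, 0.5, 0.9):
    P_alpha = (1 - alpha) * np.eye(2) + alpha * P
    pi = stationary_distribution(P_alpha)
    # pi satisfies pi^T P_alpha = pi^T and, as noted above, does not depend on alpha.
```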
Given P_α and Φ = diag(π), we generalize the combinatorial directed Laplacian L and the normalized directed Laplacian ℒ of [25] to hypergraphs as follows:

    L = Φ − (Φ P_α + P_α^T Φ)/2,    (3)
    ℒ = Φ^{-1/2} L Φ^{-1/2} = I − (Φ^{1/2} P_α Φ^{-1/2} + Φ^{-1/2} P_α^T Φ^{1/2})/2.    (4)

It can be readily verified that (3) and (4) are equal to the combinatorial and normalized Laplacians of the undirected graph defined by the weighted adjacency matrix

    A = (Φ P_α + P_α^T Φ)/2,    (5)

where Φ = diag(A 1) is the corresponding degree matrix.

B. Spectral hypergraph partitioning
We can leverage the hypergraph Laplacians proposed in Section III-A to apply spectral graph partitioning methods (as introduced in Section II-B) to hypergraphs. More precisely, we compute the k generalized eigenvectors U = [u_1, ..., u_k] of the generalized eigenproblem L u = λ Φ u associated with the k smallest eigenvalues, and then cluster the rows of U using k-means. Note that L u = λ Φ u can be written as Φ^{-1/2} L Φ^{-1/2} (Φ^{1/2} u) = λ Φ^{1/2} u, implying that (λ, Φ^{1/2} u) is an eigenpair of the normalized Laplacian ℒ. Hence, if v is an eigenvector of ℒ, then u = Φ^{-1/2} v.

Since obtaining eigenvectors can be computationally challenging, we show next how to compute the eigenvectors of ℒ from a matrix of smaller size. To do this, let us first rewrite P_α and Φ as

    P_α = [ (1−α) I_{|V|}  α P_{V→E} ; α P_{E→V}  (1−α) I_{|E|} ],    Φ = [ Φ_V  0_{|V|×|E|} ; 0_{|E|×|V|}  Φ_E ].

Proposition 1.
Define the following matrix

    Ā = (Φ_V^{1/2} P_{V→E} Φ_E^{-1/2} + Φ_V^{-1/2} P_{E→V}^T Φ_E^{1/2}) / 2,    (6)

and denote by ū and v̄ the left and right singular vectors of Ā associated with the singular value λ̄, respectively. Then, the vector v = [ū^T, v̄^T]^T is the eigenvector of ℒ associated with the eigenvalue λ = α(1 − λ̄).

Proof.
Let us rewrite ℒ as

    ℒ = α [ I_{|V|}  −Ā ; −Ā^T  I_{|E|} ].    (7)

Split its eigenvector into two parts, v = [v_V^T, v_E^T]^T, where v_V and v_E respectively have length |V| and |E|. Then we have

    α [ I_{|V|}  −Ā ; −Ā^T  I_{|E|} ] [ v_V ; v_E ] = λ [ v_V ; v_E ],

and it follows that

    Ā v_E = (1 − λ/α) v_V,    Ā^T v_V = (1 − λ/α) v_E.

When 1 − λ/α > 0, i.e., λ < α, v_V and v_E are respectively the left and right singular vectors of Ā, and 1 − λ/α is the corresponding singular value. ∎

Based on Proposition 1, our proposed spectral hypergraph co-clustering algorithm is given by the following steps:
1) Compute the k left and right singular vectors of Ā associated with the k largest singular values, denoted by Ū ∈ ℝ^{|V|×k} and V̄ ∈ ℝ^{|E|×k}, respectively.
2) Leverage Proposition 1 to form U = [Φ_V^{-1/2} Ū ; Φ_E^{-1/2} V̄].
3) (Optional) Normalize the rows of U to have unit norm.
4) Apply k-means to the rows of U (or its normalized version).

The optional normalization step above is inspired by the spectral partitioning algorithm proposed in [26]. In the next section, we denote the variant of our algorithm without normalization by s-spec-1, whereas the one that implements the third step above is denoted by s-spec-2.

How to choose the parameter α? From Proposition 1 and (7) we can see that the choice of α affects the eigenvalues of ℒ but does not change its eigenvectors (or their order). Hence, the proposed spectral clustering method is independent of α.

IV. EXPERIMENTS
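Before turning to the experiments, the co-clustering algorithm of Section III-B can be sketched end to end with NumPy and scikit-learn. This is a minimal illustration on a hypothetical toy hypergraph; the helper name, the toy data, and the default α are our own choices, not the authors' released implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def hypergraph_cocluster(R, W, k, alpha=0.5, normalize=True, seed=0):
    """Sketch of the proposed method (normalize=False ~ s-spec-1, True ~ s-spec-2)."""
    P_VE = W / W.sum(axis=1, keepdims=True)          # vertex -> hyperedge
    P_EV = R / R.sum(axis=1, keepdims=True)          # hyperedge -> vertex
    nV, nE = P_VE.shape
    # Stationary distribution of the lazy walk on the star expansion.
    P = np.block([[np.zeros((nV, nV)), P_VE],
                  [P_EV, np.zeros((nE, nE))]])
    P_alpha = (1 - alpha) * np.eye(nV + nE) + alpha * P
    evals, evecs = np.linalg.eig(P_alpha.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()
    phi_V, phi_E = pi[:nV], pi[nV:]
    # A_bar as in (6); its top-k singular vectors give the joint embedding.
    A_bar = 0.5 * ((phi_V**0.5)[:, None] * P_VE * (phi_E**-0.5)[None, :]
                   + (phi_V**-0.5)[:, None] * P_EV.T * (phi_E**0.5)[None, :])
    Ub, _, Vbt = np.linalg.svd(A_bar)
    # Steps 1-2: embed vertices and hyperedges in a common k-dimensional space.
    U = np.vstack([(phi_V**-0.5)[:, None] * Ub[:, :k],
                   (phi_E**-0.5)[:, None] * Vbt[:k, :].T])
    if normalize:                                    # optional step 3 (s-spec-2)
        U = U / np.linalg.norm(U, axis=1, keepdims=True)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(U)
    return labels[:nV], labels[nV:]                  # vertex and hyperedge clusters

# Toy example: two vertex groups {v0, v1} and {v2, v3} bridged by hyperedge e1.
R = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.45, 0.45, 0.1, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.5, 0.5]])
W = (R.T > 0).astype(float)                          # all hyperedge weights set to 1
v_labels, e_labels = hypergraph_cocluster(R, W, k=2)
```

On this toy input the clearly separated vertices and hyperedges land in matching clusters, with the bridge hyperedge e1 assigned to whichever side dominates its weights.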
In this section, we evaluate the performance of the proposed methods via numerical experiments. We consider two widely used real-world text datasets: 20 Newsgroups and Reuters Corpus Volume 1 (RCV1) [27]. Both of them contain documents in different categories. We extract two subsets of documents from each of them to build datasets of different levels of difficulty (datasets 1 and 3 are easier than datasets 2 and 4; see Table I). We consider the [...] most frequent words in the corpus after removing stop words and words appearing in more than [...] and less than [...] of the documents.

To model text datasets using hypergraphs with EDVWs, we follow the procedure in [12]. More precisely, we consider documents as vertices and words as hyperedges. A document (vertex) belongs to a word (hyperedge) if the word appears in the document. The EDVWs (the entries in R) are taken as the corresponding tf-idf (term frequency-inverse document frequency) values, which reflect how relevant a word is to a document in a collection of documents. The weight associated with a hyperedge is computed as the standard deviation of the entries in the corresponding row of R.

[Footnote: The code needed to replicate the numerical experiments presented in this paper can be found at https://github.com/yuzhu2019/hypergraph_cocluster.]
[Footnote: http://qwone.com/~jason/20Newsgroups/]

[Fig. 1: Performance comparison of clustering algorithms. The two rows respectively show the clustering accuracy of documents and words. Each column corresponds to one dataset.]
[Fig. 2: The 2D t-SNE plot of document and word embeddings learned by s-spec-2 in Dataset 1. doc-i and word-i indicate documents and words from the four classes in the dataset.]

We compare the proposed methods (s-spec-1 and s-spec-2) with the following three methods.
(i) The naive method (naive): we run k-means on the columns and the rows of the tf-idf matrix R to cluster documents and words, respectively. (ii) Bipartite spectral graph partitioning (bi-spec) [16]: the dataset is modeled as an (undirected) bipartite graph between documents and words, and then a spectral graph partitioning algorithm is applied; see Section II-B. (iii) Clique expansion (c-spec, Algorithm 1 in [12]): this method projects the hypergraph with EDVWs onto a proxy graph via the so-called clique expansion and then applies a spectral graph partitioning algorithm. We consider it the state-of-the-art method. Since c-spec can only cluster the vertices (and not the hyperedges), we build a hypergraph as mentioned above to cluster documents, and then we construct another hypergraph, in which we take words as vertices and documents as hyperedges, to cluster words. Notice that, of the above-mentioned methods, only the proposed methods (s-spec-1 and s-spec-2) and bi-spec can co-cluster documents and words.

[Fig. 3: Word clouds for words predicted in the classes 'comp.os.ms-windows.misc' and 'sci.crypt'.]

To evaluate the clustering performance, we consider four metrics, namely, clustering accuracy score (ACC), normalized mutual information (NMI), weighted F1 score (F1), and adjusted Rand index (ARI) [28]. For all of them, a larger value indicates better performance. Notice that there are no ground-truth classes for words. Hence, following [29], we consider the class-conditional word distribution. More precisely, we compute the aggregate word distribution for each document class, and then assign every word to the class in which it has the highest probability under the aggregate distribution. We regard this assignment as the ground truth for performance evaluation.

The numerical results (averaged over [...] runs of k-means) are shown in Fig. 1. We first notice that, of the proposed methods, s-spec-2 usually performs better than s-spec-1.
This is in line with [26], where it was observed that the lack of a normalization step (as in our s-spec-1) might lead to performance decay when the connectivity within each cluster varies substantially across clusters. It can also be seen that the proposed methods and c-spec tend to work better than the naive method and the classical bipartite spectral graph partitioning method. This underscores the value of the hypergraph model considered. Importantly, s-spec-2 achieves clustering accuracy similar to that of the state-of-the-art c-spec for documents but tends to perform better in clustering words. Moreover, the proposed methods achieve small standard deviations, indicating their robustness to different centroid initializations in k-means.

Having shown the superior performance of s-spec-2, we now present visualizations of its application to Dataset 1 to further illustrate its effectiveness. In Fig. 2, we depict the embeddings of documents and words obtained by s-spec-2 by mapping them to a 2D space using t-SNE [30]. We can see that documents and words in the same class appear to form groups. In Fig. 3, we plot the word clouds for the words predicted in the classes 'comp.os.ms-windows.misc' (Microsoft Windows operating system) and 'sci.crypt' (cryptography). The size of a word is determined by its frequency in the documents predicted in the same class and is thus able to reveal its importance in the class. We can see that the top words (such as windows, file, dos, ms in 'comp.os.ms-windows.misc') align well with our intuitive understanding of the class topics.

V. CONCLUSIONS
We developed valid Laplacian matrices for hypergraphs with EDVWs, based on which we proposed spectral partitioning algorithms for co-clustering vertices and hyperedges. Through real-world text mining applications, we showcased the value of considering hypergraph models and demonstrated the effectiveness of our proposed methods. Future research avenues include: (i) developing alternative co-clustering methods in which we replace the spectral clustering step with non-negative matrix tri-factorization algorithms [29, 31, 32] applied to matrices related to the hypergraph Laplacians; (ii) generalizing additional existing digraph Laplacians [33, 34] to the hypergraph case; and (iii) studying the use of the hypergraph model with EDVWs in other network analysis tasks such as hypergraph alignment [35-37]. Related to this last point, the fact that our proposed methods embed vertices and hyperedges in the same vector space (as shown in Fig. 2) facilitates the development of embedding-based hypergraph alignment algorithms [38].

[Footnote: https://github.com/amueller/word_cloud]

REFERENCES
[1] Yi Han, Bin Zhou, Jian Pei, and Yan Jia, "Understanding importance of collaborations in co-authorship networks: A supportiveness analysis approach," in SDM, 2009, pp. 1112-1123.
[2] Sameer Agarwal, Jongwoo Lim, Lihi Zelnik-Manor, Pietro Perona, David Kriegman, and Serge Belongie, "Beyond pairwise clustering," in CVPR, 2005, vol. 2, pp. 838-845.
[3] Dengyong Zhou, Jiayuan Huang, and Bernhard Schölkopf, "Learning with hypergraphs: Clustering, classification, and embedding," in NIPS, 2007, pp. 1601-1608.
[4] Sameer Agarwal, Kristin Branson, and Serge Belongie, "Higher order learning with graphs," in ICML, 2006, pp. 17-24.
[5] Amnon Shashua, Ron Zass, and Tamir Hazan, "Multi-way clustering using super-symmetric non-negative tensor factorization," in ECCV, 2006, pp. 595-608.
[6] Debarghya Ghoshdastidar and Ambedkar Dukkipati, "A provable generalized tensor spectral method for uniform hypergraph partitioning," in ICML, 2015, pp. 400-409.
[7] Yannan Chen, Liqun Qi, and Xiaoyan Zhang, "The Fiedler vector of a Laplacian tensor for hypergraph partitioning," SIAM Journal on Scientific Computing, vol. 39, no. 6, pp. A2508-A2537, 2017.
[8] Zheng Tracy Ke, Feng Shi, and Dong Xia, "Community detection for hypergraph networks via regularized tensor power iteration," arXiv preprint arXiv:1909.06503, 2019.
[9] Pan Li and Olgica Milenkovic, "Inhomogeneous hypergraph clustering with applications," in NIPS, 2017, pp. 2308-2318.
[10] Inci M. Baytas, Cao Xiao, Fei Wang, Anil K. Jain, and Jiayu Zhou, "Heterogeneous hyper-network embedding," in ICDM, 2018, pp. 875-880.
[11] Uthsav Chitra and Benjamin J. Raphael, "Random walks on hypergraphs with edge-dependent vertex weights," in ICML, 2019, pp. 1172-1181.
[12] Koby Hayashi, Sinan G. Aksoy, Cheong Hee Park, and Haesun Park, "Hypergraph random walks, Laplacians, and clustering," in CIKM, 2020, pp. 495-504.
[13] Michael T. Schaub, Yu Zhu, Jean-Baptiste Seby, T. Mitchell Roddenberry, and Santiago Segarra, "Signal processing on higher-order networks: Livin' on the edge... and beyond," arXiv preprint arXiv:2101.05510, 2021.
[14] Jianbo Li, Jingrui He, and Yada Zhu, "E-tail product return prediction via hypergraph-based local graph cut," in KDD, 2018, pp. 519-527.
[15] Lei Ding and Alper Yilmaz, "Interactive image segmentation using probabilistic hypergraphs," Pattern Recognition, vol. 43, no. 5, pp. 1863-1873, 2010.
[16] Inderjit S. Dhillon, "Co-clustering documents and words using bipartite spectral graph partitioning," in KDD, 2001, pp. 269-274.
[17] Inderjit S. Dhillon, Subramanyam Mallela, and Dharmendra S. Modha, "Information-theoretic co-clustering," in KDD, 2003, pp. 89-98.
[18] Michail Vlachos, Francesco Fusco, Charalambos Mavroforakis, Anastasios Kyrillidis, and Vassilios G. Vassiliadis, "Improving co-cluster quality with application to product recommendations," in CIKM, 2014, pp. 679-688.
[19] Yizong Cheng and George M. Church, "Biclustering of expression data," in ISMB, 2000, vol. 8, pp. 93-103.
[20] Hyuk Cho, Inderjit S. Dhillon, Yuqiang Guan, and Suvrit Sra, "Minimum sum-squared residue co-clustering of gene expression data," in SDM, 2004, pp. 114-125.
[21] Bo Long, Zhongfei Zhang, and Philip S. Yu, "Co-clustering by block value decomposition," in KDD, 2005, pp. 635-640.
[22] Aydın Buluç, Henning Meyerhenke, Ilya Safro, Peter Sanders, and Christian Schulz, "Recent advances in graph partitioning," Algorithm Engineering, pp. 117-158, 2016.
[23] Jianbo Shi and Jitendra Malik, "Normalized cuts and image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, 2000.
[24] Stuart Lloyd, "Least squares quantization in PCM," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129-137, 1982.
[25] Fan Chung, "Laplacians and the Cheeger inequality for directed graphs," Annals of Combinatorics, vol. 9, no. 1, pp. 1-19, 2005.
[26] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss, "On spectral clustering: Analysis and an algorithm," in NIPS, 2002, pp. 849-856.
[27] David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li, "RCV1: A new benchmark collection for text categorization research," JMLR, vol. 5, pp. 361-397, Apr. 2004.
[28] Scott Emmons, Stephen Kobourov, Mike Gallant, and Katy Börner, "Analysis of network clustering algorithms and cluster quality metrics at scale," PLoS ONE, vol. 11, no. 7, 2016.
[29] Chris Ding, Tao Li, Wei Peng, and Haesun Park, "Orthogonal nonnegative matrix t-factorizations for clustering," in KDD, 2006, pp. 126-135.
[30] Laurens van der Maaten and Geoffrey Hinton, "Visualizing data using t-SNE," JMLR, vol. 9, pp. 2579-2605, Nov. 2008.
[31] Fanhua Shang, L. C. Jiao, and Fei Wang, "Graph dual regularization non-negative matrix factorization for co-clustering," Pattern Recognition, vol. 45, no. 6, pp. 2237-2250, 2012.
[32] Hua Wang, Feiping Nie, Heng Huang, and Fillia Makedon, "Fast nonnegative matrix tri-factorization for large-scale data co-clustering," in IJCAI, 2011.
[33] Yanhua Li and Zhi-Li Zhang, "Random walks on digraphs, the generalized digraph Laplacian and the degree of asymmetry," in International Workshop on Algorithms and Models for the Web-Graph, 2010, pp. 74-85.
[34] Mihai Cucuringu, Huan Li, He Sun, and Luca Zanetti, "Hermitian matrices for clustering directed graphs: insights and applications," in AISTATS, 2020, pp. 983-992.
[35] Ron Zass and Amnon Shashua, "Probabilistic graph and hypergraph matching," in CVPR, 2008, pp. 1-8.
[36] Shulong Tan, Ziyu Guan, Deng Cai, Xuzhen Qin, Jiajun Bu, and Chun Chen, "Mapping users across networks by manifold alignment on hypergraph," in AAAI, 2014, vol. 28.
[37] Shahin Mohammadi, David F. Gleich, Tamara G. Kolda, and Ananth Grama, "Triangular alignment (TAME): A tensor-based approach for higher-order network alignment," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 14, no. 6, pp. 1446-1458, 2016.
[38] Mark Heimann, Haoming Shen, Tara Safavi, and Danai Koutra, "REGAL: Representation learning-based graph alignment," in