Weighted Graph Nodes Clustering via Gumbel Softmax
Deepak Bhaskar Acharya
Computer Science, Huntsville, Alabama, [email protected]
Huaming Zhang
Computer Science, Huntsville, Alabama, [email protected]
ABSTRACT
Graph is a ubiquitous data structure in data science that is widely applied in social networks, knowledge representation graphs, recommendation systems, etc. When given a graph dataset consisting of one or more graphs, where the graphs are weighted in general, the first step is often to find clusters in the graphs. In this paper, we present some ongoing research results on graph clustering algorithms for clustering weighted graph datasets, which we name Weighted Graph Nodes Clustering via Gumbel Softmax (WGCGS for short). We apply WGCGS to the Karate club weighted network dataset. Our experiments demonstrate that WGCGS can efficiently and effectively find clusters in the Karate club weighted network dataset. Our algorithm's effectiveness is demonstrated by (1) comparing the clustering result obtained from our algorithm with the given labels of the dataset; and (2) comparing various metrics between our clustering algorithm and other state-of-the-art graph clustering algorithms.
CCS CONCEPTS
• Computing methodologies → Cluster analysis.

KEYWORDS
Graph Neural Networks, Gumbel-Softmax, Weighted Graph Clustering.
ACM Reference Format:
Deepak Bhaskar Acharya and Huaming Zhang. 2020. Weighted Graph Nodes Clustering via Gumbel Softmax. In ACMSE 2020, April 2–4, 2020, Tampa, FL, USA. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3374135.
1 INTRODUCTION
Graph clustering concerns the discovery of densely related node groups in a graph. Here an edge between two nodes normally implies some underlying similarity or affinity between the nodes, while the absence of an edge suggests dissimilarity and distance. Given the (often noisy) similarity/dissimilarity observations embedded in the network, graph clustering thus attempts to infer groups of closely connected nodes.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ACMSE 2020, April 2–4, 2020, Tampa, FL, USA
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7105-6/20/03...$15.00
https://doi.org/10.1145/3374135.

Graph analysis with machine learning techniques has gained recognition because the expressive power of graphs is enormous [19]: graphs can be used to denote a vast variety of structures in different fields, including social networks [9], knowledge graphs [8], natural science [11] (physical systems [3] and protein-protein interaction networks [6]), and many other research areas. In several applications, such as community detection and user profiling, graph clustering is an essential subroutine. In machine learning, statistics, and pattern recognition, clustering is a subject of active research. Clustering, which is used almost everywhere in computing, is a fundamental problem: it requires splitting the data into groups of similar objects, and it underlies many machine learning applications. Data mining leads to datasets with many different attributes being clustered, so the relevant clustering algorithms have particular computational requirements. Several algorithms have recently emerged that meet these requirements and have been used successfully for real-life data mining challenges.

Clustering is the unsupervised mechanism by which natural clusters are found such that objects from the same cluster are similar and objects from separate clusters are distinct. If similarity relationships between objects are interpreted as a simple weighted graph, where objects are vertices and similarities between objects are edge weights, clustering reduces to the graph clustering problem. However, in many applications, the edge (or non-edge) between a pair of nodes can reflect very different confidence levels of node similarity. In certain circumstances, the observation of an edge (or the absence of one) comes from accurate measurements.
Therefore, it is a clear indication that the two nodes should be assigned to the same cluster. In other cases, the observations will be very uncertain, so an edge, or the lack of one, gives no knowledge about the cluster configuration. As an extreme example, the observations between certain node pairs may hold no information at all, so they are practically unobserved.

By extending the Community Detection Clustering via Gumbel Softmax (CDCGS) technique of Acharya and Zhang [1], in this research paper we propose a new method, Weighted Graph Nodes Clustering via Gumbel Softmax. The article [1] focuses on unweighted graph datasets; in this research, we find clusters of the nodes when the graph is weighted.

The remainder of the paper is organized as follows. In Section 2, we introduce the background and related work in more detail. In Section 3, we introduce our main method. In Section 4, we present the experimental results. In Section 5, we conclude our work with future enhancements.

2 BACKGROUND AND RELATED WORK
2.1 Community Detection Clustering via Gumbel Softmax
Different algorithms have been proposed, and there is extensive research on community detection. Detecting communities is a method for identifying related groups, which can be difficult depending on the graph network's size and scale. Acharya and Zhang [1] provide a general understanding of working with unsupervised graph datasets and identifying clusters of graph nodes. In general, let the adjacency matrix be A of size n × n, where n represents the total number of nodes in the graph dataset, and let the matrix W_C be of size n × k, where k indicates the number of clusters. Performing the operation W_C^T A W_C yields a matrix of size k × k, called R. The resulting R matrix displays the cluster strength: the main diagonal shows the strength of the data within the cluster groups, and the off-diagonal components of R show the strength of the data between the different clusters. Then, to express the inputs as a discrete probability distribution, the softmax function is applied to the obtained R matrix. Mathematically, it is defined as follows:

Softmax(x_i) = exp(x_i) / Σ_{j=1}^{m} exp(x_j), for i = 1, 2, ..., m.  (1)

2.2 Feature Selection and Extraction for Graph Neural Networks
In the approach suggested for citation datasets in the article on feature selection and extraction for Graph Neural Networks, Acharya and Zhang [2] select and extract features for Graph Neural Networks (GNNs). Applying the feature selection and extraction technique to GNNs using gumbel softmax, they conduct tests on different reference datasets: Cora, Pubmed, and Citeseer. Acharya and Zhang demonstrate an example on the Citeseer dataset, where 375 of 3703 features are selected and ranked as the prominent features. The proposed deep learning method works well with the reduced features, decreasing the number of features the dataset initially had by about 80-85 percent.
The experiment results show that the accuracy declines steadily when using selected classification features falling within the ranges 1-75, 76-150, 151-225, 226-300, and 301-375. In general, let the input feature matrix be X of size n × f, where n and f represent the total number of nodes and features in the graph dataset. The resulting matrix is M of size n × k, where k represents the number of features selected from the f features.

Acharya and Zhang select the features in a graph citation dataset using the gumbel softmax function and the proposed approach. The gumbel softmax distribution is [10, 12] 'a continuous distribution over the simplex that can approximate categorical distribution samples.' A categorical distribution is a one-hot vector that assigns probability one to a single entry and zero to all others.

The two-layer Graph Convolution Network (GCN) used in the experiment is defined as

GCN(X, A) = Softmax(A(ReLU(A X W_G W_1)) W_2)  (2)

To verify the selected features and calculate the classification accuracy, they use the following two-layer Graph Convolution Network:

GCN(X, A) = Softmax(A(ReLU(A X W'_G W_1)) W_2)  (3)

A: adjacency matrix of the undirected graph G.
X: input feature matrix.
W_G: gumbel softmax feature selection / feature extraction matrix.
W'_G: feature selection / feature extraction matrix obtained from the result of Equation 2.
W_1, W_2: layer-specific trainable weight matrices.
ReLU: activation function ReLU(x) = max(0, x).
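To make the pieces above concrete, the following is a minimal numpy sketch of Equation 2: a gumbel-softmax draw produces a (relaxed) one-hot feature-selection matrix W_G, which is then used in a two-layer GCN forward pass. All sizes, the random weights, and the stand-in adjacency matrix are hypothetical placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=0.5):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau).
    As tau -> 0 the output approaches a one-hot categorical sample."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    e = np.exp(y - y.max(axis=-1, keepdims=True))         # numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def relu(x):
    return np.maximum(0.0, x)

def row_softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical toy sizes: n=4 nodes, f=5 input features, k=3 selected
# features, h=6 hidden units, c=2 classes.
n, f, k, h, c = 4, 5, 3, 6, 2
A = rng.random((n, n)); A = (A + A.T) / 2   # stand-in for a normalized adjacency
X = rng.random((n, f))                      # input feature matrix

# W_G: each column is a relaxed one-hot vector over the f input features,
# i.e. a soft selection of one feature per output column.
W_G = gumbel_softmax(rng.normal(size=(k, f)), tau=0.1).T   # shape (f, k)

W_1 = rng.normal(size=(k, h))               # first-layer trainable weights
W_2 = rng.normal(size=(h, c))               # second-layer trainable weights

# Two-layer GCN forward pass in the shape of Equation 2.
out = row_softmax(A @ relu(A @ X @ W_G @ W_1) @ W_2)
```

Each row of `out` is a probability distribution over the classes; with a lower temperature `tau` the columns of `W_G` become closer to hard one-hot selections.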
The proposed approach in the article Community Detection Clustering via Gumbel Softmax (CDCGS) clusters the nodes of graph datasets, and the article Feature Selection and Extraction for Graph Neural Networks (FSEGNN) explains how prominent features of graph citation datasets can be selected and extracted. The dataset used in each of those papers is a graph: there are nodes and edges, and the relations between the nodes carry some knowledge about the similarities or dependencies between them. The two papers above, however, do not provide insight into weighted graph networks. As an extension of CDCGS and FSEGNN, we propose a new method, Weighted Graph Nodes Clustering via Gumbel Softmax (WGCGS), to cluster the nodes of weighted graphs.

3 METHOD
In our WGCGS method, we consider a network of n nodes and its adjacency matrix A. An adjacency matrix is used to represent a finite graph; the elements of the matrix show whether pairs of vertices in the graph are adjacent or not. Applying our method, we then cluster the graph into k clusters.

If we have a stochastic neural network with discrete variables, gumbel softmax distributions can be used to estimate the sampling process of the discrete data. The network can then be trained using backpropagation, where the performance of the network depends on the choice of the temperature parameter.

The method used in the experiment is defined as below:
Weighted-Graph-Cluster(Adj) = Softmax(W_C^T (Adj) W_C)  (4)

In Equation 4, Adj indicates the adjacency matrix of the undirected weighted graph G, where the weight of a graph edge indicates the similarity between the nodes it connects, and W_C indicates the gumbel cluster weight matrix.

Once model training is completed, we have the gumbel cluster weight matrix W_C of size N × k, where N is the number of graph nodes and k is the number of clusters. Each row sums to 1, and the index of the row's maximum entry is the cluster to which the graph node belongs. For example, let k = 2, i.e., we are trying to cluster the dataset into two cluster groups. Say the first row is [0.65 0.35], where 0.65 is at index 0 and 0.35 is at index 1. Taking the index of the maximum value in the row, one can easily say that the node belongs to cluster 0. If the second row is [0.39 0.61], then 0.39 is at index 0 and 0.61 is at index 1, so the node belongs to cluster 1.

The resulting k × k matrix derived from Equation 4 is compared to the identity matrix (I_{k×k}) to compute the network loss; here the diagonal components represent the cluster groups. We may then form the cluster groups of the data points using our proposed approach. A detailed analysis, using the weighted karate club network dataset as an example, is given in the results section.

4 EXPERIMENTAL RESULTS
Zachary's Weighted Karate Club Network is a well-known dataset describing the relationships in a university karate club, used by Wayne W. Zachary in his paper "An Information Flow Model for Conflict and Fission in Small Groups." This dataset is known for its simple community structure: the network nodes can be clustered into strongly linked sets.
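The clustering procedure described above (Equation 4, the identity-matrix loss target, and the row-argmax assignment) can be sketched as follows. The tiny weighted adjacency matrix and the "trained" W_C are hypothetical placeholders, and the squared-error loss is only one plausible way to compare against the identity, since the paper does not spell out the exact loss form:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical toy weighted graph with n=4 nodes and k=2 clusters.
# W_C stands in for a trained gumbel cluster weight matrix whose rows
# are relaxed one-hot cluster assignments (each row sums to 1).
Adj = np.array([[0.0, 0.9, 0.1, 0.0],
                [0.9, 0.0, 0.0, 0.1],
                [0.1, 0.0, 0.0, 0.8],
                [0.0, 0.1, 0.8, 0.0]])
W_C = np.array([[0.65, 0.35],
                [0.61, 0.39],
                [0.39, 0.61],
                [0.35, 0.65]])

R = W_C.T @ Adj @ W_C                       # Eq. (4): k x k cluster-strength matrix
P = softmax(R.flatten()).reshape(R.shape)   # softmax over the R matrix entries

# Training pushes probability mass onto the diagonal of P, here measured
# (as one possible choice) by squared error against the identity I_k.
loss = np.square(P - np.eye(2)).sum()

# After training, each node joins the cluster of its row's largest entry.
labels = W_C.argmax(axis=1)                 # -> [0, 0, 1, 1]
```

With the placeholder values above, the first two rows select cluster 0 and the last two select cluster 1, exactly as in the worked [0.65 0.35] / [0.39 0.61] example.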
Focused on Mr. Hi, the karate instructor, and John A, the club president, Zachary's Karate Club network can be split into two groups. The network accurately predicts how, following a disagreement over pay, the karate club splits into two new clubs, creating a rift between Mr. Hi and John A. By analyzing community members' meetings outside the club, the network predicts which members of the club will join which new club correctly in 33 of 34 instances.

The initial dataset of Zachary's Karate Club is weighted by multiple friendship measures. Several visualizations of Zachary's Karate Club have been created since the first article was published in 1977. Michelle Girvan and Mark Newman [7] used Zachary's Karate Club again in 2002 to demonstrate community structure in their article "Community structure in social and biological networks." Figure 1 is a computer-generated network that contains each member of the club.
We evaluate our methodology on the weighted Zachary karate club network. Since the weighted Zachary network is labeled, we also know that the two teacher nodes, 0 and 33, are the dominant graph nodes; any other network node joins either the teacher community of node 0 or the teacher community of node 33. The original network, before clustering, is shown in Figure 1. We then run our method to cluster the network into 2 clusters, matching the 2-cluster ground-truth community structure of the Zachary karate club. We compare the results of our proposed method with those of algorithms such as greedy optimization of the leading eigenvector of the community matrix (ECM) [13], edge betweenness (EB) [14], short random walks (RW) [15], Infomap community finding (ICF) [18], multi-level optimization of modularity (MOM) [4], and propagating labels (PL) [16]. Metrics such as the adjusted rand index (ARI) [5], homogeneity (HOMO), normalized mutual information (NMI), completeness (COMP), and v-measure (V-MES) [17] have also been computed to complete the analysis. Clustering the weighted Zachary network into 2 clusters with WGCGS (WGCGS(2C)) yields the metrics ARI = 1, HOMO = 1, NMI = 1, COMP = 1, V-MES = 1, and a modularity of 0.371. The value of
Figure 1: Weighted Karate Club Network
ARI, HOMO, NMI, COMP, and V-MES indicates that the proposed algorithm clusters the nodes into their respective groups as per the target labels. Figure 2 shows the 2-cluster Zachary karate club network resulting from our proposed approach.

For a graph dataset, a higher modularity value indicates a stronger cluster assignment. To obtain higher modularity, we also clustered the weighted Zachary karate club network into four clusters. The 4-clustered Zachary network (WGCGS(4C)) gives us a modularity of 0.4197 (Figure 3). Taking the students joining node 0 as cluster group 1 and those joining node 33 as cluster group 2 (nodes 0 and 33 are already connected to several nodes), nodes 23, 24, 25, 27, 28, 31 (cluster 3) and nodes 4, 5, 6, 10, 16 (cluster 4) correspond to students joining neither cluster group 1 nor cluster group 2. To test how the different algorithms classify the metadata groups, we used the best scores of each algorithm on the Zachary karate club network. Our WGCGS approach outperforms the metric results of the ECM, EB, RW, MOM, and PL algorithms, as presented in Table 1.

The result obtained from Equation 4 is compared against the loss target. The loss target used for our experiment is the identity matrix (I_{k×k}), where each diagonal member represents a cluster group.

Table 1: Zachary karate club network metrics for the different algorithms and our method (WGCGS)

Metric  ECM     EB      RW      ICF     MOM     PL      WGCGS(2C)  WGCGS(4C)
ARI     0.5120  0.5125  0.2620  0.7021  0.4619  0.4714  0.5414     1
NMI     0.6770  0.6097  0.3729  0.6994  0.5866  0.5282  0.6873     1
HOMO    1       0.8850  0.5588  0.8535  0.8535  0.6905  1          1
COMP    0.5118  0.4651  0.2798  0.5925  0.4468  0.4277  0.5235     1
V-MES   0.6770  0.6097  0.3729  0.6994  0.5866  0.5282  0.6872     1

Figure 2: Weighted Karate Club Network with 2-Clusters

5 CONCLUSION
We introduced the Weighted Graph Nodes Clustering via Gumbel Softmax strategy for the weighted karate club network. The experimental results illustrate its efficiency when appropriate parameter values are selected, as verified by the resulting clustering accuracy. The approach is as widely applicable as weighted undirected graph clustering itself. We are currently experimenting with applying the clustering principle to a few more weighted graph datasets and to a directed graph dataset.
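For reference, the modularity score Q used to compare the 2- and 4-cluster partitions above can be computed directly from a weighted adjacency matrix. The sketch below uses a hypothetical toy graph (two triangles joined by one edge), not the karate club network:

```python
import numpy as np

def modularity(A, labels):
    """Newman modularity Q of a partition of a weighted undirected graph.

    A is the symmetric weighted adjacency matrix and labels[i] is the
    cluster of node i:  Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) [c_i == c_j].
    """
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)          # weighted node degrees
    two_m = A.sum()            # total edge weight, counted twice
    same = np.equal.outer(labels, labels)   # 1 where nodes share a cluster
    return ((A - np.outer(k, k) / two_m) * same).sum() / two_m

# Toy graph: two triangles joined by a single bridge edge; the natural
# 2-cluster split scores a clearly positive modularity.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
labels = np.array([0, 0, 0, 1, 1, 1])

q = modularity(A, labels)   # = 5/14, approximately 0.357
```

A poor partition (e.g. splitting each triangle across clusters) would drive Q toward zero or below, which is why a higher Q is read as a stronger cluster assignment.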
Figure 3: Weighted Karate Club Network with 4-Clusters

REFERENCES
[1] D. B. Acharya and H. Zhang. Community detection clustering via gumbel softmax. SN Computer Science, 1(5):1–11, 2020.
[2] D. B. Acharya and H. Zhang. Feature selection and extraction for graph neural networks. In Proceedings of the 2020 ACM Southeast Conference, ACM SE '20, pages 252–255, New York, NY, USA, 2020. Association for Computing Machinery.
[3] P. Battaglia, R. Pascanu, M. Lai, D. J. Rezende, et al. Interaction networks for learning about objects, relations and physics. In Advances in Neural Information Processing Systems, pages 4502–4510, Barcelona, Spain, 2016.
[4] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, Oct 2008.
[5] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, and G. Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[6] A. Fout, J. Byrd, B. Shariat, and A. Ben-Hur. Protein interface prediction using graph convolutional networks. In Advances in Neural Information Processing Systems, pages 6530–6539, Long Beach, USA, 2017.
[7] M. Girvan and M. E. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.
[8] T. Hamaguchi, H. Oiwa, M. Shimbo, and Y. Matsumoto. Knowledge transfer for out-of-knowledge-base entities: A graph neural network approach. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia, 2017.
[9] W. Hamilton, Z. Ying, and J. Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034, Long Beach, USA, 2017.
[10] E. Jang, S. Gu, and B. Poole. Categorical reparameterization with gumbel-softmax. In ICLR, Toulon, France, 2017.
[11] E. Khalil, H. Dai, Y. Zhang, B. Dilkina, and L. Song. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems, pages 6348–6358, Long Beach, USA, 2017.
[12] C. J. Maddison, A. Mnih, and Y. W. Teh. The concrete distribution: A continuous relaxation of discrete random variables. In ICLR, Toulon, France, 2017.
[13] M. E. Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3):036104, 2006.
[14] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69:026113, Feb 2004.
[15] P. Pons and M. Latapy. Computing communities in large networks using random walks (long version), 2005.
[16] U. N. Raghavan, R. Albert, and S. Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3), Sep 2007.
[17] A. Rosenberg and J. Hirschberg. V-measure: A conditional entropy-based external cluster evaluation measure. pages 410–420, 2007.
[18] M. Rosvall, D. Axelsson, and C. T. Bergstrom. The map equation. The European Physical Journal Special Topics, 178(1):13–23, Nov 2009.
[19] K. Xu, W. Hu, J. Leskovec, and S. Jegelka. How powerful are graph neural networks? In